Wednesday, April 27, 2016

Modelling a Subset problem: An interesting sequence obtained from an orbits simulator (III)

This is the third part of the series of posts about the topic "an interesting sequence obtained from an orbits simulator"; see the first part and the second part.

Several months have passed and I still do not know very much about the properties of my initial subset problem, so what I am trying to do first is to write everything down more properly: instead of talking about an example (an orbits simulator), I will try to describe the algorithm mathematically. Here is what I have been able to do so far:

It seems that my original problem could fit into this area:

>Partition Problem > multi-way partition problem

Which is defined as (Wikipedia) "the optimization version of the partition problem. Here, the goal is to divide a set or multiset of n integers into a given number k of subsets, minimizing the difference between the smallest and the largest subset sums."

For that reason I am using as a lead paper to gather ideas:  "Korf, Richard E. (2009). [Multi-Way Number Partitioning (PDF)][2]. IJCAI."

...but I am not really sure whether my problem can be included in that kind of generic problem, because instead of using a set of integers I am using a FIFO (first-in-first-out) pile of integers F, and the subsets are created "on the fly": the algorithm that creates the partitions can only take the head element of the pile when populating the generated subsets. Here is the description of the problem as an algorithm:

1. Initially there is a FIFO set F and an empty list S of subsets. Every subset generated will have an index i, written S_i. The first subset will be S_0, then S_1, etc. The main rule is that at every moment the sum of the elements of any S_j must be smaller than or equal to the sum of the elements of any S_i with an index i smaller than j.

2. Take the head element of the FIFO set and

a) If the new element is smaller than or equal to the sum of the elements of the subset with the greatest index $S_k$, then create a new subset $S_{k+1}$ and insert the element there.

b) If not, then insert the element into the existing subset S_j with the greatest possible index j such that, once the element is added, the sum of S_j is still smaller than or equal to the sum of every subset S_i with index i lower than j. If no subset with index greater than 0 complies with the rule, the element is inserted into S_0 (if S_0 does not exist yet, it is created and then the element is inserted).

3. Repeat from step 2 until the FIFO set is empty.
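The three steps above can be sketched in a few lines of Python; `partition` is just an illustrative name, and the subsets S_i are kept as plain lists:

```python
def partition(fifo):
    """Distribute the elements of the FIFO pile into subsets S_0, S_1, ...
    keeping sum(S_j) <= sum(S_i) for every i < j."""
    S = []  # S[i] plays the role of S_i
    for e in fifo:
        # rule 2a: the new element alone may start a new subset S_{k+1}
        if S and e <= sum(S[-1]):
            S.append([e])
            continue
        # rule 2b: greatest index j > 0 whose sum stays below all lower sums
        for j in range(len(S) - 1, 0, -1):
            if all(sum(S[j]) + e <= sum(S[i]) for i in range(j)):
                S[j].append(e)
                break
        else:
            if not S:
                S.append([])  # create S_0 on first use
            S[0].append(e)    # fallback: the element goes to S_0
    return S

print(partition([1, 2, 3, 4]))  # → [[1, 2, 4], [3]]
```

Running it over the pile {1,2,3,4} gives [[1, 2, 4], [3]], matching the worked example that follows.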

Example with a FIFO set F = N^{+} containing the natural numbers excluding 0, in strictly increasing order {1,2,3,4,5,...}:

1. Take the head element, 1, and a) look for the existing subset in S with the greatest index. As S is still empty, create one (in this case S_0) and add the element: S_0={1}.

2. Take the head element, 2, and a) go to the subset with the greatest index, in this case S_0. As 2 is greater than the sum of the elements of the subset (1), a new subset cannot be created, so applying b) 2 is inserted into S_0, S_0={1,2}.

3. Take the head element, 3, and go to the existing subset with the greatest index (in this case there is only one, S_0). a) As the new element 3 is smaller than or equal to the sum of the elements of S_0 (3), it is possible to create a new subset, so the element is added to S_1, S_1={3}.

So far we have two subsets, S={S_0,S_1}, and the sum of S_j <= the sum of S_i for all i<j.

4. Repeat the process for the next head element of the pile, 4. As the sum of S_1 is 3 and 3 < 4, we cannot create a new subset, so we try to add 4 to S_1; but if we did, the sum of S_1 would become 3+4=7, greater than the sum of S_0 (3), so we cannot add 4 to S_1 either. We go down one level and try S_0: as there are no subsets below S_0, 4 is added to S_0.

So far the subsets are:

S={S_0={1,2,4}, S_1={3}}

After some repetitions of this algorithm, the subsets look like this (the image was already shown in the first post about the topic):

There are two interesting subsets there: S_0 and the set of the first elements of each S_i, which I will call G (F is already taken by the FIFO set).

S_0={1,2,4,5,8,12,17,19...}. S_0 will always be the subset with the greatest sum of elements.

G={1,3,7,11,16,25...}. The elements of this subset are those elements of the initial FIFO set capable of forcing the generation of a new subset.

I tried to find them at the OEIS, but unfortunately they are not included.
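Both sequences can be regenerated by re-running the rules described above over the first natural numbers (again, `partition` is just an illustrative name for the procedure, not library code):

```python
def partition(fifo):
    # distribute the pile into subsets keeping sum(S_j) <= sum(S_i) for all i < j
    S = []
    for e in fifo:
        if S and e <= sum(S[-1]):
            S.append([e])  # rule a): the element starts a new subset
            continue
        for j in range(len(S) - 1, 0, -1):  # rule b): greatest valid index
            if all(sum(S[j]) + e <= sum(S[i]) for i in range(j)):
                S[j].append(e)
                break
        else:
            if not S:
                S.append([])  # create S_0 on first use
            S[0].append(e)
    return S

S = partition(range(1, 26))
print(S[0])               # → [1, 2, 4, 5, 8, 12, 17, 19, 20]
print([s[0] for s in S])  # → [1, 3, 7, 11, 16, 25]
```

The first elements of each subset reproduce 1, 3, 7, 11, 16, 25, the values listed above.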

When I thought about this problem for the first time (in the first post mentioned above), I was thinking of an application of this algorithm as a model for energy levels in different orbits. Imagine a "nucleus" that can have several orbits around it; initially there are no orbits. A chunk of energy 1 arrives and is placed in the lowest orbit, equivalent to S_0. Then another chunk of energy 2 arrives, but it cannot be placed in a second orbit or energy level, because the energy of a higher level cannot be greater than the energy in the lower levels, so it stays at level S_0. But when the next chunk, 3, arrives, it can be placed in a new energy level, because there is at least the same quantity of energy in the lower levels.

With this algorithm, the lower levels (the levels closer to the "nucleus") always have at least as much energy as the outer levels. Saying the same in terms of the mathematical model: the previously generated subsets will always have a sum greater than or equal to that of any subset generated later.

Basically I am lost about how to catalog this (and then extend it, study its properties, etc.), so I have also asked a question regarding this topic at MSE (link here).

Monday, April 11, 2016

Can we replace the upper limit condition sqrt(n) of the Sieve of Eratosthenes with the value sqrt(p), where p is the closest prime < n?

By chance I stumbled upon the OEIS list A033677, the smallest divisor of n greater than or equal to sqrt(n). Roughly speaking, in the classic enhanced Sieve of Eratosthenes, sqrt(n) is the typical upper limit when looking for a divisor d>1 of n: if n is composite, then there exists a divisor d in [2,sqrt(n)].

(*This is a mirror of my same question at MSE.)

I did the same exercise, but instead of sqrt(n) I used as the limit the square root of the closest prime p below n, and calculated the smallest divisor of n greater than or equal to sqrt(p).

Testing with Python, and if I did not make an error, I curiously obtained exactly the same list as A033677 for every n in [3,6*10^6] (after that point it gets slower; I will try with PARI/GP later). So I wonder whether it is possible to use the above-mentioned sqrt(p) upper limit instead of sqrt(n) to find a divisor of n when sieving.

This is the (very basic) Python code that compares, one by one, the elements of the sequence A033677 with the sequence of the smallest divisor of n greater than or equal to sqrt(p). Please feel free to use it:

    from math import sqrt
    from sympy import prevprime, divisors

    for n in range(3, 10000000):
        # A033677(n): smallest divisor of n greater than or equal to sqrt(n)
        A033677_value = 0
        for d in divisors(n):  # divisors() returns the divisors in increasing order
            if d >= sqrt(n):
                A033677_value = d
                break

        # smallest divisor of n greater than int(sqrt(prevprime(n)))
        for d in divisors(n):
            if d > int(sqrt(prevprime(n))):
                if A033677_value == d:
                    print("SUCCESS " + str(n))
                else:
                    print("ERROR " + str(n) + "\t" + str(A033677_value)
                          + "\t" + str(prevprime(n)) + "\t" + str(d))
                    raise SystemExit
                break


If the observation is true, the property would be something like this:

for all composite n there exists d such that d|n and d in [2,sqrt(p1)], where p1 is prime, p1 < n, and there is no prime p2 with p1 < p2 < n.

For instance, for n=20,21,22,23 we would look for a divisor of n in the interval [2,sqrt(19)], because 19 is the closest prime below those n. And we already sieved 19 in a former step, so it is in our list of already sieved primes and thus available to be used (meaning we already know at this point that 19 is prime). Then, applying Eratosthenes as usual, it would be detected that n=20,21,22 are composite and that n=23 is the next prime.
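That small case can be checked directly by trial division up to int(sqrt(19)) = 4 (the snippet below is just an illustration of the idea, not the sieve itself):

```python
from math import sqrt

p = 19                # closest prime below 20, 21, 22 and 23
limit = int(sqrt(p))  # = 4
for n in (20, 21, 22, 23):
    d = next((j for j in range(2, limit + 1) if n % j == 0), None)
    if d:
        print(n, "is composite, divisor", d)
    else:
        print(n, "has no divisor up to", limit)  # 23 is the next prime
```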

In some cases (e.g. when n is a perfect square), having the same smallest divisor d above sqrt(n) and above sqrt(p) does not imply that we can find a divisor in the interval [2,int(sqrt(p))]; it is required to go up to [2,int(sqrt(p))+1], meaning that the smallest divisor is exactly int(sqrt(p))+1.
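For instance, for the perfect squares 25 and 49 (squares of the primes 5 and 7) the smallest nontrivial divisor is exactly int(sqrt(p)) + 1; the previous primes 23 and 47 are hard-coded here for simplicity:

```python
from math import sqrt

# 23 and 47 are the closest primes below 25 and 49 respectively
for n, p in ((25, 23), (49, 47)):
    limit = int(sqrt(p))
    d = next(j for j in range(2, n + 1) if n % j == 0)  # smallest divisor > 1
    print(n, "-> smallest divisor", d, "=", limit, "+ 1")
```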

If true, I am not sure whether it would be significant. By Bertrand's postulate there is always a prime in [n/2,n], so in the best of cases the previous prime would be p = n/2, which would mean, for that best of cases, using sqrt(n/2) as the upper limit for the sieve instead of sqrt(n).

So I wonder: would it be possible to use sqrt(p) instead of sqrt(n)? Is that approach already in use? Is there a counterexample (probably very big)? And if there is none, what could be the reason behind it?

The code below tests the Sieve of Eratosthenes using the modified algorithm explained above, including the special condition for perfect squares. It seems to work so far up to 10^5.
    from math import sqrt
    from gmpy2 import is_prime, is_square

    # modified Eratosthenes, upper sieving limit sqrt(p)
    lopM = [2]  # list of already sieved primes
    lp = 2      # last prime sieved
    for pos in range(3, 1000008):
        ul = int(sqrt(lp))  # upper limit
        if is_square(pos):
            ul = ul + 1  # correction for perfect squares
        if ul == 1:
            ul = 2  # exclude 1 from the possible divisors
        comp = False  # composite detected
        for j in lopM:  # trial division by the already known primes
            if j > ul:  # we arrived at the upper limit
                break
            if pos % j == 0:
                comp = True
                break
        if not comp:
            lopM.append(pos)
            lp = pos

    for elem in lopM:
        if not is_prime(elem):  # sanity check
            print("ERROR, not prime " + str(elem))
            raise SystemExit
    print("Primes sieved: " + str(len(lopM)) + " and last prime was " + str(lp))


So heuristically it seems correct, but is there a proof or a counterexample?