Index Ofa Sub Set

I need some help here. But you could consider it a MathQuiz. I have been searching for an algorithm that does the following:

Given a set A and a subset of it B. The elements of A have a total order. Determine the index of the B in the sequence of all subsets of A ordered lexicographically.

The difficulty is in doing so without creating the sequence explicitly, but to calculate the index from A and B alone.

Example:

A = {1,2,3,4}

B = {1,2,3}

The ordered sequence of subsets of A is

{}
{1}
{1,2}
{1,2,3} <----
{1,2,3,4}
{1,2,4}
{1,3}
{1,3,4}
{1,4}
{2}
{2,3}
{2,3,4}
{2,4}
{3}
{3,4}
{4}

Thus the position of B is 4.

This is useful if you have some property associated with each element of the sequence, but do not what to store the key and search for it (especially if the property is just a single or a few bits). In this case you can just calculate the index directly and do an array-lookup.

Interesting special case (maybe this is easier): We are interested only in subsets of a certain length:

Given a set A and a subset of it B. The elements of A have a total order. Determine the index of the B in the sequence of all subsets of the same cardinality of A ordered lexicographically.

Example:

A = {1,2,3,4,5}

B = {2,4}

The ordered sequence of subsets of A is

{1,2}
{1,3}
{1,4}
{1,5}
{2,3}
{2,4} <---
{2,5}
{3,4}
{3,5}
{4,5}

The index of B in the sequence is 6.

I already know, that at least the latter special case should be solvable by evaluating a polynomial on the index of the elements in their total order.

The 2-element subset case looks as follows:

A ={1,2,...,n) B ={b1,b2}

index = n*(n-1)/2 - (n-b1)*(n-b1+1)/2 + (b2-b1)

-- GunnarZarncke

If this is the only thing you need the subset indices for, then it seems that you don't really need the indices from a lexicographic order; any order on the subsets will do just fine. If A has N elements, then there is a simple mapping between subsets of A and unsigned integers with at most N bits. If you have an ordered array containing the N elements of A, and a possibly unordered array containing the M elements of B, you can compute the index of B in O(M log N) operations. (For each element of B, you need to look up its index in A, then do a bit shift and an addition.) -- JosephDale

You are right, for the general case the bitset encoding is the better choice if you want to achieve the stated objective (save memory). I knew the set-encoding, but didn't think about, because I'm actually more interested in the special case. I cannot also use the set-encoding there, because it creates too much overhead. What I need is e.g. all 8-of-32 sets. -- .gz

Ah, I didn't realize that that was what you were looking for. Here's an idea. Suppose that x1 < x2 < x3, and we want the index of the set {x1, x2, x3} in the lexicographic ordering of all 3-of-N sets. The number of sets that precede {x1, x2, x3} in the ordering is equal to:

the number N1 of sets that start with a number strictly less than x1, plus
the number N2 of sets that start with x1 and a number strictly between x1 and x2, plus
the number N3 of sets consisting of x1 and x2 and a number strictly between x2 and x3.

N1 can be obtained by evaluating (x1 - 1) binomial coefficients; N2 by evaluating (x2 - x1 - 1) binomial coefficients; and N3 is equal to (x3 - x2 - 1). Then if we start counting from one, the index of {x1, x2, x3} is (N1 + N2 + N3 + 1). This generalizes to M-of-N sets, and it looks like we need to evaluate about (N - M) binomial coefficients to get each index, which is not so bad. I tested this on a few 3-of-8, 3-of-9, and 4-of-9 sets, and it seems to work, but let me know if I've made a blunder somewhere. -- JosephDale

That's exactly, what I have come up with by now too. I will post an algorithm shortly. One optimization: I think, that the binominal coefficients may be related and so we can factor out some multiplications. By the way: Do you know an otherwise efficient binominal coefficient calculation for java? -- .gz

This was my initial code (java):

    public static int indexOfASubSet(int setMaxElement, int[] orderedSubSet1Based)
    {
        int setSize = orderedSubSet1Based.length;
        int indexAcc = 0;
        for ( int posInSubSet = 1; posInSubSet <= setSize; posInSubSet++ )
        {
            int a = combinationsOf(setSize - posInSubSet + 1, setMaxElement - (setSize - posInSubSet) - (posInSubSet == 1 ? 0 : orderedSubSet1Based[posInSubSet - 1 - 1]));
            int b = combinationsOf(setSize - posInSubSet + 1, setMaxElement - (setSize - posInSubSet) - (orderedSubSet1Based[posInSubSet - 1] - 1));
            indexAcc += a - b;
        }
        return indexAcc;
    }

    public static int combinationsOf(int numberOfElements, int setMaxElement)
    {
        long accu = 1;
        for ( int i = 0; i < numberOfElements; i++ )
        {
            accu *= (setMaxElement + i);
        }
        for ( int i = 2; i <= numberOfElements; i++ )
        {
            accu /= i;
        }
        return (int) accu;
    }

Yes, the func "numberOfElements" is in fact a binomial coeff. And the two call for it can be combined.

I searched the web for an algorithm or even a mathematical description, but could'nt find any. Does anybody know, if this IndexOfaSubSet algorithms has some special name? -- .gz

CategoryMath