How to iteratively generate a subset of k elements from a collection of size n in Java?


User550617:

I'm working on a difficult problem that involves analyzing k subsets of all sizes and finding which subset is the best. I wrote a solution that works when the number of subsets is small, but for larger problems it runs out of memory. Now I'm trying to convert an iterative function written in python to java so that I can analyze each subset as I create it and get only the values ​​that represent how well it's optimized, not the entire set so I don't run out Entire set of memory. Here's what I have so far, even minor issues don't seem to be resolved:

public static LinkedList<LinkedList<Integer>> getSets(int k, LinkedList<Integer> set)
{
    int N = set.size();
    int maxsets = nCr(N, k);
    LinkedList<LinkedList<Integer>> toRet = new LinkedList<LinkedList<Integer>>();

    int remains, thresh;
    LinkedList<Integer> newset; 
    for (int i=0; i<maxsets; i++)
    {
        remains = k;
        newset = new LinkedList<Integer>();
        for (int val=1; val<=N; val++)
        {
            if (remains==0)
                break;
            thresh = nCr(N-val, remains-1);
            if (i < thresh)
            {
                newset.add(set.get(val-1));
                remains --;
            }
            else 
            {
                i -= thresh;
            }
        }
        toRet.add(newset);
    }

    return toRet;

}

Can someone help me debug this function or suggest another algorithm to iteratively generate subsets of size k?

EDIT: I finally got the function to work, I had to create a new variable identical to i to do the i and thresh comparisons because python handles loop indices differently.

Deserving:

First, if you plan to have random access to lists, you should choose a list implementation that can efficiently support lists. From the javadoc on LinkedList:

All operations are performed as expected for a doubly linked list. Indexing into a list will traverse the list from the beginning or end, whichever is closer to the specified index.

ArrayList is not only more space efficient, but also faster for random access. In fact, you can even use normal arrays since you know the length in advance.

Algorithm: Let's start simple: how to generate all subsets of size 1? Probably something like this:

for (int i = 0; i < set.length; i++) {
    int[] subset = {i};
    process(subset);
}

where process is a method that does something with a collection, such as checking if it's "better" than all the subsets processed so far.

Now, how would you extend it to work with a subset of size 2? What is the relationship between a subset of size 2 and a subset of size 1? Well, any subset of size 2 can be turned into a subset of size 1 by removing its largest element. In other words, each subset of size 2 can be generated by taking a subset of size 1 and adding new elements that are larger than all other elements in the set. In code:

processSubset(int[] set) {
    int subset = new int[2];
    for (int i = 0; i < set.length; i++) {
        subset[0] = set[i];
        processLargerSets(set, subset, i);
    }
}

void processLargerSets(int[] set, int[] subset, int i) {
    for (int j = i + 1; j < set.length; j++) {
        subset[1] = set[j];
        process(subset);
    }
}

For subsets of arbitrary size k, note that any subset of size k can be converted to a subset of size k-1 by shredding the largest element. That is, all subsets of size k can be generated by generating all subsets of size k-1, for each subset, and for each value greater than the maximum value in the subset, add that value to the set . In code:

static void processSubsets(int[] set, int k) {
    int[] subset = new int[k];
    processLargerSubsets(set, subset, 0, 0);
}

static void processLargerSubsets(int[] set, int[] subset, int subsetSize, int nextIndex) {
    if (subsetSize == subset.length) {
        process(subset);
    } else {
        for (int j = nextIndex; j < set.length; j++) {
            subset[subsetSize] = set[j];
            processLargerSubsets(set, subset, subsetSize + 1, j + 1);
        }
    }
}

Test code:

static void process(int[] subset) {
    System.out.println(Arrays.toString(subset));
}


public static void main(String[] args) throws Exception {
    int[] set = {1,2,3,4,5};
    processSubsets(set, 3);
}

But before making calls on large collections, keep in mind that the number of subsets can grow rapidly.

Related


Algorithm to generate K elements from a slice of N elements

Andrew LeFevre: I'm trying to port an algorithm from a Stackoverflow question in Go . The algorithm I'm trying to use is the following: Given a string slice of arbitrary length and a "depth", find all combinations of elements of length depth in the original sl

Algorithm to generate K elements from a slice of N elements

Andrew LeFevre: I'm trying to port an algorithm from a Stackoverflow question in Go . The algorithm I'm trying to use is the following: Given a string slice of arbitrary length and a "depth", find all combinations of elements of length depth in the original sl

Get a subset of elements from a collection in a loop

j Consider a set with 100 elements. Set<String> originalSet; //[1....100] size is 100 From the originalSet, a subset of (m) elements of size (n) with some starting index (i) must be retrieved. example: m = 4, n = 45, i = 1 The following must be retrieved s

Subset n elements from a list

Matias Andina I have such a big list str(sep) List of 54 $ AK:'data.frame': 5 obs. of 3 variables: ..$ nombre : Factor w/ 4510 levels "ABBEVILLE AREA MEDICAL CENTER",..: 3085 40 1119 39 2176 ..$ heartattack: num [1:5] 13.4 14.5 15.5 15.7 17.7 ..$ s

Subset n elements from a list

Matias Andina I have such a big list str(sep) List of 54 $ AK:'data.frame': 5 obs. of 3 variables: ..$ nombre : Factor w/ 4510 levels "ABBEVILLE AREA MEDICAL CENTER",..: 3085 40 1119 39 2176 ..$ heartattack: num [1:5] 13.4 14.5 15.5 15.7 17.7 ..$ s

Subset n elements from a list

Matias Andina I have such a big list str(sep) List of 54 $ AK:'data.frame': 5 obs. of 3 variables: ..$ nombre : Factor w/ 4510 levels "ABBEVILLE AREA MEDICAL CENTER",..: 3085 40 1119 39 2176 ..$ heartattack: num [1:5] 13.4 14.5 15.5 15.7 17.7 ..$ s

How to return N consecutive elements from a collection?

double07: I pass a collection of objects (in my case some Contact class) and need to return a page from this collection. My code feels a lot longer than it needs to be. Am I missing some library that performs more elegantly than if I iterate over each element

How to return N consecutive elements from a collection?

double07: I pass a collection of objects (in my case some Contact class) and need to return a page from this collection. My code feels a lot longer than it needs to be. Am I missing some library that performs more elegantly than if I iterate over each element

How to randperm the k first elements of a vector with size N

user 3051460 I am using randperm to randomize the elements in a vector. However, I have a problem that I have a vector of size N, and I only want the first k elements in the vector to be random permutations, with the Nk elements unchanged. How to do it with ma