How to iteratively generate a subset of k elements from a collection of size n in Java?
I'm working on a difficult problem that involves analyzing k subsets of all sizes and finding which subset is the best. I wrote a solution that works when the number of subsets is small, but for larger problems it runs out of memory. Now I'm trying to convert an iterative function written in python to java so that I can analyze each subset as I create it and get only the values that represent how well it's optimized, not the entire set so I don't run out Entire set of memory. Here's what I have so far, even minor issues don't seem to be resolved:
public static LinkedList<LinkedList<Integer>> getSets(int k, LinkedList<Integer> set)
{
int N = set.size();
int maxsets = nCr(N, k);
LinkedList<LinkedList<Integer>> toRet = new LinkedList<LinkedList<Integer>>();
int remains, thresh;
LinkedList<Integer> newset;
for (int i=0; i<maxsets; i++)
{
remains = k;
newset = new LinkedList<Integer>();
for (int val=1; val<=N; val++)
{
if (remains==0)
break;
thresh = nCr(N-val, remains-1);
if (i < thresh)
{
newset.add(set.get(val-1));
remains --;
}
else
{
i -= thresh;
}
}
toRet.add(newset);
}
return toRet;
}
Can someone help me debug this function or suggest another algorithm to iteratively generate subsets of size k?
EDIT: I finally got the function to work, I had to create a new variable identical to i to do the i and thresh comparisons because python handles loop indices differently.
First, if you plan to have random access to lists, you should choose a list implementation that can efficiently support lists. From the javadoc on LinkedList:
All operations are performed as expected for a doubly linked list. Indexing into a list will traverse the list from the beginning or end, whichever is closer to the specified index.
ArrayList is not only more space efficient, but also faster for random access. In fact, you can even use normal arrays since you know the length in advance.
Algorithm: Let's start simple: how to generate all subsets of size 1? Probably something like this:
for (int i = 0; i < set.length; i++) {
int[] subset = {i};
process(subset);
}
where process is a method that does something with a collection, such as checking if it's "better" than all the subsets processed so far.
Now, how would you extend it to work with a subset of size 2? What is the relationship between a subset of size 2 and a subset of size 1? Well, any subset of size 2 can be turned into a subset of size 1 by removing its largest element. In other words, each subset of size 2 can be generated by taking a subset of size 1 and adding new elements that are larger than all other elements in the set. In code:
processSubset(int[] set) {
int subset = new int[2];
for (int i = 0; i < set.length; i++) {
subset[0] = set[i];
processLargerSets(set, subset, i);
}
}
void processLargerSets(int[] set, int[] subset, int i) {
for (int j = i + 1; j < set.length; j++) {
subset[1] = set[j];
process(subset);
}
}
For subsets of arbitrary size k, note that any subset of size k can be converted to a subset of size k-1 by shredding the largest element. That is, all subsets of size k can be generated by generating all subsets of size k-1, for each subset, and for each value greater than the maximum value in the subset, add that value to the set . In code:
static void processSubsets(int[] set, int k) {
int[] subset = new int[k];
processLargerSubsets(set, subset, 0, 0);
}
static void processLargerSubsets(int[] set, int[] subset, int subsetSize, int nextIndex) {
if (subsetSize == subset.length) {
process(subset);
} else {
for (int j = nextIndex; j < set.length; j++) {
subset[subsetSize] = set[j];
processLargerSubsets(set, subset, subsetSize + 1, j + 1);
}
}
}
Test code:
static void process(int[] subset) {
System.out.println(Arrays.toString(subset));
}
public static void main(String[] args) throws Exception {
int[] set = {1,2,3,4,5};
processSubsets(set, 3);
}
But before making calls on large collections, keep in mind that the number of subsets can grow rapidly.