To find the most appropriate sets of sequences (lots of combinations)

0 like 0 dislike
62 views
There is a set of bit sequences of a fixed length (L == 1024). The number of sequences of ~43000.
Among them to find the combination of N (N == 10) such that the number of enabled bits in a bitwise logical disjunction of the result is maximized?

For Example (L = 4, N = 2)
0100
0010
0110
1000
1100


The answer is:
0110
1100
---- check which bits we have included
1110


The task is complicated by the huge volume of data. The number of possible combinations makes sad.
Is it possible to solve this problem in a reasonable time, or to rely on the case and run for a couple of days a random algorithm?

upd
Either I'm stupid or don't understand the proposed solution. Search for the maximum count of bits has no effect. Illustrate:
L =4, N = 2, for a set
1110 — 3
1100 — 2
0110 — 2
0001 — 1

if you count to bits and sort (take the top N), we get
1110
1100

and need
1110
0001

upd2
I decided to do a search of "best results":
L =5, N = 3, for a set
11100 — 3
11000 — 2
01110 — 2
00010 — 1
00001 — 1

0) m = 11111
1), we obtain the best sequence for m (11111)
result = [11100]
now m = 00011
2) get the best sequence for m(00011)
result = [11100, 01110]
now m = 00001
3), we obtain the best sequence for m(00001)
result = [11100, 01110, 00001]
now m = 00000, finished

got the answer — [11100, 01110, 00001]
but this is not the only solution (step 2 fit 01110, 00010, 00001)
if the number of source data is large, the "best" results will be many. and, of course, not the fact that there are generally combination that will cover all the bits(what I need). in nature I get a lot of results (if "best" options I will choose a random). here are some of the "best" in the end, I will choose the most suitable for me (sort of count bits covered).
by | 62 views

6 Answers

0 like 0 dislike
In upd2 the right algorithm to choose the "best" possible first, it does not affect the resulting score. (If I have understood correctly — that You are considering all options with different "best" is chosen at each step?)
by
0 like 0 dislike
And what is the problem to count the bits in the sequence length of 1024 bits? They have what is stored? I think in the end this problem will be reduced to 1024 shifts. Load them in a 32 long and run through the array.
\r
\r
long[][] myArray = new long[43000][32]; int[] bitsCount = new int[43000]; //loaded in 32 long its sequence. 32*4*8 = 1024 bits <here your code> //define the number of set bits for(int i = 0;i<43000;i++) //loop over all sequences for (int j = 0;j<32;j++) //loop for all long am for(int k = 0;k<32;k++) //loop through all the bits of the long if((myArray[i][j] >> k) & 0x1) bitsCount[i]++; //then sort the array and take bitsCount N top. Or optimize sorting so that it is not fully sorted, but only looking for the top N //??? //PROFIT! 

\r
Will not work for very long. Certainly not a couple of days. Or I task not understand?
by
0 like 0 dislike
You only need to read the file is 5 megabytes by reading pieces of 10 kbps.
You need to read each item (with 1 bit by 1024, 2 at 1025 and so forth), but if a piece really gives a very bad result, it is possible to pass from K bits, where K is the difference between the result and a TOP10 result at the moment (so that nothing is missed).
\r
In my opinion work fairly quickly, although absolutely not algoritmico.
by
0 like 0 dislike
I have this vague feeling that this is a NP-complete task. But the algebra will be able to :(
by
0 like 0 dislike
Why reject the most obvious way:
Take any combination, then turns to try every remaining sequence, whether it can improve the mix (the main parameter is the increase in coverage area, side — the number of unit bits).
The complexity is linear
by
0 like 0 dislike
There is an example for L=22:
0011 1111 1111 1100 0000 00
0000 0000 1111 1111 1111 00
0101 0101 0101 0101 0101 01
1010 1010 1010 1010 1010 10
\r
Here, a greedy algorithm will generate the pair (1,2) — 18 units, and to withdraw from it will not succeed, except the other pair (3,4) giving only 17. So my heuristic doesn't work either.
by

Related questions

0 like 0 dislike
1 answer
0 like 0 dislike
2 answers
0 like 0 dislike
1 answer
110,608 questions
257,186 answers
0 comments
28,881 users