Eliminating duplicate rows with Mysql?


Warning: count(): Parameter must be an array or an object that implements Countable in /home/styllloz/public_html/qa-theme/donut-theme/qa-donut-layer.php on line 274
0 like 0 dislike
6 views
Hello,

The task was to razdobit about 60GB string data. Unique among them, about 25-30%

Decided to use a mysql unique index for this.


Questions:
1. Unique is better to make a field with the row (1-5 words) or better take first the crc32 of this string, and the hash is to hang a unique index?

2. Is it possible to use consternee semblance of particioniranja, but not at the table level and at the DB level?

For example, to divide the data by the first letter of the string (get 28 physical bases), and at the same time to only complete one of them, thereby reducing the consumption of RAM?
by | 6 views

4 Answers

0 like 0 dislike
definitely in mysql?
if the data in the text file it is possible on the AVC:
awk '!t[$0]++' data.txt
works faster than anything else, returns the first unique string, but will require RAM or virtual memory for the entire array, and is directly proportional to the count of unique rows.
\r
States for dividing a super idea. but Pobjeda order, and also need hands to list all the options, for example the initial letter of the string.
\r
awk '/^[A-F]/ {print $0 >> file1_AF.txt } /^[G-M]/ {print $0 >> file2_GM.txt } ...'
processing each of these files, the same awk filter will require less memory.
by
0 like 0 dislike
The source data is in text files are. Subject to fill.
In General, the index for crude you can add this:
\r
ALTER IGNORE TABLE `test` ADD UNIQUE (`text`)
by
0 like 0 dislike
and, i.e. you want to create a database with a unique key and try her importnat all rows?
about particioniranja don't know, I'm such Essentials are not met
assume the hash is not recommended, especially 1-5 words and very short text,MySQL itself will cope with the task
by
0 like 0 dislike
Unfortunately, only (no)mysql, there is also razdavlivanija more advanced functionality is needed. Settled on mysql as only worked with her tight and experience an order of magnitude greater than with other DBMS.
by

Related questions

0 like 0 dislike
1 answer
0 like 0 dislike
7 answers
asked Mar 21, 2019 by mihavxc
0 like 0 dislike
5 answers
0 like 0 dislike
3 answers
0 like 0 dislike
1 answer
asked Apr 9, 2019 by ART4
110,608 questions
257,186 answers
0 comments
26,908 users