Selecting similar records / MySQL

There is a database containing names such as:
John DOE
Familky L.
Ludwig Aristarkhovich Familky
Derzhimorda Cyril

I need to select similar records and count them using PHP/MySQL.
For example:
DOE: 3 records
Familky: 3 records
Derzhimorda: 1 record

How can this be implemented?

3 Answers

First decide how records are assigned to clusters. If "Ludwig Aristarkhovich DOE" is added to the list, where does it belong: with DOE, with Ludwig, or with both at once?
Once you have decided how the clusters are defined, you can use the Levenshtein distance to determine cluster membership.
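To make the Levenshtein idea concrete, here is a minimal sketch in Python (PHP has a built-in `levenshtein()` function that could play the same role). The helper `assign_to_clusters`, the threshold `max_distance`, and the misspelling "Familki" are illustrative assumptions, not something given in the question. The sketch takes the "both at once" policy: a record joins every cluster that one of its words is close to.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def assign_to_clusters(record: str, cluster_keys: list[str],
                       max_distance: int = 1) -> list[str]:
    """Return every cluster key that some word of the record is close to.

    max_distance is a hypothetical tolerance; tune it for real data.
    """
    words = [w.strip(".,").lower() for w in record.split()]
    return [
        key for key in cluster_keys
        if any(levenshtein(w, key.lower()) <= max_distance for w in words)
    ]


keys = ["DOE", "Ludwig", "Familky"]
print(assign_to_clusters("Ludwig Aristarkhovich DOE", keys))
print(assign_to_clusters("Familki L.", keys))
```

With this policy the first record lands in both the DOE and the Ludwig cluster, which is exactly the ambiguity that has to be resolved before counting.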
I. If you want to count how many times each individual word occurs, for example:
DOE occurs 3 times
Bob occurs 2 times, and so on, this can be done either with a stored procedure or some other brute-force approach (in PHP), or, better, by normalizing the data so that each row holds exactly one word; then you can use the GROUP BY operator.
II. If you want to count records that share the same surname (which I think is what you want), it is again better to normalize the data, along these lines:
1. Take the longest word in the string as the surname, unless it ends in "ich", "ech" or another ending characteristic of a patronymic; in that case take the next longest word in the string.
2. Take the shortest word in the string as the first name.
Store this data in a table with Name and Surname columns (or some other suitable layout); you can also keep the keys of the source records. Then use GROUP BY on the Surname column.
Something like that. Your real problem is the unnormalized data.
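The two rules and the GROUP BY step can be sketched roughly as follows. This is an illustration under assumptions: it uses Python with the standard-library `sqlite3` module as a stand-in for MySQL, and the table `person_names`, its columns, and the helper names are made up for the example. The SELECT statement itself should carry over to MySQL unchanged.

```python
import sqlite3

# Endings named in the answer; extend the tuple for real data.
PATRONYMIC_ENDINGS = ("ich", "ech")


def split_name(full_name: str) -> tuple[str, str]:
    """Guess (surname, first_name) using the length heuristic."""
    words = [w.strip(".,") for w in full_name.split()]
    words = [w for w in words if w]
    if not words:
        raise ValueError("empty name")
    # Rule 1: longest word that does not look like a patronymic.
    by_length = sorted(words, key=len, reverse=True)
    candidates = [w for w in by_length
                  if not w.lower().endswith(PATRONYMIC_ENDINGS)]
    surname = candidates[0] if candidates else by_length[0]
    # Rule 2: shortest word is taken as the first name.
    first_name = min(words, key=len)
    return surname, first_name


def count_by_surname(full_names: list[str]) -> list[tuple[str, int]]:
    """Normalize names into a table, then count rows per surname."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person_names ("
        " source_id INTEGER PRIMARY KEY, name TEXT, surname TEXT)"
    )
    rows = []
    for source_id, full_name in enumerate(full_names, start=1):
        surname, first_name = split_name(full_name)
        # Upper-case so grouping ignores letter case in SQLite as well.
        rows.append((source_id, first_name, surname.upper()))
    conn.executemany("INSERT INTO person_names VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT surname, COUNT(*) AS total FROM person_names"
        " GROUP BY surname ORDER BY total DESC, surname"
    ).fetchall()
    conn.close()
    return result


print(count_by_surname([
    "Familky L.",
    "Ludwig Aristarkhovich Familky",
    "Derzhimorda Cyril",
]))
```

Note that the heuristic is fragile: for "John DOE" it picks "John" as the surname, because "John" is the longer word. Real data would need extra rules or manual review.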
If the solution has to be lightweight and done as quickly as possible: add two fields, fill them with the name split into words (split on whitespace and trim the periods), and search them with OR ... LIKE.
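A rough sketch of this quick approach, again in Python with `sqlite3` standing in for MySQL. The table `people`, the columns `word_a` and `word_b`, and the choice to keep the first and last word when a name has more than two are all assumptions made for the example; the answer does not say how a three-word name should fit into two fields.

```python
import sqlite3


def split_words(full_name: str) -> list[str]:
    """Split on whitespace and trim periods from each word."""
    words = [w.strip(".") for w in full_name.split()]
    return [w for w in words if w]


def build_table(full_names: list[str]) -> sqlite3.Connection:
    """Create an in-memory table with two extra word columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people (id INTEGER PRIMARY KEY, full_name TEXT,"
        " word_a TEXT, word_b TEXT)"
    )
    for full_name in full_names:
        words = split_words(full_name)
        # Assumption: with only two fields, keep the first and last word.
        word_a = words[0] if words else ""
        word_b = words[-1] if len(words) > 1 else ""
        conn.execute(
            "INSERT INTO people (full_name, word_a, word_b)"
            " VALUES (?, ?, ?)",
            (full_name, word_a, word_b),
        )
    return conn


def count_matches(conn: sqlite3.Connection, pattern: str) -> int:
    """Count rows where either word field matches the LIKE pattern."""
    row = conn.execute(
        "SELECT COUNT(*) FROM people"
        " WHERE word_a LIKE ? OR word_b LIKE ?",
        (pattern, pattern),
    ).fetchone()
    return row[0]


conn = build_table([
    "John DOE",
    "Familky L.",
    "Ludwig Aristarkhovich Familky",
    "Derzhimorda Cyril",
])
print(count_matches(conn, "Famil%"))
```

The pattern is passed as a bound parameter, so `%` and `_` keep their wildcard meaning while the value itself is not spliced into the SQL string.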
