Please suggest a string comparison algorithm that works like this:
'Ivan Ivanovich Ivanov' = 'Ivanov Ivan Ivanovich'
'Ivan Ivanovich' ~ 'Ivanov Ivanovitch'
'Ivan Ivanovich Ivanov in the morning goes without pants' != 'Ivanov Ivan Ivanovich dress pants at night'
That is, I need to find a similarity coefficient for two strings, taking into account that the words in a string can be reordered.
UPD: I think I have come up with something:
a is the array of words of the first string
b is the array of words of the second string
n is the number of words in the first string
m is the number of words in the second string
Cij is the similarity coefficient of the words a[i] and b[j] (you can use soundex or Levenshtein distance)
K = (C11 + C12 + ... + C1m + C21 + C22 + ... + C2m + ... + Cnm) / ((n + m) / 2)
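One possible way to get Cij from the Levenshtein distance is to normalize the distance by the length of the longer word. This is only a sketch: the function names levenshtein and word_similarity are made up for illustration, and the normalization is an assumption, not something fixed by the formula above.

```python
def levenshtein(s: str, t: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def word_similarity(x: str, y: str) -> float:
    """Hypothetical Cij: 1.0 for identical words, lower as more edits are needed."""
    longest = max(len(x), len(y))
    if longest == 0:
        return 1.0  # assumption: two empty words count as identical
    return 1.0 - levenshtein(x, y) / longest
```

With this choice, word_similarity('Ivanovich', 'Ivanovitch') is 1 - 1/10, since the two words differ by a single inserted letter.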
For instance, suppose Cij is calculated as
a[i] == b[j] ? 1 : 0
a = ['Ivan', 'Ivanovich', 'Ivanov']
b = ['Ivanov', 'Ivan', 'Ivanovich']
K = (0 + 1 + 0 + 0 + 0 + 1 + 1 + 0 + 0) / ((3 + 3) / 2) = 3 / 3 = 1 — strings are the same
a = ['Ivan', 'Ivanovich']
b = ['Ivanov', 'Ivanovich']
K = (0 + 0 + 0 + 1) / ((2 + 2) / 2) = 1 / 2 = 0.5 — similar, but not equal
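The calculation of K could be sketched in Python roughly as follows. The names similarity, exact_match and word_sim are made up for illustration, splitting on whitespace is an assumption about what counts as a word, and the exact-match coefficient is used by default.

```python
from typing import Callable


def exact_match(x: str, y: str) -> float:
    """Cij for the simplest case: 1 if the words are equal, 0 otherwise."""
    return 1.0 if x == y else 0.0


def similarity(first: str, second: str,
               word_sim: Callable[[str, str], float] = exact_match) -> float:
    """K = (sum of Cij over all word pairs) / ((n + m) / 2)."""
    a = first.split()   # words of the first string
    b = second.split()  # words of the second string
    if not a and not b:
        return 1.0  # assumption: two empty strings count as identical
    total = sum(word_sim(x, y) for x in a for y in b)
    return total / ((len(a) + len(b)) / 2)


print(similarity('Ivan Ivanovich Ivanov', 'Ivanov Ivan Ivanovich'))  # 1.0
print(similarity('Ivan Ivanovich', 'Ivanov Ivanovich'))              # 0.5
```

Because every pair of words is summed, repeated words can push K above 1 (for example 'Ivan Ivan' against 'Ivan Ivan' gives 4 / 2 = 2), and the same can happen with a fuzzy Cij, so the result may need clamping.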
Thanks to hamMElion for reminding me to split the strings into words %)