There is a good library from Facebook called fastText. I recently started studying it and I need some help.
Right now I don't quite understand how to find news items similar to a given string (say, a news headline, the way Yandex groups similar news).
If I understand correctly, the matching is done on vectors. That is, using a model (here I also don't quite understand how to train such a model with fastText), we get a vector for a sentence (the news headline).
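To make my guess concrete, here is a minimal sketch of how I imagine this step. It assumes the fastText Python bindings (the `fasttext` package) and their unsupervised mode, which, as far as I can tell, takes plain text with one sentence per line and needs no labels. The helper names `write_corpus` and `train_and_embed`, the corpus path, and the hyperparameter values are my own placeholders, not something taken from the documentation.

```python
from pathlib import Path


def write_corpus(headlines, path):
    """Write one lowercased headline per line: plain text, no labels."""
    lines = [" ".join(h.lower().split()) for h in headlines]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


def train_and_embed(corpus_path, headline):
    """Train unsupervised vectors, then return a vector for one headline.

    Needs the third-party `fasttext` package; it is imported here so that
    write_corpus stays usable without it.
    """
    import fasttext

    # dim and minCount are illustrative values, not recommendations.
    model = fasttext.train_unsupervised(
        str(corpus_path), model="skipgram", dim=100, minCount=1
    )
    # get_sentence_vector does not accept text containing a newline,
    # so the headline is normalized the same way as the corpus lines.
    return model.get_sentence_vector(" ".join(headline.lower().split()))
```

Whether this is the right way to get one vector per headline is exactly what I am unsure about.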
Next, a new news item arrives and we build a vector for it. We compute the cosine between the two vectors and get a similarity score. But in that case, with millions of news items in the database, do we have to compare the new vector against the vector of every stored item and include the ones whose score falls above a certain threshold?
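The comparison step, as I picture it, could be sketched like this. It is plain Python with made-up names (`cosine_similarity`, `find_similar`) and an arbitrary threshold of 0.8. The vectors are short toy lists standing in for real headline vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def find_similar(query_vec, stored, threshold=0.8):
    """Return (news_id, score) pairs scoring at or above the threshold.

    stored is a list of (news_id, vector) pairs. Every stored vector is
    compared with the query, so this is a linear scan.
    """
    hits = []
    for news_id, vec in stored:
        score = cosine_similarity(query_vec, vec)
        if score >= threshold:
            hits.append((news_id, score))
    # Best matches first.
    return sorted(hits, key=lambda item: item[1], reverse=True)


stored_news = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
matches = find_similar([1.0, 0.1], stored_news)
```

With these toy vectors only item `a` passes the 0.8 threshold. The full scan over every stored item is the part that worries me with millions of news items.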
In general, I need help understanding:
1) How to train such a model with fastText. From the official documentation I did not understand how the training data is prepared using labels. In my case I would end up with a huge number of labels, since there can be a great many news items. How do I add a new news item to the model? And do I need to do that at all?
2) How to do the comparison. A description of the algorithm would be enough. If you could back it up with a formula or a made-up example, I would be very grateful.
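For reference, this is the formula as I currently understand it, with a made-up example. The vectors $A$ and $B$ are invented three-dimensional values, far shorter than real headline vectors. Please correct me if this is not what is meant by comparing vectors.

```latex
\[
\cos(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
           = \frac{\sum_{i=1}^{n} A_i B_i}
                  {\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}
\]

% Made-up example: A = (1, 2, 2), B = (2, 1, 2)
\[
A \cdot B = 1 \cdot 2 + 2 \cdot 1 + 2 \cdot 2 = 8, \qquad
\lVert A \rVert = \sqrt{1 + 4 + 4} = 3, \qquad
\lVert B \rVert = \sqrt{4 + 1 + 4} = 3
\]
\[
\cos(A, B) = \frac{8}{3 \cdot 3} = \frac{8}{9} \approx 0.889
\]
```

A value close to 1 would mean the two headlines point in nearly the same direction, and a value close to 0 would mean they are unrelated.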
I would also be grateful for pointers on what to read. I don't need to go deep into the weeds: how the "magic" inside the library is implemented is something I can study on my own later. I have a little basic understanding. But the whole point of the library is to hide the complicated calculations from the user. So this is the kind of information I am hoping for: here is an article, and it explains how to get such-and-such results from such-and-such data.
Looking forward to your pointers (but please, not "just Google it").