How to make search results faster than Google?


Results: about 104 000 000 (0.37 seconds.)
Even with a local cache this is very hard to achieve, and here there is also an HTTP request and a huge number of results. Windows 10 searches for a file on the local system hundreds of times slower.
The search also has to handle fuzzy matching and return recommendations in addition to the matches already found.

How is this speed achieved?
How can I make it faster?

4 Answers

0 like 0 dislike
The key word is "about": the result count is an estimate, not an exact count of every match.

Windows 10 looks for the file in the system hundreds of times slower

1. Don't compare the performance of your own desktop machine with Google's servers
2. Don't compare a general-purpose OS with a system designed specifically for search
3. They use neural networks: https://neurohive.io/ru/papers/mnasnet-avtomatizac...

How is this speed achieved?

A lot of money + a lot of specialists

But speed is not the important part; other figures matter more. For example, in 2011 Google had indexed about 1 trillion unique URLs. By now it is probably orders of magnitude more.

More reading on the topic: https://www.insight-it.ru/highload/2011/arkhitektu...

There is no single technology with a famous name behind it. Don't expect an answer like "So they use SPSQL + FastFRS server (and 1)". Everything is much more complicated, and the details are not public. Only general, outdated information is available.
0 like 0 dislike
It is implemented quite simply.
Read, for example:
Oleg Bartunov, Alexander Korotkov
GIN Improvements
Full-text search in PostgreSQL in milliseconds
PGConf.EU-2012, Prague
https://wiki.postgresql.org/images/2/25/Full-text_...

Google's achievement is not speed but the adequacy of the results, which is called "relevance".
Smart neural networks and all that.

For speed alone, ordinary FTS (full-text search) is good enough.
The algorithm is primitive; you could even implement it over a weekend as a warm-up.

Or just use a ready-made, very fast solution:
sphinxsearch.com

Search is slow on your local computer because it is not the machine's main function.
If the developers considered search a function of paramount importance, they would simply allocate more resources to indexing and index storage, more RAM to caching, and so on.
But that would mean taking resources away from more important functions of the computer.

The FTS algorithm:

Preparation (indexing):

1) Split the text into words
2) Discard function words (prepositions, etc.). What remains are the so-called tokens.
3) Run the resulting tokens through a stemming algorithm: snowball.tartarus.org/algorithms/russian/stemmer.html
4) Put the resulting words with their endings removed (called terms) into a bitmap such as roaringbitmap.org
It will look like this:

The source objects for the search:

a) "Hello, bear"
b) "bears' strength"

a) -> "hello", "bear" -> "hello", "bear"
b) -> "bears", "strength" -> "bear", "strength"

In the index it looks like this (one bit per source object, a first, then b):

"hello" 10
"bear" 11
"strength" 01
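The preparation steps can be sketched roughly as follows. This is a minimal illustration, not a production indexer: the stop-word list, the suffix rules in `stem`, and the helper names (`tokenize`, `build_index`, `show`) are all hypothetical, plain Python integers stand in for a compressed bitmap library, and English stand-ins are used for the two example sentences.

```python
import re

# Hypothetical stop-word list and suffix rules, for illustration only;
# a real system would use a Snowball stemmer and a full stop-word list.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "and"}
SUFFIXES = ("'s", "s'", "s")


def tokenize(text):
    """Split text into lowercase word tokens, dropping stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def stem(token):
    """Toy stemmer: strip one known suffix (a stand-in for Snowball)."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def build_index(docs):
    """Map each term to an int bitmap; bit i is set if docs[i] has the term."""
    index = {}
    for doc_id, text in enumerate(docs):
        for token in tokenize(text):
            term = stem(token)
            index[term] = index.get(term, 0) | (1 << doc_id)
    return index


def show(bitmap, n_docs):
    """Render a bitmap as a string with the first document leftmost."""
    return "".join("1" if bitmap >> i & 1 else "0" for i in range(n_docs))


docs = ["Hello, bear", "bears' strength"]
index = build_index(docs)
for term, bitmap in index.items():
    print(term, show(bitmap, len(docs)))
```

Each term maps to one integer, so adding a document only sets one more bit per term it contains.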

A search for the word "bear":

1) Look it up in the index and get 11, which means the word we are interested in occurs in both the first and the second sentence.

2) Sort the results by relevance:
nlpx.net/archives/57 or https://ru.wikipedia.org/wiki/Okapi_BM25
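As a rough idea of what sorting by relevance involves, here is a small sketch of Okapi BM25 scoring. The function name `bm25_scores`, the parameter defaults `k1=1.5` and `b=0.75`, and the smoothed IDF variant are assumptions chosen for illustration; documents are assumed to be already tokenized and stemmed.

```python
import math


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    scores = []
    for doc in docs:
        score = 0.0
        for term in query_terms:
            df = sum(1 for d in docs if term in d)  # documents containing term
            if df == 0:
                continue
            # Smoothed IDF variant, chosen here so the value stays positive.
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            tf = doc.count(term)  # term frequency in this document
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# The two example sentences as lists of terms.
docs = [["hello", "bear"], ["bear", "strength"]]
scores = bm25_scores(["hello", "bear"], docs)
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranked)
```

For the query "hello bear" the first sentence ranks higher, because it matches both terms and "hello" is the rarer one.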

Search for the phrase "Hello, bear":

1) Look up the first word, get 10
2) Look up the second word, get 11
3) Intersect the two results (bitwise AND), get 10
4) Sort by relevance
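The lookup and intersection steps can be sketched like this, assuming the index is held as a dictionary of integer bitmaps. The names `INDEX`, `N_DOCS`, and `search` are hypothetical. Note that this returns documents containing all of the words, not the exact phrase; true phrase matching would also need word positions.

```python
# Index from the example: bit i is set if document i contains the term
# (document 0 is sentence a, document 1 is sentence b). The text writes
# bitmaps with the first document leftmost, so "10" there is 0b01 here.
INDEX = {"hello": 0b01, "bear": 0b11, "strength": 0b10}
N_DOCS = 2


def search(terms):
    """Return ids of documents that contain every term (bitwise AND)."""
    result = (1 << N_DOCS) - 1  # start with all documents selected
    for term in terms:
        result &= INDEX.get(term, 0)  # an unknown term matches nothing
    return [i for i in range(N_DOCS) if result >> i & 1]


print(search(["bear"]))           # both sentences
print(search(["hello", "bear"]))  # only sentence a
```

Because each step is a single AND over machine words, the cost depends on the number of query terms rather than on the amount of text indexed.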

It is easy to see that:

a) The stemming algorithm can make mistakes
b) Relevance is computed purely mechanically

But speed is still not a problem.

On a local computer, search is simply not the main focus.
Making local search fast is not a problem.
0 like 0 dislike
As if anyone would tell you everything, right? :) There is optimization at every step, from the CDN and the application code to data indexing and parallel queries to databases.
0 like 0 dislike
The word "about" is the trick. The page count for each query is continuously updated, and the first few pages of results are prebuilt as static HTML, so the answer to your query is not pulled from storage but served from what has already been generated.
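The general idea of serving an already generated page can be illustrated with a small in-memory cache. This is only a sketch of the caching principle, not a description of how any real search engine does it; `run_query`, `first_page`, and the `CALLS` counter are hypothetical names.

```python
from functools import lru_cache

CALLS = {"backend": 0}  # counts how often the slow path actually runs


def run_query(query):
    """Hypothetical slow backend lookup (a stand-in for a real index query)."""
    CALLS["backend"] += 1
    return (f"result 1 for {query}", f"result 2 for {query}")


@lru_cache(maxsize=1024)
def first_page(query):
    """Return the first results page; built once per distinct query."""
    results = run_query(query)
    return "\n".join(results)


first_page("bear")  # built by the backend
first_page("bear")  # served from the cache, backend not called again
print(CALLS["backend"])
```

Repeated popular queries then cost a dictionary lookup instead of a full search.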