Hi. I can't say exactly, but most likely they use a Fourier transform (FFT) and a suitable set of filters. The filters cut out noise above and below certain frequency limits (the ear hears roughly from 20 to 20,000 Hz, but I think they keep a much narrower band than that). The remaining slice is then decomposed with the FFT. You can think of this as breaking the song down into its frequency components. If there is a lot of bass, the coefficients of the lower frequencies will be larger. If there are a lot of high sounds (hi-hat), the upper components will be larger. The result is a fingerprint of the song. When you upload your recording, the server tries to find the most similar fingerprint.
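To make the idea concrete, here is a minimal sketch in Python with NumPy. It is only my illustration, not how the actual service works: the function names `fingerprint` and `best_match`, the 16 bands, the 50 to 8000 Hz range, and the cosine-similarity comparison are all assumptions, and the "songs" are just synthetic tones. A real system is almost certainly more robust than this.

```python
import numpy as np

def fingerprint(samples, sample_rate, n_bands=16, f_lo=50.0, f_hi=8000.0):
    """Summarize a slice of audio as normalized energy per frequency band.

    n_bands, f_lo and f_hi are illustrative choices, not known values.
    """
    # Taper the slice so its edges do not smear energy across the spectrum.
    windowed = samples * np.hanning(len(samples))
    # Power of each frequency component of the real-valued signal.
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    # Crude "filter": keep only f_lo..f_hi, split into equal-width bands.
    edges = np.linspace(f_lo, f_hi, n_bands + 1)
    bands = np.array([
        power[(freqs >= lo) & (freqs < hi)].sum()
        for lo, hi in zip(edges[:-1], edges[1:])
    ])
    norm = np.linalg.norm(bands)
    return bands / norm if norm > 0 else bands

def best_match(query, database):
    """Return the name whose stored fingerprint is most similar to query.

    Fingerprints are unit vectors, so the dot product is cosine similarity.
    """
    return max(database, key=lambda name: float(np.dot(query, database[name])))

# Synthetic stand-ins for songs: a low "bass" tone and a high "hi-hat" tone.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate  # one second of audio
database = {
    "bass": fingerprint(np.sin(2 * np.pi * 100 * t), sample_rate),
    "hihat": fingerprint(np.sin(2 * np.pi * 6000 * t), sample_rate),
}

# A noisy "recording" of the bass tone, as a user might upload it.
rng = np.random.default_rng(0)
recording = np.sin(2 * np.pi * 100 * t) + 0.05 * rng.standard_normal(t.size)
query = fingerprint(recording, sample_rate)
print(best_match(query, database))
```

The bass tone puts almost all of its energy in the lowest band and the hi-hat tone in one of the upper bands, so the noisy recording still lands closest to the right entry.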
On top of that, there are certainly trained AI and data-mining algorithms to make the search as effective as possible. The simplest example is an artificial neural network (the details are all on the wiki).