I use libxml2 for parsing HTML. In general I'm satisfied with it, but I'd like something faster.
I looked at some open-source search engines (Xapian, Dataparksearch), and they have their own parsers. I'm not yet ready to dig into their source code and adapt it to my needs, although I'm getting close to that point.
Does anyone know of other open-source parsers that are lighter and faster than libxml2? Neither Google nor Yandex could help. Maybe I was searching with the wrong terms.