Friday, October 29, 2010

Migrating from Xapian to ElasticSearch

We need full-text search capabilities in our frontend WWW interface to allow users to search through logs sent by embedded devices in the field. Quite recently, we changed the search backend from Xapian to ElasticSearch.


Xapian is an open source, GPL-licensed C++ library that implements a rich set of features for indexing any type of document, and for searching and ranking the results. An application that uses Xapian embeds it by linking to the C++ library. There's no server involved whatsoever, unless your application itself is a server.

ElasticSearch is an open source, Apache-licensed Java application that implements a server that performs indexing and searching of JSON documents. It's built on top of Lucene, a popular Java library used by many higher-level search engines. ElasticSearch has an HTTP REST API as well as a higher-performance Thrift API, and its query DSL provides rich searching capabilities.

From day one, our log indexing service has indexed JSON documents. Log parsers output JSON documents, and mappings can be used to convert specific fields to forms understood by Xapian. I think the system (not invented by me!) is quite clever in how it uses Xapian to index JSON documents.

Problems with Xapian

As more and more logs started to flow, we started facing problems with Xapian.

First, we had trouble scaling indexing. Xapian's database is a bunch of files, and only one process is allowed to write to the database at a time. We wanted good durability, so the database was flushed often to avoid losing any data. Due to how the communication between the client (that sends logs) and the server (that submits them to indexing) works, we couldn't index a large batch of documents and then flush the database once. So as the amount of incoming logs started to grow, indexing fell behind at times.

The next problem was search performance. As our log database hit about 10 million entries, searches on a single device's logs were taking many seconds to complete. Searching through the logs of all devices took minutes, even when the results were limited to a few lines. The situation was worsened by the fact that flushing a Xapian database invalidates all ongoing search operations, so they have to be restarted. And our indexer flushed often.

Our setup had a single node and a single database. I believe that with some refactoring, splitting databases, adding nodes, etc. we could have done better with Xapian. It would just have been too much trouble, as we would have had to build clustering and scaling all by ourselves. At about 19 million log entries we decided to do something about it.

Meet ElasticSearch

We started to look for alternatives and found ElasticSearch. It was amazing how perfectly it seemed to fit our needs. It uses JSON as the native document format, and its mapping capabilities and JSON-based search language were built in the same spirit as those of our Xapian-based system.

So I started playing with it.

ElasticSearch was ridiculously simple to get running. Just download the binaries and start one shell script. It was up and running in 5 minutes, with zero configuration. Once I got a grip on the mapping system, it was easy to make the same fields searchable in the same way as we had done with Xapian. What needed the most work was changing from building Xapian-type queries to ElasticSearch ones. But in the end, this wasn't such a big deal either, as our own query language was also based on JSON.
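To give an idea of what a mapping looks like, here is a minimal sketch in Python. The type name `log` and the fields `device_id`, `timestamp` and `message` are made up for illustration and are not our real schema. The `string` / `not_analyzed` syntax is the one used by ElasticSearch releases of this era; newer releases use `keyword` and `text` types instead.

```python
import json


def build_log_mapping(doc_type="log"):
    """Build a mapping body for a hypothetical log document type.

    All field names here are illustrative assumptions.
    """
    return {
        doc_type: {
            "properties": {
                # Exact-match field: stored as a single term, not tokenized.
                "device_id": {"type": "string", "index": "not_analyzed"},
                # Parsed as a date so it can be sorted and range-filtered.
                "timestamp": {"type": "date"},
                # Free text: analyzed for full-text search (the default).
                "message": {"type": "string"},
            }
        }
    }


if __name__ == "__main__":
    # The JSON printed here would be sent to the index's mapping endpoint.
    print(json.dumps(build_log_mapping(), indent=2))
```

The point is that the choice between exact-match and full-text behaviour is made per field, which is roughly what our Xapian field mappings did as well.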

After these issues were solved came the fun part: moving our logs from Xapian to ElasticSearch. I wrote a small Python script that iterated through all the documents in our 16GB Xapian database, made minor modifications to them and used the ElasticSearch bulk API to index a few thousand in each request. The process took a few hours to complete, and after it was done, it was time to see what had happened to the performance.
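The batching part of such a script can be sketched roughly as follows. This is not the actual migration script: the index name `logs`, the type name `log` and the document contents are assumptions, and reading from Xapian and sending the HTTP request are left out. The bulk API takes newline-delimited JSON, with one action line followed by one source line per document and a trailing newline at the end.

```python
import json


def chunked(docs, size):
    """Yield lists of at most `size` items from any iterable."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_body(batch, index, doc_type):
    """Build the request body for one bulk request.

    `batch` is a list of (doc_id, source_dict) pairs. `index` and
    `doc_type` are hypothetical names chosen by the caller; newer
    ElasticSearch releases no longer use the `_type` field.
    """
    lines = []
    for doc_id, source in batch:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk endpoint requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = ((str(i), {"message": "line %d" % i}) for i in range(5))
    for batch in chunked(docs, 2):
        # Each body would be POSTed to the /_bulk endpoint.
        print(bulk_body(batch, "logs", "log"))
```

Sending a few thousand documents per request keeps the per-request overhead low without building one enormous body in memory.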

ElasticSearch is fast

Our first ElasticSearch node had one CPU and 2GB of memory, of which 1GB was dedicated to ElasticSearch. And searching was blazingly fast. After getting used to waiting 10-15 seconds for the 1000 most recent log entries of a single client with Xapian, ElasticSearch returned the results in 5 seconds. When I pressed the search button again, I got the results (with a few new lines) in less than a second. This was amazing.
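For reference, a search like "the 1000 most recent log entries of a single client" can be expressed in the query DSL along these lines. This is only a sketch: the field names `device_id` and `timestamp` are assumptions, not our real field names.

```python
import json


def recent_logs_query(device_id, limit=1000):
    """Build a search body for the newest entries of one device.

    `device_id` and `timestamp` are hypothetical field names.
    """
    return {
        # Exact match on the device identifier.
        "query": {"term": {"device_id": device_id}},
        # Newest entries first.
        "sort": [{"timestamp": {"order": "desc"}}],
        # Return at most `limit` hits.
        "size": limit,
    }


if __name__ == "__main__":
    # The JSON printed here would be sent to the index's _search endpoint.
    print(json.dumps(recent_logs_query("device-42"), indent=2))
```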

The log indexer performed a lot better too. Before, catching up on 2000 pending indexing jobs took an hour to complete. Now it took 2 minutes.

ElasticSearch is bonsai cool

All the worries about scaling are gone. If speed becomes an issue, we can start a new node or three, and let ElasticSearch work out load balancing behind the scenes.

But we're nowhere near requiring more performance. Currently, in our testing environment we still have a single ElasticSearch node, but I reduced the memory limit of ElasticSearch to 512 MB. Nothing changed in terms of speed even though the available memory was cut in half.

We're really happy with ElasticSearch and would never change back to our old system. Because of the speed, we're now able to enhance the log searching user experience. We plan to implement polling for new entries from the log browser, fetching more lines dynamically when the user scrolls the window, and more.
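Polling for new entries could, for example, be done by asking only for entries newer than the last one the browser has already seen. The following is a rough sketch of such a query and not something we have implemented yet; the field names `device_id` and `timestamp` and the page size are assumptions.

```python
import json


def new_entries_query(device_id, last_seen, limit=100):
    """Build a search body for entries newer than `last_seen`.

    `last_seen` is the timestamp of the newest entry already shown.
    Field names and the default limit are hypothetical.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    # Only this device's logs.
                    {"term": {"device_id": device_id}},
                    # Strictly newer than what the browser already has.
                    {"range": {"timestamp": {"gt": last_seen}}},
                ]
            }
        },
        # Oldest first, so new lines can be appended in order.
        "sort": [{"timestamp": {"order": "asc"}}],
        "size": limit,
    }


if __name__ == "__main__":
    print(json.dumps(new_entries_query("device-42", "2010-10-29T12:00:00"), indent=2))
```

The same idea works in the other direction for fetching older lines on scroll, by using `lt` with the oldest timestamp shown and sorting in descending order.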

As its website states, ElasticSearch really is bonsai cool.