How to index source code with ElasticSearch


Interesting question, but I'm not aware of an out-of-the-box solution. You can use the WordDelimiter token filter, where you can specify, for example, that the underscore is handled as a digit; identifiers like hello_world (or helloWorld, if splitting on case changes is enabled) are then searchable via hello or world.
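As a rough sketch of that idea, the index settings could look something like the following. The names code_word_delimiter and code_analyzer are made up for illustration, and the sketch assumes the filter's default behaviour of treating the underscore as a delimiter, so no custom type table is shown. The approximate_subwords helper is only a pure-Python approximation of the expected tokens, not the filter's actual implementation.

```python
import re


def build_code_index_settings():
    """Return index settings with an analyzer that splits identifiers into sub-words.

    The filter and analyzer names are hypothetical; the dict is meant to be
    sent as the request body when creating the index.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "code_word_delimiter": {
                        "type": "word_delimiter",
                        # split helloWorld into hello + World
                        "split_on_case_change": True,
                        # also keep the unsplit identifier as a token
                        "preserve_original": True,
                    }
                },
                "analyzer": {
                    "code_analyzer": {
                        "type": "custom",
                        # whitespace tokenizer leaves identifiers intact for the filter
                        "tokenizer": "whitespace",
                        "filter": ["code_word_delimiter", "lowercase"],
                    }
                },
            }
        }
    }


def approximate_subwords(identifier):
    """Approximate the sub-words the analyzer is expected to emit (illustration only)."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", identifier)
    return [part.lower() for part in parts]
```

With this, a search for hello or world should match a document containing hello_world or helloWorld, provided the field is mapped to use the analyzer.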

But I doubt that the results will be sufficient. You will probably have to implement a source code analyzer yourself, or use code that extracts the syntax tree, so that method names and bodies can be indexed into different fields.
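To illustrate the syntax-tree approach, here is a minimal sketch that uses Python's standard ast module to turn a Python source file into one document per function, with the name and the body in separate fields. The field names (path, name, line, body) and the function extract_functions are assumptions chosen for this example; other languages would need their own parser.

```python
import ast


def extract_functions(source, path="<unknown>"):
    """Return one dict per function or method found in the given Python source.

    Each dict is a candidate document: the name and the body go into
    separate fields so they can be analyzed and boosted differently.
    """
    tree = ast.parse(source)
    docs = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            docs.append({
                "path": path,
                "name": node.name,
                "line": node.lineno,
                # full source text of the definition (requires Python 3.8+)
                "body": ast.get_source_segment(source, node),
            })
    return docs
```

Each returned dict can then be indexed as its own document, so a query can target the name field separately from the body field.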


You can use the attachment type plugin to load the files into Elasticsearch and let it index them. It handles the files' metadata and indexes their content.

The plugin's GitHub page includes information on how to highlight matches in the indexed documents.
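As a rough sketch of what feeding a file to the attachment type involves: the plugin expects the file content as a base64-encoded string in a field mapped with type attachment. The field names file and path and the helper build_attachment_doc are assumptions for illustration, and the exact mapping envelope depends on the Elasticsearch and plugin version in use.

```python
import base64

# Mapping fragment (assumed field name "file"); the surrounding structure
# depends on the Elasticsearch version.
ATTACHMENT_MAPPING = {
    "properties": {
        "file": {"type": "attachment"}
    }
}


def build_attachment_doc(raw_bytes, path):
    """Build a document whose "file" field carries the base64-encoded content."""
    return {
        "path": path,
        "file": base64.b64encode(raw_bytes).decode("ascii"),
    }
```

The resulting dict would be sent as the document body when indexing; the plugin then extracts the text and metadata on the server side.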


Unless you want to expose this as a service to somebody, I would recommend installing the InstaSearch plugin in Eclipse; it creates a Lucene index and gives you instantaneous results.