
How to index a PDF in Elasticsearch 6.1 with ingest-attachment plugin & JavaScript Client?


I found the answer to my problem:

Elasticsearch does not read the file from a path on its own, so

curl -H 'Content-Type: application/json' -XPUT 'localhost:9200/my_index/my_type/id?pipeline=my-pipeline-id' -d '{ "pdf" : @/base64_encoded_file }'

won't work. The field named in the attachment processor's "field" option (in my example, "pdf") must contain the data itself, not a file path. This thread explains three options for sending [PDF] content to Elasticsearch:

  1. You can extract the content [from the PDF] and just send what you want to index to Elasticsearch.
  2. You can send the Base64-encoded binary to Elasticsearch ingest, which will do the extraction.
  3. You can send the binary to FSCrawler, which will do the extraction before sending to Elasticsearch.
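
For option 2, the pipeline referenced as `my-pipeline-id` needs an attachment processor whose "field" option names the document field that carries the Base64 data. A minimal sketch of such a pipeline body follows; the helper name `buildAttachmentPipeline` and the description text are made up for illustration, and the commented `client.ingest.putPipeline` call assumes the legacy `elasticsearch` npm client.

```javascript
// Hypothetical helper: returns the body for PUT _ingest/pipeline/my-pipeline-id.
// `field` is the name of the document field that holds the Base64 data.
function buildAttachmentPipeline(field) {
  return {
    description: 'Extract text from a Base64-encoded document',
    processors: [
      { attachment: { field: field } }
    ]
  };
}

const pipelineBody = buildAttachmentPipeline('pdf');
console.log(JSON.stringify(pipelineBody, null, 2));

// Assumption: with the legacy `elasticsearch` npm client this could be registered as
// client.ingest.putPipeline({ id: 'my-pipeline-id', body: pipelineBody });
```

The field name passed here ("pdf") has to match the key used in the indexed document, otherwise the processor has nothing to extract.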

In short, the field sent to Elasticsearch must contain the Base64-encoded content itself, as defined in the ingest-attachment documentation:

curl -H 'Content-Type: application/json' -XPUT 'localhost:9200/my_index/my_type/id?pipeline=my-pipeline-id' -d '{ "pdf" : "base64_encoded_data" }'