Ingest email attachments on ElasticSearch
In the end I defined a totally different pipeline. I read emails using a Ruby application with the mail library (you can find it on GitHub), where it's quite easy to extract attachments. Then I put the Base64 encoding of those attachments directly into Elasticsearch, using the Ingest Attachment Processor.
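As a rough sketch of the indexing step, the snippet below only builds the two JSON bodies involved: a pipeline definition that uses the attachment processor, and a document carrying the Base64 payload. The pipeline name `attachments`, the field name `data`, and the index name `emails` are assumptions made for illustration, not names from my setup. Sending the bodies is left to whatever HTTP or Elasticsearch client you prefer.

```ruby
require 'base64'
require 'json'

# Hypothetical names chosen for this example.
PIPELINE_ID = 'attachments'
SOURCE_FIELD = 'data'

# Body for PUT _ingest/pipeline/<PIPELINE_ID>: the attachment processor
# reads Base64 content from SOURCE_FIELD and extracts text and metadata.
def pipeline_body
  {
    description: 'Extract attachment information',
    processors: [
      { attachment: { field: SOURCE_FIELD } }
    ]
  }
end

# Body for indexing one document through that pipeline,
# e.g. PUT emails/_doc/<id>?pipeline=<PIPELINE_ID> (index name assumed).
def document_body(raw_bytes, filename)
  {
    filename: filename,
    SOURCE_FIELD => Base64.strict_encode64(raw_bytes)
  }
end

puts JSON.pretty_generate(pipeline_body)
puts JSON.generate(document_body('hello', 'note.txt'))
```

The processor expects the field to hold Base64 text, which is why the raw bytes are encoded before the document is built.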
I filter on content_type just to be sure to load only "real" attachments, since multipart emails treat any multimedia content in the body (e.g. images) as an attachment.
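A minimal sketch of such a filter, assuming an allow-list approach: `Part` is a stand-in for the attachment objects of the mail library (only `filename` and `content_type` are modelled), and `ALLOWED_TYPES` is a hypothetical list you would adapt to the document types you want to index.

```ruby
# Stand-in for the objects returned by message.attachments in the mail
# library; only the two readers used below are modelled.
Part = Struct.new(:filename, :content_type)

# Hypothetical allow-list; adjust to the types you actually want indexed.
ALLOWED_TYPES = %w[application/pdf application/msword text/plain].freeze

# Keep a part only if its media type (the text before any ';' parameters)
# is on the allow-list. Inline images such as image/png are dropped.
def real_attachments(parts)
  parts.select do |part|
    media_type = part.content_type.to_s.split(';').first.to_s.strip.downcase
    ALLOWED_TYPES.include?(media_type)
  end
end

parts = [
  Part.new('report.pdf', 'application/pdf; name=report.pdf'),
  Part.new('logo.png', 'image/png; name=logo.png')
]
puts real_attachments(parts).map(&:filename).inspect
```

Comparing only the part before the first `;` matters because the header usually carries parameters such as `name=` after the media type.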
P.S.
Using the mail library, you should do something like:
Mail.defaults do
  retriever_method :imap, { :address => address, :port => port, :user_name => user_name, :password => password, :enable_ssl => enable_ssl, :openssl_verify_mode => openssl_verify_mode }
end
and then new_messages = Mail.find(keys: ['NOT','SEEN']) to retrieve unseen messages.
Then iterate over new_messages and over each message's attachments. After that, you can encode an attachment simply using encoded = Base64.strict_encode64(attachment.body.to_s) (this needs require 'base64'). Please inspect new_messages to check the exact field names to use.
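The iteration could look roughly like the sketch below. `FakeMessage` and `FakeAttachment` are hypothetical stand-ins so the snippet runs without a mailbox; with the real library you would pass the result of Mail.find instead. The hash keys `subject`, `filename` and `data` are assumptions about how you want the documents shaped.

```ruby
require 'base64'

# Stand-ins for the mail library's message and attachment objects.
FakeAttachment = Struct.new(:filename, :body)
FakeMessage = Struct.new(:subject, :attachments)

# Returns one hash per attachment with its Base64-encoded content.
def encode_attachments(messages)
  messages.flat_map do |message|
    message.attachments.map do |attachment|
      {
        subject: message.subject,
        filename: attachment.filename,
        data: Base64.strict_encode64(attachment.body.to_s)
      }
    end
  end
end

messages = [FakeMessage.new('Invoice', [FakeAttachment.new('a.txt', 'hello')])]
encode_attachments(messages).each { |doc| puts doc.inspect }
```

strict_encode64 is used rather than encode64 because it emits no line breaks, so the result can go into a JSON field as a single string.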
Your problem might come from strip_attachments => true in the Logstash imap input plugin, which removes the attachments before the event is emitted.