How do you full-text search an Amazon S3 bucket?

php amazon-web-services amazon-s3

One way to do this is Amazon CloudSearch, which can use S3 as a data source. It works using rapid retrieval to build an index. This should work very well, but check the pricing model thoroughly to make sure it won't be too costly for you.

The alternative is as Jack said: you'd otherwise need to transfer the files out of S3 to an EC2 instance and build a search application there.
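As a rough illustration of what a self-built search application could start from once the files are on an EC2 instance, here is a minimal in-memory inverted index. It assumes the objects have already been downloaded and decoded to plain text; the `InvertedIndex` class and its methods are hypothetical names for this sketch, not part of any AWS SDK.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy full-text index: maps each lowercase token to the keys containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    @staticmethod
    def _tokenize(text):
        # Lowercase and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, key, text):
        """Index the text of one object under its S3 key."""
        for token in self._tokenize(text):
            self._postings[token].add(key)

    def search(self, query):
        """Return the sorted keys whose text contains every token of the query."""
        tokens = self._tokenize(query)
        if not tokens:
            return []
        matches = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        return sorted(matches)


index = InvertedIndex()
index.add("docs/a.txt", "Full text search on S3")
index.add("docs/b.txt", "Search with CloudSearch")
print(index.search("search s3"))  # ['docs/a.txt']
```

A real application would more likely hand this job to an existing engine such as Lucene; the sketch only shows the basic idea of mapping terms to object keys.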


Since October 1st, 2015, Amazon has offered another search service, Amazon Elasticsearch Service (Amazon ES). Much as with CloudSearch, you can stream data into it from Amazon S3 buckets.

It works with a Lambda function: any new data sent to an S3 bucket triggers an event notification to the Lambda function, which then updates the ES index.

All the steps are detailed in the Amazon documentation, with Java and JavaScript examples.

At a high level, setting up to stream data to Amazon ES requires the following steps:

Creating an Amazon S3 bucket and an Amazon ES domain.
Creating a Lambda deployment package.
Configuring a Lambda function.
Granting authorization to stream data to Amazon ES.
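The Lambda side of these steps can be sketched in Python roughly as follows. This is a sketch under stated assumptions, not the code from the Amazon documentation: it only parses the S3 event notification and builds a body for the Elasticsearch `_bulk` API. The index name `s3-objects` and the `fetch_text` and `send_bulk` callables are hypothetical; in a real deployment they would wrap an S3 client call and a signed HTTP POST to the domain's `_bulk` endpoint. Older Elasticsearch versions may also expect a `_type` in the action line.

```python
import json
from urllib.parse import unquote_plus


def s3_objects_from_event(event):
    """Yield (bucket, key) for each record in an S3 event notification."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Keys are URL-encoded in the notification (a space arrives as '+').
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def build_bulk_body(docs, index_name="s3-objects"):
    """Build a newline-delimited JSON body for the Elasticsearch _bulk API."""
    lines = []
    for doc_id, source in docs:
        # Each document is an action line followed by its source line.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


def make_handler(fetch_text, send_bulk):
    """Return a Lambda-style handler(event, context).

    fetch_text(bucket, key) -> str and send_bulk(body) are hypothetical
    callables supplied by the caller (assumptions of this sketch).
    """
    def handler(event, context):
        docs = []
        for bucket, key in s3_objects_from_event(event):
            source = {"bucket": bucket, "key": key, "body": fetch_text(bucket, key)}
            docs.append((f"{bucket}/{key}", source))
        if docs:
            send_bulk(build_bulk_body(docs))
        return {"indexed": len(docs)}
    return handler
```

Passing the two callables in keeps the event parsing and bulk formatting testable without AWS credentials.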


Although it is not an AWS-native service, there is Mixpeek, which runs text extraction (with tools such as Tika, Tesseract and ImageAI) on your S3 files, then places the extracted text in a Lucene index to make it searchable.

You integrate it as follows:

Download the module: https://github.com/mixpeek/mixpeek-python

Import the module and your API keys:

    from mixpeek import Mixpeek, S3
    from config import mixpeek_api_key, aws

Instantiate the S3 class (which uses boto3 and requests):

    s3 = S3(
        aws_access_key_id=aws['aws_access_key_id'],
        aws_secret_access_key=aws['aws_secret_access_key'],
        region_name='us-east-2',
        mixpeek_api_key=mixpeek_api_key
    )

Upload one or more existing S3 files:

    # upload all S3 files in bucket "demo"
    s3.upload_all(bucket_name="demo")
    # upload one single file called "prescription.pdf" in bucket "demo"
    s3.upload_one(s3_file_name="prescription.pdf", bucket_name="demo")

Now simply search using the Mixpeek module:

    # mixpeek api direct
    mix = Mixpeek(
        api_key=mixpeek_api_key
    )
    # search
    result = mix.search(query="Heartgard")
    print(result)

The result can look like this:

    [
        {
            "_id": "REDACTED",
            "api_key": "REDACTED",
            "highlights": [
                {
                    "path": "document_str",
                    "score": 0.8759502172470093,
                    "texts": [
                        {
                            "type": "text",
                            "value": "Vetco Prescription\nVetcoClinics.com\n\nCustomer:\n\nAddress: Canine\n\nPhone: Australian Shepherd\n\nDate of Service: 2 Years 8 Months\n\nPrescription\nExpiration Date:\n\nWeight: 41.75\n\nSex: Female\n\n℞  "
                        },
                        {
                            "type": "hit",
                            "value": "Heartgard"
                        },
                        {
                            "type": "text",
                            "value": " Plus Green 26-50 lbs (Ivermectin 135 mcg/Pyrantel 114 mg)\n\nInstructions: Give one chewable tablet by mouth once monthly for protection against heartworms, and the treatment and\ncontrol of roundworms, and hookworms. "
                        }
                    ]
                }
            ],
            "metadata": {
                "date_inserted": "2021-10-07 03:19:23.632000",
                "filename": "prescription.pdf"
            },
            "score": 0.13313256204128265
        }
    ]

Then parse the results.
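Parsing a result shaped like the one above could look roughly like this. It is a minimal sketch that assumes only the fields visible in that sample (`highlights`, `texts`, `type`, `value`, `metadata.filename`); the `highlight_snippets` helper and its marker arguments are hypothetical and not part of the Mixpeek module.

```python
def highlight_snippets(results, mark_start="[", mark_end="]"):
    """Return (filename, snippet) pairs, wrapping each matched term in markers."""
    snippets = []
    for item in results:
        filename = item.get("metadata", {}).get("filename")
        for highlight in item.get("highlights", []):
            parts = []
            for text in highlight.get("texts", []):
                value = text["value"]
                # Entries of type "hit" are the matched search terms.
                if text["type"] == "hit":
                    value = f"{mark_start}{value}{mark_end}"
                parts.append(value)
            snippets.append((filename, "".join(parts)))
    return snippets


# Trimmed-down version of the sample result, for illustration only.
sample = [{
    "highlights": [{
        "path": "document_str",
        "texts": [
            {"type": "text", "value": "Rx "},
            {"type": "hit", "value": "Heartgard"},
            {"type": "text", "value": " Plus"},
        ],
    }],
    "metadata": {"filename": "prescription.pdf"},
}]
print(highlight_snippets(sample))  # [('prescription.pdf', 'Rx [Heartgard] Plus')]
```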
