Reading a JSON file from S3 using Python boto3
As mentioned in the comments above, repr has to be removed and the JSON file has to use double quotes for keys and string values. Using this file on AWS S3:
{ "Details" : "Something"}
together with the following Python code, the value is read correctly:
import boto3
import json

s3 = boto3.resource('s3')
content_object = s3.Object('test', 'sample_json.txt')  # bucket 'test', key 'sample_json.txt'
file_content = content_object.get()['Body'].read().decode('utf-8')
json_content = json.loads(file_content)
print(json_content['Details'])
# >> Something
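The quoting rule is easy to check locally without S3. Assuming the original file was produced with repr() of a Python dict, this stdlib-only sketch compares that output with json.dumps() and shows which one json.loads accepts (try_parse is a hypothetical helper written only for illustration):

```python
import json

# A Python dict like the one that ended up in the file.
record = {"Details": "Something"}

# repr() produces Python literal syntax with single quotes: {'Details': 'Something'}
bad_text = repr(record)

# json.dumps() produces valid JSON with double quotes: {"Details": "Something"}
good_text = json.dumps(record)


def try_parse(text):
    """Return the parsed object, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


print(bad_text, "->", try_parse(bad_text))    # single quotes are rejected -> None
print(good_text, "->", try_parse(good_text))  # parses back into a dict
```

So when writing the file that gets uploaded to S3, json.dumps (or json.dump) is the safe choice rather than repr or str.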
The following worked for me.
# read_s3.py
import json
import boto3

BUCKET = 'MY_S3_BUCKET_NAME'
FILE_TO_READ = 'FOLDER_PATH/my_file.json'

client = boto3.client('s3',
                      aws_access_key_id='MY_AWS_KEY_ID',
                      aws_secret_access_key='MY_AWS_SECRET_ACCESS_KEY'
                      )
result = client.get_object(Bucket=BUCKET, Key=FILE_TO_READ)
text = result["Body"].read().decode()
json_data = json.loads(text)  # parse the string first; indexing the raw str raises TypeError
print(json_data['Details'])  # Use your desired JSON key for your value
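The body returned by get_object yields bytes; after .decode() you have a str, which still has to go through json.loads before a key can be looked up (indexing the str directly raises TypeError). Here is a minimal sketch of that read, decode and parse step as a helper; read_json_body is a hypothetical name, and io.BytesIO stands in for result["Body"] so the example runs without AWS access:

```python
import io
import json


def read_json_body(body, encoding="utf-8"):
    """Hypothetical helper: read a file-like body (such as the one returned
    by get_object) and parse its contents as JSON."""
    text = body.read().decode(encoding)  # bytes -> str
    return json.loads(text)              # str -> dict / list


# io.BytesIO is only a stand-in for result["Body"] in this sketch.
fake_body = io.BytesIO(b'{ "Details" : "Something"}')
data = read_json_body(fake_body)
print(data["Details"])  # Something
```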
It is not a good idea to hard-code the AWS access key ID and secret access key directly. As a better practice, consider either of the following:
(1) Read your AWS credentials from a JSON file stored locally:
import json
import boto3

with open('aws_cred.json') as f:  # closes the file after reading
    credentials = json.load(f)

client = boto3.client('s3',
                      aws_access_key_id=credentials['MY_AWS_KEY_ID'],
                      aws_secret_access_key=credentials['MY_AWS_SECRET_ACCESS_KEY']
                      )
(2) Read them from environment variables (my preferred option for deployment):
import os
import boto3

client = boto3.client('s3',
                      aws_access_key_id=os.environ['MY_AWS_KEY_ID'],
                      aws_secret_access_key=os.environ['MY_AWS_SECRET_ACCESS_KEY']
                      )
Let's prepare a shell script (set_env.sh) that sets the environment variables and then runs our Python script (read_s3.py):
# set_env.sh
export MY_AWS_KEY_ID='YOUR_AWS_ACCESS_KEY_ID'
export MY_AWS_SECRET_ACCESS_KEY='YOUR_AWS_SECRET_ACCESS_KEY'
# execute the python file containing your code as stated above that reads from s3
python read_s3.py # will execute the python script to read from s3
Now execute the shell script in a terminal as follows:
sh set_env.sh
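Both credential options can be wrapped in small functions that return keyword arguments for boto3.client. This is a minimal sketch, not part of the original answers: the helper names are hypothetical, the variable and key names follow the examples above, and placeholder values are used so it runs without real AWS keys or boto3 installed:

```python
import json
import os
import tempfile

# Names used in the examples above (assumed layout of aws_cred.json and the env vars).
ID_NAME = "MY_AWS_KEY_ID"
SECRET_NAME = "MY_AWS_SECRET_ACCESS_KEY"


def credentials_from_file(path):
    """Option (1), hypothetical helper: read the keys from a local JSON file."""
    with open(path, encoding="utf-8") as f:  # the with-block closes the file
        creds = json.load(f)
    return {"aws_access_key_id": creds[ID_NAME],
            "aws_secret_access_key": creds[SECRET_NAME]}


def credentials_from_env():
    """Option (2), hypothetical helper: read the keys from environment
    variables, failing with a clear message if either one is unset."""
    missing = [n for n in (ID_NAME, SECRET_NAME) if n not in os.environ]
    if missing:
        raise RuntimeError("missing environment variable(s): " + ", ".join(missing))
    return {"aws_access_key_id": os.environ[ID_NAME],
            "aws_secret_access_key": os.environ[SECRET_NAME]}


# Demo with placeholder values only.
placeholders = {ID_NAME: "EXAMPLEID", SECRET_NAME: "EXAMPLESECRET"}

with tempfile.TemporaryDirectory() as tmp:
    cred_path = os.path.join(tmp, "aws_cred.json")
    with open(cred_path, "w", encoding="utf-8") as f:
        json.dump(placeholders, f)
    from_file = credentials_from_file(cred_path)

os.environ.update(placeholders)  # mimics the export lines in set_env.sh
from_env = credentials_from_env()

print(from_file == from_env)  # True
```

With boto3 installed, either dict is passed as boto3.client('s3', **credentials_from_env()). As a design note, boto3 also reads the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables on its own, so exporting those names instead lets you call boto3.client('s3') with no key arguments at all.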
I wanted to add that botocore.response.StreamingBody works well with json.load:
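import json
import boto3

bucket = 'MY_S3_BUCKET_NAME'      # placeholder: your bucket name
key = 'FOLDER_PATH/my_file.json'  # placeholder: your object key

s3 = boto3.resource('s3')
obj = s3.Object(bucket, key)
data = json.load(obj.get()['Body'])  # StreamingBody is passed straight to json.load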
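This works because json.load only calls read() on the object it is given, and since Python 3.6 json.loads accepts bytes (UTF-8, UTF-16 or UTF-32) directly, so the explicit .decode() step is not needed. The sketch below demonstrates that with a hypothetical MinimalBody class that implements nothing but read(); it is a stand-in for illustration, not botocore's actual StreamingBody:

```python
import io
import json


class MinimalBody:
    """Hypothetical stand-in for a streaming response body: it only
    offers read(), which returns bytes."""

    def __init__(self, payload: bytes):
        self._stream = io.BytesIO(payload)

    def read(self, amt=None):
        # Read everything when no size is given, otherwise up to amt bytes.
        if amt is None:
            return self._stream.read()
        return self._stream.read(amt)


body = MinimalBody(b'{"Details": "Something"}')
data = json.load(body)  # json.load calls body.read() and parses the bytes
print(data["Details"])  # Something
```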