Suppose there is an application where a zip file is uploaded directly to S3, but later on all of the content inside the zip file is needed. We can create a Lambda function in Python and add an S3 trigger so that whenever a zip file is uploaded to a folder, Lambda fires and unzips the file's content into the same folder.
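For reference, the event that S3 passes to the Lambda looks roughly like this (trimmed to the fields we'll actually read; the bucket and key are just placeholders):
event = {
    'Records': [{
        's3': {
            'bucket': {'name': 'my-example-bucket'},
            'object': {'key': 'uploads/archive.zip'}
        }
    }]
}
Note that S3 URL-encodes the object key in the event, so keys containing spaces or special characters may need to be unquoted (for example with urllib.parse.unquote_plus) before use.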
First, create the handler and get the source and destination keys from the event that S3 sends when it triggers the Lambda.
def handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    source_bucket, source_key = parse_s3_uri('s3://' + bucket + '/' + key)
    destination_bucket, destination_key = parse_s3_uri('s3://' + bucket + '/' + key.replace(key.split('/')[-1], ''))
Here, we'll get source_bucket and source_key from the S3 trigger event, from which we'll create destination_bucket and destination_key.
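The parse_s3_uri helper just splits an s3:// URI into a bucket and a key. Its exact implementation doesn't matter; a minimal sketch, assuming plain s3://bucket/key URIs, could be:
def parse_s3_uri(uri):
    # 's3://my-example-bucket/uploads/archive.zip' -> ('my-example-bucket', 'uploads/archive.zip')
    bucket, _, key = uri.replace('s3://', '', 1).partition('/')
    return bucket, key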
We'll need a few imports in the Python script: boto3 for the AWS API calls and zipfile for working with the zip archive (boto3 ships with the Lambda Python runtime and zipfile is part of the standard library, so no extra packaging is required).
import os
import sys
import re
import boto3
import zipfile
Every Lambda function gets temporary storage in the /tmp directory, where we'll create a temporary zip location to work on.
temp_zip = '/tmp/file.zip'
We'll create an S3 client and download the uploaded file to the temp location.
s3_client = boto3.client('s3')
# OR
# s3_client = boto3.client('s3',
# aws_access_key_id='',
# aws_secret_access_key='',
# region_name = 'us-east-1')
s3_client.download_file(source_bucket, source_key, temp_zip)
Now that the file is downloaded to the temp_zip location, we'll open the archive and build the list of all files so we can upload them one by one to the same S3 location.
zfile = zipfile.ZipFile(temp_zip)
file_list = [(name,
              '/tmp/' + os.path.basename(name),
              destination_key + os.path.basename(name))
             for name in zfile.namelist()]
We open the archive with zipfile and keep the handle in zfile, then build file_list, a list of (name inside the archive, local temp path, destination S3 key) tuples. This list drives the loop that extracts and uploads the files to S3.
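As a hypothetical example, if s3://my-example-bucket/uploads/reports.zip containing summary.csv and details.csv triggers the function, file_list would look like:
[('summary.csv', '/tmp/summary.csv', 'uploads/summary.csv'),
 ('details.csv', '/tmp/details.csv', 'uploads/details.csv')]
Entries for folders inside the archive end up with a local path of just '/tmp/', which is why the loop below skips them.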
for file_name, local_path, s3_key in file_list:
    if local_path == '/tmp/':  # directory entries in the archive have no basename, skip them
        continue
    data = zfile.read(file_name)
    with open(local_path, 'wb') as f:
        f.write(data)
    del data  # free up some memory
    # Upload under a sub-folder named after the zip file: <folder>/<zip name>/<file name>
    zip_name = source_key.split('/')[-1].split('.')[0]
    upload_key = s3_key.replace(s3_key.split('/')[-1], '') + zip_name + '/' + s3_key.split('/')[-1]
    s3_client.upload_file(local_path, destination_bucket, upload_key)
    os.remove(local_path)
Here, we run the loop over file_list: read each file from the archive, write it to the temporary location, upload it, and then delete it locally to free up the space. We use upload_file to push the files from local storage to the S3 destination.
Finally, we'll return the complete list of what was uploaded to S3, in JSON format.
return {"files": ['s3://' + destination_bucket + '/'
                  + s.replace(s.split('/')[-1], '')
                  + source_key.split('/')[-1].split('.')[0] + '/'
                  + s.split('/')[-1]
                  for f, l, s in file_list]}
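Continuing the hypothetical reports.zip example from above, the handler would return something like:
{"files": ["s3://my-example-bucket/uploads/reports/summary.csv",
           "s3://my-example-bucket/uploads/reports/details.csv"]}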
And that's how we can unzip .zip files on S3 without any hassle, using Lambda triggers.
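If you'd rather wire up the S3 trigger from code than through the console, a rough sketch with boto3 could look like this (the function name, account ID, and bucket below are placeholders, and the Lambda's execution role still needs s3:GetObject and s3:PutObject permissions on the bucket):
lambda_client = boto3.client('lambda')
s3_client = boto3.client('s3')

bucket = 'my-example-bucket'  # placeholder
function_arn = 'arn:aws:lambda:us-east-1:123456789012:function:unzip-handler'  # placeholder

# Allow S3 to invoke the function.
lambda_client.add_permission(
    FunctionName='unzip-handler',
    StatementId='s3-invoke-unzip',
    Action='lambda:InvokeFunction',
    Principal='s3.amazonaws.com',
    SourceArn='arn:aws:s3:::' + bucket)

# Fire the function whenever a .zip object is created in the bucket.
s3_client.put_bucket_notification_configuration(
    Bucket=bucket,
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': function_arn,
            'Events': ['s3:ObjectCreated:*'],
            'Filter': {'Key': {'FilterRules': [
                {'Name': 'suffix', 'Value': '.zip'}]}}
        }]
    })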