{"id":243,"date":"2023-01-30T04:42:04","date_gmt":"2023-01-30T04:42:04","guid":{"rendered":"https:\/\/ramansaini.in\/blog\/?p=243"},"modified":"2023-10-18T10:38:55","modified_gmt":"2023-10-18T10:38:55","slug":"unzip-a-zip-file-on-s3-using-lambda","status":"publish","type":"post","link":"https:\/\/ramansaini.in\/blog\/unzip-a-zip-file-on-s3-using-lambda\/","title":{"rendered":"How to unzip a .zip file at the same location on S3 using Lambda."},"content":{"rendered":"\n<p>Sometimes an application uploads a zip file directly to S3, but all the content inside the zip is needed later. We can create a Lambda function in Python and add an S3 trigger, so that whenever a zip file is uploaded to a folder, the Lambda fires and unzips the file&#8217;s content into the same folder.<\/p>\n\n\n\n<p>First, create the handler and derive the source and destination keys from the event that S3 passes to the Lambda.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def handler(event, context):\n    record = event&#91;'Records']&#91;0]&#91;'s3']\n    bucket = record&#91;'bucket']&#91;'name']\n    key = record&#91;'object']&#91;'key']\n    source_bucket, source_key = parse_s3_uri('s3:\/\/' + bucket + '\/' + key)\n    # destination prefix = source key with the file name stripped off\n    destination_bucket, destination_key = parse_s3_uri('s3:\/\/' + bucket + '\/' + key.replace(key.split('\/')&#91;-1], ''))<\/code><\/pre>\n\n\n\n<p>Here, we get <code>source_bucket<\/code> and <code>source_key<\/code> from the S3 trigger event, and from them we build <code>destination_bucket<\/code> and <code>destination_key<\/code>. Note that <code>parse_s3_uri<\/code> is not a boto3 function but a small helper that splits an <code>s3:\/\/bucket\/key<\/code> URI into its bucket and key; a minimal version could look like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def parse_s3_uri(uri):\n    # split 's3:\/\/my-bucket\/path\/to\/key' into ('my-bucket', 'path\/to\/key')\n    match = re.match(r's3:\/\/(&#91;^\/]+)\/(.*)', uri)\n    return match.group(1), match.group(2)<\/code><\/pre>\n\n\n\n<p>We&#8217;ll need a few imports in the Python script, i.e. 
<code>boto3<\/code> for calling the AWS APIs and <code>zipfile<\/code> for working with the archive (<code>os<\/code> and <code>re<\/code> cover the path and URI handling).<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import os\nimport re\nimport zipfile\n\nimport boto3<\/code><\/pre>\n\n\n\n<p>Each Lambda invocation gets temporary storage in the <code>\/tmp<\/code> directory (512 MB by default), where we&#8217;ll set up a temporary path for the zip file.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>temp_zip = '\/tmp\/file.zip'<\/code><\/pre>\n\n\n\n<p>Next, we&#8217;ll create an S3 client and download the uploaded file to that temp location.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>s3_client = boto3.client('s3')\n# OR, with explicit credentials (the Lambda execution role is preferable):\n# s3_client = boto3.client('s3',\n#         aws_access_key_id='',\n#         aws_secret_access_key='',\n#         region_name='us-east-1')\n\ns3_client.download_file(source_bucket, source_key, temp_zip)<\/code><\/pre>\n\n\n\n<p>With the file downloaded to <code>temp_zip<\/code>, we&#8217;ll open the archive and build a list of its entries, so we can upload them one by one to the same S3 location.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>zfile = zipfile.ZipFile(temp_zip)\n\n# (name in archive, local path under \/tmp, destination key) for each entry\nfile_list = &#91;(name,\n              '\/tmp\/' + os.path.basename(name),\n              destination_key + os.path.basename(name))\n             for name in zfile.namelist()]<\/code><\/pre>\n\n\n\n<p>Here, <code>zipfile.ZipFile<\/code> opens the archive (without extracting anything yet), and the list comprehension builds the file location array in the <code>file_list<\/code> variable. 
This list drives the loop that uploads the files to S3.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>for file_name, local_path, s3_key in file_list:\n    if local_path == '\/tmp\/':\n        continue  # skip directory entries in the archive\n    data = zfile.read(file_name)\n    with open(local_path, 'wb') as f:\n        f.write(data)\n    del data  # free up some memory\n    # upload under a folder named after the zip file\n    zip_name = source_key.split('\/')&#91;-1].split('.')&#91;0]\n    s3_client.upload_file(local_path, destination_bucket,\n                          s3_key.replace(s3_key.split('\/')&#91;-1], '') + zip_name + '\/' + s3_key.split('\/')&#91;-1])\n    os.remove(local_path)<\/code><\/pre>\n\n\n\n<p>Here, we loop over the paths array, read each file from the archive, write it locally, upload it, and then delete the local copy to free up space. Note that the objects end up under a folder named after the zip file (e.g. <code>prefix\/archive\/file.txt<\/code> for an upload of <code>prefix\/archive.zip<\/code>).<\/p>\n\n\n\n<p>We use <code>upload_file<\/code> to copy each file from local storage to the S3 destination.<\/p>\n\n\n\n<p>Finally, we return the complete list of what was uploaded to S3, in JSON format.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>return {\"files\": &#91;'s3:\/\/' + destination_bucket + '\/' +\n                  s.replace(s.split('\/')&#91;-1], '') +\n                  source_key.split('\/')&#91;-1].split('.')&#91;0] + '\/' + s.split('\/')&#91;-1]\n                  for f, l, s in file_list]}<\/code><\/pre>\n\n\n\n<p>This is how we unzip .zip files on S3, without any hassle, using Lambda triggers.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Sometimes an application uploads a zip file directly to S3, but all the content inside the zip is needed later. 
We can create a lambda function in python and add an S3 trigger so that whenever a zip file is uploaded in a folder, lambda will trigger and unzip&hellip;&nbsp;<a href=\"https:\/\/ramansaini.in\/blog\/unzip-a-zip-file-on-s3-using-lambda\/\" class=\"\" rel=\"bookmark\">Read More &raquo;<span class=\"screen-reader-text\">How to unzip a .zip file at the same location on S3 using Lambda.<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"neve_meta_sidebar":"","neve_meta_container":"","neve_meta_enable_content_width":"","neve_meta_content_width":0,"neve_meta_title_alignment":"","neve_meta_author_avatar":"","neve_post_elements_order":"[\"content\",\"tags\",\"comments\"]","neve_meta_disable_header":"","neve_meta_disable_footer":"","neve_meta_disable_title":"","_themeisle_gutenberg_block_has_review":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[6],"tags":[16,14,15],"class_list":["post-243","post","type-post","status-publish","format-standard","hentry","category-programming","tag-lambda","tag-python","tag-s3"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/ramansaini.in\/blog\/wp-json\/wp\/v2\/posts\/243","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ramansaini.in\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ramansaini.in\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ramansaini.in\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/ramansaini.in\/blog\/wp-json\/wp\/v2\/comments?post=243"}],"version-history":[{"count":2,"href":"https:\/\/ramansaini.in\/blog\/wp-json\/wp\/v2\/posts\/243\/revisions"}],"predecessor-version":[{"id":301,"href":"https:\/\/ramansaini.in\/blog\/wp-json\/wp\/v2\/posts\/243\/revisions\/301"}],"wp:attachment":[{"href":"https:\/\/raman
saini.in\/blog\/wp-json\/wp\/v2\/media?parent=243"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ramansaini.in\/blog\/wp-json\/wp\/v2\/categories?post=243"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ramansaini.in\/blog\/wp-json\/wp\/v2\/tags?post=243"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}