I agree with @faul_sname that the bash version is more readable.
But maybe a better (more readable/maintainable) Python alternative is to explicitly use Amazon's Python SDK (boto3) for the S3 download? I've never used it myself, but a quick search suggests something like:
import gzip
from io import BytesIO

import boto3

try:
    # Fetch the gzipped object from S3 and decompress it in memory
    s3 = boto3.resource('s3')
    key = 'YOUR_FILE_NAME.gz'
    obj = s3.Object('YOUR_BUCKET_NAME', key)
    compressed = obj.get()['Body'].read()
    content = gzip.GzipFile(fileobj=BytesIO(compressed)).read()
    print(content)
except Exception as e:
    print(e)
    raise
You could wrap that in a function to parallelize the download/decompression of path1 and path2 (using your favorite Python parallelization paradigm), as sketched below. But this wouldn't handle piping the decompressed files to cmd without using temp files...
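A minimal sketch of that parallel version, assuming the two objects live in the same bucket and using concurrent.futures for the parallelism (the bucket and key names below are placeholders standing in for path1 and path2):

import gzip
from io import BytesIO
from concurrent.futures import ThreadPoolExecutor

import boto3

def fetch_and_decompress(bucket, key):
    # Download one gzipped S3 object and return its decompressed bytes
    obj = boto3.resource('s3').Object(bucket, key)
    compressed = obj.get()['Body'].read()
    return gzip.GzipFile(fileobj=BytesIO(compressed)).read()

# Placeholder bucket/keys standing in for path1 and path2
with ThreadPoolExecutor(max_workers=2) as pool:
    f1 = pool.submit(fetch_and_decompress, 'YOUR_BUCKET_NAME', 'path1.gz')
    f2 = pool.submit(fetch_and_decompress, 'YOUR_BUCKET_NAME', 'path2.gz')
    content1, content2 = f1.result(), f2.result()

Threads should be enough here since the work is network/I/O-bound: the two downloads overlap, and each is decompressed as it finishes.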
I don't see how that solves any of the problems I have here?