Skip to content

xopen: open gzipped files transparently#76

Open
kristjaneerik wants to merge 4 commits into
counsyl:masterfrom
kristjaneerik:xopen
Open

xopen: open gzipped files transparently#76
kristjaneerik wants to merge 4 commits into
counsyl:masterfrom
kristjaneerik:xopen

Conversation

@kristjaneerik

Copy link
Copy Markdown
Collaborator

Here's an implementation of stor.xopen which works just like stor.open for regular files, but uses a layer of gzip.GzipFile for files ending in .gz such that you don't have to worry about compression, it happens behind the scenes.

Not sure if I did it correctly with respect to forcing the mode etc..

The portion of the test that runs with stor.xopen(stor.join(self.drive, 'A/C/utf8_file_with_unicode.txt'), 'rb') as xfp: doesn't work if the mode is r. I suspect it's something to do with the mocking rather than the implementation, since the code above it with mode r works. ¯\_(ツ)_/¯

Suggestions for better approaches to testing appreciated!

Inspiration from https://github.com/marcelm/xopen. Would be great if we could just use that, but I'm not sure we can.

@jtratner

jtratner commented Apr 20, 2018 via email

Copy link
Copy Markdown
Contributor

@jtratner

jtratner commented Apr 20, 2018 via email

Copy link
Copy Markdown
Contributor

@kristjaneerik

Copy link
Copy Markdown
Collaborator Author

I added an emoji to the test file.

All of these give PASSED as the result for the single test being run:

tox -e py27 -- stor/tests/test_swift.py::TestSwiftShared::test_xopen_gzip
tox -e py27 -- stor/tests/test_swift.py::TestSwiftShared::test_xopen_regular
tox -e py36 -- stor/tests/test_swift.py::TestSwiftShared::test_xopen_gzip
tox -e py36 -- stor/tests/test_swift.py::TestSwiftShared::test_xopen_regular

@jtratner

jtratner commented Apr 20, 2018 via email

Copy link
Copy Markdown
Contributor

Comment thread stor/base.py
if self.endswith('.gz'):
if mode == 'r':
mode = 'rb'
if mode == 'w':

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these codepaths aren't actually tested (that's what is causing CI to fail).

Additionally, you need to pass the original mode to the gzip.GzipFile object, so perhaps you want something like:

if mode in ('r', 'rb'): fp_mode='rb'
if mode in ('w', 'wb'): fp_mode = 'wb'
fp = self.open(fp_mode, *args, **kwargs)
gzfp = gzip.GzipFile(mode=mode, fileobj=fp)

Additionally, GzipFile's docs say that it doesn't automatically close the underlying file object - which means that if you do something like this:

with stor.xopen('s3://unauthed-bucket/myfile.csv.gz') as fp:
    fp.write('somedata')

the exception will not bubble up to the user (I believe)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants