xopen: open gzipped files transparently #76
Open
kristjaneerik wants to merge 4 commits into
Contributor
|
Looks nice, but have you tested this on Python 3? Try writing an emoji or
other Unicode into your file :-/ It might blow up, unfortunately.
On Thu, Apr 19, 2018 at 6:21 PM Kristjan Eerik Kaseniit <***@***.***> wrote:
Here's an implementation of stor.xopen which works just like stor.open
for regular files, but uses a layer of gzip.GzipFile for files ending in
.gz such that you don't have to worry about compression; it happens
behind the scenes.
Not sure if I did it correctly with respect to forcing the mode etc.
The portion of the test that runs with stor.xopen(stor.join(self.drive,
'A/C/utf8_file_with_unicode.txt'), 'rb') as xfp: doesn't work if the mode
is r. I suspect it's something to do with the mocking rather than the
implementation, since the code above it with mode r works. ¯\_(ツ)_/¯
Suggestions for better approaches to testing appreciated!
Inspiration from https://github.com/marcelm/xopen. Would be great if we
could just use that, but I'm not sure we can.
Commit Summary
- Merge pull request #1 from counsyl/master
- Sem-Ver: feature - xopen: transparently handle gzipped-files
- add file with some unicode
File Changes
- *M* stor/__init__.py
<https://github.com/counsyl/stor/pull/76/files#diff-0> (1)
- *M* stor/base.py
<https://github.com/counsyl/stor/pull/76/files#diff-1> (17)
- *A* stor/tests/file_data/utf8_file_with_unicode.txt
<https://github.com/counsyl/stor/pull/76/files#diff-2> (2)
- *M* stor/tests/shared_obs.py
<https://github.com/counsyl/stor/pull/76/files#diff-3> (33)
Patch Links:
- https://github.com/counsyl/stor/pull/76.patch
- https://github.com/counsyl/stor/pull/76.diff
|
Contributor
|
(If it works on python 3 then I’m super down and will review more in-depth!)
|
Collaborator
Author
|
I added an emoji to the test file.
All of these give PASSED as the result for the single test being run:
tox -e py27 -- stor/tests/test_swift.py::TestSwiftShared::test_xopen_gzip
tox -e py27 -- stor/tests/test_swift.py::TestSwiftShared::test_xopen_regular
tox -e py36 -- stor/tests/test_swift.py::TestSwiftShared::test_xopen_gzip
tox -e py36 -- stor/tests/test_swift.py::TestSwiftShared::test_xopen_regular
|
Contributor
|
Awesome!
|
jtratner
reviewed
Jun 28, 2018
if self.endswith('.gz'):
    if mode == 'r':
        mode = 'rb'
    if mode == 'w':
Contributor
I think these codepaths aren't actually tested (that's what is causing CI to fail).
Additionally, you need to pass the original mode to the gzip.GzipFile object, so perhaps you want something like:
    if mode in ('r', 'rb'):
        fp_mode = 'rb'
    if mode in ('w', 'wb'):
        fp_mode = 'wb'
    fp = self.open(fp_mode, *args, **kwargs)
    gzfp = gzip.GzipFile(mode=mode, fileobj=fp)
Additionally, GzipFile's docs say that it doesn't automatically close the underlying file object, which means that if you do something like this:
    with stor.xopen('s3://unauthed-bucket/myfile.csv.gz', 'w') as fp:
        fp.write('somedata')
the underlying file is never explicitly closed, so the exception from the failed write will not bubble up to the user (I believe).
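To make the closing concern concrete, here is a minimal standard-library sketch. It is not stor's code: `ClosingGzipFile` and `FailingUpload` are hypothetical names made up for this example, and `FailingUpload` only simulates a remote file whose close() raises. It assumes Python 3.

```python
import gzip
import io


class ClosingGzipFile(gzip.GzipFile):
    """GzipFile that also closes the file object it wraps (hypothetical helper)."""

    def __init__(self, fileobj, mode):
        super().__init__(fileobj=fileobj, mode=mode)
        self._wrapped = fileobj

    def close(self):
        if self.closed:
            return  # already closed; do not close the wrapped file twice
        try:
            super().close()  # writes the gzip trailer into the wrapped file
        finally:
            self._wrapped.close()  # any error raised here reaches the caller


class FailingUpload(io.BytesIO):
    """Stand-in for a remote file whose close() fails (assumption for the demo)."""

    def close(self):
        super().close()
        raise IOError("upload failed")


def plain_gzip_leaves_file_open():
    """A plain GzipFile given fileobj= does not close that file object."""
    raw = io.BytesIO()
    with gzip.GzipFile(fileobj=raw, mode="wb") as gz:
        gz.write(b"somedata")
    return not raw.closed


def closing_wrapper_surfaces_error():
    """With the wrapper, the error from the wrapped close() propagates."""
    try:
        with ClosingGzipFile(FailingUpload(), "wb") as gz:
            gz.write(b"somedata")
    except IOError as exc:
        return str(exc)
    return None
```

Whether stor's file objects actually raise on close in this situation is not confirmed here; the sketch only shows that an outer wrapper has to close the inner file itself if such errors are to reach the user.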
Here's an implementation of stor.xopen which works just like stor.open for regular files, but uses a layer of gzip.GzipFile for files ending in .gz such that you don't have to worry about compression; it happens behind the scenes.
Not sure if I did it correctly with respect to forcing the mode etc.
The portion of the test that runs with stor.xopen(stor.join(self.drive, 'A/C/utf8_file_with_unicode.txt'), 'rb') as xfp: doesn't work if the mode is r. I suspect it's something to do with the mocking rather than the implementation, since the code above it with mode r works. ¯\_(ツ)_/¯
Suggestions for better approaches to testing appreciated!
Inspiration from https://github.com/marcelm/xopen. Would be great if we could just use that, but I'm not sure we can.
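For readers who want to see the general idea in isolation, below is a rough sketch for local files only. It is not the code in this PR: the built-in open() stands in for stor.open, the standalone `xopen` function and the `roundtrip` helper are hypothetical, and it uses gzip.open rather than layering gzip.GzipFile over an existing file object. It assumes Python 3 and UTF-8 text.

```python
import gzip
import os
import tempfile


def xopen(path, mode="r", encoding="utf-8"):
    """Open path, adding gzip (de)compression when the name ends in .gz.

    Hypothetical local-file stand-in for the behaviour described above.
    """
    if mode not in ("r", "rb", "w", "wb"):
        raise ValueError("unsupported mode: %r" % (mode,))
    opener = gzip.open if path.endswith(".gz") else open
    if mode.endswith("b"):
        return opener(path, mode)
    # gzip.open treats 'r'/'w' as binary, so text mode is requested
    # explicitly with 'rt'/'wt'; the built-in open() accepts these too.
    return opener(path, mode + "t", encoding=encoding)


def roundtrip(text, name="example.txt.gz"):
    """Write text through xopen, then read it back, in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, name)
        with xopen(path, "w") as fp:
            fp.write(text)
        with xopen(path, "r") as fp:
            return fp.read()
```

Calling `roundtrip` with a string that contains an emoji exercises the Unicode case raised in the review, for both `.gz` and regular file names.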