
How to read last line of a gzip file in S3 efficiently? #769

Answered by mpenkov
hongbo-miao asked this question in Q&A

No, you can't do that because of how gzip works. A gzip block can only be decompressed starting from its beginning, so you need to decompress the entire block (in this case, the entire file) before you can access the content at its end, such as the last line.
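A quick local experiment illustrates the point. It is a sketch only: numbers.txt and numbers.gz are arbitrary names, and the 100-byte skip is an arbitrary position inside the compressed data. Starting gunzip partway into a single-block gzip file fails, because those bytes sit in the middle of the compressed stream instead of at a gzip header.

```shell
# Create a sample file and compress it as a single gzip block
seq 1 10000 > numbers.txt
gzip -c numbers.txt > numbers.gz

# Decompressing from the beginning works
gunzip -c numbers.gz | tail -n 1

# Skipping the first 100 bytes does not: gunzip rejects the data
# because it no longer starts with a gzip header
if tail -c +101 numbers.gz | gunzip > /dev/null 2>&1; then
  echo "decompressed"
else
  echo "cannot decompress from the middle"
fi
```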

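One workaround is to compress your file so that it consists of multiple gzip blocks (the gzip format calls them "members"), for example by compressing parts of it separately and concatenating the results: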

$ gzip part1
$ gzip part2
$ gzip part3
$ cat part1.gz part2.gz part3.gz > file.gz
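The commands above can be tried end to end with some tiny sample parts (their contents are made up for illustration). The concatenated file is still an ordinary gzip file: gzip -t checks every block, and gunzip -c prints the decompressed blocks one after another.

```shell
# Three small sample inputs, named as in the example above
printf 'alpha\n' > part1
printf 'beta\n' > part2
printf 'gamma\n' > part3

# Compress each part separately (-f overwrites any old .gz files),
# then concatenate the compressed files
gzip -f part1 part2 part3
cat part1.gz part2.gz part3.gz > file.gz

# gzip -t validates all blocks; gunzip -c decompresses them in order
gzip -t file.gz && echo "file.gz is valid"
gunzip -c file.gz
```

The output is "file.gz is valid" followed by alpha, beta and gamma on separate lines, so tools that read the whole file see no difference.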

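file.gz now consists of three blocks. Each block begins a new compressed stream, so you can seek to the start of any block and begin decompressing from there. The gzip format itself stores no index of where the blocks start, so you need to record those offsets yourself when you build the file (e.g. the third block starts at the combined size of part1.gz and part2.gz).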
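Here is a local sketch of seeking to the last block, assuming the offsets are recorded while the file is built (the sample contents and the offset variable are illustrative, not part of any tool). Only the bytes from the last block onward are passed to gunzip.

```shell
# Sample parts; the last one holds the line we want to read
printf 'line 1\nline 2\n' > part1
printf 'line 3\nline 4\n' > part2
printf 'line 5\nlast line\n' > part3
gzip -f part1 part2 part3

# Record where the third block starts before concatenating:
# it begins right after the bytes of part1.gz and part2.gz
offset=$(( $(wc -c < part1.gz) + $(wc -c < part2.gz) ))
cat part1.gz part2.gz part3.gz > file.gz

# Skip the first $offset bytes (tail -c +N counts from 1) and
# decompress only the last block
tail -c +$((offset + 1)) file.gz | gunzip | tail -n 1
```

This prints "last line" without decompressing the first two blocks. Against S3, the same offset could be sent as an HTTP Range request (bytes=OFFSET-), for example with the --range option of aws s3api get-object, so that only the last block is downloaded. Where you store the offsets is up to you.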

Answer selected by hongbo-miao
Category
Q&A
2 participants