
Hardening against adversarial input #85

Open
klauspost opened this issue Apr 19, 2022 · 7 comments
Labels
bug Something isn't working enhancement New feature or request help wanted Extra attention is needed

Comments

@klauspost

klauspost commented Apr 19, 2022

Describe the bug

This library provides no protection against untrusted inputs.

It is trivial to craft inputs that will cause a server to OOM or crash in other ways.

This makes it impossible to use this package unless you have full trust in the users uploading input.

In our MinIO server we are therefore forced to disable Parquet parsing in S3 Select, since the server may be running in an environment where the users are untrusted.

Adding limits and safety to this package would extend its usage a lot. I know from experience it can be hard and there are a lot of paths to cover, but usually fuzz tests can guide you.

I made a fuzz test to test basic functionality. Even without a seed corpus it crashes within a minute, and letting it run a bit longer runs the OS out of resources as well. Ideally I would like to specify limits in terms of memory usage so we can make reasonable assumptions of how much memory each Reader will take at max.

Unit test to reproduce

Pre-Go 1.18 fuzz test (for go-fuzz):

func Fuzz(data []byte) int {
	r, err := NewFileReader(bytes.NewReader(data))
	if err != nil {
		return 0
	}
	for {
		_, err := r.NextRow()
		if err != nil {
			break
		}
		for _, col := range r.Columns() {
			_ = col.Element()
		}
	}
	return 1
}

Build+execute with:

λ go-fuzz-build -o=fuzz-build.zip .
λ go-fuzz -timeout=60 -bin=fuzz-build.zip -workdir=testdata/fuzz -procs=16

This will start crashing within a minute.

Example crash:
panic: runtime error: makeslice: cap out of range

goroutine 1 [running]:
github.com/fraugster/parquet-go/parquet.(*FileMetaData).ReadField4(0xc00036c140, {0xbcc0f0, 0xc000016168}, {0xbcf4c0, 0xc00036c1e0})
	e:/gopath/pkg/mod/github.com/fraugster/parquet-go@v0.10.0/parquet/parquet.go:11882 +0xd1
github.com/fraugster/parquet-go/parquet.(*FileMetaData).Read(0xc00036c140, {0xbcc0f0, 0xc000016168}, {0xbcf4c0, 0xc00036c1e0})
	e:/gopath/pkg/mod/github.com/fraugster/parquet-go@v0.10.0/parquet/parquet.go:11753 +0x64b
github.com/fraugster/parquet-go.readThrift({0xbcc0f0, 0xc000016168}, {0xbc80c0, 0xc00036c140}, {0xbc81a0, 0xc00035c0d8})
	e:/gopath/pkg/mod/github.com/fraugster/parquet-go@v0.10.0/helpers.go:107 +0xe2
github.com/fraugster/parquet-go.ReadFileMetaDataWithContext({0xbcc0f0, 0xc000016168}, {0xbca2b0, 0xc000358150}, 0x80)
	e:/gopath/pkg/mod/github.com/fraugster/parquet-go@v0.10.0/file_meta.go:68 +0x5e7
github.com/fraugster/parquet-go.ReadFileMetaData({0xbca2b0, 0xc000358150}, 0x1)
	e:/gopath/pkg/mod/github.com/fraugster/parquet-go@v0.10.0/file_meta.go:18 +0x70
github.com/fraugster/parquet-go.NewFileReaderWithOptions({0xbca2b0, 0xc000358150}, {0xc0000bfe10, 0x1, 0x1})
	e:/gopath/pkg/mod/github.com/fraugster/parquet-go@v0.10.0/file_reader.go:39 +0x173
github.com/fraugster/parquet-go.NewFileReader({0xbca2b0, 0xc000358150}, {0x0, 0x984993, 0x625ea3ea})
	e:/gopath/pkg/mod/github.com/fraugster/parquet-go@v0.10.0/file_reader.go:127 +0xa9
github.com/minio/minio/internal/s3select/parquet.Fuzz({0x1c37f2c0000, 0x3b, 0x3b})
	d:/minio/minio/internal/s3select/parquet/fuzz.go:19 +0xbf
go-fuzz-dep.Main({0xc0000bff60, 0x2, 0x938c05})
	go-fuzz-dep/main.go:36 +0x15b
main.main()
	github.com/minio/minio/internal/s3select/parquet/go.fuzz.main/main.go:17 +0x45
exit status 2

I am not looking for a solution for the specific crash, but rather the class of crashes that can be triggered with malicious user inputs.

parquet-go specific details

  • What version are you using?

0.10.0

  • Can this be reproduced in earlier versions?

Likely.

@panamafrancis panamafrancis added bug Something isn't working enhancement New feature or request help wanted Extra attention is needed labels Apr 19, 2022
@panamafrancis
Contributor

Hello,

Thank you for opening this issue. We were aware of the lack of hardening in parquet-go as a peripheral concern. However, the target environment has always been one where the input can be trusted, i.e. reading back files which we have written using this package.

We can do more fuzzing to weed out the kind of bugs you've found, but I think you've hit the nail on the head here in identifying how tough hardening is...

Re. uncompressed size: I know snappy writes the uncompressed length in the first few bytes, and we track the compressed/uncompressed sizes of column chunks (which are composed of pages, which in turn contain one or more compressed blocks), so it should be theoretically possible to extract this information before reading the whole file.

I'll create some tickets to track the fuzzing & uncompressed size estimate efforts
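(Editor's note: for concreteness, snappy's block format does store the uncompressed length as a uvarint preamble, which is what `snappy.DecodedLen` reads. A stdlib-only sketch of extracting that length before decompressing anything; the function name and the 1 MiB budget are illustrative, not part of any library API:)

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// snappyDecodedLen reads the uncompressed length that the snappy block
// format stores as a uvarint preamble, without decompressing anything.
// This mirrors what snappy.DecodedLen does, using only the standard library.
func snappyDecodedLen(block []byte) (int, error) {
	v, n := binary.Uvarint(block)
	if n <= 0 || v > 1<<31-1 {
		return 0, errors.New("invalid snappy block header")
	}
	return int(v), nil
}

func main() {
	// Build a fake preamble claiming a 10 MiB uncompressed size.
	var buf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(buf[:], 10<<20)

	size, err := snappyDecodedLen(buf[:n])
	if err != nil {
		panic(err)
	}
	const maxSize = 1 << 20 // hypothetical 1 MiB budget
	fmt.Println(size, size > maxSize)
}
```

A reader can therefore reject a block whose claimed decompressed size exceeds its budget before allocating a single byte for it.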

@klauspost
Author

Yeah, broadly there are two classes:

A) Unvalidated input, like the ReadField4 crash above, where a negative size is provided. I see the intent is to check this value, but not all implementations check the correct value AFAICT.

B) Excessive memory allocation. I am far from an expert in Parquet, and it seems you have at least done a good deal of work in this department. This is of course much harder to track down, since you need to check every allocation whose size is derived from input data.
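(Editor's note: the usual fix for class (A) is to bound any size field read from the input before allocating. A minimal stdlib-only sketch; the function name and bounds are illustrative, not parquet-go's:)

```go
package main

import (
	"errors"
	"fmt"
)

// readList demonstrates bounding a length field before allocation:
// reject negative sizes, and never allocate more bytes than the
// remaining input could possibly contain.
func readList(size int32, remaining int) ([]byte, error) {
	if size < 0 {
		return nil, errors.New("negative list size")
	}
	if int(size) > remaining {
		return nil, errors.New("list size exceeds remaining input")
	}
	return make([]byte, size), nil
}

func main() {
	if _, err := readList(-1, 100); err != nil {
		fmt.Println("rejected:", err)
	}
	if _, err := readList(1<<30, 100); err != nil {
		fmt.Println("rejected:", err)
	}
	buf, _ := readList(10, 100)
	fmt.Println("ok, len =", len(buf))
}
```

The key invariant is that a declared element count can never exceed what the remaining bytes of the file could encode, which rules out both the negative-size panic and most "huge makeslice" OOMs.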

Decompressed sizes can be checked.

Assuming you have a maxSize:

	if snappy.DecodedLen(block) > maxSize {
		return nil, errors.New("max block size exceeded")
	}
// limitReader returns a Reader that reads from r
// but stops with EOF after n bytes.
// The underlying implementation is a *limitedReader.
func limitReader(r io.Reader, n int64) io.Reader { return &limitedReader{r, n} }

// A limitedReader reads from R but limits the amount of
// data returned to just N bytes. Each call to Read
// updates N to reflect the new amount remaining.
// Read returns EOF when N <= 0 or when the underlying R returns EOF.
type limitedReader struct {
	R io.Reader // underlying reader
	N int64  // max bytes remaining
}

func (l *limitedReader) Read(p []byte) (n int, err error) {
	if l.N <= 0 {
		return 0, errors.New("max block size exceeded")
	}
	if int64(len(p)) > l.N {
		p = p[0:l.N]
	}
	n, err = l.R.Read(p)
	l.N -= int64(n)
	return
}


func (gzipCompressor) DecompressBlock(block []byte) ([]byte, error) {
	buf := bytes.NewReader(block)
	r, err := gzip.NewReader(buf)
	if err != nil {
		return nil, err
	}

	ret, err := ioutil.ReadAll(limitReader(r, maxSize))
	if err != nil {
		return nil, err
	}

	return ret, r.Close()
}

Maybe you want to extend the BlockCompressor interface to accept a max size for any given block. It would seem that in some cases this size is also provided by the header, so a precise value can also be given.

But again, decompression is just some of the places where memory can "explode" AFAICT.

@akrennmair
Member

@klauspost thanks for the report! I'm kinda surprised to see such obvious crashers in fuzz testing, as we've put effort into fuzz testing in the past. I was able to reproduce this rather quickly, and all 5 crashers (so far) seem to stem from a lack of input validation in code auto-generated by thrift. This makes me think potentially all Go implementations of parquet suffer from the same or similar issues. I'll keep fuzz-testing this and try to fix the thrift code generator.

@akrennmair
Member

FYI: apache/thrift#2582 seems to fix the thrift-related crashers so far.

@panamafrancis
Contributor

We've merged the fixes for the Thrift related crashes: #86

@panamafrancis
Contributor

@klauspost You can use the new v0.11.0 tag which includes the above fixes: https://github.com/fraugster/parquet-go/releases/tag/v0.11.0

@klauspost
Author

Confirmed. I now only see OOM crashes, but the fuzzer can't really run very long before it dies.
