Description
- Version: v9.0.0, v8.9.0
- Platform: Windows (but really, all of them)
- Subsystem: streams
Problem
My use case is the following:
I want to send a file over the network but at the same time calculate a local checksum (like a sha-1 hash) for later use.
I tried the following (abbreviated and pseudo):
```js
const fileStream = fs.createReadStream(path, { encoding: null });
const checksum = crypto.createHash("sha1");
const network = createSomeSlowSlowNetworkStream();

fileStream.pipe(checksum);
fileStream.pipe(network);

const res = await network.response();
checksum.end();
if (res.checksum !== checksum.read().toString("hex")) {
  throw new Error("dis is bad");
}
```
And it works... except that piping to `checksum`, which can consume data far faster than a networked stream, causes the file stream to emit data at that high rate; since the network stream is much slower, the file stream ends up buffering all the data destined for it.
In the end, (almost) the entire file is buffered in memory this way, which is kinda bad, especially when the file is multiple gigabytes like in my tests.
Inspecting the node process showed thousands of 16KB (the default `highWaterMark`) buffers in memory, totaling 5GB (essentially the size of the file being streamed), waiting to be consumed by the network stream.
Remedies
When a readable stream has multiple pipe destinations, it should emit data at the rate of the slowest attached destination, not at the rate of the fastest one.
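Conceptually (this is a sketch of the proposed semantics, not Node's actual implementation), this means the source should stay paused until every destination that reported backpressure has drained, rather than resuming as soon as the fastest one does:

```javascript
// Sketch: count saturated destinations; resume the source only when the
// count returns to zero, i.e. when the *slowest* destination has drained.
let awaitDrain = 0;

function writeChunk(chunk, destinations, source) {
  for (const dest of destinations) {
    if (dest.write(chunk) === false) awaitDrain++; // dest is saturated
  }
  if (awaitDrain > 0) source.pause(); // any backpressure pauses the source
}

function onDrain(source) {
  // Called once per 'drain' event from a saturated destination.
  if (awaitDrain > 0 && --awaitDrain === 0) source.resume();
}
```

With this bookkeeping, a fast consumer (the hash) draining immediately does not resume the source while the slow consumer (the network) is still backed up.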
Workaround
Well, I cheated and implemented a custom Transform that calls `checksum.write` inside its `_transform` method and sits between the file stream and the network stream. That way, only the backpressure of the network stream is taken into account by the file stream when emitting data.