Piping a stream into slow and fast consumer streams in parallel causes excessive buffering #16706

@RealDolos

  • Version: v9.0.0, v8.9.0
  • Platform: Windows (but really, all of them)
  • Subsystem: streams

Problem

My use case is the following:
I want to send a file over the network and, at the same time, calculate a local checksum (such as a SHA-1 hash) for later use.

I tried the following (abbreviated pseudocode):

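const fs = require("fs");
const crypto = require("crypto");

const fileStream = fs.createReadStream(path, {encoding: null});
const checksum = crypto.createHash("sha1");
// Placeholder for any slow writable stream, e.g. an upload.
const network = createSomeSlowSlowNetworkStream();
// One source, two destinations: the fast hash and the slow network stream.
fileStream.pipe(checksum);
fileStream.pipe(network);
const res = await network.response();
checksum.end();
if (res.checksum !== checksum.read().toString("hex")) {
  throw new Error("dis is bad");
}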

And it works... except that piping to checksum, which can consume data far faster than a networked stream, causes the file stream to emit data at that high rate. Since the networked stream is much slower, the file stream buffers all the data destined for it.
In the end, (almost) the entire file is buffered in memory this way, which is bad, especially for a multi-gigabyte file like the one in my tests.
Inspecting the node process showed thousands of 16KB (default highWaterMark) buffers in memory, totaling 5GB (essentially the size of the file being streamed), all waiting to be consumed by the network stream.
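For anyone trying to reproduce this, one way to see which stream is holding the data is to sample the buffer sizes each stream reports. The helper below is only a sketch: `bufferedBytes` is a hypothetical name, and it assumes a Node.js version that exposes `readableLength` and `writableLength` (v9.4.0 or later, as far as I know), which the versions in this report predate.

```javascript
// Reports how many bytes (or objects, in object mode) a stream currently
// holds in its internal read and write buffers.
// Missing properties (e.g. readableLength on a pure Writable) count as 0.
function bufferedBytes(stream) {
  return {
    readable: stream.readableLength || 0,
    writable: stream.writableLength || 0,
  };
}

// Hypothetical usage with the streams from the report:
// setInterval(() => {
//   console.log("file", bufferedBytes(fileStream));
//   console.log("network", bufferedBytes(network));
// }, 1000);
```

If the network stream's `writable` count keeps growing while the file is read, the data is piling up in that stream's write queue.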

Remedies

When a read stream has multiple pipes, it should emit data at the rate of the slowest attached stream, not at the rate of the fastest stream.
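The requested behaviour can be approximated in user land. The sketch below is not how Node's `pipe()` is implemented; `writeToAll` and `pipeToAll` are hypothetical names, and it assumes a Node.js version with async iteration of readable streams and `events.once` (v12 or later should do). After each chunk, it waits for every destination that signalled backpressure to emit `drain` before reading further, so the source advances at the pace of the slowest destination.

```javascript
const { once } = require("events");

// Writes one chunk to every destination and returns the destinations
// whose write() returned false, i.e. those that asked the producer to pause.
function writeToAll(chunk, destinations) {
  return destinations.filter((dest) => !dest.write(chunk));
}

// Reads the source chunk by chunk and fans each chunk out to all destinations.
// The next chunk is only read once every full destination has drained.
async function pipeToAll(source, destinations) {
  for await (const chunk of source) {
    const full = writeToAll(chunk, destinations);
    // 'drain' is emitted asynchronously, so subscribing after write() is safe.
    await Promise.all(full.map((dest) => once(dest, "drain")));
  }
  for (const dest of destinations) {
    dest.end();
  }
}

// Hypothetical usage with the streams from the report:
// await pipeToAll(fileStream, [checksum, network]);
```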

Workaround

Well, I cheated and implemented a custom Transform that calls checksum.write() from its transform function. That way, the file stream takes only the backpressure of the network stream into account when emitting data.
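A minimal sketch of such a Transform might look like the following. It is not the original author's code: `HashingPassThrough` is a hypothetical name, and it calls the synchronous `hash.update()` instead of `checksum.write()`, so the hash itself never needs backpressure handling.

```javascript
const crypto = require("crypto");
const { Transform } = require("stream");

// Passes every chunk through unchanged while feeding it to a hash.
class HashingPassThrough extends Transform {
  constructor(algorithm, options) {
    super(options);
    this._hash = crypto.createHash(algorithm);
  }

  _transform(chunk, encoding, callback) {
    // update() is synchronous, so the hash is up to date before the chunk
    // is handed on to the next stream in the pipeline.
    this._hash.update(chunk);
    callback(null, chunk);
  }

  // Returns the digest; call once, after all data has passed through.
  digest(encoding) {
    return this._hash.digest(encoding);
  }
}
```

With the streams from the report, this would be wired up as `fileStream.pipe(hasher).pipe(network)`, where `hasher = new HashingPassThrough("sha1")`, and `hasher.digest("hex")` would be read once the upload has finished. The file stream then has a single destination, and the network stream's backpressure propagates back through the Transform.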

Labels: stream (Issues and PRs related to the stream subsystem.)