Proposal: Retire redundant Stream allocation and its materialization in ResponseLogger
#6889
base: main
Conversation
@danicheg Here is some possible intuition / explanation as to what may be a significant difference: in both versions of the code, the "observer" is pulling the bytes from the source stream, one chunk of bytes at a time. In the old code, each chunk is not discarded but added to the vector of chunks, so the "observer" does nothing more than insert into the vector. On finalisation, a new stream is built out of that vector, which emits the very same chunks. The new code converts each source chunk to a ByteVector (which it may already be) and concatenates those into a single large byte vector. I am looking into the implementation of the …
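To make that contrast concrete, here is a rough sketch, assuming plain fs2 plus cats-effect and with invented names (`observeOld` / `observeNew`; this is not the actual ResponseLogger code), of the two accumulation strategies described above:

```scala
import cats.effect.{IO, Ref}
import fs2.{Chunk, Stream}
import scodec.bits.ByteVector

// Old strategy: every observed chunk is appended to a Vector of chunks; on
// finalisation a new Stream is rebuilt from that Vector and handed to the logger.
def observeOld(body: Stream[IO, Byte], log: Stream[IO, Byte] => IO[Unit]): Stream[IO, Byte] =
  Stream.eval(Ref.of[IO, Vector[Chunk[Byte]]](Vector.empty)).flatMap { vec =>
    body.chunks
      .evalTap(chunk => vec.update(_ :+ chunk)) // the observer only inserts into the vector
      .unchunks
      .onFinalize(
        vec.get.flatMap(v => log(Stream.emits(v).unchunks.covary[IO])) // re-emit the very same chunks
      )
  }

// New strategy: every observed chunk is turned into a ByteVector and concatenated
// into one large ByteVector; no second Stream is allocated or materialized.
def observeNew(body: Stream[IO, Byte], log: ByteVector => IO[Unit]): Stream[IO, Byte] =
  Stream.eval(Ref.of[IO, ByteVector](ByteVector.empty)).flatMap { acc =>
    body.chunks
      .evalTap(chunk => acc.update(_ ++ chunk.toByteVector))
      .unchunks
      .onFinalize(acc.get.flatMap(log))
  }
```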
@danicheg What would the performance be if the code were to do the following: …
Hmm, according to the benchmark, we get the opposite result: the allocation rate is lower by 10-100% with the new impl. Or do I misunderstand the results?
@diesalbla Also, appending to an immutable vector leads to a vector allocation, doesn't it?
Yeah, but only a part of it. IIRC the internal implementation is a sort of tree, so some nodes are reused.
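A tiny illustration of that structural sharing (my own sketch, not from the thread):

```scala
// Appending to an immutable Vector allocates a new Vector object and the path
// to the new leaf, but most internal tree nodes are shared with the old value.
val v1: Vector[Int] = Vector.tabulate(1000)(identity)
val v2: Vector[Int] = v1 :+ 1000 // new Vector value, internals largely shared with v1
```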
Note that …
Looking at the ByteVector implementation, one thing I notice is that the sub-class … The implementation of …
I did some testing of …
Benchmark's code.
@danicheg Looking at the benchmark code, is that the right use of the …?
Edit: a bit more code:

```scala
def factor: Double = Math.abs(random.nextInt()) % (maxFactor - minFactor) + minFactor

def go(acc: Chunk[O], sizeOpt: Option[Int], s: Stream[F2, Chunk[O]]): Pull[F2, O, Unit] = {
  def nextSize(chunk: Chunk[O]): Int =
    sizeOpt.getOrElse((factor * chunk.size).toInt)
```

So in essence, the two factor parameters are used to bound the "coefficient" on what the size of an output chunk will be in proportion to the size of a source chunk.
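The snippet above looks like the body of fs2's `Stream.rechunkRandomlyWithSeed`; assuming that is the function in question, a minimal usage sketch (with made-up values) might look like this:

```scala
import cats.effect.IO
import fs2.Stream

// Hypothetical input: a 100 KiB body, emitted as one big chunk.
val source: Stream[IO, Byte] =
  Stream.emits(Array.fill[Byte](100 * 1024)(0)).covary[IO]

// Each output chunk's size is roughly factor * (incoming chunk size), where
// factor is drawn between minFactor and maxFactor from the seeded RNG.
val rechunked: Stream[IO, Byte] =
  source.rechunkRandomlyWithSeed(minFactor = 0.1, maxFactor = 2.0)(seed = 42L)
```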
@diesalbla I assumed that it doesn't matter. So, I've rerun the benchmark with …
So, discarding the fluctuation, we have the same results. And I'm still sure that playing with the arguments of that function could change something in the benchmark's numbers.
So, in the benchmarks, you have the following code:

```scala
fs2.Stream
  .emits(rawData)
  .covary[IO]
```

Where …
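As a side note on why rechunking matters here at all: `Stream.emits` wraps the whole input sequence in a single chunk, so without rechunking the observer would only ever see one big chunk. A tiny sketch (`rawData` here is a stand-in, not the benchmark's actual data):

```scala
import fs2.{Chunk, Stream}

// Stand-in for the benchmark's rawData (assumption).
val rawData: Seq[Byte] = Seq.fill[Byte](100 * 1024)(0)

val chunks: List[Chunk[Byte]] = Stream.emits(rawData).chunks.toList
// chunks.size == 1: the entire body arrives as a single chunk unless it is rechunked
```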
Since the whole response body is already materialized, there is no reason (besides the one described below) to allocate another Stream and materialize it further. That should reduce memory consumption vastly. But unfortunately, there is no free lunch. Although memory consumption goes down noticeably, building a `String` via `fromStream` is 30-40% faster for bodies of 100 KB and up than via `fromByteVector`. I don't quite follow the reasons, but that's how it is. Maybe there is some inaccuracy in the benchmark, so whoever wants to can dig into it.
TL;DR: reducing memory consumption comes with some degradation of throughput, so I'd like to get your thoughts.
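For concreteness, here is a minimal sketch of the two decoding paths being compared, written against the public fs2 and scodec-bits APIs rather than the actual http4s code:

```scala
import cats.effect.IO
import fs2.Stream
import scodec.bits.ByteVector

// Old path: the captured body is re-emitted as a Stream and decoded from it.
def stringFromStream(body: Stream[IO, Byte]): IO[String] =
  body.through(fs2.text.utf8.decode).compile.string

// New path: the captured body is already a single ByteVector; decode it directly.
def stringFromByteVector(body: ByteVector): IO[String] =
  IO.fromEither(body.decodeUtf8)
```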