Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Source <> InputStream operations and chunks #134

Merged
merged 8 commits into from May 13, 2024
Merged

Add Source <> InputStream operations and chunks #134

merged 8 commits into from May 13, 2024

Conversation

kciesielski
Copy link
Member

No description provided.

true
}
catch
case t: Throwable =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NonFatal! :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need a linter for that ;)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah no sorry 🤦 stream operators catch IE on purpose ... as these are propagated downstream. I discover this not the first time, I just need to document it somewhere visible ... ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This, and also I copied that from your code :) softwaremill/sttp#2158

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hehe yes, well, it just shows it's not that intuitive ;)

@kciesielski kciesielski marked this pull request as ready for review May 10, 2024 13:10
@kciesielski kciesielski requested a review from adamw May 10, 2024 13:10
@kciesielski kciesielski requested a review from adamw May 13, 2024 08:11
/** An immutable finite indexed sequence of elements, backed by Array. Raw rrays are expensive performance-wise when you want to do
* operations like concatenation, splitAt, drop, etc. Such operations are often useful when doing Source processing. `Chunk` offers a
* wrapper focused primarly on performance optimizations for such costly operations, while maintaining familiar API of an `IndexedSeq`.
* Underneath, a `Chunk` should leverage lazy data structures to avoid unnecessary data copying, which is a typical drawback for most Array
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it does? it looks like a wrapper on an array, are there any optimizations in palce?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no optimizations yet, that's why I wrote that it "should" use these optimizations. I intended this PR to expose the Chunk API but start with simple non-optimized array operations. It can then later evolved as needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, sounds good, let's just explicitly write that: Chunk for now is a think wrapper on arrays, in the future we want to optimize its operations

Copy link
Member

@adamw adamw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from chunk - looks good :) I mean, chunk is ok as well, but I'm not sure if it's complete

@adamw adamw merged commit 6d9502b into master May 13, 2024
5 checks passed
@adamw adamw deleted the byte-chunk branch May 13, 2024 13:51
@adamw
Copy link
Member

adamw commented May 13, 2024

Thanks :) Can you add follow-up tasks to (a) optimize the chunks, (b) add more input/output methods, I think reading/writing to/from File/Path would be a good start, anything else that you think is essential? Maybe also check with fs2/akka streams for inspiration

@kciesielski
Copy link
Member Author

Chunk task: #136
As for other operations I'd consider essential:

  • It may be useful to have a _.chunk method for Source[Byte], especially if we want to represent byte streams like this in Tapir. Conversely - _.unchunk.
  • I/O: File read/write, maybe also Sink.fromOutputStream?
  • Text: we could consider adding methods to decode/encode between Source[Chunk[Byte]] and Source[String] with given charset (UTF-8 for a start).
  • Reactive streams integration

@adamw
Copy link
Member

adamw commented May 14, 2024

I/O: File read/write, maybe also Sink.fromOutputStream?

Yes, or just source.toOutputStream(OutputStream), same as you might have source.toFile(File) (or Path). We could also write a Source[String] to a file, just as a Source[Chunk[Byte]].

Text: we could consider adding methods to decode/encode between Source[Chunk[Byte]] and Source[String] with given charset (UTF-8 for a start).

I guess .lines is such a method?

But let's open the tasks, we can discuss there further.

@kciesielski
Copy link
Member Author

Ok, tasks created:
#138
#137

I think encode/decode will be a different thing than .lines, not assuming line breaks but working on all kinds of text streams continually.

@adamw
Copy link
Member

adamw commented May 14, 2024

Thanks :) Ok, we'll need to document this properly then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants