Support for reading from buffer/streams? #79

Open
empz opened this issue Jul 15, 2020 · 12 comments
Labels
version-2 Saving this for Data-Forge version 2.

Comments

@empz
Contributor

empz commented Jul 15, 2020

I see data-forge uses papaparse under the hood to parse CSV files.

Papaparse allows reading from a stream when used in a Node environment (https://github.com/mholt/PapaParse/blob/master/README.md#papa-parse-for-node).

Can we allow such an option in the library?

One idea would be to make dataForge.fromCSV() accept either a string or a stream.
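
A minimal sketch of what a function accepting "either a string or a stream" could look like. This is not Data-Forge's or PapaParse's API: `readCsvRows` and `Row` are hypothetical names, the code uses only Node's built-in `stream` and `readline` modules, and the parsing is deliberately naive (plain comma splitting, no quoted fields).

```typescript
import { Readable } from "node:stream";
import * as readline from "node:readline";

type Row = Record<string, string>;

// Hypothetical helper, not part of Data-Forge or PapaParse.
// Assumption: fields contain no quotes, embedded commas or embedded newlines.
async function* readCsvRows(input: string | Readable): AsyncGenerator<Row> {
    // Normalise both input kinds to a stream so a single code path handles them.
    const source = typeof input === "string" ? Readable.from([input]) : input;
    const lines = readline.createInterface({ input: source, crlfDelay: Infinity });

    let header: string[] | undefined;
    for await (const line of lines) {
        if (line.trim() === "") continue; // skip blank lines
        const cells = line.split(",");
        if (!header) {
            header = cells; // the first non-blank line names the columns
            continue;
        }
        const row: Row = {};
        header.forEach((name, i) => {
            row[name] = cells[i] ?? "";
        });
        yield row; // rows are produced one at a time, never all at once
    }
}
```

Because rows are yielded one by one, a caller can process a large file without holding all of it in memory. Whether that fits Data-Forge's existing interface is a separate question.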

@ashleydavis
Member

It's always been the plan to support this and I even tried to implement it once. The problem is that it might require a very different interface and so I might have to save it for data-forge version 2.

I will come back to this again at some point and rethink it.

In the meantime, if you have any proposal on how this should work I'd love to discuss it with you!

ashleydavis added the version-2 label (Saving this for Data-Forge version 2.) on Jan 31, 2021
@olawalejuwonm

@ashleydavis I have an idea about this and can work on it, because I really need it at the moment.

@ashleydavis
Member

Hey @olawalejuwonm, I'd love to see if you could implement this. If it fits well I'd definitely like to include it in the library.

@rhesus

rhesus commented Jun 27, 2022

@olawalejuwonm Did you have any success with enabling streaming in papaparse, or with looking into some other CSV library? I want to use data-forge but am having problems with memory consumption, even for smaller files.

@ashleydavis I saw you split out the file system access, do you have any thoughts about trying to utilize temp files to help "batch data" and reduce memory usage?

@ashleydavis
Member

@rhesus I've decided to not attempt to implement streaming in Data-Forge. It's something I always wanted, but actually not something I ever turned out to need.

I'm more than happy for anyone to present a plan for adding streaming data to reduce memory usage.

A first step would be to create a project in GitHub that runs out of memory while processing a data file. That would give us something to centre our discussions on.

@rhesus

rhesus commented Jun 27, 2022

That's fair, I've been wanting to use it inside of lambdas and I've experienced several OOM issues. Probably just a case of trying to use the wrong tool for the job.

@ashleydavis
Member

Have you tried just breaking your data into smaller bundles that can be processed separately?

That's probably easier than trying to figure out how to upgrade Data-Forge.
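
One way to break a large CSV into smaller bundles is sketched here, using only Node's built-in modules. `csvBatches` is a hypothetical helper, not part of Data-Forge. It assumes the first line is the header and that no field contains an embedded newline. Each yielded string is a small, complete CSV document with the header repeated.

```typescript
import { Readable } from "node:stream";
import * as readline from "node:readline";

// Hypothetical helper, not part of Data-Forge.
// Assumption: first line is the header; no quoted fields spanning lines.
async function* csvBatches(source: Readable, rowsPerBatch: number): AsyncGenerator<string> {
    if (!Number.isInteger(rowsPerBatch) || rowsPerBatch < 1) {
        throw new RangeError("rowsPerBatch must be a positive integer");
    }
    const lines = readline.createInterface({ input: source, crlfDelay: Infinity });

    let header: string | undefined;
    let rows: string[] = [];
    for await (const line of lines) {
        if (header === undefined) {
            header = line; // remember the header so every batch can repeat it
            continue;
        }
        if (line === "") continue; // skip blank lines
        rows.push(line);
        if (rows.length === rowsPerBatch) {
            yield [header, ...rows].join("\n");
            rows = []; // drop the reference so the batch can be collected
        }
    }
    // Emit the final, possibly smaller, batch.
    if (header !== undefined && rows.length > 0) {
        yield [header, ...rows].join("\n");
    }
}
```

Each batch could then be passed to dataForge.fromCSV() and processed on its own, so only one bundle's worth of rows is held in memory at a time. For a file on disk, the source would typically come from `fs.createReadStream`.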

@olawalejuwonm

> Hey @olawalejuwonm, I'd love to see if you could implement this. If it fits well I'd definitely like to include it in the library.

Yes, can I open a PR for it?

@ashleydavis
Member

ashleydavis commented Jun 28, 2022

@olawalejuwonm of course!

A good way to start would be to log an issue describing how you would integrate the feature. Then we can discuss it there.

@olawalejuwonm

Sorry, I'm very familiar with JavaScript but quite new to TypeScript. Can you guide me on how to approach my first contribution, @ashleydavis?

> @olawalejuwonm of course!
>
> A good way to start would be to log an issue describing how you would integrate the feature. Then we can discuss it there.

@ashleydavis
Member

If you are new to TypeScript, I'd suggest you learn some before trying to contribute.

Then you can proceed in one of two ways:

  • Log an issue and describe what you want to achieve, how you think you might achieve it and we can discuss from there.
  • Or feel free to fork and hack something in, then we can discuss how to get to a pull request.

@olawalejuwonm

olawalejuwonm commented Aug 30, 2022 via email


4 participants