How to best use the code as a library? #253

Open
Robert-M-Muench opened this issue Jan 30, 2020 · 4 comments

Comments

@Robert-M-Muench

I like TSV utils a lot. I'm wondering: what's the best way to use the code as a library in my own applications?

Are there plans to separate the generic parts of the code into a library? Maybe even add them to Phobos? IMO that would make a lot of sense too.

@jondegenhardt
Contributor

Thanks! I'm glad you like the tools.

Can you describe the types of functionality you'd like to see in a library?

At present there are no plans to separate out and release library components from the individual applications. This could be done. It's mostly a matter of whether the time investment would be worthwhile.

That said, it is possible to use the functionality in the common directory as library components. For an example, see dcat-perf. It uses the buffered IO routines in the common directory to do some performance testing. Its dub.json file lists the dependency ("tsv-utils:common": "~>1.4.1"), and source/app.d imports and uses the IO routines (e.g. import tsv_utils.common.utils : bufferedByLine;).
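A minimal sketch of that setup, written as a single-file dub program instead of a separate dub.json (the package name line-count is hypothetical). It reuses the dependency string above and assumes bufferedByLine accepts a std.stdio.File, as it appears to in the tsv-utils sources; check the API docs for the version you pin. Run it with something like dub run --single line_count.d < input.tsv.

```d
#!/usr/bin/env dub
/+ dub.sdl:
    name "line-count"
    dependency "tsv-utils:common" version="~>1.4.1"
+/
// Hypothetical example program: counts the lines on standard input using the
// buffered line reader from the tsv-utils common package.
import std.stdio : stdin, writeln;
import tsv_utils.common.utils : bufferedByLine;

void main()
{
    size_t lineCount = 0;
    // Assumption: bufferedByLine takes a std.stdio.File (called here via UFCS).
    foreach (line; stdin.bufferedByLine)
        ++lineCount;
    writeln(lineCount);
}
```

Because the common modules aren't published as a stable library, pinning the version like this limits exposure to backward-incompatible changes.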

However, I wouldn't recommend this for anything really serious, simply because these features haven't been published with the intent of being a general library. For example, if the tsv-utils tools turn out to need a change, these modules may be changed in backward-incompatible ways. The features in common are well tested, though, and should be relatively solid.

There are some generally useful features in common, and I'd be interested in hearing whether any are useful to you. A good place to see the documentation is tsv-utils.dpldocs.info/tsv_utils.common.

I'm guessing, though, that many of the more desirable features are higher up the stack: csv-to-tsv conversion, sampling routines, filtering, uniquing, etc. I'm definitely interested in hearing your thoughts on this.

@jondegenhardt
Contributor

Well, I'll list a couple of things I thought of that would be library candidates.

One category is low-level utilities for manipulating TSV data. The main thing here is inputFieldReordering. It is useful as is, but the interface is a bit rough. It would be especially useful in conjunction with support for named fields. There are a couple of other worthwhile enhancements that could be added as well.
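To make the named-fields idea concrete, here is a minimal Phobos-only sketch of selecting and reordering fields by header name. It is not the inputFieldReordering API; fieldIndices and reorderFields are hypothetical helpers, and the sketch assumes tab-delimited lines, a header line, no escaping, and data lines that contain every requested field.

```d
import std.algorithm : map, splitter;
import std.array : array, join;
import std.exception : enforce;
import std.stdio : writeln;

/// Hypothetical helper: map requested field names to zero-based indices,
/// using the header line to look up each name's position.
size_t[] fieldIndices(string headerLine, const(string)[] names, string delim = "\t")
{
    size_t[string] positionOf;
    size_t i = 0;
    foreach (name; headerLine.splitter(delim))
        positionOf[name] = i++;

    size_t[] result;
    foreach (name; names)
    {
        auto p = name in positionOf;
        enforce(p !is null, "Unknown field name: " ~ name);
        result ~= *p;
    }
    return result;
}

/// Hypothetical helper: emit the selected fields of one data line in the
/// order given by `indices`. Assumes every index exists in the line.
string reorderFields(string line, const(size_t)[] indices, string delim = "\t")
{
    auto fields = line.splitter(delim).array;
    return indices.map!(i => fields[i]).join(delim);
}

void main()
{
    auto indices = fieldIndices("id\tname\tcolor", ["color", "id"]);
    writeln(reorderFields("7\tapple\tred", indices)); // prints "red", a tab, then "7"
}
```

Resolving names once against the header and then working with plain indices keeps the per-line work the same as for positional field lists.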

Another category is algorithms that could be applied to streaming data generally. Quite a lot of tsv-utils is designed to operate on input streams of indefinite or infinite length. tsv-filter is one example, and a number of other tools and algorithms work this way as well. It would be helpful to try these in some alternative, representative environments before turning them into library utilities. That would help ensure the API provided is general enough.
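As one illustration of an algorithm that works on input of unknown or unbounded length, here is a minimal sketch of reservoir sampling (Algorithm R), a standard technique for keeping a uniform random sample of k items in a single pass with O(k) memory. It uses only Phobos; reservoirSample is a hypothetical name, and this is not necessarily how tsv-utils implements its sampling.

```d
import std.algorithm : map;
import std.random : Random, uniform, unpredictableSeed;
import std.range.primitives : ElementType, isInputRange;
import std.stdio : stdin, writeln;

/// Hypothetical helper: uniform random sample of up to k elements from an
/// input range of unknown length, in one pass and O(k) memory (Algorithm R).
ElementType!R[] reservoirSample(R)(R input, size_t k, ref Random rng)
if (isInputRange!R)
{
    ElementType!R[] reservoir;
    reservoir.reserve(k);
    size_t seen = 0;
    foreach (item; input)
    {
        if (reservoir.length < k)
        {
            // Fill the reservoir with the first k elements.
            reservoir ~= item;
        }
        else
        {
            // Element number seen+1 replaces a random slot with probability k/(seen+1).
            immutable j = uniform(0, seen + 1, rng);
            if (j < k)
                reservoir[j] = item;
        }
        ++seen;
    }
    return reservoir;
}

void main()
{
    auto rng = Random(unpredictableSeed);
    // byLine reuses its buffer, so each line is copied (idup) before it is kept.
    auto sample = reservoirSample(stdin.byLine.map!(line => line.idup), 5, rng);
    foreach (line; sample)
        writeln(line);
}
```

Taking any input range rather than a file is one way to keep such an algorithm usable outside the command-line tools.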

There are also some algorithms useful outside the context of an input stream. This is a smaller set, but there are useful things that could be done.

@Robert-M-Muench
Author

Sorry for answering late. Here are some thoughts/ideas:

  • High-level functions for loading & saving (with implicit CSV-to-TSV)
  • Merging/Joining several files by key
  • Working with several files at the same time
  • Search & Replace
  • Access by position/header name
  • Syntax/format checking
  • Repairing (quotes, escapes)
  • Transparently supporting big CSV files (that might fit your streaming point)
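The merging/joining item in the list above can be sketched as a hash join: index the smaller file by its key field, then walk the other file line by line and emit each line whose key is found. This is a minimal Phobos-only sketch assuming tab-delimited input without header handling; indexByKey and joinByKey are hypothetical names, not an existing tsv-utils API.

```d
import std.algorithm : splitter;
import std.array : array;
import std.stdio : writeln;

/// Hypothetical helper: index lines of the (smaller) file by the value of
/// field keyField (zero-based). A later line with the same key wins.
string[string] indexByKey(string[] lines, size_t keyField, string delim = "\t")
{
    string[string] index;
    foreach (line; lines)
    {
        auto fields = line.splitter(delim).array;
        if (keyField < fields.length)
            index[fields[keyField]] = line;
    }
    return index;
}

/// Hypothetical helper: for each data line whose key is in the index, emit
/// the data line followed by the matching indexed line.
string[] joinByKey(string[] dataLines, string[string] index, size_t keyField, string delim = "\t")
{
    string[] joined;
    foreach (line; dataLines)
    {
        auto fields = line.splitter(delim).array;
        if (keyField >= fields.length)
            continue; // Skip lines too short to have a key.
        if (auto match = fields[keyField] in index)
            joined ~= line ~ delim ~ *match;
    }
    return joined;
}

void main()
{
    auto colors = indexByKey(["1\tred", "2\tblue"], 0);
    foreach (line; joinByKey(["1\tapple", "3\tpear", "2\tsky"], colors, 0))
        writeln(line);
}
```

Only the indexed file has to fit in memory, so the other side could just as well be read incrementally from a large file.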

@jondegenhardt
Contributor

Thanks, that's a useful list.
