How to best use the code as a library? #253

Robert-M-Muench · 2020-01-30T09:57:20Z

I like TSV utils a lot. I'm wondering what the best way is to use the code as a library in my own applications?

Are there plans to separate the generic parts of the code into a library? Maybe even add them to Phobos? IMO that would make a lot of sense too.

jondegenhardt · 2020-01-30T18:03:34Z

Thanks! I'm glad you like the tools.

Can you describe the types of functionality you'd like to see in a library?

At present there are no plans to separate out and release library components from the individual applications. This could be done. It's mostly a matter of whether the time investment would be worthwhile.

That said, it is possible to use the functionality in the common directory as library components. For an example, see dcat-perf. It uses the buffered IO routines in the common directory to do some performance testing. The dub.json file lists the dependency ("tsv-utils:common": "~>1.4.1"), and source/app.d imports and uses the IO routines (e.g. import tsv_utils.common.utils : bufferedByLine;).
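As a rough sketch of that setup, the single-file dub package below declares the same dependency and reads standard input with bufferedByLine. The package name count-lines and the counting logic are made up for illustration and are not taken from dcat-perf. It also assumes that bufferedByLine can be called on a std.stdio.File with its default template arguments and that it then yields lines without their terminators.

```d
#!/usr/bin/env dub
/+ dub.json:
{
    "name": "count-lines",
    "dependencies": {
        "tsv-utils:common": "~>1.4.1"
    }
}
+/

// Illustrative only: counts lines and bytes read from standard input.
import std.stdio : stdin, writeln;
import tsv_utils.common.utils : bufferedByLine;

void main()
{
    size_t lineCount = 0;
    size_t byteCount = 0;

    // Assumption: default template arguments, so each `line` excludes the newline.
    foreach (line; stdin.bufferedByLine)
    {
        ++lineCount;
        byteCount += line.length;
    }

    writeln(lineCount, " lines, ", byteCount, " bytes (excluding newlines)");
}
```

Saved as app.d, this should be runnable with something like `dub run --single app.d < data.tsv`. Pinning the version with "~>1.4.1" matters here, since the common modules may change in incompatible ways between releases.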

However, I wouldn't recommend this for anything really serious, simply because these features haven't been published with the intent of being a general library. For example, if it turns out that the tsv-utils tools need a change, these modules may be changed in a non-backward-compatible way. The features in common are well tested, though, and should be relatively solid.

There are some generally useful features in common, and I'd be interested in hearing whether any are useful to you. A good place to see the documentation is: tsv-utils.dpldocs.info/tsv_utils.common.

I'm guessing, though, that many of the more desirable features are higher up the stack: csv-to-tsv conversion, sampling routines, filtering, uniquing, etc. I'm definitely interested in hearing your thoughts on this.

jondegenhardt · 2020-02-06T05:52:16Z

Well, I'll list a couple of things I thought of that would be library candidates.

One category is low-level utilities for manipulating TSV data. Here the main thing is inputFieldReordering. As it stands it is useful, but the interface is a bit rough. It would be especially useful in conjunction with support for named fields. There are a couple of other worthwhile enhancements that could be added as well.

Another category is algorithms that could be applied to streaming data generally. Quite a lot of tsv-utils is designed to operate on indefinite or infinite length input streams: tsv-filter, for example, along with a number of other tools and algorithms. It would be helpful to try these in some alternative, representative environments prior to turning them into library utilities. That would help ensure that a sufficiently general API is being provided.
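To make the streaming idea concrete, here is a minimal sketch of a lazy, line-at-a-time filter written against plain Phobos ranges. It is not the tsv-filter implementation and does not use any tsv-utils API. The helper name filterFieldGreaterThan, the 0-based field index, and the threshold are all hypothetical.

```d
import std.algorithm : filter, splitter;
import std.conv : to;
import std.range : drop, isInputRange;
import std.stdio : stdin, writeln;

/// Lazily keeps lines whose tab-separated field at `fieldIndex` (0-based)
/// is numerically greater than `threshold`. Lines with too few fields are
/// dropped. Hypothetical helper, not part of tsv-utils.
auto filterFieldGreaterThan(R)(R lines, size_t fieldIndex, double threshold)
if (isInputRange!R)
{
    return lines.filter!((line) {
        // Skip ahead to the requested field without splitting the whole line.
        auto fields = line.splitter('\t').drop(fieldIndex);
        // Note: to!double throws ConvException on non-numeric text (e.g. a header).
        return !fields.empty && fields.front.to!double > threshold;
    });
}

void main()
{
    // Processes one line at a time, so the input can be arbitrarily long.
    foreach (line; stdin.byLine.filterFieldGreaterThan(1, 100.0))
        writeln(line);
}
```

Because the helper accepts any input range of lines, the same code works on an in-memory array of strings or on a file being read incrementally, which is the kind of generality a library API would need to be tested for.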

There are also some algorithms useful outside the context of an input stream. This is a smaller set, but there are useful things that could be done.

Robert-M-Muench · 2020-02-08T17:02:02Z

Sorry for answering late. Here are some thoughts/ideas:

High-level functions for loading & saving (with implicit CSV-to-TSV)
Merging/Joining several files by key
Working with several files at the same time
Search & Replace
Access by position/header name
Syntax/format checking
Repairing (quotes, escapes)
Transparently supporting big CSV files (that might fit your streaming point)

jondegenhardt · 2020-02-09T04:14:51Z

Thanks, that's a useful list.

jondegenhardt added enhancement question labels Jan 30, 2020
