v1.5.0 Release

jondegenhardt released this 16 Feb 06:34
v1.5.0
31a318e

To download and unpack prebuilt binaries:

$ # Linux
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.5.0/tsv-utils-v1.5.0_linux-x86_64_ldc2.tar.gz | tar xz

$ # macOS
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.5.0/tsv-utils-v1.5.0_osx-x86_64_ldc2.tar.gz | tar xz

Installation instructions are in the ReleasePackageReadme.txt file in the release package.

To be notified of new releases:

GitHub supports notification of new releases. Click the "Watch" button on the repository page and select "Releases Only".

Release 1.5.0 Changes:

  • Prebuilt binaries have been updated to use the latest LDC compiler (1.20.0).

  • tsv-filter: Field list support (PR #259).

    Field lists provide a compact way to specify multiple fields for a command. Most tsv-utils tools already support field lists; tsv-filter now does as well. Examples:

    $ # Select lines where fields 1-10 are not empty.
    $ tsv-filter --not-empty 1-10 data.tsv
    
    $ # Select lines where fields 1-5 and 17 are less than 100.
    $ tsv-filter --lt 1-5,17:100 data.tsv
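
To make the field-list notation concrete, here is a small shell sketch that expands a list such as 1-5,17 into individual field numbers. The function name expand_field_list is hypothetical and is not part of tsv-utils. The sketch handles only comma-separated fields and ascending ranges, and it makes no claim about how tsv-filter parses field lists internally.

```shell
# Hypothetical helper (not part of tsv-utils): expand a field list such as
# "1-5,17" into one field number per line.
expand_field_list() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        # A single field such as "17" has no upper bound; treat it as 17-17.
        [ -n "$last" ] || last=$first
        i=$first
        while [ "$i" -le "$last" ]; do
            printf '%s\n' "$i"
            i=$((i + 1))
        done
    done
}

# Print the expanded list on one line, separated by spaces.
expand_field_list 1-5,17 | paste -sd' ' -
```

Read this way, a test such as --lt 1-5,17:100 applies to each of the six listed fields, matching the example's description.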
    
  • tsv-filter: New field length tests based on either characters or bytes (PR #258).

    The new operators allow filtering on field length, measured in either characters or bytes. (A single character can occupy multiple bytes in UTF-8.) Examples:

    $ # Keep only lines where field 3 is less than 50 characters
    $ tsv-filter --char-len-lt 3:50 data.tsv
    
    $ # Find lines where field 5 is more than 20 bytes
    $ tsv-filter --byte-len-gt 5:20 data.tsv
    

    Character length tests have names of the form: --char-len-[eq|ne|lt|le|gt|ge]. Byte length tests have names of the form: --byte-len-[eq|ne|lt|le|gt|ge].
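
The difference between the two measures can be seen with a short shell sketch. It assumes that "characters" means Unicode code points in UTF-8 text. The helper names byte_len and char_len are hypothetical and are not part of tsv-utils. The character count works by discarding UTF-8 continuation bytes (0x80-0xBF) and counting the bytes that remain.

```shell
# Hypothetical helper: number of bytes in a string.
byte_len() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

# Hypothetical helper: number of UTF-8 code points in a string.
# Every code point has exactly one byte that is not a continuation byte,
# so dropping bytes 0x80-0xBF leaves one byte per code point.
char_len() {
    printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | wc -c | tr -d ' '
}

# "naïve" written with octal escapes so the script does not depend on
# the encoding of the file it is saved in (ï is 0xC3 0xAF in UTF-8).
word=$(printf 'na\303\257ve')
echo "bytes: $(byte_len "$word")"   # prints "bytes: 6"
echo "chars: $(char_len "$word")"   # prints "chars: 5"
```

A field holding this word would therefore pass a character-length test of "less than 6" but fail the corresponding byte-length test.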

  • tsv-filter: Improved error messages when invalid regular expressions are used.

    The error message printed by tsv-filter now includes the error text provided by the D regular expression engine. This is helpful when trying to debug complex regular expressions. Examples:

    $ # Old error message (tsv-filter 1.4.4)
    $ tsv-filter --regex 4:'abc(d|e' data.tsv
    [tsv-filter] Error processing command line arguments: Invalid values in option: '--regex 4:abc(d|e'. Expected: '--regex <field>:<val>' where <field> is a number and <val> is a regular expression.
    
    $ # New error message (tsv-filter 1.5.0)
    $ tsv-filter --regex 4:'abc(d|e' data.tsv
    [tsv-filter] Error processing command line arguments: Invalid regular expression: '--regex 4:abc(d|e'. no matching ')'
    Pattern with error: `abc(d|e` <--HERE-- ``
       Expected: '--regex <field>:<val>' or '--regex <field-list>:<val>' where <val> is a regular expression.
    

    The formatting of this message could be improved and is likely to change in a future release.

  • tsv-uniq: Performance improvements (PRs #234, #235).

    Better memory management and other changes improved tsv-uniq performance by 5-35% depending on the operation.

  • tsv-sample: Performance improvements reading large data blocks from standard input (PR #238).

    Sampling and shuffling operations that require all data to be read into memory were unnecessarily slow when large amounts of data were read from standard input. Performance issues were noticed with data sizes larger than 10 GB. This is now fixed.

  • Sample bash scripts included in release package (PR #254).

    Sample versions of the tsv-sort and tsv-sort-fast scripts described on the Tips and Tricks page are now included in the repository and in prebuilt binary packages.
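
For readers unfamiliar with the idea, a wrapper of this kind might look like the following sketch. It is an assumption about the general approach (calling the standard sort utility with TAB as the field delimiter, and using byte-order collation for speed), not a copy of the packaged scripts, which may set additional options. The function names tsv_sort and tsv_sort_fast are hypothetical stand-ins written with underscores to stay within POSIX sh.

```shell
# A literal TAB character, obtained portably.
TAB=$(printf '\t')

# Hypothetical wrapper: run sort with TAB as the field delimiter.
tsv_sort() {
    sort -t "$TAB" "$@"
}

# Hypothetical faster variant: the C locale compares raw bytes,
# which avoids locale-aware collation.
tsv_sort_fast() {
    LC_ALL=C sort -t "$TAB" "$@"
}

# Sort three TSV records numerically on the second field.
printf 'b\t2\na\t10\nc\t1\n' | tsv_sort -k2,2n
```

Remaining arguments are passed straight through to sort, so the usual key options (for example -k2,2n) apply to TAB-separated fields.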