Releases: wfondrie/depthcharge

Depthcharge v0.4.8

10 May 08:13
486221c

[v0.4.8]

Changed

  • Tokenizer.detokenize() now truncates the output to the first stop token it finds, if trim_stop_token=True.
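The truncation rule can be illustrated with a small, self-contained sketch. It is not depthcharge's implementation: the function name `detokenize_sketch`, the plain list-of-strings token representation, and the `"$"` stop token are assumptions made for illustration.

```python
from typing import List


def detokenize_sketch(
    tokens: List[str], stop_token: str = "$", trim_stop_token: bool = True
) -> str:
    """Join tokens into a string, optionally truncating at the first stop token.

    Hypothetical sketch: tokens are plain strings and "$" is an assumed stop token.
    """
    if trim_stop_token and stop_token in tokens:
        # Keep only the tokens that precede the FIRST stop token; anything
        # generated after it (including later stop tokens) is discarded.
        tokens = tokens[: tokens.index(stop_token)]
    return "".join(tokens)


print(detokenize_sketch(["P", "E", "P", "$", "K", "$"]))  # PEP
```

Cutting at the first stop token matters for autoregressive decoding, where a model may keep emitting tokens after it has already signaled the end of a sequence.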

Depthcharge v0.4.7

09 May 23:20
1b53a35

[v0.4.7]

Fixed

  • Added start and stop tokens for AnnotatedSpectrumDataset, when available.
  • When reverse is used with the PeptideTokenizer, the decoded peptide is now automatically reversed back.
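A minimal round-trip sketch of the reverse behavior, assuming a toy tokenizer with one token per residue. The function names are hypothetical and this is not the PeptideTokenizer API; real peptide tokenizers also parse modifications.

```python
from typing import List


def tokenize_sketch(peptide: str, reverse: bool = False) -> List[str]:
    # Hypothetical: one token per residue; modifications are not handled here.
    tokens = list(peptide)
    return tokens[::-1] if reverse else tokens


def detokenize_sketch(tokens: List[str], reverse: bool = False) -> str:
    # Undo the reversal from tokenization so the peptide reads N- to C-terminus again.
    if reverse:
        tokens = tokens[::-1]
    return "".join(tokens)


tokens = tokenize_sketch("PEPTIDEK", reverse=True)
print(tokens[:3])  # ['K', 'E', 'D']
print(detokenize_sketch(tokens, reverse=True))  # PEPTIDEK
```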

Depthcharge v0.4.6

08 May 06:00
3ca2297

[v0.4.6]

Added

  • Added support for unsigned modification masses that don't quite conform to the ProForma standard.
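One way to accept such input is to normalize unsigned bracketed masses into explicit positive mass shifts before parsing. This sketch assumes the non-conforming form looks like `PEPM[15.9949]TIDEK`, where a conforming ProForma mass shift would carry a sign (`[+15.9949]`). The helper `add_mass_signs` is hypothetical and does not describe how depthcharge handles these strings internally.

```python
import re

# Matches a bracketed mass with no leading sign, such as "[15.9949]".
# Signed masses ("[+15.9949]", "[-18.0106]") and names ("[Oxidation]") do not match.
_UNSIGNED_MASS = re.compile(r"\[(\d+(?:\.\d+)?)\]")


def add_mass_signs(sequence: str) -> str:
    """Rewrite unsigned bracketed masses as explicit positive mass shifts."""
    return _UNSIGNED_MASS.sub(r"[+\1]", sequence)


print(add_mass_signs("PEPM[15.9949]TIDEK"))  # PEPM[+15.9949]TIDEK
```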

Depthcharge v0.4.5

30 Apr 18:08
8519369

Changed

  • The scan_id column for parsed spectra is now a string instead of an integer. This is less space efficient, but we ran into issues with Sciex indexing when trying to use only integers.
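An illustration of why integer scan IDs can break down. The identifiers below follow the common mzML nativeID conventions for Thermo and Sciex WIFF files. The shortcut `naive_scan_number` is hypothetical, and this example is not a record of the specific failure depthcharge encountered.

```python
from typing import List


def naive_scan_number(native_id: str) -> int:
    # Hypothetical shortcut: keep only the integer after the last "=".
    return int(native_id.rsplit("=", 1)[-1])


thermo_ids: List[str] = [
    "controllerType=0 controllerNumber=1 scan=1",
    "controllerType=0 controllerNumber=1 scan=2",
]
sciex_ids: List[str] = [
    "sample=1 period=1 cycle=1 experiment=1",
    "sample=1 period=1 cycle=1 experiment=2",
    "sample=1 period=1 cycle=2 experiment=1",
]

# Thermo-style IDs reduce to unique integers...
print([naive_scan_number(i) for i in thermo_ids])  # [1, 2]
# ...but WIFF-style IDs collide, because the last field is the experiment, not a scan.
print([naive_scan_number(i) for i in sciex_ids])  # [1, 2, 1]
# Keeping the full string preserves uniqueness.
print(len(set(sciex_ids)))  # 3
```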

Depthcharge v0.4.4

29 Apr 22:19
b8be2e2

Changed

  • Partially revert length changes to SpectrumDataset and AnnotatedSpectrumDataset. We removed __len__ from both due to problems with PyTorch Lightning compatibility.
  • Simplify dataset code by removing redundancy with lance.pytorch.LanceDataset.
  • Improved warning message for skipped spectra.

Depthcharge v0.4.3

26 Apr 06:28
15d52f4

Changed

  • The length of SpectrumDataset and AnnotatedSpectrumDataset now reflects the samples parameter of the lance.pytorch.LanceDataset parent class.

Depthcharge v0.4.2

25 Apr 06:27
35bf3e7

Changed

  • The length of SpectrumDataset and AnnotatedSpectrumDataset is now the number of batches, not the number of spectra. This lets tools like PyTorch Lightning create their progress bars properly.
  • Parsing a dataset no longer requires reading essentially the entire first file; instead, the schema is inferred from the first 128 spectra.
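Reporting length in batches rather than spectra comes down to a ceiling division. This is a generic sketch, and the `drop_last` option mirrors the common PyTorch DataLoader convention; it is an assumption here, not a documented depthcharge parameter.

```python
def n_batches(n_spectra: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches an epoch produces for a given number of spectra."""
    if drop_last:
        # A trailing partial batch is discarded.
        return n_spectra // batch_size
    # Ceiling division: a trailing partial batch still counts as one batch.
    return (n_spectra + batch_size - 1) // batch_size


print(n_batches(1000, 128))                  # 8
print(n_batches(1000, 128, drop_last=True))  # 7
```

Integer arithmetic avoids the floating-point rounding that `math.ceil(n / b)` can hit for very large counts.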

Depthcharge v0.4.1

19 Apr 22:23
d46adf1

Added

  • Significant updates to documentation, including a guide on how to model mass spectra.
  • Reading from and writing to cloud storage now works everywhere!

Changed

  • Migrated to mike for MkDocs to manage multiple documentation versions.
  • Moved the test GitHub Action from pip to uv.

Depthcharge v0.4.0

17 Apr 20:22
98035ec

We have completely reworked the data module.
Depthcharge now uses Apache Arrow-based formats instead of HDF5; spectra are either converted to Parquet or streamed with PyArrow, optionally into Lance datasets.

We now also have full support for small molecules, with the MoleculeTokenizer,
AnalyteTransformerEncoder, and AnalyteTransformerDecoder classes.

Breaking Changes

  • PeptideTransformer* classes are now AnalyteTransformer*, providing full support for small molecule analytes. Additionally, the interface has been completely reworked.
  • Mass spectrometry data parsers now function as iterators, yielding batches of spectra as pyarrow.RecordBatch objects.
  • Parsers can now be told to read arbitrary fields from their respective file formats with the custom_fields parameter.
  • The parsing functionality of SpectrumDataset and its subclasses has been moved to the spectra_to_* functions in the data module.
  • SpectrumDataset and its subclasses now return dictionaries of data rather than a tuple of data. This allows us to incorporate arbitrary additional data.
  • SpectrumDataset and its subclasses are now lance.torch.data.LanceDataset subclasses, providing native PyTorch integration.
  • Dataset classes no longer have a loader() method.
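The iterator-style parsing described above can be sketched without any dependencies. Depthcharge's parsers yield `pyarrow.RecordBatch` objects; this sketch instead yields plain lists of per-spectrum dicts, and the field names (`scan_id`, `mz_array`) are hypothetical. With PyArrow installed, each list could be converted with `pyarrow.RecordBatch.from_pylist`.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def iter_batches(spectra: Iterable[Dict], batch_size: int) -> Iterator[List[Dict]]:
    """Lazily yield lists of at most batch_size spectra."""
    it = iter(spectra)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return  # input exhausted
        yield batch


# Hypothetical spectra: one dict of fields per spectrum.
spectra = ({"scan_id": str(i), "mz_array": [100.0 + i]} for i in range(5))
print([len(batch) for batch in iter_batches(spectra, 2)])  # [2, 2, 1]
```

Because batches are produced on demand, only one batch needs to be held in memory at a time, regardless of file size.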

Added

  • Support for small molecules.
  • Added the StreamingSpectrumDataset for fast inference.
  • Added spectra_to_df, spectra_to_parquet, and spectra_to_stream to the depthcharge.data module.

Changed

  • Determining the mass spectrometry data file format is now less fragile.
    It now looks for known contents in the file rather than relying on the extension.
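Content-based detection can be sketched as scanning the first lines of a file for format-specific markers. The marker strings below are the usual root elements of mzML and mzXML and the record header of MGF. The function names and the line limit are assumptions, and depthcharge's actual checks may differ.

```python
from itertools import islice
from typing import Iterable

# Strings that identify each format near the start of a file.
_MARKERS = {
    "mzML": ("<mzML", "<indexedmzML"),
    "mzXML": ("<mzXML",),
    "MGF": ("BEGIN IONS",),
}


def sniff_lines(lines: Iterable[str], max_lines: int = 100) -> str:
    """Return the format whose marker appears first in the leading lines."""
    for line in islice(lines, max_lines):
        for fmt, markers in _MARKERS.items():
            if any(marker in line for marker in markers):
                return fmt
    raise ValueError("Unrecognized mass spectrometry file format")


def sniff_format(path: str, max_lines: int = 100) -> str:
    # The file extension is ignored entirely; only the contents matter.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return sniff_lines(handle, max_lines)


print(sniff_lines(['<?xml version="1.0"?>', '<indexedmzML xmlns="x">']))  # mzML
print(sniff_lines(["# global parameters", "BEGIN IONS", "PEPMASS=500.0"]))  # MGF
```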

depthcharge v0.3.1

19 Aug 04:02
c18fa1c

[v0.3.1] - 2023-08-18

Added

  • Added support to the FloatEncoder and PeakEncoder for fine-tuning the wavelengths used to encode floating point numbers, such as m/z and intensity.
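A sketch of a sinusoidal float encoding with a configurable wavelength range, in the style of transformer positional encodings. The parameter names `min_wavelength` and `max_wavelength`, their defaults, and the sines-then-cosines layout are assumptions for illustration, not the FloatEncoder's documented behavior.

```python
import math
from typing import List


def encode_float(
    value: float,
    dim: int,
    min_wavelength: float = 0.001,
    max_wavelength: float = 10000.0,
) -> List[float]:
    """Encode a scalar as dim // 2 sines followed by dim // 2 cosines."""
    if dim < 4 or dim % 2:
        raise ValueError("dim must be an even number >= 4")
    n = dim // 2
    ratio = max_wavelength / min_wavelength
    # Wavelengths are spaced geometrically from min_wavelength to max_wavelength.
    wavelengths = [min_wavelength * ratio ** (i / (n - 1)) for i in range(n)]
    sines = [math.sin(2 * math.pi * value / w) for w in wavelengths]
    cosines = [math.cos(2 * math.pi * value / w) for w in wavelengths]
    return sines + cosines


print(len(encode_float(512.3, dim=8)))  # 8
```

Short wavelengths distinguish small differences (for example, nearby m/z values), while long wavelengths keep large values from aliasing. Tuning the range lets the encoding match the precision and dynamic range of the data.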

Fixed

  • The tgt_mask in the PeptideTransformerDecoder had the incorrect type.
    Now it is bool, as it should be.
    Thanks @justin-a-sanders!
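For context, a boolean causal mask for a decoder marks future positions as blocked. PyTorch's transformer layers treat `True` in a boolean mask as "not allowed to attend". This dependency-free sketch builds the same pattern as `torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)`. The helper name is hypothetical and this is not the decoder's actual code.

```python
from typing import List


def causal_bool_mask(size: int) -> List[List[bool]]:
    # True marks positions a query may NOT attend to (future tokens),
    # matching PyTorch's convention for boolean attention masks.
    return [[col > row for col in range(size)] for row in range(size)]


for row in causal_bool_mask(3):
    print(row)
# [False, True, True]
# [False, False, True]
# [False, False, False]
```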