Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Unreleased

Added

  • Added remove_profile_spectra filter
  • Allowed peaks to have any floating point dtype
  • Added require_matching_ionmode_and_adduct filter
  • Added remove_noise_below_frequent_intensities

Removed

  • require_precursor_below_mz is deprecated; require_precursor_mz now also accepts the argument maximum_mz

[0.25.0] - 2024-05-21

Added

  • Filters require_formula and require_compound_name #627
  • Filters require_retention_time and require_retention_index #585

Changed

  • Removed repair_precursor_is_parent_mass
  • repair_adduct_based_on_smiles no longer repairs the adducts [M]+ and [M]-, since these cases could also result from the parent mass having been entered instead of the precursor m/z.
  • repair_parent_mass_is_molar_mass now only repairs the parent mass and no longer changes the precursor m/z.
  • Renamed repair_parent_mass_is_mol_wt to repair_parent_mass_is_molar_mass
  • Pinned the RDKit version to rdkit = ">=2023.3.2,<2023.9.5" to fix installation issues.
  • SpectrumProcessor now tries to save incrementally when the destination file is of type .msp or .mgf
  • Use StackedSparseArray for MetadataMatch equal_match when array_type is sparse #642

[0.24.4] - 2024-01-16

Changed

  • Pipeline now returns the processing_report

[0.24.3] - 2024-01-16

Changed

  • Removed repair_precursor_is_parent_mass

  • Removed option accept_parent_mass_is_mol_wt in repair_adduct_based_on_smiles

  • Merged require_precursor_mz and require_precursor_mz_below_mz into require_precursor_mz_below_mz

  • Added repair_adduct_based_on_parent_mass

  • Changed repair_adduct_and_parent_mass_based_on_smiles to update parent mass to the monoisotopic mass of the smiles, instead of updating based on precursor_mz and new adduct.

0.24.1 - 2024-01-16

  • derive_ionmode now also derives the ionmode from the charge; previously it was derived only from the adduct.

Fixed

  • Fix to handle spectra with empty peak arrays. #598
  • Fix instability introduced in CosineGreedy by np.argsort. #595

Changed

  • Speed up save_as_mgf by avoiding repeated file opening
  • Code refactoring for import functions #593.

0.24.0 - 2023-11-21

Added

  • Option to set custom key replacements #547
  • Option to set the export style in save_as_mgf and save_as_json, allowing styles other than the matchms style, such as nist, riken or gnps #557
  • Added a save spectra function that automatically saves in the specified file format #543
  • Added saving function in SpectrumProcessor #543

Fixed

  • Fixed bug when loading empty metadata in msp #548
  • Handle missing precursor_mz in representation and #452 introduced by #514 #540
  • Fixed retention time harmonization for msp files #551
  • Fix closing mgf file after loading and prevent reopening. #555

Changed

  • Renamed derive_smiles_from_pubchem_compound_name_search to derive_annotation_from_compound_name. #559
  • derive_annotation_from_compound_name no longer adds a SMILES or InChI when it cannot be interpreted by RDKit. #559
  • Refactored SpectrumProcessor. Reduced code repetition and improved modularity. Matchms filters can now be added as functions and in a different position than specified. #565
  • The default pipelines now store matchms functions instead of string representations. #565
  • The option to add predefined pipelines to SpectrumProcessor has been removed. Predefined pipelines can now just be added by adding the default_pipelines (which is a list) to the filters parameter. #565

[0.23.1] - 2023-10-18

Added

  • Additional tests for filter pipeline order
  • ProcessingReport. This adds an overview of the number of spectra changed by each filter step. (multiple PRs)
  • repair_not_matching_annotation filter #505
  • Missing docstring documentation #507

Changed

  • Logger warning for rdkit molecule conversion #507
  • repair_smiles_from_compound_name now works without matchmsextras #509
    • pubchempy was added as a dependency
  • Default filters are now stored in the yaml file as separate filters #496
  • Duplicated filters are only added once to the pipeline #524
  • Custom filters are added after default filters or at a position specified by the user #498
  • The file structure of metadata_utils was refactored #503
  • interpret_pepmass now removes the pepmass field after entering precursor_mz #533
  • Filters that did not have any effect are also mentioned in processing report #530
  • Added regex to pepmass reading to properly interpret string representations #539

Fixed

  • handle missing weight information in repair_parent_mass_is_mol_wt filter #507
  • handle missing smiles in repair_smiles_of_salts filter #507
  • The filter settings are now stored as well in logging. #536

0.22.0 - 2023-08-18

Added

  • New SpectrumProcessing class to be the central hub for all filter functions #455. It also ensures that filters are executed in a sensible order. This is also integrated into the Pipeline class.

Changed

  • Adjustment to logger levels to remove uninformative warnings #484 and #487.
  • Extensive code refactoring and cleaning.
  • Pipeline class refactoring; loading of the yaml file now happens outside the Pipeline class #479
  • Yaml file now stores individual filters in the correct order #480
  • File names are not stored in yaml file anymore, they are now supplied when calling run in Pipeline #481
  • Yaml does not store logging information and spectrum files anymore #481 and #482

0.21.2 - 2023-08-01

Changed

  • No more warning in the interpret_pepmass filter step if the precursor m/z field is updated but the change is < 0.001 #460.
  • Using poetry as the build system #466

Fixed

  • reading MoNA msp files which specify RT in minutes #462
  • added missing pyyaml dependency #463

0.21.1 - 2023-07-03

Added

  • missing code documentation #454

Changed

  • Moved matchms filter functions into new folder structure #454.
  • Removed outdated (redundant) filters: make_ionmode_lowercase and set_ionmode_na_when_missing #454.

0.21.0 - 2023-06-30

Added

  • New filter functions to repair SMILES that do not match the parent mass #440
    • Updated adduct conversion and known adducts
    • added repair_adduct_based_on_smiles
    • added repair_parent_mass_is_mol_wt
    • added repair_precursor_is_parent_mass
    • added repair_smiles_of_salts
    • added require_parent_mass_match_smiles
    • added function to combine this in repair_parent_mass_match_smiles_wrapper
  • Added repair_smiles_from_compound_name #448
  • Added require_correct_ionmode #449
  • Added require_valid_annotation #451

Changed

  • Use pandas for loading adducts dict
  • Moved functions from add_parent_mass to derive_precursor_mz_and_parent_mass from
  • Updated reiterate_peak_comments function to convert the peak_comments keys to float #437
  • Removed filter_by_range non-inplace version #438
  • Updated regex in get_peak_values function #439

Fixed

  • Fixed mistake in calculating parent mass from adduct
  • Added metadata_harmonization parameter to load_spectra function #443

0.20.0 - 2023-05-30

Added

  • min_mz, max_mz and title parameters to spectrum plot (mostly array plot) #419

Changed

  • Fixed pipeline filter #414
  • Removed fingerprint writing to file #416
  • Updated harmonize_values function to remove invalid metadata #418
  • Fixed metadata export style bug #423
  • Updated comment parsing logic in load_from_msp #420
  • Minor changes to regular expressions in clean_compound_name #424

0.19.0 - 2023-05-10

Added

  • Added function to infer filetype when loading spectra
  • CI test runs now include Python 3.10

Changed

  • Support reading old NIST and GOLM MSP formats #392
  • expanded options to handle different metadata key styles for (msp) file export #300
  • light refactoring of Metadata constructor to reduce spectra reading time #371
  • two minor corrections of adduct masses (missing electron mass) #374
  • Arranged test in folders #408
  • Updated datatype of peak_comments returned by load_from_mgf reader #410

Fixed

  • Support sparse score arrays also for FingerprintSimilarity scores #389

0.18.0 - 2023-01-05

Added

  • new Pipeline class to define entire matchms workflows. This includes importing one or several datasets, processing using matchms filtering/processing functions, as well as similarity computations. It also allows importing/exporting workflows as yaml files.

Changed

  • major change of Scores class. Internally, scores are now stored as a stacked sparse array. This allows storing several different scores for spectrum-spectrum pairs in an efficient way. It also makes it possible to run large-scale comparisons, in particular when pipelines start with rapid selective similarity scoring methods such as MetadataMatch or PrecursorMzMatch.
  • Scoring/similarity methods now also get a .sparse_array() method (next to the previous .pair() and .matrix() methods).

Fixed

  • minor fix in interpret_pepmass function.

0.17.0 - 2022-08-23

Added

  • Scores: added functionality for writing and reading Scores objects to/from disk as JSON and Pickle files #353
  • save_as_msp() now has a mode option (write/append) #346

0.16.0 - 2022-06-12

Added

  • Spectrum objects now also have .mz and .intensities properties #339
  • SimilarityNetwork: similarity-network graphs can now be exported to cyjs, gexf, gml, and node-link JSON formats #349

Changed

  • metadata filtering: made prefilter check for SMILES and InChI more lenient, possibly resulting in longer runtimes but more accurate checks #337

0.15.0 - 2022-03-09

Added neutral losses similarity score (cosine-type score) and a few small fixes.

Added

  • new spectral similarity score: NeutralLossesCosine which is based on matches between neutral losses of two spectra #329
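A neutral loss is commonly taken as the precursor m/z minus the m/z of a fragment peak. The plain-Python sketch below shows only that conversion. The helper name neutral_losses and the choice to skip non-positive losses are assumptions made for this illustration. This is not the NeutralLossesCosine implementation.

```python
def neutral_losses(precursor_mz, peaks):
    """Convert (mz, intensity) peaks into (loss, intensity) pairs sorted by loss.

    Hypothetical helper: loss = precursor_mz - fragment m/z. Peaks at or above
    the precursor m/z would give non-positive losses, so they are skipped here.
    """
    losses = [
        (precursor_mz - mz, intensity)
        for mz, intensity in peaks
        if mz < precursor_mz
    ]
    return sorted(losses)


print(neutral_losses(200.0, [(50.0, 0.2), (150.0, 1.0), (210.0, 0.5)]))
# [(50.0, 1.0), (150.0, 0.2)]
```

A cosine-type score can then compare two such loss lists in the same way it compares ordinary peak lists.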

Changed

  • added key conversion: "precursor_type" to "adduct" #332
  • added key conversion: "rtinseconds" to "retention_time" #331

Fixed

  • handling of duplicate entries in spectrum files (e.g. as field and again in the comments field in msp files) by upgrading pickydict to 0.4.0 #332

0.14.0 - 2022-02-18

This is the first of a few releases to work our way towards matchms 1.0.0, which also means that a few things in the API will likely change. The main change here is that Spectrum.metadata is no longer a simple Python dictionary but a Metadata object. In this context, metadata field names/keys are now harmonized by default (e.g. "Precursor Mass" will become "precursor_mz"). For a list of conversions, see the matchms key conversion table.

Added

  • new MetadataMatch similarity measure in matchms.similarity. This can be used to find matches between metadata entries and currently supports either full string matches or matches of numerical entries within a specified tolerance #315
  • metadata is now stored using the new Metadata class, which automatically applies restrictions to field names/keys to avoid confusion between different format styles #293
  • all metadata keys must be lower-case; spaces are changed to underscores.
  • Known key conversions are applied to metadata entries using a matchms key conversion table
  • new interpret_pepmass() filter to handle different pepmass entries found in data #298
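The key harmonization rules listed above can be illustrated with the small sketch below. It lower-cases each key, replaces spaces with underscores, and then looks the result up in a conversion table. The helper names harmonize_key and harmonize_metadata are hypothetical. The table is a tiny assumed subset based on conversions mentioned in this changelog, not the full matchms key conversion table.

```python
# Tiny illustrative subset of key conversions (assumed; not the full matchms table).
KEY_CONVERSIONS = {
    "precursor_mass": "precursor_mz",
    "precursor_type": "adduct",
    "rtinseconds": "retention_time",
}


def harmonize_key(key: str) -> str:
    """Lower-case a key, replace spaces with underscores, then apply known conversions."""
    normalized = key.strip().lower().replace(" ", "_")
    return KEY_CONVERSIONS.get(normalized, normalized)


def harmonize_metadata(metadata: dict) -> dict:
    """Return a new dict with harmonized keys; later duplicates overwrite earlier ones."""
    return {harmonize_key(key): value for key, value in metadata.items()}


print(harmonize_metadata({"Precursor Mass": 195.09, "Compound Name": "caffeine"}))
# {'precursor_mz': 195.09, 'compound_name': 'caffeine'}
```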

Changed

  • Metadata harmonization will now happen by default! This includes changing the field name style and applying known key conversions. To avoid the key conversions, users have to make this explicit by setting metadata_harmonization=False #293
  • Spikes class has become Fragments class #293
  • Change import style (now: isort 5 and slightly different style) #323

Fixed

  • can now handle charges that come as a string of type "2+" or "1-" #301
  • new Metadata class fixes issue of equality check for different entry orders #285
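One possible way to interpret charge strings such as "2+" or "1-" is sketched below in plain Python. The name parse_charge is hypothetical, and the sketch is not the matchms filter code. It accepts integers, unsigned or leading-sign strings, and trailing-sign strings. Any other format raises ValueError.

```python
def parse_charge(value):
    """Convert charges like 2, "2", "+2", "-1", "2+" or "1-" to an int.

    Hypothetical helper for illustration; raises ValueError for other formats.
    """
    if isinstance(value, int):
        return value
    text = str(value).strip()
    # Trailing-sign notation such as "2+" or "1-"
    if len(text) > 1 and text[-1] in "+-" and text[:-1].isdigit():
        sign = -1 if text[-1] == "-" else 1
        return sign * int(text[:-1])
    # Unsigned or leading-sign notation such as "2", "+2" or "-1"
    return int(text)
```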

0.13.0 - 2022-02-08

Added

  • Updated and extended plotting functionality, now located in matchms.plotting. Contains three plot types: plot_spectrum() or spectrum.plot(), plot_spectra_mirror() or spectrum.plot_against() and plot_spectra_array() #303

Changed

  • Updated the basic spectrum plot of Spectrum objects, spectrum.plot() #303
  • require_precursor_mz() filter will now also discard nonsensical m/z values < 10.0 (value can be adapted by user) #309

Fixed

  • Updated to new url for load_from_usi function (old link was broken) #310
  • Small bug fix: add_retention filters can now properly handle TypeError for empty list. #314

0.12.0 - 2022-01-18

Added

  • peak comments (as an mz: comment dictionary) are now part of metadata and can be addressed via a Spectrum() object peak_comments property #284
  • peak comments are dynamically updated whenever the respective peaks are changed #277

Changed

  • Major refactoring of unit test layout now using a spectrum builder pattern #261
  • Spikes object now has a different __getitem__ method that allows extracting specific peaks as mz/intensity pair (or array) #291
  • add_parent_mass() filter now better handles existing entries (including fields "parent_mass", "exact_mass" and "parentmass") #292
  • minor improvement of compound name cleaning in derive_adduct_from_name() filter #280
  • save_as_msp() now writes peak comments (if present) to the output file #277
  • load_from_msp() now also reads peak comments #277

Fixed

  • able to handle spectra containing empty/zero intensities #289

0.11.0 - 2021-12-16

Added

  • better, more flexible string handling of ModifiedCosine #275
  • matchms logger, replacing all former print statements to better control logging output #271
  • add_logging_to_file(), set_matchms_logger_level(), reset_matchms_logger() functions to adapt logging output to user needs #271

Changed

  • save_as_msp() can now also write to files with other than ".msp" extensions such as ".dat" #276
  • refactored add_precursor_mz, including better logging #275

0.10.0 - 2021-11-21

Added

  • Spectrum() objects now also allows generating hashes, e.g. hash(spectrum) #259
  • Spectrum() objects can generate .spectrum_hash() and .metadata_hash() to track changes to peaks or metadata #259
  • load_from_mgf() now accepts either a path to an mgf file or a file-like object from a preloaded MGF file #258
  • add_retention filters with function add_retention_time() and add_retention_index() #265

Changed

  • Code linting triggered by pylint update #257
  • The refactored add_parent_mass() filter can now also handle missing charge entries (if ionmode is known) #252

0.9.2 - 2021-07-20

Added

  • Support for Python 3.9 #240

Changed

  • Use bool instead of np.bool #245

0.9.1 - 2021-06-16

Fixed

  • Correctly handle charge=0 entries in add_parent_mass filter #236
  • Reordered written metadata in MSP export for compatibility with MS-FINDER & MS-DIAL #230
  • Update README.rst to fix fstring-quote python example #226

0.9.0 - 2021-05-06

Added

  • new matchms.networking module which allows to build and export graphs from scores objects #198
  • Expand list of known negative ionmode adducts and conversion rules #213
  • .to_numpy method for Spikes class which allows to run spectrum.peaks.to_numpy #214
  • save_as_msp() function to export spectrums to .msp file #215

Changed

  • add_precursor_mz() filter now also checks for metadata in keys precursormz and precursor_mass #223
  • load_from_msp() now handles .msp files containing multiple peaks per line separated by ; #221
  • add_parent_mass() now includes overwrite_existing_entry option (default is False) #225

Fixed

  • add_parent_mass() filter now makes consistent use of cleaned adducts #225

0.8.2 - 2021-03-08

Added

  • Added filter function 'require_precursor_mz' and an additional assert in 'ModifiedCosine' #191

  • make_charge_int() to convert charge field to integer #184

Changed

  • now deprecated: make_charge_scalar(), use make_charge_int() instead #183

Fixed

  • Make load_from_msp work with different whitespaces #192
  • Very minor bugs in add_parent_mass #188

0.8.1 - 2021-02-19

Fixed

  • Add package data to pypi tar.gz file (to fix Bioconda package) #179

0.8.0 - 2021-02-16

Added

  • helper functions to clean adduct strings, clean_adduct() #170

Changed

  • more thorough adduct cleaning affecting derive_adduct_from_name() and derive_ionmode() #171
  • significant expansion of add_parent_mass() filter to take known adduct properties into account #170

Fixed

  • too unspecific formula detection (and removal) from given compound names in derive_formula_from_name #172
  • no longer ignore n_max setting in reduce_to_number_of_peaks filter #177

0.7.0 - 2021-01-04

Added

  • scores_by_query and scores_by_reference now accept sort=True to return sorted scores #153

Changed

  • Scores.scores is now returning a structured array #153

Fixed

  • Minor bug in add_precursor_mz #161
  • Minor bug in Spectrum class (missing metadata deepcopy) #153
  • Minor bug in Spectrum class (eq method was not working with numpy arrays in metadata) #153

0.6.2 - 2020-12-03

Changed

  • Considerable performance improvement for CosineGreedy and CosineHungarian #159

0.6.1 - 2020-11-26

Added

  • PrecursorMzMatch for deriving precursor m/z matches within a given tolerance #156
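The sketch below shows what a precursor m/z match within a given tolerance means. It assumes an absolute tolerance in Da. precursor_mz_matches is a hypothetical helper that returns a boolean matrix over all reference/query pairs. It is not the PrecursorMzMatch API.

```python
def precursor_mz_matches(references, queries, tolerance=0.1):
    """Return a nested list of booleans, one row per reference precursor m/z.

    Entry [i][j] is True when |references[i] - queries[j]| <= tolerance (in Da).
    """
    return [[abs(ref - query) <= tolerance for query in queries] for ref in references]


print(precursor_mz_matches([100.0, 250.0], [100.05, 300.0]))
# [[True, False], [False, False]]
```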

Changed

  • Raise error for improper use of reduce_to_number_of_peaks filter #151
  • Renamed ParentmassMatch to ParentMassMatch #156

Fixed

  • Fix minor issue with msp importer to avoid failing with unknown characters #151

0.6.0 - 2020-09-14

Added

  • Four new peak filtering functions #119
  • score_by_reference and score_by_query methods to Scores #142
  • is_symmetric option to speed up all-vs-all type score calculation #59
  • Support for Python 3.8 #145

Changed

  • Refactor similarity scores to be instances of BaseSimilarity class #135
  • Marked Scores.calculate() method as deprecated #135

Removed

  • calculate_parallel function #135
  • Scores.calculate_parallel method #135
  • similarity.FingerprintSimilarityParallel class (now part of similarity.FingerprintSimilarity) #135
  • similarity.ParentmassMatchParallel class (now part of similarity.ParentmassMatch) #135

0.5.2 - 2020-08-26

Changed

  • Revision of JOSS manuscript #137

0.5.1 - 2020-08-19

Added

  • Basic submodule documentation and more code examples #128

Changed

  • Extended, updated, and corrected documentation for filter functions #118

0.5.0 - 2020-08-05

Added

  • Read mzML and mzXML files to create Spectrum objects from them #110
  • Read msp files to create Spectrum objects from them #102
  • Peak weighting option for CosineGreedy and ModifiedCosine score #96
  • Peak weighting option for CosineHungarian score #112
  • Similarity score based on comparing parent masses #79
  • Method for instantiating a spectrum from the metabolomics USI #93

Changed

  • CosineGreedy function is now numba based #86
  • Extended readthedocs documentation #82

Fixed

  • Incorrect denominator for cosine score normalization #98
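The plain-Python sketch below illustrates greedy peak matching and cosine normalization. It rests on several assumptions and is not the numba-based CosineGreedy implementation. It uses raw intensities with no m/z or intensity weighting and an absolute tolerance in Da. It divides the summed products of matched intensities by the norms of all peaks in both spectra. cosine_greedy is a hypothetical name.

```python
import math


def cosine_greedy(peaks_a, peaks_b, tolerance=0.1):
    """Greedy cosine score between two lists of (mz, intensity) tuples.

    Returns (score, n_matches). Illustrative sketch only.
    """
    # Collect every candidate pair of peaks whose m/z values lie within tolerance.
    candidates = []
    for i, (mz_a, int_a) in enumerate(peaks_a):
        for j, (mz_b, int_b) in enumerate(peaks_b):
            if abs(mz_a - mz_b) <= tolerance:
                candidates.append((int_a * int_b, i, j))
    # Accept the largest intensity products first; each peak may be used only once.
    candidates.sort(key=lambda candidate: candidate[0], reverse=True)
    used_a, used_b = set(), set()
    score = 0.0
    for product, i, j in candidates:
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        score += product
    # Normalize by the norms of all peaks in each spectrum, not only the matched ones.
    norm_a = math.sqrt(sum(intensity ** 2 for _, intensity in peaks_a))
    norm_b = math.sqrt(sum(intensity ** 2 for _, intensity in peaks_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0, 0
    return score / (norm_a * norm_b), len(used_a)
```

With this normalization, a spectrum compared with itself scores 1.0. Unmatched peaks lower the score because they still contribute to the denominator.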

0.4.0 - 2020-06-11

Added

  • Filter add_fingerprint to derive molecular fingerprints #42
  • Similarity scores based on molecular fingerprints #42
  • Add extensive compound name cleaning and harmonization #23
  • Faster cosine score implementation using numba #29
  • Cosine score based on Hungarian algorithm #40
  • Modified cosine score #26
  • Import and export of spectrums from json files #15
  • Doc strings for many methods #49
  • Examples in doc strings which are tested on CI #49

Changed

  • normalize_intensities filter now also normalizes losses #69

0.3.4 - 2020-05-29

Changed

  • Fix verify step in conda publish workflow
  • Fixed mixed up loss intensity order. #20

0.3.3 - 2020-05-27

Added

  • Build workflow runs the tests after installing the package #47

Changed

  • tests were removed from the package (see setup.py) #47

0.3.2 - 2020-05-26

Added

  • Workflow improvements
    • Use artifacts in build workflow
    • List artifact folder in build workflow

Changed

  • Workflow improvements #244
    • merge anaconda and python build workflows
    • fix conda package install command in build workflow
    • publish only on ubuntu machine
    • update workflow names
    • test conda packages on windows and unix separately
    • install conda package generated by the workflow
    • split workflows into multiple parts
    • use default settings for conda action
  • data folder is handled by setup.py but not meta.yml

Removed

  • remove python build badge #244
  • Moved spec2vec similarity related functionality from matchms to iomega/spec2vec
  • removed build step in build workflow
  • removed conda build scripts: conda/build.sh and conda/bld.bat
  • removed conda/condarc.yml
  • removed conda_build_config.yaml
  • removed testing from publish workflow

0.3.1 - 2020-05-19

Added

  • improve conda package #225
    • Build scripts for Windows and Unix (macOS and Linux) systems
    • verify conda package after uploading to anaconda repository by installing it
    • conda package also includes matchms/data folder

Changed

  • conda package fixes #223
    • move conda recipe to conda folder
    • fix conda package installation issue
    • add extra import tests for conda package
    • add instructions to build conda package locally
    • automatically find matchms package in setup.py
    • update developer instructions
    • increase verbosity while packaging
    • skip builds for Python 2.X
    • more flexible package versions
    • add deployment requirements to meta.yml
  • verify conda package #225
    • use conda/environment.yml when building the package
  • split anaconda workflow #225
    • conda build: tests conda packages on every push and pull request
    • conda publish: publish and test conda package on release
    • update the developer instructions
    • move conda recipe to conda folder

0.3.0 - 2020-05-13

Changed

  • Separate filters #97
  • Translate filter steps to new structure (interpret charge and ionmode) #73
  • filters returning a new spectrum #100
  • Flowchart diagram #135
  • numpy usage #191
  • consistency of the import statements #189

0.2.0 - 2020-04-03

Added

  • Anaconda actions

0.1.0 - 2020-03-19

Added