Releases: jim-schwoebel/allie

Allie Version 1.0.1

15 Aug 14:40
27503ad

Allie Version 1.0

12 Aug 21:14
0546a51

Changelog:

  • fix labels on axes for visualization scripts
  • get Dockerfile to pass all unit tests
  • write docker.py script to call from Docker as a next step in downloading NLTK models
  • improved readmes and documentation
  • refactored transformer script to use folder paths (instead of tdir1 and tdir2...), which makes the CLI a little more user-friendly
  • Added in a CLI tutorial in the wiki.
  • CLI can now add in various settings or print them out.
  • minor bug fixes in model loading with new CSV load feature
  • edited cleaning/augmentation scripts to take input / output files as lists so that they iterate sequentially without erroring
  • edited project boards to be up-to-date
  • solve regression problem when loading machine learning models and making predictions (from spreadsheets)
  • added in new ability to featurize .CSV spreadsheets using the standard Allie Features API and default_features
  • created a CLI to use all the API functions of Allie
  • documented the entire repository extensively (readmes, the wiki, and individual files) to make it clearer what each section of the repository means and how to use it
  • added rename.py helper script to rename files to prevent naming conflicts after annotation.
  • added new cleaning feature that renames files to avoid file naming issues (such as spaces) during featurization for audio, text, image, and video files
  • made it so the visualization API does not error out on regression problems; visualization is disabled for regression problems in version 1.0
  • made create-csv.py script to prepare folders of files into a regression or classification format
  • documentation of the repository / video examples for research paper
  • improved documentation for cleaning and augmentation techniques
  • added in text, image, video, and audio cleaning techniques (in new format)
  • added in text, image, video, and audio augmentation techniques (in new format)
  • add error handling into all of Allie's featurizations, plus an error array in the feature array itself (an "error" column on features)
  • kept create_readme setting for making readmes in the repositories themselves (deleted create_YAML setting)
  • deleted the production folder schema within Allie
  • added component numbers for both dimensionality reducers and feature selectors in settings.json
  • fix small bug in .JSON files for model files.
  • add 'pip3 freeze > requirements.txt' to machine learning model training systems to reproduce environments on different CPUs
  • added audio_features/loudness_features.py using pyloudnorm (in dB)
  • cleaned up audio_features/sa_feature array to use fewer lines (and made it a fixed-length array)
  • fixed bug in loading AutoGluon models for making predictions with the load.py script in the ./models/ directory (and loading model_type variable generally)
  • add in ['zscore','isolationforest'] options to remove outliers (https://stackoverflow.com/questions/51390196/how-to-calculate-cooks-distance-dffits-using-python-statsmodel), toggled with remove_outliers == True / False.
  • added a sample validation script in the ./models directory to quickly assess how well machine learning models generalize to new datasets
  • added Figlet for cool text renderings / messages when loading modeling scripts (http://www.figlet.org/)
  • bug fix - minor fix in visualize.py script; fixed loading of broken .JSON files during featurization (which broke the visualization script during model training)
  • bug fix - edited transforms so that they are named by the common name and not by all the classes trained, because with >30 classes the long name causes the transform to fail at saving / loading
  • added option in modeling script to create .CSV files (if create_csv == True, .CSV files are created during training). Creating them can take a long time for very large files, so model training sessions can be sped up by setting create_csv == False.
  • added annotate.py script to annotate files (beta version) - need to add to .JSON schema (in labels (regression))
  • added the ability to train regression models by a class and value
  • add in single model prediction mode in ./load.py script (-audio (sampletype) -c_autokeras (folder) -directory)
  • add in all model loaders from the model trainers
  • fixed cvopt and autokaggle training script bugs
  • added in the ability to quickly visualize trained ML models in a spreadsheet with the model2csv.py script
  • bug fix - minor bug fixes associated with transcription during featurization for audio, image, video, and .CSV files
  • add notion of "tabular" data instead of .CSV to tie to audio, video, and image data (e.g. for loading datasets), as laid out in the d3m-schema; this is done in the featurize_csv script, where .CSV files can contain audio, text, image, video, numerical, and categorical data.
  • test and validate model compression works for all training scripts / can load compressed models and make predictions (w/ production)
  • finish up model trainers and clean them up with standard metrics for accuracy
  • add in a version number to Allie (to assess deprecation issues in the future)
  • add in deepspeech functionality to transcription for open source (and other open source audio transcribers)
  • add in transcriber settings as a list ['pocketsphinx', 'deepspeech', 'google', 'aws'], etc.
  • added in transcribers as lists (can be adapted into future)
  • created a version 2 trainer for machine learning models (as part of Allie release 1.0.0)
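
The changelog lists 'zscore' as one of the two outlier-removal methods enabled by remove_outliers. As a rough illustration of what a z-score filter generally does, the sketch below drops any row in which a feature lies more than a chosen number of standard deviations from its column mean. This is not Allie's actual implementation: the function name remove_outliers_zscore and the default threshold of 3.0 are assumptions made for this example.

```python
import statistics


def remove_outliers_zscore(rows, threshold=3.0):
    """Return the rows in which every feature has |z-score| <= threshold.

    Hypothetical helper for illustration only; not part of Allie's API.
    `rows` is a list of equal-length lists of numbers (one list per sample).
    """
    if len(rows) < 2:
        # With fewer than two samples there is no spread to measure.
        return list(rows)

    # Transpose so that each tuple holds one feature column.
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]

    kept = []
    for row in rows:
        is_outlier = False
        for value, mean, stdev in zip(row, means, stdevs):
            # A constant column (stdev == 0) cannot flag an outlier.
            if stdev > 0 and abs(value - mean) / stdev > threshold:
                is_outlier = True
                break
        if not is_outlier:
            kept.append(row)
    return kept


if __name__ == "__main__":
    # Twenty ordinary samples plus one extreme sample in the first feature.
    data = [[1.0, 2.0] for _ in range(20)] + [[100.0, 2.0]]
    cleaned = remove_outliers_zscore(data)
    print(len(data), "rows in,", len(cleaned), "rows kept")
```

A z-score filter assumes roughly bell-shaped feature distributions and treats each feature independently, whereas an isolation forest (the other listed option) is a tree-based method that needs no such assumption.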