ADNEXT_predict

Framework for supervised classification experiments, applied in the context of the ADNEXT project.

Experiment-dir:

- exp/ ---> rank: Data, Features, Weights, Classifier
  - config file
  - column file
  - per setting (exp/exp1/, exp/exp2/): feature file, files (for LCS), pickles (vocabulary; classifier), folds, results (performance, plots, featurefiles), config
- data/ ---> storage of all csv-files
  - data/raw/
  - data/frogged/
  - data/formatted/

Pipeline

1a: doc2csv
- possible input: excel, txt, json
- arguments: infile, outfile, configfile specifying the columns
- uses: utils
- output: csv with fields doc_id, user_id, date, time, user, text

2: frog_data
- input: csv-file
- arguments: tokenize only --> ucto
- uses: datahandler, utils
- output: csv-file with extra (frogged) column

3: format_instances
- remove
  * by string (e.g. RT)
  * by time window
  * by end hashtag
- add label
- combine files
- uses: datahandler
- output: csv-file, log-file

4: extract features
- input: csv-file (frogged or not)
- arguments:
- uses: featurizer-class, csv-reader
- output: [class,vector], csv-file

5: classifier
- arguments: classifiers
- 10-fold
- config for classifiers, make defaults in etc/
- train-test
- validate

6: report
- precision, recall, f1
- confusion matrix
- top features
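The format_instances step can be illustrated with a minimal, standard-library sketch. It is not the code in `format_instances.py`: the function name, its parameters and the sample rows are hypothetical, and "remove by end hashtag" is interpreted here as dropping rows whose text ends with a given hashtag, which is an assumption. The time-window filter is left out because the date format is not specified.

```python
import csv
import io

# Field names as listed for the doc2csv output.
FIELDS = ["doc_id", "user_id", "date", "time", "user", "text"]


def format_instances(rows, label, remove_string=None, end_hashtag=None):
    """Filter rows (dicts keyed by FIELDS) and attach a label to the survivors.

    Hypothetical helper: the real module may filter differently.
    """
    kept = []
    for row in rows:
        tokens = row["text"].split()
        # Remove by string: drop rows that contain the token (e.g. "RT").
        if remove_string is not None and remove_string in tokens:
            continue
        # Remove by end hashtag (assumed meaning): drop rows whose last
        # token is the given hashtag, compared case-insensitively.
        if (end_hashtag is not None and tokens
                and tokens[-1].lower() == end_hashtag.lower()):
            continue
        # Add label: copy the row and append the label column.
        kept.append({**row, "label": label})
    return kept


# Made-up sample data, standing in for a csv-file from the earlier steps.
raw = io.StringIO(
    "doc_id,user_id,date,time,user,text\n"
    "1,10,2015-01-01,12:00:00,anna,RT @bob great match\n"
    "2,11,2015-01-01,12:05:00,bob,what a game #sarcasm\n"
    "3,12,2015-01-02,09:30:00,carl,looking forward to tonight\n"
)
rows = list(csv.DictReader(raw))
formatted = format_instances(
    rows, label="event", remove_string="RT", end_hashtag="#sarcasm"
)

# Write the surviving rows, including the new label column, as csv.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=FIELDS + ["label"])
writer.writeheader()
writer.writerows(formatted)
print(out.getvalue())
```

Matching on whitespace-separated tokens rather than substrings keeps a filter on "RT" from also removing words that merely contain those letters.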

Classes: Datahandler, Featurizer, Classifier, Evaluation, Experiment

Functions: Utils

Procedures: doc2csv, frog_data
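The report step of the pipeline lists precision, recall, f1 and a confusion matrix. As a rough illustration of what those numbers mean, here is a small pure-Python sketch; the function names and the gold/predicted labels are invented for the example and do not reflect the actual interface of the Evaluation class.

```python
from collections import Counter


def confusion_matrix(gold, predicted):
    """Count how often each (gold label, predicted label) pair occurs."""
    return Counter(zip(gold, predicted))


def precision_recall_f1(gold, predicted, label):
    """Compute precision, recall and F1 for one target label."""
    counts = confusion_matrix(gold, predicted)
    tp = counts[(label, label)]
    # False positives: predicted as `label` but gold says otherwise.
    fp = sum(n for (g, p), n in counts.items() if p == label and g != label)
    # False negatives: gold is `label` but predicted otherwise.
    fn = sum(n for (g, p), n in counts.items() if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Made-up labels, standing in for the output of a train-test run.
gold = ["pos", "pos", "neg", "neg", "pos"]
predicted = ["pos", "neg", "neg", "pos", "pos"]

matrix = confusion_matrix(gold, predicted)
p, r, f = precision_recall_f1(gold, predicted, "pos")
print(matrix)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In a 10-fold setting, the same functions could be applied per fold or to the concatenated predictions of all folds; which of the two the project uses is not stated here.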


fkunneman/ADNEXT_predict
