Interactive Spoken Content Retrieval

Installation

Lasagne, Theano
Progressbar, tqdm
tsne, cprint

Work flow

Build language models from PTV transcripts
* The transcript directory should contain the transcription files T0001, T0002, ..., T5047
* Specify the transcript directory in src/transcript2docmodel.py
* Takes approximately 6 hours, mainly due to the 100k keyterms
* cmd: python src/transcript2docmodel.py

Train agent
* run.py: specify data, fold, feature, experiment_prefix (directory to save results), and result_directory with argparse
* Other arguments can be adjusted, added, or altered; see the script for details
* cmd: python src/run.py

View results
* Use merge_csvs.py to merge result/*.log
* cmd: python result/parse_log_to_csv.py $dir
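Since the first step runs for hours, it may be worth checking the transcript directory before starting. The sketch below is not part of the repository: the helper name missing_transcripts is made up, and it assumes the files are named exactly T0001 through T5047 with no extension and no gaps in the numbering.

```python
import os


def missing_transcripts(transcript_dir, first=1, last=5047):
    """Return expected transcript names that are absent from transcript_dir.

    Assumption: files are named T0001 ... T5047 with no file extension.
    """
    expected = ["T{:04d}".format(i) for i in range(first, last + 1)]
    present = set(os.listdir(transcript_dir))
    return [name for name in expected if name not in present]
```

An empty list means every expected file is present; otherwise the returned names show which transcripts to look for before running src/transcript2docmodel.py.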

Change feature

Change feature type: edit src/IR/statemachine.py and run_training.py. The places to touch are the if/else condition in the constructor, featureExtraction, and the argparse options.
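The pattern described here, an if/else on the feature type in the constructor plus a matching command-line option, could look roughly like the sketch below. Everything in it is hypothetical: ToyStateMachine, the feature names "length" and "top_score", and the --feature flag are placeholders for illustration and are not taken from src/IR/statemachine.py or run_training.py.

```python
import argparse


class ToyStateMachine(object):
    """Hypothetical stand-in, not the real class in src/IR/statemachine.py."""

    def __init__(self, feature):
        # The feature type is resolved once, in the constructor.
        if feature == "length":
            self._extract = lambda scores: [float(len(scores))]
        elif feature == "top_score":
            self._extract = lambda scores: [max(scores) if scores else 0.0]
        else:
            raise ValueError("unknown feature type: {}".format(feature))
        self.feature = feature

    def feature_extraction(self, scores):
        """Turn a list of retrieval scores into a feature vector."""
        return self._extract(scores)


def build_parser():
    """Expose the same feature names as a command-line choice."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--feature", choices=["length", "top_score"],
                        default="length")
    return parser
```

Adding a new feature type under this layout means one more branch in the constructor and one more entry in choices, which is why both files need to change together.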

Change cost

Change cost table: edit src/IR/actionmanager.py, and possibly add another argparse option in run_training.py.
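One way to make the cost table selectable from the command line is to keep named tables in a dictionary and pick one by option. This is only a sketch under assumptions: the table names, the action names, the cost values, and the --cost flag are all invented for illustration and do not reflect what src/IR/actionmanager.py actually contains.

```python
import argparse

# Placeholder action names and costs, invented for illustration only.
COST_TABLES = {
    "default": {"action_a": -30.0, "action_b": -10.0},
    "cheap_b": {"action_a": -30.0, "action_b": -5.0},
}


def get_cost_table(name):
    """Return a copy of the named cost table so callers cannot mutate the original."""
    try:
        return dict(COST_TABLES[name])
    except KeyError:
        raise ValueError("unknown cost table: {}".format(name))


def build_parser():
    """Offer the table names as a command-line choice."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--cost", choices=sorted(COST_TABLES), default="default")
    return parser
```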

Visualize

Specify the network pickle, feature file, number of features, and save_path with src/run_visualize.py
Use Jupyter Notebook to open result/plot_feature_action.ipynb together with the previously saved h5 file

Other Notes

Don't ask me about the code or the data storage format; it is what it is.
I believe there are bugs in Wen's data, for example:
- Some keyterms/requests do not exist. I can reproduce this if I get access to Wen's recognition transcripts.
Other cutting (word segmentation) methods: snownlp

tzuhsial/ISCR-DRL

Folders and files

Latest commit

History

Repository files navigation

Interactive Spoken Content Retrieval

Installation

Work flow

Change feature

Change cost

Visualize

Other Notes

About

Topics

Resources

Stars

Watchers

Forks

Languages