step by step #61

Zuckonit · 2019-11-20T03:32:52Z

I read the docs, but I still have some problems. I have 20 CBETA XML files (with 20 different labels, say 1 to 20), and I want to produce a diff result for them. Could you please provide a step-by-step tutorial for this?

ajenhl · 2019-11-20T05:45:20Z

Sure. Here are the steps, assuming that the CBETA XML files are in a directory called source_dir, that you want 1- to 6-grams, and that the catalogue is called catalogue.txt:

Create the corpus from the XML:
tacl prepare source_dir xml_dir
tacl strip xml_dir corpus_dir
Create the database:
tacl ngrams cbeta.db corpus_dir 1 6
Run the diff:
tacl diff cbeta.db corpus_dir catalogue.txt > diff-results.csv

Does this help?

Zuckonit · 2019-11-20T11:32:00Z

What about corpus_dir? What does it contain, and how can I make one?

ajenhl · 2019-11-20T18:24:53Z

corpus_dir is created by tacl strip - it takes the files in xml_dir (itself created as the output of tacl prepare) and outputs the stripped versions of them in whatever you specify as corpus_dir.

In my example, xml_dir, corpus_dir, catalogue.txt, cbeta.db, and diff-results.csv are all paths that you specify. Only in the case of catalogue.txt do you need to have any content there before running those commands in that sequence.
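Since catalogue.txt is the one file that has to exist beforehand, here is a minimal sketch of generating it with plain shell. It rests on two assumptions that are not stated in this thread and should be checked against the tacl documentation: that each entry directly under corpus_dir is one work, and that a catalogue line is a work name and a label separated by a space. The function name make_catalogue and the work names T0001 and T0002 are hypothetical, and the labels are just the numbers 1, 2, 3, and so on, as in the original question.

```shell
#!/bin/sh
# Sketch only: write one "<work> <label>" line per entry in a corpus directory.
# Assumptions (verify against the tacl docs): every entry directly under the
# corpus directory is one work, and labels are simply 1, 2, 3, ...
make_catalogue() {
    corpus_dir=$1
    catalogue=$2
    n=1
    : > "$catalogue"                    # start from an empty catalogue file
    for entry in "$corpus_dir"/*; do
        [ -e "$entry" ] || continue     # nothing matched: corpus is empty
        work=$(basename "$entry")
        work=${work%.txt}               # drop a .txt suffix if works are files
        printf '%s %s\n' "$work" "$n" >> "$catalogue"
        n=$((n + 1))
    done
}

# Demonstration on a throwaway corpus with made-up work names.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/corpus_dir/T0001" "$demo_dir/corpus_dir/T0002"
make_catalogue "$demo_dir/corpus_dir" "$demo_dir/catalogue.txt"
cat "$demo_dir/catalogue.txt"
# prints:
# T0001 1
# T0002 2
```

With real data, the call would be make_catalogue corpus_dir catalogue.txt, run after tacl strip and before tacl diff. Edit the resulting file by hand if the labels should be something other than consecutive numbers.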

Faxinrepent · 2019-12-18T19:08:15Z

It is helpful!
Could you please explain how to manipulate the results (with tacl results/align/highlight) in this case? I had trouble with them, as shown in the attached image, even though pandas, biopython, etc. are all installed.
Thanks a lot for writing and sharing this software!

ajenhl · 2019-12-18T21:02:07Z

So in that case, as per the last line of the error text, there is no results file diff-result.csv in that directory, so it is unable to manipulate those results. Presumably the results are either in a file with a different name, or in a different directory, or both.
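A quick way to rule this out before calling tacl is to check the path first. The following is a minimal sketch in plain shell that does not call tacl at all; the function name check_results is hypothetical. It reports whether the named file exists and, if not, lists the CSV files in the same directory as likely candidates.

```shell
#!/bin/sh
# Sketch only: confirm that a results file exists before passing it to tacl.
check_results() {
    results=$1
    if [ -f "$results" ]; then
        echo "found: $results"
        return 0
    fi
    echo "missing: $results" >&2
    # Suggest CSV files from the same directory as possible intended names.
    dir=$(dirname "$results")
    for candidate in "$dir"/*.csv; do
        [ -e "$candidate" ] && echo "candidate: $candidate" >&2
    done
    return 1
}

# Demonstration: the diff step above wrote diff-results.csv (with an "s").
demo_dir=$(mktemp -d)
: > "$demo_dir/diff-results.csv"
check_results "$demo_dir/diff-results.csv"          # exists
check_results "$demo_dir/diff-result.csv" || true   # missing; lists the candidate
```

In the demonstration, the second call fails and points at diff-results.csv, which is the name the earlier tacl diff command redirected its output to.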
