step by step #61

Open
Zuckonit opened this issue Nov 20, 2019 · 5 comments

Comments

@Zuckonit

I read the docs but am still running into problems. I have 20 CBETA XML files (20 different labels, say 1 to 20), and I want to produce a diff result from them. Could you please provide a step-by-step tutorial for this?

@ajenhl
Owner

ajenhl commented Nov 20, 2019

Sure. Here are the steps, assuming that the CBETA XML files are in a directory called source_dir, that you want 1-6-grams, and that the catalogue is called catalogue.txt:

  1. Create the corpus from the XML:
    tacl prepare source_dir xml_dir
    tacl strip xml_dir corpus_dir

  2. Create the database:
    tacl ngrams cbeta.db corpus_dir 1 6

  3. Run the diff:
    tacl diff cbeta.db corpus_dir catalogue.txt > diff-results.csv
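
The three steps above can be sketched as a single shell function. This is only a convenience wrapper around the same four commands, not something tacl provides: `check_inputs` and `run_pipeline` are hypothetical helper names, and every path is a placeholder taken from the example.

```shell
# Placeholder paths from the steps above; change them to suit your setup.
SOURCE_DIR=${SOURCE_DIR:-source_dir}    # read by tacl prepare
XML_DIR=${XML_DIR:-xml_dir}             # written by tacl prepare
CORPUS_DIR=${CORPUS_DIR:-corpus_dir}    # written by tacl strip
DATABASE=${DATABASE:-cbeta.db}          # written by tacl ngrams
CATALOGUE=${CATALOGUE:-catalogue.txt}   # read by tacl diff
RESULTS=${RESULTS:-diff-results.csv}    # written by the redirect

# Hypothetical helper: fail early if something the commands read is missing.
check_inputs() {
    [ -d "$SOURCE_DIR" ] || { echo "missing directory: $SOURCE_DIR" >&2; return 1; }
    [ -s "$CATALOGUE" ] || { echo "missing or empty catalogue: $CATALOGUE" >&2; return 1; }
}

# Hypothetical wrapper. The body runs in a subshell so that "set -e" stops
# at the first failing command without affecting the calling shell.
run_pipeline() (
    set -e
    check_inputs || exit 1
    tacl prepare "$SOURCE_DIR" "$XML_DIR"
    tacl strip "$XML_DIR" "$CORPUS_DIR"
    tacl ngrams "$DATABASE" "$CORPUS_DIR" 1 6
    tacl diff "$DATABASE" "$CORPUS_DIR" "$CATALOGUE" > "$RESULTS"
)
```

Calling `run_pipeline` from the directory that holds source_dir and catalogue.txt runs the steps in order and stops at the first one that fails.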

Does this help?

@Zuckonit
Author

What about corpus_dir? What does it contain, and how can I create one?

@ajenhl
Owner

ajenhl commented Nov 20, 2019

corpus_dir is created by tacl strip - it takes the files in xml_dir (itself created as the output of tacl prepare) and outputs the stripped versions of them in whatever you specify as corpus_dir.

In my example, xml_dir, corpus_dir, catalogue.txt, cbeta.db, and diff-results.csv are all paths that you specify. Only in the case of catalogue.txt do you need to have any content there before running those commands in that sequence.
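
For the catalogue, a rough sketch of one way to produce it for the "one label per work" case is below. It rests on an assumption that should be checked against the tacl documentation: that each catalogue line is a work name and a label separated by a space. `write_catalogue` is a hypothetical helper, not part of tacl. Since it reads the corpus directory, it would be run after tacl strip and before tacl diff.

```shell
# Hypothetical helper (not part of tacl): write one line per entry in the
# corpus directory, labelling the entries 1, 2, 3, ... in glob (sorted) order.
# ASSUMPTION: a catalogue line has the form "<work> <label>".
write_catalogue() {
    corpus_dir=$1
    catalogue=$2
    n=0
    : > "$catalogue"                     # start from an empty file
    for work in "$corpus_dir"/*; do
        [ -e "$work" ] || continue       # skip the unexpanded glob if the directory is empty
        n=$((n + 1))
        printf '%s %s\n' "$(basename "$work")" "$n" >> "$catalogue"
    done
}
```

For example, `write_catalogue corpus_dir catalogue.txt` would give each of the 20 works its own numeric label. Edit the file by hand afterwards if the works should be grouped differently.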

@Faxinrepent

Sure. Here are the steps, assuming that the CBETA XML files are in a directory called source_dir, that you want 1-6-grams, and that the catalogue is called catalogue.txt:

  1. Create the corpus from the XML:
    tacl prepare source_dir xml_dir
    tacl strip xml_dir corpus_dir
  2. Create the database:
    tacl ngrams cbeta.db corpus_dir 1 6
  3. Run the diff:
    tacl diff cbeta.db corpus_dir catalogue.txt > diff-results.csv

Does this help?

This is helpful!
Could you please also describe how to manipulate the results (with tacl results/align/highlight) for this case? I had trouble with those commands, as shown in the attached image, even though pandas, biopython, etc. are all installed.
Thanks a lot for writing and sharing this software!
[attached image: error]

@ajenhl
Owner

ajenhl commented Dec 18, 2019

So in that case, as per the last line of the error text, there is no results file diff-result.csv in that directory, so it is unable to manipulate those results. Presumably the results are either in a file with a different name, or in a different directory, or both.
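
A quick way to rule this out before running any of the results commands is to check the path first. The sketch below uses a hypothetical helper, `require_results`, which is not part of tacl. It only tests that the file exists and, if not, lists the CSV files that are in that directory. The example earlier in the thread redirected output to diff-results.csv, so a one-letter difference in the name is one possibility worth checking.

```shell
# Hypothetical helper (not part of tacl): confirm that a results file exists
# before passing it to a tacl subcommand.
require_results() {
    results=$1
    if [ -f "$results" ]; then
        return 0
    fi
    dir=$(dirname "$results")
    echo "No results file at: $results" >&2
    echo "CSV files in $dir:" >&2
    # The listing goes to stderr; ls's own error (no match) is discarded.
    ls "$dir"/*.csv >&2 2>/dev/null || echo "  (none)" >&2
    return 1
}
```

For example, `require_results diff-results.csv && tacl results -h` only reaches the tacl command if the file is really there. Replace `-h` with the options you need.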
