Add metadata output to tacl diff/intersect/report #42

Open
ajenhl opened this issue Oct 30, 2015 · 1 comment

ajenhl commented Oct 30, 2015

Add an option to tacl diff/intersect/report that writes the results to a specified file rather than to stdout and, in addition, writes metadata about the operation to a file with the same name but a different extension. This metadata would consist of:

  • The catalogue entries used in the query (not the name of the file, but the actual texts and labels).
  • The date the query was run.
  • The date each text used in the query was last updated in the database.
  • The hash of each text from the database.
  • The date each text used in the query was last updated in the corpus.
  • The hash of each used text in the corpus.
  • The type of query (diff or intersect).
  • The tokenizer used.
  • The report operations applied.

In the case of tacl report, the metadata output would also include any metadata available for the query whose results are being operated on.
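
A rough sketch of what the metadata file could hold, to make the list above concrete. Everything here is an assumption for illustration rather than existing tacl code: the class and function names are hypothetical, JSON is just one possible serialisation, the `.json` extension is a placeholder, and SHA-256 stands in for whatever hash the database actually stores.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class TextRecord:
    """Hypothetical per-text entry; field names are illustrative only."""
    name: str
    label: str
    db_updated: str       # date the text was last updated in the database
    db_hash: str          # hash of the text as recorded in the database
    corpus_updated: str   # date the text was last updated in the corpus
    corpus_hash: str      # hash of the text file in the corpus


@dataclass
class QueryMetadata:
    """Hypothetical container for the items listed in this issue."""
    query_type: str                                   # "diff" or "intersect"
    tokenizer: str
    texts: list = field(default_factory=list)         # catalogue entries used
    report_operations: list = field(default_factory=list)
    run_date: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())


def file_hash(path):
    """Return a hex digest of a file's contents (SHA-256 is an assumption)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def metadata_path(results_path):
    """Same name as the results file, different extension (assumed: .json)."""
    return Path(results_path).with_suffix('.json')


def write_metadata(results_path, metadata):
    """Write the metadata next to the results file and return its path."""
    path = metadata_path(results_path)
    path.write_text(json.dumps(asdict(metadata), indent=2), encoding='utf-8')
    return path
```

With this shape, results written to `results.csv` would get a sibling `results.json`, and tacl report could read that file back and carry its contents forward into its own metadata output.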

@ajenhl ajenhl self-assigned this Oct 30, 2015

ajenhl commented Jan 6, 2016

An advantage of doing this is that it allows warnings to be generated when the user does something inadvisable: for example, reducing a set of results when one or more of the sources of those results have already been reduced.
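
A minimal sketch of such a check, assuming each source's metadata is available as a dict that records the report operations already applied to it. The function name, the `results_file` and `report_operations` keys, and the `'reduce'` operation label are all hypothetical, not existing tacl names.

```python
import warnings


def check_reduce(source_metadata):
    """Warn if any source results have already been reduced.

    source_metadata is a list of dicts, each with a hypothetical
    'report_operations' list. Returns the names of the sources that
    were already reduced (empty if none were).
    """
    already = [m.get('results_file', '<unknown>') for m in source_metadata
               if 'reduce' in m.get('report_operations', [])]
    if already:
        warnings.warn('Reducing results whose sources were already reduced: '
                      + ', '.join(already))
    return already
```

Using `warnings.warn` rather than raising an exception keeps the operation advisory: the user is told about the double reduction but can still go ahead.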
