
Corpusreader for TAC dataset - need usage instructions #750

Open
lashmore opened this issue Aug 30, 2021 · 0 comments

lashmore commented Aug 30, 2021

It is very hard to tell how the TACReader class is meant to be used. What path should I pass as "corpusRoot"? The screenshot below shows the file hierarchy of the raw TAC 2014-2015 data; the 2015 data has a folder structure similar to 2014's.

From what I can tell, TACReader parses XML documents. The only folder containing XML data is source_documents: its .txt files contain XML markup. Does TACReader parse ONLY source_documents, or does it also read other folders in the hierarchy?

[Screenshot (2021-08-30): file hierarchy of the raw TAC 2014-2015 data]
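To confirm which folders actually hold XML, a rough scan of the tree can help. This is a minimal sketch: FindXmlFolders and countXmlTxtFiles are hypothetical names, and treating a .txt file as XML when its first non-blank character is '<' is an assumed heuristic, not how TACReader itself decides what to read.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

public class FindXmlFolders {
    // Hypothetical helper: for each directory under root, count the .txt files
    // that look like XML according to the heuristic in looksLikeXml.
    static Map<Path, Integer> countXmlTxtFiles(Path root) throws IOException {
        Map<Path, Integer> counts = new TreeMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                if (Files.isRegularFile(p) && p.toString().endsWith(".txt") && looksLikeXml(p)) {
                    // Absolute path so every regular file has a non-null parent.
                    Path dir = p.toAbsolutePath().getParent();
                    counts.merge(dir, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Assumed heuristic: a file "looks like XML" if its first non-blank
    // character is '<'. Bytes are decoded leniently as UTF-8.
    static boolean looksLikeXml(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return text.trim().startsWith("<");
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        // Prints "<count>\t<directory>" for each directory with XML-like .txt files.
        countXmlTxtFiles(root).forEach((dir, n) -> System.out.println(n + "\t" + dir));
    }
}
```

Running it on the data folder should list only the directories whose .txt files contain XML markup.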

Below is how I'm calling TACReader and the error I get. I've tried several different paths for corpusRoot, and they all produce the same error. I'm working completely blind here, so any help would be much appreciated!

```java
import edu.illinois.cs.cogcomp.nlp.corpusreaders.TACReader;

public class PreprocessTAC {
    public static void main(String[] args) throws Exception {
        String path = "/path/to/tac_kbp_eng_event_arg_comp_train_eval_2014-2015/data/";
        TACReader reader_tac = new TACReader(path, false);
    }
}
```

Error message:

```
Exception in thread "main" java.lang.NullPointerException: Cannot read the array length because "<local4>" is null
	at edu.illinois.cs.cogcomp.core.io.IOUtils.lsFilesRecursive(IOUtils.java:145)
	at edu.illinois.cs.cogcomp.nlp.corpusreaders.TACReader.getFileListing(TACReader.java:239)
	at edu.illinois.cs.cogcomp.nlp.corpusreaders.XmlDocumentReader.initializeReader(XmlDocumentReader.java:107)
	at edu.illinois.cs.cogcomp.nlp.corpusreaders.AnnotationReader.<init>(AnnotationReader.java:47)
	at edu.illinois.cs.cogcomp.nlp.corpusreaders.AbstractIncrementalCorpusReader.<init>(AbstractIncrementalCorpusReader.java:61)
	at edu.illinois.cs.cogcomp.nlp.corpusreaders.XmlDocumentReader.<init>(XmlDocumentReader.java:89)
	at edu.illinois.cs.cogcomp.nlp.corpusreaders.TACReader.<init>(TACReader.java:113)
	at PreprocessTAC.main(PreprocessTAC.java:7)
```
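The exception is thrown inside IOUtils.lsFilesRecursive while reading an array's length. One plausible cause (an assumption, not confirmed against the cogcomp source) is java.io.File.listFiles() returning null, which it does when the path does not exist or is not a directory. The sketch below, where CorpusRootCheck and diagnose are hypothetical names, checks whether a candidate corpusRoot is at least an existing, listable directory. TACReader.getFileListing may list a path derived from corpusRoot rather than corpusRoot itself, so a path that passes this check could still fail.

```java
import java.io.File;

public class CorpusRootCheck {
    // Hypothetical helper: reports why File.listFiles() might return null
    // for a candidate corpusRoot (the assumed source of the NPE above).
    static String diagnose(String path) {
        File dir = new File(path);
        if (!dir.exists()) {
            return "does not exist: " + dir.getAbsolutePath();
        }
        if (!dir.isDirectory()) {
            return "not a directory: " + dir.getAbsolutePath();
        }
        File[] children = dir.listFiles();
        if (children == null) {
            // listFiles() also returns null on an I/O error (e.g. permissions).
            return "cannot list (permissions or I/O error): " + dir.getAbsolutePath();
        }
        return "ok, " + children.length + " entries: " + dir.getAbsolutePath();
    }

    public static void main(String[] args) {
        for (String p : args) {
            System.out.println(diagnose(p));
        }
    }
}
```

For example: java CorpusRootCheck /path/to/tac_kbp_eng_event_arg_comp_train_eval_2014-2015/data/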