This repository has been archived by the owner on Feb 22, 2021. It is now read-only.

Issue creating corpus #32

Open
RishabGargeya opened this issue Jan 5, 2017 · 1 comment

@RishabGargeya

Getting this error:

[info] Assembly up to date: /home/rg203/work/scripts/wiki2vec/target/scala-2.10/wiki2vec-assembly-1.0.jar
[success] Total time: 2 s, completed Jan 5, 2017 7:29:26 AM
Creating Readable Wiki..
Exception in thread "main" java.io.IOException: Stream is not in the BZip2 format
	at org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream.init(BZip2CompressorInputStream.java:255)
	at org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream.<init>(BZip2CompressorInputStream.java:138)
	at org.idio.wikipedia.dumps.ReadableWiki.getWikipediaStream(ReadableWiki.scala:19)
	at org.idio.wikipedia.dumps.ReadableWiki.createReadableWiki(ReadableWiki.scala:31)
	at org.idio.wikipedia.dumps.CreateReadableWiki$.main(ReadableWiki.scala:55)
	at org.idio.wikipedia.dumps.CreateReadableWiki.main(ReadableWiki.scala)
Creating Word2vec Corpus
/home/rg203/work/scripts/wiki2vec/working/spark-1.2.0-bin-hadoop2.4/bin/spark-class: line 113: [: : integer expression expected
/home/rg203/work/scripts/wiki2vec/working/spark-1.2.0-bin-hadoop2.4/bin/spark-class: line 187: /usr/lib/jvm/java-8-oracle/jre/bin/java/bin/java: Not a directory
/home/rg203/work/scripts/wiki2vec/working/spark-1.2.0-bin-hadoop2.4/bin/spark-class: line 187: exec: /usr/lib/jvm/java-8-oracle/jre/bin/java/bin/java: cannot execute: Not a directory
Joining corpus..
cat: 'part*': No such file or directory
 ^___^ corpus : /home/rg203/work/scripts/wiki2vec/spanish_output//eswiki.corpus

Any ideas? Thanks for the help!

@keynmol
Contributor

keynmol commented Jan 5, 2017

We'd need more info to debug that. Are you sure you're giving it a .bz2-compressed Wikipedia dump?
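
One quick way to check this, as a rough sketch rather than anything wiki2vec itself provides: a bzip2 file starts with the bytes `BZh` followed by a block-size digit from 1 to 9, so reading the first four bytes of the dump shows whether it is really compressed. The function name `looks_like_bzip2` and the way the path is passed on the command line are assumptions made for illustration.

```python
import sys

# bzip2 streams begin with "BZh" followed by a block-size digit 1-9.
BZIP2_MAGIC = b"BZh"


def looks_like_bzip2(path):
    """Return True if the file at `path` starts with a bzip2 header."""
    with open(path, "rb") as f:
        header = f.read(4)
    return (
        len(header) == 4
        and header[:3] == BZIP2_MAGIC
        and header[3:4] in b"123456789"
    )


# Hypothetical command-line use: pass the path of the dump you downloaded.
if __name__ == "__main__" and len(sys.argv) > 1:
    dump_path = sys.argv[1]
    if looks_like_bzip2(dump_path):
        print(dump_path, "starts with a bzip2 header")
    else:
        print(dump_path, "does not start with a bzip2 header")
```

If the check fails, the file may be an already-decompressed XML dump or an incomplete download, either of which would be consistent with the "Stream is not in the BZip2 format" exception in the log above.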

@keynmol keynmol added the ready label Jan 18, 2017
@jsgriffin jsgriffin added backlog and removed ready labels Feb 6, 2017
@Lugrin Lugrin added icebox and removed backlog labels Apr 10, 2017