I've seen things you people wouldn't believe. Roy Batty, The Preverticant
What's Changed
- `paragraphIdentification` by @lpla in Final paragraph number #241
- `neural_modifications` branch: Metadata refactorization #245
- `directories` and `directoriesFile` documentation, by @aarongaliano in Dir2warc #247
- `PDFprocessing` option (previously `PDFextract`). It is now a list that lets you choose whether to use pdf2html, pdfextract or Apache Tika (new PDF processor), by @aarongaliano in Dir2warc #247
- New Bitextor `multilang` option (if activated, warc2text will extract content in different languages from the same document), by @aarongaliano in Dir2warc #247
- `bicleanerExtraArgs` to pass extra arguments to Bicleaner(-AI) by @lpla in Bitextor argument to pass extra arguments to Bicleaner(-AI) #250

New Contributors
Full Changelog: v8.2...v8.3
Notes
The `bitextor-v8.3.zip` tarball includes the submodules' code and binaries. If you compile the project after cloning the repository, you first need to run `git submodule update --init --recursive`. You can't run this command on the source code `.tar.gz` and `.zip` packages generated by GitHub, so we recommend either the `bitextor-v8.3.zip` tarball or cloning the repo at the `v8.3` tag.

We will support the Bitextor `8.x` branch until the next major version is released.

This discussion was created from the release Bitextor 8.3: Snake Runner, the Sentence Retirer.
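To check whether a cloned working copy still needs the `git submodule update --init --recursive` step from the notes above, a small helper like the following might be useful. It is only a sketch, not part of Bitextor: the function names are made up for illustration, and it relies on the fact that `git submodule status` prefixes uninitialized submodules with `-`.

```python
import subprocess
from typing import List


def uninitialized_submodules(status_output: str) -> List[str]:
    """Return the paths that `git submodule status` marks with a leading '-'."""
    missing = []
    for line in status_output.splitlines():
        if line.startswith("-"):
            # Line format for an uninitialized submodule: "-<sha1> <path>"
            fields = line[1:].split(maxsplit=1)
            if len(fields) == 2:
                missing.append(fields[1].strip())
    return missing


def check_checkout(repo_dir: str) -> List[str]:
    """Run `git submodule status --recursive` in repo_dir (hypothetical helper).

    Raises subprocess.CalledProcessError if git fails, for example when
    repo_dir is not a git working copy.
    """
    result = subprocess.run(
        ["git", "submodule", "status", "--recursive"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return uninitialized_submodules(result.stdout)
```

If `check_checkout("path/to/bitextor")` returns a non-empty list, the submodules have not been fetched yet and the update command still needs to be run. On a directory unpacked from a GitHub-generated source archive, which contains no git metadata, the git call would typically fail instead, which matches the recommendation to use the release tarball or a clone of the tag.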