Unfortunately, we have no data that preserves that kind of information for Paracrawl. In the raw file you might find the less filtered version we have. If you group its lines by URL and concatenate the sentences in the order they appear in the file, you will get some kind of "documents", but some sentences will be missing and the order might not be correct.

However, if you are interested in document-level data and not particularly in the Paracrawl languages, the parallel data from https://macocu.eu was created with more recent Bitextor versions, so the latest version of each language pair has a doc.txt file available for download. In those files, columns 3 and 4 contain base64-encoded documents. Note that you might need additional filtering, because this document version is less filtered in order to preserve full documents.
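A minimal Python sketch of both approaches, under stated assumptions. The column layout of the raw Paracrawl file used here (URL in the first column, source and target sentences in the third and fourth) is a guess, not taken from the release, so check the real file format and adjust the indices. The doc.txt decoder assumes tab-separated UTF-8 lines with the base64 documents in columns 3 and 4 (1-based), as the answer describes. `group_by_url` and `decode_doc_line` are hypothetical helper names.

```python
import base64
from typing import Dict, Iterable, List, Tuple

# ASSUMPTION: hypothetical 0-based column layout of the raw Paracrawl file.
# Verify against the actual file before using these indices.
URL_COL = 0
SRC_COL = 2
TGT_COL = 3


def group_by_url(
    lines: Iterable[str],
    url_col: int = URL_COL,
    src_col: int = SRC_COL,
    tgt_col: int = TGT_COL,
) -> Dict[str, Tuple[List[str], List[str]]]:
    """Build pseudo-documents by grouping sentence pairs by URL.

    Sentences are kept in the order they appear in the file (dicts preserve
    insertion order). The result is only approximate: sentences may be
    missing and the order may differ from the original page.
    """
    docs: Dict[str, Tuple[List[str], List[str]]] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        src, tgt = docs.setdefault(fields[url_col], ([], []))
        src.append(fields[src_col])
        tgt.append(fields[tgt_col])
    return docs


def decode_doc_line(line: str) -> Tuple[str, str]:
    """Decode the base64 documents in columns 3 and 4 (1-based) of a doc.txt line.

    ASSUMPTION: tab-separated fields and UTF-8 text. The decoded documents
    may still need extra filtering, since this version is less filtered.
    """
    fields = line.rstrip("\n").split("\t")
    return (
        base64.b64decode(fields[2]).decode("utf-8"),
        base64.b64decode(fields[3]).decode("utf-8"),
    )


if __name__ == "__main__":
    # Toy raw lines with the assumed layout: url, url, source, target.
    raw = [
        "https://example.com/a\thttps://example.com/a-en\tHola.\tHello.",
        "https://example.com/b\thttps://example.com/b-en\tAdios.\tBye.",
        "https://example.com/a\thttps://example.com/a-en\tQue tal?\tHow are you?",
    ]
    for url, (src, tgt) in group_by_url(raw).items():
        print(url, "|", " ".join(src), "|", " ".join(tgt))
```

To read real files, pass an open file handle (e.g. `open(path, encoding="utf-8")`) to `group_by_url`, or call `decode_doc_line` on each line of doc.txt.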
Hi,
Is there a release of Paracrawl with Bitextor granularity "document" instead of sentences?
If not, what is the easiest way to reproduce those documents?
Cheers.