Organizations

@commoncrawl @bigscience-workshop @oscar-project

pjox/README.md

Hi there 👋

I'm a Senior Research Scientist at the Common Crawl Foundation.

I am interested in large corpora for training language models, especially for under-resourced and historical languages. I also work on tasks such as Named Entity Recognition (NER), dependency parsing, part-of-speech tagging, machine translation, and document structuring.

I love coffee ☕️, cookies 🍪 and maths.

Popular repositories

  1. gutf Public

    Terminal tool that converts file encodings to UTF-8

    Go 10 1

  2. gofishing Public

    An extremely fast entity-fishing client

    Go 4

  3. CamemBERT-Experiments Public

    A notebook with CamemBERT experiments.

    Jupyter Notebook 4

  4. thesis Public

    My Ph.D. thesis

    TeX 3

  5. sirene-sql Public

    A useful query for importing the CSV file of the Sirene database into an SQL database

    2

  6. portizs Public

    My personal website

    TeX 2