CI-Research

Popular repositories

  1. KeywordAnalysis (Public)

    Word analysis by domain on the Common Crawl data set, for finding industry trends

    56 stars · 13 forks

  2. spark-Jupyter-AWS (Public)

    Forked from PiercingDan/spark-Jupyter-AWS

    A guide to painlessly setting up Jupyter with PySpark on AWS EC2 clusters, with S3 I/O support

    Jupyter Notebook 1

  3. cdx-index-client (Public)

    Forked from ikreymer/cdx-index-client

    A command-line tool for using the CommonCrawl Index API at http://index.commoncrawl.org/

    Python · 1 star · 1 fork

  4. commoncrawl-examples (Public)

    Forked from commoncrawl/commoncrawl-examples

    A library of examples showing how to use the Common Crawl corpus.

    Java 1

  5. dkpro-c4corpus (Public)

    Forked from dkpro/dkpro-c4corpus

    DKPro C4CorpusTools is a collection of tools for processing the CommonCrawl corpus, including Creative Commons license detection, boilerplate removal, language detection, and near-duplicate removal.

    Java 1

  6. common_crawl_index (Public)

    Forked from trivio/common_crawl_index

    Index URLs in Common Crawl

    Python 2

Repositories

Showing 9 of 9 repositories

People

This organization has no public members.
