
These tutorials teach the ContentMine tools step by step. They describe what each tool does, what results to expect, and how to link the different elements of the content mining pipeline. They are based on the ContentMine virtual machine, which has all the necessary software pre-installed.

The tutorials can be used in workshops as well as for self-guided learning.

Table of contents

  1. Purpose and installation of the ContentMine-VirtualMachine
    A VirtualBox image contains all the necessary software as well as sample datasets for getting started with content mining. This tutorial explains how to install the ContentMine-VM and use it as a sandbox environment.

  2. Introduction to the command line interface
    This tutorial introduces the basic UNIX commands and shows how to navigate folders and handle files.

  3. Getting started with getpapers
    This tutorial demonstrates how to create an initial corpus for fact extraction.

  4. Getting started with quickscrape
    This tutorial introduces quickscrape and shows how to use it to extract semi-structured information from web pages.

  5. Create your own scraper definition
    This tutorial shows how to contribute to and extend the ContentMine scraper collection. If you need a specific scraper definition for use with quickscrape, you can learn how to create it here.

  6. Normalizing scholarly literature
    This tutorial shows how to normalize scientific literature into a unified format which can be processed by machines.

  7. ContentMine data structure: CProject
    This tutorial gives an overview of the CProject data structure and shows how to integrate it into your analysis.

  8. Extracting facts with AMI-plugins
    This tutorial demonstrates how to extract, aggregate, and filter facts from scholarly.html.