
aliburchard/Data-digging

 
 

Data-digging

This repository contains scripts and documentation related to analyzing classification data from Zooniverse projects. Most content is tailored to Panoptes-based Project Builder projects, but there is also some legacy Ouroboros-based code.

docs: Column descriptions for Panoptes export CSV files.

example_scripts: Holds top-level example scripts (generally applicable to any project) and project-specific subdirectories, each with scripts and data files. These scripts convert classification export CSVs into more useful formats and data products. In most cases, they extract information from the compact JSON-formatted “annotations” column into a flat CSV file that is easier to work with.
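As a rough illustration of that JSON-to-flat-CSV step, here is a minimal sketch using only the Python standard library. It is not taken from the scripts in this repository. It assumes the export has `classification_id` and `annotations` columns and that each annotations cell holds a JSON list of entries with `task` and `value` keys; the function names and sample data are made up for the example.

```python
import csv
import io
import json


def flatten_annotations(export_file):
    """Yield one flat dict per (classification, task) pair."""
    for row in csv.DictReader(export_file):
        # Assumption: the annotations cell is a JSON list of task entries.
        for entry in json.loads(row["annotations"]):
            value = entry.get("value")
            # Nested values (e.g. lists of marks) are kept as JSON text.
            if not isinstance(value, str):
                value = json.dumps(value)
            yield {
                "classification_id": row["classification_id"],
                "task": entry.get("task"),
                "value": value,
            }


def write_flat_csv(export_file, out_file):
    """Write the flattened rows to out_file and return the row count."""
    fieldnames = ["classification_id", "task", "value"]
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    count = 0
    for flat_row in flatten_annotations(export_file):
        writer.writerow(flat_row)
        count += 1
    return count


# Tiny made-up export with two classifications.
sample = io.StringIO()
sample_writer = csv.writer(sample)
sample_writer.writerow(["classification_id", "annotations"])
sample_writer.writerow(["1", json.dumps([{"task": "T0", "value": "Yes"}])])
sample_writer.writerow(
    ["2", json.dumps([{"task": "T0", "value": "No"}, {"task": "T1", "value": []}])]
)
sample.seek(0)

flat_rows = list(flatten_annotations(sample))
```

With a real export, `open("your-export.csv", newline="")` would take the place of the in-memory sample.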

development: Sandbox directory for code development.

Project & Script Descriptions

Below we describe the analysis components implemented in each processing script. Feel free to pick and choose from the features described below when writing new scripts for your own project.

Some issues that all or most of these scripts address:

  • extracting classification marks/answers from within the JSON fields of the CSV classification data exports
  • cleaning the classification export files:
    • removing duplicate classifications (if they occur)
    • dealing with empty classifications (some projects throw them out, others count them as "nothing here" votes)
    • only including classifications from the most up-to-date workflow version(s)
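The cleaning steps listed above could look roughly like the following sketch. It is illustrative only and makes several assumptions: rows are dicts with `workflow_version`, `user_name`, `subject_ids` and `annotations` keys; a "duplicate" means a second classification of the same subject by the same user; and an "empty" classification is one where every task value is `None`, an empty string, or an empty list. Individual projects may define these differently.

```python
import json


def parse_version(text):
    """Turn a version string such as '12.30' into a comparable tuple (12, 30)."""
    return tuple(int(part) for part in text.split("."))


def is_empty(annotations):
    """True if no task in the classification carries an answer or a mark."""
    return all(entry.get("value") in (None, "", []) for entry in annotations)


def clean(rows, min_version, keep_empty=True):
    """Drop old workflow versions, repeat classifications and (optionally) empties."""
    minimum = parse_version(min_version)
    seen = set()
    cleaned = []
    for row in rows:
        if parse_version(row["workflow_version"]) < minimum:
            continue  # classification made with an outdated workflow version
        key = (row["user_name"], row["subject_ids"])
        if key in seen:
            continue  # assumption: same user + same subject is a duplicate
        seen.add(key)
        if not keep_empty and is_empty(json.loads(row["annotations"])):
            continue  # some projects discard empties, others keep them as votes
        cleaned.append(row)
    return cleaned


# Made-up rows for illustration.
rows = [
    {"workflow_version": "12.30", "user_name": "a", "subject_ids": "1",
     "annotations": json.dumps([{"task": "T0", "value": "Yes"}])},
    {"workflow_version": "12.30", "user_name": "a", "subject_ids": "1",
     "annotations": json.dumps([{"task": "T0", "value": "No"}])},
    {"workflow_version": "11.2", "user_name": "b", "subject_ids": "1",
     "annotations": json.dumps([{"task": "T0", "value": "Yes"}])},
    {"workflow_version": "12.30", "user_name": "c", "subject_ids": "2",
     "annotations": json.dumps([{"task": "T0", "value": []}])},
]

kept_with_empties = clean(rows, "12.0")
kept_without_empties = clean(rows, "12.0", keep_empty=False)
```

Versions are compared as integer tuples rather than floats so that, for example, `12.10` sorts after `12.9`.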

Marking star cluster locations in Hubble Space Telescope images.

Script -- Creates CSV of circular marker info from simple marking workflow.

Marker type -- circle
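For a simple marking workflow like this one, pulling circle parameters out of the annotations JSON might look like the sketch below. It is not the repository's script. It assumes that a drawing task's `value` is a list of mark dicts and that circle marks carry `x`, `y` and `r` keys; the task name `T0` and the sample data are hypothetical.

```python
import json


def circle_marks(classification_id, annotations_json, task="T0"):
    """Return one flat dict per circle drawn in the given task."""
    rows = []
    for entry in json.loads(annotations_json):
        if entry.get("task") != task:
            continue
        for mark in entry.get("value") or []:
            if not isinstance(mark, dict):
                continue  # skip anything that is not a mark record
            rows.append({
                "classification_id": classification_id,
                "x": mark.get("x"),
                "y": mark.get("y"),
                "r": mark.get("r"),  # assumed key for the circle radius
            })
    return rows


# Made-up annotations cell: two circles in T0 plus an unrelated question task.
sample_annotations = json.dumps([
    {"task": "T0", "value": [
        {"x": 10.5, "y": 20.0, "r": 3.2, "tool": 0, "frame": 0},
        {"x": 1, "y": 2, "r": 0.5, "tool": 0, "frame": 0},
    ]},
    {"task": "T1", "value": "Yes"},
])

marks = circle_marks("42", sample_annotations)
```

Each returned dict can then be written out with `csv.DictWriter` to obtain one CSV row per marker.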

The Decoding the Civil War project invites volunteers to transcribe contemporary, hand-written transcripts of telegrams sent between allies during the American Civil War. Portions of these transcripts are enciphered using whole-word substitutions. The ultimate goal of the project is to allow volunteers to identify these substituted words based on their contextual appropriateness.

The bespoke consensus and aggregation code written for this project is archived and documented in a separate repository.

Marker type -- line, text input attached to mark

A beta project to examine HI structures in the Milky Way.

Scripts -- Extracts markings from classification file into individual files (ready for clustering).

Marker type -- line, point, ellipse, text input attached to mark

Answering questions about the presence of bar structures and marking bar dimensions.

Scripts -- Analyzes joint question+marking workflow (but mostly the markings).

Marker type -- line

A transcription project for museum collections. The label reconciliation scripts are maintained in a separate repository.

Extracting markings of damage and other features from post-disaster satellite imagery.

Script -- puts classification information together with geocoordinate information from subject exports.

Marker type -- point, polygon (though these aren't reduced here)
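Joining classifications to geocoordinates stored in a subject export could be sketched as follows. This is an assumption-heavy illustration, not the project's actual script: it assumes the subject export has `subject_id` and JSON `metadata` columns, that the classification export has a `subject_ids` column holding a single ID, and that the metadata uses hypothetical `lon` and `lat` keys. Real projects will name their coordinate fields differently.

```python
import json


def index_subject_coords(subject_rows, lon_key="lon", lat_key="lat"):
    """Map subject ID -> (lon, lat) for subjects whose metadata has both keys."""
    coords = {}
    for row in subject_rows:
        metadata = json.loads(row["metadata"])
        if lon_key in metadata and lat_key in metadata:
            coords[str(row["subject_id"])] = (
                float(metadata[lon_key]),
                float(metadata[lat_key]),
            )
    return coords


def attach_coords(classification_rows, coords):
    """Return copies of the classification rows with lon/lat columns added."""
    joined = []
    for row in classification_rows:
        lon, lat = coords.get(str(row["subject_ids"]), (None, None))
        joined.append({**row, "lon": lon, "lat": lat})
    return joined


# Made-up subject and classification rows.
subjects = [
    {"subject_id": "100", "metadata": json.dumps({"lon": "-61.4", "lat": "15.3"})},
    {"subject_id": "101", "metadata": json.dumps({})},
]
classifications = [
    {"classification_id": "1", "subject_ids": "100"},
    {"classification_id": "2", "subject_ids": "101"},
]

joined = attach_coords(classifications, index_subject_coords(subjects))
```

Subjects without coordinates are kept with `None` values rather than dropped, so missing metadata stays visible downstream.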

Marking interesting objects (including moving objects) in images from the WISE satellite.

Script -- Creates CSV of point marker info from simple marking workflow.

Marker type -- point

Classification of radio observations to identify pulsar candidates.

Scripts -- Analyze responses and aggregate the object type answer; a separate script counts classifications. IP address tracking was unreliable during this project, so unique non-logged-in users were identified with browser session info instead.

Marker type -- no markers, only 1 question task
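One way to identify unique non-logged-in users from session info is sketched below. It is a guess at the general approach, not the project's code. It assumes each row has `user_name`, `user_id` (empty for non-logged-in users) and a JSON `metadata` column containing a `session` key; the `user_key` and `count_unique_users` helpers are hypothetical.

```python
import json


def user_key(row):
    """Return a stable identifier for whoever made this classification."""
    if row.get("user_id"):
        return "user:" + row["user_name"]
    # Not logged in: fall back to the browser session recorded in metadata.
    session = json.loads(row["metadata"]).get("session")
    if session:
        return "session:" + session
    # Last resort: whatever placeholder name the export provides.
    return "anon:" + row["user_name"]


def count_unique_users(rows):
    """Count distinct classifiers, using sessions for non-logged-in users."""
    return len({user_key(row) for row in rows})


# Made-up rows: one logged-in user (twice) and two anonymous sessions.
rows = [
    {"user_name": "alice", "user_id": "7", "metadata": json.dumps({"session": "s1"})},
    {"user_name": "alice", "user_id": "7", "metadata": json.dumps({"session": "s2"})},
    {"user_name": "not-logged-in-aaa", "user_id": "", "metadata": json.dumps({"session": "abc"})},
    {"user_name": "not-logged-in-bbb", "user_id": "", "metadata": json.dumps({"session": "abc"})},
    {"user_name": "not-logged-in-aaa", "user_id": "", "metadata": json.dumps({"session": "def"})},
]

n_users = count_unique_users(rows)
```

In this sample the two rows sharing session `abc` count as one user even though their placeholder names differ, which is the behavior wanted when IP-derived names are untrustworthy.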

Workflow #1: Yes/No question on whether sea lions are present.

Scripts -- 1) Extracts a flat CSV from the embedded JSON. 2) Aggregates results.

Marker type -- no marks, only question tasks
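Aggregating a Yes/No question into per-subject vote fractions might be done along these lines. This is a minimal sketch under stated assumptions, not the aggregation used by the project: rows are assumed to be already cleaned, to carry `subject_ids` and JSON `annotations` columns, and to store the answer as a string under a hypothetical task name `T0`.

```python
import json
from collections import Counter, defaultdict


def aggregate_answers(rows, task="T0"):
    """Return the plurality answer and its vote fraction for each subject."""
    votes = defaultdict(Counter)
    for row in rows:
        for entry in json.loads(row["annotations"]):
            value = entry.get("value")
            # Only string answers from the chosen question task are counted.
            if entry.get("task") == task and isinstance(value, str):
                votes[row["subject_ids"]][value] += 1
    results = {}
    for subject_id, counter in votes.items():
        total = sum(counter.values())
        answer, top_count = counter.most_common(1)[0]
        results[subject_id] = {
            "answer": answer,
            "fraction": top_count / total,
            "n_votes": total,
        }
    return results


def make_row(subject_id, answer):
    """Build a made-up classification row for the example."""
    return {
        "subject_ids": subject_id,
        "annotations": json.dumps([{"task": "T0", "value": answer}]),
    }


rows = [make_row("1", "Yes"), make_row("1", "Yes"), make_row("1", "No"), make_row("2", "No")]
results = aggregate_answers(rows)
```

Ties are resolved by whichever answer was seen first, so a project that cares about ties would need an explicit rule.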

Older Scripts (Ouroboros-based)

Galaxy Zoo: Misc

Includes scripts that generate progress reports for the Ouroboros-based GZ project, as well as decision tree processing.

Galaxy Zoo: Talk

Scripts that compute statistics and analyze Talk data for the Ouroboros-based GZ project.

About

Scripts and such for data management, analysis, visualization, etc.

