ITALLIC: A tool for automatically identifying and correcting errors in location based plant breeding data

One of the challenges of integrating plant breeding data to collectively analyze it with other sources of data such as genotype, environment, management, and socioeconomic data is errors in location data. Collectively, this data could be used to inform genetic predictive models for maize, wheat, and other crops. Typical errors in plant breeding location data include flipped latitude and longitude values, missing negative signs, and, in some cases, missing data. This tool, an Integrated Tool for Automatic Lat Long Imputation and Cleaning (ITALLIC), automatically detects and corrects errors in location data and imputes missing values for location-dependent data, such as region name.
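The error types named above (flipped latitude and longitude, missing negative signs, missing values) can be illustrated with a small sketch. This is not ITALLIC's implementation or API: the function name `suggest_fix`, the bounding-box representation, and the rough bounding box for Mexico are all assumptions made for illustration. The sketch assumes the expected country is known and can be approximated by a latitude/longitude box.

```python
from typing import Optional, Tuple

# Hypothetical representation: (min_lat, max_lat, min_lon, max_lon).
BBox = Tuple[float, float, float, float]

# Rough, illustrative bounding box for Mexico (not authoritative boundary data).
MEXICO_BBOX: BBox = (14.5, 32.8, -118.5, -86.5)


def in_bbox(lat: float, lon: float, bbox: BBox) -> bool:
    """Return True if the point lies inside the bounding box."""
    min_lat, max_lat, min_lon, max_lon = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def suggest_fix(
    lat: Optional[float], lon: Optional[float], bbox: BBox
) -> Optional[Tuple[float, float, str]]:
    """Try simple corrections and return the first one that lands in bbox.

    Returns (lat, lon, description), or None when a value is missing or
    no single-step correction places the point inside the box.
    """
    if lat is None or lon is None:
        return None  # Missing data cannot be repaired by this sketch.
    candidates = [
        (lat, lon, "unchanged"),
        (lon, lat, "swapped"),
        (-lat, lon, "negated latitude"),
        (lat, -lon, "negated longitude"),
    ]
    for cand_lat, cand_lon, description in candidates:
        if in_bbox(cand_lat, cand_lon, bbox):
            return cand_lat, cand_lon, description
    return None


if __name__ == "__main__":
    # A point near Mexico City recorded with latitude and longitude flipped.
    print(suggest_fix(-99.13, 19.43, MEXICO_BBOX))
    # The same point recorded without the negative sign on the longitude.
    print(suggest_fix(19.43, 99.13, MEXICO_BBOX))
```

A bounding box is only a coarse stand-in for real country boundaries, and the sketch does not handle combined errors (for example, flipped values that are also missing a sign).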

This page contains instructions for installing and using ITALLIC. These instructions assume familiarity with working in a terminal.

Pre-Installation

ITALLIC is a Python 3 application. In addition to Python 3, we highly recommend also installing Conda. Click this link for more information on installing Conda.

Although you do not need Conda to use ITALLIC, it makes installing ITALLIC and other Python packages easier and enables the use of conda environments. Environments help prevent conflicts that arise when different projects require different versions of the same software package. This blog nicely summarizes some advantages of using environments.

Prepare working environment

Create a conda environment for data cleaning and install ITALLIC in that environment. The command below uses "DataCleaning" as the environment name and Python 3.8 as the Python version. You can choose a different environment name, but we recommend keeping Python 3.8: any Python 3 version should work, but ITALLIC was tested on Python 3.8.

NOTE: One of the many benefits of using Conda is that even if you have a different version of Python installed on your system, it will install version 3.8 for your "DataCleaning" environment. Just use the conda create command as shown below.

  • Create conda environment.
$ conda create --name DataCleaning python=3.8 -y
  • Activate the environment.
$ conda activate DataCleaning
  • Install Jupyter Notebook. ITALLIC has a visualization tool that works well with Jupyter Notebook. Use conda to install Jupyter.
$ conda install -c conda-forge jupyter -y
  • Install dependencies needed to use jupyter.
$ conda install -c conda-forge ipykernel -y
  • Create kernel for this environment to use with jupyter notebook. We recommend using the same name for the kernel that was used for the environment.
$ ipython kernel install --user --name=DataCleaning
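Once the steps above are done, a short check like the following can confirm which environment and Python version are in use. It is a minimal sketch: the helper name `describe_environment` is hypothetical, and it relies on the `CONDA_DEFAULT_ENV` environment variable that `conda activate` sets.

```python
import os
import sys


def describe_environment(environ=None, version_info=None):
    """Summarize the active conda environment and the Python version.

    Both arguments default to the live process values; they are parameters
    only so the function can be exercised with known inputs.
    """
    environ = os.environ if environ is None else environ
    version_info = sys.version_info if version_info is None else version_info
    return {
        # Set by `conda activate`; None when no environment is active.
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),
        "python": f"{version_info[0]}.{version_info[1]}",
        "is_python3": version_info[0] == 3,
    }


if __name__ == "__main__":
    print(describe_environment())
```

Run with the "DataCleaning" environment active, the reported environment name should be DataCleaning and the Python version should be 3.8 if the commands above were used unchanged.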

Installation

Now that you have set up the environment and installed Jupyter, you are ready to install ITALLIC.

  • Install ITALLIC.
$ conda install -c conda-forge itallic -y

If for some reason you have issues installing the package, try updating conda using the command below.

$ conda update --all --yes
  • You can now deactivate the conda environment and switch to using Jupyter Notebook to get started.
$ conda deactivate
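To confirm that the installation succeeded, a check like the one below can be run while the "DataCleaning" environment is active, or later from a notebook that uses the "DataCleaning" kernel. It is a sketch under one assumption: that the package is importable under the name `itallic`, which matches the conda package name but is not stated in this README.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    name = "itallic"  # Assumption: the import name matches the package name.
    status = "found" if is_installed(name) else "not found"
    print(f"{name}: {status}")
```

If the module is reported as not found, check that the correct environment or kernel is active before reinstalling.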

Getting Started

  • Create a working directory
$ mkdir DataCleaningDir
  • Navigate into the directory
$ cd DataCleaningDir
  • Get compressed folder with country boundary data and a sample dataset to use for testing
$ wget https://github.com/getiria-onsongo/itallic/raw/main/resources/data.tar.gz

If your platform does not have wget, you can install it using conda.

$ conda install -c conda-forge wget

  • Uncompress data folder
$ tar -xvf data.tar.gz 

You can also download the compressed folder by clicking on this link and then clicking the "Download" button.
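If neither wget nor tar is available, the download and extraction steps can also be done with the Python standard library. The following is a minimal sketch: the function names `download` and `extract` are hypothetical helpers, and the URL is the one given in the wget command above.

```python
import os
import tarfile
import urllib.request

DATA_URL = (
    "https://github.com/getiria-onsongo/itallic/raw/main/resources/data.tar.gz"
)


def download(url: str, filename: str) -> str:
    """Download url to filename and return the local path."""
    path, _headers = urllib.request.urlretrieve(url, filename)
    return path


def extract(archive: str, dest: str = ".") -> list:
    """Extract a .tar.gz archive into dest and return the member names."""
    dest_root = os.path.realpath(dest)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        # Refuse archives whose members would be written outside dest.
        for name in names:
            target = os.path.realpath(os.path.join(dest_root, name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {name}")
        tar.extractall(dest_root)
    return names


# Usage (requires network access), run from inside DataCleaningDir:
#     extract(download(DATA_URL, "data.tar.gz"))
```

The path check is a precaution against archives that try to write outside the working directory; it is not specific to ITALLIC's data archive.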

  • Download a Getting Started Python Notebook with basic commands on how to get started.
$ wget https://github.com/getiria-onsongo/itallic/raw/main/resources/GettingStarted.ipynb
  • Launch Jupyter Notebook
$ jupyter notebook
  • Once you launch the notebook, a browser should be launched with contents of your working directory displayed as shown below. Double click on the Getting Started notebook.

  • Ensure you are using the "DataCleaning" kernel we created, which has itallic and its dependencies installed. The image below illustrates how to change your notebook kernel.

  • Follow the notebook to learn basic commands on how to get started.
