Name	Name	Last commit message	Last commit date
parent directory ..
conda	conda
intake_example_package	intake_example_package
MANIFEST.in	MANIFEST.in
README.md	README.md
setup.py	setup.py

Intake Example Package

A very simply python package structure showing how to "install data" using only setuptools / pip and not requiring conda.

This is done using the entrypoints keys in setup.py:

    entry_points={
        'intake.catalogs': [
            'sea_cat = intake_example_package:cat',
            'sea_data = intake_example_package:data'
        ]
    }

Explanations

entry_points: this key is required, states that this information is to be accessible using the entrypoints package. This is a dependency of Intake, you should not need to install it separately.
'intake.catalogs': marks the list that follows as a set of installed catalogs
'sea_cat = intake_example_package:cat': the first identifier is the name to give in the builtin catalog, and the second part is a reference to the catalog object, as module:object .
'sea_data = intake_example_package:data': very similar to the previous line, but this time referencing a data source rather than a catalog. Remember that, in Intake, a catalog is just another data source.

Effectively, this does the following once the package is installed: set the "sea_cat" entry in intake.cat to reference the "cat" catalog object in intake_example_package, and "sea_data" to reference the "data" data source.

Try it:

In this directory, run install, one of

python setup.py develop
python setup.py install
pip install -e .
pip install . (the "develop" or "-e" versions use links rather than copying files; you may also need to add options for "user" installs if you don't have write permissions in your environment)

Or if you don't want to download and extract,

pip install git+https://github.com/intake/intake-examples#egg=intake_example_package&subdirectory=data_package

Following that, you can try in python, and see both entries in action:

>>> import intake
>>> list(intake.cat)  # may have more entries here
['sea_cat', 'sea_data']
>>> list(intake.cat.sea_cat)  # this is a catalog
['sea_ice']
>>> intake.cat.sea_cat.sea_ice.read().head()
      Time  Arctic  Antarctica
0  1990M01   12.72        3.27
1  1990M02   13.33        2.15
2  1990M03   13.44        2.71
3  1990M04   12.16        5.10
4  1990M05   10.84        7.37
>>> intake.cat.sea_data.read().head()
      Time  Arctic  Antarctica
0  1990M01   12.72        3.27
1  1990M02   13.33        2.15
2  1990M03   13.44        2.71
3  1990M04   12.16        5.10
4  1990M05   10.84        7.37

The data source also comes with a predefined plot, so you can visualise that (in a notebook or via the viz server) or run the interactive dataframe viz GUI. Enjoy!

Notes

Intake does not import the module, unless a user uses the entry
In this example, the two data sources created in the init of a package, but it could have been a single module, or conversely, within a sub-package
the package.module:object should refer to the original definition of the object, where it is instantiated; i.e., if you have a package init with a statement like from .data import mycat, you should reference package.data:mycat and not package:mycat
The catalog object here is created with intake.open_catalog, but any method for creating Catalog of any other data source object would do.

Distribute

To PyPI

Since this directory forms a complete python package, you could upload it to pypi. For example, the following commands would do the job

> python setup.py sdist bdist_wheel
> twine upload dist/*

(this assumes you have twine installed and have a valid account on PyPI).

If you were to try the above command, you would find that the package name is already taken by the Intake authors, but you can use this pattern to upload your own package names for your own data/catalogs. In this case, it make it possible for anyone else to then install the data package using

> pip install intake_example_package

(instead of the installation lines in the "Try it" section, above)

Note that it would be good practice to modify setup.py to include further information such as a detailed description/readme, all dependencies and their required versions, licencing, etc., before building and uploading anything. Note that setup.py specifically includes YAML file. You could, in theory, include data files directly in such packages, but in practice, PyPI packages should be small, and the data referenced remotely.

Build your own conda package

The folder conda/ contains a simple recipe for conda. Assuming you have conda-build installed, you can run

conda build conda/

and either share the resultant file directly, or upload the file to your channel (the command will be printed to the console when the package build is done).

This example can be found on anaconda cloud and installed using

> conda install -c intake intake_example_package

Of course, here you would substitute the name and channel appropriate for your data package.

Using conda-forge

conda-forge provides an automated and uniform way to build and upload conda packages, which will appear on the popular conda-forge channel. It will notice when new version are uploaded to PyPI, and automatically build new versions.

Full instructions on how to submit a package for inclusion in conda-forge can be found in their documentation. The eventual installation line would look something like

> conda install -c conda-forge newdatapackage

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

data_package

data_package

conda

conda