snorkel-team/snorkel-zoo

Snorkel Zoo

The Snorkel Zoo is a collection of utilities for writing labeling functions, transformation functions, and slicing functions, the operator types used in the core Snorkel library. We've demonstrated the efficacy of templates or declarative operators across a range of use cases in prior work (Ratner et al., 2019). In this repository, we aim to provide a shared resource of builders, generators, and primitives that are effective in both research and production contexts. More importantly, we're excited to crowdsource ideas from the community!

Structure

The repository is divided into subfolders for builders, generators, and primitives.

Templates

For a single problem, it's helpful to have a shared interface for building a specific type of labeling function. For instance, the Intro Tutorial defines several keyword labeling functions using a shared template:

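from snorkel.labeling import LabelingFunction

# Label conventions from the Intro Tutorial (-1 means "abstain").
ABSTAIN = -1
HAM = 0
SPAM = 1


def keyword_lookup(x, keywords, label):
    # Vote `label` if any keyword appears in the comment text; otherwise abstain.
    if any(word in x.text.lower() for word in keywords):
        return label
    return ABSTAIN


def make_keyword_lf(keywords, label=SPAM):
    # Shared template: every keyword LF reuses keyword_lookup with different resources.
    return LabelingFunction(
        name=f"keyword_{keywords[0]}",
        f=keyword_lookup,
        resources=dict(keywords=keywords, label=label),
    )


# Spam comments talk about 'my channel', 'my video', etc.
keyword_my = make_keyword_lf(keywords=["my"])

# Spam comments ask users to subscribe to their channels.
keyword_subscribe = make_keyword_lf(keywords=["subscribe"])

# Spam comments post links to other channels.
keyword_link = make_keyword_lf(keywords=["http"])

# Spam comments make requests rather than commenting.
keyword_please = make_keyword_lf(keywords=["please", "plz"])

# Ham comments actually talk about the video's content.
keyword_song = make_keyword_lf(keywords=["song"], label=HAM)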
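The same template pattern can be sketched without installing Snorkel. In this minimal version, `SimpleLF` and `apply_lfs` are hypothetical stand-ins (not Snorkel's `LabelingFunction` or its appliers), and each data point is assumed to be a plain string rather than a DataFrame row:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Label conventions mirroring the tutorial: -1 means "abstain".
ABSTAIN, HAM, SPAM = -1, 0, 1


@dataclass
class SimpleLF:
    """Hypothetical stand-in for a labeling function: a name, a function,
    and keyword arguments bound to it (playing the role of `resources`)."""
    name: str
    f: Callable[..., int]
    resources: Dict[str, Any] = field(default_factory=dict)

    def __call__(self, x: str) -> int:
        return self.f(x, **self.resources)


def keyword_lookup(x: str, keywords: List[str], label: int) -> int:
    # Vote `label` if any keyword appears in the lowercased text.
    text = x.lower()
    return label if any(word in text for word in keywords) else ABSTAIN


def make_keyword_lf(keywords: List[str], label: int = SPAM) -> SimpleLF:
    # The template: every keyword LF shares one function and differs
    # only in the resources bound to it.
    return SimpleLF(
        name=f"keyword_{keywords[0]}",
        f=keyword_lookup,
        resources=dict(keywords=keywords, label=label),
    )


def apply_lfs(lfs: List[SimpleLF], data: List[str]) -> List[List[int]]:
    # Hypothetical helper: one row per data point, one column per LF.
    return [[lf(x) for lf in lfs] for x in data]


lfs = [
    make_keyword_lf(["subscribe"]),
    make_keyword_lf(["please", "plz"]),
    make_keyword_lf(["song"], label=HAM),
]
comments = ["Please subscribe to my channel", "Great song!", "nice"]
for row in apply_lfs(lfs, comments):
    print(row)
# [1, 1, -1]
# [-1, -1, 0]
# [-1, -1, -1]
```

With the template in place, adding a new heuristic is a one-line call, and every keyword LF is named and behaves consistently.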

Generators

Labeling functions can also be generated programmatically. We've explored this in a number of settings, from automatically generated labeling functions (Varma et al., 2019) to natural language interfaces for parsing labeling functions (Hancock et al., 2018). In the Crowdsourcing Tutorial, we show a generator that produces one labeling function for each crowdworker:

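from snorkel.labeling import LabelingFunction

ABSTAIN = -1


def worker_lf(x, worker_dict):
    # Return this worker's label for the tweet, or abstain if they didn't label it.
    return worker_dict.get(x.tweet_id, ABSTAIN)


def make_worker_lf(worker_id):
    # Assumes `worker_dicts` maps each worker_id to a {tweet_id: label} dict,
    # built from the crowd annotations earlier in the tutorial.
    worker_dict = worker_dicts[worker_id]
    name = f"worker_{worker_id}"
    return LabelingFunction(name, f=worker_lf, resources={"worker_dict": worker_dict})


# One labeling function per crowdworker.
worker_lfs = [make_worker_lf(worker_id) for worker_id in worker_dicts]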
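To make the generator pattern concrete without Snorkel or the tutorial data, here is a minimal sketch that builds `worker_dicts` from raw annotations and emits one labeling function per worker. The annotation tuples, `build_worker_dicts`, and the plain-closure LFs (which take a tweet ID rather than a data point) are hypothetical:

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

ABSTAIN = -1

# Hypothetical raw crowd annotations: (worker_id, tweet_id, label).
annotations: List[Tuple[str, int, int]] = [
    ("w1", 101, 1), ("w1", 102, 0),
    ("w2", 101, 1), ("w2", 103, 0),
    ("w3", 102, 1),
]


def build_worker_dicts(rows: List[Tuple[str, int, int]]) -> Dict[str, Dict[int, int]]:
    # Map each worker to the {tweet_id: label} votes they cast.
    worker_dicts: Dict[str, Dict[int, int]] = defaultdict(dict)
    for worker_id, tweet_id, label in rows:
        worker_dicts[worker_id][tweet_id] = label
    return dict(worker_dicts)


def make_worker_lf(worker_id: str, worker_dict: Dict[int, int]) -> Callable[[int], int]:
    # The generator: one closure per worker, abstaining on tweets
    # that worker never labeled.
    def worker_lf(tweet_id: int) -> int:
        return worker_dict.get(tweet_id, ABSTAIN)

    worker_lf.__name__ = f"worker_{worker_id}"
    return worker_lf


worker_dicts = build_worker_dicts(annotations)
worker_lfs = [make_worker_lf(w, d) for w, d in worker_dicts.items()]

tweet_ids = [101, 102, 103]
label_matrix = [[lf(t) for lf in worker_lfs] for t in tweet_ids]
print([lf.__name__ for lf in worker_lfs])  # ['worker_w1', 'worker_w2', 'worker_w3']
print(label_matrix)  # [[1, 1, -1], [0, -1, 1], [-1, 0, -1]]
```

The number of labeling functions grows with the number of workers without any hand-written code per worker, which is the main appeal of a generator.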

Primitives

For certain use cases, it's helpful to generate primitives, or basic features, over the underlying data for Snorkel operators to access. This is especially important for non-textual data modalities, as we've shown in work on medical imaging (Fries et al., 2019) and computer vision (Chen et al., 2019).
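As a minimal sketch of the idea, assume a toy grayscale-image task: primitives are computed once per image, and lightweight labeling functions vote using only those primitives, never the raw pixels. The `Primitives` fields, thresholds, and labels below are hypothetical. In Snorkel itself, preprocessors (`snorkel.preprocess`) can play a similar role of attaching derived features to each data point.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

Image = List[List[float]]  # toy grayscale image, intensities in [0, 1]


@dataclass(frozen=True)
class Primitives:
    mean_intensity: float
    bright_fraction: float  # share of pixels with intensity above 0.8


def compute_primitives(img: Image) -> Primitives:
    # Computed once per data point, then shared by every LF.
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    bright = sum(1 for p in pixels if p > 0.8) / len(pixels)
    return Primitives(mean_intensity=mean, bright_fraction=bright)


# LFs read primitives, not raw pixels (thresholds are made up).
def lf_many_bright(p: Primitives) -> int:
    return POSITIVE if p.bright_fraction >= 0.5 else ABSTAIN


def lf_dark(p: Primitives) -> int:
    return NEGATIVE if p.mean_intensity < 0.2 else ABSTAIN


lfs: List[Callable[[Primitives], int]] = [lf_many_bright, lf_dark]

images: Dict[str, Image] = {
    "scan_a": [[0.9, 0.95], [0.85, 0.1]],
    "scan_b": [[0.05, 0.1], [0.0, 0.15]],
}
prims = {name: compute_primitives(img) for name, img in images.items()}
votes = {name: [lf(p) for lf in lfs] for name, p in prims.items()}
print(votes)  # {'scan_a': [1, -1], 'scan_b': [-1, 0]}
```

Separating the (possibly expensive) primitive computation from the labeling logic means new heuristics can be written and iterated on quickly, and the same primitives can be reused across labeling, transformation, and slicing functions.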

Contributing

Coming soon!
