Accessing components of a step result #485

Open
lgessler opened this issue Dec 6, 2022 · 4 comments

lgessler commented Dec 6, 2022

Suppose I'm working on a POS tagger. I have two steps, one where I load my dataset and another where I train my model:

{
  data: { ... },
  trained_model: {
    type: "torch::train",
    model: {
      ...
      tag_count: ... // what here?
    },
    ...
  }
}

I want to compute the number of unique tags in the loaded dataset dynamically instead of hardcoding the tag count in the config. I could do this during the data step and include the count in its output. However, I don't see how a ref like { type: "ref", ref: "data" } could help, since I really want to access a specific subpart of the data step's result, e.g. the attribute tag_count.

One workaround is to make a third step between the two, count_tags, which takes data and produces the tag_count as its step result, though this gets awkward when you want to do this for many vocabularies.

Does Tango have any amenities for cases like this? It'd be great if I could write something like { type: "ref", ref: "data", attrs: ["tag_count"] }.

Member

epwalsh commented Dec 6, 2022

Hey @lgessler, we do have a mechanism for accessing a specific key in the output of another step. This works for things that act like dictionaries, tuples, or lists. For example, you would do { type: "ref", ref: "data", key: "tag_count" } if the output from your data step looks like this:

def run(self, ...) -> Dict[str, Any]:
    return {"dataset": dataset, "tag_count": tag_count}

Or {type: "ref", ref: "data", key: 1} if the output from your data step looks like this:

def run(self, ...) -> Tuple[Dataset, int]:
    return dataset, tag_count
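
To make the two cases concrete, here is a minimal plain-Python sketch of what selecting a key from a step result amounts to. The names `load_data` and `resolve_ref` are hypothetical and exist only for this illustration. This is not Tango's implementation, just ordinary indexing on whatever the step returned, which is why the same idea covers dict-like, tuple, and list results.

```python
from typing import Any, Dict, List, Tuple


# Hypothetical stand-in for the body of a data step's run() method.
def load_data(rows: List[Tuple[str, str]]) -> Dict[str, Any]:
    # Count the distinct POS tags seen in (token, tag) pairs.
    tag_count = len({tag for _, tag in rows})
    return {"dataset": rows, "tag_count": tag_count}


# Hypothetical helper: pick one component of a step result by key or index.
def resolve_ref(result: Any, key: Any = None) -> Any:
    # With no key, the whole result is passed along unchanged.
    return result if key is None else result[key]


rows = [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB"), ("a", "DET")]
as_dict = load_data(rows)
as_tuple = (as_dict["dataset"], as_dict["tag_count"])

print(resolve_ref(as_dict, "tag_count"))  # dict result, key: "tag_count"
print(resolve_ref(as_tuple, 1))           # tuple result, key: 1
```

Both calls yield the same tag count, so the choice between a dict and a tuple return type only changes which `key` goes in the config.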

Member

epwalsh commented Dec 6, 2022

Now that I think of it, we could probably also support attributes in addition to keys. If anyone is interested in making a PR for that, have a look at the PR that added support for keys: #371
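
One possible shape for attribute support, sketched in plain Python: try item access first, then fall back to `getattr` when the key is a string. `DataResult` and `get_component` are hypothetical names. Nothing here is taken from #371 or from Tango's code base, so treat it as a starting point for discussion rather than a design.

```python
from dataclasses import dataclass
from typing import Any, List


# Hypothetical step result that exposes its parts as attributes.
@dataclass
class DataResult:
    dataset: List[str]
    tag_count: int


def get_component(result: Any, key: Any) -> Any:
    try:
        # Works for dicts, tuples, lists, and anything with __getitem__.
        return result[key]
    except (TypeError, KeyError, IndexError):
        # Fall back to attribute access for string keys only.
        if isinstance(key, str) and hasattr(result, key):
            return getattr(result, key)
        raise  # re-raise the original lookup error


print(get_component(DataResult(["the", "dog"], 3), "tag_count"))
print(get_component({"tag_count": 5}, "tag_count"))
```

One caveat with this fallback order: a missing dict key that happens to match a method name (such as "keys") would return the method instead of raising, so a real implementation might want a separate config field for attributes.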

Author

lgessler commented Dec 6, 2022

Awesome! AFAICT this isn't discussed in the docs (and I don't see any docs changes in #371). Would you accept a PR on the docs?

Member

epwalsh commented Dec 6, 2022

Absolutely! That would be great
