feat(evaluate): add signwriting evaluation metrics #1104

AmitMY · 2024-02-10T16:18:10Z

This pull request fixes #1103

Changes here are complete (this works), but we might want to figure out a different solution, in case this is too much hardcoding for you all (perhaps something like an external evaluation script, and a single custom metric that can call that script)

Note! that unlike BLEU or other metrics, this evaluation takes all factors into account, as they matter for SignWriting. Even if using a non-factored setup though, this evaluation works.

What do you think? how can we integrate this into sockeye?
This is very important to me, to push forward the adoption of SignWriting as a written form of signed languages, allowing both transcription and translation models to be trained with sockeye.

Tests fail with the following error, despite an Nvidia Tesla T4 GPU available:

ValueError: Expected a cuda device, but got: CPU

Pull Request Checklist

Changes are complete (if posting work-in-progress code, prefix your pull request title with '[WIP]'
until you can check this box.
Unit tests pass (pytest)
System tests pass (pytest test/system)
Passed code style checking (./style-check.sh)
You have considered writing a test
Updated major/minor version in sockeye/__init__.py. Major version bump if this is a backwards incompatible change.
Updated CHANGELOG.md

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

mjdenkowski · 2024-02-14T03:36:38Z

Hi Amit,

It's great to see that you have an implementation working! We can think about two questions for adding new features to Sockeye:

How can we make the feature general enough to be broadly useful for the community?
How can we make the feature available in a way where it can be easily understood and safely accessed?

For now, it looks like you have an implementation that works for your use case and that you could share with other interested users via a forked repository. In that case, you would control everything and wouldn't need to add extra layers of engineering.

Running your metric with the main version of Sockeye would be more challenging. Adding support for custom metric scripts is a great idea from a science perspective. Since Sockeye is also built to run in production environments, we need to avoid security vulnerabilities like arbitrary code execution. Designing, implementing, and documenting a secure way to run user-defined metrics would significantly increase the scope of this project.

I recommend using your fork of Sockeye for now and revisiting this if using the fork starts to block your work or the work of others working on SignWriting.

Best,
Michael

AmitMY · 2024-02-14T08:32:14Z

Thanks.
I think that since my implementation is generic for SignWriting outputs, it might be valid to introduce inside of sockeye.
It does not change the behavior for spoken language users, but all of a sudden allows for more experimentation with signed languages.

In any case, for the meanwhile I am successfully using my fork.

AmitMY added 2 commits February 10, 2024 17:05

feat(evaluate): add signwriting evaluation metrics

f070c8c

chore(): changelog and bump release version

93099c7

AmitMY requested review from fhieber, tdomhan, mjdenkowski, xingniu and annacurrey as code owners February 10, 2024 16:18

fix(evaluate): clip score remove cache directory

baf2a5a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(evaluate): add signwriting evaluation metrics #1104

feat(evaluate): add signwriting evaluation metrics #1104

AmitMY commented Feb 10, 2024 •

edited

mjdenkowski commented Feb 14, 2024

AmitMY commented Feb 14, 2024

feat(evaluate): add signwriting evaluation metrics #1104

Are you sure you want to change the base?

feat(evaluate): add signwriting evaluation metrics #1104

Conversation

AmitMY commented Feb 10, 2024 • edited

Pull Request Checklist

mjdenkowski commented Feb 14, 2024

AmitMY commented Feb 14, 2024

AmitMY commented Feb 10, 2024 •

edited