Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data Infra for Training #124

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

Conversation

bgenchel
Copy link
Collaborator

started a new branch because there was some mismatch with the previous branch and remote, so this seemed easier.

Not infra as in backend, obv. but code structure and tests for datasets, with guitarset implemented as a first inclusion / example.

Benjamin Genchel and others added 5 commits April 24, 2024 18:50
…ng-fork. implemented and tested a dataset only arg version of download.py. Need to reconsider the other arguments.
…to test the ecosystem. Tests are written but tox is currently not passing. Many files reformatted by black.
…project.toml and tox.yml, which needs review. Added .jams files to MANIFEST.in to accomodate test file for guitarset. Added check for pytest run to guitarset download to avoid unnecessary full download during testing, need review. tox passing.
…roject.toml. Moved all tests relating to data into their own test directory. Updated path to resources accordingly.
… test for tf_example_serialization, added / corrected tests for other data / train files.
@bgenchel bgenchel self-assigned this Apr 24, 2024
CONTRIBUTING.md Outdated
@@ -13,6 +13,7 @@ We recommend first installing the following non-python dependencies:
- To install on Windows, run `choco install libsndfile` using [Chocolatey](https://chocolatey.org/)
- To install on Ubuntu, run `sudo apt-get update && sudo apt-get install --no-install-recommends -y --fix-missing pkg-config libsndfile1`
- [ffmpeg](https://ffmpeg.org/) is a complete, cross-platform solution to record, convert and stream audio in all `basic-pitch` supported formats
- sox
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a description and mention that it's required for data creation?

"disk_size_gb": 128,
"experiments": ["use_runner_v2"],
"save_main_session": True,
"worker_harness_container_image": known_args.worker_harness_container_image,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to run a DirectRunner test with a harness container image? We may need to provide a Dockerfile or instructions on how to create one.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's discuss in meeting

@@ -0,0 +1,41 @@
import argparse
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add copyright notice

#!/usr/bin/env python
# encoding: utf-8
#
# Copyright 2022 Spotify AB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speaking of copy right notices, you probably should update the year.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in all the files?

@@ -110,7 +110,11 @@ def model_output_to_notes(
)


def sonify_midi(midi: pretty_midi.PrettyMIDI, save_path: Union[pathlib.Path, str], sr: Optional[int] = 44100) -> None:
def sonify_midi(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh why is there a ton of formatting changes? Do you use a different max line length than the one specified in pyproject.toml?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, but I did run the formatter via tox. I can switch it all back.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the new ones at least

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched it all back, but after running tox it does seem that black wants to reformat two long lines into multi line blocks even though they don't exceed the line limit. Wonder if it's some other rule

pyproject.toml Outdated
@@ -19,6 +19,7 @@ classifiers = [
]
dependencies = [
"coremltools; platform_system == 'Darwin'",
"apache_beam",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be in the general project deps. This should be in extras only.

… sox in CONTRIBUTING.md, add copyright notice to download.py, remove apache_beam from general dependencies.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants