Adding SalientExcerptMixSourceFolder object #228

ethman · 2021-06-26T01:59:22Z

Fixes #225

codecov-commenter · 2021-07-02T18:38:23Z

Codecov Report

Merging #228 (134030f) into master (471e796) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master      #228   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           72        72           
  Lines         5209      5280   +71     
=========================================
+ Hits          5209      5280   +71

Impacted Files	Coverage Δ
nussl/datasets/__init__.py	`100.00% <ø> (ø)`
nussl/core/utils.py	`100.00% <100.00%> (ø)`
nussl/datasets/hooks.py	`100.00% <100.00%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 471e796...134030f.

ethman

Thanks for doing this! I've added inline comments, but my main takeaway is that I would expect the changes to be contained within the SalientExcerptMixSourceFolder object and its related tests, and it seems like there's a bit of spillover. Overall, great work, Boaz!

ethman · 2021-07-02T22:10:09Z

nussl/core/utils.py

+    hop_dur = int(dur * hop_ratio)
+    threshold = np.power(10.0, threshold_db / 20.0)
+    # adjust the shape of a mono track
+    if audio.shape[0] == 1:


Does this work with stereo?

boazcogan · Jul 6, 2021

I'll double-check, but I believe stereo worked fine; for some reason, mono needed to be formatted differently, hence the special check.
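A minimal sketch of the check under discussion, assuming the audio arrives shaped (n_channels, n_samples) with mono as (1, n_samples). The helper name `segment_is_salient` is hypothetical and is not the function in nussl/core/utils.py. It reuses the excerpt's dB-to-linear conversion (10 ** (threshold_db / 20)) and computes RMS over every channel at once, so mono and stereo share one code path. Whether that makes the special mono case unnecessary depends on how the calling code shapes its input.

```python
import numpy as np


def segment_is_salient(segment, threshold_db):
    """Return True if the RMS of `segment` exceeds `threshold_db` (dBFS).

    Hypothetical helper for illustration. `segment` may be 1-D
    (n_samples,) or 2-D (n_channels, n_samples).
    """
    # Promote a bare 1-D mono array to (1, n_samples).
    segment = np.atleast_2d(segment)
    # Same conversion as the excerpt: dB -> linear amplitude.
    threshold = np.power(10.0, threshold_db / 20.0)
    # RMS over all channels and samples, so mono and stereo take one path.
    rms = np.sqrt(np.mean(np.square(segment)))
    return bool(rms > threshold)
```

For example, a constant 0.5-amplitude signal (RMS 0.5) passes a -20 dB threshold (0.1 linear) whether it is mono or stereo, but fails a -3 dB threshold (about 0.708 linear).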

ethman

Overall looks good again. Getting closer. Keep iterating. I think we should nix the padding, as discussed below.

One nitpick: make sure your code follows PEP 8, specifically the 79-character maximum line length: https://pep8.org/#maximum-line-length
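A linter such as flake8 reports this rule as E501 (assuming it is installed). The dependency-free sketch below only illustrates the rule; the script, the `long_lines` function, and its output format are our own and are not part of nussl's tooling.

```python
import sys

MAX_LINE_LENGTH = 79  # PEP 8's limit for lines of code


def long_lines(source, limit=MAX_LINE_LENGTH):
    """Return (line_number, length) for every line longer than `limit`."""
    return [
        (number, len(line))
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > limit
    ]


if __name__ == "__main__":
    # Usage: python check_lines.py nussl/datasets/hooks.py
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            for number, length in long_lines(handle.read()):
                print(f"{path}:{number}: {length} > {MAX_LINE_LENGTH}")
```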

ethman

Looking really good, @boazcogan! Most of my comments are nitpicks. I do have some opinions about how you should structure your code and logic for the balancing, but I'll pop into your stand-up tomorrow to discuss.

ethman · 2021-07-12T16:14:57Z

nussl/datasets/hooks.py

+        folder (str): Location that should be processed to produce the
+            list of files.
+        salient_src (str): The name of the source that will be used to identify
+         salient samples in the dataset. e.g. ('drums')


Make sure all of the wrapped lines have the same indent. Some have 4 spaces (or 1 tab); others have 2. Follow the docstrings in the rest of this file. This will be important when we build the documentation website.
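For reference, here is a sketch of the excerpted `Args` entries rewritten with one consistent hanging indent (4 extra spaces) and capitalized sentences, in the Google style the excerpt already follows. The stub function and the `continuation_indents` checker are hypothetical. Describing `hop_ratio` as a fraction of `segment_dur` is an assumption based on `hop_dur = int(dur * hop_ratio)` in utils.py.

```python
import inspect


def salient_excerpt_params(folder, salient_src, segment_dur=4.0,
                           hop_ratio=0.5):
    """Hypothetical stub that only exists to carry the docstring below.

    Args:
        folder (str): Location that should be processed to produce the
            list of files.
        salient_src (str): The name of the source that will be used to
            identify salient samples in the dataset, e.g. 'drums'.
        segment_dur (float, optional): The duration of the desired audio
            clips, in seconds. Defaults to 4.0.
        hop_ratio (float, optional): Size of the hop between consecutive
            segment windows (and RMS computations), as a fraction of
            segment_dur. Defaults to 0.5.
    """


def continuation_indents(func):
    """Collect the indent width of every wrapped line under ``Args:``."""
    lines = inspect.getdoc(func).splitlines()  # common indent removed
    start = lines.index("Args:") + 1
    indents = set()
    for line in lines[start:]:
        stripped = line.lstrip()
        if not stripped:
            continue
        indent = len(line) - len(stripped)
        if indent > 4:  # deeper than an argument line => a continuation
            indents.add(indent)
    return indents
```

A single value in the returned set means every wrapped line shares one indent.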

ethman · 2021-07-12T16:15:39Z

nussl/datasets/hooks.py

+        segment_dur (float, optional): the duration of the desired audio clips
+         in seconds. Defaults to 4.0.
+        hop_ratio (float, optional): size of the hops to use when computing
+         the RMS. Defaults to 0.5.


Not just the RMS but, more importantly, the segment window.
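To illustrate the point that `hop_ratio` sets the stride of the segment windows themselves, here is a sketch that enumerates window start positions. It assumes windows are `segment_dur` seconds long and the hop is `int(segment_length * hop_ratio)` samples, mirroring `hop_dur` in utils.py. `segment_starts` is a hypothetical name.

```python
def segment_starts(n_samples, sample_rate, segment_dur=4.0, hop_ratio=0.5):
    """Start indices (in samples) of every full-length segment window.

    Hypothetical helper: signals shorter than one window yield no starts.
    """
    seg_len = int(segment_dur * sample_rate)
    hop = int(seg_len * hop_ratio)  # mirrors hop_dur = int(dur * hop_ratio)
    if hop <= 0:
        raise ValueError("hop_ratio is too small for this segment length")
    return list(range(0, n_samples - seg_len + 1, hop))
```

With the defaults, a 10 s toy signal at 100 Hz yields windows starting at 0, 2, 4, and 6 s, each overlapping the next by half. A signal shorter than 4 s yields none, which is one way to avoid padding.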

ethman · 2021-07-12T16:18:03Z

nussl/datasets/hooks.py

+         in seconds. Defaults to 4.0.
+        hop_ratio (float, optional): size of the hops to use when computing
+         the RMS. Defaults to 0.5.
+        verbose (bool, optional): suppress progress bar and logging for the


This language is a bit ambiguous. Does "suppress progress bar..." mean that setting this arg to True suppresses the progress bar? For bools, I like to be explicit in the docs: "If True, this flag will blah blah. Else, blah blah."
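Below is a sketch of the explicit "If True ... Otherwise ..." phrasing applied to a hypothetical `scan_items` function. Whether `verbose=True` should show or hide output is the author's call. This sketch assumes True means "show progress" and uses the standard `logging` module in place of a progress bar.

```python
import logging

logger = logging.getLogger(__name__)


def scan_items(items, verbose=False):
    """Scan items for salient segments (hypothetical stand-in).

    Args:
        items (list): Items to scan.
        verbose (bool, optional): If True, log a progress message for
            each item. Otherwise, run silently. Defaults to False.

    Returns:
        int: The number of items scanned.
    """
    for index, item in enumerate(items, start=1):
        if verbose:
            logger.info("scanned %d/%d: %s", index, len(items), item)
    return len(items)
```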

ethman · 2021-07-12T16:19:39Z

nussl/datasets/hooks.py

+                                                       self.hop_ratio,
+                                                       mix.sample_rate,
+                                                       self.threshold_db)
+            # this is a fairly cheap operation, no need to embed it within a


I don't think you need this comment

ethman · 2021-07-12T16:24:00Z

nussl/datasets/hooks.py

+                        'mixsrc_item': item,
+                        'start': start / mix.sample_rate
+                    })
+        # if the balance flag is set then balance the metadata


Same with this comment. In general, use inline comments to explain code excerpts that might be difficult to parse like complex logic, a seemingly unorthodox decision, or the expected contents of a data structure. Adding unnecessary comments can clutter up the code.

A more effective comment here would be something like # self._balance_set() will take care of the balancing logic. But I'm still not convinced you need a comment here at all.

ethman · 2021-07-12T16:26:06Z

nussl/datasets/hooks.py

+        mixsrc_item = item['mixsrc_item']
+        # Note that the onset needs to be passed to the AudioSignal
+        # class as the kwarg offset.
+        onset = item['start']


Why not rename this variable to start = item['start']? That way the code is clearer and we don't need this comment to explain what's happening.

As it is right now, this variable goes by three names in just three lines of code: 'start' (in item['start']), onset, and offset (the process_item kwarg). I'm not sure there's enough context to name the variable offset here, because there are a few layers of abstraction before the offset arg is used directly, so I think start conveys what's happening well.

Sorry for the long-winded explanation; I'm walking you through my thought process.
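Here is the suggested naming applied end to end, as a sketch. The metadata stores each excerpt's start in seconds under `'start'` (as in the excerpt `'start': start / mix.sample_rate`). The loader reads it back into a variable also called `start` and only calls it `offset` at the keyword boundary. `build_metadata`, `load_excerpt`, and `load_fn` are hypothetical stand-ins for the dataset's own methods.

```python
def build_metadata(mixsrc_item, starts_in_samples, sample_rate):
    """One metadata entry per salient segment; 'start' is in seconds."""
    return [
        {'mixsrc_item': mixsrc_item, 'start': start / sample_rate}
        for start in starts_in_samples
    ]


def load_excerpt(entry, segment_dur, load_fn):
    """Fetch one excerpt; `start` keeps the name of its metadata key.

    `load_fn` is a hypothetical stand-in for whatever eventually reads
    the audio; it receives the start time through an ``offset`` keyword.
    """
    mixsrc_item = entry['mixsrc_item']
    start = entry['start']
    return load_fn(mixsrc_item, offset=start, duration=segment_dur)
```

For example, a start at sample 22050 of a 44.1 kHz mix is stored as 0.5 and passed through as `offset=0.5`.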

ethman · 2021-07-12T16:38:50Z

nussl/datasets/hooks.py

+        #  others
+        avg_length = np.mean([elem[0] for elem in sample_counts])
+        metadata = []
+        for count, song, starts, sample_rate in sample_counts:


Nit: the AudioSignal might not be a song, so perhaps this is a confusing variable name. item would be better and more consistent with the rest of the object.
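The balancing loop above is only partially visible, so the following sketch assumes one reasonable policy and is not the PR's actual logic. Items with more salient excerpts than the mean are randomly subsampled down to it, using a seeded generator so the result is reproducible (in the spirit of the final commit's "Forcing deterministic behavior"). The tuple layout `(count, item, starts, sample_rate)` follows the excerpt. `balance_metadata` is a hypothetical name.

```python
import numpy as np


def balance_metadata(sample_counts, seed=0):
    """Cap each item's salient excerpts at the mean count across items.

    `sample_counts` holds (count, item, starts, sample_rate) tuples.
    Capping over-represented items by random subsampling is an
    assumption made for illustration.
    """
    rng = np.random.default_rng(seed)  # fixed seed -> deterministic output
    avg_length = np.mean([elem[0] for elem in sample_counts])
    cap = max(1, int(avg_length))
    metadata = []
    for count, item, starts, sample_rate in sample_counts:
        if count > cap:
            # Keep `cap` of the starts, preserving their original order.
            keep = np.sort(rng.choice(count, size=cap, replace=False))
            starts = [starts[i] for i in keep]
        for start in starts:
            metadata.append({'mixsrc_item': item,
                             'start': start / sample_rate})
    return metadata
```

An alternative policy would oversample under-represented items up to the mean. Subsampling was chosen here only because it never repeats an excerpt.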

ethman · 2021-07-12T16:49:22Z

nussl/datasets/hooks.py

+            list of files.
+        salient_src (str): The name of the source that will be used to identify
+         salient samples in the dataset. e.g. ('drums')
+        sample_rate (int, optional): the sampling rate for the audio files.


Nit: Make sure to use proper grammar in docstrings, including capitalizing the first word of each sentence.

Making new SalientMixSrc branch

9901ea0

ethman assigned boazcogan Jun 26, 2021

ethman and others added 10 commits June 25, 2021 20:59

Merge branch 'master' into salient_mixsrc2

1300175

SalientExcerptMixSourceFolder dataset object is now functional. Modified MixSourceFolder object to increase versatility. Simplified a few lines from utils.

c120224

adding simple testcase

ed3f03d

starting handling for signals that are too short

ba68e1d

updating code to contain padding for signals and adding additional content to testcase.

d946c6d

fixing some simple bugs in the sequencing of padding

c89bd24

removing print statements and temporarily removing modal behavior from the segment_mode.

36a71fc

accidentally removed duration limiting code when reading

a31d319

repairing OnTheFly mixcosure function and adding documentation to getmix

120e899

repairing OnTheFly mixcosure function and adding documentation to getmix

8d6a2cc

boazcogan added 2 commits July 2, 2021 13:09

adding documentation and moving sample_rate from optional to required params.

7489cdf

adding documentation and moving sample_rate from optional to required params.

cde1ae9

ethman commented Jul 6, 2021

Integrating some changes suggested in feedback. Removing salient code from base_dataset, adding padding code to hooks, reverting commit of multitrack, cleaning asserts in tests.

841d906

ethman commented Jul 6, 2021

boazcogan added 2 commits July 6, 2021 17:41

updating hooks to be agnostic to sampling rate and removing padding parameter

0f67c3b

adding dataset balancing

ed7c12a

ethman commented Jul 12, 2021

boazcogan added 3 commits July 14, 2021 16:06

refining balancing code and adding testcases for the new mode.

706c156

adding a test for incorrect balance mode entry

f7868d0

Adjusting comments, docstring, and variable naming. Forcing deterministic behavior.

134030f
