measurements export does not consistently allow the strain column to be used as a grouping column #1428
Labels: bug
Current behavior and how to reproduce
The augur measurements export command does not let users name the column passed to --strain-column in the --grouping-column argument. It only works if the user knows that the strain column is renamed internally to strain, and only if the user does not also specify --include-columns. The following examples show the nature of this issue and how to reproduce the behavior.
Reproducing this error requires building a minimal tree and measurements dataset. I used the public data from the measurements panel paper as a starting point. The sequences and metadata are attached here: h3n2_data.zip
Clone the measurements panel repo and change into the directory:
Download the sequences and metadata attached above into the data/ subdirectory.
Run the workflow to build the Auspice JSONs with the latest Nextstrain CLI and image:
Confirm that the Auspice JSONs work as expected by dragging them onto https://auspice.us/.
Next, start a Nextstrain shell and rebuild the measurements panel manually with a command that overrides the grouping columns:
Confirm that this panel JSON works in auspice.us with the original tree JSON. Then, try adding the strain column to the list of grouping columns. This lets us reverse the grouping on the y-axis so it groups by test strain, which we need when we want to see the distribution of measurements in that direction:
This command produces the following error:
This (clearly incorrect) error suggests that the strain column, test_strain, gets renamed internally so that the user can no longer refer to it in the code that parses the grouping columns. To test this, we can change the name of the grouping column from test_strain to strain as follows:
This works!
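To make this first failure mode concrete, here is a minimal Python model of the behavior as inferred from the errors above. It assumes the strain column is renamed to strain before the grouping columns are checked. This is not augur's code: the function export_without_include and the reference_strain and titer columns are hypothetical, and only test_strain comes from this issue.

```python
# Hypothetical model of the behavior described above; not augur's actual code.
INTERNAL_STRAIN = "strain"  # name the strain column is assumed to be renamed to


def export_without_include(rows, strain_column, grouping_columns):
    """Rename the strain column, then validate grouping columns (assumed order)."""
    # Rename the user's strain column (e.g. test_strain) to the internal name.
    renamed = [
        {(INTERNAL_STRAIN if key == strain_column else key): value
         for key, value in row.items()}
        for row in rows
    ]
    # Grouping columns are checked against the renamed columns, so the
    # original strain column name no longer exists at this point.
    available = set(renamed[0]) if renamed else set()
    missing = [col for col in grouping_columns if col not in available]
    if missing:
        raise ValueError(f"grouping columns not found: {missing}")
    return renamed


# Hypothetical example rows; only the test_strain column name is from the issue.
rows = [{"test_strain": "A/x", "reference_strain": "A/y", "titer": 80}]

export_without_include(rows, "test_strain", ["reference_strain", "strain"])  # works
try:
    export_without_include(rows, "test_strain", ["reference_strain", "test_strain"])
except ValueError as err:
    print(err)  # grouping columns not found: ['test_strain']
```

In this model, grouping by the internal name succeeds while grouping by the name from the collection file fails, which matches the two commands above.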
But if I also add the --include-columns argument to limit which data appear in the panel, like so:
I get the following new error:
This error suggests that the grouping column validation associated with --include-columns occurs before test_strain gets renamed internally to strain. Then, when I change the grouping column to the value listed in the error message above (test_strain):
I get the following error again:
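The second failure mode fits an ordering where one check runs before the rename and another runs after it. The hypothetical sketch below shows why neither test_strain nor strain can pass both checks. It is again not augur's implementation: export_with_include is made up, and the order of the two checks is inferred from the error messages.

```python
# Hypothetical model of the --include-columns ordering problem; not augur's code.
INTERNAL_STRAIN = "strain"


def export_with_include(rows, strain_column, grouping_columns, include_columns):
    # Check 1 (assumed to run BEFORE the rename): every grouping column must
    # be one of the --include-columns, which use the original column names.
    not_included = [col for col in grouping_columns if col not in include_columns]
    if not_included:
        raise ValueError(
            f"grouping columns {not_included} are not in include columns {include_columns}"
        )
    # Keep only the included columns, renaming the strain column on the way.
    renamed = [
        {(INTERNAL_STRAIN if key == strain_column else key): row[key]
         for key in include_columns}
        for row in rows
    ]
    # Check 2 (assumed to run AFTER the rename): grouping columns must exist.
    available = set(renamed[0]) if renamed else set()
    missing = [col for col in grouping_columns if col not in available]
    if missing:
        raise ValueError(f"grouping columns not found: {missing}")
    return renamed


rows = [{"test_strain": "A/x", "reference_strain": "A/y", "titer": 80}]
include = ["test_strain", "reference_strain", "titer"]
# ["strain"] fails check 1; ["test_strain"] passes check 1 but fails check 2.
for grouping in (["strain"], ["test_strain"]):
    try:
        export_with_include(rows, "test_strain", grouping, include)
    except ValueError as err:
        print(err)
```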
Expected behavior
I would expect to be able to consistently refer to the strain column by its name in the input collection file for all arguments to the measurements export command. Although this use case is unusual for our traditional serology data, other types of data benefit from the ability to group by the strain column on the y-axis.
Possible solution
One possible solution would be to keep the original strain column under its original name and add a new internal id column that copies the strain column values for use by internal logic. We could then drop the original strain column from the internal data frame just before writing out the JSON version of the measurements.
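Below is a minimal sketch of this proposal. It represents the collection as a list of dicts rather than a data frame and uses a hypothetical internal column named _strain_id. It makes one assumption beyond the proposal itself: the original strain column is dropped before output only when it is not a grouping column, because a grouping presumably needs its column present in the written measurements. The function name export_measurements is also hypothetical.

```python
# Hypothetical sketch of the proposed fix; not augur's code.
INTERNAL_ID = "_strain_id"  # hypothetical internal column name


def export_measurements(rows, strain_column, grouping_columns, include_columns=None):
    """Validate with original names; use a copied id column internally."""
    columns = list(rows[0]) if rows else []
    if include_columns is not None:
        unknown = [col for col in include_columns if col not in columns]
        if unknown:
            raise ValueError(f"include columns not found: {unknown}")
        columns = list(include_columns)
    # Every argument is checked against the original column names, so the
    # strain column (e.g. test_strain) can be named directly.
    missing = [col for col in grouping_columns if col not in columns]
    if missing:
        raise ValueError(f"grouping columns not found: {missing}")
    # Keep the original strain column and add an internal copy of its values.
    table = [{**{col: row[col] for col in columns}, INTERNAL_ID: row[strain_column]}
             for row in rows]
    # Internal logic would key on INTERNAL_ID from here on.
    # Just before output: expose the internal id as "strain" and drop the
    # original strain column, unless (assumption) it is a grouping column.
    drop_original = strain_column not in grouping_columns
    measurements = []
    for row in table:
        record = {key: value for key, value in row.items()
                  if key != INTERNAL_ID and not (drop_original and key == strain_column)}
        record["strain"] = row[INTERNAL_ID]
        measurements.append(record)
    return measurements


rows = [{"test_strain": "A/x", "reference_strain": "A/y", "titer": 80}]
include = ["test_strain", "reference_strain", "titer"]
print(export_measurements(rows, "test_strain", ["reference_strain", "test_strain"], include))
```

With this approach, the same column name works for --strain-column, --grouping-column, and --include-columns, because no argument is ever validated against the renamed column.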