-tedpca command-line interface #956

Lestropie · 2023-06-29T06:49:42Z

Relates to discussion in #949.

Note that I am approaching this as someone with no experience with specifically TEDANA, but plenty of experience in research software.

In 23.0.x, there is a single command-line option "-tedpca" that controls how much data will pass from the PCA to the ICA. There are multiple ways in which this option can be used:

Specify a string corresponding to one of a set of built-in rank selection methods.
Specify a floating-point value corresponding to a fraction of variance to be preserved.
Specify an integer value corresponding to a number of components to be preserved.

The relevant logic to tease these apart is here.

This however introduces a few problems:

For a rank selection method, one cannot use the built-in argparse capability to enforce that the user input corresponds to one of a set of choices.
The almost identical values 0.99999 and 1.0 will have polar opposite empirical behaviour. The former preserves all components; the latter rejects all but one component due to being interpreted as an integer.

From a software user interface perspective, the way that I would approach this personally would be an argument group, with three mutually exclusive options, eg.:

-tedpca_method, of type choice
-tedpca_rank, of type integer
-tedpca_variance, of type float

This would preclude erroneous user input at the point of command-line parsing.

Further, in our particular circumstance (lead by @BahmanTahayori), where we want to bypass the PCA step here, that would be properly achieved:

In the short term using -tedpca_variance 1.0.
In the long term with a fourth option eg. -tedpca_skip to be added to this option group.

The text was updated successfully, but these errors were encountered:

handwerkerd · 2023-07-23T20:26:33Z

For better and worse, we are calling sklearn.decomposition.PCA and we are replicating their terminology ( https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html )

The actual call in tedana is here:
ppca = PCA(copy=False, n_components=algorithm, svd_solver="full")

I fully agree that this is a very confusing convention and shouldn't exist, but I'm not sure if using a different option to send the same parameter to another function might create a different type of confusion.

In general, this hold PCA method parsing step is a messy function with a lot of conditional logic and it's likely we'll want to add more options. I'm at OHBM now, but I'm very open to rethinking both how this section of the workflow is organized and what options can more clearly be presented.

Since this would be a breaking change for some workflows, I'd like to think enough to make a change only once so that we won't have multiple breaks.

Lestropie · 2023-07-24T00:30:54Z

we are replicating their terminology

OK, that's useful to know the origin / justification for the interface. I suppose though there can be different demands between an API and a CLI, since with the latter the user technical skill floor is lower and everything is a string. Can optimise for the latter and massage back for the former.

it's likely we'll want to add more options

I'm at the end of my scope of knowledge of TE-dependent (P|I)CA as it is, so would be leaning entirely on the experts here.

Since this would be a breaking change for some workflows, I'd like to think enough to make a change only once

Completely agree.

From a user interface perspective, at least if one were to attempt to use -tedpca with an updated version, argparse should produce a command-line usage warning that that usage is ambiguous and print all possible matching options. Won't however help any workflow engines that wrap the software.

From a software engineering perspective, you may not want to have such backward-incompatible changes being merged to main interspersed with bug fixes and the like.

Lestropie · 2024-03-23T11:34:20Z

In meeting 2024-03-21 it was decided to preserve the current command-line option, but refine the internal translation of a user-provided input into the internal representation. Discussion around that alternative solution will best occur in the relevant Pull Request.

Lestropie closed this as completed Mar 23, 2024

Lestropie mentioned this issue Mar 23, 2024

Changes to --tedpca option #1066

Open

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

-tedpca command-line interface #956

-tedpca command-line interface #956

Lestropie commented Jun 29, 2023

handwerkerd commented Jul 23, 2023

Lestropie commented Jul 24, 2023

Lestropie commented Mar 23, 2024

-tedpca command-line interface #956

-tedpca command-line interface #956

Comments

Lestropie commented Jun 29, 2023

handwerkerd commented Jul 23, 2023

Lestropie commented Jul 24, 2023

Lestropie commented Mar 23, 2024