auto column detection when initializing gaze #715

Open
1 task done
dkrako opened this issue Apr 5, 2024 · 1 comment · May be fixed by #719
Labels: enhancement (New feature or request)

Comments

@dkrako
Contributor

dkrako commented Apr 5, 2024

Description of the problem

When working with local datasets, we have to set up a bunch of different column names in the DatasetDefinition.
It would be much nicer if standard column names could be inferred.

Description of a solution

In some cases column names could be guessed. As a starting point we could reuse our internal standard for preprocessed files:

# suffixes as ordered after using GazeDataFrame.unnest()
component_suffixes = ['x', 'y', 'xl', 'yl', 'xr', 'yr', 'xa', 'ya']
pixel_columns = ['pixel_' + suffix for suffix in component_suffixes]
pixel_columns = [c for c in pixel_columns if c in gaze_df.frame.columns]
position_columns = ['position_' + suffix for suffix in component_suffixes]
position_columns = [c for c in position_columns if c in gaze_df.frame.columns]
velocity_columns = ['velocity_' + suffix for suffix in component_suffixes]
velocity_columns = [c for c in velocity_columns if c in gaze_df.frame.columns]
acceleration_columns = ['acceleration_' + suffix for suffix in component_suffixes]
acceleration_columns = [c for c in acceleration_columns if c in gaze_df.frame.columns]
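The four near-identical blocks above could be folded into one loop. The following is only a minimal sketch of that idea: `detect_component_columns` and `PREFIXES` are hypothetical names that do not exist in the library, and the helper works on a plain list of column names instead of `gaze_df.frame.columns` so that it runs on its own.

```python
# Suffix order as produced by GazeDataFrame.unnest(), copied from the snippet above.
COMPONENT_SUFFIXES = ['x', 'y', 'xl', 'yl', 'xr', 'yr', 'xa', 'ya']
# Prefixes of the internal naming standard for preprocessed files.
PREFIXES = ['pixel', 'position', 'velocity', 'acceleration']


def detect_component_columns(columns):
    """Hypothetical helper: map each prefix to the columns found, in suffix order."""
    available = set(columns)
    detected = {}
    for prefix in PREFIXES:
        candidates = [f'{prefix}_{suffix}' for suffix in COMPONENT_SUFFIXES]
        # Keep only candidates that exist; the order follows COMPONENT_SUFFIXES,
        # not the order of the columns in the frame.
        detected[prefix] = [c for c in candidates if c in available]
    return detected


# Example column names (made up for illustration).
example_columns = ['time', 'pixel_y', 'pixel_x', 'velocity_xl', 'velocity_yl']
detected = detect_component_columns(example_columns)
```

With a real gaze dataframe, the call would presumably be `detect_component_columns(gaze_df.frame.columns)`. Prefixes without any matching column map to an empty list.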

This would also very much simplify #714 as there's no need for an auto_nest argument then.

So I would propose to add a new argument to the init, for instance:

class GazeDataFrame:
    def __init__(
        ...
        auto_column_detect: bool = False,
        ...
    ):

The default could also be set to True, but that could be a breaking change, so I'm ambivalent.

For each attribute, for example pixel, we would then write something like this:

component_suffixes = ['x', 'y', 'xl', 'yl', 'xr', 'yr', 'xa', 'ya']

if auto_column_detect and pixel_columns is None:  # I would vote for not overwriting specified columns
    column_candidates = ['pixel_' + suffix for suffix in component_suffixes]
    pixel_columns = [c for c in column_candidates if c in gaze_df.frame.columns]

if pixel_columns:  # this part is from GazeDataFrame.__init__() and is false if the list is empty
    self._check_component_columns(pixel_columns=pixel_columns)
    self.nest(pixel_columns, output_column='pixel')
    column_specifiers.append(pixel_columns)

This is flexible enough to allow extending column_candidates in a potential follow-up.
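To make the "do not overwrite specified columns" rule concrete, here is a small standalone sketch of the decision logic. `resolve_columns` is a hypothetical function that is not part of the library; it only mirrors the `if auto_column_detect and pixel_columns is None` guard proposed above.

```python
COMPONENT_SUFFIXES = ['x', 'y', 'xl', 'yl', 'xr', 'yr', 'xa', 'ya']


def resolve_columns(specified, available, prefix, auto_column_detect=False):
    """Hypothetical helper: return the columns to nest for one prefix.

    Explicitly specified columns always win. Detection only runs when
    nothing was specified and auto_column_detect is enabled.
    """
    if specified is not None or not auto_column_detect:
        return specified
    candidates = [f'{prefix}_{suffix}' for suffix in COMPONENT_SUFFIXES]
    return [c for c in candidates if c in available]


# Example column names (made up for illustration).
available = ['time', 'pixel_x', 'pixel_y']

auto = resolve_columns(None, available, 'pixel', auto_column_detect=True)
explicit = resolve_columns(['px', 'py'], available, 'pixel', auto_column_detect=True)
disabled = resolve_columns(None, available, 'pixel', auto_column_detect=False)
no_match = resolve_columns(None, available, 'velocity', auto_column_detect=True)
```

An empty result (as in `no_match`) is falsy, so a subsequent `if pixel_columns:` check would skip nesting, just like in the proposed code.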

Minimum acceptance criteria

  • Auto-detect columns that adhere to the internal column naming standard for preprocessed CSV files
@dkrako dkrako added the enhancement New feature or request label Apr 5, 2024
@prassepaul prassepaul linked a pull request Apr 19, 2024 that will close this issue
@prassepaul
Member

fixed by #719
