auto column detection when initializing gaze #715

dkrako · 2024-04-05T11:38:07Z

Description of the problem

When working with local datasets, we have to set up a number of different column names in the DatasetDefinition.
It would be much nicer if standard column names could be inferred.

Description of a solution

In some cases column names could be guessed. As a starting point we could reuse our internal standard for preprocessed files:

pymovements/src/pymovements/dataset/dataset_files.py

Lines 283 to 296 in cb9ef95

    
    # suffixes as ordered after using GazeDataFrame.unnest()
    component_suffixes = ['x', 'y', 'xl', 'yl', 'xr', 'yr', 'xa', 'ya']

    pixel_columns = ['pixel_' + suffix for suffix in component_suffixes]
    pixel_columns = [c for c in pixel_columns if c in gaze_df.frame.columns]

    position_columns = ['position_' + suffix for suffix in component_suffixes]
    position_columns = [c for c in position_columns if c in gaze_df.frame.columns]

    velocity_columns = ['velocity_' + suffix for suffix in component_suffixes]
    velocity_columns = [c for c in velocity_columns if c in gaze_df.frame.columns]

    acceleration_columns = ['acceleration_' + suffix for suffix in component_suffixes]
    acceleration_columns = [c for c in acceleration_columns if c in gaze_df.frame.columns]

This would also very much simplify #714 as there's no need for an auto_nest argument then.

So I would propose to add a new argument to the init, for instance:

class GazeDataFrame:
    def __init__(
        ...
        auto_column_detect: bool = False,
        ...
    ):

The default could also be set to True, but that could be a breaking change, so I'm ambivalent.

For each attribute, for example pixel, we would then write something like this:

component_suffixes = ['x', 'y', 'xl', 'yl', 'xr', 'yr', 'xa', 'ya']

if auto_column_detect and pixel_columns is None:  # I would vote for not overwriting specified columns
    column_candidates = ['pixel_' + suffix for suffix in component_suffixes]
    pixel_columns = [c for c in column_candidates if c in gaze_df.frame.columns]

if pixel_columns:  # this part is from GazeDataFrame.__init__() and is false if the list is empty
    self._check_component_columns(pixel_columns=pixel_columns)
    self.nest(pixel_columns, output_column='pixel')
    column_specifiers.append(pixel_columns)

This is flexible enough for extending the column_candidates in a potential follow-up.
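To make the "don't overwrite specified columns" rule concrete, here is a minimal standalone sketch of the per-attribute logic. The helper name `resolve_columns` and its parameters are hypothetical and not part of pymovements; it works on a plain list of column names rather than on a real frame, so it only illustrates the decision logic, not the actual `__init__` implementation.

```python
from __future__ import annotations

# Suffix order taken from the snippet above (as ordered after unnest()).
COMPONENT_SUFFIXES = ['x', 'y', 'xl', 'yl', 'xr', 'yr', 'xa', 'ya']


def resolve_columns(
        explicit: list[str] | None,
        available: list[str],
        prefix: str,
        auto_column_detect: bool = False,
        suffixes: list[str] = COMPONENT_SUFFIXES,
) -> list[str] | None:
    """Hypothetical helper: pick the columns to nest for one attribute.

    Explicitly specified columns always win. Detection only runs if the
    flag is set and nothing was specified.
    """
    if explicit is not None or not auto_column_detect:
        return explicit
    # Build candidates in suffix order and keep those present in the data.
    candidates = [f'{prefix}_{suffix}' for suffix in suffixes]
    return [c for c in candidates if c in available]
```

An empty result is falsy, so the existing `if pixel_columns:` check would still skip nesting when nothing matches. Extending the candidates later would only mean passing a different `suffixes` list.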

Minimum acceptance criteria

Auto-detect columns if they adhere to the internal column naming standard for preprocessed CSV files.
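As a rough illustration of what this criterion could mean in practice, the following sketch runs detection for all four attributes over a CSV header. The function name `detect_all` and the example header are made up for illustration; only the prefixes and suffixes come from the snippet quoted above.

```python
# Prefixes and suffixes follow the internal naming standard quoted above.
COMPONENT_SUFFIXES = ['x', 'y', 'xl', 'yl', 'xr', 'yr', 'xa', 'ya']
ATTRIBUTE_PREFIXES = ['pixel', 'position', 'velocity', 'acceleration']


def detect_all(available):
    """Hypothetical helper: map each attribute to its detected columns."""
    detected = {}
    for prefix in ATTRIBUTE_PREFIXES:
        columns = [
            f'{prefix}_{suffix}'
            for suffix in COMPONENT_SUFFIXES
            if f'{prefix}_{suffix}' in available
        ]
        if columns:  # skip attributes without any matching column
            detected[prefix] = columns
    return detected


# Made-up header of a preprocessed binocular file.
header = ['time', 'pixel_xl', 'pixel_yl', 'pixel_xr', 'pixel_yr',
          'velocity_x', 'velocity_y']
detected = detect_all(header)
```

Columns that do not follow the standard, such as `time` here, are simply ignored by the detection.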


prassepaul · 2024-04-19T12:31:45Z

fixed by #719

dkrako added the enhancement label Apr 5, 2024

dkrako mentioned this issue Apr 5, 2024

add a gaze.from_file() facade #714

Open

2 tasks

dkrako assigned prassepaul Apr 18, 2024

prassepaul linked a pull request Apr 19, 2024 that will close this issue

feat: autodetect column names #719

Open
