[CODE IMPROVEMENT] If the user specifies a “system” column that doesn't exist, it should error out instead of continuing to run silently #600

Quetzalcohuatl · 2024-01-31T15:32:16Z

🔧 Proposed code refactoring

If the system column is not in the train dataframe's columns or in the validation columns, then error out.

Motivation

Otherwise the user might erroneously believe they are using a system column.
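A minimal sketch of the check being proposed, not the project's actual code. The function name `require_system_column` and the convention that an empty string means "no system column configured" are assumptions made for illustration. It takes plain column names, so something like `train_df.columns` could be passed in.

```python
from typing import Iterable


def require_system_column(
    system_column: str, columns: Iterable[str], split: str = "train"
) -> None:
    """Raise if a configured system column is absent from the given column names.

    Hypothetical helper: the name and the empty-string convention are assumptions.
    """
    # Assumption: an empty setting means no system column was configured.
    if not system_column:
        return
    available = list(columns)
    if system_column not in available:
        # Fail loudly so the problem cannot be lost among other log output.
        raise ValueError(
            f"System column {system_column!r} not found in {split} data. "
            f"Available columns: {available}"
        )
```

Calling `require_system_column("column_that_doesnt_exist", ["prompt", "answer"])` raises a `ValueError` that names the missing column, instead of continuing with a log line.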

Quetzalcohuatl · 2024-01-31T15:43:30Z

Conversation_chain_handler.py L140

change from a simple log to a raise error? There is so much stuff being printed in the log that the average person would miss the warning

psinger · 2024-02-05T11:11:26Z

How exactly is it possible to specify a column that does not exist?

maxjeblick · 2024-02-05T11:30:20Z

How exactly is it possible to specify a column that does not exist?

I guess the issue is referring to the case where the training DataFrame contains a system column, but the validation DataFrame does not.

Conversation_chain_handler.py L140
change from a simple log to a raise error?

To keep the pipeline flexible, one should not raise an error here. One may use a common evaluation dataset across different experiments (mt-bench, a company-specific evaluation dataset, ...) that does not contain any system column.

As a low-priority issue, one could think about adding DataFrame checks before running an experiment (alongside the cfg checks). For now, logging a warning is sufficient IMO.
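One way such a pre-run DataFrame check could look, as a rough sketch only. The helper names `find_missing_columns` and `warn_on_missing_columns`, and the dictionary of configured columns, are hypothetical and not taken from the codebase. The check only logs warnings, so a shared evaluation dataset without a system column would still run.

```python
import logging
from typing import Dict, Iterable, List

logger = logging.getLogger(__name__)


def find_missing_columns(
    configured: Dict[str, str], columns: Iterable[str]
) -> List[str]:
    """Return the configured column names that are absent from the data."""
    available = set(columns)
    # Empty values are skipped; treating them as "unset" is an assumption.
    return [name for name in configured.values() if name and name not in available]


def warn_on_missing_columns(
    configured: Dict[str, str], columns: Iterable[str], split: str
) -> List[str]:
    """Log one warning per missing column and return the missing names."""
    missing = find_missing_columns(configured, columns)
    for name in missing:
        logger.warning("Column %r is configured but not found in %s data", name, split)
    return missing
```

Returning the list of missing names leaves the decision to the caller, who can keep the warning for validation data or escalate to an error for training data.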

Quetzalcohuatl · 2024-02-05T11:38:06Z

No, it has nothing to do with train vs. valid. Just use any CSV file, and in your config.yaml for training, type system=“column_that_doesnt_exist”. The code will still run; it will log a small error saying that the system column was not found. I’m suggesting that instead of logging that, you should just raise an AssertionError.

maxjeblick · 2024-02-05T11:59:06Z

Thanks for the clarification!
As mentioned, logging a warning rather than raising an AssertionError when the system column is missing is intentional.

I'd lean toward adding DataFrame checks to check_config_for_errors and making them runnable via the command line.
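A rough sketch of what a command-line column check might look like, under stated assumptions. It is a standalone illustration and does not show how check_config_for_errors actually works. The `--csv` and `--column` flags and the `read_header` and `main` names are hypothetical. Only the CSV header is read, using the standard library.

```python
import argparse
import csv
import sys
from typing import List, Optional


def read_header(csv_path: str) -> List[str]:
    """Return the column names from the first row of a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle), [])


def main(argv: Optional[List[str]] = None) -> int:
    """Return 0 if every requested column exists in the CSV header, else 1."""
    parser = argparse.ArgumentParser(
        description="Check that configured columns exist in a CSV file."
    )
    parser.add_argument("--csv", required=True, help="Path to the dataset CSV")
    parser.add_argument(
        "--column", action="append", default=[], help="Column name; repeatable"
    )
    args = parser.parse_args(argv)

    header = read_header(args.csv)
    missing = [name for name in args.column if name not in header]
    for name in missing:
        print(f"Column {name!r} not found in {args.csv}", file=sys.stderr)
    return 1 if missing else 0
```

A script entry point could pass the result to `sys.exit(main())`, so a missing column yields a non-zero exit code before any experiment starts, while the training pipeline itself keeps logging a warning.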

Quetzalcohuatl added the area/core Core code related issue label Jan 31, 2024

maxjeblick self-assigned this Feb 5, 2024
