Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CODE IMPROVEMENT] if user specifies a “system” column but it doesnt exist, it should error out instead of continue running silently #600

Open
Quetzalcohuatl opened this issue Jan 31, 2024 · 5 comments
Assignees
Labels
area/core Core code related issue

Comments

@Quetzalcohuatl
Copy link
Contributor

🔧 Proposed code refactoring

if system column not in train dataframe.coljmns or in valid columns, then error out

Motivation

Otherwise user might erroneously believe they are using a system column

@Quetzalcohuatl Quetzalcohuatl added the area/core Core code related issue label Jan 31, 2024
@Quetzalcohuatl
Copy link
Contributor Author

Conversation_chain_handler.py L140

change from a simple log to a raise error? There is so much stuff being printed in the log that the average person would miss the warning

@maxjeblick maxjeblick self-assigned this Feb 5, 2024
@psinger
Copy link
Collaborator

psinger commented Feb 5, 2024

How exactly is it possible to specify a column that does not exist?

@maxjeblick
Copy link
Contributor

How exactly is it possible to specify a column that does not exist?

I guess the issue is referring to the case if the training Dataframe contains a system column, but validation does not.

Conversation_chain_handler.py L140
change from a simple log to a raise error?

To keep the pipeline flexible, one should not raise an issue here. One may use a common evaluation datasets across different experiments (mt-bench, company specific evaluation dataset, ...) that does not contain any system column.

As a low-priority issue, one could think about adding Dataframe checks before running an experiment (alongside cfg checks). For now, logging a warning is sufficient IMO.

@Quetzalcohuatl
Copy link
Contributor Author

No, it doesn’t have to do with train vs valid. Just use any csv file, and in your config.yaml for training, type system=“column_that_doesnt_exist”. The code will still run, it will log a small error saying that the System column was not found. I’m suggesting that instead of logging that, you should just raise an AssertionError

@maxjeblick
Copy link
Contributor

Thanks for the clarification!
As mentioned, the reason to not raise an AssertionError but rather a warning for system prompt missing is intentional.

I'd go into the direction of adding DataFrame checks to check_config_for_errors and making them runnable via the command line.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/core Core code related issue
Projects
None yet
Development

No branches or pull requests

3 participants