Please check that this issue hasn't been reported before.
I searched previous Bug Reports and didn't find any similar reports.
Expected Behavior
When I run SFT and start training with a local parquet.gz file, it works. DPO should also work with local files.
Current behaviour
DPO only works with local jsonl files. Any other local file format fails, including a local parquet.gz file.
See below:
Traceback (most recent call last):
  File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/workspace/axolotl/src/axolotl/cli/train.py", line 59, in <module>
    fire.Fire(do_cli)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/workspace/axolotl/src/axolotl/cli/train.py", line 35, in do_cli
    return do_train(parsed_cfg, parsed_cli_args)
  File "/workspace/axolotl/src/axolotl/cli/train.py", line 51, in do_train
    dataset_meta = load_rl_datasets(cfg=cfg, cli_args=cli_args)
  File "/workspace/axolotl/src/axolotl/cli/__init__.py", line 400, in load_rl_datasets
    train_dataset, eval_dataset = load_prepare_dpo_datasets(cfg)
  File "/workspace/axolotl/src/axolotl/utils/data.py", line 958, in load_prepare_dpo_datasets
    train_dataset = load_split(cfg.datasets, cfg)
  File "/workspace/axolotl/src/axolotl/utils/data.py", line 931, in load_split
    ds = load_dataset( # pylint: disable=invalid-name
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/datasets/load.py", line 2523, in load_dataset
    builder_instance = load_dataset_builder(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/datasets/load.py", line 2195, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/datasets/load.py", line 1848, in dataset_module_factory
    raise FileNotFoundError(
FileNotFoundError: Couldn't find a dataset script at /workspace/axolotl/persist/data/rag_tiny.parquet.gz/rag_tiny.parquet.gz.py or any data file in the same directory.
Steps to reproduce
See example config:
Config yaml
see above
Possible solution
I will provide a PR shortly.
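To illustrate the idea, here is a rough sketch of one way the DPO loader could handle local files. It is not the actual PR. The traceback suggests the local path is passed to `load_dataset` as a dataset name, so `datasets` looks for a loading script next to the file. The sketch instead guesses a builder type from the file name and passes the path as `data_files`. The helper names `infer_builder` and `load_local_split` and the extension table are hypothetical, and whether the parquet builder reads a `.parquet.gz` file in your `datasets` version is an assumption based on SFT working with it.

```python
from pathlib import Path

# Hypothetical mapping from a file-name fragment to a `datasets` builder name.
_BUILDERS = (
    (".parquet", "parquet"),
    (".arrow", "arrow"),
    (".csv", "csv"),
    (".txt", "text"),
)


def infer_builder(path: str, default: str = "json") -> str:
    """Guess the `datasets` builder for a local file from its name."""
    name = Path(path).name.lower()
    for fragment, builder in _BUILDERS:
        # Substring match so compressed names like `x.parquet.gz` still resolve.
        if fragment in name:
            return builder
    # Fall back to json, the format that already works for DPO (jsonl).
    return default


def load_local_split(path: str, split: str = "train"):
    """Load a local data file by builder type instead of by dataset name."""
    # Imported lazily so the helper above can be used without `datasets`.
    from datasets import load_dataset

    return load_dataset(infer_builder(path), data_files=path, split=split)
```

With this, `infer_builder("/workspace/axolotl/persist/data/rag_tiny.parquet.gz")` resolves to the parquet builder, while a `.jsonl` path keeps using json.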
Which Operating Systems are you using?
Python Version
3.10
axolotl branch-commit
main
Acknowledgements