
Modelscope breaks dataset loading in the datasets library #812

Closed
zodiacg opened this issue Mar 28, 2024 · 5 comments

@zodiacg

zodiacg commented Mar 28, 2024

Thanks for your error report and we appreciate it a lot.

Checklist

  • I have searched the tutorial on modelscope doc-site
  • I have searched related issues but cannot get the expected help.
  • The bug has not been fixed in the latest version.

Describe the bug
In modelscope 1.13.2, the DownloadManager._download method of the datasets library is replaced (monkey-patched), which causes some datasets to fail to load.
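Replacing a method on a class changes its behavior for every caller in the process, including code paths that never call modelscope directly. The path in the FileNotFoundError reported in this issue appears to glue the local dataset directory onto hub-style query parameters (Source=SDK&Revision=master&FilePath=...), which suggests the replacement treated a relative local path as a hub file. The toy sketch below only illustrates that mechanism: ToyDownloadManager and hub_style_download are hypothetical names, not the actual datasets or modelscope code.

```python
import os
from urllib.parse import quote


class ToyDownloadManager:
    """Hypothetical stand-in for datasets.DownloadManager (not the real API)."""

    def __init__(self, base_path):
        self.base_path = base_path

    def _download(self, url_or_filename):
        # Original behavior: a relative path resolves inside the local dataset directory.
        return os.path.join(self.base_path, url_or_filename)


def hub_style_download(self, url_or_filename):
    # Hypothetical replacement: treats every input as a hub file and appends
    # hub-style query parameters without checking whether it is a local path.
    return (self.base_path + "Source=SDK&Revision=master&FilePath="
            + quote(url_or_filename, safe=""))


original_download = ToyDownloadManager._download
dm = ToyDownloadManager("/data/m3it_qwen")
rel = "./data/captioning/coco/train.jsonl"

print(dm._download(rel))  # on POSIX: /data/m3it_qwen/./data/captioning/coco/train.jsonl

# Assigning to the class attribute changes behavior for every existing and future instance.
ToyDownloadManager._download = hub_style_download
print(dm._download(rel))
# /data/m3it_qwenSource=SDK&Revision=master&FilePath=.%2Fdata%2Fcaptioning%2Fcoco%2Ftrain.jsonl

ToyDownloadManager._download = original_download  # undo the patch
```

Note that the patch also affected dm, which was created before the assignment: Python looks methods up on the class at call time.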

To Reproduce
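I use the swift framework for training and registered a custom dataset through custom code.
I downloaded the M3IT dataset locally and load it with the following code: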

import datasets
ds = datasets.load_dataset(f"{_M3IT_DIR}/M3IT_IMG.py", name='coco', trust_remote_code=True)

With version 1.13.2, this fails with:

FileNotFoundError: Local file /*****/m3it_qwenSource=SDK&Revision=master&FilePath=.%2Fdata%2Fcaptioning%2Fcoco%2Ftrain.jsonl doesn't exist

The stack trace shows that the error is raised from modelscope/msdatasets/utils/hf_datasets_util.py:91. Rolling back to version 1.13.1 makes the problem go away.
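To check in your own environment whether a method such as datasets.DownloadManager._download has been replaced, one heuristic is to compare each method's __module__ with the class's own __module__. A minimal sketch; foreign_methods is a hypothetical helper, and it only detects replacements that are plain Python functions defined in another module and assigned directly on the class (inherited members are ignored).

```python
import inspect


def foreign_methods(cls):
    """Map method name -> defining module for plain functions stored directly on
    cls whose __module__ differs from cls.__module__ (a common sign of a patch)."""
    found = {}
    for name, member in vars(cls).items():
        if inspect.isfunction(member) and member.__module__ != cls.__module__:
            found[name] = member.__module__
    return found


# Toy demonstration with a hypothetical class and replacement function.
class Toy:
    def keep(self):
        return "original"


def replacement(self):
    return "patched"


replacement.__module__ = "some.other.module"  # simulate a function defined elsewhere
Toy.keep = replacement
print(foreign_methods(Toy))  # {'keep': 'some.other.module'}
```

In a session where an affected modelscope version has been imported, calling foreign_methods(datasets.DownloadManager) would be expected to list _download mapped to a modelscope module, assuming the patch is a plain function assigned on that class.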

Could you please avoid this kind of intrusive change? I checked the changelog and it is even marked as a breaking change, and it certainly is breaking.
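The objection here is to a process-wide replacement that outlives the code that needs it. A less intrusive pattern, sketched under the assumption that an override is only needed inside specific calls, applies the patch for the duration of a with block and restores the original afterwards. scoped_patch and Greeter are hypothetical names; the standard library's unittest.mock.patch.object provides comparable behavior.

```python
from contextlib import contextmanager


@contextmanager
def scoped_patch(target, name, replacement):
    """Temporarily set target.<name> to replacement, restoring the original on exit."""
    own = vars(target)
    had_own = name in own
    original = own.get(name)  # raw attribute, so descriptors such as staticmethod survive
    setattr(target, name, replacement)
    try:
        yield
    finally:
        if had_own:
            setattr(target, name, original)
        else:
            delattr(target, name)  # attribute was inherited or absent: drop the override


class Greeter:
    def greet(self):
        return "hello"


with scoped_patch(Greeter, "greet", lambda self: "patched"):
    print(Greeter().greet())  # patched
print(Greeter().greet())      # hello
```

This limits the patch in time, not in scope: while the with block is active, other threads still see the replacement.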

Your Environments (required)

  • modelscope: 1.13.2/1.13.1
  • datasets: 2.18.0
  • ms-swift: 1.7.3

Please @ corresponding people according to your problem:

Dataset related: @wangxingjun778

@wangxingjun778
Collaborator

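Thanks for the report!
We have confirmed that both 1.13.2 and 1.13.3 show this behavior. Please roll back to 1.13.1 for now; we will fix the issue in the next minor release.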
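Until a fixed release is available, the suggested workaround is to pin the previous release, e.g. pip install "modelscope==1.13.1". A minimal runtime guard, assuming the affected set is exactly the two versions named in this thread (1.13.2 from the report, 1.13.3 from the maintainers), could look like this; modelscope_is_affected is a hypothetical helper built on the standard importlib.metadata.version.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported as affected in this thread; 1.13.1 is reported to work.
AFFECTED_MODELSCOPE_VERSIONS = frozenset({"1.13.2", "1.13.3"})


def modelscope_is_affected(installed=None):
    """Return True if the given (or installed) modelscope version is one reported as affected."""
    if installed is None:
        try:
            installed = version("modelscope")
        except PackageNotFoundError:
            return False  # modelscope is not installed, so it cannot patch datasets
    return installed in AFFECTED_MODELSCOPE_VERSIONS


if modelscope_is_affected():
    print("modelscope", version("modelscope"),
          "is reported to break datasets.load_dataset; consider pinning modelscope==1.13.1")
```

An exact-match set is used instead of a version range because the thread only names specific versions; any other version passes the check.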

@Tendo33

Tendo33 commented Apr 16, 2024

Many of the imports in site-packages/modelscope/msdatasets/utils/hf_datasets_util.py no longer work. Hasn't this been updated?

@wangxingjun778
Collaborator

> Many of the imports in site-packages/modelscope/msdatasets/utils/hf_datasets_util.py no longer work. Hasn't this been updated?

Hi, we expect to release a new version early next week that fixes this issue.

@wangxingjun778
Collaborator

This issue has been fixed and merged into the master branch.
Version 1.14.0 will be released soon.


This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
