
Could I fine-tune this model for Chinese datasets? #41

Open
asenasen123 opened this issue Aug 18, 2023 · 11 comments

Comments

@asenasen123

Could you please tell me how I can fine-tune it on my custom Chinese datasets?

@Muennighoff
Owner

Sure, if you want to finetune, you can follow some of what is outlined in this issue: #2

For asymmetric search (e.g. retrieval), you can also try https://huggingface.co/bigscience/sgpt-bloom-7b1-msmarco, which has seen a lot of Chinese during pretraining and might be good enough.
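
A minimal usage sketch for that model, assuming the position-weighted mean pooling from the SGPT paper (the model card has the exact recipe, and the 7B1 checkpoint needs substantial GPU memory):

```python
# Sketch: embedding Chinese text with bigscience/sgpt-bloom-7b1-msmarco.
# Pooling is SGPT's position-weighted mean; verify against the model card.
import torch
from transformers import AutoModel, AutoTokenizer

name = "bigscience/sgpt-bloom-7b1-msmarco"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name).eval()

texts = ["如何微调嵌入模型？", "这篇文档介绍了嵌入模型的微调方法。"]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, dim)

# Position-weighted mean: token i gets weight i+1, padding gets weight 0.
weights = torch.arange(1, hidden.shape[1] + 1, dtype=hidden.dtype)[None, :, None]
mask = batch["attention_mask"][..., None].to(hidden.dtype)
embeddings = (hidden * weights * mask).sum(1) / (weights * mask).sum(1)

# Cosine similarity between the two embeddings.
print(torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0))
```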

@asenasen123
Author

> Sure, if you want to finetune, you can follow some of what is outlined in this issue: #2
> For asymmetric search (e.g. retrieval), you can also try https://huggingface.co/bigscience/sgpt-bloom-7b1-msmarco, which has seen a lot of Chinese during pretraining and might be good enough.

Do many SGPT models on Hugging Face support Chinese?

@asenasen123
Author

If I want to fine-tune the SGPT model, do I just change the dataset?

@Muennighoff
Owner

I think only the BLOOM ones perform well for Chinese.
Yes, you can just change the dataset.
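
If the training scripts referenced in #2 are hard to adapt, a rough stand-in (not the repo's own script) is contrastive fine-tuning with sentence-transformers on (query, positive passage) pairs; the Chinese examples, checkpoint, and hyperparameters below are placeholders:

```python
# Sketch: contrastive fine-tuning on Chinese (query, positive) pairs,
# using sentence-transformers as a stand-in for the scripts from #2.
from sentence_transformers import InputExample, SentenceTransformer, losses
from torch.utils.data import DataLoader

# A smaller SGPT checkpoint keeps the sketch runnable; swap in your target model.
model = SentenceTransformer("Muennighoff/SGPT-125M-weightedmean-msmarco-specb-bitfit")

# Placeholder Chinese pairs; replace with your custom dataset.
train_examples = [
    InputExample(texts=["什么是向量检索？", "向量检索通过比较嵌入的相似度来查找相关文档。"]),
    InputExample(texts=["如何评估检索模型？", "检索模型通常用 nDCG@10 等指标来评估。"]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.MultipleNegativesRankingLoss(model)  # in-batch negatives

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
model.save("sgpt-chinese-finetuned")
```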

@asenasen123
Author

> I think only the BLOOM ones perform well for Chinese. Yes, you can just change the dataset.

Which Chinese dataset should I evaluate the fine-tuned model on?

@Muennighoff
Owner

I would evaluate on the Chinese datasets in MTEB.
If you train a retrieval model, you can try the Chinese retrieval datasets from C-MTEB: https://huggingface.co/spaces/mteb/leaderboard

Also see embeddings-benchmark/mteb#134
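
For example, with the mteb package (the task names below are illustrative C-MTEB retrieval tasks; check the leaderboard and your mteb version for the current list):

```python
# Sketch: running C-MTEB Chinese retrieval tasks on a fine-tuned model.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sgpt-chinese-finetuned")    # your fine-tuned checkpoint
evaluation = MTEB(tasks=["DuRetrieval", "T2Retrieval"])  # example C-MTEB tasks
evaluation.run(model, output_folder="results/sgpt-chinese")
```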

@asenasen123
Author

> I would evaluate on the Chinese datasets in MTEB. If you train a retrieval model, you can try the Chinese retrieval datasets from C-MTEB: https://huggingface.co/spaces/mteb/leaderboard
> Also see embeddings-benchmark/mteb#134

Are the evaluation metrics also Pearson and Spearman correlations?

@Muennighoff
Owner

> Are the evaluation metrics also Pearson and Spearman correlations?

For retrieval datasets it's nDCG@10. But don't worry about the evaluation - if you use MTEB, it takes care of calculating the scores automatically.
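
For intuition, nDCG@10 compares the discounted gain of the model's top-10 ranking against the ideal ranking; a toy illustration (not MTEB's implementation):

```python
# Toy nDCG@10: relevance of the top-10 retrieved documents, 1 = relevant.
import math

def dcg(rels):
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels))

ranked = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]  # model's top-10 ranking
ideal = sorted(ranked, reverse=True)     # best possible ordering
print(dcg(ranked) / dcg(ideal))          # nDCG@10 ≈ 0.92
```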

@asenasen123
Author

> For retrieval datasets it's nDCG@10. But don't worry about the evaluation - if you use MTEB, it takes care of calculating the scores automatically.

Thank you very much!

@wilfoderek

> Sure, if you want to finetune, you can follow some of what is outlined in this issue: #2
> For asymmetric search (e.g. retrieval), you can also try https://huggingface.co/bigscience/sgpt-bloom-7b1-msmarco, which has seen a lot of Chinese during pretraining and might be good enough.

What about a Spanish fine-tune?

@Muennighoff
Owner

> What about a Spanish fine-tune?

Sure, you can do that too. https://huggingface.co/bigscience/sgpt-bloom-7b1-msmarco has also seen a lot of Spanish, so it may work well for you.
