
A question: is data contamination detection for LLMs (determining whether a dataset was seen during training) a reliable research direction? Any recommended papers? #21

Open
gongjunjin opened this issue Sep 4, 2023 · 1 comment

Comments

@gongjunjin
No description provided.

@cyp-jlu-ai
Collaborator

Hello, data contamination detection for LLMs is an important research area, especially for ensuring the quality and reliability of model training and evaluation data.

Here are some recommended papers on data contamination detection for LLMs:

  1. Time Travel in LLMs: Tracing Data Contamination in Large Language Models
  2. Can we trust the evaluation on ChatGPT?
  3. Large language models are few-shot testers: Exploring llm-based general bug reproduction
  4. BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models
  5. Anatomy of an AI-powered malicious social botnet
  6. Origin Tracing and Detecting of LLMs
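
Beyond the probing-style methods studied in the papers above, a common baseline for contamination screening is simple n-gram overlap between a benchmark example and the training corpus (similar in spirit to the dedup-based checks reported for several large models). The sketch below is illustrative only, with hypothetical function names and an arbitrary choice of n; real pipelines use tokenizer-level n-grams, hashing, and tuned thresholds rather than raw word sets.

```python
# Minimal n-gram overlap contamination screen (illustrative sketch).
# A benchmark sample is flagged as potentially contaminated if any of
# its word-level n-grams also appears in a training-corpus document.

def ngrams(text, n):
    """Return the set of word-level n-grams of `text` (lowercased)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(sample, corpus_docs, n=8):
    """True if `sample` shares at least one n-gram with any corpus doc.

    `n=8` is an arbitrary illustrative choice; production screens pick
    n and the overlap threshold empirically.
    """
    sample_grams = ngrams(sample, n)
    if not sample_grams:  # sample shorter than n words: nothing to match
        return False
    corpus_grams = set()
    for doc in corpus_docs:
        corpus_grams |= ngrams(doc, n)
    return bool(sample_grams & corpus_grams)
```

This only catches near-verbatim leakage; paraphrased or translated contamination requires the model-probing approaches (e.g., guided completion, as in the "Time Travel in LLMs" paper) rather than surface-level matching.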
