- Source code for the arXiv preprint *Teaching Specific Scientific Knowledge into Large Language Models through Additional Training*.
- Clone this repo.
git clone https://github.com/KanHatakeyama/Additional-training-Llama2.git
- Download the dataset from Hugging Face
git lfs clone https://huggingface.co/datasets/kanhatakeyama/nature-family-CC-papers
mv nature-family-CC-papers/database/ .
mv nature-family-CC-papers/smallDB/ .
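After the download, a quick sanity check can confirm that the copied folders are in place. The sketch below assumes the folders sit in the repository root and, purely for illustration, that the records are stored as JSON files; the actual file layout may differ.

```python
# Quick sanity check of the downloaded data folders (database/, smallDB/).
# The JSON format below is an assumption for illustration; adjust it to the
# actual file layout of the dataset.
import glob
import json

for path in sorted(glob.glob("smallDB/**/*.json", recursive=True))[:3]:
    with open(path, encoding="utf-8") as f:
        record = json.load(f)
    print(path, type(record))
```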
- Create env (conda)
conda env create -f environment.yml
- Demo code and model for additional training with scientific texts
  - Fictional datasets
  - Result analysis
  - Training
    - Full-parameter
    - LoRA (a minimal sketch follows this list)
      - 7B model
      - 7, 13, and 70B models
        - Selected adapter layers
        - Full adapter layers
  - Scientific papers
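The LoRA demos attach low-rank adapters to Llama-2 and train only those adapter weights. Below is a minimal sketch of that setup, assuming Hugging Face `transformers` and `peft`; the base model name, rank, and `target_modules` are illustrative and not the exact settings used in this repository.

```python
# A minimal sketch of attaching LoRA adapters for additional training,
# assuming Hugging Face transformers + peft. The base model name, rank,
# and target_modules are illustrative, not the exact paper settings.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",           # assumed 7B base model (gated on the Hub)
    torch_dtype=torch.bfloat16,
)

# Restricting target_modules to a few projections corresponds to the
# "selected adapter layers" setting; listing every attention/MLP projection
# corresponds to "full adapter layers".
config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # selected layers (illustrative)
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()        # only the adapter weights are trainable
# The wrapped model can then be passed to transformers.Trainer together with
# a tokenized text dataset for causal-LM additional training.
```

Full-parameter training, by contrast, updates every weight of the base model and is covered by the separate demo code listed above.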
- Author: Kan Hatakeyama (Tokyo Tech., Japan)