example file, train_msmarco_v2.py is not working #163

Open
bakingeol opened this issue Feb 13, 2024 · 0 comments
bakingeol commented Feb 13, 2024

Hi, thank you for the great repo.

I'm trying to train a dense retrieval model using the example script (train_msmarco_v2.py), and I've run into a strange problem.

The retriever.fit call makes no progress: the progress bar and GPU utilization are both stuck. Only the model parameters are loaded onto the GPU; no forward or backward pass appears to run.

I created a new conda environment and installed beir; the Python version is 3.7.16. The only thing I changed was the triplets file: it could not be downloaded automatically, so I downloaded it manually from the link and added the jsonl file.

For training, I set amp_use to True when running on an A6000 and to False when running on an RTX 4090.

Can you tell me why this example script might not be working?
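
In case it helps narrow this down, here is a minimal sketch of how the location of the hang could be captured using Python's standard `faulthandler` module. The helper names `arm_hang_watchdog` and `disarm_hang_watchdog` are hypothetical and are not part of beir or sentence-transformers; the 120-second default is an arbitrary assumption.

```python
import faulthandler
import sys


def arm_hang_watchdog(timeout_s=120.0, stream=None):
    """Dump every thread's stack trace to `stream` each `timeout_s` seconds.

    Hypothetical helper: the dumps repeat until disarm_hang_watchdog() is
    called, so a stalled call keeps reporting where it is stuck.
    """
    if stream is None:
        stream = sys.stderr  # resolved at call time, not at definition time
    # `stream` must stay open (and expose fileno()) until the watchdog is disarmed.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=stream)


def disarm_hang_watchdog():
    """Cancel the pending periodic dump, if any."""
    faulthandler.cancel_dump_traceback_later()
```

The idea would be to call `arm_hang_watchdog()` just before the training call in train_msmarco_v2.py and `disarm_hang_watchdog()` after it returns. If training stalls, the periodic traces show which frame each thread is blocked in (for example data loading versus the forward pass), which could be attached to this issue.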

Thank you.

(dense_retr) baekig@rtx02:/practice/beir_practice/beir/examples/retrieval/training$ python --version
Python 3.7.16
(dense_retr) baekig@rtx02:/practice/beir_practice/beir/examples/retrieval/training$ python train_msmarco_v2.py
2024-02-13 22:05:39 - Loading Corpus...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8841823/8841823 [00:32<00:00, 274786.45it/s]
2024-02-13 22:06:12 - Loaded 8841823 DEV Documents.
2024-02-13 22:06:13 - Doc Example: {'text': 'The presence of communication amid scientific minds was equally important to the success of the Manhattan Project as scientific intellect was. The only cloud hanging over the impressive achievement of the atomic researchers and engineers is what their success truly meant; hundreds of thousands of innocent lives obliterated.', 'title': ''}
2024-02-13 22:06:13 - Loading Queries...
2024-02-13 22:06:14 - Loaded 6980 DEV Queries.
2024-02-13 22:06:14 - Query Example: how many years did william bradford serve as governor of plymouth colony?
2024-02-13 22:06:14 - loading triplets dataset
9144553it [00:57, 160027.21it/s]
2024-02-13 22:07:11 - model load
Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertModel: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_projector.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
2024-02-13 22:07:13 - Use pytorch device: cuda
Adding Input Examples: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 142884/142884 [00:16<00:00, 8790.52it/s]
2024-02-13 22:07:29 - Loaded 9144553 training pairs.
2024-02-13 22:07:41 - eval set contains 8841823 documents and 6980 queries
2024-02-13 22:07:44 - training start. epoch: 3, batch_size: 64, max_seq_length: 350, warmup_steps: 1000
2024-02-13 22:07:44 - Starting to Train...
/home/baekig/.conda/envs/dense_retr/lib/python3.7/site-packages/transformers/optimization.py:415: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning
  FutureWarning,
Epoch: 0%| | 0/3 [00:00<?, ?it/s]
Iteration: 0%| | 0/142883 [00:00<?, ?it/s]

aiohttp 3.8.6
aiosignal 1.3.1
async-timeout 4.0.3
asynctest 0.13.0
attrs 23.2.0
backcall 0.2.0
beir 2.0.0 /home/baekig/practice/beir_practice/beir
blessed 1.20.0
certifi 2022.12.7
charset-normalizer 3.3.2
click 8.1.7
datasets 2.13.2
debugpy 1.5.1
decorator 5.1.1
dill 0.3.6
elasticsearch 7.9.1
entrypoints 0.4
faiss-cpu 1.7.4
filelock 3.12.2
frozenlist 1.3.3
fsspec 2023.1.0
gpustat 1.1.1
huggingface-hub 0.16.4
idna 3.6
importlib-metadata 6.7.0
ipykernel 6.15.2
ipython 7.31.1
jedi 0.18.1
joblib 1.3.2
jupyter_client 7.4.9
jupyter_core 4.11.2
matplotlib-inline 0.1.6
multidict 6.0.5
multiprocess 0.70.14
nest-asyncio 1.5.6
nltk 3.8.1
numpy 1.21.6
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-ml-py 12.535.133
packaging 22.0
pandas 1.3.5
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.5.0
pip 22.3.1
prompt-toolkit 3.0.36
psutil 5.9.0
ptyprocess 0.7.0
pyarrow 12.0.1
Pygments 2.11.2
python-dateutil 2.8.2
pytrec-eval 0.5
pytz 2024.1
PyYAML 6.0.1
pyzmq 23.2.0
regex 2023.12.25
requests 2.31.0
safetensors 0.4.2
scikit-learn 1.0.2
scipy 1.7.3
sentence-transformers 2.2.2
sentencepiece 0.1.99
setuptools 65.6.3
six 1.16.0
threadpoolctl 3.1.0
tokenizers 0.13.3
torch 1.13.1
torchvision 0.14.1
tornado 6.2
tqdm 4.66.2
traitlets 5.7.1
transformers 4.30.2
typing_extensions 4.7.1
urllib3 2.0.7
wcwidth 0.2.5
wheel 0.38.4
xxhash 3.4.1
yarl 1.9.4
zipp 3.15.0
