example file, train_msmarco_v2.py is not working #163

bakingeol · 2024-02-13T13:29:56Z

Hi, thank you for the great repo.

I'm trying to train a dense retrieval model using the example code (train_msmarco_v2.py), and I've run into a strange situation.

The retriever.fit call makes no progress: the progress bar and GPU utilization are both stuck. Only the model parameters are loaded onto the GPU; the forward and backward passes never run.

I created a new conda environment and installed beir; the Python version is 3.7.16. The only thing I changed concerns the triplets file: it could not be downloaded automatically, so I went to the link, downloaded it manually, and added the jsonl file.

For training, I set use_amp to True when running on an A6000 and to False when running on an RTX 4090.

Can you give me some reasons why this example file might not be working?

Thank you.

(dense_retr) baekig@rtx02:/practice/beir_practice/beir/examples/retrieval/training$ python --version
Python 3.7.16
(dense_retr) baekig@rtx02:/practice/beir_practice/beir/examples/retrieval/training$ python train_msmarco_v2.py
2024-02-13 22:05:39 - Loading Corpus...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8841823/8841823 [00:32<00:00, 274786.45it/s]
2024-02-13 22:06:12 - Loaded 8841823 DEV Documents.
2024-02-13 22:06:13 - Doc Example: {'text': 'The presence of communication amid scientific minds was equally important to the success of the Manhattan Project as scientific intellect was. The only cloud hanging over the impressive achievement of the atomic researchers and engineers is what their success truly meant; hundreds of thousands of innocent lives obliterated.', 'title': ''}
2024-02-13 22:06:13 - Loading Queries...
2024-02-13 22:06:14 - Loaded 6980 DEV Queries.
2024-02-13 22:06:14 - Query Example: how many years did william bradford serve as governor of plymouth colony?
2024-02-13 22:06:14 - loading triplets dataset
9144553it [00:57, 160027.21it/s]
2024-02-13 22:07:11 - model load
Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertModel: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_projector.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias']

This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
2024-02-13 22:07:13 - Use pytorch device: cuda
Adding Input Examples: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 142884/142884 [00:16<00:00, 8790.52it/s]
2024-02-13 22:07:29 - Loaded 9144553 training pairs.
2024-02-13 22:07:41 - eval set contains 8841823 documents and 6980 queries
2024-02-13 22:07:44 - training start. epoch: 3, batch_size: 64, max_seq_length: 350, warmup_steps: 1000
2024-02-13 22:07:44 - Starting to Train...
/home/baekig/.conda/envs/dense_retr/lib/python3.7/site-packages/transformers/optimization.py:415: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning
FutureWarning,
Epoch: 0%| | 0/3 [00:00<?, ?it/s]
Iteration: 0%| | 0/142883 [00:00<?, ?it/s]
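
The log stops at the iteration bar above with no error, so it does not show where the process is blocked. One way to find out, as a minimal sketch: Python's standard faulthandler module can periodically dump the stack of every thread while the script is still hanging. The helper names watch_for_hang and stop_watching are hypothetical and not part of beir; the 120-second timeout is an arbitrary assumption.

```python
import faulthandler
import sys


def watch_for_hang(timeout_s=120.0, file=sys.stderr):
    """Dump the stack of every thread each time timeout_s seconds elapse.

    Hypothetical helper, not part of beir. It only reports where the
    process currently is; it does not interrupt or fix anything.
    """
    # repeat=True keeps dumping every timeout_s seconds until cancelled.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=file)


def stop_watching():
    """Cancel the periodic dump, e.g. once training has finished."""
    faulthandler.cancel_dump_traceback_later()


# Assumed usage inside the training script:
#   watch_for_hang()      # just before the retriever.fit(...) call
#   ...                   # training runs (or hangs) here
#   stop_watching()       # after training returns
```

If the dumps keep showing the same frame (for example inside data loading or tokenization), that is the place to look first.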

aiohttp 3.8.6
aiosignal 1.3.1
async-timeout 4.0.3
asynctest 0.13.0
attrs 23.2.0
backcall 0.2.0
beir 2.0.0 /home/baekig/practice/beir_practice/beir
blessed 1.20.0
certifi 2022.12.7
charset-normalizer 3.3.2
click 8.1.7
datasets 2.13.2
debugpy 1.5.1
decorator 5.1.1
dill 0.3.6
elasticsearch 7.9.1
entrypoints 0.4
faiss-cpu 1.7.4
filelock 3.12.2
frozenlist 1.3.3
fsspec 2023.1.0
gpustat 1.1.1
huggingface-hub 0.16.4
idna 3.6
importlib-metadata 6.7.0
ipykernel 6.15.2
ipython 7.31.1
jedi 0.18.1
joblib 1.3.2
jupyter_client 7.4.9
jupyter_core 4.11.2
matplotlib-inline 0.1.6
multidict 6.0.5
multiprocess 0.70.14
nest-asyncio 1.5.6
nltk 3.8.1
numpy 1.21.6
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-ml-py 12.535.133
packaging 22.0
pandas 1.3.5
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.5.0
pip 22.3.1
prompt-toolkit 3.0.36
psutil 5.9.0
ptyprocess 0.7.0
pyarrow 12.0.1
Pygments 2.11.2
python-dateutil 2.8.2
pytrec-eval 0.5
pytz 2024.1
PyYAML 6.0.1
pyzmq 23.2.0
regex 2023.12.25
requests 2.31.0
safetensors 0.4.2
scikit-learn 1.0.2
scipy 1.7.3
sentence-transformers 2.2.2
sentencepiece 0.1.99
setuptools 65.6.3
six 1.16.0
threadpoolctl 3.1.0
tokenizers 0.13.3
torch 1.13.1
torchvision 0.14.1
tornado 6.2
tqdm 4.66.2
traitlets 5.7.1
transformers 4.30.2
typing_extensions 4.7.1
urllib3 2.0.7
wcwidth 0.2.5
wheel 0.38.4
xxhash 3.4.1
yarl 1.9.4
zipp 3.15.0
