Language Model Finetuning on Sequence Classification Task




We provide a finetuning script (./scripts/downstream/train_sequence_classification_lm_finetuning.py) to finetune our pretrained language model on 3 multiclass classification tasks (wisesight_sentiment, wongnai_reviews, generated_reviews_enth: review_star) and 1 multilabel classification task (prachathai67k).

The arguments for train_sequence_classification_lm_finetuning.py are as follows (a skeletal invocation is shown after the list of required arguments):


Required arguments:

  • tokenizer_type_or_public_model_name :

    The token type used by RoBERThai (spm, spm_camembert (for roberthai-95g-spm), newmm, syllable, sefr_cut).

    If a token type is specified, the directories of the model checkpoint and tokenizer must also be specified via --model_dir and --tokenizer_dir.

    Otherwise, specify one of the supported public language models (currently mbert and xlmr; see the example after the optional arguments).

  • dataset_name :

    The dataset to finetune on. Currently supported sequence classification datasets are wisesight_sentiment, generated_reviews_enth-review_star, and wongnai_reviews.

  • output_dir :

    The directory to store the finetuned model

  • logging_dir :

    The directory for logging output, including the TensorBoard log and the wandb log (optional)
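
The four required arguments are passed positionally, in the order listed above, followed by any optional flags. A skeletal invocation (the angle-bracketed names are placeholders, not real values) looks like:

    cd ./scripts/downstream
    python ./train_sequence_classification_lm_finetuning.py \
        <tokenizer_type_or_public_model_name> \
        <dataset_name> \
        <output_dir> \
        <logging_dir> \
        [optional arguments]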


Optional arguments:

  • --model_dir : The directory of pretrained model checkpoint

  • --tokenizer_dir : The directory of tokenizer's vocab

  • --space_token : The custom token that replaces space characters in the input texts, as some models use a custom space token (default: "<_>"). For mbert and xlmr, specify the space token as " ".

  • --max_length: The maximum length of the tokenized text inputs passed to the model. The max length should be no greater than the maximum positional embedding size, i.e. the maximum sequence length that the language model was pretrained on.

  • --num_train_epochs: Number of epochs to finetune model (default: 5)

  • --learning_rate: The value of peak learning rate (default: 1e-05)

  • --weight_decay : The value of weight decay (default: 0.01)

  • --warmup_ratio: The ratio of warmup steps to total training steps (default: 0.1; in other words, the learning rate is warmed up to its peak value over the first 10% of the total steps).

  • --batch_size: The batch size (default: 16)

  • --no_cuda: Append "--no_cuda" to use only CPUs during finetuning (default: False)

  • --fp16: Append "--fp16" to use FP16 mixed-precision training (default: False)

  • --metric_for_best_model: The metric to select the best model based on validation set (default: f1_micro)

  • --greater_is_better: Whether the best model is the one with the greater (True) or the lower (False) value of the specified metric (default: True)

  • --logging_steps : The interval, in training steps, at which logging is performed (default: 10)

  • --seed : The seed value (default: 2020)

  • --fp16_opt_level : The optimization level (opt_level) for FP16 mixed-precision training (default: O1)

  • --gradient_accumulation_steps : The number of steps to accumulate gradients (default: 1, no gradient accumulation)

  • --adam_epsilon : Value of Adam epsilon (default: 1e-05)

  • --max_grad_norm : The maximum gradient norm for gradient clipping (default: 1.0)

  • --lowercase : Append "--lowercase" to convert all input texts to lowercase, as some models may support only uncased text (default: False)

  • --run_name : Specify the run_name for logging experiment to wandb.com (default: None)
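
To use one of the supported public models instead of a RoBERThai checkpoint, only the four positional arguments are needed: --model_dir and --tokenizer_dir are omitted, and the space token is set to " ". The following sketch (paths and hyperparameter values are illustrative placeholders, not recommendations) finetunes xlmr on wisesight_sentiment:

    cd ./scripts/downstream
    python ./train_sequence_classification_lm_finetuning.py \
    xlmr \
    wisesight_sentiment \
    /path/to/checkpoints/xlmr/finetuned/wisesight_sentiment/ \
    /path/to/logs/xlmr/finetuned/wisesight_sentiment/ \
    --num_train_epochs 3 \
    --learning_rate 3e-05 \
    --batch_size 16 \
    --max_length 512 \
    --space_token " " \
    --metric_for_best_model f1_micro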

Example


  1. Finetuning roberthai-thwiki-spm on the multiclass classification task of the wisesight_sentiment dataset.

    The following script will finetune the roberthai-thwiki-spm pretrained model from checkpoint-7000.

    The script finetunes the model with FP16 mixed-precision training on the GPU with ID 3. The train and validation batch size is 16 with no gradient accumulation. A model checkpoint is saved every epoch, and the best model is selected by validation f1_micro. During finetuning, the learning rate is warmed up linearly to its peak value of 3e-05 over the first 10% of the total training steps (here, 136 of 1,352 steps), then decayed linearly to zero. Inputs are tokenized with the specified tokenizer, and any sequence longer than max_length (512 tokens) is truncated to max_length.

    cd ./scripts/downstream
    CUDA_VISIBLE_DEVICES=3 python ./train_sequence_classification_lm_finetuning.py \
    spm \
    wisesight_sentiment \
    /workspace/checkpoints/roberthai-thwiki-spm/finetuned/wisesight_sentiment/ \
    /workspace/logs/roberthai-thwiki-spm/finetuned/wisesight_sentiment/ \
    --tokenizer_dir /workspace/checkpoints/roberthai-thwiki-spm/tokenizer_folder \
    --model_dir /workspace/checkpoints/roberthai-thwiki-spm/model/checkpoint-7000 \
    --num_train_epochs 1 \
    --metric_for_best_model f1_micro \
    --learning_rate 3e-05 \
    --warmup_ratio 0.1 \
    --max_length 512 \
    --space_token "<_>" \
    --fp16
    
    Log output:
    [INFO] Dataset: wisesight_sentiment
    
    
    [INFO] Huggingface's dataset name: wisesight_sentiment 
    [INFO] Task: multiclass_classification
    
    [INFO] space_token: <_>
    [INFO] prepare_for_tokenization: False
    
    Reusing dataset wisesight_sentiment (/root/.cache/huggingface/datasets/wisesight_sentiment/wisesight_sentiment/1.0.0/4bb1772cff1a0703d72fb9e84dff9348e80f6cdf80b0f6c0f59bcd85fc5a3537)
    Some weights of the model checkpoint at /workspace/checkpoints/roberthai-thwiki-spm/model/checkpoint-7000 were not used when initializing RobertaForSequenceClassification: ['lm_head.bias', 'lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.layer_norm.bias', 'lm_head.decoder.weight', 'lm_head.decoder.bias']
    - This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
    - This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at /workspace/checkpoints/roberthai-thwiki-spm/model/checkpoint-7000 and are newly initialized: ['classifier.dense.weight', 'classifier.dense.bias', 'classifier.out_proj.weight', 'classifier.out_proj.bias']
    You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
    
    [INFO] Model architecture: RobertaForSequenceClassification(
    (roberta): RobertaModel(
        (embeddings): RobertaEmbeddings(
        (word_embeddings): Embedding(24000, 768, padding_idx=1)
        (position_embeddings): Embedding(514, 768, padding_idx=1)
        (token_type_embeddings): Embedding(1, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
        )
        (encoder): RobertaEncoder(
        (layer): ModuleList(
            (0): RobertaLayer(
            (attention): RobertaAttention(
                (self): RobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
                )
                (output): RobertaSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
                )
            )
            (intermediate): RobertaIntermediate(
                (dense): Linear(in_features=768, out_features=3072, bias=True)
            )
            (output): RobertaOutput(
                (dense): Linear(in_features=3072, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
            )
            )
            (1): RobertaLayer(
            (attention): RobertaAttention(
                (self): RobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
                )
                (output): RobertaSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
                )
            )
            (intermediate): RobertaIntermediate(
                (dense): Linear(in_features=768, out_features=3072, bias=True)
            )
            (output): RobertaOutput(
                (dense): Linear(in_features=3072, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
            )
            )
            (2): RobertaLayer(
            (attention): RobertaAttention(
                (self): RobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
                )
                (output): RobertaSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
                )
            )
            (intermediate): RobertaIntermediate(
                (dense): Linear(in_features=768, out_features=3072, bias=True)
            )
            (output): RobertaOutput(
                (dense): Linear(in_features=3072, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
            )
            )
            (3): RobertaLayer(
            (attention): RobertaAttention(
                (self): RobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
                )
                (output): RobertaSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
                )
            )
            (intermediate): RobertaIntermediate(
                (dense): Linear(in_features=768, out_features=3072, bias=True)
            )
            (output): RobertaOutput(
                (dense): Linear(in_features=3072, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
            )
            )
            (4): RobertaLayer(
            (attention): RobertaAttention(
                (self): RobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
                )
                (output): RobertaSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
                )
            )
            (intermediate): RobertaIntermediate(
                (dense): Linear(in_features=768, out_features=3072, bias=True)
            )
            (output): RobertaOutput(
                (dense): Linear(in_features=3072, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
            )
            )
            (5): RobertaLayer(
            (attention): RobertaAttention(
                (self): RobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
                )
                (output): RobertaSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
                )
            )
            (intermediate): RobertaIntermediate(
                (dense): Linear(in_features=768, out_features=3072, bias=True)
            )
            (output): RobertaOutput(
                (dense): Linear(in_features=3072, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
            )
            )
            (6): RobertaLayer(
            (attention): RobertaAttention(
                (self): RobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
                )
                (output): RobertaSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
                )
            )
            (intermediate): RobertaIntermediate(
                (dense): Linear(in_features=768, out_features=3072, bias=True)
            )
            (output): RobertaOutput(
                (dense): Linear(in_features=3072, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
            )
            )
            (7): RobertaLayer(
            (attention): RobertaAttention(
                (self): RobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
                )
                (output): RobertaSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
                )
            )
            (intermediate): RobertaIntermediate(
                (dense): Linear(in_features=768, out_features=3072, bias=True)
            )
            (output): RobertaOutput(
                (dense): Linear(in_features=3072, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
            )
            )
            (8): RobertaLayer(
            (attention): RobertaAttention(
                (self): RobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
                )
                (output): RobertaSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
                )
            )
            (intermediate): RobertaIntermediate(
                (dense): Linear(in_features=768, out_features=3072, bias=True)
            )
            (output): RobertaOutput(
                (dense): Linear(in_features=3072, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
            )
            )
            (9): RobertaLayer(
            (attention): RobertaAttention(
                (self): RobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
                )
                (output): RobertaSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
                )
            )
            (intermediate): RobertaIntermediate(
                (dense): Linear(in_features=768, out_features=3072, bias=True)
            )
            (output): RobertaOutput(
                (dense): Linear(in_features=3072, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
            )
            )
            (10): RobertaLayer(
            (attention): RobertaAttention(
                (self): RobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
                )
                (output): RobertaSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
                )
            )
            (intermediate): RobertaIntermediate(
                (dense): Linear(in_features=768, out_features=3072, bias=True)
            )
            (output): RobertaOutput(
                (dense): Linear(in_features=3072, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
            )
            )
            (11): RobertaLayer(
            (attention): RobertaAttention(
                (self): RobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
                )
                (output): RobertaSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
                )
            )
            (intermediate): RobertaIntermediate(
                (dense): Linear(in_features=768, out_features=3072, bias=True)
            )
            (output): RobertaOutput(
                (dense): Linear(in_features=3072, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
            )
            )
        )
        )
    )
    (classifier): RobertaClassificationHead(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (dropout): Dropout(p=0.1, inplace=False)
        (out_proj): Linear(in_features=768, out_features=4, bias=True)
    )
    ) 
    
    
    
    [INFO] tokenizer: PreTrainedTokenizer(name_or_path='/workspace/checkpoints/roberthai-thwiki-spm/tokenizer_folder', vocab_size=24000, model_max_len=1000000000000000019884624838656, is_fast=False, padding_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'sep_token': '</s>', 'pad_token': '<pad>', 'cls_token': '<s>', 'mask_token': '<mask>', 'additional_special_tokens': ['<_>']}) 
    
    
    
    [INFO] Preprocess and tokenizing texts in datasets
    [INFO] max_length = 512 
    
    [DEBUG] labels [1 1 1 1]
    
    [DEBUG] label_encoder.classes_ [0 1 2 3]
    
    
    [DEBUG] (before: preprocessor) input_text ['ไปจองมาแล้วนาจา Mitsubishi Attrage ได้หลังสงกรานต์เลย รอขับอยู่นาจา กระทัดรัด เหมาะกับสาวๆขับรถคนเดียวแบบเรา ราคาสบายกระเป๋า ประหยัดน้ำมัน วิ่งไกลแค่ไหนหายห่วงค่ะ', 'เปิดศักราชใหม่! นายกฯ แถลงข่าวก่อนการแข่งขันศึก #ช้างเอฟเอคัพ นัดชิงชนะเลิศ', 'บัตรสมาชิกลดได้อีกไหมคับ', 'สนใจ new mazda2ครับ']
    
    [DEBUG] Apply preprocessor to texts.
    
    
    [DEBUG] (after: preprocessor) input_text ['ไปจองมาแล้วนาจา<_>mitsubishi<_>attrage<_>ได้หลังสงกรานต์เลย<_>รอขับอยู่นาจา<_>กระทัดรัด<_>เหมาะกับสาวๆขับรถคนเดียวแบบเรา<_>ราคาสบายกระเป๋า<_>ประหยัดน้ำมัน<_>วิ่งไกลแค่ไหนหายห่วงค่ะ', 'เปิดศักราชใหม่!<_>นายกฯ<_>แถลงข่าวก่อนการแข่งขันศึก<_>#ช้างเอฟเอคัพ<_>นัดชิงชนะเลิศ', 'บัตรสมาชิกลดได้อีกไหมคับ', 'สนใจ<_>new<_>mazda2ครับ']
    
    100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 22/22 [00:05<00:00,  4.06it/s]
    0it [00:00, ?it/s]
    [DEBUG] labels [2 0 1 0]
    
    [DEBUG] label_encoder.classes_ [0 1 2 3]
    
    
    [DEBUG] (before: preprocessor) input_text ['วันที่6/3/61 เสียอารมณ์มาเร้ย. อาหารช้ามากกกกก นั่งคอย ประมาน20 นาที พนักงานยกน้ำจิ้มมาโดยไม่ใช้ถาด เอาถ้วยซ้อนๆกันออกมา จนเช็คบิล ตับหวานก้อไม่ได้', 'ยี่ห้อนี่ เขาชอบตั้งชื่อ ลงท้ายด้วย สระอา เทียน่า อัลเมร่า นาวาร่า เทอร่า พัลซ่า ยามาฮ่า แฮร่', 'สองวันสุดท้าย! ใครอยู่แถวแฟชั่น ไอส์แลนด์ มาร่วมสนุกกับกิจกรรมจากรองพื้นลอรีอัล ปารีส ทรูแมช ที่ร้าน Eve & Boy ได้ เรามีทั้งบริการเลือกเฉดรองพื้น 13 เฉดและแต่งหน้า Touch Up ฟรี! ที่สำคัญยังมีเกมส์ชิงของรางวัลจากรุ่น True Match มากมาย และบูธถ่ายรูปเก๋ๆ ภายในงาน ถ้าพลาดวันนี้ พรุ่งนี้ยังมีอีกวันนะคะ ตั้งแต่ เวลา 11:00 น. - 20:00น. #TrueToMyShade #TrueMatch #LorealParisTH', 'น้องแสงโสมอี้บ๋อ กะว่าน้องโซดา น้ำแข็งดี เอ๊ะหรือเหล้ากึ่งแก้วน้ำล้วนโซดาลอยดีหา 5555']
    
    [DEBUG] Apply preprocessor to texts.
    
    
    [DEBUG] (after: preprocessor) input_text ['วันที่6/3/61<_>เสียอารมณ์มาเร้ย.<_>อาหารช้ามาก<_>นั่งคอย<_>ประมาน20<_>นาที<_>พนักงานยกน้ำจิ้มมาโดยไม่ใช้ถาด<_>เอาถ้วยซ้อนๆกันออกมา<_>จนเช็คบิล<_>ตับหวานก้อไม่ได้', 'ยี่ห้อนี่<_>เขาชอบตั้งชื่อ<_>ลงท้ายด้วย<_>สระอา<_>เทียน่า<_>อัลเมร่า<_>นาวาร่า<_>เทอร่า<_>พัลซ่า<_>ยามาฮ่า<_>แฮร่', 'สองวันสุดท้าย!<_>ใครอยู่แถวแฟชั่น<_>ไอส์แลนด์<_>มาร่วมสนุกกับกิจกรรมจากรองพื้นลอรีอัล<_>ปารีส<_>ทรูแมช<_>ที่ร้าน<_>eve<_>&<_>boy<_>ได้<_>เรามีทั้งบริการเลือกเฉดรองพื้น<_>13<_>เฉดและแต่งหน้า<_>touch<_>up<_>ฟรี!<_>ที่สำคัญยังมีเกมส์ชิงของรางวัลจากรุ่น<_>true<_>match<_>มากมาย<_>และบูธถ่ายรูปเก๋ๆ<_>ภายในงาน<_>ถ้าพลาดวันนี้<_>พรุ่งนี้ยังมีอีกวันนะคะ<_>ตั้งแต่<_>เวลา<_>11:00<_>น.<_>-<_>20:00น.<_>#truetomyshade<_>#truematch<_>#lorealparisth', 'น้องแสงโสมอี้บ๋อ<_>กะว่าน้องโซดา<_>น้ำแข็งดี<_>เอ๊ะหรือเหล้ากึ่งแก้วน้ำล้วนโซดาลอยดีหา<_>5']
    
    100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00,  5.02it/s]
    0it [00:00, ?it/s]
    [DEBUG] labels [2 1 2 1]
    
    [DEBUG] label_encoder.classes_ [0 1 2 3]
    
    
    [DEBUG] (before: preprocessor) input_text ['ซื้อแต่ผ้าอนามัยแบบเย็นมาค่ะ แบบว่าอีห่ากูนอนไม่ได้', 'ครับ #phithanbkk', 'การด่าไปเหมือนได้บรรเทาความเครียดเฉยๆ แต่บีทีเอส (รถไฟฟ้า) มันสำนึกมั้ย ก็ไม่อ่ะ 😕', 'Cf clarins 5 ขวด 2850']
    
    [DEBUG] Apply preprocessor to texts.
    
    
    [DEBUG] (after: preprocessor) input_text ['ซื้อแต่ผ้าอนามัยแบบเย็นมาค่ะ<_>แบบว่าอีห่ากูนอนไม่ได้', 'ครับ<_>#phithanbkk', 'การด่าไปเหมือนได้บรรเทาความเครียดเฉยๆ<_>แต่บีทีเอส<_>(รถไฟฟ้า)<_>มันสำนึกมั้ย<_>ก็ไม่อ่ะ<_>😕', 'cf<_>clarins<_>5<_>ขวด<_>2850']
    
    100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00,  4.52it/s]
    0it [00:00, ?it/s]
    [INFO] Done.
    
    [INFO] Number of train examples = 21628
    [INFO] Number of batches per epoch (training set) = 1352
    [INFO] Number of validation examples = 2404
    [INFO] Number of batches per epoch (validation set) = 2404
    [INFO] Warmup ratio = 0.1
    [INFO] Warmup steps = 136
    [INFO] Learning rate: 3e-05
    [INFO] Logging steps: 10
    [INFO] FP16 training: True
    
    
    [INFO] TrainingArguments:
    TrainingArguments(output_dir='/workspace/checkpoints/roberthai-thwiki-spm/finetuned/wisesight_sentiment/', overwrite_output_dir=True, do_train=False, do_eval=None, do_predict=False, evaluate_during_training=False, evaluation_strategy=<EvaluationStrategy.EPOCH: 'epoch'>, prediction_loss_only=False, per_device_train_batch_size=16, per_device_eval_batch_size=16, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=1, eval_accumulation_steps=None, learning_rate=3e-05, weight_decay=0.01, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=1, max_steps=-1, warmup_steps=136, logging_dir='/workspace/logs/roberthai-thwiki-spm/finetuned/wisesight_sentiment/', logging_first_step=False, logging_steps=10, save_steps=500, save_total_limit=None, no_cuda=False, seed=2020, fp16=True, fp16_opt_level='O1', local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=10, dataloader_num_workers=0, past_index=-1, run_name='/workspace/checkpoints/roberthai-thwiki-spm/finetuned/wisesight_sentiment/', disable_tqdm=False, remove_unused_columns=True, label_names=None, load_best_model_at_end=True, metric_for_best_model='f1_micro', greater_is_better=True)
    
    
    
    Begin model finetuning.
    Selected optimization level O1:  Insert automatic casts around Pytorch functions and Tensor methods.
    
    Defaults for this optimization level are:
    enabled                : True
    opt_level              : O1
    cast_model_type        : None
    patch_torch_functions  : True
    keep_batchnorm_fp32    : None
    master_weights         : None
    loss_scale             : dynamic
    Processing user overrides (additional kwargs that are not None)...
    After processing overrides, optimization options are:
    enabled                : True
    opt_level              : O1
    cast_model_type        : None
    patch_torch_functions  : True
    keep_batchnorm_fp32    : None
    master_weights         : None
    loss_scale             : dynamic
    wandb: Offline run mode, not syncing to the cloud.
    wandb: W&B syncing is set to `offline` in this directory.  Run `wandb online` to enable cloud syncing.
    0%|          | 0/1352 [00:00<?, ?it/s]
    1%|          | 10/1352 [01:41<1:35:35,  4.27s/it]8823529411767e-06, 'epoch': 0.0073964497041420114}
    1%|▏         | 20/1352 [02:11<51:39,  2.33s/it]{'loss': 1.3437000274658204, 'learning_rate': 4.411764705882353e-06, 'epoch': 0.014792899408284023}
    2%|▏         | 30/1352 [02:27<20:12,  1.09it/s]{'loss': 1.2052518844604492, 'learning_rate': 6.61764705882353e-06, 'epoch': 0.022189349112426034}
    3%|▎         | 40/1352 [02:29<03:05,  7.08it/s]{'loss': 1.1024581909179687, 'learning_rate': 8.823529411764707e-06, 'epoch': 0.029585798816568046}
    4%|▎         | 50/1352 [02:44<08:19,  2.60it/s]029411764705883e-05, 'epoch': 0.03698224852071006}
    4%|▍         | 60/1352 [02:45<02:58,  7.25it/s]{'loss': 1.047784423828125, 'learning_rate': 1.323529411764706e-05, 'epoch': 0.04437869822485207}
                                                    {'loss': 1.0573417663574218, 'learning_rate': 1.5441176470588234e-05, 'epoch': 0.051775147928994084}
                                                    {'loss': 1.0460289001464844, 'learning_rate': 1.7647058823529414e-05, 'epoch': 0.05917159763313609}
                                                    {'loss': 1.0624427795410156, 'learning_rate': 1.9852941176470586e-05, 'epoch': 0.06656804733727811}
                                                    {'loss': 0.9531181335449219, 'learning_rate': 2.2058823529411766e-05, 'epoch': 0.07396449704142012}
                                                    {'loss': 1.06204833984375, 'learning_rate': 2.4264705882352942e-05, 'epoch': 0.08136094674556213}
    9%|▉         | 120/1352 [03:02<02:52,  7.16it/s]{'loss': 0.92392578125, 'learning_rate': 2.647058823529412e-05, 'epoch': 0.08875739644970414}
                                                    {'loss': 0.9776107788085937, 'learning_rate': 2.8676470588235295e-05, 'epoch': 0.09615384615384616}
    10%|█         | 140/1352 [03:04<02:31,  7.99it/s]{'loss': 1.0748321533203125, 'learning_rate': 2.9901315789473686e-05, 'epoch': 0.10355029585798817}
    11%|█         | 150/1352 [03:07<02:48,  7.14it/s]{'loss': 0.9978012084960938, 'learning_rate': 2.9654605263157896e-05, 'epoch': 0.11094674556213018}
    12%|█▏        | 160/1352 [03:08<02:47,  7.10it/s]{'loss': 0.8718536376953125, 'learning_rate': 2.9407894736842106e-05, 'epoch': 0.11834319526627218}
    13%|█▎        | 170/1352 [03:10<02:42,  7.28it/s]{'loss': 0.8958663940429688, 'learning_rate': 2.9161184210526316e-05, 'epoch': 0.1257396449704142}
                                                    {'loss': 0.8289108276367188, 'learning_rate': 2.8914473684210526e-05, 'epoch': 0.13313609467455623}
    14%|█▍        | 190/1352 [03:13<03:04,  6.29it/s]{'loss': 0.8470077514648438, 'learning_rate': 2.8667763157894736e-05, 'epoch': 0.14053254437869822}
    15%|█▍        | 200/1352 [03:29<1:32:46,  4.83s/it]{'loss': 0.8324371337890625, 'learning_rate': 2.8421052631578946e-05, 'epoch': 0.14792899408284024}
    16%|█▌        | 210/1352 [03:31<05:24,  3.52it/s]{'loss': 0.9709686279296875, 'learning_rate': 2.817434210526316e-05, 'epoch': 0.15532544378698224}
                                                    {'loss': 0.8655807495117187, 'learning_rate': 2.792763157894737e-05, 'epoch': 0.16272189349112426}
    17%|█▋        | 230/1352 [03:34<02:39,  7.04it/s]{'loss': 0.9475296020507813, 'learning_rate': 2.768092105263158e-05, 'epoch': 0.17011834319526628}
    18%|█▊        | 240/1352 [03:36<02:25,  7.63it/s]{'loss': 0.7989715576171875, 'learning_rate': 2.743421052631579e-05, 'epoch': 0.17751479289940827}
    18%|█▊        | 250/1352 [03:37<02:14,  8.22it/s]{'loss': 0.8985809326171875, 'learning_rate': 2.71875e-05, 'epoch': 0.1849112426035503}
                                                    {'loss': 0.8492599487304687, 'learning_rate': 2.694078947368421e-05, 'epoch': 0.19230769230769232}
                                                    {'loss': 0.8721343994140625, 'learning_rate': 2.669407894736842e-05, 'epoch': 0.1997041420118343}
                                                    {'loss': 0.7967071533203125, 'learning_rate': 2.644736842105263e-05, 'epoch': 0.20710059171597633}
    21%|██▏       | 290/1352 [03:43<03:20,  5.30it/s]{'loss': 0.7402099609375, 'learning_rate': 2.620065789473684e-05, 'epoch': 0.21449704142011836}
    22%|██▏       | 300/1352 [03:45<08:55,  1.96it/s]{'loss': 0.9380523681640625, 'learning_rate': 2.5953947368421054e-05, 'epoch': 0.22189349112426035}
    23%|██▎       | 310/1352 [03:47<03:09,  5.49it/s]{'loss': 0.9521942138671875, 'learning_rate': 2.5707236842105264e-05, 'epoch': 0.22928994082840237}
    24%|██▎       | 320/1352 [03:48<02:18,  7.45it/s]{'loss': 0.8440216064453125, 'learning_rate': 2.5460526315789474e-05, 'epoch': 0.23668639053254437}
    24%|██▍       | 330/1352 [03:50<02:08,  7.94it/s]{'loss': 0.875164794921875, 'learning_rate': 2.5213815789473684e-05, 'epoch': 0.2440828402366864}
                                                    {'loss': 0.8367584228515625, 'learning_rate': 2.4967105263157894e-05, 'epoch': 0.2514792899408284}
    26%|██▌       | 350/1352 [03:52<02:12,  7.58it/s]{'loss': 0.895458984375, 'learning_rate': 2.4720394736842104e-05, 'epoch': 0.2588757396449704}
    27%|██▋       | 360/1352 [03:54<02:42,  6.12it/s]{'loss': 0.935107421875, 'learning_rate': 2.4473684210526318e-05, 'epoch': 0.26627218934911245}
                                                    {'loss': 0.815704345703125, 'learning_rate': 2.4226973684210528e-05, 'epoch': 0.27366863905325445}
    28%|██▊       | 380/1352 [03:56<02:11,  7.42it/s]{'loss': 0.91475830078125, 'learning_rate': 2.3980263157894738e-05, 'epoch': 0.28106508875739644}
    29%|██▉       | 390/1352 [03:58<01:59,  8.06it/s]{'loss': 0.8481536865234375, 'learning_rate': 2.3733552631578948e-05, 'epoch': 0.28846153846153844}
    30%|██▉       | 400/1352 [04:02<04:39,  3.40it/s]{'loss': 0.8343109130859375, 'learning_rate': 2.348684210526316e-05, 'epoch': 0.2958579881656805}
    30%|███       | 410/1352 [04:03<02:09,  7.25it/s]{'loss': 0.86136474609375, 'learning_rate': 2.324013157894737e-05, 'epoch': 0.3032544378698225}
    31%|███       | 420/1352 [04:05<02:12,  7.02it/s]{'loss': 0.819671630859375, 'learning_rate': 2.299342105263158e-05, 'epoch': 0.3106508875739645}
    32%|███▏      | 430/1352 [04:06<02:04,  7.43it/s]{'loss': 0.8303619384765625, 'learning_rate': 2.274671052631579e-05, 'epoch': 0.3180473372781065}
                                                    {'loss': 0.7434661865234375, 'learning_rate': 2.25e-05, 'epoch': 0.3254437869822485}
                                                    {'loss': 0.7448455810546875, 'learning_rate': 2.225328947368421e-05, 'epoch': 0.3328402366863905}
    34%|███▍      | 460/1352 [04:10<02:15,  6.57it/s]{'loss': 0.6353759765625, 'learning_rate': 2.200657894736842e-05, 'epoch': 0.34023668639053256}
    35%|███▍      | 470/1352 [04:12<01:53,  7.78it/s]{'loss': 0.8759033203125, 'learning_rate': 2.175986842105263e-05, 'epoch': 0.34763313609467456}
                                                    {'loss': 0.74891357421875, 'learning_rate': 2.151315789473684e-05, 'epoch': 0.35502958579881655}
                                                    {'loss': 0.769342041015625, 'learning_rate': 2.1266447368421055e-05, 'epoch': 0.3624260355029586}
    37%|███▋      | 500/1352 [04:17<08:44,  1.62it/s]{'loss': 0.8095611572265625, 'learning_rate': 2.1019736842105265e-05, 'epoch': 0.3698224852071006}
    38%|███▊      | 510/1352 [04:19<02:15,  6.20it/s]{'loss': 0.753070068359375, 'learning_rate': 2.0773026315789475e-05, 'epoch': 0.3772189349112426}
                                                    {'loss': 0.756378173828125, 'learning_rate': 2.0526315789473685e-05, 'epoch': 0.38461538461538464}
    39%|███▉      | 530/1352 [04:21<01:49,  7.53it/s]{'loss': 0.784039306640625, 'learning_rate': 2.0279605263157895e-05, 'epoch': 0.39201183431952663}
                                                    {'loss': 0.7979827880859375, 'learning_rate': 2.0032894736842105e-05, 'epoch': 0.3994082840236686}
    41%|████      | 550/1352 [04:24<01:37,  8.21it/s]{'loss': 0.764959716796875, 'learning_rate': 1.9786184210526315e-05, 'epoch': 0.4068047337278107}
    41%|████▏     | 560/1352 [04:26<01:48,  7.32it/s]{'loss': 0.759918212890625, 'learning_rate': 1.9539473684210525e-05, 'epoch': 0.41420118343195267}
    42%|████▏     | 570/1352 [04:27<01:33,  8.38it/s]{'loss': 0.7657470703125, 'learning_rate': 1.9292763157894736e-05, 'epoch': 0.42159763313609466}
    43%|████▎     | 580/1352 [04:31<03:48,  3.37it/s]{'loss': 0.6763214111328125, 'learning_rate': 1.9046052631578946e-05, 'epoch': 0.4289940828402367}
                                                    {'loss': 0.691802978515625, 'learning_rate': 1.879934210526316e-05, 'epoch': 0.4363905325443787}
    44%|████▍     | 600/1352 [04:48<55:19,  4.41s/it]7894737e-05, 'epoch': 0.4437869822485207}
    45%|████▌     | 610/1352 [04:49<03:06,  3.98it/s]{'loss': 0.7890625, 'learning_rate': 1.830592105263158e-05, 'epoch': 0.4511834319526627}
                                                    {'loss': 0.769647216796875, 'learning_rate': 1.805921052631579e-05, 'epoch': 0.45857988165680474}
                                                    {'loss': 0.87633056640625, 'learning_rate': 1.78125e-05, 'epoch': 0.46597633136094674}
    47%|████▋     | 640/1352 [04:53<01:50,  6.46it/s]{'loss': 0.76488037109375, 'learning_rate': 1.756578947368421e-05, 'epoch': 0.47337278106508873}
    48%|████▊     | 650/1352 [04:55<01:31,  7.70it/s]{'loss': 0.754241943359375, 'learning_rate': 1.731907894736842e-05, 'epoch': 0.4807692307692308}
    49%|████▉     | 660/1352 [04:56<01:36,  7.15it/s]{'loss': 0.7030029296875, 'learning_rate': 1.707236842105263e-05, 'epoch': 0.4881656804733728}
    50%|████▉     | 670/1352 [04:57<01:22,  8.23it/s]{'loss': 0.933746337890625, 'learning_rate': 1.682565789473684e-05, 'epoch': 0.49556213017751477}
                                                    {'loss': 0.884130859375, 'learning_rate': 1.6578947368421053e-05, 'epoch': 0.5029585798816568}
    51%|█████     | 690/1352 [05:01<01:31,  7.26it/s]{'loss': 0.776885986328125, 'learning_rate': 1.6332236842105266e-05, 'epoch': 0.5103550295857988}
                                                    {'loss': 0.75533447265625, 'learning_rate': 1.6085526315789476e-05, 'epoch': 0.5177514792899408}
    53%|█████▎    | 710/1352 [05:06<01:45,  6.10it/s]{'loss': 0.723431396484375, 'learning_rate': 1.5838815789473687e-05, 'epoch': 0.5251479289940828}
    53%|█████▎    | 720/1352 [05:07<01:18,  8.02it/s]{'loss': 0.771697998046875, 'learning_rate': 1.5592105263157897e-05, 'epoch': 0.5325443786982249}
    54%|█████▍    | 730/1352 [05:08<01:23,  7.46it/s]{'loss': 0.72344970703125, 'learning_rate': 1.5345394736842107e-05, 'epoch': 0.5399408284023669}
    55%|█████▍    | 740/1352 [05:10<01:29,  6.81it/s]{'loss': 0.7661865234375, 'learning_rate': 1.5098684210526315e-05, 'epoch': 0.5473372781065089}
    55%|█████▌    | 750/1352 [05:11<01:14,  8.08it/s]{'loss': 0.7775390625, 'learning_rate': 1.4851973684210527e-05, 'epoch': 0.5547337278106509}
    56%|█████▌    | 760/1352 [05:12<01:17,  7.65it/s]{'loss': 0.68282470703125, 'learning_rate': 1.4605263157894737e-05, 'epoch': 0.5621301775147929}
    57%|█████▋    | 770/1352 [05:14<01:45,  5.53it/s]{'loss': 0.641754150390625, 'learning_rate': 1.4358552631578949e-05, 'epoch': 0.5695266272189349}
    58%|█████▊    | 780/1352 [05:15<01:09,  8.21it/s]{'loss': 0.752880859375, 'learning_rate': 1.4111842105263159e-05, 'epoch': 0.5769230769230769}
    58%|█████▊    | 790/1352 [05:17<01:11,  7.90it/s]{'loss': 0.763067626953125, 'learning_rate': 1.3865131578947369e-05, 'epoch': 0.584319526627219}
                                                    {'loss': 0.651007080078125, 'learning_rate': 1.361842105263158e-05, 'epoch': 0.591715976331361}
    60%|█████▉    | 810/1352 [05:21<01:12,  7.45it/s]{'loss': 0.72872314453125, 'learning_rate': 1.337171052631579e-05, 'epoch': 0.599112426035503}
    61%|██████    | 820/1352 [05:22<01:04,  8.22it/s]{'loss': 0.790142822265625, 'learning_rate': 1.3125e-05, 'epoch': 0.606508875739645}
    61%|██████▏   | 830/1352 [05:24<01:02,  8.31it/s]{'loss': 0.646343994140625, 'learning_rate': 1.287828947368421e-05, 'epoch': 0.613905325443787}
    62%|██████▏   | 840/1352 [05:25<01:13,  6.97it/s]{'loss': 0.830230712890625, 'learning_rate': 1.263157894736842e-05, 'epoch': 0.621301775147929}
    63%|██████▎   | 850/1352 [05:26<01:11,  7.04it/s]{'loss': 0.728875732421875, 'learning_rate': 1.2384868421052632e-05, 'epoch': 0.628698224852071}
    64%|██████▎   | 860/1352 [05:28<01:05,  7.49it/s]{'loss': 0.725, 'learning_rate': 1.2138157894736842e-05, 'epoch': 0.636094674556213}
    64%|██████▍   | 870/1352 [05:29<01:08,  7.08it/s]{'loss': 0.676788330078125, 'learning_rate': 1.1891447368421053e-05, 'epoch': 0.643491124260355}
    65%|██████▌   | 880/1352 [05:30<00:58,  8.06it/s]{'loss': 0.735028076171875, 'learning_rate': 1.1644736842105263e-05, 'epoch': 0.650887573964497}
    66%|██████▌   | 890/1352 [05:32<01:07,  6.89it/s]{'loss': 0.7079345703125, 'learning_rate': 1.1398026315789473e-05, 'epoch': 0.658284023668639}
    66%|██████▌   | 893/1352 [05:32<00:57,  7.92it/s]Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
    67%|██████▋   | 900/1352 [05:36<07:34,  1.01s/it]{'loss': 0.784625244140625, 'learning_rate': 1.1151315789473684e-05, 'epoch': 0.665680473372781}
    67%|██████▋   | 910/1352 [05:38<01:13,  6.02it/s]{'loss': 0.72747802734375, 'learning_rate': 1.0904605263157894e-05, 'epoch': 0.6730769230769231}
    68%|██████▊   | 920/1352 [05:39<01:09,  6.20it/s]{'loss': 0.768280029296875, 'learning_rate': 1.0657894736842106e-05, 'epoch': 0.6804733727810651}
    69%|██████▉   | 930/1352 [05:40<01:06,  6.39it/s]{'loss': 0.761163330078125, 'learning_rate': 1.0411184210526316e-05, 'epoch': 0.6878698224852071}
    70%|██████▉   | 940/1352 [05:42<00:54,  7.52it/s]{'loss': 0.72337646484375, 'learning_rate': 1.0164473684210528e-05, 'epoch': 0.6952662721893491}
                                                    {'loss': 0.77833251953125, 'learning_rate': 9.917763157894738e-06, 'epoch': 0.7026627218934911}
    70%|███████   | 952/1352 [05:44<00:53,  7.43it/s]Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
    71%|███████   | 960/1352 [05:45<00:56,  6.94it/s]{'loss': 0.593701171875, 'learning_rate': 9.671052631578948e-06, 'epoch': 0.7100591715976331}
    72%|███████▏  | 970/1352 [05:46<00:46,  8.14it/s]{'loss': 0.715960693359375, 'learning_rate': 9.424342105263158e-06, 'epoch': 0.7174556213017751}
    72%|███████▏  | 980/1352 [05:47<00:53,  7.00it/s]{'loss': 0.63238525390625, 'learning_rate': 9.177631578947368e-06, 'epoch': 0.7248520710059172}
    73%|███████▎  | 988/1352 [05:48<00:43,  8.31it/s]Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 8192.0
    73%|███████▎  | 990/1352 [05:49<00:41,  8.74it/s]{'loss': 0.8906982421875, 'learning_rate': 8.93092105263158e-06, 'epoch': 0.7322485207100592}
    74%|███████▍  | 1000/1352 [06:07<30:06,  5.13s/it]{'loss': 0.68482666015625, 'learning_rate': 8.68421052631579e-06, 'epoch': 0.7396449704142012}
                                                    {'loss': 0.641973876953125, 'learning_rate': 8.4375e-06, 'epoch': 0.7470414201183432}
    75%|███████▌  | 1020/1352 [06:09<00:53,  6.24it/s]{'loss': 0.7864013671875, 'learning_rate': 8.19078947368421e-06, 'epoch': 0.7544378698224852}
    76%|███████▌  | 1030/1352 [06:11<00:43,  7.46it/s]{'loss': 0.738299560546875, 'learning_rate': 7.94407894736842e-06, 'epoch': 0.7618343195266272}
    77%|███████▋  | 1040/1352 [06:12<00:38,  8.05it/s]{'loss': 0.84078369140625, 'learning_rate': 7.697368421052632e-06, 'epoch': 0.7692307692307693}
    78%|███████▊  | 1050/1352 [06:13<00:42,  7.17it/s]{'loss': 0.752410888671875, 'learning_rate': 7.450657894736843e-06, 'epoch': 0.7766272189349113}
    78%|███████▊  | 1060/1352 [06:15<00:37,  7.70it/s]{'loss': 0.72764892578125, 'learning_rate': 7.203947368421053e-06, 'epoch': 0.7840236686390533}
    79%|███████▉  | 1070/1352 [06:16<00:34,  8.29it/s]{'loss': 0.7495849609375, 'learning_rate': 6.957236842105264e-06, 'epoch': 0.7914201183431953}
    80%|███████▉  | 1080/1352 [06:17<00:35,  7.59it/s]{'loss': 0.706201171875, 'learning_rate': 6.710526315789474e-06, 'epoch': 0.7988165680473372}
    81%|████████  | 1090/1352 [06:19<00:40,  6.49it/s]{'loss': 0.690985107421875, 'learning_rate': 6.463815789473684e-06, 'epoch': 0.8062130177514792}
    81%|████████▏ | 1100/1352 [06:23<03:46,  1.11it/s]{'loss': 0.6723876953125, 'learning_rate': 6.217105263157895e-06, 'epoch': 0.8136094674556213}
    82%|████████▏ | 1110/1352 [06:24<00:35,  6.82it/s]{'loss': 0.74981689453125, 'learning_rate': 5.970394736842105e-06, 'epoch': 0.8210059171597633}
    83%|████████▎ | 1120/1352 [06:25<00:29,  7.86it/s]{'loss': 0.8501708984375, 'learning_rate': 5.723684210526316e-06, 'epoch': 0.8284023668639053}
    84%|████████▎ | 1130/1352 [06:27<00:30,  7.17it/s]{'loss': 0.7604736328125, 'learning_rate': 5.476973684210527e-06, 'epoch': 0.8357988165680473}
    84%|████████▍ | 1140/1352 [06:28<00:32,  6.45it/s]{'loss': 0.729522705078125, 'learning_rate': 5.230263157894737e-06, 'epoch': 0.8431952662721893}
    85%|████████▌ | 1150/1352 [06:30<00:26,  7.71it/s]{'loss': 0.702081298828125, 'learning_rate': 4.983552631578948e-06, 'epoch': 0.8505917159763313}
    86%|████████▌ | 1160/1352 [06:31<00:24,  7.80it/s]{'loss': 0.74451904296875, 'learning_rate': 4.736842105263158e-06, 'epoch': 0.8579881656804734}
    87%|████████▋ | 1170/1352 [06:33<00:27,  6.65it/s]{'loss': 0.703253173828125, 'learning_rate': 4.490131578947369e-06, 'epoch': 0.8653846153846154}
    87%|████████▋ | 1180/1352 [06:34<00:22,  7.66it/s]{'loss': 0.589453125, 'learning_rate': 4.243421052631579e-06, 'epoch': 0.8727810650887574}
    88%|████████▊ | 1190/1352 [06:35<00:26,  6.16it/s]{'loss': 0.649810791015625, 'learning_rate': 3.99671052631579e-06, 'epoch': 0.8801775147928994}
                                                    {'loss': 0.62998046875, 'learning_rate': 3.75e-06, 'epoch': 0.8875739644970414}
    89%|████████▉ | 1210/1352 [06:40<00:19,  7.11it/s]94736842106e-06, 'epoch': 0.8949704142011834}
    90%|█████████ | 1220/1352 [06:41<00:16,  7.82it/s]{'loss': 0.6827392578125, 'learning_rate': 3.256578947368421e-06, 'epoch': 0.9023668639053254}
    91%|█████████ | 1230/1352 [06:42<00:15,  7.94it/s]{'loss': 0.73472900390625, 'learning_rate': 3.009868421052632e-06, 'epoch': 0.9097633136094675}
                                                    {'loss': 0.826873779296875, 'learning_rate': 2.763157894736842e-06, 'epoch': 0.9171597633136095}
    92%|█████████▏| 1250/1352 [06:45<00:12,  7.87it/s]{'loss': 0.717706298828125, 'learning_rate': 2.5164473684210525e-06, 'epoch': 0.9245562130177515}
                                                    {'loss': 0.754278564453125, 'learning_rate': 2.2697368421052634e-06, 'epoch': 0.9319526627218935}
    94%|█████████▍| 1270/1352 [06:48<00:12,  6.46it/s]{'loss': 0.720526123046875, 'learning_rate': 2.023026315789474e-06, 'epoch': 0.9393491124260355}
    95%|█████████▍| 1280/1352 [06:49<00:09,  7.83it/s]57894736842e-06, 'epoch': 0.9467455621301775}
                                                    {'loss': 0.7333740234375, 'learning_rate': 1.5296052631578948e-06, 'epoch': 0.9541420118343196}
                                                    {'loss': 0.60689697265625, 'learning_rate': 1.2828947368421053e-06, 'epoch': 0.9615384615384616}
    97%|█████████▋| 1310/1352 [06:56<00:05,  7.35it/s]{'loss': 0.65772705078125, 'learning_rate': 1.0361842105263158e-06, 'epoch': 0.9689349112426036}
    98%|█████████▊| 1320/1352 [06:57<00:04,  7.71it/s]{'loss': 0.7178466796875, 'learning_rate': 7.894736842105263e-07, 'epoch': 0.9763313609467456}
    98%|█████████▊| 1330/1352 [06:58<00:02,  7.81it/s]{'loss': 0.77506103515625, 'learning_rate': 5.427631578947369e-07, 'epoch': 0.9837278106508875}
                                                    {'loss': 0.62127685546875, 'learning_rate': 2.9605263157894736e-07, 'epoch': 0.9911242603550295}
    100%|█████████▉| 1350/1352 [07:01<00:00,  7.80it/s]{'loss': 0.721728515625, 'learning_rate': 4.934210526315789e-08, 'epoch': 0.9985207100591716}
                                                    {'eval_loss': 0.6999529004096985, 'eval_accuracy': 0.7096505823627288, 'eval_f1_micro': 0.7096505823627288, 'eval_precision_micro': 0.7096505823627288, 'eval_recall_micro': 0.7096505823627288, 'eval_f1_macro': 0.5704574545076051, 'eval_precision_macro': 0.6174460882580549, 'eval_recall_macro': 0.100%|██████████| 1352/1352 [07:06<00:00,  8.20it/s] 1.0}
    100%|██████████| 1352/1352 [07:10<00:00,  3.14it/s]
    Done.
    
    [INFO] Done.
    
    [INFO] Begin saving best checkpoint.
    [INFO] Done.
    
    
    Begin model evaluation on test set.
    98%|█████████▊| 164/167 [00:05<00:00, 28.47it/s][DEBUG] label_ids = [2 1 2 ... 0 1 1]
    Evaluation on test set (dataset: wisesight_sentiment)
    eval_loss : 0.7032
    eval_accuracy : 0.7080
    eval_f1_micro : 0.7080
    eval_precision_micro : 0.7080
    eval_recall_micro : 0.7080
    eval_f1_macro : 0.5519
    eval_precision_macro : 0.6141
    eval_recall_macro : 0.5262
    eval_nb_samples : 2671.0000
    
    wandb: Waiting for W&B process to finish, PID 69657
    wandb: Program ended successfully.
    wandb: Find user logs for this run at: /workspace/scripts/wandb/offline-run-20210115_095447-1iwbiyf3/logs/debug.log
    wandb: Find internal logs for this run at: /workspace/scripts/wandb/offline-run-20210115_095447-1iwbiyf3/logs/debug-internal.log
    wandb: Run summary:
    wandb:                                                                   loss 0.72173
    wandb:                                                          learning_rate 0.0
    wandb:                                                                  epoch 1.0
    wandb:                                                             total_flos 2230830350876160
    wandb:                                                                  _step 1352
    wandb:                                                               _runtime 442
    wandb:                                                             _timestamp 1610704930
    wandb:                                                     test-set_eval_loss 0.70315
    wandb:                                                 test-set_eval_accuracy 0.70797
    wandb:                                                 test-set_eval_f1_micro 0.70797
    wandb:                                          test-set_eval_precision_micro 0.70797
    wandb:                                             test-set_eval_recall_micro 0.70797
    wandb:                                                 test-set_eval_f1_macro 0.55186
    wandb:                                          test-set_eval_precision_macro 0.61413
    wandb:                                             test-set_eval_recall_macro 0.52615
    wandb:                                               test-set_eval_nb_samples 2671
    wandb:                                                              eval_loss 0.69995
    wandb:                                                          eval_accuracy 0.70965
    wandb:                                                          eval_f1_micro 0.70965
    wandb:                                                   eval_precision_micro 0.70965
    wandb:                                                      eval_recall_micro 0.70965
    wandb:                                                          eval_f1_macro 0.57046
    wandb:                                                   eval_precision_macro 0.61745
    wandb:                                                      eval_recall_macro 0.54565
    wandb:                                                        eval_nb_samples 2404
    wandb: Run history:
    wandb:                   loss █▆▅▄▅▃▄▃▃▃▃▄▃▁▂▃▃▂▃▂▄▃▂▂▃▂▃▃▁▂▃▂▂▃▂▁▂▂▁▁
    wandb:          learning_rate ▂▃▅▇███▇▇▇▇▇▆▆▆▆▆▅▅▅▅▅▄▄▄▄▄▃▃▃▃▃▂▂▂▂▂▁▁▁
    wandb:                  epoch ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
    wandb:             total_flos ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
    wandb:                  _step ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
    wandb:               _runtime ▁▁▂▂▂▂▃▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇▇██
    wandb:             _timestamp ▁▁▂▂▂▂▃▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇▇██
    wandb:              eval_loss ▁
    wandb:          eval_accuracy ▁
    wandb:          eval_f1_micro ▁
    wandb:   eval_precision_micro ▁
    wandb:      eval_recall_micro ▁
    wandb:          eval_f1_macro ▁
    wandb:   eval_precision_macro ▁
    wandb:      eval_recall_macro ▁
    wandb:        eval_nb_samples ▁
    wandb: 
    wandb: You can sync this run to the cloud by running:
    wandb: wandb sync /workspace/scripts/wandb/offline-run-20210115_095447-1iwbiyf3
    root@IST-DGX01:/workspace/scripts#