使用GeneralRecognitionV2_PPLCNetV2_base.yaml训练自己的数据集，有些参数如何调整 #3093

yaphet266 · 2024-02-27T02:03:59Z

是否有文档解释下这个yaml文件中每个配置项的含义

global configs

Global:
checkpoints: null
pretrained_model: null
output_dir: ./output
device: gpu
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 100
print_batch_step: 20
use_visualdl: False
eval_mode: retrieval
retrieval_feature_from: features # 'backbone' or 'features'
re_ranking: False
use_dali: False

used for static mode and model export

image_shape: [3, 224, 224]
save_inference_dir: ./inference

AMP:
scale_loss: 65536
use_dynamic_loss_scaling: True

O1: mixed fp16

level: O1

model architecture

Arch:
name: RecModel
infer_output_key: features
infer_add_softmax: False

Backbone:
name: PPLCNetV2_base_ShiTu
pretrained: True
use_ssld: True
class_expand: &feat_dim 512
BackboneStopLayer:
name: flatten
Neck:
name: BNNeck
num_features: *feat_dim
weight_attr:
initializer:
name: Constant
value: 1.0
bias_attr:
initializer:
name: Constant
value: 0.0
learning_rate: 1.0e-20 # NOTE: Temporarily set lr small enough to freeze the bias to zero
Head:
name: FC
embedding_size: *feat_dim
class_num: 192612
weight_attr:
initializer:
name: Normal
std: 0.001
bias_attr: False

loss function config for traing/eval process

Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
- TripletAngularMarginLoss:
weight: 1.0
feature_from: features
margin: 0.5
reduction: mean
add_absolute: True
absolute_loss_weight: 0.1
normalize_feature: True
ap_value: 0.8
an_value: 0.4
Eval:
- CELoss:
weight: 1.0

Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.06 # for 8gpu x 256bs
warmup_epoch: 5
regularizer:
name: L2
coeff: 0.00001

data loader for train and eval

DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/
cls_label_path: ./dataset/train_reg_all_data_v2.txt
relabel: True
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: [224, 224]
return_numpy: False
interpolation: bilinear
backend: cv2
- RandFlipImage:
flip_code: 1
- Pad:
padding: 10
backend: cv2
- RandCropImageV2:
size: [224, 224]
- RandomRotation:
prob: 0.5
degrees: 90
interpolation: bilinear
- ResizeImage:
size: [224, 224]
return_numpy: False
interpolation: bilinear
backend: cv2
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: hwc
sampler:
name: PKSampler
batch_size: 256
sample_per_id: 4
drop_last: False
shuffle: True
sample_method: "id_avg_prob"
id_list: [50030, 80700, 92019, 96015] # be careful when set relabel=True
ratio: [4, 4]
loader:
num_workers: 4
use_shared_memory: True

Eval:
Query:
dataset:
name: VeriWild
image_root: ./dataset/Aliproduct/
cls_label_path: ./dataset/Aliproduct/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: [224, 224]
return_numpy: False
interpolation: bilinear
backend: cv2
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: hwc
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True

Gallery:
  dataset:
    name: VeriWild
    image_root: ./dataset/Aliproduct/
    cls_label_path: ./dataset/Aliproduct/val_list.txt
    transform_ops:
      - DecodeImage:
          to_rgb: True
          channel_first: False
      - ResizeImage:
          size: [224, 224]
          return_numpy: False
          interpolation: bilinear
          backend: cv2
      - NormalizeImage:
          scale: 1.0/255.0
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
          order: hwc
  sampler:
    name: DistributedBatchSampler
    batch_size: 64
    drop_last: False
    shuffle: False
  loader:
    num_workers: 4
    use_shared_memory: True

Metric:
Eval:
- Recallk:
topk: [1, 5]
- mAP: {}

The text was updated successfully, but these errors were encountered:

TingquanGao · 2024-02-28T06:28:09Z

修改数据集路径(image_root: ./dataset/ cls_label_path: ./dataset/train_reg_all_data_v2.txt)后可以先训练试试，观察loss是否下降，已经最终的收敛情况、精度情况，再适当调整learning rate。

yaphet266 · 2024-02-29T08:49:52Z

@TingquanGao 如果主干网络想改成ResNet50怎么修改，是否有可参考的yaml文件

changdazhou · 2024-04-25T13:02:22Z

更改一下Arch:即可

paddle-bot bot assigned cuicheng01 Feb 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

使用GeneralRecognitionV2_PPLCNetV2_base.yaml训练自己的数据集，有些参数如何调整 #3093

使用GeneralRecognitionV2_PPLCNetV2_base.yaml训练自己的数据集，有些参数如何调整 #3093

yaphet266 commented Feb 27, 2024

TingquanGao commented Feb 28, 2024

yaphet266 commented Feb 29, 2024

changdazhou commented Apr 25, 2024

使用GeneralRecognitionV2_PPLCNetV2_base.yaml训练自己的数据集，有些参数如何调整 #3093

使用GeneralRecognitionV2_PPLCNetV2_base.yaml训练自己的数据集，有些参数如何调整 #3093

Comments

yaphet266 commented Feb 27, 2024

global configs

used for static mode and model export

O1: mixed fp16

model architecture

loss function config for traing/eval process

data loader for train and eval

TingquanGao commented Feb 28, 2024

yaphet266 commented Feb 29, 2024

changdazhou commented Apr 25, 2024