Skip to content

Latest commit

 

History

History
478 lines (443 loc) · 14.1 KB

dbmtl.md

File metadata and controls

478 lines (443 loc) · 14.1 KB

DBMTL

简介

DBMTL构建了多个目标之间的贝叶斯网络,显式建模了多个目标之间可能存在的因果关系,通过对不同任务间的贝叶斯关系来同时优化场景中的多个指标。 dbmtl.png

底层的shared layer和specific layer是通过hard parameter sharing方式来人工配置的,而google的MMoE是基于soft parameter sharing来实现不同任务底层特征和网络共享,并在Youtube场景中取得了不错的效果。因此DBMTL同样支持将shared layer和specific layer模块替换成MMoE模块,即通过task gate的方式在多组expert参数中加权组合出对应task的feature。

dbmtl_mmoe.png

在多模态(图像、视频、文本)推荐场景,DBMTL支持使用CMBF模型UNITER模型作为底层的shared layer,以便充分利用多模态特征,取到更好的推荐效果。

配置说明

DBTML

model_config {
  model_class: "DBMTL"
  feature_groups {
    group_name: "all"
    feature_names: "user_id"
    feature_names: "cms_segid"
    ...
    feature_names: "tag_brand_list"
    wide_deep: DEEP
  }
  dbmtl {
    bottom_dnn {
      hidden_units: [1024, 512, 256]
    }
    task_towers {
      tower_name: "ctr"
      label_name: "clk"
      loss_type: CLASSIFICATION
      metrics_set: {
        auc {}
      }
      dnn {
        hidden_units: [256, 128, 64, 32]
      }
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    task_towers {
      tower_name: "cvr"
      label_name: "buy"
      loss_type: CLASSIFICATION
      metrics_set: {
        auc {}
      }
      dnn {
        hidden_units: [256, 128, 64, 32]
      }
      relation_tower_names: ["ctr"]
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    l2_regularization: 1e-6
  }
  embedding_regularization: 5e-6
}
  • model_class: 'DBMTL', 不需要修改
  • feature_groups: 配置一个名为'all'的feature_group。
  • dbmtl: dbmtl相关的参数
    • experts
      • expert_name
      • dnn deep part的参数配置
        • hidden_units: dnn每一层的channel数目,即神经元的数目
    • task_towers 根据任务数配置task_towers
      • tower_name
      • dnn deep part的参数配置
        • hidden_units: dnn每一层的channel数目,即神经元的数目
      • 默认为二分类任务,即num_class默认为1,weight默认为1.0,loss_type默认为CLASSIFICATION,metrics_set为auc
      • 注:label_fields需与task_towers一一对齐。
    • embedding_regularization: 对embedding部分加regularization,防止overfit

DBMTL Based On Backbone

model_config {
  model_name: "DBMTL"
  model_class: "MultiTaskModel"
  feature_groups {
    group_name: "all"
    feature_names: "user_id"
    feature_names: "cms_segid"
    ...
    feature_names: "tag_brand_list"
    wide_deep: DEEP
  }
  backbone {
    blocks {
      name: "mask_net"
      inputs {
        feature_group_name: "all"
      }
      keras_layer {
        class_name: 'MaskNet'
        masknet {
          mask_blocks {
            aggregation_size: 512
            output_size: 256
          }
          mask_blocks {
            aggregation_size: 512
            output_size: 256
          }
          mask_blocks {
            aggregation_size: 512
            output_size: 256
          }
          mlp {
            hidden_units: [512, 256]
          }
        }
      }
    }
  }
  model_params {
    task_towers {
      tower_name: "ctr"
      label_name: "clk"
      loss_type: CLASSIFICATION
      metrics_set: {
        auc {}
      }
      dnn {
        hidden_units: [256, 128, 64, 32]
      }
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    task_towers {
      tower_name: "cvr"
      label_name: "buy"
      loss_type: CLASSIFICATION
      metrics_set: {
        auc {}
      }
      dnn {
        hidden_units: [256, 128, 64, 32]
      }
      relation_tower_names: ["ctr"]
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    l2_regularization: 1e-6
  }
  embedding_regularization: 5e-6
}

该案例添加了一个额外的MaskNet层,为了展示以组件化方式搭建模型的灵活性。

  • model_name: 任意自定义字符串,仅有注释作用

  • model_class: 'MultiTaskModel', 不需要修改, 通过组件化方式搭建的多目标排序模型都叫这个名字

  • backbone: 通过组件化的方式搭建的主干网络,参考文档

    • blocks: 由多个组件块组成的一个有向无环图(DAG),框架负责按照DAG的拓扑排序执行个组件块关联的代码逻辑,构建TF Graph的一个子图
    • name/inputs: 每个block有一个唯一的名字(name),并且有一个或多个输入(inputs)和输出
    • keras_layer: 加载由class_name指定的自定义或系统内置的keras layer,执行一段代码逻辑;参考文档
    • masknet: MaskNet模型的参数,详见参考文档
  • 其余与dbmtl一致

DBMTL+MMOE

model_config {
  model_class: "DBMTL"
  feature_groups {
    group_name: "all"
    feature_names: "user_id"
    feature_names: "cms_segid"
    ...
    feature_names: "tag_brand_list"
    wide_deep: DEEP
  }
  dbmtl {
    bottom_dnn {
      hidden_units: [1024]
    }
    expert_dnn {
      hidden_units: [256, 128, 64, 32]
    }
    num_expert: 8
    task_towers {
      tower_name: "ctr"
      label_name: "clk"
      loss_type: CLASSIFICATION
      metrics_set: {
        auc {}
      }
      dnn {
        hidden_units: [256, 128, 64, 32]
      }
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    task_towers {
      tower_name: "cvr"
      label_name: "buy"
      loss_type: CLASSIFICATION
      metrics_set: {
        auc {}
      }
      dnn {
        hidden_units: [256, 128, 64, 32]
      }
      relation_tower_names: ["ctr"]
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    l2_regularization: 1e-6
  }
  embedding_regularization: 5e-6
}
  • dbmtl
    • expert_dnn: MMOE的专家DNN配置
      • hidden_units: dnn每一层的channel数目,即神经元的数目
    • expert_num: 专家DNN的数目
    • 其余与dbmtl一致

DBMTL模型每个塔的输出名为:"logits_" / "probs_" / "y_" + tower_name 其中,logits/probs/y对应: sigmoid之前的值/概率/回归模型的预测值 DBMTL模型每个塔的指标为:指标名+ "_" + tower_name

DBMTL+CMBF

多模态、多目标推荐模型

model_config: {
  model_class: 'DBMTL'
  feature_groups: {
    group_name: 'image'
    feature_names: 'embedding'
    wide_deep: DEEP
  }
  feature_groups: {
    group_name: 'general'
    feature_names: 'user_id'
    feature_names: 'movie_id'
    feature_names: 'gender'
    feature_names: 'age'
    feature_names: 'occupation'
    feature_names: 'zip_id'
    feature_names: 'movie_year_bin'
    feature_names: 'score_year_diff'
    feature_names: 'score_time'
    wide_deep: DEEP
  }
  feature_groups: {
    group_name: 'text'
    feature_names: 'title'
    feature_names: 'genres'
    wide_deep: DEEP
  }
  dbmtl {
    bottom_cmbf {
      multi_head_num: 2
      image_multi_head_num: 2
      text_multi_head_num: 2
      image_feature_patch_num: 8
      image_head_size: 32
      text_head_size: 8
      image_self_attention_layer_num: 2
      text_self_attention_layer_num: 2
      cross_modal_layer_num: 3
      image_cross_head_size: 8
      text_cross_head_size: 16
      max_position_embeddings: 16
      use_token_type: true
    }
    task_towers {
      tower_name: "classify"
      label_name: "label"
      loss_type: CLASSIFICATION
      metrics_set: {
        auc {}
      }
      metrics_set: {
        gauc {
          uid_field: 'user_id'
        }
      }
      dnn {
        hidden_units: [256, 128, 64]
      }
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    task_towers {
      tower_name: "rating"
      label_name: "rating"
      loss_type: L2_LOSS
      metrics_set: {
        mean_squared_error {}
      }
      dnn {
        hidden_units: [256, 128, 64]
      }
      relation_tower_names: ["classify"]
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    l2_regularization: 1e-6
  }
  embedding_regularization: 1e-6
}
  • dbmtl
    • bottom_cmbf: 跨模态融合模型CMBF的相关配置,请参考CMBF模型配置。
      • image_feature_dim: 在单模态学习模块之前做图像特征维度调整,调整到该参数指定的维度
      • multi_head_num: 跨模态融合模块中的 head 数量,默认为1
      • image_multi_head_num: 图像单模态学习模块中的 head 数量,默认为1
      • text_multi_head_num: 文本单模态学习模块中的 head 数量,默认为1
      • image_head_size: 单模态学习模块中的图像tower,multi-headed self-attention的每个head的size
      • text_head_size: 单模态学习模块中的文本tower,multi-headed self-attention的每个head的size
      • image_feature_patch_num: [可选,默认值为1] 当只有一个image feature时生效,表示该图像特征是一个复合embedding,维度为image_feature_patch_num * embedding_size
      • image_self_attention_layer_num: 单模态学习模块中的图像tower,multi-headed self-attention的层数
      • text_self_attention_layer_num: 单模态学习模块中的文本tower,multi-headed self-attention的层数
      • cross_modal_layer_num: 跨模态融合模块的层数,建议设在1到5之间,默认为1
      • image_cross_head_size: 跨模模态学习模块中的图像tower,multi-headed attention的每个head的size
      • text_cross_head_size: 跨模模态学习模块中的文本tower,multi-headed attention的每个head的size
      • attention_probs_dropout_prob: self/cross attention模块attention权重的dropout概率
      • hidden_dropout_prob: multi-headed attention模块中FC layer的dropout概率
      • use_token_type: bool,default is false;是否使用token type embedding区分不同的text sequence feature
      • use_position_embeddings: bool, default is true;是否为文本序列添加位置编码
      • max_position_embeddings: 文本序列的最大位置,当use_position_embeddings为true时,必须配置;并且必须大于或等于所有特征配置max_seq_len的最大值
      • text_seq_emb_dropout_prob: 文本序列embedding的dropout概率
      • other_feature_dnn: [可选] 其他特征的MLP网络配置
    • 其余与dbmtl一致

DBMTL+UNITER

多模态、多目标推荐模型

model_config: {
  model_class: 'DBMTL'
  feature_groups: {
    group_name: 'image'
    feature_names: 'embedding'
    wide_deep: DEEP
  }
  feature_groups: {
    group_name: 'general'
    feature_names: 'user_id'
    feature_names: 'movie_id'
    feature_names: 'gender'
    feature_names: 'age'
    feature_names: 'occupation'
    feature_names: 'zip_id'
    feature_names: 'movie_year_bin'
    wide_deep: DEEP
  }
  feature_groups: {
    group_name: 'other'
    feature_names: 'score_year_diff'
    feature_names: 'score_time'
    wide_deep: DEEP
  }
  feature_groups: {
    group_name: 'text'
    feature_names: 'title'
    feature_names: 'genres'
    wide_deep: DEEP
  }
  dbmtl {
    bottom_uniter {
      hidden_size: 512
      num_attention_heads: 4
      num_hidden_layers: 2
      intermediate_size: 512
      hidden_act: 'swish'
      max_position_embeddings: 16
      hidden_dropout_prob: 0.1
      attention_probs_dropout_prob: 0
      other_feature_dnn: {
        hidden_units: 256
        hidden_units: 128
      }
    }
    task_towers {
      tower_name: "classify"
      label_name: "label"
      loss_type: CLASSIFICATION
      metrics_set: {
        auc {}
      }
      metrics_set: {
        gauc {
          uid_field: 'user_id'
        }
      }
      dnn {
        hidden_units: [256, 128, 64]
      }
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    task_towers {
      tower_name: "rating"
      label_name: "rating"
      loss_type: L2_LOSS
      metrics_set: {
        mean_squared_error {}
      }
      dnn {
        hidden_units: [256, 128, 64]
      }
      relation_tower_names: ["classify"]
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    l2_regularization: 1e-6
  }
  embedding_regularization: 1e-6
}
  • dbmtl
    • bottom_uniter: 多模态融合模型UNITER的相关配置,请参考UNITER模型配置。

      • num_attention_heads: 跨模态融合模块中的 head 数量
      • hidden_size: transformer模块的隐层单元数,必须是num_attention_heads的整数倍。
      • num_hidden_layers: transformer模块中的self-attention层数,建议设在1到5之间,默认为1
      • intermediate_size: transformer模块中MLP层的神经元个数
      • hidden_act: transformer模块中MLP层的激活函数,目前支持:'relu'、'swish'、'gelu'、'tanh'
      • attention_probs_dropout_prob: self/cross attention模块attention权重的dropout概率
      • hidden_dropout_prob: multi-headed attention模块中FC layer的dropout概率
      • use_position_embeddings: bool, default is true;是否为文本序列添加位置编码
      • max_position_embeddings: 文本序列的最大位置,当use_position_embeddings为true时,必须配置;并且必须大于或等于所有特征配置max_seq_len的最大值
      • other_feature_dnn: [可选] 其他特征的MLP网络配置
    • 其余与dbmtl一致

示例Config

参考论文

DBMTL