Counterfactually Measuring and Eliminating Social Bias in Vision-Language Pre-training Models (ACM MM 2022)
Official PyTorch implementation and dataset
Code is available on GitHub
The VL-Bias dataset is available on Google Drive
The VL-Bias dataset contains 24K images: 13K covering 52 activities and 11K covering 13 occupations.
The 52 activities: baking, biking, cleaning, cooking, crying, driving, exercising, fishing, hugging, jumping, kneeling, lifting, picking, praying, riding, running, sewing, shouting, skating, smiling, spying, staring, studying, talking, walking, waving, begging, calling, climbing, coughing, drinking, eating, falling, hitting, jogging, kicking, laughing, painting, pitching, reading, rowing, serving, shopping, sitting, sleeping, speaking, standing, stretching, sweeping, throwing, washing, working
The 13 occupations: athlete, chef, doctor, engineer, farmer, footballer, judge, mechanic, nurse, pilot, police, runner, soldier
We use four templates (listed in the table in the paper) to generate captions. In total, for each template, we collected 24K image-text pairs: 13K for the 52 activities and 11K for the 13 occupations.
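To make the caption generation concrete, below is a minimal Python sketch of how gendered captions can be produced from templates. The template strings, abbreviated lists, and function names here are illustrative assumptions, not the paper's actual four templates; substitute the templates from the paper's table.

```python
# Sketch of template-based caption generation for VL-Bias-style pairs.
# NOTE: the templates below are placeholders; the real four templates
# are listed in the paper's table.

ACTIVITIES = ["baking", "biking", "cleaning", "cooking"]    # 4 of the 52 listed above
OCCUPATIONS = ["athlete", "chef", "doctor", "nurse"]        # 4 of the 13 listed above

# Hypothetical templates with a gender slot and a target slot.
ACTIVITY_TEMPLATES = [
    "A {gender} is {activity}.",
    "The {gender} is {activity}.",
]
OCCUPATION_TEMPLATES = [
    "A {gender} is a {occupation}.",
    "The {gender} is a {occupation}.",
]

def activity_captions(activity, genders=("man", "woman")):
    """One caption per (template, gender) pair for an activity word."""
    return [t.format(gender=g, activity=activity)
            for t in ACTIVITY_TEMPLATES for g in genders]

def occupation_captions(occupation, genders=("man", "woman")):
    """One caption per (template, gender) pair for an occupation word."""
    return [t.format(gender=g, occupation=occupation)
            for t in OCCUPATION_TEMPLATES for g in genders]

if __name__ == "__main__":
    print(activity_captions("cooking"))
    # ['A man is cooking.', 'A woman is cooking.',
    #  'The man is cooking.', 'The woman is cooking.']
    print(occupation_captions("nurse"))
```

Pairing each generated caption with the corresponding image yields, per template, one image-text pair for every image in the dataset, which matches the 24K pairs per template reported above.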