
Fine-tune DETR object detection approach on a custom VPA Dataset.


EhsanAlahi/DETR_VPA


DETR_VPA

The purpose of this project was to train the DETR object detection model on a custom dataset, VPA (Visual Privacy Advisor). The model was trained for almost 55 epochs on Google Colab. I used the official Facebook DETR implementation, End-to-End Object Detection with Transformers (https://github.com/facebookresearch/detr), and fine-tuned my own model on VPA. Another repository that helped me fine-tune DETR on my custom dataset is https://github.com/woctezuma/finetune-detr#data.

DATASET

You can download the dataset from https://tribhuvanesh.github.io/vpa/ if you want to retrain the model or use it for another use case. In total, the dataset contains 29 classes, mostly related to text documents.

  "face_all", 
  "address_current_all", 
  "address_home_all", 
  "a108_license_plate_all", 
  "person_body", 
  "nudity_all", 
  "name_all", 
  "ethnic_clothing", 
  "birth_date", 
  "handwriting", 
  "ausweis", 
  "credit_card", 
  "passport", 
  "drivers_license", 
  "student_id", 
  "amail", 
  "receipt", 
  "ticket", 
  "disability_physical", 
  "medicine", 
  "phone", 
  "education_history", 
  "landmark", 
  "fingerprint", 
  "date_time", 
  "username", 
  "signature", 
  "email"

Data Preparation

In order to train DETR on VPA, I converted the VPA annotations into COCO format. The modify_vpa_coco.ipynb notebook can be used for that purpose.
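
The notebook is not reproduced here. As a rough illustration of what a conversion to COCO format involves, here is a minimal sketch. The input record layout (`file_name`, `width`, `height`, and `objects` as pairs of a label and an `[x_min, y_min, x_max, y_max]` box) is a hypothetical assumption, not the actual VPA annotation schema, and `to_coco` is an illustrative name. The output follows the standard COCO layout, where each `bbox` is `[x, y, width, height]`.

```python
def to_coco(records, class_names):
    """Build a COCO-style dict from per-image records.

    `records` uses a hypothetical layout (an assumption for this sketch):
    {"file_name": str, "width": int, "height": int,
     "objects": [(label, [x_min, y_min, x_max, y_max]), ...]}
    """
    # Category ids follow the order of class_names, starting at 0 (assumed).
    cat_ids = {name: i for i, name in enumerate(class_names)}
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": i, "name": n} for n, i in cat_ids.items()],
    }
    ann_id = 0
    for img_id, rec in enumerate(records):
        coco["images"].append({
            "id": img_id,
            "file_name": rec["file_name"],
            "width": rec["width"],
            "height": rec["height"],
        })
        for label, (x0, y0, x1, y1) in rec["objects"]:
            w, h = x1 - x0, y1 - y0
            coco["annotations"].append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": cat_ids[label],
                "bbox": [x0, y0, w, h],  # COCO boxes are [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1
    return coco
```

The resulting dict can be written with `json.dump` to produce an annotation file.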

Training

The model was fine-tuned from the DETR R50 model. Other models can be downloaded from the model zoo (https://github.com/facebookresearch/detr#model-zoo).
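
When fine-tuning from a pretrained DETR checkpoint on a dataset with a different number of classes, the classification head usually has to be discarded, because its shape depends on the class count. The sketch below shows one common way to do that. It assumes the checkpoint is a dict with a `"model"` state dict whose head parameters are named `class_embed.weight` and `class_embed.bias`; whether this exact step was used for this project is not stated, and `strip_class_head` is an illustrative name.

```python
def strip_class_head(checkpoint):
    """Return a copy of a DETR-style checkpoint without the class head weights.

    Assumes checkpoint["model"] maps parameter names to tensors. With PyTorch,
    the input would typically come from torch.load(path, map_location="cpu")
    and the result would be written back with torch.save(result, new_path).
    """
    model_state = {
        name: value
        for name, value in checkpoint["model"].items()
        if not name.startswith("class_embed.")  # drop the classification head
    }
    return {"model": model_state}
```

The remaining weights can then be loaded into a model built for the new class count with `load_state_dict(..., strict=False)`, so that the missing head is initialized from scratch.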

Inferences

You can use the inferences.ipynb notebook to run inference on any other images. Download the weights file from this Drive link (https://drive.google.com/file/d/1-tcalfc-AjF1GezjqMvPLLGs4nr4R5p9/view?usp=sharing).
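
For readers who want to see what happens after the forward pass, here is a minimal sketch of DETR-style post-processing in plain Python. DETR predicts, for each query, class logits whose last entry is the "no object" class, and a box as normalized `(cx, cy, w, h)`. The function name `postprocess` and the 0.7 confidence threshold are assumptions for illustration, and the notebook may do this differently.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def postprocess(pred_logits, pred_boxes, img_w, img_h, threshold=0.7):
    """Turn per-query outputs into (label, score, [x0, y0, x1, y1]) tuples.

    pred_logits: list of per-query logit lists; the last entry is "no object".
    pred_boxes:  list of per-query (cx, cy, w, h), normalized to [0, 1].
    """
    detections = []
    for logits, (cx, cy, w, h) in zip(pred_logits, pred_boxes):
        probs = softmax(logits)[:-1]  # ignore the "no object" class
        score = max(probs)
        if score < threshold:
            continue
        label = probs.index(score)
        # Convert the center format to corner format and scale to pixels.
        box = [
            (cx - w / 2) * img_w,
            (cy - h / 2) * img_h,
            (cx + w / 2) * img_w,
            (cy + h / 2) * img_h,
        ]
        detections.append((label, score, box))
    return detections
```

Lowering `threshold` keeps more, less confident boxes, which may help for classes the model handles less well.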

Use-Cases

The VPA dataset contains 29 different classes, and the model was trained on all of them. The model performs differently for each class. Below are the results for different classes.

Face, Body and Nudity

alt text alt_text alt_text

Handwritten, Mail and Ticket

alt_text alt_text alt_text

Receipt

alt_text alt_text

Passport

alt_text alt_text

License Plate Number

alt_text

Redact Information

We can also use it to redact information on student ID cards or other documents, such as the picture, username, date of birth, or other details. alt_text alt_text
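
Redaction itself amounts to filling each detected box with a solid value. The sketch below shows the idea on an image stored as a plain list of pixel rows. That representation and the name `redact` are assumptions to keep the example dependency-free, and a real pipeline would more likely operate on an image array from an imaging library.

```python
import math

def redact(image, boxes, fill=0):
    """Return a copy of `image` with every box region overwritten by `fill`.

    image: list of rows, each a list of pixel values.
    boxes: iterable of (x0, y0, x1, y1) in pixel coordinates.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    out = [row[:] for row in image]  # leave the input untouched
    for x0, y0, x1, y1 in boxes:
        # Round outward and clamp so the box is fully covered and in bounds.
        left, top = max(0, math.floor(x0)), max(0, math.floor(y0))
        right, bottom = min(width, math.ceil(x1)), min(height, math.ceil(y1))
        for y in range(top, bottom):
            for x in range(left, right):
                out[y][x] = fill
    return out
```

Boxes for the classes to hide (for example `name_all` or `birth_date`) would come from the detector output, filtered by label.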

