This repository is about Referring Expression Comprehension Based Visual Question Answering.

XIRZC/rec2vqa


Referring Expression Based Visual Question Answering


This project combines two of the most prevalent vision-language tasks into one two-stage task, REC2VQA: first referring expression comprehension (REC), then visual question answering (VQA). I first finetuned VLBERT on VQAv2 and RefCOCO to obtain two independent checkpoints, and then developed a demo web UI based on Vue and Django to showcase this two-stage task. To avoid loading the large models on every request, the models are loaded in advance by long-running workers, and Redis and RabbitMQ are used to dispatch inference requests to them asynchronously.
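The two-stage flow can be pictured as a simple composition of the two models. This is a minimal sketch, not the repository's code: `rec_model` and `vqa_model` are hypothetical callables standing in for the two finetuned VLBERT checkpoints, and both the `(x1, y1, x2, y2)` box format and passing the box to the VQA stage are assumptions about how the stages are wired together.

```python
from typing import Callable, NamedTuple, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


class Rec2VqaResult(NamedTuple):
    box: Box      # region found by the REC stage
    answer: str   # answer produced by the VQA stage


def rec2vqa(image: object, expression: str, question: str,
            rec_model: Callable[[object, str], Box],
            vqa_model: Callable[[object, Box, str], str]) -> Rec2VqaResult:
    # Stage 1 (REC): localize the object the referring expression describes.
    box = rec_model(image, expression)
    # Stage 2 (VQA): answer the question about the image, given that region.
    answer = vqa_model(image, box, question)
    return Rec2VqaResult(box=box, answer=answer)


# Usage with stub models in place of the finetuned checkpoints.
def fake_rec(image: object, expression: str) -> Box:
    return (10.0, 20.0, 110.0, 220.0)


def fake_vqa(image: object, box: Box, question: str) -> str:
    return "red"


result = rec2vqa(None, "the man on the left", "What color is his shirt?",
                 fake_rec, fake_vqa)
print(result.box, result.answer)
```

Keeping the two stages as separate callables mirrors the two independent checkpoints: either model can be swapped or served by its own worker without touching the other.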

System Architecture

Getting Started

We recommend using Docker to install and deploy this VLBERT demo app.

Before deploying with Docker, you need to set up the Django database manually:

cd ./django/
pip install -r requirements.txt
python manage.py makemigrations api
python manage.py migrate

These commands must be run on the host because the ./django directory is mounted into the container's /work directory. As a result, the database created by these commands, and every later change to the repository code, is reflected inside the container.

Frontend

The frontend user interface is built with Vue 3, Element Plus and TypeScript, and the node Docker image is used to build the image that serves this Vue app.

There are two ways to obtain and deploy the frontend Docker image used in this repository:

  • Build it yourself: run cd ./vue/app && docker build -t mrxir/rec2vqa:vue . or uncomment the build: ./vue/app line in docker-compose.yml and then run docker compose up -d
  • Pull it: run docker pull mrxir/rec2vqa:vue or simply run docker compose up -d

Simply running docker compose up -d is probably the easiest option.

Backend

The backend data API is built with Django, Redis and RabbitMQ, and the python3 Docker image is used to build the environment for this Django app.

As mentioned above, you can simply run docker compose up -d, or build or pull the image yourself.
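The preload-then-dispatch pattern behind the backend can be illustrated with the standard library alone. In this hedged sketch, `queue.Queue` stands in for RabbitMQ and a plain dict stands in for Redis; `load_model`, `submit` and `worker` are hypothetical names, not the actual contents of recworker.py or vqaworker.py. The point is that the expensive model load happens once when the worker starts, while each request only enqueues a task and gets a task id back immediately.

```python
import queue
import threading
import time
import uuid

# Stand-ins: the queue plays the role of RabbitMQ, the dict plays the role of Redis.
task_queue = queue.Queue()
result_store = {}
result_lock = threading.Lock()
load_count = 0  # counts model loads to show the model is loaded only once


def load_model():
    # Hypothetical expensive load; runs once per worker, not once per request.
    global load_count
    load_count += 1
    time.sleep(0.1)
    return lambda question: "answer to: " + question


def worker():
    model = load_model()  # preloaded before any task is consumed
    while True:
        task_id, question = task_queue.get()
        if task_id is None:  # sentinel value: shut the worker down
            task_queue.task_done()
            break
        answer = model(question)
        with result_lock:
            result_store[task_id] = answer
        task_queue.task_done()


def submit(question):
    # What a Django view would do: enqueue the task and return an id at once.
    task_id = str(uuid.uuid4())
    task_queue.put((task_id, question))
    return task_id


t = threading.Thread(target=worker, daemon=True)
t.start()
ids = [submit(q) for q in ["what color is the car?", "how many dogs?"]]
task_queue.join()  # wait until both tasks have been processed
print([result_store[i] for i in ids])
task_queue.put((None, None))
t.join()
```

In the real deployment the view publishes to RabbitMQ and the result is sent back to the browser (the References section mentions WebSockets), but the division of labour is the same: requests stay cheap because the model is already resident in the worker.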

VLBERT

We use the nvidia-cuda Docker image to build the old and rather awkward environment that VLBERT requires: Ubuntu 16.04, CUDA 9, cuDNN 7, GCC 4.9.3, PyTorch 1.1.0, Torchvision 0.3.0 and Python 3.6.

As mentioned above, you can simply run docker compose up -d, or build or pull the image yourself.

Note: if you build the vlbert Docker image yourself, you must follow this wiki to make sure the container can access NVIDIA GPUs. Otherwise, the image you build will not run correctly at the compose stage.
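A quick sanity check for GPU access, assuming `nvidia-smi` is available in the environment you test (for example inside the running vlbert container), is to see whether `nvidia-smi -L` lists at least one GPU. The function below is a hypothetical helper, not part of the repository, and it does not replace the setup described in the wiki.

```python
import shutil
import subprocess


def nvidia_gpus_visible() -> bool:
    """Return True if nvidia-smi is on PATH and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        # `nvidia-smi -L` prints one "GPU <index>: <name> ..." line per device.
        proc = subprocess.run([exe, "-L"], stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE, timeout=10, check=False)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return proc.returncode == 0 and b"GPU" in proc.stdout


if __name__ == "__main__":
    print("NVIDIA GPUs visible:", nvidia_gpus_visible())
```

It sticks to `stdout=subprocess.PIPE` rather than `capture_output=True` so that it also runs on the image's Python 3.6.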

Miscellaneous

In addition to the Docker images above, some other files need to be downloaded: ./vlbert/docker_build, which holds the vlbert image build requirements (optional), and ./vlbert/(data|ckpts|model), which hold the VQA and REC finetuned weights, the cached VLBERT module weights and the datasets for downstream-task finetuning (optional). Here is the aliyunpan link. After downloading these files, place them at the corresponding paths so that docker compose up -d mounts them into the container workspace correctly.
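Before running docker compose up -d, a small check like the one below can catch files placed at the wrong path. It is a hypothetical helper, not part of the repository; the directory list mirrors the paths named above. Since the finetuning datasets are marked optional, you may want to drop vlbert/data from the list if in your setup it only holds those datasets and you only run inference.

```python
from pathlib import Path
from typing import List, Sequence

# Paths (relative to the repository root) named in the Miscellaneous section.
EXPECTED_DIRS = ("vlbert/data", "vlbert/ckpts", "vlbert/model")


def missing_dirs(repo_root: str, dirs: Sequence[str] = EXPECTED_DIRS) -> List[str]:
    """Return the entries of `dirs` that are not directories under repo_root."""
    root = Path(repo_root)
    return [d for d in dirs if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(".")
    if missing:
        print("Missing before `docker compose up -d`:", ", ".join(missing))
    else:
        print("All expected vlbert directories are present.")
```

Run it from the repository root, since the paths are resolved relative to the current directory.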

Deploying

The easiest way to deploy this codebase is to run docker compose up -d in the repository root directory.

The Docker Compose deployment consists of five Docker images and six services.

Docker images:

Docker services:

  • redis: deployed on port 5672, open to all local network IPs
  • rabbitmq: deployed on port 6732, open to all local network IPs
  • vue: deployed on port 80, open to all local network IPs
  • django: deployed on port 8080, open to all local network IPs
  • vlbert-recworker: deployed after rabbitmq has finished booting
  • vlbert-vqaworker: deployed after rabbitmq has finished booting
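After docker compose up -d, a TCP probe like the one below can confirm that the four port-exposing services are reachable. This is a hypothetical helper; the port numbers are copied from the service list, so check them against docker-compose.yml (by default Redis listens on 6379 and RabbitMQ's AMQP listener on 5672). The two vlbert workers expose no port in the list, so check them with docker compose ps instead.

```python
import socket
from typing import Dict

# Ports as given in the service list; verify against docker-compose.yml.
SERVICE_PORTS = {"redis": 5672, "rabbitmq": 6732, "vue": 80, "django": 8080}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


def check_services(host: str = "127.0.0.1",
                   ports: Dict[str, int] = SERVICE_PORTS) -> Dict[str, bool]:
    return {name: port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name:10s} {'up' if ok else 'DOWN'}")
```

A successful connection only shows that something is listening on the port, not that the service behind it is healthy.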

After deployment, visit http://$YOUR_LOCAL_IP/#/app/Main for the Vue frontend and http://$YOUR_LOCAL_IP:8080 for the Django backend data API. If you deploy on a server, replace YOUR_LOCAL_IP with YOUR_REMOTE_IP.
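If you are unsure what $YOUR_LOCAL_IP is, the sketch below guesses the host's outbound IPv4 address with the common UDP-connect trick (connecting a UDP socket sends no packet, it only selects a route) and formats the two URLs. It is a hypothetical helper that falls back to 127.0.0.1 when no route is available; on a host with several network interfaces you may need to pick the address manually.

```python
import socket
from typing import Dict


def local_ip() -> str:
    """Best-effort guess of the IPv4 address used for outbound traffic."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # No packet is sent; connect() on a UDP socket just picks a route.
        s.connect(("10.255.255.255", 1))
        return s.getsockname()[0]
    except OSError:
        return "127.0.0.1"
    finally:
        s.close()


def app_urls(ip: str) -> Dict[str, str]:
    # URL patterns taken from the deployment notes.
    return {"frontend": f"http://{ip}/#/app/Main",
            "backend": f"http://{ip}:8080"}


if __name__ == "__main__":
    for name, url in app_urls(local_ip()).items():
        print(name, url)
```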

I deploy on my local network server; here are the URLs after deploying this app and exposing it through NAT traversal with a Cloudflare Zero Trust tunnel:

Demo Video

[Demo video: 3.mp4]

Project Structure

.
├── assets # static resources
│   ├── data_flow.png
│   ├── demo.mp4
│   ├── logo.png
│   ├── presentation.pptx
│   ├── sys_arch.png
│   └── thesis.pdf
├── django # backend django api
│   ├── api # main django app
│   ├── backend # django configurations
│   ├── db # sqlite database
│   ├── Dockerfile # django docker build file
│   ├── manage.py # django main program
│   ├── media # django host static files path
│   ├── recworker.py # referring expression comprehension asynchronous worker
│   ├── requirements.txt # python dependencies
│   └── vqaworker.py # visual question answering asynchronous worker
├── docker-compose.yml # docker compose configuration file
├── README.md
├── vlbert # vision-language large model for VQA and REC
│   ├── cfgs
│   ├── ckpts
│   ├── common
│   ├── data
│   ├── Dockerfile
│   ├── external
│   ├── figs
│   ├── LICENSE
│   ├── model
│   ├── pretrain
│   ├── README.md
│   ├── refcoco
│   ├── requirements.txt
│   ├── scripts
│   ├── vcr
│   ├── viz
│   └── vqa
└── vue # frontend vue app
    └── app

21 directories, 17 files

References

This repository is mainly based on VLBERT (for VLBERT PyTorch model finetuning and inference) and on the GradCam Demo and MAttNet Demo (for combining Redis, RabbitMQ and Django to request model inference asynchronously and to communicate in real time over WebSockets).
