Diffusion Based Augmentation for Captioning and Retrieval in Cultural Heritage


This is the official repository for the ICCV 2023 4th Workshop on e-Heritage paper "Diffusion Based Augmentation for Captioning and Retrieval in Cultural Heritage" by Dario Cioni, Lorenzo Berlincioni, Federico Becattini and Alberto Del Bimbo.


If you find our work useful, please consider citing it:

@InProceedings{Cioni_2023_ICCV,
    author    = {Cioni, Dario and Berlincioni, Lorenzo and Becattini, Federico and Del Bimbo, Alberto},
    title     = {Diffusion Based Augmentation for Captioning and Retrieval in Cultural Heritage},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {1707-1716}
}

Table of Contents

- Project Structure
- Data
- Installation
- Usage
- Dataset augmentation
- Train
- Test
- Results

Project Structure

Below is a description of the main files and folders of the project.

  cultural-heritage-image2text/
  │
  ├── main.py - main script for training and testing models
  │
  ├── data_loader/ - anything about data loading goes here
│   └── artpedia.py - contains the Artpedia Dataset and DataModule
  │
  ├── data/ - default directory for storing input data
  │
  ├── model/ - models and metrics
  │   ├── model.py - LightningModule wrapper for image captioning
│   └── metrics/ - directory with custom metrics
  │
  ├── runs/
  │   ├── cultural-heritage/ - trained models are saved here
  │   └── wandb/ - local logdir for wandb and logging output
  │
  └── utils/
      ├── utils.py - small utility functions for training
      └── download.py - utility to download images from Artpedia json metadata

Data

Experiments were performed on the Artpedia and ArtCap datasets. Images were downloaded from Wikipedia using the download.py script. To download the images, run the following command, providing a valid e-mail address as identifier, the annotation file, and the output directory.

python utils/download.py email@domain.com --ann_file data/artpedia/artpedia.json --img_dir data/artpedia/images 
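The actual logic lives in utils/download.py; the sketch below is only illustrative of what the download step amounts to. It assumes the annotation file maps image ids to records exposing an "img_url" field, and sends the e-mail address as a contact identifier in the User-Agent header, as Wikimedia's API guidelines recommend.

import json
from pathlib import Path

import requests

def download_images(email: str, ann_file: str, img_dir: str) -> None:
    out = Path(img_dir)
    out.mkdir(parents=True, exist_ok=True)
    # identify ourselves to Wikimedia with a contact e-mail
    headers = {"User-Agent": f"cultural-heritage-diffaug ({email})"}
    with open(ann_file) as f:
        annotations = json.load(f)  # assumed: {image_id: {"img_url": ...}}
    for img_id, record in annotations.items():
        resp = requests.get(record["img_url"], headers=headers, timeout=30)
        if resp.ok:
            (out / f"{img_id}.jpg").write_bytes(resp.content)

if __name__ == "__main__":
    download_images("email@domain.com", "data/artpedia/artpedia.json",
                    "data/artpedia/images")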

Installation

This project uses a modified version of pycocoevalcap. To install it, run the following commands:

git submodule update --init --remote
cd pycocoevalcap
pip install -e .

Usage

The command-line interface is implemented using LightningCLI.
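For orientation, main.py wires the model and data module into LightningCLI roughly as in the sketch below. The class names and import paths here are illustrative assumptions, not necessarily the exact ones in the repository.

from lightning.pytorch.cli import LightningCLI

# illustrative import paths; the repository's actual class names may differ
from model.model import ImageCaptioningModule
from data_loader.artpedia import ArtpediaDataModule

def cli_main():
    # LightningCLI exposes the fit/validate/test/predict subcommands and
    # parses the YAML configuration described in the next section
    LightningCLI(ImageCaptioningModule, ArtpediaDataModule)

if __name__ == "__main__":
    cli_main()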

Configuration

Training and validation are controlled by a YAML configuration file with the following structure:

# lightning.pytorch==2.0.1.post0
seed_everything: int | bool
trainer:
  # list of trainer args
  logger:
    class_path: lightning.pytorch.loggers.WandbLogger
    init_args:
      # wandb logging args
  callbacks:
    class_path: callbacks.predictions.LogPredictionSamplesCallback
model:
  model_name_or_path: microsoft/git-base
  learning_rate: 5.0e-05
  warmup_steps: 500
  weight_decay: 0.0
  metrics:
    # add or remove metrics here
    - class_path: model.CocoScore
    - class_path: torchmetrics.text.BERTScore
      init_args:
        model_name_or_path: distilbert-base-uncased
        batch_size: 16
        lang: en
        max_length: 512
  generation:
    # generation args
data:
  img_dir: data/artpedia/
  ann_file: data/artpedia/artpedia_augmented.json
  batch_size: 2
  # Processor name for model
  model_name_or_path: microsoft/git-base
  num_workers: 6
ckpt_path: null # provide a path to a checkpoint to load

Every configuration value can be overridden by passing a command-line argument with the same name. For example, to override the batch_size parameter, run:

python main.py fit --config configs/config.yaml --data.batch_size 32

You can find a complete example of a configuration file in the configs/ folder.
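If you prefer to start from a full default configuration, LightningCLI can also generate one: running python main.py fit --print_config prints every available option, and the output can be redirected to a new YAML file and edited.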

Dataset augmentation

Dataset augmentation is performed using the img2img.py script, which relies on the Automatic1111 web API. To use the script, provide the URL of a running API instance, the path to the dataset annotation file, the path to the original dataset images, and the output path for the augmented dataset.

For each image in the dataset, the script generates a new folder with the same name as the image, containing augmented images.

python img2img.py --api_url http://127.0.0.1:7860 --ann_file data/artpedia/artpedia.json --img_dir data/artpedia/images --out_dir data/artpedia/samples
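For reference, a single augmentation request against the Automatic1111 web API looks roughly like the sketch below. The /sdapi/v1/img2img endpoint and its payload fields are the API's standard ones, while the file names, prompt, and strength values are placeholder assumptions, not the paper's settings.

import base64
from pathlib import Path

import requests

API_URL = "http://127.0.0.1:7860"

# encode the source painting as base64, as required by the API
with open("data/artpedia/images/0.jpg", "rb") as f:
    init_image = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "init_images": [init_image],
    "prompt": "a painting of a madonna with child, oil on canvas",  # placeholder
    "denoising_strength": 0.5,  # how far generations may deviate from the original
    "steps": 30,
}

resp = requests.post(f"{API_URL}/sdapi/v1/img2img", json=payload)
resp.raise_for_status()

# the response carries the generated images as base64-encoded strings
out_dir = Path("data/artpedia/samples/0")
out_dir.mkdir(parents=True, exist_ok=True)
for i, img_b64 in enumerate(resp.json()["images"]):
    (out_dir / f"{i}.png").write_bytes(base64.b64decode(img_b64))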

Train

Training is performed using the fit subcommand, followed by the path to the configuration file and other optional arguments.

python main.py fit -c configs/your_config.yaml

Test

Testing is performed using the test subcommand, followed by the path to the configuration file and other optional arguments.

python main.py test -c configs/config.yaml --ckpt_path path/to/ckpt.ckpt

Results

Below is the performance of the pretrained models on the Artpedia and ArtCap datasets. For additional results, please refer to the paper.
