This tool processes images and generates textual descriptions using advanced machine learning models.
It supports multiple models such as BLIP and UForm, allowing users to choose the model that best fits their needs.
To set up the environment for this project, follow the instructions for your operating system.

For Unix-based Systems:
- Open a terminal.
- Navigate to the project's root directory.
- Run the following command to execute the installation script:
```
./install.sh
```
This script will create a Python virtual environment, activate it, install the necessary dependencies, and set up the PyTorch library with GPU support.
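For reference, the script's work corresponds roughly to the following commands (a sketch based on the description above; the requirements file name and the CUDA wheel index are assumptions, so treat install.sh itself as authoritative):

```bash
#!/usr/bin/env bash
# Rough equivalent of what install.sh does, per the description above.
python3 -m venv venv                     # create the virtual environment
source venv/bin/activate                 # activate it
pip install -r requirements.txt          # dependencies (file name is an assumption)
# Install a CUDA-enabled PyTorch build; the CUDA version/index URL is an assumption.
pip install torch --index-url https://download.pytorch.org/whl/cu121
```

The Windows script described below presumably follows the same pattern, using venv\Scripts\activate.bat to activate the environment.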
For Windows Systems:
- Open Command Prompt.
- Navigate to the project's root directory.
- Run the following command to execute the installation script:
```
install.bat
```
Note: It's also possible to execute the script by double-clicking the install.bat file in Windows Explorer.
Similar to the Unix script, this will set up the Python virtual environment and install all necessary dependencies, including PyTorch with GPU support.
Alternatively, you can use Docker to set up the environment:
- Ensure Docker is installed on your system. For GPU access via --gpus all, the host also needs the NVIDIA Container Toolkit.
- Build the Docker image:
```
docker build -t image-to-text-tool .
```
- To run the Docker container with the project's root directory mapped as a volume, use the command below for your platform. Mounting the project root makes the input folder and other necessary files available inside the container.
- For Unix-based Systems:

```
docker run -it --gpus all -v $(pwd):/usr/src/app image-to-text-tool
```

- For Windows Systems (Command Prompt):

```
docker run -it --gpus all -v %cd%:/usr/src/app image-to-text-tool
```

- For Windows Systems (PowerShell):

```
docker run -it --gpus all -v ${PWD}:/usr/src/app image-to-text-tool
```
This approach uses the NVIDIA CUDA base image and sets up the environment with GPU support, while also allowing you to work directly with your project files.
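Whichever setup path you choose, it's worth verifying GPU support before processing a large batch of images. A quick check (assuming the environment or container is active):

```bash
# Prints True if PyTorch can see a CUDA-capable GPU.
python -c "import torch; print(torch.cuda.is_available())"
```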
After installing the dependencies and setting up the environment, you can use the tool as follows:
- Place the images you want to process in the input folder.
- Run the run.py script with the desired model flags.
For example:
```
python run.py --blip --uform
```
To process the images using all available models, simply run the script without any flags:
```
python run.py
```
This will process the images with the selected models and generate textual descriptions. The output is saved in JSON format, in a designated file for each model; check these output files in the project directory to view the descriptions generated for each image.
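The exact file names and schema depend on the model wrappers, but as a purely illustrative sketch (the paths and layout below are assumptions), an output file might map each image to its generated caption:

```json
{
  "input/example1.jpg": "a dog running across a grassy field",
  "input/example2.jpg": "a plate of food on a wooden table"
}
```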