A Flask-based REST API that extracts and structures information from business card images using OCR and AI.
- Supports multiple OCR engines:
  - Google Cloud Vision API
  - Tesseract OCR
- Text structuring using OpenAI GPT
- JSON output format
- File upload validation
- Secure file handling
- Configurable via environment variables
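As an illustration of the upload validation and secure file handling listed above, here is a minimal standard-library sketch. The names `ALLOWED_EXTENSIONS`, `allowed_file`, and `sanitize_filename` are hypothetical and not taken from this project, and the extension whitelist is an assumption. Flask applications commonly use Werkzeug's `secure_filename` for the sanitizing step instead.

```python
import re
from pathlib import PurePath

# Assumed whitelist; the project's actual allowed types may differ.
ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg"}


def allowed_file(filename: str) -> bool:
    """Return True if the filename ends in an allowed image extension."""
    suffix = PurePath(filename).suffix.lower().lstrip(".")
    return suffix in ALLOWED_EXTENSIONS


def sanitize_filename(filename: str) -> str:
    """Drop directory components and replace unsafe characters."""
    # Treat both slash styles as separators so Windows and POSIX paths behave alike.
    name = filename.replace("\\", "/").split("/")[-1]
    # Keep letters, digits, dot, dash and underscore; replace everything else.
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name)
    # Strip leading dots so the result is never a hidden file or "..".
    return name.lstrip(".")
```

Checking the extension alone does not prove the upload is an image, so a stricter setup might also inspect the file contents before passing it to an OCR engine.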
- Python 3.x
- Flask 2.0.1
- Google Cloud Vision API
- OpenAI GPT API
- Tesseract OCR
- Additional dependencies in requirements.txt
- Python 3.10 or higher
- Tesseract OCR installed on your system
- Google Cloud account
- OpenAI account
- Git (optional)
- Clone the repository (or download the source code):
    git clone https://github.com/The-Lone-Druid/cardscannerpoc.git
    cd cardscannerpoc
- Create and activate a virtual environment:
    # Windows
    python -m venv venv
    source venv/Scripts/activate      # Git Bash
    # or
    .\venv\Scripts\activate.ps1       # PowerShell
    # or
    venv\Scripts\activate.bat         # Command Prompt

    # macOS/Linux
    python3 -m venv venv
    source venv/bin/activate
- Install required packages:
    pip install -r requirements.txt
- Create a Google Cloud account: https://cloud.google.com/
- Create a new project in Google Cloud Console
- Enable the Cloud Vision API:
  - Go to "APIs & Services" > "Library"
  - Search for "Cloud Vision API"
  - Click "Enable"
  - Add a billing account to the project; the API will not work without one (see Google Cloud's billing documentation for instructions)
- Create credentials:
  - Go to "APIs & Services" > "Credentials"
  - Click "Create Credentials" > "Service Account"
  - Fill in service account details
  - Select role: "Project" > "Owner"
  - Click "Create and Continue"
- Download JSON credentials:
  - Click on your service account
  - Go to "Keys" tab
  - Click "Add Key" > "Create New Key"
  - Choose JSON format
  - Save the file in your project's credentials folder
- Create an OpenAI account: https://platform.openai.com/signup
- Get your API key:
  - Go to https://platform.openai.com/api-keys
  - Click "Create new secret key"
  - Copy the generated key
- OpenAI API access also requires billing to be set up (see OpenAI's billing documentation for instructions)
- Windows:
  - Download installer from: https://github.com/UB-Mannheim/tesseract/wiki
  - Install and note the installation path
  - Update the tesseract_cmd path in utils/tesseract_helper.py
- macOS:
    brew install tesseract
- Linux:
    sudo apt-get install tesseract-ocr
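The platform differences above come down to where the Tesseract executable lives. A minimal sketch of resolving that path at runtime is shown below; `resolve_tesseract_cmd` is a hypothetical helper, not part of this project, and it only assumes the executable is named `tesseract` when found on `PATH`.

```python
import os
import shutil
from typing import Optional


def resolve_tesseract_cmd(configured_path: Optional[str] = None) -> Optional[str]:
    """Return a usable Tesseract executable path, or None if none is found."""
    # Prefer an explicitly configured path (e.g. the Windows install location).
    if configured_path and os.path.isfile(configured_path):
        return configured_path
    # Otherwise fall back to whatever "tesseract" resolves to on PATH.
    return shutil.which("tesseract")
```

If the helper module relies on pytesseract, the returned value could be assigned to `pytesseract.pytesseract.tesseract_cmd`; a `None` result suggests Tesseract is not installed or not on `PATH`.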
- Create a .env file in the project root:
    FLASK_SECRET_KEY=your-secret-key-here
    GOOGLE_APPLICATION_CREDENTIALS=./credentials/your-credentials-file.json
    OPENAI_API_KEY=your-openai-api-key
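To show how these three variables might be consumed, here is a minimal sketch that reads them from the process environment and fails fast when one is missing. `Settings`, `REQUIRED_VARS`, and `load_settings` are hypothetical names, and this is not necessarily how the project's config.py works. Note that a .env file is not loaded into the environment automatically; a package such as python-dotenv is a common way to do that, which is assumed to happen before this code runs.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    flask_secret_key: str
    google_credentials_path: str
    openai_api_key: str


# The variable names come from the .env example above.
REQUIRED_VARS = (
    "FLASK_SECRET_KEY",
    "GOOGLE_APPLICATION_CREDENTIALS",
    "OPENAI_API_KEY",
)


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on gaps."""
    # Empty strings count as missing, since an empty key is never usable.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        flask_secret_key=env["FLASK_SECRET_KEY"],
        google_credentials_path=env["GOOGLE_APPLICATION_CREDENTIALS"],
        openai_api_key=env["OPENAI_API_KEY"],
    )
```

Accepting the environment as a parameter keeps the function easy to test with a plain dictionary.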
- Create required directories:
    mkdir -p credentials
    mkdir -p scans/generated
- Move your Google Cloud credentials JSON file to the credentials folder
python test_setup.py
Once you have verified that the API keys are working, you can start the application.
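Before calling any remote API, it can help to confirm offline that the credentials file looks like a service account key. The sketch below is not the content of test_setup.py; `check_service_account_info` and `check_credentials_file` are hypothetical helpers, and the check covers only a few fields that service account key files contain.

```python
import json
from pathlib import Path
from typing import List

# A subset of the fields found in a Google service account key file.
EXPECTED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_info(info: dict) -> List[str]:
    """Return a list of problems found in parsed key data (empty if none)."""
    problems = [f"missing field: {name}" for name in EXPECTED_FIELDS if not info.get(name)]
    if info.get("type") not in (None, "service_account"):
        problems.append("type is not 'service_account'")
    return problems


def check_credentials_file(path: str) -> List[str]:
    """Load a key file from disk and report any problems with it."""
    file = Path(path)
    if not file.is_file():
        return [f"file not found: {path}"]
    try:
        info = json.loads(file.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(info, dict):
        return ["top-level JSON value is not an object"]
    return check_service_account_info(info)
```

An empty result only means the file is well-formed; whether the key is accepted by Google Cloud still has to be verified with a real API call.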
- Ensure your virtual environment is activated
- Start the Flask server:
    python app.py
- Access the application at: http://127.0.0.1:5000
- Open the web interface in your browser
- Select the OCR engine (Google Vision or Tesseract)
- Upload a business card image
- Click "Scan Card"
- View the extracted information in both raw and structured JSON format
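The engine selection step in the workflow above can be sketched as a lookup from engine name to extraction function. Everything here is illustrative: the engine keys `google_vision` and `tesseract`, the `run_ocr` function, and the stub engines are assumptions, and the real helpers in utils/ may be organized differently.

```python
from typing import Callable, Dict

# An OCR engine is modelled here as a function from image path to raw text.
OcrEngine = Callable[[str], str]


def run_ocr(engines: Dict[str, OcrEngine], engine_name: str, image_path: str) -> str:
    """Run the selected engine, rejecting names that are not registered."""
    if engine_name not in engines:
        known = ", ".join(sorted(engines))
        raise ValueError(f"Unknown OCR engine {engine_name!r}; expected one of: {known}")
    return engines[engine_name](image_path)


# Stub engines standing in for the real Vision and Tesseract helpers.
ENGINES: Dict[str, OcrEngine] = {
    "google_vision": lambda path: f"[vision text from {path}]",
    "tesseract": lambda path: f"[tesseract text from {path}]",
}
```

Rejecting unknown names explicitly means a mistyped form value produces a clear error instead of silently falling back to a default engine.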
cardscannerpoc/
├── __pycache__/
├── credentials/
├── scans/
│ └── generated/
├── static/
│ ├── css/
│ │ └── style.css
│ └── js/
│ └── main.js
├── templates/
│ └── index.html
├── utils/
│ ├── __pycache__/
│ ├── __init__.py
│ ├── vision_helper.py
│ ├── tesseract_helper.py
│ └── gpt_helper.py
├── .env
├── .gitignore
├── app.py
├── config.py
├── README.md
├── requirements.txt
├── test_setup.py
└── TODOS.md
Test your API connections:
python test_setup.py
The application includes error handling for:
- Invalid file types
- Failed text extraction
- API connection issues
- Processing errors
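One way to organize these error categories is a small exception hierarchy that maps each failure to a JSON body and an HTTP status. This is a sketch under assumptions: the class names, messages, and status codes below are hypothetical and are not taken from app.py.

```python
from typing import Dict, Tuple


class CardScanError(Exception):
    """Base class; also used as the fallback for unexpected failures."""
    status_code = 500
    message = "Processing error"


class InvalidFileType(CardScanError):
    status_code = 400
    message = "Invalid file type"


class TextExtractionFailed(CardScanError):
    status_code = 422
    message = "Failed to extract text from image"


class UpstreamApiError(CardScanError):
    status_code = 502
    message = "API connection issue"


def to_response(error: Exception) -> Tuple[Dict[str, str], int]:
    """Convert an exception into a (JSON body, HTTP status) pair."""
    if isinstance(error, CardScanError):
        return {"error": error.message}, error.status_code
    # Unknown exceptions get the generic message so internals are not leaked.
    return {"error": CardScanError.message}, CardScanError.status_code
```

In a Flask app, a function registered with `@app.errorhandler(CardScanError)` could return `jsonify(body), status` from this pair.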
- Never commit sensitive files:
  - .env
  - API credentials
  - Uploaded images
  - Generated JSON files
- The .gitignore file is configured to exclude:
  - Sensitive files
  - Virtual environment
  - Python cache files
  - Uploaded and generated files
- Integration with Deepseek R1 Model
- Cost estimation for various APIs
- Project documentation enhancement
- Microservice implementation with Node.js integration
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project uses Commitizen for commit messages. To commit, install Commitizen and set up Husky:
- Install dependencies:
    npm install
- Initialize husky:
    npx husky install
- Commit using Commitizen:
    git commit
    # or
    npm run commit
This project is licensed under the MIT License; see the LICENSE file for details.