This repository contains a PyTorch implementation of the Transformer model as described in the paper "Attention Is All You Need" by Vaswani et al. (2017).
This implementation provides a complete pipeline for training and using Transformer models, including:
- Data preprocessing from Excel files
- Model training with customizable hyperparameters
- Validation during training
- Model architecture following the original paper
## Project Structure

```
.
├── main.py          # Main training script
├── train.py         # Training loop implementation
├── preprocess.py    # Data preprocessing utilities
├── arguments.py     # Command line argument definitions
├── module/          # Core model components
│   ├── Models.py    # Transformer model architecture
│   ├── Constants.py # Constants and special tokens
│   └── Optim.py     # Optimizer with learning rate scheduling
├── data/            # Data directory
└── output/          # Training outputs and model checkpoints
```
## Requirements

- Python 3.6+
- PyTorch
- NumPy
- pandas (for Excel file handling)
## Usage

1. Prepare your data in Excel format (see the example below).
2. Run the training script:

```bash
python main.py --excel path/to/your/data.xlsx --output_dir output
```
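The exact column layout the pipeline expects is defined in `preprocess.py`. As a minimal, hypothetical illustration of step 1 (the `source` and `target` column names below are assumptions, not the script's actual schema), a parallel-text Excel file could be created with pandas:

```python
import pandas as pd

# Hypothetical layout: one source/target sentence pair per row.
# Note: writing .xlsx files with pandas requires the openpyxl package.
pairs = [
    ("hello world", "hallo welt"),
    ("how are you", "wie geht es dir"),
]
df = pd.DataFrame(pairs, columns=["source", "target"])
df.to_excel("data/train.xlsx", index=False)
```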
## Command Line Arguments

- `--excel`: Path to the input Excel file
- `--output_dir`: Directory to save model checkpoints
- `--batch_size`: Batch size for training
- `--d_model`: Model dimension
- `--n_layers`: Number of Transformer layers
- `--n_head`: Number of attention heads
- `--dropout`: Dropout rate
- `--n_warmup_steps`: Number of warmup steps for learning rate scheduling
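`--d_model` and `--n_warmup_steps` parameterize the warmup learning-rate schedule handled by `module/Optim.py`. Assuming it follows the schedule from the paper, a minimal sketch as a standalone function (illustrative only, not the actual `Optim.py` interface):

```python
def noam_lr(step: int, d_model: int, n_warmup_steps: int) -> float:
    """Learning rate schedule from "Attention Is All You Need" (Section 5.3):
    lr = d_model^-0.5 * min(step^-0.5, step * n_warmup_steps^-1.5)
    """
    step = max(step, 1)  # guard against division by zero at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * n_warmup_steps ** -1.5)

# The rate rises linearly for n_warmup_steps, then decays as step^-0.5:
for step in (1, 2000, 4000, 8000):
    print(step, noam_lr(step, d_model=512, n_warmup_steps=4000))
```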
## Model Architecture

The implementation follows the original Transformer architecture, including:

- Multi-head self-attention (see the sketch after this list)
- Position-wise feed-forward networks
- Layer normalization
- Residual connections
- Positional encoding
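For reference, the scaled dot-product attention at the core of multi-head attention can be written in a few lines of PyTorch. This is a self-contained sketch following the paper's equation, not the exact code in `module/Models.py`:

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        # Positions where mask == 0 are excluded from attention.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    attn = F.softmax(scores, dim=-1)
    return torch.matmul(attn, v), attn
```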
## Acknowledgements

This implementation is based on:

- The original Transformer paper: "Attention Is All You Need" by Vaswani et al. (2017)
- The module implementation by jadore801120