AIS Data Analysis is a data science project focused on the analysis of Automatic Identification System (AIS) telemetry collected in the maritime area surrounding the Port of Ancona, Italy (2020–2023).
The project aims to explore, visualize, classify, and predict vessel behaviors using statistical analysis, geospatial tools, and machine learning models.
Key goals of the project include:
- Cleaning and structuring AIS data for efficient analysis
- Generating statistical and geospatial visualizations
- Performing vessel trajectory clustering
- Training classification models for vessel type prediction
- Forecasting future positions and navigation features
To install the necessary dependencies, run:
pip install -r requirements.txt
Create the following directories for your dataset:
mkdir -p "dataset/AIS_Dataset"
mkdir -p "dataset/AIS_Dataset_csv"
Place the yearly files, named ais_stat_data_{year}.csv (e.g. ais_stat_data_2020.csv), into the dataset/AIS_Dataset directory.
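Before running the setup scripts, you can quickly confirm that the expected files are in place. The sketch below relies only on the layout described above (dataset/AIS_Dataset/ais_stat_data_{year}.csv for 2020 to 2023). The helper name missing_years is hypothetical and is not part of the repository.

```python
from pathlib import Path


def missing_years(root="dataset/AIS_Dataset", years=range(2020, 2024)):
    """Return the years whose ais_stat_data_{year}.csv file is not present under root.

    Hypothetical helper: it only checks file existence, not file contents.
    """
    base = Path(root)
    return [year for year in years if not (base / f"ais_stat_data_{year}.csv").is_file()]
```

Calling missing_years() from the project root returns an empty list once all four yearly files are present.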
Run the following script to organize and prepare the CSV files:
python setupCSV.py
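This README does not document which columns setupCSV.py keeps or how it cleans them. The sketch below illustrates this kind of step: it renames columns to a standard set and drops records with unusable positions. The raw header names in COLUMN_MAP and the output names (MMSI, TIMESTAMP, LAT, LON, SOG) are assumptions, not the script's actual schema. The range check relies on the AIS convention of reporting an unavailable latitude as 91 and an unavailable longitude as 181.

```python
import csv

# Hypothetical mapping from raw (lower-cased) headers to standardized names;
# the real ais_stat_data files may use different column names.
COLUMN_MAP = {
    "mmsi": "MMSI",
    "timestamp": "TIMESTAMP",
    "lat": "LAT",
    "lon": "LON",
    "sog": "SOG",
}


def clean_rows(reader):
    """Yield rows with standardized keys, skipping records without a valid position."""
    for raw in reader:
        # Normalize header names; ignore overflow fields (key None) from ragged rows.
        row = {
            COLUMN_MAP.get(key.strip().lower(), key.strip()): (value or "").strip()
            for key, value in raw.items()
            if key is not None
        }
        try:
            lat, lon = float(row["LAT"]), float(row["LON"])
        except (KeyError, ValueError):
            continue  # missing or non-numeric coordinates
        # AIS reports "not available" as lat 91 / lon 181, which this range check rejects.
        if not row.get("MMSI") or not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue
        row["LAT"], row["LON"] = lat, lon
        yield row


def clean_csv(src_path, dst_path):
    """Write the cleaned rows of src_path to dst_path with a single header line."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        writer = None
        for row in clean_rows(csv.DictReader(src)):
            if writer is None:  # take the header from the first valid row
                writer = csv.DictWriter(dst, fieldnames=list(row))
                writer.writeheader()
            writer.writerow(row)
```

For example, clean_csv("dataset/AIS_Dataset/ais_stat_data_2020.csv", "dataset/AIS_Dataset_csv/ais_2020_clean.csv") would write a cleaned copy. The output file name here is illustrative only.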
Run the following script to organize and prepare the vessel track CSV files. Select the year to process with the --year argument:
python setupTracks.py --year 2020
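Track extraction generally means grouping position reports by vessel and ordering them in time. The sketch below shows that idea under stated assumptions: records are (mmsi, ISO timestamp, lat, lon) tuples, the year filter plays a role analogous to --year, and tracks shorter than 10 points are dropped, as described for setupSplitUnder10.py. The function name build_tracks is hypothetical, and the actual scripts may normalize or split tracks differently.

```python
from collections import defaultdict
from datetime import datetime


def build_tracks(records, year=None, min_points=10):
    """Group (mmsi, iso_timestamp, lat, lon) records into time-ordered tracks per vessel.

    Tracks with fewer than min_points positions are dropped, mirroring the
    10-point threshold described for setupSplitUnder10.py.
    """
    grouped = defaultdict(list)
    for mmsi, ts, lat, lon in records:
        when = datetime.fromisoformat(ts)
        if year is not None and when.year != year:
            continue  # keep only the requested year (analogous to --year)
        grouped[mmsi].append((when, lat, lon))

    tracks = {}
    for mmsi, points in grouped.items():
        points.sort(key=lambda p: p[0])  # chronological order
        if len(points) >= min_points:
            tracks[mmsi] = points
    return tracks
```

Sorting after grouping means input records may arrive in any order, which is common when AIS messages are merged from several receivers.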
Run a preliminary analysis of the dataset with:
python analyzer_1.py
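The kind of descriptive statistics mentioned for analyzer_1.py (counts and average speed) can be sketched with the standard library alone. This version assumes each row is a dict with hypothetical keys MMSI, TYPE, and SOG (speed over ground, in knots). It counts distinct vessels per type and averages speed per type, skipping the AIS "not available" speed value of 102.3 knots. It is an illustration, not the script's actual logic.

```python
from collections import Counter
from statistics import mean

SOG_NOT_AVAILABLE = 102.3  # AIS encodes an unavailable speed over ground as 102.3 knots


def describe(rows):
    """Return (distinct vessels per type, mean speed over ground per type)."""
    vessels = {(r["MMSI"], r["TYPE"]) for r in rows}
    counts = Counter(vtype for _, vtype in vessels)
    speeds = {}
    for vtype in counts:
        sogs = [
            r["SOG"]
            for r in rows
            if r["TYPE"] == vtype and r["SOG"] is not None and r["SOG"] < SOG_NOT_AVAILABLE
        ]
        speeds[vtype] = mean(sogs) if sogs else None  # None when no valid speed exists
    return counts, speeds
```

Counting distinct (MMSI, type) pairs instead of raw rows keeps vessels that transmit frequently from dominating the counts.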
The project workflows are organized into several phases and types of analysis:
- setupCSV.py: Imports raw AIS files and generates cleaned CSVs with standardized columns.
- setupTracks.py: Extracts and normalizes track trajectories from AIS records for plotting.
- setupSplitUnder10.py: Filters out tracks with fewer than 10 points.
- setupClassification.py: Builds a structured dataset ready for classification models.
- analyzer_1.py: Calculates descriptive statistics (e.g., average speed, counts) and generates basic plots (histograms, scatter plots).
- analyzer_2.0.1.py: Creates interactive maps and heatmaps from geographic data, using contextily, folium, and geopandas to overlay data on real-world maps.
- clustering_1.0.1.py, clustering_1.0.2.py, clustering_1.0.3.py: Applies clustering algorithms (K-Means, DBSCAN, HDBSCAN) to group similar trajectories.
- classification_1.0.1.py, classification_1.0.2.py, classification_1.0.3.py: Implements machine learning models (Random Forest, SVM, XGBoost) to classify vessel types or routes.
- prediction_9.0.1.py: Uses regression models (Linear Regression, LSTM on time sequences) to predict future position and speed.
- analyzer_bearing.py: Calculates bearing angles and trajectory-based features to enrich datasets.
- counter_1.py: Aggregates and analyzes parameters (e.g., vessel count by time period, vessel type distribution, temporal statistics).
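This README does not state which bearing formula analyzer_bearing.py uses. A common choice is the initial great-circle bearing. As an assumption, the sketch below pairs it with the haversine distance to derive simple per-segment features from a track of (lat, lon) points. The function names and the 6371 km mean Earth radius are choices made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def initial_bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two positions, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def track_features(points):
    """For consecutive (lat, lon) pairs, return (bearing_deg, distance_km) per segment."""
    return [
        (initial_bearing(*p, *q), haversine_km(*p, *q))
        for p, q in zip(points, points[1:])
    ]
```

Dividing each segment distance by the time between the two fixes would give a derived speed that can be compared with the reported SOG, which is one way such features could enrich the datasets.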
This project is licensed under the MIT License.
See the LICENSE file for full details.
- Micol Zazzarini
- Andrea Fiorani
- Antonio Antonini
Developed at Università Politecnica delle Marche, Department of Information Engineering.