This project is based on the Open University Learning Analytics Dataset (OULAD), which is openly available for educational use.
- I downloaded the OULAD dataset from the official website.
- The dataset contains approximately 28,000 student records across seven tables.
- Out of the seven tables, I selected and worked with the following four:
studentinfo
assessment
courses
studentRegistration
- During review, I found ~4,000 duplicate records in the dataset.
- Actions taken:
- Rearranged columns in Excel to match the SQL table schema.
- Removed duplicate records using Excel’s Remove Duplicates feature.
- Verified data types (e.g., date format, integer columns).
- Saved cleaned files in CSV UTF-8 (Comma delimited) format to ensure compatibility during SQL import.
- Created a new database on my local SQL Server named
OULAD
. - Manually created SQL tables using
CREATE TABLE
statements with the same names as in the dataset.
- Imported the cleaned
.csv
files into the respective tables using BULK INSERT commands. - Also used proper
CODEPAGE = '65001'
to handle UTF-8 encoded files.
- Generated the ER Diagram using dbdiagram.io.
- I wrote the schema in DBML language with support and guidance from ChatGPT.
- Established primary and foreign key relationships based on the structure of the OULAD dataset.
- Executed multiple professional-level SQL queries on the database to analyze:
- Course and assessment participation
- Student final results
- Dropout patterns
- Exported key queries and their outputs into a PDF, which is attached to the GitHub project repository.
-
CSV Import Errors:
- Encountered character encoding issues while importing files.
- Resolved by converting files to CSV UTF-8 (CSV-8) format and setting
CODEPAGE = '65001'
in import commands.
-
Date Format Issues:
- Excel saved dates inconsistently, causing type mismatch in SQL.
- Fixed by formatting all date columns before saving.
-
Duplicate Data:
- Detected ~4,000 duplicate rows across tables.
- Cleaned via Excel using filters and conditional formatting before import.
-
Composite Keys in ER Diagram:
- dbdiagram.io required learning DBML syntax for defining composite primary keys and references.
- Learned syntax like:
primary key (id_student, code_module, code_presentation)
- Clean database with four normalized tables.
- ER diagram created and exported using dbdiagram.io.
- SQL query results exported to PDF.
- All assets uploaded to GitHub as part of this portfolio project.
Thank you for visiting this project repository!