🚀 Predictive Text Analysis for Science Exams 🚀

Welcome to our Predictive Text Analysis project! This repository contains code for predicting answers to multiple-choice science exam questions using TF-IDF text features and a Random Forest Classifier.

📚 Dataset Used

We used a dataset of science exam questions (prompt), each with five answer choices (A, B, C, D, E). The dataset was curated to cover diverse and meaningful questions for analysis.

🔍 Features

  • Prompt Analysis: We performed in-depth analysis on question prompts, exploring word frequencies, lengths, and semantic patterns.
  • Text Vectorization: Utilized TF-IDF vectorization to convert textual data into numerical features for machine learning model training.
  • Machine Learning Model: Implemented a Random Forest Classifier for answer prediction, reaching over 90% accuracy on the test set (see Results).
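As a rough illustration of the prompt analysis step, the sketch below computes prompt lengths and word frequencies with the standard library only. The sample prompts and the `tokenize` helper are hypothetical stand-ins, not taken from the repository's notebooks.

```python
import re
from collections import Counter

# Hypothetical sample prompts; the real dataset is not reproduced here.
prompts = [
    "What is the powerhouse of the cell?",
    "Which planet is closest to the Sun?",
    "What is the chemical symbol for water?",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


# Number of word tokens in each prompt.
prompt_lengths = [len(tokenize(p)) for p in prompts]

# Word frequencies across all prompts.
word_counts = Counter(token for p in prompts for token in tokenize(p))

print(prompt_lengths)
print(word_counts.most_common(3))
```

On a real dataset the same counts would typically be computed on the prompt column of a Pandas DataFrame, and stop words such as "the" and "is" would likely be filtered out before plotting.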

🧠 Model Architecture

Our machine learning model comprises a Random Forest Classifier, a robust algorithm for multi-class classification tasks. We used TF-IDF vectorized features as input, enabling the model to learn complex patterns in the textual data.
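A minimal sketch of this architecture with Scikit-Learn might look like the following. The toy prompts, the answer labels, and the hyperparameters (`n_estimators=100`, `random_state=0`) are assumptions for illustration; the repository's actual training script may differ.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: question prompts and the letter of the correct choice.
prompts = [
    "What is the powerhouse of the cell?",
    "Which planet is closest to the Sun?",
    "What is the chemical symbol for water?",
    "Which gas do plants absorb during photosynthesis?",
    "What force keeps planets in orbit around the Sun?",
]
answers = ["A", "B", "C", "D", "E"]

# TF-IDF turns each prompt into a sparse numeric vector;
# the Random Forest then learns a multi-class mapping to A-E.
model = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(prompts, answers)

# Predict the answer letter for an unseen prompt.
predictions = model.predict(["Which planet is farthest from the Sun?"])
print(predictions)
```

Wrapping both steps in one pipeline keeps the vectorizer's vocabulary tied to the training data, so the same transformation is applied at prediction time.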

🌟 Visualization Features

  • Interactive Visualizations: Explore interactive charts and visualizations, including bar charts representing class distributions and dynamic word clouds showcasing frequently occurring words in questions.
  • 3D Scatter Plots: Dive into 3D scatter plots to uncover correlations between question difficulty, length, and correct answer frequencies.
  • Confusion Matrix: Visualize the model's performance through an intuitive confusion matrix, providing insights into prediction accuracy.
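To show what the confusion matrix contains, here is a small sketch using Scikit-Learn's metrics. The true and predicted labels are made up for illustration and do not reflect the project's actual results.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

labels = ["A", "B", "C", "D", "E"]

# Hypothetical true and predicted answer letters.
y_true = ["A", "B", "C", "D", "E", "A", "B"]
y_pred = ["A", "B", "C", "D", "A", "A", "C"]

# Rows are true classes, columns are predicted classes, both ordered as `labels`.
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Fraction of predictions that match the true label.
accuracy = accuracy_score(y_true, y_pred)

print(cm)
print(f"accuracy: {accuracy:.3f}")
```

Diagonal entries count correct predictions and off-diagonal entries show which answer letters get confused with each other. The resulting array can be passed to a heatmap function such as `seaborn.heatmap` for plotting.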

🚀 Usage

  1. Data Preprocessing: Explore Jupyter Notebooks for in-depth data preprocessing and exploratory data analysis.
  2. Model Training: Utilize the provided Python scripts to train the Random Forest Classifier and obtain predictions.
  3. Interactive Visualizations: Run interactive Python scripts for dynamic visualizations of the dataset and model performance.

🛠️ Dependencies

  • Python 3.7+
  • Pandas
  • NumPy
  • Scikit-Learn
  • Matplotlib
  • Seaborn
  • Plotly
  • WordCloud

📊 Results

Our trained model achieved an accuracy of over 90% on the test dataset, demonstrating its effectiveness in predicting correct answers to science exam questions.

🌐 Connect with Me

Let's connect and collaborate! I'm always open to discussions, collaborations, and learning new things together. Don't hesitate to drop me a message or explore my other projects on GitHub.

Feel free to dive into the code, experiment with the features, and explore the nuances of predicting answers to science exam questions! 🕵️‍♂️💬

Happy coding! 🚀