Skip to content

A tool used to generate questions from lecture transcripts using template extraction from sample questions asked by real students. Performed using SRL.

Notifications You must be signed in to change notification settings

LiamTalberg/StuQuestions-Question-Generator

Repository files navigation

Installation and Setup:

- Ensure you are using Python version 3.9

- On Windows and Linux to install the required libraries perform the following steps:
    1) Run the command: "pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html"
    -  NOTE: It is VITAL that the "-f https://download.pytorch.org/whl/torch_stable.html" is included in the command.
    2) After the packages have been installed run the command: "python3 -m spacy download en_core_web_sm"
    - This will download the english package for Spacy.

- On Mac to install the required libraries perform the following steps:
    1) Alter the requirements.txt file by removing the "+cpu" from torch and torchvision.
    2) Run the command: "pip install -r requirements.txt"
    3) After the libraries have been installed run the command: "python3 -m spacy download en_core_web_sm"
    - This will download the english package for Spacy.

- Fixes to some common erors:
    - Pip can not find "torch version 1.7.1" package:
        - Make sure that Python version 3.9 is being used.
        - This can be done using "python3.9 -m pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html"
        - Or using the actual path to the the python 3.9 installation: "/usr/bin/python3.9 -m pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html"

    - Error when building wheel for jsonnet:
        - Install the python dev tools for python3.9 with the command: "sudo apt-get install python3.9-dev"


Descriptions of Included Folders/Files:

- "Transcripts" : A folder that contains 12 weeks of lecture transcripts, each in a different folder corresponding the week.
- "Sample Questions" : A folder that contains the 12 weeks of questions related to transcripts. Along with a merged and cleaned version of the all the questions in seperate .txt files.
- "Evaluation Data": A folder than contains the excel files with the results of the survey and internal evaluation.
- "semanticQG.py" : A Python script that contains the Semantic QG system.
- "templateQG.py" : A Python script that contains the Template QG system.
- "requirements.txt" : A text file containing the names and versions of the required libraries.
- "demoContent.txt" : A text file containing an example of a lecture transcript that can be used for testing/demo.
- "structured-prediction-srl-bert.2020.12.15.tar.gz" : The model from AllenNLP that performs the SRL.
- "allQuestions.txt" : A file containing 594 questions that have been cleaned and extracted from the dataset.
- "validQuestions.txt" A file containing the 81 questions from which valid templates were extracted.
- NOTE: The "validQuestions.txt" file can be used to quickly run the program for testing/demo.
- NOTE: However, due to the clustering the output will be different to that evalauted in the report.


Required Format of the Input Files:

- NOTE: All input files should be placed at the same level as the Python scripts.

- Content File:
    - This is the file that contains the text from which questions will be generated.
    - This can be any passage of text.
    - The text can be of any resonable length (longer text = longer to generate questions)
    - It can contain multiple paragraphs.
    - The format can be seen in "demoContent.txt"

- Sample Questions File:
    - The is the file containing the sample questions from which templates are extracted.
    - The file should contain each question on a new line.
    - The format can be seen in "allQuestions.txt" and "validQuestions.txt"
    

How to Run the Semantic QG System:

- NOTE: If the "structured-prediction-srl-bert.2020.12.15.tar.gz" model is not already downloaded (i.e.: not at same level as the "semanticQG.py" script):
    - Comment out line 166.
    - Uncomment line 167.
    - This will download the model when the program is executed.
    - However, this will require the model to re-downloaded every time the program is run.
    - Thus, the model can be manually be downloaded from: "https://storage.googleapis.com/allennlp-public-models/structured-prediction-srl-bert.2020.12.15.tar.gz"
    - After downlading, place the model at the same level as the semanticQG.py script.

- The code for the semantic QG system is found within "semanticQG.py"
- The program can be executed by running "python3 semanticQG.py"
- Once executed the program will begin by loading in the SRL model (this can take a few seconds).
- NOTE: If running on Windows, an error will be displayed in the console. This can be ignored.
- On the first time running the code it may automatically install an NLTK Wordnet package.
- Once the model is loaded you will be prompted to enter the name of a text file from which you want questions to be generated.
- The file should be placed at the same level as the "semanticQG.py" script.
- You should enter the file name EXCLUDING the ".txt" extension (i.e.: demoContent NOT demoContent.txt).
- If the file CANNOT be found, an error message be presented.
- If the file CAN be found, the system will begin to generate questions.
- Upon completion, some statistics regarding the amount generated questions will be displayed in the terminal.
- The generated questions will be output to a file with the same name as the input file with the suffix "SemanticQuestions"
- i.e.: If the input file was "demoContent.txt" the output file will be "demoContentSemanticQuestions.txt"
- This output file will contain the sentence followed by the question/s generated from it by the system.
- To terminate the program you can simply type "stop"


How to Run the Template QG System:

- NOTE: If the "structured-prediction-srl-bert.2020.12.15.tar.gz" model is not already downloaded (i.e.: not at same level as the "semanticQG.py" script):
    - Comment out line 166.
    - Uncomment line 167.
    - This will download the model when the program is executed.
    - However, this will require the model to re-downloaded every time the program is run.
    - Thus, the model can manually be downloaded from: "https://storage.googleapis.com/allennlp-public-models/structured-prediction-srl-bert.2020.12.15.tar.gz"
    - After downlading, place the model at the same level as the semanticQG.py script.

- The code for the Template QG system is found within "templateQG.py"
- The program can be executed by running "python3 templateQG.py"
- Once executed the program will begin by loading in the SRL model (this can take a few seconds)
- On the first time running the code it may automatically install an NLTK Wordnet package.
- NOTE: If running on Windows, an error will be displayed in the console. This can be ignored.
- Once the model is loaded the program will request the name of the file containing the sample questions.
- These sample questions are the questions used to extract templates (The format of the file is explained above).
- You should enter the name of the file containing the questions EXCLUDING the ".txt" extension (i.e.: allQuestions NOT allQuestions.txt)
- The file should be placed at the same level as the "templateQG.py" script.
- If the file CANNOT be found, an error message be presented.
- If the file CAN be found, the system will beging to cluster and extract templates from the questions.
- This can take some time depending on the amount of questions.
- Once completed, the system will display the number of templates extracted.
- Two files will also be created: one with the suffix "Clustered" and another with the suffix "TemplatesClustered"
- e.g.: If the input file was called "allQuestions.txt": 
    - "allQuestionsClustered.txt" : Would contain all the sample questions listed in their respective clusters.
    - "allQuestionsTemplatesClustered.txt" : Would contain the extracted templates in their respective clusters.
- NOTE: The templates in this file will only be the valid templates that have met the extraction criteria.
- After template extaction you will be prompted to enter the name of a text file from which you want questions to be generated.
- You should enter the file name EXCLUDING the ".txt" extension (i.e.: demoContent NOT demoContent.txt).
- The file should be placed at the same level as the "templateQG.py" script.
- If the file CANNOT be found, an error message be presented.
- If the file CAN be found, the system will begin to generate questions.
- Upon completion, some statistics regarding the generated questions will be displayed in the terminal.
- The generated questions will be output to a file with the same name as the input file with the suffix "TemplateQuestions"
- i.e.: If the input file was "demoContent.txt" the output file will be "demoContentTemplateQuestions.txt"
- This output file will contain the sentence followed by the question/s generated from it by the system, as well as the template used.
- To terminate the program you can simply type the "stop"











About

A tool used to generate questions from lecture transcripts using template extraction from sample questions asked by real students. Performed using SRL.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages