Challenges in the Automatic Analysis of Students' Diagnostic Reasoning

This repository contains the code for our evaluation metrics for multi-label sequence-labelling tasks such as epistemic activity identification, as well as the code for training single- and multi-output Bi-LSTMs. The new corpora can be obtained on request, making it possible to replicate all experiments in our paper.

Citation

If you find the implementation useful, please cite the following two papers:

@inproceedings{Schulz:2019:AAAI,
	title = {Challenges in the Automatic Analysis of Students' Diagnostic Reasoning},
	author = {Schulz, Claudia and Meyer, Christian M. and Gurevych, Iryna},
	publisher = {AAAI Press},
	booktitle = {Proceedings of the 33rd AAAI Conference on Artificial Intelligence},
	year = {2019},
	note = {(to appear)},
	address = {Honolulu, HI, USA}
}

@misc{SchulzEtAl2018_arxiv,
	author = {Schulz, Claudia and Meyer, Christian M. and Sailer, Michael and Kiesewetter, Jan and Bauer, Elisabeth and Fischer, Frank and Fischer, Martin R. and Gurevych, Iryna},
	title = {Challenges in the Automatic Analysis of Students' Diagnostic Reasoning},
	year = {2018},
	howpublished = {arXiv:1811.10550},
	url = {https://arxiv.org/abs/1811.10550}
}

Abstract: We create the first corpora of students' diagnostic reasoning self-explanations from two domains annotated with the epistemic activities hypothesis generation, evidence generation, evidence evaluation, and drawing conclusions. We propose a separate performance metric for each challenge we identified for the automatic identification of epistemic activities, thus providing an evaluation framework for future research:

  1. the correct identification of epistemic activity spans,
  2. the reliable distinction of similar epistemic activities, and
  3. the detection of overlapping epistemic activities.

Contact person: Claudia Schulz, clauschulz1812@gmail.com

Alternative contact person: Jonas Pfeiffer, pfeiffer@ukp.informatik.tu-darmstadt.de

https://www.ukp.tu-darmstadt.de/

http://famulus-project.de

Please send us an e-mail if you want to get access to the corpora. Don't hesitate to contact us to report issues or if you have further questions.

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

Experimental setup

All code is run using Python 3. In all scripts, we specify where the user has to adapt the code (mostly file paths) with 'USER ACTION NEEDED'.

Neural Network Experiments

The folder "neuralNetwork_experiments" contains the code required to train the neural networks. Our Bi-LSTM architectures are based on the implementation of Nils Reimers (NR): https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf

  • neuralnets -- contains BiLSTM2.py for the single-output architecture and BiLSTM2_multipleOutput.py for the multi-output architecture
  • util -- various scripts for processing data and other utilities by NR
  • data -- on request we provide train.txt, dev.txt, test.txt for all experimental setups

Setup with virtual environment (Python 3)

Set up a Python virtual environment (optional):

virtualenv --system-site-packages -p python3 env
source env/bin/activate

Install the requirements:

env/bin/pip3 install -r requirements.txt

Get the word embeddings

  • Download the German (text) fastText embeddings from GitHub and place them in the neuralNetwork_experiments folder
  • Run embeddingsFirstLine.py to remove the first line (header)
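
For reference, the following is a minimal sketch of what this header-stripping step does; use embeddingsFirstLine.py for the actual experiments, and note that the file names below are placeholders.

# Minimal sketch: drop the "<vocab size> <dimension>" header line of a fastText .vec file.
# File names are placeholders; embeddingsFirstLine.py is the script actually used.
in_path = "wiki.de.vec"
out_path = "wiki.de.noheader.vec"

with open(in_path, encoding="utf-8") as src, open(out_path, "w", encoding="utf-8") as dst:
    next(src)            # skip the header line
    for line in src:
        dst.write(line)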

Run the Experiments

  • to train models for prefBaseline, concat, or separate, use train_singleOutput.py
  • to train models for multiOutput, use train_multiOutput.py
  • to use a trained model for prediction, run runModel_singleOutput.py or runModel_multiOutput.py. NOTE: the loading of multiOutput models assumes a static layout; this needs to be changed if the model parameters are changed
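
Since the evaluation expects predictions from 10 trained models per method, the repeated training runs can be scripted. The following is a hypothetical convenience wrapper (not part of the repository) that simply calls train_singleOutput.py several times; adjust the script name and working directory to your checkout.

# Hypothetical wrapper: repeat training to obtain several models per method.
import subprocess

N_RUNS = 10  # the evaluation uses predictions from 10 trained models per method

for run in range(1, N_RUNS + 1):
    print("Training single-output model, run {}/{}".format(run, N_RUNS))
    subprocess.run(["python3", "train_singleOutput.py"], check=True)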

Evaluation Metrics

The folder "evaluation" contains the code required to use our evaluation framework. evaluate.py implements our different evaluation metrics.

  • use the runModel scripts to create predictions for all (test) files
  • evaluate.py assumes the following folder structure of prediction results:
    • MeD / TeD for the two domains
      • pref, concat, separate, multiOutput - one folder per method
        • MeD_pref1, MeD_pref2, ... - 10 folders with prediction files, one for each of the 10 models trained for this method
        • note that "separate" has 4 subfolders (separate_dc, separate_hg, separate_ee, separate_eg) for the 4 epistemic activities, each with 10 subfolders for the results of the 10 models
      • goldData - gold annotations for the prediction files
      • human - a different set of files used to evaluate the human upper bound (all files annotated by all annotators)
        • MeD_human1, ... - annotations of each annotator
        • goldData - gold labels for the files used to evaluate human performance
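
For orientation, the expected layout for the MeD domain might look as follows (TeD mirrors it; folder names not mentioned above are placeholders):

MeD/
    pref/
        MeD_pref1/ ... MeD_pref10/      predictions of the 10 trained pref models
    concat/
    multiOutput/
    separate/
        separate_dc/ separate_ee/ separate_eg/ separate_hg/
            (each with 10 subfolders for the 10 trained models)
    goldData/                           gold annotations for the prediction files
    human/
        MeD_human1/ MeD_human2/ ...     annotations of each annotator
        goldData/                       gold labels for the human-evaluation files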
