This is a sample implementation of our automated verbal reports analysis framework. The framework allows you to conduct your jsPsych study whilst recording the participants. The analyses can then be replicated by executing our data pipeline. We also provide a more detailed guide here.
- Automated Analysis of Verbal Reports
You can choose between a setup for development and a setup for production. We recommend testing and developing your study locally on your own computer. Once everything works, you can run the production setup on the server.
You can obtain the code in one of the following ways:
First, check whether Git is already installed. Open your terminal and type:
git --version
If Git is not installed, install it first. Then navigate to a folder of your choice and run:
git clone https://github.com/tehillamo/AutoV-LLM.git
Alternatively, navigate in your browser to our GitHub repository https://github.com/tehillamo/AutoV-LLM, click the green `<> Code` button, and click `Download ZIP`. After that, unzip the file into a folder of your choice.
- npm (>= 9.5.1), Node.js (>= 18.16.1), Python (>= 3.10)
- Open your terminal and navigate to our framework folder
- Run `cd AUTOV-LLM/webserver` to step into the webserver folder
- Run `npm install` to install all dependencies
- Run `npm run dev` to start the webserver
- The webserver is then available in your browser under `http://localhost:8000`
Please install the necessary CUDA backend if you plan to use CUDA. The code is based on CUDA 12.6.
1. Open your terminal and navigate to our framework folder
2. Run `cd AUTOV-LLM/data_pipeline` to step into the data pipeline folder
3. We recommend creating a virtual environment with `python -m venv venv`. If you do not want to do that, skip to step 5
4. Start the virtual environment with `source venv/bin/activate`
5. Run `pip install -r requirements.txt` to install the Python dependencies
6. After that, run `pip install whisperx==3.4.2` and then `pip install numpy==1.26.4`, in this order, to avoid version conflicts
7. Install ffmpeg (required version >= 4.1 and <= 4.4)
   - macOS: `brew install ffmpeg@4` and `brew link ffmpeg@4`
8. Start the corresponding script using `python -u scripts.py`
- Install Docker and Docker Compose
- Open your terminal and navigate to our framework folder
- Run `cd AUTOV-LLM/webserver` to step into the webserver folder
- Run `docker compose -f docker-compose-prod.yaml up --build -d` to start the Docker container
- The webserver is then available in your browser under `http://localhost:8080`
- To stop the container, use `docker compose stop`
- Open your terminal and navigate to our framework folder
- Run `cd AUTOV-LLM/data_pipeline` to step into the data pipeline folder
- Run `docker compose up` to start the Docker container and run the Whisper model on the CPU
  - Use `docker compose -f docker-compose-gpu.yaml up` for a speedup if you have a CUDA-compatible GPU
- To stop the container, use `docker compose stop`
We use the jsPsych framework to run our study. You can specify the different trials in `/webserver/public/src/index.js`. To make use of the recordings, you need to specify the first and last trial as well as some callback functions. To track the trials, jsPsych must be initialized with trigger functions: `onTrialStartRecording(trial);` starts the recording of a trial, while `onTrialFinishRecording(trial);` ends it and sends the recording to the server. When the study has finished, we send the data to the server using `sendData();`.
var jsPsych = initJsPsych({
  on_trial_start: function(trial) {
    onTrialStartRecording(trial);
  },
  on_trial_finish: function(trial) {
    onTrialFinishRecording(trial);
  },
  on_finish: function() {
    sendData(jsPsych.data.allData.trials);
  }
});
Then we specify our trials. The first and last trial should both be of type `jsPsychSpeechRecording`. The first trial indicates the start (`start: true`), and the last trial specifies the end (`start: false`). The actual study trials can be declared between the two recording trials; just wrap the `jsPsychSpeechRecording` trials around your study trials. If you need other plugins, import them in `/webserver/public/index.html`.
const trials = [
  {
    type: jsPsychSpeechRecording,
    start: true
  },
  // add your study trials here
  {
    type: jsPsychSpeechRecording,
    start: false
  }
];
After that, we can run the study.
jsPsych.run(trials);
The implementation of the `jsPsychSpeechRecording` plugin can be found in `/webserver/public/jspsych/dist/plugin-recording.js`. Note that you must include this script in the HTML file. Our memory task study can be found in `/webserver/public/index.html`, and a template for integrating your own study is provided in `/webserver/public/index_template.html`.
All recordings and the trial data are saved by default to the `ressources` folder. Each participant has a unique, random, and anonymous ID, and for each participant a new folder is created inside `ressources`. In this folder you can find the recordings and the behavioral data from the study.
We offer various scripts to automatically assess verbal report recordings. In the following sections, we describe what each script does and how to use it. Our pipeline is built in a modular fashion so that additional scripts can be integrated easily; the machine learning models used can also be swapped out. If you add scripts, feel free to open a pull request so that we can include them in our repository.
This script transcribes the recordings into text. We use OpenAI's Whisper model [1] for this, but we also implemented other options (see config options). Since automated speech recognition (ASR) models differ in their initialization and execution, custom initialization and execution functions are required for each model. The code contains examples for Whisper, WhisperX, and NVIDIA Parakeet, which should give a good overview of how to incorporate new ASR models.
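To illustrate what such an execution function boils down to, here is a minimal sketch using the openai-whisper package; the model size and file name are placeholders, not the pipeline's actual defaults.

```python
# Minimal transcription sketch with the openai-whisper package.
# "base" and "recording.wav" are illustrative placeholders, not the
# defaults used by our data pipeline.
import whisper

model = whisper.load_model("base")           # load a pretrained Whisper checkpoint
result = model.transcribe("recording.wav")   # run ASR on a single recording
print(result["text"])                        # the transcribed text
```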
We use Sentence-BERT [2] to obtain embeddings for each transcribed verbal report.
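For reference, such embeddings can be computed with the sentence-transformers library roughly as sketched below; the model name is only an example, since the actual model can be configured via `bert_finetuned_model` (see the config table).

```python
# Sketch of computing sentence embeddings with sentence-transformers.
# "all-MiniLM-L6-v2" is an example model; the pipeline lets you configure
# the model via bert_finetuned_model in the config file.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
reports = [
    "I grouped the words into categories.",
    "I repeated the list in my head.",
]
embeddings = model.encode(reports)   # one vector per transcribed report
print(embeddings.shape)              # (number_of_reports, embedding_dimension)
```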
We implemented three techniques to reduce the dimensionality (see the sketch after this list):
- PCA
- t-SNE
- combination of PCA and t-SNE
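The sketch below shows what the three options amount to with scikit-learn; the target dimension and the intermediate 50 PCA components for the combined variant mirror the output description further down and are read from the config in the actual pipeline.

```python
# Sketch of the three reduction options using scikit-learn.
# The target dimension (2) and the intermediate 50 PCA components for the
# combined variant are illustrative; the pipeline reads them from the config.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

embeddings = np.random.rand(100, 384)   # stand-in for Sentence-BERT embeddings

reduced_pca = PCA(n_components=2).fit_transform(embeddings)
reduced_tsne = TSNE(n_components=2).fit_transform(embeddings)
# Combined variant: PCA down to 50 dimensions first, then t-SNE.
reduced_both = TSNE(n_components=2).fit_transform(
    PCA(n_components=50).fit_transform(embeddings)
)
```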
We use a zero-shot classification algorithm to find the most probable text label for a given text.
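As an illustration, zero-shot labelling of this kind can be done with the Hugging Face transformers pipeline as sketched below; the NLI model named here is a common choice and only an example, since the model used by our pipeline can be overridden via `zero_shot_text_finetuned_model`.

```python
# Sketch of zero-shot text labelling with the transformers pipeline.
# The model name is an example; the pipeline's model can be overridden via
# zero_shot_text_finetuned_model in the config.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
text = "I tried to remember the words by building a story around them."
labels = ["rehearsal", "visual imagery", "storytelling"]   # your text_classes
result = classifier(text, candidate_labels=labels)
print(result["labels"][0])   # most probable label for the text
```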
In the last step, we merge the data obtained from the study with the previously computed data (transcribed text, embeddings, lower-dimensional embeddings).
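Conceptually, this merge is a join of the behavioral trial data with the computed text features. The hedged sketch below assumes the join is keyed on `uuid` and `trial_number` (the identifiers in the output table); the actual script may differ, and the file names are placeholders.

```python
# Hedged sketch of the merge step: join behavioral data with computed
# text features. Keying on uuid and trial_number is an assumption based
# on the output columns; file names are placeholders.
import pandas as pd

behavior = pd.read_csv("behavioral_data.csv")         # from the jsPsych study
features = pd.read_csv("computed_text_features.csv")  # transcripts, embeddings, ...

merged = behavior.merge(features, on=["uuid", "trial_number"], how="left")
merged.to_csv("output.csv", index=False)
```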
In the config file you can specify all parameters. Note that if you change `input_path`, `output_path`, or `cache_path`, you must also change them in the docker-compose file. An illustrative example of how the parameters fit together follows the table.
Parameter | Explanation |
---|---|
input_path | Path to the CSV file |
output_path | Path where the output file should be saved to |
output_name | Name of the resulting CSV file in the output path |
transcription_model | Which ASR model to use. Possible options are Whisper, WhisperX and nvidia/parakeet-tdt-0.6b-v2. For Whisper and WhisperX use whisper-<model> and whisperx-<model> respectively. For the Nvidia model just use the model name from huggingface (nvidia/parakeet-tdt-0.6b-v2). |
behavioral_columns | A list of the behavioral columns from jsPsych which should be merged into the output file |
reduction_algorithm | Algorithm for dimensionality reduction, possible values: ("PCA", "TSNE", "both") |
dimension | Dimension to which the embedding dimension should be reduced |
transcribe_text | Perform speech to text (must provide input_path) |
post_asr_correction | Perform post-ASR correction (OpenAI API key required because it uses GPT-4) |
word_cloud | Create word cloud |
ignore_words_in_word_cloud | List of words to ignore in the word cloud |
text_classification | Perform the text labelling algorithm |
calculate_text_embeddings | Calculate the text embeddings |
dimensionality_reduction | Apply dimensionality reduction to the embeddings |
text_classes | List of text classes for the text classification algorithm |
keywords | Compute keywords |
top_n_keywords | Select top n keywords, ordered by probability (top = highest probability) |
summarize | Compute summary |
max_length_summary | Maximum length of the summary |
min_length_summary | Minimum length of the summary |
zero_shot_text_finetuned_model | Path to fine-tuned text-classification model. Leave null if you do not have a fine-tuned model. |
bert_finetuned_model | Path to fine-tuned embedding model. Leave null if you do not have a fine-tuned model. You can also specify any SentenceBERT model. |
openai_api_key | API key for OpenAI |
use_openai_prompting | Flag to use OpenAI prompting. The prompt will be the transcribed text. Note that you must provide an OpenAI API key! |
use_openai_embeddings | Flag to use OpenAI text embeddings. Note that you must provide an OpenAI API key and that calculate_text_embeddings must also be set to true |
developer_prompt | Developer prompt for the prompting script, i.e. instructions for the model |
openai_model | Specific model for prompting |
openai_embeddings_model | Specific model for embeddings |
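To show how the parameters fit together, here is an illustrative configuration expressed as a Python dict; the keys mirror the table above, while the values and the dict representation are examples only and do not reflect the actual config file's on-disk format or defaults.

```python
# Illustrative configuration values. Keys mirror the parameter table above;
# the values and the representation as a Python dict are examples only and
# do not reflect the actual config file's format or defaults.
example_config = {
    "input_path": "path/to/input.csv",
    "output_path": "output/",
    "output_name": "output.csv",
    "transcription_model": "whisper-base",
    "behavioral_columns": ["trial_type", "rt", "response"],
    "transcribe_text": True,
    "calculate_text_embeddings": True,
    "dimensionality_reduction": True,
    "reduction_algorithm": "both",
    "dimension": 2,
    "text_classification": True,
    "text_classes": ["rehearsal", "visual imagery", "storytelling"],
    "keywords": True,
    "top_n_keywords": 5,
    "summarize": False,
}
```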
You can find the output of the evaluation script in `/output/`. The script produces a CSV file; a short example of loading it follows the table below.
CSV column | Explanation |
---|---|
uuid | UUID of each participant (unique) |
trial_number | Trial number of the study |
transcribed_text | Transcribed text from the recording |
embedding | Embedding obtained from SentenceBERT |
embedding_reduced_pca | Lower dimensional embedding (PCA) |
embedding_reduced_tsne | Lower dimensional embedding (t-SNE) |
embedding_reduced_both | Lower dimensional embedding (first PCA to 50 dimensions, then t-SNE) |
keywords | Keywords of transcribed text |
summary | Summary of transcribed text |
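A quick way to inspect the resulting CSV in Python (the path assumes the default `/output/` location and an example `output_name`; if the embedding columns are stored as stringified lists, they may need additional parsing):

```python
# Load and inspect the pipeline output. The file name comes from output_name
# in the config; "output.csv" is just an example.
import pandas as pd

df = pd.read_csv("output/output.csv")
print(df.columns.tolist())   # uuid, trial_number, transcribed_text, ...
print(df[["uuid", "trial_number", "transcribed_text"]].head())
```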
If you use our framework in your studies or research, feel free to cite our work.
...
[1] Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., & Sutskever, I. (2023). Robust Speech Recognition via Large-Scale Weak Supervision. Proceedings of the 40th International Conference on Machine Learning, PMLR 202:28492-28518. Available from https://proceedings.mlr.press/v202/radford23a.html.
[2] Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv preprint arXiv:1908.10084.