Automated-Health-Responses

Beyond the catch-all classification of "chatbot," this project explores various flavors, including sentence completion, Q/A, goal-oriented dialogue, visual question answering (VQA), negotiation, and machine translation.

Description: A prototype project aiming to provide automated, physician-like responses to medical questions.

Data:

  • QuickUMLS: a package for accessing UMLS, a vast database of medical concepts (QuickUMLS GitHub)
  • Data source: the AskDoc subreddit, downloaded from Google BigQuery (AskDoc on BigQuery) up to 04-2018
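
For concreteness, here is a minimal sketch of pulling UMLS concepts out of a post with QuickUMLS. It assumes an index has already been built with QuickUMLS's install script; the path, threshold, and example post are placeholders.

```python
# A minimal sketch: extract UMLS medical concepts from an AskDoc post with QuickUMLS.
# Assumes a QuickUMLS index has already been installed at `quickumls_fp` (placeholder path).
from quickumls import QuickUMLS

quickumls_fp = "/path/to/quickumls_index"  # placeholder: built via QuickUMLS's install script
matcher = QuickUMLS(quickumls_fp, threshold=0.8)

post = "I've had a sharp pain in my lower back and numbness in my left leg for two weeks."

# Each match carries the matched n-gram, the UMLS concept ID (CUI), and a similarity score.
for candidates in matcher.match(post, best_match=True, ignore_syntax=False):
    for c in candidates:
        print(c["ngram"], c["cui"], round(c["similarity"], 2))
```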

Research:

  • Seq2Seq
  • Papers


Notes About Approaches

  • Dialogue systems (which include chatbots) generally fall into three categories:
    • Back-and-forth dialogue between an algorithm and a human
    • Frame-based, goal-oriented systems (think online help or call routing)
    • Interactive Q/A systems
  • The mechanism for generating the machine response can be generative (the machine comes up with its own response) or responsive (it returns a pre-determined answer based on a classification). Most successful systems seem to combine the two; see the retrieval sketch after this list.
  • Probably anyone with a smartphone has searched online for something relating to their health. Although the first page, the second page, a Wikipedia article, or any one page may not be helpful, the process itself may reveal important insights to the individual: http://matjournals.in/index.php/JoWDWD/article/view/2334
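
To make the "responsive" flavor concrete, here is a minimal retrieval sketch using TF-IDF and cosine similarity. The (query, response) pool is made up, and a learned ranker (like the dual encoder later in this README) would replace the TF-IDF scoring in practice.

```python
# A minimal sketch of a retrieval-based ("responsive") system: rank a fixed pool of
# pre-written responses against the incoming query and return the best match.
# Uses TF-IDF + cosine similarity as a simplified stand-in for a learned model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical (query, response) pairs mined from past threads.
pairs = [
    ("sharp pain lower back after lifting", "Rest, ice, and see a doctor if it persists."),
    ("persistent dry cough for three weeks", "A cough lasting over two weeks warrants a checkup."),
    ("headache and blurry vision", "Sudden vision changes with headache need urgent evaluation."),
]

queries = [q for q, _ in pairs]
vectorizer = TfidfVectorizer().fit(queries)
query_matrix = vectorizer.transform(queries)

def respond(new_query: str) -> str:
    """Return the canned response whose source query is most similar to `new_query`."""
    scores = cosine_similarity(vectorizer.transform([new_query]), query_matrix)[0]
    return pairs[scores.argmax()][1]

print(respond("my lower back hurts badly since I lifted boxes"))
```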

Notes about the dataset

Warning: data from this forum may have more than its share of topics of a sexual nature, as could easily be assumed given the anonymity of Reddit.

Data spans from when the subreddit was started (2014) to early 2018. There are approximately 30k threads and 109k responses.

Data Journal

  • 1st Iteration: Exploring model and modeling choices.
    • Decided on an architecture for data prep for the first conversation model. We frame the problem as bootstrapping responses to conversations in the general sense of someone who has a health-related question and someone who has some knowledge of the subject. Given that there can be multiple responses to potentially the same question, the first pass is: someone asks a question in a Reddit thread, and every post in that thread not written by the author is encoded as a response (see the pairing sketch after this journal). This is a big consideration for what we could reasonably expect from a trained network: we are obviously over-sampling questions, perhaps giving the network an incentive to learn the most generic response to any random question.
    • Found out that the reference code for the TensorFlow Seq2Seq model was deprecated because it uses static unrolling:
      • Static unrolling constructs the computation graph with a fixed number of time steps, so such a graph can only handle sequences of one specific length. One solution for handling sequences of varying lengths is to create multiple graphs with different time lengths and separate the dataset into these buckets.
      • Action: use dynamic unrolling instead. Dynamic unrolling uses control-flow ops to process the sequence step by step. In TensorFlow this is supposed to be more space-efficient and just as fast, and it is now the recommended way to implement RNNs (see the sketch after this journal).
  • 2nd Iteration: Experimenting with a generative model approach.
    • So far, a word-level Seq2Seq trained with teacher forcing one step ahead is not doing well (judged primarily on the reasonableness of responses to the training set), but understandably so. It currently serves as a baseline for deciding which further directions to pursue.
      • There are some issues with the current data approach, since the model tends to generalize to a diplomatically phrased response: "I'm not a doctor but"
        • Q: Husband deteriorating before my eyes, doctors at a loss, no one will help; Reddit docs, I need you.
        • A: I don't think this is a single pain is not a doctor but I have a similar symptoms and the story
    • As suspected, even with seq2seq at the word level we are not getting great results. Although the model has not been trained on the full dataset yet, there is a decided improvement when responses are limited to fewer than 30 words. One option would be to change the pipeline and limit words and sentences. However, I suspect the bigger issue is that many posts in a thread are not direct responses to the initial post. Structuring the data as parent/child post pairs might be the right approach to try first.
  • 3rd Iteration: Response Retrieval
    • Altered the dataset so that each post that received a comment as a direct reply is treated as a (query, response) pair, so occasionally one comment serves as both a query and a response. Test training at the word level without any cleansing of the data led to very poor results, as expected.

    • Successfully implemented a dual encoder with large improvements over the baselines. A sketch of the scoring rule and the Recall@k computation appears after this journal.

      • Training notes. Substantial gains were made by:
        • Adding dropout of 0.5 on the hidden layer
        • Switching from an LSTM to a GRU RNN
        • Halving the number of neurons in the single hidden layer (incidentally, this also dramatically decreased training time due to the smaller matrix computations)

      • Random baseline:
        • Recall @ (1, 10): 0.100675
        • Recall @ (2, 10): 0.2
        • Recall @ (5, 10): 0.399273

      • TF-IDF baseline:
        • Recall @ (1, 10): 0.476141
        • Recall @ (2, 10): 0.570431
        • Recall @ (5, 10): 0.722859

      • Dual encoder with GloVe 6B 300d:
        • Recall @ (1, 10): 0.61527
        • Recall @ (2, 10): 0.77616
        • Recall @ (5, 10): 0.944514

      • Dual encoder with GloVe 840B 300d:
        • Recall @ (1, 10): 0.715415
        • Recall @ (2, 10): 0.87425
        • Recall @ (5, 10): 0.974819

      Of course, since the current implementation of the model is binary (it scores each of the 10 candidate responses independently as correct or not), it really only makes sense to pay attention to Recall@1.

  • 4th Iteration: Improving the relevancy of what Response Retrieval is... well, retrieving.
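
Below is a sketch of the two pairing schemes described in this journal; it is not the repo's actual pipeline. Field names follow the Reddit comment schema available on BigQuery (id, body, author, parent_id), but the helpers themselves are illustrative.

```python
# A sketch (not the repo's actual pipeline) of the two pairing schemes from the journal.
# Field names follow Reddit's comment schema on BigQuery: id, body, author, parent_id.

def thread_level_pairs(post, comments):
    """1st iteration: every comment not written by the original poster
    is treated as a response to the thread's question (over-sampling the question)."""
    question = post["title"] + " " + post.get("selftext", "")
    return [(question, c["body"]) for c in comments if c["author"] != post["author"]]

def parent_child_pairs(comments):
    """3rd iteration: a comment that received a direct reply is treated as the query
    and the reply as the response; one comment can appear as both query and response."""
    by_id = {c["id"]: c for c in comments}
    pairs = []
    for c in comments:
        parent_id = c["parent_id"].split("_")[-1]  # strip the "t1_"/"t3_" prefix
        if parent_id in by_id:
            pairs.append((by_id[parent_id]["body"], c["body"]))
    return pairs
```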
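Next, the static-vs-dynamic unrolling note from the 1st iteration, sketched in the TensorFlow 1.x API that was current at the time; the vocabulary size and dimensions are placeholders.

```python
# A minimal TF 1.x sketch of dynamic unrolling: one graph handles batches of
# varying sequence length via control-flow ops, instead of one graph per bucket.
import tensorflow as tf  # TensorFlow 1.x API (tf.compat.v1 under TF 2.x)

vocab_size, embed_dim, hidden_dim = 20000, 300, 256

tokens = tf.placeholder(tf.int32, [None, None], name="tokens")  # [batch, max_time]
lengths = tf.placeholder(tf.int32, [None], name="lengths")      # true length per sequence

embeddings = tf.get_variable("embeddings", [vocab_size, embed_dim])
inputs = tf.nn.embedding_lookup(embeddings, tokens)

cell = tf.nn.rnn_cell.GRUCell(hidden_dim)
# dynamic_rnn unrolls step by step at run time; `sequence_length` stops computation
# (and zeroes outputs) past each sequence's true end.
outputs, final_state = tf.nn.dynamic_rnn(
    cell, inputs, sequence_length=lengths, dtype=tf.float32)
```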
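Finally, a sketch of the dual encoder's scoring rule, assuming the common bilinear formulation sigmoid(cᵀMr), together with the Recall@(k, 10) metric reported above. The GRU encoder that would produce the context/response vectors is elided, and all numbers are illustrative.

```python
# A sketch of the dual-encoder scoring rule and the Recall@(k, 10) metric.
# The encoder itself (GRU with 0.5 dropout, GloVe-initialized embeddings) is elided;
# random vectors stand in for encoded contexts/responses, so the output is illustrative.
import numpy as np

rng = np.random.default_rng(0)
hidden_dim = 256
M = rng.standard_normal((hidden_dim, hidden_dim)) * 0.01  # learned bilinear map

def score(context_vec, response_vec):
    """Dual-encoder score: sigmoid(c^T M r), the probability the response fits the context."""
    logit = context_vec @ M @ response_vec
    return 1.0 / (1.0 + np.exp(-logit))

def recall_at_k(context_vec, candidate_vecs, true_index, k):
    """1 if the ground-truth response ranks in the top k of the candidate pool."""
    scores = [score(context_vec, r) for r in candidate_vecs]
    top_k = np.argsort(scores)[::-1][:k]
    return int(true_index in top_k)

# Example: one context, 10 candidate responses, ground truth at index 0.
c = rng.standard_normal(hidden_dim)
candidates = rng.standard_normal((10, hidden_dim))
print([recall_at_k(c, candidates, true_index=0, k=k) for k in (1, 2, 5)])
```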

Future Work:

  • Addressing challenges in generating responses to queries. One big issue is determining which posts are queries and which are responses. Using word embeddings and computing a similarity score with the Word Mover's Distance algorithm, we can retrieve phrases that are very similar to a given type of query (see the sketch after this list). Examples retrieved for a seed question that could be classified as inquiring about further information:
    • Seed: "Hey, how's your husband doing now? Hope everything is okay."
    • "So why are you posting on here then, if you had two 'real' doctors giving you advice? What answer are you looking for here?"
    • "How long ago did you change your diet, as in when did you have the kidney stones?"
    • "How old is your partner? Do you know her diagnosis (i.e., why they did her surgery)?"
  • Investigating mental health and emotional subcategories in forum postings. Analysis suggests a high prevalence of mental health topics; emotional subcategories could be investigated using Plutchik's Wheel of Emotion (Emotion Wheel). A hypothetical starting sketch follows the similarity example below.
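
A sketch of the Word Mover's Distance ranking described in the first bullet, using gensim's wmdistance. The pre-trained vectors come from gensim's downloader (any word-embedding model would do), and the sentences are adapted from the examples above; lower distance means more similar.

```python
# A sketch of ranking candidate posts by similarity to a seed query with
# Word Mover's Distance (lower = more similar). Requires gensim plus its
# optimal-transport dependency (POT, or pyemd in older versions).
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-300")  # downloads pre-trained GloVe vectors

seed = "hey how's your husband doing now hope everything is okay".split()
candidates = [
    "how old is your partner do you know her diagnosis".split(),
    "i took ibuprofen and the swelling went down".split(),
]

# wmdistance solves an optimal-transport problem between the two bags of word vectors.
for cand in candidates:
    print(round(vectors.wmdistance(seed, cand), 3), " ".join(cand))
```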
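And a hypothetical starting point for the Plutchik's-wheel idea: count lexicon hits per primary emotion. The tiny lexicon here is a made-up stand-in; a real pass would use something like the NRC Emotion Lexicon, which covers these eight categories at scale.

```python
# A hypothetical starting point for tagging posts with Plutchik's eight primary emotions:
# count lexicon hits per emotion. The sample lexicon entries below are made up.
from collections import Counter

PLUTCHIK = ["joy", "trust", "fear", "surprise", "sadness", "disgust", "anger", "anticipation"]
LEXICON = {  # made-up sample entries; swap in a real emotion lexicon here
    "scared": "fear", "terrified": "fear", "hopeless": "sadness",
    "worried": "fear", "relieved": "joy", "furious": "anger",
}

def emotion_profile(text: str) -> Counter:
    """Count lexicon hits per primary emotion in a post."""
    counts = Counter({e: 0 for e in PLUTCHIK})
    for token in text.lower().split():
        word = token.strip(".,!?'\"")
        if word in LEXICON:
            counts[LEXICON[word]] += 1
    return counts

print(emotion_profile("I'm terrified and hopeless, doctors at a loss."))
```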
