Skip to content

Latest commit

 

History

History
73 lines (48 loc) · 4.71 KB

ROADMAP.md

File metadata and controls

73 lines (48 loc) · 4.71 KB

Tagger Mail Data Science Road Map

Maintainability Test Coverage MIT PEP8

Lambda Labs 24 - M. Bustamante, J. Lindberg, B. Mulas, C. Filkins

The Tagger Data Science team put together 2 API points: one which resides in the cloud on a flask application within Amazon Web Services and the other, which lies internally within a stand-alone desktop Electron application. The Tagger cloud-based API pulls emails from the Google API. Our API then cleans these emails and runs them through an NLP pipeline using a latent Dirichlet allocation to derive a topic set. Those topics are then weighted by frequency and paired with concurrent VADER Sentiment Analysis. All of this is packaged up in JSON for retrieval by the desktop application. The data science API for the desktop application, in turn, receives search requests from the end-user and searches the database of email "smart tags" to find a list of relevant email IDs, which are then output to the desktop application for presentation at the user level.

Current Architecture

Currently, the application tagging engine produces a 49% tagging accuracy for the end-user. Analysis [insert link]

Go here to see a breakdown of current build architecture.

Methods for improved tagging model accuracy:

  • Spacy “Named Entity Recognition” Docs

  • Spacy “Part of Speech” Docs

  • Spacy “Dependency Parsing” Docs

Methods for improved search speed:

  • Elasticsearch Docs

Methods for improved search model relevancy:

  • K Nearest Neighbors Docs, or KMeans Docs on "search phrase" using tagging dictionary (This is the model that Labs24 was building towards. You can see such work in the colab notebooks.) Predictors should be a list of emails based on search phrases.

    • Issues include:
      • Requires managing size of tagging dictionary for KNN
      • Requires frequency of building tagging dictionary for KNN
  • Using an RNN such a Self Organizing Map Pypi to "compete" with the KNN model. Develop signaling within the search to pass priority to the RNN when it's accuracy overtakes the KNN model.

    • Issues include:
      • Requires long training time over large data sets of emails
      • Requires application to cede function from one model to another without user intervention
  • User-generated tagging as addition and augmentation to existing ML tagging API

    • Current BE DB has a table for user-generated tags, called “User Tags” by BE engineers. This is the extent of the implementation at this time.
    • The ultimate goal is to allow users to add (per individual email) their own tags, remove existing tags and apply Active Learning techniques towards more relevant results for the end-user Docs
      • Issues include:
        • Returning the current state of the tags back to the web-based API
        • Effecting the tagging model by the results of the user tag editing This ability for the end-user to edit the tag dictionary by either adding or deleting tags will be useful in user-based labeling for training the RNN
  • Sentiment analysis/warning at the search result UX

Addtional ML Concepts / Views

The following are ways to further use machine learning to improve the Tagger Mail experience.

  • Toggle machine-written (spam) email versus personal email
  • Clustering folders
  • Auto-correct
  • Clustering on relationships
  • Best time of day to send emails/when your colleagues/friends send emails is probably a good time to send them as well
  • Analytics across all user functions
  • Collect contact information for sender during a search
  • Products/media your contacts refer to in messages
  • Trips/vacations your contacts refer to in messages
  • Email deferral tracking
  • Commitment detection
  • Smart reply
  • Contact view sorted by recency

Additional Documentation