Spencer Buja edited this page Jun 22, 2019 · 19 revisions

What is BrowseCloud?

It's a laborious task to collect and synthesize the perspectives of customers. There's an immense amount of customer data from a variety of digital channels: survey data, StackOverflow, Reddit, email, etc. Even for internal tools teams at Microsoft, there are at least 10,000 user feedback documents generated per quarter.

BrowseCloud helps solve this problem by summarizing feedback data with smart word clouds called counting grids. In an ordinary word cloud, the size of each word scales with its frequency, and the words are scattered randomly. In a BrowseCloud word cloud, the position of each word also matters: as the user scans across the visualization, themes transition smoothly into one another.

Web App

Go to https://aka.ms/browsecloud-demo to give our web app a try! You can also download the app and run it locally from the command line, or you can set up the infrastructure needed for the full experience.

Explore customer feedback using a visualization on the web.

Explore Workflow

Uploading New Data Set

Uploading a new data set is only supported by the internal version of BrowseCloud at Microsoft for now.

Getting Started Workflow

Upload Workflow

FAQ

Who do I contact for help with BrowseCloud?

Use the issues tab or send an email to browsecloud-team@microsoft.com.

How do I format my "input file"/"training data"?

Where is the source code?

The project's source code is stored and managed on GitHub: https://github.com/microsoft/browsecloud/

How do I train a new model?

If you work at Microsoft, go to https://browsecloud-client.azurewebsites.net, add your own data, and train on the site. If you do not work at Microsoft, clone the source code and run dumpCountingGrids.py on your new data. You can then visualize the result with the demo Angular app: put your model files in the browsecloud-client/assets/demo directory and run the app.

What is the underlying algorithm?

BrowseCloud takes a Bayesian approach to building AI, and our algorithm belongs to the family of probabilistic inference methods.

We assume that there is some space into which a set of tight distributions is embedded, and that these distributions are combined using a windowing operation to create a resultant distribution from which the observed bags of words or features are generated. However, we do not assume that the mapping is given a priori. For simplicity, we assume that the space is a discrete grid of counts of arbitrary dimension (we experimented with 2- and 3-dimensional grids). We then iteratively estimate both the counts on this grid and the mapping of the data to the overlapping windows on it. Our experiments indicate that thematic shifts are indeed present in a variety of datasets, and as a result, our model outperforms standard topic mixing (LDA) on those datasets. We analyzed a wide variety of data types, including text, images, gene expression and viral peptides, and used the learned counting grids to perform regression or classification.
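The windowing operation described above can be illustrated with a minimal sketch. It is not the BrowseCloud implementation: the function names, the tiny one-row grid, and the choice to wrap windows around the grid edges are all assumptions made for illustration. Each grid cell holds a tight distribution over words, and a window averages the cells it covers to produce the distribution a document is drawn from.

```python
import random

def window_distribution(grid, top, left, window):
    """Average the per-cell word distributions inside one window.

    grid[r][c] is a dict mapping word -> probability for that cell.
    window is (height, width). Wrapping around the grid edges is an
    assumption of this sketch.
    """
    rows, cols = len(grid), len(grid[0])
    height, width = window
    mixed = {}
    for dr in range(height):
        for dc in range(width):
            cell = grid[(top + dr) % rows][(left + dc) % cols]
            for word, p in cell.items():
                mixed[word] = mixed.get(word, 0.0) + p / (height * width)
    return mixed

def sample_bag(grid, top, left, window, n_words, rng):
    """Draw a bag of words from the distribution of one window."""
    dist = window_distribution(grid, top, left, window)
    words = sorted(dist)
    weights = [dist[w] for w in words]
    return rng.choices(words, weights=weights, k=n_words)

# Hypothetical 1 x 4 grid: every cell puts all its mass on a single word.
grid = [[{"login": 1.0}, {"password": 1.0}, {"billing": 1.0}, {"invoice": 1.0}]]

first = window_distribution(grid, 0, 0, (1, 2))   # covers "login" and "password"
second = window_distribution(grid, 0, 1, (1, 2))  # covers "password" and "billing"
bag = sample_bag(grid, 0, 0, (1, 2), n_words=5, rng=random.Random(0))
```

Because neighbouring windows overlap, `first` and `second` share the word "password". This overlap is what makes themes shift gradually as the window moves across the grid.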

The mapping of documents to grid locations is a set of latent variables. To learn it, we run generalized expectation maximization, alternating between estimating where each document maps on the grid and updating the counts of the words on the grid.
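The alternating estimation can be sketched as follows. This is a simplified illustration on a one-dimensional ring of cells, not the code in dumpCountingGrids.py. The function names, the multiplicative form of the count update, the smoothing constant `eps`, and the toy documents are assumptions of this sketch.

```python
import math
import random

def window_dists(pi, window):
    """h[k][z]: average of pi over `window` consecutive cells starting at k (wrapping)."""
    n, vocab = len(pi), len(pi[0])
    return [[sum(pi[(k + d) % n][z] for d in range(window)) / window
             for z in range(vocab)] for k in range(n)]

def e_step(docs, h):
    """Posterior q[t][k] over window positions k for each document t."""
    q = []
    for counts in docs:
        log_lik = [sum(c * math.log(hk[z]) for z, c in enumerate(counts)) for hk in h]
        top = max(log_lik)  # subtract the max for numerical stability
        weights = [math.exp(v - top) for v in log_lik]
        total = sum(weights)
        q.append([w / total for w in weights])
    return q

def m_step(docs, q, pi, h, window, eps=1e-9):
    """Re-estimate the normalized counts in every cell from the soft mappings."""
    n, vocab = len(pi), len(pi[0])
    new_pi = []
    for i in range(n):
        row = []
        for z in range(vocab):
            # Windows that contain cell i start at i - d for d in range(window).
            s = sum(docs[t][z] * q[t][(i - d) % n] / h[(i - d) % n][z]
                    for t in range(len(docs)) for d in range(window))
            row.append(pi[i][z] * s + eps)  # eps keeps every entry positive
        total = sum(row)
        new_pi.append([v / total for v in row])
    return new_pi

def fit(docs, n_cells, window, iters=50, seed=0):
    """Alternate the E and M steps; docs[t][z] is the count of word z in document t."""
    rng = random.Random(seed)
    vocab = len(docs[0])
    pi = []
    for _ in range(n_cells):
        row = [rng.random() + 0.1 for _ in range(vocab)]
        total = sum(row)
        pi.append([v / total for v in row])
    for _ in range(iters):
        h = window_dists(pi, window)
        q = e_step(docs, h)
        pi = m_step(docs, q, pi, h, window)
    return pi, e_step(docs, window_dists(pi, window))

# Hypothetical toy corpus: 3 documents over a 4-word vocabulary.
docs = [[5, 5, 0, 0], [0, 5, 5, 0], [0, 0, 5, 5]]
pi, q = fit(docs, n_cells=4, window=2)
```

After fitting, `q[t]` is the soft assignment of document `t` to window positions, and `pi[i]` is the word distribution stored at cell `i`. A visualization would place each word where its entries in `pi` are largest.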

Link to Paper: https://arxiv.org/ftp/arxiv/papers/1202/1202.3752.pdf