Roadmap Proposal: Natural Language Processing

Tom Schenk Jr edited this page Feb 5, 2016 · 3 revisions

DRAFT

The current version of OpenGrid uses a combination of quick queries (e.g., tweets Chicago Bulls) based on a very basic syntax and advanced queries that require users to fill out a form with various parameters. Ideally, more robust natural language processing in the quick search bar would make the advanced search less important in the user interface.

Almost all users have become accustomed to interacting with the internet through a single search box. Whether it's the homepage of Google or a single bar in the browser, web users understand how to type what they're looking for into a bar and search. Meanwhile, "advanced search" capability (although once part of Google) has been deprecated. Likewise, we should strive to make queries easier by basing them on human language rather than on filling out forms.

The advanced query consists of some basic parameters:

  • Range of dates
  • Data sources (and criteria for each source)
  • Location parameters

Natural Language Processing (e.g., OpenNLP) can identify these principal components of the query.
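As a rough illustration of what identifying these components means, the sketch below uses plain keyword and pattern matching rather than OpenNLP. The lookup tables, the list of community areas, and the function name `extract_components` are all hypothetical; a real implementation would draw dataset names and locations from OpenGrid's configuration and would likely use a proper NLP library.

```python
import re

# Hypothetical lookup tables for illustration only; real dataset names and
# community areas would come from OpenGrid's configuration.
DATASET_KEYWORDS = {
    "911 calls": "911p",
    "tweets": "Twitter",
    "crimes": "Crimes",
    "buses": "CTA",
}
COMMUNITY_AREAS = ["Rogers Park", "Lincoln Park", "Hyde Park"]

# Matches dates written like "May 20, 2015".
DATE_PATTERN = re.compile(r"\b[A-Z][a-z]+ \d{1,2}, \d{4}\b")


def extract_components(query):
    """Split a quick-search string into dataset, location, and date parts."""
    lowered = query.lower()
    datasets = [name for keyword, name in DATASET_KEYWORDS.items()
                if keyword in lowered]
    locations = [area for area in COMMUNITY_AREAS if area.lower() in lowered]
    dates = DATE_PATTERN.findall(query)
    return {"datasets": datasets, "locations": locations, "dates": dates}
```

Keyword matching like this breaks down quickly on real input (synonyms, misspellings, ambiguous terms), which is the gap a library such as OpenNLP would be expected to fill.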

Example syntax and the resulting queries:

  • 911 calls in Rogers Park

Resulting query: Dataset == 911p AND Community Area == Rogers Park

  • Burglaries in Rogers Park

Though similar to the previous example, this can be more complex. "Burglaries" could correspond to burglaries filed in the Crimes dataset or to 911 calls received about burglaries. We should over-identify and return results from both datasets.

In the absence of specific dates, the application could rely upon our current protocol of displaying a fixed number (e.g., 6,000) of the most recent data points.

Resulting query: (Dataset == Crimes OR Dataset == 911p) WHERE Primary Description == Burglaries AND Community Area == Rogers Park
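The two ideas in this example, over-identifying an ambiguous term and falling back to a fixed number of recent points, could be sketched as follows. The mapping table, the field names on each dataset, and the shape of the returned dictionary are assumptions made for illustration; they are not OpenGrid's actual query format.

```python
# Hypothetical mapping from a search term to every dataset where it could
# apply; the field names are illustrative assumptions.
TERM_TO_SOURCES = {
    "burglaries": [
        {"dataset": "Crimes", "field": "Primary Description", "value": "Burglaries"},
        {"dataset": "911p", "field": "Primary Description", "value": "Burglaries"},
    ],
}

DEFAULT_LIMIT = 6000  # fixed number of most recent points used when no dates are given


def build_query(term, community_area, date_range=None):
    """Build a query that covers every dataset the term might refer to."""
    query = {
        "sources": TERM_TO_SOURCES.get(term.lower(), []),
        "community_area": community_area,
    }
    if date_range is None:
        # No dates in the quick query: show the most recent points instead.
        query["limit"] = DEFAULT_LIMIT
        query["sort"] = "most_recent"
    else:
        query["date_range"] = date_range
    return query
```

Returning both sources lets the user discard the one they did not mean, which is easier than guessing at a source the query silently left out.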

  • Tweets around me

Resulting query: Dataset == Twitter AND geoWithin: {center, ([current location])}

  • Tweets about Chicago Bulls on May 20, 2015

Resulting query: Dataset == Twitter WHERE Twitter.text == "Chicago Bulls" AND Date == "2015-05-20"
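Turning a written date into the `YYYY-MM-DD` form used in the query could look something like the sketch below. It assumes dates are always written as "Month D, YYYY" in English; the function name `extract_iso_dates` is hypothetical. A query containing two dates could be treated as a range.

```python
import re
from datetime import datetime

# Candidate dates written like "May 20, 2015".
DATE_PATTERN = re.compile(r"[A-Z][a-z]+ \d{1,2}, \d{4}")


def extract_iso_dates(query):
    """Return every 'Month D, YYYY' date in the query as YYYY-MM-DD strings."""
    found = []
    for text in DATE_PATTERN.findall(query):
        try:
            # %B matches the full month name; this assumes an English locale.
            parsed = datetime.strptime(text, "%B %d, %Y")
        except ValueError:
            continue  # a capitalized word followed by numbers, but not a date
        found.append(parsed.strftime("%Y-%m-%d"))
    return found
```

Relative expressions such as "yesterday" or "last week" are outside what this pattern handles and would need separate treatment.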

  • Crimes between May 20, 2015 and June 26, 2015

  • Buses around City Hall

A default distance will have to be chosen if one isn't specified.

Resulting query: Dataset == CTA AND geoWithin: {center, (41.8657, -87.7611)}
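One way to handle a missing distance is to fall back to a default radius when building the location filter. The sketch below is only illustrative: the 500-meter default, the `radius_meters` key, and the function name `geo_within` are assumptions, since the proposal does not fix a value or a filter format.

```python
DEFAULT_RADIUS_METERS = 500  # assumed default; the proposal does not fix a value


def geo_within(center, radius_meters=None):
    """Build a location filter around a (latitude, longitude) point."""
    lat, lon = center
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("center must be (latitude, longitude) in degrees")
    # Use the default only when the query did not specify a distance.
    radius = DEFAULT_RADIUS_METERS if radius_meters is None else radius_meters
    return {"geoWithin": {"center": [lat, lon], "radius_meters": radius}}
```

The same helper would serve "Tweets around me" by passing the user's current location as the center.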

Alternative UI markers

The above proposal does not completely replace the functionality of advanced queries. Namely, the form-based approach allows users to enter highly sophisticated queries with multiple conjunctions and disjunctions and to customize the aesthetic layout (e.g., point-size, point-color, point-transparency). We would need to identify an alternative approach for these capabilities.

Quick Search -> Advanced Search

One method is to retain advanced search and simply transcribe any Basic Query to the Advanced Search window for further refinement by the end-user. This has the advantage of providing a clear interpretation of the Basic Query into distinct fields, but does not require the user to interact with the Advanced Query form unless they want to modify the parameters.
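A minimal sketch of this hand-off, assuming the quick search has already been parsed into lists of datasets, locations, and dates. The input shape and the form field names (`data_sources`, `location`, `date_from`, `date_to`) are hypothetical and do not reflect the actual Advanced Search form.

```python
def to_advanced_form(parsed):
    """Flatten a parsed quick query into form-field values (hypothetical names)."""
    datasets = parsed.get("datasets", [])
    locations = parsed.get("locations", [])
    dates = parsed.get("dates", [])
    return {
        "data_sources": list(datasets),
        "location": locations[0] if locations else None,
        # A single date fills both ends of the range; no date leaves both empty.
        "date_from": dates[0] if dates else None,
        "date_to": dates[-1] if dates else None,
    }
```

Fields left as `None` would appear blank in the form, making it visible to the user which parts of the query were not recognized.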

UI improvements in grid

Alternatively, we could define a different UI protocol for the grid. Conceivably, the user could refine the data elements, date ranges, filters, and aesthetics directly in the results grid by right-clicking on the data sources and column headers. For instance:

  • If the basic query included data sources that should be removed, the user can right-click on the result in the window and flag it for removal from query.
  • If the filters on a field in a result (e.g., type code in 311) are incorrect, the user could right-click on that column heading to show a list of check-boxes corresponding to the filter (i.e., similar to the Filter... action in Excel). These can be refined by the user.
  • If the date range is incorrect, a UI element could show the date range used in the query and allow it to be modified.
  • The aesthetic elements for results (e.g., the color of the dots corresponding to 911 calls) can be modified by selecting the results in the results grid and selecting a menu to modify it.

This approach has the advantage of essentially removing the need for any advanced query window. However, this UI interaction may be less transparent to users, leaving them uncertain how to modify results.
