This repository has been archived by the owner on May 30, 2019. It is now read-only.

Samplestack Search Configuration

Charles Greer edited this page Feb 5, 2015 · 9 revisions

The search capabilities in Samplestack, as documented in Search Tips, are powered by the MarkLogic Search API. This page describes how the Search API is configured to support Samplestack's search scenarios, how Samplestack accesses the Search API, and how the build process configures MarkLogic's runtime search capabilities.

In short, the Java middle tier talks to an instance of the MarkLogic REST API (by default on localhost, port 8006). This REST API instance, accessed from Java, can be configured and extended as part of the build process. One such extension is to store a set of "Query Options" on the server that can be referenced by name in a runtime search. In the Java stack, these configurations and extensions are updated by the ./gradlew dbconfigure task.

Options

Samplestack uses several such "Query Options" files, located in /database/options. The main search configuration is in the file called questions.json. You don't have to do any application-level configuration to take advantage of MarkLogic Search capabilities -- you just have to understand where to find your data and what kind of search you expect from it.
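To make "accessed by name" concrete, here is a rough sketch of a hand-built search request against the REST instance. Samplestack itself goes through the Java Client API rather than building URLs, so treat this as illustration only. The host and port are the defaults mentioned above, /v1/search is the REST API search endpoint, and the options name "questions" is an assumption based on the file name questions.json. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class SearchUriSketch {
    // Default host and port from the text above.
    static final String BASE = "http://localhost:8006";

    /** Builds a search URI that refers to stored query options by name. */
    static URI searchUri(String queryText, String optionsName) {
        String q = URLEncoder.encode(queryText, StandardCharsets.UTF_8);
        String options = URLEncoder.encode(optionsName, StandardCharsets.UTF_8);
        return URI.create(BASE + "/v1/search?q=" + q
                + "&options=" + options + "&format=json");
    }

    public static void main(String[] args) {
        // "questions" is an assumed options name, not confirmed by the source.
        System.out.println(searchUri("tag:java marklogic", "questions"));
    }
}
```

The sketch only builds the URI; it does not send the request or handle authentication.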

Constraints

First, this file contains "constraints." The name of each constraint translates to a prefix you can use in the Search Bar in Samplestack.

  • askedBy, answeredBy, commentedBy, id. Each of these constraints describes a "path-index" which limits the search to the values within a particular part of the JSON document. Each of these constraints is backed by a range index, which means that Samplestack can find documents by these criteria very quickly, and could also do sorting and comparison operations and facets on them as needed.

  • user, userName, id, resolved. These constraints are configured simply to look for a JSON property called "displayName", "userName" or "id" and search for exact matches. When invoked, these constraints generate value queries, which match the exact value of a particular property rather than, say, a word within that value. value-query is supported by MarkLogic's universal index, and is available out of the box on whatever you ingest.

  • votes, answers. These constraints are backed by range indexes, and hence support GT and LE operators, and can be used in sorting. The sorting configuration is also in this file, but further down.

  • tag. This is a constraint for searching on the "tag" JSON property, but it's also configured for facet resolution. This means that searches will return unique values and frequencies across this property.
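As a toy illustration of the prefix convention, the sketch below splits a search-bar string such as "tag:java askedBy:joe marklogic" into constrained values and free terms. It is not how Samplestack or MarkLogic parse queries (the Search API has its own query grammar on the server); the class and method names are hypothetical, and the constraint names are simply the ones listed above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ConstraintPrefixSketch {
    // Constraint names taken from the list above.
    static final Set<String> CONSTRAINTS = Set.of(
            "askedBy", "answeredBy", "commentedBy", "id", "user",
            "userName", "resolved", "votes", "answers", "tag");

    /**
     * Returns constraint name -> values for every "prefix:value" token whose
     * prefix is a known constraint. All other tokens are appended to freeTerms.
     */
    static Map<String, List<String>> parse(String input, List<String> freeTerms) {
        Map<String, List<String>> constrained = new LinkedHashMap<>();
        for (String token : input.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            int colon = token.indexOf(':');
            boolean hasValue = colon > 0 && colon < token.length() - 1;
            if (hasValue && CONSTRAINTS.contains(token.substring(0, colon))) {
                constrained
                        .computeIfAbsent(token.substring(0, colon), k -> new ArrayList<>())
                        .add(token.substring(colon + 1));
            } else {
                freeTerms.add(token);
            }
        }
        return constrained;
    }
}
```

Tokens without a recognized prefix fall through to the default term search described in the Term section.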

Operators

The "operator" section of the Samplestack Query Options defines the sort configurations. The three sort states presented for Samplestack searches are 'relevance', 'vote count' and 'lastActivityDate', which records the last activity on a QnADocument. Although Samplestack provides controls for these states, you can also activate them from the search box.

Term

The JSON property "term" configures how a search term works by default, that is, when you don't use a prefix.

{
    "term": {
        "default": {
            "word": {
                "field": {"name": "default-samplestack-search"}
            }
        }
    }
}

To refine how the questions and answers are searched, in Samplestack we use a "field" to back the default search. A field is a MarkLogic construct that groups terms and weightings from various parts of a document into one named search criterion. This field is configured as part of the database setup, and as such is in /database/database-properties.json:

{
    "field-name": "default-samplestack-search",
    "field-path": [
        {
            "path": "/title",
            "weight": 2
        },
        {
            "path": "/text",
            "weight": 2
        },
        {
            "path": "/answers/text",
            "weight": 1
        },
        {
            "path": "//comments/text",
            "weight": 0.5
        }
    ],
    "tokenizer-overrides": null
}

This field configuration assigns different weights to text depending on whether it's in a question title, a question body, the text of an answer, or a comment.
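To get a feel for what the weights imply, here is a deliberately simplified sketch that sums match counts multiplied by path weight. MarkLogic's actual relevance calculation is more involved than this, so the formula is an assumption made purely for illustration; only the paths and weights come from the configuration above. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FieldWeightSketch {
    // Path weights copied from the field configuration above.
    static final Map<String, Double> WEIGHTS = new LinkedHashMap<>();
    static {
        WEIGHTS.put("/title", 2.0);
        WEIGHTS.put("/text", 2.0);
        WEIGHTS.put("/answers/text", 1.0);
        WEIGHTS.put("//comments/text", 0.5);
    }

    /** Toy score: sum of (match count * path weight). Not MarkLogic's formula. */
    static double toyScore(Map<String, Integer> matchesByPath) {
        double score = 0.0;
        for (Map.Entry<String, Integer> e : matchesByPath.entrySet()) {
            // Paths outside the field contribute nothing.
            score += e.getValue() * WEIGHTS.getOrDefault(e.getKey(), 0.0);
        }
        return score;
    }
}
```

Under this toy model, one match in a title counts as much as four matches in comments.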

Result Transform

The Search Options include configuration of the Search snippets. The "transform-results" section provides some ways to generate basic snippets out of the box. Customization is also an option, but Samplestack uses some built-in parameters to tune the size and content of search matches.

Together:

            "max-snippet-chars":100,
            "max-matches":4,
            "per-match-tokens":12,
            "preferred-matches":{"json-property":["text","title"]}

means "prefer text from the 'title' and 'text' properties, include at most 4 matches per document in a snippet, limit each match to at most 12 tokens (words), and cap the snippet at 100 characters."
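The interplay of the three numeric limits can be sketched with a toy snippet builder. This is not MarkLogic's snippeting code, and the way the limits are applied here (whitespace tokens, a window around each hit, a shared character budget) is an assumption. Only the three values come from the options above; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SnippetSketch {
    // Values copied from the transform-results settings above.
    static final int MAX_SNIPPET_CHARS = 100;
    static final int MAX_MATCHES = 4;
    static final int PER_MATCH_TOKENS = 12;

    /** Returns up to MAX_MATCHES token windows around hits of the given term. */
    static List<String> snippet(String text, String term) {
        String[] tokens = text.trim().split("\\s+");
        List<String> matches = new ArrayList<>();
        int usedChars = 0;
        for (int i = 0; i < tokens.length && matches.size() < MAX_MATCHES; i++) {
            if (!tokens[i].equalsIgnoreCase(term)) {
                continue;
            }
            int remaining = MAX_SNIPPET_CHARS - usedChars;
            if (remaining <= 0) {
                break; // character budget for the whole snippet is spent
            }
            // Window of at most PER_MATCH_TOKENS tokens that contains the hit.
            int from = Math.max(0, i - PER_MATCH_TOKENS / 2);
            int to = Math.min(tokens.length, from + PER_MATCH_TOKENS);
            String window = String.join(" ", Arrays.copyOfRange(tokens, from, to));
            if (window.length() > remaining) {
                window = window.substring(0, remaining);
            }
            matches.add(window);
            usedChars += window.length();
        }
        return matches;
    }
}
```

In this sketch the character cap wins over the match count: once 100 characters are used, later hits are dropped even if fewer than 4 matches were collected.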

Indexes

As part of dbconfigure, Gradle uploads a database configuration from /database/database-properties.json. This file defines the range indexes, which are created as soon as the file is processed by the Management API. Each range-index-backed constraint described in the Constraints section has a corresponding entry in database-properties.json. Value query constraints like "userName" do not require an index, but range constraints such as "votes" do.

Samplestack's Searches

All of Samplestack's main searches go through rawSearch():

package com.marklogic.samplestack.dbclient;
public class MarkLogicQnAService  {
...
  public ObjectNode rawSearch(ClientRole role, ObjectNode structuredQuery,
			long start, DateTimeZone userTimeZone) {
  ...
  }
}

This method is responsible for passing a structured query object, sent from the browser as a JSON object, to the MarkLogic Client API. It also does some additional work with dates and facets.
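For orientation, here is a sketch of the kind of structured query JSON such a method might receive. The exact queries the Samplestack browser application sends are not shown on this page, so this shape is an assumption: it combines a free-text term with a range constraint on the "votes" constraint from questions.json, using the Search API's structured query syntax. The class and method names are hypothetical, and the JSON is built with plain strings only to keep the example dependency-free.

```java
final class StructuredQuerySketch {
    /** Builds a structured query: free-text term AND votes greater than minVotes. */
    static String termWithMinVotes(String term, int minVotes) {
        return "{\"query\":{\"queries\":[{\"and-query\":{\"queries\":["
                + "{\"term-query\":{\"text\":[\"" + escape(term) + "\"]}},"
                + "{\"range-constraint-query\":{\"constraint-name\":\"votes\","
                + "\"range-operator\":\"GT\",\"value\":[" + minVotes + "]}}"
                + "]}}]}}";
    }

    /** Minimal JSON string escaping for backslashes and double quotes only. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        System.out.println(termWithMinVotes("marklogic", 5));
    }
}
```

In a real application you would build this with a JSON library such as Jackson, which Samplestack already uses for its ObjectNode parameters.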

Below we traverse the body of this method to see how a Java client constructs a query and sends it to MarkLogic:

  • You need a QueryManager to construct a query, and a DocumentManager to retrieve the documents that match a query.
QueryManager queryManager = clients.get(role).newQueryManager(); 
JSONDocumentManager docMgr = clients.get(role).newJSONDocumentManager();
  • Create a query definition which is bound to query options stored on MarkLogic. The query used here (docNode) came from a structured query object, which in Samplestack came originally from the browser application.
RawQueryDefinition qdef = queryManager.newRawStructuredQueryDefinition(
  new JacksonHandle(docNode), QUESTIONS_OPTIONS);
  • Set a response transform (also stored on MarkLogic).
ServerTransform responseTransform = new ServerTransform(SEARCH_RESPONSE_TRANSFORM);
qdef.setResponseTransform(responseTransform);
  • Create a handle to encapsulate a JSON response from the search, and perform the search.
JacksonHandle responseHandle = new JacksonHandle();
DocumentPage docPage = null;
try {
  docPage = docMgr.search(qdef, start, responseHandle);
} catch (com.marklogic.client.FailedRequestException ex) {
  throw new SamplestackSearchException(ex);
}
  • Get a JSON node view of the response and do things with it.
ObjectNode responseNode = (ObjectNode) responseHandle.get();

This is the main interaction, and a non-trivial one, between a Java client and MarkLogic search. The middle tier, however, doesn't do much: it brokers, but largely leaves alone, query objects ferried from the UI in the browser application all the way to a configured search scenario in MarkLogic.

Further Reading

Search Developer's Guide: How to make MarkLogic hum.

REST Application Developer's Guide: For the JSON query options reference and for understanding REST calls to MarkLogic.

Java Application Developer's Guide: The reference for building applications with the Java API.