componentAutoSuggestionServer

CSchowalter edited this page Sep 9, 2013 · 43 revisions

Current ToDos:

  • Add a possibility to create new indices

Description

The AutoSuggestionServer provides a REST API through which users or other services can request search strings. It helps users enter search queries correctly: given a prefix, the server returns suggestions for a search string in ranked order. The ranking of search terms has to be provided to the server. The server's only task is to deliver suggestions, together with images, to the user as fast as possible.
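
The lookup described above can be sketched in a few lines of Python. This is only a minimal illustration, not the server's actual data structure: the class name `SuggestIndex`, its methods, case-insensitive matching and the sample weights are all assumptions made for the example.

```python
import bisect


class SuggestIndex:
    """Hypothetical in-memory index; terms stay sorted so all matches of a prefix are adjacent."""

    def __init__(self):
        self._keys = []      # sorted, lowercased terms
        self._entries = {}   # lowercased term -> (display term, weight)

    def insert(self, term, weight):
        key = term.lower()
        if key not in self._entries:
            bisect.insort(self._keys, key)
        self._entries[key] = (term, weight)

    def suggest(self, prefix, num_items=7):
        prefix = prefix.lower()
        start = bisect.bisect_left(self._keys, prefix)
        matches = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # sorted order: no later key can match
            matches.append(self._entries[key])
        # Highest weight first; ties are broken alphabetically.
        matches.sort(key=lambda entry: (-entry[1], entry[0]))
        return [term for term, _ in matches[:num_items]]


index = SuggestIndex()
for term, weight in [("Metallica", 8), ("Megadeth", 6), ("Ensiferum", 4)]:
    index.insert(term, weight)  # weights are supplied by the caller, not computed
print(index.suggest("me"))
```

The weights are passed in from outside, which mirrors the statement that the ranking has to be provided to the server.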

Requirements

current Requirements

  • The server should be able to communicate with a jQuery client out of the box. Some basics for the server-side implementation of an auto-suggest server that is compatible with jQuery are listed at: http://stackoverflow.com/questions/5077409/jquery-ui-autocomplete-server-side-example-what-does-a-request-json-response-lo Some jQuery code for calling an autocomplete with several (mostly client-side) options and parameters is given at: http://www.devbridge.com/projects/autocomplete/jquery/

  • The server must be able to handle more than a million Strings for Suggestions

  • The server time for suggestions must be less than 1 Millisecond (such that 1000 concurrent users can use the server)

  • Different contexts should be treated individually

      • e.g. a search bar in venues should suggest venues as results and not just anything that matches the letters

  • the server should be highly configurable to individual suggest indices

  • Images should be included to the results

      • small file size suffices (few kb)

  • [Developers] The API and protocols should be re-usable on other platforms

  • The API must be usable by anybody who is authenticated

  • The suggestions should include global keys of metalcon entities where applicable

  • The server should be able to make its data persistent

  • There should be a way to configure the server

      • max allowed image size

      • allowed image geometry (exact pixel size or aspect ratio)

  • The server should avoid key/name duplicates

      • name duplicates can be allowed if unique keys identify each entry

  • [University] the server should be able to log if suggestions are accepted by a user (is this really a requirement of the suggest server or of the logging component?)

Future Requirements

  • what about concatenating search strings and suggestions

Protocol for the auto suggestion Server

The protocol is called "Auto Suggest Transfer Protocol", or "ASTP" for short.

The typical CRUD operations are supported.

Status messages for every operation are human readable, give advice on how to solve the problems and are returned as a list, so in case multiple problems occur, all of them are reported.

Create

new index

The server always has a default index. To create a new index, a NewIndexCreate request has to be performed. It is a simple request containing only the desired index name. Two indices may not share a name, so trying to create a duplicate index leads to an error response. Trying to create an index without specifying a name also leads to an error. If the request is valid, a status-ok response is returned.
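
The three outcomes of a NewIndexCreate request can be sketched as follows. The class `IndexRegistry`, the default index name `"default"`, the response field names and the message texts are hypothetical placeholders; the page does not define the request or response format for this operation yet.

```python
class IndexRegistry:
    """Hypothetical registry of suggest indices, keyed by index name."""

    DEFAULT_INDEX = "default"  # assumed name of the always-present default index

    def __init__(self):
        self._indices = {self.DEFAULT_INDEX: {}}

    def create_index(self, name):
        """Return a status dict; field names and message texts are placeholders."""
        if name is None or not name.strip():
            return {"status": "error", "message": "No index name given."}
        if name in self._indices:
            return {"status": "error", "message": "An index with this name already exists."}
        self._indices[name] = {}
        return {"status": "ok"}


registry = IndexRegistry()
print(registry.create_index("venues"))  # first creation succeeds
print(registry.create_index("venues"))  # duplicate name is rejected
print(registry.create_index(""))        # missing name is rejected
```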

new entry

There are two ways to create data entries. The first is for single entries (or for small numbers of entries sent sequentially):

HTTP/1.0 POST somedomain/autosuggest/insert
Accept: text/x-json
Content-type: multipart/form-data, boundary=ASTP-boundary

--ASTP-boundary
content-disposition: form-data; name="term"

SEARCHTERM
--ASTP-boundary
content-disposition: form-data; name="key"

KEY
--ASTP-boundary
content-disposition: form-data; name="indexName"

INDEX
--ASTP-boundary
content-disposition: form-data; name="weight"

WEIGHT
--ASTP-boundary
content-disposition: form-data; name="image"; filename="FILENAME"

Image-Bytestream...
--ASTP-boundary--

The server expects an HTTP form in a multipart POST request. GET is not allowed. The image is optional (details below). Depending on the index the data is inserted into, the server expects either a unique key and an arbitrary name or, if the index has no keys, a unique name. In either case, creating a nameless item is forbidden. The request should also name the index the new entry is supposed to be inserted into; if the index name is missing, the default index is used and a warning is returned (see below).
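
For illustration, a client could assemble such a request body by hand. The helper `build_insert_body` below is a hypothetical sketch using only plain byte operations; it adds a `Content-Type: image/jpeg` header to the image part, which is an assumption based on the JPEG-only rule and not part of the request shown above.

```python
def build_insert_body(fields, image=None, filename="image.jpg", boundary="ASTP-boundary"):
    """Encode form fields (and an optional JPEG) as a multipart/form-data body."""
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields.items():
        lines.append(delimiter)
        lines.append(f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"))
        lines.append(b"")  # blank line separates part headers from the value
        lines.append(str(value).encode("utf-8"))
    if image is not None:
        lines.append(delimiter)
        lines.append(
            f'Content-Disposition: form-data; name="image"; filename="{filename}"'.encode("utf-8")
        )
        lines.append(b"Content-Type: image/jpeg")  # assumption, see lead-in
        lines.append(b"")
        lines.append(image)
    lines.append(delimiter + b"--")  # closing delimiter
    return b"\r\n".join(lines) + b"\r\n"


body = build_insert_body(
    {"term": "Metallica", "key": "band42", "indexName": "bandIndex", "weight": 8}
)
print(body.decode("utf-8"))
```

The body would be sent in a POST request whose Content-Type header names the same boundary.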

Status messages:

  • if everything succeeded: "OK"

Warnings:

  • "Image file size too big. The maximal size is X" if the image file size is greater than the set limit. X is replaced with the value from the config file.
  • "Image geometry too big, the maximum scales are width=X and height=Y" if the image height or width is greater than the set pixel size. X and Y are replaced by the values set in the config file!
  • "wrong image type. This server only supports JPEG." if the image is not delivered in JPEG-encoding
  • "Missing index name. Using default index." in case the index name was not specified
  • "No image inserted" if for any reason there was no image inserted to the database.
  • "No key given" if no key is specified

Errors:

  • "HTTP Content-Type is not multipart. The requests need to be multipart/mixed, which needs to include one "multipart/form-data" for the parameters and optionally "image/jpeg" if the entry should contain an image." if the request is not a multipart request
  • "No search term given. A suggest item always needs a term. Please enter a search term for your item." if the request is missing its term
  • "Weight not specified. Every entry needs to have a weight. Please specify a weight for the suggestion." if there is no weight given.
  • "Weight is not a valid number. Weight may only be a non-negative integer." if the weight was specified as anything other than a valid non-negative integer.
  • "key duplicate. It is not allowed to have the same key twice in an index." if the key already exists
  • "name duplicate. Indexes without keys don't allow the same name to exist twice." if the name already exists and duplicates are not allowed

If an error occurs, the data is rejected. Creating incomplete entries is not allowed. Images which do not meet the requirements set in the config file are not stored either; in that case the request is still accepted, but the entry is created without the image.
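
The split between errors (the entry is rejected) and warnings (the entry is accepted with a caveat) can be sketched like this. The function `validate_insert` is hypothetical and covers only the term, weight, index name and key checks; image checks and duplicate detection are left out.

```python
def validate_insert(term=None, weight=None, key=None, index_name=None):
    """Return (errors, warnings) as two lists of message strings."""
    errors, warnings = [], []
    if not term:
        errors.append(
            "No search term given. A suggest item always needs a term. "
            "Please enter a search term for your item."
        )
    if weight is None or weight == "":
        errors.append(
            "Weight not specified. Every entry needs to have a weight. "
            "Please specify a weight for the suggestion."
        )
    elif not (str(weight).isascii() and str(weight).isdigit()):
        # Only plain digits pass, so signs and decimal points are rejected.
        errors.append(
            "Weight is not a valid number. Weight may only be a non-negative integer."
        )
    if not index_name:
        warnings.append("Missing index name. Using default index.")
    if not key:
        warnings.append("No key given")
    return errors, warnings


errors, warnings = validate_insert(term="Metallica", weight="8")
accepted = not errors  # any error rejects the entry; warnings alone do not
print(accepted, warnings)
```

Collecting all messages before answering matches the rule that every problem is reported, not just the first one.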

Alternatively, if there is no image to transmit for an entry, the request is the same, just without the image part:

HTTP/1.0 POST somedomain/autosuggest/insert
Accept: text/x-json
Content-type: multipart/form-data, boundary=ASTP-boundary

--ASTP-boundary
content-disposition: form-data; name="term"

SEARCHTERM
--ASTP-boundary
content-disposition: form-data; name="key"

KEY
--ASTP-boundary
content-disposition: form-data; name="indexName"

INDEX
--ASTP-boundary
content-disposition: form-data; name="weight"

WEIGHT
--ASTP-boundary--

Apart from the missing image, this request behaves like the first create call.

Both entry methods share the same response:

It is a JSON document with the message type as key and the message as value. The search term is also included.

Example:

{"Warning:suggestionKey":"Suggestion key not given","term":"test"}

It is possible that multiple warnings occur.
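
A response carrying several messages could be assembled as sketched below. The key pattern "message type, colon, field name" is inferred from the single example above and is an assumption, as is the helper name `build_insert_response`.

```python
import json


def build_insert_response(term, messages):
    """messages: list of (message type, field, text) tuples."""
    response = {}
    for message_type, field, text in messages:
        response[f"{message_type}:{field}"] = text
    response["term"] = term
    # Compact separators reproduce the layout of the example response.
    return json.dumps(response, separators=(",", ":"))


print(build_insert_response("test", [("Warning", "suggestionKey", "Suggestion key not given")]))
```

Note that two messages of the same type for the same field would overwrite each other in this scheme, so a real implementation would need distinct keys or a list per key.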

The second way is for importing whole databases (so-called "bulk import"). It allows streaming a whole CSV file instead of sending each of its entries separately.

HTTP/1.0 POST somedomain/autosuggest/addbulk
Accept: text/x-json
Content-type: text/comma-separated-values

Open Question: Should there be a way for bulk importing images?

Bulk import only responds whether the whole import worked or not, so the response is:

{"Status": "Status-Message"}
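
A minimal sketch of an all-or-nothing bulk import follows. The page does not define the CSV column layout, so the three columns used here (term, key, weight) are purely an assumption, as are the function name `bulk_import`, the use of a plain dict as index and the failure message texts.

```python
import csv
import io


def bulk_import(csv_text, index):
    """Parse every row first; nothing is inserted unless all rows are valid."""
    rows = []
    for line_no, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            return {"Status": f"Line {line_no}: expected 3 columns, got {len(row)}"}
        term, key, weight = (cell.strip() for cell in row)
        if not term or not (weight.isascii() and weight.isdigit()):
            return {"Status": f"Line {line_no}: missing term or invalid weight"}
        rows.append((term, key, int(weight)))
    for term, key, weight in rows:
        index[key or term] = (term, weight)  # fall back to the term if there is no key
    return {"Status": "ok"}


band_index = {}
print(bulk_import("Metallica,band1,8\nEnsiferum,band2,4\n", band_index))
print(band_index)
```

Validating before inserting keeps the index unchanged when the import fails, which fits a response that only says whether the whole import worked.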

Retrieve

Since it might not always be useful (or needed) to transfer an image, there are two ways to request data. The returned data is always in JSON format.

Requests are made with HTTP 1.0, since no feature of HTTP 1.1 seems to improve anything here; staying with 1.0 keeps backwards compatibility and makes the implementation easier. http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

Simple Request

HTTP/1.0 GET somedomain/autosuggest/getsuggestions?q=&indexName=&numItems=
Accept: text/x-json;

The request has the following three parameters:

  • q: the prefix of a string that should match all the suggested items
  • indexName: A string with the identifier of the index that should be used.
  • numItems: the maximum number of suggestions to return. The default value is 7, and only values between 1 and 7 can be set.
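
On the client side, the three parameters can be put into the query string with the standard library. The helper `build_suggest_url` and the `http://` scheme are assumptions for the sake of the example; the path and parameter names are the ones listed above.

```python
from urllib.parse import urlencode


def build_suggest_url(base_url, q, index_name=None, num_items=None):
    """Build the GET URL for a simple suggestion request."""
    params = {"q": q}
    if index_name is not None:
        params["indexName"] = index_name
    if num_items is not None:
        params["numItems"] = num_items
    # urlencode escapes spaces and special characters in the prefix.
    return f"{base_url}/autosuggest/getsuggestions?{urlencode(params)}"


print(build_suggest_url("http://somedomain", "iron m", "bandIndex", 5))
```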

Open Question: Should Servertime (the timespan the server needed for the operation) be optional, so there is a request to activate this for the response?

Response

Depending on the index, the response may or may not include a key. The same holds for images. Names and status messages are always returned. If no indexName is provided, a default index is used, and the status message points this out. It is not allowed to ask for more than 7 results; if more are requested, the server returns just 7 and gives an appropriate message.

Status-Messages are

  • "ok"
  • "No index name provided. Using default." When index name is missing
  • "too many items requested. Clipping to 7. This server only allows 7 items at most." When more than 7 results are requested

The returned data is composed as JSON and looks like this:

{"suggestions": [item1, item2, item3, ..., item7],
 "serverTime": "string in Microseconds",
 "Status": "Status-Message"}
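
The defaulting and clipping rules can be sketched as a small handler. Everything besides the status texts and the response field names is an assumption: the function `handle_suggest`, the `lookup` callback, the default index name, joining several messages into one Status string and raising values below 1 to 1.

```python
import time

MAX_ITEMS = 7


def handle_suggest(lookup, q, index_name=None, num_items=None):
    """lookup(index_name, prefix, n) is a hypothetical callback returning up to n suggestions."""
    started = time.perf_counter()
    messages = []
    if not index_name:
        index_name = "default"  # assumed name of the default index
        messages.append("No index name provided. Using default.")
    if num_items is None:
        num_items = MAX_ITEMS
    elif num_items > MAX_ITEMS:
        num_items = MAX_ITEMS
        messages.append(
            "too many items requested. Clipping to 7. This server only allows 7 items at most."
        )
    num_items = max(1, num_items)  # assumption: values below 1 are raised to 1
    suggestions = lookup(index_name, q, num_items)
    elapsed_us = int((time.perf_counter() - started) * 1_000_000)
    return {
        "suggestions": suggestions,
        "serverTime": str(elapsed_us),
        "Status": " ".join(messages) if messages else "ok",
    }


def demo_lookup(index_name, prefix, n):
    return [f"{prefix}{i}" for i in range(n)]


print(handle_suggest(demo_lookup, "me", "bandIndex", 20)["Status"])
```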

image request

HTTP/1.0 GET somedomain/autosuggest/getsuggestionswithimages?q="querystring"&indexName="bandIndex"&numItems=7
Accept: text/x-json;

TODO: better name for the URL

The request has the same three parameters as the simple request:

  • q: the prefix of a string that should match all the suggested items
  • indexName: A string with the identifier of the index that should be used.
  • numItems: the maximum number of suggestions to return. The default value is 7, and only values between 1 and 7 can be set.

image Response

Status-Messages are:

  • "ok"
  • "No index name provided. Using default." When index name is missing
  • "too many items requested. Clipping to 7. This server only allows 7 items at most." When more than 7 results are requested
  • "no image found. If there actually is an image, check if the server is allowed to access it!" If the request was made for an entry without image

The returned data looks like this:

{"suggestions": [{"name": item1, "image": "lkasjdfiojalsfnlsiafjlasldnf"}, {"name": item2, "image": "ioweurnjweuihsd"}, {"name": item3, "image": "jnzsckhsdkfn"}, ...], "serverTime": "string in Microseconds"}

The binary image data is treated as if it were a string in JSON, and the image is reconstructed client-side. Since binary image data is not meant to be viewed as text, the image attribute in the actual JSON appears to contain garbage.
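
The page does not say how the bytes become a JSON string. One common option, shown here purely as an assumption, is Base64: the server encodes the image bytes to ASCII text and the client decodes them again. The helper names are hypothetical.

```python
import base64
import json


def encode_suggestion(name, image_bytes=None):
    """Server side: build one suggestion item, with the image as Base64 text if present."""
    item = {"name": name}
    if image_bytes is not None:
        item["image"] = base64.b64encode(image_bytes).decode("ascii")
    return item


def decode_image(item):
    """Client side: recover the raw image bytes, or None if the item has no image."""
    return base64.b64decode(item["image"]) if "image" in item else None


raw = b"\xff\xd8\xff\xe0 not a real JPEG, just sample bytes"
payload = json.dumps({"suggestions": [encode_suggestion("Metallica", raw)]})
restored = decode_image(json.loads(payload)["suggestions"][0])
print(restored == raw)
```

Base64 grows the payload by roughly a third, which seems acceptable for images of a few kb.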

Meta Request

HTTP/1.0 GET somedomain/autosuggest/getserverinfo?

This request asks for the names of the available indices and the number of entries each one has. This data can be used, for example, to find out which contexts suggestions are available for and whether they are populated well enough to be used.

Open Question: Is it ok to do this without a status response, or should the server be able to explicitly say "I have no index"? There seems to be no other problem worth mentioning and this one should be obvious already, right?

Response

{"indexList": [{"name": name1, "size": int}, {"name": name2, "size": int}, ...]}
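
As a sketch, the response can be derived directly from the indices held in memory, and a client can filter it by size. The functions `build_server_info` and `usable_indices`, the representation of indices as a dict and the `min_size` threshold are hypothetical.

```python
def build_server_info(indices):
    """Server side. indices: mapping of index name -> container of entries."""
    return {
        "indexList": [
            {"name": name, "size": len(entries)} for name, entries in indices.items()
        ]
    }


def usable_indices(server_info, min_size):
    """Client side: names of indices that hold at least min_size entries."""
    return [
        entry["name"] for entry in server_info["indexList"] if entry["size"] >= min_size
    ]


info = build_server_info({"bandIndex": ["Metallica", "Ensiferum"], "venueIndex": []})
print(info)
print(usable_indices(info, min_size=1))
```

With no indices at all, `indexList` is simply an empty list, which is one possible answer to the open question above.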

Update

The update request is very similar to the one for create. Depending on the index the update is requested for, there are two kinds of requests similar to the create function. One for indexes with images:

HTTP/1.0 POST somedomain/autosuggest/addentry
Accept: text/x-json
Content-type: multipart/mixed

---------metalb0undary---------
Content-Type: text/x-json
{Dataname: Data}
---------metalb0undary---------
Content-Type: image/jpeg
Image-Bytestream...
---------metalb0undary-----------

And one for indexes without images:

TODO: request

Error Codes:

  • "ok" if everything succeeded
  • "Image file size too big. The maximal size is X" if the image file size is greater than the set limit. X is replaced with the value from the config file.
  • "wrong image scale, the right scale should be width=X and height=Y" if the image does not meet the set pixel size. X and Y are replaced by the values set in the config file!
  • "wrong image type. This server only supports JPEG." if the image is not in JPEG
  • "request incomplete. X is missing" if key, index name or entry name is missing (X replaced accordingly)
  • "key duplicate. It is not allowed to have the same key twice in an index." if the key already exists
  • "name duplicate. Indexes without keys don't allow the same name to exist twice." if the name already exists and duplicates are not allowed

Delete

Delete removes the entire entry including the image associated with it.

HTTP/1.0 GET somedomain/autosuggest/deletesuggestion?q="querystring"

Note: At this point, no safeguard against unintentional deletion is planned.

Misc

In order to keep the data structure efficient, the images can be stored apart from their index entries. If a certain index entry is requested, the image belonging to it must be found quickly, so a hash map is necessary to speed up this lookup.

This means every entry consists not only of name and priority but also the hash for the image.

e.g.:

Metallica, hashkey1, 8
Ensiferum, hashkey2, 4
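
A minimal sketch of this separation follows. Deriving the hash key from the image content with SHA-1 is an assumption; the page only says that each entry carries a hash for its image. The class name `ImageStore` and the sample bytes are hypothetical.

```python
import hashlib


class ImageStore:
    """Hypothetical image storage kept apart from the suggest index."""

    def __init__(self):
        self._images = {}  # hash key -> raw image bytes

    def put(self, image_bytes):
        # Assumption: the key is the SHA-1 digest of the image content.
        key = hashlib.sha1(image_bytes).hexdigest()
        self._images[key] = image_bytes
        return key

    def get(self, key):
        return self._images.get(key)  # average constant-time lookup


store = ImageStore()
# Index entries hold only name, image hash key and priority.
entries = [
    ("Metallica", store.put(b"sample image bytes 1"), 8),
    ("Ensiferum", store.put(b"sample image bytes 2"), 4),
]
name, image_key, priority = entries[0]
print(name, priority, len(store.get(image_key)))
```

Keeping the index entries small this way means the images are only touched when a response actually has to include them.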