Lukas Schmelzeisen edited this page Sep 2, 2013 · 22 revisions

Description

The search component lets users find the content they are looking for.

Search is implemented internally with Solr (see Technologysolr).

Requirements fulfilled by this component

  • Fans (Information retrieval, fans want to be able to get easy access to information)
  • Other Websites (receive services from MetalCon)
  • General User (search for people, bands, venues)
  • Fans (get new music)
  • Fans (beginner-friendly, new users want to be able to find other people they might know easily)

Requirements for this component

Have a look at this UI mockup to get a basic understanding of requirements for the search component.

UI mockup for search

Download Pencil Document | Download SVG | Download PNG

  • Search: Users need to be able to find information relevant to their search query. Both precision (the percentage of returned documents that are relevant) and recall (the percentage of all relevant documents in the system that are returned) should be high. Users should be able to formulate complex search queries (e.g. AND/OR search, excluding certain phrases, etc.).
    • Search results are aggregated from both the Metalcon internal database (Persons, Bands, User Reviews, ...) and Metalcon's crawl of the metalweb. A suitable algorithm for merging the two result lists must be found.
  • Spellchecking / Query Suggestion: User-entered queries should be spellchecked. If it is likely that the user misspelled the query, one alternative should be displayed. Additionally, the system should guess what the user meant and present corresponding query suggestions.
  • Facet selection: Users need to be able to filter search results based on certain facet categories (Persons, Bands, Venues, Websites, ...). Have a look at Facebook search with faceting at the left for an example of this.
  • Facet display: For each search result item its facet should be displayed. In the current mockup this is just a letter abbreviating the facet category. In release this is probably going to be some form of image. Look at Metalcon's current search for an idea of this.
  • Semantic information display: If a specific facet category is detected for a search result, semantic information from the metalcon database should be displayed. See Section "Semantic Information".
  • Highlighting: When displaying search results, phrases the user explicitly queried need to be highlighted.
  • Paging: The users must not be overwhelmed by search results. Search results need to be truncated after a certain amount (probably 10). Further result batches need to be displayed if requested by the user.
  • AutoSuggestion (not shown in mockup): While the user enters a search query, automatic suggestions should be displayed based on common words, popular queries, past queries, ...
  • AutoImLucky: If we are fairly certain we know what the user wants, we should forward the user directly to a content page instead of showing search results (e.g. when the user enters the band name "Ensiferum", we forward directly to the "Ensiferum" Metalcon band page instead of showing search results for the query "Ensiferum"). Facebook also does this. However, there needs to be a mechanism for users to force the search result display if they do not want to be forwarded (in Facebook this is clicking the "magnifying glass" instead of just pressing "Enter").
  • API: Search needs to be accessible to external web pages through a public API.
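
The precision and recall goals in the Search requirement can be made concrete with a small calculation. The following is a minimal sketch, not part of the Metalcon code base; the function name and the document IDs are made up for illustration.

```python
def precision_recall(returned, relevant):
    """Compute precision and recall for a single query.

    returned: IDs of the documents the search returned.
    relevant: IDs of all documents in the system that are relevant.
    """
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant  # relevant documents that were actually returned
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents returned, 3 of them relevant,
# while the system holds 6 relevant documents in total.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "b", "c", "x", "y", "z"])
print(p, r)
```

A measure like this could be evaluated against a hand-labelled set of queries to compare ranking or merging strategies.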

Additional thoughts

  • Related Queries: Should we formulate related queries? Google at least offers "people also search for...". This is probably what we want in Spellchecking / Query Suggestion.
  • Graph Search: Facebook nowadays offers the possibility to search for friends who like and do ... We want something like graph search, but due to its complexity we most likely won't have this in 1.0.
  • Personalization: It should be possible for a user to full text search the content that he has created and it should also be possible for him to get results from his interests ranked higher.

Example Queries

Query for

  • Specific Persons: Lukas
  • Persons my friends know
  • Persons that share my taste in music
  • Persons by stuff they write on their profile
  • Specific Bands: Ensiferum
  • Bands my friends like
  • Bands similar to Bands I like
  • Records: From Afar
  • Records I own
  • Records I do not own from Bands I like
  • Records my Friends own
  • Specific Songs: Deathbringer from the Sky
  • Songs by lyrics
  • Specific Venues: Druckluftkammer
  • Specific Events: Metalfest
  • Events / Venues that my friends visit
  • Events close to another Event
  • Events playing music similar to another Event
  • Events playing music I like
  • Events / Venues in radius around a point (my current location).
  • Events in a Venue
  • new Venues
  • Websites: Metal.de
  • Websites related to a specific topic
  • Genres: Melodic Death Metal
  • Genres I listen to a lot
  • Genres my friends listen to a lot
  • Stuff happening on a date: 23.8. (different date formats)
  • Stuff happening on a relative date: heute, morgen, dienstag, wochenende (today, tomorrow, Tuesday, weekend)
  • Specific Reviews
  • Reviews for music I listen to
  • User Forums / Groups
  • Past User Posts
  • Past Private Messages
  • Metalcon functionality: Hilfe, Nachrichten (help, messages)
  • User settings
  • Merch from a specific band
  • ...

A list of facets and the semantic information we could display for each.

Facet | Semantic information
* (any facet) | Thumbnail, Number of people who like this, Number of friends who like this, Friends close to you in social graph who like this
Person | Common Friends, Number of friends, Favorite Music, Age, Location
Genres | Popular Bands
Band | Discography, Current Label, Genres, Short Bio, Popular Songs
Records | Band, Songs, Release Date, User Rating
Song | Band, Record, User Rating
Venue | Upcoming Events, Past Events, User Rating
Event | Venue, Date, Number of people participating
User Posts | Date, Author, Content
Website | Link, Content, Language

Do we want the user to control result sorting by criteria other than relevance? These options might differ between facets:

Facet | Sorting
Person | Number of common friends
Song | Length
Events / Venues | Distance to my location

API

The API takes requests with GET-Parameters and returns a JSON-Response.

GET-Parameters:

Parameter Name | Type | Default | Description
query | String | (none) | The query to search for.
itemid | Item-ID | "" | In case the user selected an entry from the autocomplete menu, the entry's Item-ID can be specified to increase search precision.
page | Int | 0 | Search results are returned in batches of 10. This parameter controls which page to return.
facet | Facet-ID | "all" | A facet to limit search results to.
pers.user | User-ID | (none) | Personalizes search results for this user.
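
As an illustration of how a client might assemble such a request, here is a minimal sketch. The endpoint URL and the function name are assumptions, since this page does not specify the real address; only the parameter names and defaults are taken from the table above.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real URL is not specified on this page.
SEARCH_ENDPOINT = "https://example.org/search"


def build_search_url(query, itemid="", page=0, facet="all", pers_user=None):
    """Build a search request URL from the GET parameters listed above."""
    params = {"query": query, "itemid": itemid, "page": page, "facet": facet}
    # pers.user has no default, so it is only sent when a user ID is known.
    if pers_user is not None:
        params["pers.user"] = pers_user
    return SEARCH_ENDPOINT + "?" + urlencode(params)


# Second page of band results for a multi-word query.
url = build_search_url("melodic death metal", facet="band", page=1)
print(url)
```

`urlencode` takes care of escaping spaces and special characters in the query string.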

JSON-Response

The response looks something like this. The actual response will not be pretty-printed as in this example, but stripped of all unnecessary characters.

{
    "request": {
        "query": "golf",
        "itemid": "",
        "page": 0,
        "facet": "all",
        "pers.user": "68fafdde-1427-4a2b-bf03-22d852a3b6c9"
    },
    "suggestedQueries": [
        {
            "query": "golf car",
            "itemid": "12428535-4744-48d3-9cd1-bc43a7b11fdb",
            "display": "<em>golf</em> car"
        },
        {
            "query": "golf sea",
            "itemid": "6cae52c3-ab6d-413c-b45c-f428fec88a6f",
            "display": "<em>golf</em> sea"
        },
        ...
    ],
    "facets": {
        "all": {
            "id": "all",
            "num": 123
        },
        "band": {
            "id": "band",
            "num": 2
        },
        ...
    },
    "docs": [
        {
            "id": "9213d4b8-08c6-4d5d-a0da-e506f11cac89",
            "facet": "band",
            ...
        },
        {
            "id": "ed78d5fe-e404-415d-8c3a-8f0b43c44b9e",
            "facet": "album",
            ...
        },
        ...
    ]
}

Explanation of fields:

Field name | Type | Description
request | Object | Mirrors the request the server received.
suggestedQueries | Array | A list of alternative queries the user might wish to search for. This includes spell-corrected user queries.
suggestedQueries[].query | String | The search query that should be submitted to the search server if the user chooses to search for the suggested query.
suggestedQueries[].itemid | Item-ID | An Item-ID to prevent ambiguity between similar queries, only used internally and not displayed to the user.
suggestedQueries[].display | String | The string to display to the user for the suggested query. This might not be the same as suggestedQueries[].query, for example because of highlighting.
facets | Object | All facets that contained search results for this query, keyed by Facet-ID. The facet all is always returned, and it is the union of all facets. It is returned even if the search yielded no results.
facets[].id | Facet-ID | An ID for the facet.
facets[].num | Int | Number of documents of this facet in the search results.
docs | Array | A list of search results returned for the query. One item of this list is called a document. The list contains at most 10 documents; to request more results use the page GET-Parameter. The number of pages available is always facets.all.num divided by 10, rounded up.
docs[].id | Entity-ID | ID unique to this document for reference with componentStaticDataDelivery.
docs[].facet | Facet-ID | The facet this document belongs to.

Facet | Document-Attributes | Description
TODO | TODO | TODO
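
A client consuming the response might look roughly like the following sketch. The response string is a shortened, hypothetical instance of the example above, and the sketch assumes the page count is the total number of results divided by the batch size of 10, rounded up.

```python
import json
import math

PAGE_SIZE = 10  # batch size stated on this page

# Shortened, hypothetical response in the shape of the example above.
raw = (
    '{"request":{"query":"golf","itemid":"","page":0,"facet":"all"},'
    '"suggestedQueries":[],'
    '"facets":{"all":{"id":"all","num":123},"band":{"id":"band","num":2}},'
    '"docs":[{"id":"9213d4b8-08c6-4d5d-a0da-e506f11cac89","facet":"band"},'
    '{"id":"ed78d5fe-e404-415d-8c3a-8f0b43c44b9e","facet":"album"}]}'
)

response = json.loads(raw)

# Assumption: pages needed = total results / page size, rounded up.
total = response["facets"]["all"]["num"]
num_pages = math.ceil(total / PAGE_SIZE)

# Group the returned document IDs by their facet for display.
docs_by_facet = {}
for doc in response["docs"]:
    docs_by_facet.setdefault(doc["facet"], []).append(doc["id"])

print(num_pages, docs_by_facet)
```

The document IDs collected here would then be used to look up the attributes to display, which this page leaves as TODO.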

API Open Questions

  • Images for suggested queries?
  • How are the facets returned sorted?
    • Fixed sorting order for facets (alphabetic, importance of facets)?
    • Ordered on number of results?
  • Do we have documents that could be part of multiple facets?
    • Does Solr/Lucene support this?

Search Open Questions

  • In order to display accompanying information for search results (e.g. semantic information) we could put this information into Solr's database and have it returned with each search query. But it is probably going to be easier and more performant if we use our own datastore for this. What technology would be suitable here? A key/value store should suffice.
  • It is not yet clear where language detection will happen. Solr definitely supports it, but I don't know the performance implications. Language detection should also be possible in Nutch, so it is probably a better idea to detect the language there and feed it into Solr.

Features

Technologies

Responsible Developer

  • Lukas