Skilo Search Engine

USTC School of Software Engineering, 2019 Engineering Practice Project

Skilo is a simple search engine implemented in C++ on Linux. It provides a RESTful API for creating collections with a corresponding schema, adding documents to a collection, and running word, phrase, and fuzzy searches.

Demo

A Chinese recipe search demo:

Documents

  • 0x01 Project overview
  • 0x02 Index implementation
  • 0x03 Schema implementation
  • 0x04 Query implementation

Features

  • Simple, easy-to-use RESTful API
  • Nested schema field support
  • Word search, phrase search, and fuzzy search with typo tolerance
  • Query by / sort by
  • Chinese language support
  • Auto suggestion

Build & Run

git clone https://github.com/demonatic/Skilo --recursive
cd ./Skilo
cmake .  # building the dependencies will take a while
make
make install
Skilo

Edit the "/etc/skilo/skilo.conf" file in the project root directory to change the database and log directories, the listen port, and other settings.

Usage Example

  • Create a collection

    POST /collections
    {
      "name":"recipe",
      "tokenizer":"jieba",
      "schema":{
          "type":"object",
          "$fields": {
              "recipe_name":{
                  "type":"string",
                  "index":true
              },
              "difficulty":{"type":"integer"},
              "rank":{"type":"integer"},
              "tips":{"type":"string"},
              "ingredients": {
                  "type": "array",
                  "$items": {
                      "type": "object",
                      "$fields":{
                          "note": {"type": "string"},
                          "title": {"type": "string", "index":true}
                      }
                  }
              },
              "steps": {
                  "type": "array",
                  "$items": {
                      "type": "object",
                      "$fields":{
                          "content": {"type": "string"},
                          "image": {"type": "string"}
                      }
                  }
              }
          }
      },
      "auto_suggestion":{
          "entry_num":5,
          "min_gram":2,
          "max_gram":15
      }
    }
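The schema above nests object members under `$fields` and array elements under `$items`. As a rough illustration of how that nesting relates to the dotted field paths used in queries (for example `ingredients.$items.title`), the Python sketch below walks a schema and lists the fields marked `"index": true`. The helper name `indexed_paths` is hypothetical, and the path convention is inferred from the query example in this README rather than taken from Skilo's source.

```python
from typing import Iterator


def indexed_paths(schema: dict, prefix: str = "") -> Iterator[str]:
    """Yield dotted paths of fields marked "index": true (hypothetical helper)."""
    kind = schema.get("type")
    if kind == "object":
        # Object members live under "$fields"; the member name extends the path.
        for name, sub in schema.get("$fields", {}).items():
            yield from indexed_paths(sub, f"{prefix}.{name}" if prefix else name)
    elif kind == "array":
        # Array elements are described once under "$items".
        item_prefix = f"{prefix}.$items" if prefix else "$items"
        yield from indexed_paths(schema.get("$items", {}), item_prefix)
    elif schema.get("index"):
        # Leaf field (string, integer, ...) that asked to be indexed.
        yield prefix


# Trimmed copy of the "recipe" schema shown above.
recipe_schema = {
    "type": "object",
    "$fields": {
        "recipe_name": {"type": "string", "index": True},
        "difficulty": {"type": "integer"},
        "ingredients": {
            "type": "array",
            "$items": {
                "type": "object",
                "$fields": {
                    "note": {"type": "string"},
                    "title": {"type": "string", "index": True},
                },
            },
        },
    },
}

print(list(indexed_paths(recipe_schema)))
# ['recipe_name', 'ingredients.$items.title']
```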
  • Add document(s) to collection

    POST /collections/<collection_name>

    Add a single document:

    {
        "id":1001,
        "recipe_name": "麻婆豆腐",
        "tips": "反正很好吃哦,而且做起来很简单呢",
        "difficulty":1,
        "rank":4,
        "ingredients": [{
            "note": "500克",
            "title": "豆腐"
        }, {
            "note": "150克",
            "title": "肉末"
        }],
        "steps": [{
                "content": "豆腐切一厘米见方的小块。",
                "image": "/recipe/image/1001/1.jpg"
            },{
                "content": "花椒和麻椒冷油下锅,慢火2-3分钟爆出香味后捞出扔掉。",
                "image": "/recipe/image/1001/2.jpg"
            },{
                "content": "锅里底油放入蒜末和郫县豆瓣酱小火翻炒1-2分钟。",
                "image": "/recipe/image/1001/3.jpg"
            },{
                "content": "然后放入肉末翻炒至熟,炒熟的肉末加入一小碗半开水煮2-3分钟。",
                "image": "/recipe/image/1001/4.jpg"
            },{
                "content": "然后加入豆腐块,不要用铲子翻板,轻轻的将豆腐推开即可,在煮4-5分钟,让豆腐完全入味。",
                "image": "/recipe/image/1001/5.jpg"
            },{
                "content": "出锅前加入少许淀粉水,让汤汁更加浓稠。",
                "image": "/recipe/image/1001/6.jpg"
            }
        ]
    }

    Add a batch of documents:

    {
        "docs":[
            <doc1>,
            <doc2>,
            <doc3>
        ]
    }
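A minimal client-side sketch of the two request shapes above, using only the Python standard library. It builds the request without sending it. The base URL (host and port), the `Content-Type` header, and the helper name `build_add_request` are assumptions made for illustration; the actual port comes from your skilo.conf.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8080"  # assumption: depends on the port set in skilo.conf


def build_add_request(collection: str, docs: list) -> Request:
    """Build (but do not send) a POST that adds one or more documents."""
    # A single document is sent as-is; several are wrapped in {"docs": [...]}.
    payload = docs[0] if len(docs) == 1 else {"docs": docs}
    return Request(
        url=f"{BASE_URL}/collections/{collection}",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption, not stated in this README
        method="POST",
    )


req = build_add_request(
    "recipe",
    [{"id": 1001, "recipe_name": "麻婆豆腐"}, {"id": 1002, "recipe_name": "麻婆豆腐"}],
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```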
    
  • Query a collection

    GET /collections/<collection_name>/documents

    For clients that do not support a GET request with a body, POST is also accepted:

    POST /collections/<collection_name>/documents
    {
        "query": "豆腐",
        "query by": ["recipe_name","ingredients.$items.title"],
        "boost": [2.5,1],
        "sort by":["difficulty:asc","rank:desc"]
    }
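The query body can be assembled programmatically. The sketch below is a hypothetical helper (`build_query` is not part of Skilo) that produces the same JSON keys as the example above. It assumes that `boost` values line up positionally with the `query by` fields, which is inferred from the example, where both lists have two entries.

```python
import json


def build_query(query, query_by, boost=None, sort_by=None) -> dict:
    """Assemble a query body with the keys used in the example above."""
    body = {"query": query, "query by": list(query_by)}
    if boost is not None:
        # Assumption: one boost value per "query by" field, in the same order.
        if len(boost) != len(query_by):
            raise ValueError("expected one boost value per 'query by' field")
        body["boost"] = list(boost)
    if sort_by:
        for _, order in sort_by:
            if order not in ("asc", "desc"):
                raise ValueError(f"unknown sort order: {order}")
        # Each sort key is written as "<field>:<order>".
        body["sort by"] = [f"{field}:{order}" for field, order in sort_by]
    return body


body = build_query(
    "豆腐",
    ["recipe_name", "ingredients.$items.title"],
    boost=[2.5, 1],
    sort_by=[("difficulty", "asc"), ("rank", "desc")],
)
print(json.dumps(body, ensure_ascii=False, indent=2))
```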
  • Query Result

    {
      "found": 2,
      "hits": [
        {
            "id": "1001",
            "recipe_name": "麻婆豆腐",
            "difficulty":1,
            "rank":4,
            ....
        },
        {
            "id": "1002",
            "recipe_name": "麻婆豆腐",
            "difficulty":2,
            "rank":3,
            ....
        }
      ],
      "scores":[14.86,5.32],
      "took secs": 0.000732
    }
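On the client side, the result can be read by pairing each hit with its score. This sketch assumes that `scores[i]` belongs to `hits[i]`, which the example suggests (both lists have two entries) but this README does not state explicitly. The sample data mirrors the response above with the elided fields left out, and `pair_hits_with_scores` is a hypothetical helper.

```python
sample_result = {
    "found": 2,
    "hits": [
        {"id": "1001", "recipe_name": "麻婆豆腐", "difficulty": 1, "rank": 4},
        {"id": "1002", "recipe_name": "麻婆豆腐", "difficulty": 2, "rank": 3},
    ],
    "scores": [14.86, 5.32],
    "took secs": 0.000732,
}


def pair_hits_with_scores(result: dict) -> list:
    """Return (document id, score) pairs, assuming the two lists are parallel."""
    hits, scores = result["hits"], result["scores"]
    if len(hits) != len(scores):
        raise ValueError("hits and scores differ in length")
    return [(hit["id"], score) for hit, score in zip(hits, scores)]


for doc_id, score in pair_hits_with_scores(sample_result):
    print(doc_id, score)
```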
    
  • Auto Suggestion

    List the top K hot queries that start with the given prefix <query_prefix>.

    GET /collections/<collection_name>/auto_suggestion?q=<query_prefix>

    For example, hot query suggestions that start with "红烧":

    {
      "suggestions": [
        "红烧肉",
        "红烧狮子头",
        "红烧带鱼"
      ]
    }
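Because the prefix travels in the query string, non-ASCII prefixes such as "红烧" need to be percent-encoded by the client. A small sketch using the Python standard library follows; the base URL and the helper name `suggestion_url` are assumptions for illustration.

```python
from urllib.parse import quote, urlencode


def suggestion_url(base_url: str, collection: str, prefix: str) -> str:
    """Build the auto-suggestion URL with a percent-encoded (UTF-8) prefix."""
    path = f"/collections/{quote(collection, safe='')}/auto_suggestion"
    return f"{base_url}{path}?{urlencode({'q': prefix})}"


# Assumption: the server listens on localhost:8080; use the port from skilo.conf.
url = suggestion_url("http://localhost:8080", "recipe", "红烧")
print(url)
# http://localhost:8080/collections/recipe/auto_suggestion?q=%E7%BA%A2%E7%83%A7
```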
  • Overall summary

    Show brief information about all collections.

    GET /collections/

    Response:

    {
      "collections": [
        {"name":"recipe","created at":"Mon Oct 2 00:59:08 2019","doc num":84231},
        {"name":"order","created at":"Thu Jun 18 15:48:19 2020","doc num":5652}
      ]
    }
  • Collection summary

    Show brief information (collection name, schema, tokenizer name, creation time, document count, ...) about the given collection.

    GET /collections/<collection_name>

Test

Unit tests and integration tests are based on the GoogleTest framework.

Reference

Beating hash tables with trees? The ART-ful radix trie

The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases

Frame of Reference and Roaring Bitmaps

现代信息检索

TypeSense Guide

ELASTICSEARCH 搜索的评分机制

Lucene系列(10)——相似度评分机制浅析(终篇)

Elasticsearch权威指南(中文版)

Autosuggest Retrieval Data Structures & Algorithms
