Change the persisted storage objects to the inverted index and scoring fields, based on the latest official source #35

Open
wants to merge 23 commits into
base: master
6 changes: 3 additions & 3 deletions README.md
@@ -1,8 +1,8 @@
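 Wukong Full-Text Search Engine
 ======

-* [Efficient indexing and search](/docs/benchmarking.md) (10M Weibo posts, 3.6 GB of data, indexed in 7 minutes; 2.5 ms search response time; 1.6K requests handled per second)
-* Chinese word segmentation (concurrent segmentation with the [sego segmentation package](https://github.com/huichen/sego), at 13 MB/s)
+* [Efficient indexing and search](/docs/benchmarking.md) (1M Weibo posts, 500 MB of data, indexed in 28 seconds; 1.65 ms search response time; 19K search QPS)
+* Chinese word segmentation (concurrent segmentation with the [sego segmentation package](https://github.com/huichen/sego), at 27 MB/s)
 * Computes the [token proximity](/docs/token_proximity.md) of keywords within a text
 * Computes [BM25 relevance](/docs/bm25.md)
 * Supports [custom scoring fields and scoring rules](/docs/custom_scoring_criteria.md)
@@ -11,7 +11,7 @@
 * Supports [distributed indexing and search](/docs/distributed_indexing_and_search.md)
 * Released under the business-friendly [Apache License v2](/license.txt)

-Weibo search demo http://soooweibo.com
+[Weibo search demo](http://vhaa7.fmt.tifan.net:8080/)

 # Install/Update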

70 changes: 70 additions & 0 deletions core/data.go
@@ -0,0 +1,70 @@
package core

import (
	"github.com/huichen/wukong/types"
	"sync"
)

// Document info, indexed as [shard][id]info
var DocInfoGroup = make(map[int]*types.DocInfosShard)
var docInfosGroupRWMutex sync.RWMutex

func AddDocInfosShard(shard int) {
	docInfosGroupRWMutex.Lock()
	defer docInfosGroupRWMutex.Unlock()
	if _, found := DocInfoGroup[shard]; !found {
		DocInfoGroup[shard] = &types.DocInfosShard{
			DocInfos: make(map[uint64]*types.DocInfo),
		}
	}
}

func AddDocInfo(shard int, docId uint64, docinfo *types.DocInfo) {
	docInfosGroupRWMutex.Lock()
	defer docInfosGroupRWMutex.Unlock()
	if _, ok := DocInfoGroup[shard]; !ok {
		DocInfoGroup[shard] = &types.DocInfosShard{
			DocInfos: make(map[uint64]*types.DocInfo),
		}
	}
	DocInfoGroup[shard].DocInfos[docId] = docinfo
	DocInfoGroup[shard].NumDocuments++
}

// func IsDocExist(docId uint64) bool {
// 	docInfosGroupRWMutex.RLock()
// 	defer docInfosGroupRWMutex.RUnlock()
// 	for _, docInfosShard := range DocInfoGroup {
// 		_, found := docInfosShard.DocInfos[docId]
// 		if found {
// 			return true
// 		}
// 	}
// 	return false
// }

// Inverted index table, indexed as [shard][keyword]inverted index entry
var InvertedIndexGroup = make(map[int]*types.InvertedIndexShard)
var invertedIndexGroupRWMutex sync.RWMutex

func AddInvertedIndexShard(shard int) {
	invertedIndexGroupRWMutex.Lock()
	defer invertedIndexGroupRWMutex.Unlock()
	if _, found := InvertedIndexGroup[shard]; !found {
		InvertedIndexGroup[shard] = &types.InvertedIndexShard{
			InvertedIndex: make(map[string]*types.KeywordIndices),
		}
	}
}

func AddKeywordIndices(shard int, keyword string, keywordIndices *types.KeywordIndices) {
	invertedIndexGroupRWMutex.Lock()
	defer invertedIndexGroupRWMutex.Unlock()
	if _, ok := InvertedIndexGroup[shard]; !ok {
		InvertedIndexGroup[shard] = &types.InvertedIndexShard{
			InvertedIndex: make(map[string]*types.KeywordIndices),
		}
	}
	InvertedIndexGroup[shard].InvertedIndex[keyword] = keywordIndices
	InvertedIndexGroup[shard].TotalTokenLength++
}