Welcome to mint-filter 👋

基于Aho–Corasick算法实现的敏感词过滤方案，Aho–Corasick算法是由Alfred V. Aho和Margaret J.Corasick 发明的字符串搜索算法，用于在输入的一串字符串中匹配有限组“字典”中的子串。它与普通字符串匹配的不同点在于同时与所有字典串进行匹配。算法均摊情况下具有近似于线性的时间复杂度，约为字符串的长度加所有匹配的数量。

实现详细说明（搜索算法未更新，请查看代码）：

个人博客：《TypeScript：Aho–Corasick算法实现敏感词过滤》

掘金社区：《TypeScript：Aho–Corasick算法实现敏感词过滤》

🏠 Homepage

1. 性能

运行性能

需要注意的是，实际生产环境运行速度会比下面测试数据更快。

使用20000个随机敏感词实例化的平均时间：< 96ms

测试字符串包含随机生成的汉字、字母、数字。以下测试均在20000个随机敏感词构建的树下进行测试，每组测试6次取平均值：

编号	字符串长度	不替换敏感词[replace:false]	替换敏感词
1	1000	< 1.35ms	< 1.55ms
2	5000	< 3.60ms	< 3.60ms
3	10000	< 8.10ms	< 9.81ms
4	20000	< 15.03ms	< 16.03ms
5	50000	< 20.83ms	< 21.18ms
6	100000	< 29.02ms	< 34.45ms

与DFA算法对比

	Aho-Corasick算法	DFA算法
算法效率	多模式串时表现良好	单模式串时表现良好
内存占用	需要更多内存空间	内存消耗小
匹配速度	非常快，具有线性时间复杂度	常量时间复杂度，匹配速度受限于自动机大小
实现难度	较为复杂，需要构建AC自动机并进行状态转移	实现简单，只需维护一个大的状态转移表
匹配多个字符串	高效，可同时搜索多个模式字符串	必须为每个字符串单独执行匹配操作
错误容错性	良好	不太好

2. 安装

yarn add mint-filter

3. 使用

CommonJS导入

const { Mint } = require('mint-filter')

TypeScript / ES Module引用

import Mint from 'mint-filter'
const mint = new Mint(['敏感词数组'])

// 基本使用
mint.filter('需要验证的文本')

4. 构造函数

constructor

• new Mint(keys)

参数

Name	Type
`keys`	`string`[]

定义于

index.ts:26

5. 方法

add

▸ add(key, build?): boolean

新增敏感词

Example

const status = mint.add('敏感词')

参数

Name	Type	Default value	Description
`key`	`string`	`undefined`	关键词
`build`	`boolean`	`true`	是否构建树，默认不用传递

返回

boolean

状态

定义于

index.ts:233

delete

▸ delete(key): "update" | "delete"

删除关键字

Example

const status = mint.delete('敏感词')

参数

Name	Type	Description
`key`	`string`	关键词

返回

"update" | "delete"

状态（update ｜ delete），告知用户是删除了树上的节点还是单纯的更新了节点

定义于

index.ts:169

filter

▸ filter(text, options?): FilterData

过滤文本

Example

mint.add('无法通过')
let status = mint.filter('这是一句无法通过的文本')
console.log(status) // { words: ["无法通过"], text: "这是一句****的文本" }

status = mint.filter('这是一句无法通过的文本', { replace: false })
console.log(status) // { words: ["无法通过"], text: "这是一句无法通过的文本" }

参数

Name	Type	Description
`text`	`string`	文本内容
`options?`	`Pick`<`FilterOptions`, `"replace"`>	-

返回

FilterData

FilterData

定义于

index.ts:134

verify

▸ verify(text): boolean

检测文本是否通过验证

Example

mint.add('无法通过')
const status = mint.verify('这是一句无法通过的文本')
console.log(status) // false

参数

Name	Type	Description
`text`	`string`	文本内容

返回

boolean

Boolean

定义于

index.ts:152

6. 测试

yarn run test

7. 作者

👤 ZhelinCheng

Website: https://zhelin.me
Github: @ZhelinCheng

🤝 Contributing

Contributions, issues and feature requests are welcome!
Feel free to check issues page. You can also take a look at the contributing guide.

Show your support

Give a ⭐️ if this project helped you!

Name		Name	Last commit message	Last commit date
Latest commit History 334 Commits
.github/workflows		.github/workflows
.husky		.husky
docs		docs
src		src
time		time
.editorconfig		.editorconfig
.eslintignore		.eslintignore
.eslintrc.json		.eslintrc.json
.gitignore		.gitignore
.npmignore		.npmignore
.prettierignore		.prettierignore
.prettierrc		.prettierrc
LICENSE		LICENSE
README.md		README.md
_README.md		_README.md
commitlint.config.js		commitlint.config.js
jest.config.js		jest.config.js
nodemon.json		nodemon.json
package.json		package.json
tsconfig.json		tsconfig.json
tsconfig.module.json		tsconfig.module.json
yarn.lock		yarn.lock

License

ZhelinCheng/mint-filter

Folders and files

Latest commit

History

Repository files navigation

Welcome to mint-filter 👋

🏠 Homepage

1. 性能

运行性能

与DFA算法对比

2. 安装

3. 使用

CommonJS导入

TypeScript / ES Module引用

4. 构造函数

constructor

参数

定义于

5. 方法

add

参数

返回

定义于

delete

参数

返回

定义于

filter

参数

返回

定义于

verify

参数

返回

定义于

6. 测试

7. 作者

🤝 Contributing

Show your support

📝 License

About

Topics

Resources

License

Stars

Watchers

Forks

Languages