[FEATURE] Support uploading HTML documents #364

Open
zexho994 opened this issue May 6, 2024 · 3 comments

Comments

zexho994 commented May 6, 2024

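MaxKB version

1.1.1

Please describe your requirement or suggested improvement

I would like MaxKB to support a table-aware way of splitting tables in HTML and Markdown documents, to improve the hit rate for table data.

Please describe your proposed implementation

No response

Additional information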

No response

@zexho994 zexho994 changed the title [FEATURE] [FEATURE] Support more segmentation methods for tables May 6, 2024
@baixin513
Contributor

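Thanks for the feedback. We can consider supporting HTML in a future release.
What exactly do you mean by a "friendly" way of splitting tables in Markdown files? I didn't quite follow.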

@baixin513 baixin513 changed the title [FEATURE] Support more segmentation methods for tables [FEATURE] Support uploading HTML documents May 6, 2024
Author

zexho994 commented May 7, 2024

For example, take a table in Markdown format:

| Month | Savings |
| --- | --- |
| January | $250 |
| February | $80 |
| March | $420 |

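Currently, whichever splitting method is used, once a table has been split the values in each row can no longer be associated with their columns, so the data loses its meaning.
If each row were instead emitted as Month=January, Savings=250, Month=February, Savings=80, every row of the table would keep its original semantics.
MaxKB already offers many segmentation rules. I hope this could become another optional splitting rule, because answers to questions about tables are not very good in practice.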
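
To make the request concrete, here is a minimal sketch of the row-level splitting described above. It is not MaxKB code: the function name `split_markdown_table` is hypothetical, and it assumes a simple pipe table with a header row, a delimiter row, and no escaped `|` characters inside cells.

```python
def split_markdown_table(table_text: str) -> list[str]:
    """Turn each data row of a Markdown pipe table into a 'header=value' segment.

    Hypothetical helper for illustration only; not part of MaxKB.
    """
    def parse_row(line: str) -> list[str]:
        # Drop the outer pipes, split on the inner ones, and trim each cell.
        return [cell.strip() for cell in line.strip().strip("|").split("|")]

    # Ignore blank lines around the table.
    lines = [ln for ln in table_text.splitlines() if ln.strip()]
    if len(lines) < 2:
        return []  # no header + delimiter row, so nothing to split

    headers = parse_row(lines[0])
    segments = []
    for line in lines[2:]:  # lines[1] is the |---|---| delimiter row
        cells = parse_row(line)
        # Pair every cell with its column header so the row keeps its meaning.
        segments.append(", ".join(f"{h}={c}" for h, c in zip(headers, cells)))
    return segments


table = """
| Month | Savings |
| --- | --- |
| January | $250 |
| February | $80 |
| March | $420 |
"""

for segment in split_markdown_table(table):
    print(segment)
```

Each printed segment carries its own column names (for example `Month=January, Savings=$250`), so a retrieved row can still be understood even when the header row ends up in a different segment.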

@baixin513
Contributor

When a Markdown-style table is imported, the table is split according to its Markdown structure, as shown in the screenshot below.
image

In my test, as long as the question hits the segment, the answer is also correct. Which model are you using?
image
