Add PostgreSQL Chinese full-text search: zhparser/jieba/pgroonga #59

Open
davidlauhn opened this issue Aug 1, 2019 · 22 comments

Comments

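@davidlauhn

PostgreSQL's built-in full-text search does not support Chinese, which makes searching Chinese text in TTRSS basically unusable. Are there any plans to add zhparser/jieba/pgroonga or something similar?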

@davidlauhn davidlauhn changed the title from "Any plans to add a Chinese full-text search plugin (zhparser, jieba, or pgroonga) to PostgreSQL in TTRSS?" to "Any plans to add a Chinese full-text search plugin (zhparser/jieba/pgroonga) to PostgreSQL in TTRSS?" Aug 1, 2019
@HenryQW
Owner

HenryQW commented Aug 6, 2019

There are no plans for this. It looks like it would require changing TTRSS's search logic. Does MySQL have the same problem?

@HenryQW
Owner

HenryQW commented Aug 6, 2019

I recommend doing full-text search in the reader client instead, for example Reeder.

@davidlauhn
Author

I haven't tried MySQL. I'll experiment with it on my own, thanks.

@HenryQW HenryQW closed this as completed Aug 7, 2019
@jostyee

jostyee commented Aug 16, 2019

https://discourse.tt-rss.org/t/solved-search-in-chinese/2241/2
If I understand correctly, Tiny Tiny RSS already supports this: once pgroonga is configured, you can set the default language for global search?

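@davidlauhn
Author

> https://discourse.tt-rss.org/t/solved-search-in-chinese/2241/2
> If I understand correctly, Tiny Tiny RSS already supports this: once pgroonga is configured, you can set the default language for global search?

I'm too inexperienced to figure out how to configure pgroonga, so I got it working with zhparser instead.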

@HenryQW
Owner

HenryQW commented Aug 16, 2019

@davidlauhn Could you share your solution? I'll see whether I can add it. Or a PR would be perfect!

@HenryQW
Owner

HenryQW commented Aug 16, 2019

@jostyee I don't understand how DEFAULT_SEARCH_LANGUAGE is supposed to be used. I tried the approach from that thread and it still doesn't work.

@HenryQW HenryQW reopened this Aug 16, 2019
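@davidlauhn
Author

davidlauhn commented Aug 17, 2019

@HenryQW I'm not a programmer or a sysadmin. Everything below is copy/paste: I know that it works but not why, and it may not be entirely accurate. I can't take questions because I really don't understand it, sorry :-)

I modified two Docker images.

docker-compose.yml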

services:
  database.postgres:
    image: davidlauhn/postgres-11-with-zhparser:latest
    container_name: postgres
    environment:
      - PG_PASSWORD=password # please change the password
      - DB_EXTENSION=pg_trgm
    volumes:
      - ~/postgres/data/:/var/lib/postgresql/ # persist postgres data to ~/postgres/data/ on the host
    restart: always

  service.rss:
    image: davidlauhn/awesome-ttrss:latest
    container_name: ttrss
    ports:
      - 80:80
    environment:
      - SELF_URL_PATH=http://domain.name/ # please change to your own domain
      - DB_HOST=database.postgres
      - DB_PORT=5432
      - DB_NAME=ttrss
      - DB_USER=postgres
      - DB_PASS=password # please change the password
      - ENABLE_PLUGINS=auth_internal, fever # auth_internal is required. Plugins enabled here will be enabled for all users as system plugins
    stdin_open: true
    tty: true
    restart: always
    command: sh -c 'sh /wait-for.sh database.postgres:5432 -- php /configure-db.php && exec s6-svscan /etc/s6/'

  service.mercury: # set Mercury Parser API endpoint to `service.mercury:3000` on TTRSS plugin setting page
    image: wangqiru/mercury-parser-api:latest
    container_name: mercury
    expose:
      - 3000
    restart: always

  service.opencc: # set OpenCC API endpoint to `service.opencc:3000` on TTRSS plugin setting page
    image: wangqiru/opencc-api-server:latest
    container_name: opencc
    environment:
      NODE_ENV: production
    expose:
      - 3000
    restart: always

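Then configure zhparser:

    # open a shell inside the postgres container
    docker exec -it postgres /bin/sh
    # install the extension and define a text search configuration named Chinese
    psql -U postgres -d ttrss -c 'CREATE EXTENSION zhparser'
    psql -U postgres -d ttrss -c 'CREATE TEXT SEARCH CONFIGURATION Chinese (PARSER = zhparser)'
    psql -U postgres -d ttrss -c 'ALTER TEXT SEARCH CONFIGURATION Chinese ADD MAPPING FOR n,v,a,i,e,l WITH simple'
    # rebuild the search index of existing articles with the new configuration
    psql -U postgres -d ttrss -c "UPDATE ttrss_entries SET tsvector_combined = to_tsvector('Chinese', content)"

Restart PostgreSQL, then change the TTRSS search language to Chinese.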

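Chinese search is usable, but word segmentation seems slightly off: zhparser splits long words into shorter ones for matching. The default zhparser configuration probably still needs tuning, but my requirements are modest, so I'm making do with it.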
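
To see how a text search configuration actually segments a piece of text, PostgreSQL's built-in to_tsvector and ts_debug functions can be run against a sample string. The following is a minimal sketch and was not posted in the thread: segmentation_check_sql is a hypothetical helper that only prints the SQL, and the container, user, and database names in the usage comment are assumed to match the compose file above.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print SQL that shows how a
# text search configuration segments a sample string.
#   $1 = configuration name (the thread uses "Chinese")
#   $2 = sample text
segmentation_check_sql() {
  cfg=$1
  # Double any single quotes so the sample is a valid SQL string literal.
  sample=$(printf '%s' "$2" | sed "s/'/''/g")
  # The lexemes that would be stored in the index for this text.
  printf "SELECT to_tsvector('%s', '%s');\n" "$cfg" "$sample"
  # Token-by-token view: token type, raw token, resulting lexemes.
  printf "SELECT alias, token, lexemes FROM ts_debug('%s', '%s');\n" "$cfg" "$sample"
}

# Usage (assumed names: container "postgres", user "postgres", database "ttrss"):
#   segmentation_check_sql Chinese '中文全文搜索' \
#     | docker exec -i postgres psql -U postgres -d ttrss
```

Running the same sample before and after changing zhparser settings would show whether the tuning changes how long words are split.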

@jostyee

jostyee commented Aug 17, 2019

@davidlauhn Enabling zhparser doesn't have to be that much work; sameersbn/postgresql supports enabling extensions through an environment variable:

https://github.com/sameersbn/docker-postgresql#enabling-extensions

@davidlauhn
Author

@jostyee I'd rather not go to this much trouble either, but I don't know any better, so I just followed the instructions step by step :-)

@HenryQW
Owner

HenryQW commented Aug 17, 2019

@jostyee zhparser also needs its dependencies installed, so it can't simply be enabled that way.

@stale

stale bot commented Sep 12, 2019

This issue has been automatically marked as stale because it has not had recent activity in 14 days. It will be closed if no further activity occurs in 7 days. Thank you for your contributions.

@stale stale bot added the wontfix label Sep 12, 2019
@stale stale bot closed this as completed Sep 19, 2019
@HenryQW HenryQW reopened this Dec 20, 2019
@stale stale bot removed the wontfix label Dec 20, 2019
@HenryQW
Owner

HenryQW commented Dec 20, 2019

I'll look into the feasibility when I have time. PRs are welcome!

@HenryQW HenryQW changed the title from "Any plans to add a Chinese full-text search plugin (zhparser/jieba/pgroonga) to PostgreSQL in TTRSS?" to "Add PostgreSQL Chinese full-text search: zhparser/jieba/pgroonga" Dec 20, 2019
@stale stale bot added the wontfix label Jan 3, 2020
@HenryQW HenryQW removed the wontfix label Jan 3, 2020
Repository owner deleted a comment from stale bot Jan 3, 2020
@stale

stale bot commented Jan 17, 2020

This issue has been automatically marked as stale because it has not had recent activity in 14 days. It will be closed if no further activity occurs in 7 days. Thank you for your contributions.

@stale stale bot added the wontfix label Jan 17, 2020
@HenryQW HenryQW pinned this issue Jan 21, 2020
@stale stale bot removed the wontfix label Jan 21, 2020
@HenryQW HenryQW unpinned this issue Jan 21, 2020
@hoilc

hoilc commented Feb 9, 2020

I put together a quick modified version; feel free to try it if you're interested.

postgres image: hoilc/postgres-chinese-textsearch:latest

ttrss image: hoilc/ttrss:latest; you need to add the environment variable TEXTSEARCH_EXTENSION=pg_jieba,zhparser

https://github.com/hoilc/Awesome-TTRSS/blob/master/docker-compose.yml

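@ptsa

ptsa commented May 30, 2020

> I put together a quick modified version; feel free to try it if you're interested.
>
> postgres image: hoilc/postgres-chinese-textsearch:latest
>
> ttrss image: hoilc/ttrss:latest; you need to add the environment variable TEXTSEARCH_EXTENSION=pg_jieba,zhparser
>
> https://github.com/hoilc/Awesome-TTRSS/blob/master/docker-compose.yml

@HenryQW If this works well, it could be merged here. Chinese search in TTRSS really doesn't work.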

@HenryQW
Owner

HenryQW commented May 30, 2020

Could you open a PR? I've been too busy lately.

@ptsa

ptsa commented May 30, 2020

@hoilc Please submit a PR.
@HenryQW He also modified the PostgreSQL image, so you'll probably need to fork his PostgreSQL repository: https://github.com/hoilc/postgres-chinese-textsearch

@ptsa

ptsa commented Jun 30, 2020

@hoilc didn't submit a PR, so I copied his code and submitted one.

@0rt

0rt commented Apr 1, 2021

Has this change been merged into latest? I tried searching in Chinese and it still doesn't work.

@ptsa

ptsa commented Apr 2, 2021

@0rt My submission didn't go through. I probably did it the wrong way.

@appotry

appotry commented Apr 12, 2023

I got an up-to-date build of postgres-chinese-textsearch working:

postgres-chinese-textsearch
https://hub.docker.com/r/bloodstar/postgres-chinese-textsearch

version: "3"
services:
  service.rss:
    image: bloodstar/ttrss:latest
    container_name: ttrss
    ports:
      - 181:80
    environment:
      - SELF_URL_PATH=http://localhost:181/ # please change to your own domain
      - DB_HOST=database.postgres
      - DB_PORT=5432
      - DB_NAME=ttrss
      - DB_USER=postgres
      - DB_PASS=ttrss # please change the password
      - PUID=1000
      - PGID=1000
      - TEXTSEARCH_EXTENSION=pg_jieba # add support for Chinese full-text search (pg_jieba, zhparser, or both)
    volumes:
      - feed-icons:/var/www/feed-icons/
    networks:
      - public_access
      - service_only
      - database_only
    stdin_open: true
    tty: true
    restart: always

  service.mercury: # set Mercury Parser API endpoint to `service.mercury:3000` on TTRSS plugin setting page
    image: wangqiru/mercury-parser-api:latest
    container_name: mercury
    networks:
      - public_access
      - service_only
    restart: always

  service.opencc: # set OpenCC API endpoint to `service.opencc:3000` on TTRSS plugin setting page
    image: wangqiru/opencc-api-server:latest
    container_name: opencc
    environment:
      - NODE_ENV=production
    networks:
      - service_only
    restart: always

  # database.postgres:
  #   image: postgres:13-alpine
  #   container_name: postgres
  #   environment:
  #     - POSTGRES_PASSWORD=ttrss # feel free to change the password
  #   volumes:
  #     - ~/postgres/data/:/var/lib/postgresql/data # persist postgres data to ~/postgres/data/ on the host
  #   networks:
  #     - database_only
  #   restart: always

  database.postgres:
    image: bloodstar/postgres-chinese-textsearch:latest
    container_name: postgres
    environment:
      - POSTGRES_PASSWORD=ttrss # please change the password
    volumes:
      - ~/postgres/data/:/var/lib/postgresql/data # persist postgres data to ~/postgres/data/ on the host
    networks:
      - database_only # must share a network with service.rss, which is not attached to the default network
    restart: always

  # utility.watchtower:
  #   container_name: watchtower
  #   image: containrrr/watchtower:latest
  #   volumes:
  #     - /var/run/docker.sock:/var/run/docker.sock
  #   environment:
  #     - WATCHTOWER_CLEANUP=true
  #     - WATCHTOWER_POLL_INTERVAL=86400
  #   restart: always

volumes:
  feed-icons:

networks:
  public_access: # Provide the access for ttrss UI
  service_only: # Provide the communication network between services only
    internal: true
  database_only: # Provide the communication between ttrss and database only
    internal: true
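
Once the stack is up, one way to confirm that the database image really provides the extensions named in TEXTSEARCH_EXTENSION is to query PostgreSQL's pg_available_extensions view and pg_ts_config catalog. This is a minimal sketch under assumptions, not part of the posted setup: extension_check_sql is a hypothetical helper that only prints the SQL, and the container, user, and database names in the usage comment are taken from the compose file above.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print SQL that reports whether
# the given extensions are available/installed and which text search
# configurations exist in the current database.
#   $1 = comma-separated extension names, e.g. "pg_jieba,zhparser"
extension_check_sql() {
  # Strip whitespace, then turn a,b into a','b so it fits inside IN ('...').
  list=$(printf '%s' "$1" | sed "s/[[:space:]]//g; s/,/','/g")
  # installed_version is NULL when the extension is available but not created.
  printf "SELECT name, installed_version FROM pg_available_extensions WHERE name IN ('%s');\n" "$list"
  # Names usable as the search language / to_tsvector configuration.
  printf "SELECT cfgname FROM pg_ts_config;\n"
}

# Usage (assumed names: container "postgres", user "postgres", database "ttrss"):
#   extension_check_sql "pg_jieba,zhparser" \
#     | docker exec -i postgres psql -U postgres -d ttrss
```

An empty first result would suggest the image does not ship the extension at all, while a NULL installed_version would suggest it is available but was never created in the ttrss database.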
