Search messages in conversation by keyword #675

Open
gabriel-vasile opened this issue Jul 13, 2021 · 14 comments

Comments

@gabriel-vasile
Contributor

No description provided.

@gabriel-vasile
Contributor Author

I guess there is no ETA for this feature and no plans to ever implement it, but I'm willing to contribute the server code for it.
Please tell me how you think this should be done, with some references to the code, and I'll open a PR.

@or-else
Contributor

or-else commented Jul 19, 2021

Can you write up a proposal? Something along the lines of:

  1. Features, such as support/no support for
    a. Grammar, such as stemming.
    b. Chinese and other languages which require word segmentation.
    c. Language detection.
    d. What to do when the language is not supported.
  2. Location of query rewrite if any of the features from 1 require it.
  3. Changes to the external API
  4. DB organization, index structure.
  5. Cluster mode changes

@gabriel-vasile
Contributor Author

I was thinking about integrating something like Elasticsearch.
Writing a search engine from scratch is not a PR; it's a full-time job for a team.

@or-else
Contributor

or-else commented Jul 22, 2021

In case of an external search provider I think the following is needed:

  1. Client-side API for sending search queries to the server and getting search results.
  2. Server-side API (plugin) for sending messages to the provider for indexing, sending queries and getting responses.
  3. At least one client with UI for creating queries and showing results.
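
A minimal sketch of what the server-side plugin API from item 2 could look like. Everything here is hypothetical: `Message`, `SearchProvider`, and `memoryProvider` are illustrative names, not existing Tinode types, and the in-memory substring match only stands in for a real provider such as Elastic.

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a hypothetical, simplified view of a chat message to be indexed.
type Message struct {
	Topic   string
	SeqID   int
	Content string
}

// SearchProvider is a hypothetical plugin interface: the server pushes
// messages in for indexing and forwards client queries to Search.
type SearchProvider interface {
	Index(msg Message) error
	Search(topic, query string) ([]Message, error)
}

// memoryProvider is a toy provider that keeps messages in memory and
// matches by case-insensitive substring. It stands in for an external engine.
type memoryProvider struct {
	msgs []Message
}

func (p *memoryProvider) Index(msg Message) error {
	p.msgs = append(p.msgs, msg)
	return nil
}

func (p *memoryProvider) Search(topic, query string) ([]Message, error) {
	var hits []Message
	q := strings.ToLower(query)
	for _, m := range p.msgs {
		if m.Topic == topic && strings.Contains(strings.ToLower(m.Content), q) {
			hits = append(hits, m)
		}
	}
	return hits, nil
}

func main() {
	var sp SearchProvider = &memoryProvider{}
	_ = sp.Index(Message{Topic: "grp1", SeqID: 1, Content: "Hello world"})
	_ = sp.Index(Message{Topic: "grp1", SeqID: 2, Content: "Lunch at noon?"})

	hits, err := sp.Search("grp1", "hello")
	fmt.Println(len(hits), err)
}
```

Restricting `Search` to a single topic mirrors the issue title (search within a conversation); a real API would also need access checks and paging.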

@gabriel-vasile
Contributor Author

About the interaction between tinode and the search provider, there are two approaches to indexing:

  1. using a plugin, like you said
  2. let the search engine do it

For 1. there is the issue with existing messages. There needs to be a way to index all existing messages in case the index is lost, was just initialized, or for any other reason.
For 2., with Elastic at least, indexing is easily solved with a pipeline. Elastic supports MySQL, MongoDB, and RethinkDB as data sources. Users need to provide a pipeline and Elastic will periodically query the database for new messages and index them. I'm not sure other search providers have this feature.
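
Both cases (rebuilding a lost index and periodically picking up new messages) can be sketched as the same checkpointed loop. This is an illustration only: `StoredMessage`, `fetchAfter`, and `Reindex` are hypothetical names, the slice stands in for a database table sorted by ID, and the `index` callback stands in for a call to the search provider.

```go
package main

import "fmt"

// StoredMessage is a hypothetical row from the message table.
type StoredMessage struct {
	ID      int
	Content string
}

// fetchAfter stands in for a database query: it returns up to limit
// messages with ID greater than after. The store is assumed sorted by ID.
func fetchAfter(store []StoredMessage, after, limit int) []StoredMessage {
	var batch []StoredMessage
	for _, m := range store {
		if m.ID > after && len(batch) < limit {
			batch = append(batch, m)
		}
	}
	return batch
}

// Reindex pages through the store in batches, starting after checkpoint,
// and passes each message to index. It returns the new checkpoint.
// A checkpoint of 0 rebuilds the whole index; batchSize must be positive.
func Reindex(store []StoredMessage, checkpoint, batchSize int, index func(StoredMessage) error) (int, error) {
	for {
		batch := fetchAfter(store, checkpoint, batchSize)
		if len(batch) == 0 {
			return checkpoint, nil
		}
		for _, m := range batch {
			if err := index(m); err != nil {
				return checkpoint, err
			}
			checkpoint = m.ID
		}
	}
}

func main() {
	store := []StoredMessage{{ID: 1, Content: "hi"}, {ID: 2, Content: "lunch?"}, {ID: 3, Content: "sure"}}
	indexed := 0
	index := func(m StoredMessage) error { indexed++; return nil }

	// Full rebuild from an empty index.
	cp, _ := Reindex(store, 0, 2, index)
	fmt.Println(indexed, cp)

	// Later poll: only messages after the checkpoint are sent.
	store = append(store, StoredMessage{ID: 4, Content: "see you"})
	cp, _ = Reindex(store, cp, 2, index)
	fmt.Println(indexed, cp)
}
```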

I think we should first decide if we are going to support more than one search engine and which ones in particular. In my use case, supporting just Elastic is fine and it would make the implementation of this feature so much easier.

@or-else
Contributor

or-else commented Jul 23, 2021

I would separate the concerns of starting a new service from scratch vs upgrading an existing service with message search.

I do see value in having Elastic or any other provider going to the DB directly. It also has drawbacks. For example, if we implement any sort of encryption at rest (a feature some people want) then the direct intake from the DB won't work.

> I think we should first decide if we are going to support more than one search engine

I think there should be a choice. It does not need to be implemented immediately; a single provider is a good start. But there is value in an abstraction layer. Tinode is frequently used in organizations with an established infrastructure. If they use Solr or Algolia then it would be a harder decision if Tinode supports Elastic only.
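
One way to get such an abstraction layer while shipping a single provider first is a registry keyed by a name from the config. The sketch below is an assumption about how that might look, not existing Tinode code: `Searcher`, `Register`, and `Open` are hypothetical names, and the `elastic` and `solr` types are empty placeholders.

```go
package main

import "fmt"

// Searcher is a hypothetical minimal provider interface.
type Searcher interface {
	Name() string
}

// registry maps a provider name (e.g. from the server config) to a factory.
var registry = map[string]func() Searcher{}

// Register makes a provider available under the given name.
func Register(name string, factory func() Searcher) {
	if _, dup := registry[name]; dup {
		panic("search provider registered twice: " + name)
	}
	registry[name] = factory
}

// Open returns a new instance of the named provider.
func Open(name string) (Searcher, error) {
	factory, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("unknown search provider %q", name)
	}
	return factory(), nil
}

// Placeholder providers; real ones would wrap the engine's client library.
type elastic struct{}

func (elastic) Name() string { return "elastic" }

type solr struct{}

func (solr) Name() string { return "solr" }

func main() {
	Register("elastic", func() Searcher { return elastic{} })
	Register("solr", func() Searcher { return solr{} })

	s, err := Open("solr")
	fmt.Println(s.Name(), err)

	_, err = Open("algolia")
	fmt.Println(err)
}
```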

@gabriel-vasile
Contributor Author

I guess with 'encryption at rest' you mean end-to-end encryption and not just server-side encryption.
If that's the case, then there is no other choice but to let the clients do the search.
Sorry, but I think I'll have to drop working on this as I'm not really familiar with any of the client SDKs or their languages.

@or-else reopened this Aug 3, 2021
@or-else
Contributor

or-else commented Aug 3, 2021

This is a useful feature. No need to close even if you don't want to work on it.

I meant what I said: encryption at rest.

@gabriel-vasile
Contributor Author

gabriel-vasile commented Aug 5, 2021

> I meant what I said: encryption at rest.

What you said is not clear enough. You can have end-to-end encryption (clients have the encrypt/decrypt keys) or server-side encryption (the server has the encrypt/decrypt key). In both cases the data is encrypted "at rest". But with server-side encryption the server has access to the plain, unencrypted data and can search through it; with end-to-end encryption it doesn't.

@rkgarcia
Contributor

rkgarcia commented Aug 19, 2021

What about using the databases' built-in full-text search? With end-to-end encryption, the search must be done on the client side.

@or-else
Contributor

or-else commented Aug 19, 2021

> What about using the databases' built-in full-text search?

RethinkDB does not have it at all. Mongo has no support for CJK: it can't split words. FTS in all three databases is mostly useless for heavily inflected languages.

So, it can be done for English with MySQL and maybe with Mongo but it will suck.
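
The word-splitting problem is easy to reproduce. The sketch below uses a naive whitespace tokenizer as a stand-in for an engine without word segmentation; it is an illustration only and does not reproduce how any of the databases above tokenize. `tokenize` and `hasToken` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenize splits text on whitespace, the way a naive FTS tokenizer would.
func tokenize(text string) []string {
	return strings.Fields(strings.ToLower(text))
}

// hasToken reports whether term matches a whole token of text.
func hasToken(text, term string) bool {
	term = strings.ToLower(term)
	for _, tok := range tokenize(text) {
		if tok == term {
			return true
		}
	}
	return false
}

func main() {
	english := "I like full text search"
	chinese := "我喜欢全文搜索" // "I like full-text search", written without spaces

	// English splits into words, so the term is found.
	fmt.Println(len(tokenize(english)), hasToken(english, "search")) // 5 true

	// The Chinese sentence stays one token, so the word "搜索" (search) is missed.
	fmt.Println(len(tokenize(chinese)), hasToken(chinese, "搜索")) // 1 false
}
```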

@or-else
Contributor

or-else commented Aug 19, 2021

Elastic, Sphinx, or Solr is not a bad idea.

@ice-myles

Are there any planned release dates for the full-text search and encryption at rest features?
They are shown here in the planned section.

@or-else
Contributor

or-else commented Nov 30, 2022

No. @ice-myles are you willing to help?

6 participants
@rkgarcia @or-else @gabriel-vasile @ice-myles and others