This repository has been archived by the owner on Feb 22, 2020. It is now read-only.

How to use GNES for text classification? #358

Open
ilham-bintang opened this issue Oct 25, 2019 · 2 comments

Comments

ilham-bintang commented Oct 25, 2019

Problem and Question

Hi, I have taken a look at the poem project and want to index a different dataset. How can I use a labeled CSV to do supervised learning on text data?

My data, sample.tsv, with Thai text:

intent	question
ClassA	FAQ 1? ចូលទៅប្រើប្រាស់កម្មវិធីនេះ?
ClassA	Another FAQ Question similar to FAQ 1?
ClassB	TestQuestion with thai text ចូលទៅប្រើប្រាស់កម្មវិធីនេះ?
ClassB	Another data sample

What I have tried

  1. I tried passing a Pandas Series, and it raised a gRPC error.
  2. I tried passing a tuple of (intent, question), and it also raised a gRPC error.
  3. I tried indexing only the question, converting each str into bytes. This ran without a gRPC error, but it logged the following:
W:EncoderService:[enc:emb: 42]:document (doc_id=20) contains no chunks!
W:IndexerService:[ind:_ha: 57]:document (doc_id=10) contains no chunks!
W:EncoderService:[enc:emb: 42]:document (doc_id=22) contains no chunks!
W:IndexerService:[ind:_ha: 57]:document (doc_id=12) contains no chunks!
W:EncoderService:[enc:emb: 42]:document (doc_id=24) contains no chunks!
W:IndexerService:[ind:_ha: 57]:document (doc_id=16) contains no chunks!
E:EncoderService:[enc:emb: 67]:can't convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
W:EncoderService:[enc:emb: 68]:encoder service throws an exception, the sequel pipeline may not work properly
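
For reference, the third attempt (indexing only the `question` column as bytes) can be sketched roughly as below. This is a minimal sketch that assumes the two-column, tab-separated layout of `sample.tsv` shown above; `load_questions` is a hypothetical helper name, and the sketch deliberately stops before any GNES call, since the exact client API is not shown in this thread.

```python
import csv
import io


def load_questions(tsv_text):
    """Return (labels, docs): intent labels and UTF-8 encoded questions.

    Hypothetical helper; assumes a header row with `intent` and `question`.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    labels, docs = [], []
    for row in reader:
        question = (row.get("question") or "").strip()
        if not question:
            # Skip rows without question text: there is nothing to index.
            continue
        labels.append(row["intent"])
        docs.append(question.encode("utf-8"))
    return labels, docs


sample = (
    "intent\tquestion\n"
    "ClassA\tAnother FAQ Question similar to FAQ 1?\n"
    "ClassB\tAnother data sample\n"
    "ClassB\t\n"
)
labels, docs = load_questions(sample)
print(labels)
print(docs)
```

Keeping the labels in a separate list, aligned by position with the documents, makes it possible to look up the intent of whichever question a later search returns.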

Question

How can I train on the labeled CSV data?

ilham-bintang changed the title from "How to encode the Thai text properly?" to "How to encode text properly?" on Oct 25, 2019
ilham-bintang changed the title from "How to encode text properly?" to "How to use GNES for text classification?" on Oct 25, 2019
ilham-bintang reopened this on Oct 25, 2019
Collaborator

hanxiao commented Oct 26, 2019

Hi. Short answer: it is not straightforward. Long answer: let me explain.

GNES, as its name suggests, focuses on the search scenario. With the recent release of GNES Flow, it becomes more obvious that GNES is to some extent similar to Kubeflow/Airflow: it provides a cloud-native workflow for AI-powered microservices. However, the major difference is that GNES' workflow is designed and optimized for the search scenario only.

If you look at the GNES components and predefined flows, they are completely search-driven. This is because, from day one, this project has been designed to be the next-gen search engine, nothing else.

So if you ask me whether it can be used for classification, clustering, recommendation, etc.: maybe it can be done easily, maybe one needs more components or flows. To be honest, I haven't put much thought into these tasks, not as much as I have put into search (which is also where my experience lies). Meanwhile, I do welcome people to contribute their ideas on this thread, in particular:

  • What other components besides Encoder, Router, Preprocessor, and Indexer are required to achieve the classification task?
  • What does the classification workflow look like? Can we represent it using GNES Flow?
  • What are the responses to the client in this task? Are they describable via the current response protobuf?

Once these questions are figured out, you will get the answer.
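
One possible shape for such a workflow, offered only as an idea and not as an existing GNES feature: index the labeled questions as usual, run a normal search for the incoming text, and let the labels of the top-k hits vote. The sketch assumes the search step has already returned `(label, score)` pairs; `classify_by_vote` is a hypothetical helper.

```python
from collections import defaultdict


def classify_by_vote(hits, k=3):
    """Pick the label with the highest summed score among the top-k hits.

    `hits` is a list of (label, score) pairs; a higher score means more similar.
    """
    top = sorted(hits, key=lambda hit: hit[1], reverse=True)[:k]
    totals = defaultdict(float)
    for label, score in top:
        totals[label] += score
    if not totals:
        return None  # No hits, so no prediction.
    return max(totals, key=totals.get)


# Made-up search results, for illustration only.
hits = [("ClassA", 0.91), ("ClassB", 0.88), ("ClassA", 0.75), ("ClassB", 0.20)]
print(classify_by_vote(hits))
```

In this reading, the only classification-specific piece is the final vote, which could live on the client side and leave the search pipeline itself unchanged.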

@ilham-bintang
Author

Hi @hanxiao,
First of all, thank you for your contribution to this great project and for the clear explanation of GNES.

Actually, the issue above has been fixed by removing the Cambodian letters.
As far as I can see, the Cambodian letters cannot be encoded with UTF-8.
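
The workaround amounts to something like the sketch below, assuming Python 3. `strip_khmer` is a hypothetical helper and is not part of GNES; the ranges are the Unicode Khmer (U+1780 to U+17FF) and Khmer Symbols (U+19E0 to U+19FF) blocks.

```python
import re

# Unicode blocks: Khmer (U+1780-U+17FF) and Khmer Symbols (U+19E0-U+19FF).
KHMER = re.compile("[\u1780-\u17ff\u19e0-\u19ff]+")


def strip_khmer(text):
    """Remove Khmer characters, then collapse leftover whitespace."""
    without_khmer = KHMER.sub(" ", text)
    return " ".join(without_khmer.split())


print(strip_khmer("FAQ 1? ចូលទៅប្រើប្រាស់កម្មវិធីនេះ?"))
```

Rows that consist only of Khmer text become empty after this step, so they would need to be dropped before indexing.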


My answer:
My task is to build an FAQ search (similar to the Poem Search demo). It is purely a search task.

In our very first project, we checked similarity by calculating the distance between sentence vectors from a Flair DocumentEmbedding stack, but the results were pretty bad.

Then we found this search engine, which uses a neural network, and the results may be better than measuring cosine distance.
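
For context, the baseline mentioned above (ranking FAQ entries by cosine similarity between sentence vectors) looks roughly like this. The three-dimensional vectors are made-up toy values standing in for real sentence embeddings (no Flair call is made), and `rank_by_cosine` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Convention chosen here for zero vectors.
    return dot / (norm_a * norm_b)


def rank_by_cosine(query_vec, faq_vecs):
    """Return FAQ keys sorted from most to least similar to the query."""
    scores = {key: cosine_similarity(query_vec, vec) for key, vec in faq_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy vectors, invented for illustration only.
faq_vecs = {
    "faq_1": [1.0, 0.0, 0.0],
    "faq_2": [0.7, 0.7, 0.0],
    "faq_3": [0.0, 0.0, 1.0],
}
print(rank_by_cosine([0.9, 0.1, 0.0], faq_vecs))
```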

Thanks.
