Anyone want to add support for Messenger chats? WeChat? Telegram? #5

Spandan-Madan · 2018-11-28T07:47:03Z

This is a proof of concept, and help extending it to other chat sources would be very welcome. If someone can make a PR, I would be more than happy to accept it!

Essentially, you would need to make a clean_messenger_chat.py or clean_wechat_chat.py file to create all the pickle files as are currently created from WhatsApp chats.

Thanks!

kwikiel · 2018-11-28T11:10:04Z

What's the format of your_sents.p? It would be interesting to add Messenger support, but the structure of those intermediary files is unclear to me since I don't have WhatsApp.

JafarAkhondali · 2018-11-28T11:44:50Z

@Spandan-Madan I would like to try creating a Telegram version, but it would be better to put some comments in your code. (I know it's clean, but comments help others understand the concepts without having to read every line.)

ghost · 2018-11-28T22:07:31Z

@Spandan-Madan I'd like to try a WeChat version, but I'm wondering whether it would work if the chat content is in another language (e.g. Chinese). I guess that depends on the pretrained model from TensorFlow Hub.

Spandan-Madan · 2018-11-29T01:28:35Z

@kwikiel You can take a look at the python file clean_whatsapp_chats.py to see how I'm parsing for more details. But to give you some direction - your_sents.p just stores a list of your sentences as a pickle file.

Broadly, when you parse, you care about 3 things -

A list of sentences you said.
A list of sentences said by the other person.
A dictionary mapping their sentences to your corresponding responses.

The prepare_files.ipynb file basically works with these and embeds your sentences and theirs using TensorFlow's Universal Sentence Encoder. It does some other stuff too. If you read the code it should be clear, but if you have any questions, feel free to ask here and I will respond! Best of luck!
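For anyone writing a parser for another chat source, here is a minimal sketch of how the three structures described above could be built and pickled. It is not the code from clean_whatsapp_chats.py. The function names, the (speaker, text) input format, and the file names other_sents.p and response_map.p are assumptions made for illustration; only your_sents.p is named in this thread.

```python
import pickle
from pathlib import Path


def build_chat_structures(messages, my_name):
    """Build the three structures from (speaker, text) tuples in chronological order.

    Hypothetical helper: the input format is an assumption, not the repo's own.
    """
    your_sents = []     # sentences you said
    other_sents = []    # sentences said by the other person
    response_map = {}   # their sentence -> your reply that directly followed it
    prev_speaker, prev_text = None, None

    for speaker, text in messages:
        text = text.strip()
        if not text:
            continue  # skip empty messages
        if speaker == my_name:
            your_sents.append(text)
            # Record a reply only when your message directly follows one of theirs.
            if prev_speaker is not None and prev_speaker != my_name:
                response_map[prev_text] = text
        else:
            other_sents.append(text)
        prev_speaker, prev_text = speaker, text

    return your_sents, other_sents, response_map


def save_pickles(out_dir, your_sents, other_sents, response_map):
    """Write each structure to its own pickle file (file names partly assumed)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    named = [
        ("your_sents.p", your_sents),       # name taken from this thread
        ("other_sents.p", other_sents),     # assumed name
        ("response_map.p", response_map),   # assumed name
    ]
    for name, obj in named:
        with open(out / name, "wb") as fh:
            pickle.dump(obj, fh)
```

A parser for Messenger, WeChat, or Telegram would only need to turn its export into the (speaker, text) list and then call build_chat_structures followed by save_pickles. Check the real file names against clean_whatsapp_chats.py before relying on the assumed ones.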

Spandan-Madan · 2018-11-29T01:29:47Z

@JafarAkhondali @kwikiel I can add comments to make it more understandable. I am very busy with research these days, so I might take a few days to get around to it, but I will do it ASAP. In the meantime, try working with it and please ask here if you have any questions. Thanks a lot!

Spandan-Madan · 2018-11-29T01:31:40Z

@dhdiego For other languages I have an idea that you can try.

You'll have to use Facebook's fastText instead of Google's Universal Sentence Encoder. fastText is multilingual, with pretrained vectors available for over 100 languages. It gives you an embedding for each word, so you can take the average of the fastText embeddings of the words in a sentence as the embedding of that sentence. I have tried it before and it works well.
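The averaging idea can be sketched as follows. This is an illustration only: word_vectors is a plain dict standing in for a loaded fastText model, the function name is hypothetical, and whitespace tokenization is an assumption that will not work for Chinese, which would need a word segmenter first.

```python
def average_embedding(sentence, word_vectors, dim):
    """Return the mean of the word vectors for the words in `sentence`.

    `word_vectors` is a dict mapping word -> list of floats (a stand-in for
    a real embedding model). Words missing from the dict are skipped.
    """
    tokens = sentence.lower().split()  # naive whitespace tokenization (assumption)
    vectors = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vectors:
        # No known words: fall back to a zero vector of the right size.
        return [0.0] * dim
    # zip(*vectors) groups the values of each dimension across all words.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Toy 2-dimensional vectors, made up for the example.
toy_vectors = {
    "hello": [1.0, 0.0],
    "world": [0.0, 1.0],
}
sentence_vector = average_embedding("Hello world", toy_vectors, dim=2)
```

With a real model, the dict lookup would be replaced by whatever word-vector lookup that model provides, and the resulting sentence vectors would take the place of the Universal Sentence Encoder outputs.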

ghost · 2018-11-29T02:03:29Z

@Spandan-Madan Yeah, I was also thinking about fastText, which uses a simple global average to get a sentence embedding. Anyway, chat records for WeChat are a little complex to get (they can't be saved as txt directly). I'm working on this and will follow up.

Spandan-Madan · 2018-11-29T02:05:37Z

@dhdiego Sounds good, a wechat extension would be pretty cool!

I have worked with averaged word2vec, fastText, and GloVe embeddings before. They all seem to do OK, so I think you should definitely give it a shot!

Spandan-Madan · 2018-12-03T20:40:04Z

@dhdiego @kwikiel Let me know if you need any help with the code base or if something is unclear. Are you still planning on working on the extensions you had mentioned?

ghost · 2018-12-03T20:45:27Z

@Spandan-Madan The problem with WeChat is that Tencent stores all the chat history in a database and encrypts it, even when I download my own chat history. I'm still working on finding a stable way to decrypt it (I found a way to calculate the password based on my own account information, but I haven't tried it yet). I also have 2 finals this week and will be free after this Thursday. I expect to focus on getting the chat data and writing the cleaning script after that.

DiegoD94 · 2018-12-13T21:25:18Z

@Spandan-Madan Hi, this is DhDiego. I have changed to this account, and the account above is no longer active.
What's more, I have already decrypted the WeChat chat history database, and the script will follow in the next few days.

DiegoD94 · 2018-12-13T23:34:47Z

@Spandan-Madan I have submitted a pull request for the WeChat extension (English only). I'm now working on building a Chinese version with fastText or GloVe.

Spandan-Madan · 2018-12-15T23:48:58Z

@DH-Diego Accepted the pull request for WeChat!! Looking forward to your pull request for Chinese!

DiegoD94 · 2018-12-15T23:51:10Z

@Spandan-Madan That's great. I'm working on my final projects and have another final exam on 12.20, so I'll work on the Chinese version after that. I'll try to introduce this fun project to the Chinese community once I finish!

Spandan-Madan · 2018-12-15T23:57:30Z

@DH-Diego Sure! Best of luck for your exams! Also, what's your email ID? I am working on some more interesting projects as well, if you are interested we can collaborate. I am always looking for more people to collaborate with!

DiegoD94 · 2018-12-17T18:49:58Z

@Spandan-Madan That sounds great. You can reach me via hd2412@columbia.edu and we can do it together. Sorry for the late response.

Spandan-Madan added the "help wanted" (Extra attention is needed) and "extension" labels · Nov 28, 2018
