
Missing process_data_weibo.py #1

Open
wise-east opened this issue Dec 5, 2018 · 9 comments
@wise-east

I'd like to train this model and reproduce some of its results. I noticed that EANN_model.py imports a local file, process_data_weibo, which is not included in this repository. Could process_data_weibo.py be uploaded as well, so I can understand the preprocessing that took place and quickly format other data to fit this model?

import process_data_weibo as process_data

@yaqingwang
Owner

Much of the procedure is about matching each image with its corresponding post. As for common preprocessing of the text, I remove stopwords and use jieba as the word splitter. The word embeddings are pretrained on the given dataset. Hope this helps.
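The preprocessing described above (jieba segmentation followed by stopword removal) can be sketched roughly as below. This is not the repository's actual code: the stopword set and the cleaning regex are illustrative assumptions, and a character-level fallback is included only so the sketch runs without jieba installed.

```python
# Rough sketch of the described text preprocessing: jieba word segmentation
# plus stopword removal. The stopword list and cleaning regex are
# illustrative placeholders, not the repo's real ones.
import re

try:
    import jieba  # Chinese word segmenter mentioned by the author

    def tokenize(text):
        return [tok for tok in jieba.cut(text) if tok.strip()]
except ImportError:
    def tokenize(text):  # character-level fallback so the sketch still runs
        return [ch for ch in text if not ch.isspace()]

# Tiny illustrative stopword set; real Chinese stopword lists are much longer.
STOPWORDS = {"的", "了", "在", "是", "和"}

def remove_stopwords(tokens, stopwords=STOPWORDS):
    return [tok for tok in tokens if tok not in stopwords]

def preprocess(text):
    # Keep word characters (including CJK); replace everything else with
    # spaces before segmenting, then drop stopwords.
    cleaned = re.sub(r"[^\w\u4e00-\u9fff]+", " ", text)
    return remove_stopwords(tokenize(cleaned))
```

The word embeddings would then be pretrained on token lists produced this way; the comment does not say which tool was used for that step.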

@yuppx

yuppx commented Dec 10, 2018

Could you release the dataset, or at least a few dataset instances, the dataset format, pretrained models, or checkpoints?

@yaqingwang
Owner

I have uploaded a small dataset to show the dataset format.

@yaqingwang
Owner

yaqingwang commented Jan 11, 2019

For convenience, process_data_weibo.py has also been added. The model can be trained and tested on the uploaded example dataset with two options: EANN (multimodal: text and image) and EANN_text (textual features only).

@WeiJie96

WeiJie96 commented Jun 6, 2019

I was trying to replicate some of the results in the paper, but I realised that metrics such as accuracy and precision fluctuate widely because the dataset is so small. After removing the posts without images, the training set has 43 samples, the validation set 11, and the test set 20. Could you kindly upload a larger dataset so that we can get a better sense of the metrics?
Thank you

@yaqingwang
Owner

Thanks for your interest in our work. I will upload a larger dataset. Hope this helps.

@WeiJie96

Thank you very much

@lidream

lidream commented Dec 5, 2019

Thanks for your interest in our work. I will upload a larger data. Hope this will help.

I have read your paper from KDD 2018, and it's quite great. I am now running your experiment. Would you mind sharing a larger dataset with me as well?

@anmolasati

Could you share a link to the Twitter dataset?
