Tidy up preprocess.py with pandas #2

Open
abdulfatir opened this issue Dec 29, 2017 · 8 comments

Comments

@abdulfatir
Owner

In the preprocess_csv function in preprocess.py, pandas could be used to parse the CSV more efficiently and with far less code. The machine I was using while developing the project did not have pandas installed.
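A minimal sketch of what a pandas-based preprocess_csv could look like, assuming the header-less tweet_id,sentiment,tweet (train) and tweet_id,tweet (test) layout described later in this thread. clean_tweet is a hypothetical stand-in for the project's real tweet-cleaning logic, and the parameter names are assumptions rather than the repository's actual signature.

```python
import pandas as pd


def clean_tweet(tweet: str) -> str:
    """Hypothetical placeholder: lowercase and collapse whitespace.

    The project's real cleaning steps would go here instead.
    """
    return " ".join(tweet.lower().split())


def preprocess_csv(csv_file_name: str, processed_file_name: str,
                   test_file: bool = False) -> pd.DataFrame:
    """Read a header-less tweet CSV, clean each tweet, and write the result."""
    # Train files have a sentiment column; test files do not (assumed layout).
    names = ["tweet_id", "tweet"] if test_file else ["tweet_id", "sentiment", "tweet"]
    # header=None because the dataset files are expected to carry no header row;
    # read_csv handles double-quoted tweets that contain commas.
    df = pd.read_csv(csv_file_name, header=None, names=names)
    df["tweet"] = df["tweet"].astype(str).map(clean_tweet)
    # Write the processed rows back out, again without a header row or index.
    df.to_csv(processed_file_name, header=False, index=False)
    return df
```

For example, preprocess_csv('train.csv', 'train-processed.csv') would write a cleaned copy of a training file (both file names are hypothetical). Letting read_csv handle quoting and column splitting is what removes most of the hand-written parsing code.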

@GongQin721

I ran your code and got [Errno 2] No such file or directory: '../train-processed-freqdist.pkl'. Can you help me solve this? Thank you.

@abdulfatir
Owner Author

@GongQin721 This is off-topic. Please read the README carefully.

@GongQin721

OK, thank you very much!

@chaiitanyasangani88

Could you tell me the headers of the CSV, if there are any? If not, some idea of the CSV's structure would be a great help.

@abdulfatir
Owner Author

Hi @chaiitanyasangani88

The CSV structure is described in the Dataset Information section:

We use and compare various methods for sentiment analysis on tweets (a binary classification problem). The training dataset is expected to be a CSV file with the columns tweet_id,sentiment,tweet, where tweet_id is a unique integer identifying the tweet, sentiment is either 1 (positive) or 0 (negative), and tweet is the tweet text enclosed in double quotes (""). Similarly, the test dataset is a CSV file with the columns tweet_id,tweet. Please note that CSV headers are not expected and should be removed from both the training and test datasets.
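As a sketch, assuming pandas, files in this format can be loaded with header=None and explicit column names, plus a few checks that catch a leftover header row or malformed labels. The function name load_dataset and the column-list constants are hypothetical, not part of the project.

```python
import pandas as pd

# Column layouts taken from the format description above.
TRAIN_COLUMNS = ["tweet_id", "sentiment", "tweet"]
TEST_COLUMNS = ["tweet_id", "tweet"]


def load_dataset(path: str, test: bool = False) -> pd.DataFrame:
    """Load a header-less train or test CSV and sanity-check its contents."""
    names = TEST_COLUMNS if test else TRAIN_COLUMNS
    df = pd.read_csv(path, header=None, names=names)
    # A leftover header row turns tweet_id into a string (object) column.
    if not pd.api.types.is_integer_dtype(df["tweet_id"]):
        raise ValueError("tweet_id is not an integer column; is a header row still present?")
    if not df["tweet_id"].is_unique:
        raise ValueError("tweet_id values must be unique")
    # Training labels must be 0 (negative) or 1 (positive).
    if not test and not df["sentiment"].isin([0, 1]).all():
        raise ValueError("sentiment must be 0 (negative) or 1 (positive)")
    return df
```

If a file still has a header row, this raises instead of silently treating the header as a tweet, which is the failure mode the note about removing headers is guarding against.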

@Carolinecrl

In your lstm.py code, the '.csv' and '.pkl' files are referenced as FREQ_DIST_FILE = '../train-processed-freqdist.pkl', TRAIN_PROCESSED_FILE = '../train-processed.csv', and so on.
I wonder how I can generate these files from 'positive-words.txt' and 'negative-words.txt' in the dataset.
Could you please help me with this?

@abdulfatir
Owner Author

@Carolinecrl

'positive-words.txt' and 'negative-words.txt' are not the dataset. They're just for the baseline. The dataset is not included in the repo.
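For context, a word-list baseline of this kind typically just counts matches against the two lists. The sketch below illustrates that general idea and is not the repository's baseline code; it assumes one word per line in each file, treats lines starting with ';' as comments, and the tie-breaking rule (ties count as positive) is an arbitrary choice. The names load_lexicon and classify are hypothetical.

```python
def load_lexicon(path: str) -> set[str]:
    """Read a word list: one word per line, skipping blanks and ';' comments."""
    words = set()
    # latin-1 decodes any byte, so unusual characters in the list cannot crash loading.
    with open(path, encoding="latin-1") as f:
        for line in f:
            word = line.strip()
            if word and not word.startswith(";"):
                words.add(word.lower())
    return words


def classify(tweet: str, positive: set[str], negative: set[str]) -> int:
    """Return 1 (positive) or 0 (negative) by comparing lexicon hit counts."""
    words = tweet.lower().split()
    pos = sum(w in positive for w in words)
    neg = sum(w in negative for w in words)
    # Ties, including tweets with no lexicon words, default to positive here.
    return 1 if pos >= neg else 0
```

Because it needs no training data, a baseline like this only uses the word lists; the processed CSV and frequency-distribution files come from the tweet dataset itself.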

@16L31A0575n1

In stats.py, which CSV file should be passed: train, test, or some other (random) sample?


5 participants