Tidy up preprocess.py with pandas #2

abdulfatir · 2017-12-29T21:22:44Z

In the preprocess_csv function in preprocess.py (link), pandas could be used to parse the CSV more efficiently and with far less code. The machine I was using while developing the project did not have pandas installed.
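As a rough illustration of what the pandas route might look like, here is a minimal sketch. It assumes the headerless tweet_id,sentiment,tweet layout described in the Readme; the function name load_training_csv and the sample rows are made up for this example and are not taken from preprocess.py.

```python
import io

import pandas as pd


def load_training_csv(source):
    """Read a headerless tweet_id,sentiment,tweet CSV into a DataFrame.

    `source` can be a file path or any file-like object.
    The column names are supplied here because the file has no header row.
    """
    return pd.read_csv(
        source,
        header=None,  # the dataset is expected to have no header row
        names=["tweet_id", "sentiment", "tweet"],
        dtype={"tweet_id": int, "sentiment": int},
    )


# Two made-up rows standing in for a real training file.
sample = io.StringIO('1,0,"so sad, missed the bus"\n2,1,"great day!"\n')
df = load_training_csv(sample)
print(df)
```

pandas handles the quoted tweet field, including commas inside the quotes, so no manual string splitting is needed. The actual text cleaning done in preprocess.py would still have to be applied to the tweet column afterwards.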


GongQin721 · 2018-03-29T08:23:31Z

I ran your code and got this error: [Errno 2] No such file or directory: '../train-processed-freqdist.pkl'. Can you help me solve this problem? Thank you.

abdulfatir · 2018-03-29T12:32:11Z

@GongQin721 This is off-topic. Please read the Readme properly.

GongQin721 · 2018-04-29T07:27:56Z

OK, thank you very much!

chaiitanyasangani88 · 2018-10-18T11:59:48Z

Can you help me with the headers of the CSV, if any? If there are none, some idea of the CSV's structure would be a great help.

abdulfatir · 2018-10-18T14:07:35Z

Hi @chaiitanyasangani88

The csv structure is in the Dataset Information section:

We use and compare various different methods for sentiment analysis on tweets (a binary classification problem). The training dataset is expected to be a csv file of type tweet_id,sentiment,tweet where the tweet_id is a unique integer identifying the tweet, sentiment is either 1 (positive) or 0 (negative), and tweet is the tweet enclosed in "". Similarly, the test dataset is a csv file of type tweet_id,tweet. Please note that csv headers are not expected and should be removed from the training and test datasets.
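For anyone preparing their own file, a quick sanity check along these lines may help. This is only a sketch using the standard csv module under the format described above; check_training_rows is a hypothetical helper, not something that exists in the repo, and the sample rows are invented.

```python
import csv
import io


def check_training_rows(lines):
    """Yield (tweet_id, sentiment, tweet) tuples from headerless CSV lines.

    Raises ValueError on the first row that does not match the
    tweet_id,sentiment,tweet layout (for example, a leftover header row).
    """
    for line_no, row in enumerate(csv.reader(lines), start=1):
        if not row:
            continue  # skip completely empty lines
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(row)}")
        tweet_id, sentiment, tweet = row
        try:
            tweet_id = int(tweet_id)
        except ValueError:
            raise ValueError(
                f"line {line_no}: tweet_id is not an integer (leftover header?)"
            ) from None
        if sentiment.strip() not in ("0", "1"):
            raise ValueError(f"line {line_no}: sentiment must be 0 or 1")
        yield tweet_id, int(sentiment), tweet


# Made-up rows in the expected training format.
sample = '1,0,"so sad, missed the bus"\n2,1,"great day!"\n'
rows = list(check_training_rows(io.StringIO(sample)))
print(rows)
```

Passing an open file object instead of the io.StringIO sample works the same way. For the test file, the same idea applies with two fields (tweet_id,tweet) instead of three.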

Carolinecrl · 2018-11-27T08:10:54Z

In your lstm.py code, these '.csv' and '.pkl' files are shown as FREQ_DIST_FILE = '../train-processed-freqdist.pkl', TRAIN_PROCESSED_FILE = '../train-processed.csv', and so on.
I wonder how I can generate these files from 'positive-words.txt' and 'negative-words.txt' in the dataset.
Could you please help me with the problems above?

abdulfatir · 2018-11-27T09:11:25Z

@Carolinecrl

'positive-words.txt' and 'negative-words.txt' are not the dataset. They're just for the baseline. The dataset is not included in the repo.

16L31A0575n1 · 2020-04-07T05:52:58Z

In stats.py, which CSV file should be passed in: train, test, or some other (random) sample?

abdulfatir added the good first issue and help wanted labels Dec 29, 2017

abdulfatir added the Hacktoberfest label Oct 2, 2018
