Can't download dataset #4

Open
Lukecn1 opened this issue Oct 26, 2022 · 4 comments

Comments

@Lukecn1

Lukecn1 commented Oct 26, 2022

However, I am having trouble downloading the data: many of the links no longer work and therefore cannot be scraped.

This is true even for sample_data.csv, where a large percentage of the pairs are missing one or both articles.

Are you able to share the evaluation dataset privately?

@computermacgyver
Member

Hi @Lukecn1. Unfortunately, copyright law prevents us from sharing the news articles directly 😞
Most articles are available on the Internet Archive, and the code should automatically try to download them from there. The sample data was created early in the project, before we started ensuring articles were on the Internet Archive; so, although articles in the sample data may be missing, most of the actual articles used in the SemEval competition should be available.
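
To check by hand whether a given article link has a copy on the Internet Archive, something like the sketch below may help. It queries the public Wayback Machine availability endpoint and is only an illustration: it is an assumption that this resembles what the downloader does internally, and the helper names (`availability_query`, `closest_snapshot`, `lookup_snapshot`) are hypothetical, not part of the package.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Public Wayback Machine availability endpoint.
WAYBACK_API = "https://archive.org/wayback/available"


def availability_query(article_url: str) -> str:
    """Build the availability query URL for one article link."""
    return WAYBACK_API + "?" + urllib.parse.urlencode({"url": article_url})


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the archived URL from a decoded API response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup_snapshot(article_url: str, timeout: float = 10.0) -> Optional[str]:
    """Ask the Internet Archive for the closest snapshot (needs network access)."""
    query = availability_query(article_url)
    with urllib.request.urlopen(query, timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

Calling `lookup_snapshot(link)` on a dead link returns either an archived URL to fetch instead or `None` when no snapshot is recorded.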

@Lukecn1
Author

Lukecn1 commented Oct 26, 2022

That's fair, I hadn't considered the copyright aspect.

However, I experienced the same issue when scraping the evaluation dataset.

I will try again from scratch and see if maybe it's an issue on my end.

@intifa233

Hi, this question may be stupid; I am just a beginner at Python. I created a new environment and successfully installed the requirements.txt, and also the downloader with "pip install semeval_8_2022_ia_downloader".
When I ran "python -m semeval_8_2022_ia_downloader.cli --links_file=input.csv --dump_dir=output_dir", it said "FileNotFoundError: [Errno 2] No such file or directory: 'input.csv'".
Would you please tell me what I should do? Thank you!
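
For context, this error means Python found no file named input.csv in the directory the command was run from, since a relative path is resolved against the current working directory. A minimal sketch for checking the path before running the downloader follows; `check_links_file` is a hypothetical helper written for illustration, not part of semeval_8_2022_ia_downloader.

```python
from pathlib import Path


def check_links_file(links_file: str) -> Path:
    """Resolve the links file to an absolute path, or fail with a clear message."""
    path = Path(links_file).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} does not exist; the current directory is {Path.cwd()}"
        )
    return path
```

If the check passes, passing the returned absolute path as the value of --links_file avoids any dependence on the current directory.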

@computermacgyver
Member

Welcome, @intifa233. All questions are good ones. I'm opening a separate issue to discuss this. Please see #5.
