Fix: Crashes on praw.exceptions #782

Open
wants to merge 2 commits into
base: development

thomas694

Cloning a subreddit while also passing a list of valid (but partly older) submission IDs, for example
bdfr clone --subreddit ... --include-id-file Z:/ID_list.txt Z:/Reddit
crashes while downloading some of the submissions.

Example exceptions:

[2023-02-17 12:34:56,789 - root - ERROR] - Scraper exited unexpectedly
Traceback (most recent call last):
  File "Z:\bulk-downloader-for-reddit\bdfr\cloner.py", line 26, in download
    self._download_submission(submission)
  File "Z:\bulk-downloader-for-reddit\bdfr\downloader.py", line 62, in _download_submission
    elif submission.subreddit.display_name.lower() in self.args.skip_subreddit:
         ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python\Lib\site-packages\praw\models\reddit\base.py", line 34, in __getattr__
    self._fetch()
  File "C:\Users\User\AppData\Local\Programs\Python\Python\Lib\site-packages\praw\models\reddit\comment.py", line 195, in _fetch
    raise ClientException(f"No data returned for comment {self.fullname}")
praw.exceptions.ClientException: No data returned for comment t1_xyz
[2023-02-17 12:34:56,789 - root - ERROR] - Scraper exited unexpectedly
Traceback (most recent call last):
  File "Z:\bulk-downloader-for-reddit\bdfr\__main__.py", line 160, in cli_clone
    reddit_scraper = RedditCloner(config, [stream])
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Z:\bulk-downloader-for-reddit\bdfr\cloner.py", line 19, in __init__
    super(RedditCloner, self).__init__(args, logging_handlers)
  File "Z:\bulk-downloader-for-reddit\bdfr\downloader.py", line 41, in __init__
    super(RedditDownloader, self).__init__(args, logging_handlers)
  File "Z:\bulk-downloader-for-reddit\bdfr\archiver.py", line 30, in __init__
    super(Archiver, self).__init__(args, logging_handlers)
  File "Z:\bulk-downloader-for-reddit\bdfr\connector.py", line 65, in __init__
    self.reddit_lists = self.retrieve_reddit_lists()
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Z:\bulk-downloader-for-reddit\bdfr\connector.py", line 174, in retrieve_reddit_lists
    master_list.extend(self.get_submissions_from_link())
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Z:\bulk-downloader-for-reddit\bdfr\archiver.py", line 65, in get_submissions_from_link
    supplied_submissions.append(self.reddit_instance.submission(url=sub_id))
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python\Lib\site-packages\praw\util\deprecate_args.py", line 43, in wrapped
    return func(**dict(zip(_old_args, args)), **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python\Lib\site-packages\praw\reddit.py", line 981, in submission
    return models.Submission(self, id=id, url=url)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python\Lib\site-packages\praw\models\reddit\submission.py", line 586, in __init__
    self.id = self.id_from_url(url)
              ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python\Lib\site-packages\praw\models\reddit\submission.py", line 458, in id_from_url
    parts = RedditBase._url_parts(url)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python\Lib\site-packages\praw\models\reddit\base.py", line 19, in _url_parts
    raise InvalidURL(url)
praw.exceptions.InvalidURL: Invalid URL: zabcdefg

praw/docs/code_overview/exceptions.rst says:
"In addition to exceptions under the praw.exceptions namespace shown below,
exceptions might be raised that inherit from prawcore.PrawcoreException."

The code uses praw but doesn't catch both base exceptions (praw.exceptions.PRAWException and prawcore.PrawcoreException) at all relevant locations, for example cloner.py#L28:

        try:
            self._download_submission(submission)
            self.write_entry(submission)
        except prawcore.PrawcoreException as e:
            logger.error(f"Submission {submission.id} failed to be cloned due to a PRAW exception: {e}")

The fix catches both base exceptions at every location where only one of them was caught before,
so the run finishes even if individual submissions raise an error.
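
A minimal sketch of the pattern described above, not the actual diff of this PR. The function `clone_all` and its `download` and `write_entry` parameters are hypothetical names used for illustration; the fallback classes are stand-ins so the snippet runs even where praw is not installed.

```python
import logging

try:
    from praw.exceptions import PRAWException
    from prawcore import PrawcoreException
except ImportError:  # stand-ins so the sketch runs without praw installed
    class PRAWException(Exception):
        """Stand-in for praw.exceptions.PRAWException."""

    class PrawcoreException(Exception):
        """Stand-in for prawcore.PrawcoreException."""

logger = logging.getLogger(__name__)


def clone_all(submissions, download, write_entry):
    """Hypothetical loop: process every submission, log PRAW errors, keep going."""
    failed = []
    for submission in submissions:
        try:
            download(submission)
            write_entry(submission)
        except (PRAWException, PrawcoreException) as e:
            # Both base classes are listed because neither inherits from the other.
            logger.error(f"Submission {submission} failed to be cloned due to a PRAW exception: {e}")
            failed.append(submission)
    return failed
```

Catching the two base classes in one tuple keeps a single log message per failed submission while still letting unrelated errors (for example a programming bug) surface.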

Fixes #764 too, resolves #713.

@Serene-Arc
Collaborator

Hi, thanks for the PR. We require tests to be added that cover any new additions to the code. If you could add tests in the relevant locations that show a test failing due to those errors that then passes when you add the fix, I can merge this PR!
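
One possible shape for such a regression test, as a sketch only: it does not follow bdfr's real test layout, and `make_unfetchable_submission`, `clone_narrow`, and `clone_fixed` are hypothetical helpers. The mock imitates the lazy fetch seen in the first traceback, where reading `submission.subreddit` raises `ClientException`. The fallback classes are stand-ins for environments without praw.

```python
from unittest.mock import MagicMock, PropertyMock

try:
    from praw.exceptions import ClientException, PRAWException
    from prawcore import PrawcoreException
except ImportError:  # stand-ins so the sketch runs without praw installed
    class PRAWException(Exception):
        """Stand-in for praw.exceptions.PRAWException."""

    class ClientException(PRAWException):
        """Stand-in for praw.exceptions.ClientException."""

    class PrawcoreException(Exception):
        """Stand-in for prawcore.PrawcoreException."""


def make_unfetchable_submission(submission_id):
    """Hypothetical fixture: reading .subreddit raises, like a failed lazy fetch."""
    submission = MagicMock()
    submission.id = submission_id
    type(submission).subreddit = PropertyMock(
        side_effect=ClientException(f"No data returned for comment t1_{submission_id}")
    )
    return submission


def clone_narrow(submission):
    """Handler shaped like the quoted snippet: only prawcore errors are caught."""
    try:
        submission.subreddit.display_name.lower()
        return True
    except PrawcoreException:
        return False


def clone_fixed(submission):
    """Handler with the proposed change: both base exceptions are caught."""
    try:
        submission.subreddit.display_name.lower()
        return True
    except (PRAWException, PrawcoreException):
        return False


def test_narrow_handler_lets_client_exception_escape():
    try:
        clone_narrow(make_unfetchable_submission("xyz"))
    except ClientException:
        return
    raise AssertionError("expected ClientException to escape")


def test_fixed_handler_reports_failure_instead_of_crashing():
    assert clone_fixed(make_unfetchable_submission("xyz")) is False
```

The first test documents the failure before the fix and the second one the behaviour after it; both use plain asserts, so they run under pytest or when called directly.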

@thomas694
Author

Python is not one of my main languages and I'm not familiar with Python tests, so I don't know how to write a test for this. Actually, I don't see it as a code addition but as the required check for the second exception, which was forgotten before.
But maybe someone experienced in writing tests can co-author here?

@Serene-Arc
Collaborator

I can, but it will need to wait until I have the time. And it is a code addition, and fixes a bug. Therefore it needs tests to make sure that it doesn't reoccur.

@thomas694
Author

Of course, thanks.

@Serene-Arc
Collaborator

@thomas694 I'm getting to this now. In the logs you've censored the comments that caused this issue; can you please supply them?

@thomas694
Author

"No data":
bdfr clone --subreddit movies --limit 1 --link 14atfia E:/Reddit_Test
https://www.reddit.com/r/movies/comments/14atfia

"Invalid URL":
You can use any ID that is less than 6 characters long.
bdfr clone --subreddit reddit.com --limit 1 --link 87 E:/Reddit_Test
https://www.reddit.com/r/reddit.com/comments/87

@Serene-Arc
Collaborator

Thank you.
