Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Ids are longer than expected #783

Open
wants to merge 3 commits into
base: development
Choose a base branch
from

Conversation

thomas694
Copy link

When using a list of (valid) ids in a command like
bdfr clone --subreddit ... --include-id-file Z:/ID_list.txt Z:/Reddit
some ids fail with praw.exceptions.InvalidURL: Invalid URL: zabcdefg.

Example exception:

[2023-02-17 12:34:56,789 - root - ERROR] - Scraper exited unexpectedly
Traceback (most recent call last):
  File "Z:\bulk-downloader-for-reddit\bdfr\__main__.py", line 160, in cli_clone
    reddit_scraper = RedditCloner(config, [stream])
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Z:\bulk-downloader-for-reddit\bdfr\cloner.py", line 19, in __init__
    super(RedditCloner, self).__init__(args, logging_handlers)
  File "Z:\bulk-downloader-for-reddit\bdfr\downloader.py", line 41, in __init__
    super(RedditDownloader, self).__init__(args, logging_handlers)
  File "Z:\bulk-downloader-for-reddit\bdfr\archiver.py", line 30, in __init__
    super(Archiver, self).__init__(args, logging_handlers)
  File "Z:\bulk-downloader-for-reddit\bdfr\connector.py", line 65, in __init__
    self.reddit_lists = self.retrieve_reddit_lists()
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Z:\bulk-downloader-for-reddit\bdfr\connector.py", line 174, in retrieve_reddit_lists
    master_list.extend(self.get_submissions_from_link())
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Z:\bulk-downloader-for-reddit\bdfr\archiver.py", line 65, in get_submissions_from_link
    supplied_submissions.append(self.reddit_instance.submission(url=sub_id))
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python\Lib\site-packages\praw\util\deprecate_args.py", line 43, in wrapped
    return func(**dict(zip(_old_args, args)), **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python\Lib\site-packages\praw\reddit.py", line 981, in submission
    return models.Submission(self, id=id, url=url)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python\Lib\site-packages\praw\models\reddit\submission.py", line 586, in __init__
    self.id = self.id_from_url(url)
              ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python\Lib\site-packages\praw\models\reddit\submission.py", line 458, in id_from_url
    parts = RedditBase._url_parts(url)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python\Lib\site-packages\praw\models\reddit\base.py", line 19, in _url_parts
    raise InvalidURL(url)
praw.exceptions.InvalidURL: Invalid URL: zabcdefg

In the meantime ids got longer (6-8 characters) (archiver.py#L60, connector.py#L313).

Fixes the two relevant locations.

@Serene-Arc
Copy link
Collaborator

Hi, thanks for the PR.

Question: if praw crashes, how do you know that the IDs are valid? That would suggest that they are, in fact, not valid because Reddit's own library is rejecting them.

@thomas694
Copy link
Author

thomas694 commented Feb 19, 2023

It crashed saying "invalid url: ...", but I used a list of ids. When I looked at the line given in the error I saw that the length of the id was different than what was checked for. As soon as I changed the line the downloads worked.
When the length check fails, BDFR passes the id to praw as an url.

@Serene-Arc
Copy link
Collaborator

Right, seems fine then. If you can write up tests for this bug, I'll review and merge.

@thomas694
Copy link
Author

I've to correct myself, the length of IDs is 6 or 7 characters, there are no 8 characters yet. I didn't manually check the few longer ones in the big list which my script outputted, they are indeed invalid. So only one location needs to be adapted.

I cannot run the test suite locally, but I copied a similar looking test and adapted it. The build system will show if it works.

@geneccx
Copy link

geneccx commented Jul 15, 2023

There are also 5 character IDs. #620 is related.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants