[SITE] How to download "posts" which are links to comments? #892

Open
3 tasks done
germyparker opened this issue Jun 18, 2023 · 1 comment

Comments

germyparker commented Jun 18, 2023

First of all, I'm not sure how to submit this as a question, because I don't think it's a bug, and "[SITE]" seemed like the best option...?

  • I am requesting a site support.
  • I am running the latest version of BDfR
  • I have read the Opening an issue

This might be a weird question - but -

I'm trying to download an entire subreddit which consists only of links to comments in other subreddits. I'm hoping to get the single comment the link goes to (ideally actually the entire thread that follows, but beggars can't be choosers).

However, instead of getting a md file with the comment, I'm getting the content of the OP of the linked thread. Does that make sense?

Alternatively: The subreddit reposts comments by one specific user, so an alternative is to just download everything that user has ever said. This is sub-optimal for two reasons: (1) not every comment is useful or interesting, whereas the subreddit collects just the good ones, and (2) after about 30 posts, I get the following error:

praw.exceptions.ClientException: This comment does not appear to be in the comment tree 

Here's the command I'm using:

bdfr archive --user PoppinKREAM --all-comments --file-scheme '{REDDITOR}_{SUBREDDIT}_{TITLE}_{POSTID}' ./output

and the full error:

Traceback (most recent call last):
  File "/usr/local/bin/bdfr", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/bdfr/__main__.py", line 139, in cli_archive
    reddit_archiver.download()
  File "/usr/local/lib/python3.10/site-packages/bdfr/archiver.py", line 49, in download
    self.write_entry(submission)
  File "/usr/local/lib/python3.10/site-packages/bdfr/archiver.py", line 92, in write_entry
    self._write_entry_json(archive_entry)
  File "/usr/local/lib/python3.10/site-packages/bdfr/archiver.py", line 103, in _write_entry_json
    content = json.dumps(entry.compile())
  File "/usr/local/lib/python3.10/site-packages/bdfr/archive_entry/comment_archive_entry.py", line 18, in compile
    self.source.refresh()
  File "/usr/local/lib/python3.10/site-packages/praw/models/reddit/comment.py", line 309, in refresh
    raise ClientException(self.MISSING_COMMENT_MESSAGE)
praw.exceptions.ClientException: This comment does not appear to be in the comment tree
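
The traceback shows one failing `refresh()` call propagating up through `entry.compile()` and ending the whole run. As a rough illustration only (this is not BDFR's code), the sketch below shows the general pattern of skipping an entry whose `compile()` raises instead of aborting. The names `compile_entries`, `skip_on`, and `log` are hypothetical; only the `compile()` method name comes from the traceback. In real use one would presumably pass `skip_on=(praw.exceptions.ClientException,)`.

```python
import json
from typing import Any, Callable, Iterable, Iterator, Tuple, Type


def compile_entries(
    entries: Iterable[Any],
    skip_on: Tuple[Type[BaseException], ...] = (Exception,),
    log: Callable[[str], None] = print,
) -> Iterator[str]:
    """Yield a JSON string per entry; log and skip entries whose compile() raises.

    `entries` is assumed to hold objects with a compile() method that returns
    a JSON-serialisable dict, mirroring the call seen in the traceback.
    """
    for entry in entries:
        try:
            yield json.dumps(entry.compile())
        except skip_on as exc:
            # One bad entry is reported and skipped; the loop continues.
            log(f"skipped {entry!r}: {exc}")
```

With this shape, a comment that is missing from its tree costs one log line rather than the rest of the archive.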

Finally, I think this is the post it's failing on:

https://old.reddit.com/r/reddevils/comments/146eg1s/brandon_williams_rant_roudup/jnqnprn/

I'm using the latest version via pip, updated last week.

To reiterate: I would much prefer a solution to the initial problem, if there is one: how to download posts that are links to comments.

Fakeaccount12312 commented Oct 11, 2023

Which subreddit were you originally trying to download, and what command did you use? I'd like to try this myself.

If you mean r/ShitPoppinKreamSays, it fails to download anything because the links there are np.reddit.com links, and bdfr has no proper download module for those. You could try scraping the log that bdfr generates for these links, collecting them in a file, and downloading that with bdfr archive --include-id-file comments.txt --comment-context. See #835 for how I tried that method; some hacking is probably required. Also note that #851 could cause issues here.

I check GitHub very infrequently, so a reply might take some time, but I hope these tips help somewhat!
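
The log-scraping step could be sketched roughly as follows. This is a minimal sketch under several assumptions: the log format is not known here, so the code simply searches the text for Reddit comment permalinks on any subdomain (np, old, www); the names `COMMENT_LINK`, `extract_ids`, and `write_id_file` are made up for illustration; and whether the ID file should hold comment IDs or submission IDs is not confirmed in this thread, so the writer supports both.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches comment permalinks such as
# https://np.reddit.com/r/<sub>/comments/<submission_id>/<slug>/<comment_id>/
COMMENT_LINK = re.compile(
    r"https?://(?:[a-z]+\.)?reddit\.com/r/\w+/comments/"
    r"(?P<submission>[a-z0-9]+)/[^/\s]*/(?P<comment>[a-z0-9]+)"
)


def extract_ids(log_text: str) -> list[tuple[str, str]]:
    """Return unique (submission_id, comment_id) pairs in first-seen order."""
    seen: set[tuple[str, str]] = set()
    pairs: list[tuple[str, str]] = []
    for match in COMMENT_LINK.finditer(log_text):
        pair = (match["submission"], match["comment"])
        if pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs


def write_id_file(pairs: list[tuple[str, str]], path: str,
                  use_comment_id: bool = True) -> None:
    """Write one ID per line; which kind of ID bdfr expects is an assumption."""
    ids = [comment if use_comment_id else submission
           for submission, comment in pairs]
    Path(path).write_text("\n".join(ids) + "\n", encoding="utf-8")
```

A possible use would be to read the log with `Path(...).read_text()`, call `write_id_file(extract_ids(text), "comments.txt")`, and then run the `bdfr archive` command quoted in the reply above.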
