update-Tumblr Crawler impl by [tumblr apiv2] #35

Open
wants to merge 1 commit into
base: master
from

Conversation

geosmart

New Features:

  • Tumblr crawler implemented on Tumblr API v2, which solves the authentication issue;
  • You can set POST_TYPE (posts or likes) to download all your liked posts.

Owner

@dixudx dixudx left a comment

Please follow the API doc.

There is also the Audio Post type. It would be better to support that as well.

Please run a PEP 8 check before submitting your code.

@@ -93,47 +96,3 @@ If you are using Shadowsocks with global mode, your `./proxies.json` can be,
```

And now you can enjoy your downloads.
Owner

Why was the content below deleted? Is there a reason?


# Numbers of downloading threads concurrently
THREADS = 10

# just a test apikey
API_KEY = "lmvVU5ExdfFZPyGOv0gCknJ2r1UnQEIZTYAYoDhKrq7eJdCn2o"
Owner

Why post a test API key here? It seems this api_key belongs to your account. If you want to make it publicly available, that's OK.

If you want to use API v2, we had better provide a function such as def getApiKey(username, password) to obtain the API key instead of hard-coding it. The user would supply the username and password through the CLI or a file.

Some users have not registered with Tumblr and may not want to take the time to register. This is why I chose API v1 instead of v2.

Anyway, v2 support will be a good enhancement.
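
A minimal sketch of one way to keep the key out of the source, assuming it is read from an environment variable with a local JSON file as fallback. The names `load_api_key`, `TUMBLR_API_KEY` and `api_key.json` are hypothetical and not part of this PR; this does not implement the `getApiKey(username, password)` idea above.

```python
import json
import os


def load_api_key(env_var="TUMBLR_API_KEY", path="api_key.json", environ=None):
    """Return an API key from the environment or a JSON file (hypothetical helper)."""
    environ = os.environ if environ is None else environ

    # 1) Prefer the environment variable, if it is set and non-empty.
    key = environ.get(env_var)
    if key:
        return key.strip()

    # 2) Fall back to a JSON file shaped like {"api_key": "..."}.
    if os.path.isfile(path):
        with open(path) as f:
            key = json.load(f).get("api_key")
        if key:
            return key.strip()

    raise RuntimeError("No API key found; set %s or create %s" % (env_var, path))
```

With something like this, the settings block could call `API_KEY = load_api_key()` and no key would need to be committed.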

Author

I found the key through a Google search. It isn't mine; I just use it.

API_KEY = "lmvVU5ExdfFZPyGOv0gCknJ2r1UnQEIZTYAYoDhKrq7eJdCn2o"

# enum(posts,likes)
POST_TYPE="likes"
Owner

It would be better to add some documentation here.

For the most common usage, POST_TYPE = ("likes", "posts")
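
A small sketch of how the setting could be documented and validated if it accepted either a single name or a tuple, as suggested above. `VALID_POST_TYPES` and `normalize_post_types` are hypothetical names, not code from this PR.

```python
# Allowed values for POST_TYPE:
#   "posts" - media from the blog's own posts
#   "likes" - media from the posts the blog has liked
VALID_POST_TYPES = ("posts", "likes")


def normalize_post_types(value):
    """Accept a single name or an iterable of names; return a validated tuple."""
    if isinstance(value, str):
        value = (value,)
    types = tuple(value)
    unknown = [t for t in types if t not in VALID_POST_TYPES]
    if unknown:
        raise ValueError("Unsupported POST_TYPE value(s): %s" % ", ".join(unknown))
    return types
```

The crawler could then loop over `normalize_post_types(POST_TYPE)` and handle each type in turn.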

@@ -163,7 +168,7 @@ def scheduling(self):

 def download_media(self, site):
     self.download_photos(site)
-    self.download_videos(site)
+    # self.download_videos(site)
Owner

Why is this call commented out?

@@ -185,34 +190,76 @@ def _download_media(self, site, medium_type, start):
if not os.path.isdir(target_folder):
os.mkdir(target_folder)

base_url = "http://{0}.tumblr.com/api/read?type={1}&num={2}&start={3}"
# liked posts:
Owner

This should be split into two functions, such as download_likes() and download_posts(), each of which handles both photos and videos.
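
As a rough sketch of that split, the two functions could each build their own request URL. This assumes the API v2 routes `/v2/blog/{blog}/posts/{type}` and `/v2/blog/{blog}/likes` with the `api_key`, `limit` and `offset` query parameters; the helper names are hypothetical.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.tumblr.com/v2/blog"


def _blog_url(site, resource, api_key, limit, offset):
    # Shared URL builder so both callers page through results the same way.
    query = urlencode({"api_key": api_key, "limit": limit, "offset": offset})
    return "%s/%s.tumblr.com/%s?%s" % (API_ROOT, site, resource, query)


def posts_url(site, medium_type, api_key, limit=20, offset=0):
    """URL for the blog's own posts of one type, e.g. "photo" or "video"."""
    return _blog_url(site, "posts/" + medium_type, api_key, limit, offset)


def likes_url(site, api_key, limit=20, offset=0):
    """URL for the blog's liked posts (not filtered by type in this sketch)."""
    return _blog_url(site, "likes", api_key, limit, offset)
```

download_posts() would call `posts_url` once per medium type, while download_likes() would call `likes_url` and then sort the returned posts by their type.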

print(u"文件proxies.json格式非法.\n"
u"请参照示例文件'proxies_sample1.json'和'proxies_sample2.json'.\n"
u"然后去 http://jsonlint.com/ 进行验证.")
print(u"proxies.json format illegal.\n"
Owner

Same as above. Please don't translate this back to English; there is no need to repeat the same message in the same language.

if matched_url is not None:
return matched_url
else:
video_player = post["video_url"]
Owner

Have you referred to the API v2 doc? I did not see any video_url attribute for video posts.

Owner

@dixudx dixudx Jun 16, 2017

It should be post["player"]. Match the URL from the entry with the largest resolution (pick the largest value of post["player"][idx_CHANGEME]["width"]).
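
A minimal sketch of that selection, assuming `post["player"]` is a list of objects with `width` and `embed_code` keys. `SRC_PATTERN` is a hypothetical, simplified pattern and not the project's registered regex rules.

```python
import re

# Hypothetical pattern: grabs the first src="..." attribute in an embed snippet.
SRC_PATTERN = re.compile(r'src="([^"]+)"')


def largest_player(post):
    """Return the player entry with the greatest width, or None."""
    players = post.get("player") or []
    # Skip anything that is not a dict or has no usable embed code.
    candidates = [p for p in players if isinstance(p, dict) and p.get("embed_code")]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.get("width", 0))


def video_source(post):
    """Extract a source URL from the widest player's embed code, if present."""
    player = largest_player(post)
    if player is None:
        return None
    match = SRC_PATTERN.search(player["embed_code"])
    return match.group(1) if match else None
```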

# if post has photoset, walk into photoset for each photo
photoset = post["photos"]
for photo in photoset:
self.queue.put((medium_type, photo["original_size"], target_folder))
Owner

This seems wrong. Please check the API v2 doc.
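
One possible reading of this comment, sketched under the assumption that each entry of `post["photos"]` carries `original_size` as an object with a `url` key: queueing `photo["original_size"]` directly would pass a dict rather than a URL. `photo_urls` is a hypothetical helper.

```python
def photo_urls(post):
    """Yield the original-size URL of every photo in a photo post."""
    for photo in post.get("photos", []):
        original = photo.get("original_size") or {}
        url = original.get("url")
        if url:
            yield url
```

The loop in the diff could then put each yielded URL on the queue instead of the whole `original_size` object.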

@@ -82,15 +88,16 @@ def _register_regex_match_rules(self):
 def _handle_medium_url(self, medium_type, post):
     try:
         if medium_type == "photo":
-            return post["photo-url"][0]["#text"]
+            return post["url"]
Owner

I don't see any handling of post["alt_sizes"], which contains the photo at different resolutions. The largest one should be picked.
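
A short sketch of picking the largest rendition, assuming `alt_sizes` sits on each photo entry as a list of objects with `width`, `height` and `url` keys, alongside `original_size`. `largest_photo_url` is a hypothetical helper.

```python
def largest_photo_url(photo):
    """Return the URL of the rendition with the largest pixel area, or None."""
    sizes = list(photo.get("alt_sizes") or [])
    # Treat original_size as one more candidate when it is present.
    if photo.get("original_size"):
        sizes.append(photo["original_size"])
    sizes = [s for s in sizes if s.get("url")]
    if not sizes:
        return None
    best = max(sizes, key=lambda s: s.get("width", 0) * s.get("height", 0))
    return best["url"]
```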


# download like
def downloadTaskPreHandlerOfLikePost(self, medium_type, posts, target_folder):
Owner

This should be refactored along with downloadTaskPreHandlerOfNormalPost.


3 participants