update-Tumblr Crawler impl by [tumblr apiv2] #35

geosmart · 2017-06-16T08:44:48Z

New Features:

Tumblr crawler implemented on the Tumblr API v2, which solves the authentication issue.
You can set POST_TYPE (posts/likes) to download all your liked posts.

dixudx

Please follow the API doc.

There is also an audio post type. It would be better to support that as well.

Please run a PEP8 check before submitting your code.

dixudx · 2017-06-16T08:49:21Z

README.md

@@ -93,47 +96,3 @@ If you are using Shadowsocks with global mode, your `./proxies.json` can be,
 ```

 And now you can enjoy your downloads.


Why was the content below deleted? Any reason?

dixudx · 2017-06-16T08:59:50Z

tumblr-photo-video-ripper.py


 # Numbers of downloading threads concurrently
 THREADS = 10

+# just a test apikey
+API_KEY = "lmvVU5ExdfFZPyGOv0gCknJ2r1UnQEIZTYAYoDhKrq7eJdCn2o"


Why post a test API key here? This api_key seems to belong to your account. If you want to make it publicly available, that's OK.

If you want to use API v2, we'd better provide a function, def getApiKey(username, password), to obtain the API key instead of hard-coding it, with the user supplying the username and password through the CLI or a file.

Some users have not registered with Tumblr and may not want to spend time registering. This is why I chose API v1 instead of v2.

Anyway, providing v2 support will be a good enhancement.
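
As a rough illustration of avoiding the hard-coded key, the sketch below reads the key from an environment variable and falls back to a JSON file. It does not implement getApiKey(username, password); it only shows the "CLI or file" side of the suggestion. The names load_api_key, TUMBLR_API_KEY, config.json and the "api_key" field are hypothetical and not part of the PR.

```python
import json
import os


def load_api_key(path="config.json", env_var="TUMBLR_API_KEY"):
    """Return an API key from the environment or a JSON file, else None.

    Both the variable name and the file layout are assumptions made for
    this sketch; adapt them to whatever the project settles on.
    """
    key = os.environ.get(env_var)
    if key:
        return key
    try:
        with open(path) as f:
            data = json.load(f)
    except (IOError, ValueError):
        # Missing/unreadable file or invalid JSON: no key available.
        return None
    return data.get("api_key") if isinstance(data, dict) else None
```

The script could then exit with a clear message when load_api_key() returns None instead of shipping a key in the source.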

geosmart · 2017-07-17

I found the key via Google. It's not mine; I just use it.

dixudx · 2017-06-16T09:06:23Z

tumblr-photo-video-ripper.py

+API_KEY = "lmvVU5ExdfFZPyGOv0gCknJ2r1UnQEIZTYAYoDhKrq7eJdCn2o"
+
+# enum(posts,likes)
+POST_TYPE="likes"


Please add some documentation here.

For the most common usage, POST_TYPE = ("likes", "posts")

dixudx · 2017-06-16T09:07:26Z

tumblr-photo-video-ripper.py

@@ -163,7 +168,7 @@ def scheduling(self):

    def download_media(self, site):
        self.download_photos(site)
-        self.download_videos(site)
+        # self.download_videos(site)


Why is this call commented out?

dixudx · 2017-06-16T09:09:30Z

tumblr-photo-video-ripper.py

@@ -185,34 +190,76 @@ def _download_media(self, site, medium_type, start):
        if not os.path.isdir(target_folder):
            os.mkdir(target_folder)

-        base_url = "http://{0}.tumblr.com/api/read?type={1}&num={2}&start={3}"
+        # liked posts:


This should be split into two functions, such as download_likes() and download_posts(). Each of them would then handle both photos and videos.
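
One possible shape for that split, sketched only at the URL-building level. The endpoint paths and the api_key, limit and offset parameters are taken from my reading of the Tumblr API v2 docs and should be verified there; build_url, posts_url and likes_url are hypothetical helper names.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.tumblr.com/v2/blog"


def build_url(site, resource, api_key, limit=20, offset=0):
    """Return an API v2 URL for one page of a blog resource."""
    query = urlencode({"api_key": api_key, "limit": limit, "offset": offset})
    return "{0}/{1}.tumblr.com/{2}?{3}".format(API_ROOT, site, resource, query)


def posts_url(site, medium_type, api_key, offset=0):
    # medium_type is "photo" or "video"; used by a download_posts() helper.
    return build_url(site, "posts/" + medium_type, api_key, offset=offset)


def likes_url(site, api_key, offset=0):
    # Liked posts mix media types; used by a download_likes() helper.
    return build_url(site, "likes", api_key, offset=offset)
```

download_posts() and download_likes() would then differ only in which URL they page through, and could share the photo and video handling.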

dixudx · 2017-06-16T09:17:31Z

tumblr-photo-video-ripper.py

-    print(u"文件proxies.json格式非法.\n"
-          u"请参照示例文件'proxies_sample1.json'和'proxies_sample2.json'.\n"
-          u"然后去 http://jsonlint.com/ 进行验证.")
+    print(u"proxies.json format illegal.\n"


Same as above. Please don't translate this back to English. There is no need to have the same message in the same language again.

dixudx · 2017-06-16T09:29:51Z

tumblr-photo-video-ripper.py

-                    if matched_url is not None:
-                        return matched_url
-                else:
+                video_player = post["video_url"]


Have you referred to the API v2 doc? I did not see any video_url attribute for video posts.

It should be post["player"]; match the URL from the entry with the largest resolution (pick the largest value of post["player"][idx_CHANGEME]["width"]).
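
A minimal sketch of that selection, assuming post["player"] is a list of dicts that each carry "width" and "embed_code" keys, as I understand the API v2 doc to describe for video posts. pick_largest_player is a hypothetical helper name.

```python
def pick_largest_player(post):
    """Return the embed code of the widest player entry, or None."""
    players = post.get("player")
    if not isinstance(players, list) or not players:
        # Some post types may not carry a list of players at all.
        return None
    widest = max(players, key=lambda p: int(p.get("width", 0)))
    return widest.get("embed_code")
```

The existing regex rules could then be applied to the returned embed code to extract the video URL.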

dixudx · 2017-06-16T09:33:59Z

tumblr-photo-video-ripper.py

+                    # if post has photoset, walk into photoset for each photo
+                    photoset = post["photos"]
+                    for photo in photoset:
+                        self.queue.put((medium_type, photo["original_size"], target_folder))


This seems wrong. Please check the API v2 doc.

dixudx · 2017-06-16T09:35:57Z

tumblr-photo-video-ripper.py

@@ -82,15 +88,16 @@ def _register_regex_match_rules(self):
    def _handle_medium_url(self, medium_type, post):
        try:
            if medium_type == "photo":
-                return post["photo-url"][0]["#text"]
+                return post["url"]


I don't see any handling of post["alt_sizes"], which contains the same photo at different resolutions. The largest one should be picked.
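
A rough sketch of picking the largest size, assuming each photo object carries an "alt_sizes" list (and possibly an "original_size" dict) whose entries have "width", "height" and "url" keys. That layout should be checked against the API v2 doc; largest_photo_url is a hypothetical helper name.

```python
def largest_photo_url(photo):
    """Return the URL of the largest available size of one photo, or None."""
    candidates = list(photo.get("alt_sizes", []))
    if "original_size" in photo:
        candidates.append(photo["original_size"])
    if not candidates:
        return None
    # Compare by pixel area so portrait and landscape sizes rank fairly.
    best = max(candidates,
               key=lambda s: s.get("width", 0) * s.get("height", 0))
    return best.get("url")
```

For a photoset, the same helper would be called once per entry in post["photos"].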

dixudx · 2017-06-16T09:36:50Z

tumblr-photo-video-ripper.py


+    # download like
+    def downloadTaskPreHandlerOfLikePost(self, medium_type, posts, target_folder):


This should be refactored along with downloadTaskPreHandlerOfNormalPost.

update-Tumblr Crawler impl by [tumblr apiv2]

c71a5b1

dixudx requested changes Jun 16, 2017

dixudx mentioned this pull request Sep 17, 2017

Access Denied when retrieve Failed to retrieve video from xxx #43

Closed

mohd-haikal98 approved these changes Feb 5, 2019

