Feature: Support v2 API data as input #31

Open
narcisoyu opened this issue Apr 21, 2022 · 2 comments

Comments

@narcisoyu

Hi! I'm doing a research project about Twitter analysis.

I fetched user data with the Twitter Academic API (v2), and after calling M3Twitter.transform_jsonl(...) I got the following error:

KeyError                                  Traceback (most recent call last)
<ipython-input-5-23da1cf5d317> in <module>
      5 ,access_token=' ',access_secret=' ')
      6 
----> 7 m3twitter.transform_jsonl(input_file="test.jsonl", output_file="test_result.jsonl")

~/opt/anaconda3/lib/python3.8/site-packages/m3inference/m3twitter.py in transform_jsonl(self, input_file, output_file, img_path_key, lang_key, resize_img, keep_full_size_img)
     48             with open(output_file, "w") as fhOut:
     49                 for line in fhIn:
---> 50                     m3vals = self.transform_jsonl_object(line, img_path_key=img_path_key, lang_key=lang_key,
     51                                                          resize_img=resize_img, keep_full_size_img=keep_full_size_img)
     52                     fhOut.write("{}\n".format(json.dumps(m3vals)))

~/opt/anaconda3/lib/python3.8/site-packages/m3inference/m3twitter.py in transform_jsonl_object(self, input, img_path_key, lang_key, resize_img, keep_full_size_img)
     80             else:
     81                 img_file_resize = img_path
---> 82         elif user["default_profile_image"]:
     83             # Default profile image
     84             img_file_resize = TW_DEFAULT_PROFILE_IMG

KeyError: 'default_profile_image'

I also ran the example data provided in m3inference/test/twitter_cache/, and the function ran without errors.

Then I double-checked the jsonl file. It looks like the two versions of the Twitter API (v1 / v2) return (slightly) different JSON formats (I suppose the example data were produced by the v1 API). For details, please see: https://developer.twitter.com/en/docs/twitter-api/migrate/data-formats/standard-v1-1-to-v2
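
One quick way to confirm the mismatch is to list which v1.1-style keys a record lacks. This is a minimal sketch, assuming each line of the file is a single user object. The key list comes from the v1.1 user object plus the `default_profile_image` key in the traceback; it is an assumption about what the ingest code reads, not something confirmed from the m3inference source. `missing_v11_keys` is a hypothetical helper name.

```python
import json

# v1.1 user-object keys. Assumed (not confirmed) to be what the v1.1 ingest code expects.
V11_KEYS = (
    "id_str",
    "name",
    "screen_name",
    "description",
    "profile_image_url_https",
    "default_profile_image",
)


def missing_v11_keys(jsonl_line):
    """Return the v1.1 keys that are absent from one JSON Lines record."""
    user = json.loads(jsonl_line)
    return [key for key in V11_KEYS if key not in user]


if __name__ == "__main__":
    # A v2-style user object, using field names from the v2 user data dictionary.
    sample = json.dumps({"id": "1", "name": "A", "username": "a", "description": "d"})
    print(missing_v11_keys(sample))
```

A non-empty result for a v2 record would be consistent with the `KeyError` above.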

I'm not sure if my comment makes sense; maybe you could have a look?
Thanks in advance!

@computermacgyver computermacgyver changed the title KeyError: 'default_profile_image' (possible incompatibility with Twitter v2 API jsonl file) Feature: Support v2 API data as input Apr 21, 2022
@computermacgyver
Member

computermacgyver commented Apr 21, 2022

Thanks, @narcisoyu. You are correct that this code was designed for the v1.1 API.

We haven't written ingest code for v2, but it should be straightforward. I'm happy to support you in doing this if you're willing to have a go.

@computermacgyver
Member

For anyone who finds this: the "Academic API" returns data in the v2 format.

Ultimately, m3 needs data in the format shown in this example file:
https://github.com/euagendas/m3inference/blob/master/test/data.jsonl

We have code to go from the v1.1 API to that format, but do not have code to go from the v2 API output to that format. I'd like to add that but do not have capacity at the moment.
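
Until that exists, one possible workaround is to reshape v2 user objects into v1.1-style records before handing them to the existing v1.1 code path. The sketch below is not part of m3inference. `v2_user_to_v11` and `convert_v2_jsonl` are hypothetical names, it assumes one v2 user object per line, and it assumes the v2 request included `user.fields=description,profile_image_url` (these fields are not returned by default). v2 has no `default_profile_image` flag, so the sketch guesses it from the image URL, which is also an assumption.

```python
import json


def v2_user_to_v11(v2_user):
    """Map a Twitter API v2 user object to a v1.1-style dict (hypothetical helper)."""
    img_url = v2_user.get("profile_image_url", "")
    return {
        "id_str": str(v2_user["id"]),
        "name": v2_user.get("name", ""),
        # v2 calls the handle "username"; v1.1 calls it "screen_name".
        "screen_name": v2_user.get("username", ""),
        "description": v2_user.get("description", ""),
        "profile_image_url_https": img_url,
        # Assumption: treat a missing URL, or one under "default_profile_images",
        # as the default avatar, since v2 does not provide this flag.
        "default_profile_image": (not img_url) or "default_profile_images" in img_url,
    }


def convert_v2_jsonl(input_file, output_file):
    """Rewrite a JSON Lines file of v2 users as v1.1-style records, one per line."""
    with open(input_file, encoding="utf-8") as fh_in, \
            open(output_file, "w", encoding="utf-8") as fh_out:
        for line in fh_in:
            if not line.strip():
                continue  # skip blank lines
            fh_out.write(json.dumps(v2_user_to_v11(json.loads(line))) + "\n")
```

The converted file could then be passed to `m3twitter.transform_jsonl(input_file=..., output_file=...)` as in the original report. Whether that succeeds depends on which other v1.1 keys the library reads, which this sketch does not verify.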
