Twitter extractor will retrieve "/home" instead of a tweet URL #520

Open
Divi opened this issue Jun 30, 2023 · 2 comments · May be fixed by #532

Divi commented Jun 30, 2023

Twitter now requires a login for everything: you cannot view a tweet without an account.
So the extractor requests twitter.com/xxx/status/xxx, follows the redirect to /home (the login screen), and then calls the oEmbed API with the /home URI instead of the tweet URL.

The only fix I have found is to disable cURL's "follow redirection" behavior:

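use Embed\Embed;
use Embed\Http\Crawler;
use Embed\Http\CurlClient;

// Stop cURL from following the redirect to /home.
$client = new CurlClient();
$client->setSettings([
    'follow_location' => false
]);

$embed = new Embed(new Crawler($client));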

We could instead send the auth_token cookie with the request, but I'm not sure the token stays valid for more than a few hours or days.

Disabling redirects may affect other embeds, so if you have a better solution, please let me know!

@stevecoug
Contributor

Thank you for the fix, that worked for me as well. I only use that for twitter.com URLs.
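
Restricting the workaround to twitter.com URLs could look roughly like the sketch below. The helper names `isTwitterUrl` and `createEmbedFor` are hypothetical and not part of the library, and the host check (twitter.com plus its subdomains) is an assumption about which URLs need the workaround.

```php
<?php
declare(strict_types=1);

use Embed\Embed;
use Embed\Http\Crawler;
use Embed\Http\CurlClient;

/**
 * Hypothetical helper: true when the URL's host is twitter.com
 * or one of its subdomains (www., mobile., ...).
 */
function isTwitterUrl(string $url): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // malformed URL or no host component
    }

    $host = strtolower($host);
    $suffix = '.twitter.com';

    return $host === 'twitter.com'
        || substr($host, -strlen($suffix)) === $suffix;
}

/**
 * Hypothetical factory: disables redirect following only for Twitter URLs,
 * so other providers keep the default client behavior.
 */
function createEmbedFor(string $url): Embed
{
    if (!isTwitterUrl($url)) {
        return new Embed();
    }

    $client = new CurlClient();
    $client->setSettings([
        'follow_location' => false,
    ]);

    return new Embed(new Crawler($client));
}
```

With this in place, `createEmbedFor($url)->get($url)` would use the no-redirect client for tweets only, which limits the risk of breaking embeds from providers that rely on redirects.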


helmo commented Mar 27, 2024

Thanks, the 'follow_location' setting also helped here.

Here's the patch I used to add it. In my case the library is used from the Drupal url_embed module: https://www.drupal.org/project/url_embed/issues/3435840

--- src/Http/Crawler.php.orig   2024-03-27 13:33:31.547671482 +0100
+++ src/Http/Crawler.php        2024-03-27 13:34:14.180154682 +0100
@@ -23,6 +23,9 @@
     public function __construct(ClientInterface $client = null, RequestFactoryInterface $requestFactory = null, UriFactoryInterface $uriFactory = null)
     {
         $this->client = $client ?: new CurlClient();
+        $this->client->setSettings([
+                'follow_location' => false
+        ]);
         $this->requestFactory = $requestFactory ?: FactoryDiscovery::getRequestFactory();
         $this->uriFactory = $uriFactory ?: FactoryDiscovery::getUriFactory();
     }

rootpd pushed a commit to remp2020/remp that referenced this issue Mar 28, 2024
miroc added a commit to remp2020/mailer-module that referenced this issue Apr 24, 2024