charset 1251 problem #509

Open
sakirsa opened this issue Dec 3, 2022 · 6 comments

@sakirsa

sakirsa commented Dec 3, 2022

Hi. I've run into a problem with encoding: Cyrillic text does not display. How can I collect data from a page encoded in Windows-1251? I tried to use iconv, but got: Fatal error: Uncaught TypeError: iconv(): Argument #3 ($string) must be of type string, Embed\Extractor given
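For context, the TypeError says that iconv() only accepts a string as its third argument, so it cannot be applied to an Embed\Extractor object directly. The following is a minimal sketch of converting a raw Windows-1251 string to UTF-8 with iconv(). The helper name cp1251ToUtf8 and the sample bytes are hypothetical, and this does not show how Embed handles encodings internally.

```php
<?php
// Hypothetical helper (not part of Embed): convert raw Windows-1251 bytes to UTF-8.
function cp1251ToUtf8(string $html): string
{
    // iconv() expects a string here; passing an object raises the TypeError quoted above.
    $converted = iconv('CP1251', 'UTF-8', $html);
    if ($converted === false) {
        throw new RuntimeException('iconv could not convert the input');
    }
    return $converted;
}

// The word "Привет" encoded as Windows-1251 bytes (illustrative sample data).
$cp1251 = "\xCF\xF0\xE8\xE2\xE5\xF2";
echo cp1251ToUtf8($cp1251), PHP_EOL;
```

This only helps when the raw response body is available as a string before it is parsed.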

@oscarotero
Owner

Can I have a URL to reproduce this?

@sakirsa
Author

sakirsa commented Dec 4, 2022

@sakirsa
Author

sakirsa commented Dec 4, 2022

https://feedster.ru/emb.php to check whether Cloudflare blocks the request

@parrycarry

Oooo, I too would like to see a solution for this. The Cyrillic is completely voided when the page isn't UTF-8.

@oscarotero
Owner

Embed uses this parser to convert the HTML to DOMDocuments and deal with encoding issues.

The function is used here. I've forced the encoding in the second argument and it works fine:

$this->document = !empty($html) ? Parser::parse($html, "windows-1251") : new DOMDocument();

So the parser's automatic encoding detection is probably failing. I'm not sure how to fix it, and I don't have much time to dig into this.

If anyone wants to work on a PR to fix it, it would be much appreciated.
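To illustrate why forcing the source encoding helps, here is a minimal sketch that uses only DOMDocument and mbstring. It is not the parser's actual code: the helper name loadHtmlAs and the sample markup are hypothetical, and the numeric-entity step is just one way to keep libxml from guessing the charset.

```php
<?php
// Hypothetical helper (not the parser's implementation): load HTML whose
// encoding is already known, bypassing any automatic detection.
function loadHtmlAs(string $html, string $encoding): DOMDocument
{
    // Convert from the declared encoding to UTF-8 first.
    $utf8 = mb_convert_encoding($html, 'UTF-8', $encoding);
    // Replace every non-ASCII character with a numeric entity so the
    // markup handed to libxml is plain ASCII.
    $ascii = mb_encode_numericentity($utf8, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $previous = libxml_use_internal_errors(true); // silence markup warnings
    $document = new DOMDocument();
    $document->loadHTML($ascii);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $document;
}

// A page title in Windows-1251 bytes (illustrative sample data).
$html = "<html><head><title>\xCF\xF0\xE8\xE2\xE5\xF2</title></head><body></body></html>";
$document = loadHtmlAs($html, 'Windows-1251');
echo $document->getElementsByTagName('title')->item(0)->textContent, PHP_EOL;
```

The trade-off is that the caller has to know the encoding up front, which is exactly what automatic detection is meant to avoid.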

@parrycarry

I haven't thoroughly tested this, but $detected = $encoding ?? mb_detect_encoding($html, NULL, true); seems to fix the problem. Windows-1251 Cyrillic shows up, as does UTF-8 Cyrillic, with no problem.
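A minimal sketch of that fallback pattern follows, for anyone who wants to try it outside the parser. The helper name resolveEncoding, the explicit candidate list, and the UTF-8 default are assumptions for illustration. The line above passes NULL, which makes mb_detect_encoding() use the configured detect order, so results may differ between installations.

```php
<?php
// Hypothetical helper mirroring the fallback above: prefer an explicit
// encoding, otherwise ask mbstring to detect one in strict mode.
function resolveEncoding(string $html, ?string $encoding = null): string
{
    // Assumption: an explicit candidate list instead of NULL, so the
    // outcome does not depend on the mbstring detect_order setting.
    $detected = $encoding ?? mb_detect_encoding($html, ['UTF-8', 'Windows-1251'], true);

    // mb_detect_encoding() returns false when no candidate matches;
    // falling back to UTF-8 here is an arbitrary choice for the sketch.
    return $detected !== false ? $detected : 'UTF-8';
}

// The word "Привет" as Windows-1251 bytes, which are not valid UTF-8.
$cp1251 = "\xCF\xF0\xE8\xE2\xE5\xF2";
echo resolveEncoding($cp1251), PHP_EOL;               // detected from the bytes
echo resolveEncoding($cp1251, 'ISO-8859-5'), PHP_EOL; // explicit value wins
```

Strict mode matters here because it rejects UTF-8 for byte sequences that are not valid UTF-8, leaving the single-byte candidate as the match.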


3 participants