"Incorrect string value" when adding some PDFs #7264

Open
tsia opened this issue Feb 18, 2024 · 0 comments

Environment

  • Version: 2.6.8
  • Installation: git clone
  • PHP version: 7.4.33
  • OS: Debian 11
  • Database: MariaDB
  • Parameters:
My app/config/parameters.yml is:
# This file is auto-generated during the composer install
parameters:
  database_driver: pdo_mysql
  database_host: mysql.local
  database_port: null
  database_name: wallabag
  database_user: wallabag
  database_password: ********
  database_path: null
  database_table_prefix: wallabag_
  database_socket: null
  database_charset: utf8mb4
  domain_name: 'https://wallabag.example.com'
  server_name: 'Your wallabag instance'
  locale: en
  secret: ********
  twofactor_sender: mail@example.com
  fosuser_registration: false
  fosuser_confirmation: false
  fos_oauth_server_access_token_lifetime: 3600
  fos_oauth_server_refresh_token_lifetime: 1209600
  from_email: mail@example.com
  rss_limit: 50
  rabbitmq_host: localhost
  rabbitmq_port: 5672
  rabbitmq_user: guest
  rabbitmq_password: guest
  rabbitmq_prefetch_count: 10
  redis_scheme: tcp
  redis_host: cache.local
  redis_port: 6379
  redis_path: null
  redis_password: null
  sentry_dsn: null
  mailer_dsn: 'smtp://mailout.local'

What steps will reproduce the bug?

Adding this PDF: https://trustedcomputinggroup.org/wp-content/uploads/TCG-PC-Client-Platform-Firmware-Profile-Version-1.06-Revision-52_pub-1.pdf fails with a MySQL error:
SQLSTATE[22007]: Invalid datetime format: 1366 Incorrect string value: '\xEC\x00\xA0\xE0";...' for column `wallabag`.`wallabag_entry`.`published_by` at row 1.
I checked the utf8mb4 settings in the database and in the config above, and everything seems fine. To me it looks like the string in question isn't even valid UTF-8, which is why MySQL is complaining. So I guess the PDF extraction doesn't like the document and spits out some weird bytes.

My quick and dirty fix was to patch Entry::setPublishedBy() like this:

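public function setPublishedBy($publishedBy)
{
    // Quick workaround: strip everything outside printable ASCII (0x20-0x7E).
    // Note that this also removes legitimate non-ASCII characters.
    $this->publishedBy = preg_replace('/[^\x20-\x7E]+/', '', $publishedBy);

    return $this;
}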
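A less lossy variant of the same idea might drop only the bytes that are not valid UTF-8 and keep legitimate non-ASCII characters. The following is a minimal sketch, not wallabag code: `scrubUtf8()` is a hypothetical helper name, it assumes the mbstring extension is loaded, and it accepts both strings and arrays because the snippet above does not show which type `$publishedBy` has.

```php
<?php

/**
 * Hypothetical helper (not part of wallabag): removes byte sequences that
 * are not valid UTF-8 and leaves valid multi-byte characters untouched.
 * Assumes the mbstring extension is available.
 */
function scrubUtf8($value)
{
    // Recurse into arrays so every element gets cleaned.
    if (is_array($value)) {
        return array_map('scrubUtf8', $value);
    }

    // Leave non-strings and already valid strings as they are.
    if (!is_string($value) || mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }

    // Temporarily tell mbstring to drop invalid sequences instead of
    // replacing them with "?", then restore the previous setting.
    $previous = mb_substitute_character();
    mb_substitute_character('none');
    try {
        return mb_convert_encoding($value, 'UTF-8', 'UTF-8');
    } finally {
        mb_substitute_character($previous);
    }
}
```

With such a helper, the setter body could become `$this->publishedBy = scrubUtf8($publishedBy);`. As a sanity check on the guess above, `mb_check_encoding("\xEC\x00\xA0\xE0\";", 'UTF-8')` returns `false` for the bytes shown in the error message, because `\xEC` starts a three-byte sequence and `\x00` is not a continuation byte.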