Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Common Lisp trivial-utf-8 error: Invalid byte at start of character: 0xFC #1562

Open
7 tasks done
JosefMachytkaNetApp opened this issue Feb 23, 2024 · 2 comments
Open
7 tasks done

Comments

@JosefMachytkaNetApp
Copy link

JosefMachytkaNetApp commented Feb 23, 2024

Hi guys, I have encounter a problem which seems to come from trivial-utl-8 lisp library. At least I cannot find this error message anywhere in pgloader code or in PostgreSQL code. Only in this lisp library - https://github.com/fukamachi/trivial-utf-8/blob/master/trivial-utf-8.lisp#L110 - but this code seems to be abandoned for 13 years.

I am using pgloader to copy data in parallel from foreign tables created by tds_fdw extension, into target PostgreSQL tables. In PostgreSQL all works as expected. I can select data from FDW table, I can do INSERT INTO target_table SELECT * FROM fdw_table, all works. But when I start pgloader to copy data from all tables, it fails on one particular table with an error message - "A thread failed with error: Invalid byte at start of character: 0xFC" - see details below.

Underlying data in source database are encoded in UTF-16. Tds_fdw and PostgreSQL process data without problems. But looks like lisp library fails. Maybe someone would have some idea what to do with it?

Thank you very much.


  • pgloader --version
pgloader version "3.6.7~devel"
compiled with SBCL 2.2.9.debian
  • did you test a fresh compile from the source tree?
    yes, no change

  • did you search for other similar issues?
    Yes, this seems to be a very rare issue coming from lisp itself.

  • how can I reproduce the bug?
    Underlying data in source database are encoded in UTF-16. Tds_fdw and PostgreSQL processes data without problems. But lisp library fails.

  • pgloader output you obtain

2024-02-23T08:34:24.097092Z ERROR A thread failed with error: Invalid byte at start of character: 0xFC
2024-02-23T08:34:24.101092Z ERROR Date/time: 2024-02-23-09:34!
An unhandled error condition has been signalled:
   Invalid byte at start of character: 0xFC

Backtrace for: #<SB-THREAD:THREAD "lparallel" RUNNING {10059F0073}>
0: (TRIVIAL-BACKTRACE:PRINT-BACKTRACE-TO-STREAM #<SB-IMPL::STRING-OUTPUT-STREAM {1008493303}>)
1: (TRIVIAL-BACKTRACE:PRINT-BACKTRACE #<CL-POSTGRES-TRIVIAL-UTF-8:UTF-8-DECODING-ERROR {10084930F3}> :OUTPUT NIL :IF-EXISTS :APPEND :VERBOSE NIL)
2: ((LAMBDA (CONDITION) :IN PGLOADER.LOAD:COPY-FROM) #<CL-POSTGRES-TRIVIAL-UTF-8:UTF-8-DECODING-ERROR {10084930F3}>)
3: (LPARALLEL.KERNEL::CONDITION-HANDLER #<CL-POSTGRES-TRIVIAL-UTF-8:UTF-8-DECODING-ERROR {10084930F3}>)
4: (SB-KERNEL::%SIGNAL #<CL-POSTGRES-TRIVIAL-UTF-8:UTF-8-DECODING-ERROR {10084930F3}>)
5: (ERROR CL-POSTGRES-TRIVIAL-UTF-8:UTF-8-DECODING-ERROR :BYTE 252 :MESSAGE "Invalid byte at start of character: 0x~X")
6: (CL-POSTGRES-TRIVIAL-UTF-8:READ-UTF-8-STRING #<unavailable argument> :NULL-TERMINATED #<unavailable argument> :STOP-AT-EOF #<unavailable argument> :CHAR-LENGTH #<unavailable argument> :BYTE-LENGTH #<unavailab
le argument>)
7: ((LAMBDA (#:G0 #:G1) :IN "/usr/share/common-lisp/source/postmodern/cl-postgres/interpret.lisp") #<SB-SYS:FD-STREAM for "socket, peer: /var/run/postgresql/.s.PGSQL.5432" {1007EED623}> 540)
8: ((LAMBDA (#:G39 PGLOADER.SOURCES:FIELDS) :IN PGLOADER.SOURCES:MAP-ROWS) #<SB-SYS:FD-STREAM for "socket, peer: /var/run/postgresql/.s.PGSQL.5432" {1007EED623}> #(#<CL-POSTGRES::FIELD-DESCRIPTION {1007F0AD63}> 
#<CL-POSTGRES::FIELD-DESCRIPTION {1007F0AF13}> #<CL-POSTGRES::FIELD-DESCRIPTION {1007F0B0C3}> #<CL-POSTGRES::FIELD-DESCRIPTION {1007F0B273}> #<CL-POSTGRES::FIELD-DESCRIPTION {1007F0B423}> #<CL-POSTGRES::FIELD-DE
SCRIPTION {1007F0B5D3}> #<CL-POSTGRES::FIELD-DESCRIPTION {1007F0B783}>))
9: (CL-POSTGRES::SEND-QUERY #<SB-SYS:FD-STREAM for "socket, peer: /var/run/postgresql/.s.PGSQL.5432" {1007EED623}> #<unavailable argument> #<unavailable argument>)
10: ((LABELS #:G4 :IN CL-POSTGRES:EXEC-QUERY))
11: ((:METHOD PGLOADER.SOURCES:MAP-ROWS (PGLOADER.SOURCE.PGSQL:COPY-PGSQL)) #<PGLOADER.SOURCE.PGSQL:COPY-PGSQL {1005D33833}> :PROCESS-ROW-FN #<FUNCTION (LAMBDA (PGLOADER.LOAD::ROW) :IN PGLOADER.LOAD:QUEUE-RAW-DA
TA) {1007EEBD0B}>) [fast-method]
  • data that is being loaded, if relevant
    Cannot past client's data here

  • How the data is different from what you expected, if relevant
    Lisp library fails on reading data, although in PostgreSQL itself all works.

@svantevonerichsen6906
Copy link
Collaborator

Can't look deeper now, but trying to decode UTF-16 with UTF-8 would produce this problem, yes. Did you specify the --encoding option?

@JosefMachytkaNetApp
Copy link
Author

@svantevonerichsen6906 Thank you for your comment, I tried it, but looks like in PostgreSQL to PostgreSQL mode pgloader does not allow this parameter. When I add it in command line, I get following messages and pgloader stops:

pgloader version 3.6.7~devel
compiled with SBCL 2.2.9.debian
sb-impl::*default-external-format* :UTF-8
tmpdir: #P"/tmp/pgloader/"
2024-02-23T12:39:52.004000Z NOTICE Starting pgloader, log system is ready.
2024-02-23T12:39:52.016001Z INFO Starting monitor
2024-02-23T12:39:52.024001Z LOG pgloader version "3.6.7~devel"
2024-02-23T12:39:52.024001Z INFO Stopping monitor

And if I try to add it into WITH part, pgloader complains on config file parsing:

2024-02-23T12:38:48.316014Z INFO Stopping monitor

What I am doing here?

At

          concurrency = 1,
          prefetch rows = 100,
  
                             ^ (Line 11, Column 27, Position 249)

In context PGSQL-OPTIONS:

While parsing PGSQL-OPTIONS. Expected:

     the character Tab
  or the character Newline
  or the character Return
  or the character Space
  or the string "--"
  or the string "/*"
  or the string "batch"
  or the string "concurrency"
  or the string "create"
  or the string "data"
  or the string "disable"
  or the string "downcase"
  or the string "drop"
  or the string "foreign"
  or the string "include"
  or the string "max"
  or the string "no"
  or the string "on"
  or the string "prefetch"
  or the string "preserve"
  or the string "quote"
  or the string "reset"
  or the string "schema"
  or the string "snake_case"
  or the string "truncate"
  or the string "uniquify"
  or the string "workers"

I showed only last 2 lines from WITH part which it was able to parse, next one is "encoding" and it fails on it because it is not among expected strings.

So unfortunately it looks like "encoding" cannot help here.

BTW, I used FreeTDS library with "client charset = UTF-8" so it should convert data to UTF-8 - it had been discussed already on StackOverflow. So it is surprising for me that Lisp seems to fail. Do you know if ODBC would work better here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants