Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't decode ill-formed UTF-8 octet sequence when using --set argument #809

Open
joaostorrer opened this issue Jan 11, 2024 · 17 comments
Open
Assignees
Labels
waiting Waiting on feedback

Comments

@joaostorrer
Copy link

joaostorrer commented Jan 11, 2024

Sqitch is failing whenever I set a variable with a value that contains an accent:

root@d975b89f1c63:/repo# sqitch deploy flipr_test --set var="ãçê"
Deploying changes to flipr_test
  + bla_heg ......... not ok
Can't decode ill-formed UTF-8 octet sequence <E3> at end of file at /bin/../lib/perl5/App/Sqitch.pm line 480.

Deploy failed

The same happens with revert and rebase commands.

I tried setting LC_ALL & LANG envs to pt_BR.UTF-8, but the error continues.

root@d975b89f1c63:/repo# sqitch --version
sqitch (App::Sqitch) v1.4.0
root@d975b89f1c63:/repo# perl -v

This is perl 5, version 32, subversion 1 (v5.32.1) built for x86_64-linux-gnu-thread-multi
(with 47 registered patches, see perl -V for more detail)

Copyright 1987-2021, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl".  If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.
@theory
Copy link
Collaborator

theory commented Jan 16, 2024

Hrm. That's coming from the spool method, which spools output from the database CLI. Which RDBMS are you using? I wonder if it's not properly emitting UTF-8 for some reason.

@theory theory self-assigned this Jan 16, 2024
@joaostorrer
Copy link
Author

I'm using PostgreSQL.

@theory
Copy link
Collaborator

theory commented Jan 16, 2024

Is your database using UTF-8 encoding? Check client_encoding.

@joaostorrer
Copy link
Author

Sorry, actually Postgres works fine, the problem is with Oracle database.
My deploy consists in deploying to Postgres and Oracle, so i got confused.

My Oracle database is configured with WE8MSWIN1252 charset:

SQL> select *
  2  from   nls_database_parameters
  3  where  parameter in ('NLS_RDBMS_VERSION', 'NLS_DATE_LANGUAGE', 'NLS_NCHAR_CHARACTERSET', 'NLS_CHARACTERSET', 'NLS_ISO_CURRENCY', 'NLS_TERRITORY', 'NLS_LANGUAGE')
  4  order  by 1;
 
PARAMETER                                                                        VALUE
-------------------------------------------------------------------------------- ----------------------------------------------------------------
NLS_CHARACTERSET                                                                 WE8MSWIN1252
NLS_DATE_LANGUAGE                                                                AMERICAN
NLS_ISO_CURRENCY                                                                 AMERICA
NLS_LANGUAGE                                                                     AMERICAN
NLS_NCHAR_CHARACTERSET                                                           AL16UTF16
NLS_RDBMS_VERSION                                                                19.0.0.0.0
NLS_TERRITORY                                                                    AMERICA
 
7 rows selected

But when I define the Sql*Plus variable with the same value it works fine:

SQL> define test = "ãçê"
SQL> define
DEFINE _SQLPLUS_RELEASE = "000000000" (CHAR)
DEFINE _EDITOR          = "PLSQLDev" (CHAR)
DEFINE _DATE            = "17/01/2024" (CHAR)
DEFINE _PRIVILEGE       = "" (CHAR)
DEFINE _O_VERSION       = "Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 " (CHAR)
DEFINE _O_RELEASE       = "000000000" (CHAR)
DEFINE test             = "ãçê" (CHAR)
 
SQL> select '&test' from dual;
 
'ÃÇÊ'
-----
ãçê

@theory
Copy link
Collaborator

theory commented Jan 19, 2024

Sqitch requires everything to be UTF-8, and it sets up the Oracle connection for that:

BEGIN {
# We tell the Oracle connector which encoding to use. The last part of the
# environment variable NLS_LANG is relevant concerning data encoding.
$ENV{NLS_LANG} = 'AMERICAN_AMERICA.AL32UTF8';
# Disable SQLPATH so that no start scripts run.
$ENV{SQLPATH} = '';
}

So be sure that the contents of the variable are UTF-8 and not CP1252.

@joaostorrer
Copy link
Author

I tried a few things to guarantee the variable is in UTF-8:

root@18d3f3eb022a:/repo# sqitch deploy flipr_test --set var="$(echo "ãçê")"
Deploying changes to flipr_test
  + appschema ................. not ok
Can't decode ill-formed UTF-8 octet sequence <E3> at end of file at /bin/../lib/perl5/App/Sqitch.pm line 480.

Deploy failed
root@18d3f3eb022a:/repo# sqitch deploy flipr_test --set var="$(echo "ãçê" | iconv -t UTF-8)"
Deploying changes to flipr_test
  + appschema ................. not ok
Can't decode ill-formed UTF-8 octet sequence <E3> at end of file at /bin/../lib/perl5/App/Sqitch.pm line 480.

Deploy failed
root@18d3f3eb022a:/repo# echo "ãçê" | file -
/dev/stdin: UTF-8 Unicode text
root@18d3f3eb022a:/repo# echo -n "ãçê" | file -
/dev/stdin: UTF-8 Unicode text, with no line terminators

I also tested deploying in another Oracle Database that is configured with NLS_CHARACTERSET AL32UTF8, but I got the same error message.

@theory
Copy link
Collaborator

theory commented Jan 23, 2024

I think the problem is not the text read from the command-line, but text returned from SQLPlus. If it doesn't emit UTF-8 then Sqitch will have trouble reading it. Could SQLPlus be ignoring OUTPUT_CHARSET?

Try this patch:

--- a/lib/App/Sqitch.pm
+++ b/lib/App/Sqitch.pm
@@ -472,6 +472,7 @@ sub spool {
     }
 
     local $SIG{PIPE} = sub { die 'spooler pipe broke' };
+    binmode $fh, ':raw';
     if (ref $fh eq 'ARRAY') {
         for my $h (@{ $fh }) {
             print $pipe $_ while <$h>;

It should treat the data read from SQL*Plus as raw bytes, rather than try to decode them as UTF-8. Will be interesting to see what it emits.

@joaostorrer
Copy link
Author

The patch works!

@theory
Copy link
Collaborator

theory commented Jan 26, 2024

Yes, the patch works because it's not decoding the output from SQLPlus into UTF-8. Which means SQLPlus is not emitting UTF-8. Clearly there's some additional configuration we need to make to get it to do so.

@theory
Copy link
Collaborator

theory commented Jan 26, 2024

Sure seems like setting NLS_LANG should work.

@theory theory added the waiting Waiting on feedback label Feb 4, 2024
@joaostorrer
Copy link
Author

I tried setting NLS_LANG to different values like BRAZILIAN PORTUGUESE_BRAZIL.UTF8, PORTUGUESE_BRAZIL.UTF8, AMERICAN_AMERICA.UTF8 and AMERICAN_AMERICA.AL32UTF8, but they all failed with Can't decode ill-formed...

@theory
Copy link
Collaborator

theory commented Feb 12, 2024

Yeah, Sqitch sets NLS_LANG, so anything you set it to doesn't work. Unless you tried setting it in the source?

BEGIN {
# We tell the Oracle connector which encoding to use. The last part of the
# environment variable NLS_LANG is relevant concerning data encoding.
$ENV{NLS_LANG} = 'AMERICAN_AMERICA.AL32UTF8';
# Disable SQLPATH so that no start scripts run.
$ENV{SQLPATH} = '';
}

@joaostorrer
Copy link
Author

If I set to BRAZILIAN PORTUGUESE_BRAZIL.UTF8 or PORTUGUESE_BRAZIL.UTF8 in the source I get the same error.

@theory
Copy link
Collaborator

theory commented Feb 14, 2024

For some reason SQL*Plus is returning non-UTF-8 bytes. I do not understand why and don't have the resources to investigate. Maybe it's worth asking on Stack Overflow or an Oracle forum?

The key thing is to get SQL*Plus to emit only UTF-8 bytes. Though honestly I have no idea what it thinks it's emitting. Your change name is ASCII, so should be fine. When you tried the patch a few weeks ago, you said it worked; what, exactly, did it emit? Very curious to know what it might be choking on.

@joaostorrer
Copy link
Author

joaostorrer commented Feb 14, 2024

I inserted the content of var into a table to get what it emit with the patch applied.
The column was inserted with que upside down question mark (¿).
As far as I know, this only indicates that the database tried to inserted a character that is not recognized by SQL*Plus.

@theory
Copy link
Collaborator

theory commented Feb 15, 2024

Is the deploy script you wrote UTF-8 encoded?

@joaostorrer
Copy link
Author

Yes, all the .sql scripts inside my project are UTF-8 (without BOM) encoded and Windows (CR LF).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
waiting Waiting on feedback
Projects
None yet
Development

No branches or pull requests

2 participants