[Bug]: SSL Validation Error with Diversity - Postgresql starting 2.0.0 #2282

Open
1 task done
cambrosch opened this issue May 2, 2024 · 17 comments

@cambrosch

cambrosch commented May 2, 2024

Environment

  • VerneMQ Version: 2.0.0
  • OS: Docker
  • Erlang/OTP version (if building from source):
  • Cluster size/standalone: standalone

Current Behavior

After upgrading from 1.13.0 to 2.0.0 with exactly the same Docker parameters, vmq_diversity can no longer connect to our PostgreSQL server (hosted in Azure) via SSL; see the error in the log below.

A downgrade back to 1.13.0 with the same parameters fixed the issue. Validating the certificate chain using pgadmin (mode: verify-full) showed no issues with SSL.

Expected behaviour

Connecting to this sql server should not result in a validation error.

Configuration, logs, error output, etc.

Error in Console:

2024-05-02T08:58:15.702041604Z 2024-05-02T08:58:15.699735+00:00 [error] <0.628.0> gen_server:error_info/8:1391: Generic server <0.628.0> terminating. Reason: {ssl_negotiation_failed,{options,incompatible,[{verify,verify_peer},{cacerts,undefined}]}}. Last message: {command,epgsql_cmd_connect,#{port => 5432,ssl => true,host => "servername-removed.postgres.database.azure.com",password => #Fun<epgsql_cmd_connect.0.87005817>,database => "emili",username => "psql",ssl_opts => []}}. State: {state,undefined,undefined,<<>>,undefined,on_message,undefined,{[],[]},undefined,undefined,undefined,undefined,[],information_redacted,[],undefined,undefined,undefined,undefined,undefined}. Client <0.618.0> stacktrace: [{gen,do_call,4,[{file,"gen.erl"},{line,240}]},{gen_server,call,3,[{file,"gen_server.erl"},{line,415}]},{epgsql,call_connect,2,[{file,"/opt/vernemq/_build/default/lib/epgsql/src/epgsql.erl"},{line,207}]},{vmq_diversity_worker_wrapper,handle_info,2,[{file,"/opt/vernemq/apps/vmq_diversity/src/vmq_diversity_worker_wrapper.erl"},{line,176}]}].
2024-05-02T08:58:15.703000270Z 2024-05-02T08:58:15.699712+00:00 [error] <0.626.0> gen_server:error_info/8:1391: Generic server <0.626.0> terminating. Reason: {ssl_negotiation_failed,{options,incompatible,[{verify,verify_peer},{cacerts,undefined}]}}. Last message: {command,epgsql_cmd_connect,#{port => 5432,ssl => true,host => "servername-removed.postgres.database.azure.com",password => #Fun<epgsql_cmd_connect.0.87005817>,database => "emili",username => "psql",ssl_opts => []}}. State: {state,undefined,undefined,<<>>,undefined,on_message,undefined,{[],[]},undefined,undefined,undefined,undefined,[],information_redacted,[],undefined,undefined,undefined,undefined,undefined}. Client <0.616.0> stacktrace: [{logger_config,allow,2,[{file,"logger_config.erl"},{line,64}]},{vmq_diversity_worker_wrapper,handle_info,2,[{file,"/opt/vernemq/apps/vmq_diversity/src/vmq_diversity_worker_wrapper.erl"},{line,181}]},{gen_server,try_handle_info,3,[{file,"gen_server.erl"},{line,1095}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,1183}]}].
2024-05-02T08:58:15.703784054Z 2024-05-02T08:58:15.700569+00:00 [warning] <0.616.0> vmq_diversity_worker_wrapper:handle_info/2:181: Could not connect to postgresql due to {ssl_negotiation_failed,{options,incompatible,[{verify,verify_peer},{cacerts,undefined}]}}

Postgres-related Docker environment parameters:

DOCKER_VERNEMQ_VMQ_DIVERSITY__AUTH_POSTGRES__ENABLED = on
DOCKER_VERNEMQ_VMQ_DIVERSITY__POSTGRES__HOST = servername-removed.postgres.database.azure.com
DOCKER_VERNEMQ_VMQ_DIVERSITY__POSTGRES__PORT = 5432
DOCKER_VERNEMQ_VMQ_DIVERSITY__POSTGRES__USER = psql
DOCKER_VERNEMQ_VMQ_DIVERSITY__POSTGRES__SSL = on
DOCKER_VERNEMQ_VMQ_DIVERSITY__POSTGRES__PASSWORD = removed
DOCKER_VERNEMQ_VMQ_DIVERSITY__POSTGRES__DATABASE = removed
DOCKER_VERNEMQ_PLUGINS__VMQ_DIVERSITY = on
DOCKER_VERNEMQ_LISTENER__SSL__DEFAULT = 0.0.0.0:8883
DOCKER_VERNEMQ_LISTENER__SSL__CAFILE = /etc/ssl/ca.pem
DOCKER_VERNEMQ_LISTENER__SSL__CERTFILE = /etc/ssl/cert.pem
DOCKER_VERNEMQ_LISTENER__SSL__KEYFILE = /etc/ssl/key.pem
DOCKER_VERNEMQ_LISTENER__SSL__TLS_VERSION = tlsv1.3
DOCKER_VERNEMQ_LISTENER__SSL__REQUIRE_CERTIFICATE = off

The PostgreSQL server is set to:
min SSL version: TLS 1.2
max SSL version: TLS 1.3

Code of Conduct

  • I agree to follow the VerneMQ's Code of Conduct
@cambrosch cambrosch added the bug label May 2, 2024
@ioolkos
Contributor

ioolkos commented May 2, 2024

@cambrosch I think this comes from new SSL requirements in OTP 26 (which is used for 2.0.0, while 1.13.0 is based on OTP 25).
But it's a good catch. It seems in 26, SSL wants to explicitly know about the CA chain.

Can you try setting vmq_diversity.postgres.cafile in vernemq.conf?
Maybe pointing to the system CA certs is enough (/etc/ssl/certs/ca-certificates.crt), maybe not...
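
Before pointing vmq_diversity.postgres.cafile at a bundle, it may help to confirm that the file is actually visible inside the container and contains certificates. The following is a minimal sketch in Python, not part of VerneMQ; the helper name count_pem_certs is hypothetical, and the path is simply the one mentioned above.

```python
from pathlib import Path

PEM_BEGIN = "-----BEGIN CERTIFICATE-----"


def count_pem_certs(path: str) -> int:
    """Return the number of PEM certificates in the file, or 0 if it is missing."""
    bundle = Path(path)
    if not bundle.is_file():
        return 0
    # PEM is ASCII; ignore any stray bytes instead of failing on them.
    text = bundle.read_text(encoding="ascii", errors="ignore")
    return text.count(PEM_BEGIN)
```

Calling count_pem_certs("/etc/ssl/certs/ca-certificates.crt") inside the container and getting 0 would suggest that the bundle is missing or empty, for example because a volume mounted over /etc/ssl hides it.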


👉 Thank you for supporting VerneMQ: https://github.com/sponsors/vernemq
👉 Using the binary VerneMQ packages commercially (.deb/.rpm/Docker) requires a paid subscription.

@cambrosch
Author

That sadly doesn't work. I can't add it via DOCKER_VERNEMQ_VMQ_DIVERSITY__POSTGRES__CAFILE, as that throws "Error generating config with cuttlefish". I also can't manually override the config file in the Docker container: I tried several configurations, but if I change the file manually it gets overwritten as soon as I restart VerneMQ, and if I mount a drive to persist the config file, it wipes the Docker container and refuses to work for one reason or another. That's a separate issue, but probably not one I can fix quickly :/

@ioolkos
Contributor

ioolkos commented May 2, 2024

I think you can mount a conf.local file at /etc/vernemq/vernemq.conf.local; when the Docker image finds it, it uses that file as a full replacement for the generated config.
But this will not solve the issue here. An error generating the config is usually caused by a wrong setting name. But yours looks correct :(



@ioolkos
Contributor

ioolkos commented May 2, 2024

@cambrosch do you see the Cuttlefish config error printed to your console when you run the Docker image in the foreground?



@cambrosch
Author

2024-05-02T10:15:08.68071  Connecting to the container 'vernemq'...
2024-05-02T10:15:08.70573  Successfully Connected to container: 'vernemq' [Revision: 'vernemq--jamr5ej-5dfd4d78dc-4khc5', Replica: 'vernemq--jamr5ej']
2024-05-02T10:15:10.703798696Z Error generating config with cuttlefish
2024-05-02T10:15:10.703850738Z   run `vernemq config generate -l debug` for more information.

@ioolkos
Contributor

ioolkos commented May 2, 2024

@cambrosch are you able to attach to the container and run vernemq config generate -l debug? This should print out the actual config problem.



@cambrosch
Author

Sadly, the container immediately crashes upon getting this message, so I cannot attach a console :/

@ioolkos
Contributor

ioolkos commented May 2, 2024

I just tested this with a docker run, feeding it an example.env file with Postgres configs similar to yours. It initially complained about whitespace around the = signs in the env file, but other than that it seems to work; at least there were no complaints when generating the config. I'm not sure how you run the Docker image, though.
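
The whitespace complaint can be avoided by writing env-file entries as KEY=value with no spaces around the equals sign. As a rough sketch, assuming the parameters are kept in a plain-text env file, a small Python helper (the name normalize_env_line is hypothetical) could clean up lines written in the `KEY = value` style used earlier in this issue:

```python
def normalize_env_line(line: str) -> str:
    """Strip whitespace around the first '=' in a 'KEY = value' line.

    Blank lines, comments, and lines without '=' are returned with only
    their surrounding whitespace stripped.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#") or "=" not in stripped:
        return stripped
    # Split on the first '=' only, so values containing '=' stay intact.
    key, value = stripped.split("=", 1)
    return f"{key.strip()}={value.strip()}"
```

For example, `DOCKER_VERNEMQ_VMQ_DIVERSITY__POSTGRES__SSL = on` becomes `DOCKER_VERNEMQ_VMQ_DIVERSITY__POSTGRES__SSL=on`.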



@cambrosch
Author

Ah, I messed that up. I had /etc/ssl mounted for the MQTT TLS certs, so /etc/ssl/certs/ca-certificates.crt didn't even exist. I re-created that now, and now the config at least boots again. Alas, now I'm on to a new error:

2024-05-02T14:49:19.698008699Z 2024-05-02T14:49:19.697685+00:00 [notice] <0.3694.0> ssl_handshake:path_validation_alert/1:2127: TLS client: In state wait_cert at ssl_handshake.erl:2127 generated CLIENT ALERT: Fatal - Handshake Failure, - {bad_cert,hostname_check_failed}
2024-05-02T14:49:19.698101996Z 2024-05-02T14:49:19.697896+00:00 [warning] <0.616.0> vmq_diversity_worker_wrapper:handle_info/2:181: Could not connect to postgresql due to {ssl_negotiation_failed,{tls_alert,{handshake_failure,"TLS client: In state wait_cert at ssl_handshake.erl:2127 generated CLIENT ALERT: Fatal - Handshake Failure\n {bad_cert,hostname_check_failed}"}}}
2024-05-02T14:49:19.699001414Z 2024-05-02T14:49:19.697927+00:00 [error] <0.3689.0> gen_server:error_info/8:1391: Generic server <0.3689.0> terminating. Reason: {ssl_negotiation_failed,{tls_alert,{handshake_failure,"TLS client: In state wait_cert at ssl_handshake.erl:2127 generated CLIENT ALERT: Fatal - Handshake Failure\n {bad_cert,hostname_check_failed}"}}}. Last message: {command,epgsql_cmd_connect,#{port => 5432,ssl => true,host => "hostname-removed.postgres.database.azure.com",password => #Fun<epgsql_cmd_connect.0.87005817>,database => "removed",username => "psql",ssl_opts => [{cacertfile,"/etc/ssl/certs/ca-certificates.crt"}]}}. State: {state,undefined,undefined,<<>>,undefined,on_message,undefined,{[],[]},undefined,undefined,undefined,undefined,[],information_redacted,[],undefined,undefined,undefined,undefined,undefined}. Client <0.616.0> stacktrace: [{gen,do_call,4,[{file,"gen.erl"},{line,240}]},{gen_server,call,3,[{file,"gen_server.erl"},{line,415}]},{epgsql,call_connect,2,[{file,"/opt/vernemq/_build/default/lib/epgsql/src/epgsql.erl"},{line,207}]},{vmq_diversity_worker_wrapper,handle_info,2,[{file,"/opt/vernemq/apps/vmq_diversity/src/vmq_diversity_worker_wrapper.erl"},{line,176}]}].
2024-05-02T14:49:19.699475025Z 2024-05-02T14:49:19.698600+00:00 [error] <0.3689.0> proc_lib:crash_report/4:584: crasher: initial call: epgsql_sock:init/1, pid: <0.3689.0>, registered_name: [], exit: {{ssl_negotiation_failed,{tls_alert,{handshake_failure,"TLS client: In state wait_cert at ssl_handshake.erl:2127 generated CLIENT ALERT: Fatal - Handshake Failure\n {bad_cert,hostname_check_failed}"}}},[{gen_server,handle_common_reply,8,[{file,"gen_server.erl"},{line,1226}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}, ancestors: [<0.616.0>,<0.615.0>,auth_postgres,vmq_diversity_sup,<0.595.0>], message_queue_len: 0, messages: [], links: [<0.616.0>], dictionary: [], trap_exit: false, status: running, heap_size: 10958, stack_size: 28, reductions: 33545; neighbours:
2024-05-02T14:49:19.713975185Z 2024-05-02T14:49:19.713598+00:00 [notice] <0.3698.0> ssl_handshake:path_validation_alert/1:2127: TLS client: In state wait_cert at ssl_handshake.erl:2127 generated CLIENT ALERT: Fatal - Handshake Failure, - {bad_cert,hostname_check_failed}
2024-05-02T14:49:19.714218930Z 2024-05-02T14:49:19.713791+00:00 [warning] <0.620.0> vmq_diversity_worker_wrapper:handle_info/2:181: Could not connect to postgresql due to {ssl_negotiation_failed,{tls_alert,{handshake_failure,"TLS client: In state wait_cert at ssl_handshake.erl:2127 generated CLIENT ALERT: Fatal - Handshake Failure\n {bad_cert,hostname_check_failed}"}}}
2024-05-02T14:49:19.714381329Z 2024-05-02T14:49:19.713796+00:00 [error] <0.3690.0> gen_server:error_info/8:1391: Generic server <0.3690.0> terminating. Reason: {ssl_negotiation_failed,{tls_alert,{handshake_failure,"TLS client: In state wait_cert at ssl_handshake.erl:2127 generated CLIENT ALERT: Fatal - Handshake Failure\n {bad_cert,hostname_check_failed}"}}}. Last message: {command,epgsql_cmd_connect,#{port => 5432,ssl => true,host => "hostname-removed.postgres.database.azure.com",password => #Fun<epgsql_cmd_connect.0.87005817>,database => "removed",username => "psql",ssl_opts => [{cacertfile,"/etc/ssl/certs/ca-certificates.crt"}]}}. State: {state,undefined,undefined,<<>>,undefined,on_message,undefined,{[],[]},undefined,undefined,undefined,undefined,[],information_redacted,[],undefined,undefined,undefined,undefined,undefined}. Client <0.620.0> stacktrace: [{gen,do_call,4,[{file,"gen.erl"},{line,240}]},{gen_server,call,3,[{file,"gen_server.erl"},{line,415}]},{epgsql,call_connect,2,[{file,"/opt/vernemq/_build/default/lib/epgsql/src/epgsql.erl"},{line,207}]},{vmq_diversity_worker_wrapper,handle_info,2,[{file,"/opt/vernemq/apps/vmq_diversity/src/vmq_diversity_worker_wrapper.erl"},{line,176}]}].
2024-05-02T14:49:19.714946319Z 2024-05-02T14:49:19.714359+00:00 [error] <0.3690.0> proc_lib:crash_report/4:584: crasher: initial call: epgsql_sock:init/1, pid: <0.3690.0>, registered_name: [], exit: {{ssl_negotiation_failed,{tls_alert,{handshake_failure,"TLS client: In state wait_cert at ssl_handshake.erl:2127 generated CLIENT ALERT: Fatal - Handshake Failure\n {bad_cert,hostname_check_failed}"}}},[{gen_server,handle_common_reply,8,[{file,"gen_server.erl"},{line,1226}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}, ancestors: [<0.620.0>,<0.615.0>,auth_postgres,vmq_diversity_sup,<0.595.0>], message_queue_len: 0, messages: [], links: [<0.620.0>], dictionary: [], trap_exit: false, status: running, heap_size: 10958, stack_size: 28, reductions: 33544; neighbours:

@ioolkos
Contributor

ioolkos commented May 2, 2024

Argh, now it's a verification error (the client tries to verify the peer), on the level of Erlang SSL. Need to research this but cannot do it immediately. Maybe also some sort of wildcard server name is the issue.



@ioolkos
Contributor

ioolkos commented May 4, 2024

I'm now suspecting this is the same as #1485 that we had to fix in the MQTT bridge. Are those wildcard certs?



@ioolkos
Contributor

ioolkos commented May 15, 2024

@cambrosch are you still looking into this? Is the public cert of the Postgres server a wildcard cert? https://en.wikipedia.org/wiki/Wildcard_certificate



@cambrosch
Author

The certificate uses Common Name: removedhash.database.azure.com
Subject Alternative Names: removedhash.database.azure.com, dev-removed-psql.postgres.database.azure.com
Organization: Microsoft Corporation
I don't see any wildcard, but the common name is not the domain name we connect to; that name is only listed in the subject alternative names.
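
To illustrate why a host name that appears only among the subject alternative names would normally still be accepted, here is a simplified Python sketch of the usual matching rule: dNSName entries are checked first, and the common name is consulted only when no such entries exist. This is an illustration under those assumptions, not Erlang's actual implementation; the function names are hypothetical and the values are the redacted ones from this comment.

```python
def dns_name_matches(pattern: str, hostname: str) -> bool:
    """Match one dNSName entry against a hostname, case-insensitively.

    Only a wildcard that is the entire left-most label ("*.example.com")
    is handled in this sketch; it stands for exactly one label.
    """
    p_labels = pattern.lower().rstrip(".").split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        return p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


def cert_matches_host(hostname, san_dns_names, common_name=None):
    """Prefer SAN dNSName entries; fall back to the CN only when there are none."""
    if san_dns_names:
        return any(dns_name_matches(name, hostname) for name in san_dns_names)
    return common_name is not None and dns_name_matches(common_name, hostname)


# Values as redacted in the comment above.
sans = [
    "removedhash.database.azure.com",
    "dev-removed-psql.postgres.database.azure.com",
]
print(cert_matches_host("dev-removed-psql.postgres.database.azure.com", sans,
                        common_name="removedhash.database.azure.com"))  # prints True
```

Under this rule the connection host name matches the second SAN entry, so a common name that differs from the host name would not by itself be expected to cause hostname_check_failed.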

@cambrosch
Author

Also:

depth=2 C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert Global Root CA
verify return:1
depth=1 C = US, O = DigiCert Inc, CN = DigiCert SHA2 Secure Server CA
verify return:1
depth=0 C = US, ST = Washington, L = Redmond, O = Microsoft Corporation, CN = removedhash.database.azure.com
verify return:1
---
Certificate chain
 0 s:C = US, ST = Washington, L = Redmond, O = Microsoft Corporation, CN = removedhash.database.azure.com
   i:C = US, O = DigiCert Inc, CN = DigiCert SHA2 Secure Server CA
 1 s:C = US, O = DigiCert Inc, CN = DigiCert SHA2 Secure Server CA
   i:C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert Global Root CA
 2 s:C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert Global Root CA
   i:C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert Global Root CA

Requested Signature Algorithms: ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA512:Ed25519:Ed448:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512:ECDSA+SHA224:ECDSA+SHA1:RSA+SHA224:RSA+SHA1
Shared Requested Signature Algorithms: ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA512:Ed25519:Ed448:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512
Peer signing digest: SHA256
Peer signature type: RSA-PSS
Server Temp Key: ECDH, P-256, 256 bits
---
SSL handshake has read 8913 bytes and written 839 bytes
Verification: OK
---
New, TLSv1.3, Cipher is TLS_AES_256_GCM_SHA384
Server public key is 2048 bit
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
Early data was not sent
Verify return code: 0 (ok)
---
---
Post-Handshake New Session Ticket arrived:
SSL-Session:
    Protocol  : TLSv1.3
    Cipher    : TLS_AES_256_GCM_SHA384
    Session-ID: 441A89869FA67AE2B6E730907FB563C4103DA580AA1CD249445439FD6652CF19
    Session-ID-ctx:
    Resumption PSK: EC984194F66930E86B393A88C7E5C7EA7BC32C0D8D12743AF40E8E67285EE6F0845B1799FFCDB24AB3096D42AAF9AE5F
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    TLS session ticket lifetime hint: 7200 (seconds)
    TLS session ticket:
    0000 - 33 e5 9b d1 be 3d ee 94-79 33 c0 fd 7d 7f 63 34   3....=..y3..}.c4
    0010 - 62 ca 74 ab a6 bb 76 52-52 2a 6f 63 79 36 95 e1   b.t...vRR*ocy6..

    Start Time: 1715759464
    Timeout   : 7200 (sec)
    Verify return code: 0 (ok)
    Extended master secret: no
    Max Early Data: 0

@ioolkos
Contributor

ioolkos commented May 15, 2024

We'll need to bite the bullet and implement more options for all plugins that need outgoing SSL.

Those are:

  • {verify, verify_none | verify_peer}.
  • {customize_hostname_check, [{match_fun, public_key:pkix_verify_hostname_match_fun(https)}]}.
  • depth. (for CA chains).

The reason is that OTP 26 defaults to verify_peer for clients. Surprisingly, there's no way to configure this via application environment.
Another option would be to fall back to OTP 25.

@cambrosch one thing I wonder though: what happens when you set the Postgres host to an IP address instead of a name (if that's possible in your Azure env)?

EDIT: just to be clear: it's of course not a bad thing to harden requirements with verify_peer. It will require the client to have access to a CA file so that it can verify the server. But I think the hostname_check (SNI) is then also triggered by that.
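
To reproduce a verifying client outside VerneMQ, a small probe can request TLS from the Postgres server and perform both chain and hostname verification. The sketch below uses Python's standard ssl module as a loose analogy: verify_mode = CERT_REQUIRED roughly corresponds to {verify, verify_peer} and check_hostname to the hostname check, while depth has no direct counterpart here. The function names are hypothetical, and the probe is a diagnostic aid, not what vmq_diversity or epgsql do internally.

```python
import socket
import ssl
import struct

# PostgreSQL SSLRequest: int32 message length (8), then int32 code 80877103.
SSL_REQUEST_CODE = 80877103


def build_ssl_request() -> bytes:
    """Build the 8-byte SSLRequest message sent before the TLS handshake."""
    return struct.pack("!II", 8, SSL_REQUEST_CODE)


def make_context(cafile=None) -> ssl.SSLContext:
    """Client context that verifies the certificate chain and the hostname."""
    ctx = ssl.create_default_context(cafile=cafile)
    ctx.verify_mode = ssl.CERT_REQUIRED  # loosely analogous to {verify, verify_peer}
    ctx.check_hostname = True            # loosely analogous to the hostname check
    return ctx


def probe(host: str, port: int = 5432, cafile=None, timeout: float = 10.0) -> str:
    """Return the negotiated TLS version; raises ssl.SSLError if verification fails."""
    with socket.create_connection((host, port), timeout=timeout) as raw:
        raw.sendall(build_ssl_request())
        # The server answers with a single byte: b"S" (TLS accepted) or b"N".
        if raw.recv(1) != b"S":
            raise ConnectionError("server declined the SSLRequest")
        with make_context(cafile).wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()
```

Running probe("<your-server>.postgres.database.azure.com", cafile="/etc/ssl/certs/ca-certificates.crt") from inside the container would show whether a generic verifying client accepts the certificate with that CA bundle and host name.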



@mths1
Contributor

mths1 commented May 19, 2024

@ioolkos: I can reproduce this. Azure DB with the default Microsoft certificates fails as described. Using an IP didn't make any difference.

@ioolkos
Contributor

ioolkos commented May 19, 2024

@mths1 Thanks for testing!
Something like #2288 (untested) is then needed for any outgoing SSL, to be fully OTP 26 compliant.



3 participants