Add a troubleshooting section in our Getting Started guide #2353

Closed
neoaggelos opened this issue Apr 12, 2020 · 31 comments

@neoaggelos
Contributor

Summary

Like #2352. Add a troubleshooting section in the Getting Started for common problems that may arise when following the Getting Started guide.

Why do we need this?

Make docs friendlier to new users.

What is already there? What do you see now?

No troubleshooting section.

What is missing? What do you want to see?

A Troubleshooting section at the end of the getting started guide, for users to be able to look up common problems, along with the reason and simple steps to fix them.

How do you propose to document this?

Our docs should generally be straightforward and easy to follow. However, having a troubleshooting section, with specific error messages and instructions to fix them could prove very helpful for new users.

Can you do this yourself and submit a Pull Request?

yes

@neoaggelos neoaggelos self-assigned this Apr 12, 2020
@neoaggelos neoaggelos added the documentation This involves writing user documentation label Apr 12, 2020
@neoaggelos neoaggelos added this to the April 2020 milestone Apr 12, 2020
@fox27374

Hi, definitely a thumbs up to this. I ran into a couple of problems and open questions while following the guide. At the moment I am stuck with this error. Maybe you can also point to this one in the documentation?
[image: screenshot of the error]

@johanstokking
Member

@fox27374 can you open the browser developer tools and paste the window.PAGE_DATA value? You can enter that in the browser console while seeing this error.

Also, did you follow all steps in the Getting Started, i.e. for creating the Console OAuth client?

@fox27374

Hi,
here is the window.PAGE_DATA as well as the command I use for creating the OAuth client. One important point to mention is that I use my own certificates (signed by the lab CA).

DATA
window.PAGE_DATA = { "error": { "code": 7, "message": "error:pkg/web/oauthclient:exchange (token exchange refused)", "details": [{ "@type": "type.googleapis.com/ttn.lorawan.v3.ErrorDetails", "namespace": "pkg/web/oauthclient", "name": "exchange", "message_format": "token exchange refused", "code": 7 }] } };
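
The payload above can be unpacked programmatically when comparing error reports. The following is a minimal Python sketch, not part of The Things Stack: the helper name `summarize_page_data` is hypothetical, the sample is copied from the value above, and it assumes the object has been copied out of the browser console as JSON text.

```python
import json

# Sample payload copied from the window.PAGE_DATA value reported above.
PAGE_DATA = """
{ "error": { "code": 7, "message": "error:pkg/web/oauthclient:exchange (token exchange refused)", "details": [{ "@type": "type.googleapis.com/ttn.lorawan.v3.ErrorDetails", "namespace": "pkg/web/oauthclient", "name": "exchange", "message_format": "token exchange refused", "code": 7 }] } }
"""


def summarize_page_data(raw):
    """Return (namespace, name, message_format, code) of the first error detail.

    Hypothetical helper for illustration; it only reads fields that appear
    in the payload above.
    """
    error = json.loads(raw)["error"]
    detail = error["details"][0]
    return (
        detail["namespace"],
        detail["name"],
        detail["message_format"],
        detail["code"],
    )


if __name__ == "__main__":
    print(summarize_page_data(PAGE_DATA))
```

The namespace and name point at the token exchange step of the OAuth client. If the code follows gRPC status numbering, 7 would be PERMISSION_DENIED, which would fit a 403 response, but that mapping is an assumption here.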

COMMAND
docker-compose run --rm stack is-db create-oauth-client --id console --name "Console" --owner admin --secret "SM2CE7335KDAIILCA76KETRHDQTTDAQTDJHBSL6RCOX3WFZFDZ4Q" --redirect-uri "https://lora01.ntslab.loc/console/oauth/callback" --redirect-uri "/console/oauth/callback"

Thanks a lot!
Cheers,
Daniel

@johanstokking
Member

@fox27374 thanks for the additional information.

What is the configured OAuth URL, i.e. the /token URL that you configured? You can redact sensitive content.

Can you confirm that lora01.ntslab.loc resolves in the Docker container, assuming that you run The Things Stack via Docker?

@fox27374

Hi,

Thank you for the reply and for helping me here. The content is not sensitive yet; it's a lab setup for now, as a test for a future production environment. I want to get rid of the Actility server :)

Yes, I run the TTN stack via Docker on a Linux server. lora01.ntslab.loc is configured in the hosts file, so name resolution should work.

The /token URL is:
token-url: 'https://lora01.ntslab.loc/oauth/token'

If you need more information, you can directly have a look at the docker-compose.yml and the ttn-lw-stack.yml files. I also use a start script to do the initialisation (start.sh).

Thank you in advance,
Daniel

@neoaggelos
Contributor Author

neoaggelos commented Apr 20, 2020

Hi @fox27374

> Yes, I run the TTN stack via Docker on a Linux server. lora01.ntslab.loc is configured in the hosts file, so name resolution should work.

Do you mean the /etc/hosts file of your machine? This does not affect the Docker container where the stack is running, which could be the source of the issue you are seeing.

You could check that with the following command:

$ docker-compose exec stack nc -z lora01.ntslab.loc 443

If the name does not resolve inside the container, you should see something along the lines of nc: bad address 'lora01.ntslab.loc'.

Can you try adding an extra_hosts section in your docker-compose.yaml, like so:

# docker-compose.yaml
services:
  # ...
  stack:
    # ...
    extra_hosts:
      - "lora01.ntslab.loc:YOUR_IP_ADDRESS"
    # ...

And restart with docker-compose up -d

The hostname resolution should then work. (But if YOUR_IP_ADDRESS is something like 127.0.0.1, you might still get errors, because inside the container that address refers to the container itself.)
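
The caveat about 127.0.0.1 can be checked before editing the Compose file. A minimal Python sketch, assuming entries use the "hostname:ip" form shown above; the helper name `extra_host_warning` is hypothetical.

```python
import ipaddress


def extra_host_warning(entry):
    """Return a warning string for a loopback extra_hosts entry, else None."""
    # Split on the first colon only, so IPv6 addresses stay intact.
    hostname, _, address = entry.partition(":")
    ip = ipaddress.ip_address(address)  # raises ValueError if not an IP address
    if ip.is_loopback:
        return f"{hostname} maps to {ip}, which points at the container itself"
    return None


if __name__ == "__main__":
    print(extra_host_warning("lora01.ntslab.loc:127.0.0.1"))
```

An address that is reachable from inside the Docker network, such as the LAN address of the host, returns None.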

@fox27374

Hi @neoaggelos
thank you for the info. I removed the hosts entry and set the IP/hostname directly on the DNS server. Additionally I added the "extra_hosts" entry in the docker-compose.yml.
I am afraid, the error still exists.

I started an ash shell in the container and checked the DNS resolution:

$ nslookup lora01.ntslab.loc
Name:      lora01.ntslab.loc
Address 1: 172.24.89.120 lora01.ntslab.loc

So this seems good. Following the error message token exchange refused, is there any further debugging we can enable for the oauth token exchange? Sorry to keep you busy with this ....
Thanks

@fox27374

By the way, seems like someone else also has the same problem

@neoaggelos
Contributor Author

> Hi @neoaggelos
> thank you for the info. I removed the hosts entry and set the IP/hostname directly on the DNS server. Additionally I added the "extra_hosts" entry in the docker-compose.yml.

Hmm, with proper DNS configuration, you should not have to set extra_hosts.

> I am afraid, the error still exists.
>
> I started an ash shell in the container and checked the DNS resolution:
>
> $ nslookup lora01.ntslab.loc
> Name:      lora01.ntslab.loc
> Address 1: 172.24.89.120 lora01.ntslab.loc

The 172.24.89.120 address is the one from the network created by Docker, which could also be a reason for the failure.

> So this seems good. Following the error message token exchange refused, is there any further debugging we can enable for the oauth token exchange? Sorry to keep you busy with this ....
> Thanks

Try clearing your cookies and trying from a clean browser session as well. Also, make sure the certificates are properly read by the stack: running cat /var/run/secrets/cert.pem and cat /var/run/secrets/key.pem from a shell within the container should be enough to check that.

Off-topic: have you tried setting up the stack on localhost? Did you succeed?

@fox27374

Hi,

Sorry, I did not mention that 172.24.89.120 is the IP address of the server itself in the lab. The Docker addresses are 172.9.0.X

I do all the tests with a browser in private mode, so there are no cookies involved. The key and cert are readable by the "thethings" user:

/ $ whoami
thethings

/ $ cat /var/run/secrets/key.pem 
-----BEGIN PRIVATE KEY-----
MIIEvwIBADANBgkqhkiG9w0BAQEFAASCBKkwggSlAgEAAoIBAQC7IjZoBd2Mu4Ev
AYDrEh6mBWYw5cRDA02F10OQpbQbm6RigFbODM2owGRyCkkZfAUL2VV9xl5TzdMl
I6IecaA7/F7TpciuiJHmnfRVAbDlPI6EJYybdrU7tmfdeWc/ThuVVNolJFUeap+T
OIzv9MkGbBAF19ju4PJel6z3ef+NUhc5LKfjVQZeieQULX2b9+Hpd4ySdR2Nfzdt
......

I will try to change the setup to localhost and keep you posted.

@johanstokking
Member

> Sorry, I did not mention that 172.24.89.120 is the IP address of the server itself in the lab. The Docker addresses are 172.9.0.X

But can you curl https://lora01.ntslab.loc from inside the container? If not, what is the error reported?

@fox27374

Hi,

Seems like we got it. The curl hint was a good one. It showed that ca.pem was not in the trusted certificate store:

/ # curl https://lora01.ntslab.loc
curl: (60) SSL certificate problem: self signed certificate in certificate chain

So I copied the ca.pem certificate to /usr/local/share/ca-certificates/

/ $ ls -la /usr/local/share/ca-certificates/ca.pem 
-rw-r--r--    1 thething thething      1310 Apr 14 11:36 /usr/local/share/ca-certificates/ca.pem

by adding it to the volumes section of the docker-compose.yml file:

volumes:
      - "./data/blob:/srv/ttn-lorawan/public/blob"
      - "./config/stack:/config:ro"
      - "./config/stack/cert/ca.pem:/usr/local/share/ca-certificates/ca.pem"

Now I am able to login to the console and all certificates are trusted. Awesome!

Is this the best / intended way of adding a trusted root certificate to the TTN container?

@fox27374

Sorry for being euphoric too early. It seems like the auth token was still in the DB, that's why everything worked. After the container starts, I need to run this command in order to add the ca.pem certificate to the trusted store:

docker exec -it --user root ttn-server_stack_1 /usr/sbin/update-ca-certificates

Then the OAuth client is able to get a token and store it in the DB. I can work with this for now, but this should not be the final solution, I guess. Any ideas?
Thanks a lot!

@johanstokking
Member

@fox27374 great that you found the cause. That's always a good start to come up with a clean solution.

The stack respects TTN_LW_TLS_ROOT_CA (or tls.root-ca), which takes the file name of your CA certificate. See https://thethingsstack.io/v3.7.0/reference/configuration/the-things-stack/
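
The two spellings above suggest how an option name maps to its environment variable. A small Python sketch, assuming the convention generalizes from this single example (TTN_LW_ prefix, upper case, dots and dashes replaced with underscores); the helper name `env_name` is hypothetical, and the linked configuration reference remains the authority.

```python
def env_name(option, prefix="TTN_LW_"):
    """Map a dotted option name such as "tls.root-ca" to an environment variable.

    The rule is inferred from one example in this thread, not taken from
    the stack's source code.
    """
    return prefix + option.replace(".", "_").replace("-", "_").upper()


if __name__ == "__main__":
    print(env_name("tls.root-ca"))
```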

@fox27374

@johanstokking: I added the following to the docker-compose.yml

stack
......
    secrets:
      - cert.pem
      - key.pem
      - ca.pem

secrets:
  cert.pem:
    file: config/stack/cert/cert.pem
  key.pem:
    file: config/stack/cert/key.pem
  ca.pem:
    file: config/stack/cert/ca.pem

This way, the certificate files are available in the container in /run/secrets and /var/run/secrets. I checked this directly in the container.

I added
TTN_LW_TLS_ROOT_CA: "/var/run/secrets/ca.pem"
to the docker-compose.yml file. The error is still there. I also tried to add this to the ttn-lw-stack.yml:

tls:
  source: "file"
  root-ca: "/var/run/secrets/ca.pem"
  certificate: "/var/run/secrets/cert.pem"
  key: "/var/run/secrets/key.pem"

Same thing here. I still get the error. Could it be that some applications, especially the OAuth client, use the OS trusted root certificates instead? Because as soon as I add the ca.pem to the trusted root certificates, everything works.
Thanks, Daniel

@johanstokking
Member

cc @adriansmares

@fox27374

Hi, any news here? I tried debugging the access to the trusted root certificates with strace but did not succeed.

@johanstokking
Member

@fox27374 can you verify that this works?

$ curl --cacert /var/run/secrets/ca.pem https://lora01.ntslab.loc

@adriansmares looks like we need two things:

  1. Report the underlying error cause, potentially as a reason attribute, as it's a net error or something else from the stdlib
  2. Verify that we are respecting tls.root-ca in the OAuth client

@Lucianovici

Hi guys,

I am getting the same 403 error, running TTN stack v3 with Docker within a Vagrant box (with VirtualBox). It is just a sandbox for me to create the Saltstack recipe.

I tried many approaches, considering I took care of the DNS.

  • use self-signed certificates
  • reuse some existing certificates created with letsencrypt on a VPS by TTN stack.
  • tried all the insecure configs one by one

For me it is not a problem of root-ca, I don't know what it is. Should we open another issue for this?

One question though: to your knowledge, is it possible to configure it without TLS, just for dev purposes within a Vagrant box? If so, would you please give me some pointers?

I can confirm that on my VPS it works fine with letsencrypt, which is of course what we'll have in production.

Thanks.

@johanstokking johanstokking added prio/medium c/shared This is shared between components labels Apr 27, 2020
@johanstokking
Member

Adding c/shared cause it might not be a config thing

@fox27374

Hi, sorry for the late reply. I can verify that curl only works with the --cacert parameter, as the ca.pem certificate is not installed in the trusted root certificates:

/ $ whoami
thethings
/ $ curl https://lora01.ntslab.loc
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
/ $ curl --cacert /var/run/secrets/ca.pem https://lora01.ntslab.loc
/ $ 
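
The curl comparison above can also be reproduced from Python, which may help when checking the same CA file from another machine. A minimal sketch using only the standard library; the function names `build_context` and `verify_https` are hypothetical, and this is not how the stack itself loads its root CA.

```python
import socket
import ssl


def build_context(cafile=None):
    """Create a verifying TLS context.

    With cafile=None the system trust store is used, like plain curl.
    With a path, that CA file is trusted instead, like curl --cacert.
    """
    return ssl.create_default_context(cafile=cafile)


def verify_https(host, port=443, cafile=None, timeout=5.0):
    """Return (True, None) if the TLS handshake verifies, else (False, reason)."""
    try:
        context = build_context(cafile)
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True, None
    except OSError as exc:  # covers ssl.SSLError, missing files, connection errors
        return False, str(exc)
```

With the setup described in this thread, `verify_https("lora01.ntslab.loc")` would be expected to fail while the lab CA is untrusted, and `verify_https("lora01.ntslab.loc", cafile="/var/run/secrets/ca.pem")` should succeed if that file is the CA that issued the server certificate.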

@johanstokking
Member

Please check if the OAuth client respects the TLS configuration

@htdvisser htdvisser removed this from the April 2020 milestone May 1, 2020
@htdvisser htdvisser added this to the May 2020 milestone May 1, 2020
@wasn-eu

wasn-eu commented May 6, 2020

If you use nginx in front of the stack, nginx must handle all SSL/TLS.

These are the configs for nginx:

nginx.conf

stream {
    include stream_conf.d/*.conf;
}

stream_conf.d/mqtt.conf

log_format mqtt '$remote_addr [$time_local] $protocol $status $bytes_received '
                '$bytes_sent $upstream_addr';

upstream ttn1 {
    server stack-ip:1881;
    zone tcp_mem 64k;
}
upstream ttn2 {
    server stack-ip:1882;
    zone tcp_mem 64k;
}
upstream ttn3 {
    server stack-ip:1883;
    zone tcp_mem 64k;
}

server {
    listen 8881 ssl; # MQTT secure port
    preread_buffer_size 1k;

    ssl_certificate /etc/letsencrypt/live/FQDN/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/FQDN/privkey.pem; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
    ssl_ciphers         HIGH:!aNULL:!MD5;
    ssl_session_cache   shared:SSL:128m; # 128MB ~= 500k sessions
    ssl_session_tickets on;
    ssl_session_timeout 8h;

    proxy_pass ttn1;
    proxy_connect_timeout 1s;
}

server {
    listen 8882 ssl; # MQTT secure port
    preread_buffer_size 1k;

    ssl_certificate /etc/letsencrypt/live/FQDN/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/FQDN/privkey.pem; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
    ssl_ciphers         HIGH:!aNULL:!MD5;
    ssl_session_cache   shared:SSL:128m; # 128MB ~= 500k sessions
    ssl_session_tickets on;
    ssl_session_timeout 8h;

    proxy_pass ttn2;
    proxy_connect_timeout 1s;
}

server {
    listen 8883 ssl; # MQTT secure port
    preread_buffer_size 1k;

    ssl_certificate /etc/letsencrypt/live/FQDN/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/FQDN/privkey.pem; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
    ssl_ciphers         HIGH:!aNULL:!MD5;
    ssl_session_cache   shared:SSL:128m; # 128MB ~= 500k sessions
    ssl_session_tickets on;
    ssl_session_timeout 8h;

    proxy_pass ttn3;
    proxy_connect_timeout 1s;
}

server {
    listen 1881; # MQTT plain port (no TLS)
    preread_buffer_size 1k;

    proxy_pass ttn1;
    proxy_connect_timeout 1s;
}

server {
    listen 1882; # MQTT plain port (no TLS)
    preread_buffer_size 1k;

    proxy_pass ttn2;
    proxy_connect_timeout 1s;
}

server {
    listen 1883; # MQTT plain port (no TLS)
    preread_buffer_size 1k;

    proxy_pass ttn3;
    proxy_connect_timeout 1s;
}

You need this in your site config for all ports (PORT=1884, 1885, 1887):

server {
        server_name FQDN;

        location / {
                proxy_pass      http://stack-ip:PORT;
                proxy_set_header Host $http_host;
                proxy_set_header X-Real-IP $remote_addr;
                proxy_set_header X-Forwarded-Host $server_name;
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
                proxy_set_header X-Forwarded-Proto $scheme;
                proxy_http_version 1.1;
                proxy_set_header Upgrade $http_upgrade;
                proxy_set_header Connection "Upgrade";
                proxy_buffering off;
        }

        listen [::]:PORT ipv6only=on; # managed by Certbot
        listen PORT; # managed by Certbot
}

And this for ports (PORT/PORTSSL=1885/443, 1884/8884, 1887/8887):

server {

        server_name FQDN;

        location / {
                proxy_pass      http://stack-ip:PORT;
                proxy_set_header Host $http_host;
                proxy_set_header X-Real-IP $remote_addr;
                proxy_set_header X-Forwarded-Host $server_name;
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
                proxy_set_header X-Forwarded-Proto $scheme;
                proxy_http_version 1.1;
                proxy_set_header Upgrade $http_upgrade;
                proxy_set_header Connection "Upgrade";
                proxy_buffering off;
        }

        listen [::]:PORTSSL ssl ipv6only=on; # managed by Certbot
        listen PORTSSL ssl; # managed by Certbot
        ssl_certificate /etc/letsencrypt/live/FQDN/fullchain.pem; # managed by Certbot
        ssl_certificate_key /etc/letsencrypt/live/FQDN/privkey.pem; # managed by Certbot
        include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
        ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
}

As you can see, I am using Let's Encrypt.

@neoaggelos
Contributor Author

Thanks a lot @wasn-eu!

This is also useful for #1760.

@ramampiandra

ramampiandra commented May 6, 2020

Hi all,

I have a similar issue when installing TTN 3.7 on Ubuntu.

I followed fox27374's guide (https://github.com/fox27374/lora-stack) but still have the issue.
My installation is on an Ubuntu VM. I use a self-signed certificate for local development.

I am still stuck with this error: "token exchange refused".
Thank you in advance,

@johanstokking johanstokking assigned kschiffer and unassigned neoaggelos May 8, 2020
@fox27374

Hi @ramampiandra,

as I wrote in the Slack chat, for the whole thing to work, you need the following:

  • A certificate for the TLS traffic: cert.pem
  • The corresponding private key: key.pem
  • The CA certificate that issued the cert.pem: ca.pem

Please make sure that the certificates are correct:

cert.pem

openssl x509 -in cert.pem -text -noout | grep -A 1 Identifier
            X509v3 Subject Key Identifier:
                26:78:63:90:E7:1C:09:B7:DA:B3:7D:81:F0:DE:47:6B:AE:16:58:79
            X509v3 Authority Key Identifier:
                keyid:86:32:F5:56:44:21:EC:E3:2A:D9:5F:6E:87:82:7A:67:C2:F1:77:E8

ca.pem

openssl x509 -in ca.pem -text -noout | grep -A 1 Identifier
            X509v3 Subject Key Identifier:
                86:32:F5:56:44:21:EC:E3:2A:D9:5F:6E:87:82:7A:67:C2:F1:77:E8

Make sure that the Authority Key Identifier in the cert.pem is the same as the Subject Key Identifier in the ca.pem.
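
The comparison above can be automated when there are several certificates to check. A minimal Python sketch that parses the text printed by the openssl commands above; the helper names (`openssl_text`, `key_identifier`, `issued_by`) are hypothetical, and the optional "keyid:" prefix is handled because OpenSSL versions differ in whether they print it.

```python
import re
import subprocess

# Colon-separated hex bytes, e.g. 86:32:F5:...
_HEX = r"((?:[0-9A-Fa-f]{2}:)+[0-9A-Fa-f]{2})"


def openssl_text(path):
    """Return the output of `openssl x509 -in PATH -text -noout`."""
    result = subprocess.run(
        ["openssl", "x509", "-in", path, "-text", "-noout"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def key_identifier(text, kind):
    """Extract the "Subject" or "Authority" key identifier, or None if absent."""
    pattern = rf"X509v3 {kind} Key Identifier:\s*(?:keyid:)?{_HEX}"
    match = re.search(pattern, text)
    return match.group(1).upper() if match else None


def issued_by(cert_text, ca_text):
    """True if the certificate's authority key ID equals the CA's subject key ID."""
    aki = key_identifier(cert_text, "Authority")
    ski = key_identifier(ca_text, "Subject")
    return aki is not None and aki == ski
```

For example, `issued_by(openssl_text("cert.pem"), openssl_text("ca.pem"))` should be True for a matching pair. Matching identifiers are a quick sanity check only; they do not replace full chain verification.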

After the stack is started and all docker containers are up, run the following command (adapt the "ttn-server_stack_1" to the name of your TTN container):
docker exec -it --user root ttn-server_stack_1 /usr/sbin/update-ca-certificates
This will install the ca.pem certificate within the container and add it to the trusted certificates.

After that, log in to your container directly and test if the certificate works:

docker-compose exec stack "/bin/ash"
curl https://YOURSERVER.YOUR.DOMAIN

You should NOT see any result or error - this means your certificate is trusted.

I hope this helps,
Cheers

@kschiffer
Member

So after looking into this in detail, I was able to reproduce and can confirm that there is indeed a problem with the TLS config (and specifically root certificates) not being respected by our OAuth flow, causing the token exchange to fail.

I'm currently working on a PR to fix this which should land later today.

@fox27374

@kschiffer awesome, thank you for having a look at this. Just keep me posted so that I can help you with testing.

@dgraposo

Hi! Is there another workaround to fix this temporarily?

@johanstokking
Member

@dgraposo this should be fixed in 3.8.1

@kschiffer
Member

I will close this issue for now, since the focus moved to the "token exchange refused" issue, which has been addressed via #2511 and which can be followed further via #2521. I suspect this was the biggest reason to add a troubleshooting section.

This issue is not very useful anymore to discuss its initial purpose. I suggest reopening with proper scope if we deem a troubleshooting section to be necessary still.
