Application crashes on outgoing DNS requests/traffic #2271

Open
klingenm opened this issue Feb 26, 2024 · 5 comments
Labels
bug Something isn't working

@klingenm

klingenm commented Feb 26, 2024

Bug Description

Our Nest.js application crashes intermittently, with no obvious pattern.

We think it is related to having an active WebSocket connection from frontend -> backend, where the backend is running under mirrord.

The crash appears to be triggered when a DNS query is made while a long-running (>5 seconds) stolen request is being processed by the local process.

{
  "kube_context": "cloud-dev-APP",
  "target": "deployment/APP-backend",
  "operator": false,
  "pause": true,
  "agent": {
    "ephemeral": true,
    "log_level": "mirrord=trace"
  },
  "internal_proxy": {
    "log_level": "mirrord_intproxy=trace",
    "log_destination": "/tmp/internal_proxy.log"
  },
  "feature": {
    "network": {
      "incoming": {
        "mode": "steal",
        "port_mapping": [[46000, 3000]]
      }
    },
    "env": {
      "override": {
        "TYPEORM_MIGRATIONS_RUN": "true",
        "PRETTY_PRINT": "true"
      }
    }
  }
}

Steps to Reproduce

  1. Start the app with mirrord.
  2. Open an active WebSocket connection to the app via the agent.
  3. Trigger another request to the backend (it may be required that a new DB connection is created).
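The timing described in these steps can be sketched as a small script: start a slow request, then resolve a hostname while that request is still in flight. This is only an illustration of the overlap, not a confirmed reproduction. The names `slowRequest` and `lookupDuringSlowRequest` are hypothetical, and the timer merely stands in for the real long-running stolen request.

```typescript
import { lookup } from "node:dns/promises";

// Anything that turns a hostname into an address; defaults to Node's dns lookup.
type Resolver = (hostname: string) => Promise<{ address: string }>;

// Stand-in for the slow stolen request (hypothetical; a timer, not real traffic).
function slowRequest(durationMs: number): Promise<string> {
  return new Promise((resolve) => setTimeout(() => resolve("response"), durationMs));
}

// Starts the slow request, then resolves `hostname` while the request is pending.
async function lookupDuringSlowRequest(
  hostname: string,
  durationMs: number,
  resolver: Resolver = (h) => lookup(h),
): Promise<{ response: string; address: string; overlapped: boolean }> {
  let requestFinished = false;
  const pending = slowRequest(durationMs).then((r) => {
    requestFinished = true;
    return r;
  });

  // The DNS query is issued while the slow request is still being processed.
  const { address } = await resolver(hostname);
  // True when the DNS answer arrived before the slow request completed.
  const overlapped = !requestFinished;

  const response = await pending;
  return { response, address, overlapped };
}
```

Running something like `lookupDuringSlowRequest("db", 6000)` inside the process started with mirrord would exercise the same overlap; whether that alone is enough to trigger the crash is an assumption.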

Backtrace

Not from the same run as the int_proxy logs; I lost those...


2024-02-23T15:28:35.490170Z ERROR ThreadId(02) mirrordlayer::error: Error occured in Layer >> ProxyError(CodecError(IoError(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })))
Assertion failed: (!"unknown EAI* error code"), function uv__getaddrinfo_translate_error, file getaddrinfo.c, line 90.
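For readers decoding the first log line: on macOS, OS error code 35 is `EAGAIN` (the same value as `EWOULDBLOCK`), which matches the `WouldBlock` kind in the message. A minimal sketch for checking which symbolic names a numeric errno maps to on the current platform, using only Node's `os` constants (`errnoNames` is a hypothetical helper name):

```typescript
import { constants } from "node:os";

// Lists the symbolic errno names that share the given numeric code on this platform.
function errnoNames(code: number): string[] {
  return Object.entries(constants.errno)
    .filter(([, value]) => value === code)
    .map(([name]) => name);
}

// The numeric values differ between platforms, so the output depends on the OS.
console.log(errnoNames(35));
```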

Tested on 3.90.0

Relevant Logs

internal_proxy.log

Your operating system and version

macOS Sonoma 14.2.1 (23C71)

Local process

nodejs

Local process version

... node/v18.17.1/bin/node: Mach-O 64-bit executable arm64

Additional Info

It most often seems to crash when it tries to connect to the database.

The database runs in a separate namespace in the k8s cluster. An ExternalName service named "db" is created in APP's namespace and used in the app config.

@klingenm klingenm added the bug Something isn't working label Feb 26, 2024
@klingenm
Author

I started digging into why it works for one team but not the other. My hypothesis about the WebSocket does not hold water: it turns out the other team actually uses WebSockets more extensively than the team having the issues.

What stands out is that the problematic page makes one GraphQL query that returns a 6MB response and takes more than 5 seconds to complete. In general, the page also makes more GraphQL queries than should be needed.

@klingenm
Author

More digging: as can be seen from the logs, the error is related to a DNS query. In our case it can be avoided completely by configuring the DB connection with the DB server IP instead of the "db" name, which avoids the DNS query.
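A minimal sketch of that workaround, assuming the app reads its database host from the environment. The variable name `DB_HOST_IP` and the helper `pickDbHost` are hypothetical and not part of mirrord or our app; the idea is only to prefer an IP literal when one is configured and fall back to the "db" name otherwise.

```typescript
import { isIP } from "node:net";

// Hypothetical environment variable holding the database server IP.
const DB_IP_ENV = "DB_HOST_IP";

// Returns the configured IP literal when it is valid, otherwise the service name.
function pickDbHost(
  env: Record<string, string | undefined>,
  fallbackName = "db",
): string {
  const candidate = env[DB_IP_ENV]?.trim();
  // isIP returns 4 or 6 for valid IP literals and 0 for anything else.
  if (candidate && isIP(candidate) !== 0) {
    return candidate;
  }
  return fallbackName;
}
```

The result of `pickDbHost(process.env)` could then be passed as the `host` of the database connection options. Node's `net` module connects to IP literals without performing a DNS lookup, which is why this sidesteps the failing query.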

I'm open to a screen-sharing session to show you the reproduction, but I'll need time to set up an environment.

@aviramha
Member

Thank you! We're still investigating the logs.

@Razz4780
Contributor

Fixed with #2308

@Razz4780
Contributor

Observed again

@Razz4780 Razz4780 reopened this Mar 18, 2024
3 participants