Large put request fails with connection reset #311

Open
barperez111 opened this issue May 7, 2021 · 9 comments
@barperez111

Bug description

Hi! I've run into something that looks like a bug. It is related to what is discussed here.

Basically, we run z2jh on EKS. When saving large notebooks, the corresponding HTTP PUT request sent by the content manager fails with a connection reset after about 1 minute.
To check things, we ran a standalone notebook on the same cluster. With it, large files are saved just fine (after something like 2 minutes). That led me to think the issue is CHP-related, so I put a CHP in front of the standalone notebook, and the issue appeared (connection reset after about 1 minute).

I tried setting the --timeout and --proxy-timeout params, but that didn't help. The debug log level didn't help me either.

Any thoughts? Are we sure the timeout params are working correctly?
I'd appreciate any help whatsoever.
Thanks!

@barperez111 barperez111 added the bug label May 7, 2021
@welcome

welcome bot commented May 7, 2021

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@barperez111
Author

Any help here please?

@minrk
Member

minrk commented May 25, 2021

Sorry for leaving you hanging. Can you share the logs from CHP and the backend server (ideally with --debug) when this happens and the command-line options you use? It should be something like:

configurable-http-proxy --log-level debug --timeout 300000 --proxy-timeout 300000 # 5 minute timeout for each

There's probably a configuration parameter in node-http-proxy we need to expose. Including the traceback from CHP if there is one would help pin down what's needed, whether it's in the client configuration or server.

@barperez111
Author

Hi! Sorry for the huge delay.
I ran with the suggested configuration, but there seems to be nothing interesting in the logs (I made sure CHP ran in debug mode).

The logs show the content/ API request being proxied, but I couldn't find any error logs.

  1. Maybe I'm missing a parameter that enables error logs?
  2. Can you think of a parameter that is not exposed and may be relevant to the issue?

Thanks a lot!

@consideRatio
Member

I'm unsure what CHP configuration would solve your issue, but I wonder what evidence there is that CHP is to blame rather than another part of the network chain.

If anyone can reproduce this on another k8s cluster setup, such as GKE or AKS, that would reduce the chances that it is the configuration of an AWS component managing incoming traffic.

Perhaps you can write down more details about your entire setup? How is the network traffic flowing? With the JupyterHub Helm chart, traffic for saving a notebook goes from (the autohttps pod running Traefik ->) the proxy pod running CHP -> the user pod. Those could be at fault, but there are also components outside the control of the JupyterHub Helm chart that could be at fault. Do we have a way to pinpoint which component is causing the trouble?

@barperez111
Author

What made us suspect CHP: when we added a pod (a classic notebook, separate from the JupyterHub Helm chart) to our cluster, saving large notebooks worked (it took more than 2 minutes, but it worked). When we put CHP in front of it as a proxy, the issue reappeared.
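
The comparison described above could be scripted roughly as follows. This is only a sketch under assumptions: the two URLs, the token, and the 100 MB payload size are placeholders that do not come from this thread, and the body merely imitates the shape of a notebook save request (a PUT to the notebook server's contents API).

```python
import json
import time
import urllib.error
import urllib.request


def make_notebook_body(approx_bytes: int) -> bytes:
    """Build a contents-API style JSON body of roughly approx_bytes bytes."""
    filler = "x" * approx_bytes
    notebook = {
        "cells": [{"cell_type": "markdown", "metadata": {}, "source": filler}],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }
    model = {"type": "notebook", "format": "json", "content": notebook}
    return json.dumps(model).encode("utf-8")


def timed_put(url: str, body: bytes, token: str, timeout_s: float = 600.0):
    """PUT body to url and return (outcome, seconds elapsed)."""
    request = urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"token {token}",
        },
    )
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            outcome = f"HTTP {response.status}"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status.
        outcome = f"HTTP {err.code}"
    except (urllib.error.URLError, OSError) as err:
        # Connection resets and client-side timeouts end up here.
        outcome = f"failed: {err}"
    return outcome, time.monotonic() - start


if __name__ == "__main__":
    # Hypothetical endpoints: replace with the real direct and proxied URLs.
    targets = {
        "direct": "http://notebook.example:8888/api/contents/big.ipynb",
        "via CHP": "http://chp.example:8000/api/contents/big.ipynb",
    }
    body = make_notebook_body(100 * 1024 * 1024)
    for label, url in targets.items():
        outcome, elapsed = timed_put(url, body, token="REPLACE_ME")
        print(f"{label}: {outcome} after {elapsed:.1f}s")
```

Running it from the same client and network location for both targets keeps everything except CHP constant, and the elapsed time at failure shows whether the reset happens at a fixed point (about 1 minute here) regardless of payload size.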

@consideRatio
Member

consideRatio commented Jun 24, 2021

Ah, that is a great test to pinpoint it to CHP, @barperez111! Is it correct, then, that the network traffic took the same path in both cases, except that in one case it went directly to the user pod instead of through CHP to the user pod, with the rest of the network infrastructure unchanged?

@barperez111
Author

Yes I believe that is correct.

@oharach1

oharach1 commented Aug 9, 2021

Hi @barperez111, I am running into a similar problem while trying to upload files >10MB. I have read about modifying the Tornado websocket and body size limits and the memory settings in the Jupyter notebook configuration, but I am still facing the same issue when uploading large files.

One question: how did you modify the timeout and proxy-timeout parameters for configurable-http-proxy? Was this done via the JupyterHub config file?

Thanks for your help.
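
For reference, one way these flags might be passed when JupyterHub itself launches the proxy is through the `c.ConfigurableHTTPProxy.command` setting in `jupyterhub_config.py`. The sketch below is an assumption about a standalone JupyterHub setup, not something confirmed in this thread; the helper name `chp_command` is made up for illustration, and the flag values mirror the 5 minute example given earlier.

```python
def chp_command(timeout_ms: int, proxy_timeout_ms: int, log_level: str = "debug") -> list:
    """Return a configurable-http-proxy command line with explicit timeouts.

    Both timeouts are given in milliseconds, as in the command shown
    earlier in this thread.
    """
    return [
        "configurable-http-proxy",
        "--log-level", log_level,
        "--timeout", str(timeout_ms),
        "--proxy-timeout", str(proxy_timeout_ms),
    ]


FIVE_MINUTES_MS = 5 * 60 * 1000
command = chp_command(FIVE_MINUTES_MS, FIVE_MINUTES_MS)

# In jupyterhub_config.py, where `c` is provided by JupyterHub:
# c.ConfigurableHTTPProxy.command = command
```

On a z2jh deployment the proxy is started by the Helm chart rather than by this setting; the chart has, to my knowledge, a `proxy.chp.extraCommandLineFlags` value for the same purpose, but check the chart's configuration reference for your version.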
