
Memory leaks #6322

Open · KGOH opened this issue May 17, 2024 · 11 comments
Labels: needs info 📭 Requires more information

KGOH commented May 17, 2024

I have a minimal config, and I constantly observe memory leaking. I was on version 2.5.0 and later upgraded to 2.7, but that didn't help.

{$GRAFANA_DOMAIN} {
  reverse_proxy grafana:3000
}

{$COLLECTOR_DOMAIN} {
  reverse_proxy myapp:8080
}

[memory usage graph]

Caddy is deployed via Docker Compose; here's a snippet:

  caddy:
    restart: unless-stopped
    image: caddy:2.7
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
      - ./ssl:/myapp/ssl
      - caddy_data:/data
    ports:
      - 80:80
      - 443:443
    networks:
      default:
        aliases:
          - ${GRAFANA_DOMAIN}
          - ${COLLECTOR_DOMAIN}
    environment:
      GRAFANA_DOMAIN:   ${GRAFANA_DOMAIN}
      COLLECTOR_DOMAIN: ${COLLECTOR_DOMAIN}
mholt (Member) commented May 17, 2024

Is it possible that the backend isn't closing connections? What is myapp:8080 serving? How can we reproduce it?

If you could please collect a memory and goroutine profile that would help answer some questions: https://caddyserver.com/docs/profiling
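
For reference, grabbing those from a compose deployment could look roughly like this (a sketch assuming Caddy's admin endpoint on port 2019 is reachable from wherever you run the commands, e.g. from inside the container or with 2019 published; the profiling page above has the full details):

  # heap (memory) profile
  curl -o heap.pprof "http://localhost:2019/debug/pprof/heap"
  # goroutine dump in readable text form
  curl -o goroutines.txt "http://localhost:2019/debug/pprof/goroutine?debug=1"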

Thank you

KGOH (Author) commented May 20, 2024

@mholt myapp is a REST API server. Currently I'm not sure how exactly you can reproduce it without me sharing the whole app (which I can't do)

Here are memory and goroutine profiles
goroutines.txt
heap_debug=1.gz

Here's an example of the memory graph on the machine I got these profiles from:
[memory usage graph]

mholt (Member) commented May 20, 2024

Thanks -- interesting... what kind of traffic is your server getting?

The top heaps and goroutines are simply growing buffers to read from TLS connections, and serving HTTP/2, respectively.

Goroutines:

694 @ 0x43e44e 0x44e905 0xb234cf 0xb217cc 0xb20bbc 0x769b93 0x471901
#	0xb234ce	golang.org/x/net/http2.(*serverConn).serve+0x88e	golang.org/x/net@v0.17.0/http2/server.go:940
#	0xb217cb	golang.org/x/net/http2.(*Server).ServeConn+0xbcb	golang.org/x/net@v0.17.0/http2/server.go:531
#	0xb20bbb	golang.org/x/net/http2.ConfigureServer.func1+0xfb	golang.org/x/net@v0.17.0/http2/server.go:321
#	0x769b92	net/http.(*conn).serve+0x1212				net/http/server.go:1917

690 @ 0x43e44e 0x436c97 0x46bce5 0x4e11c7 0x4e24ba 0x4e24a8 0x545825 0x558ea5 0x69841b 0x50dc38 0x6985fe 0x695bd0 0x69be98 0x69bea1 0x4b8350 0xb16fe5 0xb16fb4 0xb17725 0xb22667 0x471901
#	0x46bce4	internal/poll.runtime_pollWait+0x84			runtime/netpoll.go:343
#	0x4e11c6	internal/poll.(*pollDesc).wait+0x26			internal/poll/fd_poll_runtime.go:84
#	0x4e24b9	internal/poll.(*pollDesc).waitRead+0x279		internal/poll/fd_poll_runtime.go:89
#	0x4e24a7	internal/poll.(*FD).Read+0x267				internal/poll/fd_unix.go:164
#	0x545824	net.(*netFD).Read+0x24					net/fd_posix.go:55
#	0x558ea4	net.(*conn).Read+0x44					net/net.go:179
#	0x69841a	crypto/tls.(*atLeastReader).Read+0x3a			crypto/tls/conn.go:805
#	0x50dc37	bytes.(*Buffer).ReadFrom+0x97				bytes/buffer.go:211
#	0x6985fd	crypto/tls.(*Conn).readFromUntil+0xdd			crypto/tls/conn.go:827
#	0x695bcf	crypto/tls.(*Conn).readRecordOrCCS+0x24f		crypto/tls/conn.go:625
#	0x69be97	crypto/tls.(*Conn).readRecord+0x157			crypto/tls/conn.go:587
#	0x69bea0	crypto/tls.(*Conn).Read+0x160				crypto/tls/conn.go:1369
#	0x4b834f	io.ReadAtLeast+0x8f					io/io.go:335
#	0xb16fe4	io.ReadFull+0x64					io/io.go:354
#	0xb16fb3	golang.org/x/net/http2.readFrameHeader+0x33		golang.org/x/net@v0.17.0/http2/frame.go:237
#	0xb17724	golang.org/x/net/http2.(*Framer).ReadFrame+0x84		golang.org/x/net@v0.17.0/http2/frame.go:498
#	0xb22666	golang.org/x/net/http2.(*serverConn).readFrames+0x86	golang.org/x/net@v0.17.0/http2/server.go:820

Heaps:

261: 10690560 [3380: 138444800] @ 0x50ddce 0x50d7fd 0x698594 0x698557 0x6964ff 0x69be98 0x69bea1 0x76322b 0x56d763 0x4b89a3 0x7773bb 0x777298 0x4b89a3 0x4b8806 0x55a386 0x55a343 0x5611ab 0x55fcf0 0x4b87a7 0x781d55 0x781d0f 0x56f74b 0x4b87a7 0x7759a8 0x77598f 0x775525 0x75ccad 0x7849ef 0x471901
#	0x50ddcd	bytes.growSlice+0x8d				bytes/buffer.go:249
#	0x50d7fc	bytes.(*Buffer).grow+0x13c			bytes/buffer.go:151
#	0x698593	bytes.(*Buffer).Grow+0x73			bytes/buffer.go:168
#	0x698556	crypto/tls.(*Conn).readFromUntil+0x36		crypto/tls/conn.go:826
#	0x6964fe	crypto/tls.(*Conn).readRecordOrCCS+0xb7e	crypto/tls/conn.go:676
#	0x69be97	crypto/tls.(*Conn).readRecord+0x157		crypto/tls/conn.go:587
#	0x69bea0	crypto/tls.(*Conn).Read+0x160			crypto/tls/conn.go:1369
#	0x76322a	net/http.(*connReader).Read+0x14a		net/http/server.go:791
#	0x56d762	bufio.(*Reader).Read+0xe2			bufio/bufio.go:230
#	0x4b89a2	io.(*LimitedReader).Read+0x42			io/io.go:480
#	0x7773ba	net/http.(*body).readLocked+0x3a		net/http/transfer.go:839
#	0x777297	net/http.(*body).Read+0x117			net/http/transfer.go:831
#	0x4b89a2	io.(*LimitedReader).Read+0x42			io/io.go:480
#	0x4b8805	io.copyBuffer+0x1a5				io/io.go:430
#	0x55a385	io.Copy+0x65					io/io.go:389
#	0x55a342	net.genericReadFrom+0x22			net/net.go:671
#	0x5611aa	net.(*TCPConn).readFrom+0x6a			net/tcpsock_posix.go:54
#	0x55fcef	net.(*TCPConn).ReadFrom+0x2f			net/tcpsock.go:130
#	0x4b87a6	io.copyBuffer+0x146				io/io.go:416
#	0x781d54	io.Copy+0x54					io/io.go:389
#	0x781d0e	net/http.persistConnWriter.ReadFrom+0xe		net/http/transport.go:1801
#	0x56f74a	bufio.(*Writer).ReadFrom+0x18a			bufio/bufio.go:797
#	0x4b87a6	io.copyBuffer+0x146				io/io.go:416
#	0x7759a7	io.Copy+0x47					io/io.go:389
#	0x77598e	net/http.(*transferWriter).doBodyCopy+0x2e	net/http/transfer.go:412
#	0x775524	net/http.(*transferWriter).writeBody+0x3c4	net/http/transfer.go:370
#	0x75ccac	net/http.(*Request).write+0xbac			net/http/request.go:738
#	0x7849ee	net/http.(*persistConn).writeLoop+0x18e		net/http/transport.go:2424

14: 573440 [7879: 322723840] @ 0x50ddce 0x50d7fd 0x698594 0x698557 0x6964ff 0x69be98 0x69bea1 0x4b8350 0xb16fe5 0xb16fb4 0xb17725 0xb22667 0x471901
#	0x50ddcd	bytes.growSlice+0x8d					bytes/buffer.go:249
#	0x50d7fc	bytes.(*Buffer).grow+0x13c				bytes/buffer.go:151
#	0x698593	bytes.(*Buffer).Grow+0x73				bytes/buffer.go:168
#	0x698556	crypto/tls.(*Conn).readFromUntil+0x36			crypto/tls/conn.go:826
#	0x6964fe	crypto/tls.(*Conn).readRecordOrCCS+0xb7e		crypto/tls/conn.go:676
#	0x69be97	crypto/tls.(*Conn).readRecord+0x157			crypto/tls/conn.go:587
#	0x69bea0	crypto/tls.(*Conn).Read+0x160				crypto/tls/conn.go:1369
#	0x4b834f	io.ReadAtLeast+0x8f					io/io.go:335
#	0xb16fe4	io.ReadFull+0x64					io/io.go:354
#	0xb16fb3	golang.org/x/net/http2.readFrameHeader+0x33		golang.org/x/net@v0.17.0/http2/frame.go:237
#	0xb17724	golang.org/x/net/http2.(*Framer).ReadFrame+0x84		golang.org/x/net@v0.17.0/http2/frame.go:498
#	0xb22666	golang.org/x/net/http2.(*serverConn).readFrames+0x86	golang.org/x/net@v0.17.0/http2/server.go:820
...

There's no memory leak here... just normal traffic. Not even a lot of traffic.

What does top or htop on the machine show?

I'd also be interested if you can reproduce this without the API backend.

KGOH (Author) commented May 20, 2024

@mholt could this mean that my app isn't closing connections?

I'll get back to you with htop later

mholt (Member) commented May 20, 2024

Quite possibly. It's a bit unfortunate that the graphs only show a single process. That's why I'm curious what the output of top is.
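
For a compose deployment, per-container numbers from Docker can be easier to read than raw top output; a quick sketch using standard Docker tooling (run on the host) would be:

  # one-shot snapshot of per-container memory/CPU usage
  docker stats --no-stream
  # list the processes running inside the caddy service's container
  docker compose top caddy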

mholt added the needs info 📭 Requires more information label May 20, 2024
KGOH (Author) commented May 22, 2024

@mholt, hello. The app is a Java Jetty server with an idle timeout of 200,000 ms (about 3.3 minutes) (doc).

Here's a graph of the Java app, Grafana, and Caddy processes for the last month. As you can see, there was a restart after a certain memory threshold was reached. The Java app doesn't seem to grow in memory constantly. The Java app receives mobile traffic, and those connections probably drop due to unstable internet.
[memory usage graph for the Java app, Grafana, and Caddy]

As far as I can see in the Caddy docs, Caddy's default idle timeout is 5 minutes:
https://caddyserver.com/docs/caddyfile/options#timeouts
This doesn't explain the growing memory consumption, though.

mholt (Member) commented May 22, 2024

What are your clients / traffic patterns like?

Try setting some timeouts: https://caddyserver.com/docs/caddyfile/options#timeouts

The majority of the (few) goroutines that show up in the profiles are waiting for reads.

KGOH (Author) commented Jun 7, 2024

Timeouts didn't help (the blue line that ends abruptly is Caddy before I added timeouts; the red line is Caddy after I restarted it with the timeouts config added).
[memory usage graph]

I added the following lines to the Caddyfile:

{
  servers {
    timeouts {
      read_body   5m
      read_header 5m
      write       5m
      idle        5m
    }
  }
}

I'm not sure how to describe the traffic pattern. In short, there are a lot of mobile connections. I guess they can drop off without properly closing connections. At the same time, I observe a large number of exceptions like this one (~15,000 in 14 days):
org.eclipse.jetty.io.EofException: Early EOF
[exception rate graph]

KGOH (Author) commented Jun 7, 2024

I'm not sure whether this is related, but on my memory usage graphs you can also notice that the Java memory sometimes grows in a somewhat similar way (not here, though!).

Right now I'm also observing a memory leak within Java:
[Java memory usage graph]
For context, here's another issue I created in the Jetty repo:
jetty/jetty.project#11858

mholt (Member) commented Jun 7, 2024

Interesting... one other thought I have is, what if you disable HTTP/3?

https://caddyserver.com/docs/caddyfile/options#protocols
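
If you want to try it, a minimal sketch of that global option (limiting servers to HTTP/1.1 and HTTP/2, which effectively disables HTTP/3) would be:

{
  servers {
    protocols h1 h2
  }
}

It can sit next to the timeouts block you already added in the global options.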

hackerfactor commented:

Has anyone been able to replicate this problem?
Does it happen with 2.8.4?
