This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

msgp: too few bytes left to read object #875

Open
Dieterbe opened this issue Mar 16, 2018 · 5 comments

@Dieterbe
Contributor

2018/03/16 10:29:00 [dataprocessor.go:221 func1()] [E] DP getTargetsRemote: error unmarshaling body from mt-read00-12574-medium-ops-b-2445396050-vdcg2/getdata: "msgp: too few bytes left to r>
2018/03/16 10:29:00 [graphite.go:766 executePlan()] [E] HTTP Render msgp: too few bytes left to read object
[Macaron] 2018-03-16 10:29:00: Completed /render 500 Internal Server Error in 129.391075ms
@shanson7
Collaborator

I'm seeing these fairly frequently. Any idea what the issue is?

Dieterbe added this to the 0.9.1 milestone May 2, 2018
@shanson7
Collaborator

shanson7 commented May 9, 2018

I deployed a “silent node” (carbon in, partition 9999) and added some debug statements.

It turns out the buffers are coming back as nil:
2018/05/09 21:05:26 [dataprocessor.go:223 func1()] [E] DEBUG len(buf)=0, is nil:true

It seems like we are getting nil buffers back from the peers when the request gets canceled. After adding more logging, I see:
2018/05/09 21:30:24 [dataprocessor.go:216 func1()] [E] DP getTargetsRemote: error with POST to metrictank-read-046-1/getdata: "500 Internal Server Error"

Looking at the same timestamp on metrictank-read-046-1, I see:
2018/05/09 21:30:24 [cluster.go:191 getData()] [E] HTTP getData() start must be before end.

That comes from the Cassandra store. It likely has something to do with this logic: https://github.com/grafana/metrictank/blob/master/api/dataprocessor.go#L537

@shanson7
Collaborator

shanson7 commented May 9, 2018

I think this is ccache corruption. For this particular repro request, it was always the same instance that was breaking things. I sent a ccache/delete request, and now the error is gone for this repro.

@tehlers320

On version 0.9.0, this occurred for me during a schema update and was not related to the ccache at all. Once the schemas were the same on all servers, it went away.

@shanson7
Collaborator

Sorry, to clarify:

  1. The main issue of msgp: too few bytes left to read object is coming from here. This happens when the request to the peer is canceled because another peer has returned an error (so the buffer is nil and not eligible for unmarshaling). The fix for this is probably to just check if the request was canceled before unmarshaling.

  2. This means that there is another problem that is causing the error to be returned. In my specific case it is some ccache corruption.
