Data loss while writing values with rate greater than heartbeat #43

Open
GoogleCodeExporter opened this issue Apr 6, 2015 · 15 comments

Comments

@GoogleCodeExporter

What steps will reproduce the problem?
(All settings are only an example; the issue can be reproduced with others.)
1. Create an rrdDb with samplingRate 1 sec and one datasource of type GAUGE with 
heartbeat 2 sec.
2. Start writing values at a rate of one every 3 sec.
3. Only the first value is written; all the others are lost and 
replaced with NaN.

What is the expected output? What do you see instead?

For example, I write 10 values at a rate of one every 3 sec (into the rrdDb 
with the settings described above).

Current output:
time1  Value1
time2  NaN
time3  NaN
time4  NaN
time5  NaN
time6  NaN
time7  NaN
time8  NaN
time9  NaN
time10 NaN

Expected Output:
time1  Value1
time2  NaN
time3  NaN
time4  Value4
time5  NaN
time6  NaN
time7  Value7
time8  NaN
time9  NaN
time10 Value10
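
For readers unfamiliar with the heartbeat setting: in RRD-style databases the heartbeat is the maximum gap allowed between two updates before the data is treated as unknown. The sketch below is a simplified, self-contained model of that single rule, written to show why a 3 sec write rate against a 2 sec heartbeat yields NaN. It is not rrd4j source code; the class and method names (HeartbeatModel, acceptedValue, simulate) are hypothetical, and it does not model how the first sample or the archive rows are handled.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the heartbeat rule for a GAUGE datasource.
// NOT rrd4j code: all names here are illustrative assumptions.
final class HeartbeatModel {

    // Returns the value accepted for an update, or NaN when the gap since
    // the previous update exceeds the heartbeat (the sample is then unknown).
    static double acceptedValue(long previousTime, long newTime, double newValue, long heartbeat) {
        return (newTime - previousTime <= heartbeat) ? newValue : Double.NaN;
    }

    // Feeds `updates` samples spaced `rate` seconds apart (values 1, 2, 3, ...)
    // and collects what the rule accepts for each of them.
    static List<Double> simulate(int updates, long rate, long heartbeat) {
        List<Double> accepted = new ArrayList<>();
        long previousTime = 0;
        for (int i = 1; i <= updates; i++) {
            long newTime = previousTime + rate;
            accepted.add(acceptedValue(previousTime, newTime, i, heartbeat));
            previousTime = newTime;
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Rate 3 sec, heartbeat 2 sec: every gap exceeds the heartbeat.
        System.out.println(simulate(4, 3, 2)); // prints [NaN, NaN, NaN, NaN]
        // Rate 2 sec, heartbeat 2 sec: every gap is within the heartbeat.
        System.out.println(simulate(4, 2, 2)); // prints [1.0, 2.0, 3.0, 4.0]
    }
}
```

In this model the only way to keep the samples is to write at least once per heartbeat interval or to raise the heartbeat.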

What version of the product are you using? On what operating system?

Version of product: 2.2
The issue is reproduced on 
Windows 7
Linux 2.6.32-358.6.1.el6.x86_64


Please provide any additional information below.

I attach a JUnit test that generates the rrdDb with the settings described above. It 
requires only the junit and rrd libs to run.

Original issue reported on code.google.com by peter.ku...@gmail.com on 30 May 2013 at 1:52

Attachments:

@GoogleCodeExporter
Author

dataRate should be set to 3 in RRDTest to reproduce the issue 
exactly.

Original comment by peter.ku...@gmail.com on 30 May 2013 at 1:59

@GoogleCodeExporter
Author

I simplified your test, and it now also generates the rrdtool command. The results 
are the same as rrd4j's, so I don't think it's a bug.

Original comment by fbacche...@gmail.com on 1 Jun 2013 at 8:02

Attachments:

@GoogleCodeExporter
Author

Thank you. I will report this issue to the RRDtool team, because it looks like a 
design bug: it leads to data loss.

Original comment by peter.ku...@gmail.com on 3 Jun 2013 at 8:57

@GoogleCodeExporter
Author

Original comment by fbacche...@gmail.com on 3 Jun 2013 at 10:27

  • Added labels: Priority-Low
  • Removed labels: Priority-Medium

@GoogleCodeExporter
Author

Does anybody know a workaround for this?

The samples in my use case arrive very sporadically, and I want them to be 
stored and graphed exactly as they arrive. Setting a heartbeat higher than my 
sample rate (aka step) is not an option, because it causes previously 
"inactive" slots to be filled with my current sample value.

I think this could be patched in 
org.rrd4j.core.Datasource.calculateUpdateValue(long, double, long, double) by 
adding an "else" to the outermost if:

} else {
  updateValue = newValue;
}


Original comment by mschaef...@scoop-gmbh.de on 11 Nov 2013 at 5:58

@GoogleCodeExporter
Author

This would make sense if these were counters but this is not what I would 
expect for a gauge.

We ran into this issue too, and it's a bit of a pain. Hoping to see this fixed in 
the next release.

Original comment by cbr...@infinio.com on 16 Dec 2014 at 4:16

@GoogleCodeExporter
Author

Can anybody confirm that the suggested patch of comment #5 works correctly? 

Original comment by mich...@capacis.de on 16 Dec 2014 at 4:20

@GoogleCodeExporter
Author

Actually, I didn't test it. We live with this issue in our product and try to 
avoid scenarios that lead to data loss. I can only point you to the bug report 
for rrdtool, which has not received any response: 
https://github.com/oetiker/rrdtool-1.x/issues/395.

Original comment by peter.ku...@gmail.com on 16 Dec 2014 at 4:27

@GoogleCodeExporter
Author

No feedback from rrdtool. So please feel free to fork rrd4j on GitHub and make 
a pull request. I will welcome any commented and tested patch.

Original comment by fbacche...@gmail.com on 29 Dec 2014 at 11:18

@GoogleCodeExporter
Author

I forked RRD4J and tried the patch suggested in comment #5, but the output is not as 
expected :(

I suggest we try to find a solution for 
https://github.com/oetiker/rrdtool-1.x/issues/395 first. Left a comment there...

Original comment by mich...@capacis.de on 29 Dec 2014 at 11:56

@GoogleCodeExporter
Author

Tried the latest released version, 1.5.0-rc2. The issue is still reproduced. 

See my comment at https://github.com/oetiker/rrdtool-1.x/issues/395

Let's wait for a reply.

Original comment by maxim.uv...@gmail.com on 4 Mar 2015 at 1:31

@GoogleCodeExporter
Author

The author of rrdtool replied with "work BY DESIGN".

However, we could work around the issue. 
See a commit in my fork: 
https://github.com/themuvarov/RRD4J/commit/44d537af9c1adb3792a5dc531be59f04fe3693fd

There is a unit test to confirm the fix: 
https://github.com/themuvarov/RRD4J/commit/3d5c53599fb8898cc102e5be85caf0ba1bf24267

It works correctly on heartbeat interval bigger than 2 sec. Datasource: GAUGE
The part of the test output:
<!-- Tue May 28 17:34:09 MSK 2013 / 1369748049 -->
         <row>
            <v>+1.0000000000E00</v>
         </row>
         <!-- Tue May 28 17:34:10 MSK 2013 / 1369748050 -->
         <row>
            <v>NaN</v>
         </row>
         <!-- Tue May 28 17:34:11 MSK 2013 / 1369748051 -->
         <row>
            <v>NaN</v>
         </row>
         <!-- Tue May 28 17:34:12 MSK 2013 / 1369748052 -->
         <row>
            <v>+2.0000000000E00</v>
         </row>
         <!-- Tue May 28 17:34:13 MSK 2013 / 1369748053 -->
         <row>
            <v>NaN</v>
         </row>
         <!-- Tue May 28 17:34:14 MSK 2013 / 1369748054 -->
         <row>
            <v>NaN</v>
         </row>
         <!-- Tue May 28 17:34:15 MSK 2013 / 1369748055 -->
         <row>
            <v>+3.0000000000E00</v>
         </row>
         <!-- Tue May 28 17:34:16 MSK 2013 / 1369748056 -->
         <row>
            <v>NaN</v>
         </row>
         <!-- Tue May 28 17:34:17 MSK 2013 / 1369748057 -->
         <row>
            <v>NaN</v>
         </row>
         <!-- Tue May 28 17:34:18 MSK 2013 / 1369748058 -->
         <row>
            <v>+4.0000000000E00</v>
         </row>

Let's test it in other cases.

Original comment by maxim.uv...@gmail.com on 6 Mar 2015 at 8:37

@GoogleCodeExporter
Author

Your test is not a test: there is no Assert. What do you think a continuous 
integration tool (http://jrds.fr/jenkins/job/rrd4j/) can do with println?

Your code doesn't check whether the datasource type is GAUGE. The behavior of 
other datasource types must not be changed.
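
To make the two review points above concrete (real assertions instead of println, and leaving non-GAUGE datasources untouched), here is a minimal self-contained sketch. It does not use rrd4j classes and is not the code from the fork; the enum DsType, the method valueAfterMissedHeartbeat and the helper check are hypothetical names chosen for illustration, and the rule "GAUGE keeps the raw sample, every other type stays NaN" is an assumption about what the workaround intends.

```java
// Self-contained sketch; it does not use or mirror rrd4j classes.
final class GaugeGuardSketch {

    // Hypothetical stand-in for RRD-style datasource types.
    enum DsType { GAUGE, COUNTER, DERIVE, ABSOLUTE }

    // Value to keep when the gap between two updates exceeded the heartbeat.
    // Assumption: only GAUGE keeps the raw sample; all other types stay unknown (NaN).
    static double valueAfterMissedHeartbeat(DsType type, double newValue) {
        return type == DsType.GAUGE ? newValue : Double.NaN;
    }

    // Minimal assertion helper so a failure throws instead of only printing.
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        check(valueAfterMissedHeartbeat(DsType.GAUGE, 2.0) == 2.0,
                "GAUGE should keep the sample");
        for (DsType type : DsType.values()) {
            if (type != DsType.GAUGE) {
                check(Double.isNaN(valueAfterMissedHeartbeat(type, 2.0)),
                        type + " must stay NaN");
            }
        }
        System.out.println("all checks passed");
    }
}
```

In a real test the check helper would be replaced by the assertions of the test framework already used by the project, so that a continuous integration run fails when the behavior regresses.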

Original comment by fbacche...@gmail.com on 6 Mar 2015 at 8:47

@GoogleCodeExporter
Author

Yes, I know about the issues you mentioned. But I only use GAUGE in my project, 
so it's quite enough for me.

Maybe the workaround helps someone.

Original comment by maxim.uv...@gmail.com on 6 Mar 2015 at 10:42

@GoogleCodeExporter
Author

added assertion - to the test 
added GAUGE check - to the code

https://github.com/themuvarov/RRD4J/commit/55a4d9d8e91275fd931ec3a7286a288d6421a125

Original comment by maxim.uv...@gmail.com on 6 Mar 2015 at 12:16
