Date type has not enough precision for the logging use case. #10005

Closed
jordansissel opened this issue Mar 5, 2015 · 70 comments
Labels
>feature high hanging fruit :Search/Mapping Index mappings, including merging and defining field types stalled

Comments

@jordansissel
Contributor

At present, the 'date' type is millisecond precision. For many log use cases, higher precision time is valuable - microsecond, nanosecond, etc.

The biggest impact of this is during sorting of search results. If you sort chronologically, newest-first, by a date field, documents with the same date may be returned in the wrong relative order (because they tie on the sort key). Users often report seeing events "out of order" when those events share a timestamp. A specific example: sorting by date newest-first works until there is a tie, at which point the tied events appear oldest-first (or in first-written order?). This causes a bit of confusion for the ELK use case.

Related: logstash-plugins/logstash-filter-date#8

I don't have any firm proposals, but I have two different implementation ideas:

  • Proposal 1, use a separate field: store our own custom-precision time in a separate field as a long. This allows us to do correct sorting (because we have higher precision), but it makes any date-related functionality in Elasticsearch unusable (searching now-1h, doing date_histogram, etc.)
  • Proposal 2, give the date type tunable precision: make the date type's precision configurable, with the default (backwards-compatible) precision being milliseconds. This would let us choose, for example, nanosecond precision for the logging use case, and year precision for an archaeological use case (billions of years ago, or something). The benefit here is that date_histogram and other date-related features would still work. Further, a configurable precision would let the underlying data structure stay a 64-bit long while users choose the precision most appropriate for them.
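A minimal sketch of Proposal 1 in Python (the field name `ts_remainder_ns` and the split logic are hypothetical, not an actual Logstash/Elasticsearch implementation):

```python
# Sketch of Proposal 1 (hypothetical field name 'ts_remainder_ns'):
# index the millisecond part in the normal 'date' field and the
# sub-millisecond remainder in a plain 'long' field, then sort on both.
def split_ns_timestamp(epoch_ns: int) -> dict:
    """Split a nanoseconds-since-epoch value into the two indexed fields."""
    return {
        "@timestamp": epoch_ns // 1_000_000,      # milliseconds, for the date field
        "ts_remainder_ns": epoch_ns % 1_000_000,  # 0..999999, for tie-breaking
    }

# Two events landing in the same millisecond still sort correctly
# when the remainder field is used as a secondary sort key:
events = [1_425_600_000_123_456_789, 1_425_600_000_123_456_788]
docs = sorted((split_ns_timestamp(ns) for ns in events),
              key=lambda d: (d["@timestamp"], d["ts_remainder_ns"]))
```

The trade-off is exactly as described above: sorting is correct, but range queries and date_histogram only see the millisecond field.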
@jordansissel
Contributor Author

I know Joda's got a precision limit (the Instant class is millisecond precision) and a year limit ("year must be in the range [-292275054,292278993]"). I'm open to helping explore solutions in this area.

@synhershko
Contributor

What about the consequences for the date field's size? Even with docvalues in place, cardinality will be ridiculously high. Even for those scenarios which need this, it could be overkill, no?

@clintongormley clintongormley added discuss :Search/Mapping Index mappings, including merging and defining field types labels Mar 9, 2015
@nikonyrh

Couldn't you just store the decimal part of the second in a secondary field (as a float or long) and sort by these two fields when needed? You could still aggregate based on the standard date field but not at a microsecond resolution.

@markwalkom
Contributor

I've been speaking to a few networking firms lately and it's dawned on me that microsecond level is going to be critical for IDS/network analytics.

@markwalkom
Contributor

@anoinoz

anoinoz commented Jun 24, 2015

that last request is mine I believe. I would add that to monitor networking (and other) activities in our field, nanosecond support is paramount.

@clintongormley clintongormley added :Dates and removed :Search/Mapping Index mappings, including merging and defining field types labels Jun 24, 2015
@abrisse

abrisse commented Sep 4, 2015

👍 for this feature

@jack-pappas

What about switching from Joda Time to date4j? It supports higher-precision timestamps compared to Joda and supposedly the performance is better as well.

@dadoonet
Member

Before looking at the technical side: is the BSD license compatible with the Apache2 license?

@dadoonet
Member

So BSD is compatible with Apache2.

@clintongormley

I'd like to hear @jpountz's thoughts on this comment #10005 (comment) about high cardinality with regards to index size and performance.

I could imagine adding a precision parameter to date fields which defaults to ms, but also accepts s, us, ns.

We would need to move away from Joda, but I wouldn't be in favour of replacing Joda with a different dependency. Instead, we have this issue discussing replacing Joda with Java.time #12829

@jpountz
Contributor

jpountz commented Sep 24, 2015

@clintongormley It's hard to predict because it depends so much on the data so I ran an experiment for an application that ingests 1M messages at a 2000 messages per second per shard rate.

| Precision | Terms dict (kB) | Doc values (kB) |
| --- | --- | --- |
| milliseconds | 3348 | 2448 |
| microseconds | 10424 | 3912 |

Millisecond precision is much more space-efficient, in particular because at 2k docs per second several messages fall in the same millisecond. But even if we go with 1M messages at a rate of 200 messages per second, so that sharing the same millisecond is much less likely, there are still significant differences between millisecond and microsecond precision.

| Precision | Terms dict (kB) | Doc values (kB) |
| --- | --- | --- |
| milliseconds | 7604 | 2936 |
| microseconds | 10680 | 4888 |

That said, these numbers are for a single field, the overall difference would be much lower if you include _source storage, indexes and doc values for other fields, etc.

Regarding performance, it should be pretty similar.

@clintongormley

From what users have told me, by far the most important reason for storing microseconds is the sorting of results. It makes no sense to aggregate on buckets smaller than a millisecond.

This can be achieved very efficiently with the two-field approach: one for the date (in milliseconds) and one for the microseconds. The microseconds field would not need to be indexed (unless you really need to run a range query with finer precision than one millisecond), so all that would be required is doc_values. Microseconds can have a maximum of 1,000 values, so doc_values for this field would require just 12 bits per document.

For the above example, that would be only an extra 11kB.

A logstash filter could make adding the separate microsecond field easy.

@clintongormley

Meh - there aren't 1000 bits in a byte. /me hangs his head in shame.

It would require 1,500kB
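For the record, the corrected arithmetic (assuming 1 kB = 1,000 bytes and the 1M-document experiment above):

```python
# 12 bits per document of doc_values for the microsecond field,
# over 1,000,000 documents:
docs = 1_000_000
bits_per_doc = 12
total_kB = docs * bits_per_doc / 8 / 1000  # bits -> bytes -> kB
print(total_kB)  # → 1500.0
```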

@pfennema

If we want to use the ELK framework properly for analyzing network latency, we really need nanosecond resolution. Are there any firm plans/roadmap to change the timestamps?

@portante

portante commented Oct 1, 2015

Let's say I index the following JSON document with nanosecond precision timestamps:

{ "@timestamp": "2015-09-30T12:30:42.123456789-07:00", "message": "time is running out" }

So the internal date representation will be, 2015-09-30T19:30:42.123 UTC, right?

But if I issue a query matching that document, and ask for either the _source document or the @timestamp field explicitly, won't I get back the original string? If so, then in cases where the original time string lexicographically sorts the same as the converted time value, would that be sufficient for a client to further sort to get what they need?

Or is there a requirement that internal date manipulations in ES need such nanosecond precision? I am imagining that if one has records with nanosecond precision, only being able to query for a date range with millisecond precision could potentially result in more document matches than wanted. Is that the major concern?
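On the lexicographic-sort idea: fixed-width ISO-8601 strings do sort chronologically, but only while the UTC offset stays constant. A quick Python check (the timestamps are made up):

```python
# The original @timestamp strings from _source sort lexicographically in
# chronological order as long as the width and UTC offset are fixed:
a = "2015-09-30T12:30:42.123456789-07:00"
b = "2015-09-30T12:30:42.123456790-07:00"
assert a < b  # lexicographic order matches time order here

# But with mixed offsets it breaks: c is one second *earlier* than a
# in absolute time (a is 19:30:42 UTC), yet sorts after it as a string.
c = "2015-09-30T19:30:41.000000000+00:00"
assert c > a
```

So client-side tie-breaking on the original string only works if every producer emits the same offset and fractional-second width.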

@pfennema

pfennema commented Oct 2, 2015

I think the latter: internal date manipulations probably need nanosecond precision. The reason is that when monitoring latency on 10Gb networks, we get pcap records (or packets directly from the switch via UDP) which include multiple fields with nanosecond timestamps. We'd like to compute the differences between those timestamps in order to optimize our network/software and find correlations. To do this we need to zoom in on every single record, not aggregate records.

@abierbaum

👍 for solving this. It is causing major issues for us now in our logging infrastructure.

@clintongormley

@pfennema @abierbaum What problems are you having that can't be solved with the two-field solution?

@pfennema

What we'd like is a timescale in the display (Kibana) where we can zoom in on individual measurements that carry nanosecond-resolution timestamps. A record in our case has multiple fields (NICTimestamp, TransactionTimestamp, etc.) which we'd like to correlate with each other on an individual basis, hence not aggregated. We need to see where spikes occur to optimize our environment. If we can have micro/nanosecond resolution on the x-axis, we should be able to zoom in on individual measurements.

@abierbaum

@clintongormley Our use case is using ELK to analyze logs from the backend processes in our application. The place we noticed it was postgresql logs. With the current ELK code base, even though the logs coming from the database server have the commands in order, once they end up in Elasticsearch and are visualized in Kibana, the order of items happening in the same millisecond is lost. We can add a secondary sequence-number field, but that doesn't work well in Kibana queries (since you can't sort on multiple fields) and causes quite a bit of confusion on the team, because they expect the data in Kibana to be sorted in the same order as it came in from postgresql and logstash.

@gigi81

gigi81 commented Nov 20, 2015

We have the same problem as @abierbaum described. When events happen in the same millisecond, the order of the messages is lost.
Any workaround or suggestion on how to fix this would be really appreciated.

@dtr2

dtr2 commented Jan 17, 2016

You don't need to increase the timestamp accuracy: instead, the time sorting should be based on both timestamp and ID: message IDs are monotonically increasing, and specifically, they are monotonically increasing for a set of messages with the same timestamp...

@StephanX

StephanX commented Nov 22, 2017

Our use case is that we ingest logs kubernetes => fluentd (0.14) => elasticsearch, and logs that are emitted rapidly (anything under a millisecond apart, which is easily done) obviously have no way of being kept in that order when displayed in Kibana.

@varas

varas commented Dec 11, 2017

Same issue: we are tracking events that occur at nanosecond precision.

Is there any plan to increase it?

@clintongormley

Yes, but we need to move from Joda to Java.time in order to do so. See #27330

@gavenkoa

I opened a bug against Logback, as its core interface also stores data at millisecond resolution, so precision is lost even earlier, before ES: https://jira.qos.ch/browse/LOGBACK-1374

It seems that the historical java.util.Date type is the cause of these problems in the Java world.

@shekharoracle

Same use case: using the kubernetes/filebeat/elasticsearch stack for log collection, and the lack of nanosecond precision is leading to incorrect ordering of logs.

@portante

Seems like we need to consider having the collectors provide a monotonically increasing counter which records the order in which the logs were collected. Nanosecond precision does not necessarily solve the problem, because the clock's resolution might not actually be nanosecond.

@lgogolin

Seriously, guys? This bug is almost 3 years old...

@matthid

matthid commented Feb 16, 2018

The problem is also that if you try to find a workaround you run into a series of other bugs, so there is no viable workaround:

  • If you use a string, sorting will be slow
  • If you use an integer and try to make it readable, the numbers won't be big enough (Support for BigInteger and BigDecimal #17006)
  • If you add an additional ordering field, you cannot easily configure Kibana to apply a "thenBy" ordering on that field.

So the only viable workaround seems to be an epoch timestamp plus 2 additional digits which are incremented in logstash when the timestamp matches the previous one.

Has anyone found a better approach?
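A sketch of that epoch-plus-two-digits workaround in Python (the key construction is hypothetical; a real implementation would live in a logstash filter):

```python
# Hypothetical sketch: multiply the epoch-millisecond timestamp by 100
# and use the two extra decimal digits as a counter that increments
# while consecutive events share the same millisecond.
def make_sortable_keys(epoch_ms_list):
    keys, last_ms, counter = [], None, 0
    for ms in epoch_ms_list:
        counter = counter + 1 if ms == last_ms else 0
        if counter > 99:
            raise ValueError("more than 100 events in one millisecond")
        keys.append(ms * 100 + counter)
        last_ms = ms
    return keys

# Three events, the first two in the same millisecond:
print(make_sortable_keys([1518739200000, 1518739200000, 1518739200001]))
# → [151873920000000, 151873920000001, 151873920000100]
```

The resulting keys still fit comfortably in a 64-bit long, but the scheme caps out at 100 events per millisecond and only preserves arrival order, not real sub-millisecond time.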

@jraby

jraby commented Feb 16, 2018

We've been storing microseconds since epoch in a number field for 2 years now.
Suits our needs, but YMMV.

@jpountz
Contributor

jpountz commented Mar 14, 2018

cc @elastic/es-search-aggs

@tlhampton13

Not all time data is collected using commodity hardware; there is plenty of specialty equipment that collects nanosecond-resolution data. Thinking about applications beyond log analysis: sorting by time is critical, but aggregation over small timeframes is also important. For example, maybe I just want to aggregate some scientific data over a one-second or even one-millisecond window.

I have nanosecond resolution data and would love to be able to use ES aggregations to analyze it.

@jimczi
Contributor

jimczi commented Feb 11, 2019

Elasticsearch 7.0 will include a date_nanos field type that handles nanosecond sorting precision:
#37755
Nanosecond precision is now a first-class citizen that doesn't require two fields to retain precision, so I will close this issue. Please open new issues if you find bugs or enhancements to make on this new field type.
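For reference, opting into the new type is a one-line mapping change (the index name `logs` here is just an example):

```json
PUT logs
{
  "mappings": {
    "properties": {
      "@timestamp": { "type": "date_nanos" }
    }
  }
}
```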

@gavenkoa

gavenkoa commented Dec 27, 2021

https://jira.qos.ch/browse/LOGBACK-1374 added Instant getInstant() to the ILoggingEvent interface, allowing appenders to capture nanosecond resolution!

It is in 1.3.0-alpha12. I expect to see it used in new appenders.
