This repository has been archived by the owner on Feb 12, 2022. It is now read-only.

Metrics

Sundeep Tiyyagura edited this page Jun 17, 2020 · 23 revisions

User Guide


Working With Metrics

Most of the functionality of Argus revolves around the manipulation of metrics. This section describes how a metric is represented in Argus and how to query and transform metrics.

Note: In the syntax examples, brackets [] indicate optional elements, and angle brackets <> denote mandatory elements.

Metric Identifier

When identifying metric components by name, you are restricted to a subset of characters. Characters outside of the subset are replaced by a double underscore ("__").

Allowed Identifier Characters
a to z, A to Z, 0 to 9, -, _, ., /
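
The replacement rule above can be sketched as follows. This is a minimal illustration, not Argus code: the function name `sanitize_identifier` is hypothetical, and it assumes each disallowed character is replaced individually by a double underscore.

```python
import re

# Allowed characters as listed above: a-z, A-Z, 0-9, '-', '_', '.', '/'
_DISALLOWED = re.compile(r"[^a-zA-Z0-9\-_./]")


def sanitize_identifier(raw):
    """Replace every character outside the allowed subset with '__'.

    Assumption: each disallowed character is replaced on its own, so two
    adjacent disallowed characters yield four underscores.
    """
    return _DISALLOWED.sub("__", raw)


print(sanitize_identifier("memory physical%used"))
print(sanitize_identifier("datacenter.host-1/cpu_0"))
```

An identifier that already uses only allowed characters passes through unchanged.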

Metric Structure

A metric includes a namespace, scope, name, and tags.

Metric Fields
<scope><name>[tags][namespace]

Scope—The required scope field categorizes the metric. Metrics that are generated from the same logical source can be published under the same scope. For example, all metrics from a particular datacenter, which represent values at that datacenter level, can be published using the ‘datacenter’ scope. Metrics generated by specific hosts within the datacenter can be published using the ‘datacenter.hostname’ scope. You decide how the scope field is used.

Scope Field
<identifier>:

Name—The metric name is a required field that identifies the measure used. For example, if the metric is a measure of physical memory used on a host, ‘memory.physical.used’ is a good name choice.

Name Field
<identifier>:

Tags—Tags associate low cardinality information with metrics to facilitate fast aggregation. Tags are optional. The tag field consists of one or more key-value pairs separated by commas and enclosed in curly brackets. A good example of using tags is tracking specific HTTP method types for a metric that measures web server request count. In this case, the tag key is the method, and the value can be POST, PUT, GET, or DELETE. The number of possible tag values is small and bounded.

Tag Fields
{<key_identifier>=<value_identifier>[,<key_identifier>=<value_identifier>]*}

Sample
{type=request,method=POST}

Namespace—The namespace is an optional field. It indicates a unique logical space for one or more metrics that you own. The purpose of providing a namespace is to avoid metric name collisions between groups of users and to restrict the publishing authority of metric data to the owner of the namespace. If no namespace is associated with a metric, it belongs to the global namespace and is neither restricted nor protected. Namespaces are generated at your request using the /namespace web service endpoint. If you generate the namespace, you are considered the owner of the namespace. As the owner, you can add authorized users to that namespace and remove them from it. You can read any metric from any namespace, but only authorized users can publish metric data to a namespace.

Namespace Field
<identifier>:

TIP: Use dotted names to indicate naming hierarchy. Order the name segments so that the most distinct value appears first. For example, sanfrancisco.california.unitedstates.

TIP: Be judicious in your use of tags. Keep the tag count for metrics low (maximum allowed is seven) and the range of values small. Don’t store unbounded data, such as timestamps and IDs, in tags. Doing so negatively impacts query performance.
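
As a rough sketch of the tag rules described above, the helper below renders a tag dictionary in the curly-bracket form and rejects more than seven tags. The name `format_tags` is hypothetical, and rejecting with an exception is an assumption; the source does not say how Argus reacts to an over-long tag list.

```python
MAX_TAGS = 7  # maximum tag count stated in the tip above


def format_tags(tags):
    """Render tags as {key=value,key=value}; return '' when there are none."""
    if len(tags) > MAX_TAGS:
        # Assumption: exceeding the limit is treated as an error.
        raise ValueError(f"at most {MAX_TAGS} tags are allowed, got {len(tags)}")
    if not tags:
        return ""
    return "{" + ",".join(f"{key}={value}" for key, value in tags.items()) + "}"


print(format_tags({"type": "request", "method": "POST"}))
```

With the two tags shown, the helper produces the same text as the tag sample in this section.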

View Metrics

To view metrics, click Metrics on the navigation bar. The View Metrics page has a text input field where you can enter a query. If your query is a simple metric expression with no transforms applied, any event annotations associated with the resulting time series are also rendered.

TIP: For query grammar, see the Querying Metrics section.

The rendered view has its Y-Axis automatically scaled based on the series data. A pan and zoom control below the chart allows users to easily narrow the view to investigate small time ranges. The upper left of the graph contains a time range selector to easily snap the view to a fixed range.

CAUTION: If the ratio of data points to screen pixels results in a visually crowded chart, the chart renderer dynamically performs an average value downsample. Zooming in increases the level of detail. If you do not want this behavior, specify downsampling in the query.

You can use the bookmark icon on the right side of the input field to bookmark or copy a persistent URL to the rendered view. Share this URL to provide other users easy access to a view of a particular metric query.

Query Metrics

You query metrics with a metric query string that includes the time range, metric expression to query, aggregator to use to combine query results consisting of more than one time series, and optional downsampler specification.

Metric Query Field
<start>[end]<scope><metric>[tags]<aggregator>[downsampler][namespace]

Start—This field can be an absolute timestamp in milliseconds from the epoch or a relative offset. An offset consists of a negative integer with a time unit abbreviation as a suffix. The value cannot represent a time in the future, and if an end time is specified, the start time must occur before it. For example, to specify a query start time of one week ago, use an offset of -7d. The start time is inclusive.

Start Field (offset)
<relative offset>:

Start Field (absolute time)
<absolute timestamp>:

Valid Time Units
S (seconds)
M (minutes)
H (hours)
D (days)

End—This optional field can be an absolute timestamp in milliseconds from the epoch or a relative offset. An offset consists of a negative integer with a time unit abbreviation as a suffix. The value cannot represent a time in the future, and it must occur after the start time. For example, to specify a query end time of 12 AM UTC on January 1, 2015, use an absolute timestamp of 1420070400000. The end time is inclusive.

End Field (offset)
<relative offset>:

End Field (absolute time)
<absolute timestamp>:
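
The start and end rules can be pictured with a small resolver that turns either form into epoch milliseconds and checks the ordering constraints. This is a sketch under stated assumptions, not the Argus parser: the function names are hypothetical, unit suffixes are accepted in either case, and an omitted end time is assumed to mean "now".

```python
import time

# Milliseconds per time unit (seconds, minutes, hours, days).
_UNIT_MS = {"s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def resolve_time(value, now_ms=None):
    """Convert a relative offset (e.g. '-7d') or absolute timestamp to epoch ms."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    if value.startswith("-"):
        amount, unit = int(value[1:-1]), value[-1].lower()
        resolved = now_ms - amount * _UNIT_MS[unit]
    else:
        resolved = int(value)
    if resolved > now_ms:
        raise ValueError("time cannot be in the future")
    return resolved


def resolve_range(start, end=None, now_ms=None):
    """Resolve start and end, enforcing that start occurs before end."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    start_ms = resolve_time(start, now_ms)
    # Assumption: an omitted end time is treated as the current time.
    end_ms = now_ms if end is None else resolve_time(end, now_ms)
    if start_ms >= end_ms:
        raise ValueError("start must occur before end")
    return start_ms, end_ms


print(resolve_range("-14d", "-7d", now_ms=1_600_000_000_000))
```

Passing `now_ms` explicitly keeps the example deterministic; in normal use it would default to the current clock.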

Aggregator—A metric query can match multiple discrete time series, such as when a metric uses tags to further divide the series into subcomponent time series. For example, if users collect a metric for all hosts in a datacenter, they can choose to specify the hostname as a tag. Doing so allows direct inspection of the metric for any single host. The aggregator field allows fast aggregation across such series when the tags are omitted from the metric identifier. For example, to calculate the total of a metric for the entire datacenter, specify the sum aggregator function on the metric identifier with the tags field omitted.

Aggregator Field
:<aggregator_function>

If the metric identifier fully specifies a single unique time series, the aggregator is not used and can be considered an identity operation.

Valid Aggregators   Description                                                                          Interpolation
sum                 Arithmetic sum of all data points across series at each timestamp                    No Interpolation
min                 Minimum value of data points across series at each timestamp                         No Interpolation
max                 Maximum value of data points across series at each timestamp                         No Interpolation
dev                 Standard deviation of data points across series at each timestamp                    Uses Linear Interpolation
avg                 Arithmetic mean of data points across series at each timestamp                       Uses Linear Interpolation
count               Count the number of data points across series at each timestamp                      No Interpolation. Uses 0 for missing data points
none                None aggregator will output all the raw timeseries without any aggregator applied   No Interpolation

Example: Aggregation on a Single Unique Time Series (Identity-Operation)

-1d:appserver0:opcount{type=request,method=POST}:avg:argus

Example: Average Aggregation of Operation Count for All HTTP Request Methods

-1d:appserver0:opcount{type=request}:avg:argus
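
To make "across series at each timestamp" concrete, here is a minimal sketch of the non-interpolating aggregators (sum, min, max, count) applied to in-memory series. It is an illustration only: the `aggregate` function and the sample data are hypothetical, and it assumes a series simply does not contribute at timestamps where it has no data point. The interpolating aggregators (avg, dev) are left out.

```python
from collections import defaultdict

# Non-interpolating aggregators from the table above.
AGGREGATORS = {"sum": sum, "min": min, "max": max, "count": len}


def aggregate(series_list, name):
    """Combine several {timestamp: value} series into one, per timestamp."""
    values_at = defaultdict(list)
    for series in series_list:
        for timestamp, value in series.items():
            values_at[timestamp].append(value)
    func = AGGREGATORS[name]
    return {ts: func(values) for ts, values in sorted(values_at.items())}


# Hypothetical per-method series for the same metric.
post = {1000: 4.0, 2000: 6.0}
get = {1000: 10.0, 2000: 12.0, 3000: 8.0}

print(aggregate([post, get], "sum"))
print(aggregate([post, get], "count"))
```

When only one series matches, every aggregator here except count returns the series unchanged, which mirrors the identity behavior described above.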

Downsampler—Often, time-series data is downsampled to a coarser resolution for reporting or transform purposes. This optional field is composed of a relative time interval and an aggregation function. If specified, the time-series data is downsampled to the specified interval using the requested function. The interval must be a positive integer.

Downsampler Field
:<interval>-<aggregator>
Example: Using Aggregation and Downsampling

-14d:-7d:appserver0:opcount{type=request}:avg:1h-sum:argus
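
The field order of a query string can be sketched as a small builder. The function `build_query` is hypothetical and performs no validation; it only joins the fields with colons in the order given by the Metric Query Field syntax.

```python
def build_query(start, scope, metric, aggregator, end=None, tags=None,
                downsampler=None, namespace=None):
    """Assemble a metric query string from its fields, skipping optional ones."""
    parts = [start]
    if end is not None:
        parts.append(end)
    parts.append(scope)
    tag_text = ""
    if tags:
        tag_text = "{" + ",".join(f"{k}={v}" for k, v in tags.items()) + "}"
    parts.append(metric + tag_text)  # tags attach directly to the metric name
    parts.append(aggregator)
    if downsampler is not None:
        parts.append(downsampler)
    if namespace is not None:
        parts.append(namespace)
    return ":".join(parts)


print(build_query("-14d", "appserver0", "opcount", "avg", end="-7d",
                  tags={"type": "request"}, downsampler="1h-sum",
                  namespace="argus"))
```

Called with these arguments, the builder reproduces the aggregation and downsampling example query.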

Note that the downsampler function is applied to each individual time series first, before the aggregation is performed across time series.
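
The order of operations can be sketched on in-memory data: each series is downsampled on its own, and only then are the results combined. This is an illustration under assumptions, not Argus behavior in detail: bucket timestamps are assumed to snap to the start of each interval, the function names and sample data are hypothetical, and the averaging step skips interpolation because every sample series has a value in every bucket.

```python
from collections import defaultdict

HOUR_MS = 3_600_000


def downsample(series, interval_ms, func):
    """Group points into fixed-width buckets and reduce each bucket with func."""
    buckets = defaultdict(list)
    for timestamp, value in series.items():
        # Assumption: a bucket is labeled by the start of its interval.
        buckets[timestamp - timestamp % interval_ms].append(value)
    return {ts: func(values) for ts, values in sorted(buckets.items())}


def average_across(series_list):
    """Average the values of all series at each shared timestamp."""
    values_at = defaultdict(list)
    for series in series_list:
        for timestamp, value in series.items():
            values_at[timestamp].append(value)
    return {ts: sum(v) / len(v) for ts, v in sorted(values_at.items())}


# Hypothetical 30-minute data for two series of the same metric.
post = {0: 1.0, 1_800_000: 2.0, 3_600_000: 3.0}
get = {0: 10.0, 1_800_000: 20.0, 3_600_000: 30.0}

# Equivalent in spirit to ":avg:1h-sum": downsample each series, then aggregate.
hourly = [downsample(series, HOUR_MS, sum) for series in (post, get)]
print(average_across(hourly))
```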

Pattern Matching

You can use certain wildcard patterns on any metric identifier field within a query expression (namespace, scope, name, tag key, and tag value). Argus supports basic wildcarding, which can return a set of results containing more than one time series. The aggregation function specified in the metric identifier is applied to each time series that is matched by the expression. The order of the time series is not guaranteed; however, you can apply sorting using a metric transform to ensure a consistent ordering.

Wildcard        Description
*               Match any number of characters
?               Match any single character
[abc]           Match any of 'a','b','c'
[a-c]           Match any of 'a','b','c'
[abc|pqr|xyz]   Match any of abc, pqr, xyz

Query Expression Pattern Matching

Expression:
-14d:-7d:appserver[0-9]:op?ount{type=request}:avg:1h-sum:ar*us
Matches:
-14d:-7d:appserver0:opfount{type=request}:avg:1h-sum:argus
-14d:-7d:appserver3:opcount{type=request}:avg:1h-sum:arkus

Expression:
-14d:-7d:appserver:opcount{type=request|response}:avg:1h-sum:argus
Matches:
-14d:-7d:appserver:opcount{type=request}:avg:1h-sum:argus
-14d:-7d:appserver:opcount{type=response}:avg:1h-sum:argus
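
One way to reason about the bracketed and single-character wildcards is to translate a pattern for a single field into a regular expression. The sketch below is an approximation for experimentation, not the matcher Argus uses: the function names are hypothetical, and it covers only the five forms listed in the wildcard table.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a single-field wildcard pattern into a compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":
            out.append(".*")          # any number of characters
        elif ch == "?":
            out.append(".")           # any single character
        elif ch == "[":
            close = pattern.index("]", i)
            body = pattern[i + 1:close]
            if "|" in body:           # [abc|pqr|xyz]: whole-word alternatives
                alternatives = "|".join(re.escape(alt) for alt in body.split("|"))
                out.append("(?:" + alternatives + ")")
            else:                     # [abc] or [a-c]: character class
                out.append("[" + body + "]")
            i = close
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out))


def matches(pattern, value):
    """Return True when the whole value matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(value) is not None


print(matches("appserver[0-9]", "appserver3"))
print(matches("op?ount", "opfount"))
print(matches("ar*us", "arkus"))
```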

Transforms

Various functions that transform a time series are available as part of the query syntax. You can nest transforms as needed to synthesize more complicated transforms. For example, you can use the DIVIDE and SCALE transforms to synthesize the calculation of a percent.

Transforms generally take one of two types of parameters as input. The most common parameter type is a metric query. The other parameter is a constant that can be a string literal or a number. Constants are indicated by encapsulating the constant value within ‘#’ characters.

Each transform describes its input parameters and types, but generally, metric transforms are multiple-input, multiple-output functions.

TIP: For a description of the available transforms, their syntax, and semantics, see the Transform Reference section.

TIP: Some transforms have requirements on the alignment of the timestamps for a time series. Clever use of downsampling can snap timestamps to a common time interval.
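
The nesting idea can be sketched by composing expression strings. Treat this as an assumption-laden illustration: the function-call form `NAME(arg, arg)` is assumed here and should be checked against the Transform Reference section, the helper names are hypothetical, and the `memory.physical.total` metric is invented for the example.

```python
def constant(value):
    """Wrap a constant in '#' characters, as described above."""
    return f"#{value}#"


def transform(name, *args):
    """Compose a transform call; assumes the NAME(arg1, arg2) call form."""
    return f"{name}({', '.join(args)})"


# Hypothetical metric queries for used and total physical memory.
used = "-1d:appserver0:memory.physical.used:avg"
total = "-1d:appserver0:memory.physical.total:avg"

# Percent used = (used / total) * 100, built by nesting DIVIDE inside SCALE.
percent = transform("SCALE", transform("DIVIDE", used, total), constant(100))
print(percent)
```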

Publishing Metrics

The process of writing metric data to Argus is called publishing metrics. A POST request is issued to the /collection/metrics web service endpoint with a JSON payload describing the metric data to be written.

If the metric specifies a namespace, Argus verifies that the authenticated user that's publishing the metric has the authority to perform the operation.

Metric data published to Argus is validated immediately but not committed immediately. The data is enqueued internally and persisted asynchronously using a distributed commit mechanism. Generally, the latency of the write endpoint method is in the range of 100 milliseconds. The latency on the commit of metric data to persistent storage is less than 1 minute. The total time from when a metric is written to the endpoint to the time it is available in a query result is less than 1 minute. The timing depends on the write load and how your Argus deployment is configured.

TIP: For detailed information about metric publishing, see the description of the /collection/metrics endpoint in the Web Services section.

Example: Metric Payload
```
[
   {
      "scope":"jvm",
      "metric":"thread.peak",
      "tags":{
         "host":"argus.host-132"
      },
      "namespace":"argus",
      "datapoints":{
         "1444106820000":"751.0",
         "1444106880000":"751.0",
         "1444106940000":"751.0",
         "1444107000000":"751.0",
         "1444107060000":"751.0",
         "1444107120000":"751.0",
         "1444107180000":"751.0",
         "1444107240000":"751.0",
         "1444107300000":"751.0"
      }
   }
]
```
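
A publishing call might be prepared along the following lines. This is a sketch, not a client implementation: the base URL `https://argus.example.com/argusws` is a placeholder, `build_publish_request` is a hypothetical helper, and authentication is omitted because it depends on how the deployment is configured. Only the /collection/metrics path and the payload field names come from this page.

```python
import json
import urllib.request


def build_publish_request(base_url, metrics):
    """Prepare a POST request carrying a JSON list of metric objects."""
    body = json.dumps(metrics).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/collection/metrics",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


metric = {
    "scope": "jvm",
    "metric": "thread.peak",
    "tags": {"host": "argus.host-132"},
    "namespace": "argus",
    "datapoints": {"1444106820000": "751.0"},
}

# Placeholder base URL; replace it with the address of your Argus web services.
request = build_publish_request("https://argus.example.com/argusws", [metric])
print(request.get_method(), request.full_url)
# Sending is left out on purpose; urllib.request.urlopen(request) would issue it.
```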