24.02.0

@Hardikl Hardikl released this 21 Feb 15:24
8f9201c

24.02.0 / 2024-02-21 Release

📌 Highlights of this major release include:

  • New Datacenter dashboard which contains node health, capacity, performance, storage efficiency, issues, snapshot, power, and temperature details.

  • Harvest includes SnapMirror active sync EMS events with alert rules. Thanks @Nikhita-13 for reporting.

  • Harvest monitors FlexCache performance metrics and includes a new FlexCache dashboard to visualize them. Thanks to @ewilts for raising.

  • Harvest detects HA pair down and sensor failures. These are shown in the Health dashboard. Thanks to @johnwarlick for raising.

  • Harvest monitors MetroCluster diagnostics and shows them in the MetroCluster dashboard. Thanks to @wagneradrian92 for reporting.

  • We improved the performance of all dashboards that include topk queries. Thanks to @mamoep for reporting!

  • We added filter support for the ZapiPerf collector. See filter for more detail. Thanks to @debbrata-netapp for reporting.

  • Harvest includes a new bin/harvest grafana customize command that writes the dashboards to the filesystem so other programs can manage them. Thanks to @nicolai-hornung-bl for reporting!

  • We fixed an intermittent latency spike problem that impacted all perf objects. Thanks to @summertony15 and @rodenj1 for reporting this critical issue.

  • ⭐ Several of the existing dashboards include new panels in this release:

    • Node and Aggregate dashboards include volume stats panels. Thanks to @BrendonA667 for raising.
    • SVM dashboard includes volume capacity panels. Thanks to @BrendonA667 for raising.
    • SnapMirror dashboard includes automated_failover and automated_failover_duplex policies.
  • More Harvest dashboard dropdown variables include the All option, making it easier to get an overview of your environment.

  • All EMS alerts include an impact annotation. Thanks to @divya for raising.

  • 🌾 Harvest includes new templates to collect:

    • Network filesystem (NFS) rewinds performance metrics (rw_ctx). Thanks to @shawnahall71 for raising.
    • Network data management protocol (NDMP) session metrics. Thanks to @schumijo for raising.
  • 📕 Documentation additions

    • Harvest describes why and how to configure Docker's logging drivers. See Docker logging configuration. Thanks to @madaan for raising.
    • How to create templates that use ONTAP's private CLI. Details
    • How to create custom Grafana dashboards. Steps
    • How to validate your harvest.yml file and share a redacted copy with the Harvest team. Details
    • Harvest describes high-level concepts here. Thanks to @norespers for raising.
  • All constituents are disabled by default for workload detail performance templates.

  • The bin/harvest zapi CLI now supports a timeout argument.

  • Harvest performance collectors (ZapiPerf and RestPerf) ask ONTAP for performance counter metadata every 24 hours instead of every 20 minutes. Thanks to BrianMa for raising.

  • The Harvest REST collector's api_time metric now includes the API time for all template endpoints. Thanks to ChristopherWilcox for raising.

Announcements

‼️ IMPORTANT Release 24.02 disables four templates that collected metrics not used in dashboards.
These four templates are disabled by default: ObjectStoreClient, TokenManager, OntapS3SVM, and Vscan.
This change was made to reduce the number of collected metrics.
If you require these templates, you can enable them by uncommenting them in their corresponding default.yaml or by extending the existing object template.

🔺 IMPORTANT The minimum version of Prometheus required to run Harvest is now 2.33.
Version 2.33 is required to take advantage of Prometheus's @ modifier.
Please upgrade your Prometheus server to at least 2.33 before upgrading Harvest.
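Before upgrading Harvest, it can help to confirm that the Prometheus version you are running satisfies the 2.33 minimum. The following is a minimal sketch, not part of Harvest: the function names `parse_version` and `meets_minimum` are hypothetical, and it assumes a plain dotted version string such as `2.45.0` (optionally prefixed with `v`), with no pre-release suffix.

```python
# Minimum Prometheus version stated in these release notes.
MIN_PROMETHEUS = (2, 33)


def parse_version(version: str) -> tuple:
    """Turn '2.45.0' or 'v2.33.0' into a tuple of ints such as (2, 45, 0).

    Assumption: the string contains only dot-separated integers,
    so suffixes like '-rc.0' are not handled.
    """
    return tuple(int(part) for part in version.lstrip("v").split("."))


def meets_minimum(version: str) -> bool:
    """Return True when the version is at least MIN_PROMETHEUS."""
    return parse_version(version) >= MIN_PROMETHEUS


if __name__ == "__main__":
    for candidate in ("2.32.1", "2.33.0", "2.45.0"):
        status = "ok" if meets_minimum(candidate) else "upgrade Prometheus first"
        print(f"{candidate}: {status}")
```

Comparing tuples of integers rather than raw strings avoids the usual pitfall where "2.9" sorts after "2.33" lexically.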

💡 IMPORTANT After upgrade, don't forget to re-import your dashboards, so you get all the new enhancements and fixes. You can import them via the 'bin/harvest grafana import' CLI, from the Grafana UI, or from the 'Maintenance > Reset Harvest Dashboards' button in NAbox.

Known Issues

  • Harvest does not calculate power metrics for AFF A250 systems. This data is not available from ONTAP via ZAPI or REST.
    See ONTAP bug 1511476 for more details.

  • ONTAP does not include REST metrics for offbox_vscan_server and offbox_vscan until ONTAP 9.13.1. See ONTAP bug
    1473892 for more details.

IMPORTANT 7-mode filers that are not on the latest release of ONTAP may experience TLS connection issues with errors
like tls: server selected unsupported protocol version 301. This is caused by a change in Go 1.18.
The default for TLS client connections was changed to TLS 1.2 in Go 1.18.
Please upgrade your 7-mode filers (recommended) or set tls_min_version: tls10 in
your harvest.yml poller section.
See #1007 for more details.
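The number at the end of that error can be read as a TLS protocol version code. The sketch below assumes the value is printed in hexadecimal, so 301 corresponds to the wire value 0x0301. The lookup table uses the standard TLS version codes; the helper name `describe_tls_version` is hypothetical and not part of Harvest.

```python
# Standard protocol version codes as they appear on the wire.
TLS_VERSIONS = {
    0x0300: "SSL 3.0",
    0x0301: "TLS 1.0",
    0x0302: "TLS 1.1",
    0x0303: "TLS 1.2",
    0x0304: "TLS 1.3",
}


def describe_tls_version(hex_code: str) -> str:
    """Map a hex version code from the error message to a protocol name.

    Assumption: the code in the log line is hexadecimal without a 0x prefix.
    """
    return TLS_VERSIONS.get(int(hex_code, 16), "unknown")


if __name__ == "__main__":
    # The filer offered TLS 1.0, which is below the TLS 1.2 client default.
    print(describe_tls_version("301"))  # prints "TLS 1.0"
```

Read this way, the error says the filer selected TLS 1.0, which is why setting tls_min_version: tls10 allows the connection to proceed.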

Thanks to all the awesome contributors

🤘 Thanks to all the people who've opened issues, asked questions on Discord, and contributed code or dashboards
this release:

@shawnahall71, @pilot7777, @ben, @madaan, @johnwarlick, @jfong5040, @santosh725, @summertony15, @jmg011, @cheese1, @mamoep, @Falcon667, @dess, @debbrata-netapp, @ewilts,
@Nikhita-13, @norespers, @nicolai-hornung-bl, @BrendonA667, @schumijo, @divya, @joshuacook-tamu, @wagneradrian92, @george-strother

🌱 This release includes 26 features, 24 bug fixes, 20 documentation, 3 styling, 5 refactoring, 11 miscellaneous, and 12 ci pull requests.

🚀 Features

  • Include Start Time, Exported Metrics, And Poll Duration In Collector logs (#2493)
  • Adding Rw_ctx Zapiperf Object Template (#2494)
  • Change Pollcounter Schedule To 24H (#2499)
  • Add Ha Down And Sensor Issues In Health Dashboard (#2519)
  • Adding Ndmp Session Rest Template (#2531)
  • Use Modifier For Topk To Improve Svm Dashboard Performance (#2553)
  • Add Timeout For Zapi Cli (#2566)
  • Restperf Disk Plugin Should Support Metric Customization (#2573)
  • Add Filter Support For Zapiperf Collector (#2575)
  • FlexCache Monitoring (#2583)
  • Supporting Automated_failover, Automated_failover_duplex Policy In Sm (#2584)
  • Disabled The Templates Whose All Metrics Are Not Consumed In Dashboards (#2587)
  • Harvest Should Include Snapmirror Active Sync Ems Events (#2588)
  • Use Modifier For Topk To Improve Dashboard Performance (#2590)
  • Harvest Should Include A Snapmirror Active Sync Template (#2596)
  • Disable Constituents By Default For Workload Detail Performance Templates (#2598)
  • Adding Template For Metrocluster Diagnostics Check (#2601)
  • Adding Per Volume Panels In Svm Dashboard (#2602)
  • Add Grafana Customize Command (#2619)
  • Add Volume Stats To Node And Aggregate Dashboard (#2627)
  • Ems Alerts Should Include An Impact Annotation (#2631)
  • Improving Debug Log Clarity And Reducing Noise (#2637)
  • Datacenter Dashboard (#2650)
  • Harvest Dashboards Should Include An All Option (#2661)
  • Percent Unit Panels Should Use Decimal Points (#2663)
  • Change Stat Panel For Uptime,Power Status,Fan Status To Table In Node Dashboard (#2668)

🐛 Bug Fixes

  • Handled Missing Uuid In Volume For Change_log (#2478)
  • Remove Docs From Deb Binary (#2489)
  • Parsed Logger Changes (#2490)
  • Array Metrics Should Have Correct Base Label In Zapiperf (#2496)
  • Harvest Should Collect Luns In Qtrees (#2502)
  • Grafana Export Should Set Correct Permissions (#2505)
  • Begin Log For Pollcounter And Pollinstance Should Be In Ms (#2509)
  • Quickstart.md Docs Should Not Duplicate Pollers (#2521)
  • Print Results If Not Nil For Rest Cli (#2525)
  • Storage Efficiency Ratios Panels Should Show Cluster Capacity (#2529)
  • Qos Fixed% Should Include Admin Svm Qos Policy (#2532)
  • Handling Shelf_new_status For 7Mode (#2535)
  • Rest Aggr.yaml Template Should Be In The 9.11.0 Folder (#2538)
  • Storagegrid Collectors Should Support Only_cluster_instance (#2542)
  • Intermittent Latency Spike (#2548)
  • Hide Idle Metric And Max To Auto For Cpu_domain_busy (#2555)
  • Storagegrid Error When Password Has (#2576)
  • Container Workflow Creates Files As Root Even When The Commands Are Executed By A Non-Root User (#2581)
  • Clone_split_estimate Parse Error (#2613)
  • Qos Latency Spikes Due To Low Iops (#2615)
  • Fix Datacenter Count In Metadata Dashboard (#2622)
  • Doctor Print Should Include Child Pollers Into Optional Parent Pollers (#2641)
  • Remove Max Percent Limit From 'Volumes Per Snapshot Reserve Used' Panel (#2662)
  • Align Template Name With Object Name For Ndmp (#2667)
  • Honor absolute paths from the HARVEST_CONF environment variable (#2674)
  • Rest collector should include endpoint api_times (#2679)
  • StorageGrid Rest collector doesn't remove deleted Objects (#2677)
  • NABox doctor command errors for custom.yaml (#2691)
  • WaflSizer RestPerf template panics (#2695)
  • Purging unused metrics from shelf template for 7mode (#2696)
  • Handle inter-cluster snapmirrors when different datacenter (#2697)
  • Multi poller in a container should route logs to console (#2698)

📕 Documentation

  • Fix Service Latency (#2492)
  • Fix Doc Link From Changelog Dashboard (#2510)
  • How To Use Harvest With Rest Private Cli (#2523)
  • Add Docker Logging Configuration Guide (#2524)
  • Mention Iec Is Base2 And Source (#2527)
  • Change Link From Netapp.io To Github (#2533)
  • Steps To Create Custom Grafana Dashboard (#2550)
  • Add Type And Base For Qos Detail Metrics (#2557)
  • Ems Doc Update (#2561)
  • Unit For Nfs Throughput Should Be B_per_sec (#2562)
  • Consolidate Upgrade Steps With Install (#2567)
  • Bump The Minimum Prometheus Version To 2.33 (#2569)
  • Add Restart Information In Power Document (#2572)
  • Add Rest Strategy Under Left Nav (#2578)
  • Add Rest Permissions (#2604)
  • Add Fsa Template Description (#2606)
  • Update Grafana Datasource Docs (#2614)
  • Add Vserver For Rest Role Creation (#2620)
  • Fix Broken Link And Remove Todo (#2624)
  • Harvest Should Describe High-Level Concepts (#2625)
  • Add doctor print commands for each platform (#2670)
  • Release 24.02 metric docs (#2694)
  • Debian upgrade documentation (#2699)

Styling

  • Resolve Spell Check Warnings (#2461)
  • Address All Lint Errors (#2643)
  • Address Lint Warnings In Integration (#2659)

Refactoring

  • Move Begin Logging To The End Of The Line (#2513)
  • Update Aggr Dashboard To Sync With Sm (#2568)
  • Remove Dead Code (#2570)
  • Address Data Flow Analysis Warnings (#2589)
  • Revert Ontap Mediator Alert Names (#2618)

Miscellaneous

  • Update All Dependencies (#2481)
  • Merge 23.11.0 To Main (#2488)
  • Update All Dependencies (#2522)
  • Update All Dependencies (#2543)
  • Update All Dependencies (#2558)
  • Update All Dependencies (#2564)
  • Update All Dependencies (#2579)
  • Update All Dependencies (#2585)
  • Update Golang.org/X/Exp Digest To 1B97071 (#2592)
  • Update All Dependencies (#2629)
  • Update Golangci/Golangci-Lint-Action Action To V4 (#2653)
  • Update all dependencies (#2687)

🔨 CI

  • Keep Mkdocs Version Fixed For Build Servers (#2511)
  • Bump Go (#2536)
  • Update Range In Query To 3H Before Validation (#2571)
  • Bump Go (#2582)
  • Template Validation For Rest, Restperf (#2586)
  • Bump Dependencies (#2600)
  • Detect Poller Logs Errors (#2603)
  • Detect Poller Logs Errors (#2609)
  • Fix Nightly Build (#2630)
  • Bump Go And Dependencies (#2649)
  • Disable Dockerfile Updates By Renovate (#2655)
  • Ignore Metrocluster Error In Counter Test (#2664)
  • Bump go (#2671)
  • Update makefile go version (#2678)