Skip to content

KeithRochester/Standard-Monitoring-Framework

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 

Repository files navigation

Standard Monitoring Framework SCOM MPs

Easily and quickly deploy business service focused monitoring. Configuration is stored in the registry for Windows agents and in CSV files for Unix/Linux agents. You can optionally use SCOM Agent tasks to create and update the registry keys and CSV files.

Business services and sub-components health state can be manually set using SCOM agent tasks.

Introduction to the Standard Monitoring Framework (SMF)

For a while now, I’ve been helping customers implement standardised monitoring in various guises. These management packs demonstrate the principle. They can be used as a starting point for implementing your own standard monitoring. The MPs have been put together in my personal lab which means they have had limited testing, they should work in your environment, but I can’t be 100% certain for all the supported Unix/Linux distros. Solaris taught me not to assume what works on RHEL works everywhere. ☹

History

The idea for the SMF came from the great work that people like Russ Slaten and Tyson Paul have done on creating monitoring from configuration files. They both have created MPs that monitor ‘stuff’ by reading/discovering configuration from files. I liked the work they had done, to allow the easy creation of port and URL monitoring without having to use authoring tools, so much I decided to try and apply the same principle to as much monitoring as possible.

Goals

  • Set of standard monitors that target discovered objects
  • Configuration of monitoring stored as properties of objects
  • SCOM objects discovered based on registry keys on Windows and CSVs on Unix/Linux
  • Move monitoring configuration out of SCOM making it easier for application SMEs to ‘own’ their monitoring
  • Reduce the complexity and time to deploy monitoring
  • Monitoring configuration becomes part of the application configuration and owned and deployed by application SMEs
  • Monitoring is added to a Business Service for dashboarding/reporting
  • Alerts have owner set based on object property (Can be disabled/delayed to avoid conflict with other alert routing processes)
  • Business Service health state can be manually set (Allows dashboards to be red/amber if an incident is occurring that we don't yet have monitoring for) - More info.

An example Windows Service monitoring

Without SMF

  • Application SME gets in touch with the SCOM SME (me) asking for a new Windows Service monitor to be authored and implemented for a new service they have created.
  • The request gets added to a back log and eventually scheduled.
  • Even with Kevin Holman’s fragments this takes some time and as it’s a new or updated MP, I’m bound to fat finger something and it’ll need testing and fixing.
  • Monitoring is deployed and probably forgotten about until the Windows service changes and the process starts again!

With SMF

  • The Application SME creates a set of registry keys on the servers where the service needs to be monitored. Describing:
    • Which Windows service to monitor.
    • When to monitor it.
    • What priority/severity the alerts should be.
    • The team that alerts should be routed to.
    • The business service the Windows service is part of.
  • SCOM discoveries run and discover the Windows Service, monitoring configuration, and the business service it is part of.
  • SCOM monitors the service and includes its health in the business service (Visible in the SCOM console and SquaredUp dashboards)

Zero requirement for the SCOM SME to be involved. Other than to encourage application owners to build and review monitoring for their applications.

If creating registry keys and CSVs sounds like too much work for your application SMEs then to help get started I’ve created SCOM agent tasks that create the keys and CSVs with overridden information. But I really try to push the ownership to the SMEs. The less work I have to do the better!

DANGER WILL ROBINSON

The tasks are in optional MPs (Standard.Monitoring.Framework.Windows.Tasks and Standard.Monitoring.Framework.Linux.Tasks) because if your operator roles are not scoped correctly and health services not locked down they could be used to escalate an operators access to the monitored environment. Make sure you understand the risk before importing these MPs.

Don’t give operators access to run the tasks on servers if they aren’t administrators of the servers already, otherwise they could use the tasks to give themselves admin access. Consider auditing task execution.

Common Properties

All SMF monitored objects have these common base properties.

  • Business Service – Business service the object is part of.
  • Component – Component the object is part of i.e. backend servers. Components are contained by business services and health rolls up.
  • Environment – Environment that the object is part of.
  • Team – Team that alerts will be routed to.
  • Frequency – How often to run the monitoring.
  • Schedule Start Time – Time to start monitoring i.e. 00:00 or 09:00 etc...
  • Schedule End Time – Time to stop monitoring i.e. 23:59 or 17:00 etc...
  • Schedule Days – Bitmask for the days of the week to run monitoring i.e. 127 or 7 days 62 for Monday to Friday.
    • Sunday = 1
    • Monday = 2
    • Tuesday = 4
    • Wednesday = 8
    • Thursday = 16
    • Friday = 32
    • Saturday = 64
  • Match Count – How many consecutive unhealthy samples needed before it’s unhealthy.
  • Severity – Severity of alerts generated and health state of monitor.
  • Priority – Priority of alerts generated.
  • Description – description of monitoring, will be included in alert description.
  • Documentation – Link to documentation, link will be included in alert description.

What can be monitored right now

(I am open to suggestions to be added)

Those last two are really powerful and are used like Swiss Army Knives for monitoring. Invariably application SMEs will have admin scripts or executables that we can use to monitor health. By not embedding them into SCOM MPs we can leave the maintenance of them to the application SMEs. Less work for the SCOM SME! 😊

Other Existing Monitoring i.e. Microsoft SQL, Windows/Linux OS, Windows Clusters etc…

By setting the required registry keys and CSV entries it’s possible to include the health of other objects in the health of SMF business services. At the moment I’ve got agent tasks and discoveries for:

When I get time I’ll add other common objects such as IIS servers, websites, application pools etc...

What does it look like

The screen capture below shows the diagram view for a simple business service created using the SMF.

Business Services Diagram View

If you have SquaredUp (and you really should) I've created some dashboards and perspectives.

What’s Next

Any suggestions/feedback gratefully received or if you want to help improve/fix the MPs get in touch.

Thanks Keith