Certain Windows services are vital to the health and reliability of a system. Monitoring their uptime and automatically restarting them on failure is a priority for many IT organizations.
This short article explains the basic steps to:
- Track whether a particular Windows service is in the Running state
- Alert when a particular Windows service is not in the Running state
- Auto-restart a particular Windows service when it is not in the Running state
This article assumes that the CloudMonix agent is installed on the monitored Windows server.
Tracking the running state of a Windows Service
- Verify that monitoring is working on the Windows machine: is any data arriving on the dashboards?
- Add a new metric that tracks the running state of a particular Windows service: edit the settings for the monitored server and add a new metric of type WindowsServiceState
- The dropdown of Windows service names should list all Windows services on the monitored server once the CloudMonix agent has been running on it successfully for a few minutes
- Provide a unique name for the metric and optionally tag it to be highlighted on the dashboard
Once the metric has been added and tagged for highlighting, the dashboard view of the monitored resource should start showing the state of that service within a few minutes.
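To make "service state" concrete, the following is a minimal sketch of how a state such as RUNNING can be read for a single service on Windows. It is not how the CloudMonix agent is implemented. It assumes the built-in `sc query` command and its usual English-language output layout; the helper names `parse_service_state` and `query_service_state`, and the sample text, are illustrative only.

```python
import re
import subprocess

# Matches the STATE line of `sc query` output, e.g. "STATE : 4  RUNNING".
# Assumption: English-language output where the label is literally "STATE".
_STATE_LINE = re.compile(r"^\s*STATE\s*:\s*(\d+)\s+(\w+)", re.MULTILINE)


def parse_service_state(sc_output: str) -> str:
    """Return the state name (e.g. 'RUNNING') found in `sc query` output."""
    match = _STATE_LINE.search(sc_output)
    if match is None:
        # Typically means the service does not exist or the query failed.
        raise ValueError("no STATE line found in sc output")
    return match.group(2)


def query_service_state(service_name: str) -> str:
    """Run `sc query <service_name>` (Windows only) and return its state."""
    result = subprocess.run(
        ["sc", "query", service_name],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_service_state(result.stdout)


# Illustrative sample of what `sc query Spooler` usually prints.
SAMPLE_OUTPUT = """
SERVICE_NAME: Spooler
        TYPE               : 110  WIN32_OWN_PROCESS  (interactive)
        STATE              : 4  RUNNING
                                (STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWNAPP)
        WIN32_EXIT_CODE    : 0  (0x0)
"""

if __name__ == "__main__":
    print(parse_service_state(SAMPLE_OUTPUT))
```

The parser is kept separate from the command invocation so it can be exercised with sample text on any machine, while `query_service_state` only works on a Windows host.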
Alerting when a particular Windows Service is not running
To receive an alert when a monitored Windows service is not running:
- Add a new alert to the monitored resource and give it a short descriptive name. The name will show up in notifications and on the dashboard
- Specify the severity of the alert so that it is properly categorized and filtered for notifications and dashboards
- Enter an expression that compares the value of the previously defined service state metric. When the expression is TRUE the alert will fire
- Learn more about the Windows Service State metric and other Windows-server-based metrics in the CloudMonix documentation
- Hint: pressing space in the Expression field will trigger an auto-complete dropdown of all metric names that can be used in the expression
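The alert logic described above can be sketched in a few lines. This is an illustration of the idea only and is not CloudMonix expression syntax: the `Alert` class, the `fired_alerts` helper, the metric name `SpoolerState`, and the severity label are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Alert:
    """A named alert with a severity and a boolean condition over metrics."""
    name: str
    severity: str
    condition: Callable[[Dict[str, str]], bool]


def fired_alerts(alerts: List[Alert], metrics: Dict[str, str]) -> List[str]:
    """Return the names of alerts whose condition is true for these metrics."""
    return [alert.name for alert in alerts if alert.condition(metrics)]


# Hypothetical alert: fires whenever the tracked service is not Running.
spooler_down = Alert(
    name="Spooler not running",
    severity="Critical",
    condition=lambda m: m["SpoolerState"] != "Running",
)

if __name__ == "__main__":
    print(fired_alerts([spooler_down], {"SpoolerState": "Stopped"}))
```

Comparing against "not Running" rather than "equals Stopped" means that intermediate states, such as a service that is paused or stuck while stopping, also raise the alert.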
Restarting a failed Windows Service automatically
Creating an action to restart a stopped Windows service involves steps similar to those for raising an alert.
Typically, when an action fires, two notifications are sent: one when CloudMonix requests execution of the action and another when the action is executed.
- Add a new action to the monitored resource and give it a short descriptive name. The name will show up in notifications and on the dashboard
- Specify the severity of the action so that it is properly categorized and filtered for notifications and dashboards
- Enter an expression that compares the value of the previously defined service state metric. When the expression is TRUE the action will fire
- Specify the target resource for action execution (typically Self)
- Specify the type of the action (typically PowershellRestartService or CustomPowershellScript; for the latter, you will need to define the PowerShell script that will execute on the monitored machine)
- IMPORTANT: Specify a meaningful Suspended period for the action. This prevents the action from being re-executed within that time period and gives the service time to "catch up" after the action has executed
- Learn more about the Windows Service State metric and other Windows-server-based metrics and actions in the CloudMonix documentation
- Hint: pressing space in the Expression field will trigger an auto-complete dropdown of all metric names that can be used in the expression
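The role of the Suspended period is easiest to see in a small simulation. The sketch below is an assumption about the general behavior described in the steps above, not CloudMonix's implementation: the `RestartAction` class and its `evaluate` method are hypothetical, and the restart itself is passed in as a plain callable so the logic can run on any machine.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional


class RestartAction:
    """Fires a restart when the service is not Running, at most once per
    suspended period."""

    def __init__(self, restart: Callable[[], None], suspended_period: timedelta):
        self._restart = restart
        self._suspended_period = suspended_period
        self._last_fired: Optional[datetime] = None

    def evaluate(self, service_state: str, now: datetime) -> bool:
        """Return True if the restart was executed on this evaluation."""
        if service_state == "Running":
            return False  # Condition is false, nothing to do.
        if (
            self._last_fired is not None
            and now - self._last_fired < self._suspended_period
        ):
            return False  # Still suspended: give the last restart time to work.
        self._restart()
        self._last_fired = now
        return True


if __name__ == "__main__":
    calls = []
    action = RestartAction(lambda: calls.append("restart"), timedelta(minutes=10))
    start = datetime(2024, 1, 1, 12, 0)
    for minute in (0, 1, 5, 10):
        fired = action.evaluate("Stopped", start + timedelta(minutes=minute))
        print(minute, fired)
```

Without the suspended check, a service that takes a few minutes to start would be restarted on every evaluation while it is still coming up. On a real Windows host the `restart` callable could invoke PowerShell's `Restart-Service` cmdlet; that wiring is left out here to keep the sketch portable.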