This step-by-step guide explains how to set up and monitor Azure Windows VM Scale Sets using CloudMonix. Since Azure Service Fabric and Azure Batch Services run on top of Windows VM Scale Sets, they can also be monitored through the Windows VM Scale Set monitoring functionality.
In this article
1. Monitoring setup
2. Collect, understand and use your data
2.1 Metrics
2.2 Alerts
2.3 Actions
3. Setup verification and troubleshooting
Did you know?
CloudMonix extends native Azure Windows VM Scale Set monitoring with advanced metrics and features. Noteworthy:
- CloudMonix receives data from Azure Diagnostics Extension, Azure Diagnostics Agent or CloudMonix agent
- ability to auto-scale and to perform a staggered reboot of every instance on a daily basis
- pre-configured metrics (basic Azure VM Scale Set): application event logs, CPU time, CPU time 30 min average, disk free space (each disk), disk free space total, disk idle time, disk read / write speed, instances, virtual memory in use, memory free, recommended actions, resource status, system event logs
- alerts (basic Azure VM Scale Set): high CPU, low disk space, low memory and resource outages
- conditional or schedule-based ability to reboot / reimage / start / deallocate a VM Scale Set instance
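To illustrate how a derived metric such as "CPU time 30 min average" relates to raw "CPU time" samples, here is a minimal Python sketch. It is not CloudMonix's implementation: the function name trailing_average, the (timestamp, value) sample format and the way the window boundary is handled are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def trailing_average(samples, now, window=timedelta(minutes=30)):
    """Average the values of samples taken within `window` before `now`.

    `samples` is a list of (timestamp, value) tuples (hypothetical format).
    Returns None when no sample falls inside the window.
    """
    cutoff = now - window
    recent = [value for ts, value in samples if cutoff < ts <= now]
    if not recent:
        return None
    return sum(recent) / len(recent)


# Example: CPU time (%) sampled every 10 minutes (made-up values).
now = datetime(2024, 1, 1, 12, 0)
samples = [
    (now - timedelta(minutes=40), 90.0),  # older than 30 minutes, ignored
    (now - timedelta(minutes=20), 40.0),
    (now - timedelta(minutes=10), 50.0),
    (now, 60.0),
]
print(trailing_average(samples, now))  # 50.0
```

An averaged metric like this smooths out short spikes, which is why it is often a better basis for alerts and scaling decisions than a single raw reading.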
Setup
a. Run the Setup Wizard in the portal (preferred way):
This article explains how to add resources to CloudMonix via the Setup Wizard.
b. Download and install CloudMonix Agent (optional):
By default, CloudMonix uses the Azure Diagnostics Extension to monitor Azure Windows VM Scale Sets. The CloudMonix agent, however, provides additional metrics and automation features. If the CloudMonix agent is used for monitoring, it is important to understand the differences.
Noteworthy:
- the setup script for the agent can be run remotely
- the agent is pre-configured for the specific account it was downloaded from
- the agent needs to be deployed on every monitored VM
Installation and configuration instructions for the CloudMonix agent can be found here. CloudMonix utilizes a separate agent for each monitored VM.
c. Firewall configuration (optional):
If the CloudMonix agent is used for monitoring, it is necessary to whitelist CloudMonix IP addresses as described in this article.
d. Tweak settings in the Definition tab (optional):
The Definition tab for an existing resource can be accessed by clicking the resource's monitoring settings in the performance dashboard:
The Definition tab provides optional settings for the resource name, Azure API, Azure resource management token, Azure resource group, Azure resource name, CloudMonix agent, diagnostic storage account, deployment ID, the Do NOT modify diagnostics configuration and Do NOT auto-update my compute nodes options, scale-down and scale-up cooling periods, configuration template and categories:
The Do NOT modify diagnostics configuration setting prevents CloudMonix from modifying the Diagnostics configuration. In this case, however, users are fully responsible for configuration management and updates. Learn more here.
The Do NOT auto-update my compute nodes setting prevents CloudMonix from automatically propagating configuration changes to all nodes in the Scale Set. Azure does not automatically deploy configuration changes to all nodes in the Scale Set, so CloudMonix normally propagates the changes itself to ensure that all nodes use the same configuration. Learn more here.
Best Practices
The Configuration Template setting lists the pre-defined configuration templates available in CloudMonix by default as well as previously stored custom templates. The following default templates are available for Azure Windows VM Scale Sets:
e. Manual setup (optional instead of the Setup Wizard route):
Click the Add New button in the top right corner of your dashboard:
Fill in required information in the Definition tab as described in the previous step.
f. Advanced configuration:
The Advanced configuration tab provides additional monitoring settings, whose default values suit most use cases.
g. Scale Ranges and Scale Adjustments:
CloudMonix's built-in auto-scaling and scale adjustment features provide powerful reactive, proactive and scheduled auto-scaling rules. These settings can be accessed via the Scale Ranges and Scale Adjustments tabs:
Azure native auto-scaling should be disabled in the Azure Portal. It is also recommended to disable the Azure Over-Provision feature, since the extra VMs it deploys may conflict with the current instance count in CloudMonix. Learn more about Over-Provision here.
Read the full article on how to use auto-scaling and scale adjustments features in CloudMonix.
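The interplay between a reactive scaling rule and the scale-up and scale-down cooling periods listed in the Definition tab can be sketched in Python as follows. This is only an illustration, not CloudMonix's actual logic. It assumes that a cooling period is the minimum time that must pass since the last scaling operation in the same direction; the names ScaleState and decide_instance_count, the CPU thresholds, the instance limits and the one-instance step are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ScaleState:
    instances: int
    last_scale_up: Optional[datetime] = None
    last_scale_down: Optional[datetime] = None


def decide_instance_count(state, avg_cpu, now,
                          min_instances=2, max_instances=10,
                          high_cpu=80.0, low_cpu=20.0,
                          up_cooldown=timedelta(minutes=10),
                          down_cooldown=timedelta(minutes=30)):
    """Return the desired instance count for one evaluation cycle."""

    def cooled(last, period):
        # No earlier operation, or enough time has passed since it.
        return last is None or now - last >= period

    if avg_cpu > high_cpu and state.instances < max_instances:
        if cooled(state.last_scale_up, up_cooldown):
            return state.instances + 1
    elif avg_cpu < low_cpu and state.instances > min_instances:
        if cooled(state.last_scale_down, down_cooldown):
            return state.instances - 1
    return state.instances


now = datetime(2024, 1, 1, 12, 0)
state = ScaleState(instances=3, last_scale_up=now - timedelta(minutes=5))
print(decide_instance_count(state, 95.0, now))                         # 3
print(decide_instance_count(state, 95.0, now + timedelta(minutes=5)))  # 4
```

In the first call the CPU is high but the last scale-up happened only five minutes earlier, so the count stays at 3; five minutes later the cooling period has elapsed and the rule adds one instance. Separate periods for each direction make it possible to scale up quickly while scaling down more cautiously.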
Collect, understand and use your data
Specific Metrics, Templates, Alerts and Automation Actions for Azure Windows VM Scale Sets:
a. Metrics:
Diagnostic data points retrieved from the monitored resource are referred to as metrics. CloudMonix provides default templates for the metrics recommended for common configurations. Metrics can be further added, removed or customized in the Metrics tab of the Azure Windows VM Scale Set resource configuration dialog:
b. Alerts:
CloudMonix features a sophisticated alert engine that can publish alerts for very specific conditions, either pre-defined by a template configuration or custom-built from any of the available metrics. Alerts can be further added, removed or customized in the Alerts tab of the Azure Windows VM Scale Set resource configuration dialog:
c. Actions:
Actions are automation features that can be configured to fire based on specific conditions or on a schedule. Actions can be added and configured in the Actions tab of the Azure Windows VM Scale Set resource configuration dialog:
- default monitoring templates for Cloud Role include the ability to reboot a Cloud Role resource instance daily and to reboot a low-RAM Cloud Role. These actions are disabled by default and need to be explicitly enabled
- available actions include the conditional or schedule-based ability to reboot / reimage / start / deallocate a VM Scale Set instance
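The idea behind a staggered reboot, where instances are restarted one after another rather than all at once, can be sketched in Python as a simple schedule computation. This sketch does not call any Azure or CloudMonix API and is not how CloudMonix schedules its actions; the function name staggered_schedule, the instance names and the 15-minute gap are hypothetical.

```python
from datetime import datetime, timedelta


def staggered_schedule(instance_ids, start, gap=timedelta(minutes=15)):
    """Assign each instance its own reboot time, `gap` apart, beginning at `start`."""
    return {iid: start + i * gap for i, iid in enumerate(instance_ids)}


start = datetime(2024, 1, 1, 3, 0)
schedule = staggered_schedule(["vmss_0", "vmss_1", "vmss_2"], start)
for iid, when in schedule.items():
    print(iid, when.strftime("%H:%M"))
# vmss_0 03:00
# vmss_1 03:15
# vmss_2 03:30
```

Spacing the reboots keeps most of the Scale Set available at any given moment, provided the gap is longer than the time one instance needs to come back online.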
Setup verification and troubleshooting
a. Setup verification:
Successful resource setup can be verified by clicking the Test button in the resource configuration dialog and visiting the Test Results tab:
b. Troubleshooting monitoring issues:
CloudMonix provides deep insights into resource monitoring issues via the Status Dashboard screen. The screen gives an overview of resources that have raised alerts and lets you troubleshoot them by diving into the monitoring logs.
Read the full article on how to use Status Dashboard to diagnose resource monitoring issues.