Host Failure Modeling

This report allows you to simulate host failure and forecast CPU and memory usage for vSphere clusters.

Description

The Host Failure Modeling report evaluates the total capacity of your infrastructure and forecasts how many days remain before CPU and memory usage in a cluster with failed hosts reaches the specified threshold values.

To calculate future cluster performance, the report analyzes historical performance data for the specified period in the past, calculates the performance utilization trend and applies this trend to the forecast horizon. For better forecast accuracy, the report also provides trend deviations that represent the best- and worst-case scenarios. This allows you to estimate the range of possible outcomes and plan your cluster capacity accordingly.
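The trend calculation described above can be sketched as follows. This is a minimal illustration that assumes a straight-line (least-squares) trend; the report's actual fitting method is not documented here, and the function names and sample values are hypothetical.

```python
from typing import Sequence, Tuple


def fit_linear_trend(days: Sequence[float], usage: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the least-squares line usage = slope * day + intercept."""
    n = len(days)
    if n < 2 or n != len(usage):
        raise ValueError("need at least two samples and equal-length inputs")
    mean_x = sum(days) / n
    mean_y = sum(usage) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("all samples share the same timestamp")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, usage))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(slope: float, intercept: float, day: float) -> float:
    """Extrapolate the fitted trend to the given day."""
    return slope * day + intercept


# Hypothetical history: cluster CPU usage (%) sampled once a day for five days.
history_days = [0, 1, 2, 3, 4]
history_cpu = [50.0, 52.0, 54.0, 56.0, 58.0]

slope, intercept = fit_linear_trend(history_days, history_cpu)
# Apply the trend to a forecast horizon 30 days after the last sample (day 4).
cpu_in_30_days = forecast(slope, intercept, 4 + 30)
```

With this sample data the usage grows by 2 percentage points per day, so the extrapolated value exceeds 100% well before the horizon, which is the kind of shortfall the report is meant to flag.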

The report also provides recommendations on appropriate resource allocation to prevent possible CPU and memory resource shortfalls in the future.

Parameters

Performance Data From: defines a date in the past starting from which historical performance data will be used to draw the performance trend. The report analyzes historical performance data starting from this date to the current date (data collection period).

Note

To make a forecast, the report must use historical performance data for at least 72 hours.

Forecast Horizon: defines the forecast period (the period in the future for which the host failure is simulated). The calculated performance utilization trend is applied to the time interval from the current date to the forecast horizon date.

Note

The date in the Forecast Horizon field must be a date in the future.
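The two notes above amount to simple checks on the report periods. The sketch below shows one way to express them; the function name and the sample dates are hypothetical and not part of the product.

```python
from datetime import datetime, timedelta

MIN_HISTORY = timedelta(hours=72)  # minimum amount of history stated in the note


def validate_periods(data_from: datetime, horizon: datetime, now: datetime) -> None:
    """Raise ValueError if the data collection period or the forecast horizon is invalid."""
    if now - data_from < MIN_HISTORY:
        raise ValueError("at least 72 hours of historical data are required")
    if horizon <= now:
        raise ValueError("the forecast horizon must be a date in the future")


# Hypothetical example: history starts on May 1, horizon is 30 days ahead of "now".
now = datetime(2024, 6, 15, 12, 0)
validate_periods(datetime(2024, 5, 1), now + timedelta(days=30), now)
```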

Show details: defines whether the report will display expanded or collapsed charts.

Mode: defines whether the report will display all results or problematic areas only (problematic areas are vSphere clusters for which resource utilization thresholds will be breached within the forecast period).

Scope: defines a list of Groups or Objects that will be analyzed in the report (by default, the VMware Clusters group is selected).

Threshold: CPU Usage (%): defines the CPU usage threshold as a percentage of total cluster CPU resources.

Threshold: Used Memory (%): defines the used memory threshold as a percentage of total cluster memory resources.

Hosts to Fail: defines the number of hosts for which you want to simulate failure.
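To illustrate the effect of this parameter, the following sketch removes hosts from a cluster and recomputes usage against the remaining capacity. The source does not say which hosts the report removes; the sketch assumes the largest hosts fail first, which is the most pessimistic choice. All names and figures are hypothetical.

```python
from typing import List


def remaining_capacity(host_capacities: List[float], hosts_to_fail: int) -> float:
    """Capacity left after the given number of hosts fail.

    Assumption: the largest hosts fail first (the most pessimistic choice).
    """
    if not 0 <= hosts_to_fail < len(host_capacities):
        raise ValueError("hosts_to_fail must be between 0 and the host count minus one")
    survivors = sorted(host_capacities)[: len(host_capacities) - hosts_to_fail]
    return sum(survivors)


def usage_percent(used: float, capacity: float) -> float:
    """Usage as a percentage of the given capacity."""
    return 100.0 * used / capacity


# Hypothetical cluster: three hosts with 64, 64 and 128 GB of memory, 120 GB in use.
memory_gb = [64.0, 64.0, 128.0]
before = usage_percent(120.0, remaining_capacity(memory_gb, 0))  # 120 of 256 GB
after = usage_percent(120.0, remaining_capacity(memory_gb, 1))   # 120 of 128 GB
```

In this example the same workload rises from about 47% to about 94% of cluster memory once the largest host is removed, so an 80% threshold would already be breached.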

Performance Modeling Based On: defines whether the report will use maximum (peak) or average CPU and memory historical values to assess cluster performance.
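The difference between the two modes can be shown with a small sketch. It assumes that raw samples are reduced to one value per day before the trend is built; the aggregation interval the report actually uses is not stated in the source, and the names and values below are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def aggregate_daily(samples: Dict[str, List[float]], mode: str) -> Dict[str, float]:
    """Reduce each day's samples to a single value using the average or the peak."""
    if mode == "average":
        reducer = mean
    elif mode == "peak":
        reducer = max
    else:
        raise ValueError("mode must be 'average' or 'peak'")
    return {day: reducer(values) for day, values in samples.items()}


# Hypothetical CPU usage samples (%) for two days.
cpu_samples = {
    "2024-06-01": [40.0, 60.0, 80.0],
    "2024-06-02": [50.0, 50.0, 95.0],
}
by_average = aggregate_daily(cpu_samples, "average")
by_peak = aggregate_daily(cpu_samples, "peak")
```

Peak-based modeling yields higher values and therefore a more conservative forecast, while average-based modeling smooths out short spikes.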

Sample Usage

Model the situation where one host in each cluster in your virtual environment fails. Forecast the number of days left before the level of CPU and memory usage on remaining hosts reaches 80%. Use the historical performance data for the previous month for report analysis.

Instructions

Open the Host Failure Modeling report.
From the Performance Data From list, select Previous Month > First Day.
From the Forecast Horizon list, select a date in the future (30 days from today).
In the Scope section, select VMware Clusters and click Remove. Then search for available clusters to include in the report.

Click Add Object.
In the Add Object window, click Options.
In the Options window, click Add.
In the Class Name search box, enter cluster and click Search.
Select the VMware Clusters class in the list of search results, click Add, and then click OK.
In the Options window, click OK to apply the filter.
In the Add Object window, click Search. The search returns a list of objects that belong to the VMware Clusters class.
Select the necessary clusters in the list, click Add, and then click OK.

In the Threshold: CPU Usage (%) and Threshold: Used Memory (%) fields, enter 80.
In the Hosts to Fail field, enter 1.
From the Show details list, select Expand Charts.
From the Mode list, select Show all results.
From the Performance Modeling Based On list, select Average Usage.
Click Run to view the report.

Report Output

The Virtual Infrastructure Described table will provide an overview for all clusters included in the report scope: number of clusters, number of hosts, number of VMs (total, active, shutdown and suspended), and total CPU and memory usage values.

The Performance Forecast Analysis and Recommendations table will provide the following details for every cluster:

Column	Description
Name	Name of the cluster.
Analysis Result*	Result of the prediction analysis. Possible analysis results: Passed (the specified thresholds will not be breached within the forecast period); Failed (the specified thresholds will be breached within the forecast period or have already been breached within the data collection period).
Constraining Resource*	Resource for which the specified threshold will be breached first within the forecast period. For example, if both the CPU Usage (%) and Used Memory (%) thresholds will be breached within the forecast period and the CPU Usage (%) threshold is forecast to be breached first, the report displays CPU Usage as the constraining resource. Recommendations are still provided for both the CPU Usage and Used Memory resources.
Days Left	Number of days left before the constraining resource threshold is breached. The number of days is shown as the worst-to-best case range.
Recommendations*	Recommendations on appropriate resource allocation. Recommendations for a resource are provided if the specified threshold for the resource will be breached within the forecast period or has already been breached within the data collection period.

*Provided results and recommendations are based on the worst-case scenario.
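The logic of the Analysis Result, Constraining Resource and Recommendations columns can be sketched as follows. This is an illustration derived from the column descriptions above, not the product's code. It assumes a hypothetical convention where each resource maps to its worst-case number of days left, with None meaning the threshold is not breached within the forecast period and 0 meaning it has already been breached.

```python
from typing import Dict, List, Optional, Tuple


def summarize_cluster(
    worst_case_days_left: Dict[str, Optional[float]],
) -> Tuple[str, Optional[str], List[str]]:
    """Return (analysis result, constraining resource, resources needing recommendations)."""
    breached = {
        name: days for name, days in worst_case_days_left.items() if days is not None
    }
    if not breached:
        return "Passed", None, []
    # The constraining resource is the one whose threshold is breached first.
    constraining = min(breached, key=breached.get)
    # Recommendations cover every breached resource, not only the constraining one.
    return "Failed", constraining, sorted(breached)


# Hypothetical cluster: CPU breaches in 12 days, memory in 20 days (worst case).
result = summarize_cluster({"CPU Usage": 12.0, "Used Memory": 20.0})
healthy = summarize_cluster({"CPU Usage": None, "Used Memory": None})
```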

The Overall Recommendations table will provide recommendations for all clusters included in the report scope.

Details

Report details will provide in-depth forecast information for every cluster included in the report scope.

The Virtual Infrastructure Described table will provide an overview for the cluster: number of hosts, number of VMs (total, active, shutdown and suspended) in the cluster and total CPU and memory usage values.

The doughnut charts will represent the number of removed and remaining hosts and the amount of removed and remaining resources in the simulated scenario.

The Performance Forecast section will show a chart and a details table for each analyzed cluster resource.

The performance forecast charts will display the following types of data series for the analyzed data collection period:

Trend and its deviation represent the resource utilization trend and the 95% confidence interval for it. The upper border of the confidence interval represents the worst-case scenario. The lower border of the confidence interval represents the best-case scenario. The width of the interval between the best- and worst-case series depends on the variability of the collected historical performance data (the greater the variability, the wider the interval).
Value represents the historical performance data for the data collection period (for example, how the cluster CPU usage has changed within the analyzed period).
Threshold represents the threshold specified for the CPU and memory usage.
Capacity represents the total CPU and memory capacity.
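The relationship between data variability and the width of the interval can be illustrated with a simplified sketch. It assumes a linear trend has already been fitted and approximates the 95% interval as a constant-width band of 1.96 residual standard deviations around the trend. This is an assumption for illustration only: the report's actual interval calculation is not documented here, and a full regression prediction interval would widen with distance from the data. All names and values are hypothetical.

```python
import math
from typing import Sequence, Tuple

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def band_half_width(
    days: Sequence[float], usage: Sequence[float], slope: float, intercept: float
) -> float:
    """Half-width of an approximate 95% band around a fitted linear trend."""
    n = len(days)
    if n < 3:
        raise ValueError("need at least three samples to estimate the spread")
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(days, usage))
    residual_std = math.sqrt(sse / (n - 2))  # two parameters were estimated
    return Z_95 * residual_std


def scenarios(
    day: float, slope: float, intercept: float, half_width: float
) -> Tuple[float, float, float]:
    """Return (best case, trend, worst case) for the given day."""
    trend = slope * day + intercept
    return trend - half_width, trend, trend + half_width


# Hypothetical CPU usage history (%) and its least-squares line.
days = [0, 1, 2, 3]
cpu = [49.0, 53.0, 53.0, 57.0]
slope, intercept = 2.4, 49.4  # least-squares fit of the samples above

half_width = band_half_width(days, cpu, slope, intercept)
best, trend, worst = scenarios(10, slope, intercept, half_width)
```

Noisier history produces larger residuals and therefore a wider gap between the best- and worst-case series.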

The performance forecast tables will provide the following details for the cluster resource:

Column	Description
Aspect	Name of the analyzed resource
Threshold	Specified threshold value
Prediction	Prediction on whether the resource threshold will be breached within the forecast period. Possible prediction values: Threshold will be achieved (the threshold will be breached within the forecast period in both the worst- and best-case scenarios); Threshold may be achieved (only the worst-case data series will breach the threshold within the forecast period); Threshold will not be achieved (neither the best-case nor the worst-case data series will breach the threshold within the forecast period); Threshold is achieved (the trend line has already breached the threshold).
Days Left	Number of days left before the specified threshold is breached. The number of days is shown both for the best-case scenario and worst-case scenario.
Available Resources	Amount of non-consumed resources on the forecast horizon date (that is, at the end of the forecast period). The amount of available resources is calculated as the difference between the threshold value and the metric forecast value. Positive values mean that spare resources will be available. Negative values mean that additional resources are required to stay within the threshold. The values are shown both for the best-case scenario and worst-case scenario.
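The Prediction, Days Left and Available Resources columns can be tied together in a short sketch. It assumes that the best- and worst-case series are straight lines with the same slope as the trend, offset by a fixed amount; this is a simplification for illustration, and the function names and figures are hypothetical.

```python
from typing import Optional


def days_until(
    threshold: float, value_today: float, slope: float, horizon_days: float
) -> Optional[float]:
    """Days until a linear series reaches the threshold, or None if not within the horizon."""
    if value_today >= threshold:
        return 0.0
    if slope <= 0:
        return None
    days = (threshold - value_today) / slope
    return days if days <= horizon_days else None


def classify(
    trend_today: float,
    best_days: Optional[float],
    worst_days: Optional[float],
    threshold: float,
) -> str:
    """Map the scenario outcomes to the prediction values listed in the table."""
    if trend_today >= threshold:
        return "Threshold is achieved"
    if best_days is not None and worst_days is not None:
        return "Threshold will be achieved"
    if worst_days is not None:
        return "Threshold may be achieved"
    return "Threshold will not be achieved"


def available_resources(threshold: float, forecast_at_horizon: float) -> float:
    """Difference between the threshold and the forecast value on the horizon date."""
    return threshold - forecast_at_horizon


# Hypothetical figures: 80% threshold, trend at 60% today, growing 0.5 points per day,
# best and worst cases offset by 6 points, 30-day forecast horizon.
threshold, trend_today, slope, offset, horizon = 80.0, 60.0, 0.5, 6.0, 30.0

worst_days = days_until(threshold, trend_today + offset, slope, horizon)
best_days = days_until(threshold, trend_today - offset, slope, horizon)
prediction = classify(trend_today, best_days, worst_days, threshold)

worst_available = available_resources(threshold, trend_today + offset + slope * horizon)
best_available = available_resources(threshold, trend_today - offset + slope * horizon)
```

Here only the worst-case series crosses the threshold within 30 days, so the prediction is Threshold may be achieved; the worst case ends 1 point above the threshold (negative available resources), while the best case leaves 11 points spare.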