By default, any problem with any of the transactions that you monitor will turn the dashboard from green to yellow or red. You can change that by adjusting the strategy at the domain, system, or dashboard level, depending on what you want to achieve.
The dashboard strategy varies from one OneView customer to another. Some customers want to see every glitch so they can react and prevent similar problems down the line; others prefer to smooth over the glitches because they do not have the time to fix them all right now.
Best Practices
A Green Dashboard
The primary goal for OneView is to present a green dashboard most of the time. It should only change color when a persistently bad user experience is detected.
The OneView dashboard should not highlight technical issues such as nearly full disks, imminent certificate expiration, high CPU load, etc. unless you know they have an end-user impact.
A technical monitoring dashboard will nearly always have red flags or events that need your attention.
The primary OneView dashboard should, if possible, only show issues that have a documented impact on end users.
Therefore, focus the primary dashboard on end users and put technical monitoring on secondary dashboards.
The Load Balancer Case (BEST or AVERAGE strategy)
Imagine a system of one frontend load balancer/reverse proxy server and N backend servers, each representing a subsystem.
If one of the N backend servers breaks, you may signal this by letting the status of that subsystem turn red, but the status of the overall system should still be green, because the load balancer will direct users to healthy backend servers.
In this case, the overall system should use the dashboard calculation strategy called BEST (green as long as at least one backend server is green) or AVERAGE (green, yellow, or red depending on the number of failing backend servers).
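As a rough illustration of the difference, the sketch below models a BEST and an AVERAGE roll-up over the backend statuses. The status codes, function names, and AVERAGE thresholds are illustrative assumptions, not OneView's actual calculation.

```python
# Illustrative roll-up logic; status codes and thresholds are assumptions,
# not OneView's actual calculation.
GREEN, YELLOW, RED = 0, 1, 2

def best(statuses):
    # BEST: the overall status is the healthiest sub-status, so a single
    # healthy backend keeps the system green.
    return min(statuses)

def average(statuses, yellow_at=0.25, red_at=0.5):
    # AVERAGE: the color depends on the fraction of failing backends.
    statuses = list(statuses)
    failing = sum(1 for s in statuses if s == RED) / len(statuses)
    if failing >= red_at:
        return RED
    if failing >= yellow_at:
        return YELLOW
    return GREEN

backends = [GREEN, GREEN, RED, GREEN]   # one of four backends is down
print(best(backends))                   # 0 (GREEN): users are still served
print(average(backends))                # 1 (YELLOW): 25% of backends failing
```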
The Application Stack Case (WORST strategy)
Imagine a system of one frontend server, one application server, one database server and a SAN.
If one of these components fails, the entire system fails. In this case, the overall system should use the dashboard calculation strategy called WORST.
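A minimal sketch of the WORST roll-up for such a serial stack, again with illustrative status codes rather than OneView's actual calculation:

```python
# Illustrative WORST roll-up; status codes are assumptions, not OneView's
# actual calculation.
GREEN, YELLOW, RED = 0, 1, 2

def worst(statuses):
    # WORST: any failing component drags the whole system down, which
    # matches a stack where every tier is required.
    return max(statuses)

stack = {"frontend": GREEN, "app server": GREEN, "database": RED, "SAN": GREEN}
print(worst(stack.values()))   # 2 (RED): the database failure breaks the stack
```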
The Business Transactions Case
Imagine a system with a set of different business transactions that you monitor individually for all users.
This could be data from an IIS access log or any other application log that records user transactions together with their response times and outcomes.
Identify transactions that are essential to the application and put them into a domain called “Primary transactions”. The domain would use the WORST dashboard calculation strategy.
Identify a representative set of user transactions (10 to 50) that are not essential and put them in one or more other domains called “Secondary transactions”. Apply the AVERAGE dashboard calculation strategy to these domains.
Put all the domains into a system called “Business Transactions” and apply the WORST dashboard calculation strategy to the system.
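The sketch below shows how this two-level roll-up could behave: WORST inside the primary domain, AVERAGE inside the secondary domain, and WORST across the domains at the system level. Domain names, status codes, and thresholds are illustrative assumptions, not OneView configuration syntax.

```python
# Illustrative two-level roll-up; domain names, status codes, and thresholds
# are assumptions, not OneView configuration syntax.
GREEN, YELLOW, RED = 0, 1, 2

def worst(statuses):
    return max(statuses)

def average(statuses, yellow_at=0.25, red_at=0.5):
    statuses = list(statuses)
    failing = sum(1 for s in statuses if s == RED) / len(statuses)
    return RED if failing >= red_at else YELLOW if failing >= yellow_at else GREEN

primary = [GREEN, GREEN, GREEN]                      # e.g. login, search, checkout
secondary = [GREEN, RED, GREEN, GREEN, GREEN,
             GREEN, GREEN, GREEN]                    # one non-essential glitch

domains = {
    "Primary transactions": worst(primary),          # any essential failure -> RED
    "Secondary transactions": average(secondary),    # a single glitch stays GREEN
}
system = worst(domains.values())
print(domains, system)                               # both domains GREEN -> system GREEN
```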
The Active Monitoring/Robot Case
Imagine a system monitored by a script or robot that issues a synthetic transaction against the system and reports response time and success or failure at regular intervals.
Create a domain for the robot transaction and apply the WORST dashboard strategy for the domain.
Create a system for the domain and apply the WORST strategy.
For the transactions in the domain, you must apply a dashboard strategy that takes measurement errors into account.
Because the robot only tests once per interval, you do not want to flag the system as down until the next test shows that the system has most likely failed.
You should apply the TOLERANT or NORMAL dashboard strategy to the transactions in the domain.
If you need to delay the overall system's state change after a robot error for a longer period, you can apply the BEST dashboard strategy to the transactions in the domain and set the number of samples N to a high value.
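To illustrate why a BEST roll-up over the last N samples delays the state change, the sketch below keeps the healthiest of the most recent N robot results, so the system only turns red after N consecutive failed probes. The window size and status codes are illustrative assumptions, not OneView configuration.

```python
# Illustrative BEST-over-last-N roll-up for a robot transaction; window size
# and status codes are assumptions, not OneView configuration.
GREEN, RED = 0, 2

def best_of_last_n(samples, n):
    # Keep the healthiest of the most recent n robot results, so a single
    # failed probe (possibly a measurement error) does not change the state.
    return min(samples[-n:])

history = [GREEN, GREEN, RED]            # the latest probe failed
print(best_of_last_n(history, 3))        # 0 (GREEN): wait for more evidence

history += [RED, RED]                    # two more consecutive failures
print(best_of_last_n(history, 3))        # 2 (RED): the last three probes all failed
```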