Jim writes:
Hello,
I'm wondering if anyone has any thoughts on how one might make a "monitor" that represents the average of several other monitors.
I'd actually be inclined to call that a "view" and have it be composed of one or more "equipment"s, one or more "monitor"s or, possibly, a mix of them, defaulting to one of "min", "avg" or "max". For "equipment"-centred views, I suspect either "avg" or "max" would be the right default; for views composed of "monitor"s, I am less sure. I can probably think of even more interesting ways of aggregating measures into useful values, given a few more moments to think about it. [1]
Just so that's on record, somewhere. :)
The typical use-case, as I see it, is to slap sufficient inter-related things into one or more views, so that all you'd look at frequently is the status of the view as such, opening the view up only when you want to watch the components within it (be that one or more equipment objects or one or more monitors; much as the equipment aggregates monitors).
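To make the "view" idea a bit more concrete, here is a rough sketch in Python. It is purely illustrative: the names Monitor, View, status and AGGREGATORS are made up for the example, the numeric "higher is worse" alert level is an assumption, and none of it is NOCtool's actual API.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List, Sequence, Union

# Hypothetical names throughout; this is not NOCtool's API.
AGGREGATORS: Dict[str, Callable[[Sequence[float]], float]] = {
    "min": min,
    "avg": mean,
    "max": max,
}


@dataclass
class Monitor:
    name: str
    alert_level: float  # assumption: numeric, higher means worse

    def status(self) -> float:
        return self.alert_level


@dataclass
class View:
    name: str
    # Members may be monitors or other views, so views can nest.
    members: List[Union[Monitor, "View"]] = field(default_factory=list)
    aggregate: str = "max"  # assumed default, see the discussion above

    def status(self) -> float:
        if not self.members:
            raise ValueError(f"view {self.name!r} has no members")
        levels = [member.status() for member in self.members]
        return AGGREGATORS[self.aggregate](levels)


web = View("web", [Monitor("www1 load", 10.0), Monitor("www2 load", 70.0)])
site = View("site", [web, Monitor("db ping", 20.0)], aggregate="avg")
print(web.status())   # 70.0
print(site.status())  # 45.0
```

Here "web" reports its worst member, and "site" averages the "web" view with one further monitor, which is the "look at the view, open it up only when needed" pattern.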
I'm looking at http://meta.rocksclusters.org/ganglia/ right now where they display a graph of the average load over some 450+ machines. How might you implement something like that in NOCtool?
Depends, I would've thought.
p.s. anyone seen any "good" monitoring UIs? something they like... I can't say I ever really have :P
Closest I've seen so far is HP OpenView and Spectrum (no longer Cabletron, but surprisingly still alive). Both rely heavily on the admin(s) to set up decent views, as careless adding of monitored elements tends towards "crowded".
//Ingvar

[1] Off the top of my head, I could probably make a case for:
 - minimum measure/alert level
 - arithmetic mean of measure/alert level
 - median of measure/alert level
 - geometric mean of measure/alert level (this'd be "multiply all N measures together, extract the Nth root"; this is the geomean)
 - maximum of measure/alert level
Minimum is the one I'd have the hardest time defending, but... The three averages are variously useful for performance indication: the arithmean is useful for fine-grained load-balancing; the median is handy for most practical purposes, I would've thought; and the geomean ought to spike as you start towards having more servers with an issue, while being fairly unresponsive when there's just a small problem. Max is handy whenever you have a small number of things aggregated (or not much load-balancing between them).
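For comparison, the five aggregates can be sketched in a few lines of Python. This is an illustration only: the function names geomean and summarise and the sample load figures are invented for the example, and the geometric mean here assumes strictly positive measures. (Python 3.8 and later also ship statistics.geometric_mean; it is written out by hand below to show the "Nth root of the product" definition.)

```python
import math
from statistics import mean, median


def geomean(levels):
    """Nth root of the product of N strictly positive measures."""
    if not levels:
        raise ValueError("need at least one measure")
    if any(x <= 0 for x in levels):
        raise ValueError("geometric mean needs strictly positive measures")
    # Average the logarithms instead of multiplying everything together,
    # which gives the same result without overflowing for large N.
    return math.exp(math.fsum(math.log(x) for x in levels) / len(levels))


def summarise(levels):
    """Return all five candidate aggregates for a list of measures."""
    return {
        "min": min(levels),
        "arithmean": mean(levels),
        "median": median(levels),
        "geomean": geomean(levels),
        "max": max(levels),
    }


# Invented sample data: eight servers, load 1 is "fine", load 100 is "bad".
one_bad = [1, 1, 1, 1, 1, 1, 1, 100]
many_bad = [1, 1, 1, 1, 100, 100, 100, 100]

for label, levels in (("one bad", one_bad), ("many bad", many_bad)):
    print(label, summarise(levels))
```

With one bad server out of eight, the geomean stays below 2 while the arithmean is already above 13 and max is at 100; with half the servers bad, the geomean rises to about 10.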