I'm curious why nagios/munin are overkill. I think they exactly match
your requirements.

Scheduling the tests and keeping track of the results in a scalable way
can be a bit complicated - the actual tests are basically plugins.
nagios and munin come with a few built-in tests (basically, the ones
you want to see) and the rest are plugins, probably in separate
packages.

It's a bit annoying to learn the nagios config language though, I have
to admit. Munin is way less complicated, but the thinning of data as
time goes by annoys me. Then again, it was one of your requirements.

The graphs are a bonus. You don't have to look at them if you don't
want to.

I haven't looked at zenoss, but will keep an eye open for it.

bjb

On Fri, Jul 12, 2013 at 11:49:23PM -0400, Peter Sjöberg wrote:
> On 07/12/2013 10:28 AM, Brenda J. Butler wrote:
> >
> > I don't know oswatcher, but based on your description the following
> > would be useful for you:
> >
> > munin (keeps a constant-sized database, which thins out as you look
> > back in time).
>
> From a 10-second look it seems like overkill, but I will look at it more.
>
> > nagios
>
> Definitely overkill. I use nagios for other things, but what I'm after is
> not so much monitoring as a tool to use after the monitoring has alerted
> that something is bad. At that point I want to know what led up to all
> the memory being used up, or which process consumed all the cpu/io, since
> once the alert happens it often gets resolved with a big shotgun like a
> reboot (like when they accidentally started 40 instances of a java app
> on a server designed for 4) and we are left to tell what happened
> without logs.
>
> On 07/12/2013 01:36 PM, Jeffrey Moncrieff wrote:
> > You can also try zenoss.
>
> Will check on that later.
>
> > In both cases, if there is some test they don't already do, you can
> > write your own and have them use it.
>
> Well, google did find https://github.com/stephenlang/scrutiny and that's
> about the closest I've seen to what I'm looking for, but it's a bit too
> basic.
>
> Since after all there's not that much to it, I started writing something
> that I will try out over the weekend. I know one challenge will be
> actually collecting anything when the system is crawling, but anything
> is better than what we have now, which is nothing (besides 1-minute sar
> data, which tends to stop before the system dies).
>
> /ps
---end quoted text---
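
To illustrate the "tests are basically plugins" point, here is a rough
sketch of what a home-made Nagios-style memory check might look like in
Python. It assumes a Linux /proc/meminfo and relies on the usual Nagios
plugin convention of one status line on stdout plus an exit code of 0
(OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The function names and
the 80%/95% thresholds are made up for the example and are not part of
nagios or munin.

```python
"""Sketch of a Nagios-style memory check (hypothetical, not shipped with Nagios)."""

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_meminfo(text):
    """Return {field: kB} parsed from the contents of /proc/meminfo."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def used_percent(fields):
    """Percent of RAM in use; falls back to free+buffers+cached on older kernels."""
    total = fields["MemTotal"]
    if "MemAvailable" in fields:
        free = fields["MemAvailable"]
    else:
        free = fields["MemFree"] + fields.get("Buffers", 0) + fields.get("Cached", 0)
    return 100.0 * (total - free) / total


def evaluate(pct, warn=80.0, crit=95.0):
    """Map a usage percentage to (exit_code, status_line). Thresholds are assumptions."""
    if pct >= crit:
        return CRITICAL, f"MEMORY CRITICAL - {pct:.1f}% used"
    if pct >= warn:
        return WARNING, f"MEMORY WARNING - {pct:.1f}% used"
    return OK, f"MEMORY OK - {pct:.1f}% used"


def main():
    """Read /proc/meminfo, print one status line, return the exit code."""
    try:
        with open("/proc/meminfo") as f:
            fields = parse_meminfo(f.read())
        code, message = evaluate(used_percent(fields))
    except (OSError, KeyError, ZeroDivisionError) as exc:
        code, message = UNKNOWN, f"MEMORY UNKNOWN - {exc}"
    print(message)
    return code
```

A real plugin would finish with "import sys; sys.exit(main())" so that
the scheduler sees the exit status. Keeping the parsing and the
threshold logic in separate functions makes it easy to test them
against a saved copy of /proc/meminfo.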