First off, let me say that there are tons of monitoring applications out there, some even built on rrdtool (namely Cacti), but I found that they were all lacking one thing, or if they had that feature, they were too expensive: a full view of multiple graphs, zoomable, without pestering interfaces.

The system should have a simple but flexible admin interface and nothing else. We looked at different systems to monitor more than 20 gateway routers (graphing the traffic) and more than 40 Windows servers (including domain controllers), graphing CPU and connection statistics.

Screen real estate was not a problem, as we had two 24″ screens for this purpose. The first thing that needed to be done was to get the requirements down, and anyone who has rolled out their own system knows that that's the biggest part. We decided that the system should be as minimalist as possible and do only the graphing, as we had other systems monitoring the network. We just wanted a webpage that would display what we wanted, and not 10000 menus and graphics like all the other systems.

Requirements:

To summarize, we wanted a system that would run without much maintenance: a script that collects the data and builds the graphs regularly, at an interval of 1 minute or 3 minutes. It should be usable by other IT staff across our offices and still display a lot of information. It should also be accessible via the web, so that dedicated monitoring workstations with big screens can display graphs that are easy to decipher.
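To make the collection interval a bit more concrete, here is a minimal sketch of how the arguments for an `rrdtool create` call with a 1 minute step could be put together. This is not the script we ended up with; the function name, the data source names `in_octets` / `out_octets`, the file name and the retention period are all assumptions picked for illustration.

```python
def build_create_args(rrd_path, step_seconds=60, days_kept=7):
    """Build the argument list for an `rrdtool create` call.

    Assumptions for this sketch: two COUNTER data sources with
    hypothetical names, and one AVERAGE archive at full resolution.
    """
    # Heartbeat: how long rrdtool waits for an update before it
    # stores "unknown"; twice the step is a common choice.
    heartbeat = step_seconds * 2
    # Number of rows needed to keep `days_kept` days at one row per step.
    rows = days_kept * 24 * 3600 // step_seconds
    return [
        "rrdtool", "create", rrd_path,
        "--step", str(step_seconds),
        f"DS:in_octets:COUNTER:{heartbeat}:0:U",
        f"DS:out_octets:COUNTER:{heartbeat}:0:U",
        f"RRA:AVERAGE:0.5:1:{rows}",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_create_args("gateway01.rrd")))
```

The resulting list could be handed to `subprocess.run` once rrdtool is installed. For the 3 minute variant, only `step_seconds` would change to 180.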

Consolidation:

We could in theory run MRTG, but the problem with that is that out of the box it will display INPUT and OUTPUT on a graph for one device. Nothing else.

I would like to combine several graphs into one.

For example:

  • One graph for the inter-country network links, to see the traffic throughput. Since we have offices in 5 countries, we would create a graph with 5 differently colored lines. While we are at it, we will create a graph that's a bit larger and has a negative side which represents the output. So positive for input, negative for output. The other thing to consider is that we have different link speeds, anything from 2 to 10 Mbit, so the graph would have to be based on percentage use.
  • The other option is not to have it country based but link speed based, so you group all the connections with the same link speed together. I prefer the country thing though, since it goes hand in hand with the next point.
  • Next I want to graph relevant Windows domain controller performance in groups. Since we have quite a few, country based would be preferred here. This information has to be divided, however. So it would be one group for network connections, one for DNS requests / sec, one for CPU usage and one for memory usage.
  • The same could be applied for file servers and disk storage. So we want that too.
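Since the links have different speeds, the percentage idea from the first point boils down to a bit of arithmetic on two counter readings. Below is a rough sketch, assuming 32-bit SNMP octet counters that wrap at most once between two polls. All function names are made up for this example.

```python
COUNTER32_MODULUS = 2 ** 32  # assumption: 32-bit octet counters


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Octets transferred between two readings, allowing for one wrap-around."""
    return (current - previous) % modulus


def utilization_percent(previous_octets, current_octets,
                        interval_seconds, link_speed_bps):
    """Link usage in percent of the nominal link speed (bits per second)."""
    bits = counter_delta(previous_octets, current_octets) * 8
    return 100.0 * bits / (interval_seconds * link_speed_bps)


def mirrored(in_percent, out_percent):
    """Input stays positive, output is flipped to the negative side of the graph."""
    return in_percent, -out_percent


if __name__ == "__main__":
    # A 2 Mbit link that moved 7,500,000 octets in 60 seconds.
    usage = utilization_percent(0, 7_500_000, 60, 2_000_000)
    print(mirrored(usage, usage))
```

With percentages, a 2 Mbit link and a 10 Mbit link can share one axis from -100 to 100, which is the whole point of normalizing by link speed.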

Also we want to have a site that loads fast and is tabbed to access different groups of information at the same time.

All in all the requirements look like this:

  • Website that loads quickly and displays all information by country in tabs, possibly with different expansions using collapsible subsections.
  • Group relevant servers in a country and use SNMP to fetch continuous information.
  • Display several servers in the same graph in order to spot relevance in traffic and connections. This allows us to check if a spike is only in one country on one server or if it spans multiple countries / servers.
  • Graphs should be small(ish) and on click will go into more detail (JavaScript).
  • Lastly, an easy method of adding more servers and information.
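To give an idea of how several servers could end up in one graph, and how adding a server could stay easy, here is a small sketch that turns a list of server names into the arguments for an `rrdtool graph` call. The server names, the `rrd` directory, the data source name and the colors are all hypothetical; it only assumes one .rrd file per server.

```python
# Hypothetical palette; colors repeat if a group has more than five servers.
COLORS = ["FF0000", "00AA00", "0000FF", "FF9900", "9900CC"]


def build_graph_args(title, servers, ds_name, png_path, rrd_dir="rrd"):
    """Build `rrdtool graph` arguments with one colored line per server."""
    args = ["rrdtool", "graph", png_path, "--title", title, "--start", "-1h"]
    for index, server in enumerate(servers):
        color = COLORS[index % len(COLORS)]
        vname = f"v{index}"
        # DEF pulls the averaged data source out of the server's RRD file.
        args.append(f"DEF:{vname}={rrd_dir}/{server}.rrd:{ds_name}:AVERAGE")
        # LINE1 draws it as a thin line with the server name as legend.
        args.append(f"LINE1:{vname}#{color}:{server}")
    return args


if __name__ == "__main__":
    # Adding a server to a group is just one more entry in the list.
    group = ["dc01", "dc02"]
    print(" ".join(build_graph_args("CPU usage", group, "cpu", "cpu.png")))
```

Keeping the groups as plain lists (or a dictionary keyed by country) means a new server is one more entry, and the graph definitions follow automatically.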

Well, this looks like something that could take a little while to cook up. The HTML part for the tabs and the image zoom is ready and will be introduced in the next installment of this series. The technology we used is JavaScript and jQuery.

This entry was posted on Monday, August 11th, 2008 at 5:33 am.
Categories: General Announcements, Security.
