Monitoring : Shinken custom services and templates

Monday, September 23rd, 2013

The basic Shinken monitoring templates are very powerful, but like any monitoring platform, Shinken needs customization before it fits a particular environment. Templating is one area where Shinken is both very powerful and in need of that customization. The scenario I am working through is the use of templates to scale out the monitoring of our SaaS platform. We currently have 35,000 virtual machines with at least 5 monitors per VM, so we need a template that works for us and lets us scale beyond 35k hosts.

First, we want to abstract the naming conventions away from what people see on the board. We don’t want to see “service down”; we want to see “$HOSTNAME – Application”, so that the board shows both the host and the application affected. We can later use that data as part of our ITIL problem management process.

Start by defining a host in the Shinken configuration, in etc/hosts/<hostname>.cfg:

define host{
     use           application-name-to-check
     host_name     www.domain.com
     address       www.domain.com
 }

Here the “use” statement pulls in the template called “application-name-to-check”. Next we configure that template, in etc/templates.cfg, so that it queries the application with all of the correct variables defined:

define host{
     name application-name-to-check
     check_command check_name-to-check
     register                        0
     # Checking part
     max_check_attempts              2
     check_interval                  5
     # Check every time
     active_checks_enabled           1
     check_period                    24x7
     # Notification part
     # One notification each day (1440 = 60min* 24h)
     # every time, and for all 'errors'
     # notify the admins contactgroups by default
     contact_groups                  admins
     notification_interval           1440
     notification_period             24x7
     notification_options            d,u,r,f
     notifications_enabled           1
     # Advanced option. Look at the wiki for more information
     event_handler_enabled           0
     flap_detection_enabled          1
     process_perf_data               1
     _CHECK_HTTP_DOMAIN_NAME    $HOSTADDRESS$
     _CHECK_HTTP_PORT           80
 }
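
Custom variables such as _CHECK_HTTP_PORT are inherited like any other directive, so a single host can override a template default without touching the template. A minimal sketch, assuming a hypothetical host app2.domain.com whose application listens on port 8080 (both the host name and the port are made up for illustration):

```cfg
# etc/hosts/app2.domain.com.cfg (hypothetical host, for illustration only)
define host{
     use                 application-name-to-check
     host_name           app2.domain.com
     address             app2.domain.com
     # Overrides the template's default of 80 for this host only
     _CHECK_HTTP_PORT    8080
 }
```

Every other setting, including check_command and the notification options, still comes from the template.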

You can adjust any of the directives in the application-name-to-check template to suit your environment. Here we check port 80 and use the $HOSTADDRESS$ macro, which resolves to the address set in the host definition at the start of this post. The line to note is check_command check_name-to-check: the command is called check_name-to-check here, but you should pick something descriptive for your organisation. The register 0 line marks the definition as a template rather than a real host. Next we define the check command in etc/commands.cfg:

 define command {
     command_name   check_name-to-check
     command_line   $PLUGINSDIR$/check_http -H $_HOSTCHECK_HTTP_DOMAIN_NAME$ -u <URL-TO-CHECK-FOR-APP> -p $_HOSTCHECK_HTTP_PORT$ --authorization=$_HOSTCHECK_HTTP_AUTH$
 }
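
One way to avoid hard-coding the URL into the command, shown here as a sketch rather than as part of the original setup, is to carry it in another host custom variable. The names check_app_http_uri and _CHECK_HTTP_URI are assumptions chosen for this example, and the --authorization option is left out for brevity:

```cfg
# In the host template (etc/templates.cfg), alongside the existing variables:
#      _CHECK_HTTP_URI            /url/

define command {
     command_name   check_app_http_uri
     # $_HOSTCHECK_HTTP_URI$ resolves to the host's _CHECK_HTTP_URI value
     command_line   $PLUGINSDIR$/check_http -H $_HOSTCHECK_HTTP_DOMAIN_NAME$ -u $_HOSTCHECK_HTTP_URI$ -p $_HOSTCHECK_HTTP_PORT$
 }
```

With this variant the same command can serve several applications, since each template or host supplies its own path.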

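In the check_name-to-check command, the URL of the HTTP application must be passed to the check. Replace the <URL-TO-CHECK-FOR-APP> placeholder with the right path, e.g. /url/. Note that the command also reads $_HOSTCHECK_HTTP_AUTH$, which the host template does not set; define a matching _CHECK_HTTP_AUTH custom variable on the host or template if the application needs authentication, or drop the --authorization option if it does not. Once the command is defined we can define the service, in etc/services/application.cfg: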

define service{
     service_description          application-name-to-check
     use            generic-service
     register       0
     host_name      application-name-to-check
     check_command  check_name-to-check
 }

Because the service sets register 0 and names the host template in host_name, Shinken attaches it to every host that uses that template. Now make sure the configuration syntax is correct. Running “service shinken check” should give a clean output; if it does not, find the reported error and fix it. Once the check passes, restart the service and your templates should be active.