Thursday, February 28, 2013

How to monitor weird SNMP devices in Operations Manager

Sponsor post

It’s an all too common scenario: you have a device that you know for a fact supports SNMP, but it's not supported by Operations Manager or by Jalasoft Xian Network Manager. In that case, you can opt to create your own Management Pack for Operations Manager, but that’s a tough job and definitely not something that you’ll enjoy doing. An easier way around this is to use Xian Network Manager 2012. It has a simple feature called "Custom Rules" that makes it possible, in just a matter of minutes, to monitor anything you want on those weird SNMP devices.

Adding your weird network devices to Xian Network Manager

The first thing you need to do is add your weird device to Xian NM. Make sure you have SNMP enabled on the device and the IP address and community string at hand. Click on [Device] in the menu bar followed by [Add]. A small dialog appears; search for "Network Device" and then click OK. Next, a screen like the one in figure 1 appears. The only thing you need to do here is provide the IP address and the community string. If the device is not within the same network, you might consider increasing the timeout.
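Before adding the device, it can save time to confirm that it really answers SNMP with the community string you have. The following is only a sketch, assuming the Net-SNMP command line tools are installed on the machine you run it from. The helper names and the example address are ours and are not part of Xian NM.

```python
import subprocess

SYS_DESCR_OID = "1.3.6.1.2.1.1.1.0"  # sysDescr.0 from the standard MIB-II system group


def build_snmpget_command(host, community, timeout_s=2, retries=1):
    """Build a Net-SNMP `snmpget` command line that asks a device for sysDescr.0."""
    return [
        "snmpget",
        "-v", "2c",          # assumption: the device speaks SNMP v2c
        "-c", community,     # community string
        "-t", str(timeout_s),  # per-request timeout in seconds
        "-r", str(retries),    # number of retries
        host,
        SYS_DESCR_OID,
    ]


def device_answers_snmp(host, community, timeout_s=2):
    """Return True if the device answers the sysDescr.0 query, False otherwise."""
    command = build_snmpget_command(host, community, timeout_s)
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s * 3 + 1
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False  # snmpget is not installed, or nothing answered in time
    return result.returncode == 0
```

Calling `device_answers_snmp("192.0.2.10", "public")` with your own IP address and community string should return True for a device that is ready to be added. If the device sits in another network, a larger `timeout_s` plays the same role as the timeout setting in the dialog.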

Figure 1, Discovering the Network Device

Now click [Finish] and Xian NM will begin to discover the device. As an example, we enabled SNMP on a Windows Server 2008 R2 machine, and within 15 seconds the server appeared as a network device in the Xian NM Console, as you can see in figure 2.

Figure 2, The Discovered Network Device

What is monitored out of the box?

Xian NM monitors some standard aspects once the Network Device has been discovered. You can see this by double-clicking the discovered device and going to "active rules". In figure 3 you can see exactly what is being monitored: Total Traffic, System Uptime and Availability. Total Traffic monitors the traffic flowing over all the interfaces of the device and is set with an automatic threshold. This means that Xian NM will define a threshold itself after monitoring the traffic for 12 hours. System Uptime returns, as the name suggests, the number of days the device has been up and running.

By default, an alert is sent when the uptime goes above 90 days or drops below the lower threshold. In this way you’re alerted when the device is due for a maintenance check or when it restarts. Availability checks whether the device can be reached through ICMP and SNMP. The default settings can easily be changed according to your needs. Moreover, you can also apply inactive rules, which are listed on the right side of the "device properties" window. Simply drag and drop one, and the wizard will help you to set up the rule.
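To make the System Uptime logic a bit more concrete, here is a small sketch of how such a check could work on the raw SNMP value. The standard sysUpTime object (1.3.6.1.2.1.1.3.0) counts hundredths of a second, so it has to be converted to days first. How Xian NM evaluates the rule internally is not described here; comparing against the previous sample to spot a restart is our own assumption, and the function names are hypothetical.

```python
TICKS_PER_DAY = 100 * 60 * 60 * 24  # sysUpTime counts hundredths of a second


def uptime_days(sys_uptime_ticks):
    """Convert a raw sysUpTime value (TimeTicks) to days."""
    return sys_uptime_ticks / TICKS_PER_DAY


def uptime_alert(previous_ticks, current_ticks, max_days=90):
    """Return 'restarted', 'maintenance due' or None for one polling cycle.

    Assumption: a value lower than the previous sample means the device
    restarted. Note that sysUpTime is a 32-bit value and wraps after roughly
    497 days, which would look like a restart to this simple check.
    """
    if previous_ticks is not None and current_ticks < previous_ticks:
        return "restarted"
    if uptime_days(current_ticks) > max_days:
        return "maintenance due"
    return None
```

For example, `uptime_alert(1000, 500)` reports a restart, while a device whose uptime has passed the 90 day mark is reported as due for maintenance.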

Figure 3, Device properties of our 'weird' Network Device

Creating a custom rule

Next, it’s time to create a custom rule. For our Windows server we are going to add a CPU Load rule. In the main console, click on [Configuration] at the bottom left of the screen. Now select [Rules] and then [Network Device] in the tree view on the left. Right-click on the Network Device and click on "Add Custom Rule". A dialog window like the one in figure 4 appears. First you have to define some general information regarding your custom rule. Name, Title and Description need no explanation.

However, do be careful with the type of rule. Most rules are Performance rules, for example CPU Load, Memory and System Uptime, so if you are not sure which of the three types to pick, Performance is usually the right one. Select Incremental Performance rule for rules that deal with traffic; this type calculates the difference between two measurement points and expresses it in units per second. Lastly, if you are concerned with Fan status, Interface status or any other status, select Status rule. In our case, since we are going to add a CPU load rule, we select Performance rule.
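The idea behind an Incremental Performance rule can be sketched in a few lines. Traffic OIDs such as ifInOctets (1.3.6.1.2.1.2.2.1.10) are ever-increasing counters, so a single reading says little; the useful number is the difference between two readings divided by the time between them. The code below is an illustration of that calculation under our own assumptions (a 32-bit counter that wraps at most once between polls), not the code Xian NM runs.

```python
COUNTER32_MODULUS = 2 ** 32  # a Counter32 wraps back to 0 after this many units


def rate_per_second(prev_value, prev_time, curr_value, curr_time,
                    modulus=COUNTER32_MODULUS):
    """Return the counter increase per second between two samples.

    Times are in seconds. The modulo handles a single counter wrap
    between the two samples.
    """
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    delta = (curr_value - prev_value) % modulus
    return delta / elapsed


# Two polls 30 seconds apart: the counter went from 1000 to 4000 octets
print(rate_per_second(1000, 0, 4000, 30))  # 100.0
```

A plain Performance rule, by contrast, would report the raw value of each poll, which is what you want for something like CPU load.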

Figure 4, The Add Custom Rule Wizard - General

Next you need to set some performance parameters. The most important ones are:
- Maximum and default threshold value: the threshold value that is shown by default when you apply the rule, and the maximum value the user can assign;
- Unit: the unit you are measuring in, such as percentage, buffers or packets;
- Allow automatic threshold: enable this option to have a rule that can calculate its threshold automatically;
- Applicable Element: the element over which the rule runs. Normally you will need the NetworkDeviceSnmpDeviceElement;
- Rule Category: normally this is General, but you are free to change it.

The most important part of the setup process is defined under expression. You will see a small piece of XML, but you don’t need to do much: copy your OID and paste it over the OID that is already shown. For our example, however, we made it slightly more complicated. Our Windows Server has two cores, so there are two CPU load OIDs. We changed the syntax to fetch both CPU loads, add them together and divide the sum by two. With this change we also discard the variable MyOID. If you take a look at figure 5, you can clearly see what we’ve done.
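The calculation behind that expression is nothing more than an average over the per-core values. The sketch below shows the same arithmetic outside Xian NM. It assumes the per-core load comes from hrProcessorLoad in the HOST-RESOURCES-MIB (1.3.6.1.2.1.25.3.3.1.2); the instance numbers .2 and .3 and the polled values are made up for illustration, so check the actual indexes on your own device.

```python
HR_PROCESSOR_LOAD = "1.3.6.1.2.1.25.3.3.1.2"  # hrProcessorLoad column


def average_cpu_load(values_by_oid, core_oids):
    """Average the load percentages of the given per-core OIDs."""
    loads = [values_by_oid[oid] for oid in core_oids]
    return sum(loads) / len(loads)


# Hypothetical poll result for a two-core server (instance indexes are examples)
polled = {
    HR_PROCESSOR_LOAD + ".2": 30,
    HR_PROCESSOR_LOAD + ".3": 50,
}

print(average_cpu_load(polled, list(polled)))  # 40.0
```

Dividing by the number of OIDs rather than a fixed two means the same approach carries over to a device with more cores.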

Figure 5, Defining the syntax to get the average CPU load of the Windows Server which has two cores

The last thing that you need to do in this wizard is fill in the alert data. This is the data that will be displayed when an alert is triggered in OpsMgr, and it gives the user some extra information. Now click [Finish] and the rule will be added in Xian NM. But before we add the Management Pack to OpsMgr, let’s first do a short test to see if the rule runs as expected.

1. Go to the "General Section" of the rules of the Network Device. There you will see the rule you just added. Right click and select "Test Rule", click OK. Now you will be asked to select the device that you want to use to test the rule. Select the device you added earlier;
2. Now a simplified version of the rule wizard will appear. Go straight to the schedule and set the interval to 5 seconds (you can opt for another interval if you want);
3. Click on start.

Figure 6, Testing your custom rule

After starting the rule, a small dialog window like the one in figure 7 will appear. If all is configured correctly, you will see values appearing. If not, select the "alert" tab. If it says that an exception occurred, you probably made a mistake with the OID setting. In that case, go back and adjust the OID settings in the Custom Rule Wizard. Additionally, you can export the results to CSV in case you need to test more than just the rule or want to analyze the values in another tool, for example Excel.
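If you would rather check the exported values with a script than with Excel, something along these lines could do the job. Keep in mind that this is only a sketch: the layout of the CSV export is not shown in this post, so the column names "Time" and "Value" and the sample rows are purely our assumption. Adjust them to whatever your export actually contains.

```python
import csv
import io
import statistics


def summarize_values(csv_text, value_column="Value"):
    """Return count, min, max and mean of one numeric column in CSV text.

    `value_column` is a hypothetical column name; change it to match
    the header of your own export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[value_column]) for row in reader]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
    }


# Made-up sample in the assumed layout, one row per 5 second test cycle
sample = "Time,Value\n10:00:00,30\n10:00:05,50\n10:00:10,40\n"
print(summarize_values(sample))
```

A quick summary like this makes it easy to see whether the test values are in the range you expect before you choose a threshold for the rule.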

Figure 7, Rule test output

Time to create the Management Pack and import it into OpsMgr. This is needed so that OpsMgr is aware of the data that could arrive from this new rule and to make sure that this information is processed in the correct way. To generate the Management Pack, click on "Network Devices" in the rules section. The option "Generate Management Packs" then appears in the action panel. Click on it and select a destination folder. Next, use the normal import procedure to import the generated Management Pack into OpsMgr.

Adding the custom rule

Let’s add the custom rule to your device. Note that from this moment on, the rule you created can be added to any device of the same type. So in our case, if we add any other Windows Server, we can deploy this rule there as well.
1. Double-click on the device where you want to add your custom rule;
2. Go to the active rules tab;
3. Drag and drop the rule from the rule list (on the right side) to the running rules list. Note that you can easily distinguish the custom rules since they are colored blue (see figure 8);

Figure 8, Adding a custom rule to your device

4. Now the typical Xian NM rule wizard will appear. Here you have to pay attention to the threshold level and type. If you know which threshold level you need, opt for a manual threshold; otherwise choose an automatic threshold. The automatic option will monitor the values for a certain period and define a threshold for you, which you can also change later on. Also take care to choose a sensible interval. You don’t want it to be too low, since that would hurt the performance of Xian NM and OpsMgr;
5. Now click Finish;
6. The rule will now go into the running state.
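To get a feeling for why a very low interval is a bad idea, it helps to do the arithmetic. The sketch below simply counts polling cycles per day. It assumes that every cycle costs at least one SNMP request and can produce one data point for OpsMgr, which is our simplification; the real cost per cycle depends on your environment.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def polls_per_day(interval_seconds, devices=1, rules_per_device=1):
    """Count how many polling cycles run per day for the given setup."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    cycles = SECONDS_PER_DAY // interval_seconds
    return cycles * devices * rules_per_device


print(polls_per_day(5))    # 17280 cycles per day for one rule on one device
print(polls_per_day(300))  # 288 cycles per day for the same rule
print(polls_per_day(300, devices=50, rules_per_device=4))  # 57600
```

The 5 second interval that was handy for testing produces sixty times as many cycles as a 5 minute interval, and that factor is multiplied again by every device and rule you add.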

What do you see in OpsMgr?

After the new rule has been running for a while and has passed at least one cycle of the interval, you will be able to see performance data and alerts appearing in OpsMgr, as long as you enabled them in the rule settings. In figures 9 and 10 you can see that the performance data and alerts appear exactly like those of any other Xian NM rule, or in fact like those of any other rule or monitor in OpsMgr.

Figure 9, Performance data of your Custom Rule in OpsMgr

Figure 10, Alerts of the Custom Rule in OpsMgr

Thanks for reading!
