In my few years of working in a NOC I've seen a lot of stupid behavior from devices when it comes to monitoring and notifications. Here's my current list of gripes.
I wish your app, monitoring product and/or device would take the following list into consideration when you decide to add SNMP Notification (trap) support.
- You probably don't need more than one custom trap. A few different traps for different types of notifications is fine. 30,000 different traps that all have identical structure is not OK. (I'm looking at you, Symantec I3.)
- Use a different OID for each piece of information. Please don't make us use regex to pull out the data.
- If you trap when a problem starts, trap again when the problem is resolved.
- Cold/warm start, link up, link down and most of the other basic, well-defined RFC traps are more than welcome.
- If you're going to tell me the status of something changed, tell me what that something is! (I'm looking at you, Arista Networks.)
- If you're going to tell me the status of something in particular changed, tell me how it changed!
- Nobody ever got angry because you provided too much data. If I have to poll your device to find out what happened, you're doing it wrong.
- If there is extra data you think might be useful (say, the label on the port when you trap a link down event), please provide it. You can still do this and conform to the RFC spec.
- SNMPv2c support is fine; nobody cares about SNMPv3. Feel free to support it, but don't assume people will use it by default and don't make people jump through hoops to disable it.
- Don't hold internal state for the "alert" you're trapping about. If I have to manually log into your system and clear or acknowledge your alert before you'll tell me it happened again, you're doing it wrong. (I'm looking at you, RAD Data Comm.)
- If you're reporting on state changes, tell me every state change every time! Nobody ever complained about too much data.
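
To make a few of these concrete, here is a rough sketch of the kind of trap payload I'd like to see. It is plain Python that only builds the varbind list; it doesn't use any SNMP library or send anything. The OIDs are the standard SNMPv2-MIB and IF-MIB ones. The function names, the port name and the port label are made up for illustration. The idea: one OID per piece of information, a link up to match every link down, the port label thrown in as an extra varbind, and a trap for every state change with nothing to acknowledge in between.

```python
# Standard OIDs (SNMPv2-MIB and IF-MIB).
SYS_UPTIME = "1.3.6.1.2.1.1.3.0"
SNMP_TRAP_OID = "1.3.6.1.6.3.1.1.4.1.0"
LINK_DOWN = "1.3.6.1.6.3.1.1.5.3"
LINK_UP = "1.3.6.1.6.3.1.1.5.4"
IF_INDEX = "1.3.6.1.2.1.2.2.1.1"
IF_ADMIN_STATUS = "1.3.6.1.2.1.2.2.1.7"
IF_OPER_STATUS = "1.3.6.1.2.1.2.2.1.8"
IF_NAME = "1.3.6.1.2.1.31.1.1.1.1"
IF_ALIAS = "1.3.6.1.2.1.31.1.1.1.18"

UP, DOWN = 1, 2  # ifOperStatus / ifAdminStatus values


def build_link_trap(uptime_ticks, if_index, if_name, if_alias,
                    admin_status, oper_status):
    """Return the varbind list for a linkUp/linkDown notification.

    Hypothetical helper: every piece of information gets its own OID,
    so the receiver never has to regex a free-text message.
    """
    trap_oid = LINK_UP if oper_status == UP else LINK_DOWN
    return [
        (SYS_UPTIME, uptime_ticks),
        (SNMP_TRAP_OID, trap_oid),
        # What changed, and how it changed.
        (f"{IF_INDEX}.{if_index}", if_index),
        (f"{IF_ADMIN_STATUS}.{if_index}", admin_status),
        (f"{IF_OPER_STATUS}.{if_index}", oper_status),
        # Extra varbinds: the port name and the label on the port.
        (f"{IF_NAME}.{if_index}", if_name),
        (f"{IF_ALIAS}.{if_index}", if_alias),
    ]


def traps_for_samples(initial_oper, samples, if_index, if_name, if_alias,
                      admin_status=UP):
    """Emit one trap per state change, every time.

    `samples` is a list of (uptime_ticks, oper_status) observations.
    The only thing remembered is the last observed status; there is no
    "alert" that someone has to clear before the next trap goes out.
    """
    traps = []
    previous = initial_oper
    for ticks, oper in samples:
        if oper != previous:
            traps.append(build_link_trap(ticks, if_index, if_name,
                                         if_alias, admin_status, oper))
        previous = oper
    return traps


if __name__ == "__main__":
    # A flapping port: down, still down, up, down again.
    flaps = [(100, DOWN), (200, DOWN), (300, UP), (400, DOWN)]
    for trap in traps_for_samples(UP, flaps, 7, "Ethernet7",
                                  "uplink to core-sw1"):
        for oid, value in trap:
            print(oid, "=", value)
        print()
```

A receiver can pull the interface, the new state and the port label straight out of the varbinds by OID, and a port that flaps three times produces three traps without anyone logging in to clear anything.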