NSClient++ Help (#1) - Slowly but surely, event logs are driving me insane [SOLVED] (#535) - Message List
Hey folks,
I've been going around and around trying to monitor Windows event logs
on both Server 2003 and Server 2008. For this exercise, I'm working only on 2003.
I have defined a check command like so:
$USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -t $USER11$ -c CheckEventLog -a file="$ARG1$" filter=new filter=in MaxWarn="$ARG2$" MaxCrit="$ARG3$" filter-generated="$ARG5$" filter+severity=="$ARG6$" filter+eventID=="$ARG4$" truncate=1023 unique descriptions "syntax=%source%: (%severity% event ID %id%) %message% (%count% events found)"
(sorry about the crazy line wrap)
I have created a testing service on a host:
# Eventlog checks using check_windows_eventlog2
#
# ARG1 = Event log to check
# ARG2 = MaxWarn value
# ARG3 = MaxCrit value
# ARG4 = EventID to look for
# ARG5 = Time filter
# ARG6 = Severity (information, warning, error)
define service {
service_description Abrupt restart
check_command check_windows_eventlog2!system!1!1!6008!\>10m!error
host_name hostname
check_period 24x7
notification_period 24x7
contact_groups nagios-admins
notifications_enabled 1
use active-service
}
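For readers less familiar with Nagios macros, here is a rough sketch of how the `!`-separated fields in check_command map onto $ARG1$ through $ARG6$ in the command line. The function name expand_check_command is made up for illustration, the template is shortened to the argument part of the command above, and the sketch ignores Nagios details such as escaped `\!` separators:

```python
import re

def expand_check_command(check_command: str, command_line: str) -> str:
    """Substitute $ARGn$ macros with the '!'-separated fields of check_command."""
    # The first field is the command name; the remaining fields are ARG1, ARG2, ...
    _name, *args = check_command.split("!")

    def substitute(match: re.Match) -> str:
        index = int(match.group(1))
        # Macros without a matching field expand to an empty string in this sketch.
        return args[index - 1] if 1 <= index <= len(args) else ""

    return re.sub(r"\$ARG(\d+)\$", substitute, command_line)

# Shortened template: only the part of the command line that uses $ARGn$ macros.
template = (
    'file="$ARG1$" MaxWarn="$ARG2$" MaxCrit="$ARG3$" '
    'filter-generated="$ARG5$" filter+severity=="$ARG6$" filter+eventID=="$ARG4$"'
)
expanded = expand_check_command(
    r"check_windows_eventlog2!system!1!1!6008!\>10m!error", template
)
print(expanded)
```

Note that ARG4 (the event ID) and ARG5 (the time filter) appear in the command line in a different order than in the service definition, so it is easy to swap them by accident.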
I "flick the switch" on the VM to simulate an unplanned restart, and sure enough, Windows logs an event 6008 in the system event log as an error. Yay.
But the above check command and service don't catch it. In fact, running on the command line:
/usr/local/nagios/libexec/check_nrpe -H hntbw598 -p 5666 -t 90 -c CheckEventLog -a file="System" filter=new filter=in MaxWarn=1 MaxCrit=1 filter-generated=\>1h filter+severity=="error" filter+eventID=="6008" truncate=1023 unique descriptions "syntax=%source%: (%severity% event ID %id%) %message% (%count% events found)"
Doesn't get it either. I see it in the event log. I know it's there. It's sitting there, mocking me.
PLEASE, someone, grab a cluebat and give me a whack.
Thanks!
Benny
-
Message #1667
Interesting... try the latest nightly (out in a bit) and see if that resolves the issue.
Whilst looking at this I found a pretty major bug, which I have fixed.
I have also (BTW) added a debug-threshold which makes it "simpler" to debug these kinds of errors (if you are running in /test mode).
Like so:
CheckEventLog debug=true debug-threshold=1 file=Program filter=in MaxWarn=1 MaxCrit=1 filter-generated=gt:1d filter+eventID==1008 filter+severity==error truncate=1023 unique descriptions "syntax=%source%: (%severity% event ID %id%) %message% (%count% events found)"
d \NSClient++.cpp(1073) Injecting: CheckEventLog: debug=true, debug-threshold=1, file=Program, filter=in, MaxWarn=1, MaxCrit=1, filter-generated=gt:1d, filter+eventID==1008, filter+severity==error, truncate=1023, unique, descriptions, syntax=%source%: (%severity% event ID %id%) %message% (%count% events found)
d \CheckEventLog.cpp(655) Filter: - {timeGenerated max 1d }+ {eventID eq 1008 }+ {eventSeverity eq error }
d \CheckEventLog.cpp(185) Attempting to match: Program with Program
d \CheckEventLog.cpp(663) Opening alternative log: Application
d \CheckEventLog.cpp(742) [2] Matched: + eventID eq 1008 for: Customer Experience Improvement Program: (error event ID 1008) (%count% events found)
d \CheckEventLog.cpp(742) [3] Matched: + eventSeverity eq error for: SideBySide: (error event ID 35) (%count% events found)
d \CheckEventLog.cpp(742) [3] Matched: + eventSeverity eq error for: SideBySide: (error event ID 59) (%count% events found)
d \CheckEventLog.cpp(742) [2] Matched: + eventID eq 1008 for: Customer Experience Improvement Program: (error event ID 1008) (%count% events found)
d \NSClient++.cpp(1109) Injected Result: CRITICAL 'SideBySide: (error event ID 35) (1 events found), SideBySide: (error event ID 59) (1 events found), Customer Experience Improvement Program: (error event ID 1008) (2 events found), eventlog: 4 > critical'
d \NSClient++.cpp(1110) Injected Performance Result: ''eventlog'=4;1;1; '
CRITICAL:SideBySide: (error event ID 35) (1 events found), SideBySide: (error event ID 59) (1 events found), Customer Experience Improvement Program: (error event ID 1008) (2 events found), eventlog: 4 > critical|'eventlog'=4;1;1;
// MickeM
mickem 03/11/10 07:37:23 (6 months ago)
-
Message #1668
(now nightly is up)
// Michael Medin
mickem 03/11/10 08:57:30 (6 months ago)
-
Message #1670
Ah hah... Yes, that works great:
EventLog: (warning event ID 6005) The Event log service was started. (1 events found), EventLog: (warning event ID 6008) The previous system shutdown at 7:39:31 AM on 3/11/2010 was unexpected. (1 events found), EventLog: (warning event ID 6009) Microsoft (R) Windows (R) 5.02. 3790 Service Pack 2 Uniprocessor Free. (1 events found), eventlog: 3 > critical|'eventlog'=3;1;1;
However, now one of my other event log checks on this host is failing with the dreaded "could not construct return paket" message:
Could not construct return paket in NRPE handler check clientside (nsclient.log) logs...
This service is configured like so (using the check_windows_eventlog2 check previously mentioned):
# Eventlog checks using check_windows_eventlog2 # # ARG1 = Event log to check # ARG2 = MaxWarn value # ARG3 = MaxCrit value # ARG4 = EventID to look for # ARG5 = Time filter # ARG6 = Severity (information, warning, error) define service { service_description Group policy success check_command check_windows_eventlog2!application!1!2!1704!\>48h!information host_name hostname check_period 24x7 notification_period 24x7 contact_groups nagios-admins notifications_enabled 1 use active-service }
I know that this service is largely useless as I'm counting successes; I'm just using it to test this event log stuff. This service began giving me the "could not construct return paket" error once I uninstalled the previous client and installed the nightly. The nsclient.log from this host doesn't give any more details:
2010-03-11 07:45:20: debug:CACHENSClient++.cpp:533: Attempting to start NSCLient++ - 0.3.8.29 2010-03-11
2010-03-11 07:45:20: message:CACHEmodules\FileLogger\FileLogger.cpp:93: Log path is: C:\Program Files\NSClient++\\nsclient.log
2010-03-11 07:45:20: error:modules\NRPEListener\NRPEListener.cpp:325: NRPESocketException: To much data cant create return packet (truncate datat)
Thank you for your great software and all your help!
Benny
bensec01 03/11/10 14:48:38 (6 months ago)
-
Message #1671
Gads, sorry about the formatting. Let me clean that up a bit:
# Eventlog checks using check_windows_eventlog2
#
# ARG1 = Event log to check
# ARG2 = MaxWarn value
# ARG3 = MaxCrit value
# ARG4 = EventID to look for
# ARG5 = Time filter
# ARG6 = Severity (information, warning, error)
define service {
service_description Group policy success
check_command check_windows_eventlog2!application!1!2!1704!\>48h!information
host_name hostname
check_period 24x7
notification_period 24x7
contact_groups nagios-admins
notifications_enabled 1
use active-service
}
bensec01 03/11/10 14:51:46 (6 months ago)
-
Message #1672
This one is simpler...
The truncate=1023 is pretty bad (and I should really update all the examples). The data is actually longer than 1023, since you get 1023 plus the performance data, so a better option is something like 900 (or, to be precise, 1024-1-length(|'eventlog'=3;1;1;) = 1005).
BUT, and this is a big but, the hit count (in this case it is 3) might become larger (i.e. more than 1 digit), so a "larger margin" is prudent.
So unless you really need as much data as possible, truncate after 900 or some such.
(Maybe I should make truncate 900 the default?)
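To make the arithmetic above concrete, here is a small sketch of the same calculation. The function name max_truncate is invented for this note; the 1024 and the reserved byte are taken from the formula in this message, not from the NSClient++ or NRPE sources:

```python
# Assumed buffer size, copied from the "1024" in the formula above.
NRPE_BUFFER_SIZE = 1024

def max_truncate(perf_data: str, buffer_size: int = NRPE_BUFFER_SIZE) -> int:
    """Largest truncate value that still leaves room for the performance data."""
    # One byte is reserved (the "-1" in the formula); the remaining space is
    # shared between the message text and the performance data suffix.
    return buffer_size - 1 - len(perf_data)

# Performance data with a one-digit count, as in the example above.
one_digit = max_truncate("|'eventlog'=3;1;1;")
# Every extra digit in the count takes one more byte away from the message.
four_digits = max_truncate("|'eventlog'=1234;1;1;")

print(one_digit, four_digits)
```

With a one-digit count the limit is 1005, and a four-digit count already lowers it to 1002, which is why a round value such as 900 leaves a comfortable margin.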
// Michael Medin
mickem 03/11/10 15:00:55 (6 months ago)
-
Message #1673
Ah! OK. Good to know.
I've adjusted my truncate value to 900 in my check commands... So, now, it's doing something even weirder. I'm trying it from the command line, and now I get:
/usr/local/nagios/libexec/check_nrpe -H hostname -p 5666 -t 90 -c CheckEventLog -a file="system" filter=new filter=in MaxWarn=1 MaxCrit=1 filter-generated=\>10m filter+severity=="error" filter+eventID=="6008" truncate=900 unique descriptions "syntax=%source%: (%severity% event ID %id%) %message% (%count% events found)"
SideBySide: (error event ID 32) Dependent Assembly Microsoft.VC80.ATL could not be found and Last Error was The referenced assembly is not installed on your sys (1 events found), SideBySide: (error event ID 59) Resolve Partial Assembly fai Reference error message: The referenced assembly is not installed on your syste . (2 events found), eventlog: 3 > critical|'eventlog'=3;1;1;
It's like it's not seeing the filters I'm trying to apply and returning all events...?
bensec01 03/11/10 15:35:37 (6 months ago)
-
Message #1675
Uuuuhhhhhhhhhh...
And it "fixed itself" ?
/usr/local/nagios/libexec/check_nrpe -H hntbw598 -p 5666 -t 90 -c CheckEventLog -a file="system" filter=new filter=in MaxWarn=1 MaxCrit=1 filter-generated=\>10m filter+severity=="error" filter+eventID=="6008" truncate=1024 unique descriptions "syntax=%source%: (%severity% event ID %id%) %message% (%count% events found)"
Eventlog check ok|'eventlog'=0;1;1;
/usr/local/nagios/libexec/check_nrpe -H hntbw598 -p 5666 -t 90 -c CheckEventLog -a file="system" filter=new filter=in MaxWarn=1 MaxCrit=1 filter-generated=\>10m filter+severity=="error" filter+eventID=="6008" truncate=900 unique descriptions "syntax=%source%: (%severity% event ID %id%) %message% (%count% events found)"
Eventlog check ok|'eventlog'=0;1;1;
I didn't touch this host, nor did I make any changes anywhere else.
bensec01 03/11/10 15:56:08 (6 months ago)
-
Message #1676
Does this mean everything is good?
// Michael Medin
mickem 03/11/10 16:17:50 (6 months ago)
-
Message #1678
Nope. :( The "Abrupt restart" service I have is working, yes. But the one that broke is still broken:
/usr/local/nagios/libexec/check_nrpe -H hntbw598 -p 5666 -t 90 -c CheckEventLog -a file="application" filter=new filter=in MaxWarn=1 MaxCrit=2 filter-generated=\>48h filter+severity=="information" filter+eventID=="1704" truncate=900 unique descriptions "syntax=%source%: (%severity% event ID %id%) %message% (%count% events found)"
NagiosEventLog: (success event ID 0) failed to load: C:\Program Files (x86)\Monitoring\nagevlog.exe( reson: 193 (16 events found), VMTools: (success event ID 105) The service was started. (4 events found), VMUpgradeHelper: (success event ID 256) The VmUpgradeHelper service has started. (2 events found), VMUpgradeHelper: (success event ID 258) Restoring network configuration. (2 events found), VMUpgradeHelper: (success event ID 270) Not restoring network configuration for adap ID for this adapter is unchanged. (2 events found), VMUpgradeHelper: (success event ID 271) Restored network configuration. (2 events found), SceCli: (informational event ID 1704) Security policy in the Group policy objects has been applied successfully. (3 events found), MsiInstaller: (success event ID 11707) failed to load: C:\WINDOWS\SysWOW64\msi.dll( reson: ...|'eventlog'=35;1;2;
bensec01 03/11/10 16:22:23 (6 months ago)
-
Message #1679
Humm...
The reason it is not working is that you are "relying" on the "problem" I "fixed".
Damnit...
To fix or not to fix, that is the question. Or perhaps filter=even-newer is the solution? :)
I shall think a bit about it and get back to you...
// Michael Medin
mickem 03/11/10 16:36:14 (6 months ago)
-
Message #1680
Oh! I thought what I was trying to do was very straightforward... If I'm doing things in a weird way, please don't change the software just to try to accommodate me. All I need to do is get event log monitoring working; I don't need *this* specific test (the 1704 one) to work properly. I just created that one to test how this event log stuff works.
Are you saying that this specific (the 1704) test is goofy? If so, it's not really a valid service, it's just me fiddling with event log IDs to see how it all works together.
bensec01 03/11/10 16:43:06 (6 months ago)
-
Message #1681
OK, I guess I spoke too soon. My "abrupt restart" test service is also not working as it should. It just sent me an alert for an error in the system event log that is *not* the ID I look for.
The check command:
# check_windows_eventlog2
#
# Checks a Windows machine's eventlog for occurrences (and counts) of event IDs
define command {
command_name check_windows_eventlog2
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -t $USER11$ -c CheckEventLog -a file="$ARG1$" filter=new filter=in MaxWarn="$ARG2$" MaxCrit="$ARG3$" filter-generated="$ARG5$" filter+severity=="$ARG6$" filter+eventID=="$ARG4$" truncate=900 unique descriptions "syntax=%source%: (%severity% event ID %id%) %message% (%count% events found)"
}
And the service definition:
# Eventlog checks using check_windows_eventlog2
#
# ARG1 = Event log to check
# ARG2 = MaxWarn value
# ARG3 = MaxCrit value
# ARG4 = EventID to look for
# ARG5 = Time filter
# ARG6 = Severity (information, warning, error)
define service {
service_description Abrupt restart
check_command check_windows_eventlog2!system!1!1!6008!\>30m!error
host_name hntbw598
check_period 24x7
notification_period 24x7
contact_groups nagios-admins
notifications_enabled 1
use active-service
}
And yet, it sent me an alert for a 6013:
Mar 11 12:10:13 hntbw597 nagios: SERVICE ALERT: hntbw598;Abrupt restart;CRITICAL;HARD;6;EventLog: (error event ID 6013) The system uptime is 5625 seconds. (1 events found), eventlog: 1 > critical
Mar 11 12:10:13 hntbw597 nagios: SERVICE NOTIFICATION: cbensend;hntbw598;Abrupt restart;CRITICAL;notify-service-by-email;EventLog: (error event ID 6013) The system uptime is 5625 seconds. (1 events found), eventlog: 1 critical
So, to me, it looks like it's either not filtering properly on the event ID and is just sending me any of severity type "error", or I don't at all understand how this works.
Help?
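One possible reading of the debug output in message #1667 is that the filter rules are tried in order and the first rule that matches decides the outcome, so a `+severity` rule can let a record through before the `+eventID` rule is ever consulted. The sketch below models only that reading; it is not NSClient++ code. The function first_match_decision, the record fields, the event ID 7036 and the default of excluding unmatched records are all assumptions made for illustration:

```python
# Hypothetical model of first-match rule evaluation; NOT NSClient++ source code.

def first_match_decision(record, rules, default_include=False):
    """Return True if the record would be reported under this model.

    Each rule is (mode, field, predicate). Rules are tried in order and the
    first rule whose predicate matches decides: '+' includes, '-' excludes.
    """
    for mode, field, predicate in rules:
        if predicate(record[field]):
            return mode == "+"
    # Assumption: a record matching no rule is left out.
    return default_include

rules = [
    ("-", "age_minutes", lambda age: age > 30),      # filter-generated=>30m
    ("+", "severity", lambda sev: sev == "error"),   # filter+severity==error
    ("+", "event_id", lambda eid: eid == 6008),      # filter+eventID==6008
]

# Severity values are the ones reported in the alerts above.
events = {
    "uptime 6013": {"event_id": 6013, "severity": "error", "age_minutes": 5},
    "restart 6008": {"event_id": 6008, "severity": "error", "age_minutes": 5},
    "old restart 6008": {"event_id": 6008, "severity": "error", "age_minutes": 120},
    "info 7036": {"event_id": 7036, "severity": "information", "age_minutes": 5},
}

results = {name: first_match_decision(event, rules) for name, event in events.items()}
for name, included in results.items():
    print(f"{name}: {'reported' if included else 'skipped'}")
```

Under this model the 6013 record is reported purely because of its severity, which would be consistent with the alert shown above. The two `+` rules behave as alternatives here, not as a combined condition.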
bensec01 03/11/10 19:30:10 (6 months ago)
-
Message #1719
Use the new where-filters: much better and simpler!!! :P
// Michael Medin
mickem 04/14/10 21:42:20 (5 months ago)








