Feature: Selectively applicable conditions and metrics #395

Open
inqrphl wants to merge 18 commits into main from feature-selectively-applicable-conditions-and-metrics

inqrphl commented Jun 26, 2026

Contributor

This PR changes the core Condition and Metric logic. All features are optional and have no effect unless used.

Core changes

Add a whitelist and a blacklist to the Condition struct. Each is a slice of map[string]string. A Go map value is a small handle to a heap-allocated hash table, so storing a map in a slice does not copy its key-value data; the slice behaves like an array of pointers. Maps cannot be compared with ==, so helper functions compare the underlying table pointers instead. The lists store the entries that Condition.Match is kept away from (blacklist) or restricted to (whitelist).

If an entry is in a Condition's blacklist, the Condition is not checked against it. If the Condition has a non-empty whitelist, that list is effective as well: the Condition is only checked against entries in the whitelist.

The entries come from the different checks in snclient and are added to check.listData. Once an entry is allocated, its pointer is fixed and it can be added to a whitelist or blacklist. Modifying the entry later does not invalidate its place in the list, because the underlying pointer to the map data stays the same.

In Condition.Match, a disallowed entry returns an inconclusive result, meaning the match is null and void and says nothing about the entry. This is reported through the second return value of Condition.Match.

These conditions then influence the return state of the check: each check.listData entry is compared against each Condition in the ok, warning and critical thresholds. Inconclusive results do not modify the state. This many-to-many check can be narrowed using whitelists and blacklists.

Add a reference to the entry object to the Metric struct -> this records which entry generated the metric. If the entry is known, conditions check whether it is whitelisted or blacklisted before they are used to build the perfdata string.
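The whitelist/blacklist rule and the inconclusive result described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the Condition fields, the permits helper, the numeric "greater than" comparison and the treatment of unparsable values are all assumptions made for the example. Only the idea of comparing maps by pointer and of returning a second "conclusive" value comes from the description.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
)

// Entry mirrors the map[string]string entries stored in check.listData.
type Entry = map[string]string

// Condition is a simplified stand-in for the real struct (hypothetical fields).
type Condition struct {
	Keyword   string
	Limit     float64
	Whitelist []Entry
	Blacklist []Entry
}

// containsEntry looks for e by map identity, not by content.
func containsEntry(list []Entry, e Entry) bool {
	for _, item := range list {
		if reflect.ValueOf(item).Pointer() == reflect.ValueOf(e).Pointer() {
			return true
		}
	}
	return false
}

// permits rejects blacklisted entries; a non-empty whitelist admits only its members.
func (c *Condition) permits(e Entry) bool {
	if containsEntry(c.Blacklist, e) {
		return false
	}
	if len(c.Whitelist) > 0 && !containsEntry(c.Whitelist, e) {
		return false
	}
	return true
}

// Match returns (matched, conclusive). A disallowed entry is inconclusive,
// so callers can skip it when computing the check state.
func (c *Condition) Match(e Entry) (matched, conclusive bool) {
	if !c.permits(e) {
		return false, false
	}
	value, err := strconv.ParseFloat(e[c.Keyword], 64)
	if err != nil {
		// Assumption of this sketch: unparsable values are also inconclusive.
		return false, false
	}
	return value > c.Limit, true
}

func main() {
	root := Entry{"drive": "/", "used_pct": "54.6"}
	tmp := Entry{"drive": "/tmp", "used_pct": "85"}

	generic := &Condition{Keyword: "used_pct", Limit: 90, Blacklist: []Entry{tmp}}
	specific := &Condition{Keyword: "used_pct", Limit: 80, Whitelist: []Entry{tmp}}

	for _, e := range []Entry{root, tmp} {
		for _, c := range []*Condition{generic, specific} {
			matched, conclusive := c.Match(e)
			fmt.Printf("%-4s > %v: matched=%v conclusive=%v\n", e["drive"], c.Limit, matched, conclusive)
		}
	}
}
```

In this sketch the generic condition is inconclusive for /tmp and the specific one is inconclusive for /, so each drive is effectively judged by exactly one condition.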

Check_drivesize

Use these changes to add selective, drive-specific metric and threshold handling to check_drivesize. This comes in two forms:

  1. If a condition uses a keyword of the form '[drive] x_pct', it is taken as a specialized keyword. Other conditions that use the generic 'x_pct' keyword are blacklisted for that drive entry.

This is done using the disableGenerallizedConditionsForDrive helper.

This makes it possible to give used_pct thresholds specific to a drive: the generic conditions are blacklisted for that drive and are left out when its perf threshold string is built, while other drives keep using them.

ahmet@jelinek:~/repositories/snclient$ ./snclient --logfile stdout -vvv run check_drivesize drive="/" drive="/tmp" warning="used_pct gt 90" warning=" '/tmp used_pct' gt 80"
OK - All 2 drive(s) are ok |'/ used'=548949176320B;904451781427;904451781427;0;1004946423808 '/ used %'=54.6%;90;90;0;100 '/tmp used'=150626304B;;30080176128;0;33422417920 '/tmp used %'=0.5%;80;90;0;100
  2. Before adding a metric for a drive, check all threshold condition lists for a condition that uses the "drive" keyword and matches the current entry.

If there is no condition with the specialized keyword, all conditions are generic and pass through the filter.

If there is a specialized condition with the "drive" keyword that matches the drive entry, only the conditions with a sub-condition that passes the filter for the drive entry are added.

The heuristic here might not be fully correct, but a deeper analysis is not needed for now.

This helps when a threshold is specified using multiple conditions joined by a group operator. In the example below, the second condition won't be added to the '/' drive when its metric is created.

ahmet@jelinek:~/repositories/snclient$ ./snclient --logfile stdout -vvv run check_drivesize drive="/" drive="/boot" warning="used_pct gt 90" warning="drive eq '/boot' and used_pct gt 50"
WARNING - warning(/boot 493.473 MiB/943.234 MiB (52.3%)) |'/ used'=548475305984B;904451781427;904451781427;0;1004946423808 '/ used %'=54.6%;90;90;0;100 '/boot used'=517443584B;494526464;890147635;0;989052928 '/boot used %'=52.3%;50;90;0;100

If it were added, the perfdata generation would detect two conditions in total using the 'used_pct' keyword: the global one with value 90, and the one specific to the '/boot' drive with value 50, which would still have gone through. It would then conclude that the threshold is the range @50:90.
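The second form can be pictured with the sketch below. It is not the implementation from this PR: the Condition tree, its field names and conditionsForDrive are hypothetical, and a "drive" sub-condition is reduced to a plain string comparison. The description also does not spell out what happens to a drive-specific condition that targets a different drive; this sketch drops it, which is an assumption.

```go
package main

import "fmt"

// Condition is a simplified stand-in: either a leaf (Keyword/Value)
// or a group of sub-conditions. Field names are hypothetical.
type Condition struct {
	Keyword string
	Value   string
	Group   []*Condition
}

// keywords recursively collects the keywords of a condition.
func (c *Condition) keywords() []string {
	var out []string
	if c.Keyword != "" {
		out = append(out, c.Keyword)
	}
	for _, sub := range c.Group {
		out = append(out, sub.keywords()...)
	}
	return out
}

// usesDrive reports whether the condition mentions the "drive" keyword at all.
func (c *Condition) usesDrive() bool {
	for _, kw := range c.keywords() {
		if kw == "drive" {
			return true
		}
	}
	return false
}

// targetsDrive reports whether some "drive" sub-condition names this drive.
func (c *Condition) targetsDrive(drive string) bool {
	if c.Keyword == "drive" && c.Value == drive {
		return true
	}
	for _, sub := range c.Group {
		if sub.targetsDrive(drive) {
			return true
		}
	}
	return false
}

// conditionsForDrive picks the conditions used for one drive's perfdata:
// matching specialized conditions win, otherwise the generic ones apply.
func conditionsForDrive(list []*Condition, drive string) []*Condition {
	var specialized, generic []*Condition
	for _, c := range list {
		switch {
		case c.targetsDrive(drive):
			specialized = append(specialized, c)
		case !c.usesDrive():
			generic = append(generic, c)
		}
	}
	if len(specialized) > 0 {
		return specialized
	}
	return generic
}

func main() {
	// warning="used_pct gt 90" warning="drive eq '/boot' and used_pct gt 50"
	generic := &Condition{Keyword: "used_pct", Value: "90"}
	boot := &Condition{Group: []*Condition{
		{Keyword: "drive", Value: "/boot"},
		{Keyword: "used_pct", Value: "50"},
	}}
	warning := []*Condition{generic, boot}

	for _, drive := range []string{"/", "/boot"} {
		for _, c := range conditionsForDrive(warning, drive) {
			fmt.Println(drive, "uses condition with keywords", c.keywords())
		}
	}
}
```

With the two warning conditions from the example, '/' ends up with only the generic condition and '/boot' with only its own, so neither metric sees both the 90 and the 50 threshold at once.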

Other changes

add a new processMetricsWithSpecializedKeyword() function to CheckData -> this calls processMetricsWithSpecializedKeyword for the warning and critical thresholds of the Metric object

add a new transformKeywordsUsingAttributes() function to CheckData -> this transforms keywords using a specified format. The format can be filled in from a list of attributes, changing the keywords programmatically

add a new disableGenerallizedConditionsUsingAttributes() function to CheckData -> this also builds a specialized and a generalized keyword using the specified formats. Then it calls disableGenerallizedConditionsForEntry() for the ok, warn and crit thresholds of the CheckData

add a new DetailedString() function to Condition -> it prints more fields and is meant for logs. It is helpful because it also prints the original version of the condition; the keywords might be transformed later, either by the check or to make comparisons with metrics possible

add a new BlacklistWhitelistCheck() function to Condition - quickly checks whether an entry is accepted by the condition's lists; this is used in Condition.Match()

add a new GetListOfKeywords() function to Condition - recursively collects the keywords of a condition, useful to decide whether a condition should be inspected further

add a new disableGenerallizedConditionsForEntry() function to ConditionList - checks whether the specialized keyword is used; if so, it filters to the conditions not using the specialized keyword, then further filters to the conditions using the generalized keyword, and disables those for the entry. This is used in the specialized metric name matching

add a new filterConditionsUsingKeywords() function to ConditionList - filters the condition list by keywords

add a new ifKeywordIsPresentAndPermitsEntry() function to ConditionList - determines whether a keyword is present in a condition and the entry is allowed through by Check.Match(). This is used to see whether a specialized condition is present in the condition list

add a new filterForSpecializedKeyword() function to ConditionList - this applies the heuristic described in the second case, generalized over the ConditionList. It checks whether a specialized condition is present using ifKeywordIsPresentAndPermitsEntry(); if so, it keeps only the conditions that use that keyword and match the entry. Otherwise the condition list is generic and everything is included

add SubtractSlice[T comparable](op1, op2 []T) (ret []T) to utils.go - used to filter the elements of one slice out of another

add MapsEqual(a, b map[string]string) to utils.go - compares the underlying data pointers of the two maps and checks whether they are the same. Maps cannot be compared directly using the == operator

add ContainsMap(slice []map[string]string, target map[string]string) bool to utils.go - used together with MapsEqual in the blacklist/whitelist checks

add Deduplicate[T comparable](slice []T) []T to utils.go - removes duplicate elements from a slice before it is saved

add logs around Condition checks for check.details, check.listData entries, check.result.Metrics etc.
add logs when a Condition blacklists/whitelists an entry
add tests around whitelist/blacklist functionality
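
For readers who have not opened utils.go, the four helpers listed above could look roughly like this. The signatures are taken from the list; the bodies are a sketch written for illustration and may differ from the PR (for example in ordering guarantees, which are assumed here to be "keep first occurrence, preserve order"). The pointer comparison via reflect.ValueOf(map).Pointer() is the approach named in the commit messages.

```go
package main

import (
	"fmt"
	"reflect"
)

// SubtractSlice returns the elements of op1 that do not occur in op2.
func SubtractSlice[T comparable](op1, op2 []T) (ret []T) {
	exclude := make(map[T]struct{}, len(op2))
	for _, v := range op2 {
		exclude[v] = struct{}{}
	}
	for _, v := range op1 {
		if _, found := exclude[v]; !found {
			ret = append(ret, v)
		}
	}
	return ret
}

// Deduplicate keeps the first occurrence of each element and preserves order.
func Deduplicate[T comparable](slice []T) []T {
	seen := make(map[T]struct{}, len(slice))
	out := make([]T, 0, len(slice))
	for _, v := range slice {
		if _, dup := seen[v]; dup {
			continue
		}
		seen[v] = struct{}{}
		out = append(out, v)
	}
	return out
}

// MapsEqual reports whether a and b are the same map (identity),
// not whether they hold the same key-value pairs.
func MapsEqual(a, b map[string]string) bool {
	return reflect.ValueOf(a).Pointer() == reflect.ValueOf(b).Pointer()
}

// ContainsMap reports whether target itself is stored in slice.
func ContainsMap(slice []map[string]string, target map[string]string) bool {
	for _, m := range slice {
		if MapsEqual(m, target) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(SubtractSlice([]string{"a", "b", "c"}, []string{"b"})) // [a c]
	fmt.Println(Deduplicate([]int{1, 2, 1, 3, 2}))                    // [1 2 3]

	entry := map[string]string{"drive": "/tmp"}
	clone := map[string]string{"drive": "/tmp"}
	blacklist := []map[string]string{entry}

	// Mutating the entry after it was listed keeps its identity.
	entry["used_pct"] = "0.5"
	fmt.Println(ContainsMap(blacklist, entry), ContainsMap(blacklist, clone)) // true false
}
```

Note that two nil maps compare as equal under this identity check, since both report a zero pointer.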

inqrphl and others added 13 commits June 26, 2026 00:27
save a list of references to entries that a Condition object won't be evaluated against. using Condition.Match with these entries will return inconclusive results

save a reference to the listdata entry that a Metric object was generated from. if such a reference is present, and the Condition object has it in its skip list, the Condition won't be used when building the perf threshold string

add helper function that looks through a condition list, takes out Condition objects where specialized keywords are present, then filters to Condition objects that have a generalized keyword present. These generalized versions can have entries added to their evaluation skip list

improve .String() on Condition objects, add more log points, add helper function for slice subtraction

use check_drivesize as a testing ground for these changes. If a '<drive> used_pct' keyword is present, it will be taken as a specialized keyword, and Conditions containing the generalized 'used_pct' will have the drive entry added to their blacklist
…ing skip list for entry. this prevents golangci-lint from complaining

this works because golang maps use an internal pointer to the heap to store their key-value data. while maps are not directly comparable, their pointers can be compared using reflect.ValueOf(map).Pointer()

this way, we can save the skipList using map[string]string, and use the internal pointer comparison to check if we need to skip. see utils.go for more details

check_drivesize now passes the entry normally as map[string]string
utility: subtractslice , mapsequal , containsmap
condition: testing skiplist
…is intended to give a parseable string output

make a new function called DetailString() that gives more debug-like output usable in trace logs
improve log messages around condition and metric entry skipping
use a combined whitelist/blacklist checker in condition.match() and to see if condition is applicable in perfdata string

when adding metrics in check_drivesize, use a heuristic to check if a condition is a specialized condition, with drive keyword and matches an entry - if so add it

if there is no specialized condition present, add every condition, as all of them are generic
log metric name when using blacklist+whitelist to filter conditions for perfstring

lgmu commented Jun 29, 2026

Contributor

Does it have to be hardcoded for the used_pct metric? I think it would be better if it was generic for all available metrics like free_bytes, inodes_used_pct etc.

https://omd.consol.de/docs/snclient/checks/commands/check_drivesize/#filter-keywords

This works:

check_drivesize drive=/boot drive=/home "warning=used_pct gt 66" "warning='/home used_pct' gt 77"
OK - All 2 drive(s) are ok |'/boot used'=182829056B;352028836;480039322;0;533377024 '/boot used %'=34.3%;66;90;0;100 '/home used'=561360896B;;933024154;0;1036693504 '/home used %'=54.1%;77;90;0;100

This doesn't:

check_drivesize drive=/boot drive=/home "warning=used_pct gt 66" "warning='/home inodes_used_pct' gt 77"
OK - All 2 drive(s) are ok |'/boot used'=182829056B;352028836;480039322;0;533377024 '/boot used %'=34.3%;66;90;0;100 '/home used'=561360896B;684217713;933024154;0;1036693504 '/home used %'=54.1%;66;90;0;100

Here I would have expected to additionally receive the inodes performance data which is set to 77% for /home.


Another thing I noticed: why is the metric called /boot inodes instead of /boot inodes_used %?

check_drivesize drive=/boot drive=/home "warning=inodes_used_pct gt 66" "warning='/home inodes_used_pct' gt 77"
OK - All 2 drive(s) are ok |'/boot used'=182829056B;;480039322;0;533377024 '/boot used %'=34.3%;;90;0;100 '/boot inodes'=0%;66;;0;100 '/home used'=561360896B;;933024154;0;1036693504 '/home used %'=54.1%;;90;0;100 '/home inodes'=0.4%;66;;0;100

inqrphl commented Jun 29, 2026

Contributor Author

Let me see if I can generalize the functions more, so that I can add more metrics to behave like this easily.

inqrphl marked this pull request as draft June 29, 2026 07:54
Ahmet Oeztuerk added 3 commits June 29, 2026 14:53
…em more generallizable for further checks

conditionlist now has four functions:

disableGenerallizedConditionsForEntry
filterConditionsUsingKeywords
ifKeywordIsPresentAndPermitsEntry
filterForSpecializedKeyword

checkdata:
addBytePercentMetrics and AddPercentMetrics now take an entry parameter and attach it to each metric they construct

checkdata also has three helper functions:
processMetricsWithSpecializedKeyword
transformKeywordsUsingAttributes
disableGenerallizedConditionsUsingAttributes

Use these helper functions in check_drivesize.go:
transformDrivePctMetrics
disableGenerallizedConditionsForDrive

go through all attributes of percent type, and check if they are specified specifically for a drive. transform them using this method.

also specify the entries in other metrics, which enables specialized filtering using drive= and perflabel= for more attributes
inqrphl marked this pull request as ready for review June 29, 2026 15:58
Ahmet Oeztuerk added 2 commits June 30, 2026 14:58
unlike AddBytePercentMetrics, it was not adding " %" to the name of the metric

this caused problems when matching drive specific inodes_free and inodes_used conditions meant to match with perfdata labels

lgmu commented Jul 3, 2026

Contributor

check_drivesize drive=/boot "warning=(used_pct gt 80) or (inodes_used_pct gt 85)" "critical=(used_pct gt 90) or (inodes_used_pct gt 95)"
OK - All 1 drive(s) are ok |'/boot used'=182829056B;426701619;480039322;0;533377024 '/boot used %'=34.3%;80;90;0;100 '/boot inodes_used %'=0%;85;95;0;100

-> This works correctly now


check_drivesize drive=/boot drive=/home drive=/tmp "warning=(used_pct gt 80) or (inodes_used_pct gt 85)" "critical=(used_pct gt 90) or (inodes_used_pct gt 95)" "warning=drive eq '/tmp' and used_pct gt 66" "critical=drive eq '/tmp' and used_pct gt 88"
OK - All 3 drive(s) are ok |'/boot used'=182829056B;426701619;480039322;0;533377024 '/boot used %'=34.3%;80;90;0;100 '/boot inodes_used %'=0%;85;95;0;100 '/home used'=561356800B;829354803;933024154;0;1036693504 '/home used %'=54.1%;80;90;0;100 '/home inodes_used %'=0.4%;85;95;0;100 '/tmp used'=194297856B;2429131162;3238841549;0;3680501760 '/tmp used %'=5.3%;66;88;0;100 '/tmp inodes_used %'=0.1%;;;0;100

-> This also works correctly now


check_drivesize drive=/boot drive=/home drive=/tmp "warning=(used_pct gt 80) or (inodes_used_pct gt 85)" "critical=(used_pct gt 90) or (inodes_used_pct gt 95)" "warning=drive eq '/tmp' and ((used_pct gt 66) or (inodes_used_pct gt 77))" "critical=drive eq '/tmp' and ((used_pct gt 88) or (inodes_used_pct gt 99))"
OK - All 3 drive(s) are ok |'/boot used'=182829056B;426701619;480039322;0;533377024 '/boot used %'=34.3%;80;90;0;100 '/boot inodes_used %'=0%;85;95;0;100 '/home used'=561356800B;829354803;933024154;0;1036693504 '/home used %'=54.1%;80;90;0;100 '/home inodes_used %'=0.4%;85;95;0;100 '/tmp used'=194260992B;;;0;3680501760 '/tmp used %'=5.3%;;;0;100 '/tmp inodes_used %'=0.1%;;;0;100

Warning

Here the thresholds for /tmp used % and /tmp inodes_used % are both empty. Is it not possible to combine them in one condition?


check_drivesize drive=/boot drive=/home drive=/tmp "warning=(used_pct gt 80) or (inodes_used_pct gt 85)" "critical=(used_pct gt 90) or (inodes_used_pct gt 95)" "warning=drive eq '/tmp/' and ((used_pct gt 66) or (inodes_used_pct gt 77))" "critical=drive eq '/tmp/' and ((used_pct gt 88) or (inodes_used_pct gt 99))"
OK - All 3 drive(s) are ok |'/boot used'=182829056B;426701619;480039322;0;533377024 '/boot used %'=34.3%;80;90;0;100 '/boot inodes_used %'=0%;85;95;0;100 '/home used'=561356800B;829354803;933024154;0;1036693504 '/home used %'=54.1%;80;90;0;100 '/home inodes_used %'=0.4%;85;95;0;100 '/tmp used'=194326528B;2944401408;3312451584;0;3680501760 '/tmp used %'=5.3%;80;90;0;100 '/tmp inodes_used %'=0.1%;85;95;0;100

Warning

Here the warning/critical condition for /tmp seems to get ignored entirely, because of the trailing /.
Maybe we can do the same as for drive= that a trailing / always gets ignored?
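
A minimal sketch of what such a normalization could look like, applied to the value of a drive condition before it is compared. normalizeDrive is a hypothetical helper, not an existing snclient function, and the sketch only considers Unix-style paths; Windows drive roots such as C:\ would need their own handling.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeDrive strips trailing slashes so that "/tmp/" and "/tmp"
// compare equal, while the root "/" is left intact.
func normalizeDrive(path string) string {
	trimmed := strings.TrimRight(path, "/")
	if trimmed == "" && strings.HasPrefix(path, "/") {
		return "/"
	}
	return trimmed
}

func main() {
	for _, p := range []string{"/tmp/", "/tmp", "/", "/boot//"} {
		fmt.Printf("%q -> %q\n", p, normalizeDrive(p))
	}
}
```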
