Tuning Concepts for Splunk ES with a Risk Based Alerting Approach (RBA Part 2)
Author: Ben Marrable
Release Date: 02/02/2022
Once RBA is set up and running, you may find you are getting too many alerts, or perhaps hardly any at all. Either way, some tuning is to be expected. There are a number of areas where tuning can be conducted, and it is important to understand the effect of each tuning mechanism. The tuning options available are:
1. Tune the detection search itself (the Risk Rule).
2. Tune the amount of risk associated with the Risk Rule, based on either the impact or the confidence of the alert.
3. Tune the risk modifiers and/or the risk framework you are working from, i.e. alter how each impact or confidence level maps to a risk score.
4. Tune the Risk Indicator Rules and the thresholds you want to alert at.
1 and 2 are classed as local tuning mechanisms, whereas 3 and 4 are global tuning mechanisms, i.e. they affect all the Risk Rules in the system.
The appropriate tuning mechanism will vary depending on the situation. The starting point is to analyse the number of events generated by each risk rule and the corresponding quantity of risk. The search below should assist in that process: it shows each risk rule (Risk Rule) alongside the total amount of risk it has generated (Total Risk Score Attributed), the average amount of risk applied per trigger (Average Risk Score Applied), the distinct number of entities it has triggered against (Number of Objects Triggered Against) and the total number of times it has triggered (Number of Times the Risk Rule Triggered).
| tstats summariesonly=true sum(All_Risk.risk_score) as risk_score, avg(All_Risk.risk_score) as avg_risk, dc(All_Risk.risk_object) as risk_objects, count from datamodel=Risk.All_Risk where (All_Risk.risk_object="*" OR risk_object="*") by source | sort 1000 - risk_score, count | rename source as "Risk Rule", risk_score as "Total Risk Score Attributed", avg_risk as "Average Risk Score Applied", risk_objects as "Number of Objects Triggered Against", count as "Number of Times the Risk Rule Triggered"
Any rules that are producing a significant amount of risk generally highlight an area to tune; conversely, rules that are producing a small amount of risk may offer scope for an increase. The tuning mechanisms in play here are 1 and 2 in the list above.
Before we talk about false positives, let's be clear about what we mean by a false positive in the context of Risk Based Alerting. False positives are very rare in the world of Risk Based Alerting: any behaviour in the environment that meets the criteria for a risk rule is considered a positive event, regardless of whether it is malicious or not. This is because Risk Based Alerting is all about connecting behaviours across a number of different risk rules.
However, if there are clear and distinct false positives, these could be excluded from the risk rule itself; for example, if a vulnerability scanner is acting legitimately, consider excluding it from the risk rule. When doing this, be as specific as possible, i.e. include the user, host and timeframe as part of the exclusion (see the sketch below). An alternative would be to add that vulnerability scanner as a risk modifier and have that modifier significantly reduce the risk aligned to it. This is the first example of using method 1 for tuning: filtering out events at source. The second example of using method 1 applies to risk rules that involve an element of aggregation, for example a risk rule that alerts when a user connects to X number of machines in a day; if that threshold cannot be made dynamic based on the user's role, it could be increased when the rule is producing many alerts per day.
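As an illustration of filtering at source, an exclusion can be appended to the end of the risk rule's detection search. The scanner host, service account and scan window below are purely hypothetical, and the field names (src, user, date_hour) will depend on your own data:
<existing risk rule detection search> | where NOT (src="vuln-scanner01.example.local" AND user="svc_vulnscan" AND date_hour>=1 AND date_hour<5)
The tighter the exclusion, the less chance of genuinely malicious activity hiding behind it.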
Once there are no clear, repetitive false positives, the next step is to assess the level of risk being assigned by the risk rule. If the risk rule is clearly producing a significant amount of risk, but is still very useful to know about, consider reducing the risk score it generates. In certain situations this could even be reduced to 0, where you want to record a specific event and have it count towards the number of MITRE tactics/techniques involved, but never generate an alert purely from any number of those events on their own.
From the getting started guide, you should have a matrix (table) of severity, confidence and risk score factors calculated for each risk rule. Consider further tuning of your confidence levels and the risk multipliers aligned to them, with the goal of more refined base risk scores.
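As a rough illustration of how such a matrix can translate into a base risk score within a risk rule, the eval below multiplies a severity weighting by a confidence multiplier. The field names (severity, confidence) and every number shown are illustrative assumptions only and should be replaced with the values from your own matrix:
| eval severity_weight=case(severity=="critical", 80, severity=="high", 60, severity=="medium", 40, true(), 20) | eval confidence_mult=case(confidence=="high", 1.0, confidence=="medium", 0.75, true(), 0.5) | eval risk_score=severity_weight*confidence_mult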
The second global area is to review how the risk modifiers (risk factors) are being applied across your rules. The following search shows the count of risk rule matches for both additive and multiplicative risk factors. Be wary of modifiers that match very frequently, and consider making them more specific where possible.
| from datamodel:Risk | fillnull value="No Risk Factor Match" risk_factor_mult_matched risk_factor_add_matched | stats values(risk_factor_mult) as "Multiplier Value" values(risk_factor_add) as "Additive Value" count as "Count" by risk_factor_mult_matched, risk_factor_add_matched | rename risk_factor_mult_matched as "Multiplier Risk Factors Matched" risk_factor_add_matched as "Additive Risk Factors Matched" | sort - Count
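For context on what those two kinds of modifier do to a score, the sketch below applies them manually: an additive factor raises the base score by a fixed amount, while a multiplicative factor scales it. In ES itself risk factors are normally defined in the Risk Factor Editor rather than written inline, and the category fields and numbers here are purely hypothetical:
| eval risk_factor_add=case(user_category=="privileged", 20, true(), 0) | eval risk_factor_mult=case(src_category=="critical_asset", 1.5, true(), 1) | eval risk_score=(risk_score + risk_factor_add) * risk_factor_mult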
The final area, once all of the above tuning has been completed as best as possible, is to tune the risk indicator rules themselves. The first port of call here is to identify why the risk indicator rules are triggering, or not triggering, i.e. which thresholds are and are not being met. The default search alerts when the risk for an object reaches 100 points over a 24 hour period; this threshold could be increased, or alternatively consider some of the following options:
- If you are seeing many alerts created from a single risk rule, consider adding a threshold of source_count >=2
- Or, if you only wish to be alerted when more than one MITRE technique is involved, add mitre_technique_id_count >=2 to the threshold in the search
- You could also be very selective and have a threshold such as:
| where risk_score>100 AND ((mitre_tactic_id_count >= 2 AND source_count >= 2) OR match('annotations.mitre_attack', "T1098"))
Experiment with other threshold values to find what works best for your environment.
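Pulling those options together, a complete risk indicator rule search might look like the following sketch. It follows the general shape of the out-of-the-box risk threshold search, but the exact field names differ between ES versions, and the thresholds here are assumptions to experiment with rather than recommendations:
| tstats summariesonly=true sum(All_Risk.risk_score) as risk_score, values(source) as source, dc(source) as source_count, dc(All_Risk.annotations.mitre_attack.mitre_tactic_id) as mitre_tactic_id_count, dc(All_Risk.annotations.mitre_attack.mitre_technique_id) as mitre_technique_id_count from datamodel=Risk.All_Risk by All_Risk.risk_object, All_Risk.risk_object_type | rename All_Risk.risk_object as risk_object, All_Risk.risk_object_type as risk_object_type | where risk_score>100 AND (source_count >= 2 OR mitre_tactic_id_count >= 2)
Run over a rolling 24 hour window, this only raises an alert when an object both crosses 100 points of risk and shows breadth across multiple risk rules or MITRE tactics.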
Good luck with your tuning process, and with getting the most out of Risk Based Alerting.