5 Tips to Help you Optimise your Splunk Searches
Author: Dan Gray
Release Date: 01/02/2024
Splunk is a great tool for aggregating and interrogating the data you hold, but to get the best value out of that data you should be following search best practices. Here I’ll outline some of the practices that will have your searches running quickly and efficiently while returning accurate results, giving you much better insight into your data.
First, why should we care about using efficient searches?
Primarily, it saves time and resources. The longer a search takes to run, the longer it uses a CPU core and memory on the indexer, and cumulatively this can impact the overall performance of the Splunk environment. Each Splunk search occupies a CPU core for as long as it takes to run, which is one reason why real-time searches should be used very sparingly! The more efficient your search, the sooner that CPU core is freed up to run other Splunk jobs.
From our perspective as users or analysts, we’ll spend much less time waiting for searches to complete, allowing us to work far more efficiently. From the perspective of the business, efficient searches allow for faster resolution of Splunk security incidents or ITSI notable events. Cost savings also add up, as you save on both computational resources and person-hours.
With that out of the way, let's talk about my top 5 tips to make our searches more efficient.
Tip #1 - Specify your Indexes
Splunk sorts data into indexes. You can think of these as the cupboards Splunk has to open to go looking for your data. If you don’t specify an index, Splunk will look in every cupboard your role searches by default to retrieve the data you want.
Below is a real-world example taken from the job inspector before and after optimising a search by simply specifying the indexes to run against. We found that needle in the haystack roughly 10 times quicker!
Before: This search has completed and has returned 1 results by scanning 16,113,296 events in 3,100.902 seconds
After: This search has completed and has returned 1 results by scanning 14,381,772 events in 313.665 seconds
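As a minimal sketch of what specifying an index looks like, the search below restricts itself to a single index before doing anything else. The index name `web`, the `status` field and the value `503` are hypothetical, chosen only for illustration; substitute the indexes that actually hold your data.

```spl
index=web status=503
| stats count by host
```

The unoptimised form of the same search would simply omit the `index=web` term, leaving Splunk to work through every index searched by default.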
Tip #2 - Appropriate time range selection
Inside each index, Splunk groups events into buckets. Each bucket is labelled with the time range of the events it holds, which is the first thing Splunk reads to decide whether the information you’re looking for could be inside.
Stretching our cupboard analogy a tad, imagine that within each cupboard there are a number of drawers, each with a label showing the month in which the data was filed away. If you tell Splunk you’re only interested in things that happened in October, it won’t have to waste time looking in January to September.
In our job inspector comparison for this tip, we were only interested in events from the last 24 hours. Neglecting to set the time picker would have cost an extra 47.5 seconds searching through 245,000 more events than we had to.
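Besides the time picker, the time range can also be set inside the search itself with the `earliest` and `latest` time modifiers. The sketch below limits a search to the last 24 hours; the index name `web` and the `status` filter are hypothetical placeholders.

```spl
index=web earliest=-24h latest=now status=503
| stats count by host
```

Time modifiers written into the search take precedence over the time picker, which makes them a handy safeguard for saved searches and dashboards.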
Tip #3 - Use field names
To improve the precision of your search, specify field names whenever possible. This makes your search more explicit and reduces the number of buckets Splunk needs to look inside and return data from.
By default, Splunk extracts a number of fields, including host, source and sourcetype. These fields and their values are added to the label mentioned in tip #2. This label is called a time-series index file, or tsidx for short. Splunk scans these tsidx files before opening up the raw data to check whether a field of interest is inside.
By specifying one of these default fields, you’ll save time by reducing the number of buckets Splunk needs to look inside. It has the added benefit of returning only the results you’re interested in, saving you from wading through tonnes of irrelevant events.
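A search that names the default fields might look something like the sketch below. The index `web`, the host `web01` and the `status` value are assumptions made up for this example, and `access_combined` is used only as a familiar sourcetype; swap in the values from your own environment.

```spl
index=web sourcetype=access_combined host=web01 status=503
| stats count by source
```

Each `field=value` pair on the first line narrows what Splunk has to read before any events reach the `stats` command.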
Tip #4 - Use wildcards effectively
Avoid wildcards where you can and narrow your search by being explicit instead. If you have to use wildcards, avoid leading wildcards. These are very resource intensive because Splunk has to search every bucket looking for your matching term, then work backwards to see if it matches your pattern.
Also avoid using wildcards in the middle of a string. Splunk says it best, so here it is taken directly from the documentation:
“A search that uses a wildcard in the middle of the term returns inconsistent results because of the way in which data that contains punctuation is indexed and searched.”
If you’re interested in why, you can find more information in the documentation itself - https://docs.splunk.com/Documentation/SCS/current/Search/Wildcards
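When a wildcard really is needed, putting it at the end of the term is the friendlier option. The sketch below uses trailing wildcards only; the index `web`, the host prefix `web` and the search term `timeout` are hypothetical examples.

```spl
index=web sourcetype=access_combined host=web* timeout*
```

The patterns to steer clear of would be a leading wildcard such as `*timeout`, or a wildcard in the middle of a term such as `time*out`.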
Tip #5 - The job inspector is your friend
Throughout this blog there have been images of the total time taken for each search job, taken from the job inspector, but you can find loads more detail in there. The part we’ll focus on for this blog is the execution costs dropdown. Here we can find the most time-consuming components of your search so you can address them and improve performance.