How to Format JSON Data Ready for Splunk

Somerford Blog

Splunk is fantastic at receiving structured data in any format and then making sense of it for output to management and technicians alike, so most Splunk ingesting blogs are in the format, "How do I configure Splunk to work with … files". However, in this case, I have been asked, "hey, our developers want to set up their app logging to use JSON - what is the best JSON log format for easier Splunk searching?"

Splunk can make sense of any structured logs and it does include a special setup for JSON, in that it even has a default JSON sourcetype called "_json" (which you should not use - more on that later), but it is not all plain-sailing and I hope that this blog will help you to get your JSON log ingestion running smoothly and quickly.

A JSON Primer

JSON stands for JavaScript Object Notation and it is commonly used as a method for storing objects in JavaScript (also can be used with other computer languages) and transferring data between client and server. JSON files are made up of plain text, but JSON has its own way of providing a structure for data which is readily extensible and is very easy to read (with little explanation). More details about what JSON is are here. However, the most important things to know about how JSON objects are structured are as follows:

You Might Also Like:

5 Tips to Help you Optimise your Splunk Searches

• Make sure that each event is in its own JSON Object, i.e. starting and ending with curl brackets. This will make Splunk recognise that the event is JSON automatically and show the WHOLE of the event as JSON.
• Make sure that the log has the same structure (i.e. do not mix JSON with key-value pairs or Syslog in the same sourcetype). This will reduce complexity enormously.
• Add the index date/time field at the beginning (not the middle or end) of the event (i.e. the first key-value pair WITHIN THE JSON object) and NOT outside of the event. This makes it easier for humans and Splunk to find the date. So this is good:

How to Format JSON Data Ready for Splunk #8

• NOTE: As a test of this date placement recommendation in regex101, when it searched for the TIME_PREFIX at the start of the event, it took RegEx 13 steps to find the date, but when I moved the TIME_PREFIX to the end of the event, it took between 49 and 116 steps (depending upon complexity of REGEX), making this Aggregation Queue operation relatively expensive, so this optimisation is valid.
• It is not recommended that your events have duplicate value key names, which can make searching of data difficult and JSON requires that all keys are unique. Instead, for events that contain multiple keys with the same name and purpose, use JSON arrays (as mentioned above), so this is good practice:

1. Firstly, you can add extracted fields to the sourcetype and make them CIM-compliant, if required. For example, if I have two apps that generate JSON log data and one has one field included (such as “errortype”), but the other has a different field included (such as “successful_completion”) and they both are defined as using the _json sourcetype, then the process of finding the field to do an operation on is going to be much more complex than if the logs are differentiated with their custom sourcetypes, as the field extractions (as an example) can be attached to the sourcetypes, rather than the source, etc.
2. Secondly, if you use a custom sourcetype, searching will be easier because your search can then just look for the sourcetype, rather than searching on events with the _json sourcetype and then the source in each search.
3. And lastly, the _json sourcetype definition includes indexed extractions which we want to avoid. In most cases, we do not want to use Indexed Extractions because they can greatly increase the size (possible doubling or tripling?) of the index on the disk because all of the values in the event are kept in the index with the raw data. The only reason why we want to use Indexed Extractions is if the data is constantly being searched using the fields (using the double colon operator “::”, which is inflexible) rather than using the inbuilt fields and regular searches are way too slow.

More Resources like this one:

Splunk Security Essentials Deep Dive & Tutorial—Splunk for Cybersecurity—Threat Detection & Response

How to Create Advanced Splunk Dashboards, Panels and Reports — Creating Management-Ready Dashboards

Have a Query Relating to JSON?

Get in touch and we'd be happy to support you!

Cookie	Duration	Description
ADRUM_BT1	past	This cookie is used to optimize the visitor experience on the website by detecting errors on the website and share the information to support staff.
ADRUM_BTa	past	This cookie is used to optimize the visitor experience on the website by detecting errors on the website and share the information to support staff.

Cookie	Duration	Description
_ga	2 years	The _ga cookie, installed by Google Analytics, calculates visitor, session and campaign data and also keeps track of site usage for the site's analytics report. The cookie stores information anonymously and assigns a randomly generated number to recognize unique visitors.
_gat_gtag_UA_1170872_23	1 minute	Set by Google to distinguish users.
_gat_gtag_UA_99925054_1	1 minute	Set by Google to distinguish users.
_gid	1 day	Installed by Google Analytics, _gid cookie stores information on how visitors use a website, while also creating an analytics report of the website's performance. Some of the data that are collected include the number of visitors, their source, and the pages they visit anonymously.
CONSENT	16 years 2 months 24 days 11 hours 26 minutes	YouTube sets this cookie via embedded youtube-videos and registers anonymous statistical data.
vuid	2 years	Vimeo installs this cookie to collect tracking information by setting a unique ID to embed videos to the website.

Cookie	Duration	Description
IDE	1 year 24 days	Google DoubleClick IDE cookies are used to store information about how the user uses the website to present them with relevant ads and according to the user profile.
test_cookie	15 minutes	The test_cookie is set by doubleclick.net and is used to determine if the user's browser supports cookies.
VISITOR_INFO1_LIVE	5 months 27 days	A cookie set by YouTube to measure bandwidth that determines whether the user gets the new or old player interface.
YSC	session	YSC cookie is set by Youtube and is used to track the views of embedded videos on Youtube pages.
yt-remote-connected-devices	never	YouTube sets this cookie to store the video preferences of the user using embedded YouTube video.
yt-remote-device-id	never	YouTube sets this cookie to store the video preferences of the user using embedded YouTube video.
yt.innertube::nextId	never	This cookie, set by YouTube, registers a unique ID to store data on what videos from YouTube the user has seen.
yt.innertube::requests	never	This cookie, set by YouTube, registers a unique ID to store data on what videos from YouTube the user has seen.

Cookie	Duration	Description
cookie-test	past	No description
cookielawinfo-checkbox-functional	1 year	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
guest	1 month 1 hour	No description available.
jcm	past	No description
jcmc	past	No description
JOTFORM_SESSION	1 month	No description available.
SameSite	past	No description available.
theme	1 month 1 hour	No description available.
userReferer	1 month 1 hour	No description available.

How to Format JSON Data Ready for Splunk

A JSON Primer

JSON Formatting for Splunk

How to Configure Splunk to Read Your JSON Log

Worked Example

Configuration for props.conf

Useful JSON Tools

More Resources like this one:

Splunk Security Essentials Deep Dive & Tutorial—Splunk for Cybersecurity—Threat Detection & Response

How to Create Advanced Splunk Dashboards, Panels and Reports — Creating Management-Ready Dashboards

Have a Query Relating to JSON?