Splunk Enterprise

Transform ingested HEC JSON logs into regular logs

PT_crusher
Explorer

Looking for props.conf / transforms.conf configuration guidance.

The aim is to search logs from an HTTP Event Collector the same way we search regular logs. We don't want to be searching JSON on the search heads.

We're in the process of migrating from Splunk Forwarders to logging-operator in k8s. The thing is, the Splunk Forwarder reads log files and uses standard indexer discovery, whereas logging-operator collects stdout/stderr and must output to an HEC endpoint, meaning the logs arrive at the heavy forwarder as JSON.

We want to keep using Splunk the same way we have over the years and avoid adapting alerts, dashboards, etc. to the new JSON source.

OLD CONFIG AIMED AT THE INDEXERS (with the following config we get environment/site/node/team/pod as extracted fields)

 

[vm.container.meta]
# source: /data/nodes/env1/site1/host1/logs/team1/env1/pod_name/localhost_access_log.log
CLEAN_KEYS = 0
REGEX = \/.*\/.*\/(.*)\/(.*)\/(.*)\/.*\/(.*)\/.*\/(.*)\/
FORMAT = environment::$1 site::$2 node::$3 team::$4 pod::$5
SOURCE_KEY = MetaData:Source
WRITE_META = true
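
(For reference, a stanza like this is applied from props.conf on the indexing tier via a TRANSFORMS- class; a minimal sketch, where the sourcetype name vm:container is a placeholder rather than our real one:)

# props.conf on the indexers / heavy forwarder
[vm:container]
TRANSFORMS-containermeta = vm.container.meta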

 


SAMPLE LOG USING logging-operator

 

{
  "log": "ts=2024-10-15T15:22:44.548Z caller=scrape.go:1353 level=debug component=\"scrape manager\" scrape_pool=kubernetes-pods target=http://1.1.1.1:8050/_api/metrics msg=\"Scrape failed\" err=\"Get \\\"http://1.1.1.1:8050/_api/metrics\\\": dial tcp 1.1.1.1:8050: connect: connection refused\"\n",
  "stream": "stderr",
  "time": "2024-10-15T15:22:44.548801729Z",
  "environment": "env1",
  "node": "host1",
  "pod": "pod_name",
  "site": "site1",
  "team": "team1"
}

 


sainag_splunk
Splunk Employee

The only way to do exactly that would be to use the /services/collector/raw endpoint.

That said, while I understand the desire to maintain your existing Splunk setup, I would advise against using the raw endpoint (/services/collector/raw) to transform the JSON logs back into a regular log format: it would unnecessarily increase system load and complexity.

Instead, the best practice is to use the existing event endpoint (/services/collector/event) for ingesting data into Splunk. This is optimized for handling structured data like JSON and is more efficient.
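
For reference, an event-endpoint payload carries the log line in the "event" key and can carry metadata in a top-level "fields" object, which Splunk stores as indexed fields (conceptually similar to what WRITE_META = true gave you at index time). A minimal sketch, where the sourcetype is a placeholder and whether logging-operator can be configured to emit this shape is something to verify on your side:

{
  "sourcetype": "kube:container",
  "event": "ts=2024-10-15T15:22:44.548Z caller=scrape.go:1353 level=debug ...",
  "fields": {
    "environment": "env1",
    "site": "site1",
    "node": "host1",
    "team": "team1",
    "pod": "pod_name"
  }
}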

I recommend adjusting your alerts and dashboards to work with the new JSON structure from logging-operator. While this may require some initial effort, it's a more sustainable approach in the long run:

  1. Update your search queries to use spath, or enable automatic JSON field extraction with KV_MODE = json in props.conf (see the sketch after this list).
  2. Modify dashboards to reference the new JSON field names.
  3. Adjust alerts to use the appropriate JSON fields and structure.
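
A rough sketch of what items 1 and 2 could look like (the sourcetype and index names below are placeholders, not taken from your environment):

# props.conf on the search heads: automatic search-time JSON extraction
[kube:container]
KV_MODE = json

# example search: the metadata that used to come from the source path
# is now available as JSON fields
index=k8s sourcetype=kube:container environment=env1 site=site1 team=team1

# or, without KV_MODE, extract explicitly in the search
index=k8s sourcetype=kube:container | spath | search environment=env1 team=team1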





PickleRick
SplunkTrust

I don't think that's the issue here. The same payload sent to the /raw endpoint would end up looking the same. It's the source formatting the data differently than before.
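
To illustrate the point with a sketch (not tested, and as noted below logging-operator cannot use this endpoint anyway): the raw endpoint ingests the request body essentially as-is, so posting the same JSON document to /services/collector/raw would still leave that JSON document in _raw.

# body POSTed to /services/collector/raw (hypothetical)
{"log": "ts=2024-10-15T15:22:44.548Z ...", "stream": "stderr", "environment": "env1", ...}

# what lands in _raw on the indexer: the same JSON document, unchanged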


PT_crusher
Explorer

The raw endpoint is not an option because it is not supported by logging-operator.
