Deployment Architecture

search heads failing because of huge knowledge bundles

soumyasaha25
Contributor

currently half of my searchheads are shutdown (auto shutdown due to issues within Splunk) and the remaining are not able to query the indexers
The problem is caused by a large knowledge bundle.
when i checked the .bundle files on the SHs, it is a huge (~340 MB) file with what looks like a huge python code.
i have maxBundleSize set to 2048(which is the default)
i have a blacklist in distsearch.conf which is as below:

[replicationSettings]
maxBundleSize = 2048

[replicationBlacklist]
<name for bin directories> = (.../bin/*)
<name for InstallDirectories> = (.../install/*)
<name for AppServerDirectories> = (.../appserver/*)
<name for allAppUIDirectories> = (.../default/data/ui/*)
<name for allOldDefaultDirectories> = (.../default.old.*)

My questions is: is there any way to check what files/apps are included in this bundle that is causing issues and if those items are required or can be excluded.

0 Karma
1 Solution

jwelch_splunk
Splunk Employee
Splunk Employee

mkdir -p /tmp/support
tar xvf /opt/splunk/var/run/blah.bundle -C /tmp/support
cd /tmp/support
du -h --max-depth=1 |sort -hr |more

Walk it out most likley be in apps/*/lookups

You can blacklist any lookup that is not:

  1. Automatic Lookup "props.conf"
  2. any csv that is not being searched with a |lookup . because that is a remote lookup...If you are using | lookup local=true you could blacklist it.

And if you blacklist it..... and you get an error after the fact. Unblacklist it.

Don't forget that you have a Transmit side and a Receive side

In your case the transmit side is your SH and the distsearch.conf setting maxBundleSize applies
however
the receive side is your indexers... and that setting is server.conf
[httpServer]
max_content_length = blah

And depending on your version it might be 800mb or 2gb but written as 2147483648 in the 2gb example.

Hope this helps.

View solution in original post

0 Karma

jwelch_splunk
Splunk Employee
Splunk Employee

mkdir -p /tmp/support
tar xvf /opt/splunk/var/run/blah.bundle -C /tmp/support
cd /tmp/support
du -h --max-depth=1 |sort -hr |more

Walk it out most likley be in apps/*/lookups

You can blacklist any lookup that is not:

  1. Automatic Lookup "props.conf"
  2. any csv that is not being searched with a |lookup . because that is a remote lookup...If you are using | lookup local=true you could blacklist it.

And if you blacklist it..... and you get an error after the fact. Unblacklist it.

Don't forget that you have a Transmit side and a Receive side

In your case the transmit side is your SH and the distsearch.conf setting maxBundleSize applies
however
the receive side is your indexers... and that setting is server.conf
[httpServer]
max_content_length = blah

And depending on your version it might be 800mb or 2gb but written as 2147483648 in the 2gb example.

Hope this helps.

0 Karma

soumyasaha25
Contributor

when i ran the commands as suggested by you, i got the below results, i was of the view that
2.6G ./apps
2.6G .
328K ./system
56K ./users
48K ./kvstore_s_SA-
is it safe to blacklist the apps directory entirely, we have a huge dependency on the TA and app for AWS. on further troubleshooting i found that the lookup aws_description.csv is taking up close to 2.3 GB. is it safe to blacklist the aws_description.csv lookup, since we would require aws description data for alerts and reports.
In case i need to blacklist, will the below setting work

[replicationBlacklist]
<name for lookup directories> = (.../lookups/...)
<name for bin and jardirectories> = (.../(bin|jars)/...)
0 Karma

jwelch_splunk
Splunk Employee
Splunk Employee

I deleted my last post because I missed your part about the aws_description.csv being 2.3 GB.

As I mentioned earlier..... You need to find out if that file is being used as part of an automatic lookup in a props statement. If it is not blacklist the file. If you get errors after the fact un-blacklist it.

And figure out why that csv is so big. You might want to file a support case and work with an AWS SME.

Bottom line is if the lookup is being performed on the SH you don't need the CSV in the bundle.

If you find you do need it, then you need to increase your maxBundleSize and max_content_length, but I would suspect something is wrong if that file is 2.3 gb

0 Karma

soumyasaha25
Contributor

Thanks a lot for your response.

0 Karma
Get Updates on the Splunk Community!

Splunk Enterprise Security 8.0.2 Availability: On cloud and On-premise!

A few months ago, we released Splunk Enterprise Security 8.0 for our cloud customers. Today, we are excited to ...

Logs to Metrics

Logs and Metrics Logs are generally unstructured text or structured events emitted by applications and written ...

Developer Spotlight with Paul Stout

Welcome to our very first developer spotlight release series where we'll feature some awesome Splunk ...