How to move the data to colddb after 30 days?

VatsalJagani
SplunkTrust

I want to move data from hot/warm buckets to colddb once it is 30 days old (colddb will ultimately live in a different storage location).

I've checked the indexes.conf definition:

maxHotSpanSecs = <positive integer>
* Upper bound of timespan of hot/warm buckets, in seconds.

I tried changing the above setting, but I don't see data being moved to the new location (colddb) after pushing the configuration.

I found the question below on the community, but it dates from 2014, so I want to confirm the approach before applying it to Production.

https://community.splunk.com/t5/Getting-Data-In/How-send-indexed-data-older-than-3-months-to-colddb-...

maxHotSpanSecs = 86400
maxHotBuckets = 3
maxWarmDBCount = 30

What configuration should I apply to achieve the above requirement?

Will Splunk automatically move buckets to colddb on restart, or do we need to perform any manual steps?

VatsalJagani
SplunkTrust

There are only two options that Splunk provides to specify when Splunk should move buckets from Warm Bucket to Cold Bucket.

homePath.maxDataSizeMB

  • Specifies the maximum size of 'homePath' (which contains hot and warm buckets).
  • If this size is exceeded, splunkd moves buckets with the oldest value of latest time (for a given bucket) into the cold DB until homePath is below the maximum size.

maxWarmDBCount

  • The maximum number of warm buckets.
  • Default: 300

In my case, setting maxWarmDBCount on 3 large indexes solved the storage issue.
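
As an illustration only, the two settings might sit together in one indexes.conf stanza as sketched below. The index name `myindex` and both values are assumptions chosen for the example, not recommendations, and the other required settings (homePath, coldPath, thawedPath) are omitted for brevity.

```ini
# indexes.conf (sketch; index name and values are hypothetical)
[myindex]
# Cap the combined size of hot and warm buckets under homePath, in MB.
# When the cap is exceeded, the oldest warm buckets roll to cold.
homePath.maxDataSizeMB = 200000
# Cap the number of warm buckets; the oldest rolls to cold once exceeded.
maxWarmDBCount = 30
```

Whichever limit is reached first triggers the roll, so neither setting expresses an age in days.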

myudkowsky
Communicator

I'm surprised by this answer, specifically the statement that there "are only two options that Splunk provides to specify when Splunk should move buckets from Warm Bucket to Cold Bucket."

Perhaps you can explain a bit further? As I read the specifications, there are other methods.

The original question mentions

maxHotSpanSecs = <positive integer>
* Upper bound of timespan of hot/warm buckets, in seconds.

as well as

maxWarmDBCount = <nonnegative integer>
* The maximum number of warm buckets.

Now if each warm bucket has a max span of 1 day, and we limit the count of warm buckets to 30, then warm buckets over 30 days old will be placed into the cold db.

Of course we may run into issues, as warm buckets are generated for a variety of reasons, such as a Splunk restart (current hot becomes warm) or hot idle (again, current hot becomes warm).

But it seems to me that this would be a reasonable proxy, especially if the user were willing to have, e.g., 1.5x more files than needed.

If you could explain why this scheme would not work, that would help me understand.

VatsalJagani
SplunkTrust

@myudkowsky - Sorry for the late reply.

It does not work for the reason you mentioned: a bucket's roll from hot to warm does not depend on maxHotSpanSecs alone. It also depends on bucket size and on Splunk restarts.

  • The scheme breaks if you restart Splunk more often than expected. For example, after 15 Splunk restarts you have only about 15 days' worth of data in warm.
  • If you ingest a large volume of data, for example 10 GB per day, Splunk creates multiple warm buckets per day (more than 10 buckets per day). By that calculation, you would have less than 3 days of data in warm.

You can set other parameters, but there is no guarantee in terms of time. Those two parameters do set a limit, but it is a maximum of 30 days; you cannot set a minimum.

As I mentioned in my previous answer, you can of course use maxWarmDBCount to approximate this and to keep the home path storage from filling up. I have personally used it multiple times, but there is no guarantee in terms of time.
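
To make the approximation concrete, here is a minimal sketch of the count-based approach with some headroom for extra buckets created by restarts, in the spirit of the 1.5x margin suggested earlier in the thread. The index name and the value 45 (30 x 1.5) are assumptions for illustration; this gives a rough bound on warm retention, not a guarantee.

```ini
# indexes.conf (sketch; index name and values are hypothetical)
[myindex]
# Limit each hot bucket to roughly one day of event time.
maxHotSpanSecs = 86400
# 30 daily buckets plus ~50% headroom for buckets rolled early by
# restarts or size limits. High ingest volume can still use up this
# count in fewer than 30 days.
maxWarmDBCount = 45
```

With quiet ingest and no restarts, the same headroom lets warm hold more than 30 days, which is the cost of using a count as a proxy for time.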

 

I hope this answers your query. An upvote/karma would be appreciated!

myudkowsky
Communicator

@VatsalJagani Thanks, precisely my thinking on the topic. I appreciate your swift response - it was waiting for me when I got back to my office.

VatsalJagani
SplunkTrust

@richgalloway - I've discussed this with some other Splunk admins and learned exactly what you mentioned: there is no perfect way to limit the data in warm buckets by time.

In my case, though, the issue was that storage was filling up. For that, I learned of an alternative option: volumes.

Here is how:

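# Volume definitions
[volume:hot]
path = /hotstorage/splunk
# Cap on total data in this volume, in MB; when exceeded, the oldest warm buckets roll to cold
maxVolumeDataSizeMB = 256000

[volume:cold]
path = /coldstorage/splunk

# Index definition (the volume names must match the stanzas above)
[myindex]
coldPath = volume:cold/myindex/colddb
homePath = volume:hot/myindex/db
# thawedPath cannot reference a volume, so it uses a plain path
thawedPath = /coldstorage/splunk/myindex/thaweddb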

richgalloway
SplunkTrust

If your problem is resolved, then please click an "Accept as Solution" button to help future readers.

---
If this reply helps you, Karma would be appreciated.

richgalloway
SplunkTrust

Don't forget about warm buckets.  Hot buckets generally roll to warm before they roll to cold.  When Splunk restarts, all hot buckets become warm buckets.

While there are time constraints on how long a bucket remains hot and when a bucket is frozen, there are no time constraints on warm buckets.  Only size and count control when a warm bucket becomes a cold bucket.
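
As a rough map of that distinction, the sketch below groups indexes.conf settings by the transition they affect. The index name and all values are placeholders assumed for illustration; check the indexes.conf spec for your Splunk version before relying on any of them.

```ini
# indexes.conf (sketch; index name and values are hypothetical)
[myindex]
# Hot -> warm: time-based controls exist.
maxHotSpanSecs = 86400
maxHotIdleSecs = 3600
# Warm -> cold: only count and size controls.
maxWarmDBCount = 300
homePath.maxDataSizeMB = 200000
# Cold -> frozen: a time-based control exists (here 90 days).
frozenTimePeriodInSecs = 7776000
```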
