Free feature
Compaction
A dataset is an append-only immutable log of data that would, left unchecked, grow forever. This problem is partly mitigated as entities are only written to the log if they are new or different (based on a content hash comparison) from the most recent version of that entity. To supplement this, and ensure that a dataset does not consume all available disk space, a retention policy can be defined. A retention policy describes the general way in which the log should be compacted. The compaction feature in Sesam controls such policies.
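The content-hash check described above can be illustrated with a minimal sketch. This is not Sesam's implementation: the `DatasetLog` class, the `content_hash` helper, and the choice of SHA-256 over canonical JSON are all assumptions made for illustration.

```python
import hashlib
import json


def content_hash(entity: dict) -> str:
    """Hash an entity's content independently of key order (hypothetical helper)."""
    canonical = json.dumps(entity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DatasetLog:
    """Toy append-only log; not Sesam's actual storage format."""

    def __init__(self) -> None:
        # Each entry is (entity_id, content hash, entity).
        self.entries: list[tuple[str, str, dict]] = []
        self.latest_hash: dict[str, str] = {}

    def append(self, entity_id: str, entity: dict) -> bool:
        """Append only if the entity is new or differs from its latest version."""
        digest = content_hash(entity)
        if self.latest_hash.get(entity_id) == digest:
            return False  # unchanged: nothing is written to the log
        self.entries.append((entity_id, digest, entity))
        self.latest_hash[entity_id] = digest
        return True
```

Writing the same entity twice in a row leaves the log untouched, so only real changes make it grow.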
The default policy is to keep two versions of every entity. This is the minimum number of versions to keep in order to make dependency tracking work. A time-based policy is also available, allowing you to say how old an entity can be before it becomes a candidate for compaction. If sink compaction is disabled, the dataset is automatically compacted once every 24 hours.
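As a rough illustration of how a version-count policy and a time-based policy might interact, the sketch below keeps the newest versions of each entity and spares anything younger than an age threshold. The `Entry` type, the `compact` function, its parameter names, and the way the two policies are combined are assumptions, not Sesam's actual algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    offset: int        # position in the log
    entity_id: str
    timestamp: float   # seconds since epoch when the version was written


def compact(entries, keep_versions=2, min_age_seconds=0.0, now=0.0):
    """Return the entries that survive compaction, in log order."""
    versions_seen = {}
    survivors = []
    for entry in reversed(entries):  # walk from newest to oldest
        rank = versions_seen.get(entry.entity_id, 0)
        versions_seen[entry.entity_id] = rank + 1
        among_newest = rank < keep_versions
        too_young = (now - entry.timestamp) < min_age_seconds
        if among_newest or too_young:
            survivors.append(entry)
    survivors.reverse()
    return survivors
```

With `keep_versions=2` and no age threshold, the third-newest and older versions of an entity are dropped; raising `min_age_seconds` keeps those older versions around until they have aged past the threshold.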
Note
Compaction will only be performed up to the lowest offset for which there exists a pipe doing dependency tracking on the dataset. Each pipe doing dependency tracking keeps a tracking offset on the dataset so that it knows which entities to perform dependency tracking for. Compaction cannot go beyond this tracking offset, which ensures that those pipes do not fall out of sync. If compaction did not hold off, we could not guarantee that the output of those pipes is correct.
Be aware that disabled pipes also hold off compaction. If a pipe is to be disabled for a long time, it is better to remove the pipe or, alternatively, comment out its hops.
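The hold-off rule can be sketched as taking the minimum over all tracking offsets, including those of disabled pipes. The function name, the pipe names, and the treatment of the limit as an exclusive bound are assumptions for illustration only.

```python
def compaction_ceiling(log_end_offset: int, tracking_offsets: dict[str, int]) -> int:
    """Return the offset that compaction must stay below (assumed exclusive).

    tracking_offsets maps each dependency-tracking pipe, enabled or
    disabled, to its tracking offset on the dataset.
    """
    if not tracking_offsets:
        return log_end_offset  # no pipe holds compaction back
    return min(log_end_offset, min(tracking_offsets.values()))
```

For example, with a log of 1000 entries and offsets `{"pipe-a": 900, "pipe-b-disabled": 120}`, the ceiling is 120: the disabled pipe alone keeps 880 entries out of reach of compaction.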
Properties
| Property | Type | Description | Default | Req |
|---|---|---|---|---|
|  | Boolean | If … |  | No |
|  | Boolean | If … |  | No |
|  | Integer | The number of unique versions of an entity to keep around. The default is 2. Warning: If the value is less than … | 2 | No |
|  | Integer | Specifies the threshold for how old entities must be before they are considered for compaction. This property is usually used when you want to keep entities around for a certain time. |  | No |
|  | Integer | Same as … |  | No |
|  | Float | The growth factor required for the automatically scheduled compaction to kick in. Uses the minimum value of … |  | No |
|  | Float | Specifies the sink compaction interval. If this value is zero, sink compaction will run every time the pipe runs. If it is larger than zero, sink compaction will only run if at least … |  | No |
|  | Number | Enables TTL compaction for deletes if set. The value determines the number of hours until a deleted entity is considered for compaction. When the entity is compacted away, all versions of the entity will be removed from the database. |  | No |
|  | Number | Determines the number of seconds that the pipe is allowed to spend on the TTL compaction process. If the pipe times out, the compaction process will continue from where it last stopped the next time the pipe runs. | 60 | No |
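The last two properties describe a TTL pass with a time budget that resumes where it stopped. The sketch below shows one way such a pass could behave. The `ttl_compact` function, its cursor-based resume mechanism, and the in-memory `store` and `deleted_at` dictionaries are hypothetical; only the units (hours for the TTL, seconds for the budget) and the default budget of 60 come from the table.

```python
import time


def ttl_compact(store, deleted_at, ttl_hours, max_seconds=60,
                resume_after=None, now=None, clock=time.monotonic):
    """Remove every version of entities deleted at least ttl_hours ago.

    store:      entity_id -> list of versions (all are dropped together)
    deleted_at: entity_id -> deletion time in seconds since epoch
    Returns (finished, cursor). If finished is False, pass cursor back
    as resume_after on the next run to continue from that point.
    """
    now = time.time() if now is None else now
    deadline = clock() + max_seconds
    cursor = resume_after
    for entity_id in sorted(deleted_at):
        if resume_after is not None and entity_id <= resume_after:
            continue  # already handled in an earlier run
        if clock() >= deadline:
            return (False, cursor)  # time budget exhausted
        if now - deleted_at[entity_id] >= ttl_hours * 3600:
            store.pop(entity_id, None)  # drop all versions at once
        cursor = entity_id
    return (True, None)
```

Returning a cursor rather than restarting from the beginning means that a run which times out does not repeat work already done, which matches the "continue from where it last stopped" behaviour described in the table.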