Free feature
Compaction
A dataset is an append-only immutable log of data that would, left unchecked, grow forever. This problem is partly mitigated as entities are only written to the log if they are new or different (based on a content hash comparison) from the most recent version of that entity. To supplement this, and ensure that a dataset does not consume all available disk space, a retention policy can be defined. A retention policy describes the general way in which the log should be compacted. The compaction feature in Sesam controls such policies.
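The write rule described above (append only when an entity is new or its content hash differs from the most recent version) can be modelled with a minimal sketch. This is an illustration, not Sesam's implementation: the `AppendOnlyLog` class, the `content_hash` helper, the use of SHA-256 over sorted JSON, and the `_id` field name are all assumptions made for the example.

```python
import hashlib
import json


def content_hash(entity):
    """Hash the entity deterministically (keys are sorted before hashing)."""
    payload = json.dumps(entity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AppendOnlyLog:
    """Hypothetical stand-in for a dataset; names and layout are assumptions."""

    def __init__(self):
        self.entries = []      # (entity id, hash, entity) in write order
        self.latest_hash = {}  # entity id -> hash of its most recent version

    def append(self, entity):
        """Append the entity only if it is new or differs from its latest version."""
        entity_id = entity["_id"]
        digest = content_hash(entity)
        if self.latest_hash.get(entity_id) == digest:
            return False  # unchanged: nothing is written to the log
        self.entries.append((entity_id, digest, entity))
        self.latest_hash[entity_id] = digest
        return True


log = AppendOnlyLog()
results = [
    log.append({"_id": "a", "name": "Ada"}),     # new entity: written
    log.append({"_id": "a", "name": "Ada"}),     # identical content: skipped
    log.append({"_id": "a", "name": "Ada L."}),  # changed content: written
]
```

Even with this rule in place the log still grows with every real change, which is why a retention policy is needed on top of it.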
The default policy is to keep two versions of every entity. This is the minimum number of versions to keep in order to make dependency tracking work. A time-based policy is also available, allowing you to say how old an entity can be before it becomes a candidate for compaction. If sink compaction is disabled, the dataset is automatically compacted once every 24 hours.
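As a rough illustration of the version-based policy, the sketch below keeps only the newest versions of each entity in a small in-memory log. The tuple layout `(offset, entity_id, payload)` and the function name `compact_keep_versions` are assumptions made for the example; the time-based policy is not modelled here.

```python
from collections import defaultdict


def compact_keep_versions(log, keep_versions=2):
    """Return the log with only the newest `keep_versions` entries per entity id.

    `log` is a list of (offset, entity_id, payload) tuples in ascending
    offset order. This is a conceptual model, not Sesam's implementation.
    """
    if keep_versions < 1:
        raise ValueError("keep_versions must be at least 1")

    offsets_by_id = defaultdict(list)
    for offset, entity_id, _payload in log:
        offsets_by_id[entity_id].append(offset)

    keep = set()
    for offsets in offsets_by_id.values():
        keep.update(offsets[-keep_versions:])  # newest versions are last

    return [entry for entry in log if entry[0] in keep]


log = [
    (0, "a", "v1"),
    (1, "b", "v1"),
    (2, "a", "v2"),
    (3, "a", "v3"),
    (4, "b", "v2"),
]
compacted = compact_keep_versions(log, keep_versions=2)
# Offset 0 (the oldest of three versions of "a") is dropped; "b" keeps both.
```

Note that surviving entries keep their original offsets in this model, so consumers that remember an offset are not disturbed by the removal of older entries.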
Note
Compaction will only be performed up to the lowest offset for which there exists a pipe doing dependency tracking on the dataset. Each pipe doing dependency tracking keeps a tracking offset on the dataset so that it knows which entities to perform dependency tracking for. It is this tracking offset that compaction cannot go beyond. This is done so that those pipes do not fall out of sync. If compaction did not hold off, we could not guarantee that the output of those pipes is correct.
Be aware that disabled pipes also hold off compaction. If a pipe is to be disabled for a long time, it is better to remove the pipe or, alternatively, comment out the hops.
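The effect of tracking offsets on compaction can be sketched as taking a minimum. The function name `compaction_limit`, the pipe names, and the choice to treat the limit as an exclusive upper bound are assumptions for illustration only; the text above does not specify whether the boundary offset itself is included.

```python
def compaction_limit(log_end_offset, tracking_offsets):
    """Return the offset that compaction must not go beyond.

    `tracking_offsets` maps pipe id -> tracking offset for every pipe doing
    dependency tracking on the dataset, including disabled pipes.
    """
    if not tracking_offsets:
        return log_end_offset  # nothing holds compaction back
    return min(log_end_offset, min(tracking_offsets.values()))


offsets = {"pipe-a": 120, "pipe-b": 45, "disabled-pipe": 10}
limit_with_disabled = compaction_limit(200, offsets)

# Removing the long-disabled pipe lets compaction advance further.
del offsets["disabled-pipe"]
limit_without_disabled = compaction_limit(200, offsets)
```

In this example the disabled pipe pins the limit at 10; once it is removed, the limit moves up to 45, the next lowest tracking offset.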
Properties
| Property | Type | Description | Default | Req |
|---|---|---|---|---|
| | Boolean | If | | No |
| | Boolean | If | | No |
| | Integer | The number of unique versions of an entity to keep around. The default is Warning If the value is less than | | No |
| | Integer | Specifies the threshold for how old entities must be before they are considered for compaction. This property is usually used when you want to keep entities around for a certain time. | | No |
| | Integer | Same as | | No |
| | Float | The growth factor required for the automatically scheduled compaction to kick in. Uses the minimum value of | | No |
| | Float | Specifies the sink compaction interval. If this value is zero, sink compaction will run every time the pipe runs. If it is larger than zero, sink compaction will only run if at least | | No |
| | Number | Enables TTL compaction for deletes if set. The value determines the number of hours until a deleted entity is considered for compaction. When the entity is compacted away, all versions of the entity will be removed from the database. | | No |
| | Number | Determines the number of seconds that the pipe is allowed to spend on the TTL compaction process. If the pipe times out, the compaction process will continue from where it last stopped the next time the pipe runs. | 60 | No |
| | Boolean | If | | No |
| | Number | Determines the number of seconds that the pipe is allowed to spend on the retraction process. If the pipe times out, the retraction process will continue from where it last stopped the next time the pipe runs. | 60 | No |
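The two timeout properties in the table describe a process that stops when its time budget is spent and resumes from the same place on the next pipe run. A minimal sketch of that pattern follows. The function `run_with_budget`, its parameters, and the idea of a simple integer cursor are assumptions for illustration; how Sesam actually stores its resume position is not stated here. A fake clock is injected so the example behaves deterministically.

```python
import itertools
import time


def run_with_budget(items, start_index, budget_seconds, process, clock=time.monotonic):
    """Process items from `start_index` until done or the time budget is spent.

    Returns the index to resume from on the next run (len(items) when finished).
    """
    deadline = clock() + budget_seconds
    index = start_index
    while index < len(items) and clock() < deadline:
        process(items[index])
        index += 1
    return index


# Fake clock for the demo: every call advances "time" by one second.
fake_clock = itertools.count().__next__

items = ["e1", "e2", "e3", "e4", "e5"]
done = []

# First pipe run: the 3-second budget runs out after two items.
first_cursor = run_with_budget(items, 0, 3, done.append, clock=fake_clock)

# Second pipe run: continues from where the first run stopped.
second_cursor = run_with_budget(items, first_cursor, 3, done.append, clock=fake_clock)
```

The caller only has to persist the returned cursor between runs; no item is processed twice and none is skipped.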
Retract
When compaction.retract is enabled and an output entity has $retract: true, all earlier
versions of that entity are permanently removed from the sink dataset while the current version
is retained. Deletion state is unaffected. The operation is idempotent.
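The retract behaviour described above can be modelled on a small in-memory log. This is a conceptual sketch, not Sesam's implementation: the function name `prune_retracted`, the list-of-dicts log, and the `_id` field name are assumptions for the example.

```python
def prune_retracted(log):
    """Drop all earlier versions of entities whose current version has $retract: true.

    `log` is a list of entity dicts in write order. The current (latest)
    version of each entity is always kept, and entities without a retract
    flag on their current version are left untouched.
    """
    latest_index = {}
    for index, entity in enumerate(log):
        latest_index[entity["_id"]] = index

    result = []
    for index, entity in enumerate(log):
        newest = log[latest_index[entity["_id"]]]
        is_current = index == latest_index[entity["_id"]]
        if newest.get("$retract") is True and not is_current:
            continue  # earlier version of a retracted entity: removed
        result.append(entity)
    return result


log = [
    {"_id": "a", "v": 1},
    {"_id": "b", "v": 1},
    {"_id": "a", "v": 2},
    {"_id": "b", "v": 2},
    {"_id": "a", "v": 3, "$retract": True},
]
pruned = prune_retracted(log)
pruned_again = prune_retracted(pruned)  # running it again changes nothing
```

Entity `a` ends up with a single version while `b` keeps both of its versions, and a second pass over the pruned log returns the same result, which matches the idempotency described above.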
Warning
Retract is irreversible. Pruned versions cannot be recovered. Enabling retract will override the
compaction.keep_versions setting on a per-entity basis.
Only the pipe’s sink dataset is affected. Upstream datasets and external consumers are unchanged.
$retract propagates like any other field but may be dropped by merge sources or emit_children. Downstream pipes that must honour the retract need compaction.retract enabled and $retract explicitly included in their output.