Dataset source

The dataset source is one of the most commonly used sources in a Sesam installation. It presents a stream of entities from a dataset stored in Sesam. Its configuration is simple and looks like this:

Prototype

{
    "type": "dataset",
    "dataset": "id-of-dataset",
    "include_previous_versions": false,
    "include_replaced": true,
    "supports_signalling": false
}

Properties

Property

Type

Description

Default

Req

dataset

String

A dataset id

Yes

subset

Array

An eq DTL expression where the left-hand side is the index expression and the right-hand side is the value that represents the subset. If a subset is specified, only entities in that subset are read from the source.

Example: ["eq", "_S.category", "tank"]

Note

Make sure that you use indexes version 2 when you use subsets, because they support deletes. Indexes version 1 do not.

Note

eq in subsets behaves the way it does in joins.

Note

The right-hand side argument of the eq must be a literal value and not an expression. Example: use "~:foo:bar" instead of ["ni", "foo", "bar"].

No

completeness

Boolean, Dict or an Array of strings

As a Boolean: If set to true, the dataset source completeness filtering feature is enabled. This will instruct the source to only return source entities that have a _ts value that is older than or equal to the completeness timestamp value of the source dataset.

As a Dict: If set to a dict with an “expression” key, the minimum completeness value is set to the return value of the DTL expression. The expression is evaluated each time the pipe is about to start, and it must return a datetime value. The following example prevents the pipe from processing entities whose _ts value is less than seven days old:

"completeness": {
   "expression": ["datetime-plus", "day", -7, ["now"]]
}

If the DTL expression returns anything other than a datetime object, the pipe will set the minimum completeness value to “~t1970-01-01T00:00:00Z” (which will usually result in the pipe not processing any entities).

As an Array: It is also possible to use the completeness timestamp value of one or more specific upstream datasets instead of the source dataset; this is done by setting completeness to an array of the upstream dataset ids. If the array contains more than one dataset id, the smallest completeness timestamp value is used.

false

initial_completeness

Array of strings (dataset ids)

If set to a non-empty list, the source will only return source entities if the specified dataset(s) have a completeness value. The actual completeness value doesn’t matter; it only needs to be present.

require_populated_input

Boolean

If set to true, the pipe will not run unless the source dataset has been populated. The global default global_defaults.require_populated_input can be set for all pipes in the service metadata.

false

include_previous_versions

Boolean

If set to false, the dataset source will only return the latest version of any entity for any unique _id value in the dataset. This is the default behaviour.

false

include_replaced

Boolean

If set to false, the dataset source will filter out entities where the $replaced property is true. This is typically used when reading from datasets that have been produced by the merge source.

true

supports_signalling

Boolean

Flag used to enable or disable signalling support between internal pipes (dataset to dataset pipes). If enabled, a pipe run is scheduled as soon as the input dataset(s) change. It does not interrupt any already running pipes.

See global_defaults.use_signalling_internally in the service metadata section for more details.

If signalling is enabled globally, you must explicitly set supports_signalling to false on each pipe where you don’t want runs to be scheduled automatically on changes. Note that signalling is automatically disabled, unless it is explicitly enabled on the source, when the schedule interval is less than an hour or a cron expression is used.

false

if_source_empty

Enum<String>

Determines the behaviour of the pipe when the dataset source contains no entities. Normally, any previously synced entities will be deleted even if the pipe does not receive any entities from its source. If set to "fail", the pipe will automatically fail if the source returns no entities. This means that any previous entities in the pipe’s dataset are not deleted. If set to "accept", the pipe will not fail and any previously synced entities will be deleted.

The global default global_defaults.if_source_empty can be set for all pipes in the service metadata.

"accept"

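To illustrate how the properties above fit together, here is a minimal sketch of a source that reads only the subset used in the subset example and fails, rather than deleting previously synced entities, if the source returns no entities. The dataset id "vehicles" is hypothetical, and the sketch assumes that an index matching the subset expression exists on that dataset:

```json
{
    "type": "dataset",
    "dataset": "vehicles",
    "subset": ["eq", "_S.category", "tank"],
    "if_source_empty": "fail"
}
```

Properties that are left out, such as include_previous_versions and include_replaced, keep the defaults listed in the table.
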
Continuation support

See the section on continuation support for more information.

Property             Value
supports_since       true (Fixed)
is_since_comparable  true (Fixed)
is_chronological     true (Fixed)

Example configuration

The outermost object would be your pipe configuration, which is omitted here for brevity:

{
    "source": {
        "type": "dataset",
        "dataset": "northwind:customers",
        "include_previous_versions": true
    }
}
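
For comparison, the following sketch shows the source inside an enclosing pipe, using the array form of completeness together with require_populated_input. All ids ("customers-report", "merged-customers", "crm-customers", "erp-customers") are hypothetical, and the pipe-level "_id" and "type" keys are assumptions about the surrounding pipe configuration, which this section does not describe:

```json
{
    "_id": "customers-report",
    "type": "pipe",
    "source": {
        "type": "dataset",
        "dataset": "merged-customers",
        "completeness": ["crm-customers", "erp-customers"],
        "require_populated_input": true
    }
}
```

With this configuration the source would use the smaller of the two upstream completeness timestamps, as described for the array form of completeness, and the pipe would not run until "merged-customers" has been populated.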