Service metadata

There is an optional special configuration entity used to represent the service instance’s metadata. The metadata is used to specify properties that apply to the service instance itself. This entity can either be added as a normal configuration entity, edited in the UI or updated with the Service API.

Example:

{
   "_id": "node",
   "type": "metadata",
   "namespaced_identifiers": true,
   "namespaces": {
      "default": {
        "example": "http://example.org/",
        "fifa": "http://www.fifa.com/"
      }
   },
   "global_defaults": {
      "use_signalling_internally": true,
      "default_compaction_type": "sink",
      "symmetric_namespace_collapse": false
   },
   "dependency_tracking": {
      "dependency_warning_threshold": 10000,
      "dependency_error_threshold": 50000,
      "dependency_warning_threshold_total_bytes": 33554432,
      "dependency_error_threshold_total_bytes": 134217728,
      "enable_hops_thresholds": true
   }
}

Properties

Property

Type

Description

Default

Req

namespaced_identifiers

Boolean

Flag used to enable namespaced identifers support for the service as a whole. Pipes inherit the value of the namespaced_identifiers property less explictly overridden.

false

namespaces.default

Dict

A dictionary of namespace to URI expansions. This expansion mapping is used to expand namespaced identifiers into fully qualified URIs, e.g. by those components that provide RDF support.

A few expansion mappings come built-into the system. These are always available unless explicity overridden:

"_": "http://example.org/",
"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"owl": "http://www.w3.org/2002/07/owl#",
"foaf": "http://xmlns.com/foaf/0.1/",
"wgs84": "http://www.w3.org/2003/01/geo/wgs84_pos#",
"xsd": "http://www.w3.org/2001/XMLSchema#",
"dc": "http://purl.org/dc/elements/1.1/",
"skos": "http://www.w3.org/2004/02/skos/core#",
"dcterms": "http://purl.org/dc/terms/",
"gs": "http://www.opengis.net/ont/geosparql#",

global_defaults.use_signalling_internally

Boolean

Flag used to globally enable signalling support between internal pipes (i.e. pipes that read from datasets and writes to datasets). If enabled, a pipe run is scheduled as soon as any of the input datasets changes (it does not interrupt any already running pipes).

The default setting of this property is true which means signalling is enabled for all internal pipes in the installation. You can turn enable or disable this feature on individual pipes by setting the supports_signalling flag on the dataset source, merge, union datasets and merge datasets sources). This way you can also enable signalling on non-internal pipes.

Note

Note that signalling support is “best-effort” only; signals are not persisted so delivery is not guaranteed. For this reason, pipes in such flows should always have scheduled interval as a “backup”.

If you set supports_signalling explicitly on the pipe source it will be enabled regardless of the pump schedule.

true

global_defaults.default_compaction_type

Enum<String>

Specifies the default compaction type. It can be set to "background" or "sink". Background compaction will run once every 24 hours. Sink compaction will normally run every time the pipe runs, but this can be tweaked with the global_defaults.compaction_interval setting.

"sink"

global_defaults.compaction_interval

Float

Specifies the default sink compaction interval. If this value is zero, sink compaction will run every time the pipe runs. If it is larger than zero, sink compaction will only run if at least compaction_interval seconds has passed since the last sink compaction. The use-case for this setting is to prevent pipes that run often from constantly trying to compact the sink-dataset.

0

global_defaults.compaction_keep_versions

Integer

The number of unique versions of an entity to keep around. The value must be greater than or equal to 0. If set to 0 then a time threshold must be set explicitly.

Warning

A value less than 2 means that dependency tracking is best effort only, and it will not be able to find all reprocessable entities. Do full or partial rescans as a counter measure.

global_defaults.compaction_time_threshold_hours

Integer

Specifies the threshold for how old entities must be before they are considered for compaction. This property is usually used when you want to keep entities around for a certain time.

global_defaults.compaction_time_threshold_hours_pump

Integer

Same as compaction_time_threshold_hours, but applies to the pipe’s pump execution dataset. Pump execution datasets are always trimmed by time.

global_defaults.compaction_growth_threshold

Float

The growth factor required for the automatically scheduled compaction to kick in. A value of 1.1 mean that there must have been 10% new offsets written to the dataset since the last compaction. 1.0 is the minimum value allowed.

global_defaults.max_entity_bytes_size

Integer

Defines the maximum size in bytes of an individual entity as it is stored in a dataset.

104857600 (100MB)

global_defaults.run_at_startup_if_not_populated

Boolean

Specifies the default value of the property run_at_startup_if_not_populated for pumps.

false

global_defaults.infer_pipe_entity_types

Boolean

Schema inference is enabled for all pipes by default. Setting the property to false will disable schema inference by default. Notice that one can also configure schema inference at the pipe level.

Note

The default value is false for developer subscriptions.

true

global_defaults.use_config_circuit_breaker

Boolean

When set to true, activates the circuit breaker for uploading configuration to the node. When activated, any changes to the node configuration that would result in the deletion of more than 10% of the existing components will not go through (this is the case only when the number of deleted components is also more than 10).

false

global_defaults.reprocessing_policy

Enum<String>

Specifies the default policy that pipes use to decide if the pipe needs to be reset or not. The policy can also be set on individual pipes.

  • continue (the default) means that the pipe will continue processing input entities, and not reset the pipe, even though there might be factors indicating the the pipe should be reset.

  • automatic means that the pipe will automatically reset the pipe when it finds that there are factors that indicate that the pipe should be reset. The rationale for resetting the pipe is so that input entities can the reprocessed so that the output is correct.

continue

global_defaults.rescan_when_config_changes

Boolean

Specifies the default value of the pipes’ rescan_when_config_changes property.

false

global_defaults.enable_background_rescan

Boolean

When set to true, enables running pipe rescans in the background for all applicable pipes.

false

global_defaults.eager_load_microservices

Boolean

When set to false, Sesam can hold off starting up microservices which aren’t connected to any pipes. Set to true to force all microservices to start up regardless.

true

global_defaults.symmetric_namespace_collapse

Boolean

When set to true, the expand and collapse features will be symmetrical, i.e. data containing namespaced identifiers read into Sesam will map to the same thing on the way out of Sesam. Note that setting this option to true as assumed by the DTL ni-collapse and ni-expand DTL functions will also alter the URI/NI collapse and expand behaviour of the RDF and SPARQL source and sink.

false

global_defaults.max_merged

Integer

Sets the maximum number of entities that can be merged at a time with pipes using the merge source. The pipes will fail if more than max_merged entities are attempted merged into a single entity. It is recommended to reduce this value to limit potential memory usage, as the merge pipe will use an excessive amount of RAM if the number of merged entities is too high.

50000

global_defaults.always_index_ids

Boolean

If enabled, dataset sinks will by default maintain an index for the $ids property. This is equivalent to setting "indexes": "$ids" on all dataset sinks in the node.

false

global_defaults.if_source_empty

Enum<String>

Determines the default behaviour of pipes when a source returns no entities. Normally, any previously synced entities will be deleted even if the pipe does not receive any entities from its source. If set to "fail", pipes will automatically fail if the source returns no entities. This means that any previous entities in the pipe’s dataset are not deleted. If set to "accept", the pipe will not fail and any previously synced entities will be deleted.

This property can be set on individual sources as well, in which case the source configuration will override the global default value.

global_defaults.require_populated_input

Boolean

Determines the default behaviour of sources that reads from datasets when one or more of the datasets hasn’t been populated. If set to true, a pipe with such a source will only run if the dataset(s) the source reads from has been populated. This property can be set on individual sources as well, in which case the source configuration will override the global default value.

false

global_defaults.trace

Boolean or Object

This can be set to true to log the http requests and responses the REST transform, REST source, REST sink and HTTP endpoint source sends and receives. This information will be added to a “trace” property in the pump-completed and pump-failed events in the pipe execution log. By default the http headers and the first few bytes of the body is logged. If you need more fine-grained control of the logging, you can set trace to be an object and set the various trace.log_* sub-properties (see below for a description of each sub-property).

false

No

global_defaults.trace.log_request_headers

Boolean

If the trace property is an object this sub-property specifies if the request headers will be logged in the pump-completed/pump-completed events in the execution-log.

true

No

global_defaults.trace.log_request_body_maxsize

Integer

If the trace property is an object this property specifies how many bytes of the request body should be logged in the pump-completed/pump-completed events in the execution-log.

100

No

global_defaults.trace.log_response_headers

Boolean

If the trace property is an object this sub-property specifies if the response headers will be logged in the pump-completed/pump-completed events in the execution-log.

true

No

global_defaults.trace.log_response_body_maxsize

Integer

If the trace property is an object this property specifies how many bytes of the response body should be logged in the pump-completed/pump-completed events in the execution-log.

100

No

global_defaults.trace.log_secret_redacted_bytes

Integer

If the trace property is an object this property specifies how many bytes of each $SECRET will be redacted in the pump-completed/pump-completed events in the execution-log. The purpose of this setting is to redact enough of the secrets to render them safe to log, but to potentially leave some of the secret for debugging purposes. A value of -1 means to redact all bytes of the secrets. Note that the redaction is only a best-effort attempt to prevent secrets from ending up in the logs, there may be cases where secrets leak through in any case, so it is best to always check that what ends up being logged looks ok.

600

No

global_defaults.verify_ssl

Boolean

The default value for the verify_ssl property in URL systems.

false

No

dependency_tracking.dependency_warning_threshold

Integer

The number of entities that dependency tracking can keep in memory at a given time. If this number is exceeded then a warning message is written to the log.

10000

dependency_tracking.dependency_error_threshold

Integer

The number of entities that dependency tracking can keep in memory at a given time. If this number is exceeded then the pump will fail. Do not set this value too high as it may cause excessive memory usage.

50000

dependency_tracking.dependency_warning_threshold_total_bytes

Integer

The number of bytes that dependency tracking can keep in memory at a given time. If this number is exceeded then a warning message is written to the log.

33554432 (32MB)

dependency_tracking.dependency_error_threshold_total_bytes

Integer

The number of bytes that dependency tracking can keep in memory at a given time. If this number is exceeded then the pump will fail. Do not set this value too high as it may cause excessive memory usage.

134217728 (128MB)

dependency_tracking.enable_hops_thresholds

Boolean

If true, then warning and error thresholds that apply for dependency tracking also apply for regular "hops" expressions. It is recommended that you set this property to true in development environments.

false