Deprecations

This document contains deprecated sections from the documentation. Do not make use of these features.

DTL Functions

Function

Description

Examples

curie

Arguments:
PREFIX(string{1}),
VALUES(value-expression{1})

Constructs new CURIEs as URI objects based on a the PREFIX and VALUES arguments.

["curie", "foo", "bar"]

This will produce a URI object with the value "~rfoo:bar".

["curie", "foo", ["list", "bar", "zoo"]]

This will produce a list of two URI objects with the values ["~rfoo:bar", "~rfoo:zoo"].

uri-expand

Arguments:
FUNCTION(function-expression(0|1}
ENTITIES(value-expression{1})

Runs the given entities through the prefixing rules and the prefix expansion mapping defined in the node metadata RDF registry. The given entities must have a _dataset property containing the id of the dataset to which they belong or the key to look up the prefixes must be computed by the (optional) FUNCTION argument. The result of the FUNCTION argument will override any _dataset property on the entity. The id given or computed will be used to locate the prefix rules and prefix expansion mapping within the node RDF registry. Note that the result of FUNCTION must be a single string value.
The main purpose of this function is to prepare entities for translation into RDF form. See the RDF support document for more information about how this works.
Example node metadata:
{
    "rdf": {
      "people": {
         "prefixes": {
           "p": "http://example.org/people/"
         },
         "prefix_rules": {
           "id": "p",
           "properties": [
              "p", ["name"],
              "c", ["Employer"],
              "_", ["**"]
           ]
         }
      }
    }
}
Example input entity:
{
  "_id": "john_doe",
  "_dataset": "people",
  "name": "John Doe",
  "employer": "Example Ltd.",
  "born": "1973-01-21"
}
Given the above configuration you should expect the following URI-expanded entity in the result:
{
  "_id": "<http://example.org/people/john_doe>",
  "_dataset": "people",
  "<http://example.org/people/name>": "John Doe",
  "<http://example.org/company/employer>": "Example Ltd.",
  "<http://example.org/born>": "1973-01-21"
}
["uri-expand",
{"_id": "mary", "_dataset": "people", "name": "Mary Jones"}]

Returns an URI expanded version of the mary entity.

["uri-expand",
["lookup", ["list", "~rsesam:A/foo"], "bar"]]

Looks up the foo entity in the A dataset and bar in the current dataset, then URI expands them.
["uri-expand",
["list", {"_id": "mary", "name": "Mary Jones"}]]

Returns an empty list because the mary entity is missing the _dataset property.
["uri-expand", ["string", "people"],
{"_id": "mary", "_dataset": "employees",
"name": "Mary Jones"}]

Returns an URI expanded version of the mary entity using the prefixes registered by the “people” key in the node RDF registry (i.e. the _dataset value of “employees” is overriden by the computed value)
["uri-expand", ["string", "_.type"],
{"_id": "mary", "_dataset": "employees",
"type": "person", "name": "Mary Jones"}]

Returns an URI expanded version of the mary entity using the prefixes registered by the “person” key in the node RDF registry. The _dataset value of “employees” is overriden by the computed value (based on the contents of the entity’s type property in this example).

lookup

Arguments:
DATASET_IDS(value-expression{0|1})
ENTITY_REFERENCES(value-expression{1})

Returns an entity or a list of entities by resolving the strings or URIs in ENTITY_REFERENCES. The URIs will be resolved by looking up entities by id in the given datasets. Relative references will be resolved in the current dataset or in the DATASET_IDS datasets if specified. The returned entities have an extra _dataset property containing the id of the dataset where they came from.
["lookup", "~rsesam:A/foo"]

Looks up the foo entity in the A dataset.

["lookup", "A", ["list", "foo", "sesam:B/bar"]]

Looks up the foo entity in the A dataset and the bar entity in the B dataset.

["lookup", "bar"]

Looks up the bar entity in the current dataset.

["lookup",
["list", "A", "B"],
["list", "bar", "baz",
"~rsesam:C/foo", "~rsesam:D/quux"]

Looks up the bar and baz entities in the A and B datasets. foo is looked up in the C dataset and quux in the D dataset because they are explicit entity references.

reference

Arguments:
DATASET_ID(string{1})
ENTITY_IDS(value-expression{})

Returns a URI that can be used to reference entities in the given dataset. The DATASET_ID and ENTITY_IDS parts will be URI path encoded. URIs of this type can be resolved using the lookup function.
["reference", "foo", "bar"]

Returns "~rsesam:foo/bar" (which is a value of the URI datatype)).

["reference", "foo", ["list", "a", "b"]]

Returns ["~rsesam:foo/a", "~rsesam:foo/b"].

Sources

The Kafka source

The Kafka source consumes data from a Kafka topic. The consumer stores the offset in the pipe, and does not commit the consumer offset back to Kafka.

The entities emitted from this source has offset, partition, timestamp, value and key as properties. Message keys in Kafka can be any bytes, but the source will try to utf-8 decode the key and add that as the _id property.

Prototype

{
    "type": "kafka",
    "system": "kafka-system-id",
    "topic": "some-topic"
}

Properties

Property

Type

Description

Default

Req

system

String

The id of the Kafka System component to use.

Yes

topic

String

The topic to consume from.

Yes

partitions

List<Integer>

Manual assignment of partitions if only a subset of the topic is to be consumed by this pipe. In Azure Event Hubs this property has to be set for assignment to work for now.

<All>

No (Yes for Event Hubs)

seek_to_beginning

Boolean

If the consumer should start from the beginning of the topic or only consume new messages. This only applies to the first run, subsequent runs will continue where it left off unless the pipe is reset.

false

ignore_null_keys

Boolean

If the consumer should drop messages that does not have keys.

true

consumer_timeout_ms

Integer

The pipe will consume all available messages from the topic. Once all messages has been consumed it will wait for this period of time until it will complete. Note that for topics that receives new messages more often than this interval the pipe will never complete.

60000

Example configuration

The outermost object would be your pipe configuration, which is omitted here for brevity.

{
    "source": {
        "type": "kafka",
        "system": "my-kafka",
        "consumer_timeout_ms": 5000,
        "ignore_null_keys": false,
        "partitions": [0, 1],
        "seek_to_beginning": true,
        "topic": "foo"
    },
}

Diff datasets source

The diff datasets source is similar to the merge dataset source, except that it also compares the entities from the datasets. The comparison produces a diff and filters out entities that are equal.

For each merged entity (same as the all strategy in merge dataset source) an additional $diff property is also generated. The diff contains the datasets and values for the properties that are not equal across all the datasets.

Entity ids are not modified in any way.

Prototype

{
    "type": "diff_datasets",
    "datasets": ["id-of-dataset1", "id-of-dataset2"]
 }

Properties

The configuration only requires the property datasets which must be a list of datasets ids.

Property

Type

Description

Default

Req

datasets

List<String>

A list of datasets ids.

Yes

initial_datasets

List<String{>=0}>

By default the source will be considered populated if all the datasets in the datasets property have been populated. If some of these datasets will never be populated then this property can be used to list the datasets that must be populated before the source is considered populated. You should normally not have to use this property.

See also the dataset sink property set_initial_offset.

whitelist

List<String>

The names of the properties to include in the comparison. If there is a blacklist also specified, the whitelist will be filtered against the contents of the blacklist.

blacklist

List<String>

The names of the properties to exclude from the comparison. If there is a whitelist also specified, the blacklist operates on the values of the whitelist (and not the properties present in the entities).

treat_lists_as_sets

Boolean

Flag to indicate if you want to ignore duplicates and ordering of lists in the entities you are comparing. This option also affects lists nested deeper inside the entity.

false

ignore_deletes

Boolean

Flag to indicate if you want to ignore deleted entities during the comparison. By default there will be produced a difference if one of the datasets contains a deleted entity while the other datasets does not contain the deleted entity.

If true the deleted entities are treated as if they don’t exist.

false

if_source_empty

Enum<String>

Determines the behaviour of the pipe when the datasets do not return any entities. Normally, any previously synced entities will be deleted even if the pipe does not receive any entities from its source. If set to "fail", the pipe will automatically fail if the source returns no entities. This means that any previous entities in the pipe’s dataset are not deleted. If set to "accept", the pipe will not fail and any previously synced entities will be deleted.

The global default global_defaults.if_source_empty can be set for all pipes in the service metadata.

"accept"

Continuation support

See the section on continuation support for more information.

Property

Value

supports_since

true (Fixed)

is_since_comparable

true (Fixed)

is_chronological

true (Fixed)

Example configuration

The outermost object would be your pipe configuration, which is omitted here for brevity:

{
    "source": {
        "type": "diff_datasets",
        "datasets": ["product", "other-products"]
    }
}

Example result

{
    "_id": "some-product",
    "$diff": {
        "price": {
            "products": "price-from-products",
            "other-products": "price-from-other-products",
        }
    }
 }

Transforms

The properties to CURIEs transform

This transform can transform entity properties to RDF CURIEs (a superset of XML QNames) based on wildcard patterns. It is used primarily when dealing with or preparing to output RDF data. Note that URL quoting is applied to the property names as part of the transform. Also note that by default the path separator character (“/) is not quoted, but the behaviour is configurable.

Prototype

{
    "type": "properties_to_curies",
    "rule": "rdf-registry-entry",
    "quote_safe_characters": "/",
    "id": "optional-id-prefix",
    "properties": [
      "optional_some_prefix", ["optional_some_pattern"]
    ]
}

Properties

Property

Type

Description

Default

Req

rule

String

The id of the key in the RDF registry containing the prefix rules to to use for the transformation. See RDF support for more information about the RDF registry and how to configure it.

Yes*

quote_safe_characters

String

A string of characters that should be treated as “safe” from URL quoting by the transform. By default this is the slash character (“/”). If this property is set to the empty string (“”), all characters of the property name will be URL quoted. This property can also be set at the RDF registry level, but this value will be overridden if set directly on the transform configuration.

id

String

The prefix to use for _id properties

Yes*

properties

List<(String, List<String>)>

A list of String,List pairs that make up the rules for which properties should be assigned which prefixes. See the example section below for a fuller explanation of this property.

Yes*

Note that rule and id and properties are mutually exclusive. If all three are present, rule is given precedence and id and properties are ignored.

Example

The rule property references a RDF registry entry containing a prefix_rules object. See RDF support for more information about the RDF registry and how to configure it. Alternatively, the contents of the prefix_rules entry (i.e. .the id and properties) can be included inline in the transform configuration.

Given a pre-existing RDF registry entry my_entry:

"my_entry": {
   ..
   "prefix_rules": {
       "id": "x",
       "properties": [
            "c", ["status", "code"],
            "_", ["status"],
            "t", ["t_*"],
            "m", ["status", "**", "m*"],
            "s", ["status", "**"],
            "x", ["**"]
       ]
   }
   ..
}

And a transform configuration:

{
    "type": "properties_to_curies",
    "rule": "my_entry"
}

And the input entity:

{
    "_id": "foo/bar",
    "name": "John",
    "born": "1980-01-23",
    "code": "AB32",
    "t_a": "A",
    "a/b": "A/B",
    "status": {
        "married": true,
        "spouse": "Pam",
        "code": 123,
        "t_b": {
            "t_c": "C",
            "hello": "world",
            "<s:hi>": "bye"
        }
    }
}

The transform will output the following transformed entity:

{
    "_id": "<x:foo/bar>",
    "<x:name>": "John",
    "<x:born>": "1980-01-23",
    "<x:code>": "AB32",
    "<t:t_a>": "A",
    "<x:a/b>": "A",
    "<_:status>": {
        "<m:married>": true,
        "<s:spouse>": "Pam",
        "<c:code>": 123,
        "<t:t_b>": {
            "<t:t_c>": "C",
            "<s:hello>": "world",
            "<s:hi>": "bye"
        }
    }
}

Setting quote_safe_characters to “” would instead yield:

{
    "_id": "<x:foo%2Fbar>",
    "<x:name>": "John",
    "<x:born>": "1980-01-23",
    "<x:code>": "AB32",
    "<t:t_a>": "A",
    "<x:a%2Fb>": "A",
    "<_:status>": {
        "<m:married>": true,
        "<s:spouse>": "Pam",
        "<c:code>": 123,
        "<t:t_b>": {
            "<t:t_c>": "C",
            "<s:hello>": "world",
            "<s:hi>": "bye"
        }
    }
}

Notice that now “/” has also been URL quoted (“%2F”)

The URIs to CURIEs transform

This transform can transform entity properties containing URIs in the keys and/or the values to a more compact form using RDF CURIEs (a superset of XML QNames). It is used primarily when dealing with or reading RDF data. See the Working with RDF document for more information about working with RDF data in Sesam.

Prototype

{
    "type": "uris_to_curies",
    "prefix_includes": ["entry1", "entry2"]
}

Properties

Property

Type

Description

Default

Req

prefix_includes

List<String>

A list of string keys to look up in the instance-wide RDF registry. These keys reference objects which contain RDF support structures such as CURIE prefixes (and possibly references to other prefix sets to include). The prefixes collected from the RDF registry will be used to compress full URIs to CURIEs. See RDF support for more information about the RDF registry and how to configure it. The common RDF prefixes are built-in and you don’t have to provide the mapping for it (i.e. RDF, RDFS, OWL etc).

Example

Given the configuration:

{
    "transform": [
       {
         "type": "uris_to_curies",
         "prefix_includes": ["my_entry"]
       }
    ]
}

The RDF registry entry:

"my_entry": {
   "prefixes": {
      "foo": "http://psi.foo.com/"
      "test": "http://psi.test.com/"
   }
   ..
}

And the input entity:

{
    "_id": "http://psi.test.com/2",
    "http://psi.test.com/name": "John",
    "born": "1980-01-23",
    "http://psi.test.com/code": "AB32",
    "status": {
        "http://psi.foo.com/married": true,
        "spouse": "Pam",
        "url1": "~rhttp://www.foo.com",
        "url2": "~rhttp://psi.foo.com/url2",
        "code": 123,
        "child": {
            "t_c": "C",
            "http://psi.test.com/hello": "http://psi.foo.com/world",
            "http://psi.tests.com/s": "bye"
        }
    }
}

The transform will output the following compact/”compressed” transformed entity:

{
    "_id": "<test:2>",
    "<test:name>": "John",
    "born": "1980-01-23",
    "<test:code>": "AB32",
    "status": {
        "<foo:married>": true,
        "spouse": "Pam",
        "code": 123,
        "url1": "~rhttp://www.foo.com",
        "url2": "~rfoo:url2",
        "child": {
            "t_c": "C",
            "<test:hello>": "<foo:world>",
            "http://psi.tests.com/s": "bye"
        }
    }
}

Note that the transform will not attempt to unquote the remainder elements after the matched prefixes.

The lower keys transform

This transform transforms all the keys of an entity to lower case (optionally recursively).

Prototype

{
    "type": "lower_keys",
    "recurse": false
}

Properties

Property

Type

Description

Default

Req

recurse

Boolean

An optional flag to indicate whether to do the case conversion recursively or not (default is false, which means no recursion).

false

Example

With the default transform configuration:

{
    "type": "lower_keys",
}

And given the the input entity:

{
    "_id": "http://psi.test.com/2",
    "Born": "1980-01-23",
    "CODE": "AB32",
    "Status": {
        "http://psi.foo.com/married": true,
        "Spouse": "Pam",
        "URL1": "~rhttp://www.foo.com",
        "URL2": "~rhttp://psi.foo.com/url2",
        "CODE": 123,
        "Child": {
            "t_c": "C",
            "http://psi.test.com/hello": "http://psi.foo.com/world",
            "http://psi.tests.com/S": "bye"
        }
    }
}

The transform will output the following transformed entity:

{
    "_id": "http://psi.test.com/2",
    "born": "1980-01-23",
    "code": "AB32",
    "status": {
        "http://psi.foo.com/married": true,
        "Spouse": "Pam",
        "URL1": "~rhttp://www.foo.com",
        "URL2": "~rhttp://psi.foo.com/url2",
        "CODE": 123,
        "Child": {
            "t_c": "C",
            "http://psi.test.com/hello": "http://psi.foo.com/world",
            "http://psi.tests.com/S": "bye"
        }
    }
}

Note that only the root keys are transformed. If the recurse property is set to true in the configuration, however, the result would instead become:

{
    "_id": "http://psi.test.com/2",
    "born": "1980-01-23",
    "code": "AB32",
    "status": {
        "http://psi.foo.com/married": true,
        "spouse": "Pam",
        "url1": "~rhttp://www.foo.com",
        "url2": "~rhttp://psi.foo.com/url2",
        "code": 123,
        "child": {
            "t_c": "C",
            "http://psi.test.com/hello": "http://psi.foo.com/world",
            "http://psi.tests.com/s": "bye"
        }
    }
}

The upper keys transform

This transform transforms all the keys of an entity to upper case (optionally recursively). The transform mirrors the lower case transform exactly except for the keys being transformed to upper case. See previous section for details.

The undirected graph transform

The undirected graph transform transforms a list of properties representing nodes in a graph into all its possible sets of edges, forming a complete graph. The transform will generate all possible edges in the graph, which will be twice the number of entities as there are values in the aggregate of the list of properties given. See the example section for an example.

Prototype

{
    "type": "undirected_graph",
    "nodes": ["_id", "sameAs"],
    "from": "from-property",
    "to": "to-property"
}

Properties

Property

Type

Description

Default

Req

nodes

List<String>

A list of entity property names that should be used to pick the nodes of the graph. The properties must refer to a value that is either a string or a URI, or a list of strings or URIs. No other value types are allowed in the transform.

[“_id”, “sameAs”]

from

String

The name of the property to use as “from” point in the generated entity for an edge in the graph.

“from”

to

String

The name of the property to use as the “to” point in the generated entity for an edge in the graph.

“to”

Example

Given the configuration:

{
    "transform": [
       {
         "type": "undirected_graph",
         "nodes": ["_id", "map"],
         "from": "from",
         "to": "to"
       }
    ]
}

And the input entity:

{
   "_id": "foo",
   "map": ["bar", "zoo"]
}

The transform will output the following edges of the graph as entities on its output stream:

{
    "_id": "foo.bar",
    "from": "foo",
    "to": "bar"
}

{
    "_id": "foo.zoo",
    "from": "foo",
    "to": "zoo"
}

{
    "_id": "bar.foo",
    "from": "bar",
    "to": "foo"
}

{
    "_id": "bar.zoo",
    "from": "bar",
    "to": "zoo"
}

{
    "_id": "zoo.foo",
    "from": "zoo",
    "to": "foo"
}

{
    "_id": "zoo.bar",
    "from": "zoo",
    "to": "bar"
}

Sinks

Deprecated Properties

The prefix_includes property has been deprecated for the sparql, sdshare, and http_endpoint sinks.

Property

Type

Description

Default

Req

prefix_includes

List<String>

A list of string keys to look up in the node-wide RDF registry. These keys reference objects which contain RDF support structures such as CURIE prefixes (and possibly references to other prefix sets to include). The prefixes collected from the RDF registry will be used to expand CURIEs into full URIs. See RDF support for more information about the RDF registry and how to configure it. You do not need include any prefix sets to use the common RDF prefixes (i.e. RDF, RDFS, OWL and so on).

The Kafka sink

The Kafka sink produces data to a Kafka topic.

Entities sent to this sink will use the key, value and partition properties if present, otherwise the key will be utf-8 encoded version of _id and the value will be the entire entity. If partition is not specified, the partitioning will be based on the key.

The properties used matches the properties emitted by the Kafka source. This means that it should be possible to consume a topic and produce to a new topic in a pipe with no DTL.

The sink will flush to Kafka after every batch.

Prototype

{
    "type": "kafka",
    "system": "kafka-system-id",
    "topic": "some-topic"
}

Properties

Property

Type

Description

Default

Req

system

String

The id of the Kafka System component to use.

Yes

topic

String

The topic to send to.

Yes

Example configuration

The outermost object would be your pipe configuration, which is omitted here for brevity.

{
    "sink": {
        "type": "kafka",
        "system": "my-kafka",
        "topic": "foo"
    },
}

Systems

Deprecated Properties

Microservice system

The following properties have been deprecated:

Property

Type

Description

Default

Req

docker.cpu_period

Integer

The percentage of CPU time the OS scheduler is allowed use (see the Docker documentation for details). Note that the value is divided by 1000 with respects to the range in the Docker documentation. You should not normally change the default value.

100

docker.cpuset_cpus

String

A string expression representing the CPU cores the container is allowed to use, see docker.cpu_quota. The default (null value) means the container can use all cores. A value of "0,4" means use core 0 and 4. A value of "0-4" means use cores 0 through 4. A value of "0,6-8" means use core 0 and 6 through 8.

null

The Kafka system

This system can be used to read and write data from Apache Kafka as well as Azure Event Hubs for Apache Kafka.

Prototype

{
    "_id": "id-of-system",
    "name": "Name of system",
    "type": "system:kafka",
    "bootstrap_servers": "localhost:9092,otherhost:9092",
}

Properties

Property

Type

Description

Default

Req

bootstrap_servers

String

Comma separated list of bootstrap servers with hostname and port. For Azure Event Hubs this should be set to <fqdn>:9093.

Yes

sasl_username

String

Username to use when authentication against a SASL enabled Kafka cluster. If username is set, authentication will be performed. For Azure Event Hubs this property must be set to $ConnectionString and the connection string should be passed as the password.

No

sasl_password

String

Password to use when authentication against a SASL enabled Kafka cluster. For Azure Event Hubs this should be set to Endpoint=sb://[...].

No

The RDF registry

When working with RDF data in Sesam, we would like to be able to define, maintain and share these RDF prefixes among our datasets and DTL transforms. For this purpose Sesam has a built-in RDF registry. You can configure the registry by including an entity in your configuration on the form:

{
   "_id": "node"
   "type": "metadata",
   "rdf": {
      "dataset1": {
          "prefixes": {
              "foo": "http://example.com/foo/",
              "foo_schema": "http://example.com/foo/schema/"
          },
          "prefix_rules": {
              "id": "foo",
              "properties": [
                  "foo_schema", ["**"]
              ]
          }
      },
      "dataset2": {
          "prefixes": {
              "bar": "http://example.com/bar/",
              "bar_schema": "http://example.com/bar/schema/"
          },
          "prefix_includes": ["dataset1"],
          "quote_safe_characters": "",
          "prefix_rules": {
              "id": "bar",
              "properties": [
                  "foo_schema", ["some_prop"],
                  "bar_schema", ["**"]
              ]
          }
      }
}

The root key rdf above contains the entire configuration of the RDF registry. Its sub-keys will usually correspond to dataset ids, although you can register any valid key here.

RDF registry items

The “prototype” of a RDF registry entry entry_id look like:

..
"entry_id": {
    "prefixes": {
       "foo" : "http://example.com/foo/",
       "baz" : "http://example.com/baz/",
       "bar" : "http://example.com/baz/"
    },
    "prefix_includes": ["list_of", "other", "registry", "entries"],
    "prefix_rules": {
        "id": "bar",
        "properties": [
            "foo", ["some_prop"],
            "baz", ["**"]
        ]
    },
    "quote_safe_characters": "/æåø",
}

Note that the quote_safe_characters is an optional property of the RDF registry entity. If specified, it should contains a string of characters that should be excluded from URL quoting when constructing CURIEs. It can also be specified on the properties to CURIEs transform where, if specified, will take precedence over any value it might have in the RDF registry entry. This property defaults to “/” and would normally not need to be changed. A value of “” (the emtpy string) means “quote all characters”. See below for more detail on the use of this transform.

Prefixes

Each registry item must contain at least a single property prefixes which is a object containing prefix to URI mappings for CURIE generation or expansion. The registry items can also contain a list property prefix_includes which must be references to other existing RDF registry keys. When looking up items in the RDF registry, any prefix elements in this list will be recursively included. Take care that you don’t have overlapping prefix names, as the final result will be undefined. Also make sure you don’t create circular references using this property.

Built-in prefixes

The Sesam RDF registry has built-in support for the common prefixes in RDF, such as rdf, rdfs and owl. This means you don’t have to define these yourself to use them in your CURIEs. The full list of built-in prefixes is:

{
    "_": "http://example.org/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "wgs84": "http://www.w3.org/2003/01/geo/wgs84_pos#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "gs": "http://www.opengis.net/ont/geosparql#"
}

The “_” prefix is used in general as a fallback if no prefix is defined for a property when mapping an entity to its RDF representation.

Prefix rules

The final property that can exist in a RDF registry item is prefix_rules. This element tells us how to create RDF CURIEs from a plain entity: the id property contains the prefix to use for the _id property of the entity (i.e. the subject in RDF) and the properties property is a list of property pairs that encode the rules for what prefix to apply to which property of the entity.

The properties format is tuples of string/list pairs, where the first item is the prefix to add and the second is the path expression that is used to match against. The number of elements in the list must be even. Path expressions are evaluated in order and the first matching path expression will win, so if a path expression matches the prefix will be assigned to the matching key.

A path expression is a list of strings. The left-most string value is the most specific. ** can be used to denote nestedness at an arbitrary depth. * can be used as a wildcard in the string values themselves.

The property to CURIE transform

A complete example of how the prefix_rules property works; we want to transform an entity that looks like:

{
    "_id": "2",
    "name": "John",
    "born": "1980-01-23",
    "code": "AB32",
    "t_a": "A",
    "status": {
        "married": True,
        "spouse": "Pam",
        "code": 123,
        "t_b": {
            "t_c": "C",
            "hello": "world",
            "<s:hi>": "bye"
        }
    }
}

to RDF form using CURIEs. We start by defining the rules for this transformation in the RDF registry entry my_entry:

"my_entry": {
   ..
   "prefix_rules": {
       "id": "x",
       "properties": [
            "c", ["status", "code"],
            "_", ["status"],
            "t", ["t_*"],
            "m", ["status", "**", "m*"],
            "s", ["status", "**"],
            "x", ["**"]
       ]
   }
   ..
}

We then add a properties to CURIEs transform to the start of our pipe’s transform section:

..
    "transform": [
        {
            "type": "properties_to_curies",
            "rule": "my_entry"
        }
        ..
    ]

This transform will use our my_entry rules and produce the following transformed entity:

{
    "_id": "<x:2>",
    "<x:name>": "John",
    "<x:born>": "1980-01-23",
    "<x:code>": "AB32",
    "<t:t_a>": "A",
    "<_:status>": {
        "<m:married>": True,
        "<s:spouse>": "Pam",
        "<c:code>": 123,
        "<t:t_b>": {
            "<t:t_c>": "C",
            "<s:hello>": "world",
            "<s:hi>": "bye"
        }
    }
}

RDF input

Sesam supports RDF input from several different sources:

Additionally, you can set up a HTTP endpoint source which includes a SDShare Push capable HTTP endpoint where you can post RDF data in NTriples format in accordance with the SDShare Push protocol.

The URIs to CURIEs transform

All of these methods of RDF input will provide entities to your data flows on the general form:

{
    "_id": "<http://example.com/bar>",
    "<http://example.com/schema/some_predicate>": "Some literal",
    "<http://example.com/schema/other_predicate>": "~rhttp://example.com/zoo"
}

When processing this data in the flow, we would like to first transform these entities to CURIE form using the RDF registry to manage the prefixes. In the above example we can add a URIs to CURIEs transform to the pipe to achieve this:

{
    "_id": "my-pipe",
    ..
    "transform": [
       {
         "type": "uris_to_curies",
         "prefix_includes": ["my_entry"]
       }
    ]

where the corresponding my_entry in the RDF registry looks like:

..
"my_entry": {
    "prefixes": {
        "foo": "http://example.com/",
        "foo_schema": "http://example.com/schema/"
    }
    ..
}
..

This transform will then produce the following entity:

{
   "_id": "<foo:bar>",
   "<foo_schema:some_predicate>": "Some literal",
   "<foo_schema:other_predicate>": "~rfoo:zoo"
}

RDF in transforms

The Sesam DTL language features several functions that are useful when working with RDF data in your flow.

Accessing CURIEs properties

When addressing properties in CURIEs form in DTL transform, you can simply use their names verbatim. For example:

..
["rename", "<foo:third_predicate>", "<foo:some_predicate>"],
["copy", "_S.<foo_schema:other_predicate>"],
["add", "<rdfs:label>", "Bob"]
..

You can also use the CURIEs in path expressions in the same way as any other property name. If you want to add a URI literal as part of your transformed entity you can use the DTL curie function, which takes a prefix and a value expression (i.e. a literal or a function) and produces a URI property value:

..
["add", "<foo_schema:baz>", ["curie", "foo", "zoo"]]
..

This will add a property that looks like:

{
  ..
  "<foo_schema:baz>": "~rfoo:zoo"
  ..
}

CURIE expansion in DTL

When processing RDF data in a flow, we sometimes would like to expand an entity or a child entity from CURIEs to full URI form (for example if there are conflicting usages of prefixes). This can be done using the DTL uri-expand:

..
["add", "<baz:expanded>", ["uri-expand", ["string", "my_entry"], {"_id": "<foo:bob>", "<foo:name>": "Bob Jones"}]]
..

This will expand the properties of the entity (here shown inline, but typically will be from a hops join or some other function) to its “full” form:

{
  ..
  "<baz:expanded>": {
      "_id": "<http://example.com/foo/bob>",
      "<http://example.com/foo/name>": "Bob Jones"
  }
  ..
}

Note that expanding CURIEs is normally done at the endpoint of your flow (i.e. by the sink or a SDShare feed, see below). However, if the sink you are using to output the final data is not RDF aware (i.e. supports automatic prefix expansion) you can use the uri-expand function to achieve the same functionality.

Pipes

Short-hand configuration

As mentioned earlier, in the pipe section, there is a special “short hand” configuration for one of the most used pipes; pipes pumping entities from RDBMS tables to an internal dataset. Since this is an often encountered usecase, we have condensed the information needed into a single url-style form:

[
    {
       "_id": "Northwind",
       "type": "system:mysql",
       "name": "Northwind database",
       "username": "northwind",
       "password": "secret",
       "host": "mydb.example.org",
       "database": "Northwind"
    },
    {
       "_id": "Northwind:Orders",
       "type": "pipe",
       "name": "Orders from northwind",
       "short_config": "sql://Northwind/Orders"
    }
]

Currently, only the sql system and source is supported though other short forms may be added at a later time. The above example using the short_config form is equivalent to this fully expanded pipe configuration:

[
    {
       "_id": "Northwind",
       "type": "system:mysql",
       "name": "Northwind database",
       "username": "northwind",
       "password": "secret",
       "host": "mydb.example.org",
       "database": "Northwind"
    },
    {
       "_id": "Northwind:Orders",
       "type": "pipe",
       "source": {
           "type": "sql",
           "system": "Northwind",
           "table": "Orders"
       },
       "sink": {
           "type": "dataset",
           "dataset": "Northwind:Orders"
       },
       "pump": {
           "schedule_interval": 30
       }
    }
]

You can combine the short form with properties from the dataset sink, sql source and specific pump properties, as long as the _id and type properties aren’t overridden, for example changing the pump schedule and startup flag:

[
    {
       "_id": "Northwind",
       "type": "system:mysql",
       "name": "Northwind database",
       "username": "northwind",
       "password": "secret",
       "host": "mydb.example.org",
       "database": "Northwind"
    },
    {
       "_id": "Northwind:Orders",
       "type": "pipe",
       "name": "Orders from northwind",
       "short_config": "sql://Northwind/Orders",
       "pump": {
           "schedule_interval": 60,
           "run_at_startup": true
       }
    }
]