Deprecations

This document contains deprecated sections from the documentation. Do not make use of these features.

DTL Functions

Function Description Examples
curie
Arguments:
PREFIX(string{1}),
VALUES(value-expression{1})

Constructs new CURIEs as URI objects based on the PREFIX and VALUES arguments.

["curie", "foo", "bar"]

This will produce a URI object with the value "~rfoo:bar".

["curie", "foo", ["list", "bar", "zoo"]]

This will produce a list of two URI objects with the values ["~rfoo:bar", "~rfoo:zoo"].
uri-expand
Arguments:
FUNCTION(function-expression{0|1})
ENTITIES(value-expression{1})

Runs the given entities through the prefixing rules and the prefix expansion mapping defined in the node metadata RDF registry. The given entities must have a _dataset property containing the id of the dataset to which they belong or the key to look up the prefixes must be computed by the (optional) FUNCTION argument. The result of the FUNCTION argument will override any _dataset property on the entity. The id given or computed will be used to locate the prefix rules and prefix expansion mapping within the node RDF registry. Note that the result of FUNCTION must be a single string value.
The main purpose of this function is to prepare entities for translation into RDF form. See the RDF support document for more information about how this works.
Example node metadata:
{
    "rdf": {
      "people": {
         "prefixes": {
           "p": "http://example.org/people/",
           "c": "http://example.org/company/"
         },
         "prefix_rules": {
           "id": "p",
           "properties": [
              "p", ["name"],
              "c", ["employer"],
              "_", ["**"]
           ]
         }
      }
    }
}
Example input entity:
{
  "_id": "john_doe",
  "_dataset": "people",
  "name": "John Doe",
  "employer": "Example Ltd.",
  "born": "1973-01-21"
}
Given the above configuration you should expect the following URI-expanded entity in the result:
{
  "_id": "<http://example.org/people/john_doe>",
  "_dataset": "people",
  "<http://example.org/people/name>": "John Doe",
  "<http://example.org/company/employer>": "Example Ltd.",
  "<http://example.org/born>": "1973-01-21"
}
["uri-expand",
{"_id": "mary", "_dataset": "people", "name": "Mary Jones"}]

Returns a URI-expanded version of the mary entity.

["uri-expand",
["lookup", ["list", "~rsesam:A/foo", "bar"]]]

Looks up the foo entity in the A dataset and bar in the current dataset, then URI expands them.
["uri-expand",
["list", {"_id": "mary", "name": "Mary Jones"}]]

Returns an empty list because the mary entity is missing the _dataset property.
["uri-expand", ["string", "people"],
{"_id": "mary", "_dataset": "employees",
"name": "Mary Jones"}]

Returns a URI-expanded version of the mary entity using the prefixes registered by the "people" key in the node RDF registry (i.e. the _dataset value of "employees" is overridden by the computed value).
["uri-expand", ["string", "_.type"],
{"_id": "mary", "_dataset": "employees",
"type": "person", "name": "Mary Jones"}]

Returns a URI-expanded version of the mary entity using the prefixes registered by the "person" key in the node RDF registry. The _dataset value of "employees" is overridden by the computed value (based on the contents of the entity's type property in this example).
lookup
Arguments:
DATASET_IDS(value-expression{0|1})
ENTITY_REFERENCES(value-expression{1})

Returns an entity or a list of entities by resolving the strings or URIs in ENTITY_REFERENCES. The URIs will be resolved by looking up entities by id in the given datasets. Relative references will be resolved in the current dataset or in the DATASET_IDS datasets if specified. The returned entities have an extra _dataset property containing the id of the dataset where they came from.
["lookup", "~rsesam:A/foo"]

Looks up the foo entity in the A dataset.

["lookup", "A", ["list", "foo", "~rsesam:B/bar"]]

Looks up the foo entity in the A dataset and the bar entity in the B dataset.

["lookup", "bar"]

Looks up the bar entity in the current dataset.

["lookup",
["list", "A", "B"],
["list", "bar", "baz",
"~rsesam:C/foo", "~rsesam:D/quux"]]

Looks up the bar and baz entities in the A and B datasets. foo is looked up in the C dataset and quux in the D dataset because they are explicit entity references.
reference
Arguments:
DATASET_ID(string{1})
ENTITY_IDS(value-expression{})

Returns a URI that can be used to reference entities in the given dataset. The DATASET_ID and ENTITY_IDS parts will be URI path encoded. URIs of this type can be resolved using the lookup function.
["reference", "foo", "bar"]

Returns "~rsesam:foo/bar" (which is a value of the URI datatype).

["reference", "foo", ["list", "a", "b"]]

Returns ["~rsesam:foo/a", "~rsesam:foo/b"].
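
As a rough illustration of the string forms shown in the curie and reference examples above, the following Python sketch models both functions outside of DTL. The helper names are hypothetical, URI values are represented by their "~r"-prefixed string form as in the examples, and the exact set of characters that gets path encoded is an assumption (here every reserved character, including "/").

```python
from urllib.parse import quote

def curie(prefix, values):
    """Model of ["curie", PREFIX, VALUES]: one URI per value."""
    if isinstance(values, list):
        return [curie(prefix, value) for value in values]
    return f"~r{prefix}:{values}"

def reference(dataset_id, entity_ids):
    """Model of ["reference", DATASET_ID, ENTITY_IDS]."""
    if isinstance(entity_ids, list):
        return [reference(dataset_id, entity_id) for entity_id in entity_ids]
    # Assumption: both parts are quoted with no "safe" characters at all.
    return f"~rsesam:{quote(dataset_id, safe='')}/{quote(entity_ids, safe='')}"

print(curie("foo", ["bar", "zoo"]))
print(reference("foo", ["a", "b"]))
```

A list argument yields a list of URIs and a single value yields a single URI, mirroring the examples in the table.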

Sources

The Kafka source

The Kafka source consumes data from a Kafka topic. The consumer stores the offset in the pipe, and does not commit the consumer offset back to Kafka.

The entities emitted from this source have offset, partition, timestamp, value and key as properties. Message keys in Kafka can be any bytes, but the source will try to UTF-8 decode the key and add that as the _id property.
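
The mapping from a Kafka message to an entity can be pictured with the following Python sketch. It is not the source's implementation: the function name is hypothetical, the key and value are kept as raw bytes, and leaving out _id when the key is not valid UTF-8 is an assumption (the text only says the source "will try" to decode it).

```python
def message_to_entity(offset, partition, timestamp, key, value,
                      ignore_null_keys=True):
    """Build the entity for one consumed message, or None if it is dropped."""
    if key is None and ignore_null_keys:
        return None  # messages without keys are dropped by default
    entity = {
        "offset": offset,
        "partition": partition,
        "timestamp": timestamp,
        "key": key,
        "value": value,
    }
    if key is not None:
        try:
            entity["_id"] = key.decode("utf-8")
        except UnicodeDecodeError:
            pass  # assumption: an undecodable key simply yields no _id
    return entity

print(message_to_entity(42, 0, 1500000000000, b"order-1", b"{}"))
```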

Prototype

{
    "type": "kafka",
    "system": "kafka-system-id",
    "topic": "some-topic"
}

Properties

Property Type Description Default Req
system String The id of the Kafka System component to use.   Yes
topic String The topic to consume from.   Yes
partitions List<Integer> Manual assignment of partitions if only a subset of the topic is to be consumed by this pipe. In Azure Event Hubs this property has to be set for assignment to work for now. <All> No (Yes for Event Hubs)
seek_to_beginning Boolean If the consumer should start from the beginning of the topic or only consume new messages. This only applies to the first run, subsequent runs will continue where it left off unless the pipe is reset. false  
ignore_null_keys Boolean If the consumer should drop messages that do not have keys. true  
consumer_timeout_ms Integer The pipe will consume all available messages from the topic. Once all messages have been consumed, it waits for this period of time (in milliseconds) before completing. Note that for topics that receive new messages more often than this interval the pipe will never complete. 60000  

Example configuration

The outermost object would be your pipe configuration, which is omitted here for brevity.

{
    "source": {
        "type": "kafka",
        "system": "my-kafka",
        "consumer_timeout_ms": 5000,
        "ignore_null_keys": false,
        "partitions": [0, 1],
        "seek_to_beginning": true,
        "topic": "foo"
    }
}

Transforms

The properties to CURIEs transform

This transform can transform entity properties to RDF CURIEs (a superset of XML QNames) based on wildcard patterns. It is used primarily when dealing with or preparing to output RDF data. Note that URL quoting is applied to the property names as part of the transform. Also note that by default the path separator character ("/") is not quoted, but the behaviour is configurable.

Prototype

{
    "type": "properties_to_curies",
    "rule": "rdf-registry-entry",
    "quote_safe_characters": "/",
    "id": "optional-id-prefix",
    "properties": [
      "optional_some_prefix", ["optional_some_pattern"]
    ]
}

Properties

Property Type Description Default Req
rule String The id of the key in the RDF registry containing the prefix rules to use for the transformation. See RDF support for more information about the RDF registry and how to configure it.   Yes*
quote_safe_characters String A string of characters that should be treated as "safe" from URL quoting by the transform. By default this is the slash character ("/"). If this property is set to the empty string (""), all characters of the property name will be URL quoted. This property can also be set at the RDF registry level, but a value set directly on the transform configuration overrides the registry value.    
id String The prefix to use for _id properties   Yes*
properties List<(String, List<String>)> A list of String,List pairs that make up the rules for which properties should be assigned which prefixes. See the example section below for a fuller explanation of this property.   Yes*

Note that rule is mutually exclusive with the id and properties pair. If all three are present, rule is given precedence and id and properties are ignored.

Example

The rule property references an RDF registry entry containing a prefix_rules object. See RDF support for more information about the RDF registry and how to configure it. Alternatively, the contents of the prefix_rules entry (i.e. the id and properties) can be included inline in the transform configuration.

Given a pre-existing RDF registry entry my_entry:

"my_entry": {
   ..
   "prefix_rules": {
       "id": "x",
       "properties": [
            "c", ["status", "code"],
            "_", ["status"],
            "t", ["t_*"],
            "m", ["status", "**", "m*"],
            "s", ["status", "**"],
            "x", ["**"]
       ]
   }
   ..
}

And a transform configuration:

{
    "type": "properties_to_curies",
    "rule": "my_entry"
}

And the input entity:

{
    "_id": "foo/bar",
    "name": "John",
    "born": "1980-01-23",
    "code": "AB32",
    "t_a": "A",
    "a/b": "A/B",
    "status": {
        "married": true,
        "spouse": "Pam",
        "code": 123,
        "t_b": {
            "t_c": "C",
            "hello": "world",
            "<s:hi>": "bye"
        }
    }
}

The transform will output the following transformed entity:

{
    "_id": "<x:foo/bar>",
    "<x:name>": "John",
    "<x:born>": "1980-01-23",
    "<x:code>": "AB32",
    "<t:t_a>": "A",
    "<x:a/b>": "A/B",
    "<_:status>": {
        "<m:married>": true,
        "<s:spouse>": "Pam",
        "<c:code>": 123,
        "<t:t_b>": {
            "<t:t_c>": "C",
            "<s:hello>": "world",
            "<s:hi>": "bye"
        }
    }
}

Setting quote_safe_characters to "" would instead yield:

{
    "_id": "<x:foo%2Fbar>",
    "<x:name>": "John",
    "<x:born>": "1980-01-23",
    "<x:code>": "AB32",
    "<t:t_a>": "A",
    "<x:a%2Fb>": "A/B",
    "<_:status>": {
        "<m:married>": true,
        "<s:spouse>": "Pam",
        "<c:code>": 123,
        "<t:t_b>": {
            "<t:t_c>": "C",
            "<s:hello>": "world",
            "<s:hi>": "bye"
        }
    }
}

Notice that "/" has now also been URL quoted ("%2F").
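
The effect of quote_safe_characters on a single property name can be reproduced with a short Python sketch. Python's urllib.parse.quote is used here as a stand-in for the transform's URL quoting, which is an assumption, and curie_key is a hypothetical helper name.

```python
from urllib.parse import quote

def curie_key(prefix, name, quote_safe_characters="/"):
    """Build the "<prefix:name>" form, URL quoting the name part."""
    return f"<{prefix}:{quote(name, safe=quote_safe_characters)}>"

print(curie_key("x", "a/b"))        # default: "/" is left alone
print(curie_key("x", "a/b", ""))    # empty string: "/" is quoted too
```

Note that urllib.parse.quote silently ignores non-ASCII characters in its safe argument, so this stand-in would not model a setting such as "/æåø".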

The URIs to CURIEs transform

This transform can transform entity properties containing URIs in the keys and/or the values to a more compact form using RDF CURIEs (a superset of XML QNames). It is used primarily when dealing with or reading RDF data. See the Working with RDF document for more information about working with RDF data in Sesam.

Prototype

{
    "type": "uris_to_curies",
    "prefix_includes": ["entry1", "entry2"]
}

Properties

Property Type Description Default Req
prefix_includes List<String> A list of string keys to look up in the instance-wide RDF registry. These keys reference objects which contain RDF support structures such as CURIE prefixes (and possibly references to other prefix sets to include). The prefixes collected from the RDF registry will be used to compress full URIs to CURIEs. See RDF support for more information about the RDF registry and how to configure it. The common RDF prefixes (e.g. RDF, RDFS, OWL) are built in, so you don't have to provide mappings for them.    

Example

Given the configuration:

{
    "transform": [
       {
         "type": "uris_to_curies",
         "prefix_includes": ["my_entry"]
       }
    ]
}

The RDF registry entry:

"my_entry": {
   "prefixes": {
      "foo": "http://psi.foo.com/",
      "test": "http://psi.test.com/"
   }
   ..
}

And the input entity:

{
    "_id": "http://psi.test.com/2",
    "http://psi.test.com/name": "John",
    "born": "1980-01-23",
    "http://psi.test.com/code": "AB32",
    "status": {
        "http://psi.foo.com/married": true,
        "spouse": "Pam",
        "url1": "~rhttp://www.foo.com",
        "url2": "~rhttp://psi.foo.com/url2",
        "code": 123,
        "child": {
            "t_c": "C",
            "http://psi.test.com/hello": "http://psi.foo.com/world",
            "http://psi.tests.com/s": "bye"
        }
    }
}

The transform will output the following compact/"compressed" transformed entity:

{
    "_id": "<test:2>",
    "<test:name>": "John",
    "born": "1980-01-23",
    "<test:code>": "AB32",
    "status": {
        "<foo:married>": true,
        "spouse": "Pam",
        "code": 123,
        "url1": "~rhttp://www.foo.com",
        "url2": "~rfoo:url2",
        "child": {
            "t_c": "C",
            "<test:hello>": "<foo:world>",
            "http://psi.tests.com/s": "bye"
        }
    }
}

Note that the transform will not attempt to unquote the remainder elements after the matched prefixes.
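
The compression shown in this example can be approximated with the Python sketch below. It is an illustration only: the function names are hypothetical, and the choice of the longest matching namespace when several prefixes match is an assumption rather than documented behaviour. Keys and plain string values become "<prefix:rest>", while "~r" URI values keep their "~r" marker.

```python
def compress(uri, prefixes):
    """Return "prefix:rest" for the longest matching namespace, else None."""
    best = None
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            if best is None or len(namespace) > len(prefixes[best]):
                best = prefix
    if best is None:
        return None
    return f"{best}:{uri[len(prefixes[best]):]}"

def convert_string(text, prefixes):
    """Compress one key or string value; leave it alone if nothing matches."""
    if text.startswith("~r"):
        curie = compress(text[2:], prefixes)
        return text if curie is None else "~r" + curie
    bare = text[1:-1] if text.startswith("<") and text.endswith(">") else text
    curie = compress(bare, prefixes)
    return text if curie is None else f"<{curie}>"

def uris_to_curies(value, prefixes):
    """Walk dicts and lists, compressing keys and string values."""
    if isinstance(value, dict):
        return {convert_string(k, prefixes): uris_to_curies(v, prefixes)
                for k, v in value.items()}
    if isinstance(value, list):
        return [uris_to_curies(v, prefixes) for v in value]
    if isinstance(value, str):
        return convert_string(value, prefixes)
    return value

PREFIXES = {"foo": "http://psi.foo.com/", "test": "http://psi.test.com/"}
print(uris_to_curies({"_id": "http://psi.test.com/2",
                      "url2": "~rhttp://psi.foo.com/url2"}, PREFIXES))
```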

Sinks

Deprecated Properties

The prefix_includes property has been deprecated for the sparql, sdshare, databrowser, and http_endpoint sinks.

Property Type Description Default Req
prefix_includes List<String> A list of string keys to look up in the node-wide RDF registry. These keys reference objects which contain RDF support structures such as CURIE prefixes (and possibly references to other prefix sets to include). The prefixes collected from the RDF registry will be used to expand CURIEs into full URIs. See RDF support for more information about the RDF registry and how to configure it. You do not need to include any prefix sets to use the common RDF prefixes (i.e. RDF, RDFS, OWL and so on).    

The Kafka sink

The Kafka sink produces data to a Kafka topic.

Entities sent to this sink will use the key, value and partition properties if present; otherwise the key will be the UTF-8 encoded version of _id and the value will be the entire entity. If partition is not specified, the partitioning will be based on the key.

The properties used match the properties emitted by the Kafka source. This means that it should be possible to consume a topic and produce to a new topic in a pipe with no DTL.

The sink will flush to Kafka after every batch.
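
The choice of key, value and partition described above could be sketched in Python as follows. The function name is hypothetical, each property is assumed to fall back independently of the others, and the value is left as the entity itself because the serialization format is not stated here.

```python
def entity_to_record(entity):
    """Pick the (key, value, partition) that would be sent for an entity."""
    if "key" in entity:
        key = entity["key"]
    else:
        key = entity["_id"].encode("utf-8")
    value = entity["value"] if "value" in entity else entity
    # None stands for "no explicit partition": partitioning follows the key.
    partition = entity.get("partition")
    return key, value, partition

print(entity_to_record({"_id": "order-1", "total": 10}))
print(entity_to_record({"_id": "k", "key": b"k", "value": b"v", "partition": 2}))
```

The second call uses the property names emitted by the Kafka source, which is why a consume-and-produce pipe needs no DTL.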

Prototype

{
    "type": "kafka",
    "system": "kafka-system-id",
    "topic": "some-topic"
}

Properties

Property Type Description Default Req
system String The id of the Kafka System component to use.   Yes
topic String The topic to send to.   Yes

Example configuration

The outermost object would be your pipe configuration, which is omitted here for brevity.

{
    "sink": {
        "type": "kafka",
        "system": "my-kafka",
        "topic": "foo"
    }
}

Systems

The Kafka system

This system can be used to read data from and write data to Apache Kafka as well as Azure Event Hubs for Apache Kafka.

Prototype

{
    "_id": "id-of-system",
    "name": "Name of system",
    "type": "system:kafka",
    "bootstrap_servers": "localhost:9092,otherhost:9092"
}

Properties

Property Type Description Default Req
bootstrap_servers String Comma separated list of bootstrap servers with hostname and port. For Azure Event Hubs this should be set to <fqdn>:9093.   Yes
sasl_username String Username to use when authenticating against a SASL enabled Kafka cluster. If username is set, authentication will be performed. For Azure Event Hubs this property must be set to $ConnectionString and the connection string should be passed as the password.   No
sasl_password String Password to use when authenticating against a SASL enabled Kafka cluster. For Azure Event Hubs this should be set to Endpoint=sb://[...].   No

The RDF registry

When working with RDF data in Sesam, we would like to be able to define, maintain and share RDF prefixes among our datasets and DTL transforms. For this purpose Sesam has a built-in RDF registry. You can configure the registry by including an entity of the following form in your configuration:

{
   "_id": "node",
   "type": "metadata",
   "rdf": {
      "dataset1": {
          "prefixes": {
              "foo": "http://example.com/foo/",
              "foo_schema": "http://example.com/foo/schema/"
          },
          "prefix_rules": {
              "id": "foo",
              "properties": [
                  "foo_schema", ["**"]
              ]
          }
      },
      "dataset2": {
          "prefixes": {
              "bar": "http://example.com/bar/",
              "bar_schema": "http://example.com/bar/schema/"
          },
          "prefix_includes": ["dataset1"],
          "quote_safe_characters": "",
          "prefix_rules": {
              "id": "bar",
              "properties": [
                  "foo_schema", ["some_prop"],
                  "bar_schema", ["**"]
              ]
          }
      }
   }
}

The root key rdf above contains the entire configuration of the RDF registry. Its sub-keys will usually correspond to dataset ids, although you can register any valid key here.

RDF registry items

The "prototype" of an RDF registry entry entry_id looks like:

..
"entry_id": {
    "prefixes": {
       "foo" : "http://example.com/foo/",
       "baz" : "http://example.com/baz/",
       "bar" : "http://example.com/baz/"
    },
    "prefix_includes": ["list_of", "other", "registry", "entries"],
    "prefix_rules": {
        "id": "bar",
        "properties": [
            "foo", ["some_prop"],
            "baz", ["**"]
        ]
    },
    "quote_safe_characters": "/æåø"
}

Note that quote_safe_characters is an optional property of the RDF registry entry. If specified, it should contain a string of characters that are excluded from URL quoting when constructing CURIEs. It can also be specified on the properties to CURIEs transform, where it takes precedence over any value in the RDF registry entry. This property defaults to "/" and would normally not need to be changed. A value of "" (the empty string) means "quote all characters". See below for more detail on the use of this transform.

Prefixes

Each registry item must contain at least a single property prefixes, which is an object containing prefix to URI mappings for CURIE generation or expansion. The registry items can also contain a list property prefix_includes, whose elements must be references to other existing RDF registry keys. When looking up items in the RDF registry, the prefixes of the entries in this list will be recursively included. Take care that you don't have overlapping prefix names, as the final result will be undefined. Also make sure you don't create circular references using this property.
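
The recursive inclusion can be pictured with the Python sketch below. It is a minimal model, not the registry's implementation: collect_prefixes is a hypothetical name, built-in prefixes are left out, raising an error on circular references is an assumption, and letting an entry's own prefixes win over included ones is an arbitrary choice since the outcome for overlapping names is undefined.

```python
def collect_prefixes(registry, key, _path=()):
    """Gather the prefixes of an entry plus everything it includes."""
    if key in _path:
        raise ValueError(f"circular prefix_includes involving {key!r}")
    entry = registry[key]
    prefixes = {}
    for included in entry.get("prefix_includes", []):
        prefixes.update(collect_prefixes(registry, included, _path + (key,)))
    # Arbitrary choice for this sketch: the entry's own prefixes win.
    prefixes.update(entry.get("prefixes", {}))
    return prefixes

REGISTRY = {
    "dataset1": {"prefixes": {"foo": "http://example.com/foo/"}},
    "dataset2": {"prefixes": {"bar": "http://example.com/bar/"},
                 "prefix_includes": ["dataset1"]},
}
print(collect_prefixes(REGISTRY, "dataset2"))
```

Tracking the current include path, rather than every entry visited so far, lets two entries include the same third entry without being reported as circular.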

Built-in prefixes

The Sesam RDF registry has built-in support for the common prefixes in RDF, such as rdf, rdfs and owl. This means you don't have to define these yourself to use them in your CURIEs. The full list of built-in prefixes is:

{
    "_": "http://example.org/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "wgs84": "http://www.w3.org/2003/01/geo/wgs84_pos#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "gs": "http://www.opengis.net/ont/geosparql#"
}

The "_" prefix is used in general as a fallback if no prefix is defined for a property when mapping an entity to its RDF representation.

Prefix rules

The final property that can exist in an RDF registry item is prefix_rules. This element tells us how to create RDF CURIEs from a plain entity: the id property contains the prefix to use for the _id property of the entity (i.e. the subject in RDF) and the properties property is a list of property pairs that encode the rules for what prefix to apply to which property of the entity.

The properties format is a flat list of string/list pairs, where the first item of each pair is the prefix to add and the second is the path expression to match against. The number of elements in the list must be even. Path expressions are evaluated in order and the first matching path expression wins: its prefix is assigned to the matching key.

A path expression is a list of strings. The left-most string value is the most specific. ** can be used to denote nestedness at an arbitrary depth. * can be used as a wildcard in the string values themselves.
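
To make the first-match evaluation concrete, here is a Python sketch of one possible reading of these rules. The matching semantics are assumptions inferred from the examples in this document, not the actual implementation: a pattern is taken to match any trailing part of a property's path, "**" is taken to span zero or more levels, and "_" is used as the fallback prefix. The function names are hypothetical.

```python
from fnmatch import fnmatchcase

def matches(pattern, path):
    """True if pattern matches the whole path; "**" spans zero or more levels."""
    if not pattern:
        return not path
    head, rest = pattern[0], pattern[1:]
    if head == "**":
        return any(matches(rest, path[i:]) for i in range(len(path) + 1))
    return bool(path) and fnmatchcase(path[0], head) and matches(rest, path[1:])

def pick_prefix(properties, path, fallback="_"):
    """Return the prefix of the first rule whose pattern matches the path."""
    if len(properties) % 2:
        raise ValueError("properties must hold an even number of elements")
    for prefix, pattern in zip(properties[0::2], properties[1::2]):
        # Assumption: a pattern may match any trailing part of the path.
        if any(matches(pattern, path[i:]) for i in range(len(path))):
            return prefix
    return fallback

RULES = [
    "c", ["status", "code"],
    "_", ["status"],
    "t", ["t_*"],
    "m", ["status", "**", "m*"],
    "s", ["status", "**"],
    "x", ["**"],
]

for path in (["code"], ["status"], ["status", "code"],
             ["status", "married"], ["status", "t_b", "hello"]):
    print("/".join(path), "->", pick_prefix(RULES, path))
```

Because the rules are tried in order, moving the catch-all ["**"] rule to the front would assign "x" to every property.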

The property to CURIE transform

A complete example of how the prefix_rules property works; we want to transform an entity that looks like:

{
    "_id": "2",
    "name": "John",
    "born": "1980-01-23",
    "code": "AB32",
    "t_a": "A",
    "status": {
        "married": true,
        "spouse": "Pam",
        "code": 123,
        "t_b": {
            "t_c": "C",
            "hello": "world",
            "<s:hi>": "bye"
        }
    }
}

to RDF form using CURIEs. We start by defining the rules for this transformation in the RDF registry entry my_entry:

"my_entry": {
   ..
   "prefix_rules": {
       "id": "x",
       "properties": [
            "c", ["status", "code"],
            "_", ["status"],
            "t", ["t_*"],
            "m", ["status", "**", "m*"],
            "s", ["status", "**"],
            "x", ["**"]
       ]
   }
   ..
}

We then add a properties to CURIEs transform to the start of our pipe's transform section:

..
    "transform": [
        {
            "type": "properties_to_curies",
            "rule": "my_entry"
        }
        ..
    ]

This transform will use our my_entry rules and produce the following transformed entity:

{
    "_id": "<x:2>",
    "<x:name>": "John",
    "<x:born>": "1980-01-23",
    "<x:code>": "AB32",
    "<t:t_a>": "A",
    "<_:status>": {
        "<m:married>": true,
        "<s:spouse>": "Pam",
        "<c:code>": 123,
        "<t:t_b>": {
            "<t:t_c>": "C",
            "<s:hello>": "world",
            "<s:hi>": "bye"
        }
    }
}

RDF input

Sesam supports RDF input from several different sources:

Additionally, you can set up an HTTP endpoint source which includes an SDShare Push capable HTTP endpoint where you can post RDF data in NTriples format in accordance with the SDShare Push protocol.

The URIs to CURIEs transform

All of these methods of RDF input will provide entities to your data flows of the general form:

{
    "_id": "<http://example.com/bar>",
    "<http://example.com/schema/some_predicate>": "Some literal",
    "<http://example.com/schema/other_predicate>": "~rhttp://example.com/zoo"
}

When processing this data in the flow, we would like to first transform these entities to CURIE form using the RDF registry to manage the prefixes. In the above example we can add a URIs to CURIEs transform to the pipe to achieve this:

{
    "_id": "my-pipe",
    ..
    "transform": [
       {
         "type": "uris_to_curies",
         "prefix_includes": ["my_entry"]
       }
    ]
}

where the corresponding my_entry in the RDF registry looks like:

..
"my_entry": {
    "prefixes": {
        "foo": "http://example.com/",
        "foo_schema": "http://example.com/schema/"
    }
    ..
}
..

This transform will then produce the following entity:

{
   "_id": "<foo:bar>",
   "<foo_schema:some_predicate>": "Some literal",
   "<foo_schema:other_predicate>": "~rfoo:zoo"
}

RDF in transforms

The Sesam DTL language features several functions that are useful when working with RDF data in your flow.

Accessing CURIEs properties

When addressing properties in CURIE form in a DTL transform, you can simply use their names verbatim. For example:

..
["rename", "<foo:third_predicate>", "<foo:some_predicate>"],
["copy", "_S.<foo_schema:other_predicate>"],
["add", "<rdfs:label>", "Bob"]
..

You can also use the CURIEs in path expressions in the same way as any other property name. If you want to add a URI literal as part of your transformed entity you can use the DTL curie function, which takes a prefix and a value expression (i.e. a literal or a function) and produces a URI property value:

..
["add", "<foo_schema:baz>", ["curie", "foo", "zoo"]]
..

This will add a property that looks like:

{
  ..
  "<foo_schema:baz>": "~rfoo:zoo"
  ..
}

CURIE expansion in DTL

When processing RDF data in a flow, we sometimes would like to expand an entity or a child entity from CURIEs to full URI form (for example if there are conflicting usages of prefixes). This can be done using the DTL uri-expand function:

..
["add", "<baz:expanded>", ["uri-expand", ["string", "my_entry"], {"_id": "<foo:bob>", "<foo:name>": "Bob Jones"}]]
..

This will expand the properties of the entity (here shown inline, but typically it will come from a hops join or some other function) to its "full" form:

{
  ..
  "<baz:expanded>": {
      "_id": "<http://example.com/foo/bob>",
      "<http://example.com/foo/name>": "Bob Jones"
  }
  ..
}

Note that expanding CURIEs is normally done at the endpoint of your flow (i.e. by the sink or an SDShare feed, see below). However, if the sink you are using to output the final data is not RDF aware (i.e. does not support automatic prefix expansion) you can use the uri-expand function to achieve the same functionality.
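
The expansion step itself amounts to swapping each known prefix for its namespace URI. The Python sketch below models only that step (not the prefix rules that uri-expand also applies to plain entities). The function names are hypothetical, and leaving strings with unknown prefixes untouched is an assumption.

```python
def expand_string(text, prefixes):
    """Expand "<prefix:local>" or "~rprefix:local"; leave other strings alone."""
    if text.startswith("~r"):
        opener, body, closer = "~r", text[2:], ""
    elif text.startswith("<") and text.endswith(">"):
        opener, body, closer = "<", text[1:-1], ">"
    else:
        return text
    prefix, separator, local = body.partition(":")
    if not separator or prefix not in prefixes:
        return text  # assumption: unknown prefixes are left untouched
    return f"{opener}{prefixes[prefix]}{local}{closer}"

def expand_curies(value, prefixes):
    """Walk dicts and lists, expanding keys and string values."""
    if isinstance(value, dict):
        return {expand_string(k, prefixes): expand_curies(v, prefixes)
                for k, v in value.items()}
    if isinstance(value, list):
        return [expand_curies(v, prefixes) for v in value]
    if isinstance(value, str):
        return expand_string(value, prefixes)
    return value

PREFIXES = {"foo": "http://example.com/foo/"}
print(expand_curies({"_id": "<foo:bob>", "<foo:name>": "Bob Jones"}, PREFIXES))
```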

Pipes

Short-hand configuration

As mentioned earlier, in the pipe section, there is a special "short hand" configuration for one of the most used pipes: pipes pumping entities from RDBMS tables to an internal dataset. Since this is an often encountered use case, we have condensed the information needed into a single URL-style form:

[
    {
       "_id": "Northwind",
       "type": "system:mysql",
       "name": "Northwind database",
       "username": "northwind",
       "password": "secret",
       "host": "mydb.example.org",
       "database": "Northwind"
    },
    {
       "_id": "Northwind:Orders",
       "type": "pipe",
       "name": "Orders from northwind",
       "short_config": "sql://Northwind/Orders"
    }
]

Currently, only the sql system and source are supported, though other short forms may be added at a later time. The above example using the short_config form is equivalent to this fully expanded pipe configuration:

[
    {
       "_id": "Northwind",
       "type": "system:mysql",
       "name": "Northwind database",
       "username": "northwind",
       "password": "secret",
       "host": "mydb.example.org",
       "database": "Northwind"
    },
    {
       "_id": "Northwind:Orders",
       "type": "pipe",
       "source": {
           "type": "sql",
           "system": "Northwind",
           "table": "Orders"
       },
       "sink": {
           "type": "dataset",
           "dataset": "Northwind:Orders"
       },
       "pump": {
           "schedule_interval": 30
       }
    }
]

You can combine the short form with properties from the dataset sink, sql source and specific pump properties, as long as the _id and type properties aren't overridden, for example changing the pump schedule and startup flag:

[
    {
       "_id": "Northwind",
       "type": "system:mysql",
       "name": "Northwind database",
       "username": "northwind",
       "password": "secret",
       "host": "mydb.example.org",
       "database": "Northwind"
    },
    {
       "_id": "Northwind:Orders",
       "type": "pipe",
       "name": "Orders from northwind",
       "short_config": "sql://Northwind/Orders",
       "pump": {
           "schedule_interval": 60,
           "run_at_startup": true
       }
    }
]
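
The relationship between the short form and the expanded form can be sketched in Python as follows. This is an illustration built only from the examples above: expand_short_config is a hypothetical helper, the default schedule_interval of 30 is taken from the expanded example, and merging overrides one section at a time is an assumption.

```python
from urllib.parse import urlsplit

def expand_short_config(pipe):
    """Expand a pipe using "short_config": "sql://<system>/<table>"."""
    parts = urlsplit(pipe["short_config"])
    if parts.scheme != "sql":
        raise ValueError("only sql:// short forms are described")
    defaults = {
        "source": {"type": "sql", "system": parts.netloc,
                   "table": parts.path.lstrip("/")},
        "sink": {"type": "dataset", "dataset": pipe["_id"]},
        "pump": {"schedule_interval": 30},
    }
    expanded = {k: v for k, v in pipe.items() if k != "short_config"}
    for section, values in defaults.items():
        # Properties given next to short_config override the defaults.
        expanded[section] = {**values, **pipe.get(section, {})}
    return expanded

pipe = {
    "_id": "Northwind:Orders",
    "type": "pipe",
    "short_config": "sql://Northwind/Orders",
    "pump": {"schedule_interval": 60, "run_at_startup": True},
}
print(expand_short_config(pipe))
```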