Annotated Example
Let’s say that we have two datasets, person and orders, and that
we want to transform the persons by joining in their orders and
applying a few other transform functions. In this section you’ll find a
complete DTL transform that takes entities from the person dataset,
joins them with entities from the orders dataset and creates new
entities from them.
Given the following source entity (from the person dataset):
{
    "_id": "1",
    "name": "John Smith",
    "age": 25
}
We then want to transform it into the following target entity:
{
    "_id": "1",
    "type": "customer",
    "name": "JOHN SMITH",
    "orders": [
        {"_id": "100", "amount": 320},
        {"_id": "200", "amount": 500}
    ],
    "order_count": 2
}
A pipe with the dtl transform below lets us transform persons into
persons with orders:
{
    "_id": "person-with-orders",
    "type": "pipe",
    "source": {
        "type": "dataset",
        "dataset": "person"
    },
    "transform": {
        "type": "dtl",
        "rules": {
            "default": [
                ["copy", "_id"],
                ["add", "type", "customer"],
                ["add", "name", ["upper", "_S.name"]],
                ["add", "orders",
                    ["sorted", "_.amount", ["apply-hops", "order", {
                        "datasets": ["orders o"],
                        "where": [
                            ["eq", "_S._id", "o.cust_id"]
                        ]
                    }]]],
                ["add", "order_count", ["count", "_T.orders"]],
                ["filter", ["gt", "_T.order_count", 10]]
            ],
            "order": [
                ["copy", "_id"],
                ["add", "amount", "_S.amount"]
            ]
        }
    },
    "sink": {
        "type": "dataset",
        "dataset": "person-with-orders"
    }
}
Explanation:
- The dtl transform will receive source entities from the person dataset. It will transform them and write them to the person-with-orders dataset.
- There are two named rules specified in the DTL transform: default and order. The default rule is mandatory and is the one that is applied to the entities in the person dataset. You can think of it as the entry point of the execution, similar to a main function in many programming languages.
- ["copy", "_id"] copies the _id property from the source entity to the target entity.
- ["add", "type", "customer"] adds the type property to the target entity with the literal value "customer".
- ["add", "name", ["upper", "_S.name"]] adds the name property to the target entity, uppercasing the name in the source entity.
["add", "orders",
    ["sorted", "_.amount",
        ["apply-hops", "order", {
            "datasets": ["orders o"],
            "where": [
                ["eq", "_S._id", "o.cust_id"]
            ]
        }]
    ]
]
- The expression above adds the orders property to the target entity. It does this by joining the source entity’s _id property with the cust_id property of entities in the orders dataset. The join is done by the apply-hops function, which takes a hops specification. The specification contains a list of datasets and assigns aliases to them; the aliases are then exposed as variables that you can use in expressions in the where clause. The result of the join is a list of orders:
[
    {
        "_id": "100",
        "amount": 320,
        "order_lines": ["..."],
        "cust_id": "1"
    },
    {
        "_id": "200",
        "amount": 500,
        "order_lines": ["..."],
        "cust_id": "1"
    }
]
Next, the order rule is applied to each of the joined entities. The result is a list of orders with two properties, _id and amount:
[
    {
        "_id": "100",
        "amount": 320
    },
    {
        "_id": "200",
        "amount": 500
    }
]
The order entities are then sorted by their amount property before being assigned to the orders property on the target entity:
[
    {
        "_id": "100",
        "amount": 320
    },
    {
        "_id": "200",
        "amount": 500
    }
]
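The join, the order rule and the sort can be imitated in plain Python to make the data flow concrete. This is only an illustrative sketch and not how Sesam evaluates DTL: the function names order_rule and apply_hops_order are hypothetical, the in-memory orders list stands in for the orders dataset, and the order with _id "300" is made up to show a non-matching entity. It also assumes that sorted orders ascending.

```python
# Illustrative emulation only; none of these names are part of Sesam.

def order_rule(source):
    """Mimic the "order" rule: copy _id and add amount from the source entity."""
    return {"_id": source["_id"], "amount": source["amount"]}


def apply_hops_order(person, orders):
    """Mimic ["sorted", "_.amount", ["apply-hops", "order", {...}]]."""
    # "where": ["eq", "_S._id", "o.cust_id"] -> keep orders whose cust_id matches.
    joined = [o for o in orders if o["cust_id"] == person["_id"]]
    # Apply the "order" rule to every joined entity.
    transformed = [order_rule(o) for o in joined]
    # ["sorted", "_.amount", ...] -> sort by amount (ascending assumed).
    return sorted(transformed, key=lambda o: o["amount"])


person = {"_id": "1", "name": "John Smith", "age": 25}
orders = [
    {"_id": "200", "amount": 500, "order_lines": ["..."], "cust_id": "1"},
    {"_id": "100", "amount": 320, "order_lines": ["..."], "cust_id": "1"},
    # Hypothetical order for another customer; it must not be joined in.
    {"_id": "300", "amount": 75, "order_lines": ["..."], "cust_id": "2"},
]

result = apply_hops_order(person, orders)
print(result)
```

The input list is deliberately out of order so that the effect of the sort step is visible in the printed result.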
- ["add", "order_count", ["count", "_T.orders"]] adds the order_count property to the target entity. Its value is the number of order entities in the target entity’s orders property. Note that we can access properties on the target entity once we’ve added them.
- ["filter", ["gt", "_T.order_count", 10]] lets processing continue only if the expression evaluates to true. If the filter expression is false, processing stops and the target entity is not emitted / created. With the threshold of 10 used here, only persons with more than ten orders are emitted.
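The way the default rule builds up the target entity step by step, and how the filter decides whether it is emitted, can be pictured with a small Python sketch. It is an assumption-laden illustration rather than Sesam code: default_rule, orders_for_person and min_orders are hypothetical names, and the already joined and sorted orders are passed in as a plain list.

```python
# Illustrative emulation only; none of these names are part of Sesam.

def default_rule(source, orders_for_person, min_orders=10):
    """Apply the steps of the "default" rule in order, building the target."""
    target = {}
    target["_id"] = source["_id"]                  # ["copy", "_id"]
    target["type"] = "customer"                    # ["add", "type", "customer"]
    target["name"] = source["name"].upper()        # ["upper", "_S.name"]
    target["orders"] = orders_for_person           # result of the sorted join
    # ["count", "_T.orders"] reads a property added to the target earlier.
    target["order_count"] = len(target["orders"])
    # ["filter", ["gt", "_T.order_count", 10]]: a false expression means
    # the entity is not emitted, modelled here by returning None.
    if not target["order_count"] > min_orders:
        return None
    return target


source = {"_id": "1", "name": "John Smith", "age": 25}
few = [{"_id": "100", "amount": 320}, {"_id": "200", "amount": 500}]
# Eleven made-up orders, enough to pass the "greater than 10" filter.
many = [{"_id": str(i), "amount": i} for i in range(11)]

print(default_rule(source, few))
print(default_rule(source, many)["order_count"])
```

Swapping the order of the count and filter steps would break the sketch in the same way it would break the DTL, because the filter depends on a target property that has to exist first.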
Note
Transform functions are applied in the order given. The order is significant, and one transform can use target entity properties created by earlier transform functions.
The hops function is deterministic but not sorted (it produces a deterministic order based on the _id property of the entities within each dataset it processes). You must apply the sorted function to the result of a hops join to achieve a particular order.
The filter function can be used to stop transformation of individual entities, effectively filtering them out of the output stream.
When the DTL of a pipe is modified, the pipe’s “last-seen” value must be cleared in order to reprocess already seen entities with the new DTL. This can be done by setting the “last-seen” value to an empty string with the update-last-seen operation in the SESAM API.