Entity Data Model
Sesam uses an entity data model as the core representation of data. Each entity is a dictionary of key-value pairs. Each key is a string and the value can be either a literal value, a list or another dictionary.
A Sesam entity has a few special keys that should not be tampered with. The following data prototype explains these special properties.
[
{
"_id": "the identity of the entity",
"_updated": "a token indicating when this was modified",
"_deleted": "indicating if the entity should be treated as deleted",
"_hash": "a hash string of the entity's content",
"_previous": "the _updated token of the previous version",
"_ts": "timestamp for when entity was registered in source",
"_filtered": "true when entity has been filtered out by pipe transforms"
}
]
The entity data model supports a wide range of data types, including string, integer, decimal, boolean, namespaced identifier, URI, bytes and datetime. Over the wire, both a binary and a JSON representation are used.
This is an example of what entities could look like:
[
{
"_id": "1",
"name": "Bill",
"dob": "01-01-1980"
},
{
"_id": "2",
"name": "Jane",
"dob": "04-10-1992"
}
]
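As a minimal sketch of working with the example above, the following Python snippet parses the two entities and indexes them by `_id`. The variable names (`raw`, `by_id`) are illustrative only, and the check that `_id` is a string follows the rule stated under Reserved fields; this is not Sesam's own implementation.

```python
import json

# The example entities from above, as they would appear in JSON.
raw = """
[
  {"_id": "1", "name": "Bill", "dob": "01-01-1980"},
  {"_id": "2", "name": "Jane", "dob": "04-10-1992"}
]
"""

entities = json.loads(raw)

by_id = {}
for entity in entities:
    # Every entity is a dictionary with string keys.
    if not isinstance(entity, dict):
        raise TypeError("an entity must be a dictionary")
    # The _id value is always a string.
    if not isinstance(entity.get("_id"), str):
        raise ValueError("an entity needs a string _id")
    by_id[entity["_id"]] = entity

print(by_id["2"]["name"])  # prints Jane
```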
Reserved fields
Entity fields starting with _ are reserved. Any such fields, except _id and _deleted, will be ignored when writing an entity to a dataset. Note that the fields are only reserved at the root level, so child entities can have them.
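The rule above can be illustrated with a small Python sketch. The helper name `strip_reserved` is hypothetical and only mimics the described behaviour: reserved fields at the root level are dropped, `_id` and `_deleted` are kept, and nested entities are left untouched.

```python
# Reserved fields that survive a write, according to the rule above.
KEPT_RESERVED = {"_id", "_deleted"}


def strip_reserved(entity):
    """Return a copy of the entity without ignored root-level reserved fields."""
    return {
        key: value
        for key, value in entity.items()
        if not key.startswith("_") or key in KEPT_RESERVED
    }


entity = {
    "_id": "1",
    "_ts": 123,
    "_hash": "abc",
    "name": "Bill",
    "child": {"_ts": 5},  # reserved names are allowed below the root level
}

print(strip_reserved(entity))
# prints {'_id': '1', 'name': 'Bill', 'child': {'_ts': 5}}
```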
Field | Description | Required |
---|---|---|
_id | This is the primary key of the entity. The value is always a string. | Yes |
_deleted | If […] | |
_updated | The sequence of the entity. The value must be either a string or an integer value. The value is used to tell the order of the entities. The value is meant to be opaque, and should not be parsed or interpreted by parties other than the source that produced it. The […] | |
_hash | A string containing the hash of the entity’s content. This value is used to decide when an entity has changed. Of the reserved fields, only […] This field is generated automatically when writing an entity to a dataset. | |
_previous | A pointer back to the previous version of this entity. The value refers to the […] This field is generated automatically when writing an entity to a dataset. | |
_ts | This is the real-world timestamp for when the entity was added to the dataset. The value is an integer representing the number of microseconds since epoch (January 1st 1970 UTC). This field is used for informational purposes only. This field is generated automatically when writing an entity to a dataset. | |
_filtered | This boolean field is added automatically by DTL if the entity was filtered out by the filter transform function. Its purpose is to signal to the sink that the entity should be deleted if it exists in the sink. Currently only the dataset sink supports this. If the sink does not support it, the entity is discarded instead of being passed along to the sink. | |
[…] | If […] Note that this property has been superseded by a new way of doing dependency tracking that does not require modifying entities. If you see this property then the entity was likely materialized by the old implementation. This field is generated automatically by the dependency tracking. | |
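Since `_ts` is described as an integer number of microseconds since epoch, it can be turned into a readable UTC timestamp. The following is a minimal Python sketch; the function name `ts_to_datetime` and the sample value are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ts_to_datetime(ts_microseconds):
    """Convert a microseconds-since-epoch integer to an aware UTC datetime."""
    # Integer arithmetic via timedelta avoids float rounding on large values.
    return EPOCH + timedelta(microseconds=ts_microseconds)


print(ts_to_datetime(1_500_000_000_000_000).isoformat())
# prints 2017-07-14T02:40:00+00:00
```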
Special fields
Entity fields starting with $ are semi-reserved. They have special meaning and will sometimes be produced and consumed by built-in components. These fields are normal fields that will be hashed and stored as part of the entity.
Field | Description | Required |
---|---|---|
[…] | This field can be used to hold all the identities of an entity. An entity may have multiple identities, i.e. in addition to the one in […] | |
[…] | The create-child DTL transform function will add the created child entity as a value in the […] | |
[…] | The merge source will set the […] | |
Standard types
Entities are mapped to and from JSON objects, so they support the same data types as JSON does. Because JSON only supports a limited number of data types there is also limited support for Transit data types.
Type | Description | Example |
---|---|---|
Dict | Like a JSON object where keys are always strings. This type is not orderable. | |
Entity | Like a Dict, but with an […] | |
List | A list of values. Values can be of any type. This type is not orderable. | |
String | A string value. Maximum length is 4294967296 bytes. | |
Integer | An integer value. The range of this data type is unlimited, i.e. it can store any positive or negative integer value. | |
Decimal | A decimal number. This data type has arbitrary precision. Use it instead of […] | |
Float | A double-precision floating point number. The valid range is that of the IEEE 754 binary64 format, because the value is stored internally as a double-precision floating-point number. Note that you may lose precision when using this data type. | |
Boolean | A boolean value. Either […] | |
Null | A null value. Typically used to represent a missing value. This type is not orderable. | |
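The difference between the Integer, Float and Decimal types can be illustrated with their closest Python counterparts. This is only an analogy under the assumption that Python's `int`, `float` and `decimal.Decimal` behave like the types described above; it says nothing about how Sesam stores values internally.

```python
import json
from decimal import Decimal

# Integers are unbounded, so very large values are kept exactly.
big = 2 ** 100
print(big)  # prints 1267650600228229401496703205376

# Double-precision floats can lose precision.
float_sum = 0.1 + 0.2
print(float_sum == 0.3)  # prints False

# Arbitrary-precision decimals keep the exact value.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))  # prints True

# When reading JSON, numbers with a fraction can be parsed as Decimal
# instead of float to avoid the rounding shown above.
parsed = json.loads('{"price": 0.1}', parse_float=Decimal)
print(parsed["price"])  # prints 0.1
```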
Extension types
Transit encoded values are represented as strings in JSON. The value is prefixed by “~” and a tag character that indicates the type of the value. The extension types below are currently the only ones supported. Transit types that are not recognized will be treated as string values.
Note
There’s currently no support for escaping string literals that start with a “~” character.
Type | Description | Example |
---|---|---|
NI | Namespaced Identifier (NI) | |
URI | Uniform Resource Identifier (URI) | |
Date | A date value. The valid range is from […] | |
Datetime | Date and time with up to nanosecond precision. The valid range is from […] | |
Bytes | A base64 encoded binary value. | |
UUID | A universally unique identifier formatted as hexadecimal text. | |
Decimal | A decimal number with arbitrary precision. | |
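The encoding rule described above (a “~” prefix, one tag character, then the payload) can be sketched in Python as follows. The tag characters `u` (UUID), `b` (base64 bytes) and `f` (arbitrary-precision decimal) are taken from the Transit format itself; that Sesam uses exactly these tags is an assumption of this sketch, and the function name `decode_value` is hypothetical. Unrecognized tags fall back to the plain string, as the text above describes.

```python
import base64
import uuid
from decimal import Decimal

# Tag character -> decoder. Tags follow the Transit format; their use by
# Sesam is assumed here, not confirmed by this document.
DECODERS = {
    "u": uuid.UUID,          # UUID formatted as hexadecimal text
    "b": base64.b64decode,   # base64 encoded binary value
    "f": Decimal,            # arbitrary-precision decimal
}


def decode_value(value):
    """Decode a Transit-style string; anything unrecognized is returned as-is."""
    if isinstance(value, str) and len(value) >= 2 and value.startswith("~"):
        decoder = DECODERS.get(value[1])
        if decoder is not None:
            return decoder(value[2:])
    return value


print(decode_value("~bSGVsbG8="))  # prints b'Hello'
print(decode_value("~f1.5"))       # prints 1.5
print(decode_value("~zunknown"))   # prints ~zunknown
```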
Mixed type ordering
When lists containing values of multiple types have to be ordered, the following ordering is used:
- Null
- Boolean
- Integer, Float, Decimal
- Date, Datetime
- UUID
- Namespaced identifier (NI)
- URI
- String
- Dict
- Tuple
- Bytes
Types listed on the same line are compatible and internally orderable. Values of incompatible types are sorted not by value but by the rank of their type (see the list above).
Example: ["sorted", ["list", 1.5, "b", 1, "a", 2]] returns [1, 1.5, 2, "a", "b"] because strings and numbers are not compatible types, so the numbers are ordered before the strings. Integers, floats and decimals are compatible, so 1, 1.5 and 2 are sorted together by value.
Note
Values of the Dict type are ordered by sorting their keys and then comparing each key+value pair.
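The rank-based ordering can be sketched in Python with a sort key that pairs the rank of the type with the value itself. This is an illustration under stated assumptions, not Sesam's implementation: it covers only Null, Boolean, the numeric types, String and Dict, and the names `type_rank` and `sort_key` are hypothetical.

```python
from decimal import Decimal


def type_rank(value):
    """Return the position of the value's type in the ordering listed above."""
    if value is None:
        return 0
    if isinstance(value, bool):  # tested before int, since bool is a subclass of int
        return 1
    if isinstance(value, (int, float, Decimal)):
        return 2
    if isinstance(value, str):
        return 7
    if isinstance(value, dict):
        return 8
    raise TypeError(f"type not covered by this sketch: {type(value).__name__}")


def sort_key(value):
    """Build a key that sorts by type rank first and by value second."""
    rank = type_rank(value)
    if value is None:
        return (rank, 0)
    if isinstance(value, dict):
        # Dicts: sort the keys, then compare each key+value pair.
        pairs = sorted(value.items(), key=lambda item: item[0])
        return (rank, tuple((k, sort_key(v)) for k, v in pairs))
    return (rank, value)


print(sorted([1.5, "b", 1, "a", 2], key=sort_key))
# prints [1, 1.5, 2, 'a', 'b']
```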