Entity Data Model

Sesam uses an entity data model as the core representation of data. Each entity is a dictionary of key-value pairs. Each key is a string and the value can be either a literal value, a list or another dictionary.

A Sesam entity has a few special keys that should not be tampered with. The following data prototype explains these special properties.

[
  {
        "_id": "the identity of the entity",
        "_updated": "a token indicating when this was modified",
        "_deleted": "indicating if the entity should be treated as deleted",
        "_hash": "a hash string of the entity's content",
        "_previous": "the _updated token of the previous version",
        "_ts": "timestamp for when entity was registered in source",
        "_filtered": "true when entity has been filtered out by pipe transforms"
  }
]

The entity data model supports a wide range of data types, including string, integer, decimal, boolean, namespaced identifier, URI, bytes and datetime. Over the wire, both a binary and a JSON representation are used.

This is an example of what entities could look like:

[
    {
        "_id": "1",
        "name": "Bill",
        "dob": "01-01-1980"
    },
    {
        "_id": "2",
        "name": "Jane",
        "dob": "04-10-1992"
    }
]
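
As a rough illustration of the shape described above, the following Python sketch checks that a value is a dictionary with string keys and a string _id. The helper name is_entity is hypothetical and not part of any Sesam API; it only mirrors the rules stated in this document.

```python
def is_entity(value):
    """Return True if value has the basic shape of an entity.

    Hypothetical helper for illustration only: an entity is a dict,
    every key is a string, and "_id" is present and is a string.
    """
    if not isinstance(value, dict):
        return False
    if not all(isinstance(key, str) for key in value):
        return False
    return isinstance(value.get("_id"), str)


# The two example entities from the text above.
entities = [
    {"_id": "1", "name": "Bill", "dob": "01-01-1980"},
    {"_id": "2", "name": "Jane", "dob": "04-10-1992"},
]

all_valid = all(is_entity(entity) for entity in entities)
```

A dictionary without an _id, or with a non-string _id such as the integer 1, would be rejected by this check.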

Reserved fields

Entity fields starting with _ are reserved. Any such fields, except _id and _deleted, will be ignored when writing an entity to a dataset. Note that the fields are only reserved at the root level, so child entities can have them.

Field

Description

Required

_id

This is the primary key of the entity. The value is always a string.

Yes

_deleted

If true, then the entity is deleted. All other values are interpreted as if the entity is not deleted.

_updated

The sequence of the entity. The value must be either a string or an integer value. The value is used to tell the order of the entities. The value is meant to be opaque, and should not be parsed or interpreted by parties other than the source that produced it. The _updated value can be passed through to the since request parameter in HTTP endpoints.

_hash

A string containing the hash of the entity’s content. This value is used to decide when an entity has changed. Of the reserved fields, only _id and _deleted contribute to the hash value.

This field is generated automatically when writing an entity to a dataset.

_previous

A pointer back to the previous version of this entity. The value refers to the _updated field of the previous version of the entity. If the field is missing or the value is null, then there exists no previous version. Note that the previous version may not actually exist anymore as dataset compaction may have reclaimed it.

This field is generated automatically when writing an entity to a dataset.

_ts

This is the real-world timestamp for when the entity was added to the dataset. The value is an integer representing the number of microseconds since epoch (January 1st 1970 UTC). This field is used only for informational purposes.

This field is generated automatically when writing an entity to a dataset.

_filtered

This boolean field is added automatically by DTL if the entity was filtered out by the filter transform function. The purpose of it is to signal to the sink that the entity should be deleted if it exists in the sink. Only the dataset sink supports this currently. If the sink does not support it then the entity will be discarded instead of passed along to the sink.

_tracked

If true then the entity was added to the dataset by dependency tracking.

Note that this property has been superseded by a new way of doing dependency tracking that does not require modifying entities. If you see this property then the entity was likely materialized by the old implementation.

This field is generated automatically by the dependency tracking.
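
The rules in this section can be sketched in plain Python. This is a minimal illustration, not Sesam's implementation: the function names strip_reserved, is_deleted and ts_to_datetime are hypothetical, and the sketch only encodes what is stated above (reserved fields apply at the root level only, only the value true marks an entity as deleted, and _ts counts microseconds since the epoch).

```python
from datetime import datetime, timedelta, timezone

# Reserved fields that are kept when an entity is written to a dataset.
WRITABLE_RESERVED = {"_id", "_deleted"}


def strip_reserved(entity):
    """Drop root-level fields starting with "_", except _id and _deleted.

    Nested dictionaries are left untouched, because the fields are only
    reserved at the root level.
    """
    return {
        key: value
        for key, value in entity.items()
        if not key.startswith("_") or key in WRITABLE_RESERVED
    }


def is_deleted(entity):
    """Only the boolean value true marks the entity as deleted."""
    return entity.get("_deleted") is True


def ts_to_datetime(ts):
    """Convert a _ts value (microseconds since epoch) to a UTC datetime."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=ts)


incoming = {
    "_id": "1",
    "_updated": 42,
    "_hash": "abc",
    "name": "Bill",
    "child": {"_updated": 7},
}
written = strip_reserved(incoming)
```

Here written keeps _id, name and the child dictionary (including the child's own _updated), while the root-level _updated and _hash are dropped.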

Special fields

Entity fields starting with $ are semi-reserved. They have special meaning and will sometimes be produced and consumed by built-in components. These fields are normal fields that will be hashed and stored as part of the entity.

Field

Description

Required

$ids

This field can be used to hold all the identities of an entity. An entity may have multiple identities, i.e. in addition to the one in _id. The value type is always NI. The merge source will collect all the merged identities in this field.

$children

The create-child DTL transform function will add the created child entity as a value in the $children property of the target entity. The emit_children transform can then later be used to expand the $children entities into standalone entities.

$replaced

The merge source will set the $replaced field to true if the output entity is being replaced with a new entity that has a different entity id. This typically happens when the entity is being merged with another entity where the id of the other entity takes precedence over the current one.
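
To make the role of $children more concrete, here is a plain-Python sketch of the idea of expanding child entities into standalone entities. It is not DTL and not how the emit_children transform is implemented; the function name expand_children and the sample data are hypothetical, and the sketch assumes that $children holds a list of dictionaries.

```python
def expand_children(entity):
    """Return the child entities stored under "$children" as a new list.

    Assumption for illustration: "$children" is a list of dictionaries.
    Anything else found in the list is skipped.
    """
    children = entity.get("$children", [])
    return [dict(child) for child in children if isinstance(child, dict)]


# Hypothetical parent entity with two child entities attached.
order = {
    "_id": "order1",
    "$children": [
        {"_id": "order1-line1", "qty": 2},
        {"_id": "order1-line2", "qty": 5},
    ],
}

standalone = expand_children(order)
```

The returned dictionaries are shallow copies, so changing a top-level field on a standalone entity does not alter the parent's $children list.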

Standard types

Entities are mapped to and from JSON objects, so they support the same data types as JSON does. Because JSON only supports a limited number of data types there is also limited support for Transit data types.

Type

Description

Example

Dict

Like a JSON object where keys are always strings. This type is not orderable.

{"a": 123}

Entity

Like a Dict, but with an _id property. The _id property must be a string.

{"_id": "person1", "a": 123}

List

A list of values. Values can be of any type. This type is not orderable.

["abc", 123, [4, 5], {"x": "y"}]

String

A string value. Maximum length is 4294967296 bytes.

"abc"

Integer

An integer value. The range of this data type is unlimited, i.e. it can store any positive or negative integer value.

123

Decimal

A decimal number. This data type has arbitrary precision. Use it instead of Float when/if keeping precision is important to your application.

123.456

Float

A double-precision floating point number. The valid range is the IEEE 754 binary64 format, because we’re internally storing the value as a double-precision floating-point number. Note that you may lose precision when using this data type.

123.456

Boolean

A boolean value. Either true or false.

true

Null

A null value. Typically used to represent a missing value. This type is not orderable.

null
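
The difference between Float and Decimal can be seen with a small Python experiment. This sketch shows general floating-point behaviour using Python's standard json and decimal modules; it does not show how Sesam itself parses values, and the field names price and tax are made up for the example.

```python
import json
from decimal import Decimal

raw = '{"_id": "1", "price": 0.1, "tax": 0.2}'

# Default parsing turns JSON numbers with a fraction into binary floats.
as_float = json.loads(raw)

# parse_float=Decimal keeps the digits exactly as they were written.
as_decimal = json.loads(raw, parse_float=Decimal)

float_sum = as_float["price"] + as_float["tax"]        # 0.30000000000000004
decimal_sum = as_decimal["price"] + as_decimal["tax"]  # Decimal('0.3')
```

The float sum picks up a rounding error from the binary representation, while the decimal sum stays exact.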

Extension types

Transit encoded values are represented as strings in JSON. The value is prefixed by “~” and a tag character that indicates the type of the value. The extension types below are currently the only ones supported. Transit types that are not recognized will be treated as string values.

Note

There’s currently no support for escaping string literals that start with a “~” character.

Type

Description

Example

NI

Namespaced Identifier (NI)

"~:mynamespace:123"

URI

Uniform Resource Identifier (URI)

"~rhttp://www.sesam.io/"

Date

A date value. The valid range is from "~t0001-01-01" to "~t9999-12-31".

"~t2015-12-31"

Datetime

Date and time with up to nanosecond precision. The valid range is from "~t0001-01-01T00:00:00Z" to "~t9999-12-31T23:59:59.123456789Z". The date and time parts of the string are mandatory. The fraction of a second is optional. The value must always be in UTC, so the Z at the end is mandatory.

"~t2015-01-02T03:04:05.123456789Z", "~t1973-01-22T23:11:54Z"

Bytes

A base64 encoded binary value.

"~bAAECAwQF"

UUID

A universally unique identifier (UUID) formatted as hexadecimal text.

"~u531a379e-31bb-4ce1-8690-158dceb64be6"

Decimal

A decimal number with arbitrary precision.

"~f12345678901234567890.1234567890"

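The tag-character scheme in the table above can be sketched as a small Python decoder. This is an illustration under stated assumptions, not Sesam's parser: the function name decode_transit and the returned (type name, value) pairs are hypothetical, NI and URI values are kept as plain text, and datetimes are left as text because Python's datetime type only holds microseconds.

```python
import base64
import uuid
from datetime import date
from decimal import Decimal


def decode_transit(value):
    """Decode a Transit-encoded string into a (type_name, value) pair.

    Strings without a recognized "~" + tag prefix are returned unchanged
    as ("string", value). Malformed bodies raise ValueError.
    """
    if len(value) < 2 or value[0] != "~":
        return ("string", value)
    tag, body = value[1], value[2:]
    if tag == ":":
        return ("ni", body)
    if tag == "r":
        return ("uri", body)
    if tag == "t":
        if "T" in body:
            # Kept as text to avoid dropping nanosecond digits.
            return ("datetime", body)
        return ("date", date.fromisoformat(body))
    if tag == "b":
        return ("bytes", base64.b64decode(body))
    if tag == "u":
        return ("uuid", uuid.UUID(body))
    if tag == "f":
        return ("decimal", Decimal(body))
    # Unrecognized tags are treated as ordinary strings.
    return ("string", value)


decoded = [
    decode_transit(text)
    for text in ["~:mynamespace:123", "~t2015-12-31", "~bAAECAwQF", "plain"]
]
```

Returning the type name alongside the value keeps NI, URI and plain strings distinguishable even though all three are represented as Python str in this sketch.
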
Mixed type ordering

When a list containing values of multiple types has to be ordered, the following ordering is used:

  • Null

  • Boolean

  • Integer, Float, Decimal

  • Date, Datetime

  • UUID

  • Namespaced identifier (NI)

  • URI

  • String

  • Dict

  • Tuple

  • Bytes

Types under the same bullet point are compatible and internally orderable. Values of incompatible types are sorted not by value but by the rank of their type (see the list above).

Example: ["sorted", ["list", 1.5, "b", 1, "a", 2]] returns [1, 1.5, 2, "a", "b"]. Strings and integers are not compatible types, so the numbers are ordered before the strings. Integers, floats and decimals are compatible, so they are sorted together by value.

Note

Values of the Dict type are ordered by sorting their keys and then comparing each key+value pair.
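
The rank-then-value ordering described in this section can be sketched in Python with a sort key. This is a simplified illustration, not Sesam's implementation: the names type_rank and mixed_sorted are hypothetical, and only Null, Boolean, the numeric types and String are handled; the remaining types in the list are left out to keep the sketch short.

```python
from decimal import Decimal


def type_rank(value):
    """Return the position of the value's type in the ordering list."""
    if value is None:
        return 0
    if isinstance(value, bool):  # checked before int: bool subclasses int
        return 1
    if isinstance(value, (int, float, Decimal)):
        return 2
    if isinstance(value, str):
        return 7
    raise TypeError(f"type not covered by this sketch: {type(value).__name__}")


def mixed_sort_key(value):
    """Sort by type rank first, then by value within compatible types."""
    rank = type_rank(value)
    # None has no internal order, so a constant stands in for its value.
    return (rank, 0 if value is None else value)


def mixed_sorted(values):
    return sorted(values, key=mixed_sort_key)


result = mixed_sorted([1.5, "b", 1, "a", 2])  # [1, 1.5, 2, "a", "b"]
```

Because the rank is the first element of the key, values of incompatible types are never compared with each other directly; Python only compares the second element when the ranks are equal.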