Entity Data Model

Sesam uses an entity data model as the core representation of data. Each entity is a dictionary of key-value pairs. Each key is a string and the value can be either a literal value, a list or another dictionary.

A Sesam entity has a few special keys that should not be tampered with. The following data prototype explains these special properties.

[
    {
        "_id": "the identity of the entity",
        "_updated": "a token indicating when this was modified",
        "_deleted": "indicating if the entity should be treated as deleted",
        "_hash": "a hash string of the entity's content",
        "_previous": "the _updated token of the previous version",
        "_ts": "timestamp for when entity was registered in source",
        "_filtered": "true when entity has been filtered out by pipe transforms"
    }
]

The entity data model supports a wide range of data types, including string, integer, decimal, boolean, namespaced identifier, URI, bytes and datetime. Over the wire, both a binary and a JSON representation are used.

This is an example of what a list of two entities could look like:

[
    {
        "_id": "1",
        "name": "Bill",
        "dob": "01-01-1980"
    },
    {
        "_id": "2",
        "name": "Jane",
        "dob": "04-10-1992"
    }
]

Reserved fields

Entity fields starting with _ are reserved. Any such fields, except _id and _deleted, will be ignored when writing an entity to a dataset. Note that the fields are only reserved at the root level, so child entities can have them.
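The rule above can be illustrated with a minimal Python sketch. The function name `strip_reserved_fields` and the sample entity are hypothetical and are not part of Sesam; the sketch only shows that root-level fields starting with `_` are dropped, except `_id` and `_deleted`, while nested dictionaries are left alone.

```python
# Hypothetical helper, not a Sesam API: mimics which root-level fields
# survive when an entity is written to a dataset.
KEPT_RESERVED = {"_id", "_deleted"}


def strip_reserved_fields(entity: dict) -> dict:
    """Return a copy of the entity without the ignored reserved fields."""
    return {
        key: value
        for key, value in entity.items()
        if not key.startswith("_") or key in KEPT_RESERVED
    }


incoming = {
    "_id": "1",
    "_updated": 42,
    "_ts": 1450000000000000,
    "name": "Bill",
    # Fields are only reserved at the root level, so this child is untouched.
    "address": {"_id": "home", "_note": "kept as-is"},
}
cleaned = strip_reserved_fields(incoming)
```

Here `cleaned` keeps `_id`, `name` and the whole `address` child, while `_updated` and `_ts` are dropped.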

Field	Description	Required
`_id`	This is the primary key of the entity. The value is always a string.	Yes
`_deleted`	If `true`, then the entity is deleted. All other values are interpreted as if the entity is not deleted.
`_updated`	The sequence of the entity. The value must be either a string or an integer value. The value is used to tell the order of the entities. The value is meant to be opaque, and should not be parsed or interpreted by other parties than the source that produced it. The `_updated` value can be passed through to the `since` request parameter in HTTP endpoints.
`_hash`	A string containing the hash of the entity’s content. This value is used to decide when an entity has changed. Of the reserved fields, only `_id` and `_deleted` contribute to the hash value. This field is generated automatically when writing an entity to a dataset.
`_previous`	A pointer back to the previous version of this entity. The value refers to the `_updated` field of the previous version of the entity. If the field is missing or the value is `null`, then there exists no previous version. Note that the previous version may not actually exist anymore as dataset compaction may have reclaimed it. This field is generated automatically when writing an entity to a dataset.
`_ts`	This is the real-world timestamp for when the entity was added to the dataset. The value is an integer representing the number of microseconds since epoch (January 1st 1970 UTC). This field is used for informational purposes only. This field is generated automatically when writing an entity to a dataset.
`_filtered`	This boolean field is added automatically by DTL if the entity was filtered out by the filter transform function. The purpose of it is to signal to the sink that the entity should be deleted if it exists in the sink. Only the dataset sink supports this currently. If the sink does not support it then the entity will be discarded instead of passed along to the sink.
`_tracked`	If `true` then the entity was added to the dataset by dependency tracking. Note that this property has been superseded by a new way of doing dependency tracking that does not require modifying entities. If you see this property then the entity was likely materialized by the old implementation. This field is generated automatically by the dependency tracking.
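As a rough illustration of how `_hash` can be used to detect changes, the sketch below hashes an entity's content while ignoring every reserved field except `_id` and `_deleted`. The hash algorithm and serialization Sesam actually uses are not stated here; SHA-256 over key-sorted JSON, and the names `content_hash` and `has_changed`, are assumptions made purely for the example.

```python
import hashlib
import json

# Of the reserved fields, only these two contribute to the hash.
HASHED_RESERVED = {"_id", "_deleted"}


def content_hash(entity: dict) -> str:
    """Hash the entity content (assumed scheme: SHA-256 of sorted JSON)."""
    content = {
        key: value
        for key, value in entity.items()
        if not key.startswith("_") or key in HASHED_RESERVED
    }
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def has_changed(stored: dict, incoming: dict) -> bool:
    """An entity counts as changed when its content hash differs."""
    return content_hash(stored) != content_hash(incoming)


stored = {"_id": "1", "_updated": 7, "_ts": 1450000000000000, "name": "Bill"}
same_content = {"_id": "1", "_updated": 8, "name": "Bill"}
renamed = {"_id": "1", "name": "William"}
```

With this scheme `has_changed(stored, same_content)` is false, because only `_updated` and `_ts` differ, whereas `has_changed(stored, renamed)` is true.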

Special fields

Entity fields starting with $ are semi-reserved. They have special meaning and will sometimes be produced and consumed by built-in components. These fields are normal fields that will be hashed and stored as part of the entity.

Field	Description	Required
`$ids`	This field can be used to hold all the identities of an entity. An entity may have multiple identities, i.e. in addition to the one in `_id`. The value type is always NI. The merge source will collect all the merged identities in this field.
`$children`	The create-child DTL transform function will add the created child entity as a value in the `$children` property of the target entity. The emit_children transform can then later be used to expand the `$children` entities into standalone entities.
`$replaced`	The merge source will set the `$replaced` field to `true` if the output entity is being replaced with a new entity that has a different entity id. This typically happens when the entity is being merged with another entity where the id of the other entity takes precedence over the current one.

Standard types

Entities are mapped to and from JSON objects, so they support the same data types as JSON does. Because JSON only supports a limited number of data types there is also limited support for Transit data types.

Type	Description	Example
Dict	Like a JSON object where keys are always strings. This type is not orderable.	`{"a": 123}`
Entity	Like a Dict, but with an `_id` property. The `_id` property must be a string.	`{"_id": "person1", "a": 123}`
List	A list of values. Values can be of any type. This type is not orderable.	`["abc", 123, [4, 5], {"x": "y"}]`
String	A string value. Maximum length is 4294967296 bytes.	`"abc"`
Integer	An integer value. The range of this data type is unlimited, i.e. it can store any positive or negative integer value.	`123`
Decimal	A decimal number. This data type has arbitrary precision. Use it instead of `Float` when/if keeping precision is important to your application.	`123.456`
Float	A double-precision floating point number. The valid range is that of the IEEE 754 binary64 format, because the value is stored internally as a double-precision floating-point number. Note that you may lose precision when using this data type.	`123.456`
Boolean	A boolean value. Either `true` or `false`.	`true`
Null	A null value. Typically used to represent a missing value. This type is not orderable.	`null`

Extension types

Transit encoded values are represented as strings in JSON. The value is prefixed by “~” and a tag character that indicates the type of the value. The extension types below are currently the only ones supported. Transit types that are not recognized will be treated as string values.

Note

There’s currently no support for escaping string literals that start with a “~” character.

Type	Description	Example
NI	Namespaced Identifier (NI)	`"~:mynamespace:123"`
URI	Uniform Resource Identifier (URI)	`"~rhttp://www.sesam.io/"`
Date	A date value. The valid range is from `"~t0001-01-01"` to `"~t9999-12-31"`.	`"~t2015-12-31"`
Datetime	Date and time with up to nanosecond precision. The valid range is from `"~t0001-01-01T00:00:00Z"` to `"~t9999-12-31T23:59:59.123456789Z"`. The date and time parts of the string are mandatory. The fraction of a second is optional. The value must always be in UTC, so the `Z` at the end is mandatory.	`"~t2015-01-02T03:04:05.123456789Z"`, `"~t1973-01-22T23:11:54Z"`
Bytes	A base64 encoded binary value.	`"~bAAECAwQF"`
UUID	A universally unique identifier formatted as hexadecimal text.	`"~u531a379e-31bb-4ce1-8690-158dceb64be6"`
Decimal	A decimal number with arbitrary precision.	`"~f12345678901234567890.1234567890"`
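To make the tag scheme in the table concrete, here is a minimal Python sketch of a decoder for these strings. The function `decode_value` is hypothetical and not a Sesam API. It returns NI, URI and datetime values as tagged tuples, which is a choice made for this example only; datetimes are kept as text because Python's `datetime` stores microseconds, not nanoseconds.

```python
import base64
import uuid
from datetime import date
from decimal import Decimal


def decode_value(value):
    """Decode one Transit-style string; anything else is returned unchanged."""
    if not isinstance(value, str) or len(value) < 2 or value[0] != "~":
        return value
    tag, body = value[1], value[2:]
    if tag == ":":
        return ("ni", body)        # namespaced identifier, kept as text
    if tag == "r":
        return ("uri", body)
    if tag == "t":
        if "T" in body:
            return ("datetime", body)  # kept as text to preserve nanoseconds
        return date.fromisoformat(body)
    if tag == "b":
        return base64.b64decode(body)
    if tag == "u":
        return uuid.UUID(body)
    if tag == "f":
        return Decimal(body)
    # Unrecognized tags are treated as plain string values.
    return value


decoded_bytes = decode_value("~bAAECAwQF")
decoded_decimal = decode_value("~f12345678901234567890.1234567890")
```

Because there is no escaping for literals that start with “~”, a plain string such as `"~rough estimate"` would be read as a URI by this sketch.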

Mixed type ordering

When a list containing values of multiple types has to be ordered, the following type ordering is used:

Null
Boolean
Integer, Float, Decimal
Date, Datetime
UUID
Namespaced identifier (NI)
URI
String
Dict
Tuple
Bytes

Types listed together on the same line are compatible and internally orderable. Values of incompatible types are sorted not by value but by the rank of their type in the list above.

Example: ["sorted", ["list", 1.5, "b", 1, "a", 2]] returns [1, 1.5, 2, "a", "b"]. Strings and numbers are not compatible types, so the numbers are ordered before the strings. Integer, Float and Decimal values are compatible with each other, so 1, 1.5 and 2 are sorted together by value.

Note

Values of the Dict type are ordered by sorting their keys and then comparing each key+value pair.
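The ranking rule can be sketched in Python with a sort key of the form (type rank, value). This is an illustration of the ordering described above, not Sesam's implementation; the names `type_rank` and `mixed_sort` are hypothetical, and only a subset of the types (Null, Boolean, numbers, String, Bytes) is covered.

```python
from decimal import Decimal


def type_rank(value) -> int:
    """Rank of a value's type, following the positions in the list above."""
    if value is None:
        return 0
    if isinstance(value, bool):  # checked before int: bool is an int subclass
        return 1
    if isinstance(value, (int, float, Decimal)):
        return 2
    if isinstance(value, str):
        return 7
    if isinstance(value, bytes):
        return 10
    raise TypeError(f"type not covered by this sketch: {type(value).__name__}")


def mixed_sort(values):
    """Sort by type rank first, then by value within compatible types."""
    return sorted(values, key=lambda v: (type_rank(v), v))


result = mixed_sort([1.5, "b", 1, "a", 2])
# result is [1, 1.5, 2, "a", "b"], matching the example above
```

Values of different ranks are never compared directly, because the tuple comparison is already decided by the rank.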
