Time-based masterdata management in Sesam

Summary

Time-based master data management ensures that the most recently updated system becomes the master of a property, rather than assigning a specific system as the master for a given property. This approach is useful when no single system can be designated as the master for a particular type of data. In practice, it’s often more important to have the latest version of the data—such as an employee’s phone number or a customer’s email address—regardless of which system it comes from. If a customer updates their phone number in one system, you don’t want an outdated number from another system to override it. Time-based master data management always selects the most recent update as the master value, regardless of the source system.

This section covers best practices in how to implement time-based masterdata management in Sesam and the effects is has.

Last modified value

In order to do time-based masterdata management, every property needs a datetime corresponding to the date and time the property was last modified. In Sesam, the preferred way of implementing this to introduce claims, a claim is a collection of metadata for each property.

We can now rewrite the key-value <property>: <value> as <property>: <claim> where the claim is a dictionary containing the claim data. In the example below, the claim contains the property’s value, its $last-modified datetime, and the claim’s end-time, which is when the claim got replaced by a newer claim. These are the claim properties.

Example

"<namespace>:<property>":
  {
    "<namespace>:value": "value",
    "<namespace>:$last-modified": "~t1987-09-11:T06:55:07.456827Z",
    "<namespace>:end-time": "~t2024-09-20T07:18:20.698042Z"
  }

Not every system has last modified datetime values for each property. Some systems might only have a datetime value on the entity as a whole, while some systems might not have any datetime value. In all three cases we need to be able to supply the claim with a near real-time datetime value in order to perform accurate time-based masterdata management. The different scenarios are described below in order of priority:

  1. The system provides last modified datetime values for each individual property

  • In this case the system provided datetime values are directly added to each corresponding claim’s $last-modified value

  1. The system provides last modified datetime values for each entity

  • In this case we use the provided datetime value as the initial $last-modified but manually update the $last-modified value when the claim value changes

  1. The system does not provides any last modified datetime values on property or entity level

  • In this case we use Sesam’s internal _ts value from the -collect pipe as initial $last-modified value and manually update the $last-modified value when the claim value changes

A full DTL example on how to set and update $last-modified values in claims can be seen at the bottom of this page.

Masterdata management

Now that we have up-to-date datetime values inside each property claim we can perform time-based masterdata management inside our global pipes. Instead of using the coalesce function to set system priority we can sort claims by the newest $last-modified value and pick that claim value corresponding to the newest claim.

Example

["add", "<property>",
  ["path", "value",
    ["last",
      ["sorted", "_.<namespace>:$last-modified",
        ["filter",
          ["is-empty", "_.end-time"], "_T.<property>"]
      ]
    ]
  ]
]

DTL examples

Set and update $last-modified

"transform": {
  "type": "dtl",
  "rules": {
    "default": [
      ["comment", "locating the last version of the entity in the sink dataset"],
      ["add", "_history",
        ["apply", {
          "datasets": ["<sink-dataset> t"],
          "where": [
            ["eq", "_S.<primary-key>", "t._id"]
          ],
          "track-dependencies": false
        }]
      ],
      ["merge",
        ["apply", "history",
          ["dict", "key", "age", "value",
            ["dict", "value", "_S.age", "_S.$last-modified"]
          ]
        ]
      ]
    ],
    "history": [
      ["add", "_property", "_S.value"],
      ["add", "_pid", "_P._T._id"],
      ["comment", "locating the last version of the property"],
      ["add", "_property-history",
        ["path", "_S.key",
          ["if",
            ["and",
              ["eq",
                ["count", "_R._T._history"], 1],
              ["is-empty", "_T._pid"]
            ],
            ["first", "_R._T._history"],
            ["filter",
              ["eq", "_._id", "_T._pid"], "_R._T._history"]
          ]
        ]
      ],
      ["add", "_property-history-newer",
        ["filter",
          ["gt", "_.$last-modified", "_R._T.$last-modified"], "_T._property-history"]
      ],
      ["if",
        ["eq",
          ["count", "_T._property-history"], 0],
        ["add", "_S.key", "_S.value"],
        [
          ["comment", "Ignore new data if older than history"],
          ["if",
            ["gt",
              ["count", "_T._property-history-newer"], 0],
            ["add", "_S.key", "_T._property-history"],
            [
              ["add", "_property-history-latest",
                ["filter",
                  ["is-empty", "_.end-time"], "_T._property-history"]
              ],
              ["add", "_property-history-old",
                ["filter",
                  ["is-not-empty", "_.end-time"], "_T._property-history"]
              ],
              ["add", "_property-compare",
                ["map",
                  ["apply", "match-dict",
                    ["dict", "source", "_.", "target", "_T._property-history-latest"]
                  ], "_T._property"]
              ],
              ["add", "_property-history-compare",
                ["map",
                  ["apply", "match-dict",
                    ["dict", "source", "_.", "target", "_T._property-compare.match"]
                  ], "_T._property-history-latest"]
              ],
              ["add", "_S.key",
                ["combine",
                  ["apply", "add-end", "_T._property-history-compare.new"],
                  ["apply", "add-end", "_T._property-history-old"]
                ]
              ]
            ]
          ],
          ["remove", "_property*"]
        ]
      ]
    ],
    "match-dict": [
      ["add", "key", "_P._S.key"],
      ["if",
        ["in", true,
          ["map",
            ["eq",
              ["apply", "strip-dates", "_S.source"],
              ["apply", "strip-dates", "_."]
            ], "_S.target"]
        ],
        ["add", "::match", "_S.source"],
        ["add", "::new", "_S.source"]
      ]
    ],
    "strip-dates": [
      ["copy", "ps:*"],
      ["if",
        ["neq", "_P._T.key", "end-time"],
        ["remove", "end-time"]
      ],
      ["if",
        ["neq", "_P._T.key", "$last-modified"],
        ["remove", "$last-modified"]
      ]
    ],
    "add-end": [
      ["copy", "*"],
      ["if",
        ["is-empty", "_S.end-time"],
        ["add", "end-time", "_R._T._$last-modified"]
      ],
      ["merge",
        ["dict",
          ["items", "_T."]
        ]
      ]
    ]
  }
}

The example above also handles old claims and makes sure that the new claim value is actually more current than the old one.