Connect data

In this phase we will demonstrate how to connect data across different sources. More specifically how the same companies from different sources can be merged to give us a more complete perspective of the respective companies.

We will also get a small taste of master data management (MDM) mechanisms in Sesam by adding global properties.

Objectives:

After you complete this tutorial you will have learned the following:

  1. How to merge entities from different sources

  2. How to define global properties

Prerequisites

Before starting on this tutorial we suggest you complete the Enrich data tutorial as we will use that data in this tutorial.

Create a global pipe

To create a global pipe, follow the steps below.

  1. Navigate to Pipes

  2. Click on New pipe

  3. Paste and save the configuration below

  4. Click Start to ensure your pipe runs

  5. Click the Output tab to see the result

{
  "_id": "global-organization",
  "type": "pipe",
  "source": {
    "type": "merge",
    "datasets": ["hubspot-company-enrich hc", "enhetsregisteret-company-enrich ec"],
    "equality_sets": [
      ["hc.properties.about_us", "ec.orgnr"]
    ],
    "identity": "first",
    "version": 2
  },
  "transform": {
    "type": "dtl",
    "rules": {
      "default": [
        ["copy", "*"],
        ["add", "organization-number",
          ["coalesce",
            ["list", "_S.enhetsregisteret-company:orgnr", "_S.hubspot-company:properties.hubspot-company:about_us"]
          ]
        ],
        ["add", "name",
          ["coalesce",
            ["list", "_S.enhetsregisteret-company:navn", "_S.hubspot-company:properties.hubspot-company:name"]
          ]
        ],
        ["add", "address",
          ["coalesce",
            ["list", "_S.enhetsregisteret-company:forretningsadr", "_S.hubspot-company:properties.hubspot-company:address"]
          ]
        ],
        ["add", "zipcode",
          ["coalesce",
            ["list", "_S.enhetsregisteret-company:forradrpostnr", "_S.hubspot-company:properties.hubspot-company:zip"]
          ]
        ],
        ["add", "city",
          ["coalesce",
            ["list", "_S.enhetsregisteret-company:forradrpoststed", "_S.hubspot-company:properties.hubspot-company:city"]
          ]
        ]
      ]
    }
  },
  "metadata": {
    "global": true
  }
}

Important to note here is that we fetch companies from different sources (Enhentsregisteret and HubSpot), and merge the company entities when orgnr from Enhetsregisteret matches about_us from HubSpot.

Also notice that data from all sources are preserved and aggregated in the result.

The properties we added are listed in the global-organization namespace. Properties in a global namespace are what we call global properties.

We use coalesce to determine the source trust order and is one of the mechanisms provided by Sesam for MDM.

By adding the global properties we no longer need to worry about which source system has the most accurate data. From here on we can simply use the global properties instead when needed.

These global properties will be used later in the transform phase.