Creating a Custom Data Source

A custom data source service is one that exposes data from an existing system as a stream of JSON objects over HTTP. Data from the custom service can easily be consumed by a Pipe and written into a Dataset.

The basic requirements on the custom service are very simple. The service must expose a single resource that returns all the data from the underlying system as a stream of JSON objects.

Optionally, and this is recommended, the resource can accept a single query parameter called ‘since’. This is a token that the service can use to return only the entities that have changed on or later than the point indicated by the token. The ‘since’ parameter is further explained in the JSON pull protocol. For endpoints that do not support a since query parameter, but do support other query parameters for locating changes in the resource, a Microservice system provides more flexible ways of importing only the changes from the resource.

The JSON objects (called entities in Sesam) produced by the source must also adhere to a few simple rules related to the reserved fields and the structure of the batch:

  • Entities MUST have an ‘_id’ property.

  • Entities MAY have an ‘_deleted’ property. It defaults to false if omitted.

  • Entities MAY have an ‘_updated’ property. If present, its value is used as the since parameter when Sesam makes subsequent calls.

  • Any other properties starting with ‘_’ are reserved and will not be stored in Sesam.

  • A response must expose entities as a JSON Array.

Here is an example entity:

  {
      "_id" : "1",
      "_deleted" : false,
      "_updated" : 0
  }

and another one:

  {
      "_id" : "e-8786763",
      "_deleted" : false,
      "_updated" : "2016-03-03T00:00:00Z"
  }

The following is a simple example of a response of entities exposed as a JSON array:

  [
      {
          "_id" : "1",
          "_deleted" : false,
          "_updated" : 0
      },

      {
          "_id" : "2",
          "_deleted" : false,
          "_updated" : 1
      }
  ]
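
The rules listed earlier can be checked mechanically before a service is connected to a pipe. The following is a minimal sketch of such a check, not part of Sesam itself. The function names `validate_batch` and `storable_properties` are hypothetical, and the boolean check on ‘_deleted’ is an assumption based on its default value of false.

```python
import json

# The three reserved properties described in the rules above.
KNOWN_RESERVED = {"_id", "_deleted", "_updated"}


def validate_batch(payload: str) -> list:
    """Parse a response body and check it against the rules above."""
    batch = json.loads(payload)
    if not isinstance(batch, list):
        raise ValueError("response must be a JSON array")
    for entity in batch:
        if not isinstance(entity, dict):
            raise ValueError("each entity must be a JSON object")
        if "_id" not in entity:
            raise ValueError("entity is missing the required '_id' property")
        # '_deleted' is optional; assumed here to be a boolean when present.
        if not isinstance(entity.get("_deleted", False), bool):
            raise ValueError("'_deleted' must be true or false")
    return batch


def storable_properties(entity: dict) -> dict:
    """Illustrate the reserved-prefix rule: drop unknown '_' properties."""
    return {
        key: value
        for key, value in entity.items()
        if not key.startswith("_") or key in KNOWN_RESERVED
    }
```

Calling `validate_batch` on the array shown above returns both entities, while a body that is a single JSON object rather than an array raises `ValueError`.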

The source service can run anywhere, provided that it can be reached over HTTP from the Sesam service. To configure Sesam to consume the feed into a dataset, see the sections below.
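
As an illustration of such a service, here is a minimal sketch built only on the Python standard library. It is not one of the official templates. The in-memory `ENTITIES` list, the `/api/entities` path, and port 5000 are assumptions chosen to match the URL system example in the next section. The comparison follows the "on or later than" wording above and assumes that ‘_updated’ holds an integer sequence number.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory data; a real service would read the underlying system.
ENTITIES = [
    {"_id": "1", "_deleted": False, "_updated": 0},
    {"_id": "2", "_deleted": False, "_updated": 1},
]


def select_entities(entities, since=None):
    """Return every entity, or only those changed on or after `since`."""
    if since is None:
        return list(entities)
    return [e for e in entities if e["_updated"] >= int(since)]


class EntitiesHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/api/entities":
            self.send_error(404)
            return
        # 'since' is optional; it is absent on the first (full) request.
        since = parse_qs(url.query).get("since", [None])[0]
        body = json.dumps(select_entities(ENTITIES, since)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=5000):
    """Create the HTTP server; call .serve_forever() on the result to run it."""
    return HTTPServer(("", port), EntitiesHandler)
```

Running `make_server().serve_forever()` exposes the feed at `http://localhost:5000/api/entities`. A production service would normally stream entities from the source system instead of holding them in a list.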

Custom Data Source - The URL system

The configuration below defines a URL system for the remote service. Inside the configuration we have specified the url_pattern of the service. This is helpful if the service serves several different collections of data, since each pipe connecting to the system can point to its own specific endpoint. Also, if the service moves, the base URL can be updated in just one place.

The pipe’s source is defined as a JSON source. It expects a resource containing JSON data packed in a JSON array. Note that in the example below we have set supports_since to true, which means we expect the resource endpoint to support the since parameter for requesting deltas, i.e. only updated data. We have also specified a pipe-specific url. This URL is inserted into the system’s url_pattern to form the complete URL for the request.

  {
      "_id": "custom-source-pipe",
      "type": "pipe",
      "source": {
          "type": "json",
          "system": "custom-url-system",
          "supports_since" : true,
          "url": "entities"
      }
  }

  {
    "_id": "custom-url-system",
    "type": "system:url",
    "url_pattern": "http://localhost:5000/api/%s"
  }
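
To make the relationship between url_pattern, the pipe's url, and the since parameter concrete, the sketch below reproduces the substitution in plain Python. It only illustrates the resulting URLs and is not Sesam's implementation. The helper name `build_request_url` is hypothetical.

```python
from urllib.parse import urlencode


def build_request_url(url_pattern, pipe_url, since=None):
    """Combine the system's url_pattern with the pipe's url.

    The '%s' placeholder in the pattern is replaced by the pipe's url.
    When a since token is known, it is appended as a query parameter.
    """
    url = url_pattern % pipe_url
    if since is not None:
        url += "?" + urlencode({"since": since})
    return url


# Values taken from the configuration above.
full_url = build_request_url("http://localhost:5000/api/%s", "entities")
delta_url = build_request_url("http://localhost:5000/api/%s", "entities", since=1)
```

Here `full_url` is `http://localhost:5000/api/entities` and `delta_url` is `http://localhost:5000/api/entities?since=1`.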

Custom Data Source - The Microservice system

If the built-in URL system is not enough to cover your required functionality, a microservice could be a good solution. When creating a microservice as a custom data source, there are a few things to bear in mind in order to get the best results.

To set up a microservice as a custom source, first develop and run a microservice that implements the JSON pull protocol.

Once this is running, it is possible to define a pipe in Sesam where the source is a JSON source. All data read by the microservice will be sent to the source, preferably as a stream.

For more information on how microservices can be used in Sesam please see the Microservices in Sesam section.


Tutorials

Custom Data Source - The Microservice System

Learn how to create a custom data source with a microservice.




To help you write data source components, a set of starter templates has been created for several languages. Each template comes with a runnable service that exposes a simple set of in-memory objects as JSON using the protocol described above. Each service also comes with a Dockerfile to allow quick packaging and deployment of the custom service alongside Sesam.

The templates that are relevant to building new data sources are:

In the following configurations we will see how the JSON source in combination with the Microservice system can be used to create a Custom Data Source.

  {
    "_id": "custom-source-pipe",
    "type": "pipe",
    "source": {
      "type": "json",
      "system": "custom-microservice-system",
      "url": "/my-source-endpoint"
    }
  }
  {
    "_id": "custom-microservice-system",
    "type": "system:microservice",
    "docker": {
      "environment": {
        "some-other-variable": "some-other-value",
        "some-variable": "some-value"
      },
      "image": "my-image-url",
      "port": 5000
    }
  }
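
Inside the container, the microservice needs to pick up the values configured under "environment". The sketch below assumes that these values are passed to the container as ordinary environment variables. The function name `load_settings` and the fallback defaults are hypothetical.

```python
import os


def load_settings(environ=os.environ):
    """Read the microservice settings from environment variables.

    The variable names match the "environment" block in the system
    configuration above; the defaults are placeholders for local runs.
    """
    return {
        "some_variable": environ.get("some-variable", ""),
        "some_other_variable": environ.get("some-other-variable", ""),
    }


# Example with an explicit mapping instead of the real process environment.
settings = load_settings({"some-variable": "some-value"})
```

Passing the mapping as an argument keeps the function easy to test without changing the environment of the running process.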

Change tracking

Whenever possible, we advise you to set up a microservice to import only changes instead of doing full imports. Doing so drastically reduces the time it takes for a microservice to import data, and therefore makes data available to target systems much faster.

You can achieve this by using what we refer to in Sesam as Change Tracking. Read more about change tracking in the article Continuation support for Microservices.


Tutorials

Continuation support and change tracking

Look closer into continuation support and change tracking for data imported from a microservice.




Pushing Data Into The Hub

As an alternative to having Sesam pull data, a client can push data to the hub. The steps for doing this are quite straightforward.

The first step is to define a push receiver endpoint in Sesam. The HTTP Endpoint Source should be configured to allow the custom service to push JSON data to Sesam. This endpoint supports the JSON push protocol.

An example would be:

  {
      "_id": "my-endpoint",
      "type": "pipe",
      "source": {
          "type": "http_endpoint"
      }
  }

The following URL can then be used as an endpoint to receive JSON according to the JSON push protocol.

http://localhost:9042/api/receivers/my-endpoint/entities

Once this is configured any custom code, event handler, or queue reader can post data to Sesam.
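
As a sketch of what such custom code might look like, the following uses only the Python standard library to post a JSON array of entities to the receiver URL shown above. The helper names `build_push_request` and `push` are hypothetical. The example assumes an unauthenticated local installation; a real installation will probably also require an Authorization header.

```python
import json
import urllib.request

# Receiver URL from the example above.
RECEIVER_URL = "http://localhost:9042/api/receivers/my-endpoint/entities"


def build_push_request(entities, url=RECEIVER_URL):
    """Build a POST request carrying the entities as a JSON array."""
    body = json.dumps(entities).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def push(entities, url=RECEIVER_URL):
    """Send the entities and return the HTTP status code of the response."""
    with urllib.request.urlopen(build_push_request(entities, url)) as response:
        return response.status
```

Calling `push([{"_id": "1", "_deleted": False}])` sends one entity to the endpoint. Separating request construction from sending makes it possible to inspect the request without a running Sesam instance.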

Important

The HTTP endpoint source works much like a source with since support: every time an external provider pushes data to the source, Sesam registers this as a stream of changes.

One of the effects of this is that data that used to be included in the push, but is not anymore, is not marked as deleted automatically downstream. You can read about how to avoid this here.
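
One possible client-side approach, which is not necessarily the one described in the linked article, is to have the pushing client remember which ids it sent last time and push an explicit entity with ‘_deleted’ set to true for every id that has since disappeared. The sketch below illustrates the idea. The function name `with_deletions` is hypothetical.

```python
def with_deletions(previous_ids, current_entities):
    """Append deletion markers for ids that are no longer in the push.

    previous_ids: set of '_id' values included in the previous push.
    current_entities: list of entities about to be pushed now.
    """
    current_ids = {entity["_id"] for entity in current_entities}
    missing = sorted(previous_ids - current_ids)
    markers = [{"_id": entity_id, "_deleted": True} for entity_id in missing]
    return list(current_entities) + markers


# Entity "2" was pushed earlier but is absent now, so it gets a marker.
payload = with_deletions({"1", "2"}, [{"_id": "1"}])
```

In this example `payload` contains the entity with id "1" followed by a deletion marker for id "2".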