JSON Push Protocol¶
The JSON Push protocol is an HTTP-based protocol that uses POST requests to perfom incremental or full data syncs across to an HTTP endpoint. It supports splitting up the data into smaller batches, so that not all the data have to be sent in a single request.
The protocol is supported by the http_endpoint source and the json sink. This protocol can be used by microservices and other clients to send entities.
Requests¶
The following HTTP request parameters are supported:
Parameter |
Description |
---|---|
sequence_id |
A string token that is generated every time the data sync is started. Included in all requests. |
is_full |
A boolean. Has the value |
request_id |
A string token that is generated for every request. Included in all requests. |
previous_request_id |
A string token referencing the request_id of the previous request. Included in all but the first request. |
is_first |
A boolean. Included in the first request with the value |
is_last |
A boolean. Included in the last request with the value |
The HTTP body is JSON data which will always be in the form of a JSON array even if it is a single entity. The serialisation of entities as JSON is described in more detail here.
If the sequence_id is not the same as the one that is currently active, then it will replace the active one (full sync only). Only a single sequence_id should be active at any time. A sequence id goes out of scope when the ‘is_last=true’ request parameter is given. Note that an incremental request does not affect the active sequence.
Conflicts¶
When doing an incremental sync the request parameters can silently be ignored and the entities should be accepted.
The HTTP server should recognize these cases as conflicting when doing a full data sync:
The
previous_request_id
does not match with the request_id of the previous request. Return ‘409 Conflict’.The value of
is_full
is not the same as in the previous request of the same sequence. Return ‘409 Conflict’.The
is_first
request parameter is true when we know that it is not the first request in the active sequence. Return ‘409 Conflict’.
If the JSON Push Sink receives a 409 Conflict
response code, then it
must stop the active sync, and try again later. It should start from
the beginning with a new sequence if it was a full data sync.
Examples¶
This section contains three examples that illustrates how you can use
the JSON push protocol to push data to Sesam using incremental and
full syncs. The examples use the curl
command line utility.
Our examples all use the following pipe which sets up a receiver endpoint that we can use to push our JSON entities to:
$ cat pipe.json
{
"_id": "myendpoint",
"type": "pipe",
"source": {
"type": "http_endpoint"
},
"sink": {
"type": "dataset",
"dataset": "mydataset"
}
}
To use the curl
command to communicate with Sesam we need a JWT
token for authorization. We’ll add the JWT authorization header to an
environment variable to make this easier:
export AUTH_HEADER="Authorization: bearer YOUR-JWT-TOKEN"
Example 0: incremental sync¶
This example has one JSON request body, which looks like this:
$ cat sequence0.json
[
{
"_id": "a",
"name": "A"
},
{
"_id": "b",
"name": "B"
}
]
We can now POST
the contents of the sequence0.json
file to the
/api/receivers/myendpoint/entities
endpoint. This request is
treated as an incremental sync, which in practice will just add the
entities to the mydataset
dataset.
$ curl -X POST -H "$AUTH_HEADER" -H "Content-Type: application/json" --data @sequence0.json http://localhost:9042/api/receivers/myendpoint/entities
{}
If we now look at the mydataset
dataset using the API, it now looks like this:
$ curl -s -H "$AUTH_HEADER" http://localhost:9042/api/datasets/mydataset/entities | jq .
[
{
"name": "A",
"_id": "a",
"_deleted": false,
"_updated": 0,
"_previous": null,
"_ts": 1489654249474509,
"_hash": "8b48c6574f7e8474194090eb9123666e"
},
{
"name": "B",
"_id": "b",
"_deleted": false,
"_updated": 1,
"_previous": null,
"_ts": 1489654249474537,
"_hash": "d3b66df7c0e6513556378c2ec5b91d5c"
}
]
Example 1: full sync¶
This example has three JSON request bodies, which look like this:
$ cat sequence1.0.json
[
{
"_id": "b",
"name": "B"
}
]
$ cat sequence1.1.json
[
{
"_id": "a",
"name": "A (updated)"
},
{
"_id": "c",
"name": "C"
}
]
$ cat sequence1.2.json
[
{
"_id": "d",
"name": "D"
}
]
We can now POST
the contents of the three files to the
/api/receivers/myendpoint/entities
endpoint. We’ll want to do this
in one sequence, but across three HTTP requests.
All our request set the request parameter is_full
to true
in
order to indicate that we’re doing a full sync, i.e. we’re sending the
complete set of data in this sequence. This will enable deletion
detection, which we’ll see an example of later.
We also set the sequence_id
request parameter to a unique token
that represents our sequence. In this case we use the value 1
to
indicate that this is our first sequence.
Since this is our first request in the sequence we set request_id
to 1
.
Note
In general it is a good idea to use UUIDs for
sequence_id
, so that we can guarantee that they are unique. For
request_id
it is OK to use numbers as the request id is local
to the sequence.
The first request starts the sequence and posts the contents of the
first file. Note that the is_first
request parameter is set to
true
.
$ curl -X POST -H "$AUTH_HEADER" -H "Content-Type: application/json" --data @sequence1.0.json 'http://localhost:9042/api/receivers/myendpoint/entities?is_full=true&sequence_id=1&request_id=1&is_first=true'
{}
Our dataset now contains the same two entities as before as the first
file did not contain any changes to the b
entity. This is normal
as a dataset will only update when entities actually are different.
$ curl -s -H "$AUTH_HEADER" http://localhost:9042/api/datasets/mydataset/entities | jq .
[
{
"name": "A",
"_id": "a",
"_deleted": false,
"_updated": 0,
"_previous": null,
"_ts": 1489654249474509,
"_hash": "8b48c6574f7e8474194090eb9123666e"
},
{
"name": "B",
"_id": "b",
"_deleted": false,
"_updated": 1,
"_previous": null,
"_ts": 1489654249474537,
"_hash": "d3b66df7c0e6513556378c2ec5b91d5c"
}
]
The second request posts the contents of the second file. Note that
the previous_request_id
references the request_id
of our
previous request. This is just a safety measure so that we make sure
that we don’t miss any requests.
$ curl -X POST -H "$AUTH_HEADER" -H "Content-Type: application/json" --data @sequence1.1.json 'http://localhost:9042/api/receivers/myendpoint/entities?is_full=true&sequence_id=1&request_id=2&previous_request_id=1'
Our dataset now contains an updated a
entity and a new c
entity.
$ curl -s -H "$AUTH_HEADER" http://localhost:9042/api/datasets/mydataset/entities | jq .
[
{
"name": "A",
"_id": "a",
"_deleted": false,
"_updated": 0,
"_previous": null,
"_ts": 1489654249474509,
"_hash": "8b48c6574f7e8474194090eb9123666e"
},
{
"name": "B",
"_id": "b",
"_deleted": false,
"_updated": 1,
"_previous": null,
"_ts": 1489654249474537,
"_hash": "d3b66df7c0e6513556378c2ec5b91d5c"
},
{
"name": "A (updated)",
"_id": "a",
"_deleted": false,
"_updated": 2,
"_previous": 0,
"_ts": 1489654329529744,
"_hash": "73873bcb381ebef10644a0bda6798e6a"
},
{
"name": "C",
"_id": "c",
"_deleted": false,
"_updated": 3,
"_previous": null,
"_ts": 1489654329529773,
"_hash": "3b6c7ae2d4d66d9f1cf185f0c3004cce"
}
]
The third request is our last request. It posts the contents of the
third file. Here the is_last
request parameter is set to true
to tell the server that this is the last request in the sequence.
$ curl -X POST -H "$AUTH_HEADER" -H "Content-Type: application/json" --data @sequence1.2.json 'http://localhost:9042/api/receivers/myendpoint/entities?is_full=true&sequence_id=1&request_id=3&previous_request_id=2&is_last=true'
{}
Our dataset now contains a new d
entity.
$ curls -H "$AUTH_HEADER" http://localhost:9042/api/datasets/mydataset/entities | jq .
[
{
"name": "A",
"_id": "a",
"_deleted": false,
"_updated": 0,
"_previous": null,
"_ts": 1489654249474509,
"_hash": "8b48c6574f7e8474194090eb9123666e"
},
{
"name": "B",
"_id": "b",
"_deleted": false,
"_updated": 1,
"_previous": null,
"_ts": 1489654249474537,
"_hash": "d3b66df7c0e6513556378c2ec5b91d5c"
},
{
"name": "A (updated)",
"_id": "a",
"_deleted": false,
"_updated": 2,
"_previous": 0,
"_ts": 1489654329529744,
"_hash": "73873bcb381ebef10644a0bda6798e6a"
},
{
"name": "C",
"_id": "c",
"_deleted": false,
"_updated": 3,
"_previous": null,
"_ts": 1489654329529773,
"_hash": "3b6c7ae2d4d66d9f1cf185f0c3004cce"
},
{
"name": "D",
"_id": "d",
"_deleted": false,
"_updated": 4,
"_previous": null,
"_ts": 1489654349898053,
"_hash": "9d8255209c3e1d318e8cf2cab7a3a73e"
}
]
Example 2: full sync (deletion detection)¶
This example has two JSON request bodies, which look like this:
$ cat sequence2.0.json
[
{
"_id": "a",
"name": "A"
},
{
"_id": "b",
"name": "B"
}
]
$ cat sequence2.1.json
[
{
"_id": "d",
"name": "D"
}
]
This example is similar to the previous one, but this time there’s only two requests and we’ll show off the deletion detection feature. Entities that exists in the dataset, but are not part of the entities sent in a sequence will be marked as deleted.
The first request starts the sequence and posts the contents of the
first file. Note that the is_first
request parameter is set to
true
.
$ curl -X POST -H "$AUTH_HEADER" -H "Content-Type: application/json" --data @sequence2.0.json 'http://localhost:9042/api/receivers/myendpoint/entities?is_full=true&sequence_id=2&request_id=1&is_first=true'
Our dataset now contains an updated a
entity. b
did not
change, so it was not added to the dataset.
$ curl -s -H "$AUTH_HEADER" http://localhost:9042/api/datasets/mydataset/entities | jq .
[
{
"name": "A",
"_id": "a",
"_deleted": false,
"_updated": 0,
"_previous": null,
"_ts": 1489654249474509,
"_hash": "8b48c6574f7e8474194090eb9123666e"
},
{
"name": "B",
"_id": "b",
"_deleted": false,
"_updated": 1,
"_previous": null,
"_ts": 1489654249474537,
"_hash": "d3b66df7c0e6513556378c2ec5b91d5c"
},
{
"name": "A (updated)",
"_id": "a",
"_deleted": false,
"_updated": 2,
"_previous": 0,
"_ts": 1489654329529744,
"_hash": "73873bcb381ebef10644a0bda6798e6a"
},
{
"name": "C",
"_id": "c",
"_deleted": false,
"_updated": 3,
"_previous": null,
"_ts": 1489654329529773,
"_hash": "3b6c7ae2d4d66d9f1cf185f0c3004cce"
},
{
"name": "D",
"_id": "d",
"_deleted": false,
"_updated": 4,
"_previous": null,
"_ts": 1489654349898053,
"_hash": "9d8255209c3e1d318e8cf2cab7a3a73e"
},
{
"name": "A",
"_id": "a",
"_deleted": false,
"_updated": 5,
"_previous": 2,
"_ts": 1489654388968749,
"_hash": "8b48c6574f7e8474194090eb9123666e"
}
]
The second request is our last request. It posts the contents of the
second file. Here the is_last
request parameter is set to true
to tell the server that this is the last request in the sequence. Note
that in this example there was no middle request that was neither
is_first
nor is_last
.
$ curl -X POST -H "$AUTH_HEADER" -H "Content-Type: application/json" --data @sequence2.1.json 'http://localhost:9042/api/receivers/myendpoint/entities?is_full=true&sequence_id=2&request_id=2&previous_request_id=1&is_last=true'
Our dataset now contains a deleted c
entity. The entity was
deleted because it did exist in the dataset, but was not part of the
entities that we sent. It is thus marked as deleted. d
did not
change.
$ curl -s -H "$AUTH_HEADER" http://localhost:9042/api/datasets/mydataset/entities | jq .
[
{
"name": "A",
"_id": "a",
"_deleted": false,
"_updated": 0,
"_previous": null,
"_ts": 1489654249474509,
"_hash": "8b48c6574f7e8474194090eb9123666e"
},
{
"name": "B",
"_id": "b",
"_deleted": false,
"_updated": 1,
"_previous": null,
"_ts": 1489654249474537,
"_hash": "d3b66df7c0e6513556378c2ec5b91d5c"
},
{
"name": "A (updated)",
"_id": "a",
"_deleted": false,
"_updated": 2,
"_previous": 0,
"_ts": 1489654329529744,
"_hash": "73873bcb381ebef10644a0bda6798e6a"
},
{
"name": "C",
"_id": "c",
"_deleted": false,
"_updated": 3,
"_previous": null,
"_ts": 1489654329529773,
"_hash": "3b6c7ae2d4d66d9f1cf185f0c3004cce"
},
{
"name": "D",
"_id": "d",
"_deleted": false,
"_updated": 4,
"_previous": null,
"_ts": 1489654349898053,
"_hash": "9d8255209c3e1d318e8cf2cab7a3a73e"
},
{
"name": "A",
"_id": "a",
"_deleted": false,
"_updated": 5,
"_previous": 2,
"_ts": 1489654388968749,
"_hash": "8b48c6574f7e8474194090eb9123666e"
},
{
"name": "C",
"_id": "c",
"_deleted": true,
"_updated": 6,
"_previous": 3,
"_ts": 1489654451658805,
"_hash": "ec83ea023462b80f19a23e39639f7307"
}
]