Changelog¶
2024-11-27¶
Added support for TTL (time to live) compaction for deletes. This can be enabled by setting ttl_deletes_hours in the pipe’s compaction section. When enabled, entities will be compacted away if the latest version of the entity has "_deleted": true and is older than ttl_deletes_hours. All versions of the entity will be compacted away, and the only way to recover them is to restore from a backup that contains the entities.
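The compaction rule described above can be sketched roughly as follows. This is only an illustration of the stated condition, not the service's implementation; the function name and the "updated_at" field are hypothetical and exist only for this sketch.

```python
from datetime import datetime, timedelta, timezone


def is_ttl_compactable(latest_version, ttl_deletes_hours, now):
    """Return True if the entity (all versions) qualifies for TTL compaction."""
    # Only entities whose latest version is a delete marker qualify.
    if not latest_version.get("_deleted", False):
        return False
    # "updated_at" is a hypothetical timestamp field used only in this sketch.
    age = now - latest_version["updated_at"]
    return age > timedelta(hours=ttl_deletes_hours)


now = datetime(2024, 11, 27, tzinfo=timezone.utc)
old_delete = {"_id": "a", "_deleted": True, "updated_at": now - timedelta(hours=48)}
recent_delete = {"_id": "b", "_deleted": True, "updated_at": now - timedelta(hours=1)}
live_entity = {"_id": "c", "_deleted": False, "updated_at": now - timedelta(hours=48)}
```

With a 24 hour TTL, only old_delete would be compacted away in this sketch.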
2024-10-11¶
Added support for connectors.
Added support for multitenancy.
Added support for webhooks.
Updated roadmap.
2024-10-09¶
Integrated Search has been extended to add support for phrase search.
2024-10-07¶
Added verify_ssl as a global default in the service metadata. This determines the default value of the verify_ssl property on URL systems. The default value is false.
2024-09-19¶
Extended Integrated Search to allow using a well-defined query syntax. Improvements have been made to the search results for namespaced identifiers that have been merged. They now have the same query result page.
Added the trigger_on property to the http transform.
2024-03-13¶
Added documentation for how to change the logging level for workernodes.
2024-01-31¶
Added the new properties schema_url and system to the JSON schema transform. The schema_url property can be used to avoid embedding the schema in your pipe configuration by pointing to an externally stored schema instead. If it is used, system must be set, and it must point to a valid URL system.
2024-01-12¶
Added the possibility to specify permissions to be applied to a system in a permissions pipe property.
2023-12-22¶
Added support for conditional properties for the System, Pipe and service metadata configuration entities.
2023-12-20¶
Added support for using DTL to calculate the value of the completeness property on the dataset source at runtime.
Added the completeness DTL function.
2023-12-12¶
Added the coalesce-args DTL function. This function is different from coalesce in that it evaluates its arguments in order and stops when it finds an argument that is not null. This can in many situations be a lot more efficient.
Fixed a bug where timestamps were not parsed correctly during partial rescans.
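The "evaluate in order and stop at the first non-null argument" behaviour can be illustrated with a small Python sketch. This is not DTL and not the actual implementation; each argument is modelled as a zero-argument callable (a hypothetical stand-in for a DTL expression) so that evaluation can be observed.

```python
def coalesce_args(*thunks):
    """Evaluate arguments lazily, in order; return the first non-null value."""
    for thunk in thunks:
        value = thunk()
        if value is not None:
            return value
    return None


evaluated = []


def expression(name, result):
    """Build a stand-in for an expensive expression that records when it runs."""
    def thunk():
        evaluated.append(name)
        return result
    return thunk


result = coalesce_args(
    expression("first", None),
    expression("second", "hit"),
    expression("third", "unused"),
)
```

In this sketch the third expression is never evaluated, which is where the efficiency gain would come from when later arguments are costly.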
2023-11-27¶
Extended the prevent_multiple_versions property of dataset sinks to also accept the enum "ignore" (in addition to true or the default value false). If set to "ignore" the pipe will silently ignore any updates to existing entities in the dataset (whereas a true value makes the pipe fail when encountering updates).
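The three settings can be compared with a minimal sketch. The dataset is modelled as a plain dict of version lists and the function name is hypothetical; the real sink works differently, this only mirrors the behaviour described above.

```python
def write_entity(dataset, entity, prevent_multiple_versions=False):
    """Write an entity and return True if a version was stored."""
    entity_id = entity["_id"]
    already_exists = entity_id in dataset
    if already_exists and prevent_multiple_versions is True:
        # true: an update to an existing entity makes the pipe fail.
        raise RuntimeError("update to existing entity %r" % entity_id)
    if already_exists and prevent_multiple_versions == "ignore":
        # "ignore": the update is silently dropped.
        return False
    # false (default): a new version is appended.
    dataset.setdefault(entity_id, []).append(entity)
    return True


dataset = {}
write_entity(dataset, {"_id": "1", "name": "v1"})
stored = write_entity(dataset, {"_id": "1", "name": "v2"}, "ignore")
```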
2023-11-07¶
Added a new phonenumber-parse DTL function.
Added a new phonenumber-format DTL function.
2023-10-11¶
Clarified that the system level headers property on REST systems is used on all requests executed by the system. The keys in this property can be overridden in the individual operations but cannot be discarded.
2023-09-01¶
Active use of the sesam-py client will now prevent developer and developer-pro subscriptions from being hibernated. This feature was introduced in version 2.8.0.
2023-08-18¶
Hibernation for developer subscriptions is extended to developer pro subscriptions as well.
Any automated CI system that requires 24/7 uptime should be moved to a single node. You can still do CI testing with a developer subscription, but hibernation wake-up time must be expected.
2023-08-17¶
Execution log entries circuit-breaker-commit and circuit-breaker-rollback are now written when a circuit breaker is committed or rolled back.
Added the trace property available on the REST transform, REST source, REST sink and HTTP endpoint source to the global_defaults section of the service metadata. This property, if set, represents the default value for the trace property on these components when not set explicitly in their config. The intention is to be able to turn this feature on globally when debugging or doing development without having to change the individual components.
2023-08-11¶
Added a new global default run_at_startup_if_not_populated to the service metadata. This setting determines the default value of run_at_startup_if_not_populated for pumps.
2023-08-10¶
2023-08-07¶
Added a new next_page_termination_strategy option not-full-page and a new property page_size to the REST system. When this new strategy is enabled, paging will terminate if the number of entities in the response is less than the specified page_size. This new property can also be used in Jinja expressions.
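A rough sketch of the not-full-page idea, assuming a simple page-numbered API. The fetch function, the in-memory data and the helper names are all hypothetical; only the termination condition reflects the description above.

```python
def fetch_all(fetch_page, page_size):
    """Collect entities page by page, stopping after the first short page."""
    entities = []
    page = 1
    while True:
        batch = fetch_page(page)
        entities.extend(batch)
        # not-full-page: fewer entities than page_size means this was the last page.
        if len(batch) < page_size:
            break
        page += 1
    return entities


DATA = list(range(7))
requested_pages = []


def fake_fetch(page, size=3):
    """Stand-in for an HTTP call returning one page of a seven-item collection."""
    requested_pages.append(page)
    start = (page - 1) * size
    return DATA[start:start + size]


collected = fetch_all(fake_fetch, page_size=3)
```

Note that when the total number of entities is an exact multiple of the page size, this approach needs one extra request that returns an empty page.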
2023-07-04¶
From now on we will spin down developer subscriptions that have had no recent interaction. “Interaction” is defined as clicking around in the Management Studio in the given subscription. After a subscription has been interacted with, it will be spun up again, which takes about 15 minutes. Improvements to the UI to reflect this are being worked on.
2023-06-30¶
Added a new refresh_window option to the oauth2 section of the URL system and REST systems. When using refresh tokens, this value (in seconds) is the window to pre-emptively refresh a token that is about to expire. It’s 30 seconds by default. Set this property to 0 if the system doesn’t allow tokens to be refreshed before they expire.
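One way to picture the refresh window is as a check against the token's expiry time. The function below is a hypothetical sketch using plain epoch seconds; the exact comparison the service performs is an assumption.

```python
def should_refresh(expires_at, now, refresh_window=30):
    """Return True when the token should be refreshed.

    expires_at and now are epoch seconds; refresh_window is in seconds.
    """
    # Refresh once we are inside the window that precedes the expiry time.
    return now >= expires_at - refresh_window


expires_at = 1000
```

With the default window, a token expiring at t=1000 would be refreshed from t=970 onwards; with refresh_window set to 0 it is only refreshed once it has actually expired.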
2023-06-26¶
Added a new next_page_termination_strategy option same-response to the REST system that is enabled by default. When enabled, paging will terminate if the response is equal to the previous response.
2023-05-15¶
Corrected the documentation of sources that have the supports_signalling property to reflect that the threshold for turning off implicit signalling is an hour, not two minutes. Note that you should explicitly turn signalling support on or off using the supports_signalling property if you need to have control over this on your pipe.
2023-05-08¶
Added support for Tripletex authentication to the URL system and REST systems.
Added a group DTL function.
2023-05-02¶
A dataset source with subset now respects the include_previous_versions property (which is false by default). Before this change historical versions were included. The dataset entities API will also now respect the history request parameter for subsets.
2023-04-27¶
Updated the documentation of the path DTL function with a description of how non-string items in the PROPERTY_PATH list are treated (they are ignored).
2023-04-25¶
Added a new require_populated_input setting as a global default in the service metadata and as a property on the dataset, merge, merge_datasets and union_datasets sources. It can be used to prevent a pipe from running unless the pipe’s source-datasets have been populated.
2023-03-29¶
Added page and is_first_page bound parameters to the Jinja expressions for the REST transform and REST source. These are useful for including or excluding properties when doing paged operations.
Added a "manual" enum to the since_property_location of the REST source - if set, the source will not attempt to add any continuation-related parameter automatically.
2023-03-24¶
Updated our Terms of Service.
2023-03-17¶
We decided to revert our recent change of the default value of allowed_status_codes in the REST transform from 200-299 to 200. The change caused some problems with non-idempotent sinks. The default value is now 200-299.
2023-03-14¶
allowed_status_codes and ignored_status_codes can now be specified on REST operations, but they can only be used with the REST transform.
2023-03-07¶
Added the possibility to specify permissions to be applied to the pipe in a permissions pipe property.
2023-02-28¶
Added a validation_expression property to the HTTP endpoint source. This allows custom request validation for receiver endpoints. This is particularly useful when clients cannot use JWT tokens for authentication.
2023-02-24¶
Added a new error_expression property to the operation object properties in the REST system (and any local variants). It is available to the REST source and REST transform and is intended to be used to test for error conditions in responses from systems that don’t use HTTP error codes properly. If it renders to a non-empty string the source or transform will fail. The contents of the rendered error are included in the exception raised to the pipe.
2023-02-23¶
Added a new initial_completeness property to the dataset source.
2023-01-31¶
Restricted access to pipe runner API for subscriptions not having developer_mode enabled. The motivation is to avoid running tests in production systems as that is disruptive/destructive.
2023-01-30¶
Extended the completeness feature to propagate the completeness value of all upstream datasets. You can now also specify the specific upstream datasets that you want a dataset source to have completeness for.
2023-01-26¶
Changed the default value of side_effects from false to true for the REST transform and HTTP transforms. Note that this is a change of behavior and will prevent previews from including these types of transforms by default. The motivation for this change is to prevent unintentional changes in the external systems accessed by the transforms when previewing a pipe. You can manually change side_effects to false if you’re sure your transforms are free from such side-effects or if you don’t mind changes happening when previewing a pipe.
2023-01-25¶
Added the since bound parameter to the payload, headers and params operation object properties in the REST system (and any local variants) for the REST source.
Documented some additional bound parameters available for paged responses in the templated properties for the REST system (and any local variants) and REST source and REST transform.
2023-01-24¶
Added support for the missing "HEAD" and "OPTIONS" HTTP methods for operation objects in the REST system (and any local variants). Note that "HEAD" requests will always result in an empty response body, so they will not work with replace_entity set to true in the REST transform and require a response_property to be set for the REST source.
2023-01-23¶
Added a special Jinja template marker string "sesam:markjson" that can be used to generate JSON values (objects, lists and single values) from strings in the payload, params and headers operation objects in the REST system (and any local variants). This feature is considered experimental and may change or be removed.
2023-01-20¶
Added a special Jinja template marker string "sesam:markskip" that can be used to conditionally drop properties from the payload, params and headers operation objects in the REST system (and any local variants). This feature is considered experimental and may change or be removed.
2023-01-19¶
Added a new trace property on the REST transform, REST source and REST sink. It can be used to log the HTTP requests and responses these components send and receive, which can be useful during development or debugging.
Renamed the trace.log_authorization_header_redacted_bytes property of the HTTP endpoint source to trace.log_secret_redacted_bytes.
Added docs on how to enable trace in the Preview panel in Management Studio.
2023-01-18¶
Added “entity” and “source_entity” as bound parameters in various Jinja templateable properties in the REST system, REST transform, REST source and REST sink.
2023-01-17¶
Added a new next_page_termination_strategy option same-next-page-request to operations in the REST system (and any local variants). If included in the next_page_termination_strategy values, it will terminate the paging if it detects that the request to issue is identical to the previous request (i.e. the headers, url, parameters and payload are all the same values). Added this new strategy to the default next_page_termination_strategy, which is now a list of next-page-link-empty and same-next-page-request.
Added an “experimental” note to next_page_termination_strategy to indicate that this property is still under development and subject to change/removal.
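The same-next-page-request check compares the request about to be issued with the previous one. A minimal sketch of such a comparison, assuming requests are represented as plain dicts (the dict layout, the function names and the example URL are hypothetical):

```python
import json


def request_fingerprint(request):
    """Reduce a request to a comparable string built from url, headers, params and payload."""
    return json.dumps(
        {
            "url": request.get("url"),
            "headers": request.get("headers", {}),
            "params": request.get("params", {}),
            "payload": request.get("payload"),
        },
        sort_keys=True,
    )


def is_same_request(previous, upcoming):
    """Return True if paging should stop because the next request repeats the last one."""
    if previous is None:
        return False
    return request_fingerprint(previous) == request_fingerprint(upcoming)


first = {"url": "https://api.example.com/items", "params": {"page": 1}}
second = {"url": "https://api.example.com/items", "params": {"page": 2}}
repeat = {"url": "https://api.example.com/items", "params": {"page": 2}}
```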
2023-01-11¶
It’s now possible to specify an operations property directly on the REST transform, REST source and REST sink. If present both in the pipe and the system, the pipe version will take precedence. Note that only the system version allows secrets. This is primarily intended as a convenience feature during development; in a production environment, if multiple pipes use the same operations configuration, you should consider storing it on the REST system so it can be reused and maintained in one place.
2023-01-10¶
Added support for http basic authentication to the Elasticsearch system.
Added new options to the trace property of the HTTP endpoint source: log_authorization_header_redacted_bytes, log_response_body_maxsize and log_response_headers.
2023-01-09¶
Changed the default allowed_status_codes in the REST transform from 200-299 to 200.
REST transform, REST source and REST sink: reverted the payload merge behavior from 2022-12-08. It will now work the way it did previously, i.e. as a default fallback mechanism. If payload is defined in multiple places, the order of precedence is 1) entity, 2) sink/source/transform and 3) operation. If you need to add a secret to the payload you should add it only to the operation section on the REST system and then use the properties property on the pipe side to dynamically add properties from the entities to the payload via Jinja templating.
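The fallback order for payload can be expressed as "first defined value wins". The function below is a hypothetical sketch of that precedence, treating an absent payload as None; it does not merge anything, in line with the reverted behaviour.

```python
def resolve_payload(entity_payload=None, component_payload=None, operation_payload=None):
    """Pick the payload by precedence: entity, then sink/source/transform, then operation."""
    for candidate in (entity_payload, component_payload, operation_payload):
        if candidate is not None:
            return candidate
    return None


operation = {"api_key": "secret-from-system"}
component = {"kind": "order"}
entity = {"order_id": 42}
```

In this sketch an entity payload replaces the others entirely, which is why secrets are better kept in the operation and combined with entity values through templating.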
2023-01-06¶
Documented the response_headers_property configuration property for the REST source.
Documented the index_mapping_properties, index_check_document and first_run_delete_query configuration properties for the Elasticsearch sink.
2023-01-04¶
Added a new rescan_when_config_changes setting as a pipe property and as a global default in the service metadata.
2023-01-03¶
All Jinja templates now use a stricter “undefined variables” check. This means that any reference to a non-existing variable in the template will now throw an exception instead of, in some cases, rendering an empty string. Note that this is a change in behavior.
For security reasons, all Jinja templates are by default executed in a restricted sandbox environment. Note that this means some functions and objects may no longer be available.
2022-12-30¶
Added a new property mark_deletion_tracked to the dataset sinks. If set to true (the default is false), a "$deletion_tracked": true property will be added to entities deleted by deletion tracking during full runs or rescans.
2022-12-28¶
The scope sub-property of the oauth2 config element of the URL system and REST system now accepts single strings as well as arrays of strings.
Added a new experimental trigger_on property to the REST transform. This property can be used to selectively pass through entities based on a property of the entity, for instance allowing a chain of REST transforms to use different transforms for different operations.
REST system: added new payload_type enum "text" and changed the default to "json" if the payload_type is not set. Note that this is a change of behavior. Setting the payload_type to "text" sets the content-type of the request to "text/plain" if the payload is not of type bytes (and isn’t set explicitly in the headers property of the operation). If the type of the payload is bytes the content-type will be set to "application/octet-stream". All other types will be serialized to a JSON encoded string.
The headers and params properties of the operations section of the REST system can now be templated using Jinja expressions.
The payload property of the operations section of the REST system and in the REST source, REST transform and REST sink configurations can now be templated using Jinja expressions.
Added previous_body and previous_headers named parameters to relevant “templateable” properties of the REST system and in the REST source and REST transform. Note that these are only set for systems that support paging, for all pages except the first one. Use Jinja’s “is defined” tests in templates that use these to set default values for the first page.
2022-12-22¶
Added a new trace property to the HTTP endpoint source. It can be used to log incoming requests to the pipe’s execution log, which can be useful during development or debugging.
Documented the do_float_as_int and do_float_as_decimal properties in the HTTP endpoint source. (These properties have existed for a very long time, they have just not been documented until now.)
2022-12-16¶
Added a next_page_termination_strategy property to operations in the REST system. This can be used to define how the REST source and REST transform decide when to terminate when using pagination. The default value is next-page-link-empty, which means that the paging is considered done if the next_page_link template evaluates to null (or an empty string). The other strategies are empty-result and same-next-page-link, which terminate pagination when an empty result is returned or when the next page link is the same as the current page link, respectively. The strategies can be combined as an array.
Added url and request_params bound variables to the next_page_link template. The motivation for this is to support more services that need to construct their pagination links with parts of the current query parameters.
Fixed a bug in the REST transform that would cause it to attempt to merge the properties property in the entity with the static version defined in the operation or transform configuration. The correct behavior is to use the entity version if it exists and then fall back to the transform and operation, in that order, if it does not.
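The three termination strategies from this entry, and the option to combine them, can be sketched as below. The function is hypothetical, and it assumes that when several strategies are listed, any one of them matching is enough to stop paging.

```python
def should_stop_paging(strategies, next_page_link, current_page_link, result):
    """Return True if any of the configured strategies says paging is done."""
    # A single strategy name is treated like a one-element list.
    if isinstance(strategies, str):
        strategies = [strategies]
    checks = {
        # The next_page_link template evaluated to null or an empty string.
        "next-page-link-empty": next_page_link is None or next_page_link == "",
        # The response contained no entities.
        "empty-result": len(result) == 0,
        # The next page link points at the page we just fetched.
        "same-next-page-link": next_page_link == current_page_link,
    }
    return any(checks[name] for name in strategies)
```

For example, combining next-page-link-empty with empty-result would stop on an empty page even if the API keeps returning a next link.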
2022-12-13¶
Added a new if_transform_empty property to the REST transform. It can be used to make the transform fail if it returns an unexpected empty response. The default is to allow empty responses, which could lead to deletion tracking downstream. This property is analogous to the if_source_empty property for sources.
2022-12-08¶
The payload property of an operation in the REST system will now be merged with the payload from the pipe if both are dicts. The motivation for this change is to allow payload properties that contain static secrets to be defined in the system.
Added a new allowed_status_codes to the REST transform. It can be used to pass through non-ok responses for further processing.
Added a new response_status_property to both the REST transform and REST system operation elements that, if specified, holds which property to use for the status code of the response.
Documented the response_headers_property configuration property for the REST transform and REST system operation element.
2022-12-02¶
Added a new debug option to the pump configuration section: max_seconds_per_entity. It can be used to pinpoint entities that are particularly slow to transform. It will make the pipe fail if the batch uses, on average, more than this number of seconds per entity. To be exact, it should be used in conjunction with batch_size set to 1 on the pipe - the execution log will include the first entity in the batch that triggers this limit.
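Because the limit applies to the average over a batch, a single slow entity can hide among fast ones. The sketch below illustrates this with made-up timings; the function name and the numbers are hypothetical.

```python
def exceeds_limit(batch_seconds, batch_size, max_seconds_per_entity):
    """Return True if the batch used more than the limit per entity on average."""
    return (batch_seconds / batch_size) > max_seconds_per_entity


# Nine fast entities and one slow one, in seconds (illustrative numbers only).
timings = [0.1] * 9 + [20.0]

# As one batch of ten, the average stays below a limit of 5 seconds.
whole_batch_fails = exceeds_limit(sum(timings), len(timings), 5)

# With a batch size of 1, each entity is checked on its own.
slow_entities = [t for t in timings if exceeds_limit(t, 1, 5)]
```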
2022-12-01¶
Added support for OAuth 2 refresh token grants to the URL system and REST system.
2022-11-15¶
Made the since variable available to the url property in the REST system operation configuration. Note it’s only applicable to REST sources with continuation support.
Updated the documentation of the REST component Jinja templates with what variables are available to them.
2022-11-11¶
A new payload type multipart-form applicable to the REST sink and REST transform has been added.
Fixed the example for using the form or multipart-form payload types - it should use a single dictionary of key value pairs, not a list.
2022-11-09¶
The Diff datasets source has been deprecated.
The REST source is no longer considered experimental.
2022-10-11¶
Added a configuration warning to pipes with chained DTL transforms where any transform other than the first uses hops with dependency tracking enabled.
Added configuration warning to pipes that have hops with dependency tracking enabled, but do not use the “dataset” source.
2022-10-03¶
Pipe runs triggered by pumps using cron expressions or scheduled intervals larger than one hour (3600 seconds) are persisted, so if the service is down when they should have run they will be run as soon as the service starts up again.
2022-09-06¶
Deletion tracking done by background rescan is now done in batches and is interleaved with incremental synchronization. This means that deletion tracking will no longer stop-the-world.
2022-09-01¶
We’ve updated our Subscription Fee and payment terms. Note that prices are now listed in U.S. dollars. For existing customers, the changes will take effect from December 1st 2022.
2022-08-17¶
Added the if_source_empty property to sources and the global default global_defaults.if_source_empty to the service metadata. This property determines the behaviour of pipes when their source returns no entities. Previously synced entities will normally be deleted from the pipe dataset when it finishes running, even if no entities are received. Setting this new property to fail will prevent this by making the pipe fail before it can perform a new sync.
2022-08-09¶
Added an escape_null_bytes property to the CSV source. If set to true, any null characters in the input CSV file will be escaped before parsing the data. This prevents the source pipe from failing due to attempted reads of lines containing null characters. The property is set to false by default for performance reasons.
2022-08-08¶
Added a verify_ssl property to the LDAP system. If use_ssl is set to true then this property controls if the certificate used for the connection should be verified. It is true by default.
2022-08-05¶
Added a custom_ca_pem_chain property to the LDAP system. This property can hold a custom chain of certificates (in PEM format) that will be used to validate the SSL connection if use_ssl is set to true.
2022-07-27¶
Added a new property global_defaults.always_index_ids to the service metadata. Enabling this will make all dataset sinks maintain an index on the $ids property, without the need for specifying the indexes property on each individual sink.
2022-07-01¶
Added a “discard-inferred-schema” pump operation to the service API. This operation will discard any inferred schema entries for the pipe and writes a special “pump-discard-inferred-schema” entity to the pipe execution log for reference. This operation can only be done on non-running pipes.
Behavioural change: all pipes that have infer_pipe_entity_types set to true, and have a source with continuation support, will now discard their inferred schemas upon being reset.
2022-06-30¶
Added a new property include_completeness to pipes. This property specifies a list of dataset ids that should contribute to the completeness timestamp value of the sink dataset. By default, this property is equal to the pipe’s input datasets, minus any datasets listed in exclude_completeness.
Pipes that fail to infer their schemas due to limitations on the resulting schema size will no longer fail. The inferred schema will instead be truncated and marked as such and the pipe will not attempt to do schema inference the next time it runs.
2022-06-08¶
The VPN feature now supports high availability for connections. This means that you can set up redundant connections that can be failed over to. This is a multi subscription only feature.
2022-05-20¶
2022-05-12¶
A pipe with automatic reprocessing enabled will now automatically reset if the dependency tracking threshold is reached.
2022-05-03¶
Transforms now have a side_effects property that specifies if the transform has side-effects or not. A side-effect means that it causes changes to the system that it talks to. If the transform alters the system in any way, then this property must be set to true to prevent inadvertent changes to the system by features like pipe preview.
Corrected a bug that, for multi subscriptions, would cause the default maximum number of concurrent pipes to be 20 instead of 10 for SQL systems and essentially unlimited for non-SQL systems. Note that the default number of concurrent pipes for all systems is controlled by the worker_threads property available on all systems and is 10 by default.
2022-04-25¶
Documented the resource quotas for microservices.
The default value of max_merged in the merge source is now set as a global default in the service metadata, and the default value has been increased to 50000 entities. This is a very high number of entities for the merge source to handle at once, and merge sources will start using up large amounts of RAM before hitting this default limit. It is recommended to reduce this limit to prevent such high memory usage and then reconfigure any pipes that attempt to merge too many entities.
2022-04-19¶
Added a new property max_merged with a default value of 100 entities to the merge source. Pipes that attempt to merge more entities than max_merged will fail with this change. The motivation for adding this new property is that merge sources generally should not be merging that many entities in the first place, and the merge process can end up using excessive amounts of RAM.
2022-04-07¶
Schema inferencing has been extended to collect namespaces used in NI values.
2022-03-31¶
Added support for Metrics.
The new data option Metrics and monitoring in test and production pricing replaces the per-pipe monitoring option. Pipe monitoring will still be available for existing subscriptions that are already using it.
2022-03-25¶
New developer subscription size Developer Pro is now available.
Added support for Durable Data.
2022-03-24¶
Subscriptions created in the portal are now provisioned with the Clustered architecture.
2022-03-21¶
The Databrowser tool will reach end-of-life December 31st 2023. It is superseded by the Integrated Search feature. We will notify the current subscribers soon.
Added a property ignore_non_existent_datasets to the merge, merge_datasets and union_datasets sources. By default, listing one or more datasets in initial_datasets that do not exist does not prevent the source from being populated. Setting ignore_non_existent_datasets to false will make the pipe fail if any non-existent datasets are listed in datasets.
Fixed a bug where the initial_datasets property was initialized as an empty list in the merge, merge_datasets and union_datasets sources if initial_datasets was not explicitly set. The property now defaults correctly to the same list of datasets listed in datasets. This is a breaking change.
The dataset and diff_datasets sources now warn the user if any input datasets do not exist. This also applies to the merge, merge_datasets and union_datasets sources if ignore_non_existent_datasets is false.
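The corrected default for initial_datasets can be pictured with a small sketch. The function is hypothetical and the source configuration is reduced to a plain dict containing only the two properties involved.

```python
def effective_initial_datasets(source_config):
    """Return initial_datasets, falling back to the datasets list when it is not set."""
    if "initial_datasets" in source_config:
        return list(source_config["initial_datasets"])
    # Corrected behaviour: default to the same list as datasets (previously an empty list).
    return list(source_config["datasets"])


implicit = {"datasets": ["customers", "orders"]}
explicit = {"datasets": ["customers", "orders"], "initial_datasets": ["customers"]}
```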
2022-03-10¶
Restructured this documentation site. What’s Sesam is targeted at architects and decision makers. User guide is targeted at users of Sesam, with new subsections for Data synchronization, Data modelling, Data platforms and Operations.
2022-03-03¶
Pipes with manual or off pump mode can now be disabled and enabled.
2022-02-11¶
As part of the Clustered architecture everywhere initiative we are now in the process of migrating in-cloud subscriptions over to it. You can find the provisioning status of a subscription in Subscription > Basics in the Management Studio. There you can see which provisioner version it is running (version 1 is the old single machine service, version 2 is the new clustered service; if self-hosted it will say self-hosted).
Changes to the user experience:
Pipes are now provisioned asynchronously; this is reflected in the UI.
Config upload when using sesam-py can report taking a little longer.
2022-01-25¶
The lower keys, upper keys and undirected graph transforms have been deprecated. DTL transforms can replace the functionality of lower keys and upper keys transforms.
2022-01-24¶
Added a new property remove_pk_char_trailing_spaces to the SQL sink. This property is enabled by default and fixes an issue with updating table rows when the primary key is of type nchar or char.
2022-01-20¶
Added custom header functionality to HTTP transforms.
2022-01-12¶
Added domain name validation to the docker.hosts property on microservice systems. This ensures that domain names are in a format that is accepted by Kubernetes.
2022-01-03¶
Added a new resolved_entity property to write-error entities in the execution log. It contains the entity that was used to resolve the write-error if it is different from the original entity that caused the write-error. This property is also set for any tracked dead letters that have been resolved (on the deleted dead letter). Fixed a bug where the resolved property was not set (to true) if a write-error entity was successfully retried.
2021-12-20¶
Renamed the prefilters property in the hops DTL function to subsets. prefilters had some known issues and is now deprecated. Note that you may have to reset the pipe if you change from prefilters to subsets. All new pipes should use subsets to get the documented behaviour.
2021-12-17¶
Added a custom_ca_pem_chain property to the URL system and REST system. This property can hold a custom chain of certificates (in PEM format) that will be used to validate the SSL connection if verify_ssl is set to true.
2021-12-11¶
Our security team has investigated the impact of CVE-2021-44228. The following components have been analysed as they could potentially be affected:
Integrated search. This component uses Elasticsearch under the hood. The version of Elasticsearch that we use is not affected according to this Elastic Security announcement.
Legacy Databrowser. This component uses Apache Solr under the hood. The version of Solr that we use is not affected according to this Solr Security announcement.
GDPR Portal. This component uses Apache Solr under the hood. The version of Solr that we use is not affected according to this Solr Security announcement.
Unofficial OCI images that are hosted as microservices. These components can be affected, and our users need to make sure they only run code that they trust.
2021-11-29¶
Changed the default value of the global_defaults.use_signalling_internally property of the service metadata section to true. This property was previously false by default.
2021-11-26¶
Integrated search is now available for subscriptions running on the Clustered Architecture.
VPN is now configurable for subscriptions running on the Clustered Architecture.
2021-11-19¶
The IP address of our log shipping receiver endpoint has changed from 13.74.166.9 to 52.142.116.113. If you run a self-hosted service and have blocked outgoing traffic then you need to update the firewall accordingly. See the Self-hosted service document.
Changed the name of “The Microsoft Azure SQL Data Warehouse system” to “Microsoft SQL Server system” and “The MSSQL system” to “Legacy Microsoft SQL system”.
The “Legacy Microsoft SQL system” has been superseded by the “Microsoft SQL Server system” and will likely be deprecated in the future.
The “Microsoft SQL Server system” has a new type "system:sqlserver" which replaces the old "system:mssql-azure-dw", which is kept as an alias for now.
Additional note: the recommended “Microsoft SQL Server system” uses official Microsoft (ODBC) drivers while the “Legacy Microsoft SQL system” uses open source drivers. The Microsoft ODBC drivers should support all current Microsoft SQL Server compatible products, including Azure Synapse Analytics (previously known as Azure SQL DataWarehouse). Note that switching from the “Legacy Microsoft SQL system” ("system:mssql") to the preferred “Microsoft SQL Server system” ("system:sqlserver" aka "system:mssql-azure-dw") can lead to minor data differences in properties due to the different driver backends.
2021-11-11¶
Added an encode_error_strategy property to the CSV endpoint - it tells the sink how to deal with encoding errors when the encoding is different from “utf-8”. The default is to use a “backslashed unicode” replacement, but other strategies can be chosen.
2021-11-09¶
Added a “discard-retries” pump operation to the service API - it is available in the UI as a “Discard retry queue” menu item on pipes. This operation will make the next pipe run ignore any previous write error retries by writing a special “pump-discard-retries” entity to the pipe’s execution log. This operation can only be done on non-running pipes.
2021-10-25¶
Added a byte_order_mark property to the CSV endpoint and XML endpoint sinks. If true, these sinks will emit a UTF-8 byte order mark (BOM) at the start of the file/stream. It’s false by default and should only be used in conjunction with a UTF-8 encoding.
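The effect of a byte order mark can be sketched in a few lines of Python. This is only an illustration of the byte-level result, not the sink implementation; render_csv is a hypothetical helper and its byte_order_mark parameter merely mirrors the property name.

```python
import codecs
import csv
import io


def render_csv(rows, byte_order_mark=False):
    """Render rows as UTF-8 encoded CSV bytes, optionally prefixed with a BOM."""
    text = io.StringIO()
    csv.writer(text, lineterminator="\n").writerows(rows)
    payload = text.getvalue().encode("utf-8")
    # codecs.BOM_UTF8 is the three byte sequence EF BB BF.
    return codecs.BOM_UTF8 + payload if byte_order_mark else payload


print(render_csv([["id", "name"], ["1", "Åse"]], byte_order_mark=True))
```

Some spreadsheet applications use the BOM to detect UTF-8, which is the usual reason for enabling it.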
2021-10-11¶
The http_endpoint source will now get its completeness value from the “X-Dataset-Completeness” http request header, if it is present. If the header is not present, the current time will be used instead, just as before.
2021-09-29¶
Added a new Quick Reference document for faster and easier navigation to configuration types and DTL transforms and functions.
2021-09-28¶
Added the (experimental) ni-collapse and ni-expand DTL functions. Note that these are only meant to work with the global_defaults.symmetric_namespace_collapse service metadata option set to true (false by default while this functionality is in an experimental state).
2021-09-27¶
The “Datasets” page has been removed.
A dataset is managed by a pipe and considered a part of the pipe. All the details about a dataset have therefore been moved to the pipe page of the pipe that writes to the dataset (under Output). Internal datasets can be found under “Datahub” > “Internal datasets”.
2021-09-01¶
Added an explanation about why you should not hop to the sink dataset.
2021-08-16¶
Clarified when the is_first and is_last flags can be expected to be set in the Sesam JSON Push Protocol. These flags are only set when running a full sync (i.e. not when in incremental mode). They are intended to signal to the client the start and end of a full sync run across multiple requests.
Fixed a bug in the JSON (push) sink that set the is_first flag also on incremental syncs.
2021-08-04¶
Added a header property to the JSON source. This property can be used to specify additional header values to be set when doing HTTP GET requests. This was added to make the JSON source symmetrical with the JSON (push) sink. Note that both the JSON source and sink adhere to the Sesam specific JSON Pull Protocol. Consider using the more general REST source or sink if you’re interacting with a non-Sesam JSON capable REST API.
2021-06-14¶
Added a json_content_types property to the REST system. This property can be used to specify additional JSON content types to accept besides the default “application/json”. The content must still be valid JSON. Note that the REST source will no longer attempt to parse all responses as JSON but will check the content-type against the list of recognised content-types first. If the response content-type is not in this list, it will be treated as “unknown” and an empty entity containing a property with the response body (and optionally the content type) will be emitted for further processing with DTL. Support for response_include_content_type and response_property has been added to the REST source for this scenario.
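The content-type check described above can be sketched as follows. This is a minimal illustration under assumptions: the function name, the default property name "response" and the "content_type" key are hypothetical, and only the overall decision (parse as JSON when the content type is recognised, otherwise wrap the raw body in an entity) follows the entry.

```python
import json

DEFAULT_JSON_CONTENT_TYPES = ["application/json"]


def handle_response(body, content_type, json_content_types=(),
                    response_property="response", include_content_type=False):
    """Parse the body as JSON only if its content type is recognised."""
    accepted = set(DEFAULT_JSON_CONTENT_TYPES) | set(json_content_types)
    # Ignore parameters such as "; charset=utf-8" when comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in accepted:
        return json.loads(body)
    # Unknown content type: emit an entity that carries the raw body.
    entity = {response_property: body}
    if include_content_type:
        entity["content_type"] = content_type  # hypothetical key name
    return entity


print(handle_response('{"a": 1}', "application/vnd.api+json",
                      json_content_types=["application/vnd.api+json"]))
print(handle_response("<html/>", "text/html", include_content_type=True))
```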
2021-06-09¶
Added an initial_since_value property to the source configuration. This property holds the “since” value to use by the source when the pipe offset is unset (or has been reset).
The since_default property of the SPARQL source has been deprecated; please use initial_since_value instead.
2021-05-31¶
We’ve updated our Subscription Fee, payment terms. For existing customers, the changes will take effect from September 1st 2021.
2021-05-20¶
Added a Sesam Community section.
2021-05-06¶
If pipes with sources with the chronological strategy fail, they now save their pipe offset based on last successful batch in the pipe run. This improvement makes it more likely that a failing pipe is able to make progress.
2021-05-05¶
Added rate_limiting_retries and rate_limiting_delay properties to the REST source, REST transform, REST sink and REST system. These can be used to retry failed requests that return an HTTP 429 error code.
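A retry loop of the kind these two properties configure might look like the sketch below. It assumes that rate_limiting_retries is the maximum number of extra attempts and that rate_limiting_delay is a fixed pause in seconds between them; the changelog does not spell out these semantics, and the send callable is a hypothetical stand-in for an HTTP request.

```python
import time


def request_with_rate_limit_retries(send, rate_limiting_retries=3,
                                    rate_limiting_delay=1.0, sleep=time.sleep):
    """Call send() and retry while it reports HTTP 429.

    send is a callable returning a (status_code, body) tuple. The sleep
    parameter is injectable so the loop can be exercised without waiting.
    """
    attempts = 0
    while True:
        status, body = send()
        if status != 429 or attempts >= rate_limiting_retries:
            # Either a real answer or the retry budget is used up.
            return status, body
        attempts += 1
        sleep(rate_limiting_delay)
```

A fixed delay is the simplest policy; a server-provided Retry-After header or exponential backoff are common alternatives that this sketch does not attempt.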
2021-05-03¶
The payload_property of the REST source and REST transform now supports traversing a path in the response body using a “dotted” notation.
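Traversing a response body with a dotted path can be illustrated like this. The helper name and the choice to return None for a missing key are assumptions made for the sketch; how the REST source treats missing path elements is not stated here.

```python
def get_payload(body, payload_property):
    """Follow a dotted path such as "result.items" through nested dicts."""
    current = body
    for key in payload_property.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # assumption: a missing element yields no payload
        current = current[key]
    return current


response_body = {"result": {"items": [{"id": 1}, {"id": 2}], "total": 2}}
print(get_payload(response_body, "result.items"))
```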
2021-04-29¶
Added a configuration hint for controlling the deployment of microservices. The new eager_load_microservices option will allow Sesam to hold off starting up microservices which are not connected to any pipes. This option is true by default, in line with previous behaviour. The option can be overridden per system using the eager_load flag in the Microservice system configuration. Individual microservices which need to be run eagerly should have the option eager_load set to true explicitly in anticipation of the default changing.
2021-04-15¶
Added ‘dialect’ keyword to Microsoft Azure SQL Data Warehouse server system to indicate whether it’s a normal SQL server or a Synapse server. Note that it uses the ‘HEAP’ table type when used to create new tables.
2021-03-25¶
The driver for the LDAP system has been changed to version 2.4 of LDAP3 . The new driver gives the same results as the old driver in our tests, but it is still possible that there may be some subtle changes in how the new driver interacts with the LDAP server. The newer version implements some security fixes.
2021-03-22¶
The mail message sink will now automatically add a Date header to the email message.
Added support for specifying a list of HTTP response status codes to ignore in the REST transform.
2021-03-19¶
Added support for paginated responses to the REST transform as well.
The REST transform response-property, replace-entity and response-include-content-type properties have been deprecated. Use response_property, replace_entity and response_include_content_type instead.
2021-03-15¶
Added experimental REST source. This source is intended to be able to replace some of the connectors that currently require Microservices.
2021-03-12¶
Notification status changes on the Status page are now fully automated.
2021-03-05¶
Added default operation, properties and payload values to the REST sink and REST transform.
2021-02-19¶
The driver for the MySQL database type has been changed to the latest stable version of PyMySQL (the old driver was from 2015, and we wanted to use a more recent driver). The new driver gives the same results as the old driver in our tests, but it is still possible that there may be some subtle changes in how the new driver interacts with the MySQL database (for instance in how data is converted between Sesam’s internal format and the fields in a database table).
2021-02-18¶
A new property equality_sets has been added to the merge source. This property can be used instead of (or in combination with) the equality property, and should make it a bit easier to configure the equality-rules correctly.
2021-02-15¶
Open Sesam will shut down March 31st, 2021. It unfortunately did not gain as much traction among our users as we had hoped and we are focusing more on the core product. We will notify the users by email soon.
2021-02-11¶
The default batch_size value of pipes that use the REST sink has been changed to 1 (used to be 100).
2021-02-05¶
We are optimizing the maximum number of concurrent running pipes in small subscriptions. The rationale is to get better overall performance. Note that this also affects self-hosted subscriptions.
Documented the compaction settings in the global defaults section of the service metadata. Note that you should be careful when changing these values as this can lead to loss of data and/or influence dependency tracking functionality.
2021-02-01¶
We automatically upgrade a Small subscription type to a Medium subscription type if the data storage usage exceeds 40 Gb. We also upgrade a Medium subscription type to Large subscription type if the data storage usage exceeds 350 Gb. Note that this also affects self-hosted subscriptions.
2021-01-11¶
Added experimental support for running a pipe rescan in the background while simultaneously doing normal incremental pipe-runs.
2020-12-01¶
Changed the receive endpoint for log shipping. See Self-hosted service.
2020-11-20¶
New circuit breaker feature for uploading configuration available in service metadata. It prevents the node from updating its configuration if the new configuration would result in the deletion of more than 10 and more than 10% of existing components (for example when using the /config API). The circuit breaker can be activated by setting the service metadata property global_defaults.use_config_circuit_breaker to true.
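The tripping condition can be written out as a small predicate. This is a sketch of the rule as worded in the entry (both thresholds must be exceeded); the function and parameter names are hypothetical and the real check may count components differently.

```python
def config_circuit_breaker_trips(existing_ids, new_ids,
                                 min_deleted=10, max_fraction=0.10):
    """Return True if an upload would delete too many existing components.

    Trips only when more than min_deleted components AND more than
    max_fraction of the existing components would disappear.
    """
    existing = set(existing_ids)
    if not existing:
        return False
    deleted = existing - set(new_ids)
    return (len(deleted) > min_deleted
            and len(deleted) / len(existing) > max_fraction)


existing = ["pipe-%d" % i for i in range(50)]
# Uploading a configuration that keeps only 39 of the 50 components.
print(config_circuit_breaker_trips(existing, existing[:39]))
```

Requiring both conditions means that small installations can still delete a handful of components, while large installations can delete more than 10 as long as it stays below 10% of the total.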
2020-11-18¶
The blacklist and whitelist properties of the SQL sink have been deprecated. You can use DTL to filter properties to achieve the same functionality.
Note that these deprecated properties cannot be used to avoid inserting values into or overwriting values of existing table columns (partial table updates) or to support identity columns.
For the special case of identity columns (columns with automatically assigned values) some RDBMS systems such as MS SQL Server allow you to define a “writable view” that can be used as a workaround for this. We have added some information to the documentation on this use case for MS SQL Server.
2020-11-13¶
In the pump configuration section the use_dead_letter_dataset property has been deprecated and the dead_letter_dataset property has been un-deprecated. Please update your configuration. The dead_letter_dataset should contain a per-pipe unique user dataset id. The motivation for this reversal is that we wish to migrate away from using system datasets for any “dead letters” in a pipe.
2020-10-23¶
Documented the REST transform.
2020-10-09¶
Fixed a bug in datetime-shift and other functions that do implicit or explicit timezone conversion where we didn’t have the correct historic daylight saving information. This affects the following ranges: 1895-1901, 1916, 1940-1945, 1959-1965 and any year after 2038.
2020-08-24¶
Changed default compaction type to sink. To go back to the previous default, you can set sink compaction to false on individual pipes or set the global default property default_compaction_type to background in the service metadata.
2020-08-21¶
Added an optional description property to pipes and systems - it can be either a string or a list of strings.
Added an optional comment property to pipes, systems, sources, sinks, pumps and transforms - it can be either a string or a list of strings.
2020-08-17¶
The dataset sink property set_initial_offset now accepts the onload enum value. This enum value sets the sink dataset’s initial offset when the pipe is loaded / configured.
2020-08-13¶
The encrypt-pki, encrypt-pgp and their corresponding decrypt DTL functions now support using ‘$SECRET()’ syntax in their key and password parameters
2020-08-04¶
Documented the instance property of the MS SQL system. Please note the potential consequences for firewall rules when using this property.
2020-06-19¶
Experimental pipe entity type inferencing is now enabled by default. Change the default value by setting the service metadata property global_defaults.infer_pipe_entity_types to false.
2020-05-28¶
Added the Restore completed and Pump offset set notification rule types.
2020-03-27¶
Added the dependency_tracking property to service metadata. It can be used to specify various dependency tracking related properties.
2020-03-23¶
Added the max_entity_bytes_size property to the dataset sink.
Added the global_defaults.max_entity_bytes_size property to service metadata.
2020-03-18¶
Added the global_defaults.default_compaction_type property to service metadata.
2020-03-05¶
The union_datasets source now has a prefix_ids property that can be set to false to not add the dataset id as the prefix on entity ids.
2020-03-03¶
The transform function rename will now rename properties with a null value. The old behaviour ignored such properties, but that was considered to be a bug.
2020-02-12¶
Added support for the create_table_if_missing SQL sink property for the Oracle, Oracle TNS and MySQL systems. Previously only the MS SQL and PostgreSQL systems supported this option.
2020-01-08¶
The default value of the read_timeout property has been changed from 7200 seconds to 1800 seconds for the URL system and the Microservice system.
2019-12-19¶
The replace DTL function now takes a dict argument that lets one specify more than one string replacement.
2019-12-18¶
Updated the documentation for the supports_signalling property on dataset sources and the global_defaults.use_signalling_internally property of the service metadata section.
The JSON push sink and REST sink no longer include header values or entity data in the traceback details of the execution log on failures.
The execution log and dead letter entities no longer include copies of the source or sink configuration properties of the pipe.
The properties of the event entities in the execution log are now truncated at 10 MB to avoid excessive event entity sizes. Note that this cut-off value might be decreased further in the future.
If the pump fails due to exceeding retry limits, the entity in question is no longer included in the traceback properties. Instead it’s put in a separate exception_entity property. Note that this property is not included in the monitoring data, so you cannot devise notification rules that refer to it.
2019-12-17¶
Added support for Config groups.
2019-11-25¶
The RDF source will no longer add the <rdflibtoplevelelement> root wrapper element to literals with datatype http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral. This is a breaking change.
2019-10-28¶
Added the hex DTL function.
Updated the integer DTL function to parse hexadecimal values.
The dataset sink now has a property called prevent_multiple_versions that makes the pipe fail if an entity already exists in the sink dataset. This is useful if one wants to prevent multiple versions of the same entity from being written.
The dataset sink now has a property called suppress_filtered. The default value is false unless it is a full sync and the source is of type dataset and include_previous_versions is false. The purpose of this property is to make it possible to opt in to or opt out of a specific optimization in the pipe. The optimization is to suppress entities that are filtered out in a transform early so that they are not passed to the sink. This optimization should only be used when the pipe produces exactly one version per _id in the output. The optimization is useful when the pipe filters out a lot of entities.
2019-10-07¶
Sink compaction, merge source, LDAP source, Email message sink, SMTP system, SMS message sink, Twilio system, REST system, and REST sink are no longer experimental.
The reference DTL function has been deprecated.
The Kafka system, Kafka source and Kafka sink have been deprecated.
2019-09-04¶
Index version 2 is now the default version for dataset indexes. This index implementation (version 2) supports bidirectional traversal and that can be used to expose incremental feeds for one or more subsets of a dataset.
2019-09-04¶
Added new Pump finished overdue notification rule type.
Added new Pump failed notification rule type.
2019-08-27¶
DTL property path strings can now be quoted. In practice this means that you can have periods in path elements if you quote them. Example: "_S.foo.'john.doe''s'.bar" is now equivalent to ["path", ["list", "foo", "john.doe's", "bar"], "_S."]. A quoted path element must begin and end with a single quote. Single quotes can be escaped with ''.
Extended the JSON Pull Protocol document with information about response headers and an example using dataset subsets.
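The quoting rules for DTL property path strings in this entry can be sketched as a small splitter. This is not the DTL parser; split_property_path is a hypothetical helper that only implements the two stated rules (a quoted element begins and ends with a single quote, and '' stands for a literal single quote). As a simplification it returns the variable, here _S, as the first element.

```python
def split_property_path(path):
    """Split a property path on periods, honouring single-quoted elements."""
    elements, i, n = [], 0, len(path)
    while True:
        if i < n and path[i] == "'":
            # Quoted element: read to the closing quote, '' is a literal quote.
            i += 1
            chars = []
            while True:
                if i >= n:
                    raise ValueError("unterminated quoted path element")
                if path[i] == "'":
                    if i + 1 < n and path[i + 1] == "'":
                        chars.append("'")
                        i += 2
                        continue
                    i += 1
                    break
                chars.append(path[i])
                i += 1
            elements.append("".join(chars))
        else:
            # Unquoted element: read up to the next period.
            end = path.find(".", i)
            if end == -1:
                end = n
            elements.append(path[i:end])
            i = end
        if i >= n:
            return elements
        if path[i] != ".":
            raise ValueError("expected '.' after quoted path element")
        i += 1


print(split_property_path("_S.foo.'john.doe''s'.bar"))
```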
2019-08-26¶
We’ve added support for a feature called completeness. When a pipe completes a successful run the sink dataset will inherit the smallest completeness timestamp value of the source datasets and the related datasets. Inbound pipes will use the current time as the completeness timestamp value. This mechanism has been introduced so that a pipe can hold off processing source entities that are more recent than the source dataset’s completeness timestamp value. The propagation of these timestamp values is done automatically. Individual datasets can be excluded from completeness timestamp calculation via the exclude_completeness property on the pipe. One can enable the completeness filtering feature on a pipe by setting the completeness property on the dataset source to true.
2019-08-19¶
Pipes now have a property called reprocessing_policy that can be set to cause automatic resets when external factors indicate that the pipe should be reset.
2019-08-12¶
The dataset sink now has a property called set_initial_offset that specifies how the sink should set the initial offset on the sink dataset (a.k.a. the populated flag).
2019-05-31¶
Added experimental support for automatic scheduling of internal (dataset to dataset) pipes and JSON pipes that read from external Sesam datasets via the REST API. See the supports_signalling property of these sources and the global use_signalling_internally and use_signalling_externally options in the service metadata section. Please note the limitations and usage notes.
2019-04-23¶
The embedded source now has configurable continuation properties, i.e. supports_since, is_chronological and is_since_comparable.
2019-04-01¶
The “dtl” transform will now fail if the target entity’s _id property is either missing or is not a string. It will also do so if the argument to “create” or “create-child” is not a dict, is missing the _id property, or has an _id property of a non-string type. This is a change in default behaviour, but it is possible to opt out of it by setting the id_required property to false. The new behaviour makes it easier to discover logic errors.
2019-03-26¶
The track_children property on the dataset sink is now inferred to be true if any of the pipe’s transforms use the create-child DTL function. It is possible to override this by setting the property’s value to false.
2019-03-22¶
The lookup DTL function has been deprecated and replaced with the lookup-entity function. Note that the dataset referenced in its first argument must be populated before the parent pipe will run.
2019-03-14¶
The valid characters in pipe and system ids have been restricted to be valid DNS name components. In practice this means that the first character must be a letter or a digit and the rest must be letters, digits and hyphens. The maximum length is 62. Invalid ids will trigger a validation warning.
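The id rule above translates directly into a regular expression. The sketch assumes that “letters” means ASCII letters of either case; the function name is hypothetical.

```python
import re

# First character: letter or digit. Remaining characters: letters, digits, hyphens.
_ID_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]*")
MAX_ID_LENGTH = 62


def is_valid_component_id(component_id):
    """Check a pipe or system id against the rule stated in the changelog."""
    return (len(component_id) <= MAX_ID_LENGTH
            and _ID_PATTERN.fullmatch(component_id) is not None)


for candidate in ["crm-person", "1-import", "-leading-hyphen", "snake_case"]:
    print(candidate, is_valid_component_id(candidate))
```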
2019-03-13¶
A source that has supports_since=true, is_since_comparable=false and is_chronological=true will now use the chronological continuation strategy. Earlier it used no continuation strategy.
2019-02-27¶
2019-02-15¶
Made the URL system throw an error if it received an invalid ‘Content-Length’ response header value. The URL system used to ignore such errors; the new ignore_invalid_content_length_response_header property can be set to get the old behaviour.
2019-02-14¶
Added the docker.hosts property to the microservice system. This allows adding custom hostname to IP address mappings to the microservice container.
2019-02-13¶
Added a new coerce_to_decimal property to the Oracle and Oracle TNS systems. If set to true, it will force the use of the decimal type for all “numeric” types (i.e. numbers with precision and scale information). Currently what type the column data ends up as is not clearly defined by the oracle backend driver so in some cases it may yield a float value instead of a decimal value. This property should always be set to true if your flows care if numeric values are floats or decimals. The default value is false.
2019-02-07¶
We’ve changed the default strategy for pipe execution logging. By default, we now never log runs which resulted in no processed/changed entities. You can opt in to the previous behaviour by editing the log_events_noop_runs, log_events_noop_runs_changes_only and notification_granularity pump properties.
2019-02-04¶
There is now a new index implementation (version 2) that supports bidirectional traversal and that can be used to expose incremental feeds for one or more subsets of a dataset. Index version 1 is currently the default. Nodes must be started with a special command line option in order to change the default value. Version 2 will be made the default at some point once we have enough experience with it.
The dataset and json sources now support the subset property. This property is used to specify a subset of the source dataset.
The hops and apply-hops DTL functions now support the prefilters property. This property is used to specify a subset of the dataset that it is hopped to.
The GET /api/datasets/{dataset_id}/indexes API endpoint now includes the indexes’ version number.
The DELETE /datasets/{dataset_id}/indexes/{index_int_id} API endpoint has been added. It can be used to delete a dataset index.
2019-01-28¶
Compaction is now incremental, so it will continue from where it got to the last time.
Compaction will be performed by the dataset sink if compaction.sink is set to true in the pipe configuration. This is only available for pipes using the dataset sink. If sink compaction is enabled no scheduled compaction will be done on the dataset as this is no longer necessary. Index compaction will still require scheduled compaction, but this does not require a lock on the dataset. Note that sink compaction is currently experimental.
Automatic compaction will now kick in if there are 10% or 10000 new dataset offsets since the last compaction. The 10000 cap is fixed for now.
2019-01-03¶
The dataset sink will now mark the sink dataset as populated when all input datasets are populated and all entities have been read from them. Earlier it marked the sink dataset as populated after the first completed run. This was typically not what you wanted as it caused the sink datasets to be prematurely populated, which then caused unnecessary dependency tracking.
Added the initial_datasets property to the merge, merge_datasets, union_datasets, and diff_datasets sources. This property should only be used if some of the input datasets will never be populated. The property should then list the datasets that have to be populated before the sink datasets should be populated.
2018-12-07¶
Casting decimal numbers containing a “scientific notation” shorthand (i.e. “1E-3”, “10E14” etc) to a string using the DTL string function will now expand the exponent to its full representation (i.e. “1E2” -> “100”, “1E-3” -> “0.001”). This is a change in behaviour.
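The expansion described above matches what Python's decimal module produces with fixed-point formatting, which makes for a convenient illustration. This only mirrors the examples in the entry; it is not the DTL string function and the helper name is hypothetical.

```python
from decimal import Decimal


def decimal_to_string(value):
    """Render a decimal without exponent notation using fixed-point format."""
    return format(Decimal(value), "f")


for text in ["1E2", "1E-3", "10E14"]:
    print(text, "->", decimal_to_string(text))
```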
2018-11-12¶
["matches", "x*", ["list"]] now returns false instead of true. Note that this is a breaking change, but the old behaviour was considered a bug as it is both non-intuitive and most likely not what you want.
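One plausible way to see why the old result was true is vacuous truth: if matches requires every value to match the pattern, then an empty list trivially satisfies it. The sketch below assumes exactly that “all values must match” semantics and uses Python's fnmatch for the glob pattern; both are assumptions for illustration and not a description of the DTL implementation.

```python
from fnmatch import fnmatchcase


def matches_old(pattern, values):
    """Assumed old behaviour: all() over an empty list is True."""
    return all(fnmatchcase(value, pattern) for value in values)


def matches_new(pattern, values):
    """New behaviour: an empty list of values never matches."""
    return len(values) > 0 and matches_old(pattern, values)


print(matches_old("x*", []), matches_new("x*", []))
print(matches_new("x*", ["xa", "xb"]), matches_new("x*", ["xa", "y"]))
```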
2018-10-31¶
Added the sslmode property to the PostgreSQL system. Its default value (prefer) reflects the PostgreSQL client library default, hence you should only set this property if you need other behaviour than the default.
2018-10-25¶
Added the Kafka system, Kafka source and Kafka sink.
2018-10-16¶
Added the compaction.growth_threshold property to the pipe configuration. This lets you specify when dataset compaction kicks in.
The compaction.keep_versions property can now also be set to 0 and 1. The default value is 2, which is needed for dependency tracking to be fully able to find reprocessable entities. Setting it to a lower value means that dependency tracking is best effort only.
2018-09-24¶
Added a new recreate_table_on_first_run boolean flag to the sql sink. It controls whether Sesam should recreate the table from schema_definition when the pipe is reset or runs for the first time. Note that this requires the create_table_if_missing property to also be set to true to take effect.
Altered the way the PK is created on schema definition generation. If the sink type is sql and create_table_if_missing is set to true, the default primary key is the _id property of the entities. Previously it would always look for a property with the same contents as _id (which is still the default for non-sql sink pipes).
2018-09-03¶
Added a fallback_to_single_entities_on_batch_fail boolean flag to the pump configuration. The default reflects the current behaviour (true). It can be useful to set it to false if the cost of processing a single entity at a time is high and there are a lot of entities in a batch (for example in a typical MS SQL sink in initial bulk upload mode).
2018-08-24¶
Datasets that are not populated will no longer be compacted.
2018-08-10¶
Receiver and publisher pipes can now be disabled.
2018-08-02¶
Added support in the split DTL function to split string into characters using the empty separator.
2018-07-04¶
Added a translation GUI for the GDPR platform. This GUI makes it much easier to customize the various texts used by the GDPR portal.
2018-06-26¶
2018-06-25¶
Changed the base64-encode and base64-decode DTL functions to only accept bytes and string input respectively.
Added support for bytes input to the string casting function. The encoding used is utf-8.
Added a bytes casting function that casts strings to a (utf-8 encoded) bytes representation.
2018-06-19¶
Added a RDF transform, similar to the XML transform. It will render entities to a NTriples string and embed it in the transformed entity.
Added the base64-encode and base64-decode DTL functions.
2018-06-06¶
Changed default behaviour of the CSV source: if dialect is set, this will override the default value of auto_dialect. Previously you would have to both turn off auto_dialect and set dialect. Note that if auto_dialect is false and no dialect has been set, the excel dialect is used as default.
The is_chronological property on the SQL source is now dynamic as it is true if the updated_column and table properties are set.
Added the is_chronological_full property to the SQL source. If explicitly set to false then a full run will not consider the source to be chronological even though it is chronological in incremental runs. The default value is the value of the is_chronological property, but can be set to false.
2018-06-05¶
The old dead_letter_dataset pump configuration option (string) has been deprecated and replaced by use_dead_letter_dataset, which is a boolean flag (false by default). If set to true, the id of the dead letter dataset is automatically generated and linked to the parent pipe id (system:dead-letter:pipe-id). Note that entities written to this new dataset will no longer have the pipe id as part of their _id property. This new dataset will inherit the ACLs from its parent pipe (like pump execution datasets). If the pipe is removed, the automatically created dataset is also removed. The old dead_letter_dataset property will continue to work as before but will be removed at some future date.
2018-05-29¶
Added the checkpoint_interval property to the pipe. The default has been changed from 1 to 100, which means that the pipe offset is now saved after every 100 batches instead of after every batch. The default is effectively every 10000 entities, but since it is dependent on batch_size the default value is 100 (i.e. 10000/batch_size). Note that the pipe offset is always saved at the end of every sync if it changed.
Pipes that perform deletion tracking will now have their pipe offset and deletion tracking state saved every 15 minutes or so. If a pipe is interrupted it will now be able to continue doing deletion tracking from where it last saved its state.
2018-05-02¶
2018-04-30¶
A partial rescan can now be scheduled on a pump by specifying the two properties partial_rescan_count and partial_rescan_delta.
2018-04-27¶
Added the hash128 DTL function. It generates 128 bit integer hashes from bytes and strings.
2018-04-26¶
The sink dataset and the dead-letter dataset will now be asserted when the pipe is loaded. Receiver datasets, i.e. sink datasets that are used in combination with the http_endpoint source, will be automatically populated at the same time. Note that it is possible to opt out of this behaviour by setting auto_populate_dataset to false on the http_endpoint source. Dead-letter datasets are automatically populated, and it is not possible to opt out.
Note that this is a change in behaviour, but in most situations it is the right thing to do. If the initial push to the receiver is a full sync, then it might be good to set auto_populate_dataset to false. This is useful for full syncs because pipes doing hops against the dataset will then wait until the sync is complete and the dataset is populated.
2018-04-23¶
Processing of namespaced identifiers has gotten a decent performance boost.
Regression: The make-ni DTL function will now return a sorted list of NIs. Earlier the sorting was done by sorting the keys of the source entity, which is a much more expensive thing to do.
2018-04-19¶
Added support for circuit breakers, a safety mechanism that one can enable on the dataset sink. The circuit breaker will trip if the number of entities written to a dataset in a pipe run exceeds a certain configurable limit.
2018-04-09¶
Added the round DTL function. It rounds to the nearest digit using the “round half to even” rule.
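The “round half to even” rule can be demonstrated with Python's decimal module. This is an illustration of the rule only; round_half_even and its ndigits parameter are hypothetical and do not describe the signature of the DTL round function.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_half_even(value, ndigits=0):
    """Round a decimal string so that ties go to the nearest even digit."""
    quantum = Decimal(1).scaleb(-ndigits)  # 1, 0.1, 0.01, ...
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_EVEN)


for text in ["0.5", "1.5", "2.5", "3.5"]:
    print(text, "->", round_half_even(text))
```

Ties alternate between rounding down and rounding up, which avoids the upward bias that “round half up” introduces when many values are summed.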
2018-03-20¶
Added oauth2 (BackendServerClient profile, aka “client credentials”) option to the URL system
2018-03-07¶
Changed the default value of the node configuration setting “pipe_cleanup_after_deletion” to “true”. This means the node will remove any pipe-related data when a pipe is deleted (execution logs, acls, pipe offsets etc)
2018-03-05¶
Added the map-values function. It maps over the values of dictionaries and returns a list of mapped values.
2018-02-27¶
The combine DTL function now allows a single argument. This is useful when you want to turn an expression into a list of values. It is extra useful when you don’t quite know if the value is a list or not. Example:
["combine", "_S.x"]
2018-01-22¶
Added a content_disposition configuration property to be able to specify the type in the Content-Disposition HTTP response header to the HTTP endpoint sinks.
Added the possibility to specify the filename of the HTTP endpoint sinks as the last element of the URL (overrides any filename set in the configuration of the sink).
2018-01-16¶
Added the url-unquote function that URL unquotes any URL quoted characters in its input. See the related url-quote function.
2018-01-15¶
The RDF source and SDShare source now support the sort_lists property to automatically sort resulting properties containing lists (i.e. RDF statements having the same predicate). It is true by default.
2017-12-15¶
The JSON source now supports the page_size property.
2017-12-14¶
Added encrypt-pgp and decrypt-pgp DTL functions that can encrypt strings to OpenPGP messages using a PGP public key and decrypt these messages back to strings using a PGP private key and its associated password.
2017-12-12¶
Added encrypt-pki and decrypt-pki DTL functions that can asymmetrically encrypt strings to bytes and decrypt bytes to strings using a PKI public/private key-pair in DEM format (PKCSv8). The encryption is performed using RSA 2048 bits with sha-1 hashes and OAEP/MGF1 padding.
2017-11-23¶
Added Databrowser documentation.
2017-11-22¶
Added the Pattern match notification rule type.
2017-11-15¶
Added the intersects DTL function. This boolean function returns true if there is an overlap between the values in the two arguments.
The DTL compiler will now issue a warning if you try to perform two or more join expressions between the same two dataset aliases. It is there to notify you of possible cardinality issues and to tell you about the tuples function, which may be used to avoid cardinality issues.
When there are two or more join expressions between the same two dataset aliases only the first one is treated as a join expression; the rest of them are equality comparisons. One can use the tuples function to combine them into one big join expression at the cost of composite indexes being used.
Warning
Note that the eq function serves a dual purpose. It can be used both for join expressions and for equality comparisons. These two are different in that a join uses intersection (similar to the intersects function) and the equality comparison is an exact match. Use the intersects function if you want to check for intersection/overlap instead of an exact match.
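The difference between an overlap check and an exact match can be sketched with plain Python lists. The helpers below are hypothetical stand-ins, and treating a non-list value as a single-element list is an assumption made for the illustration.

```python
def as_list(value):
    """Wrap a non-list value in a list (assumption for this sketch)."""
    return value if isinstance(value, list) else [value]


def intersects(left, right):
    """True if at least one value occurs in both arguments."""
    right_values = as_list(right)
    return any(item in right_values for item in as_list(left))


def exact_eq(left, right):
    """True only if both arguments are the same value."""
    return left == right


a, b = ["x", "y"], ["y", "z"]
print(intersects(a, b), exact_eq(a, b))
```

Here the two lists overlap on "y", so the overlap check succeeds while the exact comparison does not.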
2017-11-08¶
The JSON push sink now supports customizable HTTP headers via a
headers
property.
2017-10-12¶
Documented the JSON Pull Protocol.
2017-10-09¶
If a pipe is running and the pipe-config is modified, the pipe will no longer be stopped. Instead, an “An old version of the pipe is still running” warning will be displayed, and it is up to the user whether to stop the running pipe.
2017-09-06¶
Improved and expanded documentation on namespaced identifiers and the features related to it.
Moved the deprecations to a separate document.
2017-09-05¶
Added a
track_dead_letters
option to the pump configuration. If set to true, it will delete “dead” entities from the dead letter dataset if a later version of the entity is successfully written to the sink. Note that using this option incurs a performance cost, so use it with care.
2017-08-23¶
It is now possible to specify
track-dependencies
on all the HOPS_SPECs in a specific hops DTL function. This change was made so that one can disable tracking for any of the HOPS_SPECs, not just the last one.
2017-08-16¶
The json-parse and json-transit-parse DTL functions now accept an optional default value expression. The default value expression is used when the input value is not valid JSON.
2017-08-08¶
The datetime-parse and datetime-format DTL functions now accept an optional timezone argument. This makes it possible to parse datetime strings and format datetime values in specific timezones.
2017-06-29¶
When a pipe is reset, the pipe’s retry queue is now also reset.
Bug fix: It is now possible to interrupt pumps that are performing retries.
Indexing of datasets changed so that each dataset is indexed for a maximum of five minutes in each iteration. This prevents some datasets from being blocked from indexing when there are other large datasets being indexed.
2017-06-26¶
Added the enumerate DTL function that can be used to enumerate values, i.e. combine values with an enumeration count.
Added the json-parse and json-transit-parse DTL functions.
2017-06-23¶
Added a conditional transform. This works the same way as conditional sinks and sources.
2017-06-20¶
Added functionality for preventing all pipes from automatically running (useful in some debugging scenarios). See the Low level debugging page for details.
2017-06-16¶
Added a
is_sorted
property to the RDF source to indicate that the input data is sorted on subject, enabling the source to avoid loading the entire file into memory. Note that it only works for nt
(NTriples) format files without blank nodes.
2017-06-12¶
Added a
write_retry_delay
property to pipe pumps. This is used in conjunction with max_consecutive_write_errors
when the system the pipe is writing to is known to be sporadically (non-transiently) unavailable. See the Pump section for details.
2017-06-08¶
The Security document now contains a description of users, roles and permissions in Sesam.
2017-05-31¶
Added support for bulk operations in the SQL sink. Bulk operations are currently only supported for the MSSQL and Microsoft Azure SQL Data Warehouse systems.
2017-05-29¶
Added the
indexes
property to the dataset sink. If set to "$ids"
then an index will be maintained for the $ids
property. This index will then be used by the dataset browser to look up entities both by _id and $ids.
The default value of the
max_depth
property in hops has been changed from null
to 10. This means that the default is to stop the recursion at level 10.
2017-05-26¶
The JSON push protocol has been simplified to make it easier to write receivers. It will now always send the entities as an array, even if it contains just a single object. The JSON push sink has been updated to reflect this. If you need single-object JSON POST/PUT operations, you should use the REST sink instead.
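The "always an array" rule can be sketched in a few lines of Python. The helper names below are hypothetical and are not part of Sesam or of the JSON push protocol itself; the sketch only shows a sender wrapping a single entity in a list and a receiver that therefore needs to handle one shape.

```python
import json


def encode_push_body(entities):
    """Serialize entities as a JSON array, wrapping a single entity in a list."""
    if isinstance(entities, dict):
        entities = [entities]
    return json.dumps(list(entities))


def decode_push_body(body):
    """Receiver side: the body is expected to be a JSON array of entities."""
    entities = json.loads(body)
    if not isinstance(entities, list):
        raise ValueError("expected a JSON array of entities")
    return entities


single = encode_push_body({"_id": "a"})
print(single)  # [{"_id": "a"}]
print(decode_push_body(single))
```

Because the body is always an array, the receiver does not need a separate code path for single-object requests.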
Systems now support environment variables in their config like pipes do.
2017-04-28¶
The
equality
property on the merge
source is now optional.
2017-04-24¶
Changed the default value of the “schedule_interval” pump configuration property. Before, the default value was 30 seconds for all pipes. The new default value for pipes with a dataset source and a dataset sink is 30 seconds +/- 1.5 seconds. For all other pipes, the default is 900 seconds +/- 45 seconds. (The
+/-
part helps stagger the start time of the pipes, so that we don’t get lots of pipes starting at the same instant.)
Added a warning in the GUI for non-internal pipes that don’t have a “schedule_interval” or a “cron_expression” attribute set.
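The staggered schedule_interval defaults amount to a base value plus or minus 5 percent (1.5 of 30, 45 of 900). The Python sketch below illustrates that arithmetic. The function name, the uniform distribution, and the reading that the 30-second default applies to dataset-to-dataset pipes are all assumptions; the entry does not say how the offset is actually drawn.

```python
import random


def default_schedule_interval(is_dataset_to_dataset, rng=random):
    """Return a default interval in seconds with a +/- 5% random offset."""
    base = 30.0 if is_dataset_to_dataset else 900.0
    jitter = base * 0.05  # 1.5 s for 30 s, 45 s for 900 s
    # Uniform jitter is an assumption of this sketch.
    return base + rng.uniform(-jitter, jitter)


print(default_schedule_interval(True))   # somewhere in 28.5 .. 31.5
print(default_schedule_interval(False))  # somewhere in 855 .. 945
```

With many pipes created at once, the random offset spreads their start times over a small window instead of firing them all in the same instant.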
2017-03-30¶
Extended all systems to accept a new property
worker_threads
that limits the number of concurrent pipes that can run against a particular system. The default value is 10. For inbound pipes the source system is used and for outbound pipes the sink system is used. For internal pipes (i.e. dataset to dataset pipes or receiver/publisher endpoints), the pool has 50 worker threads.
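The effect of a per-system concurrency limit can be sketched with a bounded semaphore. This is only an illustration of the idea, not how Sesam schedules pipes; SystemPool and fake_pipe_run are hypothetical names.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class SystemPool:
    """Allow at most `worker_threads` jobs to run against one system at a time."""

    def __init__(self, worker_threads=10):
        self._slots = threading.BoundedSemaphore(worker_threads)
        self._lock = threading.Lock()
        self.running = 0
        self.peak = 0  # highest number of jobs observed running at once

    def run(self, pipe_job):
        with self._slots:  # blocks while all slots are taken
            with self._lock:
                self.running += 1
                self.peak = max(self.peak, self.running)
            try:
                return pipe_job()
            finally:
                with self._lock:
                    self.running -= 1


def fake_pipe_run(n):
    time.sleep(0.01)  # stand-in for real work against the system
    return n


pool = SystemPool(worker_threads=2)
with ThreadPoolExecutor(max_workers=8) as executor:
    results = list(executor.map(lambda n: pool.run(lambda: fake_pipe_run(n)), range(8)))
print(results, pool.peak)
```

Eight jobs are submitted at once, but the semaphore ensures that no more than two of them ever run concurrently.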
2017-03-24¶
Extended the URL system and REST system to accept default custom request headers using the
headers
property. Also fixed the REST system schema to reflect authentication options and the jwt_token
property.
2017-03-16¶
The JSON Push Protocol document now contains examples of how to use
curl
to perform incremental and full syncs.
2017-03-15¶
Added the _R variable, which can be used to refer to the root context in a DTL transform.
2017-03-14¶
The
base_url
property of the URL system and REST system has been deprecated. It has been superseded by the url_pattern
property.
2017-03-09¶
Added the is-changed DTL function that can be used to compare data from the current and the previous version of the source entity.
2017-03-02¶
Added a conditional source and conditional sink that can pick from a list of actual candidates, typically controlled by an environment variable.
2017-03-01¶
Added a substring DTL function that returns a substring of another string given a start and end index.
2017-02-28¶
2017-02-20¶
Added
url_pattern
property to the URL system. This property gives you more control over how absolute URLs are produced. It can be used instead of the base_url
property.
2017-02-14¶
Added a
jwt
authentication scheme and jwt_token
property to the URL system.
2017-02-06¶
Added
text_body_template
and text_body_template_property
properties to the Email message sink. Use these to explicitly construct a plain-text version of your messages if sending multi-part messages.
2017-02-03¶
For security reasons, the Mail and SMS sinks no longer support file-based templates. Note that this is a non-backwards compatible change. You can use environment variables and upload your existing template files using the environment variable API or the corresponding Management Studio form.
2017-02-01¶
Datasets are now scheduled for automatic compaction once every 24 hours. The default is to keep the last 2 versions up until the current time. It is possible to customize the automatic compaction. See documentation on compaction for more information.
2017-01-26¶
The SQL source no longer includes columns with null values by default. You can include them by setting the
preserve_null_values
property of the SQL source to true. Note that this is a change from the previous default behaviour.
The CSV source no longer includes empty string values by default. You can include these by setting the CSV source property
preserve_empty_strings
to true. Note that this is a change in the default behaviour.
2017-01-23¶
The
dict
function now takes zero, one or an even number of arguments. If zero arguments are given, an empty dict is returned. If an even number of arguments is given, a new dict is returned with each pair of arguments as key and value. The latter is convenient for easy construction of dicts.
The transform functions add and default now take an expression in their first argument. This means that the properties can be dynamic and that there can be more than one. rename now takes dynamic arguments in the first and second positions.
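The zero-argument and even-argument forms of dict can be mirrored in Python as a rough sketch. The function name is hypothetical, and the one-argument form is deliberately left out because this entry does not describe its behaviour.

```python
def dict_from_args(*args):
    """Build a dict from zero arguments or from alternating key/value arguments."""
    if not args:
        return {}
    if len(args) % 2 != 0:
        # The one-argument form is not described here, so it is not modelled.
        raise ValueError("this sketch only models zero or an even number of arguments")
    return {args[i]: args[i + 1] for i in range(0, len(args), 2)}


print(dict_from_args())                # {}
print(dict_from_args("a", 1, "b", 2))  # {'a': 1, 'b': 2}
```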
2017-01-11¶
Documented the
pool_recycle
option on SQL systems and changed its default from -1 (no recycling) to 1800 (30 minutes).
2017-01-06¶
Added the merge source. This is a data source that is able to infer the sameness of entities across multiple datasets.
2017-01-04¶
Added an
unhandled_template_variable_replacement
property to the Email Message sink.
2016-12-20¶
Added a
uuid
DTL function. It takes no parameters and returns a UUID object (type 4).
2016-12-19¶
Added a
disable_set_last_seen
property to the Pipe properties. If set to true, it will not be possible to set or reset the last seen
bookmark on the pipe using the API (i.e. protecting it from accidental changes by principals with write permission on the pipe).
2016-12-15¶
Added a
read_retry_delay
property to pipe pumps. This is used in conjunction with max_read_retries
when the source is known to be sporadically (non-transiently) unavailable. See the Pump section for details.
2016-12-07¶
The documentation on cron expressions now makes it clear that they are evaluated in the UTC timezone.
2016-12-06¶
The concat DTL function now takes a variable number of arguments. This avoids constructing unnecessary lists.
2016-11-30¶
The url-quote DTL function now takes an optional
SAFE_CHARS
argument. This is especially useful when you don’t want to quote the /
character.
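Python's standard library has a comparable mechanism, which may help illustrate what a safe-characters argument does. The sketch below uses urllib.parse.quote and its safe parameter; it is an analogy only and says nothing about how the DTL function is implemented. The sample path is made up.

```python
from urllib.parse import quote

path = "reports/2016 q4.csv"  # made-up sample value

fully_quoted = quote(path, safe="")      # "/" is percent-encoded as well
slash_preserved = quote(path, safe="/")  # "/" is left untouched

print(fully_quoted)     # reports%2F2016%20q4.csv
print(slash_preserved)  # reports/2016%20q4.csv
```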
2016-11-22¶
The section on Continuation Support has been extended. Each source now has a Continuation support table that shows the source’s support for continuations.
2016-11-09¶
Added the json and json-transit DTL functions.
The group-by DTL function has been changed to always return string keys. The string keys are JSON transit encoded (the same type of string that the json-transit function produces). The reason is that the entity data model (and JSON) only supports string keys.
group-by
has also gotten an optional STRING_FUNCTION argument which lets you specify a custom function to create the string keys.
The sorted, sorted-descending, min and max DTL functions have been updated to support mixed type ordering.
2016-11-07¶
Added the microservice system (Experimental).
2016-11-03¶
Added the
filename
property to the HTTP endpoint sink, XML endpoint sink and CSV endpoint sink. This property provides a hint to HTTP clients on what filename to use when downloading data (via the Content-Disposition
header property).
2016-10-18¶
Added the Embedded source. This is a data source that lets you embed data inside the configuration of the source. This is convenient when you have a small and static dataset.
2016-10-17¶
Added the XML transform and XML endpoint sink. These can be used to generate XML documents inline in entities or published to external consumers, respectively.
2016-10-13¶
Changed the CSV endpoint sink to not output deleted entities by default. Added a new skip-deleted-entities config parameter that can be set to
false
if one wants deleted entities to appear in the CSV output.
2016-10-04¶
Reworked DTL math functions to reflect that
float
is an allowed type in entities. If the function parameters are of mixed types, the result will be coerced to the type that is the most precise, i.e. float+decimal=decimal, int*float=float, int/div=decimal and so on. Note that this is a change in behaviour: entities that previously only had decimal
values after using DTL math functions on input of type float may now end up with values that are floats instead. Use the DTL decimal
cast function to coerce the result to decimal
if this is important to the application.
Added
is-float
and float
DTL functions. Changed the is-decimal
function so it no longer returns true
if the argument is a float. You will now have to add both an is-float
and an is-decimal
in an or
clause to test for both types.
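The "coerce to the most precise type" rule for mixed-type math (float+decimal=decimal, int*float=float) can be sketched in Python. The ranking int < float < decimal is inferred from those two examples, and the float-to-decimal conversion via str() is an assumption of this sketch; neither is taken from the DTL implementation.

```python
from decimal import Decimal

# Precision ranking assumed from the examples: int < float < decimal.
# Only these three types are modelled.
_RANK = {int: 0, float: 1, Decimal: 2}


def _convert(value, target):
    if isinstance(value, target):
        return value
    if target is Decimal:
        # Go through str() so 0.1 becomes Decimal("0.1") rather than its binary expansion.
        return Decimal(str(value))
    return target(value)


def coerce_pair(a, b):
    """Convert both operands to whichever of their types ranks as most precise."""
    target = max(type(a), type(b), key=_RANK.__getitem__)
    return _convert(a, target), _convert(b, target)


def add(a, b):
    x, y = coerce_pair(a, b)
    return x + y


def multiply(a, b):
    x, y = coerce_pair(a, b)
    return x * y


print(repr(add(1.5, Decimal("2"))))  # Decimal('3.5')
print(repr(multiply(2, 1.5)))        # 3.0
```

Plain Python would raise a TypeError for float + Decimal, which is why the sketch converts both operands explicitly before applying the operator.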
2016-09-28¶
Added Elasticsearch support, which includes a system and a sink.
Added the
commit_at_end
property to the Solr sink.
Moved the
commit_within
property from the Solr system to the Solr sink. The reason is that the commit rate is really specific to how and where it is used. This change is backward compatible, as the default value is taken from the system. It is recommended to update the configuration files accordingly.
2016-09-28¶
Fixed the documentation for the merge DTL transform; it mistakenly stated that the merge transformation would not overwrite existing attributes in the target entity.
Updated the /api/config “GET” endpoint to format the JSON in a more human-readable way.
2016-09-22¶
Added index inspection on datasets.
Added new analyze-dtl operation.
Fixed automatic index creation for the run-dtl operation.
Linked to the changelog from the Management Studio.
2016-09-21¶
Added the datetime-shift DTL function.
Added support for timezones to the datetime-parse DTL function.
Added missing sink and source prototypes in the “Edit pipe” GUI in Management Studio.
Fixed a bug that prevented users from adding a system in Management Studio.
2016-09-20¶
Fixed missing validation in the /api/pipes “POST” endpoint and added support for the “force” parameter.
Fixed missing validation in the /api/pipes/{pipe_id}/config “PUT” endpoint and added support for the “force” parameter.
Fixed missing validation in the /api/systems “POST” endpoint and added support for the “force” parameter.
Fixed missing validation in the /api/systems/{system_id}/config “PUT” endpoint and added support for the “force” parameter.