SQL source

The SQL database source is one of the most commonly used data sources. In short, it presents database relations (i.e. tables, views or queries) as a entity stream to Sesam.

The SQL source has several options, all of which are presented below with their default values:

Prototype

{
    "system": "id-of-system",
    "table": "name-of-table",
    "primary_key": ["list","of","key","names"],
    "query": "SQL query string",
    "updated_query": "SQL query string for 'since' support in queries",
    "updated_column": "column-name-for-since-support-in-tables",
    "whitelist": ["columns","to","include"],
    "blacklist": ["columns","to","exclude"],
    "fetch_size": 1000,
    "preserve_null_values": false,
    "schema": "default-schema-name-if-included"
}

Note

The SQL source system is purposefully read-only transactions, as to avoid side effects.

Column types

See the supported column types list for a overview of which RDBMS column types are supported and how they are mapped to Sesam types. Note that if your table or query property refer to relations with unsupported column types, you will either have to use the blacklist configuration property to ignore them, or write a custom query that coerces the non-supported column to a supported type.

Properties

Property

Type

Description

Default

Req

system

String

Must refer to a SQL system component by id. The role of this component is provide services like connection pooling and authentication for the data sources using it

Yes

table

String

If table is given, it must refer to a fully qualified table name in the database system, not including schema, which if needed must be set separately. The table and query properties are mutually exclusive with table used if both are present. TODO: are table names case sensitive?

Yes

primary_key

List<String> or String

The value of this property can be a single string with the name of the column that contains the primary key (PK) of the table or query, or a list of strings if it is a compound primary key. If the property is not set and the table property is used, the data source component will attempt to use table metadata to deduce the PK to use. In other words, you will have to set this property if the query property us used. Note that this property might be case sensitive.

query

List<String> or String

Must be a valid query in the dialect of the RDBMS represented by the system property. You will also have to configure the primary key(s) of the query in the primary_key property. Note: mutually exclusive with the table property with table taking precedence. If a list of strings is given, they will be converted to a single string by concatenation with the newline character.

updated_column

String

If the underlying relation contains information about updates, the data source is able to support since markers. You can provide the name of the column to use for such queries here. This must be a valid column name in the table or query result sets and it must be of a data type that supports larger or equal (“>=”) tests for the table case. Note that this property might be case sensitive.

updated_query

List<String> or String

If the query property is set, the since support must be expressed by a full query including any test needed. A single variable binding :since must be included somewhere in the query string - for example “select * from view_name v where v.updates >= :since”. If a list of strings is given, they will be converted to a single string by concatenation with the newline character.

Warning

The updated_query must cover the exact same ids as the update query. If not you run the risk of a full sync failing and then completing in incremental mode, which would mean that the full sync is not covering all ids and thus end up deleting ids not seen. Consider setting is_chronological_full to false on pipes to prevent the pipe offset from being set during a full sync.

schema

String

If a specific schema within a database is needed, you must provide its name in this property. Do not use schema names in the table property.

whitelist

List<String>

The names of the columns to include in the generated entities. If there is a blacklist also specified, the whitelist will be filtered against the contents of the blacklist.

blacklist

List<String>

The names of the columns to exclude from the generated entities. If there is a whitelist also specified, the blacklist operates on the values of the whitelist (and not the whole columnset).

preserve_null_values

Boolean

If set to true will include null values in the entities produces by this source. By default they are omitted.

False

fetch_size

Integer

The fetch size of the result sets (number of rows in a cursor fetch) to get from the database

1000

if_source_empty

Enum<String>

Determines the behaviour of the pipe when the SQL source does not return any entities. Normally, any previously synced entities will be deleted even if the pipe does not receive any entities from its source. If set to "fail", the pipe will automatically fail if the source returns no entities. This means that any previous entities in the pipe’s dataset are not deleted. If set to "accept", the pipe will not fail and any previously synced entities will be deleted.

The global default global_defaults.if_source_empty can be set for all pipes in the service metadata.

"accept"

Continuation support

See the section on continuation support for more information.

Property

Value

supports_since

false (Dynamic: true if updated_column set)

is_since_comparable

true (Default)

is_chronological

false (Dynamic: if table and updated_column set then defaults to true, if query then it can be set explicitly)

is_chronological_full

false (Dynamic: true if is_chronological is effectively true and this property is not explicity set to false)

If this property is set to false then a full run will not consider the source to be chronological even though it is chronological in incremental runs.

Note

In practice this avoids doing an order by when doing full runs, but at the cost of not saving pipe offsets and supporting incremental deletion tracking if it fails to complete.

We have seen SQL tables where only the latest rows have an value in the updated column. In that case it is not that useful to use order by and to save pipe offsets incrementally.

Example configuration

The outermost object would be your pipe configuration, which is omitted here for brevity:

Example with a single table:

{
    "source": {
        "type": "sql",
        "system": "Northwind",
        "table": "Customers"
    }
}

Example with a single table, where the primary key is in a column named table_id and the updated datestamp is in a column called updated. This enables us to switch on since support and as such run the SQL query "SELECT * FROM my_table WHERE updated >= :since":

{
    "source": {
        "type": "sql",
        "system": "my_system",
        "table": "my_table",
        "primary_key": "table_id",
        "updated_column": "updated"
    }
}

Example with custom query:

{
    "source": {
        "type": "sql",
        "system": "Northwind",
        "query": "select * from Customers",
        "primary_key": "CustomerID"
    }
}

Example with a custom query from a table called my_table where the primary key is in a column named table_id and the updated datestamp is in a column called updated. This enables us to switch on since support:

{
    "source": {
        "type": "sql",
        "system": "my_system",
        "query": "select * from my_table",
        "primary_key": "table_id",
        "updated_column": "updated",
        "updated_query": "select * from my_table where updated >= :since"
    }
}