CSV endpoint sink¶

This is a data sink that registers an HTTP publisher endpoint that one can get entities in CSV format from.

A pipe that references the CSV endpoint sink will not pump any entities. In practice this means that a pump is not configured for the pipe; the only way for entities to flow through the pipe is by retrieving them from the CSV endpoint using a client that supports the HTTP protocol.

It exposes the URLs:

URL
`http://localhost:9042/api/publishers/mypipe/csv`
`http://localhost:9042/api/publishers/mypipe/csv/some_filename.csv`

The exposed URL may support additional parameters such as since and limit - see the API reference for the full details.

Note that you can optionally specify the filename to use in the Content-Disposition header of the HTTP response as the last path element of the URL.

Prototype¶

{
    "type": "csv_endpoint",
    "columns": ["properties","to","use","as","columns"],
    "quoting": "all|minimal|non-numeric|none",
    "delimiter": ",",
    "doublequote": true,
    "include_header": true,
    "escapechar": null,
    "lineterminator": "\r\n",
    "quotechar": "\"",
    "encoding": "utf-8",
    "encode_error_strategy": "replacement-strategy-to-use",
    "skip-deleted-entities": true,
    "filename": "my_data.csv",
    "content_disposition": "attachment"
}

Properties¶

Property	Type	Description	Default	Req
`columns`	List<String>	A list of string keys to look up in the entity to construct the CSV columns. If `include_header` is set to `true` (which is the default), this list will also be included as the first line of the CSV file.		Yes
`quoting`	Enum<String>	A string from the set of “all”, “minimal”, “non-numeric” and “none” that describes how the fields of the CSV file will be quoted. A value of “all” means all fields will be quoted, even if they don’t contain the `quotechar` or `delimiter` characters. A value of “non-numeric” means all non-numeric values will be quoted. The “minimal” setting (the default) means only fields with contents that need to be quoted will be quoted. Finally, the `none` value means do not quote (note this can produce broken CSV files if there are values that have to be quoted).	`"minimal"`
`delimiter`	String	The character to use as field separator. It will also affect which fields will be quoted if the `quoting` setting is set to `minimal"` (which is the default). The default value is to use the comma (`","`) character.	`","`
`doublequote`	Boolean	Controls how instances of `quotechar` appearing inside a field should themselves be quoted. When set to `true` (the default), the character is doubled (repeated). When set to `false`, the `escapechar` property setting is used as a prefix to the `quotechar`. If `doublequoting` is set to `true` but `escapechar` is not set, the backward slash character (`\`) is used as prefix.	`true`
`include_header`	Boolean	Controls if the `columns` property should be included as the header of the CSV file produced.	`true`
`escapechar`	String	A one-character string used by the sink to escape `delimiter` characters in fields if `quoting` is set to `none` and the `quotechar` if `doublequote` is set to `false`. The default is `null` which disables escaping (except if `doublequote` is set to `true`, in which case the default is `\`).	`null`
`lineterminator`	String	A character sequence to use as the EOL marker in the CSV output. The default is carriage return plus linefeed (`"\r\n"`).	`"\r\n"`
`quotechar`	String	A one-character string that controls how to quote field values. The default is the double quote character. See `doublequote` and `escapechar` for related settings.	`"\""`
`byte_order_mark`	Boolean	If `true` the sink will emit a UTF-8 byte order mark (BOM) to the start of the file/stream. I should only be used in conjunction with a UTF-8 encoding.	`false`
`encoding`	String	Which encoding to use when converting the output to string values. The default is `utf-8`. See section 7.2.3 on this page for a list of valid values.	`"utf-8"`
`encode_error_strategy`	String enum	An enumeration of “ignore”, “replace”, “xmlcharrefreplace” and “backslashreplace” that tells the sink how to deal with illegal characters in the output data when the `encoding` property is different than `utf-8`. The default “backslashreplace” replaces the offending character(s) with backslash escaped unicode values (i.e. the `"ę"` character would be replaced with `"\u0119"` if it’s illegal for the chosen encoding). The “replace” strategy will use a special unicode “replacement character” for unicode encodings, see https://en.wikipedia.org/wiki/Specials_%28Unicode_block%29 for more details or simply the `"?"` character if a non-unicode encoding. The “xmlcharrefreplace” replacement strategy uses numerical xml character values on the form `"&#NNN;"`. The “ignore” strategy is the simplest and just skips any illegal characters entirely.	“backslashreplace”
`skip-deleted-entities`	Boolean	This can be set to `false` to make deleted entities appear in the CSV output. The default is that deleted entities does not appear. If you set this to `true` you will also most likely want to include the “_deleted” attribute in the `columns` list, so that rows that represents deleted entities can be recognized. (If you need to rename or reformat the “_deleted” attribute you can do that by adding a DTL transform to the pipe.)	`true`
`filename`	String	This property provides a hint to HTTP clients on what filename to use when downloading data (via the `Content-Disposition` header property). Note that this property is not entirely standardized yet, so to be compatible with most HTTP clients, the filename should be ASCII characters only. For the same reason, quotes or backward or forward slashes should be avoided. If this property is not set, the contents will be served inline.
`content_disposition`	String	This property provides a hint to HTTP clients how to render the file data. The valid values are `attachment` and `inline`. It is used in the `Content-Disposition` header and the behaviour is client specific.	`"attachment"`

Example configuration¶

The pipe configuration given below will expose the my-entities publisher endpoint and read the entities from the my-entities dataset, picking the _id, foo and bar properties as columns in the CSV file:

{
    "_id": "my-entities",
    "name": "My published csv endpoint",
    "type": "pipe",
    "sink": {
        "type": "csv_endpoint"
        "columns": ["_id", "foo", "bar", "zoo"],
        "filename": "my_data.csv"
    }
}

The data will be available at http://localhost:9042/api/publishers/my-entities/csv (or alternatively http://localhost:9042/api/publishers/my-entities/csv/some_other_filename.csv)

Conditional sink

Dataset sink