Low-level debugging¶
This page contains tips and tricks for doing various low-level debugging tasks. This information is only relevant for system administrators who are hosting a sesam instance, since most of the techniques require access to the filesystem of the box that the sesam instance is running on.
Changing the loglevel via the “/api/logs” api endpoint¶
The “/api/logs” api endpoint can be used to change the loglevel that a worker-node uses when writing to its “node.log” file.
Example of how to enable DEBUG-level logging from the `requests_oauthlib` library in a worker-node:
$ export JWT=<jwt copied from a browser network log>
$ curl 'https://datahub-e0a07f91.sesam.cloud/_/a69ad5e5d08b/api/logs' -dloglevel=DEBUG -dlogger_name=requests_oauthlib -H "authorization: bearer $JWT"
{
"msg": "Log level updated"
}
$
Note that this request must be made directly to the worker-node; it does not work against the Sesam subscription service API.
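To turn the extra logging off again, the same endpoint can presumably be called with a less verbose level. A sketch, assuming INFO is the normal level:
# same endpoint and parameters as above, just with loglevel=INFO
curl 'https://datahub-e0a07f91.sesam.cloud/_/a69ad5e5d08b/api/logs' -dloglevel=INFO -dlogger_name=requests_oauthlib -H "authorization: bearer $JWT"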
Profiling all Sesam API calls¶
Start the sesam service with the "--enable-api-profiler" command-line option. This will make the sesam service log all API calls to files in the "<LOGFILE_FOLDER>/api_profiler" folder.
This is mainly useful for development and debugging, since it requires you to restart the sesam service with the command-line option set and to have access to files in the <LOGFILE_FOLDER>. This is not possible with a Sesam service that is running in the Sesam cloud.
Also, this option generates a lot of log-files (which are not automatically cleaned up), so on a production box it would eventually fill up the disk.
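Since the files are not cleaned up automatically, it can be worth removing old ones manually from time to time. A sketch, assuming <LOGFILE_FOLDER> is /sesam/logs/sesam-node as in the example later on this page:
# list the generated API profile files, newest first (example path)
ls -lt /sesam/logs/sesam-node/api_profiler
# delete profile files older than 7 days so they don't fill up the disk
find /sesam/logs/sesam-node/api_profiler -type f -mtime +7 -delete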
Profiling one specific Sesam API call¶
Adding the special "SESAM-API-PROFILER" header to a Sesam API request will make the Sesam instance return a file containing the profile information for the API call. The "real" response from the API call is discarded.
A header-value of "pstats" will return a file that can be loaded with Python's pstats module.
A header-value of “calltree” will return a file that can be opened in KCachegrind.
Examples using curl:
curl -v -O -J -H "SESAM-API-PROFILER:calltree" http://localhost:9042/api/pipes
curl -v -O -J -H "SESAM-API-PROFILER:pstats" http://localhost:9042/api/pipes
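The downloaded files can then be inspected locally. A sketch, using illustrative filenames (the actual names are whatever curl's -O -J options saved) and assuming KCachegrind is installed:
# open the pstats profile in Python's interactive statistics browser ("pipes.pstats" is an illustrative filename)
python3 -m pstats pipes.pstats
# ...or print the 20 most expensive calls non-interactively
python3 -c "import pstats; pstats.Stats('pipes.pstats').sort_stats('cumulative').print_stats(20)"
# the calltree variant can be opened in KCachegrind
kcachegrind pipes.calltree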
Profiling a pump run¶
Adding a special "profiler" request parameter to the Sesam API request will make the Sesam instance write files containing the profile information for the pump run. The profile files are written to the logs folder of the instance with ".pstat" and ".calltree" extensions. The filenames also contain the id of the pipe and the start timestamp.
Examples using curl:
curl -s --data "profiler=true" --data "operation=start" http://localhost:9042/api/pipes/foo/pump
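When the pump run has finished, the resulting profile files can be located in the instance's logs folder. A sketch; the path below matches the example later on this page and may differ on your installation, and the exact filenames contain the pipe id and the start timestamp:
# example logs folder path and example pipe id "foo"
ls -lt /sesam/logs/sesam-node/*foo*.pstat /sesam/logs/sesam-node/*foo*.calltree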
Getting a stacktrace of all the threads in the node¶
If the sesam node seems to be frozen or otherwise is behaving oddly, it can be useful to look at the stacktraces of all the currently running threads in the node. There are two ways of retrieving this information:
Using the “/api/status/stacktrace” api endpoint¶
This is an "unofficial" endpoint that returns an HTML file. If the node has authentication enabled (which is the usual case), the endpoint requires a JWT that grants the "group:Admin" role.
Example:
curl -o stacktrace.html http://localhost:9042/api/status/stacktrace
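If the node has authentication enabled, include a JWT that grants the "group:Admin" role, in the same way as for the "/api/logs" endpoint. A sketch; the worker-node URL below is the same illustrative one used earlier on this page:
# illustrative worker-node URL, as in the /api/logs example
export JWT=<jwt copied from a browser network log>
curl -H "authorization: bearer $JWT" -o stacktrace.html https://datahub-e0a07f91.sesam.cloud/_/a69ad5e5d08b/api/status/stacktrace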
Sending the SIGUSR1 signal to the sesam node process¶
This will create a new "stacktrace.html" file in the "logs" folder on the server. This requires login access to the server running the node, both to send the signal and to look at the resulting file. The advantage of this method compared to the API endpoint is that it works even if the API itself is broken. If the signal is sent more than once, any existing "stacktrace.html" file is renamed to "stacktrace.html.1" (and any existing "stacktrace.html.1" is renamed to "stacktrace.html.2", and so on; this is similar to the way the "node.log" file is rotated).
Example:
knut.johannessen@myserver:/sesam/logs/sesam-node$ ls -l
total 33104
-rwxrwxrwx 1 777   sesam  1548906 Mar 21 12:43 node_access.log
-rwxrwxrwx 1 777   sesam   113395 Mar 21 12:41 node_config.log
-rwxrwxrwx 1 777   sesam    35058 Mar 21 12:43 node_health.log
-rwxrwxrwx 1 777   sesam  1401190 Mar 21 12:43 node.log
-rwxrwxrwx 1 777   sesam        0 Mar 17 17:49 node_memory_profile.log
-rwxrwxrwx 1 777   sesam   136744 Mar 21 12:43 node_metrics.log
-rwxrwxrwx 1 777   sesam 30627178 Mar 21 12:43 node_pipe_execution.log

# First, find the process identifier of the node you are interested in.
knut.johannessen@myserver:/sesam/logs/sesam-node$ docker inspect testsesamportal_storage-sesam-node_1 | grep '"Pid"'
            "Pid": 6280,

# Send the SIGUSR1 signal to the node-process
knut.johannessen@myserver:/sesam/logs/sesam-node$ sudo kill -SIGUSR1 6280

# A new "stacktrace.html" file is created in the logs folder:
knut.johannessen@myserver:/sesam/logs/sesam-node$ ls -l
total 33476
-rwxrwxrwx 1 777   sesam  1548906 Mar 21 12:43 node_access.log
-rwxrwxrwx 1 777   sesam   113395 Mar 21 12:41 node_config.log
-rwxrwxrwx 1 777   sesam    35058 Mar 21 12:43 node_health.log
-rwxrwxrwx 1 777   sesam  1406116 Mar 21 12:44 node.log
-rwxrwxrwx 1 777   sesam        0 Mar 17 17:49 node_memory_profile.log
-rwxrwxrwx 1 777   sesam   136744 Mar 21 12:43 node_metrics.log
-rwxrwxrwx 1 777   sesam 30723086 Mar 21 12:44 node_pipe_execution.log
-rw-r--r-- 1 sesam sesam   279164 Mar 21 12:43 stacktrace.html

# Sending a second signal will create a new "stacktrace.html" file and rotate the existing file:
knut.johannessen@myserver:/sesam/logs/sesam-node$ sudo kill -SIGUSR1 6280
knut.johannessen@myserver:/sesam/logs/sesam-node$ ls -l
total 34148
-rwxrwxrwx 1 777   sesam  1549330 Mar 21 12:47 node_access.log
-rwxrwxrwx 1 777   sesam   113395 Mar 21 12:41 node_config.log
-rwxrwxrwx 1 777   sesam    35702 Mar 21 12:47 node_health.log
-rwxrwxrwx 1 777   sesam  1424737 Mar 21 12:47 node.log
-rwxrwxrwx 1 777   sesam        0 Mar 17 17:49 node_memory_profile.log
-rwxrwxrwx 1 777   sesam   139364 Mar 21 12:47 node_metrics.log
-rwxrwxrwx 1 777   sesam 31109009 Mar 21 12:47 node_pipe_execution.log
-rw-r--r-- 1 sesam sesam   279164 Mar 21 12:47 stacktrace.html
-rw-r--r-- 1 sesam sesam   279164 Mar 21 12:43 stacktrace.html.1
knut.johannessen@myserver:/sesam/logs/sesam-node$
Checking if the health-checker thread in the node has created a stacktrace¶
The node will periodically call its own “/api/health” endpoint and log the results to the “node_health.log” file. If the call to the “/api/health” endpoint fails, the health-checker thread will create a new “stacktrace.html”-file, just as if a SIGUSR1 signal had been sent to the node. If the node is misbehaving it is therefore worth having a look in the “logs” folder to check if there are any recent “stacktrace.html” files.
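A quick way to check; the logs folder path below matches the example in the previous section and may differ on your server:
# list any stacktrace files, newest first (example path)
ls -lt /sesam/logs/sesam-node/stacktrace.html*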
Preventing pipes from automatically running¶
Most pipes will be configured to run automatically at certain intervals. In some cases we want to prevent all such pipes from being started automatically.
Examples of cases where this functionality can be useful:
We suspect that one or more pipes are using a lot of memory, but it is hard to isolate the problem because lots of pipes are being started by the taskmanager.
The sesam node crashes on or soon after startup because of a problem with a pipe.
Using the "Instance configuration" GUI¶
This is the easiest way of disabling the pump scheduler.
Log in to https://portal.sesam.io
Navigate to the subscription in question.
Go to the “Settings” => “Instance configuration” page.
Use the “Insert configuration” dropdown to add a “TaskManager settings” item and set the “disable_pump_scheduler” property to “true”. The resulting configuration should look something like this:
{
    "_id": "node",
    "type": "metadata",
    "task_manager": {
        "disable_pump_scheduler": true
    }
}
Upload a full node-config with a node metadata entity¶
If you want to upload a full sesam node configuration and prevent any of the pipes in the config from starting automatically, you can add this entity in the configuration you are uploading:
{
    "_id": "node",
    "type": "metadata",
    "task_manager": {
        "disable_pump_scheduler": true
    }
}
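Before uploading, it can be useful to verify that the metadata entity is actually present in the configuration. A sketch, assuming the configuration is the usual JSON array of configuration entities, that it is stored in a file called node-config.json (an example name), and that jq is available:
# print the metadata entity so the disable_pump_scheduler setting can be checked
jq '.[] | select(._id == "node")' node-config.json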
Writing a “/sesam/node/data/startup_options.json” file¶
If the GUI is not accessible for some reason (for example if the node crashes on start-up), it is also possible to disable the pump scheduler by creating a file in the sesam node's "data" folder.
The file must be called "startup_options.json". It must be valid JSON and look like this:
{
    "task_manager": {
        "disable_pump_scheduler": true
    }
}
The sesam node must be restarted in order for the file to take effect (use docker restart <node-container-name>).
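A sketch of the whole procedure; the container name "sesam-node" below is just an example:
# write the startup options file into the node's data folder
cat > /sesam/node/data/startup_options.json <<'EOF'
{
    "task_manager": {
        "disable_pump_scheduler": true
    }
}
EOF
# restart the node container so the file takes effect (use your own container name)
docker restart sesam-node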
Examining memory-usage¶
The sesam node has a few service API endpoints that indicate what it is using memory for.
/api/status/heap¶
Returns a plain-text document with information about what the memory is used for. This is the most high-level memory debug endpoint, and usually the most useful one. Example:
Process memory information: pmem(rss=664109056, vms=3156135936, shared=71004160, text=2220032, lib=0, data=3010367488, dirty=0)
Memory use summary:
rocksdb.kMemTableTotal : 82692096
rocksdb.kTableReadersTotal : 3266947
rocksdb.block-cache-usage : 34275312
rocksdb.block-cache-pinned-usage : 935104
ColumnFamily memory overhead : 86988000
MemoryCachedRocksDbDataset cache size : 17352128
RocksDbDataset bitsets size : 1092544
EntityBatch pool size : 9244266
Node pipes size : 14972902
Node systems size : 242230
Node datasets size : 161918
PyMalloc cached bytes : 24838224
je_malloc fragmentation (stats.active-stats.allocated) : 27581760
je_malloc cached bytes and metadata (stats.resident-stats.active): 26914816
**Total accounted for memory** : 329623143
**Unaccounted for memory (rss-total_accounted_for) : 334485913
Top dataset bitsets sizes:
wikidata-organisation-organisation-enrich : 4672
wikidata-organisation-collect : 4672
tripletex-account1-000002-tripletex-productunit-transform-split: 4672
tripletex-account1-000002-tripletex-productunit-transform-emit : 4672
tripletex-account1-000001-tripletex-productunit-transform-split: 4672
tripletex-account1-000001-tripletex-productunit-transform-emit : 4672
tenant000002-global-classification-enhance-merge : 4672
tenant000001-global-classification-enhance-merge : 4672
tenant-auth : 4672
system:pump:wikidata-organisation-organisation-enrich : 4672
Top column family size-all-mem-tables:
n0 (NodeState) : 6293504
l760 (dataset 'system:pump:tripletex-account1-000002-tripletex-vattype-classification-enrich') : 1050624
l748 (dataset 'system:pump:tripletex-account1-000002-tripletex-supplier-organisation-enrich') : 1050624
l746 (dataset 'system:pump:tripletex-account1-000002-tripletex-supplier-location-enrich') : 1050624
l738 (dataset 'system:pump:tripletex-account1-000002-tripletex-supplier-collect') : 1050624
l722 (dataset 'system:pump:tripletex-account1-000002-tripletex-projectcategory-classification-enrich'): 1050624
l714 (dataset 'system:pump:tripletex-account1-000002-tripletex-project-task-enrich') : 1050624
l702 (dataset 'system:pump:tripletex-account1-000002-tripletex-project-collect') : 1050624
l686 (dataset 'system:pump:tripletex-account1-000002-tripletex-productunit-classification-enrich') : 1050624
l68 (dataset 'system:pump:wikidata-organisation-organisation-enrich') : 1050624
Top column family estimate-table-readers-mem (i.e. index and filter blocks):
l68 (dataset 'system:pump:wikidata-organisation-organisation-enrich') : 29708
l23 (dataset 'system:backup') : 25464
l6 (dataset 'system:pump:system:update-storage-node-pipe-entity-type') : 25452
l52 (dataset 'system:pump:global-organisation') : 23342
l67 (dataset 'wikidata-organisation-organisation-enrich') : 21220
l422 (dataset 'wikidata-organisation-collect') : 19107
l51 (dataset 'global-organisation') : 19098
l335 (dataset 'system:pump:tripletex-account1-000001-tripletex-productunit-classification-enrich') : 16984
l325 (dataset 'system:pump:tripletex-account1-000001-tripletex-productgrouprelation-product-enrich'): 16984
l245 (dataset 'system:pump:tripletex-account1-000001-tripletex-order-agreement-enrich') : 16984
Python memory summary:
types | # objects | total size
============================================= | =========== | ============
dict | 222350 | 57.27 MB
str | 530396 | 46.13 MB
list | 309512 | 27.02 MB
code | 41449 | 7.37 MB
set | 10204 | 5.28 MB
pytrie.Node | 113661 | 5.20 MB
type | 5192 | 4.88 MB
tuple | 72466 | 4.40 MB
collections.OrderedDict | 4157 | 1.72 MB
weakref | 12841 | 902.88 KB
collections.deque | 1114 | 680.39 KB
int | 20285 | 588.96 KB
abc.ABCMeta | 532 | 562.54 KB
datetime.datetime | 10961 | 513.80 KB
builtin_function_or_method | 7035 | 494.65 KB
frozenset | 1854 | 490.83 KB
function (__init__) | 3301 | 438.41 KB
property | 5573 | 391.85 KB
getset_descriptor | 5772 | 360.75 KB
openpyxl.descriptors.MetaSerialisable | 334 | 347.05 KB
cell | 8846 | 345.55 KB
re.Pattern | 560 | 280.00 KB
lake.store.ext_types.NI | 4290 | 201.09 KB
function (__repr__) | 1248 | 165.75 KB
method | 2526 | 157.88 KB
collections.defaultdict | 259 | 141.70 KB
method_descriptor | 1834 | 128.95 KB
wrapper_descriptor | 1804 | 126.84 KB
function (<lambda>) | 870 | 115.55 KB
float | 4745 | 111.21 KB
function (__eq__) | 836 | 111.03 KB
_frozen_importlib.ModuleSpec | 2349 | 110.11 KB
cachetools.keys._HashedTuple | 1989 | 109.25 KB
lake.store.rocksdb.SegmentedBitset | 2301 | 107.86 KB
_frozen_importlib_external.SourceFileLoader | 2256 | 105.75 KB
enum.EnumMeta | 96 | 100.55 KB
bytes | 2565 | 100.25 KB
function (__ne__) | 726 | 96.42 KB
_thread.RLock | 2039 | 95.58 KB
lake.store.rocksdb.ColumnFamily | 1977 | 92.67 KB
lake.dtl.dtl_hops.CompiledHops | 1023 | 87.91 KB
function (to_dict) | 611 | 81.15 KB
function (to_str) | 605 | 80.35 KB
_thread.lock | 1882 | 73.52 KB
jinja2.nodes.NodeType | 68 | 70.66 KB
member_descriptor | 1021 | 63.81 KB
pgpy.types.MetaDispatchable | 61 | 63.38 KB
cython_function_or_method | 315 | 61.15 KB
lake.store.rocksdb.RocksDbEntityLog | 767 | 53.93 KB
lake.store.rocksdb.RocksDbEntityIndex | 767 | 53.93 KB
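Like the other status endpoints, this one can be fetched with plain curl. A minimal sketch, assuming the node is reachable on the local port used in the earlier examples:
# save the heap report to a file for later comparison
curl -o heap.txt http://localhost:9042/api/status/heap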
/api/status¶
Returns a JSON document that, among other things, includes memory-related output from the psutil package (https://pypi.org/project/psutil). Example:
{
"node_id": "singlenode",
"node_version": "1.0.230521.651",
"node_start_time": "2023-05-23T08:41:53Z",
"node_uptime": "1 hours, 59 minutes, 19 seconds",
"psutil.virtual_memory": {
"total": 50481627136,
"available": 30703407104,
"percent": 39.2,
"used": 18712219648,
"free": 1521696768,
"active": 9185828864,
"inactive": 36309233664,
"buffers": 2002059264,
"cached": 28245651456,
"shared": 542887936,
"slab": 2964918272
},
...
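A sketch of extracting just the memory-related part with curl and jq, assuming jq is installed and the node is reachable on the local port used in the earlier examples:
# print only the psutil virtual memory section of the status document
curl -s http://localhost:9042/api/status | jq '."psutil.virtual_memory"'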
/api/status/debugmallocstats¶
Returns the results of running the sys._debugmallocstats() function. Example:
Small block threshold = 512, in 32 size classes.
class size num pools blocks in use avail blocks
----- ---- --------- ------------- ------------
0 16 133 32277 1372
1 32 2711 310874 30712
...
30 496 139 1084 28
31 512 265 1839 16
# arenas allocated total = 1,784
# arenas reclaimed = 851
# arenas highwater mark = 1,054
# arenas allocated current = 933
933 arenas * 262144 bytes/arena = 244,580,352
# bytes in allocated blocks = 215,139,648
# bytes in available blocks = 14,324,112
2670 unused pools * 4096 bytes = 10,936,320
# bytes lost to pool headers = 2,738,016
# bytes lost to quantization = 1,442,256
# bytes lost to arena alignment = 0
Total = 244,580,352
13 free PyDictObjects * 48 bytes each = 624
31 free PyFloatObjects * 24 bytes each = 744
...
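As with the other status endpoints, the output can be saved to a file with curl. A sketch, assuming the local port used in the earlier examples:
# save the malloc statistics for offline inspection
curl -o debugmallocstats.txt http://localhost:9042/api/status/debugmallocstats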