Increased DB CPU usage

Incident Report for Pay & Connect

Postmortem

To relieve the strain on the DB server while the source of the issue was investigated, the DB was allocated additional CPU resources.

The added resources were sufficient to bring CPU usage back down to acceptable levels temporarily, but this morning (26/8) they were again stretched to their limits. The server first showed increased latency and eventually stopped serving requests.

At this point we increased the DB CPU resources further to immediately relieve the load on the server, and stepped up efforts to establish the root cause. We isolated a particularly slow, long-running query whose performance had degraded as the transaction table grew. We implemented a dramatic optimisation of the query and deployed an update soon after. Since the optimised query was deployed, the server performance metrics have improved enormously.
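
As a generic illustration of how a query can slow down as a table grows (not the actual fix, which is not detailed in this report), the sketch below uses an in-memory SQLite database. The `transactions` schema, the `account_id` filter, and the index name are all hypothetical. It compares the query plan before and after adding an index: without the index the database scans every row, so cost grows with table size; with it, only the matching rows are visited.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the human-readable steps of SQLite's plan for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the 4th column holds the plan detail text

conn = sqlite3.connect(":memory:")

# Hypothetical schema standing in for a large transaction table.
conn.execute(
    "CREATE TABLE transactions ("
    "id INTEGER PRIMARY KEY, account_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO transactions (account_id, amount) VALUES (?, ?)",
    [(i % 1000, 1.0) for i in range(50_000)],
)

# Hypothetical slow query: aggregate over one account's rows.
sql = "SELECT SUM(amount) FROM transactions WHERE account_id = ?"

# Without an index on account_id, every row must be scanned.
plan_before = query_plan(conn, sql, (42,))

# Assumed optimisation for this sketch: index the filtered column.
conn.execute(
    "CREATE INDEX idx_transactions_account ON transactions (account_id)"
)
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The first plan reports a full scan of `transactions`, while the second reports a search using `idx_transactions_account`. Other optimisations (rewriting the query, narrowing the date range, archiving old rows) could achieve a similar effect.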

Posted Aug 26, 2022 - 18:08 CAT

Resolved

An increase in DB CPU usage was detected, which caused increased latency throughout the system.

Posted Aug 25, 2022 - 17:30 CAT