Outage on systems due to update
Incident Report for Ymonitor
Resolved
Dear customers,

We carried out an infrastructure update on Ymonitor systems between 7 April 20:00 and 8 April 01:00. We did not announce scheduled maintenance for this update because testing it on our test and acceptance environments caused no service disruption. Nevertheless, an outage occurred on our production environment during the update. We apologize for any inconvenience this may have caused and would like to share what we know about it so far.

During the infrastructure update on 7 April, some Ymonitor components experienced an outage for a reason that is not yet known. The outage started at 23:35 (7 April) and lasted until 00:42 (8 April). During this time, we observed the following effects.

Alerts that would have been raised on monitors during this time could not be created. Therefore, they could not be reported by email, SMS, WhatsApp, or any other means. These alerts were not created retroactively after the systems returned to normal, either.
Some endpoints that serve measurement and alert-related data could not be reached.
The main webpage and the dashboards were unavailable.
No measurement data was lost during this time. Once the outage ended, services resumed normal operation, and the measurement data is stored in our databases.

We are currently investigating the root cause of this issue. For now, it appears that the problem occurred at one of our infrastructure suppliers. We have contacted them and asked them to investigate further. If we are able to share more information about their findings, we will post further updates here.

Should you have any questions, please feel free to reach out to your service delivery contact.
Posted Apr 08, 2022 - 15:01 CEST
This incident affected: Ymonitor Dashboards, ymonitor.nl, API, Alerting, and YGate API.