Dear Customer,
We had an incident affecting all Ymonitor components between 30 and 31 May 2020, caused by an SSL certificate issue. We apologize for any inconvenience this may have caused. Below we explain in more detail what happened and how we will prevent similar issues in the future.
The incident started at 12:48 (CEST) on 30 May, when our sentinels stopped communicating with the central system. We detected the issue, and its root cause was identified at around 9:50 on 31 May.
Details about the incident:
Ymonitor sentinels always check the validity of every SSL certificate in the certificate chain. Starting from the SSL certificate of "api.ymonitor.nl", a sentinel verifies the validity of the intermediate certificate and the root certificate before it can start a transaction with the central system. The intermediate and root certificates belong to Comodo, a major certificate authority (CA). Normally, certificate issuers are obliged to warn their customers in advance about certificate expiration dates. However, Ymor was not notified by the CA or the certificate issuer that the root and intermediate certificates were about to expire. As a result, when they expired on 30 May 2020 at 12:48, all sentinels stopped communicating with the Ymonitor central servers.
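To illustrate why a single expired certificate is enough to stop all communication, the Python sketch below models the chain check in a simplified way. It is not the actual sentinel code: the Certificate class, the chain_is_valid function and all dates in the example are hypothetical and only mirror the sequence of events described above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Sequence


@dataclass
class Certificate:
    """Hypothetical, simplified certificate: only the validity period matters here."""
    subject: str
    not_before: datetime
    not_after: datetime


def chain_is_valid(chain: Sequence[Certificate], now: datetime) -> bool:
    """Return True only if every certificate in the chain is valid at `now`.

    The leaf, intermediate and root certificates are all checked, so one
    expired certificate anywhere in the chain makes the whole chain fail.
    """
    return all(cert.not_before <= now <= cert.not_after for cert in chain)


def utc(*parts: int) -> datetime:
    """Build a timezone-aware UTC datetime from year, month, day[, hour, minute]."""
    return datetime(*parts, tzinfo=timezone.utc)


# Illustrative (hypothetical) chain: leaf, intermediate and root.
chain = [
    Certificate("api.ymonitor.nl", utc(2019, 1, 1), utc(2021, 1, 1)),
    Certificate("intermediate CA", utc(2010, 1, 1), utc(2020, 5, 30, 10, 48)),
    Certificate("root CA", utc(2000, 5, 30, 10, 48), utc(2020, 5, 30, 10, 48)),
]

print(chain_is_valid(chain, utc(2020, 5, 30, 10, 0)))   # True: all still valid
print(chain_is_valid(chain, utc(2020, 5, 30, 10, 49)))  # False: CA certificates expired
```

Note that the leaf certificate for "api.ymonitor.nl" is still valid in the second check; the chain fails only because the certificates above it have expired.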
Details about the solution:
After the root cause was identified, our engineers started working on a remedy. They created a patch for the sentinel software, which our service delivery personnel immediately applied to all sentinels to which we have remote access. We validated the patch to make sure that it does not introduce any security vulnerabilities; it keeps the validity check of our own SSL certificate in place. Because our sentinels can cache measurement data, they started sending their cached measurements to the central system once the patch was applied. We closely monitored the system to make sure that this measurement data was saved to the Ymonitor databases. At 20:00 on 31 May we declared the system operational again. If the solution has been implemented in your environment, no data was lost during the incident. In some cases, we have not yet been able to implement the solution. If this applies to your organization, your Service Delivery Manager has contacted you to discuss the actions required to implement the solution in your environment.
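The caching behaviour described above follows a general store-and-forward pattern. The Python sketch below shows that pattern under stated assumptions: the MeasurementCache class and the send callback are hypothetical and do not describe how the sentinel software is actually built. Measurements are kept in order while delivery fails and are sent, oldest first, once the connection works again.

```python
from collections import deque
from typing import Callable, Deque


class MeasurementCache:
    """Hypothetical store-and-forward buffer for measurements."""

    def __init__(self, send: Callable[[dict], bool]) -> None:
        self._send = send  # returns True if the central system accepted the item
        self._pending: Deque[dict] = deque()

    def record(self, measurement: dict) -> None:
        """Cache a new measurement, then try to deliver everything pending."""
        self._pending.append(measurement)
        self.flush()

    def flush(self) -> int:
        """Send cached measurements oldest first; stop at the first failure."""
        sent = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break  # keep it cached and retry on the next flush
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._pending)


# Usage: simulate an outage followed by recovery.
connection_ok = False
received = []


def send(measurement: dict) -> bool:
    if connection_ok:
        received.append(measurement)
    return connection_ok


cache = MeasurementCache(send)
cache.record({"id": 1})
cache.record({"id": 2})
print(len(cache))      # 2: nothing delivered during the outage
connection_ok = True   # e.g. after the patch has been applied
print(cache.flush())   # 2
print(received)        # [{'id': 1}, {'id': 2}]
```

Stopping at the first failed send keeps the measurements in their original order, which is why nothing is lost or reordered as long as the cache itself is retained.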
Future improvements:
In the coming days, we will evaluate the incident and our processes, and we will roll out a new version of the sentinel client software as soon as possible to prevent similar incidents in the future.
Again, our apologies for the inconvenience.
Kind regards,
Ymor - part of Sentia