Some systems are experiencing issues.
Following the interventions on Saturday we have seen a reduction in the number of failures, and we now have a process in place to remotely restart the router when this happens. We think the majority of the public-facing impact should now be resolved, and as such we are going to close this incident while we continue our engagement with Ubiquiti, as this appears to be some sort of software bug introduced by a recent update.
As a longer-term project, we will be looking to move to alternative hardware and software, partly due to this issue and partly to allow for more cost-effective scaling and additional functionality which our existing hardware cannot support.
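For transparency, the remote-restart process mentioned above amounts to a small watchdog along the following lines. This is a minimal sketch only; the health endpoint, host names, and SSH details are placeholders rather than our real configuration.

```python
import subprocess
import urllib.request

HEALTH_URL = "https://probe.example.net/health"  # placeholder: endpoint only reachable via the gateway
GATEWAY = "gateway.dc.example.net"               # placeholder: out-of-band address of the gateway

def gateway_healthy() -> bool:
    """Treat a timely 200 from the probe as evidence the gateway is passing traffic."""
    try:
        with urllib.request.urlopen(HEALTH_URL, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False

def restart_gateway() -> None:
    """Reboot the gateway over SSH using the additional connectivity we enabled."""
    subprocess.run(["ssh", f"admin@{GATEWAY}", "reboot"], check=True, timeout=30)

if not gateway_healthy():
    restart_gateway()
```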
On-site DC support have restarted the gateway for us again. As part of attempting to remove the need for this device and mitigate its use elsewhere, we pushed a bad configuration change to one of our app servers, which has resulted in a large-scale outage across multiple services. We're working to resolve the situation ASAP.
We appear to be unable to remotely access the gateway, even with the additional connectivity we had enabled. We're working to get a resolution in place and restore functionality.
A lot of outbound connectivity from our data center is broken at this time, including image uploads, and full-text search may be partially broken.
The gateway once again failed overnight, resulting in a short outage this morning before traffic resumed. We're going to perform another restart of the gateway and download diagnostic logs while we continue to wait for the vendor to provide a root-cause analysis and fix.
We've restarted the gateway remotely, proving we can now do this without the need for on-site remote hands. We've also applied the pending update to part of the software in the hope that this improves stability. We're still waiting for an update from our vendor on what is causing the fault, though from our own diagnostics it appears to be out-of-memory errors caused by the IDS/IPS services, which is the same fault we saw on our previous device.
Unless there is further disruption, I will provide an update by 22:00 Europe/London today.
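For those interested in the diagnostics, the out-of-memory conclusion came from OOM-killer entries in the gateway's kernel log. A check along these lines is roughly what pointed at the IDS/IPS processes; this is a sketch assuming standard Linux logging on the device, and the exact match strings are illustrative.

```python
import subprocess

# Read the kernel ring buffer from the device and keep only OOM-killer activity.
dmesg = subprocess.run(["dmesg"], capture_output=True, text=True).stdout

oom_events = [line for line in dmesg.splitlines()
              if "oom-killer" in line or "Out of memory" in line]

# Each surviving line names the process the kernel killed; in our case the
# repeated victim was the IDS/IPS daemon.
for line in oom_events:
    print(line)
```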
Monitoring has detected a further outage of our gateway; at this time it looks as if we're still receiving inbound traffic. We're going to attempt to restart the gateway again now and apply a pending update to part of the system.
The onsite team have restarted the device and connectivity appears restored.
At approximately 3:30 AM this morning the sites went offline. We are waiting for the device to be restarted.
We have confirmed that the workaround is working at this time; we've now applied it to all of our hosts within the environment to facilitate some outbound traffic.
We are now seeing connectivity issues between Universeodon and MastodonAppUK. We suspect there may be some sort of issue specific to this connection and will troubleshoot it further tomorrow when the gateway is restarted.
We expect to share the next update by 11:00 Europe/London on 13 Feb 2026.
We've been able to push a temporary workaround to override DNS settings on our content processing servers, which we hope will at least get our queue cleared. We're monitoring this patch now.
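For reference, the workaround is roughly the following, pushed to each content processing server. This is a sketch only: the host names and public resolvers shown are placeholders, and it assumes /etc/resolv.conf is not managed by something like systemd-resolved on those hosts.

```python
import subprocess

# Placeholder inventory; the real host list is managed elsewhere.
CONTENT_HOSTS = ["content-01.example.net", "content-02.example.net"]

# Point the hosts at external resolvers so DNS no longer depends on the
# unresponsive gateway.
RESOLV_CONF = "nameserver 1.1.1.1\nnameserver 8.8.8.8\n"

for host in CONTENT_HOSTS:
    subprocess.run(
        ["ssh", f"root@{host}", "tee /etc/resolv.conf"],
        input=RESOLV_CONF, text=True, check=True,
    )
```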
Our DC provider will reboot our gateway in the morning, which I hope will restore our access if it doesn't recover by itself overnight. Given there isn't a user-impacting issue as a result of this (outside of content processing being partially down), we are reluctant to ask the technicians to drive to the site just to reboot the gateway to restore our access.
The device is entirely non-responsive, to the point that we cannot log in to it even when we are able to reach it. We've now engaged our DC provider's on-site team and requested that they physically reboot the device for us, which should hopefully restore our access so we can continue to operate as normal.
We're struggling to get any access to the console, so we're going to take content processing back offline. If we're unable to gain access in the next 20-30 minutes, we will need to shut down all other traffic coming into the environment in the hope that this allows us to get the access we require.
With thanks to our ISP, we have been able to confirm this does not appear to be any sort of network capacity attack / DDoS / similar; traffic over the last 48 hours looks normal / as expected. We suspect this is a bug or glitch within the UniFi operating system on the gateway, following an update to the gateway yesterday. Currently the gateway is entirely unresponsive other than responding to ping requests.
We have fully stopped content processing for Universeodon and are still seeing issues for MastodonAppUK; we will restart processing on Universeodon as soon as we can.
We are continuing to experience issues where our gateway router is non-responsive and struggling to run effectively. We're unclear as to what is causing this, as we're not seeing an increase in connections to our load balancer. We're working to track down the root cause.
We're continuing to work to resolve the major issues we're seeing. Currently this looks to be primarily impacting our content processing workers and outbound connectivity. Inbound connections to the site appear to be working as normal; however, our content processing services are seeing major disruption.
We're investigating issues currently being seen on our infrastructure at the Redditch DC location. This is impacting multiple services across MastodonAppUK and Universeodon, as well as the Superior-Networks billing panel.
We will restore full service ASAP.
Incident UUID 8310594a-e1d2-43be-aa79-ede7d45e521f