Some systems are experiencing issues

About This Site

Real time service status from across the ATLAS Media Group portfolio

ATLAS Media Group

Superior Networks

Scissors

Mastodon App

MastodonApp.UK Website

Performance Issues

Cloudflare CDN

Operational

Database Service

Operational

Queue Service

Operational

Media Storage

Operational

E-Mail Service

Operational

Advanced Search

Operational

Universeodon

Website

Partial Outage

Database

Operational

Queue & Content Processing Service

Partial Outage

Media Storage

Operational

Advanced Search

Performance Issues

Cloudflare CDN

Operational

E-Mail Service

Performance Issues

Universeodon Relay

Performance Issues

ATLAS Shared Services

Past Incidents

10th September 2024

MastodonApp.UK & Universeodon.com & PixelFedapp.UK - Outages

We are seeing outages across our compute cluster. We are unsure what the root cause of the issue is and are currently investigating.

We are now confident that full service has been restored.

I've made a further fix to our underlying infrastructure and believe I've managed to rectify the intermittent connectivity fault. I still need to resolve issues with a large number of members home timelines but that should be a lot smoother now this issue is resolved.

Universeodon is partially operational at this time. We are still seeing intermittent issues where the root cause remains unclear. Due to a series of other urgent issues that have come up I've not yet had time to fully troubleshoot the issues but will look to get this intermittent outage issue resolved ASAP.

Universeodon.com continues to have major disruption and ongoing issues. We have had to remove the corrupted redis database which has resulted in any sidekiq / content processing data being lost which we expect to be almost no data, however this has also resulted in some feeds such as home feeds being lost and needing re-creation. We're running into major issues when we try to regenerate the home feeds at this time and do not have capacity to continue working on the issue right now.

We have been able to fully restore MastodonApp.UK to normal operations. Unfortunately we're having much bigger issues when it comes to getting Universeodon.com operational due to a corrupted component (Redis) which is currently resulting in a full service outage while we look to minimise the data corruption issues there.

9th September 2024

No incidents reported

8th September 2024

No incidents reported

7th September 2024

No incidents reported

6th September 2024

No incidents reported

5th September 2024

No incidents reported

4th September 2024

No incidents reported

3rd September 2024

No incidents reported

2nd September 2024

No incidents reported

1st September 2024

Nexus Repository Server Nexus Service - Certificate Renewal Issue

We have identified an issue whereby our Nexus server is failing to automatically renew certificates as expected, the site is currently offline as a result.

Service restored.

Multiple Services - Storage Failure

We are investigating an issue impacting multiple services due to shared storage becoming unavailable. We will update as soon as we know more.

Full service has now been restored.

Storage has been restarted. Some services have come online however we are still trying to get all services online.

French Region Superior Networks - French Region Host Outage

We are working to resolve issues with one of the nodes in our French region. We have started the process of moving clients onto a new operational node however our VPSCP is having issues with the migration. We have engaged our vendor that owns the VPSCP software and are awaiting further updates from them for us to resolve this issue.

We are back fully online.

We have identified the issue on both the old and new host and the original host is now fully operational. There was an additional issue when we attempted to migrate clients onto a new host, we are currently reversing the migrations now.

31st August 2024

Database [MAINTENANCE] Universeodon.com Database Migration

Starting 19:30 BST we will commence migration of the Universeodon.com database server to new infrastructure to enable us to use additional capacity that has been provisioned.

The migration has fully completed and we're back up and running. We'll continue to keep an eye but all appears to be operational at this time.

The migration is now underway and we think it should be around 2 hours from now before full service is restored at the current transfer times.

Queue & Content Processing Service Universeodon - Content Processing Offline

We are currently experiencing a major outage on our content processing services. We are looking to restore service ASAP and are actively investigating this issue.

Content processing is fully restored and operational.

We have identified the root cause as a misconfiguration on one of our core routers. We are actively correcting the configuration now and hope to bring the content processing back online shortly.

30th August 2024

No incidents reported

29th August 2024

Queue Service Mastodonapp.uk content processing failure

Our content processing service has experienced a catastrophic failure resulting in feeds not being updated. We are actively back queueing all of these actions and as soon as we can remediate the issue we will start to catch up on the content that needs processing.

Content processing is now fully operational and working as expected.

We are starting to once again see further performance issues to the database layer of MastodonAppUK which is also resulting in disruption to our ability to process content on Universeodon.com - We are working to mitigate the impacts of this now.

We have managed to slightly increase our ingress capacity for content processing. We're currently running approx 8 Mins behind live on all our queues with the exception of ingress which is at around 21 Hours behind. We expect this queue to take a couple of hours to fully clear and will continue to monitor.

We have increased capacity on our database infrastructure however are still hitting bottlenecks. As a result we've scaled back our processing workers to prioritise the default content queue and are currently running a small amount of capacity for the ingress queue to try to catch this up. This means for all queues other than ingress we're currently running around 30 mins behind with ingress currently around 22 hours behind. We will continue to adjust the scaling to ensure the site remains online and operational while we process all this content.

We have identified a capacity bottleneck on the router that serves part of our database infrastructure, we are scaling this up now and hope this should upgrading some of the bottleneck issues.

It appears that the content processing has resulted in too much pressure being put on our database infrastructure causing major outages across the site. We are looking to scale back the content processing to restore the sites access.

We have powered on our legacy content processing server which is starting to work through the backlog, it looks like around midnight on the 29th August 2024 the new content processing services had a major failure resulting in the vast majority of content processing jobs failing to be executed. We currently have a backlog of a little over 1.1 million events which is likely to increase as we process content and additional processing is required. I suspect it'll take a few hours to get things back caught up. We are going to monitor the infrastructure and queues over the coming hours to ensure full recovery.