Some systems are experiencing issues

About This Site

Real-time service status from across the ATLAS Media Group portfolio

Past Incidents

25th September 2025

No incidents reported

24th September 2025

No incidents reported

23rd September 2025

No incidents reported

22nd September 2025

No incidents reported

21st September 2025

No incidents reported

20th September 2025

No incidents reported

19th September 2025

No incidents reported

18th September 2025

Universeodon Outage

Universeodon has gone offline; we are investigating why.

  • Service has been restored. We have around 150k retry jobs, which we hope represent the majority of the jobs that were queued at the time of the outage, but we know we have lost an unknown number of jobs, most likely ingress jobs. We expect the site to be under extra pressure for the next 12-24 hours while other servers on the fediverse re-try sending us their content and activity. We will close this incident now and pick up the existing main incident related to the original connectivity issues.

  • Our database server appears to have stopped routing traffic at around 1AM UK time. We are unsure why this happened and are aware it has happened previously. This took the site down, but it also resulted in the loss of an unknown number of jobs, as our content processing continued trying to run and, due to the length of the outage, marked jobs as dead and not to be re-tried. It is impossible to know which jobs were lost or how many; we are attempting to re-try the ones currently in the dead queue and have a huge backlog of retry jobs (a rough sketch of this kind of dead-queue retry is included below).

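For context on the dead-queue retries mentioned above: background jobs on Mastodon-style servers are handled by Sidekiq, which parks jobs that exhaust their retries in a dead set rather than discarding them immediately. The snippet below is only a minimal Python sketch of re-queueing such jobs against a generic Redis-backed queue; the key names and payload fields are assumptions for illustration, not Sidekiq's actual schema.

    import json
    import time

    import redis

    # Hypothetical key names for illustration only; Sidekiq's real schema differs.
    DEAD_SET = "queue:dead"      # sorted set of failed job payloads, scored by failure time
    RETRY_QUEUE = "queue:retry"  # list of jobs waiting to be attempted again

    r = redis.Redis()

    def requeue_dead_jobs(batch_size=1000):
        """Move jobs from the dead set back onto the retry queue, oldest first."""
        moved = 0
        while True:
            payloads = r.zrange(DEAD_SET, 0, batch_size - 1)
            if not payloads:
                break
            pipe = r.pipeline()
            for payload in payloads:
                job = json.loads(payload)
                job["retried_at"] = time.time()
                pipe.rpush(RETRY_QUEUE, json.dumps(job))
                pipe.zrem(DEAD_SET, payload)
            pipe.execute()
            moved += len(payloads)
        return moved

    if __name__ == "__main__":
        print(f"re-queued {requeue_dead_jobs()} dead jobs")
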
17th September 2025

No incidents reported

16th September 2025

Universeodon - Connectivity Issues

We are seeing connectivity issues between Universeodon's old compute cluster and our new compute environment, which holds part of our data storage; as a result the site is offline. We are urgently working to resolve the situation.

  • The planned maintenance tonight has been cancelled and will be re-scheduled, as the site is now working as expected.

    We will raise a new incident here and announce via @wild1145@mastodonapp.uk and via server announcements when the new maintenance will take place.

    We do still need to re-build the feeds for all users; however, we will pick that up under the existing incident here.

  • Universeodon queues are now tracking at under 10 mins. We do still have a lot of retries to execute, so there might still be some delays, but it looks like we're mostly back to where we should be.

    We will shortly spin up a new incident to cover the database move to the new infrastructure and we expect the site to be down for a few hours while we do this.

  • Due to the overnight database connectivity issues we have lost an unknown number of jobs, which tried and failed to run while the database was unreachable.

    Our queues this morning are difficult to fully process, as we have around 152k retry jobs and rapidly growing standard queues as well. We expect fairly slow processing for the next few hours but will continue to monitor.

    Following this database issue, and the fact that it has happened multiple times in recent memory including over the last couple of days, we will look to take the site offline for maintenance later today and migrate the database to a new server, in the hope that this remedies some of these connectivity issues.

  • We are now seeing around 50 mins of lag on the default queue and around 1 hour on our ingress and push queues. As a result we're scaling back our ingress capacity to prioritise the other queues and free up capacity, so the WebUI works as expected and content is pushed where it needs to go. We currently have a total of around 285k jobs queued across our queues. (A rough sketch of how this kind of queue lag can be measured is included after these updates.)

  • We are seeing some reduction in speed and responsiveness on Universeodon; however, our intention for the time being is to keep the content processing ticking over at its current rate. We are making good progress through our ingress queue, which is now down to around 3 hours of latency from live with around 184k jobs in the queue. Our default queue is currently lagging at just under 30 mins of latency from live with just under 110k jobs in the queue. We will review the situation in a couple of hours and, if needed, re-balance the queues again to re-prioritise the default queue and ensure timelines aren't significantly impacted beyond any existing impact.

  • Ingress queues on Universeodon are currently tracking at approx 4 hours behind real-time, with the default queue slightly delayed at just under 20 mins. We've adjusted our queue balance to prioritise the default and other non-ingress queues, as ingress jobs often spawn further jobs and can rapidly increase the job count on the other queues.

    MastodonAppUK continues to look okay in terms of queue length and delays, with real-time processing continuing.

  • All queues except the ingress queue are now back to real-time processing. Ingress is currently around 6 hours behind live; we're going to try to increase processing capacity a little further to clear the backlog a bit quicker.

    We are still going to need to do a feed re-build due to previously lost Redis data; however, this will happen once the database has been migrated to the new infrastructure.

  • Queues on Universeodon are coming down slowly. We've again increased processing capacity, as things appear to be running a lot more reliably now.

    Ingress is around 7 hours behind live and the default queue (Timelines and other aspects) is currently tracking approx 20 mins behind live. All other queues are processing around real time and are within our expected ranges.

  • The queues on MastodonAppUK have now cleared and we're processing as normal in real time.

    We've just increased some of our queue processing again on Universeodon in the hopes that it will help work through the queues there now that we are more confident with our original content processing configuration.

  • We have reverted one of our original tuning changes which increased the number of content processor processes running by reducing the number of threads each had to match the original CPU configuration of the VMs. This appears to have made a noticeable difference in performance and reduced the issues crossing the boundaries between our old and new infrastructure. MastodonAppUK is now fully operating back on the original infrastructure platform, with Universeodon seeing a 2x increase in processing capacity without any noticeable impact to site / end-user performance at this time. We will continue to monitor.

  • We are seeing increasing queue sizes on Universeodon and MastodonAppUK, with some intermittent issues on Universeodon's website. We have scaled again to a maximum safe point without doing any other work, so we are hoping to keep the queues somewhat manageable. Ingress is currently tracking around 9 hours behind on Universeodon, while all other queues are between 30 and 45 mins behind at this time.

    For MastodonAppUK we are going to activate our legacy content processing nodes on the old infrastructure, in the hope that this will both work down the queue and allow us to temporarily disable the new content processing nodes. As our Redis and database servers are still on the old cluster it should be as if nothing had ever moved, and we hope it will give some much-needed capacity to the Universeodon service. Queues are currently around 40 mins behind on ingress and 1 hour behind for the feeds and default queues.

  • The majority of the stability issues have cleared overnight, and with most of the MastodonAppUK backlog cleared the demand on connectivity between the two sites has dropped considerably. We're slowly ramping up the content processing on Universeodon to catch up on ingress, which was not running overnight, and to try to keep on top of the MastodonAppUK queues.

    We will look to migrate the Universeodon database server onto the new compute cluster this afternoon / evening, as we hope that will help the overall performance of the environment and allow us to route all Universeodon traffic locally within the new DC.

  • We have managed to get the site online after bringing the database server's connectivity fully online again. We've re-enabled a very small amount of content processing capacity; however, this is already proving to cause Universeodon to struggle. We're not sure why Universeodon is struggling more than MastodonAppUK, but we're monitoring the situation and will have to look at expediting the migration to the new compute cluster for all other aspects of the sites.

  • We're now experiencing an issue with the Universeodon database server, which is no longer connecting to the network. This is preventing any sort of access and means the site remains fully offline. We are working as fast as possible to resolve the matter.

  • We have confirmed the issue is related to our Sidekiq processing. We've now suspended all content processing and switched over to the new website infrastructure. We will start to bring the Universeodon content processing back online, and will then bring the MastodonApp content processing back online, to ensure stability.

  • We have reversed the change to our load balancer configuration and are re-routing web traffic back through our old infrastructure, as we are unable to bring the new server online for reasons which are not yet clear. We will continue investigating.

  • We are continuing to have significant issues getting the new web infrastructure to serve public traffic and the root cause is currently unknown. We are working as hard as we can to restore service.

  • In an effort to stabilise the site itself and to ensure events destined for Universeodon.com are delivered, we have suspended the active processing of content while we re-build the content processing servers on the new compute cluster. Content will then be processed on our new infrastructure, where we should be able to better catch up with the backlog.

  • We think we've identified the issue as relating to the traffic routing between our old and new clusters: our MastodonApp.UK content processing still talks to our Redis server on the old environment while our web and content processing services talk to Redis on our new environment, and we think the delays to MastodonApp.UK, together with our work to increase processing capacity there, have significantly impacted service here.

    We are now working to migrate Universeodon content processing onto the new compute environment, which we hope will minimise some of this disruption, and we will continue to monitor the situation.

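Several of the updates above quote queue lag figures such as "approx 4 hours behind real-time". As a rough illustration of how that sort of number can be derived, the Python sketch below checks the age of the oldest job still waiting on each Redis-backed queue; the queue names, list layout, and enqueued_at field are assumptions for the example, not the service's actual schema.

    import json
    import time

    import redis

    # Queue names and the "enqueued_at" payload field are assumptions for this example.
    QUEUES = ["queue:default", "queue:ingress", "queue:push", "queue:pull"]

    r = redis.Redis()

    def queue_lag_seconds(queue_name):
        """Approximate lag: age of the oldest job still waiting on the queue."""
        # Assuming producers LPUSH new jobs and workers RPOP them,
        # the oldest waiting job sits at the tail of the list.
        oldest = r.lindex(queue_name, -1)
        if oldest is None:
            return 0.0
        job = json.loads(oldest)
        return max(0.0, time.time() - job["enqueued_at"])

    for name in QUEUES:
        backlog = r.llen(name)
        lag_minutes = queue_lag_seconds(name) / 60
        print(f"{name}: {backlog} jobs queued, oldest is roughly {lag_minutes:.0f} mins behind live")
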
Elasticsearch MastodonAppUK - Search Migration

We are currently migrating our Elastic instances to new infrastructure and as a result need to re-build our search indexes from scratch. The rebuild is unfortunately delayed and not running as expected, so full text search on the site is not currently operational.

  • We have had to pause work on the MastodonAppUK Search migration. We will look to continue the work in the coming days, once we migrate the MastodonApp Redis and DB to the new infrastructure, which we think should make the Search migration smoother.

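For context, re-building the search indexes from scratch means repopulating Elasticsearch from the primary database rather than copying the old indexes across. The Python sketch below shows the general shape of such a rebuild using the official Elasticsearch client; the host, index name, and document shape are assumptions for illustration, not the site's actual configuration.

    from elasticsearch import Elasticsearch, helpers

    # Hypothetical connection, index name, and document shape for illustration only.
    es = Elasticsearch("http://new-elastic.internal:9200")
    INDEX = "statuses"

    def fetch_rows():
        """Placeholder for streaming source records from the primary database."""
        yield {"id": 1, "text": "example status", "created_at": "2025-09-16T00:00:00Z"}

    def bulk_actions():
        for row in fetch_rows():
            yield {
                "_op_type": "index",
                "_index": INDEX,
                "_id": row["id"],
                "_source": {"text": row["text"], "created_at": row["created_at"]},
            }

    # Recreate the index empty, then stream documents back in via the bulk API.
    if es.indices.exists(index=INDEX):
        es.indices.delete(index=INDEX)
    es.indices.create(index=INDEX)
    ok, errors = helpers.bulk(es, bulk_actions(), raise_on_error=False)
    print(f"indexed {ok} documents, {len(errors)} errors")
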
Website Universeodon - Potential data loss

Due to earlier issues at our colocation data center, parts of our in-memory storage may have been lost, resulting in delays to content loading and timelines potentially not being complete. We are investigating the issue at this time and will update when we can.

  • We will re-run the feed re-builder once the database server is migrated to the new infrastructure, as latency issues with our existing tooling are currently preventing the job from completing.

  • Feeds do not seem to be fully functional; we are running a re-build in the background now and hope this will resolve the issues for all users.

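For context on the feed re-build mentioned above: home timelines on Mastodon-style servers are precomputed and held in Redis, so when that in-memory data is lost they have to be regenerated from the database. The Python sketch below illustrates the general approach; the key format, scoring, and data-access helpers are assumptions for the example, not Mastodon's actual implementation.

    import redis

    r = redis.Redis()

    # Hypothetical data-access helpers standing in for database queries.
    def followed_account_ids(account_id):
        """IDs of accounts this user follows (would come from the database)."""
        return [2, 3, 5]

    def recent_status_ids(account_id, limit=400):
        """Most recent status IDs posted by one account (would come from the database)."""
        return [1001, 1002, 1003]

    def rebuild_home_feed(account_id, max_items=400):
        """Regenerate one user's home feed as a Redis sorted set scored by status ID."""
        key = f"feed:home:{account_id}"  # hypothetical key format
        pipe = r.pipeline()
        pipe.delete(key)
        for followed in followed_account_ids(account_id):
            for status_id in recent_status_ids(followed):
                # Status IDs are roughly time-ordered, so they double as the score.
                pipe.zadd(key, {status_id: status_id})
        # Trim to the newest max_items entries.
        pipe.zremrangebyrank(key, 0, -(max_items + 1))
        pipe.execute()

    rebuild_home_feed(account_id=1)
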
Queue Service MastodonAppUK - Content Processing Delays

Due to earlier issues at our colocation site and interruptions to networking, we are currently seeing performance issues and delays of around 40 mins to our content processing. We are working to resolve the situation.

  • Closing this incident. Further updates are linked to the Universeodon connectivity issues, which is now the primary incident.

  • Queues continue to grow due to increasing latency on the Redis connection. Unfortunately there is no obvious path forward at this point, and issues with other aspects of the service are restricting our ability to properly resolve this issue.

  • All queues are currently backed up by between 2 mins and 1 hour. We aren't processing events very quickly due to the issues with our Redis configuration. Depending on time we may attempt to migrate the Redis server onto the new infrastructure tonight; however, given the issues Universeodon had when we made this move, it is not ideal. We will keep you updated.

  • We are continuing to work to minimise the queue lengths. Unfortunately, due to the connectivity constraints between our old and new environments, increasing our queue processing capacity has had a substantial negative impact on Universeodon, and as a result we've had to scale back our setup. We think the latency between our new and old clusters is partly to blame for jobs queueing and struggling to process; we continue to investigate and will do what we can to minimise disruption.

  • We are seeing queues continue to grow and our content processing is struggling to keep up. We're rolling out some configuration changes which should increase capacity, and we will continue to scale up our content processing until the backlog is back to a manageable state.

15th September 2025

No incidents reported

14th September 2025

No incidents reported

13th September 2025

No incidents reported