Some systems are experiencing issues.
Our backups are now in a better state, and we are restarting the Universeodon service at this time.
We're continuing to see a large backlog of WAL files waiting to be pushed to our external backup service. We will continue to monitor, and we will keep the site offline until we have made a large enough dent in the backlog to safely restore service without risking backup stability.
Due to growing disk usage, and with our backups struggling to keep up, we have temporarily taken Universeodon.com offline to give the backups a chance to fully catch up. Unfortunately, increasing the disk allocation on the database server is irreversible, and given the current global constraints on SSD storage we are looking to conserve capacity where possible.
We can see a large backlog of WAL files waiting to be pushed to our off-site backup service. This is likely the result of the networking and routing issues that have taken DNS offline a few times. We expect that once all these files have been pushed, a substantial amount of disk space will be released back to the database server.
Despite a significant increase in disk capacity, we are once again seeing disk-related issues impacting Universeodon. We suspect this may be the result of issues with our backup streaming and are investigating.
We have expanded the disk space and will be reviewing our monitoring configuration to ensure high disk usage warnings are flagged in future, allowing us to intervene proactively. We will continue to monitor until full stability is restored. As part of this we are also validating that our off-site backup configuration has not been impacted and is still operational.
We have identified the issue as a full disk on our primary database server. We are working to expand capacity now and restore service as quickly as possible.
Monitoring has detected a global outage of Universeodon. We are investigating.
Incident UUID 78f2e438-55ed-40c5-9645-fdc8029b1761