Glad to have you back, and thanks for your work
And we're back!
Good to see you back. This site is so good.
Welcome back! So <3 WH
Yay! I'm so glad the site is back up! :)
AksumkA said:
Sorry for the downtime this past week everyone.
We're back up and running on some brand new server hardware! Will the site be faster? Will the site be better? Will this happen again? Who knows!!
I'll update this post later this week with a quick breakdown of what happened and how things were recovered. I'm sure someone out there would like to learn from my mistakes! If you guys have any specific questions, please let me know and I'll do my best to answer.
The TL;DR is: the NVMe drives that serve as our boot and database storage went read-only after ~4.5PB of data read/written to them. A better admin would have caught that way sooner. Whoops... No data should have been lost (minus some cached stuff like last read posts in the forums here); I was able to recover everything from the previous server. EDIT: I did choose to drop the subscription notices. These things are a real pain, so sorry for that. Hope you don't mind a fresh start!
Anyway, thanks again as always for hanging around with us!
AksumkA, it's a pity that this happened, but as they say, "Shit happens..."))
P.S. I'm glad to see you online again)
gratz!
I'm happy to see the site back up again. Thank you, AksumkA!
This website is fantastic. I'm very happy to see you overcome this hardship.
Thank you very much for all your hard work.
we are SO BACK moment ........................👌👌
Great work and welcome back
Welcome back!!!
All we have is now, go big or go home!
thanks a lot♥
LET'S GOOOO
Gratz!
Hey bro, don't worry. Thank God you're back, that's enough :)
Appreciate the hard work and post-mortem write-up. Can you share what you have implemented (or intend to) regarding the boot/data drives and monitoring their health, or other takeaways?
thank you for all the hard work to get it back up and running!
Missed you greatly and glad you're back.
Perfect.
agcrouton said:
Appreciate the hard work and post-mortem write-up. Can you share what you have implemented (or intend to) regarding the boot/data drives and monitoring their health, or other takeaways?
I do have some basic monitoring set up with Munin, but the plugins for NVMe drives seem hit or miss, so I'll have to look into other options. As of now, we're still running kind of risky.
One of the other things that burned me was not having good documentation on what config changes had been made to the various services that run the site: things like memory allocation, the number of processes a service can spawn, etc. We had a few short downtimes after coming back because of that (like Elasticsearch's default config only allowing 1GB of memory, whoops). So that's another thing I'll be working on: a document with all these notes.
One other threat was the older versions of some software we're still running. Making sure the latest distros still have repositories for the versions we need is a whole thing. I blame that on the general lack of updates I've been able to make to the site's code as a whole. We're so far behind on the framework version that upgrading to the latest would be a huge project on its own.
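One way to make config notes like these pay off is to keep them in a form a script can check, so a reset to a default (like the 1GB Elasticsearch heap) gets flagged instead of discovered during an outage. Below is a minimal Python sketch, assuming the notes record the intended `-Xmx` value and that custom heap settings live in a file under `jvm.options.d` (the usual place on package installs). `EXPECTED_HEAP`, the file name, and the helper names are hypothetical.

```python
# Sketch: compare the heap size recorded in the config notes with the -Xmx
# that Elasticsearch's JVM options file actually sets.
# EXPECTED_HEAP and the file path given on the command line are hypothetical.
import re
import sys
from pathlib import Path
from typing import Optional

EXPECTED_HEAP = "4g"  # hypothetical value copied from the site's config notes

_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}
_SIZE = re.compile(r"(\d+)([kKmMgG]?)")
_XMX = re.compile(r"-Xmx(\S+)")


def to_bytes(size: str) -> int:
    """Convert a JVM size string such as '512m' or '4g' to bytes."""
    match = _SIZE.fullmatch(size.strip())
    if not match:
        raise ValueError(f"unrecognised JVM size: {size!r}")
    return int(match.group(1)) * _UNITS[match.group(2).lower()]


def max_heap_bytes(options_text: str) -> Optional[int]:
    """Return the effective -Xmx from JVM options text (last one wins), or None."""
    heap = None
    for line in options_text.splitlines():
        match = _XMX.fullmatch(line.strip())  # comment lines never match
        if match:
            heap = to_bytes(match.group(1))
    return heap


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_heap.py /etc/elasticsearch/jvm.options.d/heap.options
    actual = max_heap_bytes(Path(sys.argv[1]).read_text())
    if actual != to_bytes(EXPECTED_HEAP):
        print(f"heap mismatch: notes say {EXPECTED_HEAP}, file sets {actual} bytes")
        sys.exit(1)
    print("heap matches the notes")
```

Running it after each deploy or from cron turns a forgotten setting into a nonzero exit code. The same pattern (expected value in the notes, parser for the real file) extends to process limits and other settings.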
Monitoring can be as simple as running a smartctl test and emailing the result.
I always write a chroot script after a new server install and save it on a forge somewhere.
love dd <3
If you figure out why the drives went read-only, I'd welcome the details.
Nice to see the site is back. Keep fighting!!!
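The "smartctl plus email" approach could look roughly like the following Python sketch, meant to run from cron. It assumes smartmontools 7.0 or later (for `--json` output), a mail relay on localhost, and hypothetical sender/recipient addresses and wear limit. The field names follow smartctl's NVMe JSON output, but check them against your own `smartctl --json -a` output before relying on this.

```python
# Sketch: run smartctl on an NVMe drive, flag warning signs, and email them.
# Assumes smartmontools >= 7.0 and an SMTP relay on localhost; the addresses
# and WEAR_LIMIT below are hypothetical.
import json
import smtplib
import subprocess
import sys
from email.message import EmailMessage

ALERT_FROM = "monitor@example.com"  # hypothetical sender
ALERT_TO = "admin@example.com"      # hypothetical recipient
WEAR_LIMIT = 80                     # warn at this "percentage used" (arbitrary)


def check_nvme_health(report: dict) -> list:
    """Return human-readable problems found in a smartctl JSON report."""
    log = report.get("nvme_smart_health_information_log", {})
    problems = []
    if log.get("critical_warning", 0) != 0:
        problems.append(f"critical_warning is {log['critical_warning']:#x}")
    spare = log.get("available_spare")
    threshold = log.get("available_spare_threshold")
    if spare is not None and threshold is not None and spare <= threshold:
        problems.append(f"available spare {spare}% at or below threshold {threshold}%")
    used = log.get("percentage_used")
    if used is not None and used >= WEAR_LIMIT:
        problems.append(f"percentage_used is {used}% (limit {WEAR_LIMIT}%)")
    return problems


def read_report(device: str) -> dict:
    # smartctl's exit status is a bitmask, so a nonzero code is not treated as fatal.
    result = subprocess.run(["smartctl", "--json", "-a", device],
                            capture_output=True, text=True)
    return json.loads(result.stdout)


def send_alert(device: str, problems: list) -> None:
    msg = EmailMessage()
    msg["Subject"] = f"NVMe health warning on {device}"
    msg["From"] = ALERT_FROM
    msg["To"] = ALERT_TO
    msg.set_content("\n".join(problems))
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python nvme_check.py /dev/nvme0
    device = sys.argv[1]
    problems = check_nvme_health(read_report(device))
    if problems:
        send_alert(device, problems)
        sys.exit(1)
```

In the NVMe specification, bit 3 (0x08) of the Critical Warning field means the media has been placed in read-only mode, which is the failure described at the top of the thread, and bit 0 means available spare has dropped below its threshold. Either one showing up in a daily email would have given some warning.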
AksumkA said: (interesting stuff)
Agree with HumanG33k about smartctl: https://www.smartmontools.org/wiki/NVMe_Support suggests it may show useful Available Spare and data read/written counts that could be used to track drive longevity.
Regarding the documentation and keeping current on packages/versions - always the hard part, but at least you know it and are trying to do something about it.
Appreciate the info!
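The read/written counters need one conversion before they mean anything. Per the NVMe specification, one "data unit" is 1,000 units of 512 bytes (512,000 bytes). Here is a small Python sketch that turns the counters into bytes and compares writes against a drive's endurance rating. `RATED_TBW` is a hypothetical placeholder; the real figure comes from the drive's datasheet.

```python
# Sketch: convert NVMe "Data Units Read/Written" counters to bytes and compare
# writes against a rated endurance (TBW). RATED_TBW is a hypothetical value.
BYTES_PER_DATA_UNIT = 512 * 1000  # NVMe spec: 1 data unit = 1,000 x 512 bytes
RATED_TBW = 3_500                 # hypothetical endurance rating, in TB written


def data_units_to_bytes(units: int) -> int:
    """Convert an NVMe data-unit counter to bytes."""
    return units * BYTES_PER_DATA_UNIT


def endurance_used(units_written: int, rated_tbw: float = RATED_TBW) -> float:
    """Fraction of rated write endurance consumed (1.0 means the rating is reached).

    TBW ratings count writes only, so pass Data Units Written, not reads.
    """
    return data_units_to_bytes(units_written) / (rated_tbw * 10**12)


def human_bytes(n: float) -> str:
    """Format a byte count with decimal (SI) prefixes, as drive vendors do."""
    for unit in ("B", "kB", "MB", "GB", "TB"):
        if n < 1000:
            return f"{n:.1f} {unit}"
        n /= 1000
    return f"{n:.1f} PB"
```

Because endurance ratings cover writes only, a "read/written" total like the ~4.5PB above only tells part of the story; the written share is what counts against the rating. The drive's own Percentage Used field is the vendor's estimate of the same thing and is worth graphing alongside this.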
THANK YOU!! We appreciate you and all you do for us fans <3 -D
Man, what can I say! I really missed you.
All images remain property of their original owners. Site & code © wallhaven.cc 2025. Privacy Policy · Terms of Service
