Newnix - Drive Failures

Heard you've been low on feedback lately, so I figured I'd send in an issue that came up recently with a server running FreeBSD 11.1.
I've been tasked with upgrading our NMS at work. I've been using Zabbix 3.4, which has been great for the most part, if fairly undocumented for *BSD configurations.
The initial server (an older ProLiant with dual Opterons and a frustrating RAID controller) seems to have run into drive problems, and boot now fails because of trouble importing "zroot". I figured this would be a great time to ask about some ways to improve the system.

1) I/O optimization: I've had a hard time finding much on the topic, but the drives report 512-byte sectors and the PostgreSQL database uses 8k records. Should I set the dataset recordsize to 8k as well? Is there any tuning you're aware of to improve I/O for PostgreSQL installs on ZFS?
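
For concreteness, the change I'm asking about in 1) would look roughly like the sketch below. The function names and the dataset name `zabbix/pgdata` are hypothetical, made up for illustration; only the 8k and 512-byte figures come from the setup described above, and whether 8k is actually the right value is exactly what I'm asking.

```sh
#!/bin/sh
# Number of disk sectors that one database record spans.
# $1 = record size in bytes, $2 = sector size in bytes.
sectors_per_record() {
    echo $(( $1 / $2 ))
}

# Show, then change, the recordsize of a dataset.
# $1 = dataset name (hypothetical example: zabbix/pgdata).
# Only blocks written after the change use the new size;
# existing data keeps the recordsize it was written with.
set_pg_recordsize() {
    zfs get recordsize "$1"
    zfs set recordsize=8k "$1"
}
```

Calling `sectors_per_record 8192 512` gives the number of 512-byte sectors per 8k record, and `set_pg_recordsize zabbix/pgdata` (as root) would apply the setting to the dataset holding the database.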

2) System diagnostics: are there any FreeBSD tools in particular I should look into to explore why some of the drives (da{0..3}) aren't addressable using `gpart show`? They show up in the output of the `kern.disks` sysctl and `camcontrol devlist`, and `zpool status` shows them as online, but when trying to boot, the kernel log starts complaining about the headers. After running `gpart recover`, these geoms just don't appear to exist anymore.
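
The checks mentioned in 2) can be collected into one small script, sketched below. The function names `disks` and `check_disks` are hypothetical; the commands themselves are the ones named above, and the script only gathers their output in one place rather than diagnosing anything.

```sh
#!/bin/sh
# Print the device names da0..da3, one per line.
disks() {
    i=0
    while [ "$i" -le 3 ]; do
        echo "da$i"
        i=$((i + 1))
    done
}

# Run each check against the four disks. gpart show exits non-zero
# when it cannot find a geom for the given disk, which is the
# symptom described above.
check_disks() {
    sysctl kern.disks
    camcontrol devlist
    for d in $(disks); do
        gpart show "$d" || echo "gpart: no geom found for $d"
    done
    zpool status
}
```

Running `check_disks` as root on the affected host would show, side by side, which layers still see each drive.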

3) Boot import: the other 4 drives in this server make up a separate pool used for the host (Zabbix runs in its own jail on its own pool, to separate I/O workloads between the 15k and 10k SAS drives). This pool can be imported from an install USB, but not when actually booting the system; I get some sort of message stating that it's unable to set an "attribute u". Scrubbing the pool showed no errors, so I'm not sure what the problem is unless it's the physical drives, but then I'd expect issues importing the pool from an install image too.

If all goes well, I'll be able to recover and get our system back up and running before the holiday, and then revisit the issue with our ProdEng team about getting some less frustrating hardware for this task when we return next week. Still, this is a fairly perplexing issue that I've not been able to dig up much information on.

Thanks as always for the great show. I can hardly wait for the conferences of 2019 to come around; hopefully I'll give a better-prepared talk this time, at least at one location.