EVIL Gibson
Mar 23, 2001

Internet of Things is just someone else's computer that people can't help attaching cameras and door locks to!
:vapes:
Switchblade Switcharoo

mobby_6kl posted:

Cool story, bro.


Thanks. Unless you were offended by it... poo poo!

:ignorance:


Motronic
Nov 6, 2009

EVIL Gibson posted:

tell me your definition of "story" because your reaction seems to be taking it in a way I did not mean and I apologize.

The way you phrased it suggests the stories you are referring to are fiction. This is a baffling position, along with suggesting teflon. Plenty of people are covering pins with non-conductive stuff. They aren't making it up. They're really doing it, just like I am. For years. And it works fine. Without modifying anything at all other than the particular device that needs the modification.

It's not a big deal, just odd. Nobody is offended. You can let it go.

dodecahardon
Oct 20, 2008

mobby_6kl posted:

Is there a way to get Synology DSM to run a job when the system wakes up? That doesn't seem to be an option in the "Scheduled Tasks" thingie, and from what I've seen one solution is to write a script in /etc/pm, but there's no /etc and I'm not enough of a Linux nerd to work out how to get around that.

It looks like you can create cron jobs and put your script in /var/services/homes/username.
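For anyone going that route, here's a minimal sketch of what such an entry could look like, assuming a standard system crontab with a user field; the schedule and script name are just placeholders:
code:
# hypothetical entry in /etc/crontab: minute hour mday month wday user command
0 3 * * *    root    /var/services/homes/username/nightly-job.sh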

JockstrapManthrust
Apr 30, 2013
You can do this in the Task Scheduler (I'm using DSM 7).
Go to the Control Panel and, in the Services section, open up the Task Scheduler.
Then click Create > Triggered Task > User-defined Script.
Name the task, set the user to run it under, and the event to trigger it, such as Boot-Up.

I use this to stop and remove the ActiveInsight and HybridShare services on boot and it works great.

Just in case anyone wants to do that, I run that on the Boot-Up trigger as the Root user.

synopkg stop ActiveInsight && synopkg uninstall ActiveInsight

synopkg stop HybridShare && synopkg uninstall HybridShare

corgski
Feb 6, 2007

Silly goose, you're here forever.

Right now my NAS is running Windows with StableBit Drivepool. Are there any tools for linux that function the same way? MergerFS seems to only do JBOD pooling without any of the duplication or read striping.

BlankSystemDaemon
Mar 13, 2009



Twerk from Home posted:

What experiences have you had with SATA disks on SAS expanders? I've had a couple of really bad experiences with SATA disks on SAS expanders both at work and at home over the years. Disks falling out of hardware RAID arrays, SATA disks throwing regular I/O errors at a higher rate when used with software defined storage, the whole array going non-responsive and blocking mysteriously and not doing that after the SATA disks were removed and only SAS disks were in place.

The thing is, I've gotten explicit confirmation from hardware vendors and also software tools that SATA disks on SAS expanders are supported, but I'm twice burned now and basically ready to just always pay the small cost premium for SAS disks forever going forward. Can I get a sanity check?
Good hardware RAID controllers will use the SCSI READ DEFECT DATA command and mark disks as failed if they don't return expected results.
SATA disks won't return expected results, so if the hardware RAID controller hasn't been made to handle everything that SATA disks can return, it's likely they eventually end up throwing an error.

Basically, hardware RAID is bad as it's not hardware RAID - it's just a separate realtime OS running on a 500MHz MIPS, PowerPC, or ARM processor with special instructions to handle XOR and Reed-Solomon error correction.

SATA on SAS is absolutely supported by the specification, otherwise the ports wouldn't be compatible via keying; if you've ever tried doing SAS on SATA you'll know what this means.

If ZFS is failing because of SATA disks erroring out, it's probably because the disks are bad.
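As a sanity check on whether the drives themselves are erroring out, smartmontools can dump a disk's own error state; a minimal sketch, with a hypothetical device name:
code:
# SAS disks report error counter logs and the grown defect list;
# SATA disks report S.M.A.R.T. attributes and the device error log instead
smartctl -a /dev/da2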

mobby_6kl
Aug 9, 2009

by Fluffdaddy

JockstrapManthrust posted:

You can do this in the Task Scheduler (I'm using DSM 7).
Go to the Control Panel and, in the Services section, open up the Task Scheduler.
Then click Create > Triggered Task > User-defined Script.
Name the task, set the user to run it under, and the event to trigger it, such as Boot-Up.

I use this to stop and remove the ActiveInsight and HybridShare services on boot and it works great.

Just in case anyone wants to do that, I run that on the Boot-Up trigger as the Root user.

synopkg stop ActiveInsight && synopkg uninstall ActiveInsight

synopkg stop HybridShare && synopkg uninstall HybridShare
I actually saw this a while ago but at least in DSM 6 I have two options: "Boot-up" and "Shutdown". I assumed, maybe incorrectly, that this would only trigger on actual cold boot and not on wake from sleep. I could just test it, of course :downs:


Charles Mansion posted:

It looks like you can create cron jobs and put your script in /var/services/homes/username.
Cron can't run tasks on events like resume/wake, as far as I know, though.


The idea here is that I'll stop some services that prevent the NAS from going into standby at night and start them back up whenever it wakes up in the morning. Maybe it's weird because "it's a server" and all but there's usually no reason for it to run while I'm sleeping or working or on vacation. Plus I have WoL if I do need it.
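For the WoL part, a one-liner from any other machine on the LAN is enough; a sketch, with a placeholder MAC address and interface:
code:
wakeonlan AA:BB:CC:DD:EE:FF        # or: etherwake -i eth0 AA:BB:CC:DD:EE:FF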

JockstrapManthrust
Apr 30, 2013

mobby_6kl posted:

I actually saw this a while ago but at least in DSM 6 I have two options: "Boot-up" and "Shutdown". I assumed, maybe incorrectly, that this would only trigger on actual cold boot and not on wake from sleep. I could just test it, of course :downs:

Ah, my bad, thought you meant start-up. Wake/sleep would also be good to have in there, as it's a little underpopulated with just boot/shutdown triggers.
Could be worth a suggestion email to them, as it's a great way to add scripting to the system via the web UI rather than having to hack up the OS underneath and risk the change being nuked with a DSM update.

Generic Monk
Oct 31, 2011

apparently the recommended configuration for a raidz1 vdev is to have an even number of data disks, otherwise you run into fuckery where it can't divide the sector size (?) evenly. what does that mean? do you just lose some capacity?

Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!
The width of the stripe written may not be as wide as the number of disks.

Say you have five disks in a RAIDZ1 and the physical sector size is 4KB, which dictates the smallest block size; if you somehow only write 8KB between flushes, the stripe would be three wide: two 4KB data blocks and one for parity.
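The arithmetic for that example, as a trivial sketch:
code:
# 5-disk RAIDZ1, 4K physical sectors, an 8K write between flushes
echo $(( 8 / 4 ))       # 2 data blocks
echo $(( 8 / 4 + 1 ))   # plus 1 parity block = a stripe 3 wide, not 5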

See this image I fetched from GIS:

BlankSystemDaemon
Mar 13, 2009



You need to read this to understand that ZFS has variable stripe width, and why anything anyone says about it is probably wrong if they haven't also read and understood it.

Generic Monk
Oct 31, 2011

BlankSystemDaemon posted:

You need to read this to understand that ZFS has variable stripe width, and why anything anyone says about it is probably wrong if they haven't also read and understood it.

this is really helpful; I think I saw this linked in one of the truenas forum threads but the link there was broken. so this is saying that in very specific workloads with specific block/sector size combinations you might see a higher percentage of the 'stripe' be taken up by parity and not data, making it more inefficient, but this is another one of those things that home users really don't need to be worrying about?

BlankSystemDaemon
Mar 13, 2009



Generic Monk posted:

this is really helpful; I think I saw this linked in one of the truenas forum threads but the link there was broken. so this is saying that in very specific workloads with specific block/sector size combinations you might see a higher percentage of the 'stripe' be taken up by parity and not data, making it more inefficient, but this is another one of those things that home users really don't need to be worrying about?
Sure, if you drop the "home users" part, that's more or less an accurate summary.

It's important to remember that if you're doing IOPS sensitive workloads, you wouldn't be using raidz to begin with - when you're building raidz pools, you're typically looking to satisfy a minimum workload that's fairly conservative, and have very large storage requirements that can't be met when using fully 50% of your disks for availability.

Even using the fastest NVMe SSDs that are open-channel (ie. SSDs with multiple I/O queues and no flash translation layer; in other words, they don't pretend to behave like disks but expose the non-volatile flash directly to the OS), striped mirroring is still going to increase your IOPS linearly for every device you add in each mirror that the data is striped across, or for every new mirror that you let the data be striped across.
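For reference, a minimal sketch of the striped-mirror layout being described; pool and device names are hypothetical:
code:
# two 2-way mirrors; ZFS stripes writes across both vdevs
zpool create tank mirror da0 da1 mirror da2 da3
# adding a third mirror later stripes data across it as well
zpool add tank mirror da4 da5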

Computer viking
May 30, 2011
Now with less breakage.

Speaking of mirrors and raidz. I'm setting up a replacement file server at work, with 15x 16TB drives. Last time I did this, I had the impression that using mirrors was also preferable because it makes resilvering way quicker. Is that still a thing, or is resilvering a drive in raidz quick enough in practice?

I'm debating mirrors vs some level of raidz. The IO loads here are usually fairly nice; mostly reading and writing a single large file at a time (over NFS on 10gbit or SMB on 1gbit), though I'm sure there are sporadic "thousand tiny files at once" cases. I have enough disks to get enough space (for now) with mirrors, but more space is more space. Then again, more IOPS never hurts either. I also have 2x500GB NVME SSDs I'll need to consider the best use for, and a 300GB SAS SSD that Dell tossed into the server.

Tempted to just go with 7x mirrored pairs, one hot spare. As for the SSDs, uh. To be considered and perhaps benchmarked.

e: On a related note, I'm not overly impressed with the two PCIe slots in this server. There's fortunately also an OCP connector, but I'd happily trade that for more plain PCIe slots; there's plenty of space for them. Maybe even an NVME slot or three on the vast open expanses of motherboard. As is, NVME-on-PCIe-card + external SAS HBA + 10gbit NIC means I have zero PCIe connectivity left. Oh well, I assume the internal SAS controller eats a fair few lanes, and I don't know how many you get to play with on a modern Xeon.

e2: The 15 disks are not a hard limit or an ideal number, it's just what our vendor had in stock. There's space for 20.

Computer viking fucked around with this message at 10:56 on May 31, 2022

MJP
Jun 17, 2007

Are you looking at me Senpai?

Grimey Drawer
What's the zeitgeist goon opinion on NAS boxes with gig ethernet, two bays, regular old SMB support for doing local Windows file sharing, and maybe decent media transcoding? I have an old Buffalo NAS that's been a workhorse but is definitely slow at file transfers and a real kludge of a user interface.

Extra bonus if it lets me just drop in the two drives I have now, currently in RAID 1.

Alternatively, are there decent PCIe RAID controllers for regular old Windows desktops? I have the physical room for the drives in my machine, might as well cut out the middleman.

Rexxed
May 1, 2010

Dis is amazing!
I gotta try dis!

Synology would be my go-to for a small NAS and they have some 2 bay models. It probably won't let you use your current drives without changing them over to its filesystem, though, which is going to reformat them. I'm not sure what would, unless it's entirely software RAID and you move it to a setup using the same one. I don't think it's worth getting a hardware RAID card for a desktop unless you just need more SATA ports and can put it in JBOD mode. I'd always choose to run things with ZFS if I could; even a single disk mirror would be my choice because it could be moved to any other OS install that runs ZFS.

Transcoding can be a bit more daunting than just serving files but it depends on what kind of transcoding you're doing. AFAIK you want a beefier CPU or a GPU that works with whatever media server you're using to handle it on the fly, but I'm not familiar with all of the options and I'm sure that a lot of posters here or in the plex/emby/whatever is popular now thread will know more.

Scruff McGruff
Feb 13, 2007

Jesus, kid, you're almost a detective. All you need now is a gun, a gut, and three ex-wives.

MJP posted:

What's the zeitgeist goon opinion on NAS boxes with gig ethernet, two bays, regular old SMB support for doing local Windows file sharing, and maybe decent media transcoding? I have an old Buffalo NAS that's been a workhorse but is definitely slow at file transfers and a real kludge of a user interface.

Extra bonus if it lets me just drop in the two drives I have now, currently in RAID 1.

Alternatively, are there decent PCIe RAID controllers for regular old Windows desktops? I have the physical room for the drives in my machine, might as well cut out the middleman.

If you're storing/collecting media files my only caution would be that you'd be surprised how fast you'll fill up drives depending on what you're storing and how, so two bays might be more limiting than expected. But otherwise like Rexxed said, the Synology boxes seem to work very well as a user-friendly and easy to maintain solution.

For a desktop, I would again agree with Rexxed, hardware RAID is effectively dead, don't bother with it. You can definitely add an HBA expander card in IT mode pretty easily to add more drives than your motherboard supports (I recommend theartofserver for both his YouTube channel containing everything you need to know and his ebay store for quality parts and support. Very cool dude) but if you have enough SATA ports already then you don't really need one. I can't speak to the feasibility of ZFS in Windows, though I'm sure others here could speak to that at length. I would expect the preferred method would be to flip the OS by running a Linux distro that supports all the ZFS features and then have a Windows VM inside that for your regular desktop needs.

mobby_6kl
Aug 9, 2009

by Fluffdaddy
I'll second Synology. Even my ancient one is plenty fast at file transfers, and is super easy to set up. At the same time, at least with the + models, you can make it do anything you want. The newer ones support hardware transcoding.

I think Scruff McGruff is right, though, about 2-bay ones being limiting; if you're storing movies or shooting RAW photos or videos, it'll fill up in no time. And with only two drives you're kind of stuck: you'll have to get two much bigger drives and somehow transfer the data, and then you're left with two extra drives instead of just popping in a fresh disk. You could get a slightly used four-bay model for the price of a new two-bay one if that's a concern.

Klyith
Aug 3, 2007

GBS Pledge Week

Generic Monk posted:

is there any other real difference between reds and say, the white label drives you get from shucking? the only things i'm aware of is the warranty and that the firmware is tweaked to not aggressively spin the drive down when not in use; is that it?

Besides the 3.3v thing, one thing NAS & server drives do differently is handle read errors: the firmware gives up after a couple of attempts and reports an error, while standard desktop/laptop drive firmware will stall and keep retrying way longer. The theory being that in a NAS there should be RAID or something else providing redundancy, so failing fast is better, while normal drives are usually *not* redundant, so the drive should make a heroic effort to recover the data.

I dunno what the firmware of white-label WD portable drives does. I could see WD going either way.

In a high-performance NAS this matters, in a home job with one or just a few users it doesn't. Though I dunno, maybe ZFS will start marking a whole drive as bad when it stops responding for 60 seconds to deal with a bad read.
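If you want to see what a given drive does, SCT Error Recovery Control (where the drive supports it) can be queried and set via smartmontools; a minimal sketch, device name hypothetical:
code:
smartctl -l scterc /dev/sda          # show current read/write recovery limits
smartctl -l scterc,70,70 /dev/sda    # cap recovery at 7 seconds (units of 100 ms), NAS-style fail-fast
smartctl -l scterc,0,0 /dev/sda      # remove the cap, desktop-style "keep trying"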

MJP
Jun 17, 2007

Are you looking at me Senpai?

Grimey Drawer

Scruff McGruff posted:

If you're storing/collecting media files my only caution would be that you'd be surprised how fast you'll fill up drives depending on what you're storing and how, so two bays might be more limiting than expected. But otherwise like Rexxed said, the Synology boxes seem to work very well as a user-friendly and easy to maintain solution.

For a desktop, I would again agree with Rexxed, hardware RAID is effectively dead, don't bother with it. You can definitely add an HBA expander card in IT mode pretty easily to add more drives than your motherboard supports (I recommend theartofserver for both his YouTube channel containing everything you need to know and his ebay store for quality parts and support. Very cool dude) but if you have enough SATA ports already then you don't really need one. I can't speak to the feasibility of ZFS in Windows, though I'm sure others here could speak to that at length. I would expect the preferred method would be to flip the OS by running a Linux distro that supports all the ZFS features and then have a Windows VM inside that for your regular desktop needs.

It's more that my regular old desktop PC has plenty of physical space for spinning disks and I'm not looking to pack-rat, just to have around 4-6TB of regular old storage for my wife and me, which then gets backed up to Acronis.

The transcoding requirement is more of a "hey, this would be nice to be able to stream stuff into the living room" but if a desktop can do it, I'd be happy to not have one extra network endpoint to secure.

Could I just get away with a Windows Storage Spaces pool of two or three drives and be done with it? I don't want to make an extra hobby of managing storage so if it's a schlep I'll just go with Synology, but I figure I should use what I have first.

Aware
Nov 18, 2003
You can definitely just use Storage Spaces, though my last attempt many years ago found it to perform terribly, for write speed at least. A lot of people seem to use StableBit software on Windows for this instead, but I can't comment much as I haven't used it.

BlankSystemDaemon
Mar 13, 2009



Computer viking posted:

Speaking of mirrors and raidz. I'm setting up a replacement file server at work, with 15x 16TB drives. Last time I did this, I had the impression that using mirrors was also preferable because it makes resilvering way quicker. Is that still a thing, or is resilvering a drive in raidz quick enough in practice?

I'm debating mirrors vs some level of raidz. The IO loads here are usually fairly nice; mostly reading and writing a single large file at a time (over NFS on 10gbit or SMB on 1gbit), though I'm sure there are sporadic "thousand tiny files at once" cases. I have enough disks to get enough space (for now) with mirrors, but more space is more space. Then again, more IOPS never hurts either. I also have 2x500GB NVME SSDs I'll need to consider the best use for, and a 300GB SAS SSD that Dell tossed into the server.

Tempted to just go with 7x mirrored pairs, one hot spare. As for the SSDs, uh. To be considered and perhaps benchmarked.

e: On a related note, I'm not overly impressed with the two PCIe slots in this server. There's fortunately also an OCP connector, but I'd happily trade that for more plain PCIe slots; there's plenty of space for them. Maybe even an NVME slot or three on the vast open expanses of motherboard. As is, NVME-on-PCIe-card + external SAS HBA + 10gbit NIC means I have zero PCIe connectivity left. Oh well, I assume the internal SAS controller eats a fair few lanes, and I don't know how many you get to play with on a modern Xeon.

e2: The 15 disks are not a hard limit or an ideal number, it's just what our vendor had in stock. There's space for 20.
Well, raidz resilver is limited by the speed at which the CPU can do Reed-Solomon P (+Q, +R) - but if your CPU has AVX2 and you're using an implementation of OpenZFS that's new enough, the maths has been vectorized, and even without AVX2 it's still done using SIMD via SSE, so in practice you're usually limited by disk read speed rather than the CPU.
The other bottleneck can be the checksum algorithm; if you're using fletcher and your CPU supports SHA256 in hardware (which some newer Xeons do), you should absolutely be using that instead.

Striped mirrors can still be faster because they're almost guaranteed to be doing streaming I/O from one part of the mirror to the other, whereas there are scenarios where raidz resilver will look way more like random I/O - but all of that will depend on how your disks have been written to, which can be usecase-optimized to a certain extent.

If you have 18 disks in 2x raidz3, there's room for a couple hot-spares.
Alternatively, you can use draid as the entire point of that is to speed up raidz rebuilds by having spare blocks distributed across the vdev.

One option for the two NVMe SSDs is to use them with allocation classes as documented in zpoolconcepts(7), setting recordsize as well as special_small_blocks (documented in zfsprops(7)), so that the metadata as well as anything smaller than the threshold gets written to the SSDs - my recommendation would be that anything below the native sector size of the spinning rust should go there, but you can choose anything up to one power of 2 below your recordsize.

The big thing to remember is that the special vdevs used by allocation classes are not cache disks. The metadata and small blocks aren't stored anywhere else, irrespective of whether you're using it with striped mirrors or raidz.

EDIT: Special vdevs are also one of the only ways of making dedup bearable.
Although if you're doing dedup, you'll want to use zdb -S to simulate building a DDT (or do some napkin maths, with 80 bytes per on-disk LBA), because I'm pretty sure 500GB SSDs are too small for metadata+smallblocks+dedup.
You'll also want to have sha256 checksumming and a CPU or QuickAssist daughterboard that can offload it.

BlankSystemDaemon fucked around with this message at 11:48 on Jun 1, 2022
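A minimal sketch of the allocation-class setup and dedup-table simulation described above; the pool and device names are hypothetical and the threshold is just an example:
code:
# mirrored special vdev for metadata and small blocks
zpool add tank special mirror nvd0 nvd1
# anything at or below 16K goes to the SSDs; the bulk data keeps a 1M recordsize
zfs set recordsize=1M tank
zfs set special_small_blocks=16K tank
# simulate the dedup table before enabling dedup for real
zdb -S tank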

Computer viking
May 30, 2011
Now with less breakage.

BlankSystemDaemon posted:

Well, raidz resilver is limited by the speed at which the CPU can do Reed-Solomon P (+Q, +R) - but if your CPU has AVX2 and you're using an implementation of OpenZFS that's new enough, the maths has been vectorized, and even without AVX2 it's still done using SIMD via SSE, so in practice you're usually limited by disk read speed rather than the CPU.
The other bottleneck can be the checksum algorithm; if you're using fletcher and your CPU supports SHA256 in hardware (which some newer Xeons do), you should absolutely be using that instead.

Striped mirrors can still be faster because they're almost guaranteed to be doing streaming I/O from one part of the mirror to the other, whereas there are scenarios where raidz resilver will look way more like random I/O - but all of that will depend on how your disks have been written to, which can be usecase-optimized to a certain extent.

If you have 18 disks in 2x raidz3, there's room for a couple hot-spares.
Alternatively, you can use draid as the entire point of that is to speed up raidz rebuilds by having spare blocks distributed across the vdev.

One option for the two NVMe SSDs is to use them with allocation classes as documented in zpoolconcepts(7), setting recordsize as well as special_small_blocks (documented in zfsprops(7)), so that the metadata as well as anything smaller than the threshold gets written to the SSDs - my recommendation would be that anything below the native sector size of the spinning rust should go there, but you can choose anything up to one power of 2 below your recordsize.

The big thing to remember is that the special vdevs used by allocation classes are not cache disks. The metadata and small blocks aren't stored anywhere else, irrespective of whether you're using it with striped mirrors or raidz.

EDIT: Special vdevs are also one of the only ways of making dedup bearable.
Although if you're doing dedup, you'll want to use zdb -S to simulate building a DDT (or do some napkin maths, with 80 bytes per on-disk LBA), because I'm pretty sure 500GB SSDs are too small for metadata+smallblocks+dedup.
You'll also want to have sha256 checksumming and a CPU or QuickAssist daughterboard that can offload it.

Right - that's quite useful; thanks.

It's a xeon silver 4309Y, which is listed on ARK as having AVX-512 and AESNI, so I think I'm good there. Also, IO is realistically not the bottleneck for most things we do - if it can deliver a few hundred MB/sec in most uses, that's more than enough. Doing so efficiently never hurts, of course. And thanks for the sha256 tip, that's a smart use of CPU features I wouldn't have thought of. Also, it's good to hear the resilver should ... mostly be fast enough on raidz? I'm not entirely comfortable with the redundancy pattern of a large pack of two-disk mirrors, nor the 50% "waste" - though the performance is at least good. Still, I won't exactly mind using something else.

2x 9 disk raidz3 sounds very reasonable, though I'll have to wait for the last few disks; I'll test with 2x 7 disk raidz1 for now and prepare to nuke and reconfigure before putting it to use.
I like the idea of using the NVME disks as a special vdev - presumably an NVME mirror should be resilient enough. (And if it goes suddenly and completely bad, I guess that's why I set up the tape backup). I do remember hearing the BSD Now guys talk about the "redirect small writes to SSD" idea, but somehow didn't consider doing it here.

Maybe the 300GB SSD would work as an L2ARC, if it's decently fast? It's a role where it's fine that it's non-redundant, at least.

I think I'll just hold off the dedup. I've briefly tested it, and it doesn't really do a lot on our data; there's very little redundancy to work with.

Computer viking fucked around with this message at 12:37 on Jun 1, 2022

Brain Issues
Dec 16, 2004

lol
My Synology keeps telling me that I have a drive failing, but when I look at S.M.A.R.T log details everything is OK. This most recent one is a 14tb and I've removed it, and have been running an extended S.M.A.R.T test on Hard Disk Sentinel Pro for 12 hours now with no errors.

Anyone know what's the deal here, is there somewhere else that I can look on my Synology that will actually tell me WHY/HOW my drives are failing?

I'm getting really tired of 1. spending money. 2. replacing drives that are potentially still good and 3. resilvering a 100TB array.

I've had 5 different drives do this same thing now over the last 3.5 years. All purchased new, all with less than 30k hours on them when they "failed". This most recent one only has 15k hours on it.

BlankSystemDaemon
Mar 13, 2009



Computer viking posted:

Right - that's quite useful; thanks.

It's a xeon silver 4309Y, which is listed on ARK as having AVX-512 and AESNI, so I think I'm good there. Also, IO is realistically not the bottleneck for most things we do - if it can deliver a few hundred MB/sec in most uses, that's more than enough. Doing so efficiently never hurts, of course. And thanks for the sha256 tip, that's a smart use of CPU features I wouldn't have thought of. Also, it's good to hear the resilver should ... mostly be fast enough on raidz? I'm not entirely comfortable with the redundancy pattern of a large pack of two-disk mirrors, nor the 50% "waste" - though the performance is at least good. Still, I won't exactly mind using something else.

2x 9 disk raidz3 sounds very reasonable, though I'll have to wait for the last few disks; I'll test with 2x 7 disk raidz1 for now and prepare to nuke and reconfigure before putting it to use.
I like the idea of using the NVME disks as a special vdev - presumably an NVME mirror should be resilient enough. (And if it goes suddenly and completely bad, I guess that's why I set up the tape backup). I do remember hearing the BSD Now guys talk about the "redirect small writes to SSD" idea, but somehow didn't consider doing it here.

Maybe the 300GB SSD would work as an L2ARC, if it's decently fast? It's a role where it's fine that it's non-redundant, at least.

I think I'll just hold off the dedup. I've briefly tested it, and it doesn't really do a lot on our data; there's very little redundancy to work with.
SHA256 is part of what Intel calls SHA-NI in the architecture manuals, but it's not listed on ARK unfortunately.
If you boot the system, have a look at the CPUID flags (usually reported as part of dmesg, on a Unix-like); since it's a Xeon Scalable 3rd Gen, it should be supported according to everything I know.

In the BSDs, it's available from /var/run/dmesg.boot under the "Features" section - here's a snip of an example (I bolded the relevant bit):
pre:
Structured Extended Features=0x2294e283<FSGSBASE,TSCADJ,SMEP,ERMS,NFPUSG,MPX,PQE,RDSEED,SMAP,CLFLUSHOPT,PROCTRACE,SHA>
It can also show up as aesni0: <AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS,SHA1,SHA256> in the BSDs' dmesg - it can either be the aesni(4) driver or the ossl(4) driver, which are hardware-accelerated software implementations and hand-optimized assembly instructions from OpenSSL respectively, available in crypto(9) via the crypto(4) driver.
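A quick way to grep for the SHA extensions, sketched for both FreeBSD and Linux (output format varies by OS and kernel version):
code:
# FreeBSD: the SHA extensions show up in the boot-time CPU feature flags
grep SHA /var/run/dmesg.boot
# Linux: the corresponding /proc/cpuinfo flag is sha_ni
grep -m1 -o sha_ni /proc/cpuinfo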

Yes, you always want mirrored special devs - n-way mirroring is supported, so if you can fit more NVMe disks in it, you can use them too.

As for 300GB L2ARC, that might not be the most efficient use of your memory. What's your working set (ie. what do you get from zfs-stats -A, if you're on one of the BSDs?), and how much system memory do you have?
L2ARC requires mapping LBAs from the SSD into memory, and every map of every LBA takes up about 80 bytes, so as an example, a 256GB SSD needs ~37GB of memory to map all of its LBAs, and that's taking away memory resources not just from the ARC (which is orders of magnitude faster, compared to a SAS SSD), but also from user-space (ie. sambad, nfsd, and whatever else runs on the server).
(80*500118192)/(1024*1024*1024)=37.3GB, but it's not quite 80 bytes per LBA, I just don't remember the exact figure.

Yeah, dedup is almost never a good idea. Unless you've got virtualization of lots of very similar systems where you clone from one dataset, you're not gonna see the return on investment.
It was more to mention that the special vdev can also provide the dedicated dedup table device that some people have been wishing for, to even begin to make use of dedup.


Brain Issues posted:

My Synology keeps telling me that I have a drive failing, but when I look at S.M.A.R.T log details everything is OK. This most recent one is a 14tb and I've removed it, and have been running an extended S.M.A.R.T test on Hard Disk Sentinel Pro for 12 hours now with no errors.

Anyone know what's the deal here, is there somewhere else that I can look on my Synology that will actually tell me WHY/HOW my drives are failing?

I'm getting really tired of 1. spending money. 2. replacing drives that are potentially still good and 3. resilvering a 100TB array.

I've had 5 different drives do this same thing now over the last 3.5 years. All purchased new, all with less than 30k hours on them when they "failed". This most recent one only has 15k hours on it.
Synology is likely looking not just at the S.M.A.R.T data, but also the actual READ/WRITE requests sent to the disks by the kernel - and if any of them return errors, it's a pretty good indicator that there's something wrong, especially if it's a non-transient issue.
I don't know specifics, though - except that in FreeBSD, it'd be reported by CAM in the system log; that's not going to help you much, except as a hint to look at the system log. :shrug:

The only combination of filesystem+OS that'll reliably tell you whether a disk is bad is ZFS on FreeBSD or an Illumos-derivative.
All other systems make all sorts of assumptions that make it more guesswork.

BlankSystemDaemon fucked around with this message at 14:54 on Jun 1, 2022

wolrah
May 8, 2006
what?

Brain Issues posted:

My Synology keeps telling me that I have a drive failing, but when I look at S.M.A.R.T log details everything is OK. This most recent one is a 14tb and I've removed it, and have been running an extended S.M.A.R.T test on Hard Disk Sentinel Pro for 12 hours now with no errors.
I don't know entirely how Synology works, but I'm guessing if it's flagging a failure based on something other than SMART data then it's seeing some actual data errors. If you can get a shell on the thing you might be able to see the actual error in the kernel log.

Google released a study about SMART a while back and their results showed that SMART tests have a fairly high false negative rate, but a near zero false positive rate. That is, you only have something like a 50/50 chance at SMART even giving you a warning before the drive fails entirely, but if it does then the drive is almost certainly on its way out. In other words a lack of SMART test failures does not mean the drive is good.

Brain Issues
Dec 16, 2004

lol
Ok - assuming that these failures are all legitimate, is that a normal failure rate? I'd say I'm getting on average, ~20,000 hours of use out of these drives before they're being reported as "failed" by my Synology. With power cycles in the low teens.

BlankSystemDaemon
Mar 13, 2009



Brain Issues posted:

Ok - assuming that these failures are all legitimate, is that a normal failure rate? I'd say I'm getting on average, ~20,000 hours of use out of these drives before they're being reported as "failed" by my Synology. With power cycles in the low teens.
I have no idea what heuristics they're using for judging when a drive has failed.
20k hours seems a weird point though, when you're dealing with bathtub curves that mean the drive either fails in the first hundred or so hours, or can last for up to a decade.

Find out exactly what's causing it to be marked as failed.

Brain Issues
Dec 16, 2004

lol

BlankSystemDaemon posted:

Find out exactly what's causing it to be marked as failed.

"But how?" The reason I bought a Synology instead of a homebrew solution was that it was supposed to be plug and play. I've exhausted all the GUI tools and prompts for disk failure logging and have no idea where to even begin to dig deeper.



I've since removed the disk but when I visited HDD/SSD > Health Info, all the numbers seemed OK. Removing the drive and running it through Hard Disk Sentinel also presents a PASS for S.M.A.R.T.

Brain Issues fucked around with this message at 14:58 on Jun 1, 2022

BlankSystemDaemon
Mar 13, 2009



Brain Issues posted:

"But how?" The reason I bought a Synology instead of a homebrew solution was that it was supposed to be plug and play. I've exhausted all the GUI tools and prompts for disk failure logging and have no idea where to even begin to dig deeper.



I've since removed the disk but when I visited HDD/SSD > Health Info, all the numbers seemed OK. Removing the drive and running it through Hard Disk Sentinel also presents a PASS for S.M.A.R.T.
I see you found the log center, so if that doesn't tell you anything more, you need to do this:
Enable the SSH server somewhere in the Synology settings, grab PuTTY from Simon's website (or install an OpenSSH client for Windows), connect via ssh to the Synology, and then cd /var/log/ and cat each file to look for things that stand out as errors.
You can also cat /path/to/file | nc termbin.com 9999 and post the link here, and someone can look through the log for you.
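Roughly, it goes something like this; log file names vary between DSM versions, so treat the paths as examples:
code:
# after enabling SSH (Control Panel > Terminal & SNMP in recent DSM versions)
ssh your-admin-user@your-nas
cd /var/log
grep -iE 'error|fail' messages | tail -n 50     # recent kernel/driver complaints
cat messages | nc termbin.com 9999              # optional: paste the whole log for others to read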

Getting a "PASS" from S.M.A.R.T can involve looking just at the attributes, which by themselves don't tell the whole story. If it's failing on a short test, that's probably a pretty good indicator that something is wrong with it.

EDIT: Also, loving :laffo: at the disclaimer about exporting cryptographic software from the US - that site is old, because that munitions law was passed in 1992 and had its teeth kicked out by the 2000s after MD5, DES, and all of Kerberos had been independently implemented by people outside the US, as well as Phil Zimmermann's publication of PGP in 1991 and its source code in book format in 1995.

BlankSystemDaemon fucked around with this message at 15:12 on Jun 1, 2022

Klyith
Aug 3, 2007

GBS Pledge Week

wolrah posted:

Google released a study about SMART a while back and their results showed that SMART tests have a fairly high false negative rate, but a near zero false positive rate. That is, you only have something like a 50/50 chance at SMART even giving you a warning before the drive fails entirely, but if it does then the drive is almost certainly on its way out. In other words a lack of SMART test failures does not mean the drive is good.

That Google study was a long time ago. Since then Backblaze has put out a lot of info about their drive failures, and some combo of better modern drives and knowing which stats to pay attention to does a lot better than 50/50.

OTOH all of their analysis is geared at predicting failures based on SMART data, so it's much more useful for the question "my drive has SMART errors, are they important?" and not so much "is my no-error drive really bad?"


Brain Issues posted:

Ok - assuming that these failures are all legitimate, is that a normal failure rate? I'd say I'm getting on average, ~20,000 hours of use out of these drives before they're being reported as "failed" by my Synology. With power cycles in the low teens.

That's an elevated rate of failure. In general you should expect drives to last at least 4-5 years. (And then rapid decline after 6 years.)

https://www.backblaze.com/blog/wp-content/uploads/2013/11/drive-survival-chart-.jpg

Were all these drives in the same bay of the box? And have you tried something like pulling the "failed" drive, doing a full format wipe on a PC, then reinstalling it?

edit:

BlankSystemDaemon posted:

when you're dealing with bathtub curves that mean the drive either fails in the first hundred or so hours, or can last for up to a decade.
Drives don't really have a bathtub curve these days. Or at least the ones Backblaze gets don't. Year 1 & 2 have a slight elevation over year 3, but it's not really bathtub-like. Drive failure is now a hockey stick.

Klyith fucked around with this message at 15:19 on Jun 1, 2022

Computer viking
May 30, 2011
Now with less breakage.

BlankSystemDaemon posted:

As for 300GB L2ARC, that might not be the most efficient use of your memory. What's your working set (ie. what do you get from zfs-stats -A, if you're on one of the BSDs?), and how much system memory do you have?
L2ARC requires mapping LBAs from the SSD into memory, and every map of every LBA takes up about 80 bytes, so as an example, a 256GB SSD needs ~37GB of memory to map all of its LBAs, and that's taking away memory resources not just from the ARC (which is orders of magnitude faster, compared to a SAS SSD), but also from user-space (ie. sambad, nfsd, and whatever else runs on the server).
(80*500118192)/(1024*1024*1024)=37.3GB, but it's not quite 80 bytes per LBA, I just don't remember the exact figure.

Ah, I didn't realize it was quite that expensive. I'll just use it as a boot drive, then.

As for the working set, the current server has just 32GB of RAM, and the new one 64GB. I'd like a lot more, but realistically I don't think it'll be a huge problem?
code:
------------------------------------------------------------------------
ZFS Subsystem Report                            Wed Jun  1 16:21:04 2022
------------------------------------------------------------------------

ARC Summary: (HEALTHY)
        Memory Throttle Count:                  0

ARC Misc:
        Deleted:                                208.96  m
        Recycle Misses:                         0
        Mutex Misses:                           204.19  k
        Evict Skips:                            72.51   k

ARC Size:                               55.94%  15.66   GiB
        Target Size: (Adaptive)         62.55%  17.51   GiB
        Min Size (Hard Limit):          12.50%  3.50    GiB
        Max Size (High Water):          8:1     28.00   GiB

ARC Size Breakdown:
        Recently Used Cache Size:       18.06%  3.16    GiB
        Frequently Used Cache Size:     81.94%  14.35   GiB

ARC Hash Breakdown:
        Elements Max:                           7.08    m
        Elements Current:               54.04%  3.83    m
        Collisions:                             201.20  m
        Chain Max:                              11
        Chains:                                 972.67  k

BlankSystemDaemon
Mar 13, 2009



Computer viking posted:

Ah, I didn't realize it was quite that expensive. I'll just use it as a boot drive, then.

As for the working set, the current server has just 32GB of RAM, and the new one 64GB. I'd like a lot more, but realistically I don't think it'll be a huge problem?
code:
------------------------------------------------------------------------
ZFS Subsystem Report                            Wed Jun  1 16:21:04 2022
------------------------------------------------------------------------

ARC Summary: (HEALTHY)
        Memory Throttle Count:                  0

ARC Misc:
        Deleted:                                208.96  m
        Recycle Misses:                         0
        Mutex Misses:                           204.19  k
        Evict Skips:                            72.51   k

ARC Size:                               55.94%  15.66   GiB
        Target Size: (Adaptive)         62.55%  17.51   GiB
        Min Size (Hard Limit):          12.50%  3.50    GiB
        Max Size (High Water):          8:1     28.00   GiB

ARC Size Breakdown:
        Recently Used Cache Size:       18.06%  3.16    GiB
        Frequently Used Cache Size:     81.94%  14.35   GiB

ARC Hash Breakdown:
        Elements Max:                           7.08    m
        Elements Current:               54.04%  3.83    m
        Collisions:                             201.20  m
        Chain Max:                              11
        Chains:                                 972.67  k
Yeah, L2ARC is expensive, memory-wise - but if you've got a working set that's several TB large, and don't have one of the high-end Xeon Platinum CPUs that can take multiple TB of memory, it's the only way to have any caching.

You cut off the statistics I was looking for, here's an example from my FreeBSD development laptop:
pre:
ARC Eviction Statistics:
        Evicts Total:                           146684496384
        Evicts Eligible for L2:         89.59%  131427413504
        Evicts Ineligible for L2:       10.40%  15257082880
        Evicts Cached to L2:                    0

ARC Efficiency
        Cache Access Total:                     101096844
        Cache Hit Ratio:                96.89%  97957752
        Cache Miss Ratio:               3.10%   3139092
        Actual Hit Ratio:               96.64%  97701114

        Data Demand Efficiency:         93.70%
        Data Prefetch Efficiency:       8.29%

        CACHE HITS BY CACHE LIST:
          Most Recently Used (mru):     22.51%  22052217
          Most Frequently Used (mfu):   77.22%  75648897
          MRU Ghost (mru_ghost):        0.20%   205660
          MFU Ghost (mfu_ghost):        0.18%   179643

        CACHE HITS BY DATA TYPE:
          Demand Data:                  31.54%  30901364
          Prefetch Data:                0.04%   41190
          Demand Metadata:              67.65%  66270446
          Prefetch Metadata:            0.76%   744752

        CACHE MISSES BY DATA TYPE:
          Demand Data:                  66.09%  2074642
          Prefetch Data:                14.49%  455129
          Demand Metadata:              15.02%  471564
          Prefetch Metadata:            4.38%   137757

CopperHound
Feb 14, 2012

Now you got me curious about l2arc memory usage, do any of these numbers in arc_summary say how much of my arc is eaten up by it?
code:
------------------------------------------------------------------------
ZFS Subsystem Report                            Wed Jun 01 08:46:55 2022
FreeBSD 12.2-RELEASE-p11                                   zpl version 5
Machine: [sekret]                                       spa version 5000

ARC status:                                                      HEALTHY
        Memory throttle count:                                         0

ARC size (current):                                    80.4 %   19.8 GiB
        Target size (adaptive):                        80.5 %   19.8 GiB
        Min size (hard limit):                         4.0 %  1021.6 MiB
        Max size (high water):                           24:1   24.7 GiB
        Most Frequently Used (MFU) cache size:         98.5 %   17.6 GiB
        Most Recently Used (MRU) cache size:            1.5 %  277.4 MiB
        Metadata cache size (hard limit):              75.0 %   18.5 GiB
        Metadata cache size (current):                 19.4 %    3.6 GiB
        Dnode cache size (hard limit):                 10.0 %    1.8 GiB
        Dnode cache size (current):                    48.2 %  912.9 MiB

ARC hash breakdown:
        Elements max:                                               6.1M
        Elements current:                              59.0 %       3.6M
        Collisions:                                                 2.4G
        Chain max:                                                    12
        Chains:                                                   900.3k

ARC misc:
        Deleted:                                                    1.9G
        Mutex misses:                                             124.3k
        Eviction skips:                                            46.2M

ARC total accesses (hits + misses):                                22.5G
        Cache hit ratio:                               90.9 %      20.4G
        Cache miss ratio:                               9.1 %       2.1G
        Actual hit ratio (MFU + MRU hits):             90.5 %      20.3G
        Data demand efficiency:                        75.7 %       1.3G
        Data prefetch efficiency:                      20.1 %       2.1G

Cache hits by cache type:
        Most frequently used (MFU):                    97.1 %      19.8G
        Most recently used (MRU):                       2.5 %     514.7M
        Most frequently used (MFU) ghost:               0.4 %      72.7M
        Most recently used (MRU) ghost:                 0.6 %     114.6M

Cache hits by data type:
        Demand data:                                    4.7 %     957.3M
        Demand prefetch data:                           2.1 %     420.3M
        Demand metadata:                               87.9 %      17.9G
        Demand prefetch metadata:                       5.4 %       1.1G

Cache misses by data type:
        Demand data:                                   15.0 %     307.2M
        Demand prefetch data:                          81.3 %       1.7G
        Demand metadata:                                2.1 %      43.5M
        Demand prefetch metadata:                       1.6 %      32.3M

DMU prefetch efficiency:                                          654.6M
        Hit ratio:                                     43.2 %     282.8M
        Miss ratio:                                    56.8 %     371.8M

L2ARC status:                                                   DEGRADED
        Low memory aborts:                                            75
        Free on write:                                              4.0M
        R/W clashes:                                                2.9k
        Bad checksums:                                                31
        I/O errors:                                                    0

L2ARC size (adaptive):                                         427.8 GiB
        Compressed:                                    99.2 %  424.6 GiB
        Header size:                                    0.1 %  290.8 MiB

L2ARC breakdown:                                                    2.1G
        Hit ratio:                                     71.2 %       1.5G
        Miss ratio:                                    28.8 %     591.0M
        Feeds:                                                     12.5M

L2ARC writes:
        Writes sent:                                    100 %      11.7M

L2ARC evicts:
        Lock retries:                                               3.0k
        Upon reading:                                                470
And uhh yeah, running this is my first time seeing that the l2arc is degraded.

Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!

BlankSystemDaemon posted:

L2ARC requires mapping LBAs from the SSD into memory, and every map of every LBA takes up about 80 bytes, so as an example, a 256GB SSD needs ~37GB of memory to map all of its LBAs, and that's taking away memory resources not just from the ARC (which is orders of magnitude faster, compared to a SAS SSD), but also from user-space (ie. sambad, nfsd, and whatever else runs on the server).
(80*500118192)/(1024*1024*1024)=37.3GB, but it's not quite 80 bytes per LBA, I just don't remember the exact figure.
That seems not entirely right. By that claim, my current 124GB L2ARC would eat up all of my memory, but it's far from that.

AFAIK, it's 80 bytes per ZFS record, not LBA.

code:
ARC status:                                                      HEALTHY
        Memory throttle count:                                         0

ARC size (current):                                    89.8 %    9.4 GiB
        Target size (adaptive):                        90.1 %    9.5 GiB
        Min size (hard limit):                          4.6 %  498.3 MiB
        Max size (high water):                           21:1   10.5 GiB
        Most Frequently Used (MFU) cache size:         96.3 %    8.5 GiB
        Most Recently Used (MRU) cache size:            3.7 %  337.2 MiB

L2ARC size (adaptive):                                         151.7 GiB
        Compressed:                                    82.0 %  124.4 GiB
        Header size:                                    0.3 %  430.2 MiB
--edit:
^^^ My header allocation seems higher because I'm mainly caching metadata of all volumes and data of ZVOLs, which obviously have a lower average record size.

Computer viking
May 30, 2011
Now with less breakage.

BlankSystemDaemon posted:

Yeah, L2ARC is expensive, memory-wise - but if you've got a working set that's several TB large, and don't have one of the high-end Xeon Platinum CPUs that can take multiple TB of memory, it's the only way to have any caching.

You cut off the statistics I was looking for, here's an example from my FreeBSD development laptop:

No, that's literally everything. Looking at it, I guess I should have used -AE or even -AEL? (The current server has an L2ARC set up.)
Of course, I did just reboot to cold-swap in a new drive, so it'll take a bit to get representative numbers again.

e: And sorry for the repeated editing of this post, I should draft before posting.

Computer viking fucked around with this message at 23:21 on Jun 1, 2022

BlankSystemDaemon
Mar 13, 2009



CopperHound posted:

Now you got me curious about l2arc memory usage, do any of these numbers in arc_summary say how much of my arc is eaten up by it?

[SNIP]

And uhh yeah, running this is my first time seeing that the l2arc is degraded.
It looks to me like you're missing some numbers, because you're not seeing a section called ARC Efficiency - at least that's part of zfs-stats, so I don't know why it's missing in arc_summary (whatever that is).

Does the L2ARC degraded not show up in zpool status?

Combat Pretzel posted:

That seems not entirely right. By that claim, my current 124GB L2ARC would eat up all of my memory, but it's far from that.

AFAIK, it's 80 bytes per ZFS record, not LBA.

[SNIP]
--edit:
^^^ My header allocation seems higher because I'm mainly caching metadata of all volumes and data of ZVOLs, which obviously have a lower average record size.
You're absolutely right, it's per-record. It makes no sense that it'd be per-LBA.

The mathematics is a bit more complicated than that though - I haven't found it in the source code, but according to one of the ZFS and FreeBSD developers who makes a living doing storage, the formula is:
pre:
(L2ARC size in kilobytes) / (typical recordsize -- or volblocksize -- in kilobytes) * 70 bytes = ARC header size in RAM
Sorry! orz
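Plugging in the numbers from earlier in the thread as a worked example (the 300GB SAS SSD as L2ARC, and an assumed 128K average recordsize):
code:
# (L2ARC size in KB) / (recordsize in KB) * 70 bytes
echo $(( 300 * 1024 * 1024 / 128 * 70 ))   # = 172032000 bytes, i.e. roughly 164 MiB of headers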

Computer viking posted:

No, that's literally everything. Looking at it, I guess I should have used -AE or even -AEL? (The current server has an L2ARC set up.)
Of course, I did just reboot to cold-swap in a new drive, so it'll take a bit to get representative numbers again.

e: And sorry for the repeated editing of this post, I should draft before posting.
zfs-stats -E doesn't exist in 1.4 so maybe it's hidden there. If not, the values come from sysctl(8) in the kstat.zfs OID, so you can calculate things yourself.

Get yourself to upgrading, friend - if you're running -STABLE and not release, you can benefit from using src/tools/build/beinstall.sh which installs into a boot environment and lets you reboot into it.
This combines well with the ability to switch boot environments in the FreeBSD standard loader - so even if everything is so hosed up that you cannot boot the system properly, the bootblock will remain untouched and you can get back to the exact environment you were in, and do rootcausing there.
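For reference, the basic bectl workflow looks something like this; the environment name is just an example:
code:
bectl list                      # show existing boot environments
bectl create pre-13.1-upgrade   # snapshot the current root before touching anything
bectl activate pre-13.1-upgrade # if the upgrade goes sideways, activate this and reboot into it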

I'm absolutely guilty of repeatedly editing my old posts. :sigh:

BlankSystemDaemon fucked around with this message at 23:26 on Jun 1, 2022

Computer viking
May 30, 2011
Now with less breakage.

For historical reasons, it's not set up for boot environments - I think this install started as a 10.x in 2015, and I haven't really bothered to try and retrofit it in there.

But yes, it's really about time to jump to 13.1.

BlankSystemDaemon
Mar 13, 2009



Computer viking posted:

For historical reasons, it's not set up for boot environments - I think this install started as a 10.x in 2015, and I haven't really bothered to try and retrofit it in there.

But yes, it's really about time to jump to 13.1.
Boot environments are worth it, if you ask me. I reboot once, and either everything works or I go back and fix it.

It's basically magic.


CopperHound
Feb 14, 2012

BlankSystemDaemon posted:

It looks to me like you're missing some numbers, because you're not seeing a section called ARC Efficiency - at least that's part of zfs-stats, so I don't know why it's missing in arc_summary (whatever that is).

Does the L2ARC degraded not show up in zpool status?
IDK what the difference between zfs-stats and arc_summary is, but

Some redditor posted:

I just realized that the zfs-stats command is missing from TrueNAS Core 12x, since it was present on FreeNAS 11.2x.
Haven't been able to find any info on why it's missing, however I believe the replacement for the zfs-stats command will be arc_summary and arcstat moving forward in TrueNAS systems. The zpool command is for managing pools, not the ARC/L2ARC or anything related to memory.

My zpool status doesn't list anything as degraded:
code:
  pool: pool
 state: ONLINE
  scan: scrub repaired 0B in 14:05:03 with 0 errors on Sun May 15 14:05:04 2022
config:

	NAME                                            STATE     READ WRITE CKSUM
	pool                                            ONLINE       0     0     0
	  raidz2-0                                      ONLINE       0     0     0
	    gptid/f705a0ef-5dce-11ec-bda2-002590decd63  ONLINE       0     0     0
	    gptid/f88eb502-5dce-11ec-bda2-002590decd63  ONLINE       0     0     0
	    gptid/d2b88424-5cbc-11ec-bda2-002590decd63  ONLINE       0     0     0
	    gptid/d4b42333-5cbc-11ec-bda2-002590decd63  ONLINE       0     0     0
	    gptid/d6339699-5cbc-11ec-bda2-002590decd63  ONLINE       0     0     0
	    gptid/d7221c2f-5cbc-11ec-bda2-002590decd63  ONLINE       0     0     0
	    gptid/d819727c-5cbc-11ec-bda2-002590decd63  ONLINE       0     0     0
	    gptid/d9215ba7-5cbc-11ec-bda2-002590decd63  ONLINE       0     0     0
	cache
	  gptid/a8022ab5-69ed-11ec-877e-002590decd63    ONLINE       0     0     0

errors: No known data errors

  • Reply