|
Hed posted:I'm looking to give my FreeNAS-11.2-U8 install some love. I have 5 6TB drives in a RAID-Z2. Just making sure that if I want to upgrade I need to buy 5 <newsize> drives and would have to power down and replace the drives one by one 5 times to get the new size? I just finished building a new server, re-used my 804. It seems like it's still one of the better cases. Not sure how many sata ports you have, but you could add the new drives up to your max ports. Then you don't have to power down for each swap.
|
# ? Apr 18, 2022 13:48 |
|
|
|
I'm not at all sure about RAID-Z, but when I replaced all the disks in my striped mirror vdev I swapped them one by one (power off, swap, power on), triggered a resilver each time, and when I was done swapping all four disks my pool size was automatically upgraded.
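For reference, the swap-and-resilver loop described above looks roughly like this on the command line. Pool and device names here are made up, and the FreeNAS/TrueNAS GUI will do most of this for you:

```shell
# Let the pool grow automatically once every member has been replaced.
zpool set autoexpand=on tank

# For each disk, one at a time: power down, swap the hardware, then:
zpool replace tank /dev/ada1 /dev/ada9   # resilver onto the new disk
zpool status tank                        # wait for the resilver to finish

# After the last disk: if the pool didn't grow on its own, nudge each device.
zpool online -e tank /dev/ada9
```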
|
# ? Apr 18, 2022 13:53 |
phosdex posted:I just finished building a new server, re-used my 804. It seems like it's still one of the better cases. withoutclass posted:I'm not at all sure of RaidZ but when I replaced all my disks in my striped mirror vdev I swapped one by one, power off/on, and then triggered a resilver and when I was done swapping all four disks my pool size was automatically upgraded. Won't work until raidz expansion is implemented, which should hopefully be in OpenZFS 3.0.
|
|
# ? Apr 18, 2022 14:52 |
|
BlankSystemDaemon posted:Won't work until raidz expansion is implemented, which should hopefully be in OpenZFS 3.0. I never enabled this intentionally, so it's possible an upgrade of TrueNAS turned it on. I don't even know where to check whether it's enabled.
|
# ? Apr 18, 2022 15:19 |
|
So would it work to shut down a FreeNAS server, boot into a different OS, and directly clone each drive to a new one? And then go back to FreeNAS and have it do a full check/scrub? Resilvering a full drive in parity-based RAID always seems like a slow process, because it's reconstructing every data block from parity, and you have to do one drive at a time. Direct copies would run at max write speed, plus you could do multiple drives at once. The copy operation wouldn't have 100% data validation the way a resilver does, but you'd still have all the parity data, so if any copy was flawed it would be caught (and likely repaired) right away during the scrub. Plus you'd still have the original drives.
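If you went that route, the offline clone plus verification pass might look something like this. Device names are hypothetical, and the old disks stay intact as a fallback:

```shell
# Boot a live OS with the pool exported/not imported.
# /dev/sdb = old 6TB member, /dev/sdc = its replacement.
dd if=/dev/sdb of=/dev/sdc bs=1M status=progress

# Repeat (in parallel, even) for each member, swap in the new disks, then:
zpool import tank
zpool scrub tank        # every record is checked against its checksum/parity
zpool status -v tank    # any blocks the copy mangled show up (and get repaired) here
```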
|
# ? Apr 18, 2022 15:55 |
|
BlankSystemDaemon posted:Won't work until raidz expansion is implemented, which should hopefully be in OpenZFS 3.0. They don't mean raidz expansion, they just mean being able to have more than one new drive physically installed per power cycle. They'd still need to resilver for each disk individually, just with fewer power cycles needed to swap drives between resilvers.
|
# ? Apr 18, 2022 16:52 |
|
At what point in the number of drives and ZFS redundancy level does an SSD write cache stop mattering, assuming the NAS is on a gigabit LAN with 1-2 concurrent clients? I ask because space is at a premium, and a proper hot-swappable vendor-made TrueNAS device fits into my remaining space / power budget better than a homemade Define 7 or even a Node/Silverstone. Since they're quite expensive I'm wondering what corners I can cut for my use case.
|
# ? Apr 18, 2022 18:11 |
|
Oh boy, that's a can of worms, but IOPS is probably the limiting factor that doesn't scale up without additional vdevs. Also, write cache only matters for synchronous writes, which you might not need or want for your use case. CopperHound fucked around with this message at 19:26 on Apr 18, 2022 |
# ? Apr 18, 2022 19:20 |
Shumagorath posted:At what point in the number of drives and ZFS redundancy level does an SSD write cache stop mattering, assuming the NAS is on a gigabit LAN with 1-2 concurrent clients? I ask because space is at a premium, and a proper hot-swappable vendor-made TrueNAS device fits into my remaining space / power budget better than a homemade Define 7 or even a Node/Silverstone. Since they're quite expensive I'm wondering what corners I can cut for my use case.

You have the Separate Intent Log, also known as the SLOG, and internally known as a log device (preferably two or more devices as a mirror). It temporarily but persistently (in case of power loss, which is why you want capacitor-backed SSDs) stores synchronous writes and rearranges them so they can be written sequentially, making them easier to read off the disk later.

The SLOG doesn't cache asynchronous writes, but those can be stored on a separate device too: you use a special vdev (which also stores the metadata, so again preferably two or more devices as a mirror) and set the per-dataset special_small_blocks property to whichever block size you want landing on the special vdev. Typically, the asynchronous writes you want there are small ones, i.e. anything that doesn't fill a whole recordsize.

Operations per second has always been the limiting factor for any RAID (or, indeed, for single-disk devices), not just ZFS. If you need database/SAP/enterprise levels of IOPS, you want NVMe instead: not only does it have more IOPS, it also has more queues. AHCI only has one queue, and you need NCQ or TCQ for reordering (the latter is for SCSI and SAS, but there's still only one queue). BlankSystemDaemon fucked around with this message at 19:36 on Apr 18, 2022 |
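To make that concrete, here's a sketch of adding the two vdev types described above. Pool, dataset, and device names are all invented:

```shell
# SLOG: mirrored log devices, used only for synchronous writes.
zpool add tank log mirror /dev/nvme0 /dev/nvme1

# Special vdev: mirrored devices holding metadata plus small blocks.
zpool add tank special mirror /dev/ssd0 /dev/ssd1

# Per dataset, route any block of 32K or smaller to the special vdev.
zfs set special_small_blocks=32K tank/dataset
```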
|
# ? Apr 18, 2022 19:32 |
|
If the device role is just bulk file storage and Plex server it sounds like I can get away without SSDs entirely...?
|
# ? Apr 18, 2022 19:36 |
Shumagorath posted:If the device role is just bulk file storage and Plex server it sounds like I can get away without SSDs entirely...? Absolutely.
|
|
# ? Apr 18, 2022 19:36 |
|
iXSystems absolutely fleece you on drives so I'd be buying the appliance empty anyhow. What about RAM? I don't know the prices for ECC but 64GB sounds excessive for five bays...?
|
# ? Apr 18, 2022 21:09 |
Shumagorath posted:iXSystems absolutely fleece you on drives so I'd be buying the appliance empty anyhow. What about RAM? I don't know the prices for ECC but 64GB sounds excessive for five bays...?
|
|
# ? Apr 18, 2022 21:29 |
|
Shumagorath posted:iXSystems absolutely fleece you on drives so I'd be buying the appliance empty anyhow. What about RAM? I don't know the prices for ECC but 64GB sounds excessive for five bays...? You would be perfectly fine with 16GB, which I think you can find for about $100 on ebay.
|
# ? Apr 18, 2022 21:50 |
|
BlankSystemDaemon posted:Absolutely. The only possible caveat here is if you are populating said server with the wonders of torrents and/or usenet, having a small cheap SSD that's not actually part of the pool as a scratch disk can come in handy. Your horribly fragmented downloads get put together / unpacked on the SSD, and then *arr / your client can move the file over to the ZFS pool as a big continuous write. Also, unpacking nzbs is miserable on spindles. But used in this way it can be literally the cheapest nastiest "at least it's from a company I've heard of before" SSD you can get your hands on, because when it shits the bed it doesn't impact your pool.
|
# ? Apr 18, 2022 21:51 |
IOwnCalculus posted:The only possible caveat here is if you are populating said server with the wonders of torrents and/or usenet, having a small cheap SSD that's not actually part of the pool as a scratch disk can come in handy. Your horribly fragmented downloads get put together / unpacked on the SSD, and then *arr / your client can move the file over to the ZFS pool as a big continuous write.
|
|
# ? Apr 18, 2022 21:56 |
|
Thanks everyone for the replies. Sounds like I'll find my elbow in the HDD price curve and upgrade these 6TB drives one by one with a resilver in between. Thanks for telling me about the zpool autoexpand flag!
|
# ? Apr 19, 2022 16:13 |
|
I forget if you are able to plug in extra drives, but wouldn't running code:
E: further reading suggests that I may be wrong.
|
# ? Apr 19, 2022 16:24 |
|
I've seen no difference in resilver performance whether or not the drive being replaced is still present and healthy. Which is sort of annoying, I suppose; it seems like in that scenario you really could just do a sequential read off the old drive and a write onto the new one, but there must be other work that needs to happen that makes this impossible.
|
# ? Apr 19, 2022 16:44 |
|
Can you at least run the resilver on multiple drives at once instead of a one at a time replace?
|
# ? Apr 19, 2022 16:49 |
|
I've wondered if it's possible to do two disks at a time in a raidz2 when upgrading, but I've never tried. Maybe I'll fool around with it in a demo setup to see if it's possible. Figure if you're not breaking the array parity it could work in theory. But man I'd be pucker city while that rebuild was going on.
|
# ? Apr 19, 2022 18:08 |
|
If I want to go down the “SAN” route of having a fileserver that is only a (FreeBSD?) high-speed data store, linked over InfiniBand QDR or FDR or 10/40GbE to an application server (Linux?) that does the actual hosting, what are the keywords I need to search for there? Is this where you do iSCSI and serve block devices? If so, what filesystem do you put on the block devices, and doesn't that have additional overhead/performance impact vs a ZFS-managed dataset at both the SAN and app-server levels? The other way is NFS, right? And that has a performance impact too, of course.

For most stuff it absolutely won't matter at a homelab level, but Postgres has always been a pain point there. The snapshot stuff is super useful for migrations on that kind of thing (although it can be done other ways, of course) and I want to toy around. I know there is going to be some performance impact (and I'm totally onboard with an Optane SLOG to help mitigate that) and I'll be playing around with some tuning options and pgbench to see what works best and try to quantify the impact. But it'd be helpful to start at least in the neighborhood of the right answer, using the right technologies.

Kinda wondering if database services might be the exception, and maybe that's a service that gets provided by the SAN server even if the applications run on something else.
|
# ? Apr 19, 2022 19:13 |
CopperHound posted:I forget if you are able to plug in extra drives, but wouldn't running

IOwnCalculus posted:I've seen no difference in performance with a resilver whether or not the drive being replaced is still present and healthy.

Nulldevice posted:I've wondered if it's possible to do two disks at a time in a raidz2 when upgrading, but I've never tried. Maybe I'll fool around with it in a demo setup to see if it's possible. Figure if you're not breaking the array parity it could work in theory. But man I'd be pucker city while that rebuild was going on.

CopperHound posted:Can you at least run the resilver on multiple drives at once instead of a one at a time replace?

It's definitely possible, but if you encounter a URE during the rebuild on a record that doesn't have a ditto copy (i.e. usually just data, unless you set copies=2 or copies=3 on a dataset; metadata always has two ditto blocks), that record will be broken and you'll need to restore the file from backup.
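The copies property mentioned above is set per dataset; a minimal example, with an invented dataset name:

```shell
# Store two copies of every data block, on top of the raidz parity,
# so a URE hit during a rebuild can still be healed from the ditto block.
zfs set copies=2 tank/irreplaceable
zfs get copies tank/irreplaceable
```

Note this only applies to data written after the property is set, and it doubles the space used by that dataset.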
|
|
# ? Apr 19, 2022 19:30 |
|
Paul MaudDib posted:If I want to go down the “san” route of having a fileserver that is only a (FreeBSD?) high speed data store and then linking over Infiniband QDR or FDR or 10 or 40 gbe to an application server (Linux?) that does the actual hosting, what is going to be the keywords I need to search for there? My setup is a TrueNAS with 10GbE to my switch, ESXi with 10GbE to my switch, and the second port of each 10GbE card connected directly together with a DAC on its own IP subnet. In TrueNAS the iSCSI device is a file on the filesystem, at least the way I did it; not sure if there is a different way. It's been working quite well. But I'm just a thread lurker asking questions and certainly no expert on any of this. I have a sample size of exactly 1.
|
# ? Apr 19, 2022 20:08 |
|
IOwnCalculus posted:The only possible caveat here is if you are populating said server with the wonders of torrents and/or usenet, having a small cheap SSD that's not actually part of the pool as a scratch disk can come in handy. Your horribly fragmented downloads get put together / unpacked on the SSD, and then *arr / your client can move the file over to the ZFS pool as a big continuous write.

Hmmm. My setup grew into:

- Boot pool
- My main spinny disk pool
- Two 2TB SSDs in a mirrored pool that I run my VMs off of (Microcenter is a great source of cheap ones)

Downloads just live on that spinny pool, though. Haven't noticed any issues, but it might save a significant amount of wear if I rejiggered it the way you described. 🤔 I do have my *arr apps use hardlinks, though, so I can continue seeding my ISOs after they've been imported without storing them twice.
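Hardlinks only save space within a single filesystem (on ZFS, a single dataset), which is why the *arr import trick works. A quick illustration with made-up paths:

```shell
# Downloads dir and media library on the SAME filesystem/dataset.
mkdir -p /tmp/linkdemo/downloads /tmp/linkdemo/media
echo "some ISO contents" > /tmp/linkdemo/downloads/distro.iso

# Import by hardlinking: no second copy on disk, and the torrent
# client can keep seeding from the original path.
ln /tmp/linkdemo/downloads/distro.iso /tmp/linkdemo/media/distro.iso

# Both names now point at the same inode; the link count reads 2.
stat -c '%h' /tmp/linkdemo/media/distro.iso
```

Across datasets, `ln` fails with "Invalid cross-device link", which is what forces either symlinks or a real move.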
|
# ? Apr 19, 2022 20:26 |
|
Paul MaudDib posted:If I want to go down the “san” route of having a fileserver that is only a (FreeBSD?) high speed data store and then linking over Infiniband QDR or FDR or 10 or 40 gbe to an application server (Linux?) that does the actual hosting, what is going to be the keywords I need to search for there? It really all depends on your workload, IOPS needs, management needs, and backing storage. ZFS backed iSCSI LUNs are a 100% valid way to do things, but you trade the SAN host's awareness of storage for the VM's awareness of storage. Caching behaves a little differently, compression can act a little oddly, but it behaves in the VM like a raw local disk so you can do anything to it that you could on a demo laptop or test system. My hobo-SAN whitebox monstrosity is an old DDR3 era Supermicro case and mobo with an old Xeon in it, 32gb of ram, running Solaris. Choosing between iSCSI, SMB and NFS, I wanted at rest disk encryption and a windows based backup utility that needed VSS snapshots to work, so I ended up using iSCSI. The backing store is a RAIDZ2 10 disk array, with no flash caching or anything fancy. The ZFS volume was published via COMSTAR iSCSI LUNs, mounted to my file server VM, and formatted NTFS. It hosts media, stores files, and generally does what I need it to without hardly any issues. Current system uptime is almost 400 days. I'm really tempted to replace the mobo/cpu/ram to modernize it, and throw this stupid thing on it. Slot 8 M2 SSDs in it, make it a 2 vdev 4 drive RAIDZ1 array for double iops vs an 8 drive RAIDZ2, enjoy having a SMB share that can casually saturate 2 SFP+ links.
|
# ? Apr 19, 2022 20:29 |
|
Chilled Milk posted:do have my *arr apps use hardlinks though so I can continue seeding my ISOs after they've been imported without storing twice. Wait, you can’t use hard links across file systems. How are you able to do this without some crazy ARC type setup with performance policies? Not 100% sure if ZFS datasets within a pool share inodes, which is the reason I still do symbolic links.
|
# ? Apr 19, 2022 20:50 |
Paul MaudDib posted:If I want to go down the “san” route of having a fileserver that is only a (FreeBSD?) high speed data store and then linking over Infiniband QDR or FDR or 10 or 40 gbe to an application server (Linux?) that does the actual hosting, what is going to be the keywords I need to search for there?

As for iSCSI and block devices, ctld(8) or iser(4) (the latter is use-case specific, as it requires a Mellanox NIC to offload onto) both function as iSCSI daemons which can use ZFS volumes (basically, ZFS datasets that act as character devices/disks on FreeBSD) as targets/extents. They get formatted with whatever filesystem the initiator uses (iSCSI terminology is confusing; initiator and target mean the opposite of what most people initially assume).

One important thing for iSCSI, though: irrespective of whether you're using a store-and-forward or cut-through switch (the latter tends to be what InfiniBand switches do, because it reduces latency, but it's also more expensive), you absolutely need to ensure you're doing at least 4096-byte jumbo frames, because a lot of modern filesystems do native 4k sectors, and splitting 4096 bytes across standard Ethernet frames of up to 1514 bytes is a recipe for a terrible experience.

Also, when you're dealing with iSCSI, you need to handle synchronous writes properly, which means a pair of small (on the order of 10-20GB, possibly under-provisioned to maximize write endurance) capacitor-backed SLC or MLC SSDs; while ZFS itself won't mind losing a log device nowadays (it used to not like it at all), anything using the iSCSI targets will lose up to 5 seconds of data which it assumed was flushed to disk, and which is therefore irrevocably gone forever. Databases and NFS also rely on synchronous writes, though, so any server that's good for iSCSI will also be good for the other two.

Having said all that, I think it's important that you realize you have to be able to maintain it, because I don't wanna be responsible for giving support if things break.

Motronic posted:My setup is a TrueNAS with 10GbE to my switch, ESXi with 10GbE to my switch and the second port of each 10 card connected directly together with a DAC on it's own IP subnet. In TrueNAS the iSCSI device is a file on the filesystem.

I would highly encourage you to use device extents instead of file extents for your targets, since there's a whole bunch of filesystem/ZPL overhead that you can simply get rid of that way. It should be as easy as dding your file onto a ZFS volume and switching the extent over from using the file to using the device. BlankSystemDaemon fucked around with this message at 21:07 on Apr 19, 2022 |
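For the FreeBSD/ctld case, a device extent backed by a zvol is only a few lines of config. After creating the backing volume with something like `zfs create -V 100G tank/vmstore`, an /etc/ctl.conf sketch might look like this; the IQN, address, and names are all placeholders (see ctl.conf(5) for the real syntax):

```
portal-group pg0 {
	discovery-auth-group no-authentication
	listen 10.0.0.1
}

target iqn.2022-04.lan.example:vmstore {
	auth-group no-authentication
	portal-group pg0
	lun 0 {
		path /dev/zvol/tank/vmstore
	}
}
```

A production setup would use CHAP auth rather than no-authentication.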
|
# ? Apr 19, 2022 20:51 |
|
BlankSystemDaemon posted:I would highly encourage you to use device extents instead of file extents for your targets, since there's a whole bunch of filesystem/ZPL overhead that you can simply get rid of that way. That makes sense and is what I'm used to on commercial NAS/SAN offerings. I didn't have an entire pool I wanted to dedicate to it at the time, and I'm still not sure if that's the way I really want to go or whether I'm going back to local storage for the ESXi hosts. It's given better performance than the SSD they were on locally before, and it doesn't seem to make a dent in TrueNAS resources. This is post-10GbE install (thanks again for the pointer on those NICs). Just for redundancy I'm leaning towards keeping them on the NAS, except for maybe backup VMs that are going to be backing up to a local disk in the ESXi host, and maybe a backup DNS resolver.
|
# ? Apr 19, 2022 21:15 |
|
necrobobsledder posted:Wait, you can’t use hard links across file systems. How are you able to do this without some crazy ARC type setup with performance policies? Not 100% sure if zfs datasets within a pool share inodes is the reason I still do symbolic links

What I meant was "I wonder if there's some combination of file system setup and/or application config that would let me have my cake and eat it too so I can do this thing I'm describing". Which, just thinking about it again: just set up a dataset on the SSD pool, download to that, and upon completion move it to a downloads spot on my main pool. Duh.
|
# ? Apr 20, 2022 04:14 |
|
It appears that the standard way to avoid causing excessive fragmentation on ZFS is to make a subvolume for downloads that disables copy on write (and probably snapshots while you're at it). That way it behaves like a standard filesystem for the purposes of downloads: the download program pre-allocates the file, the FS assigns space rationally, and tiny 16k writes just get added to the end instead of causing full block copies. aww poo poo, I was looking at a thing where people were talking about both btrfs and zfs; btrfs is the one where you can disable CoW. Klyith fucked around with this message at 16:30 on Apr 20, 2022 |
# ? Apr 20, 2022 15:39 |
|
Is disabling CoW on a dataset even possible?
|
# ? Apr 20, 2022 16:01 |
|
This has got me thinking of actually having the temp download directory on my SSD jaildisk and then letting the programs move their files when completed.
|
# ? Apr 20, 2022 16:09 |
Klyith posted:It appears that the standard way to avoid causing excessive fragmentation on ZFS is to make a subvolume for downloads that disables copy on write (and probably snapshots while you're at it). What is fragmentation on a copy-on-write filesystem? If you move a record, the filesystem is no longer copy-on-write. Also, how are you measuring that fragmentation? ZFS doesn't let you do it; it can only give you the free-space fragmentation, which is an indicator of how much of the free space on disk can't be made into contiguous blocks of whatever your recordsize is set to (128kB by default).

The problem isn't ZFS; it's that there isn't a standard for how torrent clients should write the data. Some insist on synchronously writing individual UDP-packet-sized segments of torrent pieces (which can vary in size from 32kB to 16MB) to disk, then rewriting that data synchronously once they have the full pieces (and rewriting again if a piece fails its checksum), then rewriting the full file synchronously, and possibly even rewriting the full torrent synchronously one final time. Other clients store individual blocks in a memory-mapped buffer, write the entire torrent piece to disk synchronously, then rewrite the file asynchronously once completed. Still other clients download everything in the torrent into a memory-mapped segment and only write it all out asynchronously once a complete file has been written to the segment and checksummed.

Which one is objectively best, irrespective of filesystem? The last one is best for ZFS, because ZFS has a dirty-data buffer for asynchronous writes and can thus completely avoid fragmentation that way; but it's also the least commonly used approach, because most developers think filesystems aren't their problem, and users apparently hate their memory being used and would rather have memory that's free and thereby waste electricity.

As a result, the correct recommendation for someone wanting to just download torrents onto ZFS using any of at least a dozen different clients is to use a dataset with the sync, checksum, and primarycache properties disabled as the download directory, and hope the client can at least figure out how to move files contiguously once they're complete. This should be immensely improved by the Block Reference Table, which should hopefully also be included in OpenZFS 3.0. Here's a video on it by the FreeBSD developer who's working on it, from the OpenZFS Developer Summit 2020: https://www.youtube.com/watch?v=hYBgoaQC-vo

CopperHound posted:Is disabling CoW on a dataset even possible? No, it's an inherent property of ZFS, BTRFS, APFS, and basically everything that isn't a clone of a filesystem designed back in 1980 (FFS/UFS, which is still also in FreeBSD). BlankSystemDaemon fucked around with this message at 16:59 on Apr 20, 2022 |
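Translated into commands, that throwaway download dataset would be created something like this. The pool and dataset names are invented; note that checksum=off also gives up ZFS's self-healing for whatever lands there, which is tolerable for data the torrent client re-verifies against piece hashes anyway:

```shell
# Scratch dataset for in-flight downloads: no sync, no checksums, no ARC caching.
zfs create -o sync=disabled -o checksum=off -o primarycache=none tank/scratch

# Point the client's incomplete-downloads directory at tank/scratch and
# have it move finished files onto a normal dataset, e.g. tank/media.
```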
|
# ? Apr 20, 2022 16:32 |
|
BlankSystemDaemon posted:No, it's an inherit property of ZFS, BTRFS, APFS, and basically everything that isn't a clone of a filesystem designed back in 1980 (FFS/UFS, which is still also in FreeBSD). Btrfs actually allows you to disable COW with the mount option nodatacow, see btrfs(5). This comes with the major caveat that you disable checksumming and compression, which is very important to consider alongside this note from the top of the section: btrfs(5) posted:Most mount options apply to the whole filesystem and only options in the first mounted subvolume will take effect. This is due to lack of implementation and may change in the future. This means that (for example) you can’t set per-subvolume nodatacow, nodatasum, or compress using mount options. It's as if they're trying their best to mess up their users' data.
|
# ? Apr 20, 2022 16:59 |
Keito posted:Btrfs actually allows you to disable COW with the mount option nodatacow, see btrfs(5). This comes with the major caveat that you disable checksumming and compression, which is very important to consider alongside this note from the top of the section: I do wonder how many people are involved with BTRFS. One of the things I like about OpenZFS is that the developer summits are places where ideas get discussed among the developers - because even if someone has what appears to them to be a great idea, it's usually a bad idea if other developers can poke holes in it.
|
|
# ? Apr 20, 2022 17:13 |
|
BlankSystemDaemon posted:What is fragmentation on a copy-on-write filesystem? If you move a record, the filesystem is no longer copy-on-write. Yeah, at first I was going to say "is this even a problem anyone needs to worry about?" before I got distracted by the copy-on-write thing, which turned out to be btrfs. My bad. I guess the one actual ZFS thing is that if you have automatic snapshots turned on, you could exclude a downloads folder from them. As for free-space fragmentation, it seems like the main problem is that people on the internet don't have good guidelines for what % fragmentation to worry about. Like "50% fragmentation" would be bad on a normal FS but is totally normal when measuring free space on ZFS. It's a bad thing if the FS has to do a lot of shuffling to write files, but when does that happen in practice?
|
# ? Apr 20, 2022 17:41 |
|
BlankSystemDaemon posted:As a result, the correct recommendation for someone wanting to just download torrents onto ZFS using any of at least a dozen different clients, is to use a dataset with the sync, checksum and primarycache properties set to disabled as the download directory, and hope that the client can at least figure out how to move files contiguously once it's completed. I mean, the better recommendation is just to buy an Optane drive (or better, a pair) and do SLOG. You can literally buy an unused surplus 16GB M.2 Optane stick for $12 on eBay, and that's plenty both in terms of size and performance.

Optane doesn't really suffer from the write-cache problem flash has (you're fine even without battery backup, because writes are de facto instant; there's never any latency for block erases or other busywork that flash requires), in consumer terms it has effectively infinite life even with heavy writes (these were designed as cache drives from the outset), and the latency is an order of magnitude better than flash. They are ideal for this workload, and the lovely 16GB Optane sticks (Optane M10/H10 are the models) are dirt cheap because they were mass-produced for a while for Optane HDD caching, which then flopped, and now nobody knows what to do with a 16GB SSD.

Do check what size you're getting: there are 22110 sticks that are longer than the popular 2280 size, but either way Silverstone sells an adapter card that will put an M.2 card into a PCIe slot in less than 1U of height (so you can basically put it into any slot that isn't physically obstructed). They actually sell a couple of models: one only does 2280, but they also sell one that does 22110, so check the clearance in your PC and the size of your drive. They're only about $20 for the PCIe adapter if you need it, or if you don't have height problems the regular cards are like $5.

I think the fragmentation they're talking about isn't of the dataset per se, but of the extents in the pool: small non-async writes will fragment the contiguous free space over time, so even a big sequential transfer later can be slowed by having to write a bunch of smaller extents instead of allocating one big one. It's a noted effect with databases on ZFS, for example, and the fix is either you force everything async (sync=disabled) on those datasets, or you run a SLOG (which is effectively the same thing, but safer / a different set of risks). Paul MaudDib fucked around with this message at 18:13 on Apr 20, 2022 |
# ? Apr 20, 2022 18:02 |
Klyith posted:Yeah at first I was going say "is this even a problem anyone needs to worry about?" before I got distracted by the copy-on-write thing, which turned out to be btrfs. My bad. I guess the one actual ZFS thing is that if you have turned on automatic snapshots, you could exclude a downloads folder from that. Even at 90% capacity on a relatively modern pool on my old server, I noticed no write penalties that took me under ~116MB/s using NFS over TCP. I believe Windows will do background defragmentation nowadays if you're on spinning rust, but who uses spinning rust for Windows? I have no idea about Linux on extN.
|
|
# ? Apr 20, 2022 18:08 |
|
|
|
Paul MaudDib posted:I mean, the better recommendation is just to buy an Optane drive (or better, a pair) and do SLOG. Now L2ARC for seeding on the other hand? There is probably some merit.
|
# ? Apr 20, 2022 19:06 |