LionArcher
Mar 29, 2010


What’s the best way to store these forums offline? I want an archive of them when poo poo really hits the fan.

Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!
Is there currently some drama going on that I’m missing by ignoring GBS et al?

Aware
Nov 18, 2003

Rap Game Goku posted:

Doesn't Unraid do ZFS now?

Apparently, though I wouldn't touch it until it's had another year or two of use and fixes. And also because migrating to it would be a right pain in the rear end unless I buy more disks.

Kibner
Oct 21, 2008

Acguy Supremacy
Partner told me not to spend so much on hdd’s so my ambition has fallen from like 13 drives to instead thinking of jus getting 4 and making a zfs mirror.

Scruff McGruff
Feb 13, 2007

Jesus, kid, you're almost a detective. All you need now is a gun, a gut, and three ex-wives.

Kibner posted:

Partner told me not to spend so much on hdd’s so my ambition has fallen from like 13 drives to instead thinking of jus getting 4 and making a zfs mirror.

NAS/Storage Megathread: Partner told me not to spend so much on hdd's

History Comes Inside!
Nov 20, 2004




Delete all their favourite things

LionArcher
Mar 29, 2010


Combat Pretzel posted:

Is there currently some drama going on that I’m missing by ignoring GBS et al?

No no, I just want an archive of all the things because of climate change and crumbling infrastructure. Also curious how much data that would take.

History Comes Inside!
Nov 20, 2004




Probably not all that much for the forums themselves but all the offsite embedded poo poo is going to add up

BlankSystemDaemon
Mar 13, 2009



Kibner posted:

Partner told me not to spend so much on hdd’s so my ambition has fallen from like 13 drives to instead thinking of jus getting 4 and making a zfs mirror.
This is storage erasure! :mad:

For what it's worth, you can still do raidz2 with 4 disks and then just slowly expand once in a while using raidz expansion, once it lands - hopefully in 3.0.

IOwnCalculus
Apr 2, 2003





Harik posted:

e: I have a super hard time believing ZFS can accidentally import a drive from another pool. It absolutely shits bricks if you try to use a pool on another system without going through a whole "I'm really done with this pool, please mark it clean and don't try to use it at startup anymore" routine before moving the drives.

Not another pool, another vdev in the same pool. Think:

code:
raidz1
    sda
    sdb
    sdc
    sdd
raidz1
    sde
    sdf
    sdg
    sdh
Then on the next reboot half the drives swap order, but because zfs blindly trusts /etc/zfs/zpool.cache it imports the pool in exactly the order above, even though it now has two 'wrong' drives in each vdev.

I suppose you could avoid this by always importing by scanning the disks and never reading zpool.cache... but why not identify the drives in a meaningful manner? I don't miss the old days of mdraid having to reverse-engineer which /dev/sdX is dead and what physical drive that was before it stopped responding to everything.
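A hedged aside on the "meaningful manner" bit: on Linux, udev already maintains stable symlinks keyed to model and serial, so mapping a dead /dev/sdX back to a physical drive is one `ls` away. The serials below are invented for illustration:

```shell
# Stable names survive reboots; bare /dev/sdX enumeration order does not.
# Filter out the -partN entries to see one line per whole disk.
ls -l /dev/disk/by-id/ | grep -v -- '-part'
# Example output (serials made up):
#   ata-WDC_WD80EFAX-68KNBN0_VGH12345 -> ../../sda
#   ata-WDC_WD80EFAX-68KNBN0_VGH67890 -> ../../sdb
```

If the pool was imported with `-d /dev/disk/by-id`, `zpool status` reports these names directly, so a failed vdev maps straight to the serial printed on the drive label.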

BlankSystemDaemon
Mar 13, 2009



I think the whole "sd(4) devices moving around arbitrarily" thing is a Linuxism that comes about because of the specific implementation of floppy support as it exists in Linux - none of Solaris, FreeBSD, NetBSD, macOS, or Windows (ie. all the other OSes that ZFS runs on) behaves like this.

Dyscrasia
Jun 23, 2003
Give Me Hamms Premium Draft or Give Me DEATH!!!!
On the failure topic, what does everyone use for alerting on them? As someone with years of experience with email systems, I don't want to deal with SMTP at home. Is there any sort of webhook or API based method for a headless server to communicate with a local client regarding various failures?

SpartanIvy
May 18, 2007
Hair Elf
I just upgraded my HPE ML30 Gen9 in a totally unnecessary but cool way :c00l:

IOwnCalculus
Apr 2, 2003





Dyscrasia posted:

On the failure topic, what does everyone use for alerting on them? As someone with years of experience with email systems, I don't want to deal with SMTP at home. Is there any sort of webhook or API based method for a headless server to communicate with a local client regarding various failures?

I can only speak for my combination of ZFS on Linux but zed includes support for multiple notification methods. Versions newer than in that blog post include the ability to push to Slack, which is what I use. I've also had the occasional time when zed just decided to not notify of an event for some reason, so I also have a cronjob set up every morning to just push the output of 'zpool status tank' to Slack as well.
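That belt-and-suspenders cronjob can be a one-liner; a sketch, with the Slack webhook URL as a placeholder and `jq` doing the JSON escaping:

```shell
# Daily 08:00 heartbeat: post `zpool status` output to a Slack incoming webhook.
# The hooks.slack.com path below is a placeholder - substitute your own webhook.
0 8 * * * zpool status tank | jq -Rs '{text: .}' | curl -s -X POST -H 'Content-Type: application/json' -d @- https://hooks.slack.com/services/XXX/YYY/ZZZ
```

The daily message doubles as a liveness check: if it ever stops arriving, something upstream of zed is broken too.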

kri kri
Jul 18, 2007

Rap Game Goku posted:

Doesn't Unraid do ZFS now?

It does, and it’s pretty nice. Easy appdata backups. I formatted some external drives to zfs and it’s easy to send the snapshots over.

BlankSystemDaemon
Mar 13, 2009



Dyscrasia posted:

On the failure topic, what does everyone use for alerting on them? As someone with years of experience with email systems, I don't want to deal with SMTP at home. Is there any sort of webhook or API based method for a headless server to communicate with a local client regarding various failures?
Mail-based delivery is not the fail-safe people treat it as, and is especially bad for the kinds of situations where you actually need to rely on things.
To quote the SMTP RFC:

quote:

When the receiver-SMTP accepts a piece of mail (by sending a "250 OK" message in response to DATA), it is accepting responsibility for delivering or relaying the message. It must take this responsibility seriously. It MUST NOT lose the message for frivolous reasons, such as because the host later crashes or because of a predictable resource shortage. Some reasons that are not considered frivolous are discussed in the next subsection and in Section 7.8

In recent years, there has been an increase of attacks on SMTP servers, either in conjunction with attempts to discover addresses for sending unsolicited messages or simply to make the servers inaccessible to others (i.e., as an application-level denial of service attack). While the means of doing so are beyond the scope of this Standard, rational operational behavior requires that servers be permitted to detect such attacks and take action to defend themselves. For example, if a server determines that a large number of RCPT TO commands are being sent, most or all with invalid addresses, as part of such an attack, it would be reasonable for the server to close the connection after generating an appropriate number of 5yz (normally 550) replies.
These two sections add up to mail only being considered guaranteed once the server has already accepted it for delivery - and with the ever-increasing attacks on mail servers, non-frivolous reasons to refuse are pretty common. Case in point: one of the best ways of filtering spam involves not accepting mail via 250 OK if it can be detected as spam (this makes it much harder for the spammer to figure out whether there's a mailbox to reach, since there's no back-scatter involved).

As far as alternatives go, it'll depend on what you're using - since I'm a firm believer in prometheus (and have easy access to sysutils/py-prometheus-zfs as well as a few others), I've gone with PagerDuty - because while it's proprietary, it's also free for up to 5 users and lets me do iOS notifications directly to my phone (which can be an exercise in frustration to get up and running privately).
It also has other options like Telegram, which lets you do the above for free - assuming you want to deal with that whole thing.

BlankSystemDaemon fucked around with this message at 02:23 on Sep 19, 2023

Dyscrasia
Jun 23, 2003
Give Me Hamms Premium Draft or Give Me DEATH!!!!
Fair enough, both of you. I use PagerDuty for work; it seems like there's not anything for personal usage then? I suppose any notification service that's not SMTP would be paid. I was hoping for something that might live on my local network, maybe a webhook between two always-on machines. I suppose this gets into the learn-how-to-roll-my-own situation.

Dyscrasia fucked around with this message at 03:34 on Sep 19, 2023

Aware
Nov 18, 2003
I use Telegram with Overseerr to notify me when users make requests and when content becomes available. It was pretty easy to set up a bot. It helps that I already use Telegram with friends, I guess.

IOwnCalculus
Apr 2, 2003





Yeah, that's the reason I went with a personal Slack instance. Free plan is still enough for webhooks and since I already have it running for other purposes, adding my own workspace just to collect push notifications was a no-brainer.

Thanks Ants
May 21, 2004

#essereFerrari


Pushover works well for me

Scruff McGruff
Feb 13, 2007

Jesus, kid, you're almost a detective. All you need now is a gun, a gut, and three ex-wives.

Thanks Ants posted:

Pushover works well for me

Yeah, this is what I use for system notifications. I use LunaSea for *arr app notifications (requested/available).

Tamba
Apr 5, 2010

My NAS is still sending me emails, but I use https://gotify.net/ for notifications from Home Assistant.

Dyscrasia
Jun 23, 2003
Give Me Hamms Premium Draft or Give Me DEATH!!!!
Pushover and gotify look exactly like what I was imagining.

Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!
I tried Cobia Beta in a VM, k3s-server seemed to be relatively quiet. Now with the release of RC, I've upgraded my NAS to Cobia and migrated my Docker stuff to their native apps stuff (based on k3s). And guess who's knocking at the door again? I guess that's my life now. It's just a few watts of additional power usage, but still :shibe:

That said, their apps interface is pretty slow and doesn't always refresh fast.

wolrah
May 8, 2006
what?

BlankSystemDaemon posted:

Mail-based delivery is not the fail-safe people treat it as, and is especially bad for the kinds of situations where you actually need to rely on things.
This definitely needs to be emphasized. I literally just got back from a client site where the local backup NAS got turned off somehow and the emails from the server trying to alert people about it weren't getting delivered for the last month for reasons yet to be determined.

Fortunately the problem actually turned out to be a fault with the redundant power supplies that just needed someone to press F1 to acknowledge and continue booting, but the message it displayed, combined with a non-technical user communicating it over the phone, led me to expect I was walking into a failed RAID array with the last backups being over a month ago.

If you need to trust mail-based alerts then you also need to have an "all good" message, so a mail delivery failure doesn't turn into a false negative. And of course the problem with that is it then becomes just noise that you ignore.
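One common way out of that noise problem is to invert it into a dead man's switch: the box pings a monitor after every successful run, and the *absence* of pings is what alerts. A sketch using a healthchecks.io-style ping URL; the check UUID and script path are placeholders:

```shell
# Dead man's switch: ping only after a successful backup. The monitoring
# service (e.g. healthchecks.io) alerts when pings *stop* arriving, so a
# silent mail path can't produce a false "all good".
# Both the script path and the UUID in the URL are placeholders.
0 2 * * * /usr/local/bin/backup.sh && curl -fsS --retry 3 https://hc-ping.com/your-check-uuid >/dev/null
```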

Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!
Hmmm, I wish you could set txg timeouts on ZFS on a per pool basis or something.

I've been diagnosing a bunch of IO stuff after upgrading, and the SSD pool gets a burst of ~5MB of data to write every five seconds, which is the default txg timeout. Turns out it's the databases (both mariadb and influxdb) for Home Assistant. Trying various things, I eventually bumped said timeout to 10 seconds and disabled sync for random reasons, and now it's still writing 5MB, but every 10 seconds instead.

So it looks like the databases are frequently overwriting logical blocks and transaction grouping mitigates that before it actually hits the disk.

--edit:
Playing with txg timeout values:



:toot:

Combat Pretzel fucked around with this message at 21:25 on Sep 19, 2023
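For anyone wanting to reproduce the experiment: on ZFS on Linux the flush interval is the `zfs_txg_timeout` module parameter, and (as lamented above) it's global, not per pool. A sketch:

```shell
# Current txg flush interval, in seconds (default is 5)
cat /sys/module/zfs/parameters/zfs_txg_timeout

# Bump it to 10s on the running system - note this affects *all* pools;
# there is no per-pool knob, which is exactly the complaint above
echo 10 > /sys/module/zfs/parameters/zfs_txg_timeout

# Persist across reboots via modprobe options
echo 'options zfs zfs_txg_timeout=10' > /etc/modprobe.d/zfs.conf
```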

el_caballo
Feb 26, 2001
Well for future reference I think I figured out you can't do TRIM on FAT32 in Windows. At least not with a Jmicron chipset. I guess the Optimize Drives tool is the true reference for how a drive is seen by Windows and it still says "Hard disk drive" no matter what combination of cmd line disk benchmarks I do. Some people claim they have working TRIM with an external SSD but I'm giving up.


el_caballo posted:

I got a sorta unrelated question but I think you hard drive nerds probably have a simple answer. I rebuilt my Unraid server with the guts of my old desktop and was left with an unused old Adata SU800 250gb SSD. So I bought a $10 Sabrent enclosure with the idea of making this part of my travel firestick Kodi kit. FYI: in order for an un-rooted firestick to read a USB drive it needs to be FAT32 so that's what this portable SSD is now and the speed definitely helps with copying all those split RAR4 movie files.

My question is: does formatting an SSD to FAT32 and/or using it in a USB enclosure affect using TRIM? I did some searching that seemed to say TRIM doesn't work over USB and doesn't work in anything but NTFS on Windows, but also yes TRIM does work for all FAT file systems but also this Sabrent EC-USAP enclosure chipset which could be one of two different chipsets doesn't support TRIM and never will because it uses USB but also yes it does if you update the firmware with a Jmicron tool which I did.

Those last few mysteries are for me to figure out on my own and this is a cheap drive so who cares but this is my first portable SSD so I am just curious for all those portable SSDs yet unborn that I will own and love in the future and how they'll work with TRIM, FAT32, USB and Windows (Win 11). Crystal Disk Info does show TRIM as one of the features right now in Windows as a FAT32 drive but I don't know if that just means it supports it not necessarily that it is currently using it.

BlankSystemDaemon
Mar 13, 2009



el_caballo posted:

Well for future reference I think I figured out you can't do TRIM on FAT32 in Windows. At least not with a Jmicron chipset. I guess the Optimize Drives tool is the true reference for how a drive is seen by Windows and it still says "Hard disk drive" no matter what combination of cmd line disk benchmarks I do. Some people claim they have working TRIM with an external SSD but I'm giving up.
USB controllers are a loving nightmare, for basically anything that isn't a HID.

For storage, specifically, even getting something that lets you access S.M.A.R.T can be a big issue, and of all the things I've tested, the only place I've ever found TRIM to work is the RTL9210B-CG found in the Akasa AK-ENU3M2-05.*

I think the above might let you get an M.2 to SATA bridge, assuming your disks have a SATA interface?

*: This is quite ironic, because I've spent a not-inconsiderable amount of words both ITT and elsewhere on the forum complaining about Realtek and how absolute poo poo they are, and I'm a bit peeved that they've managed to be competent about this specific thing.

Kibner
Oct 21, 2008

Acguy Supremacy
When getting SFF-8643 mini-SAS HD controller -> SATA HDD cables, is there anything I should be looking for?

Like, should I just get these cheap-ish OIKWAN cables or the more expensive StarTech cables.

BlankSystemDaemon
Mar 13, 2009



Kibner posted:

When getting SFF-8643 mini-SAS HD controller -> SATA HDD cables, is there anything I should be looking for?

Like, should I just get these cheap-ish OIKWAN cables or the more expensive StarTech cables.
SATA support on SAS is implemented in the actual controller on the HBA, so unless the cheap ones are so cheap that they're electrically incompatible, I can't imagine that it can make a difference.

Kibner
Oct 21, 2008

Acguy Supremacy

BlankSystemDaemon posted:

SATA support on SAS is implemented in the actual controller on the HBA, so unless the cheap ones are so cheap that they're electrically incompatible, I can't imagine that it can make a difference.

Ok, cool. I was just wanting to make sure that there isn't something I am missing like connector issues or bandwidth issues or what. Good to know that it is very straightforward.

BlankSystemDaemon
Mar 13, 2009



Kibner posted:

Ok, cool. I was just wanting to make sure that there isn't something I am missing like connector issues or bandwidth issues or what. Good to know that it is very straightforward.
A single SAS lane usually has at least double if not triple the amount of bandwidth that SATA has - so bandwidth definitely isn't an issue.

Since spinning rust struggles to saturate SATA, and a SAS connector carries 4 SAS lanes, it's usually possible to host upwards of 16 SATA disks on a single SAS connector (via an expander).

BlankSystemDaemon fucked around with this message at 00:52 on Sep 20, 2023
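A rough back-of-the-envelope check of that claim, assuming SAS-3 (12 Gb/s per lane) and ~250 MB/s sustained per spinning disk - both round figures, not measurements:

```shell
# How many spinning disks before one 4-lane SFF-8643 connector becomes
# the bottleneck? Assumes SAS-3 at 12 Gb/s per lane and ~250 MB/s
# sustained per HDD; encoding overhead is ignored for a rough figure.
lane_mbps=$((12 * 1000 / 8))      # 1500 MB/s per lane
connector_mbps=$((lane_mbps * 4)) # 6000 MB/s per 4-lane connector
hdd_mbps=250
echo $((connector_mbps / hdd_mbps))  # -> 24 drives
```

So even two dozen drives behind an expander won't saturate one connector with sequential reads, which is why 16 SATA disks per connector is comfortable.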

Computer viking
May 30, 2011
Now with less breakage.

BlankSystemDaemon posted:

A single SAS lane usually has at least double if not triple the amount of bandwidth that SATA has - so bandwidth definitely isn't an issue.

Since spinning rust struggles to saturate SATA, and a SAS connector carries 4 SAS lanes, it's usually possible to host upwards of 16 SATA disks on a single SAS connector (via an expander).

± the badly documented mess of what exactly you mean by "SAS connector".

Until I started using SAS (and U.3), I thought we were long past the time when the best explanations of standards were HTML 4.0 tables put together by a family-run company to keep their customers from asking so many questions.

insta
Jan 28, 2009
Can the thread give me only good feedback on the QNAP TS-873A for a home lab? I plan to put 64gb ecc ram, a 10gbe card in it, some nvme drives as cache, 8x12tb drives for Linux ISOs, and run docker containers on other machines with network attached storage. It will only be doing file storage duty, and MAYBE be a Swarm coordinator.

I have already bought it so no negative feedback or better suggestions thx

Harik
Sep 9, 2001

From the hard streets of Moscow
First dog to touch the stars


Plaster Town Cop

IOwnCalculus posted:

Not another pool, another vdev in the same pool. Think:

code:
raidz1
    sda
    sdb
    sdc
    sdd
raidz1
    sde
    sdf
    sdg
    sdh
Then on the next reboot half the drives swap order, but because zfs blindly trusts /etc/zfs/zpool.cache it imports the pool in exactly the order above, even though it now has two 'wrong' drives in each vdev.

I suppose you could avoid this by always importing by scanning the disks and never reading zpool.cache... but why not identify the drives in a meaningful manner? I don't miss the old days of mdraid having to reverse-engineer which /dev/sdX is dead and what physical drive that was before it stopped responding to everything.

that's literally madness. why the fu... what. WHAT. Throw a loving uuid in the metadata what the actual christ.

i flat-out refuse to believe zfs will blow up a pool if your drives get imported in the wrong order. that's utterly unacceptable for a filesystem.

e: I'm spinning up a vm just to test this, istg if it does break I'll mock zfs proponents for the rest of eternity.

Harik fucked around with this message at 07:02 on Sep 22, 2023

Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!
Nah, ZFS stores UUIDs in the disk’s uberblock exactly for that reason.

Harik
Sep 9, 2001

From the hard streets of Moscow
First dog to touch the stars


Plaster Town Cop

IOwnCalculus posted:

Not another pool, another vdev in the same pool. Think:

code:
raidz1
    sda
    sdb
    sdc
    sdd
raidz1
    sde
    sdf
    sdg
    sdh
Then on the next reboot half the drives swap order, but because zfs blindly trusts /etc/zfs/zpool.cache it imports the pool in exactly the order above, even though it now has two 'wrong' drives in each vdev.

I suppose you could avoid this by always importing by scanning the disks and never reading zpool.cache... but why not identify the drives in a meaningful manner? I don't miss the old days of mdraid having to reverse-engineer which /dev/sdX is dead and what physical drive that was before it stopped responding to everything.

This is definitively wrong, or at least I can't find a way to reproduce it when testing in a virtual machine.

I tried swapping disks in the same pool, and between pools.

Swapping between pools marked them as UNAVAIL and they had to be re-synchronized when they were re-added:

code:
Sep 22 03:14:18 zfstest zed[1255]: eid=1 class=statechange pool='tank' vdev=vdh1 vdev_state=UNAVAIL
Sep 22 03:14:18 zfstest zed[1271]: eid=2 class=statechange pool='tank' vdev=vdh1 vdev_state=UNAVAIL
Sep 22 03:14:18 zfstest zed[1297]: eid=8 class=statechange pool='testing' vdev=vdc1 vdev_state=UNAVAIL

harik@zfstest:~$ zpool list
NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
tank     19.5G   269K  19.5G        -         -     0%     0%  1.00x  DEGRADED  -
testing  19.5G  1.00G  18.5G        -         -     0%     5%  1.00x  DEGRADED  -
but there was no data loss. At worst, if enough drives shuffled around, it would fail to bring up the zpool until you located them by their new device names.

swapping within a pool caused nothing to happen.

returning the drives to their original order caused it to self-heal immediately:
code:
harik@zfstest:~$ zpool status 
  pool: tank
 state: ONLINE
  scan: resilvered 41K in 00:00:00 with 0 errors on Fri Sep 22 03:20:13 2023
config:

	NAME        STATE     READ WRITE CKSUM
	tank        ONLINE       0     0     0
	  raidz1-0  ONLINE       0     0     0
	    vdf     ONLINE       0     0     0
	    vdg     ONLINE       0     0     0
	    vdh     ONLINE       0     0     0
	    vdi     ONLINE       0     0     0

errors: No known data errors

  pool: testing
 state: ONLINE
  scan: resilvered 62.5K in 00:00:00 with 0 errors on Fri Sep 22 03:20:13 2023
config:

	NAME        STATE     READ WRITE CKSUM
	testing     ONLINE       0     0     0
	  raidz2-0  ONLINE       0     0     0
	    vdb     ONLINE       0     0     0
	    vdc     ONLINE       0     0     0
	    vdd     ONLINE       0     0     0
	    vde     ONLINE       0     0     0

errors: No known data errors
If I'd been writing a lot to it while degraded, the resilvering would have been more expensive, I assume.

Harik fucked around with this message at 08:21 on Sep 22, 2023

BlankSystemDaemon
Mar 13, 2009



Harik posted:

that's literally madness. why the fu... what. WHAT. Throw a loving uuid in the metadata what the actual christ.

i flat-out refuse to believe zfs will blow up a pool if your drives get imported in the wrong order. that's utterly unacceptable for a filesystem.

e: I'm spinning up a vm just to test this, istg if it does break I'll mock zfs proponents for the rest of eternity.
Again, the problem is with Linux, specifically - and exists because of the floppy handling combined with users not knowing the difference.
EDIT: And it's not like the developers can just decide that one specific set of devices can't be used while others can - hence why the documentation very clearly states to avoid using sd(4) on Linux.

Here's the vdev label:

quote:

------------------------------------
LABEL 0
------------------------------------
version: 5000
name: 'zroot'
state: 0
txg: 33636114
pool_guid: 9053419482513538692
errata: 0
hostname: geroi.local
top_guid: 9414597141481046164
guid: 9414597141481046164
vdev_children: 1
vdev_tree:
type: 'disk'
id: 0
guid: 9414597141481046164
path: '/dev/nda0p3.eli'
whole_disk: 1
metaslab_array: 256
metaslab_shift: 31
ashift: 12
asize: 238601633792
is_log: 0
DTL: 100855
create_txg: 4
features_for_read:
com.delphix:hole_birth
com.delphix:embedded_data
labels = 0 1 2 3
It checks all GUIDs as well as hostname on import, regardless of whether you use the cache or do a plain import.

What will happen is that zfs will run a resilver which won't actually trigger any repairs because there's no inconsistencies - but it's just barely conceivable that someone might shut down the system before that finishes, and at that point if the switcheroo happens again, trouble can in theory occur.
In practice, I've only seen people demonstrate it in test scenarios, to prove that it is a risk.

BlankSystemDaemon fucked around with this message at 14:51 on Sep 22, 2023
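To make that GUID check academic in the first place, the usual advice on Linux is to never hand ZFS the unstable sdX names at all. A sketch; the by-id paths are placeholders for whatever your system actually shows:

```shell
# Build the pool from stable identifiers so zpool.cache never records
# an enumeration-order-dependent path. The ata-... serials are placeholders.
zpool create tank raidz2 \
  /dev/disk/by-id/ata-EXAMPLE_SERIAL_1 \
  /dev/disk/by-id/ata-EXAMPLE_SERIAL_2 \
  /dev/disk/by-id/ata-EXAMPLE_SERIAL_3 \
  /dev/disk/by-id/ata-EXAMPLE_SERIAL_4
```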

IOwnCalculus
Apr 2, 2003





I don't know what else to say other than I did actually run into this poo poo years ago when switching from FreeNAS to Ubuntu. It would start puking checksum errors when looking at /dev/sdX and they all went away for good after re-importing by-id.

IOwnCalculus posted:

FYI: It is indeed possible to import a ZFS pool from FreeNAS to ZFS-on-Linux, since the multi_vdev_dump (or whatever it's called) flag is not typically actually used by FreeNAS.

Annoyingly, one of my Reds is starting to throw errors now.

Edit: I did end up having to do this to get it to actually import everything properly and not freak out every reboot.

code:
sudo zpool export tank
sudo zpool import -d /dev/disk/by-id tank


Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!
I blame it on everyone doing some harebrained partitioning scheme, be it plain partitions or this GEOM crap, in which they stick the vdev. Thou Holiness Jeff Bonwick intended to use the whole disk literally, from the second sector onwards, without any of that poo poo.

(I still create any pools on command-line to bypass that bullshit.)
