echinopsis
Apr 13, 2004

by Fluffdaddy
MS-DOS had it right with 8.3 in all caps

Cybernetic Vermin
Apr 18, 2005

it is not like it is some intractable problem to do unicode-defined normalization, case folding and e.g. collation either. it is hardly a necessary feature, but it's nice and not *that* complex.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
it’s not hard, but it is also locale-dependent, which is inherently at odds with being something like a filesystem which is meant to reflect the same data for different users
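To make the locale point concrete, here is a minimal sketch using Python's built-in string methods, which apply Unicode's default (locale-independent) case rules. Turkish is picked only because it is the usual illustration; the variable names are made up for this example.

```python
# Python's str methods use Unicode's default case rules, with no locale input.
dotless_i = "\u0131"      # ı  LATIN SMALL LETTER DOTLESS I
dotted_cap_i = "\u0130"   # İ  LATIN CAPITAL LETTER I WITH DOT ABOVE

# Default folding pairs I with i. That suits English, but Turkish pairs
# I with ı and İ with i, so one fixed table cannot be right for both.
default_pairs_I_with_i = "I".casefold() == "i"
default_pairs_I_with_dotless = "I".casefold() == dotless_i.casefold()

# Lowercasing İ under the default rules gives i + COMBINING DOT ABOVE.
lowered = dotted_cap_i.lower()

print(default_pairs_I_with_i, default_pairs_I_with_dotless)
print([f"U+{ord(ch):04X}" for ch in lowered])
```

A filesystem that folds case has to pick one of these behaviours for every user of the volume, which is the tension described above.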

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

I don’t think it would be that unreasonable to say “this is the locale that /whatever is mounted as”, if you wanted the correct case folding in exchange

I wonder what the performance of that would be like. maybe we’d get RISC-V instruction extensions for doing the collation and folded comparison, requiring a gig of fast cache just for the tables

Cybernetic Vermin
Apr 18, 2005

for collation locale is kind of a big deal, but the filesystem does not usually handle that part (afaik at least?).

for normalization/case folding i think you do indeed just fix a sensible default. debate is then mostly whether "sensible" means plain 1:1 (i.e. just memcmp blindly) or fixing a unicode table (possibly tweaked), and if the latter, whether you include various casing-"style" normalization.

windows does do the latter. i can't say i care a ton, but i think it overall matters more for some languages than others.

e.g. greek has the pleasure of having not only U+03C1 (Greek Small Letter Rho) and U+03A1 (Greek Capital Letter Rho), but *also* U+03F1 (Greek Rho Symbol), and that gets a touch on the side of bullshit to have people interface directly with.
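A rough sketch of the two options for the three rhos, using only Python's standard `unicodedata` module. The `fold_key` helper is hypothetical and is not how any particular filesystem does it; it just stands in for "fix a unicode table and compare through it".

```python
import unicodedata

small_rho = "\u03c1"     # ρ  GREEK SMALL LETTER RHO
capital_rho = "\u03a1"   # Ρ  GREEK CAPITAL LETTER RHO
rho_symbol = "\u03f1"    # ϱ  GREEK RHO SYMBOL


def fold_key(name: str) -> str:
    """Hypothetical comparison key: canonical normalization, then case folding."""
    return unicodedata.normalize("NFC", name).casefold()


# Option 1: compare the encoded bytes blindly (the memcmp approach).
raw_equal = small_rho.encode("utf-8") == rho_symbol.encode("utf-8")

# Option 2: compare through a fixed Unicode folding table.
folded_equal = fold_key(small_rho) == fold_key(capital_rho) == fold_key(rho_symbol)

print(raw_equal, folded_equal)
```

Under byte comparison all three are distinct names; under default Unicode case folding they collapse into one.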

Sweeper
Nov 29, 2007
The Joe Buck of Posting
Dinosaur Gum
goon project: invent a language where nul byte is meaningful

prisoner of waffles
May 8, 2007

Ah! well a-day! what evil looks
Had I from old and young!
Instead of the cross, the fishmech
About my neck was hung.
xargs -0

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

Sweeper posted:

goon project: invent a language where nul byte is meaningful

*the zen master kicks you in the balls*

Vanadium
Jan 8, 2005

I've seen apps just url encode file names instead of thinking about encodings and I can't say they're wrong

VikingofRock
Aug 24, 2008




Filesystems/OSs should just give every file a UUID and then use that. Then, allow users to set the display names to whatever they want. Users wanna have 50 different files on their desktop, all called "temp.txt" or whatever? I say, let them!

DELETE CASCADE
Oct 25, 2017

export LC_ALL=C

Internet Janitor
May 17, 2008

"That isn't the appropriate trash receptacle."

VikingofRock posted:

Filesystems/OSs should just give every file a UUID and then use that. Then, allow users to set the display names to whatever they want. Users wanna have 50 different files on their desktop, all called "temp.txt" or whatever? I say, let them!

this would make revision control systems, backups, and even just interacting with any sort of archive format rather exciting

prisoner of waffles
May 8, 2007

let a thousand file-overwrite vulnerabilities bloom

VikingofRock
Aug 24, 2008




Internet Janitor posted:

this would make revision control systems, backups, and even just interacting with any sort of archive format rather exciting

Eh, we could all use a little more excitement in our lives

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

Vanadium posted:

I've seen apps just url encode file names instead of thinking about encodings and I can't say they're wrong

I can, because URL encoding is defined on characters, and file names can contain sequences that aren’t valid characters
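For what it's worth, percent-encoding can also be applied to raw bytes rather than characters, which sidesteps the invalid-sequence problem as long as nobody decodes the name first. A small sketch with Python's `urllib.parse`; the file name is an invented example that is deliberately not valid UTF-8.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

# Invented example name: 0xE9 and 0xFF do not form valid UTF-8 here.
raw_name = b"caf\xe9-\xff.txt"

try:
    raw_name.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

# Percent-encode the bytes directly; no character decoding is involved.
encoded = quote_from_bytes(raw_name, safe="")
restored = unquote_to_bytes(encoded)

print(decodes_as_utf8, encoded, restored == raw_name)
```

This only works if the app treats names as bytes end to end; the moment it goes through a character string first, the objection above applies.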

FlapYoJacks
Feb 12, 2009
all files should be named as the memory address they point to. Users shouldn’t be given an option to name them. Good ol 0x549badc60381c2361d53ce2195e926f7. And yes, the file names should change if the memory address changes.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
files should be named with the timestamp at which they’re permitted to be opened. before that moment, permission denied. after that moment, immediately deleted

strict monotonic clock, of course

Internet Janitor
May 17, 2008

"That isn't the appropriate trash receptacle."
a filesystem that incorporated a concept of (configurable) expiration dates would be kind of interesting, and in some cases even useful

12 rats tied together
Sep 7, 2006

it's kinda like a snapchat operating system

Sagacity
May 2, 2003
Hopefully my epitaph will be funnier than my custom title.
if only posts could do the same

DELETE CASCADE
Oct 25, 2017

echinopsis posted:

why not have the real file name and file display name be different things


no problems

we can even store the file display name in the resource fork associated with the file! :downs:

Sapozhnik
Jan 2, 2005

Nap Ghost
operating systems should provide virtual disk services in the same way that they provide virtual memory services. carve the disk up into 64MB (or whatever size) pages and allocate logical volumes to per-user user-space file system implementations to do with as they please.

BobHoward
Feb 13, 2012

i mean, they already do that

linux lvm will happily export device nodes carving up a disk into virtual disks and there's nothing stopping userspace from opening one of those block devices and going ham

much finer granularity than 64MB too

redleader
Aug 18, 2005

Engage according to operational parameters
would your filesystem and os need to keep up to date with new unicode versions?

echinopsis
Apr 13, 2004

by Fluffdaddy

VikingofRock posted:

Filesystems/OSs should just give every file a UUID and then use that. Then, allow users to set the display names to whatever they want. Users wanna have 50 different files on their desktop, all called "temp.txt" or whatever? I say, let them!

hell, every file is 24 digit alphabuermic random number w

mystes
May 31, 2006

echinopsis posted:

hell, every file is 24 digit alphabuermic random number w
Your post is definitely alphabuermic

pseudorandom name
May 6, 2007

redleader posted:

would your filesystem and os need to keep up to date with new unicode versions?

Windows embeds the case-folding table into NTFS at filesystem creation, and then doesn't use it at runtime.

iirc macOS has changed its normalization algorithm incompatibly as Unicode has changed incompatibly (or maybe it was the reverse, Unicode changed incompatibly and macOS kept the obsolete Unicode normalization algorithm)

in conclusion, normalizing filenames is stupid

Visions of Valerie
Jun 18, 2023

Come this autumn, we'll be miles away...

redleader posted:

im leaving that in the well-qualified hands of the unicode committee

yes, the people who brought us han unification surely won't gently caress this up in any way

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

Visions of Valerie posted:

yes, the people who brought us han unification surely won't gently caress this up in any way

wasn’t there a Han unification effort before Unicode?

pokeyman
Nov 26, 2006

That elephant ate my entire platoon.

pseudorandom name posted:

Windows embeds the case-folding table into NTFS at filesystem creation, and then doesn't use it at runtime.

what? why?

mystes
May 31, 2006

Subjunctive posted:

wasn’t there a Han unification effort before Unicode?
like in 202 BC?

Sapozhnik
Jan 2, 2005

Nap Ghost
Should have done Latin-Greek-Cyrillic unification as well while they were at it

D, Δ, Д are all basically the same letter, if the user really cares about that character's exact appearance then they can choose a region-appropriate font

echinopsis
Apr 13, 2004

by Fluffdaddy
Д is just A

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
but then how will i make "put it in н" jokes

redleader
Aug 18, 2005

Engage according to operational parameters

Visions of Valerie posted:

yes, the people who brought us han unification surely won't gently caress this up in any way

actually han unification is fine, according to white people on the internet



Sapozhnik posted:

Should have done Latin-Greek-Cyrillic unification as well while they were at it

D, Δ, Д are all basically the same letter, if the user really cares about that character's exact appearance then they can choose a region-appropriate font

ah, no, this is different from han unification because

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

han unification in unicode was designed and directed by CJK-JRG, which is staffed with experts from China, Japan, and Korea. I assume that they at least aren’t all white

echinopsis
Apr 13, 2004

by Fluffdaddy
this gives me a headache

Dijkstracula
Mar 18, 2003

You can't spell 'vector field' without me, Professor!

VikingofRock posted:

Filesystems/OSs should just give every file a UUID and then use that
they do, it’s the inode number, bing bong
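A quick sketch of the inode point, assuming a POSIX-style filesystem that supports hard links. The file names are placeholders. Note that inode numbers are only unique within one filesystem and get reused after deletion, so they are not UUIDs in the strict sense.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    first = os.path.join(workdir, "temp.txt")
    second = os.path.join(workdir, "also-temp.txt")
    renamed = os.path.join(workdir, "renamed.txt")

    with open(first, "w") as handle:
        handle.write("hello")

    inode_before = os.stat(first).st_ino

    os.link(first, second)      # give the same file a second name
    os.rename(first, renamed)   # change the first name

    # The names changed, the underlying identifier did not.
    same_inode_after_rename = os.stat(renamed).st_ino == inode_before
    same_inode_via_link = os.stat(second).st_ino == inode_before
    link_count = os.stat(second).st_nlink

print(same_inode_after_rename, same_inode_via_link, link_count)
```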

pseudorandom name
May 6, 2007

redleader posted:

ah, no, this is different from han unification because

e.g. KOI8-R contains both A and Д and you need to be able to losslessly roundtrip from KOI8-R to Unicode and then back to KOI8-R
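The round-trip requirement can be checked with Python's built-in `koi8_r` codec. The two-byte sample is just an illustration: Latin A followed by Cyrillic Д.

```python
# Two KOI8-R bytes: 0x41 is Latin A, 0xE4 is Cyrillic Д.
koi8_bytes = b"A\xe4"

text = koi8_bytes.decode("koi8_r")
code_points = [f"U+{ord(ch):04X}" for ch in text]

# Encoding back must reproduce the original bytes exactly.
round_tripped = text.encode("koi8_r")

print(code_points, round_tripped == koi8_bytes)
```

If Д had been unified with a Latin letter, the two bytes would map to one code point and the trip back to KOI8-R could not tell them apart.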

pseudorandom name
May 6, 2007

it's the same reason Unicode contains a bunch of precomposed characters even though you can make them using combining sequences

and also why unicode contains both precomposed and combining forms of hangul
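A short illustration of precomposed versus combining forms, using Python's standard `unicodedata` module. The two sample characters are arbitrary picks.

```python
import unicodedata

e_acute = "\u00e9"   # é as one precomposed code point
han = "\ud55c"       # 한 as one precomposed hangul syllable

# NFD splits precomposed characters into their combining sequences.
e_decomposed = unicodedata.normalize("NFD", e_acute)
han_decomposed = unicodedata.normalize("NFD", han)

# NFC puts them back together.
e_recomposed = unicodedata.normalize("NFC", e_decomposed)
han_recomposed = unicodedata.normalize("NFC", han_decomposed)

print([f"U+{ord(ch):04X}" for ch in e_decomposed])
print([f"U+{ord(ch):04X}" for ch in han_decomposed])
print(e_recomposed == e_acute, han_recomposed == han)
```

Both spellings render the same, which is why comparing names byte for byte and comparing them after normalization can disagree.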
