Volte
Oct 4, 2004

woosh woosh

pseudorandom name posted:

Swift calls your HashFunction type Hasher
Actually what it calls Hasher is what I also called Hasher. Here's the lone sentence in giant letters at the top of the Hasher documentation page:

quote:

The universal hash function used by Set and Dictionary.
If it wasn't obvious by this point, my problem is "universal hash function".


pseudorandom name
May 6, 2007

yeah, like I said, your complaint is that Set and Dictionary don't take custom hashers as a parameter to their initializers

Volte
Oct 4, 2004

woosh woosh
It's actually that custom hashers aren't even a concept that is compatible with Swift's concept of hashability, but I won't argue that Set and Dictionary not having initializers for them is a natural consequence of that.

pseudorandom name
May 6, 2007

oh for fucks sake, I'm sorry, I understand what you're complaining about now

I was under the wrong impression that Hasher was a protocol not a struct because why would you correctly generalize Hashable and then gently caress up Hasher

DONT THREAD ON ME
Oct 1, 2002

by Nyc_Tattoo
Floss Finder
we did it!!!!!!

Beamed
Nov 26, 2010

Then you have a responsibility that no man has ever faced. You have your fear which could become reality, and you have Godzilla, which is reality.


pseudorandom name posted:

see how few posts that took when reading comprehension was involved?


pseudorandom name posted:

oh for fucks sake, I'm sorry, I understand what you're complaining about now

DONT THREAD ON ME
Oct 1, 2002

by Nyc_Tattoo
Floss Finder
i bet pseudorandom name's face is so red right now but respect for coming around in the end

Volte
Oct 4, 2004

woosh woosh
Makes sense, I had the same misunderstanding about Hasher initially. Sadly even if Hasher was a protocol, it would be a red herring, because at best you could choose which operation you use to combine the members together, rather than customize the entire notion of a hash for a particular type.

pseudorandom name
May 6, 2007

oh god and you wrote it right there in the very first post

ynohtna
Feb 16, 2007

backwoods compatible
Illegal Hen
if the hash suits smoke it

raminasi
Jan 25, 2005

a last drink with no ice

DONT THREAD ON ME posted:

i bet pseudorandom name's face is so red right now but respect for coming around in the end

luchadornado
Oct 7, 2004

A boombox is not a toy!

as someone who has just spent a day wrangling siphash and fnv, these last two pages were traumatic

hackbunny
Jul 22, 2007

I haven't been on SA for years but the person who gave me my previous av as a joke felt guilty for doing so and decided to get me a non-shitty av

Volte posted:

Makes sense, I had the same misunderstanding about Hasher initially

me too :ssh: but I checked before committing to it. I edit so much wrong stuff out of my posts before hitting submit that I wonder if what's left makes sense

Beamed
Nov 26, 2010

Then you have a responsibility that no man has ever faced. You have your fear which could become reality, and you have Godzilla, which is reality.


ya'll really hit my inferiority complex with this sheer corpus of knowledge you guys have. gd

Athas
Aug 6, 2007

fuck that joker
I just use trees so I don't have to worry about computing hashes.

Sweeper
Nov 29, 2007
The Joe Buck of Posting
Dinosaur Gum

Athas posted:

I just use trees so I don't have to worry about computing hashes.

let’s have this argument again but with generated ordering operators

gonadic io
Feb 16, 2011

>>=

Sweeper posted:

let’s have this argument again but with generated ordering operators

lol, I don't know of many languages that have ordered sets/maps that take a ord function in their constructor. most of the ones i use require you to wrap your type in a newtype wrapper with a different ord instance. haskell has this stuff built in, you can do stuff like

[3, 4].map(x => Down(x)).sort().map(d => unDown(d)) == [4, 3]
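The line above is pseudocode for the newtype-wrapper trick. A minimal runnable sketch of the same idea in Python, assuming a hand-rolled `Down` class (a hypothetical stand-in for Haskell's `Down`, not a standard Python type) whose ordering is the reverse of the value it wraps:

```python
from functools import total_ordering


@total_ordering
class Down:
    """Wrapper whose ordering is the reverse of the wrapped value's ordering."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return self.value == other.value

    def __lt__(self, other):
        # Reversed on purpose: Down(a) < Down(b) exactly when a > b.
        return self.value > other.value


def sort_descending(items):
    # Wrap, sort with the default ordering, then unwrap.
    return [d.value for d in sorted(Down(x) for x in items)]


print(sort_descending([3, 4]))  # [4, 3]
```

The sort itself is never told about a custom comparison; the ordering comes entirely from the wrapper type.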

Sweeper
Nov 29, 2007
The Joe Buck of Posting
Dinosaur Gum

gonadic io posted:

lol, I don't know of many languages that have ordered sets/maps that take a ord function in their constructor. most of the ones i use require you to wrap your type in a newtype wrapper with a different ord instance. haskell has this stuff built in, you can do stuff like

[3, 4].map(x => Down(x)).sort().map(d => unDown(d)) == [4, 3]

have you heard the good news

eschaton
Mar 7, 2007

Don't you just hate when you wind up in a store with people who are in a socioeconomic class that is pretty obviously about two levels lower than your own?

rjmccall posted:

case-insensitive comparison is basically a broken concept

except to, you know, actual humans who use human languages

eschaton
Mar 7, 2007

Don't you just hate when you wind up in a store with people who are in a socioeconomic class that is pretty obviously about two levels lower than your own?

Soricidus posted:

c# does it that way because it’s a java clone and java does it that way

idk why java does it that way

because Java is an Objective-C clone at heart and OpenStep NSObject does it that way

probably because Stepstone’s original Object class did it that way

and I bet it was originally that way in Smalltalk-76 too

AggressivelyStupid
Jan 9, 2012

strings, are bad

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

eschaton posted:

except to, you know, actual humans who use human languages

actual humans who use actual human languages want a locale-sensitive comparison, and it is a huge fuckup for a programming language to use one of those as the default ordering of strings

in general, sorts are parameterized because using a custom sort on data is an incredibly common thing to do because sorts are user-meaningful. hashing is not user-meaningful so the use cases for non-standard hashes are niche as gently caress, and i say that as someone who does a lot of niche algorithms work. making hashing be driven by type is a totally reasonable choice, and the resistance to it comes from the fact that a lot of people treat defining a trivial wrapper type as one of the most onerous tasks a programmer can be asked to do, as opposed to something that sensible programmers should be doing as a matter of course whenever it’s useful
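A rough sketch of what such a trivial wrapper type can look like, written in Python rather than Swift. `CaseInsensitive` is a hypothetical name, and `str.casefold()` is only a stand-in for whatever notion of equality the wrapper is meant to encode; it is locale-independent and not a recommendation for text shown to humans:

```python
class CaseInsensitive:
    """Wrapper that gives a string case-insensitive equality and hashing."""

    def __init__(self, text):
        self.text = text
        # Assumption: simple Unicode case folding is the equality we want here.
        self._key = text.casefold()

    def __eq__(self, other):
        return isinstance(other, CaseInsensitive) and self._key == other._key

    def __hash__(self):
        # Equal wrappers must hash equally, so hash the folded key.
        return hash(self._key)

    def __repr__(self):
        return f"CaseInsensitive({self.text!r})"


# The set is untouched; the wrapper type alone decides what "same key" means.
seen = {CaseInsensitive("Hello"), CaseInsensitive("HELLO"), CaseInsensitive("world")}
print(len(seen))  # 2
```

The container takes no hasher parameter at all: hashing is driven by the element type, which is the design being defended above.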

gonadic io
Feb 16, 2011

>>=
agreed. consider the case where you want to sort book titles, and ignore "the" at the start for the purposes of sorting.
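A minimal sketch of that book-title case in Python, assuming English titles and a hypothetical `title_sort_key` helper. The sort is parameterized by a key function, and the titles themselves are left untouched:

```python
def title_sort_key(title):
    """Sort key that ignores a leading 'the' and letter case."""
    words = title.split()
    # Drop the article only when something follows it.
    if len(words) > 1 and words[0].casefold() == "the":
        words = words[1:]
    return " ".join(words).casefold()


titles = ["The Hobbit", "Dune", "the road", "Emma"]
print(sorted(titles, key=title_sort_key))
# ['Dune', 'Emma', 'The Hobbit', 'the road']
```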

CPColin
Sep 9, 2003

Big ol' smile.

DONT THREAD ON ME posted:

we did it!!!!!!

This is a true YOSPOS success story and everybody should be happy!!!

echinopsis
Apr 13, 2004

by Fluffdaddy

hackbunny posted:

not really? think strings, there are so many ways to call two strings "equal". it's the reason why .net has an IEqualityComparer


let's not give dangerous oversimplified lessons. string comparison must not be made easier, in fact it's always been too easy. even something as simple as lowercase vs uppercase for case insensitive comparison is far from trivial - windows uppercases, using a non-standard internal table. the c runtime library lowercases, even on windows, so you really shouldn't compare filenames with wcsicmp (there's even a provision for ntfs volumes to use an internal built-in uppercasing table - special file $UpCase - but I don't know if it's actually used in practice. I really hope not!). and let's not even get into culturally sensitive vs culturally independent casing, and corner cases like the german sharp-S or the turkish dotless I

there used to be an amazing blog about all these subtleties, by microsoft engineer Michael Kaplan, but he died and the blog was taken offline. I didn't learn much because the topic is unimaginably complex but I learned to treat strings like dangerous wild animals

among the things I learned is the kinda brilliant solution microsoft found for efficient contextual collation of strings: transforming text strings into sort keys - binary strings that encode the sorting weights of each character (see LCMapStringEx with the LCMAP_SORTKEY flag). these binary strings can then be cached and efficiently compared with strcmp, giving the same result as comparing the original strings - extremely convenient in places like databases (just remember to store the version information of the sorting data, and reindex if necessary!). SQL Server definitely uses them and I wonder how much work it would be to add support to sqlite, possibly as an extension, because I always find myself having to add user-defined functions

i call bullshit

eschaton
Mar 7, 2007

Don't you just hate when you wind up in a store with people who are in a socioeconomic class that is pretty obviously about two levels lower than your own?

rjmccall posted:

actual humans who use actual human languages want a locale-sensitive comparison, and it is a huge fuckup for a programming language to use one of those as the default ordering of strings

in general, sorts are parameterized because using a custom sort on data is an incredibly common thing to do because sorts are user-meaningful.

this is true

nonetheless, when we’ve made affirmative design decisions that some things (like sorting) belong at the presentation layer rather than at lower layers, you would not believe the wailing and gnashing of teeth

like, “who cares about the relational model and consistency, I need ordered relationships!” of course followed quickly by “what do you mean these don’t perform as well as unordered relationships?!”

pokeyman
Nov 26, 2006

That elephant ate my entire platoon.
preemptive thanks for the swift Hasher discussion because I can already tell this will come up irl one day and I’ll be Prepared

also congrats everyone on figuring out what each other was saying, I think it's time for everyone involved who’s ok with it to have a hug

floatman
Mar 17, 2009
I do comparisons in PHP by randomly alternating between 2 equals and 3 equals
Works 4 me

Soricidus
Oct 21, 2010
freedom-hating statist shill

floatman posted:

I do comparisons in PHP by randomly alternating between 2 equals and 3 equals
Works 4 me

same, which is possibly suboptimal because starting all those php subprocesses from java has a fair bit of overhead

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE
one does not simply normalize unicode



i guess this is sorta implying that 7-bit ascii is an evil artifact corrupting whoever uses it, but that's not exactly wrong is it

(credit to @FakeUnicode, excellent unicode horror account)

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE
it's essentially impossible to do string normalization in a way that makes sense to humans without language metadata at the very least (but even that might not be enough). asking yourself what it actually means for two strings to be considered equal is a downright philosophical question.

champagne posting
Apr 5, 2006

YOU ARE A BRAIN
IN A BUNKER

TheFluff posted:

it's essentially impossible to do string normalization in a way that makes sense to humans without language metadata at the very least (but even that might not be enough). asking yourself what it actually means for two strings to be considered equal is a downright philosophical question.

clearly the solution lies somewhere in pattern recognition ai

Zlodo
Nov 25, 2006
just normalize using a neural network to recognize similar character shapes

all you need to standardize is which font to use

champagne posting
Apr 5, 2006

YOU ARE A BRAIN
IN A BUNKER

Zlodo posted:

all you need to standardize is which font to use

this project is doomed from the start

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
all internet arguments end with philosophizing over definitions and this one is no different

Vomik
Jul 29, 2003

This post is dedicated to the brave Mujahideen fighters of Afghanistan

Zlodo posted:

just normalize using a neural network to recognize similar character shapes

all you need to standardize is which font to use

comic Sans MS - an accessibility font

fritz
Jul 26, 2003

TheFluff posted:

one does not simply normalize unicode



i guess this is sorta implying that 7-bit ascii is an evil artifact corrupting whoever uses it, but that's not exactly wrong is it

(credit to @FakeUnicode, excellent unicode horror account)

idk the mongolians have a special character for 'ill get around to it eventually'

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

Zlodo posted:

just normalize using a neural network to recognize similar character shapes

all you need to standardize is which font to use
you joke, but imma well-actually it anyway

to a scandinavian, ö is an entirely separate letter which sorts at the end of the alphabet and is just as distinct from o as a is from b, to the point that text that replaces ö with o is markedly hard to read, and a text search that treated the two as equal would be borderline useless.

on the other hand, to an american reading the new yorker, ö is just a pretentious way of writing o in certain words, and in order for text search to work as people expect, it should sort as an o and compare equal to an o. the unicode representation is identical in both cases, assuming you did your homework and use NFC normalization like the best practices told you (and they're not wrong).

then there's german, which is somewhere in between - it should probably be treated as distinct from a regular o, but on the other hand it usually sorts as if it was an o.

this is all babby tier compared to some of the poo poo going on in the east asian languages, where for example the same unicode codepoint can render a visually significantly different character depending on what language (or country, or part of a country) you're in.


it might sound like i'm stoned out of my mind when i go "but what does it meeeeeean to say that two character sequences are the same, duuuuuuude" but it's an extremely relevant question to ask yourself if you're doing that


e: unicode very intentionally stops short of the language level; the fact that a new yorker ö has the same representation as a swedish ö is a feature, not a bug, and all the spooky language stuff is somebody else's problem
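A small Python sketch of where normalization stops, using the standard `unicodedata` module. The helper names are made up for illustration. NFC makes the two encodings of ö compare equal, but whether ö should also match a plain o is a language-level decision that has to be made explicitly, here by stripping combining marks:

```python
import unicodedata


def same_after_nfc(a, b):
    """True when two strings are identical once both are NFC-normalized."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def strip_marks(text):
    """Decompose, then drop combining marks (the 'new yorker' reading of ö)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


composed = "\u00f6"      # ö as a single code point
decomposed = "o\u0308"   # o followed by COMBINING DIAERESIS

print(composed == decomposed)                # False
print(same_after_nfc(composed, decomposed))  # True
print(same_after_nfc(composed, "o"))         # False
print(strip_marks("co\u00f6perate"))         # cooperate
```

Treating ö as o with `strip_marks` suits the second reader described above and is wrong for the first; nothing in the string itself says which one applies.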

TheFluff fucked around with this message at 21:27 on Mar 4, 2019

hackbunny
Jul 22, 2007

I haven't been on SA for years but the person who gave me my previous av as a joke felt guilty for doing so and decided to get me a non-shitty av
thank you Suspicious Dish for the archived copy of Sorting it All Out, I've been re-reading it with great interest. I've also found a mistake!

"Comparison confusion: INVARIANT vs. ORDINAL posted:

Originally, LOCALE_INVARIANT had just one noble purpose -- to allow one to use CompareString (and LCMapString with the LCMAP_SORTKEY flag) in a way that would only use the "Default" Windows sorting table as mentioned a little bit here and especially here.

:wrong: the invariant locale uses different casing data! to sort/search strings case-insensitively in the same way the filesystem, registry, etc. do, you want an ordinal comparison - CompareStringOrdinal. I know because this morning I checked all 4294967296 combinations of two WCHARs (including unpaired utf-16 surrogates and other illegal unicode characters - which are actually legal in windows object names) and CompareStringOrdinal matches RtlCompareUnicodeString (what the kernel and drivers use) exactly; I haven't checked wcsicmp but the fact that it folds to lowercase instead of uppercase is a guarantee of different behavior (counter-intuitively, lots of case-folding operations are one-way). I learned something new! and I can stop using a kind-of-undocumented function
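A short Python illustration of why folding to uppercase and folding to lowercase can disagree. This uses Python's built-in Unicode casing, not the Windows tables or the C runtime discussed above, so it only shows the general effect; `folds_agree` is a hypothetical helper:

```python
def folds_agree(a, b):
    """Return (equal after uppercasing, equal after lowercasing)."""
    return (a.upper() == b.upper(), a.lower() == b.lower())


# German sharp S uppercases to "SS", but lowercasing never goes back.
print(folds_agree("\u00df", "ss"))  # (True, False)

# Turkish dotless i uppercases to plain "I", but stays distinct in lowercase.
print(folds_agree("\u0131", "i"))   # (True, False)

# KELVIN SIGN lowercases to plain "k", but stays distinct in uppercase.
print(folds_agree("\u212a", "k"))   # (False, True)
```

Whichever direction is chosen, some pair of strings ends up equal under one fold and different under the other, so two components that fold in opposite directions cannot be assumed to agree.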

(I wonder if michael eventually realized the mistake and fixed it in a later post. apropos of nothing :nws: lol at his final blog post:nws:. what a legacy)

why bother, you may ask? because when you duplicate OS behavior, you want the highest fidelity possible. winging it may open a security hole

hackbunny fucked around with this message at 21:33 on Mar 4, 2019


pseudorandom name
May 6, 2007

except NTFS and exFAT have embedded case-folding tables
