p-lang thread: (now (have you (problems two)))


Volte: Oct 4, 2004; woosh woosh

pseudorandom name posted:

Swift calls your HashFunction type Hasher

Actually what it calls Hasher is what I also called Hasher. Here's the lone sentence in giant letters at the top of the Hasher documentation page:

quote:

The universal hash function used by Set and Dictionary.

If it wasn't obvious by this point, my problem is "universal hash function".

# ? Mar 3, 2019 20:54

pseudorandom name: May 6, 2007

yeah, like I said, your complaint is that Set and Dictionary don't take custom hashers as a parameter to their initializers

# ? Mar 3, 2019 20:56

Volte: Oct 4, 2004; woosh woosh

It's actually that custom hashers aren't even a concept that is compatible with Swift's concept of hashability, but I won't argue that Set and Dictionary not having initializers for them is a natural consequence of that.

# ? Mar 3, 2019 20:57

pseudorandom name: May 6, 2007

oh for fucks sake, I'm sorry, I understand what you're complaining about now

I was under the wrong impression that Hasher was a protocol not a struct because why would you correctly generalize Hashable and then gently caress up Hasher

# ? Mar 3, 2019 21:01

DONT THREAD ON ME: Oct 1, 2002; by Nyc_Tattoo; Floss Finder

we did it!!!!!!

# ? Mar 3, 2019 21:03

Beamed: Nov 26, 2010; Then you have a responsibility that no man has ever faced. You have your fear which could become reality, and you have Godzilla, which is reality.

pseudorandom name posted:

see how few posts that took when reading comprehension was involved?

pseudorandom name posted:

oh for fucks sake, I'm sorry, I understand what you're complaining about now

# ? Mar 3, 2019 21:05

DONT THREAD ON ME: Oct 1, 2002; by Nyc_Tattoo; Floss Finder

i bet pseudorandom name's face is so red right now but respect for coming around in the end

# ? Mar 3, 2019 21:06

Volte: Oct 4, 2004; woosh woosh

Makes sense, I had the same misunderstanding about Hasher initially. Sadly even if Hasher was a protocol, it would be a red herring, because at best you could choose which operation you use to combine the members together, rather than customize the entire notion of a hash for a particular type.

# ? Mar 3, 2019 21:07

pseudorandom name: May 6, 2007

oh god and you wrote it right there in the very first post

# ? Mar 3, 2019 21:08

ynohtna: Feb 16, 2007; backwoods compatible; Illegal Hen

if the hash suits smoke it

# ? Mar 3, 2019 21:09

raminasi: Jan 25, 2005; a last drink with no ice

DONT THREAD ON ME posted:

i bet pseudorandom name's face is so red right now but respect for coming around in the end

# ? Mar 3, 2019 21:21

luchadornado: Oct 7, 2004; A boombox is not a toy!

as someone who has just spent a day wrangling siphash and fnv, these last two pages were traumatic

# ? Mar 3, 2019 21:29

hackbunny: Jul 22, 2007; I haven't been on SA for years but the person who gave me my previous av as a joke felt guilty for doing so and decided to get me a non-shitty av

Volte posted:

Makes sense, I had the same misunderstanding about Hasher initially

me too

but I checked before committing to it. I edit so much wrong stuff out of my posts before hitting submit that I wonder if what's left makes sense

# ? Mar 3, 2019 21:40

Beamed: Nov 26, 2010; Then you have a responsibility that no man has ever faced. You have your fear which could become reality, and you have Godzilla, which is reality.

ya'll really hit my inferiority complex with this sheer corpus of knowledge you guys have. gd

# ? Mar 3, 2019 21:43

Athas: Aug 6, 2007; fuck that joker

I just use trees so I don't have to worry about computing hashes.

# ? Mar 3, 2019 21:49

Sweeper: Nov 29, 2007; The Joe Buck of Posting; Dinosaur Gum

Athas posted:

I just use trees so I don't have to worry about computing hashes.

let's have this argument again but with generated ordering operators

# ? Mar 3, 2019 21:52

gonadic io: Feb 16, 2011; >>=

Sweeper posted:

let's have this argument again but with generated ordering operators

lol, I don't know of many languages that have ordered sets/maps that take an ord function in their constructor. most of the ones i use require you to wrap your type in a newtype wrapper with a different ord instance. haskell has this stuff built in, you can do stuff like

[3, 4].map(x => Down(x)).sort().map(d => unDown(d)) == [4, 3]
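
a rough Python analogue of that newtype trick, as a sketch only: `Down` here is a hypothetical wrapper class written for this example (Python has no built-in equivalent of Haskell's), and its deliberately reversed `__lt__` is the only thing that flips the sort.

```python
from functools import total_ordering


@total_ordering
class Down:
    """Hypothetical wrapper whose ordering is the reverse of the wrapped value's."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return self.value == other.value

    def __lt__(self, other):
        # Reversed on purpose: Down(a) < Down(b) exactly when a > b.
        return self.value > other.value


def sort_descending(xs):
    # Wrap, sort using the wrapper's ordering, then unwrap again.
    return [d.value for d in sorted(Down(x) for x in xs)]
```

the sort itself never learns about "descending"; the ordering travels with the type, which is the whole point of the wrapper approach.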

# ? Mar 3, 2019 21:56

Sweeper: Nov 29, 2007; The Joe Buck of Posting; Dinosaur Gum

gonadic io posted:

lol, I don't know of many languages that have ordered sets/maps that take an ord function in their constructor. most of the ones i use require you to wrap your type in a newtype wrapper with a different ord instance. haskell has this stuff built in, you can do stuff like

[3, 4].map(x => Down(x)).sort().map(d => unDown(d)) == [4, 3]

have you heard the good news

# ? Mar 3, 2019 22:06

eschaton: Mar 7, 2007; Don't you just hate when you wind up in a store with people who are in a socioeconomic class that is pretty obviously about two levels lower than your own?

rjmccall posted:

case-insensitive comparison is basically a broken concept

except to, you know, actual humans who use human languages

# ? Mar 3, 2019 23:09

eschaton: Mar 7, 2007; Don't you just hate when you wind up in a store with people who are in a socioeconomic class that is pretty obviously about two levels lower than your own?

Soricidus posted:

c# does it that way because it's a java clone and java does it that way

idk why java does it that way

because Java is an Objective-C clone at heart and OpenStep NSObject does it that way

probably because Stepstone's original Object class did it that way

and I bet it was originally that way in Smalltalk-76 too

# ? Mar 3, 2019 23:19

AggressivelyStupid: Jan 9, 2012

strings, are bad

# ? Mar 4, 2019 00:37

rjmccall: Sep 7, 2007; no worries friend; Fun Shoe

eschaton posted:

except to, you know, actual humans who use human languages

actual humans who use actual human languages want a locale-sensitive comparison, and it is a huge fuckup for a programming language to use one of those as the default ordering of strings

in general, sorts are parameterized because using a custom sort on data is an incredibly common thing to do because sorts are user-meaningful. hashing is not user-meaningful so the use cases for non-standard hashes are niche as gently caress, and i say that as someone who does a lot of niche algorithms work. making hashing be driven by type is a totally reasonable choice, and the resistance to it comes from the fact that a lot of people treat defining a trivial wrapper type as one of the most onerous tasks a programmer can be asked to do, as opposed to something that sensible programmers should be doing as a matter of course whenever it's useful
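
for a sense of how small such a wrapper type can be, here is a minimal Python sketch (not Swift, and not anyone's actual library code). `CaseInsensitive` is a hypothetical name, and `str.casefold()` stands in for whatever notion of equality you actually need; it is locale-blind, so it is not a real answer to the string comparison problems discussed elsewhere in the thread.

```python
class CaseInsensitive:
    """Hypothetical wrapper: equality and hash both go through str.casefold()."""

    def __init__(self, text):
        self.text = text
        self._key = text.casefold()

    def __eq__(self, other):
        return isinstance(other, CaseInsensitive) and self._key == other._key

    def __hash__(self):
        # Must agree with __eq__: wrappers that compare equal hash equally.
        return hash(self._key)


# The dict is never told about the custom hash; it comes from the key's type.
counts = {}
for word in ["Hash", "hash", "HASH", "tree"]:
    key = CaseInsensitive(word)
    counts[key] = counts.get(key, 0) + 1
```

the container stays generic and the non-standard hashing lives entirely in the key type, which is the "driven by type" design being described.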

# ? Mar 4, 2019 00:43

gonadic io: Feb 16, 2011; >>=

agreed. consider the case where you want to sort book titles, and ignore "the" at the start for the purposes of sorting.
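
a quick Python sketch of that book title case, assuming English titles only. `title_sort_key` is a made-up helper for illustration, and a real version would need per-language article lists.

```python
def title_sort_key(title):
    """Hypothetical key function: ignore a leading 'the ' and ignore case."""
    lowered = title.casefold()
    if lowered.startswith("the "):
        lowered = lowered[len("the "):]
    return lowered


titles = ["The Hobbit", "Dune", "The Art of War", "Emma"]

# The ordering rule is passed to the sort; the strings themselves are unchanged.
ordered = sorted(titles, key=title_sort_key)
```

this is the user-meaningful kind of customization: the same list might be sorted a different way on the next screen, so the rule belongs to the sort call and not to the string type.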

# ? Mar 4, 2019 00:45

CPColin: Sep 9, 2003; Big ol' smile.

DONT THREAD ON ME posted:

we did it!!!!!!

This is a true YOSPOS success story and everybody should be happy!!!

# ? Mar 4, 2019 00:46

echinopsis: Apr 13, 2004; by Fluffdaddy

hackbunny posted:

not really? think strings, there are so many ways to call two strings "equal". it's the reason why .net has an IEqualityComparer

let's not give dangerous oversimplified lessons. string comparison must not be made easier, in fact it's always been too easy. even something as simple as lowercase vs uppercase for case insensitive comparison is far from trivial - windows uppercases, using a non-standard internal table. the c runtime library lowercases, even on windows, so you really shouldn't compare filenames with wcsicmp (there's even a provision for ntfs volumes to use an internal built-in uppercasing table - special file $UpCase - but I don't know if it's actually used in practice. I really hope not!). and let's not even get into culturally sensitive vs culturally independent casing, and corner cases like the german sharp-S or the turkish dotless I

there used to be an amazing blog about all these subtleties, by microsoft engineer Michael Kaplan, but he died and the blog was taken offline. I didn't learn much because the topic is unimaginably complex but I learned to treat strings like dangerous wild animals

among the things I learned is the kinda brilliant solution microsoft found for efficient contextual collation of strings: transforming text strings into sort keys - binary strings that encode the sorting weights of each character (see LCMapStringEx with the LCMAP_SORTKEY flag). these binary strings can then be cached and efficiently compared with strcmp, giving the same result of comparing the original strings - extremely convenient in places like databases (just remember to store the version information of the sorting data, and reindex if necessary!). SQL Server definitely uses them and I wonder how much work it would be to add support to sqlite, possibly as an extension, because I always find myself having to add user-defined functions

i call bullshit

# ? Mar 4, 2019 00:52

eschaton: Mar 7, 2007; Don't you just hate when you wind up in a store with people who are in a socioeconomic class that is pretty obviously about two levels lower than your own?

rjmccall posted:

actual humans who use actual human languages want a locale-sensitive comparison, and it is a huge fuckup for a programming language to use one of those as the default ordering of strings

in general, sorts are parameterized because using a custom sort on data is an incredibly common thing to do because sorts are user-meaningful.

this is true

nonetheless, when we've made affirmative design decisions that some things (like sorting) belong at the presentation layer rather than at lower layers, you would not believe the wailing and gnashing of teeth

like, "who cares about the relational model and consistency, I need ordered relationships!" of course followed quickly by "what do you mean these don't perform as well as unordered relationships?!"

# ? Mar 4, 2019 02:47

pokeyman: Nov 26, 2006; That elephant ate my entire platoon.

preemptive thanks for the swift Hasher discussion because I can already tell this will come up irl one day and I'll be Prepared

also congrats everyone on figuring out what each other was saying, I think it's time for everyone involved who's ok with it to have a hug

# ? Mar 4, 2019 03:50

floatman: Mar 17, 2009

I do comparisons in PHP by randomly alternating between 2 equals and 3 equals
Works 4 me

# ? Mar 4, 2019 06:53

Soricidus: Oct 21, 2010; freedom-hating statist shill

floatman posted:

I do comparisons in PHP by randomly alternating between 2 equals and 3 equals
Works 4 me

same, which is possibly suboptimal because starting all those php subprocesses from java has a fair bit of overhead

# ? Mar 4, 2019 09:27

TheFluff: Dec 13, 2006; FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

one does not simply normalize unicode

i guess this is sorta implying that 7-bit ascii is an evil artifact corrupting whoever uses it, but that's not exactly wrong is it

(credit to @FakeUnicode, excellent unicode horror account)

# ? Mar 4, 2019 13:19

TheFluff: Dec 13, 2006; FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

it's essentially impossible to do string normalization in a way that makes sense to humans without language metadata at the very least (but even that might not be enough). asking yourself what it actually means for two strings to be considered equal is a downright philosophical question.

# ? Mar 4, 2019 13:30

champagne posting: Apr 5, 2006; YOU ARE A BRAIN
IN A BUNKER

TheFluff posted:

it's essentially impossible to do string normalization in a way that makes sense to humans without language metadata at the very least (but even that might not be enough). asking yourself what it actually means for two strings to be considered equal is a downright philosophical question.

clearly the solution lies somewhere in pattern recognition ai

# ? Mar 4, 2019 13:39

Zlodo: Nov 25, 2006

just normalize using a neural network to recognize similar character shapes

all you need to standardize is which font to use

# ? Mar 4, 2019 14:10

champagne posting: Apr 5, 2006; YOU ARE A BRAIN
IN A BUNKER

Zlodo posted:

all you need to standardize is which font to use

this project is doomed from the start

# ? Mar 4, 2019 14:30

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

all internet arguments end with philosophizing over definitions and this one is no different

# ? Mar 4, 2019 15:22

Vomik: Jul 29, 2003; This post is dedicated to the brave Mujahideen fighters of Afghanistan

Zlodo posted:

just normalize using a neural network to recognize similar character shapes

all you need to standardize is which font to use

comic Sans MS - an accessibility font

# ? Mar 4, 2019 15:27

fritz: Jul 26, 2003

TheFluff posted:

one does not simply normalize unicode

i guess this is sorta implying that 7-bit ascii is an evil artifact corrupting whoever uses it, but that's not exactly wrong is it

(credit to @FakeUnicode, excellent unicode horror account)

idk the mongolians have a special character for 'ill get around to it eventually'

# ? Mar 4, 2019 16:22

TheFluff: Dec 13, 2006; FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

Zlodo posted:

just normalize using a neural network to recognize similar character shapes

all you need to standardize is which font to use

you joke, but imma well-actually it anyway

to a scandinavian, ö is an entirely separate letter which sorts at the end of the alphabet and is just as distinct from o as a is from b, to the point that text that replaces ö with o is markedly hard to read, and a text search that treated the two as equal would be borderline useless.

on the other hand, to an american reading the new yorker, ö is just a pretentious way of writing o in certain words, and in order for text search to work as people expect, it should sort as an o and compare equal to an o. the unicode representation is identical in both cases, assuming you did your homework and use NFC normalization like the best practices told you (and they're not wrong).

then there's german, which is somewhere in between - it should probably be treated as distinct from a regular o, but on the other hand it usually sorts as if it was an o.

this is all babby tier compared to some of the poo poo going on in the east asian languages, where for example the same unicode codepoint can render a visually significantly different character depending on what language (or country, or part of a country) you're in.

it might sound like i'm stoned out of my mind when i go "but what does it meeeeeean to say that two character sequences are the same, duuuuuuude" but it's an extremely relevant question to ask yourself if you're doing that

e: unicode very intentionally stops short of the language level; the fact that a new yorker ö has the same representation as a swedish ö is a feature, not a bug, and all the spooky language stuff is somebody else's problem
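
to make the normalization part concrete, a small Python sketch using the standard `unicodedata` module. `strip_marks` is a hypothetical helper added for illustration; it is the lossy, language-blind kind of folding that suits the new yorker reader and is wrong for the swedish one.

```python
import unicodedata

decomposed = "o\u0308"   # 'o' followed by COMBINING DIAERESIS
precomposed = "\u00f6"   # LATIN SMALL LETTER O WITH DIAERESIS

# Different code point sequences, so a plain comparison says "not equal".
raw_equal = decomposed == precomposed

# NFC composes the pair into the single precomposed code point.
nfc_equal = unicodedata.normalize("NFC", decomposed) == precomposed


def strip_marks(text):
    """Hypothetical helper: decompose, then drop every combining mark."""
    decomposed_text = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed_text if not unicodedata.combining(ch))
```

NFC only settles which code points spell the letter. whether that letter should then match a plain o is the language-level question that normalization deliberately leaves open.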

TheFluff fucked around with this message at 21:27 on Mar 4, 2019

# ? Mar 4, 2019 21:11

hackbunny: Jul 22, 2007; I haven't been on SA for years but the person who gave me my previous av as a joke felt guilty for doing so and decided to get me a non-shitty av

thank you Suspicious Dish for the archived copy of Sorting it All Out, I've been re-reading it with great interest. I've also found a mistake!

"Comparison confusion: INVARIANT vs. ORDINAL" posted:

Originally, LOCALE_INVARIANT had just one noble purpose -- to allow one to use CompareString (and LCMapString with the LCMAP_SORTKEY flag) in a way that would only use the "Default" Windows sorting table as mentioned a little bit here and especially here.

the invariant locale uses different casing data! to sort/search strings case-insensitively in the same way the filesystem, registry, etc. do, you want an ordinal comparison - CompareStringOrdinal. I know because this morning I checked all 4294967296 combinations of two WCHARs (including unpaired utf-16 surrogates and other illegal unicode characters - which are actually legal in windows object names) and CompareStringOrdinal matches RtlCompareUnicodeString (what the kernel and drivers use) exactly; I haven't checked wcsicmp but the fact that it folds to lowercase instead of uppercase is a guarantee of different behavior (counter-intuitively, lots of case-folding operations are one-way). I learned something new! and I can stop using a kind-of-undocumented function

(I wonder if michael eventually realized the mistake and fixed it in a later post. apropos of nothing, lol at his final blog post :nws:. what a legacy)

why bother, you may ask? because when you duplicate OS behavior, you want the highest fidelity possible. winging it may open a security hole
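
the point about uppercase and lowercase folding disagreeing can be seen without any windows APIs. this Python sketch uses Python's own Unicode case mappings, which are an assumption standing in for the real thing: they are not the tables that CompareStringOrdinal, RtlCompareUnicodeString or the C runtime use, and the two helper functions are made up for the example.

```python
def equal_by_upper(a, b):
    # "Case-insensitive" in the sense of: same after uppercasing.
    return a.upper() == b.upper()


def equal_by_lower(a, b):
    # "Case-insensitive" in the sense of: same after lowercasing.
    return a.lower() == b.lower()


dotless_i = "\u0131"    # LATIN SMALL LETTER DOTLESS I: uppercases to plain "I"
kelvin_sign = "\u212a"  # KELVIN SIGN: lowercases to plain "k"

# Uppercasing merges dotless i with "i"; lowercasing keeps them apart.
upper_merges_i = equal_by_upper(dotless_i, "i")
lower_merges_i = equal_by_lower(dotless_i, "i")

# Lowercasing merges the Kelvin sign with "k"; uppercasing keeps them apart.
upper_merges_k = equal_by_upper(kelvin_sign, "k")
lower_merges_k = equal_by_lower(kelvin_sign, "k")
```

each direction collapses pairs the other one keeps distinct, so two "case-insensitive" comparisons built on different folds cannot be assumed to agree on every input.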

hackbunny fucked around with this message at 21:33 on Mar 4, 2019

# ? Mar 4, 2019 21:28

pseudorandom name: May 6, 2007

except NTFS and exFAT have embedded case-folding tables

# ? Mar 4, 2019 21:33
