Coding Horrors: You can gather all your technical debt into one easy framework!

Subjunctive: Sep 12, 2006; ✨sparkle and shine✨

qntm posted:

I'm going to go with .toLowerCase().length as my personal highlight here.

I'm trying to decide if there's a unicode case where lowering can change length. Maybe ß?

# ? Oct 15, 2015 19:54

omeg: Sep 3, 2012

Arabic scripts have some weird rules so maybe there. But that's mostly character joining, dunno about casing.

# ? Oct 15, 2015 19:56

ChubbyThePhat: Dec 22, 2006; Who nico nico needs anyone else

Mogomra posted:

Not the most horrifying thing I've ever seen, but certainly :pwn:

code:

for (var i = 0; i < 26; i++) {
	if (typeof request.app.requestData.fields['image_' + i + '_websrc'] !== 'undefined') {
		if (request.app.requestData.fields['image_' + i + '_websrc'].length > 0) {
			fileCountToGrab++;
			var VALID_EXTS = [".jpg", ".jpeg", ".bmp", ".png", ".gif", "ttf", "otf"];
			if (VALID_EXTS.indexOf(request.app.requestData.fields['image_' + i + '_websrc'].toLowerCase().substring(request.app.requestData.fields['image_' + i + '_websrc'].toLowerCase().length - 4).toLowerCase()) == -1 && VALID_EXTS.indexOf(request.app.requestData.fields['image_' + i + '_websrc'].toLowerCase().substring(request.app.requestData.fields['image_' + i + '_websrc'].toLowerCase().length - 5).toLowerCase()) == -1) {
				return false;
			}
		}
	}
}

What the gently caress lol. The sheer number of toLowerCase() calls is just murderous. I also had to stop myself from writing that with a ternary operator.

# ? Oct 15, 2015 20:36

qntm: Jun 17, 2009

Subjunctive posted:

I'm trying to decide if there's a unicode case where lowering can change length. Maybe ß?

Hmm, it looks like setting ß to upper case yields "SS" but the reverse isn't true. I think I'm correct, although mainly by coincidence.
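For anyone who wants to poke at this in a console, here is a minimal sketch. It is not from the original code, and the variable names are made up for illustration; it only shows that upper-casing expands the sharp s while lower-casing does not contract it back.

```javascript
// '\u00DF' is LATIN SMALL LETTER SHARP S, written as an escape to dodge encoding trouble.
const sharpS = '\u00DF';

// Upper-casing expands one code unit into two.
const upperSharpS = sharpS.toUpperCase();

// Lower-casing the result gives plain "ss", not the sharp s again.
const roundTripSharpS = upperSharpS.toLowerCase();

console.log(sharpS.length, upperSharpS, upperSharpS.length);
console.log(roundTripSharpS, roundTripSharpS === sharpS);
```

So the length only changes in the upper-casing direction here, which is why the .toLowerCase().length in the quoted code is safe for this particular letter.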

# ? Oct 15, 2015 20:39

Bonfire Lit: Jul 9, 2008; If you're one of the sinners who caused this please unfriend me now.

Subjunctive posted:

I'm trying to decide if there's a unicode case where lowering can change length. Maybe ß?

ß already is lower-case. (The capital form is ẞ and nobody uses it.)

e:

qntm posted:

Hmm, it looks like setting ß to upper case yields "SS" but the reverse isn't true. I think I'm correct, although mainly by coincidence.

Yeah, ß has no traditional capital form because it never occurs at the beginning of a word. If you're type-setting allcaps, you're supposed to use SS unless you have to preserve the exact spelling of a proper name.

Bonfire Lit fucked around with this message at 20:43 on Oct 15, 2015

# ? Oct 15, 2015 20:40

Flobbster: Feb 17, 2005; "Cadet Kirk, after the way you cheated on the Kobayashi Maru test I oughta punch you in tha face!"

qntm posted:

Hmm, it looks like setting ß to upper case yields "SS" but the reverse isn't true. I think I'm correct, although mainly by coincidence.

That makes sense about the reverse, since the linguistic rules in German about "ss" vs. "ß" are far more complex than a 1:1 mapping.

# ? Oct 15, 2015 20:53

Mogomra: Nov 5, 2005; simply having a wonderful time

Just to clarify about the ß stuff:

All of the *_websrc properties are AWS S3 urls, and are pretty much guaranteed to not have magical characters that multiply when they're upper/lowercased.

# ? Oct 15, 2015 21:00

Soricidus: Oct 21, 2010; freedom-hating statist shill

omeg posted:

Arabic scripts have some weird rules so maybe there. But that's mostly character joining, dunno about casing.

nah, arabic doesn't have casing, it's mostly only found in european alphabets

# ? Oct 15, 2015 21:04

Plorkyeran: Mar 22, 2007; To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

Case folding does ß -> ss so that ß.toUpperCase().toFoldCase() == ß.toFoldCase().

# ? Oct 15, 2015 21:27

Carbon dioxide: Oct 9, 2012

Mogomra posted:

Not the most horrifying thing I've ever seen, but certainly :pwn:

code:

for (var i = 0; i < 26; i++) {
	if (typeof request.app.requestData.fields['image_' + i + '_websrc'] !== 'undefined') {
		if (request.app.requestData.fields['image_' + i + '_websrc'].length > 0) {
			fileCountToGrab++;
			var VALID_EXTS = [".jpg", ".jpeg", ".bmp", ".png", ".gif", "ttf", "otf"];
			if (VALID_EXTS.indexOf(request.app.requestData.fields['image_' + i + '_websrc'].toLowerCase().substring(request.app.requestData.fields['image_' + i + '_websrc'].toLowerCase().length - 4).toLowerCase()) == -1 && VALID_EXTS.indexOf(request.app.requestData.fields['image_' + i + '_websrc'].toLowerCase().substring(request.app.requestData.fields['image_' + i + '_websrc'].toLowerCase().length - 5).toLowerCase()) == -1) {
				return false;
			}
		}
	}
}

.jpg and .gif have a dot, but ttf and otf don't.
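To spell out what that means in practice, here is a rough sketch. passesOriginalCheck is a hypothetical helper that mirrors the last-4 / last-5 character comparison from the quoted snippet, and the example.com URLs are made up.

```javascript
// Same extension list as the quoted snippet, including the dot-less font entries.
const VALID_EXTS = ['.jpg', '.jpeg', '.bmp', '.png', '.gif', 'ttf', 'otf'];

// Hypothetical helper: passes if the last 4 or the last 5 characters,
// lower-cased once, appear in the list (the inverse of the quoted "return false" test).
function passesOriginalCheck(src) {
  const lower = src.toLowerCase();
  const last4 = lower.substring(lower.length - 4);
  const last5 = lower.substring(lower.length - 5);
  return VALID_EXTS.indexOf(last4) !== -1 || VALID_EXTS.indexOf(last5) !== -1;
}

console.log(passesOriginalCheck('https://example.com/a/photo.JPG'));  // true
console.log(passesOriginalCheck('https://example.com/a/photo.jpeg')); // true
console.log(passesOriginalCheck('https://example.com/a/font.ttf'));   // false
```

The last four characters of a font URL are ".ttf", which never equals the list entry "ttf", so as far as I can tell the font extensions can never match.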

# ? Oct 15, 2015 21:48

Hammerite: Mar 9, 2007; And you don't remember what I said here, either, but it was pompous and stupid.; Jade Ear Joe

Correct me if I'm wrong, but I believe that there is exactly one Unicode character that becomes more than one character on lowercasing, U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE, also known as "the Turkish I".

# ? Oct 15, 2015 22:25

Karate Bastard: Jul 31, 2007; Soiled Meat

toLowerCase() is floor() for strings.

# ? Oct 15, 2015 22:46

Hammerite: Mar 9, 2007; And you don't remember what I said here, either, but it was pompous and stupid.; Jade Ear Joe

Unicode is strings with traps in

# ? Oct 15, 2015 22:51

Karate Bastard: Jul 31, 2007; Soiled Meat

I wish they hadn't invented Unicode. We should have kept "plain text".

# ? Oct 15, 2015 22:57

Subjunctive: Sep 12, 2006; ✨sparkle and shine✨

How naïve.

# ? Oct 15, 2015 22:59

Karate Bastard: Jul 31, 2007; Soiled Meat

# ? Oct 15, 2015 23:01

Plorkyeran: Mar 22, 2007; To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

The headaches involved in working with legacy encodings are a strict superset of the headaches of dealing with unicode.

# ? Oct 15, 2015 23:10

Soricidus: Oct 21, 2010; freedom-hating statist shill

the fundamental mistake of unicode is trying to make computers handle every single traditional writing system at the same time. it's gonna hurt. particularly when you come onto hosed-up poo poo like arabic that needs to be written in several different directions at the same time. seriously read the unicode bidi algorithm some time if you want your head to explode.

the correct answer would, of course, have been for everyone to standardize on ASCII for text. if you want another language, romanize it. if you want to represent another script because you think it looks nice or because you want to preserve your cultural heritage, use a bitmap. problem solved.

now, as for date/time APIs, clearly the optimal solution is to adopt swatch internet time, because

# ? Oct 15, 2015 23:35

Andy Dufresne: Aug 4, 2010; The only good race pace is suicide pace, and today looks like a good day to die

Mogomra posted:

Not the most horrifying thing I've ever seen, but certainly :pwn:

code:

for (var i = 0; i < 26; i++) {
	if (typeof request.app.requestData.fields['image_' + i + '_websrc'] !== 'undefined') {
		if (request.app.requestData.fields['image_' + i + '_websrc'].length > 0) {
			fileCountToGrab++;
			var VALID_EXTS = [".jpg", ".jpeg", ".bmp", ".png", ".gif", "ttf", "otf"];
			if (VALID_EXTS.indexOf(request.app.requestData.fields['image_' + i + '_websrc'].toLowerCase().substring(request.app.requestData.fields['image_' + i + '_websrc'].toLowerCase().length - 4).toLowerCase()) == -1 && VALID_EXTS.indexOf(request.app.requestData.fields['image_' + i + '_websrc'].toLowerCase().substring(request.app.requestData.fields['image_' + i + '_websrc'].toLowerCase().length - 5).toLowerCase()) == -1) {
				return false;
			}
		}
	}
}

I love ReSharper because with the click of a button I can make poo poo like this readable.

# ? Oct 15, 2015 23:45

Asymmetrikon: Oct 30, 2009; I believe you're a big dork!

Actually, the correct answer would have been to define a new, language-agnostic regular alphabet and make people use that

# ? Oct 15, 2015 23:47

b0lt: Apr 29, 2005

Asymmetrikon posted:

Actually, the correct answer would have been to define a new, language-agnostic regular alphabet and make people use that

it's called ascii

# ? Oct 15, 2015 23:50

canis minor: May 4, 2011

Asymmetrikon posted:

Actually, the correct answer would have been to define a new, language-agnostic regular alphabet and make people use that

Isn't that APL?

# ? Oct 16, 2015 00:04

Asymmetrikon: Oct 30, 2009; I believe you're a big dork!

_^(wingdings)

# ? Oct 16, 2015 00:16

Subjunctive: Sep 12, 2006; ✨sparkle and shine✨

Andy Dufresne posted:

I love ReSharper because with the click of a button I can make poo poo like this readable.

How does resharper know that .toLowerCase or such don't have side-effects? Or does it require you to keep track of that?

# ? Oct 16, 2015 00:22

qntm: Jun 17, 2009

Hammerite posted:

Correct me if I'm wrong, but I believe that there is exactly one Unicode character that becomes more than one character on lowercasing, U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE, also known as "the Turkish I".

Looks like you're correct. LATIN CAPITAL LETTER I WITH DOT ABOVE becomes i̇ on lowercasing, which is U+0069 LATIN SMALL LETTER I followed by the very last thing you would expect, U+0307 COMBINING DOT ABOVE.
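A quick way to check this in a JS console, as a sketch (the variable names are just for illustration, and this assumes the plain toLowerCase(), not the locale-aware toLocaleLowerCase()):

```javascript
// U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE, written as an escape.
const dottedCapitalI = '\u0130';
const loweredDottedI = dottedCapitalI.toLowerCase();

// List the code points of the lowered string as U+XXXX labels.
const loweredCodePoints = [...loweredDottedI].map(
  (c) => 'U+' + c.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
);

console.log(dottedCapitalI.length); // 1
console.log(loweredDottedI.length); // 2
console.log(loweredCodePoints);     // [ 'U+0069', 'U+0307' ]
```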

Asymmetrikon posted:

Actually, the correct answer would have been to define a new, language-agnostic regular alphabet and make people use that

Marain!

# ? Oct 16, 2015 00:27

necrotic: Aug 2, 2005; I owe my brother big time for this!

Subjunctive posted:

How does resharper know that .toLowerCase or such don't have side-effects? Or does it require you to keep track of that?

I think it would leave those in, but change the formatting dramatically.

# ? Oct 16, 2015 00:33

Plorkyeran: Mar 22, 2007; To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

qntm posted:

Looks like you're correct. LATIN CAPITAL LETTER I WITH DOT ABOVE becomes i̇ on lowercasing, which is U+0069 LATIN SMALL LETTER I followed by the very last thing you would expect, U+0307 COMBINING DOT ABOVE.

that's still just one character

# ? Oct 16, 2015 00:38

Subjunctive: Sep 12, 2006; ✨sparkle and shine✨

Plorkyeran posted:

that's still just one character

That may or may not mean that it only contributes 1 to the string's length, because JS is pretty awesome that way. See also surrogates.

# ? Oct 16, 2015 00:40

fritz: Jul 26, 2003

Asymmetrikon posted:

Actually, the correct answer would have been to define a new, language-agnostic regular alphabet and make people use that

IPA.

# ? Oct 16, 2015 00:55

Plorkyeran: Mar 22, 2007; To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

Subjunctive posted:

That may or may not mean that it only contributes 1 to the string's length, because JS is pretty awesome that way. See also surrogates.

I'm not aware of any language where .length actually gives the number of characters. The languages that actually try generally end up going the route Swift did with no generic length/count/etc. property and making you explicitly say what sort of thing you want the count of.

# ? Oct 16, 2015 01:15

Soricidus: Oct 21, 2010; freedom-hating statist shill

first, define "character"

# ? Oct 16, 2015 01:18

Blotto Skorzany: Nov 7, 2008; He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip

Soricidus posted:

first, define "character"

doing the right thing, even when nobody's looking

# ? Oct 16, 2015 01:24

Soricidus: Oct 21, 2010; freedom-hating statist shill

That definitely doesn't sound like javascript

# ? Oct 16, 2015 01:25

Dylan16807: May 12, 2010

Soricidus posted:

That definitely doesn't sound like javascript

Well javascript counts UTF-16 code units, which is an answer guaranteed to be wrong for pretty much all uses.
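A small sketch of what that looks like, assuming an ES2015 engine (the variable names are made up):

```javascript
// "a" followed by U+1F600, a code point outside the BMP that needs a surrogate pair.
const withAstral = 'a\u{1F600}';

// .length counts UTF-16 code units: 1 for "a" plus 2 for the surrogate pair.
const codeUnitCount = withAstral.length;

// String iteration walks code points, so spreading gives one entry per code point.
const codePointCount = [...withAstral].length;

console.log(codeUnitCount);  // 3
console.log(codePointCount); // 2
```

Counting code points still isn't counting "characters" in the user-visible sense: i followed by U+0307 COMBINING DOT ABOVE is two code points but displays as one glyph.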

# ? Oct 16, 2015 01:30

Plorkyeran: Mar 22, 2007; To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

Soricidus posted:

first, define "character"

yes, that's what leads to the swift/perl6 solution

# ? Oct 16, 2015 01:36

VikingofRock: Aug 24, 2008

Soricidus posted:

the correct answer would, of course, have been for everyone to standardize on ASCII for text. if you want another language, romanize it. if you want to represent another script because you think it looks nice or because you want to preserve your cultural heritage, use a bitmap. problem solved.

When people say things like this, I can never tell if they are being serious.

# ? Oct 16, 2015 01:43

Pavlov: Oct 21, 2012; I've long been fascinated with how the alt-right develops elaborate and obscure dog whistles to try to communicate their meaning without having to say it out loud
Stepan Andreyevich Bandera being the most prominent example of that

Soricidus posted:

first, define "character"

A series of ones and zeroes, but not too many zeroes, just a moderate amount.

# ? Oct 16, 2015 02:46

Kazinsal: Dec 13, 2011

Dylan16807 posted:

Well javascript counts UTF-16 code units, which is an answer guaranteed to be wrong for pretty much all uses.

It's all about the UTF-EBCDIC.

# ? Oct 16, 2015 03:20

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

I forget if it's V8 or SpiderMonkey that has an "encode to UTF8" function, but doesn't special-case surrogate pairs, so it really gives you CESU-8. Even the incredibly sane opinion of "just always use UTF8" is nearly impossible to implement.

# ? Oct 16, 2015 04:24

Athas: Aug 6, 2007; fuck that joker

As always, the best solution is to stop trying to do text transformation if you can't actually know anything about the text. Doing weird concatenation, length checks, upper/lowercasing and the like is total nonsense for the (very broad) concept of general human text, and is usually not what you actually want. In the horror we are talking about, it seems pretty clear that he just wants a non-case-sensitive comparison, which can be done in a much saner way without actually transforming text. It's still fraught with peril, though, and in this case unnecessary: he probably shouldn't treat a URL as arbitrary human text, but just as a bunch of ASCII characters.

Basically, my point is this: correct text transformation is in the general case hard for the same reason that natural language processing is hard. It's AI-hard. Stop trying to do it. It's not what you want to do anyway. See also: programmers making assumptions about names. These should also be seen as black boxes.
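One possible shape for the case-insensitive comparison mentioned above, purely as a sketch: hasValidExtension and the pattern are hypothetical, not the original author's code. It assumes the URL is plain ASCII and ends with the file name (no query string), and that the fonts were meant to be ".ttf" and ".otf".

```javascript
// Hypothetical pattern: a dot plus one known extension at the very end of the string.
// The `i` flag makes the match case-insensitive, so nothing has to be lower-cased.
const VALID_EXT_PATTERN = /\.(?:jpe?g|bmp|png|gif|ttf|otf)$/i;

// Hypothetical helper: true when the URL ends in one of the allowed extensions.
function hasValidExtension(url) {
  return VALID_EXT_PATTERN.test(url);
}

console.log(hasValidExtension('https://example.com/a/IMG_0001.JPEG')); // true
console.log(hasValidExtension('https://example.com/a/font.OTF'));      // true
console.log(hasValidExtension('https://example.com/a/script.exe'));    // false
```

This avoids length arithmetic entirely, though it is only reasonable because the input is known to be a URL rather than general human text.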

# ? Oct 16, 2015 05:44
