Coding Horrors: You can gather all your technical debt into one easy framework!

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > Coding Horrors: You can gather all your technical debt into one easy framework!

b0lt: Apr 29, 2005

JawnV6 posted:

Put the jne on a different page requiring a fault or better yet straddling the canonical boundary. Everyone gets that edge wrong. I imagine those situations will never get fused though, the FE will recognize it needs to issue the fault before the decoder ever sees the pair.

Vaguely related: I know of at least three recent CPUs that are broken in various fun ways when variable length instructions cross a page boundary

# ? Feb 14, 2017 22:03

Munkeymon: Aug 14, 2003; Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.

rjmccall posted:

who gives a poo poo if someone manages to find a contrived use for one and it has terrible performance?

Your competitors' salespeople?

Someone started a slapfight in the SHSC AMD thread over AVX performance which I'm sure tens of people might have a valid reason to care about in the expected lifetime of the hardware but it's going to influence purchasing decisions somewhere.

# ? Feb 14, 2017 22:17

ExcessBLarg!: Sep 1, 2001

omeg posted:

I only write low level stuff in C nowadays. I've seen
code:
do {
   ...
   if (error) break;
} while (0); 
used for error handling. Thoughts?

I think that's Pascal's version of goto, and it's terrible. Also if the loop doesn't return from the function then you have to re-check the error condition following the loop anyways.

Seriously though, goto has its place. Yes, in structured languages there's alternatives to any goto, but they sometimes require splitting a function where it doesn't otherwise make sense to do so, results in code duplication (usually condition checks), loop break abuse, or things like that. Goto is fine when it makes code concise and enhances its readability.

Also, it doesn't really make sense to gratuitously use gotos in modern languages the way people did in the past, so there's not really that much risk of abuse. In old dialects of Basic or Fortran, goto was often used because C-style control structures simply didn't exist, or because the environment didn't allow for free-form editing and if you needed to insert code in the middle of a routine, you often had to use a pair of unconditional gotos to splice it in.
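To make the "goto has its place" case concrete, here is a minimal sketch of goto-based cleanup in C. The `struct pair` type and the `pair_init` function are hypothetical names made up for illustration; they are not from the code being discussed. Each failure jumps forward to a label that undoes only what has already been done, so there is no need to re-check the error condition after a fake loop the way the do { ... } while (0) version requires.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example type: two heap-allocated strings, all or nothing. */
struct pair {
    char *key;
    char *value;
};

/* Copies key and value into p. Returns 0 on success, -1 on failure.
   On failure both fields are left NULL and nothing is leaked. */
int pair_init(struct pair *p, const char *key, const char *value)
{
    int rc = -1;

    assert(p != NULL);
    p->key = NULL;
    p->value = NULL;

    if (key == NULL || value == NULL)
        goto out;

    p->key = malloc(strlen(key) + 1);
    if (p->key == NULL)
        goto out;
    strcpy(p->key, key);

    p->value = malloc(strlen(value) + 1);
    if (p->value == NULL)
        goto free_key;          /* undo only the first allocation */
    strcpy(p->value, value);

    rc = 0;
    goto out;                   /* success: skip the cleanup label */

free_key:
    free(p->key);
    p->key = NULL;
out:
    return rc;
}
```

Under these assumptions, `pair_init(&p, "k", "v")` returns 0 and leaves both fields allocated for the caller to free, while passing a NULL string returns -1 with both fields still NULL.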

# ? Feb 15, 2017 02:31

ShoulderDaemon: Oct 9, 2003; support goon fund; Taco Defender

JawnV6 posted:

Ok, interrupt deferral makes sense. But you can still fault after the cmp and before the second op though? Put the jne on a different page requiring a fault or better yet straddling the canonical boundary. Everyone gets that edge wrong. I imagine those situations will never get fused though, the FE will recognize it needs to issue the fault before the decoder ever sees the pair.

Yeah, you're probably going to fuse only when you have actual instruction bytes available, or something equivalent (like you're getting fused ops from a stream cache). If you have to stall to fetch bytes for instruction 2, you're going to issue instruction 1 immediately instead of waiting to fuse it. Fusion is an opportunistic optimization that helps, but isn't essential; in practice you can get away with a conservative approach of only fusing when it's easy to prove safety and still see enough reasonable performance gains to justify the effort.

Making the fusing logic very simple and conservative also means that it takes less power, less die area, and is easier to do timing for. It wouldn't be shocking to see rules like "we only fuse if one of the instructions is a MOV and the other instruction is a non-memory-op and both instructions are coming out of the stream cache" still being good enough to see 1% improvement on some benchmark that some client cares about. If you can get a 1% improvement for what might as well be free then you're going to take it, especially if it's the sort of thing that you can potentially tune and improve in future generations to be even better.

JawnV6 posted:

There's no good way to fuse up a zero-length call though. The STA/STD pair required for the call's implicit push getting fused and not offering an instruction boundary between them, if the implicit stack location had to be paged in just to be marked dirty, etc. Too much going on there.

Off the top of my head, I can't think of any way that you'd be able to win by fusing a CALL/POP pair outside of something esoteric like binary translation where you're dynamically recompiling the program stream in large blocks. You're just going to special case CALL 0 in your call/return predictor, and otherwise not do anything special. The POP immediately following the CALL will get its result out of whatever store-forwarding or memory-renaming structures you have, and it should be no more or less performant than any other random top-of-stack manipulation. This is one of those things that is certainly kind of weird, but it's similar enough to all the other weird crap that you have to deal with all the time that your existing microarchitecture should handle it fine.

# ? Feb 15, 2017 03:31

rjmccall: Sep 7, 2007; no worries friend; Fun Shoe

Munkeymon posted:

Your competitors' salespeople?

Someone started a slapfight in the SHSC AMD thread over AVX performance which I'm sure tens of people might have a valid reason to care about in the expected lifetime of the hardware but it's going to influence purchasing decisions somewhere.

Nah. Vector unit performance matters for general-purpose hardware because sometimes people want to do big things that can be meaningfully vectorized, and the people who wrote that code probably cared enough to consider using AVX even if the vast majority of programmers will never have to. If nothing else, some of that code is in SPEC. In contrast, making a true call to the next instruction is not a real thing, just like repeatedly loading words from absolute address -1 is not a real thing.

# ? Feb 15, 2017 05:11

sarehu: Apr 20, 2007; (call/cc call/cc)

It could be a code size minimizing optimization for a function that ends for _ = 1 to 2 { ... } while under register pressure?

# ? Feb 15, 2017 05:21

rjmccall: Sep 7, 2007; no worries friend; Fun Shoe

sarehu posted:

It could be a code size minimizing optimization for a function that ends for _ = 1 to 2 { ... } while under register pressure?

If you're under register pressure, you're almost certainly using callee-save registers in the loop, which will break the pattern unless you do another local call first. But yes, that's cute. If your function ends in a loop with a power-of-two trip count, and you're on an x86-like platform that pushes the PC on call, and the loop doesn't need any callee-save registers or the stack pointer, and there's no reason you need to keep the stack aligned, then you can emit the loop using chained calls and rets and all of your branches will be perfectly predicted.

Hmm, if you're willing to do another interior call as set-up, you can not only lift the CSR restriction but also do this at an arbitrary position in the function. So the only real restriction is that you need to not use the stack pointer directly. And of course you can do this on an LR architecture by just pushing and popping the PC yourself in your sub-function.
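The power-of-two claim for chained calls and rets can be checked with a toy model. The sketch below is only a simulation in C, not real machine code: `count_body_runs` and its three-instruction "program" (k calls to the next instruction, then the loop body, then a ret) are assumptions invented for illustration. It models nothing but the return-address stack and ignores registers, stack alignment, and everything else.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_CALLS = 16 };

/* Simulates: k x "call <next instruction>", then the loop body, then "ret".
   Returns how many times the body executes before control returns to the
   real caller. */
unsigned count_body_runs(unsigned k)
{
    assert(k <= MAX_CALLS);

    size_t stack[MAX_CALLS + 1];      /* return-address stack */
    size_t sp = 0;
    const size_t body_pc = k;         /* pcs 0..k-1 are the chained calls */
    const size_t ret_pc = k + 1;
    const size_t caller = k + 2;      /* sentinel: the real return address */
    size_t pc = 0;
    unsigned runs = 0;

    stack[sp++] = caller;             /* pushed by whoever called us */
    while (pc != caller) {
        if (pc < body_pc) {
            /* call with zero displacement: push the address of the next
               instruction, then fall through to it */
            stack[sp++] = pc + 1;
            pc = pc + 1;
        } else if (pc == body_pc) {
            runs++;                   /* the loop body */
            pc = ret_pc;
        } else {
            pc = stack[--sp];         /* ret: pop and jump */
        }
    }
    return runs;
}
```

In this model each additional chained call doubles the trip count, so `count_body_runs(3)` runs the body 8 times.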

rjmccall fucked around with this message at 08:39 on Feb 15, 2017

# ? Feb 15, 2017 08:33

dougdrums: Feb 25, 2005; CLIENT REQUESTED ELECTRONIC FUNDING RECEIPT (FUNDS NOW)

JawnV6 posted:

rjmccall alluded to it, for position-independent code like libraries sometimes you need to grab the PC to figure out where "here" is so you can jump "there."

That makes sense, I'm sure I've done that before and not taken notice. I was hoping there was some hot mess that did it as a matter of operation. If I had some sort of dynamic compiler supervisor, it would only need to figure out where it is once to offset to code pages. So the only reason to do it repeatedly is if you have some code that loads some code then that code loads even more code ... maybe that's a real thing too though ...

I mean I wrote one of those and I can't remember for the life of me how I figured that offset.

quote:

Also memory is changed, if the chip got too clever and tried to skip that bit it would cause other issues.

I thought this might be the case, that you could just ignore writing it, but I see how you can't really guarantee that it won't be read again.

dougdrums fucked around with this message at 12:45 on Feb 15, 2017

# ? Feb 15, 2017 12:41

rjmccall: Sep 7, 2007; no worries friend; Fun Shoe

You don't usually need a PIC base for jumps and calls, because those instructions can take relative offsets on every architecture I've ever seen. If you don't even have a relative offset, e.g. because the symbol resolves outside of the current image, the linker will use the offset of a stub that materializes the address in some other way; typically the stub loads from a global that the loader initializes, but on some targets the loader just directly rewrites the stub. A similar technique gets used when the instruction encoding doesn't allow a big enough immediate offset to reach an arbitrary place in the image, which is common with fixed-width instruction encodings (e.g. ARM64 allows an offset of ±128MB, but images can reasonably get bigger than that): the compiler optimistically uses an instruction that uses a relative offset, and if the linker can't make that work, it just makes the instruction go to a stub.

The place where you need a PIC base is when you need the true address of some global and the target doesn't directly support PC-relative addressing; that means just i386 these days. In that case, you explicitly compute your PIC base and then add a relative offset to that, assuming you have one. In theory, on a target like i386 where MOV and LEA can take a 32-bit immediate absolute address, you could have the compiler just emit that instruction and tell the linker to fill it in during load. Unfortunately, that would be a disaster for launch times and general memory performance, because accessing global memory is common enough that it would dirty almost every page in the text segment. Rewriting a bunch of stubs doesn't have that problem because they're densely packed together, so you're only dirtying a page or two.

rjmccall fucked around with this message at 17:13 on Feb 15, 2017

# ? Feb 15, 2017 17:11

sarehu: Apr 20, 2007; (call/cc call/cc)

My register pressure assumption was because with a register free you could use 5 bytes this way instead:

code:

mov al, 2  ; 2 bytes
top:
...
dec eax  ; 1 byte
jpe top  ; 2 bytes

I wasn't thinking about callee-save registers, but without using them, you still save a register. So... maybe it's "caller-save register pressure."

# ? Feb 15, 2017 19:06

rjmccall: Sep 7, 2007; no worries friend; Fun Shoe

sarehu posted:

My register pressure assumption was because with a register free you could use 5 bytes this way instead:
code:
mov al, 2  ; 2 bytes
top:
...
dec eax  ; 1 byte
jpe top  ; 2 bytes
I wasn't thinking about callee-save registers, but without using them, you still save a register. So... maybe it's "caller-save register pressure."

call rel16 is 3 bytes and has the advantage of allowing all branches to be perfectly predicted.

EDIT: Oh, four bytes, I guess, because you'd need to put an operand-size override prefix on it. I think that's one of those tricks that you only do in code-size-at-all-costs mode.

rjmccall fucked around with this message at 20:39 on Feb 15, 2017

# ? Feb 15, 2017 20:32

Dr Monkeysee: Oct 11, 2002; just a fox like a hundred thousand others; Nap Ghost

ExcessBLarg! posted:

Also, it doesn't really make sense to gratuitously use gotos in modern languages the way people did in the past, so there's not really that much risk of abuse. In old dialects of Basic or Fortran, goto was often used because C-style control structures simply didn't exist, or because the environment didn't allow for free-form editing and if you needed to insert code in the middle of a routine, you often had to use a pair of unconditional gotos to splice it in.

I think this was pointed out earlier in this very thread but Knuth's original "goto considered harmful" was written to admonish a bunch of prehistoric programmers to start using such new-fangled inventions as "if", "while", and "for". While there's very few legit uses for goto these days there's also barely anybody left who would even consider goto a viable control structure.

It's just not a thing to care about anymore.

# ? Feb 15, 2017 21:36

nielsm: Jun 1, 2009

Dr Monkeysee posted:

I think this was pointed out earlier in this very thread but Knuth's original "goto considered harmful"

That was Dijkstra.

# ? Feb 15, 2017 21:39

Dr Monkeysee: Oct 11, 2002; just a fox like a hundred thousand others; Nap Ghost

Oops. I couldn't remember which one wrote it and my quick googling led me astray.

# ? Feb 15, 2017 21:41

sarehu: Apr 20, 2007; (call/cc call/cc)

rjmccall posted:

call rel16 is 3 bytes and has the advantage of allowing all branches to be perfectly predicted.

EDIT: Oh, four bytes, I guess, because you'd need to put an operand-size override prefix on it. I think that's one of those tricks that you only do in code-size-at-all-costs mode.

Oooh.

Edit: You'll still have F'd up prediction unless you do 66 e8 01 00 90 though (jump 1 byte ahead, over a nop).

(Unfortunately jumping backward 1 or 2 bytes can't be made to help.)

sarehu fucked around with this message at 22:42 on Feb 15, 2017

# ? Feb 15, 2017 22:11

JawnV6: Jul 4, 2004; So hot ...

ShoulderDaemon posted:

Making the fusing logic very simple and conservative also means that it takes less power, less die area, and is easier to do timing for. It wouldn't be shocking to see rules like "we only fuse if one of the instructions is a MOV and the other instruction is a non-memory-op and both instructions are coming out of the stream cache" still being good enough to see 1% improvement on some benchmark that some client cares about. If you can get a 1% improvement for what might as well be free then you're going to take it, especially if it's the sort of thing that you can potentially tune and improve in future generations to be even better.

This is where it's sorta obvious my info is 5+ years out of date, I didn't know MOV was viable for any fusing. Register renaming makes some instructions quite light, but I thought that mechanism was separate from fusing. Stream cache also implies all goofy page/memory boundaries aren't going to be relevant.

Back in my day I filed a sub-1% perf bug on recursive traces :v:

ShoulderDaemon posted:

Off the top of my head, I can't think of any way that you'd be able to win by fusing a CALL/POP pair outside of something esoteric like binary translation where you're dynamically recompiling the program stream in large blocks.

Hmph. Binary translation shouldn't be that esoteric. C'mon, get things in gear over there.

rjmccall posted:

You don't usually need a PIC base for jumps and calls, because those instructions can take relative offsets on every architecture I've ever seen.
...
The place where you need a PIC base is when you need the true address of some global and the target doesn't directly support PC-relative addressing

Thanks for the correction & detail. I'm really, truly glad to have left x86 (mostly) behind.

sarehu posted:

Oooh.

Edit: You'll still have F'd up prediction unless you do 66 e8 01 00 90 though (jump 1 byte ahead, over a nop).

(Unfortunately jumping backward 1 or 2 bytes can't be made to help.)

Don't be afraid to jump into the middle of instructions. EB FF will jump to the FF, which can be a DEC.

# ? Feb 15, 2017 23:16

ShoulderDaemon: Oct 9, 2003; support goon fund; Taco Defender

JawnV6 posted:

This is where it's sorta obvious my info is 5+ years out of date, I didn't know MOV was viable for any fusing. Register renaming makes some instructions quite light, but I thought that mechanism was separate from fusing. Stream cache also implies all goofy page/memory boundaries aren't going to be relevant.

JawnV6 posted:

Hmph. Binary translation shouldn't be that esoteric. C'mon, get things in gear over there.

Please interpret my examples as demonstrative of a general idea and not as "here is what current Core microarchitecture actually does". I can't possibly disclose actual fusion rules from our microarchitectures. Similarly, please understand "esoteric" to be within the context of "weirder than what most people think of as a normal CPU" and not "weirder than what Core microarchitectures may or may not genuinely do".

# ? Feb 16, 2017 00:54

sarehu: Apr 20, 2007; (call/cc call/cc)

JawnV6 posted:

Don't be afraid to jump into the middle of instructions. EB FF will jump to the FF, which can be a DEC.

If you try that, 66 e8 fe ff doesn't work and 66 e8 ff ff __ will have to burn a byte anyway. And there's not really any practical choice for that byte.

# ? Feb 16, 2017 01:14

beuges: Jul 4, 2005; fluffy bunny butterfly broomstick

JawnV6 posted:

Don't be afraid to jump into the middle of instructions. EB FF will jump to the FF, which can be a DEC.

Is your name Mel?

# ? Feb 16, 2017 01:54

rjmccall: Sep 7, 2007; no worries friend; Fun Shoe

sarehu posted:

Oooh.

Edit: You'll still have F'd up prediction unless you do 66 e8 01 00 90 though (jump 1 byte ahead, over a nop).

Why? The forward branches are unconditional and immediately resolvable and the backward branches are returns. The return address predictor isn't, like, keyed by anything.

# ? Feb 16, 2017 02:16

sarehu: Apr 20, 2007; (call/cc call/cc)

rjmccall posted:

Why? The forward branches are unconditional and immediately resolvable and the backward branches are returns. The return address predictor isn't, like, keyed by anything.

The return address predictor would ignore the zero length call, right? And then, an unbalanced ret. My comment was under that assumption.

# ? Feb 16, 2017 03:34

rjmccall: Sep 7, 2007; no worries friend; Fun Shoe

Ah. This entire digression was about whether it was reasonable for a processor to implement that special case by just assuming that calls with zero offset never happened in real code, so no, I was analyzing under the hypothesis that a processor wouldn't want to do that. Obviously, if the processor takes special care to make an instruction less efficient, you should not use that instruction.

# ? Feb 16, 2017 05:22

JawnV6: Jul 4, 2004; So hot ...

It's possible to get the same code to run twice with different values after a 0 length call, but there's certainly easier ways to do it with 7 bytes

code:

0:  66 e8 00 00             call   L0
L0:
4:  ff 04 24                inc    DWORD PTR [esp]

On the first pass, the stored EIP is incremented to 5. The ret takes you to the second byte, which 04 24 decodes to "add al, 24", then the code runs again with eax+24. The next time the ret is hit it consumes the proper return address.

dec [esp] makes the opcode a 00, which is fairly useless targeting memory indexed by eax. Other manipulations of [esp] might prove useful, but most of the time the modrm byte ends up being 0x24 and still targeting [esp] on the second time through, which will muck up the real return address. inc/add al thankfully leave EIP in the same place which greatly simplifies the reasoning.

# ? Feb 17, 2017 04:24

vOv: Feb 8, 2014

It's been ages since I touched asm in college so I have no clue what the hell you all are talking about but I'm pretty sure it still counts as a horror given that you're talking about jumping into the middle of instructions.

# ? Feb 17, 2017 05:18

Absurd Alhazred: Mar 27, 2010; by Athanatos

vOv posted:

It's been ages since I touched asm in college so I have no clue what the hell you all are talking about but I'm pretty sure it still counts as a horror given that you're talking about jumping into the middle of instructions.

Don't kinkshame.

# ? Feb 17, 2017 05:20

hobbesmaster: Jan 28, 2008

vOv posted:

It's been ages since I touched asm in college so I have no clue what the hell you all are talking about but I'm pretty sure it still counts as a horror given that you're talking about jumping into the middle of instructions.

Having only ever touched assembly on tiny embedded platforms (AVR, ARM Thumb), every new fact I learn about modern x86 is more horrifying than the last.

# ? Feb 17, 2017 05:24

pseudorandom name: May 6, 2007

Hey now, x86 is a reasonably decent bytecode format.

Not as good as Java or .NET CLI, but they have 40 years of legacy to deal with.

# ? Feb 17, 2017 05:55

rjmccall: Sep 7, 2007; no worries friend; Fun Shoe

vOv posted:

It's been ages since I touched asm in college so I have no clue what the hell you all are talking about but I'm pretty sure it still counts as a horror given that you're talking about jumping into the middle of instructions.

The real horror is that Jawn is now talking about modifying the instruction stream.

# ? Feb 17, 2017 07:02

Absurd Alhazred: Mar 27, 2010; by Athanatos

rjmccall posted:

The real horror is that Jawn is now talking about modifying the instruction stream.

Don't cross the streams!

# ? Feb 17, 2017 07:03

Qwertycoatl: Dec 31, 2008

I don't think the Guardian's web people have discovered the wonders of version control: https://guardiannewsampampmedia.formstack.com/forms/js.php/untitled_form_19_copy_5_copy_2_copy_copy_copy_1_copy_2_copy_1_copy_1_copy_1_copy_1_copy_1_copy_copy_1_copy_1_copy_1_copy_copy_copy_copy_1_copy_4_copy_copy_copy_copy

# ? Feb 17, 2017 09:20

JawnV6: Jul 4, 2004; So hot ...

rjmccall posted:

The real horror is that Jawn is now talking about modifying the instruction stream.

It's not SMC. The write is hitting the return IP stored on the stack.

code:

v-- inc [esp]
FF 04 24
   ^--- add al, 0x24

Not modifying, just re-indexing the instruction stream, would still pass something like W^X.
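The "re-indexing" point can be illustrated with a toy decoder in C. This is a sketch under stated assumptions: `decode_at` and the `enum insn` names are hypothetical, and the decoder recognizes only the two 32-bit-mode encodings under discussion (FF 04 24 and 04 24), nothing else. The bytes themselves never change; only the offset at which decoding starts does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for the toy decoder. */
enum insn {
    INSN_NOT_MODELED,
    INSN_INC_DWORD_ESP,   /* FF /0, ModRM 04, SIB 24: inc dword ptr [esp] */
    INSN_ADD_AL_0X24      /* 04 ib with ib = 24:      add al, 0x24        */
};

/* Decodes the instruction starting at byte offset off, if it is one of the
   two patterns this sketch knows about. */
enum insn decode_at(const unsigned char *code, size_t len, size_t off)
{
    assert(code != NULL);

    size_t avail = off < len ? len - off : 0;   /* bytes left from off */

    if (avail >= 3 && code[off] == 0xFF &&
        code[off + 1] == 0x04 && code[off + 2] == 0x24)
        return INSN_INC_DWORD_ESP;

    if (avail >= 2 && code[off] == 0x04 && code[off + 1] == 0x24)
        return INSN_ADD_AL_0X24;

    return INSN_NOT_MODELED;
}
```

With the seven bytes 66 E8 00 00 FF 04 24, decoding at offset 4 gives the inc and decoding at offset 5 gives the add, which is why the trick would still pass something like W^X.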

# ? Feb 17, 2017 17:49

redleader: Aug 18, 2005; Engage according to operational parameters

Sometimes it's the little things:

C# code:


int i = 0;

foreach (Thing thing in response.Things) {

       Assert.AreEqual(result[i], response.Things[i].ThingID);

       Assert.AreEqual("foo", response.Thing[i].Type);

       i++;

}

# ? Feb 17, 2017 22:47

rjmccall: Sep 7, 2007; no worries friend; Fun Shoe

JawnV6 posted:

It's not SMC. The write is hitting the return IP stored on the stack.
code:
v-- inc [esp]
FF 04 24
   ^--- add al, 0x24
Not modifying, just re-indexing the instruction stream, would still pass something like W^X.

Oh right, of course.

...although I do have to note that "add al, 24" does not compute eax+24.
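For anyone who wants the difference spelled out, here is a small C model of it. The function names `add_al` and `add_al_demo` are made up for this sketch, and the flags register is not modeled: the carry out of AL goes to CF, not into the upper bits of EAX.

```c
#include <assert.h>
#include <stdint.h>

/* Models "add al, imm8": only the low byte of EAX changes. */
uint32_t add_al(uint32_t eax, uint8_t imm)
{
    uint8_t al = (uint8_t)(eax & 0xFFu);
    al = (uint8_t)(al + imm);                 /* wraps modulo 256 */
    return (eax & 0xFFFFFF00u) | al;
}

void add_al_demo(void)
{
    /* No carry out of AL: matches a full 32-bit add. */
    assert(add_al(0x00000010u, 0x24) == 0x00000034u);

    /* Carry out of AL: 0xF0 + 0x24 wraps to 0x14 in the low byte, while
       a real 32-bit add would give 0x114. */
    assert(add_al(0x000000F0u, 0x24) == 0x00000014u);
    assert(0x000000F0u + 0x24u == 0x00000114u);
}
```

The two results agree whenever AL is below 0xDC, which is 220 of the 256 possible AL values.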

# ? Feb 17, 2017 22:56

Absurd Alhazred: Mar 27, 2010; by Athanatos

redleader posted:

Sometimes it's the little things:

C# code:

int i = 0;

foreach (Thing thing in response.Things) {

       Assert.AreEqual(result[i], response.Things[i].ThingID);

       Assert.AreEqual("foo", response.Thing[i].Type);

       i++;

}

LOL, I've seen people abuse foreach/for (x : S) with an additional running index, but this one doesn't even refer to thing. :psyduck:

# ? Feb 18, 2017 01:51

Dr. Stab: Sep 12, 2010; 👨🏻‍⚕️🩺🔪🙀😱🙀

Qwertycoatl posted:

I don't think the Guardian's web people have discovered the wonders of version control: https://guardiannewsampampmedia.formstack.com/forms/js.php/untitled_form_19_copy_5_copy_2_copy_copy_copy_1_copy_2_copy_1_copy_1_copy_1_copy_1_copy_1_copy_copy_1_copy_1_copy_1_copy_copy_copy_copy_1_copy_4_copy_copy_copy_copy

I'm not hip on all the new webdev microservices implementations, but what the hell does this file do that an HTML file couldn't?

# ? Feb 18, 2017 02:51

necrotic: Aug 2, 2005; I owe my brother big time for this!

None of that is hip if it helps :shrug:

# ? Feb 18, 2017 03:16

The MUMPSorceress: Jan 6, 2012; ^SHTPSTS; Gary’s Answer

I'm js.php

# ? Feb 18, 2017 12:13

JawnV6: Jul 4, 2004; So hot ...

rjmccall posted:

Oh right, of course.

...although I do have to note that "add al, 24" does not compute eax+24.

It's x86 so it's true ~86% of the time.

# ? Feb 19, 2017 07:22

rjmccall: Sep 7, 2007; no worries friend; Fun Shoe

JawnV6 posted:

It's x86 so it's true ~86% of the time.

# ? Feb 19, 2017 18:43

Tank Boy Ken: Aug 24, 2012; J4G for life; Fallen Rib

The above just made me smile after I made a funny typo while using dotnet. I actually typed in dotnot and had to laugh when the program complained and I actually realised my mistake. You know: dot NOT. HAH. Okay I'm sorry.

# ? Feb 20, 2017 15:43
