what is best in optimizer pipeline
value numbering
code motion
peepholing
loop idiom recognition
legalization
thread gassing
op banning
  • Locked thread
rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
hear me o muse, cause like the introduction to a victorian novel we're preparing to execute



all i need baby is a source a target and a host



gimme that lithe sweet flexible intermediate form more more i need it

https://twitter.com/jckarter/status/869214560279863301

what you call undefined behavior is the only way i know how to live



mmm god lowering peepholes forwarding yes

rjmccall fucked around with this message at 20:49 on Jun 27, 2017


rjmccall
what the gently caress, fixed

rjmccall

JewKiller 3000 posted:

ssa form: cool and good?

arguments good, phi nodes bad

rjmccall

poty posted:

whats a good introductory book i can read as an electrical engineer that studied some computer (architecture) stuff but nothing useful about compilers

a lot of classic compilers books are really useless because they're actually all about lexing and parsing

here's the main thing you need to understand about compilers as a hardware person. assembly language is typically specified in terms of its translation to machine code + object-file primitives: everything in an assembly file is an explicit directive which is supposed to correspond exactly to something in the output, and there isn't much room for variation. that is not true of "high-level" programming languages, including c, which are so-called because they introduce their own abstraction level: that is, they introduce an abstract machine with its own formal semantics, and the translation into an executable form like machine code is merely a (typically lenient) implementation mechanism for that abstract machine. so for a lot of things, you shouldn't just try to understand the compiler behaviorally, you really need to understand the formal semantics of the programming language
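to make that concrete, here's a tiny c example (mine, not from the post): in the abstract machine, signed overflow is undefined, so a compiler may fold the comparison below to true regardless of what the target's add instruction would actually do. reasoning about the instruction set tells you nothing here; only the language semantics do.

```c
/* in the c abstract machine, signed overflow is undefined, so a
   compiler is allowed to fold this whole comparison to 1 ("true")
   even though the underlying add instruction wraps on every
   mainstream target */
int always_greater(int x) {
    return x + 1 > x;
}
/* calling always_greater(INT_MAX) is undefined behavior: the
   program has no meaning in the abstract machine, whatever the
   hardware would have done with the wrapped add */
```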

a language implementation's relationship to its language specification is often a lot like a memory controller's relationship to the architecture's memory ordering rules. there are things that will work in practice for a specific system that are not guaranteed to work in general because there is always additional flexibility that is simply not yet being exploited by the implementation. a new implementation comes out that devirtualizes the right call / speculates the right load and suddenly the fact that the original code was never actually correct becomes important

otherwise i don't have a specific book recommendation but maybe try to understand some specific optimization like copy propagation and how the compiler reasons about aliasing
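here's a minimal sketch of that copy propagation / aliasing interaction (function names are made up):

```c
/* store-to-load forwarding: in scaled_sum the compiler cannot
   assume *a still holds 10 at the reload, because b might point
   at the same int. with restrict, the programmer promises no
   aliasing, and the compiler may fold the return to 30 */
int scaled_sum(int *a, int *b) {
    *a = 10;
    *b = 20;            /* may clobber *a if a == b */
    return *a + *b;     /* must reload *a */
}

int scaled_sum_restrict(int *restrict a, int *restrict b) {
    *a = 10;
    *b = 20;            /* guaranteed not to touch *a */
    return *a + *b;     /* reload of *a can become the constant 10 */
}

/* helpers so the difference is observable */
int demo_noalias(void) { int x, y; return scaled_sum(&x, &y); }
int demo_alias(void)   { int x;    return scaled_sum(&x, &x); }
```

note that demo_alias legitimately returns 40: the second store overwrote the first, which is exactly the case the compiler has to be paranoid about.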

rjmccall
having a jit enables some really neat dynamic code generation and specialization tricks. i'd love to see more language implementations based on a sane language + execution environment that really take intelligent advantage of both static and dynamic information

instead it feels like jits always spend the bulk of their effort (both in development and at runtime) working around stupidity in the language / runtime model. "i couldn't possibly statically inline this trivial getter, it's in a different class file" :monocle:

rjmccall

JawnV6 posted:

whats a runtime

code which supports the execution of programs, e.g. to implement language features. at the very minimum it includes a loader. generally it also includes functions and variables that support various language features, either for performance (e.g. memcpy and memset) or complexity (e.g. soft division, c++ dynamic casts) or correctness (e.g. the non-lock-free cases of c11 _Atomic), as well as code to support library features that interact with basic behavior (e.g. setjmp, atexit). the line between "runtime" and "standard library" gets pretty vague
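as a hedged illustration of the "performance" category: a plain struct assignment has no call anywhere in the source, but for a large struct most compilers will lower it to a call into the runtime's memcpy. a sketch, names mine:

```c
#include <string.h>

struct big { char bytes[4096]; };

/* no function call appears in this source, but a compiler will
   typically lower the assignment to a call to memcpy from the
   runtime / libc -- exactly the kind of compiler-support function
   described above */
void copy_by_assignment(struct big *dst, const struct big *src) {
    *dst = *src;    /* likely becomes: memcpy(dst, src, 4096) */
}

/* helper to make the behavior observable */
int roundtrip_byte(char c) {
    struct big src = {{0}}, dst;
    src.bytes[100] = c;
    copy_by_assignment(&dst, &src);
    return dst.bytes[100];
}
```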

rjmccall
eh. it's less awkward than "source to source compiler" or "source to source transformation"

it's always to some kind of source code, though, i've never seen it used for e.g. javac

rjmccall

Shinku ABOOKEN posted:

what are phi nodes? why are they bad?

it's the dominant way of handling data flow when control flow merges in ssa

here's a dumb c program

C code:
int x;
if (whatever) {
  x = foo();
} else {
  x = bar();
}
baz(x);
in a more explicit representation that isn't yet in proper ssa form, this looks like:

code:
  if %whatever goto trueBlock else goto falseBlock
trueBlock:
  %fooResult = call @foo()
  store %fooResult to %x
  goto contBlock
falseBlock:
  %barResult = call @bar()
  store %barResult to %x
  goto contBlock
contBlock:
  %mergedValue = load %x
  call @baz(%mergedValue)
the purpose of ssa is to make this data flow explicit, i.e. we don't want to see this opaque load from a variable, we want to see the last value stored there. in a lot of cases, that's easy enough to do, but how do we talk about it after this kind of control-flow merge, where the value is different based on where we came from?

phi nodes are the dominant answer. a phi node is a sort of pseudo-instruction you can put at the start of a block when it has multiple predecessors like this, and it basically says what the value is when coming from all the predecessors:

code:
  if %whatever goto trueBlock else goto falseBlock
trueBlock:
  %fooResult = call @foo()
  goto contBlock
falseBlock:
  %barResult = call @bar()
  goto contBlock
contBlock:
  %mergedValue = phi %fooResult from trueBlock, %barResult from falseBlock
  call @baz(%mergedValue)
they suck because they're a weird orthogonal special case. they look like instructions, but they can only go at the start of a block, and the values they use are not normally valid to reference in the block. if you were writing some optimization involving calls to @foo, and you wanted to insert code everywhere the return value is used, you can't add it here; you have to recognize that the use is a phi node and put the code at the end of trueBlock instead

the way better alternative is to say that basic blocks can have parameters and gotos can have arguments, which ends up looking kind of like a functional program with a lot of explicit tail calls:

code:
  if %whatever goto trueBlock else goto falseBlock
trueBlock:
  %fooResult = call @foo()
  goto contBlock(%fooResult)
falseBlock:
  %barResult = call @bar()
  goto contBlock(%barResult)
contBlock(%mergedValue):
  call @baz(%mergedValue)

rjmccall fucked around with this message at 04:21 on Jun 28, 2017

rjmccall
in theory clang is a natural cross compiler and can in fact just generate code for anything. you do have to tell the build system to link in an appropriate backend, though, which is reasonable because backends get pretty big. but it's not like gcc where you need a different binary for every target

rjmccall

redleader posted:

related (but possibly a little off topic) question: what's up with the c/c++ standards committees that makes them prefer undefined over implementation-defined behavior? i mean surely most compilers would know their target's behavior with (un)signed overflow (for example)

implementation-defined behavior still has to be semantically well-defined: that is, the implementation has to define exactly what happens. the vast majority of undefined behavior is not this kind of "policy" stuff where that's possible

for example, dereferencing a bad pointer is undefined behavior because there's no reasonable way for the implementation to dynamically determine pointer validity; it would need to track pointers and scopes and allocations and issue a ton of checks and it would all be obscenely expensive and constraining. in theory you could define the results of dereferencing a null pointer to always be a guaranteed trap, but that would force null checks on every access on targets that can't guarantee an unmapped address (which is almost all of them, because many oses do allow you to mmap the zero page)
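a hypothetical example of how that plays out (my code, not from the post): once the pointer has been dereferenced, the compiler may assume it was valid and delete a later null check entirely.

```c
#include <stddef.h>

/* the load on the first line is undefined behavior if p is null,
   so the compiler may assume p is non-null from then on and fold
   away the check below. defining the load to trap instead would
   cost an explicit check on targets where address 0 can be mapped */
int first_or_minus_one(int *p) {
    int v = *p;            /* UB if p == NULL */
    if (p == NULL)         /* may be deleted: p was already dereferenced */
        return -1;
    return v;
}

/* helper: the defined case behaves as written */
int demo_valid(void) {
    int x = 7;
    return first_or_minus_one(&x);
}
```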

in the policy places, it usually comes down to optimizability. these rules are designed so that the compiler doesn't have to pessimize behavior just to handle some corner case that, in the committee's judgement, shouldn't be happening in a sensible program. for example, the type aliasing rules are there so that the compiler isn't blocked from doing straightforward memory optimizations out of paranoia that you might have cast the same pointer to different types. the signed-int overflow rules are there so that the compiler can ignore the overflow-related corner cases when looking at an idiomatic for-loop. and so on
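the idiomatic-for-loop point, sketched in c (example mine):

```c
/* because signed overflow is undefined, the compiler may assume
   i + 1 never wraps, treat this loop as running exactly n + 1
   times, and e.g. widen i to a 64-bit induction variable or
   vectorize. with unsigned (wrapping) arithmetic it would have to
   reason about the i <= n, n == UINT_MAX corner case forever */
long sum_to(int n) {
    long total = 0;
    for (int i = 0; i <= n; i++)
        total += i;
    return total;
}
```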

rjmccall

you should read the spj paper linked from the tweet in the op

rjmccall

Deep Dish Fuckfest posted:

how good is modern jitted code compared to what's emitted by a traditional toolchain these days anyway? i've heard about the ability to use info you don't have at compile time to potentially produce better code before, but i have no idea how much of a difference it makes in practice. or any idea what traditional optimizations aren't possible when you're jitting, if any

there are a ton of trade-offs around jits, it's really super dependent on the language and the runtime environment

if the language does a lot of dispatch based on the dynamic types of values, like virtual method calls or the overloaded behavior of + in javascript, then jits have the massive natural advantage of being able to dynamically specialize the code for the types that are actually seen. like if the code is "for (i in array) { result += array[i] }" and dynamic sampling suggests that the values in the array are always integers then the jit might generate machine code for this loop where result is actually an integer variable and the loop body does a type check and an overflow check and just bails out to an interpreter if something weird ever happens
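a rough c sketch of that specialization, with made-up names and a made-up value representation: the "jitted" loop runs an unboxed integer fast path behind a type guard and bails out to a generic path the moment a non-integer shows up.

```c
#include <stddef.h>

enum tag { TAG_INT, TAG_DOUBLE };
struct value { enum tag tag; union { long i; double d; } u; };

/* slow generic path: handles any tag, boxes everything as double */
double generic_sum(const struct value *v, size_t n) {
    double total = 0;
    for (size_t j = 0; j < n; j++)
        total += (v[j].tag == TAG_INT) ? (double)v[j].u.i : v[j].u.d;
    return total;
}

/* specialized path: total stays an unboxed long, with a per-element
   type guard that bails out to the generic path if the dynamic
   profile's "always integers" guess ever turns out wrong */
double specialized_sum(const struct value *v, size_t n) {
    long total = 0;
    for (size_t j = 0; j < n; j++) {
        if (v[j].tag != TAG_INT)                          /* type guard */
            return (double)total + generic_sum(v + j, n - j);  /* bail */
        total += v[j].u.i;
    }
    return (double)total;
}

/* helper: ints, then a double mid-stream to force a bailout */
double demo_sum(void) {
    struct value v[4] = {
        { TAG_INT,    { .i = 1 } },
        { TAG_INT,    { .i = 2 } },
        { TAG_DOUBLE, { .d = 0.5 } },
        { TAG_INT,    { .i = 3 } },
    };
    return specialized_sum(v, 4);
}
```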

even more static languages can end up with a lot of dynamic-seeming behavior because of the build model. like how java class files are meant to each be a kind of miniature dynamic library with its own stable abi w.r.t other classes (other than static final ints), so the jit is actually the first thing that's allowed to do cross-class optimization. or like how the language lets you always link in more code in java / c++, and that code might subclass some class or implement some interface, so you can't statically do an exhaustive class hierarchy analysis (e.g. to devirtualize calls to some method that in practice is never overridden); but a jit can optimistically devirtualize based on the code it sees as long as it has the ability to change its mind later

or you can decide based on the dynamic profile of your program that it's totally worth unrolling / vectorizing this loop but not these other ones

on the other hand none of this dynamic profiling is free and it takes a lot of time + power for it to pay for itself and generally some residue of it sticks around even after specialization. and it's usually the case that profiles don't vary that much from execution to execution, so there's no inherent reason you couldn't generate a good-enough static profile using a representative workload and then do all the same optimizations based on that

the other thing a jit gets you is the ability to optimize for an exact target. like if you statically compile for a wide class of machines, you have to either make pessimal assumptions about the hardware or do a bunch of runtime checks. or if you statically compile for a range of runtime-system releases, you have to perform opaque runtime calls in order to allow details of the runtime system to change. the former is generally not that big of a problem, but the latter can add up to a lot of overhead, especially if you're automatically managing memory. garbage collectors in particular want to change code-generation patterns in like a thousand different places, and the changes they make are very specific to the gc algorithm in use, so if you're generating code for an unknown gc algorithm you basically have to emit every single store of an object pointer as an opaque call into the runtime and it's just murderously slow. even with reference-counting the overhead of having to call the runtime really adds up
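a hedged sketch of that "opaque call per pointer store" pattern, with invented names: when the gc algorithm is unknown at compile time, every store of an object pointer has to go through a runtime call like this, instead of whatever gc-specific inline sequence a jit could emit.

```c
struct object { struct object *field; };

/* opaque runtime entry point: a real barrier would also record
   holder/slot in a card table or remembered set, depending on the
   collector actually in use */
void gc_write_barrier(struct object *holder, struct object **slot,
                      struct object *value) {
    (void)holder;
    *slot = value;
}

/* generated code for: holder.field = value */
void set_field(struct object *holder, struct object *value) {
    gc_write_barrier(holder, &holder->field, value);
}

/* helper so the store is observable */
int demo_barrier(void) {
    struct object a = { 0 }, b = { 0 };
    set_field(&a, &b);
    return a.field == &b;
}
```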

rjmccall

Malcolm XML posted:

yeah but equivalent doesn't mean better than

continuations for life

i too recommend using intermediate representations that actively make it more difficult to perform important optimizations

rjmccall
it's fun to imagine starting from scratch: writing a primitive assembler in machine code, then a primitive compiler in assembly, then slowly bootstrapping that into a complete system

on the other hand, how exactly are we writing the machine code without a computer? once upon a time i think you could toggle this stuff directly into memory using switches or something on the side of the storage system, but those days are gone. and why did They Who Done The Event wipe all the drives but not any of the firmware? maybe somebody hid a compiler in the firmware. oh, and a kernel, that would be grand

rjmccall

JewKiller 3000 posted:

alright those were all easy let's do a fun one. what optimization passes do you think are most dangerous? i'm not referring to how they interact with a buggy program, because that would vary based on the program, i mean the compiler code itself. which aggressive optimizations are most likely to accidentally generate incorrect code because they're hella complicated/old/unmaintained/no one sane ever uses them? and don't tell me none or i'll ask for a formal proof :v:

that would vary with the compiler a lot

in clang, i can tell you that the objc arc optimizer is a bit suspect. it used to be really buggy, and then we neutered it a lot, and now i think it mostly just doesn't optimize all that well

rjmccall

emoji posted:

tfw your mom programmed punch cards back in the day but can't set up a router (until you forced her to go mac)

apparently my mom did this one summer. she doesn't remember much about it though

rjmccall

awesome

rjmccall

eschaton posted:

this looks like a transformation, is there a semantic difference I'm not seeing or is it really just a means of making phi nodes implicit so they're easier to reason about?

phi-like branch arguments can be restricted to specific terminators, other terminators can introduce path-specific values in a natural way. it's not like inherently more expressive, no

rjmccall
it really depends on how far away from the normal compiler output you want to get. in general it's stupider than you think though

swift's incremental compilation is currently just compiler-directed dependency tracking and we still type-check and re-emit the entire object file from scratch. that's way less incremental than it ought to be, but it has the distinct advantage of making it easy to validate that incremental builds are producing the same results as non-incremental builds

on the other end of the spectrum there are systems where the incremental result is a sort of patch file to the previous executable. so the results of incremental builds are really different from the results of from-scratch builds, but only the precise minimum needs to change

rjmccall
yeah the best part of that article is the repeated conflation of the problem with like a million other things. notably it is not characteristic of cisc architectures to give you direct access to the pc. anyway yeah i get what they were trying to do with the ppc abi but ugh, doing pic on pre-pic architectures is just the worst

apple's arm32 abi is pretty terrible, mostly around floating-point arguments. the calling convention clearly predates the idea that arm processors might drive general-purpose computation and/or actually have a floating-point unit; it's basically an all-arguments-on-stack convention that pops the first 16 bytes into integer registers, which means e.g. an early double will get passed as two ints. also the arm bitfield layout rules are needlessly different from the standard sysv rules

i386 is just really old and passes all arguments on the stack, which you'd think would be simple but it actually causes a lot of complexity because there are a dozen different non-standard calling conventions and abi flags that people use to try to speed things up. also there are a ton of special cases with return values, like how complex floats get returned as two values on the floating point stack but a struct of two floats gets returned in memory

rjmccall
that predates me, but the story i've heard is that it was just expedient: it was the only abi implemented in the version of gcc we were using for bringup. there was plenty of other compiler work to do at the time, and it was okay to put off because we didn't have binary compatibility requirements until we added the app store. we did actually change a bunch of stuff before releasing the first ios sdk, including some fundamental abi stuff; for example, earlier versions of ios actually used the fragile objc runtime and setjmp/longjmp exceptions. but there was too much other stuff in flight, and too much assembly already written, for anybody to go back and change the basic calling convention

the apple arm64 abi is actually very close to arm's. the major difference is the variadic calling convention. my understanding is that arm was considering using our rule but ultimately backed off because they got tired of chasing down the bugs it was exposing. i remember a lot of grumbling within apple, too, but it's just such a superior way of doing varargs that we decided to persevere

rjmccall
that's a great explanation and i have nothing to add to it

rjmccall
yeah, eabi is a lot better, as is the abi we use for the watch

rjmccall
tvOS uses the normal arm64 abi iirc, there are just some framework level changes

watchOS is very different at the abi level because as mentioned the 32-bit iOS abi really sucks

rjmccall
it's not literally the same build, no, but we manage to say that various linux distributions are built on the same kernel despite each configuring it differently

rjmccall

JawnV6 posted:

i got tricked by a compiler

i left a status thing uninitialized, it determined that the only value assigned was ERROR, elided the entire function and just kicked that back

that all makes sense, but why wouldn't volatile writes prevent the total collapse? shouldn't those get preserved even if the rest of the code drops away?

it probably decided that something you were doing was undefined behavior and therefore that such-and-such path could never be taken
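a reconstructed guess at the shape of that trap (my code, not JawnV6's actual code): if the status variable is only ever assigned on one path and is read uninitialized on the others, the compiler may conclude the other paths never happen and fold the whole function down to the one assigned value, taking everything on the "impossible" paths, volatile device writes included, with it.

```c
enum status { OK = 0, ERROR = 1 };

/* reading s on a path where it was never assigned is undefined,
   so the compiler is entitled to assume every call takes the
   branch containing the one assignment and fold this to
   "return ERROR", deleting the rest of the function -- in the
   real case, volatile register writes on those paths went too */
enum status run_broken(int fail) {
    enum status s;       /* uninitialized */
    if (fail)
        s = ERROR;       /* the only assignment anywhere */
    return s;            /* UB when fail == 0 */
}

/* initializing s makes every path defined again, so the compiler
   has to preserve the behavior of all of them */
enum status run_fixed(int fail) {
    enum status s = OK;
    if (fail)
        s = ERROR;
    return s;
}
```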

rjmccall

:nono:

rjmccall

Dylan16807 posted:

is there a reason they don't get rid of all the ub for trivial things like syntax errors? how did "An unmatched ‘ or ” character is encountered on a logical source line during tokenization (6.4)" survive into this century?

practically speaking, they haven't; there are no significant compilers that will fail to diagnose that

rjmccall
it looks like you can't and the bug is specifically about making it safer to run the linker as root?

what the gently caress

rjmccall
programs really should not be run as root unless they've been designed to be extremely cautious about how they interact with the system. the average toolchain is no more designed to work that way than a word processor is

i mean, it's not really a security hole on its own, not unless you're doing something that should set off warning bells in anyone not completely oblivious, like writing a compilation server or making the compiler setuid. but it does create a lot of unnecessary potential for bugs to completely gently caress over your system

rjmccall

JawnV6 posted:

lol here i am whining that clang doesn't keep the inferred guess at a signature flexible enough to admit the later decl is good enough

the type of an undeclared function is defined by the language standard, not the compiler, and it can never be changed because the old terrible code that uses it is the sort of old terrible code that would rely on it

honestly i would not even want to improve the signature "guess" if we could because it would just make people think that it was at all acceptable to use the feature


rjmccall
weak is badly overloaded even if you just consider it in the context of linkers
