rjmccall

op, the dec alpha has a fully weak memory ordering model which requires read barriers on atomic loads of objects even if all the subsequent loads from the object are value-dependent on the loaded pointer. intriguingly, it is my understanding that the weakness of the alpha's memory ordering model is purely theoretical, and all shipping alpha hardware in fact uses a stronger model which does guarantee that dependent loads will be properly ordered. nonetheless, because this is not architecturally guaranteed, any lock-free code must ensure that it uses proper barriers around atomic loads if it is ever ported to alpha.

in principle, systems programming languages such as c and c++ would allow programmers to clearly state their requirements and then compile them optimally for the given platform, potentially avoiding load barriers when compiling for systems other than the alpha. unfortunately, inventing a sound formal definition of the concept of value-dependence that still admits reasonable optimizations in code that may not even be aware that it's carrying a dependency on an atomic load has proven to be an exceptionally tricky problem. even now, a full ten years after the introduction of atomics to c and c++, many compilers do not compile the so-called "consume" memory ordering optimally.

this problem would be entirely defined away if processors were instead as overtly hostile as the theoretical but not actual memory ordering model of the dec alpha, op
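for reference, the pattern in question looks something like this with c++11 atomics (a minimal sketch, names made up; note that in practice most compilers quietly promote consume to acquire):

#include <atomic>

struct node { int payload; };

std::atomic<node*> shared{nullptr};

void publish(node* n) {
    n->payload = 42;
    // release: the payload store becomes visible before the pointer store
    shared.store(n, std::memory_order_release);
}

int read_payload() {
    // consume only promises ordering for loads that are value-dependent on
    // the loaded pointer; on everything except (theoretical) alpha the
    // hardware preserves that ordering for free, so no barrier is needed
    node* n = shared.load(std::memory_order_consume);
    return n ? n->payload : 0;  // value-dependent on n
}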

rjmccall

these are good posts

rjmccall

the times just had a clue like “aunt and uncle’s little girl” for “niece”, so

rjmccall


DuckConference posted:

i think the pointer authentication stuff on arm uses the whole space

the kernel tells the processor how many bits to use. it also conditionally honors tbi (top-byte ignore), which allows programmers to use the top 8 bits of a pointer without faulting
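for illustration, stashing a tag in the top byte under tbi looks roughly like this (hypothetical helpers, not a real api; dereferencing a tagged pointer directly is only safe when the kernel actually has tbi enabled for your process):

#include <cstdint>

constexpr int kTagShift = 56;
constexpr std::uintptr_t kTagMask = std::uintptr_t{0xff} << kTagShift;

// pack a tag into bits 63:56; with tbi the hardware ignores the top byte
// on loads and stores, so the tagged pointer can be used directly
void* with_tag(void* p, std::uint8_t tag) {
    auto bits = reinterpret_cast<std::uintptr_t>(p) & ~kTagMask;
    return reinterpret_cast<void*>(bits | (std::uintptr_t{tag} << kTagShift));
}

std::uint8_t tag_of(void* p) {
    return static_cast<std::uint8_t>(reinterpret_cast<std::uintptr_t>(p) >> kTagShift);
}

// strip the tag; required on systems without tbi, harmless with it
void* without_tag(void* p) {
    return reinterpret_cast<void*>(reinterpret_cast<std::uintptr_t>(p) & ~kTagMask);
}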

rjmccall

as far as the isa goes, itanium is really weird

first, it’s got a poo poo-ton of registers, both integer and floating-point. there are 32 static registers, meant as scratch between calls, and then a window of 96 more registers that you can rotate during calls to save data without spilling to the stack (at least, not with explicit spill code from the compiler; if you make enough nested calls to overflow the register window, the hardware still has to spill to memory, of course). that’s almost an unreasonable number of registers. and that’s for each of integer and fp, and then there are a bunch of specialized registers for things like conditions. it’s really a lot

the bigger thing is that itanium wants to run instructions in parallel by default. instructions are 41 bits, and there are three packed into a 128-bit instruction bundle. instructions within a bundle always run in parallel, so if you have dependencies, like if one instruction adds an offset to a pointer and then another loads from that pointer, they need to go in separate bundles. but they can’t just go in separate bundles: you need the second bundle to say specifically that it has a dependency on a previous bundle and so cannot be run in parallel
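concretely, even something this small has that shape (a compiler's-eye sketch, not real itanium asm):

// the load consumes the address produced by the add, so the two can't
// share a bundle; the compiler has to mark the boundary (a "stop") so the
// second operation isn't issued in parallel with the first
int load_at_offset(int* base, long offset) {
    int* p = base + offset;  // address computation
    return *p;               // dependent load, separated by a stop
}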

so it turns out that superscalar architectures are pretty good at running things in parallel already. the itanium approach is better in some ways: superscalar architectures can definitely suffer from false dependencies, especially with memory, and itanium can communicate that the processor doesn’t have to worry about that. but to do that, the compiler also has to know that there isn’t a dependency. for simple data dependencies in registers, this is straightforward. for memory, it usually means doing an alias analysis. this is a lot easier in fortran, which has very limited pointers and very strong default assumptions about aliasing, than it is in (say) c
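e.g. in c++ the compiler can't prove these arrays don't overlap, so it has to schedule the loop conservatively; __restrict (a common compiler extension, not standard c++) is one way to hand it the no-alias fact (sketch):

// possible aliasing: dst[i] might overlap src[i+1], so the compiler must
// assume a loop-carried dependence when scheduling
void scale(float* dst, const float* src, int n) {
    for (int i = 0; i < n; ++i)
        dst[i] = 2.0f * src[i];
}

// the no-alias promise frees the scheduler to pipeline the loop, which is
// exactly the kind of information itanium's bundles want up front
void scale_noalias(float* __restrict dst, const float* __restrict src, int n) {
    for (int i = 0; i < n; ++i)
        dst[i] = 2.0f * src[i];
}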

but also the compiler has to reorder a lot of stuff just to try to fill bundles so that you don’t end up with appallingly bad code density. the problem is that superscalar architectures get most of the potential value here without all the extra pain. and probably itanium would still have benefitted from using standard superscalar techniques to recognize potential for parallelism even when the instruction stream said there might be a dependency; i don’t know how much of that they did

anyway, below the isa level, itanium was positioned as a server / hpc architecture, so its chipsets were designed for beefy systems. in particular, they had a lot of memory bandwidth. so yeah, databases usually aren’t computation-bound, but they can absolutely be memory-bound depending on the workload, and itanium machines were good at that. and they could also be very good at certain hpc workloads that did a million things in parallel over big datasets
