Private Speech
Mar 30, 2011

I HAVE EVEN MORE WORTHLESS BEANIE BABIES IN MY COLLECTION THAN I HAVE WORTHLESS POSTS IN THE BEANIE BABY THREAD YET I STILL HAVE THE TEMERITY TO CRITICIZE OTHERS' COLLECTIONS

IF YOU SEE ME TALKING ABOUT BEANIE BABIES, PLEASE TELL ME TO

EAT. SHIT.


My master's thesis was on translating compiled CUDA to OpenCL, although that's a bit higher level than normal assembly. SPIR (i.e. compiled OpenCL) is heavily based on LLVM, while PTX (i.e. compiled CUDA) is mostly based on shader languages, which means that you have things like infinite registers and such (register allocation is done later in the drivers; well, PTX does it when translating into binary form, but that's functionally the same thing).
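To make the "infinite registers, allocation done later" bit concrete, here's a minimal sketch of that later step: a simplified linear-scan pass that maps an unlimited set of virtual registers onto a fixed physical set. The function name `allocate_registers`, the tuple format and the `%r1`-style names are made up for illustration, and this is not a claim about what any actual driver or PTX assembler does internally.

```python
def allocate_registers(instructions, num_physical):
    """Map virtual registers to physical ones with a simplified linear scan.

    `instructions` is a list of tuples of virtual register names, one tuple
    per instruction (hypothetical format). Returns a dict mapping each
    virtual register to a physical index, or to "spill" if none was free.
    """
    # Live interval of a register: first and last instruction that mentions it.
    first, last = {}, {}
    for i, regs in enumerate(instructions):
        for r in regs:
            first.setdefault(r, i)
            last[r] = i

    assignment = {}
    free = list(range(num_physical))
    active = []  # (end_index, vreg) pairs currently holding a physical register

    for vreg in sorted(first, key=first.get):
        start = first[vreg]
        # Release physical registers whose interval ended before this one starts.
        for end, old in list(active):
            if end < start:
                active.remove((end, old))
                free.append(assignment[old])
        if free:
            assignment[vreg] = free.pop(0)
            active.append((last[vreg], vreg))
        else:
            # A real allocator would pick a victim and emit loads/stores here.
            assignment[vreg] = "spill"
    return assignment


code = [("%r1",), ("%r2",), ("%r3", "%r1", "%r2"), ("%r4", "%r3")]
print(allocate_registers(code, num_physical=3))
```

With only two physical registers the same input has to spill `%r3`, which is the kind of decision that gets deferred when the intermediate form pretends registers are unlimited.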

It's not actually all that difficult and is done surprisingly often in emulators (PPSSPP comes to mind). The main problem is in handling differences in architecture capabilities (different instruction sets etc.) and things like rounding and register allocation/register types. Apple even has a patent on doing it in hardware, I think it was from when they were switching to x86 with Macs (link). If you want to look up more, it's sometimes called dynamic recompilation as well, although that's more of an umbrella term that includes things like post-compiler optimisation and auto-vectorisation etc.
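A rough sketch of the basic dynamic recompilation loop, assuming a made-up four-opcode guest ISA (`li`, `add`, `bnez`, `halt`) and Python closures standing in for generated host code. All names here (`translate_block`, `run_guest`, `PROGRAM`) are hypothetical; real recompilers like the one in PPSSPP emit native machine code and are far more involved.

```python
MASK32 = 0xFFFFFFFF  # guest registers are 32 bits wide; Python ints are unbounded


def make_li(rd, imm):
    def op(regs):
        regs[rd] = imm & MASK32
    return op


def make_add(rd, ra, rb):
    def op(regs):
        # Emulate guest wrap-around explicitly, since the host won't do it.
        regs[rd] = (regs[ra] + regs[rb]) & MASK32
    return op


def make_block(ops, next_pc):
    def block(regs):
        for op in ops:
            op(regs)
        return next_pc(regs)  # guest address to continue at, or None to stop
    return block


def translate_block(program, pc):
    """Translate guest instructions from pc up to the next branch or halt."""
    ops = []
    while True:
        opcode, *args = program[pc]
        pc += 1
        if opcode == "li":      # rd = imm
            ops.append(make_li(*args))
        elif opcode == "add":   # rd = ra + rb
            ops.append(make_add(*args))
        elif opcode == "bnez":  # jump to target if ra != 0, else fall through
            ra, target = args
            return make_block(ops, lambda regs: target if regs[ra] else pc)
        elif opcode == "halt":
            return make_block(ops, lambda regs: None)
        else:
            raise ValueError(f"unknown guest opcode: {opcode}")


def run_guest(program, num_regs=4):
    regs = [0] * num_regs
    cache = {}  # guest address -> translated block
    pc = 0
    while pc is not None:
        if pc not in cache:
            cache[pc] = translate_block(program, pc)
        pc = cache[pc](regs)
    return regs, sorted(cache)


# Sums 3 + 2 + 1 into r1 by counting r0 down to zero.
PROGRAM = [
    ("li", 0, 3),
    ("li", 1, 0),
    ("li", 2, 0xFFFFFFFF),  # -1 in 32-bit two's complement
    ("add", 1, 1, 0),
    ("add", 0, 0, 2),
    ("bnez", 0, 3),
    ("halt",),
]

regs, translated = run_guest(PROGRAM)
print(regs[1], translated)
```

Note that the loop branches back to guest address 3, which sits in the middle of the first translated block, so the sketch just translates a second block starting there. The explicit 32-bit masking is the toy version of the "differences in architecture capabilities" problem: anything the host doesn't do the same way as the guest has to be spelled out by the translator.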

Jabor posted:

Complex instruction sets are basically all emulated today anyway. The instruction set the CPU exposes gets translated down into microinstructions for the actual CPU core to execute.

It turns out this is overall a good thing, because you can make a microinstruction set that executes really fast when you don't have to worry about things like "being just as performant on next year's CPU model" and "being compact enough that you're not saturating your memory bandwidth just by reading the instructions you're trying to execute".

The problem is you can't just feed the microinstructions into the processor you want to translate to (assuming you're on a CISC-type system). You have to code around the idiosyncrasies of whatever platform you're translating from and whatever platform you're translating to, even if both internally execute a very similar set of microinstructions.

The other thing is you're probably not going to be able to use things like media and vector instructions even if both platforms support them, because the recompiler won't have enough information to translate them unless both instruction sets are very similar. And then you have things like interrupt behaviour, protected execution, jumps etc. which can be really difficult, or even practically impossible to deal with. That's the actual hard bit, not your run-of-the-mill arithmetic instructions.

e: Basically the same thing as OneEightHundred said pretty much, at least the second part of my post

Private Speech fucked around with this message at 12:16 on Aug 14, 2015
