necrobobsledder
Mar 21, 2005
Lay down your soul to the gods rock 'n roll
Nap Ghost
I don't think 20% slower and 20% larger would even be the primary reason AMD's newest batch of processors is so uncompetitive. The actual performance numbers make the efficiency look far worse than that.

While it's nice to have some automated tools, for something as performance-critical as a mainstream x86 CPU I'd have thought AMD would keep a library of hand-optimized layouts that can be tightly grouped together and are designed to be easy for the tools to work with. That's what I used to do with some tools when I worked with FPGAs, and it made developing new IP cores much faster with minimal wasted space on the die, while you could still understand the resulting RTL design well enough to optimize it by hand.

For almost every performance-sensitive design that was going to be fabbed, we had a contractor we'd hire whose job was literally to take these maybe 300k-gate designs and place and route them efficiently by hand - every transistor. Those designs were certainly faster, but the contractor became too expensive (and took way too long), the designs got much more complex, and the tools caught up on all these designs while his productivity stayed about the same. Now, with modern CPUs, you can't expect a human to fully P&R a whole 800-million-transistor design and hand-optimize it all, so I dunno wtf AMD was doing before they switched to SoC designs.
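For anyone curious what the tools are automating there: at its core, placement is just "put connected cells near each other." Here's a toy greedy-swap sketch in Python - hypothetical cells and nets, nowhere near a real P&R engine, which also has to handle timing, congestion, and legalization:

```python
import random

def wirelength(pos, nets):
    """Total Manhattan wirelength over all two-pin nets."""
    return sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
               for a, b in nets)

def place(cells, nets, grid, iters=20000, seed=0):
    """Greedy-swap placement: start from an arbitrary layout, then keep
    swapping random cell pairs, keeping only swaps that don't lengthen
    the wiring. Requires len(cells) <= grid * grid."""
    rng = random.Random(seed)
    slots = [(x, y) for x in range(grid) for y in range(grid)]
    pos = dict(zip(cells, slots))          # arbitrary initial placement
    best = wirelength(pos, nets)
    for _ in range(iters):
        a, b = rng.sample(cells, 2)
        pos[a], pos[b] = pos[b], pos[a]
        cost = wirelength(pos, nets)
        if cost <= best:
            best = cost                    # keep the improving swap
        else:
            pos[a], pos[b] = pos[b], pos[a]  # revert the worsening one
    return pos, best
```

A 300k-gate design is the same problem with a few more zeros; the hand-router's edge was doing this with full knowledge of what the design was actually for.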

The funny thing about these automatic synthesis and place & route tools is that occasionally, like modern software compilers, they can find an optimization a human doing hand analysis couldn't have come up with. I was working on a design for what amounts to a DSP and saw that the tool had hard-wired a couple spots in my logic to a high signal. Turns out the cryptographic hash algorithm I was using had some noticeable collisions, and the synthesis tools discovered them for me.
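You can see how a tool stumbles onto that: it effectively sweeps the logic's truth table, notices an output bit never changes, and ties the net to a constant. A toy Python sketch with a made-up 4-bit mixer standing in for the actual hash:

```python
def weak_hash(x):
    """Toy 4-bit 'hash' standing in for the real algorithm: the OR
    with 1 quietly pins output bit 0 high for every input."""
    x &= 0xF
    return ((x ^ (x << 1)) | 1) & 0xF

def constant_bits(fn, width=4):
    """What the synthesis optimizer effectively did: sweep the whole
    truth table and report output bits that never change -- those
    nets can be hard-wired to a constant signal."""
    outs = [fn(x) for x in range(2 ** width)]
    return {b for b in range(width)
            if len({(o >> b) & 1 for o in outs}) == 1}
```

Here `constant_bits(weak_hash)` flags bit 0, i.e. the tool would tie that net straight to VCC - which is exactly the "couple spots hard-wired high" surprise.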


necrobobsledder
I've been disappointed with an E-350 for my HTPC needs, though, because the CPU is just too anemic for snappy, reliable UI response times, especially with other services running in the background. In practice I couldn't really tell it apart from the ION-based machine I got rid of before it. The HP Microserver with an nVidia GPU that's since replaced it is perceptibly on par with the microATX monstrosity HTPC that I built last year and quickly retired. Coincidentally, there's an AMD CPU in this guy as well. The Microserver's form factor was also more attractive to me than dealing with mini-ITX space constraints, so even if AMD made an APU pairing an Athlon with a respectable GPU, I'd still be opting for the Microserver.

I think Intel has a slight upper hand even in the lower-end segments, mostly because the i3 is so drat power efficient you shouldn't even need to bother with GPU decoding. Sure, an i3 has an entry price point of $130 plus motherboard vs. $130 for an E-350, but an i3 has better motherboard options than any Llano CPU will ever get, which can be a slight factor for hobbyist builders over OEMs.

movax posted:

Why would you even bother dealing with x86 at that price? Licensing a BIOS, trying to minimize power consumption, etc...painful.
Mostly because your customers want support for formats that are basically only doable with straight-up software decoders (people get so huffy if your hardware doesn't support some crazy number of b-frames and pyramidal encoding and poo poo in your h.264 or VC-1... and TruHD64PenisMightier audio encoding).

There's also the sheer laziness factor of customers not wanting to deal with the hardware-supported formats, as mentioned above. Me, I'm really pissed off at having to transcode stuff, because most of what I have is such low quality to begin with that I can't accept degrading it further just for convenience.

But basically it all boils down to the age-old problem of "software defines your hardware requirements." That's why we always ask people buying hardware, regardless of whether they're your grandma or a Fortune 500 customer, wtf they want to run, right?

necrobobsledder

Star War Sex Parrot posted:


Jesus, 3 W at idle? That's some serious power gating happening; you'd think they're more power-efficient at desktop 2D than Intel's integrated chips are by now.

necrobobsledder
I totally remember that post on Hardforums. The innards of the chip were ripped out and you could see its different metal layers. It looked like an x-ray version of some of the VLSI diagrams put out in press kits, but as a vertical cross section.

necrobobsledder
Yeah, it needs to be stressed that the primary reason for going AMD years ago was Hypertransport beating the pants off Intel's interconnect, since that's such a big factor in large-scale HPC workloads (the joke is that supercomputers turn CPU-bound tasks into I/O-bound ones).
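That joke is easy to put numbers on with a roofline-style back-of-the-envelope; all figures in this sketch are made up purely for illustration:

```python
def bound_type(flops, bytes_moved, flops_per_s, bytes_per_s):
    """Roofline-style check: a job is I/O-bound once moving its data
    takes longer than computing on it. Inputs are per-node estimates
    of work done and data exchanged per step."""
    t_compute = flops / flops_per_s
    t_comm = bytes_moved / bytes_per_s
    return "io-bound" if t_comm > t_compute else "cpu-bound"

# Made-up numbers: 1 GFLOP of work on a 10 GFLOP/s core is trivially
# cpu-bound when the node barely talks to anyone...
alone = bound_type(1e9, 1e6, 10e9, 1e9)
# ...but make it exchange 4 GB over a 1 GB/s link per step, as scaling
# out tends to do, and the identical computation becomes io-bound.
scaled = bound_type(1e9, 4e9, 10e9, 1e9)
```

Which is the whole punchline: scale a CPU-bound kernel across enough nodes and the interconnect, not the core, is what you wait on.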


necrobobsledder

Professor Science posted:

no no, this isn't a QPI or FSB versus HT performance/multi-socket thing (although I'm sure that didn't hurt at the time)--Gemini is actually a device that hangs off the HyperTransport bus. besides the occasional Torrenza FPGA, Gemini is the only device I know of on HT like this.
Yeah, you're right, but I wasn't even considering Gemini (heck, didn't know what it was until now). Admittedly I didn't explain much with that terse comment, but even before performance came into it, the architecture and direction of HT were better aligned (not to mention mature and not just paper, unlike QPI / CSI or whatever its previous names were back in 2003). Prior to 2006, wanting a low-cost bus with a future and solid performance eliminated PCI and PCI-X, and PCI-E wasn't around yet either. SRC, in their own COTS HPC FPGA efforts, had basically built their own bus and switching fabric (ouch, it was expensive), and I demonstrated the mediocre results it got even with seriously savage optimizations by a Verilog / VHDL guy (me at the time). Cray went with AMD, and may even have been involved in making Hypertransport in the first place. The ability to scale more easily (with HT you could just add more lanes for near-linear cost) added a lot more value to going HT as well.
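That near-linear lane scaling is simple arithmetic: HT is double-data-rate, so per-direction bandwidth is width times twice the clock. A quick Python helper - generic formula, and the example widths/clocks are just illustrative, not tied to any particular HT revision or deployment:

```python
def ht_bandwidth_gbs(width_bits, clock_mhz):
    """Per-direction link bandwidth in GB/s. The link is
    double-data-rate, so it moves two transfers per clock cycle."""
    transfers_per_s = clock_mhz * 1e6 * 2
    return transfers_per_s * width_bits / 8 / 1e9
```

E.g. a 16-bit link at 800 MHz works out to 3.2 GB/s per direction, and doubling the width doubles the figure - hence "add more lanes for near-linear cost."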

Me and my friend Bob (he was working on the GPGPU side of the house) in 2003 were looking for a high-performance bus to solve both our I/O problems on COTS HPC efforts and wound up with Hypertransport being literally the only thing that could suffice for anything that could call itself COTS HPC (although he was quite peeved given there was no HT-based video card available). I thought that the FPGA nodes wouldn't have made it in after the disappointing results I showed but apparently someone thought it was worth it, hrm.

Unfortunately, it's highly doubtful that going forward AMD will see any more of that sweet, sweet DoD / DoE HPC money, because Intel's apparently going all-out for Aries and owns Infiniband tech outright. It seems QPI is being skipped entirely in favor of Infiniband. More bad AMD news in the AMD thread? Par for the course.

unpronounceable posted:

I have no sense of the scale of the CPU load for thread scheduling or anything else on massive systems like this; could you compare it to a more typical desktop load?
Beyond what was said earlier, big clusters need entire nodes dedicated to monitoring and making sure things are in order (reliability and serviceability meaning "is it up, and if not, can we figure a way around it?" as opposed to just availability, which means "is it up? mkay!"), and the less of that overhead you need to guarantee availability, the more is left over for :rice:. Much like Google's clusters, which encounter constant hardware failures, this sort of commercial cluster has similar problems to deal with. Think of it like ECC for entire machines.
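The "ECC for entire machines" analogy can be put in numbers: with spares, the cluster stays serviceable as long as at least k of its n nodes are up. A quick sketch using a binomial model - it assumes independent node failures, which real clusters violate (shared PSUs, racks, switches), so treat it as a best case:

```python
from math import comb

def cluster_availability(n, k, p_up):
    """Probability that at least k of n nodes are up, assuming each
    node is independently up with probability p_up. Spare nodes mask
    failures the way spare bits do in ECC."""
    return sum(comb(n, i) * p_up ** i * (1 - p_up) ** (n - i)
               for i in range(k, n + 1))
```

Needing all 100 of 100 nodes at 99% each leaves you with roughly 37% availability; budget five spares (k = 95) and it's effectively 100%. That headroom is what the dedicated monitoring-and-routing-around nodes buy you.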
