Zhentar
Sep 28, 2003

Brilliant Master Genius

Peanutmonger posted:

I can't even recall the model of my video card these days, but I definitely remember loving my Epox 8RDA+. Those were the days, running my 1700+ thoroughbred at 2400+ at 35-40C with the stock cooler.

In fact, I think it's still in my closet...

Yes, the 8RDA+ may not have had the features of the A7N8X, but it was excellent value for the price.

Good thing too, I went through two or three of them thanks to watercooling accidents.


Zhentar
Sep 28, 2003

Brilliant Master Genius

Factory Factory posted:

If it's the same OS and the same APIs, isn't it just a question of some guy (possibly at Microsoft) writing and selling an ARM compiler to set as a target alongside the x86 compiler? As long as the software doesn't use any assembly code, at least.

Windows 8 ARM will certainly be added as a target for Visual Studio. It looks like MinGW even already supports it. But there can still be a lot of obstacles even if the software doesn't use any assembly. Differences in endianness, memory alignment rules, calling conventions, and stack layout can all affect code across architectures. Even if the code doesn't intentionally rely on any of this, it is very likely that any application of reasonable complexity will have hidden bugs that only become apparent under different conditions.
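
To give a concrete (and hypothetical - these functions aren't from any real codebase) example of the kind of hidden assumption that surfaces in a port: casting a byte buffer to a wider type compiles fine everywhere, but what it does at runtime depends on the target's alignment rules and byte order.

[code]
#include <stdint.h>

/* Hidden assumption: on x86 this unaligned load just works (if slowly);
 * on some ARM cores it can fault or quietly misbehave, and the value you
 * get back depends on the target's endianness either way. Strictly
 * speaking it's undefined behavior in C, which is exactly why it only
 * shows up when the compiler or architecture changes. */
uint32_t read_u32_wrong(const uint8_t *buf) {
    return *(const uint32_t *)buf;
}

/* Portable version: assemble the value byte by byte in a defined
 * (little-endian) order, with no alignment requirement. */
uint32_t read_u32_le(const uint8_t *buf) {
    return (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}
[/code]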

Also, it's not just a matter of re-targeting your own code - every library which your application is dependent on must also be ported.


Alereon posted:

Also, would .NET applications need to be recompiled, or can the CLR just JIT into native ARM code?

A pure managed code .NET application doesn't contain any x86 code at all, AFAIK (the .NET framework is loaded using info encoded in the PE header, not native code), so there would be no need for recompilation in that respect. However, Microsoft has stated that .NET code will be required to be recompiled to run on ARM. I don't think they've explained the reasoning behind that yet.

Zhentar
Sep 28, 2003

Brilliant Master Genius
AMD got their x86 license out of that sort of requirement, but it's clearly not a factor today because there is no x86 chip with two suppliers. And if there were such a requirement, AMD couldn't meet it anyway because they can't fabricate chips anymore.

Edit: also, while Intel would certainly not want AMD's patents going to another party, very few could afford to outbid Intel.

Zhentar fucked around with this message at 15:29 on Jul 15, 2011

Zhentar
Sep 28, 2003

Brilliant Master Genius

Arsten posted:

After reading through benchmarks and this thread, my question is "What benefits or spiffy technology are actually in Bulldozer?"

It doesn't do much good for desktop users, but the uncore is loaded with features that may make it more compelling in servers. It had better be, considering how freaking huge it is - the uncore by itself is almost as large as a quad-core Sandy Bridge chip. I haven't seen much coverage of it, but there is this SemiAccurate article.

Zhentar
Sep 28, 2003

Brilliant Master Genius

PC LOAD LETTER posted:

But then why did they blow all that die space for all that cache if it had little impact on performance and consumed so much more power? Doesn't seem to add up.

Because it is useful for server workloads, and they only designed a single Bulldozer die. I would guess the decision was made to conserve engineering resources. I think the cost is much more a matter of die space than power.

Zhentar
Sep 28, 2003

Brilliant Master Genius

PC LOAD LETTER posted:

I thought L3 cache was most always easy to add or remove since its modular and doesn't touch anything hinky like the L1 does though. Its fairly fault tolerant too right?

Removing the L3 cache may not be a big deal, but take a look at the die layout. The L3 cache is in the middle of the die, and the edges along the full length of the die are occupied by the HyperTransport PHYs on one side and the DDR3 PHY on the other. Actually making the die smaller after taking out the L3 cache would require more significant layout changes. Designing a layout that leaves the L3 cache somewhere easier to chop off would mean increasing its average distance from the cores, increasing latency and hurting performance.

Plus, the L3 cache is still only about 20% of the die area. Given how late Bulldozer was anyway, I'm not sure spending more time to save 20% of the die would have been worth it.

Zhentar
Sep 28, 2003

Brilliant Master Genius

PC LOAD LETTER posted:

I'm sure its not a cut n' paste operation to remove or add L3 cache but I somehow doubt it would've been that much of a problem to do in time for launch. AMD likely knew well and good how BD would perform early this year at the very least. AFAIK cache uses a fair amount of power too. The die savings would've been nice, especially considering how drat big BD is when its supposed to be small due to the whole module approach, but cutting the cache would have big power savings too right?

Early this year would have been way too late for that big of a change. I think early last year would have been doable, if a bit late. And no, I don't think it would have been a big power savings. The caches can make up a pretty large portion of the leakage power because of the sheer number of transistors in them, and it's harder to power gate them without impacting performance, but that's mostly a concern for idle power. Under load, the cache shouldn't be a significant portion of the power consumption.


Edit: Google's preview of this paper claims that a 16MB L3 cache on some Xeon has, on average, a dynamic power consumption of 1.7W. That's at 65nm, so the 32nm BD cache should be capable of even less.

Zhentar fucked around with this message at 15:15 on Oct 25, 2011

Zhentar
Sep 28, 2003

Brilliant Master Genius

PC LOAD LETTER posted:

I dunno. They sure released a fixed Phenom II quickly IIRC. Different problem but still, they can certainly fix some stuff relatively quickly. I just have a real hard time believing it takes nearly 2 years to move around stuff like the L3 or HT links or whatever. That is almost half as long as it takes to design a whole new CPU core itself.

Pretty much the fastest possible turnaround for a change is 6 weeks (not counting designing the change itself), and that's for a basic, metal-layer-only change (i.e. it doesn't change any transistors, only the wires connecting them). Things like the Phenom II TLB fix involve few, if any, changes to the transistors, and just flip around connections to get slightly different (but correct) behavior.

Moving stuff around means doing a new floorplan, redoing a lot of the layout, verification, testing, mask generation, and production, plus likely additional revisions to fix issues or improve yields. It takes a long time relative to designing a whole new core because it requires going back to relatively early stages of the design process.

I wasn't able to find any really good sources about the design timeline, so I'm going off of memory about how long this kind of stuff takes... hopefully someone more knowledgeable about the process can fill in details more accurately.

PC LOAD LETTER posted:

Wow BD is even more hosed then I thought then. If they can't get the power usage significantly down by lopping off stuff like cache than they probably have no hope of even approaching the power efficiency of Intel's chips until they do a totally new arch.

BD's power consumption is definitely not some minor design artifact... improving its efficiency substantially will require substantial design changes. With process and architecture improvements it should get a lot closer to being competitive (maybe with Trinity), but it does seem like AMD made some seriously poor design decisions when it comes to power consumption.

Zhentar
Sep 28, 2003

Brilliant Master Genius

Maxwell Adams posted:

TechReport has done a quick test on Bulldozer thread scheduling. It looks like they had the same idea I had, where you set up different patterns of core affinities and see what happens.

That article is a bit deceptive, because there's one thing it doesn't make clear... all of those scores are significantly lower than if they were just allowed to run with 8 threads in the first place.

The other thing that's not clear is to what extent those benchmarks are floating point. With FP operations, the modules really are a lot closer to simply being a single core with Hyperthreading.
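
For reference, this is the kind of experiment the article ran: pin the threads to particular logical CPUs and compare. A minimal sketch on Linux using sched_setaffinity (the TechReport test was done on Windows, and which CPU numbers share a module depends on how the OS enumerates them, so the IDs below are an assumption):

[code]
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    /* Assumed enumeration: CPUs 0/1 are the two cores of module 0,
     * CPUs 2/3 are module 1, etc. Allowing 0 and 2 spreads two threads
     * across separate modules; allowing 0 and 1 instead would make them
     * share a front end, L2, and FPU. */
    CPU_SET(0, &set);
    CPU_SET(2, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("affinity set; launch the two-thread workload from here\n");
    return 0;
}
[/code]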

Zhentar
Sep 28, 2003

Brilliant Master Genius

Longinus00 posted:

I think it's not trying to compare 2/4 thread vs. 8 thread, it's figuring out how to best schedule when there's not full core/module saturation. Like in games. If windows is trying to maximize idle cores/modules while in lightly threaded situations it could lead to lower performance. This might be what the windows 8 10% performance increase comes from.

Yeah, I realize that's what they're intending to compare, but the article doesn't do a good job of conveying that; I was pointing it out because it would be easy for someone to walk away from that article with the wrong conclusion.

My other complaint, about it not being clear how floating-point heavy the benchmarks are, is that this directly impacts how applicable the results are to games and other desktop scenarios. If it's faster because it reduces contention on FP components, then it's meaningless for most desktop workloads. If it's faster because it reduces cache contention, or some other reason, then it's more likely to help other workloads.

Zhentar
Sep 28, 2003

Brilliant Master Genius

Agreed posted:

Edit: Oh, ARM. Great.

The ARM part is just some dipshit analyst speculating that their plans to focus on "lower power" means they're going to use ARM. As opposed to, say, investing in their product line that's actually doing well.

Zhentar
Sep 28, 2003

Brilliant Master Genius

Factory Factory posted:

Is AMD's process different, or are overclocked Bulldozer chips going to burn out faster than a fart in a frat house?

Not that the two conditions are necessarily exclusive, but yes, AMD's process (and many other potentially relevant aspects of the overall design) is significantly different from Intel's.

Zhentar
Sep 28, 2003

Brilliant Master Genius

Bob Morales posted:

What are the sheer odds that a startup could make an x86 chip?

Ignoring the prerequisite of licensing x86 from Intel and x86-64 from AMD, it's still pretty much zero. Bulldozer took AMD 6-7 years to design with a lot of engineers (hundreds?) with x86 design experience to build off of. Making something competitive with Intel would require a huge, long term investment, and given Intel's process lead, there's a pretty high risk you end up about as competitive as Bulldozer at the end. So Intel would really have to go nuts with monopoly abuse to give an x86 startup a fighting chance.

Zhentar
Sep 28, 2003

Brilliant Master Genius

Popete posted:

I asked if AMD might license ARM or partner with them (interestingly Intel is actually semi partners with ARM as a company they recently bought was previously partners with ARM) he seemed to think that AMD might partner with ARM, and produce their own chips, but use the ARM architecture.

What would AMD gain by moving to ARM? What about their technology, design experience, or other strengths gives them a significant edge over existing ARM competition? Nothing. They're doing well in the x86 mobile market because their integrated GPU (and its drivers) kicks the poo poo out of the competition. But the ARM GPU competition is not nearly as weak as Intel, and unlike the low-end x86 market, the existing ARM SoCs out there have more than sufficient GPU performance for the vast majority of users.

Zhentar
Sep 28, 2003

Brilliant Master Genius

Shaocaholica posted:

So is BD AMDs Netburst? Or is it better/worse than that analogy?

Yes, that does seem to be a reasonable comparison, though AMD has been pretty tight-lipped about architectural details, so it's not clear how apt the analogy is or how bad things really are. AMD has made some Netburst-like design decisions, and Bulldozer may well suffer the same fundamental architectural flaws for the same reasons. Or its failings may be much smaller; Piledriver could refine away implementation flaws that are holding Bulldozer back and bring much more competitive performance to the table. It's hard to say; there are arguments supporting both views but not much concrete information to sway things one way or the other.

Zhentar
Sep 28, 2003

Brilliant Master Genius

Shaocaholica posted:

For FPU intensive work, does the extra hardware in BD really add any performance?

Theoretically, yes. Even the heaviest FP loads involve a significant amount of integer work - things like pointer math and loop counters. Back in the day, I ran some simulations of a simple superscalar in-order processor, and in a lot of the floating point benchmarks the FP units spent more time idle than not. Out-of-order execution helps with that a lot, Hyperthreading helps with that a little, but there's still plenty of room for improvement. Feeding the FPU with two threads on separate integer cores will achieve higher FPU utilization.
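
To illustrate (just a sketch, not a benchmark): even a textbook FP kernel spends instructions on integer bookkeeping every single iteration, and on Bulldozer that bookkeeping runs on the integer cores rather than the shared FPU.

[code]
#include <stddef.h>

/* A plain dot product: the floating-point work (one multiply, one add per
 * element) is interleaved with integer work on every iteration --
 * incrementing i, comparing it against n, and computing the addresses of
 * a[i] and b[i]. Two such loops on the two cores of a module can keep the
 * shared FPU busier than one loop alone. */
double dot(const double *a, const double *b, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
[/code]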

As for how it fares in F@H, I haven't seen any benchmarks, but I can make a pretty decent guess; it'll put up a pretty decent fight against the i5-2500K, possibly even beating it. If they put out a F@H client compiled with FMA, it'll almost certainly beat the i5-2500K, perhaps even by a significant margin. But it'll eat up a lot more power in the process, and suck at most tasks, so it's still not worth it.

Zhentar
Sep 28, 2003

Brilliant Master Genius

Shaocaholica posted:

If this was going to be a problem, why didn't AMD get MS to push this out -before- the cpu launched?

[marketing paranoia]Because if it had been out when the CPU launched, they wouldn't have been able to try to downplay the lovely benchmark results with promises of a future patch making things better[/marketing paranoia]. That or they just didn't have the weight to convince MS to push through a patch to treat their processor as a quad core with hyperthreading while simultaneously talking all about how their processor isn't just a quad core with hyperthreading.

Zhentar
Sep 28, 2003

Brilliant Master Genius

Star War Sex Parrot posted:

Intel usually introduces new architectures in the enterprise market first, but has moved away from that in recent years. I can only assume that's for business reasons.

You sure about that? Looking back as far as Netburst on Wikipedia, the Xeons look like they've lagged at least 3-6 months behind the desktop parts.

As far as why they lag, I'd guess it mainly comes down to the design/testing/validation for the extra Xeon features & layouts added onto the base architecture. Probably just extra testing & validation in general, for that matter. And given the level of pressure they're getting from AMD, their timelines probably err on the side of safety.

Zhentar
Sep 28, 2003

Brilliant Master Genius

Star War Sex Parrot posted:

The next consoles won't be x86 anyway so it doesn't matter.

Also, they can't "go" 64-bit; both the 360 and the PS3 already are 64-bit.

Edit: funny enough, despite Nintendo hitting 64-bit first with the Nintendo 64 in 1996, I think everything they've released since has been 32-bit.

Zhentar fucked around with this message at 22:53 on Dec 21, 2011

Zhentar
Sep 28, 2003

Brilliant Master Genius
Both AMD and nVidia were working with the same troubled TSMC 40nm process. AMD responded by modifying their GPU design to get better yields; nVidia responded with press-release ultimatums telling TSMC to have better yields.

It's easy to blame TSMC for overpromising, but there are always going to be problems with process transitions and the designers need to take that into account and not just take TSMC at their word.

Zhentar
Sep 28, 2003

Brilliant Master Genius
I think the ones bright enough to know that were using silver dollars instead.

Zhentar
Sep 28, 2003

Brilliant Master Genius

tijag posted:

This is true mostly. I take everything he says with a huge container of salt.

Ivy Bridge stuff needs some extra salt, as well. Charlie is a major Linux fanboy, and doesn't take well to it not being treated as a serious desktop OS contender. Sandy Bridge (and previous Intel stuff as well) graphics drivers treated Linux as a second class citizen, so Charlie is predisposed to seething rage when it comes to Intel graphics.

Zhentar
Sep 28, 2003

Brilliant Master Genius

Fruit Smoothies posted:

Are they not clever enough to keep up with intel, then? Do they not have enough money? Perhaps they're predicting ARM will overtake x86 in the desktop market?

Intel's profit for 2011 exceeded AMD's revenue. Their mainstream desktop lineup has been excellent for some 5 years running now. Given the enormous cost of designing and manufacturing chips, it's surprising that AMD has held up as long as they have. But the disappointment of Bulldozer shows that they are not able to compete with Intel's core strength, and it's just as well that they don't beat themselves into bankruptcy trying. AMD will pose more of a threat to a lazy Intel if they are alive and well in other parts of the market than they would by killing themselves failing to keep up.

Zhentar
Sep 28, 2003

Brilliant Master Genius
You can get much more detail from the EE Times. The 10% reduction is just an estimate for what the technology can do, not what the Piledriver implementation achieves. The tl;dr of the technology is that it uses dark magic (and inductors) to reuse some of the clock signal electricity instead of dissipating all of it, and the savings come at almost no cost above the innocents sacrificed to bind the pact of evil creating it.

Zhentar
Sep 28, 2003

Brilliant Master Genius

tijag posted:

If the developer implements FXAA I believe it is possible that the text can remain untouched by the process.

Correct. It's pretty simple - the developer just tells it to do the FXAA pass before putting the text on the screen.

Zhentar
Sep 28, 2003

Brilliant Master Genius

HalloKitty posted:

120fps would be ridiculous, and the machine would cost a lot to make,

The economics pretty strongly favor making large, expensive chips. The added cost is only really substantial for the first year or so, and you'll be selling the system for ~5 years. And the console itself isn't even where the money is; it's the games, so eating extra cost on the console to sell more games is worth it (or, looking at it the other way around, it's not worth doing a little better on each console sale if your competitor has an advantage that lets them claim more of the market).

Zhentar
Sep 28, 2003

Brilliant Master Genius

MeramJert posted:

Why would they name 3 totally different cards the same name?

This way OEMs can make users think they're getting the newer, faster Kepler card without eating too much of the limited 28nm TSMC supply (or, once there's enough 28nm production, to clear out 40nm stock), I suppose.

Zhentar
Sep 28, 2003

Brilliant Master Genius
There are even advantages to the modern x86 way of doing things - the ISA the CPU pipeline actually runs on is not part of its published interface, which means the chip designers can change whatever they want whenever it's convenient for them, without requiring any compiler or software changes.

Factory Factory posted:

By the way, what's the difference between superscalar and vector?

Superscalar means executing more than one instruction in parallel, and is for the most part just magic that happens behind the scenes. Vector means a single instruction operating on multiple sets of arguments, and only happens because the compiler explicitly says so.
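
A rough illustration in C (assuming SSE2, and for the vector version that n is even and the pointers are 16-byte aligned): the scalar loop leaves any parallelism for the superscalar hardware to discover on its own, while the intrinsics version explicitly asks for two doubles per instruction.

[code]
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Scalar: one add per element as written; a superscalar, out-of-order core
 * may still overlap several iterations' worth of instructions on its own. */
void add_scalar(double *dst, const double *a, const double *b, int n) {
    for (int i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Vector: each _mm_add_pd is a single instruction (addpd) operating on two
 * doubles at once -- parallelism the code requests explicitly. */
void add_vector(double *dst, const double *a, const double *b, int n) {
    for (int i = 0; i < n; i += 2) {
        __m128d va = _mm_load_pd(&a[i]);
        __m128d vb = _mm_load_pd(&b[i]);
        _mm_store_pd(&dst[i], _mm_add_pd(va, vb));
    }
}
[/code]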

Goon Matchmaker posted:

That reeks of anti-trust lawsuit though.

The FTC-approved clause is that Intel is only required to enter good-faith negotiations with the purchaser.

Zhentar
Sep 28, 2003

Brilliant Master Genius

Alereon posted:

in-order designs are fundamentally slow compared to out-of-order designs.

While this is true...

Alereon posted:

in-order designs are fundamentally power-inefficient compared to out-of-order designs.

This is not (necessarily). It takes a lot of extra logic on the chip to make out-of-order work. You will spend significantly more power doing your computations on an out-of-order architecture. You can still come out ahead if finishing sooner lets you stop wasting power on other stuff, but it's not inherently better.


Also, in-order-ness is not the source of Intel's low power offering problems. Every processor they're competing with is in-order. There won't be any shipping out-of-order ARM processors until sometime next year, with the A15, and it sucks up enough power that they had to come up with big.LITTLE to offset it (not that big.LITTLE isn't a good idea).

Edit: d'oh, misread the chart. The A9 is OOO as well. A15's problem is that it shoots for desktop level pipeline depth and width.

Zhentar fucked around with this message at 01:06 on Oct 16, 2012

Zhentar
Sep 28, 2003

Brilliant Master Genius

HalloKitty posted:

But isn't that from 12 cores to 8 bulldozer modules? not even clocked very high - I can't imagine single thread performance budged at all.

When you've got a cluster of 299,008 cores, if you ever care about the performance of a single thread, you're doing it wrong.

Zhentar
Sep 28, 2003

Brilliant Master Genius

keyvin posted:

If developers are targeting AMD levels of per core performance, then it seems like it follows that AMD will be acceptable on the PC.

The problem is, AMD already is acceptable - just wholly inferior at every price point. And it's not Bulldozer cores going into these consoles, so it's not even going to encourage developers to optimize for the strengths that could possibly make the Bulldozer architecture cost competitive.


Zhentar
Sep 28, 2003

Brilliant Master Genius

Agreed posted:

Something I did not adequately keep up with as I watched the early launch in horror and then got over it and wrapped up in my own stuff anyway - did mature Windows 8 bring any significant improvement to thread scheduling at-the-OS-level to the equation, or not? Did it help AMD's gigantor architecture at all, or does it do about as well there as it does on Windows 7?

No, it didn't. Which isn't surprising, since it was pretty clear to begin with that wasn't the problem.


Even with the increasing prevalence of multi-threaded code, AMD's still hosed by Amdahl's Law: as the parallelizable portion gets split across more cores, you become limited by the portion that can only run single-threaded. It doesn't help that it's pretty hard to effectively split most things eight ways in the first place.
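
A quick worked example of that limit (the 80% figure is made up for illustration, not a measurement of any real workload):

[code]
#include <stdio.h>

/* Amdahl's Law: speedup = 1 / ((1 - p) + p / n), where p is the fraction
 * of the work that parallelizes and n is the number of cores. */
static double amdahl(double p, int n) {
    return 1.0 / ((1.0 - p) + p / (double)n);
}

int main(void) {
    /* Even a workload that is 80% parallel gets nowhere near 8x on an
     * eight-thread chip, and the serial 20% caps it at 5x forever. */
    printf("80%% parallel, 8 threads: %.2fx\n", amdahl(0.8, 8));          /* ~3.33x */
    printf("80%% parallel, many threads: %.2fx\n", amdahl(0.8, 1000000)); /* -> 5x  */
    return 0;
}
[/code]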


Edit: GPUs are also manufactured to tolerate higher temperatures than CPUs. Running a GPU at 100C isn't any big deal, not so much for CPUs (although I guess Ivy Bridge is rated for it!)
