Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

silence_kit posted:

I get your argument here. I too seriously doubt that improved cooling will be economical. But let's go into a time warp back to 10-15 years ago. Couldn't you have made a similar argument against the development of more dense VLSI technology back then, the same as what you are doing now? What is the difference between now and then?


Oh, I see.

The surface of the sun runs at ~63 W/mm². A modern high-power chip like AMD's 135 W Bulldozer boondoggle has more heat per unit area than a turbine fan blade or a nuclear reactor core.

We're approaching the fundamental limit of the thermal conductivity and impedance of the bulk silicon substrate. A heatsink attached directly to the silicon die can't pull the heat out fast enough unless the deltaT is super high, which is why those phase-change chillers work so well on the retarded 5+ GHz overclocks. People who don't want what amounts to a miniature commercial freezer plant in their computer need either a chip that doesn't generate as much heat, or a non-traditional cooling method.
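
Rough numbers, to put that in perspective (the ~315 mm² die area is my assumption for an Orochi-class Bulldozer die, not a quoted spec):

code:
# Rough power-density comparison (all figures approximate).
die_power_w = 135.0      # TDP of the Bulldozer part discussed above
die_area_mm2 = 315.0     # assumed area for an Orochi-class die
sun_w_per_mm2 = 63.0     # ~63 MW/m^2 at the photosphere = 63 W/mm^2

chip_density = die_power_w / die_area_mm2   # ~0.43 W/mm^2 average
print(f"chip average: {chip_density:.2f} W/mm^2")
print(f"sun surface:  {sun_w_per_mm2:.0f} W/mm^2")
# The die-wide average looks tame, but hot spots in the cores run
# several times the average, which is how chips end up compared to
# turbine blades and reactor cores on the famous slide.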

Laser etching microchannels into the substrate itself and pumping water through them is stupidly efficient. A whitepaper back in the 80s managed to sink north of 700 W in a chip about the size of a 12-core Xeon, with a 60 °C water temperature rise. Something like that might end up being the next high-performance cooling solution, assuming they don't just eat the loss and make the chips twice as big to cut the density down enough.
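
The energy balance shows why microchannels don't even need much flow; quick numbers using the whitepaper's figures (the water properties are textbook values):

code:
# Single-phase water cooling energy balance: Q = mdot * c_p * dT
q_watts = 700.0      # heat load from the 1980s microchannel paper
c_p = 4186.0         # specific heat of water, J/(kg*K)
dt_kelvin = 60.0     # allowed water temperature rise

mdot = q_watts / (c_p * dt_kelvin)                    # kg/s
print(f"{mdot*1000:.1f} g/s (~{mdot*60:.2f} L/min)")  # ~2.8 g/s, ~0.17 L/min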


Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

silence_kit posted:

The point I'm making is that the famous slide which compares computer chip power densities to the sun, rockets, nuclear bombs, etc. is like 15 years old now, and in the meantime there has been much investment into and improvement in the device/interconnect density in the state-of-the-art VLSI technologies. Circuit designers have been able to figure out how to take advantage of the increased device/interconnect density of computer chip technologies since then without requiring exotic and probably uneconomical/unpractical cooling, why would further improvements to density be any different? What am I missing here?

They're still incredibly limited by the total thermal envelope of the part. The fact that the transistors got smaller and more efficient just means you can cram more of them per unit area before running into the same issue. A simple example is why the stupid-expensive 24-core Xeons don't clock faster: they run right up against the imposed 135 W TDP limit. If the chip had a 200 W power budget thanks to better cooling technology, you'd see it clock ~30% faster.
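
How much clock the extra 65 W actually buys depends on what you assume about voltage; quick numbers bracketing it (a sketch, not a measured figure):

code:
# Bracketing the clock gain from raising the budget 135 W -> 200 W.
p_old, p_new = 135.0, 200.0
ratio = p_new / p_old              # ~1.48x power budget

linear_gain = ratio - 1            # P ~ f   (fixed voltage):   ~48%
cubic_gain = ratio ** (1/3) - 1    # P ~ f*V^2 with V ~ f:      ~14%
print(f"linear model: +{linear_gain:.0%}, cubic model: +{cubic_gain:.0%}")
# The ~30% figure sits between the two models, i.e. some of the extra
# budget goes to voltage headroom and some to raw frequency.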

It becomes even more challenging once you have stacked layers of chips. You could fit a metric asston of HBM memory on-die by stacking it up super high, but you pretty quickly run into massive issues dissipating the heat out of the center of the stack, which limits the total power and thus the speed of the HBM stack. Being able to wick the heat out faster than the monocrystalline silicon can conduct it will be key to improving package TDP and interconnect density over the next 10 years.

Hell, look at what heat pipes did for the entire CPU cooling industry. Before, you'd have a hugeass copper heatsink with some Delta 140 CFM fan that sounded like a model jet taking off, and it would still cook the chip under a 100 W load. Now you have heatpipe tower coolers that can handle 135 W silently, just due to how much better they're able to pull heat away from the chip and out toward the extremities of the fins.

I personally am looking at the technology that runs a vapor phase-change system through etched paths on-die, using the same phase-change goo the heatpipes use, possibly with a pump to encourage the liquid to flow into the chip. Then you can stack them literally as high as you want and can afford to cool, and the total thickness of silicon the heat has to conduct through goes down substantially.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

sincx posted:

How did Toslink get so cheap/ubiquitous?

Crappy red LED or laser diode like in a DVD player; it doesn't need to blink super fast or in very many colors, so you get the 3.1 Mbit/s Toslink for the low low price of like 17 cents. The most modern, high-spec example is only 150 Mbit/s, which is well within reach of a simple single-color red emitter. Basically trivial to do with parts that have been at commodity pricing for decades.
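
That 3.1 Mbit/s falls straight out of the S/PDIF framing; quick arithmetic, assuming the usual 48 kHz consumer rate:

code:
# S/PDIF line rate at 48 kHz: each sample rides in a 32-bit subframe,
# two subframes (L+R) per frame, one frame per sample period.
sample_rate = 48_000
bits_per_subframe = 32
channels = 2

line_rate = sample_rate * bits_per_subframe * channels   # bits/s
print(f"{line_rate/1e6:.3f} Mbit/s")                     # 3.072 Mbit/s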

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

Malcolm XML posted:

And plastic cable

True, but even the fancy glass fiber ones are like $20 or so. The plastic ones are so cheap they're included on even the crappiest tier products.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

Cygni posted:

Yeah, I wouldn't bet the farm on Zen+/Zen2. AMD doesn't have the money to redraw the core massively each year, which means we're really betting on Glofo's in house "7nm" DUV process. Considering Glofo had to license Samsung's 14nm process cause their last few were so uncompetitive, I'm a bit hesitant to get hyped for their 7nm. The DUV process is a betweener design too, supposed to get replaced by a EUV process like a year after it launches.

That's if, you know, EUV actually ends up working.

They have EUV more or less working, and production equipment is being built for it; it just took 10 years longer than anyone thought it would because lol, ionizing radiation as a lithography process. They'll use it on the very few critical layers that would most benefit from it, and use DUV for the rest, based on what I've read.

Do keep in mind: once any chip designed for it gets a finalized tapeout, the first chips off the line will come between 4 and 6 months later, due to the time each step takes to process and the 100+ steps each wafer has to undergo.
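
That 4-6 months is straightforward cycle-time arithmetic; rough numbers (steps and hours-per-step are industry rules of thumb, not quoted figures):

code:
# Wafer fab cycle time: hundreds of process steps, each with tool time
# plus queue time waiting between tools.
steps = 1000                # total process steps on a modern logic node
hours_per_step = 3.5        # rough average including queueing

days = steps * hours_per_step / 24
print(f"~{days:.0f} days (~{days/30:.1f} months)")  # ~146 days, ~4.9 months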

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

WhyteRyce posted:

I stumbled into some data hoarders subreddit and dear lord those people are insane

Link? Those people are always good for a laugh. That said, 60 TB of post-RAID6 storage is totally normal, right?
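
(The arithmetic behind that "totally normal" number, assuming an 8-bay box of 10 TB drives:)

code:
# RAID6 burns two drives' worth of capacity on parity.
drives, size_tb = 8, 10
print(f"{(drives - 2) * size_tb} TB usable post-RAID6")   # 60 TB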

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

inkwell posted:

I was under the impression that people are still trying to get the pellicles right so they don't melt immediately.

It works, but not at the beam levels they want in order to get 300 wafers per hour through it.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

Watermelon Daiquiri posted:

Give me '10nm is hard'!!!

also: i wonder if that dgpu nuc is the rumored licensing by intel of radeon stuff?

10nm is a rat bastard because you need EUV to get the critical logic layers and parts of the cache down in size, and therefore down in cost. The litho equipment makers only just released a commercially viable system for 10nm-class production.

Also keep in mind that a modern chip with dual- or quad-patterned layers takes between 4 and 6 months from the day the wafers show up to the day the finished chips go in a bin.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

Rastor posted:

It's all about performance per watt. If Qualcomm is really delivering in that category the big players like LinkedIn, Azure, Google are interested. If they don't really have an edge they will fade out like every other ARM server play.

If you can provide a platform that delivers some big compute task 15% cheaper than the next best solution, the cost of rejiggering the code to run on it becomes secondary to the $20 million you spend on the hardware to run it on.
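
The arithmetic that makes porting cost "secondary" (round numbers; the $1M porting figure is an assumption for illustration):

code:
# At hyperscaler volume, per-unit savings dwarf one-time porting effort.
hardware_spend = 20_000_000      # the $20M buy from above
savings_rate = 0.15              # 15% cheaper per unit of compute
porting_cost = 1_000_000         # assumed one-time software effort

savings = hardware_spend * savings_rate
print(f"${savings/1e6:.0f}M saved vs ${porting_cost/1e6:.0f}M to port")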

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA
That plus a video-card-style vapor chamber cooler could work out quite well.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

VostokProgram posted:

I get that the Fluorinert is vaporizing inside the box to draw energy away from the components, but how does it cool down enough to go back to liquid? Does the gaseous Fluorinert get run through a heat exchanger with the atmosphere?

You have a custom loop sitting just out of frame: a 120×240 mm rad inside the box, with water pumped through it to two more 120×240 rads outside. The water comes back at basically 20-25 °C after the second rad and goes in through the pressure bulkhead to the internal radiator, where all those fins act as a shitload of condenser area for the gas. It works quite well, and even if your room is 35 °C because you're a buttcoin business magnate, it will still work in any case actually designed for a 10 PSI overpressure.

My "this is stupid; awesome, but stupid" PC build will be a wall mount, like the ones you see at gaming case competitions, with a bigass plexiglass box in the center that holds the core system. The Fluorinert is about $250/gallon if you buy 5 gallons, which is enough to do a full ATX build without too much issue. I'm gonna be a wiseass, take apart the power supply, and bury it in blue aquarium gravel like it's some kind of pirate ship, possibly with a plastic fish floating in the tank.

Methylethylaldehyde fucked around with this message at 22:58 on Nov 18, 2017

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

Three-Phase posted:

Yeah, I think it's got a larger array. They do sell ones that are under $500 now, but the array is less than 100x100. Still pretty amazing but that resolution is very limited.

Larger array and the ability to use detachable optics, most likely. One thing I'd really be interested in seeing is thermal or thermal-fusion images of various products as they undergo overclocking, the VRMs and heatsink surfaces especially.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

ConanTheLibrarian posted:

That would be at least 2 1/2 years after standardisation. Seems excessive compared to the time to market of PCIe 3.

We also aren't running into any super compelling need for PCIe 4 over PCIe 3, the way USB3 needed PCIe 3 lanes over PCIe 2. About all it would give us is twice the bandwidth to the southbridge for more NVMe stuff, but a lot of boards just hang those slots directly off the regular PCIe lanes these days anyway.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

DrDork posted:

Which would be super cool for mega-SLI implementations--like 4x or 8x cards. Too bad NVidia killed off anything over 2x SLI (which doesn't even work all that well these days to begin with), and AMD is a hot mess. So basically there's no real point for it on the GPU side whatsoever.

Could still be cool for splitting out a whole mess of NVM drives, though, since trying to dig out 4x PCIe 3.0 lanes for more than one or two of those buggers is not as easy as I'd like on consumer-level boards. But past that I struggle to see the urgent need for it.

They make PCIe riser cards with a PLX chip in them specifically for that use case, but they're kinda retarded expensive.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

GRINDCORE MEGGIDO posted:

Dumb question: what is it about avx instructions that causes so much additional heat?

Really dense matrix calculations done on lots of data per clock tick, more or less. There are AVX-specific registers, and you can do a ton of floating-point vector math in not a whole lot of time.

Same way a welder has a duty cycle, the processor has more burst capacity than steady state capacity, so really computationally dense instructions running for long periods will thermal throttle it.
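
Rough numbers on the "lots of data per clock tick" part (assuming a Haswell/Skylake-class core with two 256-bit FMA ports; port counts vary by SKU):

code:
# Peak double-precision FLOPs per cycle per core with AVX2 FMA.
vector_bits = 256
lanes = vector_bits // 64     # 4 doubles per register
flops_per_fma = 2             # fused multiply + add
fma_ports = 2                 # two FMA pipes on Haswell/Skylake-class cores

flops_per_cycle = lanes * flops_per_fma * fma_ports
print(f"{flops_per_cycle} DP FLOP/cycle/core")   # 16; ~32 with AVX-512
# Keeping all that silicon busy every cycle is exactly what blows
# through the thermal budget and triggers throttling.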

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

Malcolm XML posted:

Yeah wider = slower but it can have throughput.


These morons decided to run avx-512 (throughput Optimized) for latency sensitive code

AVX is magical because wider doesn't necessarily mean slower: the latest implementations have 32 registers, and if you have code with a lot of vector math, especially math that can be chained, you can keep the entire AVX silicon stack chewing on stuff 24/7.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA
TSMC's and GloFo's 7nm are roughly on par with Intel's 10nm in terms of pitch size, via size, density, and SRAM cell sizes, ±10%ish depending on which metric you're looking at.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

JawnV6 posted:

What does this even mean

All of the stuff that's easy to do, like branch prediction, instruction caching, L2 caches, MMX/SIMD instructions, etc., has already been done. Going from 'we put in this thing called a branch predictor and it's god damned amazing, holy poo poo' to 'our branch predictor is 13% better at predicting branches in looping code, go team' is what he means by diminishing returns.
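
To make "the easy stuff is already done" concrete: the first dynamic predictors were just a 2-bit saturating counter per branch. A minimal sketch (the textbook scheme, not any particular CPU's implementation):

code:
# 2-bit saturating-counter branch predictor, one counter per branch PC.
# Counter states 0-1 predict not-taken, 2-3 predict taken.
counters = {}

def predict(pc):
    return counters.get(pc, 1) >= 2           # True = predict taken

def update(pc, taken):
    c = counters.get(pc, 1)
    counters[pc] = min(3, c + 1) if taken else max(0, c - 1)

# Once warmed up, a loop branch only mispredicts at loop exit; a 1-bit
# scheme would also mispredict on re-entry. Modern predictors spend
# enormous effort beating this baseline by a few percent.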

One of the cool things that might be on the horizon is a huge wad of L3/L4 cache that sits under the main chip, with an interposer between it and the main core set. Imagine 1 GB of high-speed eDRAM L3 on-package.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

JawnV6 posted:

No, I literally cannot comprehend how the latter half of the statement "even if the architecture were changed to ARM or something from a microarchitectural standpoint we are hitting diminishing returns" is in any way sensible. x86 has some really lovely characteristics that would absolutely not transfer over if ARM was slapped onto the front end.

If you assume that ~5 GHz is the upper end of clockspeed (speed of light, switching delays, clock propagation, TDP limits), and look at all the work that's been done to make x86, ARM, and MIPS as IPC-efficient as possible, there comes a point where you just can't make it 10% faster per year, forever. All of the easy-to-implement-in-hardware poo poo has already been done, so now we're working on refinements to earlier systems, and adding more complex systems in their place, in an effort to eke out more performance.

JawnV6 posted:

So.... WideIO w/ TSV's? Cool, anything from the horizon that wasn't cutting edge in 2012?

Cutting edge doesn't show up in desktop-level processors, so it's new and exciting to see how regular software developed for regular chips would perform with a 1 GiB L3 eviction cache for instructions and data.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

ElehemEare posted:

Melting down ITT that nobody has emptyquoted this.

:perfect:

Having a branch predictor at all was a huge rear end improvement in the early days. Now we're getting the last few % out of what's left over after 15 or 20 years of improvements.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

movax posted:

As for challenging x86...eh, I don’t think so, for RISC-V. At least not primarily; it’ll insert itself into the market in other segments first.

Eating ARM's lunch in the super low end most likely.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

JawnV6 posted:

Again, I'm finding this really handwavey and not really true to the x86 history as it existed. A "branch predictor" like we've been discussing on a 486 is wholly unnecessary. 100% static prediction. By the time of a Pentium MMX you need something dynamic or the bubbles will kill perf. It's ahistorical to say it was a huge improvement because there's no predictor-less model that gets some huge upgrade with only a predictor added. There's a 5-stage in-order core with a static predictor, then a 6-stage superscalar with a dynamic BTB.

I was speaking more in the realm of 'microprocessors, generalized', where we went from 'holy poo poo, photolithography is amazing' in the late 60s, with huge process, compiler, and architecture improvements every year like clockwork, to now running into issues that are Not Fun(tm) to solve. Like EUV lithography, where basically nothing likes being bombarded with ionizing photons, and how for any problem outside the embarrassingly parallelizable ones, even with infinite cores you're fundamentally limited by single-threaded performance.
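
That "infinite cores" ceiling is just Amdahl's law; rough numbers, assuming a 90%-parallel workload for illustration:

code:
# Amdahl's law: speedup(n) = 1 / ((1 - p) + p / n), p = parallel fraction
def speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

p = 0.90                         # assumed parallel fraction
for n in (4, 16, 64, 10**9):
    print(f"{n:>10} cores: {speedup(p, n):5.2f}x")
# The ceiling is 1/(1-p) = 10x no matter how many cores you add; the
# serial 10% dominates, i.e. single-threaded performance is the limiter.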

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

Honestly that's kinda cool.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

movax posted:

I don’t really do the YouTube, why does he suck?

Imagine if the 25-year-old Best Buy Geek Squad employee you openly loathed every time you had to interact with him made hundreds of thousands per year talking about technology.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

Risky Bisquick posted:

He's alright if you know the angle

He's occasionally funny, but I knew too many people who were unironically that obnoxious AND useless, so it's pretty triggering.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA
To be fair, a ton of 32nm fab space is being used for flash, I believe.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

movax posted:

I always thought it was n for CPUs, n-1 for PCHs / high-performance ASICs (maybe some 10GbE stuff) and then n-2 and beyond for flash and whatever else. Insert their LTE modems whereever applicable.

For flash you want big chunky features on a completely perfected fab process, because when you stack the cells 64 layers deep, with 1000+ processing steps needed, you want as few defects as humanly possible.

Also, the size all depends on who you're making it for, how much they're willing to spend on tapeout and debug, and how much they desperately need the power savings.

Once 7nm comes out, we should see a ton of 10GbE or better stuff come way down in price, as 14nm production shifts over.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

JawnV6 posted:

Why was Krzanich on the way out though? Otellini stepped away out of cycle because of the 'big decision' of opening up the fabs, is there something else of that magnitude coming down the pipe?

He sold every share he was contractually able to after learning about Spectre/Meltdown and, critically, BEFORE the public knew about it. It's literally a perfect, textbook case of securities fraud, and rumblings from on high are that the SEC is preparing to nail him to the wall for it. Best to find something less embarrassing for the firm, like him abusing his role as CEO to gently caress the hot intern, and get rid of him before he goes to prison.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

EoRaptor posted:

He didn't abuse his role, the relationship was consensual.

Those relationships are against policy because even if it's 100% perfectly consensual, the disparity in power between the two people can give the appearance of coercion, and appearances are everything in the corporate world.

The same way an Army Captain can't gently caress one of his NCOs because he has authority over them, the CEO can't gently caress one of his employees, because of his authority over them. Enough lovely people in positions of power have abused it so now it looks skeezy as gently caress no matter what the circumstances.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

eames posted:

I suspect it's going to be a TIM similar to liquid metal instead of solder, he mentioned that Intel was experimenting with such a compound in one of his older videos, just like he mentioned 8C Coffee Lake during the 8700K launch last year.
Maybe they found the long term performance to be good enough and figured out a way to package it so it doesn't spill during transport (i.e. by applying some sort of gasket material right around the die).

They could also go with bare die+shim+spacer/support for the massive overclocker crowd.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

hobbesmaster posted:

Yeah current LTE-A modems use 14nm: https://www.intel.com/content/www/us/en/wireless-products/mobile-communications/xmm-7560-brief.html

Cellular modems usually use older nodes so I'm not sure why delays in 10nm screw them over. Unless it was designed for 14nm equipment that is not going to be available because the 10nm stuff isn't online?

5G stuff needs 10nm or better because of the hilarious power requirements of driving the system. A 5G SoC is like twice as power-hungry as current SoCs, and almost all of it is in the actual modem. Same with the base station gear: a LOT of it is so close to the cutting edge that they desperately need the extra 40% power savings to avoid having the telco racks sound like mid-90s 1U pizza-box servers.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

BobHoward posted:

(can they be fixed by changing just metal layers, or do you need a full layer spin).

Modern multipatterned 14nm++++ is like what, a 4-6 month lead from 'welllll FUCKKK' through to new silicon in hand?

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

BobHoward posted:

I have no idea, but can tell you that the total delay was on the order of months on a foundry 45nm process. The common trick to help out with OH gently caress moments is to make your first wafer order larger than it needs to be and instruct the foundry to hold some wafers aside without doing the upper (or all) metal layers. When you come back to them with a metal-only spin, you’ll have some wafers ready which only need metal process steps to complete.

This is one of the reasons why you hope that if you have fixes to make, they can be done entirely in metal. Gets you back on the path towards shipping parts for revenue much sooner. Metal masks are also less expensive, especially the higher you go in the stack. For this reason, another standard mitigation is to pepper your chip with spare gates - flip flops and combinatorial logic gates in the base layers, with vias to bring their inputs and outputs to low metal layers, but no actual circuit connections. This lets you do much more sophisticated fixes with metal changes: if there’s some spare gates close to where a designer hosed up, you can patch in new logic without needing a base layer change to create the gates.

That's super clever for an early run. As long as you have die space, adding little FPGA-style modular bits could save your rear end if something bad happened.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

C.H.O.M.E. posted:

Its not an fpga, in that it is not programmable in the field. its just extra logic gates that aren’t hooked up to metal routing at all unless you make new metal layer masks.

Yeah, it's not programmable like an FPGA, but all the base FPGA-style building blocks, the adders and pipeline bits, are there in silicon, just not attached to anything. Then the engineer figures out "oh poo poo", needs to delay one line of the serial bus by 0.9 ns to get it to work, and routes in a single-stage delay to get it back in line.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

k-uno posted:

So, I have a technical question that I haven't been able to find a clear answer to on Google, and I figured I'd ask this thread. Specifically, how does performance compare for a two-socket system of two Xeons vs. a single-socket Xeon with twice as many cores?

As part of my job, I run a lot of parallel scientific computing tasks constrained by memory bandwidth and FP64 rates (and which often don't run well on GPUs). I know in principle a two-socket system has twice the memory bandwidth and can usually turbo to higher speeds, but this is hampered by the fact that bandwidth and latency for one CPU accessing memory attached to the other CPU's socket are both really bad. As far as I know, none of the software we'll be running is smart enough to know how to divide the data intelligently between the two memory banks. And most of it uses MKL, so will probably run poorly on AMD chips.

Does anyone have any real-world experience with this issue? All I can find looking around is discussions by people running lots of simultaneous VMs, but that's a very different scenario from one user running really intense jobs.

It depends entirely on your data set and how you're doing the computations. If they're atomic enough to be easily parallelized, you can just associate threads and system memory on a NUMA-node basis and be done with it, but the more interrelated the data and code get, the more you pay in penalties fetching from a non-local node. Without knowing exactly what you're doing, the answer is a big fat 'it depends'.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

k-uno posted:

I’m a physics professor and the work I do is in quantum computing, so it’s quantum physics simulations. The computationally expensive step is complex-valued sparse matrix-vector multiplication (for very large sparse matrices, done thousands to millions of times), which is ultimately limited by memory bandwidth but involves a fair bit of fp64 math as well. It also doesn’t get much of a boost on GPUs, or at least not enough in my reading to be worth the trouble. The actual calculations are done in Mathematica, Matlab or Python, because of ease and flexibility; as far as I know none of those programs support manually assigning individual threads to individual processors. Though if there’s a way to do that externally from windows or Linux that would be amazing!

I will be the first to say I’m not an amazing programmer since my background is in physics, and a lot of the work will be done by grad students (whose education naturally focuses much more on physics than programming). This is not serious HPC in any sense, but the calculations can often take up to 32 GB of RAM and run for a few days. I asked about the one vs two socket issue because we’ll be buying a few workstations for the group in the next few months and I was wondering how much of a penalty two sockets causes when you have to rely fully on the OS for thread scheduling.

Easy answer here is to just hang 64 GB off each socket and call it a day. If you get a 2-socket machine, you can pin the entire instance of Mathematica or Python to a NUMA node using start /affinity.
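
On Linux you can do the same pinning from inside the process; a minimal sketch (os.sched_setaffinity is Linux-only, and "CPUs 0-15 = node 0" is an assumption, check yours with numactl --hardware):

code:
import os

# Pin the current process (pid 0) to node 0's CPUs before the heavy math
# starts; assumes logical CPUs 0-15 belong to NUMA node 0 on this box.
node0_cpus = set(range(16))
os.sched_setaffinity(0, node0_cpus)

# First-touch allocation then keeps most pages in node 0's memory; for a
# hard guarantee, launch under: numactl --cpunodebind=0 --membind=0 ...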

Does any of your code use the shiny new AVX instructions to do the vector math?

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

BangersInMyKnickers posted:

what the hell is wrong with you

Some people just really like the intricate nature of multiboxing, and setting up the predefined stimulus:response trees, so that your army of priests and druids just rolls through a BG instance shitstomping anything within line of sight while you lead the pack with your dwarf paladin, throwing hammers and shittalking in chat.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

mystes posted:

I can see the appeal of a game where you control multiple characters simultaneously, but it just seems crazy to go to such lengths to simulate this by running multiple instances of a game where you control a single character.

It's like a huge, ever-evolving puzzle wherein you generate large quantities of salt by being in control of a hivemind of highly frontloaded DPS characters. Just getting it working in the first place is a huge sense of accomplishment for a lot of people.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

ConanTheLibrarian posted:

wait what? I didn't think the move off silicon was due for another few years yet

It still won't be for a few more years. They're actively looking at things like III-V materials and other exotic stuff for gate materials and some other things, but it'll be mostly a Si substrate, with ever more elaborate poo poo deposited onto it.


Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

evilweasel posted:

yeah but that's worse in every respect than the relevant 14nm chip and seems to have been shoved out for no other purpose than to claim that it's in production

Also to recoup a tiny tiny bit of cash back from the node, and to help debug the many and varied issues with the process.
