|
Twerk from Home posted:I'm rooting for Atom's time to shine. Skylake was their new architecture on 14nm, Broadwell was the first! Right, but they haven’t made an architectural change since Skylake. Like presumably they have Ice Lake or whatever penciled in for 10nm, why not just say ‘gently caress it’ and roll it out now on 14nm, akin to Nvidia rolling out Maxwell while they were still stuck on 28nm?
|
# ? May 18, 2018 03:10 |
|
Space Racist posted:Right, but they haven’t made an architectural change since Skylake. Like presumably they have Ice Lake or whatever penciled in for 10nm, why not just say ‘gently caress it’ and roll it out now on 14nm, akin to Nvidia rolling out Maxwell while they were still stuck on 28nm? Confusingly, processes these days have very different methods of creating transistors and connecting them to other features on the chip. The tools to design the chip are heavily tied to the process, as you need to be aware you are using something like FinFET before you start designing the layout. You can't back out once you start, and Intel is trapped with a commitment to a 10nm design process which the actual lithography cannot execute on. They only have very expensive options in front of them.
|
# ? May 18, 2018 04:37 |
|
Regarding OC and power consumption, I use my 8700K in Windows as a gaming console and the CPU package power difference between stock and 5 GHz/1.35V is ~15W in Overwatch and ~30W in BF1 (which I admittedly only use as a benchmark). It's almost zero while browsing and zero when the PC idles at 800MHz/0.7V, but the performance improvement is pretty significant with a high refresh rate monitor because — as others have mentioned — game engines are ultimately bound by single thread performance even when they utilize all cores. Amdahl's law, etc. The effect is so significant that turning off Hyperthreading to avoid thread collisions improves FPS and frametimes in all but the very latest AAA game engines, not to mention that CFL seems to clock ~200 MHz higher and obviously stays cooler with HT off (in retrospect I should've bought an 8600K).
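Since the post leans on Amdahl's law, here is the one-line version as a sketch; the 90% parallel fraction below is an illustrative guess, not a measured number for any game engine:

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Upper bound on speedup when only part of the work scales across cores."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cores)

# Even a workload that is 90% parallelizable tops out at 4x on 6 cores,
# which is why per-core performance still dominates high-refresh-rate FPS.
print(round(amdahl_speedup(0.9, 6), 3))  # 4.0
```

The serial fraction puts a hard ceiling on scaling no matter how many cores you add, which is the whole argument for chasing clocks and IPC instead.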
|
# ? May 18, 2018 11:53 |
|
MaxxBot posted:You are correct that the low hanging fruit has been picked, even if the architecture were changed to ARM or something from a microarchitectural standpoint we are hitting diminishing returns. EoRaptor posted:Confusingly, processes these days have very different methods of creating transistors and connecting them to other features on the chip. The tools to design the chip are heavily tied to the process, as you need to be aware you are using something like FinFET before you start designing the layout. You can't back out once you start, and Intel is trapped with a commitment to a 10nm design process which the actual lithography cannot execute on. They only have very expensive options in front of them.
|
# ? May 18, 2018 18:36 |
|
Here’s my current take: I’d be OK with development halting to focus efforts on getting the software correct for these newer chips. Yes, I know that it’s a small team working on the software side of things, but ACPI and power management driver bugs drive me loving bonkers. Cool, my new XPS 13 gets insane battery life but I want to smash the loving thing against the wall because of various P and C state transition bugs that turn it into a micro-stuttering mess. I’ve been on every single loving side of that problem. I’ve done cool and interesting(TM) things in RTL figuring my software dudes would take care of it. I’ve written low-level code that’s dealt with someone else’s cool and interesting bullshit that’s a nightmare to work with. It’s like they don’t have systems engineers who are politically empowered to write interface control documents and make sure everyone is on the same page. Otherwise, hey, it’s 2 months to launch and by the way, to get the performance and power consumption we claim, we need drivers / software to do X, Y, Z and customers are good and hosed until that overworked team gets to it.
|
# ? May 18, 2018 19:08 |
|
JawnV6 posted:What does this even mean All of the stuff that's easy to do, like branch prediction, instruction caching, L2 caches, MMX/SIMD instructions, etc has already been done. Going from 'we put in this thing called a branch predictor and it's god damned amazing, holy poo poo' to 'our branch predictor is 13% better at predicting branches in looping code, go team' is what he means by diminishing returns. One of the cool things that might be on the horizon is a huge wad of L3/L4 cache that's under the main chip, with an interposer between it and the main core set. Imagine 1 GB of L3 high speed edram on chip.
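The payoff from a giant on-package cache can be eyeballed with the standard average-memory-access-time formula. A sketch with made-up latencies and hit rates, not numbers from any real part:

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time in ns: hit cost plus amortized miss cost."""
    return hit_time + miss_rate * miss_penalty

dram_ns = 80.0                       # assumed DRAM round trip
no_l4 = amat(10.0, 0.30, dram_ns)    # L3 hits in 10 ns, 30% of accesses miss to DRAM
# Add a hypothetical 1 GB eDRAM L4 at 30 ns that catches most of those misses:
with_l4 = amat(10.0, 0.30, amat(30.0, 0.10, dram_ns))
print(round(no_l4, 1), round(with_l4, 1))  # 34.0 21.4
```

Even with these invented numbers the point stands: a big slower cache between L3 and DRAM cuts the average trip substantially because it only has to beat DRAM, not L3.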
|
# ? May 18, 2018 19:16 |
|
movax posted:I've seen things you people wouldn't believe.
|
# ? May 18, 2018 19:17 |
|
Methylethylaldehyde posted:One of the cool things that might be on the horizon is a huge wad of L3/L4 cache that's under the main chip, with an interposer between it and the main core set. Imagine 1 GB of L3 high speed edram on chip. <<< Back to Broadwell-C (OK, not 1GiB, but yeah, it made a huge difference, shame it didn't continue).
|
# ? May 18, 2018 19:25 |
|
It is a goddamned miracle anything more complicated than a toaster works as intended.
|
# ? May 18, 2018 19:49 |
|
movax posted:It is a goddamned miracle anything more complicated than a toaster works as intended. I would loving love an errata in hardware thread, or even more posts about it. Some of the workarounds must be borderline genius. GRINDCORE MEGGIDO fucked around with this message at 20:03 on May 18, 2018 |
# ? May 18, 2018 20:00 |
Well I love that the big solution for getting lithography done on 14nmesque processes is to just do 'half' of the layout with one mask and then, in a second exposure, do the other 'half' on a second mask. It's a bit more complicated, but that's essentially what they have to do.
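Mask assignment in that scheme is essentially a coloring problem: any two features closer than the single-exposure pitch have to land on different masks. A toy 1-D sketch (real decomposition tools handle 2-D layouts and are far more involved):

```python
def split_masks(positions, min_pitch):
    """Greedily assign sorted 1-D feature positions to two masks so that
    no two features on the same mask are closer than min_pitch."""
    masks = {0: [], 1: []}
    for x in sorted(positions):
        for m in (0, 1):
            if not masks[m] or x - masks[m][-1] >= min_pitch:
                masks[m].append(x)
                break
        else:
            # A real layout with three mutually-too-close features would
            # need triple patterning or a layout change.
            raise ValueError(f"feature at {x} needs a third mask")
    return masks

# Features on a 40 nm pitch, but the scanner can only resolve an 80 nm pitch:
print(split_masks([0, 40, 80, 120], 80))  # {0: [0, 80], 1: [40, 120]}
```

Each mask alone meets the pitch the litho tool can actually print; overlaying the two exposures gives you the denser pattern.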
|
|
# ? May 18, 2018 20:56 |
|
movax posted:Here’s my current take: I’d be OK with development halting to focus efforts on getting the software correct for these newer chips. Yes, I know that it’s a small team working on the software side of things, but ACPI and power management driver bugs drive me loving bonkers. Cool, my new XPS 13 gets insane battery life but I want to smash the loving thing against the wall because of various P and C state transition bugs that turn it into a micro-stuttering mess. Power states are such a pervasive pain in the butt it almost seems better just saying gently caress that poo poo and throwing all resources behind some kind of miracle batteries Why is L0s, even!!?
|
# ? May 18, 2018 21:04 |
|
movax posted:I’ve been on every single loving side of that problem. I’ve done cool and interesting(TM) things in RTL figuring my software dudes would take care of it. Methylethylaldehyde posted:All of the stuff that's easy to do, like branch prediction, instruction caching, L2 caches, MMX/SIMD instructions, etc has already been done. Going from 'we put in this thing called a branch predictor and it's god damned amazing, holy poo poo' to 'our branch predictor is 13% better at predicting branches in looping code, go team' is what he means by diminishing returns. Like, for example because you brought it up, branch prediction. x86 branches a lot more frequently than ARM does, you can expect one every 3~5 instructions and that window is much larger on ARM. We hit 93% branch accuracy with Yeh's bimodal in 1991, so again it's confusing to see folks saying Sandybridge was the inflection point then pointing out 30 year old research. Methylethylaldehyde posted:One of the cool things that might be on the horizon is a huge wad of L3/L4 cache that's under the main chip, with an interposer between it and the main core set. Imagine 1 GB of L3 high speed edram on chip. priznat posted:Power states are such a pervasive pain in the butt it almost seems better just saying gently caress that poo poo and throwing all resources behind some kind of miracle batteries
|
# ? May 18, 2018 21:14 |
|
movax posted:Here’s my current take: I’d be OK with development halting to focus efforts on getting the software correct for these newer chips. Yes, I know that it’s a small team working on the software side of things, but ACPI and power management driver bugs drive me loving bonkers. Cool, my new XPS 13 gets insane battery life but I want to smash the loving thing against the wall because of various P and C state transition bugs that turn it into a micro-stuttering mess. I don't know the inner workings of the power management in Windows but on my workstations, that poo poo is turned loving off. You want a system that is unstable? Let it sleep. This is another reason I changed to Windows 10 pro for Workstations, it has less power state crap in the MAXIMUM PERFORMANCE mode.
|
# ? May 18, 2018 21:26 |
|
redeyes posted:You want a system that is unstable? Let it sleep. This is another reason I changed to Windows 10 pro for Workstations, it has less power state crap in the MAXIMUM PERFORMANCE mode. Unstable meaning crashing? I thought movax was speaking more to things like an automatic transmission that keeps second guessing you or lacks hysteresis. When you know better than it does about what's coming up and would shift very differently. "I know you can't make this hill in second, christ just downshift"
|
# ? May 18, 2018 21:37 |
|
JawnV6 posted:Unstable meaning crashing? I thought movax was speaking more to things like an automatic transmission that keeps second guessing you or lacks hysteresis. When you know better than it does about what's coming up and would shift very differently. "I know you can't make this hill in second, christ just downshift" Not straight up crashing usually although my AMD RX480 would do that from time to time coming out of sleep. More like stutters, stuff not loading right or taking too long. I have a workstation that HAS to be up constantly, the person coding on it has a loving brain hemorrhage if it goes down, reboots, or does anything except run smoothly from 8AM to 10PM. If that machine sleeps for any reason, it fucks up within days. I don't really care to troubleshoot it, I just leave the loving thing on. I'm not trying to be hyperbolic but every time I let a machine sleep, it becomes unstable at some point later. Home users do not care or notice it.
|
# ? May 18, 2018 21:48 |
|
A CPU is a rock we tricked into thinking
|
# ? May 18, 2018 21:52 |
|
redeyes posted:Home users do not care or notice it. That you have a single pissy, temperamental dev doesn't translate 1:1 into this.
|
# ? May 18, 2018 22:04 |
|
Of course, but from all the support calls I get, it is certainly an issue on a lot of machines. It might be worth mentioning, I run a small computer business but I support something like 100 businesses, a lot more home users, a few non profits, a few .gov's... so I am speaking from a decent sample size. redeyes fucked around with this message at 22:46 on May 18, 2018 |
# ? May 18, 2018 22:10 |
|
JawnV6 posted:Like, for example because you brought it up, branch prediction. x86 branches a lot more frequently than ARM does, you can expect one every 3~5 instructions and that window is much larger on ARM. Why? Assuming AArch64 or indeed Thumb. Predication on all instructions as in classic ARM is actually considered an architectural wart once you're trying to do high performance OoO stuff, which is why the 64 bit architecture dropped it, and without that I'm not getting why there would be a big difference. There isn't in my own hobby compiler, for what it's worth.
|
# ? May 18, 2018 22:19 |
|
GRINDCORE MEGGIDO posted:I would loving love an errata in hardware thread, or even more posts about it. Some of the workarounds must be borderline genius. I just want to know if "under complex microarchitectural conditions" is really just Intel-speak for "once in a while for some unfuckingknown reason". That would translate errata like "Under complex microarchitectural conditions, processor may hang with an internal timeout error (MCACOD 0400H) logged into IA32_MCi_STATUS or cause unpredictable system behavior." into "sometimes the CPU just freezes or does something weird, but we found a magic microcode patch!"
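For reference, the MCACOD in that errata text is the low 16 bits of the IA32_MCi_STATUS MSR (field positions per the Intel SDM's machine-check chapter). A sketch of pulling the fields out of a raw value; the sample status below is made up:

```python
def decode_mci_status(status: int) -> dict:
    """Pick apart an IA32_MCi_STATUS value (bit positions per Intel SDM vol. 3)."""
    return {
        "mcacod": status & 0xFFFF,         # MCA error code, bits 15:0
        "mscod": (status >> 16) & 0xFFFF,  # model-specific error code, bits 31:16
        "uc": bool(status & (1 << 61)),    # uncorrected error
        "val": bool(status & (1 << 63)),   # register contains a valid error
    }

# Hypothetical logged value with the 0400H internal-timeout code set:
status = (1 << 63) | (1 << 61) | 0x0400
print(hex(decode_mci_status(status)["mcacod"]))  # 0x400
```

The MSCOD half is where "for some unfuckingknown reason" actually lives, since its meanings are model-specific and mostly undocumented.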
|
# ? May 18, 2018 22:21 |
|
feedmegin posted:Why? Assuming AArch64 or indeed Thumb. Predication on all instructions as in classic ARM is actually considered an architectural wart once you're trying to do high performance OoO stuff, which is why the 64 bit architecture dropped it, and without that I'm not getting why there would be a big difference. There isn't in my own hobby compiler, for what it's worth. Tell me, what's your 1:1 instruction equivalent on AArch64 for FXSAVE? Or something trivial like MOV EDX, [EBX + 8*EAX + 4]. "oh, my hobby compiler wouldn't spit that out"? Kazinsal posted:I just want to know if "under complex microarchitectural conditions" is really just Intel-speak for "once in a while for some unfuckingknown reason".
|
# ? May 18, 2018 22:56 |
|
JawnV6 posted:No, I literally cannot comprehend how the latter half of the statement "even if the architecture were changed to ARM or something from a microarchitectural standpoint we are hitting diminishing returns" is in any way sensible. x86 has some really lovely characteristics that would absolutely not transfer over if ARM was slapped onto the front end. If you assume that ~5 GHz is the upper end of clockspeed (speed of light, switching delays, clock propagation, TDP limits), and look at all the work that's been done to make x86, ARM, and MIPS as IPC efficient as possible, there comes a point where you just can't make it 10% faster per year, forever. All of the easy to implement in hardware poo poo has already been done, so now we're working on refinements to earlier systems, and adding more complex systems in place, in an effort to eke out more performance. JawnV6 posted:So.... WideIO w/ TSV's? Cool, anything from the horizon that wasn't cutting edge in 2012? Cutting edge doesn't show up on desktop level processors, so it's new and exciting to see how regular software developed for regular chips could perform if they had a 1 GiB L3 eviction cache for instructions and data.
|
# ? May 18, 2018 23:13 |
|
JawnV6 posted:Nothing to do with predication? This is a widely-held belief that I can't accurately source, but uarch features directly rely on this being true. What are they calling the trace cache these days? I'm not sure why you wouldn't believe this. x86 is CISC, instructions do more. Branches are more frequent because there's less of other types. The whole point of RISC is that the complicated instructions don't actually get used much in practice though? Plus, if your assertion is 'x86 has more branches proportionally because of complex instructions doing more and also that's bad in this scenario for some reason' (because that implies the exact same number of branches, surely, just fewer instructions between them) then the solution is to...only use the simpler instruction forms. What's ARM buying you here? Regardless it's an easy thing to verify, I assume - surely there's some studies out there of relative frequency of branch instructions in the same software compiled for different isas?
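That measurement is easy to approximate from a disassembly listing. A sketch that tallies branch mnemonics in objdump-style text; the mnemonic set is rough rather than exhaustive, and the embedded listing is a fabricated example:

```python
import re

X86_BRANCHES = {"jmp", "je", "jne", "jz", "jnz", "jg", "jl", "call", "ret"}

def branch_density(disasm: str, branch_mnemonics: set) -> float:
    """Fraction of instructions that are branches, parsed from lines shaped
    like objdump -d output: '  401000: 75 0a  jne 40100c'."""
    mnemonics = []
    for line in disasm.splitlines():
        m = re.match(r"\s*[0-9a-f]+:\s+(?:[0-9a-f]{2}\s+)+([a-z.]+)", line)
        if m:
            mnemonics.append(m.group(1))
    return sum(m in branch_mnemonics for m in mnemonics) / len(mnemonics) if mnemonics else 0.0

sample = """
  401000: 8b 04 83    mov eax,[ebx+eax*4]
  401003: 01 c2       add edx,eax
  401005: 75 f9       jne 401000
"""
print(branch_density(sample, X86_BRANCHES))  # one in three instructions branches
```

Run the same source through gcc for x86-64 and AArch64, feed both disassemblies through this (with an AArch64 mnemonic set for the second), and you have the comparison feedmegin is asking about, at least to a first order.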
|
# ? May 18, 2018 23:15 |
|
Methylethylaldehyde posted:If you assume that ~5 GHz is the upper end of clockspeed (speed of light, switching delays, clock propagation, TDP limits), and look at all the work that's been done to make x86, ARM, and MIPS as IPC efficient as possible, there comes a point where you just can't make it 10% faster per year, forever. All of the easy to implement in hardware poo poo has already been done, so now we're working on refinements to earlier systems, and adding more complex systems in place, in an effort to eke out more performance. x86/ARM have had very different histories. If we're slapping an ARM frontend onto a modern x86 design, you're ripping up a lot more than the decoder. The two approaches to memory coherency are vastly different, and that's the kind of thing that bubbles down into every section of the chip. Every little uarch feature that touches that area may not map 1:1 across ISAs. This imaginary pan-ISA death of innovation is a farce. Furthermore, the "easy" being done doesn't preclude "hard" things that cause massive gains. Like why is HT only on execution ports anywaaaay feedmegin posted:Plus, if your assertion is 'x86 has more branches proportionally because of complex instructions doing more and also that's bad in this scenario for some reason' (because that implies the exact same number of branches, surely, just fewer instructions between them) then the solution is to...only use the simpler instruction forms. What's ARM buying you here? And I think that ignores, among other things, the actual differences between x86/ARM and their histories in the industry. x86 has more branches per instruction. There's no value judgement yet, it just does. As a result, the Intel architects had to focus more on branch prediction than their ARM counterparts. So in Methylethylaldehyde's x86/ARM swap, you end up with a stupidly good branch predictor and you'd be better served spending transistors on other areas.
The solution of "only use the simpler instructions" is a whole can of worms. What are we optimizing for again? Total compute, or uop throughput for some unknown reason? Using 8 instructions to do the same math that 1 could do will reduce the frequency of branches. It'll also blow out your iCache, increase heat shuffling bits around, and probably increase register pressure too. feedmegin posted:Regardless it's an easy thing to verify, I assume - surely there's some studies out there of relative frequency of branch instructions in the same software compiled for different isas? FXSAVE was a cheap shot. 3-register LEA is bread and butter "x86 is more dense than ARM".
|
# ? May 19, 2018 01:41 |
|
Methylethylaldehyde posted:All of the stuff that's easy to do, like branch prediction Melting down ITT that nobody has emptyquoted this.
|
# ? May 19, 2018 03:40 |
|
SuperH is the best ISA fight me
|
# ? May 19, 2018 04:31 |
Well, to be fair 2cm is kinda getting close to the physical dimensions of the chip. Now, you'd probably do something like stick the clock generator in the middle and other tweaks to ensure that there aren't any clock dependent signals traveling a long distance, but still.
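The 2 cm figure checks out on the back of an envelope, assuming on-die signals propagate at roughly half the vacuum speed of light (a ballpark, not a process number):

```python
C = 3.0e8                 # speed of light in vacuum, m/s
SIGNAL_SPEED = 0.5 * C    # assumed on-die propagation speed, m/s

def distance_per_cycle_cm(freq_hz: float) -> float:
    """How far a signal can travel in one clock period, in cm."""
    period_s = 1.0 / freq_hz
    return SIGNAL_SPEED * period_s * 100

print(round(distance_per_cycle_cm(5e9), 2))  # 3.0
```

So at 5 GHz a signal covers about 3 cm per cycle under this assumption, which is indeed the same order as the die itself; hence clock trees and repeaters rather than a single wire from one corner to the other.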
|
|
# ? May 19, 2018 04:37 |
|
ElehemEare posted:Melting down ITT that nobody has emptyquoted this. Having a branch predictor at all was a huge rear end improvement in the early days. Now we're getting the last few % out of what's left over after 15 or 20 years of improvements.
|
# ? May 19, 2018 04:40 |
|
Malcolm XML posted:SuperH is the best ISA fight me Dreamcast for life yo
|
# ? May 19, 2018 06:02 |
|
redeyes posted:I don't know the inner workings of the power management in Windows but on my workstations, that poo poo is turned loving off. You want a system that is unstable? Let it sleep. This is another reason I changed to Windows 10 pro for Workstations, it has less power state crap in the MAXIMUM PERFORMANCE mode. Same, it’s completely disabled on my desktop system outside of letting the CPU bounce around multipliers. It’s very hard to get right (sleep / ACPI power management) and when you consider that Windows boxes have a goddamn near infinite variety of configurations of hardware, it’s a nightmare. The only machines I’ve really seen sleep reliably were MacBooks, and they have the obvious advantage of limited hardware configuration.
|
# ? May 19, 2018 06:34 |
|
JawnV6 posted:In my general experience, HW folks have some utterly amazing opinions of what SW can accomplish. I'm sure this is borne of utmost respect and admiration for the profession instead of laziness and assuming it must be easy for the folks on the other side of the wall can accomplish. I had a really good relationship with my SW team at my last job, which was at the level where we would sit down and I’d make a hardware design decision that was nowhere near optimal from my point of view understanding that as a result the SW guys would remain oversubscribed at 2x instead of 3x. Easy for me to live with knowing it’s helping my buddies out (while still getting the job done) and also knowing that I have some goodwill to burn (which I did) when I needed their help to fix my fuckups because hardware spins take time and money! The absence of management involvement in the above is not an accident.
|
# ? May 19, 2018 06:39 |
|
movax posted:Same, it’s completely disabled on my desktop system outside of letting the CPU bounce around multipliers. It’s very hard to get right (sleep / ACPI power management) and when you consider that Windows boxes have a goddamn near infinite variety of configurations of hardware, it’s a nightmare. The only machines I’ve really seen sleep reliably were MacBooks, and they have the obvious advantage of limited hardware configuration. Where exactly are you disabling this?
|
# ? May 19, 2018 06:43 |
|
My system recently wouldn't sleep on its own or stay asleep. I found some neat commands that let me disable certain devices' ability to wake the system. Apparently my mouse was twitching when I wasn't around.
|
# ? May 19, 2018 07:02 |
|
Tab8715 posted:Where exactly are you disabling this? I’ve lost track of the various Group Policy editor changes I’ve made but Windows has been told to not enter sleep under any circumstances (only hibernate on UPS battery low) and having an older CPU (2600K) it doesn’t have any of the fancy power stuff in the first place. Also ticked off various S3-related stuff in the BIOS and my only USB device is my mouse (PS/2 keyboard, and I try as hard as I can to not have other permanently attached USB stuff if I can help it). (I don’t like USB in case it wasn’t obvious but it’s probably the best we could have gotten in the mid 90s. PCI was still the best thing to come out of IAL.)
|
# ? May 19, 2018 07:09 |
|
Sorry for the layman questions ahead of time, but there's some really interesting poo poo in this thread: Is there active research on designing new architectures that don't have all the drawbacks of X86/ARM/PowerPC (which is not quantum-based q-bit stuff)? Is there anything out there now that seems really good, but has yet to get any mainstream usage due to the dominance of X86? Is there another path that could be taken that is prohibitively expensive but could theoretically eke out much more significant gains than tweaking the poo poo out of X86?
|
# ? May 19, 2018 10:37 |
|
Avalanche posted:Sorry for the laymen questions ahead of time, but there's some really interesting poo poo in this thread: There’s RISC-V, which is in the devkit stage now.
|
# ? May 19, 2018 10:59 |
|
I’m all in for RISC-V taking on ARM in the micro controller arena, followed by high-performance application processors (so Cortex-M and Cortex-A, not servers though). If Eastern silicon vendors decide they get tired of paying ARM royalties for each SoC they’re kicking out, RISC-V fits in perfectly. Folks like Western Digital are super interested because they can cut the cost of an ARM out of every single drive they ship. Microsemi is pushing it hard for FPGA soft core usage and SiFive is making headway in shipping product and slashing the NRE for ASIC integration by huge amounts. I want to say that companies are already funding clang/llvm/etc development for the RISC-V ISA and targets, but compiler / toolchain support IMO is what can cause infant mortality. Processor is worthless if the tools don’t exist or suck so badly that developers don’t want to touch it. e: with RISC-V being an ISA definition, it’s still going to be up to implementers to figure out how they want to do branch prediction and the other features that have been mentioned here. Not a magic wand on its own... As for challenging x86...eh, I don’t think so, for RISC-V. At least not primarily; it’ll insert itself into the market in other segments first. movax fucked around with this message at 13:09 on May 19, 2018 |
# ? May 19, 2018 12:23 |
|
Thread being strange, I can see "last post by Movax", but not the post. E - can now, sorry.
|
# ? May 19, 2018 12:32 |
|
Avalanche posted:Sorry for the laymen questions ahead of time, but there's some really interesting poo poo in this thread: The Mill shall vanquish all lesser architectures.
|
# ? May 20, 2018 19:54 |