|
The small cores seem pretty performant if 8C+8c with 24T can beat a true 16C/32T… unless those new big cores are a way bigger jump than people are expecting.
|
# ? Jul 21, 2021 17:58 |
|
Cygni posted:The small cores seem pretty performant if 8C+8c with 24T can beat a true 16C/32T… unless those new big cores are a way bigger jump than people are expecting. It's possible it's both: big jump on big core but so power hungry that they can't fit more than a small number without burning things down when it's running full out, so they shuffle in some decent little cores to help out at a lower thermal load instead of just playing the "downclock all the cores" game that they've played in the past. As noted, a couple of super-fast single threads is often better for non-server workloads than slower higher-thread count solutions, so this might be the way to try to get the best of both worlds within the thermal envelope they're very obviously struggling to work within. But either way I guess we'll have to wait for better info and specs before we really know what the story is, and the Windows scheduler can still make the whole experiment pointless if it doesn't know what it's doing.
|
# ? Jul 21, 2021 19:26 |
|
mdxi posted:On the software side, it's all been a non-issue for years on Linux -- driven by the use cases of phones and Chromebooks. But after watching the Windows scheduler cripple perfectly ordinary x86 CPUs because what do you mean a computer can have more than 4 cores, AMD?, I have no doubt Microsoft will balls-up this rather more complex issue in some hilarious way. Ryzen's windows scheduler problems were more to do with Windows not really understanding the concept of a CCX (seeing as Ryzen was one of the first (probably the first?) non-uniform cache architectures Windows had to deal with). AFAIK windows has always worked fine with a 6950X 10-core processor or whatever (although the Windows scheduler is generally acknowledged as not being as good as Linux of course). If they had done something like their Zen3 architecture from the start, where a CCX was 8 cores, it would have solved the problem for all consumer chips (as these were all 8-core or less for Zen1/Zen+) - at that point it would have been a "normal" uniform architecture like a 5960X, from the perspective of the scheduler. Having a CCX within a larger CCD was a dumb architectural choice, then AMD didn't do the legwork to let Microsoft know it was weird, and insisted even after launch there wasn't a problem. It was all just tremendously avoidable and a lot of the blame falls on AMD there. There were also some instances of games flipping a poo poo with Threadripper because they just couldn't comprehend the idea of a processor with 32 cores and giving you nonsensical "this game needs at least 2 cores!" messages or whatever, but that wasn't Windows' fault. As far as Alder Lake, Intel has been working with Microsoft and the scheduler can classify workloads/application states and assign them appropriately. That was part of the point of getting Lakefield out even if it didn't exactly set the world on fire as a product in itself. Paul MaudDib fucked around with this message at 20:13 on Jul 21, 2021 |
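For anyone who wants the CCX point spelled out, here is a toy sketch of a cache-aware core choice. It assumes cores are numbered contiguously within each CCX (4 per CCX on Zen1/Zen+, 8 on Zen3), and `pick_core` plus its parameters are hypothetical names for illustration. This is a model of the idea, not how the Windows or Linux scheduler is actually written.

```python
def pick_core(last_core, idle_cores, cores_per_ccx=4):
    """Pick a core for a thread that last ran on last_core.

    Preference order: the same core (warm L1/L2), then an idle core in
    the same CCX (shared L3), then any idle core at all.
    idle_cores must be a non-empty set of core indices.
    """
    if last_core in idle_cores:
        return last_core
    ccx = last_core // cores_per_ccx  # assumption: contiguous numbering per CCX
    same_ccx = sorted(c for c in idle_cores if c // cores_per_ccx == ccx)
    if same_ccx:
        return same_ccx[0]
    # Crossing to another CCX means the thread loses its L3 contents.
    return min(idle_cores)
```

With `cores_per_ccx=8` every core on an 8-core consumer chip lands in the same CCX, which is the "uniform from the scheduler's point of view" case described above.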
# ? Jul 21, 2021 19:59 |
|
The difficult part of the Windows scheduler is that it has two scheduler modes, one of which prioritizes active foreground work while the other prioritizes heavily interrupt-driven (and often I/O-waiting) processes like server tasks. That's one of the main differences since 2015 or so between client Windows and server Windows -- which scheduler is enabled by default. The "client" scheduler just assumes you have one big brick of CPU/cache/memory and throws poo poo at the wall when it needs to schedule a thread for a few quantums. The "server" scheduler algorithm is more NUMA-aware, but it was also designed under the assumption that NUMA domains are generally multi-socket machines, not multi-CCX machines, which as far as I know don't show up as separate sockets even in NUMA mode.
|
# ? Jul 21, 2021 20:12 |
|
Kazinsal posted:The difficult part of the Windows scheduler is that it has two scheduler modes, one of which prioritizes active foreground work while the other prioritizes heavily interrupt-driven (and often I/O-waiting) processes like server tasks. That's one of the main differences since 2015 or so between client Windows and server Windows -- which scheduler is enabled by default. The "client" scheduler just assumes you have one big brick of CPU/cache/memory and throws poo poo at the wall when it needs to schedule a thread for a few quantums. The "server" scheduler algorithm is more NUMA-aware, but it was also designed under the assumption that NUMA domains are generally multi-socket machines, not multi-CCX machines, which as far as I know don't show up as separate sockets even in NUMA mode. yeah I think the "easy" answer for scheduling onto CCXs would have been to present them as separate sockets, but on top of the scheduler not being designed to do that, it also triggers a whole bunch of problems with licensing (what's this, a 2-socket machine!? that will be one billionty dollars if you want to use our software). windows 10 home edition only supports a single socket, pro is limited to 2 sockets.
|
# ? Jul 21, 2021 20:16 |
|
ms will almost certainly be working on big/little and ccx/die awareness
|
# ? Jul 21, 2021 20:17 |
|
on top of lakefield they also probably have had to address it for Windows on ARM, I'd think
|
# ? Jul 21, 2021 20:18 |
|
Paul MaudDib posted:There were also some instances of games flipping a poo poo with Threadripper because they just couldn't comprehend the idea of a processor with 32 cores or whatever, but that wasn't Windows' fault. even windows is brushing up against legacy limits with threadripper. the scheduling guts use a pointer-sized bitfield to represent thread affinity, so the limit was 32 logical cores originally and 64 after the 64bit transition. to support chips with 128 or more threads they had to duct tape on an extra level of abstraction that lets the kernel pretend a 128 thread cpu is two 64 thread cpus, or a 256 thread cpu is four 64 thread cpus, etc. repiv fucked around with this message at 20:31 on Jul 21, 2021 |
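A rough sketch of the affinity-mask limit described above. It assumes the mask is a machine-word-sized bitfield and that the extra layer splits logical processors into groups of at most 64 (Windows calls these processor groups). The helper names are made up for illustration, and the even fill-one-group-then-the-next split is a simplification; real group assignment may follow sockets or NUMA nodes.

```python
WORD_BITS = 64  # pointer-sized mask on 64-bit Windows (32 on 32-bit)


def affinity_mask(cpus, word_bits=WORD_BITS):
    """Pack logical CPU indices into one word-sized bitfield."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < word_bits:
            raise ValueError(f"cpu {cpu} does not fit in a {word_bits}-bit mask")
        mask |= 1 << cpu
    return mask


def split_into_groups(total_cpus, word_bits=WORD_BITS):
    """Model the extra layer: CPU count of each group of at most word_bits."""
    full, rest = divmod(total_cpus, word_bits)
    return [word_bits] * full + ([rest] if rest else [])


def locate(cpu, word_bits=WORD_BITS):
    """Translate a global CPU index into (group, bit inside that group's mask)."""
    return divmod(cpu, word_bits)
```

So a 128-thread part shows up as `split_into_groups(128)`, two groups of 64, and a thread's affinity becomes a (group, mask) pair rather than a single mask.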
# ? Jul 21, 2021 20:24 |
|
I'm looking forward to Part 2 of Windows Internals 7th Edition being at least 30% errata for changes in the NT kernel since Part 1 came out... four years ago
|
# ? Jul 21, 2021 20:32 |
|
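Paul MaudDib posted:There were also some instances of games flipping a poo poo with Threadripper because they just couldn't comprehend the idea of a processor with 32 cores and giving you nonsensical "this game needs at least 2 cores!" messages or whatever, but that wasn't Windows' fault. There are also games like Dirt Rally that still crash if you try to run them on 9+ logical core CPUs, which is probably more likely to be an issue than something not being able to handle 32 cores.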
|
# ? Jul 22, 2021 05:53 |
|
Bofast posted:There are also games like Dirt Rally that still crash if you try to run them on 9+ logical core CPUs, which is probably more likely to be an issue than something not being able to handle 32 cores. Sounds like: unsigned __int8 threadIDs[8]; // There will never be more than 8 cores in a system so we can use a static buffer.
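To illustrate the failure mode being guessed at here (it is only a guess about Dirt Rally, as the follow-up post notes), a minimal sketch with a hypothetical hard-coded limit next to a version sized from the machine. `MAX_CORES`, `assign_fixed` and `assign_dynamic` are invented names.

```python
import os

MAX_CORES = 8  # hypothetical hard-coded limit, as in the guess above


def assign_fixed(core_count):
    """Fixed-size table: breaks as soon as core_count exceeds MAX_CORES."""
    table = [0] * MAX_CORES
    for core in range(core_count):
        # IndexError in Python; in C the same write past a static array
        # is undefined behaviour and may crash or corrupt memory.
        table[core] = core
    return table


def assign_dynamic(core_count=None):
    """Size the table from the machine instead of from an assumption."""
    if core_count is None:
        core_count = os.cpu_count() or 1
    return list(range(core_count))
```

`assign_fixed(9)` fails on the ninth core, which matches the "crashes on 9+ logical cores" symptom, while `assign_dynamic` has no upper bound baked in.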
|
# ? Jul 23, 2021 01:21 |
|
Are there any indications that the big cores can run simultaneously with the small cores? I'm guessing no because of dark silicon. But the number of cores still seems relatively small compared to Xeons.
|
# ? Jul 23, 2021 08:12 |
|
Beef posted:Are there any indications that the big cores can run simultaneously with the small cores? Is there any indication that they can't? They're going to be scheduled such that the smaller cores work on more background tasks while the big cores are doing the heavy lifting. The multicore scores are all with gracemont + golden cove working together
|
# ? Jul 23, 2021 08:42 |
|
Ika posted:Sounds like: Dirt Rally actually has separate XML config files for different core counts, so I don't know if it just can't find a suitable one for that many threads or what is going on. Worked fine on my older 4C/4T R3 1300X, though.
|
# ? Jul 23, 2021 12:10 |
|
Beef posted:Are there any indications that the big cores can run simultaneously with the small cores? Yeah like Intel's Lakefield parts, both big and small can run simultaneously. That's how it is able to beat a 5950X in that multicore test. It does get... complicated though, depending on the tasks and the instruction sets used. Some of the rumors are that the "small" cores aren't really that small, with performance roughly equal to Skylake cores. The big cores will also have AVX-512 and theoretically bigger boost clocks, but maybe a more apt way to phrase Alder Lake is going to be "bigHUGE".
|
# ? Jul 23, 2021 21:47 |
|
Cygni posted:Yeah like Intel's Lakefield parts, both big and small can run simultaneously. Thats how it is able to beat a 5950X in that multicore test. It does get... complicated though, depending on the tasks and the instruction sets used. supposedly Alder Lake-S (client desktop) will not have AVX-512 support. it seems like you could probably build an application with multiple codepaths, one that is AVX-512 and one that is not, and have the application dynamically executing both of them at runtime based on the appropriate path for the core. Obviously there are potentially some interprocess communication edge cases there, and you would have to have some kind of "affinity" call to tell the OS scheduler that this thread can only be migrated around other AVX-512 cores, but it seems like it broadly should work. like, you don't have to take the overlap between both instruction sets, you can try really hard to keep the big threads on the big cores and if the worst happens then you trap on those instructions and migrate to the big cores. Not something you want to be doing ten times a second obviously but it doesn't have to, like, crash the whole processor. but yeah I am really not impressed with where Intel has gone with AVX-512. Rocket Lake has it... Alder Lake takes it back out? And Zen4 will supposedly have it but that's probably early to mid next year, which will be later than Alder Lake. I guess it's great on... laptops? definitely where I do all my AVX heavy computation /s. It does make sense on the server platform, as long as they can keep the power under control (so it doesn't have weird clockdown behaviors - so far these are fixed in ICL-SP as well as far as I know, but who knows what the future holds with Intel), and there is news Intel is putting together a HEDT/workstation platform based on Sapphire Rapids so I guess it will be available there. 
As much as the extra instructions in AVX are super important to all kinds of tasks (even with a Zen1-style "two cycle" implementation it would be very powerful) the actual implementation and rollout of AVX has never been anything short of a slow-rolling disaster. first the power license-based clockdowns and latency, then the endless client skylake with no support on the consumer platform, and just the Ice Lake/Tiger Lake laptops... like we're seriously like 5 years into avx-512 and you still can't actually buy one on desktop except for a lovely 4 year old HEDT architecture (that always kinda sucked) or for Rocket Lake (which ends up being a completely epic loving fail where it doesn't increase performance or efficiency at all and Zen3 walks all over it even in the most favorable use-cases like x265). like... holy poo poo intel I get it, I'll just buy a Zen4 chip, god drat. Paul MaudDib fucked around with this message at 22:27 on Jul 23, 2021 |
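The "multiple codepaths plus an affinity hint" idea from this post could be sketched roughly like this. It assumes each core's feature flags are known up front (for example the `avx512f` flag as Linux reports it in /proc/cpuinfo), and the function names and the example core layout are hypothetical.

```python
def pick_kernel(core_flags):
    """Choose a code path from the feature flags of the core we run on."""
    return "avx512" if "avx512f" in core_flags else "generic"


def cores_with(feature, flags_by_core):
    """CPU ids that advertise a feature; usable as an affinity set."""
    return {cpu for cpu, flags in flags_by_core.items() if feature in flags}


def plan_threads(flags_by_core):
    """Map each core to the code path a thread pinned there would run."""
    return {cpu: pick_kernel(flags) for cpu, flags in flags_by_core.items()}


# Hypothetical hybrid layout: cores 0-1 are "big" with AVX-512,
# cores 2-3 are "small" without it.
EXAMPLE = {
    0: {"avx2", "avx512f"},
    1: {"avx2", "avx512f"},
    2: {"avx2"},
    3: {"avx2"},
}
```

On Linux the set returned by `cores_with("avx512f", EXAMPLE)` is the kind of thing you would hand to `os.sched_setaffinity(0, cpus)` so the AVX-512 thread never migrates to a core that would trap on those instructions. Whether an OS exposes per-core feature differences like this at all is exactly the open question in the thread.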
# ? Jul 23, 2021 22:07 |
|
Do we know if Alder Lake-S has
|
# ? Jul 23, 2021 23:30 |
|
I somehow missed this because I thought after adding it on Rocket Lake, there is no way Intel would back out. Lol. Incredible.
|
# ? Jul 23, 2021 23:47 |
|
Cygni posted:I somehow missed this because I thought after adding it on Rocket Lake, there is no way Intel would back out. Lol. Incredible. Probably a dumb question/thought, but could this be partly the consequence of how they have different teams in different areas, working on the successive architectures?
|
# ? Jul 24, 2021 00:10 |
|
SourKraut posted:Probably a dumb question/thought, but could this be partly the consequence of how they have different teams in different areas, working on the successive architectures? There's no way the teams could be so disconnected that one didn't know the other was putting in AVX-512 and they needed to, too, or whatever.
|
# ? Jul 24, 2021 00:59 |
|
Alder Lake-S Desktop & Alder Lake-P Mobile vPro in Q1 2022 Am I reading this correctly or is there more to the story? I was under the impression I'd be able to build myself a fancy new PC this Fall but now we're waiting until next year?
|
# ? Jul 24, 2021 01:14 |
|
As the article says that's the roadmap for vPro (business) parts, it doesn't contradict the consumer versions launching at the end of this year
|
# ? Jul 24, 2021 01:21 |
|
it's quite possible they saw how much of a power hog avx-512 was in rocket lake and decided not to repeat that mistake with the next gen
|
# ? Jul 24, 2021 08:21 |
|
I don't think that idea washes since the new chip is supposed to have a 228W+ power mode which is what is enabled to get those (quite good BTW) bench numbers at something like 5GHz+ clocks. They're quite clearly willing to blow out the power budget to get the performance they need so cutting AVX512 to save on power doesn't make sense.
|
# ? Jul 24, 2021 12:56 |
|
Perhaps something more like trying to keep the instructions available on each core type the same.
|
# ? Jul 24, 2021 17:55 |
ConanTheLibrarian posted:Perhaps something more like trying to keep the instructions available on each core type the same.
|
|
# ? Jul 24, 2021 17:59 |
|
Further separating avx 512 for hedt users only
|
# ? Jul 24, 2021 20:17 |
|
Paul MaudDib posted:supposedly Alder Lake-S (client desktop) will not have AVX-512 support. I believe it was the same for 8087 FP/SSE/MMX/3DNOW! but that was so long ago and I've purged that disastrous era from my working memory. The same functionality can be used to lock an avx-using task to the big/huge cores.
|
# ? Jul 25, 2021 01:39 |
|
PC LOAD LETTER posted:I don't think that idea washes since the new chip is supposed to have a 228W+ power mode which is what is enabled to get those (quite good BTW) bench numbers at something like 5Ghz+ clocks. Can't wait for Dell to sell gaming PCs with a tiny aluminum heatsink on those 228W processors.
|
# ? Jul 25, 2021 08:19 |
|
Yeah the GN Alienware and G5 reviews were absurd.
|
# ? Jul 25, 2021 22:18 |
|
Bunch of tech and biz news today from intc https://www.businesswire.com/news/home/20210726005136/en/
Highlights:
Node naming is decoupled from nanometers now. 10nm is now "Intel 7", 7nm is now "Intel 4", which is familiar to me as I've been called a "Midwest 8"
"RibbonFET" transistors, or transistors gated on 4 sides (like the 3 sided FinFET transistors, but 4!)
PowerVia tech, routing all power from the backside of the wafer
Qualcomm named as a foundry customer for the "Intel 20A" process node, and AWS as a foundry customer for packaging.
|
# ? Jul 27, 2021 00:04 |
|
what's the a supposed to stand for? angstroms?
|
# ? Jul 27, 2021 00:14 |
|
canyoneer posted:Bunch of tech and biz news today from intc Jesus Christ, Intel...
|
# ? Jul 27, 2021 03:59 |
|
honestly it is a good idea to ditch the now meaningless “nm” naming. they shoulda gone all out and started giving the nodes totally number-free names. our new process: Gary. followed by Phillip.
|
# ? Jul 27, 2021 04:13 |
|
They are doing this so next time they are stuck at a node size for 7 years they can just keep renaming it.
|
# ? Jul 27, 2021 04:28 |
|
Perplx posted:They are doing this so next time they are stuck at a node size for 7 years they can just keep renaming it. yes but it's intel, so in three generations the node will be "intel 3.8 ++++++++++++++ xxxtreme "do the dew" edition"
|
# ? Jul 27, 2021 05:10 |
|
Shipon posted:what's the a supposed to stand for? angstroms? Yeah, except it's the former "Intel 4nm", so it should really be 40Å, since 1nm == 10Å.
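The unit arithmetic in this post, written out as a tiny sketch. The function names are just for illustration, and it only covers the nm/Å conversion, not what any node "really" measures.

```python
ANGSTROM_PER_NM = 10  # 1 nm == 10 Å


def nm_to_angstrom(nm):
    """Convert nanometers to angstroms."""
    return nm * ANGSTROM_PER_NM


def angstrom_to_nm(angstrom):
    """Convert angstroms to nanometers."""
    return angstrom / ANGSTROM_PER_NM
```

So `nm_to_angstrom(4)` gives the 40 in "40Å" above, and read literally the "20A" label would be `angstrom_to_nm(20)`, i.e. 2 nm.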
|
# ? Jul 27, 2021 05:27 |
|
How long until we get the nanoangstroms line from I Have No Mouth And I Must Scream
|
# ? Jul 27, 2021 06:30 |
|
canyoneer posted:Bunch of tech and biz news today from intc
|
# ? Jul 27, 2021 07:14 |
|
The changes are cool and good because even though they're still arbitrary they make it easier to compare to other fabs. (This is with the updated names) You see enough people taking the numbers literally and thinking TSMC 7nm is a generation ahead of Intel 10nm ESF when it's mostly on par.
|
# ? Jul 27, 2021 09:24 |