Intel: lol

Cygni: Nov 12, 2005; raring to post

The small cores seem pretty performant if 8C+8c with 24T can beat a true 16C/32T... unless those new big cores are a way bigger jump than people are expecting.

# ? Jul 21, 2021 17:58

DrDork: Dec 29, 2003; commanding officer of the Army of Dorkness

Cygni posted:

The small cores seem pretty performant if 8C+8c with 24T can beat a true 16C/32T... unless those new big cores are a way bigger jump than people are expecting.

It's possible it's both: big jump on big core but so power hungry that they can't fit more than a small number without burning things down when it's running full out, so they shuffle in some decent little cores to help out at a lower thermal load instead of just playing the "downclock all the cores" game that they've played in the past.

As noted, a couple of super-fast single threads is often better for non-server workloads than slower higher-thread count solutions, so this might be the way to try to get the best of both worlds within the thermal envelope they're very obviously struggling to work within.

But either way I guess we'll have to wait for better info and specs before we really know what the story is, and the Windows scheduler can still make the whole experiment pointless if it doesn't know what it's doing.

# ? Jul 21, 2021 19:26

Paul MaudDib: May 3, 2006; TEAM NVIDIA:
FORUM POLICE

mdxi posted:

On the software side, it's all been a non-issue for years on Linux -- driven by the use cases of phones and Chromebooks. But after watching the Windows scheduler cripple perfectly ordinary x86 CPUs because what do you mean a computer can have more than 4 cores, AMD?, I have no doubt Microsoft will balls-up this rather more complex issue in some hilarious way.

Ryzen's windows scheduler problems were more to do with Windows not really understanding the concept of a CCX (seeing as Ryzen was one of the first (probably the first?) non-uniform cache architectures Windows had to deal with). AFAIK windows has always worked fine with a 6950X 10-core processor or whatever (although the Windows scheduler is generally acknowledged as not being as good as Linux of course).

If they had done something like their Zen3 architecture from the start, where a CCX was 8 cores, it would have solved the problem for all consumer chips (as these were all 8-core or less for Zen1/Zen+) - at that point it would have been a "normal" uniform architecture like a 5960X, from the perspective of the scheduler. Having a CCX within a larger CCD was a dumb architectural choice, then AMD didn't do the legwork to let Microsoft know it was weird, and insisted even after launch there wasn't a problem. It was all just tremendously avoidable and a lot of the blame falls on AMD there.

There were also some instances of games flipping a poo poo with Threadripper because they just couldn't comprehend the idea of a processor with 32 cores and giving you nonsensical "this game needs at least 2 cores!" messages or whatever, but that wasn't Windows' fault.

As far as Alder Lake, Intel has been working with Microsoft and the scheduler can classify workloads/application states and assign them appropriately. That was part of the point of getting Lakefield out even if it didn't exactly set the world on fire as a product in itself.

Paul MaudDib fucked around with this message at 20:13 on Jul 21, 2021

# ? Jul 21, 2021 19:59

Kazinsal: Dec 13, 2011

The difficult part of the Windows scheduler is that it has two scheduler modes, one of which prioritizes active foreground work while the other prioritizes heavily interrupt-driven (and often I/O-waiting) processes like server tasks. That's one of the main differences since 2015 or so between client Windows and server Windows -- which scheduler is enabled by default. The "client" scheduler just assumes you have one big brick of CPU/cache/memory and throws poo poo at the wall when it needs to schedule a thread for a few quantums. The "server" scheduler algorithm is more NUMA-aware, but it was also designed under the assumption that NUMA domains are generally multi-socket machines, not multi-CCX machines, which as far as I know don't show up as separate sockets even in NUMA mode.

# ? Jul 21, 2021 20:12

Paul MaudDib: May 3, 2006; TEAM NVIDIA:
FORUM POLICE

Kazinsal posted:

The difficult part of the Windows scheduler is that it has two scheduler modes, one of which prioritizes active foreground work while the other prioritizes heavily interrupt-driven (and often I/O-waiting) processes like server tasks. That's one of the main differences since 2015 or so between client Windows and server Windows -- which scheduler is enabled by default. The "client" scheduler just assumes you have one big brick of CPU/cache/memory and throws poo poo at the wall when it needs to schedule a thread for a few quantums. The "server" scheduler algorithm is more NUMA-aware, but it was also designed under the assumption that NUMA domains are generally multi-socket machines, not multi-CCX machines, which as far as I know don't show up as separate sockets even in NUMA mode.

yeah I think the "easy" answer for scheduling onto CCXs would have been to present them as separate sockets, but on top of the scheduler not being designed to do that, it also triggers a whole bunch of problems with licensing (what's this, a 2-socket machine!? that will be one billionty dollars if you want to use our software).

windows 10 home edition will only use a single socket, pro is limited to 2 sockets.

# ? Jul 21, 2021 20:16

Potato Salad: Oct 23, 2014; nobody cares

ms will almost certainly be working on big/little and ccx/die awareness

# ? Jul 21, 2021 20:17

Paul MaudDib: May 3, 2006; TEAM NVIDIA:
FORUM POLICE

on top of lakefield they also probably have had to address it for Windows on ARM, I'd think

# ? Jul 21, 2021 20:18

repiv: Aug 13, 2009

Paul MaudDib posted:

There were also some instances of games flipping a poo poo with Threadripper because they just couldn't comprehend the idea of a processor with 32 cores or whatever, but that wasn't Windows' fault.

even windows is brushing up against legacy limits with threadripper, the scheduling guts use a pointer-sized bitfield to represent thread affinity so the limit was 32 logical cores originally and 64 after the 64bit transition

to support chips with 128 or more threads they had to duct tape on an extra level of abstraction that lets the kernel pretend a 128 thread cpu is two 64 thread cpus, or a 256 thread cpu is four 64 thread cpus, etc
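A rough sketch of that grouping idea, assuming a flat logical CPU index gets split into 64-wide groups. The names (`CpuLocation`, `locate_cpu`, `groups_needed`) are made up for illustration and are not Windows API types, and real group assignment can also depend on things like NUMA layout, so treat this as arithmetic only:

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative only: these names are invented for the sketch.
constexpr std::size_t kCpusPerGroup = 64;  // bits in a 64-bit affinity mask

struct CpuLocation {
    std::size_t group;   // which 64-wide group the logical CPU falls in
    std::uint64_t mask;  // single-bit affinity mask within that group
};

// Map a flat logical CPU index onto (group, bit within the group's mask).
constexpr CpuLocation locate_cpu(std::size_t logical_index) {
    return CpuLocation{logical_index / kCpusPerGroup,
                       std::uint64_t{1} << (logical_index % kCpusPerGroup)};
}

// Number of 64-wide groups needed to cover a given logical CPU count.
constexpr std::size_t groups_needed(std::size_t logical_cpus) {
    return (logical_cpus + kCpusPerGroup - 1) / kCpusPerGroup;
}
```

With this, `groups_needed(128)` comes out as two groups and `locate_cpu(64)` lands on bit 0 of group 1, which is the "pretend it's two 64 thread cpus" trick in miniature.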

repiv fucked around with this message at 20:31 on Jul 21, 2021

# ? Jul 21, 2021 20:24

Kazinsal: Dec 13, 2011

I'm looking forward to Part 2 of Windows Internals 7th Edition being at least 30% errata for changes in the NT kernel since Part 1 came out... four years ago

# ? Jul 21, 2021 20:32

Bofast: Feb 21, 2011; Grimey Drawer

Paul MaudDib posted:

There were also some instances of games flipping a poo poo with Threadripper because they just couldn't comprehend the idea of a processor with 32 cores and giving you nonsensical "this game needs at least 2 cores!" messages or whatever, but that wasn't Windows' fault.

There are also games like Dirt Rally that still crash if you try to run them on 9+ logical core CPUs, which is probably more likely to be an issue than something not being able to handle 32 cores.

# ? Jul 22, 2021 05:53

Ika: Dec 30, 2004; Pure insanity

Bofast posted:

There are also games like Dirt Rally that still crash if you try to run them on 9+ logical core CPUs, which is probably more likely to be an issue than something not being able to handle 32 cores.

Sounds like:

unsigned __int8 threadIDs[8]; // There will never be more than 8 cores in a system so we can use a static buffer.
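For contrast, a minimal sketch of what sizing that buffer at runtime might look like. This is a guess at the general failure mode, not Dirt Rally's actual code, and `make_thread_ids` and `detected_core_count` are hypothetical helpers:

```cpp
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: one ID slot per logical core, sized at runtime
// instead of assuming a fixed maximum of 8.
std::vector<std::uint8_t> make_thread_ids(std::size_t logical_cores) {
    std::vector<std::uint8_t> ids(logical_cores);
    for (std::size_t i = 0; i < ids.size(); ++i) {
        // uint8_t still caps distinct IDs at 255; widen the type for bigger machines.
        ids[i] = static_cast<std::uint8_t>(i);
    }
    return ids;
}

// hardware_concurrency() may return 0 when the count is unknown,
// so fall back to a single core in that case.
std::size_t detected_core_count() {
    const unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1 : n;
}
```

Something like `make_thread_ids(detected_core_count())` then gets 16 slots on a 16-thread CPU rather than writing past the end of an 8-entry array.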

# ? Jul 23, 2021 01:21

Beef: Jul 26, 2004

Are there any indications that the big cores can run simultaneously with the small cores?

I'm guessing no because of dark silicon. But the number of cores still seems relatively small compared to Xeons.

# ? Jul 23, 2021 08:12

BurritoJustice: Oct 9, 2012

Beef posted:

Are there any indications that the big cores can run simultaneously with the small cores?

I'm guessing no because of dark silicon. But the number of cores still seems relatively small compared to Xeons.

Is there any indication that they can't? They're going to be scheduled such that the smaller cores work on more background tasks while the big cores are doing the heavy lifting. The multicore scores are all with gracemont + golden cove working together

# ? Jul 23, 2021 08:42

Bofast: Feb 21, 2011; Grimey Drawer

Ika posted:

Sounds like:

unsigned __int8 threadIDs[8]; // There will never be more than 8 cores in a system so we can use a static buffer.

Quite possibly. A bit like the settings autodetect in GTA IV panicking if there's more than 2 GB of VRAM and hardlocking the game to 640x480 minimum settings unless you start it with some extra parameter to disable the check.
Dirt Rally actually has separate XML config files for different core counts, so I don't know if it just can't find a suitable one for that many threads or what is going on. Worked fine on my older 4C/4T R3 1300X, though.

# ? Jul 23, 2021 12:10

Cygni: Nov 12, 2005; raring to post

Beef posted:

Are there any indications that the big cores can run simultaneously with the small cores?

I'm guessing no because of dark silicon. But the number of cores still seems relatively small compared to Xeons.

Yeah like Intel's Lakefield parts, both big and small can run simultaneously. That's how it is able to beat a 5950X in that multicore test. It does get... complicated though, depending on the tasks and the instruction sets used.

Some of the rumors are that the "small" cores aren't really that small, with performance roughly equal to Skylake cores. The big cores will also have AVX-512 and theoretically bigger boost clocks, but maybe a more apt way to phrase Alder Lake is going to be "bigHUGE".

# ? Jul 23, 2021 21:47

Paul MaudDib: May 3, 2006; TEAM NVIDIA:
FORUM POLICE

Cygni posted:

Yeah like Intel's Lakefield parts, both big and small can run simultaneously. That's how it is able to beat a 5950X in that multicore test. It does get... complicated though, depending on the tasks and the instruction sets used.

Some of the rumors are that the "small" cores aren't really that small, with performance roughly equal to Skylake cores. The big cores will also have AVX-512 and theoretically bigger boost clocks, but maybe a more apt way to phrase Alder Lake is going to be "bigHUGE".

supposedly Alder Lake-S (client desktop) will not have AVX-512 support.

it seems like you could probably build an application with multiple codepaths, one that is AVX-512 and one that is not, and have the application dynamically executing both of them at runtime based on the appropriate path for the core. Obviously there are potentially some interprocess communication edge cases there, and you would have to have some kind of "affinity" call to tell the OS scheduler that this thread can only be migrated around other AVX-512 cores, but it seems like it broadly should work.

like, you don't have to take the overlap between both instruction sets, you can try really hard to keep the big threads on the big cores and if the worst happens then you trap on those instructions and migrate to the big cores. Not something you want to be doing ten times a second obviously but it doesn't have to, like, crash the whole processor.
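A minimal sketch of the two-codepath idea, assuming the program already knows whether the core it is on has AVX-512. `CoreInfo`, `select_sum`, and both kernels are hypothetical, and the "wide" path is plain C++ standing in for a real AVX-512 build so the sketch runs anywhere:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical core description; a real program would get this from
// the OS or CPUID, and would need the thread pinned while it is used.
struct CoreInfo {
    bool has_avx512;
};

using SumFn = double (*)(const std::vector<double>&);

// Stand-in for a kernel compiled with AVX-512 (plain C++ here).
double sum_wide(const std::vector<double>& v) {
    double total = 0.0;
    for (double x : v) total += x;
    return total;
}

// Stand-in for the baseline kernel that every core can run.
double sum_baseline(const std::vector<double>& v) {
    double total = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) total += v[i];
    return total;
}

// Pick the code path that matches the core the thread is running on.
SumFn select_sum(const CoreInfo& core) {
    return core.has_avx512 ? sum_wide : sum_baseline;
}
```

The choice is only valid while the thread stays on that kind of core, which is where the affinity call (or the trap-and-migrate fallback) comes in.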

but yeah I am really not impressed with where Intel has gone with AVX-512. Rocket Lake has it... Alder Lake takes it back out? And Zen4 will supposedly have it but that's probably early to mid next year, which will be later than Alder Lake.

I guess it's great on... laptops? definitely where I do all my AVX heavy computation /s. It does make sense on the server platform, as long as they can keep the power under control (so it doesn't have weird clockdown behaviors - so far these are fixed in ICL-SP as well as far as I know, but who knows what the future holds with Intel), and there is news Intel is putting together a HEDT/workstation platform based on Sapphire Rapids so I guess it will be available there.

As much as the extra instructions in AVX are super important to all kinds of tasks (even with a Zen1-style "two cycle" implementation it would be very powerful) the actual implementation and rollout of AVX has never been anything short of a slow-rolling disaster. first the power license-based clockdowns and latency, then the endless client skylake with no support on the consumer platform, and just the Ice Lake/Tiger Lake laptops... like we're seriously like 5 years into avx-512 and you still can't actually buy one on desktop except for a lovely 4 year old HEDT architecture (that always kinda sucked) or for Rocket Lake (which ends up being a completely epic loving fail where it doesn't increase performance or efficiency at all and Zen3 walks all over it even in the most favorable use-cases like x265).

like... holy poo poo intel I get it, I'll just buy a Zen4 chip, god drat.

Paul MaudDib fucked around with this message at 22:27 on Jul 23, 2021

# ? Jul 23, 2021 22:07

Sidesaddle Cavalry: Mar 15, 2013; Oh Boy Desert Map

Do we know if Alder Lake-S has ~~been liberated~~ been given the privilege of running with ECC memory yet? (Actually I don't know if ECC DDR5 even exists)

# ? Jul 23, 2021 23:30

Cygni: Nov 12, 2005; raring to post

Paul MaudDib posted:

supposedly Alder Lake-S (client desktop) will not have AVX-512 support.

I somehow missed this because I thought after adding it on Rocket Lake, there is no way Intel would back out. Lol. Incredible.

# ? Jul 23, 2021 23:47

Canned Sunshine: Nov 20, 2005; CAUTION: POST QUALITY UNDER CONSTRUCTION

Cygni posted:

I somehow missed this because I thought after adding it on Rocket Lake, there is no way Intel would back out. Lol. Incredible.

Probably a dumb question/thought, but could this be partly the consequence of how they have different teams in different areas, working on the successive architectures?

# ? Jul 24, 2021 00:10

DrDork: Dec 29, 2003; commanding officer of the Army of Dorkness

SourKraut posted:

Probably a dumb question/thought, but could this be partly the consequence of how they have different teams in different areas, working on the successive architectures?

There's no way the teams could be so disconnected that one didn't know the other was putting in AVX-512 and they needed to, too, or whatever.

# ? Jul 24, 2021 00:59

Gucci Loafers: May 20, 2006; Ask yourself, do you really want to talk to pair of really nice gaudy shoes?

Alder Lake-S Desktop & Alder Lake-P Mobile vPro in Q1 2022

Am I reading this correctly or is there more to the story? I was under the impression I'd be able to build myself a fancy new PC this Fall but now we're waiting until next year? :smith:

# ? Jul 24, 2021 01:14

repiv: Aug 13, 2009

As the article says that's the roadmap for vPro (business) parts, it doesn't contradict the consumer versions launching at the end of this year

# ? Jul 24, 2021 01:21

Shipon: Nov 7, 2005

it's quite possible they saw how much of a power hog avx-512 was in rocket lake and decided not to repeat that mistake with the next gen

# ? Jul 24, 2021 08:21

PC LOAD LETTER: May 23, 2005; WTF?!

I don't think that idea washes since the new chip is supposed to have a 228W+ power mode which is what is enabled to get those (quite good BTW) bench numbers at something like 5GHz+ clocks.

They're quite clearly willing to blow out the power budget to get the performance they need so cutting AVX512 to save on power doesn't make sense.

# ? Jul 24, 2021 12:56

ConanTheLibrarian: Aug 13, 2004; dis buch is late; Fallen Rib

Perhaps something more like trying to keep the instructions available on each core type the same.

# ? Jul 24, 2021 17:55

BlankSystemDaemon: Mar 13, 2009

ConanTheLibrarian posted:

Perhaps something more like trying to keep the instructions available on each core type the same.

Because that worked so well for the ARM licensees who did it.

# ? Jul 24, 2021 17:59

Wild EEPROM: Jul 29, 2011; oh, my, god. Becky, look at her bitrate.

Further separating avx 512 for hedt users only

# ? Jul 24, 2021 20:17

Harik: Sep 9, 2001; From the hard streets of Moscow
First dog to touch the stars; Plaster Town Cop

Paul MaudDib posted:

supposedly Alder Lake-S (client desktop) will not have AVX-512 support.

it seems like you could probably build an application with multiple codepaths, one that is AVX-512 and one that is not, and have the application dynamically executing both of them at runtime based on the appropriate path for the core. Obviously there are potentially some interprocess communication edge cases there, and you would have to have some kind of "affinity" call to tell the OS scheduler that this thread can only be migrated around other AVX-512 cores, but it seems like it broadly should work.

For AVX on AMD64 you trap to the OS the first time you touch an AVX register, to inform the OS that it now has to preserve all that extra state. Linux makes use of this to lazy-save/restore the AVX state on context switches. This is particularly useful if an AVX-using thread gets preempted to handle a system task that doesn't use AVX - rather than causing 4 loads/saves (kernel entry, kernel exit to system task, kernel re-entry, kernel exit to AVX task) it causes zero - the AVX register state is left unchanged the entire time, and when the avx-using thread touches them again the trap just returns as there's no work to do.

I believe it was the same for 8087 FP/SSE/MMX/3DNOW! but that was so long ago and I've purged that disastrous era from my working memory.

The same functionality can be used to lock an avx-using task to the big/huge cores.

# ? Jul 25, 2021 01:39

FuturePastNow: May 19, 2014

PC LOAD LETTER posted:

I don't think that idea washes since the new chip is supposed to have a 228W+ power mode which is what is enabled to get those (quite good BTW) bench numbers at something like 5GHz+ clocks.

They're quite clearly willing to blow out the power budget to get the performance they need so cutting AVX512 to save on power doesn't make sense.

Can't wait for Dell to sell gaming PCs with a tiny aluminum heatsink on those 228W processors.

# ? Jul 25, 2021 08:19

hobbesmaster: Jan 28, 2008

Yeah the gn Alienware and g5 reviews were absurd.

# ? Jul 25, 2021 22:18

canyoneer: Sep 13, 2005; I only have canyoneyes for you

Bunch of tech and biz news today from intc
https://www.businesswire.com/news/home/20210726005136/en/

Highlights:
Node naming is decoupled from nanometers now. 10nm is now "Intel 7", 7nm is now "Intel 4", which is familiar to me as I've been called a "Midwest 8"
"RibbonFET" transistors, or transistors gated on 4 sides (like the 3 sided FinFET transistors, but 4!)
PowerVia tech, routing all power from the backside of the wafer
Qualcomm named as a foundry customer for the "Intel 20A" process node, and AWS as a foundry customer for packaging.

# ? Jul 27, 2021 00:04

Shipon: Nov 7, 2005

what's the a supposed to stand for? angstroms?

# ? Jul 27, 2021 00:14

Canned Sunshine: Nov 20, 2005; CAUTION: POST QUALITY UNDER CONSTRUCTION

canyoneer posted:

Bunch of tech and biz news today from intc
https://www.businesswire.com/news/home/20210726005136/en/

Highlights:
Node naming is decoupled from nanometers now. 10nm is now "Intel 7", 7nm is now "Intel 4", which is familiar to me as I've been called a "Midwest 8"
"RibbonFET" transistors, or transistors gated on 4 sides (like the 3 sided FinFET transistors, but 4!)
PowerVia tech, routing all power from the backside of the wafer
Qualcomm named as a foundry customer for the "Intel 20A" process node, and AWS as a foundry customer for packaging.

Jesus Christ, Intel...

# ? Jul 27, 2021 03:59

Cygni: Nov 12, 2005; raring to post

honestly it is a good idea to ditch the now meaningless "nm" naming. they shoulda gone all out and started giving the nodes totally number-free names.

our new process: Gary. followed by Phillip.

# ? Jul 27, 2021 04:13

Perplx: Jun 26, 2004; Best viewed on Orgasma Plasma; Lipstick Apathy

They are doing this so next time they are stuck at a node size for 7 years they can just keep renaming it.

# ? Jul 27, 2021 04:28

Dr. Fishopolis: Aug 31, 2004; ROBOT

Perplx posted:

They are doing this so next time they are stuck at a node size for 7 years they can just keep renaming it.

yes but it's intel, so in three generations the node will be "intel 3.8 ++++++++++++++ xxxtreme "do the dew" edition"

# ? Jul 27, 2021 05:10

Kazinsal: Dec 13, 2011

Shipon posted:

what's the a supposed to stand for? angstroms?

Yeah, except it's the former "Intel 4nm", so it should really be 40Å, since 1nm == 10Å.

# ? Jul 27, 2021 05:27

gradenko_2000: Oct 5, 2010; HELL SERPENT; Lipstick Apathy

How long until we get the nanoangstroms line from I Have No Mouth And I Must Scream

# ? Jul 27, 2021 06:30

trilobite terror: Oct 20, 2007; BUT MY LIVELIHOOD DEPENDS ON THE FORUMS!

canyoneer posted:

Bunch of tech and biz news today from intc
https://www.businesswire.com/news/home/20210726005136/en/

Highlights:
Node naming is decoupled from nanometers now. 10nm is now "Intel 7", 7nm is now "Intel 4", which is familiar to me as I've been called a "Midwest 8"

# ? Jul 27, 2021 07:14

BurritoJustice: Oct 9, 2012

The changes are cool and good because even though they're still arbitrary they make it easier to compare to other fabs.

(This is with the updated names)

You see enough people taking the numbers literally and thinking TSMC 7nm is a generation ahead of Intel 10nm ESF when it's mostly on par.

# ? Jul 27, 2021 09:24
