Cygni
Nov 12, 2005

raring to post

The small cores seem pretty performant if 8C+8c with 24T can beat a true 16C/32T… unless those new big cores are a way bigger jump than people are expecting.

DrDork
Dec 29, 2003
commanding officer of the Army of Dorkness

Cygni posted:

The small cores seem pretty performant if 8C+8c with 24T can beat a true 16C/32T… unless those new big cores are a way bigger jump than people are expecting.

It's possible it's both: big jump on big core but so power hungry that they can't fit more than a small number without burning things down when it's running full out, so they shuffle in some decent little cores to help out at a lower thermal load instead of just playing the "downclock all the cores" game that they've played in the past.

As noted, a couple of super-fast single threads is often better for non-server workloads than slower higher-thread count solutions, so this might be the way to try to get the best of both worlds within the thermal envelope they're very obviously struggling to work within.

But either way I guess we'll have to wait for better info and specs before we really know what the story is, and the Windows scheduler can still make the whole experiment pointless if it doesn't know what it's doing.

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE

mdxi posted:

On the software side, it's all been a non-issue for years on Linux -- driven by the use cases of phones and Chromebooks. But after watching the Windows scheduler cripple perfectly ordinary x86 CPUs because what do you mean a computer can have more than 4 cores, AMD?, I have no doubt Microsoft will balls-up this rather more complex issue in some hilarious way.

Ryzen's windows scheduler problems were more to do with Windows not really understanding the concept of a CCX (seeing as Ryzen was one of the first (probably the first?) non-uniform cache architectures Windows had to deal with). AFAIK windows has always worked fine with a 6950X 10-core processor or whatever (although the Windows scheduler is generally acknowledged as not being as good as Linux of course).

If they had done something like their Zen3 architecture from the start, where a CCX was 8 cores, it would have solved the problem for all consumer chips (as these were all 8-core or less for Zen1/Zen+) - at that point it would have been a "normal" uniform architecture like a 5960X, from the perspective of the scheduler. Having a CCX within a larger CCD was a dumb architectural choice, then AMD didn't do the legwork to let Microsoft know it was weird, and insisted even after launch there wasn't a problem. It was all just tremendously avoidable and a lot of the blame falls on AMD there.

There were also some instances of games flipping a poo poo with Threadripper because they just couldn't comprehend the idea of a processor with 32 cores and giving you nonsensical "this game needs at least 2 cores!" messages or whatever, but that wasn't Windows' fault.

As far as Alder Lake, Intel has been working with Microsoft and the scheduler can classify workloads/application states and assign them appropriately. That was part of the point of getting Lakefield out even if it didn't exactly set the world on fire as a product in itself.

Paul MaudDib fucked around with this message at 20:13 on Jul 21, 2021

Kazinsal
Dec 13, 2011


The difficult part of the Windows scheduler is that it has two scheduler modes, one of which prioritizes active foreground work while the other prioritizes heavily interrupt-driven (and often I/O-waiting) processes like server tasks. That's one of the main differences since 2015 or so between client Windows and server Windows -- which scheduler is enabled by default. The "client" scheduler just assumes you have one big brick of CPU/cache/memory and throws poo poo at the wall when it needs to schedule a thread for a few quantums. The "server" scheduler algorithm is more NUMA-aware, but it was also designed under the assumption that NUMA domains are generally multi-socket machines, not multi-CCX machines, which as far as I know don't show up as separate sockets even in NUMA mode.

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE

Kazinsal posted:

The difficult part of the Windows scheduler is that it has two scheduler modes, one of which prioritizes active foreground work while the other prioritizes heavily interrupt-driven (and often I/O-waiting) processes like server tasks. That's one of the main differences since 2015 or so between client Windows and server Windows -- which scheduler is enabled by default. The "client" scheduler just assumes you have one big brick of CPU/cache/memory and throws poo poo at the wall when it needs to schedule a thread for a few quantums. The "server" scheduler algorithm is more NUMA-aware, but it was also designed under the assumption that NUMA domains are generally multi-socket machines, not multi-CCX machines, which as far as I know don't show up as separate sockets even in NUMA mode.

yeah I think the "easy" answer for scheduling onto CCXs would have been to present them as separate sockets, but on top of the scheduler not being designed to do that, it also triggers a whole bunch of problems with licensing (what's this, a 2-socket machine!? that will be one billionty dollars if you want to use our software).

windows 10 home edition won't even run on a multi-socket machine, pro is limited to 2 sockets.

Potato Salad
Oct 23, 2014

nobody cares


ms will almost certainly be working on big/little and ccx/die awareness

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE
on top of lakefield they also probably have had to address it for Windows on ARM, I'd think

repiv
Aug 13, 2009

Paul MaudDib posted:

There were also some instances of games flipping a poo poo with Threadripper because they just couldn't comprehend the idea of a processor with 32 cores or whatever, but that wasn't Windows' fault.

even windows is brushing up against legacy limits with threadripper, the scheduling guts use a pointer-sized bitfield to represent thread affinity so the limit was 32 logical cores originally and 64 after the 64bit transition

to support chips with 128 or more threads they had to duct tape on an extra level of abstraction that lets the kernel pretend a 128 thread cpu is two 64 thread cpus, or a 256 thread cpu is four 64 thread cpus, etc
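
The arithmetic behind that can be sketched roughly as follows. This is a toy model, not the actual NT data structures: the 64-wide group size comes from the post above, while `GROUP_WIDTH`, `locate_cpu`, and `group_count` are made-up names for illustration, and real systems may not fill every group evenly.

```python
GROUP_WIDTH = 64  # bits in a pointer-sized affinity mask on a 64-bit OS


def locate_cpu(logical_cpu):
    """Map a flat logical CPU index to (group number, one-bit mask inside that group)."""
    if logical_cpu < 0:
        raise ValueError("logical CPU index must be non-negative")
    group, bit = divmod(logical_cpu, GROUP_WIDTH)
    return group, 1 << bit


def group_count(total_threads):
    """Number of 64-wide groups needed to cover total_threads (ceiling division)."""
    return -(-total_threads // GROUP_WIDTH)
```

Under this model a 128-thread part needs two groups and a 256-thread part needs four, and logical CPU 64 becomes bit 0 of group 1 rather than an impossible bit 64 of a single mask.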

repiv fucked around with this message at 20:31 on Jul 21, 2021

Kazinsal
Dec 13, 2011


I'm looking forward to Part 2 of Windows Internals 7th Edition being at least 30% errata for changes in the NT kernel since Part 1 came out... four years ago

Bofast
Feb 21, 2011

Grimey Drawer

Paul MaudDib posted:

There were also some instances of games flipping a poo poo with Threadripper because they just couldn't comprehend the idea of a processor with 32 cores and giving you nonsensical "this game needs at least 2 cores!" messages or whatever, but that wasn't Windows' fault.
There are also games like Dirt Rally that still crash if you try to run them on 9+ logical core CPUs, which is probably more likely to be an issue than something not being able to handle 32 cores. :D

Ika
Dec 30, 2004
Pure insanity

Bofast posted:

There are also games like Dirt Rally that still crash if you try to run them on 9+ logical core CPUs, which is probably more likely to be an issue than something not being able to handle 32 cores. :D

Sounds like:

unsigned __int8 threadIDs[8]; // There will never be more than 8 cores in a system so we can use a static buffer.
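
To spell out the guess, here is a small sketch of that failure mode. It is purely hypothetical and not anything known about Dirt Rally's code: `MAX_SLOTS` and both functions are invented for illustration. Python raises an error at the ninth slot, where the C version would more likely write past the end of the buffer and crash later.

```python
MAX_SLOTS = 8  # hypothetical hard-coded limit, mirroring threadIDs[8]


def assign_thread_ids(logical_cores):
    """Fill a fixed-size table, one slot per logical core (breaks past 8 cores)."""
    slots = [None] * MAX_SLOTS
    for core in range(logical_cores):
        slots[core] = core  # IndexError as soon as core reaches MAX_SLOTS
    return slots


def assign_thread_ids_safe(logical_cores):
    """Size the table from the detected core count instead of a constant."""
    return list(range(logical_cores))
```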

Beef
Jul 26, 2004
Are there any indications that the big cores can run simultaneously with the small cores?

I'm guessing no because of dark silicon. But the number of cores still seems relatively small compared to Xeons.

BurritoJustice
Oct 9, 2012

Beef posted:

Are there any indications that the big cores can run simultaneously with the small cores?

I'm guessing no because of dark silicon. But the number of cores still seems relatively small compared to Xeons.

Is there any indication that they can't? They're going to be scheduled such that the smaller cores work on more background tasks while the big cores are doing the heavy lifting. The multicore scores are all with gracemont + golden cove working together

Bofast
Feb 21, 2011

Grimey Drawer

Ika posted:

Sounds like:

unsigned __int8 threadIDs[8]; // There will never be more than 8 cores in a system so we can use a static buffer.
Quite possibly. A bit like the settings autodetect in GTA IV panicking if there's more than 2 GB of VRAM and hardlocking the game to 640x480 minimum settings unless you start it with some extra parameter to disable the check.
Dirt Rally actually has separate XML config files for different core counts, so I don't know if it just can't find a suitable one for that many threads or what is going on. Worked fine on my older 4C/4T R3 1300X, though.

Cygni
Nov 12, 2005

raring to post

Beef posted:

Are there any indications that the big cores can run simultaneously with the small cores?

I'm guessing no because of dark silicon. But the number of cores still seems relatively small compared to Xeons.

Yeah like Intel's Lakefield parts, both big and small can run simultaneously. That's how it is able to beat a 5950X in that multicore test. It does get... complicated though, depending on the tasks and the instruction sets used.

Some of the rumors are that the "small" cores aren't really that small, with performance roughly equal to Skylake cores. The big cores will also have AVX-512 and theoretically bigger boost clocks, but maybe a more apt way to phrase Alder Lake is going to be "bigHUGE".

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE

Cygni posted:

Yeah like Intel's Lakefield parts, both big and small can run simultaneously. That's how it is able to beat a 5950X in that multicore test. It does get... complicated though, depending on the tasks and the instruction sets used.

Some of the rumors are that the "small" cores aren't really that small, with performance roughly equal to Skylake cores. The big cores will also have AVX-512 and theoretically bigger boost clocks, but maybe a more apt way to phrase Alder Lake is going to be "bigHUGE".

supposedly Alder Lake-S (client desktop) will not have AVX-512 support.

it seems like you could probably build an application with multiple codepaths, one that is AVX-512 and one that is not, and have the application dynamically executing both of them at runtime based on the appropriate path for the core. Obviously there are potentially some interprocess communication edge cases there, and you would have to have some kind of "affinity" call to tell the OS scheduler that this thread can only be migrated around other AVX-512 cores, but it seems like it broadly should work.

like, you don't have to take the overlap between both instruction sets, you can try really hard to keep the big threads on the big cores and if the worst happens then you trap on those instructions and migrate to the big cores. Not something you want to be doing ten times a second obviously but it doesn't have to, like, crash the whole processor.
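
a rough sketch of what that dual-codepath idea could look like, with everything invented for illustration: the core sets are made-up numbers rather than a real Alder Lake layout, and the two "kernels" are plain Python stand-ins for an AVX-512 path and a fallback path. On Linux the set returned by `allowed_cores` is the kind of thing you could hand to `os.sched_setaffinity`, but this sketch deliberately never calls it.

```python
# Hypothetical topology: which logical CPUs support the wide instructions.
AVX512_CORES = frozenset({0, 1, 2, 3})
ALL_CORES = frozenset(range(8))


def sum_wide(values):
    """Stand-in for the AVX-512 code path."""
    return sum(values)


def sum_narrow(values):
    """Stand-in for the fallback path that runs on any core."""
    total = 0
    for v in values:
        total += v
    return total


def pick_kernel(current_core):
    """Choose the code path that is legal on the core we are running on."""
    return sum_wide if current_core in AVX512_CORES else sum_narrow


def allowed_cores(needs_avx512):
    """Affinity set a thread would request so it is never migrated to an unsupported core."""
    return AVX512_CORES if needs_avx512 else ALL_CORES
```

the weak spot is the window between checking the core and running the kernel: if the thread gets migrated in between, you're back to needing either the affinity restriction or the trap-and-migrate fallback.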

but yeah I am really not impressed with where Intel has gone with AVX-512. Rocket Lake has it... Alder Lake takes it back out? And Zen4 will supposedly have it but that's probably early to mid next year, which will be later than Alder Lake.

I guess it's great on... laptops? definitely where I do all my AVX heavy computation /s. It does make sense on the server platform, as long as they can keep the power under control (so it doesn't have weird clockdown behaviors - so far these are fixed in ICL-SP as well as far as I know, but who knows what the future holds with Intel), and there is news Intel is putting together a HEDT/workstation platform based on Sapphire Rapids so I guess it will be available there.

As much as the extra instructions in AVX are super important to all kinds of tasks (even with a Zen1-style "two cycle" implementation it would be very powerful) the actual implementation and rollout of AVX has never been anything short of a slow-rolling disaster. first the power license-based clockdowns and latency, then the endless client skylake with no support on the consumer platform, and just the Ice Lake/Tiger Lake laptops... like we're seriously like 5 years into avx-512 and you still can't actually buy one on desktop except for a lovely 4 year old HEDT architecture (that always kinda sucked) or for Rocket Lake (which ends up being a completely epic loving fail where it doesn't increase performance or efficiency at all and Zen3 walks all over it even in the most favorable use-cases like x265).

like... holy poo poo intel I get it, I'll just buy a Zen4 chip, god drat.

Paul MaudDib fucked around with this message at 22:27 on Jul 23, 2021

Sidesaddle Cavalry
Mar 15, 2013

Oh Boy Desert Map
Do we know if Alder Lake-S has been given the privilege of running with ECC memory yet? (Actually I don't know if ECC DDR5 even exists)

Cygni
Nov 12, 2005

raring to post


I somehow missed this because I thought after adding it on Rocket Lake, there is no way Intel would back out. Lol. Incredible.

Canned Sunshine
Nov 20, 2005

CAUTION: POST QUALITY UNDER CONSTRUCTION



Cygni posted:

I somehow missed this because I thought after adding it on Rocket Lake, there is no way Intel would back out. Lol. Incredible.

Probably a dumb question/thought, but could this be partly the consequence of how they have different teams in different areas, working on the successive architectures?

DrDork
Dec 29, 2003
commanding officer of the Army of Dorkness

SourKraut posted:

Probably a dumb question/thought, but could this be partly the consequence of how they have different teams in different areas, working on the successive architectures?

There's no way the teams could be so disconnected that one didn't know the other was putting in AVX-512 and they needed to, too, or whatever.

Gucci Loafers
May 20, 2006

Ask yourself, do you really want to talk to pair of really nice gaudy shoes?


Alder Lake-S Desktop & Alder Lake-P Mobile vPro in Q1 2022

Am I reading this correctly, or is there more to the story? I was under the impression I'd be able to build myself a fancy new PC this fall, but now we're waiting until next year? :smith:

repiv
Aug 13, 2009

As the article says that's the roadmap for vPro (business) parts, it doesn't contradict the consumer versions launching at the end of this year

Shipon
Nov 7, 2005
it's quite possible they saw how much of a power hog avx-512 was in rocket lake and decided not to repeat that mistake with the next gen

PC LOAD LETTER
May 23, 2005
WTF?!
I don't think that idea washes since the new chip is supposed to have a 228W+ power mode which is what is enabled to get those (quite good BTW) bench numbers at something like 5GHz+ clocks.

They're quite clearly willing to blow out the power budget to get the performance they need so cutting AVX512 to save on power doesn't make sense.

ConanTheLibrarian
Aug 13, 2004


dis buch is late
Fallen Rib
Perhaps something more like trying to keep the instructions available on each core type the same.

BlankSystemDaemon
Mar 13, 2009



ConanTheLibrarian posted:

Perhaps something more like trying to keep the instructions available on each core type the same.
Because that worked so well for the ARM licensees who did it.

Wild EEPROM
Jul 29, 2011


oh, my, god. Becky, look at her bitrate.
Further separating avx 512 for hedt users only

Harik
Sep 9, 2001

From the hard streets of Moscow
First dog to touch the stars


Plaster Town Cop

Paul MaudDib posted:

supposedly Alder Lake-S (client desktop) will not have AVX-512 support.

it seems like you could probably build an application with multiple codepaths, one that is AVX-512 and one that is not, and have the application dynamically executing both of them at runtime based on the appropriate path for the core. Obviously there are potentially some interprocess communication edge cases there, and you would have to have some kind of "affinity" call to tell the OS scheduler that this thread can only be migrated around other AVX-512 cores, but it seems like it broadly should work.
For AVX on AMD64 you trap to the OS the first time you touch an AVX register, to inform the OS that it now has to preserve all that extra state. Linux makes use of this to lazy-save/restore the AVX state on context switches. This is particularly useful if an AVX-using thread gets preempted to handle a system task that doesn't use AVX - rather than causing 4 loads/saves (kernel entry, kernel exit to system task, kernel re-entry, kernel exit to AVX task) it causes zero - the AVX register state is left unchanged the entire time, and when the avx-using thread touches them again the trap just returns as there's no work to do.

I believe it was the same for 8087 FP/SSE/MMX/3DNOW! but that was so long ago and I've purged that disastrous era from my working memory.

The same functionality can be used to lock an avx-using task to the big/huge cores.
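
A toy model of the lazy scheme described above, to make the counting concrete. It only simulates the bookkeeping and makes no claim about how any particular kernel version behaves; the class and all of its names are invented for illustration.

```python
class LazyVectorState:
    """Toy model of lazy vector-register save/restore across context switches."""

    def __init__(self):
        self.current = None        # task currently running on this core
        self.owner = None          # task whose vector state is live in the registers
        self.armed = True          # True means the next vector instruction traps
        self.traps = 0
        self.state_swaps = 0       # times register contents were actually swapped
        self.vector_users = set()  # tasks that have been seen using vector registers

    def context_switch(self, task):
        # Leave the registers untouched; just arm the trap for the incoming task.
        self.current = task
        self.armed = True

    def touch_vector_regs(self):
        if not self.armed:
            return                 # no trap: the registers already belong to this task
        self.traps += 1
        self.vector_users.add(self.current)
        if self.owner != self.current:
            self.state_swaps += 1  # save the old owner's state (if any), load ours
            self.owner = self.current
        self.armed = False
```

Switching from an AVX task to a system task that never touches the vector registers and back costs one extra trap but no extra state swap, which is the "zero loads/saves" case from the post. The `vector_users` set is the sort of record a scheduler could consult to keep those tasks on the big cores.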

FuturePastNow
May 19, 2014


PC LOAD LETTER posted:

I don't think that idea washes since the new chip is supposed to have a 228W+ power mode which is what is enabled to get those (quite good BTW) bench numbers at something like 5GHz+ clocks.

They're quite clearly willing to blow out the power budget to get the performance they need so cutting AVX512 to save on power doesn't make sense.

Can't wait for Dell to sell gaming PCs with a tiny aluminum heatsink on those 228W processors.

hobbesmaster
Jan 28, 2008

Yeah the gn Alienware and g5 reviews were absurd.

canyoneer
Sep 13, 2005


I only have canyoneyes for you
Bunch of tech and biz news today from intc
https://www.businesswire.com/news/home/20210726005136/en/

Highlights:
Node naming is decoupled from nanometers now. 10nm is now "Intel 7", 7nm is now "Intel 4", which is familiar to me as I've been called a "Midwest 8"
"RibbonFET" transistors, or transistors gated on 4 sides (like the 3 sided FinFET transistors, but 4!)
PowerVia tech, routing all power from the backside of the wafer
Qualcomm named as a foundry customer for the "Intel 20A" process node, and AWS as a foundry customer for packaging.

Shipon
Nov 7, 2005
what's the a supposed to stand for? angstroms?

Canned Sunshine
Nov 20, 2005

CAUTION: POST QUALITY UNDER CONSTRUCTION



canyoneer posted:

Bunch of tech and biz news today from intc
https://www.businesswire.com/news/home/20210726005136/en/

Highlights:
Node naming is decoupled from nanometers now. 10nm is now "Intel 7", 7nm is now "Intel 4", which is familiar to me as I've been called a "Midwest 8"
"RibbonFET" transistors, or transistors gated on 4 sides (like the 3 sided FinFET transistors, but 4!)
PowerVia tech, routing all power from the backside of the wafer
Qualcomm named as a foundry customer for the "Intel 20A" process node, and AWS as a foundry customer for packaging.

Jesus Christ, Intel...

Cygni
Nov 12, 2005

raring to post

honestly it is a good idea to ditch the now meaningless “nm” naming. they shoulda gone all out and started giving the nodes totally number-free names.

our new process: Gary. followed by Phillip.

Perplx
Jun 26, 2004


Best viewed on Orgasma Plasma
Lipstick Apathy
They are doing this so next time they are stuck at a node size for 7 years they can just keep renaming it.

Dr. Fishopolis
Aug 31, 2004

ROBOT

Perplx posted:

They are doing this so next time they are stuck at a node size for 7 years they can just keep renaming it.

yes but it's intel, so in three generations the node will be "intel 3.8 ++++++++++++++ xxxtreme "do the dew" edition"

Kazinsal
Dec 13, 2011


Shipon posted:

what's the a supposed to stand for? angstroms?

Yeah, except it's the former "Intel 4nm", so it should really be 40Å, since 1nm == 10Å.
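
The unit conversion itself, as a trivial sketch (function names made up; this is only the 1nm == 10Å arithmetic and says nothing about what any node actually measures):

```python
ANGSTROMS_PER_NM = 10  # 1 nm == 10 Å


def nm_to_angstrom(nm):
    """Convert nanometers to angstroms."""
    return nm * ANGSTROMS_PER_NM


def angstrom_to_nm(angstrom):
    """Convert angstroms to nanometers."""
    return angstrom / ANGSTROMS_PER_NM
```

So 4nm works out to 40Å, and a literal reading of "20A" would be 2nm.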

gradenko_2000
Oct 5, 2010

HELL SERPENT
Lipstick Apathy
How long until we get the nanoangstroms line from I Have No Mouth And I Must Scream

trilobite terror
Oct 20, 2007
BUT MY LIVELIHOOD DEPENDS ON THE FORUMS!

canyoneer posted:

Bunch of tech and biz news today from intc
https://www.businesswire.com/news/home/20210726005136/en/

Highlights:
Node naming is decoupled from nanometers now. 10nm is now "Intel 7", 7nm is now "Intel 4", which is familiar to me as I've been called a "Midwest 8"

BurritoJustice
Oct 9, 2012

The changes are cool and good because even though they're still arbitrary they make it easier to compare to other fabs.



(This is with the updated names)

You see enough people taking the numbers literally and thinking TSMC 7nm is a generation ahead of Intel 10nm ESF when it's mostly on par.
