JawnV6
Jul 4, 2004

So hot ...

WhyteRyce posted:

X0 was a rear end covering too right? As in, "hey we can't get this important thing ready in time for tape-in and then we end up blocking a whole bunch of people, so let's make up a new stepping designation to redefine what A0 is so that we can say we were actually ready on the new A0"
I wasn't directly involved in X0, perhaps I was more empathetic to the covered asses than I should've been.

quote:

I was once on a project where we found a bug, but because it didn't fail all the time on all the parts, the designer was convinced they could just implement a screen. There was another stepping already planned too, and he could have easily gotten the fix in in time, but it "had timing implications". We later had to do another stepping literally just for that issue and only that issue.
Well sure, causing a stepping comes out of nowhere and just happens but lowering the headroom on turbo can be linked directly to their work. Our incentives are all really well aligned, you see.

Josh Lyman posted:

I remember in the Athlon Thunderbird days that new steppings, notably AXIA and later AYHJA, were highly desired because they overclocked better. Am I misunderstanding steppings or is this no longer done?
"shipping on C2" doesn't mean that further process improvements wouldn't happen and further on in the life cycle of the product D0's get sold. None of what's being discussed here is contradicting what you're claiming. The first few steppings only being used for internal testing is common.

The letters refer to the doped silicon layer; the numbers refer to the metal layers that connect transistors. Importantly, the doped silicon can be kept around for a while: e.g. if we know A1 is coming, we can just avoid etching the A0 metal layers onto those wafers.

Beef posted:

Regarding simulation speeds: my guesstimate is on the order of a day, on a limited-model FPGA RTL-emulation platform. On a full-core cycle-accurate simulator you're looking at the order of years, and you can't realistically do full-die that way at all.
real boot is sooooo boring. cool, let's simulate 140,000 cycles where absolutely nothing is happening besides a PLL locking

hobbesmaster
Jan 28, 2008

BobHoward posted:

Any competent build system will use source file modification time and dependency analysis to figure out exactly what actually needs to be recompiled.

I have wasted so much time over the years learning that there are no competent build systems. :v:

priznat
Jul 7, 2009

Let's get drunk and kiss each other all night.
Just touch everything in the source tree then you get

Yaoi Gagarin
Feb 20, 2014

BobHoward posted:

Compilation: it depends on the language you're compiling and the size of the project, but as Twerk from Home said, this is one of the things lots of Intel-style E cores should actually help with. (But it's not something a typical PC enthusiast ever does.)

Some specifics... C family languages usually have single-threaded compilers. However, medium to big projects contain many source files. So, the build system spawns N copies of the compiler in parallel, each one processing one source file. Every time one compiler process finishes, the build system spawns a replacement on the next source file in the queue, until there are no files left to process. Typically N is chosen to match the machine's core (or hardware thread) count.
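
A minimal sketch of that spawn-and-replace loop, assuming plain POSIX (the file list and the cc invocation are placeholders, not any real build system's code, and error handling is trimmed):

code:

#include <unistd.h>    /* fork, execlp, sysconf */
#include <sys/wait.h>  /* wait */

int main(void) {
    const char *files[] = { "a.c", "b.c", "c.c", "d.c", "e.c" };  /* the queue */
    int nfiles = 5, next = 0, running = 0;
    long N = sysconf(_SC_NPROCESSORS_ONLN);  /* one job per hardware thread */

    while (next < nfiles || running > 0) {
        /* top the pool back up to N compiler processes */
        while (running < N && next < nfiles) {
            if (fork() == 0) {  /* child: become one compiler invocation */
                execlp("cc", "cc", "-c", files[next], (char *)NULL);
                _exit(127);     /* only reached if exec failed */
            }
            next++;
            running++;
        }
        wait(NULL);  /* block until any compiler finishes... */
        running--;   /* ...then loop around and spawn the next */
    }
    return 0;
}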

After the compile phase comes the link phase, in which all the object files produced by the compile phase must be linked together into a single output executable. Much like the compilers, C family linkers have traditionally been single threaded, so this part of a build didn't scale at all. However, in recent years, there's finally been progress on multi-threaded linkers. As I understand it, these new linkers don't scale as well as compilation, but it is now possible to get some multi-core speedup rather than none.

Balancing all this: developers seldom compile an entire project from scratch. They modify a handful of files, then compile to test their work. Any competent build system will use source file modification time and dependency analysis to figure out exactly what actually needs to be recompiled. Unchanged source files mostly don't need to be recompiled: there's already an object file on disk from past compile passes, and unless the file #included a changed header, recompiling won't produce different results.
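
The mtime check itself is simple. A toy version, assuming one .c maps to one .o and the header dependency list is already known (which is the part real build systems work hard at):

code:

#include <sys/stat.h>

/* Returns 1 if obj is missing or older than src or any header it
 * depends on, i.e. a recompile is needed; 0 if the object is fresh. */
int needs_rebuild(const char *src, const char *obj,
                  const char **headers, int nheaders) {
    struct stat so, oo, ho;
    if (stat(obj, &oo) != 0) return 1;  /* no object file yet */
    if (stat(src, &so) == 0 && so.st_mtime > oo.st_mtime) return 1;
    for (int i = 0; i < nheaders; i++)  /* #included headers count too */
        if (stat(headers[i], &ho) == 0 && ho.st_mtime > oo.st_mtime) return 1;
    return 0;
}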

Of course nobody wants to do a full compile. But 1) if a header changes you have to recompile all source files that use it, 2) in a big company there might genuinely be lots of changes to lots of source files.

Plus at my job we have some DLLs where it seems every two weeks some weird poo poo has invalidated the precompiled header and I have to do a clean build anyway. :whitewater:

I wish I had something more modern than a 9800X, lol

hobbesmaster
Jan 28, 2008

priznat posted:

Just touch everything in the source tree then you get



rm -rf $SSTATE_CACHE_DIR

BobHoward
Feb 13, 2012

The only thing white people deserve is a bullet to their empty skull

Paul MaudDib posted:

you have to distinguish between intel's specific e-cores (and the processors that use them) and e-cores as a general approach. AMD's c-cores are still notionally e-cores, and will tackle that same idea of area efficiency for low-priority batch tasks. they're just doing their e-cores a specific way for certain business+technical reasons. apple has also not really made the e-cores work that great tbh. while they are small, what I've read from people who try to program on apple silicon macos is that getting stuck on an e-core is death for performance. Very good for battery life and background daemons/etc, but don't do anything interactive with them.

Who are these people that had trouble with this easy thing so I can avoid their terrible software

Yes, Apple's E cores are slow - about 1/3 the perf of a P core in the M1 generation. It's true that you don't want all kinds of code to run on them. This is why Apple lets you avoid them (or, when appropriate, essentially lock a thread to them); asking for a different scheduling QoS band is one simple API call.
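
For reference, the call in question, as I remember Apple's API (pthread/qos.h; double-check the header rather than trusting me):

code:

#include <pthread/qos.h>

/* Opt the current thread into the background QoS band, which the
 * macOS scheduler strongly prefers to run on E cores. */
void make_me_background(void) {
    pthread_set_qos_class_self_np(QOS_CLASS_BACKGROUND, 0 /* rel. priority */);
}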

re: E-cores as a general approach, yes, it's unfortunate that the terminology has been overloaded. There are two dimensions of efficiency, area and power, and each is a spectrum, not a binary. Neither Intel's E cores nor AMD's C cores are truly low-power or low-area. They're really more like throughput-oriented P cores: by giving up on peak ST performance, AMD and Intel are able to provide a significant fraction of the biggest cores' performance in many-thread loads, for less power and area than the big cores.

Apple's E cores are much more about power efficiency. Not to the exclusion of all else, they're still fairly sophisticated OoO CPUs, but M1 E cores use less than 500mW each at max frequency and can scale down to single digit milliwatts in low power states. They're so efficient that Apple is able to re-use them as the main application processor core in Apple Watch SoCs.

Anime Schoolgirl
Nov 28, 2002

Khorne posted:

What if we package multiple cores together and have them share infrequently used, expensive resources like the fpu. I bet we could achieve insane clock speeds and avoid complexities like SMT with this future-looking design.

Bridgedozer

If this architecture crushes zen6 it will be very entertaining given the history.
the single p-core in a unit will still have SMT by itself. this seems more like a way of giving the e-cores closer access to resources the p-cores have, since currently they have to go the long way around, even to reach another e-core. it seems less onerous than the tower of dumb bullshit HP asked for in the bulldozer arch.

by the way, itanium is another architecture full of stupid bullshit HP asked for. coincidence?

Hasturtium
May 19, 2020

And that year, for his birthday, he got six pink ping pong balls in a little pink backpack.

Anime Schoolgirl posted:

the single p-core in a unit will still have SMT by itself. this seems more like a way of giving the e-cores closer access to resources the p-cores have, since currently they have to go the long way around, even to reach another e-core. it seems less onerous than the tower of dumb bullshit HP asked for in the bulldozer arch.

by the way, itanium is another architecture full of stupid bullshit HP asked for. coincidence?

Wait, what did HP want with Bulldozer? I didn't know they were involved.

Anime Schoolgirl
Nov 28, 2002

Hasturtium posted:

Wait, what did HP want with Bulldozer? I didn't know they were involved.
it being HP is a guess on my part, but a lot of (mostly bad) features in bulldozer were at one particular vendor's request, and the most frequent company making such requests of chip designers is HP.

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE

BobHoward posted:

Apple's E cores are much more about power efficiency. Not to the exclusion of all else, they're still fairly sophisticated OoO CPUs, but M1 E cores use less than 500mW each at max frequency and can scale down to single digit milliwatts in low power states. They're so efficient that Apple is able to re-use them as the main application processor core in Apple Watch SoCs.

yeah that's fair I guess, especially since they can customize a lot of their software (and indirectly, people will cater to their designs) and make sure the right stuff gets bumped. maybe it's developers getting nailed by app nap or similar, idk.

the other silly thing that silicon gets used in is... the apple USB-C to HDMI adapter. They tested all the active DP1.4 to HDMI 2.1 chipsets and found them wanting, so they just emulated it in software with the s1 (iirc) lol. It has USB-C, it has HDMI PHYs... do I need to hand you a bucket?

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE
Someone has found patents on rentable units

quote:


Basically, instead of scheduling on the application thread level, this new method analyzes the work required by a thread and then breaks application threads into segments using partitions. These partitioned threads are then scheduled onto processor cores based on their performance requirements. In other words, a program thread that is mostly simple ALU work but includes a crunchy AVX section may be scheduled onto an E-core, but have its AVX work thrown over to a P-core to make sure it gets completed within a certain time threshold.

The patent talks at length about the method, describing a self-tuning algorithm where the processor's own "Streamed Threading Circuitry" (described as a "Renting Unit" in leaks and likely an evolution of Intel's current Thread Director) logs the amount of time each partition takes to execute, and if the estimation for execution time was wrong, the processor will begin to schedule similar partitions on the appropriate core type: E-cores if execution completed very quickly, or P-cores if it was very slow.

https://hothardware.com/news/future-intel-cpu-partition-threads
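
Patent-ese aside, what it describes is just an estimate/measure/correct loop. A toy model of the idea (names and thresholds invented by me; this is speculation about a patent, not Intel's actual Streamed Threading Circuitry):

code:

typedef enum { E_CORE, P_CORE } core_t;

typedef struct {
    core_t placement;   /* where partitions of this class run now */
    double est_cycles;  /* predicted cost of the partition */
} partition_class_t;

/* Called after a partition retires with its measured cost. */
void observe(partition_class_t *pc, double actual_cycles) {
    if (actual_cycles < 0.5 * pc->est_cycles)
        pc->placement = E_CORE;  /* finished fast: an E-core is enough */
    else if (actual_cycles > 2.0 * pc->est_cycles)
        pc->placement = P_CORE;  /* much too slow: escalate to a P-core */
    /* smooth the estimate toward what was actually measured */
    pc->est_cycles = 0.875 * pc->est_cycles + 0.125 * actual_cycles;
}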

JawnV6
Jul 4, 2004

So hot ...
there's a respected architect who used to be at VMware who's a [redacted] libertarian in his off time, so every paper has like a lottery system or things bidding for CPU time or whatever. waldspurger?

priznat
Jul 7, 2009

Let's get drunk and kiss each other all night.

JawnV6 posted:

there's a respected architect who used to be at VMware who's a [redacted] libertarian in his off time, so every paper has like a lottery system or things bidding for CPU time or whatever. waldspurger?

Do they have a strange interest in the age of threads too

Dr. Video Games 0031
Jul 17, 2004


That's more or less what I was picturing when I first heard about RUs from MLID. It has pretty serious implications for single-thread performance if they can effectively split that thread across multiple cores, but I assume it depends heavily on the type of work that thread is doing.

SwissArmyDruid
Feb 14, 2014

by sebmojo
Since we have Intel people here: Please let whoever needs to know know that after EVGA's demise, I no longer have any GPU OEM/partners that I am attached to. I was buying all my EVGA cards as B-stock to begin with, and I will consider signing away my soul if Intel makes SR-IOV a standard feature on their consumer GPUs.

hobbesmaster
Jan 28, 2008

Doesn’t Intel already do that? Integrated GPU does it too and that can be pretty handy

SwissArmyDruid
Feb 14, 2014

by sebmojo
Discrete GPUs. There's apparently a hack floating around that lets you turn an A770 into a two-tenant SR-IOV setup, where each tenant gets 8 GB of a 16 GB card.

Rocko Bonaparte
Mar 12, 2002

Every day is Friday!
Oh wow so that'd mean there'd at least be one customer for them! =D

mobby_6kl
Aug 9, 2009

by Fluffdaddy
I had an issue where my laptop wouldn't output video to USB-C monitors (HDMI out still worked), so I tried restarting, of course, and manually installing the latest drivers. That didn't fix it immediately (though it now works, so :iiam:).

But I now have the Arc Control Center, which registers some global hotkeys, including Alt-O for the telemetry overlay. This of course overrides the same shortcut in apps like Excel or anything that uses it for menus or accelerators, etc. There is a notification icon for the CC with "open settings" and "quit" options that don't do anything. The only way out seems to be killing it manually from the task manager or uninstalling the whole thing: lol.

SwissArmyDruid
Feb 14, 2014

by sebmojo

Rocko Bonaparte posted:

Oh wow so that'd mean there'd at least be one customer for them! =D

Well, the hope is that Intel forces AMD and Nvidia to expose the feature on their cards as well.

I do not want to run Windows 11 on bare metal. I am already waffling with 10 as it is. I am so tired of corporations looking at me like a spring of behavioral data to be mined for profit. But 11 makes me want to turn to the Arch (Linux) side and never look back.


feedmegin
Jul 30, 2008

BobHoward posted:

Typically N is chosen to match the machine's core (or hardware thread) count.

Usually a bit higher, actually, because at any one time some of them are going to be blocked on I/O.
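
Concretely, something like this (a sketch assuming POSIX sysconf; the +2 slack is folklore, not science):

code:

#include <unistd.h>

/* Pick a parallel job count a bit above the hardware thread count so
 * the cores stay busy while some compilers are blocked on disk I/O. */
long default_job_count(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);  /* online hardware threads */
    return (n > 0 ? n : 1) + 2;              /* small oversubscription */
}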

Beef
Jul 26, 2004
PIUMA shown at hotchips: https://www.servethehome.com/intel-shows-8-core-528-thread-processor-with-silicon-photonics/

More interesting details than the older super-general whitepaper https://arxiv.org/abs/2010.06277

SwissArmyDruid
Feb 14, 2014

by sebmojo
ASRock Releases 2 New SoC Motherboards Based on Intel® N100 Processor

Well, color me intrigued. I'm sure the N200/N300s are on the way, now that these are finally available globally.

Hasturtium
May 19, 2020

And that year, for his birthday, he got six pink ping pong balls in a little pink backpack.

SwissArmyDruid posted:

ASRock Releases 2 New SoC Motherboards Based on Intel® N100 Processor

Well, color me intrigued. I'm sure the N200/N300s are on the way, now that these are finally available globally.

Man, I was just thinking how nice an N300 SFF machine that could take a discrete graphics card would be. An N100 would be perfect for a lot of undemanding work, and the silence and low power consumption are hugely appealing to me personally. It will be interesting to see the benchmarks, and particularly how CPU-limited something like a midrange GPU would be by these chips.


Cygni
Nov 12, 2005

raring to post

Hasturtium posted:

It will be interesting to see the benchmarks, and particularly how CPU-limited something like a midrange GPU would be by these chips.

It’s a PCIe 3.0 x2 slot, so it will choke basically everything on that front. N100 doesn’t have a lot of lanes.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

Hasturtium posted:

Man, I was just thinking how nice an N300 SFF machine that could take a discrete graphics card would be. An N100 would be perfect for a lot of undemanding work, and the silence and low power consumption are hugely appealing to me personally. It will be interesting to see the benchmarks, and particularly how CPU-limited something like a midrange GPU would be by these chips.

Here's the ballpark an N100 is in, and these samples are mostly from mini-PCs with subpar cooling. It's between a Haswell and a Skylake i5 for multi-core, but still not quite as fast as a Haswell i5 for single-core. If anyone else wants to poke at Passmark, which is a generally decent CPU benchmark: https://www.cpubenchmark.net/compare/3103vs2570vs5157vs1921/Intel-i3-8100-vs-Intel-i5-6600K-vs-Intel-N100-vs-Intel-i5-4670K

I wish I knew what PL1 and PL2 were for the N100. Years of dealing with Intel's bullshit means that I know that the i3-8100 with a 65W TDP sucks down 88W, but I have no idea what an N100 actually uses.

Hasturtium
May 19, 2020

And that year, for his birthday, he got six pink ping pong balls in a little pink backpack.

Cygni posted:

It’s a PCIe 3.0 x2 slot, so it will choke basically everything on that front. N100 doesn’t have a lot of lanes.

Oh. Well, so much for my optimism on that front, but even if it’s bandwidth-constrained the expansion options are still interesting. I’ve been very happy with my little N100 mini PC.

Cygni
Nov 12, 2005

raring to post

https://twitter.com/momomo_us/status/1696879845412589946

when you need 120 cores in a standard atx gamer case, baby

priznat
Jul 7, 2009

Let's get drunk and kiss each other all night.
:stare: fuckin hell

That DIMM placement on the top :lol:

Beef
Jul 26, 2004
Leaving some performance on the table there: SPR Xeons have 8 memory channels per socket, and this only has 8 DIMM slots across two sockets.

priznat
Jul 7, 2009

Let's get drunk and kiss each other all night.

Beef posted:

Leaving some performance on the table there, SPR Xeons have 8 memory channels and this only has 8 DIMM slots across two sockets.

I would like to see an ATX board with 2 xeon sps and 16 dimm slots, lol.

Just 1 pcie slot and a bunch of mcio for the rest maybe.

power crystals
Jun 6, 2007

Who wants a belly rub??

DIMM slots are horizontally mounted on the back.

priznat
Jul 7, 2009

Let's get drunk and kiss each other all night.

power crystals posted:

DIMM slots are horizontally mounted on the back.

Special cases with motherboard cutouts! Or just have really high standoffs. But then it wouldn't fit the rear connector panels...

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

priznat posted:

Special cases with motherboard cutouts! Or just have really high standoffs. But then it wouldn't fit the rear connector panels...

REALLY tall standoffs, back IO is mounted on the bottom of the mobo, can only use half height PCI cards, must be mounted in full height case.

repiv
Aug 13, 2009

Methylethylaldehyde posted:

REALLY tall standoffs, back IO is mounted on the bottom of the mobo, can only use half height PCI cards, must be mounted in full height case.

someone get asrock on the phone, stat!

priznat
Jul 7, 2009

Let's get drunk and kiss each other all night.

Methylethylaldehyde posted:

REALLY tall standoffs, back IO is mounted on the bottom of the mobo, can only use half height PCI cards, must be mounted in full height case.

That’d be awesome! Time to get weird with it. How many layers on that PCB though haha

Hasturtium
May 19, 2020

And that year, for his birthday, he got six pink ping pong balls in a little pink backpack.

priznat posted:

That’d be awesome! Time to get weird with it. How many layers on that PCB though haha

what do you think the pcb smells like haha

ConanTheLibrarian
Aug 13, 2004


dis buch is late
Fallen Rib

Beef posted:

PIUMA shown at hotchips: https://www.servethehome.com/intel-shows-8-core-528-thread-processor-with-silicon-photonics/

More interesting details than the older super-general whitepaper https://arxiv.org/abs/2010.06277

Why'd they stop at 66 threads/core when 69 was so close?

BobHoward
Feb 13, 2012

The only thing white people deserve is a bullet to their empty skull

ConanTheLibrarian posted:

Why'd they stop at 66 threads/core when 69 was so close?

:nice:

Seriouspost: it's not actually 66 threads/core. Servethehome's attempt at a real time summary of the slides/talk garbled things a fair bit. The paper's available here:

http://heirman.net/papers/aananthakrishnan2023piuma.pdf

The system is composed of "blocks" consisting of 6 cpu cores + 1 scratchpad memory + 1 DRAM controller + 1 offload engine. Of the six cores in each block, four are multithreaded cores (MTC) and two are single-threaded (STC). The MTCs support 16 hardware thread contexts, the STCs only one. 66 threads comes from 16*4 + 2*1 = 66.

MTCs are in-order, and switch between threads round-robin style on memory misses. Only one thread's instructions are in the pipeline at a time. This is a very bespoke architecture intended to run pointer-chasing workloads. Caches aren't very effective, especially big ones, so there's nothing more than small L1 I and D caches. A thread runs until it generates a cache miss, then the core switches to the next thread. By the time the MTC rolls back around to the original thread which missed the cache, that memory access should be done, completely hiding memory latency.
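
The round-robin-on-miss scheme is essentially a barrel processor. A toy software model of the rotation (purely illustrative, with details made up; the real thing is done in hardware):

code:

#include <stdbool.h>

#define NTHREADS 16  /* hardware contexts per MTC, per the paper */

typedef struct {
    bool waiting_on_memory;  /* outstanding miss? */
    /* ...architectural state (PC, registers) would live here... */
} hw_thread_t;

/* Pick the next runnable thread after `last`, round-robin style. */
int next_ready(hw_thread_t t[NTHREADS], int last) {
    for (int i = 1; i <= NTHREADS; i++) {
        int cand = (last + i) % NTHREADS;
        if (!t[cand].waiting_on_memory)
            return cand;  /* run this one until it misses */
    }
    return -1;            /* everyone is waiting: pipeline bubble */
}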

The STC cores exist to handle tasks which need higher single-thread performance. They're still not anything spectacular; still in-order. It sounds like they needed these to run the OS kernel which manages scheduling things for the MTCs.

The paper mentions that there are cache manipulation instructions in the ISA: prefetch, invalidation, writeback. ISAs which expose this stuff to userspace are almost never general purpose, and when they are, it's a big pain point. But it makes sense in this niche.
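
PIUMA's encodings aren't public here, but x86 has rough analogues of all three operations if you want the flavor (a GCC/Clang builtin plus SSE2 intrinsics; these are not PIUMA's instructions):

code:

#include <immintrin.h>

void cache_ops_demo(char *p) {
    __builtin_prefetch(p, 0, 0);  /* prefetch for read, minimal locality */
    _mm_clflush(p);               /* write back + invalidate the line at p */
    _mm_sfence();                 /* order the flush against later stores */
}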

The most interesting thing to me is the silicon photonics links (which regrettably the paper doesn't talk about). I thought Intel had given up on that stuff, but hey it's DARPA's money so why not.


Cygni
Nov 12, 2005

raring to post

BobHoward posted:

The most interesting thing to me is the silicon photonics links (which regrettably the paper doesn't talk about). I thought Intel had given up on that stuff, but hey it's DARPA's money so why not.

Broadcom is also still working on it, and Intel's got a couple different groups seemingly doing it. I'm not smart enough to know what it all means/is for though.
