Tuna-Fish
Sep 13, 2017

Gwaihir posted:

My god what possible purpose can a 939 machine have to anyone other than as a museum piece?

I'm currently running a 939 (3500+) as a router and a home server.

It ended up there because all the newer machines (other than my main one) got donated/loaned to people who suddenly needed them because of the pandemic, and a core 2 machine I left for the purpose died.

I find it a little shameful, because it's burning way more power doing the job than a more modern machine would. But I already know the pc I will use to replace it: once I get a new one in the fall, I'll rotate my current pc to a friend and get my little, power-efficient i3 SFF back. And I won't buy a new machine until both Zen4 and Raptor Lake are out.


Tuna-Fish
Sep 13, 2017

Combat Pretzel posted:

I figure they'll be able to make higher clocked V-Cache desktop CPUs, because it's implied that buyers of these will be applying bigger and better heatsinks than you'd typically find in datacenters. That said, I'd expect lower boost frequencies.

Note that while data center heatsinks are usually slimmer than high-end consumer ones, this doesn't mean they are necessarily any less powerful. Unlike in the consumer space, DC doesn't give a poo poo about noise, so the airflow over the heatsink can be 10x higher.

Tuna-Fish
Sep 13, 2017

lamentable dustman posted:

I wonder if the big rear end IHS is so they have some literal headroom if they want to do chip stacking.

It's because they wanted to maintain cooler compatibility with AM4, and since AM5 is LGA, the socket is much thinner.

IMO this was a stupid decision. For most coolers good enough that people want to move them between systems, the makers would have come up with a low-cost adapter kit very quickly; instead they've permanently sacrificed some thermals for a minor short-term benefit.

SwissArmyDruid posted:

AMD probably elected to focus the cuts to N7, based on how they are using N6 for I/O dies and a revised PS5 SoC.

I think most of the cuts are about Mendocino. It's Zen 2 + RDNA2 on TSMC N6, very cheap to make and aimed at entry level and low-end mainstream. It's a really solid product for its segment, and if they had had something like it during the early pandemic era, it would have sold like hot cakes, as it competes mainly against crappy Atoms, which it easily beats in every conceivable way.

Except that the channel is stuffed with low-end stuff, all the demand was pulled forward because of the pandemic, and most people expect the entire segment to move barely any volume for a couple of years. So now they have a great product for a segment that doesn't really exist anymore.

Tuna-Fish
Sep 13, 2017

hobbesmaster posted:

If they were Intel they'd be cycling sockets on every other cpu release, but AM6 should be very, very far out, no?

AM6 will likely show up with DDR6, which is still a ways off. But it will definitely be LGA, and there is no reason to expect its requirements to be any different from AM5's.

Combat Pretzel posted:

Someone needs to make a detailed feature spreadsheet for all these boards.

:hmmyes:

Tuna-Fish
Sep 13, 2017

gradenko_2000 posted:

yeah my question was more like, if an iGPU uses system RAM as video RAM, and we expect iGPUs to get faster just from the move from DDR4 and DDR5, all other things being equal, if having lots of cache matters in that paradigm in any way

from the posts so far I'm guessing not, because L3 cache for the CPU is just something that the CPU uses, and you'd have to make "cache for the iGPU" specifically if you wanted something like that to work

You could set the system up that way, and it would really help IGPU performance. Being able to fit the render backends in cache is a massive boon for bandwidth-starved GPUs. It's just that no existing AMD APUs work that way, and we have seen no hints from AMD that they intend to go there.

I do hope they do; I think it would make for an absolutely amazing laptop APU. Cache doesn't just increase performance, it saves a ton of energy.

SwissArmyDruid posted:

A reminder that this is in the context of 7000-series APUs, which will have up to 12 CUs, and could, therefore, have proportionally-diminished amounts of Infinity Cache compared to the full-fat RDNA3 GPUs.

The IC you need for the same performance doesn't scale with CU count, but with resolution. (Which indirectly scales with CU count, because if your GPU can't handle large resolutions, you'll probably drop them down.) The big thing you want is to fit your render targets in the cache, because you do a poo poo ton of writes to those while rendering a frame, and locality for all other memory accesses of a GPU sucks anyway.
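To put rough numbers on "fit your render targets in the cache" (my own back-of-envelope arithmetic; the 4-bytes-per-pixel color and depth formats are illustrative assumptions, not any specific GPU's setup):

```python
# Back-of-envelope render target sizing: one 4-byte color target plus one
# 4-byte depth target per frame. Formats and target count are illustrative.
def render_target_mib(width: int, height: int, bytes_per_pixel: int) -> float:
    """Size of a single render target in MiB."""
    return width * height * bytes_per_pixel / 2**20

for w, h, label in [(1920, 1080, "1080p"), (2560, 1440, "1440p"), (3840, 2160, "4K")]:
    total = render_target_mib(w, h, 4) + render_target_mib(w, h, 4)
    print(f"{label}: ~{total:.0f} MiB of render targets")
```

Which is why a cache sized for 1080p render targets doesn't need to grow just because you add CUs; it needs to grow when the resolution does.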

Klyith posted:

I'm really hoping that the companies are finding they're wrong when they imagined that covid gouging and scalping were prices that customers would accept forever, so this was just the new normal. And that they'll have to ratchet down to more like "+10-20% above pre-pandemic" rather than double.

The problem is that everything for sale right now was built at pandemic-inflated prices. This is less bad for CPUs and the like (because those companies have long-standing supply contracts for everything), but the motherboard makers got hit really hard: a motherboard contains approximately a zillion small parts that they typically buy from suppliers just-in-time, and suddenly all of those quadrupled in price. The big problem with AM5 currently is definitely the motherboards, but AMD can't really lean on their partners, because they are not gouging; they legitimately paid through the nose to build the things, and now the boards aren't selling even at effectively zero-margin prices.

Tuna-Fish fucked around with this message at 17:54 on Nov 20, 2022

Tuna-Fish
Sep 13, 2017

repiv posted:

mercifully they've designed AVX10 to use the same intrinsics as AVX512 so for the most part software can just be recompiled to use the former, aside from having to update CPUID checks and special-case 512bit code, but it's still going to be a hassle having to juggle two very similar yet binary-incompatible extensions in order to maintain support for chips that support AVX512 but not AVX10

They are not binary-incompatible. The AVX10 256-bit instructions are the 256-bit instructions of AVX512VL. There will be a few new instructions in AVX10 that AMD won't have, but there will be a common 256-bit subset that runs on everything supporting either AVX512VL (like Zen4) or AVX10.

I would bet dollars to donuts that the next common SIMD compilation target, the one eventually used by everyone except the three nerds who do feature detection, will be that shared subset. That's still a ways off, of course; it first needs to have been available even on the low end for a few years.

Tuna-Fish
Sep 13, 2017

Dr. Video Games 0031 posted:

When are we finally going to do proper quad-channel with 1dpc for consumer chips? What exactly is preventing that from being a thing? Is it just a segmentation thing, to make HEDT more appealing?

Cost. The requirements for the signal path are really strict these days, which means that routing more lines makes every existing line more expensive.

I think there is very little chance that a platform that needs to support normal, low-end desktops will ever also support 4 channels of DIMMs. It's a bit more possible if Dell's CAMM becomes a thing even on desktop, as it was designed to make routing cheaper. But even then, I think the odds are pretty remote.

Honestly, I think consumer CPU memory buses will only grow wider once memory is soldered onto the same substrate as the CPU, because that's the only way to do it cheaply.

PC LOAD LETTER posted:

LGA makes it easy to add tons of contacts though.

The socket is a lot less of a limit these days than what's under the socket.

Tuna-Fish
Sep 13, 2017

Yeah, 137mm² of N5 vs 100mm² of N6 probably means at least twice the cost.

Of course, given what all the other stuff you put in a laptop costs, it's probably sensible for most users to go for the better chip and its dramatically better performance, especially in 1T.
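Rough sanity check on that "at least twice" figure. The wafer prices below are made-up round numbers for illustration, not actual TSMC quotes, and the yield factor is a crude constant:

```python
from math import pi

def die_cost(die_mm2: float, wafer_price: float, wafer_d_mm: float = 300) -> float:
    """Very crude cost per good die: wafer price / usable dies per wafer."""
    wafer_area = pi * (wafer_d_mm / 2) ** 2
    usable_dies = wafer_area / die_mm2 * 0.85  # flat loss factor for edges/defects
    return wafer_price / usable_dies

n5 = die_cost(137, 17000)  # assumed N5 wafer price, illustrative only
n6 = die_cost(100, 10000)  # assumed N6 wafer price, illustrative only
print(f"cost ratio: {n5 / n6:.2f}x")
```

The bigger die costs more both because each one eats more wafer area and because the leading-edge wafer itself is pricier; the two effects multiply.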

Tuna-Fish
Sep 13, 2017

Another reason you want cache on the IOD is that SRAM doesn't really scale on any process past 5nm. There's no point paying extra for super fancy expensive N3E/N2 silicon if the L3 takes the exact same amount of space on it.

I think the IOD will be done on a (by then) obsolete process like N4, making it a much cheaper place to put the cache.

Tuna-Fish
Sep 13, 2017

repiv posted:

x86 needs to add javascript instructions like ARM did, it's falling behind the webshit curve

x86 literally already has the exact same "javascript instruction" that ARM added. In fact, it has had it since the '80s!

BobHoward posted:

The Arm "javascript instruction" (there's only one) is way less specific to Javascript than people tend to think - it's just a variant of the floating point to integer conversion instruction. Because JS has this idiotic thing where integers are represented as floating point doubles, it leans FP-to-int a lot, and building a variant of FP-to-int with the exact rounding mode and other behaviors needed by JS is a very low implementation complexity thing with enough reward to be worth it.

The instruction can be described as: convert an FP64 into a 32-bit integer the same way x86 does it. The reason this matters for javascript is that when Eich implemented integer ops for JS, he just converted the float into an int, did the op, and converted it back. This is a problem for ARM because he did that on x86: what happens when a float is too big to fit into a 32-bit int is undefined in the standard, and x86 and ARM did different things. (x86 gives a result mod register size, ARM clamps to the highest representable int.)

Calling it a javascript instruction was just less embarrassing for ARM than calling it an x86 emulation instruction.
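The difference is easy to show without assembly. A sketch of the two conversion behaviors in Python (my own illustration of JS's modular ToInt32 semantics versus a clamping conversion; this is not engine code):

```python
import math

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def to_int32_modular(x: float) -> int:
    """JS-style ToInt32: truncate toward zero, then wrap modulo 2**32."""
    if math.isnan(x) or math.isinf(x):
        return 0
    n = int(x) % 2**32              # int() truncates toward zero
    return n - 2**32 if n > INT32_MAX else n

def to_int32_clamping(x: float) -> int:
    """Clamp out-of-range values to the nearest representable int32."""
    if math.isnan(x):
        return 0
    if x >= INT32_MAX:
        return INT32_MAX
    if x <= INT32_MIN:
        return INT32_MIN
    return int(x)

big = 2.0**32 + 8                  # exactly representable as a double
print(to_int32_modular(big))       # wraps: 8
print(to_int32_clamping(big))      # clamps: 2147483647
```

Same bit pattern in, two different answers out, which is exactly the kind of thing a language spec shouldn't leave to whatever the host hardware happens to do.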

Tuna-Fish
Sep 13, 2017

Subjunctive posted:

No, I worked on the internals of that engine with Brendan in the 90s and 2000s, and it had tagged ints for everything that fit in 31 bits. For integer ops that could over/underflow that range we did the math in double space and then converted back to integer if it fit, otherwise the result was a tagged pointer to a double. Integer ops that couldn’t overflow, like right shift, stayed in int space the whole time.

That's the later spidermonkey, which had fancy things like optimizations.

In the original mocha codebase, when an integer op is executed, it unconditionally pops floats, converts them into integers, does something with them, converts the result back into a float, and pushes it.

(MochaInt and MochaFloat are not some fancy types with logic in the casts, but just i32 and f64.)
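As a toy illustration of that pattern (a hypothetical Python stand-in; the real code is C and the names here are mine):

```python
# Integer ops in the style described above: unconditionally pop floats,
# cast to int, do the op, cast back to float, push the result.
def int_op(stack: list, op) -> None:
    b = int(stack.pop())            # f64 -> i32, a plain cast
    a = int(stack.pop())
    stack.append(float(op(a, b)))   # and straight back to f64

stack = [7.0, 12.0]
int_op(stack, lambda a, b: a | b)   # bitwise OR as an "integer op"
print(stack)  # [15.0]
```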

It did not stay that way for very long, but the original prototype, written in 10 days in 1995, was successful enough that later versions wanted to maintain backwards compatibility with it. Some people at a JS conference estimated that those 20k SLOC contain thousands of decisions that have to be kept in mind for backwards compatibility, nearly every one made with the heuristic of doing whatever was the most straightforward thing so the prototype could ship in 10 days.

Tuna-Fish fucked around with this message at 21:23 on Oct 1, 2023

Tuna-Fish
Sep 13, 2017

Kibner posted:

Once you start overclocking ECC RAM, you risk losing the accuracy of the parity bits and could start getting silent corruption.

Overclocking doesn't change the ratio of silent errors to noisy (correctable, reported) ones. Which means that so long as you aren't getting any correctables, you're not getting silent ones either.
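The intuition, with made-up per-bit error probabilities (not measured numbers), assuming independent bit flips over a SECDED-protected 72-bit word:

```python
from math import comb

def p_exactly(k: int, n: int = 72, p: float = 1e-9) -> float:
    """Probability of exactly k bit flips in an n-bit word, per-bit rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Single flips are corrected AND reported; silent corruption needs a
# multi-bit pattern that aliases past SECDED. Raising p (pushing the
# overclock) scales both, so correctables always show up first, in bulk.
print(p_exactly(1) / p_exactly(3))  # correctables vastly outnumber 3-bit events
```

So a memory overclock that produces zero correctable errors over a long test is overwhelmingly unlikely to be silently corrupting data, which is the whole point of having the reporting.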

Tuna-Fish
Sep 13, 2017

Cygni posted:

These are really niche products imo, kinda like the last generation. Especially with the boards with only 4 memory slots. The first few TR generations had some more mainstream DIY appeal, but I don't really see these in the same boat.

The reason to buy a non-pro TR is that you need more PCIe slots than the AM5 platform provides.

Tuna-Fish
Sep 13, 2017

Twerk from Home posted:

I'd love to see a unified x86 thread, ideally with a OP that mentions Centaur and VIA.

I won't stand for this Transmeta erasure!

Tuna-Fish
Sep 13, 2017

gradenko_2000 posted:

That's a big spread on the GPU config. Gotta wait to see if stepping up to the full fat 12CUs will be worth it.

It would still only have 128-bit DDR5, so probably similar to an RX 6400 or so. The laptop part that's interesting for its GPU is Strix Halo, coming later with 256-bit LPDDR5X, but that's not going to be available for AM5.

Tuna-Fish
Sep 13, 2017

Is CAMM going to be a thing for this gen, or only the one after this?

Tuna-Fish
Sep 13, 2017

Dr. Video Games 0031 posted:

new memory module format developed by dell and accepted as a jedec standard for anyone to use. it uses compression mounting to keep a low profile, and is designed in such a way to ensure shorter trace lengths. it strikes a compromise between the performance and low-profile nature of soldered memory with the modularity of sodimms

The only thing I'd add is that the link is strictly point-to-point, and modules are 128 bits wide. So a normal laptop would have exactly one of them, and to upgrade memory you'd have to remove it and replace it with a larger one.
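For scale, peak bandwidth of one 128-bit module at an illustrative transfer rate (the speed grade is my assumption, not a CAMM spec quote):

```python
def peak_gb_s(bus_bits: int, mt_s: int) -> float:
    """Peak memory bandwidth: (bytes per transfer) * (mega-transfers/s) / 1000."""
    return bus_bits / 8 * mt_s / 1000

print(peak_gb_s(128, 6400))  # 102.4 GB/s from the single 128-bit module
print(peak_gb_s(64, 6400))   # 51.2 GB/s, one 64-bit DIMM for comparison
```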

Tuna-Fish
Sep 13, 2017

ijyt posted:

Looking at the two I don't see what advantage the surface area of camm had over lpcamm

LPCAMM can only fit large amounts of memory if there are many dies in each memory package; CAMM can do larger amounts with more traditional packaging.

The advantage (other than compactness) of LPCAMM is that trace lengths are even shorter than CAMM's, so it should get very close to the power use and maximum speeds of soldered memory.

Tuna-Fish
Sep 13, 2017

Dr. Video Games 0031 posted:

So I guess AMD has a lot of leftover Zen 3 inventory that they're struggling to sell unless they throw v-cache at it?

Note that these are dies they put v-cache on and then decided to sell as lower SKUs, not dies manufactured as normal Zen3 that they later decided to put v-cache on.

The v-cache dies undergo manufacturing steps fairly early on that the non-v-cache ones don't. They use the same floorplan, but the through-silicon vias are not etched out for the non-v-cache ones. (That would be a waste of money and time.)

Presumably they have some that don't meet the targets for the better SKU.

Tuna-Fish
Sep 13, 2017

Klyith posted:

However, the chiplets that become X3D CPUs have to be thinned down such that CPU chiplet + cache silicon has the same height as normal not-X3D CPU chiplets. I don't know how that's done, but I can't imagine anything other than chemical etching?

No, that's mechanical. They are basically lapped until they are paper-thin. But that's not the bottleneck, the through-silicon-vias are. The cache chip goes on the backside of the CPU. In order to connect to it, they have to drill holes through the entire die and fill them with metal. This is done by etching, after manufacturing transistors but before building the metal stack. This is a fairly long and expensive process, and AMD does not do it for the dies they don't plan to put vCache on. They use the same floorplan, but people have checked with electron microscopes that normal CPUs just have the spots where the vias would start, but don't have holes drilled through the die.

Because of this, they need to decide whether a CPU is going to get vCache fairly early. Some of those chips are not going to be fully intact, and some of them are not going to meet the clock speed requirements for the top-of-the-line stuff. It's presumably these rejects that are now being sold as SKUs lower down the product stack.

Tuna-Fish
Sep 13, 2017

Parametric yields are a thing. There is no way 90%+ of vCache Zen3 dies can hit the 5800X3D spec; if they could, you'd expect to find plenty of golden samples that could do much more, and there's just too much variability for that. That's why you have lower-tier SKUs with lower clocks, so you can sell everything you make.
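A toy Monte Carlo of the binning argument (the distribution and every number here are made up; it's just the shape of the reasoning):

```python
import random

random.seed(1)  # deterministic for the example
# Pretend each die's maximum stable clock is normally distributed.
dies = [random.gauss(4.5, 0.15) for _ in range(10_000)]  # GHz, invented

top_bin = sum(d >= 4.5 for d in dies)          # hits the flagship spec
lower_bin = sum(4.2 <= d < 4.5 for d in dies)  # sellable as a lower SKU
print(f"flagship: {top_bin}, lower SKU: {lower_bin}, "
      f"rejects: {10_000 - top_bin - lower_bin}")
```

Even in this generous model, roughly half the dies miss the flagship spec; without lower bins, all of those would be waste.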

Tuna-Fish
Sep 13, 2017

Klyith posted:

Wild! Do you know if that's done on whole wafers, or individual dies after they break them up?

I would assume full wafers, but I don't know the specifics.

Lapping/polishing is used a lot in chipmaking; it's a lot more precise than people think, and there are a lot of steps where removing material to a uniform depth is useful.

Tuna-Fish
Sep 13, 2017

kliras posted:

maybe use threadripper on a linux install instead of windows if you want to squeeze all performance out of it:

The Windows scheduler kind of shits the bed when there are more than 64 threads. This is why it was common knowledge for people who ran a 3995WX on Windows to turn off SMT in the BIOS: the extra threads would hurt more often than they helped.

(edit:) As in, just a 20% difference on computational loads is actually much better than I expected for a 96-core, 192-thread CPU.

Tuna-Fish
Sep 13, 2017

Cygni posted:

*****: AMD has a long and storied history of promising long term support for platforms and then bailing. Remember FM1, AM1, QuadFX, TRX40, etc. I would not expect that AM5 will have as long and historic a run as AM4, and I would also remember that AMD explicitly tried to bifurcate AM4 support and only relented officially after they got a storm of bad-press and the motherboard makers effectively hacked in old support against AMD's wishes. I think there is a lot of good reasons for AMD to keep AM5 alive for a long time, but I wouldn't bet the farm.

I would expect AM5 to be shorter-lived than AM4, simply because DDR5 is expected to be shorter-lived than DDR4. Mass-market adoption of DDR6 is currently expected in 2026. If AMD jumps on it immediately, that gives AM5 only ~4 years as the newest AMD desktop socket, versus AM4's 6. AMD might well delay a year, like they did with AM5, but even then the support window will be a year shorter.

SpaceDrake posted:

Looking forward to doing a drop-in upgrade to a 9800X3D in six or so years when those get nice and cheap.

Note that the 9800X3D will be Zen5, the next desktop CPU line, expected to be released next year; and Zen6 is currently expected to still support AM5 a year after that. The only chips that will be labeled as the 8000 series are APUs, like the 4000 and 6000 series.

SpaceDrake posted:


I'll admit I'm an ignorant boob on AMD's future plans, but some of this surprises me. The AM4-based 5700 NoLetter launched today, and... I'm a little unsure what the point of it is? It's apparently similar in every regard to the 5700X, including full overclock support, and it clocks slightly higher than the X, and is priced about the same... so what is this, exactly? Who is this meant to serve? :confused:

Inept posted:

Importantly, it only has 16MB of L3 cache, so no one should really buy it if it's not significantly cheaper than a 5700X.

Also, they've been sitting on these for at least a year and a half. Bizarre time to release it. https://www.tomshardware.com/news/amds-ryzen-7-5700-emerges-without-radeon-vega-igpu

They are selling off stock of CPUs that can't validate as something better, in this case likely because there are too many faults in the cache for the built-in redundancy to cover.

Tuna-Fish
Sep 13, 2017

Cygni posted:

Cinebench is probably the most favorable thing they could run on the c core, as it doesn’t care about cache. A 7700X beats a 7800X3D lol.

I’m very interested to see some actual desktop testing on the Phoenix2 parts once AMD finally ships the desktop APUs.

There are no cache differences between 7545U and 7540U. They both have 16MB of shared L3, and 1MB L2 per core. The only difference between the Zen4 and the Zen4c cores is achievable frequency.

Tuna-Fish
Sep 13, 2017

Cygni posted:

Oh that’s interesting I must have misremembered that. The 4c cores on Bergamo have half the L3 cache per core of Genoa, and maybe I just assumed that was the same config in Phoenix2.

L3 amount is independent of core type. The APUs have been using 16MB (or, half of the desktop L3) for a long time.

Tuna-Fish
Sep 13, 2017

Indiana_Krom posted:

It is one cache,

No, it's not. The L3 is a victim cache of the L2s of the cores in the same CCX. That is, the only way a line ever ends up in the L3 is by first being evicted from one of the local cores. The other CCD can read from the cache, but only for coherency, and that is not much faster than reading from main RAM.
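A toy model of what "victim cache" means here (illustrative Python; obviously not AMD's actual replacement logic, and the sizes are tiny on purpose):

```python
from collections import OrderedDict

class VictimHierarchy:
    """Tiny two-level model: lines enter L3 ONLY when evicted from L2."""
    def __init__(self, l2_lines: int = 4, l3_lines: int = 8):
        self.l2 = OrderedDict()  # LRU order: oldest entry first
        self.l3 = OrderedDict()
        self.l2_lines, self.l3_lines = l2_lines, l3_lines

    def access(self, addr):
        if addr in self.l2:
            self.l2.move_to_end(addr)
            return "L2 hit"
        if addr in self.l3:
            del self.l3[addr]        # promoted back into L2
            result = "L3 hit"
        else:
            result = "miss"          # filled from DRAM straight into L2
        self.l2[addr] = True
        if len(self.l2) > self.l2_lines:
            victim, _ = self.l2.popitem(last=False)
            self.l3[victim] = True   # the ONLY path into the L3
            if len(self.l3) > self.l3_lines:
                self.l3.popitem(last=False)
        return result

c = VictimHierarchy()
for a in range(5):
    c.access(a)          # fills L2; address 0 gets evicted into L3
print(c.access(0))       # prints "L3 hit": 0 reached L3 only via L2 eviction
```

Note that a demand fill from memory never lands in the L3 directly, which is the property being described above.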

Tuna-Fish
Sep 13, 2017

Subjunctive posted:

wait, Zen 5 already? christ, Lisa, take a breath

what are we expecting out of it? hopefully better memory controller…

Literally exactly the same memory controller as Zen4. It's using the same IOD. The core is internally much wider, based on compiler cost model patches posted by AMD, but everything uncore (past L2) is the same as in Zen4.

Subjunctive posted:

I just want 32x4 at EXPO without worrying about my BIOS revision!

(I’m going to upgrade regardless because I have a sickness.)

Unless your sickness is very strong, if you already have Zen4, maybe hold on until Zen6. It's expected quite soon after Zen5, because it's supposed to use largely the same core but an entirely new uncore/memory controller, with some fancy new way of integrating the dies.

Tuna-Fish
Sep 13, 2017

A Bad King posted:

I think DDR5 has error correcting baked in each module? Just not the interconnects between the module and the CPU?

The important part of ECC is the reporting, and normal DDR5 does not have that.

Tuna-Fish
Sep 13, 2017

I think this is precisely why AMD made flashback a hard requirement for AM5.

Tuna-Fish
Sep 13, 2017

AMD has not yet announced when they are going to announce the new products, but they are doing the opening keynote at Computex, and given the BIOS leaks everyone just assumes that's when the announcement happens. For the past few generations, AMD's announcements have predated actual availability by about 1-2 months, so a June 4th announcement means chips probably on shelves in July.

Tuna-Fish
Sep 13, 2017

SpaceDrake posted:

I would definitely say a sub-$150 MATX board of some description, with a 10Gbit NIC plugged in to the second PCIe like we're living in the 1990s, is the play.

I'd like to note that 10GbE is starting to become legacy in the server world, which is why you can get the relevant networking gear used for very reasonable prices. There are usually only 2 issues:

1. The older server adapters use older, physically wide PCIe connectors. x4 slots that are physically x16 are great for this; if your mobo only has physical x1 slots, you cannot use server network cards in it.

2. Server add-in boards are generally designed for a high-airflow environment. If you use one, make sure there is direct airflow over the card's heatsink.

That being said, if you are fine with SFP, you can go a lot faster than 10Gbps for reasonable prices used, with things like 56Gbps QSFP+ adapters available for below €50.

Tuna-Fish
Sep 13, 2017

Subjunctive posted:

yeah but switches for that… :homebrew:

You can occasionally pick up used switches the same way too, although they tend to stay available for much less time than the network cards. Makes sense: home-lab people who buy this crap used probably have far fewer devices with network cards per switch than actual server rooms do.

Tuna-Fish
Sep 13, 2017

hobbesmaster posted:

edit: i apologize for ever doubting asrock, their expensive x670e boards have you covered https://pg.asrock.com/mb/AMD/X670E%20PG%20Lightning/index.asp
edit2 no they don’t, goddamnit why can’t you search by motherboard block diagram or something

Huh?

According to their manual, the PG Lightning routes the PCIE3 slot (mechanically PCIe x16, electrically PCIe 4.0 x4) directly from the CPU.

JSON Bourne posted:

On the topic of PCIE lanes, is there any AM5 motherboard that offers slots for both x16 and x8? I've been looking but the best I can seem to find is an x16 slot and an x4.

3.0 x4 is just barely enough for a single 25Gbps link, with overhead. I recommend going for one of the motherboards where that PCIe slot hangs directly off the CPU; otherwise the network card has to share bandwidth with everything else in your system.
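The arithmetic behind "just barely enough": PCIe 3.0 runs at 8 GT/s per lane with 128b/130b encoding, so the usable line rate before protocol overhead is:

```python
def pcie3_usable_gbps(lanes: int) -> float:
    """Usable Gb/s on a PCIe 3.0 link: 8 GT/s per lane after 128b/130b encoding."""
    return lanes * 8 * 128 / 130

print(round(pcie3_usable_gbps(4), 1))  # 31.5 Gb/s, against a 25 Gb/s NIC link
```

TLP/DLLP overhead eats a few more percent on top, which is why it's "barely" and not "comfortably".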

Tuna-Fish
Sep 13, 2017

hobbesmaster posted:

They’re looking for PCIE 3.0 8x.

It’s x4, not x8. The mellanox card needs 8 electrical lanes.

It doesn't need 8; it gracefully degrades down to 4. If you have a single 25Gbps link, 3.0 x4 is sufficient (32Gbps of raw bandwidth). You just need physical x16 and electrical x4.


Tuna-Fish
Sep 13, 2017

Klyith posted:

PCIe power is all on that front stubby bit, which is the same on every size of slot.

True.

Klyith posted:

Every slot needs 75 watts by spec.

Not true. By spec, an x16 slot must be able to provide 75W, while an x1 or x4 slot must be able to provide at least 25W. A small slot may optionally provide up to 75W, but is not required to by the spec. (There is a protocol for configuring a card for high power that lets the host tell the cards how much power they may draw.)
