mort96 3 days ago

I don't understand the name "Strix". It's a name a GPU and motherboard partner of theirs, Asus, uses (used?) for their products. It's impossible for me to read "AMD Strix" and not think of it as some Asus GPU with an AMD chip in it, or some motherboard for AMD sockets.

Aren't there enough syllables out there to invent a combination which doesn't collide with your own board partners?

  • newsclues 3 days ago

    Strix https://en.wikipedia.org/wiki/Strix_(mythology) Halo is the code name.

    AMD Ryzen AI MAX 300 is the product name. This article is continuing to use the code name.

    • mort96 3 days ago

      Well, it's a public enough code name that it surprises me that they just used Asus's name.

      • damnitbuilds 3 days ago

        Confused me too.

        Did AMD not know?

        Or did AMD know and not care?

        • 1una 3 days ago

          AMD makes exclusive deals with ASUS regularly. I guess they're just good friends.

    • alecmg 3 days ago

      oh great, they are not using a confusing name anymore... wait, now they are using a stupid name!

  • Keyframe 3 days ago

    I understand it's an internal codename, but I also can't read it without thinking Asus. Especially considering I have an Asus Strix 4090 in my rig.

  • DCKing 3 days ago

    I don't think AMD really uses the name "Strix Halo" to market it to a large audience; it's just an internal codename. Two other recent internal names are "Hawk Point" and "Dragon Range", where Hawk and Dragon are names that MSI and PowerColor use to market GPUs as well. Heck, PowerColor even exclusively sells AMD cards under the "Red Dragon" name!

    AMD's marketing names for especially their mobile chips are just so deliberately confusing that it makes way more sense for press and enthusiasts to keep referring to it by its internal code name than whatever letter/number/AI nonsense AMD's marketing department comes up with.

  • bombcar 3 days ago

    Spy X Family!

    AMD is captured.

noelwelsh 3 days ago

For me the question is: what does this mean for the future of desktop CPUs? High bandwidth unified memory seems very compelling for many applications, but the GPU doesn't have as much juice as a separate unit. Are we going to see more of these supposedly laptop APUs finding their way into desktops, and essentially a bifurcation of desktops into APUs and discrete CPU/GPUs? Or will desktop CPUs also migrate to becoming APUs?

  • Tepix 3 days ago

    iGPUs have been getting ever closer to entry level and even mid-range GPUs.

    In addition there's interest in having a lot of memory bandwidth for LLM acceleration. I expect both CPUs to get more LLM acceleration capabilities and desktop PC memory bandwidth to increase from its current rather slow dual-channel 64-bit DDR5-6000 status quo.

    We're already hearing the first rumors for Medusa Halo coming in 2026 with 50% more bandwidth than Strix Halo.
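
    For a sense of scale, a rough sketch of the bandwidth math (assuming dual-channel 2 x 64-bit DDR5-6000 on desktop and Strix Halo's 256-bit LPDDR5X-8000, the 256GB/s figure cited elsewhere in this thread):

        # Peak bandwidth in GB/s = bus width in bytes * data rate in GT/s
        def peak_bw_gbs(bus_width_bits, gigatransfers_per_s):
            return bus_width_bits / 8 * gigatransfers_per_s
        print(peak_bw_gbs(128, 6.0))        # dual-channel DDR5-6000 desktop: ~96 GB/s
        print(peak_bw_gbs(256, 8.0))        # Strix Halo, 256-bit LPDDR5X-8000: ~256 GB/s
        print(peak_bw_gbs(256, 8.0) * 1.5)  # rumored Medusa Halo (+50%): ~384 GB/s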

    • kllrnohj 3 days ago

      > iGPUs have been getting ever closer to entry level and even mid-range GPUs.

      Not really closer. iGPUs got good enough to kill the low end of the discrete market basically entirely, but they haven't been "getting closer" to discrete cards. Both CPU SoCs and discrete GPUs have access to the same manufacturing nodes and memory technologies, and the simple physical reality is that a discrete card can just be bigger and use more power as a separate physical entity, along with memory better optimized for its workloads.

    • Gravityloss 3 days ago

      This has been the case for decades now.

      GPUs have existed for about 30 years. Embedded ones for 20 years or so? Why are the embedded GPUs always so stunted?

      • Plasmoid2000ad 3 days ago

        I think the market is very limited for high end iGPUs in practice with the compromises that occur with them.

        On desktop, upgradability is very popular and obviously the returns from the cooling on discrete GPUs are immense. With GPU dies costing so much, due to their size and dependency on TSMC, pushing them faster but hotter is probably a cost-efficient compromise.

        On laptops with APUs, you currently usually give up upgradeable memory - the fastest LPDDR only comes soldered (today), and the fastest solution would be on-die memory for bandwidth gains, which only Apple is really doing.

        Marketing-wise, low-core-count laptops appear to be hard to sell. Gaming laptops seem to ship with more cores than the desktop you would build - the CPU appears out-specced. I think this is because CPUs are cheaper, but that means a high-end APU would also need a large CPU to compete. Now you've got a relatively unbalanced APU, with an expensive, hot CPU and a relatively hot iGPU crammed into a small space - cooling is now tricky.

        This is going to be compared with cheap RTX 4060 laptops - and generally look bad by comparison. I think what's changing now to narrow the gap is Handhelds, and questionable practices from Nvidia.

        The Steam Deck kicked big OEMs into requesting AMD for large APUs.

        Nvidia seems to have influence on OEM AMD laptops - for years now, Intel CPU plus Nvidia GPU models seem to ship first, in larger quantities, and get the marketing push despite the CPU arguably being worse.

        Intel, despite their issues, seem to be raising the iGPU bar too - their desktop GPU investment seems to be paying off, and might be pressuring AMD to react.

        • Aerroon 2 days ago

          I think it's less about upgradeability on desktop and more that companies will overcharge you with integrated products. Eg Apple and RAM or smartphones and storage.

          If you want a good value for your money you need modularity and competition for the modules. If it's a one package deal the companies will charge so much that it curbs secondary markets that could be created, which could add value to the product.

      • aurareturn 3 days ago

        > Why are the embedded GPUs always so stunted?

        Memory bandwidth. Besides LLMs, matching a dedicated GPU's gaming performance with an iGPU will always cost more because of memory bandwidth.

        Before someone points to consoles using iGPUs, keep in mind that consoles use GDDR as their main system memory, which has slow access times for the CPU. In a non-console, CPU performance is important. GDDR is also power hungry, so it can't be used as the main system RAM in a laptop form factor.

        • pixelfarmer 3 days ago

          > Memory bandwidth.

          It is the thermal envelope that defines pretty much everything nowadays. Without active management of it chips would die a heat death very fast. Which also means chips are designed with a certain chip external heat management in mind. The more heat you can get out of a system and away from a chip, the more powerful you can design these things. And game consoles do have active cooling, i.e. they sit between desktop PCs and thin laptops, probably sharing the thermal handling capacity with larger gaming laptops, if anything.

      • shadowpho 3 days ago

        > Why are the embedded GPUs always so stunted?

        Because GPUs want a lot of silicon. A 5080 is ~300 mm^2, while a Ryzen 9xxx is ~50 mm^2.

        Meanwhile the CPU wants to use that wafer space for itself. And even if you used 100% of the wafer space for the GPU, you would have a small GPU and no CPU.

      • close04 3 days ago

        Just look at how a discrete GPU and an integrated GPU compare in terms of size, power, cooling, and other constraints like memory type and placement. That's why both options still exist. If one size did it all, the other option would just die out.

  • DCKing 3 days ago

    Strix Halo is impressive, but it isn't AMD going all out on the concept. Strix Halo's die area (~300 mm^2) is roughly the same as estimates for Apple's M3 Pro die area. The M3 Max and M3 Ultra are twice and four times the size.

    In a next iteration AMD could look into doubling or quadrupling the memory channels and GPU die area, as Apple has done. AMD is already a pioneer in the chiplet technology Apple is also using to scale up. So there's lots of room to grow, at even higher cost.

  • c2h5oh 3 days ago

    APUs are going to replace low-end video cards, because those cards no longer make economic or technical sense.

    Historically those cards had narrow memory bus and about a quarter or less video memory of high end (not even halo) cards from the same generation.

    That narrow memory bus puts their max memory bandwidth at a level comparable to desktop DDR5 with 2 DIMMs. At the same time, a quarter of high end is just 4GB of VRAM, which is not enough even for low details in many games and prevents upscaling/frame gen from working.

    From a manufacturing standpoint low-end GPUs aren't great either - memory controllers, video output and a bunch of other non-compute components don't scale with process node.

    At the same time, unified memory and bypassing PCIe benefit iGPUs greatly. You don't have to build an entire card, power delivery, cooler - you just slightly beef up existing ones.

    tl;dr: sub-$200 GPUs are dead and will be replaced by APUs. I won't be surprised if they start nibbling at the lower mid-range market too in the near future.
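
    (A rough sketch of that bandwidth comparison; the example card specs below are illustrative assumptions, not figures for any particular SKU:)

        # Peak bandwidth in GB/s = bus width in bytes * per-pin data rate in Gb/s
        def peak_bw_gbs(bus_width_bits, gbps_per_pin):
            return bus_width_bits / 8 * gbps_per_pin
        print(peak_bw_gbs(128, 6.0))   # desktop dual-channel DDR5-6000: ~96 GB/s
        print(peak_bw_gbs(64, 6.0))    # entry card, 64-bit GDDR5 @ 6 Gb/s: ~48 GB/s
        print(peak_bw_gbs(96, 14.0))   # entry card, 96-bit GDDR6 @ 14 Gb/s: ~168 GB/s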

    • rcarmo 3 days ago

      My main gaming rig (for admittedly not very intensive games) has been a 7000-series Ryzen APU with a 780M, and my next one will also be an APU. It makes zero economic sense to build a discrete-GPU system for casual gaming, even if I believe that APU prices will be artificially inflated to "cozy up" to low-end discrete GPU prices for a while to maximize profits.

      • pjmlp 3 days ago

        Which is why, for the games I play, a graphics workstation laptop like the Thinkpad P series is much more useful, including for GPGPU coding outside gaming, without being a heavyweight circus laptop whose battery lasts half an hour.

    • kllrnohj 3 days ago

      > tl;dr; sub-200 dollas GPUs are dead and will be replaced by APUs

      That already happened like 5+ years ago. The GT 1030 never got an update, so Nvidia hasn't made an entry-level GPU since. Intel kinda did with Arc, but that was almost more of a dev board.

  • sambull 3 days ago

    That new 'desktop' from Framework appears to be just that, with the AMD Ryzen AI Max 385.

    • nrp 3 days ago

      We have both Max 385 and Max+ 395 versions.

      • foxandmouse 3 days ago

        Any word on putting that in a mobile device? So far there’s only an HP business laptop and a gaming tablet… none of which appeal to the “MacBook crowd”.

  • Symmetry 3 days ago

    Having a system level cache for low latency transfer of data between CPU and GPU could be very compelling for some applications even if the overall GPU power is lower than a dedicated card. That doesn't seem to be the case here, though?

    • noelwelsh 3 days ago

      Strix Halo has unified memory, which is the same general architecture as Apple's M series chips. This means the CPU and GPU share the same memory, so there is no need to copy CPU <-> GPU.

      • Symmetry a day ago

        There's no need for an explicit copy, but physically, does the data move from the CPU cache to RAM to the GPU cache, or does it stay on the chip?

        • simne a day ago

          > does the data move from the CPU cache to RAM to the GPU cache

          Probably not, because that would need a dedicated channel at the hardware level.

          GPUs are mostly for streaming workloads with large data blocks, so the CPU cache architecture is usually too different from the GPU's to simply copy (move) data. Plus, they are on different chiplets, and a dedicated channel means additional interface pins on the interposer, which are definitely very expensive.

          So while it is possible to make an SoC with a dedicated CPU<->GPU channel (or between chiplets), it is usually only done on very expensive architectures, like Xeon or IBM Power, and not on consumer products.

          For example, on older AMD APUs the graphics core usually has priority over the CPU for access to unified RAM, but the CPU cache doesn't have any special provisions for handling memory shared with the GPU.

          The latest IBM Power, and similarly Xeon, introduced a shared L4 cache architecture, where blocks of an extremely large L4 (close to 1 GB per socket on Power, and as I remember somewhere around 128 GB on Xeon) can be assigned programmatically to specific cores and can give an extremely high performance gain for applications running on those cores (usually very beneficial for databases or something like zip compression).

          Added: as an example of the difference between CPU and GPU caches, the usual transaction size for a CPU is 64 bits or less, maybe currently 128-256 bits, but that is not common on consumer hardware (it could be on server SoCs), simply because many consumer applications don't use large blocks efficiently, whereas for a GPU it is normal to use a 256-1024 bit bus, so its cache definitely also has 256-bit and larger blocks.

          • simne 2 hours ago

            I have seen the video. It states exactly that the CPU doesn't have access to the GPU cache, because "they ran tests, and with this configuration some applications saw a double-digit speed increase, but nearly none of the applications they tested showed significant gains with the CPU having access to the GPU cache".

            So when the CPU accesses GPU memory, it just accesses RAM directly via the system bus, without trying to check the GPU cache. And yes, this means there could be a large delay between the GPU writing to its cache and the data actually being delivered to RAM and seen by the CPU, but it's probably smaller than with a discrete GPU on PCIe.

          • simne 17 hours ago

            Plus, the main idea of a GPU is that its "Compute Units" don't stand alone.

            I mean, in a CPU you could cut out any core and it would work completely separately, without the other cores.

            A GPU typically has blocks of, for example, 6 CUs which share one pipeline, and this is how they achieve a thousand CUs or more. So all CUs basically run the same program; some architectures can do limited independent branching with huge speed penalties, but mostly there is just one execution path for all CUs.

            Very similar to a SIMD CPU; some GPUs were even basically SIMD CPUs with an extremely wide data bus (or even just VLIW). So the GPU cache is certainly optimized for such usage: it provides a buffer wide enough for all CUs at the same time.
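
            A minimal NumPy sketch of that lock-step idea (purely illustrative, not how any specific GPU implements it): every lane runs the same program, and a divergent branch is handled by evaluating both sides and masking.

                import numpy as np
                lanes = np.arange(16)          # one value per lane / "CU"
                cond = lanes % 2 == 0          # data-dependent, divergent condition
                then_side = lanes * 10         # every lane executes the "then" side...
                else_side = lanes + 100        # ...and every lane executes the "else" side
                result = np.where(cond, then_side, else_side)  # a per-lane mask picks the value
                print(result)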

  • phkahler 3 days ago

    >> Are we going to see more of these supposedly laptop APUs finding their way into desktops, and essentially a bifurcation of desktops into APUs and discrete CPU/GPUs?

    I sure hope so. We could use a new board form factor that omits the GPU slot. My case puts the power connector and button over that slot area on the back, so it's not completely wasted - but the board area is. This has seemed like a good idea to me for a long time.

    This can also be a play against nVidia. When mainstream systems use "good enough" integrated GPUs and get rid of that slot, there is no place for nVidia except in high-end systems.

    • adrian_b 3 days ago

      There is no need for a new board form factor, because they have existed for many decades.

      Below the mini-ITX format with a GPU slot, there are 3 standard form factors that are big enough for a full-featured personal computer that is more powerful than most laptops: nano-ITX (120 mm x 120 mm, for 5" by 5" cases; half the area of mini-ITX), 3.5" (from the size of 3.5 inch HDDs, approximately the same area as nano-ITX, but rectangular instead of square) and the 4" x 4" NUC format introduced by Intel.

      With a nano-ITX or 3.5" board you can make a computer no bigger than 1 liter that can ensure low noise even with 65 W of power dissipation for the CPU+iGPU, and that can have a generous amount of peripheral ports to cover all needs.

      Keeping the low-noise condition, one could increase the maximum power dissipation to 150 W for the CPU+iGPU in a somewhat bigger case, but certainly one still smaller than 2.5 liters.

      I expect that we will see such mini-PCs with Strix Halo, the only question is whether their price would be low enough to make them worthwhile.

      The fabrication cost for Strix Halo must be lower than for a combo of CPU with discrete GPU, but the initial offerings with it attempt to make the customer pay more for the benefit of having a more compact system, which for many people will not be enough motivation to accept a higher price.

  • icegreentea2 3 days ago

    The bifurcation is already happening. The last few years have seen lots of miniPC/NUC like products being released.

    One of (many) factors that were holding back this form factor was the gap in iGPU/GPU performance. However with the frankly total collapse of the low end GPU market in the last 3-4 years, there's a much larger opening for iGPUs.

    I also think that within the gaming space specifically, a lot of the chatter around the Steam Deck helped reset expectations. Like if everyone else is having fun playing games at 800p low/medium, then you suddenly don't feel so bad playing at maybe 1080p medium on your desktop.

  • juancn 3 days ago

    I would love a unified memory architecture, even for external GPUs.

    Pay for memory once, and avoid all the copying around between CPU/GPU/NPU for mixed algorithms, and have the workload define the memory distribution.

  • adra 3 days ago

    Framework made a tiny desktop form factor version with this chip in it, so we'll see if it gets much traction (at least among enthusiasts).

swiftcoder 3 days ago

I don't really like these "lightly edited" machine transcripts. There are transcription errors in many paragraphs, just adds that little bit of extra friction when reading.

ryukoposting 3 days ago

Interesting read, and interesting product. If I understand it right, this seems like it could be at home in a spiritual successor to the Hades Canyon NUCs. I always thought those were neat.

I wish Chips and Cheese would clean up transcripts instead of publishing verbatim. Maybe I'll use the GPU on my Strix Halo to generate readable transcripts of Chips and Cheese interviews.

  • keyringlight 3 days ago

    The Framework desktop seems like a next step.

    Although I appreciate the drive for a small profile, I wonder where the limits are if you put a big tower cooler on it; seeing as the broad design direction is for laptops or consoles, I doubt there's too much left on the table. I think that highlights a big challenge - is there a sizeable enough market for it, or can you pull in customers from other segments to buy a NUC instead? You'd need a certain amount of mass manufacturing with a highly integrated design to make it worthwhile.

    • jorvi 3 days ago

      > can you pull in customers from other segments to buy a NUC instead

      I've never understood the hype for NUCs in non-office settings. You can make SFF builds that are tiny and still fit giant GPUs like the RTX 3090/4090, to say nothing of something like a 4080 Super. And then you can upgrade the GPU and (woe is you) CPU later on. Although a high-end X3D will easily last you 2-3 GPU generations.

      • kccqzy 3 days ago

        The size of NUCs is much smaller than any SFF builds with RTX 3090. Some people just like smallness.

        • bee_rider 3 days ago

          Closer to a phone than a laptop, in size!

      • layer8 3 days ago

        You can’t mount an SFF build unobtrusively behind a monitor or under a desk, it’s much larger and heavier than a NUC.

      • woodrowbarlow 3 days ago

        i feel like high-end mini-ITX builds only became viable a few years ago with the introduction of 700W+ SFX PSUs.

      • bee_rider 3 days ago

        You could fit a NUC in a pair of cargo shorts, FWIW. Or many bicycle under-seat bags, which was nice for biking to school without needing any backpack. They were in a sort of… qualitatively smaller size class than laptops.

  • pixelpoet 3 days ago

    Yeah would it have killed them to read over it just once? Can they not find a single school kid to do it for lunch money or something? Hell I'll do it for free, I've read this article twice now, and read everything they put out the moment it hits my inbox.

zbrozek 3 days ago

I really want LPDDR5X (and future better versions) to become standard on desktops, alongside faster and more-numerous memory controllers to increase overall bandwidth. Why hasn't CAMM gotten anywhere?

I also really want an update to typical form factors and interconnects of desktop computers. They've been roughly frozen for decades. Off the top of my head:

- Move to single-voltage power supplies at 36-57 volts.

- Move to bladed power connectors with fewer pins.

- Get rid of the "expansion card" and switch to twinax ribbon interconnects.

- Standardize on a couple sizes of "expansion socket" instead, putting the high heat-flux components on the bottom side of the board.

- Redesign cases to be effectively a single ginormous heatsink with mounting sockets to accept things which produce heat.

- Kill SATA. It's over.

- Use USB-C connectors for both power and data for internal peripherals like disks. Now there's no difference between internal and external peripherals.

  • gjsman-1000 3 days ago

    > Why hasn't CAMM gotten anywhere?

    Framework asked AMD if they could use CAMM for their new Framework Desktop.

    AMD actually humored the request and did some engineering, with simulations. According to Framework, the memory bandwidth on the simulations was less than half of the soldered version.

    This completely defied the entire point of the chip - the massive 256 bit bus ideal for AI or other GPU-heavy tasks, which allows this chip to offer the features it does.

    This is also why Framework has apologized for the non-upgradability, but said it can’t be helped, so enjoy fair and reasonable RAM prices. Previously, it had been speculated that CAMM had a performance penalty, but Framework’s engineer saying on video that it was that bad was fairly shocking.

    • arghwhat 3 days ago

      I do not believe they were asking for CAMM as a replacement for soldered RAM, but as an upgrade over DIMMs in desktops.

      CAMM is touted as being better than DIMMs when it comes to signal integrity and possible speed. Soldered of course beats any socket, in-package beats any soldered RAM, and on-die beats any external component.

      That AMD Strix Halo is unable to maintain signal integrity for any socketed RAM is a Strix Halo problem, not a socket problem. They probably backed themselves a bit into a corner with other parts of the design, sacrificing tolerances on the memory side, and it's a lot easier to push motherboard design requirements than to redo a chip.

      If this wasn't a Strix Halo issue, then they would have been able to run with socketed memory at a lower memory clock. All CPUs, this one included, have variable memory clocks that could be utilized, and they perform memory training since even the PCB traces to the chip cause significant signal degradation.

      • kimixa 3 days ago

        For signal integrity issues, increasing the link power can often overcome some of the problems caused by longer traces and connectors in the line - but while that's less of an issue for desktop devices, it goes against the ideal of a low-powered device with limited cooling. Doubly so as it's hard to re-clock on the timescales needed for intermittent-use power saving, so it will be using that extra power when idle.

        I suspect the earlier comment about "Half the performance with CAMM" is likely at iso-power, but that might still be a pretty big dealbreaker.

        • arghwhat 3 days ago

          More power is to overcome switching losses and parasitic reactances. You can increase drive strength up to a limit to overcome this, but a slight clock reduction will make things work at the same power.

          CPUs and GPUs reclock extremely fast to my knowledge, but what we're talking about isn't dynamic reclocking, just limiting the max clock as suitable for the system design.

          We already see this with laptop silicon that runs faster memory clocks when using soldered RAM than the same silicon does with socketed RAM.

          That this wasn't an option probably means that they're either far too close to the limit, or unwilling to allow a design that runs below max speed.

    • Tuna-Fish 3 days ago

      The problem was specifically routing the 256-bit LPDDR5X out of the chip into the CAMM2 connector. This is hard to do with such a wide bus, because LPDDR5X wasn't originally designed for it.

      LPDDR6X is designed for it, and can use CAMM2.

      • gjsman-1000 3 days ago

        Judging by the fact that LPDDR5X was announced in 2019 and LPDDR6X was just announced in 2024... we're still a full laptop/desktop cycle away.

    • sunshowers 3 days ago

      I'm curious how much the CUDIMM thing Intel is doing, where the RAM has its own clock, can help in the CAMM context. The Zen 4/5 memory controller doesn't support it but a future one might.

  • wmf 3 days ago

    There's a rumor that future desktops will use LPDDR6 (with CAMMs presumably) instead of DDR6. Of course CAMMs will be slower, so they might "only" run at ~8000 MT/s while soldered LPDDR6 will run at >10000.

    • Tuna-Fish 3 days ago

      LPDDR6 won't go that low, even on CAMM2. The interface is designed for up to 14.4Gbps, with initial modules aiming for 10.6Gbps.
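
      For example (a quick sketch; the 256-bit bus width below is an assumption borrowed from Strix Halo, while the per-pin rates are the ones above):

          # Peak bandwidth in GB/s = bus width in bytes * per-pin data rate in Gb/s
          def peak_bw_gbs(bus_width_bits, gbps_per_pin):
              return bus_width_bits / 8 * gbps_per_pin
          print(peak_bw_gbs(256, 10.6))   # initial LPDDR6 on a 256-bit bus: ~339 GB/s
          print(peak_bw_gbs(256, 14.4))   # top-spec LPDDR6 on a 256-bit bus: ~461 GB/s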

    • nsteel 2 days ago

      What's the advantage of LPDDR6 vs. DDR6, given the latter will be faster?

  • simoncion 3 days ago

    > - Move to single-voltage power supplies at 36-57 volts.

    Why? And why not 12V? Please be specific in your answers.

    > - Get rid of the "expansion card" and switch to twinax ribbon interconnects.

    If you want that, it's available right now. Look for a product known as "PCI Express Riser Cable". Given that the "row of slots to slot in stiff cards" makes for nicely-standardized cases and card installation procedures that are fairly easy to understand, I'm sceptical that ditching slots and moving to riser cables for everything would be a benefit.

    > - Kill SATA. It's over.

    I disagree, but whatever. If you just want to reduce the number of ports on the board, mandate Mini SAS HD ports that are wired into a U.2 controller that can break each port out into four (or more) SATA connectors. This will give folks who want it very fast storage, but also allow the option to attach SATA storage.

    > - Use USB-C connectors for both power and data for internal peripherals like disks.

    God no. USB-C connectors are fragile as all hell and easy to mishandle. I hate those stupid little almost-a-wafer blades.

    > - Standardize on a couple sizes of "expansion socket" instead...

    What do you mean? I'm having trouble envisioning how any "expansion socket" would work well with today's highly-variably-sized expansion cards. (I'm thinking especially of graphics accelerator cards of today and the recent past, which come in a very large array of sizes.)

    > - Redesign cases to be effectively a single ginormous heatsink with mounting sockets...

    See my questions to the previous quote above. I currently don't see how this would work.

    • wtallis 3 days ago

      Graphics cards have finally converged on all using about the same small size for the PCB. The only thing varying is the size of the heatsink, and due to the inappropriate nature of the current legacy form factor (which was optimized for large PCBs) the heatsinks grow along the wrong dimension and are louder and less effective than they should be.

    • arghwhat 3 days ago

      > Why? And why not 12V? Please be specific in your answers.

      Higher voltages improve transmission efficiency, particularly across connectors, as long as sufficient insulation is easy to maintain. Datacenters are looking at 48V for a reason.

      Nothing comes for free though, and it makes for slightly more work for the various buck converters.
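
      A quick illustration of the efficiency point (a sketch with made-up but plausible numbers: 300 W delivered through a connector/cable path with 10 mΩ of total resistance):

          def connector_loss_w(power_w, volts, resistance_ohm):
              # For a fixed power draw, current falls with voltage,
              # and the I^2 * R loss falls with the square of the current.
              current_a = power_w / volts
              return current_a ** 2 * resistance_ohm
          print(connector_loss_w(300, 12, 0.010))   # ~6.3 W lost at 12 V
          print(connector_loss_w(300, 48, 0.010))   # ~0.4 W lost at 48 V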

      > God no. USB-C connectors are fragile as all hell and easy to mishandle. I hate those stupid little almost-a-wafer blades.

      They are numerous orders of magnitude more rugged than any internal connector you've used - most of those are only designed to handle insertion a handful of times (sometimes connectors even only work once!), vs. ten thousand times for the USB-C connector. In that sense, a locking USB-C connector would be quite superior.

      ... on that single metric. It would be ridiculously overcomplicated, driving up part costs when a trivial and stupidly cheap connector can do the job sufficiently. Having to run off 48V to push 240W, with no further power budget at all, also increases complexity and cost and adds limitations.

      USB-C is meant for end-user things where everything has to be crammed into the same, tiny connector, where it does great.

      • zbrozek 3 days ago

        I figured the USB-C connector would be the most controversial of the idea list. Certainly there might be better options, but I was interested in trying to harmonize on something that could radically cut connector diversity and drive up fungibility and flexibility. I was imagining that the power half of it would be USB-PD PPS+EPR from the motherboard to the peripheral.

        The idea is that USB-C would be the "small widget" connector for things like large form factor disks (not m.2), optical drives (as much as those still exist), etc. Heavy iron things would use the bladed connector for power and the twinax ribbon for data. PCIe is great.

        It would also not sadden me tremendously to have CPU+RAM be a single device, be it on the CPU substrate or as a system-on-module. That flavor of upgradeability isn't worth losing a ton of performance. I'm sure tons of folks would hate it.

        • arghwhat 2 days ago

          Honestly, the idea of upgradeable RAM was purely tied to RAM being a low-speed, high-cost component. This in turn drives the ridiculous markup some companies charge now, drives up system costs due to complexity, and harms performance.

          Having a desktop CPU with integrated 64-128GB of RAM would probably justify the lack of "upgradability" with sheer performance and power improvements. There was never much upgradability there anyway - you could start by underspec'ing and then later fill it up to the (surprisingly not that high) limit, but what you saved by not buying all the RAM up front is probably less than the extra cost for supporting it at all. Paying more than the cost of RAM to have the flexibility of not buying RAM.

          (Phones have 24GB of RAM nowadays, a PC should not have any less than 64GB of RAM as base tier.)

runjake 3 days ago

When, if ever, will this be released as a bare processor/memory/motherboard combination that I can buy and throw in my own case?

Does anyone know?

sourtrident 3 days ago

Fascinating how Strix Halo feels like AMD's spiritual successor to their ATI merger dreams - finally delivering desktop-class graphics and CPU power in a genuinely portable form factor. Can't wait to see where it pushes laptop capabilities.

heraldgeezer 3 days ago

Cool I guess for a mini PC, but I'm one of those desktop PC tower nerds :)

  • hulk-konen 3 days ago

    I hope they make this in ATX (or mATX) form factor, toss out all the size, energy, and heat concerns, and add more ports and interfaces.

    • heraldgeezer 2 days ago

      At those sizes, just get a real CPU and GPU separate. Was my point.

Tepix 3 days ago

I think having a (small desktop) system with Strix Halo plus a GPU to accelerate prompt processing could be a good combo, avoiding the weakness of the Mac Ultra. The Strix Halo has 16 PCIe lanes.

  • nrp 3 days ago

    Note that none of the PCIe interfaces on Strix Halo are larger than x4. The intent is to allow multiple NVMe drives and a Wi-Fi card. We also used PCIe for 5Gbit Ethernet.

    • Scramblejams 3 days ago

      Love what you're doing, I'm in batch 4!

      Feedback: That 4x slot looks like it's closed on the end. Can we get an open-ended slot there instead so we can choose to install cards with longer interfaces? That's often a useful fallback.

    • sunshowers 3 days ago

      Hi Nirav! Long time admirer.

      Gen 4 x4 or gen 5 x4? I saw that gen 5 x4 results in maybe a 3% decrease in 5090 performance compared to gen 5 x16.

  • aurareturn 3 days ago

    Max RAM for Strix Halo is 128GB. It's not a competitor to the Mac Ultra which goes up to 512GB.

    You shouldn't need another GPU to do prompt processing for Strix Halo, since the biggest model it can realistically run is a 70B model. A separate GPU for prompt processing isn't going to help much because Strix Halo already has a good enough GPU; the issue is that its memory bandwidth is only 256GB/s (~210 GB/s effective).
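
    Rough decode-speed math for that point (a sketch; the model sizes assume 4-bit quantization and ignore compute and KV-cache traffic):

        # Token generation is roughly bandwidth-bound: each new token streams
        # the active weights from memory once.
        def tokens_per_second(bandwidth_gb_s, model_size_gb):
            return bandwidth_gb_s / model_size_gb
        print(tokens_per_second(210, 40))   # ~5 tok/s for a ~70B model at Q4 on Strix Halo
        print(tokens_per_second(210, 18))   # ~12 tok/s for a ~32B model at Q4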

    • Gracana 3 days ago

      Despite the hype, the 512GB Mac is not really a good buy for LLMs. The ability to run a giant model on it is a novelty that will wear off quickly... it's just too slow to run them at that size, and in practice it has the same sweet spot of 30-70B that you'd have with a much cheaper machine with a GPU, without the advantage of being able to run smaller models at full-GPU-accelerated speed.

      • SV_BubbleTime 3 days ago

        There’s so much flux in LLM requirements.

        2 to 3 tokens per second was actually probably fine for most things last year.

        Now, with reasoning and deep searching, research models, you’re gonna generate 1000 or more tokens just as it’s talking to itself to figure out what to do for you.

        So while everyone’s focused on how big a model you can fit inside your RAM, inference speed is now more important than it was.

        • Gracana 3 days ago

          Absolutely.

          The thinking models really hurt. I was happy with anything that ran at least as fast as I could read, then "thinking" became a thing and now I need it to run ten times faster.

          I guess code is tough too. If I'm talking to a model I'll read everything it says, so 10-20 tok/s is well and good, but that's molasses slow if it's outputting code and I'm scanning it to see if it looks right.

          • adgjlsfhk1 3 days ago

            counterpoint: thinking models are good since they give similar quality at smaller RAM sizes. if a 16b thinking model is as good as a 60b one shot model, you can use more compute without as much RAM bottleneck

            • terribleperson 3 days ago

              Counter-counterpoint: RAM costs are coming down fast this year. Compute, not so much.

              I still agree, though.

      • aurareturn 3 days ago

        It runs DeepSeek R1 q4 MoE well enough.

        • Gracana 3 days ago

          It does have an edge on being able to run large MoE models.

    • porphyra 3 days ago

      The $2000 Strix Halo with 128 GB might not compete with the $9000 Mac Studio with 512 GB, but it is a competitor to the $4000 Mac Studio with 96 GB. The slow memory bandwidth is a bummer, though.

      • aurareturn 3 days ago

        > but is a competitor to the $4000 Mac Studio with 96 GB. The slow memory bandwidth is a bummer, though.

        Not really. The M4 Max has 2x the GPU power, 2.13x the bandwidth, faster CPU.

        $2000 M4 Pro Mini is more of a direct comparison. The Mini only has 64GB max ram but realistically, a 32B model is the biggest model you want to run with less than 300 GB/s bandwidth.

        • Tepix 3 days ago

          You will be limited to a much smaller context size with half the RAM even if you're using a smaller model.

    • izacus 3 days ago

      > Max RAM for Strix Halo is 128GB. It's not a competitor to the Mac Ultra which goes up to 512GB.

      What a... strange statement. How did you get to that conclusion?

      • aurareturn 3 days ago

        Why do you think it's strange?

        • izacus 3 days ago

          The original poster arrogantly and confidently proclaims that a device that costs like $2000 isn't going to be able to compete against a $10,000 SKU of another device.

          I'm wondering how do you get to such a conclusion?

          • aurareturn 2 days ago

            I honestly don't understand what you're trying to say.

    • Tepix 3 days ago

      Running something like Qwq 32b q4 with a ~50k context will use up those 128GB with the large KV cache.
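
      (Rough KV-cache math, as a sketch: size ≈ 2 x layers x kv_heads x head_dim x context x bytes per element. The configs below are illustrative stand-ins, not the actual QwQ parameters.)

          def kv_cache_gb(layers, kv_heads, head_dim, context, bytes_per_elem=2):
              # 2x for keys and values, fp16 elements by default
              return 2 * layers * kv_heads * head_dim * context * bytes_per_elem / 1e9
          print(kv_cache_gb(64, 8, 128, 50_000))    # GQA-style 32B-class config: ~13 GB at 50k context
          print(kv_cache_gb(64, 40, 128, 50_000))   # MHA-style config (no GQA): ~66 GB at 50k context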

      • aurareturn 2 days ago

        So what makes you think Strix Halo with such a weak GPU and slow memory bandwidth can handle 50k context with a usable experience for a 32B model?

        Let's be realistic here.

        The compute, bandwidth, capacity (if 128GB) are completely imbalanced for Strix Halo. M4 Pro with 64GB is much more balanced.

    • Tepix 3 days ago

      Of course it's a competitor. Only a fraction of the M3 Ultras sold will have 512GB of RAM.

elorant 3 days ago

Why did they choose to build this as a mobile CPU, though? I don’t need 128GB of unified RAM on my laptop. It’s the desktop where things happen.

  • jchw 3 days ago

    People keep saying "to compete with Apple" which of course, is nonsense. Apple isn't even second or third place in laptop marketshare last I checked.

    So why build powerful laptops? Simple: people want powerful laptops. Remoting to a desktop isn't really a slam dunk experience, so having sufficient local firepower to do real work is a selling point. I do work on both a desktop and a laptop and it's nice being able to bring a portable workstation wherever I might need it, or just around the house.

    • dietr1ch 3 days ago

      This is a really good point. It's not easy to use both a laptop and a desktop at the same time. There are challenges around locality, latency, limited throughput, and unavailability that software can't easily deal with, so you need to be aware and smart about it, and you'll need to compromise on things.

      I'd work from my workstation at all times if I could. Tramp is alright, but not too fast and fundamentally can't make things transparent.

  • alienthrowaway 3 days ago

    "Mobile CPU" has recently come to mean more than laptops. The Steam Deck validated the market for handheld gaming computers, and other OEMs have joined the fray. Even Microsoft intends to release an XBox-branded portable. I think there's an market opportunity for better-than-800p handheld gaming, and Strix Halo is perfectly positioned for it - I wouldn't bet against the handheld XBox running in this very processor.

  • cptskippy 3 days ago

    Because people are accustomed to unified memory in laptops and also complain about the low amounts of RAM and the inability to upgrade.

    This solves those problems, but apparently uncovers a new one.

    • ForTheKidz 3 days ago

      > Because people are accustom to unified memory in laptops

      Surely the vast majority of laptops sold in the last five years don't have unified memory yet.

      • zamadatix 3 days ago

        I've never seen a good technical comparison showing what's new in "Unified Memory" vs. the traditional APU/iGPU memory subsystems laptops have had for over a decade, only comparisons to dGPU setups, which are rarer in laptops. The biggest differences comparing Apple Silicon or Strix Halo to their predecessors seem to be more about the overall performance scale, particularly of the iGPU, than the way memory is shared. Articles and blog posts most commonly reference:

        - The CPU/GPU memory are shared (does not have to be dedicated to be used by either).

        - You don't need to copy data in memory to move it between the CPU/GPU.

        - It still uses separate caches for the CPU & GPU but the two are able to talk to each other directly on the same die instead of an external bus.

        But these have long been true of traditional APUs/iGPUs, not new changes. I even saw some claims that Apple put the memory on die too and that's what makes it unified, but checking that, it seems to still actually be "on package", which isn't unique either, and it wouldn't explain any differences in access patterns anyway. I've been particularly confused as to why Strix Halo would now qualify as having Unified Memory when it doesn't seem anything is different than before, save the performance.

        If anyone has a deeper understanding of what's new in the Unified Memory approach it'd be appreciated!

        • ForTheKidz 2 days ago

          You're obviously more familiar with the topic than I am, so I'll trust your insight into the age of the term and concept, but none of these describe common chipsets for laptops that a consumer would expect. Unless we're talking about a very specific user trying to engage in heavy GPGPU computation here (on a laptop?? I can't imagine the market for this is very large). Most users of Apple's M* laptops don't know or care that they have a laptop that shares memory in a different way than the Intel laptops generally allowed for.

        • kbolino 3 days ago

          I believe, but don't know for sure, that classic iGPUs still behaved like discrete PCI devices under the hood, and accessed RAM using DMA over PCI(e), which is slower than the RAM is capable of, and also adds some overhead. Whereas, modern unified memory approaches have a direct high-bandwidth connection between RAM and the GPU, which may be shared with the CPU, bypassing PCI entirely and accessing the memory at its full speed.

          • zamadatix 2 days ago

            My understanding is this was true for the original Trinity-era APUs in 2012, but by 2014 Kaveri APUs had already put them on the same bus, so passing the pointer over PCIe was no longer necessary for systems with unified system memory: https://en.wikipedia.org/wiki/Heterogeneous_System_Architect...

            Reading about HSA again (it's been many years), it seems like ARM was one of the ones originally part of defining it back then, and I wonder if Apple actually did anything different on top of this at all or just branded/marketed the use of what was already there.

            • kbolino 2 days ago

              Great link!

              It seems there are 3 key things which distinguish the modern "unified memory architecture" from its predecessors:

                  1. Pointer passing instead of buffer copying
                  2. Separate bus for memory access alongside the PCIe bus
                  3. No partitioning of RAM into exclusive CPU vs. GPU areas
              
              These features seem to have come at different times, and it's the combination of all three that defines the modern approach. Broadly speaking, whereas classic iGPUs were still functionally "peripheral devices", modern UMA/HSA iGPUs are coequal partners with their CPUs.

              AMD seems to have delivered the first hardware meeting these criteria about 8-9 years ago, beating Apple to the punch by a couple years. However, AMD's memory bandwidth can be quite a bit behind Apple's. The M1 Pro handily beats anything AMD had out at the time (200 GB/s vs. 120 GB/s), and the M1 Max has double that bandwidth.

      • cptskippy 2 days ago

        Perhaps not unified in the CPU package but soldered to the motherboard and not upgradable.

      • wmf 3 days ago

        Yes, around 90% of laptops sold in the last ten years have unified memory.

        • ForTheKidz 2 days ago

          The term doesn't just mean "has an igpu" right? I'd guess the figure is higher than 90% if that's how you're defining it—most motherboards come with igpus now and certainly almost all laptop mainboards. Otherwise I'm not sure how you are defining it to find that figure!

          • wmf 2 days ago

            There are other aspects IMO like the iGPU has to support memory translation and coherence but AMD and Intel have also had those features for years (even if the drivers don't use them).

  • bangaladore 3 days ago

    Compete with Apple is my guess. There is a decent market for super high end laptops.

    Framework (I believe) made one of these into a purchasable desktop.

  • icegreentea2 3 days ago

    128GB is actually a step down. The previous generation (of sorts), Strix Point, had a maximum memory capacity of 256GB.

    The mini-PC market (which basically all uses laptop chips) seems pretty robust (especially in Asia/China). They've basically torn out the bottom of the traditional small form factor market.

  • aurareturn 3 days ago

    Because desktops are a much smaller market and AMD caught the Apple Silicon FOMO.

    • alienthrowaway 3 days ago

      IMO, the likely cause is AMD capitalizing on multiple OEMs having Steam-Deck envy and/or setting the foundation for the Steam Deck 2 with near-desktop graphics fidelity rather than 800p medium/low settings users have to put up with.

      • aurareturn 2 days ago

        Way too power hungry for a handheld gaming machine and way too small market. No way AMD spent all this effort designing a brand new SoC architecture just for Steam Deck.

  • ThatMedicIsASpy 3 days ago

    I wonder if you could actually put these into a socket, or if issues would occur.

linwangg 3 days ago

[dead]

  • aurareturn 3 days ago

    During testing, the RTX 4060m is about 13% faster in 1080p than the AI Max 395+. With Ray Tracing, the RTX 4060m is 35% faster.

    So it's probably closer to an RTX 4050m than an RTX 4070m.

    Also, the Strix Halo laptop is nearly 3x more expensive than the RTX 4060 laptop. So expect to pay more instead of less.

    https://www.youtube.com/watch?v=RycbWuyQHLY

    • NikolaNovak 3 days ago

      That's the thing. Mini PCs with a high-end iGPU seem to be encroaching on, or even more expensive than, entry-to-mid gaming laptops from reputable brands that have a discrete GPU, plus a screen and battery and keyboard etc :-/

      I prefer the NUC/mini PC format, currently running 2 Beelink SER8s, but I wish the price point / value proposition of high-end mini PCs wasn't so much out of line with laptops.

randomNumber7 3 days ago

As long as they can't even provide something similar to a simple CUDA C API on consumer hardware, I don't buy their stuff.

  • pjmlp 3 days ago

    There is no such thing as a simple CUDA C API; that is the mistake most folks make when talking about CUDA.

    It won over OpenCL, because it is a polyglot ecosystem, with first tier support for C, C++, Fortran, and Python (JIT DSL), plus several languages that have toolchains targeting PTX, the IDE integration, graphical debugger, compute and graphical rendering libraries.

    All of the above AMD and Intel could have provided for OpenCL, but never did when it mattered, not even after SPIR was introduced.

    Now they finally have GPGPU support for Fortran, C++, and Python JIT DSLs, but a bit too late to the party, because, contrary to NVidia, it isn't like those tools are available regardless of the card.

    • randomNumber7 3 days ago

      The early versions were C only. Then they added a lot of stuff.

      You don't need all the fancy stuff, but OpenCL (and even more so Vulkan) is too complicated when all you want to do is some GPU number crunching.

      Being able to write a kernel in something that looks like C, having pointers on GPU and CPU, and being able to call these kernels somewhat conveniently (like CUDA C) would be a great starting point.

      • pjmlp 3 days ago

        Early meaning until CUDA 3.0 in 2010; we are now on CUDA 12.8, 15 years later.

FloatArtifact 3 days ago

Seems like Apple's M2 is a sweet spot for AI performance at 800 GB/s of memory bandwidth, which can be had refurbished for under $1,500 with 65 gigs of RAM.

  • crazystar 3 days ago

    Where for $1500?

    • runjake 3 days ago

      Not on Apple Refurbs. That would cost you about $2200.

      And the M2 Max has a memory bandwidth of 400GB/s.

      • FloatArtifact 2 days ago

        Whoops, I got confused between the Max and the Ultra for memory bandwidth. But I have seen, on occasion, months ago, refurbs for that price.

      • sroussey 3 days ago

        I’m guessing a reference to M2 Ultra? Not sure about that price though…

        • runjake 3 days ago

          M2 Ultra refurb was over $4,000, last I checked.