Interesting article. At first glance I wasn't sure how semiconductor physics would permit a 64GB SoC by mid-2021, given that "unified memory" implies the CPU, GPU and DRAM must all be packaged together. This is an issue because the M1's transistor budget is "only" 16 billion using 5nm fabrication, and 3nm won't be in full production until 2H2022.
However -- it appears DRAM is not on chip but on package, which gives a lot more flexibility: capacity could be significantly increased. www.anandtech.com/show/16226/apple-silicon-m1-a14-deep-dive
The M1's die size is 120 square mm: www.tomshardware.com/news/apple-m1-vs-apple-m14-floorplans
If we compare this to Intel's i9-10900K used in the iMac 27, that is 206 square mm using 14nm fabrication: www.techpowerup.com/267649/intel-core-i9...ie-size-measurements
If we conservatively predict TSMC's 5nm process could produce an approx. 180 mm^2 die, that might increase the transistor budget of a hypothetical M2 by 50% to about 24 billion. Comparing that to published die shots of the M1 (above), that's very roughly in line with Alex's estimate of 8 CPU "power cores" and 10 GPU cores in the M2.
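The area-scaling estimate above can be sketched as a quick back-of-the-envelope calculation. This is only an illustration of the reasoning, not official figures: it assumes transistor density stays constant when the die grows from the M1's 120 mm^2 to a hypothetical 180 mm^2 on the same 5nm process.

```python
# Scale the M1's published transistor count by die area, assuming
# constant density on the same 5nm process (a simplification).
m1_transistors = 16e9   # M1: ~16 billion transistors
m1_area = 120           # M1 die size, mm^2
hypo_area = 180         # hypothetical larger die, mm^2

density = m1_transistors / m1_area    # ~133 million transistors per mm^2
budget = density * hypo_area          # ~24 billion transistors
print(round(budget / 1e9))            # -> 24
```

A 50% larger die at the same density gives 50% more transistors, which is where the ~24 billion figure comes from.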
But good as the M1 GPU is, and even with the efficiency of tile-based deferred rendering (developer.apple.com/videos/play/wwdc2020/10632/), the M-series needs a lot more GPU horsepower for certain tasks, and I don't see how that could fit on the current die at 5nm. Given Apple's current direction and the requirement for extreme bandwidth, it seems unlikely this would be a discrete GPU on PCIe or any other published bus. I see three possibilities, all centered around maintaining the "unified memory" approach. In order of increasing bandwidth, these are:
- A physically discrete soldered-in proprietary Apple GPU in a separate package, communicating via a proprietary ultra-speed bus.
- Similar to the DRAM, a separate GPU die on the SoC package, using an even faster on-package bus.
- "Chiplet" design GPU. IOW a small die integrated onto the same substrate as the SoC die. This would be necessary because of insufficient transistor budget at 5nm to produce a high-core-count on-die GPU: semiengineering.com/the-good-and-bad-of-chiplets/
In theory any of the above would free up the die space currently occupied by the integrated GPU, making that available for more cores or other IP blocks.
Eventually TSMC's 3nm process should increase areal density to about 250 million transistors per square mm by around 2H2022. If we do the math that implies roughly 45 billion transistors on a 180 mm^2 die, which might enable bringing a higher-core-count GPU back on die. The advantages would be higher speed and improved manufacturing economics: fuse.wikichip.org/news/3453/tsmc-ramps-5...r-square-millimeter/
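Plugging the projected 3nm density into the same hypothetical 180 mm^2 die (both are assumptions, not announced products) gives the on-die total:

```python
# Projected 3nm density times a hypothetical die area.
density_3nm = 250e6     # transistors per mm^2 (2H2022 projection)
die_area = 180          # mm^2, same hypothetical die as above

total = density_3nm * die_area
print(round(total / 1e9))   # -> 45
```

So the product works out to about 45 billion transistors, i.e. nearly triple the M1's 16 billion budget.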
I'm not a semiconductor engineer; this is only speculation.