...please let me know if you have any idea about what's hanging me up here.
Your main stated issue is less-than-expected perf. improvement when comparing a late 2013 MacBook Pro (2.6 GHz 16GB/1TB, 750M) to an M1 Max MBP when exporting ProRes 422 material to H264.
Maybe you are focusing on CPU graphs, which do not always accurately reflect performance. E.g., if FCP hands off frames to hardware acceleration when encoding, that is not counted as CPU usage. It appears that iStat Menus attributes hardware video acceleration to the GPU, even though it is totally separate. Does it do that for all codecs, or some codecs? I don't know. Unfortunately there are no rules about such things and no perf. tool I'm aware of selectively monitors all combinations of hardware video acceleration separate from the CPU or GPU. There are cases where the hardware is working at full speed but the monitoring tools do not give you visibility into that. The only way to be totally certain of performance is to run timed tests under controlled conditions.
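A timed test can be as simple as a stopwatch from clicking Share until the export finishes. If your export can be started from a command line (some scriptable encoder, for instance), a minimal Python sketch for repeatable wall-clock timing might look like this. The command shown in the usage comment is a placeholder, not a real FCP or Compressor invocation; substitute whatever you actually run.

```python
import statistics
import subprocess
import time


def time_command(cmd, repeats=3):
    """Run cmd `repeats` times and return the wall-clock seconds of each run."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # blocks until the command exits
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (median, spread); a large spread suggests uncontrolled conditions."""
    return statistics.median(durations), max(durations) - min(durations)


# Hypothetical usage: replace the placeholder list with your real export command.
# median, spread = summarize(time_command(["your-export-command", "--your-args"]))
```

Running the same export several times and comparing the median is more trustworthy than a single run, since caching and background activity can skew any one measurement.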
For situations like this, it's important to differentiate between render and encode performance. Even if the timeline is fully rendered there are various situations in FCP where it may not use those render files to expedite export. This blurs the assessment of what is really causing the slowdown: a general render perf. issue, a specific render perf. issue caused by certain plugins or their order in the Fx stack, an issue with encode perf. during export, or an I/O issue when exporting high-bitrate files like 4k ProRes 422.
While you can time render perf. by doing a CTRL+R on selected timeline clips, the time required can vary greatly depending on the Fx and their order in the stack. E.g., if you have an Fx that processes a sliding window of several frames (Neat Video, Flicker Free, etc.), that will force recomputation of all Fx above it in the stack for each frame in the window. Thus it is best to put those Fx first in the stack, ideally using only one.
The only truly reliable way to segregate render vs export performance is to export the timeline as ProRes 422, then re-import that file (which has all Fx baked in) and export it using the final codec such as H264. If reading 4k ProRes 422 and exporting to 4k ProRes 422, on M1 Max it is easy to hit an I/O limit if the disk is not an SSD with over about 2 gigabytes/sec bandwidth. An H264 export is much smaller and involves more compute time (even if hardware accelerated), so you'll rarely hit an I/O bottleneck.
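To see why a ProRes-to-ProRes export can become disk bound, here is a rough back-of-envelope sketch in Python. The bitrate, clip length and export time are illustrative assumptions, not measurements: Apple's published target for ProRes 422 at UHD 23.98 is roughly 470 megabits/sec, but check the actual data rate of your own footage.

```python
def clip_size_gb(bitrate_mbps, duration_s):
    """Approximate file size in gigabytes for a clip at the given bitrate."""
    return bitrate_mbps * duration_s / 8 / 1000  # megabits -> megabytes -> gigabytes


def required_read_gbps(bitrate_mbps, duration_s, export_s):
    """Sustained read rate (gigabytes/sec) needed to finish the export in export_s."""
    return clip_size_gb(bitrate_mbps, duration_s) / export_s


# Assumed values for illustration only.
ASSUMED_BITRATE_MBPS = 470   # approximate ProRes 422 target at UHD 23.98
ASSUMED_CLIP_S = 60          # clip length in seconds
ASSUMED_EXPORT_S = 2.0       # a fast ProRes-to-ProRes export time

size = clip_size_gb(ASSUMED_BITRATE_MBPS, ASSUMED_CLIP_S)
rate = required_read_gbps(ASSUMED_BITRATE_MBPS, ASSUMED_CLIP_S, ASSUMED_EXPORT_S)
print(f"clip size ~{size:.2f} GB, read rate needed ~{rate:.2f} GB/s")
```

With these assumed numbers the source has to be read at well over 1 gigabyte/sec, and the output is written at a similar rate, so a slower disk caps the export no matter how fast the machine is.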
I don't have an older x86 MBP but I have a 10-core Vega 64 iMac Pro and an M1 Max MBP 16. I did several tests with both running Monterey 12.2.1, FCP 10.6.1 and Resolve Studio 17.4.5. Both machines have extremely fast external 4-drive Thunderbolt SSD RAID-0 arrays, and media is on those with export files on the internal SSD.
Summary: in general FCP is very fast and for certain workflows the perf. improvements on M1 Max can be significant. Usually FCP on M1 Max handles most "difficult" codecs much more smoothly than the iMac Pro. The fastest FCP M1 Max transcoding case is creating 50% ProRes Proxies from 4k ProRes 422. It is incredibly fast, with peak I/O rates of over 1.3 gigabytes/sec. However, there are several cases where FCP has some performance problems.
One of these has long been known: extremely slow 4k 10-bit HEVC export when using the built-in HEVC export preset to a .M4V file. That unfortunately continues on M1 Max. While it is 1.5x faster than an iMac Pro, it is nonetheless 25x (!!!) slower than Resolve Studio 17.4.5 on the same M1 Max machine, exporting with the same HEVC parameters. That's not new to M1 Max; Resolve is about 20x faster than FCP on iMac Pro when exporting to 10-bit HEVC, if FCP is using the built-in preset. However, if you simply export that to Compressor and create a custom preset with the same resolution, bit depth and bit rate, it is fast. Apple obviously needs to fix that.
There is a known FCP performance bug on M1 Max (maybe all Apple Silicon) where creating 50% H264 proxies from 4k originals is abnormally slow. All other H264 proxy resolutions are fast and creating ProRes Proxies is extremely fast.
If you are using plugins like Neat Video, the latest versions have significantly improved performance on Apple Silicon. If you are using 3rd-party audio plugins, some of those that have not been updated for Apple Silicon may run in an x86 container process under Rosetta. There could be performance or reliability issues with that.
Other tests below:
** 4k/23.98 ProRes 422 export to 4k/23.98 "fast" H264 (60 sec clip, no Fx) **
(all pre-rendering and caching disabled in all cases)
iMac Pro, FCP 10.6.1: 29.3 sec
iMac Pro, Resolve Studio 17.4.5: 29.2 sec
M1 Max, FCP 10.6.1: 18.9 sec
M1 Max, Resolve Studio 17.4.5: 17.4 sec
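To put those numbers in perspective, here is a small Python sketch that turns the four H264 export times above into a speedup (iMac Pro time divided by M1 Max time) and a multiple of real time for the 60 sec clip. The dictionary layout and function names are just for illustration.

```python
# Export times in seconds, copied from the 60 sec ProRes 422 -> H264 test above.
times = {
    ("iMac Pro", "FCP"): 29.3,
    ("iMac Pro", "Resolve"): 29.2,
    ("M1 Max", "FCP"): 18.9,
    ("M1 Max", "Resolve"): 17.4,
}
CLIP_SECONDS = 60


def speedup(app):
    """How many times faster M1 Max finished than iMac Pro for the given app."""
    return times[("iMac Pro", app)] / times[("M1 Max", app)]


def realtime_multiple(machine, app):
    """Clip duration divided by export time (2.0 means twice real time)."""
    return CLIP_SECONDS / times[(machine, app)]


for app in ("FCP", "Resolve"):
    print(f"{app}: M1 Max is {speedup(app):.2f}x faster, "
          f"{realtime_multiple('M1 Max', app):.2f}x real time")
```

For this test the M1 Max comes out roughly 1.5x to 1.7x faster than the iMac Pro, and both apps on M1 Max encode at a bit over 3x real time.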
** 4k/23.98 ProRes 422 export to 4k ProRes 422 (no Fx) **
iMac Pro, FCP 10.6.1: 8.9 sec
iMac Pro, Resolve Studio 17.4.5: 2.9 sec
M1 Max, FCP 10.6.1: 3.6 sec
M1 Max, Resolve Studio 17.4.5: 1.9 sec