...which export settings in FCPX and Compressor are actually using the H264/HEVC/ProRes engines in my M1Max...
In general the FCP H264/HEVC/ProRes export settings should use the hardware accelerator(s). One exception is the built-in 10-bit HEVC "Apple Devices" preset, which apparently does not use it, but a similar preset that does can be defined in Compressor.
The observed performance implies one decoder and one encoder are being used on ProRes, H264 and HEVC.
However, there are apparently issues with FCP/Compressor on the M1 Max and M1 Ultra not fully using the multiple accelerators. E.g., while the M1 Pro has one H264/HEVC decode and one encode accelerator, the M1 Max has two H264/HEVC encoders plus two ProRes engines, and the M1 Ultra has two decoders, four encoders plus four ProRes engines. Yet I don't think the M1 Max is significantly faster at encoding H264/HEVC than the M1 Pro, and I know the M1 Ultra is no faster on that task than the M1 Max because I have both machines and tested it.
The M1 Ultra seems only slightly faster than the M1 Max at ProRes-to-ProRes transcoding, e.g., creating 50% ProRes Proxies from 4K ProRes 422. In theory it should be 2x faster, since it has double the accelerators and the job is not otherwise I/O or CPU limited. Both the M1 Max and Ultra are extremely fast at ProRes decode/encode, but the M1 Ultra is not fully harnessing all four ProRes accelerators.
H264 and HEVC are "Long GOP" formats, and in many cases each GOP (Group Of Pictures) is totally independent. Thus a pool of worker threads can employ multiple accelerators, each on a separate GOP. It appears this is not happening on current versions of macOS and FCP. There are Long GOP formats where each GOP is dependent on others, and this could place a limit on parallelism, but I'm not sure how to identify those.
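To make the worker-pool idea concrete, here is a rough Python sketch. It is only an illustration of the general technique, not how FCP or macOS actually schedule work. The names `split_into_gops`, `encode_gop` and `parallel_encode` are hypothetical, and `encode_gop` is a stand-in that just tags frames with an engine number instead of calling any real hardware encoder. It also assumes every GOP is independent of the others.

```python
from concurrent.futures import ThreadPoolExecutor
import queue


def split_into_gops(frames, gop_size):
    """Split a list of frames into consecutive GOPs of at most gop_size frames."""
    return [frames[i:i + gop_size] for i in range(0, len(frames), gop_size)]


def encode_gop(engine_id, gop):
    """Hypothetical stand-in for a hardware encode call.

    It only records which engine handled each frame.
    """
    return [(engine_id, frame) for frame in gop]


def parallel_encode(frames, gop_size, num_engines):
    """Encode independent GOPs concurrently, one GOP per free engine."""
    gops = split_into_gops(frames, gop_size)

    # Pool of available engine ids; a worker borrows one per GOP.
    engines = queue.Queue()
    for engine_id in range(num_engines):
        engines.put(engine_id)

    def work(gop):
        engine_id = engines.get()      # wait for a free engine
        try:
            return encode_gop(engine_id, gop)
        finally:
            engines.put(engine_id)     # hand the engine back

    with ThreadPoolExecutor(max_workers=num_engines) as pool:
        # pool.map returns results in input order, so GOP order is preserved.
        encoded_gops = list(pool.map(work, gops))

    return [frame for gop in encoded_gops for frame in gop]


result = parallel_encode(list(range(10)), gop_size=4, num_engines=2)
```

With independent GOPs, nothing in this scheme stops four engines from working on four GOPs at once. If GOPs depended on each other, `work` would need to wait on its neighbours and the speedup would shrink.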
I suspect there is some issue at either a macOS system layer or the FCP application layer (or both) concerning thread reentrancy. Up until the M1 Max there was only a single hardware accelerator per CPU (not per core). E.g., all Intel x86 CPUs have only a single Quick Sync unit.
When a single hardware unit (either CPU core or accelerator) has long existed, software gets written under that assumption. When new designs then make multiple units available, this frequently exposes reentrancy issues. Of course Apple has long known the M1 Max and Ultra were coming, so I don't know why they did not coordinate between hardware and software from the outset.
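As a toy illustration of that kind of reentrancy problem (purely hypothetical, and not a claim about how FCP or macOS is actually written), compare an encoder wrapper that keeps its per-job state on one shared object with one that hands each job its own context. The class names `SharedStateEncoder` and `PerJobEncoder` are made up for this sketch.

```python
class SharedStateEncoder:
    """Written assuming one hardware unit: job state lives on the shared object."""

    def __init__(self):
        self.current_job = None

    def begin(self, job):
        self.current_job = job

    def finish(self):
        return f"encoded:{self.current_job}"


class PerJobEncoder:
    """Reentrant variant: each job carries its own context."""

    def begin(self, job):
        return {"job": job}

    def finish(self, ctx):
        return f"encoded:{ctx['job']}"


# Two jobs overlap, as they would with two accelerators running at once.
shared = SharedStateEncoder()
shared.begin("clip_A")
shared.begin("clip_B")          # overwrites clip_A's state before it finishes
shared_results = [shared.finish(), shared.finish()]

safe = PerJobEncoder()
ctx_a = safe.begin("clip_A")
ctx_b = safe.begin("clip_B")    # independent context, nothing is overwritten
safe_results = [safe.finish(ctx_a), safe.finish(ctx_b)]
```

In the shared-state version both results refer to `clip_B`, because the second job clobbered the first. The usual quick fix is a lock around the whole thing, which restores correctness but serialises the work, and that would look exactly like extra accelerators sitting idle.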
You'd think that, having made the media engines such a centre of attention for marketing, they would (a) make sure they're fully utilised and (b) let you know when you're using them, increasing the envy factor for people who haven't upgraded yet.