For Macs with 8 or more performance cores, we select CPU+GPU with the ORIGINAL attention implementation. Otherwise, we select CPU+ANE with SPLIT_EINSUM. Some machines, such as an M1 Pro with a 16-core GPU, may perform slightly better using CPU+GPU+ANE with SPLIT_EINSUM.
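The selection rule above can be sketched as a small function. This is only an illustration of the heuristic as described, not the actual implementation: the names `Configuration`, `select_configuration`, and `PERFORMANCE_CORE_THRESHOLD` are hypothetical, and the string labels simply mirror the compute-unit and attention names used in the text. The sketch does not model the M1 Pro exception, which would have to be decided by benchmarking on the specific machine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Configuration:
    """Hypothetical container for the two choices described in the text."""
    compute_units: str  # "CPU_AND_GPU" or "CPU_AND_NE"
    attention: str      # "ORIGINAL" or "SPLIT_EINSUM"


# Threshold taken from the text: 8 or more performance cores favors the GPU.
PERFORMANCE_CORE_THRESHOLD = 8


def select_configuration(performance_cores: int) -> Configuration:
    """Return the default configuration for a given performance-core count."""
    if performance_cores >= PERFORMANCE_CORE_THRESHOLD:
        # Larger chips: run on CPU+GPU with the original attention.
        return Configuration("CPU_AND_GPU", "ORIGINAL")
    # Smaller chips: run on CPU+ANE with split einsum attention.
    return Configuration("CPU_AND_NE", "SPLIT_EINSUM")


if __name__ == "__main__":
    for cores in (4, 6, 8, 12):
        print(cores, select_configuration(cores))
```

Keeping the threshold in a single constant makes it easy to adjust if measurements on newer hardware suggest a different cutoff.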