I couldn't find anything suggesting that spec2006 was compiled with Intel's compiler for Core M and LLVM for A9X. It seems far more probable that the same compiler was used for both. Do you have a link?
I remember we had a discussion in the AnandTech forums about precisely this issue. The thing is, ICC does high-level optimizations (at the AST or DAG level) that prepare the code for SIMD parallelism, and it emits AVX instructions, where LLVM does not. The outcome is that you essentially compare the performance of the AVX units in x86 with the integer units in ARM, because no NEON code was generated. There were other issues, like different pointer aliasing options being given to the compiler. What AnandTech did was simply ask both Intel and Apple for their best options for SPEC and use those.
That would all be fine if they had put a disclaimer in the article saying that they were comparing compilers more than architectures, but alas they did not.
What really matters is the design of the silicon behind the ISA, and there aren't many differences left in that area between ARM and x86.
Agreed, but you can only work within the given restrictions. For example, Intel will never go to a weakly ordered memory model, because that would break compatibility. They still have the translation to micro-ops with all its downsides, like caching pre-decoded instructions, which are subject to pre-decode misses, etc. Other examples would be the smaller GP register set (16 vs 32), variable instruction length, no proper 3-operand instructions, etc. There are lots of things holding x86/x64 back.
You might also watch the interview with Jim Keller:
https://www.youtube.com/watch?v=SOTFE7sJY-Q
He clearly states that you can get more performance and/or higher efficiency with ARM. Keep in mind that his team designed an ARM and an x86 architecture in parallel.
Using the same techniques, we'd still reach 50% of native performance, but those 50% now would just represent a lot more computing power than it did back then.
Well, 50% of the CPU performance of an Atom on average, when emulated on a Snapdragon 835, is something I would call best case/impressive.
However, there are few applications that exclusively run their own user code. Even when an application is emulated, all its calls into the OS/app framework are translated into native calls.
For example, when you select a menu item in a Win32 application, only a few lines of (potentially emulated) application code are involved in this task, along with a lot of native code in the app framework.
There are only a few cases where application code dominates, like when you run a video editor and apply a video effect.
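To put rough numbers on this, here is a back-of-the-envelope sketch in Python. The 50% figure is the one quoted above. The function name and the splits between application code and framework code are made up for illustration, they are not measurements.

```python
def effective_speed(app_fraction, emulated_speed=0.5):
    """Overall speed relative to an all-native run (1.0 = native).

    app_fraction:   share of the native run time spent in the application's
                    own code, which has to be emulated (hypothetical input)
    emulated_speed: speed of emulated code relative to native code; 0.5 is
                    the 50% figure quoted in this thread
    """
    if not 0.0 <= app_fraction <= 1.0:
        raise ValueError("app_fraction must be between 0 and 1")
    if emulated_speed <= 0.0:
        raise ValueError("emulated_speed must be positive")
    native_time = 1.0 - app_fraction               # OS/framework calls stay native
    emulated_time = app_fraction / emulated_speed  # emulated app code takes longer
    return 1.0 / (native_time + emulated_time)


# Made-up splits: a menu click is mostly framework code,
# a video effect is mostly application code.
for label, fraction in [("menu click", 0.05),
                        ("mixed workload", 0.5),
                        ("video effect", 0.95)]:
    print(f"{label}: {effective_speed(fraction):.0%} of native speed")
```

With these assumed splits, the menu click ends up at roughly 95% of native speed, the mixed workload at about 67%, and the video effect at about 51%. So the 50% penalty only really shows in the cases where application code dominates.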