Implement Intel style cross-cacheline atomics with ARM TME We don't currently have a CPU architecture that supports TME but we should track this. Any atomic operation that crosses a cacheline will fault on ARM. We then need to do two CAS loops to implement this. Problem is that this can tear. Intel works around this problem by supporting "Split locks" AMD should also just tear in this situation. Once we have an architecture that supports TME, use it to hold ownership of both sides of the cacheline and do the ops to ensure no tearing.