blob: 3a48cf26fd414a66cf43540e53e9ff9ff5a2ab44 (
plain) (
blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
|
Can we optimize non-locking RMW atomic operations?
Currently we convert all lock RMW ops to acquire-release semantics.
Couple weird things to investigate here
1. Basic ALU ops without lock
- Non-lock ops get turned in to load + ALU + store
- Can potentially convert in to atomic memory operation **without** acquire-release semantics.
- Should only generate on ARMv8.1+ if it supports atomic memory ops
- Might need hardware TSO support?
2. RMW ops that don't imply LOCK but really should, used without LOCK
- **CMPXCHG, CMPXCHG8B, CMPXCHG16B, XADD**
- These instructions don't imply LOCK prefixes but they are almost universally used with them
- Linux kernel has some optimization where it backpatches `lock cmpxchg` in to `nop cmpxchg` on uniprocessors? Citation needed.
- These might be able to be converted to operations with...release? semantics?
- Needs investigation.
|