blob: ff7b7235eb3c8d3581b286e9857836e8f7205b5b (
plain) (
blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
|
Question/Request: FEX x87 reduced precison info and comparison vs new "rosettax87" project hack?
Hi,
just very new to FEX and finding that FEX supports some kind of x87 reduced precision..
searching google I find references everywhere but not much info:
a)https://www.reddit.com/r/AsahiGaming/comments/1hv1cix/games_that_perform_better_with_reduced_x87/
b)https://fex-emu.com/FEX-2310/:
"Until then it is recommended to use FEX’s x87 reduced precision mode to try and alleviate some of the overhead."
c)https://fex-emu.com/FEX-2503/
"This allows to to see that maybe we should try enabling the x87 reduced precision option to get some performance back."
and in some forum I found:
FEX_X87REDUCEDPRECISION=1
so that's the enviroment variable to add, right?
also what perf can we gain vs non reduced on targeted microtests? what kind of reduction precision loss?
sorry if I don't know where to search for docs..
I say this because for Rosetta since recently we have project:
https://github.com/Lifeisawful/rosettax87
which acceleretes x87 computation in some cases 10X at least on M4:
https://github.com/Lifeisawful/rosettax87/issues/2
at least on this simple shared benchmark calculation fsqrt seems:
clang -v -arch x86_64 -mno-sse -mfpmath=387 ./sample/math.c -o ./build/math
at least
Rosetta M4:
Average time: 123040 ticks
Rosettax87 M4:
Average time: 12222 ticks
part of code:
```
#define TIMES 1000000
#define RUNS 10
#define METHOD run_fsqrt
clock_t run_fsqrt() {
float sixteen = 16.0f;
clock_t start = clock();
// Run fsqrt many times to get measurable time
float four;
for(int i = 0; i < TIMES; i++) {
four = __builtin_sqrtf(sixteen);
}
clock_t end = clock();
clock_t time_spent = (end - start);
printf("Result: %x\n", *(uint32_t*)&four);
return time_spent;
}
int main() {
clock_t times[RUNS];
clock_t sum = 0;
printf("benchmark %s\n", STRINGIFY(METHOD));
// Perform multiple runs
for(int i = 0; i < RUNS; i++) {
times[i] = METHOD();
sum += times[i];
printf("Run %d time: %lu ticks\n", i+1, times[i]);
}
// Calculate average using integer math
clock_t avg = sum / RUNS;
printf("\nAverage time: %lu ticks\n", avg);
return 0;
}
```
|