Question/Request: Box64 x87 reduced-precision info and comparison with the new "rosettax87" project hack?
Hi,
I'm very new to Box32/64 and just found out that Box64 supports some kinds of reduced x87 precision.
From the docs:
https://github.com/ptitSeb/box64/blob/main/docs/USAGE.md
https://github.com/ptitSeb/box64/blob/main/docs/box64.pod
we have:
BOX64_X87_NO80BITS
BOX64_DYNAREC_X87DOUBLE
I see that BOX64_DYNAREC_X87DOUBLE even allows single precision, and defaults to it:
0: Try to use float when possible for x87 emulation. [Default]
1: Only use Double for x87 emulation.
2: Check Precision Control low precision on x87 emulation.
So my question is: can you share how much performance is gained versus full (non-reduced) x87 precision on targeted microbenchmarks?
I ask because a project for Rosetta appeared recently:
https://github.com/Lifeisawful/rosettax87
which accelerates x87 computation by about 10x in some cases, at least on M4:
https://github.com/Lifeisawful/rosettax87/issues/2
That is at least the case for the simple sample benchmark shared in that project (it seems to use x87 for computing fsqrt):
clang -v -arch x86_64 -mno-sse -mfpmath=387 ./sample/math.c -o ./build/math
Rosetta on M4:
Average time: 123040 ticks
Rosettax87 on M4:
Average time: 12222 ticks
Part of the code:
```
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define TIMES 1000000
#define RUNS 10
#define METHOD run_fsqrt

// STRINGIFY is not shown in this excerpt; this two-level definition is an
// assumption so that the macro argument is expanded before stringifying.
#define STRINGIFY_(x) #x
#define STRINGIFY(x) STRINGIFY_(x)

clock_t run_fsqrt() {
    float sixteen = 16.0f;
    clock_t start = clock();
    // Run fsqrt many times to get measurable time
    float four;
    for(int i = 0; i < TIMES; i++) {
        four = __builtin_sqrtf(sixteen);
    }
    clock_t end = clock();
    clock_t time_spent = (end - start);
    // Print the raw bit pattern of the result
    printf("Result: %x\n", *(uint32_t*)&four);
    return time_spent;
}

int main() {
    clock_t times[RUNS];
    clock_t sum = 0;
    printf("benchmark %s\n", STRINGIFY(METHOD));
    // Perform multiple runs
    for(int i = 0; i < RUNS; i++) {
        times[i] = METHOD();
        sum += times[i];
        printf("Run %d time: %lu ticks\n", i+1, times[i]);
    }
    // Calculate average using integer math
    clock_t avg = sum / RUNS;
    printf("\nAverage time: %lu ticks\n", avg);
    return 0;
}
```