Question/Request: Box64 x87 reduced-precision info, and comparison with the new "rosettax87" project hack?
Hi,
I'm very new to Box32/64 and found that Box64 supports some kinds of reduced x87 precision.

from docs:
https://github.com/ptitSeb/box64/blob/main/docs/USAGE.md
https://github.com/ptitSeb/box64/blob/main/docs/box64.pod
we have:
BOX64_X87_NO80BITS
BOX64_DYNAREC_X87DOUBLE

I see that BOX64_DYNAREC_X87DOUBLE even allows, and by default uses, single precision where possible:
0: Try to use float when possible for x87 emulation. [Default]
1: Only use Double for x87 emulation.
2: Check Precision Control low precision on x87 emulation.

So my question is: can you share how much performance can be gained, compared with non-reduced x87 precision, on targeted microtests?


I ask because a project for Rosetta appeared recently:
https://github.com/Lifeisawful/rosettax87
which accelerates x87 computation, in some cases by about 10x, at least on M4:
https://github.com/Lifeisawful/rosettax87/issues/2

That is at least the case for the simple sample benchmark shared in that project (it appears to use x87 to compute fsqrt), built with:
clang -v -arch x86_64 -mno-sse -mfpmath=387 ./sample/math.c -o ./build/math

Results:
Rosetta M4:
Average time: 123040 ticks

Rosettax87 M4:
Average time: 12222 ticks

Part of the code:

```
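#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define TIMES 1000000
#define RUNS 10
#define METHOD run_fsqrt

// Not part of the quoted excerpt: two-level expansion so that
// STRINGIFY(METHOD) yields "run_fsqrt" rather than "METHOD"
#define STRINGIFY_IMPL(x) #x
#define STRINGIFY(x) STRINGIFY_IMPL(x)

clock_t run_fsqrt(void) {
    float sixteen = 16.0f;

    clock_t start = clock();

    // Run fsqrt many times to get measurable time
    float four;
    for(int i = 0; i < TIMES; i++) {
        four = __builtin_sqrtf(sixteen);
    }

    clock_t end = clock();
    clock_t time_spent = (end - start);

    // Print the raw bit pattern of the result; memcpy avoids pointer aliasing issues
    uint32_t bits;
    memcpy(&bits, &four, sizeof bits);
    printf("Result: %x\n", bits);
    return time_spent;
}

int main(void) {
    clock_t times[RUNS];
    clock_t sum = 0;

    printf("benchmark %s\n", STRINGIFY(METHOD));

    // Perform multiple runs
    for(int i = 0; i < RUNS; i++) {
        times[i] = METHOD();
        sum += times[i];
        printf("Run %d time: %lu ticks\n", i+1, (unsigned long)times[i]);
    }

    // Calculate average using integer math
    clock_t avg = sum / RUNS;
    printf("\nAverage time: %lu ticks\n", (unsigned long)avg);

    return 0;
}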
```