9 files changed, 275 insertions, 0 deletions
diff --git a/results/scraper/fex/documentation/1202 b/results/scraper/fex/documentation/1202
new file mode 100644
index 000000000..61ab04dba
--- /dev/null
+++ b/results/scraper/fex/documentation/1202
@@ -0,0 +1,45 @@
+pressure-vessel upstream requirements
+FEX needs some features in pressure-vessel in order to work correctly.
+This is because pressure-vessel manipulates the real filesystem, which means that FEX can't transparently hide every aspect of its emulation.
+
+- pressure-vessel needs to check if running in a FEX environment. 
+  - Check the CPU model name reported via CPUID; under FEX it looks like `model name      : FEX-2108-21-ge90892a2`
+    - Format is `FEX-<YYMM[.<Minor rev>]>[-<Commits since last tag>-<CurrentCommit>]`
+    - eg: Released Tag `FEX-2108`
+    - eg: Released Tag with minor rev `FEX-2108.1`
+    - eg: Commit that isn't in a release, aka building from origin/main `FEX-2108-21-ge90892a2`
+    - eg: Future release `FEX-2109`
+    - Minor revisions haven't occurred in FEX yet
+    - Fields are always separated by dashes.
+    - `FEX-<YYMM>` will always exist, the rest are optional.
+- Once determined to be in a FEX environment, use the `FEXGetConfig` tool to find the currently configured rootfs
+  - PR #1204 Adds this configuration program.
+    - Exists since `FEX-2109` tagged revision
+  - `FEXGetConfig --current-rootfs` returns the mounted rootfs location when the rootfs is a squashfs image
+    - Or the folder that the rootfs lives in if it is not squashfs
+  - `FEXGetConfig --current-rootfs-lock` To return the `lock` file to keep rootfs active.
+    - Necessary for new FEX instances to find the squashfs mount
+    - Will be in `/tmp/`
+  - `FEXGetConfig --current-rootfs-socket` To return the current UNIX domain socket for pipes watching
+    - Necessary to keep the FEXMountDaemon active while it tracks FEX instances
+    - Will be in `/tmp/`
+    - Only exists beginning at `FEX-2109-<X>`
+  - `FEXGetConfig --install-prefix` Will let you find where the FEX libraries are installed. Not everyone wants to install to /usr
+  - Optional `--app <Filename>` to get app profile configuration as well
+    - Usually not necessary, but future proofing will let us use this
+  - `FEXGetConfig --version` - Returns the same string as CPUID
+    - This alone does not guarantee that you are running in a FEX environment; you still need to check CPUID. Be careful of that.
+  - Pressure-vessel should pull in $ROOTFS/usr/lib64 and $ROOTFS/usr/lib32 instead of true host folders
+    - Necessary since FEX may mount the rootfs in /tmp as squashfs or exist in ~/.fex-emu/RootFS or anywhere else
+    - Real host will not have any x86-64 or x86 libraries in the host root
+  - Also pull in $prefix/share/fex-emu/ and $prefix/lib/fex-emu/
+    - Necessary for thunk support
+    - Also need /lib/aarch64-linux-gnu/ for thunks
+  - $ROOTFS/etc?
+    - I'm not sure if this matters. 
+  - Pull in $prefix/FEXInterpreter for executing without binfmt_misc installed
+    - This can happen when testing on both an x86-64 and aarch64 host
+    - Since this is a hardlink to FEXLoader, special care might need to be taken? Not sure if a symlink to a hardlink exposes the original path or not.
+- Once in the chroot, set `FEX_ROOTFS=''`, since the new root is a true x86 environment.
+  - This overrides the rootfs that FEX is using with nothing. This is necessary; otherwise some things break.
+  - pressure-vessel configures its rootfs in such a way that this works.
\ No newline at end of file
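A minimal Python sketch of the detection step described in documentation/1202 above, assuming the model name has already been read as text (for example from `/proc/cpuinfo`). The names `parse_fex_version`, `model_name_from_cpuinfo`, `supports_get_config`, and `FexVersion` are hypothetical helpers for illustration, not part of FEX or pressure-vessel, and the commit field is assumed to be purely alphanumeric, as in the `ge90892a2` example.

```python
import re
from typing import NamedTuple, Optional

# FEX-<YYMM[.<Minor rev>]>[-<Commits since last tag>-<CurrentCommit>]
_FEX_VERSION = re.compile(
    r"^FEX-(?P<yymm>\d{4})"
    r"(?:\.(?P<minor>\d+))?"
    r"(?:-(?P<commits>\d+)-(?P<commit>[0-9A-Za-z]+))?$"
)


class FexVersion(NamedTuple):
    yymm: int                # release tag, always present
    minor: int               # 0 when no minor revision is given
    commits_since_tag: int   # 0 for a tagged release
    commit: Optional[str]    # None for a tagged release


def model_name_from_cpuinfo(text: str) -> Optional[str]:
    """Return the first 'model name' value found in cpuinfo-style text."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            return value.strip()
    return None


def parse_fex_version(model_name: str) -> Optional[FexVersion]:
    """Parse a FEX model name; return None if it is not a FEX string."""
    m = _FEX_VERSION.match(model_name.strip())
    if m is None:
        return None
    return FexVersion(
        yymm=int(m["yymm"]),
        minor=int(m["minor"] or 0),
        commits_since_tag=int(m["commits"] or 0),
        commit=m["commit"],
    )


def supports_get_config(version: FexVersion) -> bool:
    """FEXGetConfig exists since the FEX-2109 tagged revision."""
    return version.yymm >= 2109
```

A caller would treat a `None` result as "not running under FEX" and only then fall back to the normal host paths.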
diff --git a/results/scraper/fex/documentation/145 b/results/scraper/fex/documentation/145
new file mode 100644
index 000000000..41bdf3591
--- /dev/null
+++ b/results/scraper/fex/documentation/145
@@ -0,0 +1,15 @@
+FEX-EMU infrastructure
+We have
+- [fex-emu.org](fex-emu.org)
+- [Google Calendar](https://calendar.google.com/calendar?cid=MWtyZTFtYzZzZnJwYm4zM2YyajQxdHFrdWNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ)
+- [Google Drive Share (RO)](https://drive.google.com/open?id=1tWQbqYd4lJH54VVnMWaSAoEyP_ac0_Sa)
+- email forwards for skmp, phire, hdkr @fex-emu.org
+- [chat.fex-emu.org redirect](https://chat.fex-emu.org)
+
+We need
+- A bot that auto-merges PRs approved or made by @FEX-Emu/maintainers
+- A non-hello-world website (See https://github.com/FEX-Emu/fex-emu.org/issues/1)
+
+In the future
+- Register our trademark/namemark (@skmp can look into pricing in Greece for worldwide registration, so we have an idea of the costs involved)
+- A formal entity to represent us (@phire suggested looking into the NZ options to do this, so we have an idea of the costs and complexity involved)
\ No newline at end of file
diff --git a/results/scraper/fex/documentation/1682 b/results/scraper/fex/documentation/1682
new file mode 100644
index 000000000..e2e345cdc
--- /dev/null
+++ b/results/scraper/fex/documentation/1682
@@ -0,0 +1,61 @@
+Signal Handling
+Splitting from #1558 & #1677, as well as discussions with @neobrain  and @Sonicadvance1.
+
+## The issues
+(a) Signals can interrupt the JIT compiler, syscalls, other FEX-related code, 3rd party libraries, or thunked libraries, none of which are guaranteed to be signal re-entrant safe. Any code that touches non-stack memory or uses mutexes is possibly not signal safe. We currently block signals around some code, either using `ScopedSignalMaskWith*` guards or manually (e.g. the dispatcher disabling signal handling around calls to CompileCode).
+
+(b) Signals can interrupt the translated code in the middle of operations that would normally be atomic with respect to signals. This may or may not be a problem, depending on how we have implemented x86. A good example is REP* operations. This can be an issue even without LSE elimination, as the recovered guest state might be "torn".
+
+(c) Similar to above, signals can interrupt the translated code in places where we can't recover the guest architectural state, due to optimisations.
+
+(d) Similar to above, synchronous signals might be generated which need to recover a full context and cannot be deferred.
+
+Group 1: From x86 instructions
+- SIGSEGV (memops, permissions / unmapped memory)
+- SIGBUS (memops, mapping past end of file)
+- SIGFPE (all floating point exceptions? Integer overflow too?)
+
+Group 2: Handled from the x86 frontend
+- SIGILL (not handled instruction)
+- SIGTRAP (breakpoint, `int3` or `int 0x3`)
+- SIGEMT (not generated)
+
+Group 3: Generated from system calls
+- SIGSYS (Bad system call, SVr4; seccomp)
+- SIGABRT (raise / __pthread_kill / kill; others?)
+
+(e) Signal latency. Whenever we block signals, like we do around `::CompileCode` or with the signal + mutex lock guards, signal delivery gets delayed. This is mostly a concern for long-standing/non-constant-time signal blocking, like around `::CompileCode` (which can take 10+ milliseconds with complex blocks). There is an argument to be made that we should compile blocks faster, though that will never 100% solve the issue. Also, signal handlers can be delayed while code for them is getting compiled, particularly during their first run.
+
+(f) With deferred signals the opposite problem also appears: we consume the signal too fast. I'm not sure if this results in an extra signal possibly being queued while a signal is deferred. Also, the signal might appear 'dequeued' to the sender while it is still 'pending' in FEX, which might lead to some guest instructions running (a bit of 'execution overshoot'), a condition that can be detected but is extremely unlikely to matter to the guest.
+
+(g) While signal delivery is not guaranteed to happen at any speed, lovely features like signal queue merging, which can lose information about the delivered signals, can uncover bugs / assumptions made in the guest code.
+
+## Current status
+Our current "signal safety strategy" for (a) is to sprinkle signal disabling code around regions that deadlock. This is very inconsistent throughout the codebase, and there are several bugs waiting to be hit. In general, this is a compromise between "likely to lock up" and "performant code".
+
+For (b) and (c) we currently only partially recover the guest architectural state, store it alongside the host architectural state, and hope the guest code doesn't care too much about the contents of the guest state, and that it will not modify it. We depend on returning to the interrupted host code using the stored host architectural state, in order to resume execution in the middle of any torn instructions, and eventually exit from some point with a valid guest state. This poses another limitation: the interrupted block cannot be discarded from the code cache, so the code cache cannot be cleared. This might also have further implications around SMC and code invalidations.
+
+## Proposed solutions
+For (a) I'd like us to have clear guidelines on how to handle this, as well as a mode that might be slower but offers guaranteed stability. This needs some thought, but is not too hard.
+
+For (b) and (c) the only viable solution I can think of is a combination of deferring the signal delivery until we have a fully recoverable guest state, and storing metadata that can help us exit from the middle of a block. (c) Can be avoided by limiting store elimination from LSE and disabling DSE. We can have a tradeoff between "defer delay" vs "runtime performance".
+
+For (d.1), we'll need special state flushing semantics and/or recovery metadata and/or exit blocks in instructions that may cause them. This requires extra caution around SRA.
+
+For (d.2), the frontend can take care of everything.
+
+For (d.3), we can likely merge it with the syscall handling case of (a).
+
+For (e) we can implement some form of 'aborts' for long-running cases with blocked signals, i.e. early exits during `::CompileCode`, or even `conditional aborts`, i.e. temporarily pausing the execution but only aborting if re-executed before getting resumed.
+
+For (f), we can modify the behavior of syscalls where signal queueing status can be detected, and make them take actual signal delivery by FEX to the guest into account. This cannot be perfect during guest/host process interop.
+
+For (g), we can implement 'user mode queueing', possibly on top of (f), to get closer to native guest behaviour.
+
+(e) + (f) + (g) are edge-case behaviors that are unlikely to matter in practice, and can mostly get triggered by compilation stutter completely altering the expected timing of the guest application.
+
+## Related Tickets
+#518, #650, #1228, #1666
+
+## Other information
+Unity depends on at least graceful handling of asynchronous SIGPWR, SIGXCPU (GC, loose context requirements) and SIGSEGV w/ null pointers (NullReferenceException generation, strict context requirements).
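The trade-off between issue (a) and issue (e) in documentation/1682 above can be illustrated with a small Python sketch of a scoped signal-mask guard. It is only an analogue of the `ScopedSignalMaskWith*` idea, not FEX's C++ implementation; `scoped_signal_mask` and `demo_deferred_delivery` are hypothetical names, and the demo assumes it runs on the main thread of a Unix process.

```python
import contextlib
import signal
import threading


@contextlib.contextmanager
def scoped_signal_mask(signals):
    """Block `signals` for the calling thread while the body runs."""
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, signals)
    try:
        yield
    finally:
        # Restoring the old mask lets any pending signal be delivered.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


def demo_deferred_delivery():
    """Show that a signal raised inside the guard is delayed, not lost."""
    delivered = []
    old_handler = signal.signal(
        signal.SIGUSR1, lambda signum, frame: delivered.append(signum)
    )
    try:
        with scoped_signal_mask({signal.SIGUSR1}):
            # Stand-in for a non-reentrant region such as code compilation.
            signal.pthread_kill(threading.get_ident(), signal.SIGUSR1)
            pending_inside = signal.SIGUSR1 in signal.sigpending()
            ran_inside = bool(delivered)
        ran_after = bool(delivered)
    finally:
        if old_handler is not None:
            signal.signal(signal.SIGUSR1, old_handler)
    return pending_inside, ran_inside, ran_after
```

Inside the guard the signal is pending but the handler has not run; it runs only once the guard exits. The longer the guarded region, the larger the delivery latency described in (e).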
diff --git a/results/scraper/fex/documentation/1697 b/results/scraper/fex/documentation/1697
new file mode 100644
index 000000000..8c12e8e00
--- /dev/null
+++ b/results/scraper/fex/documentation/1697
@@ -0,0 +1,26 @@
+Fork safety
+We need to guarantee that forks under FEX are no less safe than forks of the native program itself.
+
+In theory, multithreaded programs are not supposed to fork. In practice, it happens quite often.
+
+Complications:
+
+### Deadlocks
+
+If a thread other than the one doing the fork holds any kind of lock, then that lock will never be unlocked in the forked program. Also, some locks may not gracefully handle forks (see: #1681). These issues also extend to any libraries FEX uses, including C/C++ runtimes, jemalloc, and others.
+
+A partial solution to this is for the forking thread to own all the locks before fork, and for us to use custom lock implementations that are guaranteed to work across forks.
+
+Currently, this is only done for the `FileManager` mutex. #1558 adds a generic place to do this for the syscalls handlers, and we need to go over all our os/frontend mutexes and add them there.
+
+FEXCore itself doesn't have a callback for forks yet; that needs to be looked at as well.
+
+This could also be solved semi-automatically if we keep a list of all locks, possibly as part of our own custom lock type, though we need some way to define locking interdependencies and also interop with foreign lock types.
+
+This is written with mutexes in mind, but it applies to all synchronization primitives, possibly including condvars, futexes, atomics, etc.
+
+### Memory Leaks
+
+When a thread forks, we need to drop all memory used by other threads. I haven't investigated to what extent that is done, and a possible efficient mechanism might be per thread memory pools with `madvise(MADV_DONTFORK)` and/or de-allocation callbacks to be called post-fork.
+
+@phire worked on this previously, #889 contains some more information.
\ No newline at end of file
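The "forking thread owns all the locks before fork" idea from documentation/1697 above, combined with the "keep a list of all locks" suggestion, can be sketched in Python with `os.register_at_fork`. This is an illustration under assumptions, not FEX's mechanism: `ForkSafeLocks` is a hypothetical registry, and acquiring locks in registration order is just one simple way to impose a fixed ordering.

```python
import os
import threading


class ForkSafeLocks:
    """Registry whose locks are all held by the forking thread across fork()."""

    def __init__(self):
        self._locks = []
        os.register_at_fork(
            before=self._acquire_all,
            after_in_parent=self._release_all,
            after_in_child=self._release_all,
        )

    def new_lock(self):
        """Create a lock that takes part in the fork protocol."""
        lock = threading.Lock()
        self._locks.append(lock)
        return lock

    def _acquire_all(self):
        # Fixed (registration) order, so two forking threads cannot deadlock
        # against each other while collecting the locks.
        for lock in self._locks:
            lock.acquire()

    def _release_all(self):
        # Runs in both parent and child, so neither inherits a held lock.
        for lock in reversed(self._locks):
            lock.release()
```

Because the forking thread waits for every registered lock before the fork happens, no other thread can be in the middle of a critical section when the child's copy of memory is taken. Foreign lock types that are not in the registry are not covered, which matches the interop caveat in the text.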
diff --git a/results/scraper/fex/documentation/1746 b/results/scraper/fex/documentation/1746
new file mode 100644
index 000000000..6d22aa418
--- /dev/null
+++ b/results/scraper/fex/documentation/1746
@@ -0,0 +1,13 @@
+Self Modifying Code (SMC) Support
+#### Overview
+x86 has a fully coherent icache, and in some models, the prefetch queue and OoO buffers are also coherent (citation needed, unit test pending).
+
+This means that in order to be fully correct we need to detect code changes when they happen, execute only new code after the write completes, across all threads, and that no thread should observe any other thread running stale code.
+
+FEX currently supports 4 modes of SMC
+- No support
+- Mman (Invalidation around mmap, munmap, etc. APIs)
+- Mtrack (Mman + segfault based detection of changes)
+- Full (Disables most cross-block optimisations, checks every instruction for modification before executing it)
+
+(@skmp todo: fill with more information about what actually is supported, next tasks, related tickets, etc)
\ No newline at end of file
diff --git a/results/scraper/fex/documentation/1890 b/results/scraper/fex/documentation/1890
new file mode 100644
index 000000000..d76590dbb
--- /dev/null
+++ b/results/scraper/fex/documentation/1890
@@ -0,0 +1,35 @@
+Documentation Home
+## Quick Links
+- [Code Of Conduct](https://github.com/FEX-Emu/FEX/blob/main/CODE_OF_CONDUCT.md)
+---
+- [Next Project Milestone](https://github.com/orgs/FEX-Emu/projects/4/views/1)
+---
+- [Release & Milestones](https://github.com/FEX-Emu/FEX/milestones?direction=asc&sort=due_date&state=open)
+- [Unstaged Tickets](https://github.com/FEX-Emu/FEX/issues?q=is%3Aopen+is%3Aissue+no%3Amilestone+-label%3Augc+-label%3Adocumentation+-label%3Adiscussion+-label%3Aproposal)
+- [Code Review for Merges](https://github.com/FEX-Emu/FEX/pulls?q=is%3Apr+is%3Aopen+-is%3Adraft)
+- [Code Review for Proposals](https://github.com/FEX-Emu/FEX/pulls?q=is%3Apr+is%3Aopen+is%3Adraft)
+---
+- [Read Me](https://github.com/FEX-Emu/FEX/blob/main/README.md)
+
+## Get In Touch
+- [Instant Messaging](https://discord.gg/fexemu)
+- [Discussion Forum](https://github.com/FEX-Emu/FEX/discussions)
+
+## Contributing 
+- CONTRIBUTING.MD (TBD)
+- [Release & Milestone Planning](https://github.com/FEX-Emu/FEX/milestones?direction=asc&sort=due_date&state=open)
+
+## High Level View
+- Signal Handling: #1682 
+- Self Modifying Code: #1746
+- Fork Safety: #1697
+- AOT & IR Caching: #828 (very outdated)
+- neobrain's thunking planning: https://github.com/neobrain/FEX/projects/1
+
+## Code Structure
+- [Source Outline](https://github.com/FEX-Emu/FEX/blob/main/docs/SourceOutline.md)
+
+## Infrastructure
+- Infra: #145
+
+
diff --git a/results/scraper/fex/documentation/1908 b/results/scraper/fex/documentation/1908
new file mode 100644
index 000000000..bfd8c6c2e
--- /dev/null
+++ b/results/scraper/fex/documentation/1908
@@ -0,0 +1,34 @@
+Security Model and Implications
+#### Overview
+FEX may have a weaker security model than typical POSIX.
+
+Namely, IR or OBJ cache assume that every FEX process is a trusted peer.
+
+As FEX shares address spaces and security tokens with the emulated application, there's no way to guarantee that a malicious application can't modify the behaviour of any other FEX process, as long as they share caches.
+
+This is even more important around setuid binaries that operate in a separate security context. As is, executing a setuid executable through FEX could lead to privileged data leakage and privilege escalation.
+
+I have a mental note to tackle these issues as part of [2212](https://github.com/FEX-Emu/FEX/milestone/3), and I will budget some time to create tickets and further document the situation during [2210](https://github.com/FEX-Emu/FEX/milestone/1) at the latest.
+
+Also note that the FEX VM and IR don't offer any correctness or security guarantees right now.
+
+#### Brain Dump
+
+Sadly, Linux has very limited support for in-place context switching (only vfork), so zero/low cost security contexts are not an option. Maybe vfork is enough.
+
+Sadly, Linux has only one granularity of page protection, no ASIDs, and in general the memory management API and infrastructure are extremely poor.
+
+Sadly, Linux also has almost no APIs for information management / data and metadata leakage mitigation (only madvise and very few options there iirc).
+
+#### What can we do
+
+##### For privilege and data protection
+- Make cache management happen under a different uid, and ask a fexserver to compile. Complications: Linux induced overhead.
+- Try to escalate a task during compilation via vfork. Complications: vfork overhead, questions of forked process privileges.
+- Try to restrict the guest from accessing the FEX structures (48th address bit in aarch64 running x86_64, 32nd+ bits in any running x86_32, segment registers in x86)
+- Use caches in readonly mode + AOT
+- ?
+
+##### For metadata protection
+- AOT
+- ?
\ No newline at end of file
diff --git a/results/scraper/fex/documentation/1914 b/results/scraper/fex/documentation/1914
new file mode 100644
index 000000000..8d85ec4f8
--- /dev/null
+++ b/results/scraper/fex/documentation/1914
@@ -0,0 +1,18 @@
+Address Space Stealing
+As part of #1885, a few ideas turned up.
+
+We want to steal the address space first thing, before libc's _start, and also before the dynamic linker.
+
+Current ideas on how to get there
+- Make a custom ld-linux replacement, ld-stealmem
+- ld-stealmem should steal the address space (example: https://github.com/FEX-Emu/fex-assorted-tests-bins/blob/main/address-space-stealing/alloc.cpp) 
+- Implement our own mmap, munmap and put them in a section
+- Use seccomp-bpf (test: https://github.com/FEX-Emu/fex-assorted-tests-bins/blob/main/seccomp/secccomp.c) to redirect to our internal mmap, munmap if the syscall doesn't come from our special section. Verified to work on x86_64 (ubuntu 22.04) and arm64 (ubuntu 20.04)
+- Possibly make a virtual mmap flag to control host/guest mmaps
+- Load the real ld-linux via our ELF loader (example: https://github.com/FEX-Emu/FEX/blob/main/Source/Tests/ELFCodeLoader2.h#L104)
+- Modify the AT_ENTRYPOINT & friends 'as if' ld-linux was launched by the kernel
+- Destroy the stack frame and jump to ld-loader, which will load FEX
+- (maybe for each thread?) Make a je_malloc arena that is host-preferred
+- Provide host_malloc & friends
+
+We can also define virtual syscalls, or extend prctl, to control `ld-stealmem` better.
\ No newline at end of file
diff --git a/results/scraper/fex/documentation/828 b/results/scraper/fex/documentation/828
new file mode 100644
index 000000000..22e6ef7ae
--- /dev/null
+++ b/results/scraper/fex/documentation/828
@@ -0,0 +1,28 @@
+AOT & IR Cache Planning
+This is a high-level ticket to track the work that needs to be done for a fairly complete AOT/IR cache setup. Follow-up from #47 #693
+
+## Current state
+- FEXLoader can capture IR to aot files via `--aotircapture`
+- FEXLoader can load IR from aot files via `--aotirload`
+- FEXLoader can pre-process an entire ELF with `--aotirgenerate`
+- IR loading is done per executable file (.so or otherwise)
+- IR loading depends on mmap hooks to detect when binaries are loaded
+- IR is loaded via mmap w/ index
+- Used modules create a .path entry in ~/.fex-emu/aotir/
+- Scripts/FEXUpdateAOTIRCache.sh reads .path files and generates .aotir files for the matching elfs/so files
+ 
+### Multi threaded AOTIR generation
+- There's a POC branch, needs multiple thread contexts and some other tweaks
+
+### Streamable AOTIR generation
+- Move the index to the end of the file
+- Stream writes
+
+### Precompiled binary caching (~ 2-4 weeks to reviewable code)
+- Needs our jit to be relocation-aware
+- Needs our codegen to be relocation-optimized
+- Needs similar logic to mmap-based ir loading for the metadata
+- Needs relocation information to be stored and parsed and applied
+  - Preferably on a per-block use basis, to avoid stutters in large files
+- Should introduce a FEXAOTCompiler to compile IR caches to binary caches
+- Should introduce a new cache loading mode, `--aotbin-load` or such that loads binary caches
\ No newline at end of file
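The "Streamable AOTIR generation" item in documentation/828 above (move the index to the end of the file, stream writes) can be sketched as follows. The layout here, with fixed-size index entries and a footer holding the index offset, is an assumption made for illustration and is not FEX's actual .aotir format; `write_cache` and `read_block` are hypothetical names.

```python
import io
import struct

# Hypothetical layout: [block data ...][index entries ...][footer]
ENTRY = struct.Struct("<QQQ")   # guest address, data offset, data length
FOOTER = struct.Struct("<QQ")   # index offset, number of index entries


def write_cache(stream, blocks):
    """Stream (guest_addr, data) pairs out, then append the index and footer.

    Offsets are counted locally, so the output only needs write() and is
    assumed to start at offset 0 of the file.
    """
    index = []
    offset = 0
    for guest_addr, data in blocks:
        stream.write(data)
        index.append((guest_addr, offset, len(data)))
        offset += len(data)
    for entry in index:
        stream.write(ENTRY.pack(*entry))
    stream.write(FOOTER.pack(offset, len(index)))


def read_block(stream, guest_addr):
    """Locate the index through the footer and return one block, or None."""
    stream.seek(-FOOTER.size, io.SEEK_END)
    index_offset, count = FOOTER.unpack(stream.read(FOOTER.size))
    stream.seek(index_offset)
    for _ in range(count):
        addr, data_offset, length = ENTRY.unpack(stream.read(ENTRY.size))
        if addr == guest_addr:
            stream.seek(data_offset)
            return stream.read(length)
    return None
```

With the index at the end, the writer never has to know the number or size of blocks up front, which is what makes streaming possible; the cost is that a reader needs a seekable file to find the footer first.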