-rw-r--r-- README.md 521
1 files changed, 521 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 00000000..fb416adc
--- /dev/null
+++ b/README.md
@@ -0,0 +1,521 @@
+Reverse engineering framework in Python
+
+**Table of Contents** 
+
+- [What is Miasm?](#user-content-what-is-miasm)
+- [Basic examples](#user-content-basic-examples)
+	- [Assembling / Disassembling](#user-content-assembling--disassembling)
+	- [Intermediate representation](#user-content-intermediate-representation)
+	- [Emulation](#user-content-emulation)
+	- [Symbolic execution](#user-content-symbolic-execution)
+- [How does it work?](#user-content-how-does-it-work)
+- [Documentation](#user-content-documentation)
+- [Obtaining Miasm](#user-content-obtaining-miasm)
+	- [Software requirements](#user-content-software-requirements)
+	- [Configuration](#user-content-configuration)
+- [Testing](#user-content-testing)
+- [They already use Miasm](#user-content-they-already-use-miasm)
+- [Misc](#user-content-misc)
+
+
+What is Miasm?
+==============
+
+Miasm is a free and open source (GPLv2) reverse engineering framework.
+Miasm aims to analyze / modify / generate binary programs. Here is
+a non-exhaustive list of features:
+
+* Opening / modifying / generating PE / ELF 32 / 64 LE / BE using Elfesteem
+* Assembling / Disassembling X86 / ARM / MIPS / SH4 / MSP430
+* Representing assembly semantics using an intermediate language
+* Emulating using JIT (dynamic code analysis, unpacking, ...)
+* Expression simplification for automatic de-obfuscation
+* ...
+
+Basic examples
+==============
+
+Assembling / Disassembling
+--------------------------
+
+Import Miasm x86 architecture:
+```
+>>> from miasm2.arch.x86.arch import mn_x86
+```
+Assemble a line:
+```
+>>> l = mn_x86.fromstring('XOR ECX, ECX', 32)
+>>> print l
+XOR        ECX, ECX
+>>> mn_x86.asm(l)
+['1\xc9', '3\xc9', 'g1\xc9', 'g3\xc9']
+```
+Modify an operand:
+```
+>>> l.args[0] = mn_x86.regs.EAX
+>>> print l
+XOR        EAX, ECX
+>>> a = mn_x86.asm(l)
+>>> print a
+['1\xc8', '3\xc1', 'g1\xc8', 'g3\xc1']
+```
+Disassemble the result:
+```
+>>> print mn_x86.dis(a[0], 32)
+XOR        EAX, ECX
+```
+Using the `Machine` abstraction:
+
+```
+>>> from miasm2.analysis.machine import Machine
+>>> mn = Machine('x86_32').mn
+>>> print mn.dis('\x33\x30', 32)
+XOR        ESI, DWORD PTR [EAX]
+```
+
+For MIPS:
+```
+>>> mn = Machine('mips32b').mn
+>>> print mn.dis('97A30020'.decode('hex'), "b")
+LHU        V1, 0x20(SP)
+```
+Intermediate representation
+---------------------------
+
+Create an instruction:
+
+```
+>>> machine = Machine('arml')
+>>> l = machine.mn.dis('002088e0'.decode('hex'), 'l')
+>>> print l
+ADD        R2, R8, R0
+```
+
+Create an intermediate representation (IR) object:
+```
+>>> ira = machine.ira()
+```
+Add the instruction to the pool:
+```
+>>> ira.add_instr(l)
+```
+
+Print the current pool:
+```
+>>> for lbl, b in ira.blocs.items():
+...     print b
+...
+loc_0000000000000000:0x00000000
+
+        R2 = (R8+R0)
+
+        IRDst = loc_0000000000000004:0x00000004
+```
+Working with the IR, for instance to get the registers read and written by each statement:
+```
+>>> from miasm2.expression.expression import get_rw
+>>> for lbl, b in ira.blocs.items():
+...     for irs in b.irs:
+...         o_r, o_w = get_rw(irs)
+...         print 'read:   ', [str(x) for x in o_r]
+...         print 'written:', [str(x) for x in o_w]
+...         print
+... 
+read:    ['R8', 'R0']
+written: ['R2']
+
+read:    ['loc_0000000000000004:0x00000004']
+written: ['IRDst']
+```
+
+Emulation
+---------
+
+Given a shellcode:
+```
+00000000 8d4904      lea    ecx, [ecx+0x4]
+00000003 8d5b01      lea    ebx, [ebx+0x1]
+00000006 80f901      cmp    cl, 0x1
+00000009 7405        jz     0x10
+0000000b 8d5bff      lea    ebx, [ebx-1]
+0000000e eb03        jmp    0x13
+00000010 8d5b01      lea    ebx, [ebx+0x1]
+00000013 89d8        mov    eax, ebx
+00000015 c3          ret
+>>> s = '\x8dI\x04\x8d[\x01\x80\xf9\x01t\x05\x8d[\xff\xeb\x03\x8d[\x01\x89\xd8\xc3'
+```
+Import the shellcode using the `Container` abstraction:
+
+```
+>>> from miasm2.analysis.binary import Container
+>>> c = Container.from_string(s)
+>>> c
+<miasm2.analysis.binary.ContainerUnknown object at 0x7f34cefe6090>
+```
+
+Disassembling the shellcode at address `0`:
+
+```
+>>> from miasm2.analysis.machine import Machine
+>>> machine = Machine('x86_32')
+>>> mdis = machine.dis_engine(c.bin_stream)
+>>> blocs = mdis.dis_multibloc(0)
+>>> for b in blocs:
+...  print b
+...
+loc_0000000000000000:0x00000000
+LEA        ECX, DWORD PTR [ECX+0x4]
+LEA        EBX, DWORD PTR [EBX+0x1]
+CMP        CL, 0x1
+JZ         loc_0000000000000010:0x00000010
+->      c_next:loc_000000000000000B:0x0000000b  c_to:loc_0000000000000010:0x00000010
+loc_0000000000000010:0x00000010
+LEA        EBX, DWORD PTR [EBX+0x1]
+->      c_next:loc_0000000000000013:0x00000013
+loc_000000000000000B:0x0000000b
+LEA        EBX, DWORD PTR [EBX+0xFFFFFFFF]
+JMP        loc_0000000000000013:0x00000013
+->      c_to:loc_0000000000000013:0x00000013
+loc_0000000000000013:0x00000013
+MOV        EAX, EBX
+RET
+>>>
+```
+
+Initializing the JIT engine with a stack:
+
+```
+>>> jitter = machine.jitter(jit_type='python')
+>>> jitter.init_stack()
+```
+
+Add the shellcode at an arbitrary memory location:
+```
+>>> from miasm2.jitter.csts import PAGE_READ, PAGE_WRITE
+>>> run_addr = 0x40000000
+>>> jitter.vm.add_memory_page(run_addr, PAGE_READ | PAGE_WRITE, s)
+```
+
+Create a sentinel to catch the return of the shellcode:
+
+```
+>>> def code_sentinelle(jitter):
+...     jitter.run = False
+...     jitter.pc = 0
+...     return True
+...
+>>> jitter.add_breakpoint(0x1337beef, code_sentinelle)
+>>> jitter.push_uint32_t(0x1337beef)
+```
+
+Activate logs:
+
+```
+>>> jitter.jit.log_regs = True
+>>> jitter.jit.log_mn = True
+```
+
+Run at the chosen address:
+
+```
+>>> jitter.init_run(run_addr)
+>>> jitter.continue_run()
+RAX 0000000000000000 RBX 0000000000000000 RCX 0000000000000000 RDX 0000000000000000
+RSI 0000000000000000 RDI 0000000000000000 RSP 000000000123FFF8 RBP 0000000000000000
+zf 0000000000000000 nf 0000000000000000 of 0000000000000000 cf 0000000000000000
+RIP 0000000040000000
+40000000 LEA        ECX, DWORD PTR [ECX+0x4]
+RAX 0000000000000000 RBX 0000000000000000 RCX 0000000000000004 RDX 0000000000000000
+RSI 0000000000000000 RDI 0000000000000000 RSP 000000000123FFF8 RBP 0000000000000000
+zf 0000000000000000 nf 0000000000000000 of 0000000000000000 cf 0000000000000000
+....
+4000000e JMP        loc_0000000040000013:0x40000013
+RAX 0000000000000000 RBX 0000000000000000 RCX 0000000000000004 RDX 0000000000000000
+RSI 0000000000000000 RDI 0000000000000000 RSP 000000000123FFF8 RBP 0000000000000000
+zf 0000000000000000 nf 0000000000000000 of 0000000000000000 cf 0000000000000000
+RIP 0000000040000013
+40000013 MOV        EAX, EBX
+RAX 0000000000000000 RBX 0000000000000000 RCX 0000000000000004 RDX 0000000000000000
+RSI 0000000000000000 RDI 0000000000000000 RSP 000000000123FFF8 RBP 0000000000000000
+zf 0000000000000000 nf 0000000000000000 of 0000000000000000 cf 0000000000000000
+RIP 0000000040000013
+40000015 RET        
+>>> 
+
+```
+
+Interacting with the jitter:
+
+```
+>>> jitter.vm.dump_memory_page_pool()
+ad 1230000 size 10000 RW_ hpad 0x2854b40
+ad 40000000 size 16 RW_ hpad 0x25e0ed0
+
+>>> hex(jitter.cpu.EAX)
+'0x0L'
+>>> jitter.cpu.ESI = 12
+```
+
+Symbolic execution
+------------------
+
+Initializing the IR pool:
+
+```
+>>> ira = machine.ira()
+>>> for b in blocs:
+...    ira.add_bloc(b)
+... 
+```
+
+Initializing the engine with default symbolic values:
+
+```
+>>> from miasm2.ir.symbexec import symbexec
+>>> sb = symbexec(ira, machine.mn.regs.regs_init)
+```
+
+Launching the execution:
+
+```
+>>> symbolic_pc = sb.emul_ir_blocs(ira, 0)
+>>> print symbolic_pc
+((ECX_init+0x4)[0:8]+0xFF)?(0xB,0x10)
+```
+
+Same, with step logs (only changes are displayed):
+
+```
+>>> sb = symbexec(ira, machine.mn.regs.regs_init)
+>>> symbolic_pc = sb.emul_ir_blocs(ira, 0, step=True)
+________________________________________________________________________________
+ECX (ECX_init+0x4)
+________________________________________________________________________________
+ECX (ECX_init+0x4)
+EBX (EBX_init+0x1)
+________________________________________________________________________________
+zf ((ECX_init+0x4)[0:8]+0xFF)?(0x0,0x1)
+nf ((ECX_init+0x4)[0:8]+0xFF)[7:8]
+pf (parity ((ECX_init+0x4)[0:8]+0xFF))
+of ((((ECX_init+0x4)[0:8]+0xFF)^(ECX_init+0x4)[0:8])&((ECX_init+0x4)[0:8]^0x1))[7:8]
+cf (((((ECX_init+0x4)[0:8]+0xFF)^(ECX_init+0x4)[0:8])&((ECX_init+0x4)[0:8]^0x1))^((ECX_init+0x4)[0:8]+0xFF)^(ECX_init+0x4)[0:8]^0x1)[7:8]
+af (((ECX_init+0x4)[0:8]+0xFF)&0x10)?(0x1,0x0)
+ECX (ECX_init+0x4)
+EBX (EBX_init+0x1)
+________________________________________________________________________________
+IRDst ((ECX_init+0x4)[0:8]+0xFF)?(0xB,0x10)
+zf ((ECX_init+0x4)[0:8]+0xFF)?(0x0,0x1)
+nf ((ECX_init+0x4)[0:8]+0xFF)[7:8]
+pf (parity ((ECX_init+0x4)[0:8]+0xFF))
+of ((((ECX_init+0x4)[0:8]+0xFF)^(ECX_init+0x4)[0:8])&((ECX_init+0x4)[0:8]^0x1))[7:8]
+cf (((((ECX_init+0x4)[0:8]+0xFF)^(ECX_init+0x4)[0:8])&((ECX_init+0x4)[0:8]^0x1))^((ECX_init+0x4)[0:8]+0xFF)^(ECX_init+0x4)[0:8]^0x1)[7:8]
+af (((ECX_init+0x4)[0:8]+0xFF)&0x10)?(0x1,0x0)
+EIP ((ECX_init+0x4)[0:8]+0xFF)?(0xB,0x10)
+ECX (ECX_init+0x4)
+EBX (EBX_init+0x1)
+```
+
+
+Retry the execution with a concrete ECX. This time, the symbolic / concolic execution reaches the end of the shellcode:
+
+```
+>>> from miasm2.expression.expression import ExprInt32
+>>> sb.symbols[machine.mn.regs.ECX] = ExprInt32(-3)
+>>> symbolic_pc = sb.emul_ir_blocs(ira, 0, step=True)
+________________________________________________________________________________
+ECX 0x1
+________________________________________________________________________________
+ECX 0x1
+EBX (EBX_init+0x1)
+________________________________________________________________________________
+zf 0x1
+nf 0x0
+pf 0x1
+of 0x0
+cf 0x0
+af 0x0
+ECX 0x1
+EBX (EBX_init+0x1)
+________________________________________________________________________________
+IRDst 0x10
+zf 0x1
+nf 0x0
+pf 0x1
+of 0x0
+cf 0x0
+af 0x0
+EIP 0x10
+ECX 0x1
+EBX (EBX_init+0x1)
+________________________________________________________________________________
+IRDst 0x10
+zf 0x1
+nf 0x0
+pf 0x1
+of 0x0
+cf 0x0
+af 0x0
+EIP 0x10
+ECX 0x1
+EBX (EBX_init+0x2)
+________________________________________________________________________________
+IRDst 0x13
+zf 0x1
+nf 0x0
+pf 0x1
+of 0x0
+cf 0x0
+af 0x0
+EIP 0x10
+ECX 0x1
+EBX (EBX_init+0x2)
+________________________________________________________________________________
+IRDst 0x13
+zf 0x1
+nf 0x0
+pf 0x1
+of 0x0
+cf 0x0
+af 0x0
+EIP 0x10
+EAX (EBX_init+0x2)
+ECX 0x1
+EBX (EBX_init+0x2)
+________________________________________________________________________________
+IRDst @32[ESP_init]
+zf 0x1
+nf 0x0
+pf 0x1
+of 0x0
+cf 0x0
+af 0x0
+EIP @32[ESP_init]
+EAX (EBX_init+0x2)
+ECX 0x1
+EBX (EBX_init+0x2)
+ESP (ESP_init+0x4)
+>>> print symbolic_pc
+@32[ESP_init]
+>>> sb.dump_id()
+IRDst @32[ESP_init]
+zf 0x1
+nf 0x0
+pf 0x1
+of 0x0
+cf 0x0
+af 0x0
+EIP @32[ESP_init]
+EAX (EBX_init+0x2)
+ECX 0x1
+EBX (EBX_init+0x2)
+ESP (ESP_init+0x4)
+```
+
+
+
+How does it work?
+=================
+
+Miasm embeds its own disassembler, intermediate language and
+instruction semantics. It is written in Python.
+
+To emulate code, it uses LibTCC, LLVM or Python to JIT the intermediate
+representation. It can emulate shellcodes and all or parts of binaries. Python
+callbacks can be executed to interact with the execution, for instance to
+emulate the effects of library functions.
+
+Documentation
+=============
+TODO
+
+Obtaining Miasm
+===============
+
+* Clone the repository: [Miasm on GitHub](https://github.com/serpilliere/miasm)
+* Get one of the Docker images at [Docker Hub](https://registry.hub.docker.com/u/miasm/)
+
+Software requirements
+---------------------
+
+Miasm uses:
+
+* LibTCC [tinycc](http://repo.or.cz/w/tinycc.git) to JIT code for emulation mode. See below
+* or LLVM v3.2 with python-llvm, see below
+* python-pyparsing
+* python-dev
+* elfesteem from [Elfesteem](http://code.google.com/p/elfesteem/)
+
+Configuration
+-------------
+
+* Install elfesteem
+```
+hg clone https://code.google.com/p/elfesteem/
+cd elfesteem_directory
+python setup.py build
+sudo python setup.py install
+```
+
+* To use the jitter, TCC or LLVM is recommended
+* LibTCC needs a little fix in the `Makefile`:
+  * remove libtcc-dev from the system to avoid conflicts
+  * clone [tinycc release_0_9_26](http://repo.or.cz/w/tinycc.git/snapshot/d5e22108a0dc48899e44a158f91d5b3215eb7fe6.tar.gz)
+  * edit the `Makefile`
+  * add option `-fPIC` to the `CFLAGS` definition: `CFLAGS+= -fPIC`
+
+```
+#
+# Tiny C Compiler Makefile
+#
+
+TOP ?= .
+include $(TOP)/config.mak
+VPATH = $(top_srcdir)
+
+CPPFLAGS = -I$(TOP) # for config.h
+
+# ADD NEXT LINE:
+CFLAGS+= -fPIC
+...
+```
+
+  * `./configure && make && make install`
+  * LLVM
+    * Debian (testing/unstable): install python-llvm
+    * Debian stable/Ubuntu/Kali/whatever: install from [llvmpy](http://www.llvmpy.org/)
+    * Windows: python-llvm is not supported :/
+  * Build and install Miasm:
+```
+$ cd miasm_directory
+$ python setup.py build
+$ sudo python setup.py install
+```
+
+If something goes wrong while compiling one of the jitter modules, Miasm will
+skip the error and disable the corresponding module (see the compilation
+output).
+
+Testing
+=======
+
+Miasm comes with a set of regression tests. To run all of them:
+
+```
+cd miasm_directory/test
+python test_all.py
+```
+
+Some options can be specified:
+
+* Mono threading: `-m`
+* Code coverage instrumentation: `-c`
+* Only fast tests: `-t long` (excludes the long tests)
+
+They already use Miasm
+======================
+* [Sibyl](https://github.com/cea-sec/Sibyl): A function divination tool
+
+
+Misc
+====
+
+* Man, does Miasm have a link with rr0d?
+* Yes! Crappy code and ugly documentation.