diff options
| author | serpilliere <serpilliere@users.noreply.github.com> | 2020-02-23 14:02:18 +0100 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2020-02-23 14:02:18 +0100 |
| commit | 886f05bcc6c4285bf60cb296186b9e975698356a (patch) | |
| tree | a402aad544052b6603d3e9d13f8b63b3f41bbabd | |
| parent | 1dac428d4cd764018ba222340718f1cdf5f71656 (diff) | |
| parent | 9016dd157ce2adc6d8107acc518daf8254633c15 (diff) | |
| download | miasm-886f05bcc6c4285bf60cb296186b9e975698356a.tar.gz miasm-886f05bcc6c4285bf60cb296186b9e975698356a.zip | |
Merge pull request #1142 from serpilliere/add_ir_documentation
Add ir documentation
| -rw-r--r-- | doc/expression/expression.ipynb | 897 |
1 files changed, 897 insertions, 0 deletions
diff --git a/doc/expression/expression.ipynb b/doc/expression/expression.ipynb new file mode 100644 index 00000000..5049a6a0 --- /dev/null +++ b/doc/expression/expression.ipynb @@ -0,0 +1,897 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Intermediate Representation" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Miasm provides an [intermediate representation](https://en.wikipedia.org/wiki/Intermediate_representation)\n", + "(*IR*) to represent the effects of a source code (for example, like LLVM). The benefits of using an IR are:\n", + "* A unified representation that does not depend on the source architecture;\n", + "* A minimal language;\n", + "* The side effects are explicit, for example *A + B* will not implicitly update flags.\n", + "\n", + "Miasm's IR implementation is located in `miasm.expression.expression`, with `Expr*` objects. Rules of the language will be enumerated along this documentation." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Rule: Each expression has a size (in bit). Some expressions need this size at creation time, others compute their size from their arguments.**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Language" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The basic words in the language are:\n", + "* `ExprId` : represents an identifier. For example, the register `EAX` will be represented by an `ExprId` of size 32 bits.\n", + "* `ExprInt` : represents an unsigned integer." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "a\n", + "ExprId('a', 32)\n", + "a\n", + "32\n" + ] + } + ], + "source": [ + "from miasm.expression.expression import *\n", + "\n", + "a = ExprId(\"a\", 32)\n", + "print(a)\n", + "print(repr(a))\n", + "\n", + "# Identifier\n", + "print(a.name)\n", + "print(a.size)" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0x10\n", + "0xFFFFFFFF\n", + "16\n" + ] + } + ], + "source": [ + "cst1 = ExprInt(16, 32)\n", + "print(cst1)\n", + "cst2 = ExprInt(-1, 32)\n", + "print(cst2)\n", + "\n", + "# Show associated value\n", + "print(int(cst1))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The word `ExprMem` represents a memory access, of a given size in bits." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "@16[0x11223344]\n", + "0x11223344\n" + ] + } + ], + "source": [ + "# Memory access of 16 bits, at address 0x11223344 on 32 bits\n", + "addr = ExprInt(0x11223344, 32)\n", + "mem1 = ExprMem(addr, 16)\n", + "print(mem1)\n", + "\n", + "# Show memory address\n", + "print(mem1.ptr)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The word `ExprOp` describes the n-ary operations between expressions. The operation is a string, so new operations can be created on the fly. Some operations (`+`, `*`, `|`, `parity`, ...) are already used by Miasm. An operation occurs between elements having the same size and has the same size as the arguments." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "a + 0x10\n", + "(ExprId('a', 32), ExprInt(0x10, 32))\n", + "MyCustomOp(a)\n" + ] + } + ], + "source": [ + "# Defining an operation\n", + "op1 = ExprOp(\"+\", a, cst1)\n", + "print(op1)\n", + "\n", + "# Accessing the arguments\n", + "print(op1.args)\n", + "\n", + "# Creating a custom operation\n", + "op2 = ExprOp(\"MyCustomOp\", a)\n", + "print(op2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Rule: In _most_ cases, the arguments of an ExprOp must have the same size. The ExprOp size is then the size of its arguments. Some exceptions exist, for example the operator \"```==```\" has a size of 1 bit. Size exceptions are described here: https://github.com/cea-sec/miasm/blob/5ee76eed19b62983909d6a16966acc15b2c8726f/miasm/expression/expression.py#L1045**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Helpers ease the creation of common operations." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "a + 0x10\n", + "a * 0x10\n", + "-a\n", + "a | 0x10\n", + "a & 0x10\n" + ] + } + ], + "source": [ + "print(a + cst1)\n", + "print(a * cst1)\n", + "print(- a)\n", + "print(a | cst1)\n", + "print(a & cst1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Be careful, even though the Expressions can represent \"everything\", Miasm assumes some properties on certain operations:\n", + "* the associative operations (`+`, `^`, `|`, ...) are n-ary operations ;\n", + "* the `-` is always unary" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "a + -0x10\n" + ] + } + ], + "source": [ + "print(a - cst1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "* `parity` has always a size 1, it's an exception" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "32\n", + "1\n" + ] + } + ], + "source": [ + "p = ExprOp(\"parity\", a)\n", + "print(a.size)\n", + "print(p.size)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `=` operation is handled separately by the word `ExprAssign`." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "a = 0x10\n", + "0x10\n", + "a\n" + ] + } + ], + "source": [ + "assign = ExprAssign(a, cst1)\n", + "print(assign)\n", + "\n", + "# Source, destination\n", + "print(assign.src)\n", + "print(assign.dst)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Rule: The ```ExprAssign``` is not really a word of the language: it can be considered as a statement. Thus, it cannot be an argument of another expression.**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The word `ExprCond` represents a ternary relation, equivalend to the Python `src1 if cond else src2`" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "a?(0x10,0xFFFFFFFF)\n", + "a\n", + "0x10\n", + "0xFFFFFFFF\n" + ] + } + ], + "source": [ + "cond = ExprCond(a, cst1, cst2)\n", + "print(cond)\n", + "\n", + "# Access to the elements\n", + "print(cond.cond)\n", + "print(cond.src1)\n", + "print(cond.src2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Rule: The sizes of ```src1``` and ```src2``` must be equal, but can differ from the size of ```cond```.**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following words manipulate the sizes:\n", + "* `ExprSlice`: extracts a bits slice of an expression;\n", + "* `ExprCompose`: concatenates two expressions." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "a[6:8]\n", + "2\n", + "a\n", + "6\n", + "8\n" + ] + }, + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sl = ExprSlice(a, 6, 8)\n", + "print(sl)\n", + "print(sl.size)\n", + "\n", + "# Access to the properties\n", + "print(sl.arg)\n", + "print(sl.start)\n", + "print(sl.stop)\n", + "\n", + "# Simpler form\n", + "sl == a[6:8]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Rule: The size of an ```ExprSlice``` is equal to ```stop - start```.**" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{a 0 32, 0x10 32 64}\n", + "64\n", + "(ExprId('a', 32), ExprInt(0x10, 32))\n", + "[(0, ExprId('a', 32)), (32, ExprInt(0x10, 32))]\n" + ] + } + ], + "source": [ + "# Concatenation of a (bit 0 to 31) with cst1 (bit 32 to 63)\n", + "comp = ExprCompose(a, cst1)\n", + "print(comp)\n", + "print(comp.size)\n", + "\n", + "# Access to the arguments\n", + "print(comp.args)\n", + "# Access to the starting bit and the associated argument\n", + "print(list(comp.iter_args()))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Rule: The size of an ```ExprCompose``` is equal to the sum of the sizes of its arguments.**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, the word `ExprLoc` represents a memory location.\n", + "For example, it can represent the destination of a jump or a function call.\n", + "\n", + "A location is described by a unique element of type `LocKey`. You can see the `LocKey` as a key that you can use to retrieve all the information associated with a location: its offset, its name (\"main\") etc.\n", + "`ExprLoc` is a kind of `LocKey` container." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "loc_key_1\n" + ] + } + ], + "source": [ + "loc = ExprLoc(LocKey(1), 32)\n", + "print(loc)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In summary, the different words are:\n", + "\n", + "| Word | Meaning |\n", + "|-----|----------|\n", + "|ExprAssign|A=B|\n", + "|ExprInt|0x18|\n", + "|ExprId|EAX|\n", + "|ExprLoc|label_1|\n", + "|ExprCond|A ? B : C|\n", + "|ExprMem|@16[ESI]|\n", + "|ExprOp|A + B|\n", + "|ExprSlice|AH = EAX[8 :16]|\n", + "|ExprCompose|AX = AH.AL|" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Common helpers" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "ExprInt(0xFFFFFFFF, 32)" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Proper size mask\n", + "a.mask" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "32" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Expression size\n", + "a.size" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "a 0x10\n" + ] + } + ], + "source": [ + "# Printable version\n", + "print(a, cst1)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "ExprId('a', 32) ExprOp('+', ExprId('a', 32), ExprInt(0x10, 32))\n" + ] + } + ], + "source": [ + "# Expr representation (can be copy-pasted in the code)\n", + "print(repr(a), repr(a + cst1))" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "zeroExt_64(0x10)\n", + "signExt_64(0x10)\n" + ] + } + ], + "source": [ + "# Size extension (unsigned, signed)\n", + "print(cst1.zeroExtend(64))\n", + "print(cst1.signExtend(64))" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "a[31:32]\n" + ] + } + ], + "source": [ + "# Most significant bit\n", + "print(a.msb())" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "a + a + 0x10\n", + "0xFFFFFFFF + 0xFFFFFFFF + 0x10\n" + ] + } + ], + "source": [ + "# Replacement\n", + "expr1 = a + a + cst1\n", + "print(expr1)\n", + "expr2 = expr1.replace_expr({a: cst2})\n", + "print(expr2)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "True\n", + "False\n", + "True\n", + "True\n", + "True\n", + "False\n" + ] + } + ], + "source": [ + "# Type test\n", + "print(a.is_id())\n", + "print(a.is_int())\n", + "print(cst1.is_int())\n", + "print(op1.is_op())\n", + "print(op1.is_op(\"+\"))\n", + "print(op1.is_op(\"&\"))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Expression represented by a graph" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Miasm IR expressions have a recursive structure and can be represented and manipulated as graphs.\n", + "The graph object is a `DiGraph`, implemented in `miasm.core.graph`. It offers usual methods for manipulating graphs (node and vertex access, predecessors, successors, dominators, graphviz dot representation ...)." + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(a + 0x10) & 0xFFFFFFFF\n", + "(a + 0x10) & 0xFFFFFFFF\n", + "0x10\n", + "0xFFFFFFFF\n", + "a\n", + "a + 0x10\n", + "a + 0x10 -> a\n", + "a + 0x10 -> 0x10\n", + "(a + 0x10) & 0xFFFFFFFF -> a + 0x10\n", + "(a + 0x10) & 0xFFFFFFFF -> 0xFFFFFFFF\n" + ] + } + ], + "source": [ + "expr3 = a + cst1 & cst2\n", + "print(expr3)\n", + "graph = expr3.graph()\n", + "print(graph)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [], + "source": [ + "dot = graph.dot()\n", + "#from graphviz import Source\n", + "#src = Source(dot)\n", + "#src" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Expression simplification" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The expression simplification applies transformation rules to an expression until they can be applied. This process is done by an `ExpressionSimplifier` object, implemented in `miasm.expression.simplifications`.\n", + "\n", + "Some basic simplifications are already implemented and can be activated with the `expr_simp` instance in the module." + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0x10 + 0xFFFFFFFF\n", + "0xF\n", + "0x10[4:5]\n", + "0x1\n", + "a + a + -a\n", + "a\n", + "a + 0x10\n", + "0x10 + 0x10\n", + "0x20\n", + "a * 0x4\n" + ] + } + ], + "source": [ + "from miasm.expression.simplifications import expr_simp\n", + "\n", + "# 0x10 + (-1) = 0xF\n", + "op3 = cst1 + cst2\n", + "print(op3)\n", + "cst3 = expr_simp(op3)\n", + "print(cst3)\n", + "\n", + "# 5th bit of 0x10 = 1\n", + "sl2 = cst1[4:5]\n", + "print(sl2)\n", + "cst4 = expr_simp(sl2)\n", + "print(cst4)\n", + "\n", + "# a + a - a = a\n", + "op4 = a + a - a\n", + "print(op4)\n", + "print(expr_simp(op4))\n", + "assert expr_simp(op4) == a\n", + "\n", + "# Use to evaluate an expression (here a + 0x10 is evaluated with a = 0x10)\n", + "print(op1)\n", + "print(op1.replace_expr({a: cst1}))\n", + "print(expr_simp(op1.replace_expr({a: cst1})))\n", + "print(expr_simp(a + a +a + a))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Transformation rules can be added with the method `enable_passes`. A transformation rule is a function and is associated to an expression type.\n", + "\n", + "Below, the code transforms the booleans operation in an `ExprCond` to an operation of type `<`." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "((x + -y) ^ ((x ^ y) & ((x + -y) ^ x)))[31:32]\n", + "True\n", + "False\n", + "True\n" + ] + } + ], + "source": [ + "x = ExprId(\"x\", 32)\n", + "y = ExprId(\"y\", 32)\n", + "\n", + "inf_signed = ((x - y) ^ ((x ^ y) & ((x - y) ^ x)))[31:32]\n", + "print(inf_signed)\n", + "\n", + "def is_inf(x_val, y_val):\n", + " new_val = expr_simp(inf_signed.replace_expr({\n", + " x: x_val,\n", + " y: y_val,\n", + " }))\n", + " assert new_val.is_int()\n", + " return int(new_val) == 1\n", + "\n", + "# 0 < 10\n", + "print(is_inf(ExprInt(0, 32), ExprInt(10, 32)))\n", + "# 10 !< 10\n", + "print(is_inf(ExprInt(10, 32), ExprInt(10, 32)))\n", + "# -1 < 0\n", + "print(is_inf(ExprInt(0xFFFFFFFF, 32), ExprInt(0, 32)))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following code enables this transformation, which was already implemented but not enabled: " + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{<class 'miasm.expression.expression.ExprCond'>: [<function expr_simp_equal at 0x7f4dd825daf0>],\n", + " <class 'miasm.expression.expression.ExprOp'>: [<function expr_simp_inverse at 0x7f4dd825da60>],\n", + " <class 'miasm.expression.expression.ExprSlice'>: [<function expr_simp_inf_signed at 0x7f4dd825d940>,\n", + " <function expr_simp_inf_unsigned_inversed at 0x7f4dd825d9d0>]}\n", + "(((x ^ y) & (x ^ (x + -y))) ^ (x + -y))[31:32]\n", + "x <s y\n" + ] + } + ], + "source": [ + "from pprint import pprint as pp\n", + "from miasm.expression.simplifications import ExpressionSimplifier\n", + "pp(ExpressionSimplifier.PASS_COND)\n", + "\n", + "print(expr_simp(inf_signed))\n", + "expr_simp_cond = ExpressionSimplifier()\n", + "expr_simp_cond.enable_passes(ExpressionSimplifier.PASS_COND)\n", + "print(expr_simp_cond(inf_signed))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Going further" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Some functionalities relative to expressions but not detailed here:\n", + "* `SymbolicExecutionEngine` : Symbolic execution;\n", + "* `Translators` : expression translation to C, Python, \"Miasm like\" or z3 expressions;\n", + "* `expr_range` : Range analysis of an expression possible values;\n", + "* `AssignBlock`, `IRBlock`, `DiGraphDefUse`, `dead_simp`, ... : grouping expressions to describe a full program, associated analysis;\n", + "* `miasm.arch.*.sem.py`, `SemBuilder` : architecture specific semantic descriptions, i.e. all the side effects of a mnemonic." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.1" + } + }, + "nbformat": 4, + "nbformat_minor": 1 +} |