{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# LocationDB object \n", "The `LocationDB` is the Miasm object responsible of the symbols' management. A `Location` is an object representing a code or data (or anywhere else) position. As the name says, the `LocationDB` is a database of locations. Here are some rules:\n", "- each location has exactly *one* associated `LocKey`\n", "- a `LocKey` is linked to a unique `LocationDB` (and must not be used in another `LocationDB`)\n", "- a `LocKey` is very similar to primary key object in a database.\n", "- a `LocKey` can have an optional *offset*.\n", "- a `LocKey` can have multiple symbol names\n", "- two `Lockey`s cannot share an identical offset\n", "- two `LocKey`s cannot share a symbol name\n", "\n", "Below are manipulations of the `LocationDB`" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Import and create a locationDB\n", "from miasm.core.locationdb import LocationDB\n", "loc_db = LocationDB()\n", "print(repr(loc_db))" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Create a location with default attributes (no offset, no symbol name)\n", "loc_a = loc_db.add_location()\n", "print(loc_a)\n", "# Create a second location with an offset\n", "loc_b = loc_db.add_location(offset=112233)\n", "print(loc_b)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Add a location with a name\n", "loc_c = loc_db.add_location(name=\"main\")\n", "# Add another alias name to this location\n", "loc_db.add_location_name(loc_c, \"_main_\")\n", "# Add another location\n", "loc_d = loc_db.add_location()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Display LocationDB\n", "print(loc_db)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Associate an offset to an existing location\n", "loc_db.set_location_offset(loc_a, 0x5678)\n", "print(loc_db)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Remove a name from an existing location\n", "loc_db.remove_location_name(loc_c, \"_main_\")\n", "print(loc_db)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# Get the offset of a location\n", "hex(loc_db.get_location_offset(loc_a))" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "None\n" ] } ], "source": [ "# Location with no offset\n", "print(loc_db.get_location_offset(loc_c))" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'loc_5678'" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display locations\n", "loc_db.pretty_str(loc_a)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'loc_1b669'" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "loc_db.pretty_str(loc_b)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'main'" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "loc_db.pretty_str(loc_c)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'loc_key_3'" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "loc_db.pretty_str(loc_d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that:\n", "- if a location doesn't have an offset neither a name, its *pretty* form is `loc_key_XXX`\n", "- if a location has an offset but no name, its *pretty* form is `loc_OFFSET`\n", "- if a location has a name, its *pretty* form is `name`\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Locations usage in Miasm\n", "\n", "The locations are used everywhere in miasm. For example, let disassemble a simple shellcode:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "loc_0\n", "CMP EAX, 0x10\n", "JZ loc_c\n", "->\tc_next:loc_5 \tc_to:loc_c \n", "loc_c\n", "MOV EAX, ECX\n", "CALL loc_11223344\n", "->\tc_next:loc_13 \n", "loc_5\n", "MOV ESI, EAX\n", "CMOVA EAX, EBX\n", "JMP loc_14\n", "->\tc_to:loc_14 \n", "loc_13\n", "INC EAX\n", "->\tc_next:loc_14 \n", "loc_14\n", "RET \n" ] } ], "source": [ "from miasm.analysis.binary import Container\n", "from miasm.analysis.machine import Machine\n", "\n", "# Create a LocationDB\n", "loc_db = LocationDB()\n", "\n", "# Create a container of bytes\n", "cont = Container.from_string(\n", " b\"\\x83\\xf8\\x10\\x74\\x07\\x89\\xc6\\x0f\\x47\\xc3\\xeb\\x08\\x89\\xc8\\xe8\\x31\\x33\\x22\\x11\\x40\\xc3\",\n", " loc_db\n", ")\n", "\n", "# Instantiate a x86 32 bit architecture\n", "machine = Machine(\"x86_32\")\n", "\n", "# Instantiate a disassembler engine, using the previous bin_stream and its\n", "# associated location DB.\n", "mdis = machine.dis_engine(cont.bin_stream, loc_db=loc_db)\n", "\n", "# Run a recursive traversal disassembling from address 0\n", "asmcfg = mdis.dis_multiblock(0)\n", "\n", "# Display each basic blocks\n", "for block in asmcfg.blocks:\n", " print(block)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "loc_0\n", "CMP EAX, 0x10\n", "JZ loc_c\n", "->\tc_next:loc_5 \tc_to:loc_c \n" ] } ], "source": [ "block = asmcfg.getby_offset(0)\n", "print(block)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "loc_key_0\n" ] } ], "source": [ "# The basic block is placed at a location, which can be retrieved using `.loc_key`\n", "print(block.loc_key)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "loc_key_0: 0x0 - entry\n", "loc_key_1: 0xc - \n", "loc_key_2: 0x5 - \n", "loc_key_3: 0x11223344 - \n", "loc_key_4: 0x13 - \n", "loc_key_5: 0x14 - \n" ] } ], "source": [ "# We can add a name to this first location\n", "loc_db.add_location_name(block.loc_key, \"entry\")\n", "print(loc_db)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "entry\n", "CMP EAX, 0x10\n", "JZ loc_c\n", "->\tc_next:loc_5 \tc_to:loc_c \n" ] } ], "source": [ "# And we can re-display the block:\n", "print(block)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "entry\n", "CMP EAX, 0x10\n", "JZ quiche\n", "->\tc_next:loc_5 \tc_to:quiche \n" ] } ], "source": [ "# We will give an arbitrary name to location at offset 0xC\n", "loc_c = loc_db.get_offset_location(0xc)\n", "loc_db.add_location_name(loc_c, \"quiche\")\n", "print(block)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Those locations are also used in intermediate representation:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "# Get a lifter\n", "lifter = machine.lifter_model_call(loc_db)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "# Get the intermediate representation of the asmcfg\n", "ircfg = lifter.new_ircfg_from_asmcfg(asmcfg)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "b'entry':\n", "\n", "zf = FLAG_EQ_CMP(EAX, 0x10)\n", "\n", "nf = FLAG_SIGN_SUB(EAX, 0x10)\n", "\n", "pf = parity((EAX + -0x10) & 0xFF)\n", "\n", "cf = FLAG_SUB_CF(EAX, 0x10)\n", "\n", "of = FLAG_SUB_OF(EAX, 0x10)\n", "\n", "af = ((EAX ^ 0x10) ^ (EAX + -0x10))[4:5]\n", "\n", "EIP = CC_EQ(zf)?(b'quiche',loc_5)\n", "\n", "IRDst = CC_EQ(zf)?(b'quiche',loc_5)\n", "\n" ] } ], "source": [ "# Get location at 0\n", "loc_entry = loc_db.get_offset_location(0)\n", "# Get irblock at this location\n", "irblock = ircfg.blocks[loc_entry]\n", "# Display IRBlock\n", "print(irblock)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In miasm, each expression embeds its size. The location doesn't have a size. To use a location in IR code, you have to wrap it in the Miasm word `ExprLoc`" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CC_EQ(zf)?(loc_key_1,loc_key_2)\n" ] } ], "source": [ "# Get the irblock destination (IRDst value)\n", "dst = irblock.dst\n", "print(dst)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ExprCond(ExprOp('CC_EQ', ExprId('zf', 1)), ExprLoc(, 32), ExprLoc(, 32))\n" ] } ], "source": [ "print(repr(dst))" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ExprLoc(, 32) ExprLoc(, 32)\n" ] } ], "source": [ "# It's an ExprCond. We retrieve here the possible values\n", "src1, src2 = dst.src1, dst.src2\n", "print(repr(src1), repr(src2))" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "loc_key_1\n" ] } ], "source": [ "# Retrieve the location of the ExprLoc\n", "loc = src1.loc_key\n", "print(loc)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }