# Binary Ninja Intermediate Language Series, Part 2: Medium Level IL

The Medium Level Intermediate Language (MLIL) is the second major representation in the Binary Ninja Intermediate Language (BNIL) family of intermediate languages. Much like LLIL, this representation is tree-based and has many of the same instructions. It is distinct in a few key ways:

- Registers have been translated to variables.
- The stack as a concept is not present.
- Variables have types associated with them.
- Call sites have their parameters inferred and associated with them.
- Data flow has been calculated and constants are propagated.
- Some dead code is eliminated (MLIL only; Mapped MLIL doesn't do this).

## Purposes of MLIL

- Simplified representation
- Small discrete operations
- Can be more accurate to binary representation than decompilation
- Powerful Data flow (PossibleValueSet) APIs
- Accurate (though verbose) variable identification

In the rest of this article we will explore the variable object, the type object, the confidence system, and finally the instruction set.

## The Variable Object

First, it's important to understand what we mean when we talk about an MLIL variable. Taking the call instruction `inst` that is retrieved in The Instruction Set section below, we can get a `Variable` object:

    >>> inst.output
    [<var int64_t rax>]
    >>> var = inst.output[0]
    >>> type(var)
    <class 'binaryninja.function.Variable'>

Variables in MLIL have a very specific meaning that is not completely obvious at first. They represent a single storage location within the scope of a single function. To those not well versed in program analysis, a storage location is where a value is located at a given point in time. In the process of compilation a compiler conducts a step called *Register Allocation*; this is the process of figuring out how to map the potentially infinite number of variables specified in the original source code to a finite set of registers. When there are more variables and intermediate values than registers available, the compiler *spills* them onto the stack. Thus a single high-level-language variable can be mapped across a number of storage locations: it can be in multiple registers and on the stack at the same time. However, unlike high-level-language variables, MLIL variables represent one and only one storage location. Binary Ninja's High Level IL (HLIL) will be responsible for storing this mapping.

So let's look at the properties available on a `Variable` object.

### `source_type`

The `source_type` represents the storage location type and can be one of the following:

    enum VariableSourceType
    {
        StackVariableSourceType,
        RegisterVariableSourceType,
        FlagVariableSourceType
    };

For example:

    >>> var.source_type
    <VariableSourceType.RegisterVariableSourceType: 1>

### `storage`

The `storage` property changes meaning depending on the `VariableSourceType`. When a variable is of type `RegisterVariableSourceType`, its `storage` property represents the index into the register list for the given architecture. If the `source_type` is `StackVariableSourceType`, its `storage` property represents the stack offset of the variable.

    >>> var
    <var int64_t rax>
    >>> var.source_type
    <VariableSourceType.RegisterVariableSourceType: 1>
    >>> bv.arch._regs_by_index[var.storage]
    'rax'
    >>> var2
    <var int64_t var_260>
    >>> var2.source_type
    <VariableSourceType.StackVariableSourceType: 0>
    >>> hex(var2.storage)
    '-0x260'

Given the above information, it may now be clear how variable names are constructed. First we determine the `source_type` of the variable. If it's a `RegisterVariableSourceType`, we just use the register's name directly. If it's a `StackVariableSourceType`, we use `var_` followed by the hexadecimal digits of `-storage` (so a `storage` of `-0x260` yields `var_260`). Finally, we append a count each time that storage location is reused.
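The naming rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Binary Ninja's implementation: the enum is a local mirror of the one shown earlier, `default_var_name` and the `regs` table are hypothetical, only negative stack offsets are handled, and the `_<n>` form of the reuse suffix is an assumption.

```python
from enum import IntEnum


class VariableSourceType(IntEnum):
    """Local mirror of the enum shown above (not imported from binaryninja)."""
    StackVariableSourceType = 0
    RegisterVariableSourceType = 1
    FlagVariableSourceType = 2


def default_var_name(source_type, storage, reg_names, reuse_count=0):
    """Build a default name from a storage location, following the text."""
    if source_type == VariableSourceType.RegisterVariableSourceType:
        # For registers, storage is an index into the register list.
        base = reg_names[storage]
    elif source_type == VariableSourceType.StackVariableSourceType:
        # For stack variables, storage is a (negative) stack offset.
        base = "var_" + format(-storage, "x")
    else:
        raise ValueError("naming for this source type is not covered here")
    # Assumption: the reuse count is appended as "_<n>".
    return base if reuse_count == 0 else f"{base}_{reuse_count}"


regs = {0: "rax", 1: "rcx"}  # hypothetical index -> name table
print(default_var_name(VariableSourceType.RegisterVariableSourceType, 0, regs))
print(default_var_name(VariableSourceType.StackVariableSourceType, -0x260, regs))
print(default_var_name(VariableSourceType.StackVariableSourceType, -0x260, regs, 1))
```

Note that `format(-storage, "x")` is used rather than `hex(-storage)`, since `hex` would add a `0x` prefix that does not appear in names such as `var_260`.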

### `index`

The `index` is an identifier chosen to be unique across different analysis passes.

### `type`

The `type` property returns the `Type` object associated with the variable:

    >>> var.type
    <type: int64_t, 0% confidence>

Type objects are described in detail in the next section.

## The Type Object

Type objects are very similar to standard C types. A Type object's type can be determined through the object’s `type_class` property. Valid types are in the `TypeClass` enumeration:

    enum TypeClass
    {
        VoidTypeClass = 0,
        BoolTypeClass = 1,
        IntegerTypeClass = 2,
        FloatTypeClass = 3,
        StructureTypeClass = 4,
        EnumerationTypeClass = 5,
        PointerTypeClass = 6,
        ArrayTypeClass = 7,
        FunctionTypeClass = 8,
        VarArgsTypeClass = 9,
        ValueTypeClass = 10,
        NamedTypeReferenceClass = 11,
        WideCharTypeClass = 12
    };

Type objects all contain a `confidence` property; this is currently only used for type inference, but can also be used by users implementing their own analyses. Below is a reference for each of the type objects and their unique properties.

### VoidTypeClass

A void object is one about which nothing is known. For instance, if a reference is taken to a static memory address, a variable will be created there with a void type, as we know the address is used but are unsure what size is being accessed. The instruction that takes the address of the static memory address will have a void pointer type.

### BoolTypeClass

A boolean type is an integer which has a value of False (0) or True (!0).

### IntegerTypeClass

An integer type has a sign, a width (in bytes), and a display type. The display type determines how the integer should be displayed; the options are self-explanatory:

    enum IntegerDisplayType
    {
        DefaultIntegerDisplayType,
        BinaryDisplayType,
        SignedOctalDisplayType,
        UnsignedOctalDisplayType,
        SignedDecimalDisplayType,
        UnsignedDecimalDisplayType,
        SignedHexadecimalDisplayType,
        UnsignedHexadecimalDisplayType,
        CharacterConstantDisplayType,
        PointerDisplayType
    };
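To make the effect of the display type concrete, here is a small sketch of how the same bit pattern might be rendered under a few of the display types listed above. The `render_integer` helper is hypothetical and its output formatting is an assumption; Binary Ninja's own rendering may differ in detail.

```python
def render_integer(value, width, display):
    """Render `value` as a `width`-byte integer under a display type name."""
    bits = width * 8
    unsigned = value & ((1 << bits) - 1)  # raw bit pattern
    # Reinterpret the bit pattern as a two's complement signed value.
    signed = unsigned - (1 << bits) if unsigned >> (bits - 1) else unsigned
    if display == "BinaryDisplayType":
        return bin(unsigned)
    if display == "UnsignedHexadecimalDisplayType":
        return hex(unsigned)
    if display == "SignedHexadecimalDisplayType":
        return hex(signed)
    if display == "UnsignedDecimalDisplayType":
        return str(unsigned)
    if display == "SignedDecimalDisplayType":
        return str(signed)
    if display == "CharacterConstantDisplayType":
        return repr(chr(unsigned))
    raise ValueError(f"display type not covered by this sketch: {display}")


for name in ("UnsignedHexadecimalDisplayType", "SignedDecimalDisplayType"):
    print(name, render_integer(0xFFFFFFFF, 4, name))
```

The point of the sketch is that the display type changes only the presentation; the underlying value and width stay the same.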

### FloatTypeClass

The float type is an IEEE 754 variable-precision type and can represent floating point numbers up to 10 bytes in width. All floating point numbers are assumed to be signed.

### WideCharTypeClass

The wide character holds a Unicode character constant whose interpretation can change depending on the `analysis.unicode` group of settings.

### VarArgsTypeClass

A varargs type is used to indicate that a function is variadic and thus represents the set of additional parameters being passed to a given function.

### ValueTypeClass

A value type is simply a constant value. It is used mainly in demangling for types which only have a name or value.

### FunctionTypeClass

The function type describes the return type, parameter list, and calling convention of a function, among many other properties:

- `can_return` - boolean value indicating if the function can return
- `calling_convention` - the calling convention this function uses
- `const` - boolean value indicating if this is a const function
- `has_variable_arguments` - boolean value indicating if this function is variadic
- `parameters` - contains a list of `Type` objects
- `platform` - the `Platform` object associated with this function
- `return_value` - the return type of this function
- `stack_adjustment` - the size in bytes of the stack adjustment that this function makes

### PointerTypeClass

A pointer type simply describes a pointer and what it points to in the `target`/`element_type` property.

### ArrayTypeClass

Array types function similarly to pointer types; however, the array type knows how large the object that it points to is:

- `target`/`element_type` - the type of element this array is constructed of
- `count` - the count of array elements
- `width` - the size of the array (count * target.width)
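The relationship between `count`, `target`, and `width` can be sketched with two stand-in classes. `ElementType` and `ArrayType` here are hypothetical illustrations that only model the `width = count * target.width` rule; they are not the Binary Ninja `Type` classes.

```python
from dataclasses import dataclass


@dataclass
class ElementType:
    """Stand-in for any type that has a width in bytes."""
    width: int


@dataclass
class ArrayType:
    """Stand-in array type: `count` elements of type `target`."""
    target: object
    count: int

    @property
    def width(self):
        # Total size in bytes, as described in the list above.
        return self.count * self.target.width


int32 = ElementType(width=4)
row = ArrayType(target=int32, count=16)
grid = ArrayType(target=row, count=3)  # an array of arrays
print(row.width, grid.width)
```

Because an array's `target` only needs a `width`, nesting arrays falls out naturally from the same rule.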

### EnumerationTypeClass

`Enumeration` types function much the same way they do in C, providing a mapping between a name and a corresponding constant. The object itself contains a `members` property, a list of `EnumerationMember` objects, each containing a name and value.

### StructureTypeClass

Structure types are simple in principle but are complicated by the need for them to be referenced by a `NamedTypeReference` for them to be useful. Structures come in three different flavors: `struct`, `class`, and `union`. While the first two simply differ in name, in unions all members overlap. `Structure` objects contain a list of `StructureMember` objects, each of which contains a `name`, `offset`, and `type`. Structures can be packed or aligned; this is accessible through the `packed` property.
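The difference between the flavors is easiest to see in the member offsets. The sketch below is a simplified model, assuming that an aligned structure places each member at a multiple of its own width and adds no trailing padding; the `layout` helper is hypothetical and is not how Binary Ninja computes structure layouts.

```python
def layout(members, union=False, packed=True):
    """Return ([(name, offset), ...], total_width) for (name, width) members."""
    offsets, cursor, total = [], 0, 0
    for name, width in members:
        if union:
            offset = 0  # every union member starts at the same offset
        else:
            if not packed and width:
                # Assumption: align each member to a multiple of its width.
                cursor = -(-cursor // width) * width
            offset = cursor
            cursor += width
        offsets.append((name, offset))
        total = max(total, offset + width)
    return offsets, total


members = [("flag", 1), ("value", 4)]
print(layout(members))                 # packed struct
print(layout(members, packed=False))   # aligned struct
print(layout(members, union=True))     # union
```

In the union case the total width is that of the largest member, since all members overlap.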

### NamedTypeReferenceClass

NamedTypeReference types are symbolic references to other types. They function much like a C `typedef` (i.e. name X corresponds to type Y). The NamedTypeReference has a `type_class` property describing what sort of type it is pointing at.

    enum NamedTypeReferenceClass
    {
        UnknownNamedTypeClass = 0,
        TypedefNamedTypeClass = 1,
        ClassNamedTypeClass = 2,
        StructNamedTypeClass = 3,
        UnionNamedTypeClass = 4,
        EnumNamedTypeClass = 5
    };

Most of the above should be self-explanatory except for `UnknownNamedTypeClass`, which is used in the name demangler, as the mangler doesn't disambiguate between named Enumerations and named Structures. NamedTypeReference objects also have a UUID `type_id`.
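The "name X corresponds to type Y" idea can be sketched as a lookup that follows names until it reaches something that is not itself a named reference. The `resolve` function and the `named_types` table are hypothetical; real resolution goes through the type system of the binary view rather than a plain dictionary.

```python
def resolve(name, named_types, max_depth=32):
    """Follow typedef-style references until a non-reference type is reached."""
    seen = []
    current = name
    while current in named_types:
        if current in seen or len(seen) >= max_depth:
            raise ValueError(f"reference cycle or chain too deep at {current!r}")
        seen.append(current)
        current = named_types[current]
    return current


# Hypothetical table: each key is a name that refers to another type.
named_types = {"DWORD": "uint32_t", "HANDLE_ID": "DWORD"}
print(resolve("HANDLE_ID", named_types))
```

The cycle check matters because a symbolic reference, unlike a concrete type, can in principle point back at itself through a chain of names.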

## The Instruction Set

The instruction set is made up of `MediumLevelILInstruction` objects. Let's start exploring by using the Python console to poke around at some instructions. Open up a binary in Binary Ninja and retrieve an MLIL instruction:

    >>> inst = current_mlil[8]
    >>> inst
    <il: rax = 0x402cb0("PORT")>
    >>> type(inst)
    <class 'binaryninja.mediumlevelil.MediumLevelILInstruction'>

`current_mlil` is mapped to whatever function is currently being viewed and is not generally available to those writing plugins, as your plugin could be headless. The bracket operators tell the API to get the MLIL instruction at index 8 for the current function.

There are a number of properties that can be queried on the `MediumLevelILInstruction` object, and the validity of these properties changes depending on what the current operation is. If we look at the `operation` of `inst`, we can see it is an `MLIL_CALL` instruction.

    >>> inst.operation
    <MediumLevelILOperation.MLIL_CALL: 51>

From the code in `mediumlevelil.py` we can see that the `MLIL_CALL` operation has three properties in addition to the properties available to all `MediumLevelILInstruction` objects:

    MediumLevelILOperation.MLIL_CALL: [("output", "var_list"), ("dest", "expr"), ("params", "expr_list")],
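One way to read a table entry like this is as a list of (property name, operand kind) pairs that are matched up with an instruction's positional operands. The toy below sketches that idea only; the `Instruction` class and `OPERANDS` dictionary are hypothetical stand-ins, and the real class additionally decodes each operand according to its kind.

```python
# Toy copy of the table entry shown above, keyed by operation name.
OPERANDS = {
    "MLIL_CALL": [("output", "var_list"), ("dest", "expr"), ("params", "expr_list")],
}


class Instruction:
    """Toy instruction exposing positional operands under the table's names."""

    def __init__(self, operation, operands):
        self.operation = operation
        spec = OPERANDS[operation]
        if len(spec) != len(operands):
            raise ValueError("operand count does not match the table")
        for (name, _kind), value in zip(spec, operands):
            setattr(self, name, value)


# Stand-in operand values loosely modelled on the example instruction.
call = Instruction("MLIL_CALL", [["rax"], 0x402CB0, ['"PORT"']])
print(call.output, hex(call.dest), call.params)
```

The real objects expose the same three names as properties.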

Thus we can query the call's `output`, which is a list of variables:

    >>> inst.output
    [<var int64_t rax>]

The call's `dest` (destination expression), which in this case is an `MLIL_CONST_PTR`:

    >>> inst.dest
    <il: 0x402cb0>
    >>> inst.dest.operation
    <MediumLevelILOperation.MLIL_CONST_PTR: 14>
    >>> inst.dest.value
    <const ptr 0x402cb0>
    >>> hex(inst.dest.value.value)
    '0x402cb0'

The parameter list can be accessed through the `params` property:

    >>> inst.params
    [<il: "PORT">]
    >>> inst.params[0]
    <il: "PORT">
    >>> type(inst.params[0])
    <class 'binaryninja.mediumlevelil.MediumLevelILInstruction'>

### Control Flow

- `MLIL_JUMP` - Branch to the `dest` expression's address
- `MLIL_JUMP_TO` - A jump table dispatch instruction. Uses the `dest` expression to calculate the MLIL instruction target `targets` to branch to
- `MLIL_CALL` - Branch to the `dest` expression function, saving the return address, with the list of parameters `params` and returning the list of return values `output`
- `MLIL_CALL_UNTYPED` - This is a call instruction where stack resolution could not be determined, and thus a list of parameters and return values do not exist
- `MLIL_CALL_OUTPUT` - This expression holds a set of return values `dest` from a call
- `MLIL_CALL_PARAM` - This expression holds the set of parameters `src` for a call instruction
- `MLIL_RET` - Return to the calling function.
- `MLIL_RET_HINT` - Indirect jump to `dest` expression (only used in internal analysis passes.)
- `MLIL_NORET` - This instruction will never be executed; the instruction before it is a call that doesn't return
- `MLIL_IF` - Branch to the `true`/`false` MLIL instruction identifier depending on the result of the `condition` expression
- `MLIL_GOTO` - Branch to the `dest` expression id
- `MLIL_TAILCALL` - This instruction calls the expression `dest` using `params` as input and `output` for return values
- `MLIL_SYSCALL` - Make a system/service call with parameters `params` and output `output`
- `MLIL_SYSCALL_UNTYPED` - Makes a system/service call, but an exact set of parameters couldn't be determined.
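As an illustration of how these operations might be consumed, the sketch below collects the constant destinations of call instructions. It runs on stand-in objects rather than real MLIL: operations are plain strings here (the real API uses the `MediumLevelILOperation` enum), and `constant_call_targets` is a hypothetical helper.

```python
from types import SimpleNamespace as NS


def constant_call_targets(instructions):
    """Return the constant destination addresses of call-like instructions."""
    targets = []
    for inst in instructions:
        if inst.operation in ("MLIL_CALL", "MLIL_TAILCALL"):
            # Only direct calls have a constant pointer as their destination.
            if inst.dest.operation == "MLIL_CONST_PTR":
                targets.append(inst.dest.constant)
    return targets


# Stand-in function body: a direct call, an indirect call, and a return.
body = [
    NS(operation="MLIL_CALL", dest=NS(operation="MLIL_CONST_PTR", constant=0x402CB0)),
    NS(operation="MLIL_CALL", dest=NS(operation="MLIL_VAR", src="rax")),
    NS(operation="MLIL_RET"),
]
print([hex(t) for t in constant_call_targets(body)])
```

The indirect call is skipped because its `dest` is a variable expression rather than a constant pointer.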

### Variable Reads and Writes

- `MLIL_SET_VAR` - Sets a variable `dest` to the result of an expression `src`
- `MLIL_SET_VAR_FIELD` - Sets variable `dest` at `offset` to the `src` expression
- `MLIL_SET_VAR_SPLIT` - Sets a pair of variables `high`:`low` to the result of the `src` expression
- `MLIL_LOAD` - Read `size` bytes from the memory address `src`
- `MLIL_LOAD_STRUCT` - Read from the struct offset at `src` + `offset`
- `MLIL_STORE` - Stores `size` bytes into `dest` from `src`
- `MLIL_STORE_STRUCT` - Stores `size` bytes into struct offset `dest` + `offset` from `src`
- `MLIL_VAR` - A variable expression `src`
- `MLIL_VAR_FIELD` - A variable and offset expression `src`, `offset`
- `MLIL_VAR_SPLIT` - A split pair of variables `high`:`low` which can be used as a single expression
- `MLIL_ADDRESS_OF` - The address of variable `src`
- `MLIL_ADDRESS_OF_FIELD` - The address and `offset` of the variable `src`
- `MLIL_CONST` - A constant integral value `constant`
- `MLIL_CONST_PTR` - A constant integral value which is used as a pointer `constant`
- `MLIL_EXTERN_PTR` - A symbolic pointer `constant` + `offset` to a symbol that exists outside the binary
- `MLIL_FLOAT_CONST` - A floating point constant `constant`
- `MLIL_IMPORT` - A `constant` integral value representing an imported address
- `MLIL_LOW_PART` - `size` bytes from the low end of `src` expression
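The `high`:`low` notation and `MLIL_LOW_PART` are easiest to understand as bit manipulation. The helpers below are a sketch on plain integers, assuming that `high` holds the most significant half of the pair and that `size` is the width of each half in bytes; `var_split_value` and `low_part` are hypothetical names.

```python
def var_split_value(high, low, size):
    """Value of a high:low pair where each half is `size` bytes wide."""
    mask = (1 << (size * 8)) - 1
    # Assumption: `high` is the most significant half of the combined value.
    return ((high & mask) << (size * 8)) | (low & mask)


def low_part(value, size):
    """The low `size` bytes of `value`."""
    return value & ((1 << (size * 8)) - 1)


combined = var_split_value(0x1, 0x2, 4)
print(hex(combined), hex(low_part(0x1122334455667788, 4)))
```

Taking the low part of a split pair at the half width gives back the `low` half.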

### Arithmetic Operations

- `MLIL_ADD` - Adds `left` expression to `right` expression
- `MLIL_ADC` - Adds with carry the `left` expression to the `right` expression with carry from the `carry` expression
- `MLIL_SUB` - Subtracts the `right` expression from the `left` expression
- `MLIL_SBB` - Subtracts with borrow the `right` expression from the `left` expression with carry from the `carry` expression
- `MLIL_AND` - Bitwise AND `left` expression with the `right` expression
- `MLIL_OR` - Bitwise OR `left` expression with the `right` expression
- `MLIL_XOR` - Bitwise XOR `left` expression with the `right` expression
- `MLIL_LSL` - Logical shift left the `left` expression by the number of bits stored in the `right` expression
- `MLIL_LSR` - Logical shift right the `left` expression by the number of bits stored in the `right` expression
- `MLIL_ASR` - Arithmetic shift right the `left` expression by the number of bits stored in the `right` expression
- `MLIL_ROL` - Rotate left the `left` expression by the number of bits stored in the `right` expression
- `MLIL_RLC` - Rotate left with carry the `left` expression and the `carry` expression by the number of bits stored in the `right` expression
- `MLIL_ROR` - Rotate right the `left` expression by the number of bits stored in the `right` expression
- `MLIL_RRC` - Rotate right with carry the `left` expression and the `carry` expression by the number of bits stored in the `right` expression
- `MLIL_MUL` - Single-precision multiply the `left` expression with the `right` expression
- `MLIL_MULU_DP` - Double-precision unsigned multiply the `left` expression with the `right` expression, result expression is twice the size of the input expressions
- `MLIL_MULS_DP` - Double-precision signed multiply the `left` expression with the `right` expression, result expression is twice the size of the input expressions
- `MLIL_DIVU` - Unsigned single-precision divide `left` expression by the `right` expression
- `MLIL_DIVU_DP` - Unsigned double-precision divide `left` expression by the `right` expression
- `MLIL_DIVS` - Signed single-precision divide `left` expression by the `right` expression
- `MLIL_DIVS_DP` - Signed double-precision divide `left` expression by the `right` expression
- `MLIL_MODU` - Unsigned single-precision modulus of `left` expression by the `right` expression
- `MLIL_MODU_DP` - Unsigned double-precision modulus of `left` expression by the `right` expression
- `MLIL_MODS` - Signed single-precision modulus of `left` expression by the `right` expression
- `MLIL_MODS_DP` - Signed double-precision modulus of `left` expression by the `right` expression
- `MLIL_NEG` - Sign inversion of `src` expression
- `MLIL_NOT` - Bitwise inversion of `src` expression
- `MLIL_FADD` - IEEE754 floating point addition of `left` expression with `right` expression
- `MLIL_FSUB` - IEEE754 floating point subtraction of `left` expression with `right` expression
- `MLIL_FMUL` - IEEE754 floating point multiplication of `left` expression with `right` expression
- `MLIL_FDIV` - IEEE754 floating point division of `left` expression with `right` expression
- `MLIL_FSQRT` - IEEE754 floating point square root of `src` expression
- `MLIL_FNEG` - IEEE754 floating point sign negation of `src` expression
- `MLIL_FABS` - IEEE754 floating point absolute value of `src` expression
- `MLIL_FLOAT_TO_INT` - IEEE754 floating point to integer conversion of `src` expression
- `MLIL_INT_TO_FLOAT` - Integer to IEEE754 floating point conversion of `src` expression
- `MLIL_FLOAT_CONV` - Convert bytes in `src` expression to IEEE754 floating point
- `MLIL_ROUND_TO_INT` - Rounds the IEEE754 floating point number `src` expression
- `MLIL_FLOOR` - Computes the floating point floor of the IEEE754 number in `src`
- `MLIL_CEIL` - Computes the floating point ceiling of the IEEE754 number in `src`
- `MLIL_FTRUNC` - Computes the floating point truncation of the IEEE754 number in `src`
- `MLIL_SX` - Sign extends the `src` expression
- `MLIL_ZX` - Zero extends the `src` expression
- `MLIL_ADD_OVERFLOW` - Calculates overflow of the addition of `left` expression with `right` expression
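Several of these operations differ only in how they treat the sign bit or the result width, which is easier to see on concrete values. The functions below are a plain-integer sketch of the usual fixed-width semantics, with `size` given in bytes; they are illustrative helpers under that assumption, not part of the Binary Ninja API.

```python
def mask(size):
    """All-ones bit mask for a `size`-byte value."""
    return (1 << (size * 8)) - 1


def to_signed(value, size):
    """Reinterpret a `size`-byte bit pattern as a two's complement integer."""
    value &= mask(size)
    return value - (1 << (size * 8)) if value >> (size * 8 - 1) else value


def lsr(left, right, size):
    """Logical shift right: vacated high bits are filled with zeros."""
    return (left & mask(size)) >> right


def asr(left, right, size):
    """Arithmetic shift right: vacated high bits copy the sign bit."""
    return (to_signed(left, size) >> right) & mask(size)


def rol(left, right, size):
    """Rotate left: bits shifted out of the top re-enter at the bottom."""
    bits = size * 8
    right %= bits
    left &= mask(size)
    return ((left << right) | (left >> (bits - right))) & mask(size)


def sx(src, src_size, dest_size):
    """Sign extend a `src_size`-byte value to `dest_size` bytes."""
    return to_signed(src, src_size) & mask(dest_size)


def zx(src, src_size, dest_size):
    """Zero extend a `src_size`-byte value to `dest_size` bytes."""
    return (src & mask(src_size)) & mask(dest_size)


def mulu_dp(left, right, size):
    """Unsigned multiply whose result is twice as wide as its inputs."""
    return ((left & mask(size)) * (right & mask(size))) & mask(size * 2)


print(hex(lsr(0x80, 1, 1)), hex(asr(0x80, 1, 1)))
print(hex(sx(0x80, 1, 4)), hex(zx(0x80, 1, 4)))
```

The same input byte `0x80` gives different results under `lsr` and `asr`, and under `sx` and `zx`, because only the second of each pair treats the top bit as a sign.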

### Comparison Instructions

- `MLIL_CMP_E` - Compare expression evaluates to true if `left` expression is equal to `right`
- `MLIL_CMP_NE` - Compare expression evaluates to true if `left` expression is not equal to `right`
- `MLIL_CMP_SLT` - Compare expression evaluates to true if `left` expression is signed less than `right`
- `MLIL_CMP_ULT` - Compare expression evaluates to true if `left` expression is unsigned less than `right`
- `MLIL_CMP_SLE` - Compare expression evaluates to true if `left` expression is signed less than or equal to `right`
- `MLIL_CMP_ULE` - Compare expression evaluates to true if `left` expression is unsigned less than or equal to `right`
- `MLIL_CMP_SGE` - Compare expression evaluates to true if `left` expression is signed greater than or equal to `right`
- `MLIL_CMP_UGE` - Compare expression evaluates to true if `left` expression is unsigned greater than or equal to `right`
- `MLIL_CMP_SGT` - Compare expression evaluates to true if `left` expression is signed greater than `right`
- `MLIL_CMP_UGT` - Compare expression evaluates to true if `left` expression is unsigned greater than `right`
- `MLIL_TEST_BIT` - Test if bit `right` in expression `left` is set
- `MLIL_FCMP_E` - Floating point compare expressions - evaluates to true if `left` expression is equal to `right`
- `MLIL_FCMP_NE` - Floating point compare expressions - evaluates to true if `left` expression is not equal to `right`
- `MLIL_FCMP_LT` - Floating point compare expressions - evaluates to true if `left` expression is less than `right`
- `MLIL_FCMP_LE` - Floating point compare expressions - evaluates to true if `left` expression is less than or equal to `right`
- `MLIL_FCMP_GE` - Floating point compare expressions - evaluates to true if `left` expression is greater than or equal to `right`
- `MLIL_FCMP_GT` - Floating point compare expressions - evaluates to true if `left` expression is greater than `right`
- `MLIL_FCMP_O` - Floating point compare expressions - evaluates to true if both `left` and `right` expressions are ordered (not NaN)
- `MLIL_FCMP_UO` - Floating point compare expressions - evaluates to true if either `left` or `right` expression is unordered (NaN)

### Miscellaneous Instructions

- `MLIL_BP` - Breakpoint instruction
- `MLIL_TRAP` - Interrupt/trap instruction with `vector` expression
- `MLIL_INTRINSIC` - Intrinsic instruction defined by the architecture
- `MLIL_FREE_VAR_SLOT` - Free the `dest` expression from the register stack
- `MLIL_UNDEF` - The expression performs undefined behavior
- `MLIL_UNIMPL` - The expression is not implemented
- `MLIL_UNIMPL_MEM` - The expression is not implemented but does access `src` memory