Binary Ninja Flags

This guide is broken up into three sections.

  • Part 1 covers the basics of how architecture modules and flags interact
  • Part 2 discusses how to use those flags when lifting
  • Part 3 covers advanced topics like custom roles

Binja Architecture Flags Part 1: The Basics

Think of flags as global boolean variables, set and cleared by observers (flagmen) who monitor the results of instructions from a distance.

Instructions like JXX glance at the flagmen for permission to jump.

Tracking a "remote thread" like this alongside the normal instruction stream would really complicate analysis. So another approach is to insert the flag logic into the instruction stream:

...
SUB a, b, c
flag_s = ...
flag_z = ...
flag_v = ...
...

Now the problem is that tons of instructions affect tons of flags, so a lot of noise is generated.
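To make the noise concrete, here is a minimal sketch in plain Python (no Binja required) of what the elided flag_s, flag_z lines could expand to for an 8-bit SUB. The function name sub8_with_flags is made up for illustration, and the flag definitions are the generic textbook ones, not those of any particular architecture.

```python
def sub8_with_flags(a, b):
    """Model an 8-bit SUB followed by inline textbook flag updates."""
    result = (a - b) & 0xFF
    flags = {
        "s": bool(result & 0x80),  # sign: top bit of the result is set
        "z": result == 0,          # zero: the result is zero
        "c": a < b,                # carry: an unsigned borrow occurred
    }
    return result, flags


result, flags = sub8_with_flags(0x10, 0x20)
print(hex(result), flags)
```

Even this toy version emits three extra assignments for one instruction, which is exactly the clutter the next paragraphs are about.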

Binja tries to reduce the noise by displaying only the flags between producer and consumer that are relevant. For example, ADD affects many flags, but if none are read until ADC (add with carry) then only the carry flag is shown.

Writing effective flag code in architectures requires this understanding, and there's a bit of an art in "helping" Binja connect flag producers and consumers.

But this is about basics! So what are the basic ways in which an architecture author informs Binja of flags? The following is Python architecture code for Z80.

1) declare the flags

flags = ['s', 'z', 'h', 'pv', 'n', 'c']

That's simple, it's just a list of strings.

2) assign flag Roles

flag_roles = {
    's': FlagRole.NegativeSignFlagRole,
    'z': FlagRole.ZeroFlagRole,
    'h': FlagRole.HalfCarryFlagRole,
    'pv': FlagRole.SpecialFlagRole,
    'n': FlagRole.NegativeSignFlagRole,
    'c': FlagRole.CarryFlagRole
}

Map each flag to a role from api/python/enums.py if possible. This tells Binja to generate IL for the basic, textbook behavior of a flag. For example, the ZeroFlagRole will generate IL that sets the flag when an arithmetic result is zero.

If a flag's behavior does not fit the textbook behavior, use SpecialFlagRole; in Part 3 we'll implement a callback for its custom IL. For example, PV here in Z80 acts as both a parity flag and an overflow flag. Its meaning at any given time depends on the last instruction that set it.
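As a rough illustration of why PV can't be pinned to a single textbook role, here is a plain Python sketch of the two meanings. The helper pv_after and its "and"/"add" operation names are hypothetical and only model two 8-bit cases: even parity after a logical op, signed overflow after an arithmetic op.

```python
def parity_even(value):
    """True when the low 8 bits of value contain an even number of 1 bits."""
    return bin(value & 0xFF).count("1") % 2 == 0


def add8_overflow(a, b):
    """True when a + b overflows as a signed 8-bit addition."""
    result = (a + b) & 0xFF
    # Overflow: both operands share a sign and the result's sign differs.
    return bool((a ^ result) & (b ^ result) & 0x80)


def pv_after(op, a, b):
    """Hypothetical helper: the meaning of pv depends on the operation."""
    if op == "and":
        return parity_even(a & b)    # pv acts as a parity flag
    if op == "add":
        return add8_overflow(a, b)   # pv acts as an overflow flag
    raise ValueError("unmodeled operation: %s" % op)


print(pv_after("and", 0xFF, 0x03), pv_after("add", 0x7F, 0x01))
```

One flag, two unrelated formulas: that's the kind of behavior SpecialFlagRole exists for.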

In the NES example nes.py, the 6502 flag c deviates from textbook behavior, disqualifying it from CarryFlagRole, and earning it SpecialFlagRole.

3) flag write types

flag_write_types = ['none', '*', 'only_carry']
flags_written_by_flag_write_type = {
    'none': [],
    '*': ['s', 'z', 'h', 'pv', 'n', 'c'],
    'only_carry': ['c'],
}

Think of flag_write_types as custom named groups for your convenience. When you later lift instructions that affect all flags, you don't want to list them individually over and over; you want to say just ..., flags="*").

In fact, the flags keyword parameter for the IL constructing functions does not accept individual flag names. It accepts only these group names!
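Since only group names are accepted, it can help to sanity-check the tables before lifting anything. The sketch below is plain Python over the same Z80 definitions; check_write_types and flags_written are hypothetical helpers for illustration, not part of Binja's API.

```python
flags = ['s', 'z', 'h', 'pv', 'n', 'c']
flag_write_types = ['none', '*', 'only_carry']
flags_written_by_flag_write_type = {
    'none': [],
    '*': ['s', 'z', 'h', 'pv', 'n', 'c'],
    'only_carry': ['c'],
}


def check_write_types():
    """Hypothetical check: every group is defined and names only known flags."""
    for name in flag_write_types:
        if name not in flags_written_by_flag_write_type:
            raise ValueError("write type %r has no flag list" % name)
        for flag in flags_written_by_flag_write_type[name]:
            if flag not in flags:
                raise ValueError("write type %r names unknown flag %r" % (name, flag))
    return True


def flags_written(write_type):
    """Resolve a group name to its flags; individual flag names are rejected."""
    if write_type not in flag_write_types:
        raise KeyError("%r is not a flag write type" % write_type)
    return flags_written_by_flag_write_type[write_type]


check_write_types()
print(flags_written('only_carry'))
```

Calling flags_written('c') raises KeyError here, mirroring the rule that a bare flag name is not a valid group.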

Binja Architecture Flags Part 2: Flag Producer/Consumer

In the last installment, we informed Binja of our architecture's flags by defining a list of flag names, a mapping from flag names to flag roles, and groups of flags commonly set together called flag write types.

Let's remind ourselves briefly that these are terms within an imagined protocol between architecture author and Binja. The protocol's purpose is to communicate to Binja what flags are present and how they behave. As we accumulate architectures that stretch the protocol's limits (e.g. PowerPC with its flag banks), the protocol may be adjusted to accommodate architectures more generally.

In Part 1, we said Binja tries to reduce the noise in displayed IL by showing only the flags between producer and consumer that are relevant. Consider this simplified flags definition for Z80:

flags = ['c']
flag_roles = { 'c': FlagRole.CarryFlagRole }
flag_write_types = ['none', 'only_carry']
flags_written_by_flag_write_type = { 'none': [], 'only_carry': ['c'], }

We have one flag named 'c', we tell Binja that it's the textbook carry flag, and we define a flag group called "only_carry" which sets just this flag.

At lifting time, we mark certain IL instructions as producers using the flags keyword. Remember to pass the group (flag write type) that contains c, not c itself:

il.add(size, lhs, rhs, flags='only_carry')

We mark certain IL instructions as consumers by passing an LLIL flag expression as one of their operands:

il.add_carry(size, lhs, rhs, il.flag("c"))

Instructions can be both producers and consumers. Add-with-carry is one: it consumes c and produces a new c. In the call above we left out flags='only_carry' to maximize the contrast with the producer example.
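For completeness, here is a sketch of add-with-carry lifted as both consumer and producer. So that it runs without Binja, il is a small stand-in class (RecordingIL, made up for this example) that records calls as tuples. In a real architecture plugin, il would be the LowLevelILFunction handed to your lifter, and lhs and rhs would be IL expressions rather than strings.

```python
class RecordingIL:
    """Stand-in for the il object; it just records what was requested."""

    def flag(self, name):
        return ("flag", name)

    def add_carry(self, size, a, b, carry, flags=None):
        return ("adc", size, a, b, carry, flags)


def lift_adc(il, size, lhs, rhs):
    """Lift add-with-carry: reads c as an operand, writes c via the group."""
    return il.add_carry(size, lhs, rhs, il.flag("c"), flags="only_carry")


print(lift_adc(RecordingIL(), 1, "A", "B"))
```

The il.flag("c") operand is what makes it a consumer. The flags="only_carry" keyword is what makes it a producer.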

Example A

Let's lift a simple C function:

unsigned char add(unsigned char a, unsigned char b)
{
    return a + b;
}

Recap: the C function add() compiles to a Z80 ADD instruction (among others), which we lift to an LLIL ADD that is a producer of the c flag.

Binja gives this LLIL, after compilation with SDCC:

_add:
   0 @ 0000020e  HL = 3
   1 @ 00000211  HL = HL + SP {arg2}
   2 @ 00000212  IY = 2
   3 @ 00000216  IY = IY + SP {arg1}
   4 @ 00000218  A = [IY {arg1}].b
   5 @ 0000021b  A = A + [HL {arg2}].b   <-- PRODUCER
   6 @ 0000021c  L = A
   7 @ 0000021d  <return> jump(pop)

WHERE'S THE CARRY!?

This is the lesson that was non-intuitive to me. Binja knows from the architecture-supplied lifted IL that ADD sets c, but it doesn't produce any c-setting low level IL because it did not detect any consumer of c.

Example B

Let's change the function to use 2-byte integers:

unsigned int add(unsigned int a, unsigned int b)
{
    return a + b;
}

Since the Z80 ALU only adds 8-bit ints, multi-byte adds are synthesized with an initial ADD, followed by runs of ADC. Here's the new IL:

_add:
   0 @ 0000020e  HL = 4
   1 @ 00000211  HL = HL + SP {arg3}
   2 @ 00000212  IY = 2
   3 @ 00000216  IY = IY + SP {arg1}
   4 @ 00000218  A = [IY {arg1}].b
   5 @ 0000021b  temp0.b = A
   6 @ 0000021b  temp1.b = [HL {arg3}].b
   7 @ 0000021b  A = A + [HL {arg3}].b                   <-- PRODUCER
   8 @ 0000021b  flag:c = temp0.b + temp1.b u< temp0.b   <--
   9 @ 0000021c  C = A
  10 @ 0000021d  A = [IY + 1 {arg2}].b
  11 @ 00000220  HL = HL + 1 {arg4}
  12 @ 00000221  A = adc.b(A, [HL {arg4}].b, flag:c)     <-- CONSUMER
  13 @ 00000222  B = A
  14 @ 00000223  L = C
  15 @ 00000224  H = B
  16 @ 00000225  <return> jump(pop)

With ADC present, Binja detects the relationship, and produces the IL flag:c = temp0.b + temp1.b u< temp0.b.
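If the unsigned comparison looks odd, the following plain Python sketch checks it against the obvious definition of carry for every pair of 8-bit operands, then uses the ADD/ADC pairing to build a 16-bit add. The function names are made up for illustration. Note the formula is only checked for a plain ADD with no carry-in.

```python
def add8(a, b, carry_in=0):
    """8-bit add returning (result, carry_out)."""
    total = a + b + carry_in
    return total & 0xFF, total > 0xFF


def carry_like_il(a, b):
    """Mirror flag:c = temp0.b + temp1.b u< temp0.b for 8-bit operands."""
    return ((a + b) & 0xFF) < a


def carry_formula_matches():
    """Compare the IL-style formula with the real carry-out, exhaustively."""
    return all(carry_like_il(a, b) == add8(a, b)[1]
               for a in range(256) for b in range(256))


def add16_via_bytes(x, y):
    """16-bit add synthesized from an 8-bit ADD followed by an ADC."""
    lo, c = add8(x & 0xFF, y & 0xFF)                        # ADD: producer of c
    hi, _ = add8((x >> 8) & 0xFF, (y >> 8) & 0xFF, int(c))  # ADC: consumer of c
    return (hi << 8) | lo


print(carry_formula_matches(), hex(add16_via_bytes(0x12FF, 0x0001)))
```

The wrapped sum is smaller than the first operand exactly when the true sum did not fit in 8 bits, which is why the comparison works as a carry test.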

This also means that you can mark instructions as producers of a group of many flags, and LLIL will contain the flag calculating code for those flags consumed further along.
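The effect can be modeled with a small backward pass over a straight-line instruction list. This is only a sketch of the idea in plain Python: prune_flag_writes is a made-up name, and Binja's actual analysis (which also has to deal with control flow) is not described here.

```python
ALL_FLAGS = ['s', 'z', 'h', 'pv', 'n', 'c']


def prune_flag_writes(instructions):
    """instructions: list of (name, flags_written, flags_read).

    Returns (name, kept_flags) per instruction, where kept_flags are the
    written flags that some later instruction reads before they are
    overwritten.
    """
    live = set()  # flags still needed by instructions further along
    kept = []
    for name, written, read in reversed(instructions):
        kept.append((name, [f for f in written if f in live]))
        live -= set(written)  # this write satisfies the later readers
        live |= set(read)     # this instruction's own reads are needed
    kept.reverse()
    return kept


program = [
    ("ADD", ALL_FLAGS, []),     # producer of the whole '*' group
    ("LD", [], []),             # neither reads nor writes flags
    ("ADC", ALL_FLAGS, ["c"]),  # consumer of c, and a producer too
]
print(prune_flag_writes(program))
```

In this model only the c write on ADD survives. ADC's own flag writes are dropped because nothing after it reads them, which matches what we saw in Example A.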

Review

You, the architecture author:

  • inform Binja of your architecture's flags by defining:
      • flag names
      • flag "roles", which are just their textbook behavior, if they qualify
      • flag "write types", which are groups of flags often set together
  • assign instructions as flag producers by passing the flags= keyword parameter during lifting
  • assign instructions as flag consumers by passing IL flag expressions as operands during lifting

With the defined variables and lifted IL, Binja decides when to generate flag-affecting low level IL.

Binja Architecture Flags Part 3: Flag Roles

The previous sections said that flag roles were the textbook behavior of flags. If you can assign a flag to a role, there's a chance Binja will have that default behavior and you won't have to implement it yourself. For example, the carry flag is so common among architectures and its behavior is (to my knowledge) non-varying during addition, that we have it hardcoded into Binja:

   5 @ 0000021b  temp0.b = A
   6 @ 0000021b  temp1.b = [HL {arg3}].b
   7 @ 0000021b  A = A + [HL {arg3}].b
   8 @ 0000021b  flag:c = temp0.b + temp1.b u< temp0.b   <-- HERE

The c flag was declared to have CarryFlagRole so the flag:c = ... was produced by Binja and the architecture author is not responsible.

The hardcoded flag code is in the get_flag_write_low_level_il() method of the Architecture class and (at the time of this writing) supports:

                      ADD  ADC  SUB  NEG  FSUB  OTHERS
CarryFlagRole         yes  yes  yes  yes  yes   -
NegativeSignFlagRole  yes  yes  yes  yes  yes   yes
OrderedFlagRole       -    -    -    -    yes   -
OverflowFlagRole      yes  -    yes  -    -     -
PositiveSignFlagRole  yes  yes  yes  yes  yes   yes
UnorderedFlagRole     -    -    -    -    yes   -
ZeroFlagRole          yes  yes  yes  yes  yes   yes

If a cell has "yes" it means that for the instruction at the column header, there's a built-in attempt at producing the flag in the row. It is not a guarantee that the flag semantics match your architecture's.

So for any instruction that is not ADD, ADC, SUB, NEG, or FSUB, you can try and see whether Binja's built-in calculations of the negative, positive, or zero flags are accurate for your architecture. If you want carry, ordered, overflow, or unordered, you will need to implement it yourself.

What happens when you mark an instruction as the producer of a certain flag, but no implementation exists?

  12 @ 0000029e  A = sbb.b(temp1.b, D, flag:c)
  13 @ 0000029e  flag:s = sbb.b(temp1.b, D, flag:c) s< 0
  14 @ 0000029e  flag:pv = unimplemented

Here, sbb is an s producer and a pv producer.

Flag s is mapped to FlagRole.NegativeSignFlagRole which is "yes" in the table, so Binja emits the default LLIL to set it.

In this example, flag pv is mapped to FlagRole.OverflowFlagRole, which is not "yes" for sbb, so Binja doesn't have an implementation and emits LLIL_UNIMPL (displayed as unimplemented).

If you need to define your own flag setting code, set the flag role to SpecialFlagRole and override get_flag_write_low_level_il(). There are two constraints you must work around:

  • you do not have access to the operation result, which is tempting to use in negative and overflow calculation
  • the operands given are not expressions, they're the registers or constants themselves

In C++ land, LowLevelILFunction has a helper function for converting the operands to expressions called GetExprForRegisterOrConstantOperation(). In Python land, you can use expressionify() from https://github.com/Vector35/Z80/blob/master/Z80IL.py

Example: 6502

On the 6502, the carry flag after subtraction (borrow) is inverted from the textbook behavior; otherwise it behaves as expected:

flag_roles = {
    "c": FlagRole.SpecialFlagRole,  # Not a normal carry flag, subtract result is inverted
    ...

Then, later in get_flag_write_low_level_il() there is:

def get_flag_write_low_level_il(self, op, size, write_type, flag, operands, il):
    if flag == 'c':
        if (op == LowLevelILOperation.LLIL_SUB) or (op == LowLevelILOperation.LLIL_SBB):
            # Subtraction carry flag is inverted from the common implementation
            return il.not_expr(0, self.get_default_flag_write_low_level_il(op, size, FlagRole.CarryFlagRole, operands, il))
        # Other operations use a normal carry flag
        return self.get_default_flag_write_low_level_il(op, size, FlagRole.CarryFlagRole, operands, il)
    ...

This is nice and convenient: the normal behavior is simply wrapped in a not. See the source for the full code.
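The inversion itself is easy to model outside Binja. The sketch below is plain Python with made-up function names, and it simplifies by ignoring the incoming carry: it compares the textbook "carry means a borrow happened" convention with the 6502's "carry means no borrow" convention for an 8-bit subtraction.

```python
def textbook_sub_carry(a, b):
    """Textbook carry for an 8-bit SUB: set when an unsigned borrow occurs."""
    return a < b


def mos6502_sub_carry(a, b):
    """6502 convention: the carry is the inverse of the textbook borrow."""
    return not textbook_sub_carry(a, b)


for a, b in [(5, 3), (3, 5), (4, 4)]:
    print(a, b, textbook_sub_carry(a, b), mos6502_sub_carry(a, b))
```

The not in mos6502_sub_carry plays the same part as the il.not_expr() wrapper in the override above.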