Disassembly of a tail-re­cur­sive func­tion

Created: Wed Mar 13 09:18:31 CET 2019

Last mod­i­fied: Thu Mar 14 11:54:34 CET 2019


Welcome back to my se­ries about (dis)assembly !

It’s been a while since the last ar­ti­cle, sorry for the de­lay <3

Here is a short re­cap of the pre­vi­ous ar­ti­cles, in the or­der in which you should read them,

2019-02-25

In this one, we stud­ied what the C pro­gram main(){return(0);} looked like af­ter com­pi­la­tion with GCC.

I in­tro­duced the tools and the ways of prop­erly dis­as­sem­bling a small program.

We also wrote a very small pro­gram in as­sem­bly.

2019-02-24

Shorter than the prev post, we talked about the LOGICAL AND op­er­a­tor, operand sizes and reg­is­ters.

There was a nice com­ment by fw that adds a bunch to what I wrote. Thanks to her/​him.

2019-02-27

We started dis­as­sem­bling ob­ject files in­stead of linked ex­e­cuta­bles; we looked at how a func­tion’s pa­ra­me­ters are rep­re­sented down to ma­chine code level and we in­tro­duced the stack in which they lie. I even drew a schematics of the process.

2019-03-01

This was a rather short post, where I tried to show what hap­pens when the NULL pointer is ref­er­enced. This one should be the eas­ier to fol­low.

How to read this ?

I know, I know, man­u­als that starts with an How to read this man­ual” section are gen­er­ally ut­ter-trash. But, this is a warn­ing for be­gin­ners, that I be­lieve they should take into ac­count,

If you find your­self stuck at some point,

It might take some time, rang­ing from 3 min­utes to a month of desperation be­fore you un­der­stand some­thing. I ex­pe­ri­enced this, I know it’s hard, but you should­n’t fear it. It’s eas­ier if you have side hobbies.

To pre­vent this, I ask that you take any ques­tion to the com­ment sec­tion at the bot­tom of the page. More info about it here.

There is also my email ad­dress which you can spam to your heart con­tent.

The code sam­ple

#include <stdio.h>
#include <assert.h>

int
sum(int x, int acc)
{
    assert(x >= 0);
    if (x == 0)
        return acc;
    return sum(x - 1, acc + x);
}

int
main(void)
{
    printf("%d\n", sum(10, 0));
    return 0;
}

Compilation flags and call to ob­j­dump:

cc -O1 -c -o sum.o sum.c
objdump -M intel -d sum.o

Here is the dis­as­sem­bly of sum(),

 0: 85 ff                   test   edi,edi
 2: 78 1b                   js     1f <sum+0x1f>
 4: 66 66 66 2e 0f 1f 84    nop    WORD PTR cs:[rax+rax+0x0]
 b: 00 00 00 00 00 
10: 85 ff                   test   edi,edi
12: 74 28                   je     3c <sum+0x3c>
14: 01 fe                   add    esi,edi
16: 8d 47 ff                lea    eax,[rdi-0x1]
19: 85 ff                   test   edi,edi
1b: 89 c7                   mov    edi,eax
1d: 7f f1                   jg     10 <sum+0x10>
1f: 55                      push   rbp
20: 48 89 e5                mov    rbp,rsp
23: bf 00 00 00 00          mov    edi,0x0
28: be 00 00 00 00          mov    esi,0x0
2d: ba 07 00 00 00          mov    edx,0x7
32: b9 00 00 00 00          mov    ecx,0x0
37: e8 00 00 00 00          call   3c <sum+0x3c>
3c: 89 f0                   mov    eax,esi
3e: c3                      ret    
3f: 90                      nop

TCO,

stands for Tail Call Optimization.

Because sum() ends with a call to it­self in the C code, the com­piler optimized it out with a loop; that goes from in­struc­tion 10 to instruction 1d,

10: 85 ff                   test   edi,edi
12: 74 28                   je     3c <sum+0x3c>
14: 01 fe                   add    esi,edi
16: 8d 47 ff                lea    eax,[rdi-0x1]
19: 85 ff                   test   edi,edi
1b: 89 c7                   mov    edi,eax
1d: 7f f1                   jg     10 <sum+0x10>

If you did your home­works right, you should know that EDI and ESI are often used to store func­tions’ ac­tual pa­ra­me­ters.

Here, EDI stores x and ESI stores acc. When main() calls sum(), EDI = A and ESI = 0.

Back to sum()’s loop”,

RSP is never mod­i­fied, mean­ing that the stack does­n’t grow nor shrink in size. This would not be the case had we com­piled with -O0 in­stead of -O1.

We will now fo­cus on what mnemon­ics com­pose the loop.

Jump in­struc­tions

Instructions 10 and 12 can be sum­ma­rized as ter­mi­nat­ing the function if x = zero.

What test does is per­form­ing a bit­wise AND on two operands and dis­cards the value. Neither of the two operands are modified. Only flags register are set (or not) depending on the re­sult.

Flags reg­is­ters in­clude SF, the sign flag, which is set on neg­a­tive result, that is to say, in our case, the re­sult of EDI BITWISE AND EDI.

There ex­ist a bunch of instructions that jumps” to a given ad­dress de­pend­ing on the value of register flags.

test edi,edi
je 3c

je is equiv­a­lent to jz in the sense that both of them jump iff ZF = 1”, where ZF is the Zero flag. We could have rewrit­ten this part as below,

cmp edi,edi ; test edi,edi
jz 3c       ; je 3c

Indeed, (EDI BITWISE AND EDI = 0 iff EDI = 0) iff ZF = 0.
where iff = if and only if”.

cmp per­forms a sub­strac­tion of both of its operands, the re­sult be­ing then dis­carded.

Again, (EDI - 0 = 0 iff EDI = 0) iff ZF = 0.

To sum­ma­rize, af­ter a test edi,edi,

With this much in­for­ma­tion, we can rewrite our loop in pseudo-code:

10:     85 ff                   IS EDI = 0 ?
12:     74 28                   JUMP TO 3C IF SO
14:     01 fe                   add    esi,edi
16:     8d 47 ff                lea    eax,[rdi-0x1]
19:     85 ff                   IS EDI > 0 ?
1b:     89 c7                   mov    edi,eax
1d:     7f f1                   JUMP TO 10 IF SO

That leaves us with in­struc­tion 14, which per­forms the sum as we specified it in sum.c.

LEA

What about lea eax,[rdi-0x1] ?

This is our decre­ment. No sub edi, 0x1 around here. If mov was capable of it, we could have writ­ten mov eax, rdi-0x1.

You may won­der the rea­son be­hind the brack­ets. What ex­plains the brackets also ex­plains why use RDI (64 bits) in­stead of EDI (32 bits).

LEA ma­nip­u­lates ad­dresses and is able to per­form arith­metic op­er­a­tions of them, there is noth­ing we care for at rdi-0x1, what we are look­ing for it the re­sult of the sub­strac­tion.

Because ad­dresses are 64 bits val­ues on 64 bit sys­tems, RDI must be used.

Using LEA in this case is a com­piler hack to avoid mod­i­fy­ing EDI.

Of course LEA does have some le­git use cases.

We don’t need a 64 bit reg­is­ter like RAX to hold the re­sult be­cause the substraction will only af­fect the lower 32 bits of RDI, ex­cept if those are all set to 0, but we don’t care about that.

Exiting the loop

What hap­pens if EDI = 1 at in­struc­tion 19 ? Well, every­thing goes like it should, EDI is set to 0 at 1b and be­cause EDI was greater than 0 at 19, we jump to the be­gin­ning of the loop where the func­tion is terminated on the con­di­tion that EDI is now 0.

Instruction 3c sets sum()’s re­turn value to ESI, namely acc.

Note that be­cause the choice of ex­it­ing the loop is made at in­struc­tion 12, and not at in­struc­tion 1d, what­ever is af­ter ad­dress 1f and before 3c is un­reach­able from in­side the loop.

Now what about as­sert() ?

There was a line of the form assert(x >= 0); in our code, where did it go ?

First of all, it has been ex­panded by the pre­proces­sor.

fgrep -r 'assert(e)' -A3 /usr/include

yields,

#define assert(e) \
  ((e) ? (void)0 : __assert(__func__, __FILE__, \
                            __LINE__, #e))

So we have some­thing of the form call __assert if x < 0.”

The first two in­struc­tions of sum(),

test edi,edi
js 1f

They do just that, jump if EDI < 0.”

Looking at what lies af­ter 1f,

1f: 55                       push   rbp
20: 48 89 e5                mov    rbp,rsp
23: bf 00 00 00 00          mov    edi,0x0
28: be 00 00 00 00          mov    esi,0x0
2d: ba 07 00 00 00          mov    edx,0x7
32: b9 00 00 00 00          mov    ecx,0x0
37: e8 00 00 00 00          call   3c 
3c: 89 f0                   mov    eax,esi
3e: c3                      ret    
3f: 90                      nop

Where did the call to __assert go ? It’s not there yet, what we’re looking at is a place­holder, some­thing we should just not care about since it does­n’t rep­re­sent ex­actly how our pro­gram will run from there.

Let’s look at the ac­tual ex­e­cutable, pro­duced af­ter link­ing of sum.o.

20136f: 55                      push   rbp
201370: 48 89 e5                mov    rbp,rsp
201373: bf ca 02 20 00          mov    edi,0x2002ca
201378: be bf 02 20 00          mov    esi,0x2002bf
20137d: ba 07 00 00 00          mov    edx,0x7
201382: b9 b8 02 20 00          mov    ecx,0x2002b8
201387: e8 d4 00 00 00          call   201460 <__assert@plt>
20138c: 89 f0                   mov    eax,esi
20138e: c3                      ret    
20138f: 90                      nop

Had you run ob­j­dump(1) with -drw in­stead of just -d that you would have got­ten more in­for­ma­tions about this,

37: e8 00 00 00 00              call   3c <sum+0x3c>  
38: R_X86_64_PC32               __assert+0xfffffffffffffffc

Et voilà, c’est fini pour au­jour­d’hui. I hope you learned as many things that I did while writ­ing this.

source code