Taking a glimpse at as­sem­bly pro­duced by GCC

Created: Sun Feb 24 05:02:41 CET 2019

Last mod­i­fied: Sun Feb 24 05:02:41 CET 2019


UPDATE: This post will be treated as the second in a new series. The next one, that is to say the first in the series, will be more beginner-friendly and will cover the basics of (dis)assembly. (See comments at bottom.)


I’m list­ing ran­dom stuff I no­ticed while ex­per­i­ment­ing with gdbgui. I replaced mem­ory ad­dresses with la­bels wher­ever pos­si­ble.

All of the ex­am­ples are com­piled us­ing gcc -g -o foo foo.c.

Part 1: C’s && maps to cmp and je at as­sem­bly-level.

if (argc != 1 && argc != 2) {
    fprintf(stderr, "Usage: %s ...\n", argv[0]);
    exit(200);
} else {
    // do what's normal
}

Now the GCC-produced assembly (here in Intel syntax).

cmp DWORD PTR [rbp-0x144],0x1
je else

cmp DWORD PTR [rbp-0x144],0x2
je else

; [...] calls to fprintf() and exit()

else:

If I gave you the above snippet of assembly and asked for a literal, word-for-word C translation, you'd probably answer with something of the following form,

if (argc == 1) {
    // do what's normal
} else if (argc == 2) {
    // do what's normal
} else {
    fprintf(stderr, "Usage: %s ...\n", argv[0]);
    exit(200);
}

which means exactly the same as,

if (argc == 1 || argc == 2) {
    // do what's normal
} else {
    fprintf(stderr, "Usage: %s ...\n", argv[0]);
    exit(200);
}

That reads like a somewhat inverted version of the original code.

Both forms are strictly equivalent!

It is just a matter of how we read the code in plain English: “if x, then do A else do B” is equivalent to “if not x, then do B else do A,” where x is our condition expression, “not” is like C's ! and A and B represent the two cases.

Applying this to the C above, we get our assembly.

If you want to take it to the mathematical level, propositional logic has a theorem for this, known as De Morgan's laws: ¬(x∧y) ≡ (¬x)∨(¬y), and likewise ¬(x∨y) ≡ (¬x)∧(¬y), where ∨ means “or”, ∧ means “and” and ¬ means “not”.

It ex­plains why by negat­ing (x != c && x != d) we get (x == c || x == d).


Part 2: In as­sem­bly, you must know the size of what you’re ask­ing for.

Another point to make is that argc is stored at address rbp-0x144, and its value is accessed using the DWORD PTR [] syntax, which is assembly's equivalent of C's *, the indirection operator. The difference is that in assembly you must state the size of the value you want to access. Here we're reaching for a double word, which on x86 is always 32 bits. That matches the size of an int on my platform, where argc is a 32-bit value.


Part 3: func­tion pro­logue and reg­is­ter sizes

Right after main() was called, the following line of assembly was executed:

mov DWORD PTR [rbp-0x144],edi

This is part of the function prologue, where the stack frame that will hold the automatic variables is set up and filled with values. Here argc is initialized with the value stored in edi, an integer-sized register that was probably set by the function that called main().

Right after that, the value of argv, which is an address, is moved from rsi, which is a 64-bit register, the size of a memory address on a 64-bit architecture.

Fun fact: edi and rdi are actually the same register, just as esi and rsi are. edi is just one half of rdi, the lower one to be exact, where the bits are the less significant*. (Don't mistake rdi for rsi though.) Thus modifying edi means modifying rdi, whereas operations that only touch the upper half of rdi won't affect edi. woooooo..

*For ex­am­ple, in bi­nary 0010, 10 is less sig­nif­i­cant than 00.

You can take it as a rule that eXX is the lower half of the rXX register on 64-bit platforms, at least for the registers that already existed on 32-bit x86 (eax/rax, edi/rdi and so on).

So that's it for today's post. I hope you found it interesting despite my inability to explain things in a simple fashion.

Onward to part 2!

source code