Dereferencing NULL pointers

Created: Fri Mar 1 09:18:43 CET 2019

Last modified: Fri Mar 1 10:20:39 CET 2019


You might need to read about C or C++ pointers before reading this article.

Dereferencing is the act of looking at what lies at a given memory address. For example, in putchar(*p), p is dereferenced.
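Here is a minimal sketch of what that looks like in C. The function names first_char and print_chars are made up for this illustration; only putchar comes from the standard library.

```c
#include <stdio.h>

/* Returns the character stored at the address held in p.
   The unary * is the dereference: it reads the memory p points to. */
char first_char(const char *p)
{
    return *p;
}

/* Prints a NUL-terminated string one character at a time:
   dereference p, print the result, move p to the next address. */
void print_chars(const char *p)
{
    while (*p != '\0') {
        putchar(*p);
        p++;
    }
    putchar('\n');
}
```

Calling print_chars("hello") dereferences p five times to print the letters, plus one last time to find the terminating '\0'.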

A C program running as an ordinary process can’t look at address 0 of memory. It’s illegal: the kernel forbids it.

In fact, there are a bunch of places in memory that your program can’t read or write at all. There are so many that you would be better off thinking in terms of places you can access than in terms of places you can’t.
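On Linux you can actually list the places a process is allowed to touch, because the kernel exposes them in /proc/self/maps. The sketch below assumes a Linux system with /proc mounted; print_mappings is a hypothetical helper name, and it simply returns -1 anywhere that file doesn't exist.

```c
#include <stdio.h>
#include <string.h>

/* Prints the memory regions mapped into the current process (Linux only)
   and returns how many lines were read, or -1 if the file can't be opened. */
int print_mappings(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;

    char line[512];
    int count = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        fputs(line, stdout);
        /* A long line may arrive in several chunks: count it once,
           when the chunk containing its newline shows up. */
        if (strchr(line, '\n') != NULL)
            count++;
    }
    fclose(f);
    return count;
}
```

Each printed line starts with an address range followed by its permissions (r, w, x). Anything outside those ranges, address 0 included, is off limits.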

Dereferencing a random pointer might or might not raise an error and break your program. On a typical system, accessing the data stored at address 0 will reliably raise one. In practice, your program can’t recover from a dereference of the NULL pointer.

Side note: using NULL as a default value for pointers is considered a safe way to guard against dangling pointers: in case of a mistake, the program crashes instead of silently reading garbage.
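A minimal sketch of that habit follows. The helpers release and read_or_default are hypothetical names invented for this example, not part of any library.

```c
#include <stddef.h>
#include <stdlib.h>

/* Frees the allocation and resets the caller's pointer to NULL,
   so no dangling pointer to the freed memory is left behind. */
void release(int **pp)
{
    free(*pp);
    *pp = NULL;
}

/* Returns the value p points to, or fallback when p is NULL.
   The check is only possible because unused pointers are kept at NULL. */
int read_or_default(const int *p, int fallback)
{
    return (p != NULL) ? *p : fallback;
}
```

If you forget the check and dereference the pointer after release, you get an immediate crash rather than a read of freed memory.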

echo 'int main(){return(*(int*)0);}' > null.c
gcc null.c
./a.out
# => Segmentation fault

The two instructions responsible for this mess are copied below:

mov $0x0,%eax
mov (%rax),%eax # instruction 0x112e

Dereferencing 0 causes an access violation. If you happen to have some experience with CPU modes, such as real mode and protected mode, you probably know that the Linux kernel (or any big enough operating system) runs user programs in a relatively restricted CPU mode.

Quoting Wikipedia,

At the hardware level, the fault is initially raised by the memory management unit (MMU) on illegal access (if the referenced memory exists)

If I understand it correctly, (1) the program tries to dereference a NULL pointer, then (2) while instruction 0x112e is still being executed, the MMU raises a fault that hands control to the operating system, which (3) kills the program (on Linux, by sending it the SIGSEGV signal).
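Step (3) can be watched from the outside. The sketch below assumes a POSIX system where address 0 is not mapped (the usual case on Linux); null_deref_signal is a hypothetical helper name. It forks a child that reads through a NULL pointer, then asks the kernel how the child died.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that reads through a NULL pointer. Returns the number of
   the signal that killed the child, 0 if it exited normally, -1 on error. */
int null_deref_signal(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* The pointer object itself is volatile, so the compiler has to
           load it at run time and can't replace the dereference. */
        int *volatile p = NULL;
        _exit(*p != 0); /* not reached if the read faults */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On a typical Linux machine the return value should equal SIGSEGV: the child never reaches _exit, and the parent carries on unharmed.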

With the right level of optimization, compilers will often replace NULL pointer dereferences (or indirections) with the ud2 instruction, whose sole purpose is to raise an error (an invalid-opcode exception).
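You can ask for such a trap yourself. GCC and Clang provide __builtin_trap(), which on x86-64 is typically compiled to ud2. That mapping is an assumption about those compilers on that architecture; other targets use a different instruction and may deliver a different signal. The helper name trap_signal is made up for this sketch.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that executes the compiler's trap instruction. Returns the
   number of the signal that killed it, 0 on normal exit, -1 on error. */
int trap_signal(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0)
        __builtin_trap(); /* GCC/Clang builtin; does not return */

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On x86-64 Linux the child is usually killed by SIGILL (illegal instruction) rather than SIGSEGV, which is one way to tell that the compiler swapped a dereference for a trap.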

This article was short (only a quarter of the last one in terms of line count) but I hope you liked it.

See you next time, don’t hesitate to comment!
