Programming new features for file2c(1)

Created: Sat Mar 16 11:59:55 CET 2019

Last modified: Sun Mar 17 21:39:11 CET 2019


I’ll be using files from FreeBSD’s source tree; nevertheless, Linux users can just download the C file and remove the __FBSDID directive.

What does file2c(1) do?

dd if=/dev/urandom bs=1 count=10 \
  | file2c -xn 2 'int i = {' '};' \
  | indent
#int        i = {
#   0x1e, 0x2f,
#   0x82, 0x63,
#   0x5e, 0x4d,
#   0x9c, 0x96,
#   0x48, 0xbd
#};

file2c(1) converts binary data to C arrays or scalars.

The -n flag specifies how many items to fit on a single line. The -x flag tells file2c(1) that we want base 16 representation (the default is base 10).

We want to add two flags: -c to output characters and -o to output octal numbers.

echo hello | file2c -c
# 'h', 'e', 'l', 'l', 'o', '\012'

echo hello | file2c -o # octal notation
# 0150,0145,0154,0154,0157,012

Editing the file

cd /usr/src/usr.bin/file2c
$EDITOR file2c.c

Skip to the main function. Right after the declarations of the automatic variables (which, by the way, is where the default behavior is defined), you should see a line that looks like this:

while ((c = getopt(argc, argv, "n:sx")) != -1) {

file2c(1) uses libc’s getopt() function to parse command-line arguments. Quoting the documentation: “To use this facility, your program must include the header file unistd.h.”

The first thing to do is to add our flags to getopt()’s options parameter, just like this: "n:sxoc".

After this change, we can be pretty confident that getopt() itself won’t complain about an illegal option were we to run echo hello | file2c -c. Since we didn’t add a case for the flag yet, though, the switch below would send it to the default branch and call usage().

while ((c = getopt(argc, argv, "n:sxoc")) != -1) {
        switch (c) {
        case 'n':       /* Max. number of bytes per line. */
                maxcount = strtol(optarg, NULL, 10);
                break;
        case 's':       /* Be more style(9) comliant. */
                pretty = 1;
                break;
        case 'x':       /* Print hexadecimal numbers. */
                radix = 16;
                break;
        case '?':
        default:
                usage();
        }
}

Inside the while loop, there is a switch. From the structure of the code, we can infer that getopt() does something like this with -xn 2:

It keeps an implicit internal cursor into argv and consumes the command-line arguments one at a time, so that getopt() first returns 'x' and then, on the second call, 'n'. Because n is followed by a : in the options string, getopt() sets the global variable optarg to the required argument’s value, which is to say, "2".

Lower in the source file, we can see another switch statement:

switch (radix) {
case 10:
        linepos += printf("%d", c);
        break;
case 16:
        linepos += printf("0x%02x", c);
        break;
default:
        abort();
}

To add proper behavior to our new pair of flags, we’ll need a total of four changes.

In getopt()’s switch statement, we add two duplicates of the 'x' case with different values for radix:

switch (c) {
/* case 'x': ... */
case 'o':       /* Print octal numbers. */
        radix = 8;
        break;
case 'c':       /* Print numbers as characters. */
        /* Negative sentinel: it can't collide with a real radix. */
        radix = -1;
        break;
}

In the later switch, the one that runs while we’re iterating over characters from standard input, we’ll add branches corresponding to the new values of radix:

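switch (radix) {
case -1:        /* -c: print a character literal. */
        /* Escape control characters, backslashes and quotes as octal. */
        if (iscntrl(c)
            || c == '\\'
            || c == '\"'
            || c == '\'') {
                linepos += printf("'\\%03o'", c);
        } else {
                linepos += printf("'%c'", c);
        }
        break;
case 8:         /* -o: print an octal constant with a leading 0. */
        linepos += printf("0%o", c);
        break;
/* case 10: ... */
}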

iscntrl() helps us detect characters that should be escaped before being put back into another C source file; you must include ctype.h at the beginning of the file.

Since each of these flags simply overwrites radix, combining them, as in -xco, would result in only the last flag (here -o) taking effect.

Thanks for reading. I hope this short introduction to getopt() was useful and, more importantly, that the new features we just added to file2c(1) will be just as helpful to some.

source code