Encoding a String – Solution

The task for this month’s Exercise is to write an encoding filter that follows a specific pattern: The first character (or byte) is output as a 2-digit hex value. Each remaining character is output, also as a 2-digit hex value, as the difference between it and the previous character (current minus previous). I’m sure this type of encoding has an official name, but it’s the holidays and I’m too lazy to look it up.
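Written as a formula, the rule looks like this. Here $c_i$ is the $i$th input byte and $e_i$ is the encoded byte. These symbols are introduced only for this sketch and are not part of the Exercise. The value $c_{-1} = 0$ stands in for a starting "previous" value of zero, which is what makes the first encoded byte equal to the first input byte. The general technique is commonly known as delta encoding.

$$
e_i = (c_i - c_{i-1}) \bmod 256, \qquad c_{-1} = 0, \qquad \text{so } e_0 = c_0
$$

Each $e_i$ is then printed as two hex digits. The $\bmod 256$ keeps a negative difference inside a single byte, so going down one step (a difference of $-1$) becomes 255, or FF.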

The encoder I created uses two int variables, current and previous. Variable current holds the character just read, and previous holds the character before it. Before the loop starts, previous is set to zero. As a result, the first character’s difference is its own value, and it’s output unchanged.

Like any filter, the program uses a loop with the getchar() function in its condition to read standard input. For each character read, the difference between current and previous is output as a 2-digit hex value. Then previous is assigned the value of current, and the loop continues:

2021_12-Exercise.c

#include <stdio.h>

int main()
{
    int current,previous;

    /* process input */
    previous = 0;    /* first character is output as-is */
    while( (current=getchar()) != EOF )
    {
        printf("%02X",(unsigned char)(current-previous));
        previous = current;
    }
    putchar('\n');

    return(0);
}

Both variables current and previous are declared as integers because int is the data type returned by the getchar() function. It must be an int because EOF is an integer value, not a character. The while loop’s condition tests for EOF, and that test is how the loop terminates.

Inside the loop, the printf() statement uses the placeholder %02X to output a two-digit hexadecimal value in uppercase. A value that needs only one digit is padded with a leading zero. This keeps the encoding consistent at two characters per byte.

The (unsigned char) cast ensures that the value output is only two digits long. Without it, printf() outputs a negative difference as a full-width value prefixed with several F digits, such as FFFFFFFF instead of FF for -1.

At this point, you can play with the filter, but the data stays encoded:

$ ./encode
What secrets are to be revealed?
5711F913AC53F2FE0FF30FFFAD4111F3BB54FBB14203BB52F311EFFC0BF9FFDBCB

I hope that your solution met with success. An easy way to test it is to process two adjacent letters, going up and then down:

$ ./encode
ABA
4101FFC9

The code for 'A' is 41. Up to 'B' is 01. Down to 'A' is FF, which is -1 in two-digit hex. The final C9 is the newline: its code, 0A, minus 41 is -37 hex (-55 decimal), which wraps to C9 in a single byte.

The true usefulness of this encoding, and the test of your programming prowess, comes with writing the decoding filter. The decoding program is the topic of next month’s Exercise. You can start creating now or wait until New Year’s Day, which is what I’m going to do.
