Encoding and Decoding, Part II

To help improve the presentation of encoded data, consider sprucing up the output.

In last week’s Lesson, the XOR 0xAA encoding program read text input and generated hexadecimal output. Here’s the typical result of running the program on the gettysburg.txt file:

$ ./1024 < gettysburg.txt
ECC5DFD88AD9C9C5D8CF8ACBC4CE8AD9CFDCCFC48AD3CFCBD8D98ACBCDC58AC5DFD88ACCCBDEC2CFD8D98AC8D8C5DFCDC2DE8ACCC5D8DEC28AC5C48ADEC2C3D98AC9C5C4DEC3C4CFC4DE8ACB8AC4CFDD8AC4CBDEC3C5C4868AC9C5C4C9CFC3DCCFCE8AC3C48AC6C3C8CFD8DED3868ACBC4CE8ACECFCEC3C9CBDECFCE8ADEC58ADEC2CF8ADAD8C5DAC5D9C3DEC3C5C48ADEC2CBDE8ACBC6C68AC7CFC48ACBD8CF8AC9D8CFCBDECFCE8ACFDBDFCBC684A0A0

As formatted on this web page, the encoded text is just a long string, which you have to scroll left and right to see. On the terminal screen, it looks just as ugly.

Because the text is encoded with a specific method, and the long-term plan is to decode it, the output should be formatted with the decoder program in mind. It can include a header that describes the encoding format. The data itself can be organized so that the decoding program has an easier time fetching it and humans can more easily view it. And finally, an ending tag can wrap up the encoded data, confirming that it's been properly encoded and read.

To meet these ends, I've crafted the following improvement to the original encoding program, which I call hexcode:

#include <stdio.h>

#define BYTES_PER_LINE 24

int main()
{
    int ch,bytes;

    bytes = 0;
    printf("START HEX CODE v1.0\n");
    while(1)
    {
        ch = fgetc(stdin);
        bytes++;
        if(ch == EOF)
            break;
        printf(" %02X",ch^0xAA);
        if( bytes == BYTES_PER_LINE)
        {
            putchar('\n');
            bytes = 0;
        }
    }
    printf("\nEND HEX CODE\n");

    return(0);
}

The defined constant BYTES_PER_LINE at Line 3 sets the number of hex byte chunks displayed in each line (row) of output. This value is used with the bytes variable, which is initialized at Line 9, then incremented at Line 14 for each character read. When the value of bytes is equal to the BYTES_PER_LINE constant (Line 18), the bytes variable is reset, and a newline is output to start over with the next row of formatted text.
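One edge case in this counter logic: when the input length is an exact multiple of BYTES_PER_LINE, a newline is output after the final row, and the end tag's own leading newline then leaves a blank line before END HEX CODE. The following is a minimal sketch of one way to avoid that, not the program shown above. The function name hex_rows() and its buffer-based interface are assumptions made for illustration; it ends each row either when the row is full or when the data runs out, so the output always finishes with exactly one newline.

```c
#include <stdio.h>

#define BYTES_PER_LINE 24

/* Hypothetical helper, not part of the Lesson's program.
   Writes n encoded bytes into out as rows of hex text and
   returns the number of characters written.
   Assumption: out has room for 3 characters per byte,
   one newline per row, and the terminating null character. */
int hex_rows(const unsigned char *data, int n, char *out)
{
    int i,len;

    len = 0;
    out[0] = '\0';
    for( i=0; i<n; i++ )
    {
        len += sprintf(out+len," %02X",data[i]^0xAA);
        /* end the row when it's full or when the data runs out */
        if( (i+1) % BYTES_PER_LINE == 0 || i+1 == n )
            len += sprintf(out+len,"\n");
    }
    return(len);
}
```

With this approach, the end tag would be output without its leading newline, because the final row is always terminated already.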

The first line output is the program's name and version number, at Line 10. I added the word START mostly for human eyes, but the program name can be used by the decoder to confirm the data format. The version number also helps the decoder to recognize how the data is encoded, should the format change in the future.
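As a rough idea of how a decoder might use that first line, here is a minimal sketch. The function name header_ok() is hypothetical, and the decoder written in later Lessons may handle the header differently. The sketch assumes the caller has already read the first line of input into a string, for example with fgets().

```c
#include <string.h>

#define HEADER "START HEX CODE v1.0"

/* Hypothetical helper: returns 1 when line holds the expected
   header text, followed by nothing more than a line ending;
   returns 0 otherwise */
int header_ok(const char *line)
{
    size_t len;

    len = strlen(HEADER);
    if( strncmp(line,HEADER,len) != 0 )
        return(0);
    /* only a line ending (or nothing) may follow the header text */
    return( line[len]=='\0' || line[len]=='\n' || line[len]=='\r' );
}
```

Checking the character after the header text means that a line such as START HEX CODE v1.01 is rejected rather than mistaken for version 1.0.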

At Line 17, the printf() statement's output is modified to add a space before each hexadecimal value. The XOR 0xAA operation still takes place at that line.
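The reason this encoding can be decoded at all is that XOR with a fixed value undoes itself: applying the same operation twice returns the original byte. Here is a minimal sketch of that property. The names XOR_KEY and xor_byte() are assumptions made for illustration and don't appear in the program above.

```c
#define XOR_KEY 0xAA    /* the same value used in the encoding program */

/* Hypothetical helper: encodes or decodes a single byte.
   The same call works in both directions because
   (ch ^ XOR_KEY) ^ XOR_KEY equals ch. */
int xor_byte(int ch)
{
    return( ch ^ XOR_KEY );
}
```

Assuming ASCII, xor_byte('F') returns 0xEC, which is the first byte in the sample output for gettysburg.txt, and xor_byte(0xEC) returns 'F' again.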

Finally, I added an end tag, output with printf() at Line 24.

Here is the sample output for the gettysburg.txt file, the same data shown earlier in this post as output from the original encoding program:

$ ./hexcode < gettysburg.txt 
START HEX CODE v1.0
 EC C5 DF D8 8A D9 C9 C5 D8 CF 8A CB C4 CE 8A D9 CF DC CF C4 8A D3 CF CB
 D8 D9 8A CB CD C5 8A C5 DF D8 8A CC CB DE C2 CF D8 D9 8A C8 D8 C5 DF CD
 C2 DE 8A CC C5 D8 DE C2 8A C5 C4 8A DE C2 C3 D9 8A C9 C5 C4 DE C3 C4 CF
 C4 DE 8A CB 8A C4 CF DD 8A C4 CB DE C3 C5 C4 86 8A C9 C5 C4 C9 CF C3 DC
 CF CE 8A C3 C4 8A C6 C3 C8 CF D8 DE D3 86 8A CB C4 CE 8A CE CF CE C3 C9
 CB DE CF CE 8A DE C5 8A DE C2 CF 8A DA D8 C5 DA C5 D9 C3 DE C3 C5 C4 8A
 DE C2 CB DE 8A CB C6 C6 8A C7 CF C4 8A CB D8 CF 8A C9 D8 CF CB DE CF CE
 8A CF DB DF CB C6 84 A0 A0
END HEX CODE

Much better!

In next week's Lesson, the decoding process begins. The first step is to convert a hexadecimal character string into a value.
