Encoding and Decoding, Part I

A good way to exercise your C programming muscles is to work on an encoding/decoding project. This process makes you think about data and how it’s represented, and also how to work on both ends of an input/output puzzle.

Encoding is the process of translating information into another format. For example, Morse Code translates letters and numbers into dots and dashes.

At the other end, decoding translates the encoded information back into its original form, such as converting the dots and dashes back into readable text.

Don’t confuse encoding with compression or encryption. An encoding scheme might also compress or encrypt the data, but neither is a required feature. What these operations share is that they're reversible: just as you encode and decode, you can compress and decompress data, encrypt and decrypt it, or combine the operations. In all cases, the original data is recovered, which is the point of encoding and decoding.

A simple example of encoding would be to input a string as text and output that string as its ASCII code values, using either decimal or hex values. Send those values into another program to decode and retrieve the original text.

Another example, one I touch upon in my books, is the exclusive OR operation using byte 0xAA, binary 10101010. If you XOR any byte with 0xAA and then XOR the result with 0xAA again, you get back the original value.

The following code reads text from standard input and outputs a string of hexadecimal bytes encoded with XOR 0xAA.

#include <stdio.h>

int main()
{
    int ch;

    while(1)
    {
        ch = fgetc(stdin);
        if(ch == EOF)
            break;
        printf("%02X",ch^0xAA);
    }

    return(0);
}

The while loop at Lines 7 through 13 is endless. Standard input is read at Line 9 and stored in variable ch. If the end of file is captured at Line 10, the loop halts. Otherwise, the value of ch is XOR’d with 0xAA in the printf() statement at Line 12, and output as a 2-digit hexadecimal byte.

Yes, this program is a filter, which reads standard input and generates standard output. You can run it by itself, which is weird, or redirect input from another source.

If you run the code by itself, type some text and press the Enter key to see it processed. Press Ctrl+C to terminate the program or, to signal the end of input (EOF), press Ctrl+Z and then Enter in Windows or Ctrl+D in Unix. Here’s a sample run at the command prompt:

Hello there, Dan!
E2CFC6C6C58ADEC2CFD8CF868AEECBC48BA0

Assuming the program is named 1024 and you have a file named gettysburg.txt in the current directory, you can use the filter with input redirection to encode the file’s text:

./1024 < gettysburg.txt 
ECC5DFD88AD9C9C5D8CF8ACBC4CE8AD9CFDCCFC48AD3CFCBD8D98ACBCDC58AC5DFD88ACCCBDEC2CFD8D98AC8D8C5DFCDC2DE8ACCC5D8DEC28AC5C48ADEC2C3D98AC9C5C4DEC3C4CFC4DE8ACB8AC4CFDD8AC4CBDEC3C5C4868AC9C5C4C9CFC3DCCFCE8AC3C48AC6C3C8CFD8DED3868ACBC4CE8ACECFCEC3C9CBDECFCE8ADEC58ADEC2CF8ADAD8C5DAC5D9C3DEC3C5C48ADEC2CBDE8ACBC6C68AC7CFC48ACBD8CF8AC9D8CFCBDECFCE8ACFDBDFCBC684A0A0

The encoding works, but it's sloppy. First, the output isn't identifiable as encoded data. A heading would not only help anyone reading the data but also eventually help the decoding program recognize properly encoded data.

Second, the code could be formatted in a manner other than a long string of hex bytes. After all, this output is encoded. It's not an attempt at encryption or obfuscation, so making it pretty is a good thing.

In next week's Lesson, I show some improvements to the code that present the output properly.
