Encoding and Decoding, Part IV

It would be a grand thing to set out and craft the entire decoding program in one sitting. That’s ambition in action, but it doesn’t demonstrate much programming experience.

Instead of facing multiple setbacks, my approach is to tackle the decoding process one step at a time.

The first step, conquered in last week’s Lesson, was to figure out how to translate the hexadecimal character string into a value. That accomplishment can be set aside for the next few steps, which are:

2. Process standard input one line at a time.
3. Confirm that the first line is the hexcode header and check the version number.
4. Decode each line of text and display the output.
5. Check for the end of the formatted hexcode data.

Step 2 is to write code that chews through standard input one line at a time. Because this program is a filter, you could use the fgets() function to read in a line of text from standard input. Instead, I’ve opted for more control by using the getchar() function. That way I can monitor input one character at a time.
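For comparison, here is a rough sketch of the fgets() route not taken. The function name read_line_fgets() and its return values are invented for illustration and aren't part of this Lesson's code. The idea is that a line too long for the buffer shows up as a string with no newline in it:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not from the Lesson: read one line with fgets().
   Returns 1 when a line was read, 0 at end of file, and -1 when the
   buffer filled up before a newline was found. */
int read_line_fgets(FILE *fp, char *buffer, int size)
{
    if( fgets(buffer, size, fp) == NULL )
        return(0);
    /* no newline and more input waiting: the line was too long */
    if( strchr(buffer, '\n') == NULL && !feof(fp) )
        return(-1);
    return(1);
}
```

This approach works, but the overflow test happens only after fgets() returns, which is one reason to prefer watching the input character by character.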

For decoding hexcode output, you know that every line is formatted: 24 hex values on a line, each prefixed by a space, comes to 72 characters. Add one for the newline and one for the null character, and a full line occupies 74 bytes of storage. Any line with more than 72 characters of text indicates some type of error, either a file read booboo or data that’s simply not in the hexcode format.
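The arithmetic can be spelled out in code. This is only a sketch: the constant names and the line_storage() function are made up for illustration, and it assumes the newline is stored in the buffer along with the text, as this Lesson's code does:

```c
#define VALUES_PER_LINE 24    /* hex values on a full line */
#define CHARS_PER_VALUE 3     /* one space plus two hex digits */

/* Hypothetical helper: bytes needed to store one line of hexcode
   holding the given number of values, counting the newline and the
   terminating null character. */
int line_storage(int values)
{
    return( values * CHARS_PER_VALUE + 2 );
}
```

Calling line_storage(VALUES_PER_LINE) works out to 24 * 3 + 2, or 74 bytes.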

Here’s my solution:

#include <stdio.h>

#define LINE_LENGTH 73

int main()
{
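    char line_buffer[LINE_LENGTH+1];    /* extra byte for the terminating null */
    int c;    /* int, not char, so that EOF is distinct from every character */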
    int buffer_index = 0;

    while(1)
    {
        c = getchar();
        if( c == EOF)
            break;
        line_buffer[buffer_index] = c;
        buffer_index++;
        if( buffer_index > LINE_LENGTH)
        {
            /* overflow condition */
            puts("\nInvalid hexcode line format");
            return(1);
        }
        if( c == '\n')
        {
            /* terminate string */
            line_buffer[buffer_index] = '\0';
            /* display contents */
            printf("%s",line_buffer);
            /* reset index */
            buffer_index = 0;
        }
    }

    return(0);
}

The filter runs from an endless while loop (Lines 11 through 33). Line 13 fetches a character from standard input. If the end of file (EOF in Line 14) is encountered, the loop terminates and the program stops. Otherwise, the character c is stored in the line_buffer char array at offset buffer_index (Line 16).

Line 18 checks for buffer overflow, which happens when the buffer_index variable is greater than the line length. This step not only prevents hackers from abusing the code, but it also confirms that the input isn’t formatted as hexcode; the program spews out an error message (Line 21) and quits (Line 22).

When the newline is encountered (Line 24), the string is terminated (Line 27) and then displayed (Line 29). This is the point where processing would normally take place, but at this stage the string is output instead. Line 31 resets the buffer_index variable, and the endless while loop continues.
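One variation worth considering is to test the index before storing the character, so that nothing is ever written past the end of the buffer. The sketch below packages that idea as a function. The name get_line(), its return values, and the use of fgetc() on a FILE pointer are assumptions made for illustration, not the code this series builds on:

```c
#include <stdio.h>

#define LINE_LENGTH 73    /* 72 characters of text plus the newline */

/* Hypothetical helper: gather one line from fp into buffer, which must
   hold at least LINE_LENGTH+1 bytes. Returns the number of characters
   stored (newline included), 0 at end of file, or -1 when the line is
   longer than LINE_LENGTH characters. */
int get_line(FILE *fp, char *buffer)
{
    int c;            /* int, so that EOF is distinct from any character */
    int index = 0;

    while( (c = fgetc(fp)) != EOF )
    {
        /* test first, so nothing is written out of bounds */
        if( index >= LINE_LENGTH )
            return(-1);
        buffer[index++] = c;
        if( c == '\n' )
            break;
    }
    /* terminate string */
    buffer[index] = '\0';
    return(index);
}
```

The caller would loop on get_line(stdin, line_buffer), process each line where the return value is positive, and bail out with an error message on -1.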

I used the following redirection command to create a hexcode formatted file, gettysburg.hexc:

hexencode < gettysburg.txt > gettysburg.hexc

Running this Lesson’s code on the file gettysburg.hexc generates this output:

a.out < gettysburg.hexc
START HEX CODE v1.0
 EC C5 DF D8 8A D9 C9 C5 D8 CF 8A CB C4 CE 8A D9 CF DC CF C4 8A D3 CF CB
 D8 D9 8A CB CD C5 8A C5 DF D8 8A CC CB DE C2 CF D8 D9 8A C8 D8 C5 DF CD
 C2 DE 8A CC C5 D8 DE C2 8A C5 C4 8A DE C2 C3 D9 8A C9 C5 C4 DE C3 C4 CF
 C4 DE 8A CB 8A C4 CF DD 8A C4 CB DE C3 C5 C4 86 8A C9 C5 C4 C9 CF C3 DC
 CF CE 8A C3 C4 8A C6 C3 C8 CF D8 DE D3 86 8A CB C4 CE 8A CE CF CE C3 C9
 CB DE CF CE 8A DE C5 8A DE C2 CF 8A DA D8 C5 DA C5 D9 C3 DE C3 C5 C4 8A
 DE C2 CB DE 8A CB C6 C6 8A C7 CF C4 8A CB D8 CF 8A C9 D8 CF CB DE CF CE
 8A CF DB DF CB C6 84 A0 A0
END HEX CODE

No processing takes place; the code simply spits out the formatted input. If you tried the code with a non-formatted file, such as the original gettysburg.txt file, you'd see this output:

a.out < gettysburg.txt 

Invalid hexcode line format

This solution confirms that hexcode output can be read and that long lines are rejected. Step 3 is to confirm the hexcode header, which I cover in the next Lesson.
