The URL Decoding Filter – Solution

Unwinding percent-encoding involves three steps:

  1. Pass through the unchanged characters.
  2. Change + back into a space.
  3. Decode the percent strings, which is the most involved process.


For this month’s Exercise, my solution is a simple I/O filter. In fact, for most decoding, the character input shoots back out the filter as output. Here is the main() function in my solution:

int main()
{
    int i;

    while( (i=getchar()) != EOF )
    {
        if( i=='%' )
            pdecode();
        else if( i=='+')
            putchar(' ');
        else
            putchar(i);
    }

    return(0);
}

The while loop keeps spinning until getchar() returns EOF (end of file). EOF is a special int value rather than a character, which is why variable i is declared as an int. The getchar() function reads input; the putchar() function writes output. The if-else-if-else structure handles the three steps for decoding percent encoding. As you can see in the main() function, I handle the steps backwards:

Third, when the % character is encountered, execution shuffles off to the pdecode() function. I could do the translation in the main() function, but by using pdecode() I keep main() short and readable.

Second, if the input character is a +, the space character is output.

First, anything left over is output directly. This final condition catches all alphanumeric text as well as the four exceptions in HTML5: - . _ *

The pdecode() function works to grab the next two characters of input; the % is already digested and not output.

void pdecode(void)
{
    int a;

    a = char2hex() * 0x10;
    a += char2hex();
    putchar(a);
}

Variable a builds the value, the ASCII code translated from the next two characters of input. The char2hex() function reads and tests input. The value returned is 0x0 through 0xF (15), translated from characters 0 through 9 and A through F.

The first character read is multiplied by 0x10 (16) and stored in variable a; the next character is added to that value. So, for example, string A1 is translated into 0xA * 0x10 + 0x1, which becomes 0xA1. That value is output by putchar(a).
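To make the arithmetic concrete, here is a minimal sketch of the same calculation as a standalone function. The name combine_nibbles() is hypothetical and not part of my solution; it assumes both arguments are already in the range 0 through 15.

```c
/* Combine two decoded hex digit values (each 0 through 15) into one byte.
   combine_nibbles() is a hypothetical helper used only for illustration. */
int combine_nibbles(int high, int low)
{
    /* same arithmetic as pdecode(): high digit times 16, plus low digit */
    return( high * 0x10 + low );
}
```

For example, combine_nibbles(0x4, 0x1) returns 0x41, the ASCII code for A, so %41 decodes to A. Multiplying by 0x10 is equivalent to shifting left four bits, so (high << 4) | low gives the same result for values in this range.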

The char2hex() function’s job is to read a character of input, confirm that it’s a valid hexadecimal digit, and return that value. On an error, the function exits the program:

int char2hex(void)
{
    int d;

    d = getchar();
    if( d==EOF ) exit(1);   /* quit on EOF */
    if( isdigit(d) )        /* standard <ctype.h>; isnumber() is a BSD extension */
        return( d - '0' );
    d = toupper(d);
    if( d>='A' && d<='F' )
        return( d - 'A' + 0x0A );
    else
        exit(1);
}

The getchar() function fetches the next character from the input stream. If it returns EOF, the function immediately exits the program. This, and all exit conditions in this function, indicate a poorly formed percent-encoded string, so bailing out is a valid move.

The next if test checks for a digit, 0 to 9. If so, that digit’s value is returned, 0 to 9.

For non-digit characters, the toupper() function translates input to uppercase, and another if test confirms whether the character is in the range 'A' to 'F'. If so, the character’s value, 10 through 15, is returned.

The final else catches any non-hexadecimal character and terminates the program, as the percent-encoded string is malformed or invalid.
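If exiting the program feels too abrupt, one possible variation separates the digit test from the input and the exit. The sketch below is not my solution: hex_value() is a hypothetical name, it takes the character as an argument instead of calling getchar(), and it returns -1 for anything that isn't a hexadecimal digit so that the caller decides what to do.

```c
#include <ctype.h>

/* Return 0 through 15 for a valid hexadecimal digit character,
   or -1 for any other value. hex_value() is a hypothetical helper. */
int hex_value(int c)
{
    if( isdigit(c) )
        return( c - '0' );
    c = toupper(c);
    if( c>='A' && c<='F' )
        return( c - 'A' + 0x0A );
    return(-1);     /* not a hexadecimal digit */
}
```

With this approach, the caller could test for -1 and then choose between exiting, printing an error message, or passing the malformed text through unchanged.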

Click here to view my entire solution. Your solution may vary, of course, but if it handles the translation, it’s good.

To test the program, process filtered input from last month’s Exercise and run it through the solution for this month’s exercise. If the final string matches the initial string, everything works. For example:

$ cat 09solution.txt
http://c-for-dummies.com/blog/?p=2681
$ cat 09solution.txt | ./percent-encode
http%3A%2F%2Fc-for-dummies.com%2Fblog%2F%3Fp%3D2681%0A$
$ cat 09solution.txt | ./percent-encode | ./percent-decode
http://c-for-dummies.com/blog/?p=2681
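The same round-trip check can be sketched inside a program by decoding a string in memory rather than filtering standard input. This is a rough illustration, not my solution: the functions hex_digit(), decode_string(), and decodes_to() are hypothetical names, and the sketch reports malformed input with a return value instead of exiting.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit, or -1 if invalid */
static int hex_digit(int c)
{
    if( isdigit(c) )
        return( c - '0' );
    c = toupper(c);
    if( c>='A' && c<='F' )
        return( c - 'A' + 0x0A );
    return(-1);
}

/* Decode percent-encoded text from src into dst. The dst buffer needs
   room for strlen(src)+1 bytes, because decoding never lengthens text.
   Returns 0 on success, -1 on a malformed percent sequence. */
int decode_string(const char *src, char *dst)
{
    int hi, lo;

    while( *src )
    {
        if( *src=='%' )
        {
            /* checking hi first avoids reading past a short string */
            hi = hex_digit( (unsigned char)src[1] );
            if( hi < 0 ) return(-1);
            lo = hex_digit( (unsigned char)src[2] );
            if( lo < 0 ) return(-1);
            *dst++ = (char)(hi * 0x10 + lo);
            src += 3;
        }
        else if( *src=='+' )
        {
            *dst++ = ' ';
            src++;
        }
        else
            *dst++ = *src++;
    }
    *dst = '\0';
    return(0);
}

/* Return 1 when decoding `encoded` yields exactly `expected`, 0 otherwise */
int decodes_to(const char *encoded, const char *expected)
{
    char *buf;
    int ok;

    buf = malloc( strlen(encoded) + 1 );
    if( buf==NULL ) return(0);
    ok = decode_string(encoded, buf)==0 && strcmp(buf, expected)==0;
    free(buf);
    return(ok);
}
```

Calling decodes_to() with the encoded string from the sample run and the original URL followed by a newline should return 1, mirroring the comparison made by eye at the command prompt.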
