The URL Encoding Filter – Solution

A URL filter isn’t that difficult to code, once you know the rules. I’m sure you can concoct something clever or obfuscated in the C language, but I chose to use a clutch of if/else if/else statements to process input and generate output.

First, here is the basic filter skeleton:

    int i;

    while( (i=getchar()) != EOF )
    {
        putchar(i);
    }

The while loop fetches a single character, i, from the input stream. The getchar() function reads input and the putchar() function outputs the character. Variable i is an int, not a char, because getchar() returns an int so that the EOF value can be told apart from any valid character. The EOF condition monitors the End Of File marker, which is useful when the filter reads redirected input. All the details are covered in my books.

A straight pipe between input and output is technically a filter, but it’s not much of a program because it doesn’t do anything to alter input. For this month’s Exercise, the task is to convert ASCII codes into the percent encoding format, which requires modifying a single statement in the basic filter:

printf("%%%02X",i);

The percent sign must be escaped, so two are specified, %%. Then comes the output format, which is %02X. This placeholder sets output in hexadecimal format, two digits wide, padded with a leading zero, and with uppercase letters A through F.

Still, the single printf() function doesn’t adhere to the percent encoding format guidelines for HTML 5: Alphanumeric characters are not converted, characters - . _ * are left alone, and spaces are converted to + signs. To deal with these exceptions, some if-else action is required. Here is my full solution:

#include <stdio.h>
#include <ctype.h>

int main()
{
    int i;

    while( (i=getchar()) != EOF )
    {
        if( i=='-' || i=='.' || i=='_' || i=='*')
            putchar(i);
        else if( i==' ')
            putchar('+');
        else if( isalnum(i) )
            putchar(i);
        else
            printf("%%%02X",i);
    }

    return(0);
}

The first if statement at Line 10 uses multiple || (or) conditions to weed out characters - . _ *. (Line numbers count from the first #include directive in the listing.) These characters are output directly at Line 11. All other input slips down to Line 12 for the second comparison.

The else if at Line 12 converts the space character to a plus sign.

Next, at Line 14 the else if uses the isalnum() ctype function to cull out all alphanumeric characters, a to z, A to Z, and 0 to 9. These are output directly at Line 15.

The remaining characters or codes are handled by the else at Line 16. The printf() statement converts the character to percent encoding.

Another way to handle this operation would be to construct a horrendously long switch case structure, where each of the 128 ASCII codes is converted individually or in groups. That code would be fun to write, but it would take more time than the multiple if statements in my solution.

If you have a different solution, let me know. You can email me at the address found in the front of my books, or ask for an account here. I must manually sign people up as the blog software is ripe for phony signups.
