The URL Encoding Filter

URL encoding is a method of translating ASCII codes (not just text or URLs) into what’s often referred to as percent encoding. You’ve probably seen this format in your web browser’s address bar or in the text of a search engine’s links. The encoding is necessary so that the original content survives as plain text.

For example, ASCII codes 0 through 31 represent control characters, such as tab, newline, and others. To ensure that those character codes survive transmission over the Internet, URL encoding translates their values into a 2-digit hex string prefixed by the percent sign. So the tab character, ASCII code 9, becomes %09 when translated. And it’s translated back to ASCII code 9 when decoded.

With URL encoding/decoding, not every character is translated into the percent-hex string. Multiple standards exist, but for each one the alphanumeric characters (0-9, A-Z, a-z) are retained. In the HTML 5 standard, the characters - . _ and * are retained and spaces are converted to the + character. Everything else is translated to a two-digit hexadecimal string (upper or lower case) prefixed by a percent sign.

Your task for this month’s Exercise is to write a URL encoding filter. The filter processes standard input, translating characters to the percent encoding format as necessary, and outputting the result. (Details on writing a filter are presented in my C programming books.)

As an example, this page’s URL is http://c-for-dummies.com/blog/?p=2626. When run through the filter, the output is:

http%3A%2F%2Fc-for-dummies.com%2Fblog%2F%3Fp%3D2626

Please try this Exercise on your own before you click here to view my solution.

2 thoughts on “The URL Encoding Filter”

  1. In my solution (for my computer) I included:
    else if ( (int)i==10 ) printf("\n");
    Otherwise, after I pressed [Enter] the filtered result would end with %0A and the terminal cursor would not advance to a new line.

  2. I struggled with that option for my solution. Like you, I saw the output and it all ran together, which is visually unappealing. Still, the encoding rules don’t make an exception for the newline. Also, next month’s Exercise (sneak peek!) has you decode the sequence, so that’s also something to consider.
