Properly Padding Spaces and Tab Widths

The task for last week’s Lesson was to convert tabs as well as spaces. The problem is that tab stops aren’t considered: on the terminal, a tab character generates a variable number of spaces based on where the next tab stop is located. The number of spaces isn’t a fixed value.

The default tab stop interval on a Linux terminal is eight spaces. As on a typewriter or in a word processor, when a tab character is encountered, the terminal generates the spaces necessary to move the cursor to the next tab stop. Figure 1 shows how the tab stops are set and how spacing is calculated to position the cursor when a tab character is encountered.

Terminal graphics

Figure 1. How tab stops work on the terminal. Spaces are added to line up text at the next interval.

It’s possible to alter the terminal’s tab stops. The tabs command sets the interval, which means a user can adjust the tab stops to a value other than eight. For my sconvert utility, however, I assume the default value. Various trickery can be employed to obtain the current tab stop value, but that isn’t the point of this exercise.

Two updates are required for the code to calculate the proper number of spaces to output to line up text at a tab stop. The first is to track the current cursor position or offset from the start of the line. The second is to determine how many spaces are necessary to move the cursor to the next tab stop.

For the cursor offset, I use int variable offset. It’s initialized to zero at the start of each line, then incremented each time a character is output, which includes the &nbsp; HTML code for a non-breaking space. The newline ('\n') and carriage return ('\r') characters are also intercepted to reset the offset value to zero.

The expression I use to calculate spaces to the next tab stop is:

spaces = TAB_STOP-(offset%TAB_STOP);
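As a quick check of this expression, here is a minimal sketch that wraps it in a function. The name spaces_to_next_stop() and its parameters are hypothetical; they aren’t part of sconvert:

```c
#include <assert.h>

/* Hypothetical helper, not part of sconvert: return the number of
   spaces needed to move from `offset` to the next tab stop */
int spaces_to_next_stop(int offset, int tab_stop)
{
    /* assumption: offsets count from zero and the interval is positive */
    assert( offset>=0 && tab_stop>0 );

    /* offset%tab_stop is the distance past the previous tab stop */
    return( tab_stop-(offset%tab_stop) );
}
```

With a tab stop of 8, an offset of 6 returns 2 and an offset of 9 returns 7. An offset of 8 returns 8: A cursor already sitting on a tab stop moves a full interval to the next one.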

The offset MOD TAB_STOP calculation returns how far the cursor sits beyond the previous tab stop. This result must be subtracted from the TAB_STOP value to obtain the number of spaces to insert for the cursor to reach the next tab stop. The spaces value is then used to generate non-breaking space codes (&nbsp;) to fill the gap. Here is my updated sconvert code:

2023_04_15-Lesson.c

#include <stdio.h>

/* assumptions */
#define TAB_STOP 8
#define LINE_LEN 80

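int main()
{
    int ch,offset,spaces,x;

    /* cursor position from the start of the line */
    offset = 0;
    while(1)
    {
        ch = getchar();
        if( ch==EOF )
            break;
        switch(ch)
        {
            /* a space becomes one non-breaking space */
            case ' ':
                printf("&nbsp;");
                offset++;
                break;
            /* a tab becomes enough spaces to reach the next tab stop */
            case '\t':
                spaces = TAB_STOP-(offset%TAB_STOP);
                for( x=0; x<spaces; x++ )
                {
                    offset++;
                    printf("&nbsp;");
                }
                break;
            /* a new line resets the cursor position */
            case '\n':
            case '\r':
                putchar(ch);
                offset = 0;
                break;
            /* all other characters pass through */
            default:
                putchar(ch);
                offset++;
        }
        /* a long line wraps, which resets the cursor position */
        if( offset==LINE_LEN )
            offset = 0;
    }

    return(0);
}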

The two defined constants, TAB_STOP and LINE_LEN, are assumptions based on standard terminal settings: eight-character tab stops and an 80-character line width.

Variable offset is initialized before the endless while loop.

The code’s switch-case structure is now more involved. Each character output modifies the offset value, with the newline and carriage return characters resetting the value.

For the tab, variable spaces holds the number of spaces required to move the cursor forward, which is obtained by using the expression outlined earlier in this post. It’s important to calculate this value before the for loop and not to use offset directly in the loop’s condition, as variable offset is modified within the for loop that outputs the spaces (or &nbsp; HTML codes).
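To try the tab calculation outside of a filter, here is a minimal sketch that applies the same logic to a string. The expand_tabs() function is hypothetical and not part of sconvert. It writes plain spaces instead of &nbsp; codes so that the result is easy to inspect, and it ignores the line-length wrap:

```c
#include <assert.h>
#include <stddef.h>

#define TAB_STOP 8  /* same assumption as the sconvert code */

/* Hypothetical helper: copy `in` to `out`, replacing each tab with
   plain spaces up to the next tab stop. Output is truncated to fit
   `size` bytes. Returns the number of characters written, not
   counting the terminating null character */
size_t expand_tabs(const char *in, char *out, size_t size)
{
    size_t n = 0;       /* characters written to out */
    int offset = 0;     /* cursor position on the current line */
    int spaces,x;

    assert( in!=NULL && out!=NULL && size>0 );

    for( ; *in!='\0' && n<size-1; in++ )
    {
        if( *in=='\t' )
        {
            /* calculate the count once; offset changes in the loop */
            spaces = TAB_STOP-(offset%TAB_STOP);
            for( x=0; x<spaces && n<size-1; x++ )
            {
                out[n++] = ' ';
                offset++;
            }
        }
        else
        {
            out[n++] = *in;
            /* newline and carriage return reset the cursor position */
            offset = (*in=='\n' || *in=='\r') ? 0 : offset+1;
        }
    }
    out[n] = '\0';

    return(n);
}
```

Passing the string "Monday\t0" writes nine characters: the six letters, two spaces to reach the tab stop at position 8, and the digit. The string "Wednesday\t2" needs seven spaces because the nine-letter word has already moved past the first tab stop.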

After the switch-case structure, a test is made to see whether offset is equal to the line length, which happens for a long line that overflows. If so, variable offset is again reset to zero.
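The equality test works with the default values because 80 is a multiple of 8, so a tab can never carry offset beyond LINE_LEN without landing on it. If the two constants were changed so that this no longer holds, the test could be missed. Here is a hedged sketch of a more defensive variation. The wrap_offset() function is hypothetical, it isn’t part of sconvert, and it assumes that leftover spaces continue on the next line:

```c
#include <assert.h>

/* Hypothetical helper, not part of sconvert: fold the offset back
   onto the line once it reaches or passes the line length */
int wrap_offset(int offset, int line_len)
{
    assert( offset>=0 && line_len>0 );

    /* >= catches an offset that a tab pushed beyond line_len */
    if( offset>=line_len )
        offset %= line_len;

    return(offset);
}
```

For the default settings, wrap_offset(80,80) returns 0, the same as the test in the code above. With a line length of 78, an offset of 80 returns 2, where an equality test would never trigger.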

Here’s a sample run, filtering output from the days program used in last week’s Lesson:

Monday  0
Tuesday 1
Wednesday       2
Thursday        3
Friday  4
Saturday        5
Sunday  6

Screenshot

Figure 1. Text output with tabs may not line up perfectly. (Tab stops set every eight positions.)

For reference, Figure 1 is shown nearby. The output is properly converted, which means the sconvert program is one step closer to being complete. All that remains is accounting for the special characters: <, >, &. I cover this final update in next week’s Lesson.
