Remove Trailing Blank Lines – Solution

The problem with snipping blank lines from the end of a file is storing the file as it’s processed. At least that’s the issue I faced as I worked through my solution to this month’s Exercise.

My first approach was to write a filter to solve the problem. The filter processes one character at a time — or even a line or two at a time — looking for the EOF and then working backwards to eliminate all but the final newline. The problem with this approach is that you could theoretically encounter a file that has a ton of newlines, more than would fit into the buffer.
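One variation on the filter idea, which is not the approach this post settles on, is to count pending newlines rather than store them, so no buffer can overflow. The following is only a sketch under that assumption: the function name strip_trailing_newlines() is hypothetical, and it treats only '\n' as a line ending.

```c
#include <assert.h>
#include <stdio.h>

/* hypothetical helper: copy in to out, holding back runs of newlines
   until it's known whether more text follows them */
void strip_trailing_newlines(FILE *in, FILE *out)
{
    unsigned long pending = 0;  /* newlines seen but not yet written */
    int ch;

    assert(in != NULL && out != NULL);

    while( (ch = fgetc(in)) != EOF )
    {
        if( ch=='\n' )
        {
            pending++;
            continue;
        }
        /* more text follows, so the held newlines weren't trailing */
        while( pending>0 )
        {
            fputc('\n',out);
            pending--;
        }
        fputc(ch,out);
    }
    /* end of file: keep a single newline if any ended the text */
    if( pending>0 )
        fputc('\n',out);
}
```

Called as strip_trailing_newlines(stdin,stdout), it would behave as a standard filter. The trade-off is that the counter replaces the buffer, so the text is never available in memory for any other processing.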

Another thought was to move the file position indicator to the end of the file and scan backwards for excess newlines. The final, necessary newline could be located, its position marked, and then the file output up to that point. The issue I faced here is that, again, a file may have dozens of newlines or even be composed solely of newlines. And working backwards through a file to find a given offset requires more overhead than I was willing to code.
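For comparison, here is roughly what the backwards scan might look like. It's a sketch, not code from my solution: the function name trimmed_length() is hypothetical, and it assumes the file is opened in binary mode on a system where seeking to SEEK_END is supported and lines end with '\n'.

```c
#include <assert.h>
#include <stdio.h>

/* hypothetical helper: return how many bytes of fp to keep so that
   at most one newline ends the file; returns -1 on a seek error */
long trimmed_length(FILE *fp)
{
    long end, pos;
    int ch;

    assert(fp != NULL);

    /* find the size of the file */
    if( fseek(fp,0L,SEEK_END)!=0 )
        return -1;
    end = ftell(fp);
    if( end<0 )
        return -1;

    /* step back one byte at a time while that byte is a newline */
    pos = end;
    while( pos>0 )
    {
        if( fseek(fp,pos-1,SEEK_SET)!=0 )
            return -1;
        ch = fgetc(fp);
        if( ch!='\n' )
            break;
        pos--;
    }
    /* keep one newline if any were found */
    if( pos<end )
        pos++;
    return pos;
}
```

The caller would then rewind the file and copy only that many bytes to the output. Each backwards step costs a seek plus a read, which is the overhead mentioned above.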

The solution I chose is to store the entire file in a dynamic buffer. As each chunk of text is read from the file, the buffer is reallocated and the new chunk is appended to the buffer’s tail.

The malloc() function allocates and initializes the buffer, b, before any data is read from the file:

    /* allocate and initialize the big buffer */
    b = (char *)malloc( sizeof(char)*1 );
    if( b==NULL )
    {
        fclose(fp);
        fprintf(stderr,"Memory allocation error\n");
        exit(1);
    }
    *b = '\0';
    length = 0;

If memory isn’t available, the file pointer fp is closed and the program exits. Otherwise, the one-byte buffer is initialized with a null character, *b = '\0';, and its length is set to zero.

The next step is to read the file in chunks and reallocate the buffer to accommodate each one. Defined constant SIZE holds the value 64, so each fgets() call reads at most 63 characters, leaving room for the terminating null character:

    /* read the file */
    while( !feof(fp) )
    {
        /* read up to SIZE-1 characters of data */
        r = fgets(buffer,SIZE,fp);
        /* confirm data read */
        if( r==NULL )
            break;
        /* reallocate buffer */
        length = strlen(buffer) + strlen(b);
        b = (char *)realloc( b, length+1 );
        if( b==NULL )
        {
            fclose(fp);
            fprintf(stderr,"Memory allocation error\n");
            exit(1);
        }
        strcat(b,buffer);
    }

The fgets() function reads from the file, and its return value is tested to confirm data was read.

Variable length is recalculated as the running total of characters read from the file. Its value is used in the realloc() function to resize the buffer, with one byte added for the null character at the end of the string.

Finally, the strcat() function appends the text read from the file into the dynamic buffer, b.
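The same read loop can be packaged as a function. The version below is a sketch with a few assumptions of my own: the name read_all() is hypothetical, it tracks the running length itself instead of calling strlen() on the whole buffer each pass, and it holds the realloc() result in a temporary pointer so the original buffer can still be freed if the call fails. Like the code above, it relies on fgets() and so suits text files without embedded null characters.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 64

/* hypothetical helper: read all of fp into a malloc'd string;
   returns NULL on allocation failure, caller frees the result */
char *read_all(FILE *fp, size_t *length)
{
    char chunk[SIZE];
    char *b, *tmp;
    size_t used = 0, n;

    assert(fp != NULL && length != NULL);

    /* start with an empty string */
    b = malloc(1);
    if( b==NULL )
        return NULL;
    *b = '\0';

    /* fgets() returns NULL at end of file, which ends the loop */
    while( fgets(chunk,SIZE,fp)!=NULL )
    {
        n = strlen(chunk);
        /* keep the old pointer until realloc() succeeds */
        tmp = realloc(b, used+n+1);
        if( tmp==NULL )
        {
            free(b);
            return NULL;
        }
        b = tmp;
        /* copy the chunk and its terminator to the tail */
        memcpy(b+used, chunk, n+1);
        used += n;
    }
    *length = used;
    return b;
}
```

Testing the return value of fgets() directly makes a separate feof() check unnecessary, because a NULL return already signals that nothing more was read.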

After the file is read into the buffer, another while loop locates the first of any number of trailing newline characters and caps the buffer immediately after:

    /* The file's text is now in the buffer.
       Process the text from the end back */
        /* back up over trailing newlines,
           stopping at the start of the buffer */
    while( length>0 && *(b+length-1)=='\n' )
    {
        length--;
    }
        /* preserve one newline, if any were found */
    if( *(b+length)=='\n' )
        length++;
        /* cap the string */
    *(b+length) = '\0';

The buffer need not be resized at this point; a printf() statement dumps its contents and the program is done.
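The trimming step can also be isolated in a function and exercised on edge cases: an empty buffer, text with no final newline, or a buffer holding nothing but newlines. This is a minimal sketch, and the name trim_trailing_newlines() is hypothetical. It assumes that a buffer of only newlines should be reduced to a single newline.

```c
#include <assert.h>
#include <string.h>

/* hypothetical helper: cut s so that at most one newline ends it;
   returns the new length */
size_t trim_trailing_newlines(char *s)
{
    size_t length;

    assert(s != NULL);

    length = strlen(s);
    /* back up over newlines, never past the start of the string */
    while( length>0 && s[length-1]=='\n' )
        length--;
    /* s[length] is a newline only if at least one was skipped */
    if( s[length]=='\n' )
        length++;
    /* cap the string */
    s[length] = '\0';
    return length;
}
```

Because the loop tests length>0 before it looks at a character, the function never reads before the start of the buffer, and the string is never capped beyond its original terminator.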

Click here to view the full source code on GitHub.

I hope you devised a clever solution. The proof is running test files through the program to ensure that all trailing newlines are removed and the remainder of the file is intact.
