Raw Reading File Data

When you use the fopen() function to open a file for reading, a buncha functions are available for reading data: fread(), fgets(), fgetc(), and others I’m too lazy to look up. Reading files by using the open() function, however, gives you this choice: the read() function.

The read() function has this man page format (from section 2 of the manual). The function is prototyped in the unistd.h header file:

ssize_t read(int fildes, void *buf, size_t nbyte);

The first argument, fildes, is the file descriptor integer returned from the open() function.

Argument buf is the address of a buffer to store the data read. It can be whatever type of data is required: character, structure, what-have-you.

The final argument, nbyte, is the maximum number of bytes to read, usually the size of the buffer.

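The return value is the number of bytes actually read, which can be less than nbyte. A return value of 0 indicates the end of the file, and -1 indicates an error.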

Like other file-reading functions, read() consumes file data sequentially. A file position indicator keeps track of the location in the file from which data is read. This indicator, or “pointer,” can be manipulated to allow for random file access.

It’s important to keep in mind that the read() function’s data is unformatted. It does not read strings or any data type, but rather raw bytes. The function cares not when the null character terminates a string, the size of an int, or anything a formatted file input function would concern itself with. It sees everything as unformatted data.

The following code uses the read() function to consume the raw bytes from the file gettysburg.txt.

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

#define SIZE 2048

int main()
{
    const char filename[] = "gettysburg.txt";
    int fdes,x;
    char buffer[SIZE];
    ssize_t r;    /* signed, so it can hold read()'s -1 error value */

    /* open the file for unformatted input */
    fdes = open(filename,O_RDONLY);
    if( fdes==-1 )
    {
        fprintf(stderr,"Unable to open %s\n",filename);
        return(1);
    }

    /* read raw data */
    r = read( fdes, buffer, SIZE );
    
    /* output the buffer */
    /* not null character terminated! */
    for( x=0; x<r; x++ )
        putchar( buffer[x] );

    /* close the file */
    close(fdes);

    return(0);
}

Line 23 reads a buffer-sized chunk of data from the file:

r = read( fdes, buffer, SIZE );

I know that the file size is less than the value of SIZE, so a single read() call gobbles its entire contents. The data is stored in char array buffer[], and then spewed out in the for loop at Line 27:

for( x=0; x<r; x++ )

The loop uses the value of r, returned from the read() function, to set its limit. The file’s contents are output, but the code isn’t perfect because it wouldn’t output all the file’s data if its size is larger than defined constant SIZE, or 2048 bytes.

The following code improves upon this file-reading example, modifying the read() function to fetch a single byte at a time from the open file:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main()
{
    const char filename[] = "gettysburg.txt";
    int fdes;
    char buffer[1];
    ssize_t r;    /* signed, so it can hold read()'s -1 error value */

    /* open the file for unformatted input */
    fdes = open(filename,O_RDONLY);
    if( fdes==-1 )
    {
        fprintf(stderr,"Unable to open %s\n",filename);
        return(1);
    }

    /* output the buffer */
    while(1)
    {
        /* read raw data one byte at a time */
        r = read( fdes, buffer, 1 );
        if( r==0 )    /* end of file */
            break;
        if( r==-1 )    /* file read error */
        {
            fprintf(stderr,"File read error\n");
            break;
        }
        putchar( buffer[0] );
    }

    /* close the file */
    close(fdes);

    return(0);
}

The read() function at Line 24 grabs only one byte:

r = read( fdes, buffer, 1 );

The byte read must be stored in a buffer, a char array or pointer, which has only one character of storage as defined at Line 9:

char buffer[1];

The while loop that reads the data (starting at Line 21) is endless. To terminate the loop, the value r returned from the read() function must be equal to either 0 or -1: 0 indicates the file has been completely read and -1 indicates a read error.

This second example is better than the first because it reads file data no matter what the file’s size.

In next week’s Lesson I cover writing raw data to a file.
