How Big is That File? – Solution

The challenge for this month’s Exercise is to return a file’s size without using the stat() function. My goal is to get you to think about various file tools and how they can be useful beyond their intended purpose.

The hint for this solution is random file access. This approach manipulates a “file pointer,” which has nothing to do with C language pointers. No, a file pointer is an index into the file, an offset. I prefer to use the term file position indicator, which is how I describe it in an older post. The functions required are fseek() and ftell().

The fseek() function moves the file position indicator. Here is the man page format:

int fseek(FILE *stream, long offset, int whence);

Given the open file handle stream, you can move the file position indicator to the end of the file: Use the whence value SEEK_END and set the offset to zero. Upon success, the file position indicator sits just past the file’s final byte, so its offset from the start of the file equals the file’s size in bytes. You then use the ftell() function to report this offset:

long ftell(FILE *stream);

The stream argument is the open file handle. The value returned is a long integer representing the file position indicator’s location — its offset from the start of the file. Presto! You have the file’s size, as shown in my solution:

2023_01-Exercise.c

#include <stdio.h>

int main(int argc, char *argv[])
{
    char *filename;
    FILE *fh;
    long size;
    
    /* check for filename argument */
    if( argc<2 )
    {
        puts("Specify a filename");
        return(1);
    }

    /* operate from the first argument */
    filename = argv[1];

    fh = fopen(filename,"rb");    /* binary mode: no newline translation */
    if( fh==NULL )
    {
        fprintf(stderr,"Unable to open %s\n",filename);
        return(1);
    }

    fseek( fh, 0, SEEK_END );
    size = ftell(fh);
    fclose(fh);

    printf("%s is %ld bytes long\n",filename,size);

    return(0);
}

The code starts the same as the Exercise post that used the stat() function: An argument is tested for and, when found, the named file is opened for read access.

At Line 26, the fseek() function moves the file position indicator to the end of the file, just past the last byte. Then ftell() reports that offset, which is saved in long variable size. The file is closed, then the size is output.
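Both fseek() and ftell() can fail, and the listing doesn't test their return values. Here is a minimal sketch of the same technique wrapped in a function with those checks added. The name file_size() and the -1 error convention are my own choices for illustration, not part of the original solution. The sketch opens the file in binary mode ("rb") so that the reported offset is a byte count even on systems that translate newlines in text mode.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of the original solution.
   Returns the size of the named file in bytes, or -1L on failure. */
long file_size(const char *filename)
{
    FILE *fh;
    long size;

    assert( filename!=NULL );

    /* "rb" avoids text-mode translation on systems that have it */
    fh = fopen(filename,"rb");
    if( fh==NULL )
        return(-1L);

    /* fseek() returns nonzero when it fails */
    if( fseek( fh, 0L, SEEK_END )!=0 )
    {
        fclose(fh);
        return(-1L);
    }

    /* ftell() returns -1L when it fails */
    size = ftell(fh);
    fclose(fh);

    return(size);
}
```

A caller would test the result before using it, for example: if( file_size(argv[1])<0 ) then report the error. Because the value is a long, very large files may not be representable on systems where long is 32 bits wide.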

Of course, all this file position indicator manipulation isn’t necessary if you just use the stat() function. Still, I find that random file access is something many C programmers don’t use often. Reading data at a specific offset in a file isn’t something you normally think to do. Yet it’s something you can do and something you can use to obtain information about a file, such as its size.
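To show what reading at an offset looks like, here is a short sketch that fetches a single byte from a given position in an open file. The function name byte_at() is hypothetical. It uses SEEK_SET, which measures the offset from the start of the file, and assumes the stream was opened in binary mode.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: return the byte found at offset (counted from
   the start of the file), or EOF if the seek or the read fails. */
int byte_at(FILE *fh, long offset)
{
    assert( fh!=NULL );

    /* SEEK_SET positions the indicator relative to the file's start */
    if( fseek( fh, offset, SEEK_SET )!=0 )
        return(EOF);

    /* read the single byte at the new position */
    return( fgetc(fh) );
}
```

Calling byte_at(fh,0L) returns the file's first byte. An offset equal to the file's size lands at the end of the file, where fgetc() has nothing to read and returns EOF.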

I hope your solution met with success. If you figured out a way to do it without manipulating the file position indicator, please let me know. Thanks!

6 thoughts on “How Big is That File? – Solution”

  1. I assume the “official” method gets the information from the OS using FAT or whatever without the overhead of actually opening the file.

    Years ago I started writing a program to read EXIF data from a JPEG. Unfortunately the person who created the EXIF “format” is either a genius or an idiot. One of its quirks is that small items of information (I think up to 8 bytes) are stored in a fixed position within the chunk of EXIF data, but with anything larger that fixed position holds an offset to another location in the file where the actual data is stored. It’s therefore useful to be able to jump to an arbitrary file location.

    If you had a data file (CSV, JSON, XML for example) you’d probably want to read in the whole lot or read + process the whole file in chunks, but obviously you wouldn’t read a whole JPEG into memory just to get the EXIF data, so jumping around the file to get the stuff you want is a better option.

    (Off-topic but another quirk is that some numerical data is stored as the numerator and denominator of a fraction. For example if you take a photo at f8 it’s represented by 80 and 10, so you have to divide 80 by 10 to get 8. Why!!!???)

  2. I forgot to mention that SEEK_END might get the file size using stat or at least using the same process, so this solution might not actually bypass stat.

  3. Have you used the ImageMagick library? It’s what I’ve used for manipulating JPEGs and PNGs. It fetches the data automatically.

    Yes, I assume that stat() just reads the file table or probably the directory entry, which would have the same info. And, undoubtedly, fseek() uses the same data. Cheating!

  4. If one has the option of going either route, I personally would always prefer stat() over fseek()/ftell(). Two reasons for that:

    1. If the same FILE stream is to be used afterwards, relying on ftell() requires a complicated dance: ftell(); /* current pos*/ fseek(SEEK_END); ftell(); /* file size */ fseek(SEEK_SET); /* restore pos */ … especially if proper error handling is required;

    2. On Windows, if text files are opened in “text-mode”, line endings might get normalized from CRLF to LF, leading to unreliable results… from MSDN:

    “The value returned by ftell may not reflect the physical byte offset for streams opened in text mode, because text mode causes carriage return-line feed translation.”

    That being said, I agree that writing code that takes all the different idiosyncrasies of various OS into account is somewhat painful… as a more general solution would probably also necessitate that one differentiated between stat() and stat64() (and, on Windows, _stat64()).
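    The save-and-restore sequence from point 1 might look like the following sketch. The name stream_size() and the -1 error convention are hypothetical, and the sketch assumes a stream opened in binary mode.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: report the size of an open stream, then put the
   file position indicator back where it was. Returns -1L on failure. */
long stream_size(FILE *fh)
{
    long pos, size;

    assert( fh!=NULL );

    pos = ftell(fh);                        /* current position */
    if( pos==-1L )
        return(-1L);

    if( fseek( fh, 0L, SEEK_END )!=0 )      /* jump to the end */
        return(-1L);

    size = ftell(fh);                       /* file size, or -1L */

    if( fseek( fh, pos, SEEK_SET )!=0 )     /* restore position */
        return(-1L);

    return(size);
}
```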

  5. Great point about Windows/*nix and the newline interpretation. That one has thrown me many times.
