Hang On a Sec, Part II

The old days are gone, and with them the practice of using a for loop as a timing delay. Loops still pause program execution; the question is how long it takes a computer to spin through one.

Picking up from last week’s Lesson, I need to know how many iterations a loop needs to emulate those old days and pause program execution for about one second. I could just guess, starting with one million or so, but last Lesson I chose to use the computer’s clock to help me.

The final code presented last week used the epoch value returned from the time() function to watch for the flip from one second to the next. Because the epoch value represents whole seconds, this approach is good enough to let the computer count away during the loop and then report the count value. Here is the update to last week’s code:

2023_03_25-Lesson-a.c

#include <stdio.h>
#include <time.h>

int main()
{
    time_t now,then;
    long count;

    /* obtain clock tick Epoch value */
    time(&now);

    /* pause one second */
    then = now;
    while(then==now)
        time(&now);
    /* now a new second has begun
       start the next "full second" loop */
    then = now;
    count = 0;
    while(then==now)
    {
        count++;
        time(&now);
    }
    printf("Cycles: %ld\n",count);

    return(0);
}

New to this version of the code is the declaration of long variable count. It’s initialized to zero at Line 19. Within the second while loop, count is incremented.

Gone are the two printf() statements that output the clock tick values. A single printf() statement at Line 25 outputs the number of cycles (the value of variable count) processed during the one-second loop. On my computer, I see this result:

Cycles: 377291420

The next step is to write a program that uses this value in a for loop to attempt to pause execution for one second. Here is the code I wrote:

2023_03_25-Lesson-b.c

#include <stdio.h>

int main()
{
    long delay;

    /* pause one second */
    for( delay=0; delay<377291420; delay++ )
        ;

    return(0);
}

I could have used a while loop, but for loops are how I wrote code way back when. No output is needed. Instead, I use the command line time utility to see how long the program (a.out) takes to run. One caution: compile without optimization, as an optimizing compiler may remove the empty loop entirely.

$ time ./a.out

real 0m0.640s
user 0m0.633s
sys 0m0.000s

Rather than a full second, the code took 0.64 seconds to run, well short of what I hoped for. Bummer.

Then I remembered something important: The original program that counted the loop’s iterations also called the time() function, and that call adds to the delay. To pause for a full second more accurately, the for loop should also call the time() function, as shown in this update:

2023_03_25-Lesson-c.c

#include <stdio.h>
#include <time.h>

int main()
{
    long delay;

    /* pause one second */
    for( delay=0; delay<377291420; delay++ )
        time(NULL);

    return(0);
}

The time() function requires an argument, which here is NULL; called with NULL, it just returns the current epoch value without storing it anywhere. Also, the time.h header is required. Duh.

Here is the result when the program is run from the time utility:

$ time ./a.out

real 0m0.944s
user 0m0.937s
sys 0m0.001s

The value 0.944 is much closer to one second, which I’m flagging as a success.

Then again, it’s not really a success. Running these delay programs on an older PC yielded a cycles value of 6,686,683, far less than the 377,291,420 on my main, fast computer. So the final program, the one that supposedly takes one second to run, takes nearly a minute on that older PC (377,291,420 ÷ 6,686,683 ≈ 56 seconds)! Hopefully you can see why using a delay loop on a modern computer shouldn’t be a thing any more.
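By the way, the modern approach is to ask the operating system to suspend the program rather than burn CPU cycles. Here is a minimal sketch using the POSIX sleep() function (declared in the unistd.h header); the nanosleep() function offers finer resolution:

#include <stdio.h>
#include <unistd.h>

int main()
{
    puts("Hang on a sec...");
    /* the OS suspends the process for one second;
       no CPU time is wasted and the duration doesn't
       depend on the computer's speed */
    sleep(1);
    puts("Done.");

    return(0);
}

Run this program under the time utility and you see roughly one second of real time but nearly zero user time, because the process isn’t executing while it sleeps.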

4 thoughts on “Hang On a Sec, Part II”

  1. First an admission: if I had gone the same route as above, I would (for sure!) also have forgotten to include time(NULL); on my first try ☺

    Also, I would like to second the opinion that “busy waiting” shouldnʼt be used on modern systems. Except… well, except if someone wishes to wait for extremely short periods of time.

    To this end (and as a follow-up to my own solution from last week), I came up with the following strategy to implement a high-precision “busy waiting” function:

    (1) Read the processorʼs “time stamp counter” (TSC) before and after a loop that uses clock_gettime(CLOCK_MONOTONIC) to wait for exactly 1 second

    … leaving measurement errors aside, this should give us the TSCʼs frequency!

    (2) Waiting for a certain number of nano-, micro- or milliseconds should then simply be a matter of reading the current value of the TSC, adding an appropriate number of ticks (corresponding to the timeframe to wait for), and entering a “busy waiting loop” that uses the ‘__rdtsc()’ intrinsic to see if the processorʼs TSC has crossed this pre-calculated value

    This should work because todayʼs multi-core CPUs donʼt increment the built-in “time stamp counter” at every clock cycle (like earlier CPUs, since the Pentium, used to do) – instead they count “reference cycles”! (See answer “answer-51907627” on StackOverflow for details.)

    Following the above strategy, I first implemented a get_approx_tsc_freq() function, which at its core works like this:

    struct timespec now, then;
    unsigned long long tsc_now, tsc_then;

    /* read the TSC, busy-wait exactly one second, read it again */
    (void)clock_gettime (CLOCK_MONOTONIC, &then); ++then.tv_sec;
    tsc_now = read_tsc ();
    do {
      (void)clock_gettime (CLOCK_MONOTONIC, &now);
    } while (now.tv_sec < then.tv_sec
      || (now.tv_sec == then.tv_sec && now.tv_nsec < then.tv_nsec));
    tsc_then = read_tsc ();
    /* tsc_then - tsc_now approximates the TSC frequency in Hz */

    On my system this returns: get_approx_tsc_freq() = 2,495,999,036 Hz

    I.e. the “time stamp counter” on my system is incremented at a frequency of 2.5 GHz (“reference cycles”). Note that this is, strictly speaking, not entirely true… in actuality, the external “bus frequency” is 100 MHz and at every bus clock cycle the TSC jumps by 25 (meaning that the TSC isnʼt accurate to a single tick of about 400 picoseconds, but only to about 10 nanoseconds, the bus clock period).

    Anyway, the TSCʼs frequency thus being (approximately) determined, writing, for example, a waitms() function is quite an easy matter:

    void waitms (unsigned int ms)
    { unsigned long long now, then;

      now = read_tsc ();
      then = now + ms*tsc_freq/1000ull;

      do {
        now = read_tsc ();
      } while (now < then);
    }
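    For completeness, hereʼs a minimal sketch of the read_tsc() helper assumed above; it simply wraps the ‘__rdtsc()’ intrinsic (from x86intrin.h on GCC/Clang; MSVC provides it via intrin.h):

    #include <x86intrin.h>

    /* read the processor's time stamp counter */
    static unsigned long long read_tsc (void)
    {
      return __rdtsc ();
    }

    With tsc_freq initialized via get_approx_tsc_freq(), a call such as waitms(250) then busy-waits for a quarter of a second.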

    One last caveat: various online resources point out that Intelʼs CPUs have been counting “reference cycles” (as opposed to CPU clock cycles) since Intel “Nehalem” (November 2008). Fifteen years is a long time… therefore, at least in the Intel world, the above should work on practically all systems out there.

    AMDʼs processors are another matter though… I have no idea if (or since when) they count “reference cycles”. (I.e. if the above works correctly on AMD at all.)

    As always, I have uploaded project files for Codelite (Linux) as well as Visual Studio (Windows) on GitHub (should anyone be interested): https://tinyurl.com/2a6w59hx

  2. I had hoped that this approach would be of interest ☺

    Truth be told, however, this was also my first time working with the processorʼs TSC… I learned quite a lot while researching the above.

  3. That’s similar to how I discover new things in C. It seems like lots of fun and clever things exist, but the documentation is scant.
