Initializing Arrays in C23

You can initialize an array in the C programming language in three ways: not at all, only some elements, or all of the elements. Some compilers offer a fourth way, which initializes all elements to the same value. With the C23 standard, yet another way to initialize an array is possible.

To review, an array is uninitialized when declared like this:

int values[20];

The array values[] has storage for 20 integers. These storage locations (elements) are defined, but not assigned values. The values could be anything; your code should never assume that all 20 elements are initialized to zero.

You can define only a few elements in the array, as I wrote about recently:

int numbers[6] = { [3] = 40, [4] = 50 };

Above, elements 3 and 4 are set to the values 40 and 50, and the remaining elements are set to zero. (This syntax, the designated initializer, has been part of the C standard since C99.)

You can define all the elements:

char string[] = "Hello!";

Back in 2020, I wrote about initializing all elements in an array to zero:

int scores[5] = { 0 };

This construction is non-standard, but it makes you think you can initialize an array to all the same values. Alas, it doesn’t work that way:

int scores[5] = { 1 };

The above statement initializes element zero to 1, but the other four elements are set to zero. Again, this construction is compiler-specific and not part of the C standard.

What is part of the new C23 standard, however, is the following expression:

int a[5] = {};

This statement declares an integer array, a[], with five elements all initialized to zero. The ={} expression is called the empty initializer. It may already be implemented in your compiler, but it’s standard as of C23.

2024_01_13-Lesson.c

#include <stdio.h>

int main()
{
    int a[5] = {};

    for( int x=0; x<5; x++ )
        printf("a[%d] = %d\n",x,a[x]);

    return 0;
}

The code above demonstrates the empty initializer. The expression int a[5] = {} assigns zeros to the five elements of a[]. A for loop outputs these values, which are all zeros.

This code builds under the current version of clang without the -std=c2x switch. Here is the output:

a[0] = 0
a[1] = 0
a[2] = 0
a[3] = 0
a[4] = 0

It’s too bad that C continues to not initialize variables when they’re declared. For example, it could set all integers to zero, real numbers to 0.0, and pointers to NULL. I don’t know whether such a thing was discussed by the standards committee or even if any negatives are associated. But the addition of the empty initializer as a standard is welcome.

In next week’s Lesson, I review some of the new language features.

8 thoughts on “Initializing Arrays in C23”

  1. I think this new “empty initialization” was inspired by C++11ʼs “uniform initialization” syntax (and in particular by C++ʼs “zero-initialization” syntax); regrettably, without being completely compatible with it:

    int value {};   /* Valid in C++11, but not in C23 (due to missing =) */

    “It’s too bad that C continues to not initialize variables when they’re declared.”

    It does do that for static objects (or, rather, the program loader does). I don’t think the standard will ever require such behavior for local variables, however. The simple reason being that the mindset in “C” is and has always been, that there should be no hidden costs. (The compiler having to emit statements to always initialize variables stored on the stack could in some cases turn out to be quite costly.)

    Thereʼs one inaccuracy I noted in the part where you describe how things worked up until now: “[…] this construction is compiler-specific and not part of the C standard”

    Thatʼs not true: Brian Kernighan and Dennis Ritchieʼs “The C Programming Language” (1ˢᵗ Edition, 1978) already contained the following passage with regard to array (and structure) initialization (in Appendix A, p.198):

    »If there are fewer initializers in the list than there are members of the aggregate, then the aggregate is padded with 0ʼs«

    The earliest C standard, ANSI X3.159-1989 ⇔ ISO/IEC 9899:1990, contains a similarly phrased passage mandating the same thing (in section §6.5.7 “Initialization”).

  2. Thanks for the updates, as always. I re-read the section on array initialization in K&R when I researched this post. I completely missed the part about partially initialized arrays.

    I’m curious about your statement regarding initializing values on the stack. I know that the stack is used to return values from a function and it can be used to pass arguments to a function. When would a value on the stack not be initialized?

  3. The following is a bit PC-centric, but what I meant with that remark is that compilers usually only have to manipulate the stack pointer to reserve (enough) space for all the local variables of a function. Letʼs take the following function as an example:

    void do_something (void)
    { char c;
      short s;
      int i;
      fprintf (stdout, "ord(c) = %d, s = %hd, i = %d\n", c, s, i);
    }

    To accommodate enough stack space for variables ‘c’, ‘s’ and ‘i’, all the compiler has to do (as the stack grows downward on x86) is emit a single SUB instruction that subtracts 8 from RSP (thereby reserving enough stack space¹ for these 3 variables):

    do_something:
      ; functionʼs prologue:
      sub RSP, 8  ; word [RSP] == i, word [RSP+4] == s, word [RSP + 7] == c

      ; functionʼs body:
           ⋮

      ; functionʼs epilogue:
      add RSP, 8
      ret

    If the standard were to suddenly mandate that local variables had to be initialized, this would force the compiler to emit something like a memset(RSP, 0, 8); call to ensure that all those stack bytes between RSP and RSP + 7 were set to zero (before the “real body” of the function was entered).

    With this overhead in mind, guaranteed automatic initialization of local variables is, in my opinion, not something the standard is ever likely to mandate.

    ¹ With alignment requirements of modern hardware dictating that variables with a size of n bytes should be located at an address that is divisible by n, there will be a “padding byte” before/after c so that &s % 2 == 0.

  4. Small correction…sigh: the comment after sub RSP, 8 (in the “functionʼs prologue”) should, of course, read as follows:

    ; dword [RSP] == i, word [RSP+4] == s, byte [RSP + 7] == c

  5. Sorry, only saw your post just now!

    While things can get muddy if script languages/interpreted languages enter the picture, for native (x86) code itʼs usually like this:

    A functionʼs local variables are stored on the stack, and, if the __stdcall calling convention¹ is in effect, (all of) its arguments are passed on the stack as well (the former being the reason that automatic initialization of local variables isnʼt happening by default: local variables are created by simply moving the stack pointer down by a certain number of bytes).

    The heap is that part of a programʼs (virtual) memory space that acts as a pool of memory from which—in “C” usually with the help of malloc()—dynamically allocated memory is taken.

    Variables with “automatic storage duration” (i.e. local variables [on the stack]) as well as variables with “static storage duration” (i.e. global variables [in segments .data or .bss]) can reference/point to runtime-allocated memory on the heap, but neither of them are stored there.

    Niall Cooling (from Feabhas Ltd) has a nice video on all the minutiae of this topic: http://tinyurl.com/2zt2ae2t

    ¹ Agner Fogʼs “calling_conventions.pdf” contains a good, albeit somewhat technical, overview of the different possibilities on various systems: http://tinyurl.com/msx4fpcd

  6. Neat. I’ve studied function calls, but not automatic variable storage. Great info! Thanks.

  7. Just a small correction: the default calling convention for “C” is, of course, __cdecl (not __stdcall).

    Luckily, for the above discussion it doesnʼt really make any difference, because the same argument passing logic⁽¹⁾ applies in both cases (i.e. function arguments are pushed onto the stack in right-to-left order, with the last argument being pushed first).

    ⁽¹⁾ The only difference being in who is responsible for cleanup: with __cdecl itʼs the calling functionʼs responsibility to adjust the stack pointer (thereby removing all passed arguments from the stack), whereas with __stdcall itʼs the calleeʼs.
