Reader Q&A: What does it mean to initialize an int?
Acknowledgments: Thanks to Davis Herring, Jens Maurer, Richard Smith, Krystian Stasiowski, and Ville Voutilainen, who are all ISO C++ committee core language experts, for helping make my answer below correct and precise.
I recently got this question in email from Sam Johnson. Sam wrote, lightly edited:
Given this code, at function local scope:
int a;
a = 5;
So many people think initialization happens on line 1, because websites like cppreference define initialization as “Initialization of a variable provides its initial value at the time of construction.”
However, I’m convinced the initialization happens on line 2, because [various good C++ books] define initialization as simply the first meaningful value that goes into the variable.
Could you please tell me which line is considered initialization?
That’s a great question. Cppreference is correct, and for all class types the answer is simple: The object is initialized on line 1 by having its default constructor called.
But (and you knew a “but” was coming), for a local object of a fundamental built-in type like int, the answer is… more elaborate. And that’s why Sam is asking, because Sam knows that the language has been kind of loose about initializing such local objects, for historical reasons that made sense at the time.
Short answer: Saying the variable gets its initial value on line 2 is completely reasonable. But note that I deliberately didn’t say “the object is initialized on line 2,” and both the code and this answer gloss over the more important problem of: “Yeah, but what about code between lines 1 and 2 that could try to read the object’s value?”
This post has three parts:
- Pre-C++26, yes, this is kind of awkward. But the funniest part is how the Standard describes it today, which is just begging for a little in-good-fun roasting, and so I’ll indulge.
- In C++26, we make this code safe by default, thanks to Thomas Köppe! This is a Big Deal.
- In my Cpp2 experiment, this problem disappears entirely, and all types are treated equally with guaranteed initialization safety. My aim is to propose this for ISO C++ itself post-C++26, so ISO C++ could evolve to remove this issue too in the future, if there’s consensus for such a change.
Let’s start with the world today, our pre-C++26 status quo…
Pre-C++26 answer: The variable is never “initialized”
For those few built-in types like int, the answer is that in this example there is no initialization at all, because (technically) neither line is an initialization. If that surprises you, consider:
- Line 1 declares an uninitialized object. There is no initial value at all, explicit or implicit.
- Line 2 then assigns an “initial value.” This overwrites the object’s bits and happens to give the object the same value as if its bits had been initialized that way on line 1… but it’s an assignment, not an initialization (construction).
That said, I think it’s reasonable to informally call line 2 “setting an initial value,” in the sense that it’s the first program-meaningful value put into that object. It’s not formally an initialization, but the bits end up the same, and good books can reasonably call line 2 “initializing a.”
“But wait,” I hear someone in the back saying, “I read the Standard last night, and [dcl.init] says that line 1 is a ‘default-initialization’! Therefore line 1 is an initialization!” Yes, and no, respectively. So let’s look at the Standard’s formal, precise, and quite funny answer, and this is truly a delightful thing to read: The Standard does say that in line 1 the object is indeed default-initialized… but, for types like int, the term “default-initialized” is defined to mean “no initialization is performed.”
I am not making this up. See [dcl.init] paragraph 7.
(This may be a good time to mention that “the Standard is not a tutorial”… in other words, we wouldn’t read the Standard to learn the language. The Standard is quite precise about telling us what a C++ compiler does, and there’s nothing really wrong with the Standard specifying things in this way, it’s totally fine and it totally works. But it’s not written for a lay reader, and nobody would blame you if you thought that “default-initialization [means] no initialization is performed” sounds like cognitive dissonance in action, Orwellian doublethink (which is not the same thing), passive-aggressive baiting, or just garden-variety Humpty Dumptyism.)
A related question is: After line 1, has the object’s lifetime started? The good news is that yes it has… in line 1, the uninitialized object’s lifetime has indeed started, per [basic.life] paragraph 1. But don’t let’s look too closely at that paragraph’s words about “vacuous initialization,” because that’s yet another bit of fancyspeak in the Standard for the same concept of “initialized but, ha ha, just kidding.” (Have I mentioned that the Standard isn’t a tutorial?) And of course it’s a serious problem that the object’s lifetime has started but the object hasn’t been initialized with a predictable value… that’s the worst problem of an uninitialized variable: reading from it has been true “undefined behavior” that could do anything, which makes it a security risk that attackers can exploit.
Fortunately, this is where the safety story gets significantly better, in C++26…
C++26: It gets better (really!) and safe by default
Just a few months ago (the March 2024 meeting in Tokyo), we actually improved this for C++26 by adopting Thomas Köppe’s paper P2795R5, “Erroneous behavior for uninitialized reads.” If that sounds familiar to readers of this blog, it may be because I highlighted it in my Tokyo trip report.
C++26 has created the new concept of erroneous behavior, which is better than “undefined” or “unspecified” because it gives us a way to talk about code that is literally “well-defined as being Just Wrong”… seriously, that’s almost a direct quote from the paper… and because it’s now well-defined it gets stripped of the security scariness of “undefined behavior.” Think of this as the Standard having a tool to turn some behavior from “scarily undefined” to merely “tsk, we know this is partly our fault because we let you write this code and it doesn’t mean what it should mean, but you really wrote a bug here, and we’re going to put some guard-rails around this pit of snakes to remove the safety risk of you falling into it by default and our NSA/CISA/NIST/EO insurance premiums going up.” And the first place that concept has been applied has been to… drum roll… uninitialized local variables.
This is a big deal, because it means that the original example’s line 1 still leaves the variable uninitialized, but since C++26 reading it is “erroneous behavior,” which means that when the code is built with a C++26 compiler, undefined behavior cannot happen if you read the uninitialized value. Yes, that implies a C++26 compiler will generate different code than before… it will be guaranteed to write an erroneous value the compiler knows (but that isn’t guaranteed to be one the programmer can rely on; so don’t rely on it being zero) if there’s any possibility that value might be read.
This may seem like a small thing, but it’s already a major improvement, and shows that the committee is serious about actively changing our language to be safe by default. Making more and more code safe by default is a trend you can expect to see a lot more of in C++’s medium-term future, and that’s a very welcome thing.
While you wait for your favorite C++26 compiler to add this support, you can get an approximation of this feature today with the GCC or Clang switch -ftrivial-auto-var-init=pattern or with the MSVC switch /RTC1 (run, don’t walk, to use those now if you can). They get you most of what C++26 gives, except that they may not emit a diagnostic (e.g., the Clang switch emits a diagnostic only if you’re running Memory Sanitizer).
For example, consider how this new default prevents secrets from leaking, in this program compiled with and without today’s flag (Godbolt link):
#include <iostream>
#include <string_view>

template<int N>
auto print(char (&a)[N]) { std::cout << std::string_view{a,N} << "\n"; }

auto f1() {
    char a[] = {'s', 'e', 'c', 'r', 'e', 't' };
    print(a);
}

auto f2() {
    char a[6];
    print(a);  // today this likely prints "secret"
}

auto f3() {
    char a[] = {'0', '1', '2', '3', '4', '5' };
    print(a);  // overwrites "secret" (if only just)
}

int main() {
    f1();
    f2();
    f3();
}
Typically, all three local arrays will reuse the same stack storage, and after f1 returns, the string secret is likely still sitting on the stack, waiting for f2’s array to overlay it.
In today’s C++ by default, without -ftrivial-auto-var-init=pattern or /RTC1, f2 will likely print secret. Which is… um (looks at feet and twists a toe to pretend to erase an imaginary spot on the floor)… let’s say problematic for safety and security. As Jon would say to today’s undefined-behavior uninitialized rule, “you give C++ a bad name.”
But with GCC and Clang -ftrivial-auto-var-init=pattern, with MSVC /RTC1, and in C++26 onward by default, f2 will not leak the secret. As Bjarne has sometimes said in other contexts, but I think applies here too: “This is progress!” And to any grumpy readers who may be inclined to say, “dude, I’m used to insecure code, getting rid of insecure code by default isn’t in the spirit of C++,” well, (a) it is now, and (b) get used to it because there’s a lot more like this on the way.
Edited to add: A frequently asked question is, why not initialize to zero? That is always proposed, but it isn’t the best answer for several reasons. The main two are: (1) zero is not necessarily a program-meaningful value, so injecting it often just changes one bug into another; (2) it often actively masks the failure to initialize from sanitizers, which now think the object is initialized and so can’t see and report the error. Using an implementation-defined well-known “erroneous” bit pattern doesn’t have those problems.
But this is C++, you always have the full power to take control and get maximum performance when you need to. So yes, if you really want, C++26 will let you opt out by writing [[indeterminate]], but every use of that attribute should be challenged in every code review and require justification in the form of clear performance measurements showing that you need to override the safe default:
int a [[indeterminate]] ;
// C++26-speak for "yes please hurt me,
// I want the bad old dangerous semantics"
Post-C++26: What more could we do?
So this is where we are pre-C++26:
// In today’s C++ pre-C++26, for local variables
// Using a fundamental type like 'int'
int a; // declaration without initialization
std::cout << a; // undefined: read of uninitialized variable
a = 5; // assignment (not initialization)
std::cout << a; // prints 5
// Using a class type like 'std::string'
std::string b; // declaration with default construction
std::cout << b; // prints "": read of default constructed value
b = "5"; // assignment (not initialization)
std::cout << b; // prints "5"
Note that the first std::cout << a; (the read of the uninitialized variable) might not print anything… it’s undefined behavior, so you’d be lucky if it’s just a matter of printing something or not, because a conforming compiler could technically generate code to erase your hard drive, invoke nasal demons, or other traditional UB nastiness.
And here is where we are starting in C++26:
// In C++26, for local variables
// Using a fundamental type like 'int'
int a; // declaration with some erroneous value
std::cout << a; // prints ? or terminates: read of erroneous value
a = 5; // assignment (not initialization)
std::cout << a; // prints 5
// Using a class type like 'std::string'
std::string b; // declaration with default construction
std::cout << b; // prints "": read of default constructed value
b = "5"; // assignment (not initialization)
std::cout << b; // prints "5"
The good news: Our hard drives and noses are now safe from erasure and worse at the first std::cout << a;. Edited to add: The implementation might print a value or terminate, but there won’t be undefined behavior.
The fine print: C++26 compilers are required to make the declaration int a; write a known value over the bits, and they are encouraged (but are not required) to tell you that the read on the following line is a problem.
In my Cpp2 experimental syntax, local variables of all types are defined using the form a: some_type = initial_value; where you can omit the = initial_value part to express that stack space is allocated for the variable but its actual initialization is deferred, and then Cpp2 guarantees initialization before use; you must do the initialization later using = (e.g., a = initial_value;) before any other use of the variable, which gives you the flexibility of doing things like using different constructors for the same variable on different branch paths. So the equivalent example is:
// In my Cpp2 syntax, local variables
// Using a fundamental type like 'int'
a: int; // allocates space, no initialization
// std::cout << a; // illegal: can't use-before-init!
a = 5; // construction => real initialization!
std::cout << a; // prints 5
// Using a class type like 'std::string'
b: std::string; // allocates space, no initialization
// std::cout << b; // illegal: can't use-before-init!
b = "5"; // construction => real initialization!
std::cout << b; // prints "5"
Cpp2 deliberately has no easy way to opt out and use a variable before it has been initialized. To get that effect, you’d have to have an array of raw std::byte objects or similar on the stack, and do an unsafe_cast to pretend it’s a different type… which is verbose and hard to write, and that’s because I think that unsafe code should be verbose and hard to write… but you can write it (verbosely) if you really need to, because that’s core to C++: I may disapprove of unsafe code you may write in the name of performance, but I defend to the death your right to write it when you need to; C++ always lets you open the hood and take control. My aim is simply to move from “performance by default, safety always available” where safety is the thing you have to work a bit harder to get, to “safety by default, performance always available.” The metaphor I use for this is that we don’t want to take any sharp knives away from C++ programmers, because chefs sometimes need sharp knives; but when the knives are not in use we just want to keep them in a drawer you need to opt into opening, instead of leaving them strewn about the floor and forever reminding people to watch where they step.
So far, I find this model is working very well, and it has the triple benefits of performance (initialization work is never done until you need it), flexibility (I can call the real constructor I want), and safety (it’s always real “initialization” with real construction, and never any use-before-initialization). I think we could have this someday in ISO C++, and I intend to bring a proposal along these lines to the ISO C++ committee in the next year or two, and I’ll be as persuasive as I can. They might love it, they might find flaws I’ve overlooked, or something else… we’ll see! In any event, I’ll be sure to report any progress here.
Thanks again to Sam Johnson for this question!