[Jump to article]
Let’s build an efficient itoa step by step.
We’ll start naïve, improve incrementally, then reach a high-performance solution with table lookups.
Along the way, we’ll also explain SSO (Small String Optimization) and why in practice we often get zero heap allocations.
1) Naïve Approach (insert at front)
The most straightforward approach is: peel off digits, prepend them into a string.
#include <string>
std::string itoa_naive(int v) {
if (v == 0) return "0";
bool neg = v < 0;
unsigned int u = neg
? static_cast<unsigned int>(-static_cast<long long>(v)) // safe for INT_MIN
: static_cast<unsigned int>(v);
std::string s;
while (u > 0) {
char d = char('0' + (u % 10));
s.insert(s.begin(), d); // prepend
u /= 10;
}
if (neg) s.insert(s.begin(), '-');
return s;
}
What’s wrong?
s.insert(s.begin(), …)shifts the entire string each time.- For each digit we may also trigger a reallocation.
- Complexity is quadratic in number of digits.
This works but is slow.
2) Slightly Better: push_back + reverse
Instead of inserting at the front, we can push_back digits as we peel them off, then reverse at the end:
#include <string>
#include <algorithm>
std::string itoa_push_reverse(int v) {
if (v == 0) return "0";
bool neg = v < 0;
unsigned int u = neg
? static_cast<unsigned int>(-static_cast<long long>(v))
: static_cast<unsigned int>(v);
std::string s;
if (neg) s.push_back('-'); // will reverse later
while (u > 0) {
s.push_back(char('0' + (u % 10)));
u /= 10;
}
if (neg) {
std::reverse(s.begin() + 1, s.end());
} else {
std::reverse(s.begin(), s.end());
}
return s;
}
This reduces cost: appends are amortized O(1), only a single reverse at the end.
Still, there may be a couple of reallocations as the string grows.
But we can do even better.
3) One Allocation, Backward Fill
We know in advance how many digits a number will have.
If we pre-size the string and fill it backwards, we eliminate reallocation and reversals.
#include <string>
#include <cstdint>
static inline unsigned ulen10_u32(uint32_t x) {
if (x >= 1000000000u) return 10;
if (x >= 100000000u) return 9;
if (x >= 10000000u) return 8;
if (x >= 1000000u) return 7;
if (x >= 100000u) return 6;
if (x >= 10000u) return 5;
if (x >= 1000u) return 4;
if (x >= 100u) return 3;
if (x >= 10u) return 2;
return 1;
}
std::string itoa_single_alloc(int v) {
bool neg = v < 0;
uint32_t u = neg
? static_cast<uint32_t>(-static_cast<long long>(v))
: static_cast<uint32_t>(v);
unsigned len = ulen10_u32(u) + (neg ? 1u : 0u);
std::string s;
s.resize(len);
char* p = s.data() + len;
do {
*--p = char('0' + (u % 10));
u /= 10;
} while (u);
if (neg) *--p = '-';
return s; // NRVO / move elision
}
Here, we resize once (often no heap at all, then fill digits backwards.
Exactly one allocation.
Since most modern std::string implementations keep a small buffer inline inside the string object itself. This means short strings don’t allocate on the heap at all.
4) High Performance: Digit Pair Table
To go even faster, we can write two digits at a time.
Instead of repeated / 10 and % 10, we use / 100 and % 100, then look up both characters from a table:
#include <string>
#include <cstdint>
static const char digit_pairs[] =
"0001020304050607080910111213141516171819"
"2021222324252627282930313233343536373839"
"4041424344454647484950515253545556575859"
"6061626364656667686970717273747576777879"
"8081828384858687888990919293949596979899";
static inline unsigned ulen32(uint32_t x) {
if (x >= 1000000000u) return 10;
if (x >= 100000000u) return 9;
if (x >= 10000000u) return 8;
if (x >= 1000000u) return 7;
if (x >= 100000u) return 6;
if (x >= 10000u) return 5;
if (x >= 1000u) return 4;
if (x >= 100u) return 3;
if (x >= 10u) return 2;
return 1;
}
std::string efficient_itoa(int v) {
bool neg = v < 0;
uint32_t u = neg
? static_cast<uint32_t>(-static_cast<long long>(v))
: static_cast<uint32_t>(v);
unsigned len = ulen32(u) + (neg ? 1u : 0u);
std::string s;
s.resize(len);
char* p = s.data() + len;
while (u >= 100) {
unsigned two = (u % 100) * 2;
u /= 100;
*--p = digit_pairs[two + 1];
*--p = digit_pairs[two + 0];
}
if (u >= 10) {
unsigned two = u * 2;
*--p = digit_pairs[two + 1];
*--p = digit_pairs[two + 0];
} else {
*--p = char('0' + u);
}
if (neg) *--p = '-';
return s;
}
Why is this fast?
- Half as many divisions.
- Branches are simple and predictable.
- Still only one allocation (and most likely none).