Efficient itoa implementation

rajat · August 27, 2025, 11:57pm

[Jump to article]

Let’s build an efficient itoa step by step.
We’ll start naïve, improve incrementally, then reach a high-performance solution with table lookups.
Along the way, we’ll also explain SSO (Small String Optimization) and why in practice we often get zero heap allocations.

1) Naïve Approach (insert at front)

The most straightforward approach is: peel off digits, prepend them into a string.

#include <string>

std::string itoa_naive(int v) {
    if (v == 0) return "0";

    bool neg = v < 0;
    unsigned int u = neg
        ? static_cast<unsigned int>(-static_cast<long long>(v)) // safe for INT_MIN
        : static_cast<unsigned int>(v);

    std::string s;
    while (u > 0) {
        char d = char('0' + (u % 10));
        s.insert(s.begin(), d); // prepend
        u /= 10;
    }
    if (neg) s.insert(s.begin(), '-');
    return s;
}

What’s wrong?

s.insert(s.begin(), …) shifts the entire string each time.
For each digit we may also trigger a reallocation.
Complexity is quadratic in number of digits.

This works but is slow.

2) Slightly Better: `push_back` + `reverse`

Instead of inserting at the front, we can push_back digits as we peel them off, then reverse at the end:

#include <string>
#include <algorithm>

std::string itoa_push_reverse(int v) {
    if (v == 0) return "0";

    bool neg = v < 0;
    unsigned int u = neg
        ? static_cast<unsigned int>(-static_cast<long long>(v))
        : static_cast<unsigned int>(v);

    std::string s;
    if (neg) s.push_back('-'); // will reverse later
    while (u > 0) {
        s.push_back(char('0' + (u % 10)));
        u /= 10;
    }
    if (neg) {
        std::reverse(s.begin() + 1, s.end());
    } else {
        std::reverse(s.begin(), s.end());
    }
    return s;
}

This reduces cost: appends are amortized O(1), only a single reverse at the end.
Still, there may be a couple of reallocations as the string grows.

But we can do even better.

3) One Allocation, Backward Fill

We know in advance how many digits a number will have.
If we pre-size the string and fill it backwards, we eliminate reallocation and reversals.

#include <string>
#include <cstdint>

static inline unsigned ulen10_u32(uint32_t x) {
    if (x >= 1000000000u) return 10;
    if (x >= 100000000u)  return 9;
    if (x >= 10000000u)   return 8;
    if (x >= 1000000u)    return 7;
    if (x >= 100000u)     return 6;
    if (x >= 10000u)      return 5;
    if (x >= 1000u)       return 4;
    if (x >= 100u)        return 3;
    if (x >= 10u)         return 2;
    return 1;
}

std::string itoa_single_alloc(int v) {
    bool neg = v < 0;
    uint32_t u = neg
        ? static_cast<uint32_t>(-static_cast<long long>(v))
        : static_cast<uint32_t>(v);

    unsigned len = ulen10_u32(u) + (neg ? 1u : 0u);
    std::string s;
    s.resize(len);

    char* p = s.data() + len;
    do {
        *--p = char('0' + (u % 10));
        u /= 10;
    } while (u);
    if (neg) *--p = '-';

    return s; // NRVO / move elision
}

Here, we resize once (often no heap at all, then fill digits backwards.
Exactly one allocation.

Since most modern std::string implementations keep a small buffer inline inside the string object itself. This means short strings don’t allocate on the heap at all.

4) High Performance: Digit Pair Table

To go even faster, we can write two digits at a time.
Instead of repeated / 10 and % 10, we use / 100 and % 100, then look up both characters from a table:

#include <string>
#include <cstdint>

static const char digit_pairs[] =
    "0001020304050607080910111213141516171819"
    "2021222324252627282930313233343536373839"
    "4041424344454647484950515253545556575859"
    "6061626364656667686970717273747576777879"
    "8081828384858687888990919293949596979899";

static inline unsigned ulen32(uint32_t x) {
    if (x >= 1000000000u) return 10;
    if (x >= 100000000u)  return 9;
    if (x >= 10000000u)   return 8;
    if (x >= 1000000u)    return 7;
    if (x >= 100000u)     return 6;
    if (x >= 10000u)      return 5;
    if (x >= 1000u)       return 4;
    if (x >= 100u)        return 3;
    if (x >= 10u)         return 2;
    return 1;
}

std::string efficient_itoa(int v) {
    bool neg = v < 0;
    uint32_t u = neg
        ? static_cast<uint32_t>(-static_cast<long long>(v))
        : static_cast<uint32_t>(v);

    unsigned len = ulen32(u) + (neg ? 1u : 0u);
    std::string s;
    s.resize(len);
    char* p = s.data() + len;

    while (u >= 100) {
        unsigned two = (u % 100) * 2;
        u /= 100;
        *--p = digit_pairs[two + 1];
        *--p = digit_pairs[two + 0];
    }

    if (u >= 10) {
        unsigned two = u * 2;
        *--p = digit_pairs[two + 1];
        *--p = digit_pairs[two + 0];
    } else {
        *--p = char('0' + u);
    }

    if (neg) *--p = '-';
    return s;
}

Why is this fast?

Half as many divisions.
Branches are simple and predictable.
Still only one allocation (and most likely none).