How are GCC and g++ bootstrapped?


The oldest version of GCC was compiled using another C compiler, since other C compilers already existed when it was written. The very first C compiler ever (ca. 1973, IIRC) was implemented either in PDP-11 assembly or in the B programming language that preceded it, but in any case the B compiler was written in assembly. Similarly, the first ever C++ compiler (CPre/Cfront, 1979-1983) was probably first implemented in C, then rewritten in C++.

When you compile GCC or any other self-hosting compiler, the full order of building is:

  1. Build the new version of GCC with an existing C compiler.
  2. Rebuild the new version of GCC with the one you just built.
  3. (Optional) Repeat step 2 for verification purposes.

This process is called bootstrapping. It tests the compiler's capability of compiling itself and makes sure that the resulting compiler is built with all the optimizations that it itself implements.
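
The verification in step 3 boils down to checking that the compiler has reached a fixed point: the output of step 2 and the output of step 3 should be identical. Here is a minimal Python sketch of that comparison, assuming the two build outputs live in separate directories. The function names and the directory names `stage2`/`stage3` are my own for illustration, and a real build may need to ignore things like embedded timestamps.

```python
import hashlib
from pathlib import Path


def digest_tree(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def stages_differ(stage_a: Path, stage_b: Path) -> list[str]:
    """Return relative paths that differ or exist in only one of the two trees."""
    a, b = digest_tree(stage_a), digest_tree(stage_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

Calling `stages_differ(Path("stage2"), Path("stage3"))` and getting an empty list back would mean the compiler built by itself reproduces itself exactly.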

EDIT: Drew Dormann, in the comments, points to Bjarne Stroustrup's account of the earliest implementation of C++. It was implemented in C++ but translated by what Stroustrup calls a "preprocessor" from C++ to C; not a full compiler by his definition, but still C++ was bootstrapped in C.


If you want to replicate the bootstrap process of GCC in a modern environment (x86 Linux), you can use the tools developed by the bootstrappable project:

  • We can start with the hex0 assembler (on x86 it's a 357-byte binary), which does roughly what the following two commands do:

    sed 's/[;#].*$//g' hex0_x86.hex0 | xxd -r -p > hex0
    chmod +x hex0

    I.e. it translates an ASCII hex representation of a binary program into binary code, but it is written in hex0 itself.

    Basically, hex0 has source code that is in one-to-one correspondence with its binary code.

  • hex0 can be used to build a slightly more powerful hex1 assembler that supports a few more features (one-character labels and offset calculation). hex1 is written in hex0 assembly.

  • hex1 can be used to build hex2 (an even more advanced assembler that supports multi-character labels).

  • hex2 can then be used to build a macro assembler, M0 (where programs are written using macros instead of hex opcodes).

  • You can then use this macro assembler to build cc_x86, which is a "C compiler" written in assembly. cc_x86 only supports a small subset of C, but that's an impressive start.

  • You can use cc_x86 to build M2-Planet (Macro Platform Neutral Transpiler), which is a C compiler written in C. M2-Planet is self-hosting and can build itself.

  • You can then use M2-Planet to build GNU Mes, which is a small Scheme interpreter.

  • mes can be used to run mescc, which is a C compiler written in Scheme that lives in the same repository as mes.

  • mescc can be used to rebuild mes and also to build the Mes C library.

  • Then mescc can be used to build a slightly patched Tiny C Compiler (TCC).

  • Then you can use that to build a newer version of TCC, 0.9.27.

  • GCC 4.0.4 and the musl C library can be built with TCC 0.9.27.

  • Then you can build a newer GCC using the older GCC. E.g. GCC 4.0.4 -> GCC 4.7.4 -> modern GCC.
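
To make the first link in this chain more concrete, here is a rough Python sketch of what the sed/xxd pipeline for hex0 does: strip `;` and `#` comments, then turn the remaining hex digit pairs into bytes. The function name and the sample input are mine, not taken from the actual hex0 sources, and the real hex0 may treat stray characters differently than this sketch does (it raises an error on anything that isn't valid hex).

```python
import re


def hex0_to_bytes(source: str) -> bytes:
    """Translate hex0-style text into raw bytes.

    Mirrors the sed/xxd pipeline: drop everything from ';' or '#' to the
    end of each line, then read the remaining hex digits two at a time.
    """
    without_comments = re.sub(r"[;#].*$", "", source, flags=re.MULTILINE)
    digits = re.sub(r"\s+", "", without_comments)
    return bytes.fromhex(digits)


# Illustrative input only: the start of an ELF header, with comments.
sample = """\
7F 45 4C 46   ; ELF magic
01 01 01      # 32-bit, little-endian, ELF version 1
"""
print(hex0_to_bytes(sample))  # b'\x7fELF\x01\x01\x01'
```

The point is that nothing here requires a compiler or even a real assembler: each pair of hex digits in the source corresponds to exactly one byte of the output, so the binary can be audited by reading the source.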

TL;DR:

hex0 -> hex1 -> hex2 -> M0 -> cc_x86 -> M2-Planet -> Mes -> mescc -> TCC -> GCC.