author “Yann E. MORIN” yann.morin.1998@free.fr

Thu Feb 09 21:39:31 2012 +0100 (11 days ago)

branch 1.13 changeset 2875 1a29ad87a9ec parent 2563 e17f35b05539 permissions -rw-r--r-- 1.13: update version to 1.13.4

File.........: 9 - Build procedure overview.txt
Copyright....: (C) 2011 Yann E. MORIN <yann.morin.1998@anciens.enib.fr>
License......: Creative Commons Attribution Share Alike (CC-by-sa), v2.5


How is a toolchain constructed? /
_______________________________/

This is the result of a discussion with Francesco Turco <mail@fturco.org>:
  http://sourceware.org/ml/crossgcc/2011-01/msg00060.html

Francesco has a nice tutorial for beginners, along with a sample
step-by-step procedure to build a toolchain for an ARM target from an
x86_64 Debian host:
  http://fturco.org/wiki/doku.php?id=debian:cross-compiler

Thank you Francesco for initiating this!


I want a cross-compiler! What is this toolchain you're speaking about? |
-----------------------------------------------------------------------+

A cross-compiler is in fact a collection of different tools set up to
work tightly together. The tools are arranged so that they are chained,
in a kind of cascade, where the output from one becomes the input to
the next, to ultimately produce the actual binary code that runs on a
machine. So, we call this arrangement a "toolchain". When a toolchain is
meant to generate code for a machine different from the machine it runs
on, it is called a cross-toolchain.

So, what are those components in a toolchain? |
----------------------------------------------+

The components that play a role in the toolchain are first and foremost
the compiler itself. The compiler turns source code (in C, C++, whatever)
into assembly code. The compiler of choice is the GNU compiler collection,
well known as 'gcc'.

The assembly code is translated by the assembler into object code. This
is done by the binary utilities, such as the GNU 'binutils'.

Once the different object code files have been generated, they have to be
aggregated together to form the final executable binary. This is called
linking, and is achieved with the use of a linker. The GNU 'binutils' also
come with a linker.
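
As a rough illustration of the cascade described above, the sketch below
lists the commands one might run by hand for each stage. The tuple prefix
'arm-unknown-linux-gnueabi-' and the file name 'hello.c' are assumptions
chosen for the example, not names taken from this document. The function
only builds the command lines; it does not run anything.

```python
def pipeline_commands(source, prefix="arm-unknown-linux-gnueabi-"):
    """Return the compile, assemble and link commands for one C source file.

    'prefix' is a hypothetical cross-toolchain prefix; use "" for native tools.
    """
    stem = source.rsplit(".", 1)[0]
    assembly, obj = stem + ".s", stem + ".o"
    return [
        # Compiler: C source -> assembly code (-S stops gcc after compiling).
        [prefix + "gcc", "-S", source, "-o", assembly],
        # Assembler (from binutils): assembly code -> object code.
        [prefix + "as", assembly, "-o", obj],
        # Linker (from binutils): object code -> executable. It is driven
        # through gcc here, which calls the linker with the right start
        # files and libraries.
        [prefix + "gcc", obj, "-o", stem],
    ]


if __name__ == "__main__":
    for command in pipeline_commands("hello.c"):
        print(" ".join(command))
```

In everyday use, a single call to gcc runs all three stages behind the
scenes; splitting them out only makes the chain visible.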

So far, we get a complete toolchain that is capable of turning source code
into actual executable code. Depending on the Operating System, or the lack
thereof, running on the target, we also need the C library. The C library
provides a standard abstraction layer that performs basic tasks (such as
allocating memory, printing output on a terminal, managing file access...).
There are many C libraries, each targeted to different systems. For the
Linux /desktop/, there is glibc or eglibc or even uClibc; for embedded Linux,
you have a choice of eglibc or uClibc; while for systems without an Operating
System, you may use newlib, dietlibc, or even none at all. There are a few
other C libraries, but they are not as widely used, and/or are targeted to
very specific needs (eg. klibc is a very small subset of the C library aimed
at building constrained initial ramdisks).

Under Linux, the C library needs to know the API to the kernel to decide
what features are present, and if needed, what emulation to include for
missing features. That API is provided by the kernel headers. Note: this
is Linux-specific (and potentially applies to a very few others); the C
libraries on other OSes do not need the kernel headers.

And now, how are all these components chained together? |
--------------------------------------------------------+

So far, all major components have been covered, yet there is a specific
order in which they need to be built. Here we see what the dependencies are,
starting with the compiler we want to ultimately use. We call that compiler
the 'final compiler'.

  - the final compiler needs the C library, to know how to use it,
but:
  - building the C library requires a compiler

A needs B which needs A. This is the classic chicken'n'egg problem... This
is solved by building a stripped-down compiler that does not need the C
library, but is capable of building it. We call it a bootstrap, initial, or
core compiler. So here is the new dependency list:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a core compiler
but:
  - the core compiler needs the C library headers and start files, to know
    how to use the C library

B needs C which needs B. Chicken'n'egg, again. To solve this one, we will
need to build a C library that will only install its headers and start
files. The start files are a very few files that gcc needs to be able to
turn on thread local storage (TLS) on an NPTL system. So now we have:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a core compiler
  - the core compiler needs the C library headers and start files, to know
    how to use the C library
but:
  - building the start files requires a compiler

Geez... C needs D which needs C, yet again. So we need to build a yet
simpler compiler, that needs neither the headers nor the start files.
This compiler is also a bootstrap, initial or core compiler. In order
to differentiate the two core compilers, let's call that one "core pass 1",
and the former one "core pass 2". The dependency list becomes:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a core pass 2 compiler
  - the core pass 2 compiler needs the C library headers and start files,
    to know how to use the C library
  - building the start files requires a compiler
  - we need a core pass 1 compiler

And as we said earlier, the C library also requires the kernel headers.
The kernel headers themselves have no prerequisites, so that is the end of
the story in this case:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a core pass 2 compiler
  - the core pass 2 compiler needs the C library headers and start files,
    to know how to use the C library
  - building the start files requires a compiler and the kernel headers
  - we need a core pass 1 compiler
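
To see why splitting the compiler into several builds matters, here is a
minimal sketch that checks a dependency mapping for circular dependencies.
It assumes Python 3.9 or later (for the standard 'graphlib' module), and it
reads "building the start files requires a compiler" as the core pass 1
compiler. The helper name 'has_cycle' and the two mappings are made up for
this example.

```python
from graphlib import CycleError, TopologicalSorter


def has_cycle(depends_on):
    """Return True if the dependency mapping contains a circular dependency."""
    try:
        # static_order() is lazy, so consume it to trigger the cycle check.
        list(TopologicalSorter(depends_on).static_order())
    except CycleError:
        return True
    return False


# Naive view: a single compiler and the C library need each other.
NAIVE = {
    "compiler": {"C library"},
    "C library": {"compiler"},
}

# View from the list above: the compiler is split into three builds.
SPLIT = {
    "final compiler": {"C library"},
    "C library": {"core pass 2 compiler"},
    "core pass 2 compiler": {"C library headers and start files"},
    "C library headers and start files": {
        "core pass 1 compiler",
        "kernel headers",
    },
    "core pass 1 compiler": set(),
    "kernel headers": set(),
}

if __name__ == "__main__":
    print("naive graph has a cycle:", has_cycle(NAIVE))
    print("split graph has a cycle:", has_cycle(SPLIT))
```

The naive mapping is rejected because neither component can come first,
while the split mapping can be built from the bottom up.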

We need to add a few new requirements. The moment we compile code for the
target, we need the assembler and the linker. Such code is, of course,
built from the C library, so we need to build the binutils before the C
library start files, and the complete C library itself. Also, some code
in gcc will turn out to run on the target as well. Luckily, the binutils
have no prerequisites of their own. So, our dependency chain is as follows:

  - the final compiler needs the C library, to know how to use it, and the
    binutils
  - building the C library requires a core pass 2 compiler and the binutils
  - the core pass 2 compiler needs the C library headers and start files,
    to know how to use the C library, and the binutils
  - building the start files requires a compiler, the kernel headers and the
    binutils
  - the core pass 1 compiler needs the binutils

Which translates into this order to build the components:

  1 binutils
  2 core pass 1 compiler
  3 kernel headers
  4 C library headers and start files
  5 core pass 2 compiler
  6 complete C library
  7 final compiler
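
Such an order can also be derived mechanically from the dependency list
with a topological sort. The sketch below is only an illustration of that
idea, not how crosstool-NG itself schedules its steps. It assumes Python 3.9
or later (for 'graphlib'), and it takes the compiler used for the start
files to be the core pass 1 compiler. The names 'DEPENDS_ON',
'DOCUMENT_ORDER' and 'respects_dependencies' are invented for the example.

```python
from graphlib import TopologicalSorter

# Each component maps to the set of components that must be built before it.
DEPENDS_ON = {
    "binutils": set(),
    "kernel headers": set(),
    "core pass 1 compiler": {"binutils"},
    "C library headers and start files": {
        "core pass 1 compiler",
        "kernel headers",
        "binutils",
    },
    "core pass 2 compiler": {"C library headers and start files", "binutils"},
    "complete C library": {"core pass 2 compiler", "binutils"},
    "final compiler": {"complete C library", "binutils"},
}

# The order given in the text.
DOCUMENT_ORDER = [
    "binutils",
    "core pass 1 compiler",
    "kernel headers",
    "C library headers and start files",
    "core pass 2 compiler",
    "complete C library",
    "final compiler",
]


def respects_dependencies(order, depends_on):
    """Return True if every component comes after all of its prerequisites."""
    position = {name: index for index, name in enumerate(order)}
    if set(position) != set(depends_on):
        return False
    return all(
        position[dep] < position[name]
        for name, deps in depends_on.items()
        for dep in deps
    )


if __name__ == "__main__":
    computed = list(TopologicalSorter(DEPENDS_ON).static_order())
    for number, name in enumerate(computed, start=1):
        print(number, name)
    print("document order is valid:",
          respects_dependencies(DOCUMENT_ORDER, DEPENDS_ON))
```

A topological order is not unique: the kernel headers depend on nothing, so
they may land anywhere before the C library headers and start files. That is
why the sketch checks an order against the constraints instead of comparing
it with one fixed list.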

Yes! :-) But are we done yet?

In fact, no, there are still missing dependencies. As far as the tools
themselves are concerned, we do not need anything else.

But gcc has a few pre-requisites. It relies on a few external libraries to
perform some non-trivial tasks (such as handling complex numbers in
constants...). There are a few options to build those libraries. First, one
may think to rely on a Linux distribution to provide those libraries. Alas,
they were not widely available until very, very recently. So, if the distro
is not recent enough, chances are that we will have to build those libraries
(which we do below). The affected libraries are:

  - the GNU Multiple Precision Arithmetic Library, GMP
  - the C library for multiple-precision floating-point computations with
    correct rounding, MPFR
  - the C library for the arithmetic of complex numbers, MPC

The dependencies for those libraries are:

  - MPC requires GMP and MPFR
  - MPFR requires GMP
  - GMP has no pre-requisite

So, the build order becomes:

  1 GMP
  2 MPFR
  3 MPC
  4 binutils
  5 core pass 1 compiler
  6 kernel headers
  7 C library headers and start files
  8 core pass 2 compiler
  9 complete C library
 10 final compiler

Yes! Or yet some more?

This is now sufficient to build a functional toolchain. So if you've had
enough for now, you can stop here. Or if you are curious, you can continue
reading.

gcc can also make use of a few other external libraries. These additional,
optional libraries are used to enable advanced features in gcc, such as
loop optimisation (GRAPHITE) and Link Time Optimisation (LTO). If you want
to use these, you'll need three additional libraries:

To enable GRAPHITE:
  - the Parma Polyhedra Library, PPL
  - the Chunky Loop Generator, using the PPL backend, CLooG/PPL

To enable LTO:
  - the ELF object file access library, libelf

The dependencies for those libraries are:

  - PPL requires GMP
  - CLooG/PPL requires GMP and PPL
  - libelf has no pre-requisites

The list now looks like (optional libs with a *):

  1 GMP
  2 MPFR
  3 MPC
  4 PPL *
  5 CLooG/PPL *
  6 libelf *
  7 binutils
  8 core pass 1 compiler
  9 kernel headers
 10 C library headers and start files
 11 core pass 2 compiler
 12 complete C library
 13 final compiler
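
The effect of the optional libraries on the list can be sketched as a small
function. The function name 'build_order' and its 'graphite' and 'lto'
flags are hypothetical, chosen for this example only; they are not
crosstool-NG configuration options.

```python
def build_order(graphite=False, lto=False):
    """Return the build steps, adding optional libraries only on request."""
    steps = ["GMP", "MPFR", "MPC"]
    if graphite:
        # GRAPHITE needs PPL first, then CLooG/PPL which depends on it.
        steps += ["PPL", "CLooG/PPL"]
    if lto:
        # LTO needs libelf, which has no pre-requisites.
        steps.append("libelf")
    steps += [
        "binutils",
        "core pass 1 compiler",
        "kernel headers",
        "C library headers and start files",
        "core pass 2 compiler",
        "complete C library",
        "final compiler",
    ]
    return steps


if __name__ == "__main__":
    for number, step in enumerate(build_order(graphite=True, lto=True), start=1):
        print(f"{number:3} {step}")
```

With both flags off, this gives back the ten mandatory steps; with both on,
it gives the thirteen steps listed above.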

This list is now complete! Woohoo! :-)


So the list is complete. But why does crosstool-NG have more steps? |
--------------------------------------------------------------------+

The thirteen steps above are the necessary steps, from a theoretical point
of view. In reality, though, there are small differences; there are three
different reasons for the additional steps in crosstool-NG.

First, the GNU binutils do not support some kinds of output. It is not possible
to generate 'flat' binaries with binutils, so we have to use another component
that adds this support: elf2flt. Another binary utility called sstrip has been
added. It allows for super-stripping the target binaries, although it is not
strictly required.

Second, some C libraries require another step after the compiler is built, to
install additional stuff. This is the case for mingw and newlib. Hence the
libc_finish step.

Third, crosstool-NG can also build some additional debug utilities to run on
the target. This is where we build, for example, the cross-gdb, the gdbserver
and the native gdb (the last two run on the target, the first runs on the
same machine as the toolchain). The others (strace, ltrace, DUMA and dmalloc)
are absolutely not related to the toolchain, but are nice-to-have stuff that
can greatly help when developing, so are included as goodies (and they are
quite easy to build, so it's OK; more complex stuff is not worth the effort
to include in crosstool-NG).
