Introduction
============

A complete rewrite of the LLVMC compiler driver is proposed, aimed at
making it more configurable and useful.

Motivation
==========

As it stands, the current version of LLVMC does not meet its stated goals
of configurability and extensibility and is therefore not used
much. The need for enhancements in LLVMC is also reflected in [1]_. The
proposed rewrite will fix the aforementioned deficiencies and provide
an extensible, future-proof solution.

Design
======

A compiler driver's job is essentially to find a way to transform
a set of input files into a set of targets, depending on the
user-provided options. Since several transformation paths can
exist, it's natural to use a directed graph to represent all of
them. In this graph, nodes are tools (for example, ``gcc -S`` is a tool
that generates assembly from C language files) and edges between them
mean that the output of one tool can be given as input to another (as
in ``gcc -S -o - file.c | as``). We'll call this graph the compilation
graph.
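
As a rough illustration of the idea, the compilation graph can be
sketched in a few lines of C++. All names below (``ToolNode``,
``CompilationGraph``, ``findChain``) are hypothetical and not part of
the proposal. The sketch also assumes that edges are implicit: one tool
can feed another whenever its output language equals the other's input
language.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Hypothetical node type: a tool that turns one language into another.
struct ToolNode {
  std::string name;        // e.g. "gcc -S"
  std::string inputLang;   // language the tool consumes
  std::string outputLang;  // language the tool produces
};

// Hypothetical compilation graph. Edges are not stored explicitly:
// tool A feeds tool B when A.outputLang == B.inputLang.
class CompilationGraph {
public:
  void addTool(const ToolNode& tool) { tools_.push_back(tool); }

  // Breadth-first search for a shortest chain of tools that turns
  // language `from` into language `to`. Returns the tool names in
  // execution order, or an empty vector if no chain exists.
  std::vector<std::string> findChain(const std::string& from,
                                     const std::string& to) const {
    std::queue<std::string> pending;                // languages to expand
    std::map<std::string, std::size_t> reachedVia;  // language -> producing tool
    pending.push(from);
    while (!pending.empty()) {
      std::string lang = pending.front();
      pending.pop();
      if (lang == to) break;
      for (std::size_t i = 0; i < tools_.size(); ++i) {
        const ToolNode& tool = tools_[i];
        if (tool.inputLang != lang) continue;
        // Skip languages that have already been reached.
        if (tool.outputLang == from || reachedVia.count(tool.outputLang))
          continue;
        reachedVia[tool.outputLang] = i;
        pending.push(tool.outputLang);
      }
    }

    std::vector<std::string> chain;
    if (from == to || !reachedVia.count(to)) return chain;
    // Walk back from the target language to the source language.
    for (std::string lang = to; lang != from;) {
      const ToolNode& tool = tools_[reachedVia.at(lang)];
      chain.insert(chain.begin(), tool.name);
      lang = tool.inputLang;
    }
    return chain;
  }

private:
  std::vector<ToolNode> tools_;
};
```

With a ``gcc -S`` tool (C to assembly) and an ``as`` tool (assembly to
object code) registered, asking for a chain from C to object code would
yield the two tools in that order. A real driver would also have to
take user options into account when choosing between alternative paths.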

The proposed design revolves around the compilation graph and the
following core abstractions:

- Target - An (intermediate) compilation target.

- Action - A shell command template that represents a basic compilation
  transformation (example: ``gcc -S $INPUT_FILE -o $OUTPUT_FILE``).

- Tool - Encapsulates information about a concrete tool used in the
  compilation process and produces Actions. Its operation depends on
  the command-line options provided by the user.

- GraphBuilder - Constructs the compilation graph. Its operation depends
  on the command-line options.

- GraphTraverser - Traverses the compilation graph and constructs a
  sequence of Actions needed to build the target file. Its operation
  depends on the command-line options.
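
To make the notion of an Action as a command template more concrete,
here is a minimal sketch of placeholder expansion. The function name
``expandTemplate`` and the rule that placeholder names consist of
upper-case letters and underscores are assumptions made for this
illustration; the proposal does not specify the template syntax beyond
the example above.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: replaces each "$NAME" placeholder in a command
// template with the value bound to NAME. Placeholders without a
// binding are copied to the result unchanged.
std::string expandTemplate(const std::string& commandTemplate,
                           const std::map<std::string, std::string>& bindings) {
  std::string result;
  std::size_t i = 0;
  while (i < commandTemplate.size()) {
    if (commandTemplate[i] != '$') {
      result += commandTemplate[i++];
      continue;
    }
    // Assumed rule: a placeholder name is a run of 'A'..'Z' and '_'.
    std::size_t end = i + 1;
    while (end < commandTemplate.size() &&
           (commandTemplate[end] == '_' ||
            (commandTemplate[end] >= 'A' && commandTemplate[end] <= 'Z')))
      ++end;
    std::string name = commandTemplate.substr(i + 1, end - i - 1);
    std::map<std::string, std::string>::const_iterator it = bindings.find(name);
    if (it != bindings.end())
      result += it->second;                             // known placeholder
    else
      result += commandTemplate.substr(i, end - i);     // leave as-is
    i = end;
  }
  return result;
}
```

Binding ``INPUT_FILE`` to ``file.c`` and ``OUTPUT_FILE`` to ``file.s``
turns the template from the Action example into
``gcc -S file.c -o file.s``. Shell quoting of file names is
deliberately ignored here.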

A high-level view of the compilation process:

  1. Configuration libraries (see below) are loaded in and the
     compilation graph is constructed from the tool descriptions.

  2. Information about possible options is gathered from (the nodes of)
     the compilation graph.

  3. Options are parsed based on data gathered in step 2.

  4. A sequence of Actions needed to build the target is constructed
     using the compilation graph and provided options.

  5. The resulting action sequence is executed.
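
Steps 2 and 3 could look roughly like the sketch below. The types and
functions (``ToolDescription``, ``gatherOptions``,
``findUnknownOptions``) are hypothetical. The sketch also matches
options by exact name only, which is a simplification of what a real
parser would have to do.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a graph node: a tool together with the
// option names it understands.
struct ToolDescription {
  std::string name;
  std::vector<std::string> supportedOptions;
};

// Step 2: collect the union of all option names known to any tool.
std::set<std::string> gatherOptions(const std::vector<ToolDescription>& tools) {
  std::set<std::string> known;
  for (const ToolDescription& tool : tools)
    known.insert(tool.supportedOptions.begin(), tool.supportedOptions.end());
  return known;
}

// Step 3 (simplified): return the arguments that look like options but
// are not known to any tool. Anything not starting with '-' is treated
// as a positional argument and ignored here.
std::vector<std::string> findUnknownOptions(const std::vector<std::string>& args,
                                            const std::set<std::string>& known) {
  std::vector<std::string> unknown;
  for (const std::string& arg : args) {
    bool looksLikeOption = arg.size() > 1 && arg[0] == '-';
    if (looksLikeOption && known.count(arg) == 0)
      unknown.push_back(arg);
  }
  return unknown;
}
```

The point of gathering options first is that the driver itself needs
no built-in knowledge of tool-specific options: the set of accepted
options follows from whichever configuration libraries were loaded.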

Extensibility
==============

To make this design extensible, TableGen [2]_ will be used for
automatic generation of the Tool classes. Users wanting to customize
LLVMC will need to write a configuration library consisting of a set
of TableGen descriptions of compilation tools plus a number of hooks
that influence compilation graph construction and traversal. LLVMC
will have the ability to load user configuration libraries at runtime;
in fact, its own basic functionality will be implemented as a
configuration library.

TableGen specification example
------------------------------

This small example specifies a Tool that converts C source to object
files. Note that it is only a mock-up of intended functionality, not a
final specification::

    def GCC : Tool<
     GCCProperties, // Properties of this tool
     GCCOptions     // Options description for this tool
    >;

    def GCCProperties : ToolProperties<[
     ToolName<"GCC">,
     InputLanguageName<"C">,
     OutputLanguageName<"Object-Code">,
     InputFileExtension<"c">,
     OutputFileExtension<"o">,
     CommandFormat<"gcc -c $OPTIONS $FILES">
    ]>;

    def GCCOptions : ToolOptions<[
     Option<
       "-Wall",                 // Option name
       [None],                  // Allowed values
       [AddOption<"-Wall">]>,   // Action

     Option<
       "-Wextra",               // Option name
       [None],                  // Allowed values
       [AddOption<"-Wextra">]>, // Action

     Option<
       "-W",                 // Option name
       [None],               // Allowed values
       [AddOption<"-W">]>,   // Action

     Option<
       "-D",        // Option name
       [AnyString], // Allowed values

       [AddOptionWithArgument<"-D",GetOptionArgument<"-D">>]
       // Action:
       // If the driver was given option "-D<argument>", add
       // option "-D" with the same argument to the invocation string of
       // this tool.
       >

     ]>;

Example of generated code
-------------------------

The specification above compiles to the following code (again, it's a
mock-up)::

    class GCC : public Tool {

    public:

      GCC() { /* ... */ }

     // Properties

      static const char* ToolName = "GCC";
      static const char* InputLanguageName = "C";
      static const char* OutputLanguageName = "Object-Code";
      static const char* InputFileExtension = "c";
      static const char* OutputFileExtension = "o";
      static const char* CommandFormat = "gcc -c $OPTIONS $FILES";

     // Options

     OptionsDescription SupportedOptions() {
       OptionsDescription supportedOptions;

       supportedOptions.Add(Option("-Wall"));
       supportedOptions.Add(Option("-Wextra"));
       supportedOptions.Add(Option("-W"));
       supportedOptions.Add(Option("-D", AllowedArgs::ANY_STRING));

       return supportedOptions;
     }

     Action GenerateAction(Options providedOptions) {
       Action generatedAction(CommandFormat);
       Option curOpt;

       curOpt = providedOptions.Get("-D");
       if (curOpt) {
          assert(curOpt.HasArgument());
          generatedAction.AddOption(Option("-D", curOpt.GetArgument()));
       }

       curOpt = providedOptions.Get("-Wall");
       if (curOpt)
         generatedAction.AddOption(Option("-Wall"));

       curOpt = providedOptions.Get("-Wextra");
       if (curOpt)
         generatedAction.AddOption(Option("-Wextra"));

       curOpt = providedOptions.Get("-W");
       if (curOpt)
         generatedAction.AddOption(Option("-W"));

       return generatedAction;
     }

    };

    // defined somewhere...

    class Action {
    public:
      void AddOption(const Option& opt) {...}
      int Run(const Filenames& fnms) {...}
    };

Option handling
===============

Since one of the main tasks of the compiler driver is to correctly
handle user-provided options, it is important to define this process
precisely. The intent of the proposed scheme is to function as a
drop-in replacement for GCC.

Option syntax
-------------

Option syntax is specified by the following formal grammar::

        <command-line>      ::=  <option>*
        <option>            ::=  <positional-option> | <named-option>
        <named-option>      ::=  -[-]<option-name>[<delimiter><option-argument>]
        <delimiter>         ::=  ',' | '=' | ' '
        <positional-option> ::=  <string>
        <option-name>       ::=  <string>
        <option-argument>   ::=  <string>

This roughly corresponds to the GCC option syntax. Note that grouping
of short options (as in ``ls -la``) is forbidden.

Example::

        llvmc -O3 -Wa,-foo,-bar -pedantic -std=c++0x a.c b.c c.c

Option arguments can also have special forms: for example, an argument
can be a comma-separated list (as in ``-Wa,-foo,-bar``). In such cases,
it's up to the option handler to parse the argument.
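
A minimal sketch of how a single command-line token could be split
according to the grammar above follows. ``ParsedOption`` and
``parseToken`` are hypothetical names. The sketch handles only the
``,`` and ``=`` forms: whether a following space-separated token is an
argument cannot be decided from the grammar alone, since it requires
knowing which options take arguments, so that case is left out.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for one command-line token.
struct ParsedOption {
  bool positional;       // true for a <positional-option>
  std::string name;      // option name without leading dashes; for
                         // positional options, the whole token
  std::string argument;  // text after the first ',' or '=', if any
};

// Splits one token into name and argument. Tokens that do not start
// with '-' (and the lone "-") are treated as positional.
ParsedOption parseToken(const std::string& token) {
  ParsedOption result;
  result.positional = token.size() < 2 || token[0] != '-';
  if (result.positional) {
    result.name = token;
    return result;
  }
  // Skip one or two leading dashes.
  std::size_t start = (token[1] == '-') ? 2 : 1;
  // The first ',' or '=' separates the name from the argument; the
  // argument itself is passed on unparsed for the option handler.
  std::size_t delim = token.find_first_of(",=", start);
  if (delim == std::string::npos) {
    result.name = token.substr(start);
  } else {
    result.name = token.substr(start, delim - start);
    result.argument = token.substr(delim + 1);
  }
  return result;
}
```

Under these assumptions ``-Wa,-foo,-bar`` yields the name ``Wa`` with
the argument ``-foo,-bar``, and ``-std=c++0x`` yields ``std`` with
``c++0x``. A token such as ``-O3`` contains no delimiter, so it is
reported as an option named ``O3`` with no argument.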

Option semantics
----------------

According to their meaning, options are classified into the following
categories:

- Global options - Options that influence compilation graph
  construction/traversal. Example: -E (stop after preprocessing).

- Local options - Options that influence one or several Actions in
  the generated action sequence. Example: -O3 (turn on optimization).

- Prefix options - Options that influence meaning of the following
  command-line arguments. Example: -x language (specify language for
  the input files explicitly). Prefix options can be local or global.

- Built-in options - Options that are hard-coded into the
  driver. Examples: --help, -o file/-pipe (redirect output). Can be
  local or global.
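
Since prefix and built-in options can each be either local or global,
one way to model the classification is a scope plus two independent
flags. The sketch below is only an illustration. The type names are
hypothetical, and the scopes assigned to ``-x`` and ``--help`` are
assumptions chosen for the example, not something the proposal
prescribes.

```cpp
#include <map>
#include <string>

// Hypothetical model: every option is either global or local...
enum class OptionScope { Global, Local };

// ...and may additionally be a prefix option and/or a built-in option.
struct OptionInfo {
  OptionScope scope;
  bool isPrefix;   // changes the meaning of the following arguments
  bool isBuiltIn;  // hard-coded into the driver
};

// Example table using the options mentioned in the list above.
std::map<std::string, OptionInfo> exampleOptionTable() {
  std::map<std::string, OptionInfo> table;
  table["-E"] = OptionInfo{OptionScope::Global, false, false};
  table["-O3"] = OptionInfo{OptionScope::Local, false, false};
  // Assumed scopes for illustration only:
  table["-x"] = OptionInfo{OptionScope::Local, true, false};
  table["--help"] = OptionInfo{OptionScope::Global, false, true};
  return table;
}

// Global options influence graph construction and traversal; local
// options only influence individual Actions.
bool affectsGraph(const OptionInfo& info) {
  return info.scope == OptionScope::Global;
}
```

Keeping the scope separate from the two flags avoids having to
enumerate every combination (local prefix, global built-in, and so on)
as its own category.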

Naming
======

Since the compiler driver, as a single point of access to the LLVM
tool set, is a frequently used tool, it would be desirable to make its name
as short and easy to type as possible. Some possible names are 'llcc' or
'lcc', by analogy with gcc.


Issues
======

1. Should global-options-influencing hooks be written by hand or
   auto-generated from TableGen specifications?

2. More?

References
==========

.. [1] LLVM Bug#686

       http://llvm.org/bugs/show_bug.cgi?id=686

.. [2] TableGen Fundamentals

       http://llvm.org/docs/TableGenFundamentals.html