Overview¶
The LLVM compiler infrastructure supports a wide range of projects, from industrial strength compilers to specialized JIT applications to small research projects.
Similarly, documentation is broken down into several high-level groupings targeted at different audiences:
LLVM Design & Overview¶
Several introductory papers and presentations.
LLVM Language Reference Manual¶
- Abstract
- Introduction
- Identifiers
- High Level Structure
- Module Structure
- Linkage Types
- Calling Conventions
- Visibility Styles
- DLL Storage Classes
- Thread Local Storage Models
- Structure Types
- Global Variables
- Functions
- Aliases
- Comdats
- Named Metadata
- Parameter Attributes
- Garbage Collector Strategy Names
- Prefix Data
- Prologue Data
- Personality Function
- Attribute Groups
- Function Attributes
- Module-Level Inline Assembly
- Data Layout
- Target Triple
- Pointer Aliasing Rules
- Volatile Memory Accesses
- Memory Model for Concurrent Operations
- Atomic Memory Ordering Constraints
- Fast-Math Flags
- Use-list Order Directives
- Type System
- Constants
- Other Values
- Metadata
- Metadata Nodes and Metadata Strings
- Specialized Metadata Nodes
- DICompileUnit
- DIFile
- DIBasicType
- DISubroutineType
- DIDerivedType
- DICompositeType
- DISubrange
- DIEnumerator
- DITemplateTypeParameter
- DITemplateValueParameter
- DINamespace
- DIGlobalVariable
- DISubprogram
- DILexicalBlock
- DILexicalBlockFile
- DILocation
- DILocalVariable
- DIExpression
- DIObjCProperty
- DIImportedEntity
- ‘
tbaa
‘ Metadata - ‘
tbaa.struct
‘ Metadata - ‘
noalias
‘ and ‘alias.scope
‘ Metadata - ‘
fpmath
‘ Metadata - ‘
range
‘ Metadata - ‘
llvm.loop
‘ - ‘
llvm.loop.vectorize
‘ and ‘llvm.loop.interleave
‘ - ‘
llvm.loop.interleave.count
‘ Metadata - ‘
llvm.loop.vectorize.enable
‘ Metadata - ‘
llvm.loop.vectorize.width
‘ Metadata - ‘
llvm.loop.unroll
‘ - ‘
llvm.loop.unroll.count
‘ Metadata - ‘
llvm.loop.unroll.disable
‘ Metadata - ‘
llvm.loop.unroll.runtime.disable
‘ Metadata - ‘
llvm.loop.unroll.full
‘ Metadata - ‘
llvm.mem
‘ - ‘
llvm.mem.parallel_loop_access
‘ Metadata - ‘
llvm.bitsets
‘
- Module Flags Metadata
- Intrinsic Global Variables
- Instruction Reference
- Terminator Instructions
- Binary Operations
- Bitwise Binary Operations
- Vector Operations
- Aggregate Operations
- Memory Access and Addressing Operations
- Conversion Operations
- ‘trunc .. to’ Instruction
- ‘zext .. to’ Instruction
- ‘sext .. to’ Instruction
- ‘fptrunc .. to’ Instruction
- ‘fpext .. to’ Instruction
- ‘fptoui .. to’ Instruction
- ‘fptosi .. to’ Instruction
- ‘uitofp .. to’ Instruction
- ‘sitofp .. to’ Instruction
- ‘ptrtoint .. to’ Instruction
- ‘inttoptr .. to’ Instruction
- ‘bitcast .. to’ Instruction
- ‘addrspacecast .. to’ Instruction
- Other Operations
- Intrinsic Functions
- Variable Argument Handling Intrinsics
- Accurate Garbage Collection Intrinsics
- Code Generator Intrinsics
- ‘llvm.returnaddress’ Intrinsic
- ‘llvm.frameaddress’ Intrinsic
- ‘llvm.localescape’ and ‘llvm.localrecover’ Intrinsics
- ‘llvm.read_register’ and ‘llvm.write_register’ Intrinsics
- ‘llvm.stacksave’ Intrinsic
- ‘llvm.stackrestore’ Intrinsic
- ‘llvm.prefetch’ Intrinsic
- ‘llvm.pcmarker’ Intrinsic
- ‘llvm.readcyclecounter’ Intrinsic
- ‘llvm.clear_cache’ Intrinsic
- ‘llvm.instrprof_increment’ Intrinsic
- Standard C Library Intrinsics
- ‘llvm.memcpy’ Intrinsic
- ‘llvm.memmove’ Intrinsic
- ‘llvm.memset.*’ Intrinsics
- ‘llvm.sqrt.*’ Intrinsic
- ‘llvm.powi.*’ Intrinsic
- ‘llvm.sin.*’ Intrinsic
- ‘llvm.cos.*’ Intrinsic
- ‘llvm.pow.*’ Intrinsic
- ‘llvm.exp.*’ Intrinsic
- ‘llvm.exp2.*’ Intrinsic
- ‘llvm.log.*’ Intrinsic
- ‘llvm.log10.*’ Intrinsic
- ‘llvm.log2.*’ Intrinsic
- ‘llvm.fma.*’ Intrinsic
- ‘llvm.fabs.*’ Intrinsic
- ‘llvm.minnum.*’ Intrinsic
- ‘llvm.maxnum.*’ Intrinsic
- ‘llvm.copysign.*’ Intrinsic
- ‘llvm.floor.*’ Intrinsic
- ‘llvm.ceil.*’ Intrinsic
- ‘llvm.trunc.*’ Intrinsic
- ‘llvm.rint.*’ Intrinsic
- ‘llvm.nearbyint.*’ Intrinsic
- ‘llvm.round.*’ Intrinsic
- Bit Manipulation Intrinsics
- Arithmetic with Overflow Intrinsics
- Specialised Arithmetic Intrinsics
- Half Precision Floating Point Intrinsics
- Debugger Intrinsics
- Exception Handling Intrinsics
- Trampoline Intrinsics
- Masked Vector Load and Store Intrinsics
- Masked Vector Gather and Scatter Intrinsics
- Memory Use Markers
- General Intrinsics
- ‘llvm.var.annotation’ Intrinsic
- ‘llvm.ptr.annotation.*’ Intrinsic
- ‘llvm.annotation.*’ Intrinsic
- ‘llvm.trap’ Intrinsic
- ‘llvm.debugtrap’ Intrinsic
- ‘llvm.stackprotector’ Intrinsic
- ‘llvm.stackprotectorcheck’ Intrinsic
- ‘llvm.objectsize’ Intrinsic
- ‘llvm.expect’ Intrinsic
- ‘llvm.assume’ Intrinsic
- ‘llvm.bitset.test’ Intrinsic
- ‘llvm.donothing’ Intrinsic
- Stack Map Intrinsics
Abstract¶
This document is a reference manual for the LLVM assembly language. LLVM is a Static Single Assignment (SSA) based representation that provides type safety, low-level operations, flexibility, and the capability of representing ‘all’ high-level languages cleanly. It is the common code representation used throughout all phases of the LLVM compilation strategy.
Introduction¶
The LLVM code representation is designed to be used in three different forms: as an in-memory compiler IR, as an on-disk bitcode representation (suitable for fast loading by a Just-In-Time compiler), and as a human readable assembly language representation. This allows LLVM to provide a powerful intermediate representation for efficient compiler transformations and analysis, while providing a natural means to debug and visualize the transformations. The three different forms of LLVM are all equivalent. This document describes the human readable representation and notation.
The LLVM representation aims to be light-weight and low-level while being expressive, typed, and extensible at the same time. It aims to be a “universal IR” of sorts, by being at a low enough level that high-level ideas may be cleanly mapped to it (similar to how microprocessors are “universal IR’s”, allowing many source languages to be mapped to them). By providing type information, LLVM can be used as the target of optimizations: for example, through pointer analysis, it can be proven that a C automatic variable is never accessed outside of the current function, allowing it to be promoted to a simple SSA value instead of a memory location.
Well-Formedness¶
It is important to note that this document describes ‘well formed’ LLVM assembly language. There is a difference between what the parser accepts and what is considered ‘well formed’. For example, the following instruction is syntactically okay, but not well formed:
%x = add i32 1, %x
because the definition of %x
does not dominate all of its uses. The
LLVM infrastructure provides a verification pass that may be used to
verify that an LLVM module is well formed. This pass is automatically
run by the parser after parsing input assembly and by the optimizer
before it outputs bitcode. The violations pointed out by the verifier
pass indicate bugs in transformation passes or input to the parser.
Identifiers¶
LLVM identifiers come in two basic types: global and local. Global
identifiers (functions, global variables) begin with the '@'
character. Local identifiers (register names, types) begin with the
'%'
character. Additionally, there are three different formats for
identifiers, for different purposes:
- Named values are represented as a string of characters with their prefix. For example, %foo, @DivisionByZero, %a.really.long.identifier. The actual regular expression used is ‘[%@][-a-zA-Z$._][-a-zA-Z$._0-9]*’. Identifiers that require other characters in their names can be surrounded with quotes. Special characters may be escaped using "\xx" where xx is the ASCII code for the character in hexadecimal. In this way, any character can be used in a name value, even quotes themselves. The "\01" prefix can be used on global variables to suppress mangling.
- Unnamed values are represented as an unsigned numeric value with their prefix. For example, %12, @2, %44.
- Constants, which are described in the section Constants below.
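To make the three forms concrete, here is a small illustrative sketch (the names are hypothetical):
@"an unusual name" = global i32 0   ; quoted global identifier
@"\01raw_symbol" = global i32 1     ; "\01" prefix suppresses mangling
define i32 @example(i32 %input) {   ; @example and %input are named values
entry:
  %sum.1 = add i32 %input, 1        ; named local value
  %0 = add i32 %sum.1, 1            ; unnamed temporary, numbered automatically
  ret i32 %0
}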
LLVM requires that values start with a prefix for two reasons: Compilers don’t need to worry about name clashes with reserved words, and the set of reserved words may be expanded in the future without penalty. Additionally, unnamed identifiers allow a compiler to quickly come up with a temporary variable without having to avoid symbol table conflicts.
Reserved words in LLVM are very similar to reserved words in other languages. There are keywords for different opcodes (‘add’, ‘bitcast’, ‘ret’, etc.), for primitive type names (‘void’, ‘i32’, etc.), and others. These reserved words cannot conflict with variable names, because none of them start with a prefix character ('%' or '@').
Here is an example of LLVM code to multiply the integer variable ‘%X’ by 8:
The easy way:
%result = mul i32 %X, 8
After strength reduction:
%result = shl i32 %X, 3
And the hard way:
%0 = add i32 %X, %X ; yields i32:%0
%1 = add i32 %0, %0 ; yields i32:%1
%result = add i32 %1, %1
This last way of multiplying %X
by 8 illustrates several important
lexical features of LLVM:
- Comments are delimited with a ‘;’ and go until the end of the line.
- Unnamed temporaries are created when the result of a computation is not assigned to a named value.
- Unnamed temporaries are numbered sequentially (using a per-function incrementing counter, starting with 0). Note that basic blocks and unnamed function parameters are included in this numbering. For example, if the entry basic block is not given a label name and all function parameters are named, then it will get number 0.
It also shows a convention that we follow in this document. When demonstrating instructions, we will follow an instruction with a comment that defines the type and name of value produced.
High Level Structure¶
Module Structure¶
LLVM programs are composed of Modules, each of which is a
translation unit of the input programs. Each module consists of
functions, global variables, and symbol table entries. Modules may be
combined together with the LLVM linker, which merges function (and
global variable) definitions, resolves forward declarations, and merges
symbol table entries. Here is an example of the “hello world” module:
; Declare the string constant as a global constant.
@.str = private unnamed_addr constant [13 x i8] c"hello world\0A\00"
; External declaration of the puts function
declare i32 @puts(i8* nocapture) nounwind
; Definition of main function
define i32 @main() { ; i32()*
; Convert [13 x i8]* to i8*...
%cast210 = getelementptr [13 x i8], [13 x i8]* @.str, i64 0, i64 0
; Call puts function to write out the string to stdout.
call i32 @puts(i8* %cast210)
ret i32 0
}
; Named metadata
!0 = !{i32 42, null, !"string"}
!foo = !{!0}
This example is made up of a global variable named “.str”, an external declaration of the “puts” function, a function definition for “main” and named metadata “foo”.
In general, a module is made up of a list of global values (where both functions and global variables are global values). Global values are represented by a pointer to a memory location (in this case, a pointer to an array of char, and a pointer to a function), and have one of the following linkage types.
Linkage Types¶
All Global Variables and Functions have one of the following types of linkage:
private
- Global values with “private” linkage are only directly accessible by objects in the current module. In particular, linking code into a module with a private global value may cause the private symbol to be renamed as necessary to avoid collisions. Because the symbol is private to the module, all references can be updated. This doesn’t show up in any symbol table in the object file.
internal
- Similar to private, but the value shows as a local symbol (STB_LOCAL in the case of ELF) in the object file. This corresponds to the notion of the ‘static’ keyword in C.
available_externally
- Globals with “available_externally” linkage are never emitted into the object file corresponding to the LLVM module. They exist to allow inlining and other optimizations to take place given knowledge of the definition of the global, which is known to be somewhere outside the module. Globals with available_externally linkage are allowed to be discarded at will, and are otherwise the same as linkonce_odr. This linkage type is only allowed on definitions, not declarations.
linkonce
- Globals with “linkonce” linkage are merged with other globals of the same name when linkage occurs. This can be used to implement some forms of inline functions, templates, or other code which must be generated in each translation unit that uses it, but where the body may be overridden with a more definitive definition later. Unreferenced linkonce globals are allowed to be discarded. Note that linkonce linkage does not actually allow the optimizer to inline the body of this function into callers because it doesn’t know if this definition of the function is the definitive definition within the program or whether it will be overridden by a stronger definition. To enable inlining and other optimizations, use “linkonce_odr” linkage.
weak
- “weak” linkage has the same merging semantics as linkonce linkage, except that unreferenced globals with weak linkage may not be discarded. This is used for globals that are declared “weak” in C source code.
common
- “common” linkage is most similar to “weak” linkage, but it is used for tentative definitions in C, such as “int X;” at global scope. Symbols with “common” linkage are merged in the same way as weak symbols, and they may not be deleted if unreferenced. common symbols may not have an explicit section, must have a zero initializer, and may not be marked ‘constant’. Functions and aliases may not have common linkage.
appending
- “appending” linkage may only be applied to global variables of pointer to array type. When two global variables with appending linkage are linked together, the two global arrays are appended together. This is the LLVM, typesafe, equivalent of having the system linker append together “sections” with identical names when .o files are linked.
extern_weak
- The semantics of this linkage follow the ELF object file model: the symbol is weak until linked; if not linked, the symbol becomes null instead of being an undefined reference.
linkonce_odr, weak_odr
- Some languages allow differing globals to be merged, such as two functions with different semantics. Other languages, such as C++, ensure that only equivalent globals are ever merged (the “one definition rule”, or “ODR”). Such languages can use the linkonce_odr and weak_odr linkage types to indicate that the global will only be merged with equivalent globals. These linkage types are otherwise the same as their non-odr versions.
external
- If none of the above identifiers are used, the global is externally visible, meaning that it participates in linkage and can be used to resolve external symbol references.
It is illegal for a function declaration to have any linkage type other than external or extern_weak.
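As an illustration, the following module sketch (with hypothetical names) combines several of the linkage types described above:
@counter = private global i32 0                      ; not visible outside this module
@table = internal constant [2 x i32] [i32 0, i32 1]  ; like C "static"
@once = linkonce_odr global i32 7                    ; merged with equivalent definitions (ODR)
@tentative = common global i32 0                     ; C tentative definition; zero initializer required
declare extern_weak void @optional_hook()            ; becomes null if not resolved at link time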
Calling Conventions¶
LLVM functions, calls and invokes can all have an optional calling convention specified for the call. The calling convention of any pair of dynamic caller/callee must match, or the behavior of the program is undefined. The following calling conventions are supported by LLVM, and more may be added in the future:
- “ccc” - The C calling convention - This calling convention (the default if no other calling convention is specified) matches the target C calling conventions. This calling convention supports varargs function calls and tolerates some mismatch in the declared prototype and implemented declaration of the function (as does normal C).
- “fastcc” - The fast calling convention - This calling convention attempts to make calls as fast as possible (e.g. by passing things in registers). This calling convention allows the target to use whatever tricks it wants to produce fast code for the target, without having to conform to an externally specified ABI (Application Binary Interface). Tail calls can only be optimized when this, the GHC or the HiPE convention is used. This calling convention does not support varargs and requires the prototype of all callees to exactly match the prototype of the function definition.
- “coldcc” - The cold calling convention - This calling convention attempts to make code in the caller as efficient as possible under the assumption that the call is not commonly executed. As such, these calls often preserve all registers so that the call does not break any live ranges in the caller side. This calling convention does not support varargs and requires the prototype of all callees to exactly match the prototype of the function definition. Furthermore the inliner doesn’t consider such function calls for inlining.
- “cc 10” - GHC convention - This calling convention has been implemented specifically for use by the Glasgow Haskell Compiler (GHC). It passes everything in registers, going to extremes to achieve this by disabling callee save registers. This calling convention should not be used lightly but only for specific situations such as an alternative to the register pinning performance technique often used when implementing functional programming languages. At the moment only X86 supports this convention and it has the following limitations:
- On X86-32 it supports up to 4 bit type parameters. No floating point types are supported.
- On X86-64 it supports up to 10 bit type parameters and 6 floating point parameters.
This calling convention supports tail call optimization but requires that both the caller and callee use it.
- “cc 11” - The HiPE calling convention - This calling convention has been implemented specifically for use by the High-Performance Erlang (HiPE) compiler, the native code compiler of the Ericsson’s Open Source Erlang/OTP system. It uses more registers for argument passing than the ordinary C calling convention and defines no callee-saved registers. The calling convention properly supports tail call optimization but requires that both the caller and the callee use it. It uses a register pinning mechanism, similar to GHC’s convention, for keeping frequently accessed runtime components pinned to specific hardware registers. At the moment only X86 supports this convention (both 32 and 64 bit).
- “webkit_jscc” - WebKit’s JavaScript calling convention - This calling convention has been implemented for WebKit FTL JIT. It passes arguments on the stack right to left (as cdecl does), and returns a value in the platform’s customary return register.
- “anyregcc” - Dynamic calling convention for code patching - This is a special convention that supports patching an arbitrary code sequence in place of a call site. This convention forces the call arguments into registers but allows them to be dynamically allocated. This can currently only be used with calls to llvm.experimental.patchpoint because only this intrinsic records the location of its arguments in a side table. See Stack maps and patch points in LLVM.
- “preserve_mostcc” - The PreserveMost calling convention - This calling convention attempts to make the code in the caller as unintrusive as possible. This convention behaves identically to the C calling convention on how arguments and return values are passed, but it uses a different set of caller/callee-saved registers. This alleviates the burden of saving and recovering a large register set before and after the call in the caller. If the arguments are passed in callee-saved registers, then they will be preserved by the callee across the call. This doesn’t apply for values returned in callee-saved registers.
- On X86-64 the callee preserves all general purpose registers, except for R11. R11 can be used as a scratch register. Floating-point registers (XMMs/YMMs) are not preserved and need to be saved by the caller.
The idea behind this convention is to support calls to runtime functions that have a hot path and a cold path. The hot path is usually a small piece of code that doesn’t use many registers. The cold path might need to call out to another function and therefore only needs to preserve the caller-saved registers, which haven’t already been saved by the caller. The PreserveMost calling convention is very similar to the cold calling convention in terms of caller/callee-saved registers, but they are used for different types of function calls. coldcc is for function calls that are rarely executed, whereas preserve_mostcc function calls are intended to be on the hot path and definitely executed a lot. Furthermore preserve_mostcc doesn’t prevent the inliner from inlining the function call.
This calling convention will be used by a future version of the ObjectiveC runtime and should therefore still be considered experimental at this time. Although this convention was created to optimize certain runtime calls to the ObjectiveC runtime, it is not limited to this runtime and might be used by other runtimes in the future too. The current implementation only supports X86-64, but the intention is to support more architectures in the future.
- “preserve_allcc” - The PreserveAll calling convention - This calling convention attempts to make the code in the caller even less intrusive than the PreserveMost calling convention. This calling convention also behaves identically to the C calling convention on how arguments and return values are passed, but it uses a different set of caller/callee-saved registers. This removes the burden of saving and recovering a large register set before and after the call in the caller. If the arguments are passed in callee-saved registers, then they will be preserved by the callee across the call. This doesn’t apply for values returned in callee-saved registers.
- On X86-64 the callee preserves all general purpose registers, except for R11. R11 can be used as a scratch register. Furthermore it also preserves all floating-point registers (XMMs/YMMs).
The idea behind this convention is to support calls to runtime functions that don’t need to call out to any other functions.
This calling convention, like the PreserveMost calling convention, will be used by a future version of the ObjectiveC runtime and should be considered experimental at this time.
- “cc <n>” - Numbered convention - Any calling convention may be specified by number, allowing target-specific calling conventions to be used. Target specific calling conventions start at 64.
More calling conventions can be added/defined on an as-needed basis, to support Pascal conventions or any other well-known target-independent convention.
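Because the convention of a call site must match its callee, a minimal sketch (with hypothetical functions) looks like this:
declare fastcc i32 @fast_helper(i32)
define i32 @wrapper(i32 %n) {
entry:
  %r = call fastcc i32 @fast_helper(i32 %n)   ; call-site cc matches the callee's cc
  ret i32 %r
}
define coldcc void @rare_error_path() {       ; rarely-executed code path
entry:
  ret void
}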
Visibility Styles¶
All Global Variables and Functions have one of the following visibility styles:
- “default” - Default style - On targets that use the ELF object file format, default visibility means that the declaration is visible to other modules and, in shared libraries, means that the declared entity may be overridden. On Darwin, default visibility means that the declaration is visible to other modules. Default visibility corresponds to “external linkage” in the language.
- “hidden” - Hidden style - Two declarations of an object with hidden visibility refer to the same object if they are in the same shared object. Usually, hidden visibility indicates that the symbol will not be placed into the dynamic symbol table, so no other module (executable or shared library) can reference it directly.
- “protected” - Protected style - On ELF, protected visibility indicates that the symbol will be placed in the dynamic symbol table, but that references within the defining module will bind to the local symbol. That is, the symbol cannot be overridden by another module.
A symbol with internal or private linkage must have default visibility.
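For illustration, the three styles are written as follows (hypothetical names):
@public_sym = global i32 0            ; "default" visibility
@detail_sym = hidden global i32 1     ; kept out of the dynamic symbol table
@pinned_sym = protected global i32 2  ; exported, but local references bind locally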
DLL Storage Classes¶
All Global Variables, Functions and Aliases can have one of the following DLL storage class:
dllimport
- “dllimport” causes the compiler to reference a function or variable via a global pointer to a pointer that is set up by the DLL exporting the symbol. On Microsoft Windows targets, the pointer name is formed by combining __imp_ and the function or variable name.
dllexport
- “dllexport” causes the compiler to provide a global pointer to a pointer in a DLL, so that it can be referenced with the dllimport attribute. On Microsoft Windows targets, the pointer name is formed by combining __imp_ and the function or variable name. Since this storage class exists for defining a DLL interface, the compiler, assembler and linker know it is externally referenced and must refrain from deleting the symbol.
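A brief sketch of both storage classes (hypothetical names):
@imported_var = external dllimport global i32   ; provided by another DLL
declare dllimport i32 @imported_func()          ; imported function declaration
@exported_var = dllexport global i32 0          ; part of this module's DLL interface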
Thread Local Storage Models¶
A variable may be defined as thread_local, which means that it will not be shared by threads (each thread will have a separate copy of the variable). Not all targets support thread-local variables. Optionally, a TLS model may be specified:
localdynamic
- For variables that are only used within the current shared library.
initialexec
- For variables in modules that will not be loaded dynamically.
localexec
- For variables defined in the executable and only used within it.
If no explicit model is given, the “general dynamic” model is used.
The models correspond to the ELF TLS models; see ELF Handling For Thread-Local Storage for more information on under which circumstances the different models may be used. The target may choose a different TLS model if the specified model is not supported, or if a better choice of model can be made.
A model can also be specified in an alias, but then it only governs how the alias is accessed. It will not have any effect on the aliasee.
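For illustration, each model is written as follows (hypothetical names):
@tls_gd = thread_local global i32 0                 ; "general dynamic" (the default)
@tls_ld = thread_local(localdynamic) global i32 0
@tls_ie = thread_local(initialexec) global i32 0
@tls_le = thread_local(localexec) global i32 0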
Structure Types¶
LLVM IR allows you to specify both “identified” and “literal” structure types. Literal types are uniqued structurally, but identified types are never uniqued. An opaque structure type can also be used to forward declare a type that is not yet available.
An example of an identified structure specification is:
%mytype = type { %mytype*, i32 }
Prior to the LLVM 3.0 release, identified types were structurally uniqued. Only literal types are uniqued in recent versions of LLVM.
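A sketch of the other two forms mentioned above (hypothetical names):
%opaque.handle = type opaque                 ; forward declaration of a type
@pair = global { i32, i32 } zeroinitializer  ; a literal, structurally uniqued type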
Global Variables¶
Global variables define regions of memory allocated at compilation time instead of run-time.
Global variable definitions must be initialized.
Global variables in other translation units can also be declared, in which case they don’t have an initializer.
Either global variable definitions or declarations may have an explicit section to be placed in and may have an optional explicit alignment specified.
A variable may be defined as a global constant, which indicates that
the contents of the variable will never be modified (enabling better
optimization, allowing the global data to be placed in the read-only
section of an executable, etc). Note that variables that need runtime
initialization cannot be marked constant
as there is a store to the
variable.
LLVM explicitly allows declarations of global variables to be marked constant, even if the final definition of the global is not. This capability can be used to enable slightly better optimization of the program, but requires the language definition to guarantee that optimizations based on the ‘constantness’ are valid for the translation units that do not include the definition.
As SSA values, global variables define pointer values that are in scope (i.e. they dominate) all basic blocks in the program. Global variables always define a pointer to their “content” type because they describe a region of memory, and all memory objects in LLVM are accessed through pointers.
Global variables can be marked with unnamed_addr, which indicates that the address is not significant, only the content. Constants marked like this can be merged with other constants if they have the same initializer. Note that a constant with a significant address can be merged with an unnamed_addr constant, the result being a constant whose address is significant.
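For instance, two unnamed_addr constants with identical initializers may be folded into one (a hypothetical sketch):
@msg.a = private unnamed_addr constant [4 x i8] c"hi!\00"
@msg.b = private unnamed_addr constant [4 x i8] c"hi!\00"   ; may be merged with @msg.a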
A global variable may be declared to reside in a target-specific numbered address space. For targets that support them, address spaces may affect how optimizations are performed and/or what target instructions are used to access the variable. The default address space is zero. The address space qualifier must precede any other attributes.
LLVM allows an explicit section to be specified for globals. If the target supports it, it will emit globals to the section specified. Additionally, the global can be placed in a comdat if the target has the necessary support.
By default, global initializers are optimized by assuming that global
variables defined within the module are not modified from their
initial values before the start of the global initializer. This is
true even for variables potentially accessible from outside the
module, including those with external linkage or appearing in
@llvm.used
or dllexported variables. This assumption may be suppressed
by marking the variable with externally_initialized
.
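For example, a global that a hypothetical runtime patches before the module's initializers run could be written as:
@patched_by_runtime = externally_initialized global i32 0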
An explicit alignment may be specified for a global, which must be a
power of 2. If not present, or if the alignment is set to zero, the
alignment of the global is set by the target to whatever it feels
convenient. If an explicit alignment is specified, the global is forced
to have exactly that alignment. Targets and optimizers are not allowed
to over-align the global if the global has an assigned section. In this case, the extra alignment could be observable: for example, code could assume that the globals are densely packed in their section and try to iterate over them as an array; alignment padding would break this iteration. The maximum alignment is 1 << 29.
Globals can also have a DLL storage class.
Variables and aliases can have a Thread Local Storage Model.
Syntax:
[@<GlobalVarName> =] [Linkage] [Visibility] [DLLStorageClass] [ThreadLocal]
[unnamed_addr] [AddrSpace] [ExternallyInitialized]
<global | constant> <Type> [<InitializerConstant>]
[, section "name"] [, comdat [($name)]]
[, align <Alignment>]
For example, the following defines a global in a numbered address space with an initializer, section, and alignment:
@G = addrspace(5) constant float 1.0, section "foo", align 4
The following example just declares a global variable:
@G = external global i32
The following example defines a thread-local global with the
initialexec
TLS model:
@G = thread_local(initialexec) global i32 0, align 4
Functions¶
LLVM function definitions consist of the “define” keyword, an
optional linkage type, an optional visibility
style, an optional DLL storage class,
an optional calling convention,
an optional unnamed_addr
attribute, a return type, an optional
parameter attribute for the return type, a function
name, a (possibly empty) argument list (each with optional parameter
attributes), optional function attributes,
an optional section, an optional alignment,
an optional comdat,
an optional garbage collector name, an optional prefix,
an optional prologue,
an optional personality,
an opening curly brace, a list of basic blocks, and a closing curly brace.
LLVM function declarations consist of the “declare” keyword, an
optional linkage type, an optional visibility
style, an optional DLL storage class,
an optional calling convention,
an optional unnamed_addr
attribute, a return type, an optional
parameter attribute for the return type, a function
name, a possibly empty list of arguments, an optional alignment, an optional
garbage collector name, an optional prefix,
and an optional prologue.
A function definition contains a list of basic blocks, forming the CFG (Control Flow Graph) for the function. Each basic block may optionally start with a label (giving the basic block a symbol table entry), contains a list of instructions, and ends with a terminator instruction (such as a branch or function return). If an explicit label is not provided, a block is assigned an implicit numbered label, using the next value from the same counter as used for unnamed temporaries (see above). For example, if a function entry block does not have an explicit label, it will be assigned label “%0”, then the first unnamed temporary in that block will be “%1”, etc.
The first basic block in a function is special in two ways: it is immediately executed on entrance to the function, and it is not allowed to have predecessor basic blocks (i.e. there can not be any branches to the entry block of a function). Because the block can have no predecessors, it also cannot have any PHI nodes.
LLVM allows an explicit section to be specified for functions. If the target supports it, it will emit functions to the section specified. Additionally, the function can be placed in a COMDAT.
An explicit alignment may be specified for a function. If not present, or if the alignment is set to zero, the alignment of the function is set by the target to whatever it feels convenient. If an explicit alignment is specified, the function is forced to have at least that much alignment. All alignments must be a power of 2.
If the unnamed_addr
attribute is given, the address is known to not
be significant and two identical functions can be merged.
Syntax:
define [linkage] [visibility] [DLLStorageClass]
[cconv] [ret attrs]
<ResultType> @<FunctionName> ([argument list])
[unnamed_addr] [fn Attrs] [section "name"] [comdat [($name)]]
[align N] [gc] [prefix Constant] [prologue Constant]
[personality Constant] { ... }
The argument list is a comma separated sequence of arguments, where each argument has the following form.
Syntax:
<type> [parameter Attrs] [name]
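Putting the pieces together, here is a sketch of a declaration and a definition using several of the optional components (the names and the particular attributes are illustrative):
declare zeroext i1 @is_ready(i8* nocapture) nounwind
define internal i32 @square(i32 %x) nounwind optsize align 16 {
entry:
  %r = mul i32 %x, %x
  ret i32 %r
}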
Aliases¶
Aliases, unlike functions or variables, don’t create any new data. They are just a new symbol and metadata for an existing position.
Aliases have a name and an aliasee that is either a global value or a constant expression.
Aliases may have an optional linkage type, an optional visibility style, an optional DLL storage class and an optional tls model.
Syntax:
@<Name> = [Linkage] [Visibility] [DLLStorageClass] [ThreadLocal] [unnamed_addr] alias <AliaseeTy> @<Aliasee>
The linkage must be one of private, internal, linkonce, weak, linkonce_odr, weak_odr, external. Note that some system linkers might not correctly handle dropping a weak symbol that is aliased.
Aliases that are not unnamed_addr
are guaranteed to have the same address as
the aliasee expression. unnamed_addr
ones are only guaranteed to point
to the same content.
Since aliases are only a second name, some restrictions apply, of which some can only be checked when producing an object file:
- The expression defining the aliasee must be computable at assembly time. Since it is just a name, no relocations can be used.
- No alias in the expression can be weak as the possibility of the intermediate alias being overridden cannot be represented in an object file.
- No global value in the expression can be a declaration, since that would require a relocation, which is not possible.
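Following the syntax above, a minimal sketch (hypothetical names):
@data = global i32 42
@data_alias = alias i32* @data        ; same address as @data
@weak_name = weak alias i32* @data    ; may be overridden at link time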
Comdats¶
Comdat IR provides access to COFF and ELF object file COMDAT functionality.
Comdats have a name which represents the COMDAT key. All global objects that specify this key will only end up in the final object file if the linker chooses that key over some other key. Aliases are placed in the same COMDAT that their aliasee computes to, if any.
Comdats have a selection kind to provide input on how the linker should choose between keys in two different object files.
Syntax:
$<Name> = comdat SelectionKind
The selection kind must be one of the following:
any
- The linker may choose any COMDAT key, the choice is arbitrary.
exactmatch
- The linker may choose any COMDAT key but the sections must contain the same data.
largest
- The linker will choose the section containing the largest COMDAT key.
noduplicates
- The linker requires that only one section with this COMDAT key exists.
samesize
- The linker may choose any COMDAT key but the sections must contain the same amount of data.
Note that the Mach-O platform doesn’t support COMDATs and ELF only supports any as a selection kind.
Here is an example of a COMDAT group where a function will only be selected if the COMDAT key’s section is the largest:
$foo = comdat largest
@foo = global i32 2, comdat($foo)
define void @bar() comdat($foo) {
ret void
}
As syntactic sugar, the $name can be omitted if the name is the same as the global name:
$foo = comdat any
@foo = global i32 2, comdat
In a COFF object file, this will create a COMDAT section with selection kind
IMAGE_COMDAT_SELECT_LARGEST
containing the contents of the @foo
symbol
and another COMDAT section with selection kind
IMAGE_COMDAT_SELECT_ASSOCIATIVE
which is associated with the first COMDAT
section and contains the contents of the @bar
symbol.
There are some restrictions on the properties of the global object. It, or an alias to it, must have the same name as the COMDAT group when targeting COFF. The contents and size of this object may be used during link-time to determine which COMDAT groups get selected depending on the selection kind. Because the name of the object must match the name of the COMDAT group, the linkage of the global object must not be local; local symbols can get renamed if a collision occurs in the symbol table.
The combined use of COMDATs and section attributes may yield surprising results. For example:
$foo = comdat any
$bar = comdat any
@g1 = global i32 42, section "sec", comdat($foo)
@g2 = global i32 42, section "sec", comdat($bar)
From the object file perspective, this requires the creation of two sections with the same name. This is necessary because both globals belong to different COMDAT groups and COMDATs, at the object file level, are represented by sections.
Note that certain IR constructs like global variables and functions may create COMDATs in the object file in addition to any which are specified using COMDAT IR. This arises when the code generator is configured to emit globals in individual sections (e.g. when -data-sections or -function-sections is supplied to llc).
Named Metadata¶
Named metadata is a collection of metadata. Metadata nodes (but not metadata strings) are the only valid operands for a named metadata.
- Named metadata are represented as a string of characters with the metadata prefix. The rules for metadata names are the same as for identifiers, but quoted names are not allowed. "\xx" type escapes are still valid, which allows any character to be part of a name.
Syntax:
; Some unnamed metadata nodes, which are referenced by the named metadata.
!0 = !{!"zero"}
!1 = !{!"one"}
!2 = !{!"two"}
; A named metadata.
!name = !{!0, !1, !2}
Parameter Attributes¶
The return type and each parameter of a function type may have a set of parameter attributes associated with them. Parameter attributes are used to communicate additional information about the result or parameters of a function. Parameter attributes are considered to be part of the function, not of the function type, so functions with different parameter attributes can have the same function type.
Parameter attributes are simple keywords that follow the type specified. If multiple parameter attributes are needed, they are space separated. For example:
declare i32 @printf(i8* noalias nocapture, ...)
declare i32 @atoi(i8 zeroext)
declare signext i8 @returns_signed_char()
Note that any attributes for the function result (nounwind, readonly) come immediately after the argument list.
Currently, only the following parameter attributes are defined:
zeroext
- This indicates to the code generator that the parameter or return value should be zero-extended to the extent required by the target’s ABI (which is usually 32-bits, but is 8-bits for a i1 on x86-64) by the caller (for a parameter) or the callee (for a return value).
signext
- This indicates to the code generator that the parameter or return value should be sign-extended to the extent required by the target’s ABI (which is usually 32-bits) by the caller (for a parameter) or the callee (for a return value).
inreg
- This indicates that this parameter or return value should be treated in a special target-dependent fashion while emitting code for a function call or return (usually, by putting it in a register as opposed to memory, though some targets use it to distinguish between two different kinds of registers). Use of this attribute is target-specific.
byval
This indicates that the pointer parameter should really be passed by value to the function. The attribute implies that a hidden copy of the pointee is made between the caller and the callee, so the callee is unable to modify the value in the caller. This attribute is only valid on LLVM pointer arguments. It is generally used to pass structs and arrays by value, but is also valid on pointers to scalars. The copy is considered to belong to the caller not the callee (for example, readonly functions should not write to byval parameters). This is not a valid attribute for return values.
The byval attribute also supports specifying an alignment with the align attribute. It indicates the alignment of the stack slot to form and the known alignment of the pointer specified to the call site. If the alignment is not specified, then the code generator makes a target-specific assumption.
inalloca
The inalloca argument attribute allows the caller to take the address of outgoing stack arguments. An inalloca argument must be a pointer to stack memory produced by an alloca instruction. The alloca, or argument allocation, must also be tagged with the inalloca keyword. Only the last argument may have the inalloca attribute, and that argument is guaranteed to be passed in memory.
An argument allocation may be used by a call at most once because the call may deallocate it. The inalloca attribute cannot be used in conjunction with other attributes that affect argument storage, like inreg, nest, sret, or byval. The inalloca attribute also disables LLVM’s implicit lowering of large aggregate return values, which means that frontend authors must lower them with sret pointers.
When the call site is reached, the argument allocation must have been the most recent stack allocation that is still live, or the results are undefined. It is possible to allocate additional stack space after an argument allocation and before its call site, but it must be cleared off with llvm.stackrestore.
See Design and Usage of the InAlloca Attribute for more information on how to use this attribute.
sret
- This indicates that the pointer parameter specifies the address of a structure that is the return value of the function in the source program. This pointer must be guaranteed by the caller to be valid: loads and stores to the structure may be assumed by the callee not to trap and to be properly aligned. This may only be applied to the first parameter. This is not a valid attribute for return values.
align <n>
This indicates that the pointer value may be assumed by the optimizer to have the specified alignment.
Note that this attribute has additional semantics when combined with the byval attribute.
noalias
This indicates that objects accessed via pointer values based on the argument or return value are not also accessed, during the execution of the function, via pointer values not based on the argument or return value. The attribute on a return value also has additional semantics described below. The caller shares the responsibility with the callee for ensuring that these requirements are met. For further details, please see the discussion of the NoAlias response in alias analysis.
Note that this definition of noalias is intentionally similar to the definition of restrict in C99 for function arguments.
For function return values, C99’s restrict is not meaningful, while LLVM’s noalias is. Furthermore, the semantics of the noalias attribute on return values are stronger than the semantics of the attribute when used on function arguments. On function return values, the noalias attribute indicates that the function acts like a system memory allocation function, returning a pointer to allocated storage disjoint from the storage for any other object accessible to the caller.
nocapture
- This indicates that the callee does not make any copies of the pointer that outlive the callee itself. This is not a valid attribute for return values.
nest
- This indicates that the pointer parameter can be excised using the trampoline intrinsics. This is not a valid attribute for return values and can only be applied to one parameter.
returned
- This indicates that the function always returns the argument as its return value. This is an optimization hint to the code generator when generating the caller, allowing tail call optimization and omission of register saves and restores in some cases; it is not checked or enforced when generating the callee. The parameter and the function return type must be valid operands for the bitcast instruction. This is not a valid attribute for return values and can only be applied to one parameter.
nonnull
- This indicates that the parameter or return pointer is not null. This attribute may only be applied to pointer typed parameters. This is not checked or enforced by LLVM, the caller must ensure that the pointer passed in is non-null, or the callee must ensure that the returned pointer is non-null.
dereferenceable(<n>)
- This indicates that the parameter or return pointer is dereferenceable. This attribute may only be applied to pointer typed parameters. A pointer that is dereferenceable can be loaded from speculatively without a risk of trapping. The number of bytes known to be dereferenceable must be provided in parentheses. It is legal for the number of bytes to be less than the size of the pointee type. The nonnull attribute does not imply dereferenceability (consider a pointer to one element past the end of an array), however dereferenceable(<n>) does imply nonnull in addrspace(0) (which is the default address space).
dereferenceable_or_null(<n>)
- This indicates that the parameter or return value isn’t both non-null and non-dereferenceable (up to <n> bytes) at the same time. All non-null pointers tagged with dereferenceable_or_null(<n>) are dereferenceable(<n>). For address space 0, dereferenceable_or_null(<n>) implies that a pointer is exactly one of dereferenceable(<n>) or null, and in other address spaces dereferenceable_or_null(<n>) implies that a pointer is at least one of dereferenceable(<n>) or null (i.e. it may be both null and dereferenceable(<n>)). This attribute may only be applied to pointer typed parameters.
Garbage Collector Strategy Names¶
Each function may specify a garbage collector strategy name, which is simply a string:
define void @f() gc "name" { ... }
The supported values of name include those built in to LLVM and any provided by loaded plugins. Specifying a GC strategy will cause the compiler to alter its output in order to support the named garbage collection algorithm. Note that LLVM itself does not contain a garbage collector; this functionality is restricted to generating machine code which can interoperate with a collector provided externally.
Prefix Data¶
Prefix data is data associated with a function which the code generator will emit immediately before the function’s entrypoint. The purpose of this feature is to allow frontends to associate language-specific runtime metadata with specific functions and make it available through the function pointer while still allowing the function pointer to be called.
To access the data for a given function, a program may bitcast the
function pointer to a pointer to the constant’s type and dereference
index -1. This implies that the IR symbol points just past the end of
the prefix data. For instance, take the example of a function annotated
with a single i32
,
define void @f() prefix i32 123 { ... }
The prefix data can be referenced as,
%0 = bitcast void ()* @f to i32*
%a = getelementptr inbounds i32, i32* %0, i32 -1
%b = load i32, i32* %a
Prefix data is laid out as if it were an initializer for a global variable of the prefix data’s type. The function will be placed such that the beginning of the prefix data is aligned. This means that if the size of the prefix data is not a multiple of the alignment size, the function’s entrypoint will not be aligned. If alignment of the function’s entrypoint is desired, padding must be added to the prefix data.
A function may have prefix data but no body. This has similar semantics
to the available_externally
linkage in that the data may be used by the
optimizers but will not be emitted in the object file.
Prologue Data¶
The prologue
attribute allows arbitrary code (encoded as bytes) to
be inserted prior to the function body. This can be used for enabling
function hot-patching and instrumentation.
To maintain the semantics of ordinary function calls, the prologue data must have a particular format. Specifically, it must begin with a sequence of bytes which decode to a sequence of machine instructions, valid for the module’s target, which transfer control to the point immediately succeeding the prologue data, without performing any other visible action. This allows the inliner and other passes to reason about the semantics of the function definition without needing to reason about the prologue data. Obviously this makes the format of the prologue data highly target dependent.
A trivial example of valid prologue data for the x86 architecture is i8 144
,
which encodes the nop
instruction:
define void @f() prologue i8 144 { ... }
Generally prologue data can be formed by encoding a relative branch instruction
which skips the metadata, as in this example of valid prologue data for the
x86_64 architecture, where the first two bytes encode jmp .+10
:
%0 = type <{ i8, i8, i8* }>
define void @f() prologue %0 <{ i8 235, i8 8, i8* @md}> { ... }
A function may have prologue data but no body. This has similar semantics
to the available_externally
linkage in that the data may be used by the
optimizers but will not be emitted in the object file.
Personality Function¶
The personality
attribute permits functions to specify what function
to use for exception handling.
Attribute Groups¶
Attribute groups are groups of attributes that are referenced by objects within
the IR. They are important for keeping .ll
files readable, because a lot of
functions will use the same set of attributes. In the degenerate case of a
.ll
file that corresponds to a single .c
file, the single attribute
group will capture the important command line flags used to build that file.
An attribute group is a module-level object. To use an attribute group, an
object references the attribute group’s ID (e.g. #37
). An object may refer
to more than one attribute group. In that situation, the attributes from the
different groups are merged.
Here is an example of attribute groups for a function that should always be inlined, has a stack alignment of 4, and which shouldn’t use SSE instructions:
; Target-independent attributes:
attributes #0 = { alwaysinline alignstack=4 }
; Target-dependent attributes:
attributes #1 = { "no-sse" }
; Function @f has attributes: alwaysinline, alignstack=4, and "no-sse".
define void @f() #0 #1 { ... }
Function Attributes¶
Function attributes are set to communicate additional information about a function. Function attributes are considered to be part of the function, not of the function type, so functions with different function attributes can have the same function type.
Function attributes are simple keywords that follow the type specified. If multiple attributes are needed, they are space separated. For example:
define void @f() noinline { ... }
define void @f() alwaysinline { ... }
define void @f() alwaysinline optsize { ... }
define void @f() optsize { ... }
alignstack(<n>)
- This attribute indicates that, when emitting the prologue and epilogue, the backend should forcibly align the stack pointer. Specify the desired alignment, which must be a power of two, in parentheses.
alwaysinline
- This attribute indicates that the inliner should attempt to inline this function into callers whenever possible, ignoring any active inlining size threshold for this caller.
builtin
- This indicates that the callee function at a call site should be
recognized as a built-in function, even though the function’s declaration
uses the
nobuiltin
attribute. This is only valid at call sites for direct calls to functions that are declared with thenobuiltin
attribute. cold
- This attribute indicates that this function is rarely called. When computing edge weights, basic blocks post-dominated by a cold function call are also considered to be cold; and, thus, given low weight.
convergent
- This attribute indicates that the callee is dependent on a convergent thread execution pattern under certain parallel execution models. Transformations that are execution model agnostic may only move or transform this call if the final location is control equivalent to its original position in the program, where control equivalence is defined as A dominates B and B post-dominates A, or vice versa.
inlinehint
- This attribute indicates that the source code contained a hint that inlining this function is desirable (such as the “inline” keyword in C/C++). It is just a hint; it imposes no requirements on the inliner.
jumptable
- This attribute indicates that the function should be added to a
jump-instruction table at code-generation time, and that all address-taken
references to this function should be replaced with a reference to the
appropriate jump-instruction-table function pointer. Note that this creates
a new pointer for the original function, which means that code that depends
on function-pointer identity can break. So, any function annotated with jumptable must also be unnamed_addr.
minsize
- This attribute suggests that optimization passes and code generator passes make choices that keep the code size of this function as small as possible and perform optimizations that may sacrifice runtime performance in order to minimize the size of the generated code.
naked
- This attribute disables prologue / epilogue emission for the function. This can have very system-specific consequences.
nobuiltin
- This indicates that the callee function at a call site is not recognized as
a built-in function. LLVM will retain the original call and not replace it
with equivalent code based on the semantics of the built-in function, unless
the call site uses the builtin attribute. This is valid at call sites and on function declarations and definitions.
noduplicate
This attribute indicates that calls to the function cannot be duplicated. A call to a noduplicate function may be moved within its parent function, but may not be duplicated within its parent function.
A function containing a noduplicate call may still be an inlining candidate, provided that the call is not duplicated by inlining. That implies that the function has internal linkage and only has one call site, so the original call is dead after inlining.
noimplicitfloat
- This attribute disables implicit floating point instructions.
noinline
- This attribute indicates that the inliner should never inline this function in any situation. This attribute may not be used together with the alwaysinline attribute.
nonlazybind
- This attribute suppresses lazy symbol binding for the function. This may make calls to the function faster, at the cost of extra program startup time if the function is not called during program startup.
noredzone
- This attribute indicates that the code generator should not use a red zone, even if the target-specific ABI normally permits it.
noreturn
- This function attribute indicates that the function never returns normally. This produces undefined behavior at runtime if the function ever does dynamically return.
nounwind
- This function attribute indicates that the function never raises an exception. If the function does raise an exception, its runtime behavior is undefined. However, functions marked nounwind may still trap or generate asynchronous exceptions. Exception handling schemes that are recognized by LLVM to handle asynchronous exceptions, such as SEH, will still provide their implementation defined semantics.
optnone
This function attribute indicates that the function is not optimized by any optimization or code generator passes with the exception of interprocedural optimization passes. This attribute cannot be used together with the alwaysinline attribute; this attribute is also incompatible with the minsize attribute and the optsize attribute.
This attribute requires the noinline attribute to be specified on the function as well, so the function is never inlined into any caller. Only functions with the alwaysinline attribute are valid candidates for inlining into the body of this function.
optsize
- This attribute suggests that optimization passes and code generator passes make choices that keep the code size of this function low, and otherwise do optimizations specifically to reduce code size as long as they do not significantly impact runtime performance.
readnone
On a function, this attribute indicates that the function computes its result (or decides to unwind an exception) based strictly on its arguments, without dereferencing any pointer arguments or otherwise accessing any mutable state (e.g. memory, control registers, etc) visible to caller functions. It does not write through any pointer arguments (including
byval
arguments) and never changes any state visible to callers. This means that it cannot unwind exceptions by calling the C++
exception throwing methods. On an argument, this attribute indicates that the function does not dereference that pointer argument, even though it may read or write the memory that the pointer points to if accessed through other pointers.
readonly
On a function, this attribute indicates that the function does not write through any pointer arguments (including
byval
arguments) or otherwise modify any state (e.g. memory, control registers, etc) visible to caller functions. It may dereference pointer arguments and read state that may be set in the caller. A readonly function always returns the same value (or unwinds an exception identically) when called with the same set of arguments and global state. It cannot unwind an exception by calling the C++
exception throwing methods. On an argument, this attribute indicates that the function does not write through this pointer argument, even though it may write to the memory that the pointer points to.
argmemonly
- This attribute indicates that the only memory accesses inside the function are
loads and stores from objects pointed to by its pointer-typed arguments,
with arbitrary offsets. In other words, all memory operations in the
function may refer to memory only through pointers based on its function
arguments.
Note that
argmemonly
can be used together with the readonly
attribute in order to specify that the function reads only from its arguments.
returns_twice
- This attribute indicates that this function can return twice. The C
setjmp
is an example of such a function. The compiler disables some optimizations (like tail calls) in the caller of these functions.
safestack
This attribute indicates that SafeStack protection is enabled for this function.
If a function that has a
safestack
attribute is inlined into a function that doesn’t have a safestack
attribute or which has an ssp, sspstrong, or sspreq
attribute, then the resulting function will have a safestack
attribute.
sanitize_address
- This attribute indicates that AddressSanitizer checks (dynamic address safety analysis) are enabled for this function.
sanitize_memory
- This attribute indicates that MemorySanitizer checks (dynamic detection of accesses to uninitialized memory) are enabled for this function.
sanitize_thread
- This attribute indicates that ThreadSanitizer checks (dynamic thread safety analysis) are enabled for this function.
ssp
This attribute indicates that the function should emit a stack smashing protector. It is in the form of a “canary” — a random value placed on the stack before the local variables that’s checked upon return from the function to see if it has been overwritten. A heuristic is used to determine if a function needs stack protectors or not. The heuristic used will enable protectors for functions with:
- Character arrays larger than ssp-buffer-size (default 8).
- Aggregates containing character arrays larger than ssp-buffer-size.
- Calls to alloca() with variable sizes or constant sizes greater than ssp-buffer-size.
Variables that are identified as requiring a protector will be arranged on the stack such that they are adjacent to the stack protector guard.
If a function that has an
ssp
attribute is inlined into a function that doesn’t have an ssp
attribute, then the resulting function will have an ssp
attribute.
sspreq
This attribute indicates that the function should always emit a stack smashing protector. This overrides the
ssp
function attribute. Variables that are identified as requiring a protector will be arranged on the stack such that they are adjacent to the stack protector guard. The specific layout rules are:
- Large arrays and structures containing large arrays (>= ssp-buffer-size) are closest to the stack protector.
- Small arrays and structures containing small arrays (< ssp-buffer-size) are 2nd closest to the protector.
- Variables that have had their address taken are 3rd closest to the protector.
If a function that has an
sspreq
attribute is inlined into a function that doesn’t have an sspreq
attribute or which has an ssp
or sspstrong
attribute, then the resulting function will have an sspreq
attribute.
sspstrong
This attribute indicates that the function should emit a stack smashing protector. This attribute causes a strong heuristic to be used when determining if a function needs stack protectors. The strong heuristic will enable protectors for functions with:
- Arrays of any size and type
- Aggregates containing an array of any size and type.
- Calls to alloca().
- Local variables that have had their address taken.
Variables that are identified as requiring a protector will be arranged on the stack such that they are adjacent to the stack protector guard. The specific layout rules are:
- Large arrays and structures containing large arrays (>= ssp-buffer-size) are closest to the stack protector.
- Small arrays and structures containing small arrays (< ssp-buffer-size) are 2nd closest to the protector.
- Variables that have had their address taken are 3rd closest to the protector.
This overrides the
ssp
function attribute. If a function that has an
sspstrong
attribute is inlined into a function that doesn’t have an sspstrong
attribute, then the resulting function will have an sspstrong
attribute.
"thunk"
- This attribute indicates that the function will delegate to some other function with a tail call. The prototype of a thunk should not be used for optimization purposes. The caller is expected to cast the thunk prototype to match the thunk target prototype.
uwtable
- This attribute indicates that the ABI being targeted requires that an unwind table entry be produced for this function even if we can show that no exceptions pass by it. This is normally the case for the ELF x86-64 ABI, but it can be disabled for some compilation units.
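As an illustrative sketch (the function names @f and @g here are invented), function attributes are written after the argument list of a function declaration or definition:
define void @f() noinline optnone {
  ret void   ; @f is never inlined and never optimized
}

declare i32 @g(i32) nounwind readnone
Note that optnone is paired with noinline above, as its description requires.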
Module-Level Inline Assembly¶
Modules may contain “module-level inline asm” blocks, which correspond
to the GCC “file scope inline asm” blocks. These blocks are internally
concatenated by LLVM and treated as a single unit, but may be separated
in the .ll
file if desired. The syntax is very simple:
module asm "inline asm code goes here"
module asm "more can go here"
The strings can contain any character by escaping non-printable characters. The escape sequence used is simply “\xx” where “xx” is the two digit hex code for the number.
Note that the assembly string must be parseable by LLVM’s integrated assembler
(unless it is disabled), even when emitting a .s
file.
Data Layout¶
A module may specify a target specific data layout string that specifies how data is to be laid out in memory. The syntax for the data layout is simply:
target datalayout = "layout specification"
The layout specification consists of a list of specifications separated by the minus sign character (‘-‘). Each specification starts with a letter and may include other information after the letter to define some aspect of the data layout. The specifications accepted are as follows:
E
- Specifies that the target lays out data in big-endian form. That is, the bits with the most significance have the lowest address location.
e
- Specifies that the target lays out data in little-endian form. That is, the bits with the least significance have the lowest address location.
S<size>
- Specifies the natural alignment of the stack in bits. Alignment promotion of stack variables is limited to the natural stack alignment to avoid dynamic stack realignment. The stack alignment must be a multiple of 8-bits. If omitted, the natural stack alignment defaults to “unspecified”, which does not prevent any alignment promotions.
p[n]:<size>:<abi>:<pref>
- This specifies the size of a pointer and its <abi> and <pref>erred alignments for address space n. All sizes are in bits. The address space, n, is optional, and if not specified, denotes the default address space 0. The value of n must be in the range [1,2^23).
i<size>:<abi>:<pref>
- This specifies the alignment for an integer type of a given bit <size>. The value of <size> must be in the range [1,2^23).
v<size>:<abi>:<pref>
- This specifies the alignment for a vector type of a given bit <size>.
f<size>:<abi>:<pref>
- This specifies the alignment for a floating point type of a given bit <size>. Only values of <size> that are supported by the target will work. 32 (float) and 64 (double) are supported on all targets; 80 or 128 (different flavors of long double) are also supported on some targets.
a:<abi>:<pref>
- This specifies the alignment for an object of aggregate type.
m:<mangling>
If present, specifies that LLVM names are mangled in the output. The options are:
- e: ELF mangling: Private symbols get a .L prefix.
- m: Mips mangling: Private symbols get a $ prefix.
- o: Mach-O mangling: Private symbols get an L prefix. Other symbols get a _ prefix.
- w: Windows COFF prefix: Similar to Mach-O, but stdcall and fastcall functions also get a suffix based on the frame size.
n<size1>:<size2>:<size3>...
- This specifies a set of native integer widths for the target CPU in
bits. For example, it might contain
n32
for 32-bit PowerPC, n32:64
for PowerPC 64, or n8:16:32:64
for X86-64. Elements of this set are considered to support most general arithmetic operations efficiently.
On every specification that takes a <abi>:<pref>
, specifying the
<pref>
alignment is optional. If omitted, the preceding :
should be omitted too and <pref>
will be equal to <abi>
.
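For example, a plausible layout string for a 64-bit little-endian ELF target (shown purely as an illustration, not as the normative string for any particular target) is:
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
Read with the specifications above, this requests little-endian layout (e), ELF mangling (m:e), 64-bit ABI alignment for i64, 128-bit alignment for the 80-bit x86 long double (f80:128), native integer widths of 8, 16, 32, and 64 bits, and a 128-bit natural stack alignment (S128).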
When constructing the data layout for a given target, LLVM starts with a
default set of specifications which are then (possibly) overridden by
the specifications in the datalayout
keyword. The default
specifications are given in this list:
E
- big endian
p:64:64:64
- 64-bit pointers with 64-bit alignment.
p[n]:64:64:64
- Other address spaces are assumed to be the same as the default address space.
S0
- natural stack alignment is unspecified
i1:8:8
- i1 is 8-bit (byte) aligned
i8:8:8
- i8 is 8-bit (byte) aligned
i16:16:16
- i16 is 16-bit aligned
i32:32:32
- i32 is 32-bit aligned
i64:32:64
- i64 has ABI alignment of 32-bits but preferred alignment of 64-bits
f16:16:16
- half is 16-bit aligned
f32:32:32
- float is 32-bit aligned
f64:64:64
- double is 64-bit aligned
f128:128:128
- quad is 128-bit aligned
v64:64:64
- 64-bit vector is 64-bit aligned
v128:128:128
- 128-bit vector is 128-bit aligned
a:0:64
- aggregates are 64-bit aligned
When LLVM is determining the alignment for a given type, it uses the following rules:
- If the type sought is an exact match for one of the specifications, that specification is used.
- If no match is found, and the type sought is an integer type, then the smallest integer type that is larger than the bitwidth of the sought type is used. If none of the specifications are larger than the bitwidth then the largest integer type is used. For example, given the default specifications above, the i7 type will use the alignment of i8 (next largest) while both i65 and i256 will use the alignment of i64 (largest specified).
- If no match is found, and the type sought is a vector type, then the largest vector type that is smaller than the sought vector type will be used as a fall back. This happens because <128 x double> can be implemented in terms of 64 <2 x double>, for example.
The function of the data layout string may not be what you expect. Notably, this is not a specification from the frontend of what alignment the code generator should use.
Instead, if specified, the target data layout is required to match what the ultimate code generator expects. This string is used by the mid-level optimizers to improve code, and this only works if it matches what the ultimate code generator uses. There is no way to generate IR that does not embed this target-specific detail into the IR. If you don’t specify the string, the default specifications will be used to generate a Data Layout and the optimization phases will operate accordingly and introduce target specificity into the IR with respect to these default specifications.
Target Triple¶
A module may specify a target triple string that describes the target host. The syntax for the target triple is simply:
target triple = "x86_64-apple-macosx10.7.0"
The target triple string consists of a series of identifiers delimited by the minus sign character (‘-‘). The canonical forms are:
ARCHITECTURE-VENDOR-OPERATING_SYSTEM
ARCHITECTURE-VENDOR-OPERATING_SYSTEM-ENVIRONMENT
This information is passed along to the backend so that it generates
code for the proper architecture. It’s possible to override this on the
command line with the -mtriple
command line option.
Pointer Aliasing Rules¶
Any memory access must be done through a pointer value associated with an address range of the memory access, otherwise the behavior is undefined. Pointer values are associated with address ranges according to the following rules:
- A pointer value is associated with the addresses associated with any value it is based on.
- An address of a global variable is associated with the address range of the variable’s storage.
- The result value of an allocation instruction is associated with the address range of the allocated storage.
- A null pointer in the default address-space is associated with no address.
- An integer constant other than zero or a pointer value returned from a function not defined within LLVM may be associated with address ranges allocated through mechanisms other than those provided by LLVM. Such ranges shall not overlap with any ranges of addresses allocated by mechanisms provided by LLVM.
A pointer value is based on another pointer value according to the following rules:
- A pointer value formed from a getelementptr operation is based on the first value operand of the getelementptr.
- The result value of a bitcast is based on the operand of the bitcast.
- A pointer value formed by an inttoptr is based on all pointer values that contribute (directly or indirectly) to the computation of the pointer’s value.
- The “based on” relationship is transitive.
Note that this definition of “based” is intentionally similar to the definition of “based” in C99, though it is slightly weaker.
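As a small sketch (the function and value names are invented), the getelementptr and bitcast rules compose with transitivity:
define i32 @example(i32* %p) {
  %q = getelementptr i32, i32* %p, i64 2 ; %q is based on %p
  %r = bitcast i32* %q to i32*           ; %r is based on %q, hence on %p
  %v = load i32, i32* %r                 ; may access memory associated with %p
  ret i32 %v
}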
LLVM IR does not associate types with memory. The result type of a
load
merely indicates the size and alignment of the memory from
which to load, as well as the interpretation of the value. The first
operand type of a store
similarly only indicates the size and
alignment of the store.
Consequently, type-based alias analysis, aka TBAA, aka
-fstrict-aliasing
, is not applicable to general unadorned LLVM IR.
Metadata may be used to encode additional information
which specialized optimization passes may use to implement type-based
alias analysis.
Volatile Memory Accesses¶
Certain memory accesses, such as load‘s,
store‘s, and llvm.memcpy‘s may be
marked volatile
. The optimizers must not change the number of
volatile operations or change their order of execution relative to other
volatile operations. The optimizers may change the order of volatile
operations relative to non-volatile operations. This is not Java’s
“volatile” and has no cross-thread synchronization behavior.
IR-level volatile loads and stores cannot safely be optimized into llvm.memcpy or llvm.memmove intrinsics even when those intrinsics are flagged volatile. Likewise, the backend should never split or merge target-legal volatile load/store instructions.
Rationale
Platforms may rely on volatile loads and stores of natively supported data widths to be executed as a single instruction. For example, in C this holds for an l-value of volatile primitive type with native hardware support, but not necessarily for aggregate types. The frontend upholds these expectations, which are intentionally unspecified in the IR. The rules above ensure that IR transformations do not violate the frontend’s contract with the language.
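As a sketch (the global and function names are invented), both volatile loads below must be preserved in this order, even though a non-volatile pair could be folded into a single load:
@g = global i32 0

define i32 @read_twice() {
  %a = load volatile i32, i32* @g
  %b = load volatile i32, i32* @g
  %sum = add i32 %a, %b
  ret i32 %sum
}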
Memory Model for Concurrent Operations¶
The LLVM IR does not define any way to start parallel threads of execution or to register signal handlers. Nonetheless, there are platform-specific ways to create them, and we define LLVM IR’s behavior in their presence. This model is inspired by the C++0x memory model.
For a more informal introduction to this model, see the LLVM Atomic Instructions and Concurrency Guide.
We define a happens-before partial order as the least partial order that
- Is a superset of single-thread program order, and
- When a synchronizes-with b, includes an edge from a to b. Synchronizes-with pairs are introduced by platform-specific techniques, like pthread locks, thread creation, thread joining, etc., and by atomic instructions. (See also Atomic Memory Ordering Constraints).
Note that program order does not introduce happens-before edges between a thread and signals executing inside that thread.
Every (defined) read operation (load instructions, memcpy, atomic loads/read-modify-writes, etc.) R reads a series of bytes written by (defined) write operations (store instructions, atomic stores/read-modify-writes, memcpy, etc.). For the purposes of this section, initialized globals are considered to have a write of the initializer which is atomic and happens before any other read or write of the memory in question. For each byte of a read R, Rbyte may see any write to the same byte, except:
- If write1 happens before write2, and write2 happens before Rbyte, then Rbyte does not see write1.
- If Rbyte happens before write3, then Rbyte does not see write3.
Given that definition, Rbyte is defined as follows:
- If R is volatile, the result is target-dependent. (Volatile is supposed to give guarantees which can support sig_atomic_t in C/C++, and may be used for accesses to addresses that do not behave like normal memory. It does not generally provide cross-thread synchronization.)
- Otherwise, if there is no write to the same byte that happens before Rbyte, Rbyte returns undef for that byte.
- Otherwise, if Rbyte may see exactly one write, Rbyte returns the value written by that write.
- Otherwise, if R is atomic, and all the writes Rbyte may see are atomic, it chooses one of the values written. See the Atomic Memory Ordering Constraints section for additional constraints on how the choice is made.
- Otherwise Rbyte returns
undef
.
R returns the value composed of the series of bytes it read. This
implies that some bytes within the value may be undef
without
the entire value being undef
. Note that this only defines the
semantics of the operation; it doesn’t mean that targets will emit more
than one instruction to read the series of bytes.
Note that in cases where none of the atomic intrinsics are used, this model places only one restriction on IR transformations on top of what is required for single-threaded execution: introducing a store to a byte which might not otherwise be stored is not allowed in general. (Specifically, in the case where another thread might write to and read from an address, introducing a store can change a load that may see exactly one write into a load that may see multiple writes.)
Atomic Memory Ordering Constraints¶
Atomic instructions (cmpxchg, atomicrmw, fence, atomic load, and atomic store) take ordering parameters that determine which other atomic instructions on the same address they synchronize with. These semantics are borrowed from Java and C++0x, but are somewhat more colloquial. If these descriptions aren’t precise enough, check those specs (see spec references in the atomics guide). fence instructions treat these orderings somewhat differently since they don’t take an address. See that instruction’s documentation for details.
For a simpler introduction to the ordering constraints, see the LLVM Atomic Instructions and Concurrency Guide.
unordered
- The set of values that can be read is governed by the happens-before partial order. A value cannot be read unless some operation wrote it. This is intended to provide a guarantee strong enough to model Java’s non-volatile shared variables. This ordering cannot be specified for read-modify-write operations; it is not strong enough to make them atomic in any interesting way.
monotonic
- In addition to the guarantees of unordered, there is a single total order for modifications by monotonic operations on each address. All modification orders must be compatible with the happens-before order. There is no guarantee that the modification orders can be combined to a global total order for the whole program (and this often will not be possible). The read in an atomic read-modify-write operation (cmpxchg and atomicrmw) reads the value in the modification order immediately before the value it writes. If one atomic read happens before another atomic read of the same address, the later read must see the same value or a later value in the address’s modification order. This disallows reordering of monotonic (or stronger) operations on the same address. If an address is written monotonic-ally by one thread, and other threads monotonic-ally read that address repeatedly, the other threads must eventually see the write. This corresponds to the C++0x/C1x memory_order_relaxed.
acquire
- In addition to the guarantees of monotonic, a synchronizes-with edge may be formed with a release operation. This is intended to model C++’s memory_order_acquire.
release
- In addition to the guarantees of monotonic, if this operation writes a value which is subsequently read by an acquire operation, it synchronizes-with that operation. (This isn’t a complete description; see the C++0x definition of a release sequence.) This corresponds to the C++0x/C1x memory_order_release.
acq_rel
(acquire+release)
- Acts as both an acquire and release operation on its address. This corresponds to the C++0x/C1x memory_order_acq_rel.
seq_cst
(sequentially consistent)
- In addition to the guarantees of acq_rel (acquire for an operation that only reads, release for an operation that only writes), there is a global total order on all sequentially-consistent operations on all addresses, which is consistent with the happens-before partial order and with the modification orders of all the affected addresses. Each sequentially-consistent read sees the last preceding write to the same address in this global order. This corresponds to the C++0x/C1x memory_order_seq_cst and Java volatile.
If an atomic operation is marked singlethread
, it only synchronizes
with or participates in modification and seq_cst total orderings with
other operations running in the same thread (for example, in signal
handlers).
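As an illustrative release/acquire sketch (the names are invented): the release store publishes the preceding data store, and an acquire load that observes the flag value 1 synchronizes-with it, so the subsequent data load must see 42:
define void @publish(i32* %data, i32* %flag) {
  store i32 42, i32* %data
  store atomic i32 1, i32* %flag release, align 4
  ret void
}

define i32 @consume(i32* %data, i32* %flag) {
  %f = load atomic i32, i32* %flag acquire, align 4
  ; If %f is 1, the acquire load synchronizes-with the release store,
  ; so the load below must observe 42.
  %d = load i32, i32* %data
  ret i32 %d
}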
Fast-Math Flags¶
LLVM IR floating-point binary ops (fadd, fsub, fmul, fdiv, frem, fcmp) have the following flags that can be set to enable otherwise unsafe floating point operations:
nnan
- No NaNs - Allow optimizations to assume the arguments and result are not NaN. Such optimizations are required to retain defined behavior over NaNs, but the value of the result is undefined.
ninf
- No Infs - Allow optimizations to assume the arguments and result are not +/-Inf. Such optimizations are required to retain defined behavior over +/-Inf, but the value of the result is undefined.
nsz
- No Signed Zeros - Allow optimizations to treat the sign of a zero argument or result as insignificant.
arcp
- Allow Reciprocal - Allow optimizations to use the reciprocal of an argument rather than perform division.
fast
- Fast - Allow algebraically equivalent transformations that may dramatically change results in floating point (e.g. reassociate). This flag implies all the others.
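For instance (the values are invented), the flags are written between the opcode and its operands:
%sum = fadd fast float %a, %b       ; all fast-math assumptions apply
%prod = fmul nnan ninf float %x, %y ; only the no-NaNs and no-Infs assumptions apply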
Use-list Order Directives¶
Use-list directives encode the in-memory order of each use-list, allowing the
order to be recreated. <order-indexes>
is a comma-separated list of
indexes that are assigned to the referenced value’s uses. The referenced
value’s use-list is immediately sorted by these indexes.
Use-list directives may appear at function scope or global scope. They are not instructions, and have no effect on the semantics of the IR. When they’re at function scope, they must appear after the terminator of the final basic block.
If basic blocks have their address taken via blockaddress()
expressions,
uselistorder_bb
can be used to reorder their use-lists from outside their
function’s scope.
Syntax: |
---|
uselistorder <ty> <value>, { <order-indexes> }
uselistorder_bb @function, %block { <order-indexes> }
Examples: |
---|
define void @foo(i32 %arg1, i32 %arg2) {
entry:
; ... instructions ...
bb:
; ... instructions ...
; At function scope.
uselistorder i32 %arg1, { 1, 0, 2 }
uselistorder label %bb, { 1, 0 }
}
; At global scope.
uselistorder i32* @global, { 1, 2, 0 }
uselistorder i32 7, { 1, 0 }
uselistorder i32 (i32) @bar, { 1, 0 }
uselistorder_bb @foo, %bb, { 5, 1, 3, 2, 0, 4 }
Type System¶
The LLVM type system is one of the most important features of the intermediate representation. Being typed enables a number of optimizations to be performed on the intermediate representation directly, without having to do extra analyses on the side before the transformation. A strong type system makes it easier to read the generated code and enables novel analyses and transformations that are not feasible to perform on normal three address code representations.
Function Type¶
Overview: |
---|
The function type can be thought of as a function signature. It consists of a return type and a list of formal parameter types. The return type of a function type is a void type or first class type — except for label and metadata types.
Syntax: |
---|
<returntype> (<parameter list>)
...where ‘<parameter list>
‘ is a comma-separated list of type
specifiers. Optionally, the parameter list may include a type ...
, which
indicates that the function takes a variable number of arguments. Variable
argument functions can access their arguments with the variable argument
handling intrinsic functions. ‘<returntype>
‘ is any type
except label and metadata.
Examples: |
---|
i32 (i32) |
function taking an i32 , returning an i32 |
float (i16, i32 *) * |
Pointer to a function that takes an i16 and a pointer to i32 , returning float . |
i32 (i8*, ...) |
A vararg function that takes at least one pointer to i8 (char in C), which returns an integer. This is the signature for printf in LLVM. |
{i32, i32} (i32) |
A function taking an i32 , returning a structure containing two i32 values |
First Class Types¶
The first class types are perhaps the most important. Values of these types are the only ones which can be produced by instructions.
Single Value Types¶
These are the types that are valid in registers from CodeGen’s perspective.
Integer Type¶
Overview: |
---|
The integer type is a very simple type that simply specifies an arbitrary bit width for the integer type desired. Any bit width from 1 bit to 2^23-1 (about 8 million) can be specified.
Syntax: |
---|
iN
The number of bits the integer will occupy is specified by the N
value.
i1 |
a single-bit integer. |
i32 |
a 32-bit integer. |
i1942652 |
a really big integer of over 1 million bits. |
Floating Point Types¶
Type | Description |
---|---|
half |
16-bit floating point value |
float |
32-bit floating point value |
double |
64-bit floating point value |
fp128 |
128-bit floating point value (112-bit mantissa) |
x86_fp80 |
80-bit floating point value (X87) |
ppc_fp128 |
128-bit floating point value (two 64-bits) |
X86_mmx Type¶
Overview: |
---|
The x86_mmx type represents a value held in an MMX register on an x86 machine. The operations allowed on it are quite limited: parameters and return values, load and store, and bitcast. User-specified MMX instructions are represented as intrinsic or asm calls with arguments and/or results of this type. There are no arrays, vectors or constants of this type.
Syntax: |
---|
x86_mmx
Pointer Type¶
Overview: |
---|
The pointer type is used to specify memory locations. Pointers are commonly used to reference objects in memory.
Pointer types may have an optional address space attribute defining the numbered address space where the pointed-to object resides. The default address space is number zero. The semantics of non-zero address spaces are target-specific.
Note that LLVM does not permit pointers to void (void*
) nor does it
permit pointers to labels (label*
). Use i8*
instead.
Syntax: |
---|
<type> *
Examples: |
---|
[4 x i32]* |
A pointer to array of four i32 values. |
i32 (i32*) * |
A pointer to a function that takes an i32* , returning an i32 . |
i32 addrspace(5)* |
A pointer to an i32 value that resides in address space #5. |
Vector Type¶
Overview: |
---|
A vector type is a simple derived type that represents a vector of elements. Vector types are used when multiple primitive data are operated in parallel using a single instruction (SIMD). A vector type requires a size (number of elements) and an underlying primitive data type. Vector types are considered first class.
Syntax: |
---|
< <# elements> x <elementtype> >
The number of elements is a constant integer value larger than 0; elementtype may be any integer, floating point or pointer type. Vectors of size zero are not allowed.
Examples: |
---|
<4 x i32> |
Vector of 4 32-bit integer values. |
<8 x float> |
Vector of 8 32-bit floating-point values. |
<2 x i64> |
Vector of 2 64-bit integer values. |
<4 x i64*> |
Vector of 4 pointers to 64-bit integer values. |
Metadata Type¶
Overview: |
---|
The metadata type represents embedded metadata. No derived types may be created from metadata except for function arguments.
Syntax: |
---|
metadata
Aggregate Types¶
Aggregate Types are a subset of derived types that can contain multiple member types. Arrays and structs are aggregate types. Vectors are not considered to be aggregate types.
Array Type¶
Overview: |
---|
The array type is a very simple derived type that arranges elements sequentially in memory. The array type requires a size (number of elements) and an underlying data type.
Syntax: |
---|
[<# elements> x <elementtype>]
The number of elements is a constant integer value; elementtype
may
be any type with a size.
Examples: |
---|
[40 x i32] |
Array of 40 32-bit integer values. |
[41 x i32] |
Array of 41 32-bit integer values. |
[4 x i8] |
Array of 4 8-bit integer values. |
Here are some examples of multidimensional arrays:
[3 x [4 x i32]] |
3x4 array of 32-bit integer values. |
[12 x [10 x float]] |
12x10 array of single precision floating point values. |
[2 x [3 x [4 x i16]]] |
2x3x4 array of 16-bit integer values. |
There is no restriction on indexing beyond the end of the array implied
by a static type (though there are restrictions on indexing beyond the
bounds of an allocated object in some cases). This means that
single-dimension ‘variable sized array’ addressing can be implemented in
LLVM with a zero length array type. An implementation of ‘pascal style
arrays’ in LLVM could use the type “{ i32, [0 x float]}
”, for
example.
Structure Type¶
Overview: |
---|
The structure type is used to represent a collection of data members together in memory. The elements of a structure may be any type that has a size.
Structures in memory are accessed using ‘load
‘ and ‘store
‘ by
getting a pointer to a field with the ‘getelementptr
‘ instruction.
Structures in registers are accessed using the ‘extractvalue
‘ and
‘insertvalue
‘ instructions.
Structures may optionally be “packed” structures, which indicate that the alignment of the struct is one byte, and that there is no padding between the elements. In non-packed structs, padding between field types is inserted as defined by the DataLayout string in the module, which is required to match what the underlying code generator expects.
Structures can either be “literal” or “identified”. A literal structure
is defined inline with other types (e.g. {i32, i32}*
) whereas
identified types are always defined at the top level with a name.
Literal types are uniqued by their contents and can never be recursive
or opaque since there is no way to write one. Identified types can be
recursive, can be opaqued, and are never uniqued.
Syntax: |
---|
%T1 = type { <type list> } ; Identified normal struct type
%T2 = type <{ <type list> }> ; Identified packed struct type
Examples: |
---|
{ i32, i32, i32 } |
A triple of three i32 values |
{ float, i32 (i32) * } |
A pair, where the first element is a float and the second element is a pointer to a function that takes an i32 , returning an i32 . |
<{ i8, i32 }> |
A packed struct known to be 5 bytes in size. |
Opaque Structure Types¶
Overview: |
---|
Opaque structure types are used to represent named structure types that do not have a body specified. This corresponds (for example) to the C notion of a forward declared structure.
Syntax: |
---|
%X = type opaque
%52 = type opaque
Examples: |
---|
opaque |
An opaque type. |
Constants¶
LLVM has several different basic types of constants. This section describes them all and their syntax.
Simple Constants¶
- Boolean constants
- The two strings ‘true‘ and ‘false‘ are both valid constants of the i1 type.
- Integer constants
- Standard integers (such as ‘4’) are constants of the integer type. Negative numbers may be used with integer types.
- Floating point constants
- Floating point constants use standard decimal notation (e.g. 123.421), exponential notation (e.g. 1.23421e+2), or a more precise hexadecimal notation (see below). The assembler requires the exact decimal value of a floating-point constant. For example, the assembler accepts 1.25 but rejects 1.3 because 1.3 is a repeating decimal in binary. Floating point constants must have a floating point type.
- Null pointer constants
- The identifier ‘
null
‘ is recognized as a null pointer constant and must be of pointer type.
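Putting these together (the global names are invented for illustration):
@b = global i1 true
@n = global i32 -4
@f = global double 1.25
@p = global i32* null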
The one non-intuitive notation for constants is the hexadecimal form of
floating point constants. For example, the form
‘double 0x432ff973cafa8000
‘ is equivalent to (but harder to read
than) ‘double 4.5e+15
‘. The only time hexadecimal floating point
constants are required (and the only time that they are generated by the
disassembler) is when a floating point constant must be emitted but it
cannot be represented as a decimal floating point number in a reasonable
number of digits. For example, NaN’s, infinities, and other special
values are represented in their IEEE hexadecimal format so that assembly
and disassembly do not cause any bits to change in the constants.
When using the hexadecimal form, constants of types half, float, and
double are represented using the 16-digit form shown above (which
matches the IEEE754 representation for double); half and float values
must, however, be exactly representable as IEEE 754 half and single
precision, respectively. Hexadecimal format is always used for long
double, and there are three forms of long double. The 80-bit format used
by x86 is represented as 0xK
followed by 20 hexadecimal digits. The
128-bit format used by PowerPC (two adjacent doubles) is represented by
0xM
followed by 32 hexadecimal digits. The IEEE 128-bit format is
represented by 0xL
followed by 32 hexadecimal digits. Long doubles
will only work if they match the long double format on your target.
The IEEE 16-bit format (half precision) is represented by 0xH
followed by 4 hexadecimal digits. All hexadecimal formats are big-endian
(sign bit at the left).
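For instance (the global names are invented), each of the following uses the hexadecimal form:
@d = global double 0x432ff973cafa8000   ; 4.5e+15 as a double
@nan = global double 0x7FF8000000000000 ; a quiet NaN, which has no exact decimal form
@one = global half 0xH3C00              ; 1.0 as an IEEE half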
There are no constants of type x86_mmx.
Complex Constants¶
Complex constants are a (potentially recursive) combination of simple constants and smaller complex constants.
- Structure constants
- Structure constants are represented with notation similar to
structure type definitions (a comma separated list of elements,
surrounded by braces (
{}
)). For example: “{ i32 4, float 17.0, i32* @G }
”, where “@G
” is declared as “@G = external global i32
”. Structure constants must have structure type, and the number and types of elements must match those specified by the type.
- Array constants
- Array constants are represented with notation similar to array type
definitions (a comma separated list of elements, surrounded by
square brackets (
[]
)). For example: “[ i32 42, i32 11, i32 74 ]
”. Array constants must have array type, and the number and types of elements must match those specified by the type. As a special case, character array constants may also be represented as a double-quoted string using the c
prefix. For example: “c"Hello World\0A\00"
”.
- Vector constants
- Vector constants are represented with notation similar to vector
type definitions (a comma separated list of elements, surrounded by
less-than/greater-than’s (
<>
)). For example: “< i32 42, i32 11, i32 74, i32 100 >
”. Vector constants must have vector type, and the number and types of elements must match those specified by the type.
- Zero initialization
- The string ‘
zeroinitializer
‘ can be used to zero initialize a value of any type, including scalar and aggregate types. This is often used to avoid having to print large zero initializers (e.g. for large arrays) and is always exactly equivalent to using explicit zero initializers.
- Metadata node
- A metadata node is a constant tuple without types. For example:
“
!{!0, !{!2, !0}, !"test"}
”. Metadata can reference constant values, for example: “!{!0, i32 0, i8* @global, i64 (i64)* @function, !"str"}
”. Unlike other typed constants that are meant to be interpreted as part of the instruction stream, metadata is a place to attach additional information such as debug info.
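Combining these forms (the global names are invented for illustration):
@G = external global i32
@strc = global { i32, float, i32* } { i32 4, float 17.0, i32* @G }
@arr = global [3 x i32] [ i32 42, i32 11, i32 74 ]
@str = global [13 x i8] c"Hello World\0A\00"
@vec = global <4 x i32> < i32 42, i32 11, i32 74, i32 100 >
@big = global [1000 x i32] zeroinitializer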
Global Variable and Function Addresses¶
The addresses of global variables and functions are always implicitly valid (link-time) constants. These constants are explicitly referenced when the identifier for the global is used and always have pointer type. For example, the following is a legal LLVM file:
@X = global i32 17
@Y = global i32 42
@Z = global [2 x i32*] [ i32* @X, i32* @Y ]
Undefined Values¶
The string ‘undef
‘ can be used anywhere a constant is expected, and
indicates that the user of the value may receive an unspecified
bit-pattern. Undefined values may be of any type (other than ‘label
‘
or ‘void
‘) and be used anywhere a constant is permitted.
Undefined values are useful because they indicate to the compiler that the program is well defined no matter what value is used. This gives the compiler more freedom to optimize. Here are some examples of (potentially surprising) transformations that are valid (in pseudo IR):
%A = add %X, undef
%B = sub %X, undef
%C = xor %X, undef
Safe:
%A = undef
%B = undef
%C = undef
This is safe because all of the output bits are affected by the undef bits. Any output bit can have a zero or one depending on the input bits.
%A = or %X, undef
%B = and %X, undef
Safe:
%A = -1
%B = 0
Unsafe:
%A = undef
%B = undef
These logical operations have bits that are not always affected by the
input. For example, if %X
has a zero bit, then the output of the
‘and
‘ operation will always be a zero for that bit, no matter what
the corresponding bit from the ‘undef
‘ is. As such, it is unsafe to
optimize or assume that the result of the ‘and
‘ is ‘undef
‘.
However, it is safe to assume that all bits of the ‘undef
‘ could be
0, and optimize the ‘and
‘ to 0. Likewise, it is safe to assume that
all the bits of the ‘undef
‘ operand to the ‘or
‘ could be set,
allowing the ‘or
‘ to be folded to -1.
%A = select undef, %X, %Y
%B = select undef, 42, %Y
%C = select %X, %Y, undef
Safe:
%A = %X (or %Y)
%B = 42 (or %Y)
%C = %Y
Unsafe:
%A = undef
%B = undef
%C = undef
This set of examples shows that undefined ‘select
‘ (and conditional
branch) conditions can go either way, but they have to come from one
of the two operands. In the %A
example, if %X
and %Y
were
both known to have a clear low bit, then %A
would have to have a
cleared low bit. However, in the %C
example, the optimizer is
allowed to assume that the ‘undef
‘ operand could be the same as
%Y
, allowing the whole ‘select
‘ to be eliminated.
%A = xor undef, undef
%B = undef
%C = xor %B, %B
%D = undef
%E = icmp slt %D, 4
%F = icmp gte %D, 4
Safe:
%A = undef
%B = undef
%C = undef
%D = undef
%E = undef
%F = undef
This example points out that two ‘undef
‘ operands are not
necessarily the same. This can be surprising to people (and also matches
C semantics) where they assume that “X^X
” is always zero, even if
X
is undefined. This isn’t true for a number of reasons, but the
short answer is that an ‘undef
‘ “variable” can arbitrarily change
its value over its “live range”. This is true because the variable
doesn’t actually have a live range. Instead, the value is logically
read from arbitrary registers that happen to be around when needed, so
the value is not necessarily consistent over time. In fact, %A
and
%C
need to have the same semantics or the core LLVM “replace all
uses with” concept would not hold.
%A = fdiv undef, %X
%B = fdiv %X, undef
Safe:
%A = undef
b: unreachable
These examples show the crucial difference between an undefined value
and undefined behavior. An undefined value (like ‘undef
‘) is
allowed to have an arbitrary bit-pattern. This means that the %A
operation can be constant folded to ‘undef
‘, because the ‘undef
‘
could be an SNaN, and fdiv
is not (currently) defined on SNaN’s.
However, in the second example, we can make a more aggressive
assumption: because the undef
is allowed to be an arbitrary value,
we are allowed to assume that it could be zero. Since a divide by zero
has undefined behavior, we are allowed to assume that the operation
does not execute at all. This allows us to delete the divide and all
code after it. Because the undefined operation “can’t happen”, the
optimizer can assume that it occurs in dead code.
a: store undef -> %X
b: store %X -> undef
Safe:
a: <deleted>
b: unreachable
These examples reiterate the fdiv
example: a store of an undefined
value can be assumed to not have any effect; we can assume that the
value is overwritten with bits that happen to match what was already
there. However, a store to an undefined location could clobber
arbitrary memory, therefore, it has undefined behavior.
Poison Values¶
Poison values are similar to undef values, however they also represent the fact that an instruction or constant expression that cannot evoke side effects has nevertheless detected a condition that results in undefined behavior.
There is currently no way of representing a poison value in the IR; they
only exist when produced by operations such as add with
the nsw
flag.
Poison value behavior is defined in terms of value dependence:
- Values other than phi nodes depend on their operands.
- Phi nodes depend on the operand corresponding to their dynamic predecessor basic block.
- Function arguments depend on the corresponding actual argument values in the dynamic callers of their functions.
- Call instructions depend on the ret instructions that dynamically transfer control back to them.
- Invoke instructions depend on the ret, resume, or exception-throwing call instructions that dynamically transfer control back to them.
- Non-volatile loads and stores depend on the most recent stores to all of the referenced memory addresses, following the order in the IR (including loads and stores implied by intrinsics such as @llvm.memcpy.)
- An instruction with externally visible side effects depends on the most recent preceding instruction with externally visible side effects, following the order in the IR. (This includes volatile operations.)
- An instruction control-depends on a terminator instruction if the terminator instruction has multiple successors and the instruction is always executed when control transfers to one of the successors, and may not be executed when control is transferred to another.
- Additionally, an instruction also control-depends on a terminator instruction if the set of instructions it otherwise depends on would be different if the terminator had transferred control to a different successor.
- Dependence is transitive.
Poison values have the same behavior as undef values, with the additional effect that any instruction that has a dependence on a poison value has undefined behavior.
Here are some examples:
entry:
%poison = sub nuw i32 0, 1 ; Results in a poison value.
%still_poison = and i32 %poison, 0 ; 0, but also poison.
%poison_yet_again = getelementptr i32, i32* @h, i32 %still_poison
store i32 0, i32* %poison_yet_again ; memory at @h[0] is poisoned
store i32 %poison, i32* @g ; Poison value stored to memory.
%poison2 = load i32, i32* @g ; Poison value loaded back from memory.
store volatile i32 %poison, i32* @g ; External observation; undefined behavior.
%narrowaddr = bitcast i32* @g to i16*
%wideaddr = bitcast i32* @g to i64*
%poison3 = load i16, i16* %narrowaddr ; Returns a poison value.
%poison4 = load i64, i64* %wideaddr ; Returns a poison value.
%cmp = icmp slt i32 %poison, 0 ; Returns a poison value.
br i1 %cmp, label %true, label %end ; Branch to either destination.
true:
store volatile i32 0, i32* @g ; This is control-dependent on %cmp, so
; it has undefined behavior.
br label %end
end:
%p = phi i32 [ 0, %entry ], [ 1, %true ]
; Both edges into this PHI are
; control-dependent on %cmp, so this
; always results in a poison value.
store volatile i32 0, i32* @g ; This would depend on the store in %true
; if %cmp is true, or the store in %entry
; otherwise, so this is undefined behavior.
br i1 %cmp, label %second_true, label %second_end
; The same branch again, but this time the
; true block doesn't have side effects.
second_true:
; No side effects!
ret void
second_end:
store volatile i32 0, i32* @g ; This time, the instruction always depends
; on the store in %end. Also, it is
; control-equivalent to %end, so this is
; well-defined (ignoring earlier undefined
; behavior in this example).
Addresses of Basic Blocks¶
blockaddress(@function, %block)
The ‘blockaddress
‘ constant computes the address of the specified
basic block in the specified function, and always has an i8*
type.
Taking the address of the entry block is illegal.
This value only has defined behavior when used as an operand to the
‘indirectbr‘ instruction, or for comparisons
against null. Pointer equality tests between label addresses result in
undefined behavior — though, again, comparison against null is ok, and
no label is equal to the null pointer. This may be passed around as an
opaque pointer sized value as long as the bits are not inspected. This
allows ptrtoint
and arithmetic to be performed on these values so
long as the original value is reconstituted before the indirectbr
instruction.
Finally, some targets may provide defined semantics when using the value as the operand to an inline assembly, but that is target specific.
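As a sketch of the one fully defined use (the function name is invented), a blockaddress constant feeds an indirectbr whose destination list names every possible target:
define i32 @pick(i1 %c) {
entry:
  %addr = select i1 %c, i8* blockaddress(@pick, %one), i8* blockaddress(@pick, %two)
  indirectbr i8* %addr, [ label %one, label %two ]
one:
  ret i32 1
two:
  ret i32 2
}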
Constant Expressions¶
Constant expressions are used to allow expressions involving other constants to be used as constants. Constant expressions may be of any first class type and may involve any LLVM operation that does not have side effects (e.g. load and call are not supported). The following is the syntax for constant expressions:
trunc (CST to TYPE)
- Truncate a constant to another type. The bit size of CST must be larger than the bit size of TYPE. Both types must be integers.
zext (CST to TYPE)
- Zero extend a constant to another type. The bit size of CST must be smaller than the bit size of TYPE. Both types must be integers.
sext (CST to TYPE)
- Sign extend a constant to another type. The bit size of CST must be smaller than the bit size of TYPE. Both types must be integers.
fptrunc (CST to TYPE)
- Truncate a floating point constant to another floating point type. The size of CST must be larger than the size of TYPE. Both types must be floating point.
fpext (CST to TYPE)
- Floating point extend a constant to another type. The size of CST must be smaller or equal to the size of TYPE. Both types must be floating point.
fptoui (CST to TYPE)
- Convert a floating point constant to the corresponding unsigned integer constant. TYPE must be a scalar or vector integer type. CST must be of scalar or vector floating point type. Both CST and TYPE must be scalars, or vectors of the same number of elements. If the value won’t fit in the integer type, the results are undefined.
fptosi (CST to TYPE)
- Convert a floating point constant to the corresponding signed integer constant. TYPE must be a scalar or vector integer type. CST must be of scalar or vector floating point type. Both CST and TYPE must be scalars, or vectors of the same number of elements. If the value won’t fit in the integer type, the results are undefined.
uitofp (CST to TYPE)
- Convert an unsigned integer constant to the corresponding floating point constant. TYPE must be a scalar or vector floating point type. CST must be of scalar or vector integer type. Both CST and TYPE must be scalars, or vectors of the same number of elements. If the value won’t fit in the floating point type, the results are undefined.
sitofp (CST to TYPE)
- Convert a signed integer constant to the corresponding floating point constant. TYPE must be a scalar or vector floating point type. CST must be of scalar or vector integer type. Both CST and TYPE must be scalars, or vectors of the same number of elements. If the value won’t fit in the floating point type, the results are undefined.
ptrtoint (CST to TYPE)
- Convert a pointer typed constant to the corresponding integer constant. TYPE must be an integer type. CST must be of pointer type. The CST value is zero extended, truncated, or unchanged to make it fit in TYPE.
inttoptr (CST to TYPE)
- Convert an integer constant to a pointer constant. TYPE must be a pointer type. CST must be of integer type. The CST value is zero extended, truncated, or unchanged to make it fit in a pointer size. This one is really dangerous!
bitcast (CST to TYPE)
- Convert a constant, CST, to another TYPE. The constraints of the operands are the same as those for the bitcast instruction.
addrspacecast (CST to TYPE)
- Convert a constant pointer or constant vector of pointer, CST, to another TYPE in a different address space. The constraints of the operands are the same as those for the addrspacecast instruction.
getelementptr (TY, CSTPTR, IDX0, IDX1, ...)
, getelementptr inbounds (TY, CSTPTR, IDX0, IDX1, ...)
- Perform the getelementptr operation on constants. As with the getelementptr instruction, the index list may have zero or more indexes, which are required to make sense for the type of “pointer to TY”.
select (COND, VAL1, VAL2)
- Perform the select operation on constants.
icmp COND (VAL1, VAL2)
- Performs the icmp operation on constants.
fcmp COND (VAL1, VAL2)
- Performs the fcmp operation on constants.
extractelement (VAL, IDX)
- Perform the extractelement operation on constants.
insertelement (VAL, ELT, IDX)
- Perform the insertelement operation on constants.
shufflevector (VEC1, VEC2, IDXMASK)
- Perform the shufflevector operation on constants.
extractvalue (VAL, IDX0, IDX1, ...)
- Perform the extractvalue operation on constants. The index list is interpreted in a similar manner as indices in a ‘getelementptr‘ operation. At least one index value must be specified.
insertvalue (VAL, ELT, IDX0, IDX1, ...)
- Perform the insertvalue operation on constants. The index list is interpreted in a similar manner as indices in a ‘getelementptr‘ operation. At least one index value must be specified.
OPCODE (LHS, RHS)
- Perform the specified operation of the LHS and RHS constants. OPCODE may be any of the binary or bitwise binary operations. The constraints on operands are the same as those for the corresponding instruction (e.g. no bitwise operations on floating point values are allowed).
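For instance (the global names are invented), constant expressions commonly appear in global initializers:
@arr = global [4 x i32] zeroinitializer
@elem = global i32* getelementptr inbounds ([4 x i32], [4 x i32]* @arr, i64 0, i64 2)
@addr = global i64 ptrtoint ([4 x i32]* @arr to i64)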
Other Values¶
Inline Assembler Expressions¶
LLVM supports inline assembler expressions (as opposed to Module-Level Inline Assembly) through the use of a special value. This value represents the inline assembler as a template string (containing the instructions to emit), a list of operand constraints (stored as a string), a flag that indicates whether or not the inline asm expression has side effects, and a flag indicating whether the function containing the asm needs to align its stack conservatively.
The template string supports argument substitution of the operands using “$
”
followed by a number, to indicate substitution of the given register/memory
location, as specified by the constraint string. “${NUM:MODIFIER}
” may also
be used, where MODIFIER
is a target-specific annotation for how to print the
operand (See Asm template argument modifiers).
A literal “$
” may be included by using “$$
” in the template. To include
other special characters into the output, the usual “\XX
” escapes may be
used, just as in other strings. Note that after template substitution, the
resulting assembly string is parsed by LLVM’s integrated assembler unless it is
disabled – even when emitting a .s
file – and thus must contain assembly
syntax known to LLVM.
LLVM’s support for inline asm is modeled closely on the requirements of Clang’s GCC-compatible inline-asm support. Thus, the feature-set and the constraint and modifier codes listed here are similar or identical to those in GCC’s inline asm support. However, to be clear, the syntax of the template and constraint strings described here is not the same as the syntax accepted by GCC and Clang, and, while most constraint letters are passed through as-is by Clang, some get translated to other codes when converting from the C source to the LLVM assembly.
An example inline assembler expression is:
i32 (i32) asm "bswap $0", "=r,r"
Inline assembler expressions may only be used as the callee operand of a call or an invoke instruction. Thus, typically we have:
%X = call i32 asm "bswap $0", "=r,r"(i32 %Y)
Inline asms with side effects not visible in the constraint list must be
marked as having side effects. This is done through the use of the
‘sideeffect
‘ keyword, like so:
call void asm sideeffect "eieio", ""()
In some cases inline asms will contain code that will not work unless
the stack is aligned in some way, such as calls or SSE instructions on
x86, yet will not contain code that does that alignment within the asm.
The compiler should make conservative assumptions about what the asm
might contain and should generate its usual stack alignment code in the
prologue if the ‘alignstack
‘ keyword is present:
call void asm alignstack "eieio", ""()
Inline asms also support using non-standard assembly dialects. The
assumed dialect is ATT. When the ‘inteldialect
‘ keyword is present,
the inline asm is using the Intel dialect. Currently, ATT and Intel are
the only supported dialects. An example is:
call void asm inteldialect "eieio", ""()
If multiple keywords appear the ‘sideeffect
‘ keyword must come
first, the ‘alignstack
‘ keyword second and the ‘inteldialect
‘
keyword last.
Inline Asm Constraint String¶
The constraint list is a comma-separated string, each element containing one or more constraint codes.
For each element in the constraint list an appropriate register or memory
operand will be chosen, and it will be made available to assembly template
string expansion as $0
for the first constraint in the list, $1
for the
second, etc.
There are three different types of constraints, which are distinguished by a prefix symbol in front of the constraint code: Output, Input, and Clobber. The constraints must always be given in that order: outputs first, then inputs, then clobbers. They cannot be intermingled.
There are also three different categories of constraint codes:
- Register constraint. This is either a register class, or a fixed physical register. This kind of constraint will allocate a register, and if necessary, bitcast the argument or result to the appropriate type.
- Memory constraint. This kind of constraint is for use with an instruction taking a memory operand. Different constraints allow for different addressing modes used by the target.
- Immediate value constraint. This kind of constraint is for an integer or other immediate value which can be rendered directly into an instruction. The various target-specific constraints allow the selection of a value in the proper range for the instruction you wish to use it with.
Output constraints¶
Output constraints are specified by an “=” prefix (e.g. “=r”). This
indicates that the assembly will write to this operand, and the operand will
then be made available as a return value of the asm expression. Output
constraints do not consume an argument from the call instruction. (Except, see
below about indirect outputs.)
Normally, it is expected that no output locations are written to by the assembly
expression until all of the inputs have been read. As such, LLVM may assign
the same register to an output and an input. If this is not safe (e.g. if the
assembly contains two instructions, where the first writes to one output, and
the second reads an input and writes to a second output), then the “&”
modifier must be used (e.g. “=&r”) to specify that the output is an
“early-clobber” output. Marking an output as “early-clobber” ensures that LLVM
will not use the same register for any inputs (other than an input tied to this
output).
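As an illustrative sketch (a hypothetical x86 snippet, not from the LLVM sources), the following asm writes its output register before reading its input, so the output must be marked early-clobber:
%res = call i32 asm "movl $$1, $0 \0A addl $1, $0", "=&r,r"(i32 %x)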
Input constraints¶
Input constraints do not have a prefix – just the constraint codes. Each input constraint will consume one argument from the call instruction. It is not permitted for the asm to write to any input register or memory location (unless that input is tied to an output). Note also that multiple inputs may all be assigned to the same register, if LLVM can determine that they necessarily all contain the same value.
Instead of providing a Constraint Code, input constraints may also “tie”
themselves to an output constraint, by providing an integer as the constraint
string. Tied inputs still consume an argument from the call instruction, and
take up a position in the asm template numbering as is usual – they will simply
be constrained to always use the same register as the output they’ve been tied
to. For example, a constraint string of “=r,0” says to assign a register for
output, and use that register as an input as well (it being the 0’th
constraint).
It is permitted to tie an input to an “early-clobber” output. In that case, no other input may share the same register as the input tied to the early-clobber (even when the other input has the same value).
You may only tie an input to an output which has a register constraint, not a memory constraint. Only a single input may be tied to an output.
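For example (an illustrative x86 snippet), the following increments %x in place by tying the input to the output register via the “=r,0” constraint string described above:
%inc = call i32 asm "incl $0", "=r,0"(i32 %x)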
There is also an “interesting” feature which deserves a bit of explanation: if a register class constraint allocates a register which is too small for the value type operand provided as input, the input value will be split into multiple registers, and all of them passed to the inline asm.
However, this feature is often not as useful as you might think.
Firstly, the registers are not guaranteed to be consecutive. So, on those architectures that have instructions which operate on multiple consecutive registers, this is not an appropriate way to support them. (e.g. the 32-bit SparcV8 has a 64-bit load instruction which takes a single 32-bit register: the hardware then loads into both the named register and the next register. This feature of inline asm would not be useful to support that.)
A few of the targets provide a template string modifier allowing explicit access
to the second register of a two-register operand (e.g. MIPS “L”, “M”, and
“D”). On such an architecture, you can actually access the second allocated
register (yet, still, not any subsequent ones). But, in that case, you’re still
probably better off simply splitting the value into two separate operands, for
clarity. (e.g. see the description of the “A” constraint on X86, which,
despite existing only for use with this feature, is not really a good idea to
use.)
Indirect inputs and outputs¶
Indirect output or input constraints can be specified by the “*” modifier
(which goes after the “=” in case of an output). This indicates that the asm
will write to or read from the contents of an address provided as an input
argument. (Note that in this way, indirect outputs act more like an input than
an output: just like an input, they consume an argument of the call expression,
rather than producing a return value. An indirect output constraint is an
“output” only in that the asm is expected to write to the contents of the input
memory location, instead of just read from it).
This is most typically used for memory constraints, e.g. “=*m”, to pass the
address of a variable as a value.
It is also possible to use an indirect register constraint, but only on output
(e.g. “=*r”). This will cause LLVM to allocate a register for an output
value normally, and then, separately emit a store to the address provided as
input, after the provided inline asm. (It’s not clear what value this
functionality provides, compared to writing the store explicitly after the asm
statement, and it can only produce worse code, since it bypasses many
optimization passes. I would recommend not using it.)
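A minimal sketch of the common indirect memory case (the pointer and template are illustrative x86): the address is consumed as a call argument, and the asm writes through it rather than returning a value:
%ptr = alloca i32
call void asm "movl $$42, $0", "=*m"(i32* %ptr)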
Clobber constraints¶
A clobber constraint is indicated by a “~” prefix. A clobber does not
consume an input operand, nor generate an output. Clobbers cannot use any of the
general constraint code letters – they may use only explicit register
constraints, e.g. “~{eax}”. The one exception is that a clobber string of
“~{memory}” indicates that the assembly writes to arbitrary undeclared
memory locations – not only the memory pointed to by a declared indirect
output.
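For example (reusing this document’s “eieio” placeholder template), an asm that overwrites EAX and writes to undeclared memory would declare both clobbers:
call void asm sideeffect "eieio", "~{eax},~{memory}"()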
Constraint Codes¶
After the optional prefix come one or more constraint codes.
A Constraint Code is either a single letter (e.g. “r”), a “^” character
followed by two letters (e.g. “^wc”), or “{” register-name “}”
(e.g. “{eax}”).
The one and two letter constraint codes are typically chosen to be the same as GCC’s constraint codes.
A single constraint may include more than one constraint code, leaving it up to LLVM to choose which one to use. This is included mainly for compatibility with the translation of GCC inline asm coming from clang.
There are two ways to specify alternatives, and either or both may be used in an inline asm constraint list:
- Append the codes to each other, making a constraint code set. E.g. “im” or
  “{eax}m”. This means “choose any of the options in the set”. The choice of
  constraint is made independently for each constraint in the constraint list.
- Use “|” between constraint code sets, creating alternatives. Every
  constraint in the constraint list must have the same number of alternative
  sets. With this syntax, the same alternative in all of the items in the
  constraint list will be chosen together.
Putting those together, you might have a two operand constraint string like
"rm|r,ri|rm". This indicates that if operand 0 is “r” or “m”, then
operand 1 may be one of “r” or “i”. If operand 0 is “r”, then operand 1
may be one of “r” or “m”. But, operand 0 and 1 cannot both be of type “m”.
However, the use of either of the alternatives features is NOT recommended, as
LLVM is not able to make an intelligent choice about which one to use. (At the
point it currently needs to choose, not enough information is available to do so
in a smart way.) Thus, it simply tries to make a choice that’s most likely to
compile, not one that will give optimal performance. (e.g., given “rm”, it’ll
always choose to use memory, not registers). And, if given multiple registers,
or multiple register classes, it will simply choose the first one. (In fact, it
doesn’t currently even ensure explicitly specified physical registers are
unique, so specifying multiple physical registers as alternatives, like
“{r11}{r12},{r11}{r12}”, will assign r11 to both operands, not at all what was
intended.)
Supported Constraint Code List¶
The constraint codes are, in general, expected to behave the same way they do in GCC. LLVM’s support is often implemented on an ‘as-needed’ basis, to support C inline asm code which was supported by GCC. A mismatch in behavior between LLVM and GCC likely indicates a bug in LLVM.
Some constraint codes are typically supported by all targets:
- r: A register in the target’s general purpose register class.
- m: A memory address operand. It is target-specific what addressing modes are
  supported; typical examples are register, or register + register offset, or
  register + immediate offset (of some target-specific size).
- i: An integer constant (of target-specific width). Allows either a simple
  immediate, or a relocatable value.
- n: An integer constant – not including relocatable values.
- s: An integer constant, but allowing only relocatable values.
- X: Allows an operand of any kind, no constraint whatsoever. Typically useful
  to pass a label for an asm branch or call.
- {register-name}: Requires exactly the named physical register.
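As an illustrative x86-flavored sketch combining several of these codes, the following uses a register output, a register input, and an immediate input (${2:c} is a template argument modifier, described later, that prints the immediate without its usual punctuation):
%r = call i32 asm "leal ${2:c}($1), $0", "=r,r,i"(i32 %a, i32 7)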
Other constraints are target-specific:
AArch64:
- z: An immediate integer 0. Outputs WZR or XZR, as appropriate.
- I: An immediate integer valid for an ADD or SUB instruction, i.e. 0 to 4095
  with optional shift by 12.
- J: An immediate integer that, when negated, is valid for an ADD or SUB
  instruction, i.e. -1 to -4095 with optional left shift by 12.
- K: An immediate integer that is valid for the ‘bitmask immediate 32’ of a
  logical instruction like AND, EOR, or ORR with a 32-bit register.
- L: An immediate integer that is valid for the ‘bitmask immediate 64’ of a
  logical instruction like AND, EOR, or ORR with a 64-bit register.
- M: An immediate integer for use with the MOV assembly alias on a 32-bit
  register. This is a superset of K: in addition to the bitmask immediate, it
  also allows immediate integers which can be loaded with a single MOVZ or
  MOVL instruction.
- N: An immediate integer for use with the MOV assembly alias on a 64-bit
  register. This is a superset of L.
- Q: Memory address operand must be in a single register (no offsets).
  (However, LLVM currently does this for the m constraint as well.)
- r: A 32 or 64-bit integer register (W* or X*).
- w: A 32, 64, or 128-bit floating-point/SIMD register.
- x: A lower 128-bit floating-point/SIMD register (V0 to V15).
AMDGPU:
- r: A 32 or 64-bit integer register.
- [0-9]v: The 32-bit VGPR register, number 0-9.
- [0-9]s: The 32-bit SGPR register, number 0-9.
All ARM modes:
- Q, Um, Un, Uq, Us, Ut, Uv, Uy: Memory address operand. Treated the same as
  operand m, at the moment.
ARM and ARM’s Thumb2 mode:
- j: An immediate integer between 0 and 65535 (valid for MOVW).
- I: An immediate integer valid for a data-processing instruction.
- J: An immediate integer between -4095 and 4095.
- K: An immediate integer whose bitwise inverse is valid for a data-processing
  instruction. (Can be used with template modifier “B” to print the inverted
  value.)
- L: An immediate integer whose negation is valid for a data-processing
  instruction. (Can be used with template modifier “n” to print the negated
  value.)
- M: A power of two or an integer between 0 and 32.
- N: Invalid immediate constraint.
- O: Invalid immediate constraint.
- r: A general-purpose 32-bit integer register (r0-r15).
- l: In Thumb2 mode, low 32-bit GPR registers (r0-r7). In ARM mode, same as r.
- h: In Thumb2 mode, a high 32-bit GPR register (r8-r15). In ARM mode, invalid.
- w: A 32, 64, or 128-bit floating-point/SIMD register: s0-s31, d0-d31, or
  q0-q15.
- x: A 32, 64, or 128-bit floating-point/SIMD register: s0-s15, d0-d7, or
  q0-q3.
- t: A floating-point/SIMD register, only supports 32-bit values: s0-s31.
ARM’s Thumb1 mode:
- I: An immediate integer between 0 and 255.
- J: An immediate integer between -255 and -1.
- K: An immediate integer between 0 and 255, with optional left-shift by some
  amount.
- L: An immediate integer between -7 and 7.
- M: An immediate integer which is a multiple of 4 between 0 and 1020.
- N: An immediate integer between 0 and 31.
- O: An immediate integer which is a multiple of 4 between -508 and 508.
- r: A low 32-bit GPR register (r0-r7).
- l: A low 32-bit GPR register (r0-r7).
- h: A high GPR register (r8-r15).
- w: A 32, 64, or 128-bit floating-point/SIMD register: s0-s31, d0-d31, or
  q0-q15.
- x: A 32, 64, or 128-bit floating-point/SIMD register: s0-s15, d0-d7, or
  q0-q3.
- t: A floating-point/SIMD register, only supports 32-bit values: s0-s31.
Hexagon:
- o, v: A memory address operand, treated the same as constraint m, at the
  moment.
- r: A 32 or 64-bit register.
MSP430:
- r: An 8 or 16-bit register.
MIPS:
- I: An immediate signed 16-bit integer.
- J: An immediate integer zero.
- K: An immediate unsigned 16-bit integer.
- L: An immediate 32-bit integer, where the lower 16 bits are 0.
- N: An immediate integer between -65535 and -1.
- O: An immediate signed 15-bit integer.
- P: An immediate integer between 1 and 65535.
- m: A memory address operand. In MIPS-SE mode, allows a base address register
  plus 16-bit immediate offset. In MIPS mode, just a base register.
- R: A memory address operand. In MIPS-SE mode, allows a base address register
  plus a 9-bit signed offset. In MIPS mode, the same as constraint m.
- ZC: A memory address operand, suitable for use in a pref, ll, or sc
  instruction on the given subtarget (details vary).
- r, d, y: A 32 or 64-bit GPR register.
- f: A 32 or 64-bit FPU register (F0-F31), or a 128-bit MSA register (W0-W31).
  In the case of MSA registers, it is recommended to use the w argument
  modifier for compatibility with GCC.
- c: A 32-bit or 64-bit GPR register suitable for indirect jump (always 25).
- l: The lo register, 32 or 64-bit.
- x: Invalid.
NVPTX:
- b: A 1-bit integer register.
- c or h: A 16-bit integer register.
- r: A 32-bit integer register.
- l or N: A 64-bit integer register.
- f: A 32-bit float register.
- d: A 64-bit float register.
PowerPC:
- I: An immediate signed 16-bit integer.
- J: An immediate unsigned 16-bit integer, shifted left 16 bits.
- K: An immediate unsigned 16-bit integer.
- L: An immediate signed 16-bit integer, shifted left 16 bits.
- M: An immediate integer greater than 31.
- N: An immediate integer that is an exact power of 2.
- O: The immediate integer constant 0.
- P: An immediate integer constant whose negation is a signed 16-bit constant.
- es, o, Q, Z, Zy: A memory address operand, currently treated the same as m.
- r: A 32 or 64-bit integer register.
- b: A 32 or 64-bit integer register, excluding R0 (that is: R1-R31).
- f: A 32 or 64-bit float register (F0-F31), or when QPX is enabled, a 128 or
  256-bit QPX register (Q0-Q31; aliases the F registers).
- v: For 4 x f32 or 4 x f64 types, when QPX is enabled, a 128 or 256-bit QPX
  register (Q0-Q31), otherwise a 128-bit Altivec vector register (V0-V31).
- y: Condition register (CR0-CR7).
- wc: An individual CR bit in a CR register.
- wa, wd, wf: Any 128-bit VSX vector register, from the full VSX register set
  (overlapping both the floating-point and vector register files).
- ws: A 32 or 64-bit floating point register, from the full VSX register set.
Sparc:
- I: An immediate 13-bit signed integer.
- r: A 32-bit integer register.
SystemZ:
- I: An immediate unsigned 8-bit integer.
- J: An immediate unsigned 12-bit integer.
- K: An immediate signed 16-bit integer.
- L: An immediate signed 20-bit integer.
- M: An immediate integer 0x7fffffff.
- Q, R, S, T: A memory address operand, treated the same as m, at the moment.
- r or d: A 32, 64, or 128-bit integer register.
- a: A 32, 64, or 128-bit integer address register (excludes R0, which in an
  address context evaluates as zero).
- h: A 32-bit value in the high part of a 64-bit data register (LLVM-specific).
- f: A 32, 64, or 128-bit floating point register.
X86:
- I: An immediate integer between 0 and 31.
- J: An immediate integer between 0 and 64.
- K: An immediate signed 8-bit integer.
- L: An immediate integer, 0xff or 0xffff or (in 64-bit mode only) 0xffffffff.
- M: An immediate integer between 0 and 3.
- N: An immediate unsigned 8-bit integer.
- O: An immediate integer between 0 and 127.
- e: An immediate 32-bit signed integer.
- Z: An immediate 32-bit unsigned integer.
- o, v: Treated the same as m, at the moment.
- q: An 8, 16, 32, or 64-bit register which can be accessed as an 8-bit l
  integer register. On X86-32, this is the a, b, c, and d registers, and on
  X86-64, it is all of the integer registers.
- Q: An 8, 16, 32, or 64-bit register which can be accessed as an 8-bit h
  integer register. This is the a, b, c, and d registers.
- r or l: An 8, 16, 32, or 64-bit integer register.
- R: An 8, 16, 32, or 64-bit “legacy” integer register – one which has existed
  since i386, and can be accessed without the REX prefix.
- f: A 32, 64, or 80-bit ‘387 FPU stack pseudo-register.
- y: A 64-bit MMX register, if MMX is enabled.
- x: If SSE is enabled: a 32 or 64-bit scalar operand, or 128-bit vector
  operand in an SSE register. If AVX is also enabled, can also be a 256-bit
  vector operand in an AVX register. If AVX-512 is also enabled, can also be a
  512-bit vector operand in an AVX512 register. Otherwise, an error.
- Y: The same as x, if SSE2 is enabled, otherwise an error.
- A: Special case: allocates EAX first, then EDX, for a single operand (in
  32-bit mode, a 64-bit integer operand will get split into two registers). It
  is not recommended to use this constraint, as in 64-bit mode, the 64-bit
  operand will get allocated only to RAX – if two 32-bit operands are needed,
  you’re better off splitting it yourself, before passing it to the asm
  statement.
XCore:
- r: A 32-bit integer register.
Asm template argument modifiers¶
In the asm template string, modifiers can be used on the operand reference, like
“${0:n}”.
The modifiers are, in general, expected to behave the same way they do in GCC. LLVM’s support is often implemented on an ‘as-needed’ basis, to support C inline asm code which was supported by GCC. A mismatch in behavior between LLVM and GCC likely indicates a bug in LLVM.
Target-independent:
- c: Print an immediate integer constant unadorned, without the
  target-specific immediate punctuation (e.g. no $ prefix).
- n: Negate and print an immediate integer constant unadorned, without the
  target-specific immediate punctuation (e.g. no $ prefix).
- l: Print as an unadorned label, without the target-specific label
  punctuation (e.g. no $ prefix).
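For example (an illustrative x86 snippet), the c modifier allows an immediate operand to be embedded in a data directive, where the default $ punctuation would be invalid:
call void asm sideeffect ".long ${0:c}", "i"(i32 42)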
AArch64:
- w: Print a GPR register with a w* name instead of an x* name. E.g., instead
  of x30, print w30.
- x: Print a GPR register with an x* name. (This is the default, anyhow.)
- b, h, s, d, q: Print a floating-point/SIMD register with a b*, h*, s*, d*,
  or q* name, rather than the default of v*.
AMDGPU:
- r: No effect.
ARM:
- a: Print an operand as an address (with [ and ] surrounding a register).
- P: No effect.
- q: No effect.
- y: Print a VFP single-precision register as an indexed double (e.g. print as
  d4[1] instead of s9).
- B: Bitwise invert and print an immediate integer constant without the #
  prefix.
- L: Print the low 16 bits of an immediate integer constant.
- M: Print as a register set suitable for ldm/stm. Also prints all register
  operands subsequent to the specified one (!), so use carefully.
- Q: Print the low-order register of a register-pair, or the low-order
  register of a two-register operand.
- R: Print the high-order register of a register-pair, or the high-order
  register of a two-register operand.
- H: Print the second register of a register-pair. (On a big-endian system,
  H is equivalent to Q, and on a little-endian system, H is equivalent to R.)
- e: Print the low doubleword register of a NEON quad register.
- f: Print the high doubleword register of a NEON quad register.
- m: Print the base register of a memory operand without the [ and ]
  adornment.
Hexagon:
- L: Print the second register of a two-register operand. Requires that it has
  been allocated consecutively to the first.
- I: Print the letter ‘i’ if the operand is an integer constant, otherwise
  nothing. Used to print ‘addi’ vs ‘add’ instructions.
MSP430:
No additional modifiers.
MIPS:
- X: Print an immediate integer as hexadecimal.
- x: Print the low 16 bits of an immediate integer as hexadecimal.
- d: Print an immediate integer as decimal.
- m: Subtract one and print an immediate integer as decimal.
- z: Print $0 if an immediate zero, otherwise print normally.
- L: Print the low-order register of a two-register operand, or prints the
  address of the low-order word of a double-word memory operand.
- M: Print the high-order register of a two-register operand, or prints the
  address of the high-order word of a double-word memory operand.
- D: Print the second register of a two-register operand, or prints the second
  word of a double-word memory operand. (On a big-endian system, D is
  equivalent to L, and on a little-endian system, D is equivalent to M.)
- w: No effect. Provided for compatibility with GCC, which requires this
  modifier in order to print MSA registers (W0-W31) with the f constraint.
NVPTX:
- r: No effect.
PowerPC:
- L: Print the second register of a two-register operand. Requires that it has
  been allocated consecutively to the first.
- I: Print the letter ‘i’ if the operand is an integer constant, otherwise
  nothing. Used to print ‘addi’ vs ‘add’ instructions.
- y: For a memory operand, prints the formatter for a two-register X-form
  instruction. (Currently always prints r0,OPERAND.)
- U: Prints ‘u’ if the memory operand is an update form, and nothing
  otherwise. (NOTE: LLVM does not support update form, so this will currently
  always print nothing.)
- X: Prints ‘x’ if the memory operand is an indexed form. (NOTE: LLVM does not
  support indexed form, so this will currently always print nothing.)
Sparc:
- r: No effect.
SystemZ:
SystemZ implements only n, and does not support any of the other
target-independent modifiers.
X86:
- c: Print an unadorned integer or symbol name. (The latter is
  target-specific behavior for this typically target-independent modifier.)
- A: Print a register name with a ‘*’ before it.
- b: Print an 8-bit register name (e.g. al); do nothing on a memory operand.
- h: Print the upper 8-bit register name (e.g. ah); do nothing on a memory
  operand.
- w: Print the 16-bit register name (e.g. ax); do nothing on a memory operand.
- k: Print the 32-bit register name (e.g. eax); do nothing on a memory
  operand.
- q: Print the 64-bit register name (e.g. rax), if 64-bit registers are
  available, otherwise the 32-bit register name; do nothing on a memory
  operand.
- n: Negate and print an unadorned integer, or, for operands other than an
  immediate integer (e.g. a relocatable symbol expression), print a ‘-’ before
  the operand. (The behavior for relocatable symbol expressions is
  target-specific behavior for this typically target-independent modifier.)
- H: Print a memory reference with additional offset +8.
- P: Print a memory reference or operand for use as the argument of a call
  instruction. (E.g. omit (rip), even though it’s PC-relative.)
XCore:
No additional modifiers.
Inline Asm Metadata¶
The call instructions that wrap inline asm nodes may have a
“!srcloc” MDNode attached to them that contains a list of constant
integers. If present, the code generator will use the integer as the
location cookie value when reporting errors through the LLVMContext
error reporting mechanisms. This allows a front-end to correlate backend
errors that occur with inline asm back to the source code that produced
it. For example:
call void asm sideeffect "something bad", ""(), !srcloc !42
...
!42 = !{ i32 1234567 }
It is up to the front-end to make sense of the magic numbers it places in the IR. If the MDNode contains multiple constants, the code generator will use the one that corresponds to the line of the asm that the error occurs on.
Metadata¶
LLVM IR allows metadata to be attached to instructions in the program that can convey extra information about the code to the optimizers and code generator. One example application of metadata is source-level debug information. There are two metadata primitives: strings and nodes.
Metadata does not have a type, and is not a value. If referenced from a
call instruction, it uses the metadata type.
All metadata are identified in syntax by an exclamation point (‘!’).
Metadata Nodes and Metadata Strings¶
A metadata string is a string surrounded by double quotes. It can
contain any character by escaping non-printable characters with
“\xx” where “xx” is the two digit hex code. For example:
“!"test\00"”.
Metadata nodes are represented with notation similar to structure constants (a comma separated list of elements, surrounded by braces and preceded by an exclamation point). Metadata nodes can have any values as their operands. For example:
!{ !"test\00", i32 10}
Metadata nodes that aren’t uniqued use the distinct keyword. For example:
!0 = distinct !{!"test\00", i32 10}
distinct nodes are useful when nodes shouldn’t be merged based on their
content. They can also occur when transformations cause uniquing collisions
when metadata operands change.
A named metadata is a collection of metadata nodes, which can be looked up in the module symbol table. For example:
!foo = !{!4, !3}
Metadata can be used as function arguments. Here the llvm.dbg.value
function is using two metadata arguments:
call void @llvm.dbg.value(metadata !24, i64 0, metadata !25)
Metadata can be attached to an instruction. Here metadata !21 is
attached to the add instruction using the !dbg identifier:
%indvar.next = add i64 %indvar, 1, !dbg !21
More information about specific metadata nodes recognized by the optimizers and code generator is found below.
Specialized Metadata Nodes¶
Specialized metadata nodes are custom data structures in metadata (as opposed to generic tuples). Their fields are labelled, and can be specified in any order.
These aren’t inherently debug info centric, but currently all the specialized metadata nodes are related to debug info.
DICompileUnit¶
DICompileUnit nodes represent a compile unit. The enums:,
retainedTypes:, subprograms:, globals: and imports: fields are
tuples containing the debug info to be emitted along with the compile unit,
regardless of code optimizations (some nodes are only emitted if there are
references to them from instructions).
!0 = !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang",
isOptimized: true, flags: "-O2", runtimeVersion: 2,
splitDebugFilename: "abc.debug", emissionKind: 1,
enums: !2, retainedTypes: !3, subprograms: !4,
globals: !5, imports: !6)
Compile unit descriptors provide the root scope for objects declared in a
specific compilation unit. File descriptors are defined using this scope.
These descriptors are collected by a named metadata !llvm.dbg.cu. They
keep track of subprograms, global variables, type information, and imported
entities (declarations and namespaces).
DIFile¶
DIFile nodes represent files. The filename: can include slashes.
!0 = !DIFile(filename: "path/to/file", directory: "/path/to/dir")
Files are sometimes used in scope: fields, and are the only valid target
for file: fields.
DIBasicType¶
DIBasicType nodes represent primitive types, such as int, bool and
float. tag: defaults to DW_TAG_base_type.
!0 = !DIBasicType(name: "unsigned char", size: 8, align: 8,
encoding: DW_ATE_unsigned_char)
!1 = !DIBasicType(tag: DW_TAG_unspecified_type, name: "decltype(nullptr)")
The encoding: describes the details of the type. Usually it’s one of the
following:
DW_ATE_address = 1
DW_ATE_boolean = 2
DW_ATE_float = 4
DW_ATE_signed = 5
DW_ATE_signed_char = 6
DW_ATE_unsigned = 7
DW_ATE_unsigned_char = 8
DISubroutineType¶
DISubroutineType nodes represent subroutine types. Their types: field
refers to a tuple; the first operand is the return type, while the rest are the
types of the formal arguments in order. If the first operand is null, that
represents a function with no return value (such as void foo() {} in C++).
!0 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
!1 = !DIBasicType(name: "char", size: 8, align: 8, encoding: DW_ATE_signed_char)
!2 = !DISubroutineType(types: !{null, !0, !1}) ; void (int, char)
DIDerivedType¶
DIDerivedType nodes represent types derived from other types, such as
qualified types.
!0 = !DIBasicType(name: "unsigned char", size: 8, align: 8,
encoding: DW_ATE_unsigned_char)
!1 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !0, size: 32,
align: 32)
The following tag: values are valid:
DW_TAG_formal_parameter = 5
DW_TAG_member = 13
DW_TAG_pointer_type = 15
DW_TAG_reference_type = 16
DW_TAG_typedef = 22
DW_TAG_ptr_to_member_type = 31
DW_TAG_const_type = 38
DW_TAG_volatile_type = 53
DW_TAG_restrict_type = 55
DW_TAG_member is used to define a member of a composite type or subprogram.
The type of the member is the baseType:. The offset: is the member’s bit
offset.
DW_TAG_formal_parameter is used to define a member which is a formal
argument of a subprogram.
DW_TAG_typedef is used to provide a name for the baseType:.
DW_TAG_pointer_type, DW_TAG_reference_type, DW_TAG_const_type,
DW_TAG_volatile_type and DW_TAG_restrict_type are used to qualify the
baseType:.
Note that the void * type is expressed as a type derived from NULL.
DICompositeType¶
DICompositeType nodes represent types composed of other types, like
structures and unions. elements: points to a tuple of the composed types.
If the source language supports ODR, the identifier: field gives the unique
identifier used for type merging between modules. When specified, other types
can refer to composite types indirectly via a metadata string that matches
their identifier.
!0 = !DIEnumerator(name: "SixKind", value: 7)
!1 = !DIEnumerator(name: "SevenKind", value: 7)
!2 = !DIEnumerator(name: "NegEightKind", value: -8)
!3 = !DICompositeType(tag: DW_TAG_enumeration_type, name: "Enum", file: !12,
line: 2, size: 32, align: 32, identifier: "_M4Enum",
elements: !{!0, !1, !2})
The following tag: values are valid:
DW_TAG_array_type = 1
DW_TAG_class_type = 2
DW_TAG_enumeration_type = 4
DW_TAG_structure_type = 19
DW_TAG_union_type = 23
DW_TAG_subroutine_type = 21
DW_TAG_inheritance = 28
For DW_TAG_array_type, the elements: should be subrange descriptors, each
representing the range of subscripts at that level of indexing. The
DIFlagVector flag to flags: indicates that an array type is a native
packed vector.
For DW_TAG_enumeration_type, the elements: should be enumerator
descriptors, each representing the definition of an enumeration value for the
set. All enumeration type descriptors are collected in the enums: field of
the compile unit.
For DW_TAG_structure_type, DW_TAG_class_type, and DW_TAG_union_type,
the elements: should be derived types with tag: DW_TAG_member or
tag: DW_TAG_inheritance.
DISubrange¶
DISubrange nodes are the elements for DW_TAG_array_type variants of
DICompositeType. count: -1 indicates an empty array.
!0 = !DISubrange(count: 5, lowerBound: 0) ; array counting from 0
!1 = !DISubrange(count: 5, lowerBound: 1) ; array counting from 1
!2 = !DISubrange(count: -1) ; empty array.
DIEnumerator¶
DIEnumerator nodes are the elements for DW_TAG_enumeration_type variants
of DICompositeType.
!0 = !DIEnumerator(name: "SixKind", value: 7)
!1 = !DIEnumerator(name: "SevenKind", value: 7)
!2 = !DIEnumerator(name: "NegEightKind", value: -8)
DITemplateTypeParameter¶
DITemplateTypeParameter nodes represent type parameters to generic source
language constructs. They are used (optionally) in DICompositeType and
DISubprogram templateParams: fields.
!0 = !DITemplateTypeParameter(name: "Ty", type: !1)
DITemplateValueParameter¶
DITemplateValueParameter nodes represent value parameters to generic source
language constructs. tag: defaults to DW_TAG_template_value_parameter,
but if specified can also be set to DW_TAG_GNU_template_template_param or
DW_TAG_GNU_template_param_pack. They are used (optionally) in
DICompositeType and DISubprogram templateParams: fields.
!0 = !DITemplateValueParameter(name: "Ty", type: !1, value: i32 7)
DINamespace¶
DINamespace nodes represent namespaces in the source language.
!0 = !DINamespace(name: "myawesomeproject", scope: !1, file: !2, line: 7)
DIGlobalVariable¶
DIGlobalVariable nodes represent global variables in the source language.
!0 = !DIGlobalVariable(name: "foo", linkageName: "foo", scope: !1,
file: !2, line: 7, type: !3, isLocal: true,
isDefinition: false, variable: i32* @foo,
declaration: !4)
All global variables should be referenced by the globals: field of a compile unit.
DISubprogram¶
DISubprogram nodes represent functions from the source language. The
variables: field points at variables that must be retained, even if their
IR counterparts are optimized out of the IR. The type: field must point at
a DISubroutineType.
!0 = !DISubprogram(name: "foo", linkageName: "_Zfoov", scope: !1,
file: !2, line: 7, type: !3, isLocal: true,
isDefinition: false, scopeLine: 8, containingType: !4,
virtuality: DW_VIRTUALITY_pure_virtual, virtualIndex: 10,
flags: DIFlagPrototyped, isOptimized: true,
function: void ()* @_Z3foov,
templateParams: !5, declaration: !6, variables: !7)
DILexicalBlock¶
DILexicalBlock nodes describe nested blocks within a subprogram. The line
number and column numbers are used to distinguish two lexical blocks at the
same depth. They are valid targets for scope: fields.
!0 = distinct !DILexicalBlock(scope: !1, file: !2, line: 7, column: 35)
Usually lexical blocks are distinct to prevent node merging based on
operands.
DILexicalBlockFile¶
DILexicalBlockFile nodes are used to discriminate between sections of a
lexical block. The file: field can be changed to indicate textual inclusion,
or the discriminator: field can be used to discriminate between control flow
within a single block in the source language.
!0 = !DILexicalBlock(scope: !3, file: !4, line: 7, column: 35)
!1 = !DILexicalBlockFile(scope: !0, file: !4, discriminator: 0)
!2 = !DILexicalBlockFile(scope: !0, file: !4, discriminator: 1)
DILocation¶
DILocation nodes represent source debug locations. The scope: field is
mandatory, and points at a DILexicalBlockFile, a DILexicalBlock, or a
DISubprogram.
!0 = !DILocation(line: 2900, column: 42, scope: !1, inlinedAt: !2)
DILocalVariable¶
DILocalVariable nodes represent local variables in the source language.
Instead of DW_TAG_variable, they use LLVM-specific fake tags to
discriminate between local variables (DW_TAG_auto_variable) and subprogram
arguments (DW_TAG_arg_variable). In the latter case, the arg: field
specifies the argument position, and this variable will be included in the
variables: field of its DISubprogram.
!0 = !DILocalVariable(tag: DW_TAG_arg_variable, name: "this", arg: 0,
scope: !3, file: !2, line: 7, type: !3,
flags: DIFlagArtificial)
!1 = !DILocalVariable(tag: DW_TAG_arg_variable, name: "x", arg: 1,
scope: !4, file: !2, line: 7, type: !3)
!2 = !DILocalVariable(tag: DW_TAG_auto_variable, name: "y",
scope: !5, file: !2, line: 7, type: !3)
DIExpression¶
DIExpression nodes represent DWARF expression sequences. They are used in
debug intrinsics (such as llvm.dbg.declare) to describe how the referenced
LLVM variable relates to the source language variable.
The current supported vocabulary is limited:
- DW_OP_deref dereferences the working expression.
- DW_OP_plus, 93 adds 93 to the working expression.
- DW_OP_bit_piece, 16, 8 specifies the offset and size (16 and 8 here,
  respectively) of the variable piece from the working expression.
!0 = !DIExpression(DW_OP_deref)
!1 = !DIExpression(DW_OP_plus, 3)
!2 = !DIExpression(DW_OP_bit_piece, 3, 7)
!3 = !DIExpression(DW_OP_deref, DW_OP_plus, 3, DW_OP_bit_piece, 3, 7)
DIObjCProperty¶
DIObjCProperty nodes represent Objective-C property nodes.
!3 = !DIObjCProperty(name: "foo", file: !1, line: 7, setter: "setFoo",
getter: "getFoo", attributes: 7, type: !2)
DIImportedEntity¶
DIImportedEntity nodes represent entities (such as modules) imported into a
compile unit.
!2 = !DIImportedEntity(tag: DW_TAG_imported_module, name: "foo", scope: !0,
entity: !1, line: 7)
‘tbaa’ Metadata¶
In LLVM IR, memory does not have types, so LLVM’s own type system is not suitable for doing TBAA. Instead, metadata is added to the IR to describe a type system of a higher level language. This can be used to implement typical C/C++ TBAA, but it can also be used to implement custom alias analysis behavior for other languages.
The current metadata format is very simple. TBAA metadata nodes have up to three fields, e.g.:
!0 = !{ !"an example type tree" }
!1 = !{ !"int", !0 }
!2 = !{ !"float", !0 }
!3 = !{ !"const float", !2, i64 1 }
The first field is an identity field. It can be any value, usually a metadata string, which uniquely identifies the type. The most important name in the tree is the name of the root node. Two trees with different root node names are entirely disjoint, even if they have leaves with common names.
The second field identifies the type’s parent node in the tree, or is null or omitted for a root node. A type is considered to alias all of its descendants and all of its ancestors in the tree. Also, a type is considered to alias all types in other trees, so that bitcode produced from multiple front-ends is handled conservatively.
If the third field is present, it’s an integer which, if equal to 1,
indicates that the type is “constant” (meaning pointsToConstantMemory
should return true; see other useful AliasAnalysis methods).
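For example, a load is tagged by attaching !tbaa metadata referencing one of the type nodes defined above; under the tree shown earlier, these two accesses (with illustrative pointer operands) would be considered not to alias:
%i = load i32, i32* %ip, align 4, !tbaa !1
%f = load float, float* %fp, align 4, !tbaa !2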
‘tbaa.struct’ Metadata¶
The llvm.memcpy intrinsic is often used to implement aggregate assignment operations in C and similar languages. However, it is defined to copy a contiguous region of memory, which is more than strictly necessary for aggregate types which contain holes due to padding. Also, it doesn’t contain any TBAA information about the fields of the aggregate.
!tbaa.struct metadata can describe which memory subregions in a memcpy are
padding and what the TBAA tags of the struct are.
The current metadata format is very simple. !tbaa.struct metadata
nodes are a list of operands which are in conceptual groups of three.
For each group of three, the first operand gives the offset of a
field in bytes, the second gives its size in bytes, and the third gives
its tbaa tag. e.g.:
!4 = !{ i64 0, i64 4, !1, i64 8, i64 4, !2 }
This describes a struct with two fields. The first is at offset 0 bytes with size 4 bytes, and has tbaa tag !1. The second is at offset 8 bytes and has size 4 bytes and has tbaa tag !2.
Note that the fields need not be contiguous. In this example, there is a 4 byte gap between the two fields. This gap represents padding which does not carry useful data and need not be preserved.
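As a sketch, the !4 descriptor above would be attached to the memcpy call implementing the aggregate copy; the intrinsic variant and operands here are illustrative:
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 12, i32 4, i1 false), !tbaa.struct !4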
‘noalias’ and ‘alias.scope’ Metadata¶
noalias and alias.scope metadata provide the ability to specify generic
noalias memory-access sets. This means that some collection of memory access
instructions (loads, stores, memory-accessing calls, etc.) that carry
noalias metadata can specifically be specified not to alias with some other
collection of memory access instructions that carry alias.scope metadata.
Each type of metadata specifies a list of scopes where each scope has an id and
a domain. When evaluating an aliasing query, if for some domain, the set of
scopes with that domain in one instruction’s alias.scope list is a subset of
(or equal to) the set of scopes for that domain in another instruction’s
noalias list, then the two memory accesses are assumed not to alias.
The metadata identifying each domain is itself a list containing one or two entries. The first entry is the name of the domain. Note that if the name is a string then it can be combined across functions and translation units. A self-reference can be used to create globally unique domain names. A descriptive string may optionally be provided as a second list entry.
The metadata identifying each scope is also itself a list containing two or three entries. The first entry is the name of the scope. Note that if the name is a string then it can be combined across functions and translation units. A self-reference can be used to create globally unique scope names. A metadata reference to the scope’s domain is the second entry. A descriptive string may optionally be provided as a third list entry.
For example,
; Two scope domains:
!0 = !{!0}
!1 = !{!1}
; Some scopes in these domains:
!2 = !{!2, !0}
!3 = !{!3, !0}
!4 = !{!4, !1}
; Some scope lists:
!5 = !{!4} ; A list containing only scope !4
!6 = !{!4, !3, !2}
!7 = !{!3}
; These two instructions don't alias:
%0 = load float, float* %c, align 4, !alias.scope !5
store float %0, float* %arrayidx.i, align 4, !noalias !5
; These two instructions also don't alias (for domain !1, the set of scopes
; in the !alias.scope equals that in the !noalias list):
%2 = load float, float* %c, align 4, !alias.scope !5
store float %2, float* %arrayidx.i2, align 4, !noalias !6
; These two instructions may alias (for domain !0, the set of scopes in
; the !noalias list is not a superset of, or equal to, the scopes in the
; !alias.scope list):
%2 = load float, float* %c, align 4, !alias.scope !6
store float %0, float* %arrayidx.i, align 4, !noalias !7
‘fpmath’ Metadata¶
fpmath metadata may be attached to any instruction of floating point
type. It can be used to express the maximum acceptable error in the
result of that instruction, in ULPs, thus potentially allowing the
compiler to use a more efficient but less accurate method of computing
it. ULP is defined as follows:
If x is a real number that lies between two finite consecutive
floating-point numbers a and b, without being equal to one of them,
then ulp(x) = |b - a|; otherwise ulp(x) is the distance between the
two non-equal finite floating-point numbers nearest x. Moreover,
ulp(NaN) is NaN.
The metadata node shall consist of a single positive floating point number representing the maximum relative error, for example:
!0 = !{ float 2.5 } ; maximum acceptable inaccuracy is 2.5 ULPs
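The node is then attached to a floating point instruction; in this illustrative example, the division may be computed with up to 2.5 ULPs of error:
%r = fdiv float %a, %b, !fpmath !0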
‘range’ Metadata¶
range metadata may be attached only to load, call and invoke of
integer types. It expresses the possible ranges the loaded value or the value
returned by the called function at this call site is in. The ranges are
represented with a flattened list of integers. The loaded value or the value
returned is known to be in the union of the ranges defined by each consecutive
pair. Each pair has the following properties:
- The type must match the type loaded by the instruction.
- The pair a,b represents the range [a,b).
- Both a and b are constants.
- The range is allowed to wrap.
- The range should not represent the full or empty set. That is, a!=b.
In addition, the pairs must be in signed order of the lower bound and they must be non-contiguous.
Examples:
%a = load i8, i8* %x, align 1, !range !0 ; Can only be 0 or 1
%b = load i8, i8* %y, align 1, !range !1 ; Can only be 255 (-1), 0 or 1
%c = call i8 @foo(), !range !2 ; Can only be 0, 1, 3, 4 or 5
%d = invoke i8 @bar() to label %cont
unwind label %lpad, !range !3 ; Can only be -2, -1, 3, 4 or 5
...
!0 = !{ i8 0, i8 2 }
!1 = !{ i8 255, i8 2 }
!2 = !{ i8 0, i8 2, i8 3, i8 6 }
!3 = !{ i8 -2, i8 0, i8 3, i8 6 }
‘llvm.loop’¶
It is sometimes useful to attach information to loop constructs. Currently,
loop metadata is implemented as metadata attached to the branch instruction
in the loop latch block. This type of metadata refers to a metadata node that
is guaranteed to be separate for each loop. The loop identifier metadata is
specified with the name llvm.loop.
The loop identifier metadata is implemented using a metadata that refers to itself to avoid merging it with any other identifier metadata, e.g., during module linkage or function inlining. That is, each loop should refer to its own identification metadata even if it resides in a separate function. The following example contains loop identifier metadata for two separate loop constructs:
!0 = !{!0}
!1 = !{!1}
The loop identifier metadata can be used to specify additional
per-loop metadata. Any operands after the first operand can be treated
as user-defined metadata. For example the llvm.loop.unroll.count
metadata suggests an unroll factor to the loop unroller:
br i1 %exitcond, label %._crit_edge, label %.lr.ph, !llvm.loop !0
...
!0 = !{!0, !1}
!1 = !{!"llvm.loop.unroll.count", i32 4}
‘llvm.loop.vectorize’ and ‘llvm.loop.interleave’¶
Metadata prefixed with llvm.loop.vectorize or llvm.loop.interleave are
used to control per-loop vectorization and interleaving parameters such as
vectorization width and interleave count. These metadata should be used in
conjunction with llvm.loop loop identification metadata. The
llvm.loop.vectorize and llvm.loop.interleave metadata are only
optimization hints and the optimizer will only interleave and vectorize loops
if it believes it is safe to do so. The llvm.mem.parallel_loop_access
metadata, which contains information about loop-carried memory dependencies,
can be helpful in determining the safety of these transformations.
‘llvm.loop.interleave.count’ Metadata¶
This metadata suggests an interleave count to the loop interleaver.
The first operand is the string llvm.loop.interleave.count and the
second operand is an integer specifying the interleave count. For
example:
!0 = !{!"llvm.loop.interleave.count", i32 4}
Note that setting llvm.loop.interleave.count to 1 disables interleaving
multiple iterations of the loop. If llvm.loop.interleave.count is set to 0
then the interleave count will be determined automatically.
‘llvm.loop.vectorize.enable’ Metadata¶
This metadata selectively enables or disables vectorization for the loop. The
first operand is the string llvm.loop.vectorize.enable and the second
operand is a bit. If the bit operand value is 1 vectorization is enabled. A
value of 0 disables vectorization:
!0 = !{!"llvm.loop.vectorize.enable", i1 0}
!1 = !{!"llvm.loop.vectorize.enable", i1 1}
‘llvm.loop.vectorize.width’ Metadata¶
This metadata sets the target width of the vectorizer. The first
operand is the string llvm.loop.vectorize.width and the second
operand is an integer specifying the width. For example:
!0 = !{!"llvm.loop.vectorize.width", i32 4}
Note that setting llvm.loop.vectorize.width to 1 disables
vectorization of the loop. If llvm.loop.vectorize.width is set to
0 or if the loop does not have this metadata the width will be
determined automatically.
‘llvm.loop.unroll’¶
Metadata prefixed with llvm.loop.unroll are loop unrolling
optimization hints such as the unroll factor. llvm.loop.unroll
metadata should be used in conjunction with llvm.loop loop
identification metadata. The llvm.loop.unroll metadata are only
optimization hints and the unrolling will only be performed if the
optimizer believes it is safe to do so.
‘llvm.loop.unroll.count’ Metadata¶
This metadata suggests an unroll factor to the loop unroller. The
first operand is the string llvm.loop.unroll.count and the second
operand is a positive integer specifying the unroll factor. For
example:
!0 = !{!"llvm.loop.unroll.count", i32 4}
If the trip count of the loop is less than the unroll count the loop will be partially unrolled.
‘llvm.loop.unroll.disable’ Metadata¶
This metadata disables loop unrolling. The metadata has a single operand
which is the string llvm.loop.unroll.disable. For example:
!0 = !{!"llvm.loop.unroll.disable"}
‘llvm.loop.unroll.runtime.disable’ Metadata¶
This metadata disables runtime loop unrolling. The metadata has a single
operand which is the string llvm.loop.unroll.runtime.disable. For example:
!0 = !{!"llvm.loop.unroll.runtime.disable"}
‘llvm.loop.unroll.full’ Metadata¶
This metadata suggests that the loop should be unrolled fully. The
metadata has a single operand which is the string llvm.loop.unroll.full.
For example:
!0 = !{!"llvm.loop.unroll.full"}
‘llvm.mem’¶
Metadata types used to annotate memory accesses with information helpful
for optimizations are prefixed with llvm.mem.
.
‘llvm.mem.parallel_loop_access’ Metadata¶
The llvm.mem.parallel_loop_access metadata refers to a loop identifier,
or metadata containing a list of loop identifiers for nested loops.
The metadata is attached to memory accessing instructions and denotes that
no loop carried memory dependence exists between it and other instructions
denoted with the same loop identifier.
Precisely, given two instructions m1 and m2 that both have the
llvm.mem.parallel_loop_access metadata, with L1 and L2 being the sets
of loops associated with that metadata, respectively, then there is no loop
carried dependence between m1 and m2 for loops in both L1 and L2.
As a special case, if all memory accessing instructions in a loop have
llvm.mem.parallel_loop_access metadata that refers to that loop, then the
loop has no loop carried memory dependences and is considered to be a parallel
loop.
Note that if not all memory access instructions have such metadata referring to the loop, then the loop is not considered trivially parallel. Additional memory dependence analysis is required to make that determination. As a fail-safe mechanism, this causes loops that were originally parallel to be considered sequential (if optimization passes that are unaware of the parallel semantics insert new memory instructions into the loop body).
Example of a loop that is considered parallel due to its correct use of
both llvm.loop and llvm.mem.parallel_loop_access metadata types that
refer to the same loop identifier metadata:
for.body:
...
%val0 = load i32, i32* %arrayidx, !llvm.mem.parallel_loop_access !0
...
store i32 %val0, i32* %arrayidx1, !llvm.mem.parallel_loop_access !0
...
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0
for.end:
...
!0 = !{!0}
It is also possible to have nested parallel loops. In that case the memory accesses refer to a list of loop identifier metadata nodes instead of the loop identifier metadata node directly:
outer.for.body:
...
%val1 = load i32, i32* %arrayidx3, !llvm.mem.parallel_loop_access !2
...
br label %inner.for.body
inner.for.body:
...
%val0 = load i32, i32* %arrayidx1, !llvm.mem.parallel_loop_access !0
...
store i32 %val0, i32* %arrayidx2, !llvm.mem.parallel_loop_access !0
...
br i1 %exitcond, label %inner.for.end, label %inner.for.body, !llvm.loop !1
inner.for.end:
...
store i32 %val1, i32* %arrayidx4, !llvm.mem.parallel_loop_access !2
...
br i1 %exitcond, label %outer.for.end, label %outer.for.body, !llvm.loop !2
outer.for.end: ; preds = %for.body
...
!0 = !{!1, !2} ; a list of loop identifiers
!1 = !{!1} ; an identifier for the inner loop
!2 = !{!2} ; an identifier for the outer loop
‘llvm.bitsets’¶
The llvm.bitsets global metadata is used to implement bitsets.
Module Flags Metadata¶
Information about the module as a whole is difficult to convey to LLVM’s
subsystems. The LLVM IR isn’t sufficient to transmit this information.
The llvm.module.flags named metadata exists in order to facilitate
this. These flags are in the form of key / value pairs – much like a
dictionary – making it easy for any subsystem that cares about a flag to
look it up.
The llvm.module.flags metadata contains a list of metadata triplets.
Each triplet has the following form:
- The first element is a behavior flag, which specifies the behavior when two (or more) modules are merged together, and it encounters two (or more) metadata with the same ID. The supported behaviors are described below.
- The second element is a metadata string that is a unique ID for the metadata. Each module may only have one flag entry for each unique ID (not including entries with the Require behavior).
- The third element is the value of the flag.
When two (or more) modules are merged together, the resulting
llvm.module.flags metadata is the union of the modules’ flags. That is, for
each unique metadata ID string, there will be exactly one entry in the merged
modules’ llvm.module.flags metadata table, and the value for that entry will
be determined by the merge behavior flag, as described below. The only
exception is that entries with the Require behavior are always preserved.
The following behaviors are supported:
Value | Behavior |
---|---|
1 | Error: Emits an error if two values disagree; otherwise, the resulting value is that of the operands. |
2 | Warning: Emits a warning if two values disagree. The result value will be the operand for the flag from the first module being linked. |
3 | Require: Adds a requirement that another module flag be present and have a specified value after linking is performed. The value must be a metadata pair, where the first element of the pair is the ID of the module flag to be restricted, and the second element is the value to which the module flag should be restricted. This behavior can be used to restrict the allowable results (via triggering of an error) of linking IDs with the Error behavior. |
4 | Override: Uses the specified value, regardless of the behavior or value of the other module. If both modules specify Override, but the values differ, an error will be emitted. |
5 | Append: Appends the two values, which are required to be metadata nodes. |
6 | AppendUnique: Appends the two values, which are required to be metadata nodes. However, duplicate entries in the second list are dropped during the append operation. |
It is an error for a particular unique flag ID to have multiple behaviors, except in the case of Require (which adds restrictions on another metadata value) or Override.
An example of module flags:
!0 = !{ i32 1, !"foo", i32 1 }
!1 = !{ i32 4, !"bar", i32 37 }
!2 = !{ i32 2, !"qux", i32 42 }
!3 = !{ i32 3, !"qux",
  !{
    !"foo", i32 1
  }
}
!llvm.module.flags = !{ !0, !1, !2, !3 }
- Metadata !0 has the ID !"foo" and the value ‘1’. The behavior if two or
  more !"foo" flags are seen is to emit an error if their values are not
  equal.
- Metadata !1 has the ID !"bar" and the value ‘37’. The behavior if two or
  more !"bar" flags are seen is to use the value ‘37’.
- Metadata !2 has the ID !"qux" and the value ‘42’. The behavior if two or
  more !"qux" flags are seen is to emit a warning if their values are not
  equal.
- Metadata !3 has the ID !"qux" and the value !{ !"foo", i32 1 }. The
  behavior is to emit an error if the llvm.module.flags does not contain a
  flag with the ID !"foo" that has the value ‘1’ after linking is performed.
Objective-C Garbage Collection Module Flags Metadata¶
On the Mach-O platform, Objective-C stores metadata about garbage collection in a special section called “image info”. The metadata consists of a version number and a bitmask specifying what types of garbage collection are supported (if any) by the file. If two or more modules are linked together their garbage collection metadata needs to be merged rather than appended together.
The Objective-C garbage collection module flags metadata consists of the following key-value pairs:
Key | Value |
---|---|
Objective-C Version |
[Required] — The Objective-C ABI version. Valid values are 1 and 2. |
Objective-C Image Info Version |
[Required] — The version of the image info section. Currently always 0. |
Objective-C Image Info Section |
[Required] — The section to place the metadata. Valid values are
"__OBJC, __image_info, regular" for Objective-C ABI version 1, and
"__DATA,__objc_imageinfo, regular, no_dead_strip" for
Objective-C ABI version 2. |
Objective-C Garbage Collection |
[Required] — Specifies whether garbage collection is supported or not. Valid values are 0, for no garbage collection, and 2, for garbage collection supported. |
Objective-C GC Only |
[Optional] — Specifies that only garbage collection is supported.
If present, its value must be 6. This flag requires that the
Objective-C Garbage Collection flag have the value 2. |
Some important flag interactions:
- If a module with Objective-C Garbage Collection set to 0 is merged with a
  module with Objective-C Garbage Collection set to 2, then the resulting
  module has the Objective-C Garbage Collection flag set to 0.
- A module with Objective-C Garbage Collection set to 0 cannot be merged
  with a module with Objective-C GC Only set to 6.
Automatic Linker Flags Module Flags Metadata¶
Some targets support embedding flags for the linker inside individual object files. Typically this is used in conjunction with language extensions which allow source files to explicitly declare the libraries they depend on, and have these automatically be transmitted to the linker via object files.
These flags are encoded in the IR using metadata in the module flags section,
using the Linker Options key. The merge behavior for this flag is required
to be AppendUnique, and the value for the key is expected to be a metadata
node which should be a list of other metadata nodes, each of which should be a
list of metadata strings defining linker options.
For example, the following metadata section specifies two separate sets of
linker options, presumably to link against libz and the Cocoa framework:
!0 = !{ i32 6, !"Linker Options",
!{
!{ !"-lz" },
!{ !"-framework", !"Cocoa" } } }
!llvm.module.flags = !{ !0 }
The metadata encoding as lists of lists of options, as opposed to a collapsed list of options, is chosen so that the IR encoding can use multiple option strings to specify e.g., a single library, while still having that specifier be preserved as an atomic element that can be recognized by a target specific assembly writer or object file emitter.
Each individual option is required to be either a valid option for the target’s linker, or an option that is reserved by the target specific assembly writer or object file emitter. No other aspect of these options is defined by the IR.
C type width Module Flags Metadata¶
The ARM backend emits a section into each generated object file describing the options that it was compiled with (in a compiler-independent way) to prevent linking incompatible objects, and to allow automatic library selection. Some of these options are not visible at the IR level, namely wchar_t width and enum width.
To pass this information to the backend, these options are encoded in module flags metadata, using the following key-value pairs:
Key | Value |
---|---|
short_wchar | Boolean value: whether wchar_t is a short (2-byte) type rather than 4 bytes, corresponding to -fshort-wchar. |
short_enum | Boolean value: whether the underlying type of an enum is the smallest integer type that can represent all of its values, rather than at least an int, corresponding to -fshort-enums. |
For example, the following metadata section specifies that the module was
compiled with a wchar_t
width of 4 bytes, and the underlying type of an
enum is the smallest type which can represent all of its values:
!llvm.module.flags = !{!0, !1}
!0 = !{i32 1, !"short_wchar", i32 1}
!1 = !{i32 1, !"short_enum", i32 0}
Intrinsic Global Variables¶
LLVM has a number of “magic” global variables that contain data that
affect code generation or other IR semantics. These are documented here.
All globals of this sort should have a section specified as
“llvm.metadata”. This section and all globals that start with
“llvm.” are reserved for use by LLVM.
The ‘llvm.used’ Global Variable¶
The @llvm.used global is an array which has appending linkage. This array
contains a list of pointers to named global variables, functions and aliases
which may optionally have a pointer cast formed of bitcast or getelementptr.
For example, a legal use of it is:
@X = global i8 4
@Y = global i32 123
@llvm.used = appending global [2 x i8*] [
i8* @X,
i8* bitcast (i32* @Y to i8*)
], section "llvm.metadata"
If a symbol appears in the @llvm.used list, then the compiler, assembler, and linker are required to treat the symbol as if there is a reference to the symbol that it cannot see (which is why they have to be named). For example, if a variable has internal linkage and no references other than that from the @llvm.used list, it cannot be deleted. This is commonly used to represent references from inline asms and other things the compiler cannot “see”, and corresponds to “__attribute__((used))” in GNU C.
On some targets, the code generator must emit a directive to the assembler or object file to prevent the assembler and linker from removing or renaming the symbol.
The ‘llvm.compiler.used‘ Global Variable¶
The @llvm.compiler.used directive is the same as the @llvm.used directive, except that it only prevents the compiler from touching the symbol. On targets that support it, this allows an intelligent linker to optimize references to the symbol without being impeded as it would be by @llvm.used.
This construct should only be used in rare circumstances, and should not be exposed to source languages.
The ‘llvm.global_ctors‘ Global Variable¶
%0 = type { i32, void ()*, i8* }
@llvm.global_ctors = appending global [1 x %0] [%0 { i32 65535, void ()* @ctor, i8* @data }]
The @llvm.global_ctors array contains a list of constructor functions, priorities, and an optional associated global or function. The functions referenced by this array will be called in ascending order of priority (i.e. lowest first) when the module is loaded. The order of functions with the same priority is not defined.
If the third field is present, non-null, and points to a global variable or function, the initializer function will only run if the associated data from the current module is not discarded.
The ‘llvm.global_dtors‘ Global Variable¶
%0 = type { i32, void ()*, i8* }
@llvm.global_dtors = appending global [1 x %0] [%0 { i32 65535, void ()* @dtor, i8* @data }]
The @llvm.global_dtors array contains a list of destructor functions, priorities, and an optional associated global or function. The functions referenced by this array will be called in descending order of priority (i.e. highest first) when the module is unloaded. The order of functions with the same priority is not defined.
If the third field is present, non-null, and points to a global variable or function, the destructor function will only run if the associated data from the current module is not discarded.
Instruction Reference¶
The LLVM instruction set consists of several different classifications of instructions: terminator instructions, binary instructions, bitwise binary instructions, memory instructions, and other instructions.
Terminator Instructions¶
As mentioned previously, every basic block in a program ends with a “Terminator” instruction, which indicates which block should be executed after the current block is finished. These terminator instructions typically yield a ‘void‘ value: they produce control flow, not values (the one exception being the ‘invoke‘ instruction).
The terminator instructions are: ‘ret‘, ‘br‘, ‘switch‘, ‘indirectbr‘, ‘invoke‘, ‘resume‘, and ‘unreachable‘.
‘ret‘ Instruction¶
Syntax:¶
ret <type> <value> ; Return a value from a non-void function
ret void ; Return from void function
Overview:¶
The ‘ret‘ instruction is used to return control flow (and optionally a value) from a function back to the caller.
There are two forms of the ‘ret‘ instruction: one that returns a value and then causes control flow, and one that just causes control flow to occur.
Arguments:¶
The ‘ret‘ instruction optionally accepts a single argument, the return value. The type of the return value must be a ‘first class‘ type.
A function is not well formed if it has a non-void return type and contains a ‘ret‘ instruction with no return value or a return value with a type that does not match its type, or if it has a void return type and contains a ‘ret‘ instruction with a return value.
Semantics:¶
When the ‘ret‘ instruction is executed, control flow returns back to the calling function’s context. If the caller is a “call” instruction, execution continues at the instruction after the call. If the caller was an “invoke” instruction, execution continues at the beginning of the “normal” destination block. If the instruction returns a value, that value shall set the call or invoke instruction’s return value.
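For illustration, a few well-formed ‘ret‘ instructions (the values shown are arbitrary):
ret i32 5                       ; Return an integer value of 5
ret void                        ; Return from a void function
ret { i32, i8 } { i32 4, i8 2 } ; Return a struct of values 4 and 2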
‘br‘ Instruction¶
Overview:¶
The ‘br‘ instruction is used to cause control flow to transfer to a different basic block in the current function. There are two forms of this instruction, corresponding to a conditional branch and an unconditional branch.
Arguments:¶
The conditional branch form of the ‘br‘ instruction takes a single ‘i1‘ value and two ‘label‘ values. The unconditional form of the ‘br‘ instruction takes a single ‘label‘ value as a target.
Semantics:¶
Upon execution of a conditional ‘br‘ instruction, the ‘i1‘ argument is evaluated. If the value is true, control flows to the ‘iftrue‘ label argument. If “cond” is false, control flows to the ‘iffalse‘ label argument.
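A brief illustrative sketch of both forms (the %a and %b operands and the labels are hypothetical):
Test:
%cond = icmp eq i32 %a, %b
br i1 %cond, label %IfEqual, label %IfUnequal
IfEqual:
ret i32 1
IfUnequal:
ret i32 0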
‘switch‘ Instruction¶
Overview:¶
The ‘switch‘ instruction is used to transfer control flow to one of several different places. It is a generalization of the ‘br‘ instruction, allowing a branch to occur to one of many possible destinations.
Arguments:¶
The ‘switch‘ instruction uses three parameters: an integer comparison value ‘value‘, a default ‘label‘ destination, and an array of pairs of comparison value constants and ‘label‘s. The table is not allowed to contain duplicate constant entries.
Semantics:¶
The switch
instruction specifies a table of values and destinations.
When the ‘switch
‘ instruction is executed, this table is searched
for the given value. If the value is found, control flow is transferred
to the corresponding destination; otherwise, control flow is transferred
to the default destination.
Implementation:¶
Depending on properties of the target machine and the particular switch instruction, this instruction may be code generated in different ways. For example, it could be generated as a series of chained conditional branches or with a lookup table.
Example:¶
; Emulate a conditional br instruction
%Val = zext i1 %value to i32
switch i32 %Val, label %truedest [ i32 0, label %falsedest ]
; Emulate an unconditional br instruction
switch i32 0, label %dest [ ]
; Implement a jump table:
switch i32 %val, label %otherwise [ i32 0, label %onzero
i32 1, label %onone
i32 2, label %ontwo ]
‘indirectbr‘ Instruction¶
Overview:¶
The ‘indirectbr‘ instruction implements an indirect branch to a label within the current function, whose address is specified by “address”. Address must be derived from a blockaddress constant.
Arguments:¶
The ‘address‘ argument is the address of the label to jump to. The rest of the arguments indicate the full set of possible destinations that the address may point to. Blocks are allowed to occur multiple times in the destination list, though this isn’t particularly useful.
This destination list is required so that dataflow analysis has an accurate understanding of the CFG.
Semantics:¶
Control transfers to the block specified in the address argument. All possible destination blocks must be listed in the label list, otherwise this instruction has undefined behavior. This implies that jumps to labels defined in other functions have undefined behavior as well.
Implementation:¶
This is typically implemented with a jump through a register.
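A brief illustrative sketch (the %Addr operand and block labels are hypothetical):
indirectbr i8* %Addr, [ label %bb1, label %bb2, label %bb3 ]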
‘invoke‘ Instruction¶
Syntax:¶
<result> = invoke [cconv] [ret attrs] <ptr to function ty> <function ptr val>(<function args>) [fn attrs]
to label <normal label> unwind label <exception label>
Overview:¶
The ‘invoke‘ instruction causes control to transfer to a specified function, with the possibility of control flow transfer to either the ‘normal‘ label or the ‘exception‘ label. If the callee function returns with the “ret” instruction, control flow will return to the “normal” label. If the callee (or any indirect callees) returns via the “resume” instruction or other exception handling mechanism, control is interrupted and continued at the dynamically nearest “exception” label.
The ‘exception‘ label is a landing pad for the exception. As such, the ‘exception‘ label is required to have the “landingpad” instruction, which contains the information about the behavior of the program after unwinding happens, as its first non-PHI instruction. The restrictions on the “landingpad” instruction tightly couple it to the “invoke” instruction, so that the important information contained within the “landingpad” instruction can’t be lost through normal code motion.
Arguments:¶
This instruction requires several arguments:
- The optional “cconv” marker indicates which calling convention the call should use. If none is specified, the call defaults to using C calling conventions.
- The optional Parameter Attributes list for return values. Only ‘zeroext‘, ‘signext‘, and ‘inreg‘ attributes are valid here.
- ‘ptr to function ty‘: shall be the signature of the pointer to function value being invoked. In most cases, this is a direct function invocation, but indirect ‘invoke‘s are just as possible, branching off an arbitrary pointer to function value.
- ‘function ptr val‘: An LLVM value containing a pointer to a function to be invoked.
- ‘function args‘: argument list whose types match the function signature argument types and parameter attributes. All arguments must be of first class type. If the function signature indicates the function accepts a variable number of arguments, the extra arguments can be specified.
- ‘normal label‘: the label reached when the called function executes a ‘ret‘ instruction.
- ‘exception label‘: the label reached when a callee returns via the resume instruction or other exception handling mechanism.
- The optional function attributes list. Only ‘noreturn‘, ‘nounwind‘, ‘readonly‘ and ‘readnone‘ attributes are valid here.
Semantics:¶
This instruction is designed to operate as a standard ‘call‘ instruction in most regards. The primary difference is that it establishes an association with a label, which is used by the runtime library to unwind the stack.
This instruction is used in languages with destructors to ensure that proper cleanup is performed in the case of either a longjmp or a thrown exception. Additionally, this is important for implementation of ‘catch‘ clauses in high-level languages that support them.
For the purposes of the SSA form, the definition of the value returned by the ‘invoke‘ instruction is deemed to occur on the edge from the current block to the “normal” label. If the callee unwinds then no return value is available.
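A brief illustrative sketch (the @Test callee and the labels are hypothetical):
%retval = invoke i32 @Test(i32 15) to label %Continue
unwind label %TestCleanup     ; i32:retval set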
‘resume‘ Instruction¶
Arguments:¶
The ‘resume‘ instruction requires one argument, which must have the same type as the result of any ‘landingpad‘ instruction in the same function.
Semantics:¶
The ‘resume‘ instruction resumes propagation of an existing (in-flight) exception whose unwinding was interrupted with a landingpad instruction.
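A brief illustrative sketch, assuming a landingpad result of type { i8*, i32 } held in a hypothetical %exn:
resume { i8*, i32 } %exn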
‘unreachable‘ Instruction¶
Overview:¶
The ‘unreachable‘ instruction has no defined semantics. This instruction is used to inform the optimizer that a particular portion of the code is not reachable. This can be used to indicate that the code after a no-return function cannot be reached, and other facts.
Semantics:¶
The ‘unreachable‘ instruction has no defined semantics.
Binary Operations¶
Binary operators are used to do most of the computation in a program. They require two operands of the same type, execute an operation on them, and produce a single value. The operands might represent multiple data, as is the case with the vector data type. The result value has the same type as its operands.
There are several different binary operators:
‘add‘ Instruction¶
Syntax:¶
<result> = add <ty> <op1>, <op2> ; yields ty:result
<result> = add nuw <ty> <op1>, <op2> ; yields ty:result
<result> = add nsw <ty> <op1>, <op2> ; yields ty:result
<result> = add nuw nsw <ty> <op1>, <op2> ; yields ty:result
Arguments:¶
The two arguments to the ‘add‘ instruction must be integer or vector of integer values. Both arguments must have identical types.
Semantics:¶
The value produced is the integer sum of the two operands.
If the sum has unsigned overflow, the result returned is the mathematical result modulo 2^n, where n is the bit width of the result.
Because LLVM integers use a two’s complement representation, this instruction is appropriate for both signed and unsigned integers.
nuw and nsw stand for “No Unsigned Wrap” and “No Signed Wrap”, respectively. If the nuw and/or nsw keywords are present, the result value of the add is a poison value if unsigned and/or signed overflow, respectively, occurs.
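An illustrative sketch (%var is a hypothetical operand):
<result> = add i32 4, %var          ; yields i32:result = 4 + %var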
‘fadd‘ Instruction¶
Arguments:¶
The two arguments to the ‘fadd‘ instruction must be floating point or vector of floating point values. Both arguments must have identical types.
Semantics:¶
The value produced is the floating point sum of the two operands. This instruction can also take any number of fast-math flags, which are optimization hints to enable otherwise unsafe floating point optimizations.
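An illustrative sketch (%var is a hypothetical operand):
<result> = fadd float 4.0, %var          ; yields float:result = 4.0 + %var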
‘sub‘ Instruction¶
Syntax:¶
<result> = sub <ty> <op1>, <op2> ; yields ty:result
<result> = sub nuw <ty> <op1>, <op2> ; yields ty:result
<result> = sub nsw <ty> <op1>, <op2> ; yields ty:result
<result> = sub nuw nsw <ty> <op1>, <op2> ; yields ty:result
Overview:¶
The ‘sub‘ instruction returns the difference of its two operands.
Note that the ‘sub‘ instruction is used to represent the ‘neg‘ instruction present in most other intermediate representations.
Arguments:¶
The two arguments to the ‘sub‘ instruction must be integer or vector of integer values. Both arguments must have identical types.
Semantics:¶
The value produced is the integer difference of the two operands.
If the difference has unsigned overflow, the result returned is the mathematical result modulo 2^n, where n is the bit width of the result.
Because LLVM integers use a two’s complement representation, this instruction is appropriate for both signed and unsigned integers.
nuw and nsw stand for “No Unsigned Wrap” and “No Signed Wrap”, respectively. If the nuw and/or nsw keywords are present, the result value of the sub is a poison value if unsigned and/or signed overflow, respectively, occurs.
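An illustrative sketch (%var and %val are hypothetical operands):
<result> = sub i32 4, %var          ; yields i32:result = 4 - %var
<result> = sub i32 0, %val          ; yields i32:result = -%val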
‘fsub‘ Instruction¶
Overview:¶
The ‘fsub‘ instruction returns the difference of its two operands.
Note that the ‘fsub‘ instruction is used to represent the ‘fneg‘ instruction present in most other intermediate representations.
Arguments:¶
The two arguments to the ‘fsub‘ instruction must be floating point or vector of floating point values. Both arguments must have identical types.
Semantics:¶
The value produced is the floating point difference of the two operands. This instruction can also take any number of fast-math flags, which are optimization hints to enable otherwise unsafe floating point optimizations.
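An illustrative sketch (%var and %val are hypothetical operands):
<result> = fsub float 4.0, %var           ; yields float:result = 4.0 - %var
<result> = fsub float -0.0, %val          ; yields float:result = -%val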
‘mul‘ Instruction¶
Syntax:¶
<result> = mul <ty> <op1>, <op2> ; yields ty:result
<result> = mul nuw <ty> <op1>, <op2> ; yields ty:result
<result> = mul nsw <ty> <op1>, <op2> ; yields ty:result
<result> = mul nuw nsw <ty> <op1>, <op2> ; yields ty:result
Arguments:¶
The two arguments to the ‘mul‘ instruction must be integer or vector of integer values. Both arguments must have identical types.
Semantics:¶
The value produced is the integer product of the two operands.
If the result of the multiplication has unsigned overflow, the result returned is the mathematical result modulo 2^n, where n is the bit width of the result.
Because LLVM integers use a two’s complement representation, and the result is the same width as the operands, this instruction returns the correct result for both signed and unsigned integers. If a full product (e.g. i32 * i32 -> i64) is needed, the operands should be sign-extended or zero-extended as appropriate to the width of the full product.
nuw and nsw stand for “No Unsigned Wrap” and “No Signed Wrap”, respectively. If the nuw and/or nsw keywords are present, the result value of the mul is a poison value if unsigned and/or signed overflow, respectively, occurs.
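An illustrative sketch (%var is a hypothetical operand):
<result> = mul i32 4, %var          ; yields i32:result = 4 * %var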
‘fmul‘ Instruction¶
Arguments:¶
The two arguments to the ‘fmul‘ instruction must be floating point or vector of floating point values. Both arguments must have identical types.
Semantics:¶
The value produced is the floating point product of the two operands. This instruction can also take any number of fast-math flags, which are optimization hints to enable otherwise unsafe floating point optimizations.
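An illustrative sketch (%var is a hypothetical operand):
<result> = fmul float 4.0, %var          ; yields float:result = 4.0 * %var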
‘udiv‘ Instruction¶
Syntax:¶
<result> = udiv <ty> <op1>, <op2> ; yields ty:result
<result> = udiv exact <ty> <op1>, <op2> ; yields ty:result
Arguments:¶
The two arguments to the ‘udiv‘ instruction must be integer or vector of integer values. Both arguments must have identical types.
Semantics:¶
The value produced is the unsigned integer quotient of the two operands.
Note that unsigned integer division and signed integer division are distinct operations; for signed integer division, use ‘sdiv‘.
Division by zero leads to undefined behavior.
If the exact keyword is present, the result value of the udiv is a poison value if %op1 is not a multiple of %op2 (as such, “((a udiv exact b) mul b) == a”).
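An illustrative sketch (%var is a hypothetical operand):
<result> = udiv i32 4, %var          ; yields i32:result = 4 / %var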
‘sdiv‘ Instruction¶
Syntax:¶
<result> = sdiv <ty> <op1>, <op2> ; yields ty:result
<result> = sdiv exact <ty> <op1>, <op2> ; yields ty:result
Arguments:¶
The two arguments to the ‘sdiv‘ instruction must be integer or vector of integer values. Both arguments must have identical types.
Semantics:¶
The value produced is the signed integer quotient of the two operands rounded towards zero.
Note that signed integer division and unsigned integer division are distinct operations; for unsigned integer division, use ‘udiv‘.
Division by zero leads to undefined behavior. Overflow also leads to undefined behavior; this is a rare case, but can occur, for example, by doing a 32-bit division of -2147483648 by -1.
If the exact keyword is present, the result value of the sdiv is a poison value if the result would be rounded.
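An illustrative sketch (%var is a hypothetical operand):
<result> = sdiv i32 4, %var          ; yields i32:result = 4 / %var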
‘fdiv‘ Instruction¶
Arguments:¶
The two arguments to the ‘fdiv‘ instruction must be floating point or vector of floating point values. Both arguments must have identical types.
Semantics:¶
The value produced is the floating point quotient of the two operands. This instruction can also take any number of fast-math flags, which are optimization hints to enable otherwise unsafe floating point optimizations.
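An illustrative sketch (%var is a hypothetical operand):
<result> = fdiv float 4.0, %var          ; yields float:result = 4.0 / %var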
‘urem‘ Instruction¶
Overview:¶
The ‘urem‘ instruction returns the remainder from the unsigned division of its two arguments.
Arguments:¶
The two arguments to the ‘urem‘ instruction must be integer or vector of integer values. Both arguments must have identical types.
Semantics:¶
This instruction returns the unsigned integer remainder of a division. This instruction always performs an unsigned division to get the remainder.
Note that unsigned integer remainder and signed integer remainder are distinct operations; for signed integer remainder, use ‘srem‘.
Taking the remainder of a division by zero leads to undefined behavior.
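An illustrative sketch (%var is a hypothetical operand):
<result> = urem i32 4, %var          ; yields i32:result = 4 % %var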
‘srem‘ Instruction¶
Overview:¶
The ‘srem‘ instruction returns the remainder from the signed division of its two operands. This instruction can also take vector versions of the values in which case the elements must be integers.
Arguments:¶
The two arguments to the ‘srem‘ instruction must be integer or vector of integer values. Both arguments must have identical types.
Semantics:¶
This instruction returns the remainder of a division (where the result is either zero or has the same sign as the dividend, op1), not the modulo operator (where the result is either zero or has the same sign as the divisor, op2) of a value. For more information about the difference, see The Math Forum. For a table of how this is implemented in various languages, please see Wikipedia: modulo operation.
Note that signed integer remainder and unsigned integer remainder are distinct operations; for unsigned integer remainder, use ‘urem‘.
Taking the remainder of a division by zero leads to undefined behavior. Overflow also leads to undefined behavior; this is a rare case, but can occur, for example, by taking the remainder of a 32-bit division of -2147483648 by -1. (The remainder doesn’t actually overflow, but this rule lets srem be implemented using instructions that return both the result of the division and the remainder.)
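An illustrative sketch (%var is a hypothetical operand):
<result> = srem i32 4, %var          ; yields i32:result = 4 % %var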
‘frem‘ Instruction¶
Arguments:¶
The two arguments to the ‘frem‘ instruction must be floating point or vector of floating point values. Both arguments must have identical types.
Semantics:¶
This instruction returns the remainder of a division. The remainder has the same sign as the dividend. This instruction can also take any number of fast-math flags, which are optimization hints to enable otherwise unsafe floating point optimizations.
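An illustrative sketch (%var is a hypothetical operand):
<result> = frem float 4.0, %var          ; yields float:result = 4.0 % %var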
Bitwise Binary Operations¶
Bitwise binary operators are used to do various forms of bit-twiddling in a program. They are generally very efficient instructions and can commonly be strength reduced from other instructions. They require two operands of the same type, execute an operation on them, and produce a single value. The resulting value is the same type as its operands.
‘shl‘ Instruction¶
Syntax:¶
<result> = shl <ty> <op1>, <op2> ; yields ty:result
<result> = shl nuw <ty> <op1>, <op2> ; yields ty:result
<result> = shl nsw <ty> <op1>, <op2> ; yields ty:result
<result> = shl nuw nsw <ty> <op1>, <op2> ; yields ty:result
Overview:¶
The ‘shl‘ instruction returns the first operand shifted to the left a specified number of bits.
Arguments:¶
Both arguments to the ‘shl‘ instruction must be the same integer or vector of integer type. ‘op2‘ is treated as an unsigned value.
Semantics:¶
The value produced is op1 * 2^op2 mod 2^n, where n is the width of the result. If op2 is (statically or dynamically) equal to or larger than the number of bits in op1, the result is undefined. If the arguments are vectors, each vector element of op1 is shifted by the corresponding shift amount in op2.
If the nuw keyword is present, then the shift produces a poison value if it shifts out any non-zero bits. If the nsw keyword is present, then the shift produces a poison value if it shifts out any bits that disagree with the resultant sign bit. As such, NUW/NSW have the same semantics as they would if the shift were expressed as a mul instruction with the same nsw/nuw bits in (mul %op1, (shl 1, %op2)).
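An illustrative sketch (%var is a hypothetical operand):
<result> = shl i32 4, %var   ; yields i32: 4 << %var
<result> = shl i32 4, 2      ; yields i32: 16
<result> = shl i32 1, 10     ; yields i32: 1024
<result> = shl i32 1, 32     ; undefined
<result> = shl <2 x i32> < i32 1, i32 1>, < i32 1, i32 2>   ; yields: result=<2 x i32> < i32 2, i32 4>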
‘lshr‘ Instruction¶
Syntax:¶
<result> = lshr <ty> <op1>, <op2> ; yields ty:result
<result> = lshr exact <ty> <op1>, <op2> ; yields ty:result
Overview:¶
The ‘lshr‘ instruction (logical shift right) returns the first operand shifted to the right a specified number of bits with zero fill.
Arguments:¶
Both arguments to the ‘lshr‘ instruction must be the same integer or vector of integer type. ‘op2‘ is treated as an unsigned value.
Semantics:¶
This instruction always performs a logical shift right operation. The most significant bits of the result will be filled with zero bits after the shift. If op2 is (statically or dynamically) equal to or larger than the number of bits in op1, the result is undefined. If the arguments are vectors, each vector element of op1 is shifted by the corresponding shift amount in op2.
If the exact keyword is present, the result value of the lshr is a poison value if any of the bits shifted out are non-zero.
Example:¶
<result> = lshr i32 4, 1 ; yields i32:result = 2
<result> = lshr i32 4, 2 ; yields i32:result = 1
<result> = lshr i8 4, 3 ; yields i8:result = 0
<result> = lshr i8 -2, 1 ; yields i8:result = 0x7F
<result> = lshr i32 1, 32 ; undefined
<result> = lshr <2 x i32> < i32 -2, i32 4>, < i32 1, i32 2> ; yields: result=<2 x i32> < i32 0x7FFFFFFF, i32 1>
‘ashr‘ Instruction¶
Syntax:¶
<result> = ashr <ty> <op1>, <op2> ; yields ty:result
<result> = ashr exact <ty> <op1>, <op2> ; yields ty:result
Overview:¶
The ‘ashr‘ instruction (arithmetic shift right) returns the first operand shifted to the right a specified number of bits with sign extension.
Arguments:¶
Both arguments to the ‘ashr‘ instruction must be the same integer or vector of integer type. ‘op2‘ is treated as an unsigned value.
Semantics:¶
This instruction always performs an arithmetic shift right operation. The most significant bits of the result will be filled with the sign bit of op1. If op2 is (statically or dynamically) equal to or larger than the number of bits in op1, the result is undefined. If the arguments are vectors, each vector element of op1 is shifted by the corresponding shift amount in op2.
If the exact keyword is present, the result value of the ashr is a poison value if any of the bits shifted out are non-zero.
Example:¶
<result> = ashr i32 4, 1 ; yields i32:result = 2
<result> = ashr i32 4, 2 ; yields i32:result = 1
<result> = ashr i8 4, 3 ; yields i8:result = 0
<result> = ashr i8 -2, 1 ; yields i8:result = -1
<result> = ashr i32 1, 32 ; undefined
<result> = ashr <2 x i32> < i32 -2, i32 4>, < i32 1, i32 3> ; yields: result=<2 x i32> < i32 -1, i32 0>
‘and‘ Instruction¶
Arguments:¶
The two arguments to the ‘and‘ instruction must be integer or vector of integer values. Both arguments must have identical types.
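An illustrative sketch (%var is a hypothetical operand):
<result> = and i32 4, %var         ; yields i32:result = 4 & %var
<result> = and i32 15, 40          ; yields i32:result = 8
<result> = and i32 4, 8            ; yields i32:result = 0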
‘or‘ Instruction¶
Arguments:¶
The two arguments to the ‘or‘ instruction must be integer or vector of integer values. Both arguments must have identical types.
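An illustrative sketch (%var is a hypothetical operand):
<result> = or i32 4, %var          ; yields i32:result = 4 | %var
<result> = or i32 15, 40           ; yields i32:result = 47
<result> = or i32 4, 8             ; yields i32:result = 12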
‘xor‘ Instruction¶
Overview:¶
The ‘xor‘ instruction returns the bitwise logical exclusive or of its two operands. The xor is used to implement the “one’s complement” operation, which is the “~” operator in C.
Arguments:¶
The two arguments to the ‘xor‘ instruction must be integer or vector of integer values. Both arguments must have identical types.
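An illustrative sketch (%var and %V are hypothetical operands):
<result> = xor i32 4, %var         ; yields i32:result = 4 ^ %var
<result> = xor i32 15, 40          ; yields i32:result = 39
<result> = xor i32 4, 8            ; yields i32:result = 12
<result> = xor i32 %V, -1          ; yields i32:result = ~%V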
Vector Operations¶
LLVM supports several instructions to represent vector operations in a target-independent manner. These instructions cover the element-access and vector-specific operations needed to process vectors effectively. While LLVM does directly support these vector operations, many sophisticated algorithms will want to use target-specific intrinsics to take full advantage of a specific target.
‘extractelement‘ Instruction¶
Overview:¶
The ‘extractelement‘ instruction extracts a single scalar element from a vector at a specified index.
Arguments:¶
The first operand of an ‘extractelement‘ instruction is a value of vector type. The second operand is an index indicating the position from which to extract the element. The index may be a variable of any integer type.
Semantics:¶
The result is a scalar of the same type as the element type of val. Its value is the value at position idx of val. If idx exceeds the length of val, the results are undefined.
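An illustrative sketch (%vec is a hypothetical vector operand):
<result> = extractelement <4 x i32> %vec, i32 0    ; yields i32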
‘insertelement‘ Instruction¶
Overview:¶
The ‘insertelement‘ instruction inserts a scalar element into a vector at a specified index.
Arguments:¶
The first operand of an ‘insertelement‘ instruction is a value of vector type. The second operand is a scalar value whose type must equal the element type of the first operand. The third operand is an index indicating the position at which to insert the value. The index may be a variable of any integer type.
Semantics:¶
The result is a vector of the same type as val. Its element values are those of val except at position idx, where it gets the value elt. If idx exceeds the length of val, the results are undefined.
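An illustrative sketch (%vec is a hypothetical vector operand):
<result> = insertelement <4 x i32> %vec, i32 1, i32 0    ; yields <4 x i32>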
‘shufflevector‘ Instruction¶
Syntax:¶
<result> = shufflevector <n x <ty>> <v1>, <n x <ty>> <v2>, <m x i32> <mask> ; yields <m x <ty>>
Overview:¶
The ‘shufflevector‘ instruction constructs a permutation of elements from two input vectors, returning a vector with the same element type as the input and length that is the same as the shuffle mask.
Arguments:¶
The first two operands of a ‘shufflevector‘ instruction are vectors with the same type. The third argument is a shuffle mask whose element type is always ‘i32’. The result of the instruction is a vector whose length is the same as the shuffle mask and whose element type is the same as the element type of the first two operands.
The shuffle mask operand is required to be a constant vector with either constant integer or undef values.
Semantics:¶
The elements of the two input vectors are numbered from left to right across both of the vectors. The shuffle mask operand specifies, for each element of the result vector, which element of the two input vectors the result element gets. The element selector may be undef (meaning “don’t care”) and the second operand may be undef if performing a shuffle from only one vector.
Example:¶
<result> = shufflevector <4 x i32> %v1, <4 x i32> %v2,
<4 x i32> <i32 0, i32 4, i32 1, i32 5> ; yields <4 x i32>
<result> = shufflevector <4 x i32> %v1, <4 x i32> undef,
<4 x i32> <i32 0, i32 1, i32 2, i32 3> ; yields <4 x i32> - Identity shuffle.
<result> = shufflevector <8 x i32> %v1, <8 x i32> undef,
<4 x i32> <i32 0, i32 1, i32 2, i32 3> ; yields <4 x i32>
<result> = shufflevector <4 x i32> %v1, <4 x i32> %v2,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7 > ; yields <8 x i32>
Aggregate Operations¶
LLVM supports several instructions for working with aggregate values.
‘extractvalue‘ Instruction¶
Overview:¶
The ‘extractvalue‘ instruction extracts the value of a member field from an aggregate value.
Arguments:¶
The first operand of an ‘extractvalue‘ instruction is a value of struct or array type. The operands are constant indices to specify which value to extract in a similar manner as indices in a ‘getelementptr‘ instruction.
The major differences to getelementptr indexing are:
- Since the value being indexed is not a pointer, the first index is omitted and assumed to be zero.
- At least one index must be specified.
- Not only struct indices but also array indices must be in bounds.
Semantics:¶
The result is the value at the position in the aggregate specified by the index operands.
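An illustrative sketch (%agg is a hypothetical aggregate operand):
<result> = extractvalue {i32, float} %agg, 0    ; yields i32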
‘insertvalue‘ Instruction¶
Syntax:¶
<result> = insertvalue <aggregate type> <val>, <ty> <elt>, <idx>{, <idx>}* ; yields <aggregate type>
Arguments:¶
The first operand of an ‘insertvalue‘ instruction is a value of struct or array type. The second operand is a first-class value to insert. The following operands are constant indices indicating the position at which to insert the value in a similar manner as indices in an ‘extractvalue‘ instruction. The value to insert must have the same type as the value identified by the indices.
Semantics:¶
The result is an aggregate of the same type as val. Its value is that of val except that the value at the position specified by the indices is that of elt.
Memory Access and Addressing Operations¶
A key design point of an SSA-based representation is how it represents memory. In LLVM, no memory locations are in SSA form, which makes things very simple. This section describes how to read, write, and allocate memory in LLVM.
‘alloca‘ Instruction¶
Syntax:¶
<result> = alloca [inalloca] <type> [, <ty> <NumElements>] [, align <alignment>] ; yields type*:result
Overview:¶
The ‘alloca‘ instruction allocates memory on the stack frame of the currently executing function, to be automatically released when this function returns to its caller. The object is always allocated in the generic address space (address space zero).
Arguments:¶
The ‘alloca‘ instruction allocates sizeof(<type>)*NumElements bytes of memory on the runtime stack, returning a pointer of the appropriate type to the program. If “NumElements” is specified, it is the number of elements allocated, otherwise “NumElements” is defaulted to be one. If a constant alignment is specified, the value result of the allocation is guaranteed to be aligned to at least that boundary. The alignment may not be greater than 1 << 29. If not specified, or if zero, the target can choose to align the allocation on any convenient boundary compatible with the type.
‘type‘ may be any sized type.
Semantics:¶
Memory is allocated; a pointer is returned. The operation is undefined if there is insufficient stack space for the allocation. ‘alloca‘d memory is automatically released when the function returns. The ‘alloca‘ instruction is commonly used to represent automatic variables that must have an address available. When the function returns (either with the ret or resume instructions), the memory is reclaimed. Allocating zero bytes is legal, but the result is undefined. The order in which memory is allocated (i.e., which way the stack grows) is not specified.
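A few illustrative sketches of the argument forms (result names are arbitrary):
%ptr1 = alloca i32                              ; yields i32*
%ptr2 = alloca i32, i32 4                       ; yields i32*, room for four i32s
%ptr3 = alloca i32, i32 4, align 1024           ; yields i32*, aligned to 1024 bytes
%ptr4 = alloca i32, align 1024                  ; yields i32*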
‘load‘ Instruction¶
Syntax:¶
<result> = load [volatile] <ty>, <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>][, !invariant.load !<index>][, !nonnull !<index>][, !dereferenceable !<index>][, !dereferenceable_or_null !<index>]
<result> = load atomic [volatile] <ty>, <ty>* <pointer> [singlethread] <ordering>, align <alignment>
!<index> = !{ i32 1 }
Arguments:¶
The argument to the load instruction specifies the memory address from which to load. The type specified must be a first class type. If the load is marked as volatile, then the optimizer is not allowed to modify the number or order of execution of this load with other volatile operations.
If the load is marked as atomic, it takes an extra ordering and optional singlethread argument. The release and acq_rel orderings are not valid on load instructions. Atomic loads produce defined results when they may see multiple atomic stores. The type of the pointee must be an integer type whose bit width is a power of two greater than or equal to eight and less than or equal to a target-specific size limit. align must be explicitly specified on atomic loads, and the load has undefined behavior if the alignment is not set to a value which is at least the size in bytes of the pointee. !nontemporal does not have any defined semantics for atomic loads.
The optional constant align argument specifies the alignment of the operation (that is, the alignment of the memory address). A value of 0 or an omitted align argument means that the operation has the ABI alignment for the target. It is the responsibility of the code emitter to ensure that the alignment information is correct. Overestimating the alignment results in undefined behavior. Underestimating the alignment may produce less efficient code. An alignment of 1 is always safe. The maximum possible alignment is 1 << 29.
The optional !nontemporal metadata must reference a single metadata name <index> corresponding to a metadata node with one i32 entry of value 1. The existence of the !nontemporal metadata on the instruction tells the optimizer and code generator that this load is not expected to be reused in the cache. The code generator may select special instructions to save cache bandwidth, such as the MOVNT instruction on x86.
The optional !invariant.load metadata must reference a single metadata name <index> corresponding to a metadata node with no entries. The existence of the !invariant.load metadata on the instruction tells the optimizer and code generator that the address operand to this load points to memory which can be assumed unchanged. Being invariant does not imply that a location is dereferenceable, but it does imply that once the location is known dereferenceable its value is henceforth unchanging.
The optional !nonnull metadata must reference a single metadata name <index> corresponding to a metadata node with no entries. The existence of the !nonnull metadata on the instruction tells the optimizer that the value loaded is known to never be null. This is analogous to the ‘nonnull‘ attribute on parameters and return values. This metadata can only be applied to loads of a pointer type.
The optional !dereferenceable metadata must reference a single metadata name <index> corresponding to a metadata node with one i64 entry. The existence of the !dereferenceable metadata on the instruction tells the optimizer that the value loaded is known to be dereferenceable. The number of bytes known to be dereferenceable is specified by the integer value in the metadata node. This is analogous to the ‘dereferenceable‘ attribute on parameters and return values. This metadata can only be applied to loads of a pointer type.
The optional !dereferenceable_or_null metadata must reference a single metadata name <index> corresponding to a metadata node with one i64 entry. The existence of the !dereferenceable_or_null metadata on the instruction tells the optimizer that the value loaded is known to be either dereferenceable or null. The number of bytes known to be dereferenceable is specified by the integer value in the metadata node. This is analogous to the ‘dereferenceable_or_null‘ attribute on parameters and return values. This metadata can only be applied to loads of a pointer type.
Semantics:¶
The location of memory pointed to is loaded. If the value being loaded is of scalar type then the number of bytes read does not exceed the minimum number of bytes needed to hold all bits of the type. For example, loading an i24 reads at most three bytes. When loading a value of a type like i20 with a size that is not an integral number of bytes, the result is undefined if the value was not originally written using a store of the same type.
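An illustrative sketch of a round trip through memory:
%ptr = alloca i32                               ; yields i32*
store i32 3, i32* %ptr                          ; yields void
%val = load i32, i32* %ptr                      ; yields i32:val = i32 3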
‘store‘ Instruction¶
Syntax:¶
store [volatile] <ty> <value>, <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>] ; yields void
store atomic [volatile] <ty> <value>, <ty>* <pointer> [singlethread] <ordering>, align <alignment> ; yields void
Arguments:¶
There are two arguments to the store instruction: a value to store and an address at which to store it. The type of the <pointer> operand must be a pointer to the first class type of the <value> operand. If the store is marked as volatile, then the optimizer is not allowed to modify the number or order of execution of this store with other volatile operations.
If the store is marked as atomic, it takes an extra ordering and optional singlethread argument. The acquire and acq_rel orderings aren’t valid on store instructions. Atomic loads produce defined results when they may see multiple atomic stores. The type of the pointee must be an integer type whose bit width is a power of two greater than or equal to eight and less than or equal to a target-specific size limit. align must be explicitly specified on atomic stores, and the store has undefined behavior if the alignment is not set to a value which is at least the size in bytes of the pointee. !nontemporal does not have any defined semantics for atomic stores.
The optional constant align argument specifies the alignment of the operation (that is, the alignment of the memory address). A value of 0 or an omitted align argument means that the operation has the ABI alignment for the target. It is the responsibility of the code emitter to ensure that the alignment information is correct. Overestimating the alignment results in undefined behavior. Underestimating the alignment may produce less efficient code. An alignment of 1 is always safe. The maximum possible alignment is 1 << 29.
The optional !nontemporal metadata must reference a single metadata name <index> corresponding to a metadata node with one i32 entry of value 1. The existence of the !nontemporal metadata on the instruction tells the optimizer and code generator that this store is not expected to be reused in the cache. The code generator may select special instructions to save cache bandwidth, such as the MOVNT instruction on x86.
Semantics:¶
The contents of memory are updated to contain <value> at the location specified by the <pointer> operand. If <value> is of scalar type then the number of bytes written does not exceed the minimum number of bytes needed to hold all bits of the type. For example, storing an i24 writes at most three bytes. When writing a value of a type like i20 with a size that is not an integral number of bytes, it is unspecified what happens to the extra bits that do not belong to the type, but they will typically be overwritten.
‘fence‘ Instruction¶
Arguments:¶
‘fence‘ instructions take an ordering argument which defines what synchronizes-with edges they add. They can only be given acquire, release, acq_rel, and seq_cst orderings.
Semantics:¶
A fence A which has (at least) release ordering semantics synchronizes with a fence B with (at least) acquire ordering semantics if and only if there exist atomic operations X and Y, both operating on some atomic object M, such that A is sequenced before X, X modifies M (either directly or through some side effect of a sequence headed by X), Y is sequenced before B, and Y observes M. This provides a happens-before dependency between A and B. Rather than an explicit fence, one (but not both) of the atomic operations X or Y might provide a release or acquire (resp.) ordering constraint and still synchronize-with the explicit fence and establish the happens-before edge.
A fence which has seq_cst ordering, in addition to having both acquire and release semantics specified above, participates in the global program order of other seq_cst operations and/or fences.
The optional “singlethread” argument specifies that the fence only synchronizes with other fences in the same thread. (This is useful for interacting with signal handlers.)
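Two illustrative sketches of the instruction:
fence acquire                                   ; yields void
fence singlethread seq_cst                      ; yields void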
‘cmpxchg‘ Instruction¶
Syntax:¶
cmpxchg [weak] [volatile] <ty>* <pointer>, <ty> <cmp>, <ty> <new> [singlethread] <success ordering> <failure ordering> ; yields { ty, i1 }
Overview:¶
The ‘cmpxchg‘ instruction is used to atomically modify memory. It loads a value in memory and compares it to a given value. If they are equal, it tries to store a new value into the memory.
Arguments:¶
There are three arguments to the ‘cmpxchg‘ instruction: an address to operate on, a value to compare to the value currently at that address, and a new value to place at that address if the compared values are equal. The type of ‘<cmp>’ must be an integer type whose bit width is a power of two greater than or equal to eight and less than or equal to a target-specific size limit. ‘<cmp>’ and ‘<new>’ must have the same type, and the type of ‘<pointer>’ must be a pointer to that type. If the cmpxchg is marked as volatile, then the optimizer is not allowed to modify the number or order of execution of this cmpxchg with other volatile operations.
The success and failure ordering arguments specify how this cmpxchg synchronizes with other atomic operations. Both ordering parameters must be at least monotonic, the ordering constraint on failure must be no stronger than that on success, and the failure ordering cannot be either release or acq_rel.
The optional “singlethread” argument declares that the cmpxchg is only atomic with respect to code (usually signal handlers) running in the same thread as the cmpxchg. Otherwise the cmpxchg is atomic with respect to all other code in the system.
The pointer passed into cmpxchg must have alignment greater than or equal to the size in memory of the operand.
Semantics:¶
The contents of memory at the location specified by the ‘<pointer>‘ operand are read and compared to ‘<cmp>‘; if the read value is equal, ‘<new>‘ is written. The original value at the location is returned, together with a flag indicating success (true) or failure (false).
If the cmpxchg operation is marked as weak then a spurious failure is permitted: the operation may not write <new> even if the comparison matched.
If the cmpxchg operation is strong (the default), the i1 value is 1 if and only if the value loaded equals cmp.
A successful cmpxchg is a read-modify-write instruction for the purpose of identifying release sequences. A failed cmpxchg is equivalent to an atomic load with an ordering determined by the second (failure) ordering parameter.
Example:¶
entry:
%orig = load atomic i32, i32* %ptr unordered, align 4 ; yields i32
br label %loop
loop:
%cmp = phi i32 [ %orig, %entry ], [ %value_loaded, %loop ]
%squared = mul i32 %cmp, %cmp
%val_success = cmpxchg i32* %ptr, i32 %cmp, i32 %squared acq_rel monotonic ; yields { i32, i1 }
%value_loaded = extractvalue { i32, i1 } %val_success, 0
%success = extractvalue { i32, i1 } %val_success, 1
br i1 %success, label %done, label %loop
done:
...
‘atomicrmw‘ Instruction¶
Syntax:¶
atomicrmw [volatile] <operation> <ty>* <pointer>, <ty> <value> [singlethread] <ordering> ; yields ty
Arguments:¶
There are three arguments to the ‘atomicrmw‘ instruction: an operation to apply, an address whose value to modify, and an argument to the operation. The operation must be one of the following keywords:
- xchg
- add
- sub
- and
- nand
- or
- xor
- max
- min
- umax
- umin
The type of ‘<value>’ must be an integer type whose bit width is a power of two greater than or equal to eight and less than or equal to a target-specific size limit. The type of the ‘<pointer>‘ operand must be a pointer to that type. If the atomicrmw is marked as volatile, then the optimizer is not allowed to modify the number or order of execution of this atomicrmw with other volatile operations.
Semantics:¶
The contents of memory at the location specified by the ‘<pointer>‘ operand are atomically read, modified, and written back. The original value at the location is returned. The modification is specified by the operation argument:
- xchg: *ptr = val
- add: *ptr = *ptr + val
- sub: *ptr = *ptr - val
- and: *ptr = *ptr & val
- nand: *ptr = ~(*ptr & val)
- or: *ptr = *ptr | val
- xor: *ptr = *ptr ^ val
- max: *ptr = *ptr > val ? *ptr : val (using a signed comparison)
- min: *ptr = *ptr < val ? *ptr : val (using a signed comparison)
- umax: *ptr = *ptr > val ? *ptr : val (using an unsigned comparison)
- umin: *ptr = *ptr < val ? *ptr : val (using an unsigned comparison)
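An illustrative sketch (%ptr is a hypothetical pointer operand):
%old = atomicrmw add i32* %ptr, i32 1 acquire   ; yields i32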
‘getelementptr‘ Instruction¶
Syntax:¶
<result> = getelementptr <ty>, <ty>* <ptrval>{, <ty> <idx>}*
<result> = getelementptr inbounds <ty>, <ty>* <ptrval>{, <ty> <idx>}*
<result> = getelementptr <ty>, <ptr vector> <ptrval>, <vector index type> <idx>
Overview:¶
The ‘getelementptr‘ instruction is used to get the address of a subelement of an aggregate data structure. It performs address calculation only and does not access memory. The instruction can also be used to calculate a vector of such addresses.
Arguments:¶
The first argument is always a type used as the basis for the calculations. The second argument is always a pointer or a vector of pointers, and is the base address to start from. The remaining arguments are indices that indicate which of the elements of the aggregate object are indexed. The interpretation of each index is dependent on the type being indexed into. The first index always indexes the pointer value given as the first argument, the second index indexes a value of the type pointed to (not necessarily the value directly pointed to, since the first index can be non-zero), etc. The first type indexed into must be a pointer value, subsequent types can be arrays, vectors, and structs. Note that subsequent types being indexed into can never be pointers, since that would require loading the pointer before continuing calculation.
The type of each index argument depends on the type it is indexing into. When indexing into a (optionally packed) structure, only i32 integer constants are allowed (when using a vector of indices they must all be the same i32 integer constant). When indexing into an array, pointer or vector, integers of any width are allowed, and they are not required to be constant. These integers are treated as signed values where relevant.
For example, let’s consider a C code fragment and how it gets compiled to LLVM:
struct RT {
char A;
int B[10][20];
char C;
};
struct ST {
int X;
double Y;
struct RT Z;
};
int *foo(struct ST *s) {
return &s[1].Z.B[5][13];
}
The LLVM code generated by Clang is:
%struct.RT = type { i8, [10 x [20 x i32]], i8 }
%struct.ST = type { i32, double, %struct.RT }
define i32* @foo(%struct.ST* %s) nounwind uwtable readnone optsize ssp {
entry:
%arrayidx = getelementptr inbounds %struct.ST, %struct.ST* %s, i64 1, i32 2, i32 1, i64 5, i64 13
ret i32* %arrayidx
}
Semantics:¶
In the example above, the first index is indexing into the ‘%struct.ST*‘ type, which is a pointer, yielding a ‘%struct.ST‘ = ‘{ i32, double, %struct.RT }‘ type, a structure. The second index indexes into the third element of the structure, yielding a ‘%struct.RT‘ = ‘{ i8 , [10 x [20 x i32]], i8 }‘ type, another structure. The third index indexes into the second element of the structure, yielding a ‘[10 x [20 x i32]]‘ type, an array. The two dimensions of the array are subscripted into, yielding an ‘i32‘ type. The ‘getelementptr‘ instruction returns a pointer to this element, thus computing a value of ‘i32*‘ type.
Note that it is perfectly legal to index partially through a structure, returning a pointer to an inner element. Because of this, the LLVM code for the given testcase is equivalent to:
define i32* @foo(%struct.ST* %s) {
%t1 = getelementptr %struct.ST, %struct.ST* %s, i32 1 ; yields %struct.ST*:%t1
%t2 = getelementptr %struct.ST, %struct.ST* %t1, i32 0, i32 2 ; yields %struct.RT*:%t2
%t3 = getelementptr %struct.RT, %struct.RT* %t2, i32 0, i32 1 ; yields [10 x [20 x i32]]*:%t3
%t4 = getelementptr [10 x [20 x i32]], [10 x [20 x i32]]* %t3, i32 0, i32 5 ; yields [20 x i32]*:%t4
%t5 = getelementptr [20 x i32], [20 x i32]* %t4, i32 0, i32 13 ; yields i32*:%t5
ret i32* %t5
}
If the inbounds keyword is present, the result value of the getelementptr is a poison value if the base pointer is not an in bounds address of an allocated object, or if any of the addresses that would be formed by successive addition of the offsets implied by the indices to the base address with infinitely precise signed arithmetic are not an in bounds address of that allocated object. The in bounds addresses for an allocated object are all the addresses that point into the object, plus the address one byte past the end. In cases where the base is a vector of pointers the inbounds keyword applies to each of the computations element-wise.
If the inbounds keyword is not present, the offsets are added to the base address with silently-wrapping two’s complement arithmetic. If the offsets have a different width from the pointer, they are sign-extended or truncated to the width of the pointer. The result value of the getelementptr may be outside the object pointed to by the base pointer. The result value may not necessarily be used to access memory though, even if it happens to point into allocated storage. See the Pointer Aliasing Rules section for more information.
The getelementptr instruction is often confusing. For some more insight into how it works, see the getelementptr FAQ.
Example:¶
; yields [12 x i8]*:aptr
%aptr = getelementptr {i32, [12 x i8]}, {i32, [12 x i8]}* %saptr, i64 0, i32 1
; yields i8*:vptr
%vptr = getelementptr {i32, <2 x i8>}, {i32, <2 x i8>}* %svptr, i64 0, i32 1, i32 1
; yields i8*:eptr
%eptr = getelementptr [12 x i8], [12 x i8]* %aptr, i64 0, i32 1
; yields i32*:iptr
%iptr = getelementptr [10 x i32], [10 x i32]* @arr, i16 0, i16 0
Vector of pointers:¶
The getelementptr returns a vector of pointers, instead of a single address, when one or more of its arguments is a vector. In such cases, all vector arguments should have the same number of elements, and every scalar argument will be effectively broadcast into a vector during address calculation.
; All arguments are vectors:
; A[i] = ptrs[i] + offsets[i]*sizeof(i8)
%A = getelementptr i8, <4 x i8*> %ptrs, <4 x i64> %offsets
; Add the same scalar offset to each pointer of a vector:
; A[i] = ptrs[i] + offset*sizeof(i8)
%A = getelementptr i8, <4 x i8*> %ptrs, i64 %offset
; Add distinct offsets to the same pointer:
; A[i] = ptr + offsets[i]*sizeof(i8)
%A = getelementptr i8, i8* %ptr, <4 x i64> %offsets
; In all cases described above the type of the result is <4 x i8*>
The two following instructions are equivalent:
getelementptr %struct.ST, <4 x %struct.ST*> %s, <4 x i64> %ind1,
<4 x i32> <i32 2, i32 2, i32 2, i32 2>,
<4 x i32> <i32 1, i32 1, i32 1, i32 1>,
<4 x i32> %ind4,
<4 x i64> <i64 13, i64 13, i64 13, i64 13>
getelementptr %struct.ST, <4 x %struct.ST*> %s, <4 x i64> %ind1,
i32 2, i32 1, <4 x i32> %ind4, i64 13
Let’s look at the C code where the vector version of getelementptr makes sense:
// Let's assume that we vectorize the following loop:
double *A, *B; int *C;
for (int i = 0; i < size; ++i) {
A[i] = B[C[i]];
}
; get pointers for 8 elements from array B
%ptrs = getelementptr double, double* %B, <8 x i32> %C
; load 8 elements from array B into A
%A = call <8 x double> @llvm.masked.gather.v8f64(<8 x double*> %ptrs,
i32 8, <8 x i1> %mask, <8 x double> %passthru)
Conversion Operations¶
The instructions in this category are the conversion instructions (casting) which all take a single operand and a type. They perform various bit conversions on the operand.
‘trunc .. to‘ Instruction¶
Arguments:¶
The ‘trunc‘ instruction takes a value to trunc, and a type to trunc it to. Both types must be of integer types, or vectors of the same number of integers. The bit size of the value must be larger than the bit size of the destination type, ty2. Equal sized types are not allowed.
Semantics:¶
The ‘trunc‘ instruction truncates the high order bits in value and converts the remaining bits to ty2. Since the source size must be larger than the destination size, trunc cannot be a no-op cast. It will always truncate bits.
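A few illustrative sketches:
%X = trunc i32 257 to i8                        ; yields i8:1
%Y = trunc i32 123 to i1                        ; yields i1:true
%Z = trunc i32 122 to i1                        ; yields i1:false
%W = trunc <2 x i16> <i16 8, i16 7> to <2 x i8> ; yields <i8 8, i8 7>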
‘zext .. to‘ Instruction¶
Arguments:¶
The ‘zext‘ instruction takes a value to cast, and a type to cast it to. Both types must be of integer types, or vectors of the same number of integers. The bit size of the value must be smaller than the bit size of the destination type, ty2.
Semantics:¶
The zext fills the high order bits of the value with zero bits until it reaches the size of the destination type, ty2.
When zero extending from i1, the result will always be either 0 or 1.
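A few illustrative sketches:
%X = zext i32 257 to i64                        ; yields i64:257
%Y = zext i1 true to i32                        ; yields i32:1
%Z = zext <2 x i16> <i16 8, i16 7> to <2 x i32> ; yields <i32 8, i32 7>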
‘sext .. to‘ Instruction¶
Arguments:¶
The ‘sext‘ instruction takes a value to cast, and a type to cast it to. Both types must be of integer types, or vectors of the same number of integers. The bit size of the value must be smaller than the bit size of the destination type, ty2.
Semantics:¶
The ‘sext‘ instruction performs a sign extension by copying the sign bit (highest order bit) of the value until it reaches the bit size of the type ty2.
When sign extending from i1, the extension always results in -1 or 0.
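A few illustrative sketches:
%X = sext i8 -1 to i16                          ; yields i16:-1 (all bits set)
%Y = sext i1 true to i32                        ; yields i32:-1
%Z = sext <2 x i16> <i16 8, i16 7> to <2 x i32> ; yields <i32 8, i32 7>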
‘fptrunc .. to‘ Instruction¶
Arguments:¶
The ‘fptrunc‘ instruction takes a floating point value to cast and a floating point type to cast it to. The size of value must be larger than the size of ty2. This implies that fptrunc cannot be used to make a no-op cast.
Semantics:¶
The ‘fptrunc‘ instruction truncates a value from a larger floating point type to a smaller floating point type. If the value cannot fit within the destination type, ty2, then the results are undefined.
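Two illustrative sketches:
%X = fptrunc double 123.0 to float       ; yields float:123.0
%Y = fptrunc double 1.0E+300 to float    ; undefined: value does not fit in float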
‘fpext .. to‘ Instruction¶
Arguments:¶
The ‘fpext‘ instruction takes a floating point value to cast, and a floating point type to cast it to. The source type must be smaller than the destination type.
Semantics:¶
The ‘fpext‘ instruction extends the value from a smaller floating point type to a larger floating point type. The fpext cannot be used to make a no-op cast because it always changes bits. Use bitcast to make a no-op cast for a floating point cast.
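Two illustrative sketches:
%X = fpext float 3.125 to double         ; yields double:3.125
%Y = fpext double %X to fp128            ; yields fp128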
‘fptoui .. to‘ Instruction¶
Overview:¶
The ‘fptoui‘ converts a floating point value to its unsigned integer equivalent of type ty2.
Arguments:¶
The ‘fptoui‘ instruction takes a value to cast, which must be a scalar or vector floating point value, and a type to cast it to ty2, which must be an integer type. If ty is a vector floating point type, ty2 must be a vector integer type with the same number of elements as ty.
Semantics:¶
The ‘fptoui‘ instruction converts its floating point operand into the nearest (rounding towards zero) unsigned integer value. If the value cannot fit in ty2, the results are undefined.
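Two illustrative sketches:
%X = fptoui double 123.0 to i32      ; yields i32:123
%Y = fptoui float 1.0E+300 to i1     ; undefined: value does not fit in i1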
‘fptosi .. to
‘ Instruction¶
Overview:¶
The ‘fptosi
‘ instruction converts floating point
value
to type ty2
.
Arguments:¶
The ‘fptosi
‘ instruction takes a value to cast, which must be a
scalar or vector floating point value, and a type to
cast it to ty2
, which must be an integer type. If
ty
is a vector floating point type, ty2
must be a vector integer
type with the same number of elements as ty
Semantics:¶
The ‘fptosi‘ instruction converts its floating point operand into the nearest (rounding towards zero) signed integer value. If the value cannot fit in ty2, the results are undefined.
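Example:¶
A minimal illustrative snippet (value names are arbitrary):
%X = fptosi double -123.9 to i32 ; yields i32:-123 (rounds toward zero)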
‘uitofp .. to‘ Instruction¶
Overview:¶
The ‘uitofp‘ instruction regards value as an unsigned integer and converts that value to the ty2 type.
Arguments:¶
The ‘uitofp‘ instruction takes a value to cast, which must be a scalar or vector integer value, and a type to cast it to, ty2, which must be a floating point type. If ty is a vector integer type, ty2 must be a vector floating point type with the same number of elements as ty.
Semantics:¶
The ‘uitofp‘ instruction interprets its operand as an unsigned integer quantity and converts it to the corresponding floating point value. If the value cannot fit in the floating point value, the results are undefined.
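Example:¶
A minimal illustrative snippet (value names are arbitrary):
%X = uitofp i32 257 to float ; yields float:257.0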
‘sitofp .. to‘ Instruction¶
Overview:¶
The ‘sitofp‘ instruction regards value as a signed integer and converts that value to the ty2 type.
Arguments:¶
The ‘sitofp‘ instruction takes a value to cast, which must be a scalar or vector integer value, and a type to cast it to, ty2, which must be a floating point type. If ty is a vector integer type, ty2 must be a vector floating point type with the same number of elements as ty.
Semantics:¶
The ‘sitofp‘ instruction interprets its operand as a signed integer quantity and converts it to the corresponding floating point value. If the value cannot fit in the floating point value, the results are undefined.
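Example:¶
A minimal illustrative snippet (value names are arbitrary):
%X = sitofp i32 -1 to double ; yields double:-1.0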
‘ptrtoint .. to‘ Instruction¶
Overview:¶
The ‘ptrtoint‘ instruction converts the pointer or a vector of pointers value to the integer (or vector of integers) type ty2.
Arguments:¶
The ‘ptrtoint‘ instruction takes a value to cast, which must be a value of type pointer or a vector of pointers, and a type to cast it to, ty2, which must be an integer or a vector of integers type.
Semantics:¶
The ‘ptrtoint‘ instruction converts value to integer type ty2 by interpreting the pointer value as an integer and either truncating or zero extending that value to the size of the integer type. If value is smaller than ty2 then a zero extension is done. If value is larger than ty2 then a truncation is done. If they are the same size, then nothing is done (no-op cast) other than a type change.
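Example:¶
A minimal illustrative snippet (value names are arbitrary; the effect depends on the target's pointer size):
%X = ptrtoint i32* %P to i8 ; yields truncation on 32-bit and 64-bit targets
%Y = ptrtoint i32* %P to i64 ; yields zero extension on 32-bit targets, no-op on 64-bit targets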
‘inttoptr .. to‘ Instruction¶
Arguments:¶
The ‘inttoptr‘ instruction takes an integer value to cast, and a type to cast it to, which must be a pointer type.
Semantics:¶
The ‘inttoptr‘ instruction converts value to type ty2 by applying either a zero extension or a truncation depending on the size of the integer value. If value is larger than the size of a pointer then a truncation is done. If value is smaller than the size of a pointer then a zero extension is done. If they are the same size, nothing is done (no-op cast).
Example:¶
%X = inttoptr i32 255 to i32* ; yields zero extension on 64-bit architecture
%Y = inttoptr i32 255 to i32* ; yields no-op on 32-bit architecture
%Z = inttoptr i64 0 to i32* ; yields truncation on 32-bit architecture
%W = inttoptr <4 x i32> %G to <4 x i8*> ; yields truncation of vector G to four pointers
‘bitcast .. to‘ Instruction¶
Arguments:¶
The ‘bitcast‘ instruction takes a value to cast, which must be a non-aggregate first class value, and a type to cast it to, which must also be a non-aggregate first class type. The bit sizes of value and the destination type, ty2, must be identical. If the source type is a pointer, the destination type must also be a pointer of the same size. This instruction supports bitwise conversion of vectors to integers and to vectors of other types (as long as they have the same size).
Semantics:¶
The ‘bitcast‘ instruction converts value to type ty2. It is always a no-op cast because no bits change with this conversion. The conversion is done as if the value had been stored to memory and read back as type ty2. Pointer (or vector of pointers) types may only be converted to other pointer (or vector of pointers) types with the same address space through this instruction. To convert pointers to other types, use the inttoptr or ptrtoint instructions first.
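Example:¶
A minimal illustrative snippet (value names are arbitrary):
%X = bitcast i8 255 to i8 ; no-op cast, yields i8:255
%Y = bitcast <2 x i32> %V to i64 ; reinterprets the 64 bits of %V as an i64
%Z = bitcast <2 x i32*> %W to <2 x i64*> ; pointer vector to pointer vector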
‘addrspacecast .. to‘ Instruction¶
Overview:¶
The ‘addrspacecast‘ instruction converts ptrval from pty in address space n to type pty2 in address space m.
Arguments:¶
The ‘addrspacecast‘ instruction takes a pointer or vector of pointer value to cast and a pointer type to cast it to, which must have a different address space.
Semantics:¶
The ‘addrspacecast‘ instruction converts the pointer value ptrval to type pty2. It can be a no-op cast or a complex value modification, depending on the target and the address space pair. Pointer conversions within the same address space must be performed with the bitcast instruction. Note that if the address space conversion is legal then both result and operand refer to the same memory location.
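Example:¶
A minimal illustrative snippet (value names and address space numbers are arbitrary):
%X = addrspacecast i32* %x to i32 addrspace(1)* ; yields i32 addrspace(1)*:%x
%Y = addrspacecast i64 addrspace(1)* %y to i64 addrspace(2)* ; yields i64 addrspace(2)*:%y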
Other Operations¶
The instructions in this category are the “miscellaneous” instructions, which defy better classification.
‘icmp‘ Instruction¶
Overview:¶
The ‘icmp‘ instruction returns a boolean value or a vector of boolean values based on comparison of its two integer, integer vector, pointer, or pointer vector operands.
Arguments:¶
The ‘icmp‘ instruction takes three operands. The first operand is the condition code indicating the kind of comparison to perform. It is not a value, just a keyword. The possible condition codes are:
- eq: equal
- ne: not equal
- ugt: unsigned greater than
- uge: unsigned greater or equal
- ult: unsigned less than
- ule: unsigned less or equal
- sgt: signed greater than
- sge: signed greater or equal
- slt: signed less than
- sle: signed less or equal
The remaining two arguments must be integer or pointer or integer vector typed. They must also be identical types.
Semantics:¶
The ‘icmp‘ compares op1 and op2 according to the condition code given as cond. The comparison performed always yields either an i1 or vector of i1 result, as follows:
- eq: yields true if the operands are equal, false otherwise. No sign interpretation is necessary or performed.
- ne: yields true if the operands are unequal, false otherwise. No sign interpretation is necessary or performed.
- ugt: interprets the operands as unsigned values and yields true if op1 is greater than op2.
- uge: interprets the operands as unsigned values and yields true if op1 is greater than or equal to op2.
- ult: interprets the operands as unsigned values and yields true if op1 is less than op2.
- ule: interprets the operands as unsigned values and yields true if op1 is less than or equal to op2.
- sgt: interprets the operands as signed values and yields true if op1 is greater than op2.
- sge: interprets the operands as signed values and yields true if op1 is greater than or equal to op2.
- slt: interprets the operands as signed values and yields true if op1 is less than op2.
- sle: interprets the operands as signed values and yields true if op1 is less than or equal to op2.
If the operands are pointer typed, the pointer values are compared as if they were integers.
If the operands are integer vectors, then they are compared element by element. The result is an i1 vector with the same number of elements as the values being compared. Otherwise, the result is an i1.
Example:¶
<result> = icmp eq i32 4, 5 ; yields: result=false
<result> = icmp ne float* %X, %X ; yields: result=false
<result> = icmp ult i16 4, 5 ; yields: result=true
<result> = icmp sgt i16 4, 5 ; yields: result=false
<result> = icmp ule i16 -4, 5 ; yields: result=false
<result> = icmp sge i16 4, 5 ; yields: result=false
Note that the code generator does not yet support vector types with the icmp instruction.
‘fcmp‘ Instruction¶
Overview:¶
The ‘fcmp‘ instruction returns a boolean value or vector of boolean values based on comparison of its operands.
If the operands are floating point scalars, then the result type is a boolean (i1).
If the operands are floating point vectors, then the result type is a vector of boolean with the same number of elements as the operands being compared.
Arguments:¶
The ‘fcmp‘ instruction takes three operands. The first operand is the condition code indicating the kind of comparison to perform. It is not a value, just a keyword. The possible condition codes are:
- false: no comparison, always returns false
- oeq: ordered and equal
- ogt: ordered and greater than
- oge: ordered and greater than or equal
- olt: ordered and less than
- ole: ordered and less than or equal
- one: ordered and not equal
- ord: ordered (no nans)
- ueq: unordered or equal
- ugt: unordered or greater than
- uge: unordered or greater than or equal
- ult: unordered or less than
- ule: unordered or less than or equal
- une: unordered or not equal
- uno: unordered (either nans)
- true: no comparison, always returns true
Ordered means that neither operand is a QNAN while unordered means that either operand may be a QNAN.
Each of the val1 and val2 arguments must be either a floating point type or a vector of floating point type. They must have identical types.
Semantics:¶
The ‘fcmp‘ instruction compares op1 and op2 according to the condition code given as cond. If the operands are vectors, then the vectors are compared element by element. Each comparison performed always yields an i1 result, as follows:
- false: always yields false, regardless of operands.
- oeq: yields true if both operands are not a QNAN and op1 is equal to op2.
- ogt: yields true if both operands are not a QNAN and op1 is greater than op2.
- oge: yields true if both operands are not a QNAN and op1 is greater than or equal to op2.
- olt: yields true if both operands are not a QNAN and op1 is less than op2.
- ole: yields true if both operands are not a QNAN and op1 is less than or equal to op2.
- one: yields true if both operands are not a QNAN and op1 is not equal to op2.
- ord: yields true if both operands are not a QNAN.
- ueq: yields true if either operand is a QNAN or op1 is equal to op2.
- ugt: yields true if either operand is a QNAN or op1 is greater than op2.
- uge: yields true if either operand is a QNAN or op1 is greater than or equal to op2.
- ult: yields true if either operand is a QNAN or op1 is less than op2.
- ule: yields true if either operand is a QNAN or op1 is less than or equal to op2.
- une: yields true if either operand is a QNAN or op1 is not equal to op2.
- uno: yields true if either operand is a QNAN.
- true: always yields true, regardless of operands.
The fcmp instruction can also optionally take any number of fast-math flags, which are optimization hints to enable otherwise unsafe floating point optimizations.
Any set of fast-math flags are legal on an fcmp instruction, but the only flags that have any effect on its semantics are those that allow assumptions to be made about the values of input arguments; namely nnan, ninf, and nsz. See Fast-Math Flags for more information.
Example:¶
<result> = fcmp oeq float 4.0, 5.0 ; yields: result=false
<result> = fcmp one float 4.0, 5.0 ; yields: result=true
<result> = fcmp olt float 4.0, 5.0 ; yields: result=true
<result> = fcmp ueq double 1.0, 2.0 ; yields: result=false
Note that the code generator does not yet support vector types with the fcmp instruction.
‘phi‘ Instruction¶
Overview:¶
The ‘phi‘ instruction is used to implement the φ node in the SSA graph representing the function.
Arguments:¶
The type of the incoming values is specified with the first type field.
After this, the ‘phi‘ instruction takes a list of pairs as arguments, with one pair for each predecessor basic block of the current block. Only values of first class type may be used as the value arguments to the PHI node. Only labels may be used as the label arguments.
There must be no non-phi instructions between the start of a basic block and the PHI instructions: i.e. PHI instructions must be first in a basic block.
For the purposes of the SSA form, the use of each incoming value is deemed to occur on the edge from the corresponding predecessor block to the current block (but after any definition of an ‘invoke‘ instruction’s return value on the same edge).
Semantics:¶
At runtime, the ‘phi‘ instruction logically takes on the value specified by the pair corresponding to the predecessor basic block that executed just prior to the current block.
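Example:¶
A minimal illustrative loop (the label and value names, such as %LoopHeader, are arbitrary):
Loop: ; Infinite loop that counts from 0 on up...
%indvar = phi i32 [ 0, %LoopHeader ], [ %nextindvar, %Loop ]
%nextindvar = add i32 %indvar, 1
br label %Loop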
‘select‘ Instruction¶
Syntax:¶
<result> = select selty <cond>, <ty> <val1>, <ty> <val2> ; yields ty
selty is either i1 or {<N x i1>}
Overview:¶
The ‘select‘ instruction is used to choose one value based on a condition, without IR-level branching.
Arguments:¶
The ‘select‘ instruction requires an ‘i1’ value or a vector of ‘i1’ values indicating the condition, and two values of the same first class type.
Semantics:¶
If the condition is an i1 and it evaluates to 1, the instruction returns the first value argument; otherwise, it returns the second value argument.
If the condition is a vector of i1, then the value arguments must be vectors of the same size, and the selection is done element by element.
If the condition is an i1 and the value arguments are vectors of the same size, then an entire vector is selected.
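Example:¶
A minimal illustrative snippet (value names are arbitrary):
%X = select i1 true, i8 17, i8 42 ; yields i8:17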
‘call‘ Instruction¶
Syntax:¶
<result> = [tail | musttail] call [cconv] [ret attrs] <ty> [<fnty>*] <fnptrval>(<function args>) [fn attrs]
Arguments:¶
This instruction requires several arguments:
- The optional tail and musttail markers indicate that the optimizers should perform tail call optimization. The tail marker is a hint that can be ignored. The musttail marker means that the call must be tail call optimized in order for the program to be correct. The musttail marker provides these guarantees:
  - The call will not cause unbounded stack growth if it is part of a recursive cycle in the call graph.
  - Arguments with the inalloca attribute are forwarded in place.
  Both markers imply that the callee does not access allocas or varargs from the caller. Calls marked musttail must obey the following additional rules:
  - The call must immediately precede a ret instruction, or a pointer bitcast followed by a ret instruction.
  - The ret instruction must return the (possibly bitcasted) value produced by the call or void.
  - The caller and callee prototypes must match. Pointer types of parameters or return types may differ in pointee type, but not in address space.
  - The calling conventions of the caller and callee must match.
  - All ABI-impacting function attributes, such as sret, byval, inreg, returned, and inalloca, must match.
  - The callee must be varargs iff the caller is varargs. Bitcasting a non-varargs function to the appropriate varargs type is legal so long as the non-varargs prefixes obey the other rules.
  Tail call optimization for calls marked tail is guaranteed to occur if the following conditions are met:
  - Caller and callee both have the calling convention fastcc.
  - The call is in tail position (ret immediately follows call and ret uses value of call or is void).
  - Option -tailcallopt is enabled, or llvm::GuaranteedTailCallOpt is true.
  - Platform-specific constraints are met.
- The optional “cconv” marker indicates which calling convention the call should use. If none is specified, the call defaults to using C calling conventions. The calling convention of the call must match the calling convention of the target function, or else the behavior is undefined.
- The optional Parameter Attributes list for return values. Only ‘zeroext‘, ‘signext‘, and ‘inreg‘ attributes are valid here.
- ‘ty‘: the type of the call instruction itself, which is also the type of the return value. Functions that return no value are marked void.
- ‘fnty‘: shall be the signature of the pointer to function value being invoked. The argument types must match the types implied by this signature. This type can be omitted if the function is not varargs and if the function type does not return a pointer to a function.
- ‘fnptrval‘: an LLVM value containing a pointer to a function to be invoked. In most cases, this is a direct function invocation, but indirect calls are just as possible, calling an arbitrary pointer to function value.
- ‘function args‘: argument list whose types match the function signature argument types and parameter attributes. All arguments must be of first class type. If the function signature indicates the function accepts a variable number of arguments, the extra arguments can be specified.
- The optional function attributes list. Only ‘noreturn‘, ‘nounwind‘, ‘readonly‘ and ‘readnone‘ attributes are valid here.
Semantics:¶
The ‘call‘ instruction is used to cause control flow to transfer to a specified function, with its incoming arguments bound to the specified values. Upon a ‘ret‘ instruction in the called function, control flow continues with the instruction after the function call, and the return value of the function is bound to the result argument.
Example:¶
%retval = call i32 @test(i32 %argc)
call i32 (i8*, ...)* @printf(i8* %msg, i32 12, i8 42) ; yields i32
%X = tail call i32 @foo() ; yields i32
%Y = tail call fastcc i32 @foo() ; yields i32
call void %foo(i8 97 signext)
%struct.A = type { i32, i8 }
%r = call %struct.A @foo() ; yields { i32, i8 }
%gr = extractvalue %struct.A %r, 0 ; yields i32
%gr1 = extractvalue %struct.A %r, 1 ; yields i8
%Z = call void @foo() noreturn ; indicates that %foo never returns normally
%ZZ = call zeroext i32 @bar() ; Return value is zero extended
LLVM treats calls to some functions with names and arguments that match the standard C99 library as being the C99 library functions, and may perform optimizations or generate code for them under that assumption. This is something we’d like to change in the future to provide better support for freestanding environments and non-C-based languages.
‘va_arg‘ Instruction¶
Overview:¶
The ‘va_arg‘ instruction is used to access arguments passed through the “variable argument” area of a function call. It is used to implement the va_arg macro in C.
Arguments:¶
This instruction takes a va_list* value and the type of the argument. It returns a value of the specified argument type and increments the va_list to point to the next argument. The actual type of va_list is target specific.
Semantics:¶
The ‘va_arg‘ instruction loads an argument of the specified type from the specified va_list and causes the va_list to point to the next argument. For more information, see the variable argument handling Intrinsic Functions.
It is legal for this instruction to be called in a function which does not take a variable number of arguments, for example, the vfprintf function.
va_arg is an LLVM instruction instead of an intrinsic function because it takes a type as an argument.
Example:¶
See the variable argument processing section.
Note that the code generator does not yet fully support va_arg on many targets. Also, it does not currently support va_arg with aggregate types on any target.
‘landingpad‘ Instruction¶
Syntax:¶
<resultval> = landingpad <resultty> <clause>+
<resultval> = landingpad <resultty> cleanup <clause>*
<clause> := catch <type> <value>
<clause> := filter <array constant type> <array constant>
Overview:¶
The ‘landingpad‘ instruction is used by LLVM’s exception handling system to specify that a basic block is a landing pad: one where the exception lands, corresponding to the code found in the catch portion of a try/catch sequence. It defines values supplied by the personality function upon re-entry to the function. The resultval has the type resultty.
Arguments:¶
The optional cleanup flag indicates that the landing pad block is a cleanup.
A clause begins with the clause type (catch or filter) and contains the global variable representing the “type” that may be caught or filtered respectively. Unlike the catch clause, the filter clause takes an array constant as its argument. Use “[0 x i8**] undef” for a filter which cannot throw. The ‘landingpad‘ instruction must contain at least one clause or the cleanup flag.
Semantics:¶
The ‘landingpad‘ instruction defines the values which are set by the personality function upon re-entry to the function, and therefore the “result type” of the landingpad instruction. As with calling conventions, how the personality function results are represented in LLVM IR is target specific.
The clauses are applied in order from top to bottom. If two landingpad instructions are merged together through inlining, the clauses from the calling function are appended to the list of clauses. When the call stack is being unwound due to an exception being thrown, the exception is compared against each clause in turn. If it doesn’t match any of the clauses, and the cleanup flag is not set, then unwinding continues further up the call stack.
The landingpad instruction has several restrictions:
- A landing pad block is a basic block which is the unwind destination of an ‘invoke‘ instruction.
- A landing pad block must have a ‘landingpad‘ instruction as its first non-PHI instruction.
- There can be only one ‘landingpad‘ instruction within the landing pad block.
- A basic block that is not a landing pad block may not include a ‘landingpad‘ instruction.
Example:¶
;; A landing pad which can catch an integer.
%res = landingpad { i8*, i32 }
catch i8** @_ZTIi
;; A landing pad that is a cleanup.
%res = landingpad { i8*, i32 }
cleanup
;; A landing pad which can catch an integer and can only throw a double.
%res = landingpad { i8*, i32 }
catch i8** @_ZTIi
filter [1 x i8**] [@_ZTId]
Intrinsic Functions¶
LLVM supports the notion of an “intrinsic function”. These functions have well known names and semantics and are required to follow certain restrictions. Overall, these intrinsics represent an extension mechanism for the LLVM language that does not require changing all of the transformations in LLVM when adding to the language (or the bitcode reader/writer, the parser, etc...).
Intrinsic function names must all start with an “llvm.” prefix. This prefix is reserved in LLVM for intrinsic names; thus, function names may not begin with this prefix. Intrinsic functions must always be external functions: you cannot define the body of intrinsic functions. Intrinsic functions may only be used in call or invoke instructions: it is illegal to take the address of an intrinsic function. Additionally, because intrinsic functions are part of the LLVM language, it is required if any are added that they be documented here.
Some intrinsic functions can be overloaded, i.e., the intrinsic represents a family of functions that perform the same operation but on different data types. Because LLVM can represent over 8 million different integer types, overloading is used commonly to allow an intrinsic function to operate on any integer type. One or more of the argument types or the result type can be overloaded to accept any integer type. Argument types may also be defined as exactly matching a previous argument’s type or the result type. This allows an intrinsic function which accepts multiple arguments, but needs all of them to be of the same type, to only be overloaded with respect to a single argument or the result.
Overloaded intrinsics will have the names of their overloaded argument types encoded into their function name, each preceded by a period. Only those types which are overloaded result in a name suffix. Arguments whose type is matched against another type do not. For example, the llvm.ctpop function can take an integer of any width and returns an integer of exactly the same integer width. This leads to a family of functions such as i8 @llvm.ctpop.i8(i8 %val) and i29 @llvm.ctpop.i29(i29 %val). Only one type, the return type, is overloaded, and only one type suffix is required. Because the argument’s type is matched against the return type, it does not require its own name suffix.
To learn how to add an intrinsic function, please see the Extending LLVM Guide.
Variable Argument Handling Intrinsics¶
Variable argument support is defined in LLVM with the va_arg instruction and these three intrinsic functions. These functions are related to the similarly named macros defined in the <stdarg.h> header file.
All of these functions operate on arguments that use a target-specific value type “va_list”. The LLVM assembly language reference manual does not define what this type is, so all transformations should be prepared to handle these functions regardless of the type used.
This example shows how the va_arg instruction and the variable argument handling intrinsic functions are used.
; This struct is different for every platform. For most platforms,
; it is merely an i8*.
%struct.va_list = type { i8* }
; For Unix x86_64 platforms, va_list is the following struct:
; %struct.va_list = type { i32, i32, i8*, i8* }
define i32 @test(i32 %X, ...) {
; Initialize variable argument processing
%ap = alloca %struct.va_list
%ap2 = bitcast %struct.va_list* %ap to i8*
call void @llvm.va_start(i8* %ap2)
; Read a single integer argument
%tmp = va_arg i8* %ap2, i32
; Demonstrate usage of llvm.va_copy and llvm.va_end
%aq = alloca i8*
%aq2 = bitcast i8** %aq to i8*
call void @llvm.va_copy(i8* %aq2, i8* %ap2)
call void @llvm.va_end(i8* %aq2)
; Stop processing of arguments.
call void @llvm.va_end(i8* %ap2)
ret i32 %tmp
}
declare void @llvm.va_start(i8*)
declare void @llvm.va_copy(i8*, i8*)
declare void @llvm.va_end(i8*)
‘llvm.va_start‘ Intrinsic¶
Arguments:¶
The argument is a pointer to a va_list element to initialize.
Semantics:¶
The ‘llvm.va_start‘ intrinsic works just like the va_start macro available in C. In a target-dependent way, it initializes the va_list element to which the argument points, so that the next call to va_arg will produce the first variable argument passed to the function. Unlike the C va_start macro, this intrinsic does not need to know the last argument of the function as the compiler can figure that out.
‘llvm.va_end‘ Intrinsic¶
Overview:¶
The ‘llvm.va_end‘ intrinsic destroys *<arglist>, which has been initialized previously with llvm.va_start or llvm.va_copy.
Arguments:¶
The argument is a pointer to a va_list to destroy.
Semantics:¶
The ‘llvm.va_end‘ intrinsic works just like the va_end macro available in C. In a target-dependent way, it destroys the va_list element to which the argument points. Calls to llvm.va_start and llvm.va_copy must be matched exactly with calls to llvm.va_end.
‘llvm.va_copy‘ Intrinsic¶
Overview:¶
The ‘llvm.va_copy‘ intrinsic copies the current argument position from the source argument list to the destination argument list.
Arguments:¶
The first argument is a pointer to a va_list element to initialize. The second argument is a pointer to a va_list element to copy from.
Semantics:¶
The ‘llvm.va_copy‘ intrinsic works just like the va_copy macro available in C. In a target-dependent way, it copies the source va_list element into the destination va_list element. This intrinsic is necessary because the llvm.va_start intrinsic may be arbitrarily complex and require, for example, memory allocation.
Accurate Garbage Collection Intrinsics¶
LLVM’s support for Accurate Garbage Collection (GC) requires the frontend to generate code containing appropriate intrinsic calls and select an appropriate GC strategy which knows how to lower these intrinsics in a manner which is appropriate for the target collector.
These intrinsics allow identification of GC roots on the stack, as well as garbage collector implementations that require read and write barriers. Frontends for type-safe garbage collected languages should generate these intrinsics to make use of the LLVM garbage collectors. For more details, see Garbage Collection with LLVM.
Experimental Statepoint Intrinsics¶
LLVM provides a second experimental set of intrinsics for describing garbage collection safepoints in compiled code. These intrinsics are an alternative to the llvm.gcroot intrinsics, but are compatible with the ones for read and write barriers. The differences in approach are covered in the Garbage Collection with LLVM documentation. The intrinsics themselves are described in Garbage Collection Safepoints in LLVM.
‘llvm.gcroot‘ Intrinsic¶
Overview:¶
The ‘llvm.gcroot‘ intrinsic declares the existence of a GC root to the code generator, and allows some metadata to be associated with it.
Arguments:¶
The first argument specifies the address of a stack object that contains the root pointer. The second pointer (which must be either a constant or a global value address) contains the meta-data to be associated with the root.
Semantics:¶
At runtime, a call to this intrinsic stores a null pointer into the “ptrloc” location. At compile-time, the code generator generates information to allow the runtime to find the pointer at GC safe points. The ‘llvm.gcroot‘ intrinsic may only be used in a function which specifies a GC algorithm.
‘llvm.gcread‘ Intrinsic¶
Overview:¶
The ‘llvm.gcread‘ intrinsic identifies reads of references from heap locations, allowing garbage collector implementations that require read barriers.
Arguments:¶
The second argument is the address to read from, which should be an address allocated from the garbage collector. The first object is a pointer to the start of the referenced object, if needed by the language runtime (otherwise null).
Semantics:¶
The ‘llvm.gcread‘ intrinsic has the same semantics as a load instruction, but may be replaced with substantially more complex code by the garbage collector runtime, as needed. The ‘llvm.gcread‘ intrinsic may only be used in a function which specifies a GC algorithm.
‘llvm.gcwrite‘ Intrinsic¶
Overview:¶
The ‘llvm.gcwrite‘ intrinsic identifies writes of references to heap locations, allowing garbage collector implementations that require write barriers (such as generational or reference counting collectors).
Arguments:¶
The first argument is the reference to store, the second is the start of the object to store it to, and the third is the address of the field of Obj to store to. If the runtime does not require a pointer to the object, Obj may be null.
Semantics:¶
The ‘llvm.gcwrite‘ intrinsic has the same semantics as a store instruction, but may be replaced with substantially more complex code by the garbage collector runtime, as needed. The ‘llvm.gcwrite‘ intrinsic may only be used in a function which specifies a GC algorithm.
Code Generator Intrinsics¶
These intrinsics are provided by LLVM to expose special features that may only be implemented with code generator support.
‘llvm.returnaddress‘ Intrinsic¶
Overview:¶
The ‘llvm.returnaddress‘ intrinsic attempts to compute a target-specific value indicating the return address of the current function or one of its callers.
Arguments:¶
The argument to this intrinsic indicates which function to return the address for. Zero indicates the calling function, one indicates its caller, etc. The argument is required to be a constant integer value.
Semantics:¶
The ‘llvm.returnaddress‘ intrinsic either returns a pointer indicating the return address of the specified call frame, or zero if it cannot be identified. The value returned by this intrinsic is likely to be incorrect or 0 for arguments other than zero, so it should only be used for debugging purposes.
Note that calling this intrinsic does not prevent function inlining or other aggressive transformations, so the value returned may not be that of the obvious source-language caller.
‘llvm.frameaddress‘ Intrinsic¶
Overview:¶
The ‘llvm.frameaddress‘ intrinsic attempts to return the target-specific frame pointer value for the specified stack frame.
Arguments:¶
The argument to this intrinsic indicates which function to return the frame pointer for. Zero indicates the calling function, one indicates its caller, etc. The argument is required to be a constant integer value.
Semantics:¶
The ‘llvm.frameaddress‘ intrinsic either returns a pointer indicating the frame address of the specified call frame, or zero if it cannot be identified. The value returned by this intrinsic is likely to be incorrect or 0 for arguments other than zero, so it should only be used for debugging purposes.
Note that calling this intrinsic does not prevent function inlining or other aggressive transformations, so the value returned may not be that of the obvious source-language caller.
‘llvm.localescape‘ and ‘llvm.localrecover‘ Intrinsics¶
Syntax:¶
declare void @llvm.localescape(...)
declare i8* @llvm.localrecover(i8* %func, i8* %fp, i32 %idx)
Overview:¶
The ‘llvm.localescape‘ intrinsic escapes offsets of a collection of static allocas, and the ‘llvm.localrecover‘ intrinsic applies those offsets to a live frame pointer to recover the address of the allocation. The offset is computed during frame layout of the caller of llvm.localescape.
Arguments:¶
All arguments to ‘llvm.localescape‘ must be pointers to static allocas or casts of static allocas. Each function can only call ‘llvm.localescape‘ once, and it can only do so from the entry block.
The func argument to ‘llvm.localrecover‘ must be a constant bitcasted pointer to a function defined in the current module. The code generator cannot determine the frame allocation offset of functions defined in other modules.
The fp argument to ‘llvm.localrecover‘ must be a frame pointer of a call frame that is currently live. The return value of ‘llvm.localaddress‘ is one way to produce such a value, but various runtimes also expose a suitable pointer in platform-specific ways.
The idx argument to ‘llvm.localrecover‘ indicates which alloca passed to ‘llvm.localescape‘ to recover. It is zero-indexed.
Semantics:¶
These intrinsics allow a group of functions to share access to a set of local stack allocations of one parent function. The parent function may call the ‘llvm.localescape‘ intrinsic once from the function entry block, and the child functions can use ‘llvm.localrecover‘ to access the escaped allocas. The ‘llvm.localescape‘ intrinsic blocks inlining, as inlining changes where the escaped allocas are allocated, which would break attempts to use ‘llvm.localrecover‘.
‘llvm.read_register‘ and ‘llvm.write_register‘ Intrinsics¶
Syntax:¶
declare i32 @llvm.read_register.i32(metadata)
declare i64 @llvm.read_register.i64(metadata)
declare void @llvm.write_register.i32(metadata, i32 @value)
declare void @llvm.write_register.i64(metadata, i64 @value)
!0 = !{!"sp\00"}
Overview:¶
The ‘llvm.read_register‘ and ‘llvm.write_register‘ intrinsics provide access to the named register. The register must be valid on the architecture being compiled to. The type needs to be compatible with the register being read.
Semantics:¶
The ‘llvm.read_register‘ intrinsic returns the current value of the register, where possible. The ‘llvm.write_register‘ intrinsic sets the current value of the register, where possible.
This is useful to implement named register global variables that need to always be mapped to a specific register, as is common practice on bare-metal programs including OS kernels.
The compiler doesn’t check for register availability or for conflicting uses of the register in surrounding code, including inline assembly. Because of that, allocatable registers are not supported.
Warning: So far this only works with the stack pointer on selected architectures (ARM, AArch64, PowerPC and x86_64). A significant amount of work is needed to support other registers, and even more so allocatable registers.
‘llvm.stacksave‘ Intrinsic¶
Overview:¶
The ‘llvm.stacksave‘ intrinsic is used to remember the current state of the function stack, for use with llvm.stackrestore. This is useful for implementing language features like scoped automatic variable-sized arrays in C99.
Semantics:¶
This intrinsic returns an opaque pointer value that can be passed to llvm.stackrestore. When an llvm.stackrestore intrinsic is executed with a value saved from llvm.stacksave, it effectively restores the state of the stack to the state it was in when the llvm.stacksave intrinsic executed. In practice, this pops any alloca blocks from the stack that were allocated after the llvm.stacksave was executed.
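Example:¶
A minimal sketch of the save/allocate/restore pattern (value names are arbitrary; the declarations use the i8* signatures consistent with the rest of this manual):
declare i8* @llvm.stacksave()
declare void @llvm.stackrestore(i8*)
%sp = call i8* @llvm.stacksave() ; remember the current stack state
%a = alloca i8, i32 %n ; variable-sized allocation
; ... use %a ...
call void @llvm.stackrestore(i8* %sp) ; pops %a off the stack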
‘llvm.stackrestore‘ Intrinsic¶
Overview:¶
The ‘llvm.stackrestore‘ intrinsic is used to restore the state of the function stack to the state it was in when the corresponding llvm.stacksave intrinsic executed. This is useful for implementing language features like scoped automatic variable-sized arrays in C99.
Semantics:¶
See the description for llvm.stacksave.
‘llvm.prefetch‘ Intrinsic¶
Overview:¶
The ‘llvm.prefetch‘ intrinsic is a hint to the code generator to insert a prefetch instruction if supported; otherwise, it is a noop. Prefetches have no effect on the behavior of the program but can change its performance characteristics.
Arguments:¶
address is the address to be prefetched, rw is the specifier determining if the fetch should be for a read (0) or write (1), and locality is a temporal locality specifier ranging from (0), no locality, to (3), extremely local, keep in cache. The cache type specifies whether the prefetch is performed on the data (1) or instruction (0) cache. The rw, locality and cache type arguments must be constant integers.
Semantics:¶
This intrinsic does not modify the behavior of the program. In particular, prefetches cannot trap and do not produce a value. On targets that support this intrinsic, the prefetch can provide hints to the processor cache for better performance.
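Example:¶
An illustrative sketch (the declaration shown assumes the non-overloaded i8* form of the intrinsic; %ptr is an arbitrary name). The call hints a read prefetch with maximal locality into the data cache:
declare void @llvm.prefetch(i8* <address>, i32 <rw>, i32 <locality>, i32 <cache type>)
call void @llvm.prefetch(i8* %ptr, i32 0, i32 3, i32 1) ; read, locality 3, data cache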
‘llvm.pcmarker‘ Intrinsic¶
Overview:¶
The ‘llvm.pcmarker‘ intrinsic is a method to export a Program Counter (PC) in a region of code to simulators and other tools. The method is target specific, but it is expected that the marker will use exported symbols to transmit the PC of the marker. The marker makes no guarantees that it will remain with any specific instruction after optimizations. It is possible that the presence of a marker will inhibit optimizations. The intended use is to be inserted after optimizations to allow correlations of simulation runs.
Arguments:¶
id is a numerical id identifying the marker.
Semantics:¶
This intrinsic does not modify the behavior of the program. Backends that do not support this intrinsic may ignore it.
‘llvm.readcyclecounter‘ Intrinsic¶
Overview:¶
The ‘llvm.readcyclecounter‘ intrinsic provides access to the cycle counter register (or similar low latency, high accuracy clocks) on those targets that support it. On X86, it should map to RDTSC. On Alpha, it should map to RPCC. As the backing counters overflow quickly (on the order of 9 seconds on Alpha), this should only be used for small timings.
Semantics:¶
When directly supported, reading the cycle counter should not modify any memory. Implementations are allowed to either return an application-specific value or a system-wide value. On backends without support, this is lowered to a constant 0.
Note that runtime support may be conditional on the privilege level the code is running at and on the host platform.
‘llvm.clear_cache‘ Intrinsic¶
Overview:¶
The ‘llvm.clear_cache‘ intrinsic ensures visibility of modifications in the specified range to the execution unit of the processor. On targets with non-unified instruction and data cache, the implementation flushes the instruction cache.
Semantics:¶
On platforms with coherent instruction and data caches (e.g. x86), this intrinsic is a nop. On platforms with non-coherent instruction and data cache (e.g. ARM, MIPS), the intrinsic is lowered either to appropriate instructions or a system call, if cache flushing requires special privileges.
The default behavior is to emit a call to __clear_cache from the run time library.
This intrinsic does not empty the instruction pipeline. Modifications of the current function are outside the scope of the intrinsic.
‘llvm.instrprof_increment‘ Intrinsic¶
Syntax:¶
declare void @llvm.instrprof_increment(i8* <name>, i64 <hash>,
i32 <num-counters>, i32 <index>)
Overview:¶
The ‘llvm.instrprof_increment‘ intrinsic can be emitted by a frontend for use with instrumentation based profiling. These will be lowered by the -instrprof pass to generate execution counts of a program at runtime.
Arguments:¶
The first argument is a pointer to a global variable containing the name of the entity being instrumented. This should generally be the (mangled) function name for a set of counters.
The second argument is a hash value that can be used by the consumer of the profile data to detect changes to the instrumented source, and the third is the number of counters associated with name. It is an error if hash or num-counters differ between two instances of instrprof_increment that refer to the same name.
The last argument refers to which of the counters for name should be incremented. It should be a value between 0 and num-counters.
Semantics:¶
This intrinsic represents an increment of a profiling counter. It will cause the -instrprof pass to generate the appropriate data structures and the code to increment the appropriate value, in a format that can be written out by a compiler runtime and consumed via the llvm-profdata tool.
Standard C Library Intrinsics¶
LLVM provides intrinsics for a few important standard C library functions. These intrinsics allow source-language front-ends to pass information about the alignment of the pointer arguments to the code generator, providing opportunity for more efficient code generation.
‘llvm.memcpy‘ Intrinsic¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.memcpy on any integer bit width and for different address spaces. Not all targets support all bit widths however.
declare void @llvm.memcpy.p0i8.p0i8.i32(i8* <dest>, i8* <src>,
i32 <len>, i32 <align>, i1 <isvolatile>)
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* <dest>, i8* <src>,
i64 <len>, i32 <align>, i1 <isvolatile>)
Overview:¶
The ‘llvm.memcpy.*‘ intrinsics copy a block of memory from the source location to the destination location.
Note that, unlike the standard libc function, the llvm.memcpy.* intrinsics do not return a value, take extra alignment/isvolatile arguments, and the pointers can be in specified address spaces.
Arguments:¶
The first argument is a pointer to the destination, the second is a pointer to the source. The third argument is an integer argument specifying the number of bytes to copy, the fourth argument is the alignment of the source and destination locations, and the fifth is a boolean indicating a volatile access.
If the call to this intrinsic has an alignment value that is not 0 or 1, then the caller guarantees that both the source and destination pointers are aligned to that boundary.
If the isvolatile parameter is true, the llvm.memcpy call is a volatile operation. The detailed access behavior is not very cleanly specified and it is unwise to depend on it.
Semantics:¶
The ‘llvm.memcpy.*‘ intrinsics copy a block of memory from the source location to the destination location, which are not allowed to overlap. It copies “len” bytes of memory over. If the argument is known to be aligned to some boundary, this can be specified as the fourth argument, otherwise it should be set to 0 or 1 (both meaning no alignment).
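Example:¶
For illustration, a call to the i32 overload declared above that copies 16 bytes with a guaranteed 4-byte alignment, non-volatile (pointer names are arbitrary):
call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dst, i8* %src,
i32 16, i32 4, i1 false)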
‘llvm.memmove‘ Intrinsic¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.memmove on any integer bit width and for different address space. Not all targets support all bit widths however.
declare void @llvm.memmove.p0i8.p0i8.i32(i8* <dest>, i8* <src>,
i32 <len>, i32 <align>, i1 <isvolatile>)
declare void @llvm.memmove.p0i8.p0i8.i64(i8* <dest>, i8* <src>,
i64 <len>, i32 <align>, i1 <isvolatile>)
Overview:¶
The ‘llvm.memmove.*‘ intrinsics move a block of memory from the source location to the destination location. It is similar to the ‘llvm.memcpy‘ intrinsic but allows the two memory locations to overlap.
Note that, unlike the standard libc function, the llvm.memmove.* intrinsics do not return a value, take extra alignment/isvolatile arguments, and the pointers can be in specified address spaces.
Arguments:¶
The first argument is a pointer to the destination, the second is a pointer to the source. The third argument is an integer argument specifying the number of bytes to copy, the fourth argument is the alignment of the source and destination locations, and the fifth is a boolean indicating a volatile access.
If the call to this intrinsic has an alignment value that is not 0 or 1, then the caller guarantees that the source and destination pointers are aligned to that boundary.
If the isvolatile parameter is true, the llvm.memmove call is a volatile operation. The detailed access behavior is not very cleanly specified and it is unwise to depend on it.
Semantics:¶
The ‘llvm.memmove.*‘ intrinsics copy a block of memory from the source location to the destination location, which may overlap. It copies “len” bytes of memory over. If the argument is known to be aligned to some boundary, this can be specified as the fourth argument, otherwise it should be set to 0 or 1 (both meaning no alignment).
‘llvm.memset.*‘ Intrinsics¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.memset on any integer bit width and for different address spaces. However, not all targets support all bit widths.
declare void @llvm.memset.p0i8.i32(i8* <dest>, i8 <val>,
i32 <len>, i32 <align>, i1 <isvolatile>)
declare void @llvm.memset.p0i8.i64(i8* <dest>, i8 <val>,
i64 <len>, i32 <align>, i1 <isvolatile>)
Overview:¶
The ‘llvm.memset.*‘ intrinsics fill a block of memory with a particular byte value.
Note that, unlike the standard libc function, the llvm.memset intrinsic does not return a value and takes extra alignment/volatile arguments. Also, the destination can be in an arbitrary address space.
Arguments:¶
The first argument is a pointer to the destination to fill, the second is the byte value with which to fill it, the third argument is an integer argument specifying the number of bytes to fill, and the fourth argument is the known alignment of the destination location.
If the call to this intrinsic has an alignment value that is not 0 or 1, then the caller guarantees that the destination pointer is aligned to that boundary.
If the isvolatile parameter is true, the llvm.memset call is a volatile operation. The detailed access behavior is not very cleanly specified and it is unwise to depend on it.
Semantics:¶
The ‘llvm.memset.*‘ intrinsics fill “len” bytes of memory starting at the destination location. If the argument is known to be aligned to some boundary, this can be specified as the fourth argument, otherwise it should be set to 0 or 1 (both meaning no alignment).
‘llvm.sqrt.*‘ Intrinsic¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.sqrt on any floating point or vector of floating point type. Not all targets support all types however.
declare float @llvm.sqrt.f32(float %Val)
declare double @llvm.sqrt.f64(double %Val)
declare x86_fp80 @llvm.sqrt.f80(x86_fp80 %Val)
declare fp128 @llvm.sqrt.f128(fp128 %Val)
declare ppc_fp128 @llvm.sqrt.ppcf128(ppc_fp128 %Val)
Overview:¶
The ‘llvm.sqrt‘ intrinsics return the sqrt of the specified operand, returning the same value as the libm ‘sqrt‘ functions would. Unlike sqrt in libm, however, llvm.sqrt has undefined behavior for negative numbers other than -0.0 (which allows for better optimization, because there is no need to worry about errno being set). llvm.sqrt(-0.0) is defined to return -0.0 like IEEE sqrt.
Arguments:¶
The argument and return value are floating point numbers of the same type.
Semantics:¶
This function returns the sqrt of the specified operand if it is a nonnegative floating point number.
‘llvm.powi.*‘ Intrinsic¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.powi on any floating point or vector of floating point type. Not all targets support all types however.
declare float @llvm.powi.f32(float %Val, i32 %power)
declare double @llvm.powi.f64(double %Val, i32 %power)
declare x86_fp80 @llvm.powi.f80(x86_fp80 %Val, i32 %power)
declare fp128 @llvm.powi.f128(fp128 %Val, i32 %power)
declare ppc_fp128 @llvm.powi.ppcf128(ppc_fp128 %Val, i32 %power)
Overview:¶
The ‘llvm.powi.*‘ intrinsics return the first operand raised to the specified (positive or negative) power. The order of evaluation of multiplications is not defined. When a vector of floating point type is used, the second argument remains a scalar integer value.
Arguments:¶
The second argument is an integer power, and the first is a value to raise to that power.
Semantics:¶
This function returns the first value raised to the second power with an unspecified sequence of rounding operations.
‘llvm.sin.*‘ Intrinsic¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.sin on any floating point or vector of floating point type. Not all targets support all types however.
declare float @llvm.sin.f32(float %Val)
declare double @llvm.sin.f64(double %Val)
declare x86_fp80 @llvm.sin.f80(x86_fp80 %Val)
declare fp128 @llvm.sin.f128(fp128 %Val)
declare ppc_fp128 @llvm.sin.ppcf128(ppc_fp128 %Val)
Arguments:¶
The argument and return value are floating point numbers of the same type.
Semantics:¶
This function returns the sine of the specified operand, returning the same values as the libm sin functions would, and handles error conditions in the same way.
‘llvm.cos.*‘ Intrinsic¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.cos on any floating point or vector of floating point type. Not all targets support all types however.
declare float @llvm.cos.f32(float %Val)
declare double @llvm.cos.f64(double %Val)
declare x86_fp80 @llvm.cos.f80(x86_fp80 %Val)
declare fp128 @llvm.cos.f128(fp128 %Val)
declare ppc_fp128 @llvm.cos.ppcf128(ppc_fp128 %Val)
Arguments:¶
The argument and return value are floating point numbers of the same type.
Semantics:¶
This function returns the cosine of the specified operand, returning the same values as the libm cos functions would, and handles error conditions in the same way.
‘llvm.pow.*‘ Intrinsic¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.pow on any floating point or vector of floating point type. Not all targets support all types however.
declare float @llvm.pow.f32(float %Val, float %Power)
declare double @llvm.pow.f64(double %Val, double %Power)
declare x86_fp80 @llvm.pow.f80(x86_fp80 %Val, x86_fp80 %Power)
declare fp128 @llvm.pow.f128(fp128 %Val, fp128 %Power)
declare ppc_fp128 @llvm.pow.ppcf128(ppc_fp128 %Val, ppc_fp128 %Power)
Overview:¶
The ‘llvm.pow.*‘ intrinsics return the first operand raised to the specified (positive or negative) power.
Arguments:¶
The second argument is a floating point power, and the first is a value to raise to that power.
Semantics:¶
This function returns the first value raised to the second power, returning the same values as the libm pow functions would, and handles error conditions in the same way.
‘llvm.exp.*‘ Intrinsic¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.exp on any floating point or vector of floating point type. Not all targets support all types however.
declare float @llvm.exp.f32(float %Val)
declare double @llvm.exp.f64(double %Val)
declare x86_fp80 @llvm.exp.f80(x86_fp80 %Val)
declare fp128 @llvm.exp.f128(fp128 %Val)
declare ppc_fp128 @llvm.exp.ppcf128(ppc_fp128 %Val)
Arguments:¶
The argument and return value are floating point numbers of the same type.
Semantics:¶
This function returns the same values as the libm exp functions would, and handles error conditions in the same way.
‘llvm.exp2.*‘ Intrinsic¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.exp2 on any floating point or vector of floating point type. Not all targets support all types however.
declare float @llvm.exp2.f32(float %Val)
declare double @llvm.exp2.f64(double %Val)
declare x86_fp80 @llvm.exp2.f80(x86_fp80 %Val)
declare fp128 @llvm.exp2.f128(fp128 %Val)
declare ppc_fp128 @llvm.exp2.ppcf128(ppc_fp128 %Val)
Arguments:¶
The argument and return value are floating point numbers of the same type.
Semantics:¶
This function returns the same values as the libm exp2 functions would, and handles error conditions in the same way.
‘llvm.log.*‘ Intrinsic¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.log on any floating point or vector of floating point type. Not all targets support all types however.
declare float @llvm.log.f32(float %Val)
declare double @llvm.log.f64(double %Val)
declare x86_fp80 @llvm.log.f80(x86_fp80 %Val)
declare fp128 @llvm.log.f128(fp128 %Val)
declare ppc_fp128 @llvm.log.ppcf128(ppc_fp128 %Val)
Arguments:¶
The argument and return value are floating point numbers of the same type.
Semantics:¶
This function returns the same values as the libm log functions would, and handles error conditions in the same way.
‘llvm.log10.*‘ Intrinsic¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.log10 on any floating point or vector of floating point type. Not all targets support all types however.
declare float @llvm.log10.f32(float %Val)
declare double @llvm.log10.f64(double %Val)
declare x86_fp80 @llvm.log10.f80(x86_fp80 %Val)
declare fp128 @llvm.log10.f128(fp128 %Val)
declare ppc_fp128 @llvm.log10.ppcf128(ppc_fp128 %Val)
Arguments:¶
The argument and return value are floating point numbers of the same type.
Semantics:¶
This function returns the same values as the libm log10 functions would, and handles error conditions in the same way.
‘llvm.log2.*‘ Intrinsic¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.log2 on any floating point or vector of floating point type. Not all targets support all types however.
declare float @llvm.log2.f32(float %Val)
declare double @llvm.log2.f64(double %Val)
declare x86_fp80 @llvm.log2.f80(x86_fp80 %Val)
declare fp128 @llvm.log2.f128(fp128 %Val)
declare ppc_fp128 @llvm.log2.ppcf128(ppc_fp128 %Val)
Arguments:¶
The argument and return value are floating point numbers of the same type.
Semantics:¶
This function returns the same values as the libm log2 functions would, and handles error conditions in the same way.
‘llvm.fma.*‘ Intrinsic¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.fma on any floating point or vector of floating point type. Not all targets support all types however.
declare float @llvm.fma.f32(float %a, float %b, float %c)
declare double @llvm.fma.f64(double %a, double %b, double %c)
declare x86_fp80 @llvm.fma.f80(x86_fp80 %a, x86_fp80 %b, x86_fp80 %c)
declare fp128 @llvm.fma.f128(fp128 %a, fp128 %b, fp128 %c)
declare ppc_fp128 @llvm.fma.ppcf128(ppc_fp128 %a, ppc_fp128 %b, ppc_fp128 %c)
Arguments:¶
The argument and return value are floating point numbers of the same type.
Semantics:¶
This function returns the same values as the libm fma functions would, and does not set errno.
‘llvm.fabs.*‘ Intrinsic¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.fabs on any floating point or vector of floating point type. Not all targets support all types however.
declare float @llvm.fabs.f32(float %Val)
declare double @llvm.fabs.f64(double %Val)
declare x86_fp80 @llvm.fabs.f80(x86_fp80 %Val)
declare fp128 @llvm.fabs.f128(fp128 %Val)
declare ppc_fp128 @llvm.fabs.ppcf128(ppc_fp128 %Val)
Arguments:¶
The argument and return value are floating point numbers of the same type.
Semantics:¶
This function returns the same values as the libm fabs functions would, and handles error conditions in the same way.
‘llvm.minnum.*‘ Intrinsic¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.minnum on any floating point or vector of floating point type. Not all targets support all types however.
declare float @llvm.minnum.f32(float %Val0, float %Val1)
declare double @llvm.minnum.f64(double %Val0, double %Val1)
declare x86_fp80 @llvm.minnum.f80(x86_fp80 %Val0, x86_fp80 %Val1)
declare fp128 @llvm.minnum.f128(fp128 %Val0, fp128 %Val1)
declare ppc_fp128 @llvm.minnum.ppcf128(ppc_fp128 %Val0, ppc_fp128 %Val1)
Arguments:¶
The arguments and return value are floating point numbers of the same type.
Semantics:¶
Follows the IEEE-754 semantics for minNum, which also match for libm’s fmin.
If either operand is a NaN, returns the other non-NaN operand. Returns NaN only if both operands are NaN. If the operands compare equal, returns a value that compares equal to both operands. This means that fmin(+/-0.0, +/-0.0) could return either -0.0 or 0.0.
‘llvm.maxnum.*‘ Intrinsic¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.maxnum on any floating point or vector of floating point type. Not all targets support all types however.
declare float @llvm.maxnum.f32(float %Val0, float %Val1)
declare double @llvm.maxnum.f64(double %Val0, double %Val1)
declare x86_fp80 @llvm.maxnum.f80(x86_fp80 %Val0, x86_fp80 %Val1)
declare fp128 @llvm.maxnum.f128(fp128 %Val0, fp128 %Val1)
declare ppc_fp128 @llvm.maxnum.ppcf128(ppc_fp128 %Val0, ppc_fp128 %Val1)
Arguments:¶
The arguments and return value are floating point numbers of the same type.
Semantics:¶
Follows the IEEE-754 semantics for maxNum, which also match for libm’s fmax.
If either operand is a NaN, returns the other non-NaN operand. Returns NaN only if both operands are NaN. If the operands compare equal, returns a value that compares equal to both operands. This means that fmax(+/-0.0, +/-0.0) could return either -0.0 or 0.0.
‘llvm.copysign.*‘ Intrinsic¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.copysign on any floating point or vector of floating point type. Not all targets support all types however.
declare float @llvm.copysign.f32(float %Mag, float %Sgn)
declare double @llvm.copysign.f64(double %Mag, double %Sgn)
declare x86_fp80 @llvm.copysign.f80(x86_fp80 %Mag, x86_fp80 %Sgn)
declare fp128 @llvm.copysign.f128(fp128 %Mag, fp128 %Sgn)
declare ppc_fp128 @llvm.copysign.ppcf128(ppc_fp128 %Mag, ppc_fp128 %Sgn)
Overview:¶
The ‘llvm.copysign.*’ intrinsics return a value with the magnitude of the first operand and the sign of the second operand.
Arguments:¶
The arguments and return value are floating point numbers of the same type.
Semantics:¶
This function returns the same values as the libm copysign functions would, and handles error conditions in the same way.
‘llvm.floor.*’ Intrinsic¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.floor on any floating point or vector of floating point type. Not all targets support all types however.
declare float @llvm.floor.f32(float %Val)
declare double @llvm.floor.f64(double %Val)
declare x86_fp80 @llvm.floor.f80(x86_fp80 %Val)
declare fp128 @llvm.floor.f128(fp128 %Val)
declare ppc_fp128 @llvm.floor.ppcf128(ppc_fp128 %Val)
Arguments:¶
The argument and return value are floating point numbers of the same type.
Semantics:¶
This function returns the same values as the libm floor functions would, and handles error conditions in the same way.
‘llvm.ceil.*’ Intrinsic¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.ceil on any floating point or vector of floating point type. Not all targets support all types however.
declare float @llvm.ceil.f32(float %Val)
declare double @llvm.ceil.f64(double %Val)
declare x86_fp80 @llvm.ceil.f80(x86_fp80 %Val)
declare fp128 @llvm.ceil.f128(fp128 %Val)
declare ppc_fp128 @llvm.ceil.ppcf128(ppc_fp128 %Val)
Arguments:¶
The argument and return value are floating point numbers of the same type.
Semantics:¶
This function returns the same values as the libm ceil functions would, and handles error conditions in the same way.
‘llvm.trunc.*’ Intrinsic¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.trunc on any floating point or vector of floating point type. Not all targets support all types however.
declare float @llvm.trunc.f32(float %Val)
declare double @llvm.trunc.f64(double %Val)
declare x86_fp80 @llvm.trunc.f80(x86_fp80 %Val)
declare fp128 @llvm.trunc.f128(fp128 %Val)
declare ppc_fp128 @llvm.trunc.ppcf128(ppc_fp128 %Val)
Overview:¶
The ‘llvm.trunc.*’ intrinsics return the operand rounded to the nearest integer not larger in magnitude than the operand.
Arguments:¶
The argument and return value are floating point numbers of the same type.
Semantics:¶
This function returns the same values as the libm trunc functions would, and handles error conditions in the same way.
‘llvm.rint.*’ Intrinsic¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.rint on any floating point or vector of floating point type. Not all targets support all types however.
declare float @llvm.rint.f32(float %Val)
declare double @llvm.rint.f64(double %Val)
declare x86_fp80 @llvm.rint.f80(x86_fp80 %Val)
declare fp128 @llvm.rint.f128(fp128 %Val)
declare ppc_fp128 @llvm.rint.ppcf128(ppc_fp128 %Val)
Overview:¶
The ‘llvm.rint.*’ intrinsics return the operand rounded to the nearest integer. They may raise an inexact floating-point exception if the operand isn’t an integer.
Arguments:¶
The argument and return value are floating point numbers of the same type.
Semantics:¶
This function returns the same values as the libm rint functions would, and handles error conditions in the same way.
‘llvm.nearbyint.*’ Intrinsic¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.nearbyint on any floating point or vector of floating point type. Not all targets support all types however.
declare float @llvm.nearbyint.f32(float %Val)
declare double @llvm.nearbyint.f64(double %Val)
declare x86_fp80 @llvm.nearbyint.f80(x86_fp80 %Val)
declare fp128 @llvm.nearbyint.f128(fp128 %Val)
declare ppc_fp128 @llvm.nearbyint.ppcf128(ppc_fp128 %Val)
Arguments:¶
The argument and return value are floating point numbers of the same type.
Semantics:¶
This function returns the same values as the libm nearbyint functions would, and handles error conditions in the same way.
‘llvm.round.*’ Intrinsic¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.round on any floating point or vector of floating point type. Not all targets support all types however.
declare float @llvm.round.f32(float %Val)
declare double @llvm.round.f64(double %Val)
declare x86_fp80 @llvm.round.f80(x86_fp80 %Val)
declare fp128 @llvm.round.f128(fp128 %Val)
declare ppc_fp128 @llvm.round.ppcf128(ppc_fp128 %Val)
Arguments:¶
The argument and return value are floating point numbers of the same type.
Semantics:¶
This function returns the same values as the libm round functions would, and handles error conditions in the same way.
Bit Manipulation Intrinsics¶
LLVM provides intrinsics for a few important bit manipulation operations. These allow efficient code generation for some algorithms.
‘llvm.bswap.*’ Intrinsics¶
Syntax:¶
This is an overloaded intrinsic function. You can use bswap on any integer type that is an even number of bytes (i.e. BitWidth % 16 == 0).
declare i16 @llvm.bswap.i16(i16 <id>)
declare i32 @llvm.bswap.i32(i32 <id>)
declare i64 @llvm.bswap.i64(i64 <id>)
Overview:¶
The ‘llvm.bswap’ family of intrinsics is used to byte swap integer values with an even number of bytes (positive multiple of 16 bits). These are useful for performing operations on data that is not in the target’s native byte order.
Semantics:¶
The llvm.bswap.i16 intrinsic returns an i16 value that has the high and low byte of the input i16 swapped. Similarly, the llvm.bswap.i32 intrinsic returns an i32 value that has the four bytes of the input i32 swapped, so that if the input bytes are numbered 0, 1, 2, 3 then the returned i32 will have its bytes in 3, 2, 1, 0 order. The llvm.bswap.i48, llvm.bswap.i64 and other intrinsics extend this concept to additional even-byte lengths (6 bytes, 8 bytes and more, respectively).
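For instance, a minimal sketch of a byte swap on a concrete constant (the value is illustrative):
%swapped = call i32 @llvm.bswap.i32(i32 305419896)  ; 0x12345678 --> 0x78563412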
‘llvm.ctpop.*’ Intrinsic¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.ctpop on any integer bit width, or on any vector with integer elements. Not all targets support all bit widths or vector types, however.
declare i8 @llvm.ctpop.i8(i8 <src>)
declare i16 @llvm.ctpop.i16(i16 <src>)
declare i32 @llvm.ctpop.i32(i32 <src>)
declare i64 @llvm.ctpop.i64(i64 <src>)
declare i256 @llvm.ctpop.i256(i256 <src>)
declare <2 x i32> @llvm.ctpop.v2i32(<2 x i32> <src>)
Arguments:¶
The only argument is the value to be counted. The argument may be of any integer type, or a vector with integer elements. The return type must match the argument type.
Semantics:¶
The ‘llvm.ctpop’ intrinsic counts the 1’s in a variable, or within each element of a vector.
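A brief sketch on a concrete constant (the value is illustrative):
%cnt = call i32 @llvm.ctpop.i32(i32 255)  ; 0xFF has eight set bits --> 8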
‘llvm.ctlz.*’ Intrinsic¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.ctlz on any integer bit width, or any vector whose elements are integers. Not all targets support all bit widths or vector types, however.
declare i8 @llvm.ctlz.i8 (i8 <src>, i1 <is_zero_undef>)
declare i16 @llvm.ctlz.i16 (i16 <src>, i1 <is_zero_undef>)
declare i32 @llvm.ctlz.i32 (i32 <src>, i1 <is_zero_undef>)
declare i64 @llvm.ctlz.i64 (i64 <src>, i1 <is_zero_undef>)
declare i256 @llvm.ctlz.i256(i256 <src>, i1 <is_zero_undef>)
declare <2 x i32> @llvm.ctlz.v2i32(<2 x i32> <src>, i1 <is_zero_undef>)
Overview:¶
The ‘llvm.ctlz’ family of intrinsic functions counts the number of leading zeros in a variable.
Arguments:¶
The first argument is the value to be counted. This argument may be of any integer type, or a vector with integer element type. The return type must match the first argument type.
The second argument must be a constant and is a flag to indicate whether the intrinsic should ensure that a zero as the first argument produces a defined result. Historically some architectures did not provide a defined result for zero values as efficiently, and many algorithms are now predicated on avoiding zero-value inputs.
Semantics:¶
The ‘llvm.ctlz’ intrinsic counts the leading (most significant) zeros in a variable, or within each element of the vector. If src == 0 then the result is the size in bits of the type of src if is_zero_undef == 0 and undef otherwise. For example, llvm.ctlz(i32 2) = 30.
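A short sketch of the two zero-handling modes (the value name is illustrative):
%lz     = call i32 @llvm.ctlz.i32(i32 %x, i1 false)  ; defined for %x == 0: yields 32
%lzfast = call i32 @llvm.ctlz.i32(i32 %x, i1 true)   ; undef when %x == 0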
‘llvm.cttz.*’ Intrinsic¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.cttz on any integer bit width, or any vector of integer elements. Not all targets support all bit widths or vector types, however.
declare i8 @llvm.cttz.i8 (i8 <src>, i1 <is_zero_undef>)
declare i16 @llvm.cttz.i16 (i16 <src>, i1 <is_zero_undef>)
declare i32 @llvm.cttz.i32 (i32 <src>, i1 <is_zero_undef>)
declare i64 @llvm.cttz.i64 (i64 <src>, i1 <is_zero_undef>)
declare i256 @llvm.cttz.i256(i256 <src>, i1 <is_zero_undef>)
declare <2 x i32> @llvm.cttz.v2i32(<2 x i32> <src>, i1 <is_zero_undef>)
Arguments:¶
The first argument is the value to be counted. This argument may be of any integer type, or a vector with integer element type. The return type must match the first argument type.
The second argument must be a constant and is a flag to indicate whether the intrinsic should ensure that a zero as the first argument produces a defined result. Historically some architectures did not provide a defined result for zero values as efficiently, and many algorithms are now predicated on avoiding zero-value inputs.
Semantics:¶
The ‘llvm.cttz’ intrinsic counts the trailing (least significant) zeros in a variable, or within each element of a vector. If src == 0 then the result is the size in bits of the type of src if is_zero_undef == 0 and undef otherwise. For example, llvm.cttz(2) = 1.
Arithmetic with Overflow Intrinsics¶
LLVM provides intrinsics for some arithmetic with overflow operations.
‘llvm.sadd.with.overflow.*’ Intrinsics¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.sadd.with.overflow on any integer bit width.
declare {i16, i1} @llvm.sadd.with.overflow.i16(i16 %a, i16 %b)
declare {i32, i1} @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
declare {i64, i1} @llvm.sadd.with.overflow.i64(i64 %a, i64 %b)
Overview:¶
The ‘llvm.sadd.with.overflow’ family of intrinsic functions perform a signed addition of the two arguments, and indicate whether an overflow occurred during the signed summation.
Arguments:¶
The arguments (%a and %b) and the first element of the result structure may be of integer types of any bit width, but they must have the same bit width. The second element of the result structure must be of type i1. %a and %b are the two values that will undergo signed addition.
Semantics:¶
The ‘llvm.sadd.with.overflow’ family of intrinsic functions perform a signed addition of the two variables. They return a structure: the first element is the signed summation, and the second element is a bit specifying if the signed summation resulted in an overflow.
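A typical usage sketch, unpacking the result structure and branching on the overflow bit (the label names are illustrative); the same pattern applies to the other with.overflow intrinsics below:
%res  = call {i32, i1} @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
%sum  = extractvalue {i32, i1} %res, 0
%obit = extractvalue {i32, i1} %res, 1
br i1 %obit, label %overflow, label %normal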
‘llvm.uadd.with.overflow.*’ Intrinsics¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.uadd.with.overflow on any integer bit width.
declare {i16, i1} @llvm.uadd.with.overflow.i16(i16 %a, i16 %b)
declare {i32, i1} @llvm.uadd.with.overflow.i32(i32 %a, i32 %b)
declare {i64, i1} @llvm.uadd.with.overflow.i64(i64 %a, i64 %b)
Overview:¶
The ‘llvm.uadd.with.overflow’ family of intrinsic functions perform an unsigned addition of the two arguments, and indicate whether a carry occurred during the unsigned summation.
Arguments:¶
The arguments (%a and %b) and the first element of the result structure may be of integer types of any bit width, but they must have the same bit width. The second element of the result structure must be of type i1. %a and %b are the two values that will undergo unsigned addition.
Semantics:¶
The ‘llvm.uadd.with.overflow’ family of intrinsic functions perform an unsigned addition of the two arguments. They return a structure: the first element is the sum, and the second element is a bit specifying if the unsigned summation resulted in a carry.
‘llvm.ssub.with.overflow.*’ Intrinsics¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.ssub.with.overflow on any integer bit width.
declare {i16, i1} @llvm.ssub.with.overflow.i16(i16 %a, i16 %b)
declare {i32, i1} @llvm.ssub.with.overflow.i32(i32 %a, i32 %b)
declare {i64, i1} @llvm.ssub.with.overflow.i64(i64 %a, i64 %b)
Overview:¶
The ‘llvm.ssub.with.overflow’ family of intrinsic functions perform a signed subtraction of the two arguments, and indicate whether an overflow occurred during the signed subtraction.
Arguments:¶
The arguments (%a and %b) and the first element of the result structure may be of integer types of any bit width, but they must have the same bit width. The second element of the result structure must be of type i1. %a and %b are the two values that will undergo signed subtraction.
Semantics:¶
The ‘llvm.ssub.with.overflow’ family of intrinsic functions perform a signed subtraction of the two arguments. They return a structure: the first element is the subtraction, and the second element is a bit specifying if the signed subtraction resulted in an overflow.
‘llvm.usub.with.overflow.*’ Intrinsics¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.usub.with.overflow on any integer bit width.
declare {i16, i1} @llvm.usub.with.overflow.i16(i16 %a, i16 %b)
declare {i32, i1} @llvm.usub.with.overflow.i32(i32 %a, i32 %b)
declare {i64, i1} @llvm.usub.with.overflow.i64(i64 %a, i64 %b)
Overview:¶
The ‘llvm.usub.with.overflow’ family of intrinsic functions perform an unsigned subtraction of the two arguments, and indicate whether an overflow occurred during the unsigned subtraction.
Arguments:¶
The arguments (%a and %b) and the first element of the result structure may be of integer types of any bit width, but they must have the same bit width. The second element of the result structure must be of type i1. %a and %b are the two values that will undergo unsigned subtraction.
Semantics:¶
The ‘llvm.usub.with.overflow’ family of intrinsic functions perform an unsigned subtraction of the two arguments. They return a structure: the first element is the subtraction, and the second element is a bit specifying if the unsigned subtraction resulted in an overflow.
‘llvm.smul.with.overflow.*’ Intrinsics¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.smul.with.overflow on any integer bit width.
declare {i16, i1} @llvm.smul.with.overflow.i16(i16 %a, i16 %b)
declare {i32, i1} @llvm.smul.with.overflow.i32(i32 %a, i32 %b)
declare {i64, i1} @llvm.smul.with.overflow.i64(i64 %a, i64 %b)
Overview:¶
The ‘llvm.smul.with.overflow’ family of intrinsic functions perform a signed multiplication of the two arguments, and indicate whether an overflow occurred during the signed multiplication.
Arguments:¶
The arguments (%a and %b) and the first element of the result structure may be of integer types of any bit width, but they must have the same bit width. The second element of the result structure must be of type i1. %a and %b are the two values that will undergo signed multiplication.
Semantics:¶
The ‘llvm.smul.with.overflow’ family of intrinsic functions perform a signed multiplication of the two arguments. They return a structure: the first element is the multiplication, and the second element is a bit specifying if the signed multiplication resulted in an overflow.
‘llvm.umul.with.overflow.*’ Intrinsics¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.umul.with.overflow on any integer bit width.
declare {i16, i1} @llvm.umul.with.overflow.i16(i16 %a, i16 %b)
declare {i32, i1} @llvm.umul.with.overflow.i32(i32 %a, i32 %b)
declare {i64, i1} @llvm.umul.with.overflow.i64(i64 %a, i64 %b)
Overview:¶
The ‘llvm.umul.with.overflow’ family of intrinsic functions perform an unsigned multiplication of the two arguments, and indicate whether an overflow occurred during the unsigned multiplication.
Arguments:¶
The arguments (%a and %b) and the first element of the result structure may be of integer types of any bit width, but they must have the same bit width. The second element of the result structure must be of type i1. %a and %b are the two values that will undergo unsigned multiplication.
Semantics:¶
The ‘llvm.umul.with.overflow’ family of intrinsic functions perform an unsigned multiplication of the two arguments. They return a structure: the first element is the multiplication, and the second element is a bit specifying if the unsigned multiplication resulted in an overflow.
Specialised Arithmetic Intrinsics¶
‘llvm.canonicalize.*’ Intrinsic¶
Syntax:¶
declare float @llvm.canonicalize.f32(float %a)
declare double @llvm.canonicalize.f64(double %b)
Overview:¶
The ‘llvm.canonicalize.*’ intrinsic returns the platform specific canonical encoding of a floating point number. This canonicalization is useful for implementing certain numeric primitives such as frexp. The canonical encoding is defined by IEEE-754-2008 to be:
2.1.8 canonical encoding: The preferred encoding of a floating-point
representation in a format. Applied to declets, significands of finite
numbers, infinities, and NaNs, especially in decimal formats.
This operation can also be considered equivalent to the IEEE-754-2008 conversion of a floating-point value to the same format. NaNs are handled according to section 6.2.
Examples of non-canonical encodings:
- x87 pseudo denormals, pseudo NaNs, pseudo Infinity, Unnormals. These are converted to a canonical representation per hardware-specific protocol.
- Many normal decimal floating point numbers have non-canonical alternative encodings.
- Some machines, like GPUs or ARMv7 NEON, do not support subnormal values. These are treated as non-canonical encodings of zero and will be flushed to a zero of the same sign by this operation.
Note that per IEEE-754-2008 6.2, systems that support signaling NaNs with default exception handling must signal an invalid exception, and produce a quiet NaN result.
This function should always be implementable as multiplication by 1.0, provided that the compiler does not constant fold the operation. Likewise, division by 1.0 and llvm.minnum(x, x) are possible implementations. Addition with -0.0 is also sufficient provided that the rounding mode is not -Infinity.
@llvm.canonicalize must preserve the equality relation. That is:
- (@llvm.canonicalize(x) == x) is equivalent to (x == x)
- (@llvm.canonicalize(x) == @llvm.canonicalize(y)) is equivalent to (x == y)
Additionally, the sign of zero must be conserved: @llvm.canonicalize(-0.0) = -0.0 and @llvm.canonicalize(+0.0) = +0.0
The payload bits of a NaN must be conserved, with two exceptions. First, environments which use only a single canonical representation of NaN must perform said canonicalization. Second, SNaNs must be quieted per the usual methods.
The canonicalization operation may be optimized away if:
- The input is known to be canonical. For example, it was produced by a floating-point operation that is required by the standard to be canonical.
- The result is consumed only by (or fused with) other floating-point operations. That is, the bits of the floating point value are not examined.
‘llvm.fmuladd.*’ Intrinsic¶
Syntax:¶
declare float @llvm.fmuladd.f32(float %a, float %b, float %c)
declare double @llvm.fmuladd.f64(double %a, double %b, double %c)
Overview:¶
The ‘llvm.fmuladd.*’ intrinsic functions represent multiply-add expressions that can be fused if the code generator determines that (a) the target instruction set has support for a fused operation, and (b) that the fused operation is more efficient than the equivalent, separate pair of mul and add instructions.
Arguments:¶
The ‘llvm.fmuladd.*’ intrinsics each take three arguments: two multiplicands, a and b, and an addend c.
Semantics:¶
The expression:
%0 = call float @llvm.fmuladd.f32(float %a, float %b, float %c)
is equivalent to the expression a * b + c, except that rounding will not be performed between the multiplication and addition steps if the code generator fuses the operations. Fusion is not guaranteed, even if the target platform supports it. If a fused multiply-add is required, the corresponding llvm.fma.* intrinsic function should be used instead. This never sets errno, just as ‘llvm.fma.*’.
Half Precision Floating Point Intrinsics¶
For most target platforms, half precision floating point is a storage-only format. This means that it is a dense encoding (in memory) but does not support computation in the format.
This means that code must first load the half-precision floating point value as an i16, then convert it to float with llvm.convert.from.fp16. Computation can then be performed on the float value (including extending to double etc). To store the value back to memory, it is first converted to float if needed, then converted to i16 with llvm.convert.to.fp16, then stored as an i16 value.
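A minimal round-trip sketch of that flow (the pointer and value names are hypothetical):
%ld = load i16, i16* %halfptr
%f  = call float @llvm.convert.from.fp16.f32(i16 %ld)
%g  = fadd float %f, 1.0
%h  = call i16 @llvm.convert.to.fp16.f32(float %g)
store i16 %h, i16* %halfptr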
‘llvm.convert.to.fp16’ Intrinsic¶
Syntax:¶
declare i16 @llvm.convert.to.fp16.f32(float %a)
declare i16 @llvm.convert.to.fp16.f64(double %a)
Overview:¶
The ‘llvm.convert.to.fp16’ intrinsic function performs a conversion from a conventional floating point type to half precision floating point format.
Arguments:¶
The intrinsic function takes a single argument - the value to be converted.
Semantics:¶
The ‘llvm.convert.to.fp16’ intrinsic function performs a conversion from a conventional floating point format to half precision floating point format. The return value is an i16 which contains the converted number.
‘llvm.convert.from.fp16’ Intrinsic¶
Syntax:¶
declare float @llvm.convert.from.fp16.f32(i16 %a)
declare double @llvm.convert.from.fp16.f64(i16 %a)
Overview:¶
The ‘llvm.convert.from.fp16’ intrinsic function performs a conversion from half precision floating point format to single precision floating point format.
Arguments:¶
The intrinsic function takes a single argument - the value to be converted.
Semantics:¶
The ‘llvm.convert.from.fp16’ intrinsic function performs a conversion from half precision floating point format to single precision floating point format. The input half-float value is represented by an i16 value.
Debugger Intrinsics¶
The LLVM debugger intrinsics (which all start with the llvm.dbg. prefix) are described in the LLVM Source Level Debugging document.
Exception Handling Intrinsics¶
The LLVM exception handling intrinsics (which all start with the llvm.eh. prefix) are described in the LLVM Exception Handling document.
Trampoline Intrinsics¶
These intrinsics make it possible to excise one parameter, marked with the nest attribute, from a function. The result is a callable function pointer lacking the nest parameter - the caller does not need to provide a value for it. Instead, the value to use is stored in advance in a “trampoline”, a block of memory usually allocated on the stack, which also contains code to splice the nest value into the argument list. This is used to implement the GCC nested function address extension.
For example, if the function is i32 f(i8* nest %c, i32 %x, i32 %y) then the resulting function pointer has signature i32 (i32, i32)*.
It can be created as follows:
%tramp = alloca [10 x i8], align 4 ; size and alignment only correct for X86
%tramp1 = getelementptr [10 x i8], [10 x i8]* %tramp, i32 0, i32 0
call i8* @llvm.init.trampoline(i8* %tramp1, i8* bitcast (i32 (i8*, i32, i32)* @f to i8*), i8* %nval)
%p = call i8* @llvm.adjust.trampoline(i8* %tramp1)
%fp = bitcast i8* %p to i32 (i32, i32)*
The call %val = call i32 %fp(i32 %x, i32 %y) is then equivalent to %val = call i32 %f(i8* %nval, i32 %x, i32 %y).
‘llvm.init.trampoline’ Intrinsic¶
Overview:¶
This fills the memory pointed to by tramp with executable code, turning it into a trampoline.
Arguments:¶
The llvm.init.trampoline intrinsic takes three arguments, all pointers. The tramp argument must point to a sufficiently large and sufficiently aligned block of memory; this memory is written to by the intrinsic. Note that the size and the alignment are target-specific - LLVM currently provides no portable way of determining them, so a front-end that generates this intrinsic needs to have some target-specific knowledge. The func argument must hold a function bitcast to an i8*.
Semantics:¶
The block of memory pointed to by tramp is filled with target dependent code, turning it into a function. Then tramp needs to be passed to llvm.adjust.trampoline to get a pointer which can be bitcast (to a new function) and called. The new function’s signature is the same as that of func with any arguments marked with the nest attribute removed. At most one such nest argument is allowed, and it must be of pointer type. Calling the new function is equivalent to calling func with the same argument list, but with nval used for the missing nest argument. If, after calling llvm.init.trampoline, the memory pointed to by tramp is modified, then the effect of any later call to the returned function pointer is undefined.
‘llvm.adjust.trampoline’ Intrinsic¶
Overview:¶
This performs any required machine-specific adjustment to the address of a trampoline (passed as tramp).
Arguments:¶
tramp must point to a block of memory which already has trampoline code filled in by a previous call to llvm.init.trampoline.
Semantics:¶
On some architectures the address of the code to be executed needs to be different than the address where the trampoline is actually stored. This intrinsic returns the executable address corresponding to tramp after performing the required machine specific adjustments. The pointer returned can then be bitcast and executed.
Masked Vector Load and Store Intrinsics¶
LLVM provides intrinsics for predicated vector load and store operations. The predicate is specified by a mask operand, which holds one bit per vector element, switching the associated vector lane on or off. The memory addresses corresponding to the “off” lanes are not accessed. When all bits of the mask are on, the intrinsic is identical to a regular vector load or store. When all bits are off, no memory is accessed.
‘llvm.masked.load.*’ Intrinsics¶
Syntax:¶
This is an overloaded intrinsic. The loaded data is a vector of any integer or floating point data type.
declare <16 x float> @llvm.masked.load.v16f32 (<16 x float>* <ptr>, i32 <alignment>, <16 x i1> <mask>, <16 x float> <passthru>)
declare <2 x double> @llvm.masked.load.v2f64 (<2 x double>* <ptr>, i32 <alignment>, <2 x i1> <mask>, <2 x double> <passthru>)
Overview:¶
Reads a vector from memory according to the provided mask. The mask holds a bit for each vector lane, and is used to prevent memory accesses to the masked-off lanes. The masked-off lanes in the result vector are taken from the corresponding lanes of the ‘passthru’ operand.
Arguments:¶
The first operand is the base pointer for the load. The second operand is the alignment of the source location. It must be a constant integer value. The third operand, mask, is a vector of boolean values with the same number of elements as the return type. The fourth is a pass-through value that is used to fill the masked-off lanes of the result. The return type, underlying type of the base pointer and the type of the ‘passthru’ operand are the same vector types.
Semantics:¶
The ‘llvm.masked.load’ intrinsic is designed for conditional reading of selected vector elements in a single IR operation. It is useful for targets that support vector masked loads and allows vectorizing predicated basic blocks on these targets. Other targets may support this intrinsic differently, for example by lowering it into a sequence of branches that guard scalar load operations.
The result of this operation is equivalent to a regular vector load instruction followed by a ‘select’ between the loaded and the passthru values, predicated on the same mask. However, using this intrinsic prevents exceptions on memory access to masked-off lanes.
%res = call <16 x float> @llvm.masked.load.v16f32 (<16 x float>* %ptr, i32 4, <16 x i1>%mask, <16 x float> %passthru)
;; The result of the two following instructions is identical aside from potential memory access exception
%loadval = load <16 x float>, <16 x float>* %ptr, align 4
%res = select <16 x i1> %mask, <16 x float> %loadval, <16 x float> %passthru
‘llvm.masked.store.*’ Intrinsics¶
Syntax:¶
This is an overloaded intrinsic. The data stored in memory is a vector of any integer or floating point data type.
declare void @llvm.masked.store.v8i32 (<8 x i32> <value>, <8 x i32> * <ptr>, i32 <alignment>, <8 x i1> <mask>)
declare void @llvm.masked.store.v16f32(<16 x float> <value>, <16 x float>* <ptr>, i32 <alignment>, <16 x i1> <mask>)
Overview:¶
Writes a vector to memory according to the provided mask. The mask holds a bit for each vector lane, and is used to prevent memory accesses to the masked-off lanes.
Arguments:¶
The first operand is the vector value to be written to memory. The second operand is the base pointer for the store; it has the same underlying type as the value operand. The third operand is the alignment of the destination location. The fourth operand, mask, is a vector of boolean values. The types of the mask and the value operand must have the same number of vector elements.
Semantics:¶
The ‘llvm.masked.store’ intrinsic is designed for conditional writing of selected vector elements in a single IR operation. It is useful for targets that support vector masked store and allows vectorizing predicated basic blocks on these targets. Other targets may support this intrinsic differently, for example by lowering it into a sequence of branches that guard scalar store operations.
The result of this operation is equivalent to a load-modify-store sequence. However, using this intrinsic prevents exceptions and data races on memory access to masked-off lanes.
call void @llvm.masked.store.v16f32(<16 x float> %value, <16 x float>* %ptr, i32 4, <16 x i1> %mask)
;; The result of the following instructions is identical aside from potential data races and memory access exceptions
%oldval = load <16 x float>, <16 x float>* %ptr, align 4
%res = select <16 x i1> %mask, <16 x float> %value, <16 x float> %oldval
store <16 x float> %res, <16 x float>* %ptr, align 4
Masked Vector Gather and Scatter Intrinsics¶
LLVM provides intrinsics for vector gather and scatter operations. They are similar to Masked Vector Load and Store, except they are designed for arbitrary memory accesses, rather than sequential memory accesses. Gather and scatter also employ a mask operand, which holds one bit per vector element, switching the associated vector lane on or off. The memory addresses corresponding to the “off” lanes are not accessed. When all bits are off, no memory is accessed.
‘llvm.masked.gather.*’ Intrinsics¶
Syntax:¶
This is an overloaded intrinsic. The loaded data are multiple scalar values of any integer or floating point data type gathered together into one vector.
declare <16 x float> @llvm.masked.gather.v16f32 (<16 x float*> <ptrs>, i32 <alignment>, <16 x i1> <mask>, <16 x float> <passthru>)
declare <2 x double> @llvm.masked.gather.v2f64 (<2 x double*> <ptrs>, i32 <alignment>, <2 x i1> <mask>, <2 x double> <passthru>)
Overview:¶
Reads scalar values from arbitrary memory locations and gathers them into one vector. The memory locations are provided in the vector of pointers ‘ptrs’. The memory is accessed according to the provided mask. The mask holds a bit for each vector lane, and is used to prevent memory accesses to the masked-off lanes. The masked-off lanes in the result vector are taken from the corresponding lanes of the ‘passthru’ operand.
Arguments:¶
The first operand is a vector of pointers which holds all memory addresses to read. The second operand is an alignment of the source addresses. It must be a constant integer value. The third operand, mask, is a vector of boolean values with the same number of elements as the return type. The fourth is a pass-through value that is used to fill the masked-off lanes of the result. The return type, underlying type of the vector of pointers and the type of the ‘passthru’ operand are the same vector types.
Semantics:¶
The ‘llvm.masked.gather’ intrinsic is designed for conditional reading of multiple scalar values from arbitrary memory locations in a single IR operation. It is useful for targets that support vector masked gathers and allows vectorizing basic blocks with data and control divergence. Other targets may support this intrinsic differently, for example by lowering it into a sequence of scalar load operations.
The semantics of this operation are equivalent to a sequence of conditional scalar loads, with the loaded values subsequently gathered into a single vector. The mask restricts memory access to certain lanes and facilitates vectorization of predicated basic blocks.
%res = call <4 x double> @llvm.masked.gather.v4f64 (<4 x double*> %ptrs, i32 8, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x double> undef)
;; The gather with all-true mask is equivalent to the following instruction sequence
%ptr0 = extractelement <4 x double*> %ptrs, i32 0
%ptr1 = extractelement <4 x double*> %ptrs, i32 1
%ptr2 = extractelement <4 x double*> %ptrs, i32 2
%ptr3 = extractelement <4 x double*> %ptrs, i32 3
%val0 = load double, double* %ptr0, align 8
%val1 = load double, double* %ptr1, align 8
%val2 = load double, double* %ptr2, align 8
%val3 = load double, double* %ptr3, align 8
%vec0 = insertelement <4 x double> undef, double %val0, i32 0
%vec01 = insertelement <4 x double> %vec0, double %val1, i32 1
%vec012 = insertelement <4 x double> %vec01, double %val2, i32 2
%vec0123 = insertelement <4 x double> %vec012, double %val3, i32 3
‘llvm.masked.scatter.*’ Intrinsics¶
Syntax:¶
This is an overloaded intrinsic. The data stored in memory is a vector of any integer or floating point data type. Each vector element is stored in an arbitrary memory address. Scatter with overlapping addresses is guaranteed to be ordered from least-significant to most-significant element.
declare void @llvm.masked.scatter.v8i32 (<8 x i32> <value>, <8 x i32*> <ptrs>, i32 <alignment>, <8 x i1> <mask>)
declare void @llvm.masked.scatter.v16f32(<16 x float> <value>, <16 x float*> <ptrs>, i32 <alignment>, <16 x i1> <mask>)
Overview:¶
Writes each element from the value vector to the corresponding memory address. The memory addresses are represented as a vector of pointers. Writing is done according to the provided mask. The mask holds a bit for each vector lane, and is used to prevent memory accesses to the masked-off lanes.
Arguments:¶
The first operand is a vector value to be written to memory. The second operand is a vector of pointers, pointing to where the value elements should be stored. It has the same underlying type as the value operand. The third operand is an alignment of the destination addresses. The fourth operand, mask, is a vector of boolean values. The types of the mask and the value operand must have the same number of vector elements.
Semantics:¶
The ‘llvm.masked.scatter’ intrinsic is designed for writing selected vector elements to arbitrary memory addresses in a single IR operation. The operation may be conditional, when not all bits in the mask are switched on. It is useful for targets that support vector masked scatter and allows vectorizing basic blocks with data and control divergence. Other targets may support this intrinsic differently, for example by lowering it into a sequence of branches that guard scalar store operations.
;; This instruction unconditionally stores the data vector to multiple addresses
call @llvm.masked.scatter.v8i32 (<8 x i32> %value, <8 x i32*> %ptrs, i32 4, <8 x i1> <true, true, .. true>)
;; It is equivalent to a list of scalar stores
%val0 = extractelement <8 x i32> %value, i32 0
%val1 = extractelement <8 x i32> %value, i32 1
..
%val7 = extractelement <8 x i32> %value, i32 7
%ptr0 = extractelement <8 x i32*> %ptrs, i32 0
%ptr1 = extractelement <8 x i32*> %ptrs, i32 1
..
%ptr7 = extractelement <8 x i32*> %ptrs, i32 7
;; Note: the order of the following stores is important when they overlap:
store i32 %val0, i32* %ptr0, align 4
store i32 %val1, i32* %ptr1, align 4
..
store i32 %val7, i32* %ptr7, align 4
Memory Use Markers¶
This class of intrinsics provides information about the lifetime of memory objects and ranges where variables are immutable.
‘llvm.lifetime.start’ Intrinsic¶
Arguments:¶
The first argument is a constant integer representing the size of the object, or -1 if it is variable sized. The second argument is a pointer to the object.
Semantics:¶
This intrinsic indicates that before this point in the code, the value of the memory pointed to by ptr is dead. This means that it is known to never be used and has an undefined value. A load from the pointer that precedes this intrinsic can be replaced with 'undef'.
‘llvm.lifetime.end’ Intrinsic¶
Arguments:¶
The first argument is a constant integer representing the size of the object, or -1 if it is variable sized. The second argument is a pointer to the object.
Semantics:¶
This intrinsic indicates that after this point in the code, the value of the memory pointed to by ptr is dead. This means that it is known to never be used and has an undefined value. Any stores into the memory object following this intrinsic may be removed as dead.
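A short sketch bracketing a stack object's lifetime with these markers (the alloca, its size, and the uses are illustrative):
%buf = alloca [16 x i8], align 4
%p = bitcast [16 x i8]* %buf to i8*
call void @llvm.lifetime.start(i64 16, i8* %p)
; ... uses of %buf ...
call void @llvm.lifetime.end(i64 16, i8* %p)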
‘llvm.invariant.start’ Intrinsic¶
Overview:¶
The ‘llvm.invariant.start’ intrinsic specifies that the contents of a memory object will not change.
Arguments:¶
The first argument is a constant integer representing the size of the object, or -1 if it is variable sized. The second argument is a pointer to the object.
Semantics:¶
This intrinsic indicates that until an llvm.invariant.end that uses the return value, the referenced memory location is constant and unchanging.
‘llvm.invariant.end’ Intrinsic¶
Overview:¶
The ‘llvm.invariant.end’ intrinsic specifies that the contents of a memory object are mutable.
Arguments:¶
The first argument is the matching llvm.invariant.start intrinsic. The second argument is a constant integer representing the size of the object, or -1 if it is variable sized, and the third argument is a pointer to the object.
Semantics:¶
This intrinsic indicates that the memory is mutable again.
General Intrinsics¶
This class of intrinsics is designed to be generic and has no specific purpose.
‘llvm.var.annotation’ Intrinsic¶
Arguments:¶
The first argument is a pointer to a value, the second is a pointer to a global string, the third is a pointer to a global string which is the source file name, and the last argument is the line number.
Semantics:¶
This intrinsic allows annotation of local variables with arbitrary strings. This can be useful for special purpose optimizations that want to look for these annotations. These have no other defined use; they are ignored by code generation and optimization.
‘llvm.ptr.annotation.*’ Intrinsic¶
Syntax:¶
This is an overloaded intrinsic. You can use ‘llvm.ptr.annotation’ on a pointer to an integer of any width. NOTE: you must specify an address space for the pointer. The identifier for the default address space is the integer ‘0’.
declare i8* @llvm.ptr.annotation.p<address space>i8(i8* <val>, i8* <str>, i8* <str>, i32 <int>)
declare i16* @llvm.ptr.annotation.p<address space>i16(i16* <val>, i8* <str>, i8* <str>, i32 <int>)
declare i32* @llvm.ptr.annotation.p<address space>i32(i32* <val>, i8* <str>, i8* <str>, i32 <int>)
declare i64* @llvm.ptr.annotation.p<address space>i64(i64* <val>, i8* <str>, i8* <str>, i32 <int>)
declare i256* @llvm.ptr.annotation.p<address space>i256(i256* <val>, i8* <str>, i8* <str>, i32 <int>)
Arguments:¶
The first argument is a pointer to an integer value of arbitrary bitwidth (result of some expression), the second is a pointer to a global string, the third is a pointer to a global string which is the source file name, and the last argument is the line number. It returns the value of the first argument.
Semantics:¶
This intrinsic allows annotation of a pointer to an integer with arbitrary strings. This can be useful for special purpose optimizations that want to look for these annotations. These have no other defined use; they are ignored by code generation and optimization.
‘llvm.annotation.*’ Intrinsic¶
Syntax:¶
This is an overloaded intrinsic. You can use ‘llvm.annotation’ on any integer bit width.
declare i8 @llvm.annotation.i8(i8 <val>, i8* <str>, i8* <str>, i32 <int>)
declare i16 @llvm.annotation.i16(i16 <val>, i8* <str>, i8* <str>, i32 <int>)
declare i32 @llvm.annotation.i32(i32 <val>, i8* <str>, i8* <str>, i32 <int>)
declare i64 @llvm.annotation.i64(i64 <val>, i8* <str>, i8* <str>, i32 <int>)
declare i256 @llvm.annotation.i256(i256 <val>, i8* <str>, i8* <str>, i32 <int>)
Arguments:¶
The first argument is an integer value (result of some expression), the second is a pointer to a global string, the third is a pointer to a global string which is the source file name, and the last argument is the line number. It returns the value of the first argument.
Semantics:¶
This intrinsic allows annotations to be put on arbitrary expressions with arbitrary strings. This can be useful for special purpose optimizations that want to look for these annotations. These have no other defined use; they are ignored by code generation and optimization.
‘llvm.trap’ Intrinsic¶
Arguments:¶
None.
Semantics:¶
This intrinsic is lowered to the target dependent trap instruction. If the target does not have a trap instruction, this intrinsic will be lowered to a call of the abort() function.
‘llvm.debugtrap’ Intrinsic¶
Arguments:¶
None.
Semantics:¶
This intrinsic is lowered to code which is intended to cause an execution trap with the intention of requesting the attention of a debugger.
‘llvm.stackprotector’ Intrinsic¶
Overview:¶
The llvm.stackprotector intrinsic takes the guard and stores it onto the stack at slot. The stack slot is adjusted to ensure that it is placed on the stack before local variables.
Arguments:¶
The llvm.stackprotector intrinsic requires two pointer arguments. The first argument is the value loaded from the stack guard @__stack_chk_guard. The second argument is an alloca that has enough space to hold the value of the guard.
Semantics:¶
This intrinsic causes the prologue/epilogue inserter to force the position of the AllocaInst stack slot to be before local variables on the stack. This is to ensure that if a local variable on the stack is overwritten, it will destroy the value of the guard. When the function exits, the guard on the stack is checked against the original guard by llvm.stackprotectorcheck. If they are different, then llvm.stackprotectorcheck causes the program to abort by calling the __stack_chk_fail() function.
‘llvm.stackprotectorcheck’ Intrinsic¶
Overview:¶
The llvm.stackprotectorcheck intrinsic compares guard against an already created stack protector and, if they are not equal, calls the __stack_chk_fail() function.
Arguments:¶
The llvm.stackprotectorcheck intrinsic requires one pointer argument, the variable @__stack_chk_guard.
Semantics:¶
This intrinsic is provided to perform the stack protector check by comparing guard with the stack slot created by llvm.stackprotector; if the values do not match, it calls the __stack_chk_fail() function.
The reason to provide this as an IR level intrinsic instead of implementing it via other IR operations is that in order to perform this operation at the IR level without an intrinsic, one would need to create additional basic blocks to handle the success/failure cases. This makes it difficult to stop the stack protector check from disrupting sibling tail calls in Codegen. With this intrinsic, we are able to generate the stack protector basic blocks late in codegen after the tail call decision has occurred.
‘llvm.objectsize’ Intrinsic¶
Syntax:¶
declare i32 @llvm.objectsize.i32(i8* <object>, i1 <min>)
declare i64 @llvm.objectsize.i64(i8* <object>, i1 <min>)
Overview:¶
The llvm.objectsize intrinsic is designed to provide information to the optimizers to determine at compile time whether a) an operation (like memcpy) will overflow a buffer that corresponds to an object, or b) that a runtime check for overflow isn’t necessary. An object in this context means an allocation of a specific class, structure, array, or other object.
Arguments:¶
The llvm.objectsize intrinsic takes two arguments. The first argument is a pointer to or into the object. The second argument is a boolean and determines whether llvm.objectsize returns 0 (if true) or -1 (if false) when the object size is unknown. The second argument only accepts constants.
Semantics:¶
The llvm.objectsize intrinsic is lowered to a constant representing the size of the object concerned. If the size cannot be determined at compile time, llvm.objectsize returns i32/i64 -1 or 0 (depending on the min argument).
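A brief usage sketch (the pointer name is hypothetical):
%size = call i64 @llvm.objectsize.i64(i8* %obj, i1 false)  ; i64 -1 if the size is unknown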
‘llvm.expect’ Intrinsic¶
Syntax:¶
This is an overloaded intrinsic. You can use llvm.expect on any integer bit width.
declare i1 @llvm.expect.i1(i1 <val>, i1 <expected_val>)
declare i32 @llvm.expect.i32(i32 <val>, i32 <expected_val>)
declare i64 @llvm.expect.i64(i64 <val>, i64 <expected_val>)
Overview:¶
The llvm.expect intrinsic provides information about the expected (most probable) value of val, which can be used by optimizers.
Arguments:¶
The llvm.expect intrinsic takes two arguments. The first argument is a value. The second argument is the expected value; this needs to be a constant value, variables are not allowed.
Semantics:¶
This intrinsic is lowered to the val.
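A typical usage sketch, marking a branch condition as unlikely (the value and label names are illustrative):
%cmp = icmp eq i32 %x, 0
%expected = call i1 @llvm.expect.i1(i1 %cmp, i1 false)  ; %x == 0 is expected to be false
br i1 %expected, label %cold, label %hot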
‘llvm.assume’ Intrinsic¶
Overview:¶
The llvm.assume intrinsic allows the optimizer to assume that the provided condition is true. This information can then be used in simplifying other parts of the code.
Arguments:¶
The condition which the optimizer may assume is always true.
Semantics:¶
The intrinsic allows the optimizer to assume that the provided condition is always true whenever the control flow reaches the intrinsic call. No code is generated for this intrinsic, and instructions that contribute only to the provided condition are not used for code generation. If the condition is violated during execution, the behavior is undefined.
Note that the optimizer might limit the transformations performed on values used by the llvm.assume intrinsic in order to preserve the instructions only used to form the intrinsic’s input argument. This might prove undesirable if the extra information provided by the llvm.assume intrinsic does not cause sufficient overall improvement in code quality. For this reason, llvm.assume should not be used to document basic mathematical invariants that the optimizer can otherwise deduce or facts that are of little use to the optimizer.
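A minimal sketch (the condition is illustrative):
%cmp = icmp sgt i32 %n, 0
call void @llvm.assume(i1 %cmp)  ; later transformations may rely on %n > 0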
‘llvm.bitset.test’ Intrinsic¶
Arguments:¶
The first argument is a pointer to be tested. The second argument is a metadata string containing the name of a bitset.
‘llvm.donothing’ Intrinsic¶
Overview:¶
The llvm.donothing intrinsic doesn’t perform any operation. It’s one of only two intrinsics (besides llvm.experimental.patchpoint) that can be called with an invoke instruction.
Arguments:¶
None.
Semantics:¶
This intrinsic does nothing, and it’s removed by optimizers and ignored by codegen.
Stack Map Intrinsics¶
LLVM provides experimental intrinsics to support runtime patching mechanisms commonly desired in dynamic language JITs. These intrinsics are described in Stack maps and patch points in LLVM.
- LLVM Language Reference Manual
- Defines the LLVM intermediate representation.
- Introduction to the LLVM Compiler
- Presentation providing a user's introduction to LLVM.
- Intro to LLVM
- Book chapter providing a compiler hacker’s introduction to LLVM.
- LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation
- Design overview.
- LLVM: An Infrastructure for Multi-Stage Optimization
- More details (quite old now).
- Publications mentioning LLVM
User Guides¶
For those new to the LLVM system.
NOTE: If you are a user who is only interested in using LLVM-based compilers, you should look into Clang or DragonEgg instead. The documentation here is intended for users who have a need to work with the intermediate LLVM representation.
Building LLVM with CMake¶
Introduction¶
CMake is a cross-platform build-generator tool. CMake does not build the project, it generates the files needed by your build tool (GNU make, Visual Studio, etc) for building LLVM.
If you are really anxious about getting a functional LLVM build, go to the Quick start section. If you are a CMake novice, start on Basic CMake usage and then go back to the Quick start once you know what you are doing. The Options and variables section is a reference for customizing your build. If you already have experience with CMake, this is the recommended starting point.
Quick start¶
We use here the command-line, non-interactive CMake interface.
Download and install CMake. Version 2.8.8 is the minimum required.
Open a shell. Your development tools must be reachable from this shell through the PATH environment variable.
Create a directory to contain the build. Building LLVM in the source directory is not supported. cd to this directory:
$ mkdir mybuilddir
$ cd mybuilddir
Execute this command on the shell replacing path/to/llvm/source/root with the path to the root of your LLVM source tree:
$ cmake path/to/llvm/source/root
CMake will detect your development environment, perform a series of tests, and generate the files required for building LLVM. CMake will use default values for all build parameters. See the Options and variables section for fine-tuning your build.
This can fail if CMake can’t detect your toolset, or if it thinks that the environment is not sane enough. In this case, make sure that the toolset that you intend to use is the only one reachable from the shell, and that the shell itself is the correct one for your development environment. CMake will refuse to build MinGW makefiles if you have a POSIX shell reachable through the PATH environment variable, for instance. You can force CMake to use a given build tool; see the Usage section.
After CMake has finished running, proceed to use IDE project files or start the build from the build directory:
$ cmake --build .
The --build option tells cmake to invoke the underlying build tool (make, ninja, xcodebuild, msbuild, etc).
The underlying build tool can of course be invoked directly, but the --build option is portable.
After LLVM has finished building, install it from the build directory:
$ cmake --build . --target install
The --target option with the install parameter, in addition to the --build option, tells cmake to build the install target.
It is possible to set a different install prefix at installation time by invoking the cmake_install.cmake script generated in the build directory:
$ cmake -DCMAKE_INSTALL_PREFIX=/tmp/llvm -P cmake_install.cmake
Basic CMake usage¶
This section explains basic aspects of CMake, mostly for explaining those options which you may need on your day-to-day usage.
CMake comes with extensive documentation in the form of html files and on the cmake executable itself. Execute cmake --help for further help options.
CMake needs to know which build tool it shall generate files for (GNU make, Visual Studio, Xcode, etc). If not specified on the command line, it tries to guess based on your environment. Once the build tool is identified, CMake uses the corresponding Generator to create files for your build tool. You can explicitly specify the generator with the command-line option -G "Name of the generator". To see the available generators on your platform, execute
$ cmake --help
This will list the generator names at the end of the help text. Generator names are case-sensitive. Example:
$ cmake -G "Visual Studio 11" path/to/llvm/source/root
For a given development platform there can be more than one adequate generator. If you use Visual Studio, “NMake Makefiles” is a generator you can use for building with NMake. By default, CMake chooses the most specific generator supported by your development environment. If you want an alternative generator, you must tell this to CMake with the -G option.
Options and variables¶
Variables customize how the build will be generated. Options are boolean variables, with possible values ON/OFF. Options and variables are defined on the CMake command line like this:
$ cmake -DVARIABLE=value path/to/llvm/source
You can set a variable after the initial CMake invocation for changing its value. You can also undefine a variable:
$ cmake -UVARIABLE path/to/llvm/source
Variables are stored in the CMake cache. This is a file named CMakeCache.txt in the root of the build directory. Do not hand-edit it.
Variables are listed below with their type appended after a colon. You can also write the variable and the type on the CMake command line:
$ cmake -DVARIABLE:TYPE=value path/to/llvm/source
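For instance, a hypothetical configure line combining a build type, a target list, and an install prefix (the values are illustrative; the variables themselves are documented below):
$ cmake -DCMAKE_BUILD_TYPE:STRING=Release -DLLVM_TARGETS_TO_BUILD:STRING=X86 -DCMAKE_INSTALL_PREFIX:PATH=/opt/llvm path/to/llvm/source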
Frequently-used CMake variables¶
Here are some of the CMake variables that are used often, along with a brief explanation and LLVM-specific notes. For full documentation, check the CMake docs or execute cmake --help-variable VARIABLE_NAME.
- CMAKE_BUILD_TYPE:STRING
- Sets the build type for make based generators. Possible values are Release, Debug, RelWithDebInfo and MinSizeRel. On systems like Visual Studio the user sets the build type with the IDE settings.
- CMAKE_INSTALL_PREFIX:PATH
- Path where LLVM will be installed if “make install” is invoked or the “INSTALL” target is built.
- LLVM_LIBDIR_SUFFIX:STRING
- Extra suffix to append to the directory where libraries are to be installed. On a 64-bit architecture, one could use -DLLVM_LIBDIR_SUFFIX=64 to install libraries to /usr/lib64.
- CMAKE_C_FLAGS:STRING
- Extra flags to use when compiling C source files.
- CMAKE_CXX_FLAGS:STRING
- Extra flags to use when compiling C++ source files.
- BUILD_SHARED_LIBS:BOOL
- Flag indicating if shared libraries will be built. Its default value is OFF. Shared libraries are not supported on Windows and not recommended on the other OSes.
LLVM-specific variables¶
- LLVM_TARGETS_TO_BUILD:STRING
- Semicolon-separated list of targets to build, or all for building all targets. Case-sensitive. Defaults to all. Example: -DLLVM_TARGETS_TO_BUILD="X86;PowerPC".
- LLVM_BUILD_TOOLS:BOOL
- Build LLVM tools. Defaults to ON. Targets for building each tool are generated in any case. You can build a tool separately by invoking its target. For example, you can build llvm-as with a makefile-based system by executing make llvm-as in the root of your build directory.
- LLVM_INCLUDE_TOOLS:BOOL
- Generate build targets for the LLVM tools. Defaults to ON. You can use that option for disabling the generation of build targets for the LLVM tools.
- LLVM_BUILD_EXAMPLES:BOOL
- Build LLVM examples. Defaults to OFF. Targets for building each example are generated in any case. See documentation for LLVM_BUILD_TOOLS above for more details.
- LLVM_INCLUDE_EXAMPLES:BOOL
- Generate build targets for the LLVM examples. Defaults to ON. You can use that option for disabling the generation of build targets for the LLVM examples.
- LLVM_BUILD_TESTS:BOOL
- Build LLVM unit tests. Defaults to OFF. Targets for building each unit test are generated in any case. You can build a specific unit test with the target UnitTestNameTests (where at this time UnitTestName can be ADT, Analysis, ExecutionEngine, JIT, Support, Transform, VMCore; see the subdirectories of unittests for an updated list.) It is possible to build all unit tests with the target UnitTests.
- LLVM_INCLUDE_TESTS:BOOL
- Generate build targets for the LLVM unit tests. Defaults to ON. You can use that option for disabling the generation of build targets for the LLVM unit tests.
- LLVM_APPEND_VC_REV:BOOL
- Append version control revision info (svn revision number or Git revision id) to LLVM version string (stored in the PACKAGE_VERSION macro). For this to work cmake must be invoked before the build. Defaults to OFF.
- LLVM_ENABLE_THREADS:BOOL
- Build with threads support, if available. Defaults to ON.
- LLVM_ENABLE_CXX1Y:BOOL
- Build in C++1y mode, if available. Defaults to OFF.
- LLVM_ENABLE_ASSERTIONS:BOOL
- Enables code assertions. Defaults to ON if and only if CMAKE_BUILD_TYPE is Debug.
- LLVM_ENABLE_EH:BOOL
- Build LLVM with exception handling support. This is necessary if you wish to link against LLVM libraries and make use of C++ exceptions in your own code that need to propagate through LLVM code. Defaults to OFF.
- LLVM_ENABLE_PIC:BOOL
- Add the -fPIC flag for the compiler command-line, if the compiler supports this flag. Some systems, like Windows, do not need this flag. Defaults to ON.
- LLVM_ENABLE_RTTI:BOOL
- Build LLVM with run time type information. Defaults to OFF.
- LLVM_ENABLE_WARNINGS:BOOL
- Enable all compiler warnings. Defaults to ON.
- LLVM_ENABLE_PEDANTIC:BOOL
- Enable pedantic mode. This disables compiler specific extensions, if possible. Defaults to ON.
- LLVM_ENABLE_WERROR:BOOL
- Stop and fail build, if a compiler warning is triggered. Defaults to OFF.
- LLVM_ABI_BREAKING_CHECKS:STRING
- Used to decide if LLVM should be built with ABI breaking checks or not. Allowed values are WITH_ASSERTS (default), FORCE_ON and FORCE_OFF. WITH_ASSERTS turns on ABI breaking checks in an assertion enabled build. FORCE_ON (FORCE_OFF) turns them on (off) irrespective of whether normal (NDEBUG based) assertions are enabled or not. A version of LLVM built with ABI breaking checks is not ABI compatible with a version built without it.
- LLVM_BUILD_32_BITS:BOOL
- Build 32-bit executables and libraries on 64-bit systems. This option is available only on some 64-bit unix systems. Defaults to OFF.
- LLVM_TARGET_ARCH:STRING
- LLVM target to use for native code generation. This is required for JIT generation. It defaults to “host”, meaning that it shall pick the architecture of the machine where LLVM is being built. If you are cross-compiling, set it to the target architecture name.
- LLVM_TABLEGEN:STRING
- Full path to a native TableGen executable (usually named tblgen). This is intended for cross-compiling: if the user sets this variable, no native TableGen will be created.
- LLVM_LIT_ARGS:STRING
- Arguments given to lit. make check and make clang-test are affected. Defaults to '-sv --no-progress-bar' on Visual C++ and Xcode, '-sv' on others.
- LLVM_LIT_TOOLS_DIR:PATH
- The path to GnuWin32 tools for tests. Valid on a Windows host. Defaults to “”, in which case lit searches for tools via %PATH%. When set, lit looks for tools (e.g. grep, sort, etc.) in LLVM_LIT_TOOLS_DIR first, so GnuWin32 need not be added to %PATH%.
- LLVM_ENABLE_FFI:BOOL
- Indicates whether the LLVM Interpreter will be linked with the Foreign Function Interface library. If the library or its headers are installed in a custom location, you can set the variables FFI_INCLUDE_DIR and FFI_LIBRARY_DIR. Defaults to OFF.
- LLVM_EXTERNAL_{CLANG,LLD,POLLY}_SOURCE_DIR:PATH
- Path to {Clang,lld,Polly}’s source directory. Defaults to tools/{clang,lld,polly}. {Clang,lld,Polly} will not be built when this is empty or does not point to a valid path.
- LLVM_USE_OPROFILE:BOOL
- Enable building OProfile JIT support. Defaults to OFF.
- LLVM_USE_INTEL_JITEVENTS:BOOL
- Enable building support for the Intel JIT Events API. Defaults to OFF.
- LLVM_ENABLE_ZLIB:BOOL
- Build with zlib to support compression/uncompression in LLVM tools. Defaults to ON.
- LLVM_USE_SANITIZER:STRING
- Define the sanitizer used to build LLVM binaries and tests. Possible values are Address, Memory, MemoryWithOrigins, Undefined, Thread, and Address;Undefined. Defaults to empty string.
- LLVM_PARALLEL_COMPILE_JOBS:STRING
- Define the maximum number of concurrent compilation jobs.
- LLVM_PARALLEL_LINK_JOBS:STRING
- Define the maximum number of concurrent link jobs.
- LLVM_BUILD_DOCS:BOOL
- Enables all enabled documentation targets (i.e. Doxygen and Sphinx targets) to be built as part of the normal build. If the install target is run then this also enables all built documentation targets to be installed. Defaults to OFF.
- LLVM_ENABLE_DOXYGEN:BOOL
- Enables the generation of browsable HTML documentation using doxygen. Defaults to OFF.
- LLVM_ENABLE_DOXYGEN_QT_HELP:BOOL
- Enables the generation of a Qt Compressed Help file. Defaults to OFF.
This affects the make target doxygen-llvm. When enabled, apart from the normal HTML output generated by doxygen, this will produce a QCH file named org.llvm.qch. You can then load this file into Qt Creator. This option is only useful in combination with -DLLVM_ENABLE_DOXYGEN=ON; otherwise this has no effect.
- LLVM_DOXYGEN_QCH_FILENAME:STRING
- The filename of the Qt Compressed Help file that will be generated when -DLLVM_ENABLE_DOXYGEN=ON and -DLLVM_ENABLE_DOXYGEN_QT_HELP=ON are given. Defaults to org.llvm.qch. This option is only useful in combination with -DLLVM_ENABLE_DOXYGEN_QT_HELP=ON; otherwise this has no effect.
- LLVM_DOXYGEN_QHP_NAMESPACE:STRING
- Namespace under which the intermediate Qt Help Project file lives. See Qt Help Project for more information. Defaults to “org.llvm”. This option is only useful in combination with -DLLVM_ENABLE_DOXYGEN_QT_HELP=ON; otherwise this has no effect.
- LLVM_DOXYGEN_QHP_CUST_FILTER_NAME:STRING
- See Qt Help Project for more information. Defaults to the CMake variable ${PACKAGE_STRING}, which is a combination of the package name and version string. This filter can then be used in Qt Creator to select only documentation from LLVM when browsing through all the help files that you might have loaded. This option is only useful in combination with -DLLVM_ENABLE_DOXYGEN_QT_HELP=ON; otherwise this has no effect.
- LLVM_DOXYGEN_QHELPGENERATOR_PATH:STRING
- The path to the qhelpgenerator executable. Defaults to whatever CMake’s find_program() can find. This option is only useful in combination with -DLLVM_ENABLE_DOXYGEN_QT_HELP=ON; otherwise this has no effect.
- LLVM_DOXYGEN_SVG:BOOL
- Uses .svg files instead of .png files for graphs in the Doxygen output. Defaults to OFF.
- LLVM_ENABLE_SPHINX:BOOL
- If enabled, CMake will search for the sphinx-build executable and will make the SPHINX_OUTPUT_HTML and SPHINX_OUTPUT_MAN CMake options available. Defaults to OFF.
- SPHINX_EXECUTABLE:STRING
- The path to the sphinx-build executable detected by CMake.
- SPHINX_OUTPUT_HTML:BOOL
- If enabled (and LLVM_ENABLE_SPHINX is enabled) then the targets for building the documentation as html are added (but not built by default unless LLVM_BUILD_DOCS is enabled). There is a target for each project in the source tree that uses sphinx (e.g. docs-llvm-html, docs-clang-html and docs-lld-html). Defaults to ON.
- SPHINX_OUTPUT_MAN:BOOL
- If enabled (and LLVM_ENABLE_SPHINX is enabled) the targets for building the man pages are added (but not built by default unless LLVM_BUILD_DOCS is enabled). Currently the only target added is docs-llvm-man. Defaults to ON.
- SPHINX_WARNINGS_AS_ERRORS:BOOL
- If enabled then sphinx documentation warnings will be treated as errors. Defaults to ON.
Executing the test suite¶
Testing is performed when the check target is built. For instance, if you are using makefiles, execute this command while on the top level of your build directory:
$ make check
On Visual Studio, you may run the tests by building the “check” project.
Cross compiling¶
See this wiki page for generic instructions on how to cross-compile with CMake. It goes into detailed explanations and may seem daunting, but it is not. On the wiki page there are several examples including toolchain files. Go directly to this section for a quick solution.
Also see the LLVM-specific variables section for variables used when cross-compiling.
Embedding LLVM in your project¶
From LLVM 3.5 onwards both the CMake and autoconf/Makefile build systems export LLVM libraries as importable CMake targets. This means that clients of LLVM can now reliably use CMake to develop their own LLVM based projects against an installed version of LLVM regardless of how it was built.
Here is a simple example of CMakeLists.txt file that imports the LLVM libraries
and uses them to build a simple application simple-tool
.
cmake_minimum_required(VERSION 2.8.8)
project(SimpleProject)
find_package(LLVM REQUIRED CONFIG)
message(STATUS "Found LLVM ${LLVM_PACKAGE_VERSION}")
message(STATUS "Using LLVMConfig.cmake in: ${LLVM_DIR}")
# Set your project compile flags.
# E.g. if using the C++ header files
# you will need to enable C++11 support
# for your compiler.
include_directories(${LLVM_INCLUDE_DIRS})
add_definitions(${LLVM_DEFINITIONS})
# Now build our tools
add_executable(simple-tool tool.cpp)
# Find the libraries that correspond to the LLVM components
# that we wish to use
llvm_map_components_to_libnames(llvm_libs support core irreader)
# Link against LLVM libraries
target_link_libraries(simple-tool ${llvm_libs})
The find_package(...) directive, when used in CONFIG mode (as in the above example), will look for the LLVMConfig.cmake file in various locations (see the cmake manual for details). It creates a LLVM_DIR cache entry to save the directory where LLVMConfig.cmake is found, or allows the user to specify the directory (e.g. by passing -DLLVM_DIR=/usr/share/llvm/cmake to the cmake command or by setting it directly in ccmake or cmake-gui).
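For example, to configure the project above against a non-standard LLVM location (the path below is illustrative), you might invoke:
$ cmake -DLLVM_DIR=/opt/llvm/share/llvm/cmake <path-to-SimpleProject>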
This file is available in two different locations.
- <INSTALL_PREFIX>/share/llvm/cmake/LLVMConfig.cmake where <INSTALL_PREFIX> is the install prefix of an installed version of LLVM. On Linux this is typically /usr/share/llvm/cmake/LLVMConfig.cmake.
- <LLVM_BUILD_ROOT>/share/llvm/cmake/LLVMConfig.cmake where <LLVM_BUILD_ROOT> is the root of the LLVM build tree. Note this is only available when building LLVM with CMake.
If LLVM is installed in your operating system’s normal installation prefix (e.g. on Linux this is usually /usr/), find_package(LLVM ...) will automatically find LLVM if it is installed correctly. If LLVM is not installed or you wish to build directly against the LLVM build tree, you can use LLVM_DIR as previously mentioned.
The LLVMConfig.cmake file sets various useful variables. Notable variables include:
LLVM_CMAKE_DIR
- The path to the LLVM CMake directory (i.e. the directory containing LLVMConfig.cmake).
LLVM_DEFINITIONS
- A list of preprocessor defines that should be used when building against LLVM.
LLVM_ENABLE_ASSERTIONS
- This is set to ON if LLVM was built with assertions, otherwise OFF.
LLVM_ENABLE_EH
- This is set to ON if LLVM was built with exception handling (EH) enabled, otherwise OFF.
LLVM_ENABLE_RTTI
- This is set to ON if LLVM was built with run time type information (RTTI), otherwise OFF.
LLVM_INCLUDE_DIRS
- A list of include paths to directories containing LLVM header files.
LLVM_PACKAGE_VERSION
- The LLVM version. This string can be used with CMake conditionals, e.g. if (${LLVM_PACKAGE_VERSION} VERSION_LESS "3.5").
LLVM_TOOLS_BINARY_DIR
- The path to the directory containing the LLVM tools (e.g. llvm-as).
Notice that in the above example we link simple-tool against several LLVM libraries. The list of libraries is determined by using the llvm_map_components_to_libnames() CMake function. For a list of available components look at the output of running llvm-config --components.
Note that for LLVM < 3.5 llvm_map_components_to_libraries() was used instead of llvm_map_components_to_libnames(). This is now deprecated and will be removed in a future version of LLVM.
Developing LLVM passes out of source¶
It is possible to develop LLVM passes out of LLVM’s source tree (i.e. against an installed or built LLVM). An example of a project layout is provided below.
<project dir>/
|
CMakeLists.txt
<pass name>/
|
CMakeLists.txt
Pass.cpp
...
Contents of <project dir>/CMakeLists.txt
:
find_package(LLVM REQUIRED CONFIG)
add_definitions(${LLVM_DEFINITIONS})
include_directories(${LLVM_INCLUDE_DIRS})
add_subdirectory(<pass name>)
Contents of <project dir>/<pass name>/CMakeLists.txt
:
add_library(LLVMPassname MODULE Pass.cpp)
Note: if you intend for this pass to be merged into the LLVM source tree at some point in the future, it might make more sense to use LLVM’s internal add_llvm_loadable_module function instead, by adding the following to <project dir>/CMakeLists.txt (after find_package(LLVM ...)):
list(APPEND CMAKE_MODULE_PATH "${LLVM_CMAKE_DIR}")
include(AddLLVM)
And then changing <project dir>/<pass name>/CMakeLists.txt to:
add_llvm_loadable_module(LLVMPassname
Pass.cpp
)
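Once built, such a module can be loaded into opt with the -load option. As a sketch (the module path and the -passname flag are hypothetical and depend on how your pass registers itself):
$ opt -load ./LLVMPassname.so -passname < input.bc > output.bc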
When you are done developing your pass, you may wish to integrate it into the LLVM source tree. You can achieve this in two easy steps:
- Copying the <pass name> folder into the <LLVM root>/lib/Transform directory.
- Adding the add_subdirectory(<pass name>) line into <LLVM root>/lib/Transform/CMakeLists.txt.
Compiler/Platform-specific topics¶
Notes for specific compilers and/or platforms.
Microsoft Visual C++¶
- LLVM_COMPILER_JOBS:STRING
- Specifies the maximum number of parallel compiler jobs to use per project when building with msbuild or Visual Studio. Only supported for the Visual Studio 2010 CMake generator. 0 means use all processors. Default is 0.
How To Build On ARM¶
Introduction¶
This document contains information about building/testing LLVM and Clang on an ARM machine.
This document is NOT tailored to help you cross-compile LLVM/Clang to ARM on another architecture, for example an x86_64 machine. To find out more about cross-compiling, please check How To Cross-Compile Clang/LLVM using Clang/LLVM.
Notes On Building LLVM/Clang on ARM¶
Here are some notes on building/testing LLVM/Clang on ARM. Note that ARM encompasses a wide variety of CPUs; this advice is primarily based on the ARMv6 and ARMv7 architectures and may be inapplicable to older chips.
If you are building LLVM/Clang on an ARM board with 1G of memory or less, please use gold rather than GNU ld. Building LLVM/Clang with --enable-optimized is preferred since it consumes less memory. Otherwise, the build process will very likely fail due to insufficient memory. In any case it is probably a good idea to set up a swap partition.
If you want to run make check-all after building LLVM/Clang, to avoid false alarms (e.g., ARCMT failure) please use at least the following configuration:
$ ../$LLVM_SRC_DIR/configure --with-abi=aapcs-vfp
The most popular Linaro/Ubuntu OSes for ARM boards, e.g., the Pandaboard, have become hard-float platforms. The following set of configuration options appears to be a good choice for this platform:
../$LLVM_SRC_DIR/configure --build=armv7l-unknown-linux-gnueabihf \
  --host=armv7l-unknown-linux-gnueabihf \
  --target=armv7l-unknown-linux-gnueabihf --with-cpu=cortex-a9 \
  --with-float=hard --with-abi=aapcs-vfp --with-fpu=neon \
  --enable-targets=arm --enable-optimized --enable-assertions
ARM development boards can be unstable and you may experience that cores are disappearing, caches being flushed on every big.LITTLE switch, and other similar issues. To help ease the effect of this, set the Linux scheduler to “performance” on all cores using this little script:
# The code below requires the package 'cpufrequtils' to be installed.
for ((cpu=0; cpu<`grep -c proc /proc/cpuinfo`; cpu++)); do
    sudo cpufreq-set -c $cpu -g performance
done
Running the build on SD cards is ok, but they are more prone to failures than good quality USB sticks, and those are more prone to failures than external hard drives (which are also a lot faster). So, at the very least, you should consider buying a fast USB stick. On systems with a fast eMMC, that’s a good option too.
Make sure you have a decent power supply (dozens of dollars worth) that can provide at least 4 amperes; this is especially important if you use USB devices with your board.
How To Cross-Compile Clang/LLVM using Clang/LLVM¶
Introduction¶
This document contains information about building LLVM and Clang on a host machine, targeting another platform.
For more information on how to use Clang as a cross-compiler, please check http://clang.llvm.org/docs/CrossCompilation.html.
TODO: Add MIPS and other platforms to this document.
Cross-Compiling from x86_64 to ARM¶
In this use case, we’ll be using CMake and Ninja, on a Debian-based Linux system, cross-compiling from an x86_64 host (most Intel and AMD chips nowadays) to a hard-float ARM target (most ARM targets nowadays).
The packages you’ll need are:
cmake
ninja-build (from backports in Ubuntu)
gcc-4.7-arm-linux-gnueabihf
gcc-4.7-multilib-arm-linux-gnueabihf
binutils-arm-linux-gnueabihf
libgcc1-armhf-cross
libsfgcc1-armhf-cross
libstdc++6-armhf-cross
libstdc++6-4.7-dev-armhf-cross
Configuring CMake¶
For more information on how to configure CMake for LLVM/Clang, see Building LLVM with CMake.
- The CMake options you need to add are:
-DCMAKE_CROSSCOMPILING=True
-DCMAKE_INSTALL_PREFIX=<install-dir>
-DLLVM_TABLEGEN=<path-to-host-bin>/llvm-tblgen
-DCLANG_TABLEGEN=<path-to-host-bin>/clang-tblgen
-DLLVM_DEFAULT_TARGET_TRIPLE=arm-linux-gnueabihf
-DLLVM_TARGET_ARCH=ARM
-DLLVM_TARGETS_TO_BUILD=ARM
-DCMAKE_CXX_FLAGS='-target armv7a-linux-gnueabihf -mcpu=cortex-a9 -I/usr/arm-linux-gnueabihf/include/c++/4.7.2/arm-linux-gnueabihf/ -I/usr/arm-linux-gnueabihf/include/ -mfloat-abi=hard -ccc-gcc-name arm-linux-gnueabihf-gcc'
The TableGen options are required to compile TableGen with the host compiler, so you’ll need to compile LLVM (or at least llvm-tblgen) for your host platform before you start. The CXX flags define the target and CPU (which defaults to the VFP3 FPU with NEON), and force the hard-float ABI. If you’re using Clang as a cross-compiler, you will also have to set -ccc-gcc-name, to make sure it picks the correct linker.
Most of the time, what you want is a compiler native to the platform itself, but not to others. It might not even be feasible to produce x86 binaries from ARM targets, so there’s no point in compiling all back-ends. For that reason, you should also set LLVM_TARGETS_TO_BUILD to only build the ARM back-end.
You must set the CMAKE_INSTALL_PREFIX
, otherwise a ninja install
will copy ARM binaries to your root filesystem, which is not what you
want.
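Putting it together, a full configure line might look like the sketch below (paths are illustrative; the long -DCMAKE_CXX_FLAGS value from the option list above is omitted here for brevity but is still required):
$ cmake -G Ninja ../llvm \
    -DCMAKE_CROSSCOMPILING=True \
    -DCMAKE_INSTALL_PREFIX=../arm-install \
    -DLLVM_TABLEGEN=/path/to/host/bin/llvm-tblgen \
    -DCLANG_TABLEGEN=/path/to/host/bin/clang-tblgen \
    -DLLVM_DEFAULT_TARGET_TRIPLE=arm-linux-gnueabihf \
    -DLLVM_TARGET_ARCH=ARM \
    -DLLVM_TARGETS_TO_BUILD=ARM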
Hacks¶
There are some bugs in current LLVM, which require some fiddling before running CMake:
If you’re using Clang as the cross-compiler, there is a problem in the LLVM ARM back-end that is producing absolute relocations on position-independent code (R_ARM_THM_MOVW_ABS_NC), so for now you should disable PIC:
-DLLVM_ENABLE_PIC=False
This is not a problem, since Clang/LLVM libraries are statically linked anyway; it shouldn’t affect much.
The ARM libraries won’t be installed in your system, and possibly not easily installable anyway, so you’ll have to build/download them separately. But the CMake prepare step, which checks for dependencies, will check the host libraries, not the target ones.
A quick way of getting the libraries is to download them from a distribution repository, like Debian (http://packages.debian.org/wheezy/). Note that the libXXX package will have the shared objects (.so) and the libXXX-dev package will give you the headers and the static (.a) library. Just in case, download both.
The ones you need for ARM are: libtinfo, zlib1g, libxml2 and liblzma. In the Debian repository you’ll find downloads for all architectures.
After you download and unpack all .deb packages, copy all .so and .a files to a directory, make the appropriate symbolic links (if necessary), and add the relevant -L and -I paths to -DCMAKE_CXX_FLAGS above.
Running CMake and Building¶
Finally, if you’re using your platform compiler, run:
$ cmake -G Ninja <source-dir> <options above>
If you’re using Clang as the cross-compiler, run:
$ CC='clang' CXX='clang++' cmake -G Ninja <source-dir> <options above>
If you have clang/clang++ on the path, it should just work, and special Ninja files will be created in the build directory. I strongly suggest that you run cmake in a separate build directory, not inside the source tree.
To build, simply type:
$ ninja
It should automatically find out how many cores you have and which rules need building, and will build the whole thing.
You can’t run ninja check-all
on this tree because the created
binaries are targeted to ARM, not x86_64.
Installing and Using¶
After LLVM/Clang has built successfully, you should install it via:
$ ninja install
which will create a sysroot in the install-dir. You can then tar that directory into a tarball named with the full triple (for easy identification), like:
$ ln -sf <install-dir> arm-linux-gnueabihf-clang
$ tar zchf arm-linux-gnueabihf-clang.tar.gz arm-linux-gnueabihf-clang
If you copy that tarball to your target board, you’ll be able to use it for running the test-suite, for example. Follow the guidelines at http://llvm.org/docs/lnt/quickstart.html, unpack the tarball in the test directory, and use options:
$ ./sandbox/bin/python sandbox/bin/lnt runtest nt \
    --sandbox sandbox \
    --test-suite `pwd`/test-suite \
    --cc `pwd`/arm-linux-gnueabihf-clang/bin/clang \
    --cxx `pwd`/arm-linux-gnueabihf-clang/bin/clang++
Remember to pass the -jN option to lnt, with N set to the number of CPUs on your board. Also, the path to your clang has to be absolute, so you’ll need the pwd trick above.
LLVM Command Guide¶
The following documents are command descriptions for all of the LLVM tools.
These pages describe how to use the LLVM commands and what their options are.
Note that these pages do not describe all of the options available for all
tools. To get a complete listing, pass the --help
(general options) or
--help-hidden
(general and debugging options) arguments to the tool you are
interested in.
Basic Commands¶
llvm-as - LLVM assembler¶
SYNOPSIS¶
llvm-as [options] [filename]
DESCRIPTION¶
llvm-as is the LLVM assembler. It reads a file containing human-readable LLVM assembly language, translates it to LLVM bitcode, and writes the result into a file or to standard output.
If filename is omitted or is -
, then llvm-as reads its input from
standard input.
If an output file is not specified with the -o option, then llvm-as sends its output to a file or standard output by following these rules:
- If the input is standard input, then the output is standard output.
- If the input is a file that ends with .ll, then the output file is of the same name, except that the suffix is changed to .bc.
- If the input is a file that does not end with the .ll suffix, then the output file has the same name as the input file, except that the .bc suffix is appended.
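For example (file names illustrative), each of the following writes bitcode to foo.bc:
$ llvm-as foo.ll
$ llvm-as foo.ll -o foo.bc
$ llvm-as < foo.ll > foo.bc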
OPTIONS¶
- -f
- Enable binary output on terminals. Normally, llvm-as will refuse to write raw bitcode output if the output stream is a terminal. With this option, llvm-as will write raw bitcode regardless of the output device.
- -help
- Print a summary of command line options.
- -o filename
- Specify the output file name. If filename is
-
, then llvm-as sends its output to standard output.
EXIT STATUS¶
If llvm-as succeeds, it will exit with 0. Otherwise, if an error occurs, it will exit with a non-zero value.
SEE ALSO¶
llvm-dis, gccas
llvm-dis - LLVM disassembler¶
SYNOPSIS¶
llvm-dis [options] [filename]
DESCRIPTION¶
The llvm-dis command is the LLVM disassembler. It takes an LLVM bitcode file and converts it into human-readable LLVM assembly language.
If filename is omitted or specified as -
, llvm-dis reads its
input from standard input.
If the input is being read from standard input, then llvm-dis
will send its output to standard output by default. Otherwise, the
output will be written to a file named after the input file, with
a .ll
suffix added (any existing .bc
suffix will first be
removed). You can override the choice of output file using the
-o option.
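For example (file name illustrative), disassembling foo.bc writes LLVM assembly to foo.ll:
$ llvm-dis foo.bc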
OPTIONS¶
-f
Enable binary output on terminals. Normally, llvm-dis will refuse to write raw bitcode output if the output stream is a terminal. With this option, llvm-dis will write raw bitcode regardless of the output device.
-help
Print a summary of command line options.
-o filename
Specify the output file name. If filename is -, then the output is sent to standard output.
EXIT STATUS¶
If llvm-dis succeeds, it will exit with 0. Otherwise, if an error occurs, it will exit with a non-zero value.
SEE ALSO¶
llvm-as
opt - LLVM optimizer¶
SYNOPSIS¶
opt [options] [filename]
DESCRIPTION¶
The opt command is the modular LLVM optimizer and analyzer. It
takes LLVM source files as input, runs the specified optimizations or analyses
on it, and then outputs the optimized file or the analysis results. The
function of opt depends on whether the -analyze
option is
given.
When -analyze
is specified, opt performs various analyses
of the input source. It will usually print the results on standard output, but
in a few cases, it will print output to standard error or generate a file with
the analysis output, which is usually done when the output is meant for another
program.
When -analyze is not given, opt attempts to produce an
optimized output file. The optimizations available via opt depend
upon what libraries were linked into it as well as any additional libraries
that have been loaded with the -load
option. Use the -help
option to determine what optimizations you can use.
If filename is omitted from the command line or is “-”, opt reads its input from standard input. Inputs can be in either the LLVM assembly language format (.ll) or the LLVM bitcode format (.bc).
If an output filename is not specified with the -o option, opt writes its output to the standard output.
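As an illustrative example (file names hypothetical), the following runs the mem2reg pass and writes textual IR:
$ opt -S -mem2reg input.ll -o output.ll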
OPTIONS¶
-f
Enable binary output on terminals. Normally, opt will refuse to write raw bitcode output if the output stream is a terminal. With this option, opt will write raw bitcode regardless of the output device.
-help
Print a summary of command line options.
-o <filename>
Specify the output filename.
-S
Write output in LLVM intermediate language (instead of bitcode).
-{passname}
opt provides the ability to run any of LLVM’s optimization or analysis passes in any order. The -help option lists all the passes available. The order in which the options occur on the command line is the order in which they are executed (within pass constraints).
-disable-inlining
This option simply removes the inlining pass from the standard list.
-disable-opt
This option is only meaningful when -std-link-opts is given. It disables most passes.
-strip-debug
This option causes opt to strip debug information from the module before applying other optimizations. It is essentially the same as -strip but it ensures that stripping of debug information is done first.
-verify-each
This option causes opt to add a verify pass after every pass otherwise specified on the command line (including -verify). This is useful for cases where it is suspected that a pass is creating an invalid module but it is not clear which pass is doing it.
-stats
Print statistics.
-time-passes
Record the amount of time needed for each pass and print it to standard error.
-debug
If this is a debug build, this option will enable debug printouts from passes which use the DEBUG() macro. See the LLVM Programmer’s Manual, section #DEBUG for more information.
-load=<plugin>
Load the dynamic object plugin. This object should register new optimization or analysis passes. Once loaded, the object will add new command line options to enable various optimizations or analyses. To see the new complete list of optimizations, use the -help and -load options together. For example:
opt -load=plugin.so -help
-p
Print module after each transformation.
EXIT STATUS¶
If opt succeeds, it will exit with 0. Otherwise, if an error occurs, it will exit with a non-zero value.
llc - LLVM static compiler¶
SYNOPSIS¶
llc [options] [filename]
DESCRIPTION¶
The llc command compiles LLVM source inputs into assembly language for a specified architecture. The assembly language output can then be passed through a native assembler and linker to generate a native executable.
The choice of architecture for the output assembly code is automatically
determined from the input file, unless the -march
option is used to
override the default.
OPTIONS¶
If filename is “-” or omitted, llc reads from standard input. Otherwise, it reads from filename. Inputs can be in either the LLVM assembly language format (.ll) or the LLVM bitcode format (.bc).
If the -o option is omitted, then llc will send its output to standard output if the input is from standard input. If the -o option specifies “-”, then the output will also be sent to standard output.
If no -o option is specified and an input file other than “-” is specified, then llc creates the output filename by taking the input filename, removing any existing .bc extension, and adding a .s suffix.
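For example (file name illustrative), compiling foo.bc writes assembly to foo.s:
$ llc foo.bc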
Other llc options are described below.
End-user Options¶
-help
Print a summary of command line options.
-O=uint
Generate code at different optimization levels. These correspond to the -O0, -O1, -O2, and -O3 optimization levels used by clang.
-mtriple=<target triple>
Override the target triple specified in the input file with the specified string.
-march=<arch>
Specify the architecture for which to generate assembly, overriding the target encoded in the input file. See the output of llc -help for a list of valid architectures. By default this is inferred from the target triple or autodetected to the current architecture.
-mcpu=<cpuname>
Specify a specific chip in the current architecture to generate code for. By default this is inferred from the target triple and autodetected to the current architecture. For a list of available CPUs, use:
llvm-as < /dev/null | llc -march=xyz -mcpu=help
-filetype=<output file type>
Specify what kind of output llc should generate. Options are: asm for textual assembly ('.s'), obj for native object files ('.o') and null for not emitting anything (for performance testing). Note that not all targets support all options.
-mattr=a1,+a2,-a3,...
Override or control specific attributes of the target, such as whether SIMD operations are enabled or not. The default set of attributes is set by the current CPU. For a list of available attributes, use:
llvm-as < /dev/null | llc -march=xyz -mattr=help
--disable-fp-elim
Disable frame pointer elimination optimization.
--disable-excess-fp-precision
Disable optimizations that may produce excess precision for floating point. Note that this option can dramatically slow down code on some systems (e.g. X86).
--enable-no-infs-fp-math
Enable optimizations that assume no Inf values.
--enable-no-nans-fp-math
Enable optimizations that assume no NAN values.
--enable-unsafe-fp-math
Enable optimizations that make unsafe assumptions about IEEE math (e.g. that addition is associative) or may not work for all input ranges. These optimizations allow the code generator to make use of some instructions which would otherwise not be usable (such as fsin on X86).
--stats
Print statistics recorded by code-generation passes.
--time-passes
Record the amount of time needed for each pass and print a report to standard error.
Tuning/Configuration Options¶
--print-machineinstrs
Print generated machine code between compilation phases (useful for debugging).
--regalloc=<allocator>
Specify the register allocator to use. Valid register allocators are:
basic - Basic register allocator.
fast - Fast register allocator. It is the default for unoptimized code.
greedy - Greedy register allocator. It is the default for optimized code.
pbqp - Register allocator based on ‘Partitioned Boolean Quadratic Programming’.
--spiller=<spiller>
Specify the spiller to use for register allocators that support it. Currently this option is used only by the linear scan register allocator. The default spiller is local. Valid spillers are:
simple - Simple spiller
local - Local spiller
EXIT STATUS¶
If llc succeeds, it will exit with 0. Otherwise, if an error occurs, it will exit with a non-zero value.
SEE ALSO¶
lli
lli - directly execute programs from LLVM bitcode¶
SYNOPSIS¶
lli [options] [filename] [program args]
DESCRIPTION¶
lli directly executes programs in LLVM bitcode format. It takes a program in LLVM bitcode format and executes it using a just-in-time compiler, if one is available for the current architecture, or an interpreter. lli takes all of the same code generator options as llc|llc, but they are only effective when lli is using the just-in-time compiler.
If filename is not specified, then lli reads the LLVM bitcode for the program from standard input.
The optional args specified on the command line are passed to the program as arguments.
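For example (file names and arguments illustrative), the following assembles a module and executes it, passing two arguments through to the program:
$ llvm-as hello.ll
$ lli hello.bc arg1 arg2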
GENERAL OPTIONS¶
-fake-argv0=executable
Override the argv[0] value passed into the executing program.
-force-interpreter={false,true}
If set to true, use the interpreter even if a just-in-time compiler is available for this architecture. Defaults to false.
-help
Print a summary of command line options.
-load=pluginfilename
Causes lli to load the plugin (shared object) named pluginfilename and use it for optimization.
-stats
Print statistics from the code-generation passes. This is only meaningful for the just-in-time compiler, at present.
-time-passes
Record the amount of time needed for each code-generation pass and print it to standard error.
-version
Print out the version of lli and exit without doing anything else.
TARGET OPTIONS¶
-mtriple=target triple
Override the target triple specified in the input bitcode file with the specified string. This may result in a crash if you pick an architecture which is not compatible with the current system.
-march=arch
Specify the architecture for which to generate assembly, overriding the target encoded in the bitcode file. See the output of llc -help for a list of valid architectures. By default this is inferred from the target triple or autodetected to the current architecture.
-mcpu=cpuname
Specify a specific chip in the current architecture to generate code for. By default this is inferred from the target triple and autodetected to the current architecture. For a list of available CPUs, use:
llvm-as < /dev/null | llc -march=xyz -mcpu=help
-mattr=a1,+a2,-a3,...
Override or control specific attributes of the target, such as whether SIMD operations are enabled or not. The default set of attributes is set by the current CPU. For a list of available attributes, use:
llvm-as < /dev/null | llc -march=xyz -mattr=help
FLOATING POINT OPTIONS¶
-disable-excess-fp-precision
Disable optimizations that may increase floating point precision.
-enable-no-infs-fp-math
Enable optimizations that assume no Inf values.
-enable-no-nans-fp-math
Enable optimizations that assume no NAN values.
-enable-unsafe-fp-math
Causes lli to enable optimizations that may decrease floating point precision.
-soft-float
Causes lli to generate software floating point library calls instead of equivalent hardware instructions.
CODE GENERATION OPTIONS¶
-code-model=model
Choose the code model from:
default: Target default code model
small: Small code model
kernel: Kernel code model
medium: Medium code model
large: Large code model
-disable-post-RA-scheduler
Disable scheduling after register allocation.
-disable-spill-fusing
Disable fusing of spill code into instructions.
-jit-enable-eh
Exception handling should be enabled in the just-in-time compiler.
-join-liveintervals
Coalesce copies (default=true).
-nozero-initialized-in-bss
Don’t place zero-initialized symbols into the BSS section.
-pre-RA-sched=scheduler
Instruction schedulers available (before register allocation):
=default: Best scheduler for the target
=none: No scheduling: breadth first sequencing
=simple: Simple two pass scheduling: minimize critical path and maximize processor utilization
=simple-noitin: Simple two pass scheduling: Same as simple except using generic latency
=list-burr: Bottom-up register reduction list scheduling
=list-tdrr: Top-down register reduction list scheduling
=list-td: Top-down list scheduler
-print-machineinstrs
Print generated machine code
-regalloc=allocator
Register allocator to use (default=linearscan)
=bigblock: Big-block register allocator
=linearscan: Linear scan register allocator
=local: Local register allocator
=simple: Simple register allocator
-relocation-model=model
Choose relocation model from:
=default: Target default relocation model
=static: Non-relocatable code
=pic: Fully relocatable, position independent code
=dynamic-no-pic: Relocatable external references, non-relocatable code
-spiller
Spiller to use (default=local)
=simple: Simple spiller
=local: Local spiller
-x86-asm-syntax=syntax
Choose style of code to emit from X86 backend:
=att: Emit AT&T-style assembly
=intel: Emit Intel-style assembly
EXIT STATUS¶
If lli fails to load the program, it will exit with an exit code of 1. Otherwise, it will return the exit code of the program it executes.
SEE ALSO¶
llc|llc
llvm-link - LLVM bitcode linker¶
SYNOPSIS¶
llvm-link [options] filename ...
DESCRIPTION¶
llvm-link takes several LLVM bitcode files and links them together
into a single LLVM bitcode file. It writes the output file to standard output,
unless the -o
option is used to specify a filename.
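For example (file names illustrative):
$ llvm-link a.bc b.bc -o linked.bc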
OPTIONS¶
-f
Enable binary output on terminals. Normally, llvm-link will refuse to write raw bitcode output if the output stream is a terminal. With this option, llvm-link will write raw bitcode regardless of the output device.
-o filename
Specify the output file name. If filename is “-”, then llvm-link will write its output to standard output.
-S
Write output in LLVM intermediate language (instead of bitcode).
-d
If specified, llvm-link prints a human-readable version of the output bitcode file to standard error.
-help
Print a summary of command line options.
-v
Verbose mode. Print information about what llvm-link is doing. This typically includes a message for each bitcode file linked in and for each library found.
EXIT STATUS¶
If llvm-link succeeds, it will exit with 0. Otherwise, if an error occurs, it will exit with a non-zero value.
llvm-ar - LLVM archiver¶
SYNOPSIS¶
llvm-ar [-]{dmpqrtx}[Rabfikou] [relpos] [count] <archive> [files...]
DESCRIPTION¶
The llvm-ar command is similar to the common Unix utility, ar
. It
archives several files together into a single file. The intent for this is
to produce archive libraries of LLVM bitcode that can be linked into an
LLVM program. However, the archive can contain any kind of file. By default,
llvm-ar generates a symbol table that makes linking faster because
only the symbol table needs to be consulted, not each individual file member
of the archive.
The llvm-ar command can be used to read SVR4, GNU and BSD style archive
files. However, right now it can only write in the GNU format. If an
SVR4 or BSD style archive is used with the r
(replace) or q
(quick
update) operations, the archive will be reconstructed in GNU format.
Here’s where llvm-ar departs from previous ar
implementations:
Symbol Table
Since llvm-ar supports bitcode files, the symbol table it creates is in GNU format and includes both native and bitcode files.
Long Paths
Currently llvm-ar can read GNU and BSD long file names, but only writes archives with the GNU format.
OPTIONS¶
The options to llvm-ar are compatible with other ar
implementations.
However, there are a few modifiers (R) that are not found in other ar
implementations. The options to llvm-ar specify a single basic operation to
perform on the archive, a variety of modifiers for that operation, the name of
the archive file, and an optional list of file names. These options are used to
determine how llvm-ar should process the archive file.
The Operations and Modifiers are explained in the sections below. The minimal
set of options is at least one operator and the name of the archive. Typically
archive files end with a .a
suffix, but this is not required. Following
the archive-name comes a list of files that indicate the specific members
of the archive to operate on. If the files option is not specified, it
generally means either “none” or “all” members, depending on the operation.
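As a sketch (file names illustrative), creating an archive with the r operation and the c modifier, then listing its table of contents with the t operation (all described below):
$ llvm-ar rc libfoo.a a.o b.o
$ llvm-ar t libfoo.a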
Operations¶
d
Delete files from the archive. No modifiers are applicable to this operation. The files options specify which members should be removed from the archive. It is not an error if a specified file does not appear in the archive. If no files are specified, the archive is not modified.
m[abi]
Move files from one location in the archive to another. The a, b, and i modifiers apply to this operation. The files will all be moved to the location given by the modifiers. If no modifiers are used, the files will be moved to the end of the archive. If no files are specified, the archive is not modified.
p
Print files to the standard output. This operation simply prints the files indicated to the standard output. If no files are specified, the entire archive is printed. Printing bitcode files is ill-advised as they might confuse your terminal settings. The p operation never modifies the archive.
q
Quickly append files to the end of the archive. This operation quickly adds the files to the archive without checking for duplicates that should be removed first. If no files are specified, the archive is not modified. Because of the way that llvm-ar constructs the archive file, it’s dubious whether the q operation is any faster than the r operation.
r[abu]
Replace or insert file members. The a, b, and u modifiers apply to this operation. This operation will replace existing files or insert them at the end of the archive if they do not exist. If no files are specified, the archive is not modified.
t[v]
Print the table of contents. Without any modifiers, this operation just prints the names of the members to the standard output. With the v modifier, llvm-ar also prints out the file type (B=bitcode, S=symbol table, blank=regular file), the permission mode, the owner and group, the size, and the date. If any files are specified, the listing is only for those files. If no files are specified, the table of contents for the whole archive is printed.
x[oP]
Extract archive members back to files. The o modifier applies to this operation. This operation retrieves the indicated files from the archive and writes them back to the operating system’s file system. If no files are specified, the entire archive is extracted.
Modifiers (operation specific)¶
The modifiers below are specific to certain operations. See the Operations section (above) to determine which modifiers are applicable to which operations.
[a]
When inserting or moving member files, this option specifies the destination of the new files as being after the relpos member. If relpos is not found, the files are placed at the end of the archive.
[b]
When inserting or moving member files, this option specifies the destination of the new files as being before the relpos member. If relpos is not found, the files are placed at the end of the archive. This modifier is identical to the i modifier.
[i]
A synonym for the b option.
[o]
When extracting files, this option will cause llvm-ar to preserve the original modification times of the files it writes.
[u]
When replacing existing files in the archive, only replace those files that have a newer timestamp than the timestamp of the member in the archive.
Modifiers (generic)¶
The modifiers below may be applied to any operation.
[c]
For all operations, llvm-ar will always create the archive if it doesn’t exist. Normally, llvm-ar will print a warning message indicating that the archive is being created. Using this modifier turns off that warning.
[s]
This modifier requests that an archive index (or symbol table) be added to the archive. This is the default mode of operation. The symbol table will contain all the externally visible functions and global variables defined by all the bitcode files in the archive.
[S]
This modifier is the opposite of the s modifier. It instructs llvm-ar to not build the symbol table. If both s and S are used, the last modifier to occur in the options will prevail.
[v]
This modifier instructs llvm-ar to be verbose about what it is doing. Each editing operation taken against the archive will produce a line of output saying what is being done.
STANDARDS¶
The llvm-ar utility is intended to provide a superset of the IEEE Std 1003.2
(POSIX.2) functionality for ar
. llvm-ar can read both SVR4 and BSD4.4 (or
Mac OS X) archives. If the f
modifier is given to the x
or r
operations
then llvm-ar will write SVR4 compatible archives. Without this modifier,
llvm-ar will write BSD4.4 compatible archives that have long names
immediately after the header and indicated using the “#1/ddd” notation for the
name in the header.
FILE FORMAT¶
The file format for LLVM Archive files is similar to that of BSD 4.4 or Mac OSX
archive files. In fact, except for the symbol table, the ar
commands on those
operating systems should be able to read LLVM archive files. The details of the
file format follow.
Each archive begins with the archive magic number, which is the eight printable characters “!<arch>\n” where \n represents the newline character (0x0A). Following the magic number, the file is composed of even-length members that begin with an archive header and end with a \n padding character if necessary (to make the length even). Each file member is composed of a header (defined below), an optional newline-terminated “long file name” and the contents of the file.
The fields of the header are described in the items below. All fields of the header contain only ASCII characters, are left justified and are right padded with space characters.
name - char[16]
This field of the header provides the name of the archive member. If the name is longer than 15 characters or contains a slash (/) character, then this field contains#1/nnn
wherennn
provides the length of the name and the#1/
is literal. In this case, the actual name of the file is provided in thennn
bytes immediately following the header. If the name is 15 characters or less, it is contained directly in this field and terminated with a slash (/) character.
date - char[12]
This field provides the date of modification of the file in the form of a decimal encoded number that provides the number of seconds since the epoch (since 00:00:00 Jan 1, 1970) per Posix specifications.
uid - char[6]
This field provides the user id of the file encoded as a decimal ASCII string. This field might not make much sense on non-Unix systems. On Unix, it is the same value as the st_uid field of the stat structure returned by the stat(2) operating system call.
gid - char[6]
This field provides the group id of the file encoded as a decimal ASCII string. This field might not make much sense on non-Unix systems. On Unix, it is the same value as the st_gid field of the stat structure returned by the stat(2) operating system call.
mode - char[8]
This field provides the access mode of the file encoded as an octal ASCII string. This field might not make much sense on non-Unix systems. On Unix, it is the same value as the st_mode field of the stat structure returned by the stat(2) operating system call.
size - char[10]
This field provides the size of the file, in bytes, encoded as a decimal ASCII string.
fmag - char[2]
This field is the archive file member magic number. Its content is always the two characters back tick (0x60) and newline (0x0A). This provides some measure of utility in identifying archive files that have been corrupted.
offset - vbr encoded 32-bit integer
The offset item provides the offset into the archive file where the bitcode member is stored that is associated with the symbol. The offset value is 0 based at the start of the first “normal” file member. To derive the actual file offset of the member, you must add the number of bytes occupied by the file signature (8 bytes) and the symbol tables. The value of this item is encoded using variable bit rate encoding to reduce the size of the symbol table. Variable bit rate encoding uses the high bit (0x80) of each byte to indicate if there are more bytes to follow. The remaining 7 bits in each byte carry bits from the value. The final byte does not have the high bit set.
length - vbr encoded 32-bit integer
The length item provides the length of the symbol that follows. Like this offset item, the length is variable bit rate encoded.
symbol - character array
The symbol item provides the text of the symbol that is associated with the offset. The symbol is not terminated by any character. Its length is provided by the length field. Note that it is allowed (but unwise) to use non-printing characters (even 0x00) in the symbol. This allows for multiple encodings of symbol names.
EXIT STATUS¶
If llvm-ar succeeds, it will exit with 0. A usage error results in an exit code of 1. A hard error (typically in the file system) results in an exit code of 2. Miscellaneous or unknown errors result in an exit code of 3.
SEE ALSO¶
ar(1)
llvm-nm - list LLVM bitcode and object file’s symbol table¶
SYNOPSIS¶
llvm-nm [options] [filenames...]
DESCRIPTION¶
The llvm-nm utility lists the names of symbols from the LLVM bitcode files, object files, or ar archives containing them, named on the command line. Each symbol is listed along with some simple information about its provenance. If no file name is specified, or - is used as a file name, llvm-nm will process a file on its standard input stream.
llvm-nm’s default output format is the traditional BSD nm output format. Each such output record consists of an (optional) 8-digit hexadecimal address, followed by a type code character, followed by a name, for each symbol. One record is printed per line; fields are separated by spaces. When the address is omitted, it is replaced by 8 spaces.
Type code characters currently supported, and their meanings, are as follows:
U
Named object is referenced but undefined in this bitcode file
C
Common (multiple definitions link together into one def)
W
Weak reference (multiple definitions link together into zero or one definitions)
t
Local function (text) object
T
Global function (text) object
d
Local data object
D
Global data object
?
Something unrecognizable
Because LLVM bitcode files typically contain objects that are not considered to have addresses until they are linked into an executable image or dynamically compiled “just-in-time”, llvm-nm does not print an address for any symbol in an LLVM bitcode file, even symbols which are defined in the bitcode file.
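For example (file name illustrative), listing only the externally visible symbols in a bitcode file:
$ llvm-nm -g foo.bc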
OPTIONS¶
-B (default)
Use BSD output format. Alias for --format=bsd.
-P
Use POSIX.2 output format. Alias for --format=posix.
--debug-syms, -a
Show all symbols, even debugger only.
--defined-only
Print only symbols defined in this file (as opposed to symbols which may be referenced by objects in this file, but not defined in this file.)
--dynamic, -D
Display dynamic symbols instead of normal symbols.
--extern-only, -g
Print only symbols whose definitions are external; that is, accessible from other files.
--format=format, -f format
Select an output format; format may be sysv, posix, or bsd. The default is bsd.
-help
Print a summary of command-line options and their meanings.
--no-sort, -p
Show symbols in the order encountered.
--numeric-sort, -n, -v
Sort symbols by address.
--print-file-name, -A, -o
Precede each symbol with the file it came from.
--print-size, -S
Show symbol size instead of address.
--size-sort
Sort symbols by size.
--undefined-only, -u
Print only symbols referenced but not defined in this file.
BUGS¶
- llvm-nm cannot demangle C++ mangled names, like GNU nm can.
- llvm-nm does not support the full set of arguments that GNU nm does.
EXIT STATUS¶
llvm-nm exits with an exit code of zero.
SEE ALSO¶
llvm-dis, ar(1), nm(1)
llvm-config - Print LLVM compilation options¶
SYNOPSIS¶
llvm-config option [components...]
DESCRIPTION¶
llvm-config makes it easier to build applications that use LLVM. It can print the compiler flags, linker flags and object libraries needed to link against LLVM.
EXAMPLES¶
To link against the JIT:
g++ `llvm-config --cxxflags` -o HowToUseJIT.o -c HowToUseJIT.cpp
g++ `llvm-config --ldflags` -o HowToUseJIT HowToUseJIT.o \
`llvm-config --libs engine bcreader scalaropts`
OPTIONS¶
--version
Print the version number of LLVM.
-help
Print a summary of llvm-config arguments.
--prefix
Print the installation prefix for LLVM.
--src-root
Print the source root from which LLVM was built.
--obj-root
Print the object root used to build LLVM.
--bindir
Print the installation directory for LLVM binaries.
--includedir
Print the installation directory for LLVM headers.
--libdir
Print the installation directory for LLVM libraries.
--cxxflags
Print the C++ compiler flags needed to use LLVM headers.
--ldflags
Print the flags needed to link against LLVM libraries.
--libs
Print all the libraries needed to link against the specified LLVM components, including any dependencies.
--libnames
Similar to --libs, but prints the bare filenames of the libraries without -l or pathnames. Useful for linking against a not-yet-installed copy of LLVM.
--libfiles
Similar to --libs, but print the full path to each library file. This is useful when creating makefile dependencies, to ensure that a tool is relinked if any library it uses changes.
--components
Print all valid component names.
--targets-built
Print the component names for all targets supported by this copy of LLVM.
--build-mode
Print the build mode used when LLVM was built (e.g. Debug or Release)
COMPONENTS¶
To print a list of all available components, run llvm-config --components. In most cases, components correspond directly to LLVM libraries. Useful “virtual” components include:
all
Includes all LLVM libraries. The default if no components are specified.
backend
Includes either a native backend or the C backend.
engine
Includes either a native JIT or the bitcode interpreter.
EXIT STATUS¶
If llvm-config succeeds, it will exit with 0. Otherwise, if an error occurs, it will exit with a non-zero value.
llvm-diff - LLVM structural ‘diff’¶
SYNOPSIS¶
llvm-diff [options] module 1 module 2 [global name ...]
DESCRIPTION¶
llvm-diff compares the structure of two LLVM modules, primarily focusing on differences in function definitions. Insignificant differences, such as changes in the ordering of globals or in the names of local values, are ignored.
An input module will be interpreted as an assembly file if its name ends in ‘.ll’; otherwise it will be read in as a bitcode file.
If a list of global names is given, just the values with those names are compared; otherwise, all global values are compared, and diagnostics are produced for globals which only appear in one module or the other.
llvm-diff compares two functions by comparing their basic blocks, beginning with the entry blocks. If the terminators seem to match, then the corresponding successors are compared; otherwise they are ignored. This algorithm is very sensitive to changes in control flow, which tend to stop any downstream changes from being detected.
llvm-diff is intended as a debugging tool for writers of LLVM passes and frontends. It does not have a stable output format.
EXIT STATUS¶
If llvm-diff finds no differences between the modules, it will exit with 0 and produce no output. Otherwise it will exit with a non-zero value.
BUGS¶
Many important differences, like changes in linkage or function attributes, are not diagnosed.
Changes in memory behavior (for example, coalescing loads) can cause massive detected differences in blocks.
llvm-cov - emit coverage information¶
SYNOPSIS¶
llvm-cov command [args...]
DESCRIPTION¶
The llvm-cov tool shows code coverage information for
programs that are instrumented to emit profile data. It can be used to
work with gcov-style coverage or with clang’s instrumentation-based profiling.
If the program is invoked with a base name of gcov
, it will behave as if
the llvm-cov gcov command were called. Otherwise, a command should
be provided.
GCOV COMMAND¶
SYNOPSIS¶
llvm-cov gcov [options] SOURCEFILE
DESCRIPTION¶
The llvm-cov gcov tool reads code coverage data files and displays
the coverage information for a specified source file. It is compatible with the
gcov
tool from version 4.2 of GCC
and may also be compatible with some
later versions of gcov
.
To use llvm-cov gcov, you must first build an instrumented version
of your application that collects coverage data as it runs. Compile with the
-fprofile-arcs
and -ftest-coverage
options to add the
instrumentation. (Alternatively, you can use the --coverage
option, which
includes both of those other options.) You should compile with debugging
information (-g
) and without optimization (-O0
); otherwise, the
coverage data cannot be accurately mapped back to the source code.
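As a sketch (file names illustrative), the end-to-end workflow described in the following paragraphs looks roughly like:
$ clang --coverage -g -O0 foo.c -o foo
$ ./foo
$ llvm-cov gcov foo.c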
At the time you compile the instrumented code, a .gcno
data file will be
generated for each object file. These .gcno
files contain half of the
coverage data. The other half of the data comes from .gcda
files that are
generated when you run the instrumented program, with a separate .gcda
file for each object file. Each time you run the program, the execution counts
are summed into any existing .gcda
files, so be sure to remove any old
files if you do not want their contents to be included.
By default, the .gcda files are written into the same directory as the object files, but you can override that by setting the GCOV_PREFIX and GCOV_PREFIX_STRIP environment variables. The GCOV_PREFIX_STRIP variable specifies a number of directory components to be removed from the start of the absolute path to the object file directory. After stripping those directories, the prefix from the GCOV_PREFIX variable is added. These environment variables allow you to run the instrumented program on a machine where the original object file directories are not accessible, but you will then need to copy the .gcda files back to the object file directories where llvm-cov gcov expects to find them.
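For example, if the objects were built under /home/user/build (three leading path components) and you want the .gcda files written under /tmp/gcov instead, a run of the instrumented program might look like this (all paths are illustrative):
$ GCOV_PREFIX=/tmp/gcov GCOV_PREFIX_STRIP=3 ./instrumented-program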
Once you have generated the coverage data files, run llvm-cov gcov for each main source file where you want to examine the coverage results. This should be run from the same directory where you previously ran the compiler. The results for the specified source file are written to a file named by appending a .gcov suffix. A separate output file is also created for each file included by the main source file, also with a .gcov suffix added.
The basic content of a .gcov output file is a copy of the source file with an execution count and line number prepended to every line. The execution count is shown as - if a line does not contain any executable code. If a line contains code but that code was never executed, the count is displayed as #####.
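Putting the steps above together, a minimal gcov-style session might look like the following sketch (the file names are illustrative):
$ clang --coverage -g -O0 foo.c -o foo
$ ./foo
$ llvm-cov gcov foo.c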
OPTIONS¶
-a, --all-blocks¶
Display all basic blocks. If there are multiple blocks for a single line of source code, this option causes llvm-cov to show the count for each block instead of just one count for the entire line.
-b, --branch-probabilities¶
Display conditional branch probabilities and a summary of branch information.
-c, --branch-counts¶
Display branch counts instead of probabilities (requires -b).
-f, --function-summaries¶
Show a summary of coverage for each function instead of just one summary for an entire source file.
--help¶
Display available options (--help-hidden for more).
-l, --long-file-names¶
For coverage output of files included from the main source file, add the main file name followed by ## as a prefix to the output file names. This can be combined with the --preserve-paths option to use complete paths for both the main file and the included file.
-n, --no-output¶
Do not output any .gcov files. Summary information is still displayed.
-o=<DIR|FILE>, --object-directory=<DIR>, --object-file=<FILE>¶
Find objects in DIR or based on FILE’s path. If you specify a particular object file, the coverage data files are expected to have the same base name with .gcno and .gcda extensions. If you specify a directory, the files are expected in that directory with the same base name as the source file.
-p, --preserve-paths¶
Preserve path components when naming the coverage output files. In addition to the source file name, include the directories from the path to that file. The directories are separated by # characters, with . directories removed and .. directories replaced by ^ characters. When used with the --long-file-names option, this applies to both the main file name and the included file name.
-u, --unconditional-branches¶
Include unconditional branches in the output for the --branch-probabilities option.
-version¶
Display the version of llvm-cov.
EXIT STATUS¶
llvm-cov gcov returns 1 if it cannot read input files. Otherwise, it exits with zero.
SHOW COMMAND¶
SYNOPSIS¶
llvm-cov show [options] -instr-profile PROFILE BIN [SOURCES]
DESCRIPTION¶
The llvm-cov show command shows line by line coverage of a binary BIN using the profile data PROFILE. It can optionally be filtered to only show the coverage for the files listed in SOURCES.
To use llvm-cov show, you need a program that is compiled with instrumentation to emit profile and coverage data. To build such a program with clang use the -fprofile-instr-generate and -fcoverage-mapping flags. If linking with the clang driver, pass -fprofile-instr-generate to the link stage to make sure the necessary runtime libraries are linked in.
The coverage information is stored in the built executable or library itself, and this is what you should pass to llvm-cov show as the BIN argument. The profile data is generated by running this instrumented program normally. When the program exits it will write out a raw profile file, typically called default.profraw, which can be converted to a format that is suitable for the PROFILE argument using the llvm-profdata merge tool.
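As a sketch of this workflow (the binary and file names are illustrative):
$ clang -fprofile-instr-generate -fcoverage-mapping foo.c -o foo
$ ./foo
$ llvm-profdata merge -o foo.profdata default.profraw
$ llvm-cov show -instr-profile foo.profdata foo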
OPTIONS¶
-show-line-counts¶
Show the execution counts for each line. This is enabled by default, unless another -show option is used.
-show-expansions¶
Expand inclusions, such as preprocessor macros or textual inclusions, inline in the display of the source file.
-show-instantiations¶
For source regions that are instantiated multiple times, such as templates in C++, show each instantiation separately as well as the combined summary.
-show-regions¶
Show the execution counts for each region by displaying a caret that points to the character where the region starts.
-show-line-counts-or-regions¶
Show the execution counts for each line if there is only one region on the line, but show the individual regions if there are multiple on the line.
-use-color[=VALUE]¶
Enable or disable color output. By default this is autodetected.
-arch=<name>¶
If the covered binary is a universal binary, select the architecture to use. It is an error to specify an architecture that is not included in the universal binary or to use an architecture that does not match a non-universal binary.
-name=<NAME>¶
Show code coverage only for functions with the given name.
-name-regex=<PATTERN>¶
Show code coverage only for functions that match the given regular expression.
-line-coverage-gt=<N>¶
Show code coverage only for functions with line coverage greater than the given threshold.
-line-coverage-lt=<N>¶
Show code coverage only for functions with line coverage less than the given threshold.
-region-coverage-gt=<N>¶
Show code coverage only for functions with region coverage greater than the given threshold.
-region-coverage-lt=<N>¶
Show code coverage only for functions with region coverage less than the given threshold.
REPORT COMMAND¶
SYNOPSIS¶
llvm-cov report [options] -instr-profile PROFILE BIN [SOURCES]
DESCRIPTION¶
The llvm-cov report command displays a summary of the coverage of a binary BIN using the profile data PROFILE. It can optionally be filtered to only show the coverage for the files listed in SOURCES.
If no source files are provided, a summary line is printed for each file in the coverage data. If any files are provided, summaries are shown for each function in the listed files instead.
For information on compiling programs for coverage and generating profile data, see SHOW COMMAND.
OPTIONS¶
-use-color[=VALUE]¶
Enable or disable color output. By default this is autodetected.
-arch=<name>¶
If the covered binary is a universal binary, select the architecture to use. It is an error to specify an architecture that is not included in the universal binary or to use an architecture that does not match a non-universal binary.
llvm-profdata - Profile data tool¶
SYNOPSIS¶
llvm-profdata command [args...]
DESCRIPTION¶
The llvm-profdata tool is a small utility for working with profile data files.
MERGE¶
SYNOPSIS¶
llvm-profdata merge [options] [filenames...]
DESCRIPTION¶
llvm-profdata merge takes several profile data files generated by PGO instrumentation and merges them together into a single indexed profile data file.
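For example, to merge the raw profiles from two runs into a single indexed file (the file names are illustrative):
$ llvm-profdata merge -o code.profdata run1.profraw run2.profraw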
OPTIONS¶
-help¶
Print a summary of command line options.
-output=output, -o=output¶
Specify the output file name. Output cannot be - as the resulting indexed profile data can’t be written to standard output.
-instr (default)¶
Specify that the input profile is an instrumentation-based profile.
-sample¶
Specify that the input profile is a sample-based profile. When using sample-based profiles, the output file can be generated in one of three formats:
-binary (default)¶
Emit the profile using a binary encoding.
-text¶
Emit the profile in text mode.
-gcc¶
Emit the profile using GCC’s gcov format (not yet supported).
SHOW¶
SYNOPSIS¶
llvm-profdata show [options] [filename]
DESCRIPTION¶
llvm-profdata show takes a profile data file and displays the information about the profile counters for this file and for any of the specified function(s).
If filename is omitted or is -, then llvm-profdata show reads its input from standard input.
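For example, to print per-function details and counter values from an indexed profile (the file name is illustrative):
$ llvm-profdata show -all-functions -counts code.profdata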
OPTIONS¶
-all-functions¶
Print details for every function.
-counts¶
Print the counter values for the displayed functions.
-function=string¶
Print details for a function if the function’s name contains the given string.
-help¶
Print a summary of command line options.
-output=output, -o=output¶
Specify the output file name. If output is - or it isn’t specified, then the output is sent to standard output.
-instr (default)¶
Specify that the input profile is an instrumentation-based profile.
-sample¶
Specify that the input profile is a sample-based profile.
EXIT STATUS¶
llvm-profdata returns 1 if the command is omitted or is invalid, if it cannot read input files, or if there is a mismatch between their data.
llvm-stress - generate random .ll files¶
SYNOPSIS¶
llvm-stress [-size=filesize] [-seed=initialseed] [-o=outfile]
DESCRIPTION¶
The llvm-stress tool is used to generate random .ll files that can be used to test different components of LLVM.
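For example, a deterministic random module can be generated like this (the option values are illustrative):
$ llvm-stress -size=200 -seed=42 -o=random.ll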
OPTIONS¶
-o filename¶
Specify the output filename.
-size size¶
Specify the size of the generated .ll file.
-seed seed¶
Specify the seed to be used for the randomly generated instructions.
EXIT STATUS¶
llvm-stress returns 0.
llvm-symbolizer - convert addresses into source code locations¶
SYNOPSIS¶
llvm-symbolizer [options]
DESCRIPTION¶
llvm-symbolizer reads object file names and addresses from standard input and prints corresponding source code locations to standard output. If an object file is specified on the command line, llvm-symbolizer reads only addresses from standard input. This program uses the debug info sections and symbol table in the object files.
EXAMPLE¶
$ cat addr.txt
a.out 0x4004f4
/tmp/b.out 0x400528
/tmp/c.so 0x710
/tmp/mach_universal_binary:i386 0x1f84
/tmp/mach_universal_binary:x86_64 0x100000f24
$ llvm-symbolizer < addr.txt
main
/tmp/a.cc:4
f(int, int)
/tmp/b.cc:11
h_inlined_into_g
/tmp/header.h:2
g_inlined_into_f
/tmp/header.h:7
f_inlined_into_main
/tmp/source.cc:3
main
/tmp/source.cc:8
_main
/tmp/source_i386.cc:8
_main
/tmp/source_x86_64.cc:8
$ cat addr2.txt
0x4004f4
0x401000
$ llvm-symbolizer -obj=a.out < addr2.txt
main
/tmp/a.cc:4
foo(int)
/tmp/a.cc:12
OPTIONS¶
-obj¶
Path to object file to be symbolized.
-functions=[none|short|linkage]¶
Specify the way function names are printed (omit function name, print short function name, or print full linkage name, respectively). Defaults to linkage.
-use-symbol-table¶
Prefer function names stored in the symbol table to function names in debug info sections. Defaults to true.
-demangle¶
Print demangled function names. Defaults to true.
-inlining¶
If a source code location is in an inlined function, prints all the inlined frames. Defaults to true.
-default-arch¶
If a binary contains object files for multiple architectures (e.g. it is a Mach-O universal binary), symbolize the object file for a given architecture. You can also specify the architecture by writing binary_name:arch_name in the input (see example above). If the architecture is not specified in either way, the address will not be symbolized. Defaults to empty string.
-dsym-hint=<path/to/file.dSYM>¶
(Darwin-only flag). If the debug info for a binary isn’t present in the default location, look for the debug info at the .dSYM path provided via the -dsym-hint flag. This flag can be used multiple times.
EXIT STATUS¶
llvm-symbolizer returns 0. Other exit codes imply internal program error.
llvm-dwarfdump - print contents of DWARF sections¶
SYNOPSIS¶
llvm-dwarfdump [options] [filenames...]
DESCRIPTION¶
llvm-dwarfdump parses DWARF sections in the object files and prints their contents in human-readable form.
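For example, assuming a binary a.out containing DWARF debug info (the name is illustrative), the .debug_info section can be dumped with:
$ llvm-dwarfdump -debug-dump=info a.out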
OPTIONS¶
-debug-dump=section¶
Specify the DWARF section to dump. For example, use abbrev to dump the contents of the .debug_abbrev section, loc.dwo to dump the contents of the .debug_loc.dwo section, etc. See llvm-dwarfdump --help for the complete list of supported sections. Use all to dump all DWARF sections. It is the default.
EXIT STATUS¶
llvm-dwarfdump returns 0 if the input files were parsed and dumped successfully. Otherwise, it returns 1.
Debugging Tools¶
bugpoint - automatic test case reduction tool¶
SYNOPSIS¶
bugpoint [options] [input LLVM ll/bc files] [LLVM passes] --args program arguments
DESCRIPTION¶
bugpoint narrows down the source of problems in LLVM tools and passes. It can be used to debug three types of failures: optimizer crashes, miscompilations by optimizers, or bad native code generation (including problems in the static and JIT compilers). It aims to reduce large test cases to small, useful ones. For more information on the design and inner workings of bugpoint, as well as advice for using bugpoint, see llvm/docs/Bugpoint.html in the LLVM distribution.
OPTIONS¶
--additional-so library
Load the dynamic shared object library into the test program whenever it is run. This is useful if you are debugging programs which depend on non-LLVM libraries (such as the X or curses libraries) to run.
--append-exit-code={true,false}
Append the test program’s exit code to the output file so that a change in exit code is considered a test failure. Defaults to false.
--args program args
Pass all arguments specified after --args to the test program whenever it runs. Note that if any of the program args start with a “-”, you should use:
bugpoint [bugpoint args] --args -- [program args]
The “--” right after the --args option tells bugpoint to consider any options starting with “-” to be part of the --args option, not as options to bugpoint itself.
--tool-args tool args
Pass all arguments specified after --tool-args to the LLVM tool under test (llc, lli, etc.) whenever it runs. You should use this option in the following way:
bugpoint [bugpoint args] --tool-args -- [tool args]
The “--” right after the --tool-args option tells bugpoint to consider any options starting with “-” to be part of the --tool-args option, not as options to bugpoint itself. (See --args, above.)
--safe-tool-args tool args
Pass all arguments specified after --safe-tool-args to the “safe” execution tool.
--gcc-tool-args gcc tool args
Pass all arguments specified after --gcc-tool-args to the invocation of gcc.
--opt-args opt args
Pass all arguments specified after --opt-args to the invocation of opt.
--disable-{dce,simplifycfg}
Do not run the specified passes to clean up and reduce the size of the test program. By default, bugpoint uses these passes internally when attempting to reduce test programs. If you’re trying to find a bug in one of these passes, bugpoint may crash.
--enable-valgrind
Use valgrind to find faults in the optimization phase. This will allow bugpoint to find otherwise asymptomatic problems caused by memory mismanagement.
-find-bugs
Continually randomize the specified passes and run them on the test program until a bug is found or the user kills bugpoint.
-help
Print a summary of command line options.
--input filename
Open filename and redirect the standard input of the test program, whenever it runs, to come from that file.
--load plugin
Load the dynamic object plugin into bugpoint itself. This object should register new optimization passes. Once loaded, the object will add new command line options to enable various optimizations. To see the new complete list of optimizations, use the -help and --load options together; for example:
bugpoint --load myNewPass.so -help
--mlimit megabytes
Specifies an upper limit on memory usage of the optimization and codegen. Set to zero to disable the limit.
--output filename
Whenever the test program produces output on its standard output stream, it should match the contents of filename (the “reference output”). If you do not use this option, bugpoint will attempt to generate a reference output by compiling the program with the “safe” backend and running it.
--run-{int,jit,llc,custom}
Whenever the test program is compiled, bugpoint should generate code for it using the specified code generator. These options allow you to choose the interpreter, the JIT compiler, the static native code compiler, or a custom command (see --exec-command), respectively.
--safe-{llc,custom}
When debugging a code generator, bugpoint should use the specified code generator as the “safe” code generator. This is a known-good code generator used to generate the “reference output” if it has not been provided, and to compile portions of the program as they are excluded from the testcase. These options allow you to choose the static native code compiler or a custom command (see --exec-command), respectively. The interpreter and the JIT backends cannot currently be used as the “safe” backends.
--exec-command command
This option defines the command to use with the --run-custom and --safe-custom options to execute the bitcode testcase. This can be useful for cross-compilation.
--compile-command command
This option defines the command to use with the --compile-custom option to compile the bitcode testcase. This can be useful for testing compiler output without running any link or execute stages. To generate a reduced unit test, you may add CHECK directives to the testcase and pass the name of an executable compile-command script in this form:
#!/bin/sh
llc "$@"
not FileCheck [bugpoint input file].ll < bugpoint-test-program.s
This script will “fail” as long as FileCheck passes. So the result will be the minimum bitcode that passes FileCheck.
--safe-path path
This option defines the path to the command to execute with the --safe-{int,jit,llc,custom} option.
EXIT STATUS¶
If bugpoint succeeds in finding a problem, it will exit with 0. Otherwise, if an error occurs, it will exit with a non-zero value.
SEE ALSO¶
opt
llvm-extract - extract a function from an LLVM module¶
SYNOPSIS¶
llvm-extract [options] --func function-name [filename]
DESCRIPTION¶
The llvm-extract command takes the name of a function and extracts it from the specified LLVM bitcode file. It is primarily used as a debugging tool to reduce test cases from larger programs that are triggering a bug.
In addition to extracting the bitcode of the specified function, llvm-extract will also remove unreachable global variables, prototypes, and unused types.
The llvm-extract command reads its input from standard input if filename is omitted or if filename is -. The output is always written to standard output, unless the -o option is specified (see below).
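For example, to extract the function main as readable IR (the file names are illustrative):
$ llvm-extract -S --func main program.bc -o main.ll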
OPTIONS¶
-f
Enable binary output on terminals. Normally, llvm-extract will refuse to write raw bitcode output if the output stream is a terminal. With this option, llvm-extract will write raw bitcode regardless of the output device.
--func function-name
Extract the function named function-name from the LLVM bitcode. May be specified multiple times to extract multiple functions at once.
--rfunc function-regular-expr
Extract the function(s) matching function-regular-expr from the LLVM bitcode. All functions matching the regular expression will be extracted. May be specified multiple times.
--glob global-name
Extract the global variable named global-name from the LLVM bitcode. May be specified multiple times to extract multiple global variables at once.
--rglob glob-regular-expr
Extract the global variable(s) matching glob-regular-expr from the LLVM bitcode. All global variables matching the regular expression will be extracted. May be specified multiple times.
-help
Print a summary of command line options.
-o filename
Specify the output filename. If filename is “-” (the default), then llvm-extract sends its output to standard output.
-S
Write output in LLVM intermediate language (instead of bitcode).
EXIT STATUS¶
If llvm-extract succeeds, it will exit with 0. Otherwise, if an error occurs, it will exit with a non-zero value.
SEE ALSO¶
bugpoint
llvm-bcanalyzer - LLVM bitcode analyzer¶
SYNOPSIS¶
llvm-bcanalyzer [options] [filename]
DESCRIPTION¶
The llvm-bcanalyzer command is a small utility for analyzing bitcode files. The tool reads a bitcode file (such as generated with the llvm-as tool) and produces a statistical report on the contents of the bitcode file. The tool can also dump a low level but human readable version of the bitcode file. This tool is probably not of much interest or utility except for those working directly with the bitcode file format. Most LLVM users can just ignore this tool.
If filename is omitted or is -, then llvm-bcanalyzer reads its input from standard input. This is useful for combining the tool into a pipeline. Output is written to the standard output.
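For example, to get both the statistical report and a human readable dump of a bitcode file (the file name is illustrative):
$ llvm-bcanalyzer -dump program.bc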
OPTIONS¶
-nodetails¶
Causes llvm-bcanalyzer to abbreviate its output by writing out only a module level summary. The details for individual functions are not displayed.
-dump¶
Causes llvm-bcanalyzer to dump the bitcode in a human readable format. This format is significantly different from LLVM assembly and provides details about the encoding of the bitcode file.
-verify¶
Causes llvm-bcanalyzer to verify the module produced by reading the bitcode. This ensures that the statistics generated are based on a consistent module.
-help¶
Print a summary of command line options.
EXIT STATUS¶
If llvm-bcanalyzer succeeds, it will exit with 0. Otherwise, if an error occurs, it will exit with a non-zero value, usually 1.
SUMMARY OUTPUT DEFINITIONS¶
The following items are always printed by llvm-bcanalyzer. They comprise the summary output.
Bitcode Analysis Of Module
This just provides the name of the module for which bitcode analysis is being generated.
Bitcode Version Number
The bitcode version (not LLVM version) of the file read by the analyzer.
File Size
The size, in bytes, of the entire bitcode file.
Module Bytes
The size, in bytes, of the module block. Percentage is relative to File Size.
Function Bytes
The size, in bytes, of all the function blocks. Percentage is relative to File Size.
Global Types Bytes
The size, in bytes, of the Global Types Pool. Percentage is relative to File Size. This is the size of the definitions of all types in the bitcode file.
Constant Pool Bytes
The size, in bytes, of the Constant Pool Blocks. Percentage is relative to File Size.
Module Globals Bytes
The size, in bytes, of the Global Variable Definitions and their initializers. Percentage is relative to File Size.
Instruction List Bytes
The size, in bytes, of all the instruction lists in all the functions. Percentage is relative to File Size. Note that this value is also included in the Function Bytes.
Compaction Table Bytes
The size, in bytes, of all the compaction tables in all the functions. Percentage is relative to File Size. Note that this value is also included in the Function Bytes.
Symbol Table Bytes
The size, in bytes, of all the symbol tables in all the functions. Percentage is relative to File Size. Note that this value is also included in the Function Bytes.
Dependent Libraries Bytes
The size, in bytes, of the list of dependent libraries in the module. Percentage is relative to File Size. Note that this value is also included in the Module Global Bytes.
Number Of Bitcode Blocks
The total number of blocks of any kind in the bitcode file.
Number Of Functions
The total number of function definitions in the bitcode file.
Number Of Types
The total number of types defined in the Global Types Pool.
Number Of Constants
The total number of constants (of any type) defined in the Constant Pool.
Number Of Basic Blocks
The total number of basic blocks defined in all functions in the bitcode file.
Number Of Instructions
The total number of instructions defined in all functions in the bitcode file.
Number Of Long Instructions
The total number of long instructions defined in all functions in the bitcode file. Long instructions are those taking greater than 4 bytes. Typically long instructions are GetElementPtr with several indices, PHI nodes, and calls to functions with large numbers of arguments.
Number Of Operands
The total number of operands used in all instructions in the bitcode file.
Number Of Compaction Tables
The total number of compaction tables in all functions in the bitcode file.
Number Of Symbol Tables
The total number of symbol tables in all functions in the bitcode file.
Number Of Dependent Libs
The total number of dependent libraries found in the bitcode file.
Total Instruction Size
The total size of the instructions in all functions in the bitcode file.
Average Instruction Size
The average number of bytes per instruction across all functions in the bitcode file. This value is computed by dividing Total Instruction Size by Number Of Instructions.
Maximum Type Slot Number
The maximum value used for a type’s slot number. Larger slot number values take more bytes to encode.
Maximum Value Slot Number
The maximum value used for a value’s slot number. Larger slot number values take more bytes to encode.
Bytes Per Value
The average size of a Value definition (of any type). This is computed by dividing File Size by the total number of values of any type.
Bytes Per Global
The average size of a global definition (constants and global variables).
Bytes Per Function
The average number of bytes per function definition. This is computed by dividing Function Bytes by Number Of Functions.
# of VBR 32-bit Integers
The total number of 32-bit integers encoded using the Variable Bit Rate encoding scheme.
# of VBR 64-bit Integers
The total number of 64-bit integers encoded using the Variable Bit Rate encoding scheme.
# of VBR Compressed Bytes
The total number of bytes consumed by the 32-bit and 64-bit integers that use the Variable Bit Rate encoding scheme.
# of VBR Expanded Bytes
The total number of bytes that would have been consumed by the 32-bit and 64-bit integers had they not been compressed with the Variable Bit Rate encoding scheme.
Bytes Saved With VBR
The total number of bytes saved by using the Variable Bit Rate encoding scheme. The percentage is relative to # of VBR Expanded Bytes.
DETAILED OUTPUT DEFINITIONS¶
The following definitions occur only if the -nodetails option was not given. The detailed output provides additional information on a per-function basis.
Type
The type signature of the function.
Byte Size
The total number of bytes in the function’s block.
Basic Blocks
The number of basic blocks defined by the function.
Instructions
The number of instructions defined by the function.
Long Instructions
The number of instructions using the long instruction format in the function.
Operands
The number of operands used by all instructions in the function.
Instruction Size
The number of bytes consumed by instructions in the function.
Average Instruction Size
The average number of bytes consumed by the instructions in the function. This value is computed by dividing Instruction Size by Instructions.
Bytes Per Instruction
The average number of bytes used by the function per instruction. This value is computed by dividing Byte Size by Instructions. Note that this is not the same as Average Instruction Size. It computes a number relative to the total function size, not just the size of the instruction list.
Number of VBR 32-bit Integers
The total number of 32-bit integers found in this function (for any use).
Number of VBR 64-bit Integers
The total number of 64-bit integers found in this function (for any use).
Number of VBR Compressed Bytes
The total number of bytes in this function consumed by the 32-bit and 64-bit integers that use the Variable Bit Rate encoding scheme.
Number of VBR Expanded Bytes
The total number of bytes in this function that would have been consumed by the 32-bit and 64-bit integers had they not been compressed with the Variable Bit Rate encoding scheme.
Bytes Saved With VBR
The total number of bytes saved in this function by using the Variable Bit Rate encoding scheme. The percentage is relative to # of VBR Expanded Bytes.
Developer Tools¶
FileCheck - Flexible pattern matching file verifier¶
SYNOPSIS¶
FileCheck match-filename [--check-prefix=XXX] [--strict-whitespace]
DESCRIPTION¶
FileCheck reads two files (one from standard input, and one specified on the command line) and uses one to verify the other. This behavior is particularly useful for the testsuite, which wants to verify that the output of some tool (e.g. llc) contains the expected information (for example, a movsd from esp or whatever is interesting). This is similar to using grep, but it is optimized for matching multiple different inputs in one file in a specific order.
The match-filename file specifies the file that contains the patterns to match. The file to verify is read from standard input unless the --input-file option is used.
OPTIONS¶
-help¶
Print a summary of command line options.
--check-prefix prefix¶
FileCheck searches the contents of match-filename for patterns to match. By default, these patterns are prefixed with “CHECK:”. If you’d like to use a different prefix (e.g. because the same input file is checking multiple different tools or options), the --check-prefix argument allows you to specify one or more prefixes to match. Multiple prefixes are useful for tests which might change for different run options, but most lines remain the same.
--input-file filename¶
File to check (defaults to stdin).
--strict-whitespace¶
By default, FileCheck canonicalizes input horizontal whitespace (spaces and tabs) which causes it to ignore these differences (a space will match a tab). The --strict-whitespace argument disables this behavior. End-of-line sequences are canonicalized to UNIX-style \n in all modes.
--implicit-check-not check-pattern¶
Adds implicit negative checks for the specified patterns between positive checks. The option allows writing stricter tests without stuffing them with CHECK-NOTs.
For example, “--implicit-check-not warning:” can be useful when testing diagnostic messages from tools that don’t have an option similar to clang -verify. With this option FileCheck will verify that input does not contain warnings not covered by any CHECK: patterns.
-version¶
Show the version number of this program.
EXIT STATUS¶
If FileCheck verifies that the file matches the expected contents, it exits with 0. Otherwise, or if an error occurs, it will exit with a non-zero value.
TUTORIAL¶
FileCheck is typically used from LLVM regression tests, being invoked on the RUN line of the test. A simple example of using FileCheck from a RUN line looks like this:
; RUN: llvm-as < %s | llc -march=x86-64 | FileCheck %s
This syntax says to pipe the current file (“%s”) into llvm-as, pipe that into llc, then pipe the output of llc into FileCheck. This means that FileCheck will be verifying its standard input (the llc output) against the filename argument specified (the original .ll file specified by “%s”). To see how this works, let’s look at the rest of the .ll file (after the RUN line):
define void @sub1(i32* %p, i32 %v) {
entry:
; CHECK: sub1:
; CHECK: subl
%0 = tail call i32 @llvm.atomic.load.sub.i32.p0i32(i32* %p, i32 %v)
ret void
}
define void @inc4(i64* %p) {
entry:
; CHECK: inc4:
; CHECK: incq
%0 = tail call i64 @llvm.atomic.load.add.i64.p0i64(i64* %p, i64 1)
ret void
}
Here you can see some “CHECK:” lines specified in comments. Now you can see how the file is piped into llvm-as, then llc, and the machine code output is what we are verifying. FileCheck checks the machine code output to verify that it matches what the “CHECK:” lines specify.
The syntax of the “CHECK:” lines is very simple: they are fixed strings that must occur in order. FileCheck defaults to ignoring horizontal whitespace differences (e.g. a space is allowed to match a tab) but otherwise, the contents of the “CHECK:” line must match something in the test file exactly.
One nice thing about FileCheck (compared to grep) is that it allows merging test cases together into logical groups. For example, because the test above is checking for the “sub1:” and “inc4:” labels, it will not match unless there is a “subl” in between those labels. If it existed somewhere else in the file, that would not count: “grep subl” matches if “subl” exists anywhere in the file.
The FileCheck -check-prefix option¶
The FileCheck -check-prefix option allows multiple test configurations to be driven from one .ll file. This is useful in many circumstances, for example, testing different architectural variants with llc. Here’s a simple example:
; RUN: llvm-as < %s | llc -mtriple=i686-apple-darwin9 -mattr=sse41 \
; RUN: | FileCheck %s -check-prefix=X32
; RUN: llvm-as < %s | llc -mtriple=x86_64-apple-darwin9 -mattr=sse41 \
; RUN: | FileCheck %s -check-prefix=X64
define <4 x i32> @pinsrd_1(i32 %s, <4 x i32> %tmp) nounwind {
%tmp1 = insertelement <4 x i32> %tmp, i32 %s, i32 1
ret <4 x i32> %tmp1
; X32: pinsrd_1:
; X32: pinsrd $1, 4(%esp), %xmm0
; X64: pinsrd_1:
; X64: pinsrd $1, %edi, %xmm0
}
In this case, we’re testing that we get the expected code generation with both 32-bit and 64-bit code generation.
The “CHECK-NEXT:” directive¶
Sometimes you want to match lines and would like to verify that matches happen on exactly consecutive lines with no other lines in between them. In this case, you can use “CHECK:” and “CHECK-NEXT:” directives to specify this. If you specified a custom check prefix, just use “<PREFIX>-NEXT:”.
For example, something like this works as you’d expect:
define void @t2(<2 x double>* %r, <2 x double>* %A, double %B) {
%tmp3 = load <2 x double>* %A, align 16
%tmp7 = insertelement <2 x double> undef, double %B, i32 0
%tmp9 = shufflevector <2 x double> %tmp3,
<2 x double> %tmp7,
<2 x i32> < i32 0, i32 2 >
store <2 x double> %tmp9, <2 x double>* %r, align 16
ret void
; CHECK: t2:
; CHECK: movl 8(%esp), %eax
; CHECK-NEXT: movapd (%eax), %xmm0
; CHECK-NEXT: movhpd 12(%esp), %xmm0
; CHECK-NEXT: movl 4(%esp), %eax
; CHECK-NEXT: movapd %xmm0, (%eax)
; CHECK-NEXT: ret
}
“CHECK-NEXT:” directives reject the input unless there is exactly one newline between it and the previous directive. A “CHECK-NEXT:” cannot be the first directive in a file.
The “CHECK-SAME:” directive¶
Sometimes you want to match lines and would like to verify that matches happen on the same line as the previous match. In this case, you can use “CHECK:” and “CHECK-SAME:” directives to specify this. If you specified a custom check prefix, just use “<PREFIX>-SAME:”.
“CHECK-SAME:” is particularly powerful in conjunction with “CHECK-NOT:” (described below).
For example, the following works like you’d expect:
!0 = !DILocation(line: 5, scope: !1, inlinedAt: !2)
; CHECK: !DILocation(line: 5,
; CHECK-NOT: column:
; CHECK-SAME: scope: ![[SCOPE:[0-9]+]]
“CHECK-SAME:” directives reject the input if there are any newlines between it and the previous directive. A “CHECK-SAME:” cannot be the first directive in a file.
The “CHECK-NOT:” directive¶
The “CHECK-NOT:” directive is used to verify that a string doesn’t occur between two matches (or before the first match, or after the last match). For example, to verify that a load is removed by a transformation, a test like this can be used:
define i8 @coerce_offset0(i32 %V, i32* %P) {
store i32 %V, i32* %P
%P2 = bitcast i32* %P to i8*
%P3 = getelementptr i8* %P2, i32 2
%A = load i8* %P3
ret i8 %A
; CHECK: @coerce_offset0
; CHECK-NOT: load
; CHECK: ret i8
}
The “CHECK-DAG:” directive¶
If it’s necessary to match strings that don’t occur in a strictly sequential order, “CHECK-DAG:” could be used to verify them between two matches (or before the first match, or after the last match). For example, clang emits vtable globals in reverse order. Using CHECK-DAG:, we can keep the checks in the natural order:
// RUN: %clang_cc1 %s -emit-llvm -o - | FileCheck %s
struct Foo { virtual void method(); };
Foo f; // emit vtable
// CHECK-DAG: @_ZTV3Foo =
struct Bar { virtual void method(); };
Bar b;
// CHECK-DAG: @_ZTV3Bar =
CHECK-NOT: directives could be mixed with CHECK-DAG: directives to exclude strings between the surrounding CHECK-DAG: directives. As a result, the surrounding CHECK-DAG: directives cannot be reordered, i.e. all occurrences matching CHECK-DAG: before CHECK-NOT: must not fall behind occurrences matching CHECK-DAG: after CHECK-NOT:. For example,
; CHECK-DAG: BEFORE
; CHECK-NOT: NOT
; CHECK-DAG: AFTER
This case will reject input strings where BEFORE occurs after AFTER.
With captured variables, CHECK-DAG: is able to match valid topological orderings of a DAG with edges from the definition of a variable to its use. It’s useful, e.g., when your test cases need to match different output sequences from the instruction scheduler. For example,
; CHECK-DAG: add [[REG1:r[0-9]+]], r1, r2
; CHECK-DAG: add [[REG2:r[0-9]+]], r3, r4
; CHECK: mul r5, [[REG1]], [[REG2]]
In this case, either order of the two add instructions will be allowed.
If you are defining and using variables in the same CHECK-DAG: block, be aware that the definition rule can match after its use.
So, for instance, the code below will pass:
; CHECK-DAG: vmov.32 [[REG2:d[0-9]+]][0]
; CHECK-DAG: vmov.32 [[REG2]][1]
vmov.32 d0[1]
vmov.32 d0[0]
While this other code will not:
; CHECK-DAG: vmov.32 [[REG2:d[0-9]+]][0]
; CHECK-DAG: vmov.32 [[REG2]][1]
vmov.32 d1[1]
vmov.32 d0[0]
While this can be very useful, it’s also dangerous, because in the case of a register sequence, you must have a strong order (read before write, copy before use, etc.). If the definition your test is looking for doesn’t match (because of a bug in the compiler), it may match further away from the use, and mask real bugs away.
In those cases, to enforce the order, use a non-DAG directive between DAG-blocks.
The “CHECK-LABEL:” directive¶
Sometimes in a file containing multiple tests divided into logical blocks, one or more CHECK: directives may inadvertently succeed by matching lines in a later block. While an error will usually eventually be generated, the check flagged as causing the error may not actually bear any relationship to the actual source of the problem.
In order to produce better error messages in these cases, the “CHECK-LABEL:” directive can be used. It is treated identically to a normal CHECK directive except that FileCheck makes an additional assumption that a line matched by the directive cannot also be matched by any other check present in match-filename; this is intended to be used for lines containing labels or other unique identifiers. Conceptually, the presence of CHECK-LABEL divides the input stream into separate blocks, each of which is processed independently, preventing a CHECK: directive in one block matching a line in another block.
For example,
define %struct.C* @C_ctor_base(%struct.C* %this, i32 %x) {
entry:
; CHECK-LABEL: C_ctor_base:
; CHECK: mov [[SAVETHIS:r[0-9]+]], r0
; CHECK: bl A_ctor_base
; CHECK: mov r0, [[SAVETHIS]]
%0 = bitcast %struct.C* %this to %struct.A*
%call = tail call %struct.A* @A_ctor_base(%struct.A* %0)
%1 = bitcast %struct.C* %this to %struct.B*
%call2 = tail call %struct.B* @B_ctor_base(%struct.B* %1, i32 %x)
ret %struct.C* %this
}
define %struct.D* @D_ctor_base(%struct.D* %this, i32 %x) {
entry:
; CHECK-LABEL: D_ctor_base:
The use of CHECK-LABEL: directives in this case ensures that the three CHECK: directives only accept lines corresponding to the body of the @C_ctor_base function, even if the patterns match lines found later in the file. Furthermore, if one of these three CHECK: directives fails, FileCheck will recover by continuing to the next block, allowing multiple test failures to be detected in a single invocation.
There is no requirement that CHECK-LABEL: directives contain strings that correspond to actual syntactic labels in a source or output language: they must simply uniquely match a single line in the file being verified.
CHECK-LABEL: directives cannot contain variable definitions or uses.
FileCheck Pattern Matching Syntax¶
All FileCheck directives take a pattern to match.
For most uses of FileCheck, fixed string matching is perfectly sufficient. For some things, a more flexible form of matching is desired. To support this, FileCheck allows you to specify regular expressions in matching strings, surrounded by double braces: {{yourregex}}. Because we want to use fixed string matching for a majority of what we do, FileCheck has been designed to support mixing and matching fixed string matching with regular expressions. This allows you to write things like this:
This allows you to write things like this:
; CHECK: movhpd {{[0-9]+}}(%esp), {{%xmm[0-7]}}
In this case, any offset from the ESP register will be allowed, and any xmm register will be allowed.
Because regular expressions are enclosed with double braces, they are visually distinct, and you don’t need to use escape characters within the double braces like you would in C. In the rare case that you want to match double braces explicitly from the input, you can use something ugly like {{[{][{]}} as your pattern.
FileCheck Variables¶
It is often useful to match a pattern and then verify that it occurs again later in the file. For codegen tests, this can be useful to allow any register, but verify that that register is used consistently later. To do this, FileCheck allows named variables to be defined and substituted into patterns. Here is a simple example:
; CHECK: test5:
; CHECK: notw [[REGISTER:%[a-z]+]]
; CHECK: andw {{.*}}[[REGISTER]]
The first check line matches a regex %[a-z]+ and captures it into the variable REGISTER. The second line verifies that whatever is in REGISTER occurs later in the file after an “andw”. FileCheck variable references are always contained in [[ ]] pairs, and their names can be formed with the regex [a-zA-Z][a-zA-Z0-9]*. If a colon follows the name, then it is a definition of the variable; otherwise, it is a use.
FileCheck variables can be defined multiple times, and uses always get the latest value. Variables can also be used later on the same line they were defined on. For example:
; CHECK: op [[REG:r[0-9]+]], [[REG]]
This can be useful if you want the operands of op to be the same register, and don’t care exactly which register it is.
FileCheck Expressions¶
Sometimes there’s a need to verify output which refers to line numbers of the match file, e.g. when testing compiler diagnostics. This introduces a certain fragility of the match file structure, as “CHECK:” lines contain absolute line numbers in the same file, which have to be updated whenever line numbers change due to text addition or deletion.
To support this case, FileCheck allows using [[@LINE]], [[@LINE+<offset>]], and [[@LINE-<offset>]] expressions in patterns. These expressions expand to the number of the line where a pattern is located (with an optional integer offset).
This way match patterns can be put near the relevant test lines and include relative line number references, for example:
// CHECK: test.cpp:[[@LINE+4]]:6: error: expected ';' after top level declarator
// CHECK-NEXT: {{^int a}}
// CHECK-NEXT: {{^ \^}}
// CHECK-NEXT: {{^ ;}}
int a
tblgen - Target Description To C++ Code Generator¶
SYNOPSIS¶
tblgen [options] [filename]
DESCRIPTION¶
tblgen translates from target description (.td) files into C++ code that can be included in the definition of an LLVM target library. Most users of LLVM will not need to use this program. It is only for assisting with writing an LLVM target backend.
The input and output of tblgen is beyond the scope of this short introduction; please see the introduction to TableGen.
The filename argument specifies the name of a Target Description (.td) file to read as input.
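For example, using the options described below, generating register information for a hypothetical backend might look like this (all paths and names are illustrative):
$ tblgen -gen-register-info -I include lib/Target/Foo/Foo.td -o FooGenRegisterInfo.inc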
OPTIONS¶
-help¶
Print a summary of command line options.
-o filename¶
Specify the output file name. If filename is -, then tblgen sends its output to standard output.
-I directory¶
Specify where to find other target description files for inclusion. The directory value should be a full or partial path to a directory that contains target description files.
-asmparsernum N¶
Make -gen-asm-parser emit assembly parser number N.
-asmwriternum N¶
Make -gen-asm-writer emit assembly writer number N.
-class className¶
Print the enumeration list for this class.
-print-records¶
Print all records to standard output (default).
-print-enums¶
Print enumeration values for a class.
-print-sets¶
Print expanded sets for testing DAG exprs.
-gen-emitter¶
Generate machine code emitter.
-gen-register-info¶
Generate registers and register classes info.
-gen-instr-info¶
Generate instruction descriptions.
-gen-asm-writer¶
Generate the assembly writer.
-gen-disassembler¶
Generate disassembler.
-gen-pseudo-lowering¶
Generate pseudo instruction lowering.
-gen-dag-isel¶
Generate a DAG (Directed Acyclic Graph) instruction selector.
-gen-asm-matcher¶
Generate assembly instruction matcher.
-gen-dfa-packetizer¶
Generate DFA Packetizer for VLIW targets.
-gen-fast-isel¶
Generate a “fast” instruction selector.
-gen-subtarget¶
Generate subtarget enumerations.
-gen-intrinsic¶
Generate intrinsic information.
-gen-tgt-intrinsic¶
Generate target intrinsic information.
-gen-enhanced-disassembly-info¶
Generate enhanced disassembly info.
-version¶
Show the version number of this program.
EXIT STATUS¶
If tblgen succeeds, it will exit with 0. Otherwise, if an error occurs, it will exit with a non-zero value.
lit - LLVM Integrated Tester¶
SYNOPSIS¶
lit [options] [tests]
DESCRIPTION¶
lit is a portable tool for executing LLVM and Clang style test suites, summarizing their results, and providing indication of failures. lit is designed to be a lightweight testing tool with as simple a user interface as possible.
lit should be run with one or more tests to run specified on the command line. Tests can be either individual test files or directories to search for tests (see TEST DISCOVERY).
Each specified test will be executed (potentially in parallel) and once all tests have been run lit will print summary information on the number of tests which passed or failed (see TEST STATUS RESULTS). The lit program will execute with a non-zero exit code if any tests fail.
By default lit will use a succinct progress display and will only print summary information for test failures. See OUTPUT OPTIONS for options controlling the lit progress display and output.
lit also includes a number of options for controlling how tests are executed (specific features may depend on the particular test format). See EXECUTION OPTIONS for more information.
Finally, lit also supports additional options for only running a subset of the options specified on the command line, see SELECTION OPTIONS for more information.
Users interested in the lit architecture or designing a lit testing implementation should see LIT INFRASTRUCTURE.
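For example, to run the tests under one directory on 8 threads with verbose failure output (the path and thread count are illustrative):
$ lit -j 8 -v test/CodeGen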
GENERAL OPTIONS¶
-h, --help¶
Show the lit help message.
-j N, --threads=N¶
Run N tests in parallel. By default, this is automatically chosen to match the number of detected available CPUs.
--config-prefix=NAME¶
Search for NAME.cfg and NAME.site.cfg when searching for test suites, instead of lit.cfg and lit.site.cfg.
-D NAME, -D NAME=VALUE, --param NAME, --param NAME=VALUE¶
Add a user defined parameter NAME with the given VALUE (or the empty string if not given). The meaning and use of these parameters is test suite dependent.
OUTPUT OPTIONS¶
-q, --quiet¶
Suppress any output except for test failures.
-s, --succinct¶
Show less output, for example don’t show information on tests that pass.
-v, --verbose¶
Show more information on test failures, for example the entire test output instead of just the test result.
--no-progress-bar¶
Do not use curses based progress bar.
--show-unsupported¶
Show the names of unsupported tests.
--show-xfail¶
Show the names of tests that were expected to fail.
EXECUTION OPTIONS¶
--path=PATH¶
Specify an additional PATH to use when searching for executables in tests.
--vg¶
Run individual tests under valgrind (using the memcheck tool). The --error-exitcode argument for valgrind is used so that valgrind failures will cause the program to exit with a non-zero status.
When this option is enabled, lit will also automatically provide a “valgrind” feature that can be used to conditionally disable (or expect failure in) certain tests.
--vg-leak¶
When --vg is used, enable memory leak checks. When this option is enabled, lit will also automatically provide a “vg_leak” feature that can be used to conditionally disable (or expect failure in) certain tests.
--time-tests¶
Track the wall time individual tests take to execute and include the results in the summary output. This is useful for determining which tests in a test suite take the most time to execute. Note that this option is most useful with -j 1.
SELECTION OPTIONS¶
--max-tests=N¶
Run at most N tests and then terminate.
--max-time=N¶
Spend at most N seconds (approximately) running tests and then terminate.
--shuffle¶
Run the tests in a random order.
ADDITIONAL OPTIONS¶
--debug¶
Run lit in debug mode, for debugging configuration issues and lit itself.
--show-suites¶
List the discovered test suites and exit.
--show-tests¶
List all of the discovered tests and exit.
EXIT STATUS¶
lit will exit with an exit code of 1 if there are any FAIL or XPASS results. Otherwise, it will exit with the status 0. Other exit codes are used for non-test related failures (for example a user error or an internal program error).
TEST DISCOVERY¶
The inputs passed to lit can be either individual tests, or entire directories or hierarchies of tests to run. When lit starts up, the first thing it does is convert the inputs into a complete list of tests to run as part of test discovery.
In the lit model, every test must exist inside some test suite.
lit resolves the inputs specified on the command line to test suites by searching upwards from the input path until it finds a lit.cfg or lit.site.cfg file. These files serve as both a marker of test suites and as configuration files which lit loads in order to understand how to find and run the tests inside the test suite.
Once lit has mapped the inputs into test suites it traverses the list of inputs adding tests for individual files and recursively searching for tests in directories.
This behavior makes it easy to specify a subset of tests to run, while still allowing the test suite configuration to control exactly how tests are interpreted. In addition, lit always identifies tests by the test suite they are in, and their relative path inside the test suite. For appropriately configured projects, this allows lit to provide convenient and flexible support for out-of-tree builds.
TEST STATUS RESULTS¶
Each test ultimately produces one of the following six results:
PASS
The test succeeded.
XFAIL
The test failed, but that is expected. This is used for test formats which allow specifying that a test does not currently work, but wish to leave it in the test suite.
XPASS
The test succeeded, but it was expected to fail. This is used for tests which were specified as expected to fail, but are now succeeding (generally because the feature they test was broken and has been fixed).
FAIL
The test failed.
UNRESOLVED
The test result could not be determined. For example, this occurs when the test could not be run, the test itself is invalid, or the test was interrupted.
UNSUPPORTED
The test is not supported in this environment. This is used by test formats which can report unsupported tests.
Depending on the test format tests may produce additional information about their status (generally only for failures). See the OUTPUT OPTIONS section for more information.
LIT INFRASTRUCTURE¶
This section describes the lit testing architecture for users interested in creating a new lit testing implementation, or extending an existing one.
lit proper is primarily an infrastructure for discovering and running arbitrary tests, and to expose a single convenient interface to these tests. lit itself doesn’t know how to run tests, rather this logic is defined by test suites.
TEST SUITES¶
As described in TEST DISCOVERY, tests are always located inside a test suite. Test suites serve to define the format of the tests they contain, the logic for finding those tests, and any additional information to run the tests.
lit identifies test suites as directories containing lit.cfg or lit.site.cfg files (see also --config-prefix). Test suites are initially discovered by recursively searching up the directory hierarchy for all the input files passed on the command line. You can use --show-suites to display the discovered test suites at startup.
Once a test suite is discovered, its config file is loaded. Config files themselves are Python modules which will be executed. When the config file is executed, two important global variables are predefined:
lit_config
The global lit configuration object (a LitConfig instance), which defines the builtin test formats, global configuration parameters, and other helper routines for implementing test configurations.
config
This is the config object (a TestingConfig instance) for the test suite, which the config file is expected to populate. The following variables are also available on the config object, some of which must be set by the config and others are optional or predefined:
name [required] The name of the test suite, for use in reports and diagnostics.
test_format [required] The test format object which will be used to discover and run tests in the test suite. Generally this will be a builtin test format available from the lit.formats module.
test_source_root The filesystem path to the test suite root. For out-of-dir builds this is the directory that will be scanned for tests.
test_exec_root For out-of-dir builds, the path to the test suite root inside the object directory. This is where tests will be run and temporary output files placed.
environment A dictionary representing the environment to use when executing tests in the suite.
suffixes For lit test formats which scan directories for tests, this variable is a list of suffixes to identify test files. Used by: ShTest.
substitutions For lit test formats which substitute variables into a test script, the list of substitutions to perform. Used by: ShTest.
unsupported Mark an unsupported directory, all tests within it will be reported as unsupported. Used by: ShTest.
parent The parent configuration, this is the config object for the directory containing the test suite, or None.
root The root configuration. This is the top-most lit configuration in the project.
pipefail Normally a test using a shell pipe fails if any of the commands on the pipe fail. If this is not desired, setting this variable to false makes the test fail only if the last command in the pipe fails.
TEST DISCOVERY¶
Once test suites are located, lit recursively traverses the source directory (following test_source_root) looking for tests. When lit enters a sub-directory, it first checks to see if a nested test suite is defined in that directory. If so, it loads that test suite recursively, otherwise it instantiates a local test config for the directory (see LOCAL CONFIGURATION FILES).
Tests are identified by the test suite they are contained within, and the relative path inside that suite. Note that the relative path may not refer to an actual file on disk; some test formats (such as GoogleTest) define “virtual tests” which have a path that contains both the path to the actual test file and a subpath to identify the virtual test.
LOCAL CONFIGURATION FILES¶
When lit loads a subdirectory in a test suite, it instantiates a local test configuration by cloning the configuration for the parent directory — the root of this configuration chain will always be a test suite. Once the test configuration is cloned lit checks for a lit.local.cfg file in the subdirectory. If present, this file will be loaded and can be used to specialize the configuration for each individual directory. This facility can be used to define subdirectories of optional tests, or to change other configuration parameters — for example, to change the test format, or the suffixes which identify test files.
TEST RUN OUTPUT FORMAT¶
The lit output for a test run conforms to the following schema, in both short and verbose modes (although in short mode no PASS lines will be shown). This schema has been chosen to be relatively easy to reliably parse by a machine (for example in buildbot log scraping), and for other tools to generate.
Each test result is expected to appear on a line that matches:
<result code>: <test name> (<progress info>)
where <result code> is a standard test result such as PASS, FAIL, XFAIL, XPASS, UNRESOLVED, or UNSUPPORTED. The performance result codes of IMPROVED and REGRESSED are also allowed.
The <test name> field can consist of an arbitrary string containing no newline.
The <progress info> field can be used to report progress information such as (1/300) or can be empty, but even when empty the parentheses are required.
Each test result may include additional (multiline) log information in the following format:
<log delineator> TEST '(<test name>)' <trailing delineator>
... log message ...
<log delineator>
where <test name> should be the name of a preceding reported test, <log delineator> is a string of “*” characters at least four characters long (the recommended length is 20), and <trailing delineator> is an arbitrary (unparsed) string.
The following is an example of a test run output which consists of four tests A, B, C, and D, and a log message for the failing test C:
PASS: A (1 of 4)
PASS: B (2 of 4)
FAIL: C (3 of 4)
******************** TEST 'C' FAILED ********************
Test 'C' failed as a result of exit code 1.
********************
PASS: D (4 of 4)
LIT EXAMPLE TESTS¶
The lit distribution contains several example implementations of test suites in the ExampleTests directory.
SEE ALSO¶
valgrind(1)
llvm-build - LLVM Project Build Utility¶
SYNOPSIS¶
llvm-build [options]
DESCRIPTION¶
llvm-build is a tool for working with LLVM projects that use the LLVMBuild system for describing their components.
At heart, llvm-build is responsible for loading, verifying, and manipulating the project’s component data. The tool is primarily designed for use in implementing build systems and tools which need access to the project structure information.
OPTIONS¶
-h, –help
Print the builtin program help.
–source-root=PATH
If given, load the project at the given source root path. If this option is not given, the location of the project sources will be inferred from the location of the llvm-build script itself.
–print-tree
Print the component tree for the project.
–write-library-table
Write out the C++ fragment which defines the components, library names, and required libraries. This C++ fragment is built into llvm-config|llvm-config in order to provide clients with the list of required libraries for arbitrary component combinations.
–write-llvmbuild
Write out new LLVMBuild.txt files based on the loaded components. This is useful for auto-upgrading the schema of the files. llvm-build will try to a limited extent to preserve the comments which were written in the original source file, although at this time it only preserves block comments that precede the section names in the LLVMBuild files.
–write-cmake-fragment
Write out the LLVMBuild in the form of a CMake fragment, so it can easily be consumed by the CMake based build system. The exact contents and format of this file are closely tied to how LLVMBuild is integrated with CMake, see LLVM’s top-level CMakeLists.txt.
–write-make-fragment
Write out the LLVMBuild in the form of a Makefile fragment, so it can easily be consumed by a Make based build system. The exact contents and format of this file are closely tied to how LLVMBuild is integrated with the Makefiles, see LLVM’s Makefile.rules.
–llvmbuild-source-root=PATH
If given, expect the LLVMBuild files for the project to be rooted at the given path, instead of inside the source tree itself. This option is primarily designed for use in conjunction with --write-llvmbuild to test changes to the LLVMBuild schema.
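For example, to verify a checkout's component data and dump its component tree (the path is a placeholder):
% llvm-build --source-root=/path/to/llvm --print-tree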
EXIT STATUS¶
llvm-build exits with 0 if the operation was successful. Otherwise, it exits with a non-zero value.
llvm-readobj - LLVM Object Reader¶
SYNOPSIS¶
llvm-readobj [options] [input...]
DESCRIPTION¶
The llvm-readobj tool displays low-level format-specific information about one or more object files. The tool and its output are primarily designed for use in FileCheck-based tests.
OPTIONS¶
If input is “-” or omitted, llvm-readobj reads from standard input. Otherwise, it will read from the specified filenames.
-help
Print a summary of command line options.
-version
Display the version of this program.
-file-headers, -h
Display file headers.
-sections, -s
Display all sections.
-section-data, -sd
When used with -sections, display section data for each section shown.
-section-relocations, -sr
When used with -sections, display relocations for each section shown.
-section-symbols, -st
When used with -sections, display symbols for each section shown.
-relocations, -r
Display the relocation entries in the file.
-symbols, -t
Display the symbol table.
-dyn-symbols
Display the dynamic symbol table (only for ELF object files).
-unwind, -u
Display unwind information.
-expand-relocs
When used with -relocations, display each relocation in an expanded multi-line format.
-dynamic-table
Display the ELF .dynamic section table (only for ELF object files).
-needed-libs
Display the needed libraries (only for ELF object files).
-program-headers
Display the ELF program headers (only for ELF object files).
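For example, to display the file headers and all sections of an object file, with relocations shown per section (the file name is a placeholder):
% llvm-readobj -file-headers -sections -section-relocations foo.o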
EXIT STATUS¶
llvm-readobj returns 0.
Getting Started with the LLVM System¶
- Overview
- Getting Started Quickly (A Summary)
- Requirements
- Getting Started with LLVM
- Program Layout
- An Example Using the LLVM Tool Chain
- Common Problems
- Links
Overview¶
Welcome to LLVM! In order to get started, you first need to know some basic information.
First, LLVM comes in three pieces. The first piece is the LLVM suite. This contains all of the tools, libraries, and header files needed to use LLVM. It contains an assembler, disassembler, bitcode analyzer and bitcode optimizer. It also contains basic regression tests that can be used to test the LLVM tools and the Clang front end.
The second piece is the Clang front end. This component compiles C, C++, Objective C, and Objective C++ code into LLVM bitcode. Once compiled into LLVM bitcode, a program can be manipulated with the LLVM tools from the LLVM suite.
There is a third, optional piece called Test Suite. It is a suite of programs with a testing harness that can be used to further test LLVM’s functionality and performance.
Getting Started Quickly (A Summary)¶
The LLVM Getting Started documentation may be out of date. So, the Clang Getting Started page might also be a good place to start.
Here’s the short story for getting up and running quickly with LLVM:
Read the documentation.
Read the documentation.
Remember that you were warned twice about reading the documentation.
Checkout LLVM:
cd where-you-want-llvm-to-live
svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
Checkout Clang:
cd where-you-want-llvm-to-live
cd llvm/tools
svn co http://llvm.org/svn/llvm-project/cfe/trunk clang
Checkout Compiler-RT:
cd where-you-want-llvm-to-live
cd llvm/projects
svn co http://llvm.org/svn/llvm-project/compiler-rt/trunk compiler-rt
Get the Test Suite Source Code [Optional]
cd where-you-want-llvm-to-live
cd llvm/projects
svn co http://llvm.org/svn/llvm-project/test-suite/trunk test-suite
Configure and build LLVM and Clang:
The usual build uses CMake. If you would rather use autotools, see Building LLVM with autotools.
cd where you want to build llvm
mkdir build
cd build
cmake -G <generator> [options] <path to llvm sources>
Some common generators are:
- Unix Makefiles — for generating make-compatible parallel makefiles.
- Ninja — for generating Ninja (http://martine.github.io/ninja/) build files.
- Visual Studio — for generating Visual Studio projects and solutions.
- Xcode — for generating Xcode projects.
Some common options:
- -DCMAKE_INSTALL_PREFIX=directory — Specify for directory the full pathname of where you want the LLVM tools and libraries to be installed (default /usr/local).
- -DCMAKE_BUILD_TYPE=type — Valid options for type are Debug, Release, RelWithDebInfo, and MinSizeRel. Default is Debug.
- -DLLVM_ENABLE_ASSERTIONS=On — Compile with assertion checks enabled (default is Yes for Debug builds, No for all other build types).
Run your build tool of choice!
- The default target (i.e. make) will build all of LLVM.
- The check-all target (i.e. make check-all) will run the regression tests to ensure everything is in working order.
- CMake will generate build targets for each tool and library, and most LLVM sub-projects generate their own check-<project> target.
For more information see CMake
If you get an “internal compiler error (ICE)” or test failures, see below.
Consult the Getting Started with LLVM section for detailed information on configuring and compiling LLVM. See Setting Up Your Environment for tips that simplify working with the Clang front end and LLVM tools. Go to Program Layout to learn about the layout of the source code tree.
Requirements¶
Before you begin to use the LLVM system, review the requirements given below. This may save you some trouble by knowing ahead of time what hardware and software you will need.
Hardware¶
LLVM is known to work on the following host platforms:
OS | Arch | Compilers |
---|---|---|
Linux | x86 (1) | GCC, Clang |
Linux | amd64 | GCC, Clang |
Linux | ARM (4) | GCC, Clang |
Linux | PowerPC | GCC, Clang |
Solaris | V9 (Ultrasparc) | GCC |
FreeBSD | x86 (1) | GCC, Clang |
FreeBSD | amd64 | GCC, Clang |
Mac OS X (2) | PowerPC | GCC |
Mac OS X | x86 | GCC, Clang |
Cygwin/Win32 | x86 (1, 3) | GCC |
Windows | x86 (1) | Visual Studio |
Windows x64 | x86-64 | Visual Studio |
Note
1. Code generation supported for Pentium processors and up.
2. Code generation supported for 32-bit ABI only.
3. To use LLVM modules on a Win32-based system, you may configure LLVM with -DBUILD_SHARED_LIBS=On for CMake builds or --enable-shared for configure builds.
4. MCJIT not working well pre-v7, old JIT engine not supported any more.
Note that you will need about 1-3 GB of space for a full LLVM build in Debug
mode, depending on the system (it is so large because of all the debugging
information and the fact that the libraries are statically linked into multiple
tools). If you do not need many of the tools and you are space-conscious, you
can pass ONLY_TOOLS="tools you need"
to make. The Release build requires
considerably less space.
The LLVM suite may compile on other platforms, but it is not guaranteed to do so. If compilation is successful, the LLVM utilities should be able to assemble, disassemble, analyze, and optimize LLVM bitcode. Code generation should work as well, although the generated native code may not work on your platform.
Software¶
Compiling LLVM requires that you have several software packages installed. The table below lists those required packages. The Package column is the usual name for the software package that LLVM depends on. The Version column provides “known to work” versions of the package. The Notes column describes how LLVM uses the package and provides other details.
Package | Version | Notes |
---|---|---|
GNU Make | 3.79, 3.79.1 | Makefile/build processor |
GCC | >=4.7.0 | C/C++ compiler (1) |
python | >=2.7 | Automated test suite (2) |
GNU M4 | 1.4 | Macro processor for configuration (3) |
GNU Autoconf | 2.60 | Configuration script builder (3) |
GNU Automake | 1.9.6 | aclocal macro generator (3) |
libtool | 1.5.22 | Shared library manager (3) |
zlib | >=1.2.3.4 | Compression library (4) |
Note
1. Only the C and C++ languages are needed, so there's no need to build the other languages for LLVM's purposes. See below for specific version info.
2. Only needed if you want to run the automated test suite in the llvm/test directory.
3. If you want to make changes to the configure scripts, you will need GNU autoconf (2.60), and consequently, GNU M4 (version 1.4 or higher). You will also need automake (1.9.6). We only use aclocal from that package.
4. Optional; adds compression/uncompression capabilities to selected LLVM tools.
Additionally, your compilation host is expected to have the usual plethora of Unix utilities. Specifically:
- ar — archive library builder
- bzip2 — bzip2 command for distribution generation
- bunzip2 — bunzip2 command for distribution checking
- chmod — change permissions on a file
- cat — output concatenation utility
- cp — copy files
- date — print the current date/time
- echo — print to standard output
- egrep — extended regular expression search utility
- find — find files/dirs in a file system
- grep — regular expression search utility
- gzip — gzip command for distribution generation
- gunzip — gunzip command for distribution checking
- install — install directories/files
- mkdir — create a directory
- mv — move (rename) files
- ranlib — symbol table builder for archive libraries
- rm — remove (delete) files and directories
- sed — stream editor for transforming output
- sh — Bourne shell for make build scripts
- tar — tape archive for distribution generation
- test — test things in file system
- unzip — unzip command for distribution checking
- zip — zip command for distribution generation
Host C++ Toolchain, both Compiler and Standard Library¶
LLVM is very demanding of the host C++ compiler, and as such tends to expose bugs in the compiler. We are also planning to follow improvements and developments in the C++ language and library reasonably closely. As such, we require a modern host C++ toolchain, both compiler and standard library, in order to build LLVM.
For the most popular host toolchains we check for specific minimum versions in our build systems:
- Clang 3.1
- GCC 4.7
- Visual Studio 2013
Anything older than these toolchains may work, but will require forcing the build system with a special option and is not really a supported host platform. Also note that older versions of these compilers have often crashed or miscompiled LLVM.
For less widely used host toolchains such as ICC or xlC, be aware that a very recent version may be required to support all of the C++ features used in LLVM.
We track certain versions of software that are known to fail when used as part of the host toolchain. These even include linkers at times.
GCC 4.6.3 on ARM: Miscompiles llvm-readobj
at -O3
. A test failure
in test/Object/readobj-shared-object.test
is one symptom of the problem.
GNU ld 2.16.X. Some 2.16.X versions of the ld linker will produce very long
warning messages complaining that some “.gnu.linkonce.t.*
” symbol was
defined in a discarded section. You can safely ignore these messages as they are
erroneous and the linkage is correct. These messages disappear using ld 2.17.
GNU binutils 2.17: Binutils 2.17 contains a bug which causes huge link times (minutes instead of seconds) when building LLVM. We recommend upgrading to a newer version (2.17.50.0.4 or later).
GNU Binutils 2.19.1 Gold: This version of Gold contained a bug which causes intermittent failures when building LLVM with position independent code. The symptom is an error about cyclic dependencies. We recommend upgrading to a newer version of Gold.
Clang 3.0 with libstdc++ 4.7.x: a few Linux distributions (Ubuntu 12.10, Fedora 17) have both Clang 3.0 and libstdc++ 4.7 in their repositories. Clang 3.0 does not implement a few builtins that are used in this library. We recommend using the system GCC to compile LLVM and Clang in this case.
Clang 3.0 on Mageia 2. There's a packaging issue: Clang cannot find at least some (cxxabi.h) libstdc++ headers.
Clang in C++11 mode and libstdc++ 4.7.2. This version of libstdc++ contained a bug which causes Clang to refuse to compile the condition_variable header file. At the time of writing, this breaks the LLD build.
Getting a Modern Host C++ Toolchain¶
This section mostly applies to Linux and older BSDs. On Mac OS X, you should have a sufficiently modern Xcode, or you will likely need to upgrade until you do. On Windows, just use Visual Studio 2013 as the host compiler; it is explicitly supported and widely available. FreeBSD 10.0 and newer have a modern Clang as the system compiler.
However, some Linux distributions and some other or older BSDs sometimes have extremely old versions of GCC. These steps attempt to help you upgrade your compiler even on such a system. However, if at all possible, we encourage you to use a recent version of a distribution with a modern system compiler that meets these requirements. Note that it is tempting to install a prior version of Clang and libc++ to be the host compiler; however, libc++ was not well tested or set up to build on Linux until relatively recently. As a consequence, this guide suggests just using libstdc++ and a modern GCC as the initial host in a bootstrap, and then using Clang (and potentially libc++).
The first step is to get a recent GCC toolchain installed. The most common distribution on which users have struggled with the version requirements is Ubuntu Precise, 12.04 LTS. For this distribution, one easy option is to install the toolchain testing PPA and use it to install a modern GCC. There is a really nice discussion of this on the Ask Ubuntu Stack Exchange. However, not all users can use PPAs and there are many other distributions, so it may be necessary (or just useful, if you're here you are doing compiler development after all) to build and install GCC from source. It is also quite easy to do these days.
Easy steps for installing GCC 4.8.2:
% wget https://ftp.gnu.org/gnu/gcc/gcc-4.8.2/gcc-4.8.2.tar.bz2
% wget https://ftp.gnu.org/gnu/gcc/gcc-4.8.2/gcc-4.8.2.tar.bz2.sig
% wget https://ftp.gnu.org/gnu/gnu-keyring.gpg
% signature_invalid=`gpg --verify --no-default-keyring --keyring ./gnu-keyring.gpg gcc-4.8.2.tar.bz2.sig`
% if [ $signature_invalid ]; then echo "Invalid signature" ; exit 1 ; fi
% tar -xvjf gcc-4.8.2.tar.bz2
% cd gcc-4.8.2
% ./contrib/download_prerequisites
% cd ..
% mkdir gcc-4.8.2-build
% cd gcc-4.8.2-build
% $PWD/../gcc-4.8.2/configure --prefix=$HOME/toolchains --enable-languages=c,c++
% make -j$(nproc)
% make install
For more details, check out the excellent GCC wiki entry, where I got most of this information from.
Once you have a GCC toolchain, configure your build of LLVM to use the new
toolchain for your host compiler and C++ standard library. Because the new
version of libstdc++ is not on the system library search path, you need to pass
extra linker flags so that it can be found at link time (-L
) and at runtime
(-rpath
). If you are using CMake, this invocation should produce working
binaries:
% mkdir build
% cd build
% CC=$HOME/toolchains/bin/gcc CXX=$HOME/toolchains/bin/g++ \
cmake .. -DCMAKE_CXX_LINK_FLAGS="-Wl,-rpath,$HOME/toolchains/lib64 -L$HOME/toolchains/lib64"
If you fail to set rpath, most LLVM binaries will fail on startup with a message
from the loader similar to libstdc++.so.6: version `GLIBCXX_3.4.20' not
found
. This means you need to tweak the -rpath linker flag.
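To check which libstdc++ a freshly built binary will actually load, the ldd utility can help (a sketch; the binary path is illustrative):
% ldd ./bin/opt | grep libstdc++   # should resolve into $HOME/toolchains/lib64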
When you build Clang, you will need to give it access to modern C++11
standard library in order to use it as your new host in part of a bootstrap.
There are two easy ways to do this, either build (and install) libc++ along
with Clang and then use it with the -stdlib=libc++
compile and link flag,
or install Clang into the same prefix ($HOME/toolchains
above) as GCC.
Clang will look within its own prefix for libstdc++ and use it if found. You
can also add an explicit prefix for Clang to look in for a GCC toolchain with
the --gcc-toolchain=/opt/my/gcc/prefix
flag, passing it to both compile and
link commands when using your just-built-Clang to bootstrap.
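Taking the second approach with CMake, a hypothetical bootstrap configuration might look like the following (all paths are placeholders; CMAKE_CXX_FLAGS is applied to both compile and link steps, so the flag reaches both):
% mkdir build-bootstrap
% cd build-bootstrap
% CC=$HOME/clang-install/bin/clang CXX=$HOME/clang-install/bin/clang++ \
  cmake <PATH_TO_LLVM> -DCMAKE_CXX_FLAGS="--gcc-toolchain=$HOME/toolchains"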
Getting Started with LLVM¶
The remainder of this guide is meant to get you up and running with LLVM and to give you some basic information about the LLVM environment.
The later sections of this guide describe the general layout of the LLVM source tree, a simple example using the LLVM tool chain, and links to find more information about LLVM or to get help via e-mail.
Terminology and Notation¶
Throughout this manual, the following names are used to denote paths specific to the local system and working environment. These are not environment variables you need to set but just strings used in the rest of this document below. In any of the examples below, simply replace each of these names with the appropriate pathname on your local system. All these paths are absolute:
SRC_ROOT
This is the top level directory of the LLVM source tree.
OBJ_ROOT
This is the top level directory of the LLVM object tree (i.e. the tree where object files and compiled programs will be placed; it can be the same as SRC_ROOT).
Setting Up Your Environment¶
In order to compile and use LLVM, you may need to set some environment variables.
LLVM_LIB_SEARCH_PATH=/path/to/your/bitcode/libs
[Optional] This environment variable helps LLVM linking tools find the locations of your bitcode libraries. It is provided only as a convenience since you can specify the paths using the -L options of the tools and the C/C++ front-end will automatically use the bitcode files installed in itslib
directory.
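For example, in a Bourne-compatible shell (the path is a placeholder):
% export LLVM_LIB_SEARCH_PATH=/path/to/your/bitcode/libs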
Unpacking the LLVM Archives¶
If you have the LLVM distribution, you will need to unpack it before you can begin to compile it. LLVM is distributed as a set of two files: the LLVM suite and the LLVM GCC front end compiled for your platform. There is an additional test suite that is optional. Each file is a TAR archive that is compressed with the gzip program.
The files are as follows, with x.y marking the version number:
llvm-x.y.tar.gz
Source release for the LLVM libraries and tools.
llvm-test-x.y.tar.gz
Source release for the LLVM test-suite.
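To unpack an archive, gunzip and untar it; for example (with x.y standing for the version, as above):
% gunzip --stdout llvm-x.y.tar.gz | tar -xvf -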
Checkout LLVM from Subversion¶
If you have access to our Subversion repository, you can get a fresh copy of the entire source code. All you need to do is check it out from Subversion as follows:
cd where-you-want-llvm-to-live
- Read-Only:
svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
- Read-Write:
svn co https://user@llvm.org/svn/llvm-project/llvm/trunk llvm
This will create an ‘llvm
‘ directory in the current directory and fully
populate it with the LLVM source code, Makefiles, test directories, and local
copies of documentation files.
If you want to get a specific release (as opposed to the most recent revision), you can check it out from the ‘tags’ directory (instead of ‘trunk’). The following releases are located in the following subdirectories of the ‘tags’ directory:
- Release 3.4: RELEASE_34/final
- Release 3.3: RELEASE_33/final
- Release 3.2: RELEASE_32/final
- Release 3.1: RELEASE_31/final
- Release 3.0: RELEASE_30/final
- Release 2.9: RELEASE_29/final
- Release 2.8: RELEASE_28
- Release 2.7: RELEASE_27
- Release 2.6: RELEASE_26
- Release 2.5: RELEASE_25
- Release 2.4: RELEASE_24
- Release 2.3: RELEASE_23
- Release 2.2: RELEASE_22
- Release 2.1: RELEASE_21
- Release 2.0: RELEASE_20
- Release 1.9: RELEASE_19
- Release 1.8: RELEASE_18
- Release 1.7: RELEASE_17
- Release 1.6: RELEASE_16
- Release 1.5: RELEASE_15
- Release 1.4: RELEASE_14
- Release 1.3: RELEASE_13
- Release 1.2: RELEASE_12
- Release 1.1: RELEASE_11
- Release 1.0: RELEASE_1
If you would like to get the LLVM test suite (a separate package as of 1.4), you get it from the Subversion repository:
% cd llvm/projects
% svn co http://llvm.org/svn/llvm-project/test-suite/trunk test-suite
By placing it in llvm/projects, it will be automatically configured by the LLVM configure script as well as automatically updated when you run svn update.
Git Mirror¶
Git mirrors are available for a number of LLVM subprojects. These mirrors sync
automatically with each Subversion commit and contain all necessary git-svn
marks (so, you can recreate git-svn metadata locally). Note that right now
mirrors reflect only trunk
for each project. You can do the read-only Git
clone of LLVM via:
% git clone http://llvm.org/git/llvm.git
If you want to check out clang too, run:
% cd llvm/tools
% git clone http://llvm.org/git/clang.git
If you want to check out compiler-rt too, run:
% cd llvm/projects
% git clone http://llvm.org/git/compiler-rt.git
If you want to check out the Test Suite Source Code (optional), run:
% cd llvm/projects
% git clone http://llvm.org/git/test-suite.git
Since the upstream repository is in Subversion, you should use git
pull --rebase
instead of git pull
to avoid generating a non-linear history
in your clone. To configure git pull
to pass --rebase
by default on the
master branch, run the following command:
% git config branch.master.rebase true
Sending patches with Git¶
Please read Developer Policy, too.
Assume master points to the upstream and mybranch points to your working branch, and mybranch is rebased onto master. First, you may check the sanity of whitespace:
% git diff --check master..mybranch
The easiest way to generate a patch is as below:
% git diff master..mybranch > /path/to/mybranch.diff
It is a little different from an svn-generated diff: a git-generated diff has prefixes like a/ and b/. Don't worry; most developers know it can be applied with patch -p1 -N.
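That is, the diff can be applied from the top of the source tree with:
% patch -p1 -N < /path/to/mybranch.diff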
You may also generate a patchset with git-format-patch, which produces one patch per commit. To generate patch files to attach to your email:
% git format-patch --no-attach master..mybranch -o /path/to/your/patchset
If you would like to send patches directly, you may use git-send-email or git-imap-send. Here is an example to generate the patchset in Gmail’s [Drafts].
% git format-patch --attach master..mybranch --stdout | git imap-send
For this to work, your .git/config should have an [imap] section like the following:
[imap]
host = imaps://imap.gmail.com
user = your.gmail.account@gmail.com
pass = himitsu!
port = 993
sslverify = false
; in English
folder = "[Gmail]/Drafts"
; example for Japanese, "Modified UTF-7" encoded.
folder = "[Gmail]/&Tgtm+DBN-"
; example for Traditional Chinese
folder = "[Gmail]/&g0l6Pw-"
For developers to work with git-svn¶
To set up a clone from which you can submit code using git-svn, run:
% git clone http://llvm.org/git/llvm.git
% cd llvm
% git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
% git config svn-remote.svn.fetch :refs/remotes/origin/master
% git svn rebase -l # -l avoids fetching ahead of the git mirror.
# If you have clang too:
% cd tools
% git clone http://llvm.org/git/clang.git
% cd clang
% git svn init https://llvm.org/svn/llvm-project/cfe/trunk --username=<username>
% git config svn-remote.svn.fetch :refs/remotes/origin/master
% git svn rebase -l
Likewise for compiler-rt and test-suite.
To update this clone without generating git-svn tags that conflict with the upstream Git repo, run:
% git fetch && (cd tools/clang && git fetch) # Get matching revisions of both trees.
% git checkout master
% git svn rebase -l
% (cd tools/clang &&
git checkout master &&
git svn rebase -l)
Likewise for compiler-rt and test-suite.
This leaves your working directories on their master branches, so you’ll need to
checkout
each working branch individually and rebase
it on top of its
parent branch.
For those who wish to be able to update an llvm repo or revert patches easily using git-svn, look in the utils/git-svn directory for the scripts git-svnup and git-svnrevert.
To perform the aforementioned update steps, go into your source directory and just type git-svnup or git svnup and everything will just work.
If one wishes to revert a commit with git-svn but does not want the git hash to escape into the commit message, one can use the script git-svnrevert or git svnrevert, which takes the git hash for the commit you want to revert, looks up the appropriate svn revision, and outputs a message where all references to the git hash have been replaced with the svn revision.
To commit back changes via git-svn, use git svn dcommit
:
% git svn dcommit
Note that git-svn will create one SVN commit for each Git commit you have pending,
so squash and edit each commit before executing dcommit
to make sure they all
conform to the coding standards and the developers’ policy.
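One common way to squash and edit pending commits is an interactive rebase against the upstream branch before running dcommit (a sketch; assumes the master/mybranch setup described earlier):
% git checkout mybranch
% git rebase -i master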
On success, dcommit
will rebase against the HEAD of SVN, so to avoid conflict,
please make sure your current branch is up-to-date (via fetch/rebase) before
proceeding.
The git-svn metadata can get out of sync after you mess around with branches and
dcommit
. When that happens, git svn dcommit
stops working, complaining
about files with uncommitted changes. The fix is to rebuild the metadata:
% rm -rf .git/svn
% git svn rebase -l
Please refer to the Git-SVN manual (man git-svn) for more information.
Local LLVM Configuration¶
Once checked out from the Subversion repository, the LLVM suite source code must
be configured before being built. For instructions using autotools please see
Building LLVM With Autotools. The
recommended process uses CMake. Unlike the normal configure
script, CMake
generates the build files in whatever format you request as well as various
*.inc
files, and llvm/include/Config/config.h
.
Variables are passed to cmake
on the command line using the format
-D<variable name>=<value>
. The following variables are some common options
used by people developing LLVM.
Variable | Purpose |
---|---|
CMAKE_C_COMPILER | Tells cmake which C compiler to use. By
default, this will be /usr/bin/cc. |
CMAKE_CXX_COMPILER | Tells cmake which C++ compiler to use. By
default, this will be /usr/bin/c++. |
CMAKE_BUILD_TYPE | Tells cmake what type of build you are trying
to generate files for. Valid options are Debug,
Release, RelWithDebInfo, and MinSizeRel. Default
is Debug. |
CMAKE_INSTALL_PREFIX | Specifies the install directory to target when running the install action of the build files. |
LLVM_TARGETS_TO_BUILD | A semicolon delimited list controlling which
targets will be built and linked into llc. This is
equivalent to the --enable-targets option in
the configure script. The default list is defined
as LLVM_ALL_TARGETS , and can be set to include
out-of-tree targets. The default value includes:
AArch64, AMDGPU, ARM, BPF, CppBackend, Hexagon,
Mips, MSP430, NVPTX, PowerPC, Sparc, SystemZ,
X86, XCore. |
LLVM_ENABLE_DOXYGEN | Build doxygen-based documentation from the source code. This is disabled by default because it is slow and generates a lot of output. |
LLVM_ENABLE_SPHINX | Build sphinx-based documentation from the source code. This is disabled by default because it is slow and generates a lot of output. |
LLVM_BUILD_LLVM_DYLIB | Generate libLLVM.so. This library contains a
default set of LLVM components that can be
overridden with LLVM_DYLIB_COMPONENTS . The
default contains most of LLVM and is defined in
tools/llvm-shlib/CMakeLists.txt. |
LLVM_OPTIMIZED_TABLEGEN | Builds a release tablegen that gets used during the LLVM build. This can dramatically speed up debug builds. |
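For example, a configuration combining several of these variables (all values are illustrative only) might look like:
% cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=/opt/llvm -DLLVM_TARGETS_TO_BUILD="X86;ARM" SRC_ROOT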
To configure LLVM, follow these steps:
Change directory into the object root directory:
% cd OBJ_ROOT
Run cmake:
% cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX=/install/path [other options] SRC_ROOT
Compiling the LLVM Suite Source Code¶
Unlike with autotools, with CMake your build type is defined at configuration time. If you want to change your build type, you can re-run cmake with the following invocation:
% cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=type SRC_ROOT
Between runs, CMake preserves the values set for all options. CMake has the following build types defined:
Debug
These builds are the default. The build system will compile the tools and libraries unoptimized, with debugging information, and asserts enabled.
Release
For these builds, the build system will compile the tools and libraries with optimizations enabled and without debug info. CMake's default optimization level is -O3. This can be configured by setting the CMAKE_CXX_FLAGS_RELEASE
variable on the CMake command line.
RelWithDebInfo
These builds are useful when debugging. They generate optimized binaries with debug information. CMake's default optimization level is -O2. This can be configured by setting the CMAKE_CXX_FLAGS_RELWITHDEBINFO
variable on the CMake command line.
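For instance, to switch an existing build tree to RelWithDebInfo while overriding its default optimization level (the flags are illustrative), you could re-run:
% cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-O3 -g" SRC_ROOT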
Once you have LLVM configured, you can build it by entering the OBJ_ROOT directory and issuing the following command:
% make
If the build fails, please check here to see if you are using a version of GCC that is known not to compile LLVM.
If you have multiple processors in your machine, you may wish to use some of the parallel build options provided by GNU Make. For example, you could use the command:
% make -j2
There are several special targets which are useful when working with the LLVM source code:
make clean
Removes all files generated by the build. This includes object files, generated C/C++ files, libraries, and executables.
make install
Installs LLVM header files, libraries, tools, and documentation in a hierarchy under$PREFIX
, specified withCMAKE_INSTALL_PREFIX
, which defaults to/usr/local
.
make docs-llvm-html
If configured with-DLLVM_ENABLE_SPHINX=On
, this will generate a directory atOBJ_ROOT/docs/html
which contains the HTML formatted documentation.
Cross-Compiling LLVM¶
It is possible to cross-compile LLVM itself. That is, you can create LLVM
executables and libraries to be hosted on a platform different from the platform
where they are built (a Canadian Cross build). To generate build files for
cross-compiling CMake provides a variable CMAKE_TOOLCHAIN_FILE
which can
define compiler flags and variables used during the CMake test operations.
The result of such a build is executables that are not runnable on the build host but can be executed on the target. As an example the following CMake invocation can generate build files targeting iOS. This will work on Mac OS X with the latest Xcode:
% cmake -G "Ninja" -DCMAKE_OSX_ARCHITECTURES="armv7;armv7s;arm64"
-DCMAKE_TOOLCHAIN_FILE=<PATH_TO_LLVM>/cmake/platforms/iOS.cmake
-DCMAKE_BUILD_TYPE=Release -DLLVM_BUILD_RUNTIME=Off -DLLVM_INCLUDE_TESTS=Off
-DLLVM_INCLUDE_EXAMPLES=Off -DLLVM_ENABLE_BACKTRACES=Off [options]
<PATH_TO_LLVM>
Note: There are some additional flags that need to be passed when building for iOS due to limitations in the iOS SDK.
Check How To Cross-Compile Clang/LLVM using Clang/LLVM and Clang docs on how to cross-compile in general for more information about cross-compiling.
The Location of LLVM Object Files¶
The LLVM build system is capable of sharing a single LLVM source tree among several LLVM builds. Hence, it is possible to build LLVM for several different platforms or configurations using the same source tree.
This is accomplished in the typical autoconf manner:
Change directory to where the LLVM object files should live:
% cd OBJ_ROOT
Run cmake:
% cmake -G "Unix Makefiles" SRC_ROOT
The LLVM build will create a structure underneath OBJ_ROOT that matches the
LLVM source tree. At each level where source files are present in the source
tree there will be a corresponding CMakeFiles
directory in the OBJ_ROOT.
Underneath that directory there is another directory with a name ending in
.dir
under which you’ll find object files for each source.
For example:
% cd llvm_build_dir
% find lib/Support/ -name APFloat*
lib/Support/CMakeFiles/LLVMSupport.dir/APFloat.cpp.o
Optional Configuration Items¶
If you’re running on a Linux system that supports the binfmt_misc module, and you have root access on the system, you can set your system up to execute LLVM bitcode files directly. To do this, use commands like this (the first command may not be required if you are already using the module):
% mount -t binfmt_misc none /proc/sys/fs/binfmt_misc
% echo ':llvm:M::BC::/path/to/lli:' > /proc/sys/fs/binfmt_misc/register
% chmod u+x hello.bc (if needed)
% ./hello.bc
This allows you to execute LLVM bitcode files directly. On Debian, you can also use this command instead of the ‘echo’ command above:
% sudo update-binfmts --install llvm /path/to/lli --magic 'BC'
Program Layout¶
One useful source of information about the LLVM source base is the LLVM doxygen documentation available at http://llvm.org/doxygen/. The following is a brief introduction to code layout:
llvm/examples
¶
This directory contains some simple examples of how to use the LLVM IR and JIT.
llvm/include
¶
This directory contains public header files exported from the LLVM library. The three main subdirectories of this directory are:
llvm/include/llvm
This directory contains all of the LLVM specific header files. This directory also has subdirectories for different portions of LLVM:Analysis
,CodeGen
,Target
,Transforms
, etc...
llvm/include/llvm/Support
This directory contains generic support libraries that are provided with LLVM but not necessarily specific to LLVM. For example, some C++ STL utilities and a Command Line option processing library store their header files here.
llvm/include/llvm/Config
This directory contains header files configured by theconfigure
script. They wrap “standard” UNIX and C header files. Source code can include these header files which automatically take care of the conditional #includes that theconfigure
script generates.
llvm/lib
¶
This directory contains most of the source files of the LLVM system. In LLVM, almost all code exists in libraries, making it very easy to share code among the different tools.
llvm/lib/IR/
This directory holds the core LLVM source files that implement core classes like Instruction and BasicBlock.
llvm/lib/AsmParser/
This directory holds the source code for the LLVM assembly language parser library.
llvm/lib/Bitcode/
This directory holds code for reading and writing LLVM bitcode.
llvm/lib/Analysis/
This directory contains a variety of different program analyses, such as Dominator Information, Call Graphs, Induction Variables, Interval Identification, Natural Loop Identification, etc.
llvm/lib/Transforms/
This directory contains the source code for the LLVM to LLVM program transformations, such as Aggressive Dead Code Elimination, Sparse Conditional Constant Propagation, Inlining, Loop Invariant Code Motion, Dead Global Elimination, and many others.
llvm/lib/Target/
This directory contains files that describe various target architectures for code generation. For example, thellvm/lib/Target/X86
directory holds the X86 machine description whilellvm/lib/Target/ARM
implements the ARM backend.
llvm/lib/CodeGen/
This directory contains the major parts of the code generator: Instruction Selector, Instruction Scheduling, and Register Allocation.
llvm/lib/MC/
(FIXME: T.B.D.)
llvm/lib/Debugger/
This directory contains the source level debugger library that makes it possible to instrument LLVM programs so that a debugger could identify source code locations at which the program is executing.
llvm/lib/ExecutionEngine/
This directory contains libraries for executing LLVM bitcode directly at runtime in both interpreted and JIT compiled fashions.
llvm/lib/Support/
This directory contains the source code that corresponds to the header files located inllvm/include/ADT/
andllvm/include/Support/
.
llvm/projects
¶
This directory contains projects that are not strictly part of LLVM but are shipped with LLVM. This is also the directory where you should create your own LLVM-based projects.
llvm/runtime
¶
This directory contains libraries which are compiled into LLVM bitcode and used when linking programs with the Clang front end. Most of these libraries are skeleton versions of real libraries; for example, libc is a stripped down version of glibc.
Unlike the rest of the LLVM suite, this directory needs the LLVM GCC front end to compile.
llvm/test
¶
This directory contains feature and regression tests and other basic sanity checks on the LLVM infrastructure. These are intended to run quickly and cover a lot of territory without being exhaustive.
test-suite
¶
This is not a directory in the normal llvm module; it is a separate Subversion
module that must be checked out (usually to projects/test-suite
). This
module contains a comprehensive correctness, performance, and benchmarking test
suite for LLVM. It is a separate Subversion module because not every LLVM user
is interested in downloading or building such a comprehensive test suite. For
further details on this test suite, please see the Testing Guide document.
llvm/tools
¶
The tools directory contains the executables built out of the libraries
above, which form the main part of the user interface. You can always get help
for a tool by typing tool_name -help
. The following is a brief introduction
to the most important tools. More detailed information is in
the Command Guide.
bugpoint
bugpoint
is used to debug optimization passes or code generation backends by narrowing down the given test case to the minimum number of passes and/or instructions that still cause a problem, whether it is a crash or miscompilation. See HowToSubmitABug.html for more information on usingbugpoint
.
llvm-ar
The archiver produces an archive containing the given LLVM bitcode files, optionally with an index for faster lookup.
llvm-as
The assembler transforms the human readable LLVM assembly to LLVM bitcode.
llvm-dis
The disassembler transforms the LLVM bitcode to human readable LLVM assembly.
llvm-link
llvm-link
, not surprisingly, links multiple LLVM modules into a single program.
lli
lli
is the LLVM interpreter, which can directly execute LLVM bitcode (although very slowly...). For architectures that support it (currently x86, Sparc, and PowerPC), by default,lli
will function as a Just-In-Time compiler (if the functionality was compiled in), and will execute the code much faster than the interpreter.
llc
llc
is the LLVM backend compiler, which translates LLVM bitcode to a native code assembly file or to C code (with the-march=c
option).
opt
opt
reads LLVM bitcode, applies a series of LLVM to LLVM transformations (which are specified on the command line), and then outputs the resultant bitcode. The ‘opt -help
‘ command is a good way to get a list of the program transformations available in LLVM.
opt
can also be used to run a specific analysis on an input LLVM bitcode file and print out the results. It is primarily useful for debugging analyses, or familiarizing yourself with what an analysis does.
llvm/utils
¶
This directory contains utilities for working with LLVM source code; some of them are required as part of the build process because they are code generators for parts of the LLVM infrastructure.
codegen-diff
codegen-diff
is a script that finds differences between code that LLC generates and code that LLI generates. This is a useful tool if you are debugging one of them, assuming that the other generates correct output. For the full user manual, run `perldoc codegen-diff`.
emacs/
Theemacs
directory contains syntax-highlighting files which will work with Emacs and XEmacs editors, providing syntax highlighting support for LLVM assembly files and TableGen description files. For information on how to use the syntax files, consult theREADME
file in that directory.
getsrcs.sh
Thegetsrcs.sh
script finds and outputs all non-generated source files, which is useful if one wishes to do a lot of development across directories and does not want to individually find each file. One way to use it is to run, for example: xemacs `utils/getsrcs.sh`
from the top of your LLVM source tree.
llvmgrep
This little tool performs anegrep -H -n
on each source file in LLVM and passes to it a regular expression provided onllvmgrep
‘s command line. This is a very efficient way of searching the source base for a particular regular expression.
makellvm
Themakellvm
script compiles all files in the current directory and then compiles and links the tool that is the first argument. For example, assuming you are in the directoryllvm/lib/Target/Sparc
, ifmakellvm
is in your path, simply runningmakellvm llc
will make a build of the current directory, switch to directoryllvm/tools/llc
and build it, causing a re-linking of LLC.
TableGen/
TheTableGen
directory contains the tool used to generate register descriptions, instruction set descriptions, and even assemblers from common TableGen description files.
vim/
Thevim
directory contains syntax-highlighting files which will work with the VIM editor, providing syntax highlighting support for LLVM assembly files and TableGen description files. For information on how to use the syntax files, consult theREADME
file in that directory.
An Example Using the LLVM Tool Chain¶
This section gives an example of using LLVM with the Clang front end.
Example with clang¶
First, create a simple C file, name it ‘hello.c’:
#include <stdio.h>

int main() {
  printf("hello world\n");
  return 0;
}
Next, compile the C file into a native executable:
% clang hello.c -o hello
Note
Clang works just like GCC by default. The standard -S and -c arguments work as usual (producing a native .s or .o file, respectively).
Next, compile the C file into an LLVM bitcode file:
% clang -O3 -emit-llvm hello.c -c -o hello.bc
The -emit-llvm option can be used with the -S or -c options to emit an LLVM
.ll
or.bc
file (respectively) for the code. This allows you to use the standard LLVM tools on the bitcode file.Run the program in both forms. To run the program, use:
% ./hello
and
% lli hello.bc
The second example shows how to invoke the LLVM JIT, lli.
Use the
llvm-dis
utility to take a look at the LLVM assembly code:% llvm-dis < hello.bc | less
Compile the program to native assembly using the LLC code generator:
% llc hello.bc -o hello.s
Assemble the native assembly language file into a program:
% /opt/SUNWspro/bin/cc -xarch=v9 hello.s -o hello.native   # On Solaris
% gcc hello.s -o hello.native                              # On others
Execute the native code program:
% ./hello.native
Note that using clang to compile directly to native code (i.e. when the
-emit-llvm
option is not present) does steps 6/7/8 for you.
Common Problems¶
If you are having problems building or using LLVM, or if you have any other general questions about LLVM, please consult the Frequently Asked Questions page.
Links¶
This document is just an introduction on how to use LLVM to do some simple things... there are many more interesting and complicated things that you can do that aren’t documented here (but we’ll gladly accept a patch if you want to write something up!). For more information about LLVM, check out:
Getting Started with the LLVM System using Microsoft Visual Studio¶
Overview¶
Welcome to LLVM on Windows! This document only covers LLVM on Windows using Visual Studio, not mingw or cygwin. In order to get started, you first need to know some basic information.
There are many different projects that compose LLVM. The first is the LLVM suite. This contains all of the tools, libraries, and header files needed to use LLVM. It contains an assembler, disassembler, bitcode analyzer and bitcode optimizer. It also contains a test suite that can be used to test the LLVM tools.
Another useful project on Windows is Clang.
Clang is a C family ([Objective]C/C++) compiler. Clang mostly works on
Windows, but does not currently understand all of the Microsoft extensions
to C and C++. Because of this, clang cannot parse the C++ standard library
included with Visual Studio, nor parts of the Windows Platform SDK. However,
most standard C programs do compile. Clang can be used to emit bitcode,
directly emit object files or even linked executables using Visual Studio’s
link.exe
.
The large LLVM test suite cannot be run on the Visual Studio port at this time.
Most of the tools build and work. bugpoint
does build, but does
not work.
Additional information about the LLVM directory structure and tool chain can be found on the main Getting Started with the LLVM System page.
Requirements¶
Before you begin to use the LLVM system, review the requirements given below. This may save you some trouble by knowing ahead of time what hardware and software you will need.
Hardware¶
Any system that can adequately run Visual Studio 2013 is fine. The LLVM source tree and object files, libraries and executables will consume approximately 3GB.
Software¶
You will need Visual Studio 2013 or higher.
You will also need the CMake build system since it generates the project files you will use to build with.
If you would like to run the LLVM tests you will need Python. Version 2.7 and newer are known to work. You will need GnuWin32 tools, too.
Do not install the LLVM directory tree into a path containing spaces (e.g.
C:\Documents and Settings\...
) as the configure step will fail.
Getting Started¶
Here’s the short story for getting up and running quickly with LLVM:
Read the documentation.
Seriously, read the documentation.
Remember that you were warned twice about reading the documentation.
Get the Source Code
With the distributed files:
cd <where-you-want-llvm-to-live>
gunzip --stdout llvm-VERSION.tar.gz | tar -xvf -
(or use WinZip)
cd llvm
With anonymous Subversion access:
cd <where-you-want-llvm-to-live>
svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
cd llvm
Use CMake to generate up-to-date project files:
- Once CMake is installed, the simplest way is to just start the
CMake GUI, select the directory where you have LLVM extracted to, and
the default options should all be fine. One option you may really
want to change, regardless of anything else, might be the
CMAKE_INSTALL_PREFIX
setting to select a directory to INSTALL to once compiling is complete, although installation is not mandatory for using LLVM. Another important option isLLVM_TARGETS_TO_BUILD
, which controls the LLVM target architectures that are included on the build. - See the LLVM CMake guide for detailed information about how to configure the LLVM build.
- CMake generates project files for all build types. To select a specific
build type, use the Configuration manager from the VS IDE or the
/property:Configuration
command line option when using MSBuild.
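If you prefer the command line to the CMake GUI, an equivalent generation step might look like the following (the generator name shown is the one recent CMake versions use for Visual Studio 2013; paths are placeholders):
C:\..> cmake -G "Visual Studio 12 2013" -DCMAKE_INSTALL_PREFIX=C:\llvm-install <where-you-want-llvm-to-live>\llvm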
Start Visual Studio
- In the directory where you created the project files, you will find an llvm.sln file; just double-click on it to open Visual Studio.
Build the LLVM Suite:
- The projects may still be built individually, but to build them all do
not just select all of them in batch build (as some are meant as
configuration projects), but rather select and build just the
ALL_BUILD
project to build everything, or theINSTALL
project, which first builds theALL_BUILD
project, then installs the LLVM headers, libs, and other useful things to the directory set by theCMAKE_INSTALL_PREFIX
setting when you first configured CMake. - The Fibonacci project is a sample program that uses the JIT. Modify the project’s debugging properties to provide a numeric command line argument or run it from the command line. The program will print the corresponding fibonacci value.
Test LLVM in Visual Studio:
- If
%PATH%
does not contain GnuWin32, you may specifyLLVM_LIT_TOOLS_DIR
on CMake for the path to GnuWin32. - You can run LLVM tests by merely building the project “check”. The test results will be shown in the VS output window.
Test LLVM on the command line:
The LLVM tests can be run by changing directory to the llvm source directory and running:
C:\..\llvm> python ..\build\bin\llvm-lit --param build_config=Win32 --param build_mode=Debug --param llvm_site_config=../build/test/lit.site.cfg test
This example assumes that Python is in your PATH variable and that you have built a Win32 Debug version of llvm with a standard out-of-line build. You should not see any unexpected failures, but will see many unsupported tests and expected failures.
A specific test or test directory can be run with:
C:\..\llvm> python ..\build\bin\llvm-lit --param build_config=Win32 --param build_mode=Debug --param llvm_site_config=../build/test/lit.site.cfg test/path/to/test
An Example Using the LLVM Tool Chain¶
First, create a simple C file, name it ‘hello.c’:

#include <stdio.h>

int main() {
  printf("hello world\n");
  return 0;
}
Next, compile the C file into an LLVM bitcode file:
C:\..> clang -c hello.c -emit-llvm -o hello.bc
This will create the result file hello.bc, which is the LLVM bitcode that corresponds to the compiled program and the library facilities that it required. You can execute this file directly using the lli tool, compile it to native assembly with llc, optimize or analyze it further with the opt tool, etc.
Alternatively you can directly output an executable with clang with:
C:\..> clang hello.c -o hello.exe
The
-o hello.exe
is required because clang currently outputsa.out
when neither-o
nor-c
are given.Run the program using the just-in-time compiler:
C:\..> lli hello.bc
Use the
llvm-dis
utility to take a look at the LLVM assembly code:C:\..> llvm-dis < hello.bc | more
Compile the program to object code using the LLC code generator:
C:\..> llc -filetype=obj hello.bc
Link to binary using Microsoft link:
C:\..> link hello.obj -defaultlib:libcmt
Execute the native code program:
C:\..> hello.exe
Common Problems¶
If you are having problems building or using LLVM, or if you have any other general questions about LLVM, please consult the Frequently Asked Questions page.
Links¶
This document is just an introduction to how to use LLVM to do some simple things... there are many more interesting and complicated things that you can do that aren’t documented here (but we’ll gladly accept a patch if you want to write something up!). For more information about LLVM, check out:
Building LLVM With Autotools¶
Overview¶
This document details how to use the LLVM autotools based build system to configure and build LLVM from source. The normal developer process using CMake is detailed here.
A Quick Summary¶
- Configure and build LLVM and Clang:
cd where-you-want-to-build-llvm
mkdir build (for building without polluting the source dir)
cd build
../llvm/configure [options]
Some common options:
- --prefix=directory — Specify for directory the full pathname of where you want the LLVM tools and libraries to be installed (default /usr/local).
- --enable-optimized — Compile with optimizations enabled (default is NO).
- --enable-assertions — Compile with assertion checks enabled (default is YES).
make [-j] — The -j specifies the number of jobs (commands) to run simultaneously. This builds both LLVM and Clang for Debug+Asserts mode. The --enable-optimized configure option is used to specify a Release build.
make check-all — This runs the regression tests to ensure everything is in working order.
— This run the regression tests to ensure everything is in working order.- If you get an “internal compiler error (ICE)” or test failures, see here.
Local LLVM Configuration¶
Once checked out from the Subversion repository, the LLVM suite source code must
be configured via the configure
script. This script sets variables in the
various *.in
files, most notably llvm/Makefile.config
and
llvm/include/Config/config.h
. It also populates OBJ_ROOT with the
Makefiles needed to begin building LLVM.
The following environment variables are used by the configure
script to
configure the build system:
Variable | Purpose |
---|---|
CC | Tells configure which C compiler to use. By default,
configure will check PATH for clang and GCC C
compilers (in this order). Use this variable to override
configure ‘s default behavior. |
CXX | Tells configure which C++ compiler to use. By
default, configure will check PATH for
clang++ and GCC C++ compilers (in this order). Use
this variable to override configure ‘s default
behavior. |
The following options can be used to set or enable LLVM specific options:
--enable-optimized
Enables optimized compilation (debugging symbols are removed and GCC optimization flags are enabled). Note that this is the default setting if you are using the LLVM distribution. The default behavior of a Subversion checkout is to use an unoptimized build (also known as a debug build).
--enable-debug-runtime
Enables debug symbols in the runtime libraries. The default is to strip debug symbols from the runtime libraries.
--enable-jit
Compile the Just In Time (JIT) compiler functionality. This is not available on all platforms. The default is dependent on platform, so it is best to explicitly enable it if you want it.
--enable-targets=target-option
Controls which targets will be built and linked into llc. The default value for target-option is "all", which builds and links all available targets. The "host" target selects the architecture of the build host. You can also specify a comma separated list of target names that you want available in llc. The target names use all lower case. The current set of targets is: aarch64, arm, arm64, cpp, hexagon, mips, mipsel, mips64, mips64el, msp430, powerpc, nvptx, r600, sparc, systemz, x86, x86_64, xcore.
.
--enable-doxygen
Look for the doxygen program and enable construction of doxygen based documentation from the source code. This is disabled by default because generating the documentation can take a long time and produces 100s of megabytes of output.
To configure LLVM, follow these steps:
Change directory into the object root directory:
% cd OBJ_ROOT
Run the
configure
script located in the LLVM source tree:% $LLVM_SRC_DIR/configure --prefix=/install/path [other options]
Compiling the LLVM Suite Source Code¶
Once you have configured LLVM, you can build it. There are three types of builds:
Debug Builds
These builds are the default when one is using a Subversion checkout and typesgmake
(unless the--enable-optimized
option was used during configuration). The build system will compile the tools and libraries with debugging information. To get a Debug Build using the LLVM distribution the--disable-optimized
option must be passed toconfigure
.
Release (Optimized) Builds
These builds are enabled with the--enable-optimized
option toconfigure
or by specifyingENABLE_OPTIMIZED=1
on thegmake
command line. For these builds, the build system will compile the tools and libraries with GCC optimizations enabled and strip debugging information from the libraries and executables it generates. Note that Release Builds are default when using an LLVM distribution.
Profile Builds
These builds are for use with profiling. They compile profiling information into the code for use with programs likegprof
. Profile builds must be started by specifyingENABLE_PROFILING=1
on thegmake
command line.
Once you have LLVM configured, you can build it by entering the OBJ_ROOT directory and issuing the following command:
% gmake
If the build fails, please check here to see if you are using a version of GCC that is known not to compile LLVM.
If you have multiple processors in your machine, you may wish to use some of the parallel build options provided by GNU Make. For example, you could use the command:
% gmake -j2
There are several special targets which are useful when working with the LLVM source code:
gmake clean
Removes all files generated by the build. This includes object files, generated C/C++ files, libraries, and executables.
gmake dist-clean
Removes everything thatgmake clean
does, but also removes files generated byconfigure
. It attempts to return the source tree to the original state in which it was shipped.
gmake install
Installs LLVM header files, libraries, tools, and documentation in a hierarchy under$PREFIX
, specified with$LLVM_SRC_DIR/configure --prefix=[dir]
, which defaults to/usr/local
.
gmake -C runtime install-bytecode
Assuming you built LLVM into $OBJDIR, when this command is run, it will install bitcode libraries into the GCC front end’s bitcode library directory. If you need to update your bitcode libraries, this is the target to use once you’ve built them.
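As a minimal sketch of a full configure-build-install cycle using the targets above (the /opt/llvm prefix is purely illustrative):

% cd OBJ_ROOT
% $LLVM_SRC_DIR/configure --prefix=/opt/llvm
% gmake
% gmake install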
Please see the Makefile Guide for further details on
these make
targets and descriptions of other targets available.
It is also possible to override default values from configure
by declaring
variables on the command line. The following are some examples:
gmake ENABLE_OPTIMIZED=1
Perform a Release (Optimized) build.
gmake ENABLE_OPTIMIZED=1 DISABLE_ASSERTIONS=1
Perform a Release (Optimized) build without assertions enabled.
gmake ENABLE_OPTIMIZED=0
Perform a Debug build.
gmake ENABLE_PROFILING=1
Perform a Profiling build.
gmake VERBOSE=1
Print whatgmake
is doing on standard output.
gmake TOOL_VERBOSE=1
Ask each tool invoked by the makefiles to print out what it is doing on the standard output. This also impliesVERBOSE=1
.
Every directory in the LLVM object tree includes a Makefile
to build it and
any subdirectories that it contains. Entering any directory inside the LLVM
object tree and typing gmake
should rebuild anything in or below that
directory that is out of date.
This does not apply to building the documentation.
LLVM’s (non-Doxygen) documentation is produced with the
Sphinx documentation generation system.
There are some HTML documents that have not yet been converted to the new
system (which uses the easy-to-read and easy-to-write
reStructuredText plaintext markup
language).
The generated documentation is built in the $LLVM_SRC_DIR/docs
directory using
a special makefile.
For instructions on how to install Sphinx, see
Sphinx Introduction for LLVM Developers.
After following the instructions there for installing Sphinx, build the LLVM
HTML documentation by doing the following:
$ cd $LLVM_SRC_DIR/docs
$ make -f Makefile.sphinx
This creates a _build/html
sub-directory with all of the HTML files, not
just the generated ones.
This directory corresponds to llvm.org/docs
.
For example, _build/html/SphinxQuickstartTemplate.html
corresponds to
llvm.org/docs/SphinxQuickstartTemplate.html
.
The Sphinx Quickstart Template is useful when creating a new document.
Cross-Compiling LLVM¶
It is possible to cross-compile LLVM itself. That is, you can create LLVM
executables and libraries to be hosted on a platform different from the platform
where they are built (a Canadian Cross build). To configure a cross-compile,
supply the configure script with --build
and --host
options that are
different. The values of these options must be legal target triples that your
GCC compiler supports.
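For example (a sketch; the triples are illustrative and must be ones your cross-toolchain actually supports):

% $LLVM_SRC_DIR/configure --build=x86_64-unknown-linux-gnu \
                          --host=arm-linux-gnueabihf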
The result of such a build is executables that are not runnable on the build host (--build option) but can be executed on the host platform (--host option).
Check How To Cross-Compile Clang/LLVM using Clang/LLVM and Clang docs on how to cross-compile in general for more information about cross-compiling.
The Location of LLVM Object Files¶
The LLVM build system is capable of sharing a single LLVM source tree among several LLVM builds. Hence, it is possible to build LLVM for several different platforms or configurations using the same source tree.
This is accomplished in the typical autoconf manner:
Change directory to where the LLVM object files should live:
% cd OBJ_ROOT
Run the
configure
script found in the LLVM source directory:% $LLVM_SRC_DIR/configure
The LLVM build will place files underneath OBJ_ROOT in directories named after the build type:
Debug Builds with assertions enabled (the default)
Tools
OBJ_ROOT/Debug+Asserts/bin
Libraries
OBJ_ROOT/Debug+Asserts/lib
Release Builds
Tools
OBJ_ROOT/Release/bin
Libraries
OBJ_ROOT/Release/lib
Profile Builds
Tools
OBJ_ROOT/Profile/bin
Libraries
OBJ_ROOT/Profile/lib
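For instance (a sketch; llc is just one of the built tools), after a default Debug+Asserts build you can run a tool straight out of the object tree:

% OBJ_ROOT/Debug+Asserts/bin/llc -version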
Frequently Asked Questions (FAQ)¶
- License
- Source Code
- Build Problems
- When I run configure, it finds the wrong C compiler.
- The
configure
script finds the right C compiler, but it uses the LLVM tools from a previous build. What do I do? - When creating a dynamic library, I get a strange GLIBC error.
- I’ve updated my source tree from Subversion, and now my build is trying to use a file/directory that doesn’t exist.
- I’ve modified a Makefile in my source tree, but my build tree keeps using the old version. What do I do?
- I’ve upgraded to a new version of LLVM, and I get strange build errors.
- I’ve built LLVM and am testing it, but the tests freeze.
- Why do test results differ when I perform different types of builds?
- Compiling LLVM with GCC 3.3.2 fails, what should I do?
- After Subversion update, rebuilding gives the error “No rule to make target”.
- Source Languages
- What source languages are supported?
- I’d like to write a self-hosting LLVM compiler. How should I interface with the LLVM middle-end optimizers and back-end code generators?
- What support is there for higher level source language constructs for building a compiler?
- I don’t understand the
GetElementPtr
instruction. Help!
- Using the C and C++ Front Ends
- Questions about code generated by the demo page
- What is this
llvm.global_ctors
and_GLOBAL__I_a...
stuff that happens when I#include <iostream>
? - Where did all of my code go??
- What is this “
undef
” thing that shows up in my code? - Why does instcombine + simplifycfg turn a call to a function with a mismatched calling convention into “unreachable”? Why not make the verifier reject it?
- What is this
License¶
Does the University of Illinois Open Source License really qualify as an “open source” license?¶
Yes, the license is certified by the Open Source Initiative (OSI).
Can I modify LLVM source code and redistribute the modified source?¶
Yes. The modified source distribution must retain the copyright notice and follow the three bulleted conditions listed in the LLVM license.
Can I modify the LLVM source code and redistribute binaries or other tools based on it, without redistributing the source?¶
Yes. This is why we distribute LLVM under a less restrictive license than GPL, as explained in the first question above.
Source Code¶
In what language is LLVM written?¶
All of the LLVM tools and libraries are written in C++ with extensive use of the STL.
How portable is the LLVM source code?¶
The LLVM source code should be portable to most modern Unix-like operating systems. Most of the code is written in standard C++ with operating system services abstracted to a support library. The tools required to build and test LLVM have been ported to a plethora of platforms.
Some porting problems may exist in the following areas:
- The autoconf/makefile build system relies heavily on UNIX shell tools, like the Bourne Shell and sed. Porting to systems without these tools (MacOS 9, Plan 9) will require more effort.
What API do I use to store a value to one of the virtual registers in LLVM IR’s SSA representation?¶
In short: you can’t. It’s actually kind of a silly question once you grok what’s going on. Basically, in code like:
%result = add i32 %foo, %bar
, %result
is just a name given to the Value
of the add
instruction. In other words, %result
is the add instruction. The
“assignment” doesn’t explicitly “store” anything to any “virtual register”;
the “=
” is more like the mathematical sense of equality.
Longer explanation: In order to generate a textual representation of the
IR, some kind of name has to be given to each instruction so that other
instructions can textually reference it. However, the isomorphic in-memory
representation that you manipulate from C++ has no such restriction since
instructions can simply keep pointers to any other Value
‘s that they
reference. In fact, the names of dummy numbered temporaries like %1
are
not explicitly represented in the in-memory representation at all (see
Value::getName()
).
Build Problems¶
When I run configure, it finds the wrong C compiler.¶
The configure
script attempts to locate first gcc
and then cc
,
unless it finds compiler paths set in CC
and CXX
for the C and C++
compiler, respectively.
If configure
finds the wrong compiler, either adjust your PATH
environment variable or set CC
and CXX
explicitly.
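A minimal sketch, assuming clang and clang++ are the compilers you want and are on your PATH:

% CC=clang CXX=clang++ $LLVM_SRC_DIR/configure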
The configure
script finds the right C compiler, but it uses the LLVM tools from a previous build. What do I do?¶
The configure
script uses the PATH
to find executables, so if it’s
grabbing the wrong linker/assembler/etc, there are two ways to fix it:
- Adjust your
PATH
environment variable so that the correct program appears first in thePATH
. This may work, but may not be convenient when you want them first in your path for other work. - Run
configure
with an alternativePATH
that is correct. In a Bourne compatible shell, the syntax would be:
% PATH=[the path without the bad program] $LLVM_SRC_DIR/configure ...
This is still somewhat inconvenient, but it allows configure
to do its
work without having to adjust your PATH
permanently.
When creating a dynamic library, I get a strange GLIBC error.¶
Under some operating systems (e.g., Linux), libtool does not work correctly if
GCC was compiled with the --disable-shared option. To work around this,
install your own version of GCC that has shared libraries enabled by default.
I’ve updated my source tree from Subversion, and now my build is trying to use a file/directory that doesn’t exist.¶
You need to re-run configure in your object directory. When new Makefiles are added to the source tree, they have to be copied over to the object tree in order to be used by the build.
I’ve modified a Makefile in my source tree, but my build tree keeps using the old version. What do I do?¶
If the Makefile already exists in your object tree, you can just run the following command in the top level directory of your object tree:
% ./config.status <relative path to Makefile>;
If the Makefile is new, you will have to modify the configure script to copy it over.
I’ve upgraded to a new version of LLVM, and I get strange build errors.¶
Sometimes, changes to the LLVM source code alter how the build system works.
Changes in libtool
, autoconf
, or header file dependencies are
especially prone to this sort of problem.
The best thing to try is to remove the old files and re-build. In most cases,
this takes care of the problem. To do this, just type make clean
and then
make
in the directory that fails to build.
I’ve built LLVM and am testing it, but the tests freeze.¶
This is most likely occurring because you built a profile or release
(optimized) build of LLVM and have not specified the same information on the
gmake
command line.
For example, if you built LLVM with the command:
% gmake ENABLE_PROFILING=1
...then you must run the tests with the following commands:
% cd llvm/test
% gmake ENABLE_PROFILING=1
Why do test results differ when I perform different types of builds?¶
The LLVM test suite is dependent upon several features of the LLVM tools and libraries.
First, the debugging assertions in code are not enabled in optimized or profiling builds. Hence, tests that used to fail may pass.
Second, some tests may rely upon debugging options or behavior that is only available in the debug build. These tests will fail in an optimized or profile build.
Compiling LLVM with GCC 3.3.2 fails, what should I do?¶
This is a bug in GCC, and affects projects other than LLVM. Try upgrading or downgrading your GCC.
After Subversion update, rebuilding gives the error “No rule to make target”.¶
If the error is of the form:
gmake[2]: *** No rule to make target `/path/to/somefile',
needed by `/path/to/another/file.d'.
Stop.
This may occur anytime files are moved within the Subversion repository or
removed entirely. In this case, the best solution is to erase all .d
files, which list dependencies for source files, and rebuild:
% cd $LLVM_OBJ_DIR
% rm -f `find . -name \*\.d`
% gmake
In other cases, it may be necessary to run make clean
before rebuilding.
Source Languages¶
What source languages are supported?¶
LLVM currently has full support for C and C++ source languages. These are available through both Clang and DragonEgg.
The PyPy developers are working on integrating LLVM into the PyPy backend so that the PyPy language can translate to LLVM.
I’d like to write a self-hosting LLVM compiler. How should I interface with the LLVM middle-end optimizers and back-end code generators?¶
Your compiler front-end will communicate with LLVM by creating a module in the LLVM intermediate representation (IR) format. Assuming you want to write your language’s compiler in the language itself (rather than C++), there are 3 major ways to tackle generating LLVM IR from a front-end:
- Call into the LLVM libraries code using your language’s FFI (foreign function interface).
- for: best tracks changes to the LLVM IR, .ll syntax, and .bc format
- for: enables running LLVM optimization passes without an emit/parse overhead
- for: adapts well to a JIT context
- against: lots of ugly glue code to write
- Emit LLVM assembly from your compiler’s native language.
- for: very straightforward to get started
- against: the .ll parser is slower than the bitcode reader when interfacing to the middle end
- against: it may be harder to track changes to the IR
- Emit LLVM bitcode from your compiler’s native language.
- for: can use the more-efficient bitcode reader when interfacing to the middle end
- against: you’ll have to re-engineer the LLVM IR object model and bitcode writer in your language
- against: it may be harder to track changes to the IR
If you go with the first option, the C bindings in include/llvm-c should help a lot, since most languages have strong support for interfacing with C. The most common hurdle with calling C from managed code is interfacing with the garbage collector. The C interface was designed to require very little memory management, and so is straightforward in this regard.
What support is there for higher level source language constructs for building a compiler?¶
Currently, there isn’t much. LLVM supports an intermediate representation which is useful for code representation but will not support the high level (abstract syntax tree) representation needed by most compilers. There are no facilities for lexical or semantic analysis.
Using the C and C++ Front Ends¶
Can I compile C or C++ code to platform-independent LLVM bitcode?¶
No. C and C++ are inherently platform-dependent languages. The most obvious example of this is the preprocessor. A very common way that C code is made portable is by using the preprocessor to include platform-specific code. In practice, information about other platforms is lost after preprocessing, so the result is inherently dependent on the platform that the preprocessing was targeting.
Another example is sizeof
. It’s common for sizeof(long)
to vary
between platforms. In most C front-ends, sizeof
is expanded to a
constant immediately, thus hard-wiring a platform-specific detail.
Also, since many platforms define their ABIs in terms of C, and since LLVM is lower-level than C, front-ends currently must emit platform-specific IR in order to have the result conform to the platform ABI.
Questions about code generated by the demo page¶
What is this llvm.global_ctors
and _GLOBAL__I_a...
stuff that happens when I #include <iostream>
?¶
If you #include
the <iostream>
header into a C++ translation unit,
the file will probably use the std::cin
/std::cout
/... global objects.
However, C++ does not guarantee an order of initialization between static
objects in different translation units, so if a static ctor/dtor in your .cpp
file used std::cout
, for example, the object would not necessarily be
automatically initialized before your use.
To make std::cout
and friends work correctly in these scenarios, the STL
that we use declares a static object that gets created in every translation
unit that includes <iostream>
. This object has a static constructor
and destructor that initializes and destroys the global iostream objects
before they could possibly be used in the file. The code that you see in the
.ll
file corresponds to the constructor and destructor registration code.
If you would like to make it easier to understand the LLVM code generated
by the compiler in the demo page, consider using printf()
instead of
iostream
s to print values.
Where did all of my code go??¶
If you are using the LLVM demo page, you may often wonder what happened to all of the code that you typed in. Remember that the demo script is running the code through the LLVM optimizers, so if your code doesn’t actually do anything useful, it might all be deleted.
To prevent this, make sure that the code is actually needed. For example, if
you are computing some expression, return the value from the function instead
of leaving it in a local variable. If you really want to constrain the
optimizer, you can read from and assign to volatile
global variables.
What is this “undef
” thing that shows up in my code?¶
undef
is the LLVM way of representing a value that is not defined. You
can get these if you do not initialize a variable before you use it. For
example, the C function:
int X() { int i; return i; }
Is compiled to “ret i32 undef
” because “i
” never has a value specified
for it.
Why does instcombine + simplifycfg turn a call to a function with a mismatched calling convention into “unreachable”? Why not make the verifier reject it?¶
This is a common problem run into by authors of front-ends that are using custom calling conventions: you need to make sure to set the right calling convention on both the function and on each call to the function. For example, this code:
define fastcc void @foo() {
ret void
}
define void @bar() {
call void @foo()
ret void
}
Is optimized to:
define fastcc void @foo() {
ret void
}
define void @bar() {
unreachable
}
... with “opt -instcombine -simplifycfg
”. This often bites people because
“all their code disappears”. Setting the calling convention on the caller and
callee is required for indirect calls to work, so people often ask why not
make the verifier reject this sort of thing.
The answer is that this code has undefined behavior, but it is not illegal. If we made it illegal, then every transformation that could potentially create this would have to ensure that it doesn’t, and there is valid code that can create this sort of construct (in dead code). The sorts of things that can cause this to happen are fairly contrived, but we still need to accept them. Here’s an example:
define fastcc void @foo() {
ret void
}
define internal void @bar(void()* %FP, i1 %cond) {
br i1 %cond, label %T, label %F
T:
call void %FP()
ret void
F:
call fastcc void %FP()
ret void
}
define void @test() {
%X = or i1 false, false
call void @bar(void()* @foo, i1 %X)
ret void
}
In this example, “test” always passes @foo
/false
into bar
, which
ensures that it is dynamically called with the right calling conv (thus, the
code is perfectly well defined). If you run this through the inliner, you
get this (the explicit “or” is there so that the inliner doesn’t dead code
eliminate a bunch of stuff):
define fastcc void @foo() {
ret void
}
define void @test() {
%X = or i1 false, false
br i1 %X, label %T.i, label %F.i
T.i:
call void @foo()
br label %bar.exit
F.i:
call fastcc void @foo()
br label %bar.exit
bar.exit:
ret void
}
Here you can see that the inlining pass made an undefined call to @foo
with the wrong calling convention. We really don’t want to make the inliner
have to know about this sort of thing, so it needs to be valid code. In this
case, dead code elimination can trivially remove the undefined code. However,
if %X
was an input argument to @test
, the inliner would produce this:
define fastcc void @foo() {
ret void
}
define void @test(i1 %X) {
br i1 %X, label %T.i, label %F.i
T.i:
call void @foo()
br label %bar.exit
F.i:
call fastcc void @foo()
br label %bar.exit
bar.exit:
ret void
}
The interesting thing about this is that %X
must be false for the
code to be well-defined, but no amount of dead code elimination will be able
to delete the broken call as unreachable. However, since
instcombine
/simplifycfg
turns the undefined call into unreachable, we
end up with a branch on a condition that goes to unreachable: a branch to
unreachable can never happen, so “-inline -instcombine -simplifycfg
” is
able to produce:
define fastcc void @foo() {
ret void
}
define void @test(i1 %X) {
F.i:
call fastcc void @foo()
ret void
}
The LLVM Lexicon¶
Note
This document is a work in progress!
Definitions¶
A¶
- ADCE
- Aggressive Dead Code Elimination
- AST
Abstract Syntax Tree.
Due to Clang’s influence (mostly the fact that parsing and semantic analysis are so intertwined for C and especially C++), the typical working definition of AST in the LLVM community is roughly “the compiler’s first complete symbolic (as opposed to textual) representation of an input program”. As such, an “AST” might be a more general graph instead of a “tree” (consider the symbolic representation for the type of a typical “linked list node”). This working definition is closer to what some authors call an “annotated abstract syntax tree”.
Consult your favorite compiler book or search engine for more details.
B¶
- BB Vectorization
- Basic-Block Vectorization
- BURS
- Bottom Up Rewriting System — A method of instruction selection for code generation. An example is the BURG tool.
C¶
- CFI
- Call Frame Information. Used in DWARF debug info and in C++ unwind info to show how the function prolog lays out the stack frame.
- CIE
- Common Information Entry. A kind of CFI used to reduce the size of FDEs. The compiler creates a CIE which contains the information common across all the FDEs. Each FDE then points to its CIE.
- CSE
- Common Subexpression Elimination. An optimization that removes common
subexpression computation. For example
(a+b)*(a+b)
has two subexpressions that are the same:(a+b)
. This optimization would perform the addition only once and then perform the multiply (but only if it’s computationally correct/safe).
D¶
- DAG
- Directed Acyclic Graph
- Derived Pointer
- A pointer to the interior of an object, such that a garbage collector is unable to use the pointer for reachability analysis. While a derived pointer is live, the corresponding object pointer must be kept in a root, otherwise the collector might free the referenced object. With copying collectors, derived pointers pose an additional hazard that they may be invalidated at any safe point. This term is used in opposition to object pointer.
- DSA
- Data Structure Analysis
- DSE
- Dead Store Elimination
F¶
- FCA
- First Class Aggregate
- FDE
- Frame Description Entry. A kind of CFI used to describe the stack frame of one function.
G¶
- GC
- Garbage Collection. The practice of using reachability analysis instead of explicit memory management to reclaim unused memory.
H¶
- Heap
- In garbage collection, the region of memory which is managed using reachability analysis.
I¶
- IPA
- Inter-Procedural Analysis. Refers to any variety of code analysis that occurs between procedures, functions or compilation units (modules).
- IPO
- Inter-Procedural Optimization. Refers to any variety of code optimization that occurs between procedures, functions or compilation units (modules).
- ISel
- Instruction Selection
L¶
- LCSSA
- Loop-Closed Static Single Assignment Form
- LGTM
- “Looks Good To Me”. In a review thread, this indicates that the reviewer thinks that the patch is okay to commit.
- LICM
- Loop Invariant Code Motion
- LSDA
- Language Specific Data Area. C++ “zero cost” unwinding is built on top of a generic unwinding mechanism. As the unwinder walks each frame, it calls a “personality” function to do language specific analysis. Each function’s FDE points to an optional LSDA which is passed to the personality function. For C++, the LSDA contains info about the type and location of catch statements in that function.
- Load-VN
- Load Value Numbering
- LTO
- Link-Time Optimization
M¶
- MC
- Machine Code
N¶
- NFC
- “No functional change”. Used in a commit message to indicate that a patch is a pure refactoring/cleanup. Usually used in the first line, so it is visible without opening the actual commit email.
O¶
- Object Pointer
- A pointer to an object such that the garbage collector is able to trace references contained within the object. This term is used in opposition to derived pointer.
P¶
- PRE
- Partial Redundancy Elimination
R¶
RAUW
Replace All Uses With. The functionsUser::replaceUsesOfWith()
,Value::replaceAllUsesWith()
, andConstant::replaceUsesOfWithOnConstant()
implement the replacement of one Value with another by iterating over its def/use chain and fixing up all of the pointers to point to the new value. See also def/use chains.
- Reassociation
- Rearranging associative expressions to promote better redundancy elimination
and other optimization. For example, changing
(A+B-A)
into(B+A-A)
, permitting it to be optimized into(B+0)
then(B)
.
- Root
- In garbage collection, a pointer variable lying outside of the heap from which the collector begins its reachability analysis. In the context of code generation, “root” almost always refers to a “stack root” — a local or temporary variable within an executing function.
- RPO
- Reverse postorder
S¶
- Safe Point
- In garbage collection, it is necessary to identify stack roots so that reachability analysis may proceed. It may be infeasible to provide this information for every instruction, so instead the information is calculated only at designated safe points. With a copying collector, derived pointers must not be retained across safe points and object pointers must be reloaded from stack roots.
- SDISel
- Selection DAG Instruction Selection.
- SCC
- Strongly Connected Component
- SCCP
- Sparse Conditional Constant Propagation
- SLP
- Superword-Level Parallelism, same as Basic-Block Vectorization.
- SRoA
- Scalar Replacement of Aggregates
- SSA
- Static Single Assignment
- Stack Map
- In garbage collection, metadata emitted by the code generator which identifies roots within the stack frame of an executing function.
T¶
- TBAA
- Type-Based Alias Analysis
How To Add Your Build Configuration To LLVM Buildbot Infrastructure¶
Introduction¶
This document contains information about adding a build configuration and buildslave to the LLVM Buildbot infrastructure at http://lab.llvm.org:8011.
Steps To Add Builder To LLVM Buildbot¶
Volunteers can provide their build machines to work as build slaves to public LLVM Buildbot.
Here are the steps you can follow to do so:
Check the existing build configurations to make sure the one you are interested in is not covered yet or gets built on your computer much faster than on the existing one. We prefer faster builds so developers will get feedback sooner after changes get committed.
The computer you will be registering with the LLVM buildbot infrastructure should have all dependencies installed and should be able to actually build your configuration successfully. Please check what degree of parallelism (-j param) would give the fastest build. You can build multiple configurations on one computer.
Install buildslave (currently we are using buildbot version 0.8.5). Depending on the platform, buildslave could be available to download and install with your package manager, or you can download it directly from http://trac.buildbot.net and install it manually.
Create a designated user account that your buildslave will run under, and set appropriate permissions.
Choose the buildslave root directory (all builds will be placed under it), and the buildslave access name and password the build master will use to authenticate your buildslave.
Create a buildslave in the context of that buildslave account. Point it to the lab.llvm.org port 9990 (see Buildbot documentation, Creating a slave for more details) by running the following command:
$ buildslave create-slave <buildslave-root-directory> \
  lab.llvm.org:9990 \
  <buildslave-access-name> <buildslave-access-password>
Fill the buildslave description and admin name/e-mail. Here is an example of the buildslave description:
Windows 7 x64
Core i7 (2.66GHz), 16GB of RAM
g++.exe (TDM-1 mingw32) 4.4.0
GNU Binutils 2.19.1
cmake version 2.8.4
Microsoft(R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86
Make sure you can actually start the buildslave successfully. Then set up your buildslave to start automatically at the start up time. See the buildbot documentation for help. You may want to restart your computer to see if it works.
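For example (a sketch; the exact invocation may differ between buildbot versions), a manual start from the slave account looks like:

$ buildslave start <buildslave-root-directory>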
Send a patch which adds your build slave and your builder to zorg.
- slaves are added to
buildbot/osuosl/master/config/slaves.py
- builders are added to
buildbot/osuosl/master/config/builders.py
Send the buildslave access name and the access password directly to Galina Kistanova, and wait until she lets you know that your changes are applied and the buildmaster is reconfigured.
Check the status of your buildslave on the Waterfall Display to make sure it is connected, and check
http://lab.llvm.org:8011/buildslaves/<your-buildslave-name>
to see if the administrator contact and slave information are correct.
Wait for the first build to succeed and enjoy.
yaml2obj¶
yaml2obj takes a YAML description of an object file and converts it to a binary file.
$ yaml2obj input-file
Outputs the binary to stdout.
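For example (the file names are illustrative), you would typically redirect the output to a file:

$ yaml2obj input.yaml > output.obj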
COFF Syntax¶
Here’s a sample COFF file.
header:
Machine: IMAGE_FILE_MACHINE_I386 # (0x14C)
sections:
- Name: .text
Characteristics: [ IMAGE_SCN_CNT_CODE, IMAGE_SCN_ALIGN_16BYTES, IMAGE_SCN_MEM_EXECUTE, IMAGE_SCN_MEM_READ ] # 0x60500020
SectionData:
"\x83\xEC\x0C\xC7\x44\x24\x08\x00\x00\x00\x00\xC7\x04\x24\x00\x00\x00\x00\xE8\x00\x00\x00\x00\xE8\x00\x00\x00\x00\x8B\x44\x24\x08\x83\xC4\x0C\xC3" # |....D$.......$...............D$.....|
symbols:
- Name: .text
Value: 0
SectionNumber: 1
SimpleType: IMAGE_SYM_TYPE_NULL # (0)
ComplexType: IMAGE_SYM_DTYPE_NULL # (0)
StorageClass: IMAGE_SYM_CLASS_STATIC # (3)
NumberOfAuxSymbols: 1
AuxiliaryData:
"\x24\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00" # |$.................|
- Name: _main
Value: 0
SectionNumber: 1
SimpleType: IMAGE_SYM_TYPE_NULL # (0)
ComplexType: IMAGE_SYM_DTYPE_NULL # (0)
StorageClass: IMAGE_SYM_CLASS_EXTERNAL # (2)
Here’s a simplified Kwalify schema with an extension to allow alternate types.
type: map
mapping:
header:
type: map
mapping:
Machine: [ {type: str, enum:
[ IMAGE_FILE_MACHINE_UNKNOWN
, IMAGE_FILE_MACHINE_AM33
, IMAGE_FILE_MACHINE_AMD64
, IMAGE_FILE_MACHINE_ARM
, IMAGE_FILE_MACHINE_ARMNT
, IMAGE_FILE_MACHINE_EBC
, IMAGE_FILE_MACHINE_I386
, IMAGE_FILE_MACHINE_IA64
, IMAGE_FILE_MACHINE_M32R
, IMAGE_FILE_MACHINE_MIPS16
, IMAGE_FILE_MACHINE_MIPSFPU
, IMAGE_FILE_MACHINE_MIPSFPU16
, IMAGE_FILE_MACHINE_POWERPC
, IMAGE_FILE_MACHINE_POWERPCFP
, IMAGE_FILE_MACHINE_R4000
, IMAGE_FILE_MACHINE_SH3
, IMAGE_FILE_MACHINE_SH3DSP
, IMAGE_FILE_MACHINE_SH4
, IMAGE_FILE_MACHINE_SH5
, IMAGE_FILE_MACHINE_THUMB
, IMAGE_FILE_MACHINE_WCEMIPSV2
]}
, {type: int}
]
Characteristics:
- type: seq
sequence:
- type: str
enum: [ IMAGE_FILE_RELOCS_STRIPPED
, IMAGE_FILE_EXECUTABLE_IMAGE
, IMAGE_FILE_LINE_NUMS_STRIPPED
, IMAGE_FILE_LOCAL_SYMS_STRIPPED
, IMAGE_FILE_AGGRESSIVE_WS_TRIM
, IMAGE_FILE_LARGE_ADDRESS_AWARE
, IMAGE_FILE_BYTES_REVERSED_LO
, IMAGE_FILE_32BIT_MACHINE
, IMAGE_FILE_DEBUG_STRIPPED
, IMAGE_FILE_REMOVABLE_RUN_FROM_SWAP
, IMAGE_FILE_NET_RUN_FROM_SWAP
, IMAGE_FILE_SYSTEM
, IMAGE_FILE_DLL
, IMAGE_FILE_UP_SYSTEM_ONLY
, IMAGE_FILE_BYTES_REVERSED_HI
]
- type: int
sections:
type: seq
sequence:
- type: map
mapping:
Name: {type: str}
Characteristics:
- type: seq
sequence:
- type: str
enum: [ IMAGE_SCN_TYPE_NO_PAD
, IMAGE_SCN_CNT_CODE
, IMAGE_SCN_CNT_INITIALIZED_DATA
, IMAGE_SCN_CNT_UNINITIALIZED_DATA
, IMAGE_SCN_LNK_OTHER
, IMAGE_SCN_LNK_INFO
, IMAGE_SCN_LNK_REMOVE
, IMAGE_SCN_LNK_COMDAT
, IMAGE_SCN_GPREL
, IMAGE_SCN_MEM_PURGEABLE
, IMAGE_SCN_MEM_16BIT
, IMAGE_SCN_MEM_LOCKED
, IMAGE_SCN_MEM_PRELOAD
, IMAGE_SCN_ALIGN_1BYTES
, IMAGE_SCN_ALIGN_2BYTES
, IMAGE_SCN_ALIGN_4BYTES
, IMAGE_SCN_ALIGN_8BYTES
, IMAGE_SCN_ALIGN_16BYTES
, IMAGE_SCN_ALIGN_32BYTES
, IMAGE_SCN_ALIGN_64BYTES
, IMAGE_SCN_ALIGN_128BYTES
, IMAGE_SCN_ALIGN_256BYTES
, IMAGE_SCN_ALIGN_512BYTES
, IMAGE_SCN_ALIGN_1024BYTES
, IMAGE_SCN_ALIGN_2048BYTES
, IMAGE_SCN_ALIGN_4096BYTES
, IMAGE_SCN_ALIGN_8192BYTES
, IMAGE_SCN_LNK_NRELOC_OVFL
, IMAGE_SCN_MEM_DISCARDABLE
, IMAGE_SCN_MEM_NOT_CACHED
, IMAGE_SCN_MEM_NOT_PAGED
, IMAGE_SCN_MEM_SHARED
, IMAGE_SCN_MEM_EXECUTE
, IMAGE_SCN_MEM_READ
, IMAGE_SCN_MEM_WRITE
]
- type: int
SectionData: {type: str}
symbols:
type: seq
sequence:
- type: map
mapping:
Name: {type: str}
Value: {type: int}
SectionNumber: {type: int}
SimpleType: [ {type: str, enum: [ IMAGE_SYM_TYPE_NULL
, IMAGE_SYM_TYPE_VOID
, IMAGE_SYM_TYPE_CHAR
, IMAGE_SYM_TYPE_SHORT
, IMAGE_SYM_TYPE_INT
, IMAGE_SYM_TYPE_LONG
, IMAGE_SYM_TYPE_FLOAT
, IMAGE_SYM_TYPE_DOUBLE
, IMAGE_SYM_TYPE_STRUCT
, IMAGE_SYM_TYPE_UNION
, IMAGE_SYM_TYPE_ENUM
, IMAGE_SYM_TYPE_MOE
, IMAGE_SYM_TYPE_BYTE
, IMAGE_SYM_TYPE_WORD
, IMAGE_SYM_TYPE_UINT
, IMAGE_SYM_TYPE_DWORD
]}
, {type: int}
]
ComplexType: [ {type: str, enum: [ IMAGE_SYM_DTYPE_NULL
, IMAGE_SYM_DTYPE_POINTER
, IMAGE_SYM_DTYPE_FUNCTION
, IMAGE_SYM_DTYPE_ARRAY
]}
, {type: int}
]
StorageClass: [ {type: str, enum:
[ IMAGE_SYM_CLASS_END_OF_FUNCTION
, IMAGE_SYM_CLASS_NULL
, IMAGE_SYM_CLASS_AUTOMATIC
, IMAGE_SYM_CLASS_EXTERNAL
, IMAGE_SYM_CLASS_STATIC
, IMAGE_SYM_CLASS_REGISTER
, IMAGE_SYM_CLASS_EXTERNAL_DEF
, IMAGE_SYM_CLASS_LABEL
, IMAGE_SYM_CLASS_UNDEFINED_LABEL
, IMAGE_SYM_CLASS_MEMBER_OF_STRUCT
, IMAGE_SYM_CLASS_ARGUMENT
, IMAGE_SYM_CLASS_STRUCT_TAG
, IMAGE_SYM_CLASS_MEMBER_OF_UNION
, IMAGE_SYM_CLASS_UNION_TAG
, IMAGE_SYM_CLASS_TYPE_DEFINITION
, IMAGE_SYM_CLASS_UNDEFINED_STATIC
, IMAGE_SYM_CLASS_ENUM_TAG
, IMAGE_SYM_CLASS_MEMBER_OF_ENUM
, IMAGE_SYM_CLASS_REGISTER_PARAM
, IMAGE_SYM_CLASS_BIT_FIELD
, IMAGE_SYM_CLASS_BLOCK
, IMAGE_SYM_CLASS_FUNCTION
, IMAGE_SYM_CLASS_END_OF_STRUCT
, IMAGE_SYM_CLASS_FILE
, IMAGE_SYM_CLASS_SECTION
, IMAGE_SYM_CLASS_WEAK_EXTERNAL
, IMAGE_SYM_CLASS_CLR_TOKEN
]}
, {type: int}
]
How to submit an LLVM bug report¶
Introduction - Got bugs?¶
If you’re working with LLVM and run into a bug, we definitely want to know about it. This document describes what you can do to increase the odds of getting it fixed quickly.
Basically you have to do two things at a minimum. First, decide whether the bug crashes the compiler (or an LLVM pass), or if the compiler is miscompiling the program (i.e., the compiler successfully produces an executable, but it doesn’t run right). Based on what type of bug it is, follow the instructions in the linked section to narrow down the bug so that the person who fixes it will be able to find the problem more easily.
Once you have a reduced test-case, go to the LLVM Bug Tracking System and fill out the form with the necessary details (note that you don’t need to pick a category, just use the “new-bugs” category if you’re not sure). The bug description should contain the following information:
- All information necessary to reproduce the problem.
- The reduced test-case that triggers the bug.
- The location where you obtained LLVM (if not from our Subversion repository).
Thanks for helping us make LLVM better!
Crashing Bugs¶
More often than not, bugs in the compiler cause it to crash—often due to an assertion failure of some sort. The most important piece of the puzzle is to figure out if it is crashing in the GCC front-end or if it is one of the LLVM libraries (e.g. the optimizer or code generator) that has problems.
To figure out which component is crashing (the front-end, optimizer or code
generator), run the clang
command line as you were when the crash
occurred, but with the following extra command line options:
-O0 -emit-llvm
: Ifclang
still crashes when passed these options (which disable the optimizer and code generator), then the crash is in the front-end. Jump ahead to the section on front-end bugs.-emit-llvm
: Ifclang
crashes with this option (which disables the code generator), you found an optimizer bug. Jump ahead to compile-time optimization bugs.- Otherwise, you have a code generator crash. Jump ahead to code generator bugs.
Front-end bugs¶
If the problem is in the front-end, you should re-run the same clang
command that resulted in the crash, but add the -save-temps
option.
The compiler will crash again, but it will leave behind a foo.i
file
(containing preprocessed C source code) and possibly foo.s
for each
compiled foo.c
file. Send us the foo.i
file, along with the options
you passed to clang
, and a brief description of the error it caused.
The delta tool helps to reduce the preprocessed file down to the smallest amount of code that still replicates the problem. You’re encouraged to use delta to reduce the code to make the developers’ lives easier. This website has instructions on the best way to use delta.
Compile-time optimization bugs¶
If you find that a bug crashes in the optimizer, compile your test-case to a
.bc
file by passing “-emit-llvm -O0 -c -o foo.bc
”.
Then run:
opt -O3 -debug-pass=Arguments foo.bc -disable-output
This command should do two things: it should print out a list of passes, and then it should crash in the same way as clang. If it doesn’t crash, please follow the instructions for a front-end bug.
If this does crash, then you should be able to debug this with the following bugpoint command:
bugpoint foo.bc <list of passes printed by opt>
Please run this, then file a bug with the instructions and reduced .bc
files that bugpoint emits. If something goes wrong with bugpoint, please
submit the “foo.bc” file and the list of passes printed by opt
.
Code generator bugs¶
If you find a bug that crashes clang in the code generator, compile your
source file to a .bc file by passing “-emit-llvm -c -o foo.bc
” to
clang (in addition to the options you already pass). Once you have
foo.bc, one of the following commands should fail:
llc foo.bc
llc foo.bc -relocation-model=pic
llc foo.bc -relocation-model=static
If none of these crash, please follow the instructions for a front-end bug. If one of these does crash, you should be able to reduce this with one of the following bugpoint command lines (use the one corresponding to the command above that failed):
bugpoint -run-llc foo.bc
bugpoint -run-llc foo.bc --tool-args -relocation-model=pic
bugpoint -run-llc foo.bc --tool-args -relocation-model=static
Please run this, then file a bug with the instructions and reduced .bc file that bugpoint emits. If something goes wrong with bugpoint, please submit the “foo.bc” file and the option that llc crashes with.
Miscompilations¶
If clang successfully produces an executable, but that executable doesn’t run right, this is either a bug in the code or a bug in the compiler. The first thing to check is to make sure it is not using undefined behavior (e.g. reading a variable before it is defined). In particular, check to see if the program runs cleanly under valgrind, passes purify, or another memory checker tool. Many of the “LLVM bugs” that we have chased down ended up being bugs in the program being compiled, not LLVM.
Once you determine that the program itself is not buggy, you should choose which code generator you wish to compile the program with (e.g. LLC or the JIT) and optionally a series of LLVM passes to run. For example:
bugpoint -run-llc [... optzn passes ...] file-to-test.bc --args -- [program arguments]
bugpoint will try to narrow down your list of passes to the one pass that causes an error, and simplify the bitcode file as much as it can to assist you. It will print a message letting you know how to reproduce the resulting error.
Incorrect code generation¶
Similarly to debugging incorrect compilation by mis-behaving passes, you
can debug incorrect code generation by either LLC or the JIT, using
bugpoint
. The process bugpoint
follows in this case is to try to
narrow the code down to a function that is miscompiled by one or the other
method, but since for correctness, the entire program must be run,
bugpoint
will compile the code it deems to not be affected with the C
Backend, and then link in the shared object it generates.
To debug the JIT:
bugpoint -run-jit -output=[correct output file] [bitcode file] \
--tool-args -- [arguments to pass to lli] \
--args -- [program arguments]
Similarly, to debug the LLC, one would run:
bugpoint -run-llc -output=[correct output file] [bitcode file] \
--tool-args -- [arguments to pass to llc] \
--args -- [program arguments]
Special note: if you are debugging MultiSource or SPEC tests that
already exist in the llvm/test
hierarchy, there is an easier way to
debug the JIT, LLC, and CBE, using the pre-written Makefile targets, which
will pass the program options specified in the Makefiles:
cd llvm/test/../../program
make bugpoint-jit
At the end of a successful bugpoint
run, you will be presented
with two bitcode files: a safe file which can be compiled with the C
backend and the test file which either LLC or the JIT
mis-codegenerates, and thus causes the error.
To reproduce the error that bugpoint
found, it is sufficient to do
the following:
Regenerate the shared object from the safe bitcode file:
llc -march=c safe.bc -o safe.c
gcc -shared safe.c -o safe.so
If debugging LLC, compile test bitcode native and link with the shared object:
llc test.bc -o test.s
gcc test.s safe.so -o test.llc
./test.llc [program options]
If debugging the JIT, load the shared object and supply the test bitcode:
lli -load=safe.so test.bc [program options]
Sphinx Quickstart Template¶
Introduction and Quickstart¶
This document is meant to get you writing documentation as fast as possible even if you have no previous experience with Sphinx. The goal is to take someone in the state of “I want to write documentation and get it added to LLVM’s docs” and turn that into useful documentation mailed to llvm-commits with as little nonsense as possible.
You can find this document in docs/SphinxQuickstartTemplate.rst
. You
should copy it, open the new file in your text editor, write your docs, and
then send the new document to llvm-commits for review.
Focus on content. It is easy to fix the Sphinx (reStructuredText) syntax later if necessary, although reStructuredText tries to imitate common plain-text conventions so it should be quite natural. A basic knowledge of reStructuredText syntax is useful when writing the document, so the last ~half of this document (starting with Example Section) gives examples which should cover 99% of use cases.
Let me say that again: focus on content. But if you really need to verify
Sphinx’s output, see docs/README.txt
for information.
Once you have finished with the content, please send the .rst
file to
llvm-commits for review.
Guidelines¶
Try to answer the following questions in your first section:
- Why would I want to read this document?
- What should I know to be able to follow along with this document?
- What will I have learned by the end of this document?
Common names for the first section are Introduction
, Overview
, or
Background
.
If possible, make your document a “how to”. Give it a name HowTo*.rst
like the other “how to” documents. This format is usually the easiest
for another person to understand and also the most useful.
You generally should not be writing documentation other than a “how to” unless there is already a “how to” about your topic. The reason for this is that without a “how to” document to read first, it is difficult for a person to understand a more advanced document.
Focus on content (yes, I had to say it again).
The rest of this document shows example reStructuredText markup constructs that are meant to be read by you in your text editor after you have copied this file into a new file for the documentation you are about to write.
Example Section¶
Your text can be emphasized, bold, or monospace
.
Use blank lines to separate paragraphs.
Headings (like Example Section
just above) give your document its
structure. Use the same kind of adornments (e.g. ======
vs. ------
)
as are used in this document. The adornment must be the same length as the
text above it. For Vim users, variations of yypVr=
might be handy.
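For instance (a sketch; the adornment characters follow the conventions this document describes, with the underline exactly as long as the title text):

Example Section
===============

Example Subsection
------------------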
Example Subsection¶
Make a link like this. There is also a more
sophisticated syntax which can be more readable for longer links since
it disrupts the flow less. You can put the .. _`link text`: <URL>
block
pretty much anywhere later in the document.
Lists can be made like this:
- A list starting with
#.
will be automatically numbered. - This is a second list element.
- Use indentation to create nested lists.
You can also use unordered lists.
- Stuff.
- Deeper stuff.
- More stuff.
Example Subsubsection¶
You can make blocks of code like this:
int main() {
return 0;
}
For a shell session, use a console
code block (some existing docs use
bash
):
$ echo "Goodbye cruel world!"
$ rm -rf /
If you need to show LLVM IR use the llvm
code block.
define i32 @test1() {
entry:
ret i32 0
}
Some other common code blocks you might need are c
, objc
, make
,
and cmake
. If you need something beyond that, you can look at the full
list of supported code blocks.
However, don’t waste time fiddling with syntax highlighting when you could be adding meaningful content. When in doubt, show preformatted text without any syntax highlighting like this:
.
+:.
..:: ::
.++:+:: ::+:.:.
.:+ :
::.::..:: .+.
..:+ :: :
......+:. ..
:++. .. :
.+:::+:: :
.. . .+ ::
+.: .::+.
...+. .: .
.++:..
...
Hopefully you won’t need to be this deep¶
If you need to do fancier things than what has been shown in this document, you can mail the list or check Sphinx’s reStructuredText Primer.
Code Reviews with Phabricator¶
If you prefer to use a web user interface for code reviews, you can now submit your patches for Clang and LLVM at LLVM’s Phabricator instance.
While Phabricator is a useful tool for some, the relevant -commits mailing list is the system of record for all LLVM code review. The mailing list should be added as a subscriber on all reviews, and Phabricator users should be prepared to respond to free-form comments in mail sent to the commits list.
Sign up¶
To get started with Phabricator, navigate to http://reviews.llvm.org and click the power icon in the top right. You can register with a GitHub account, a Google account, or you can create your own profile.
Make sure that the email address registered with Phabricator is subscribed to the relevant -commits mailing list. If you are not subscribed to the commit list, all mail sent by Phabricator on your behalf will be held for moderation.
Note that if you use your Subversion user name as Phabricator user name, Phabricator will automatically connect your submits to your Phabricator user in the Code Repository Browser.
Requesting a review via the command line¶
Phabricator has a tool called Arcanist to upload patches from the command line. To get you set up, follow the Arcanist Quick Start instructions.
You can learn more about how to use arc to interact with Phabricator in the Arcanist User Guide.
Requesting a review via the web interface¶
The tool to create and review patches in Phabricator is called Differential.
Note that you can upload patches created through various diff tools, including git and svn. To make reviews easier, please always include as much context as possible with your diff! Don’t worry, Phabricator will automatically send a diff with a smaller context in the review email, but having the full file in the web interface will help the reviewer understand your code.
To get a full diff, use one of the following commands (or just use Arcanist to upload your patch):
git diff -U999999 other-branch
svn diff --diff-cmd=diff -x -U999999
To upload a new patch:
- Click Differential.
- Click + Create Diff.
- Paste the text diff or browse to the patch file. Click Create Diff.
- Leave the Repository field blank.
- Leave the drop down on Create a new Revision... and click Continue.
- Enter a descriptive title and summary. The title and summary are usually in the form of a commit message.
- Add reviewers and mailing lists that you want to be included in the review. If your patch is for LLVM, add llvm-commits as a Subscriber; if your patch is for Clang, add cfe-commits.
- Leave the Repository and Project fields blank.
- Click Save.
To submit an updated patch:
- Click Differential.
- Click + Create Diff.
- Paste the updated diff or browse to the updated patch file. Click Create Diff.
- Select the review you want to update from the Attach To dropdown and click Continue.
- Leave the Repository and Project fields blank.
- Add comments about the changes in the new diff. Click Save.
Reviewing code with Phabricator¶
Phabricator allows you to add inline comments as well as overall comments to a revision. To add an inline comment, select the lines of code you want to comment on by clicking and dragging the line numbers in the diff pane. When you have added all your comments, scroll to the bottom of the page and click the Submit button.
You can add overall comments in the text box at the bottom of the page. When you’re done, click the Submit button.
Phabricator has many useful features, for example allowing you to select diffs between different versions of the patch as it was reviewed in the Revision Update History. Most features are self descriptive - explore, and if you have a question, drop by on #llvm in IRC to get help.
Note that as e-mail is the system of reference for code reviews, and some people prefer it over a web interface, we do not generate automated mail when a review changes state, for example by clicking “Accept Revision” in the web interface. Thus, please type LGTM into the comment box to accept a change from Phabricator.
Committing a change¶
Arcanist can manage the commit transparently. It will retrieve the description, reviewers, the Differential Revision, etc. from the review and commit it to the repository.
arc patch D<Revision>
arc commit --revision D<Revision>
When committing an LLVM change that has been reviewed using Phabricator, the convention is for the commit message to end with the line:
Differential Revision: <URL>
where <URL>
is the URL for the code review, starting with
http://reviews.llvm.org/
.
Note that Arcanist will add this automatically.
This allows people reading the version history to see the review for context. This also allows Phabricator to detect the commit, close the review, and add a link from the review to the commit.
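A sketch of how the end of such a commit message might look (the summary line is invented for illustration; the revision URL keeps the placeholder form used above):

Fix a hypothetical crash when Foo is given an empty input.

Differential Revision: http://reviews.llvm.org/D<Revision>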
Abandoning a change¶
If you decide you should not commit the patch, you should explicitly abandon the review so that reviewers don’t think it is still open. In the web UI, scroll to the bottom of the page where normally you would enter an overall comment. In the drop-down Action list, which defaults to “Comment,” you should select “Abandon Revision” and then enter a comment explaining why. Click the Submit button to finish closing the review.
Status¶
Please let us know whether you like it and what could be improved! We’re still working on setting up a bug tracker, but you can email klimek-at-google-dot-com and chandlerc-at-gmail-dot-com and CC the llvm-dev mailing list with questions until then. We also could use help implementing improvements. This sadly is really painful and hard because the Phabricator codebase is in PHP and not as testable as you might like. However, we’ve put exactly what we’re deploying up on an llvm-reviews GitHub project where folks can hack on it and post pull requests. We’re looking into what the right long-term hosting for this is, but note that it is a derivative of an existing open source project, and so not trivially a good fit for an official LLVM project.
LLVM Testing Infrastructure Guide¶
LLVM test-suite Makefile Guide¶
Overview¶
This document describes the features of the Makefile-based LLVM test-suite. This way of interacting with the test-suite is deprecated in favor of running the test-suite using LNT, but may continue to prove useful for some users. See the Testing Guide’s test-suite Quickstart section for more information.
Test suite Structure¶
The test-suite
module contains a number of programs that can be
compiled with LLVM and executed. These programs are compiled using the
native compiler and various LLVM backends. The output from the program
compiled with the native compiler is assumed correct; the results from
the other programs are compared to the native program output and pass if
they match.
When executing tests, it is usually a good idea to start out with a
subset of the available tests or programs. This makes test run times
smaller at first and later on this is useful to investigate individual
test failures. To run tests on only a subset of programs, simply
change directory to the programs you want tested and run gmake
there. Alternatively, you can run a different test using the TEST
variable to change what tests are run on the selected programs (see below
for more info).
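For example (a sketch; the subdirectory shown is just one choice), to run only the SingleSource benchmarks:

% cd $LLVM_OBJ_ROOT/projects/test-suite/SingleSource/Benchmarks
% gmake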
In addition to testing correctness, the test-suite
directory also
performs timing tests of various LLVM optimizations. It also records
compilation times for the compilers and the JIT. This information can be
used to compare the effectiveness of LLVM’s optimizations and code
generation.
test-suite
tests are divided into three types of tests: MultiSource,
SingleSource, and External.
test-suite/SingleSource
The SingleSource directory contains test programs that are only a single source file in size. These are usually small benchmark programs or small programs that calculate a particular value. Several such programs are grouped together in each directory.
test-suite/MultiSource
The MultiSource directory contains subdirectories which contain entire programs with multiple source files. Large benchmarks and whole applications go here.
test-suite/External
The External directory contains Makefiles for building code that is external to (i.e., not distributed with) LLVM. The most prominent members of this directory are the SPEC 95 and SPEC 2000 benchmark suites. The
External
directory does not contain these actual tests, but only the Makefiles that know how to properly compile these programs from somewhere else. The presence and location of these external programs is configured by the test-suiteconfigure
script.
Each tree is then subdivided into several categories, including applications, benchmarks, regression tests, code that is strange grammatically, etc. These organizations should be relatively self explanatory.
Some tests are known to fail. Some are bugs that we have not fixed yet; others are features that we haven’t added yet (or may never add). In the regression tests, the result for such tests will be XFAIL (eXpected FAILure). In this way, you can tell the difference between an expected and unexpected failure.
The tests in the test suite have no such feature at this time. If the test passes, only warnings and other miscellaneous output will be generated. If a test fails, a large <program> FAILED message will be displayed. This will help you separate benign warnings from actual test failures.
Running the test suite¶
First, all tests are executed within the LLVM object directory tree. They are not executed inside of the LLVM source tree. This is because the test suite creates temporary files during execution.
To run the test suite, you need to use the following steps:
1. cd into the llvm/projects directory in your source tree.
2. Check out the test-suite module with:
% svn co http://llvm.org/svn/llvm-project/test-suite/trunk test-suite
This will get the test suite into llvm/projects/test-suite.
3. Configure and build llvm.
4. Configure and build llvm-gcc.
5. Install llvm-gcc somewhere.
6. Re-configure llvm from the top level of each build tree (LLVM object directory tree) in which you want to run the test suite, just as you do before building LLVM. During the re-configuration, you must either: (1) have the llvm-gcc you just built in your path, or (2) specify the directory where your just-built llvm-gcc is installed using --with-llvmgccdir=$LLVM_GCC_DIR.
You must also tell the configure machinery that the test suite is available so it can be configured for your build tree:
% cd $LLVM_OBJ_ROOT ; $LLVM_SRC_ROOT/configure [--with-llvmgccdir=$LLVM_GCC_DIR]
[Remember that $LLVM_GCC_DIR is the directory where you installed llvm-gcc, not its src or obj directory.]
7. You can now run the test suite from your build tree as follows:
% cd $LLVM_OBJ_ROOT/projects/test-suite
% make
Note that the second and third steps only need to be done once. After you have the suite checked out and configured, you don’t need to do it again (unless the test code or configure script changes).
Configuring External Tests¶
In order to run the External tests in the test-suite
module, you
must specify --with-externals. This must be done during the
re-configuration step (see above), and the llvm
re-configuration
must recognize the previously-built llvm-gcc
. If any of these is
missing or neglected, the External tests won’t work.
- --with-externals
- --with-externals=<directory>
This tells LLVM where to find any external tests. They are expected to
be in specifically named subdirectories of <directory>. If <directory>
is left unspecified, configure uses the default value
/home/vadve/shared/benchmarks/speccpu2000/benchspec. Subdirectory
names known to LLVM include:
- spec95
- speccpu2000
- speccpu2006
- povray31
Others are added from time to time, and can be determined from
configure
.
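For example, a re-configuration that enables the external tests might look like this (the benchmark path is illustrative):
% cd $LLVM_OBJ_ROOT
% $LLVM_SRC_ROOT/configure --with-externals=/path/to/benchmarks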
Running different tests¶
In addition to the regular “whole program” tests, the test-suite
module also provides a mechanism for compiling the programs in different
ways. If the variable TEST is defined on the gmake
command line, the
test system will include a Makefile named
TEST.<value of TEST variable>.Makefile
. This Makefile can modify
build rules to yield different results.
For example, the LLVM nightly tester uses TEST.nightly.Makefile
to
create the nightly test reports. To run the nightly tests, run
gmake TEST=nightly
.
There are several TEST Makefiles available in the tree. Some of them are designed for internal LLVM research and will not work outside of the LLVM research group. They may still be valuable, however, as a guide to writing your own TEST Makefile for any optimization or analysis passes that you develop with LLVM.
Generating test output¶
There are a number of ways to run the tests and generate output. The
simplest one is running gmake with no arguments. This will
compile and run all programs in the tree using a number of different
methods and compare results. Any failures are reported in the output,
but are likely drowned in the other output. Passes are not reported
explicitly.
Somewhat better is running gmake TEST=sometest test
, which runs the
specified test and usually adds per-program summaries to the output
(depending on which sometest you use). For example, the nightly
test
explicitly outputs TEST-PASS or TEST-FAIL for every test after each
program. Though these lines are still drowned in the output, it’s easy
to grep the output logs in the Output directories.
Even better are the report
and report.format
targets (where
format
is one of html
, csv
, text
or graphs
). The
exact contents of the report are dependent on which TEST
you are
running, but the text results are always shown at the end of the run and
the results are always stored in the report.<type>.format
file (when
running with TEST=<type>
). The report targets
also generate a file
called report.<type>.raw.out
containing the output of the entire
test run.
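For example, to produce an HTML report for the nightly test from your build tree:
% cd $LLVM_OBJ_ROOT/projects/test-suite
% gmake TEST=nightly report.html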
Writing custom tests for the test suite¶
Assuming you can run the test suite (e.g.
“gmake TEST=nightly report” should work), it is really easy to run
optimizations or code generator components against every program in the
tree, collecting statistics or running custom checks for correctness. At
base, this is how the nightly tester works; it’s just one example of a
general framework.
Let’s say that you have an LLVM optimization pass and you want to see how many times it triggers. The first thing you should do is add an LLVM statistic to your pass, which will tally counts of the things you care about.
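A minimal sketch of what that looks like in your pass source (the pass and counter names here are illustrative):
#include "llvm/ADT/Statistic.h"
// STATISTIC uses DEBUG_TYPE, so it must be defined first.
#define DEBUG_TYPE "mypass"
// Declares a counter that appears in the -stats output.
STATISTIC(NumTriggers, "The # of times my pass triggered");
// ... then, at each point where the transformation fires:
//   ++NumTriggers;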
Following this, you can set up a test and a report that collects these
and formats them for easy viewing. This consists of two files, a
“test-suite/TEST.XXX.Makefile
” fragment (where XXX is the name of
your test) and a “test-suite/TEST.XXX.report
” file that indicates
how to format the output into a table. There are many example reports of
various levels of sophistication included with the test suite, and the
framework is very general.
If you are interested in testing an optimization pass, check out the “libcalls” test as an example. It can be run like this:
% cd llvm/projects/test-suite/MultiSource/Benchmarks # or some other level
% make TEST=libcalls report
This will do a bunch of stuff, then eventually print a table like this:
Name | total | #exit |
...
FreeBench/analyzer/analyzer | 51 | 6 |
FreeBench/fourinarow/fourinarow | 1 | 1 |
FreeBench/neural/neural | 19 | 9 |
FreeBench/pifft/pifft | 5 | 3 |
MallocBench/cfrac/cfrac | 1 | * |
MallocBench/espresso/espresso | 52 | 12 |
MallocBench/gs/gs | 4 | * |
Prolangs-C/TimberWolfMC/timberwolfmc | 302 | * |
Prolangs-C/agrep/agrep | 33 | 12 |
Prolangs-C/allroots/allroots | * | * |
Prolangs-C/assembler/assembler | 47 | * |
Prolangs-C/bison/mybison | 74 | * |
...
This basically greps the -stats output and displays it in a table. You can also use the “TEST=libcalls report.html” target to get the table in HTML form, and similarly for report.csv and report.tex.
The source for this is in test-suite/TEST.libcalls.*
. The format is
pretty simple: the Makefile indicates how to run the test (in this case,
“opt -simplify-libcalls -stats
”), and the report contains one line
for each column of the output. The first value is the header for the
column and the second is the regex to grep the output of the command
for. There are lots of example reports that can do fancy stuff.
Overview¶
This document is the reference manual for the LLVM testing infrastructure. It documents the structure of the LLVM testing infrastructure, the tools needed to use it, and how to add and run tests.
Requirements¶
In order to use the LLVM testing infrastructure, you will need all of the software required to build LLVM, as well as Python 2.7 or later.
LLVM testing infrastructure organization¶
The LLVM testing infrastructure contains two major categories of tests:
regression tests and whole programs. The regression tests are contained
inside the LLVM repository itself under llvm/test
and are expected
to always pass – they should be run before every commit.
The whole programs tests are referred to as the “LLVM test suite” (or
“test-suite”) and are in the test-suite
module in Subversion. For
historical reasons, these tests are also referred to as the “nightly
tests” in places, which is less ambiguous than “test-suite” and remains
in use although we run them much more often than nightly.
Regression tests¶
The regression tests are small pieces of code that test a specific
feature of LLVM or trigger a specific bug in LLVM. The language they are
written in depends on the part of LLVM being tested. These tests are driven by
the Lit testing tool (which is part of LLVM), and
are located in the llvm/test
directory.
Typically when a bug is found in LLVM, a regression test containing just enough code to reproduce the problem should be written and placed somewhere underneath this directory. For example, it can be a small piece of LLVM IR distilled from an actual application or benchmark.
test-suite¶
The test suite contains whole programs, which are pieces of code which can be compiled and linked into a stand-alone program that can be executed. These programs are generally written in high level languages such as C or C++.
These programs are compiled using a user specified compiler and set of flags, and then executed to capture the program output and timing information. The output of these programs is compared to a reference output to ensure that the program is being compiled correctly.
In addition to compiling and executing programs, whole program tests serve as a way of benchmarking LLVM performance, both in terms of the efficiency of the programs generated as well as the speed with which LLVM compiles, optimizes, and generates code.
The test-suite is located in the test-suite
Subversion module.
Debugging Information tests¶
The test suite contains tests to check the quality of debugging information. The tests are written in C-based languages or in LLVM assembly language.
These tests are compiled and run under a debugger. The debugger output
is checked to validate the debugging information. See README.txt in the
test suite for more information. This test suite is located in the
debuginfo-tests
Subversion module.
Quick start¶
The tests are located in two separate Subversion modules. The
regression tests are in the main “llvm” module under the directory
llvm/test
(so you get these tests for free with the main LLVM tree).
Use make check-all
to run the regression tests after building LLVM.
The more comprehensive test suite that includes whole programs in C and C++
is in the test-suite
module. See test-suite Quickstart for more information on running these tests.
Regression tests¶
To run all of the LLVM regression tests, use the master Makefile in the
llvm/test
directory. LLVM Makefiles require GNU Make (read the LLVM
Makefile Guide for more details):
% make -C llvm/test
or:
% make check
If you have Clang checked out and built, you can run the LLVM and Clang tests simultaneously using:
% make check-all
To run the tests with Valgrind (Memcheck by default), use the LIT_ARGS
make
variable to pass the required options to lit. For example, you can use:
% make check LIT_ARGS="-v --vg --vg-leak"
to enable testing with valgrind and with leak checking enabled.
To run individual tests or subsets of tests, you can use the llvm-lit
script which is built as part of LLVM. For example, to run the
Integer/BitPacked.ll
test by itself you can run:
% llvm-lit ~/llvm/test/Integer/BitPacked.ll
or to run all of the ARM CodeGen tests:
% llvm-lit ~/llvm/test/CodeGen/ARM
For more information on using the lit tool, see llvm-lit --help
or the lit man page.
Debugging Information tests¶
To run the debugging information tests, simply check out the tests inside the clang/test directory:
% cd clang/test
% svn co http://llvm.org/svn/llvm-project/debuginfo-tests/trunk debuginfo-tests
These tests are already set up to run as part of clang regression tests.
Regression test structure¶
The LLVM regression tests are driven by lit and are located in the
llvm/test
directory.
This directory contains a large array of small tests that exercise various features of LLVM and ensure that regressions do not occur. The directory is broken into several sub-directories, each focused on a particular area of LLVM.
Writing new regression tests¶
The regression test structure is very simple, but does require some
information to be set. This information is gathered via configure
and is written to a file, test/lit.site.cfg
in the build directory.
The llvm/test
Makefile does this work for you.
In order for the regression tests to work, each directory of tests must
have a lit.local.cfg
file. lit looks for this file to determine
how to run the tests. This file is just Python code and thus is very
flexible, but we’ve standardized it for the LLVM regression tests. If
you’re adding a directory of tests, just copy lit.local.cfg
from
another directory to get running. The standard lit.local.cfg
simply
specifies which files to look in for tests. Any directory that contains
only directories does not need the lit.local.cfg
file. Read the Lit
documentation for more information.
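For example, a minimal lit.local.cfg might contain just a single line (the suffix list here is illustrative):
# Tell lit which file extensions in this directory are tests.
config.suffixes = ['.ll', '.c', '.cpp']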
Each test file must contain lines starting with “RUN:” that tell lit how to run it. If there are no RUN lines, lit will issue an error while running a test.
RUN lines are specified in the comments of the test program using the
keyword RUN
followed by a colon, and lastly the command (pipeline)
to execute. Together, these lines form the “script” that lit
executes to run the test case. The syntax of the RUN lines is similar to a
shell’s syntax for pipelines including I/O redirection and variable
substitution. However, even though these lines may look like a shell
script, they are not. RUN lines are interpreted by lit.
Consequently, the syntax differs from shell in a few ways. You can specify
as many RUN lines as needed.
lit performs substitution on each RUN line to replace LLVM tool names
with the full paths to the executable built for each tool (in
$(LLVM_OBJ_ROOT)/$(BuildMode)/bin)
. This ensures that lit does
not invoke any stray LLVM tools in the user’s path during testing.
Each RUN line is executed on its own, distinct from other lines unless
its last character is \
. This continuation character causes the RUN
line to be concatenated with the next one. In this way you can build up
long pipelines of commands without making huge line lengths. The lines
ending in \
are concatenated until a RUN line that doesn’t end in
\
is found. This concatenated set of RUN lines then constitutes one
execution. lit will substitute variables and arrange for the pipeline
to be executed. If any process in the pipeline fails, the entire line (and
test case) fails too.
Below is an example of legal RUN lines in a .ll
file:
; RUN: llvm-as < %s | llvm-dis > %t1
; RUN: llvm-dis < %s.bc-13 > %t2
; RUN: diff %t1 %t2
As with a Unix shell, the RUN lines permit pipelines and I/O redirection to be used.
There are some quoting rules that you must pay attention to when writing your RUN lines. In general nothing needs to be quoted. lit won’t strip off any quote characters, so they will get passed through to the invoked program. To avoid this, use curly braces to tell lit that it should treat everything enclosed as one value.
In general, you should strive to keep your RUN lines as simple as possible, using them only to run tools that generate textual output you can then examine. The recommended way to examine output to figure out if the test passes is using the FileCheck tool. [The usage of grep in RUN lines is deprecated - please do not send or commit patches that use it.]
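For example, a small FileCheck-based test might look like this (a sketch; it assumes -instcombine folds the add of zero):
; RUN: opt < %s -instcombine -S | FileCheck %s
; CHECK-LABEL: @add_zero(
; CHECK-NEXT: ret i32 %x
define i32 @add_zero(i32 %x) {
  %r = add i32 %x, 0
  ret i32 %r
}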
Extra files¶
If your test requires extra files besides the file containing the RUN:
lines, the idiomatic place to put them is in a subdirectory Inputs
.
You can then refer to the extra files as %S/Inputs/foo.bar
.
For example, consider test/Linker/ident.ll
. The directory structure is
as follows:
test/
Linker/
ident.ll
Inputs/
ident.a.ll
ident.b.ll
For convenience, these are the contents:
;;;;; ident.ll:
; RUN: llvm-link %S/Inputs/ident.a.ll %S/Inputs/ident.b.ll -S | FileCheck %s
; Verify that multiple input llvm.ident metadata are linked together.
; CHECK-DAG: !llvm.ident = !{!0, !1, !2}
; CHECK-DAG: "Compiler V1"
; CHECK-DAG: "Compiler V2"
; CHECK-DAG: "Compiler V3"
;;;;; Inputs/ident.a.ll:
!llvm.ident = !{!0, !1}
!0 = metadata !{metadata !"Compiler V1"}
!1 = metadata !{metadata !"Compiler V2"}
;;;;; Inputs/ident.b.ll:
!llvm.ident = !{!0}
!0 = metadata !{metadata !"Compiler V3"}
For symmetry reasons, ident.ll
is just a dummy file that doesn’t
actually participate in the test besides holding the RUN:
lines.
Note
Some existing tests use RUN: true
in extra files instead of just
putting the extra files in an Inputs/
directory. This pattern is
deprecated.
Fragile tests¶
It is easy to write a fragile test that would fail spuriously if the tool being
tested outputs a full path to the input file. For example, opt by
default outputs a ModuleID
:
$ cat example.ll
define i32 @main() nounwind {
ret i32 0
}
$ opt -S /path/to/example.ll
; ModuleID = '/path/to/example.ll'
define i32 @main() nounwind {
ret i32 0
}
ModuleID
can unexpectedly match against CHECK
lines. For example:
; RUN: opt -S %s | FileCheck %s
define i32 @main() nounwind {
; CHECK-NOT: load
ret i32 0
}
This test will fail if placed into a download
directory.
To make your tests robust, always use opt ... < %s
in the RUN line.
opt does not output a ModuleID
when input comes from stdin.
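For example, a robust version of the fragile RUN line above would be:
; RUN: opt -S < %s | FileCheck %s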
Platform-Specific Tests¶
Whenever you add a test that requires knowledge of a specific platform, whether related to generated code, specific output, or back-end features, you must make sure to isolate those features, so that buildbots running on different architectures (which may not even compile all back-ends) don’t fail.
The first problem is checking for target-specific output, such as sizes of structures, paths, and architecture names. For example:
- Tests containing Windows paths will fail on Linux and vice-versa.
- Tests that check for x86_64 somewhere in the text will fail anywhere else.
- Tests where the debug information calculates the size of types and structures.
Also, if the test relies on any behaviour that is coded in any back-end, it must
go in its own directory. So, for instance, code generator tests for ARM go
into test/CodeGen/ARM and so on. Those directories contain a special
lit configuration file that ensures all tests in that directory will
only run if a specific back-end is compiled and available.
For instance, in test/CodeGen/ARM
, the lit.local.cfg
is:
config.suffixes = ['.ll', '.c', '.cpp', '.test']
if not 'ARM' in config.root.targets:
config.unsupported = True
Other platform-specific tests are those that depend on a specific feature
of a specific sub-architecture, for example, only Intel chips that support AVX2.
For instance, test/CodeGen/X86/psubus.ll
tests three sub-architecture
variants:
; RUN: llc -mcpu=core2 < %s | FileCheck %s -check-prefix=SSE2
; RUN: llc -mcpu=corei7-avx < %s | FileCheck %s -check-prefix=AVX1
; RUN: llc -mcpu=core-avx2 < %s | FileCheck %s -check-prefix=AVX2
And the checks are different:
; SSE2: @test1
; SSE2: psubusw LCPI0_0(%rip), %xmm0
; AVX1: @test1
; AVX1: vpsubusw LCPI0_0(%rip), %xmm0, %xmm0
; AVX2: @test1
; AVX2: vpsubusw LCPI0_0(%rip), %xmm0, %xmm0
So, if you’re testing for a behaviour that you know is platform-specific or depends on special features of sub-architectures, you must add the specific triple, test with the specific FileCheck and put it into the specific directory that will filter out all other architectures.
Substitutions¶
Besides replacing LLVM tool names, the following substitutions are performed in RUN lines:
%%
- Replaced by a single %. This allows escaping other substitutions.
%s
- File path to the test case’s source. This is suitable for passing on the command line as the input to an LLVM tool. Example: /home/user/llvm/test/MC/ELF/foo_test.s
%S
- Directory path to the test case’s source. Example: /home/user/llvm/test/MC/ELF
%t
- File path to a temporary file name that could be used for this test case. The file name won’t conflict with other test cases. You can append to it if you need multiple temporaries. This is useful as the destination of some redirected output. Example: /home/user/llvm.build/test/MC/ELF/Output/foo_test.s.tmp
%T
- Directory of %t. Example: /home/user/llvm.build/test/MC/ELF/Output
%{pathsep}
- Expands to the path separator, i.e. : (or ; on Windows).
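For example, a pair of RUN lines that use %s and %t together to round-trip a test file through the bitcode format (the function is illustrative):
; RUN: llvm-as < %s > %t.bc
; RUN: llvm-dis < %t.bc | FileCheck %s
; CHECK: @foo
define void @foo() {
  ret void
}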
LLVM-specific substitutions:
%shlibext
- The suffix for the host platform’s shared library files. This includes the period as the first character. Example: .so (Linux), .dylib (OS X), .dll (Windows)
%exeext
- The suffix for the host platform’s executable files. This includes the period as the first character. Example: .exe (Windows), empty on Linux.
%(line), %(line+<number>), %(line-<number>)
- The number of the line where this substitution is used, with an optional integer offset. This can be used in tests with multiple RUN lines, which reference the test file’s line numbers.
Clang-specific substitutions:
%clang
- Invokes the Clang driver.
%clang_cpp
- Invokes the Clang driver for C++.
%clang_cl
- Invokes the CL-compatible Clang driver.
%clangxx
- Invokes the G++-compatible Clang driver.
%clang_cc1
- Invokes the Clang frontend.
%itanium_abi_triple, %ms_abi_triple
- These substitutions can be used to get the current target triple adjusted to the desired ABI. For example, if the test suite is running with the i686-pc-win32 target, %itanium_abi_triple will expand to i686-pc-mingw32. This allows a test to run with a specific ABI without constraining it to a specific triple.
To add more substitutions, look at test/lit.cfg or lit.local.cfg.
Options¶
The llvm lit configuration allows you to customize some things with user options:
llc, opt, ...
- Substitute the respective llvm tool name with a custom command line. This allows you to specify custom paths and default arguments for these tools. Example:
% llvm-lit "-Dllc=llc -verify-machineinstrs"
run_long_tests
- Enable the execution of long running tests.
llvm_site_config
- Load the specified lit configuration instead of the default one.
Other Features¶
To make RUN line writing easier, there are several helper programs. These helpers are in the PATH when running tests, so you can just call them using their name. For example:
not
- This program runs its arguments and then inverts the result code from it. Zero result codes become 1. Non-zero result codes become 0.
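For example, not can be combined with FileCheck to verify that a tool rejects invalid input (a sketch; it assumes the rest of the file contains deliberately malformed IR and that llvm-as emits a diagnostic containing "error:"):
; RUN: not llvm-as < %s 2>&1 | FileCheck %s
; CHECK: error: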
Sometimes it is necessary to mark a test case as “expected fail” or
XFAIL. You can easily mark a test as XFAIL just by including XFAIL:
on a line near the top of the file. This signals that the run should be
counted as a success if the test case fails. Such test cases are counted
separately by the testing tool. To specify an expected fail, use the XFAIL keyword
in the comments of the test program followed by a colon and one or more
failure patterns. Each failure pattern can be either *
(to specify
fail everywhere), or a part of a target triple (indicating the test
should fail on that platform), or the name of a configurable feature
(for example, loadable_module
). If there is a match, the test is
expected to fail. If not, the test is expected to succeed. To XFAIL
everywhere just specify XFAIL: *
. Here is an example of an XFAIL
line:
; XFAIL: darwin,sun
To make the output more useful, lit will scan
the lines of the test case for ones that contain a pattern that matches
PR[0-9]+
. This is the syntax for specifying a PR (Problem Report) number
that is related to the test case. The number after “PR” specifies the
LLVM bugzilla number. When a PR number is specified, it will be used in
the pass/fail reporting. This is useful to quickly get some context when
a test fails.
Finally, any line that contains “END.” will cause the special interpretation of lines to terminate. This is generally done right after the last RUN: line. This has two side effects:
- it prevents special interpretation of lines that are part of the test program, not the instructions to the test case, and
- it speeds things up for really big test cases by avoiding interpretation of the remainder of the file.
test-suite Overview¶
The test-suite
module contains a number of programs that can be
compiled and executed. The test-suite
includes reference outputs for
all of the programs, so that the output of the executed program can be
checked for correctness.
test-suite
tests are divided into three types of tests: MultiSource,
SingleSource, and External.
test-suite/SingleSource
The SingleSource directory contains test programs that are only a single source file in size. These are usually small benchmark programs or small programs that calculate a particular value. Several such programs are grouped together in each directory.
test-suite/MultiSource
The MultiSource directory contains subdirectories which contain entire programs with multiple source files. Large benchmarks and whole applications go here.
test-suite/External
The External directory contains Makefiles for building code that is external to (i.e., not distributed with) LLVM. The most prominent members of this directory are the SPEC 95 and SPEC 2000 benchmark suites. The
External
directory does not contain these actual tests, but only the Makefiles that know how to properly compile these programs from somewhere else. When using LNT, use the --test-externals option to include these tests in the results.
test-suite Quickstart¶
The modern way of running the test-suite
is focused on testing and
benchmarking complete compilers using the
LNT testing infrastructure.
For more information on using LNT to execute the test-suite
, please
see the LNT Quickstart
documentation.
test-suite Makefiles¶
Historically, the test-suite
was executed using a complicated setup
of Makefiles. The LNT based approach above is recommended for most
users, but there are some testing scenarios which are not supported by
the LNT approach. In addition, LNT currently uses the Makefile setup
under the covers and so developers who are interested in how LNT works
under the hood may want to understand the Makefile based setup.
For more information on the test-suite
Makefile setup, please see
the Test Suite Makefile Guide.
LLVM Tutorial: Table of Contents¶
Kaleidoscope: Implementing a Language with LLVM¶
Kaleidoscope: Tutorial Introduction and the Lexer¶
Tutorial Introduction¶
Welcome to the “Implementing a language with LLVM” tutorial. This tutorial runs through the implementation of a simple language, showing how fun and easy it can be. This tutorial will get you up and running, and will help build a framework you can extend to other languages. The code in this tutorial can also be used as a playground to hack on other LLVM-specific things.
The goal of this tutorial is to progressively unveil our language, describing how it is built up over time. This will let us cover a fairly broad range of language design and LLVM-specific usage issues, showing and explaining the code for it all along the way, without overwhelming you with tons of details up front.
It is useful to point out ahead of time that this tutorial is really about teaching compiler techniques and LLVM specifically, not about teaching modern and sane software engineering principles. In practice, this means that we’ll take a number of shortcuts to simplify the exposition. For example, the code leaks memory, uses global variables all over the place, doesn’t use nice design patterns like visitors, etc... but it is very simple. If you dig in and use the code as a basis for future projects, fixing these deficiencies shouldn’t be hard.
I’ve tried to put this tutorial together in a way that makes chapters easy to skip over if you are already familiar with or are uninterested in the various pieces. The structure of the tutorial is:
- Chapter #1: Introduction to the Kaleidoscope language, and the definition of its Lexer - This shows where we are going and the basic functionality that we want it to do. In order to make this tutorial maximally understandable and hackable, we choose to implement everything in C++ instead of using lexer and parser generators. LLVM obviously works just fine with such tools, feel free to use one if you prefer.
- Chapter #2: Implementing a Parser and AST - With the lexer in place, we can talk about parsing techniques and basic AST construction. This tutorial describes recursive descent parsing and operator precedence parsing. Nothing in Chapters 1 or 2 is LLVM-specific; the code doesn’t even link in LLVM at this point. :)
- Chapter #3: Code generation to LLVM IR - With the AST ready, we can show off how easy generation of LLVM IR really is.
- Chapter #4: Adding JIT and Optimizer Support - Because a lot of people are interested in using LLVM as a JIT, we’ll dive right into it and show you the 3 lines it takes to add JIT support. LLVM is also useful in many other ways, but this is one simple and “sexy” way to show off its power. :)
- Chapter #5: Extending the Language: Control Flow - With the language up and running, we show how to extend it with control flow operations (if/then/else and a ‘for’ loop). This gives us a chance to talk about simple SSA construction and control flow.
- Chapter #6: Extending the Language: User-defined Operators - This is a silly but fun chapter that talks about extending the language to let the user program define their own arbitrary unary and binary operators (with assignable precedence!). This lets us build a significant piece of the “language” as library routines.
- Chapter #7: Extending the Language: Mutable Variables - This chapter talks about adding user-defined local variables along with an assignment operator. The interesting part about this is how easy and trivial it is to construct SSA form in LLVM: no, LLVM does not require your front-end to construct SSA form!
- Chapter #8: Extending the Language: Debug Information - Having built a decent little programming language with control flow, functions and mutable variables, we consider what it takes to add debug information to standalone executables. This debug information will allow you to set breakpoints in Kaleidoscope functions, print out argument variables, and call functions - all from within the debugger!
- Chapter #9: Conclusion and other useful LLVM tidbits - This chapter wraps up the series by talking about potential ways to extend the language, but also includes a bunch of pointers to info about “special topics” like adding garbage collection support, exceptions, debugging, support for “spaghetti stacks”, and a bunch of other tips and tricks.
By the end of the tutorial, we’ll have written a bit less than 1000 non-comment, non-blank lines of code. With this small amount of code, we’ll have built up a very reasonable compiler for a non-trivial language, including a hand-written lexer, parser, and AST, as well as code generation support with a JIT compiler. While other systems may have interesting “hello world” tutorials, I think the breadth of this tutorial is a great testament to the strengths of LLVM and why you should consider it if you’re interested in language or compiler design.
A note about this tutorial: we expect you to extend the language and play with it on your own. Take the code and go crazy hacking away at it, compilers don’t need to be scary creatures - it can be a lot of fun to play with languages!
The Basic Language¶
This tutorial will be illustrated with a toy language that we’ll call “Kaleidoscope” (derived from “meaning beautiful, form, and view”). Kaleidoscope is a procedural language that allows you to define functions, use conditionals, math, etc. Over the course of the tutorial, we’ll extend Kaleidoscope to support the if/then/else construct, a for loop, user defined operators, JIT compilation with a simple command line interface, etc.
Because we want to keep things simple, the only datatype in Kaleidoscope is a 64-bit floating point type (aka ‘double’ in C parlance). As such, all values are implicitly double precision and the language doesn’t require type declarations. This gives the language a very nice and simple syntax. For example, the following simple example computes Fibonacci numbers:
# Compute the x'th fibonacci number.
def fib(x)
if x < 3 then
1
else
fib(x-1)+fib(x-2)
# This expression will compute the 40th number.
fib(40)
We also allow Kaleidoscope to call into standard library functions (the LLVM JIT makes this completely trivial). This means that you can use the ‘extern’ keyword to define a function before you use it (this is also useful for mutually recursive functions). For example:
extern sin(arg);
extern cos(arg);
extern atan2(arg1 arg2);
atan2(sin(.4), cos(42))
A more interesting example is included in Chapter 6 where we write a little Kaleidoscope application that displays a Mandelbrot Set at various levels of magnification.
Let’s dive into the implementation of this language!
The Lexer¶
When it comes to implementing a language, the first thing needed is the ability to process a text file and recognize what it says. The traditional way to do this is to use a “lexer” (aka ‘scanner’) to break the input up into “tokens”. Each token returned by the lexer includes a token code and potentially some metadata (e.g. the numeric value of a number). First, we define the possibilities:
// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
// of these for known things.
enum Token {
tok_eof = -1,
// commands
tok_def = -2, tok_extern = -3,
// primary
tok_identifier = -4, tok_number = -5,
};
static std::string IdentifierStr; // Filled in if tok_identifier
static double NumVal; // Filled in if tok_number
Each token returned by our lexer will either be one of the Token enum
values or it will be an ‘unknown’ character like ‘+’, which is returned
as its ASCII value. If the current token is an identifier, the
IdentifierStr
global variable holds the name of the identifier. If
the current token is a numeric literal (like 1.0), NumVal
holds its
value. Note that we use global variables for simplicity; this is not the
best choice for a real language implementation :).
The actual implementation of the lexer is a single function named
gettok
. The gettok
function is called to return the next token
from standard input. Its definition starts as:
/// gettok - Return the next token from standard input.
static int gettok() {
static int LastChar = ' ';
// Skip any whitespace.
while (isspace(LastChar))
LastChar = getchar();
gettok
works by calling the C getchar()
function to read
characters one at a time from standard input. It eats them as it
recognizes them and stores the last character read, but not processed,
in LastChar. The first thing that it has to do is ignore whitespace
between tokens. This is accomplished with the loop above.
The next thing gettok
needs to do is recognize identifiers and
specific keywords like “def”. Kaleidoscope does this with this simple
loop:
if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
IdentifierStr = LastChar;
while (isalnum((LastChar = getchar())))
IdentifierStr += LastChar;
if (IdentifierStr == "def") return tok_def;
if (IdentifierStr == "extern") return tok_extern;
return tok_identifier;
}
Note that this code sets the ‘IdentifierStr
‘ global whenever it
lexes an identifier. Also, since language keywords are matched by the
same loop, we handle them here inline. Numeric values are similar:
if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
std::string NumStr;
do {
NumStr += LastChar;
LastChar = getchar();
} while (isdigit(LastChar) || LastChar == '.');
NumVal = strtod(NumStr.c_str(), 0);
return tok_number;
}
This is all pretty straight-forward code for processing input. When
reading a numeric value from input, we use the C strtod
function to
convert it to a numeric value that we store in NumVal
. Note that
this isn’t doing sufficient error checking: it will incorrectly read
“1.23.45.67” and handle it as if you typed in “1.23”. Feel free to
extend it :). Next we handle comments:
if (LastChar == '#') {
// Comment until end of line.
do LastChar = getchar();
while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
if (LastChar != EOF)
return gettok();
}
We handle comments by skipping to the end of the line and then returning the next token. Finally, if the input doesn’t match one of the above cases, it is either an operator character like ‘+’ or the end of the file. These are handled with this code:
// Check for end of file. Don't eat the EOF.
if (LastChar == EOF)
return tok_eof;
// Otherwise, just return the character as its ascii value.
int ThisChar = LastChar;
LastChar = getchar();
return ThisChar;
}
With this, we have the complete lexer for the basic Kaleidoscope language (the full code listing for the Lexer is available in the next chapter of the tutorial). Next we’ll build a simple parser that uses this to build an Abstract Syntax Tree. When we have that, we’ll include a driver so that you can use the lexer and parser together.
Kaleidoscope: Implementing a Parser and AST¶
Chapter 2 Introduction¶
Welcome to Chapter 2 of the “Implementing a language with LLVM” tutorial. This chapter shows you how to use the lexer, built in Chapter 1, to build a full parser for our Kaleidoscope language. Once we have a parser, we’ll define and build an Abstract Syntax Tree (AST).
The parser we will build uses a combination of Recursive Descent Parsing and Operator-Precedence Parsing to parse the Kaleidoscope language (the latter for binary expressions and the former for everything else). Before we get to parsing though, lets talk about the output of the parser: the Abstract Syntax Tree.
The Abstract Syntax Tree (AST)¶
The AST for a program captures its behavior in such a way that it is easy for later stages of the compiler (e.g. code generation) to interpret. We basically want one object for each construct in the language, and the AST should closely model the language. In Kaleidoscope, we have expressions, a prototype, and a function object. We’ll start with expressions first:
/// ExprAST - Base class for all expression nodes.
class ExprAST {
public:
virtual ~ExprAST() {}
};
/// NumberExprAST - Expression class for numeric literals like "1.0".
class NumberExprAST : public ExprAST {
double Val;
public:
NumberExprAST(double val) : Val(val) {}
};
The code above shows the definition of the base ExprAST class and one subclass which we use for numeric literals. The important thing to note about this code is that the NumberExprAST class captures the numeric value of the literal as an instance variable. This allows later phases of the compiler to know what the stored numeric value is.
Right now we only create the AST, so there are no useful accessor methods on them. It would be very easy to add a virtual method to pretty print the code, for example. Here are the other expression AST node definitions that we’ll use in the basic form of the Kaleidoscope language:
/// VariableExprAST - Expression class for referencing a variable, like "a".
class VariableExprAST : public ExprAST {
std::string Name;
public:
VariableExprAST(const std::string &name) : Name(name) {}
};
/// BinaryExprAST - Expression class for a binary operator.
class BinaryExprAST : public ExprAST {
char Op;
ExprAST *LHS, *RHS;
public:
BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
: Op(op), LHS(lhs), RHS(rhs) {}
};
/// CallExprAST - Expression class for function calls.
class CallExprAST : public ExprAST {
std::string Callee;
std::vector<ExprAST*> Args;
public:
CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
: Callee(callee), Args(args) {}
};
This is all (intentionally) rather straight-forward: variables capture the variable name, binary operators capture their opcode (e.g. ‘+’), and calls capture a function name as well as a list of any argument expressions. One thing that is nice about our AST is that it captures the language features without talking about the syntax of the language. Note that there is no discussion about precedence of binary operators, lexical structure, etc.
For our basic language, these are all of the expression nodes we’ll define. Because it doesn’t have conditional control flow, it isn’t Turing-complete; we’ll fix that in a later installment. The two things we need next are a way to talk about the interface to a function, and a way to talk about functions themselves:
/// PrototypeAST - This class represents the "prototype" for a function,
/// which captures its name, and its argument names (thus implicitly the number
/// of arguments the function takes).
class PrototypeAST {
std::string Name;
std::vector<std::string> Args;
public:
PrototypeAST(const std::string &name, const std::vector<std::string> &args)
: Name(name), Args(args) {}
};
/// FunctionAST - This class represents a function definition itself.
class FunctionAST {
PrototypeAST *Proto;
ExprAST *Body;
public:
FunctionAST(PrototypeAST *proto, ExprAST *body)
: Proto(proto), Body(body) {}
};
In Kaleidoscope, functions are typed with just a count of their arguments. Since all values are double precision floating point, the type of each argument doesn’t need to be stored anywhere. In a more aggressive and realistic language, the “ExprAST” class would probably have a type field.
With this scaffolding, we can now talk about parsing expressions and function bodies in Kaleidoscope.
Parser Basics¶
Now that we have an AST to build, we need to define the parser code to build it. The idea here is that we want to parse something like “x+y” (which is returned as three tokens by the lexer) into an AST that could be generated with calls like this:
ExprAST *X = new VariableExprAST("x");
ExprAST *Y = new VariableExprAST("y");
ExprAST *Result = new BinaryExprAST('+', X, Y);
In order to do this, we’ll start by defining some basic helper routines:
/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
/// token the parser is looking at. getNextToken reads another token from the
/// lexer and updates CurTok with its results.
static int CurTok;
static int getNextToken() {
return CurTok = gettok();
}
This implements a simple token buffer around the lexer. This allows us to look one token ahead at what the lexer is returning. Every function in our parser will assume that CurTok is the current token that needs to be parsed.
/// Error* - These are little helper functions for error handling.
ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str); return 0; }
PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
The Error
routines are simple helper routines that our parser will
use to handle errors. The error recovery in our parser will not be the
best and is not particularly user-friendly, but it will be enough for our
tutorial. These routines make it easier to handle errors in routines
that have various return types: they always return null.
With these basic helper functions, we can implement the first piece of our grammar: numeric literals.
Basic Expression Parsing¶
We start with numeric literals, because they are the simplest to process. For each production in our grammar, we’ll define a function which parses that production. For numeric literals, we have:
/// numberexpr ::= number
static ExprAST *ParseNumberExpr() {
ExprAST *Result = new NumberExprAST(NumVal);
getNextToken(); // consume the number
return Result;
}
This routine is very simple: it expects to be called when the current
token is a tok_number
token. It takes the current number value,
creates a NumberExprAST
node, advances the lexer to the next token,
and finally returns.
There are some interesting aspects to this. The most important one is that this routine eats all of the tokens that correspond to the production and returns the lexer buffer with the next token (which is not part of the grammar production) ready to go. This is a fairly standard way to go for recursive descent parsers. For a better example, the parenthesis operator is defined like this:
/// parenexpr ::= '(' expression ')'
static ExprAST *ParseParenExpr() {
getNextToken(); // eat (.
ExprAST *V = ParseExpression();
if (!V) return 0;
if (CurTok != ')')
return Error("expected ')'");
getNextToken(); // eat ).
return V;
}
This function illustrates a number of interesting things about the parser:
1) It shows how we use the Error routines. When called, this function expects that the current token is a ‘(‘ token, but after parsing the subexpression, it is possible that there is no ‘)’ waiting. For example, if the user types in “(4 x” instead of “(4)”, the parser should emit an error. Because errors can occur, the parser needs a way to indicate that they happened: in our parser, we return null on an error.
2) Another interesting aspect of this function is that it uses recursion
by calling ParseExpression
(we will soon see that
ParseExpression
can call ParseParenExpr
). This is powerful
because it allows us to handle recursive grammars, and keeps each
production very simple. Note that parentheses do not cause construction
of AST nodes themselves. While we could do it this way, the most
important role of parentheses is to guide the parser and provide
grouping. Once the parser constructs the AST, parentheses are not
needed.
The next simple production is for handling variable references and function calls:
/// identifierexpr
/// ::= identifier
/// ::= identifier '(' expression* ')'
static ExprAST *ParseIdentifierExpr() {
std::string IdName = IdentifierStr;
getNextToken(); // eat identifier.
if (CurTok != '(') // Simple variable ref.
return new VariableExprAST(IdName);
// Call.
getNextToken(); // eat (
std::vector<ExprAST*> Args;
if (CurTok != ')') {
while (1) {
ExprAST *Arg = ParseExpression();
if (!Arg) return 0;
Args.push_back(Arg);
if (CurTok == ')') break;
if (CurTok != ',')
return Error("Expected ')' or ',' in argument list");
getNextToken();
}
}
// Eat the ')'.
getNextToken();
return new CallExprAST(IdName, Args);
}
This routine follows the same style as the other routines. (It expects
to be called if the current token is a tok_identifier
token). It
also has recursion and error handling. One interesting aspect of this is
that it uses look-ahead to determine if the current identifier is a
stand alone variable reference or if it is a function call expression.
It handles this by checking to see if the token after the identifier is
a ‘(‘ token, constructing either a VariableExprAST
or
CallExprAST
node as appropriate.
Now that we have all of our simple expression-parsing logic in place, we can define a helper function to wrap it together into one entry point. We call this class of expressions “primary” expressions, for reasons that will become more clear later in the tutorial. In order to parse an arbitrary primary expression, we need to determine what sort of expression it is:
/// primary
/// ::= identifierexpr
/// ::= numberexpr
/// ::= parenexpr
static ExprAST *ParsePrimary() {
switch (CurTok) {
default: return Error("unknown token when expecting an expression");
case tok_identifier: return ParseIdentifierExpr();
case tok_number: return ParseNumberExpr();
case '(': return ParseParenExpr();
}
}
Now that you see the definition of this function, it is more obvious why we can assume the state of CurTok in the various functions. This uses look-ahead to determine which sort of expression is being inspected, and then parses it with a function call.
Now that basic expressions are handled, we need to handle binary expressions. They are a bit more complex.
Binary Expression Parsing¶
Binary expressions are significantly harder to parse because they are often ambiguous. For example, when given the string “x+y*z”, the parser can choose to parse it as either “(x+y)*z” or “x+(y*z)”. With common definitions from mathematics, we expect the latter parse, because “*” (multiplication) has higher precedence than “+” (addition).
There are many ways to handle this, but an elegant and efficient way is to use Operator-Precedence Parsing. This parsing technique uses the precedence of binary operators to guide recursion. To start with, we need a table of precedences:
/// BinopPrecedence - This holds the precedence for each binary operator that is
/// defined.
static std::map<char, int> BinopPrecedence;
/// GetTokPrecedence - Get the precedence of the pending binary operator token.
static int GetTokPrecedence() {
if (!isascii(CurTok))
return -1;
// Make sure it's a declared binop.
int TokPrec = BinopPrecedence[CurTok];
if (TokPrec <= 0) return -1;
return TokPrec;
}
int main() {
// Install standard binary operators.
// 1 is lowest precedence.
BinopPrecedence['<'] = 10;
BinopPrecedence['+'] = 20;
BinopPrecedence['-'] = 20;
BinopPrecedence['*'] = 40; // highest.
...
}
For the basic form of Kaleidoscope, we will only support 4 binary
operators (this can obviously be extended by you, our brave and intrepid
reader). The GetTokPrecedence
function returns the precedence for
the current token, or -1 if the token is not a binary operator. Having a
map makes it easy to add new operators and makes it clear that the
algorithm doesn’t depend on the specific operators involved, but it
would be easy enough to eliminate the map and do the comparisons in the
GetTokPrecedence
function. (Or just use a fixed-size array).
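For illustration, the fixed-size-array variant mentioned above might look like this (a sketch; the tutorial itself keeps the std::map):
// Alternative GetTokPrecedence using a plain array instead of std::map.
static int GetTokPrecedence() {
  if (!isascii(CurTok))
    return -1;
  // One slot per ASCII character; non-operators stay at zero.
  static int Precedence[256];
  if (Precedence['*'] == 0) { // install the operators once
    Precedence['<'] = 10;
    Precedence['+'] = 20;
    Precedence['-'] = 20;
    Precedence['*'] = 40; // highest.
  }
  int TokPrec = Precedence[(unsigned char)CurTok];
  return TokPrec <= 0 ? -1 : TokPrec;
}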
With the helper above defined, we can now start parsing binary expressions. The basic idea of operator precedence parsing is to break down an expression with potentially ambiguous binary operators into pieces. Consider, for example, the expression “a+b+(c+d)*e*f+g”. Operator precedence parsing considers this as a stream of primary expressions separated by binary operators. As such, it will first parse the leading primary expression “a”, then it will see the pairs [+, b] [+, (c+d)] [*, e] [*, f] and [+, g]. Note that because parentheses are primary expressions, the binary expression parser doesn’t need to worry about nested subexpressions like (c+d) at all.
To start, an expression is a primary expression potentially followed by a sequence of [binop,primaryexpr] pairs:
/// expression
/// ::= primary binoprhs
///
static ExprAST *ParseExpression() {
ExprAST *LHS = ParsePrimary();
if (!LHS) return 0;
return ParseBinOpRHS(0, LHS);
}
ParseBinOpRHS
is the function that parses the sequence of pairs for
us. It takes a precedence and a pointer to an expression for the part
that has been parsed so far. Note that “x” is a perfectly valid
expression: As such, “binoprhs” is allowed to be empty, in which case it
returns the expression that is passed into it. In our example above, the
code passes the expression for “a” into ParseBinOpRHS
and the
current token is “+”.
The precedence value passed into ParseBinOpRHS
indicates the
minimal operator precedence that the function is allowed to eat. For
example, if the current pair stream is [+, x] and ParseBinOpRHS
is
passed in a precedence of 40, it will not consume any tokens (because
the precedence of ‘+’ is only 20). With this in mind, ParseBinOpRHS
starts with:
/// binoprhs
/// ::= ('+' primary)*
static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
// If this is a binop, find its precedence.
while (1) {
int TokPrec = GetTokPrecedence();
// If this is a binop that binds at least as tightly as the current binop,
// consume it, otherwise we are done.
if (TokPrec < ExprPrec)
return LHS;
This code gets the precedence of the current token and checks to see if it is too low. Because we defined invalid tokens to have a precedence of -1, this check implicitly knows that the pair-stream ends when the token stream runs out of binary operators. If this check succeeds, we know that the token is a binary operator and that it will be included in this expression:
// Okay, we know this is a binop.
int BinOp = CurTok;
getNextToken(); // eat binop
// Parse the primary expression after the binary operator.
ExprAST *RHS = ParsePrimary();
if (!RHS) return 0;
As such, this code eats (and remembers) the binary operator and then parses the primary expression that follows. This builds up the whole pair, the first of which is [+, b] for the running example.
Now that we parsed the left-hand side of an expression and one pair of the RHS sequence, we have to decide which way the expression associates. In particular, we could have “(a+b) binop unparsed” or “a + (b binop unparsed)”. To determine this, we look ahead at “binop” to determine its precedence and compare it to BinOp’s precedence (which is ‘+’ in this case):
// If BinOp binds less tightly with RHS than the operator after RHS, let
// the pending operator take RHS as its LHS.
int NextPrec = GetTokPrecedence();
if (TokPrec < NextPrec) {
If the precedence of the binop to the right of “RHS” is lower or equal to the precedence of our current operator, then we know that the parentheses associate as “(a+b) binop ...”. In our example, the current operator is “+” and the next operator is “+”, so we know that they have the same precedence. In this case we’ll create the AST node for “a+b”, and then continue parsing:
... if body omitted ...
}
// Merge LHS/RHS.
LHS = new BinaryExprAST(BinOp, LHS, RHS);
} // loop around to the top of the while loop.
}
In our example above, this will turn “a+b+” into “(a+b)” and execute the next iteration of the loop, with “+” as the current token. The code above will eat, remember, and parse “(c+d)” as the primary expression, which makes the current pair equal to [+, (c+d)]. It will then evaluate the ‘if’ conditional above with “*” as the binop to the right of the primary. In this case, the precedence of “*” is higher than the precedence of “+” so the if condition will be entered.
The critical question left here is “how can the if condition parse the right hand side in full”? In particular, to build the AST correctly for our example, it needs to get all of “(c+d)*e*f” as the RHS expression variable. The code to do this is surprisingly simple (code from the above two blocks duplicated for context):
// If BinOp binds less tightly with RHS than the operator after RHS, let
// the pending operator take RHS as its LHS.
int NextPrec = GetTokPrecedence();
if (TokPrec < NextPrec) {
RHS = ParseBinOpRHS(TokPrec+1, RHS);
if (RHS == 0) return 0;
}
// Merge LHS/RHS.
LHS = new BinaryExprAST(BinOp, LHS, RHS);
} // loop around to the top of the while loop.
}
At this point, we know that the binary operator to the RHS of our
primary has higher precedence than the binop we are currently parsing.
As such, we know that any sequence of pairs whose operators are all
higher precedence than “+” should be parsed together and returned as
“RHS”. To do this, we recursively invoke the ParseBinOpRHS
function
specifying “TokPrec+1” as the minimum precedence required for it to
continue. In our example above, this will cause it to return the AST
node for “(c+d)*e*f” as RHS, which is then set as the RHS of the ‘+’
expression.
Finally, on the next iteration of the while loop, the “+g” piece is parsed and added to the AST. With this little bit of code (14 non-trivial lines), we correctly handle fully general binary expression parsing in a very elegant way. This was a whirlwind tour of this code, and it is somewhat subtle. I recommend running through it with a few tough examples to see how it works.
This wraps up handling of expressions. At this point, we can point the parser at an arbitrary token stream and build an expression from it, stopping at the first token that is not part of the expression. Next up we need to handle function definitions, etc.
Parsing the Rest¶
The next thing missing is handling of function prototypes. In Kaleidoscope, these are used both for ‘extern’ function declarations as well as function body definitions. The code to do this is straight-forward and not very interesting (once you’ve survived expressions):
/// prototype
/// ::= id '(' id* ')'
static PrototypeAST *ParsePrototype() {
if (CurTok != tok_identifier)
return ErrorP("Expected function name in prototype");
std::string FnName = IdentifierStr;
getNextToken();
if (CurTok != '(')
return ErrorP("Expected '(' in prototype");
// Read the list of argument names.
std::vector<std::string> ArgNames;
while (getNextToken() == tok_identifier)
ArgNames.push_back(IdentifierStr);
if (CurTok != ')')
return ErrorP("Expected ')' in prototype");
// success.
getNextToken(); // eat ')'.
return new PrototypeAST(FnName, ArgNames);
}
Given this, a function definition is very simple, just a prototype plus an expression to implement the body:
/// definition ::= 'def' prototype expression
static FunctionAST *ParseDefinition() {
getNextToken(); // eat def.
PrototypeAST *Proto = ParsePrototype();
if (Proto == 0) return 0;
if (ExprAST *E = ParseExpression())
return new FunctionAST(Proto, E);
return 0;
}
In addition, we support ‘extern’ to declare functions like ‘sin’ and ‘cos’ as well as to support forward declaration of user functions. These ‘extern’s are just prototypes with no body:
/// external ::= 'extern' prototype
static PrototypeAST *ParseExtern() {
getNextToken(); // eat extern.
return ParsePrototype();
}
Finally, we’ll also let the user type in arbitrary top-level expressions and evaluate them on the fly. We will handle this by defining anonymous nullary (zero argument) functions for them:
/// toplevelexpr ::= expression
static FunctionAST *ParseTopLevelExpr() {
if (ExprAST *E = ParseExpression()) {
// Make an anonymous proto.
PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
return new FunctionAST(Proto, E);
}
return 0;
}
Now that we have all the pieces, let’s build a little driver that will let us actually execute this code we’ve built!
The Driver¶
The driver for this simply invokes all of the parsing pieces with a top-level dispatch loop. There isn’t much interesting here, so I’ll just include the top-level loop. See below for full code in the “Top-Level Parsing” section.
/// top ::= definition | external | expression | ';'
static void MainLoop() {
  while (1) {
    fprintf(stderr, "ready> ");
    switch (CurTok) {
    case tok_eof:    return;
    case ';':        getNextToken(); break; // ignore top-level semicolons.
    case tok_def:    HandleDefinition(); break;
    case tok_extern: HandleExtern(); break;
    default:         HandleTopLevelExpression(); break;
    }
  }
}
The most interesting part of this is that we ignore top-level semicolons. Why is this, you ask? The basic reason is that if you type “4 + 5” at the command line, the parser doesn’t know whether that is the end of what you will type or not. For example, on the next line you could type “def foo...” in which case 4+5 is the end of a top-level expression. Alternatively you could type “* 6”, which would continue the expression. Having top-level semicolons allows you to type “4+5;”, and the parser will know you are done.
Conclusions¶
With just under 400 lines of commented code (240 lines of non-comment, non-blank code), we fully defined our minimal language, including a lexer, parser, and AST builder. With this done, the executable will validate Kaleidoscope code and tell us if it is grammatically invalid. For example, here is a sample interaction:
$ ./a.out
ready> def foo(x y) x+foo(y, 4.0);
Parsed a function definition.
ready> def foo(x y) x+y y;
Parsed a function definition.
Parsed a top-level expr
ready> def foo(x y) x+y );
Parsed a function definition.
Error: unknown token when expecting an expression
ready> extern sin(a);
ready> Parsed an extern
ready> ^D
$
There is a lot of room for extension here. You can define new AST nodes, extend the language in many ways, etc. In the next installment, we will describe how to generate LLVM Intermediate Representation (IR) from the AST.
Full Code Listing¶
Here is the complete code listing for this and the previous chapter. Note that it is fully self-contained: you don’t need LLVM or any external libraries at all for this. (Besides the C and C++ standard libraries, of course.) To build this, just compile with:
# Compile
clang++ -g -O3 toy.cpp
# Run
./a.out
Here is the code:
#include <cctype>
#include <cstdio>
#include <cstdlib>
#include <map>
#include <string>
#include <vector>
//===----------------------------------------------------------------------===//
// Lexer
//===----------------------------------------------------------------------===//
// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
// of these for known things.
enum Token {
tok_eof = -1,
// commands
tok_def = -2, tok_extern = -3,
// primary
tok_identifier = -4, tok_number = -5
};
static std::string IdentifierStr; // Filled in if tok_identifier
static double NumVal; // Filled in if tok_number
/// gettok - Return the next token from standard input.
static int gettok() {
static int LastChar = ' ';
// Skip any whitespace.
while (isspace(LastChar))
LastChar = getchar();
if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
IdentifierStr = LastChar;
while (isalnum((LastChar = getchar())))
IdentifierStr += LastChar;
if (IdentifierStr == "def") return tok_def;
if (IdentifierStr == "extern") return tok_extern;
return tok_identifier;
}
if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
std::string NumStr;
do {
NumStr += LastChar;
LastChar = getchar();
} while (isdigit(LastChar) || LastChar == '.');
NumVal = strtod(NumStr.c_str(), 0);
return tok_number;
}
if (LastChar == '#') {
// Comment until end of line.
do LastChar = getchar();
while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
if (LastChar != EOF)
return gettok();
}
// Check for end of file. Don't eat the EOF.
if (LastChar == EOF)
return tok_eof;
// Otherwise, just return the character as its ascii value.
int ThisChar = LastChar;
LastChar = getchar();
return ThisChar;
}
//===----------------------------------------------------------------------===//
// Abstract Syntax Tree (aka Parse Tree)
//===----------------------------------------------------------------------===//
namespace {
/// ExprAST - Base class for all expression nodes.
class ExprAST {
public:
virtual ~ExprAST() {}
};
/// NumberExprAST - Expression class for numeric literals like "1.0".
class NumberExprAST : public ExprAST {
public:
NumberExprAST(double val) {}
};
/// VariableExprAST - Expression class for referencing a variable, like "a".
class VariableExprAST : public ExprAST {
std::string Name;
public:
VariableExprAST(const std::string &name) : Name(name) {}
};
/// BinaryExprAST - Expression class for a binary operator.
class BinaryExprAST : public ExprAST {
public:
BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) {}
};
/// CallExprAST - Expression class for function calls.
class CallExprAST : public ExprAST {
std::string Callee;
std::vector<ExprAST*> Args;
public:
CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
: Callee(callee), Args(args) {}
};
/// PrototypeAST - This class represents the "prototype" for a function,
/// which captures its name, and its argument names (thus implicitly the number
/// of arguments the function takes).
class PrototypeAST {
std::string Name;
std::vector<std::string> Args;
public:
PrototypeAST(const std::string &name, const std::vector<std::string> &args)
: Name(name), Args(args) {}
};
/// FunctionAST - This class represents a function definition itself.
class FunctionAST {
public:
FunctionAST(PrototypeAST *proto, ExprAST *body) {}
};
} // end anonymous namespace
//===----------------------------------------------------------------------===//
// Parser
//===----------------------------------------------------------------------===//
/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
/// token the parser is looking at. getNextToken reads another token from the
/// lexer and updates CurTok with its results.
static int CurTok;
static int getNextToken() {
return CurTok = gettok();
}
/// BinopPrecedence - This holds the precedence for each binary operator that is
/// defined.
static std::map<char, int> BinopPrecedence;
/// GetTokPrecedence - Get the precedence of the pending binary operator token.
static int GetTokPrecedence() {
if (!isascii(CurTok))
return -1;
// Make sure it's a declared binop.
int TokPrec = BinopPrecedence[CurTok];
if (TokPrec <= 0) return -1;
return TokPrec;
}
/// Error* - These are little helper functions for error handling.
ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
static ExprAST *ParseExpression();
/// identifierexpr
/// ::= identifier
/// ::= identifier '(' expression* ')'
static ExprAST *ParseIdentifierExpr() {
std::string IdName = IdentifierStr;
getNextToken(); // eat identifier.
if (CurTok != '(') // Simple variable ref.
return new VariableExprAST(IdName);
// Call.
getNextToken(); // eat (
std::vector<ExprAST*> Args;
if (CurTok != ')') {
while (1) {
ExprAST *Arg = ParseExpression();
if (!Arg) return 0;
Args.push_back(Arg);
if (CurTok == ')') break;
if (CurTok != ',')
return Error("Expected ')' or ',' in argument list");
getNextToken();
}
}
// Eat the ')'.
getNextToken();
return new CallExprAST(IdName, Args);
}
/// numberexpr ::= number
static ExprAST *ParseNumberExpr() {
ExprAST *Result = new NumberExprAST(NumVal);
getNextToken(); // consume the number
return Result;
}
/// parenexpr ::= '(' expression ')'
static ExprAST *ParseParenExpr() {
getNextToken(); // eat (.
ExprAST *V = ParseExpression();
if (!V) return 0;
if (CurTok != ')')
return Error("expected ')'");
getNextToken(); // eat ).
return V;
}
/// primary
/// ::= identifierexpr
/// ::= numberexpr
/// ::= parenexpr
static ExprAST *ParsePrimary() {
switch (CurTok) {
default: return Error("unknown token when expecting an expression");
case tok_identifier: return ParseIdentifierExpr();
case tok_number: return ParseNumberExpr();
case '(': return ParseParenExpr();
}
}
/// binoprhs
/// ::= ('+' primary)*
static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
// If this is a binop, find its precedence.
while (1) {
int TokPrec = GetTokPrecedence();
// If this is a binop that binds at least as tightly as the current binop,
// consume it, otherwise we are done.
if (TokPrec < ExprPrec)
return LHS;
// Okay, we know this is a binop.
int BinOp = CurTok;
getNextToken(); // eat binop
// Parse the primary expression after the binary operator.
ExprAST *RHS = ParsePrimary();
if (!RHS) return 0;
// If BinOp binds less tightly with RHS than the operator after RHS, let
// the pending operator take RHS as its LHS.
int NextPrec = GetTokPrecedence();
if (TokPrec < NextPrec) {
RHS = ParseBinOpRHS(TokPrec+1, RHS);
if (RHS == 0) return 0;
}
// Merge LHS/RHS.
LHS = new BinaryExprAST(BinOp, LHS, RHS);
}
}
/// expression
/// ::= primary binoprhs
///
static ExprAST *ParseExpression() {
ExprAST *LHS = ParsePrimary();
if (!LHS) return 0;
return ParseBinOpRHS(0, LHS);
}
/// prototype
/// ::= id '(' id* ')'
static PrototypeAST *ParsePrototype() {
if (CurTok != tok_identifier)
return ErrorP("Expected function name in prototype");
std::string FnName = IdentifierStr;
getNextToken();
if (CurTok != '(')
return ErrorP("Expected '(' in prototype");
std::vector<std::string> ArgNames;
while (getNextToken() == tok_identifier)
ArgNames.push_back(IdentifierStr);
if (CurTok != ')')
return ErrorP("Expected ')' in prototype");
// success.
getNextToken(); // eat ')'.
return new PrototypeAST(FnName, ArgNames);
}
/// definition ::= 'def' prototype expression
static FunctionAST *ParseDefinition() {
getNextToken(); // eat def.
PrototypeAST *Proto = ParsePrototype();
if (Proto == 0) return 0;
if (ExprAST *E = ParseExpression())
return new FunctionAST(Proto, E);
return 0;
}
/// toplevelexpr ::= expression
static FunctionAST *ParseTopLevelExpr() {
if (ExprAST *E = ParseExpression()) {
// Make an anonymous proto.
PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
return new FunctionAST(Proto, E);
}
return 0;
}
/// external ::= 'extern' prototype
static PrototypeAST *ParseExtern() {
getNextToken(); // eat extern.
return ParsePrototype();
}
//===----------------------------------------------------------------------===//
// Top-Level parsing
//===----------------------------------------------------------------------===//
static void HandleDefinition() {
if (ParseDefinition()) {
fprintf(stderr, "Parsed a function definition.\n");
} else {
// Skip token for error recovery.
getNextToken();
}
}
static void HandleExtern() {
if (ParseExtern()) {
fprintf(stderr, "Parsed an extern\n");
} else {
// Skip token for error recovery.
getNextToken();
}
}
static void HandleTopLevelExpression() {
// Evaluate a top-level expression into an anonymous function.
if (ParseTopLevelExpr()) {
fprintf(stderr, "Parsed a top-level expr\n");
} else {
// Skip token for error recovery.
getNextToken();
}
}
/// top ::= definition | external | expression | ';'
static void MainLoop() {
while (1) {
fprintf(stderr, "ready> ");
switch (CurTok) {
case tok_eof: return;
case ';': getNextToken(); break; // ignore top-level semicolons.
case tok_def: HandleDefinition(); break;
case tok_extern: HandleExtern(); break;
default: HandleTopLevelExpression(); break;
}
}
}
//===----------------------------------------------------------------------===//
// Main driver code.
//===----------------------------------------------------------------------===//
int main() {
// Install standard binary operators.
// 1 is lowest precedence.
BinopPrecedence['<'] = 10;
BinopPrecedence['+'] = 20;
BinopPrecedence['-'] = 20;
BinopPrecedence['*'] = 40; // highest.
// Prime the first token.
fprintf(stderr, "ready> ");
getNextToken();
// Run the main "interpreter loop" now.
MainLoop();
return 0;
}
Kaleidoscope: Code generation to LLVM IR¶
Chapter 3 Introduction¶
Welcome to Chapter 3 of the “Implementing a language with LLVM” tutorial. This chapter shows you how to transform the Abstract Syntax Tree, built in Chapter 2, into LLVM IR. This will teach you a little bit about how LLVM does things, as well as demonstrate how easy it is to use. It’s much more work to build a lexer and parser than it is to generate LLVM IR code. :)
Please note: the code in this chapter and later requires LLVM 2.2 or later. LLVM 2.1 and earlier will not work with it. Also note that you need to use a version of this tutorial that matches your LLVM release: if you are using an official LLVM release, use the version of the documentation included with your release or on the llvm.org releases page.
Code Generation Setup¶
In order to generate LLVM IR, we want some simple setup to get started. First we define virtual code generation (codegen) methods in each AST class:
/// ExprAST - Base class for all expression nodes.
class ExprAST {
public:
  virtual ~ExprAST() {}
  virtual Value *Codegen() = 0;
};

/// NumberExprAST - Expression class for numeric literals like "1.0".
class NumberExprAST : public ExprAST {
  double Val;

public:
  NumberExprAST(double val) : Val(val) {}
  virtual Value *Codegen();
};
...
The Codegen() method says to emit IR for that AST node along with all the things it depends on, and they all return an LLVM Value object. “Value” is the class used to represent a “Static Single Assignment (SSA) register” or “SSA value” in LLVM. The most distinct aspect of SSA values is that their value is computed as the related instruction executes, and it does not get a new value until (and if) the instruction re-executes. In other words, there is no way to “change” an SSA value. For more information, please read up on Static Single Assignment - the concepts are really quite natural once you grok them.
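For example, every builder call below produces a brand-new, immutable Value; computing “x+x” and then scaling the result never overwrites anything. (A sketch using the Builder and NamedValues globals introduced just below; the string names are only hints.)
Value *X = NamedValues["x"];
Value *Sum = Builder.CreateFAdd(X, X, "sum");     // a fresh SSA value for x+x
Value *Two = ConstantFP::get(getGlobalContext(), APFloat(2.0));
Value *Res = Builder.CreateFMul(Sum, Two, "res"); // another fresh value; Sum is untouched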
Note that instead of adding virtual methods to the ExprAST class hierarchy, it could also make sense to use a visitor pattern or some other way to model this. Again, this tutorial won’t dwell on good software engineering practices: for our purposes, adding a virtual method is simplest.
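For the curious, here is a minimal sketch of what that visitor alternative could look like. Everything here is hypothetical (none of these names are in the tutorial’s code), and it is deliberately LLVM-free:
#include <cstdio>

class NumberExprAST;
class VariableExprAST;

// One visit() overload per node type takes the place of a virtual
// Codegen() on every node.
class ExprVisitor {
public:
  virtual ~ExprVisitor() {}
  virtual void visit(NumberExprAST &E) = 0;
  virtual void visit(VariableExprAST &E) = 0;
};

class ExprAST {
public:
  virtual ~ExprAST() {}
  // The single hook each node needs: double-dispatch into the visitor.
  virtual void accept(ExprVisitor &V) = 0;
};

class NumberExprAST : public ExprAST {
public:
  void accept(ExprVisitor &V) override { V.visit(*this); }
};

class VariableExprAST : public ExprAST {
public:
  void accept(ExprVisitor &V) override { V.visit(*this); }
};

// All code generation logic then lives in one visitor class instead of
// being spread across the AST hierarchy.
struct PrintVisitor : ExprVisitor {
  void visit(NumberExprAST &) override { printf("number node\n"); }
  void visit(VariableExprAST &) override { printf("variable node\n"); }
};

int main() {
  NumberExprAST N;
  PrintVisitor P;
  ExprAST &E = N;
  E.accept(P); // prints "number node"
  return 0;
}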
The second thing we want is an “Error” method like we used for the parser, which will be used to report errors found during code generation (for example, use of an undeclared parameter):
Value *ErrorV(const char *Str) { Error(Str); return 0; }
static Module *TheModule;
static IRBuilder<> Builder(getGlobalContext());
static std::map<std::string, Value*> NamedValues;
The static variables will be used during code generation. TheModule is the LLVM construct that contains all of the functions and global variables in a chunk of code. In many ways, it is the top-level structure that the LLVM IR uses to contain code.
The Builder object is a helper object that makes it easy to generate LLVM instructions. Instances of the IRBuilder class template keep track of the current place to insert instructions and have methods to create new instructions.
The NamedValues map keeps track of which values are defined in the current scope and what their LLVM representation is. (In other words, it is a symbol table for the code.) In this form of Kaleidoscope, the only things that can be referenced are function parameters. As such, function parameters will be in this map when generating code for their function body.
With these basics in place, we can start talking about how to generate code for each expression. Note that this assumes that the Builder has been set up to generate code into something. For now, we’ll assume that this has already been done, and we’ll just use it to emit code.
Expression Code Generation¶
Generating LLVM code for expression nodes is very straightforward: less than 45 lines of commented code for all four of our expression nodes. First we’ll do numeric literals:
Value *NumberExprAST::Codegen() {
  return ConstantFP::get(getGlobalContext(), APFloat(Val));
}
In the LLVM IR, numeric constants are represented with the ConstantFP class, which holds the numeric value in an APFloat internally (APFloat has the capability of holding floating point constants of Arbitrary Precision). This code basically just creates and returns a ConstantFP. Note that in the LLVM IR, constants are all uniqued together and shared. For this reason, the API uses the “foo::get(...)” idiom instead of “new foo(..)” or “foo::Create(..)”.
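You can observe the uniquing directly: asking for the same constant twice hands back the very same object. (A minimal sketch, assuming the same LLVM 3.x-era headers and getGlobalContext() API this tutorial uses.)
#include "llvm/IR/Constants.h"
#include "llvm/IR/LLVMContext.h"
#include <cstdio>
using namespace llvm;

int main() {
  Value *A = ConstantFP::get(getGlobalContext(), APFloat(1.0));
  Value *B = ConstantFP::get(getGlobalContext(), APFloat(1.0));
  // Constants are uniqued per context, so both get() calls return the
  // same object; that is why the API is get() rather than "new".
  printf("uniqued: %s\n", A == B ? "yes" : "no"); // prints "uniqued: yes"
  return 0;
}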
Value *VariableExprAST::Codegen() {
  // Look this variable up in the function.
  Value *V = NamedValues[Name];
  return V ? V : ErrorV("Unknown variable name");
}
References to variables are also quite simple using LLVM. In the simple version of Kaleidoscope, we assume that the variable has already been emitted somewhere and its value is available. In practice, the only values that can be in the NamedValues map are function arguments. This code simply checks to see that the specified name is in the map (if not, an unknown variable is being referenced) and returns the value for it. In future chapters, we’ll add support for loop induction variables in the symbol table, and for local variables.
Value *BinaryExprAST::Codegen() {
  Value *L = LHS->Codegen();
  Value *R = RHS->Codegen();
  if (L == 0 || R == 0) return 0;

  switch (Op) {
  case '+': return Builder.CreateFAdd(L, R, "addtmp");
  case '-': return Builder.CreateFSub(L, R, "subtmp");
  case '*': return Builder.CreateFMul(L, R, "multmp");
  case '<':
    L = Builder.CreateFCmpULT(L, R, "cmptmp");
    // Convert bool 0/1 to double 0.0 or 1.0
    return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
                                "booltmp");
  default: return ErrorV("invalid binary operator");
  }
}
Binary operators start to get more interesting. The basic idea here is that we recursively emit code for the left-hand side of the expression, then the right-hand side, then we compute the result of the binary expression. In this code, we do a simple switch on the opcode to create the right LLVM instruction.
In the example above, the LLVM builder class is starting to show its value. IRBuilder knows where to insert the newly created instruction; all you have to do is specify what instruction to create (e.g. with CreateFAdd), which operands to use (L and R here), and optionally provide a name for the generated instruction.
One nice thing about LLVM is that the name is just a hint. For instance, if the code above emits multiple “addtmp” variables, LLVM will automatically provide each one with an increasing, unique numeric suffix. Local value names for instructions are purely optional, but they make it much easier to read the IR dumps.
LLVM instructions are constrained by strict rules: for example, the left and right operands of an add instruction must have the same type, and the result type of the add must match the operand types. Because all values in Kaleidoscope are doubles, this makes for very simple code for add, sub and mul.
On the other hand, LLVM specifies that the fcmp instruction always returns an ‘i1’ value (a one bit integer). The problem with this is that Kaleidoscope wants the value to be a 0.0 or 1.0 value. In order to get these semantics, we combine the fcmp instruction with a uitofp instruction. This instruction converts its input integer into a floating point value by treating the input as an unsigned value. In contrast, if we used the sitofp instruction, the Kaleidoscope ‘<’ operator would return 0.0 and -1.0, depending on the input value.
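To see the pairing in isolation, here is a hypothetical helper (not part of the tutorial’s code) that emits the same fcmp/uitofp sequence, assuming the tutorial’s Builder global:
// Compares L < R and widens the i1 result (0 or 1) to a double (0.0 or 1.0).
static Value *EmitLessThan(Value *L, Value *R) {
  Value *Cmp = Builder.CreateFCmpULT(L, R, "cmptmp");
  // With CreateSIToFP here instead, the i1 value 1 would become -1.0.
  return Builder.CreateUIToFP(Cmp, Type::getDoubleTy(getGlobalContext()),
                              "booltmp");
}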
Value *CallExprAST::Codegen() {
  // Look up the name in the global module table.
  Function *CalleeF = TheModule->getFunction(Callee);
  if (CalleeF == 0)
    return ErrorV("Unknown function referenced");

  // If argument mismatch error.
  if (CalleeF->arg_size() != Args.size())
    return ErrorV("Incorrect # arguments passed");

  std::vector<Value*> ArgsV;
  for (unsigned i = 0, e = Args.size(); i != e; ++i) {
    ArgsV.push_back(Args[i]->Codegen());
    if (ArgsV.back() == 0) return 0;
  }

  return Builder.CreateCall(CalleeF, ArgsV, "calltmp");
}
Code generation for function calls is quite straightforward with LLVM. The code above initially does a function name lookup in the LLVM Module’s symbol table. Recall that the LLVM Module is the container that holds all of the functions we are JIT’ing. By giving each function the same name as what the user specifies, we can use the LLVM symbol table to resolve function names for us.
Once we have the function to call, we recursively codegen each argument that is to be passed in, and create an LLVM call instruction. Note that LLVM uses the native C calling conventions by default, allowing these calls to also call into standard library functions like “sin” and “cos”, with no additional effort.
This wraps up our handling of the four basic expressions that we have so far in Kaleidoscope. Feel free to go in and add some more. For example, by browsing the LLVM language reference you’ll find several other interesting instructions that are really easy to plug into our basic framework.
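For instance, a division operator would only take two small additions (a sketch; “divtmp” is just a name hint, and CreateFDiv is the floating-point division counterpart of the builder calls above):
// In the switch inside BinaryExprAST::Codegen():
case '/': return Builder.CreateFDiv(L, R, "divtmp");

// And in main(), alongside the other precedences:
BinopPrecedence['/'] = 40; // same precedence as '*'.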
Function Code Generation¶
Code generation for prototypes and functions must handle a number of details, which make their code less beautiful than expression code generation, but which allow us to illustrate some important points. First, let’s talk about code generation for prototypes: they are used both for function bodies and external function declarations. The code starts with:
Function *PrototypeAST::Codegen() {
  // Make the function type: double(double,double) etc.
  std::vector<Type*> Doubles(Args.size(),
                             Type::getDoubleTy(getGlobalContext()));
  FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()),
                                       Doubles, false);

  Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
This code packs a lot of power into a few lines. Note first that this function returns a “Function*” instead of a “Value*”. Because a “prototype” really talks about the external interface for a function (not the value computed by an expression), it makes sense for it to return the LLVM Function it corresponds to when codegen’d.
The call to FunctionType::get creates the FunctionType that should be used for a given Prototype. Since all function arguments in Kaleidoscope are of type double, the first line creates a vector of “N” LLVM double types. It then uses the FunctionType::get method to create a function type that takes “N” doubles as arguments, returns one double as a result, and that is not vararg (the false parameter indicates this). Note that Types in LLVM are uniqued just like Constants are, so you don’t “new” a type, you “get” it.
The final line above actually creates the function that the prototype will correspond to. This indicates the type, linkage and name to use, as well as which module to insert into. “external linkage” means that the function may be defined outside the current module and/or that it is callable by functions outside the module. The Name passed in is the name the user specified: since “TheModule” is specified, this name is registered in the symbol table of “TheModule”, which is used by the function call code above.
  // If F conflicted, there was already something named 'Name'. If it has a
  // body, don't allow redefinition or reextern.
  if (F->getName() != Name) {
    // Delete the one we just made and get the existing one.
    F->eraseFromParent();
    F = TheModule->getFunction(Name);
The Module symbol table works just like the Function symbol table when it comes to name conflicts: if a new function is created with a name that was previously added to the symbol table, the new function will get implicitly renamed when added to the Module. The code above exploits this fact to determine if there was a previous definition of this function.
In Kaleidoscope, I choose to allow redefinitions of functions in two cases: first, we want to allow ‘extern’ing a function more than once, as long as the prototypes for the externs match (since all arguments have the same type, we just have to check that the number of arguments matches). Second, we want to allow ‘extern’ing a function and then defining a body for it. This is useful when defining mutually recursive functions.
In order to implement this, the code above first checks to see if there is a collision on the name of the function. If so, it deletes the function we just created (by calling eraseFromParent) and then calls getFunction to get the existing function with the specified name. Note that many APIs in LLVM have “erase” forms and “remove” forms. The “remove” form unlinks the object from its parent (e.g. a Function from a Module) and returns it. The “erase” form unlinks the object and then deletes it.
    // If F already has a body, reject this.
    if (!F->empty()) {
      ErrorF("redefinition of function");
      return 0;
    }

    // If F took a different number of args, reject.
    if (F->arg_size() != Args.size()) {
      ErrorF("redefinition of function with different # args");
      return 0;
    }
  }
In order to verify the logic above, we first check to see if the pre-existing function is “empty”. In this case, empty means that it has no basic blocks in it, which means it has no body. If it has no body, it is a forward declaration. Since we don’t allow anything after a full definition of the function, the code rejects this case. If the previous reference to a function was an ‘extern’, we simply verify that the number of arguments for that definition and this one match up. If not, we emit an error.
  // Set names for all arguments.
  unsigned Idx = 0;
  for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size();
       ++AI, ++Idx) {
    AI->setName(Args[Idx]);

    // Add arguments to variable symbol table.
    NamedValues[Args[Idx]] = AI;
  }

  return F;
}
The last bit of code for prototypes loops over all of the arguments in the function, setting the name of the LLVM Argument objects to match, and registering the arguments in the NamedValues map for future use by the VariableExprAST AST node. Once this is set up, it returns the Function object to the caller. Note that we don’t check for conflicting argument names here (e.g. “extern foo(a b a)”). Doing so would be very straightforward with the mechanics we have already used above.
Function *FunctionAST::Codegen() {
  NamedValues.clear();

  Function *TheFunction = Proto->Codegen();
  if (TheFunction == 0)
    return 0;
Code generation for function definitions starts out simply enough: we first clear out the NamedValues map so there isn’t anything left in it from the last function we compiled, then codegen the prototype (Proto) and verify that it is ok. Code generation of the prototype ensures that there is an LLVM Function object that is ready to go for us.
  // Create a new basic block to start insertion into.
  BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
  Builder.SetInsertPoint(BB);

  if (Value *RetVal = Body->Codegen()) {
Now we get to the point where the Builder is set up. The first line creates a new basic block (named “entry”), which is inserted into TheFunction. The second line then tells the builder that new instructions should be inserted into the end of the new basic block. Basic blocks in LLVM are an important part of functions that define the Control Flow Graph. Since we don’t have any control flow, our functions will only contain one block at this point. We’ll fix this in Chapter 5 :).
  if (Value *RetVal = Body->Codegen()) {
    // Finish off the function.
    Builder.CreateRet(RetVal);

    // Validate the generated code, checking for consistency.
    verifyFunction(*TheFunction);

    return TheFunction;
  }
Once the insertion point is set up, we call the Codegen() method for the root expression of the function. If no error happens, this emits code to compute the expression into the entry block and returns the value that was computed. Assuming no error, we then create an LLVM ret instruction, which completes the function.
Once the function is built, we call verifyFunction, which is provided by LLVM. This function does a variety of consistency checks on the generated code, to determine if our compiler is doing everything right. Using this is important: it can catch a lot of bugs. Once the function is finished and validated, we return it.
  // Error reading body, remove function.
  TheFunction->eraseFromParent();
  return 0;
}
The only piece left here is handling of the error case. For simplicity, we handle this by merely deleting the function we produced with the eraseFromParent method. This allows the user to redefine a function that they incorrectly typed in before: if we didn’t delete it, it would live in the symbol table, with a body, preventing future redefinition.
This code does have a bug, though. Since PrototypeAST::Codegen can return a previously defined forward declaration, our code can actually delete a forward declaration. There are a number of ways to fix this bug; see what you can come up with! Here is a testcase:
extern foo(a b); # ok, defines foo.
def foo(a b) c; # error, 'c' is invalid.
def bar() foo(1, 2); # error, unknown function "foo"
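One possible fix, sketched here rather than worked out in full: on error, drop only the function’s body instead of the whole function, so a pre-existing forward declaration survives. (This uses Function::deleteBody(); it is one option, not the tutorial’s official answer.)
  // Error reading body: erase just the basic blocks. If PrototypeAST::Codegen
  // returned a previously 'extern'ed declaration, that declaration is kept.
  TheFunction->deleteBody();
  return 0;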
Driver Changes and Closing Thoughts¶
For now, code generation to LLVM doesn’t really get us much, except that we can look at the pretty IR calls. The sample code inserts calls to Codegen into the “HandleDefinition”, “HandleExtern” etc. functions, and then dumps out the LLVM IR. This gives a nice way to look at the LLVM IR for simple functions. For example:
ready> 4+5;
Read top-level expression:
define double @0() {
entry:
ret double 9.000000e+00
}
Note how the parser turns the top-level expression into anonymous functions for us. This will be handy when we add JIT support in the next chapter. Also note that the code is transcribed very literally; no optimizations are being performed except simple constant folding done by IRBuilder. We will add optimizations explicitly in the next chapter.
ready> def foo(a b) a*a + 2*a*b + b*b;
Read function definition:
define double @foo(double %a, double %b) {
entry:
%multmp = fmul double %a, %a
%multmp1 = fmul double 2.000000e+00, %a
%multmp2 = fmul double %multmp1, %b
%addtmp = fadd double %multmp, %multmp2
%multmp3 = fmul double %b, %b
%addtmp4 = fadd double %addtmp, %multmp3
ret double %addtmp4
}
This shows some simple arithmetic. Notice the striking similarity to the LLVM builder calls that we use to create the instructions.
ready> def bar(a) foo(a, 4.0) + bar(31337);
Read function definition:
define double @bar(double %a) {
entry:
%calltmp = call double @foo(double %a, double 4.000000e+00)
%calltmp1 = call double @bar(double 3.133700e+04)
%addtmp = fadd double %calltmp, %calltmp1
ret double %addtmp
}
This shows some function calls. Note that this function will take a long time to execute if you call it. In the future we’ll add conditional control flow to actually make recursion useful :).
ready> extern cos(x);
Read extern:
declare double @cos(double)
ready> cos(1.234);
Read top-level expression:
define double @1() {
entry:
%calltmp = call double @cos(double 1.234000e+00)
ret double %calltmp
}
This shows an extern for the libm “cos” function, and a call to it.
ready> ^D
; ModuleID = 'my cool jit'
define double @0() {
entry:
%addtmp = fadd double 4.000000e+00, 5.000000e+00
ret double %addtmp
}
define double @foo(double %a, double %b) {
entry:
%multmp = fmul double %a, %a
%multmp1 = fmul double 2.000000e+00, %a
%multmp2 = fmul double %multmp1, %b
%addtmp = fadd double %multmp, %multmp2
%multmp3 = fmul double %b, %b
%addtmp4 = fadd double %addtmp, %multmp3
ret double %addtmp4
}
define double @bar(double %a) {
entry:
%calltmp = call double @foo(double %a, double 4.000000e+00)
%calltmp1 = call double @bar(double 3.133700e+04)
%addtmp = fadd double %calltmp, %calltmp1
ret double %addtmp
}
declare double @cos(double)
define double @1() {
entry:
%calltmp = call double @cos(double 1.234000e+00)
ret double %calltmp
}
When you quit the current demo, it dumps out the IR for the entire module generated. Here you can see the big picture with all the functions referencing each other.
This wraps up the third chapter of the Kaleidoscope tutorial. Up next, we’ll describe how to add JIT codegen and optimizer support to this so we can actually start running code!
Full Code Listing¶
Here is the complete code listing for our running example, enhanced with the LLVM code generator. Because this uses the LLVM libraries, we need to link them in. To do this, we use the llvm-config tool to inform our makefile/command line about which options to use:
# Compile
clang++ -g -O3 toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core` -o toy
# Run
./toy
Here is the code:
#include "llvm/IR/Verifier.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include <cctype>
#include <cstdio>
#include <cstdlib>
#include <map>
#include <string>
#include <vector>
using namespace llvm;
//===----------------------------------------------------------------------===//
// Lexer
//===----------------------------------------------------------------------===//
// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
// of these for known things.
enum Token {
tok_eof = -1,
// commands
tok_def = -2, tok_extern = -3,
// primary
tok_identifier = -4, tok_number = -5
};
static std::string IdentifierStr; // Filled in if tok_identifier
static double NumVal; // Filled in if tok_number
/// gettok - Return the next token from standard input.
static int gettok() {
static int LastChar = ' ';
// Skip any whitespace.
while (isspace(LastChar))
LastChar = getchar();
if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
IdentifierStr = LastChar;
while (isalnum((LastChar = getchar())))
IdentifierStr += LastChar;
if (IdentifierStr == "def") return tok_def;
if (IdentifierStr == "extern") return tok_extern;
return tok_identifier;
}
if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
std::string NumStr;
do {
NumStr += LastChar;
LastChar = getchar();
} while (isdigit(LastChar) || LastChar == '.');
NumVal = strtod(NumStr.c_str(), 0);
return tok_number;
}
if (LastChar == '#') {
// Comment until end of line.
do LastChar = getchar();
while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
if (LastChar != EOF)
return gettok();
}
// Check for end of file. Don't eat the EOF.
if (LastChar == EOF)
return tok_eof;
// Otherwise, just return the character as its ascii value.
int ThisChar = LastChar;
LastChar = getchar();
return ThisChar;
}
//===----------------------------------------------------------------------===//
// Abstract Syntax Tree (aka Parse Tree)
//===----------------------------------------------------------------------===//
namespace {
/// ExprAST - Base class for all expression nodes.
class ExprAST {
public:
virtual ~ExprAST() {}
virtual Value *Codegen() = 0;
};
/// NumberExprAST - Expression class for numeric literals like "1.0".
class NumberExprAST : public ExprAST {
double Val;
public:
NumberExprAST(double val) : Val(val) {}
Value *Codegen() override;
};
/// VariableExprAST - Expression class for referencing a variable, like "a".
class VariableExprAST : public ExprAST {
std::string Name;
public:
VariableExprAST(const std::string &name) : Name(name) {}
Value *Codegen() override;
};
/// BinaryExprAST - Expression class for a binary operator.
class BinaryExprAST : public ExprAST {
char Op;
ExprAST *LHS, *RHS;
public:
BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
: Op(op), LHS(lhs), RHS(rhs) {}
Value *Codegen() override;
};
/// CallExprAST - Expression class for function calls.
class CallExprAST : public ExprAST {
std::string Callee;
std::vector<ExprAST*> Args;
public:
CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
: Callee(callee), Args(args) {}
Value *Codegen() override;
};
/// PrototypeAST - This class represents the "prototype" for a function,
/// which captures its name, and its argument names (thus implicitly the number
/// of arguments the function takes).
class PrototypeAST {
std::string Name;
std::vector<std::string> Args;
public:
PrototypeAST(const std::string &name, const std::vector<std::string> &args)
: Name(name), Args(args) {}
Function *Codegen();
};
/// FunctionAST - This class represents a function definition itself.
class FunctionAST {
PrototypeAST *Proto;
ExprAST *Body;
public:
FunctionAST(PrototypeAST *proto, ExprAST *body)
: Proto(proto), Body(body) {}
Function *Codegen();
};
} // end anonymous namespace
//===----------------------------------------------------------------------===//
// Parser
//===----------------------------------------------------------------------===//
/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
/// token the parser is looking at. getNextToken reads another token from the
/// lexer and updates CurTok with its results.
static int CurTok;
static int getNextToken() {
return CurTok = gettok();
}
/// BinopPrecedence - This holds the precedence for each binary operator that is
/// defined.
static std::map<char, int> BinopPrecedence;
/// GetTokPrecedence - Get the precedence of the pending binary operator token.
static int GetTokPrecedence() {
if (!isascii(CurTok))
return -1;
// Make sure it's a declared binop.
int TokPrec = BinopPrecedence[CurTok];
if (TokPrec <= 0) return -1;
return TokPrec;
}
/// Error* - These are little helper functions for error handling.
ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
static ExprAST *ParseExpression();
/// identifierexpr
/// ::= identifier
/// ::= identifier '(' expression* ')'
static ExprAST *ParseIdentifierExpr() {
std::string IdName = IdentifierStr;
getNextToken(); // eat identifier.
if (CurTok != '(') // Simple variable ref.
return new VariableExprAST(IdName);
// Call.
getNextToken(); // eat (
std::vector<ExprAST*> Args;
if (CurTok != ')') {
while (1) {
ExprAST *Arg = ParseExpression();
if (!Arg) return 0;
Args.push_back(Arg);
if (CurTok == ')') break;
if (CurTok != ',')
return Error("Expected ')' or ',' in argument list");
getNextToken();
}
}
// Eat the ')'.
getNextToken();
return new CallExprAST(IdName, Args);
}
/// numberexpr ::= number
static ExprAST *ParseNumberExpr() {
ExprAST *Result = new NumberExprAST(NumVal);
getNextToken(); // consume the number
return Result;
}
/// parenexpr ::= '(' expression ')'
static ExprAST *ParseParenExpr() {
getNextToken(); // eat (.
ExprAST *V = ParseExpression();
if (!V) return 0;
if (CurTok != ')')
return Error("expected ')'");
getNextToken(); // eat ).
return V;
}
/// primary
/// ::= identifierexpr
/// ::= numberexpr
/// ::= parenexpr
static ExprAST *ParsePrimary() {
switch (CurTok) {
default: return Error("unknown token when expecting an expression");
case tok_identifier: return ParseIdentifierExpr();
case tok_number: return ParseNumberExpr();
case '(': return ParseParenExpr();
}
}
/// binoprhs
/// ::= ('+' primary)*
static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
// If this is a binop, find its precedence.
while (1) {
int TokPrec = GetTokPrecedence();
// If this is a binop that binds at least as tightly as the current binop,
// consume it, otherwise we are done.
if (TokPrec < ExprPrec)
return LHS;
// Okay, we know this is a binop.
int BinOp = CurTok;
getNextToken(); // eat binop
// Parse the primary expression after the binary operator.
ExprAST *RHS = ParsePrimary();
if (!RHS) return 0;
// If BinOp binds less tightly with RHS than the operator after RHS, let
// the pending operator take RHS as its LHS.
int NextPrec = GetTokPrecedence();
if (TokPrec < NextPrec) {
RHS = ParseBinOpRHS(TokPrec+1, RHS);
if (RHS == 0) return 0;
}
// Merge LHS/RHS.
LHS = new BinaryExprAST(BinOp, LHS, RHS);
}
}
/// expression
/// ::= primary binoprhs
///
static ExprAST *ParseExpression() {
ExprAST *LHS = ParsePrimary();
if (!LHS) return 0;
return ParseBinOpRHS(0, LHS);
}
/// prototype
/// ::= id '(' id* ')'
static PrototypeAST *ParsePrototype() {
if (CurTok != tok_identifier)
return ErrorP("Expected function name in prototype");
std::string FnName = IdentifierStr;
getNextToken();
if (CurTok != '(')
return ErrorP("Expected '(' in prototype");
std::vector<std::string> ArgNames;
while (getNextToken() == tok_identifier)
ArgNames.push_back(IdentifierStr);
if (CurTok != ')')
return ErrorP("Expected ')' in prototype");
// success.
getNextToken(); // eat ')'.
return new PrototypeAST(FnName, ArgNames);
}
/// definition ::= 'def' prototype expression
static FunctionAST *ParseDefinition() {
getNextToken(); // eat def.
PrototypeAST *Proto = ParsePrototype();
if (Proto == 0) return 0;
if (ExprAST *E = ParseExpression())
return new FunctionAST(Proto, E);
return 0;
}
/// toplevelexpr ::= expression
static FunctionAST *ParseTopLevelExpr() {
if (ExprAST *E = ParseExpression()) {
// Make an anonymous proto.
PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
return new FunctionAST(Proto, E);
}
return 0;
}
/// external ::= 'extern' prototype
static PrototypeAST *ParseExtern() {
getNextToken(); // eat extern.
return ParsePrototype();
}
//===----------------------------------------------------------------------===//
// Code Generation
//===----------------------------------------------------------------------===//
static Module *TheModule;
static IRBuilder<> Builder(getGlobalContext());
static std::map<std::string, Value*> NamedValues;
Value *ErrorV(const char *Str) { Error(Str); return 0; }
Value *NumberExprAST::Codegen() {
return ConstantFP::get(getGlobalContext(), APFloat(Val));
}
Value *VariableExprAST::Codegen() {
// Look this variable up in the function.
Value *V = NamedValues[Name];
return V ? V : ErrorV("Unknown variable name");
}
Value *BinaryExprAST::Codegen() {
Value *L = LHS->Codegen();
Value *R = RHS->Codegen();
if (L == 0 || R == 0) return 0;
switch (Op) {
case '+': return Builder.CreateFAdd(L, R, "addtmp");
case '-': return Builder.CreateFSub(L, R, "subtmp");
case '*': return Builder.CreateFMul(L, R, "multmp");
case '<':
L = Builder.CreateFCmpULT(L, R, "cmptmp");
// Convert bool 0/1 to double 0.0 or 1.0
return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
"booltmp");
default: return ErrorV("invalid binary operator");
}
}
Value *CallExprAST::Codegen() {
// Look up the name in the global module table.
Function *CalleeF = TheModule->getFunction(Callee);
if (CalleeF == 0)
return ErrorV("Unknown function referenced");
// If argument mismatch error.
if (CalleeF->arg_size() != Args.size())
return ErrorV("Incorrect # arguments passed");
std::vector<Value*> ArgsV;
for (unsigned i = 0, e = Args.size(); i != e; ++i) {
ArgsV.push_back(Args[i]->Codegen());
if (ArgsV.back() == 0) return 0;
}
return Builder.CreateCall(CalleeF, ArgsV, "calltmp");
}
Function *PrototypeAST::Codegen() {
// Make the function type: double(double,double) etc.
std::vector<Type*> Doubles(Args.size(),
Type::getDoubleTy(getGlobalContext()));
FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()),
Doubles, false);
Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
// If F conflicted, there was already something named 'Name'. If it has a
// body, don't allow redefinition or reextern.
if (F->getName() != Name) {
// Delete the one we just made and get the existing one.
F->eraseFromParent();
F = TheModule->getFunction(Name);
// If F already has a body, reject this.
if (!F->empty()) {
ErrorF("redefinition of function");
return 0;
}
// If F took a different number of args, reject.
if (F->arg_size() != Args.size()) {
ErrorF("redefinition of function with different # args");
return 0;
}
}
// Set names for all arguments.
unsigned Idx = 0;
for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size();
++AI, ++Idx) {
AI->setName(Args[Idx]);
// Add arguments to variable symbol table.
NamedValues[Args[Idx]] = AI;
}
return F;
}
Function *FunctionAST::Codegen() {
NamedValues.clear();
Function *TheFunction = Proto->Codegen();
if (TheFunction == 0)
return 0;
// Create a new basic block to start insertion into.
BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
Builder.SetInsertPoint(BB);
if (Value *RetVal = Body->Codegen()) {
// Finish off the function.
Builder.CreateRet(RetVal);
// Validate the generated code, checking for consistency.
verifyFunction(*TheFunction);
return TheFunction;
}
// Error reading body, remove function.
TheFunction->eraseFromParent();
return 0;
}
//===----------------------------------------------------------------------===//
// Top-Level parsing and JIT Driver
//===----------------------------------------------------------------------===//
static void HandleDefinition() {
if (FunctionAST *F = ParseDefinition()) {
if (Function *LF = F->Codegen()) {
fprintf(stderr, "Read function definition:");
LF->dump();
}
} else {
// Skip token for error recovery.
getNextToken();
}
}
static void HandleExtern() {
if (PrototypeAST *P = ParseExtern()) {
if (Function *F = P->Codegen()) {
fprintf(stderr, "Read extern: ");
F->dump();
}
} else {
// Skip token for error recovery.
getNextToken();
}
}
static void HandleTopLevelExpression() {
// Evaluate a top-level expression into an anonymous function.
if (FunctionAST *F = ParseTopLevelExpr()) {
if (Function *LF = F->Codegen()) {
fprintf(stderr, "Read top-level expression:");
LF->dump();
}
} else {
// Skip token for error recovery.
getNextToken();
}
}
/// top ::= definition | external | expression | ';'
static void MainLoop() {
while (1) {
fprintf(stderr, "ready> ");
switch (CurTok) {
case tok_eof: return;
case ';': getNextToken(); break; // ignore top-level semicolons.
case tok_def: HandleDefinition(); break;
case tok_extern: HandleExtern(); break;
default: HandleTopLevelExpression(); break;
}
}
}
//===----------------------------------------------------------------------===//
// "Library" functions that can be "extern'd" from user code.
//===----------------------------------------------------------------------===//
/// putchard - putchar that takes a double and returns 0.
extern "C"
double putchard(double X) {
putchar((char)X);
return 0;
}
//===----------------------------------------------------------------------===//
// Main driver code.
//===----------------------------------------------------------------------===//
int main() {
LLVMContext &Context = getGlobalContext();
// Install standard binary operators.
// 1 is lowest precedence.
BinopPrecedence['<'] = 10;
BinopPrecedence['+'] = 20;
BinopPrecedence['-'] = 20;
BinopPrecedence['*'] = 40; // highest.
// Prime the first token.
fprintf(stderr, "ready> ");
getNextToken();
// Make the module, which holds all the code.
TheModule = new Module("my cool jit", Context);
// Run the main "interpreter loop" now.
MainLoop();
// Print out all of the generated code.
TheModule->dump();
return 0;
}
Kaleidoscope: Adding JIT and Optimizer Support¶
Chapter 4 Introduction¶
Welcome to Chapter 4 of the “Implementing a language with LLVM” tutorial. Chapters 1-3 described the implementation of a simple language and added support for generating LLVM IR. This chapter describes two new techniques: adding optimizer support to your language, and adding JIT compiler support. These additions will demonstrate how to get nice, efficient code for the Kaleidoscope language.
Trivial Constant Folding¶
Our demonstration for Chapter 3 is elegant and easy to extend. Unfortunately, it does not produce wonderful code. The IRBuilder, however, does give us obvious optimizations when compiling simple code:
ready> def test(x) 1+2+x;
Read function definition:
define double @test(double %x) {
entry:
%addtmp = fadd double 3.000000e+00, %x
ret double %addtmp
}
This code is not a literal transcription of the AST built by parsing the input. That would be:
ready> def test(x) 1+2+x;
Read function definition:
define double @test(double %x) {
entry:
%addtmp = fadd double 2.000000e+00, 1.000000e+00
%addtmp1 = fadd double %addtmp, %x
ret double %addtmp1
}
Constant folding, as seen above, is a very common and very important optimization: so much so that many language implementors implement constant folding support in their AST representation.
With LLVM, you don’t need this support in the AST. Since all calls to build LLVM IR go through the LLVM IR builder, the builder itself checks to see whether there is a constant folding opportunity when you call it. If so, it just does the constant fold and returns the constant instead of creating an instruction.
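You can watch the folding happen: with two constant operands, CreateFAdd below inserts no instruction at all and simply hands back the folded constant. (A sketch assuming the tutorial’s Builder global and the LLVM 3.x-era API it uses.)
Value *L = ConstantFP::get(getGlobalContext(), APFloat(1.0));
Value *R = ConstantFP::get(getGlobalContext(), APFloat(2.0));
Value *Sum = Builder.CreateFAdd(L, R, "addtmp");
// Sum is the ConstantFP for 3.0, not an fadd instruction:
// isa<ConstantFP>(Sum) holds and nothing was inserted into the block.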
Well, that was easy :). In practice, we recommend always using IRBuilder when generating code like this. It has no “syntactic overhead” for its use (you don’t have to uglify your compiler with constant checks everywhere) and it can dramatically reduce the amount of LLVM IR that is generated in some cases (particularly for languages with a macro preprocessor or that use a lot of constants).
On the other hand, the IRBuilder is limited by the fact that it does all of its analysis inline with the code as it is built. If you take a slightly more complex example:
ready> def test(x) (1+2+x)*(x+(1+2));
ready> Read function definition:
define double @test(double %x) {
entry:
%addtmp = fadd double 3.000000e+00, %x
%addtmp1 = fadd double %x, 3.000000e+00
%multmp = fmul double %addtmp, %addtmp1
ret double %multmp
}
In this case, the LHS and RHS of the multiplication are the same value. We’d really like to see this generate “tmp = x+3; result = tmp*tmp;” instead of computing “x+3” twice.
Unfortunately, no amount of local analysis will be able to detect and correct this. This requires two transformations: reassociation of expressions (to make the adds lexically identical) and Common Subexpression Elimination (CSE) to delete the redundant add instruction. Fortunately, LLVM provides a broad range of optimizations that you can use, in the form of “passes”.
LLVM Optimization Passes¶
LLVM provides many optimization passes, which do many different sorts of things and have different tradeoffs. Unlike other systems, LLVM doesn’t hold to the mistaken notion that one set of optimizations is right for all languages and for all situations. LLVM allows a compiler implementor to make complete decisions about what optimizations to use, in which order, and in what situation.
As a concrete example, LLVM supports both “whole module” passes, which look across as large a body of code as they can (often a whole file, but if run at link time, this can be a substantial portion of the whole program). It also supports and includes “per-function” passes which just operate on a single function at a time, without looking at other functions. For more information on passes and how they are run, see the How to Write a Pass document and the List of LLVM Passes.
For Kaleidoscope, we are currently generating functions on the fly, one at a time, as the user types them in. We aren’t shooting for the ultimate optimization experience in this setting, but we also want to catch the easy and quick stuff where possible. As such, we will choose to run a few per-function optimizations as the user types the function in. If we wanted to make a “static Kaleidoscope compiler”, we would use exactly the code we have now, except that we would defer running the optimizer until the entire file has been parsed.
In order to get per-function optimizations going, we need to set up a FunctionPassManager to hold and organize the LLVM optimizations that we want to run. Once we have that, we can add a set of optimizations to run. The code looks like this:
FunctionPassManager OurFPM(TheModule);
// Set up the optimizer pipeline. Start with registering info about how the
// target lays out data structures.
OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout()));
// Provide basic AliasAnalysis support for GVN.
OurFPM.add(createBasicAliasAnalysisPass());
// Do simple "peephole" optimizations and bit-twiddling optzns.
OurFPM.add(createInstructionCombiningPass());
// Reassociate expressions.
OurFPM.add(createReassociatePass());
// Eliminate Common SubExpressions.
OurFPM.add(createGVNPass());
// Simplify the control flow graph (deleting unreachable blocks, etc).
OurFPM.add(createCFGSimplificationPass());
OurFPM.doInitialization();
// Set the global so the code gen can use this.
TheFPM = &OurFPM;
// Run the main "interpreter loop" now.
MainLoop();
This code defines a FunctionPassManager, “OurFPM”. It requires a pointer to the Module to construct itself. Once it is set up, we use a series of “add” calls to add a bunch of LLVM passes. The first pass is basically boilerplate: it adds a pass so that later optimizations know how the data structures in the program are laid out. The “TheExecutionEngine” variable is related to the JIT, which we will get to in the next section.
In this case, we choose to add four optimization passes (the basic alias analysis pass supplies analysis for GVN rather than transforming code). The passes we chose here are a pretty standard set of “cleanup” optimizations that are useful for a wide variety of code. I won’t delve into what they do but, believe me, they are a good starting place :).
Once the PassManager is set up, we need to make use of it. We do this by running it after our newly created function is constructed (in FunctionAST::Codegen), but before it is returned to the client:
  if (Value *RetVal = Body->Codegen()) {
    // Finish off the function.
    Builder.CreateRet(RetVal);

    // Validate the generated code, checking for consistency.
    verifyFunction(*TheFunction);

    // Optimize the function.
    TheFPM->run(*TheFunction);

    return TheFunction;
  }
As you can see, this is pretty straightforward. The FunctionPassManager optimizes and updates the LLVM Function* in place, improving (hopefully) its body. With this in place, we can try our test above again:
ready> def test(x) (1+2+x)*(x+(1+2));
ready> Read function definition:
define double @test(double %x) {
entry:
%addtmp = fadd double %x, 3.000000e+00
%multmp = fmul double %addtmp, %addtmp
ret double %multmp
}
As expected, we now get our nicely optimized code, saving a floating point add instruction from every execution of this function.
LLVM provides a wide variety of optimizations that can be used in certain circumstances. Some documentation about the various passes is available, but it isn’t very complete. Another good source of ideas can come from looking at the passes that Clang runs to get started. The “opt” tool allows you to experiment with passes from the command line, so you can see if they do anything.
Now that we have reasonable code coming out of our front-end, let’s talk about executing it!
Adding a JIT Compiler¶
Code that is available in LLVM IR can have a wide variety of tools applied to it. For example, you can run optimizations on it (as we did above), you can dump it out in textual or binary forms, you can compile the code to an assembly file (.s) for some target, or you can JIT compile it. The nice thing about the LLVM IR representation is that it is the “common currency” between many different parts of the compiler.
In this section, we’ll add JIT compiler support to our interpreter. The basic idea that we want for Kaleidoscope is to have the user enter function bodies as they do now, but immediately evaluate the top-level expressions they type in. For example, if they type in “1 + 2;”, we should evaluate and print out 3. If they define a function, they should be able to call it from the command line.
In order to do this, we first declare and initialize the JIT. This is done by adding a global variable and a call in main:
static ExecutionEngine *TheExecutionEngine;
...

int main() {
  ..
  // Create the JIT. This takes ownership of the module.
  TheExecutionEngine = EngineBuilder(TheModule).create();
  ..
}
This creates an abstract “Execution Engine” which can be either a JIT compiler or the LLVM interpreter. LLVM will automatically pick a JIT compiler for you if one is available for your platform, otherwise it will fall back to the interpreter.
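If you want to take the choice out of LLVM’s hands, EngineBuilder can also be told which kind of engine to construct, e.g. forcing the interpreter (a sketch against the LLVM 3.x-era API this tutorial targets):
TheExecutionEngine =
    EngineBuilder(TheModule).setEngineKind(EngineKind::Interpreter).create();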
Once the ExecutionEngine is created, the JIT is ready to be used. There are a variety of APIs that are useful, but the simplest one is the “getPointerToFunction(F)” method. This method JIT compiles the specified LLVM Function and returns a function pointer to the generated machine code. In our case, this means that we can change the code that parses a top-level expression to look like this:
static void HandleTopLevelExpression() {
  // Evaluate a top-level expression into an anonymous function.
  if (FunctionAST *F = ParseTopLevelExpr()) {
    if (Function *LF = F->Codegen()) {
      LF->dump(); // Dump the function for exposition purposes.

      // JIT the function, returning a function pointer.
      void *FPtr = TheExecutionEngine->getPointerToFunction(LF);

      // Cast it to the right type (takes no arguments, returns a double) so we
      // can call it as a native function.
      double (*FP)() = (double (*)())(intptr_t)FPtr;
      fprintf(stderr, "Evaluated to %f\n", FP());
    }
Recall that we compile top-level expressions into a self-contained LLVM function that takes no arguments and returns the computed double. Because the LLVM JIT compiler matches the native platform ABI, this means that you can just cast the result pointer to a function pointer of that type and call it directly. This means there is no difference between JIT compiled code and native machine code that is statically linked into your application.
With just these two changes, let's see how Kaleidoscope works now!
ready> 4+5;
Read top-level expression:
define double @0() {
entry:
ret double 9.000000e+00
}
Evaluated to 9.000000
Well, this looks like it is basically working. The dump of the function shows the "no argument function that always returns double" that we synthesize for each top-level expression that is typed in. This demonstrates very basic functionality, but can we do more?
ready> def testfunc(x y) x + y*2;
Read function definition:
define double @testfunc(double %x, double %y) {
entry:
%multmp = fmul double %y, 2.000000e+00
%addtmp = fadd double %multmp, %x
ret double %addtmp
}
ready> testfunc(4, 10);
Read top-level expression:
define double @1() {
entry:
%calltmp = call double @testfunc(double 4.000000e+00, double 1.000000e+01)
ret double %calltmp
}
Evaluated to 24.000000
This illustrates that we can now call user code, but there is something a bit subtle going on here. Note that we only invoke the JIT on the anonymous functions that call testfunc, but we never invoked it on testfunc itself. What actually happened here is that the JIT scanned for all non-JIT'd functions transitively called from the anonymous function and compiled all of them before returning from getPointerToFunction().
The JIT provides a number of other more advanced interfaces for things like freeing allocated machine code, re-jitting functions to update them, etc. However, even with this simple code, we get some surprisingly powerful capabilities - check this out (I removed the dump of the anonymous functions; you should get the idea by now :) :
ready> extern sin(x);
Read extern:
declare double @sin(double)
ready> extern cos(x);
Read extern:
declare double @cos(double)
ready> sin(1.0);
Read top-level expression:
define double @2() {
entry:
ret double 0x3FEAED548F090CEE
}
Evaluated to 0.841471
ready> def foo(x) sin(x)*sin(x) + cos(x)*cos(x);
Read function definition:
define double @foo(double %x) {
entry:
%calltmp = call double @sin(double %x)
%multmp = fmul double %calltmp, %calltmp
%calltmp2 = call double @cos(double %x)
%multmp4 = fmul double %calltmp2, %calltmp2
%addtmp = fadd double %multmp, %multmp4
ret double %addtmp
}
ready> foo(4.0);
Read top-level expression:
define double @3() {
entry:
%calltmp = call double @foo(double 4.000000e+00)
ret double %calltmp
}
Evaluated to 1.000000
Whoa, how does the JIT know about sin and cos? The answer is surprisingly simple: in this example, the JIT started execution of a function and got to a function call. It realized that the function was not yet JIT compiled and invoked the standard set of routines to resolve the function. In this case, there is no body defined for the function, so the JIT ended up calling "dlsym("sin")" on the Kaleidoscope process itself. Since "sin" is defined within the JIT's address space, it simply patches up calls in the module to call the libm version of sin directly.
The LLVM JIT provides a number of interfaces (look in the ExecutionEngine.h file) for controlling how unknown functions get resolved. It allows you to establish explicit mappings between IR objects and addresses (useful for LLVM global variables that you want to map to static tables, for example), allows you to dynamically decide on the fly based on the function name, and even allows you to have the JIT compile functions lazily the first time they're called.
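As a rough sketch of two of these interfaces, InstallLazyFunctionCreator and addGlobalMapping (the names below like MyLazyResolver, MyGlobalVariable, and MyStaticTable are hypothetical; see ExecutionEngine.h for the real declarations):
// Resolve unknown names on the fly; returning null falls back to the
// default lookup (e.g. dlsym on the process). Assumes <cmath> for sin.
void *MyLazyResolver(const std::string &Name) {
  if (Name == "mysin") return (void *)(intptr_t)&sin; // made-up mapping
  return 0;
}
...
TheExecutionEngine->InstallLazyFunctionCreator(MyLazyResolver);
// Pin an IR global to a fixed address, e.g. a static table in the host.
TheExecutionEngine->addGlobalMapping(MyGlobalVariable, &MyStaticTable);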
One interesting application of this is that we can now extend the language by writing arbitrary C++ code to implement operations. For example, if we add:
/// putchard - putchar that takes a double and returns 0.
extern "C"
double putchard(double X) {
putchar((char)X);
return 0;
}
Now we can produce simple output to the console by using things like: "extern putchard(x); putchard(120);", which prints a lowercase 'x' on the console (120 is the ASCII code for 'x'). Similar code could be used to implement file I/O, console input, and many other capabilities in Kaleidoscope.
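For instance, a companion helper in the same style (a sketch here; a function like this shows up in later chapters) would let programs print numbers rather than characters:
/// printd - printf that takes a double and prints it with a newline.
extern "C" double printd(double X) {
  printf("%f\n", X);
  return 0;
}
After an "extern printd(x);", calling "printd(42);" would print "42.000000".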
This completes the JIT and optimizer chapter of the Kaleidoscope tutorial. At this point, we can compile a non-Turing-complete programming language, optimize and JIT compile it in a user-driven way. Next up we’ll look into extending the language with control flow constructs, tackling some interesting LLVM IR issues along the way.
Full Code Listing¶
Here is the complete code listing for our running example, enhanced with the LLVM JIT and optimizer. To build this example, use:
# Compile
clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
# Run
./toy
If you are compiling this on Linux, make sure to add the “-rdynamic” option as well. This makes sure that the external functions are resolved properly at runtime.
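That is, on Linux the compile line would look like this (the same command as above, with the extra flag):
clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -rdynamic -O3 -o toy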
Here is the code:
#include "llvm/Analysis/Passes.h"
#include "llvm/ExecutionEngine/ExecutionEngine.h"
#include "llvm/ExecutionEngine/MCJIT.h"
#include "llvm/ExecutionEngine/SectionMemoryManager.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Verifier.h"
#include "llvm/Support/TargetSelect.h"
#include "llvm/Transforms/Scalar.h"
#include <cctype>
#include <cstdio>
#include <map>
#include <string>
#include <vector>
using namespace llvm;
//===----------------------------------------------------------------------===//
// Lexer
//===----------------------------------------------------------------------===//
// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
// of these for known things.
enum Token {
tok_eof = -1,
// commands
tok_def = -2,
tok_extern = -3,
// primary
tok_identifier = -4,
tok_number = -5
};
static std::string IdentifierStr; // Filled in if tok_identifier
static double NumVal; // Filled in if tok_number
/// gettok - Return the next token from standard input.
static int gettok() {
static int LastChar = ' ';
// Skip any whitespace.
while (isspace(LastChar))
LastChar = getchar();
if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
IdentifierStr = LastChar;
while (isalnum((LastChar = getchar())))
IdentifierStr += LastChar;
if (IdentifierStr == "def")
return tok_def;
if (IdentifierStr == "extern")
return tok_extern;
return tok_identifier;
}
if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
std::string NumStr;
do {
NumStr += LastChar;
LastChar = getchar();
} while (isdigit(LastChar) || LastChar == '.');
NumVal = strtod(NumStr.c_str(), 0);
return tok_number;
}
if (LastChar == '#') {
// Comment until end of line.
do
LastChar = getchar();
while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
if (LastChar != EOF)
return gettok();
}
// Check for end of file. Don't eat the EOF.
if (LastChar == EOF)
return tok_eof;
// Otherwise, just return the character as its ascii value.
int ThisChar = LastChar;
LastChar = getchar();
return ThisChar;
}
//===----------------------------------------------------------------------===//
// Abstract Syntax Tree (aka Parse Tree)
//===----------------------------------------------------------------------===//
namespace {
/// ExprAST - Base class for all expression nodes.
class ExprAST {
public:
virtual ~ExprAST() {}
virtual Value *Codegen() = 0;
};
/// NumberExprAST - Expression class for numeric literals like "1.0".
class NumberExprAST : public ExprAST {
double Val;
public:
NumberExprAST(double val) : Val(val) {}
Value *Codegen() override;
};
/// VariableExprAST - Expression class for referencing a variable, like "a".
class VariableExprAST : public ExprAST {
std::string Name;
public:
VariableExprAST(const std::string &name) : Name(name) {}
Value *Codegen() override;
};
/// BinaryExprAST - Expression class for a binary operator.
class BinaryExprAST : public ExprAST {
char Op;
ExprAST *LHS, *RHS;
public:
BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
: Op(op), LHS(lhs), RHS(rhs) {}
Value *Codegen() override;
};
/// CallExprAST - Expression class for function calls.
class CallExprAST : public ExprAST {
std::string Callee;
std::vector<ExprAST *> Args;
public:
CallExprAST(const std::string &callee, std::vector<ExprAST *> &args)
: Callee(callee), Args(args) {}
Value *Codegen() override;
};
/// PrototypeAST - This class represents the "prototype" for a function,
/// which captures its name, and its argument names (thus implicitly the number
/// of arguments the function takes).
class PrototypeAST {
std::string Name;
std::vector<std::string> Args;
public:
PrototypeAST(const std::string &name, const std::vector<std::string> &args)
: Name(name), Args(args) {}
Function *Codegen();
};
/// FunctionAST - This class represents a function definition itself.
class FunctionAST {
PrototypeAST *Proto;
ExprAST *Body;
public:
FunctionAST(PrototypeAST *proto, ExprAST *body) : Proto(proto), Body(body) {}
Function *Codegen();
};
} // end anonymous namespace
//===----------------------------------------------------------------------===//
// Parser
//===----------------------------------------------------------------------===//
/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
/// token the parser is looking at. getNextToken reads another token from the
/// lexer and updates CurTok with its results.
static int CurTok;
static int getNextToken() { return CurTok = gettok(); }
/// BinopPrecedence - This holds the precedence for each binary operator that is
/// defined.
static std::map<char, int> BinopPrecedence;
/// GetTokPrecedence - Get the precedence of the pending binary operator token.
static int GetTokPrecedence() {
if (!isascii(CurTok))
return -1;
// Make sure it's a declared binop.
int TokPrec = BinopPrecedence[CurTok];
if (TokPrec <= 0)
return -1;
return TokPrec;
}
/// Error* - These are little helper functions for error handling.
ExprAST *Error(const char *Str) {
fprintf(stderr, "Error: %s\n", Str);
return 0;
}
PrototypeAST *ErrorP(const char *Str) {
Error(Str);
return 0;
}
FunctionAST *ErrorF(const char *Str) {
Error(Str);
return 0;
}
static ExprAST *ParseExpression();
/// identifierexpr
/// ::= identifier
/// ::= identifier '(' expression* ')'
static ExprAST *ParseIdentifierExpr() {
std::string IdName = IdentifierStr;
getNextToken(); // eat identifier.
if (CurTok != '(') // Simple variable ref.
return new VariableExprAST(IdName);
// Call.
getNextToken(); // eat (
std::vector<ExprAST *> Args;
if (CurTok != ')') {
while (1) {
ExprAST *Arg = ParseExpression();
if (!Arg)
return 0;
Args.push_back(Arg);
if (CurTok == ')')
break;
if (CurTok != ',')
return Error("Expected ')' or ',' in argument list");
getNextToken();
}
}
// Eat the ')'.
getNextToken();
return new CallExprAST(IdName, Args);
}
/// numberexpr ::= number
static ExprAST *ParseNumberExpr() {
ExprAST *Result = new NumberExprAST(NumVal);
getNextToken(); // consume the number
return Result;
}
/// parenexpr ::= '(' expression ')'
static ExprAST *ParseParenExpr() {
getNextToken(); // eat (.
ExprAST *V = ParseExpression();
if (!V)
return 0;
if (CurTok != ')')
return Error("expected ')'");
getNextToken(); // eat ).
return V;
}
/// primary
/// ::= identifierexpr
/// ::= numberexpr
/// ::= parenexpr
static ExprAST *ParsePrimary() {
switch (CurTok) {
default:
return Error("unknown token when expecting an expression");
case tok_identifier:
return ParseIdentifierExpr();
case tok_number:
return ParseNumberExpr();
case '(':
return ParseParenExpr();
}
}
/// binoprhs
/// ::= ('+' primary)*
static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
// If this is a binop, find its precedence.
while (1) {
int TokPrec = GetTokPrecedence();
// If this is a binop that binds at least as tightly as the current binop,
// consume it, otherwise we are done.
if (TokPrec < ExprPrec)
return LHS;
// Okay, we know this is a binop.
int BinOp = CurTok;
getNextToken(); // eat binop
// Parse the primary expression after the binary operator.
ExprAST *RHS = ParsePrimary();
if (!RHS)
return 0;
// If BinOp binds less tightly with RHS than the operator after RHS, let
// the pending operator take RHS as its LHS.
int NextPrec = GetTokPrecedence();
if (TokPrec < NextPrec) {
RHS = ParseBinOpRHS(TokPrec + 1, RHS);
if (RHS == 0)
return 0;
}
// Merge LHS/RHS.
LHS = new BinaryExprAST(BinOp, LHS, RHS);
}
}
/// expression
/// ::= primary binoprhs
///
static ExprAST *ParseExpression() {
ExprAST *LHS = ParsePrimary();
if (!LHS)
return 0;
return ParseBinOpRHS(0, LHS);
}
/// prototype
/// ::= id '(' id* ')'
static PrototypeAST *ParsePrototype() {
if (CurTok != tok_identifier)
return ErrorP("Expected function name in prototype");
std::string FnName = IdentifierStr;
getNextToken();
if (CurTok != '(')
return ErrorP("Expected '(' in prototype");
std::vector<std::string> ArgNames;
while (getNextToken() == tok_identifier)
ArgNames.push_back(IdentifierStr);
if (CurTok != ')')
return ErrorP("Expected ')' in prototype");
// success.
getNextToken(); // eat ')'.
return new PrototypeAST(FnName, ArgNames);
}
/// definition ::= 'def' prototype expression
static FunctionAST *ParseDefinition() {
getNextToken(); // eat def.
PrototypeAST *Proto = ParsePrototype();
if (Proto == 0)
return 0;
if (ExprAST *E = ParseExpression())
return new FunctionAST(Proto, E);
return 0;
}
/// toplevelexpr ::= expression
static FunctionAST *ParseTopLevelExpr() {
if (ExprAST *E = ParseExpression()) {
// Make an anonymous proto.
PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
return new FunctionAST(Proto, E);
}
return 0;
}
/// external ::= 'extern' prototype
static PrototypeAST *ParseExtern() {
getNextToken(); // eat extern.
return ParsePrototype();
}
//===----------------------------------------------------------------------===//
// Quick and dirty hack
//===----------------------------------------------------------------------===//
// FIXME: Obviously we can do better than this
std::string GenerateUniqueName(const char *root) {
static int i = 0;
char s[16];
sprintf(s, "%s%d", root, i++);
std::string S = s;
return S;
}
std::string MakeLegalFunctionName(std::string Name) {
std::string NewName;
if (!Name.length())
return GenerateUniqueName("anon_func_");
// Start with what we have
NewName = Name;
// Look for a numeric first character
if (NewName.find_first_of("0123456789") == 0) {
NewName.insert(0, 1, 'n');
}
// Replace illegal characters with their ASCII equivalent
std::string legal_elements =
"_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
size_t pos;
while ((pos = NewName.find_first_not_of(legal_elements)) !=
std::string::npos) {
char old_c = NewName.at(pos);
char new_str[16];
sprintf(new_str, "%d", (int)old_c);
NewName = NewName.replace(pos, 1, new_str);
}
return NewName;
}
//===----------------------------------------------------------------------===//
// MCJIT helper class
//===----------------------------------------------------------------------===//
class MCJITHelper {
public:
MCJITHelper(LLVMContext &C) : Context(C), OpenModule(NULL) {}
~MCJITHelper();
Function *getFunction(const std::string FnName);
Module *getModuleForNewFunction();
void *getPointerToFunction(Function *F);
void *getSymbolAddress(const std::string &Name);
void dump();
private:
typedef std::vector<Module *> ModuleVector;
typedef std::vector<ExecutionEngine *> EngineVector;
LLVMContext &Context;
Module *OpenModule;
ModuleVector Modules;
EngineVector Engines;
};
class HelpingMemoryManager : public SectionMemoryManager {
HelpingMemoryManager(const HelpingMemoryManager &) = delete;
void operator=(const HelpingMemoryManager &) = delete;
public:
HelpingMemoryManager(MCJITHelper *Helper) : MasterHelper(Helper) {}
~HelpingMemoryManager() override {}
/// This method returns the address of the specified symbol.
/// Our implementation will attempt to find symbols in other
/// modules associated with the MCJITHelper to cross link symbols
/// from one generated module to another.
uint64_t getSymbolAddress(const std::string &Name) override;
private:
MCJITHelper *MasterHelper;
};
uint64_t HelpingMemoryManager::getSymbolAddress(const std::string &Name) {
uint64_t FnAddr = SectionMemoryManager::getSymbolAddress(Name);
if (FnAddr)
return FnAddr;
uint64_t HelperFun = (uint64_t)MasterHelper->getSymbolAddress(Name);
if (!HelperFun)
report_fatal_error("Program used extern function '" + Name +
"' which could not be resolved!");
return HelperFun;
}
MCJITHelper::~MCJITHelper() {
if (OpenModule)
delete OpenModule;
EngineVector::iterator begin = Engines.begin();
EngineVector::iterator end = Engines.end();
EngineVector::iterator it;
for (it = begin; it != end; ++it)
delete *it;
}
Function *MCJITHelper::getFunction(const std::string FnName) {
ModuleVector::iterator begin = Modules.begin();
ModuleVector::iterator end = Modules.end();
ModuleVector::iterator it;
for (it = begin; it != end; ++it) {
Function *F = (*it)->getFunction(FnName);
if (F) {
if (*it == OpenModule)
return F;
assert(OpenModule != NULL);
// This function is in a module that has already been JITed.
// We need to generate a new prototype for external linkage.
Function *PF = OpenModule->getFunction(FnName);
if (PF && !PF->empty()) {
ErrorF("redefinition of function across modules");
return 0;
}
// If we don't have a prototype yet, create one.
if (!PF)
PF = Function::Create(F->getFunctionType(), Function::ExternalLinkage,
FnName, OpenModule);
return PF;
}
}
return NULL;
}
Module *MCJITHelper::getModuleForNewFunction() {
// If we have a Module that hasn't been JITed, use that.
if (OpenModule)
return OpenModule;
// Otherwise create a new Module.
std::string ModName = GenerateUniqueName("mcjit_module_");
Module *M = new Module(ModName, Context);
Modules.push_back(M);
OpenModule = M;
return M;
}
void *MCJITHelper::getPointerToFunction(Function *F) {
// See if an existing instance of MCJIT has this function.
EngineVector::iterator begin = Engines.begin();
EngineVector::iterator end = Engines.end();
EngineVector::iterator it;
for (it = begin; it != end; ++it) {
void *P = (*it)->getPointerToFunction(F);
if (P)
return P;
}
// If we didn't find the function, see if we can generate it.
if (OpenModule) {
std::string ErrStr;
ExecutionEngine *NewEngine =
EngineBuilder(std::unique_ptr<Module>(OpenModule))
.setErrorStr(&ErrStr)
.setMCJITMemoryManager(std::unique_ptr<HelpingMemoryManager>(
new HelpingMemoryManager(this)))
.create();
if (!NewEngine) {
fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str());
exit(1);
}
// Create a function pass manager for this engine
auto *FPM = new legacy::FunctionPassManager(OpenModule);
// Set up the optimizer pipeline. Start with registering info about how the
// target lays out data structures.
OpenModule->setDataLayout(*NewEngine->getDataLayout());
// Provide basic AliasAnalysis support for GVN.
FPM->add(createBasicAliasAnalysisPass());
// Promote allocas to registers.
FPM->add(createPromoteMemoryToRegisterPass());
// Do simple "peephole" optimizations and bit-twiddling optzns.
FPM->add(createInstructionCombiningPass());
// Reassociate expressions.
FPM->add(createReassociatePass());
// Eliminate Common SubExpressions.
FPM->add(createGVNPass());
// Simplify the control flow graph (deleting unreachable blocks, etc).
FPM->add(createCFGSimplificationPass());
FPM->doInitialization();
// For each function in the module
Module::iterator it;
Module::iterator end = OpenModule->end();
for (it = OpenModule->begin(); it != end; ++it) {
// Run the FPM on this function
FPM->run(*it);
}
// We don't need this anymore
delete FPM;
OpenModule = NULL;
Engines.push_back(NewEngine);
NewEngine->finalizeObject();
return NewEngine->getPointerToFunction(F);
}
return NULL;
}
void *MCJITHelper::getSymbolAddress(const std::string &Name) {
// Look for the symbol in each of our execution engines.
EngineVector::iterator begin = Engines.begin();
EngineVector::iterator end = Engines.end();
EngineVector::iterator it;
for (it = begin; it != end; ++it) {
uint64_t FAddr = (*it)->getFunctionAddress(Name);
if (FAddr) {
return (void *)FAddr;
}
}
return NULL;
}
void MCJITHelper::dump() {
ModuleVector::iterator begin = Modules.begin();
ModuleVector::iterator end = Modules.end();
ModuleVector::iterator it;
for (it = begin; it != end; ++it)
(*it)->dump();
}
//===----------------------------------------------------------------------===//
// Code Generation
//===----------------------------------------------------------------------===//
static MCJITHelper *JITHelper;
static IRBuilder<> Builder(getGlobalContext());
static std::map<std::string, Value *> NamedValues;
Value *ErrorV(const char *Str) {
Error(Str);
return 0;
}
Value *NumberExprAST::Codegen() {
return ConstantFP::get(getGlobalContext(), APFloat(Val));
}
Value *VariableExprAST::Codegen() {
// Look this variable up in the function.
Value *V = NamedValues[Name];
return V ? V : ErrorV("Unknown variable name");
}
Value *BinaryExprAST::Codegen() {
Value *L = LHS->Codegen();
Value *R = RHS->Codegen();
if (L == 0 || R == 0)
return 0;
switch (Op) {
case '+':
return Builder.CreateFAdd(L, R, "addtmp");
case '-':
return Builder.CreateFSub(L, R, "subtmp");
case '*':
return Builder.CreateFMul(L, R, "multmp");
case '<':
L = Builder.CreateFCmpULT(L, R, "cmptmp");
// Convert bool 0/1 to double 0.0 or 1.0
return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
"booltmp");
default:
return ErrorV("invalid binary operator");
}
}
Value *CallExprAST::Codegen() {
// Look up the name in the global module table.
Function *CalleeF = JITHelper->getFunction(Callee);
if (CalleeF == 0)
return ErrorV("Unknown function referenced");
// If argument mismatch error.
if (CalleeF->arg_size() != Args.size())
return ErrorV("Incorrect # arguments passed");
std::vector<Value *> ArgsV;
for (unsigned i = 0, e = Args.size(); i != e; ++i) {
ArgsV.push_back(Args[i]->Codegen());
if (ArgsV.back() == 0)
return 0;
}
return Builder.CreateCall(CalleeF, ArgsV, "calltmp");
}
Function *PrototypeAST::Codegen() {
// Make the function type: double(double,double) etc.
std::vector<Type *> Doubles(Args.size(),
Type::getDoubleTy(getGlobalContext()));
FunctionType *FT =
FunctionType::get(Type::getDoubleTy(getGlobalContext()), Doubles, false);
std::string FnName = MakeLegalFunctionName(Name);
Module *M = JITHelper->getModuleForNewFunction();
Function *F = Function::Create(FT, Function::ExternalLinkage, FnName, M);
// If F conflicted, there was already something named 'Name'. If it has a
// body, don't allow redefinition or reextern.
if (F->getName() != FnName) {
// Delete the one we just made and get the existing one.
F->eraseFromParent();
F = JITHelper->getFunction(Name);
// If F already has a body, reject this.
if (!F->empty()) {
ErrorF("redefinition of function");
return 0;
}
// If F took a different number of args, reject.
if (F->arg_size() != Args.size()) {
ErrorF("redefinition of function with different # args");
return 0;
}
}
// Set names for all arguments.
unsigned Idx = 0;
for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size();
++AI, ++Idx) {
AI->setName(Args[Idx]);
// Add arguments to variable symbol table.
NamedValues[Args[Idx]] = AI;
}
return F;
}
Function *FunctionAST::Codegen() {
NamedValues.clear();
Function *TheFunction = Proto->Codegen();
if (TheFunction == 0)
return 0;
// Create a new basic block to start insertion into.
BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
Builder.SetInsertPoint(BB);
if (Value *RetVal = Body->Codegen()) {
// Finish off the function.
Builder.CreateRet(RetVal);
// Validate the generated code, checking for consistency.
verifyFunction(*TheFunction);
return TheFunction;
}
// Error reading body, remove function.
TheFunction->eraseFromParent();
return 0;
}
//===----------------------------------------------------------------------===//
// Top-Level parsing and JIT Driver
//===----------------------------------------------------------------------===//
static void HandleDefinition() {
if (FunctionAST *F = ParseDefinition()) {
if (Function *LF = F->Codegen()) {
fprintf(stderr, "Read function definition:");
LF->dump();
}
} else {
// Skip token for error recovery.
getNextToken();
}
}
static void HandleExtern() {
if (PrototypeAST *P = ParseExtern()) {
if (Function *F = P->Codegen()) {
fprintf(stderr, "Read extern: ");
F->dump();
}
} else {
// Skip token for error recovery.
getNextToken();
}
}
static void HandleTopLevelExpression() {
// Evaluate a top-level expression into an anonymous function.
if (FunctionAST *F = ParseTopLevelExpr()) {
if (Function *LF = F->Codegen()) {
// JIT the function, returning a function pointer.
void *FPtr = JITHelper->getPointerToFunction(LF);
// Cast it to the right type (takes no arguments, returns a double) so we
// can call it as a native function.
double (*FP)() = (double (*)())(intptr_t)FPtr;
fprintf(stderr, "Evaluated to %f\n", FP());
}
} else {
// Skip token for error recovery.
getNextToken();
}
}
/// top ::= definition | external | expression | ';'
static void MainLoop() {
while (1) {
fprintf(stderr, "ready> ");
switch (CurTok) {
case tok_eof:
return;
case ';':
getNextToken();
break; // ignore top-level semicolons.
case tok_def:
HandleDefinition();
break;
case tok_extern:
HandleExtern();
break;
default:
HandleTopLevelExpression();
break;
}
}
}
//===----------------------------------------------------------------------===//
// "Library" functions that can be "extern'd" from user code.
//===----------------------------------------------------------------------===//
/// putchard - putchar that takes a double and returns 0.
extern "C" double putchard(double X) {
putchar((char)X);
return 0;
}
//===----------------------------------------------------------------------===//
// Main driver code.
//===----------------------------------------------------------------------===//
int main() {
InitializeNativeTarget();
InitializeNativeTargetAsmPrinter();
InitializeNativeTargetAsmParser();
LLVMContext &Context = getGlobalContext();
JITHelper = new MCJITHelper(Context);
// Install standard binary operators.
// 1 is lowest precedence.
BinopPrecedence['<'] = 10;
BinopPrecedence['+'] = 20;
BinopPrecedence['-'] = 20;
BinopPrecedence['*'] = 40; // highest.
// Prime the first token.
fprintf(stderr, "ready> ");
getNextToken();
// Run the main "interpreter loop" now.
MainLoop();
// Print out all of the generated code.
JITHelper->dump();
return 0;
}
Kaleidoscope: Extending the Language: Control Flow¶
Chapter 5 Introduction¶
Welcome to Chapter 5 of the “Implementing a language with LLVM” tutorial. Parts 1-4 described the implementation of the simple Kaleidoscope language and included support for generating LLVM IR, followed by optimizations and a JIT compiler. Unfortunately, as presented, Kaleidoscope is mostly useless: it has no control flow other than call and return. This means that you can’t have conditional branches in the code, significantly limiting its power. In this episode of “build that compiler”, we’ll extend Kaleidoscope to have an if/then/else expression plus a simple ‘for’ loop.
If/Then/Else¶
Extending Kaleidoscope to support if/then/else is quite straightforward. It basically requires adding support for this “new” concept to the lexer, parser, AST, and LLVM code emitter. This example is nice, because it shows how easy it is to “grow” a language over time, incrementally extending it as new ideas are discovered.
Before we get going on "how" we add this extension, let's talk about "what" we want. The basic idea is that we want to be able to write this sort of thing:
def fib(x)
if x < 3 then
1
else
fib(x-1)+fib(x-2);
In Kaleidoscope, every construct is an expression: there are no statements. As such, the if/then/else expression needs to return a value like any other. Since we're using a mostly functional form, we'll have it evaluate its conditional, then return the 'then' or 'else' value based on how the condition was resolved. This is very similar to the C '?:' expression.
The semantics of the if/then/else expression is that it evaluates the condition to a boolean equality value: 0.0 is considered to be false and everything else is considered to be true. If the condition is true, the first subexpression is evaluated and returned, if the condition is false, the second subexpression is evaluated and returned. Since Kaleidoscope allows side-effects, this behavior is important to nail down.
Now that we know what we "want", let's break this down into its constituent pieces.
Lexer Extensions for If/Then/Else¶
The lexer extensions are straightforward. First we add new enum values for the relevant tokens:
// control
tok_if = -6, tok_then = -7, tok_else = -8,
Once we have that, we recognize the new keywords in the lexer. This is pretty simple stuff:
...
if (IdentifierStr == "def") return tok_def;
if (IdentifierStr == "extern") return tok_extern;
if (IdentifierStr == "if") return tok_if;
if (IdentifierStr == "then") return tok_then;
if (IdentifierStr == "else") return tok_else;
return tok_identifier;
AST Extensions for If/Then/Else¶
To represent the new expression we add a new AST node for it:
/// IfExprAST - Expression class for if/then/else.
class IfExprAST : public ExprAST {
ExprAST *Cond, *Then, *Else;
public:
IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
: Cond(cond), Then(then), Else(_else) {}
virtual Value *Codegen();
};
The AST node just has pointers to the various subexpressions.
Parser Extensions for If/Then/Else¶
Now that we have the relevant tokens coming from the lexer and we have the AST node to build, our parsing logic is relatively straightforward. First we define a new parsing function:
/// ifexpr ::= 'if' expression 'then' expression 'else' expression
static ExprAST *ParseIfExpr() {
getNextToken(); // eat the if.
// condition.
ExprAST *Cond = ParseExpression();
if (!Cond) return 0;
if (CurTok != tok_then)
return Error("expected then");
getNextToken(); // eat the then
ExprAST *Then = ParseExpression();
if (Then == 0) return 0;
if (CurTok != tok_else)
return Error("expected else");
getNextToken();
ExprAST *Else = ParseExpression();
if (!Else) return 0;
return new IfExprAST(Cond, Then, Else);
}
Next we hook it up as a primary expression:
static ExprAST *ParsePrimary() {
switch (CurTok) {
default: return Error("unknown token when expecting an expression");
case tok_identifier: return ParseIdentifierExpr();
case tok_number: return ParseNumberExpr();
case '(': return ParseParenExpr();
case tok_if: return ParseIfExpr();
}
}
LLVM IR for If/Then/Else¶
Now that we have it parsing and building the AST, the final piece is adding LLVM code generation support. This is the most interesting part of the if/then/else example, because this is where it starts to introduce new concepts. All of the code above has been thoroughly described in previous chapters.
To motivate the code we want to produce, let's take a look at a simple example. Consider:
extern foo();
extern bar();
def baz(x) if x then foo() else bar();
If you disable optimizations, the code you’ll (soon) get from Kaleidoscope looks like this:
declare double @foo()
declare double @bar()
define double @baz(double %x) {
entry:
%ifcond = fcmp one double %x, 0.000000e+00
br i1 %ifcond, label %then, label %else
then: ; preds = %entry
%calltmp = call double @foo()
br label %ifcont
else: ; preds = %entry
%calltmp1 = call double @bar()
br label %ifcont
ifcont: ; preds = %else, %then
%iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ]
ret double %iftmp
}
To visualize the control flow graph, you can use a nifty feature of the LLVM 'opt' tool. If you put this LLVM IR into "t.ll" and run "llvm-as < t.ll | opt -analyze -view-cfg", a window will pop up and you'll see this graph:
[figure: Example CFG]
Another way to get this is to call "F->viewCFG()" or "F->viewCFGOnly()" (where F is a "Function*") either by inserting actual calls into the code and recompiling or by calling these in the debugger. LLVM has many nice features for visualizing various graphs.
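For example, from a debugger session (a hypothetical gdb transcript), you could stop inside FunctionAST::Codegen and run:
(gdb) call TheFunction->viewCFG()
which pops up the same window without recompiling anything.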
Getting back to the generated code, it is fairly simple: the entry block evaluates the conditional expression ("x" in our case here) and compares the result to 0.0 with the "fcmp one" instruction ('one' is "Ordered and Not Equal"). Based on the result of this expression, the code jumps to either the "then" or "else" blocks, which contain the expressions for the true/false cases.
Once the then/else blocks are finished executing, they both branch back to the ‘ifcont’ block to execute the code that happens after the if/then/else. In this case the only thing left to do is to return to the caller of the function. The question then becomes: how does the code know which expression to return?
The answer to this question involves an important SSA operation: the Phi operation. If you’re not familiar with SSA, the wikipedia article is a good introduction and there are various other introductions to it available on your favorite search engine. The short version is that “execution” of the Phi operation requires “remembering” which block control came from. The Phi operation takes on the value corresponding to the input control block. In this case, if control comes in from the “then” block, it gets the value of “calltmp”. If control comes from the “else” block, it gets the value of “calltmp1”.
At this point, you are probably starting to think “Oh no! This means my simple and elegant front-end will have to start generating SSA form in order to use LLVM!”. Fortunately, this is not the case, and we strongly advise not implementing an SSA construction algorithm in your front-end unless there is an amazingly good reason to do so. In practice, there are two sorts of values that float around in code written for your average imperative programming language that might need Phi nodes:
- Code that involves user variables:
x = 1; x = x + 1;
- Values that are implicit in the structure of your AST, such as the Phi node in this case.
In Chapter 7 of this tutorial (“mutable variables”), we’ll talk about #1 in depth. For now, just believe me that you don’t need SSA construction to handle this case. For #2, you have the choice of using the techniques that we will describe for #1, or you can insert Phi nodes directly, if convenient. In this case, it is really easy to generate the Phi node, so we choose to do it directly.
Okay, enough of the motivation and overview, let's generate code!
Code Generation for If/Then/Else¶
In order to generate code for this, we implement the Codegen method for IfExprAST:
Value *IfExprAST::Codegen() {
Value *CondV = Cond->Codegen();
if (CondV == 0) return 0;
// Convert condition to a bool by comparing equal to 0.0.
CondV = Builder.CreateFCmpONE(CondV,
ConstantFP::get(getGlobalContext(), APFloat(0.0)),
"ifcond");
This code is straightforward and similar to what we saw before. We emit the expression for the condition, then compare that value to zero to get a truth value as a 1-bit (bool) value.
Function *TheFunction = Builder.GetInsertBlock()->getParent();
// Create blocks for the then and else cases. Insert the 'then' block at the
// end of the function.
BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction);
BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else");
BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont");
Builder.CreateCondBr(CondV, ThenBB, ElseBB);
This code creates the basic blocks that are related to the if/then/else statement, and correspond directly to the blocks in the example above. The first line gets the current Function object that is being built. It gets this by asking the builder for the current BasicBlock, and asking that block for its “parent” (the function it is currently embedded into).
Once it has that, it creates three blocks. Note that it passes “TheFunction” into the constructor for the “then” block. This causes the constructor to automatically insert the new block into the end of the specified function. The other two blocks are created, but aren’t yet inserted into the function.
Once the blocks are created, we can emit the conditional branch that chooses between them. Note that creating new blocks does not implicitly affect the IRBuilder, so it is still inserting into the block that the condition went into. Also note that it is creating a branch to the “then” block and the “else” block, even though the “else” block isn’t inserted into the function yet. This is all ok: it is the standard way that LLVM supports forward references.
// Emit then value.
Builder.SetInsertPoint(ThenBB);
Value *ThenV = Then->Codegen();
if (ThenV == 0) return 0;
Builder.CreateBr(MergeBB);
// Codegen of 'Then' can change the current block, update ThenBB for the PHI.
ThenBB = Builder.GetInsertBlock();
After the conditional branch is inserted, we move the builder to start inserting into the “then” block. Strictly speaking, this call moves the insertion point to be at the end of the specified block. However, since the “then” block is empty, it also starts out by inserting at the beginning of the block. :)
Once the insertion point is set, we recursively codegen the “then” expression from the AST. To finish off the “then” block, we create an unconditional branch to the merge block. One interesting (and very important) aspect of the LLVM IR is that it requires all basic blocks to be “terminated” with a control flow instruction such as return or branch. This means that all control flow, including fall throughs must be made explicit in the LLVM IR. If you violate this rule, the verifier will emit an error.
The final line here is quite subtle, but is very important. The basic issue is that when we create the Phi node in the merge block, we need to set up the block/value pairs that indicate how the Phi will work. Importantly, the Phi node expects to have an entry for each predecessor of the block in the CFG. Why then, are we getting the current block when we just set it to ThenBB 5 lines above? The problem is that the “Then” expression may actually itself change the block that the Builder is emitting into if, for example, it contains a nested “if/then/else” expression. Because calling Codegen recursively could arbitrarily change the notion of the current block, we are required to get an up-to-date value for code that will set up the Phi node.
// Emit else block.
TheFunction->getBasicBlockList().push_back(ElseBB);
Builder.SetInsertPoint(ElseBB);
Value *ElseV = Else->Codegen();
if (ElseV == 0) return 0;
Builder.CreateBr(MergeBB);
// Codegen of 'Else' can change the current block, update ElseBB for the PHI.
ElseBB = Builder.GetInsertBlock();
Code generation for the ‘else’ block is basically identical to codegen for the ‘then’ block. The only significant difference is the first line, which adds the ‘else’ block to the function. Recall previously that the ‘else’ block was created, but not added to the function. Now that the ‘then’ and ‘else’ blocks are emitted, we can finish up with the merge code:
// Emit merge block.
TheFunction->getBasicBlockList().push_back(MergeBB);
Builder.SetInsertPoint(MergeBB);
PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2,
"iftmp");
PN->addIncoming(ThenV, ThenBB);
PN->addIncoming(ElseV, ElseBB);
return PN;
}
The first two lines here are now familiar: the first adds the “merge” block to the Function object (it was previously floating, like the else block above). The second changes the insertion point so that newly created code will go into the “merge” block. Once that is done, we need to create the PHI node and set up the block/value pairs for the PHI.
Finally, the CodeGen function returns the phi node as the value computed by the if/then/else expression. In our example above, this returned value will feed into the code for the top-level function, which will create the return instruction.
Overall, we now have the ability to execute conditional code in Kaleidoscope. With this extension, Kaleidoscope is a fairly complete language that can calculate a wide variety of numeric functions. Next up we’ll add another useful expression that is familiar from non-functional languages...
‘for’ Loop Expression¶
Now that we know how to add basic control flow constructs to the language, we have the tools to add more powerful things. Let's add something more aggressive, a 'for' expression:
extern putchard(char)
def printstar(n)
for i = 1, i < n, 1.0 in
putchard(42); # ascii 42 = '*'
# print 100 '*' characters
printstar(100);
This expression defines a new variable (“i” in this case) which iterates from a starting value, while the condition (“i < n” in this case) is true, incrementing by an optional step value (“1.0” in this case). If the step value is omitted, it defaults to 1.0. While the loop is true, it executes its body expression. Because we don’t have anything better to return, we’ll just define the loop as always returning 0.0. In the future when we have mutable variables, it will get more useful.
As before, let's talk about the changes that we need in Kaleidoscope to support this.
Lexer Extensions for the ‘for’ Loop¶
The lexer extensions are the same sort of thing as for if/then/else:
... in enum Token ...
// control
tok_if = -6, tok_then = -7, tok_else = -8,
tok_for = -9, tok_in = -10
... in gettok ...
if (IdentifierStr == "def") return tok_def;
if (IdentifierStr == "extern") return tok_extern;
if (IdentifierStr == "if") return tok_if;
if (IdentifierStr == "then") return tok_then;
if (IdentifierStr == "else") return tok_else;
if (IdentifierStr == "for") return tok_for;
if (IdentifierStr == "in") return tok_in;
return tok_identifier;
AST Extensions for the ‘for’ Loop¶
The AST node is just as simple. It basically boils down to capturing the variable name and the constituent expressions in the node.
/// ForExprAST - Expression class for for/in.
class ForExprAST : public ExprAST {
std::string VarName;
ExprAST *Start, *End, *Step, *Body;
public:
ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end,
ExprAST *step, ExprAST *body)
: VarName(varname), Start(start), End(end), Step(step), Body(body) {}
virtual Value *Codegen();
};
Parser Extensions for the ‘for’ Loop¶
The parser code is also fairly standard. The only interesting thing here is handling of the optional step value. The parser code handles it by checking to see if the second comma is present. If not, it sets the step value to null in the AST node:
/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
static ExprAST *ParseForExpr() {
getNextToken(); // eat the for.
if (CurTok != tok_identifier)
return Error("expected identifier after for");
std::string IdName = IdentifierStr;
getNextToken(); // eat identifier.
if (CurTok != '=')
return Error("expected '=' after for");
getNextToken(); // eat '='.
ExprAST *Start = ParseExpression();
if (Start == 0) return 0;
if (CurTok != ',')
return Error("expected ',' after for start value");
getNextToken();
ExprAST *End = ParseExpression();
if (End == 0) return 0;
// The step value is optional.
ExprAST *Step = 0;
if (CurTok == ',') {
getNextToken();
Step = ParseExpression();
if (Step == 0) return 0;
}
if (CurTok != tok_in)
return Error("expected 'in' after for");
getNextToken(); // eat 'in'.
ExprAST *Body = ParseExpression();
if (Body == 0) return 0;
return new ForExprAST(IdName, Start, End, Step, Body);
}
LLVM IR for the ‘for’ Loop¶
Now we get to the good part: the LLVM IR we want to generate for this thing. With the simple example above, we get this LLVM IR (note that this dump is generated with optimizations disabled for clarity):
declare double @putchard(double)
define double @printstar(double %n) {
entry:
; initial value = 1.0 (inlined into phi)
br label %loop
loop: ; preds = %loop, %entry
%i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ]
; body
%calltmp = call double @putchard(double 4.200000e+01)
; increment
%nextvar = fadd double %i, 1.000000e+00
; termination test
%cmptmp = fcmp ult double %i, %n
%booltmp = uitofp i1 %cmptmp to double
%loopcond = fcmp one double %booltmp, 0.000000e+00
br i1 %loopcond, label %loop, label %afterloop
afterloop: ; preds = %loop
; loop always returns 0.0
ret double 0.000000e+00
}
This loop contains all the same constructs we saw before: a phi node, several expressions, and some basic blocks. Let's see how this fits together.
Code Generation for the ‘for’ Loop¶
The first part of Codegen is very simple: we just output the start expression for the loop value:
Value *ForExprAST::Codegen() {
// Emit the start code first, without 'variable' in scope.
Value *StartVal = Start->Codegen();
if (StartVal == 0) return 0;
With this out of the way, the next step is to set up the LLVM basic block for the start of the loop body. In the case above, the whole loop body is one block, but remember that the body code itself could consist of multiple blocks (e.g. if it contains an if/then/else or a for/in expression).
// Make the new basic block for the loop header, inserting after current
// block.
Function *TheFunction = Builder.GetInsertBlock()->getParent();
BasicBlock *PreheaderBB = Builder.GetInsertBlock();
BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction);
// Insert an explicit fall through from the current block to the LoopBB.
Builder.CreateBr(LoopBB);
This code is similar to what we saw for if/then/else. Because we will need it to create the Phi node, we remember the block that falls through into the loop. Once we have that, we create the actual block that starts the loop and create an unconditional branch for the fall-through between the two blocks.
// Start insertion in LoopBB.
Builder.SetInsertPoint(LoopBB);
// Start the PHI node with an entry for Start.
PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, VarName.c_str());
Variable->addIncoming(StartVal, PreheaderBB);
Now that the “preheader” for the loop is set up, we switch to emitting code for the loop body. To begin with, we move the insertion point and create the PHI node for the loop induction variable. Since we already know the incoming value for the starting value, we add it to the Phi node. Note that the Phi will eventually get a second value for the backedge, but we can’t set it up yet (because it doesn’t exist!).
// Within the loop, the variable is defined equal to the PHI node. If it
// shadows an existing variable, we have to restore it, so save it now.
Value *OldVal = NamedValues[VarName];
NamedValues[VarName] = Variable;
// Emit the body of the loop. This, like any other expr, can change the
// current BB. Note that we ignore the value computed by the body, but don't
// allow an error.
if (Body->Codegen() == 0)
return 0;
Now the code starts to get more interesting. Our 'for' loop introduces a new variable to the symbol table. This means that our symbol table can now contain either function arguments or loop variables. To handle this, before we codegen the body of the loop, we add the loop variable as the current value for its name. Note that it is possible that there is a variable of the same name in the outer scope. It would be easy to make this an error (emit an error and return null if there is already an entry for VarName) but we choose to allow shadowing of variables. In order to handle this correctly, we remember the Value that we are potentially shadowing in OldVal (which will be null if there is no shadowed variable).
Once the loop variable is set into the symbol table, the code recursively codegen’s the body. This allows the body to use the loop variable: any references to it will naturally find it in the symbol table.
// Emit the step value.
Value *StepVal;
if (Step) {
StepVal = Step->Codegen();
if (StepVal == 0) return 0;
} else {
// If not specified, use 1.0.
StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0));
}
Value *NextVar = Builder.CreateFAdd(Variable, StepVal, "nextvar");
Now that the body is emitted, we compute the next value of the iteration variable by adding the step value, or 1.0 if it isn't present. 'NextVar' will be the value of the loop variable on the next iteration of the loop.
// Compute the end condition.
Value *EndCond = End->Codegen();
if (EndCond == 0) return EndCond;
// Convert condition to a bool by comparing equal to 0.0.
EndCond = Builder.CreateFCmpONE(EndCond,
ConstantFP::get(getGlobalContext(), APFloat(0.0)),
"loopcond");
Finally, we evaluate the exit value of the loop, to determine whether the loop should exit. This mirrors the condition evaluation for the if/then/else statement.
// Create the "after loop" block and insert it.
BasicBlock *LoopEndBB = Builder.GetInsertBlock();
BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction);
// Insert the conditional branch into the end of LoopEndBB.
Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
// Any new code will be inserted in AfterBB.
Builder.SetInsertPoint(AfterBB);
With the code for the body of the loop complete, we just need to finish up the control flow for it. This code remembers the end block (for the phi node), then creates the block for the loop exit (“afterloop”). Based on the value of the exit condition, it creates a conditional branch that chooses between executing the loop again and exiting the loop. Any future code is emitted in the “afterloop” block, so it sets the insertion position to it.
// Add a new entry to the PHI node for the backedge.
Variable->addIncoming(NextVar, LoopEndBB);
// Restore the unshadowed variable.
if (OldVal)
NamedValues[VarName] = OldVal;
else
NamedValues.erase(VarName);
// for expr always returns 0.0.
return Constant::getNullValue(Type::getDoubleTy(getGlobalContext()));
}
The final code handles various cleanups: now that we have the "NextVar" value, we can add the incoming value to the loop PHI node. After that, we remove the loop variable from the symbol table, so that it isn't in scope after the for loop. Finally, code generation of the for loop always returns 0.0, so that is what we return from ForExprAST::Codegen.
With this, we conclude the “adding control flow to Kaleidoscope” chapter of the tutorial. In this chapter we added two control flow constructs, and used them to motivate a couple of aspects of the LLVM IR that are important for front-end implementors to know. In the next chapter of our saga, we will get a bit crazier and add user-defined operators to our poor innocent language.
Full Code Listing¶
Here is the complete code listing for our running example, enhanced with the if/then/else and for expressions. To build this example, use:
# Compile
clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
# Run
./toy
Here is the code:
#include "llvm/ADT/STLExtras.h"
#include "llvm/Analysis/Passes.h"
#include "llvm/ExecutionEngine/ExecutionEngine.h"
#include "llvm/ExecutionEngine/MCJIT.h"
#include "llvm/ExecutionEngine/SectionMemoryManager.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Verifier.h"
#include "llvm/Support/TargetSelect.h"
#include "llvm/Transforms/Scalar.h"
#include <cctype>
#include <cstdio>
#include <map>
#include <string>
#include <vector>
using namespace llvm;
//===----------------------------------------------------------------------===//
// Lexer
//===----------------------------------------------------------------------===//
// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
// of these for known things.
enum Token {
tok_eof = -1,
// commands
tok_def = -2,
tok_extern = -3,
// primary
tok_identifier = -4,
tok_number = -5,
// control
tok_if = -6,
tok_then = -7,
tok_else = -8,
tok_for = -9,
tok_in = -10
};
static std::string IdentifierStr; // Filled in if tok_identifier
static double NumVal; // Filled in if tok_number
/// gettok - Return the next token from standard input.
static int gettok() {
static int LastChar = ' ';
// Skip any whitespace.
while (isspace(LastChar))
LastChar = getchar();
if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
IdentifierStr = LastChar;
while (isalnum((LastChar = getchar())))
IdentifierStr += LastChar;
if (IdentifierStr == "def")
return tok_def;
if (IdentifierStr == "extern")
return tok_extern;
if (IdentifierStr == "if")
return tok_if;
if (IdentifierStr == "then")
return tok_then;
if (IdentifierStr == "else")
return tok_else;
if (IdentifierStr == "for")
return tok_for;
if (IdentifierStr == "in")
return tok_in;
return tok_identifier;
}
if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
std::string NumStr;
do {
NumStr += LastChar;
LastChar = getchar();
} while (isdigit(LastChar) || LastChar == '.');
NumVal = strtod(NumStr.c_str(), 0);
return tok_number;
}
if (LastChar == '#') {
// Comment until end of line.
do
LastChar = getchar();
while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
if (LastChar != EOF)
return gettok();
}
// Check for end of file. Don't eat the EOF.
if (LastChar == EOF)
return tok_eof;
// Otherwise, just return the character as its ascii value.
int ThisChar = LastChar;
LastChar = getchar();
return ThisChar;
}
//===----------------------------------------------------------------------===//
// Abstract Syntax Tree (aka Parse Tree)
//===----------------------------------------------------------------------===//
namespace {
/// ExprAST - Base class for all expression nodes.
class ExprAST {
public:
virtual ~ExprAST() {}
virtual Value *Codegen() = 0;
};
/// NumberExprAST - Expression class for numeric literals like "1.0".
class NumberExprAST : public ExprAST {
double Val;
public:
NumberExprAST(double val) : Val(val) {}
Value *Codegen() override;
};
/// VariableExprAST - Expression class for referencing a variable, like "a".
class VariableExprAST : public ExprAST {
std::string Name;
public:
VariableExprAST(const std::string &name) : Name(name) {}
Value *Codegen() override;
};
/// BinaryExprAST - Expression class for a binary operator.
class BinaryExprAST : public ExprAST {
char Op;
ExprAST *LHS, *RHS;
public:
BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
: Op(op), LHS(lhs), RHS(rhs) {}
Value *Codegen() override;
};
/// CallExprAST - Expression class for function calls.
class CallExprAST : public ExprAST {
std::string Callee;
std::vector<ExprAST *> Args;
public:
CallExprAST(const std::string &callee, std::vector<ExprAST *> &args)
: Callee(callee), Args(args) {}
Value *Codegen() override;
};
/// IfExprAST - Expression class for if/then/else.
class IfExprAST : public ExprAST {
ExprAST *Cond, *Then, *Else;
public:
IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
: Cond(cond), Then(then), Else(_else) {}
Value *Codegen() override;
};
/// ForExprAST - Expression class for for/in.
class ForExprAST : public ExprAST {
std::string VarName;
ExprAST *Start, *End, *Step, *Body;
public:
ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end,
ExprAST *step, ExprAST *body)
: VarName(varname), Start(start), End(end), Step(step), Body(body) {}
Value *Codegen() override;
};
/// PrototypeAST - This class represents the "prototype" for a function,
/// which captures its name, and its argument names (thus implicitly the number
/// of arguments the function takes).
class PrototypeAST {
std::string Name;
std::vector<std::string> Args;
public:
PrototypeAST(const std::string &name, const std::vector<std::string> &args)
: Name(name), Args(args) {}
Function *Codegen();
};
/// FunctionAST - This class represents a function definition itself.
class FunctionAST {
PrototypeAST *Proto;
ExprAST *Body;
public:
FunctionAST(PrototypeAST *proto, ExprAST *body) : Proto(proto), Body(body) {}
Function *Codegen();
};
} // end anonymous namespace
//===----------------------------------------------------------------------===//
// Parser
//===----------------------------------------------------------------------===//
/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
/// token the parser is looking at. getNextToken reads another token from the
/// lexer and updates CurTok with its results.
static int CurTok;
static int getNextToken() { return CurTok = gettok(); }
/// BinopPrecedence - This holds the precedence for each binary operator that is
/// defined.
static std::map<char, int> BinopPrecedence;
/// GetTokPrecedence - Get the precedence of the pending binary operator token.
static int GetTokPrecedence() {
if (!isascii(CurTok))
return -1;
// Make sure it's a declared binop.
int TokPrec = BinopPrecedence[CurTok];
if (TokPrec <= 0)
return -1;
return TokPrec;
}
/// Error* - These are little helper functions for error handling.
ExprAST *Error(const char *Str) {
fprintf(stderr, "Error: %s\n", Str);
return 0;
}
PrototypeAST *ErrorP(const char *Str) {
Error(Str);
return 0;
}
FunctionAST *ErrorF(const char *Str) {
Error(Str);
return 0;
}
static ExprAST *ParseExpression();
/// identifierexpr
/// ::= identifier
/// ::= identifier '(' expression* ')'
static ExprAST *ParseIdentifierExpr() {
std::string IdName = IdentifierStr;
getNextToken(); // eat identifier.
if (CurTok != '(') // Simple variable ref.
return new VariableExprAST(IdName);
// Call.
getNextToken(); // eat (
std::vector<ExprAST *> Args;
if (CurTok != ')') {
while (1) {
ExprAST *Arg = ParseExpression();
if (!Arg)
return 0;
Args.push_back(Arg);
if (CurTok == ')')
break;
if (CurTok != ',')
return Error("Expected ')' or ',' in argument list");
getNextToken();
}
}
// Eat the ')'.
getNextToken();
return new CallExprAST(IdName, Args);
}
/// numberexpr ::= number
static ExprAST *ParseNumberExpr() {
ExprAST *Result = new NumberExprAST(NumVal);
getNextToken(); // consume the number
return Result;
}
/// parenexpr ::= '(' expression ')'
static ExprAST *ParseParenExpr() {
getNextToken(); // eat (.
ExprAST *V = ParseExpression();
if (!V)
return 0;
if (CurTok != ')')
return Error("expected ')'");
getNextToken(); // eat ).
return V;
}
/// ifexpr ::= 'if' expression 'then' expression 'else' expression
static ExprAST *ParseIfExpr() {
getNextToken(); // eat the if.
// condition.
ExprAST *Cond = ParseExpression();
if (!Cond)
return 0;
if (CurTok != tok_then)
return Error("expected then");
getNextToken(); // eat the then
ExprAST *Then = ParseExpression();
if (Then == 0)
return 0;
if (CurTok != tok_else)
return Error("expected else");
getNextToken();
ExprAST *Else = ParseExpression();
if (!Else)
return 0;
return new IfExprAST(Cond, Then, Else);
}
/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
static ExprAST *ParseForExpr() {
getNextToken(); // eat the for.
if (CurTok != tok_identifier)
return Error("expected identifier after for");
std::string IdName = IdentifierStr;
getNextToken(); // eat identifier.
if (CurTok != '=')
return Error("expected '=' after for");
getNextToken(); // eat '='.
ExprAST *Start = ParseExpression();
if (Start == 0)
return 0;
if (CurTok != ',')
return Error("expected ',' after for start value");
getNextToken();
ExprAST *End = ParseExpression();
if (End == 0)
return 0;
// The step value is optional.
ExprAST *Step = 0;
if (CurTok == ',') {
getNextToken();
Step = ParseExpression();
if (Step == 0)
return 0;
}
if (CurTok != tok_in)
return Error("expected 'in' after for");
getNextToken(); // eat 'in'.
ExprAST *Body = ParseExpression();
if (Body == 0)
return 0;
return new ForExprAST(IdName, Start, End, Step, Body);
}
/// primary
/// ::= identifierexpr
/// ::= numberexpr
/// ::= parenexpr
/// ::= ifexpr
/// ::= forexpr
static ExprAST *ParsePrimary() {
switch (CurTok) {
default:
return Error("unknown token when expecting an expression");
case tok_identifier:
return ParseIdentifierExpr();
case tok_number:
return ParseNumberExpr();
case '(':
return ParseParenExpr();
case tok_if:
return ParseIfExpr();
case tok_for:
return ParseForExpr();
}
}
/// binoprhs
/// ::= ('+' primary)*
static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
// If this is a binop, find its precedence.
while (1) {
int TokPrec = GetTokPrecedence();
// If this is a binop that binds at least as tightly as the current binop,
// consume it, otherwise we are done.
if (TokPrec < ExprPrec)
return LHS;
// Okay, we know this is a binop.
int BinOp = CurTok;
getNextToken(); // eat binop
// Parse the primary expression after the binary operator.
ExprAST *RHS = ParsePrimary();
if (!RHS)
return 0;
// If BinOp binds less tightly with RHS than the operator after RHS, let
// the pending operator take RHS as its LHS.
int NextPrec = GetTokPrecedence();
if (TokPrec < NextPrec) {
RHS = ParseBinOpRHS(TokPrec + 1, RHS);
if (RHS == 0)
return 0;
}
// Merge LHS/RHS.
LHS = new BinaryExprAST(BinOp, LHS, RHS);
}
}
/// expression
/// ::= primary binoprhs
///
static ExprAST *ParseExpression() {
ExprAST *LHS = ParsePrimary();
if (!LHS)
return 0;
return ParseBinOpRHS(0, LHS);
}
/// prototype
/// ::= id '(' id* ')'
static PrototypeAST *ParsePrototype() {
if (CurTok != tok_identifier)
return ErrorP("Expected function name in prototype");
std::string FnName = IdentifierStr;
getNextToken();
if (CurTok != '(')
return ErrorP("Expected '(' in prototype");
std::vector<std::string> ArgNames;
while (getNextToken() == tok_identifier)
ArgNames.push_back(IdentifierStr);
if (CurTok != ')')
return ErrorP("Expected ')' in prototype");
// success.
getNextToken(); // eat ')'.
return new PrototypeAST(FnName, ArgNames);
}
/// definition ::= 'def' prototype expression
static FunctionAST *ParseDefinition() {
getNextToken(); // eat def.
PrototypeAST *Proto = ParsePrototype();
if (Proto == 0)
return 0;
if (ExprAST *E = ParseExpression())
return new FunctionAST(Proto, E);
return 0;
}
/// toplevelexpr ::= expression
static FunctionAST *ParseTopLevelExpr() {
if (ExprAST *E = ParseExpression()) {
// Make an anonymous proto.
PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
return new FunctionAST(Proto, E);
}
return 0;
}
/// external ::= 'extern' prototype
static PrototypeAST *ParseExtern() {
getNextToken(); // eat extern.
return ParsePrototype();
}
//===----------------------------------------------------------------------===//
// Code Generation
//===----------------------------------------------------------------------===//
static Module *TheModule;
static IRBuilder<> Builder(getGlobalContext());
static std::map<std::string, Value *> NamedValues;
static legacy::FunctionPassManager *TheFPM;
Value *ErrorV(const char *Str) {
Error(Str);
return 0;
}
Value *NumberExprAST::Codegen() {
return ConstantFP::get(getGlobalContext(), APFloat(Val));
}
Value *VariableExprAST::Codegen() {
// Look this variable up in the function.
Value *V = NamedValues[Name];
return V ? V : ErrorV("Unknown variable name");
}
Value *BinaryExprAST::Codegen() {
Value *L = LHS->Codegen();
Value *R = RHS->Codegen();
if (L == 0 || R == 0)
return 0;
switch (Op) {
case '+':
return Builder.CreateFAdd(L, R, "addtmp");
case '-':
return Builder.CreateFSub(L, R, "subtmp");
case '*':
return Builder.CreateFMul(L, R, "multmp");
case '<':
L = Builder.CreateFCmpULT(L, R, "cmptmp");
// Convert bool 0/1 to double 0.0 or 1.0
return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
"booltmp");
default:
return ErrorV("invalid binary operator");
}
}
Value *CallExprAST::Codegen() {
// Look up the name in the global module table.
Function *CalleeF = TheModule->getFunction(Callee);
if (CalleeF == 0)
return ErrorV("Unknown function referenced");
// If argument mismatch error.
if (CalleeF->arg_size() != Args.size())
return ErrorV("Incorrect # arguments passed");
std::vector<Value *> ArgsV;
for (unsigned i = 0, e = Args.size(); i != e; ++i) {
ArgsV.push_back(Args[i]->Codegen());
if (ArgsV.back() == 0)
return 0;
}
return Builder.CreateCall(CalleeF, ArgsV, "calltmp");
}
Value *IfExprAST::Codegen() {
Value *CondV = Cond->Codegen();
if (CondV == 0)
return 0;
// Convert condition to a bool by comparing equal to 0.0.
CondV = Builder.CreateFCmpONE(
CondV, ConstantFP::get(getGlobalContext(), APFloat(0.0)), "ifcond");
Function *TheFunction = Builder.GetInsertBlock()->getParent();
// Create blocks for the then and else cases. Insert the 'then' block at the
// end of the function.
BasicBlock *ThenBB =
BasicBlock::Create(getGlobalContext(), "then", TheFunction);
BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else");
BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont");
Builder.CreateCondBr(CondV, ThenBB, ElseBB);
// Emit then value.
Builder.SetInsertPoint(ThenBB);
Value *ThenV = Then->Codegen();
if (ThenV == 0)
return 0;
Builder.CreateBr(MergeBB);
// Codegen of 'Then' can change the current block, update ThenBB for the PHI.
ThenBB = Builder.GetInsertBlock();
// Emit else block.
TheFunction->getBasicBlockList().push_back(ElseBB);
Builder.SetInsertPoint(ElseBB);
Value *ElseV = Else->Codegen();
if (ElseV == 0)
return 0;
Builder.CreateBr(MergeBB);
// Codegen of 'Else' can change the current block, update ElseBB for the PHI.
ElseBB = Builder.GetInsertBlock();
// Emit merge block.
TheFunction->getBasicBlockList().push_back(MergeBB);
Builder.SetInsertPoint(MergeBB);
PHINode *PN =
Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, "iftmp");
PN->addIncoming(ThenV, ThenBB);
PN->addIncoming(ElseV, ElseBB);
return PN;
}
Value *ForExprAST::Codegen() {
// Output this as:
// ...
// start = startexpr
// goto loop
// loop:
// variable = phi [start, loopheader], [nextvariable, loopend]
// ...
// bodyexpr
// ...
// loopend:
// step = stepexpr
// nextvariable = variable + step
// endcond = endexpr
// br endcond, loop, endloop
// outloop:
// Emit the start code first, without 'variable' in scope.
Value *StartVal = Start->Codegen();
if (StartVal == 0)
return 0;
// Make the new basic block for the loop header, inserting after current
// block.
Function *TheFunction = Builder.GetInsertBlock()->getParent();
BasicBlock *PreheaderBB = Builder.GetInsertBlock();
BasicBlock *LoopBB =
BasicBlock::Create(getGlobalContext(), "loop", TheFunction);
// Insert an explicit fall through from the current block to the LoopBB.
Builder.CreateBr(LoopBB);
// Start insertion in LoopBB.
Builder.SetInsertPoint(LoopBB);
// Start the PHI node with an entry for Start.
PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()),
2, VarName.c_str());
Variable->addIncoming(StartVal, PreheaderBB);
// Within the loop, the variable is defined equal to the PHI node. If it
// shadows an existing variable, we have to restore it, so save it now.
Value *OldVal = NamedValues[VarName];
NamedValues[VarName] = Variable;
// Emit the body of the loop. This, like any other expr, can change the
// current BB. Note that we ignore the value computed by the body, but don't
// allow an error.
if (Body->Codegen() == 0)
return 0;
// Emit the step value.
Value *StepVal;
if (Step) {
StepVal = Step->Codegen();
if (StepVal == 0)
return 0;
} else {
// If not specified, use 1.0.
StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0));
}
Value *NextVar = Builder.CreateFAdd(Variable, StepVal, "nextvar");
// Compute the end condition.
Value *EndCond = End->Codegen();
if (EndCond == 0)
return EndCond;
// Convert condition to a bool by comparing equal to 0.0.
EndCond = Builder.CreateFCmpONE(
EndCond, ConstantFP::get(getGlobalContext(), APFloat(0.0)), "loopcond");
// Create the "after loop" block and insert it.
BasicBlock *LoopEndBB = Builder.GetInsertBlock();
BasicBlock *AfterBB =
BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction);
// Insert the conditional branch into the end of LoopEndBB.
Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
// Any new code will be inserted in AfterBB.
Builder.SetInsertPoint(AfterBB);
// Add a new entry to the PHI node for the backedge.
Variable->addIncoming(NextVar, LoopEndBB);
// Restore the unshadowed variable.
if (OldVal)
NamedValues[VarName] = OldVal;
else
NamedValues.erase(VarName);
// for expr always returns 0.0.
return Constant::getNullValue(Type::getDoubleTy(getGlobalContext()));
}
Function *PrototypeAST::Codegen() {
// Make the function type: double(double,double) etc.
std::vector<Type *> Doubles(Args.size(),
Type::getDoubleTy(getGlobalContext()));
FunctionType *FT =
FunctionType::get(Type::getDoubleTy(getGlobalContext()), Doubles, false);
Function *F =
Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
// If F conflicted, there was already something named 'Name'. If it has a
// body, don't allow redefinition or reextern.
if (F->getName() != Name) {
// Delete the one we just made and get the existing one.
F->eraseFromParent();
F = TheModule->getFunction(Name);
// If F already has a body, reject this.
if (!F->empty()) {
ErrorF("redefinition of function");
return 0;
}
// If F took a different number of args, reject.
if (F->arg_size() != Args.size()) {
ErrorF("redefinition of function with different # args");
return 0;
}
}
// Set names for all arguments.
unsigned Idx = 0;
for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size();
++AI, ++Idx) {
AI->setName(Args[Idx]);
// Add arguments to variable symbol table.
NamedValues[Args[Idx]] = AI;
}
return F;
}
Function *FunctionAST::Codegen() {
NamedValues.clear();
Function *TheFunction = Proto->Codegen();
if (TheFunction == 0)
return 0;
// Create a new basic block to start insertion into.
BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
Builder.SetInsertPoint(BB);
if (Value *RetVal = Body->Codegen()) {
// Finish off the function.
Builder.CreateRet(RetVal);
// Validate the generated code, checking for consistency.
verifyFunction(*TheFunction);
// Optimize the function.
TheFPM->run(*TheFunction);
return TheFunction;
}
// Error reading body, remove function.
TheFunction->eraseFromParent();
return 0;
}
//===----------------------------------------------------------------------===//
// Top-Level parsing and JIT Driver
//===----------------------------------------------------------------------===//
static ExecutionEngine *TheExecutionEngine;
static void HandleDefinition() {
if (FunctionAST *F = ParseDefinition()) {
if (Function *LF = F->Codegen()) {
fprintf(stderr, "Read function definition:");
LF->dump();
}
} else {
// Skip token for error recovery.
getNextToken();
}
}
static void HandleExtern() {
if (PrototypeAST *P = ParseExtern()) {
if (Function *F = P->Codegen()) {
fprintf(stderr, "Read extern: ");
F->dump();
}
} else {
// Skip token for error recovery.
getNextToken();
}
}
static void HandleTopLevelExpression() {
// Evaluate a top-level expression into an anonymous function.
if (FunctionAST *F = ParseTopLevelExpr()) {
if (Function *LF = F->Codegen()) {
TheExecutionEngine->finalizeObject();
// JIT the function, returning a function pointer.
void *FPtr = TheExecutionEngine->getPointerToFunction(LF);
// Cast it to the right type (takes no arguments, returns a double) so we
// can call it as a native function.
double (*FP)() = (double (*)())(intptr_t)FPtr;
fprintf(stderr, "Evaluated to %f\n", FP());
}
} else {
// Skip token for error recovery.
getNextToken();
}
}
/// top ::= definition | external | expression | ';'
static void MainLoop() {
while (1) {
fprintf(stderr, "ready> ");
switch (CurTok) {
case tok_eof:
return;
case ';':
getNextToken();
break; // ignore top-level semicolons.
case tok_def:
HandleDefinition();
break;
case tok_extern:
HandleExtern();
break;
default:
HandleTopLevelExpression();
break;
}
}
}
//===----------------------------------------------------------------------===//
// "Library" functions that can be "extern'd" from user code.
//===----------------------------------------------------------------------===//
/// putchard - putchar that takes a double and returns 0.
extern "C" double putchard(double X) {
putchar((char)X);
return 0;
}
//===----------------------------------------------------------------------===//
// Main driver code.
//===----------------------------------------------------------------------===//
int main() {
InitializeNativeTarget();
InitializeNativeTargetAsmPrinter();
InitializeNativeTargetAsmParser();
LLVMContext &Context = getGlobalContext();
// Install standard binary operators.
// 1 is lowest precedence.
BinopPrecedence['<'] = 10;
BinopPrecedence['+'] = 20;
BinopPrecedence['-'] = 20;
BinopPrecedence['*'] = 40; // highest.
// Prime the first token.
fprintf(stderr, "ready> ");
getNextToken();
// Make the module, which holds all the code.
std::unique_ptr<Module> Owner = make_unique<Module>("my cool jit", Context);
TheModule = Owner.get();
// Create the JIT. This takes ownership of the module.
std::string ErrStr;
TheExecutionEngine =
EngineBuilder(std::move(Owner))
.setErrorStr(&ErrStr)
.setMCJITMemoryManager(llvm::make_unique<SectionMemoryManager>())
.create();
if (!TheExecutionEngine) {
fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str());
exit(1);
}
legacy::FunctionPassManager OurFPM(TheModule);
// Set up the optimizer pipeline. Start with registering info about how the
// target lays out data structures.
TheModule->setDataLayout(*TheExecutionEngine->getDataLayout());
// Provide basic AliasAnalysis support for GVN.
OurFPM.add(createBasicAliasAnalysisPass());
// Do simple "peephole" optimizations and bit-twiddling optzns.
OurFPM.add(createInstructionCombiningPass());
// Reassociate expressions.
OurFPM.add(createReassociatePass());
// Eliminate Common SubExpressions.
OurFPM.add(createGVNPass());
// Simplify the control flow graph (deleting unreachable blocks, etc).
OurFPM.add(createCFGSimplificationPass());
OurFPM.doInitialization();
// Set the global so the code gen can use this.
TheFPM = &OurFPM;
// Run the main "interpreter loop" now.
MainLoop();
TheFPM = 0;
// Print out all of the generated code.
TheModule->dump();
return 0;
}
Kaleidoscope: Extending the Language: User-defined Operators¶
Chapter 6 Introduction¶
Welcome to Chapter 6 of the “Implementing a language with LLVM” tutorial. At this point in our tutorial, we have a fully functional language that is fairly minimal, but also useful. There is still one big problem with it, however: our language doesn’t have many useful operators (like division, logical negation, or even any comparisons besides less-than).
This chapter of the tutorial takes a wild digression into adding user-defined operators to the simple and beautiful Kaleidoscope language. The digression leaves us with a language that is uglier in some ways, but also more powerful. One of the great things about creating your own language is that you get to decide what is good or bad; in this tutorial we’ll assume that it is okay to use this as a way to show some interesting parsing techniques.
At the end of this tutorial, we’ll run through an example Kaleidoscope application that renders the Mandelbrot set. This gives an example of what you can build with Kaleidoscope and its feature set.
User-defined Operators: the Idea¶
The “operator overloading” that we will add to Kaleidoscope is more general than in languages like C++. In C++, you are only allowed to redefine existing operators: you can’t programmatically change the grammar, introduce new operators, change precedence levels, etc. In this chapter, we will add this capability to Kaleidoscope, which will let the user round out the set of operators that are supported.
The point of going into user-defined operators in a tutorial like this is to show the power and flexibility of using a hand-written parser. Thus far, the parser we have been implementing uses recursive descent for most parts of the grammar and operator precedence parsing for the expressions. See Chapter 2 for details. Without operator precedence parsing, it would be very difficult to allow the programmer to introduce new operators into the grammar; with it, the grammar is dynamically extensible as the JIT runs.
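As a minimal sketch of why this works (the ‘@’ operator and the precedence value 15 are invented for illustration): because the expression parser consults a precedence table rather than a fixed grammar, “extending the grammar” with a new operator reduces to inserting one entry into that table:
// Hypothetical sketch: once this entry exists, GetTokPrecedence() reports a
// positive precedence for '@', so ParseBinOpRHS parses it like any builtin
// binary operator. This is exactly what FunctionAST::Codegen() does below
// when it installs a user-defined operator.
BinopPrecedence['@'] = 15;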
The two specific features we’ll add are programmable unary operators (right now, Kaleidoscope has no unary operators at all) as well as binary operators. An example of this is:
# Logical unary not.
def unary!(v)
if v then
0
else
1;
# Define > with the same precedence as <.
def binary> 10 (LHS RHS)
RHS < LHS;
# Binary "logical or", (note that it does not "short circuit")
def binary| 5 (LHS RHS)
if LHS then
1
else if RHS then
1
else
0;
# Define = with slightly lower precedence than relationals.
def binary= 9 (LHS RHS)
!(LHS < RHS | LHS > RHS);
Many languages aspire to being able to implement their standard runtime library in the language itself. In Kaleidoscope, we can implement significant parts of the language in the library!
We will break down implementation of these features into two parts: implementing support for user-defined binary operators and adding unary operators.
User-defined Binary Operators¶
Adding support for user-defined binary operators is pretty simple with our current framework. We’ll first add support for the unary/binary keywords:
enum Token {
...
// operators
tok_binary = -11, tok_unary = -12
};
...
static int gettok() {
...
if (IdentifierStr == "for") return tok_for;
if (IdentifierStr == "in") return tok_in;
if (IdentifierStr == "binary") return tok_binary;
if (IdentifierStr == "unary") return tok_unary;
return tok_identifier;
This just adds lexer support for the unary and binary keywords, like we did in previous chapters. One nice thing about our current AST is that we represent binary operators with full generality by using their ASCII code as the opcode. For our extended operators, we’ll use this same representation, so we don’t need any new AST or parser support.
On the other hand, we have to be able to represent the definitions of these new operators, in the “def binary| 5” part of the function definition. In our grammar so far, the “name” for the function definition is parsed as the “prototype” production and into the PrototypeAST AST node. To represent our new user-defined operators as prototypes, we have to extend the PrototypeAST AST node like this:
/// PrototypeAST - This class represents the "prototype" for a function,
/// which captures its argument names as well as if it is an operator.
class PrototypeAST {
std::string Name;
std::vector<std::string> Args;
bool isOperator;
unsigned Precedence; // Precedence if a binary op.
public:
PrototypeAST(const std::string &name, const std::vector<std::string> &args,
bool isoperator = false, unsigned prec = 0)
: Name(name), Args(args), isOperator(isoperator), Precedence(prec) {}
bool isUnaryOp() const { return isOperator && Args.size() == 1; }
bool isBinaryOp() const { return isOperator && Args.size() == 2; }
char getOperatorName() const {
assert(isUnaryOp() || isBinaryOp());
return Name[Name.size()-1];
}
unsigned getBinaryPrecedence() const { return Precedence; }
Function *Codegen();
};
Basically, in addition to knowing a name for the prototype, we now keep track of whether it was an operator, and if it was, what precedence level the operator is at. The precedence is only used for binary operators (as you’ll see below, it just doesn’t apply to unary operators). Now that we have a way to represent the prototype for a user-defined operator, we need to parse it:
/// prototype
/// ::= id '(' id* ')'
/// ::= binary LETTER number? (id, id)
static PrototypeAST *ParsePrototype() {
std::string FnName;
unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
unsigned BinaryPrecedence = 30;
switch (CurTok) {
default:
return ErrorP("Expected function name in prototype");
case tok_identifier:
FnName = IdentifierStr;
Kind = 0;
getNextToken();
break;
case tok_binary:
getNextToken();
if (!isascii(CurTok))
return ErrorP("Expected binary operator");
FnName = "binary";
FnName += (char)CurTok;
Kind = 2;
getNextToken();
// Read the precedence if present.
if (CurTok == tok_number) {
if (NumVal < 1 || NumVal > 100)
return ErrorP("Invalid precedecnce: must be 1..100");
BinaryPrecedence = (unsigned)NumVal;
getNextToken();
}
break;
}
if (CurTok != '(')
return ErrorP("Expected '(' in prototype");
std::vector<std::string> ArgNames;
while (getNextToken() == tok_identifier)
ArgNames.push_back(IdentifierStr);
if (CurTok != ')')
return ErrorP("Expected ')' in prototype");
// success.
getNextToken(); // eat ')'.
// Verify right number of names for operator.
if (Kind && ArgNames.size() != Kind)
return ErrorP("Invalid number of operands for operator");
return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);
}
This is all fairly straightforward parsing code, and we have already seen a lot of similar code in the past. One interesting part of the code above is the couple of lines that set up FnName for binary operators. This builds names like “binary@” for a newly defined “@” operator, taking advantage of the fact that symbol names in the LLVM symbol table are allowed to contain any character, including embedded nul characters.
The next interesting thing to add is codegen support for these binary operators. Given our current structure, this is a simple addition of a default case to our existing binary operator node:
Value *BinaryExprAST::Codegen() {
Value *L = LHS->Codegen();
Value *R = RHS->Codegen();
if (L == 0 || R == 0) return 0;
switch (Op) {
case '+': return Builder.CreateFAdd(L, R, "addtmp");
case '-': return Builder.CreateFSub(L, R, "subtmp");
case '*': return Builder.CreateFMul(L, R, "multmp");
case '<':
L = Builder.CreateFCmpULT(L, R, "cmptmp");
// Convert bool 0/1 to double 0.0 or 1.0
return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
"booltmp");
default: break;
}
// If it wasn't a builtin binary operator, it must be a user defined one. Emit
// a call to it.
Function *F = TheModule->getFunction(std::string("binary")+Op);
assert(F && "binary operator not found!");
Value *Ops[2] = { L, R };
return Builder.CreateCall(F, Ops, "binop");
}
As you can see above, the new code is actually really simple. It just does a lookup for the appropriate operator in the symbol table and generates a function call to it. Since user-defined operators are just built as normal functions (because the “prototype” boils down to a function with the right name) everything falls into place.
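To make this concrete: a definition like “def binary> 10 (LHS RHS) RHS < LHS;” compiles to an ordinary two-argument function. As a rough, unoptimized sketch (assuming the codegen shown above; actual value names may differ), the emitted IR would look something like:
define double @"binary>"(double %LHS, double %RHS) {
entry:
  %cmptmp = fcmp ult double %RHS, %LHS
  %booltmp = uitofp i1 %cmptmp to double
  ret double %booltmp
}
Any later use of “>” in an expression then falls through to the default case above and emits a plain call to @"binary>".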
The final piece of code we are missing is a bit of top-level magic:
Function *FunctionAST::Codegen() {
NamedValues.clear();
Function *TheFunction = Proto->Codegen();
if (TheFunction == 0)
return 0;
// If this is an operator, install it.
if (Proto->isBinaryOp())
BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence();
// Create a new basic block to start insertion into.
BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
Builder.SetInsertPoint(BB);
if (Value *RetVal = Body->Codegen()) {
...
Basically, before codegening a function, if it is a user-defined operator, we register it in the precedence table. This allows the binary operator parsing logic we already have in place to handle it. Since we are working with a fully general operator-precedence parser, this is all we need to do to “extend the grammar”.
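To see the whole pipeline in action, here is a hypothetical session (the ‘@’ operator, its precedence of 15, and its definition are all invented for illustration). Since ParseBinOpRHS merges equal-precedence operators in its loop, the new operator comes out left-associative:
ready> def binary@ 15 (LHS RHS) LHS + RHS*2;
...
ready> 1 @ 2 @ 3;
Evaluated to 11.000000
Here “1 @ 2 @ 3” parses as “(1 @ 2) @ 3”, i.e. (1 + 2*2) + 3*2 = 11.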
Now we have useful user-defined binary operators. This builds a lot on the framework we previously set up for other operators. Adding unary operators is a bit more challenging, because we don’t have any framework for them yet - let’s see what it takes.
User-defined Unary Operators¶
Since we don’t currently support unary operators in the Kaleidoscope language, we’ll need to add everything to support them. Above, we added simple support for the ‘unary’ keyword to the lexer. In addition to that, we need an AST node:
/// UnaryExprAST - Expression class for a unary operator.
class UnaryExprAST : public ExprAST {
char Opcode;
ExprAST *Operand;
public:
UnaryExprAST(char opcode, ExprAST *operand)
: Opcode(opcode), Operand(operand) {}
virtual Value *Codegen();
};
This AST node is very simple and obvious by now. It directly mirrors the binary operator AST node, except that it only has one child. With this, we need to add the parsing logic. Parsing a unary operator is pretty simple: we’ll add a new function to do it:
/// unary
/// ::= primary
/// ::= '!' unary
static ExprAST *ParseUnary() {
// If the current token is not an operator, it must be a primary expr.
if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
return ParsePrimary();
// If this is a unary operator, read it.
int Opc = CurTok;
getNextToken();
if (ExprAST *Operand = ParseUnary())
return new UnaryExprAST(Opc, Operand);
return 0;
}
The grammar we add here is pretty straightforward. If we see a unary operator when parsing a primary expression, we eat the operator as a prefix and parse the remaining piece as another unary expression. This allows us to handle multiple unary operators (e.g. “!!x”). Note that unary operators can’t have ambiguous parses like binary operators can, so there is no need for precedence information.
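As a small illustration of the recursion (assuming the logical-not operator ‘unary!’ defined at the start of this chapter), stacked unary operators nest into nested UnaryExprAST nodes, so a hypothetical double-negation helper normalizes any value to exactly 0 or 1:
# '!!x' parses as UnaryExprAST('!', UnaryExprAST('!', x)):
# the inner '!' maps nonzero to 0 (and 0 to 1), the outer '!' flips it back,
# yielding 1 for any nonzero input and 0 otherwise.
def truthy(x) !!x;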
The problem with this function is that we need to call ParseUnary from somewhere. To do this, we change previous callers of ParsePrimary to call ParseUnary instead:
/// binoprhs
/// ::= ('+' unary)*
static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
...
// Parse the unary expression after the binary operator.
ExprAST *RHS = ParseUnary();
if (!RHS) return 0;
...
}
/// expression
/// ::= unary binoprhs
///
static ExprAST *ParseExpression() {
ExprAST *LHS = ParseUnary();
if (!LHS) return 0;
return ParseBinOpRHS(0, LHS);
}
With these two simple changes, we are now able to parse unary operators and build the AST for them. Next up, we need to add parser support for unary operator prototypes. We extend the binary operator code above with:
/// prototype
/// ::= id '(' id* ')'
/// ::= binary LETTER number? (id, id)
/// ::= unary LETTER (id)
static PrototypeAST *ParsePrototype() {
std::string FnName;
unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
unsigned BinaryPrecedence = 30;
switch (CurTok) {
default:
return ErrorP("Expected function name in prototype");
case tok_identifier:
FnName = IdentifierStr;
Kind = 0;
getNextToken();
break;
case tok_unary:
getNextToken();
if (!isascii(CurTok))
return ErrorP("Expected unary operator");
FnName = "unary";
FnName += (char)CurTok;
Kind = 1;
getNextToken();
break;
case tok_binary:
...
As with binary operators, we name unary operators with a name that includes the operator character. This assists us at code generation time. Speaking of which, the final piece we need to add is codegen support for unary operators. It looks like this:
Value *UnaryExprAST::Codegen() {
Value *OperandV = Operand->Codegen();
if (OperandV == 0) return 0;
Function *F = TheModule->getFunction(std::string("unary")+Opcode);
if (F == 0)
return ErrorV("Unknown unary operator");
return Builder.CreateCall(F, OperandV, "unop");
}
This code is similar to, but simpler than, the code for binary operators. It is simpler primarily because it doesn’t need to handle any predefined operators.
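For example, assuming the ‘unary!’ operator from earlier in this chapter, codegen for the expression “!x” boils down to one call instruction at the use site (a sketch; the actual value names may differ):
; emitted by UnaryExprAST::Codegen() for '!x'
%unop = call double @"unary!"(double %x)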
Kicking the Tires¶
It is somewhat hard to believe, but with the few simple extensions we’ve covered in the last chapters, we have grown a real-ish language. With it, we can do a lot of interesting things, including I/O and math. For example, we can now add a nice sequencing operator (printd is defined to print out the specified value and a newline):
ready> extern printd(x);
Read extern:
declare double @printd(double)
ready> def binary : 1 (x y) 0; # Low-precedence operator that ignores operands.
..
ready> printd(123) : printd(456) : printd(789);
123.000000
456.000000
789.000000
Evaluated to 0.000000
We can also define a bunch of other “primitive” operations, such as:
# Logical unary not.
def unary!(v)
if v then
0
else
1;
# Unary negate.
def unary-(v)
0-v;
# Define > with the same precedence as <.
def binary> 10 (LHS RHS)
RHS < LHS;
# Binary logical or, which does not short circuit.
def binary| 5 (LHS RHS)
if LHS then
1
else if RHS then
1
else
0;
# Binary logical and, which does not short circuit.
def binary& 6 (LHS RHS)
if !LHS then
0
else
!!RHS;
# Define = with slightly lower precedence than relationals.
def binary = 9 (LHS RHS)
!(LHS < RHS | LHS > RHS);
# Define ':' for sequencing: as a low-precedence operator that ignores operands
# and just returns the RHS.
def binary : 1 (x y) y;
Given the previous if/then/else support, we can also define interesting functions for I/O. For example, the following prints out a character whose “density” reflects the value passed in: the lower the value, the denser the character:
ready>
extern putchard(char)
def printdensity(d)
if d > 8 then
putchard(32) # ' '
else if d > 4 then
putchard(46) # '.'
else if d > 2 then
putchard(43) # '+'
else
putchard(42); # '*'
...
ready> printdensity(1): printdensity(2): printdensity(3):
printdensity(4): printdensity(5): printdensity(9):
putchard(10);
**++.
Evaluated to 0.000000
Based on these simple primitive operations, we can start to define more interesting things. For example, here’s a little function that solves for the number of iterations it takes a function in the complex plane to converge:
# Determine whether the specific location diverges.
# Solve for z = z^2 + c in the complex plane.
def mandleconverger(real imag iters creal cimag)
if iters > 255 | (real*real + imag*imag > 4) then
iters
else
mandleconverger(real*real - imag*imag + creal,
2*real*imag + cimag,
iters+1, creal, cimag);
# Return the number of iterations required for the iteration to escape
def mandleconverge(real imag)
mandleconverger(real, imag, 0, real, imag);
This “z = z^2 + c” function is a beautiful little creature that is the basis for the computation of the Mandelbrot Set. Our mandleconverge function returns the number of iterations that it takes for a complex orbit to escape, saturating to 255. This is not a very useful function by itself, but if you plot its value over a two-dimensional plane, you can see the Mandelbrot set. Given that we are limited to using putchard here, our amazing graphical output is limited, but we can whip together something using the density plotter above:
# Compute and plot the Mandelbrot set with the specified 2 dimensional range
# info.
def mandelhelp(xmin xmax xstep ymin ymax ystep)
for y = ymin, y < ymax, ystep in (
(for x = xmin, x < xmax, xstep in
printdensity(mandleconverge(x,y)))
: putchard(10)
)
# mandel - This is a convenient helper function for plotting the Mandelbrot set
# from the specified position with the specified magnification.
def mandel(realstart imagstart realmag imagmag)
mandelhelp(realstart, realstart+realmag*78, realmag,
imagstart, imagstart+imagmag*40, imagmag);
Given this, we can try plotting out the Mandelbrot set! Let’s try it out:
ready> mandel(-2.3, -1.3, 0.05, 0.07);
*******************************+++++++++++*************************************
*************************+++++++++++++++++++++++*******************************
**********************+++++++++++++++++++++++++++++****************************
*******************+++++++++++++++++++++.. ...++++++++*************************
*****************++++++++++++++++++++++.... ...+++++++++***********************
***************+++++++++++++++++++++++..... ...+++++++++*********************
**************+++++++++++++++++++++++.... ....+++++++++********************
*************++++++++++++++++++++++...... .....++++++++*******************
************+++++++++++++++++++++....... .......+++++++******************
***********+++++++++++++++++++.... ... .+++++++*****************
**********+++++++++++++++++....... .+++++++****************
*********++++++++++++++........... ...+++++++***************
********++++++++++++............ ...++++++++**************
********++++++++++... .......... .++++++++**************
*******+++++++++..... .+++++++++*************
*******++++++++...... ..+++++++++*************
*******++++++....... ..+++++++++*************
*******+++++...... ..+++++++++*************
*******.... .... ...+++++++++*************
*******.... . ...+++++++++*************
*******+++++...... ...+++++++++*************
*******++++++....... ..+++++++++*************
*******++++++++...... .+++++++++*************
*******+++++++++..... ..+++++++++*************
********++++++++++... .......... .++++++++**************
********++++++++++++............ ...++++++++**************
*********++++++++++++++.......... ...+++++++***************
**********++++++++++++++++........ .+++++++****************
**********++++++++++++++++++++.... ... ..+++++++****************
***********++++++++++++++++++++++....... .......++++++++*****************
************+++++++++++++++++++++++...... ......++++++++******************
**************+++++++++++++++++++++++.... ....++++++++********************
***************+++++++++++++++++++++++..... ...+++++++++*********************
*****************++++++++++++++++++++++.... ...++++++++***********************
*******************+++++++++++++++++++++......++++++++*************************
*********************++++++++++++++++++++++.++++++++***************************
*************************+++++++++++++++++++++++*******************************
******************************+++++++++++++************************************
*******************************************************************************
*******************************************************************************
*******************************************************************************
Evaluated to 0.000000
ready> mandel(-2, -1, 0.02, 0.04);
**************************+++++++++++++++++++++++++++++++++++++++++++++++++++++
***********************++++++++++++++++++++++++++++++++++++++++++++++++++++++++
*********************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.
*******************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++...
*****************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.....
***************++++++++++++++++++++++++++++++++++++++++++++++++++++++++........
**************++++++++++++++++++++++++++++++++++++++++++++++++++++++...........
************+++++++++++++++++++++++++++++++++++++++++++++++++++++..............
***********++++++++++++++++++++++++++++++++++++++++++++++++++........ .
**********++++++++++++++++++++++++++++++++++++++++++++++.............
********+++++++++++++++++++++++++++++++++++++++++++..................
*******+++++++++++++++++++++++++++++++++++++++.......................
******+++++++++++++++++++++++++++++++++++...........................
*****++++++++++++++++++++++++++++++++............................
*****++++++++++++++++++++++++++++...............................
****++++++++++++++++++++++++++...... .........................
***++++++++++++++++++++++++......... ...... ...........
***++++++++++++++++++++++............
**+++++++++++++++++++++..............
**+++++++++++++++++++................
*++++++++++++++++++.................
*++++++++++++++++............ ...
*++++++++++++++..............
*+++....++++................
*.......... ...........
*
*.......... ...........
*+++....++++................
*++++++++++++++..............
*++++++++++++++++............ ...
*++++++++++++++++++.................
**+++++++++++++++++++................
**+++++++++++++++++++++..............
***++++++++++++++++++++++............
***++++++++++++++++++++++++......... ...... ...........
****++++++++++++++++++++++++++...... .........................
*****++++++++++++++++++++++++++++...............................
*****++++++++++++++++++++++++++++++++............................
******+++++++++++++++++++++++++++++++++++...........................
*******+++++++++++++++++++++++++++++++++++++++.......................
********+++++++++++++++++++++++++++++++++++++++++++..................
Evaluated to 0.000000
ready> mandel(-0.9, -1.4, 0.02, 0.03);
*******************************************************************************
*******************************************************************************
*******************************************************************************
**********+++++++++++++++++++++************************************************
*+++++++++++++++++++++++++++++++++++++++***************************************
+++++++++++++++++++++++++++++++++++++++++++++**********************************
++++++++++++++++++++++++++++++++++++++++++++++++++*****************************
++++++++++++++++++++++++++++++++++++++++++++++++++++++*************************
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++**********************
+++++++++++++++++++++++++++++++++.........++++++++++++++++++*******************
+++++++++++++++++++++++++++++++.... ......+++++++++++++++++++****************
+++++++++++++++++++++++++++++....... ........+++++++++++++++++++**************
++++++++++++++++++++++++++++........ ........++++++++++++++++++++************
+++++++++++++++++++++++++++......... .. ...+++++++++++++++++++++**********
++++++++++++++++++++++++++........... ....++++++++++++++++++++++********
++++++++++++++++++++++++............. .......++++++++++++++++++++++******
+++++++++++++++++++++++............. ........+++++++++++++++++++++++****
++++++++++++++++++++++........... ..........++++++++++++++++++++++***
++++++++++++++++++++........... .........++++++++++++++++++++++*
++++++++++++++++++............ ...........++++++++++++++++++++
++++++++++++++++............... .............++++++++++++++++++
++++++++++++++................. ...............++++++++++++++++
++++++++++++.................. .................++++++++++++++
+++++++++.................. .................+++++++++++++
++++++........ . ......... ..++++++++++++
++............ ...... ....++++++++++
.............. ...++++++++++
.............. ....+++++++++
.............. .....++++++++
............. ......++++++++
........... .......++++++++
......... ........+++++++
......... ........+++++++
......... ....+++++++
........ ...+++++++
....... ...+++++++
....+++++++
.....+++++++
....+++++++
....+++++++
....+++++++
Evaluated to 0.000000
ready> ^D
At this point, you may be starting to realize that Kaleidoscope is a real and powerful language. It may not be self-similar :), but it can be used to plot things that are!
With this, we conclude the “adding user-defined operators” chapter of the tutorial. We have successfully augmented our language, adding the ability to extend the language in the library, and we have shown how this can be used to build a simple but interesting end-user application in Kaleidoscope. At this point, Kaleidoscope can build a variety of applications that are functional and can call functions with side-effects, but it can’t actually define and mutate a variable itself.
Strikingly, variable mutation is an important feature of some languages, and it is not at all obvious how to add support for mutable variables without having to add an “SSA construction” phase to your front-end. In the next chapter, we will describe how you can add variable mutation without building SSA in your front-end.
Full Code Listing¶
Here is the complete code listing for our running example, enhanced with support for user-defined operators. To build this example, use:
# Compile
clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
# Run
./toy
On some platforms, you will need to specify -rdynamic or -Wl,--export-dynamic when linking. This ensures that symbols defined in the main executable are exported to the dynamic linker and so are available for symbol resolution at run time. This is not needed if you compile your support code into a shared library, although doing that will cause problems on Windows.
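For example, on such a platform the compile line above picks up one extra flag (everything else unchanged):
# Compile, exporting the main executable's symbols (e.g. putchard) to the JIT
clang++ -g -rdynamic toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy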
Here is the code:
#include "llvm/ADT/STLExtras.h"
#include "llvm/Analysis/Passes.h"
#include "llvm/ExecutionEngine/ExecutionEngine.h"
#include "llvm/ExecutionEngine/MCJIT.h"
#include "llvm/ExecutionEngine/SectionMemoryManager.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Verifier.h"
#include "llvm/Support/TargetSelect.h"
#include "llvm/Transforms/Scalar.h"
#include <cctype>
#include <cstdio>
#include <map>
#include <string>
#include <vector>
using namespace llvm;
//===----------------------------------------------------------------------===//
// Lexer
//===----------------------------------------------------------------------===//
// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
// of these for known things.
enum Token {
tok_eof = -1,
// commands
tok_def = -2,
tok_extern = -3,
// primary
tok_identifier = -4,
tok_number = -5,
// control
tok_if = -6,
tok_then = -7,
tok_else = -8,
tok_for = -9,
tok_in = -10,
// operators
tok_binary = -11,
tok_unary = -12
};
static std::string IdentifierStr; // Filled in if tok_identifier
static double NumVal; // Filled in if tok_number
/// gettok - Return the next token from standard input.
static int gettok() {
static int LastChar = ' ';
// Skip any whitespace.
while (isspace(LastChar))
LastChar = getchar();
if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
IdentifierStr = LastChar;
while (isalnum((LastChar = getchar())))
IdentifierStr += LastChar;
if (IdentifierStr == "def")
return tok_def;
if (IdentifierStr == "extern")
return tok_extern;
if (IdentifierStr == "if")
return tok_if;
if (IdentifierStr == "then")
return tok_then;
if (IdentifierStr == "else")
return tok_else;
if (IdentifierStr == "for")
return tok_for;
if (IdentifierStr == "in")
return tok_in;
if (IdentifierStr == "binary")
return tok_binary;
if (IdentifierStr == "unary")
return tok_unary;
return tok_identifier;
}
if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
std::string NumStr;
do {
NumStr += LastChar;
LastChar = getchar();
} while (isdigit(LastChar) || LastChar == '.');
NumVal = strtod(NumStr.c_str(), 0);
return tok_number;
}
if (LastChar == '#') {
// Comment until end of line.
do
LastChar = getchar();
while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
if (LastChar != EOF)
return gettok();
}
// Check for end of file. Don't eat the EOF.
if (LastChar == EOF)
return tok_eof;
// Otherwise, just return the character as its ascii value.
int ThisChar = LastChar;
LastChar = getchar();
return ThisChar;
}
//===----------------------------------------------------------------------===//
// Abstract Syntax Tree (aka Parse Tree)
//===----------------------------------------------------------------------===//
namespace {
/// ExprAST - Base class for all expression nodes.
class ExprAST {
public:
virtual ~ExprAST() {}
virtual Value *Codegen() = 0;
};
/// NumberExprAST - Expression class for numeric literals like "1.0".
class NumberExprAST : public ExprAST {
double Val;
public:
NumberExprAST(double val) : Val(val) {}
Value *Codegen() override;
};
/// VariableExprAST - Expression class for referencing a variable, like "a".
class VariableExprAST : public ExprAST {
std::string Name;
public:
VariableExprAST(const std::string &name) : Name(name) {}
Value *Codegen() override;
};
/// UnaryExprAST - Expression class for a unary operator.
class UnaryExprAST : public ExprAST {
char Opcode;
ExprAST *Operand;
public:
UnaryExprAST(char opcode, ExprAST *operand)
: Opcode(opcode), Operand(operand) {}
Value *Codegen() override;
};
/// BinaryExprAST - Expression class for a binary operator.
class BinaryExprAST : public ExprAST {
char Op;
ExprAST *LHS, *RHS;
public:
BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
: Op(op), LHS(lhs), RHS(rhs) {}
Value *Codegen() override;
};
/// CallExprAST - Expression class for function calls.
class CallExprAST : public ExprAST {
std::string Callee;
std::vector<ExprAST *> Args;
public:
CallExprAST(const std::string &callee, std::vector<ExprAST *> &args)
: Callee(callee), Args(args) {}
Value *Codegen() override;
};
/// IfExprAST - Expression class for if/then/else.
class IfExprAST : public ExprAST {
ExprAST *Cond, *Then, *Else;
public:
IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
: Cond(cond), Then(then), Else(_else) {}
Value *Codegen() override;
};
/// ForExprAST - Expression class for for/in.
class ForExprAST : public ExprAST {
std::string VarName;
ExprAST *Start, *End, *Step, *Body;
public:
ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end,
ExprAST *step, ExprAST *body)
: VarName(varname), Start(start), End(end), Step(step), Body(body) {}
Value *Codegen() override;
};
/// PrototypeAST - This class represents the "prototype" for a function,
/// which captures its name, and its argument names (thus implicitly the number
/// of arguments the function takes), as well as if it is an operator.
class PrototypeAST {
std::string Name;
std::vector<std::string> Args;
bool isOperator;
unsigned Precedence; // Precedence if a binary op.
public:
PrototypeAST(const std::string &name, const std::vector<std::string> &args,
bool isoperator = false, unsigned prec = 0)
: Name(name), Args(args), isOperator(isoperator), Precedence(prec) {}
bool isUnaryOp() const { return isOperator && Args.size() == 1; }
bool isBinaryOp() const { return isOperator && Args.size() == 2; }
char getOperatorName() const {
assert(isUnaryOp() || isBinaryOp());
return Name[Name.size() - 1];
}
unsigned getBinaryPrecedence() const { return Precedence; }
Function *Codegen();
};
/// FunctionAST - This class represents a function definition itself.
class FunctionAST {
PrototypeAST *Proto;
ExprAST *Body;
public:
FunctionAST(PrototypeAST *proto, ExprAST *body) : Proto(proto), Body(body) {}
Function *Codegen();
};
} // end anonymous namespace
//===----------------------------------------------------------------------===//
// Parser
//===----------------------------------------------------------------------===//
/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
/// token the parser is looking at. getNextToken reads another token from the
/// lexer and updates CurTok with its results.
static int CurTok;
static int getNextToken() { return CurTok = gettok(); }
/// BinopPrecedence - This holds the precedence for each binary operator that is
/// defined.
static std::map<char, int> BinopPrecedence;
/// GetTokPrecedence - Get the precedence of the pending binary operator token.
static int GetTokPrecedence() {
if (!isascii(CurTok))
return -1;
// Make sure it's a declared binop.
int TokPrec = BinopPrecedence[CurTok];
if (TokPrec <= 0)
return -1;
return TokPrec;
}
/// Error* - These are little helper functions for error handling.
ExprAST *Error(const char *Str) {
fprintf(stderr, "Error: %s\n", Str);
return 0;
}
PrototypeAST *ErrorP(const char *Str) {
Error(Str);
return 0;
}
FunctionAST *ErrorF(const char *Str) {
Error(Str);
return 0;
}
static ExprAST *ParseExpression();
/// identifierexpr
/// ::= identifier
/// ::= identifier '(' expression* ')'
static ExprAST *ParseIdentifierExpr() {
std::string IdName = IdentifierStr;
getNextToken(); // eat identifier.
if (CurTok != '(') // Simple variable ref.
return new VariableExprAST(IdName);
// Call.
getNextToken(); // eat (
std::vector<ExprAST *> Args;
if (CurTok != ')') {
while (1) {
ExprAST *Arg = ParseExpression();
if (!Arg)
return 0;
Args.push_back(Arg);
if (CurTok == ')')
break;
if (CurTok != ',')
return Error("Expected ')' or ',' in argument list");
getNextToken();
}
}
// Eat the ')'.
getNextToken();
return new CallExprAST(IdName, Args);
}
/// numberexpr ::= number
static ExprAST *ParseNumberExpr() {
ExprAST *Result = new NumberExprAST(NumVal);
getNextToken(); // consume the number
return Result;
}
/// parenexpr ::= '(' expression ')'
static ExprAST *ParseParenExpr() {
getNextToken(); // eat (.
ExprAST *V = ParseExpression();
if (!V)
return 0;
if (CurTok != ')')
return Error("expected ')'");
getNextToken(); // eat ).
return V;
}
/// ifexpr ::= 'if' expression 'then' expression 'else' expression
static ExprAST *ParseIfExpr() {
getNextToken(); // eat the if.
// condition.
ExprAST *Cond = ParseExpression();
if (!Cond)
return 0;
if (CurTok != tok_then)
return Error("expected then");
getNextToken(); // eat the then
ExprAST *Then = ParseExpression();
if (Then == 0)
return 0;
if (CurTok != tok_else)
return Error("expected else");
getNextToken();
ExprAST *Else = ParseExpression();
if (!Else)
return 0;
return new IfExprAST(Cond, Then, Else);
}
/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
static ExprAST *ParseForExpr() {
getNextToken(); // eat the for.
if (CurTok != tok_identifier)
return Error("expected identifier after for");
std::string IdName = IdentifierStr;
getNextToken(); // eat identifier.
if (CurTok != '=')
return Error("expected '=' after for");
getNextToken(); // eat '='.
ExprAST *Start = ParseExpression();
if (Start == 0)
return 0;
if (CurTok != ',')
return Error("expected ',' after for start value");
getNextToken();
ExprAST *End = ParseExpression();
if (End == 0)
return 0;
// The step value is optional.
ExprAST *Step = 0;
if (CurTok == ',') {
getNextToken();
Step = ParseExpression();
if (Step == 0)
return 0;
}
if (CurTok != tok_in)
return Error("expected 'in' after for");
getNextToken(); // eat 'in'.
ExprAST *Body = ParseExpression();
if (Body == 0)
return 0;
return new ForExprAST(IdName, Start, End, Step, Body);
}
/// primary
/// ::= identifierexpr
/// ::= numberexpr
/// ::= parenexpr
/// ::= ifexpr
/// ::= forexpr
static ExprAST *ParsePrimary() {
switch (CurTok) {
default:
return Error("unknown token when expecting an expression");
case tok_identifier:
return ParseIdentifierExpr();
case tok_number:
return ParseNumberExpr();
case '(':
return ParseParenExpr();
case tok_if:
return ParseIfExpr();
case tok_for:
return ParseForExpr();
}
}
/// unary
/// ::= primary
/// ::= '!' unary
static ExprAST *ParseUnary() {
// If the current token is not an operator, it must be a primary expr.
if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
return ParsePrimary();
// If this is a unary operator, read it.
int Opc = CurTok;
getNextToken();
if (ExprAST *Operand = ParseUnary())
return new UnaryExprAST(Opc, Operand);
return 0;
}
/// binoprhs
/// ::= ('+' unary)*
static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
// If this is a binop, find its precedence.
while (1) {
int TokPrec = GetTokPrecedence();
// If this is a binop that binds at least as tightly as the current binop,
// consume it, otherwise we are done.
if (TokPrec < ExprPrec)
return LHS;
// Okay, we know this is a binop.
int BinOp = CurTok;
getNextToken(); // eat binop
// Parse the unary expression after the binary operator.
ExprAST *RHS = ParseUnary();
if (!RHS)
return 0;
// If BinOp binds less tightly with RHS than the operator after RHS, let
// the pending operator take RHS as its LHS.
int NextPrec = GetTokPrecedence();
if (TokPrec < NextPrec) {
RHS = ParseBinOpRHS(TokPrec + 1, RHS);
if (RHS == 0)
return 0;
}
// Merge LHS/RHS.
LHS = new BinaryExprAST(BinOp, LHS, RHS);
}
}
/// expression
/// ::= unary binoprhs
///
static ExprAST *ParseExpression() {
ExprAST *LHS = ParseUnary();
if (!LHS)
return 0;
return ParseBinOpRHS(0, LHS);
}
/// prototype
/// ::= id '(' id* ')'
/// ::= binary LETTER number? (id, id)
/// ::= unary LETTER (id)
static PrototypeAST *ParsePrototype() {
std::string FnName;
unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
unsigned BinaryPrecedence = 30;
switch (CurTok) {
default:
return ErrorP("Expected function name in prototype");
case tok_identifier:
FnName = IdentifierStr;
Kind = 0;
getNextToken();
break;
case tok_unary:
getNextToken();
if (!isascii(CurTok))
return ErrorP("Expected unary operator");
FnName = "unary";
FnName += (char)CurTok;
Kind = 1;
getNextToken();
break;
case tok_binary:
getNextToken();
if (!isascii(CurTok))
return ErrorP("Expected binary operator");
FnName = "binary";
FnName += (char)CurTok;
Kind = 2;
getNextToken();
// Read the precedence if present.
if (CurTok == tok_number) {
if (NumVal < 1 || NumVal > 100)
return ErrorP("Invalid precedecnce: must be 1..100");
BinaryPrecedence = (unsigned)NumVal;
getNextToken();
}
break;
}
if (CurTok != '(')
return ErrorP("Expected '(' in prototype");
std::vector<std::string> ArgNames;
while (getNextToken() == tok_identifier)
ArgNames.push_back(IdentifierStr);
if (CurTok != ')')
return ErrorP("Expected ')' in prototype");
// success.
getNextToken(); // eat ')'.
// Verify right number of names for operator.
if (Kind && ArgNames.size() != Kind)
return ErrorP("Invalid number of operands for operator");
return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);
}
/// definition ::= 'def' prototype expression
static FunctionAST *ParseDefinition() {
getNextToken(); // eat def.
PrototypeAST *Proto = ParsePrototype();
if (Proto == 0)
return 0;
if (ExprAST *E = ParseExpression())
return new FunctionAST(Proto, E);
return 0;
}
/// toplevelexpr ::= expression
static FunctionAST *ParseTopLevelExpr() {
if (ExprAST *E = ParseExpression()) {
// Make an anonymous proto.
PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
return new FunctionAST(Proto, E);
}
return 0;
}
/// external ::= 'extern' prototype
static PrototypeAST *ParseExtern() {
getNextToken(); // eat extern.
return ParsePrototype();
}
//===----------------------------------------------------------------------===//
// Code Generation
//===----------------------------------------------------------------------===//
static Module *TheModule;
static IRBuilder<> Builder(getGlobalContext());
static std::map<std::string, Value *> NamedValues;
static legacy::FunctionPassManager *TheFPM;
Value *ErrorV(const char *Str) {
Error(Str);
return 0;
}
Value *NumberExprAST::Codegen() {
return ConstantFP::get(getGlobalContext(), APFloat(Val));
}
Value *VariableExprAST::Codegen() {
// Look this variable up in the function.
Value *V = NamedValues[Name];
return V ? V : ErrorV("Unknown variable name");
}
Value *UnaryExprAST::Codegen() {
Value *OperandV = Operand->Codegen();
if (OperandV == 0)
return 0;
Function *F = TheModule->getFunction(std::string("unary") + Opcode);
if (F == 0)
return ErrorV("Unknown unary operator");
return Builder.CreateCall(F, OperandV, "unop");
}
Value *BinaryExprAST::Codegen() {
Value *L = LHS->Codegen();
Value *R = RHS->Codegen();
if (L == 0 || R == 0)
return 0;
switch (Op) {
case '+':
return Builder.CreateFAdd(L, R, "addtmp");
case '-':
return Builder.CreateFSub(L, R, "subtmp");
case '*':
return Builder.CreateFMul(L, R, "multmp");
case '<':
L = Builder.CreateFCmpULT(L, R, "cmptmp");
// Convert bool 0/1 to double 0.0 or 1.0
return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
"booltmp");
default:
break;
}
// If it wasn't a builtin binary operator, it must be a user defined one. Emit
// a call to it.
Function *F = TheModule->getFunction(std::string("binary") + Op);
assert(F && "binary operator not found!");
Value *Ops[] = { L, R };
return Builder.CreateCall(F, Ops, "binop");
}
Value *CallExprAST::Codegen() {
// Look up the name in the global module table.
Function *CalleeF = TheModule->getFunction(Callee);
if (CalleeF == 0)
return ErrorV("Unknown function referenced");
// If argument mismatch error.
if (CalleeF->arg_size() != Args.size())
return ErrorV("Incorrect # arguments passed");
std::vector<Value *> ArgsV;
for (unsigned i = 0, e = Args.size(); i != e; ++i) {
ArgsV.push_back(Args[i]->Codegen());
if (ArgsV.back() == 0)
return 0;
}
return Builder.CreateCall(CalleeF, ArgsV, "calltmp");
}
Value *IfExprAST::Codegen() {
Value *CondV = Cond->Codegen();
if (CondV == 0)
return 0;
// Convert condition to a bool by comparing equal to 0.0.
CondV = Builder.CreateFCmpONE(
CondV, ConstantFP::get(getGlobalContext(), APFloat(0.0)), "ifcond");
Function *TheFunction = Builder.GetInsertBlock()->getParent();
// Create blocks for the then and else cases. Insert the 'then' block at the
// end of the function.
BasicBlock *ThenBB =
BasicBlock::Create(getGlobalContext(), "then", TheFunction);
BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else");
BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont");
Builder.CreateCondBr(CondV, ThenBB, ElseBB);
// Emit then value.
Builder.SetInsertPoint(ThenBB);
Value *ThenV = Then->Codegen();
if (ThenV == 0)
return 0;
Builder.CreateBr(MergeBB);
// Codegen of 'Then' can change the current block, update ThenBB for the PHI.
ThenBB = Builder.GetInsertBlock();
// Emit else block.
TheFunction->getBasicBlockList().push_back(ElseBB);
Builder.SetInsertPoint(ElseBB);
Value *ElseV = Else->Codegen();
if (ElseV == 0)
return 0;
Builder.CreateBr(MergeBB);
// Codegen of 'Else' can change the current block, update ElseBB for the PHI.
ElseBB = Builder.GetInsertBlock();
// Emit merge block.
TheFunction->getBasicBlockList().push_back(MergeBB);
Builder.SetInsertPoint(MergeBB);
PHINode *PN =
Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, "iftmp");
PN->addIncoming(ThenV, ThenBB);
PN->addIncoming(ElseV, ElseBB);
return PN;
}
Value *ForExprAST::Codegen() {
// Output this as:
// ...
// start = startexpr
// goto loop
// loop:
// variable = phi [start, loopheader], [nextvariable, loopend]
// ...
// bodyexpr
// ...
// loopend:
// step = stepexpr
// nextvariable = variable + step
// endcond = endexpr
// br endcond, loop, endloop
// outloop:
// Emit the start code first, without 'variable' in scope.
Value *StartVal = Start->Codegen();
if (StartVal == 0)
return 0;
// Make the new basic block for the loop header, inserting after current
// block.
Function *TheFunction = Builder.GetInsertBlock()->getParent();
BasicBlock *PreheaderBB = Builder.GetInsertBlock();
BasicBlock *LoopBB =
BasicBlock::Create(getGlobalContext(), "loop", TheFunction);
// Insert an explicit fall through from the current block to the LoopBB.
Builder.CreateBr(LoopBB);
// Start insertion in LoopBB.
Builder.SetInsertPoint(LoopBB);
// Start the PHI node with an entry for Start.
PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()),
2, VarName.c_str());
Variable->addIncoming(StartVal, PreheaderBB);
// Within the loop, the variable is defined equal to the PHI node. If it
// shadows an existing variable, we have to restore it, so save it now.
Value *OldVal = NamedValues[VarName];
NamedValues[VarName] = Variable;
// Emit the body of the loop. This, like any other expr, can change the
// current BB. Note that we ignore the value computed by the body, but don't
// allow an error.
if (Body->Codegen() == 0)
return 0;
// Emit the step value.
Value *StepVal;
if (Step) {
StepVal = Step->Codegen();
if (StepVal == 0)
return 0;
} else {
// If not specified, use 1.0.
StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0));
}
Value *NextVar = Builder.CreateFAdd(Variable, StepVal, "nextvar");
// Compute the end condition.
Value *EndCond = End->Codegen();
if (EndCond == 0)
return EndCond;
// Convert condition to a bool by comparing equal to 0.0.
EndCond = Builder.CreateFCmpONE(
EndCond, ConstantFP::get(getGlobalContext(), APFloat(0.0)), "loopcond");
// Create the "after loop" block and insert it.
BasicBlock *LoopEndBB = Builder.GetInsertBlock();
BasicBlock *AfterBB =
BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction);
// Insert the conditional branch into the end of LoopEndBB.
Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
// Any new code will be inserted in AfterBB.
Builder.SetInsertPoint(AfterBB);
// Add a new entry to the PHI node for the backedge.
Variable->addIncoming(NextVar, LoopEndBB);
// Restore the unshadowed variable.
if (OldVal)
NamedValues[VarName] = OldVal;
else
NamedValues.erase(VarName);
// for expr always returns 0.0.
return Constant::getNullValue(Type::getDoubleTy(getGlobalContext()));
}
Function *PrototypeAST::Codegen() {
// Make the function type: double(double,double) etc.
std::vector<Type *> Doubles(Args.size(),
Type::getDoubleTy(getGlobalContext()));
FunctionType *FT =
FunctionType::get(Type::getDoubleTy(getGlobalContext()), Doubles, false);
Function *F =
Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
// If F conflicted, there was already something named 'Name'. If it has a
// body, don't allow redefinition or reextern.
if (F->getName() != Name) {
// Delete the one we just made and get the existing one.
F->eraseFromParent();
F = TheModule->getFunction(Name);
// If F already has a body, reject this.
if (!F->empty()) {
ErrorF("redefinition of function");
return 0;
}
// If F took a different number of args, reject.
if (F->arg_size() != Args.size()) {
ErrorF("redefinition of function with different # args");
return 0;
}
}
// Set names for all arguments.
unsigned Idx = 0;
for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size();
++AI, ++Idx) {
AI->setName(Args[Idx]);
// Add arguments to variable symbol table.
NamedValues[Args[Idx]] = AI;
}
return F;
}
Function *FunctionAST::Codegen() {
NamedValues.clear();
Function *TheFunction = Proto->Codegen();
if (TheFunction == 0)
return 0;
// If this is an operator, install it.
if (Proto->isBinaryOp())
BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence();
// Create a new basic block to start insertion into.
BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
Builder.SetInsertPoint(BB);
if (Value *RetVal = Body->Codegen()) {
// Finish off the function.
Builder.CreateRet(RetVal);
// Validate the generated code, checking for consistency.
verifyFunction(*TheFunction);
// Optimize the function.
TheFPM->run(*TheFunction);
return TheFunction;
}
// Error reading body, remove function.
TheFunction->eraseFromParent();
if (Proto->isBinaryOp())
BinopPrecedence.erase(Proto->getOperatorName());
return 0;
}
//===----------------------------------------------------------------------===//
// Top-Level parsing and JIT Driver
//===----------------------------------------------------------------------===//
static ExecutionEngine *TheExecutionEngine;
static void HandleDefinition() {
if (FunctionAST *F = ParseDefinition()) {
if (Function *LF = F->Codegen()) {
fprintf(stderr, "Read function definition:");
LF->dump();
}
} else {
// Skip token for error recovery.
getNextToken();
}
}
static void HandleExtern() {
if (PrototypeAST *P = ParseExtern()) {
if (Function *F = P->Codegen()) {
fprintf(stderr, "Read extern: ");
F->dump();
}
} else {
// Skip token for error recovery.
getNextToken();
}
}
static void HandleTopLevelExpression() {
// Evaluate a top-level expression into an anonymous function.
if (FunctionAST *F = ParseTopLevelExpr()) {
if (Function *LF = F->Codegen()) {
TheExecutionEngine->finalizeObject();
// JIT the function, returning a function pointer.
void *FPtr = TheExecutionEngine->getPointerToFunction(LF);
// Cast it to the right type (takes no arguments, returns a double) so we
// can call it as a native function.
double (*FP)() = (double (*)())(intptr_t)FPtr;
fprintf(stderr, "Evaluated to %f\n", FP());
}
} else {
// Skip token for error recovery.
getNextToken();
}
}
/// top ::= definition | external | expression | ';'
static void MainLoop() {
while (1) {
fprintf(stderr, "ready> ");
switch (CurTok) {
case tok_eof:
return;
case ';':
getNextToken();
break; // ignore top-level semicolons.
case tok_def:
HandleDefinition();
break;
case tok_extern:
HandleExtern();
break;
default:
HandleTopLevelExpression();
break;
}
}
}
//===----------------------------------------------------------------------===//
// "Library" functions that can be "extern'd" from user code.
//===----------------------------------------------------------------------===//
/// putchard - putchar that takes a double and returns 0.
extern "C" double putchard(double X) {
putchar((char)X);
return 0;
}
/// printd - printf that takes a double and prints it as "%f\n", returning 0.
extern "C" double printd(double X) {
printf("%f\n", X);
return 0;
}
//===----------------------------------------------------------------------===//
// Main driver code.
//===----------------------------------------------------------------------===//
int main() {
InitializeNativeTarget();
InitializeNativeTargetAsmPrinter();
InitializeNativeTargetAsmParser();
LLVMContext &Context = getGlobalContext();
// Install standard binary operators.
// 1 is lowest precedence.
BinopPrecedence['<'] = 10;
BinopPrecedence['+'] = 20;
BinopPrecedence['-'] = 20;
BinopPrecedence['*'] = 40; // highest.
// Prime the first token.
fprintf(stderr, "ready> ");
getNextToken();
// Make the module, which holds all the code.
std::unique_ptr<Module> Owner = make_unique<Module>("my cool jit", Context);
TheModule = Owner.get();
// Create the JIT. This takes ownership of the module.
std::string ErrStr;
TheExecutionEngine =
EngineBuilder(std::move(Owner))
.setErrorStr(&ErrStr)
.setMCJITMemoryManager(llvm::make_unique<SectionMemoryManager>())
.create();
if (!TheExecutionEngine) {
fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str());
exit(1);
}
legacy::FunctionPassManager OurFPM(TheModule);
// Set up the optimizer pipeline. Start with registering info about how the
// target lays out data structures.
TheModule->setDataLayout(*TheExecutionEngine->getDataLayout());
// Provide basic AliasAnalysis support for GVN.
OurFPM.add(createBasicAliasAnalysisPass());
// Do simple "peephole" optimizations and bit-twiddling optzns.
OurFPM.add(createInstructionCombiningPass());
// Reassociate expressions.
OurFPM.add(createReassociatePass());
// Eliminate Common SubExpressions.
OurFPM.add(createGVNPass());
// Simplify the control flow graph (deleting unreachable blocks, etc).
OurFPM.add(createCFGSimplificationPass());
OurFPM.doInitialization();
// Set the global so the code gen can use this.
TheFPM = &OurFPM;
// Run the main "interpreter loop" now.
MainLoop();
TheFPM = 0;
// Print out all of the generated code.
TheModule->dump();
return 0;
}
Kaleidoscope: Extending the Language: Mutable Variables¶
Chapter 7 Introduction¶
Welcome to Chapter 7 of the “Implementing a language with LLVM” tutorial. In chapters 1 through 6, we’ve built a very respectable, albeit simple, functional programming language. In our journey, we learned some parsing techniques, how to build and represent an AST, how to build LLVM IR, and how to optimize the resultant code as well as JIT compile it.
While Kaleidoscope is interesting as a functional language, the fact that it is functional makes it “too easy” to generate LLVM IR for it. In particular, a functional language makes it very easy to build LLVM IR directly in SSA form. Since LLVM requires that the input code be in SSA form, this is a very nice property; however, it is often unclear to newcomers how to generate code for an imperative language with mutable variables.
The short (and happy) summary of this chapter is that there is no need for your front-end to build SSA form: LLVM provides highly tuned and well tested support for this, though the way it works is a bit unexpected for some.
Why is this a hard problem?¶
To understand why mutable variables cause complexities in SSA construction, consider this extremely simple C example:
int G, H;
int test(_Bool Condition) {
int X;
if (Condition)
X = G;
else
X = H;
return X;
}
In this case, we have the variable “X”, whose value depends on the path executed in the program. Because there are two different possible values for X before the return instruction, a PHI node is inserted to merge the two values. The LLVM IR that we want for this example looks like this:
@G = weak global i32 0 ; type of @G is i32*
@H = weak global i32 0 ; type of @H is i32*
define i32 @test(i1 %Condition) {
entry:
br i1 %Condition, label %cond_true, label %cond_false
cond_true:
%X.0 = load i32* @G
br label %cond_next
cond_false:
%X.1 = load i32* @H
br label %cond_next
cond_next:
%X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
ret i32 %X.2
}
In this example, the loads from the G and H global variables are explicit in the LLVM IR, and they live in the then/else branches of the if statement (cond_true/cond_false). In order to merge the incoming values, the X.2 phi node in the cond_next block selects the right value to use based on where control flow is coming from: if control flow comes from the cond_false block, X.2 gets the value of X.1. Alternatively, if control flow comes from cond_true, it gets the value of X.0. The intent of this chapter is not to explain the details of SSA form. For more information, see one of the many online references.
The question for this article is “who places the phi nodes when lowering assignments to mutable variables?”. The issue here is that LLVM requires that its IR be in SSA form: there is no “non-ssa” mode for it. However, SSA construction requires non-trivial algorithms and data structures, so it is inconvenient and wasteful for every front-end to have to reproduce this logic.
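To make that burden concrete, here is a minimal sketch of what a front-end would have to emit by hand for the test() example above, using the same IRBuilder-style C++ API this tutorial uses elsewhere. This is an illustration, not code from the tutorial: TheFunction, Condition, and the globals G and H are assumed to already be in scope.
// Sketch only: manual phi placement for the test() example, assuming the
// LLVM 3.x-era API used throughout this tutorial.
BasicBlock *TrueBB  = BasicBlock::Create(getGlobalContext(), "cond_true", TheFunction);
BasicBlock *FalseBB = BasicBlock::Create(getGlobalContext(), "cond_false", TheFunction);
BasicBlock *NextBB  = BasicBlock::Create(getGlobalContext(), "cond_next", TheFunction);
Builder.CreateCondBr(Condition, TrueBB, FalseBB);
Builder.SetInsertPoint(TrueBB);
Value *X0 = Builder.CreateLoad(G, "X.0");   // load from @G
Builder.CreateBr(NextBB);
Builder.SetInsertPoint(FalseBB);
Value *X1 = Builder.CreateLoad(H, "X.1");   // load from @H
Builder.CreateBr(NextBB);
Builder.SetInsertPoint(NextBB);
PHINode *X2 = Builder.CreatePHI(Type::getInt32Ty(getGlobalContext()), 2, "X.2");
X2->addIncoming(X0, TrueBB);                // value if we came from cond_true
X2->addIncoming(X1, FalseBB);               // value if we came from cond_false
Builder.CreateRet(X2);
For a single straight-line diamond this is tolerable; once loops, nesting, and many mutable variables are involved, placing phi nodes correctly requires full SSA construction machinery, which is exactly what we want to avoid writing in every front-end.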
Memory in LLVM¶
The ‘trick’ here is that while LLVM does require all register values to be in SSA form, it does not require (or permit) memory objects to be in SSA form. In the example above, note that the loads from G and H are direct accesses to G and H: they are not renamed or versioned. This differs from some other compiler systems, which do try to version memory objects. In LLVM, instead of encoding dataflow analysis of memory into the LLVM IR, it is handled with Analysis Passes which are computed on demand.
With this in mind, the high-level idea is that we want to make a stack variable (which lives in memory, because it is on the stack) for each mutable object in a function. To take advantage of this trick, we need to talk about how LLVM represents stack variables.
In LLVM, all memory accesses are explicit with load/store instructions, and it is carefully designed not to have (or need) an “address-of” operator. Notice how the type of the @G/@H global variables is actually “i32*” even though the variable is defined as “i32”. What this means is that @G defines space for an i32 in the global data area, but its name actually refers to the address for that space. Stack variables work the same way, except that instead of being declared with global variable definitions, they are declared with the LLVM alloca instruction:
define i32 @example() {
entry:
%X = alloca i32 ; type of %X is i32*.
...
%tmp = load i32* %X ; load the stack value %X from the stack.
%tmp2 = add i32 %tmp, 1 ; increment it
store i32 %tmp2, i32* %X ; store it back
...
This code shows an example of how you can declare and manipulate a stack variable in the LLVM IR. Stack memory allocated with the alloca instruction is fully general: you can pass the address of the stack slot to functions, you can store it in other variables, etc. In our example above, we could rewrite the example to use the alloca technique to avoid using a PHI node:
@G = weak global i32 0 ; type of @G is i32*
@H = weak global i32 0 ; type of @H is i32*
define i32 @test(i1 %Condition) {
entry:
%X = alloca i32 ; type of %X is i32*.
br i1 %Condition, label %cond_true, label %cond_false
cond_true:
%X.0 = load i32* @G
store i32 %X.0, i32* %X ; Update X
br label %cond_next
cond_false:
%X.1 = load i32* @H
store i32 %X.1, i32* %X ; Update X
br label %cond_next
cond_next:
%X.2 = load i32* %X ; Read X
ret i32 %X.2
}
With this, we have discovered a way to handle arbitrary mutable variables without the need to create Phi nodes at all (a short IRBuilder sketch follows the list):
- Each mutable variable becomes a stack allocation.
- Each read of the variable becomes a load from the stack.
- Each update of the variable becomes a store to the stack.
- Taking the address of a variable just uses the stack address directly.
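In front-end terms, these four rules map directly onto IRBuilder calls. A minimal sketch, assuming the LLVM 3.x-era API used in this chapter; Builder is the global builder, and SomeValue is a hypothetical stand-in for whatever expression is being assigned:
// Rule 1: the mutable variable becomes a stack allocation.
AllocaInst *XSlot = Builder.CreateAlloca(Type::getDoubleTy(getGlobalContext()),
                                         0, "x");
// Rule 3: each update of the variable becomes a store to the stack slot.
Builder.CreateStore(SomeValue, XSlot);  // SomeValue is hypothetical here.
// Rule 2: each read of the variable becomes a load from the stack slot.
Value *XVal = Builder.CreateLoad(XSlot, "x");
// Rule 4: taking the variable's address just uses XSlot itself; no
// address-of operator is needed.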
While this solution solves our immediate problem, it introduces another one: we have now apparently added a lot of stack traffic for very simple and common operations, a major performance problem. Fortunately for us, the LLVM optimizer has a highly-tuned optimization pass named “mem2reg” that handles this case, promoting allocas like this into SSA registers and inserting Phi nodes as appropriate. If you run this example through the pass, you’ll get:
$ llvm-as < example.ll | opt -mem2reg | llvm-dis
@G = weak global i32 0
@H = weak global i32 0
define i32 @test(i1 %Condition) {
entry:
br i1 %Condition, label %cond_true, label %cond_false
cond_true:
%X.0 = load i32* @G
br label %cond_next
cond_false:
%X.1 = load i32* @H
br label %cond_next
cond_next:
%X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
ret i32 %X.01
}
The mem2reg pass implements the standard “iterated dominance frontier” algorithm for constructing SSA form and has a number of optimizations that speed up (very common) degenerate cases. The mem2reg optimization pass is the answer to dealing with mutable variables, and we highly recommend that you depend on it. Note that mem2reg only works on variables in certain circumstances:
- mem2reg is alloca-driven: it looks for allocas and if it can handle them, it promotes them. It does not apply to global variables or heap allocations.
- mem2reg only looks for alloca instructions in the entry block of the function. Being in the entry block guarantees that the alloca is only executed once, which makes analysis simpler.
- mem2reg only promotes allocas whose uses are direct loads and stores. If the address of the stack object is passed to a function, or if any funny pointer arithmetic is involved, the alloca will not be promoted (see the sketch after this list).
- mem2reg only works on allocas of first class values (such as pointers, scalars and vectors), and only if the array size of the allocation is 1 (or missing in the .ll file). mem2reg is not capable of promoting structs or arrays to registers. Note that the “scalarrepl” pass is more powerful and can promote structs, “unions”, and arrays in many cases.
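To make the restrictions concrete, here is a minimal sketch of an alloca that mem2reg will not promote because its address escapes into a call. EscapeFn is a hypothetical function taking a pointer; the API is the same LLVM 3.x-era one used in this chapter:
// This alloca's uses are not all direct loads and stores: its address is
// passed to a function, so mem2reg must leave it in memory.
AllocaInst *Slot = Builder.CreateAlloca(Type::getDoubleTy(getGlobalContext()),
                                        0, "slot");
Builder.CreateStore(ConstantFP::get(getGlobalContext(), APFloat(1.0)), Slot);
Builder.CreateCall(EscapeFn, Slot);  // the pointer escapes here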
All of these properties are easy to satisfy for most imperative languages, and we’ll illustrate it below with Kaleidoscope. The final question you may be asking is: should I bother with this nonsense for my front-end? Wouldn’t it be better if I just did SSA construction directly, avoiding use of the mem2reg optimization pass? In short, we strongly recommend that you use this technique for building SSA form, unless there is an extremely good reason not to. Using this technique is:
- Proven and well tested: clang uses this technique for local mutable variables. As such, the most common clients of LLVM are using this to handle a bulk of their variables. You can be sure that bugs are found fast and fixed early.
- Extremely Fast: mem2reg has a number of special cases that make it fast in common cases as well as fully general. For example, it has fast-paths for variables that are only used in a single block, variables that only have one assignment point, good heuristics to avoid insertion of unneeded phi nodes, etc.
- Needed for debug info generation: Debug information in LLVM relies on having the address of the variable exposed so that debug info can be attached to it. This technique dovetails very naturally with this style of debug info.
If nothing else, this makes it much easier to get your front-end up and running, and is very simple to implement. Let’s extend Kaleidoscope with mutable variables now!
Mutable Variables in Kaleidoscope¶
Now that we know the sort of problem we want to tackle, let’s see what this looks like in the context of our little Kaleidoscope language. We’re going to add two features:
- The ability to mutate variables with the ‘=’ operator.
- The ability to define new variables.
While the first item is really what this is about, we only have variables for incoming arguments as well as for induction variables, and redefining those only goes so far :). Also, the ability to define new variables is a useful thing regardless of whether you will be mutating them. Here’s a motivating example that shows how we could use these:
# Define ':' for sequencing: as a low-precedence operator that ignores operands
# and just returns the RHS.
def binary : 1 (x y) y;
# Recursive fib, we could do this before.
def fib(x)
if (x < 3) then
1
else
fib(x-1)+fib(x-2);
# Iterative fib.
def fibi(x)
var a = 1, b = 1, c in
(for i = 3, i < x in
c = a + b :
a = b :
b = c) :
b;
# Call it.
fibi(10);
In order to mutate variables, we have to change our existing variables to use the “alloca trick”. Once we have that, we’ll add our new operator, then extend Kaleidoscope to support new variable definitions.
Adjusting Existing Variables for Mutation¶
The symbol table in Kaleidoscope is managed at code generation time by the ‘NamedValues‘ map. This map currently keeps track of the LLVM “Value*” that holds the double value for the named variable. In order to support mutation, we need to change this slightly, so that NamedValues holds the memory location of the variable in question. Note that this change is a refactoring: it changes the structure of the code, but does not (by itself) change the behavior of the compiler. All of these changes are isolated in the Kaleidoscope code generator.
At this point in Kaleidoscope’s development, it only supports variables for two things: incoming arguments to functions and the induction variable of ‘for’ loops. For consistency, we’ll allow mutation of these variables in addition to other user-defined variables. This means that these will both need memory locations.
To start our transformation of Kaleidoscope, we’ll change the NamedValues map so that it maps to AllocaInst* instead of Value*. Once we do this, the C++ compiler will tell us what parts of the code we need to update:
static std::map<std::string, AllocaInst*> NamedValues;
Also, since we will need to create these allocas, we’ll use a helper function that ensures that the allocas are created in the entry block of the function:
/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
/// the function. This is used for mutable variables etc.
static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
const std::string &VarName) {
IRBuilder<> TmpB(&TheFunction->getEntryBlock(),
TheFunction->getEntryBlock().begin());
return TmpB.CreateAlloca(Type::getDoubleTy(getGlobalContext()), 0,
VarName.c_str());
}
This funny looking code creates an IRBuilder object that is pointing at the first instruction (.begin()) of the entry block. It then creates an alloca with the expected name and returns it. Because all values in Kaleidoscope are doubles, there is no need to pass in a type to use.
With this in place, the first functionality change we want to make is to variable references. In our new scheme, variables live on the stack, so code generating a reference to them actually needs to produce a load from the stack slot:
Value *VariableExprAST::Codegen() {
// Look this variable up in the function.
Value *V = NamedValues[Name];
if (V == 0) return ErrorV("Unknown variable name");
// Load the value.
return Builder.CreateLoad(V, Name.c_str());
}
As you can see, this is pretty straightforward. Now we need to update the things that define the variables to set up the alloca. We’ll start with ForExprAST::Codegen (see the full code listing for the unabridged code):
Function *TheFunction = Builder.GetInsertBlock()->getParent();
// Create an alloca for the variable in the entry block.
AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
// Emit the start code first, without 'variable' in scope.
Value *StartVal = Start->Codegen();
if (StartVal == 0) return 0;
// Store the value into the alloca.
Builder.CreateStore(StartVal, Alloca);
...
// Compute the end condition.
Value *EndCond = End->Codegen();
if (EndCond == 0) return EndCond;
// Reload, increment, and restore the alloca. This handles the case where
// the body of the loop mutates the variable.
Value *CurVar = Builder.CreateLoad(Alloca);
Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar");
Builder.CreateStore(NextVar, Alloca);
...
This code is virtually identical to the code before we allowed mutable variables. The big difference is that we no longer have to construct a PHI node, and we use load/store to access the variable as needed.
To support mutable argument variables, we need to also make allocas for them. The code for this is also pretty simple:
/// CreateArgumentAllocas - Create an alloca for each argument and register the
/// argument in the symbol table so that references to it will succeed.
void PrototypeAST::CreateArgumentAllocas(Function *F) {
Function::arg_iterator AI = F->arg_begin();
for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
// Create an alloca for this variable.
AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
// Store the initial value into the alloca.
Builder.CreateStore(AI, Alloca);
// Add arguments to variable symbol table.
NamedValues[Args[Idx]] = Alloca;
}
}
For each argument, we make an alloca, store the input value to the function into the alloca, and register the alloca as the memory location for the argument. This method gets invoked by FunctionAST::Codegen right after it sets up the entry block for the function.
The final missing piece is adding the mem2reg pass, which allows us to get good codegen once again:
// Set up the optimizer pipeline. Start with registering info about how the
// target lays out data structures.
OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout()));
// Promote allocas to registers.
OurFPM.add(createPromoteMemoryToRegisterPass());
// Do simple "peephole" optimizations and bit-twiddling optzns.
OurFPM.add(createInstructionCombiningPass());
// Reassociate expressions.
OurFPM.add(createReassociatePass());
It is interesting to see what the code looks like before and after the mem2reg optimization runs. For example, this is the before/after code for our recursive fib function. Before the optimization:
define double @fib(double %x) {
entry:
%x1 = alloca double
store double %x, double* %x1
%x2 = load double* %x1
%cmptmp = fcmp ult double %x2, 3.000000e+00
%booltmp = uitofp i1 %cmptmp to double
%ifcond = fcmp one double %booltmp, 0.000000e+00
br i1 %ifcond, label %then, label %else
then: ; preds = %entry
br label %ifcont
else: ; preds = %entry
%x3 = load double* %x1
%subtmp = fsub double %x3, 1.000000e+00
%calltmp = call double @fib(double %subtmp)
%x4 = load double* %x1
%subtmp5 = fsub double %x4, 2.000000e+00
%calltmp6 = call double @fib(double %subtmp5)
%addtmp = fadd double %calltmp, %calltmp6
br label %ifcont
ifcont: ; preds = %else, %then
%iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
ret double %iftmp
}
Here there is only one variable (x, the input argument) but you can still see the extremely simple-minded code generation strategy we are using. In the entry block, an alloca is created, and the initial input value is stored into it. Each reference to the variable does a reload from the stack. Also, note that we didn’t modify the if/then/else expression, so it still inserts a PHI node. While we could make an alloca for it, it is actually easier to create a PHI node for it, so we still just make the PHI.
Here is the code after the mem2reg pass runs:
define double @fib(double %x) {
entry:
%cmptmp = fcmp ult double %x, 3.000000e+00
%booltmp = uitofp i1 %cmptmp to double
%ifcond = fcmp one double %booltmp, 0.000000e+00
br i1 %ifcond, label %then, label %else
then:
br label %ifcont
else:
%subtmp = fsub double %x, 1.000000e+00
%calltmp = call double @fib(double %subtmp)
%subtmp5 = fsub double %x, 2.000000e+00
%calltmp6 = call double @fib(double %subtmp5)
%addtmp = fadd double %calltmp, %calltmp6
br label %ifcont
ifcont: ; preds = %else, %then
%iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
ret double %iftmp
}
This is a trivial case for mem2reg, since there are no redefinitions of the variable. The point of showing this is to calm your tension about inserting such blatant inefficiencies :).
After the rest of the optimizers run, we get:
define double @fib(double %x) {
entry:
%cmptmp = fcmp ult double %x, 3.000000e+00
%booltmp = uitofp i1 %cmptmp to double
%ifcond = fcmp ueq double %booltmp, 0.000000e+00
br i1 %ifcond, label %else, label %ifcont
else:
%subtmp = fsub double %x, 1.000000e+00
%calltmp = call double @fib(double %subtmp)
%subtmp5 = fsub double %x, 2.000000e+00
%calltmp6 = call double @fib(double %subtmp5)
%addtmp = fadd double %calltmp, %calltmp6
ret double %addtmp
ifcont:
ret double 1.000000e+00
}
Here we see that the simplifycfg pass decided to clone the return instruction into the end of the ‘else’ block. This allowed it to eliminate some branches and the PHI node.
Now that all symbol table references are updated to use stack variables, we’ll add the assignment operator.
New Assignment Operator¶
With our current framework, adding a new assignment operator is really simple. We will parse it just like any other binary operator, but handle it internally (instead of allowing the user to define it). The first step is to set a precedence:
int main() {
// Install standard binary operators.
// 1 is lowest precedence.
BinopPrecedence['='] = 2;
BinopPrecedence['<'] = 10;
BinopPrecedence['+'] = 20;
BinopPrecedence['-'] = 20;
Now that the parser knows the precedence of the binary operator, it takes care of all the parsing and AST generation. We just need to implement codegen for the assignment operator. This looks like:
Value *BinaryExprAST::Codegen() {
// Special case '=' because we don't want to emit the LHS as an expression.
if (Op == '=') {
// Assignment requires the LHS to be an identifier.
VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS);
if (!LHSE)
return ErrorV("destination of '=' must be a variable");
Unlike the rest of the binary operators, our assignment operator doesn’t follow the “emit LHS, emit RHS, do computation” model. As such, it is handled as a special case before the other binary operators are handled. The other strange thing is that it requires the LHS to be a variable. It is invalid to have “(x+1) = expr” - only things like “x = expr” are allowed.
// Codegen the RHS.
Value *Val = RHS->Codegen();
if (Val == 0) return 0;
// Look up the name.
Value *Variable = NamedValues[LHSE->getName()];
if (Variable == 0) return ErrorV("Unknown variable name");
Builder.CreateStore(Val, Variable);
return Val;
}
...
Once we have the variable, codegen’ing the assignment is straightforward: we emit the RHS of the assignment, create a store, and return the computed value. Returning a value allows for chained assignments like “X = (Y = Z)”.
Now that we have an assignment operator, we can mutate loop variables and arguments. For example, we can now run code like this:
# Function to print a double.
extern printd(x);
# Define ':' for sequencing: as a low-precedence operator that ignores operands
# and just returns the RHS.
def binary : 1 (x y) y;
def test(x)
printd(x) :
x = 4 :
printd(x);
test(123);
When run, this example prints “123” and then “4”, showing that we did actually mutate the value! Okay, we have now officially implemented our goal: getting this to work requires SSA construction in the general case. However, to be really useful, we also want the ability to define our own local variables; let’s add this next!
User-defined Local Variables¶
Adding var/in is just like any other extension we made to Kaleidoscope: we extend the lexer, the parser, the AST and the code generator. The first step for adding our new ‘var/in’ construct is to extend the lexer. As before, this is pretty trivial; the code looks like this:
enum Token {
...
// var definition
tok_var = -13
...
}
...
static int gettok() {
...
if (IdentifierStr == "in") return tok_in;
if (IdentifierStr == "binary") return tok_binary;
if (IdentifierStr == "unary") return tok_unary;
if (IdentifierStr == "var") return tok_var;
return tok_identifier;
...
The next step is to define the AST node that we will construct. For var/in, it looks like this:
/// VarExprAST - Expression class for var/in
class VarExprAST : public ExprAST {
std::vector<std::pair<std::string, ExprAST*> > VarNames;
ExprAST *Body;
public:
VarExprAST(const std::vector<std::pair<std::string, ExprAST*> > &varnames,
ExprAST *body)
: VarNames(varnames), Body(body) {}
virtual Value *Codegen();
};
var/in allows a list of names to be defined all at once, and each name can optionally have an initializer value. As such, we capture this information in the VarNames vector. Also, var/in has a body; this body is allowed to access the variables defined by the var/in.
With this in place, we can define the parser pieces. The first thing we do is add it as a primary expression:
/// primary
/// ::= identifierexpr
/// ::= numberexpr
/// ::= parenexpr
/// ::= ifexpr
/// ::= forexpr
/// ::= varexpr
static ExprAST *ParsePrimary() {
switch (CurTok) {
default: return Error("unknown token when expecting an expression");
case tok_identifier: return ParseIdentifierExpr();
case tok_number: return ParseNumberExpr();
case '(': return ParseParenExpr();
case tok_if: return ParseIfExpr();
case tok_for: return ParseForExpr();
case tok_var: return ParseVarExpr();
}
}
Next we define ParseVarExpr:
/// varexpr ::= 'var' identifier ('=' expression)?
// (',' identifier ('=' expression)?)* 'in' expression
static ExprAST *ParseVarExpr() {
getNextToken(); // eat the var.
std::vector<std::pair<std::string, ExprAST*> > VarNames;
// At least one variable name is required.
if (CurTok != tok_identifier)
return Error("expected identifier after var");
The first part of this code parses the list of identifier/expr pairs into the local VarNames vector.
while (1) {
std::string Name = IdentifierStr;
getNextToken(); // eat identifier.
// Read the optional initializer.
ExprAST *Init = 0;
if (CurTok == '=') {
getNextToken(); // eat the '='.
Init = ParseExpression();
if (Init == 0) return 0;
}
VarNames.push_back(std::make_pair(Name, Init));
// End of var list, exit loop.
if (CurTok != ',') break;
getNextToken(); // eat the ','.
if (CurTok != tok_identifier)
return Error("expected identifier list after var");
}
Once all the variables are parsed, we then parse the body and create the AST node:
// At this point, we have to have 'in'.
if (CurTok != tok_in)
return Error("expected 'in' keyword after 'var'");
getNextToken(); // eat 'in'.
ExprAST *Body = ParseExpression();
if (Body == 0) return 0;
return new VarExprAST(VarNames, Body);
}
Now that we can parse and represent the code, we need to support emission of LLVM IR for it. This code starts out with:
Value *VarExprAST::Codegen() {
std::vector<AllocaInst *> OldBindings;
Function *TheFunction = Builder.GetInsertBlock()->getParent();
// Register all variables and emit their initializer.
for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
const std::string &VarName = VarNames[i].first;
ExprAST *Init = VarNames[i].second;
Basically it loops over all the variables, installing them one at a time. For each variable we put into the symbol table, we remember the previous value that we replace in OldBindings.
// Emit the initializer before adding the variable to scope, this prevents
// the initializer from referencing the variable itself, and permits stuff
// like this:
// var a = 1 in
// var a = a in ... # refers to outer 'a'.
Value *InitVal;
if (Init) {
InitVal = Init->Codegen();
if (InitVal == 0) return 0;
} else { // If not specified, use 0.0.
InitVal = ConstantFP::get(getGlobalContext(), APFloat(0.0));
}
AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
Builder.CreateStore(InitVal, Alloca);
// Remember the old variable binding so that we can restore the binding when
// we unrecurse.
OldBindings.push_back(NamedValues[VarName]);
// Remember this binding.
NamedValues[VarName] = Alloca;
}
There are more comments here than code. The basic idea is that we emit the initializer, create the alloca, then update the symbol table to point to it. Once all the variables are installed in the symbol table, we evaluate the body of the var/in expression:
// Codegen the body, now that all vars are in scope.
Value *BodyVal = Body->Codegen();
if (BodyVal == 0) return 0;
Finally, before returning, we restore the previous variable bindings:
// Pop all our variables from scope.
for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
NamedValues[VarNames[i].first] = OldBindings[i];
// Return the body computation.
return BodyVal;
}
The end result of all of this is that we get properly scoped variable definitions, and we even (trivially) allow mutation of them :).
With this, we completed what we set out to do. Our nice iterative fib example from the intro compiles and runs just fine. The mem2reg pass optimizes all of our stack variables into SSA registers, inserting PHI nodes where needed, and our front-end remains simple: no “iterated dominance frontier” computation anywhere in sight.
Full Code Listing¶
Here is the complete code listing for our running example, enhanced with mutable variables and var/in support. To build this example, use:
# Compile
clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
# Run
./toy
Here is the code:
#include "llvm/ADT/STLExtras.h"
#include "llvm/Analysis/Passes.h"
#include "llvm/ExecutionEngine/ExecutionEngine.h"
#include "llvm/ExecutionEngine/MCJIT.h"
#include "llvm/ExecutionEngine/SectionMemoryManager.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Verifier.h"
#include "llvm/Support/TargetSelect.h"
#include "llvm/Transforms/Scalar.h"
#include <cctype>
#include <cstdio>
#include <map>
#include <string>
#include <vector>
using namespace llvm;
//===----------------------------------------------------------------------===//
// Lexer
//===----------------------------------------------------------------------===//
// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
// of these for known things.
enum Token {
tok_eof = -1,
// commands
tok_def = -2,
tok_extern = -3,
// primary
tok_identifier = -4,
tok_number = -5,
// control
tok_if = -6,
tok_then = -7,
tok_else = -8,
tok_for = -9,
tok_in = -10,
// operators
tok_binary = -11,
tok_unary = -12,
// var definition
tok_var = -13
};
static std::string IdentifierStr; // Filled in if tok_identifier
static double NumVal; // Filled in if tok_number
/// gettok - Return the next token from standard input.
static int gettok() {
static int LastChar = ' ';
// Skip any whitespace.
while (isspace(LastChar))
LastChar = getchar();
if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
IdentifierStr = LastChar;
while (isalnum((LastChar = getchar())))
IdentifierStr += LastChar;
if (IdentifierStr == "def")
return tok_def;
if (IdentifierStr == "extern")
return tok_extern;
if (IdentifierStr == "if")
return tok_if;
if (IdentifierStr == "then")
return tok_then;
if (IdentifierStr == "else")
return tok_else;
if (IdentifierStr == "for")
return tok_for;
if (IdentifierStr == "in")
return tok_in;
if (IdentifierStr == "binary")
return tok_binary;
if (IdentifierStr == "unary")
return tok_unary;
if (IdentifierStr == "var")
return tok_var;
return tok_identifier;
}
if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
std::string NumStr;
do {
NumStr += LastChar;
LastChar = getchar();
} while (isdigit(LastChar) || LastChar == '.');
NumVal = strtod(NumStr.c_str(), 0);
return tok_number;
}
if (LastChar == '#') {
// Comment until end of line.
do
LastChar = getchar();
while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
if (LastChar != EOF)
return gettok();
}
// Check for end of file. Don't eat the EOF.
if (LastChar == EOF)
return tok_eof;
// Otherwise, just return the character as its ascii value.
int ThisChar = LastChar;
LastChar = getchar();
return ThisChar;
}
//===----------------------------------------------------------------------===//
// Abstract Syntax Tree (aka Parse Tree)
//===----------------------------------------------------------------------===//
namespace {
/// ExprAST - Base class for all expression nodes.
class ExprAST {
public:
virtual ~ExprAST() {}
virtual Value *Codegen() = 0;
};
/// NumberExprAST - Expression class for numeric literals like "1.0".
class NumberExprAST : public ExprAST {
double Val;
public:
NumberExprAST(double val) : Val(val) {}
Value *Codegen() override;
};
/// VariableExprAST - Expression class for referencing a variable, like "a".
class VariableExprAST : public ExprAST {
std::string Name;
public:
VariableExprAST(const std::string &name) : Name(name) {}
const std::string &getName() const { return Name; }
Value *Codegen() override;
};
/// UnaryExprAST - Expression class for a unary operator.
class UnaryExprAST : public ExprAST {
char Opcode;
ExprAST *Operand;
public:
UnaryExprAST(char opcode, ExprAST *operand)
: Opcode(opcode), Operand(operand) {}
Value *Codegen() override;
};
/// BinaryExprAST - Expression class for a binary operator.
class BinaryExprAST : public ExprAST {
char Op;
ExprAST *LHS, *RHS;
public:
BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
: Op(op), LHS(lhs), RHS(rhs) {}
Value *Codegen() override;
};
/// CallExprAST - Expression class for function calls.
class CallExprAST : public ExprAST {
std::string Callee;
std::vector<ExprAST *> Args;
public:
CallExprAST(const std::string &callee, std::vector<ExprAST *> &args)
: Callee(callee), Args(args) {}
Value *Codegen() override;
};
/// IfExprAST - Expression class for if/then/else.
class IfExprAST : public ExprAST {
ExprAST *Cond, *Then, *Else;
public:
IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
: Cond(cond), Then(then), Else(_else) {}
Value *Codegen() override;
};
/// ForExprAST - Expression class for for/in.
class ForExprAST : public ExprAST {
std::string VarName;
ExprAST *Start, *End, *Step, *Body;
public:
ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end,
ExprAST *step, ExprAST *body)
: VarName(varname), Start(start), End(end), Step(step), Body(body) {}
Value *Codegen() override;
};
/// VarExprAST - Expression class for var/in
class VarExprAST : public ExprAST {
std::vector<std::pair<std::string, ExprAST *> > VarNames;
ExprAST *Body;
public:
VarExprAST(const std::vector<std::pair<std::string, ExprAST *> > &varnames,
ExprAST *body)
: VarNames(varnames), Body(body) {}
Value *Codegen() override;
};
/// PrototypeAST - This class represents the "prototype" for a function,
/// which captures its argument names as well as if it is an operator.
class PrototypeAST {
std::string Name;
std::vector<std::string> Args;
bool isOperator;
unsigned Precedence; // Precedence if a binary op.
public:
PrototypeAST(const std::string &name, const std::vector<std::string> &args,
bool isoperator = false, unsigned prec = 0)
: Name(name), Args(args), isOperator(isoperator), Precedence(prec) {}
bool isUnaryOp() const { return isOperator && Args.size() == 1; }
bool isBinaryOp() const { return isOperator && Args.size() == 2; }
char getOperatorName() const {
assert(isUnaryOp() || isBinaryOp());
return Name[Name.size() - 1];
}
unsigned getBinaryPrecedence() const { return Precedence; }
Function *Codegen();
void CreateArgumentAllocas(Function *F);
};
/// FunctionAST - This class represents a function definition itself.
class FunctionAST {
PrototypeAST *Proto;
ExprAST *Body;
public:
FunctionAST(PrototypeAST *proto, ExprAST *body) : Proto(proto), Body(body) {}
Function *Codegen();
};
} // end anonymous namespace
//===----------------------------------------------------------------------===//
// Parser
//===----------------------------------------------------------------------===//
/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
/// token the parser is looking at. getNextToken reads another token from the
/// lexer and updates CurTok with its results.
static int CurTok;
static int getNextToken() { return CurTok = gettok(); }
/// BinopPrecedence - This holds the precedence for each binary operator that is
/// defined.
static std::map<char, int> BinopPrecedence;
/// GetTokPrecedence - Get the precedence of the pending binary operator token.
static int GetTokPrecedence() {
if (!isascii(CurTok))
return -1;
// Make sure it's a declared binop.
int TokPrec = BinopPrecedence[CurTok];
if (TokPrec <= 0)
return -1;
return TokPrec;
}
/// Error* - These are little helper functions for error handling.
ExprAST *Error(const char *Str) {
fprintf(stderr, "Error: %s\n", Str);
return 0;
}
PrototypeAST *ErrorP(const char *Str) {
Error(Str);
return 0;
}
FunctionAST *ErrorF(const char *Str) {
Error(Str);
return 0;
}
static ExprAST *ParseExpression();
/// identifierexpr
/// ::= identifier
/// ::= identifier '(' expression* ')'
static ExprAST *ParseIdentifierExpr() {
std::string IdName = IdentifierStr;
getNextToken(); // eat identifier.
if (CurTok != '(') // Simple variable ref.
return new VariableExprAST(IdName);
// Call.
getNextToken(); // eat (
std::vector<ExprAST *> Args;
if (CurTok != ')') {
while (1) {
ExprAST *Arg = ParseExpression();
if (!Arg)
return 0;
Args.push_back(Arg);
if (CurTok == ')')
break;
if (CurTok != ',')
return Error("Expected ')' or ',' in argument list");
getNextToken();
}
}
// Eat the ')'.
getNextToken();
return new CallExprAST(IdName, Args);
}
/// numberexpr ::= number
static ExprAST *ParseNumberExpr() {
ExprAST *Result = new NumberExprAST(NumVal);
getNextToken(); // consume the number
return Result;
}
/// parenexpr ::= '(' expression ')'
static ExprAST *ParseParenExpr() {
getNextToken(); // eat (.
ExprAST *V = ParseExpression();
if (!V)
return 0;
if (CurTok != ')')
return Error("expected ')'");
getNextToken(); // eat ).
return V;
}
/// ifexpr ::= 'if' expression 'then' expression 'else' expression
static ExprAST *ParseIfExpr() {
getNextToken(); // eat the if.
// condition.
ExprAST *Cond = ParseExpression();
if (!Cond)
return 0;
if (CurTok != tok_then)
return Error("expected then");
getNextToken(); // eat the then
ExprAST *Then = ParseExpression();
if (Then == 0)
return 0;
if (CurTok != tok_else)
return Error("expected else");
getNextToken();
ExprAST *Else = ParseExpression();
if (!Else)
return 0;
return new IfExprAST(Cond, Then, Else);
}
/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
static ExprAST *ParseForExpr() {
getNextToken(); // eat the for.
if (CurTok != tok_identifier)
return Error("expected identifier after for");
std::string IdName = IdentifierStr;
getNextToken(); // eat identifier.
if (CurTok != '=')
return Error("expected '=' after for");
getNextToken(); // eat '='.
ExprAST *Start = ParseExpression();
if (Start == 0)
return 0;
if (CurTok != ',')
return Error("expected ',' after for start value");
getNextToken();
ExprAST *End = ParseExpression();
if (End == 0)
return 0;
// The step value is optional.
ExprAST *Step = 0;
if (CurTok == ',') {
getNextToken();
Step = ParseExpression();
if (Step == 0)
return 0;
}
if (CurTok != tok_in)
return Error("expected 'in' after for");
getNextToken(); // eat 'in'.
ExprAST *Body = ParseExpression();
if (Body == 0)
return 0;
return new ForExprAST(IdName, Start, End, Step, Body);
}
/// varexpr ::= 'var' identifier ('=' expression)?
// (',' identifier ('=' expression)?)* 'in' expression
static ExprAST *ParseVarExpr() {
getNextToken(); // eat the var.
std::vector<std::pair<std::string, ExprAST *> > VarNames;
// At least one variable name is required.
if (CurTok != tok_identifier)
return Error("expected identifier after var");
while (1) {
std::string Name = IdentifierStr;
getNextToken(); // eat identifier.
// Read the optional initializer.
ExprAST *Init = 0;
if (CurTok == '=') {
getNextToken(); // eat the '='.
Init = ParseExpression();
if (Init == 0)
return 0;
}
VarNames.push_back(std::make_pair(Name, Init));
// End of var list, exit loop.
if (CurTok != ',')
break;
getNextToken(); // eat the ','.
if (CurTok != tok_identifier)
return Error("expected identifier list after var");
}
// At this point, we have to have 'in'.
if (CurTok != tok_in)
return Error("expected 'in' keyword after 'var'");
getNextToken(); // eat 'in'.
ExprAST *Body = ParseExpression();
if (Body == 0)
return 0;
return new VarExprAST(VarNames, Body);
}
/// primary
/// ::= identifierexpr
/// ::= numberexpr
/// ::= parenexpr
/// ::= ifexpr
/// ::= forexpr
/// ::= varexpr
static ExprAST *ParsePrimary() {
switch (CurTok) {
default:
return Error("unknown token when expecting an expression");
case tok_identifier:
return ParseIdentifierExpr();
case tok_number:
return ParseNumberExpr();
case '(':
return ParseParenExpr();
case tok_if:
return ParseIfExpr();
case tok_for:
return ParseForExpr();
case tok_var:
return ParseVarExpr();
}
}
/// unary
/// ::= primary
/// ::= '!' unary
static ExprAST *ParseUnary() {
// If the current token is not an operator, it must be a primary expr.
if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
return ParsePrimary();
// If this is a unary operator, read it.
int Opc = CurTok;
getNextToken();
if (ExprAST *Operand = ParseUnary())
return new UnaryExprAST(Opc, Operand);
return 0;
}
/// binoprhs
/// ::= ('+' unary)*
static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
// If this is a binop, find its precedence.
while (1) {
int TokPrec = GetTokPrecedence();
// If this is a binop that binds at least as tightly as the current binop,
// consume it, otherwise we are done.
if (TokPrec < ExprPrec)
return LHS;
// Okay, we know this is a binop.
int BinOp = CurTok;
getNextToken(); // eat binop
// Parse the unary expression after the binary operator.
ExprAST *RHS = ParseUnary();
if (!RHS)
return 0;
// If BinOp binds less tightly with RHS than the operator after RHS, let
// the pending operator take RHS as its LHS.
int NextPrec = GetTokPrecedence();
if (TokPrec < NextPrec) {
RHS = ParseBinOpRHS(TokPrec + 1, RHS);
if (RHS == 0)
return 0;
}
// Merge LHS/RHS.
LHS = new BinaryExprAST(BinOp, LHS, RHS);
}
}
/// expression
/// ::= unary binoprhs
///
static ExprAST *ParseExpression() {
ExprAST *LHS = ParseUnary();
if (!LHS)
return 0;
return ParseBinOpRHS(0, LHS);
}
/// prototype
/// ::= id '(' id* ')'
/// ::= binary LETTER number? (id, id)
/// ::= unary LETTER (id)
static PrototypeAST *ParsePrototype() {
std::string FnName;
unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
unsigned BinaryPrecedence = 30;
switch (CurTok) {
default:
return ErrorP("Expected function name in prototype");
case tok_identifier:
FnName = IdentifierStr;
Kind = 0;
getNextToken();
break;
case tok_unary:
getNextToken();
if (!isascii(CurTok))
return ErrorP("Expected unary operator");
FnName = "unary";
FnName += (char)CurTok;
Kind = 1;
getNextToken();
break;
case tok_binary:
getNextToken();
if (!isascii(CurTok))
return ErrorP("Expected binary operator");
FnName = "binary";
FnName += (char)CurTok;
Kind = 2;
getNextToken();
// Read the precedence if present.
if (CurTok == tok_number) {
if (NumVal < 1 || NumVal > 100)
return ErrorP("Invalid precedecnce: must be 1..100");
BinaryPrecedence = (unsigned)NumVal;
getNextToken();
}
break;
}
if (CurTok != '(')
return ErrorP("Expected '(' in prototype");
std::vector<std::string> ArgNames;
while (getNextToken() == tok_identifier)
ArgNames.push_back(IdentifierStr);
if (CurTok != ')')
return ErrorP("Expected ')' in prototype");
// success.
getNextToken(); // eat ')'.
// Verify right number of names for operator.
if (Kind && ArgNames.size() != Kind)
return ErrorP("Invalid number of operands for operator");
return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);
}
/// definition ::= 'def' prototype expression
static FunctionAST *ParseDefinition() {
getNextToken(); // eat def.
PrototypeAST *Proto = ParsePrototype();
if (Proto == 0)
return 0;
if (ExprAST *E = ParseExpression())
return new FunctionAST(Proto, E);
return 0;
}
/// toplevelexpr ::= expression
static FunctionAST *ParseTopLevelExpr() {
if (ExprAST *E = ParseExpression()) {
// Make an anonymous proto.
PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
return new FunctionAST(Proto, E);
}
return 0;
}
/// external ::= 'extern' prototype
static PrototypeAST *ParseExtern() {
getNextToken(); // eat extern.
return ParsePrototype();
}
//===----------------------------------------------------------------------===//
// Code Generation
//===----------------------------------------------------------------------===//
static Module *TheModule;
static IRBuilder<> Builder(getGlobalContext());
static std::map<std::string, AllocaInst *> NamedValues;
static legacy::FunctionPassManager *TheFPM;
Value *ErrorV(const char *Str) {
Error(Str);
return 0;
}
/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
/// the function. This is used for mutable variables etc.
static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
const std::string &VarName) {
IRBuilder<> TmpB(&TheFunction->getEntryBlock(),
TheFunction->getEntryBlock().begin());
return TmpB.CreateAlloca(Type::getDoubleTy(getGlobalContext()), 0,
VarName.c_str());
}
Value *NumberExprAST::Codegen() {
return ConstantFP::get(getGlobalContext(), APFloat(Val));
}
Value *VariableExprAST::Codegen() {
// Look this variable up in the function.
Value *V = NamedValues[Name];
if (V == 0)
return ErrorV("Unknown variable name");
// Load the value.
return Builder.CreateLoad(V, Name.c_str());
}
Value *UnaryExprAST::Codegen() {
Value *OperandV = Operand->Codegen();
if (OperandV == 0)
return 0;
Function *F = TheModule->getFunction(std::string("unary") + Opcode);
if (F == 0)
return ErrorV("Unknown unary operator");
return Builder.CreateCall(F, OperandV, "unop");
}
Value *BinaryExprAST::Codegen() {
// Special case '=' because we don't want to emit the LHS as an expression.
if (Op == '=') {
// Assignment requires the LHS to be an identifier.
// This assumes we're building without RTTI because LLVM builds that way by
// default. If you build LLVM with RTTI this can be changed to a
// dynamic_cast for automatic error checking.
VariableExprAST *LHSE = static_cast<VariableExprAST *>(LHS);
if (!LHSE)
return ErrorV("destination of '=' must be a variable");
// Codegen the RHS.
Value *Val = RHS->Codegen();
if (Val == 0)
return 0;
// Look up the name.
Value *Variable = NamedValues[LHSE->getName()];
if (Variable == 0)
return ErrorV("Unknown variable name");
Builder.CreateStore(Val, Variable);
return Val;
}
Value *L = LHS->Codegen();
Value *R = RHS->Codegen();
if (L == 0 || R == 0)
return 0;
switch (Op) {
case '+':
return Builder.CreateFAdd(L, R, "addtmp");
case '-':
return Builder.CreateFSub(L, R, "subtmp");
case '*':
return Builder.CreateFMul(L, R, "multmp");
case '<':
L = Builder.CreateFCmpULT(L, R, "cmptmp");
// Convert bool 0/1 to double 0.0 or 1.0
return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
"booltmp");
default:
break;
}
// If it wasn't a builtin binary operator, it must be a user defined one. Emit
// a call to it.
Function *F = TheModule->getFunction(std::string("binary") + Op);
assert(F && "binary operator not found!");
Value *Ops[] = { L, R };
return Builder.CreateCall(F, Ops, "binop");
}
Value *CallExprAST::Codegen() {
// Look up the name in the global module table.
Function *CalleeF = TheModule->getFunction(Callee);
if (CalleeF == 0)
return ErrorV("Unknown function referenced");
// If argument mismatch error.
if (CalleeF->arg_size() != Args.size())
return ErrorV("Incorrect # arguments passed");
std::vector<Value *> ArgsV;
for (unsigned i = 0, e = Args.size(); i != e; ++i) {
ArgsV.push_back(Args[i]->Codegen());
if (ArgsV.back() == 0)
return 0;
}
return Builder.CreateCall(CalleeF, ArgsV, "calltmp");
}
Value *IfExprAST::Codegen() {
Value *CondV = Cond->Codegen();
if (CondV == 0)
return 0;
// Convert condition to a bool by comparing equal to 0.0.
CondV = Builder.CreateFCmpONE(
CondV, ConstantFP::get(getGlobalContext(), APFloat(0.0)), "ifcond");
Function *TheFunction = Builder.GetInsertBlock()->getParent();
// Create blocks for the then and else cases. Insert the 'then' block at the
// end of the function.
BasicBlock *ThenBB =
BasicBlock::Create(getGlobalContext(), "then", TheFunction);
BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else");
BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont");
Builder.CreateCondBr(CondV, ThenBB, ElseBB);
// Emit then value.
Builder.SetInsertPoint(ThenBB);
Value *ThenV = Then->Codegen();
if (ThenV == 0)
return 0;
Builder.CreateBr(MergeBB);
// Codegen of 'Then' can change the current block, update ThenBB for the PHI.
ThenBB = Builder.GetInsertBlock();
// Emit else block.
TheFunction->getBasicBlockList().push_back(ElseBB);
Builder.SetInsertPoint(ElseBB);
Value *ElseV = Else->Codegen();
if (ElseV == 0)
return 0;
Builder.CreateBr(MergeBB);
// Codegen of 'Else' can change the current block, update ElseBB for the PHI.
ElseBB = Builder.GetInsertBlock();
// Emit merge block.
TheFunction->getBasicBlockList().push_back(MergeBB);
Builder.SetInsertPoint(MergeBB);
PHINode *PN =
Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, "iftmp");
PN->addIncoming(ThenV, ThenBB);
PN->addIncoming(ElseV, ElseBB);
return PN;
}
Value *ForExprAST::Codegen() {
// Output this as:
// var = alloca double
// ...
// start = startexpr
// store start -> var
// goto loop
// loop:
// ...
// bodyexpr
// ...
// loopend:
// step = stepexpr
// endcond = endexpr
//
// curvar = load var
// nextvar = curvar + step
// store nextvar -> var
// br endcond, loop, endloop
// outloop:
Function *TheFunction = Builder.GetInsertBlock()->getParent();
// Create an alloca for the variable in the entry block.
AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
// Emit the start code first, without 'variable' in scope.
Value *StartVal = Start->Codegen();
if (StartVal == 0)
return 0;
// Store the value into the alloca.
Builder.CreateStore(StartVal, Alloca);
// Make the new basic block for the loop header, inserting after current
// block.
BasicBlock *LoopBB =
BasicBlock::Create(getGlobalContext(), "loop", TheFunction);
// Insert an explicit fall through from the current block to the LoopBB.
Builder.CreateBr(LoopBB);
// Start insertion in LoopBB.
Builder.SetInsertPoint(LoopBB);
// Within the loop, the variable is defined equal to the PHI node. If it
// shadows an existing variable, we have to restore it, so save it now.
AllocaInst *OldVal = NamedValues[VarName];
NamedValues[VarName] = Alloca;
// Emit the body of the loop. This, like any other expr, can change the
// current BB. Note that we ignore the value computed by the body, but don't
// allow an error.
if (Body->Codegen() == 0)
return 0;
// Emit the step value.
Value *StepVal;
if (Step) {
StepVal = Step->Codegen();
if (StepVal == 0)
return 0;
} else {
// If not specified, use 1.0.
StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0));
}
// Compute the end condition.
Value *EndCond = End->Codegen();
if (EndCond == 0)
return EndCond;
// Reload, increment, and restore the alloca. This handles the case where
// the body of the loop mutates the variable.
Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str());
Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar");
Builder.CreateStore(NextVar, Alloca);
// Convert condition to a bool by comparing equal to 0.0.
EndCond = Builder.CreateFCmpONE(
EndCond, ConstantFP::get(getGlobalContext(), APFloat(0.0)), "loopcond");
// Create the "after loop" block and insert it.
BasicBlock *AfterBB =
BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction);
// Insert the conditional branch into the end of LoopEndBB.
Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
// Any new code will be inserted in AfterBB.
Builder.SetInsertPoint(AfterBB);
// Restore the unshadowed variable.
if (OldVal)
NamedValues[VarName] = OldVal;
else
NamedValues.erase(VarName);
// for expr always returns 0.0.
return Constant::getNullValue(Type::getDoubleTy(getGlobalContext()));
}
Value *VarExprAST::Codegen() {
std::vector<AllocaInst *> OldBindings;
Function *TheFunction = Builder.GetInsertBlock()->getParent();
// Register all variables and emit their initializer.
for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
const std::string &VarName = VarNames[i].first;
ExprAST *Init = VarNames[i].second;
// Emit the initializer before adding the variable to scope, this prevents
// the initializer from referencing the variable itself, and permits stuff
// like this:
// var a = 1 in
// var a = a in ... # refers to outer 'a'.
Value *InitVal;
if (Init) {
InitVal = Init->Codegen();
if (InitVal == 0)
return 0;
} else { // If not specified, use 0.0.
InitVal = ConstantFP::get(getGlobalContext(), APFloat(0.0));
}
AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
Builder.CreateStore(InitVal, Alloca);
// Remember the old variable binding so that we can restore the binding when
// we unrecurse.
OldBindings.push_back(NamedValues[VarName]);
// Remember this binding.
NamedValues[VarName] = Alloca;
}
// Codegen the body, now that all vars are in scope.
Value *BodyVal = Body->Codegen();
if (BodyVal == 0)
return 0;
// Pop all our variables from scope.
for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
NamedValues[VarNames[i].first] = OldBindings[i];
// Return the body computation.
return BodyVal;
}
Function *PrototypeAST::Codegen() {
// Make the function type: double(double,double) etc.
std::vector<Type *> Doubles(Args.size(),
Type::getDoubleTy(getGlobalContext()));
FunctionType *FT =
FunctionType::get(Type::getDoubleTy(getGlobalContext()), Doubles, false);
Function *F =
Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
// If F conflicted, there was already something named 'Name'. If it has a
// body, don't allow redefinition or reextern.
if (F->getName() != Name) {
// Delete the one we just made and get the existing one.
F->eraseFromParent();
F = TheModule->getFunction(Name);
// If F already has a body, reject this.
if (!F->empty()) {
ErrorF("redefinition of function");
return 0;
}
// If F took a different number of args, reject.
if (F->arg_size() != Args.size()) {
ErrorF("redefinition of function with different # args");
return 0;
}
}
// Set names for all arguments.
unsigned Idx = 0;
for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size();
++AI, ++Idx)
AI->setName(Args[Idx]);
return F;
}
/// CreateArgumentAllocas - Create an alloca for each argument and register the
/// argument in the symbol table so that references to it will succeed.
void PrototypeAST::CreateArgumentAllocas(Function *F) {
Function::arg_iterator AI = F->arg_begin();
for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
// Create an alloca for this variable.
AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
// Store the initial value into the alloca.
Builder.CreateStore(AI, Alloca);
// Add arguments to variable symbol table.
NamedValues[Args[Idx]] = Alloca;
}
}
Function *FunctionAST::Codegen() {
NamedValues.clear();
Function *TheFunction = Proto->Codegen();
if (TheFunction == 0)
return 0;
// If this is an operator, install it.
if (Proto->isBinaryOp())
BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence();
// Create a new basic block to start insertion into.
BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
Builder.SetInsertPoint(BB);
// Add all arguments to the symbol table and create their allocas.
Proto->CreateArgumentAllocas(TheFunction);
if (Value *RetVal = Body->Codegen()) {
// Finish off the function.
Builder.CreateRet(RetVal);
// Validate the generated code, checking for consistency.
verifyFunction(*TheFunction);
// Optimize the function.
TheFPM->run(*TheFunction);
return TheFunction;
}
// Error reading body, remove function.
TheFunction->eraseFromParent();
if (Proto->isBinaryOp())
BinopPrecedence.erase(Proto->getOperatorName());
return 0;
}
//===----------------------------------------------------------------------===//
// Top-Level parsing and JIT Driver
//===----------------------------------------------------------------------===//
static ExecutionEngine *TheExecutionEngine;
static void HandleDefinition() {
if (FunctionAST *F = ParseDefinition()) {
if (Function *LF = F->Codegen()) {
fprintf(stderr, "Read function definition:");
LF->dump();
}
} else {
// Skip token for error recovery.
getNextToken();
}
}
static void HandleExtern() {
if (PrototypeAST *P = ParseExtern()) {
if (Function *F = P->Codegen()) {
fprintf(stderr, "Read extern: ");
F->dump();
}
} else {
// Skip token for error recovery.
getNextToken();
}
}
static void HandleTopLevelExpression() {
// Evaluate a top-level expression into an anonymous function.
if (FunctionAST *F = ParseTopLevelExpr()) {
if (Function *LF = F->Codegen()) {
TheExecutionEngine->finalizeObject();
// JIT the function, returning a function pointer.
void *FPtr = TheExecutionEngine->getPointerToFunction(LF);
// Cast it to the right type (takes no arguments, returns a double) so we
// can call it as a native function.
double (*FP)() = (double (*)())(intptr_t)FPtr;
fprintf(stderr, "Evaluated to %f\n", FP());
}
} else {
// Skip token for error recovery.
getNextToken();
}
}
/// top ::= definition | external | expression | ';'
static void MainLoop() {
while (1) {
fprintf(stderr, "ready> ");
switch (CurTok) {
case tok_eof:
return;
case ';':
getNextToken();
break; // ignore top-level semicolons.
case tok_def:
HandleDefinition();
break;
case tok_extern:
HandleExtern();
break;
default:
HandleTopLevelExpression();
break;
}
}
}
//===----------------------------------------------------------------------===//
// "Library" functions that can be "extern'd" from user code.
//===----------------------------------------------------------------------===//
/// putchard - putchar that takes a double and returns 0.
extern "C" double putchard(double X) {
putchar((char)X);
return 0;
}
/// printd - printf that takes a double and prints it as "%f\n", returning 0.
extern "C" double printd(double X) {
printf("%f\n", X);
return 0;
}
//===----------------------------------------------------------------------===//
// Main driver code.
//===----------------------------------------------------------------------===//
int main() {
InitializeNativeTarget();
InitializeNativeTargetAsmPrinter();
InitializeNativeTargetAsmParser();
LLVMContext &Context = getGlobalContext();
// Install standard binary operators.
// 1 is lowest precedence.
BinopPrecedence['='] = 2;
BinopPrecedence['<'] = 10;
BinopPrecedence['+'] = 20;
BinopPrecedence['-'] = 20;
BinopPrecedence['*'] = 40; // highest.
// Prime the first token.
fprintf(stderr, "ready> ");
getNextToken();
// Make the module, which holds all the code.
std::unique_ptr<Module> Owner = make_unique<Module>("my cool jit", Context);
TheModule = Owner.get();
// Create the JIT. This takes ownership of the module.
std::string ErrStr;
TheExecutionEngine =
EngineBuilder(std::move(Owner))
.setErrorStr(&ErrStr)
.setMCJITMemoryManager(llvm::make_unique<SectionMemoryManager>())
.create();
if (!TheExecutionEngine) {
fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str());
exit(1);
}
legacy::FunctionPassManager OurFPM(TheModule);
// Set up the optimizer pipeline. Start with registering info about how the
// target lays out data structures.
TheModule->setDataLayout(*TheExecutionEngine->getDataLayout());
// Provide basic AliasAnalysis support for GVN.
OurFPM.add(createBasicAliasAnalysisPass());
// Promote allocas to registers.
OurFPM.add(createPromoteMemoryToRegisterPass());
// Do simple "peephole" optimizations and bit-twiddling optzns.
OurFPM.add(createInstructionCombiningPass());
// Reassociate expressions.
OurFPM.add(createReassociatePass());
// Eliminate Common SubExpressions.
OurFPM.add(createGVNPass());
// Simplify the control flow graph (deleting unreachable blocks, etc).
OurFPM.add(createCFGSimplificationPass());
OurFPM.doInitialization();
// Set the global so the code gen can use this.
TheFPM = &OurFPM;
// Run the main "interpreter loop" now.
MainLoop();
TheFPM = 0;
// Print out all of the generated code.
TheModule->dump();
return 0;
}
Kaleidoscope: Adding Debug Information¶
Chapter 8 Introduction¶
Welcome to Chapter 8 of the "Implementing a language with LLVM" tutorial. In chapters 1 through 7, we've built a decent little programming language with functions and variables. What happens if something goes wrong, though? How do you debug your program?
Source level debugging uses formatted data that helps a debugger translate from binary and the state of the machine back to the source that the programmer wrote. In LLVM we generally use a format called DWARF. DWARF is a compact encoding that represents types, source locations, and variable locations.
The short summary of this chapter is that we’ll go through the various things you have to add to a programming language to support debug info, and how you translate that into DWARF.
Caveat: For now we can't debug via the JIT, so we'll need to compile our program down to something small and standalone. As part of this we'll make a few modifications to the way the language is run and to how programs are compiled. This means that we'll have a source file with a simple program written in Kaleidoscope rather than the interactive JIT. To reduce the number of changes necessary, it does impose the limitation that we can only have one "top level" command at a time.
Here’s the sample program we’ll be compiling:
def fib(x)
if x < 3 then
1
else
fib(x-1)+fib(x-2);
fib(10)
Why is this a hard problem?¶
Debug information is a hard problem for a few different reasons - mostly centered around optimized code. First, optimization makes keeping source locations more difficult. In LLVM IR we keep the original source location for each IR level instruction on the instruction. Optimization passes should keep the source locations for newly created instructions, but merged instructions only get to keep a single location - this can cause jumping around when stepping through optimized programs. Secondly, optimization can move variables in ways that are either optimized out, shared in memory with other variables, or difficult to track. For the purposes of this tutorial we’re going to avoid optimization (as you’ll see with one of the next sets of patches).
Ahead-of-Time Compilation Mode¶
To highlight only the aspects of adding debug information to a source language without needing to worry about the complexities of JIT debugging we’re going to make a few changes to Kaleidoscope to support compiling the IR emitted by the front end into a simple standalone program that you can execute, debug, and see results.
First we make our anonymous function that contains our top level statement be our “main”:
- PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
+ PrototypeAST *Proto = new PrototypeAST("main", std::vector<std::string>());
just with the simple change of giving it a name.
Then we’re going to remove the command line code wherever it exists:
@@ -1129,7 +1129,6 @@ static void HandleTopLevelExpression() {
/// top ::= definition | external | expression | ';'
static void MainLoop() {
while (1) {
- fprintf(stderr, "ready> ");
switch (CurTok) {
case tok_eof:
return;
@@ -1184,7 +1183,6 @@ int main() {
BinopPrecedence['*'] = 40; // highest.
// Prime the first token.
- fprintf(stderr, "ready> ");
getNextToken();
Lastly we’re going to disable all of the optimization passes and the JIT so that the only thing that happens after we’re done parsing and generating code is that the llvm IR goes to standard error:
@@ -1108,17 +1108,8 @@ static void HandleExtern() {
static void HandleTopLevelExpression() {
// Evaluate a top-level expression into an anonymous function.
if (FunctionAST *F = ParseTopLevelExpr()) {
- if (Function *LF = F->Codegen()) {
- // We're just doing this to make sure it executes.
- TheExecutionEngine->finalizeObject();
- // JIT the function, returning a function pointer.
- void *FPtr = TheExecutionEngine->getPointerToFunction(LF);
-
- // Cast it to the right type (takes no arguments, returns a double) so we
- // can call it as a native function.
- double (*FP)() = (double (*)())(intptr_t)FPtr;
- // Ignore the return value for this.
- (void)FP;
+ if (!F->Codegen()) {
+ fprintf(stderr, "Error generating code for top level expr");
}
} else {
// Skip token for error recovery.
@@ -1439,11 +1459,11 @@ int main() {
// target lays out data structures.
TheModule->setDataLayout(TheExecutionEngine->getDataLayout());
OurFPM.add(new DataLayoutPass());
+#if 0
OurFPM.add(createBasicAliasAnalysisPass());
// Promote allocas to registers.
OurFPM.add(createPromoteMemoryToRegisterPass());
@@ -1218,7 +1210,7 @@ int main() {
OurFPM.add(createGVNPass());
// Simplify the control flow graph (deleting unreachable blocks, etc).
OurFPM.add(createCFGSimplificationPass());
-
+ #endif
OurFPM.doInitialization();
// Set the global so the code gen can use this.
This relatively small set of changes gets us to the point that we can compile our piece of Kaleidoscope language down to an executable program via this command line:
Kaleidoscope-Ch8 < fib.ks |& clang -x ir -
which gives an a.out/a.exe in the current working directory.
Compile Unit¶
The top level container for a section of code in DWARF is a compile unit. This contains the type and function data for an individual translation unit (read: one file of source code). So the first thing we need to do is construct one for our fib.ks file.
DWARF Emission Setup¶
Similar to the IRBuilder class we have a DIBuilder class (http://llvm.org/doxygen/classllvm_1_1DIBuilder.html) that helps in constructing debug metadata for an llvm IR file. It corresponds 1:1 with IRBuilder and llvm IR, but with nicer names. Using it does require that you be more familiar with DWARF terminology than you needed to be with IRBuilder and Instruction names, but if you read through the general documentation on the Metadata Format (http://llvm.org/docs/SourceLevelDebugging.html) it should be a little more clear. We'll be using this class to construct all of our IR level descriptions. Construction for it takes a module so we need to construct it shortly after we construct our module. We've left it as a global static variable to make it a bit easier to use.
Next we’re going to create a small container to cache some of our frequent data. The first will be our compile unit, but we’ll also write a bit of code for our one type since we won’t have to worry about multiple typed expressions:
static DIBuilder *DBuilder;
struct DebugInfo {
DICompileUnit *TheCU;
DIType *DblTy;
DIType *getDoubleTy();
} KSDbgInfo;
DIType *DebugInfo::getDoubleTy() {
if (DblTy)
return DblTy;
DblTy = DBuilder->createBasicType("double", 64, 64, dwarf::DW_ATE_float);
return DblTy;
}
And then later on in main when we're constructing our module:
DBuilder = new DIBuilder(*TheModule);
KSDbgInfo.TheCU = DBuilder->createCompileUnit(
dwarf::DW_LANG_C, "fib.ks", ".", "Kaleidoscope Compiler", 0, "", 0);
There are a couple of things to note here. First, while we're producing a compile unit for a language called Kaleidoscope we used the language constant for C. This is because a debugger wouldn't necessarily understand the calling conventions or default ABI for a language it doesn't recognize, and we follow the C ABI in our llvm code generation so it's the closest thing to accurate. This ensures we can actually call functions from the debugger and have them execute. Secondly, you'll see the "fib.ks" in the call to createCompileUnit. This is a default hard coded value since we're using shell redirection to put our source into the Kaleidoscope compiler. In a usual front end you'd have an input file name and it would go there.
One last thing as part of emitting debug information via DIBuilder is that we need to “finalize” the debug information. The reasons are part of the underlying API for DIBuilder, but make sure you do this near the end of main:
DBuilder->finalize();
before you dump out the module.
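To make the ordering concrete, here is a minimal sketch of the relevant pieces of main, assembled from the snippets above (everything else in main is elided; the globals are the tutorial's own):
LLVMContext &Context = getGlobalContext();
// Make the module first; the DIBuilder attaches debug metadata to it.
std::unique_ptr<Module> Owner = make_unique<Module>("my cool jit", Context);
TheModule = Owner.get();
DBuilder = new DIBuilder(*TheModule);
KSDbgInfo.TheCU = DBuilder->createCompileUnit(
dwarf::DW_LANG_C, "fib.ks", ".", "Kaleidoscope Compiler", 0, "", 0);
// ... parse and codegen everything ...
MainLoop();
// Finalize the debug info before dumping the module.
DBuilder->finalize();
TheModule->dump();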
Functions¶
Now that we have our Compile Unit and our source locations, we can add function definitions to the debug info. So in PrototypeAST::Codegen we add a few lines of code to describe a context for our subprogram, in this case the "File", and the actual definition of the function itself.
So the context:
DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU->getFilename(),
KSDbgInfo.TheCU->getDirectory());
giving us a DIFile and asking the Compile Unit we created above for the directory and filename where we are currently. Then, for now, we use some source locations of 0 (since our AST doesn't currently have source location information) and construct our function definition:
DIScope *FContext = Unit;
unsigned LineNo = 0;
unsigned ScopeLine = 0;
DISubprogram *SP = DBuilder->createFunction(
FContext, Name, StringRef(), Unit, LineNo,
CreateFunctionType(Args.size(), Unit), false /* internal linkage */,
true /* definition */, ScopeLine, DINode::FlagPrototyped, false, F);
and we now have a DISubprogram that contains a reference to all of our metadata for the function.
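The CreateFunctionType helper used in that call builds the DISubroutineType for our all-doubles signature; it comes straight from the full listing below:
static DISubroutineType *CreateFunctionType(unsigned NumArgs, DIFile *Unit) {
SmallVector<Metadata *, 8> EltTys;
DIType *DblTy = KSDbgInfo.getDoubleTy();
// Add the result type.
EltTys.push_back(DblTy);
for (unsigned i = 0, e = NumArgs; i != e; ++i)
EltTys.push_back(DblTy);
return DBuilder->createSubroutineType(Unit,
DBuilder->getOrCreateTypeArray(EltTys));
}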
Source Locations¶
The most important thing for debug information is accurate source locations - they make it possible to map the generated code back to the source that the programmer wrote. We have a problem though: Kaleidoscope really doesn't have any source location information in the lexer or parser, so we'll need to add it.
struct SourceLocation {
int Line;
int Col;
};
static SourceLocation CurLoc;
static SourceLocation LexLoc = {1, 0};
static int advance() {
int LastChar = getchar();
if (LastChar == '\n' || LastChar == '\r') {
LexLoc.Line++;
LexLoc.Col = 0;
} else
LexLoc.Col++;
return LastChar;
}
In this set of code we've added some functionality to keep track of the line and column of the "source file". As we lex every token we set our current "lexical location" to the line and column for the beginning of the token. We do this by overriding all of the previous calls to getchar() with our new advance() that keeps track of the information, and then we have added a source location to all of our AST classes:
class ExprAST {
SourceLocation Loc;
public:
int getLine() const { return Loc.Line; }
int getCol() const { return Loc.Col; }
ExprAST(SourceLocation Loc = CurLoc) : Loc(Loc) {}
virtual std::ostream &dump(std::ostream &out, int ind) {
return out << ':' << getLine() << ':' << getCol() << '\n';
}
that we pass down through when we create a new expression:
LHS = new BinaryExprAST(BinLoc, BinOp, LHS, RHS);
giving us locations for each of our expressions and variables.
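For example, the parser captures CurLoc before consuming a token and threads it into the node it builds, as in ParseIfExpr from the full listing below (body abridged):
static ExprAST *ParseIfExpr() {
SourceLocation IfLoc = CurLoc; // capture the location before eating 'if'
getNextToken(); // eat the if.
// ... parse Cond, Then, and Else as before ...
return new IfExprAST(IfLoc, Cond, Then, Else);
}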
From this we can make sure to tell DIBuilder when we're at a new source location so it can use that when we generate the rest of our code, and make sure that each instruction has source location information. We do this by constructing another small function:
void DebugInfo::emitLocation(ExprAST *AST) {
// Passing a null AST clears the current location; we'll use this for the
// function prologue below.
if (!AST)
return Builder.SetCurrentDebugLocation(DebugLoc());
DIScope *Scope;
if (LexicalBlocks.empty())
Scope = TheCU;
else
Scope = LexicalBlocks.back();
Builder.SetCurrentDebugLocation(
DebugLoc::get(AST->getLine(), AST->getCol(), Scope));
}
This both tells the main IRBuilder where we are and also what scope we're in. Since we've just created a function above, we can either be in the main file scope (like when we created our function), or now we can be in the function scope we just created. To represent this we create a stack of scopes:
std::vector<DIScope *> LexicalBlocks;
std::map<const PrototypeAST *, DIScope *> FnScopeMap;
and keep a map of each function to the scope that it represents (a DISubprogram is also a DIScope).
Then we make sure to:
KSDbgInfo.emitLocation(this);
emit the location every time we start to generate code for a new AST, and also:
KSDbgInfo.FnScopeMap[this] = SP;
store the scope (function) when we create it and use it:
KSDbgInfo.LexicalBlocks.push_back(KSDbgInfo.FnScopeMap[Proto]);
when we start generating the code for each function.
Also, don't forget to pop the scope back off of your scope stack at the end of the code generation for the function:
// Pop off the lexical block for the function since we added it
// unconditionally.
KSDbgInfo.LexicalBlocks.pop_back();
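Condensed from the full listing below, FunctionAST::Codegen brackets body emission with that push and pop on every path:
Function *FunctionAST::Codegen() {
NamedValues.clear();
Function *TheFunction = Proto->Codegen();
if (TheFunction == 0)
return 0;
// Push the current scope.
KSDbgInfo.LexicalBlocks.push_back(KSDbgInfo.FnScopeMap[Proto]);
// ... emit the body as before ...
if (Value *RetVal = Body->Codegen()) {
Builder.CreateRet(RetVal);
// Pop off the lexical block for the function.
KSDbgInfo.LexicalBlocks.pop_back();
// ... verify and optimize as before ...
return TheFunction;
}
// Error reading body: remove the function, and still pop the scope,
// since we pushed it unconditionally.
TheFunction->eraseFromParent();
KSDbgInfo.LexicalBlocks.pop_back();
return 0;
}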
Variables¶
Now that we have functions, we need to be able to print out the variables we have in scope. Let's get our function arguments set up so we can get decent backtraces and see how our functions are being called. It isn't a lot of code, and we generally handle it when we're creating the argument allocas in PrototypeAST::CreateArgumentAllocas:
DIScope *Scope = KSDbgInfo.LexicalBlocks.back();
DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU->getFilename(),
KSDbgInfo.TheCU->getDirectory());
DILocalVariable *D = DBuilder->createLocalVariable(
dwarf::DW_TAG_arg_variable, Scope, Args[Idx], Unit, Line,
KSDbgInfo.getDoubleTy(), Idx);
DBuilder->insertDeclare(Alloca, D, DBuilder->createExpression(),
DebugLoc::get(Line, 0, Scope),
Builder.GetInsertBlock());
Here we're doing a few things. First, we're grabbing our current scope for the variable so we can say what range of code our variable is valid through. Second, we're creating the variable, giving it the scope, the name, source location, type, and, since it's an argument, the argument index. Third, we create an llvm.dbg.declare call to indicate at the IR level that we've got a variable in an alloca (and it gives a starting location for the variable). Lastly, we set a source location for the beginning of the scope on the declare.
One interesting thing to note at this point is that various debuggers have assumptions based on how code and debug information was generated for them in the past. In this case we need to do a little bit of a hack to avoid generating line information for the function prologue so that the debugger knows to skip over those instructions when setting a breakpoint. So in FunctionAST::Codegen we add a couple of lines:
// Unset the location for the prologue emission (leading instructions with no
// location in a function are considered part of the prologue and the debugger
// will run past them when breaking on a function)
KSDbgInfo.emitLocation(nullptr);
and then emit a new location when we actually start generating code for the body of the function:
KSDbgInfo.emitLocation(Body);
With this we have enough debug information to set breakpoints in functions, print out argument variables, and call functions. Not too bad for just a few simple lines of code!
Full Code Listing¶
Here is the complete code listing for our running example, enhanced with debug information. To build this example, use:
# Compile
clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
# Run
./toy
Here is the code:
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/Triple.h"
#include "llvm/Analysis/Passes.h"
#include "llvm/ExecutionEngine/ExecutionEngine.h"
#include "llvm/ExecutionEngine/MCJIT.h"
#include "llvm/ExecutionEngine/SectionMemoryManager.h"
#include "llvm/IR/DIBuilder.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Verifier.h"
#include "llvm/Support/Host.h"
#include "llvm/Support/TargetSelect.h"
#include "llvm/Transforms/Scalar.h"
#include <cctype>
#include <cstdio>
#include <iostream>
#include <map>
#include <string>
#include <vector>
using namespace llvm;
//===----------------------------------------------------------------------===//
// Lexer
//===----------------------------------------------------------------------===//
// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
// of these for known things.
enum Token {
tok_eof = -1,
// commands
tok_def = -2,
tok_extern = -3,
// primary
tok_identifier = -4,
tok_number = -5,
// control
tok_if = -6,
tok_then = -7,
tok_else = -8,
tok_for = -9,
tok_in = -10,
// operators
tok_binary = -11,
tok_unary = -12,
// var definition
tok_var = -13
};
std::string getTokName(int Tok) {
switch (Tok) {
case tok_eof:
return "eof";
case tok_def:
return "def";
case tok_extern:
return "extern";
case tok_identifier:
return "identifier";
case tok_number:
return "number";
case tok_if:
return "if";
case tok_then:
return "then";
case tok_else:
return "else";
case tok_for:
return "for";
case tok_in:
return "in";
case tok_binary:
return "binary";
case tok_unary:
return "unary";
case tok_var:
return "var";
}
return std::string(1, (char)Tok);
}
namespace {
class PrototypeAST;
class ExprAST;
}
static IRBuilder<> Builder(getGlobalContext());
struct DebugInfo {
DICompileUnit *TheCU;
DIType *DblTy;
std::vector<DIScope *> LexicalBlocks;
std::map<const PrototypeAST *, DIScope *> FnScopeMap;
void emitLocation(ExprAST *AST);
DIType *getDoubleTy();
} KSDbgInfo;
static std::string IdentifierStr; // Filled in if tok_identifier
static double NumVal; // Filled in if tok_number
struct SourceLocation {
int Line;
int Col;
};
static SourceLocation CurLoc;
static SourceLocation LexLoc = { 1, 0 };
static int advance() {
int LastChar = getchar();
if (LastChar == '\n' || LastChar == '\r') {
LexLoc.Line++;
LexLoc.Col = 0;
} else
LexLoc.Col++;
return LastChar;
}
/// gettok - Return the next token from standard input.
static int gettok() {
static int LastChar = ' ';
// Skip any whitespace.
while (isspace(LastChar))
LastChar = advance();
CurLoc = LexLoc;
if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
IdentifierStr = LastChar;
while (isalnum((LastChar = advance())))
IdentifierStr += LastChar;
if (IdentifierStr == "def")
return tok_def;
if (IdentifierStr == "extern")
return tok_extern;
if (IdentifierStr == "if")
return tok_if;
if (IdentifierStr == "then")
return tok_then;
if (IdentifierStr == "else")
return tok_else;
if (IdentifierStr == "for")
return tok_for;
if (IdentifierStr == "in")
return tok_in;
if (IdentifierStr == "binary")
return tok_binary;
if (IdentifierStr == "unary")
return tok_unary;
if (IdentifierStr == "var")
return tok_var;
return tok_identifier;
}
if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
std::string NumStr;
do {
NumStr += LastChar;
LastChar = advance();
} while (isdigit(LastChar) || LastChar == '.');
NumVal = strtod(NumStr.c_str(), 0);
return tok_number;
}
if (LastChar == '#') {
// Comment until end of line.
do
LastChar = advance();
while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
if (LastChar != EOF)
return gettok();
}
// Check for end of file. Don't eat the EOF.
if (LastChar == EOF)
return tok_eof;
// Otherwise, just return the character as its ascii value.
int ThisChar = LastChar;
LastChar = advance();
return ThisChar;
}
//===----------------------------------------------------------------------===//
// Abstract Syntax Tree (aka Parse Tree)
//===----------------------------------------------------------------------===//
namespace {
std::ostream &indent(std::ostream &O, int size) {
return O << std::string(size, ' ');
}
/// ExprAST - Base class for all expression nodes.
class ExprAST {
SourceLocation Loc;
public:
int getLine() const { return Loc.Line; }
int getCol() const { return Loc.Col; }
ExprAST(SourceLocation Loc = CurLoc) : Loc(Loc) {}
virtual std::ostream &dump(std::ostream &out, int ind) {
return out << ':' << getLine() << ':' << getCol() << '\n';
}
virtual ~ExprAST() {}
virtual Value *Codegen() = 0;
};
/// NumberExprAST - Expression class for numeric literals like "1.0".
class NumberExprAST : public ExprAST {
double Val;
public:
NumberExprAST(double val) : Val(val) {}
std::ostream &dump(std::ostream &out, int ind) override {
return ExprAST::dump(out << Val, ind);
}
Value *Codegen() override;
};
/// VariableExprAST - Expression class for referencing a variable, like "a".
class VariableExprAST : public ExprAST {
std::string Name;
public:
VariableExprAST(SourceLocation Loc, const std::string &name)
: ExprAST(Loc), Name(name) {}
const std::string &getName() const { return Name; }
std::ostream &dump(std::ostream &out, int ind) override {
return ExprAST::dump(out << Name, ind);
}
Value *Codegen() override;
};
/// UnaryExprAST - Expression class for a unary operator.
class UnaryExprAST : public ExprAST {
char Opcode;
ExprAST *Operand;
public:
UnaryExprAST(char opcode, ExprAST *operand)
: Opcode(opcode), Operand(operand) {}
std::ostream &dump(std::ostream &out, int ind) override {
ExprAST::dump(out << "unary" << Opcode, ind);
Operand->dump(out, ind + 1);
return out;
}
Value *Codegen() override;
};
/// BinaryExprAST - Expression class for a binary operator.
class BinaryExprAST : public ExprAST {
char Op;
ExprAST *LHS, *RHS;
public:
BinaryExprAST(SourceLocation Loc, char op, ExprAST *lhs, ExprAST *rhs)
: ExprAST(Loc), Op(op), LHS(lhs), RHS(rhs) {}
std::ostream &dump(std::ostream &out, int ind) override {
ExprAST::dump(out << "binary" << Op, ind);
LHS->dump(indent(out, ind) << "LHS:", ind + 1);
RHS->dump(indent(out, ind) << "RHS:", ind + 1);
return out;
}
Value *Codegen() override;
};
/// CallExprAST - Expression class for function calls.
class CallExprAST : public ExprAST {
std::string Callee;
std::vector<ExprAST *> Args;
public:
CallExprAST(SourceLocation Loc, const std::string &callee,
std::vector<ExprAST *> &args)
: ExprAST(Loc), Callee(callee), Args(args) {}
std::ostream &dump(std::ostream &out, int ind) override {
ExprAST::dump(out << "call " << Callee, ind);
for (ExprAST *Arg : Args)
Arg->dump(indent(out, ind + 1), ind + 1);
return out;
}
Value *Codegen() override;
};
/// IfExprAST - Expression class for if/then/else.
class IfExprAST : public ExprAST {
ExprAST *Cond, *Then, *Else;
public:
IfExprAST(SourceLocation Loc, ExprAST *cond, ExprAST *then, ExprAST *_else)
: ExprAST(Loc), Cond(cond), Then(then), Else(_else) {}
std::ostream &dump(std::ostream &out, int ind) override {
ExprAST::dump(out << "if", ind);
Cond->dump(indent(out, ind) << "Cond:", ind + 1);
Then->dump(indent(out, ind) << "Then:", ind + 1);
Else->dump(indent(out, ind) << "Else:", ind + 1);
return out;
}
Value *Codegen() override;
};
/// ForExprAST - Expression class for for/in.
class ForExprAST : public ExprAST {
std::string VarName;
ExprAST *Start, *End, *Step, *Body;
public:
ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end,
ExprAST *step, ExprAST *body)
: VarName(varname), Start(start), End(end), Step(step), Body(body) {}
std::ostream &dump(std::ostream &out, int ind) override {
ExprAST::dump(out << "for", ind);
Start->dump(indent(out, ind) << "Cond:", ind + 1);
End->dump(indent(out, ind) << "End:", ind + 1);
Step->dump(indent(out, ind) << "Step:", ind + 1);
Body->dump(indent(out, ind) << "Body:", ind + 1);
return out;
}
Value *Codegen() override;
};
/// VarExprAST - Expression class for var/in
class VarExprAST : public ExprAST {
std::vector<std::pair<std::string, ExprAST *> > VarNames;
ExprAST *Body;
public:
VarExprAST(const std::vector<std::pair<std::string, ExprAST *> > &varnames,
ExprAST *body)
: VarNames(varnames), Body(body) {}
std::ostream &dump(std::ostream &out, int ind) override {
ExprAST::dump(out << "var", ind);
for (const auto &NamedVar : VarNames)
NamedVar.second->dump(indent(out, ind) << NamedVar.first << ':', ind + 1);
Body->dump(indent(out, ind) << "Body:", ind + 1);
return out;
}
Value *Codegen() override;
};
/// PrototypeAST - This class represents the "prototype" for a function,
/// which captures its argument names as well as if it is an operator.
class PrototypeAST {
std::string Name;
std::vector<std::string> Args;
bool isOperator;
unsigned Precedence; // Precedence if a binary op.
int Line;
public:
PrototypeAST(SourceLocation Loc, const std::string &name,
const std::vector<std::string> &args, bool isoperator = false,
unsigned prec = 0)
: Name(name), Args(args), isOperator(isoperator), Precedence(prec),
Line(Loc.Line) {}
bool isUnaryOp() const { return isOperator && Args.size() == 1; }
bool isBinaryOp() const { return isOperator && Args.size() == 2; }
char getOperatorName() const {
assert(isUnaryOp() || isBinaryOp());
return Name[Name.size() - 1];
}
unsigned getBinaryPrecedence() const { return Precedence; }
Function *Codegen();
void CreateArgumentAllocas(Function *F);
const std::vector<std::string> &getArgs() const { return Args; }
};
/// FunctionAST - This class represents a function definition itself.
class FunctionAST {
PrototypeAST *Proto;
ExprAST *Body;
public:
FunctionAST(PrototypeAST *proto, ExprAST *body) : Proto(proto), Body(body) {}
std::ostream &dump(std::ostream &out, int ind) {
indent(out, ind) << "FunctionAST\n";
++ind;
indent(out, ind) << "Body:";
return Body ? Body->dump(out, ind) : out << "null\n";
}
Function *Codegen();
};
} // end anonymous namespace
//===----------------------------------------------------------------------===//
// Parser
//===----------------------------------------------------------------------===//
/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
/// token the parser is looking at. getNextToken reads another token from the
/// lexer and updates CurTok with its results.
static int CurTok;
static int getNextToken() { return CurTok = gettok(); }
/// BinopPrecedence - This holds the precedence for each binary operator that is
/// defined.
static std::map<char, int> BinopPrecedence;
/// GetTokPrecedence - Get the precedence of the pending binary operator token.
static int GetTokPrecedence() {
if (!isascii(CurTok))
return -1;
// Make sure it's a declared binop.
int TokPrec = BinopPrecedence[CurTok];
if (TokPrec <= 0)
return -1;
return TokPrec;
}
/// Error* - These are little helper functions for error handling.
ExprAST *Error(const char *Str) {
fprintf(stderr, "Error: %s\n", Str);
return 0;
}
PrototypeAST *ErrorP(const char *Str) {
Error(Str);
return 0;
}
FunctionAST *ErrorF(const char *Str) {
Error(Str);
return 0;
}
static ExprAST *ParseExpression();
/// identifierexpr
/// ::= identifier
/// ::= identifier '(' expression* ')'
static ExprAST *ParseIdentifierExpr() {
std::string IdName = IdentifierStr;
SourceLocation LitLoc = CurLoc;
getNextToken(); // eat identifier.
if (CurTok != '(') // Simple variable ref.
return new VariableExprAST(LitLoc, IdName);
// Call.
getNextToken(); // eat (
std::vector<ExprAST *> Args;
if (CurTok != ')') {
while (1) {
ExprAST *Arg = ParseExpression();
if (!Arg)
return 0;
Args.push_back(Arg);
if (CurTok == ')')
break;
if (CurTok != ',')
return Error("Expected ')' or ',' in argument list");
getNextToken();
}
}
// Eat the ')'.
getNextToken();
return new CallExprAST(LitLoc, IdName, Args);
}
/// numberexpr ::= number
static ExprAST *ParseNumberExpr() {
ExprAST *Result = new NumberExprAST(NumVal);
getNextToken(); // consume the number
return Result;
}
/// parenexpr ::= '(' expression ')'
static ExprAST *ParseParenExpr() {
getNextToken(); // eat (.
ExprAST *V = ParseExpression();
if (!V)
return 0;
if (CurTok != ')')
return Error("expected ')'");
getNextToken(); // eat ).
return V;
}
/// ifexpr ::= 'if' expression 'then' expression 'else' expression
static ExprAST *ParseIfExpr() {
SourceLocation IfLoc = CurLoc;
getNextToken(); // eat the if.
// condition.
ExprAST *Cond = ParseExpression();
if (!Cond)
return 0;
if (CurTok != tok_then)
return Error("expected then");
getNextToken(); // eat the then
ExprAST *Then = ParseExpression();
if (Then == 0)
return 0;
if (CurTok != tok_else)
return Error("expected else");
getNextToken();
ExprAST *Else = ParseExpression();
if (!Else)
return 0;
return new IfExprAST(IfLoc, Cond, Then, Else);
}
/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
static ExprAST *ParseForExpr() {
getNextToken(); // eat the for.
if (CurTok != tok_identifier)
return Error("expected identifier after for");
std::string IdName = IdentifierStr;
getNextToken(); // eat identifier.
if (CurTok != '=')
return Error("expected '=' after for");
getNextToken(); // eat '='.
ExprAST *Start = ParseExpression();
if (Start == 0)
return 0;
if (CurTok != ',')
return Error("expected ',' after for start value");
getNextToken();
ExprAST *End = ParseExpression();
if (End == 0)
return 0;
// The step value is optional.
ExprAST *Step = 0;
if (CurTok == ',') {
getNextToken();
Step = ParseExpression();
if (Step == 0)
return 0;
}
if (CurTok != tok_in)
return Error("expected 'in' after for");
getNextToken(); // eat 'in'.
ExprAST *Body = ParseExpression();
if (Body == 0)
return 0;
return new ForExprAST(IdName, Start, End, Step, Body);
}
/// varexpr ::= 'var' identifier ('=' expression)?
// (',' identifier ('=' expression)?)* 'in' expression
static ExprAST *ParseVarExpr() {
getNextToken(); // eat the var.
std::vector<std::pair<std::string, ExprAST *> > VarNames;
// At least one variable name is required.
if (CurTok != tok_identifier)
return Error("expected identifier after var");
while (1) {
std::string Name = IdentifierStr;
getNextToken(); // eat identifier.
// Read the optional initializer.
ExprAST *Init = 0;
if (CurTok == '=') {
getNextToken(); // eat the '='.
Init = ParseExpression();
if (Init == 0)
return 0;
}
VarNames.push_back(std::make_pair(Name, Init));
// End of var list, exit loop.
if (CurTok != ',')
break;
getNextToken(); // eat the ','.
if (CurTok != tok_identifier)
return Error("expected identifier list after var");
}
// At this point, we have to have 'in'.
if (CurTok != tok_in)
return Error("expected 'in' keyword after 'var'");
getNextToken(); // eat 'in'.
ExprAST *Body = ParseExpression();
if (Body == 0)
return 0;
return new VarExprAST(VarNames, Body);
}
/// primary
/// ::= identifierexpr
/// ::= numberexpr
/// ::= parenexpr
/// ::= ifexpr
/// ::= forexpr
/// ::= varexpr
static ExprAST *ParsePrimary() {
switch (CurTok) {
default:
return Error("unknown token when expecting an expression");
case tok_identifier:
return ParseIdentifierExpr();
case tok_number:
return ParseNumberExpr();
case '(':
return ParseParenExpr();
case tok_if:
return ParseIfExpr();
case tok_for:
return ParseForExpr();
case tok_var:
return ParseVarExpr();
}
}
/// unary
/// ::= primary
/// ::= '!' unary
static ExprAST *ParseUnary() {
// If the current token is not an operator, it must be a primary expr.
if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
return ParsePrimary();
// If this is a unary operator, read it.
int Opc = CurTok;
getNextToken();
if (ExprAST *Operand = ParseUnary())
return new UnaryExprAST(Opc, Operand);
return 0;
}
/// binoprhs
/// ::= ('+' unary)*
static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
// If this is a binop, find its precedence.
while (1) {
int TokPrec = GetTokPrecedence();
// If this is a binop that binds at least as tightly as the current binop,
// consume it, otherwise we are done.
if (TokPrec < ExprPrec)
return LHS;
// Okay, we know this is a binop.
int BinOp = CurTok;
SourceLocation BinLoc = CurLoc;
getNextToken(); // eat binop
// Parse the unary expression after the binary operator.
ExprAST *RHS = ParseUnary();
if (!RHS)
return 0;
// If BinOp binds less tightly with RHS than the operator after RHS, let
// the pending operator take RHS as its LHS.
int NextPrec = GetTokPrecedence();
if (TokPrec < NextPrec) {
RHS = ParseBinOpRHS(TokPrec + 1, RHS);
if (RHS == 0)
return 0;
}
// Merge LHS/RHS.
LHS = new BinaryExprAST(BinLoc, BinOp, LHS, RHS);
}
}
/// expression
/// ::= unary binoprhs
///
static ExprAST *ParseExpression() {
ExprAST *LHS = ParseUnary();
if (!LHS)
return 0;
return ParseBinOpRHS(0, LHS);
}
/// prototype
/// ::= id '(' id* ')'
/// ::= binary LETTER number? (id, id)
/// ::= unary LETTER (id)
static PrototypeAST *ParsePrototype() {
std::string FnName;
SourceLocation FnLoc = CurLoc;
unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
unsigned BinaryPrecedence = 30;
switch (CurTok) {
default:
return ErrorP("Expected function name in prototype");
case tok_identifier:
FnName = IdentifierStr;
Kind = 0;
getNextToken();
break;
case tok_unary:
getNextToken();
if (!isascii(CurTok))
return ErrorP("Expected unary operator");
FnName = "unary";
FnName += (char)CurTok;
Kind = 1;
getNextToken();
break;
case tok_binary:
getNextToken();
if (!isascii(CurTok))
return ErrorP("Expected binary operator");
FnName = "binary";
FnName += (char)CurTok;
Kind = 2;
getNextToken();
// Read the precedence if present.
if (CurTok == tok_number) {
if (NumVal < 1 || NumVal > 100)
return ErrorP("Invalid precedecnce: must be 1..100");
BinaryPrecedence = (unsigned)NumVal;
getNextToken();
}
break;
}
if (CurTok != '(')
return ErrorP("Expected '(' in prototype");
std::vector<std::string> ArgNames;
while (getNextToken() == tok_identifier)
ArgNames.push_back(IdentifierStr);
if (CurTok != ')')
return ErrorP("Expected ')' in prototype");
// success.
getNextToken(); // eat ')'.
// Verify right number of names for operator.
if (Kind && ArgNames.size() != Kind)
return ErrorP("Invalid number of operands for operator");
return new PrototypeAST(FnLoc, FnName, ArgNames, Kind != 0, BinaryPrecedence);
}
/// definition ::= 'def' prototype expression
static FunctionAST *ParseDefinition() {
getNextToken(); // eat def.
PrototypeAST *Proto = ParsePrototype();
if (Proto == 0)
return 0;
if (ExprAST *E = ParseExpression())
return new FunctionAST(Proto, E);
return 0;
}
/// toplevelexpr ::= expression
static FunctionAST *ParseTopLevelExpr() {
SourceLocation FnLoc = CurLoc;
if (ExprAST *E = ParseExpression()) {
// Make an anonymous proto.
PrototypeAST *Proto =
new PrototypeAST(FnLoc, "main", std::vector<std::string>());
return new FunctionAST(Proto, E);
}
return 0;
}
/// external ::= 'extern' prototype
static PrototypeAST *ParseExtern() {
getNextToken(); // eat extern.
return ParsePrototype();
}
//===----------------------------------------------------------------------===//
// Debug Info Support
//===----------------------------------------------------------------------===//
static DIBuilder *DBuilder;
DIType *DebugInfo::getDoubleTy() {
if (DblTy)
return DblTy;
DblTy = DBuilder->createBasicType("double", 64, 64, dwarf::DW_ATE_float);
return DblTy;
}
void DebugInfo::emitLocation(ExprAST *AST) {
if (!AST)
return Builder.SetCurrentDebugLocation(DebugLoc());
DIScope *Scope;
if (LexicalBlocks.empty())
Scope = TheCU;
else
Scope = LexicalBlocks.back();
Builder.SetCurrentDebugLocation(
DebugLoc::get(AST->getLine(), AST->getCol(), Scope));
}
static DISubroutineType *CreateFunctionType(unsigned NumArgs, DIFile *Unit) {
SmallVector<Metadata *, 8> EltTys;
DIType *DblTy = KSDbgInfo.getDoubleTy();
// Add the result type.
EltTys.push_back(DblTy);
for (unsigned i = 0, e = NumArgs; i != e; ++i)
EltTys.push_back(DblTy);
return DBuilder->createSubroutineType(Unit,
DBuilder->getOrCreateTypeArray(EltTys));
}
//===----------------------------------------------------------------------===//
// Code Generation
//===----------------------------------------------------------------------===//
static Module *TheModule;
static std::map<std::string, AllocaInst *> NamedValues;
static legacy::FunctionPassManager *TheFPM;
Value *ErrorV(const char *Str) {
Error(Str);
return 0;
}
/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
/// the function. This is used for mutable variables etc.
static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
const std::string &VarName) {
IRBuilder<> TmpB(&TheFunction->getEntryBlock(),
TheFunction->getEntryBlock().begin());
return TmpB.CreateAlloca(Type::getDoubleTy(getGlobalContext()), 0,
VarName.c_str());
}
Value *NumberExprAST::Codegen() {
KSDbgInfo.emitLocation(this);
return ConstantFP::get(getGlobalContext(), APFloat(Val));
}
Value *VariableExprAST::Codegen() {
// Look this variable up in the function.
Value *V = NamedValues[Name];
if (V == 0)
return ErrorV("Unknown variable name");
KSDbgInfo.emitLocation(this);
// Load the value.
return Builder.CreateLoad(V, Name.c_str());
}
Value *UnaryExprAST::Codegen() {
Value *OperandV = Operand->Codegen();
if (OperandV == 0)
return 0;
Function *F = TheModule->getFunction(std::string("unary") + Opcode);
if (F == 0)
return ErrorV("Unknown unary operator");
KSDbgInfo.emitLocation(this);
return Builder.CreateCall(F, OperandV, "unop");
}
Value *BinaryExprAST::Codegen() {
KSDbgInfo.emitLocation(this);
// Special case '=' because we don't want to emit the LHS as an expression.
if (Op == '=') {
// Assignment requires the LHS to be an identifier.
// This assumes we're building without RTTI because LLVM builds that way by
// default. If you build LLVM with RTTI this can be changed to a
// dynamic_cast for automatic error checking.
VariableExprAST *LHSE = static_cast<VariableExprAST *>(LHS);
if (!LHSE)
return ErrorV("destination of '=' must be a variable");
// Codegen the RHS.
Value *Val = RHS->Codegen();
if (Val == 0)
return 0;
// Look up the name.
Value *Variable = NamedValues[LHSE->getName()];
if (Variable == 0)
return ErrorV("Unknown variable name");
Builder.CreateStore(Val, Variable);
return Val;
}
Value *L = LHS->Codegen();
Value *R = RHS->Codegen();
if (L == 0 || R == 0)
return 0;
switch (Op) {
case '+':
return Builder.CreateFAdd(L, R, "addtmp");
case '-':
return Builder.CreateFSub(L, R, "subtmp");
case '*':
return Builder.CreateFMul(L, R, "multmp");
case '<':
L = Builder.CreateFCmpULT(L, R, "cmptmp");
// Convert bool 0/1 to double 0.0 or 1.0
return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
"booltmp");
default:
break;
}
// If it wasn't a builtin binary operator, it must be a user defined one. Emit
// a call to it.
Function *F = TheModule->getFunction(std::string("binary") + Op);
assert(F && "binary operator not found!");
Value *Ops[] = { L, R };
return Builder.CreateCall(F, Ops, "binop");
}
Value *CallExprAST::Codegen() {
KSDbgInfo.emitLocation(this);
// Look up the name in the global module table.
Function *CalleeF = TheModule->getFunction(Callee);
if (CalleeF == 0)
return ErrorV("Unknown function referenced");
// If argument mismatch error.
if (CalleeF->arg_size() != Args.size())
return ErrorV("Incorrect # arguments passed");
std::vector<Value *> ArgsV;
for (unsigned i = 0, e = Args.size(); i != e; ++i) {
ArgsV.push_back(Args[i]->Codegen());
if (ArgsV.back() == 0)
return 0;
}
return Builder.CreateCall(CalleeF, ArgsV, "calltmp");
}
Value *IfExprAST::Codegen() {
KSDbgInfo.emitLocation(this);
Value *CondV = Cond->Codegen();
if (CondV == 0)
return 0;
// Convert condition to a bool by comparing equal to 0.0.
CondV = Builder.CreateFCmpONE(
CondV, ConstantFP::get(getGlobalContext(), APFloat(0.0)), "ifcond");
Function *TheFunction = Builder.GetInsertBlock()->getParent();
// Create blocks for the then and else cases. Insert the 'then' block at the
// end of the function.
BasicBlock *ThenBB =
BasicBlock::Create(getGlobalContext(), "then", TheFunction);
BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else");
BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont");
Builder.CreateCondBr(CondV, ThenBB, ElseBB);
// Emit then value.
Builder.SetInsertPoint(ThenBB);
Value *ThenV = Then->Codegen();
if (ThenV == 0)
return 0;
Builder.CreateBr(MergeBB);
// Codegen of 'Then' can change the current block, update ThenBB for the PHI.
ThenBB = Builder.GetInsertBlock();
// Emit else block.
TheFunction->getBasicBlockList().push_back(ElseBB);
Builder.SetInsertPoint(ElseBB);
Value *ElseV = Else->Codegen();
if (ElseV == 0)
return 0;
Builder.CreateBr(MergeBB);
// Codegen of 'Else' can change the current block, update ElseBB for the PHI.
ElseBB = Builder.GetInsertBlock();
// Emit merge block.
TheFunction->getBasicBlockList().push_back(MergeBB);
Builder.SetInsertPoint(MergeBB);
PHINode *PN =
Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, "iftmp");
PN->addIncoming(ThenV, ThenBB);
PN->addIncoming(ElseV, ElseBB);
return PN;
}
Value *ForExprAST::Codegen() {
// Output this as:
// var = alloca double
// ...
// start = startexpr
// store start -> var
// goto loop
// loop:
// ...
// bodyexpr
// ...
// loopend:
// step = stepexpr
// endcond = endexpr
//
// curvar = load var
// nextvar = curvar + step
// store nextvar -> var
// br endcond, loop, endloop
// outloop:
Function *TheFunction = Builder.GetInsertBlock()->getParent();
// Create an alloca for the variable in the entry block.
AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
KSDbgInfo.emitLocation(this);
// Emit the start code first, without 'variable' in scope.
Value *StartVal = Start->Codegen();
if (StartVal == 0)
return 0;
// Store the value into the alloca.
Builder.CreateStore(StartVal, Alloca);
// Make the new basic block for the loop header, inserting after current
// block.
BasicBlock *LoopBB =
BasicBlock::Create(getGlobalContext(), "loop", TheFunction);
// Insert an explicit fall through from the current block to the LoopBB.
Builder.CreateBr(LoopBB);
// Start insertion in LoopBB.
Builder.SetInsertPoint(LoopBB);
// Within the loop, the variable is defined equal to the PHI node. If it
// shadows an existing variable, we have to restore it, so save it now.
AllocaInst *OldVal = NamedValues[VarName];
NamedValues[VarName] = Alloca;
// Emit the body of the loop. This, like any other expr, can change the
// current BB. Note that we ignore the value computed by the body, but don't
// allow an error.
if (Body->Codegen() == 0)
return 0;
// Emit the step value.
Value *StepVal;
if (Step) {
StepVal = Step->Codegen();
if (StepVal == 0)
return 0;
} else {
// If not specified, use 1.0.
StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0));
}
// Compute the end condition.
Value *EndCond = End->Codegen();
if (EndCond == 0)
return EndCond;
// Reload, increment, and restore the alloca. This handles the case where
// the body of the loop mutates the variable.
Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str());
Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar");
Builder.CreateStore(NextVar, Alloca);
// Convert condition to a bool by comparing equal to 0.0.
EndCond = Builder.CreateFCmpONE(
EndCond, ConstantFP::get(getGlobalContext(), APFloat(0.0)), "loopcond");
// Create the "after loop" block and insert it.
BasicBlock *AfterBB =
BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction);
// Insert the conditional branch into the end of LoopEndBB.
Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
// Any new code will be inserted in AfterBB.
Builder.SetInsertPoint(AfterBB);
// Restore the unshadowed variable.
if (OldVal)
NamedValues[VarName] = OldVal;
else
NamedValues.erase(VarName);
// for expr always returns 0.0.
return Constant::getNullValue(Type::getDoubleTy(getGlobalContext()));
}
Value *VarExprAST::Codegen() {
std::vector<AllocaInst *> OldBindings;
Function *TheFunction = Builder.GetInsertBlock()->getParent();
// Register all variables and emit their initializer.
for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
const std::string &VarName = VarNames[i].first;
ExprAST *Init = VarNames[i].second;
// Emit the initializer before adding the variable to scope, this prevents
// the initializer from referencing the variable itself, and permits stuff
// like this:
// var a = 1 in
// var a = a in ... # refers to outer 'a'.
Value *InitVal;
if (Init) {
InitVal = Init->Codegen();
if (InitVal == 0)
return 0;
} else { // If not specified, use 0.0.
InitVal = ConstantFP::get(getGlobalContext(), APFloat(0.0));
}
AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
Builder.CreateStore(InitVal, Alloca);
// Remember the old variable binding so that we can restore the binding when
// we unrecurse.
OldBindings.push_back(NamedValues[VarName]);
// Remember this binding.
NamedValues[VarName] = Alloca;
}
KSDbgInfo.emitLocation(this);
// Codegen the body, now that all vars are in scope.
Value *BodyVal = Body->Codegen();
if (BodyVal == 0)
return 0;
// Pop all our variables from scope.
for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
NamedValues[VarNames[i].first] = OldBindings[i];
// Return the body computation.
return BodyVal;
}
Function *PrototypeAST::Codegen() {
// Make the function type: double(double,double) etc.
std::vector<Type *> Doubles(Args.size(),
Type::getDoubleTy(getGlobalContext()));
FunctionType *FT =
FunctionType::get(Type::getDoubleTy(getGlobalContext()), Doubles, false);
Function *F =
Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
// If F conflicted, there was already something named 'Name'. If it has a
// body, don't allow redefinition or reextern.
if (F->getName() != Name) {
// Delete the one we just made and get the existing one.
F->eraseFromParent();
F = TheModule->getFunction(Name);
// If F already has a body, reject this.
if (!F->empty()) {
ErrorF("redefinition of function");
return 0;
}
// If F took a different number of args, reject.
if (F->arg_size() != Args.size()) {
ErrorF("redefinition of function with different # args");
return 0;
}
}
// Set names for all arguments.
unsigned Idx = 0;
for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size();
++AI, ++Idx)
AI->setName(Args[Idx]);
// Create a subprogram DIE for this function.
DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU->getFilename(),
KSDbgInfo.TheCU->getDirectory());
DIScope *FContext = Unit;
unsigned LineNo = Line;
unsigned ScopeLine = Line;
DISubprogram *SP = DBuilder->createFunction(
FContext, Name, StringRef(), Unit, LineNo,
CreateFunctionType(Args.size(), Unit), false /* internal linkage */,
true /* definition */, ScopeLine, DINode::FlagPrototyped, false, F);
KSDbgInfo.FnScopeMap[this] = SP;
return F;
}
/// CreateArgumentAllocas - Create an alloca for each argument and register the
/// argument in the symbol table so that references to it will succeed.
void PrototypeAST::CreateArgumentAllocas(Function *F) {
Function::arg_iterator AI = F->arg_begin();
for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
// Create an alloca for this variable.
AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
// Create a debug descriptor for the variable.
DIScope *Scope = KSDbgInfo.LexicalBlocks.back();
DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU->getFilename(),
KSDbgInfo.TheCU->getDirectory());
DILocalVariable *D = DBuilder->createLocalVariable(
dwarf::DW_TAG_arg_variable, Scope, Args[Idx], Unit, Line,
KSDbgInfo.getDoubleTy(), Idx);
DBuilder->insertDeclare(Alloca, D, DBuilder->createExpression(),
DebugLoc::get(Line, 0, Scope),
Builder.GetInsertBlock());
// Store the initial value into the alloca.
Builder.CreateStore(AI, Alloca);
// Add arguments to variable symbol table.
NamedValues[Args[Idx]] = Alloca;
}
}
Function *FunctionAST::Codegen() {
NamedValues.clear();
Function *TheFunction = Proto->Codegen();
if (TheFunction == 0)
return 0;
// Push the current scope.
KSDbgInfo.LexicalBlocks.push_back(KSDbgInfo.FnScopeMap[Proto]);
// Unset the location for the prologue emission (leading instructions with no
// location in a function are considered part of the prologue and the debugger
// will run past them when breaking on a function)
KSDbgInfo.emitLocation(nullptr);
// If this is an operator, install it.
if (Proto->isBinaryOp())
BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence();
// Create a new basic block to start insertion into.
BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
Builder.SetInsertPoint(BB);
// Add all arguments to the symbol table and create their allocas.
Proto->CreateArgumentAllocas(TheFunction);
KSDbgInfo.emitLocation(Body);
if (Value *RetVal = Body->Codegen()) {
// Finish off the function.
Builder.CreateRet(RetVal);
// Pop off the lexical block for the function.
KSDbgInfo.LexicalBlocks.pop_back();
// Validate the generated code, checking for consistency.
verifyFunction(*TheFunction);
// Optimize the function.
TheFPM->run(*TheFunction);
return TheFunction;
}
// Error reading body, remove function.
TheFunction->eraseFromParent();
if (Proto->isBinaryOp())
BinopPrecedence.erase(Proto->getOperatorName());
// Pop off the lexical block for the function since we added it
// unconditionally.
KSDbgInfo.LexicalBlocks.pop_back();
return 0;
}
//===----------------------------------------------------------------------===//
// Top-Level parsing and JIT Driver
//===----------------------------------------------------------------------===//
static ExecutionEngine *TheExecutionEngine;
static void HandleDefinition() {
if (FunctionAST *F = ParseDefinition()) {
if (!F->Codegen()) {
fprintf(stderr, "Error reading function definition:");
}
} else {
// Skip token for error recovery.
getNextToken();
}
}
static void HandleExtern() {
if (PrototypeAST *P = ParseExtern()) {
if (!P->Codegen()) {
fprintf(stderr, "Error reading extern");
}
} else {
// Skip token for error recovery.
getNextToken();
}
}
static void HandleTopLevelExpression() {
// Evaluate a top-level expression into an anonymous function.
if (FunctionAST *F = ParseTopLevelExpr()) {
if (!F->Codegen()) {
fprintf(stderr, "Error generating code for top level expr");
}
} else {
// Skip token for error recovery.
getNextToken();
}
}
/// top ::= definition | external | expression | ';'
static void MainLoop() {
while (1) {
switch (CurTok) {
case tok_eof:
return;
case ';':
getNextToken();
break; // ignore top-level semicolons.
case tok_def:
HandleDefinition();
break;
case tok_extern:
HandleExtern();
break;
default:
HandleTopLevelExpression();
break;
}
}
}
//===----------------------------------------------------------------------===//
// "Library" functions that can be "extern'd" from user code.
//===----------------------------------------------------------------------===//
/// putchard - putchar that takes a double and returns 0.
extern "C" double putchard(double X) {
putchar((char)X);
return 0;
}
/// printd - printf that takes a double prints it as "%f\n", returning 0.
extern "C" double printd(double X) {
printf("%f\n", X);
return 0;
}
//===----------------------------------------------------------------------===//
// Main driver code.
//===----------------------------------------------------------------------===//
int main() {
InitializeNativeTarget();
InitializeNativeTargetAsmPrinter();
InitializeNativeTargetAsmParser();
LLVMContext &Context = getGlobalContext();
// Install standard binary operators.
// 1 is lowest precedence.
BinopPrecedence['='] = 2;
BinopPrecedence['<'] = 10;
BinopPrecedence['+'] = 20;
BinopPrecedence['-'] = 20;
BinopPrecedence['*'] = 40; // highest.
// Prime the first token.
getNextToken();
// Make the module, which holds all the code.
std::unique_ptr<Module> Owner = make_unique<Module>("my cool jit", Context);
TheModule = Owner.get();
// Add the current debug info version into the module.
TheModule->addModuleFlag(Module::Warning, "Debug Info Version",
DEBUG_METADATA_VERSION);
// Darwin only supports dwarf2.
if (Triple(sys::getProcessTriple()).isOSDarwin())
TheModule->addModuleFlag(llvm::Module::Warning, "Dwarf Version", 2);
// Construct the DIBuilder, we do this here because we need the module.
DBuilder = new DIBuilder(*TheModule);
// Create the compile unit for the module.
// Currently down as "fib.ks" as a filename since we're redirecting stdin
// but we'd like actual source locations.
KSDbgInfo.TheCU = DBuilder->createCompileUnit(
dwarf::DW_LANG_C, "fib.ks", ".", "Kaleidoscope Compiler", 0, "", 0);
// Create the JIT. This takes ownership of the module.
std::string ErrStr;
TheExecutionEngine =
EngineBuilder(std::move(Owner))
.setErrorStr(&ErrStr)
.setMCJITMemoryManager(llvm::make_unique<SectionMemoryManager>())
.create();
if (!TheExecutionEngine) {
fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str());
exit(1);
}
legacy::FunctionPassManager OurFPM(TheModule);
// Set up the optimizer pipeline. Start with registering info about how the
// target lays out data structures.
TheModule->setDataLayout(*TheExecutionEngine->getDataLayout());
#if 0
// Provide basic AliasAnalysis support for GVN.
OurFPM.add(createBasicAliasAnalysisPass());
// Promote allocas to registers.
OurFPM.add(createPromoteMemoryToRegisterPass());
// Do simple "peephole" optimizations and bit-twiddling optzns.
OurFPM.add(createInstructionCombiningPass());
// Reassociate expressions.
OurFPM.add(createReassociatePass());
// Eliminate Common SubExpressions.
OurFPM.add(createGVNPass());
// Simplify the control flow graph (deleting unreachable blocks, etc).
OurFPM.add(createCFGSimplificationPass());
#endif
OurFPM.doInitialization();
// Set the global so the code gen can use this.
TheFPM = &OurFPM;
// Run the main "interpreter loop" now.
MainLoop();
TheFPM = 0;
// Finalize the debug info.
DBuilder->finalize();
// Print out all of the generated code.
TheModule->dump();
return 0;
}
Kaleidoscope: Conclusion and other useful LLVM tidbits¶
Tutorial Conclusion¶
Welcome to the final chapter of the “Implementing a language with LLVM” tutorial. In the course of this tutorial, we have grown our little Kaleidoscope language from being a useless toy, to being a semi-interesting (but probably still useless) toy. :)
It is interesting to see how far we’ve come, and how little code it has taken. We built the entire lexer, parser, AST, code generator, an interactive run-loop (with a JIT!), and emitted debug information in standalone executables - all in under 1000 lines of (non-comment/non-blank) code.
Our little language supports a couple of interesting features: it supports user defined binary and unary operators, it uses JIT compilation for immediate evaluation, and it supports a few control flow constructs with SSA construction.
Part of the idea of this tutorial was to show you how easy and fun it can be to define, build, and play with languages. Building a compiler need not be a scary or mystical process! Now that you’ve seen some of the basics, I strongly encourage you to take the code and hack on it. For example, try adding:
- global variables - While global variables have questionable value in modern software engineering, they are often useful when putting together quick little hacks like the Kaleidoscope compiler itself. Fortunately, our current setup makes it very easy to add global variables: just have value lookup check to see if an unresolved variable is in the global variable symbol table before rejecting it (a small sketch of this fallback appears after this list). To create a new global variable, make an instance of the LLVM GlobalVariable class.
- typed variables - Kaleidoscope currently only supports variables of type double. This gives the language a very nice elegance, because only supporting one type means that you never have to specify types. Different languages have different ways of handling this. The easiest way is to require the user to specify types for every variable definition, and record the type of the variable in the symbol table along with its Value*.
- arrays, structs, vectors, etc - Once you add types, you can start extending the type system in all sorts of interesting ways. Simple arrays are very easy and are quite useful for many different applications. Adding them is mostly an exercise in learning how the LLVM getelementptr instruction works: it is so nifty/unconventional, it has its own FAQ! If you add support for recursive types (e.g. linked lists), make sure to read the section in the LLVM Programmer’s Manual that describes how to construct them.
- standard runtime - Our current language allows the user to access arbitrary external functions, and we use it for things like “printd” and “putchard”. As you extend the language to add higher-level constructs, often these constructs make the most sense if they are lowered to calls into a language-supplied runtime. For example, if you add hash tables to the language, it would probably make sense to add the routines to a runtime, instead of inlining them all the way.
- memory management - Currently we can only access the stack in Kaleidoscope. It would also be useful to be able to allocate heap memory, either with calls to the standard libc malloc/free interface or with a garbage collector. If you would like to use garbage collection, note that LLVM fully supports Accurate Garbage Collection including algorithms that move objects and need to scan/update the stack.
- exception handling support - LLVM supports generation of zero cost exceptions which interoperate with code compiled in other languages. You could also generate code by implicitly making every function return an error value and checking it. You could also make explicit use of setjmp/longjmp. There are many different ways to go here.
- object orientation, generics, database access, complex numbers, geometric programming, ... - Really, there is no end of crazy features that you can add to the language.
- unusual domains - We’ve been talking about applying LLVM to a domain that many people are interested in: building a compiler for a specific language. However, there are many other domains that can use compiler technology that are not typically considered. For example, LLVM has been used to implement OpenGL graphics acceleration, translate C++ code to ActionScript, and many other cute and clever things. Maybe you will be the first to JIT compile a regular expression interpreter into native code with LLVM?
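As promised in the global variables item above, here is a minimal sketch of the lookup fallback. It is written against the OCaml Llvm bindings used later in this document; named_values, the_module, builder, and the Error exception are assumed from that tutorial's codegen setup, so treat this as an illustration rather than drop-in code:
let codegen_variable name =
  (* Check the local symbol table first, then fall back to module globals. *)
  try Hashtbl.find named_values name
  with Not_found ->
    match Llvm.lookup_global name the_module with
    | Some g -> Llvm.build_load g name builder (* globals are slots; load the value *)
    | None -> raise (Error ("unknown variable name: " ^ name))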
Have fun - try doing something crazy and unusual. Building a language like everyone else always has is much less fun than trying something a little crazy or off the wall and seeing how it turns out. If you get stuck or want to talk about it, feel free to email the llvm-dev mailing list: it has lots of people who are interested in languages and are often willing to help out.
Before we end this tutorial, I want to talk about some “tips and tricks” for generating LLVM IR. These are some of the more subtle things that may not be obvious, but are very useful if you want to take advantage of LLVM’s capabilities.
Properties of the LLVM IR¶
There are a couple of common questions about code in LLVM IR form - let’s just get these out of the way right now, shall we?
Target Independence¶
Kaleidoscope is an example of a “portable language”: any program written in Kaleidoscope will work the same way on any target that it runs on. Many other languages have this property, e.g. Lisp, Java, Haskell, JavaScript, Python, etc. (note that while these languages are portable, not all their libraries are).
One nice aspect of LLVM is that it is often capable of preserving target independence in the IR: you can take the LLVM IR for a Kaleidoscope-compiled program and run it on any target that LLVM supports, even emitting C code and compiling that on targets that LLVM doesn’t support natively. You can trivially tell that the Kaleidoscope compiler generates target-independent code because it never queries for any target-specific information when generating code.
The fact that LLVM provides a compact, target-independent representation for code gets a lot of people excited. Unfortunately, these people are usually thinking about C or a language from the C family when they are asking questions about language portability. I say “unfortunately”, because there is really no way to make (fully general) C code portable, other than shipping the source code around (and of course, C source code is not actually portable in general either - ever port a really old application from 32 to 64 bits?).
The problem with C (again, in its full generality) is that it is heavily laden with target specific assumptions. As one simple example, the preprocessor often destructively removes target-independence from the code when it processes the input text:
#ifdef __i386__
int X = 1;
#else
int X = 42;
#endif
While it is possible to engineer more and more complex solutions to problems like this, it cannot be solved in full generality in a way that is better than shipping the actual source code.
That said, there are interesting subsets of C that can be made portable. If you are willing to fix primitive types to a fixed size (say int = 32 bits and long = 64 bits), don’t care about ABI compatibility with existing binaries, and are willing to give up some other minor features, you can have portable code. This can make sense for specialized domains such as an in-kernel language.
Safety Guarantees¶
Many of the languages above are also “safe” languages: it is impossible for a program written in Java to corrupt its address space and crash the process (assuming the JVM has no bugs). Safety is an interesting property that requires a combination of language design, runtime support, and often operating system support.
It is certainly possible to implement a safe language in LLVM, but LLVM IR does not itself guarantee safety. The LLVM IR allows unsafe pointer casts, use after free bugs, buffer over-runs, and a variety of other problems. Safety needs to be implemented as a layer on top of LLVM and, conveniently, several groups have investigated this. Ask on the llvm-dev mailing list if you are interested in more details.
Language-Specific Optimizations¶
One thing about LLVM that turns off many people is that it does not solve all the world’s problems in one system (sorry ‘world hunger’, someone else will have to solve you some other day). One specific complaint is that people perceive LLVM as being incapable of performing high-level language-specific optimization: LLVM “loses too much information”.
Unfortunately, this is really not the place to give you a full and unified version of “Chris Lattner’s theory of compiler design”. Instead, I’ll make a few observations:
First, you’re right that LLVM does lose information. For example, as of this writing, there is no way to distinguish in the LLVM IR whether an SSA-value came from a C “int” or a C “long” on an ILP32 machine (other than debug info). Both get compiled down to an ‘i32’ value and the information about what it came from is lost. The more general issue here is that the LLVM type system uses “structural equivalence” instead of “name equivalence”. Another place this surprises people is if you have two types in a high-level language that have the same structure (e.g. two different structs that have a single int field): these types will compile down into a single LLVM type and it will be impossible to tell what it came from.
Second, while LLVM does lose information, LLVM is not a fixed target: we continue to enhance and improve it in many different ways. In addition to adding new features (LLVM did not always support exceptions or debug info), we also extend the IR to capture important information for optimization (e.g. whether an argument is sign or zero extended, information about pointers aliasing, etc). Many of the enhancements are user-driven: people want LLVM to include some specific feature, so they go ahead and extend it.
Third, it is possible and easy to add language-specific optimizations, and you have a number of choices in how to do it. As one trivial example, it is easy to add language-specific optimization passes that “know” things about code compiled for a language. In the case of the C family, there is an optimization pass that “knows” about the standard C library functions. If you call “exit(0)” in main(), it knows that it is safe to optimize that into “return 0;” because C specifies what the ‘exit’ function does.
In addition to simple library knowledge, it is possible to embed a variety of other language-specific information into the LLVM IR. If you have a specific need and run into a wall, please bring the topic up on the llvm-dev list. At the very worst, you can always treat LLVM as if it were a “dumb code generator” and implement the high-level optimizations you desire in your front-end, on the language-specific AST.
Tips and Tricks¶
There are a variety of useful tips and tricks that you come to know after working on/with LLVM that aren’t obvious at first glance. Instead of letting everyone rediscover them, this section talks about some of these issues.
Implementing portable offsetof/sizeof¶
One interesting thing that comes up, if you are trying to keep the code generated by your compiler “target independent”, is that you often need to know the size of some LLVM type or the offset of some field in an LLVM structure. For example, you might need to pass the size of a type into a function that allocates memory.
Unfortunately, this can vary widely across targets: for example the width of a pointer is trivially target-specific. However, there is a clever way to use the getelementptr instruction that allows you to compute this in a portable way.
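For concreteness, here is a hedged sketch of the trick using the OCaml Llvm bindings that appear later in this document (the size_of function and its name are our own): the address of element 1 of a pointer to ty that starts at null is exactly sizeof(ty), and the target's data layout does the arithmetic for you.
let size_of context ty =
  let one = Llvm.const_int (Llvm.i32_type context) 1 in
  (* gep null, 1 computes the byte offset of element 1, i.e. the type's size. *)
  let gep = Llvm.const_gep (Llvm.const_null (Llvm.pointer_type ty)) [| one |] in
  Llvm.const_ptrtoint gep (Llvm.i64_type context)
The same idea with a struct field index instead of element 1 gives you a portable offsetof.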
Garbage Collected Stack Frames¶
Some languages want to explicitly manage their stack frames, often so that they are garbage collected or to allow easy implementation of closures. There are often better ways to implement these features than explicit stack frames, but LLVM does support them, if you want. It requires your front-end to convert the code into Continuation Passing Style and the use of tail calls (which LLVM also supports).
Kaleidoscope: Implementing a Language with LLVM in Objective Caml¶
Kaleidoscope: Tutorial Introduction and the Lexer¶
Tutorial Introduction¶
Welcome to the “Implementing a language with LLVM” tutorial. This tutorial runs through the implementation of a simple language, showing how fun and easy it can be. This tutorial will get you up and running, and will help you build a framework you can extend to other languages. The code in this tutorial can also be used as a playground to hack on other LLVM specific things.
The goal of this tutorial is to progressively unveil our language, describing how it is built up over time. This will let us cover a fairly broad range of language design and LLVM-specific usage issues, showing and explaining the code for it all along the way, without overwhelming you with tons of details up front.
It is useful to point out ahead of time that this tutorial is really about teaching compiler techniques and LLVM specifically, not about teaching modern and sane software engineering principles. In practice, this means that we’ll take a number of shortcuts to simplify the exposition. For example, the code leaks memory, uses global variables all over the place, doesn’t use nice design patterns like visitors, etc... but it is very simple. If you dig in and use the code as a basis for future projects, fixing these deficiencies shouldn’t be hard.
I’ve tried to put this tutorial together in a way that makes chapters easy to skip over if you are already familiar with or are uninterested in the various pieces. The structure of the tutorial is:
- Chapter #1: Introduction to the Kaleidoscope language, and the definition of its Lexer - This shows where we are going and the basic functionality that we want to build. In order to make this tutorial maximally understandable and hackable, we choose to implement everything in Objective Caml instead of using lexer and parser generators. LLVM obviously works just fine with such tools; feel free to use one if you prefer.
- Chapter #2: Implementing a Parser and AST - With the lexer in place, we can talk about parsing techniques and basic AST construction. This tutorial describes recursive descent parsing and operator precedence parsing. Nothing in Chapters 1 or 2 is LLVM-specific; the code doesn’t even link in LLVM at this point. :)
- Chapter #3: Code generation to LLVM IR - With the AST ready, we can show off how easy generation of LLVM IR really is.
- Chapter #4: Adding JIT and Optimizer Support - Because a lot of people are interested in using LLVM as a JIT, we’ll dive right into it and show you the 3 lines it takes to add JIT support. LLVM is also useful in many other ways, but this is one simple and “sexy” way to show off its power. :)
- Chapter #5: Extending the Language: Control Flow - With the language up and running, we show how to extend it with control flow operations (if/then/else and a ‘for’ loop). This gives us a chance to talk about simple SSA construction and control flow.
- Chapter #6: Extending the Language: User-defined Operators - This is a silly but fun chapter that talks about extending the language to let the user define their own arbitrary unary and binary operators (with assignable precedence!). This lets us build a significant piece of the “language” as library routines.
- Chapter #7: Extending the Language: Mutable Variables - This chapter talks about adding user-defined local variables along with an assignment operator. The interesting part about this is how easy and trivial it is to construct SSA form in LLVM: no, LLVM does not require your front-end to construct SSA form!
- Chapter #8: Conclusion and other useful LLVM tidbits - This chapter wraps up the series by talking about potential ways to extend the language, but also includes a bunch of pointers to info about “special topics” like adding garbage collection support, exceptions, debugging, support for “spaghetti stacks”, and a bunch of other tips and tricks.
By the end of the tutorial, we’ll have written a bit less than 700 non-comment, non-blank lines of code. With this small amount of code, we’ll have built up a very reasonable compiler for a non-trivial language, including a hand-written lexer, parser, and AST, as well as code generation support with a JIT compiler. While other systems may have interesting “hello world” tutorials, I think the breadth of this tutorial is a great testament to the strengths of LLVM and why you should consider it if you’re interested in language or compiler design.
A note about this tutorial: we expect you to extend the language and play with it on your own. Take the code and go crazy hacking away at it, compilers don’t need to be scary creatures - it can be a lot of fun to play with languages!
The Basic Language¶
This tutorial will be illustrated with a toy language that we’ll call “Kaleidoscope” (from Greek roots meaning “beautiful, form, and view”). Kaleidoscope is a procedural language that allows you to define functions, use conditionals, math, etc. Over the course of the tutorial, we’ll extend Kaleidoscope to support the if/then/else construct, a for loop, user defined operators, JIT compilation with a simple command line interface, etc.
Because we want to keep things simple, the only datatype in Kaleidoscope is a 64-bit floating point type (aka ‘float’ in O’Caml parlance). As such, all values are implicitly double precision and the language doesn’t require type declarations. This gives the language a very nice and simple syntax. For example, the following simple example computes Fibonacci numbers:
# Compute the x'th fibonacci number.
def fib(x)
if x < 3 then
1
else
fib(x-1)+fib(x-2)
# This expression will compute the 40th number.
fib(40)
We also allow Kaleidoscope to call into standard library functions (the LLVM JIT makes this completely trivial). This means that you can use the ‘extern’ keyword to declare a function before you use it (this is also useful for mutually recursive functions). For example:
extern sin(arg);
extern cos(arg);
extern atan2(arg1 arg2);
atan2(sin(.4), cos(42))
A more interesting example is included in Chapter 6 where we write a little Kaleidoscope application that displays a Mandelbrot Set at various levels of magnification.
Let’s dive into the implementation of this language!
The Lexer¶
When it comes to implementing a language, the first thing needed is the ability to process a text file and recognize what it says. The traditional way to do this is to use a “lexer” (aka ‘scanner’) to break the input up into “tokens”. Each token returned by the lexer includes a token code and potentially some metadata (e.g. the numeric value of a number). First, we define the possibilities:
(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of
* these others for known things. *)
type token =
(* commands *)
| Def | Extern
(* primary *)
| Ident of string | Number of float
(* unknown *)
| Kwd of char
Each token returned by our lexer will be one of the token variant values. An unknown character like ‘+’ will be returned as Token.Kwd '+'. If the current token is an identifier, the value will be Token.Ident s. If the current token is a numeric literal (like 1.0), the value will be Token.Number 1.0.
The actual implementation of the lexer is a collection of functions driven by a function named Lexer.lex. The Lexer.lex function is called to return the next token from standard input. We will use Camlp4 to simplify the tokenization of the standard input. Its definition starts as:
(*===----------------------------------------------------------------------===
* Lexer
*===----------------------------------------------------------------------===*)
let rec lex = parser
(* Skip any whitespace. *)
| [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream
Lexer.lex works by recursing over a char Stream.t to read characters one at a time from the standard input. It eats them as it recognizes them and stores them in a Token.token variant. The first thing that it has to do is ignore whitespace between tokens. This is accomplished with the recursive call above.
The next thing Lexer.lex needs to do is recognize identifiers and specific keywords like “def”. Kaleidoscope does this with a pattern match and a helper function.
(* identifier: [a-zA-Z][a-zA-Z0-9] *)
| [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] ->
let buffer = Buffer.create 1 in
Buffer.add_char buffer c;
lex_ident buffer stream
...
and lex_ident buffer = parser
| [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] ->
Buffer.add_char buffer c;
lex_ident buffer stream
| [< stream=lex >] ->
match Buffer.contents buffer with
| "def" -> [< 'Token.Def; stream >]
| "extern" -> [< 'Token.Extern; stream >]
| id -> [< 'Token.Ident id; stream >]
Numeric values are similar:
(* number: [0-9.]+ *)
| [< ' ('0' .. '9' as c); stream >] ->
let buffer = Buffer.create 1 in
Buffer.add_char buffer c;
lex_number buffer stream
...
and lex_number buffer = parser
| [< ' ('0' .. '9' | '.' as c); stream >] ->
Buffer.add_char buffer c;
lex_number buffer stream
| [< stream=lex >] ->
[< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >]
This is all pretty straightforward code for processing input. When reading a numeric value from the input, we use the OCaml float_of_string function to convert it to a numeric value that we store in Token.Number. Note that this isn’t doing sufficient error checking: it will raise Failure if given the string “1.23.45.67”. Feel free to extend it :).
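For example, here is a sketch of one possible extension (our own variation, not the tutorial's code) that turns the Failure from float_of_string into the parser's own Stream.Error:
and lex_number buffer = parser
  | [< ' ('0' .. '9' | '.' as c); stream >] ->
      Buffer.add_char buffer c;
      lex_number buffer stream
  | [< stream=lex >] ->
      let s = Buffer.contents buffer in
      (* Reject malformed literals like "1.23.45.67" with a parse error
       * instead of an uncaught Failure. *)
      let n =
        try float_of_string s
        with Failure _ -> raise (Stream.Error ("malformed number: " ^ s))
      in
      [< 'Token.Number n; stream >]
Next we handle comments: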
(* Comment until end of line. *)
| [< ' ('#'); stream >] ->
lex_comment stream
...
and lex_comment = parser
| [< ' ('\n'); stream=lex >] -> stream
| [< 'c; e=lex_comment >] -> e
| [< >] -> [< >]
We handle comments by skipping to the end of the line and then returning the next token. Finally, if the input doesn’t match one of the above cases, it is either an operator character like ‘+’ or the end of the file. These are handled with this code:
(* Otherwise, just return the character as its ascii value. *)
| [< 'c; stream >] ->
[< 'Token.Kwd c; lex stream >]
(* end of stream. *)
| [< >] -> [< >]
With this, we have the complete lexer for the basic Kaleidoscope language (the full code listing for the Lexer is available in the next chapter of the tutorial). Next we’ll build a simple parser that uses this to build an Abstract Syntax Tree. When we have that, we’ll include a driver so that you can use the lexer and parser together.
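As a quick sanity check, a small driver like this one (not part of the tutorial code) runs the lexer over a string and prints each token it produces:
let () =
  let tokens = Lexer.lex (Stream.of_string "def foo(x) x + 1.0") in
  Stream.iter
    (function
       | Token.Def -> print_endline "Def"
       | Token.Extern -> print_endline "Extern"
       | Token.Ident s -> print_endline ("Ident " ^ s)
       | Token.Number n -> print_endline ("Number " ^ string_of_float n)
       | Token.Kwd c -> print_endline ("Kwd " ^ String.make 1 c))
    tokens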
Kaleidoscope: Implementing a Parser and AST¶
Chapter 2 Introduction¶
Welcome to Chapter 2 of the “Implementing a language with LLVM in Objective Caml” tutorial. This chapter shows you how to use the lexer, built in Chapter 1, to build a full parser for our Kaleidoscope language. Once we have a parser, we’ll define and build an Abstract Syntax Tree (AST).
The parser we will build uses a combination of Recursive Descent Parsing and Operator-Precedence Parsing to parse the Kaleidoscope language (the latter for binary expressions and the former for everything else). Before we get to parsing though, let’s talk about the output of the parser: the Abstract Syntax Tree.
The Abstract Syntax Tree (AST)¶
The AST for a program captures its behavior in such a way that it is easy for later stages of the compiler (e.g. code generation) to interpret. We basically want one object for each construct in the language, and the AST should closely model the language. In Kaleidoscope, we have expressions, a prototype, and a function object. We’ll start with expressions first:
(* expr - Base type for all expression nodes. *)
type expr =
(* variant for numeric literals like "1.0". *)
| Number of float
The code above shows the definition of the expr type and the variant we use for numeric literals. The important thing to note about this code is that the Number variant captures the numeric value of the literal directly in the variant. This allows later phases of the compiler to know what the stored numeric value is.
Right now we only create the AST, so there are no useful functions on them. It would be very easy to add a function to pretty print the code, for example. Here are the other expression AST node definitions that we’ll use in the basic form of the Kaleidoscope language:
(* variant for referencing a variable, like "a". *)
| Variable of string
(* variant for a binary operator. *)
| Binary of char * expr * expr
(* variant for function calls. *)
| Call of string * expr array
This is all (intentionally) rather straight-forward: variables capture the variable name, binary operators capture their opcode (e.g. ‘+’), and calls capture a function name as well as a list of any argument expressions. One thing that is nice about our AST is that it captures the language features without talking about the syntax of the language. Note that there is no discussion about precedence of binary operators, lexical structure, etc.
For our basic language, these are all of the expression nodes we’ll define. Because it doesn’t have conditional control flow, it isn’t Turing-complete; we’ll fix that in a later installment. The two things we need next are a way to talk about the interface to a function, and a way to talk about functions themselves:
(* proto - This type represents the "prototype" for a function, which captures
* its name, and its argument names (thus implicitly the number of arguments the
* function takes). *)
type proto = Prototype of string * string array
(* func - This type represents a function definition itself. *)
type func = Function of proto * expr
In Kaleidoscope, functions are typed with just a count of their arguments. Since all values are double precision floating point, the type of each argument doesn’t need to be stored anywhere. In a more aggressive and realistic language, the “expr” variants would probably have a type field.
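For instance, here is a hypothetical sketch of what a typed AST could look like (the ty type and its constructors are our own invention, not part of Kaleidoscope):
(* A possible typed AST: every expression node carries its type. *)
type ty = Double | Bool
type expr =
  | Number of ty * float
  | Variable of ty * string
  | Binary of ty * char * expr * expr
  | Call of ty * string * expr array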
With this scaffolding, we can now talk about parsing expressions and function bodies in Kaleidoscope.
Parser Basics¶
Now that we have an AST to build, we need to define the parser code to build it. The idea here is that we want to parse something like “x+y” (which is returned as three tokens by the lexer) into an AST that could be generated with calls like this:
let x = Variable "x" in
let y = Variable "y" in
let result = Binary ('+', x, y) in
...
The error handling routines make use of the built-in Stream.Failure and Stream.Error exceptions. Stream.Failure is raised when the parser is unable to find any matching token in the first position of a pattern. Stream.Error is raised when the first token matches, but the rest do not. The error recovery in our parser will not be the best and is not particularly user-friendly, but it will be enough for our tutorial. These exceptions make it easier to handle errors in routines that have various return types.
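To make this concrete, here is a minimal sketch (our own, not from the tutorial) of how a caller can react to the two exception kinds:
let try_parse stream =
  try Some (Parser.parse_expr stream) with
  | Stream.Failure ->
      (* No rule matched the very first token. *)
      None
  | Stream.Error msg ->
      (* A rule matched partially, then hit an unexpected token. *)
      print_endline ("parse error: " ^ msg);
      None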
With these basic types and exceptions, we can implement the first piece of our grammar: numeric literals.
Basic Expression Parsing¶
We start with numeric literals, because they are the simplest to process. For each production in our grammar, we’ll define a function which parses that production. We call this class of expressions “primary” expressions, for reasons that will become more clear later in the tutorial. In order to parse an arbitrary primary expression, we need to determine what sort of expression it is. For numeric literals, we have:
(* primary
* ::= identifier
* ::= numberexpr
* ::= parenexpr *)
let rec parse_primary = parser
(* numberexpr ::= number *)
| [< 'Token.Number n >] -> Ast.Number n
This routine is very simple: it expects to be called when the current token is a Token.Number token. It takes the current number value, creates an Ast.Number node, advances the lexer to the next token, and finally returns.
There are some interesting aspects to this. The most important one is that this routine eats all of the tokens that correspond to the production and returns the lexer buffer with the next token (which is not part of the grammar production) ready to go. This is a fairly standard way to go for recursive descent parsers. For a better example, the parenthesis operator is defined like this:
(* parenexpr ::= '(' expression ')' *)
| [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e
This function illustrates a number of interesting things about the parser:
1) It shows how we use the Stream.Error exception. When called, this function expects that the current token is a ‘(’ token, but after parsing the subexpression, it is possible that there is no ‘)’ waiting. For example, if the user types in “(4 x” instead of “(4)”, the parser should emit an error. Because errors can occur, the parser needs a way to indicate that they happened. In our parser, we use the camlp4 shortcut syntax token ?? "parse error", where if the token before the ?? does not match, then Stream.Error "parse error" will be raised.
2) Another interesting aspect of this function is that it uses recursion by calling Parser.parse_expr (we will soon see that Parser.parse_expr can call Parser.parse_primary). This is powerful because it allows us to handle recursive grammars, and keeps each production very simple. Note that parentheses do not cause construction of AST nodes themselves. While we could do it this way, the most important role of parentheses is to guide the parser and provide grouping. Once the parser constructs the AST, parentheses are not needed.
The next simple production is for handling variable references and function calls:
(* identifierexpr
* ::= identifier
* ::= identifier '(' argumentexpr ')' *)
| [< 'Token.Ident id; stream >] ->
let rec parse_args accumulator = parser
| [< e=parse_expr; stream >] ->
begin parser
| [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e
| [< >] -> e :: accumulator
end stream
| [< >] -> accumulator
in
let rec parse_ident id = parser
(* Call. *)
| [< 'Token.Kwd '(';
args=parse_args [];
'Token.Kwd ')' ?? "expected ')'">] ->
Ast.Call (id, Array.of_list (List.rev args))
(* Simple variable ref. *)
| [< >] -> Ast.Variable id
in
parse_ident id stream
This routine follows the same style as the other routines. (It expects to be called if the current token is a Token.Ident token.) It also has recursion and error handling. One interesting aspect of this is that it uses look-ahead to determine if the current identifier is a standalone variable reference or if it is a function call expression. It handles this by checking to see if the token after the identifier is a ‘(’ token, constructing either an Ast.Variable or an Ast.Call node as appropriate.
We finish up by raising an exception if we received a token we didn’t expect:
| [< >] -> raise (Stream.Error "unknown token when expecting an expression.")
Now that basic expressions are handled, we need to handle binary expressions. They are a bit more complex.
Binary Expression Parsing¶
Binary expressions are significantly harder to parse because they are often ambiguous. For example, when given the string “x+y*z”, the parser can choose to parse it as either “(x+y)*z” or “x+(y*z)”. With common definitions from mathematics, we expect the latter parse, because “*” (multiplication) has higher precedence than “+” (addition).
There are many ways to handle this, but an elegant and efficient way is to use Operator-Precedence Parsing. This parsing technique uses the precedence of binary operators to guide recursion. To start with, we need a table of precedences:
(* binop_precedence - This holds the precedence for each binary operator that is
* defined *)
let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10
(* precedence - Get the precedence of the pending binary operator token. *)
let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1
...
let main () =
(* Install standard binary operators.
* 1 is the lowest precedence. *)
Hashtbl.add Parser.binop_precedence '<' 10;
Hashtbl.add Parser.binop_precedence '+' 20;
Hashtbl.add Parser.binop_precedence '-' 20;
Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *)
...
For the basic form of Kaleidoscope, we will only support 4 binary operators (this can obviously be extended by you, our brave and intrepid reader). The Parser.precedence function returns the precedence for the current token, or -1 if the token is not a binary operator. Having a Hashtbl.t makes it easy to add new operators and makes it clear that the algorithm doesn’t depend on the specific operators involved, but it would be easy enough to eliminate the Hashtbl.t and do the comparisons in the Parser.precedence function (or just use a fixed-size array, as sketched below).
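For illustration, here is a hypothetical sketch of that fixed-size array alternative, indexing by character code with -1 meaning “not a binary operator”:
let binop_precedence = Array.make 256 (-1)
let () =
  binop_precedence.(Char.code '<') <- 10;
  binop_precedence.(Char.code '+') <- 20;
  binop_precedence.(Char.code '-') <- 20;
  binop_precedence.(Char.code '*') <- 40
(* precedence - Same interface as before, without the hash table. *)
let precedence c = binop_precedence.(Char.code c)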
With the helper above defined, we can now start parsing binary expressions. The basic idea of operator precedence parsing is to break down an expression with potentially ambiguous binary operators into pieces. Consider, for example, the expression “a+b+(c+d)*e*f+g”. Operator precedence parsing considers this as a stream of primary expressions separated by binary operators. As such, it will first parse the leading primary expression “a”, then it will see the pairs [+, b] [+, (c+d)] [*, e] [*, f] and [+, g]. Note that because parentheses are primary expressions, the binary expression parser doesn’t need to worry about nested subexpressions like (c+d) at all.
To start, an expression is a primary expression potentially followed by a sequence of [binop,primaryexpr] pairs:
(* expression
* ::= primary binoprhs *)
and parse_expr = parser
| [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream
Parser.parse_bin_rhs is the function that parses the sequence of pairs for us. It takes a precedence and the expression for the part that has been parsed so far. Note that “x” is a perfectly valid expression: as such, “binoprhs” is allowed to be empty, in which case it returns the expression that is passed into it. In our example above, the code passes the expression for “a” into Parser.parse_bin_rhs and the current token is “+”.
The precedence value passed into Parser.parse_bin_rhs indicates the minimal operator precedence that the function is allowed to eat. For example, if the current pair stream is [+, x] and Parser.parse_bin_rhs is passed a precedence of 40, it will not consume any tokens (because the precedence of ‘+’ is only 20). With this in mind, Parser.parse_bin_rhs starts with:
(* binoprhs
* ::= ('+' primary)* *)
and parse_bin_rhs expr_prec lhs stream =
match Stream.peek stream with
(* If this is a binop, find its precedence. *)
| Some (Token.Kwd c) when Hashtbl.mem binop_precedence c ->
let token_prec = precedence c in
(* If this is a binop that binds at least as tightly as the current binop,
* consume it, otherwise we are done. *)
if token_prec < expr_prec then lhs else begin
This code gets the precedence of the current token and checks to see if it is too low. Because we defined invalid tokens to have a precedence of -1, this check implicitly knows that the pair-stream ends when the token stream runs out of binary operators. If this check succeeds, we know that the token is a binary operator and that it will be included in this expression:
(* Eat the binop. *)
Stream.junk stream;
(* Parse the primary expression after the binary operator *)
let rhs = parse_primary stream in
(* Okay, we know this is a binop. *)
let rhs =
match Stream.peek stream with
| Some (Token.Kwd c2) ->
As such, this code eats (and remembers) the binary operator and then parses the primary expression that follows. This builds up the whole pair, the first of which is [+, b] for the running example.
Now that we parsed the left-hand side of an expression and one pair of the RHS sequence, we have to decide which way the expression associates. In particular, we could have “(a+b) binop unparsed” or “a + (b binop unparsed)”. To determine this, we look ahead at “binop” to determine its precedence and compare it to BinOp’s precedence (which is ‘+’ in this case):
(* If BinOp binds less tightly with rhs than the operator after
* rhs, let the pending operator take rhs as its lhs. *)
let next_prec = precedence c2 in
if token_prec < next_prec
If the precedence of the binop to the right of “RHS” is lower or equal to the precedence of our current operator, then we know that the parentheses associate as “(a+b) binop ...”. In our example, the current operator is “+” and the next operator is “+”, so we know that they have the same precedence. In this case we’ll create the AST node for “a+b”, and then continue parsing:
... if body omitted ...
in
(* Merge lhs/rhs. *)
let lhs = Ast.Binary (c, lhs, rhs) in
parse_bin_rhs expr_prec lhs stream
end
In our example above, this will turn “a+b+” into “(a+b)” and execute the next iteration of the loop, with “+” as the current token. The code above will eat, remember, and parse “(c+d)” as the primary expression, which makes the current pair equal to [+, (c+d)]. It will then evaluate the ‘if’ conditional above with “*” as the binop to the right of the primary. In this case, the precedence of “*” is higher than the precedence of “+” so the if condition will be entered.
The critical question left here is “how can the if condition parse the right hand side in full”? In particular, to build the AST correctly for our example, it needs to get all of “(c+d)*e*f” as the RHS expression variable. The code to do this is surprisingly simple (code from the above two blocks duplicated for context):
match Stream.peek stream with
| Some (Token.Kwd c2) ->
(* If BinOp binds less tightly with rhs than the operator after
* rhs, let the pending operator take rhs as its lhs. *)
if token_prec < precedence c2
then parse_bin_rhs (token_prec + 1) rhs stream
else rhs
| _ -> rhs
in
(* Merge lhs/rhs. *)
let lhs = Ast.Binary (c, lhs, rhs) in
parse_bin_rhs expr_prec lhs stream
end
At this point, we know that the binary operator to the RHS of our primary has higher precedence than the binop we are currently parsing. As such, we know that any sequence of pairs whose operators are all higher precedence than “+” should be parsed together and returned as “RHS”. To do this, we recursively invoke the Parser.parse_bin_rhs function specifying “token_prec+1” as the minimum precedence required for it to continue. In our example above, this will cause it to return the AST node for “(c+d)*e*f” as RHS, which is then set as the RHS of the ‘+’ expression.
Finally, on the next iteration of the loop, the “+g” piece is parsed and added to the AST. With this little bit of code (14 non-trivial lines), we correctly handle fully general binary expression parsing in a very elegant way. This was a whirlwind tour of this code, and it is somewhat subtle. I recommend running through it with a few tough examples to see how it works.
This wraps up handling of expressions. At this point, we can point the parser at an arbitrary token stream and build an expression from it, stopping at the first token that is not part of the expression. Next up we need to handle function definitions, etc.
Parsing the Rest¶
The next thing missing is handling of function prototypes. In Kaleidoscope, these are used both for ‘extern’ function declarations as well as function body definitions. The code to do this is straight-forward and not very interesting (once you’ve survived expressions):
(* prototype
* ::= id '(' id* ')' *)
let parse_prototype =
let rec parse_args accumulator = parser
| [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e
| [< >] -> accumulator
in
parser
| [< 'Token.Ident id;
'Token.Kwd '(' ?? "expected '(' in prototype";
args=parse_args [];
'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
(* success. *)
Ast.Prototype (id, Array.of_list (List.rev args))
| [< >] ->
raise (Stream.Error "expected function name in prototype")
Given this, a function definition is very simple, just a prototype plus an expression to implement the body:
(* definition ::= 'def' prototype expression *)
let parse_definition = parser
| [< 'Token.Def; p=parse_prototype; e=parse_expr >] ->
Ast.Function (p, e)
In addition, we support ‘extern’ to declare functions like ‘sin’ and ‘cos’ as well as to support forward declaration of user functions. These ‘extern’s are just prototypes with no body:
(* external ::= 'extern' prototype *)
let parse_extern = parser
| [< 'Token.Extern; e=parse_prototype >] -> e
Finally, we’ll also let the user type in arbitrary top-level expressions and evaluate them on the fly. We will handle this by defining anonymous nullary (zero argument) functions for them:
(* toplevelexpr ::= expression *)
let parse_toplevel = parser
| [< e=parse_expr >] ->
(* Make an anonymous proto. *)
Ast.Function (Ast.Prototype ("", [||]), e)
Now that we have all the pieces, let’s build a little driver that will let us actually execute this code we’ve built!
The Driver¶
The driver for this simply invokes all of the parsing pieces with a top-level dispatch loop. There isn’t much interesting here, so I’ll just include the top-level loop. See below for full code in the “Top-Level Parsing” section.
(* top ::= definition | external | expression | ';' *)
let rec main_loop stream =
match Stream.peek stream with
| None -> ()
(* ignore top-level semicolons. *)
| Some (Token.Kwd ';') ->
Stream.junk stream;
main_loop stream
| Some token ->
begin
try match token with
| Token.Def ->
ignore(Parser.parse_definition stream);
print_endline "parsed a function definition.";
| Token.Extern ->
ignore(Parser.parse_extern stream);
print_endline "parsed an extern.";
| _ ->
(* Evaluate a top-level expression into an anonymous function. *)
ignore(Parser.parse_toplevel stream);
print_endline "parsed a top-level expr";
with Stream.Error s ->
(* Skip token for error recovery. *)
Stream.junk stream;
print_endline s;
end;
print_string "ready> "; flush stdout;
main_loop stream
The most interesting part of this is that we ignore top-level semicolons. Why is this, you ask? The basic reason is that if you type “4 + 5” at the command line, the parser doesn’t know whether that is the end of what you will type or not. For example, on the next line you could type “def foo...” in which case 4+5 is the end of a top-level expression. Alternatively you could type “* 6”, which would continue the expression. Having top-level semicolons allows you to type “4+5;”, and the parser will know you are done.
Conclusions¶
With just under 300 lines of commented code (240 lines of non-comment, non-blank code), we fully defined our minimal language, including a lexer, parser, and AST builder. With this done, the executable will validate Kaleidoscope code and tell us if it is grammatically invalid. For example, here is a sample interaction:
$ ./toy.byte
ready> def foo(x y) x+foo(y, 4.0);
Parsed a function definition.
ready> def foo(x y) x+y y;
Parsed a function definition.
Parsed a top-level expr
ready> def foo(x y) x+y );
Parsed a function definition.
Error: unknown token when expecting an expression
ready> extern sin(a);
ready> Parsed an extern
ready> ^D
$
There is a lot of room for extension here. You can define new AST nodes, extend the language in many ways, etc. In the next installment, we will describe how to generate LLVM Intermediate Representation (IR) from the AST.
Full Code Listing¶
Here is the complete code listing for this and the previous chapter. Note that it is fully self-contained: you don’t need LLVM or any external libraries at all for this. (Besides the ocaml standard libraries, of course.) To build this, just compile with:
# Compile
ocamlbuild toy.byte
# Run
./toy.byte
Here is the code:
- _tags:
<{lexer,parser}.ml>: use_camlp4, pp(camlp4of)
- token.ml:
(*===----------------------------------------------------------------------===
 * Lexer Tokens
 *===----------------------------------------------------------------------===*)

(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of
 * these others for known things. *)
type token =
  (* commands *)
  | Def | Extern

  (* primary *)
  | Ident of string | Number of float

  (* unknown *)
  | Kwd of char
- lexer.ml:
(*===----------------------------------------------------------------------===
 * Lexer
 *===----------------------------------------------------------------------===*)

let rec lex = parser
  (* Skip any whitespace. *)
  | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream

  (* identifier: [a-zA-Z][a-zA-Z0-9] *)
  | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] ->
      let buffer = Buffer.create 1 in
      Buffer.add_char buffer c;
      lex_ident buffer stream

  (* number: [0-9.]+ *)
  | [< ' ('0' .. '9' as c); stream >] ->
      let buffer = Buffer.create 1 in
      Buffer.add_char buffer c;
      lex_number buffer stream

  (* Comment until end of line. *)
  | [< ' ('#'); stream >] ->
      lex_comment stream

  (* Otherwise, just return the character as its ascii value. *)
  | [< 'c; stream >] ->
      [< 'Token.Kwd c; lex stream >]

  (* end of stream. *)
  | [< >] -> [< >]

and lex_number buffer = parser
  | [< ' ('0' .. '9' | '.' as c); stream >] ->
      Buffer.add_char buffer c;
      lex_number buffer stream
  | [< stream=lex >] ->
      [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >]

and lex_ident buffer = parser
  | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] ->
      Buffer.add_char buffer c;
      lex_ident buffer stream
  | [< stream=lex >] ->
      match Buffer.contents buffer with
      | "def" -> [< 'Token.Def; stream >]
      | "extern" -> [< 'Token.Extern; stream >]
      | id -> [< 'Token.Ident id; stream >]

and lex_comment = parser
  | [< ' ('\n'); stream=lex >] -> stream
  | [< 'c; e=lex_comment >] -> e
  | [< >] -> [< >]
- ast.ml:
(*===----------------------------------------------------------------------===
 * Abstract Syntax Tree (aka Parse Tree)
 *===----------------------------------------------------------------------===*)

(* expr - Base type for all expression nodes. *)
type expr =
  (* variant for numeric literals like "1.0". *)
  | Number of float

  (* variant for referencing a variable, like "a". *)
  | Variable of string

  (* variant for a binary operator. *)
  | Binary of char * expr * expr

  (* variant for function calls. *)
  | Call of string * expr array

(* proto - This type represents the "prototype" for a function, which captures
 * its name, and its argument names (thus implicitly the number of arguments the
 * function takes). *)
type proto = Prototype of string * string array

(* func - This type represents a function definition itself. *)
type func = Function of proto * expr
- parser.ml:
(*===---------------------------------------------------------------------===
 * Parser
 *===---------------------------------------------------------------------===*)

(* binop_precedence - This holds the precedence for each binary operator that is
 * defined *)
let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10

(* precedence - Get the precedence of the pending binary operator token. *)
let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1

(* primary
 *   ::= identifier
 *   ::= numberexpr
 *   ::= parenexpr *)
let rec parse_primary = parser
  (* numberexpr ::= number *)
  | [< 'Token.Number n >] -> Ast.Number n

  (* parenexpr ::= '(' expression ')' *)
  | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e

  (* identifierexpr
   *   ::= identifier
   *   ::= identifier '(' argumentexpr ')' *)
  | [< 'Token.Ident id; stream >] ->
      let rec parse_args accumulator = parser
        | [< e=parse_expr; stream >] ->
            begin parser
              | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e
              | [< >] -> e :: accumulator
            end stream
        | [< >] -> accumulator
      in
      let rec parse_ident id = parser
        (* Call. *)
        | [< 'Token.Kwd '(';
             args=parse_args [];
             'Token.Kwd ')' ?? "expected ')'">] ->
            Ast.Call (id, Array.of_list (List.rev args))

        (* Simple variable ref. *)
        | [< >] -> Ast.Variable id
      in
      parse_ident id stream

  | [< >] -> raise (Stream.Error "unknown token when expecting an expression.")

(* binoprhs
 *   ::= ('+' primary)* *)
and parse_bin_rhs expr_prec lhs stream =
  match Stream.peek stream with
  (* If this is a binop, find its precedence. *)
  | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c ->
      let token_prec = precedence c in

      (* If this is a binop that binds at least as tightly as the current binop,
       * consume it, otherwise we are done. *)
      if token_prec < expr_prec then lhs else begin
        (* Eat the binop. *)
        Stream.junk stream;

        (* Parse the primary expression after the binary operator. *)
        let rhs = parse_primary stream in

        (* Okay, we know this is a binop. *)
        let rhs =
          match Stream.peek stream with
          | Some (Token.Kwd c2) ->
              (* If BinOp binds less tightly with rhs than the operator after
               * rhs, let the pending operator take rhs as its lhs. *)
              let next_prec = precedence c2 in
              if token_prec < next_prec
              then parse_bin_rhs (token_prec + 1) rhs stream
              else rhs
          | _ -> rhs
        in

        (* Merge lhs/rhs. *)
        let lhs = Ast.Binary (c, lhs, rhs) in
        parse_bin_rhs expr_prec lhs stream
      end
  | _ -> lhs

(* expression
 *   ::= primary binoprhs *)
and parse_expr = parser
  | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream

(* prototype
 *   ::= id '(' id* ')' *)
let parse_prototype =
  let rec parse_args accumulator = parser
    | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e
    | [< >] -> accumulator
  in

  parser
  | [< 'Token.Ident id;
       'Token.Kwd '(' ?? "expected '(' in prototype";
       args=parse_args [];
       'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
      (* success. *)
      Ast.Prototype (id, Array.of_list (List.rev args))
  | [< >] ->
      raise (Stream.Error "expected function name in prototype")

(* definition ::= 'def' prototype expression *)
let parse_definition = parser
  | [< 'Token.Def; p=parse_prototype; e=parse_expr >] ->
      Ast.Function (p, e)

(* toplevelexpr ::= expression *)
let parse_toplevel = parser
  | [< e=parse_expr >] ->
      (* Make an anonymous proto. *)
      Ast.Function (Ast.Prototype ("", [||]), e)

(* external ::= 'extern' prototype *)
let parse_extern = parser
  | [< 'Token.Extern; e=parse_prototype >] -> e
- toplevel.ml:
(*===----------------------------------------------------------------------===
 * Top-Level parsing and JIT Driver
 *===----------------------------------------------------------------------===*)

(* top ::= definition | external | expression | ';' *)
let rec main_loop stream =
  match Stream.peek stream with
  | None -> ()

  (* ignore top-level semicolons. *)
  | Some (Token.Kwd ';') ->
      Stream.junk stream;
      main_loop stream

  | Some token ->
      begin
        try match token with
        | Token.Def ->
            ignore(Parser.parse_definition stream);
            print_endline "parsed a function definition.";
        | Token.Extern ->
            ignore(Parser.parse_extern stream);
            print_endline "parsed an extern.";
        | _ ->
            (* Evaluate a top-level expression into an anonymous function. *)
            ignore(Parser.parse_toplevel stream);
            print_endline "parsed a top-level expr";
        with Stream.Error s ->
          (* Skip token for error recovery. *)
          Stream.junk stream;
          print_endline s;
      end;
      print_string "ready> "; flush stdout;
      main_loop stream
- toy.ml:
(*===----------------------------------------------------------------------===
 * Main driver code.
 *===----------------------------------------------------------------------===*)

let main () =
  (* Install standard binary operators.
   * 1 is the lowest precedence. *)
  Hashtbl.add Parser.binop_precedence '<' 10;
  Hashtbl.add Parser.binop_precedence '+' 20;
  Hashtbl.add Parser.binop_precedence '-' 20;
  Hashtbl.add Parser.binop_precedence '*' 40;    (* highest. *)

  (* Prime the first token. *)
  print_string "ready> "; flush stdout;
  let stream = Lexer.lex (Stream.of_channel stdin) in

  (* Run the main "interpreter loop" now. *)
  Toplevel.main_loop stream;
;;

main ()
Kaleidoscope: Code generation to LLVM IR¶
Chapter 3 Introduction¶
Welcome to Chapter 3 of the “Implementing a language with LLVM” tutorial. This chapter shows you how to transform the Abstract Syntax Tree, built in Chapter 2, into LLVM IR. This will teach you a little bit about how LLVM does things, as well as demonstrate how easy it is to use. It’s much more work to build a lexer and parser than it is to generate LLVM IR code. :)
Please note: the code in this chapter and later requires LLVM 2.3 or LLVM SVN to work. LLVM 2.2 and before will not work with it.
Code Generation Setup¶
In order to generate LLVM IR, we want some simple setup to get started. First we define a code generation (codegen) function that pattern matches over each kind of AST node:
let rec codegen_expr = function
| Ast.Number n -> ...
| Ast.Variable name -> ...
The Codegen.codegen_expr
function says to emit IR for that AST node
along with all the things it depends on, and they all return an LLVM
Value object. “Value” is the class used to represent a “Static Single
Assignment
(SSA)
register” or “SSA value” in LLVM. The most distinct aspect of SSA values
is that their value is computed as the related instruction executes, and
it does not get a new value until (and if) the instruction re-executes.
In other words, there is no way to “change” an SSA value. For more
information, please read up on Static Single
Assignment
- the concepts are really quite natural once you grok them.
The second thing we want is an “Error” exception like we used for the parser, which will be used to report errors found during code generation (for example, use of an undeclared parameter):
exception Error of string
let context = global_context ()
let the_module = create_module context "my cool jit"
let builder = builder context
let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10
let double_type = double_type context
The static variables will be used during code generation.
Codegen.the_module
is the LLVM construct that contains all of the
functions and global variables in a chunk of code. In many ways, it is
the top-level structure that the LLVM IR uses to contain code.
The Codegen.builder
object is a helper object that makes it easy to
generate LLVM instructions. Instances of the
IRBuilder
class keep track of the current place to insert instructions and have
methods to create new instructions.
The Codegen.named_values
map keeps track of which values are defined
in the current scope and what their LLVM representation is. (In other
words, it is a symbol table for the code). In this form of Kaleidoscope,
the only things that can be referenced are function parameters. As such,
function parameters will be in this map when generating code for their
function body.
With these basics in place, we can start talking about how to generate
code for each expression. Note that this assumes that the
Codegen.builder
has been set up to generate code into something.
For now, we’ll assume that this has already been done, and we’ll just
use it to emit code.
Expression Code Generation¶
Generating LLVM code for expression nodes is very straightforward: less than 30 lines of commented code for all four of our expression nodes. First we’ll do numeric literals:
| Ast.Number n -> const_float double_type n
In the LLVM IR, numeric constants are represented with the
ConstantFP
class, which holds the numeric value in an APFloat
internally (APFloat
has the capability of holding floating point
constants of Arbitrary Precision). This code basically just creates
and returns a ConstantFP
. Note that in the LLVM IR, constants
are all uniqued together and shared. For this reason, the API uses “the
foo::get(..)” idiom instead of “new foo(..)” or “foo::Create(..)”.
| Ast.Variable name ->
(try Hashtbl.find named_values name with
| Not_found -> raise (Error "unknown variable name"))
References to variables are also quite simple using LLVM. In the simple
version of Kaleidoscope, we assume that the variable has already been
emitted somewhere and its value is available. In practice, the only
values that can be in the Codegen.named_values
map are function
arguments. This code simply checks to see that the specified name is in
the map (if not, an unknown variable is being referenced) and returns
the value for it. In future chapters, we’ll add support for loop
induction variables in the symbol table, and for
local variables.
| Ast.Binary (op, lhs, rhs) ->
let lhs_val = codegen_expr lhs in
let rhs_val = codegen_expr rhs in
begin
match op with
| '+' -> build_fadd lhs_val rhs_val "addtmp" builder
| '-' -> build_fsub lhs_val rhs_val "subtmp" builder
| '*' -> build_fmul lhs_val rhs_val "multmp" builder
| '<' ->
(* Convert bool 0/1 to double 0.0 or 1.0 *)
let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in
build_uitofp i double_type "booltmp" builder
| _ -> raise (Error "invalid binary operator")
end
Binary operators start to get more interesting. The basic idea here is that we recursively emit code for the left-hand side of the expression, then the right-hand side, then we compute the result of the binary expression. In this code, we do a simple switch on the opcode to create the right LLVM instruction.
In the example above, the LLVM builder class is starting to show its
value. IRBuilder knows where to insert the newly created instruction;
all you have to do is specify what instruction to create (e.g. with
Llvm.build_fadd
), which operands to use (lhs
and rhs
here)
and optionally provide a name for the generated instruction.
One nice thing about LLVM is that the name is just a hint. For instance, if the code above emits multiple “addtmp” variables, LLVM will automatically provide each one with an increasing, unique numeric suffix. Local value names for instructions are purely optional, but they make it much easier to read the IR dumps.
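For instance, if our codegen emitted two additions with the same name hint (a minimal sketch reusing the lhs_val and rhs_val names from the snippet above), the resulting IR would simply contain %addtmp and %addtmp1:

(* Both calls ask for the name "addtmp"; LLVM renames the second result
 * automatically, so no name collision can occur. *)
let tmp1 = build_fadd lhs_val rhs_val "addtmp" builder in
let tmp2 = build_fadd tmp1 rhs_val "addtmp" builder in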
LLVM instructions are constrained by strict rules: for example, the left and right operands of an add instruction must have the same type, and the result type of the add must match the operand types. Because all values in Kaleidoscope are doubles, this makes for very simple code for add, sub and mul.
On the other hand, LLVM specifies that the fcmp instruction always returns an ‘i1’ value (a one bit integer). The problem with this is that Kaleidoscope wants the value to be a 0.0 or 1.0 value. In order to get these semantics, we combine the fcmp instruction with a uitofp instruction. This instruction converts its input integer into a floating point value by treating the input as an unsigned value. In contrast, if we used the sitofp instruction, the Kaleidoscope ‘<’ operator would return 0.0 and -1.0, depending on the input value.
| Ast.Call (callee, args) ->
(* Look up the name in the module table. *)
let callee =
match lookup_function callee the_module with
| Some callee -> callee
| None -> raise (Error "unknown function referenced")
in
let params = params callee in
(* If argument mismatch error. *)
if Array.length params == Array.length args then () else
raise (Error "incorrect # arguments passed");
let args = Array.map codegen_expr args in
build_call callee args "calltmp" builder
Code generation for function calls is quite straightforward with LLVM. The code above initially does a function name lookup in the LLVM Module’s symbol table. Recall that the LLVM Module is the container that holds all of the functions we are JIT’ing. By giving each function the same name as what the user specifies, we can use the LLVM symbol table to resolve function names for us.
Once we have the function to call, we recursively codegen each argument that is to be passed in, and create an LLVM call instruction. Note that LLVM uses the native C calling conventions by default, allowing these calls to also call into standard library functions like “sin” and “cos”, with no additional effort.
This wraps up our handling of the four basic expressions that we have so far in Kaleidoscope. Feel free to go in and add some more. For example, by browsing the LLVM language reference you’ll find several other interesting instructions that are really easy to plug into our basic framework.
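As a concrete sketch of such an extension (not part of the tutorial code), a floating point division operator only needs a precedence entry in the driver and one more arm in the binary operator match, using the build_fdiv builder call:

(* In toy.ml, register '/' with the same precedence as '*'. *)
Hashtbl.add Parser.binop_precedence '/' 40;

(* In Codegen.codegen_expr, inside the match on the operator: *)
| '/' -> build_fdiv lhs_val rhs_val "divtmp" builder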
Function Code Generation¶
Code generation for prototypes and functions must handle a number of details, which make their code less beautiful than expression code generation, but allow us to illustrate some important points. First, let's talk about code generation for prototypes: they are used both for function bodies and external function declarations. The code starts with:
let codegen_proto = function
| Ast.Prototype (name, args) ->
(* Make the function type: double(double,double) etc. *)
let doubles = Array.make (Array.length args) double_type in
let ft = function_type double_type doubles in
let f =
match lookup_function name the_module with
This code packs a lot of power into a few lines. Note first that this
function returns a “Function*” instead of a “Value*” (although at the
moment they both are modeled by llvalue
in OCaml). Because a
“prototype” really talks about the external interface for a function
(not the value computed by an expression), it makes sense for it to
return the LLVM Function it corresponds to when codegen’d.
The call to Llvm.function_type
creates the Llvm.lltype
that
should be used for a given Prototype. Since all function arguments in
Kaleidoscope are of type double, the first line creates an array of “N”
LLVM double types. It then uses the Llvm.function_type
method to
create a function type that takes “N” doubles as arguments, returns one
double as a result, and that is not vararg (that uses the function
Llvm.var_arg_function_type
). Note that Types in LLVM are uniqued
just like Constant
‘s are, so you don’t “new” a type, you “get” it.
The final line above checks if the function has already been defined in
Codegen.the_module
. If not, we will create it.
| None -> declare_function name ft the_module
This indicates the type and name to use, as well as which module to
insert into. By default we assume a function has
Llvm.Linkage.ExternalLinkage
. “external
linkage” means that the function may be defined
outside the current module and/or that it is callable by functions
outside the module. The “name
” passed in is the name the user
specified: this name is registered in “Codegen.the_module
”’s symbol
table, which is used by the function call code above.
In Kaleidoscope, I choose to allow redefinitions of functions in two cases: first, we want to allow ‘extern’ing a function more than once, as long as the prototypes for the externs match (since all arguments have the same type, we just have to check that the number of arguments match). Second, we want to allow ‘extern’ing a function and then defining a body for it. This is useful when defining mutually recursive functions.
(* If 'f' conflicted, there was already something named 'name'. If it
* has a body, don't allow redefinition or reextern. *)
| Some f ->
(* If 'f' already has a body, reject this. *)
if Array.length (basic_blocks f) == 0 then () else
raise (Error "redefinition of function");
(* If 'f' took a different number of arguments, reject. *)
if Array.length (params f) == Array.length args then () else
raise (Error "redefinition of function with different # args");
f
in
In order to verify the logic above, we first check to see if the pre-existing function is “empty”. In this case, empty means that it has no basic blocks in it, which means it has no body. If it has no body, it is a forward declaration. Since we don’t allow anything after a full definition of the function, the code rejects this case. If the previous reference to a function was an ‘extern’, we simply verify that the number of arguments for that definition and this one match up. If not, we emit an error.
(* Set names for all arguments. *)
Array.iteri (fun i a ->
let n = args.(i) in
set_value_name n a;
Hashtbl.add named_values n a;
) (params f);
f
The last bit of code for prototypes loops over all of the arguments in
the function, setting the name of the LLVM Argument objects to match,
and registering the arguments in the Codegen.named_values
map for
future use by the Ast.Variable
variant. Once this is set up, it
returns the Function object to the caller. Note that we don’t check for
conflicting argument names here (e.g. “extern foo(a b a)”). Doing so
would be very straight-forward with the mechanics we have already used
above.
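For instance, here is a hedged sketch of such a check: since Codegen.named_values already records every argument we have seen for this prototype, a duplicate can be rejected before it is added.

(* Set names for all arguments, rejecting duplicates like "foo(a b a)". *)
Array.iteri (fun i a ->
  let n = args.(i) in
  if Hashtbl.mem named_values n then
    raise (Error "duplicate argument name in prototype");
  set_value_name n a;
  Hashtbl.add named_values n a;
) (params f);
f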
let codegen_func = function
| Ast.Function (proto, body) ->
Hashtbl.clear named_values;
let the_function = codegen_proto proto in
Code generation for function definitions starts out simply enough: we
first clear out the Codegen.named_values
map to make sure that there isn't
anything in it from the last function we compiled, then codegen the
prototype (proto) and verify that it is ok. Code generation of
the prototype ensures that there is an LLVM Function object that is
ready to go for us.
(* Create a new basic block to start insertion into. *)
let bb = append_block context "entry" the_function in
position_at_end bb builder;
try
let ret_val = codegen_expr body in
Now we get to the point where the Codegen.builder
is set up. The
first line creates a new basic
block (named “entry”),
which is inserted into the_function
. The second line then tells the
builder that new instructions should be inserted into the end of the new
basic block. Basic blocks in LLVM are an important part of functions
that define the Control Flow
Graph. Since we
don’t have any control flow, our functions will only contain one block
at this point. We’ll fix this in Chapter 5 :).
let ret_val = codegen_expr body in
(* Finish off the function. *)
let _ = build_ret ret_val builder in
(* Validate the generated code, checking for consistency. *)
Llvm_analysis.assert_valid_function the_function;
the_function
Once the insertion point is set up, we call the Codegen.codegen_expr
function for the root expression of the function. If no error happens,
this emits code to compute the expression into the entry block and
returns the value that was computed. Assuming no error, we then create
an LLVM ret instruction, which completes the
function. Once the function is built, we call
Llvm_analysis.assert_valid_function
, which is provided by LLVM. This
function does a variety of consistency checks on the generated code, to
determine if our compiler is doing everything right. Using this is
important: it can catch a lot of bugs. Once the function is finished and
validated, we return it.
with e ->
delete_function the_function;
raise e
The only piece left here is handling of the error case. For simplicity,
we handle this by merely deleting the function we produced with the
Llvm.delete_function
method. This allows the user to redefine a
function that they incorrectly typed in before: if we didn’t delete it,
it would live in the symbol table, with a body, preventing future
redefinition.
This code does have a bug, though. Since the Codegen.codegen_proto
can return a previously defined forward declaration, our code can
actually delete a forward declaration. There are a number of ways to fix
this bug, see what you can come up with! Here is a testcase:
extern foo(a b); # ok, defines foo.
def foo(a b) c; # error, 'c' is invalid.
def bar() foo(1, 2); # error, unknown function "foo"
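One possible fix, sketched below (this is just one approach, not the tutorial's official answer): remember whether the prototype already existed before calling codegen_proto, and on error roll back only what we actually created.

let codegen_func = function
  | Ast.Function (proto, body) ->
      Hashtbl.clear named_values;
      (* Remember whether 'name' was already declared so that the error
       * path below does not delete a pre-existing forward declaration. *)
      let Ast.Prototype (name, _) = proto in
      let existed = lookup_function name the_module <> None in
      let the_function = codegen_proto proto in
      let bb = append_block context "entry" the_function in
      position_at_end bb builder;
      try
        let ret_val = codegen_expr body in
        let _ = build_ret ret_val builder in
        Llvm_analysis.assert_valid_function the_function;
        the_function
      with e ->
        (* Drop the half-built body if the function pre-existed;
         * otherwise delete the whole function as before. *)
        if existed then delete_block bb else delete_function the_function;
        raise e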
Driver Changes and Closing Thoughts¶
For now, code generation to LLVM doesn’t really get us much, except that
we can look at the pretty IR output. The sample code inserts calls to
Codegen into the “Toplevel.main_loop
”, and then dumps out the LLVM
IR. This gives a nice way to look at the LLVM IR for simple functions.
For example:
ready> 4+5;
Read top-level expression:
define double @""() {
entry:
%addtmp = fadd double 4.000000e+00, 5.000000e+00
ret double %addtmp
}
Note how the parser turns the top-level expression into an anonymous function for us. This will be handy when we add JIT support in the next chapter. Also note that the code is very literally transcribed; no optimizations are being performed. We will add optimizations explicitly in the next chapter.
ready> def foo(a b) a*a + 2*a*b + b*b;
Read function definition:
define double @foo(double %a, double %b) {
entry:
%multmp = fmul double %a, %a
%multmp1 = fmul double 2.000000e+00, %a
%multmp2 = fmul double %multmp1, %b
%addtmp = fadd double %multmp, %multmp2
%multmp3 = fmul double %b, %b
%addtmp4 = fadd double %addtmp, %multmp3
ret double %addtmp4
}
This shows some simple arithmetic. Notice the striking similarity to the LLVM builder calls that we use to create the instructions.
ready> def bar(a) foo(a, 4.0) + bar(31337);
Read function definition:
define double @bar(double %a) {
entry:
%calltmp = call double @foo(double %a, double 4.000000e+00)
%calltmp1 = call double @bar(double 3.133700e+04)
%addtmp = fadd double %calltmp, %calltmp1
ret double %addtmp
}
This shows some function calls. Note that this function will take a long time to execute if you call it. In the future we’ll add conditional control flow to actually make recursion useful :).
ready> extern cos(x);
Read extern:
declare double @cos(double)
ready> cos(1.234);
Read top-level expression:
define double @""() {
entry:
%calltmp = call double @cos(double 1.234000e+00)
ret double %calltmp
}
This shows an extern for the libm “cos” function, and a call to it.
ready> ^D
; ModuleID = 'my cool jit'
define double @""() {
entry:
%addtmp = fadd double 4.000000e+00, 5.000000e+00
ret double %addtmp
}
define double @foo(double %a, double %b) {
entry:
%multmp = fmul double %a, %a
%multmp1 = fmul double 2.000000e+00, %a
%multmp2 = fmul double %multmp1, %b
%addtmp = fadd double %multmp, %multmp2
%multmp3 = fmul double %b, %b
%addtmp4 = fadd double %addtmp, %multmp3
ret double %addtmp4
}
define double @bar(double %a) {
entry:
%calltmp = call double @foo(double %a, double 4.000000e+00)
%calltmp1 = call double @bar(double 3.133700e+04)
%addtmp = fadd double %calltmp, %calltmp1
ret double %addtmp
}
declare double @cos(double)
define double @""() {
entry:
%calltmp = call double @cos(double 1.234000e+00)
ret double %calltmp
}
When you quit the current demo, it dumps out the IR for the entire module generated. Here you can see the big picture with all the functions referencing each other.
This wraps up the third chapter of the Kaleidoscope tutorial. Up next, we’ll describe how to add JIT codegen and optimizer support to this so we can actually start running code!
Full Code Listing¶
Here is the complete code listing for our running example, enhanced with the LLVM code generator. Because this uses the LLVM libraries, we need to link them in; the _tags and myocamlbuild.ml files below tell ocamlbuild which LLVM libraries to link and how. To build this example, use:
# Compile
ocamlbuild toy.byte
# Run
./toy.byte
Here is the code:
- _tags:
<{lexer,parser}.ml>: use_camlp4, pp(camlp4of)
<*.{byte,native}>: g++, use_llvm, use_llvm_analysis
- myocamlbuild.ml:
open Ocamlbuild_plugin;;

ocaml_lib ~extern:true "llvm";;
ocaml_lib ~extern:true "llvm_analysis";;

flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"]);;
- token.ml:
(*===----------------------------------------------------------------------===
 * Lexer Tokens
 *===----------------------------------------------------------------------===*)

(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of
 * these others for known things. *)
type token =
  (* commands *)
  | Def | Extern

  (* primary *)
  | Ident of string | Number of float

  (* unknown *)
  | Kwd of char
- lexer.ml:
(*===----------------------------------------------------------------------===
 * Lexer
 *===----------------------------------------------------------------------===*)

let rec lex = parser
  (* Skip any whitespace. *)
  | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream

  (* identifier: [a-zA-Z][a-zA-Z0-9] *)
  | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] ->
      let buffer = Buffer.create 1 in
      Buffer.add_char buffer c;
      lex_ident buffer stream

  (* number: [0-9.]+ *)
  | [< ' ('0' .. '9' as c); stream >] ->
      let buffer = Buffer.create 1 in
      Buffer.add_char buffer c;
      lex_number buffer stream

  (* Comment until end of line. *)
  | [< ' ('#'); stream >] ->
      lex_comment stream

  (* Otherwise, just return the character as its ascii value. *)
  | [< 'c; stream >] ->
      [< 'Token.Kwd c; lex stream >]

  (* end of stream. *)
  | [< >] -> [< >]

and lex_number buffer = parser
  | [< ' ('0' .. '9' | '.' as c); stream >] ->
      Buffer.add_char buffer c;
      lex_number buffer stream
  | [< stream=lex >] ->
      [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >]

and lex_ident buffer = parser
  | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] ->
      Buffer.add_char buffer c;
      lex_ident buffer stream
  | [< stream=lex >] ->
      match Buffer.contents buffer with
      | "def" -> [< 'Token.Def; stream >]
      | "extern" -> [< 'Token.Extern; stream >]
      | id -> [< 'Token.Ident id; stream >]

and lex_comment = parser
  | [< ' ('\n'); stream=lex >] -> stream
  | [< 'c; e=lex_comment >] -> e
  | [< >] -> [< >]
- ast.ml:
(*===----------------------------------------------------------------------===
 * Abstract Syntax Tree (aka Parse Tree)
 *===----------------------------------------------------------------------===*)

(* expr - Base type for all expression nodes. *)
type expr =
  (* variant for numeric literals like "1.0". *)
  | Number of float

  (* variant for referencing a variable, like "a". *)
  | Variable of string

  (* variant for a binary operator. *)
  | Binary of char * expr * expr

  (* variant for function calls. *)
  | Call of string * expr array

(* proto - This type represents the "prototype" for a function, which captures
 * its name, and its argument names (thus implicitly the number of arguments the
 * function takes). *)
type proto = Prototype of string * string array

(* func - This type represents a function definition itself. *)
type func = Function of proto * expr
- parser.ml:
(*===---------------------------------------------------------------------===
 * Parser
 *===---------------------------------------------------------------------===*)

(* binop_precedence - This holds the precedence for each binary operator that is
 * defined *)
let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10

(* precedence - Get the precedence of the pending binary operator token. *)
let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1

(* primary
 *   ::= identifier
 *   ::= numberexpr
 *   ::= parenexpr *)
let rec parse_primary = parser
  (* numberexpr ::= number *)
  | [< 'Token.Number n >] -> Ast.Number n

  (* parenexpr ::= '(' expression ')' *)
  | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e

  (* identifierexpr
   *   ::= identifier
   *   ::= identifier '(' argumentexpr ')' *)
  | [< 'Token.Ident id; stream >] ->
      let rec parse_args accumulator = parser
        | [< e=parse_expr; stream >] ->
            begin parser
              | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e
              | [< >] -> e :: accumulator
            end stream
        | [< >] -> accumulator
      in
      let rec parse_ident id = parser
        (* Call. *)
        | [< 'Token.Kwd '(';
             args=parse_args [];
             'Token.Kwd ')' ?? "expected ')'">] ->
            Ast.Call (id, Array.of_list (List.rev args))

        (* Simple variable ref. *)
        | [< >] -> Ast.Variable id
      in
      parse_ident id stream

  | [< >] -> raise (Stream.Error "unknown token when expecting an expression.")

(* binoprhs
 *   ::= ('+' primary)* *)
and parse_bin_rhs expr_prec lhs stream =
  match Stream.peek stream with
  (* If this is a binop, find its precedence. *)
  | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c ->
      let token_prec = precedence c in

      (* If this is a binop that binds at least as tightly as the current binop,
       * consume it, otherwise we are done. *)
      if token_prec < expr_prec then lhs else begin
        (* Eat the binop. *)
        Stream.junk stream;

        (* Parse the primary expression after the binary operator. *)
        let rhs = parse_primary stream in

        (* Okay, we know this is a binop. *)
        let rhs =
          match Stream.peek stream with
          | Some (Token.Kwd c2) ->
              (* If BinOp binds less tightly with rhs than the operator after
               * rhs, let the pending operator take rhs as its lhs. *)
              let next_prec = precedence c2 in
              if token_prec < next_prec
              then parse_bin_rhs (token_prec + 1) rhs stream
              else rhs
          | _ -> rhs
        in

        (* Merge lhs/rhs. *)
        let lhs = Ast.Binary (c, lhs, rhs) in
        parse_bin_rhs expr_prec lhs stream
      end
  | _ -> lhs

(* expression
 *   ::= primary binoprhs *)
and parse_expr = parser
  | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream

(* prototype
 *   ::= id '(' id* ')' *)
let parse_prototype =
  let rec parse_args accumulator = parser
    | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e
    | [< >] -> accumulator
  in

  parser
  | [< 'Token.Ident id;
       'Token.Kwd '(' ?? "expected '(' in prototype";
       args=parse_args [];
       'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
      (* success. *)
      Ast.Prototype (id, Array.of_list (List.rev args))

  | [< >] ->
      raise (Stream.Error "expected function name in prototype")

(* definition ::= 'def' prototype expression *)
let parse_definition = parser
  | [< 'Token.Def; p=parse_prototype; e=parse_expr >] ->
      Ast.Function (p, e)

(* toplevelexpr ::= expression *)
let parse_toplevel = parser
  | [< e=parse_expr >] ->
      (* Make an anonymous proto. *)
      Ast.Function (Ast.Prototype ("", [||]), e)

(* external ::= 'extern' prototype *)
let parse_extern = parser
  | [< 'Token.Extern; e=parse_prototype >] -> e
- codegen.ml:
(*===----------------------------------------------------------------------===
 * Code Generation
 *===----------------------------------------------------------------------===*)

open Llvm

exception Error of string

let context = global_context ()
let the_module = create_module context "my cool jit"
let builder = builder context
let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10
let double_type = double_type context

let rec codegen_expr = function
  | Ast.Number n -> const_float double_type n
  | Ast.Variable name ->
      (try Hashtbl.find named_values name with
        | Not_found -> raise (Error "unknown variable name"))
  | Ast.Binary (op, lhs, rhs) ->
      let lhs_val = codegen_expr lhs in
      let rhs_val = codegen_expr rhs in
      begin
        match op with
        | '+' -> build_fadd lhs_val rhs_val "addtmp" builder
        | '-' -> build_fsub lhs_val rhs_val "subtmp" builder
        | '*' -> build_fmul lhs_val rhs_val "multmp" builder
        | '<' ->
            (* Convert bool 0/1 to double 0.0 or 1.0 *)
            let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in
            build_uitofp i double_type "booltmp" builder
        | _ -> raise (Error "invalid binary operator")
      end
  | Ast.Call (callee, args) ->
      (* Look up the name in the module table. *)
      let callee =
        match lookup_function callee the_module with
        | Some callee -> callee
        | None -> raise (Error "unknown function referenced")
      in
      let params = params callee in

      (* If argument mismatch error. *)
      if Array.length params == Array.length args then () else
        raise (Error "incorrect # arguments passed");
      let args = Array.map codegen_expr args in
      build_call callee args "calltmp" builder

let codegen_proto = function
  | Ast.Prototype (name, args) ->
      (* Make the function type: double(double,double) etc. *)
      let doubles = Array.make (Array.length args) double_type in
      let ft = function_type double_type doubles in
      let f =
        match lookup_function name the_module with
        | None -> declare_function name ft the_module

        (* If 'f' conflicted, there was already something named 'name'. If it
         * has a body, don't allow redefinition or reextern. *)
        | Some f ->
            (* If 'f' already has a body, reject this. *)
            if block_begin f <> At_end f then
              raise (Error "redefinition of function");

            (* If 'f' took a different number of arguments, reject. *)
            if element_type (type_of f) <> ft then
              raise (Error "redefinition of function with different # args");
            f
      in

      (* Set names for all arguments. *)
      Array.iteri (fun i a ->
        let n = args.(i) in
        set_value_name n a;
        Hashtbl.add named_values n a;
      ) (params f);
      f

let codegen_func = function
  | Ast.Function (proto, body) ->
      Hashtbl.clear named_values;
      let the_function = codegen_proto proto in

      (* Create a new basic block to start insertion into. *)
      let bb = append_block context "entry" the_function in
      position_at_end bb builder;

      try
        let ret_val = codegen_expr body in

        (* Finish off the function. *)
        let _ = build_ret ret_val builder in

        (* Validate the generated code, checking for consistency. *)
        Llvm_analysis.assert_valid_function the_function;

        the_function
      with e ->
        delete_function the_function;
        raise e
- toplevel.ml:
(*===----------------------------------------------------------------------===
 * Top-Level parsing and JIT Driver
 *===----------------------------------------------------------------------===*)

open Llvm

(* top ::= definition | external | expression | ';' *)
let rec main_loop stream =
  match Stream.peek stream with
  | None -> ()

  (* ignore top-level semicolons. *)
  | Some (Token.Kwd ';') ->
      Stream.junk stream;
      main_loop stream

  | Some token ->
      begin
        try match token with
        | Token.Def ->
            let e = Parser.parse_definition stream in
            print_endline "parsed a function definition.";
            dump_value (Codegen.codegen_func e);
        | Token.Extern ->
            let e = Parser.parse_extern stream in
            print_endline "parsed an extern.";
            dump_value (Codegen.codegen_proto e);
        | _ ->
            (* Evaluate a top-level expression into an anonymous function. *)
            let e = Parser.parse_toplevel stream in
            print_endline "parsed a top-level expr";
            dump_value (Codegen.codegen_func e);
        with Stream.Error s | Codegen.Error s ->
          (* Skip token for error recovery. *)
          Stream.junk stream;
          print_endline s;
      end;
      print_string "ready> "; flush stdout;
      main_loop stream
- toy.ml:
(*===----------------------------------------------------------------------===
 * Main driver code.
 *===----------------------------------------------------------------------===*)

open Llvm

let main () =
  (* Install standard binary operators.
   * 1 is the lowest precedence. *)
  Hashtbl.add Parser.binop_precedence '<' 10;
  Hashtbl.add Parser.binop_precedence '+' 20;
  Hashtbl.add Parser.binop_precedence '-' 20;
  Hashtbl.add Parser.binop_precedence '*' 40;    (* highest. *)

  (* Prime the first token. *)
  print_string "ready> "; flush stdout;
  let stream = Lexer.lex (Stream.of_channel stdin) in

  (* Run the main "interpreter loop" now. *)
  Toplevel.main_loop stream;

  (* Print out all the generated code. *)
  dump_module Codegen.the_module
;;

main ()
Kaleidoscope: Adding JIT and Optimizer Support¶
Chapter 4 Introduction¶
Welcome to Chapter 4 of the “Implementing a language with LLVM” tutorial. Chapters 1-3 described the implementation of a simple language and added support for generating LLVM IR. This chapter describes two new techniques: adding optimizer support to your language, and adding JIT compiler support. These additions will demonstrate how to get nice, efficient code for the Kaleidoscope language.
Trivial Constant Folding¶
Note: the default IRBuilder
now always includes the constant
folding optimizations below.
Our demonstration for Chapter 3 is elegant and easy to extend. Unfortunately, it does not produce wonderful code. For example, when compiling simple code, we don’t get obvious optimizations:
ready> def test(x) 1+2+x;
Read function definition:
define double @test(double %x) {
entry:
%addtmp = fadd double 1.000000e+00, 2.000000e+00
%addtmp1 = fadd double %addtmp, %x
ret double %addtmp1
}
This code is a very, very literal transcription of the AST built by
parsing the input. As such, this transcription lacks optimizations like
constant folding (we’d like to get “add x, 3.0
” in the example
above) as well as other more important optimizations. Constant folding,
in particular, is a very common and very important optimization: so much
so that many language implementors implement constant folding support in
their AST representation.
With LLVM, you don’t need this support in the AST. Since all calls to
build LLVM IR go through the LLVM builder, it would be nice if the
builder itself checked to see if there was a constant folding
opportunity when you call it. If so, it could just do the constant fold
and return the constant instead of creating an instruction. This is
exactly what the LLVMFoldingBuilder
class does.
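Conceptually, the folding builder wraps each build call with a constant check, something like this hypothetical sketch (the real logic lives inside the builder itself):

(* If both operands are constants, fold at build time and return a
 * constant; otherwise emit an instruction as usual. *)
let build_fadd_folded lhs rhs name builder =
  if is_constant lhs && is_constant rhs
  then const_fadd lhs rhs
  else build_fadd lhs rhs name builder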
All we did was switch from LLVMBuilder
to LLVMFoldingBuilder
.
Though we changed no other code, we now have all of our instructions
implicitly constant folded without us having to do anything about it.
For example, the input above now compiles to:
ready> def test(x) 1+2+x;
Read function definition:
define double @test(double %x) {
entry:
%addtmp = fadd double 3.000000e+00, %x
ret double %addtmp
}
Well, that was easy :). In practice, we recommend always using
LLVMFoldingBuilder
when generating code like this. It has no
“syntactic overhead” for its use (you don’t have to uglify your compiler
with constant checks everywhere) and it can dramatically reduce the
amount of LLVM IR that is generated in some cases (particularly for
languages with a macro preprocessor or that use a lot of constants).
On the other hand, the LLVMFoldingBuilder
is limited by the fact
that it does all of its analysis inline with the code as it is built. If
you take a slightly more complex example:
ready> def test(x) (1+2+x)*(x+(1+2));
ready> Read function definition:
define double @test(double %x) {
entry:
%addtmp = fadd double 3.000000e+00, %x
%addtmp1 = fadd double %x, 3.000000e+00
%multmp = fmul double %addtmp, %addtmp1
ret double %multmp
}
In this case, the LHS and RHS of the multiplication are the same value.
We’d really like to see this generate “tmp = x+3; result = tmp*tmp;
”
instead of computing “x+3
” twice.
Unfortunately, no amount of local analysis will be able to detect and correct this. This requires two transformations: reassociation of expressions (to make the adds lexically identical) and Common Subexpression Elimination (CSE) to delete the redundant add instruction. Fortunately, LLVM provides a broad range of optimizations that you can use, in the form of “passes”.
LLVM Optimization Passes¶
LLVM provides many optimization passes, which do many different sorts of things and have different tradeoffs. Unlike other systems, LLVM doesn’t hold to the mistaken notion that one set of optimizations is right for all languages and for all situations. LLVM allows a compiler implementor to make complete decisions about what optimizations to use, in which order, and in what situation.
As a concrete example, LLVM supports both “whole module” passes, which look across as large a body of code as they can (often a whole file, but if run at link time, this can be a substantial portion of the whole program), and “per-function” passes which just operate on a single function at a time, without looking at other functions. For more information on passes and how they are run, see the How to Write a Pass document and the List of LLVM Passes.
For Kaleidoscope, we are currently generating functions on the fly, one at a time, as the user types them in. We aren’t shooting for the ultimate optimization experience in this setting, but we also want to catch the easy and quick stuff where possible. As such, we will choose to run a few per-function optimizations as the user types the function in. If we wanted to make a “static Kaleidoscope compiler”, we would use exactly the code we have now, except that we would defer running the optimizer until the entire file has been parsed.
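For reference, that deferred whole-module variant would look roughly like this sketch (assuming the same scalar passes and the Llvm.PassManager interface of this era; PassManager.create builds a module pass manager instead of a function one):

(* After parsing the entire file, run the pipeline once over the module. *)
let the_mpm = PassManager.create () in
add_instruction_combination the_mpm;
add_reassociation the_mpm;
add_gvn the_mpm;
add_cfg_simplification the_mpm;
ignore (PassManager.run_module Codegen.the_module the_mpm);
PassManager.dispose the_mpm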
In order to get per-function optimizations going, we need to set up a Llvm.PassManager to hold and organize the LLVM optimizations that we want to run. Once we have that, we can add a set of optimizations to run. The code looks like this:
(* Create the JIT. *)
let the_execution_engine = ExecutionEngine.create Codegen.the_module in
let the_fpm = PassManager.create_function Codegen.the_module in
(* Set up the optimizer pipeline. Start with registering info about how the
* target lays out data structures. *)
DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm;
(* Do simple "peephole" optimizations and bit-twiddling optzn. *)
add_instruction_combination the_fpm;
(* reassociate expressions. *)
add_reassociation the_fpm;
(* Eliminate Common SubExpressions. *)
add_gvn the_fpm;
(* Simplify the control flow graph (deleting unreachable blocks, etc). *)
add_cfg_simplification the_fpm;
ignore (PassManager.initialize the_fpm);
(* Run the main "interpreter loop" now. *)
Toplevel.main_loop the_fpm the_execution_engine stream;
The meat of the matter here is the definition of “the_fpm
”. It
requires the_module
to construct itself. Once it is
set up, we use a series of “add” calls to add a bunch of LLVM passes.
The first one is basically boilerplate: it adds target data information so that later
optimizations know how the data structures in the program are laid out.
The “the_execution_engine
” variable is related to the JIT, which we
will get to in the next section.
In this case, we choose to add 4 optimization passes. The passes we chose here are a pretty standard set of “cleanup” optimizations that are useful for a wide variety of code. I won’t delve into what they do but, believe me, they are a good starting place :).
Once the Llvm.PassManager.t
is set up, we need to make use of it. We
do this by running it after our newly created function is constructed
(in Codegen.codegen_func
), but before it is returned to the client:
let codegen_func the_fpm = function
...
try
let ret_val = codegen_expr body in
(* Finish off the function. *)
let _ = build_ret ret_val builder in
(* Validate the generated code, checking for consistency. *)
Llvm_analysis.assert_valid_function the_function;
(* Optimize the function. *)
let _ = PassManager.run_function the_function the_fpm in
the_function
As you can see, this is pretty straightforward. The the_fpm
optimizes and updates the LLVM function in place, improving
(hopefully) its body. With this in place, we can try our test above
again:
ready> def test(x) (1+2+x)*(x+(1+2));
ready> Read function definition:
define double @test(double %x) {
entry:
%addtmp = fadd double %x, 3.000000e+00
%multmp = fmul double %addtmp, %addtmp
ret double %multmp
}
As expected, we now get our nicely optimized code, saving a floating point add instruction from every execution of this function.
LLVM provides a wide variety of optimizations that can be used in
certain circumstances. Some documentation about the various
passes is available, but it isn’t very complete.
Another good source of ideas can come from looking at the passes that
Clang
runs to get started. The “opt
” tool allows you to
experiment with passes from the command line, so you can see if they do
anything.
Now that we have reasonable code coming out of our front-end, let's talk about executing it!
Adding a JIT Compiler¶
Code that is available in LLVM IR can have a wide variety of tools applied to it. For example, you can run optimizations on it (as we did above), you can dump it out in textual or binary forms, you can compile the code to an assembly file (.s) for some target, or you can JIT compile it. The nice thing about the LLVM IR representation is that it is the “common currency” between many different parts of the compiler.
In this section, we’ll add JIT compiler support to our interpreter. The basic idea that we want for Kaleidoscope is to have the user enter function bodies as they do now, but immediately evaluate the top-level expressions they type in. For example, if they type in “1 + 2;”, we should evaluate and print out 3. If they define a function, they should be able to call it from the command line.
In order to do this, we first declare and initialize the JIT. This is
done by adding a global variable and a call in main
:
...
let main () =
...
(* Create the JIT. *)
let the_execution_engine = ExecutionEngine.create Codegen.the_module in
...
This creates an abstract “Execution Engine” which can be either a JIT compiler or the LLVM interpreter. LLVM will automatically pick a JIT compiler for you if one is available for your platform, otherwise it will fall back to the interpreter.
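If you want to force one behavior or the other, the bindings also expose explicit constructors; a sketch, assuming the old Llvm_executionengine interface:

(* Always use the (slow) LLVM interpreter, even if a JIT is available. *)
let the_execution_engine =
  ExecutionEngine.create_interpreter Codegen.the_module in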
Once the Llvm_executionengine.ExecutionEngine.t
is created, the JIT
is ready to be used. There are a variety of APIs that are useful, but
the simplest one is the
“Llvm_executionengine.ExecutionEngine.run_function
” function. This
function JIT compiles the specified LLVM function, runs it, and returns
its result as a GenericValue. In our case, this means that we
can change the code that parses a top-level expression to look like
this:
(* Evaluate a top-level expression into an anonymous function. *)
let e = Parser.parse_toplevel stream in
print_endline "parsed a top-level expr";
let the_function = Codegen.codegen_func the_fpm e in
dump_value the_function;
(* JIT the function, returning a function pointer. *)
let result = ExecutionEngine.run_function the_function [||]
the_execution_engine in
print_string "Evaluated to ";
print_float (GenericValue.as_float Codegen.double_type result);
print_newline ();
Recall that we compile top-level expressions into a self-contained LLVM function that takes no arguments and returns the computed double. Because the LLVM JIT compiler matches the native platform ABI, this means that you can just cast the result pointer to a function pointer of that type and call it directly. This means that there is no difference between JIT compiled code and native machine code that is statically linked into your application.
With just these two changes, let's see how Kaleidoscope works now!
ready> 4+5;
define double @""() {
entry:
ret double 9.000000e+00
}
Evaluated to 9.000000
Well this looks like it is basically working. The dump of the function shows the “no argument function that always returns double” that we synthesize for each top level expression that is typed in. This demonstrates very basic functionality, but can we do more?
ready> def testfunc(x y) x + y*2;
Read function definition:
define double @testfunc(double %x, double %y) {
entry:
%multmp = fmul double %y, 2.000000e+00
%addtmp = fadd double %multmp, %x
ret double %addtmp
}
ready> testfunc(4, 10);
define double @""() {
entry:
%calltmp = call double @testfunc(double 4.000000e+00, double 1.000000e+01)
ret double %calltmp
}
Evaluated to 24.000000
This illustrates that we can now call user code, but there is something
a bit subtle going on here. Note that we only invoke the JIT on the
anonymous functions that call testfunc, but we never invoked it on
testfunc itself. What actually happened here is that the JIT scanned
for all non-JIT’d functions transitively called from the anonymous
function and compiled all of them before returning from
run_function
.
The JIT provides a number of other more advanced interfaces for things like freeing allocated machine code, rejit’ing functions to update them, etc. However, even with this simple code, we get some surprisingly powerful capabilities - check this out (I removed the dump of the anonymous functions, you should get the idea by now :) :
ready> extern sin(x);
Read extern:
declare double @sin(double)
ready> extern cos(x);
Read extern:
declare double @cos(double)
ready> sin(1.0);
Evaluated to 0.841471
ready> def foo(x) sin(x)*sin(x) + cos(x)*cos(x);
Read function definition:
define double @foo(double %x) {
entry:
%calltmp = call double @sin(double %x)
%multmp = fmul double %calltmp, %calltmp
%calltmp2 = call double @cos(double %x)
%multmp4 = fmul double %calltmp2, %calltmp2
%addtmp = fadd double %multmp, %multmp4
ret double %addtmp
}
ready> foo(4.0);
Evaluated to 1.000000
Whoa, how does the JIT know about sin and cos? The answer is
surprisingly simple: in this example, the JIT started execution of a
function and got to a function call. It realized that the function was
not yet JIT compiled and invoked the standard set of routines to resolve
the function. In this case, there is no body defined for the function,
so the JIT ended up calling “dlsym("sin")
” on the Kaleidoscope
process itself. Since “sin
” is defined within the JIT’s address
space, it simply patches up calls in the module to call the libm version
of sin
directly.
The LLVM JIT provides a number of interfaces (look in the
llvm_executionengine.mli
file) for controlling how unknown functions
get resolved. It allows you to establish explicit mappings between IR
objects and addresses (useful for LLVM global variables that you want to
map to static tables, for example), allows you to dynamically decide on
the fly based on the function name, and even allows you to have the JIT
compile functions lazily the first time they’re called.
One interesting application of this is that we can now extend the language by writing arbitrary C code to implement operations. For example, if we add:
/* putchard - putchar that takes a double and returns 0. */
extern "C"
double putchard(double X) {
putchar((char)X);
return 0;
}
Now we can produce simple output to the console by using things like:
“extern putchard(x); putchard(120);
”, which prints a lowercase ‘x’
on the console (120 is the ASCII code for ‘x’). Similar code could be
used to implement file I/O, console input, and many other capabilities
in Kaleidoscope.
This completes the JIT and optimizer chapter of the Kaleidoscope tutorial. At this point, we can compile a non-Turing-complete programming language, optimize and JIT compile it in a user-driven way. Next up we’ll look into extending the language with control flow constructs, tackling some interesting LLVM IR issues along the way.
Full Code Listing¶
Here is the complete code listing for our running example, enhanced with the LLVM JIT and optimizer. To build this example, use:
# Compile
ocamlbuild toy.byte
# Run
./toy.byte
Here is the code:
- _tags:
<{lexer,parser}.ml>: use_camlp4, pp(camlp4of)
<*.{byte,native}>: g++, use_llvm, use_llvm_analysis
<*.{byte,native}>: use_llvm_executionengine, use_llvm_target
<*.{byte,native}>: use_llvm_scalar_opts, use_bindings
- myocamlbuild.ml:
open Ocamlbuild_plugin;;

ocaml_lib ~extern:true "llvm";;
ocaml_lib ~extern:true "llvm_analysis";;
ocaml_lib ~extern:true "llvm_executionengine";;
ocaml_lib ~extern:true "llvm_target";;
ocaml_lib ~extern:true "llvm_scalar_opts";;

flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"]);;
dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];;
- token.ml:
(*===----------------------------------------------------------------------===
 * Lexer Tokens
 *===----------------------------------------------------------------------===*)

(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of
 * these others for known things. *)
type token =
  (* commands *)
  | Def | Extern

  (* primary *)
  | Ident of string | Number of float

  (* unknown *)
  | Kwd of char
- lexer.ml:
(*===----------------------------------------------------------------------===
 * Lexer
 *===----------------------------------------------------------------------===*)

let rec lex = parser
  (* Skip any whitespace. *)
  | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream

  (* identifier: [a-zA-Z][a-zA-Z0-9] *)
  | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] ->
      let buffer = Buffer.create 1 in
      Buffer.add_char buffer c;
      lex_ident buffer stream

  (* number: [0-9.]+ *)
  | [< ' ('0' .. '9' as c); stream >] ->
      let buffer = Buffer.create 1 in
      Buffer.add_char buffer c;
      lex_number buffer stream

  (* Comment until end of line. *)
  | [< ' ('#'); stream >] ->
      lex_comment stream

  (* Otherwise, just return the character as its ascii value. *)
  | [< 'c; stream >] ->
      [< 'Token.Kwd c; lex stream >]

  (* end of stream. *)
  | [< >] -> [< >]

and lex_number buffer = parser
  | [< ' ('0' .. '9' | '.' as c); stream >] ->
      Buffer.add_char buffer c;
      lex_number buffer stream
  | [< stream=lex >] ->
      [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >]

and lex_ident buffer = parser
  | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] ->
      Buffer.add_char buffer c;
      lex_ident buffer stream
  | [< stream=lex >] ->
      match Buffer.contents buffer with
      | "def" -> [< 'Token.Def; stream >]
      | "extern" -> [< 'Token.Extern; stream >]
      | id -> [< 'Token.Ident id; stream >]

and lex_comment = parser
  | [< ' ('\n'); stream=lex >] -> stream
  | [< 'c; e=lex_comment >] -> e
  | [< >] -> [< >]
- ast.ml:
(*===----------------------------------------------------------------------===
 * Abstract Syntax Tree (aka Parse Tree)
 *===----------------------------------------------------------------------===*)

(* expr - Base type for all expression nodes. *)
type expr =
  (* variant for numeric literals like "1.0". *)
  | Number of float

  (* variant for referencing a variable, like "a". *)
  | Variable of string

  (* variant for a binary operator. *)
  | Binary of char * expr * expr

  (* variant for function calls. *)
  | Call of string * expr array

(* proto - This type represents the "prototype" for a function, which captures
 * its name, and its argument names (thus implicitly the number of arguments the
 * function takes). *)
type proto = Prototype of string * string array

(* func - This type represents a function definition itself. *)
type func = Function of proto * expr
- parser.ml:
(*===---------------------------------------------------------------------===
 * Parser
 *===---------------------------------------------------------------------===*)

(* binop_precedence - This holds the precedence for each binary operator that is
 * defined *)
let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10

(* precedence - Get the precedence of the pending binary operator token. *)
let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1

(* primary
 *   ::= identifier
 *   ::= numberexpr
 *   ::= parenexpr *)
let rec parse_primary = parser
  (* numberexpr ::= number *)
  | [< 'Token.Number n >] -> Ast.Number n

  (* parenexpr ::= '(' expression ')' *)
  | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e

  (* identifierexpr
   *   ::= identifier
   *   ::= identifier '(' argumentexpr ')' *)
  | [< 'Token.Ident id; stream >] ->
      let rec parse_args accumulator = parser
        | [< e=parse_expr; stream >] ->
            begin parser
              | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e
              | [< >] -> e :: accumulator
            end stream
        | [< >] -> accumulator
      in
      let rec parse_ident id = parser
        (* Call. *)
        | [< 'Token.Kwd '(';
             args=parse_args [];
             'Token.Kwd ')' ?? "expected ')'">] ->
            Ast.Call (id, Array.of_list (List.rev args))

        (* Simple variable ref. *)
        | [< >] -> Ast.Variable id
      in
      parse_ident id stream

  | [< >] -> raise (Stream.Error "unknown token when expecting an expression.")

(* binoprhs
 *   ::= ('+' primary)* *)
and parse_bin_rhs expr_prec lhs stream =
  match Stream.peek stream with
  (* If this is a binop, find its precedence. *)
  | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c ->
      let token_prec = precedence c in

      (* If this is a binop that binds at least as tightly as the current binop,
       * consume it, otherwise we are done. *)
      if token_prec < expr_prec then lhs else begin
        (* Eat the binop. *)
        Stream.junk stream;

        (* Parse the primary expression after the binary operator. *)
        let rhs = parse_primary stream in

        (* Okay, we know this is a binop. *)
        let rhs =
          match Stream.peek stream with
          | Some (Token.Kwd c2) ->
              (* If BinOp binds less tightly with rhs than the operator after
               * rhs, let the pending operator take rhs as its lhs. *)
              let next_prec = precedence c2 in
              if token_prec < next_prec
              then parse_bin_rhs (token_prec + 1) rhs stream
              else rhs
          | _ -> rhs
        in

        (* Merge lhs/rhs. *)
        let lhs = Ast.Binary (c, lhs, rhs) in
        parse_bin_rhs expr_prec lhs stream
      end
  | _ -> lhs

(* expression
 *   ::= primary binoprhs *)
and parse_expr = parser
  | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream

(* prototype
 *   ::= id '(' id* ')' *)
let parse_prototype =
  let rec parse_args accumulator = parser
    | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e
    | [< >] -> accumulator
  in

  parser
  | [< 'Token.Ident id;
       'Token.Kwd '(' ?? "expected '(' in prototype";
       args=parse_args [];
       'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
      (* success. *)
      Ast.Prototype (id, Array.of_list (List.rev args))

  | [< >] ->
      raise (Stream.Error "expected function name in prototype")

(* definition ::= 'def' prototype expression *)
let parse_definition = parser
  | [< 'Token.Def; p=parse_prototype; e=parse_expr >] ->
      Ast.Function (p, e)

(* toplevelexpr ::= expression *)
let parse_toplevel = parser
  | [< e=parse_expr >] ->
      (* Make an anonymous proto. *)
      Ast.Function (Ast.Prototype ("", [||]), e)

(* external ::= 'extern' prototype *)
let parse_extern = parser
  | [< 'Token.Extern; e=parse_prototype >] -> e
- codegen.ml:
(*===----------------------------------------------------------------------===
 * Code Generation
 *===----------------------------------------------------------------------===*)

open Llvm

exception Error of string

let context = global_context ()
let the_module = create_module context "my cool jit"
let builder = builder context
let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10
let double_type = double_type context

let rec codegen_expr = function
  | Ast.Number n -> const_float double_type n
  | Ast.Variable name ->
      (try Hashtbl.find named_values name with
        | Not_found -> raise (Error "unknown variable name"))
  | Ast.Binary (op, lhs, rhs) ->
      let lhs_val = codegen_expr lhs in
      let rhs_val = codegen_expr rhs in
      begin
        match op with
        | '+' -> build_fadd lhs_val rhs_val "addtmp" builder
        | '-' -> build_fsub lhs_val rhs_val "subtmp" builder
        | '*' -> build_fmul lhs_val rhs_val "multmp" builder
        | '<' ->
            (* Convert bool 0/1 to double 0.0 or 1.0 *)
            let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in
            build_uitofp i double_type "booltmp" builder
        | _ -> raise (Error "invalid binary operator")
      end
  | Ast.Call (callee, args) ->
      (* Look up the name in the module table. *)
      let callee =
        match lookup_function callee the_module with
        | Some callee -> callee
        | None -> raise (Error "unknown function referenced")
      in
      let params = params callee in

      (* If argument mismatch error. *)
      if Array.length params == Array.length args then () else
        raise (Error "incorrect # arguments passed");
      let args = Array.map codegen_expr args in
      build_call callee args "calltmp" builder

let codegen_proto = function
  | Ast.Prototype (name, args) ->
      (* Make the function type: double(double,double) etc. *)
      let doubles = Array.make (Array.length args) double_type in
      let ft = function_type double_type doubles in
      let f =
        match lookup_function name the_module with
        | None -> declare_function name ft the_module

        (* If 'f' conflicted, there was already something named 'name'. If it
         * has a body, don't allow redefinition or reextern. *)
        | Some f ->
            (* If 'f' already has a body, reject this. *)
            if block_begin f <> At_end f then
              raise (Error "redefinition of function");

            (* If 'f' took a different number of arguments, reject. *)
            if element_type (type_of f) <> ft then
              raise (Error "redefinition of function with different # args");
            f
      in

      (* Set names for all arguments. *)
      Array.iteri (fun i a ->
        let n = args.(i) in
        set_value_name n a;
        Hashtbl.add named_values n a;
      ) (params f);
      f

let codegen_func the_fpm = function
  | Ast.Function (proto, body) ->
      Hashtbl.clear named_values;
      let the_function = codegen_proto proto in

      (* Create a new basic block to start insertion into. *)
      let bb = append_block context "entry" the_function in
      position_at_end bb builder;

      try
        let ret_val = codegen_expr body in

        (* Finish off the function. *)
        let _ = build_ret ret_val builder in

        (* Validate the generated code, checking for consistency. *)
        Llvm_analysis.assert_valid_function the_function;

        (* Optimize the function. *)
        let _ = PassManager.run_function the_function the_fpm in

        the_function
      with e ->
        delete_function the_function;
        raise e
- toplevel.ml:
(*===----------------------------------------------------------------------=== * Top-Level parsing and JIT Driver *===----------------------------------------------------------------------===*) open Llvm open Llvm_executionengine (* top ::= definition | external | expression | ';' *) let rec main_loop the_fpm the_execution_engine stream = match Stream.peek stream with | None -> () (* ignore top-level semicolons. *) | Some (Token.Kwd ';') -> Stream.junk stream; main_loop the_fpm the_execution_engine stream | Some token -> begin try match token with | Token.Def -> let e = Parser.parse_definition stream in print_endline "parsed a function definition."; dump_value (Codegen.codegen_func the_fpm e); | Token.Extern -> let e = Parser.parse_extern stream in print_endline "parsed an extern."; dump_value (Codegen.codegen_proto e); | _ -> (* Evaluate a top-level expression into an anonymous function. *) let e = Parser.parse_toplevel stream in print_endline "parsed a top-level expr"; let the_function = Codegen.codegen_func the_fpm e in dump_value the_function; (* JIT the function, returning a function pointer. *) let result = ExecutionEngine.run_function the_function [||] the_execution_engine in print_string "Evaluated to "; print_float (GenericValue.as_float Codegen.double_type result); print_newline (); with Stream.Error s | Codegen.Error s -> (* Skip token for error recovery. *) Stream.junk stream; print_endline s; end; print_string "ready> "; flush stdout; main_loop the_fpm the_execution_engine stream
- toy.ml:
(*===----------------------------------------------------------------------=== * Main driver code. *===----------------------------------------------------------------------===*) open Llvm open Llvm_executionengine open Llvm_target open Llvm_scalar_opts let main () = ignore (initialize_native_target ()); (* Install standard binary operators. * 1 is the lowest precedence. *) Hashtbl.add Parser.binop_precedence '<' 10; Hashtbl.add Parser.binop_precedence '+' 20; Hashtbl.add Parser.binop_precedence '-' 20; Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) (* Prime the first token. *) print_string "ready> "; flush stdout; let stream = Lexer.lex (Stream.of_channel stdin) in (* Create the JIT. *) let the_execution_engine = ExecutionEngine.create Codegen.the_module in let the_fpm = PassManager.create_function Codegen.the_module in (* Set up the optimizer pipeline. Start with registering info about how the * target lays out data structures. *) DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm; (* Do simple "peephole" optimizations and bit-twiddling optzn. *) add_instruction_combination the_fpm; (* reassociate expressions. *) add_reassociation the_fpm; (* Eliminate Common SubExpressions. *) add_gvn the_fpm; (* Simplify the control flow graph (deleting unreachable blocks, etc). *) add_cfg_simplification the_fpm; ignore (PassManager.initialize the_fpm); (* Run the main "interpreter loop" now. *) Toplevel.main_loop the_fpm the_execution_engine stream; (* Print out all the generated code. *) dump_module Codegen.the_module ;; main ()
- bindings.c
#include <stdio.h>

/* putchard - putchar that takes a double and returns 0. */
extern double putchard(double X) {
  putchar((char)X);
  return 0;
}
Kaleidoscope: Extending the Language: Control Flow¶
Chapter 5 Introduction¶
Welcome to Chapter 5 of the “Implementing a language with LLVM” tutorial. Parts 1-4 described the implementation of the simple Kaleidoscope language and included support for generating LLVM IR, followed by optimizations and a JIT compiler. Unfortunately, as presented, Kaleidoscope is mostly useless: it has no control flow other than call and return. This means that you can’t have conditional branches in the code, significantly limiting its power. In this episode of “build that compiler”, we’ll extend Kaleidoscope to have an if/then/else expression plus a simple ‘for’ loop.
If/Then/Else¶
Extending Kaleidoscope to support if/then/else is quite straightforward. It basically requires adding support for this “new” concept to the lexer, parser, AST, and LLVM code emitter. This example is nice, because it shows how easy it is to “grow” a language over time, incrementally extending it as new ideas are discovered.
Before we get going on “how” we add this extension, let’s talk about “what” we want. The basic idea is that we want to be able to write this sort of thing:
def fib(x)
if x < 3 then
1
else
fib(x-1)+fib(x-2);
In Kaleidoscope, every construct is an expression: there are no statements. As such, the if/then/else expression needs to return a value like any other. Since we’re using a mostly functional form, we’ll have it evaluate its conditional, then return the ‘then’ or ‘else’ value based on how the condition was resolved. This is very similar to the C “?:” expression.
The semantics of the if/then/else expression is that it evaluates the condition to a boolean equality value: 0.0 is considered to be false and everything else is considered to be true. If the condition is true, the first subexpression is evaluated and returned, if the condition is false, the second subexpression is evaluated and returned. Since Kaleidoscope allows side-effects, this behavior is important to nail down.
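For example (a hypothetical snippet, with made-up names), only the branch selected by the condition is ever evaluated, so side effects in the other branch never happen:
# foo() is never called: the condition 1 is non-zero, so only the
# 'then' branch is evaluated and test() returns 42.
extern foo();
def test() if 1 then 42 else foo();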
Now that we know what we “want”, let’s break this down into its constituent pieces.
Lexer Extensions for If/Then/Else¶
The lexer extensions are straightforward. First we add new variants for the relevant tokens:
(* control *)
| If | Then | Else | For | In
Once we have that, we recognize the new keywords in the lexer. This is pretty simple stuff:
...
match Buffer.contents buffer with
| "def" -> [< 'Token.Def; stream >]
| "extern" -> [< 'Token.Extern; stream >]
| "if" -> [< 'Token.If; stream >]
| "then" -> [< 'Token.Then; stream >]
| "else" -> [< 'Token.Else; stream >]
| "for" -> [< 'Token.For; stream >]
| "in" -> [< 'Token.In; stream >]
| id -> [< 'Token.Ident id; stream >]
AST Extensions for If/Then/Else¶
To represent the new expression we add a new AST variant for it:
type expr =
...
(* variant for if/then/else. *)
| If of expr * expr * expr
The AST variant just has pointers to the various subexpressions.
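For instance, once the parser below is in place, a source expression like “if x < 3 then 1 else 0” should come out shaped like this (a sketch for illustration, not code from the tutorial):
Ast.If (Ast.Binary ('<', Ast.Variable "x", Ast.Number 3.0), (* condition *)
        Ast.Number 1.0,                                     (* 'then' value *)
        Ast.Number 0.0)                                     (* 'else' value *)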
Parser Extensions for If/Then/Else¶
Now that we have the relevant tokens coming from the lexer and we have the AST node to build, our parsing logic is relatively straightforward. First we define a new parsing case and hook it up as a primary expression:
let rec parse_primary = parser
...
(* ifexpr ::= 'if' expr 'then' expr 'else' expr *)
| [< 'Token.If; c=parse_expr;
'Token.Then ?? "expected 'then'"; t=parse_expr;
'Token.Else ?? "expected 'else'"; e=parse_expr >] ->
Ast.If (c, t, e)
LLVM IR for If/Then/Else¶
Now that we have it parsing and building the AST, the final piece is adding LLVM code generation support. This is the most interesting part of the if/then/else example, because this is where it starts to introduce new concepts. All of the code above has been thoroughly described in previous chapters.
To motivate the code we want to produce, let’s take a look at a simple example. Consider:
extern foo();
extern bar();
def baz(x) if x then foo() else bar();
If you disable optimizations, the code you’ll (soon) get from Kaleidoscope looks like this:
declare double @foo()
declare double @bar()
define double @baz(double %x) {
entry:
%ifcond = fcmp one double %x, 0.000000e+00
br i1 %ifcond, label %then, label %else
then: ; preds = %entry
%calltmp = call double @foo()
br label %ifcont
else: ; preds = %entry
%calltmp1 = call double @bar()
br label %ifcont
ifcont: ; preds = %else, %then
%iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ]
ret double %iftmp
}
To visualize the control flow graph, you can use a nifty feature of the LLVM ‘opt’ tool. If you put this LLVM IR into “t.ll” and run “llvm-as < t.ll | opt -analyze -view-cfg”, a window will pop up and you’ll see this graph:

Example CFG
Another way to get this is to call “Llvm_analysis.view_function_cfg f” or “Llvm_analysis.view_function_cfg_only f” (where f is a “Function”) either by inserting actual calls into the code and recompiling or by calling these in the debugger. LLVM has many nice features for visualizing various graphs.
Getting back to the generated code, it is fairly simple: the entry block evaluates the conditional expression (“x” in our case here) and compares the result to 0.0 with the “fcmp one” instruction (‘one’ is “Ordered and Not Equal”). Based on the result of this expression, the code jumps to either the “then” or “else” blocks, which contain the expressions for the true/false cases.
Once the then/else blocks are finished executing, they both branch back to the ‘ifcont’ block to execute the code that happens after the if/then/else. In this case the only thing left to do is to return to the caller of the function. The question then becomes: how does the code know which expression to return?
The answer to this question involves an important SSA operation: the Phi operation. If you’re not familiar with SSA, the wikipedia article is a good introduction and there are various other introductions to it available on your favorite search engine. The short version is that “execution” of the Phi operation requires “remembering” which block control came from. The Phi operation takes on the value corresponding to the input control block. In this case, if control comes in from the “then” block, it gets the value of “calltmp”. If control comes from the “else” block, it gets the value of “calltmp1”.
At this point, you are probably starting to think “Oh no! This means my simple and elegant front-end will have to start generating SSA form in order to use LLVM!”. Fortunately, this is not the case, and we strongly advise not implementing an SSA construction algorithm in your front-end unless there is an amazingly good reason to do so. In practice, there are two sorts of values that float around in code written for your average imperative programming language that might need Phi nodes:
- Code that involves user variables:
x = 1; x = x + 1;
- Values that are implicit in the structure of your AST, such as the Phi node in this case.
In Chapter 7 of this tutorial (“mutable variables”), we’ll talk about #1 in depth. For now, just believe me that you don’t need SSA construction to handle this case. For #2, you have the choice of using the techniques that we will describe for #1, or you can insert Phi nodes directly, if convenient. In this case, it is really really easy to generate the Phi node, so we choose to do it directly.
Okay, enough of the motivation and overview, let’s generate code!
Code Generation for If/Then/Else¶
In order to generate code for this, we add a new Ast.If case to codegen_expr:
let rec codegen_expr = function
...
| Ast.If (cond, then_, else_) ->
let cond = codegen_expr cond in
(* Convert condition to a bool by comparing equal to 0.0 *)
let zero = const_float double_type 0.0 in
let cond_val = build_fcmp Fcmp.One cond zero "ifcond" builder in
This code is straightforward and similar to what we saw before. We emit the expression for the condition, then compare that value to zero to get a truth value as a 1-bit (bool) value.
(* Grab the first block so that we might later add the conditional branch
* to it at the end of the function. *)
let start_bb = insertion_block builder in
let the_function = block_parent start_bb in
let then_bb = append_block context "then" the_function in
position_at_end then_bb builder;
As opposed to the C++ tutorial, we have to build our basic blocks bottom up since we can’t have dangling BasicBlocks. We start off by saving a pointer to the first block (which might not be the entry block), which we’ll need in order to build a conditional branch later. We do this by asking the builder for the current BasicBlock. The fourth line gets the current Function object that is being built, by asking start_bb for its “parent” (the function it is currently embedded into).
Once it has that, it creates one block. It is automatically appended into the function’s list of blocks.
(* Emit 'then' value. *)
position_at_end then_bb builder;
let then_val = codegen_expr then_ in
(* Codegen of 'then' can change the current block, update then_bb for the
* phi. We create a new name because one is used for the phi node, and the
* other is used for the conditional branch. *)
let new_then_bb = insertion_block builder in
We move the builder to start inserting into the “then” block. Strictly speaking, this call moves the insertion point to be at the end of the specified block. However, since the “then” block is empty, it also starts out by inserting at the beginning of the block. :)
Once the insertion point is set, we recursively codegen the “then” expression from the AST.
The final line here is quite subtle, but is very important. The basic issue is that when we create the Phi node in the merge block, we need to set up the block/value pairs that indicate how the Phi will work. Importantly, the Phi node expects to have an entry for each predecessor of the block in the CFG. Why, then, are we getting the current block when we just set it to then_bb five lines above? The problem is that the “then” expression may actually itself change the block that the builder is emitting into if, for example, it contains a nested “if/then/else” expression. Because calling codegen_expr recursively could arbitrarily change the notion of the current block, we are required to get an up-to-date value for code that will set up the Phi node.
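To see why this matters, consider a hypothetical input like the following: while emitting the outer ‘then’ value, codegen of the inner if/then/else leaves the builder positioned in the inner ‘ifcont’ block, not in the then_bb we created above:
def nested(x y)
  if x then
    (if y then 1 else 2)
  else
    3;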
(* Emit 'else' value. *)
let else_bb = append_block context "else" the_function in
position_at_end else_bb builder;
let else_val = codegen_expr else_ in
(* Codegen of 'else' can change the current block, update else_bb for the
* phi. *)
let new_else_bb = insertion_block builder in
Code generation for the ‘else’ block is basically identical to codegen for the ‘then’ block.
(* Emit merge block. *)
let merge_bb = append_block context "ifcont" the_function in
position_at_end merge_bb builder;
let incoming = [(then_val, new_then_bb); (else_val, new_else_bb)] in
let phi = build_phi incoming "iftmp" builder in
The first two lines here are now familiar: the first adds the “merge” block to the Function object. The second changes the insertion point so that newly created code will go into the “merge” block. Once that is done, we need to create the PHI node and set up the block/value pairs for the PHI.
(* Return to the start block to add the conditional branch. *)
position_at_end start_bb builder;
ignore (build_cond_br cond_val then_bb else_bb builder);
Once the blocks are created, we can emit the conditional branch that chooses between them. Note that creating new blocks does not implicitly affect the builder, so it is still inserting into the block that the condition went into. This is why we needed to save the “start” block.
(* Set an unconditional branch at the end of the 'then' block and the
* 'else' block to the 'merge' block. *)
position_at_end new_then_bb builder; ignore (build_br merge_bb builder);
position_at_end new_else_bb builder; ignore (build_br merge_bb builder);
(* Finally, set the builder to the end of the merge block. *)
position_at_end merge_bb builder;
phi
To finish off the blocks, we create an unconditional branch to the merge block. One interesting (and very important) aspect of the LLVM IR is that it requires all basic blocks to be “terminated” with a control flow instruction such as return or branch. This means that all control flow, including fall-throughs, must be made explicit in the LLVM IR. If you violate this rule, the verifier will emit an error.
Finally, the CodeGen function returns the phi node as the value computed by the if/then/else expression. In our example above, this returned value will feed into the code for the top-level function, which will create the return instruction.
Overall, we now have the ability to execute conditional code in Kaleidoscope. With this extension, Kaleidoscope is a fairly complete language that can calculate a wide variety of numeric functions. Next up we’ll add another useful expression that is familiar from non-functional languages...
‘for’ Loop Expression¶
Now that we know how to add basic control flow constructs to the language, we have the tools to add more powerful things. Let’s add something more aggressive, a ‘for’ expression:
extern putchard(char);
def printstar(n)
for i = 1, i < n, 1.0 in
putchard(42); # ascii 42 = '*'
# print 100 '*' characters
printstar(100);
This expression defines a new variable (“i” in this case) which iterates from a starting value, while the condition (“i < n” in this case) is true, incrementing by an optional step value (“1.0” in this case). If the step value is omitted, it defaults to 1.0. While the loop condition is true, it executes its body expression. Because we don’t have anything better to return, we’ll just define the loop as always returning 0.0. In the future when we have mutable variables, it will get more useful.
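As a quick illustration (hypothetical definitions), these two loops behave identically, since the omitted step defaults to 1.0:
def count(n)
  for i = 1, i < n, 1.0 in
    putchard(42);

def count2(n)
  for i = 1, i < n in
    putchard(42);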
As before, let’s talk about the changes that we need to make to Kaleidoscope to support this.
Lexer Extensions for the ‘for’ Loop¶
The lexer extensions are the same sort of thing as for if/then/else:
... in Token.token ...
(* control *)
| If | Then | Else
| For | In
... in Lexer.lex_ident...
match Buffer.contents buffer with
| "def" -> [< 'Token.Def; stream >]
| "extern" -> [< 'Token.Extern; stream >]
| "if" -> [< 'Token.If; stream >]
| "then" -> [< 'Token.Then; stream >]
| "else" -> [< 'Token.Else; stream >]
| "for" -> [< 'Token.For; stream >]
| "in" -> [< 'Token.In; stream >]
| id -> [< 'Token.Ident id; stream >]
AST Extensions for the ‘for’ Loop¶
The AST variant is just as simple. It basically boils down to capturing the variable name and the constituent expressions in the node.
type expr =
...
(* variant for for/in. *)
| For of string * expr * expr * expr option * expr
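As a sketch (not code from the tutorial), the loop in the printstar example above would be captured as:
Ast.For ("i",
         Ast.Number 1.0,                                       (* start *)
         Ast.Binary ('<', Ast.Variable "i", Ast.Variable "n"), (* end condition *)
         Some (Ast.Number 1.0),                                (* optional step *)
         Ast.Call ("putchard", [| Ast.Number 42.0 |]))         (* body *)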
Parser Extensions for the ‘for’ Loop¶
The parser code is also fairly standard. The only interesting thing here is handling of the optional step value. The parser code handles it by checking to see if the second comma is present. If not, it sets the step value to None in the AST node:
let rec parse_primary = parser
...
(* forexpr
::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression *)
| [< 'Token.For;
'Token.Ident id ?? "expected identifier after for";
'Token.Kwd '=' ?? "expected '=' after for";
stream >] ->
begin parser
| [<
start=parse_expr;
'Token.Kwd ',' ?? "expected ',' after for";
end_=parse_expr;
stream >] ->
let step =
begin parser
| [< 'Token.Kwd ','; step=parse_expr >] -> Some step
| [< >] -> None
end stream
in
begin parser
| [< 'Token.In; body=parse_expr >] ->
Ast.For (id, start, end_, step, body)
| [< >] ->
raise (Stream.Error "expected 'in' after for")
end stream
| [< >] ->
raise (Stream.Error "expected '=' after for")
end stream
LLVM IR for the ‘for’ Loop¶
Now we get to the good part: the LLVM IR we want to generate for this thing. With the simple example above, we get this LLVM IR (note that this dump is generated with optimizations disabled for clarity):
declare double @putchard(double)
define double @printstar(double %n) {
entry:
; initial value = 1.0 (inlined into phi)
br label %loop
loop: ; preds = %loop, %entry
%i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ]
; body
%calltmp = call double @putchard(double 4.200000e+01)
; increment
%nextvar = fadd double %i, 1.000000e+00
; termination test
%cmptmp = fcmp ult double %i, %n
%booltmp = uitofp i1 %cmptmp to double
%loopcond = fcmp one double %booltmp, 0.000000e+00
br i1 %loopcond, label %loop, label %afterloop
afterloop: ; preds = %loop
; loop always returns 0.0
ret double 0.000000e+00
}
This loop contains all the same constructs we saw before: a phi node, several expressions, and some basic blocks. Let’s see how this fits together.
Code Generation for the ‘for’ Loop¶
The first part of codegen_expr is very simple: we just output the start expression for the loop value:
let rec codegen_expr = function
...
| Ast.For (var_name, start, end_, step, body) ->
(* Emit the start code first, without 'variable' in scope. *)
let start_val = codegen_expr start in
With this out of the way, the next step is to set up the LLVM basic block for the start of the loop body. In the case above, the whole loop body is one block, but remember that the body code itself could consist of multiple blocks (e.g. if it contains an if/then/else or a for/in expression).
(* Make the new basic block for the loop header, inserting after current
* block. *)
let preheader_bb = insertion_block builder in
let the_function = block_parent preheader_bb in
let loop_bb = append_block context "loop" the_function in
(* Insert an explicit fall through from the current block to the
* loop_bb. *)
ignore (build_br loop_bb builder);
This code is similar to what we saw for if/then/else. Because we will need it to create the Phi node, we remember the block that falls through into the loop. Once we have that, we create the actual block that starts the loop and create an unconditional branch for the fall-through between the two blocks.
(* Start insertion in loop_bb. *)
position_at_end loop_bb builder;
(* Start the PHI node with an entry for start. *)
let variable = build_phi [(start_val, preheader_bb)] var_name builder in
Now that the “preheader” for the loop is set up, we switch to emitting code for the loop body. To begin with, we move the insertion point and create the PHI node for the loop induction variable. Since we already know the incoming value for the starting value, we add it to the Phi node. Note that the Phi will eventually get a second value for the backedge, but we can’t set it up yet (because it doesn’t exist!).
(* Within the loop, the variable is defined equal to the PHI node. If it
* shadows an existing variable, we have to restore it, so save it
* now. *)
let old_val =
try Some (Hashtbl.find named_values var_name) with Not_found -> None
in
Hashtbl.add named_values var_name variable;
(* Emit the body of the loop. This, like any other expr, can change the
* current BB. Note that we ignore the value computed by the body, but
* don't allow an error *)
ignore (codegen_expr body);
Now the code starts to get more interesting. Our ‘for’ loop introduces a new variable to the symbol table. This means that our symbol table can now contain either function arguments or loop variables. To handle this, before we codegen the body of the loop, we add the loop variable as the current value for its name. Note that it is possible that there is a variable of the same name in the outer scope. It would be easy to make this an error (emit an error and return null if there is already an entry for var_name) but we choose to allow shadowing of variables. In order to handle this correctly, we remember the Value that we are potentially shadowing in old_val (which will be None if there is no shadowed variable).
Once the loop variable is set into the symbol table, the code recursively codegen’s the body. This allows the body to use the loop variable: any references to it will naturally find it in the symbol table.
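For example, in this hypothetical definition the loop variable ‘i’ shadows the argument ‘i’: inside the body, lookups of “i” in the symbol table find the loop variable, and after the loop the argument binding is restored:
def shadow(i)
  for i = 1, i < 3 in
    putchard(i + 64);  # 'i' here is the loop variable, not the argument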
(* Emit the step value. *)
let step_val =
match step with
| Some step -> codegen_expr step
(* If not specified, use 1.0. *)
| None -> const_float double_type 1.0
in
let next_var = build_add variable step_val "nextvar" builder in
Now that the body is emitted, we compute the next value of the iteration variable by adding the step value, or 1.0 if it isn’t present. ‘next_var’ will be the value of the loop variable on the next iteration of the loop.
(* Compute the end condition. *)
let end_cond = codegen_expr end_ in
(* Convert condition to a bool by comparing equal to 0.0. *)
let zero = const_float double_type 0.0 in
let end_cond = build_fcmp Fcmp.One end_cond zero "loopcond" builder in
Finally, we evaluate the exit value of the loop, to determine whether the loop should exit. This mirrors the condition evaluation for the if/then/else expression.
(* Create the "after loop" block and insert it. *)
let loop_end_bb = insertion_block builder in
let after_bb = append_block context "afterloop" the_function in
(* Insert the conditional branch into the end of loop_end_bb. *)
ignore (build_cond_br end_cond loop_bb after_bb builder);
(* Any new code will be inserted in after_bb. *)
position_at_end after_bb builder;
With the code for the body of the loop complete, we just need to finish up the control flow for it. This code remembers the end block (for the phi node), then creates the block for the loop exit (“afterloop”). Based on the value of the exit condition, it creates a conditional branch that chooses between executing the loop again and exiting the loop. Any future code is emitted in the “afterloop” block, so it sets the insertion position to it.
(* Add a new entry to the PHI node for the backedge. *)
add_incoming (next_var, loop_end_bb) variable;
(* Restore the unshadowed variable. *)
begin match old_val with
| Some old_val -> Hashtbl.add named_values var_name old_val
| None -> ()
end;
(* for expr always returns 0.0. *)
const_null double_type
The final code handles various cleanups: now that we have the “next_var” value, we can add the incoming value to the loop PHI node. After that, we remove the loop variable from the symbol table, so that it isn’t in scope after the for loop. Finally, code generation of the for loop always returns 0.0, so that is what we return from Codegen.codegen_expr.
With this, we conclude the “adding control flow to Kaleidoscope” chapter of the tutorial. In this chapter we added two control flow constructs, and used them to motivate a couple of aspects of the LLVM IR that are important for front-end implementors to know. In the next chapter of our saga, we will get a bit crazier and add user-defined operators to our poor innocent language.
Full Code Listing¶
Here is the complete code listing for our running example, enhanced with the if/then/else and for expressions. To build this example, use:
# Compile
ocamlbuild toy.byte
# Run
./toy.byte
Here is the code:
- _tags:
<{lexer,parser}.ml>: use_camlp4, pp(camlp4of)
<*.{byte,native}>: g++, use_llvm, use_llvm_analysis
<*.{byte,native}>: use_llvm_executionengine, use_llvm_target
<*.{byte,native}>: use_llvm_scalar_opts, use_bindings
- myocamlbuild.ml:
open Ocamlbuild_plugin;;

ocaml_lib ~extern:true "llvm";;
ocaml_lib ~extern:true "llvm_analysis";;
ocaml_lib ~extern:true "llvm_executionengine";;
ocaml_lib ~extern:true "llvm_target";;
ocaml_lib ~extern:true "llvm_scalar_opts";;

flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"]);;
dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];;
- token.ml:
(*===----------------------------------------------------------------------===
 * Lexer Tokens
 *===----------------------------------------------------------------------===*)

(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of
 * these others for known things. *)
type token =
  (* commands *)
  | Def | Extern

  (* primary *)
  | Ident of string | Number of float

  (* unknown *)
  | Kwd of char

  (* control *)
  | If | Then | Else | For | In
- lexer.ml:
(*===----------------------------------------------------------------------===
 * Lexer
 *===----------------------------------------------------------------------===*)

let rec lex = parser
  (* Skip any whitespace. *)
  | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream

  (* identifier: [a-zA-Z][a-zA-Z0-9] *)
  | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] ->
      let buffer = Buffer.create 1 in
      Buffer.add_char buffer c;
      lex_ident buffer stream

  (* number: [0-9.]+ *)
  | [< ' ('0' .. '9' as c); stream >] ->
      let buffer = Buffer.create 1 in
      Buffer.add_char buffer c;
      lex_number buffer stream

  (* Comment until end of line. *)
  | [< ' ('#'); stream >] -> lex_comment stream

  (* Otherwise, just return the character as its ascii value. *)
  | [< 'c; stream >] -> [< 'Token.Kwd c; lex stream >]

  (* end of stream. *)
  | [< >] -> [< >]

and lex_number buffer = parser
  | [< ' ('0' .. '9' | '.' as c); stream >] ->
      Buffer.add_char buffer c;
      lex_number buffer stream
  | [< stream=lex >] ->
      [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >]

and lex_ident buffer = parser
  | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] ->
      Buffer.add_char buffer c;
      lex_ident buffer stream
  | [< stream=lex >] ->
      match Buffer.contents buffer with
      | "def" -> [< 'Token.Def; stream >]
      | "extern" -> [< 'Token.Extern; stream >]
      | "if" -> [< 'Token.If; stream >]
      | "then" -> [< 'Token.Then; stream >]
      | "else" -> [< 'Token.Else; stream >]
      | "for" -> [< 'Token.For; stream >]
      | "in" -> [< 'Token.In; stream >]
      | id -> [< 'Token.Ident id; stream >]

and lex_comment = parser
  | [< ' ('\n'); stream=lex >] -> stream
  | [< 'c; e=lex_comment >] -> e
  | [< >] -> [< >]
- ast.ml:
(*===----------------------------------------------------------------------===
 * Abstract Syntax Tree (aka Parse Tree)
 *===----------------------------------------------------------------------===*)

(* expr - Base type for all expression nodes. *)
type expr =
  (* variant for numeric literals like "1.0". *)
  | Number of float

  (* variant for referencing a variable, like "a". *)
  | Variable of string

  (* variant for a binary operator. *)
  | Binary of char * expr * expr

  (* variant for function calls. *)
  | Call of string * expr array

  (* variant for if/then/else. *)
  | If of expr * expr * expr

  (* variant for for/in. *)
  | For of string * expr * expr * expr option * expr

(* proto - This type represents the "prototype" for a function, which captures
 * its name, and its argument names (thus implicitly the number of arguments the
 * function takes). *)
type proto = Prototype of string * string array

(* func - This type represents a function definition itself. *)
type func = Function of proto * expr
- parser.ml:
(*===---------------------------------------------------------------------=== * Parser *===---------------------------------------------------------------------===*) (* binop_precedence - This holds the precedence for each binary operator that is * defined *) let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 (* precedence - Get the precedence of the pending binary operator token. *) let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 (* primary * ::= identifier * ::= numberexpr * ::= parenexpr * ::= ifexpr * ::= forexpr *) let rec parse_primary = parser (* numberexpr ::= number *) | [< 'Token.Number n >] -> Ast.Number n (* parenexpr ::= '(' expression ')' *) | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e (* identifierexpr * ::= identifier * ::= identifier '(' argumentexpr ')' *) | [< 'Token.Ident id; stream >] -> let rec parse_args accumulator = parser | [< e=parse_expr; stream >] -> begin parser | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e | [< >] -> e :: accumulator end stream | [< >] -> accumulator in let rec parse_ident id = parser (* Call. *) | [< 'Token.Kwd '('; args=parse_args []; 'Token.Kwd ')' ?? "expected ')'">] -> Ast.Call (id, Array.of_list (List.rev args)) (* Simple variable ref. *) | [< >] -> Ast.Variable id in parse_ident id stream (* ifexpr ::= 'if' expr 'then' expr 'else' expr *) | [< 'Token.If; c=parse_expr; 'Token.Then ?? "expected 'then'"; t=parse_expr; 'Token.Else ?? "expected 'else'"; e=parse_expr >] -> Ast.If (c, t, e) (* forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression *) | [< 'Token.For; 'Token.Ident id ?? "expected identifier after for"; 'Token.Kwd '=' ?? "expected '=' after for"; stream >] -> begin parser | [< start=parse_expr; 'Token.Kwd ',' ?? "expected ',' after for"; end_=parse_expr; stream >] -> let step = begin parser | [< 'Token.Kwd ','; step=parse_expr >] -> Some step | [< >] -> None end stream in begin parser | [< 'Token.In; body=parse_expr >] -> Ast.For (id, start, end_, step, body) | [< >] -> raise (Stream.Error "expected 'in' after for") end stream | [< >] -> raise (Stream.Error "expected '=' after for") end stream | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") (* binoprhs * ::= ('+' primary)* *) and parse_bin_rhs expr_prec lhs stream = match Stream.peek stream with (* If this is a binop, find its precedence. *) | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> let token_prec = precedence c in (* If this is a binop that binds at least as tightly as the current binop, * consume it, otherwise we are done. *) if token_prec < expr_prec then lhs else begin (* Eat the binop. *) Stream.junk stream; (* Parse the primary expression after the binary operator. *) let rhs = parse_primary stream in (* Okay, we know this is a binop. *) let rhs = match Stream.peek stream with | Some (Token.Kwd c2) -> (* If BinOp binds less tightly with rhs than the operator after * rhs, let the pending operator take rhs as its lhs. *) let next_prec = precedence c2 in if token_prec < next_prec then parse_bin_rhs (token_prec + 1) rhs stream else rhs | _ -> rhs in (* Merge lhs/rhs. 
*) let lhs = Ast.Binary (c, lhs, rhs) in parse_bin_rhs expr_prec lhs stream end | _ -> lhs (* expression * ::= primary binoprhs *) and parse_expr = parser | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream (* prototype * ::= id '(' id* ')' *) let parse_prototype = let rec parse_args accumulator = parser | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e | [< >] -> accumulator in parser | [< 'Token.Ident id; 'Token.Kwd '(' ?? "expected '(' in prototype"; args=parse_args []; 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> (* success. *) Ast.Prototype (id, Array.of_list (List.rev args)) | [< >] -> raise (Stream.Error "expected function name in prototype") (* definition ::= 'def' prototype expression *) let parse_definition = parser | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> Ast.Function (p, e) (* toplevelexpr ::= expression *) let parse_toplevel = parser | [< e=parse_expr >] -> (* Make an anonymous proto. *) Ast.Function (Ast.Prototype ("", [||]), e) (* external ::= 'extern' prototype *) let parse_extern = parser | [< 'Token.Extern; e=parse_prototype >] -> e
- codegen.ml:
(*===----------------------------------------------------------------------=== * Code Generation *===----------------------------------------------------------------------===*) open Llvm exception Error of string let context = global_context () let the_module = create_module context "my cool jit" let builder = builder context let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 let double_type = double_type context let rec codegen_expr = function | Ast.Number n -> const_float double_type n | Ast.Variable name -> (try Hashtbl.find named_values name with | Not_found -> raise (Error "unknown variable name")) | Ast.Binary (op, lhs, rhs) -> let lhs_val = codegen_expr lhs in let rhs_val = codegen_expr rhs in begin match op with | '+' -> build_add lhs_val rhs_val "addtmp" builder | '-' -> build_sub lhs_val rhs_val "subtmp" builder | '*' -> build_mul lhs_val rhs_val "multmp" builder | '<' -> (* Convert bool 0/1 to double 0.0 or 1.0 *) let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in build_uitofp i double_type "booltmp" builder | _ -> raise (Error "invalid binary operator") end | Ast.Call (callee, args) -> (* Look up the name in the module table. *) let callee = match lookup_function callee the_module with | Some callee -> callee | None -> raise (Error "unknown function referenced") in let params = params callee in (* If argument mismatch error. *) if Array.length params == Array.length args then () else raise (Error "incorrect # arguments passed"); let args = Array.map codegen_expr args in build_call callee args "calltmp" builder | Ast.If (cond, then_, else_) -> let cond = codegen_expr cond in (* Convert condition to a bool by comparing equal to 0.0 *) let zero = const_float double_type 0.0 in let cond_val = build_fcmp Fcmp.One cond zero "ifcond" builder in (* Grab the first block so that we might later add the conditional branch * to it at the end of the function. *) let start_bb = insertion_block builder in let the_function = block_parent start_bb in let then_bb = append_block context "then" the_function in (* Emit 'then' value. *) position_at_end then_bb builder; let then_val = codegen_expr then_ in (* Codegen of 'then' can change the current block, update then_bb for the * phi. We create a new name because one is used for the phi node, and the * other is used for the conditional branch. *) let new_then_bb = insertion_block builder in (* Emit 'else' value. *) let else_bb = append_block context "else" the_function in position_at_end else_bb builder; let else_val = codegen_expr else_ in (* Codegen of 'else' can change the current block, update else_bb for the * phi. *) let new_else_bb = insertion_block builder in (* Emit merge block. *) let merge_bb = append_block context "ifcont" the_function in position_at_end merge_bb builder; let incoming = [(then_val, new_then_bb); (else_val, new_else_bb)] in let phi = build_phi incoming "iftmp" builder in (* Return to the start block to add the conditional branch. *) position_at_end start_bb builder; ignore (build_cond_br cond_val then_bb else_bb builder); (* Set a unconditional branch at the end of the 'then' block and the * 'else' block to the 'merge' block. *) position_at_end new_then_bb builder; ignore (build_br merge_bb builder); position_at_end new_else_bb builder; ignore (build_br merge_bb builder); (* Finally, set the builder to the end of the merge block. *) position_at_end merge_bb builder; phi | Ast.For (var_name, start, end_, step, body) -> (* Emit the start code first, without 'variable' in scope. 
*) let start_val = codegen_expr start in (* Make the new basic block for the loop header, inserting after current * block. *) let preheader_bb = insertion_block builder in let the_function = block_parent preheader_bb in let loop_bb = append_block context "loop" the_function in (* Insert an explicit fall through from the current block to the * loop_bb. *) ignore (build_br loop_bb builder); (* Start insertion in loop_bb. *) position_at_end loop_bb builder; (* Start the PHI node with an entry for start. *) let variable = build_phi [(start_val, preheader_bb)] var_name builder in (* Within the loop, the variable is defined equal to the PHI node. If it * shadows an existing variable, we have to restore it, so save it * now. *) let old_val = try Some (Hashtbl.find named_values var_name) with Not_found -> None in Hashtbl.add named_values var_name variable; (* Emit the body of the loop. This, like any other expr, can change the * current BB. Note that we ignore the value computed by the body, but * don't allow an error *) ignore (codegen_expr body); (* Emit the step value. *) let step_val = match step with | Some step -> codegen_expr step (* If not specified, use 1.0. *) | None -> const_float double_type 1.0 in let next_var = build_add variable step_val "nextvar" builder in (* Compute the end condition. *) let end_cond = codegen_expr end_ in (* Convert condition to a bool by comparing equal to 0.0. *) let zero = const_float double_type 0.0 in let end_cond = build_fcmp Fcmp.One end_cond zero "loopcond" builder in (* Create the "after loop" block and insert it. *) let loop_end_bb = insertion_block builder in let after_bb = append_block context "afterloop" the_function in (* Insert the conditional branch into the end of loop_end_bb. *) ignore (build_cond_br end_cond loop_bb after_bb builder); (* Any new code will be inserted in after_bb. *) position_at_end after_bb builder; (* Add a new entry to the PHI node for the backedge. *) add_incoming (next_var, loop_end_bb) variable; (* Restore the unshadowed variable. *) begin match old_val with | Some old_val -> Hashtbl.add named_values var_name old_val | None -> () end; (* for expr always returns 0.0. *) const_null double_type let codegen_proto = function | Ast.Prototype (name, args) -> (* Make the function type: double(double,double) etc. *) let doubles = Array.make (Array.length args) double_type in let ft = function_type double_type doubles in let f = match lookup_function name the_module with | None -> declare_function name ft the_module (* If 'f' conflicted, there was already something named 'name'. If it * has a body, don't allow redefinition or reextern. *) | Some f -> (* If 'f' already has a body, reject this. *) if block_begin f <> At_end f then raise (Error "redefinition of function"); (* If 'f' took a different number of arguments, reject. *) if element_type (type_of f) <> ft then raise (Error "redefinition of function with different # args"); f in (* Set names for all arguments. *) Array.iteri (fun i a -> let n = args.(i) in set_value_name n a; Hashtbl.add named_values n a; ) (params f); f let codegen_func the_fpm = function | Ast.Function (proto, body) -> Hashtbl.clear named_values; let the_function = codegen_proto proto in (* Create a new basic block to start insertion into. *) let bb = append_block context "entry" the_function in position_at_end bb builder; try let ret_val = codegen_expr body in (* Finish off the function. *) let _ = build_ret ret_val builder in (* Validate the generated code, checking for consistency. 
*) Llvm_analysis.assert_valid_function the_function; (* Optimize the function. *) let _ = PassManager.run_function the_function the_fpm in the_function with e -> delete_function the_function; raise e
- toplevel.ml:
(*===----------------------------------------------------------------------=== * Top-Level parsing and JIT Driver *===----------------------------------------------------------------------===*) open Llvm open Llvm_executionengine (* top ::= definition | external | expression | ';' *) let rec main_loop the_fpm the_execution_engine stream = match Stream.peek stream with | None -> () (* ignore top-level semicolons. *) | Some (Token.Kwd ';') -> Stream.junk stream; main_loop the_fpm the_execution_engine stream | Some token -> begin try match token with | Token.Def -> let e = Parser.parse_definition stream in print_endline "parsed a function definition."; dump_value (Codegen.codegen_func the_fpm e); | Token.Extern -> let e = Parser.parse_extern stream in print_endline "parsed an extern."; dump_value (Codegen.codegen_proto e); | _ -> (* Evaluate a top-level expression into an anonymous function. *) let e = Parser.parse_toplevel stream in print_endline "parsed a top-level expr"; let the_function = Codegen.codegen_func the_fpm e in dump_value the_function; (* JIT the function, returning a function pointer. *) let result = ExecutionEngine.run_function the_function [||] the_execution_engine in print_string "Evaluated to "; print_float (GenericValue.as_float Codegen.double_type result); print_newline (); with Stream.Error s | Codegen.Error s -> (* Skip token for error recovery. *) Stream.junk stream; print_endline s; end; print_string "ready> "; flush stdout; main_loop the_fpm the_execution_engine stream
- toy.ml:
(*===----------------------------------------------------------------------=== * Main driver code. *===----------------------------------------------------------------------===*) open Llvm open Llvm_executionengine open Llvm_target open Llvm_scalar_opts let main () = ignore (initialize_native_target ()); (* Install standard binary operators. * 1 is the lowest precedence. *) Hashtbl.add Parser.binop_precedence '<' 10; Hashtbl.add Parser.binop_precedence '+' 20; Hashtbl.add Parser.binop_precedence '-' 20; Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) (* Prime the first token. *) print_string "ready> "; flush stdout; let stream = Lexer.lex (Stream.of_channel stdin) in (* Create the JIT. *) let the_execution_engine = ExecutionEngine.create Codegen.the_module in let the_fpm = PassManager.create_function Codegen.the_module in (* Set up the optimizer pipeline. Start with registering info about how the * target lays out data structures. *) DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm; (* Do simple "peephole" optimizations and bit-twiddling optzn. *) add_instruction_combination the_fpm; (* reassociate expressions. *) add_reassociation the_fpm; (* Eliminate Common SubExpressions. *) add_gvn the_fpm; (* Simplify the control flow graph (deleting unreachable blocks, etc). *) add_cfg_simplification the_fpm; ignore (PassManager.initialize the_fpm); (* Run the main "interpreter loop" now. *) Toplevel.main_loop the_fpm the_execution_engine stream; (* Print out all the generated code. *) dump_module Codegen.the_module ;; main ()
- bindings.c
#include <stdio.h>

/* putchard - putchar that takes a double and returns 0. */
extern double putchard(double X) {
  putchar((char)X);
  return 0;
}
Kaleidoscope: Extending the Language: User-defined Operators¶
Chapter 6 Introduction¶
Welcome to Chapter 6 of the “Implementing a language with LLVM” tutorial. At this point in our tutorial, we now have a fully functional language that is fairly minimal, but also useful. There is still one big problem with it, however. Our language doesn’t have many useful operators (like division, logical negation, or even any comparisons besides less-than).
This chapter of the tutorial takes a wild digression into adding user-defined operators to the simple and beautiful Kaleidoscope language. This digression now gives us a simple and ugly language in some ways, but also a powerful one at the same time. One of the great things about creating your own language is that you get to decide what is good or bad. In this tutorial we’ll assume that it is okay to use this as a way to show some interesting parsing techniques.
At the end of this tutorial, we’ll run through an example Kaleidoscope application that renders the Mandelbrot set. This gives an example of what you can build with Kaleidoscope and its feature set.
User-defined Operators: the Idea¶
The “operator overloading” that we will add to Kaleidoscope is more general than in languages like C++. In C++, you are only allowed to redefine existing operators: you can’t programmatically change the grammar, introduce new operators, change precedence levels, etc. In this chapter, we will add this capability to Kaleidoscope, which will let the user round out the set of operators that are supported.
The point of going into user-defined operators in a tutorial like this is to show the power and flexibility of using a hand-written parser. Thus far, the parser we have been implementing uses recursive descent for most parts of the grammar and operator precedence parsing for the expressions. See Chapter 2 for details. Without using operator precedence parsing, it would be very difficult to allow the programmer to introduce new operators into the grammar: the grammar is dynamically extensible as the JIT runs.
The two specific features we’ll add are programmable unary operators (right now, Kaleidoscope has no unary operators at all) as well as binary operators. An example of this is:
# Logical unary not.
def unary!(v)
if v then
0
else
1;
# Define > with the same precedence as <.
def binary> 10 (LHS RHS)
RHS < LHS;
# Binary "logical or", (note that it does not "short circuit")
def binary| 5 (LHS RHS)
if LHS then
1
else if RHS then
1
else
0;
# Define = with slightly lower precedence than relationals.
def binary= 9 (LHS RHS)
!(LHS < RHS | LHS > RHS);
Many languages aspire to being able to implement their standard runtime library in the language itself. In Kaleidoscope, we can implement significant parts of the language in the library!
We will break down implementation of these features into two parts: implementing support for user-defined binary operators and adding unary operators.
User-defined Binary Operators¶
Adding support for user-defined binary operators is pretty simple with our current framework. We’ll first add support for the unary/binary keywords:
type token =
...
(* operators *)
| Binary | Unary
...
and lex_ident buffer = parser
...
| "for" -> [< 'Token.For; stream >]
| "in" -> [< 'Token.In; stream >]
| "binary" -> [< 'Token.Binary; stream >]
| "unary" -> [< 'Token.Unary; stream >]
This just adds lexer support for the unary and binary keywords, like we did in previous chapters. One nice thing about our current AST is that we represent binary operators with full generalisation by using their ASCII code as the opcode. For our extended operators, we’ll use this same representation, so we don’t need any new AST or parser support.
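Concretely, once ‘|’ has been installed in the precedence table (see below), an input like “a | b” parses into exactly the same node shape as the built-in operators do; a sketch:
Ast.Binary ('|', Ast.Variable "a", Ast.Variable "b")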
On the other hand, we have to be able to represent the definitions of these new operators, in the “def binary| 5” part of the function definition. In our grammar so far, the “name” for the function definition is parsed as the “prototype” production and into the Ast.Prototype AST node. To represent our new user-defined operators as prototypes, we have to extend the Ast.Prototype AST node like this:
(* proto - This type represents the "prototype" for a function, which captures
* its name, and its argument names (thus implicitly the number of arguments the
* function takes). *)
type proto =
| Prototype of string * string array
| BinOpPrototype of string * string array * int
Basically, in addition to knowing a name for the prototype, we now keep track of whether it was an operator, and if it was, what precedence level the operator is at. The precedence is only used for binary operators (as you’ll see below, it just doesn’t apply for unary operators). Now that we have a way to represent the prototype for a user-defined operator, we need to parse it:
(* prototype
* ::= id '(' id* ')'
* ::= binary LETTER number? (id, id)
* ::= unary LETTER number? (id) *)
let parse_prototype =
let rec parse_args accumulator = parser
| [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e
| [< >] -> accumulator
in
let parse_operator = parser
| [< 'Token.Unary >] -> "unary", 1
| [< 'Token.Binary >] -> "binary", 2
in
let parse_binary_precedence = parser
| [< 'Token.Number n >] -> int_of_float n
| [< >] -> 30
in
parser
| [< 'Token.Ident id;
'Token.Kwd '(' ?? "expected '(' in prototype";
args=parse_args [];
'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
(* success. *)
Ast.Prototype (id, Array.of_list (List.rev args))
| [< (prefix, kind)=parse_operator;
'Token.Kwd op ?? "expected an operator";
(* Read the precedence if present. *)
binary_precedence=parse_binary_precedence;
'Token.Kwd '(' ?? "expected '(' in prototype";
args=parse_args [];
'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
let name = prefix ^ (String.make 1 op) in
let args = Array.of_list (List.rev args) in
(* Verify right number of arguments for operator. *)
if Array.length args != kind
then raise (Stream.Error "invalid number of operands for operator")
else
if kind == 1 then
Ast.Prototype (name, args)
else
Ast.BinOpPrototype (name, args, binary_precedence)
| [< >] ->
raise (Stream.Error "expected function name in prototype")
This is all fairly straightforward parsing code, and we have already seen a lot of similar code in the past. One interesting part of the code above is the couple of lines that set up name for binary operators. This builds names like “binary@” for a newly defined “@” operator. It takes advantage of the fact that symbol names in the LLVM symbol table are allowed to have any character in them, including embedded nul characters.
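As a small sketch of the mangling scheme (mangled_name is a hypothetical helper, not part of the tutorial code), the name is just the keyword glued to the operator character, and the same string is later used for the symbol table lookup in codegen:
(* Build the function name for a user-defined operator, e.g. "binary@". *)
let mangled_name kind op = kind ^ String.make 1 op

let () = assert (mangled_name "binary" '@' = "binary@")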
The next interesting thing to add is codegen support for these binary operators. Given our current structure, this is a simple addition of a default case for our existing binary operator node:
let rec codegen_expr = function
...
| Ast.Binary (op, lhs, rhs) ->
let lhs_val = codegen_expr lhs in
let rhs_val = codegen_expr rhs in
begin
match op with
| '+' -> build_add lhs_val rhs_val "addtmp" builder
| '-' -> build_sub lhs_val rhs_val "subtmp" builder
| '*' -> build_mul lhs_val rhs_val "multmp" builder
| '<' ->
(* Convert bool 0/1 to double 0.0 or 1.0 *)
let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in
build_uitofp i double_type "booltmp" builder
| _ ->
(* If it wasn't a builtin binary operator, it must be a user defined
* one. Emit a call to it. *)
let callee = "binary" ^ (String.make 1 op) in
let callee =
match lookup_function callee the_module with
| Some callee -> callee
| None -> raise (Error "binary operator not found!")
in
build_call callee [|lhs_val; rhs_val|] "binop" builder
end
As you can see above, the new code is actually really simple. It just does a lookup for the appropriate operator in the symbol table and generates a function call to it. Since user-defined operators are just built as normal functions (because the “prototype” boils down to a function with the right name) everything falls into place.
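Assuming the rest of the chapter is in place, a hypothetical session might look like this (IR dumps elided); since ‘@’ is registered with precedence 15, it binds tighter than ‘<’ (10) but looser than ‘+’ (20):
ready> def binary@ 15 (LHS RHS) LHS + RHS*2;
...
ready> 1 @ 2;
...
Evaluated to 5.000000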
The final piece of code we are missing is a bit of top-level magic:
let codegen_func the_fpm = function
| Ast.Function (proto, body) ->
Hashtbl.clear named_values;
let the_function = codegen_proto proto in
(* If this is an operator, install it. *)
begin match proto with
| Ast.BinOpPrototype (name, args, prec) ->
let op = name.[String.length name - 1] in
Hashtbl.add Parser.binop_precedence op prec;
| _ -> ()
end;
(* Create a new basic block to start insertion into. *)
let bb = append_block context "entry" the_function in
position_at_end bb builder;
...
Basically, before codegening a function, if it is a user-defined operator, we register it in the precedence table. This allows the binary operator parsing logic we already have in place to handle it. Since we are working on a fully-general operator precedence parser, this is all we need to do to “extend the grammar”.
Now we have useful user-defined binary operators. This builds a lot on the previous framework we built for other operators. Adding unary operators is a bit more challenging, because we don’t have any framework for it yet. Let’s see what it takes.
User-defined Unary Operators¶
Since we don’t currently support unary operators in the Kaleidoscope language, we’ll need to add everything to support them. Above, we added simple support for the ‘unary’ keyword to the lexer. In addition to that, we need an AST node:
type expr =
...
(* variant for a unary operator. *)
| Unary of char * expr
...
This AST node is very simple and obvious by now. It directly mirrors the binary operator AST node, except that it only has one child. With this, we need to add the parsing logic. Parsing a unary operator is pretty simple: we’ll add a new function to do it:
(* unary
* ::= primary
* ::= '!' unary *)
and parse_unary = parser
(* If this is a unary operator, read it. *)
| [< 'Token.Kwd op when op != '(' && op != ')'; operand=parse_unary >] ->
Ast.Unary (op, operand)
(* If the current token is not an operator, it must be a primary expr. *)
| [< stream >] -> parse_primary stream
The grammar we add is pretty straightforward here. If we see a unary operator when parsing a primary operator, we eat the operator as a prefix and parse the remaining piece as another unary operator. This allows us to handle multiple unary operators (e.g. ”!!x”). Note that unary operators can’t have ambiguous parses like binary operators can, so there is no need for precedence information.
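For instance, the “!!x” input mentioned above ends up as nested nodes; a sketch of the resulting AST value:
Ast.Unary ('!', Ast.Unary ('!', Ast.Variable "x"))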
The problem with this function is that we need to call parse_unary from somewhere. To do this, we change previous callers of parse_primary to call parse_unary instead:
(* binoprhs
* ::= ('+' primary)* *)
and parse_bin_rhs expr_prec lhs stream =
...
(* Parse the unary expression after the binary operator. *)
let rhs = parse_unary stream in
...
...
(* expression
* ::= primary binoprhs *)
and parse_expr = parser
| [< lhs=parse_unary; stream >] -> parse_bin_rhs 0 lhs stream
With these two simple changes, we are now able to parse unary operators and build the AST for them. Next up, we need parser support for prototypes, to parse the unary operator prototype. The parse_prototype function shown in the binary operator section above already handles this case: parse_operator accepts the ‘unary’ keyword, and a one-operand prototype becomes an ordinary Ast.Prototype whose name includes the operator character (e.g. “unary!”).
As with binary operators, we name unary operators with a name that includes the operator character. This assists us at code generation time. Speaking of, the final piece we need to add is codegen support for unary operators. It looks like this:
let rec codegen_expr = function
...
| Ast.Unary (op, operand) ->
let operand = codegen_expr operand in
let callee = "unary" ^ (String.make 1 op) in
let callee =
match lookup_function callee the_module with
| Some callee -> callee
| None -> raise (Error "unknown unary operator")
in
build_call callee [|operand|] "unop" builder
This code is similar to, but simpler than, the code for binary operators. It is simpler primarily because it doesn’t need to handle any predefined operators.
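Note that the mangled name is the only glue between the prototype parser and this codegen: both sides build the string the same way. A tiny sketch, using '!' as the example character:
(* parse_prototype names "def unary!(v) ..." by concatenation, and
 * codegen_expr rebuilds the identical string for Ast.Unary ('!', _),
 * so lookup_function finds the definition. *)
let mangled_name = "unary" ^ (String.make 1 '!')   (* "unary!" *)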
Kicking the Tires¶
It is somewhat hard to believe, but with a few simple extensions we’ve covered in the last chapters, we have grown a real-ish language. With this, we can do a lot of interesting things, including I/O, math, and a bunch of other things. For example, we can now add a nice sequencing operator (printd is defined to print out the specified value and a newline):
ready> extern printd(x);
Read extern: declare double @printd(double)
ready> def binary : 1 (x y) 0; # Low-precedence operator that ignores operands.
..
ready> printd(123) : printd(456) : printd(789);
123.000000
456.000000
789.000000
Evaluated to 0.000000
We can also define a bunch of other “primitive” operations, such as:
# Logical unary not.
def unary!(v)
if v then
0
else
1;
# Unary negate.
def unary-(v)
0-v;
# Define > with the same precedence as <.
def binary> 10 (LHS RHS)
RHS < LHS;
# Binary logical or, which does not short circuit.
def binary| 5 (LHS RHS)
if LHS then
1
else if RHS then
1
else
0;
# Binary logical and, which does not short circuit.
def binary& 6 (LHS RHS)
if !LHS then
0
else
!!RHS;
# Define = with slightly lower precedence than relationals.
def binary= 9 (LHS RHS)
!(LHS < RHS | LHS > RHS);
Given the previous if/then/else support, we can also define interesting functions for I/O. For example, the following prints out a character whose “density” reflects the value passed in: the lower the value, the denser the character:
ready>
extern putchard(char)
def printdensity(d)
if d > 8 then
putchard(32) # ' '
else if d > 4 then
putchard(46) # '.'
else if d > 2 then
putchard(43) # '+'
else
putchard(42); # '*'
...
ready> printdensity(1): printdensity(2): printdensity(3) :
printdensity(4): printdensity(5): printdensity(9): putchard(10);
*++..
Evaluated to 0.000000
Based on these simple primitive operations, we can start to define more interesting things. For example, here’s a little function that solves for the number of iterations it takes a function in the complex plane to converge:
# determine whether the specific location diverges.
# Solve for z = z^2 + c in the complex plane.
def mandleconverger(real imag iters creal cimag)
if iters > 255 | (real*real + imag*imag > 4) then
iters
else
mandleconverger(real*real - imag*imag + creal,
2*real*imag + cimag,
iters+1, creal, cimag);
# return the number of iterations required for the iteration to escape
def mandleconverge(real imag)
mandleconverger(real, imag, 0, real, imag);
This "z = z^2 + c" function is a beautiful little creature that is the basis for computation of the Mandelbrot Set. Our mandleconverge function returns the number of iterations that it takes for a complex orbit to escape, saturating to 255. This is not a very useful function by itself, but if you plot its value over a two-dimensional plane, you can see the Mandelbrot set. Given that we are limited to using putchard here, our amazing graphical output is limited, but we can whip together something using the density plotter above:
# compute and plot the mandelbrot set with the specified 2 dimensional range
# info.
def mandelhelp(xmin xmax xstep ymin ymax ystep)
for y = ymin, y < ymax, ystep in (
(for x = xmin, x < xmax, xstep in
printdensity(mandleconverge(x,y)))
: putchard(10)
)
# mandel - This is a convenient helper function for plotting the mandelbrot set
# from the specified position with the specified magnification.
def mandel(realstart imagstart realmag imagmag)
mandelhelp(realstart, realstart+realmag*78, realmag,
imagstart, imagstart+imagmag*40, imagmag);
Given this, we can try plotting out the Mandelbrot set! Let's try it out:
ready> mandel(-2.3, -1.3, 0.05, 0.07);
*******************************+++++++++++*************************************
*************************+++++++++++++++++++++++*******************************
**********************+++++++++++++++++++++++++++++****************************
*******************+++++++++++++++++++++.. ...++++++++*************************
*****************++++++++++++++++++++++.... ...+++++++++***********************
***************+++++++++++++++++++++++..... ...+++++++++*********************
**************+++++++++++++++++++++++.... ....+++++++++********************
*************++++++++++++++++++++++...... .....++++++++*******************
************+++++++++++++++++++++....... .......+++++++******************
***********+++++++++++++++++++.... ... .+++++++*****************
**********+++++++++++++++++....... .+++++++****************
*********++++++++++++++........... ...+++++++***************
********++++++++++++............ ...++++++++**************
********++++++++++... .......... .++++++++**************
*******+++++++++..... .+++++++++*************
*******++++++++...... ..+++++++++*************
*******++++++....... ..+++++++++*************
*******+++++...... ..+++++++++*************
*******.... .... ...+++++++++*************
*******.... . ...+++++++++*************
*******+++++...... ...+++++++++*************
*******++++++....... ..+++++++++*************
*******++++++++...... .+++++++++*************
*******+++++++++..... ..+++++++++*************
********++++++++++... .......... .++++++++**************
********++++++++++++............ ...++++++++**************
*********++++++++++++++.......... ...+++++++***************
**********++++++++++++++++........ .+++++++****************
**********++++++++++++++++++++.... ... ..+++++++****************
***********++++++++++++++++++++++....... .......++++++++*****************
************+++++++++++++++++++++++...... ......++++++++******************
**************+++++++++++++++++++++++.... ....++++++++********************
***************+++++++++++++++++++++++..... ...+++++++++*********************
*****************++++++++++++++++++++++.... ...++++++++***********************
*******************+++++++++++++++++++++......++++++++*************************
*********************++++++++++++++++++++++.++++++++***************************
*************************+++++++++++++++++++++++*******************************
******************************+++++++++++++************************************
*******************************************************************************
*******************************************************************************
*******************************************************************************
Evaluated to 0.000000
ready> mandel(-2, -1, 0.02, 0.04);
**************************+++++++++++++++++++++++++++++++++++++++++++++++++++++
***********************++++++++++++++++++++++++++++++++++++++++++++++++++++++++
*********************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.
*******************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++...
*****************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.....
***************++++++++++++++++++++++++++++++++++++++++++++++++++++++++........
**************++++++++++++++++++++++++++++++++++++++++++++++++++++++...........
************+++++++++++++++++++++++++++++++++++++++++++++++++++++..............
***********++++++++++++++++++++++++++++++++++++++++++++++++++........ .
**********++++++++++++++++++++++++++++++++++++++++++++++.............
********+++++++++++++++++++++++++++++++++++++++++++..................
*******+++++++++++++++++++++++++++++++++++++++.......................
******+++++++++++++++++++++++++++++++++++...........................
*****++++++++++++++++++++++++++++++++............................
*****++++++++++++++++++++++++++++...............................
****++++++++++++++++++++++++++...... .........................
***++++++++++++++++++++++++......... ...... ...........
***++++++++++++++++++++++............
**+++++++++++++++++++++..............
**+++++++++++++++++++................
*++++++++++++++++++.................
*++++++++++++++++............ ...
*++++++++++++++..............
*+++....++++................
*.......... ...........
*
*.......... ...........
*+++....++++................
*++++++++++++++..............
*++++++++++++++++............ ...
*++++++++++++++++++.................
**+++++++++++++++++++................
**+++++++++++++++++++++..............
***++++++++++++++++++++++............
***++++++++++++++++++++++++......... ...... ...........
****++++++++++++++++++++++++++...... .........................
*****++++++++++++++++++++++++++++...............................
*****++++++++++++++++++++++++++++++++............................
******+++++++++++++++++++++++++++++++++++...........................
*******+++++++++++++++++++++++++++++++++++++++.......................
********+++++++++++++++++++++++++++++++++++++++++++..................
Evaluated to 0.000000
ready> mandel(-0.9, -1.4, 0.02, 0.03);
*******************************************************************************
*******************************************************************************
*******************************************************************************
**********+++++++++++++++++++++************************************************
*+++++++++++++++++++++++++++++++++++++++***************************************
+++++++++++++++++++++++++++++++++++++++++++++**********************************
++++++++++++++++++++++++++++++++++++++++++++++++++*****************************
++++++++++++++++++++++++++++++++++++++++++++++++++++++*************************
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++**********************
+++++++++++++++++++++++++++++++++.........++++++++++++++++++*******************
+++++++++++++++++++++++++++++++.... ......+++++++++++++++++++****************
+++++++++++++++++++++++++++++....... ........+++++++++++++++++++**************
++++++++++++++++++++++++++++........ ........++++++++++++++++++++************
+++++++++++++++++++++++++++......... .. ...+++++++++++++++++++++**********
++++++++++++++++++++++++++........... ....++++++++++++++++++++++********
++++++++++++++++++++++++............. .......++++++++++++++++++++++******
+++++++++++++++++++++++............. ........+++++++++++++++++++++++****
++++++++++++++++++++++........... ..........++++++++++++++++++++++***
++++++++++++++++++++........... .........++++++++++++++++++++++*
++++++++++++++++++............ ...........++++++++++++++++++++
++++++++++++++++............... .............++++++++++++++++++
++++++++++++++................. ...............++++++++++++++++
++++++++++++.................. .................++++++++++++++
+++++++++.................. .................+++++++++++++
++++++........ . ......... ..++++++++++++
++............ ...... ....++++++++++
.............. ...++++++++++
.............. ....+++++++++
.............. .....++++++++
............. ......++++++++
........... .......++++++++
......... ........+++++++
......... ........+++++++
......... ....+++++++
........ ...+++++++
....... ...+++++++
....+++++++
.....+++++++
....+++++++
....+++++++
....+++++++
Evaluated to 0.000000
ready> ^D
At this point, you may be starting to realize that Kaleidoscope is a real and powerful language. It may not be self-similar :), but it can be used to plot things that are!
With this, we conclude the “adding user-defined operators” chapter of the tutorial. We have successfully augmented our language, adding the ability to extend the language in the library, and we have shown how this can be used to build a simple but interesting end-user application in Kaleidoscope. At this point, Kaleidoscope can build a variety of applications that are functional and can call functions with side-effects, but it can’t actually define and mutate a variable itself.
Strikingly, variable mutation is an important feature of some languages, and it is not at all obvious how to add support for mutable variables without having to add an “SSA construction” phase to your front-end. In the next chapter, we will describe how you can add variable mutation without building SSA in your front-end.
Full Code Listing¶
Here is the complete code listing for our running example, enhanced with support for user-defined operators. To build this example, use:
# Compile
ocamlbuild toy.byte
# Run
./toy.byte
Here is the code:
- _tags:
<{lexer,parser}.ml>: use_camlp4, pp(camlp4of)
<*.{byte,native}>: g++, use_llvm, use_llvm_analysis
<*.{byte,native}>: use_llvm_executionengine, use_llvm_target
<*.{byte,native}>: use_llvm_scalar_opts, use_bindings
- myocamlbuild.ml:
open Ocamlbuild_plugin;;

ocaml_lib ~extern:true "llvm";;
ocaml_lib ~extern:true "llvm_analysis";;
ocaml_lib ~extern:true "llvm_executionengine";;
ocaml_lib ~extern:true "llvm_target";;
ocaml_lib ~extern:true "llvm_scalar_opts";;

flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"; A"-cclib"; A"-rdynamic"]);;
dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];;
- token.ml:
(*===----------------------------------------------------------------------===
 * Lexer Tokens
 *===----------------------------------------------------------------------===*)

(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of
 * these others for known things. *)
type token =
  (* commands *)
  | Def | Extern

  (* primary *)
  | Ident of string | Number of float

  (* unknown *)
  | Kwd of char

  (* control *)
  | If | Then | Else
  | For | In

  (* operators *)
  | Binary | Unary
- lexer.ml:
(*===----------------------------------------------------------------------===
 * Lexer
 *===----------------------------------------------------------------------===*)

let rec lex = parser
  (* Skip any whitespace. *)
  | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream

  (* identifier: [a-zA-Z][a-zA-Z0-9] *)
  | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] ->
      let buffer = Buffer.create 1 in
      Buffer.add_char buffer c;
      lex_ident buffer stream

  (* number: [0-9.]+ *)
  | [< ' ('0' .. '9' as c); stream >] ->
      let buffer = Buffer.create 1 in
      Buffer.add_char buffer c;
      lex_number buffer stream

  (* Comment until end of line. *)
  | [< ' ('#'); stream >] ->
      lex_comment stream

  (* Otherwise, just return the character as its ascii value. *)
  | [< 'c; stream >] ->
      [< 'Token.Kwd c; lex stream >]

  (* end of stream. *)
  | [< >] -> [< >]

and lex_number buffer = parser
  | [< ' ('0' .. '9' | '.' as c); stream >] ->
      Buffer.add_char buffer c;
      lex_number buffer stream
  | [< stream=lex >] ->
      [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >]

and lex_ident buffer = parser
  | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] ->
      Buffer.add_char buffer c;
      lex_ident buffer stream
  | [< stream=lex >] ->
      match Buffer.contents buffer with
      | "def" -> [< 'Token.Def; stream >]
      | "extern" -> [< 'Token.Extern; stream >]
      | "if" -> [< 'Token.If; stream >]
      | "then" -> [< 'Token.Then; stream >]
      | "else" -> [< 'Token.Else; stream >]
      | "for" -> [< 'Token.For; stream >]
      | "in" -> [< 'Token.In; stream >]
      | "binary" -> [< 'Token.Binary; stream >]
      | "unary" -> [< 'Token.Unary; stream >]
      | id -> [< 'Token.Ident id; stream >]

and lex_comment = parser
  | [< ' ('\n'); stream=lex >] -> stream
  | [< 'c; e=lex_comment >] -> e
  | [< >] -> [< >]
- ast.ml:
(*===----------------------------------------------------------------------===
 * Abstract Syntax Tree (aka Parse Tree)
 *===----------------------------------------------------------------------===*)

(* expr - Base type for all expression nodes. *)
type expr =
  (* variant for numeric literals like "1.0". *)
  | Number of float

  (* variant for referencing a variable, like "a". *)
  | Variable of string

  (* variant for a unary operator. *)
  | Unary of char * expr

  (* variant for a binary operator. *)
  | Binary of char * expr * expr

  (* variant for function calls. *)
  | Call of string * expr array

  (* variant for if/then/else. *)
  | If of expr * expr * expr

  (* variant for for/in. *)
  | For of string * expr * expr * expr option * expr

(* proto - This type represents the "prototype" for a function, which captures
 * its name, and its argument names (thus implicitly the number of arguments the
 * function takes). *)
type proto =
  | Prototype of string * string array
  | BinOpPrototype of string * string array * int

(* func - This type represents a function definition itself. *)
type func = Function of proto * expr
- parser.ml:
(*===---------------------------------------------------------------------===
 * Parser
 *===---------------------------------------------------------------------===*)

(* binop_precedence - This holds the precedence for each binary operator that is
 * defined *)
let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10

(* precedence - Get the precedence of the pending binary operator token. *)
let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1

(* primary
 *   ::= identifier
 *   ::= numberexpr
 *   ::= parenexpr
 *   ::= ifexpr
 *   ::= forexpr *)
let rec parse_primary = parser
  (* numberexpr ::= number *)
  | [< 'Token.Number n >] -> Ast.Number n

  (* parenexpr ::= '(' expression ')' *)
  | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e

  (* identifierexpr
   *   ::= identifier
   *   ::= identifier '(' argumentexpr ')' *)
  | [< 'Token.Ident id; stream >] ->
      let rec parse_args accumulator = parser
        | [< e=parse_expr; stream >] ->
            begin parser
              | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e
              | [< >] -> e :: accumulator
            end stream
        | [< >] -> accumulator
      in
      let rec parse_ident id = parser
        (* Call. *)
        | [< 'Token.Kwd '(';
             args=parse_args [];
             'Token.Kwd ')' ?? "expected ')'">] ->
            Ast.Call (id, Array.of_list (List.rev args))

        (* Simple variable ref. *)
        | [< >] -> Ast.Variable id
      in
      parse_ident id stream

  (* ifexpr ::= 'if' expr 'then' expr 'else' expr *)
  | [< 'Token.If; c=parse_expr;
       'Token.Then ?? "expected 'then'"; t=parse_expr;
       'Token.Else ?? "expected 'else'"; e=parse_expr >] ->
      Ast.If (c, t, e)

  (* forexpr
        ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression *)
  | [< 'Token.For;
       'Token.Ident id ?? "expected identifier after for";
       'Token.Kwd '=' ?? "expected '=' after for";
       stream >] ->
      begin parser
        | [< start=parse_expr;
             'Token.Kwd ',' ?? "expected ',' after for";
             end_=parse_expr;
             stream >] ->
            let step =
              begin parser
              | [< 'Token.Kwd ','; step=parse_expr >] -> Some step
              | [< >] -> None
              end stream
            in
            begin parser
            | [< 'Token.In; body=parse_expr >] ->
                Ast.For (id, start, end_, step, body)
            | [< >] ->
                raise (Stream.Error "expected 'in' after for")
            end stream
        | [< >] ->
            raise (Stream.Error "expected '=' after for")
      end stream

  | [< >] -> raise (Stream.Error "unknown token when expecting an expression.")

(* unary
 *   ::= primary
 *   ::= '!' unary *)
and parse_unary = parser
  (* If this is a unary operator, read it. *)
  | [< 'Token.Kwd op when op != '(' && op != ')'; operand=parse_expr >] ->
      Ast.Unary (op, operand)

  (* If the current token is not an operator, it must be a primary expr. *)
  | [< stream >] -> parse_primary stream

(* binoprhs
 *   ::= ('+' primary)* *)
and parse_bin_rhs expr_prec lhs stream =
  match Stream.peek stream with
  (* If this is a binop, find its precedence. *)
  | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c ->
      let token_prec = precedence c in

      (* If this is a binop that binds at least as tightly as the current binop,
       * consume it, otherwise we are done. *)
      if token_prec < expr_prec then lhs else begin
        (* Eat the binop. *)
        Stream.junk stream;

        (* Parse the unary expression after the binary operator. *)
        let rhs = parse_unary stream in

        (* Okay, we know this is a binop. *)
        let rhs =
          match Stream.peek stream with
          | Some (Token.Kwd c2) ->
              (* If BinOp binds less tightly with rhs than the operator after
               * rhs, let the pending operator take rhs as its lhs. *)
              let next_prec = precedence c2 in
              if token_prec < next_prec
              then parse_bin_rhs (token_prec + 1) rhs stream
              else rhs
          | _ -> rhs
        in

        (* Merge lhs/rhs. *)
        let lhs = Ast.Binary (c, lhs, rhs) in
        parse_bin_rhs expr_prec lhs stream
      end
  | _ -> lhs

(* expression
 *   ::= primary binoprhs *)
and parse_expr = parser
  | [< lhs=parse_unary; stream >] -> parse_bin_rhs 0 lhs stream

(* prototype
 *   ::= id '(' id* ')'
 *   ::= binary LETTER number? (id, id)
 *   ::= unary LETTER number? (id) *)
let parse_prototype =
  let rec parse_args accumulator = parser
    | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e
    | [< >] -> accumulator
  in
  let parse_operator = parser
    | [< 'Token.Unary >] -> "unary", 1
    | [< 'Token.Binary >] -> "binary", 2
  in
  let parse_binary_precedence = parser
    | [< 'Token.Number n >] -> int_of_float n
    | [< >] -> 30
  in
  parser
  | [< 'Token.Ident id;
       'Token.Kwd '(' ?? "expected '(' in prototype";
       args=parse_args [];
       'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
      (* success. *)
      Ast.Prototype (id, Array.of_list (List.rev args))
  | [< (prefix, kind)=parse_operator;
       'Token.Kwd op ?? "expected an operator";
       (* Read the precedence if present. *)
       binary_precedence=parse_binary_precedence;
       'Token.Kwd '(' ?? "expected '(' in prototype";
       args=parse_args [];
       'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
      let name = prefix ^ (String.make 1 op) in
      let args = Array.of_list (List.rev args) in

      (* Verify right number of arguments for operator. *)
      if Array.length args != kind
      then raise (Stream.Error "invalid number of operands for operator")
      else
        if kind == 1 then
          Ast.Prototype (name, args)
        else
          Ast.BinOpPrototype (name, args, binary_precedence)
  | [< >] ->
      raise (Stream.Error "expected function name in prototype")

(* definition ::= 'def' prototype expression *)
let parse_definition = parser
  | [< 'Token.Def; p=parse_prototype; e=parse_expr >] ->
      Ast.Function (p, e)

(* toplevelexpr ::= expression *)
let parse_toplevel = parser
  | [< e=parse_expr >] ->
      (* Make an anonymous proto. *)
      Ast.Function (Ast.Prototype ("", [||]), e)

(* external ::= 'extern' prototype *)
let parse_extern = parser
  | [< 'Token.Extern; e=parse_prototype >] -> e
- codegen.ml:
(*===----------------------------------------------------------------------===
 * Code Generation
 *===----------------------------------------------------------------------===*)

open Llvm

exception Error of string

let context = global_context ()
let the_module = create_module context "my cool jit"
let builder = builder context
let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10
let double_type = double_type context

let rec codegen_expr = function
  | Ast.Number n -> const_float double_type n
  | Ast.Variable name ->
      (try Hashtbl.find named_values name with
        | Not_found -> raise (Error "unknown variable name"))
  | Ast.Unary (op, operand) ->
      let operand = codegen_expr operand in
      let callee = "unary" ^ (String.make 1 op) in
      let callee =
        match lookup_function callee the_module with
        | Some callee -> callee
        | None -> raise (Error "unknown unary operator")
      in
      build_call callee [|operand|] "unop" builder
  | Ast.Binary (op, lhs, rhs) ->
      let lhs_val = codegen_expr lhs in
      let rhs_val = codegen_expr rhs in
      begin
        match op with
        | '+' -> build_add lhs_val rhs_val "addtmp" builder
        | '-' -> build_sub lhs_val rhs_val "subtmp" builder
        | '*' -> build_mul lhs_val rhs_val "multmp" builder
        | '<' ->
            (* Convert bool 0/1 to double 0.0 or 1.0 *)
            let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in
            build_uitofp i double_type "booltmp" builder
        | _ ->
            (* If it wasn't a builtin binary operator, it must be a user defined
             * one. Emit a call to it. *)
            let callee = "binary" ^ (String.make 1 op) in
            let callee =
              match lookup_function callee the_module with
              | Some callee -> callee
              | None -> raise (Error "binary operator not found!")
            in
            build_call callee [|lhs_val; rhs_val|] "binop" builder
      end
  | Ast.Call (callee, args) ->
      (* Look up the name in the module table. *)
      let callee =
        match lookup_function callee the_module with
        | Some callee -> callee
        | None -> raise (Error "unknown function referenced")
      in
      let params = params callee in

      (* If argument mismatch error. *)
      if Array.length params == Array.length args then () else
        raise (Error "incorrect # arguments passed");
      let args = Array.map codegen_expr args in
      build_call callee args "calltmp" builder
  | Ast.If (cond, then_, else_) ->
      let cond = codegen_expr cond in

      (* Convert condition to a bool by comparing equal to 0.0 *)
      let zero = const_float double_type 0.0 in
      let cond_val = build_fcmp Fcmp.One cond zero "ifcond" builder in

      (* Grab the first block so that we might later add the conditional branch
       * to it at the end of the function. *)
      let start_bb = insertion_block builder in
      let the_function = block_parent start_bb in

      let then_bb = append_block context "then" the_function in

      (* Emit 'then' value. *)
      position_at_end then_bb builder;
      let then_val = codegen_expr then_ in

      (* Codegen of 'then' can change the current block, update then_bb for the
       * phi. We create a new name because one is used for the phi node, and the
       * other is used for the conditional branch. *)
      let new_then_bb = insertion_block builder in

      (* Emit 'else' value. *)
      let else_bb = append_block context "else" the_function in
      position_at_end else_bb builder;
      let else_val = codegen_expr else_ in

      (* Codegen of 'else' can change the current block, update else_bb for the
       * phi. *)
      let new_else_bb = insertion_block builder in

      (* Emit merge block. *)
      let merge_bb = append_block context "ifcont" the_function in
      position_at_end merge_bb builder;
      let incoming = [(then_val, new_then_bb); (else_val, new_else_bb)] in
      let phi = build_phi incoming "iftmp" builder in

      (* Return to the start block to add the conditional branch. *)
      position_at_end start_bb builder;
      ignore (build_cond_br cond_val then_bb else_bb builder);

      (* Set a unconditional branch at the end of the 'then' block and the
       * 'else' block to the 'merge' block. *)
      position_at_end new_then_bb builder; ignore (build_br merge_bb builder);
      position_at_end new_else_bb builder; ignore (build_br merge_bb builder);

      (* Finally, set the builder to the end of the merge block. *)
      position_at_end merge_bb builder;

      phi
  | Ast.For (var_name, start, end_, step, body) ->
      (* Emit the start code first, without 'variable' in scope. *)
      let start_val = codegen_expr start in

      (* Make the new basic block for the loop header, inserting after current
       * block. *)
      let preheader_bb = insertion_block builder in
      let the_function = block_parent preheader_bb in
      let loop_bb = append_block context "loop" the_function in

      (* Insert an explicit fall through from the current block to the
       * loop_bb. *)
      ignore (build_br loop_bb builder);

      (* Start insertion in loop_bb. *)
      position_at_end loop_bb builder;

      (* Start the PHI node with an entry for start. *)
      let variable = build_phi [(start_val, preheader_bb)] var_name builder in

      (* Within the loop, the variable is defined equal to the PHI node. If it
       * shadows an existing variable, we have to restore it, so save it
       * now. *)
      let old_val =
        try Some (Hashtbl.find named_values var_name) with Not_found -> None
      in
      Hashtbl.add named_values var_name variable;

      (* Emit the body of the loop. This, like any other expr, can change the
       * current BB. Note that we ignore the value computed by the body, but
       * don't allow an error *)
      ignore (codegen_expr body);

      (* Emit the step value. *)
      let step_val =
        match step with
        | Some step -> codegen_expr step
        (* If not specified, use 1.0. *)
        | None -> const_float double_type 1.0
      in

      let next_var = build_add variable step_val "nextvar" builder in

      (* Compute the end condition. *)
      let end_cond = codegen_expr end_ in

      (* Convert condition to a bool by comparing equal to 0.0. *)
      let zero = const_float double_type 0.0 in
      let end_cond = build_fcmp Fcmp.One end_cond zero "loopcond" builder in

      (* Create the "after loop" block and insert it. *)
      let loop_end_bb = insertion_block builder in
      let after_bb = append_block context "afterloop" the_function in

      (* Insert the conditional branch into the end of loop_end_bb. *)
      ignore (build_cond_br end_cond loop_bb after_bb builder);

      (* Any new code will be inserted in after_bb. *)
      position_at_end after_bb builder;

      (* Add a new entry to the PHI node for the backedge. *)
      add_incoming (next_var, loop_end_bb) variable;

      (* Restore the unshadowed variable. *)
      begin match old_val with
      | Some old_val -> Hashtbl.add named_values var_name old_val
      | None -> ()
      end;

      (* for expr always returns 0.0. *)
      const_null double_type

let codegen_proto = function
  | Ast.Prototype (name, args) | Ast.BinOpPrototype (name, args, _) ->
      (* Make the function type: double(double,double) etc. *)
      let doubles = Array.make (Array.length args) double_type in
      let ft = function_type double_type doubles in
      let f =
        match lookup_function name the_module with
        | None -> declare_function name ft the_module

        (* If 'f' conflicted, there was already something named 'name'. If it
         * has a body, don't allow redefinition or reextern. *)
        | Some f ->
            (* If 'f' already has a body, reject this. *)
            if block_begin f <> At_end f then
              raise (Error "redefinition of function");

            (* If 'f' took a different number of arguments, reject. *)
            if element_type (type_of f) <> ft then
              raise (Error "redefinition of function with different # args");
            f
      in

      (* Set names for all arguments. *)
      Array.iteri (fun i a ->
        let n = args.(i) in
        set_value_name n a;
        Hashtbl.add named_values n a;
      ) (params f);
      f

let codegen_func the_fpm = function
  | Ast.Function (proto, body) ->
      Hashtbl.clear named_values;
      let the_function = codegen_proto proto in

      (* If this is an operator, install it. *)
      begin match proto with
      | Ast.BinOpPrototype (name, args, prec) ->
          let op = name.[String.length name - 1] in
          Hashtbl.add Parser.binop_precedence op prec;
      | _ -> ()
      end;

      (* Create a new basic block to start insertion into. *)
      let bb = append_block context "entry" the_function in
      position_at_end bb builder;

      try
        let ret_val = codegen_expr body in

        (* Finish off the function. *)
        let _ = build_ret ret_val builder in

        (* Validate the generated code, checking for consistency. *)
        Llvm_analysis.assert_valid_function the_function;

        (* Optimize the function. *)
        let _ = PassManager.run_function the_function the_fpm in

        the_function
      with e ->
        delete_function the_function;
        raise e
- toplevel.ml:
(*===----------------------------------------------------------------------===
 * Top-Level parsing and JIT Driver
 *===----------------------------------------------------------------------===*)

open Llvm
open Llvm_executionengine

(* top ::= definition | external | expression | ';' *)
let rec main_loop the_fpm the_execution_engine stream =
  match Stream.peek stream with
  | None -> ()

  (* ignore top-level semicolons. *)
  | Some (Token.Kwd ';') ->
      Stream.junk stream;
      main_loop the_fpm the_execution_engine stream

  | Some token ->
      begin
        try match token with
        | Token.Def ->
            let e = Parser.parse_definition stream in
            print_endline "parsed a function definition.";
            dump_value (Codegen.codegen_func the_fpm e);
        | Token.Extern ->
            let e = Parser.parse_extern stream in
            print_endline "parsed an extern.";
            dump_value (Codegen.codegen_proto e);
        | _ ->
            (* Evaluate a top-level expression into an anonymous function. *)
            let e = Parser.parse_toplevel stream in
            print_endline "parsed a top-level expr";
            let the_function = Codegen.codegen_func the_fpm e in
            dump_value the_function;

            (* JIT the function, returning a function pointer. *)
            let result = ExecutionEngine.run_function the_function [||]
              the_execution_engine in

            print_string "Evaluated to ";
            print_float (GenericValue.as_float Codegen.double_type result);
            print_newline ();
        with Stream.Error s | Codegen.Error s ->
          (* Skip token for error recovery. *)
          Stream.junk stream;
          print_endline s;
      end;
      print_string "ready> "; flush stdout;
      main_loop the_fpm the_execution_engine stream
- toy.ml:
(*===----------------------------------------------------------------------===
 * Main driver code.
 *===----------------------------------------------------------------------===*)

open Llvm
open Llvm_executionengine
open Llvm_target
open Llvm_scalar_opts

let main () =
  ignore (initialize_native_target ());

  (* Install standard binary operators.
   * 1 is the lowest precedence. *)
  Hashtbl.add Parser.binop_precedence '<' 10;
  Hashtbl.add Parser.binop_precedence '+' 20;
  Hashtbl.add Parser.binop_precedence '-' 20;
  Hashtbl.add Parser.binop_precedence '*' 40;    (* highest. *)

  (* Prime the first token. *)
  print_string "ready> "; flush stdout;
  let stream = Lexer.lex (Stream.of_channel stdin) in

  (* Create the JIT. *)
  let the_execution_engine = ExecutionEngine.create Codegen.the_module in
  let the_fpm = PassManager.create_function Codegen.the_module in

  (* Set up the optimizer pipeline. Start with registering info about how the
   * target lays out data structures. *)
  DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm;

  (* Do simple "peephole" optimizations and bit-twiddling optzn. *)
  add_instruction_combination the_fpm;

  (* reassociate expressions. *)
  add_reassociation the_fpm;

  (* Eliminate Common SubExpressions. *)
  add_gvn the_fpm;

  (* Simplify the control flow graph (deleting unreachable blocks, etc). *)
  add_cfg_simplification the_fpm;

  ignore (PassManager.initialize the_fpm);

  (* Run the main "interpreter loop" now. *)
  Toplevel.main_loop the_fpm the_execution_engine stream;

  (* Print out all the generated code. *)
  dump_module Codegen.the_module
;;

main ()
- bindings.c:
#include <stdio.h>

/* putchard - putchar that takes a double and returns 0. */
extern double putchard(double X) {
  putchar((char)X);
  return 0;
}

/* printd - printf that takes a double prints it as "%f\n", returning 0. */
extern double printd(double X) {
  printf("%f\n", X);
  return 0;
}
Next: Extending the language: mutable variables / SSA construction
Kaleidoscope: Extending the Language: Mutable Variables¶
Chapter 7 Introduction¶
Welcome to Chapter 7 of the “Implementing a language with LLVM” tutorial. In chapters 1 through 6, we’ve built a very respectable, albeit simple, functional programming language. In our journey, we learned some parsing techniques, how to build and represent an AST, how to build LLVM IR, and how to optimize the resultant code as well as JIT compile it.
While Kaleidoscope is interesting as a functional language, the fact that it is functional makes it “too easy” to generate LLVM IR for it. In particular, a functional language makes it very easy to build LLVM IR directly in SSA form. Since LLVM requires that the input code be in SSA form, this is a very nice property and it is often unclear to newcomers how to generate code for an imperative language with mutable variables.
The short (and happy) summary of this chapter is that there is no need for your front-end to build SSA form: LLVM provides highly tuned and well tested support for this, though the way it works is a bit unexpected for some.
Why is this a hard problem?¶
To understand why mutable variables cause complexities in SSA construction, consider this extremely simple C example:
int G, H;
int test(_Bool Condition) {
int X;
if (Condition)
X = G;
else
X = H;
return X;
}
In this case, we have the variable “X”, whose value depends on the path executed in the program. Because there are two different possible values for X before the return instruction, a PHI node is inserted to merge the two values. The LLVM IR that we want for this example looks like this:
@G = weak global i32 0 ; type of @G is i32*
@H = weak global i32 0 ; type of @H is i32*
define i32 @test(i1 %Condition) {
entry:
br i1 %Condition, label %cond_true, label %cond_false
cond_true:
%X.0 = load i32* @G
br label %cond_next
cond_false:
%X.1 = load i32* @H
br label %cond_next
cond_next:
%X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
ret i32 %X.2
}
In this example, the loads from the G and H global variables are explicit in the LLVM IR, and they live in the then/else branches of the if statement (cond_true/cond_false). In order to merge the incoming values, the X.2 phi node in the cond_next block selects the right value to use based on where control flow is coming from: if control flow comes from the cond_false block, X.2 gets the value of X.1. Alternatively, if control flow comes from cond_true, it gets the value of X.0. The intent of this chapter is not to explain the details of SSA form. For more information, see one of the many online references.
The question for this article is “who places the phi nodes when lowering assignments to mutable variables?”. The issue here is that LLVM requires that its IR be in SSA form: there is no “non-ssa” mode for it. However, SSA construction requires non-trivial algorithms and data structures, so it is inconvenient and wasteful for every front-end to have to reproduce this logic.
Memory in LLVM¶
The ‘trick’ here is that while LLVM does require all register values to be in SSA form, it does not require (or permit) memory objects to be in SSA form. In the example above, note that the loads from G and H are direct accesses to G and H: they are not renamed or versioned. This differs from some other compiler systems, which do try to version memory objects. In LLVM, instead of encoding dataflow analysis of memory into the LLVM IR, it is handled with Analysis Passes which are computed on demand.
With this in mind, the high-level idea is that we want to make a stack variable (which lives in memory, because it is on the stack) for each mutable object in a function. To take advantage of this trick, we need to talk about how LLVM represents stack variables.
In LLVM, all memory accesses are explicit with load/store instructions, and it is carefully designed not to have (or need) an “address-of” operator. Notice how the type of the @G/@H global variables is actually “i32*” even though the variable is defined as “i32”. What this means is that @G defines space for an i32 in the global data area, but its name actually refers to the address for that space. Stack variables work the same way, except that instead of being declared with global variable definitions, they are declared with the LLVM alloca instruction:
define i32 @example() {
entry:
%X = alloca i32 ; type of %X is i32*.
...
%tmp = load i32* %X ; load the stack value %X from the stack.
%tmp2 = add i32 %tmp, 1 ; increment it
store i32 %tmp2, i32* %X ; store it back
...
This code shows an example of how you can declare and manipulate a stack variable in the LLVM IR. Stack memory allocated with the alloca instruction is fully general: you can pass the address of the stack slot to functions, you can store it in other variables, etc. In our example above, we could rewrite the example to use the alloca technique to avoid using a PHI node:
@G = weak global i32 0 ; type of @G is i32*
@H = weak global i32 0 ; type of @H is i32*
define i32 @test(i1 %Condition) {
entry:
%X = alloca i32 ; type of %X is i32*.
br i1 %Condition, label %cond_true, label %cond_false
cond_true:
%X.0 = load i32* @G
store i32 %X.0, i32* %X ; Update X
br label %cond_next
cond_false:
%X.1 = load i32* @H
store i32 %X.1, i32* %X ; Update X
br label %cond_next
cond_next:
%X.2 = load i32* %X ; Read X
ret i32 %X.2
}
With this, we have discovered a way to handle arbitrary mutable variables without the need to create PHI nodes at all (a short sketch of the corresponding builder calls follows this list):
- Each mutable variable becomes a stack allocation.
- Each read of the variable becomes a load from the stack.
- Each update of the variable becomes a store to the stack.
- Taking the address of a variable just uses the stack address directly.
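Here is that sketch: a minimal illustration of the three builder calls the recipe rests on, assuming a builder already positioned inside some function (the helper we define later in this chapter additionally forces the alloca into the entry block):
(* A mutable variable "x" becomes a stack slot... *)
let slot = build_alloca double_type "x" builder in
(* ...each update becomes a store to that slot... *)
ignore (build_store (const_float double_type 1.0) slot builder);
(* ...and each read becomes a load from it. *)
let current_x = build_load slot "x" builder in
...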
While this solution solves our immediate problem, it introduces another one: we have now apparently introduced a lot of stack traffic for very simple and common operations, a major performance problem. Fortunately for us, the LLVM optimizer has a highly-tuned optimization pass named "mem2reg" that handles this case, promoting allocas like this into SSA registers and inserting PHI nodes as appropriate. If you run this example through mem2reg, you'll get:
$ llvm-as < example.ll | opt -mem2reg | llvm-dis
@G = weak global i32 0
@H = weak global i32 0
define i32 @test(i1 %Condition) {
entry:
br i1 %Condition, label %cond_true, label %cond_false
cond_true:
%X.0 = load i32* @G
br label %cond_next
cond_false:
%X.1 = load i32* @H
br label %cond_next
cond_next:
%X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
ret i32 %X.01
}
The mem2reg pass implements the standard “iterated dominance frontier” algorithm for constructing SSA form and has a number of optimizations that speed up (very common) degenerate cases. The mem2reg optimization pass is the answer to dealing with mutable variables, and we highly recommend that you depend on it. Note that mem2reg only works on variables in certain circumstances:
- mem2reg is alloca-driven: it looks for allocas and if it can handle them, it promotes them. It does not apply to global variables or heap allocations.
- mem2reg only looks for alloca instructions in the entry block of the function. Being in the entry block guarantees that the alloca is only executed once, which makes analysis simpler.
- mem2reg only promotes allocas whose uses are direct loads and stores. If the address of the stack object is passed to a function, or if any funny pointer arithmetic is involved, the alloca will not be promoted.
- mem2reg only works on allocas of first class values (such as pointers, scalars and vectors), and only if the array size of the allocation is 1 (or missing in the .ll file). mem2reg is not capable of promoting structs or arrays to registers. Note that the “scalarrepl” pass is more powerful and can promote structs, “unions”, and arrays in many cases.
All of these properties are easy to satisfy for most imperative languages, and we’ll illustrate it below with Kaleidoscope. The final question you may be asking is: should I bother with this nonsense for my front-end? Wouldn’t it be better if I just did SSA construction directly, avoiding use of the mem2reg optimization pass? In short, we strongly recommend that you use this technique for building SSA form, unless there is an extremely good reason not to. Using this technique is:
- Proven and well tested: clang uses this technique for local mutable variables. As such, the most common clients of LLVM are using this to handle a bulk of their variables. You can be sure that bugs are found fast and fixed early.
- Extremely Fast: mem2reg has a number of special cases that make it fast in common cases as well as fully general. For example, it has fast-paths for variables that are only used in a single block, variables that only have one assignment point, good heuristics to avoid insertion of unneeded phi nodes, etc.
- Needed for debug info generation: Debug information in LLVM relies on having the address of the variable exposed so that debug info can be attached to it. This technique dovetails very naturally with this style of debug info.
If nothing else, this makes it much easier to get your front-end up and running, and is very simple to implement. Let's extend Kaleidoscope with mutable variables now!
Mutable Variables in Kaleidoscope¶
Now that we know the sort of problem we want to tackle, let's see what this looks like in the context of our little Kaleidoscope language. We're going to add two features:
- The ability to mutate variables with the ‘=’ operator.
- The ability to define new variables.
While the first item is really what this is about, we only have variables for incoming arguments as well as for induction variables, and redefining those only goes so far :). Also, the ability to define new variables is a useful thing regardless of whether you will be mutating them. Here’s a motivating example that shows how we could use these:
# Define ':' for sequencing: as a low-precedence operator that ignores operands
# and just returns the RHS.
def binary : 1 (x y) y;
# Recursive fib, we could do this before.
def fib(x)
if (x < 3) then
1
else
fib(x-1)+fib(x-2);
# Iterative fib.
def fibi(x)
var a = 1, b = 1, c in
(for i = 3, i < x in
c = a + b :
a = b :
b = c) :
b;
# Call it.
fibi(10);
In order to mutate variables, we have to change our existing variables to use the “alloca trick”. Once we have that, we’ll add our new operator, then extend Kaleidoscope to support new variable definitions.
Adjusting Existing Variables for Mutation¶
The symbol table in Kaleidoscope is managed at code generation time by the 'named_values' map. This map currently keeps track of the LLVM "Value*" that holds the double value for the named variable. In order to support mutation, we need to change this slightly, so that named_values holds the memory location of the variable in question. Note that this change is a refactoring: it changes the structure of the code, but does not (by itself) change the behavior of the compiler. All of these changes are isolated in the Kaleidoscope code generator.
At this point in Kaleidoscope’s development, it only supports variables for two things: incoming arguments to functions and the induction variable of ‘for’ loops. For consistency, we’ll allow mutation of these variables in addition to other user-defined variables. This means that these will both need memory locations.
To start our transformation of Kaleidoscope, we'll change the named_values map so that it maps to the AllocaInst* for each variable instead of its Value*. In the C++ version of this tutorial the compiler would then point out every piece of code that needs updating; the OCaml bindings, however, currently model both Value*s and AllocaInst*s as Llvm.llvalues (this may change in the future to be more type safe), so the declaration itself stays as it was:
let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10
Also, since we will need to create these allocas, we'll use a helper function that ensures that they are created in the entry block of the function:
(* Create an alloca instruction in the entry block of the function. This
* is used for mutable variables etc. *)
let create_entry_block_alloca the_function var_name =
let builder = builder_at (instr_begin (entry_block the_function)) in
build_alloca double_type var_name builder
This funny looking code creates an Llvm.llbuilder object that is pointing at the first instruction of the entry block. It then creates an alloca with the expected name and returns it. Because all values in Kaleidoscope are doubles, there is no need to pass in a type to use.
With this in place, the first functionality change we want to make is to variable references. In our new scheme, variables live on the stack, so code generating a reference to them actually needs to produce a load from the stack slot:
let rec codegen_expr = function
...
| Ast.Variable name ->
let v = try Hashtbl.find named_values name with
| Not_found -> raise (Error "unknown variable name")
in
(* Load the value. *)
build_load v name builder
As you can see, this is pretty straightforward. Now we need to update the things that define the variables to set up the alloca. We'll start with codegen_expr Ast.For ... (see the full code listing for the unabridged code):
| Ast.For (var_name, start, end_, step, body) ->
let the_function = block_parent (insertion_block builder) in
(* Create an alloca for the variable in the entry block. *)
let alloca = create_entry_block_alloca the_function var_name in
(* Emit the start code first, without 'variable' in scope. *)
let start_val = codegen_expr start in
(* Store the value into the alloca. *)
ignore(build_store start_val alloca builder);
...
(* Within the loop, the variable is defined equal to the alloca. If it
 * shadows an existing variable, we have to restore it, so save it
 * now. *)
let old_val =
try Some (Hashtbl.find named_values var_name) with Not_found -> None
in
Hashtbl.add named_values var_name alloca;
...
(* Compute the end condition. *)
let end_cond = codegen_expr end_ in
(* Reload, increment, and restore the alloca. This handles the case where
* the body of the loop mutates the variable. *)
let cur_var = build_load alloca var_name builder in
let next_var = build_add cur_var step_val "nextvar" builder in
ignore(build_store next_var alloca builder);
...
This code is virtually identical to the code before we allowed mutable variables. The big difference is that we no longer have to construct a PHI node, and we use load/store to access the variable as needed.
To support mutable argument variables, we need to also make allocas for them. The code for this is also pretty simple:
(* Create an alloca for each argument and register the argument in the symbol
* table so that references to it will succeed. *)
let create_argument_allocas the_function proto =
let args = match proto with
| Ast.Prototype (_, args) | Ast.BinOpPrototype (_, args, _) -> args
in
Array.iteri (fun i ai ->
let var_name = args.(i) in
(* Create an alloca for this variable. *)
let alloca = create_entry_block_alloca the_function var_name in
(* Store the initial value into the alloca. *)
ignore(build_store ai alloca builder);
(* Add arguments to variable symbol table. *)
Hashtbl.add named_values var_name alloca;
) (params the_function)
For each argument, we make an alloca, store the input value to the function into the alloca, and register the alloca as the memory location for the argument. This method gets invoked by Codegen.codegen_func right after it sets up the entry block for the function.
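Concretely, the hookup looks something like this inside codegen_func (a sketch; the surrounding try/with error handling from earlier chapters is unchanged):
(* Create a new basic block to start insertion into. *)
let bb = append_block context "entry" the_function in
position_at_end bb builder;
...
(* Add all arguments to the symbol table and create their allocas. *)
create_argument_allocas the_function proto;
...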
The final missing piece is adding the mem2reg pass, which allows us to get good codegen once again:
let main () =
...
let the_fpm = PassManager.create_function Codegen.the_module in
(* Set up the optimizer pipeline. Start with registering info about how the
* target lays out data structures. *)
DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm;
(* Promote allocas to registers. *)
add_memory_to_register_promotion the_fpm;
(* Do simple "peephole" optimizations and bit-twiddling optzn. *)
add_instruction_combination the_fpm;
(* reassociate expressions. *)
add_reassociation the_fpm;
It is interesting to see what the code looks like before and after the mem2reg optimization runs. For example, this is the before/after code for our recursive fib function. Before the optimization:
define double @fib(double %x) {
entry:
%x1 = alloca double
store double %x, double* %x1
%x2 = load double* %x1
%cmptmp = fcmp ult double %x2, 3.000000e+00
%booltmp = uitofp i1 %cmptmp to double
%ifcond = fcmp one double %booltmp, 0.000000e+00
br i1 %ifcond, label %then, label %else
then: ; preds = %entry
br label %ifcont
else: ; preds = %entry
%x3 = load double* %x1
%subtmp = fsub double %x3, 1.000000e+00
%calltmp = call double @fib(double %subtmp)
%x4 = load double* %x1
%subtmp5 = fsub double %x4, 2.000000e+00
%calltmp6 = call double @fib(double %subtmp5)
%addtmp = fadd double %calltmp, %calltmp6
br label %ifcont
ifcont: ; preds = %else, %then
%iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
ret double %iftmp
}
Here there is only one variable (x, the input argument) but you can still see the extremely simple-minded code generation strategy we are using. In the entry block, an alloca is created, and the initial input value is stored into it. Each reference to the variable does a reload from the stack. Also, note that we didn’t modify the if/then/else expression, so it still inserts a PHI node. While we could make an alloca for it, it is actually easier to create a PHI node for it, so we still just make the PHI.
Here is the code after the mem2reg pass runs:
define double @fib(double %x) {
entry:
%cmptmp = fcmp ult double %x, 3.000000e+00
%booltmp = uitofp i1 %cmptmp to double
%ifcond = fcmp one double %booltmp, 0.000000e+00
br i1 %ifcond, label %then, label %else
then:
br label %ifcont
else:
%subtmp = fsub double %x, 1.000000e+00
%calltmp = call double @fib(double %subtmp)
%subtmp5 = fsub double %x, 2.000000e+00
%calltmp6 = call double @fib(double %subtmp5)
%addtmp = fadd double %calltmp, %calltmp6
br label %ifcont
ifcont: ; preds = %else, %then
%iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
ret double %iftmp
}
This is a trivial case for mem2reg, since there are no redefinitions of the variable. The point of showing this is to calm your tension about inserting such blatant inefficiencies :).
After the rest of the optimizers run, we get:
define double @fib(double %x) {
entry:
%cmptmp = fcmp ult double %x, 3.000000e+00
%booltmp = uitofp i1 %cmptmp to double
%ifcond = fcmp ueq double %booltmp, 0.000000e+00
br i1 %ifcond, label %else, label %ifcont
else:
%subtmp = fsub double %x, 1.000000e+00
%calltmp = call double @fib(double %subtmp)
%subtmp5 = fsub double %x, 2.000000e+00
%calltmp6 = call double @fib(double %subtmp5)
%addtmp = fadd double %calltmp, %calltmp6
ret double %addtmp
ifcont:
ret double 1.000000e+00
}
Here we see that the simplifycfg pass decided to clone the return instruction into the end of the ‘else’ block. This allowed it to eliminate some branches and the PHI node.
Now that all symbol table references are updated to use stack variables, we’ll add the assignment operator.
New Assignment Operator¶
With our current framework, adding a new assignment operator is really simple. We will parse it just like any other binary operator, but handle it internally (instead of allowing the user to define it). The first step is to set a precedence:
let main () =
(* Install standard binary operators.
* 1 is the lowest precedence. *)
Hashtbl.add Parser.binop_precedence '=' 2;
Hashtbl.add Parser.binop_precedence '<' 10;
Hashtbl.add Parser.binop_precedence '+' 20;
Hashtbl.add Parser.binop_precedence '-' 20;
...
Now that the parser knows the precedence of the binary operator, it takes care of all the parsing and AST generation. We just need to implement codegen for the assignment operator. This looks like:
let rec codegen_expr = function
begin match op with
| '=' ->
(* Special case '=' because we don't want to emit the LHS as an
* expression. *)
let name =
match lhs with
| Ast.Variable name -> name
| _ -> raise (Error "destination of '=' must be a variable")
in
Unlike the rest of the binary operators, our assignment operator doesn’t follow the “emit LHS, emit RHS, do computation” model. As such, it is handled as a special case before the other binary operators are handled. The other strange thing is that it requires the LHS to be a variable. It is invalid to have “(x+1) = expr” - only things like “x = expr” are allowed.
(* Codegen the rhs. *)
let val_ = codegen_expr rhs in
(* Lookup the name. *)
let variable = try Hashtbl.find named_values name with
| Not_found -> raise (Error "unknown variable name")
in
ignore(build_store val_ variable builder);
val_
| _ ->
...
Once we have the variable, codegen’ing the assignment is straightforward: we emit the RHS of the assignment, create a store, and return the computed value. Returning a value allows for chained assignments like “X = (Y = Z)”.
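For illustration (this value is not built anywhere in the tutorial sources), the parenthesized form "X = (Y = Z)" reaches codegen as nested Binary nodes, each handled by the special case above:
(* Hypothetical AST for "X = (Y = Z)": the inner '=' yields the value of Z,
 * which the outer '=' then stores into X. *)
let chained = Ast.Binary ('=', Ast.Variable "X",
                          Ast.Binary ('=', Ast.Variable "Y", Ast.Variable "Z"))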
Now that we have an assignment operator, we can mutate loop variables and arguments. For example, we can now run code like this:
# Function to print a double.
extern printd(x);
# Define ':' for sequencing: as a low-precedence operator that ignores operands
# and just returns the RHS.
def binary : 1 (x y) y;
def test(x)
printd(x) :
x = 4 :
printd(x);
test(123);
When run, this example prints "123" and then "4", showing that we did actually mutate the value! Okay, we have now officially implemented our goal: getting this to work requires SSA construction in the general case. However, to be really useful, we want the ability to define our own local variables; let's add this next!
User-defined Local Variables¶
Adding var/in is just like any of the other extensions we made to Kaleidoscope: we extend the lexer, the parser, the AST and the code generator. The first step for adding our new 'var/in' construct is to extend the lexer. As before, this is pretty trivial; the code looks like this:
type token =
...
(* var definition *)
| Var
...
and lex_ident buffer = parser
...
| "in" -> [< 'Token.In; stream >]
| "binary" -> [< 'Token.Binary; stream >]
| "unary" -> [< 'Token.Unary; stream >]
| "var" -> [< 'Token.Var; stream >]
...
The next step is to define the AST node that we will construct. For var/in, it looks like this:
type expr =
...
(* variant for var/in. *)
| Var of (string * expr option) array * expr
...
var/in allows a list of names to be defined all at once, and each name can optionally have an initializer value. As such, we capture this information in an array of (name, optional initializer) pairs. Also, var/in has a body; this body is allowed to access the variables defined by the var/in.
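As a sketch (again, not from the tutorial sources), the expression "var a = 1, b in a + b" would be captured as:
(* "b" has no initializer, so its second component is None. *)
let example = Ast.Var ([| ("a", Some (Ast.Number 1.0)); ("b", None) |],
                       Ast.Binary ('+', Ast.Variable "a", Ast.Variable "b"))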
With this in place, we can define the parser pieces. The first thing we do is add it as a primary expression:
(* primary
* ::= identifier
* ::= numberexpr
* ::= parenexpr
* ::= ifexpr
* ::= forexpr
* ::= varexpr *)
let rec parse_primary = parser
...
(* varexpr
 * ::= 'var' identifier ('=' expression)?
 *     (',' identifier ('=' expression)?)* 'in' expression *)
| [< 'Token.Var;
(* At least one variable name is required. *)
'Token.Ident id ?? "expected identifier after var";
init=parse_var_init;
var_names=parse_var_names [(id, init)];
(* At this point, we have to have 'in'. *)
'Token.In ?? "expected 'in' keyword after 'var'";
body=parse_expr >] ->
Ast.Var (Array.of_list (List.rev var_names), body)
...
and parse_var_init = parser
(* read in the optional initializer. *)
| [< 'Token.Kwd '='; e=parse_expr >] -> Some e
| [< >] -> None
and parse_var_names accumulator = parser
| [< 'Token.Kwd ',';
'Token.Ident id ?? "expected identifier list after var";
init=parse_var_init;
e=parse_var_names ((id, init) :: accumulator) >] -> e
| [< >] -> accumulator
Now that we can parse and represent the code, we need to support emission of LLVM IR for it. This code starts out with:
let rec codegen_expr = function
...
| Ast.Var (var_names, body) ->
let old_bindings = ref [] in
let the_function = block_parent (insertion_block builder) in
(* Register all variables and emit their initializer. *)
Array.iter (fun (var_name, init) ->
Basically it loops over all the variables, installing them one at a time. For each variable we put into the symbol table, we remember the previous value that we replace in old_bindings.
(* Emit the initializer before adding the variable to scope, this
* prevents the initializer from referencing the variable itself, and
* permits stuff like this:
* var a = 1 in
* var a = a in ... # refers to outer 'a'. *)
let init_val =
match init with
| Some init -> codegen_expr init
(* If not specified, use 0.0. *)
| None -> const_float double_type 0.0
in
let alloca = create_entry_block_alloca the_function var_name in
ignore(build_store init_val alloca builder);
(* Remember the old variable binding so that we can restore the binding
* when we unrecurse. *)
begin
try
let old_value = Hashtbl.find named_values var_name in
old_bindings := (var_name, old_value) :: !old_bindings;
with Not_found -> ()
end;
(* Remember this binding. *)
Hashtbl.add named_values var_name alloca;
) var_names;
There are more comments here than code. The basic idea is that we emit the initializer, create the alloca, then update the symbol table to point to it. Once all the variables are installed in the symbol table, we evaluate the body of the var/in expression:
(* Codegen the body, now that all vars are in scope. *)
let body_val = codegen_expr body in
Finally, before returning, we restore the previous variable bindings:
(* Pop all our variables from scope. *)
List.iter (fun (var_name, old_value) ->
Hashtbl.add named_values var_name old_value
) !old_bindings;
(* Return the body computation. *)
body_val
The end result of all of this is that we get properly scoped variable definitions, and we even (trivially) allow mutation of them :).
With this, we completed what we set out to do. Our nice iterative fib example from the intro compiles and runs just fine. The mem2reg pass optimizes all of our stack variables into SSA registers, inserting PHI nodes where needed, and our front-end remains simple: no “iterated dominance frontier” computation anywhere in sight.
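The only change the front-end needs to get this SSA construction is a single pass registration; this line from toy.ml (reproduced in the full listing below) asks LLVM to run mem2reg over each function:

(* From toy.ml: promote allocas to registers in the per-function pipeline. *)
add_memory_to_register_promotion the_fpm;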
Full Code Listing¶
Here is the complete code listing for our running example, enhanced with mutable variables and var/in support. To build this example, use:
# Compile
ocamlbuild toy.byte
# Run
./toy.byte
Here is the code:
- _tags:
<{lexer,parser}.ml>: use_camlp4, pp(camlp4of)
<*.{byte,native}>: g++, use_llvm, use_llvm_analysis
<*.{byte,native}>: use_llvm_executionengine, use_llvm_target
<*.{byte,native}>: use_llvm_scalar_opts, use_bindings
- myocamlbuild.ml:
open Ocamlbuild_plugin;;

ocaml_lib ~extern:true "llvm";;
ocaml_lib ~extern:true "llvm_analysis";;
ocaml_lib ~extern:true "llvm_executionengine";;
ocaml_lib ~extern:true "llvm_target";;
ocaml_lib ~extern:true "llvm_scalar_opts";;

flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"; A"-cclib"; A"-rdynamic"]);;
dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];;
- token.ml:
(*===----------------------------------------------------------------------===
 * Lexer Tokens
 *===----------------------------------------------------------------------===*)

(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of
 * these others for known things. *)
type token =
  (* commands *)
  | Def | Extern

  (* primary *)
  | Ident of string | Number of float

  (* unknown *)
  | Kwd of char

  (* control *)
  | If | Then | Else | For | In

  (* operators *)
  | Binary | Unary

  (* var definition *)
  | Var
- lexer.ml:
(*===----------------------------------------------------------------------===
 * Lexer
 *===----------------------------------------------------------------------===*)

let rec lex = parser
  (* Skip any whitespace. *)
  | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream

  (* identifier: [a-zA-Z][a-zA-Z0-9] *)
  | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] ->
      let buffer = Buffer.create 1 in
      Buffer.add_char buffer c;
      lex_ident buffer stream

  (* number: [0-9.]+ *)
  | [< ' ('0' .. '9' as c); stream >] ->
      let buffer = Buffer.create 1 in
      Buffer.add_char buffer c;
      lex_number buffer stream

  (* Comment until end of line. *)
  | [< ' ('#'); stream >] ->
      lex_comment stream

  (* Otherwise, just return the character as its ascii value. *)
  | [< 'c; stream >] ->
      [< 'Token.Kwd c; lex stream >]

  (* end of stream. *)
  | [< >] -> [< >]

and lex_number buffer = parser
  | [< ' ('0' .. '9' | '.' as c); stream >] ->
      Buffer.add_char buffer c;
      lex_number buffer stream
  | [< stream=lex >] ->
      [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >]

and lex_ident buffer = parser
  | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] ->
      Buffer.add_char buffer c;
      lex_ident buffer stream
  | [< stream=lex >] ->
      match Buffer.contents buffer with
      | "def" -> [< 'Token.Def; stream >]
      | "extern" -> [< 'Token.Extern; stream >]
      | "if" -> [< 'Token.If; stream >]
      | "then" -> [< 'Token.Then; stream >]
      | "else" -> [< 'Token.Else; stream >]
      | "for" -> [< 'Token.For; stream >]
      | "in" -> [< 'Token.In; stream >]
      | "binary" -> [< 'Token.Binary; stream >]
      | "unary" -> [< 'Token.Unary; stream >]
      | "var" -> [< 'Token.Var; stream >]
      | id -> [< 'Token.Ident id; stream >]

and lex_comment = parser
  | [< ' ('\n'); stream=lex >] -> stream
  | [< 'c; e=lex_comment >] -> e
  | [< >] -> [< >]
- ast.ml:
(*===----------------------------------------------------------------------===
 * Abstract Syntax Tree (aka Parse Tree)
 *===----------------------------------------------------------------------===*)

(* expr - Base type for all expression nodes. *)
type expr =
  (* variant for numeric literals like "1.0". *)
  | Number of float

  (* variant for referencing a variable, like "a". *)
  | Variable of string

  (* variant for a unary operator. *)
  | Unary of char * expr

  (* variant for a binary operator. *)
  | Binary of char * expr * expr

  (* variant for function calls. *)
  | Call of string * expr array

  (* variant for if/then/else. *)
  | If of expr * expr * expr

  (* variant for for/in. *)
  | For of string * expr * expr * expr option * expr

  (* variant for var/in. *)
  | Var of (string * expr option) array * expr

(* proto - This type represents the "prototype" for a function, which captures
 * its name, and its argument names (thus implicitly the number of arguments the
 * function takes). *)
type proto =
  | Prototype of string * string array
  | BinOpPrototype of string * string array * int

(* func - This type represents a function definition itself. *)
type func = Function of proto * expr
- parser.ml:
(*===---------------------------------------------------------------------===
 * Parser
 *===---------------------------------------------------------------------===*)

(* binop_precedence - This holds the precedence for each binary operator that is
 * defined *)
let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10

(* precedence - Get the precedence of the pending binary operator token. *)
let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1

(* primary
 *   ::= identifier
 *   ::= numberexpr
 *   ::= parenexpr
 *   ::= ifexpr
 *   ::= forexpr
 *   ::= varexpr *)
let rec parse_primary = parser
  (* numberexpr ::= number *)
  | [< 'Token.Number n >] -> Ast.Number n

  (* parenexpr ::= '(' expression ')' *)
  | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e

  (* identifierexpr
   *   ::= identifier
   *   ::= identifier '(' argumentexpr ')' *)
  | [< 'Token.Ident id; stream >] ->
      let rec parse_args accumulator = parser
        | [< e=parse_expr; stream >] ->
            begin parser
              | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e
              | [< >] -> e :: accumulator
            end stream
        | [< >] -> accumulator
      in
      let rec parse_ident id = parser
        (* Call. *)
        | [< 'Token.Kwd '(';
             args=parse_args [];
             'Token.Kwd ')' ?? "expected ')'" >] ->
            Ast.Call (id, Array.of_list (List.rev args))

        (* Simple variable ref. *)
        | [< >] -> Ast.Variable id
      in
      parse_ident id stream

  (* ifexpr ::= 'if' expr 'then' expr 'else' expr *)
  | [< 'Token.If; c=parse_expr;
       'Token.Then ?? "expected 'then'"; t=parse_expr;
       'Token.Else ?? "expected 'else'"; e=parse_expr >] ->
      Ast.If (c, t, e)

  (* forexpr
   *   ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression *)
  | [< 'Token.For;
       'Token.Ident id ?? "expected identifier after for";
       'Token.Kwd '=' ?? "expected '=' after for";
       stream >] ->
      begin parser
        | [< start=parse_expr;
             'Token.Kwd ',' ?? "expected ',' after for";
             end_=parse_expr;
             stream >] ->
            let step =
              begin parser
                | [< 'Token.Kwd ','; step=parse_expr >] -> Some step
                | [< >] -> None
              end stream
            in
            begin parser
              | [< 'Token.In; body=parse_expr >] ->
                  Ast.For (id, start, end_, step, body)
              | [< >] ->
                  raise (Stream.Error "expected 'in' after for")
            end stream
        | [< >] ->
            raise (Stream.Error "expected '=' after for")
      end stream

  (* varexpr
   *   ::= 'var' identifier ('=' expression)?
   *       (',' identifier ('=' expression)?)* 'in' expression *)
  | [< 'Token.Var;
       (* At least one variable name is required. *)
       'Token.Ident id ?? "expected identifier after var";
       init=parse_var_init;
       var_names=parse_var_names [(id, init)];
       (* At this point, we have to have 'in'. *)
       'Token.In ?? "expected 'in' keyword after 'var'";
       body=parse_expr >] ->
      Ast.Var (Array.of_list (List.rev var_names), body)

  | [< >] -> raise (Stream.Error "unknown token when expecting an expression.")

(* unary
 *   ::= primary
 *   ::= '!' unary *)
and parse_unary = parser
  (* If this is a unary operator, read it. *)
  | [< 'Token.Kwd op when op != '(' && op != ')'; operand=parse_expr >] ->
      Ast.Unary (op, operand)

  (* If the current token is not an operator, it must be a primary expr. *)
  | [< stream >] -> parse_primary stream

(* binoprhs
 *   ::= ('+' primary)* *)
and parse_bin_rhs expr_prec lhs stream =
  match Stream.peek stream with
  (* If this is a binop, find its precedence. *)
  | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c ->
      let token_prec = precedence c in

      (* If this is a binop that binds at least as tightly as the current binop,
       * consume it, otherwise we are done. *)
      if token_prec < expr_prec then lhs else begin
        (* Eat the binop. *)
        Stream.junk stream;

        (* Parse the primary expression after the binary operator. *)
        let rhs = parse_unary stream in

        (* Okay, we know this is a binop. *)
        let rhs =
          match Stream.peek stream with
          | Some (Token.Kwd c2) ->
              (* If BinOp binds less tightly with rhs than the operator after
               * rhs, let the pending operator take rhs as its lhs. *)
              let next_prec = precedence c2 in
              if token_prec < next_prec
              then parse_bin_rhs (token_prec + 1) rhs stream
              else rhs
          | _ -> rhs
        in

        (* Merge lhs/rhs. *)
        let lhs = Ast.Binary (c, lhs, rhs) in
        parse_bin_rhs expr_prec lhs stream
      end
  | _ -> lhs

and parse_var_init = parser
  (* read in the optional initializer. *)
  | [< 'Token.Kwd '='; e=parse_expr >] -> Some e
  | [< >] -> None

and parse_var_names accumulator = parser
  | [< 'Token.Kwd ',';
       'Token.Ident id ?? "expected identifier list after var";
       init=parse_var_init;
       e=parse_var_names ((id, init) :: accumulator) >] -> e
  | [< >] -> accumulator

(* expression
 *   ::= primary binoprhs *)
and parse_expr = parser
  | [< lhs=parse_unary; stream >] -> parse_bin_rhs 0 lhs stream

(* prototype
 *   ::= id '(' id* ')'
 *   ::= binary LETTER number? (id, id)
 *   ::= unary LETTER number? (id) *)
let parse_prototype =
  let rec parse_args accumulator = parser
    | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e
    | [< >] -> accumulator
  in
  let parse_operator = parser
    | [< 'Token.Unary >] -> "unary", 1
    | [< 'Token.Binary >] -> "binary", 2
  in
  let parse_binary_precedence = parser
    | [< 'Token.Number n >] -> int_of_float n
    | [< >] -> 30
  in
  parser
  | [< 'Token.Ident id;
       'Token.Kwd '(' ?? "expected '(' in prototype";
       args=parse_args [];
       'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
      (* success. *)
      Ast.Prototype (id, Array.of_list (List.rev args))
  | [< (prefix, kind)=parse_operator;
       'Token.Kwd op ?? "expected an operator";
       (* Read the precedence if present. *)
       binary_precedence=parse_binary_precedence;
       'Token.Kwd '(' ?? "expected '(' in prototype";
       args=parse_args [];
       'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
      let name = prefix ^ (String.make 1 op) in
      let args = Array.of_list (List.rev args) in

      (* Verify right number of arguments for operator. *)
      if Array.length args != kind
      then raise (Stream.Error "invalid number of operands for operator")
      else
        if kind == 1 then
          Ast.Prototype (name, args)
        else
          Ast.BinOpPrototype (name, args, binary_precedence)
  | [< >] ->
      raise (Stream.Error "expected function name in prototype")

(* definition ::= 'def' prototype expression *)
let parse_definition = parser
  | [< 'Token.Def; p=parse_prototype; e=parse_expr >] ->
      Ast.Function (p, e)

(* toplevelexpr ::= expression *)
let parse_toplevel = parser
  | [< e=parse_expr >] ->
      (* Make an anonymous proto. *)
      Ast.Function (Ast.Prototype ("", [||]), e)

(* external ::= 'extern' prototype *)
let parse_extern = parser
  | [< 'Token.Extern; e=parse_prototype >] -> e
- codegen.ml:
(*===----------------------------------------------------------------------===
 * Code Generation
 *===----------------------------------------------------------------------===*)

open Llvm

exception Error of string

let context = global_context ()
let the_module = create_module context "my cool jit"
let builder = builder context
let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10
let double_type = double_type context

(* Create an alloca instruction in the entry block of the function. This
 * is used for mutable variables etc. *)
let create_entry_block_alloca the_function var_name =
  let builder = builder_at context (instr_begin (entry_block the_function)) in
  build_alloca double_type var_name builder

let rec codegen_expr = function
  | Ast.Number n -> const_float double_type n
  | Ast.Variable name ->
      let v = try Hashtbl.find named_values name with
        | Not_found -> raise (Error "unknown variable name")
      in
      (* Load the value. *)
      build_load v name builder
  | Ast.Unary (op, operand) ->
      let operand = codegen_expr operand in
      let callee = "unary" ^ (String.make 1 op) in
      let callee =
        match lookup_function callee the_module with
        | Some callee -> callee
        | None -> raise (Error "unknown unary operator")
      in
      build_call callee [|operand|] "unop" builder
  | Ast.Binary (op, lhs, rhs) ->
      begin match op with
      | '=' ->
          (* Special case '=' because we don't want to emit the LHS as an
           * expression. *)
          let name =
            match lhs with
            | Ast.Variable name -> name
            | _ -> raise (Error "destination of '=' must be a variable")
          in

          (* Codegen the rhs. *)
          let val_ = codegen_expr rhs in

          (* Lookup the name. *)
          let variable = try Hashtbl.find named_values name with
            | Not_found -> raise (Error "unknown variable name")
          in
          ignore(build_store val_ variable builder);
          val_
      | _ ->
          let lhs_val = codegen_expr lhs in
          let rhs_val = codegen_expr rhs in
          begin
            match op with
            | '+' -> build_add lhs_val rhs_val "addtmp" builder
            | '-' -> build_sub lhs_val rhs_val "subtmp" builder
            | '*' -> build_mul lhs_val rhs_val "multmp" builder
            | '<' ->
                (* Convert bool 0/1 to double 0.0 or 1.0 *)
                let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in
                build_uitofp i double_type "booltmp" builder
            | _ ->
                (* If it wasn't a builtin binary operator, it must be a user defined
                 * one. Emit a call to it. *)
                let callee = "binary" ^ (String.make 1 op) in
                let callee =
                  match lookup_function callee the_module with
                  | Some callee -> callee
                  | None -> raise (Error "binary operator not found!")
                in
                build_call callee [|lhs_val; rhs_val|] "binop" builder
          end
      end
  | Ast.Call (callee, args) ->
      (* Look up the name in the module table. *)
      let callee =
        match lookup_function callee the_module with
        | Some callee -> callee
        | None -> raise (Error "unknown function referenced")
      in
      let params = params callee in

      (* If argument mismatch error. *)
      if Array.length params == Array.length args then () else
        raise (Error "incorrect # arguments passed");
      let args = Array.map codegen_expr args in
      build_call callee args "calltmp" builder
  | Ast.If (cond, then_, else_) ->
      let cond = codegen_expr cond in

      (* Convert condition to a bool by comparing equal to 0.0 *)
      let zero = const_float double_type 0.0 in
      let cond_val = build_fcmp Fcmp.One cond zero "ifcond" builder in

      (* Grab the first block so that we might later add the conditional branch
       * to it at the end of the function. *)
      let start_bb = insertion_block builder in
      let the_function = block_parent start_bb in

      let then_bb = append_block context "then" the_function in

      (* Emit 'then' value. *)
      position_at_end then_bb builder;
      let then_val = codegen_expr then_ in

      (* Codegen of 'then' can change the current block, update then_bb for the
       * phi. We create a new name because one is used for the phi node, and the
       * other is used for the conditional branch. *)
      let new_then_bb = insertion_block builder in

      (* Emit 'else' value. *)
      let else_bb = append_block context "else" the_function in
      position_at_end else_bb builder;
      let else_val = codegen_expr else_ in

      (* Codegen of 'else' can change the current block, update else_bb for the
       * phi. *)
      let new_else_bb = insertion_block builder in

      (* Emit merge block. *)
      let merge_bb = append_block context "ifcont" the_function in
      position_at_end merge_bb builder;
      let incoming = [(then_val, new_then_bb); (else_val, new_else_bb)] in
      let phi = build_phi incoming "iftmp" builder in

      (* Return to the start block to add the conditional branch. *)
      position_at_end start_bb builder;
      ignore (build_cond_br cond_val then_bb else_bb builder);

      (* Set a unconditional branch at the end of the 'then' block and the
       * 'else' block to the 'merge' block. *)
      position_at_end new_then_bb builder; ignore (build_br merge_bb builder);
      position_at_end new_else_bb builder; ignore (build_br merge_bb builder);

      (* Finally, set the builder to the end of the merge block. *)
      position_at_end merge_bb builder;

      phi
  | Ast.For (var_name, start, end_, step, body) ->
      (* Output this as:
       *   var = alloca double
       *   ...
       *   start = startexpr
       *   store start -> var
       *   goto loop
       * loop:
       *   ...
       *   bodyexpr
       *   ...
       * loopend:
       *   step = stepexpr
       *   endcond = endexpr
       *
       *   curvar = load var
       *   nextvar = curvar + step
       *   store nextvar -> var
       *   br endcond, loop, endloop
       * outloop: *)
      let the_function = block_parent (insertion_block builder) in

      (* Create an alloca for the variable in the entry block. *)
      let alloca = create_entry_block_alloca the_function var_name in

      (* Emit the start code first, without 'variable' in scope. *)
      let start_val = codegen_expr start in

      (* Store the value into the alloca. *)
      ignore(build_store start_val alloca builder);

      (* Make the new basic block for the loop header, inserting after current
       * block. *)
      let loop_bb = append_block context "loop" the_function in

      (* Insert an explicit fall through from the current block to the
       * loop_bb. *)
      ignore (build_br loop_bb builder);

      (* Start insertion in loop_bb. *)
      position_at_end loop_bb builder;

      (* Within the loop, the variable is defined equal to the PHI node. If it
       * shadows an existing variable, we have to restore it, so save it
       * now. *)
      let old_val =
        try Some (Hashtbl.find named_values var_name) with Not_found -> None
      in
      Hashtbl.add named_values var_name alloca;

      (* Emit the body of the loop. This, like any other expr, can change the
       * current BB. Note that we ignore the value computed by the body, but
       * don't allow an error *)
      ignore (codegen_expr body);

      (* Emit the step value. *)
      let step_val =
        match step with
        | Some step -> codegen_expr step
        (* If not specified, use 1.0. *)
        | None -> const_float double_type 1.0
      in

      (* Compute the end condition. *)
      let end_cond = codegen_expr end_ in

      (* Reload, increment, and restore the alloca. This handles the case where
       * the body of the loop mutates the variable. *)
      let cur_var = build_load alloca var_name builder in
      let next_var = build_add cur_var step_val "nextvar" builder in
      ignore(build_store next_var alloca builder);

      (* Convert condition to a bool by comparing equal to 0.0. *)
      let zero = const_float double_type 0.0 in
      let end_cond = build_fcmp Fcmp.One end_cond zero "loopcond" builder in

      (* Create the "after loop" block and insert it. *)
      let after_bb = append_block context "afterloop" the_function in

      (* Insert the conditional branch into the end of loop_end_bb. *)
      ignore (build_cond_br end_cond loop_bb after_bb builder);

      (* Any new code will be inserted in after_bb. *)
      position_at_end after_bb builder;

      (* Restore the unshadowed variable. *)
      begin match old_val with
      | Some old_val -> Hashtbl.add named_values var_name old_val
      | None -> ()
      end;

      (* for expr always returns 0.0. *)
      const_null double_type
  | Ast.Var (var_names, body) ->
      let old_bindings = ref [] in
      let the_function = block_parent (insertion_block builder) in

      (* Register all variables and emit their initializer. *)
      Array.iter (fun (var_name, init) ->
        (* Emit the initializer before adding the variable to scope, this
         * prevents the initializer from referencing the variable itself, and
         * permits stuff like this:
         *   var a = 1 in
         *     var a = a in ...   # refers to outer 'a'. *)
        let init_val =
          match init with
          | Some init -> codegen_expr init
          (* If not specified, use 0.0. *)
          | None -> const_float double_type 0.0
        in

        let alloca = create_entry_block_alloca the_function var_name in
        ignore(build_store init_val alloca builder);

        (* Remember the old variable binding so that we can restore the binding
         * when we unrecurse. *)
        begin
          try
            let old_value = Hashtbl.find named_values var_name in
            old_bindings := (var_name, old_value) :: !old_bindings;
          with Not_found -> ()
        end;

        (* Remember this binding. *)
        Hashtbl.add named_values var_name alloca;
      ) var_names;

      (* Codegen the body, now that all vars are in scope. *)
      let body_val = codegen_expr body in

      (* Pop all our variables from scope. *)
      List.iter (fun (var_name, old_value) ->
        Hashtbl.add named_values var_name old_value
      ) !old_bindings;

      (* Return the body computation. *)
      body_val

let codegen_proto = function
  | Ast.Prototype (name, args) | Ast.BinOpPrototype (name, args, _) ->
      (* Make the function type: double(double,double) etc. *)
      let doubles = Array.make (Array.length args) double_type in
      let ft = function_type double_type doubles in
      let f =
        match lookup_function name the_module with
        | None -> declare_function name ft the_module

        (* If 'f' conflicted, there was already something named 'name'. If it
         * has a body, don't allow redefinition or reextern. *)
        | Some f ->
            (* If 'f' already has a body, reject this. *)
            if block_begin f <> At_end f then
              raise (Error "redefinition of function");

            (* If 'f' took a different number of arguments, reject. *)
            if element_type (type_of f) <> ft then
              raise (Error "redefinition of function with different # args");
            f
      in

      (* Set names for all arguments. *)
      Array.iteri (fun i a ->
        let n = args.(i) in
        set_value_name n a;
        Hashtbl.add named_values n a;
      ) (params f);
      f

(* Create an alloca for each argument and register the argument in the symbol
 * table so that references to it will succeed. *)
let create_argument_allocas the_function proto =
  let args = match proto with
    | Ast.Prototype (_, args) | Ast.BinOpPrototype (_, args, _) -> args
  in
  Array.iteri (fun i ai ->
    let var_name = args.(i) in
    (* Create an alloca for this variable. *)
    let alloca = create_entry_block_alloca the_function var_name in

    (* Store the initial value into the alloca. *)
    ignore(build_store ai alloca builder);

    (* Add arguments to variable symbol table. *)
    Hashtbl.add named_values var_name alloca;
  ) (params the_function)

let codegen_func the_fpm = function
  | Ast.Function (proto, body) ->
      Hashtbl.clear named_values;
      let the_function = codegen_proto proto in

      (* If this is an operator, install it. *)
      begin match proto with
      | Ast.BinOpPrototype (name, args, prec) ->
          let op = name.[String.length name - 1] in
          Hashtbl.add Parser.binop_precedence op prec;
      | _ -> ()
      end;

      (* Create a new basic block to start insertion into. *)
      let bb = append_block context "entry" the_function in
      position_at_end bb builder;

      try
        (* Add all arguments to the symbol table and create their allocas. *)
        create_argument_allocas the_function proto;

        let ret_val = codegen_expr body in

        (* Finish off the function. *)
        let _ = build_ret ret_val builder in

        (* Validate the generated code, checking for consistency. *)
        Llvm_analysis.assert_valid_function the_function;

        (* Optimize the function. *)
        let _ = PassManager.run_function the_function the_fpm in

        the_function
      with e ->
        delete_function the_function;
        raise e
- toplevel.ml:
(*===----------------------------------------------------------------------===
 * Top-Level parsing and JIT Driver
 *===----------------------------------------------------------------------===*)

open Llvm
open Llvm_executionengine

(* top ::= definition | external | expression | ';' *)
let rec main_loop the_fpm the_execution_engine stream =
  match Stream.peek stream with
  | None -> ()

  (* ignore top-level semicolons. *)
  | Some (Token.Kwd ';') ->
      Stream.junk stream;
      main_loop the_fpm the_execution_engine stream

  | Some token ->
      begin
        try match token with
        | Token.Def ->
            let e = Parser.parse_definition stream in
            print_endline "parsed a function definition.";
            dump_value (Codegen.codegen_func the_fpm e);
        | Token.Extern ->
            let e = Parser.parse_extern stream in
            print_endline "parsed an extern.";
            dump_value (Codegen.codegen_proto e);
        | _ ->
            (* Evaluate a top-level expression into an anonymous function. *)
            let e = Parser.parse_toplevel stream in
            print_endline "parsed a top-level expr";
            let the_function = Codegen.codegen_func the_fpm e in
            dump_value the_function;

            (* JIT the function, returning a function pointer. *)
            let result = ExecutionEngine.run_function the_function [||]
              the_execution_engine in

            print_string "Evaluated to ";
            print_float (GenericValue.as_float Codegen.double_type result);
            print_newline ();
        with Stream.Error s | Codegen.Error s ->
          (* Skip token for error recovery. *)
          Stream.junk stream;
          print_endline s;
      end;
      print_string "ready> "; flush stdout;
      main_loop the_fpm the_execution_engine stream
- toy.ml:
(*===----------------------------------------------------------------------===
 * Main driver code.
 *===----------------------------------------------------------------------===*)

open Llvm
open Llvm_executionengine
open Llvm_target
open Llvm_scalar_opts

let main () =
  ignore (initialize_native_target ());

  (* Install standard binary operators.
   * 1 is the lowest precedence. *)
  Hashtbl.add Parser.binop_precedence '=' 2;
  Hashtbl.add Parser.binop_precedence '<' 10;
  Hashtbl.add Parser.binop_precedence '+' 20;
  Hashtbl.add Parser.binop_precedence '-' 20;
  Hashtbl.add Parser.binop_precedence '*' 40;    (* highest. *)

  (* Prime the first token. *)
  print_string "ready> "; flush stdout;
  let stream = Lexer.lex (Stream.of_channel stdin) in

  (* Create the JIT. *)
  let the_execution_engine = ExecutionEngine.create Codegen.the_module in
  let the_fpm = PassManager.create_function Codegen.the_module in

  (* Set up the optimizer pipeline.  Start with registering info about how the
   * target lays out data structures. *)
  DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm;

  (* Promote allocas to registers. *)
  add_memory_to_register_promotion the_fpm;

  (* Do simple "peephole" optimizations and bit-twiddling optzn. *)
  add_instruction_combination the_fpm;

  (* reassociate expressions. *)
  add_reassociation the_fpm;

  (* Eliminate Common SubExpressions. *)
  add_gvn the_fpm;

  (* Simplify the control flow graph (deleting unreachable blocks, etc). *)
  add_cfg_simplification the_fpm;

  ignore (PassManager.initialize the_fpm);

  (* Run the main "interpreter loop" now. *)
  Toplevel.main_loop the_fpm the_execution_engine stream;

  (* Print out all the generated code. *)
  dump_module Codegen.the_module
;;

main ()
- bindings.c
#include <stdio.h>

/* putchard - putchar that takes a double and returns 0. */
extern double putchard(double X) {
  putchar((char)X);
  return 0;
}

/* printd - printf that takes a double and prints it as "%f\n", returning 0. */
extern double printd(double X) {
  printf("%f\n", X);
  return 0;
}
Kaleidoscope: Conclusion and other useful LLVM tidbits¶
Tutorial Conclusion¶
Welcome to the final chapter of the “Implementing a language with LLVM” tutorial. In the course of this tutorial, we have grown our little Kaleidoscope language from being a useless toy, to being a semi-interesting (but probably still useless) toy. :)
It is interesting to see how far we’ve come, and how little code it has taken. We built the entire lexer, parser, AST, code generator, and an interactive run-loop (with a JIT!) by hand in under 700 lines of (non-comment/non-blank) code.
Our little language supports a couple of interesting features: it supports user defined binary and unary operators, it uses JIT compilation for immediate evaluation, and it supports a few control flow constructs with SSA construction.
Part of the idea of this tutorial was to show you how easy and fun it can be to define, build, and play with languages. Building a compiler need not be a scary or mystical process! Now that you’ve seen some of the basics, I strongly encourage you to take the code and hack on it. For example, try adding:
- global variables - While global variables have questionable value in modern software engineering, they are often useful when putting together quick little hacks like the Kaleidoscope compiler itself. Fortunately, our current setup makes it very easy to add global variables: just have value lookup check to see if an unresolved variable is in the global variable symbol table before rejecting it. To create a new global variable, make an instance of the LLVM GlobalVariable class.
- typed variables - Kaleidoscope currently only supports variables of type double. This gives the language a very nice elegance, because only supporting one type means that you never have to specify types. Different languages have different ways of handling this. The easiest way is to require the user to specify types for every variable definition, and record the type of the variable in the symbol table along with its Value* (a sketch of this appears after this list).
- arrays, structs, vectors, etc - Once you add types, you can start extending the type system in all sorts of interesting ways. Simple arrays are very easy and are quite useful for many different applications. Adding them is mostly an exercise in learning how the LLVM getelementptr instruction works: it is so nifty/unconventional, it has its own FAQ! If you add support for recursive types (e.g. linked lists), make sure to read the section in the LLVM Programmer’s Manual that describes how to construct them.
- standard runtime - Our current language allows the user to access arbitrary external functions, and we use it for things like “printd” and “putchard”. As you extend the language to add higher-level constructs, often these constructs make the most sense if they are lowered to calls into a language-supplied runtime. For example, if you add hash tables to the language, it would probably make sense to add the routines to a runtime, instead of inlining them all the way.
- memory management - Currently we can only access the stack in Kaleidoscope. It would also be useful to be able to allocate heap memory, either with calls to the standard libc malloc/free interface or with a garbage collector. If you would like to use garbage collection, note that LLVM fully supports Accurate Garbage Collection including algorithms that move objects and need to scan/update the stack.
- debugger support - LLVM supports generation of DWARF Debug info which is understood by common debuggers like GDB. Adding support for debug info is fairly straightforward. The best way to understand it is to compile some C/C++ code with “clang -g -O0” and take a look at what it produces.
- exception handling support - LLVM supports generation of zero cost exceptions which interoperate with code compiled in other languages. You could also generate code by implicitly making every function return an error value and checking it. You could also make explicit use of setjmp/longjmp. There are many different ways to go here.
- object orientation, generics, database access, complex numbers, geometric programming, ... - Really, there is no end of crazy features that you can add to the language.
- unusual domains - We’ve been talking about applying LLVM to a domain that many people are interested in: building a compiler for a specific language. However, there are many other domains that can use compiler technology that are not typically considered. For example, LLVM has been used to implement OpenGL graphics acceleration, translate C++ code to ActionScript, and many other cute and clever things. Maybe you will be the first to JIT compile a regular expression interpreter into native code with LLVM?
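As a concrete starting point for the “typed variables” suggestion above, here is a minimal, hypothetical sketch of a symbol table that records an LLVM type next to each variable’s storage; the names var_info, typed_values and codegen_variable are invented for this sketch and are not part of the tutorial code:

(* Symbol-table entries carry the variable's type as well as its alloca. *)
type var_info = { ty : Llvm.lltype; storage : Llvm.llvalue }

let typed_values : (string, var_info) Hashtbl.t = Hashtbl.create 10

exception Error of string  (* as in codegen.ml *)

(* Variable references consult the recorded entry instead of assuming
 * double_type everywhere. *)
let codegen_variable name builder =
  let info =
    try Hashtbl.find typed_values name
    with Not_found -> raise (Error "unknown variable name")
  in
  (* info.ty is available here for type checks or conversions before the load. *)
  Llvm.build_load info.storage name builder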
Have fun - try doing something crazy and unusual. Building a language like everyone else always has is much less fun than trying something a little crazy or off the wall and seeing how it turns out. If you get stuck or want to talk about it, feel free to email the llvm-dev mailing list: it has lots of people who are interested in languages and are often willing to help out.
Before we end this tutorial, I want to talk about some “tips and tricks” for generating LLVM IR. These are some of the more subtle things that may not be obvious, but are very useful if you want to take advantage of LLVM’s capabilities.
Properties of the LLVM IR¶
We have a couple of common questions about code in LLVM IR form - let’s just get these out of the way right now, shall we?
Target Independence¶
Kaleidoscope is an example of a “portable language”: any program written in Kaleidoscope will work the same way on any target that it runs on. Many other languages have this property, e.g. Lisp, Java, Haskell, JavaScript, Python, etc (note that while these languages are portable, not all their libraries are).
One nice aspect of LLVM is that it is often capable of preserving target independence in the IR: you can take the LLVM IR for a Kaleidoscope-compiled program and run it on any target that LLVM supports, even emitting C code and compiling that on targets that LLVM doesn’t support natively. You can trivially tell that the Kaleidoscope compiler generates target-independent code because it never queries for any target-specific information when generating code.
The fact that LLVM provides a compact, target-independent representation for code gets a lot of people excited. Unfortunately, these people are usually thinking about C or a language from the C family when they are asking questions about language portability. I say “unfortunately”, because there is really no way to make (fully general) C code portable, other than shipping the source code around (and of course, C source code is not actually portable in general either - ever port a really old application from 32- to 64-bits?).
The problem with C (again, in its full generality) is that it is heavily laden with target specific assumptions. As one simple example, the preprocessor often destructively removes target-independence from the code when it processes the input text:
#ifdef __i386__
int X = 1;
#else
int X = 42;
#endif
While it is possible to engineer more and more complex solutions to problems like this, it cannot be solved in full generality in a way that is better than shipping the actual source code.
That said, there are interesting subsets of C that can be made portable. If you are willing to fix primitive types to a fixed size (say int = 32-bits, and long = 64-bits), don’t care about ABI compatibility with existing binaries, and are willing to give up some other minor features, you can have portable code. This can make sense for specialized domains such as an in-kernel language.
Safety Guarantees¶
Many of the languages above are also “safe” languages: it is impossible for a program written in Java to corrupt its address space and crash the process (assuming the JVM has no bugs). Safety is an interesting property that requires a combination of language design, runtime support, and often operating system support.
It is certainly possible to implement a safe language in LLVM, but LLVM IR does not itself guarantee safety. The LLVM IR allows unsafe pointer casts, use-after-free bugs, buffer over-runs, and a variety of other problems. Safety needs to be implemented as a layer on top of LLVM and, conveniently, several groups have investigated this. Ask on the llvm-dev mailing list if you are interested in more details.
Language-Specific Optimizations¶
One thing about LLVM that turns off many people is that it does not solve all the world’s problems in one system (sorry ‘world hunger’, someone else will have to solve you some other day). One specific complaint is that people perceive LLVM as being incapable of performing high-level language-specific optimization: LLVM “loses too much information”.
Unfortunately, this is really not the place to give you a full and unified version of “Chris Lattner’s theory of compiler design”. Instead, I’ll make a few observations:
First, you’re right that LLVM does lose information. For example, as of this writing, there is no way to distinguish in the LLVM IR whether an SSA-value came from a C “int” or a C “long” on an ILP32 machine (other than debug info). Both get compiled down to an ‘i32’ value and the information about what it came from is lost. The more general issue here is that the LLVM type system uses “structural equivalence” instead of “name equivalence”. Another place this surprises people is if you have two types in a high-level language that have the same structure (e.g. two different structs that have a single int field): these types will compile down into a single LLVM type and it will be impossible to tell what it came from.
Second, while LLVM does lose information, LLVM is not a fixed target: we continue to enhance and improve it in many different ways. In addition to adding new features (LLVM did not always support exceptions or debug info), we also extend the IR to capture important information for optimization (e.g. whether an argument is sign or zero extended, information about pointers aliasing, etc). Many of the enhancements are user-driven: people want LLVM to include some specific feature, so they go ahead and extend it.
Third, it is possible and easy to add language-specific optimizations, and you have a number of choices in how to do it. As one trivial example, it is easy to add language-specific optimization passes that “know” things about code compiled for a language. In the case of the C family, there is an optimization pass that “knows” about the standard C library functions. If you call “exit(0)” in main(), it knows that it is safe to optimize that into “return 0;” because C specifies what the ‘exit’ function does.
In addition to simple library knowledge, it is possible to embed a variety of other language-specific information into the LLVM IR. If you have a specific need and run into a wall, please bring the topic up on the llvm-dev list. At the very worst, you can always treat LLVM as if it were a “dumb code generator” and implement the high-level optimizations you desire in your front-end, on the language-specific AST.
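As an illustration of the front-end route, here is a minimal, hypothetical sketch of an AST-level optimization for Kaleidoscope itself: folding binary operations on numeric literals before codegen ever runs. The name fold_constants is invented for this sketch, and it deliberately rewrites only Binary nodes:

(* Fold constant subexpressions in the Kaleidoscope AST before codegen.
 * Other node kinds are returned unchanged; a real pass would recurse into
 * them as well. *)
let rec fold_constants = function
  | Ast.Binary (op, lhs, rhs) ->
      begin match op, fold_constants lhs, fold_constants rhs with
      | '+', Ast.Number a, Ast.Number b -> Ast.Number (a +. b)
      | '-', Ast.Number a, Ast.Number b -> Ast.Number (a -. b)
      | '*', Ast.Number a, Ast.Number b -> Ast.Number (a *. b)
      | _, lhs', rhs' -> Ast.Binary (op, lhs', rhs')
      end
  | e -> e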
Tips and Tricks¶
There are a variety of useful tips and tricks that you come to know after working on/with LLVM that aren’t obvious at first glance. Instead of letting everyone rediscover them, this section talks about some of these issues.
Implementing portable offsetof/sizeof¶
One interesting thing that comes up, if you are trying to keep the code generated by your compiler “target independent”, is that you often need to know the size of some LLVM type or the offset of some field in an LLVM structure. For example, you might need to pass the size of a type into a function that allocates memory.
Unfortunately, this can vary widely across targets: for example the width of a pointer is trivially target-specific. However, there is a clever way to use the getelementptr instruction that allows you to compute this in a portable way.
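For example, a sizeof-like constant can be built by computing the address of element 1 of a hypothetical array starting at the null pointer, and an offsetof-like constant by indexing a struct field from null. Here is a minimal sketch using the OCaml bindings from this tutorial (size_of and offset_of are names invented here; the results are ordinary constant expressions that the backend folds for the actual target):

(* sizeof ty: the integer value of a pointer one element past null. *)
let size_of ty context =
  let null_ptr = Llvm.const_null (Llvm.pointer_type ty) in
  let one = Llvm.const_int (Llvm.i32_type context) 1 in
  Llvm.const_ptrtoint (Llvm.const_gep null_ptr [| one |])
    (Llvm.i64_type context)

(* offsetof field: index into the struct type from a null base pointer. *)
let offset_of struct_ty field_index context =
  let i32 n = Llvm.const_int (Llvm.i32_type context) n in
  let null_ptr = Llvm.const_null (Llvm.pointer_type struct_ty) in
  Llvm.const_ptrtoint (Llvm.const_gep null_ptr [| i32 0; i32 field_index |])
    (Llvm.i64_type context)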
Garbage Collected Stack Frames¶
Some languages want to explicitly manage their stack frames, often so that they are garbage collected or to allow easy implementation of closures. There are often better ways to implement these features than explicit stack frames, but LLVM does support them, if you want. It requires your front-end to convert the code into Continuation Passing Style and the use of tail calls (which LLVM also supports).
External Tutorials¶
- Tutorial: Creating an LLVM Backend for the Cpu0 Architecture
- A step-by-step tutorial for developing an LLVM backend. Under active development at https://github.com/Jonathan2251/lbd (please contribute!).
- Howto: Implementing LLVM Integrated Assembler
- A simple guide for how to implement an LLVM integrated assembler for an architecture.
Advanced Topics¶
LLVM 3.7 Release Notes¶
Introduction¶
This document contains the release notes for the LLVM Compiler Infrastructure, release 3.7. Here we describe the status of LLVM, including major improvements from the previous release, improvements in various subprojects of LLVM, and some of the current users of the code. All LLVM releases may be downloaded from the LLVM releases web site.
For more information about LLVM, including information about the latest release, please check out the main LLVM web site. If you have questions or comments, the LLVM Developer’s Mailing List is a good place to send them.
Note that if you are reading this file from a Subversion checkout or the main LLVM web page, this document applies to the next release, not the current one. To see the release notes for a specific release, please see the releases page.
Major changes in 3.7.1¶
3.7.0 was released with an inadvertent change to the signature of the C API function LLVMBuildLandingPad, which made the C API incompatible with prior releases. This has been corrected in LLVM 3.7.1.
As a result of this change, 3.7.0 is not ABI compatible with 3.7.1.
History of the LLVMBuildLandingPad() function:
- 3.6.2 and prior releases: LLVMBuildLandingPad(LLVMBuilderRef, LLVMTypeRef, LLVMValueRef, unsigned, const char*)
- 3.7.0: LLVMBuildLandingPad(LLVMBuilderRef, LLVMTypeRef, unsigned, const char*)
- 3.7.1 and future releases: LLVMBuildLandingPad(LLVMBuilderRef, LLVMTypeRef, LLVMValueRef, unsigned, const char*)
Non-comprehensive list of changes in 3.7.0¶
The minimum required Visual Studio version for building LLVM is now 2013 Update 4.
A new documentation page, Performance Tips for Frontend Authors, contains a collection of tips for frontend authors on how to generate IR which LLVM is able to effectively optimize.
The DataLayout is no longer optional. All the IR level optimizations expect it to be present and the API has been changed to use a reference instead of a pointer to make it explicit. The Module owns the datalayout and it has to match the one attached to the TargetMachine for generating code.
- In 3.6, a pass was inserted in the pipeline to make the DataLayout accessible: MyPassManager->add(new DataLayoutPass(MyTargetMachine->getDataLayout()));
- In 3.7, you don’t need a pass; you set the DataLayout on the Module: MyModule->setDataLayout(MyTargetMachine->createDataLayout());
The LLVM C API LLVMGetTargetMachineData is deprecated to reflect the fact that it won’t be available anymore from TargetMachine in 3.8.
Comdats are now orthogonal to the linkage. LLVM will not create comdats for weak linkage globals and the frontends are responsible for explicitly adding them.
On ELF we now support multiple sections with the same name and comdat. This allows for smaller object files since multiple sections can have a simple name (.text, .rodata, etc).
LLVM now lazily loads metadata in some cases. Creating archives with IR files with debug info is now 25X faster.
llvm-ar can create archives in the BSD format used by OS X.
LLVM received a backend for the extended Berkeley Packet Filter instruction set that can be dynamically loaded into the Linux kernel via the bpf(2) syscall.
Support for BPF has been present in the kernel for some time, but starting from 3.18 it has been extended with such features as: 64-bit registers, 8 additional registers, conditional backwards jumps, call instruction, shift instructions, map (hash table, array, etc.), 1-8 byte load/store from stack, and more.
Up until now, users of BPF had to write bytecode by hand, or use custom generators. This release adds a proper LLVM backend target for the BPF bytecode architecture.
The BPF target is now available by default, and options exist in both Clang (-target bpf) and llc (-march=bpf) to pick eBPF as a backend.
Switch-case lowering was rewritten to avoid generating unbalanced search trees (PR22262) and to exploit profile information when available. Some lowering strategies are now disabled when optimizations are turned off, to save compile time.
The debug info IR class hierarchy now inherits from Metadata and has its own bitcode records and assembly syntax (documented in LangRef). The debug info verifier has been merged with the main verifier.
LLVM IR and APIs are in a period of transition to aid in the removal of pointer types (the end goal being that pointers are typeless/opaque - void*, if you will). Some APIs and IR constructs have been modified to take explicit types that are currently checked to match the target type of their pre-existing pointer type operands. Further changes are still needed, but the more you can avoid using PointerType::getPointeeType, the easier the migration will be.
Argument-less TargetMachine::getSubtarget and TargetMachine::getSubtargetImpl have been removed from the tree. Updating out-of-tree ports is as simple as implementing a non-virtual version in the target, but implementing full Function-based TargetSubtargetInfo support is recommended.
This is expected to be the last major release of LLVM that supports being run on Windows XP and Windows Vista. For the next major release the minimum Windows version requirement will be Windows 7.
Changes to the MIPS Target¶
During this release the MIPS target has:
- Added support for MIPS32R3, MIPS32R5, MIPS64R3, MIPS64R5, and microMIPS32.
- Added support for dynamic stack realignment. This is of particular importance to MSA on 32-bit subtargets since vectors always exceed the stack alignment on the O32 ABI.
- Added support for compiler-rt including:
- Support for the Address and Undefined Behaviour Sanitizers for all MIPS subtargets.
- Support for the Data Flow and Memory Sanitizers for 64-bit subtargets.
- Support for the Profiler for all MIPS subtargets.
- Added support for libcxx and libcxxabi.
- Improved inline assembly support such that memory constraints may now make use of the appropriate address offsets available to the instructions. Also, added support for the ZC constraint.
- Added support for read-only .eh_frame sections by storing type information indirectly.
- Added support for fast instruction selection on MIPS32 and MIPS32R2 with PIC.
- Various bug fixes. Including the following notable fixes:
- Fixed ‘jumpy’ debug line info around calls where calculation of the address of the function would inappropriately change the line number.
- Fixed missing __mips_isa_rev macro on the MIPS32R6 and MIPS64R6 subtargets.
- Fixed multiple segfaults and assertions in the disassembler when disassembling instructions that have memory operands.
- Fixed multiple cases of suboptimal code generation involving $zero.
- Fixed code generation of 128-bit shifts on 64-bit subtargets.
- Prevented the delay slot filler from filling call delay slots with instructions that modify or use $ra.
- Fixed some remaining N32/N64 calling convention bugs when using small structures on big-endian subtargets.
- Fixed missing sign-extensions that are required by the N32/N64 calling convention when generating calls to library functions with 32-bit parameters.
- Corrected the int64_t typedef to be long for N64.
- -mno-odd-spreg is now honoured for vector insertion/extraction operations when using -mmsa.
- Fixed vector insertion and extraction for MSA on 64-bit subtargets.
- Corrected the representation of member function pointers. This makes them usable on microMIPS subtargets.
Changes to the PowerPC Target¶
There are numerous improvements to the PowerPC target in this release:
- LLVM now supports the ISA 2.07B (POWER8) instruction set, including direct moves between general registers and vector registers, and built-in support for hardware transactional memory (HTM). Some missing instructions from ISA 2.06 (POWER7) were also added.
- Code generation for the local-dynamic and global-dynamic thread-local storage models has been improved.
- Loops may be restructured to leverage pre-increment loads and stores.
- QPX - The vector instruction set used by the IBM Blue Gene/Q supercomputers is now supported.
- Loads from the TOC area are now correctly treated as invariant.
- PowerPC now has support for i128 and v1i128 types. The types differ in how they are passed in registers for the ELFv2 ABI.
- Disassembly will now print shorter mnemonic aliases when available.
- Optional register name prefixes for VSX and QPX registers are now supported in the assembly parser.
- The back end now contains a pass to remove unnecessary vector swaps from POWER8 little-endian code generation. Additional improvements are planned for release 3.8.
- The undefined-behavior sanitizer (UBSan) is now supported for PowerPC.
- Many new vector programming APIs have been added to altivec.h. Additional ones are planned for release 3.8.
- PowerPC now supports __builtin_call_with_static_chain.
- PowerPC now supports the revised -mrecip option that permits finer control over reciprocal estimates.
- Many bugs have been identified and fixed.
Changes to the SystemZ Target¶
- LLVM no longer attempts to automatically detect the current host CPU when invoked natively.
- Support for all thread-local storage models. (Previous releases would support only the local-exec TLS model.)
- The POPCNT instruction is now used on z196 and above.
- The RISBGN instruction is now used on zEC12 and above.
- Support for the transactional-execution facility on zEC12 and above.
- Support for the z13 processor and its vector facility.
Changes to the JIT APIs¶
Added a new C++ JIT API called On Request Compilation, or ORC.
ORC is a new JIT API inspired by MCJIT but designed to be more testable, and easier to extend with new features. A key new feature already in tree is lazy, function-at-a-time compilation for X86. Also included is a reimplementation of MCJIT’s API and behavior (OrcMCJITReplacement). MCJIT itself remains in tree, and continues to be the default JIT ExecutionEngine, though new users are encouraged to try ORC out for their projects. (A good place to start is the new ORC tutorials under llvm/examples/kaleidoscope/orc).
Sub-project Status Update¶
In addition to the core LLVM 3.7 distribution of production-quality compiler infrastructure, the LLVM project includes sub-projects that use the LLVM core and share the same distribution license. This section provides updates on these sub-projects.
Polly - The Polyhedral Loop Optimizer in LLVM¶
Polly is a polyhedral loop optimization infrastructure that provides data-locality optimizations to LLVM-based compilers. When compiled as part of clang or loaded as a module into clang, it can perform loop optimizations such as tiling, loop fusion or outer-loop vectorization. As a generic loop optimization infrastructure it allows developers to get a per-loop-iteration model of a loop nest on which detailed analysis and transformations can be performed.
Changes since the last release:
isl imported into Polly distribution
isl, the math library Polly uses, has been imported into the source code repository of Polly and is now distributed as part of Polly. As this was the last external library dependency of Polly, Polly can now be compiled right after checking out the Polly source code without the need for any additional libraries to be pre-installed.
Small integer optimization of isl
The MIT licensed imath backend used in isl for arbitrary width integer computations has been optimized to use native integer operations for the common case where the operands of a computation fit into 32 bits, and to only fall back to large arbitrary precision integers for the remaining cases. This optimization has greatly improved the compile-time performance of Polly, both due to faster native operations and due to a reduction in malloc traffic and pointer indirections. As a result, computations that use arbitrary precision integers heavily have been sped up by almost 6x, and the compile time of Polly on the Polybench test kernels in the LNT suite has been reduced by 20% on average, with compile time reductions between 9-43%.
Schedule Trees
Polly now internally uses so-called schedule trees to model the loop structure it optimizes. Schedule trees are an easy-to-understand tree structure that describes a loop nest using integer constraint sets to keep track of execution constraints. They allow the developer to use per-tree-node operations to modify the loop tree. Programmatic analyses that work on the schedule tree (e.g., dependence analysis) also show a visible speedup as they can exploit the tree structure of the schedule and need to fall back to ILP-based optimization problems less often. Section 6 of Polyhedral AST generation is more than scanning polyhedra gives a detailed explanation of these schedule trees.
Scalar and PHI node modeling - Polly as an analysis
Polly now requires almost no preprocessing to analyse LLVM-IR, which makes it easier to use Polly as a pure analysis pass, e.g. to provide more precise dependence information to non-polyhedral transformation passes. Originally, Polly required the input LLVM-IR to be preprocessed such that all scalar and PHI-node dependences were translated to in-memory operations. Since this release, Polly has full support for scalar and PHI node dependences and requires no scalar-to-memory translation for such kinds of dependences.
Modeling of modulo and non-affine conditions
Polly now supports modulo operations such as A[t%2][i][j], as they appear often in stencil computations, and also allows data-dependent conditional branches as they result e.g. from ternary conditions such as A[i] > 255 ? 255 : A[i].
Delinearization
Polly now supports the analysis of manually linearized multi-dimensional arrays as they result from macros such as “#define 2DARRAY(A,i,j) (A.data[(i) * A.size + (j)])”. Similar constructs appear in old C code written before C99, C++ code such as boost::ublas, LLVM exported from Julia, Matlab generated code, and many others. Our work titled Optimistic Delinearization of Parametrically Sized Arrays gives details.
Compile time improvements
Pratik Bahtu worked on compile-time performance tuning of Polly. His work together with the support for schedule trees and the small integer optimization in isl notably reduced the compile time.
Increased compute timeouts
As Polly’s compile time has been notably improved, we were able to increase the compile time safeguards in Polly. As a result, the default configuration of Polly can now analyze larger loop nests without running into compile time restrictions.
Export Debug Locations via JSCoP file
Polly’s JSCoP import/export format gained support for debug locations that show the user the source code location of detected scops.
Improved Windows support
The compilation of Polly on Windows using CMake has been improved, and several Visual Studio build issues have been addressed.
Many bug fixes
libunwind¶
The unwind implementation which used to reside in libc++abi has been moved into a separate repository. This implementation can still be used for libc++abi by specifying -DLIBCXXABI_USE_LLVM_UNWINDER=YES (the default when building on ARM) and -DLIBCXXABI_LIBUNWIND_PATH=<path to libunwind source> when configuring libc++abi.
The new repository can also be built standalone if just libunwind is desired.
External Open Source Projects Using LLVM 3.7¶
An exciting aspect of LLVM is that it is used as an enabling technology for a lot of other language and tools projects. This section lists some of the projects that have already been updated to work with LLVM 3.7.
LDC - the LLVM-based D compiler¶
D is a language with C-like syntax and static typing. It pragmatically combines efficiency, control, and modeling power, with safety and programmer productivity. D supports powerful concepts like Compile-Time Function Execution (CTFE) and Template Meta-Programming, provides an innovative approach to concurrency and offers many classical paradigms.
LDC uses the frontend from the reference compiler combined with LLVM as backend to produce efficient native code. LDC targets x86/x86_64 systems like Linux, OS X, FreeBSD and Windows and also Linux on PowerPC (32/64 bit). Ports to other architectures like ARM, AArch64 and MIPS64 are underway.
Portable Computing Language (pocl)¶
In addition to producing an easily portable open source OpenCL implementation, another major goal of pocl is improving performance portability of OpenCL programs with compiler optimizations, reducing the need for target-dependent manual optimizations. An important part of pocl is a set of LLVM passes used to statically parallelize multiple work-items with the kernel compiler, even in the presence of work-group barriers.
TTA-based Co-design Environment (TCE)¶
TCE is a toolset for designing customized exposed datapath processors based on the Transport triggered architecture (TTA).
The toolset provides a complete co-design flow from C/C++ programs down to synthesizable VHDL/Verilog and parallel program binaries. Processor customization points include the register files, function units, supported operations, and the interconnection network.
TCE uses Clang and LLVM for C/C++/OpenCL C language support, target-independent optimizations, and parts of code generation. It generates new LLVM-based code generators “on the fly” for the designed processors and loads them into the compiler backend as runtime libraries to avoid per-target recompilation of larger parts of the compiler chain.
BPF Compiler Collection (BCC)¶
BCC is a Python + C framework for tracing and networking that uses the Clang rewriter, a second pass of Clang, and the BPF backend to generate eBPF and push it into the kernel.
LLVMSharp & ClangSharp¶
LLVMSharp and ClangSharp are type-safe C# bindings for Microsoft.NET and Mono that Platform Invoke into the native libraries. ClangSharp is self-hosted and is used to generate LLVMSharp using the LLVM-C API.
LLVMSharp Kaleidoscope Tutorials are instructive examples of writing a compiler in C#, with certain improvements like using the visitor pattern to generate LLVM IR.
ClangSharp PInvoke Generator is the self-hosting mechanism for LLVM/ClangSharp and demonstrates using LibClang to generate Platform Invoke (PInvoke) signatures for C APIs.
Additional Information¶
A wide variety of additional information is available on the LLVM web page, in particular in the documentation section. The web page also contains versions of the API documentation which are up-to-date with the Subversion version of the source code. You can access versions of these documents specific to this release by going into the llvm/docs/ directory in the LLVM tree.
If you have any questions or comments about LLVM, please feel free to contact us via the mailing lists.
LLVM’s Analysis and Transform Passes¶
- Introduction
- Analysis Passes
- -aa-eval: Exhaustive Alias Analysis Precision Evaluator
- -basicaa: Basic Alias Analysis (stateless AA impl)
- -basiccg: Basic CallGraph Construction
- -count-aa: Count Alias Analysis Query Responses
- -da: Dependence Analysis
- -debug-aa: AA use debugger
- -domfrontier: Dominance Frontier Construction
- -domtree: Dominator Tree Construction
- -dot-callgraph: Print Call Graph to “dot” file
- -dot-cfg: Print CFG of function to “dot” file
- -dot-cfg-only: Print CFG of function to “dot” file (with no function bodies)
- -dot-dom: Print dominance tree of function to “dot” file
- -dot-dom-only: Print dominance tree of function to “dot” file (with no function bodies)
- -dot-postdom: Print postdominance tree of function to “dot” file
- -dot-postdom-only: Print postdominance tree of function to “dot” file (with no function bodies)
- -globalsmodref-aa: Simple mod/ref analysis for globals
- -instcount: Counts the various types of Instructions
- -intervals: Interval Partition Construction
- -iv-users: Induction Variable Users
- -lazy-value-info: Lazy Value Information Analysis
- -libcall-aa: LibCall Alias Analysis
- -lint: Statically lint-checks LLVM IR
- -loops: Natural Loop Information
- -memdep: Memory Dependence Analysis
- -module-debuginfo: Decodes module-level debug info
- -no-aa: No Alias Analysis (always returns ‘may’ alias)
- -postdomfrontier: Post-Dominance Frontier Construction
- -postdomtree: Post-Dominator Tree Construction
- -print-alias-sets: Alias Set Printer
- -print-callgraph: Print a call graph
- -print-callgraph-sccs: Print SCCs of the Call Graph
- -print-cfg-sccs: Print SCCs of each function CFG
- -print-dom-info: Dominator Info Printer
- -print-externalfnconstants: Print external fn callsites passed constants
- -print-function: Print function to stderr
- -print-module: Print module to stderr
- -print-used-types: Find Used Types
- -regions: Detect single entry single exit regions
- -scalar-evolution: Scalar Evolution Analysis
- -scev-aa: ScalarEvolution-based Alias Analysis
- -targetdata: Target Data Layout
- Transform Passes
- -adce: Aggressive Dead Code Elimination
- -always-inline: Inliner for always_inline functions
- -argpromotion: Promote ‘by reference’ arguments to scalars
- -bb-vectorize: Basic-Block Vectorization
- -block-placement: Profile Guided Basic Block Placement
- -break-crit-edges: Break critical edges in CFG
- -codegenprepare: Optimize for code generation
- -constmerge: Merge Duplicate Global Constants
- -constprop: Simple constant propagation
- -dce: Dead Code Elimination
- -deadargelim: Dead Argument Elimination
- -deadtypeelim: Dead Type Elimination
- -die: Dead Instruction Elimination
- -dse: Dead Store Elimination
- -functionattrs: Deduce function attributes
- -globaldce: Dead Global Elimination
- -globalopt: Global Variable Optimizer
- -gvn: Global Value Numbering
- -indvars: Canonicalize Induction Variables
- -inline: Function Integration/Inlining
- -instcombine: Combine redundant instructions
- -internalize: Internalize Global Symbols
- -ipconstprop: Interprocedural constant propagation
- -ipsccp: Interprocedural Sparse Conditional Constant Propagation
- -jump-threading: Jump Threading
- -lcssa: Loop-Closed SSA Form Pass
- -licm: Loop Invariant Code Motion
- -loop-deletion: Delete dead loops
- -loop-extract: Extract loops into new functions
- -loop-extract-single: Extract at most one loop into a new function
- -loop-reduce: Loop Strength Reduction
- -loop-rotate: Rotate Loops
- -loop-simplify: Canonicalize natural loops
- -loop-unroll: Unroll loops
- -loop-unswitch: Unswitch loops
- -loweratomic: Lower atomic intrinsics to non-atomic form
- -lowerinvoke: Lower invokes to calls, for unwindless code generators
- -lowerswitch: Lower SwitchInsts to branches
- -mem2reg: Promote Memory to Register
- -memcpyopt: MemCpy Optimization
- -mergefunc: Merge Functions
- -mergereturn: Unify function exit nodes
- -partial-inliner: Partial Inliner
- -prune-eh: Remove unused exception handling info
- -reassociate: Reassociate expressions
- -reg2mem: Demote all values to stack slots
- -scalarrepl: Scalar Replacement of Aggregates (DT)
- -sccp: Sparse Conditional Constant Propagation
- -simplifycfg: Simplify the CFG
- -sink: Code sinking
- -strip: Strip all symbols from a module
- -strip-dead-debug-info: Strip debug info for unused symbols
- -strip-dead-prototypes: Strip Unused Function Prototypes
- -strip-debug-declare: Strip all llvm.dbg.declare intrinsics
- -strip-nondebug: Strip all symbols, except dbg symbols, from a module
- -tailcallelim: Tail Call Elimination
- Utility Passes
- -deadarghaX0r: Dead Argument Hacking (BUGPOINT USE ONLY; DO NOT USE)
- -extract-blocks: Extract Basic Blocks From Module (for bugpoint use)
- -instnamer: Assign names to anonymous instructions
- -verify: Module Verifier
- -view-cfg: View CFG of function
- -view-cfg-only: View CFG of function (with no function bodies)
- -view-dom: View dominance tree of function
- -view-dom-only: View dominance tree of function (with no function bodies)
- -view-postdom: View postdominance tree of function
- -view-postdom-only: View postdominance tree of function (with no function bodies)
Introduction¶
This document serves as a high-level summary of the optimization features that LLVM provides. Optimizations are implemented as Passes that traverse some portion of a program to either collect information or transform the program. The table below divides the passes that LLVM provides into three categories. Analysis passes compute information that other passes can use, or that serves debugging or program-visualization purposes. Transform passes can use (or invalidate) the analysis passes; they all mutate the program in some way. Utility passes provide some utility but don’t otherwise fit categorization; for example, passes to extract functions to bitcode or to write a module to bitcode are neither analysis nor transform passes. The table of contents above provides a quick summary of each pass and links to the more complete pass description later in the document.
Analysis Passes¶
This section describes the LLVM Analysis Passes.
-aa-eval: Exhaustive Alias Analysis Precision Evaluator¶
This is a simple N^2 alias analysis accuracy evaluator. Basically, for each function in the program, it simply queries to see how the alias analysis implementation answers alias queries between each pair of pointers in the function.
This is inspired and adapted from code by: Naveen Neelakantam, Francesco Spadini, and Wojciech Stryjewski.
-basicaa: Basic Alias Analysis (stateless AA impl)¶
A basic alias analysis pass that implements identities (two different globals cannot alias, etc), but does no stateful analysis.
-basiccg: Basic CallGraph Construction¶
Yet to be written.
-count-aa: Count Alias Analysis Query Responses¶
A pass which can be used to count how many alias queries are being made and how the alias analysis implementation being used responds.
-da: Dependence Analysis¶
Dependence analysis framework, which is used to detect dependences in memory accesses.
-debug-aa: AA use debugger¶
This simple pass checks alias analysis users to ensure that if they create a new value, they do not query AA without informing it of the value. It acts as a shim over any other AA pass you want.
Yes, keeping track of every value in the program is expensive, but this is a debugging pass.
-domfrontier: Dominance Frontier Construction¶
This pass is a simple dominator construction algorithm for finding forward dominator frontiers.
-domtree: Dominator Tree Construction¶
This pass is a simple dominator construction algorithm for finding forward dominators.
-dot-callgraph: Print Call Graph to “dot” file¶
This pass, only available in opt, prints the call graph into a .dot graph. This graph can then be processed with the “dot” tool to convert it to postscript or some other suitable format.
-dot-cfg: Print CFG of function to “dot” file¶
This pass, only available in opt, prints the control flow graph into a .dot graph. This graph can then be processed with the dot tool to convert it to postscript or some other suitable format.
-dot-cfg-only: Print CFG of function to “dot” file (with no function bodies)¶
This pass, only available in opt, prints the control flow graph into a .dot graph, omitting the function bodies. This graph can then be processed with the dot tool to convert it to postscript or some other suitable format.
-dot-dom: Print dominance tree of function to “dot” file¶
This pass, only available in opt, prints the dominator tree into a .dot graph. This graph can then be processed with the dot tool to convert it to postscript or some other suitable format.
-dot-dom-only: Print dominance tree of function to “dot” file (with no function bodies)¶
This pass, only available in opt, prints the dominator tree into a .dot graph, omitting the function bodies. This graph can then be processed with the dot tool to convert it to postscript or some other suitable format.
-dot-postdom: Print postdominance tree of function to “dot” file¶
This pass, only available in opt, prints the post dominator tree into a .dot graph. This graph can then be processed with the dot tool to convert it to postscript or some other suitable format.
-dot-postdom-only: Print postdominance tree of function to “dot” file (with no function bodies)¶
This pass, only available in opt, prints the post dominator tree into a .dot graph, omitting the function bodies. This graph can then be processed with the dot tool to convert it to postscript or some other suitable format.
-globalsmodref-aa: Simple mod/ref analysis for globals¶
This simple pass provides alias and mod/ref information for global values that do not have their address taken, and keeps track of whether functions read or write memory (are “pure”). For this simple (but very common) case, we can provide pretty accurate and useful information.
-instcount: Counts the various types of Instructions¶
This pass collects the count of all instructions and reports them.
-intervals: Interval Partition Construction¶
This analysis calculates and represents the interval partition of a function, or a preexisting interval partition.
In this way, the interval partition may be used to reduce a flow graph down to its degenerate single node interval partition (unless it is irreducible).
-iv-users: Induction Variable Users¶
Bookkeeping for “interesting” users of expressions computed from induction variables.
-lazy-value-info: Lazy Value Information Analysis¶
Interface for lazy computation of value constraint information.
-libcall-aa: LibCall Alias Analysis¶
LibCall Alias Analysis.
-lint: Statically lint-checks LLVM IR¶
This pass statically checks for common and easily-identified constructs which produce undefined or likely unintended behavior in LLVM IR.
It is not a guarantee of correctness, in two ways. First, it isn’t comprehensive. There are checks which could be done statically which are not yet implemented. Some of these are indicated by TODO comments, but those aren’t comprehensive either. Second, many conditions cannot be checked statically. This pass does no dynamic instrumentation, so it can’t check for all possible problems.
Another limitation is that it assumes all code will be executed. A store through a null pointer in a basic block which is never reached is harmless, but this pass will warn about it anyway.
Optimization passes may make conditions that this pass checks for more or less obvious. If an optimization pass appears to be introducing a warning, it may be that the optimization pass is merely exposing an existing condition in the code.
This code may be run before instcombine. In many cases, instcombine checks for the same kinds of things and turns instructions with undefined behavior into unreachable (or equivalent). Because of this, this pass makes some effort to look through bitcasts and so on.
-loops: Natural Loop Information¶
This analysis is used to identify natural loops and determine the loop depth of various nodes of the CFG. Note that the loops identified may actually be several natural loops that share the same header node... not just a single natural loop.
-memdep: Memory Dependence Analysis¶
An analysis that determines, for a given memory operation, what preceding memory operations it depends on. It builds on alias analysis information, and tries to provide a lazy, caching interface to a common kind of alias information query.
-module-debuginfo: Decodes module-level debug info¶
This pass decodes the debug info metadata in a module and prints it in a (sufficiently-prepared-) human-readable form.
For example, run this pass from opt along with the -analyze option, and it’ll print to standard output.
-no-aa: No Alias Analysis (always returns ‘may’ alias)¶
This is the default implementation of the Alias Analysis interface. It always returns “I don’t know” for alias queries. NoAA is unlike other alias analysis implementations, in that it does not chain to a previous analysis. As such it doesn’t follow many of the rules that other alias analyses must.
-postdomfrontier: Post-Dominance Frontier Construction¶
This pass is a simple post-dominator construction algorithm for finding post-dominator frontiers.
-postdomtree: Post-Dominator Tree Construction¶
This pass is a simple post-dominator construction algorithm for finding post-dominators.
-print-alias-sets: Alias Set Printer¶
Yet to be written.
-print-callgraph: Print a call graph¶
This pass, only available in opt, prints the call graph to standard error in a human-readable form.
-print-callgraph-sccs: Print SCCs of the Call Graph¶
This pass, only available in opt, prints the SCCs of the call graph to standard error in a human-readable form.
-print-cfg-sccs: Print SCCs of each function CFG¶
This pass, only available in opt, prints the SCCs of each function CFG to standard error in a human-readable form.
-print-dom-info: Dominator Info Printer¶
Dominator Info Printer.
-print-externalfnconstants: Print external fn callsites passed constants¶
This pass, only available in opt, prints out call sites to external functions that are called with constant arguments. This can be useful when looking for standard library functions we should constant fold or handle in alias analyses.
-print-function: Print function to stderr¶
The PrintFunctionPass class is designed to be pipelined with other FunctionPasses, and prints out the functions of the module as they are processed.
-print-module: Print module to stderr¶
This pass simply prints out the entire module when it is executed.
-print-used-types: Find Used Types¶
This pass is used to seek out all of the types in use by the program. Note that this analysis explicitly does not include types only used by the symbol table.
-regions: Detect single entry single exit regions¶
The RegionInfo pass detects single entry single exit regions in a function, where a region is defined as any subgraph that is connected to the remaining graph at only two spots. Furthermore, a hierarchical region tree is built.
-scalar-evolution: Scalar Evolution Analysis¶
The ScalarEvolution analysis can be used to analyze and categorize scalar expressions in loops. It specializes in recognizing general induction variables, representing them with the abstract and opaque SCEV class. Given this analysis, trip counts of loops and other important properties can be obtained.
This analysis is primarily useful for induction variable substitution and strength reduction.
-scev-aa: ScalarEvolution-based Alias Analysis¶
Simple alias analysis implemented in terms of ScalarEvolution queries.
This differs from traditional loop dependence analysis in that it tests for dependencies within a single iteration of a loop, rather than dependencies between different iterations.
ScalarEvolution has a more complete understanding of pointer arithmetic than BasicAliasAnalysis’ collection of ad-hoc analyses.
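For illustration (a hypothetical C++ fragment, not from the original docs), consider a loop where two accesses overlap across iterations but never within one iteration:

// Within a single iteration, ScalarEvolution can prove that A[i] and
// A[i + 1] are distinct locations, so -scev-aa reports no alias for this
// query even though the locations overlap across iterations.
void shiftLeft(int *A, int n) {
  for (int i = 0; i + 1 < n; ++i)
    A[i] = A[i + 1];
}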
-targetdata: Target Data Layout¶
Provides other passes access to information on the size and alignment required by the target ABI for various data types.
Transform Passes¶
This section describes the LLVM Transform Passes.
-adce: Aggressive Dead Code Elimination¶
ADCE aggressively tries to eliminate code. This pass is similar to DCE but it assumes that values are dead until proven otherwise. This is similar to SCCP, except applied to the liveness of values.
-always-inline: Inliner for always_inline functions¶
A custom inliner that handles only functions that are marked as “always inline”.
-argpromotion: Promote ‘by reference’ arguments to scalars¶
This pass promotes “by reference” arguments to be “by value” arguments. In practice, this means looking for internal functions that have pointer arguments. If it can prove, through the use of alias analysis, that an argument is only loaded, then it can pass the value into the function instead of the address of the value. This can cause recursive simplification of code and lead to the elimination of allocas (especially in C++ template code like the STL).
This pass also handles aggregate arguments that are passed into a function, scalarizing them if the elements of the aggregate are only loaded. Note that it refuses to scalarize aggregates which would require passing in more than three operands to the function, because passing thousands of operands for a large array or structure is unprofitable!
Note that this transformation could also be done for arguments that are only stored to (returning the value instead), but does not currently. This case would be best handled when and if LLVM starts supporting multiple return values from functions.
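A hypothetical sketch of the effect at the C++ level (the pass itself rewrites LLVM IR for internal functions):

// Before: an internal function whose pointer argument is only loaded.
static int incrementBefore(const int *n) {
  return *n + 1;
}

// After promotion, conceptually: the caller loads the value and passes it
// directly, which may later allow the caller's alloca to be eliminated.
static int incrementAfter(int n) {
  return n + 1;
}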
-bb-vectorize: Basic-Block Vectorization¶
This pass combines instructions inside basic blocks to form vector instructions. It iterates over each basic block, attempting to pair compatible instructions, repeating this process until no additional pairs are selected for vectorization. When the outputs of some pair of compatible instructions are used as inputs by some other pair of compatible instructions, those pairs are part of a potential vectorization chain. Instruction pairs are only fused into vector instructions when they are part of a chain longer than some threshold length. Moreover, the pass attempts to find the best possible chain for each pair of compatible instructions. These heuristics are intended to prevent vectorization in cases where it would not yield a performance increase of the resulting code.
-block-placement: Profile Guided Basic Block Placement¶
This pass is a very simple profile guided basic block placement algorithm. The idea is to put frequently executed blocks together at the start of the function and hopefully increase the number of fall-through conditional branches. If there is no profile information for a particular function, this pass basically orders blocks in depth-first order.
-break-crit-edges: Break critical edges in CFG¶
Break all of the critical edges in the CFG by inserting a dummy basic block. It may be “required” by passes that cannot deal with critical edges. This transformation obviously invalidates the CFG, but can update forward dominator (set, immediate dominators, tree, and frontier) information.
-codegenprepare: Optimize for code generation¶
This pass munges the code in the input function to better prepare it for SelectionDAG-based code generation. This works around limitations in its basic-block-at-a-time approach. It should eventually be removed.
-constmerge: Merge Duplicate Global Constants¶
Merges duplicate global constants together into a single constant that is shared. This is useful because some passes (i.e., TraceValues) insert a lot of string constants into the program, regardless of whether or not an existing string is available.
-constprop: Simple constant propagation¶
This pass implements constant propagation and merging. It looks for instructions involving only constant operands and replaces them with a constant value instead of an instruction. For example:
add i32 1, 2
becomes
i32 3
NOTE: this pass has a habit of making definitions be dead. It is a good idea to run a Dead Instruction Elimination pass sometime after running this pass.
-dce: Dead Code Elimination¶
Dead code elimination is similar to dead instruction elimination, but it rechecks instructions that were used by removed instructions to see if they are newly dead.
-deadargelim: Dead Argument Elimination¶
This pass deletes dead arguments from internal functions. Dead argument elimination removes arguments which are directly dead, as well as arguments only passed into function calls as dead arguments of other functions. This pass also deletes dead arguments in a similar way.
This pass is often useful as a cleanup pass to run after aggressive interprocedural passes, which add possibly-dead arguments.
-deadtypeelim: Dead Type Elimination¶
This pass is used to clean up the output of GCC. It eliminates names for types that are unused in the entire translation unit, using the find used types pass.
-die: Dead Instruction Elimination¶
Dead instruction elimination performs a single pass over the function, removing instructions that are obviously dead.
-dse: Dead Store Elimination¶
A trivial dead store elimination that only considers basic-block local redundant stores.
-functionattrs: Deduce function attributes¶
A simple interprocedural pass which walks the call-graph, looking for functions which do not access or only read non-local memory, and marking them readnone/readonly. In addition, it marks function arguments (of pointer type) “nocapture” if a call to the function does not create any copies of the pointer value that outlive the call. This more or less means that the pointer is only dereferenced, and not returned from the function or stored in a global. This pass is implemented as a bottom-up traversal of the call-graph.
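For illustration (hypothetical C++ code, not from the original docs), the following function only reads non-local memory through its argument and never captures the pointer, so it is a candidate for both attributes:

// Reads through 'a' but writes no non-local memory, and neither returns
// nor stores 'a': -functionattrs can mark the function 'readonly' and the
// argument 'nocapture'.
static int sum(const int *a, int n) {
  int s = 0;
  for (int i = 0; i < n; ++i)
    s += a[i];
  return s;
}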
-globaldce: Dead Global Elimination¶
This transform is designed to eliminate unreachable internal globals from the program. It uses an aggressive algorithm, searching out globals that are known to be alive. After it finds all of the globals which are needed, it deletes whatever is left over. This allows it to delete recursive chunks of the program which are unreachable.
-globalopt: Global Variable Optimizer¶
This pass transforms simple global variables that never have their address taken. If obviously true, it marks read/write globals as constant, deletes variables only stored to, etc.
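As an illustration (a hypothetical C++ sketch, not from the original docs), a static global whose address is never taken and that is never stored to after initialization can be marked constant and folded into its uses:

// 'limit' is internal, its address is never taken, and it is never
// re-stored, so -globalopt can mark it constant; later passes may then
// fold it to the literal 100.
static int limit = 100;

int clamp(int x) {
  return x > limit ? limit : x;
}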
-gvn: Global Value Numbering¶
This pass performs global value numbering to eliminate fully and partially redundant instructions. It also performs redundant load elimination.
-indvars: Canonicalize Induction Variables¶
This transformation analyzes and transforms the induction variables (and computations derived from them) into simpler forms suitable for subsequent analysis and transformation.
This transformation makes the following changes to each loop with an identifiable induction variable:
- All loops are transformed to have a single canonical induction variable which starts at zero and steps by one.
- The canonical induction variable is guaranteed to be the first PHI node in the loop header block.
- Any pointer arithmetic recurrences are raised to use array subscripts.
If the trip count of a loop is computable, this pass also makes the following changes:
The exit condition for the loop is canonicalized to compare the induction value against the exit value. This turns loops like:
for (i = 7; i*i < 1000; ++i)
into
for (i = 0; i != 25; ++i)
Any use outside of the loop of an expression derived from the indvar is changed to compute the derived value outside of the loop, eliminating the dependence on the exit value of the induction variable. If the only purpose of the loop is to compute the exit value of some derived expression, this transformation will make the loop dead.
This transformation should be followed by strength reduction after all of the desired loop transformations have been performed. Additionally, on targets where it is profitable, the loop could be transformed to count down to zero (the “do loop” optimization).
-inline: Function Integration/Inlining¶
Bottom-up inlining of functions into callers.
-instcombine: Combine redundant instructions¶
Combine instructions to form fewer, simple instructions. This pass does not modify the CFG. This pass is where algebraic simplification happens.
This pass combines things like:
%Y = add i32 %X, 1
%Z = add i32 %Y, 1
into:
%Z = add i32 %X, 2
This is a simple worklist driven algorithm.
This pass guarantees that the following canonicalizations are performed on the program:
- If a binary operator has a constant operand, it is moved to the right-hand side.
- Bitwise operators with constant operands are always grouped so that shifts are performed first, then ors, then ands, then xors.
- Compare instructions are converted from <, >, ≤, or ≥ to = or ≠ if possible.
- All cmp instructions on boolean values are replaced with logical operations.
- add X, X is represented as mul X, 2 ⇒ shl X, 1.
- Multiplies with a constant power-of-two argument are transformed into shifts.
- … etc.
This pass can also simplify calls to specific well-known function calls (e.g. runtime library functions). For example, a call exit(3) that occurs within the main() function can be transformed into simply return 3. Whether or not library calls are simplified is controlled by the -functionattrs pass and LLVM’s knowledge of library calls on different targets.
-internalize: Internalize Global Symbols¶
This pass loops over all of the functions in the input module, looking for a main function. If a main function is found, all other functions and all global variables with initializers are marked as internal.
-ipconstprop: Interprocedural constant propagation¶
This pass implements an extremely simple interprocedural constant propagation pass. It could certainly be improved in many different ways, like using a worklist. This pass makes arguments dead, but does not remove them. The existing dead argument elimination pass should be run after this to clean up the mess.
-ipsccp: Interprocedural Sparse Conditional Constant Propagation¶
An interprocedural variant of Sparse Conditional Constant Propagation.
-jump-threading: Jump Threading¶
Jump threading tries to find distinct threads of control flow running through a basic block. This pass looks at blocks that have multiple predecessors and multiple successors. If one or more of the predecessors of the block can be proven to always cause a jump to one of the successors, we forward the edge from the predecessor to the successor by duplicating the contents of this block.
An example of when this can occur is code like this:
if () { ...
    X = 4;
}
if (X < 3) {
In this case, the unconditional branch at the end of the first if can be revectored to the false side of the second if.
-lcssa: Loop-Closed SSA Form Pass¶
This pass transforms loops by placing phi nodes at the end of the loops for all values that are live across the loop boundary. For example, it turns the following code:
for (...)
    if (c)
        X1 = ...
    else
        X2 = ...
    X3 = phi(X1, X2)
... = X3 + 4

into:

for (...)
    if (c)
        X1 = ...
    else
        X2 = ...
    X3 = phi(X1, X2)
X4 = phi(X3)
... = X4 + 4
This is still valid LLVM; the extra phi nodes are purely redundant, and will be trivially eliminated by InstCombine. The major benefit of this transformation is that it makes many other loop optimizations, such as LoopUnswitching, simpler.
-licm: Loop Invariant Code Motion¶
This pass performs loop invariant code motion, attempting to remove as much code from the body of a loop as possible. It does this by either hoisting code into the preheader block, or by sinking code to the exit blocks if it is safe. This pass also promotes must-aliased memory locations in the loop to live in registers, thus hoisting and sinking “invariant” loads and stores.
This pass uses alias analysis for two purposes:
Moving loop invariant loads and calls out of loops. If we can determine that a load or call inside of a loop never aliases anything stored to, we can hoist it or sink it like any other instruction.
Scalar Promotion of Memory. If there is a store instruction inside of the loop, we try to move the store to happen AFTER the loop instead of inside of the loop. This can only happen if a few conditions are true:
- The pointer stored through is loop invariant.
- There are no stores or loads in the loop which may alias the pointer.
- There are no calls in the loop which mod/ref the pointer.
If these conditions are true, we can promote the loads and stores in the loop of the pointer to use a temporary alloca’d variable. We then use the mem2reg functionality to construct the appropriate SSA form for the variable.
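A sketch of scalar promotion at the C++ level (hypothetical code; the pass itself works on LLVM IR):

// Before: *p is loaded and stored on every iteration.
void accumulateBefore(int *p, int n) {
  for (int i = 0; i < n; ++i)
    *p += i;
}

// After promotion, conceptually: the load is hoisted into a temporary and a
// single store is sunk after the loop; mem2reg then builds SSA form for tmp.
void accumulateAfter(int *p, int n) {
  int tmp = *p;
  for (int i = 0; i < n; ++i)
    tmp += i;
  *p = tmp;
}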
-loop-deletion: Delete dead loops¶
This file implements the Dead Loop Deletion Pass. This pass is responsible for eliminating loops with non-infinite computable trip counts that have no side effects or volatile instructions, and do not contribute to the computation of the function’s return value.
-loop-extract: Extract loops into new functions¶
A pass wrapper around the ExtractLoop() scalar transformation to extract each top-level loop into its own new function. If the loop is the only loop in a given function, it is not touched. This is a pass most useful for debugging via bugpoint.
-loop-extract-single: Extract at most one loop into a new function¶
Similar to Extract loops into new functions, this pass extracts one natural loop from the program into a function if it can. This is used by bugpoint.
-loop-reduce: Loop Strength Reduction¶
This pass performs a strength reduction on array references inside loops that have as one or more of their components the loop induction variable. This is accomplished by creating a new value to hold the initial value of the array access for the first iteration, and then creating a new GEP instruction in the loop to increment the value by the appropriate amount.
-loop-rotate: Rotate Loops¶
A simple loop rotation transformation.
-loop-simplify: Canonicalize natural loops¶
This pass performs several transformations to transform natural loops into a simpler form, which makes subsequent analyses and transformations simpler and more effective.
Loop pre-header insertion guarantees that there is a single, non-critical entry edge from outside of the loop to the loop header. This simplifies a number of analyses and transformations, such as LICM.
Loop exit-block insertion guarantees that all exit blocks from the loop (blocks which are outside of the loop that have predecessors inside of the loop) only have predecessors from inside of the loop (and are thus dominated by the loop header). This simplifies transformations such as store-sinking that are built into LICM.
This pass also guarantees that loops will have exactly one backedge.
Note that the simplifycfg pass will clean up blocks which are split out but end up being unnecessary, so usage of this pass should not pessimize generated code.
This pass obviously modifies the CFG, but updates loop information and dominator information.
-loop-unroll: Unroll loops¶
This pass implements a simple loop unroller. It works best when loops have been canonicalized by the indvars pass, allowing it to determine the trip counts of loops easily.
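For illustration (a hypothetical C++ sketch; the pass rewrites LLVM IR and chooses unroll factors heuristically), unrolling by a factor of two looks like:

// Before: one addition per iteration, 100 iterations.
void incrementBefore(int *a) {
  for (int i = 0; i < 100; ++i)
    a[i] += 1;
}

// After unrolling by two: half the branches, two additions per iteration.
void incrementAfter(int *a) {
  for (int i = 0; i < 100; i += 2) {
    a[i] += 1;
    a[i + 1] += 1;
  }
}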
-loop-unswitch: Unswitch loops¶
This pass transforms loops that contain branches on loop-invariant conditions to have multiple loops. For example, it turns the following code:
for (...)
    A
    if (lic)
        B
    C

into:

if (lic)
    for (...)
        A; B; C
else
    for (...)
        A; C
This can increase the size of the code exponentially (doubling it every time a loop is unswitched) so we only unswitch if the resultant code will be smaller than a threshold.
This pass expects LICM to be run before it to hoist invariant conditions out of the loop, to make the unswitching opportunity obvious.
-loweratomic: Lower atomic intrinsics to non-atomic form¶
This pass lowers atomic intrinsics to non-atomic form for use in a known non-preemptible environment.
The pass does not verify that the environment is non-preemptible (in general this would require knowledge of the entire call graph of the program including any libraries which may not be available in bitcode form); it simply lowers every atomic intrinsic.
-lowerinvoke: Lower invokes to calls, for unwindless code generators¶
This transformation is designed for use by code generators which do not yet support stack unwinding. This pass converts invoke instructions to call instructions, so that any exception-handling landingpad blocks become dead code (which can be removed by running the -simplifycfg pass afterwards).
-lowerswitch: Lower SwitchInsts to branches¶
Rewrites switch instructions with a sequence of branches, which allows targets to get away with not implementing the switch instruction until it is convenient.
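Conceptually (a hypothetical C-level sketch; the rewrite happens on LLVM IR, and the actual lowering may emit a balanced tree of comparisons rather than a linear chain):

// Before: a switch the target would otherwise have to support directly.
int classifyBefore(int x) {
  switch (x) {
  case 0:
    return 10;
  case 1:
    return 20;
  default:
    return 0;
  }
}

// After lowering: an equivalent sequence of conditional branches.
int classifyAfter(int x) {
  if (x == 0)
    return 10;
  if (x == 1)
    return 20;
  return 0;
}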
-mem2reg: Promote Memory to Register¶
This file promotes memory references to be register references. It promotes alloca instructions which only have loads and stores as uses. An alloca is transformed by using dominator frontiers to place phi nodes, then traversing the function in depth-first order to rewrite loads and stores as appropriate. This is just the standard SSA construction algorithm to construct “pruned” SSA form.
-memcpyopt: MemCpy Optimization¶
This pass performs various transformations related to eliminating memcpy calls, or transforming sets of stores into memsets.
-mergefunc: Merge Functions¶
This pass looks for equivalent functions that are mergable and folds them.
A total ordering is introduced among the set of functions: we define a comparison that answers, for every two functions, which of them is greater. This allows functions to be arranged in a binary tree.
For every new function, we check the tree for an equivalent function.
If an equivalent exists, we fold the two functions. If both functions are overridable, we move the functionality into a new internal function and leave two overridable thunks to it.
If there is no equivalent, the function is added to the tree.
The lookup routine has O(log(n)) complexity, while the whole merging process has O(n*log(n)) complexity.
Read this article for more details.
-mergereturn: Unify function exit nodes¶
Ensure that functions have at most one ret instruction in them. Additionally, it keeps track of which node is the new exit node of the CFG.
-partial-inliner: Partial Inliner¶
This pass performs partial inlining, typically by inlining an if statement that surrounds the body of the function.
-prune-eh: Remove unused exception handling info¶
This file implements a simple interprocedural pass which walks the call-graph, turning invoke instructions into call instructions if and only if the callee cannot throw an exception. It implements this as a bottom-up traversal of the call-graph.
-reassociate: Reassociate expressions¶
This pass reassociates commutative expressions in an order that is designed to promote better constant propagation, GCSE, LICM, PRE, etc.
For example: 4 + (x + 5) ⇒ x + (4 + 5)
In the implementation of this algorithm, constants are assigned rank = 0, function arguments are rank = 1, and other values are assigned ranks corresponding to the reverse post order traversal of current function (starting at 2), which effectively gives values in deep loops higher rank than values not in loops.
-reg2mem: Demote all values to stack slots¶
This file demotes all registers to memory references. It is intended to be the inverse of mem2reg. By converting to load instructions, the only values live across basic blocks are alloca instructions and load instructions before phi nodes. It is intended that this should make CFG hacking much easier. To make later hacking easier, the entry block is split into two, such that all introduced alloca instructions (and nothing else) are in the entry block.
-scalarrepl: Scalar Replacement of Aggregates (DT)¶
The well-known scalar replacement of aggregates transformation. This transform breaks up alloca instructions of aggregate type (structure or array) into individual alloca instructions for each member if possible. Then, if possible, it transforms the individual alloca instructions into nice clean scalar SSA form.
This combines a simple scalar replacement of aggregates algorithm with the mem2reg algorithm because they often interact, especially for C++ programs. As such, iterating between scalarrepl, then mem2reg until we run out of things to promote works well.
-sccp: Sparse Conditional Constant Propagation¶
Sparse conditional constant propagation and merging, which can be summarized as:
- Assumes values are constant unless proven otherwise
- Assumes BasicBlocks are dead unless proven otherwise
- Proves values to be constant, and replaces them with constants
- Proves conditional branches to be unconditional
Note that this pass has a habit of making definitions be dead. It is a good idea to run a DCE pass sometime after running this pass.
-simplifycfg: Simplify the CFG¶
Performs dead code elimination and basic block merging. Specifically:
- Removes basic blocks with no predecessors.
- Merges a basic block into its predecessor if there is only one and the predecessor only has one successor.
- Eliminates PHI nodes for basic blocks with a single predecessor.
- Eliminates a basic block that only contains an unconditional branch.
-sink: Code sinking¶
This pass moves instructions into successor blocks, when possible, so that they aren’t executed on paths where their results aren’t needed.
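As an illustration (a hypothetical C++ sketch, not from the original docs), sinking moves a computation into the only branch that uses it:

// Before: the multiply executes even when c is false and the result is unused.
int computeBefore(int a, int b, bool c) {
  int m = a * b;
  return c ? m : 0;
}

// After sinking, conceptually: the multiply runs only on the path that needs it.
int computeAfter(int a, int b, bool c) {
  if (c)
    return a * b;
  return 0;
}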
-strip: Strip all symbols from a module¶
Performs code stripping. This transformation can delete:
- names for virtual registers
- symbols for internal globals and functions
- debug information
Note that this transformation makes code much less readable, so it should only be used in situations where the strip utility would be used, such as reducing code size or making it harder to reverse engineer code.
-strip-dead-debug-info: Strip debug info for unused symbols¶
Performs code stripping. This transformation can delete:
- names for virtual registers
- symbols for internal globals and functions
- debug information
Note that this transformation makes code much less readable, so it should only be used in situations where the strip utility would be used, such as reducing code size or making it harder to reverse engineer code.
-strip-dead-prototypes: Strip Unused Function Prototypes¶
This pass loops over all of the functions in the input module, looking for dead declarations and removes them. Dead declarations are declarations of functions for which no implementation is available (i.e., declarations for unused library functions).
-strip-debug-declare: Strip all llvm.dbg.declare intrinsics¶
This pass implements code stripping. Specifically, it can delete:
- names for virtual registers
- symbols for internal globals and functions
- debug information
Note that this transformation makes code much less readable, so it should only be used in situations where the ‘strip’ utility would be used, such as reducing code size or making it harder to reverse engineer code.
-strip-nondebug: Strip all symbols, except dbg symbols, from a module¶
This pass implements code stripping. Specifically, it can delete:
- names for virtual registers
- symbols for internal globals and functions
- debug information
Note that this transformation makes code much less readable, so it should only be used in situations where the ‘strip’ utility would be used, such as reducing code size or making it harder to reverse engineer code.
-tailcallelim: Tail Call Elimination¶
This file transforms calls of the current function (self recursion) followed by a return instruction with a branch to the entry of the function, creating a loop. This pass also implements the following extensions to the basic algorithm:
- Trivial instructions between the call and return do not prevent the transformation from taking place, though currently the analysis cannot support moving any really useful instructions (only dead ones).
- This pass transforms functions that are prevented from being tail recursive by an associative expression to use an accumulator variable, thus compiling the typical naive factorial or fib implementation into efficient code (see the sketch after this list).
- TRE is performed if the function returns void, if the return returns the result returned by the call, or if the function returns a run-time constant on all exits from the function. It is possible, though unlikely, that the return returns something else (like constant 0), and can still be TRE’d. It can be TRE’d if all other return instructions in the function return the exact same value.
- If it can prove that callees do not access their caller’s stack frame, they are marked as eligible for tail call elimination (by the code generator).
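As a sketch of the accumulator extension mentioned above (hypothetical C++ code; the pass performs the equivalent rewrite on LLVM IR):

// Naive recursion: the multiply happens after the recursive call, which
// would normally block tail-call elimination.
unsigned factorial(unsigned n) {
  if (n <= 1)
    return 1;
  return n * factorial(n - 1);
}

// What the accumulator transformation effectively produces: the pending
// multiplies are folded into an accumulator and the recursion becomes a loop.
unsigned factorialAccum(unsigned n) {
  unsigned acc = 1;
  while (n > 1) {
    acc *= n;
    --n;
  }
  return acc;
}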
Utility Passes¶
This section describes the LLVM Utility Passes.
-deadarghaX0r: Dead Argument Hacking (BUGPOINT USE ONLY; DO NOT USE)¶
Same as dead argument elimination, but deletes arguments to functions which are external. This is only for use by bugpoint.
-extract-blocks: Extract Basic Blocks From Module (for bugpoint use)¶
This pass is used by bugpoint to extract all blocks from the module into their own functions.
-instnamer: Assign names to anonymous instructions¶
This is a little utility pass that gives instructions names; this is mostly useful when diffing the effect of an optimization, because deleting an unnamed instruction can change all other instruction numbering, making the diff very noisy.
-verify: Module Verifier¶
Verifies LLVM IR code. This is useful to run after an optimization which is undergoing testing. Note that llvm-as verifies its input before emitting bitcode, and also that malformed bitcode is likely to make LLVM crash. All language front-ends are therefore encouraged to verify their output before performing optimizing transformations.
- Both of a binary operator’s parameters are of the same type.
- Verify that the indices of mem access instructions match other operands.
- Verify that arithmetic and other operations are only performed on first-class types. Verify that shifts and logical operations only happen on integral types, for example.
- All of the constants in a switch statement are of the correct type.
- The code is in valid SSA form.
- It is illegal to put a label into any other type (like a structure) or to return one.
- Only phi nodes can be self-referential: %x = add i32 %x, %x is invalid.
- PHI nodes must have an entry for each predecessor, with no extras.
- PHI nodes must be the first thing in a basic block, all grouped together.
- PHI nodes must have at least one entry.
- All basic blocks should only end with terminator insts, not contain them.
- The entry node to a function must not have predecessors.
- All Instructions must be embedded into a basic block.
- Functions cannot take a void-typed parameter.
- Verify that a function’s argument list agrees with its declared type.
- It is illegal to specify a name for a void value.
- It is illegal to have an internal global value with no initializer.
- It is illegal to have a ret instruction that returns a value that does not agree with the function return value type.
- Function call argument types match the function prototype.
- All other things that are tested by asserts spread about the code.
Note that this does not provide full security verification (like Java), but instead just tries to ensure that code is well-formed.
-view-cfg: View CFG of function¶
Displays the control flow graph using the GraphViz tool.
-view-cfg-only: View CFG of function (with no function bodies)¶
Displays the control flow graph using the GraphViz tool, but omitting function bodies.
-view-dom: View dominance tree of function¶
Displays the dominator tree using the GraphViz tool.
-view-dom-only: View dominance tree of function (with no function bodies)¶
Displays the dominator tree using the GraphViz tool, but omitting function bodies.
-view-postdom: View postdominance tree of function¶
Displays the post dominator tree using the GraphViz tool.
-view-postdom-only: View postdominance tree of function (with no function bodies)¶
Displays the post dominator tree using the GraphViz tool, but omitting function bodies.
YAML I/O¶
Introduction to YAML¶
YAML is a human-readable data serialization language. The full YAML language spec can be read at yaml.org. The simplest forms of YAML are just “scalars”, “mappings”, and “sequences”. A scalar is any number or string. The pound/hash symbol (#) begins a comment line. A mapping is a set of key-value pairs where the key ends with a colon. For example:
# a mapping
name: Tom
hat-size: 7
A sequence is a list of items where each item starts with a leading dash (‘-‘). For example:
# a sequence
- x86
- x86_64
- PowerPC
You can combine mappings and sequences by indenting. For example, a sequence of mappings in which one of the mapping values is itself a sequence:
# a sequence of mappings with one key's value being a sequence
- name: Tom
  cpus:
    - x86
    - x86_64
- name: Bob
  cpus:
    - x86
- name: Dan
  cpus:
    - PowerPC
    - x86
Sometimes sequences are known to be short and one entry per line is too verbose, so YAML offers an alternate syntax for sequences called a “Flow Sequence”, in which you put comma-separated sequence elements into square brackets. The above example could then be simplified to:
# a sequence of mappings with one key's value being a flow sequence
- name: Tom
  cpus: [ x86, x86_64 ]
- name: Bob
  cpus: [ x86 ]
- name: Dan
  cpus: [ PowerPC, x86 ]
Introduction to YAML I/O¶
The use of indenting makes the YAML easy for a human to read and understand, but having a program read and write YAML involves a lot of tedious details. The YAML I/O library structures and simplifies reading and writing YAML documents.
YAML I/O assumes you have some “native” data structures which you want to be able to dump as YAML and recreate from YAML. The first step is to try writing example YAML for your data structures. You may find after looking at possible YAML representations that a direct mapping of your data structures to YAML is not very readable. Often the fields are not in the order that a human would find readable. Or the same information is replicated in multiple locations, making it hard for a human to write such YAML correctly.
In relational database theory there is a design step called normalization in which you reorganize fields and tables. The same considerations need to go into the design of your YAML encoding. But, you may not want to change your existing native data structures. Therefore, when writing out YAML there may be a normalization step, and when reading YAML there would be a corresponding denormalization step.
YAML I/O uses a non-invasive, traits based design. YAML I/O defines some abstract base templates. You specialize those templates on your data types. For instance, if you have an enumerated type FooBar you could specialize ScalarEnumerationTraits on that type and define the enumeration() method:
using llvm::yaml::ScalarEnumerationTraits;
using llvm::yaml::IO;

template <>
struct ScalarEnumerationTraits<FooBar> {
  static void enumeration(IO &io, FooBar &value) {
    ...
  }
};
As with all YAML I/O template specializations, the ScalarEnumerationTraits is used for both reading and writing YAML. That is, the mapping between in-memory enum values and the YAML string representation is only in one place. This assures that the code for writing and parsing of YAML stays in sync.
To specify a YAML mapping, you define a specialization on llvm::yaml::MappingTraits. If your native data structure happens to be a struct that is already normalized, then the specialization is simple. For example:
using llvm::yaml::MappingTraits;
using llvm::yaml::IO;

template <>
struct MappingTraits<Person> {
  static void mapping(IO &io, Person &info) {
    io.mapRequired("name", info.name);
    io.mapOptional("hat-size", info.hatSize);
  }
};
A YAML sequence is automatically inferred if your data type has begin()/end() iterators and a push_back() method. Therefore any of the STL containers (such as std::vector<>) will automatically translate to YAML sequences.
Once you have defined specializations for your data types, you can programmatically use YAML I/O to write a YAML document:
using llvm::yaml::Output;
Person tom;
tom.name = "Tom";
tom.hatSize = 8;
Person dan;
dan.name = "Dan";
dan.hatSize = 7;
std::vector<Person> persons;
persons.push_back(tom);
persons.push_back(dan);
Output yout(llvm::outs());
yout << persons;
This would write the following:
- name: Tom
  hat-size: 8
- name: Dan
  hat-size: 7
And you can also read such YAML documents with the following code:
using llvm::yaml::Input;

typedef std::vector<Person> PersonList;
std::vector<PersonList> docs;

Input yin(document.getBuffer());
yin >> docs;

if ( yin.error() )
  return;

// Process read document
for ( PersonList &pl : docs ) {
  for ( Person &person : pl ) {
    cout << "name=" << person.name;
  }
}
One other feature of YAML is the ability to define multiple documents in a single file. That is why reading YAML produces a vector of your document type.
Error Handling¶
When parsing a YAML document, if the input does not match your schema (as expressed in your XxxTraits<> specializations), YAML I/O will print out an error message and your Input object’s error() method will return true. For instance, the following document:
- name: Tom
  shoe-size: 12
- name: Dan
  hat-size: 7
has a key (shoe-size) that is not defined in the schema. YAML I/O will automatically generate this error:
YAML:2:2: error: unknown key 'shoe-size'
shoe-size: 12
^~~~~~~~~
Similar errors are produced for other input not conforming to the schema.
Scalars¶
YAML scalars are just strings (i.e. not a sequence or mapping). The YAML I/O library provides support for translating between YAML scalars and specific C++ types.
Built-in types¶
The following types have built-in support in YAML I/O:
- bool
- float
- double
- StringRef
- std::string
- int64_t
- int32_t
- int16_t
- int8_t
- uint64_t
- uint32_t
- uint16_t
- uint8_t
That is, you can use those types in fields of MappingTraits or as the element type of a sequence. When reading, YAML I/O will validate that the string found is convertible to that type and error out if not.
Unique types¶
Given that YAML I/O is trait based, the selection of how to convert your data to YAML is based on the type of your data. But in C++ type matching, typedefs do not generate unique type names. That means if you have two typedefs of unsigned int, to YAML I/O both types look exactly like unsigned int. To facilitate unique type names, YAML I/O provides a macro which is used like a typedef on built-in types, but expands to create a class with conversion operators to and from the base type. For example:
LLVM_YAML_STRONG_TYPEDEF(uint32_t, MyFooFlags)
LLVM_YAML_STRONG_TYPEDEF(uint32_t, MyBarFlags)
This generates two classes MyFooFlags and MyBarFlags which you can use in your native data structures instead of uint32_t. They are implicitly converted to and from uint32_t. The point of creating these unique types is that you can now specify traits on them to get different YAML conversions.
Hex types¶
An example use of a unique type is that YAML I/O provides fixed sized unsigned integers that are written with YAML I/O as hexadecimal instead of the decimal format used by the built-in integer types:
- Hex64
- Hex32
- Hex16
- Hex8
You can use llvm::yaml::Hex32 instead of uint32_t, and the only difference will be that when YAML I/O writes out that type it will be formatted in hexadecimal.
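As an illustration (hypothetical struct and field names, following the MappingTraits pattern shown earlier), a mapping can mix hexadecimal and decimal fields simply by choosing the field types:

using llvm::yaml::MappingTraits;
using llvm::yaml::IO;

// A hypothetical native type mixing hex-formatted and decimal fields.
struct Section {
  llvm::yaml::Hex32 address;  // written as, e.g., 0x00400000
  uint32_t size;              // written as, e.g., 4096
};

template <>
struct MappingTraits<Section> {
  static void mapping(IO &io, Section &s) {
    io.mapRequired("address", s.address);
    io.mapRequired("size", s.size);
  }
};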
ScalarEnumerationTraits¶
YAML I/O supports translating between in-memory enumerations and a set of string values in YAML documents. This is done by specializing ScalarEnumerationTraits<> on your enumeration type and defining an enumeration() method. For instance, suppose you had an enumeration of CPUs and a struct with it as a field:
enum CPUs {
  cpu_x86_64 = 5,
  cpu_x86 = 7,
  cpu_PowerPC = 8
};

struct Info {
  CPUs cpu;
  uint32_t flags;
};
To support reading and writing of this enumeration, you can define a ScalarEnumerationTraits specialization on CPUs, which can then be used as a field type:
using llvm::yaml::ScalarEnumerationTraits;
using llvm::yaml::MappingTraits;
using llvm::yaml::IO;

template <>
struct ScalarEnumerationTraits<CPUs> {
  static void enumeration(IO &io, CPUs &value) {
    io.enumCase(value, "x86_64", cpu_x86_64);
    io.enumCase(value, "x86", cpu_x86);
    io.enumCase(value, "PowerPC", cpu_PowerPC);
  }
};

template <>
struct MappingTraits<Info> {
  static void mapping(IO &io, Info &info) {
    io.mapRequired("cpu", info.cpu);
    io.mapOptional("flags", info.flags, 0);
  }
};
When reading YAML, if the string found does not match any of the strings specified by enumCase() methods, an error is automatically generated. When writing YAML, if the value being written does not match any of the values specified by the enumCase() methods, a runtime assertion is triggered.
BitValue¶
Another common data structure in C++ is a field where each bit has a unique meaning. This is often used in a “flags” field. YAML I/O has support for converting such fields to a flow sequence. For instance suppose you had the following bit flags defined:
enum {
flagsPointy = 1,
flagsHollow = 2,
flagsFlat = 4,
flagsRound = 8
};
LLVM_YAML_STRONG_TYPEDEF(uint32_t, MyFlags)
To support reading and writing of MyFlags, you specialize ScalarBitSetTraits<> on MyFlags and provide the bit values and their names.
using llvm::yaml::ScalarBitSetTraits;
using llvm::yaml::MappingTraits;
using llvm::yaml::IO;
template <>
struct ScalarBitSetTraits<MyFlags> {
static void bitset(IO &io, MyFlags &value) {
io.bitSetCase(value, "hollow", flagsHollow);
io.bitSetCase(value, "flat", flagsFlat);
io.bitSetCase(value, "round", flagsRound);
io.bitSetCase(value, "pointy", flagsPointy);
}
};
struct Info {
StringRef name;
MyFlags flags;
};
template <>
struct MappingTraits<Info> {
static void mapping(IO &io, Info& info) {
io.mapRequired("name", info.name);
io.mapRequired("flags", info.flags);
}
};
With the above, when writing, YAML I/O will mask each value in the bitset trait against the flags field, and each one that matches will cause the corresponding string to be added to the flow sequence. The opposite is done when reading, and any unknown string values will result in an error. With the above schema, a sample valid YAML document is:
name: Tom
flags: [ pointy, flat ]
Sometimes a “flags” field might contain an enumeration part defined by a bit-mask.
enum {
flagsFeatureA = 1,
flagsFeatureB = 2,
flagsFeatureC = 4,
flagsCPUMask = 24,
flagsCPU1 = 8,
flagsCPU2 = 16
};
To support reading and writing such fields, you need to use the maskedBitSet() method and provide the bit values, their names and the enumeration mask.
template <>
struct ScalarBitSetTraits<MyFlags> {
static void bitset(IO &io, MyFlags &value) {
io.bitSetCase(value, "featureA", flagsFeatureA);
io.bitSetCase(value, "featureB", flagsFeatureB);
io.bitSetCase(value, "featureC", flagsFeatureC);
io.maskedBitSetCase(value, "CPU1", flagsCPU1, flagsCPUMask);
io.maskedBitSetCase(value, "CPU2", flagsCPU2, flagsCPUMask);
}
};
When writing, YAML I/O will apply the enumeration mask to the flags field and compare the result against the values in the bitset. As with a regular bitset, each value that matches will cause the corresponding string to be added to the flow sequence.
Custom Scalar¶
Sometimes for readability a scalar needs to be formatted in a custom way. For instance, your internal data structure may use an integer for time (seconds since some epoch), but in YAML it would be much nicer to express that integer in some time format (e.g. 4-May-2012 10:30pm). YAML I/O has a way to support custom formatting and parsing of scalar types by specializing ScalarTraits<> on your data type. When writing, YAML I/O will provide the native type and your specialization must create a temporary llvm::StringRef. When reading, YAML I/O will provide an llvm::StringRef of the scalar and your specialization must convert that to your native data type. An outline of a custom scalar type looks like:
using llvm::yaml::ScalarTraits;
using llvm::yaml::IO;
template <>
struct ScalarTraits<MyCustomType> {
static void output(const MyCustomType &value, void*, llvm::raw_ostream &out) {
out << value; // do custom formatting here
}
static StringRef input(StringRef scalar, void*, MyCustomType &value) {
// do custom parsing here. Return the empty string on success,
// or an error message on failure.
return StringRef();
}
// Determine if this scalar needs quotes.
static bool mustQuote(StringRef) { return true; }
};
Block Scalars¶
YAML block scalars are string literals that are represented in YAML using the literal block notation, just like the example shown below:
text: |
First line
Second line
The YAML I/O library provides support for translating between YAML block scalars and specific C++ types by allowing you to specialize BlockScalarTraits<> on your data type. The library doesn’t provide any built-in support for block scalar I/O for types like std::string and llvm::StringRef as they are already supported by YAML I/O and use the ordinary scalar notation by default.
BlockScalarTraits specializations are very similar to ScalarTraits specializations: when writing, YAML I/O will provide the native type and your specialization must create a temporary llvm::StringRef; when reading, it will provide an llvm::StringRef containing the value of the block scalar and your specialization must convert that to your native data type. An example of a custom type with an appropriate specialization of BlockScalarTraits is shown below:
using llvm::yaml::BlockScalarTraits;
using llvm::yaml::IO;
struct MyStringType {
std::string Str;
};
template <>
struct BlockScalarTraits<MyStringType> {
static void output(const MyStringType &Value, void *Ctxt,
llvm::raw_ostream &OS) {
OS << Value.Str;
}
static StringRef input(StringRef Scalar, void *Ctxt,
MyStringType &Value) {
Value.Str = Scalar.str();
return StringRef();
}
};
Mappings¶
To translate your type T to or from a YAML mapping, you must specialize llvm::yaml::MappingTraits on T and implement the “void mapping(IO &io, T&)” method. If your native data structures use pointers to a class everywhere, you can specialize on the class pointer. Examples:
using llvm::yaml::MappingTraits;
using llvm::yaml::IO;
// Example of struct Foo which is used by value
template <>
struct MappingTraits<Foo> {
static void mapping(IO &io, Foo &foo) {
io.mapOptional("size", foo.size);
...
}
};
// Example of struct Bar which is natively always a pointer
template <>
struct MappingTraits<Bar*> {
static void mapping(IO &io, Bar *&bar) {
io.mapOptional("size", bar->size);
...
}
};
No Normalization¶
The mapping() method is responsible, if needed, for normalizing and denormalizing. In a simple case where the native data structure requires no normalization, the mapping method just uses mapOptional() or mapRequired() to bind the struct’s fields to YAML key names. For example:
using llvm::yaml::MappingTraits;
using llvm::yaml::IO;
template <>
struct MappingTraits<Person> {
static void mapping(IO &io, Person &info) {
io.mapRequired("name", info.name);
io.mapOptional("hat-size", info.hatSize);
}
};
Normalization¶
When [de]normalization is required, the mapping() method needs a way to access normalized values as fields. To help with this, there is a template MappingNormalization<> which you can use to automatically do the normalization and denormalization. The template is used to create a local variable in your mapping() method which contains the normalized keys.
Suppose you have native data type Polar which specifies a position in polar coordinates (distance, angle):
struct Polar {
float distance;
float angle;
};
but you’ve decided the normalized YAML form should be in x,y coordinates. That is, you want the YAML to look like:
x: 10.3
y: -4.7
You can support this by defining a MappingTraits that normalizes the polar coordinates to x,y coordinates when writing YAML and denormalizes x,y coordinates into polar when reading YAML.
using llvm::yaml::MappingTraits;
using llvm::yaml::IO;
template <>
struct MappingTraits<Polar> {
class NormalizedPolar {
public:
NormalizedPolar(IO &io)
: x(0.0), y(0.0) {
}
NormalizedPolar(IO &, Polar &polar)
: x(polar.distance * cos(polar.angle)),
y(polar.distance * sin(polar.angle)) {
}
Polar denormalize(IO &) {
return Polar(sqrt(x*x+y*y), atan2(y, x));
}
float x;
float y;
};
static void mapping(IO &io, Polar &polar) {
MappingNormalization<NormalizedPolar, Polar> keys(io, polar);
io.mapRequired("x", keys->x);
io.mapRequired("y", keys->y);
}
};
When writing YAML, the local variable “keys” will be a stack allocated instance of NormalizedPolar, constructed from the supplied polar object, which initializes its x and y fields. The mapRequired() methods then write out the x and y values as key/value pairs.
When reading YAML, the local variable “keys” will be a stack allocated instance of NormalizedPolar, constructed by the empty constructor. The mapRequired() methods will find the matching keys in the YAML document and fill in the x and y fields of the NormalizedPolar object keys. At the end of the mapping() method, when the local keys variable goes out of scope, the denormalize() method will automatically be called to convert the read values back to polar coordinates, and the result is assigned back to the second parameter of mapping().
In some cases, the normalized class may be a subclass of the native type and could be returned by the denormalize() method, except that the temporary normalized instance is stack allocated. In these cases, the utility template MappingNormalizationHeap<> can be used instead. It is just like MappingNormalization<> except that it heap allocates the normalized object when reading YAML. It never destroys the normalized object. The denormalize() method can thus return “this”.
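A sketch of the heap variant (the Shape and NormalizedShape types are hypothetical; NormalizedShape is assumed to derive from Shape and to provide the usual pair of constructors):
using llvm::yaml::MappingTraits;
using llvm::yaml::IO;

template <>
struct MappingTraits<Shape*> {
  static void mapping(IO &io, Shape *&shape) {
    // When reading, NormalizedShape is heap allocated, so its
    // denormalize() method can safely return "this".
    MappingNormalizationHeap<NormalizedShape, Shape*> keys(io, shape);
    io.mapRequired("kind", keys->kind);
  }
};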
Default values¶
Within a mapping() method, calls to io.mapRequired() mean that the key is required to exist when parsing YAML documents; otherwise YAML I/O will issue an error.
On the other hand, keys registered with io.mapOptional() are allowed to not exist in the YAML document being read. So what value is put in the field for those optional keys? There are two steps to how those optional fields are filled in. First, the second parameter to the mapping() method is a reference to a native class. That native class must have a default constructor. Whatever value the default constructor initially sets for an optional field will be that field’s value. Second, the mapOptional() method has an optional third parameter. If provided it is the value that mapOptional() should set that field to if the YAML document does not have that key.
There is one important difference between those two ways (default constructor and third parameter to mapOptional). When YAML I/O generates a YAML document and the mapOptional() third parameter is used, then if the actual value being written is the same as the default value (compared using ==), that key/value pair is not written.
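For example, a sketch reusing the Person mapping from above, with an assumed default hat size of 7:
template <>
struct MappingTraits<Person> {
  static void mapping(IO &io, Person &info) {
    io.mapRequired("name", info.name);
    // Reading: if "hat-size" is absent, hatSize is set to 7.
    // Writing: if info.hatSize == 7, the key is omitted entirely.
    io.mapOptional("hat-size", info.hatSize, 7);
  }
};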
Order of Keys¶
When writing out a YAML document, the keys are written in the order that the calls to mapRequired()/mapOptional() are made in the mapping() method. This gives you a chance to write the fields in an order that a human reader of the YAML document would find natural. This may be different from the order of the fields in the native class.
When reading in a YAML document, the keys in the document can be in any order, but they are processed in the order that the calls to mapRequired()/mapOptional() are made in the mapping() method. That enables some interesting functionality. For instance, if the first field bound is the cpu and the second field bound is flags, and the flags are cpu specific, you can programmatically switch how the flags are converted to and from YAML based on the cpu. This works for both reading and writing. For example:
using llvm::yaml::MappingTraits;
using llvm::yaml::IO;
struct Info {
CPUs cpu;
uint32_t flags;
};
template <>
struct MappingTraits<Info> {
static void mapping(IO &io, Info &info) {
io.mapRequired("cpu", info.cpu);
// flags must come after cpu for this to work when reading yaml
if ( info.cpu == cpu_x86_64 )
io.mapRequired("flags", *(My86_64Flags*)&info.flags);
else
io.mapRequired("flags", *(My86Flags*)&info.flags);
}
};
Tags¶
The YAML syntax supports tags as a way to specify the type of a node before it is parsed. This allows dynamic types of nodes. But the YAML I/O model uses static typing, so there are limits to how you can use tags with the YAML I/O model. Recently, support was added to YAML I/O for checking/setting the optional tag on a map. Using this functionality it is even possible to support different mappings, as long as they are convertible.
To check a tag, inside your mapping() method you can use io.mapTag() to specify what the tag should be. This will also add that tag when writing YAML.
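A sketch of mapTag() in use (the MyNode type, its fields, and the “!extended” tag are hypothetical): when reading, mapTag() reports whether the node carried the given tag; when writing, the second argument controls whether the tag is emitted.
using llvm::yaml::MappingTraits;
using llvm::yaml::IO;

template <>
struct MappingTraits<MyNode> {
  static void mapping(IO &io, MyNode &node) {
    if (io.mapTag("!extended", node.isExtended)) {
      node.isExtended = true; // remember the tag when reading
      io.mapRequired("extra", node.extra);
    }
    io.mapRequired("name", node.name);
  }
};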
Validation¶
Sometimes in a YAML map, each key/value pair is valid, but the combination is not. This is similar to something having no syntax errors, but still having semantic errors. To support semantic level checking, YAML I/O allows an optional validate() method in a MappingTraits template specialization.
When parsing YAML, the validate() method is called after all key/values in the map have been processed. Any error message returned by the validate() method during input will be printed just like a syntax error would be printed.
When writing YAML, the validate() method is called before the YAML key/values are written. Any error during output will trigger an assert() because it is a programming error to have invalid struct values.
using llvm::yaml::MappingTraits;
using llvm::yaml::IO;
struct Stuff {
...
};
template <>
struct MappingTraits<Stuff> {
static void mapping(IO &io, Stuff &stuff) {
...
}
static StringRef validate(IO &io, Stuff &stuff) {
// Look at all fields in 'stuff' and if there
// are any bad values return a string describing
// the error. Otherwise return an empty string.
return StringRef();
}
};
Flow Mapping¶
A YAML “flow mapping” is a mapping that uses the inline notation (e.g. { x: 1, y: 0 }) when written to YAML. To specify that a type should be written in YAML using flow mapping, your MappingTraits specialization should add “static const bool flow = true;”. For instance:
using llvm::yaml::MappingTraits;
using llvm::yaml::IO;
struct Stuff {
...
};
template <>
struct MappingTraits<Stuff> {
static void mapping(IO &io, Stuff &stuff) {
...
}
static const bool flow = true;
};
Flow mappings are subject to line wrapping according to the Output object configuration.
Sequence¶
To translate your type T to or from a YAML sequence, you must specialize llvm::yaml::SequenceTraits on T and implement two methods: size_t size(IO &io, T&) and T::value_type& element(IO &io, T&, size_t index). For example:
template <>
struct SequenceTraits<MySeq> {
static size_t size(IO &io, MySeq &list) { ... }
static MySeqEl &element(IO &io, MySeq &list, size_t index) { ... }
};
The size() method returns how many elements are currently in your sequence. The element() method returns a reference to the i’th element in the sequence. When parsing YAML, the element() method may be called with an index one bigger than the current size. Your element() method should allocate space for one more element (using the default constructor if the element is a C++ object) and return a reference to that newly allocated space.
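As a sketch, this is approximately the growth behavior that the utility macros described below generate for std::vector (MyType is a placeholder element type):
template <>
struct SequenceTraits<std::vector<MyType>> {
  static size_t size(IO &io, std::vector<MyType> &seq) {
    return seq.size();
  }
  static MyType &element(IO &io, std::vector<MyType> &seq, size_t index) {
    // When reading, grow the vector so the requested slot exists.
    if (index >= seq.size())
      seq.resize(index + 1);
    return seq[index];
  }
};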
Flow Sequence¶
A YAML “flow sequence” is a sequence that uses the inline notation (e.g. [ foo, bar ]) when written to YAML. To specify that a sequence type should be written in YAML as a flow sequence, your SequenceTraits specialization should add “static const bool flow = true;”. For instance:
template <>
struct SequenceTraits<MyList> {
static size_t size(IO &io, MyList &list) { ... }
static MyListEl &element(IO &io, MyList &list, size_t index) { ... }
// The existence of this member causes YAML I/O to use a flow sequence
static const bool flow = true;
};
With the above, if you used MyList as the data type in your native data structures, then when converted to YAML, a flow sequence of integers will be used (e.g. [ 10, -3, 4 ]).
Flow sequences are subject to line wrapping according to the Output object configuration.
Utility Macros¶
Since a common source of sequences is std::vector<>, YAML I/O provides macros: LLVM_YAML_IS_SEQUENCE_VECTOR() and LLVM_YAML_IS_FLOW_SEQUENCE_VECTOR() which can be used to easily specify SequenceTraits<> on a std::vector type. YAML I/O does not partially specialize SequenceTraits on std::vector<> because that would force all vectors to be sequences. An example use of the macros:
// Given std::vector<MyType1> and std::vector<MyType2> in your native data:
LLVM_YAML_IS_SEQUENCE_VECTOR(MyType1)
LLVM_YAML_IS_FLOW_SEQUENCE_VECTOR(MyType2)
Document List¶
YAML allows you to define multiple “documents” in a single YAML file. Each new document starts with a left-aligned “---” token. The end of all documents is denoted with a left-aligned “...” token. Many users of YAML will never have need for multiple documents. The top level node in their YAML schema will be a mapping or sequence. For those cases, the following is not needed. But for cases where you do want multiple documents, you can specify a trait for your document list type. The trait has the same methods as SequenceTraits but is named DocumentListTraits. For example:
template <>
struct DocumentListTraits<MyDocList> {
static size_t size(IO &io, MyDocList &list) { ... }
static MyDocType &element(IO &io, MyDocList &list, size_t index) { ... }
};
User Context Data¶
When an llvm::yaml::Input or llvm::yaml::Output object is created their constructors take an optional “context” parameter. This is a pointer to whatever state information you might need.
For instance, in a previous example we showed how the conversion type for a flags field could be determined at runtime based on the value of another field in the mapping. But what if an inner mapping needs to know some field value of an outer mapping? That is where the “context” parameter comes in. You can set values in the context in the outer map’s mapping() method and retrieve those values in the inner map’s mapping() method.
The context value is just a void*. All your traits which use the context and operate on your native data types need to agree what the context value actually is. It could be a pointer to an object or struct which your various traits use to share context-sensitive information.
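A sketch of that pattern (the ConvertContext and OuterDoc types are hypothetical; io.getContext() returns the pointer that was passed to the Input or Output constructor):
using llvm::yaml::MappingTraits;
using llvm::yaml::IO;

struct ConvertContext {
  CPUs cpu; // set by the outer mapping, consulted by inner mappings
};

template <>
struct MappingTraits<OuterDoc> {
  static void mapping(IO &io, OuterDoc &doc) {
    io.mapRequired("cpu", doc.cpu);
    // Publish the value for any inner mapping() that needs it.
    static_cast<ConvertContext*>(io.getContext())->cpu = doc.cpu;
    io.mapRequired("inner", doc.inner);
  }
};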
Output¶
The llvm::yaml::Output class is used to generate a YAML document from your in-memory data structures, using traits defined on your data types. To instantiate an Output object you need an llvm::raw_ostream, an optional context pointer and an optional wrapping column:
class Output : public IO {
public:
Output(llvm::raw_ostream &, void *context = NULL, int WrapColumn = 70);
Once you have an Output object, you can use the C++ stream operator on it to write your native data as YAML. One thing to recall is that a YAML file can contain multiple “documents”. If the top level data structure you are streaming as YAML is a mapping, scalar, or sequence, then Output assumes you are generating one document and wraps the mapping output with “---” and trailing “...”.
The WrapColumn parameter will cause the flow mappings and sequences to line-wrap when they go over the supplied column. Pass 0 to completely suppress the wrapping.
using llvm::yaml::Output;
void dumpMyMapDoc(const MyMapType &info) {
Output yout(llvm::outs());
yout << info;
}
The above could produce output like:
---
name: Tom
hat-size: 7
...
On the other hand, if the top level data structure you are streaming as YAML has a DocumentListTraits specialization, then Output walks through each element of your DocumentList and generates a “---” before the start of each element and ends with a “...”.
using llvm::yaml::Output;
void dumpMyDocList(const MyDocListType &docList) {
Output yout(llvm::outs());
yout << docList;
}
The above could produce output like:
---
name: Tom
hat-size: 7
---
name: Tom
shoe-size: 11
...
Input¶
The llvm::yaml::Input class is used to parse YAML document(s) into your native data structures. To instantiate an Input object you need a StringRef to the entire YAML file, and optionally a context pointer:
class Input : public IO {
public:
Input(StringRef inputContent, void *context=NULL);
Once you have an Input object, you can use the C++ stream operator to read the document(s). If you expect there might be multiple YAML documents in one file, you’ll need to specialize DocumentListTraits on a list of your document type and stream in that document list type. Otherwise you can just stream in the document type. Also, you can check if there were any syntax errors in the YAML by calling the error() method on the Input object. For example:
// Reading a single document
using llvm::yaml::Input;
Input yin(mb.getBuffer());
// Parse the YAML file
MyDocType theDoc;
yin >> theDoc;
// Check for error
if ( yin.error() )
return;
// Reading multiple documents in one file
using llvm::yaml::Input;
LLVM_YAML_IS_DOCUMENT_LIST_VECTOR(MyDocType)
Input yin(mb.getBuffer());
// Parse the YAML file
std::vector<MyDocType> theDocList;
yin >> theDocList;
// Check for error
if ( yin.error() )
return;
The Often Misunderstood GEP Instruction¶
- Introduction
- Address Computation
- What is the first index of the GEP instruction?
- Why is the extra 0 index required?
- What is dereferenced by GEP?
- Why don’t GEP x,0,0,1 and GEP x,1 alias?
- Why do GEP x,1,0,0 and GEP x,1 alias?
- Can GEP index into vector elements?
- What effect do address spaces have on GEPs?
- How is GEP different from ptrtoint, arithmetic, and inttoptr?
- I’m writing a backend for a target which needs custom lowering for GEP. How do I do this?
- How does VLA addressing work with GEPs?
- Rules
- What happens if an array index is out of bounds?
- Can array indices be negative?
- Can I compare two values computed with GEPs?
- Can I do GEP with a different pointer type than the type of the underlying object?
- Can I cast an object’s address to integer and add it to null?
- Can I compute the distance between two objects, and add that value to one address to compute the other address?
- Can I do type-based alias analysis on LLVM IR?
- What happens if a GEP computation overflows?
- How can I tell if my front-end is following the rules?
- Rationale
- Summary
Introduction¶
This document seeks to dispel the mystery and confusion surrounding LLVM’s GetElementPtr (GEP) instruction. Questions about the wily GEP instruction are probably the most frequently occurring questions once a developer gets down to coding with LLVM. Here we lay out the sources of confusion and show that the GEP instruction is really quite simple.
Address Computation¶
When people are first confronted with the GEP instruction, they tend to relate it to known concepts from other programming paradigms, most notably C array indexing and field selection. GEP closely resembles C array indexing and field selection, but it is a little different, and this leads to the following questions.
What is the first index of the GEP instruction?¶
Quick answer: The index stepping through the first operand.
The confusion with the first index usually arises from thinking about the GetElementPtr instruction as if it were a C index operator. They aren’t the same. For example, when we write, in “C”:
AType *Foo;
...
X = &Foo->F;
it is natural to think that there is only one index, the selection of the field F. However, in this example, Foo is a pointer. That pointer must be indexed explicitly in LLVM. C, on the other hand, indexes through it transparently. To arrive at the same address location as the C code, you would provide the GEP instruction with two index operands. The first operand indexes through the pointer; the second operand indexes the field F of the structure, just as if you wrote:
X = &Foo[0].F;
Sometimes this question gets rephrased as:
Why is it okay to index through the first pointer, but subsequent pointers won’t be dereferenced?
The answer is simply because memory does not have to be accessed to perform the computation. The first operand to the GEP instruction must be a value of a pointer type. The value of the pointer is provided directly to the GEP instruction as an operand without any need for accessing memory. It must, therefore, be indexed and requires an index operand. Consider this example:
struct munger_struct {
int f1;
int f2;
};
void munge(struct munger_struct *P) {
P[0].f1 = P[1].f1 + P[2].f2;
}
...
struct munger_struct Array[3];
...
munge(Array);
In this “C” example, the front end compiler (Clang) will generate three GEP instructions for the three indices through “P” in the assignment statement. The function argument P will be the first operand of each of these GEP instructions. The second operand indexes through that pointer. The third operand will be the field offset into the struct munger_struct type, for either the f1 or f2 field. So, in LLVM assembly the munge function looks like:
define void @munge(%struct.munger_struct* %P) {
entry:
%tmp = getelementptr %struct.munger_struct, %struct.munger_struct* %P, i32 1, i32 0
%tmp1 = load i32, i32* %tmp
%tmp6 = getelementptr %struct.munger_struct, %struct.munger_struct* %P, i32 2, i32 1
%tmp7 = load i32, i32* %tmp6
%tmp8 = add i32 %tmp7, %tmp1
%tmp9 = getelementptr %struct.munger_struct, %struct.munger_struct* %P, i32 0, i32 0
store i32 %tmp8, i32* %tmp9
ret void
}
In each case the first operand is the pointer through which the GEP instruction starts. The same is true whether the first operand is an argument, allocated memory, or a global variable.
To make this clear, let’s consider a more obtuse example:
%MyVar = uninitialized global i32
...
%idx1 = getelementptr i32, i32* %MyVar, i64 0
%idx2 = getelementptr i32, i32* %MyVar, i64 1
%idx3 = getelementptr i32, i32* %MyVar, i64 2
These GEP instructions are simply making address computations from the base address of MyVar. They compute, as follows (using C syntax):
idx1 = (char*) &MyVar + 0
idx2 = (char*) &MyVar + 4
idx3 = (char*) &MyVar + 8
Since the type i32 is known to be four bytes long, the indices 0, 1 and 2 translate into memory offsets of 0, 4, and 8, respectively. No memory is accessed to make these computations because the address of %MyVar is passed directly to the GEP instructions.
The obtuse part of this example is in the cases of %idx2 and %idx3. They result in the computation of addresses that point to memory past the end of the %MyVar global, which is only one i32 long, not three i32s long. While this is legal in LLVM, it is inadvisable because any load or store with the pointer that results from these GEP instructions would produce undefined results.
Why is the extra 0 index required?¶
Quick answer: there are no superfluous indices.
This question arises most often when the GEP instruction is applied to a global variable which is always a pointer type. For example, consider this:
%MyStruct = uninitialized global { float*, i32 }
...
%idx = getelementptr { float*, i32 }, { float*, i32 }* %MyStruct, i64 0, i32 1
The GEP above yields an i32* by indexing the i32 typed field of the structure %MyStruct. When people first look at it, they wonder why the i64 0 index is needed. However, a closer inspection of how globals and GEPs work reveals the need. Becoming aware of the following facts will dispel the confusion:
- The type of %MyStruct is not { float*, i32 } but rather { float*, i32 }*. That is, %MyStruct is a pointer to a structure containing a pointer to a float and an i32.
- Point #1 is evidenced by noticing the type of the first operand of the GEP instruction (%MyStruct), which is { float*, i32 }*.
- The first index, i64 0, is required to step over the global variable %MyStruct. Since the first argument to the GEP instruction must always be a value of pointer type, the first index steps through that pointer. A value of 0 means 0 elements offset from that pointer.
- The second index, i32 1, selects the second field of the structure (the i32).
What is dereferenced by GEP?¶
Quick answer: nothing.
The GetElementPtr instruction dereferences nothing. That is, it doesn’t access memory in any way. That’s what the Load and Store instructions are for. GEP is only involved in the computation of addresses. For example, consider this:
%MyVar = uninitialized global { [40 x i32 ]* }
...
%idx = getelementptr { [40 x i32]* }, { [40 x i32]* }* %MyVar, i64 0, i32 0, i64 0, i64 17
In this example, we have a global variable, %MyVar, that is a pointer to a structure containing a pointer to an array of 40 ints. The GEP instruction seems to be accessing the 18th integer of the structure’s array of ints. However, this is actually an illegal GEP instruction. It won’t compile. The reason is that the pointer in the structure must be dereferenced in order to index into the array of 40 ints. Since the GEP instruction never accesses memory, it is illegal.
In order to access the 18th integer in the array, you would need to do the following:
%idx = getelementptr { [40 x i32]* }, { [40 x i32]* }* %MyVar, i64 0, i32 0
%arr = load [40 x i32]*, [40 x i32]** %idx
%idx17 = getelementptr [40 x i32], [40 x i32]* %arr, i64 0, i64 17
In this case, we have to load the pointer in the structure with a load instruction before we can index into the array. If the example was changed to:
%MyVar = uninitialized global { [40 x i32 ] }
...
%idx = getelementptr { [40 x i32] }, { [40 x i32] }* %MyVar, i64 0, i32 0, i64 17
then everything works fine. In this case, the structure does not contain a pointer and the GEP instruction can index through the global variable, into the first field of the structure and access the 18th i32 in the array there.
Why don’t GEP x,0,0,1 and GEP x,1 alias?¶
Quick Answer: They compute different address locations.
If you look at the first indices in these GEP instructions you find that they are different (0 and 1), so the address computation diverges with that index. Consider this example:
%MyVar = global { [10 x i32] }
%idx1 = getelementptr { [10 x i32] }, { [10 x i32] }* %MyVar, i64 0, i32 0, i64 1
%idx2 = getelementptr { [10 x i32] }, { [10 x i32] }* %MyVar, i64 1
In this example, idx1 computes the address of the second integer in the array that is in the structure in %MyVar, that is MyVar+4. The type of idx1 is i32*. However, idx2 computes the address of the next structure after %MyVar. The type of idx2 is { [10 x i32] }* and its value is equivalent to MyVar + 40 because it indexes past the ten 4-byte integers in MyVar. Obviously, in such a situation, the pointers don’t alias.
Why do GEP x,1,0,0 and GEP x,1 alias?¶
Quick Answer: They compute the same address location.
These two GEP instructions will compute the same address because indexing through the 0th element does not change the address. However, it does change the type. Consider this example:
%MyVar = global { [10 x i32] }
%idx1 = getelementptr { [10 x i32] }, { [10 x i32] }* %MyVar, i64 1, i32 0, i64 0
%idx2 = getelementptr { [10 x i32] }, { [10 x i32] }* %MyVar, i64 1
In this example, the value of %idx1 is %MyVar+40 and its type is i32*. The value of %idx2 is also MyVar+40 but its type is { [10 x i32] }*.
Can GEP index into vector elements?¶
This hasn’t always been forcefully disallowed, though it’s not recommended. It leads to awkward special cases in the optimizers, and fundamental inconsistency in the IR. In the future, it will probably be outright disallowed.
What effect do address spaces have on GEPs?¶
None, except that the address space qualifier on the first operand pointer type always matches the address space qualifier on the result type.
How is GEP different from ptrtoint, arithmetic, and inttoptr?¶
It’s very similar; there are only subtle differences.
With ptrtoint, you have to pick an integer type. One approach is to pick i64; this is safe on everything LLVM supports (LLVM internally assumes pointers are never wider than 64 bits in many places), and the optimizer will actually narrow the i64 arithmetic down to the actual pointer size on targets which don’t support 64-bit arithmetic in most cases. However, there are some cases where it doesn’t do this. With GEP you can avoid this problem.
Also, GEP carries additional pointer aliasing rules. It’s invalid to take a GEP from one object, address into a different separately allocated object, and dereference it. IR producers (front-ends) must follow this rule, and consumers (optimizers, specifically alias analysis) benefit from being able to rely on it. See the Rules section for more information.
And, GEP is more concise in common cases.
However, for the underlying integer computation implied, there is no difference.
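As a sketch of that equivalence using the C++ IRBuilder API (assuming Builder is an IRBuilder<>, Ptr is an i32* value, and Idx is an i64 index):
using namespace llvm;

// GEP form: a single instruction that keeps the pointer provenance,
// so the pointer aliasing rules above apply to the result.
Value *ViaGEP = Builder.CreateGEP(Builder.getInt32Ty(), Ptr, Idx);

// Integer form: the same two's complement arithmetic, but the element
// size (4 bytes for i32) is written out by hand and the result does
// not carry GEP's aliasing guarantees.
Value *AsInt  = Builder.CreatePtrToInt(Ptr, Builder.getInt64Ty());
Value *Offset = Builder.CreateMul(Idx, Builder.getInt64(4));
Value *ViaInt = Builder.CreateIntToPtr(Builder.CreateAdd(AsInt, Offset),
                                       Ptr->getType());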
I’m writing a backend for a target which needs custom lowering for GEP. How do I do this?¶
You don’t. The integer computation implied by a GEP is target-independent. Typically what you’ll need to do is make your backend pattern-match expression trees involving ADD, MUL, etc., which are what GEP is lowered into. This has the advantage of letting your code work correctly in more cases.
GEP does use target-dependent parameters for the size and layout of data types, which targets can customize.
If you require support for addressing units which are not 8 bits, you’ll need to fix a lot of code in the backend, with GEP lowering being only a small piece of the overall picture.
How does VLA addressing work with GEPs?¶
GEPs don’t natively support VLAs. LLVM’s type system is entirely static, and GEP address computations are guided by an LLVM type.
VLA indices can be implemented as linearized indices. For example, an expression like X[a][b][c] must be effectively lowered into a form like X[a*m+b*n+c], so that it appears to the GEP as a single-dimensional array reference.
This means if you want to write an analysis which understands array indices and you want to support VLAs, your code will have to be prepared to reverse-engineer the linearization. One way to solve this problem is to use the ScalarEvolution library, which always presents VLA and non-VLA indexing in the same manner.
Rules¶
What happens if an array index is out of bounds?¶
There are two senses in which an array index can be out of bounds.
First, there’s the array type which comes from the (static) type of the first operand to the GEP. Indices greater than the number of elements in the corresponding static array type are valid. There is no problem with out of bounds indices in this sense. Indexing into an array only depends on the size of the array element, not the number of elements.
A common example of how this is used is arrays where the size is not known. It’s common to use array types with zero length to represent these. The fact that the static type says there are zero elements is irrelevant; it’s perfectly valid to compute arbitrary element indices, as the computation only depends on the size of the array element, not the number of elements. Note that zero-sized arrays are not a special case here.
This sense is unconnected with the inbounds keyword. The inbounds keyword is designed to describe low-level pointer arithmetic overflow conditions, rather than high-level array indexing rules.
Analysis passes which wish to understand array indexing should not assume that the static array type bounds are respected.
The second sense of being out of bounds is computing an address that’s beyond the actual underlying allocated object.
With the inbounds keyword, the result value of the GEP is undefined if the address is outside the actual underlying allocated object and not the address one-past-the-end.
Without the inbounds keyword, there are no restrictions on computing out-of-bounds addresses. Obviously, performing a load or a store requires an address of allocated and sufficiently aligned memory. But the GEP itself is only concerned with computing addresses.
Can array indices be negative?¶
Yes. This is basically a special case of array indices being out of bounds.
Can I compare two values computed with GEPs?¶
Yes. If both addresses are within the same allocated object, or one-past-the-end, you’ll get the comparison result you expect. If either is outside of it, integer arithmetic wrapping may occur, so the comparison may not be meaningful.
Can I do GEP with a different pointer type than the type of the underlying object?¶
Yes. There are no restrictions on bitcasting a pointer value to an arbitrary pointer type. The types in a GEP serve only to define the parameters for the underlying integer computation. They need not correspond with the actual type of the underlying object.
Furthermore, loads and stores don’t have to use the same types as the type of the underlying object. Types in this context serve only to specify memory size and alignment. Beyond that, they are merely a hint to the optimizer indicating how the value will likely be used.
Can I cast an object’s address to integer and add it to null?¶
You can compute an address that way, but if you use GEP to do the add, you can’t use that pointer to actually access the object, unless the object is managed outside of LLVM.
The underlying integer computation is sufficiently defined; null has a defined value — zero — and you can add whatever value you want to it.
However, it’s invalid to access (load from or store to) an LLVM-aware object with such a pointer. This includes GlobalVariables, Allocas, and objects pointed to by noalias pointers.
If you really need this functionality, you can do the arithmetic with explicit integer instructions, and use inttoptr to convert the result to an address. Most of GEP’s special aliasing rules do not apply to pointers computed from ptrtoint, arithmetic, and inttoptr sequences.
Can I compute the distance between two objects, and add that value to one address to compute the other address?¶
As with arithmetic on null, you can use GEP to compute an address that way, but you can’t use that pointer to actually access the object if you do, unless the object is managed outside of LLVM.
Also as above, ptrtoint and inttoptr provide an alternative way to do this which do not have this restriction.
Can I do type-based alias analysis on LLVM IR?¶
You can’t do type-based alias analysis using LLVM’s built-in type system, because LLVM has no restrictions on mixing types in addressing, loads or stores.
LLVM’s type-based alias analysis pass uses metadata to describe a different type system (such as the C type system), and performs type-based aliasing on top of that. Further details are in the language reference.
What happens if a GEP computation overflows?¶
If the GEP lacks the inbounds keyword, the value is the result from evaluating the implied two’s complement integer computation. However, since there’s no guarantee of where an object will be allocated in the address space, such values have limited meaning.
If the GEP has the inbounds keyword, the result value is undefined (a “trap value”) if the GEP overflows (i.e. wraps around the end of the address space). As such, there are some ramifications of this for inbounds GEPs: scales implied by array/vector/pointer indices are always known to be “nsw” since they are signed values that are scaled by the element size. These values are also allowed to be negative (e.g. “gep i32 *%P, i32 -1”) but the pointer itself is logically treated as an unsigned value. This means that GEPs have an asymmetric relation between the pointer base (which is treated as unsigned) and the offset applied to it (which is treated as signed). The result of the additions within the offset calculation cannot have signed overflow, but when applied to the base pointer, there can be signed overflow.
How can I tell if my front-end is following the rules?¶
There is currently no checker for the getelementptr rules. Currently, the only way to do this is to manually check each place in your front-end where GetElementPtr operators are created.
It’s not possible to write a checker which could find all rule violations statically. It would be possible to write a checker which works by instrumenting the code with dynamic checks though. Alternatively, it would be possible to write a static checker which catches a subset of possible problems. However, no such checker exists today.
Rationale¶
Why is GEP designed this way?¶
The design of GEP has the following goals, in rough unofficial order of priority:
- Support C, C-like languages, and languages which can be conceptually lowered into C (this covers a lot).
- Support optimizations such as those that are common in C compilers. In particular, GEP is a cornerstone of LLVM’s pointer aliasing model.
- Provide a consistent method for computing addresses so that address computations don’t need to be a part of load and store instructions in the IR.
- Support non-C-like languages, to the extent that it doesn’t interfere with other goals.
- Minimize target-specific information in the IR.
Why do struct member indices always use i32?¶
The specific type i32 is probably just a historical artifact, however it’s wide enough for all practical purposes, so there’s been no need to change it. It doesn’t necessarily imply i32 address arithmetic; it’s just an identifier which identifies a field in a struct. Requiring that all struct indices be the same reduces the range of possibilities for cases where two GEPs are effectively the same but have distinct operand types.
What’s an uglygep?¶
Some LLVM optimizers operate on GEPs by internally lowering them into more primitive integer expressions, which allows them to be combined with other integer expressions and/or split into multiple separate integer expressions. If they’ve made non-trivial changes, translating back into LLVM IR can involve reverse-engineering the structure of the addressing in order to fit it into the static type of the original first operand. It isn’t always possible to fully reconstruct this structure; sometimes the underlying addressing doesn’t correspond with the static type at all. In such cases the optimizer instead will emit a GEP with the base pointer cast to a simple address-unit pointer, using the name “uglygep”. This isn’t pretty, but it’s just as valid, and it’s sufficient to preserve the pointer aliasing guarantees that GEP provides.
Summary¶
In summary, here’s some things to always remember about the GetElementPtr instruction:
- The GEP instruction never accesses memory, it only provides pointer computations.
- The first operand to the GEP instruction is always a pointer and it must be indexed.
- There are no superfluous indices for the GEP instruction.
- Trailing zero indices are superfluous for pointer aliasing, but not for the types of the pointers.
- Leading zero indices are not superfluous for pointer aliasing nor the types of the pointers.
Performance Tips for Frontend Authors¶
Abstract¶
The intended audience of this document is developers of language frontends targeting LLVM IR. This document is home to a collection of tips on how to generate IR that optimizes well. As with any optimizer, LLVM has its strengths and weaknesses. In some cases, surprisingly small changes in the source IR can have a large effect on the generated code.
Avoid loads and stores of large aggregate type¶
LLVM currently does not optimize loads and stores of large aggregate types (i.e. structs and arrays) well. As an alternative, consider loading individual fields from memory.
Aggregates that are smaller than the largest (performant) load or store instruction supported by the targeted hardware are well supported. These can be an effective way to represent collections of small packed fields.
Prefer zext over sext when legal¶
On some architectures (X86_64 is one), sign extension can involve an extra instruction whereas zero extension can be folded into a load. LLVM will try to replace a sext with a zext when it can be proven safe, but if you have information in your source language about the range of an integer value, it can be profitable to use a zext rather than a sext.
Alternatively, you can specify the range of the value using metadata and LLVM can do the sext to zext conversion for you.
Zext GEP indices to machine register width¶
Internally, LLVM often promotes the width of GEP indices to machine register width. When it does so, it will default to using sign extension (sext) operations for safety. If your source language provides information about the range of the index, you may wish to manually extend indices to machine register width using a zext instruction.
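For instance, a sketch with IRBuilder (assuming the frontend has proven the index non-negative; Builder, BasePtr, Idx, and EltTy are placeholders):
using namespace llvm;

// Widen the index explicitly with zext instead of letting LLVM
// insert a conservative sext during GEP index promotion.
Value *WideIdx = Builder.CreateZExt(Idx, Builder.getInt64Ty());
Value *EltPtr  = Builder.CreateGEP(EltTy, BasePtr, WideIdx);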
Other things to consider¶
- Make sure that a DataLayout is provided (this will likely become required in the near future, but is certainly important for optimization).
- Add nsw/nuw flags as appropriate. Reasoning about overflow is generally hard for an optimizer so providing these facts from the frontend can be very impactful.
- Use fast-math flags on floating point operations if legal. If you don’t need strict IEEE floating point semantics, there are a number of additional optimizations that can be performed. This can be highly impactful for floating point intensive computations.
- Use inbounds on geps. This can help to disambiguate some aliasing queries.
- Add noalias/align/dereferenceable/nonnull to function arguments and return values as appropriate
- Mark functions as readnone/readonly or noreturn/nounwind when known. The optimizer will try to infer these flags, but may not always be able to. Manual annotations are particularly important for external functions that the optimizer cannot analyze.
- Use ptrtoint/inttoptr sparingly (they interfere with pointer aliasing analysis), prefer GEPs
- Use the lifetime.start/lifetime.end and invariant.start/invariant.end intrinsics where possible. Common profitable uses are for stack like data structures (thus allowing dead store elimination) and for describing life times of allocas (thus allowing smaller stack sizes).
- Use pointer aliasing metadata, especially tbaa metadata, to communicate otherwise-non-deducible pointer aliasing facts
- Use the “most-private” possible linkage types for the functions being defined (private, internal or linkonce_odr preferably)
- Mark invariant locations using !invariant.load and TBAA’s constant flags
- Prefer globals over inttoptr of a constant address - this gives you dereferenceability information. In MCJIT, use getSymbolAddress to provide the actual address.
- Be wary of ordered and atomic memory operations. They are hard to optimize and may not be well optimized by the current optimizer. Depending on your source language, you may consider using fences instead.
- If calling a function which is known to throw an exception (unwind), use an invoke with a normal destination which contains an unreachable instruction. This form conveys to the optimizer that the call returns abnormally. For an invoke which neither returns normally nor requires unwind code in the current function, you can use a noreturn call instruction if desired. This is generally not required because the optimizer will convert an invoke with an unreachable unwind destination to a call instruction.
- If your language uses range checks, consider using the IRCE pass. It is not currently part of the standard pass order.
- For languages with numerous rarely executed guard conditions (e.g. null checks, type checks, range checks) consider adding an extra execution or two of LoopUnswitch and LICM to your pass order. The standard pass order, which is tuned for C and C++ applications, may not be sufficient to remove all dischargeable checks from loops.
- Use profile metadata to indicate statically known cold paths, even if dynamic profiling information is not available. This can make a large difference in code placement and thus the performance of tight loops.
- When generating code for loops, try to avoid terminating the header block of the loop earlier than necessary. If the terminator of the loop header block is a loop exiting conditional branch, the effectiveness of LICM will be limited for loads not in the header. (This is due to the fact that LLVM may not know such a load is safe to speculatively execute and thus can’t lift an otherwise loop invariant load unless it can prove the exiting condition is not taken.) It can be profitable, in some cases, to emit such instructions into the header even if they are not used along a rarely executed path that exits the loop. This guidance specifically does not apply if the condition which terminates the loop header is itself invariant, or can be easily discharged by inspecting the loop index variables.
- In hot loops, consider duplicating instructions from small basic blocks which end in highly predictable terminators into their successor blocks. If a hot successor block contains instructions which can be vectorized with the duplicated ones, this can provide a noticeable throughput improvement. Note that this is not always profitable and does involve a potentially large increase in code size.
- Avoid high in-degree basic blocks (e.g. basic blocks with dozens or hundreds of predecessors). Among other issues, the register allocator is known to perform badly when confronted with such structures. The only exception to this guidance is that a unified return block with high in-degree is fine.
- When checking a value against a constant, emit the check using a consistent comparison type. The GVN pass will optimize redundant equalities even if the type of comparison is inverted, but GVN only runs late in the pipeline. As a result, you may miss the opportunity to run other important optimizations. Improvements to EarlyCSE to remove this issue are tracked in Bug 23333.
- Avoid using arithmetic intrinsics unless you are required by your source language specification to emit a particular code sequence. The optimizer is quite good at reasoning about general control flow and arithmetic, it is not anywhere near as strong at reasoning about the various intrinsics. If profitable for code generation purposes, the optimizer will likely form the intrinsics itself late in the optimization pipeline. It is very rarely profitable to emit these directly in the language frontend. This item explicitly includes the use of the overflow intrinsics.
- Avoid using the assume intrinsic until you’ve established that a) there’s no other way to express the given fact and b) that fact is critical for optimization purposes. Assumes are a great prototyping mechanism, but they can have negative effects on both compile time and optimization effectiveness. The former is fixable with enough effort, but the latter is fairly fundamental to their designed purpose.
p.s. If you want to help improve this document, patches expanding any of the above items into standalone sections of their own with a more complete discussion would be very welcome.
Adding to this document¶
If you run across a case that you feel deserves to be covered here, please send a patch to llvm-commits for review.
If you have questions on these items, please direct them to llvm-dev. The more relevant context you are able to give to your question, the more likely it is to be answered.
MCJIT Design and Implementation¶
Introduction¶
This document describes the internal workings of the MCJIT execution engine and the RuntimeDyld component. It is intended as a high level overview of the implementation, showing the flow and interactions of objects throughout the code generation and dynamic loading process.
Engine Creation¶
In most cases, an EngineBuilder object is used to create an instance of the MCJIT execution engine. The EngineBuilder takes an llvm::Module object as an argument to its constructor. The client may then set various options that will later be passed along to the MCJIT engine, including the selection of MCJIT as the engine type to be created. Of particular interest is the EngineBuilder::setMCJITMemoryManager function. If the client does not explicitly create a memory manager at this time, a default memory manager (specifically SectionMemoryManager) will be created when the MCJIT engine is instantiated.
Once the options have been set, a client calls EngineBuilder::create to create an instance of the MCJIT engine. If the client does not use the form of this function that takes a TargetMachine as a parameter, a new TargetMachine will be created based on the target triple associated with the Module that was used to create the EngineBuilder.

EngineBuilder::create will call the static MCJIT::createJIT function, passing in its pointers to the module, memory manager and target machine objects, all of which will subsequently be owned by the MCJIT object.
The MCJIT class has a member variable, Dyld, which contains an instance of the RuntimeDyld wrapper class. This member will be used for communications between MCJIT and the actual RuntimeDyldImpl object that gets created when an object is loaded.

Upon creation, MCJIT holds a pointer to the Module object that it received from EngineBuilder but it does not immediately generate code for this module. Code generation is deferred until either the MCJIT::finalizeObject method is called explicitly or a function such as MCJIT::getPointerToFunction is called which requires the code to have been generated.
Code Generation¶
When code generation is triggered, as described above, MCJIT will first attempt to retrieve an object image from its ObjectCache member, if one has been set. If a cached object image cannot be retrieved, MCJIT will call its emitObject method. MCJIT::emitObject uses a local PassManager instance and creates a new ObjectBufferStream instance, both of which it passes to TargetMachine::addPassesToEmitMC before calling PassManager::run on the Module with which it was created.

The PassManager::run call causes the MC code generation mechanisms to emit a complete relocatable binary object image (in either ELF or MachO format, depending on the target) into the ObjectBufferStream object, which is flushed to complete the process. If an ObjectCache is being used, the image will be passed to the ObjectCache here.
At this point, the ObjectBufferStream contains the raw object image. Before the code can be executed, the code and data sections from this image must be loaded into suitable memory, relocations must be applied and memory permission and code cache invalidation (if required) must be completed.
Object Loading¶
Once an object image has been obtained, either through code generation or having been retrieved from an ObjectCache, it is passed to RuntimeDyld to be loaded. The RuntimeDyld wrapper class examines the object to determine its file format and creates an instance of either RuntimeDyldELF or RuntimeDyldMachO (both of which derive from the RuntimeDyldImpl base class) and calls the RuntimeDyldImpl::loadObject method to perform the actual loading.

RuntimeDyldImpl::loadObject begins by creating an ObjectImage instance from the ObjectBuffer it received. ObjectImage, which wraps the ObjectFile class, is a helper class which parses the binary object image and provides access to the information contained in the format-specific headers, including section, symbol and relocation information.
RuntimeDyldImpl::loadObject then iterates through the symbols in the image. Information about common symbols is collected for later use. For each function or data symbol, the associated section is loaded into memory and the symbol is stored in a symbol table map data structure. When the iteration is complete, a section is emitted for the common symbols.
Next, RuntimeDyldImpl::loadObject iterates through the sections in the object image and for each section iterates through the relocations for that section. For each relocation, it calls the format-specific processRelocationRef method, which will examine the relocation and store it in one of two data structures, a section-based relocation list map and an external symbol relocation map.

When RuntimeDyldImpl::loadObject returns, all of the code and data sections for the object will have been loaded into memory allocated by the memory manager and relocation information will have been prepared, but the relocations have not yet been applied and the generated code is still not ready to be executed.
[Currently (as of August 2013) the MCJIT engine will immediately apply relocations when loadObject completes. However, this shouldn’t be happening. Because the code may have been generated for a remote target, the client should be given a chance to re-map the section addresses before relocations are applied. It is possible to apply relocations multiple times, but in the case where addresses are to be re-mapped, this first application is wasted effort.]
Address Remapping¶
At any time after initial code has been generated and before finalizeObject is called, the client can remap the address of sections in the object. Typically this is done because the code was generated for an external process and is being mapped into that process’ address space. The client remaps the section address by calling MCJIT::mapSectionAddress. This should happen before the section memory is copied to its new location.
When MCJIT::mapSectionAddress is called, MCJIT passes the call on to RuntimeDyldImpl (via its Dyld member). RuntimeDyldImpl stores the new address in an internal data structure but does not update the code at this time, since other sections are likely to change.
When the client is finished remapping section addresses, it will call MCJIT::finalizeObject to complete the remapping process.
Final Preparations¶
When MCJIT::finalizeObject is called, MCJIT calls RuntimeDyld::resolveRelocations. This function will attempt to locate any external symbols and then apply all relocations for the object.
External symbols are resolved by calling the memory manager’s getPointerToNamedFunction method. The memory manager will return the address of the requested symbol in the target address space. (Note, this may not be a valid pointer in the host process.) RuntimeDyld will then iterate through the list of relocations it has stored which are associated with this symbol and invoke the resolveRelocation method which, through a format-specific implementation, will apply the relocation to the loaded section memory.
Next, RuntimeDyld::resolveRelocations iterates through the list of sections and, for each section, iterates through the list of saved relocations that reference that section, calling resolveRelocation for each entry. The relocation list here holds the relocations whose associated symbol is located in the section associated with the list; the target location at which each relocation is applied is likely located in a different section.

Once relocations have been applied as described above, MCJIT calls RuntimeDyld::getEHFrameSection, and if a non-zero result is returned passes the section data to the memory manager’s registerEHFrames method. This allows the memory manager to call any desired target-specific functions, such as registering the EH frame information with a debugger.
Finally, MCJIT calls the memory manager’s finalizeMemory method. In this method, the memory manager will invalidate the target code cache, if necessary, and apply final permissions to the memory pages it has allocated for code and data memory.
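As an illustration, a client that wants to hook these final steps can subclass SectionMemoryManager. This is only a sketch (the subclass is hypothetical), but registerEHFrames and finalizeMemory are the memory manager callbacks described above:

#include "llvm/ExecutionEngine/SectionMemoryManager.h"

class VerboseMemoryManager : public llvm::SectionMemoryManager {
public:
  void registerEHFrames(uint8_t *Addr, uint64_t LoadAddr,
                        size_t Size) override {
    // A debugger or unwinder could be notified here.
    llvm::SectionMemoryManager::registerEHFrames(Addr, LoadAddr, Size);
  }
  bool finalizeMemory(std::string *ErrMsg = nullptr) override {
    // The base implementation applies final page permissions and
    // invalidates the instruction cache where necessary.
    return llvm::SectionMemoryManager::finalizeMemory(ErrMsg);
  }
};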
- Getting Started with the LLVM System
- Discusses how to get up and running quickly with the LLVM infrastructure. Everything from unpacking and compilation of the distribution to execution of some tools.
- Building LLVM with CMake
- An addendum to the main Getting Started guide for those using the CMake build system.
- How To Build On ARM
- Notes on building and testing LLVM/Clang on ARM.
- How To Cross-Compile Clang/LLVM using Clang/LLVM
- Notes on cross-building and testing LLVM/Clang.
- Getting Started with the LLVM System using Microsoft Visual Studio
- An addendum to the main Getting Started guide for those using Visual Studio on Windows.
- Building LLVM With Autotools
- An addendum to the Getting Started guide with instructions for building LLVM with the Autotools build system.
- LLVM Tutorial: Table of Contents
- Tutorials about using LLVM. Includes a tutorial about making a custom language with LLVM.
- LLVM Command Guide
- A reference manual for the LLVM command line utilities (“man” pages for LLVM tools).
- LLVM’s Analysis and Transform Passes
- A list of optimizations and analyses implemented in LLVM.
- Frequently Asked Questions (FAQ)
- A list of common questions and problems and their solutions.
- Release notes for the current release
- This describes new features, known bugs, and other limitations.
- How to submit an LLVM bug report
- Instructions for properly submitting information about any bugs you run into in the LLVM system.
- Sphinx Quickstart Template
- A template + tutorial for writing new Sphinx documentation. It is meant to be read in source form.
- LLVM Testing Infrastructure Guide
- A reference manual for using the LLVM testing infrastructure.
- How to build the C, C++, ObjC, and ObjC++ front end
- Instructions for building the clang front-end from source.
- The LLVM Lexicon
- Definition of acronyms, terms and concepts used in LLVM.
- How To Add Your Build Configuration To LLVM Buildbot Infrastructure
- Instructions for adding a new builder to the LLVM buildbot master.
- YAML I/O
- A reference guide for using LLVM’s YAML I/O library.
- The Often Misunderstood GEP Instruction
- Answers to some very frequent questions about LLVM’s most frequently misunderstood instruction.
- Performance Tips for Frontend Authors
- A collection of tips for frontend authors on how to generate IR which LLVM is able to effectively optimize.
Programming Documentation¶
For developers of applications which use LLVM as a library.
LLVM Atomic Instructions and Concurrency Guide¶
Introduction¶
Historically, LLVM has not had very strong support for concurrency; some minimal intrinsics were provided, and volatile was used in some cases to achieve rough semantics in the presence of concurrency. However, this is changing; there are now new instructions which are well-defined in the presence of threads and asynchronous signals, and the model for existing instructions has been clarified in the IR.
The atomic instructions are designed specifically to provide readable IR and optimized code generation for the following:
- The new C++11 <atomic> header. (C++11 draft available here.) (C11 draft available here.)
- Proper semantics for Java-style memory, for both volatile and regular shared variables. (Java Specification)
- gcc-compatible __sync_* builtins. (Description)
- Other scenarios with atomic semantics, including static variables with non-trivial constructors in C++.
Atomic and volatile in the IR are orthogonal; “volatile” is the C/C++ volatile, which ensures that every volatile load and store happens and is performed in the stated order. A couple of examples: if a SequentiallyConsistent store is immediately followed by another SequentiallyConsistent store to the same address, the first store can be erased. This transformation is not allowed for a pair of volatile stores. On the other hand, a non-volatile non-atomic load can be moved across a volatile load freely, but not an Acquire load.
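The same distinction can be seen from C++ (an illustrative sketch; the rule above is stated at the IR level):

#include <atomic>

std::atomic<int> A;   // lowered to atomic loads/stores
volatile int V;       // lowered to volatile loads/stores

void stores() {
  A.store(1);  // SequentiallyConsistent store; may be erased, since
  A.store(2);  // only the value 2 can ever be observed.
  V = 1;       // Both volatile stores must be performed,
  V = 2;       // in this order.
}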
This document provides a guide for anyone writing a frontend for LLVM or working on optimization passes for LLVM, explaining how to deal with instructions with special semantics in the presence of concurrency. This is not intended to be a precise guide to the semantics; the details can get extremely complicated and unreadable, and are not usually necessary.
Optimization outside atomic¶
The basic 'load' and 'store' allow a variety of optimizations, but can lead to undefined results in a concurrent environment; see NotAtomic. This section specifically goes into the one optimizer restriction which applies in concurrent environments, which gets a bit more of an extended description because any optimization dealing with stores needs to be aware of it.
From the optimizer’s point of view, the rule is that if there are not any instructions with atomic ordering involved, concurrency does not matter, with one exception: if a variable might be visible to another thread or signal handler, a store cannot be inserted along a path where it might not execute otherwise. Take the following example:
/* C code, for readability; run through clang -O2 -S -emit-llvm to get
   equivalent IR */
int x;
void f(int* a) {
  for (int i = 0; i < 100; i++) {
    if (a[i])
      x += 1;
  }
}
The following is equivalent in non-concurrent situations:
int x;
void f(int* a) {
  int xtemp = x;
  for (int i = 0; i < 100; i++) {
    if (a[i])
      xtemp += 1;
  }
  x = xtemp;
}
However, LLVM is not allowed to transform the former to the latter: it could indirectly introduce undefined behavior if another thread can access x at the same time. (This example is particularly of interest because before the concurrency model was implemented, LLVM would perform this transformation.)
Note that speculative loads are allowed; a load which is part of a race returns undef, but does not have undefined behavior.
Atomic instructions¶
For cases where simple loads and stores are not sufficient, LLVM provides various atomic instructions. The exact guarantees provided depend on the ordering; see Atomic orderings.
load atomic and store atomic provide the same basic functionality as non-atomic loads and stores, but provide additional guarantees in situations where threads and signals are involved.

cmpxchg and atomicrmw are essentially like an atomic load followed by an atomic store (where the store is conditional for cmpxchg), but no other memory operation can happen on any thread between the load and store.

A fence provides Acquire and/or Release ordering which is not part of another operation; it is normally used along with Monotonic memory operations. A Monotonic load followed by an Acquire fence is roughly equivalent to an Acquire load, and a Monotonic store following a Release fence is roughly equivalent to a Release store. SequentiallyConsistent fences behave as both an Acquire and a Release fence, and offer some additional complicated guarantees; see the C++11 standard for details.
Frontends generating atomic instructions generally need to be aware of the target to some degree; atomic instructions are guaranteed to be lock-free, and therefore an instruction which is wider than the target natively supports can be impossible to generate.
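Each of these instructions has a direct C++11 analogue; a frontend compiling the following sketch would emit the corresponding IR instructions (the orderings are chosen only for illustration):

#include <atomic>

std::atomic<int> X;

void examples() {
  int V = X.load(std::memory_order_acquire);           // 'load atomic'
  X.store(V + 1, std::memory_order_release);           // 'store atomic'
  X.fetch_add(1);                                      // 'atomicrmw add'
  int Expected = 0;
  X.compare_exchange_strong(Expected, 42);             // 'cmpxchg'
  std::atomic_thread_fence(std::memory_order_seq_cst); // 'fence'
}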
Atomic orderings¶
In order to achieve a balance between performance and necessary guarantees, there are six levels of atomicity. They are listed in order of strength; each level includes all the guarantees of the previous level except for Acquire/Release. (See also LangRef Ordering.)
NotAtomic¶
NotAtomic is the obvious one: a load or store which is not atomic. (This isn’t really a level of atomicity, but is listed here for comparison.) This is essentially a regular load or store. If there is a race on a given memory location, loads from that location return undef.
- Relevant standard
- This is intended to match shared variables in C/C++, and to be used in any other context where memory access is necessary, and a race is impossible. (The precise definition is in LangRef Memory Model.)
- Notes for frontends
- The rule is essentially that all memory accessed with basic loads and stores by multiple threads should be protected by a lock or other synchronization; otherwise, you are likely to run into undefined behavior. If your frontend is for a “safe” language like Java, use Unordered to load and store any shared variable. Note that NotAtomic volatile loads and stores are not properly atomic; do not try to use them as a substitute. (Per the C/C++ standards, volatile does provide some limited guarantees around asynchronous signals, but atomics are generally a better solution.)
- Notes for optimizers
- Introducing loads to shared variables along a codepath where they would not otherwise exist is allowed; introducing stores to shared variables is not. See Optimization outside atomic.
- Notes for code generation
- The one interesting restriction here is that it is not allowed to write to bytes outside of the bytes relevant to a store. This is mostly relevant to unaligned stores: it is not allowed in general to convert an unaligned store into two aligned stores of the same width as the unaligned store. Backends are also expected to generate an i8 store as an i8 store, and not an instruction which writes to surrounding bytes. (If you are writing a backend for an architecture which cannot satisfy these restrictions and cares about concurrency, please send an email to llvm-dev.)
Unordered¶
Unordered is the lowest level of atomicity. It essentially guarantees that races produce somewhat sane results instead of having undefined behavior. It also guarantees the operation to be lock-free, so it does not depend on the data being part of a special atomic structure or depend on a separate per-process global lock. Note that code generation will fail for unsupported atomic operations; if you need such an operation, use explicit locking.
- Relevant standard
- This is intended to match the Java memory model for shared variables.
- Notes for frontends
- This cannot be used for synchronization, but is useful for Java and other “safe” languages which need to guarantee that the generated code never exhibits undefined behavior. Note that this guarantee is cheap on common platforms for loads of a native width, but can be expensive or unavailable for wider loads, like a 64-bit store on ARM. (A frontend for Java or other “safe” languages would normally split a 64-bit store on ARM into two 32-bit unordered stores.)
- Notes for optimizers
- In terms of the optimizer, this prohibits any transformation that transforms a single load into multiple loads, transforms a store into multiple stores, narrows a store, or stores a value which would not be stored otherwise. Some examples of unsafe optimizations are narrowing an assignment into a bitfield, rematerializing a load, and turning loads and stores into a memcpy call. Reordering unordered operations is safe, though, and optimizers should take advantage of that because unordered operations are common in languages that need them.
- Notes for code generation
- These operations are required to be atomic in the sense that if you use unordered loads and unordered stores, a load cannot see a value which was never stored. A normal load or store instruction is usually sufficient, but note that an unordered load or store cannot be split into multiple instructions (or an instruction which does multiple memory operations, like LDRD on ARM without LPAE, or not naturally-aligned LDRD on LPAE ARM).
Monotonic¶
Monotonic is the weakest level of atomicity that can be used in synchronization primitives, although it does not provide any general synchronization. It essentially guarantees that if you take all the operations affecting a specific address, a consistent ordering exists.
- Relevant standard
- This corresponds to the C++11/C11 memory_order_relaxed; see those standards for the exact definition.
- Notes for frontends
- If you are writing a frontend which uses this directly, use with caution. The guarantees in terms of synchronization are very weak, so make sure these are only used in a pattern which you know is correct. Generally, these would either be used for atomic operations which do not protect other memory (like an atomic counter; see the sketch after this list), or along with a fence.
- Notes for optimizers
- In terms of the optimizer, this can be treated as a read+write on the relevant memory location (and alias analysis will take advantage of that). In addition, it is legal to reorder non-atomic and Unordered loads around Monotonic loads. CSE/DSE and a few other optimizations are allowed, but Monotonic operations are unlikely to be used in ways which would make those optimizations useful.
- Notes for code generation
- Code generation is essentially the same as for unordered loads and stores. No fences are required. cmpxchg and atomicrmw are required to appear as a single operation.
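As a concrete example of the atomic-counter pattern mentioned in the frontend notes, here is a minimal sketch using the C++11 equivalent, memory_order_relaxed (the names are hypothetical):

#include <atomic>

std::atomic<unsigned> NumEvents{0};

// The counter itself stays consistent across threads, but it orders and
// protects no other memory.
void recordEvent() { NumEvents.fetch_add(1, std::memory_order_relaxed); }
unsigned eventCount() { return NumEvents.load(std::memory_order_relaxed); }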
Acquire¶
Acquire provides a barrier of the sort necessary to acquire a lock to access other memory with normal loads and stores.
- Relevant standard
- This corresponds to the C++11/C11 memory_order_acquire. It should also be used for C++11/C11 memory_order_consume.
- Notes for frontends
- If you are writing a frontend which uses this directly, use with caution. Acquire only provides a semantic guarantee when paired with a Release operation.
- Notes for optimizers
- Optimizers not aware of atomics can treat this like a nothrow call. It is also possible to move stores from before an Acquire load or read-modify-write operation to after it, and move non-Acquire loads from before an Acquire operation to after it.
- Notes for code generation
- Architectures with weak memory ordering (essentially everything relevant today except x86 and SPARC) require some sort of fence to maintain the Acquire semantics. The precise fences required vary widely by architecture, but for a simple implementation, most architectures provide a barrier which is strong enough for everything (dmb on ARM, sync on PowerPC, etc.). Putting such a fence after the equivalent Monotonic operation is sufficient to maintain Acquire semantics for a memory operation.
Release¶
Release is similar to Acquire, but with a barrier of the sort necessary to release a lock.
- Relevant standard
- This corresponds to the C++11/C11 memory_order_release.
- Notes for frontends
- If you are writing a frontend which uses this directly, use with caution. Release only provides a semantic guarantee when paired with an Acquire operation (see the sketch after this list).
- Notes for optimizers
- Optimizers not aware of atomics can treat this like a nothrow call. It is also possible to move loads from after a Release store or read-modify-write operation to before it, and move non-Release stores from after a Release operation to before it.
- Notes for code generation
- See the section on Acquire; a fence before the relevant operation is usually sufficient for Release. Note that a store-store fence is not sufficient to implement Release semantics; store-store fences are generally not exposed to IR because they are extremely difficult to use correctly.
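The Acquire/Release pairing looks like this in C++11 terms (a minimal sketch; Payload and Ready are hypothetical):

#include <atomic>

int Payload;                     // ordinary shared data
std::atomic<bool> Ready{false};

void producer() {
  Payload = 42;                                  // plain store
  Ready.store(true, std::memory_order_release);  // Release store
}

void consumer() {
  while (!Ready.load(std::memory_order_acquire)) // Acquire load
    ;                                            // spin
  // The paired Acquire load and Release store guarantee Payload == 42 here.
}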
AcquireRelease¶
AcquireRelease (acq_rel in IR) provides both an Acquire and a Release barrier (for fences and operations which both read and write memory).

- Relevant standard
- This corresponds to the C++11/C11 memory_order_acq_rel.
- Notes for frontends
- If you are writing a frontend which uses this directly, use with caution. Acquire only provides a semantic guarantee when paired with a Release operation, and vice versa.
- Notes for optimizers
- In general, optimizers should treat this like a nothrow call; the possible optimizations are usually not interesting.
- Notes for code generation
- This operation has Acquire and Release semantics; see the sections on Acquire and Release.
SequentiallyConsistent¶
SequentiallyConsistent (seq_cst in IR) provides Acquire semantics for loads and Release semantics for stores. Additionally, it guarantees that a total ordering exists between all SequentiallyConsistent operations (illustrated in the sketch after this list).

- Relevant standard
- This corresponds to the C++11/C11 memory_order_seq_cst, Java volatile, and the gcc-compatible __sync_* builtins which do not specify otherwise.
- Notes for frontends
- If a frontend is exposing atomic operations, these are much easier to reason about for the programmer than other kinds of operations, and using them is generally a practical performance tradeoff.
- Notes for optimizers
- Optimizers not aware of atomics can treat this like a nothrow call. For SequentiallyConsistent loads and stores, the same reorderings are allowed as for Acquire loads and Release stores, except that SequentiallyConsistent operations may not be reordered.
- Notes for code generation
- SequentiallyConsistent loads minimally require the same barriers as Acquire operations and SequentiallyConsistent stores require Release barriers. Additionally, the code generator must enforce ordering between SequentiallyConsistent stores followed by SequentiallyConsistent loads. This is usually done by emitting either a full fence before the loads or a full fence after the stores; which is preferred varies by architecture.
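The extra total-ordering guarantee is what rules out the classic store-buffering outcome. In C++11 terms (std::atomic operations default to memory_order_seq_cst), assuming thread1 and thread2 run concurrently:

#include <atomic>

std::atomic<int> X{0}, Y{0};
int R1, R2;

void thread1() { X.store(1); R1 = Y.load(); }
void thread2() { Y.store(1); R2 = X.load(); }

// With SequentiallyConsistent operations, R1 == 0 && R2 == 0 is impossible;
// with only Acquire/Release ordering it would be allowed.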
Atomics and IR optimization¶
Predicates for optimizer writers to query:
- isSimple(): A load or store which is not volatile or atomic. This is what, for example, memcpyopt would check for operations it might transform.
- isUnordered(): A load or store which is not volatile and at most Unordered. This would be checked, for example, by LICM before hoisting an operation.
- mayReadFromMemory()/mayWriteToMemory(): Existing predicates, but note that they return true for any operation which is volatile or at least Monotonic.
- isAtLeastAcquire()/isAtLeastRelease(): These are predicates on orderings. They can be useful for passes that are aware of atomics, for example to do DSE across a single atomic access, but not across a release-acquire pair (see MemoryDependencyAnalysis for an example of this).
- Alias analysis: Note that AA will return ModRef for anything Acquire or Release, and for the address accessed by any Monotonic operation.
To support optimizing around atomic operations, make sure you are using the right predicates; everything should work if that is done. If your pass should optimize some atomic operations (Unordered operations in particular), make sure it doesn’t replace an atomic load or store with a non-atomic operation.
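For instance, a hoisting transformation might guard itself with a check along these lines (a hypothetical helper, using the real LoadInst::isUnordered() predicate):

#include "llvm/IR/Instructions.h"

using namespace llvm;

// Only loads that are not volatile and at most Unordered are candidates.
static bool isHoistableLoad(Instruction *I) {
  if (auto *LI = dyn_cast<LoadInst>(I))
    return LI->isUnordered();
  return false;
}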
Some examples of how optimizations interact with various kinds of atomic operations:
- memcpyopt: An atomic operation cannot be optimized into part of a memcpy/memset, including unordered loads/stores. It can pull operations across some atomic operations.
- LICM: Unordered loads/stores can be moved out of a loop. It just treats monotonic operations like a read+write to a memory location, and anything stricter than that like a nothrow call.
- DSE: Unordered stores can be DSE’ed like normal stores. Monotonic stores can be DSE’ed in some cases, but it’s tricky to reason about, and not especially important. It is possible in some cases for DSE to operate across a stronger atomic operation, but it is fairly tricky. DSE delegates this reasoning to MemoryDependencyAnalysis (which is also used by other passes like GVN).
- Folding a load: Any atomic load from a constant global can be constant-folded, because it cannot be observed. Similar reasoning allows scalarrepl with atomic loads and stores.
Atomics and Codegen¶
Atomic operations are represented in the SelectionDAG with ATOMIC_* opcodes. On architectures which use barrier instructions for all atomic ordering (like ARM), appropriate fences can be emitted by the AtomicExpand Codegen pass if setInsertFencesForAtomic() was used.
The MachineMemOperand for all atomic operations is currently marked as volatile; this is not correct in the IR sense of volatile, but CodeGen handles anything marked volatile very conservatively. This should get fixed at some point.
Common architectures have some way of representing at least a pointer-sized lock-free cmpxchg; such an operation can be used to implement all the other atomic operations which can be represented in IR up to that size. Backends are expected to implement all those operations, but not operations which cannot be implemented in a lock-free manner. It is expected that backends will give an error when given an operation which cannot be implemented. (The LLVM code generator is not very helpful here at the moment, but hopefully that will change.)
On x86, all atomic loads generate a MOV. SequentiallyConsistent stores generate an XCHG, other stores generate a MOV. SequentiallyConsistent fences generate an MFENCE, other fences do not cause any code to be generated. cmpxchg uses the LOCK CMPXCHG instruction. atomicrmw xchg uses XCHG, atomicrmw add and atomicrmw sub use XADD, and all other atomicrmw operations generate a loop with LOCK CMPXCHG. Depending on the users of the result, some atomicrmw operations can be translated into operations like LOCK AND, but that does not work in general.
On ARM (before v8), MIPS, and many other RISC architectures, Acquire, Release, and SequentiallyConsistent semantics require barrier instructions for every such operation. Loads and stores generate normal instructions. cmpxchg and atomicrmw can be represented using a loop with LL/SC-style instructions which take some sort of exclusive lock on a cache line (LDREX and STREX on ARM, etc.).
It is often easiest for backends to use AtomicExpandPass to lower some of the atomic constructs. Here are some lowerings it can do:
- cmpxchg -> loop with load-linked/store-conditional, by overriding hasLoadLinkedStoreConditional(), emitLoadLinked(), and emitStoreConditional()
- large loads/stores -> ll-sc/cmpxchg, by overriding shouldExpandAtomicStoreInIR() and shouldExpandAtomicLoadInIR()
- strong atomic accesses -> monotonic accesses + fences, by using setInsertFencesForAtomic() and overriding emitLeadingFence() and emitTrailingFence()
- atomic rmw -> loop with cmpxchg or load-linked/store-conditional, by overriding expandAtomicRMWInIR()
For an example of all of these, look at the ARM backend.
LLVM Coding Standards¶
- Introduction
- Languages, Libraries, and Standards
- Mechanical Source Issues
- Style Issues
- The High-Level Issues
- The Low-Level Issues
- Name Types, Functions, Variables, and Enumerators Properly
- Assert Liberally
- Do Not Use using namespace std
- Provide a Virtual Method Anchor for Classes in Headers
- Don’t use default labels in fully covered switches over enumerations
- Don’t evaluate end() every time through a loop
- #include <iostream> is Forbidden
- Use raw_ostream
- Avoid std::endl
- Don’t use inline when defining a function in a class definition
- Microscopic Details
- See Also
Introduction¶
This document attempts to describe a few coding standards that are being used in the LLVM source tree. Although no coding standards should be regarded as absolute requirements to be followed in all instances, coding standards are particularly important for large-scale code bases that follow a library-based design (like LLVM).
While this document may provide guidance for some mechanical formatting issues, whitespace, or other “microscopic details”, these are not fixed standards. Always follow the golden rule:
If you are extending, enhancing, or bug fixing already implemented code, use the style that is already being used so that the source is uniform and easy to follow.
Note that some code bases (e.g. libc++) have really good reasons to deviate from the coding standards. In the case of libc++, this is because the naming and other conventions are dictated by the C++ standard. If you think there is a specific good reason to deviate from the standards here, please bring it up on the LLVM-dev mailing list.
There are some conventions that are not uniformly followed in the code base (e.g. the naming convention). This is because they are relatively new, and a lot of code was written before they were put in place. Our long term goal is for the entire codebase to follow the convention, but we explicitly do not want patches that do large-scale reformatting of existing code. On the other hand, it is reasonable to rename the methods of a class if you’re about to change it in some other way. Just do the reformatting as a separate commit from the functionality change.
The ultimate goal of these guidelines is to increase the readability and maintainability of our common source base. If you have suggestions for topics to be included, please mail them to Chris.
Languages, Libraries, and Standards¶
Most source code in LLVM and other LLVM projects using these coding standards is C++ code. There are some places where C code is used either due to environment restrictions, historical restrictions, or due to third-party source code imported into the tree. Generally, our preference is for standards conforming, modern, and portable C++ code as the implementation language of choice.
C++ Standard Versions¶
LLVM, Clang, and LLD are currently written using C++11 conforming code, although we restrict ourselves to features which are available in the major toolchains supported as host compilers. The LLDB project is even more aggressive in the set of host compilers supported and thus uses still more features. Regardless of the supported features, code is expected to (when reasonable) be standard, portable, and modern C++11 code. We avoid unnecessary vendor-specific extensions, etc.
C++ Standard Library¶
Use the C++ standard library facilities whenever they are available for a particular task. LLVM and related projects emphasize and rely on the standard library facilities as much as possible. Common support libraries providing functionality missing from the standard library for which there are standard interfaces or active work on adding standard interfaces will often be implemented in the LLVM namespace following the expected standard interface.
There are some exceptions such as the standard I/O streams library which are avoided. Also, there is much more detailed information on these subjects in the LLVM Programmer’s Manual.
Supported C++11 Language and Library Features¶
While LLVM, Clang, and LLD use C++11, not all features are available in all of the toolchains which we support. The set of features supported for use in LLVM is the intersection of those supported in MSVC 2013, GCC 4.7, and Clang 3.1. The ultimate definition of this set is what build bots with those respective toolchains accept. Don’t argue with the build bots. However, we have some guidance below to help you know what to expect.
Each toolchain provides a good reference for what it accepts:
- Clang: http://clang.llvm.org/cxx_status.html
- GCC: http://gcc.gnu.org/projects/cxx0x.html
- MSVC: http://msdn.microsoft.com/en-us/library/hh567368.aspx
In most cases, the MSVC list will be the dominating factor. Here is a summary of the features that are expected to work. Features not on this list are unlikely to be supported by our host compilers.
- Rvalue references: N2118
  - But not Rvalue references for *this or member qualifiers (N2439)
- Static assert: N1720
- auto type deduction: N1984, N1737
- Trailing return types: N2541
- Lambdas: N2927
  - But not lambdas with default arguments.
- decltype: N2343
- Nested closing right angle brackets: N1757
- Extern templates: N1987
- nullptr: N2431
- Strongly-typed and forward declarable enums: N2347, N2764
- Local and unnamed types as template arguments: N2657
- Range-based for-loop: N2930
  - But {} are required around inner do {} while() loops. As a result, {} are required around function-like macros inside range-based for loops.
- override and final: N2928, N3206, N3272
- Atomic operations and the C++11 memory model: N2429
- Variadic templates: N2242
- Explicit conversion operators: N2437
- Defaulted and deleted functions: N2346
  - But not defaulted move constructors or move assignment operators; MSVC 2013 cannot synthesize them.
- Initializer lists: N2627
- Delegating constructors: N1986
- Default member initializers (non-static data member initializers): N2756
  - Only use these for scalar members that would otherwise be left uninitialized. Non-scalar members generally have appropriate default constructors, and MSVC 2013 has problems when braced initializer lists are involved.
The supported features in the C++11 standard libraries are less well tracked, but also much greater. Most of the standard libraries implement most of C++11’s library. The most likely lowest common denominator is Linux support. For libc++, the support is just poorly tested and undocumented but expected to be largely complete. YMMV. For libstdc++, the support is documented in detail in the libstdc++ manual. There are some very minor missing facilities that are unlikely to be common problems, and there are a few larger gaps that are worth being aware of:
- Not all of the type traits are implemented.
- No regular expression library.
- While most of the atomics library is well implemented, the fences are missing. Fortunately, they are rarely needed.
- The locale support is incomplete.
- std::equal() (and other algorithms) incorrectly assert in MSVC when given nullptr as an iterator.
Other than these areas, you should assume the standard library is available and working as expected until some build bot tells you otherwise. If you’re in an uncertain area of one of the above points, but you cannot test on a Linux system, your best approach is to minimize your use of these features, and watch the Linux build bots to find out if your usage triggered a bug. For example, if you hit a type trait which doesn’t work, we can then add support to LLVM’s traits header to emulate it.
Other Languages¶
Any code written in the Go programming language is not subject to the formatting rules below. Instead, we adopt the formatting rules enforced by the gofmt tool.
Go code should strive to be idiomatic. Two good sets of guidelines for what this means are Effective Go and Go Code Review Comments.
Mechanical Source Issues¶
Source Code Formatting¶
Commenting¶
Comments are one critical part of readability and maintainability. Everyone knows they should comment their code, and so should you. When writing comments, write them as English prose, which means they should use proper capitalization, punctuation, etc. Aim to describe what the code is trying to do and why, not how it does it at a micro level. Here are a few critical things to document:
File Headers¶
Every source file should have a header on it that describes the basic purpose of the file. If a file does not have a header, it should not be checked into the tree. The standard header looks like this:
//===-- llvm/Instruction.h - Instruction class definition -------*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
///
/// \file
/// This file contains the declaration of the Instruction class, which is the
/// base class for all of the VM instructions.
///
//===----------------------------------------------------------------------===//
A few things to note about this particular format: The “-*- C++ -*-” string on the first line is there to tell Emacs that the source file is a C++ file, not a C file (Emacs assumes .h files are C files by default).

Note
This tag is not necessary in .cpp files.

The name of the file is also on the first line, along with a very short description of the purpose of the file. This is important when printing out code and flipping through lots of pages.
The next section in the file is a concise note that defines the license that the file is released under. This makes it perfectly clear what terms the source code can be distributed under and should not be modified in any way.
The main body is a doxygen comment (identified by the /// comment marker instead of the usual //) describing the purpose of the file. The first sentence or a passage beginning with \brief is used as an abstract. Any additional information should be separated by a blank line. If an algorithm is being implemented or something tricky is going on, a reference to the paper where it is published should be included, as well as any notes or gotchas in the code to watch out for.
Class overviews¶
Classes are one fundamental part of a good object oriented design. As such, a class definition should have a comment block that explains what the class is used for and how it works. Every non-trivial class is expected to have a doxygen comment block.
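For example, a class overview comment might look like this (the class is hypothetical):

/// Tracks the basic blocks of a function in emission order. Blocks can be
/// appended and reordered before emission, but never removed afterwards.
/// Clients iterate over the sequence to drive code emission.
class BlockSequence {
  ...
};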
Method information¶
Methods defined in a class (as well as any global functions) should also be documented properly. A quick note about what it does and a description of the borderline behaviour is all that is necessary here (unless something particularly tricky or insidious is going on). The hope is that people can figure out how to use your interfaces without reading the code itself.
Good things to talk about here are what happens when something unexpected happens: does the method return null? Abort? Format your hard disk?
Comment Formatting¶
In general, prefer C++ style comments (// for normal comments, /// for doxygen documentation comments). They take less space, require less typing, don’t have nesting problems, etc. There are a few cases when it is useful to use C style (/* */) comments however:
- When writing C code: Obviously if you are writing C code, use C style comments.
- When writing a header file that may be #included by a C source file.
- When writing a source file that is used by a tool that only accepts C style comments.
To comment out a large block of code, use #if 0 and #endif. These nest properly and are better behaved in general than C style comments.
Doxygen Use in Documentation Comments¶
Use the \file command to turn the standard file header into a file-level comment.

Include descriptive paragraphs for all public interfaces (public classes, member and non-member functions). Don’t just restate the information that can be inferred from the API name. The first sentence or a paragraph beginning with \brief is used as an abstract. Put detailed discussion into separate paragraphs.

To refer to parameter names inside a paragraph, use the \p name command. Don’t use the \arg name command since it starts a new paragraph that contains documentation for the parameter.

Wrap non-inline code examples in \code ... \endcode.

To document a function parameter, start a new paragraph with the \param name command. If the parameter is used as an out or an in/out parameter, use the \param [out] name or \param [in,out] name command, respectively.

To describe a function’s return value, start a new paragraph with the \returns command.
A minimal documentation comment:
/// Sets the xyzzy property to \p Baz.
void setXyzzy(bool Baz);
A documentation comment that uses all Doxygen features in a preferred way:
/// \brief Does foo and bar.
///
/// Does not do foo the usual way if \p Baz is true.
///
/// Typical usage:
/// \code
/// fooBar(false, "quux", Res);
/// \endcode
///
/// \param Quux kind of foo to do.
/// \param [out] Result filled with bar sequence on foo success.
///
/// \returns true on success.
bool fooBar(bool Baz, StringRef Quux, std::vector<int> &Result);
Don’t duplicate the documentation comment in the header file and in the implementation file. Put the documentation comments for public APIs into the header file. Documentation comments for private APIs can go to the implementation file. In any case, implementation files can include additional comments (not necessarily in Doxygen markup) to explain implementation details as needed.
Don’t duplicate function or class name at the beginning of the comment. For humans it is obvious which function or class is being documented; automatic documentation processing tools are smart enough to bind the comment to the correct declaration.
Wrong:
// In Something.h:
/// Something - An abstraction for some complicated thing.
class Something {
public:
/// fooBar - Does foo and bar.
void fooBar();
};
// In Something.cpp:
/// fooBar - Does foo and bar.
void Something::fooBar() { ... }
Correct:
// In Something.h:
/// An abstraction for some complicated thing.
class Something {
public:
/// Does foo and bar.
void fooBar();
};
// In Something.cpp:
// Builds a B-tree in order to do foo. See paper by...
void Something::fooBar() { ... }
It is not required to use additional Doxygen features, but sometimes it might be a good idea to do so.
Consider:
- adding comments to any narrow namespace containing a collection of related functions or types;
- using top-level groups to organize a collection of related functions at namespace scope where the grouping is smaller than the namespace;
- using member groups and additional comments attached to member groups to organize within a class.
For example:
class Something {
  /// \name Functions that do Foo.
  /// @{
  void fooBar();
  void fooBaz();
  /// @}
  ...
};
#include Style¶
Immediately after the header file comment (and include guards if working on a header file), the minimal list of #includes required by the file should be listed. We prefer these #includes to be listed in this order:
- Main Module Header
- Local/Private Headers
- llvm/...
- System #includes

and each category should be sorted lexicographically by the full path.
The Main Module Header file applies to .cpp files which implement an interface defined by a .h file. This #include should always be included first regardless of where it lives on the file system. By including a header file first in the .cpp files that implement the interfaces, we ensure that the header does not have any hidden dependencies which are not explicitly #included in the header, but should be. It is also a form of documentation in the .cpp file to indicate where the interfaces it implements are defined.
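Putting the pieces together, the top of a hypothetical Instruction.cpp would order its #includes like this (the private header is made up for illustration):

#include "llvm/IR/Instruction.h"   // Main Module Header
#include "InstructionUtils.h"      // Local/Private Headers
#include "llvm/ADT/StringRef.h"    // llvm/...
#include <map>                     // System #includes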
Source Code Width¶
Write your code to fit within 80 columns of text. This helps those of us who like to print out code and look at your code in an xterm without resizing it.
The longer answer is that there must be some limit to the width of the code in order to reasonably allow developers to have multiple files side-by-side in windows on a modest display. If you are going to pick a width limit, it is somewhat arbitrary but you might as well pick something standard. Going with 90 columns (for example) instead of 80 columns wouldn’t add any significant value and would be detrimental to printing out code. Also many other projects have standardized on 80 columns, so some people have already configured their editors for it (vs something else, like 90 columns).
This is one of many contentious issues in coding standards, but it is not up for debate.
Use Spaces Instead of Tabs¶
In all cases, prefer spaces to tabs in source files. People have different preferred indentation levels, and different styles of indentation that they like; this is fine. What isn’t fine is that different editors/viewers expand tabs out to different tab stops. This can cause your code to look completely unreadable, and it is not worth dealing with.
As always, follow the Golden Rule above: follow the style of existing code if you are modifying and extending it. If you like four spaces of indentation, DO NOT do that in the middle of a chunk of code with two spaces of indentation. Also, do not reindent a whole source file: it makes for incredible diffs that are absolutely worthless.
Indent Code Consistently¶
Okay, in your first year of programming you were told that indentation is important. If you didn’t believe and internalize this then, now is the time. Just do it. With the introduction of C++11, there are some new formatting challenges that merit some suggestions to help have consistent, maintainable, and tool-friendly formatting and indentation.
Format Lambdas Like Blocks Of Code¶
When formatting a multi-line lambda, format it like a block of code; that’s what it is. If there is only one multi-line lambda in a statement, and there are no expressions lexically after it in the statement, drop the indent to the standard two space indent for a block of code, as if it were an if-block opened by the preceding part of the statement:
std::sort(foo.begin(), foo.end(), [&](Foo a, Foo b) -> bool {
  if (a.blah < b.blah)
    return true;
  if (a.baz < b.baz)
    return true;
  return a.bam < b.bam;
});
To take best advantage of this formatting, if you are designing an API which accepts a continuation or single callable argument (be it a functor, or a std::function), it should be the last argument if at all possible.

If there are multiple multi-line lambdas in a statement, or there is anything interesting after the lambda in the statement, indent the block two spaces from the indent of the []:
dyn_switch(V->stripPointerCasts(),
           [] (PHINode *PN) {
             // process phis...
           },
           [] (SelectInst *SI) {
             // process selects...
           },
           [] (LoadInst *LI) {
             // process loads...
           },
           [] (AllocaInst *AI) {
             // process allocas...
           });
Braced Initializer Lists¶
With C++11, there are significantly more uses of braced lists to perform initialization. These allow you to easily construct aggregate temporaries in expressions among other niceness. They now have a natural way of ending up nested within each other and within function calls in order to build up aggregates (such as option structs) from local variables. To make matters worse, we also have many more uses of braces in an expression context that are not performing initialization.
The historically common formatting of braced initialization of aggregate variables does not mix cleanly with deep nesting, general expression contexts, function arguments, and lambdas. We suggest new code use a simple rule for formatting braced initialization lists: act as-if the braces were parentheses in a function call. The formatting rules exactly match those already well understood for formatting nested function calls. Examples:
foo({a, b, c}, {1, 2, 3});

llvm::Constant *Mask[] = {
    llvm::ConstantInt::get(llvm::Type::getInt32Ty(getLLVMContext()), 0),
    llvm::ConstantInt::get(llvm::Type::getInt32Ty(getLLVMContext()), 1),
    llvm::ConstantInt::get(llvm::Type::getInt32Ty(getLLVMContext()), 2)};
This formatting scheme also makes it particularly easy to get predictable, consistent, and automatic formatting with tools like Clang Format.
Language and Compiler Issues¶
Treat Compiler Warnings Like Errors¶
If your code has compiler warnings in it, something is wrong — you aren’t casting values correctly, you have “questionable” constructs in your code, or you are doing something legitimately wrong. Compiler warnings can cover up legitimate errors in output and make dealing with a translation unit difficult.
It is not possible to prevent all warnings from all compilers, nor is it desirable. Instead, pick a standard compiler (like gcc) that provides a good thorough set of warnings, and stick to it. At least in the case of gcc, it is possible to work around any spurious errors by changing the syntax of the code slightly. For example, a warning that annoys me occurs when I write code like this:

if (V = getValue()) {
  ...
}

gcc will warn me that I probably want to use the == operator, and that I probably mistyped it. In most cases, I haven’t, and I really don’t want the spurious errors. To fix this particular problem, I rewrite the code like this:

if ((V = getValue())) {
  ...
}

which shuts gcc up. Any gcc warning that annoys you can be fixed by massaging the code appropriately.
Write Portable Code¶
In almost all cases, it is possible and within reason to write completely portable code. If there are cases where it isn’t possible to write portable code, isolate it behind a well defined (and well documented) interface.
In practice, this means that you shouldn’t assume much about the host compiler (and Visual Studio tends to be the lowest common denominator). If advanced features are used, they should only be an implementation detail of a library which has a simple exposed API, and preferably be buried in libSystem.
Do not use RTTI or Exceptions¶
In an effort to reduce code and executable size, LLVM does not use RTTI (e.g. dynamic_cast<>) or exceptions. These two language features violate the general C++ principle of “you only pay for what you use”, causing executable bloat even if exceptions are never used in the code base, or if RTTI is never used for a class. Because of this, we turn them off globally in the code.

That said, LLVM does make extensive use of a hand-rolled form of RTTI that uses templates like isa<>, cast<>, and dyn_cast<>. This form of RTTI is opt-in and can be added to any class. It is also substantially more efficient than dynamic_cast<>.
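In practice that looks like the following sketch (the function and the choice of instruction classes are just for illustration):

#include "llvm/IR/Instructions.h"

using namespace llvm;

void example(Value *V) {
  if (isa<PHINode>(V)) {
    // Type test only; nothing is cast.
  }
  if (auto *AI = dyn_cast<AllocaInst>(V)) {
    // Checked cast: AI is null when V is not an AllocaInst.
    (void)AI;
  }
  // cast<BinaryOperator>(V) would assert instead of returning null;
  // use it only when the type is already known.
}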
Do not use Static Constructors¶
Static constructors and destructors (e.g. global variables whose types have a constructor or destructor) should not be added to the code base, and should be removed wherever possible. Besides well known problems where the order of initialization is undefined between globals in different source files, the entire concept of static constructors is at odds with the common use case of LLVM as a library linked into a larger application.
Consider the use of LLVM as a JIT linked into another application (perhaps for OpenGL, custom languages, shaders in movies, etc). Due to the design of static constructors, they must be executed at startup time of the entire application, regardless of whether or how LLVM is used in that larger application. There are two problems with this:
- The time to run the static constructors impacts startup time of applications — a critical time for GUI apps, among others.
- The static constructors cause the app to pull many extra pages of memory off the disk: both the code for the constructor in each .o file and the small amount of data that gets touched. In addition, touched/dirty pages put more pressure on the VM system on low-memory machines.
We would really like for there to be zero cost for linking in an additional LLVM target or other library into an application, but static constructors violate this goal.
That said, LLVM unfortunately does contain static constructors. It would be a great project for someone to purge all static constructors from LLVM, and then enable the -Wglobal-constructors warning flag (when building with Clang) to ensure we do not regress in the future.
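Inside LLVM, one common replacement is llvm::ManagedStatic, which defers construction to the first use; a minimal sketch (the registry itself is hypothetical):

#include "llvm/Support/ManagedStatic.h"
#include <map>
#include <string>

// A plain global std::map here would have a static constructor. ManagedStatic
// constructs the map lazily, the first time it is dereferenced.
static llvm::ManagedStatic<std::map<std::string, int>> Registry;

void registerThing(const std::string &Name, int ID) {
  (*Registry)[Name] = ID;
}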
Use of class and struct Keywords¶
In C++, the class and struct keywords can be used almost interchangeably. The only difference is when they are used to declare a class: class makes all members private by default while struct makes all members public by default.

Unfortunately, not all compilers follow the rules and some will generate different symbols based on whether class or struct was used to declare the symbol (e.g., MSVC). This can lead to problems at link time.

- All declarations and definitions of a given class or struct must use the same keyword. For example:
class Foo;

// Breaks mangling in MSVC.
struct Foo { int Data; };
- As a rule of thumb, struct should be kept to structures where all members are declared public.
// Foo feels like a class... this is strange.
struct Foo {
private:
  int Data;
public:
  Foo() : Data(0) { }
  int getData() const { return Data; }
  void setData(int D) { Data = D; }
};

// Bar isn't POD, but it does look like a struct.
struct Bar {
  int Data;
  Bar() : Data(0) { }
};
Do not use Braced Initializer Lists to Call a Constructor¶
In C++11 there is a “generalized initialization syntax” which allows calling constructors using braced initializer lists. Do not use these to call constructors with any interesting logic or if you care that you’re calling some particular constructor. Those should look like function calls using parentheses rather than like aggregate initialization. Similarly, if you need to explicitly name the type and call its constructor to create a temporary, don’t use a braced initializer list. Instead, use a braced initializer list (without any type for temporaries) when doing aggregate initialization or something notionally equivalent. Examples:
class Foo {
public:
  // Construct a Foo by reading data from the disk in the whizbang format, ...
  Foo(std::string filename);

  // Construct a Foo by looking up the Nth element of some global data ...
  Foo(int N);

  // ...
};
// The Foo constructor call is very deliberate, no braces.
std::fill(foo.begin(), foo.end(), Foo("name"));
// The pair is just being constructed like an aggregate, use braces.
bar_map.insert({my_key, my_value});
If you use a braced initializer list when initializing a variable, use an equals before the open curly brace:
int data[] = {0, 1, 2, 3};
Use auto Type Deduction to Make Code More Readable¶
Some are advocating a policy of “almost always auto” in C++11, however LLVM uses a more moderate stance. Use auto if and only if it makes the code more readable or easier to maintain. Don’t “almost always” use auto, but do use auto with initializers like cast<Foo>(...) or other places where the type is already obvious from the context. Another time when auto works well for these purposes is when the type would have been abstracted away anyways, often behind a container’s typedef such as std::vector<T>::iterator.
Beware unnecessary copies with auto¶
The convenience of auto makes it easy to forget that its default behavior is a copy. Particularly in range-based for loops, careless copies are expensive.

As a rule of thumb, use auto & unless you need to copy the result, and use auto * when copying pointers.
// Typically there's no reason to copy.
for (const auto &Val : Container) { observe(Val); }
for (auto &Val : Container) { Val.change(); }
// Remove the reference if you really want a new copy.
for (auto Val : Container) { Val.change(); saveSomewhere(Val); }
// Copy pointers, but make it clear that they're pointers.
for (const auto *Ptr : Container) { observe(*Ptr); }
for (auto *Ptr : Container) { Ptr->change(); }
Style Issues¶
The High-Level Issues¶
A Public Header File is a Module¶
C++ doesn’t do too well in the modularity department. There is no real encapsulation or data hiding (unless you use expensive protocol classes), but it is what we have to work with. When you write a public header file (in the LLVM source tree, they live in the top level “include” directory), you are defining a module of functionality.

Ideally, modules should be completely independent of each other, and their header files should only #include the absolute minimum number of headers possible. A module is not just a class, a function, or a namespace: it’s a collection of these that defines an interface. This interface may be several functions, classes, or data structures, but the important issue is how they work together.

In general, a module should be implemented by one or more .cpp files. Each of these .cpp files should include the header that defines their interface first. This ensures that all of the dependences of the module header have been properly added to the module header itself, and are not implicit. System headers should be included after user headers for a translation unit.
#include as Little as Possible¶
#include hurts compile time performance. Don’t do it unless you have to, especially in header files.

But wait! Sometimes you need to have the definition of a class to use it, or to inherit from it. In these cases go ahead and #include that header file. Be aware however that there are many cases where you don’t need to have the full definition of a class. If you are using a pointer or reference to a class, you don’t need the header file. If you are simply returning a class instance from a prototyped function or method, you don’t need it. In fact, for most cases, you simply don’t need the definition of a class. And not #includeing speeds up compilation.
It is easy to go overboard on this recommendation, however. You must include all of the header files that you are using — you can include them either directly or indirectly through another header file. To make sure that you don’t accidentally forget to include a header file in your module header, make sure to include your module header first in the implementation file (as mentioned above). This way there won’t be any hidden dependencies that you’ll find out about later.
Keep “Internal” Headers Private¶
Many modules have a complex implementation that causes them to use more than one implementation (.cpp) file. It is often tempting to put the internal communication interface (helper classes, extra functions, etc) in the public module header file. Don’t do this!
If you really need to do something like this, put a private header file in the same directory as the source files, and include it locally. This ensures that your private interface remains private and undisturbed by outsiders.
Note
It’s okay to put extra implementation methods in a public class itself. Just make them private (or protected) and all is well.
Use Early Exits and continue to Simplify Code¶
When reading code, keep in mind how much state and how many previous decisions have to be remembered by the reader to understand a block of code. Aim to reduce indentation where possible when it doesn’t make it more difficult to understand the code. One great way to do this is by making use of early exits and the continue keyword in long loops. As an example of using an early exit from a function, consider this “bad” code:

Value *doSomething(Instruction *I) {
  if (!isa<TerminatorInst>(I) &&
      I->hasOneUse() && doOtherThing(I)) {
    ... some long code ....
  }

  return 0;
}
This code has several problems if the body of the 'if' is large. When you’re looking at the top of the function, it isn’t immediately clear that this only does interesting things with non-terminator instructions, and only applies to things with the other predicates. Second, it is relatively difficult to describe (in comments) why these predicates are important because the if statement makes it difficult to lay out the comments. Third, when you’re deep within the body of the code, it is indented an extra level. Finally, when reading the top of the function, it isn’t clear what the result is if the predicate isn’t true; you have to read to the end of the function to know that it returns null.
It is much preferred to format the code like this:

Value *doSomething(Instruction *I) {
  // Terminators never need 'something' done to them because ...
  if (isa<TerminatorInst>(I))
    return 0;

  // We conservatively avoid transforming instructions with multiple uses
  // because goats like cheese.
  if (!I->hasOneUse())
    return 0;

  // This is really just here for example.
  if (!doOtherThing(I))
    return 0;

  ... some long code ....
}
This fixes these problems. A similar problem frequently happens in for loops. A silly example is something like this:

for (BasicBlock::iterator II = BB->begin(), E = BB->end(); II != E; ++II) {
  if (BinaryOperator *BO = dyn_cast<BinaryOperator>(II)) {
    Value *LHS = BO->getOperand(0);
    Value *RHS = BO->getOperand(1);
    if (LHS != RHS) {
      ...
    }
  }
}
When you have very, very small loops, this sort of structure is fine. But if it exceeds 10-15 lines, it becomes difficult for people to read and understand at a glance. The problem with this sort of code is that it gets very nested very quickly, meaning that the reader of the code has to keep a lot of context in their brain to remember what is immediately going on in the loop, because they don’t know if/when the if conditions will have elses etc.
It is strongly preferred to structure the loop like this:
for (BasicBlock::iterator II = BB->begin(), E = BB->end(); II != E; ++II) {
  BinaryOperator *BO = dyn_cast<BinaryOperator>(II);
  if (!BO) continue;

  Value *LHS = BO->getOperand(0);
  Value *RHS = BO->getOperand(1);
  if (LHS == RHS) continue;

  ...
}
This has all the benefits of using early exits for functions: it reduces nesting of the loop, it makes it easier to describe why the conditions are true, and it makes it obvious to the reader that there is no else coming up that they have to push context into their brain for. If a loop is large, this can be a big understandability win.
Don’t use else after a return¶
For similar reasons above (reduction of indentation and easier reading), please do not use 'else' or 'else if' after something that interrupts control flow — like return, break, continue, goto, etc. For example, this is bad:
case 'J': {
if (Signed) {
Type = Context.getsigjmp_bufType();
if (Type.isNull()) {
Error = ASTContext::GE_Missing_sigjmp_buf;
return QualType();
} else {
break;
}
} else {
Type = Context.getjmp_bufType();
if (Type.isNull()) {
Error = ASTContext::GE_Missing_jmp_buf;
return QualType();
} else {
break;
}
}
}
It is better to write it like this:
case 'J':
if (Signed) {
Type = Context.getsigjmp_bufType();
if (Type.isNull()) {
Error = ASTContext::GE_Missing_sigjmp_buf;
return QualType();
}
} else {
Type = Context.getjmp_bufType();
if (Type.isNull()) {
Error = ASTContext::GE_Missing_jmp_buf;
return QualType();
}
}
break;
Or better yet (in this case) as:
case 'J':
if (Signed)
Type = Context.getsigjmp_bufType();
else
Type = Context.getjmp_bufType();
if (Type.isNull()) {
Error = Signed ? ASTContext::GE_Missing_sigjmp_buf :
ASTContext::GE_Missing_jmp_buf;
return QualType();
}
break;
The idea is to reduce indentation and the amount of code you have to keep track of when reading the code.
Turn Predicate Loops into Predicate Functions¶
It is very common to write small loops that just compute a boolean value. There are a number of ways that people commonly write these, but an example of this sort of thing is:
bool FoundFoo = false;
for (unsigned I = 0, E = BarList.size(); I != E; ++I)
if (BarList[I]->isFoo()) {
FoundFoo = true;
break;
}
if (FoundFoo) {
...
}
This sort of code is awkward to write, and is almost always a bad sign. Instead of this sort of loop, we strongly prefer to use a predicate function (which may be static) that uses early exits to compute the predicate. We prefer the code to be structured like this:
/// \returns true if the specified list has an element that is a foo.
static bool containsFoo(const std::vector<Bar*> &List) {
for (unsigned I = 0, E = List.size(); I != E; ++I)
if (List[I]->isFoo())
return true;
return false;
}
...
if (containsFoo(BarList)) {
...
}
There are many reasons for doing this: it reduces indentation and factors out code which can often be shared by other code that checks for the same predicate. More importantly, it forces you to pick a name for the function, and forces you to write a comment for it. In this silly example, this doesn’t add much value. However, if the condition is complex, this can make it a lot easier for the reader to understand the code that queries for this predicate. Instead of being faced with the in-line details of how we check to see if the BarList contains a foo, we can trust the function name and continue reading with better locality.
The Low-Level Issues¶
Name Types, Functions, Variables, and Enumerators Properly¶
Poorly-chosen names can mislead the reader and cause bugs. We cannot stress enough how important it is to use descriptive names. Pick names that match the semantics and role of the underlying entities, within reason. Avoid abbreviations unless they are well known. After picking a good name, make sure to use consistent capitalization for the name, as inconsistency requires clients to either memorize the APIs or to look them up to find the exact spelling.
In general, names should be in camel case (e.g. TextFileReader and isLValue()). Different kinds of declarations have different rules:
- Type names (including classes, structs, enums, typedefs, etc) should be nouns and start with an upper-case letter (e.g. TextFileReader).
- Variable names should be nouns (as they represent state). The name should be camel case, and start with an upper case letter (e.g. Leader or Boats).
- Function names should be verb phrases (as they represent actions), and command-like functions should be imperative. The name should be camel case, and start with a lower case letter (e.g. openFile() or isFoo()).
- Enum declarations (e.g. enum Foo {...}) are types, so they should follow the naming conventions for types. A common use for enums is as a discriminator for a union, or an indicator of a subclass. When an enum is used for something like this, it should have a Kind suffix (e.g. ValueKind).
- Enumerators (e.g. enum { Foo, Bar }) and public member variables should start with an upper-case letter, just like types. Unless the enumerators are defined in their own small namespace or inside a class, enumerators should have a prefix corresponding to the enum declaration name. For example, enum ValueKind { ... }; may contain enumerators like VK_Argument, VK_BasicBlock, etc. Enumerators that are just convenience constants are exempt from the requirement for a prefix. For instance: enum { MaxSize = 42, Density = 12 };
As an exception, classes that mimic STL classes can have member names in STL’s style of lower-case words separated by underscores (e.g. begin(), push_back(), and empty()). Classes that provide multiple iterators should add a singular prefix to begin() and end() (e.g. global_begin() and use_begin()).
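As an illustrative sketch of the multiple-iterator convention (the class, types, and members are hypothetical, not a real LLVM API):

#include <vector>
class Node;
class Edge;

class Graph {
  std::vector<Node *> Nodes;
  std::vector<Edge *> Edges;
public:
  typedef std::vector<Node *>::iterator node_iterator;
  typedef std::vector<Edge *>::iterator edge_iterator;

  // The primary sequence keeps the plain STL names.
  node_iterator begin() { return Nodes.begin(); }
  node_iterator end()   { return Nodes.end(); }

  // The second iterator family gets a singular prefix.
  edge_iterator edge_begin() { return Edges.begin(); }
  edge_iterator edge_end()   { return Edges.end(); }
};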
Here are some examples of good and bad names:
class VehicleMaker {
...
Factory<Tire> F; // Bad -- abbreviation and non-descriptive.
Factory<Tire> Factory; // Better.
Factory<Tire> TireFactory; // Even better -- if VehicleMaker has more than one
// kind of factories.
};
Vehicle MakeVehicle(VehicleType Type) {
VehicleMaker M; // Might be OK if having a short life-span.
Tire Tmp1 = M.makeTire(); // Bad -- 'Tmp1' provides no information.
Light Headlight = M.makeLight("head"); // Good -- descriptive.
...
}
Assert Liberally¶
Use the “assert” macro to its fullest. Check all of your preconditions and assumptions; you never know when a bug (not necessarily even yours) might be caught early by an assertion, which reduces debugging time dramatically. The “<cassert>” header file is probably already included by the header files you are using, so it doesn’t cost anything to use it.
To further assist with debugging, make sure to put some kind of error message in the assertion statement, which is printed if the assertion is tripped. This helps the poor debugger make sense of why an assertion is being made and enforced, and hopefully what to do about it. Here is one complete example:
inline Value *getOperand(unsigned I) {
assert(I < Operands.size() && "getOperand() out of range!");
return Operands[I];
}
Here are more examples:
assert(Ty->isPointerType() && "Can't allocate a non-pointer type!");
assert((Opcode == Shl || Opcode == Shr) && "ShiftInst Opcode invalid!");
assert(idx < getNumSuccessors() && "Successor # out of range!");
assert(V1.getType() == V2.getType() && "Constant types must be identical!");
assert(isa<PHINode>(Succ->front()) && "Only works on PHId BBs!");
You get the idea.
In the past, asserts were used to indicate a piece of code that should not be reached. These were typically of the form:
assert(0 && "Invalid radix for integer literal");
This has a few issues, the main one being that some compilers might not understand the assertion, or warn about a missing return in builds where assertions are compiled out.
Today, we have something much better: llvm_unreachable:
llvm_unreachable("Invalid radix for integer literal");
When assertions are enabled, this will print the message if it’s ever reached and then exit the program. When assertions are disabled (i.e. in release builds), llvm_unreachable becomes a hint to compilers to skip generating code for this branch. If the compiler does not support this, it will fall back to the “abort” implementation.
Another issue is that values used only by assertions will produce an “unused value” warning when assertions are disabled. For example, this code will warn:
unsigned Size = V.size();
assert(Size > 42 && "Vector smaller than it should be");
bool NewToSet = Myset.insert(Value);
assert(NewToSet && "The value shouldn't be in the set yet");
These are two interesting different cases. In the first case, the call to V.size() is only useful for the assert, and we don’t want it executed when assertions are disabled. Code like this should move the call into the assert itself. In the second case, the side effects of the call must happen whether the assert is enabled or not. In this case, the value should be cast to void to disable the warning. To be specific, it is preferred to write the code like this:
assert(V.size() > 42 && "Vector smaller than it should be");
bool NewToSet = Myset.insert(Value); (void)NewToSet;
assert(NewToSet && "The value shouldn't be in the set yet");
Do Not Use using namespace std¶
In LLVM, we prefer to explicitly prefix all identifiers from the standard namespace with an “std::” prefix, rather than rely on “using namespace std;”.
In header files, adding a 'using namespace XXX' directive pollutes the namespace of any source file that #includes the header. This is clearly a bad thing.
In implementation files (e.g. .cpp files), the rule is more of a stylistic rule, but is still important. Basically, using explicit namespace prefixes makes the code clearer, because it is immediately obvious what facilities are being used and where they are coming from, and more portable, because namespace clashes cannot occur between LLVM code and other namespaces. The portability rule is important because different standard library implementations expose different symbols (potentially ones they shouldn’t), and future revisions to the C++ standard will add more symbols to the std namespace. As such, we never use 'using namespace std;' in LLVM.
The exception to the general rule (i.e. it’s not an exception for the std namespace) is for implementation files. For example, all of the code in the LLVM project implements code that lives in the ‘llvm’ namespace. As such, it is ok, and actually clearer, for the .cpp files to have a 'using namespace llvm;' directive at the top, after the #includes. This reduces indentation in the body of the file for source editors that indent based on braces, and keeps the conceptual context cleaner. The general form of this rule is that any .cpp file that implements code in any namespace may use that namespace (and its parents’), but should not use any others.
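A minimal sketch of what this looks like in practice (the file and function names are hypothetical):

// FooPass.cpp - this file implements code in the 'llvm' namespace.
#include "llvm/Support/raw_ostream.h"
#include <vector>

using namespace llvm; // ok: this file implements llvm:: code

static void dumpSizes(const std::vector<int> &V) {
  // llvm:: names (outs, raw_ostream) need no prefix; std:: names keep theirs.
  outs() << "count: " << V.size() << '\n';
}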
Provide a Virtual Method Anchor for Classes in Headers¶
If a class is defined in a header file and has a vtable (either it has virtual methods or it derives from classes with virtual methods), it must always have at least one out-of-line virtual method in the class. Without this, the compiler will copy the vtable and RTTI into every .o file that #includes the header, bloating .o file sizes and increasing link times.
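One common way to satisfy this is a dedicated no-op “anchor” method; a sketch (the class and the anchor() name are illustrative conventions, not required by anything):

// Widget.h
class Widget {
public:
  virtual void anchor(); // declared here, defined out-of-line
  virtual bool isVisible() const { return true; }
  virtual ~Widget() {}
};

// Widget.cpp - the out-of-line definition pins the vtable to this one .o file.
void Widget::anchor() {}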
Don’t use default labels in fully covered switches over enumerations¶
-Wswitch warns if a switch, without a default label, over an enumeration does not cover every enumeration value. If you write a default label on a fully covered switch over an enumeration then the -Wswitch warning won’t fire when new elements are added to that enumeration. To help avoid adding these kinds of defaults, Clang has the warning -Wcovered-switch-default which is off by default but turned on when building LLVM with a version of Clang that supports the warning.
A knock-on effect of this stylistic requirement is that when building LLVM with GCC you may get warnings related to “control may reach end of non-void function” if you return from each case of a covered switch-over-enum because GCC assumes that the enum expression may take any representable value, not just those of individual enumerators. To suppress this warning, use llvm_unreachable after the switch.
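A small sketch of the whole pattern (the enum and function are hypothetical; llvm_unreachable comes from llvm/Support/ErrorHandling.h):

#include "llvm/Support/ErrorHandling.h"

enum Color { Red, Green, Blue };

static const char *getName(Color C) {
  switch (C) { // no default: -Wswitch flags any newly added enumerator
  case Red:   return "red";
  case Green: return "green";
  case Blue:  return "blue";
  }
  llvm_unreachable("Invalid Color enum value!"); // quiets GCC's end-of-function warning
}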
Don’t evaluate end() every time through a loop¶
Because C++ doesn’t have a standard “foreach” loop (though it can be emulated with macros and may be coming in C++'0x) we end up writing a lot of loops that manually iterate from begin to end on a variety of containers or through other data structures. One common mistake is to write a loop in this style:
BasicBlock *BB = ...
for (BasicBlock::iterator I = BB->begin(); I != BB->end(); ++I)
... use I ...
The problem with this construct is that it evaluates “BB->end()” every time through the loop. Instead of writing the loop like this, we strongly prefer loops to be written so that they evaluate it once before the loop starts. A convenient way to do this is like so:
BasicBlock *BB = ...
for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I)
... use I ...
The observant may quickly point out that these two loops may have different semantics: if the container (a basic block in this case) is being mutated, then “BB->end()” may change its value every time through the loop and the second loop may not in fact be correct. If you actually do depend on this behavior, please write the loop in the first form and add a comment indicating that you did it intentionally.
Why do we prefer the second form (when correct)? Writing the loop in the first form has two problems. First it may be less efficient than evaluating it at the start of the loop. In this case, the cost is probably minor — a few extra loads every time through the loop. However, if the base expression is more complex, then the cost can rise quickly. I’ve seen loops where the end expression was actually something like: “SomeMap[X]->end()” and map lookups really aren’t cheap. By writing it in the second form consistently, you eliminate the issue entirely and don’t even have to think about it.
The second (even bigger) issue is that writing the loop in the first form hints to the reader that the loop is mutating the container (a fact that a comment would handily confirm!). If you write the loop in the second form, it is immediately obvious without even looking at the body of the loop that the container isn’t being modified, which makes it easier to read the code and understand what it does.
While the second form of the loop is a few extra keystrokes, we do strongly prefer it.
#include <iostream> is Forbidden¶
The use of #include <iostream> in library files is hereby forbidden, because many common implementations transparently inject a static constructor into every translation unit that includes it.
Note that using the other stream headers (<sstream> for example) is not problematic in this regard — just <iostream>. However, raw_ostream provides various APIs that are better performing for almost every use than std::ostream style APIs.
Note
New code should always use raw_ostream for writing, or the llvm::MemoryBuffer API for reading files.
Use raw_ostream¶
LLVM includes a lightweight, simple, and efficient stream implementation in llvm/Support/raw_ostream.h, which provides all of the common features of std::ostream. All new code should use raw_ostream instead of ostream.
Unlike std::ostream, raw_ostream is not a template and can be forward declared as class raw_ostream. Public headers should generally not include the raw_ostream header, but use forward declarations and constant references to raw_ostream instances.
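A sketch of the header-side pattern (the printWidget function is hypothetical):

// Widget.h - no #include of llvm/Support/raw_ostream.h needed here.
namespace llvm {
class raw_ostream; // forward declaration suffices for a reference parameter
}

void printWidget(llvm::raw_ostream &OS); // only the .cpp needs the full header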
Avoid std::endl¶
The std::endl modifier, when used with iostreams, outputs a newline to the output stream specified. In addition to doing this, however, it also flushes the output stream. In other words, these are equivalent:
std::cout << std::endl;
std::cout << '\n' << std::flush;
Most of the time, you probably have no reason to flush the output stream, so it’s better to use a literal '\n'.
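In LLVM code this usually means writing the newline directly on a raw_ostream; a minimal sketch (the function is hypothetical):

#include "llvm/Support/raw_ostream.h"

void reportCount(unsigned N) {
  // '\n' ends the line without forcing a flush on every write.
  llvm::outs() << "processed " << N << " items\n";
}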
Don’t use inline when defining a function in a class definition¶
A member function defined in a class definition is implicitly inline, so don’t put the inline keyword in this case.
Don’t:
class Foo {
public:
inline void bar() {
// ...
}
};
Do:
class Foo {
public:
void bar() {
// ...
}
};
Microscopic Details¶
This section describes preferred low-level formatting guidelines along with reasoning on why we prefer them.
Spaces Before Parentheses¶
We prefer to put a space before an open parenthesis only in control flow statements, but not in normal function call expressions and function-like macros. For example, this is good:
if (X) ...
for (I = 0; I != 100; ++I) ...
while (LLVMRocks) ...
somefunc(42);
assert(3 != 4 && "laws of math are failing me");
A = foo(42, 92) + bar(X);
and this is bad:
if(X) ...
for(I = 0; I != 100; ++I) ...
while(LLVMRocks) ...
somefunc (42);
assert (3 != 4 && "laws of math are failing me");
A = foo (42, 92) + bar (X);
The reason for doing this is not completely arbitrary. This style makes control
flow operators stand out more, and makes expressions flow better. The function
call operator binds very tightly as a postfix operator. Putting a space after a
function name (as in the last example) makes it appear that the code might bind
the arguments of the left-hand-side of a binary operator with the argument list
of a function and the name of the right side. More specifically, it is easy to
misread the “A” example as:
A = foo ((42, 92) + bar) (X);
when skimming through the code. By avoiding a space before the argument list of a function call, we avoid this misinterpretation.
Prefer Preincrement¶
Hard and fast rule: Preincrement (++X) may be no slower than postincrement (X++) and could very well be a lot faster than it. Use preincrement whenever possible.
The semantics of postincrement include making a copy of the value being incremented, returning it, and then preincrementing the “work value”. For primitive types, this isn’t a big deal. But for iterators, it can be a huge issue (for example, some iterators contain stack and set objects in them... copying an iterator could invoke the copy ctors of these as well). In general, get in the habit of always using preincrement, and you won’t have a problem.
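To make the habit concrete, here is a minimal sketch in the pre-C++11 style used throughout this document (the container and loop body are arbitrary examples):

#include <map>
#include <string>

void visitAll(std::map<int, std::string> &M) {
  // ++I preincrements: no temporary copy of the iterator is made.
  for (std::map<int, std::string>::iterator I = M.begin(), E = M.end();
       I != E; ++I)
    (void)I->second; // ... use *I ...
}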
Namespace Indentation¶
In general, we strive to reduce indentation wherever possible. This is useful because we want code to fit into 80 columns without wrapping horribly, but also because it makes it easier to understand the code. To facilitate this and avoid some insanely deep nesting on occasion, don’t indent namespaces. If it helps readability, feel free to add a comment indicating what namespace is being closed by a }. For example:
namespace llvm {
namespace knowledge {
/// This class represents things that Smith can have an intimate
/// understanding of and contains the data associated with it.
class Grokable {
...
public:
explicit Grokable() { ... }
virtual ~Grokable() = 0;
...
};
} // end namespace knowledge
} // end namespace llvm
Feel free to skip the closing comment when the namespace being closed is obvious for any reason. For example, the outer-most namespace in a header file is rarely a source of confusion. But namespaces both anonymous and named in source files that are being closed half way through the file probably could use clarification.
Anonymous Namespaces¶
After talking about namespaces in general, you may be wondering about anonymous namespaces in particular. Anonymous namespaces are a great language feature that tells the C++ compiler that the contents of the namespace are only visible within the current translation unit, allowing more aggressive optimization and eliminating the possibility of symbol name collisions. Anonymous namespaces are to C++ as “static” is to C functions and global variables. While “static” is available in C++, anonymous namespaces are more general: they can make entire classes private to a file.
The problem with anonymous namespaces is that they naturally want to encourage indentation of their body, and they reduce locality of reference: if you see a random function definition in a C++ file, it is easy to see if it is marked static, but seeing if it is in an anonymous namespace requires scanning a big chunk of the file.
Because of this, we have a simple guideline: make anonymous namespaces as small as possible, and only use them for class declarations. For example, this is good:
namespace {
class StringSort {
...
public:
StringSort(...)
bool operator<(const char *RHS) const;
};
} // end anonymous namespace
static void runHelper() {
...
}
bool StringSort::operator<(const char *RHS) const {
...
}
This is bad:
namespace {
class StringSort {
...
public:
StringSort(...)
bool operator<(const char *RHS) const;
};
void runHelper() {
...
}
bool StringSort::operator<(const char *RHS) const {
...
}
} // end anonymous namespace
This is bad specifically because if you’re looking at “runHelper” in the middle of a large C++ file, you have no immediate way to tell if it is local to the file. When it is marked static explicitly, this is immediately obvious. Also, there is no reason to enclose the definition of “operator<” in the namespace just because it was declared there.
See Also¶
A lot of these comments and recommendations have been culled from other sources. Two particularly important books for our work are:
- Effective C++ by Scott Meyers. Also interesting and useful are “More Effective C++” and “Effective STL” by the same author.
- Large-Scale C++ Software Design by John Lakos
If you get some free time, and you haven’t read them: do so, you might learn something.
CommandLine 2.0 Library Manual¶
Introduction¶
This document describes the CommandLine argument processing library. It will show you how to use it, and what it can do. The CommandLine library uses a declarative approach to specifying the command line options that your program takes. By default, these options declarations implicitly hold the value parsed for the option declared (of course this can be changed).
Although there are a lot of command line argument parsing libraries out there in many different languages, none of them fit well with what I needed. By looking at the features and problems of other libraries, I designed the CommandLine library to have the following features:
- Speed: The CommandLine library is very quick and uses little resources. The parsing time of the library is directly proportional to the number of arguments parsed, not the number of options recognized. Additionally, command line argument values are captured transparently into user defined global variables, which can be accessed like any other variable (and with the same performance).
- Type Safe: As a user of CommandLine, you don’t have to worry about remembering the type of arguments that you want (is it an int? a string? a bool? an enum?) and keep casting it around. Not only does this help prevent error prone constructs, it also leads to dramatically cleaner source code.
- No subclasses required: To use CommandLine, you instantiate variables that correspond to the arguments that you would like to capture, you don’t subclass a parser. This means that you don’t have to write any boilerplate code.
- Globally accessible: Libraries can specify command line arguments that are automatically enabled in any tool that links to the library. This is possible because the application doesn’t have to keep a list of arguments to pass to the parser. This also makes supporting dynamically loaded options trivial.
- Cleaner: CommandLine supports enum and other types directly, meaning that there is less error and more security built into the library. You don’t have to worry about whether your integral command line argument accidentally got assigned a value that is not valid for your enum type.
- Powerful: The CommandLine library supports many different types of arguments, from simple boolean flags to scalar arguments (strings, integers, enums, doubles), to lists of arguments. This is possible because CommandLine is...
- Extensible: It is very simple to add a new argument type to CommandLine. Simply specify the parser that you want to use with the command line option when you declare it. Custom parsers are no problem.
- Labor Saving: The CommandLine library cuts down on the amount of grunt work that you, the user, have to do. For example, it automatically provides a -help option that shows the available command line options for your tool. Additionally, it does most of the basic correctness checking for you.
- Capable: The CommandLine library can handle lots of different forms of options often found in real programs. For example, positional arguments, ls style grouping options (to allow processing ‘ls -lad’ naturally), ld style prefix options (to parse ‘-lmalloc -L/usr/lib’), and interpreter style options.
This document will hopefully let you jump in and start using CommandLine in your utility quickly and painlessly. Additionally it should be a simple reference manual to figure out how stuff works.
Quick Start Guide¶
This section of the manual runs through a simple CommandLine’ification of a basic compiler tool. This is intended to show you how to jump into using the CommandLine library in your own program, and show you some of the cool things it can do.
To start out, you need to include the CommandLine header file into your program:
#include "llvm/Support/CommandLine.h"
Additionally, you need to add this as the first line of your main program:
int main(int argc, char **argv) {
cl::ParseCommandLineOptions(argc, argv);
...
}
... which actually parses the arguments and fills in the variable declarations.
Now that you are ready to support command line arguments, we need to tell the system which ones we want, and what type of arguments they are. The CommandLine library uses a declarative syntax to model command line arguments with the global variable declarations that capture the parsed values. This means that for every command line option that you would like to support, there should be a global variable declaration to capture the result. For example, in a compiler, we would like to support the Unix-standard ‘-o <filename>’ option to specify where to put the output. With the CommandLine library, this is represented like this:
cl::opt<string> OutputFilename("o", cl::desc("Specify output filename"), cl::value_desc("filename"));
This declares a global variable “OutputFilename” that is used to capture the result of the “o” argument (first parameter). We specify that this is a simple scalar option by using the “cl::opt” template (as opposed to the “cl::list” template), and tell the CommandLine library that the data type that we are parsing is a string.
The second and third parameters (which are optional) are used to specify what to output for the “-help” option. In this case, we get a line that looks like this:
USAGE: compiler [options]
OPTIONS:
-help - display available options (-help-hidden for more)
-o <filename> - Specify output filename
Because we specified that the command line option should parse using the string data type, the variable declared is automatically usable as a real string in all contexts that a normal C++ string object may be used. For example:
...
std::ofstream Output(OutputFilename.c_str());
if (Output.good()) ...
...
There are many different options that you can use to customize the command line option handling library, but the above example shows the general interface to these options. The options can be specified in any order, and are specified with helper functions like cl::desc(...), so there are no positional dependencies to remember. The available options are discussed in detail in the Reference Guide.
Continuing the example, we would like to have our compiler take an input filename as well as an output filename, but we do not want the input filename to be specified with a hyphen (i.e., not -filename.c). To support this style of argument, the CommandLine library allows for positional arguments to be specified for the program. These positional arguments are filled with command line parameters that are not in option form. We use this feature like this:
cl::opt<string> InputFilename(cl::Positional, cl::desc("<input file>"), cl::init("-"));
This declaration indicates that the first positional argument should be treated as the input filename. Here we use the cl::init option to specify an initial value for the command line option, which is used if the option is not specified (if you do not specify a cl::init modifier for an option, then the default constructor for the data type is used to initialize the value). Command line options default to being optional, so if we would like to require that the user always specify an input filename, we would add the cl::Required flag, and we could eliminate the cl::init modifier, like this:
cl::opt<string> InputFilename(cl::Positional, cl::desc("<input file>"), cl::Required);
Again, the CommandLine library does not require the options to be specified in any particular order, so the above declaration is equivalent to:
cl::opt<string> InputFilename(cl::Positional, cl::Required, cl::desc("<input file>"));
By simply adding the cl::Required flag, the CommandLine library will automatically issue an error if the argument is not specified, which shifts all of the command line option verification code out of your application into the library. This is just one example of how using flags can alter the default behaviour of the library, on a per-option basis. By adding one of the declarations above, the -help option synopsis is now extended to:
USAGE: compiler [options] <input file>
OPTIONS:
-help - display available options (-help-hidden for more)
-o <filename> - Specify output filename
... indicating that an input filename is expected.
Boolean Arguments¶
In addition to input and output filenames, we would like the compiler example to support three boolean flags: “-f” to force writing binary output to a terminal, “--quiet” to enable quiet mode, and “-q” for backwards compatibility with some of our users. We can support these by declaring options of boolean type like this:
cl::opt<bool> Force ("f", cl::desc("Enable binary output on terminals"));
cl::opt<bool> Quiet ("quiet", cl::desc("Don't print informational messages"));
cl::opt<bool> Quiet2("q", cl::desc("Don't print informational messages"), cl::Hidden);
This does what you would expect: it declares three boolean variables (“Force”, “Quiet”, and “Quiet2”) to recognize these options. Note that the “-q” option is specified with the “cl::Hidden” flag. This modifier prevents it from being shown by the standard “-help” output (note that it is still shown in the “-help-hidden” output).
The CommandLine library uses a different parser for different data types. For example, in the string case, the argument passed to the option is copied literally into the content of the string variable... we obviously cannot do that in the boolean case, however, so we must use a smarter parser. In the case of the boolean parser, it allows no options (in which case it assigns the value of true to the variable), or it allows the values “true” or “false” to be specified, allowing any of the following inputs:
compiler -f # No value, 'Force' == true
compiler -f=true # Value specified, 'Force' == true
compiler -f=TRUE # Value specified, 'Force' == true
compiler -f=FALSE # Value specified, 'Force' == false
... you get the idea. The bool parser just turns the string values into boolean values, and rejects things like ‘compiler -f=foo’. Similarly, the float, double, and int parsers work like you would expect, using the ‘strtol’ and ‘strtod’ C library calls to parse the string value into the specified data type.
With the declarations above, “compiler -help” emits this:
USAGE: compiler [options] <input file>
OPTIONS:
-f - Enable binary output on terminals
-o - Override output filename
-quiet - Don't print informational messages
-help - display available options (-help-hidden for more)
and “compiler -help-hidden” prints this:
USAGE: compiler [options] <input file>
OPTIONS:
-f - Enable binary output on terminals
-o - Override output filename
-q - Don't print informational messages
-quiet - Don't print informational messages
-help - display available options (-help-hidden for more)
This brief example has shown you how to use the ‘cl::opt‘ class to parse simple scalar command line arguments. In addition to simple scalar arguments, the CommandLine library also provides primitives to support CommandLine option aliases, and lists of options.
Argument Aliases¶
So far, the example works well, except for the fact that we need to check the quiet condition like this now:
...
if (!Quiet && !Quiet2) printInformationalMessage(...);
...
... which is a real pain! Instead of defining two values for the same condition, we can use the “cl::alias” class to make the “-q” option an alias for the “-quiet” option, instead of providing a value itself:
cl::opt<bool> Force ("f", cl::desc("Overwrite output files"));
cl::opt<bool> Quiet ("quiet", cl::desc("Don't print informational messages"));
cl::alias QuietA("q", cl::desc("Alias for -quiet"), cl::aliasopt(Quiet));
The third line (which is the only one we modified from above) defines a “-q” alias that updates the “Quiet” variable (as specified by the cl::aliasopt modifier) whenever it is specified. Because aliases do not hold state, the only thing the program has to query is the Quiet variable now. Another nice feature of aliases is that they automatically hide themselves from the -help output (although, again, they are still visible in the -help-hidden output).
Now the application code can simply use:
...
if (!Quiet) printInformationalMessage(...);
...
... which is much nicer! The “cl::alias” can be used to specify an alternative name for any variable type, and has many uses.
Selecting an alternative from a set of possibilities¶
So far we have seen how the CommandLine library handles builtin types like std::string, bool and int, but how does it handle things it doesn’t know about, like enums or ‘int*’s?
The answer is that it uses a table-driven generic parser (unless you specify your own parser, as described in the Extension Guide). This parser maps literal strings to whatever type is required, and requires you to tell it what this mapping should be.
Let’s say that we would like to add four optimization levels to our optimizer, using the standard flags “-g”, “-O0”, “-O1”, and “-O2”. We could easily implement this with boolean options like above, but there are several problems with this strategy:
- A user could specify more than one of the options at a time, for example, “compiler -O3 -O2”. The CommandLine library would not be able to catch this erroneous input for us.
- We would have to test 4 different variables to see which ones are set.
- This doesn’t map to the numeric levels that we want... so we cannot easily see if some level >= “-O1” is enabled.
To cope with these problems, we can use an enum value, and have the CommandLine library fill it in with the appropriate level directly, which is used like this:
enum OptLevel {
g, O1, O2, O3
};
cl::opt<OptLevel> OptimizationLevel(cl::desc("Choose optimization level:"),
cl::values(
clEnumVal(g , "No optimizations, enable debugging"),
clEnumVal(O1, "Enable trivial optimizations"),
clEnumVal(O2, "Enable default optimizations"),
clEnumVal(O3, "Enable expensive optimizations"),
clEnumValEnd));
...
if (OptimizationLevel >= O2) doPartialRedundancyElimination(...);
...
This declaration defines a variable “OptimizationLevel” of the “OptLevel” enum type. This variable can be assigned any of the values that are listed in the declaration (note that the declaration list must be terminated with the “clEnumValEnd” argument!). The CommandLine library enforces that the user can only specify one of the options, and it ensures that only valid enum values can be specified. The “clEnumVal” macros ensure that the command line arguments match the enum values. With this option added, our help output now is:
USAGE: compiler [options] <input file>
OPTIONS:
Choose optimization level:
-g - No optimizations, enable debugging
-O1 - Enable trivial optimizations
-O2 - Enable default optimizations
-O3 - Enable expensive optimizations
-f - Enable binary output on terminals
-help - display available options (-help-hidden for more)
-o <filename> - Specify output filename
-quiet - Don't print informational messages
In this case, it is sort of awkward that flag names correspond directly to enum names, because we probably don’t want an enum definition named “g” in our program. Because of this, we can alternatively write this example like this:
enum OptLevel {
Debug, O1, O2, O3
};
cl::opt<OptLevel> OptimizationLevel(cl::desc("Choose optimization level:"),
cl::values(
clEnumValN(Debug, "g", "No optimizations, enable debugging"),
clEnumVal(O1 , "Enable trivial optimizations"),
clEnumVal(O2 , "Enable default optimizations"),
clEnumVal(O3 , "Enable expensive optimizations"),
clEnumValEnd));
...
if (OptimizationLevel == Debug) outputDebugInfo(...);
...
By using the “clEnumValN” macro instead of “clEnumVal”, we can directly specify the name that the flag should get. In general a direct mapping is nice, but sometimes you can’t or don’t want to preserve the mapping, which is when you would use it.
Named Alternatives¶
Another useful argument form is a named alternative style. We shall use this style in our compiler to specify different debug levels that can be used. Instead of each debug level being its own switch, we want to support the following options, of which only one can be specified at a time: “--debug-level=none”, “--debug-level=quick”, “--debug-level=detailed”. To do this, we use the exact same format as our optimization level flags, but we also specify an option name. For this case, the code looks like this:
enum DebugLev {
nodebuginfo, quick, detailed
};
// Enable Debug Options to be specified on the command line
cl::opt<DebugLev> DebugLevel("debug_level", cl::desc("Set the debugging level:"),
cl::values(
clEnumValN(nodebuginfo, "none", "disable debug information"),
clEnumVal(quick, "enable quick debug information"),
clEnumVal(detailed, "enable detailed debug information"),
clEnumValEnd));
This definition defines an enumerated command line variable of type “enum DebugLev”, which works exactly the same way as before. The difference here is just the interface exposed to the user of your program and the help output by the “-help” option:
USAGE: compiler [options] <input file>
OPTIONS:
Choose optimization level:
-g - No optimizations, enable debugging
-O1 - Enable trivial optimizations
-O2 - Enable default optimizations
-O3 - Enable expensive optimizations
-debug_level - Set the debugging level:
=none - disable debug information
=quick - enable quick debug information
=detailed - enable detailed debug information
-f - Enable binary output on terminals
-help - display available options (-help-hidden for more)
-o <filename> - Specify output filename
-quiet - Don't print informational messages
Again, the only structural difference between the debug level declaration and the optimization level declaration is that the debug level declaration includes an option name ("debug_level"), which automatically changes how the library processes the argument. The CommandLine library supports both forms so that you can choose the form most appropriate for your application.
Parsing a list of options¶
Now that we have the standard run-of-the-mill argument types out of the way, let’s get a little wild and crazy. Let’s say that we want our optimizer to accept a list of optimizations to perform, allowing duplicates. For example, we might want to run: “compiler -dce -constprop -inline -dce -strip”. In this case, the order of the arguments and the number of appearances is very important. This is what the “cl::list” template is for. First, start by defining an enum of the optimizations that you would like to perform:
enum Opts {
// 'inline' is a C++ keyword, so name it 'inlining'
dce, constprop, inlining, strip
};
Then define your “cl::list” variable:
cl::list<Opts> OptimizationList(cl::desc("Available Optimizations:"),
cl::values(
clEnumVal(dce , "Dead Code Elimination"),
clEnumVal(constprop , "Constant Propagation"),
clEnumValN(inlining, "inline", "Procedure Integration"),
clEnumVal(strip , "Strip Symbols"),
clEnumValEnd));
This defines a variable that is conceptually of the type “std::vector<enum Opts>”. Thus, you can access it with standard vector methods:
for (unsigned i = 0; i != OptimizationList.size(); ++i)
switch (OptimizationList[i])
...
... to iterate through the list of options specified.
Note that the “cl::list” template is completely general and may be used with any data types or other arguments that you can use with the “cl::opt” template. One especially useful way to use a list is to capture all of the positional arguments together if there may be more than one specified. In the case of a linker, for example, the linker takes several ‘.o’ files, and needs to capture them into a list. This is naturally specified as:
...
cl::list<std::string> InputFilenames(cl::Positional, cl::desc("<Input files>"), cl::OneOrMore);
...
This variable works just like a “vector<string>” object. As such, accessing the list is simple, just like above. In this example, we used the cl::OneOrMore modifier to inform the CommandLine library that it is an error if the user does not specify any .o files on our command line. Again, this just reduces the amount of checking we have to do.
Collecting options as a set of flags¶
Instead of collecting sets of options in a list, it is also possible to gather information for enum values in a bit vector. The representation used by the cl::bits class is an unsigned integer. An enum value is represented by a 0/1 in the enum’s ordinal value bit position: 1 indicating that the enum was specified, 0 otherwise. As each specified value is parsed, the resulting enum’s bit is set in the option’s bit vector:
bits |= 1 << (unsigned)enum;
Options that are specified multiple times are redundant. Any instances after the first are discarded.
Reworking the above list example, we could replace cl::list with cl::bits:
cl::bits<Opts> OptimizationBits(cl::desc("Available Optimizations:"),
cl::values(
clEnumVal(dce , "Dead Code Elimination"),
clEnumVal(constprop , "Constant Propagation"),
clEnumValN(inlining, "inline", "Procedure Integration"),
clEnumVal(strip , "Strip Symbols"),
clEnumValEnd));
To test to see if constprop was specified, we can use the cl::bits::isSet function:
if (OptimizationBits.isSet(constprop)) {
...
}
It’s also possible to get the raw bit vector using the cl::bits::getBits function:
unsigned bits = OptimizationBits.getBits();
Finally, if external storage is used, then the location specified must be of type unsigned. In all other ways a cl::bits option is equivalent to a cl::list option.
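A sketch of what external storage might look like here, assuming cl::bits takes the storage type as its second template parameter (mirroring cl::opt’s external-storage form); the variable names are hypothetical:

unsigned OptBitsStorage; // external storage; must be 'unsigned'

static cl::bits<Opts, unsigned> OptimizationBits(
    cl::location(OptBitsStorage), // bits are set here as flags are parsed
    cl::desc("Available Optimizations:"),
    cl::values(
      clEnumVal(dce,       "Dead Code Elimination"),
      clEnumValN(inlining, "inline", "Procedure Integration"),
      clEnumValEnd));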
Adding freeform text to help output¶
As our program grows and becomes more mature, we may decide to put summary information about what it does into the help output. The help output is styled to look similar to a Unix man page, providing concise information about a program. Unix man pages, however, often have a description about what the program does. To add this to your CommandLine program, simply pass a third argument to the cl::ParseCommandLineOptions call in main. This additional argument is then printed as the overview information for your program, allowing you to include any additional information that you want. For example:
int main(int argc, char **argv) {
cl::ParseCommandLineOptions(argc, argv, " CommandLine compiler example\n\n"
" This program blah blah blah...\n");
...
}
would yield the help output:
OVERVIEW: CommandLine compiler example
  This program blah blah blah...
USAGE: compiler [options] <input file>
OPTIONS:
...
-help - display available options (-help-hidden for more)
-o <filename> - Specify output filename
Grouping options into categories¶
If our program has a large number of options it may become difficult for users of our tool to navigate the output of -help. To alleviate this problem we can put our options into categories. This can be done by declaring option categories (cl::OptionCategory objects) and then placing our options into these categories using the cl::cat option attribute. For example:
cl::OptionCategory StageSelectionCat("Stage Selection Options",
"These control which stages are run.");
cl::opt<bool> Preprocessor("E",cl::desc("Run preprocessor stage."),
cl::cat(StageSelectionCat));
cl::opt<bool> NoLink("c",cl::desc("Run all stages except linking."),
cl::cat(StageSelectionCat));
The output of -help will become categorized if an option category is declared. The output looks something like this:
OVERVIEW: This is a small program to demo the LLVM CommandLine API
USAGE: Sample [options]
OPTIONS:
General options:
-help - Display available options (-help-hidden for more)
-help-list - Display list of available options (-help-list-hidden for more)
Stage Selection Options:
These control which stages are run.
-E - Run preprocessor stage.
-c - Run all stages except linking.
In addition to the behaviour of -help changing when an option category is declared, the command line option -help-list becomes visible, which will print the command line options as an uncategorized list.
Note that options that are not explicitly categorized will be placed in the cl::GeneralCategory category.
Reference Guide¶
Now that you know the basics of how to use the CommandLine library, this section will give you the detailed information you need to tune how command line options work, as well as information on more “advanced” command line option processing capabilities.
Positional Arguments¶
Positional arguments are those arguments that are not named, and are not specified with a hyphen. Positional arguments should be used when an option is specified by its position alone. For example, the standard Unix grep tool takes a regular expression argument, and an optional filename to search through (which defaults to standard input if a filename is not specified). Using the CommandLine library, this would be specified as:
cl::opt<string> Regex (cl::Positional, cl::desc("<regular expression>"), cl::Required);
cl::opt<string> Filename(cl::Positional, cl::desc("<input file>"), cl::init("-"));
Given these two option declarations, the -help output for our grep replacement would look like this:
USAGE: spiffygrep [options] <regular expression> <input file>
OPTIONS:
-help - display available options (-help-hidden for more)
... and the resultant program could be used just like the standard grep tool.
Positional arguments are sorted by their order of construction. This means that command line options will be ordered according to how they are listed in a .cpp file, but will not have an ordering defined if the positional arguments are defined in multiple .cpp files. The fix for this problem is simply to define all of your positional arguments in one .cpp file.
Specifying positional options with hyphens¶
Sometimes you may want to specify a value to your positional argument that starts with a hyphen (for example, searching for ‘-foo’ in a file). At first, you will have trouble doing this, because it will try to find an argument named ‘-foo’, and will fail (and single quotes will not save you). Note that the system grep has the same problem:
$ spiffygrep '-foo' test.txt
Unknown command line argument '-foo'. Try: 'spiffygrep -help'
$ grep '-foo' test.txt
grep: illegal option -- f
grep: illegal option -- o
grep: illegal option -- o
Usage: grep -hblcnsviw pattern file . . .
The solution for this problem is the same for both your tool and the system version: use the ‘--’ marker. When the user specifies ‘--’ on the command line, it is telling the program that all options after the ‘--’ should be treated as positional arguments, not options. Thus, we can use it like this:
$ spiffygrep -- -foo test.txt
...output...
Determining absolute position with getPosition()¶
Sometimes an option can affect or modify the meaning of another option. For example, consider gcc’s -x LANG option. This tells gcc to ignore the suffix of subsequent positional arguments and force the file to be interpreted as if it contained source code in language LANG. In order to handle this properly, you need to know the absolute position of each argument, especially those in lists, so their interaction(s) can be applied correctly. This is also useful for options like -llibname which is actually a positional argument that starts with a dash.
So, generally, the problem is that you have two cl::list variables that interact in some way. To ensure the correct interaction, you can use the cl::list::getPosition(optnum) method. This method returns the absolute position (as found on the command line) of the optnum item in the cl::list.
The idiom for usage is like this:
static cl::list<std::string> Files(cl::Positional, cl::OneOrMore);
static cl::list<std::string> Libraries("l", cl::ZeroOrMore);
int main(int argc, char**argv) {
// ...
std::vector<std::string>::iterator fileIt = Files.begin();
std::vector<std::string>::iterator libIt = Libraries.begin();
unsigned libPos = 0, filePos = 0;
while ( 1 ) {
if ( libIt != Libraries.end() )
libPos = Libraries.getPosition( libIt - Libraries.begin() );
else
libPos = 0;
if ( fileIt != Files.end() )
filePos = Files.getPosition( fileIt - Files.begin() );
else
filePos = 0;
if ( filePos != 0 && (libPos == 0 || filePos < libPos) ) {
// Source File Is next
++fileIt;
}
else if ( libPos != 0 && (filePos == 0 || libPos < filePos) ) {
// Library is next
++libIt;
}
else
break; // we're done with the list
}
}
Note that, for compatibility reasons, the cl::opt also supports an unsigned getPosition() option that will provide the absolute position of that option. You can apply the same approach as above with a cl::opt and a cl::list option as you can with two lists.
The cl::ConsumeAfter modifier¶
The cl::ConsumeAfter formatting option is used to construct programs that use “interpreter style” option processing. With this style of option processing, all arguments specified after the last positional argument are treated as special interpreter arguments that are not interpreted by the command line argument processor.
As a concrete example, let’s say we are developing a replacement for the standard Unix Bourne shell (/bin/sh). To run /bin/sh, first you specify options to the shell itself (like -x which turns on trace output), then you specify the name of the script to run, then you specify arguments to the script. These arguments to the script are parsed by the Bourne shell command line option processor, but are not interpreted as options to the shell itself. Using the CommandLine library, we would specify this as:
cl::opt<string> Script(cl::Positional, cl::desc("<input script>"), cl::init("-"));
cl::list<string> Argv(cl::ConsumeAfter, cl::desc("<program arguments>..."));
cl::opt<bool> Trace("x", cl::desc("Enable trace output"));
which automatically provides the help output:
USAGE: spiffysh [options] <input script> <program arguments>...
OPTIONS:
-help - display available options (-help-hidden for more)
-x - Enable trace output
At runtime, if we run our new shell replacement as ‘spiffysh -x test.sh -a -x -y bar’, the Trace variable will be set to true, the Script variable will be set to “test.sh”, and the Argv list will contain ["-a", "-x", "-y", "bar"], because they were specified after the last positional argument (which is the script name).
There are several limitations to when cl::ConsumeAfter options can be specified. For example, only one cl::ConsumeAfter can be specified per program, there must be at least one positional argument specified, there must not be any cl::list positional arguments, and the cl::ConsumeAfter option should be a cl::list option.
Internal vs External Storage¶
By default, all command line options automatically hold the value that they parse from the command line. This is very convenient in the common case, especially when combined with the ability to define command line options in the files that use them. This is called the internal storage model.
Sometimes, however, it is nice to separate the command line option processing code from the storage of the value parsed. For example, let’s say that we have a ‘-debug’ option that we would like to use to enable debug information across the entire body of our program. In this case, the boolean value controlling the debug code should be globally accessible (in a header file, for example) yet the command line option processing code should not be exposed to all of these clients (requiring lots of .cpp files to #include CommandLine.h).
To do this, set up your .h file with your option, like this for example:
// DebugFlag.h - Get access to the '-debug' command line option
//
// DebugFlag - This boolean is set to true if the '-debug' command line option
// is specified. This should probably not be referenced directly, instead, use
// the DEBUG macro below.
//
extern bool DebugFlag;
// DEBUG macro - This macro should be used by code to emit debug information.
// If the '-debug' option is specified on the command line, and if this is a
// debug build, then the code specified as the option to the macro will be
// executed. Otherwise it will not be.
#ifdef NDEBUG
#define DEBUG(X)
#else
#define DEBUG(X) do { if (DebugFlag) { X; } } while (0)
#endif
This allows clients to blissfully use the DEBUG() macro, or the DebugFlag explicitly if they want to. Now we just need to be able to set the DebugFlag boolean when the option is set. To do this, we pass an additional argument to our command line argument processor, and we specify where to fill in with the cl::location attribute:
bool DebugFlag; // the actual value
static cl::opt<bool, true> // The parser
Debug("debug", cl::desc("Enable debug output"), cl::Hidden, cl::location(DebugFlag));
In the above example, we specify “true” as the second argument to the cl::opt template, indicating that the template should not maintain a copy of the value itself. In addition to this, we specify the cl::location attribute, so that DebugFlag is automatically set.
Option Attributes¶
This section describes the basic attributes that you can specify on options.
The option name attribute (which is required for all options, except positional options) specifies what the option name is. This option is specified in simple double quotes:
cl::opt<bool> Quiet("quiet");
- The cl::desc attribute specifies a description for the option to be shown in the -help output for the program. This attribute supports multi-line descriptions with lines separated by '\n'.
- The cl::value_desc attribute specifies a string that can be used to fine tune the -help output for a command line option. Look here for an example.
The cl::init attribute specifies an initial value for a scalar option. If this attribute is not specified then the command line option value defaults to the value created by the default constructor for the type.
Warning
If you specify both cl::init and cl::location for an option, you must specify cl::location first, so that when the command-line parser sees cl::init, it knows where to put the initial value. (You will get an error at runtime if you don’t put them in the right order.)
- The cl::location attribute specifies where to store the value for a parsed command line option if using external storage. See the section on Internal vs External Storage for more information.
- The cl::aliasopt attribute specifies which option a cl::alias option is an alias for.
The cl::values attribute specifies the string-to-value mapping to be used by the generic parser. It takes a clEnumValEnd terminated list of (option, value, description) triplets that specify the option name, the value mapped to, and the description shown in the -help for the tool. Because the generic parser is used most frequently with enum values, two macros are often useful:
- The clEnumVal macro is used as a nice simple way to specify a triplet for an enum. This macro automatically makes the option name be the same as the enum name. The first option to the macro is the enum, the second is the description for the command line option.
- The clEnumValN macro is used to specify macro options where the option name doesn’t equal the enum name. For this macro, the first argument is the enum value, the second is the flag name, and the third is the description.
You will get a compile time error if you try to use cl::values with a parser that does not support it.
- The cl::multi_val attribute specifies that this option takes multiple values (example: -sectalign segname sectname sectvalue). This attribute takes one unsigned argument - the number of values for the option. This attribute is valid only on cl::list options (and will fail with compile error if you try to use it with other option types). It is allowed to use all of the usual modifiers on multi-valued options (besides cl::ValueDisallowed, obviously); a short sketch appears after this list.
- The cl::cat attribute specifies the option category that the option belongs to. The category should be a cl::OptionCategory object.
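As referenced in the cl::multi_val item above, a sketch of a three-valued option (the option name and description strings are hypothetical):

// Each occurrence of -sectalign consumes exactly three values, e.g.
//   tool -sectalign __TEXT __text 0x4000
static cl::list<std::string> SectAlign(
    "sectalign", cl::desc("Align section: <segname> <sectname> <value>"),
    cl::multi_val(3));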
Option Modifiers¶
Option modifiers are the flags and expressions that you pass into the
constructors for cl::opt and cl::list. These modifiers give you the
ability to tweak how options are parsed and how -help
output is generated to
fit your application well.
These options fall into five main categories:
- Hiding an option from -help output
- Controlling the number of occurrences required and allowed
- Controlling whether or not a value must be specified
- Controlling other formatting options
- Miscellaneous option modifiers
It is not possible to specify two options from the same category for a single option (you’ll get a runtime error), except for modifiers in the miscellaneous category. The CommandLine library specifies defaults for all of these settings that are the most useful in practice and the most common, which means that you usually shouldn’t have to worry about these.
Hiding an option from -help output¶
The cl::NotHidden, cl::Hidden, and cl::ReallyHidden modifiers are used to control whether or not an option appears in the -help and -help-hidden output for the compiled program:
- The cl::NotHidden modifier (which is the default for cl::opt and cl::list options) indicates the option is to appear in both help listings.
- The cl::Hidden modifier (which is the default for cl::alias options) indicates that the option should not appear in the -help output, but should appear in the -help-hidden output.
- The cl::ReallyHidden modifier indicates that the option should not appear in any help output.
Controlling the number of occurrences required and allowed¶
This group of options is used to control how many times an option is allowed (or required) to be specified on the command line of your program. Specifying a value for this setting allows the CommandLine library to do error checking for you.
The allowed values for this option group are:
- The cl::Optional modifier (which is the default for the cl::opt and cl::alias classes) indicates that your program will allow either zero or one occurrence of the option to be specified.
- The cl::ZeroOrMore modifier (which is the default for the cl::list class) indicates that your program will allow the option to be specified zero or more times.
- The cl::Required modifier indicates that the specified option must be specified exactly one time.
- The cl::OneOrMore modifier indicates that the option must be specified at least one time.
- The cl::ConsumeAfter modifier is described in the Positional arguments section.
If an option is not specified, then the value of the option is equal to the
value specified by the cl::init attribute. If the cl::init
attribute is
not specified, the option value is initialized with the default constructor for
the data type.
If an option is specified multiple times for an option of the cl::opt class, only the last value will be retained.
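As an illustration, a sketch of an option that must be given exactly once (the option name is hypothetical):
static cl::opt<std::string> OutputFilename("o",
  cl::desc("Specify output filename"),
  cl::value_desc("filename"),
  cl::Required);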
Controlling whether or not a value must be specified¶
This group of options is used to control whether or not the option allows a value to be present. In the case of the CommandLine library, a value is either specified with an equal sign (e.g. ‘-index-depth=17’) or as a trailing string (e.g. ‘-o a.out’).
The allowed values for this option group are:
- The cl::ValueOptional modifier (which is the default for bool typed options) specifies that it is acceptable to have a value, or not. A boolean argument can be enabled just by appearing on the command line, or it can have an explicit ‘-foo=true’. If an option is specified with this mode, it is illegal for the value to be provided without the equal sign. Therefore ‘-foo true’ is illegal. To get this behavior, you must use the cl::ValueRequired modifier.
- The cl::ValueRequired modifier (which is the default for all other types except for unnamed alternatives using the generic parser) specifies that a value must be provided. This mode informs the command line library that if an option is not provided with an equal sign, the next argument provided must be the value. This allows things like ‘-o a.out’ to work.
- The cl::ValueDisallowed modifier (which is the default for unnamed alternatives using the generic parser) indicates that it is a runtime error for the user to specify a value. This can be used to disallow users from providing values to boolean options (like ‘-foo=true’).
In general, the default values for this option group work just like you would want them to. As mentioned above, you can specify the cl::ValueDisallowed modifier to a boolean argument to restrict your command line parser. These options are mostly useful when extending the library.
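For instance, a sketch of a boolean flag that may only be enabled by its presence (the option name is hypothetical):
// "-quick" is accepted; "-quick=true" becomes a runtime error.
static cl::opt<bool> Quick("quick",
  cl::desc("Enable quick mode"),
  cl::ValueDisallowed);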
Controlling other formatting options¶
The formatting option group is used to specify that the command line option has special abilities and is otherwise different from other command line arguments. As usual, you can specify at most one of these modifiers.
- The cl::NormalFormatting modifier (which is the default for all options) specifies that this option is “normal”.
- The cl::Positional modifier specifies that this is a positional argument that does not have a command line option associated with it. See the Positional Arguments section for more information.
- The cl::ConsumeAfter modifier specifies that this option is used to capture “interpreter style” arguments. See this section for more information.
- The cl::Prefix modifier specifies that this option prefixes its value. With ‘Prefix’ options, the equal sign does not separate the value from the option name specified. Instead, the value is everything after the prefix, including any equal sign if present. This is useful for processing odd arguments like -lmalloc and -L/usr/lib in a linker tool or -DNAME=value in a compiler tool. Here, the ‘l’, ‘D’ and ‘L’ options are normal string (or list) options that have the cl::Prefix modifier added to allow the CommandLine library to recognize them. Note that cl::Prefix options must not have the cl::ValueDisallowed modifier specified. See the sketch after this list.
- The cl::Grouping modifier is used to implement Unix-style tools (like ls) that have lots of single letter arguments, but only require a single dash. For example, the ‘ls -labF’ command actually enables four different options, all of which are single letters. Note that cl::Grouping options cannot have values.
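Here is a sketch of the linker-style prefix options just described:
static cl::list<std::string> Libraries("l", cl::Prefix,
  cl::desc("Specify a library to link against"));
static cl::list<std::string> LibPaths("L", cl::Prefix,
  cl::desc("Specify a library search path"));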
The CommandLine library does not restrict how you use the cl::Prefix or cl::Grouping modifiers, but it is possible to specify ambiguous argument settings. Thus, it is possible to have multiple letter options that are prefix or grouping options, and they will still work as designed.
To do this, the CommandLine library uses a greedy algorithm to parse the input option into (potentially multiple) prefix and grouping options. The strategy basically looks like this:
parse(string OrigInput) {
1. string input = OrigInput;
2. if (isOption(input)) return getOption(input).parse(); // Normal option
3. while (!isOption(input) && !input.empty()) input.pop_back(); // Remove the last letter
4. if (input.empty()) return error(); // No matching option
5. if (getOption(input).isPrefix())
return getOption(input).parse(input);
6. while (!input.empty()) { // Must be grouping options
getOption(input).parse();
OrigInput.erase(OrigInput.begin(), OrigInput.begin()+input.length());
input = OrigInput;
while (!isOption(input) && !input.empty()) input.pop_back();
}
7. if (!OrigInput.empty()) error();
}
Miscellaneous option modifiers¶
The miscellaneous option modifiers are the only flags where you can specify more than one flag from the set: they are not mutually exclusive. These flags specify boolean properties that modify the option.
- The cl::CommaSeparated modifier indicates that any commas specified for an option’s value should be used to split the value up into multiple values for the option. For example, these two options are equivalent when cl::CommaSeparated is specified: “-foo=a -foo=b -foo=c” and “-foo=a,b,c”. This option only makes sense to be used in a case where the option is allowed to accept one or more values (i.e. it is a cl::list option).
- The cl::PositionalEatsArgs modifier (which only applies to positional arguments, and only makes sense for lists) indicates that the positional argument should consume any strings after it (including strings that start with a “-”) up until another recognized positional argument. For example, if you have two “eating” positional arguments, “pos1” and “pos2”, the string “-pos1 -foo -bar baz -pos2 -bork” would cause the “-foo -bar baz” strings to be applied to the “-pos1” option and the “-bork” string to be applied to the “-pos2” option.
- The cl::Sink modifier is used to handle unknown options. If there is at least one option with the cl::Sink modifier specified, the parser passes unrecognized option strings to it as values instead of signaling an error. As with cl::CommaSeparated, this modifier only makes sense with a cl::list option.
So far, these are the only three miscellaneous option modifiers.
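A short sketch combining two of these modifiers (the option names are hypothetical):
// "-lang=c,cpp" is equivalent to "-lang=c -lang=cpp".
static cl::list<std::string> Languages("lang", cl::CommaSeparated,
  cl::desc("Languages to process"));

// Any option string the parser does not recognize lands here
// instead of producing an error.
static cl::list<std::string> UnknownOptions(cl::Sink);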
Response files¶
Some systems, such as certain variants of Microsoft Windows and some older Unices, have a relatively low limit on command-line length. It is therefore customary to use the so-called ‘response files’ to circumvent this restriction. These files are mentioned on the command-line (using the “@file” syntax). The program reads these files and inserts the contents into argv, thereby working around the command-line length limits. Response files are enabled by an optional fourth argument to cl::ParseEnvironmentOptions and cl::ParseCommandLineOptions.
Top-Level Classes and Functions¶
Despite all of the built-in flexibility, the CommandLine option library really only consists of one function (cl::ParseCommandLineOptions) and three main classes: cl::opt, cl::list, and cl::alias. This section describes these classes, along with the related top-level functions, in detail.
The cl::getRegisteredOptions function¶
The cl::getRegisteredOptions function is designed to give a programmer access to declared non-positional command line options so that how they appear in -help can be modified prior to calling cl::ParseCommandLineOptions. Note this method should not be called during any static initialization because it cannot be guaranteed that all options will have been initialized. Hence it should be called from main.
This function can be used to gain access to options declared in libraries that the tool writer may not have direct access to.
The function retrieves a StringMap that maps the option string (e.g. -help) to an Option*.
Here is an example of how the function could be used:
using namespace llvm;
int main(int argc, char **argv) {
cl::OptionCategory AnotherCategory("Some options");
StringMap<cl::Option*> Map;
cl::getRegisteredOptions(Map);
//Unhide useful option and put it in a different category
assert(Map.count("print-all-options") > 0);
Map["print-all-options"]->setHiddenFlag(cl::NotHidden);
Map["print-all-options"]->setCategory(AnotherCategory);
//Hide an option we don't want to see
assert(Map.count("enable-no-infs-fp-math") > 0);
Map["enable-no-infs-fp-math"]->setHiddenFlag(cl::Hidden);
//Change --version to --show-version
assert(Map.count("version") > 0);
Map["version"]->setArgStr("show-version");
//Change --help description
assert(Map.count("help") > 0);
Map["help"]->setDescription("Shows help");
cl::ParseCommandLineOptions(argc, argv, "This is a small program to demo the LLVM CommandLine API");
...
}
The cl::ParseCommandLineOptions function¶
The cl::ParseCommandLineOptions function is designed to be called directly from main, and is used to fill in the values of all of the command line option variables once argc and argv are available.
The cl::ParseCommandLineOptions function requires two parameters (argc and argv), but may also take an optional third parameter which holds additional extra text to emit when the -help option is invoked, and a fourth boolean parameter that enables response files.
The cl::ParseEnvironmentOptions function¶
The cl::ParseEnvironmentOptions function has mostly the same effects as cl::ParseCommandLineOptions, except that it is designed to take values for options from an environment variable, for those cases in which reading the command line is not convenient or desired. It fills in the values of all the command line option variables just like cl::ParseCommandLineOptions does.
It takes four parameters: the name of the program (since argv may not be available, it can’t just look in argv[0]), the name of the environment variable to examine, the optional additional extra text to emit when the -help option is invoked, and the boolean switch that controls whether response files should be read.
cl::ParseEnvironmentOptions will break the environment variable’s value up into words and then process them using cl::ParseCommandLineOptions.
Note: Currently cl::ParseEnvironmentOptions does not support quoting, so an environment variable containing -option "foo bar" will be parsed as three words, -option, "foo, and bar", which is different from what you would get from the shell with the same input.
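A minimal sketch of a main that reads options from an environment variable (the program and variable names are hypothetical):
int main(int argc, char **argv) {
  // Words in $MYTOOL_OPTIONS are parsed as if they had been
  // passed on the command line.
  cl::ParseEnvironmentOptions("mytool", "MYTOOL_OPTIONS",
                              "mytool - a demo of environment options");
  ...
}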
The cl::SetVersionPrinter function¶
The cl::SetVersionPrinter function is designed to be called directly from main and before cl::ParseCommandLineOptions. Its use is optional. It simply arranges for a function to be called in response to the --version option instead of having the CommandLine library print out the usual version string for LLVM. This is useful for programs that are not part of LLVM but wish to use the CommandLine facilities. Such programs should just define a small function that takes no arguments and returns void and that prints out whatever version information is appropriate for the program. Pass the address of that function to cl::SetVersionPrinter to arrange for it to be called when the --version option is given by the user.
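For example, a sketch of a custom version printer (the program name and version string are hypothetical):
static void PrintVersion() {
  outs() << "mytool version 1.0\n";
}

int main(int argc, char **argv) {
  cl::SetVersionPrinter(PrintVersion);
  cl::ParseCommandLineOptions(argc, argv);
  ...
}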
The cl::opt class¶
The cl::opt class is the class used to represent scalar command line options, and is the one used most of the time. It is a templated class which can take up to three arguments (all except for the first have default values though):
namespace cl {
template <class DataType, bool ExternalStorage = false,
class ParserClass = parser<DataType> >
class opt;
}
The first template argument specifies what underlying data type the command line argument is, and is used to select a default parser implementation. The second template argument is used to specify whether the option should contain the storage for the option (the default) or whether external storage should be used to contain the value parsed for the option (see Internal vs External Storage for more information).
The third template argument specifies which parser to use. The default value selects an instantiation of the parser class based on the underlying data type of the option. In general, this default works well for most applications, so this option is only used when using a custom parser.
The cl::list class¶
The cl::list class is the class used to represent a list of command line options. It too is a templated class which can take up to three arguments:
namespace cl {
template <class DataType, class Storage = bool,
class ParserClass = parser<DataType> >
class list;
}
This class works the exact same as the cl::opt class, except that the second argument is the type of the external storage, not a boolean value. For this class, the marker type ‘bool’ is used to indicate that internal storage should be used.
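For example, a typical use is a positional list collecting input filenames (internal storage, so the second template argument is left at its default):
static cl::list<std::string> InputFilenames(
  cl::Positional, cl::desc("<input files>"), cl::ZeroOrMore);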
The cl::bits class¶
The cl::bits class is the class used to represent a list of command line options in the form of a bit vector. It is also a templated class which can take up to three arguments:
namespace cl {
template <class DataType, class Storage = bool,
class ParserClass = parser<DataType> >
class bits;
}
This class works the exact same as the cl::list class, except that the second argument must be of type unsigned if external storage is used.
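A sketch of cl::bits with an enum (the level names are hypothetical); cl::bits stores one bit per enum value, and its isSet method tests whether a particular value was specified:
enum DebugLev { quick, detailed };

static cl::bits<DebugLev> DebugLevel(
  cl::desc("Set the debugging level:"),
  cl::values(
    clEnumVal(quick,    "enable quick debug information"),
    clEnumVal(detailed, "enable detailed debug information"),
    clEnumValEnd));

// Later, e.g. in main after parsing:
//   if (DebugLevel.isSet(quick)) { ... }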
The cl::alias class¶
The cl::alias class is a nontemplated class that is used to form aliases for other arguments.
namespace cl {
class alias;
}
The cl::aliasopt attribute should be used to specify which option this is an alias for. Alias arguments default to being cl::Hidden, and use the aliased option’s parser to do the conversion from string to data.
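For example, a sketch of a one-letter alias for a hypothetical -quiet option:
static cl::opt<bool> Quiet("quiet",
  cl::desc("Don't print informational messages"));
static cl::alias QuietA("q",
  cl::desc("Alias for -quiet"), cl::aliasopt(Quiet));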
The cl::extrahelp class¶
The cl::extrahelp class is a nontemplated class that allows extra help text to be printed out for the -help option.
namespace cl {
struct extrahelp;
}
To use the extrahelp, simply construct one with a const char* parameter to the constructor. The text passed to the constructor will be printed at the bottom of the help message, verbatim. Note that multiple cl::extrahelp instances can be used, but this practice is discouraged. If your tool needs to print additional help information, put all that help into a single cl::extrahelp instance.
For example:
cl::extrahelp("\nADDITIONAL HELP:\n\n This is the extra help\n");
The cl::OptionCategory class¶
The cl::OptionCategory class is a simple class for declaring option categories.
namespace cl {
class OptionCategory;
}
An option category must have a name and optionally a description, which are passed to the constructor as const char*.
Note that declaring an option category and associating it with an option before parsing options (e.g. statically) will change the output of -help from uncategorized to categorized. If an option category is declared but not associated with an option then it will be hidden from the output of -help but will be shown in the output of -help-hidden.
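A sketch of declaring a category and attaching an option to it with cl::cat (the names are hypothetical):
static cl::OptionCategory StageCategory("Stage Selection Options",
  "These control which stages are run.");

static cl::opt<bool> RunParser("run-parser",
  cl::desc("Run only the parser stage"),
  cl::cat(StageCategory));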
Builtin parsers¶
Parsers control how the string value taken from the command line is translated into a typed value, suitable for use in a C++ program. By default, the CommandLine library uses an instance of parser<type> if the command line option specifies that it uses values of type ‘type’. Because of this, custom option processing is specified with specializations of the ‘parser’ class.
The CommandLine library provides the following builtin parser specializations, which are sufficient for most applications. It can, however, also be extended to work with new data types and new ways of interpreting the same data. See Writing a Custom Parser for more details on this type of library extension.
- The generic parser<t> parser can be used to map string values to any data type, through the use of the cl::values property, which specifies the mapping information. The most common use of this parser is for parsing enum values, which allows you to use the CommandLine library for all of the error checking to make sure that only valid enum values are specified (as opposed to accepting arbitrary strings). Despite this, however, the generic parser class can be used for any data type.
- The parser<bool> specialization is used to convert boolean strings to a boolean value. Currently accepted strings are “true”, “TRUE”, “True”, “1”, “false”, “FALSE”, “False”, and “0”.
- The parser<boolOrDefault> specialization is used for cases where the value is boolean, but we also need to know whether the option was specified at all. boolOrDefault is an enum with 3 values, BOU_UNSET, BOU_TRUE and BOU_FALSE. This parser accepts the same strings as parser<bool>.
- The parser<string> specialization simply stores the parsed string into the string value specified. No conversion or modification of the data is performed.
- The parser<int> specialization uses the C strtol function to parse the string input. As such, it will accept a decimal number (with an optional ‘+’ or ‘-’ prefix) which must start with a non-zero digit. It accepts octal numbers, which are identified with a ‘0’ prefix digit, and hexadecimal numbers with a prefix of ‘0x’ or ‘0X’.
- The parser<double> and parser<float> specializations use the standard C strtod function to convert floating point strings into floating point values. As such, a broad range of string formats is supported, including exponential notation (ex: 1.7e15), and locales are properly supported.
Extension Guide¶
Although the CommandLine library has a lot of functionality built into it already (as discussed previously), one of its true strengths lies in its extensibility. This section discusses how the CommandLine library works under the covers and illustrates how to do some simple, common extensions.
Writing a custom parser¶
One of the simplest and most common extensions is the use of a custom parser. As discussed previously, parsers are the portion of the CommandLine library that turns string input from the user into a particular parsed data type, validating the input in the process.
There are two ways to use a new parser:
Specialize the cl::parser template for your custom data type.
This approach has the advantage that users of your custom data type will automatically use your custom parser whenever they define an option with a value type of your data type. The disadvantage of this approach is that it doesn’t work if your fundamental data type is something that is already supported.
Write an independent class, using it explicitly from options that need it.
This approach works well in situations where you would like to parse an option using special syntax for a not-very-special data-type. The drawback of this approach is that users of your parser have to be aware that they are using your parser instead of the builtin ones.
To guide the discussion, we will discuss a custom parser that accepts file
sizes, specified with an optional unit after the numeric size. For example, we
would like to parse “102kb”, “41M”, “1G” into the appropriate integer value. In
this case, the underlying data type we want to parse into is ‘unsigned’. We choose approach #2 above because we don’t want to make this the default for all unsigned options.
To start out, we declare our new FileSizeParser class:
struct FileSizeParser : public cl::parser<unsigned> {
// parse - Return true on error.
bool parse(cl::Option &O, StringRef ArgName, const std::string &ArgValue,
unsigned &Val);
};
Our new class inherits from the cl::parser template class to fill in the default, boilerplate code for us. We give it the data type that we parse into, the last argument to the parse method, so that clients of our custom parser know what object type to pass in to the parse method. (Here we declare that we parse into ‘unsigned’ variables.)
For most purposes, the only method that must be implemented in a custom parser is the parse method. The parse method is called whenever the option is invoked, passing in the option itself, the option name, the string to parse, and a reference to a return value. If the string to parse is not well-formed, the parser should output an error message and return true. Otherwise it should return false and set ‘Val’ to the parsed value. In our example, we implement parse as:
bool FileSizeParser::parse(cl::Option &O, StringRef ArgName,
const std::string &Arg, unsigned &Val) {
const char *ArgStart = Arg.c_str();
char *End;
// Parse integer part, leaving 'End' pointing to the first non-integer char
Val = (unsigned)strtol(ArgStart, &End, 0);
while (1) {
switch (*End++) {
case 0: return false; // No error
case 'i': // Ignore the 'i' in KiB if people use that
case 'b': case 'B': // Ignore B suffix
break;
case 'g': case 'G': Val *= 1024*1024*1024; break;
case 'm': case 'M': Val *= 1024*1024; break;
case 'k': case 'K': Val *= 1024; break;
default:
// Print an error message if unrecognized character!
return O.error("'" + Arg + "' value invalid for file size argument!");
}
}
}
This function implements a very simple parser for the kinds of strings we are interested in. Although it has some holes (it allows “123KKK” for example), it is good enough for this example. Note that we use the option itself to print out the error message (the error method always returns true) in order to get a nice error message (shown below). Now that we have our parser class, we can use it like this:
static cl::opt<unsigned, false, FileSizeParser>
MFS("max-file-size", cl::desc("Maximum file size to accept"),
cl::value_desc("size"));
Which adds this to the output of our program:
OPTIONS:
-help - display available options (-help-hidden for more)
...
-max-file-size=<size> - Maximum file size to accept
And we can test that our parser works correctly now (the test program just prints out the max-file-size argument value):
$ ./test
MFS: 0
$ ./test -max-file-size=123MB
MFS: 128974848
$ ./test -max-file-size=3G
MFS: 3221225472
$ ./test -max-file-size=dog
-max-file-size option: 'dog' value invalid for file size argument!
It looks like it works. The error message that we get is nice and helpful, and we seem to accept reasonable file sizes. This wraps up the “custom parser” tutorial.
Exploiting external storage¶
Several of the LLVM libraries define static cl::opt
instances that will
automatically be included in any program that links with that library. This is
a feature. However, sometimes it is necessary to know the value of the command
line option outside of the library. In these cases the library does or should
provide an external storage location that is accessible to users of the
library. Examples of this include the llvm::DebugFlag
exported by the
lib/Support/Debug.cpp
file and the llvm::TimePassesIsEnabled
flag
exported by the lib/VMCore/PassManager.cpp
file.
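A minimal sketch of reading such an exported flag from client code:
#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"

void doWork() {
  // llvm::DebugFlag is the external storage filled in by the
  // library's -debug option.
  if (llvm::DebugFlag)
    llvm::errs() << "doing work verbosely\n";
}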
Dynamically adding command line options
Architecture & Platform Information for Compiler Writers¶
Note
This document is a work-in-progress. Additions and clarifications are welcome.
NVPTX¶
- CUDA Documentation includes the PTX ISA and Driver API documentation
Miscellaneous Resources¶
- Executable File Format library
- GCC prefetch project page has a good survey of the prefetching capabilities of a variety of modern processors.
Extending LLVM: Adding instructions, intrinsics, types, etc.¶
Introduction and Warning¶
During the course of using LLVM, you may wish to customize it for your research project or for experimentation. At this point, you may realize that you need to add something to LLVM, whether it be a new fundamental type, a new intrinsic function, or a whole new instruction.
When you come to this realization, stop and think. Do you really need to extend LLVM? Is it a new fundamental capability that LLVM does not support in its current incarnation, or can it be synthesized from already pre-existing LLVM elements? If you are not sure, ask on the LLVM-dev list. The reason is that extending LLVM will get involved as you need to update all the different passes that you intend to use with your extension, and there are many LLVM analyses and transformations, so it may be quite a bit of work.
Adding an intrinsic function is far easier than adding an instruction, and is transparent to optimization passes. If your added functionality can be expressed as a function call, an intrinsic function is the method of choice for LLVM extension.
Before you invest a significant amount of effort into a non-trivial extension, ask on the list if what you are looking to do can be done with already-existing infrastructure, or if maybe someone else is already working on it. You will save yourself a lot of time and effort by doing so.
Adding a new intrinsic function¶
Adding a new intrinsic function to LLVM is much easier than adding a new instruction. Almost all extensions to LLVM should start as an intrinsic function and then be turned into an instruction if warranted.
llvm/docs/LangRef.html:
Document the intrinsic. Decide whether it is code generator specific and what the restrictions are. Talk to other people about it so that you are sure it’s a good idea.
llvm/include/llvm/IR/Intrinsics*.td:
Add an entry for your intrinsic. Describe its memory access characteristics for optimization (this controls whether it will be DCE’d, CSE’d, etc). Note that any intrinsic using the llvm_int_ty type for an argument will be deemed by tblgen as overloaded and the corresponding suffix will be required on the intrinsic’s name.
llvm/lib/Analysis/ConstantFolding.cpp:
If it is possible to constant fold your intrinsic, add support to it in the canConstantFoldCallTo and ConstantFoldCall functions.
llvm/test/*:
Add your test cases to the test suite.
Once the intrinsic has been added to the system, you must add code generator support for it. Generally you must do the following steps:
Add support to the .td file for the target(s) of your choice in lib/Target/*/*.td.
This is usually a matter of adding a pattern to the .td file that matches the intrinsic, though it may obviously require adding the instructions you want to generate as well. There are lots of examples in the PowerPC and X86 backends to follow.
Adding a new SelectionDAG node¶
As with intrinsics, adding a new SelectionDAG node to LLVM is much easier than adding a new instruction. New nodes are often added to help represent instructions common to many targets. These nodes often map to an LLVM instruction (add, sub) or intrinsic (byteswap, population count). In other cases, new nodes have been added to allow many targets to perform a common task (converting between floating point and integer representation) or capture more complicated behavior in a single node (rotate).
include/llvm/CodeGen/ISDOpcodes.h:
Add an enum value for the new SelectionDAG node.
lib/CodeGen/SelectionDAG/SelectionDAG.cpp:
- Add code to print the node to getOperationName.
- If your new node can be evaluated at compile time when given constant arguments (such as an add of a constant with another constant), find the getNode method that takes the appropriate number of arguments, and add a case for your node to the switch statement that performs constant folding for nodes that take the same number of arguments as your new node.
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:
Add code to legalize, promote, and expand the node as necessary. At a minimum, you will need to add a case statement for your node in LegalizeOp which calls LegalizeOp on the node’s operands, and returns a new node if any of the operands changed as a result of being legalized. It is likely that not all targets supported by the SelectionDAG framework will natively support the new node. In this case, you must also add code in your node’s case statement in LegalizeOp to Expand your node into simpler, legal operations. The case for ISD::UREM for expanding a remainder into a divide, multiply, and a subtract is a good example.
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:
If targets may support the new node being added only at certain sizes, you will also need to add code to your node’s case statement in LegalizeOp to Promote your node’s operands to a larger size, and perform the correct operation. You will also need to add code to PromoteOp to do this as well. For a good example, see ISD::BSWAP, which promotes its operand to a wider size, performs the byteswap, and then shifts the correct bytes right to emulate the narrower byteswap in the wider type.
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:
Add a case for your node in ExpandOp to teach the legalizer how to perform the action represented by the new node on a value that has been split into high and low halves. This case will be used to support your node with a 64 bit operand on a 32 bit target.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp:
If your node can be combined with itself, or other existing nodes in a peephole-like fashion, add a visit function for it, and call that function from DAGCombiner::visit(). There are several good examples for simple combines you can do; visitFABS and visitSRL are good starting places.
lib/Target/PowerPC/PPCISelLowering.cpp:
Each target has an implementation of the TargetLowering class, usually in its own file (although some targets include it in the same file as the DAGToDAGISel). The default behavior for a target is to assume that your new node is legal for all types that are legal for that target. If this target does not natively support your node, then tell the target to either Promote it (if it is supported at a larger type) or Expand it. This will cause the code you wrote in LegalizeOp above to decompose your new node into other legal nodes for this target.
lib/Target/TargetSelectionDAG.td:
Most current targets supported by LLVM generate code using the DAGToDAG method, where SelectionDAG nodes are pattern matched to target-specific nodes, which represent individual instructions. In order for the targets to match an instruction to your new node, you must add a def for that node to the list in this file, with the appropriate type constraints. Look at add, bswap, and fadd for examples.
lib/Target/PowerPC/PPCInstrInfo.td:
Each target has a tablegen file that describes the target’s instruction set. For targets that use the DAGToDAG instruction selection framework, add a pattern for your new node that uses one or more target nodes. Documentation for this is a bit sparse right now, but there are several decent examples. See the patterns for rotl in PPCInstrInfo.td. TODO: document complex patterns.
llvm/test/CodeGen/*:
Add test cases for your new node to the test suite. llvm/test/CodeGen/X86/bswap.ll is a good example.
Adding a new instruction¶
Warning
Adding instructions changes the bitcode format, and it will take some effort to maintain compatibility with the previous version. Only add an instruction if it is absolutely necessary.
llvm/include/llvm/IR/Instruction.def:
add a number for your instruction and an enum name
llvm/include/llvm/IR/Instructions.h:
add a definition for the class that will represent your instruction
llvm/include/llvm/IR/InstVisitor.h:
add a prototype for a visitor to your new instruction type
llvm/lib/AsmParser/LLLexer.cpp:
add a new token to parse your instruction from assembly text file
llvm/lib/AsmParser/LLParser.cpp:
add the grammar on how your instruction can be read and what it will construct as a result
llvm/lib/Bitcode/Reader/BitcodeReader.cpp:
add a case for your instruction and how it will be parsed from bitcode
llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:
add a case for your instruction and how it will be written to bitcode
llvm/lib/IR/Instruction.cpp:
add a case for how your instruction will be printed out to assembly
llvm/lib/IR/Instructions.cpp:
implement the class you defined in llvm/include/llvm/IR/Instructions.h
Test your instruction
llvm/lib/Target/*:
add support for your instruction to code generators, or add a lowering pass.
llvm/test/*:
add your test cases to the test suite.
Also, you need to implement (or modify) any analyses or passes that you want to understand this new instruction.
Adding a new type¶
Warning
Adding new types changes the bitcode format, and will break compatibility with currently-existing LLVM installations. Only add new types if it is absolutely necessary.
Adding a fundamental type¶
llvm/include/llvm/IR/Type.h:
add enum for the new type; add static Type* for this type
llvm/lib/IR/Type.cpp and llvm/lib/IR/ValueTypes.cpp:
add mapping from TypeID => Type*; initialize the static Type*
llvm/llvm/llvm-c/Core.cpp:
add enum LLVMTypeKind and modify LLVMTypeKind LLVMGetTypeKind(LLVMTypeRef Ty) for the new type
llvm/include/llvm/IR/TypeBuilder.h:
add new class to represent new type in the hierarchy
llvm/lib/AsmParser/LLLexer.cpp:
add ability to parse in the type from text assembly
llvm/lib/AsmParser/LLParser.cpp:
add a token for that type
llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:
modify static void WriteTypeTable(const ValueEnumerator &VE, BitstreamWriter &Stream) to serialize your type
llvm/lib/Bitcode/Reader/BitcodeReader.cpp:
modify bool BitcodeReader::ParseTypeType() to read your data type
include/llvm/Bitcode/LLVMBitCodes.h:
add enum TypeCodes for the new type
Adding a derived type¶
llvm/include/llvm/IR/Type.h:
add enum for the new type; add a forward declaration of the type also
llvm/include/llvm/IR/DerivedTypes.h:
add new class to represent new class in the hierarchy; add forward declaration to the TypeMap value type
llvm/lib/IR/Type.cpp and llvm/lib/IR/ValueTypes.cpp:
add support for derived type, notably enum TypeID and is, get methods.
llvm/llvm/llvm-c/Core.cpp:
add enum LLVMTypeKind and modify LLVMTypeKind LLVMGetTypeKind(LLVMTypeRef Ty) for the new type
llvm/include/llvm/IR/TypeBuilder.h:
add new class to represent new class in the hierarchy
llvm/lib/AsmParser/LLLexer.cpp:
modify lltok::Kind LLLexer::LexIdentifier() to add ability to parse in the type from text assembly
llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:
modify static void WriteTypeTable(const ValueEnumerator &VE, BitstreamWriter &Stream) to serialize your type
llvm/lib/Bitcode/Reader/BitcodeReader.cpp:
modify bool BitcodeReader::ParseTypeType() to read your data type
include/llvm/Bitcode/LLVMBitCodes.h:
add enum TypeCodes for the new type
llvm/lib/IR/AsmWriter.cpp:
modify void TypePrinting::print(Type *Ty, raw_ostream &OS) to output the new derived type
How to set up LLVM-style RTTI for your class hierarchy¶
Background¶
LLVM avoids using C++’s built-in RTTI. Instead, it pervasively uses its own hand-rolled form of RTTI which is much more efficient and flexible, although it requires a bit more work from you as a class author.
A description of how to use LLVM-style RTTI from a client’s perspective is given in the Programmer’s Manual. This document, in contrast, discusses the steps you need to take as a class hierarchy author to make LLVM-style RTTI available to your clients.
Before diving in, make sure that you are familiar with the Object Oriented Programming concept of “is-a”.
Basic Setup¶
This section describes how to set up the most basic form of LLVM-style RTTI (which is sufficient for 99.9% of the cases). We will set up LLVM-style RTTI for this class hierarchy:
class Shape {
public:
Shape() {}
virtual double computeArea() = 0;
};
class Square : public Shape {
double SideLength;
public:
Square(double S) : SideLength(S) {}
double computeArea() override;
};
class Circle : public Shape {
double Radius;
public:
Circle(double R) : Radius(R) {}
double computeArea() override;
};
The most basic working setup for LLVM-style RTTI requires the following steps:
In the header where you declare Shape, you will want to #include "llvm/Support/Casting.h", which declares LLVM’s RTTI templates. That way your clients don’t even have to think about it.
#include "llvm/Support/Casting.h"
In the base class, introduce an enum which discriminates all of the different concrete classes in the hierarchy, and stash the enum value somewhere in the base class.
Here is the code after introducing this change:
 class Shape {
 public:
+  /// Discriminator for LLVM-style RTTI (dyn_cast<> et al.)
+  enum ShapeKind {
+    SK_Square,
+    SK_Circle
+  };
+private:
+  const ShapeKind Kind;
+public:
+  ShapeKind getKind() const { return Kind; }
+
   Shape() {}
   virtual double computeArea() = 0;
 };
You will usually want to keep the Kind member encapsulated and private, but let the enum ShapeKind be public along with providing a getKind() method. This is convenient for clients so that they can do a switch over the enum.
A common naming convention is that these enums are “kind”s, to avoid ambiguity with the words “type” or “class” which have overloaded meanings in many contexts within LLVM. Sometimes there will be a natural name for it, like “opcode”. Don’t bikeshed over this; when in doubt use Kind.
You might wonder why the Kind enum doesn’t have an entry for Shape. The reason for this is that since Shape is abstract (computeArea() = 0;), you will never actually have non-derived instances of exactly that class (only subclasses). See Concrete Bases and Deeper Hierarchies for information on how to deal with non-abstract bases. It’s worth mentioning here that unlike dynamic_cast<>, LLVM-style RTTI can be used (and is often used) for classes that don’t have v-tables.
Next, you need to make sure that the Kind gets initialized to the value corresponding to the dynamic type of the class. Typically, you will want to have it be an argument to the constructor of the base class, and then pass in the respective XXXKind from subclass constructors.
Here is the code after that change:
 class Shape {
 public:
   /// Discriminator for LLVM-style RTTI (dyn_cast<> et al.)
   enum ShapeKind {
     SK_Square,
     SK_Circle
   };
 private:
   const ShapeKind Kind;
 public:
   ShapeKind getKind() const { return Kind; }

-  Shape() {}
+  Shape(ShapeKind K) : Kind(K) {}
   virtual double computeArea() = 0;
 };

 class Square : public Shape {
   double SideLength;
 public:
-  Square(double S) : SideLength(S) {}
+  Square(double S) : Shape(SK_Square), SideLength(S) {}
   double computeArea() override;
 };

 class Circle : public Shape {
   double Radius;
 public:
-  Circle(double R) : Radius(R) {}
+  Circle(double R) : Shape(SK_Circle), Radius(R) {}
   double computeArea() override;
 };
Finally, you need to inform LLVM’s RTTI templates how to dynamically determine the type of a class (i.e. whether the isa<>/dyn_cast<> should succeed). The default “99.9% of use cases” way to accomplish this is through a small static member function classof. In order to have proper context for an explanation, we will display this code first, and then below describe each part:
 class Shape {
 public:
   /// Discriminator for LLVM-style RTTI (dyn_cast<> et al.)
   enum ShapeKind {
     SK_Square,
     SK_Circle
   };
 private:
   const ShapeKind Kind;
 public:
   ShapeKind getKind() const { return Kind; }

   Shape(ShapeKind K) : Kind(K) {}
   virtual double computeArea() = 0;
 };

 class Square : public Shape {
   double SideLength;
 public:
   Square(double S) : Shape(SK_Square), SideLength(S) {}
   double computeArea() override;
+
+  static bool classof(const Shape *S) {
+    return S->getKind() == SK_Square;
+  }
 };

 class Circle : public Shape {
   double Radius;
 public:
   Circle(double R) : Shape(SK_Circle), Radius(R) {}
   double computeArea() override;
+
+  static bool classof(const Shape *S) {
+    return S->getKind() == SK_Circle;
+  }
 };
The job of classof is to dynamically determine whether an object of a base class is in fact of a particular derived class. In order to downcast a type Base to a type Derived, there needs to be a classof in Derived which will accept an object of type Base.
To be concrete, consider the following code:
Shape *S = ...;
if (isa<Circle>(S)) {
  /* do something ... */
}
The code of the isa<> test in this code will eventually boil down, after template instantiation and some other machinery, to a check roughly like Circle::classof(S). For more information, see The Contract of classof.
The argument to classof should always be an ancestor class because the implementation has logic to allow and optimize away upcasts/up-isa<>’s automatically. It is as though every class Foo automatically has a classof like:
class Foo {
  [...]
  template <class T>
  static bool classof(const T *,
                      ::std::enable_if<
                        ::std::is_base_of<Foo, T>::value
                      >::type* = 0) {
    return true;
  }
  [...]
};
Note that this is the reason that we did not need to introduce a classof into Shape: all relevant classes derive from Shape, and Shape itself is abstract (has no entry in the Kind enum), so this notional inferred classof is all we need. See Concrete Bases and Deeper Hierarchies for more information about how to extend this example to more general hierarchies.
Although for this small example setting up LLVM-style RTTI seems like a lot of “boilerplate”, if your classes are doing anything interesting then this will end up being a tiny fraction of the code.
Concrete Bases and Deeper Hierarchies¶
For concrete bases (i.e. non-abstract interior nodes of the inheritance tree), the Kind check inside classof needs to be a bit more complicated. The situation differs from the example above in that:
- Since the class is concrete, it must itself have an entry in the Kind enum, because it is possible to have objects with this class as a dynamic type.
- Since the class has children, the check inside classof must take them into account.
Say that SpecialSquare and OtherSpecialSquare derive from Square, and so ShapeKind becomes:
enum ShapeKind {
SK_Square,
+ SK_SpecialSquare,
+ SK_OtherSpecialSquare,
SK_Circle
}
Then in Square, we would need to modify the classof like so:
- static bool classof(const Shape *S) {
- return S->getKind() == SK_Square;
- }
+ static bool classof(const Shape *S) {
+ return S->getKind() >= SK_Square &&
+ S->getKind() <= SK_OtherSpecialSquare;
+ }
The reason that we need to test a range like this instead of just equality is that both SpecialSquare and OtherSpecialSquare “is-a” Square, and so classof needs to return true for them.
This approach can be made to scale to arbitrarily deep hierarchies. The trick is that you arrange the enum values so that they correspond to a preorder traversal of the class hierarchy tree. With that arrangement, all subclass tests can be done with two comparisons as shown above. If you just list the class hierarchy like a list of bullet points, you’ll get the ordering right:
| Shape
| Square
| SpecialSquare
| OtherSpecialSquare
| Circle
A Bug to be Aware Of¶
The example just given opens the door to bugs where the classofs are not updated to match the Kind enum when adding (or removing) classes to (from) the hierarchy.
Continuing the example above, suppose we add a SomewhatSpecialSquare as a subclass of Square, and update the ShapeKind enum like so:
enum ShapeKind {
SK_Square,
SK_SpecialSquare,
SK_OtherSpecialSquare,
+ SK_SomewhatSpecialSquare,
SK_Circle
}
Now, suppose that we forget to update Square::classof(), so it still looks like:
static bool classof(const Shape *S) {
// BUG: Returns false when S->getKind() == SK_SomewhatSpecialSquare,
// even though SomewhatSpecialSquare "is a" Square.
return S->getKind() >= SK_Square &&
S->getKind() <= SK_OtherSpecialSquare;
}
As the comment indicates, this code contains a bug. A straightforward and
non-clever way to avoid this is to introduce an explicit SK_LastSquare
entry in the enum when adding the first subclass(es). For example, we could
rewrite the example at the beginning of Concrete Bases and Deeper
Hierarchies as:
enum ShapeKind {
SK_Square,
+ SK_SpecialSquare,
+ SK_OtherSpecialSquare,
+ SK_LastSquare,
SK_Circle
}
...
// Square::classof()
- static bool classof(const Shape *S) {
- return S->getKind() == SK_Square;
- }
+ static bool classof(const Shape *S) {
+ return S->getKind() >= SK_Square &&
+ S->getKind() <= SK_LastSquare;
+ }
Then, adding new subclasses is easy:
enum ShapeKind {
SK_Square,
SK_SpecialSquare,
SK_OtherSpecialSquare,
+ SK_SomewhatSpecialSquare,
SK_LastSquare,
SK_Circle
}
Notice that Square::classof
does not need to be changed.
The Contract of classof¶
To be more precise, let classof be inside a class C. Then the contract for classof is “return true if the dynamic type of the argument is-a C”. As long as your implementation fulfills this contract, you can tweak and optimize it as much as you want.
For example, LLVM-style RTTI can work fine in the presence of multiple-inheritance by defining an appropriate classof.
An example of this in practice is Decl vs. DeclContext inside Clang. The Decl hierarchy is done very similarly to the example setup demonstrated in this tutorial. The key part is how to then incorporate DeclContext: all that is needed is in bool DeclContext::classof(const Decl *), which asks the question “Given a Decl, how can I determine if it is-a DeclContext?”. It answers this with a simple switch over the set of Decl “kinds”, and returning true for ones that are known to be DeclContexts.
Rules of Thumb¶
- The Kind enum should have one entry per concrete class, ordered according to a preorder traversal of the inheritance tree.
- The argument to classof should be a const Base *, where Base is some ancestor in the inheritance hierarchy. The argument should never be a derived class or the class itself: the template machinery for isa<> already handles this case and optimizes it.
- For each class in the hierarchy that has no children, implement a classof that checks only against its Kind.
- For each class in the hierarchy that has children, implement a classof that checks a range of the first child’s Kind and the last child’s Kind.
LLVM Programmer’s Manual¶
- Introduction
- General Information
- Important and useful LLVM APIs
- Picking the Right Data Structure for a Task
- Helpful Hints for Common Operations
- Basic Inspection and Traversal Routines
- Iterating over the BasicBlock in a Function
- Iterating over the Instruction in a BasicBlock
- Iterating over the Instruction in a Function
- Turning an iterator into a class pointer (and vice-versa)
- Finding call sites: a slightly more complex example
- Treating calls and invokes the same way
- Iterating over def-use & use-def chains
- Iterating over predecessors & successors of blocks
- Making simple changes
- How to Create Types
- Threads and LLVM
- Advanced Topics
- The Core LLVM Class Hierarchy Reference
Warning
This is always a work in progress.
Introduction¶
This document is meant to highlight some of the important classes and interfaces available in the LLVM source-base. This manual is not intended to explain what LLVM is, how it works, and what LLVM code looks like. It assumes that you know the basics of LLVM and are interested in writing transformations or otherwise analyzing or manipulating the code.
This document should get you oriented so that you can find your way in the continuously growing source code that makes up the LLVM infrastructure. Note that this manual is not intended to serve as a replacement for reading the source code, so if you think there should be a method in one of these classes to do something, but it’s not listed, check the source. Links to the doxygen sources are provided to make this as easy as possible.
The first section of this document describes general information that is useful
to know when working in the LLVM infrastructure, and the second describes the
Core LLVM classes. In the future this manual will be extended with information
describing how to use extension libraries, such as dominator information, CFG
traversal routines, and useful utilities like the InstVisitor
(doxygen) template.
General Information¶
This section contains general information that is useful if you are working in the LLVM source-base, but that isn’t specific to any particular API.
The C++ Standard Template Library¶
LLVM makes heavy use of the C++ Standard Template Library (STL), perhaps much more than you are used to, or have seen before. Because of this, you might want to do a little background reading in the techniques used and capabilities of the library. There are many good pages that discuss the STL, and several books on the subject that you can get, so it will not be discussed in this document.
Here are some useful links:
- cppreference.com - an excellent reference for the STL and other parts of the standard C++ library.
- C++ In a Nutshell - This is an O’Reilly book in the making. It has a decent Standard Library Reference that rivals Dinkumware’s, and is unfortunately no longer free since the book has been published.
- C++ Frequently Asked Questions.
- SGI’s STL Programmer’s Guide - Contains a useful Introduction to the STL.
- Bjarne Stroustrup’s C++ Page.
- Bruce Eckel’s Thinking in C++, 2nd ed. Volume 2 Revision 4.0 (even better, get the book).
You are also encouraged to take a look at the LLVM Coding Standards guide which focuses on how to write maintainable code more than where to put your curly braces.
Important and useful LLVM APIs¶
Here we highlight some LLVM APIs that are generally useful and good to know about when writing transformations.
The isa<>, cast<> and dyn_cast<> templates¶
The LLVM source-base makes extensive use of a custom form of RTTI. These
templates have many similarities to the C++ dynamic_cast<>
operator, but
they don’t have some drawbacks (primarily stemming from the fact that
dynamic_cast<>
only works on classes that have a v-table). Because they are
used so often, you must know what they do and how they work. All of these
templates are defined in the llvm/Support/Casting.h
(doxygen) file (note that you very
rarely have to include this file directly).
isa<>:
The isa<> operator works exactly like the Java “instanceof” operator. It returns true or false depending on whether a reference or pointer points to an instance of the specified class. This can be very useful for constraint checking of various sorts (example below).
cast<>:
The cast<> operator is a “checked cast” operation. It converts a pointer or reference from a base class to a derived class, causing an assertion failure if it is not really an instance of the right type. This should be used in cases where you have some information that makes you believe that something is of the right type. An example of the isa<> and cast<> template is:
static bool isLoopInvariant(const Value *V, const Loop *L) {
  if (isa<Constant>(V) || isa<Argument>(V) || isa<GlobalValue>(V))
    return true;

  // Otherwise, it must be an instruction...
  return !L->contains(cast<Instruction>(V)->getParent());
}
Note that you should not use an isa<> test followed by a cast<>; for that, use the dyn_cast<> operator.
dyn_cast<>:
The dyn_cast<> operator is a “checking cast” operation. It checks to see if the operand is of the specified type, and if so, returns a pointer to it (this operator does not work with references). If the operand is not of the correct type, a null pointer is returned. Thus, this works very much like the dynamic_cast<> operator in C++, and should be used in the same circumstances. Typically, the dyn_cast<> operator is used in an if statement or some other flow control statement like this:
if (AllocationInst *AI = dyn_cast<AllocationInst>(Val)) {
  // ...
}
This form of the if statement effectively combines together a call to isa<> and a call to cast<> into one statement, which is very convenient.
Note that the dyn_cast<> operator, like C++’s dynamic_cast<> or Java’s instanceof operator, can be abused. In particular, you should not use big chained if/then/else blocks to check for lots of different variants of classes. If you find yourself wanting to do this, it is much cleaner and more efficient to use the InstVisitor class to dispatch over the instruction type directly.
cast_or_null<>:
The cast_or_null<> operator works just like the cast<> operator, except that it allows for a null pointer as an argument (which it then propagates). This can sometimes be useful, allowing you to combine several null checks into one.
dyn_cast_or_null<>:
The dyn_cast_or_null<> operator works just like the dyn_cast<> operator, except that it allows for a null pointer as an argument (which it then propagates). This can sometimes be useful, allowing you to combine several null checks into one.
These five templates can be used with any classes, whether they have a v-table or not. If you want to add support for these templates, see the document How to set up LLVM-style RTTI for your class hierarchy.
Passing strings (the StringRef and Twine classes)¶
Although LLVM generally does not do much string manipulation, we do have several
important APIs which take strings. Two important examples are the Value class
– which has names for instructions, functions, etc. – and the StringMap
class which is used extensively in LLVM and Clang.
These are generic classes, and they need to be able to accept strings which may
have embedded null characters. Therefore, they cannot simply take a const
char *
, and taking a const std::string&
requires clients to perform a heap
allocation which is usually unnecessary. Instead, many LLVM APIs use a
StringRef
or a const Twine&
for passing strings efficiently.
The StringRef class¶
The StringRef
data type represents a reference to a constant string (a
character array and a length) and supports the common operations available on
std::string
, but does not require heap allocation.
It can be implicitly constructed using a C style null-terminated string, an
std::string
, or explicitly with a character pointer and length. For
example, the StringRef
find function is declared as:
iterator find(StringRef Key);
and clients can call it using any one of:
Map.find("foo"); // Lookup "foo"
Map.find(std::string("bar")); // Lookup "bar"
Map.find(StringRef("\0baz", 4)); // Lookup "\0baz"
Similarly, APIs which need to return a string may return a StringRef
instance, which can be used directly or converted to an std::string
using
the str
member function. See llvm/ADT/StringRef.h
(doxygen) for more
information.
You should rarely use the StringRef class directly: because it contains pointers to external memory, it is not generally safe to store an instance of the class (unless you know that the external storage will not be freed). StringRef is small and pervasive enough in LLVM that it should always be passed by value.
The Twine class¶
The Twine
(doxygen)
class is an efficient way for APIs to accept concatenated strings. For example,
a common LLVM paradigm is to name one instruction based on the name of another
instruction with a suffix, for example:
New = CmpInst::Create(..., SO->getName() + ".cmp");
The Twine class is effectively a lightweight rope which points to temporary (stack allocated) objects. Twines can be implicitly constructed as the result of the plus operator applied to strings (i.e., a C string, an std::string, or a StringRef). The twine delays the actual concatenation of strings until it is actually required, at which point it can be efficiently rendered directly into a character array. This avoids unnecessary heap allocation involved in constructing the temporary results of string concatenation. See llvm/ADT/Twine.h (doxygen) and here for more information.
As with a StringRef
, Twine
objects point to external memory and should
almost never be stored or mentioned directly. They are intended solely for use
when defining a function which should be able to efficiently accept concatenated
strings.
Passing functions and other callable objects¶
Sometimes you may want a function to be passed a callback object. In order to support lambda expressions and other function objects, you should not use the traditional C approach of taking a function pointer and an opaque cookie:
void takeCallback(bool (*Callback)(Function *, void *), void *Cookie);
Instead, use one of the following approaches:
Function template¶
If you don’t mind putting the definition of your function into a header file, make it a function template that is templated on the callable type.
template<typename Callable>
void takeCallback(Callable Callback) {
Callback(1, 2, 3);
}
The function_ref
class template¶
The function_ref
(doxygen) class
template represents a reference to a callable object, templated over the type
of the callable. This is a good choice for passing a callback to a function,
if you don’t need to hold onto the callback after the function returns. In this
way, function_ref
is to std::function
as StringRef
is to
std::string
.
function_ref<Ret(Param1, Param2, ...)>
can be implicitly constructed from
any callable object that can be called with arguments of type Param1
,
Param2
, ..., and returns a value that can be converted to type Ret
.
For example:
void visitBasicBlocks(Function *F, function_ref<bool (BasicBlock*)> Callback) {
for (BasicBlock &BB : *F)
if (Callback(&BB))
return;
}
can be called using:
visitBasicBlocks(F, [&](BasicBlock *BB) {
if (process(BB))
return isEmpty(BB);
return false;
});
Note that a function_ref
object contains pointers to external memory, so it
is not generally safe to store an instance of the class (unless you know that
the external storage will not be freed). If you need this ability, consider
using std::function
. function_ref
is small enough that it should always
be passed by value.
The DEBUG()
macro and -debug
option¶
Often when working on your pass you will put a bunch of debugging printouts and other code into your pass. After you get it working, you want to remove it, but you may need it again in the future (to work out new bugs that you run across).
Naturally, because of this, you don’t want to delete the debug printouts, but you don’t want them to always be noisy. A standard compromise is to comment them out, allowing you to enable them if you need them in the future.
The llvm/Support/Debug.h
(doxygen) file provides a macro named
DEBUG()
that is a much nicer solution to this problem. Basically, you can
put arbitrary code into the argument of the DEBUG
macro, and it is only
executed if ‘opt
‘ (or any other tool) is run with the ‘-debug
‘ command
line argument:
DEBUG(errs() << "I am here!\n");
Then you can run your pass like this:
$ opt < a.bc > /dev/null -mypass
<no output>
$ opt < a.bc > /dev/null -mypass -debug
I am here!
Using the DEBUG()
macro instead of a home-brewed solution allows you to not
have to create “yet another” command line option for the debug output for your
pass. Note that DEBUG()
macros are disabled for optimized builds, so they
do not cause a performance impact at all (for the same reason, they should also
not contain side-effects!).
One additional nice thing about the DEBUG()
macro is that you can enable or
disable it directly in gdb. Just use “set DebugFlag=0
” or “set
DebugFlag=1
” from the gdb if the program is running. If the program hasn’t
been started yet, you can always just run it with -debug
.
Fine grained debug info with DEBUG_TYPE
and the -debug-only
option¶
Sometimes you may find yourself in a situation where enabling -debug
just
turns on too much information (such as when working on the code generator).
If you want to enable debug information with more fine-grained control, you
can define the DEBUG_TYPE
macro and use the -debug-only
option as
follows:
#undef DEBUG_TYPE
DEBUG(errs() << "No debug type\n");
#define DEBUG_TYPE "foo"
DEBUG(errs() << "'foo' debug type\n");
#undef DEBUG_TYPE
#define DEBUG_TYPE "bar"
DEBUG(errs() << "'bar' debug type\n"));
#undef DEBUG_TYPE
#define DEBUG_TYPE ""
DEBUG(errs() << "No debug type (2)\n");
Then you can run your pass like this:
$ opt < a.bc > /dev/null -mypass
<no output>
$ opt < a.bc > /dev/null -mypass -debug
No debug type
'foo' debug type
'bar' debug type
No debug type (2)
$ opt < a.bc > /dev/null -mypass -debug-only=foo
'foo' debug type
$ opt < a.bc > /dev/null -mypass -debug-only=bar
'bar' debug type
Of course, in practice, you should only set DEBUG_TYPE
at the top of a file,
to specify the debug type for the entire module (if you do this before you
#include "llvm/Support/Debug.h"
, you don’t have to insert the ugly
#undef
‘s). Also, you should use names more meaningful than “foo” and “bar”,
because there is no system in place to ensure that names do not conflict. If
two different modules use the same string, they will all be turned on when the
name is specified. This allows, for example, all debug information for
instruction scheduling to be enabled with -debug-only=InstrSched
, even if
the source lives in multiple files.
For performance reasons, -debug-only is not available in optimized builds
(--enable-optimized
) of LLVM.
The DEBUG_WITH_TYPE
macro is also available for situations where you would
like to set DEBUG_TYPE
, but only for one specific DEBUG
statement. It
takes an additional first parameter, which is the type to use. For example, the
preceding example could be written as:
DEBUG_WITH_TYPE("", errs() << "No debug type\n");
DEBUG_WITH_TYPE("foo", errs() << "'foo' debug type\n");
DEBUG_WITH_TYPE("bar", errs() << "'bar' debug type\n"));
DEBUG_WITH_TYPE("", errs() << "No debug type (2)\n");
The Statistic
class & -stats
option¶
The llvm/ADT/Statistic.h
(doxygen) file provides a class
named Statistic
that is used as a unified way to keep track of what the LLVM
compiler is doing and how effective various optimizations are. It is useful to
see what optimizations are contributing to making a particular program run
faster.
Often you may run your pass on some big program, and you’re interested to see
how many times it makes a certain transformation. Although you can do this with
hand inspection, or some ad-hoc method, this is a real pain and not very useful
for big programs. Using the Statistic
class makes it very easy to keep
track of this information, and the calculated information is presented in a
uniform manner with the rest of the passes being executed.
There are many examples of Statistic
uses, but the basics of using it are as
follows:
- Define your statistic like this:
#define DEBUG_TYPE "mypassname"   // This goes before any #includes.
STATISTIC(NumXForms, "The # of times I did stuff");
The STATISTIC macro defines a static variable, whose name is specified by the first argument. The pass name is taken from the DEBUG_TYPE macro, and the description is taken from the second argument. The variable defined (“NumXForms” in this case) acts like an unsigned integer.
- Whenever you make a transformation, bump the counter:
++NumXForms; // I did stuff!
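Putting the two steps together, a minimal sketch might look like this (the pass name, statistic, and doTransformation() are illustrative):

#define DEBUG_TYPE "mypassname"   // Must come before any #includes.
#include "llvm/ADT/Statistic.h"

STATISTIC(NumXForms, "The # of times I did stuff");

static void doTransformation() {
  // ... perform the transformation ...
  ++NumXForms;   // Counted and reported when '-stats' is given.
}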
That’s all you have to do. To get ‘opt
‘ to print out the statistics
gathered, use the ‘-stats
‘ option:
$ opt -stats -mypassname < program.bc > /dev/null
... statistics output ...
Note that in order to use the ‘-stats
‘ option, LLVM must be
compiled with assertions enabled.
When running opt
on a C file from the SPEC benchmark suite, it gives a
report that looks like this:
7646 bitcodewriter - Number of normal instructions
725 bitcodewriter - Number of oversized instructions
129996 bitcodewriter - Number of bitcode bytes written
2817 raise - Number of insts DCEd or constprop'd
3213 raise - Number of cast-of-self removed
5046 raise - Number of expression trees converted
75 raise - Number of other getelementptr's formed
138 raise - Number of load/store peepholes
42 deadtypeelim - Number of unused typenames removed from symtab
392 funcresolve - Number of varargs functions resolved
27 globaldce - Number of global variables removed
2 adce - Number of basic blocks removed
134 cee - Number of branches revectored
49 cee - Number of setcc instruction eliminated
532 gcse - Number of loads removed
2919 gcse - Number of instructions removed
86 indvars - Number of canonical indvars added
87 indvars - Number of aux indvars removed
25 instcombine - Number of dead inst eliminate
434 instcombine - Number of insts combined
248 licm - Number of load insts hoisted
1298 licm - Number of insts hoisted to a loop pre-header
3 licm - Number of insts hoisted to multiple loop preds (bad, no loop pre-header)
75 mem2reg - Number of alloca's promoted
1444 cfgsimplify - Number of blocks simplified
Obviously, with so many optimizations, having a unified framework for this stuff is very nice. Making your pass fit well into the framework makes it more maintainable and useful.
Viewing graphs while debugging code¶
Several of the important data structures in LLVM are graphs: for example CFGs made out of LLVM BasicBlocks, CFGs made out of LLVM MachineBasicBlocks, and Instruction Selection DAGs. In many cases, while debugging various parts of the compiler, it is nice to instantly visualize these graphs.
LLVM provides several callbacks that are available in a debug build to do
exactly that. If you call the Function::viewCFG()
method, for example, the
current LLVM tool will pop up a window containing the CFG for the function where
each basic block is a node in the graph, and each node contains the instructions
in the block. Similarly, there also exists Function::viewCFGOnly()
(does
not include the instructions), the MachineFunction::viewCFG()
and
MachineFunction::viewCFGOnly()
, and the SelectionDAG::viewGraph()
methods. Within GDB, for example, you can usually use something like call
DAG.viewGraph()
to pop up a window. Alternatively, you can sprinkle calls to
these functions in your code in places you want to debug.
Getting this to work requires a small amount of setup. On Unix systems
with X11, install the graphviz toolkit, and make
sure ‘dot’ and ‘gv’ are in your path. If you are running on Mac OS X, download
and install the Mac OS X Graphviz program and add
/Applications/Graphviz.app/Contents/MacOS/
(or wherever you install it) to
your path. The programs need not be present when configuring, building or
running LLVM and can simply be installed when needed during an active debug
session.
SelectionDAG
has been extended to make it easier to locate interesting
nodes in large complex graphs. From gdb, if you call DAG.setGraphColor(node,
"color")
, then the next call DAG.viewGraph()
would highlight the node in
the specified color (choices of colors can be found at colors.) More complex node attributes
can be provided with call DAG.setGraphAttrs(node, "attributes")
(choices can
be found at Graph attributes.)
If you want to restart and clear all the current graph attributes, then you can
call DAG.clearGraphAttrs()
.
Note that graph visualization features are compiled out of Release builds to reduce file size. This means that you need a Debug+Asserts or Release+Asserts build to use these features.
Picking the Right Data Structure for a Task¶
LLVM has a plethora of data structures in the llvm/ADT/
directory, and we
commonly use STL data structures. This section describes the trade-offs you
should consider when you pick one.
The first step is a choose your own adventure: do you want a sequential container, a set-like container, or a map-like container? The most important thing when choosing a container is the algorithmic properties of how you plan to access the container. Based on that, you should use:
- a map-like container if you need efficient look-up of a value based on another value. Map-like containers also support efficient queries for containment (whether a key is in the map). Map-like containers generally do not support efficient reverse mapping (values to keys). If you need that, use two maps. Some map-like containers also support efficient iteration through the keys in sorted order. Map-like containers are the most expensive sort; only use them if you need one of these capabilities.
- a set-like container if you need to put a bunch of stuff into a container that automatically eliminates duplicates. Some set-like containers support efficient iteration through the elements in sorted order. Set-like containers are more expensive than sequential containers.
- a sequential container provides the most efficient way to add elements and keeps track of the order they are added to the collection. They permit duplicates and support efficient iteration, but do not support efficient look-up based on a key.
- a string container is a specialized sequential container or reference structure that is used for character or byte arrays.
- a bit container provides an efficient way to store and perform set operations on sets of numeric id’s, while automatically eliminating duplicates. Bit containers require a maximum of 1 bit for each identifier you want to store.
Once the proper category of container is determined, you can fine tune the memory use, constant factors, and cache behaviors of access by intelligently picking a member of the category. Note that constant factors and cache behavior can be a big deal. If you have a vector that usually only contains a few elements (but could contain many), for example, it’s much better to use SmallVector than vector. Doing so avoids (relatively) expensive malloc/free calls, which dwarf the cost of adding the elements to the container.
Sequential Containers (std::vector, std::list, etc)¶
There are a variety of sequential containers available for you, based on your needs. Pick the first in this section that will do what you want.
llvm/ADT/ArrayRef.h¶
The llvm::ArrayRef
class is the preferred class to use in an interface that
accepts a sequential list of elements in memory and just reads from them. By
taking an ArrayRef
, the API can be passed a fixed size array, an
std::vector
, an llvm::SmallVector
and anything else that is contiguous
in memory.
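For example, here is a hedged sketch (sumValues is illustrative, not an LLVM API) showing how one signature serves every contiguous container:

// Taking ArrayRef lets callers pass any contiguous sequence without copying.
int sumValues(ArrayRef<int> Vals) {
  int Sum = 0;
  for (int V : Vals)
    Sum += V;
  return Sum;
}

int Fixed[] = {1, 2, 3};
std::vector<int> Vec(Fixed, Fixed + 3);
SmallVector<int, 8> SV(Fixed, Fixed + 3);

sumValues(Fixed);  // fixed-size array
sumValues(Vec);    // std::vector
sumValues(SV);     // SmallVector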
Fixed Size Arrays¶
Fixed size arrays are very simple and very fast. They are good if you know exactly how many elements you have, or you have a (low) upper bound on how many you have.
Heap Allocated Arrays¶
Heap allocated arrays (new[]
+ delete[]
) are also simple. They are good
if the number of elements is variable, if you know how many elements you will
need before the array is allocated, and if the array is usually large (if not,
consider a SmallVector). The cost of a heap allocated
array is the cost of the new/delete (aka malloc/free). Also note that if you
are allocating an array of a type with a constructor, the constructor and
destructors will be run for every element in the array (re-sizable vectors only
construct those elements actually used).
llvm/ADT/TinyPtrVector.h¶
TinyPtrVector<Type>
is a highly specialized collection class that is
optimized to avoid allocation in the case when a vector has zero or one
elements. It has two major restrictions: 1) it can only hold values of pointer
type, and 2) it cannot hold a null pointer.
Since this container is highly specialized, it is rarely used.
llvm/ADT/SmallVector.h¶
SmallVector<Type, N>
is a simple class that looks and smells just like
vector<Type>
: it supports efficient iteration, lays out elements in memory
order (so you can do pointer arithmetic between elements), supports efficient
push_back/pop_back operations, supports efficient random access to its elements,
etc.
The advantage of SmallVector is that it allocates space for some number of elements (N) in the object itself. Because of this, if the SmallVector is dynamically smaller than N, no malloc is performed. This can be a big win in cases where the malloc/free call is far more expensive than the code that fiddles around with the elements.
This is good for vectors that are “usually small” (e.g. the number of predecessors/successors of a block is usually less than 8). On the other hand, this makes the size of the SmallVector itself large, so you don’t want to allocate lots of them (doing so will waste a lot of space). As such, SmallVectors are most useful when on the stack.
SmallVector also provides a nice portable and efficient replacement for
alloca
.
Note
Prefer to use SmallVectorImpl<T>
as a parameter type.
In APIs that don’t care about the “small size” (most?), prefer to use
the SmallVectorImpl<T>
class, which is basically just the “vector
header” (and methods) without the elements allocated after it. Note that
SmallVector<T, N>
inherits from SmallVectorImpl<T>
so the
conversion is implicit and costs nothing. E.g.
// BAD: Clients cannot pass e.g. SmallVector<Foo, 4>.
void hardcodedSmallSize(SmallVector<Foo, 2> &Out);
// GOOD: Clients can pass any SmallVector<Foo, N>.
void allowsAnySmallSize(SmallVectorImpl<Foo> &Out);
void someFunc() {
SmallVector<Foo, 8> Vec;
hardcodedSmallSize(Vec); // Error.
allowsAnySmallSize(Vec); // Works.
}
Even though it has “Impl
” in the name, this is so widely used that
it really isn’t “private to the implementation” anymore. A name like
SmallVectorHeader
would be more appropriate.
<vector>¶
std::vector
is well loved and respected. It is useful when SmallVector
isn’t: when the size of the vector is often large (thus the small optimization
will rarely be a benefit) or if you will be allocating many instances of the
vector itself (which would waste space for elements that aren’t in the
container). vector is also useful when interfacing with code that expects
vectors :).
One worthwhile note about std::vector: avoid code like this:
for ( ... ) {
std::vector<foo> V;
// make use of V.
}
Instead, write this as:
std::vector<foo> V;
for ( ... ) {
// make use of V.
V.clear();
}
Doing so will save (at least) one heap allocation and free per iteration of the loop.
<deque>¶
std::deque
is, in some senses, a generalized version of std::vector
.
Like std::vector
, it provides constant time random access and other similar
properties, but it also provides efficient access to the front of the list. It
does not guarantee contiguity of elements within memory.
In exchange for this extra flexibility, std::deque
has significantly higher
constant factor costs than std::vector
. If possible, use std::vector
or
something cheaper.
<list>¶
std::list
is an extremely inefficient class that is rarely useful. It
performs a heap allocation for every element inserted into it, thus having an
extremely high constant factor, particularly for small data types.
std::list
also only supports bidirectional iteration, not random access
iteration.
In exchange for this high cost, std::list supports efficient access to both ends
of the list (like std::deque
, but unlike std::vector
or
SmallVector
). In addition, the iterator invalidation characteristics of
std::list are stronger than those of a vector class: inserting or removing an element does not invalidate iterators or pointers to other elements in the list.
llvm/ADT/ilist.h¶
ilist<T>
implements an ‘intrusive’ doubly-linked list. It is intrusive,
because it requires the element to store and provide access to the prev/next
pointers for the list.
ilist
has the same drawbacks as std::list
, and additionally requires an
ilist_traits
implementation for the element type, but it provides some novel
characteristics. In particular, it can efficiently store polymorphic objects,
the traits class is informed when an element is inserted or removed from the
list, and ilist
s are guaranteed to support a constant-time splice
operation.
These properties are exactly what we want for things like Instruction
s and
basic blocks, which is why these are implemented with ilist
s.
Related classes of interest are explained in the following subsections:
llvm/ADT/PackedVector.h¶
Useful for storing a vector of values using only a small number of bits for each value. Apart from the standard operations of a vector-like container, it can also perform an ‘or’ set operation.
For example:
enum State {
None = 0x0,
FirstCondition = 0x1,
SecondCondition = 0x2,
Both = 0x3
};
State get() {
PackedVector<State, 2> Vec1;
Vec1.push_back(FirstCondition);
PackedVector<State, 2> Vec2;
Vec2.push_back(SecondCondition);
Vec1 |= Vec2;
return Vec1[0]; // returns 'Both'.
}
ilist_traits¶
ilist_traits<T>
is ilist<T>
‘s customization mechanism. iplist<T>
(and consequently ilist<T>
) publicly derive from this traits class.
iplist¶
iplist<T>
is ilist<T>
‘s base and as such supports a slightly narrower
interface. Notably, inserters from T&
are absent.
ilist_traits<T>
is a public base of this class and can be used for a wide
variety of customizations.
llvm/ADT/ilist_node.h¶
ilist_node<T>
implements the forward and backward links that are expected
by the ilist<T>
(and analogous containers) in the default manner.
ilist_node<T>
s are meant to be embedded in the node type T
, usually
T
publicly derives from ilist_node<T>
.
Sentinels¶
ilists have another specialty that must be considered. To be good citizens in the C++ ecosystem, they need to support the standard container
operations, such as begin
and end
iterators, etc. Also, the
operator--
must work correctly on the end
iterator in the case of
non-empty ilist
s.
The only sensible solution to this problem is to allocate a so-called sentinel
along with the intrusive list, which serves as the end
iterator, providing
the back-link to the last element. However, conforming to C++ convention, it is illegal to apply operator++ beyond the sentinel, and the sentinel must not be dereferenced.
These constraints allow for some implementation freedom to the ilist
how to
allocate and store the sentinel. The corresponding policy is dictated by
ilist_traits<T>
. By default a T
gets heap-allocated whenever the need
for a sentinel arises.
While the default policy is sufficient in most cases, it may break down when
T
does not provide a default constructor. Also, in the case of many
instances of ilist
s, the memory overhead of the associated sentinels is
wasted. To alleviate the situation with numerous and voluminous
T
-sentinels, sometimes a trick is employed, leading to ghostly sentinels.
Ghostly sentinels are obtained by specially-crafted ilist_traits<T>
which
superpose the sentinel with the ilist
instance in memory. Pointer
arithmetic is used to obtain the sentinel, which is relative to the ilist
‘s
this
pointer. The ilist
is augmented by an extra pointer, which serves
as the back-link of the sentinel. This is the only field in the ghostly
sentinel which can be legally accessed.
Other Sequential Container options¶
Other STL containers are available, such as std::string
.
There are also various STL adapter classes such as std::queue
,
std::priority_queue
, std::stack
, etc. These provide simplified access
to an underlying container but don’t affect the cost of the container itself.
String-like containers¶
There are a variety of ways to pass around and use strings in C and C++, and LLVM adds a few new options to choose from. Pick the first option on this list that will do what you need; they are ordered according to their relative cost.
Note that it is generally preferred to not pass strings around as const
char*
‘s. These have a number of problems, including the fact that they
cannot represent embedded nul (“\0”) characters, and do not have a length
available efficiently. The general replacement for ‘const char*
‘ is
StringRef.
For more information on choosing string containers for APIs, please see Passing Strings.
llvm/ADT/StringRef.h¶
The StringRef class is a simple value class that contains a pointer to a character and a length, and is quite related to the ArrayRef class (but specialized for arrays of characters). Because StringRef carries a length with it, it safely handles strings with embedded nul characters in it, getting the length does not require a strlen call, and it even has very convenient APIs for slicing and dicing the character range that it represents.
StringRef is ideal for passing simple strings around that are known to be live, either because they are C string literals, std::string, a C array, or a SmallVector. Each of these cases has an efficient implicit conversion to StringRef, which doesn’t result in a dynamic strlen being executed.
StringRef has a few major limitations which make more powerful string containers useful:
- You cannot directly convert a StringRef to a ‘const char*’ because there is no way to add a trailing nul (unlike the .c_str() method on various stronger classes).
- StringRef doesn’t own or keep alive the underlying string bytes. As such it can easily lead to dangling pointers, and is not suitable for embedding in datastructures in most cases (instead, use an std::string or something like that).
- For the same reason, StringRef cannot be used as the return value of a method if the method “computes” the result string. Instead, use std::string.
- StringRef does not allow you to mutate the pointed-to string bytes, nor does it allow you to insert or remove bytes from the range. For editing operations like this, it interoperates with the Twine class.
Because of its strengths and limitations, it is very common for a function to take a StringRef and for a method on an object to return a StringRef that points into some string that it owns.
llvm/ADT/Twine.h¶
The Twine class is used as an intermediary datatype for APIs that want to take a string that can be constructed inline with a series of concatenations. Twine works by forming recursive instances of the Twine datatype (a simple value object) on the stack as temporary objects, linking them together into a tree which is then linearized when the Twine is consumed. Twine is only safe to use as the argument to a function, and should always be a const reference, e.g.:
void foo(const Twine &T);
...
StringRef X = ...
unsigned i = ...
foo(X + "." + Twine(i));
This example forms a string like “blarg.42” by concatenating the values together, and does not form intermediate strings containing “blarg” or “blarg.”.
Because Twine is constructed with temporary objects on the stack, and because these instances are destroyed at the end of the current statement, it is an inherently dangerous API. For example, this simple variant contains undefined behavior and will probably crash:
void foo(const Twine &T);
...
StringRef X = ...
unsigned i = ...
const Twine &Tmp = X + "." + Twine(i);
foo(Tmp);
... because the temporaries are destroyed before the call. That said, Twine’s are much more efficient than intermediate std::string temporaries, and they work really well with StringRef. Just be aware of their limitations.
llvm/ADT/SmallString.h¶
SmallString is a subclass of SmallVector that adds some convenience APIs like += that takes StringRef’s. SmallString avoids allocating memory in the case when the preallocated space is enough to hold its data, and it calls back to general heap allocation when required. Since it owns its data, it is very safe to use and supports full mutation of the string.
Like SmallVector, the big downside to SmallString is its sizeof. While SmallStrings are optimized for small strings, they themselves are not particularly small. This means that they work great for temporary scratch buffers on the stack, but should not generally be put into the heap: it is very rare to see a SmallString as the member of a frequently-allocated heap data structure or returned by-value.
std::string¶
The standard C++ std::string class is a very general class that (like SmallString) owns its underlying data. sizeof(std::string) is very reasonable, so it can be embedded into heap data structures and returned by-value. On the other hand, std::string is highly inefficient for inline editing (e.g. concatenating a bunch of stuff together), and because it is provided by the standard library, its performance characteristics depend a lot on the host standard library (e.g. libc++ and MSVC provide a highly optimized string class, while GCC contains a really slow implementation).
The major disadvantage of std::string is that almost every operation that makes them larger can allocate memory, which is slow. As such, it is better to use SmallVector or Twine as a scratch buffer, but then use std::string to persist the result.
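For example, a hedged sketch combining these recommendations (makeLabel is illustrative):

std::string makeLabel(StringRef Base, unsigned N) {
  SmallString<32> Buf;  // stack scratch buffer, usually avoids malloc
  // Render the concatenation into the buffer, then persist only the result.
  StringRef Result = (Base + "." + Twine(N)).toStringRef(Buf);
  return Result.str();
}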
Set-Like Containers (std::set, SmallSet, SetVector, etc)¶
Set-like containers are useful when you need to canonicalize multiple values into a single representation. There are several different choices for how to do this, providing various trade-offs.
A sorted ‘vector’¶
If you intend to insert a lot of elements, then do a lot of queries, a great approach is to use a vector (or other sequential container) with std::sort+std::unique to remove duplicates. This approach works really well if your usage pattern has these two distinct phases (insert then query), and can be coupled with a good choice of sequential container.
This combination provides several nice properties: the result data is
contiguous in memory (good for cache locality), has few allocations, is easy to
address (iterators in the final vector are just indices or pointers), and can be
efficiently queried with a standard binary search (e.g.
std::lower_bound
; if you want the whole range of elements comparing
equal, use std::equal_range
).
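A minimal sketch of the two-phase pattern, using only standard algorithms:

std::vector<int> Set;

// Phase 1: bulk insertion; duplicates are fine for now.
Set.push_back(3);
Set.push_back(1);
Set.push_back(3);

// Canonicalize once: sort, then erase adjacent duplicates.
std::sort(Set.begin(), Set.end());
Set.erase(std::unique(Set.begin(), Set.end()), Set.end());

// Phase 2: query with binary search.
bool HasThree = std::binary_search(Set.begin(), Set.end(), 3);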
llvm/ADT/SmallSet.h¶
If you have a set-like data structure that is usually small and whose elements
are reasonably small, a SmallSet<Type, N>
is a good choice. This set has
space for N elements in place (thus, if the set is dynamically smaller than N,
no malloc traffic is required) and accesses them with a simple linear search.
When the set grows beyond N elements, it allocates a more expensive
representation that guarantees efficient access (for most types, it falls back
to std::set, but for pointers it uses something far better,
SmallPtrSet).
The magic of this class is that it handles small sets extremely efficiently, but gracefully handles extremely large sets without loss of efficiency. The drawback is that the interface is quite small: it supports insertion, queries and erasing, but does not support iteration.
llvm/ADT/SmallPtrSet.h¶
SmallPtrSet
has all the advantages of SmallSet
(and a SmallSet
of
pointers is transparently implemented with a SmallPtrSet
), but also supports
iterators. If more than N insertions are performed, a single quadratically
probed hash table is allocated and grows as needed, providing extremely
efficient access (constant time insertion/deleting/queries with low constant
factors) and is very stingy with malloc traffic.
Note that, unlike std::set, the iterators of SmallPtrSet
are invalidated whenever an insertion occurs. Also, the values visited by the
iterators are not visited in sorted order.
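A typical usage sketch (assuming an insert() that returns an (iterator, inserted) pair, as in recent releases):

SmallPtrSet<Instruction *, 16> Visited;

// insert() reports whether the pointer was newly added, which makes
// the classic "visit each instruction once" idiom cheap:
if (Visited.insert(I).second) {
  // first time we have seen I; process it here
}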
llvm/ADT/StringSet.h¶
StringSet
is a thin wrapper around StringMap<char>,
and it allows efficient storage and retrieval of unique strings.
Functionally analogous to SmallSet<StringRef>
, StringSet
also supports
iteration. (The iterator dereferences to a StringMapEntry<char>
, so you
need to call i->getKey()
to access the item of the StringSet.) On the
other hand, StringSet
doesn’t support range-insertion and
copy-construction, which SmallSet and SmallPtrSet do support.
llvm/ADT/DenseSet.h¶
DenseSet is a simple quadratically probed hash table. It excels at supporting small values: it uses a single allocation to hold all of the pairs that are currently inserted in the set. DenseSet is a great way to unique small values that are not simple pointers (use SmallPtrSet for pointers). Note that DenseSet has the same requirements for the value type that DenseMap has.
llvm/ADT/SparseSet.h¶
SparseSet holds a small number of objects identified by unsigned keys of moderate size. It uses a lot of memory, but provides operations that are almost as fast as a vector. Typical keys are physical registers, virtual registers, or numbered basic blocks.
SparseSet is useful for algorithms that need very fast clear/find/insert/erase and fast iteration over small sets. It is not intended for building composite data structures.
llvm/ADT/SparseMultiSet.h¶
SparseMultiSet adds multiset behavior to SparseSet, while retaining SparseSet’s desirable attributes. Like SparseSet, it typically uses a lot of memory, but provides operations that are almost as fast as a vector. Typical keys are physical registers, virtual registers, or numbered basic blocks.
SparseMultiSet is useful for algorithms that need very fast clear/find/insert/erase of the entire collection, and iteration over sets of elements sharing a key. It is often a more efficient choice than using composite data structures (e.g. vector-of-vectors, map-of-vectors). It is not intended for building composite data structures.
llvm/ADT/FoldingSet.h¶
FoldingSet is an aggregate class that is really good at uniquing expensive-to-create or polymorphic objects. It is a combination of a chained hash table with intrusive links (uniqued objects are required to inherit from FoldingSetNode) that uses SmallVector as part of its ID process.
Consider a case where you want to implement a “getOrCreateFoo” method for a complex object (for example, a node in the code generator). The client has a description of what it wants to generate (it knows the opcode and all the operands), but we don’t want to ‘new’ a node, then try inserting it into a set only to find out it already exists, at which point we would have to delete it and return the node that already exists.
To support this style of client, FoldingSet performs a query with a FoldingSetNodeID (which wraps SmallVector) that can be used to describe the element that we want to query for. The query either returns the element matching the ID or it returns an opaque ID that indicates where insertion should take place. Construction of the ID usually does not require heap traffic.
Because FoldingSet uses intrusive links, it can support polymorphic objects in the set (for example, you can have SDNode instances mixed with LoadSDNodes). Because the elements are individually allocated, pointers to the elements are stable: inserting or removing elements does not invalidate any pointers to other elements.
<set>¶
std::set
is a reasonable all-around set class, which is decent at many
things but great at nothing. std::set allocates memory for each element
inserted (thus it is very malloc intensive) and typically stores three pointers
per element in the set (thus adding a large amount of per-element space
overhead). It offers guaranteed log(n) performance, which is not particularly
fast from a complexity standpoint (particularly if the elements of the set are
expensive to compare, like strings), and has extremely high constant factors for
lookup, insertion and removal.
The advantages of std::set are that its iterators are stable (deleting or inserting an element from the set does not affect iterators or pointers to other elements) and that iteration over the set is guaranteed to be in sorted order. If the elements in the set are large, then the relative overhead of the pointers and malloc traffic is not a big deal, but if the elements of the set are small, std::set is almost never a good choice.
llvm/ADT/SetVector.h¶
LLVM’s SetVector<Type>
is an adapter class that combines your choice of a
set-like container along with a Sequential Container. The
important property that this provides is efficient insertion with uniquing
(duplicate elements are ignored) with iteration support. It implements this by
inserting elements into both a set-like container and the sequential container,
using the set-like container for uniquing and the sequential container for
iteration.
The difference between SetVector and other sets is that the order of iteration is guaranteed to match the order of insertion into the SetVector. This property is really important for things like sets of pointers. Because pointer values are non-deterministic (e.g. vary across runs of the program on different machines), iterating over the pointers in the set will not be in a well-defined order.
The drawback of SetVector is that it requires twice as much space as a normal set and has the sum of constant factors from the set-like container and the sequential container that it uses. Use it only if you need to iterate over the elements in a deterministic order. SetVector is also expensive to delete elements out of (linear time), unless you use its “pop_back” method, which is faster.
SetVector
is an adapter class that defaults to using std::vector
and a
size 16 SmallSet
for the underlying containers, so it is quite expensive.
However, "llvm/ADT/SetVector.h"
also provides a SmallSetVector
class,
which defaults to using a SmallVector
and SmallSet
of a specified size.
If you use this, and if your sets are dynamically smaller than N
, you will
save a lot of heap traffic.
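For example, a small worklist sketch:

SmallSetVector<Value *, 8> Worklist;
Worklist.insert(V1);
Worklist.insert(V2);
Worklist.insert(V1);  // duplicate: ignored

for (Value *V : Worklist) {
  // visited in insertion order: V1, then V2
}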
llvm/ADT/UniqueVector.h¶
UniqueVector is similar to SetVector but it retains a unique ID for each element inserted into the set. It internally contains a map and a vector, and it assigns a unique ID for each value inserted into the set.
UniqueVector is very expensive: its cost is the sum of the cost of maintaining both the map and vector, it has high complexity, high constant factors, and produces a lot of malloc traffic. It should be avoided.
llvm/ADT/ImmutableSet.h¶
ImmutableSet is an immutable (functional) set implementation based on an AVL tree. Adding or removing elements is done through a Factory object and results in the creation of a new ImmutableSet object. If an ImmutableSet already exists with the given contents, then the existing one is returned; equality is compared with a FoldingSetNodeID. The time and space complexity of add or remove operations is logarithmic in the size of the original set.
There is no method for returning an element of the set; you can only check for membership.
Other Set-Like Container Options¶
The STL provides several other options, such as std::multiset and the various “hash_set” like containers (whether from C++ TR1 or from the SGI library). We never use hash_set and unordered_set because they are generally very expensive (each insertion requires a malloc) and very non-portable.
std::multiset is useful if you’re not interested in elimination of duplicates, but has all the drawbacks of std::set. A sorted vector (where you don’t delete duplicate entries) or some other approach is almost always better.
Map-Like Containers (std::map, DenseMap, etc)¶
Map-like containers are useful when you want to associate data to a key. As usual, there are a lot of different ways to do this. :)
A sorted ‘vector’¶
If your usage pattern follows a strict insert-then-query approach, you can trivially use the same approach as sorted vectors for set-like containers. The only difference is that your query function (which uses std::lower_bound to get efficient log(n) lookup) should only compare the key, not both the key and value. This yields the same advantages as sorted vectors for sets.
llvm/ADT/StringMap.h¶
Strings are commonly used as keys in maps, and they are difficult to support efficiently: they are variable length, inefficient to hash and compare when long, expensive to copy, etc. StringMap is a specialized container designed to cope with these issues. It supports mapping an arbitrary range of bytes to an arbitrary other object.
The StringMap implementation uses a quadratically-probed hash table, where the
buckets store a pointer to the heap allocated entries (and some other stuff).
The entries in the map must be heap allocated because the strings are variable
length. The string data (key) and the element object (value) are stored in the
same allocation with the string data immediately after the element object.
This container guarantees the “(char*)(&Value+1)
” points to the key string
for a value.
The StringMap is very fast for several reasons: quadratic probing is very cache efficient for lookups, the hash value of strings in buckets is not recomputed when looking up an element, StringMap rarely has to touch the memory for unrelated objects when looking up a value (even when hash collisions happen), hash table growth does not recompute the hash values for strings already in the table, and each pair in the map is stored in a single allocation (the string data is stored in the same allocation as the Value of a pair).
StringMap also provides query methods that take byte ranges, so it only ever copies a string if a value is inserted into the table.
StringMap iteration order, however, is not guaranteed to be deterministic, so any uses which require that should instead use a std::map.
llvm/ADT/IndexedMap.h¶
IndexedMap is a specialized container for mapping small dense integers (or values that can be mapped to small dense integers) to some other type. It is internally implemented as a vector with a mapping function that maps the keys to the dense integer range.
This is useful for cases like virtual registers in the LLVM code generator: they have a dense mapping that is offset by a compile-time constant (the first virtual register ID).
llvm/ADT/DenseMap.h¶
DenseMap is a simple quadratically probed hash table. It excels at supporting small keys and values: it uses a single allocation to hold all of the pairs that are currently inserted in the map. DenseMap is a great way to map pointers to pointers, or map other small types to each other.
There are several aspects of DenseMap that you should be aware of, however. The iterators in a DenseMap are invalidated whenever an insertion occurs, unlike map. Also, because DenseMap allocates space for a large number of key/value pairs (it starts with 64 by default), it will waste a lot of space if your keys or values are large. Finally, you must implement a partial specialization of DenseMapInfo for the key that you want, if it isn’t already supported. This is required to tell DenseMap about two special marker values (which can never be inserted into the map) that it needs internally.
DenseMap’s find_as() method supports lookup operations using an alternate key type. This is useful in cases where the normal key type is expensive to construct, but cheap to compare against. The DenseMapInfo is responsible for defining the appropriate comparison and hashing methods for each alternate key type used.
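As a hedged sketch, a specialization for a simple wrapper key might look like this (MyKey is illustrative); the two marker values must be values that can never be inserted into the map:

struct MyKey { unsigned ID; };

namespace llvm {
template <> struct DenseMapInfo<MyKey> {
  static inline MyKey getEmptyKey() { return MyKey{~0U}; }          // marker 1
  static inline MyKey getTombstoneKey() { return MyKey{~0U - 1}; }  // marker 2
  static unsigned getHashValue(const MyKey &K) { return K.ID * 37U; }
  static bool isEqual(const MyKey &LHS, const MyKey &RHS) {
    return LHS.ID == RHS.ID;
  }
};
} // namespace llvm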
llvm/IR/ValueMap.h¶
ValueMap is a wrapper around a DenseMap mapping
Value*
s (or subclasses) to another type. When a Value is deleted or
RAUW’ed, ValueMap will update itself so the new version of the key is mapped to
the same value, just as if the key were a WeakVH. You can configure exactly how
this happens, and what else happens on these two events, by passing a Config
parameter to the ValueMap template.
llvm/ADT/IntervalMap.h¶
IntervalMap is a compact map for small keys and values. It maps key intervals instead of single keys, and it will automatically coalesce adjacent intervals. When the map only contains a few intervals, they are stored in the map object itself to avoid allocations.
The IntervalMap iterators are quite big, so they should not be passed around as STL iterators. The heavyweight iterators allow a smaller data structure.
<map>¶
std::map has similar characteristics to std::set: it uses a single allocation per pair inserted into the map, it offers log(n) lookup with an extremely large constant factor, imposes a space penalty of 3 pointers per pair in the map, etc.
std::map is most useful when your keys or values are very large, if you need to iterate over the collection in sorted order, or if you need stable iterators into the map (i.e. they don’t get invalidated if an insertion or deletion of another element takes place).
llvm/ADT/MapVector.h¶
MapVector<KeyT,ValueT>
provides a subset of the DenseMap interface. The
main difference is that the iteration order is guaranteed to be the insertion
order, making it an easy (but somewhat expensive) solution for non-deterministic
iteration over maps of pointers.
It is implemented by mapping from key to an index in a vector of key,value
pairs. This provides fast lookup and iteration, but has two main drawbacks:
the key is stored twice and removing elements takes linear time. If it is
necessary to remove elements, it’s best to remove them in bulk using
remove_if()
.
llvm/ADT/IntEqClasses.h¶
IntEqClasses provides a compact representation of equivalence classes of small integers. Initially, each integer in the range 0..n-1 has its own equivalence class. Classes can be joined by passing two class representatives to the join(a, b) method. Two integers are in the same class when findLeader() returns the same representative.
Once all equivalence classes are formed, the map can be compressed so each integer 0..n-1 maps to an equivalence class number in the range 0..m-1, where m is the total number of equivalence classes. The map must be uncompressed before it can be edited again.
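A small usage sketch, assuming the join/findLeader/compress interface described above:

IntEqClasses EC(4);       // four classes: {0} {1} {2} {3}
EC.join(1, 3);            // now {0} {1,3} {2}
bool Same = EC.findLeader(1) == EC.findLeader(3);  // true

EC.compress();            // renumber classes densely as 0..m-1
unsigned C = EC[1];       // equal to EC[3], distinct from EC[0] and EC[2]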
llvm/ADT/ImmutableMap.h¶
ImmutableMap is an immutable (functional) map implementation based on an AVL tree. Adding or removing elements is done through a Factory object and results in the creation of a new ImmutableMap object. If an ImmutableMap already exists with the given key set, then the existing one is returned; equality is compared with a FoldingSetNodeID. The time and space complexity of add or remove operations is logarithmic in the size of the original map.
Other Map-Like Container Options¶
The STL provides several other options, such as std::multimap and the various “hash_map” like containers (whether from C++ TR1 or from the SGI library). We never use hash_map and unordered_map because they are generally very expensive (each insertion requires a malloc) and very non-portable.
std::multimap is useful if you want to map a key to multiple values, but has all the drawbacks of std::map. A sorted vector or some other approach is almost always better.
Bit storage containers (BitVector, SparseBitVector)¶
Unlike the other containers, there are only two bit storage containers, and choosing when to use each is relatively straightforward.
One additional option is std::vector<bool>
: we discourage its use for two
reasons: 1) the implementation in many common compilers (e.g. commonly
available versions of GCC) is extremely inefficient and 2) the C++ standards
committee is likely to deprecate this container and/or change it significantly
somehow. In any case, please don’t use it.
BitVector¶
The BitVector container provides a dynamic size set of bits for manipulation. It supports individual bit setting/testing, as well as set operations. The set operations take time O(size of bitvector), but operations are performed one word at a time, instead of one bit at a time. This makes the BitVector very fast for set operations compared to other containers. Use the BitVector when you expect the number of set bits to be high (i.e. a dense set).
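For example (sizes and indices are arbitrary):

BitVector Live(128);  // 128 bits, all initially false
Live.set(5);
Live.set(7);

BitVector Killed(128);
Killed.set(7);

Live &= Killed;                 // word-at-a-time set intersection
bool StillLive = Live.test(7);  // true; bit 5 was cleared by the intersection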
SmallBitVector¶
The SmallBitVector container provides the same interface as BitVector, but it is optimized for the case where only a small number of bits, less than 25 or so, are needed. It also transparently supports larger bit counts, but slightly less efficiently than a plain BitVector, so SmallBitVector should only be used when larger counts are rare.
At this time, SmallBitVector does not support set operations (and, or, xor), and its operator[] does not provide an assignable lvalue.
SparseBitVector¶
The SparseBitVector container is much like BitVector, with one major difference: Only the bits that are set, are stored. This makes the SparseBitVector much more space efficient than BitVector when the set is sparse, as well as making set operations O(number of set bits) instead of O(size of universe). The downside to the SparseBitVector is that setting and testing of random bits is O(N), and on large SparseBitVectors, this can be slower than BitVector. In our implementation, setting or testing bits in sorted order (either forwards or reverse) is O(1) worst case. Testing and setting bits within 128 bits (depends on size) of the current bit is also O(1). As a general statement, testing/setting bits in a SparseBitVector is O(distance away from last set bit).
Helpful Hints for Common Operations¶
This section describes how to perform some very simple transformations of LLVM code. This is meant to give examples of common idioms used, showing the practical side of LLVM transformations.
Because this is a “how-to” section, you should also read about the main classes that you will be working with. The Core LLVM Class Hierarchy Reference contains details and descriptions of the main classes that you should know about.
Basic Inspection and Traversal Routines¶
The LLVM compiler infrastructure has many different data structures that may be
traversed. Following the example of the C++ standard template library, the
techniques used to traverse these various data structures are all basically the
same. For an enumerable sequence of values, the XXXbegin()
function (or
method) returns an iterator to the start of the sequence, the XXXend()
function returns an iterator pointing to one past the last valid element of the
sequence, and there is some XXXiterator
data type that is common between the
two operations.
Because the pattern for iteration is common across many different aspects of the program representation, the standard template library algorithms may be used on them, and it is easier to remember how to iterate. First we show a few common examples of the data structures that need to be traversed. Other data structures are traversed in very similar ways.
Iterating over the BasicBlock
in a Function
¶
It’s quite common to have a Function
instance that you’d like to transform
in some way; in particular, you’d like to manipulate its BasicBlock
s. To
facilitate this, you’ll need to iterate over all of the BasicBlock
s that
constitute the Function
. The following is an example that prints the name
of a BasicBlock
and the number of Instruction
s it contains:
// func is a pointer to a Function instance
for (Function::iterator i = func->begin(), e = func->end(); i != e; ++i)
// Print out the name of the basic block if it has one, and then the
// number of instructions that it contains
errs() << "Basic block (name=" << i->getName() << ") has "
<< i->size() << " instructions.\n";
Note that i can be used as if it were a pointer for the purposes of invoking
member functions of the Instruction
class. This is because the indirection
operator is overloaded for the iterator classes. In the above code, the
expression i->size()
is exactly equivalent to (*i).size()
just like
you’d expect.
Iterating over the Instruction
in a BasicBlock
¶
Just like when dealing with BasicBlock
s in Function
s, it’s easy to
iterate over the individual instructions that make up BasicBlock
s. Here’s
a code snippet that prints out each instruction in a BasicBlock
:
// blk is a pointer to a BasicBlock instance
for (BasicBlock::iterator i = blk->begin(), e = blk->end(); i != e; ++i)
// The next statement works since operator<<(ostream&,...)
// is overloaded for Instruction&
errs() << *i << "\n";
However, this isn’t really the best way to print out the contents of a
BasicBlock
! Since the ostream operators are overloaded for virtually
anything you’ll care about, you could have just invoked the print routine on the
basic block itself: errs() << *blk << "\n";
.
Iterating over the Instruction
in a Function
¶
If you’re finding that you commonly iterate over a Function
‘s
BasicBlock
s and then that BasicBlock
‘s Instruction
s,
InstIterator
should be used instead. You’ll need to include
llvm/IR/InstIterator.h
(doxygen) and then instantiate
InstIterator
s explicitly in your code. Here’s a small example that shows
how to dump all instructions in a function to the standard error stream:
#include "llvm/IR/InstIterator.h"
// F is a pointer to a Function instance
for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I)
errs() << *I << "\n";
Easy, isn’t it? You can also use InstIterator
s to fill a work list with
its initial contents. For example, if you wanted to initialize a work list to
contain all instructions in a Function
F, all you would need to do is
something like:
std::set<Instruction*> worklist;
// or better yet, SmallPtrSet<Instruction*, 64> worklist;
for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I)
worklist.insert(&*I);
The STL set worklist
would now contain all instructions in the Function
pointed to by F.
Turning an iterator into a class pointer (and vice-versa)¶
Sometimes, it’ll be useful to grab a reference (or pointer) to a class instance
when all you’ve got at hand is an iterator. Well, extracting a reference or a
pointer from an iterator is very straight-forward. Assuming that i
is a
BasicBlock::iterator
and j
is a BasicBlock::const_iterator
:
Instruction& inst = *i; // Grab reference to instruction reference
Instruction* pinst = &*i; // Grab pointer to instruction reference
const Instruction& inst = *j;
However, the iterators you’ll be working with in the LLVM framework are special: they will automatically convert to a ptr-to-instance type whenever they need to. Instead of dereferencing the iterator and then taking the address of the result, you can simply assign the iterator to the proper pointer type and you get the dereference and address-of operation as a result of the assignment (behind the scenes, this is a result of overloading casting mechanisms). Thus the second line of the last example,
Instruction *pinst = &*i;
is semantically equivalent to
Instruction *pinst = i;
It’s also possible to turn a class pointer into the corresponding iterator, and this is a constant time operation (very efficient). The following code snippet illustrates use of the conversion constructors provided by LLVM iterators. By using these, you can explicitly grab the iterator of something without actually obtaining it via iteration over some structure:
void printNextInstruction(Instruction* inst) {
BasicBlock::iterator it(inst);
++it; // After this line, it refers to the instruction after *inst
if (it != inst->getParent()->end()) errs() << *it << "\n";
}
Unfortunately, these implicit conversions come at a cost; they prevent these
iterators from conforming to standard iterator conventions, and thus from being
usable with standard algorithms and containers. For example, they prevent the
following code, where B
is a BasicBlock
, from compiling:
llvm::SmallVector<llvm::Instruction *, 16>(B->begin(), B->end());
Because of this, these implicit conversions may be removed some day, and
operator*
changed to return a pointer instead of a reference.
Finding call sites: a slightly more complex example¶
Say that you’re writing a FunctionPass and would like to count all the locations
in the entire module (that is, across every Function
) where a certain
function (i.e., some Function *
) is already in scope. As you’ll learn
later, you may want to use an InstVisitor
to accomplish this in a much more
straight-forward manner, but this example will allow us to explore how you’d do
it if you didn’t have InstVisitor
around. In pseudo-code, this is what we
want to do:
initialize callCounter to zero
for each Function f in the Module
for each BasicBlock b in f
for each Instruction i in b
if (i is a CallInst and calls the given function)
increment callCounter
And the actual code is (remember, because we’re writing a FunctionPass
, our
FunctionPass
-derived class simply has to override the runOnFunction
method):
Function* targetFunc = ...;
class OurFunctionPass : public FunctionPass {
public:
OurFunctionPass(): callCounter(0) { }
virtual runOnFunction(Function& F) {
for (Function::iterator b = F.begin(), be = F.end(); b != be; ++b) {
for (BasicBlock::iterator i = b->begin(), ie = b->end(); i != ie; ++i) {
if (CallInst* callInst = dyn_cast<CallInst>(&*i)) {
// We know we've encountered a call instruction, so we
// need to determine if it's a call to the
// function pointed to by targetFunc or not.
if (callInst->getCalledFunction() == targetFunc)
++callCounter;
}
}
}
return false; // We only counted; the function was not modified.
}
private:
unsigned callCounter;
};
Treating calls and invokes the same way¶
You may have noticed that the previous example was a bit oversimplified in that
it did not deal with call sites generated by ‘invoke’ instructions. In this,
and in other situations, you may find that you want to treat CallInst
s and
InvokeInst
s the same way, even though their most-specific common base
class is Instruction
, which includes lots of less closely-related things.
For these cases, LLVM provides a handy wrapper class called CallSite
(doxygen). It is
essentially a wrapper around an Instruction
pointer, with some methods that
provide functionality common to CallInst
s and InvokeInst
s.
This class has “value semantics”: it should be passed by value, not by reference
and it should not be dynamically allocated or deallocated using operator new
or operator delete
. It is efficiently copyable, assignable and
constructable, with costs equivalents to that of a bare pointer. If you look at
its definition, it has only a single pointer member.
Iterating over def-use & use-def chains¶
Frequently, we might have an instance of the Value class (doxygen) and we want to determine which Users use the Value. The list of all Users of a particular Value is called a def-use chain. For example, let's say we have a Function* named F that points to a particular function foo. Finding all of the instructions that use foo is as simple as iterating over the def-use chain of F:
Function *F = ...;

for (User *U : F->users()) {
  if (Instruction *Inst = dyn_cast<Instruction>(U)) {
    errs() << "F is used in instruction:\n";
    errs() << *Inst << "\n";
  }
}
Alternatively, it's common to have an instance of the User class (doxygen) and need to know what Values are used by it. The list of all Values used by a User is known as a use-def chain. Instances of class Instruction are common Users, so we might want to iterate over all of the values that a particular instruction uses (that is, the operands of the particular Instruction):
Instruction *pi = ...;
for (Use &U : pi->operands()) {
Value *v = U.get();
// ...
}
Declaring objects as const is an important tool for enforcing mutation-free algorithms (such as analyses, etc.). For this purpose, the above iterators come in constant flavors as Value::const_use_iterator and Value::const_op_iterator. They automatically arise when calling use/op_begin() on const Value*s or const User*s respectively. Upon dereferencing, they return const Use*s. Otherwise the above patterns remain unchanged.
Iterating over predecessors & successors of blocks¶
Iterating over the predecessors and successors of a block is quite easy with the routines defined in "llvm/IR/CFG.h". Just use code like this to iterate over all predecessors of BB:
#include "llvm/Support/CFG.h"
BasicBlock *BB = ...;
for (pred_iterator PI = pred_begin(BB), E = pred_end(BB); PI != E; ++PI) {
BasicBlock *Pred = *PI;
// ...
}
Similarly, to iterate over successors use succ_iterator/succ_begin/succ_end.
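A sketch of the successor loop, mirroring the predecessor example above:

BasicBlock *BB = ...;

for (succ_iterator SI = succ_begin(BB), E = succ_end(BB); SI != E; ++SI) {
  BasicBlock *Succ = *SI;
  // ...
}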
Making simple changes¶
There are some primitive transformation operations present in the LLVM infrastructure that are worth knowing about. When performing transformations, it’s fairly common to manipulate the contents of basic blocks. This section describes some of the common methods for doing so and gives example code.
Creating and inserting new Instructions¶
Instantiating Instructions
Creation of Instructions is straight-forward: simply call the constructor for the kind of instruction to instantiate and provide the necessary parameters. For example, an AllocaInst only requires a (const-ptr-to) Type. Thus:
AllocaInst* ai = new AllocaInst(Type::Int32Ty);
will create an AllocaInst instance that represents the allocation of one integer in the current stack frame, at run time. Each Instruction subclass is likely to have varying default parameters which change the semantics of the instruction, so refer to the doxygen documentation for the subclass of Instruction that you're interested in instantiating.
Naming values
It is very useful to name the values of instructions when you're able to, as this facilitates the debugging of your transformations. If you end up looking at generated LLVM machine code, you definitely want to have logical names associated with the results of instructions! By supplying a value for the Name (default) parameter of the Instruction constructor, you associate a logical name with the result of the instruction's execution at run time. For example, say that I'm writing a transformation that dynamically allocates space for an integer on the stack, and that integer is going to be used as some kind of index by some other code. To accomplish this, I place an AllocaInst at the first point in the first BasicBlock of some Function, and I'm intending to use it within the same Function. I might do:
AllocaInst* pa = new AllocaInst(Type::Int32Ty, 0, "indexLoc");
where indexLoc is now the logical name of the instruction's execution value, which is a pointer to an integer on the run time stack.
Inserting instructions
There are essentially three ways to insert an Instruction into an existing sequence of instructions that form a BasicBlock:
Insertion into an explicit instruction list

Given a BasicBlock* pb, an Instruction* pi within that BasicBlock, and a newly-created instruction we wish to insert before *pi, we do the following:

BasicBlock *pb = ...;
Instruction *pi = ...;
Instruction *newInst = new Instruction(...);

pb->getInstList().insert(pi, newInst); // Inserts newInst before pi in pb

Appending to the end of a BasicBlock is so common that the Instruction class and Instruction-derived classes provide constructors which take a pointer to a BasicBlock to be appended to. For example code that looked like:

BasicBlock *pb = ...;
Instruction *newInst = new Instruction(...);

pb->getInstList().push_back(newInst); // Appends newInst to pb

becomes:

BasicBlock *pb = ...;
Instruction *newInst = new Instruction(..., pb);
which is much cleaner, especially if you are creating long instruction streams.
Insertion into an implicit instruction list

Instruction instances that are already in BasicBlocks are implicitly associated with an existing instruction list: the instruction list of the enclosing basic block. Thus, we could have accomplished the same thing as the above code without being given a BasicBlock by doing:

Instruction *pi = ...;
Instruction *newInst = new Instruction(...);

pi->getParent()->getInstList().insert(pi, newInst);

In fact, this sequence of steps occurs so frequently that the Instruction class and Instruction-derived classes provide constructors which take (as a default parameter) a pointer to an Instruction which the newly-created Instruction should precede. That is, Instruction constructors are capable of inserting the newly-created instance into the BasicBlock of a provided instruction, immediately before that instruction. Using an Instruction constructor with an insertBefore (default) parameter, the above code becomes:

Instruction* pi = ...;
Instruction* newInst = new Instruction(..., pi);

which is much cleaner, especially if you're creating a lot of instructions and adding them to BasicBlocks.

Insertion using an instance of IRBuilder

Inserting several Instructions can be quite laborious using the previous methods. The IRBuilder is a convenience class that can be used to add several instructions to the end of a BasicBlock or before a particular Instruction. It also supports constant folding and renaming named registers (see IRBuilder's template arguments).

The example below demonstrates a very simple use of the IRBuilder where three instructions are inserted before the instruction pi. The first two instructions are Call instructions and the third instruction multiplies the return values of the two calls.

Instruction *pi = ...;
IRBuilder<> Builder(pi);
CallInst* callOne = Builder.CreateCall(...);
CallInst* callTwo = Builder.CreateCall(...);
Value* result = Builder.CreateMul(callOne, callTwo);

The example below is similar to the above example except that the created IRBuilder inserts instructions at the end of the BasicBlock pb.

BasicBlock *pb = ...;
IRBuilder<> Builder(pb);
CallInst* callOne = Builder.CreateCall(...);
CallInst* callTwo = Builder.CreateCall(...);
Value* result = Builder.CreateMul(callOne, callTwo);
See Kaleidoscope: Code generation to LLVM IR for a practical use of the IRBuilder.
Deleting Instructions¶
Deleting an instruction from an existing sequence of instructions that form a BasicBlock is very straight-forward: just call the instruction's eraseFromParent() method. For example:
Instruction *I = .. ;
I->eraseFromParent();
This unlinks the instruction from its containing basic block and deletes it. If you'd just like to unlink the instruction from its containing basic block but not delete it, you can use the removeFromParent() method.
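For instance, a hedged sketch of unlinking an instruction and re-inserting it into another block (the destination block OtherBB is illustrative, and is assumed not to have its terminator in place yet):

Instruction *I = ...;
BasicBlock *OtherBB = ...;

I->removeFromParent();               // Unlink from the current block; the instruction stays alive.
OtherBB->getInstList().push_back(I); // Re-insert it at the end of OtherBB.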
Replacing an Instruction with another Value¶
Replacing individual instructions¶
Including "llvm/Transforms/Utils/BasicBlockUtils.h" permits use of two very useful replace functions: ReplaceInstWithValue and ReplaceInstWithInst.
ReplaceInstWithValue

This function replaces all uses of a given instruction with a value, and then removes the original instruction. The following example illustrates the replacement of the result of a particular AllocaInst that allocates memory for a single integer with a null pointer to an integer.

AllocaInst* instToReplace = ...;
BasicBlock::iterator ii(instToReplace);

ReplaceInstWithValue(instToReplace->getParent()->getInstList(), ii,
                     Constant::getNullValue(PointerType::getUnqual(Type::Int32Ty)));

ReplaceInstWithInst

This function replaces a particular instruction with another instruction, inserting the new instruction into the basic block at the location where the old instruction was, and replacing any uses of the old instruction with the new instruction. The following example illustrates the replacement of one AllocaInst with another.

AllocaInst* instToReplace = ...;
BasicBlock::iterator ii(instToReplace);

ReplaceInstWithInst(instToReplace->getParent()->getInstList(), ii,
                    new AllocaInst(Type::Int32Ty, 0, "ptrToReplacedInt"));
Replacing multiple uses of Users and Values¶
You can use Value::replaceAllUsesWith and User::replaceUsesOfWith to change more than one use at a time. See the doxygen documentation for the Value Class and User Class, respectively, for more information.
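A minimal sketch of both calls (all value names are illustrative):

Value *Old = ...;
Value *New = ...;
User *U = ...;

Old->replaceAllUsesWith(New);   // Every use of Old, anywhere, now refers to New.
U->replaceUsesOfWith(Old, New); // Only U's own operands are rewritten.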
Deleting GlobalVariables¶
Deleting a global variable from a module is just as easy as deleting an Instruction. First, you must have a pointer to the global variable that you wish to delete. You use this pointer to erase it from its parent, the module. For example:
GlobalVariable *GV = .. ;
GV->eraseFromParent();
How to Create Types¶
In generating IR, you may need some complex types. If you know these types statically, you can use TypeBuilder<...>::get(), defined in llvm/Support/TypeBuilder.h, to retrieve them. TypeBuilder has two forms depending on whether you're building types for cross-compilation or native library use. TypeBuilder<T, true> requires that T be independent of the host environment, meaning that it's built out of types from the llvm::types (doxygen) namespace and pointers, functions, arrays, etc. built of those. TypeBuilder<T, false> additionally allows native C types whose size may depend on the host compiler. For example,
FunctionType *ft = TypeBuilder<types::i<8>(types::i<32>*), true>::get();
is easier to read and write than the equivalent
std::vector<const Type*> params;
params.push_back(PointerType::getUnqual(Type::Int32Ty));
FunctionType *ft = FunctionType::get(Type::Int8Ty, params, false);
See the class comment for more details.
Threads and LLVM¶
This section describes the interaction of the LLVM APIs with multithreading, both on the part of client applications, and in the JIT, in the hosted application.
Note that LLVM’s support for multithreading is still relatively young. Up through version 2.5, the execution of threaded hosted applications was supported, but not threaded client access to the APIs. While this use case is now supported, clients must adhere to the guidelines specified below to ensure proper operation in multithreaded mode.
Note that, on Unix-like platforms, LLVM requires the presence of GCC's atomic intrinsics in order to support threaded operation. If you need a multithreading-capable LLVM on a platform without a suitably modern system compiler, consider compiling LLVM and LLVM-GCC in single-threaded mode, and using the resultant compiler to build a copy of LLVM with multithreading support.
Ending Execution with llvm_shutdown()
¶
When you are done using the LLVM APIs, you should call llvm_shutdown()
to
deallocate memory used for internal structures.
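A minimal sketch of a client's main() (assuming the declaration from llvm/Support/ManagedStatic.h):

#include "llvm/Support/ManagedStatic.h"

int main(int argc, char **argv) {
  // ... use the LLVM APIs ...
  llvm::llvm_shutdown(); // Deallocate LLVM's internal structures.
  return 0;
}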
Lazy Initialization with ManagedStatic¶

ManagedStatic is a utility class in LLVM used to implement static initialization of static resources, such as the global type tables. In a single-threaded environment, it implements a simple lazy initialization scheme. When LLVM is compiled with support for multi-threading, however, it uses double-checked locking to implement thread-safe lazy initialization.
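A hedged sketch of how a client might manage its own lazily-initialized resource the same way (MyRegistry is a hypothetical type):

#include "llvm/Support/ManagedStatic.h"

struct MyRegistry { /* ... */ }; // Hypothetical resource.

// Constructed on first dereference; destroyed by llvm_shutdown().
static llvm::ManagedStatic<MyRegistry> TheRegistry;

void useRegistry() {
  MyRegistry &R = *TheRegistry; // First use triggers the (thread-safe) lazy construction.
  (void)R;
}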
Achieving Isolation with LLVMContext¶

LLVMContext is an opaque class in the LLVM API which clients can use to operate multiple, isolated instances of LLVM concurrently within the same address space. For instance, in a hypothetical compile-server, the compilation of an individual translation unit is conceptually independent from all the others, and it would be desirable to be able to compile incoming translation units concurrently on independent server threads. Fortunately, LLVMContext exists to enable just this kind of scenario!
Conceptually, LLVMContext provides isolation. Every LLVM entity (Modules, Values, Types, Constants, etc.) in LLVM's in-memory IR belongs to an LLVMContext. Entities in different contexts cannot interact with each other: Modules in different contexts cannot be linked together, Functions cannot be added to Modules in different contexts, etc. What this means is that it is safe to compile on multiple threads simultaneously, as long as no two threads operate on entities within the same context.
In practice, very few places in the API require the explicit specification of an LLVMContext, other than the Type creation/lookup APIs. Because every Type carries a reference to its owning context, most other entities can determine what context they belong to by looking at their own Type. If you are adding new entities to LLVM IR, please try to maintain this interface design.
For clients that do not require the benefits of isolation, LLVM provides a convenience API getGlobalContext(). This returns a global, lazily initialized LLVMContext that may be used in situations where isolation is not a concern.
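A minimal sketch of per-thread isolation (the module name is illustrative):

#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"

void compileOneUnit() {
  llvm::LLVMContext Ctx;                      // Private to this thread.
  llvm::Module M("my_translation_unit", Ctx); // Every entity in M belongs to Ctx.
  // ... build and compile IR that touches only this context ...
}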
Threads and the JIT¶
LLVM's "eager" JIT compiler is safe to use in threaded programs. Multiple threads can call ExecutionEngine::getPointerToFunction() or ExecutionEngine::runFunction() concurrently, and multiple threads can run code output by the JIT concurrently. The user must still ensure that only one thread accesses IR in a given LLVMContext while another thread might be modifying it. One way to do that is to always hold the JIT lock while accessing IR outside the JIT (the JIT modifies the IR by adding CallbackVHs). Another way is to only call getPointerToFunction() from the LLVMContext's thread.

When the JIT is configured to compile lazily (using ExecutionEngine::DisableLazyCompilation(false)), there is currently a race condition in updating call sites after a function is lazily-jitted. It's still possible to use the lazy JIT in a threaded program if you ensure that only one thread at a time can call any particular lazy stub and that the JIT lock guards any IR access, but we suggest using only the eager JIT in threaded programs.
Advanced Topics¶
This section describes some of the advanced or obscure APIs that most clients do not need to be aware of. These APIs tend to manage the inner workings of the LLVM system, and only need to be accessed in unusual circumstances.
The ValueSymbolTable class¶

The ValueSymbolTable (doxygen) class provides a symbol table that the Function and Module classes use for naming value definitions. The symbol table can provide a name for any Value.

Note that the SymbolTable class should not be directly accessed by most clients. It should only be used when iteration over the symbol table names themselves is required, which is a very special-purpose case. Note that not all LLVM Values have names, and those without names (i.e. they have an empty name) do not exist in the symbol table.

Symbol tables support iteration over the values in the symbol table with begin/end/iterator and support querying to see if a specific name is in the symbol table (with lookup). The ValueSymbolTable class exposes no public mutator methods; instead, simply call setName on a value, which will autoinsert it into the appropriate symbol table.
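A minimal sketch of the setName route (the name is illustrative):

Value *V = ...;
V->setName("counter"); // V is automatically inserted into the enclosing symbol table.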
The User and owned Use classes' memory layout¶

The User (doxygen) class provides a basis for expressing the ownership of User towards other Value instances. The Use (doxygen) helper class is employed to do the bookkeeping and to facilitate O(1) addition and removal.
Interaction and relationship between User and Use objects¶

A subclass of User can choose between incorporating its Use objects or referring to them out-of-line by means of a pointer. A mixed variant (some Uses inline, others hung off) is impractical and breaks the invariant that the Use objects belonging to the same User form a contiguous array.

We have 2 different layouts in the User (sub)classes:
Layout a)

The Use object(s) are inside (resp. at fixed offset) of the User object and there are a fixed number of them.

Layout b)

The Use object(s) are referenced by a pointer to an array from the User object and there may be a variable number of them.

As of v2.4 each layout still possesses a direct pointer to the start of the array of Uses. Though not mandatory for layout a), we stick to this redundancy for the sake of simplicity. The User object also stores the number of Use objects it has. (Theoretically this information can also be calculated given the scheme presented below.)
Special forms of allocation operators (operator new) enforce the following memory layouts:

Layout a) is modelled by prepending the User object by the Use[] array.

...---.---.---.---.-------...
| P | P | P | P | User
'''---'---'---'---'-------'''

Layout b) is modelled by pointing at the Use[] array.

.-------...
| User
'-------'''
  |
  v
.---.---.---.---...
| P | P | P | P |
'---'---'---'---'''

(In the above figures 'P' stands for the Use** that is stored in each Use object in the member Use::Prev.)
The waymarking algorithm¶
Since the Use objects are deprived of the direct (back)pointer to their User objects, there must be a fast and exact method to recover it. This is accomplished by the following scheme:

A bit-encoding in the 2 LSBits (least significant bits) of the Use::Prev makes it possible to find the start of the User object:

00 — binary digit 0
01 — binary digit 1
10 — stop and calculate (s)
11 — full stop (S)

Given a Use*, all we have to do is to walk till we get a stop and we either have a User immediately behind or we have to walk to the next stop picking up digits and calculating the offset:
.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.----------------
| 1 | s | 1 | 0 | 1 | 0 | s | 1 | 1 | 0 | s | 1 | 1 | s | 1 | S | User (or User*)
'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'----------------
|+15 |+10 |+6 |+3 |+1
| | | | | __>
| | | | __________>
| | | ______________________>
| | ______________________________________>
| __________________________________________________________>
Only the significant number of bits need to be stored between the stops, so that the worst case is 20 memory accesses when there are 1000 Use objects associated with a User.
Reference implementation¶
The following literate Haskell fragment demonstrates the concept:
> import Test.QuickCheck
>
> digits :: Int -> [Char] -> [Char]
> digits 0 acc = '0' : acc
> digits 1 acc = '1' : acc
> digits n acc = digits (n `div` 2) $ digits (n `mod` 2) acc
>
> dist :: Int -> [Char] -> [Char]
> dist 0 [] = ['S']
> dist 0 acc = acc
> dist 1 acc = let r = dist 0 acc in 's' : digits (length r) r
> dist n acc = dist (n - 1) $ dist 1 acc
>
> takeLast n ss = reverse $ take n $ reverse ss
>
> test = takeLast 40 $ dist 20 []
>
Printing <test> gives: "1s100000s11010s10100s1111s1010s110s11s1S"
The reverse algorithm computes the length of the string just by examining a certain prefix:
> pref :: [Char] -> Int
> pref "S" = 1
> pref ('s':'1':rest) = decode 2 1 rest
> pref (_:rest) = 1 + pref rest
>
> decode walk acc ('0':rest) = decode (walk + 1) (acc * 2) rest
> decode walk acc ('1':rest) = decode (walk + 1) (acc * 2 + 1) rest
> decode walk acc _ = walk + acc
>
Now, as expected, printing <pref test> gives 40.

We can quickCheck this with the following property:
> testcase = dist 2000 []
> testcaseLength = length testcase
>
> identityProp n = n > 0 && n <= testcaseLength ==> length arr == pref arr
> where arr = takeLast n testcase
>
As expected <quickCheck identityProp> gives:
*Main> quickCheck identityProp
OK, passed 100 tests.
Let’s be a bit more exhaustive:
>
> deepCheck p = check (defaultConfig { configMaxTest = 500 }) p
>
And here is the result of <deepCheck identityProp>:
*Main> deepCheck identityProp
OK, passed 500 tests.
Tagging considerations¶
To maintain the invariant that the 2 LSBits of each Use** in Use never change after being set up, setters of Use::Prev must re-tag the new Use** on every modification. Accordingly getters must strip the tag bits.

For layout b), instead of the User we find a pointer (User* with LSBit set). Following this pointer brings us to the User. A portable trick ensures that the first bytes of User (if interpreted as a pointer) never have the LSBit set. (Portability relies on the fact that all known compilers place the vptr in the first word of the instances.)
Designing Type Hierarchies and Polymorphic Interfaces¶

There are two different design patterns that tend to result in the use of virtual dispatch for methods in a type hierarchy in C++ programs. The first is a genuine type hierarchy where different types in the hierarchy model a specific subset of the functionality and semantics, and these types nest strictly within each other. Good examples of this can be seen in the Value or Type type hierarchies.
A second is the desire to dispatch dynamically across a collection of polymorphic interface implementations. This latter use case can be modeled with virtual dispatch and inheritance by defining an abstract interface base class which all implementations derive from and override. However, this implementation strategy forces an “is-a” relationship to exist that is not actually meaningful. There is often not some nested hierarchy of useful generalizations which code might interact with and move up and down. Instead, there is a singular interface which is dispatched across a range of implementations.
The preferred implementation strategy for the second use case is that of generic programming (sometimes called "compile-time duck typing" or "static polymorphism"). For example, a template over some type parameter T can be instantiated across any particular implementation that conforms to the interface or concept. A good example here is the highly generic properties of any type which models a node in a directed graph. LLVM models these primarily through templates and generic programming. Such templates include the LoopInfoBase and DominatorTreeBase. When this type of polymorphism truly needs dynamic dispatch you can generalize it using a technique called concept-based polymorphism. This pattern emulates the interfaces and behaviors of templates using a very limited form of virtual dispatch for type erasure inside its implementation. You can find examples of this technique in the PassManager.h system, and there is a more detailed introduction to it by Sean Parent in several of his talks and papers:
- Inheritance Is The Base Class of Evil - The GoingNative 2013 talk describing this technique, and probably the best place to start.
- Value Semantics and Concepts-based Polymorphism - The C++Now! 2012 talk describing this technique in more detail.
- Sean Parent’s Papers and Presentations - A Github project full of links to slides, video, and sometimes code.
When deciding between creating a type hierarchy (with either tagged or virtual dispatch) and using templates or concepts-based polymorphism, consider whether there is some refinement of an abstract base class which is a semantically meaningful type on an interface boundary. If anything more refined than the root abstract interface is meaningless to talk about as a partial extension of the semantic model, then your use case likely fits better with polymorphism and you should avoid using virtual dispatch. However, there may be some exigent circumstances that require one technique or the other to be used.
If you do need to introduce a type hierarchy, we prefer to use explicitly closed type hierarchies with manual tagged dispatch and/or RTTI rather than the open inheritance model and virtual dispatch that is more common in C++ code. This is because LLVM rarely encourages library consumers to extend its core types, and leverages the closed and tag-dispatched nature of its hierarchies to generate significantly more efficient code. We have also found that a large amount of our usage of type hierarchies fits better with tag-based pattern matching rather than dynamic dispatch across a common interface. Within LLVM we have built custom helpers to facilitate this design. See this document’s section on isa and dyn_cast and our detailed document which describes how you can implement this pattern for use with the LLVM helpers.
ABI Breaking Checks¶
Checks and asserts that alter the LLVM C++ ABI are predicated on the preprocessor symbol LLVM_ENABLE_ABI_BREAKING_CHECKS – LLVM libraries built with LLVM_ENABLE_ABI_BREAKING_CHECKS are not ABI compatible with LLVM libraries built without it defined. By default, turning on assertions also turns on LLVM_ENABLE_ABI_BREAKING_CHECKS, so a default +Asserts build is not ABI compatible with a default -Asserts build. Clients that want ABI compatibility between +Asserts and -Asserts builds should use the CMake or autoconf build systems to set LLVM_ENABLE_ABI_BREAKING_CHECKS independently of LLVM_ENABLE_ASSERTIONS.
The Core LLVM Class Hierarchy Reference¶
#include "llvm/IR/Type.h"
header source: Type.h
doxygen info: Type Classes

The Core LLVM classes are the primary means of representing the program being inspected or transformed. The core LLVM classes are defined in header files in the include/llvm/IR directory, and implemented in the lib/IR directory. It's worth noting that, for historical reasons, this library is called libLLVMCore.so, not libLLVMIR.so as you might expect.
The Type class and Derived Types¶
Type is a superclass of all type classes. Every Value has a Type. Type cannot be instantiated directly but only through its subclasses. Certain primitive types (VoidType, LabelType, FloatType and DoubleType) have hidden subclasses. They are hidden because they offer no useful functionality beyond what the Type class offers except to distinguish themselves from other subclasses of Type.

All other types are subclasses of DerivedType. Types can be named, but this is not a requirement. There exists exactly one instance of a given shape at any one time. This allows type equality to be performed with address equality of the Type Instance. That is, given two Type* values, the types are identical if the pointers are identical.
Important Public Methods¶
bool isIntegerTy() const: Returns true for any integer type.

bool isFloatingPointTy(): Return true if this is one of the five floating point types.

bool isSized(): Return true if the type has known size. Things that don't have a size are abstract types, labels and void.
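A minimal sketch of these queries (the value V is illustrative):

Type *Ty = V->getType();
if (Ty->isIntegerTy()) {
  // Integer-specific handling.
}
if (!Ty->isSized()) {
  // No storage can be allocated for this type.
}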
Important Derived Types¶
IntegerType

Subclass of DerivedType that represents integer types of any bit width. Any bit width between IntegerType::MIN_INT_BITS (1) and IntegerType::MAX_INT_BITS (~8 million) can be represented.

static const IntegerType* get(unsigned NumBits): get an integer type of a specific bit width.
unsigned getBitWidth() const: Get the bit width of an integer type.

SequentialType

This is subclassed by ArrayType, PointerType and VectorType.

const Type * getElementType() const: Returns the type of each of the elements in the sequential type.

ArrayType

This is a subclass of SequentialType and defines the interface for array types.

unsigned getNumElements() const: Returns the number of elements in the array.

PointerType

- Subclass of SequentialType for pointer types.

VectorType

- Subclass of SequentialType for vector types. A vector type is similar to an ArrayType but is distinguished because it is a first class type whereas ArrayType is not. Vector types are used for vector operations and are usually small vectors of an integer or floating point type.

StructType

- Subclass of DerivedTypes for struct types.

FunctionType

Subclass of DerivedTypes for function types.

bool isVarArg() const: Returns true if it's a vararg function.
const Type * getReturnType() const: Returns the return type of the function.
const Type * getParamType (unsigned i): Returns the type of the ith parameter.
const unsigned getNumParams() const: Returns the number of formal parameters.
The Module class¶
#include "llvm/IR/Module.h"
header source: Module.h
doxygen info: Module Class
The Module class represents the top level structure present in LLVM programs. An LLVM module is effectively either a translation unit of the original program or a combination of several translation units merged by the linker. The Module class keeps track of a list of Functions, a list of GlobalVariables, and a SymbolTable. Additionally, it contains a few helpful member functions that try to make common operations easy.
Important Public Members of the Module class¶

Module::Module(std::string name = "")

Constructing a Module is easy. You can optionally provide a name for it (probably based on the name of the translation unit).

Module::iterator - Typedef for function list iterator
Module::const_iterator - Typedef for const_iterator.
begin(), end(), size(), empty()

These are forwarding methods that make it easy to access the contents of a Module object's Function list.

Module::FunctionListType &getFunctionList()

Returns the list of Functions. This is necessary to use when you need to update the list or perform a complex action that doesn't have a forwarding method.

Module::global_iterator - Typedef for global variable list iterator
Module::const_global_iterator - Typedef for const_iterator.
global_begin(), global_end(), global_size(), global_empty()

These are forwarding methods that make it easy to access the contents of a Module object's GlobalVariable list.

Module::GlobalListType &getGlobalList()

Returns the list of GlobalVariables. This is necessary to use when you need to update the list or perform a complex action that doesn't have a forwarding method.

SymbolTable *getSymbolTable()

Return a pointer to the SymbolTable for this Module.

Function *getFunction(StringRef Name) const

Look up the specified function in the Module SymbolTable. If it does not exist, return null.

Function *getOrInsertFunction(const std::string &Name, const FunctionType *T)

Look up the specified function in the Module SymbolTable. If it does not exist, add an external declaration for the function and return it.

std::string getTypeName(const Type *Ty)

If there is at least one entry in the SymbolTable for the specified Type, return it. Otherwise return the empty string.

bool addTypeName(const std::string &Name, const Type *Ty)

Insert an entry in the SymbolTable mapping Name to Ty. If there is already an entry for this name, true is returned and the SymbolTable is not modified.
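A minimal sketch combining the forwarding iterators with getFunction (the module pointer is illustrative):

Module *M = ...;

for (Module::iterator F = M->begin(), E = M->end(); F != E; ++F)
  errs() << "Found function: " << F->getName() << "\n";

if (Function *Main = M->getFunction("main"))
  errs() << "main is " << (Main->isDeclaration() ? "a declaration" : "defined") << "\n";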
The Value class¶
#include "llvm/IR/Value.h"
header source: Value.h
doxygen info: Value Class
The Value class is the most important class in the LLVM Source base. It represents a typed value that may be used (among other things) as an operand to an instruction. There are many different types of Values, such as Constants and Arguments. Even Instructions and Functions are Values.

A particular Value may be used many times in the LLVM representation for a program. For example, an incoming argument to a function (represented with an instance of the Argument class) is "used" by every instruction in the function that references the argument. To keep track of this relationship, the Value class keeps a list of all of the Users that are using it (the User class is a base class for all nodes in the LLVM graph that can refer to Values). This use list is how LLVM represents def-use information in the program, and is accessible through the use_* methods, shown below.

Because LLVM is a typed representation, every LLVM Value is typed, and this Type is available through the getType() method. In addition, all LLVM values can be named. The "name" of the Value is a symbolic string printed in the LLVM code:
%foo = add i32 1, 2
The name of this instruction is "foo". NOTE that the name of any value may be missing (an empty string), so names should ONLY be used for debugging (making the source code easier to read, debugging printouts); they should not be used to keep track of values or map between them. For this purpose, use a std::map of pointers to the Value itself instead.
One important aspect of LLVM is that there is no distinction between an SSA variable and the operation that produces it. Because of this, any reference to the value produced by an instruction (or the value available as an incoming argument, for example) is represented as a direct pointer to the instance of the class that represents this value. Although this may take some getting used to, it simplifies the representation and makes it easier to manipulate.
Important Public Members of the Value class¶

Value::use_iterator - Typedef for iterator over the use-list
Value::const_use_iterator - Typedef for const_iterator over the use-list
unsigned use_size() - Returns the number of users of the value.
bool use_empty() - Returns true if there are no users.
use_iterator use_begin() - Get an iterator to the start of the use-list.
use_iterator use_end() - Get an iterator to the end of the use-list.
User *use_back() - Returns the last element in the list.

These methods are the interface to access the def-use information in LLVM. As with all other iterators in LLVM, the naming conventions follow the conventions defined by the STL.

Type *getType() const

This method returns the Type of the Value.

bool hasName() const
std::string getName() const
void setName(const std::string &Name)

This family of methods is used to access and assign a name to a Value; be aware of the precaution above.

void replaceAllUsesWith(Value *V)

This method traverses the use list of a Value changing all Users of the current value to refer to "V" instead. For example, if you detect that an instruction always produces a constant value (for example through constant folding), you can replace all uses of the instruction with the constant like this:

Inst->replaceAllUsesWith(ConstVal);
The User class¶
#include "llvm/IR/User.h"
header source: User.h
doxygen info: User Class
Superclass: Value
The User class is the common base class of all LLVM nodes that may refer to Values. It exposes a list of "Operands" that are all of the Values that the User is referring to. The User class itself is a subclass of Value.

The operands of a User point directly to the LLVM Value that it refers to. Because LLVM uses Static Single Assignment (SSA) form, there can only be one definition referred to, allowing this direct connection. This connection provides the use-def information in LLVM.
Important Public Members of the User class¶

The User class exposes the operand list in two ways: through an index access interface and through an iterator based interface.

Value *getOperand(unsigned i)
unsigned getNumOperands()

These two methods expose the operands of the User in a convenient form for direct access.

User::op_iterator - Typedef for iterator over the operand list
op_iterator op_begin() - Get an iterator to the start of the operand list.
op_iterator op_end() - Get an iterator to the end of the operand list.

Together, these methods make up the iterator based interface to the operands of a User.
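A minimal sketch of the index-based interface:

User *U = ...;

for (unsigned i = 0, e = U->getNumOperands(); i != e; ++i) {
  Value *Op = U->getOperand(i);
  errs() << "Operand " << i << ": " << *Op << "\n";
}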
The Instruction class¶
#include "llvm/IR/Instruction.h"
header source: Instruction.h
doxygen info: Instruction Class
The Instruction class is the common base class for all LLVM instructions. It provides only a few methods, but is a very commonly used class. The primary data tracked by the Instruction class itself is the opcode (instruction type) and the parent BasicBlock the Instruction is embedded into. To represent a specific type of instruction, one of many subclasses of Instruction is used.

Because the Instruction class subclasses the User class, its operands can be accessed in the same way as for other Users (with the getOperand()/getNumOperands() and op_begin()/op_end() methods).

An important file for the Instruction class is the llvm/Instruction.def file. This file contains some meta-data about the various different types of instructions in LLVM. It describes the enum values that are used as opcodes (for example Instruction::Add and Instruction::ICmp), as well as the concrete sub-classes of Instruction that implement the instruction (for example BinaryOperator and CmpInst). Unfortunately, the use of macros in this file confuses doxygen, so these enum values don't show up correctly in the doxygen output.
Important Subclasses of the Instruction class¶

BinaryOperator

This subclass represents all two operand instructions whose operands must be the same type, except for the comparison instructions.

CastInst

This subclass is the parent of the 12 casting instructions. It provides common operations on cast instructions.

CmpInst

This subclass represents the two comparison instructions, ICmpInst (integer operands) and FCmpInst (floating point operands).

TerminatorInst

This subclass is the parent of all terminator instructions (those which can terminate a block).
Important Public Members of the Instruction class¶

BasicBlock *getParent()

Returns the BasicBlock that this Instruction is embedded into.

bool mayWriteToMemory()

Returns true if the instruction writes to memory, i.e. it is a call, free, invoke, or store.

unsigned getOpcode()

Returns the opcode for the Instruction.

Instruction *clone() const

Returns another instance of the specified instruction, identical in all ways to the original except that the instruction has no parent (i.e. it's not embedded into a BasicBlock), and it has no name.
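A minimal sketch combining these queries (hedged: clone() leaves the copy parentless, so it has to be attached explicitly, here via insertBefore):

Instruction *I = ...;

if (I->mayWriteToMemory())
  errs() << "Opcode " << I->getOpcode() << " may write to memory\n";

Instruction *Copy = I->clone(); // Parentless, nameless copy.
Copy->insertBefore(I);          // Attach it to I's basic block.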
The Constant class and subclasses¶
Constant represents a base class for different types of constants. It is subclassed by ConstantInt, ConstantArray, etc. for representing the various types of Constants. GlobalValue is also a subclass, which represents the address of a global variable or function.
Important Subclasses of Constant¶
- ConstantInt : This subclass of Constant represents an integer constant of any width.

const APInt& getValue() const: Returns the underlying value of this constant, an APInt value.
int64_t getSExtValue() const: Converts the underlying APInt value to an int64_t via sign extension. If the value (not the bit width) of the APInt is too large to fit in an int64_t, an assertion will result. For this reason, use of this method is discouraged.
uint64_t getZExtValue() const: Converts the underlying APInt value to a uint64_t via zero extension. If the value (not the bit width) of the APInt is too large to fit in a uint64_t, an assertion will result. For this reason, use of this method is discouraged.
static ConstantInt* get(const APInt& Val): Returns the ConstantInt object that represents the value provided by Val. The type is implied as the IntegerType that corresponds to the bit width of Val.
static ConstantInt* get(const Type *Ty, uint64_t Val): Returns the ConstantInt object that represents the value provided by Val for integer type Ty.

- ConstantFP : This class represents a floating point constant.

double getValue() const: Returns the underlying value of this constant.

- ConstantArray : This represents a constant array.

const std::vector<Use> &getValues() const: Returns a vector of component constants that make up this array.

- ConstantStruct : This represents a constant struct.

const std::vector<Use> &getValues() const: Returns a vector of component constants that make up this struct.

- GlobalValue : This represents either a global variable or a function. In either case, the value is a constant fixed address (after linking).
The GlobalValue class¶
#include "llvm/IR/GlobalValue.h"
header source: GlobalValue.h
doxygen info: GlobalValue Class
Superclasses: Constant, User, Value
Global values (GlobalVariables or Functions) are the only LLVM values that are visible in the bodies of all Functions. Because they are visible at global scope, they are also subject to linking with other globals defined in different translation units. To control the linking process, GlobalValues know their linkage rules. Specifically, GlobalValues know whether they have internal or external linkage, as defined by the LinkageTypes enumeration.

If a GlobalValue has internal linkage (equivalent to being static in C), it is not visible to code outside the current translation unit, and does not participate in linking. If it has external linkage, it is visible to external code, and does participate in linking. In addition to linkage information, GlobalValues keep track of which Module they are currently part of.
Because GlobalValues are memory objects, they are always referred to by their address. As such, the Type of a global is always a pointer to its contents. It is important to remember this when using the GetElementPtrInst instruction because this pointer must be dereferenced first. For example, if you have a GlobalVariable (a subclass of GlobalValue) that is an array of 24 ints, type [24 x i32], then the GlobalVariable is a pointer to that array. Although the address of the first element of this array and the value of the GlobalVariable are the same, they have different types: the GlobalVariable's value has type [24 x i32]*, while the first element's address has type i32*. Because of this, accessing a global value requires you to dereference the pointer with GetElementPtrInst first; then its elements can be accessed. This is explained in the LLVM Language Reference Manual.
Important Public Members of the GlobalValue class¶

bool hasInternalLinkage() const
bool hasExternalLinkage() const
void setInternalLinkage(bool HasInternalLinkage)

These methods manipulate the linkage characteristics of the GlobalValue.

Module *getParent()

This returns the Module that the GlobalValue is currently embedded into.
The Function class¶
#include "llvm/IR/Function.h"
header source: Function.h
doxygen info: Function Class
Superclasses: GlobalValue, Constant, User, Value
The Function class represents a single procedure in LLVM. It is actually one of the more complex classes in the LLVM hierarchy because it must keep track of a large amount of data. The Function class keeps track of a list of BasicBlocks, a list of formal Arguments, and a SymbolTable.

The list of BasicBlocks is the most commonly used part of Function objects. The list imposes an implicit ordering of the blocks in the function, which indicates how the code will be laid out by the backend. Additionally, the first BasicBlock is the implicit entry node for the Function. It is not legal in LLVM to explicitly branch to this initial block. There are no implicit exit nodes, and in fact there may be multiple exit nodes from a single Function. If the BasicBlock list is empty, this indicates that the Function is actually a function declaration: the actual body of the function hasn't been linked in yet.

In addition to a list of BasicBlocks, the Function class also keeps track of the list of formal Arguments that the function receives. This container manages the lifetime of the Argument nodes, just like the BasicBlock list does for the BasicBlocks.
The SymbolTable is a very rarely used LLVM feature that is only used when you have to look up a value by name. Aside from that, the SymbolTable is used internally to make sure that there are not conflicts between the names of Instructions, BasicBlocks, or Arguments in the function body.
Note that Function is a GlobalValue and therefore also a Constant. The value of the function is its address (after linking) which is guaranteed to be constant.
Important Public Members of the Function class¶

Function(const FunctionType *Ty, LinkageTypes Linkage, const std::string &N = "", Module* Parent = 0)

Constructor used when you need to create new Functions to add to the program. The constructor must specify the type of the function to create and what type of linkage the function should have. The FunctionType argument specifies the formal arguments and return value for the function. The same FunctionType value can be used to create multiple functions. The Parent argument specifies the Module in which the function is defined. If this argument is provided, the function will automatically be inserted into that module's list of functions.

bool isDeclaration()

Return whether or not the Function has a body defined. If the function is "external", it does not have a body, and thus must be resolved by linking with a function defined in a different translation unit.

Function::iterator - Typedef for basic block list iterator
Function::const_iterator - Typedef for const_iterator.
begin(), end(), size(), empty()

These are forwarding methods that make it easy to access the contents of a Function object's BasicBlock list.

Function::BasicBlockListType &getBasicBlockList()

Returns the list of BasicBlocks. This is necessary to use when you need to update the list or perform a complex action that doesn't have a forwarding method.

Function::arg_iterator - Typedef for the argument list iterator
Function::const_arg_iterator - Typedef for const_iterator.
arg_begin(), arg_end(), arg_size(), arg_empty()

These are forwarding methods that make it easy to access the contents of a Function object's Argument list.

Function::ArgumentListType &getArgumentList()

Returns the list of Arguments. This is necessary to use when you need to update the list or perform a complex action that doesn't have a forwarding method.

BasicBlock &getEntryBlock()

Returns the entry BasicBlock for the function. Because the entry block for the function is always the first block, this returns the first block of the Function.

Type *getReturnType()
FunctionType *getFunctionType()

This traverses the Type of the Function and returns the return type of the function, or the FunctionType of the actual function.

SymbolTable *getSymbolTable()

Return a pointer to the SymbolTable for this Function.
The GlobalVariable class¶
#include "llvm/IR/GlobalVariable.h"
header source: GlobalVariable.h
doxygen info: GlobalVariable Class
Superclasses: GlobalValue, Constant, User, Value
Global variables are represented with the (surprise surprise) GlobalVariable class. Like functions, GlobalVariables are also subclasses of GlobalValue, and as such are always referenced by their address (global values must live in memory, so their "name" refers to their constant address). See GlobalValue for more on this. Global variables may have an initial value (which must be a Constant), and if they have an initializer, they may be marked as "constant" themselves (indicating that their contents never change at runtime).
Important Public Members of the GlobalVariable class¶

GlobalVariable(const Type *Ty, bool isConstant, LinkageTypes &Linkage, Constant *Initializer = 0, const std::string &Name = "", Module* Parent = 0)

Create a new global variable of the specified type. If isConstant is true then the global variable will be marked as unchanging for the program. The Linkage parameter specifies the type of linkage (internal, external, weak, linkonce, appending) for the variable. If the linkage is InternalLinkage, WeakAnyLinkage, WeakODRLinkage, LinkOnceAnyLinkage or LinkOnceODRLinkage, then the resultant global variable will have internal linkage. AppendingLinkage concatenates together all instances (in different translation units) of the variable into a single variable but is only applicable to arrays. See the LLVM Language Reference for further details on linkage types. Optionally an initializer, a name, and the module to put the variable into may be specified for the global variable as well.

bool isConstant() const

Returns true if this is a global variable that is known not to be modified at runtime.

bool hasInitializer()

Returns true if this GlobalVariable has an initializer.

Constant *getInitializer()

Returns the initial value for a GlobalVariable. It is not legal to call this method if there is no initializer.
The BasicBlock class¶
#include "llvm/IR/BasicBlock.h"
header source: BasicBlock.h
doxygen info: BasicBlock Class
Superclass: Value
This class represents a single entry, single exit section of the code, commonly known as a basic block by the compiler community. The BasicBlock class maintains a list of Instructions, which form the body of the block. Matching the language definition, the last element of this list of instructions is always a terminator instruction (a subclass of the TerminatorInst class).

In addition to tracking the list of instructions that make up the block, the BasicBlock class also keeps track of the Function that it is embedded into.

Note that BasicBlocks themselves are Values, because they are referenced by instructions like branches and can go in the switch tables. BasicBlocks have type label.
Important Public Members of the BasicBlock class¶

BasicBlock(const std::string &Name = "", Function *Parent = 0)

The BasicBlock constructor is used to create new basic blocks for insertion into a function. The constructor optionally takes a name for the new block, and a Function to insert it into. If the Parent parameter is specified, the new BasicBlock is automatically inserted at the end of the specified Function; if not specified, the BasicBlock must be manually inserted into the Function.

BasicBlock::iterator - Typedef for instruction list iterator
BasicBlock::const_iterator - Typedef for const_iterator.
begin(), end(), front(), back(), size(), empty() - STL-style functions for accessing the instruction list.

These methods and typedefs are forwarding functions that have the same semantics as the standard library methods of the same names. These methods expose the underlying instruction list of a basic block in a way that is easy to manipulate. To get the full complement of container operations (including operations to update the list), you must use the getInstList() method.

BasicBlock::InstListType &getInstList()

This method is used to get access to the underlying container that actually holds the Instructions. This method must be used when there isn't a forwarding function in the BasicBlock class for the operation that you would like to perform. Because there are no forwarding functions for "updating" operations, you need to use this if you want to update the contents of a BasicBlock.

Function *getParent()

Returns a pointer to the Function the block is embedded into, or a null pointer if it is homeless.

TerminatorInst *getTerminator()

Returns a pointer to the terminator instruction that appears at the end of the BasicBlock. If there is no terminator instruction, or if the last instruction in the block is not a terminator, then a null pointer is returned.
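A minimal sketch of inspecting a block's terminator:

BasicBlock *BB = ...;

if (TerminatorInst *T = BB->getTerminator()) {
  for (unsigned i = 0, e = T->getNumSuccessors(); i != e; ++i)
    errs() << "Successor: " << T->getSuccessor(i)->getName() << "\n";
}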
The Argument
class¶
This subclass of Value defines the interface for incoming formal arguments to a function. A Function maintains a list of its formal arguments. An argument has a pointer to the parent Function.
LLVM Extensions¶
Introduction¶
This document describes extensions to tools and formats LLVM seeks compatibility with.
General Assembly Syntax¶
C99-style Hexadecimal Floating-point Constants¶
LLVM’s assemblers allow floating-point constants to be written in C99’s hexadecimal format instead of decimal if desired.
.section .data
.float 0x1c2.2ap3
Machine-specific Assembly Syntax¶
X86/COFF-Dependent¶
Relocations¶
The following additional relocation types are supported:
@IMGREL (AT&T syntax only) generates an image-relative relocation that
corresponds to the COFF relocation types IMAGE_REL_I386_DIR32NB
(32-bit) or
IMAGE_REL_AMD64_ADDR32NB
(64-bit).
.text
fun:
mov foo@IMGREL(%ebx, %ecx, 4), %eax
.section .pdata
.long fun@IMGREL
.long (fun@imgrel + 0x3F)
.long $unwind$fun@imgrel
.secrel32 generates a relocation that corresponds to the COFF relocation types IMAGE_REL_I386_SECREL (32-bit) or IMAGE_REL_AMD64_SECREL (64-bit).

The .secidx relocation generates an index of the section that contains the target. It corresponds to the COFF relocation types IMAGE_REL_I386_SECTION (32-bit) or IMAGE_REL_AMD64_SECTION (64-bit).
.section .debug$S,"rn"
.long 4
.long 242
.long 40
.secrel32 _function_name
.secidx _function_name
...
.linkonce Directive¶
Syntax:
.linkonce [ comdat type ]
Supported COMDAT types:
discard
- Discards duplicate sections with the same COMDAT symbol. This is the default if no type is specified.
one_only
- If the symbol is defined multiple times, the linker issues an error.
same_size
- Duplicates are discarded, but the linker issues an error if any have different sizes.
same_contents
- Duplicates are discarded, but the linker issues an error if any duplicates do not have exactly the same content.
largest
- Links the largest section from among the duplicates.
newest
- Links the newest section from among the duplicates.
.section .text$foo
.linkonce
...
.section Directive¶

MC supports passing the information in .linkonce at the end of .section. For example, these two fragments are equivalent:
.section secName, "dr", discard, "Symbol1"
.globl Symbol1
Symbol1:
.long 1
.section secName, "dr"
.linkonce discard
.globl Symbol1
Symbol1:
.long 1
Note that in the combined form the COMDAT symbol is explicit. This extension exists to support multiple sections with the same name in different COMDATs:
.section secName, "dr", discard, "Symbol1"
.globl Symbol1
Symbol1:
.long 1
.section secName, "dr", discard, "Symbol2"
.globl Symbol2
Symbol2:
.long 1
In addition to the types allowed with .linkonce, .section also accepts associative. The meaning is that the section is linked if a certain other COMDAT section is linked. This other section is indicated by the comdat symbol in this directive. It can be any symbol defined in the associated section, but is usually the associated section's comdat.
The following restrictions apply to the associated section:
- It must be a COMDAT section.
- It cannot be another associative COMDAT section.
In the following example the symbol sym is the comdat symbol of .foo and .bar is associated to .foo.
.section .foo,"bw",discard, "sym"
.section .bar,"rd",associative, "sym"
ELF-Dependent¶
.section Directive¶

In order to support creating multiple sections with the same name and comdat, it is possible to add a unique number at the end of the .section directive. For example, the following code creates two sections named .text.
.section .text,"ax",@progbits,unique,1
nop
.section .text,"ax",@progbits,unique,2
nop
The unique number is not present in the resulting object at all. It is just used in the assembler to differentiate the sections.
Target Specific Behaviour¶
Windows on ARM¶
Stack Probe Emission¶
The reference implementation (Microsoft Visual Studio 2012) emits stack probes in the following fashion:
movw r4, #constant
bl __chkstk
sub.w sp, sp, r4
However, this has the limitation of 32 MiB (±16 MiB). In order to accommodate larger binaries, LLVM supports the use of -mcode-model=large to allow a 4 GiB range via a slight deviation. It will generate an indirect jump as follows:
movw r4, #constant
movw r12, :lower16:__chkstk
movt r12, :upper16:__chkstk
blx r12
sub.w sp, sp, r4
Variable Length Arrays¶
The reference implementation (Microsoft Visual Studio 2012) does not permit the emission of Variable Length Arrays (VLAs).
The Windows ARM Itanium ABI extends the base ABI by adding support for emitting a dynamic stack allocation. When emitting a variable stack allocation, a call to __chkstk is emitted unconditionally to ensure that guard pages are set up properly. This stack probe is emitted in the same manner as the standard stack probe emission.

The MSVC environment does not currently emit code for VLAs.
LibFuzzer – a library for coverage-guided fuzz testing.¶
Introduction¶
This library is intended primarily for in-process coverage-guided fuzz testing (fuzzing) of other libraries. The typical workflow looks like this:
- Build the Fuzzer library as a static archive (or just a set of .o files). Note that the Fuzzer contains the main() function. Preferably do not use sanitizers while building the Fuzzer.
- Build the library you are going to test with -fsanitize-coverage={bb,edge}[,indirect-calls,8bit-counters] and one of the sanitizers. We recommend building the library in several different modes (e.g. asan, msan, lsan, ubsan, etc.) and even using different optimization options (e.g. -O0, -O1, -O2) to diversify testing.
- Build a test driver using the same options as the library. The test driver is a C/C++ file containing interesting calls to the library inside a single function:
extern "C" void LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size);
- Link the Fuzzer, the library and the driver together into an executable using the same sanitizer options as for the library.
- Collect the initial corpus of inputs for the fuzzer (a directory with test inputs, one file per input). The better your inputs are, the faster you will find something interesting. Also try to keep your inputs small, otherwise the Fuzzer will run too slowly. By default, the Fuzzer limits the size of every input to 64 bytes (use -max_len=N to override).
- Run the fuzzer with the test corpus. As new interesting test cases are discovered they will be added to the corpus. If a bug is discovered by the sanitizer (asan, etc.) it will be reported as usual and the reproducer will be written to disk. Each Fuzzer process is single-threaded (unless the library starts its own threads). You can run the Fuzzer on the same corpus in multiple processes in parallel.
The Fuzzer is similar in concept to AFL, but uses in-process fuzzing, which is more fragile and more restrictive, but potentially much faster as it has no overhead for process start-up. It uses LLVM’s SanitizerCoverage instrumentation to get in-process coverage feedback.
The code resides in the LLVM repository and requires a fresh Clang compiler to build. It is used to fuzz various parts of LLVM, but the Fuzzer itself does not (and should not) depend on any part of LLVM and can be used for other projects without requiring the rest of LLVM.
Flags¶
The most important flags are:
seed 0 Random seed. If 0, seed is generated.
runs -1 Number of individual test runs (-1 for infinite runs).
max_len 64 Maximum length of the test input.
cross_over 1 If 1, cross over inputs.
mutate_depth 5 Apply this number of consecutive mutations to each input.
timeout 1200 Timeout in seconds (if positive). If one unit runs more than this number of seconds the process will abort.
help 0 Print help.
save_minimized_corpus 0 If 1, the minimized corpus is saved into the first input directory.
jobs 0 Number of jobs to run. If jobs >= 1 we spawn this number of jobs in separate worker processes with stdout/stderr redirected to fuzz-JOB.log.
workers 0 Number of simultaneous worker processes to run the jobs. If zero, "min(jobs,NumberOfCpuCores()/2)" is used.
tokens 0 Use the file with tokens (one token per line) to fuzz a token based input language.
apply_tokens 0 Read the given input file, substitute bytes with tokens and write the result to stdout.
sync_command 0 Execute an external command "<sync_command> <test_corpus>" to synchronize the test corpus.
sync_timeout 600 Minimum timeout between syncs.
For the full list of flags run the fuzzer binary with -help=1.
Usage examples¶
Toy example¶
A simple function that does something interesting if it receives the input “HI!”:
cat << EOF >> test_fuzzer.cc
extern "C" void LLVMFuzzerTestOneInput(const unsigned char *data, unsigned long size) {
  if (size > 0 && data[0] == 'H')
    if (size > 1 && data[1] == 'I')
      if (size > 2 && data[2] == '!')
        __builtin_trap();
}
EOF
# Get lib/Fuzzer. Assuming that you already have fresh clang in PATH.
svn co http://llvm.org/svn/llvm-project/llvm/trunk/lib/Fuzzer
# Build lib/Fuzzer files.
clang -c -g -O2 -std=c++11 Fuzzer/*.cpp -IFuzzer
# Build test_fuzzer.cc with asan and link against lib/Fuzzer.
clang++ -fsanitize=address -fsanitize-coverage=edge test_fuzzer.cc Fuzzer*.o
# Run the fuzzer with no corpus.
./a.out
You should get Illegal instruction (core dumped) pretty quickly.
PCRE2¶
Here we show how to use lib/Fuzzer on something real, yet simple: pcre2:
COV_FLAGS=" -fsanitize-coverage=edge,indirect-calls,8bit-counters"
# Get PCRE2
svn co svn://vcs.exim.org/pcre2/code/trunk pcre
# Get lib/Fuzzer. Assuming that you already have fresh clang in PATH.
svn co http://llvm.org/svn/llvm-project/llvm/trunk/lib/Fuzzer
# Build PCRE2 with AddressSanitizer and coverage.
(cd pcre; ./autogen.sh; CC="clang -fsanitize=address $COV_FLAGS" ./configure --prefix=`pwd`/../inst && make -j && make install)
# Build lib/Fuzzer files.
clang -c -g -O2 -std=c++11 Fuzzer/*.cpp -IFuzzer
# Build the actual function that does something interesting with PCRE2.
cat << EOF > pcre_fuzzer.cc
#include <string.h>
#include "pcre2posix.h"
extern "C" void LLVMFuzzerTestOneInput(const unsigned char *data, size_t size) {
  if (size < 1) return;
  char *str = new char[size+1];
  memcpy(str, data, size);
  str[size] = 0;
  regex_t preg;
  if (0 == regcomp(&preg, str, 0)) {
    regexec(&preg, str, 0, 0, 0);
    regfree(&preg);
  }
  delete [] str;
}
EOF
clang++ -g -fsanitize=address $COV_FLAGS -c -std=c++11 -I inst/include/ pcre_fuzzer.cc
# Link.
clang++ -g -fsanitize=address -Wl,--whole-archive inst/lib/*.a -Wl,-no-whole-archive Fuzzer*.o pcre_fuzzer.o -o pcre_fuzzer
This will give you a binary of the fuzzer, called pcre_fuzzer.
Now, create a directory that will hold the test corpus:
mkdir -p CORPUS
For simple input languages like regular expressions this is all you need. For more complicated inputs populate the directory with some input samples. Now run the fuzzer with the corpus dir as the only parameter:
./pcre_fuzzer ./CORPUS
You will see output like this:
Seed: 1876794929
#0 READ cov 0 bits 0 units 1 exec/s 0
#1 pulse cov 3 bits 0 units 1 exec/s 0
#1 INITED cov 3 bits 0 units 1 exec/s 0
#2 pulse cov 208 bits 0 units 1 exec/s 0
#2 NEW cov 208 bits 0 units 2 exec/s 0 L: 64
#3 NEW cov 217 bits 0 units 3 exec/s 0 L: 63
#4 pulse cov 217 bits 0 units 3 exec/s 0
- The Seed: line shows you the current random seed (you can change it with the -seed=N flag).
- The READ line shows you how many input files were read (since you passed an empty dir there were no inputs, but one dummy input was synthesised).
- The INITED line shows you how many inputs will be fuzzed.
- The NEW lines appear when the fuzzer finds a new interesting input, which is saved to the CORPUS dir. If multiple corpus dirs are given, the first one is used.
- The pulse lines appear periodically to show the current status.
Now, interrupt the fuzzer and run it again the same way. You will see:
Seed: 1879995378
#0 READ cov 0 bits 0 units 564 exec/s 0
#1 pulse cov 502 bits 0 units 564 exec/s 0
...
#512 pulse cov 2933 bits 0 units 564 exec/s 512
#564 INITED cov 2991 bits 0 units 344 exec/s 564
#1024 pulse cov 2991 bits 0 units 344 exec/s 1024
#1455 NEW cov 2995 bits 0 units 345 exec/s 1455 L: 49
This time you were running the fuzzer with a non-empty input corpus (564 items). As the first step, the fuzzer minimized the set to produce 344 interesting items (the INITED line).
It is quite convenient to store test corpuses in git. As an example, here is a git repository with test inputs for the above PCRE2 fuzzer:
git clone https://github.com/kcc/fuzzing-with-sanitizers.git
./pcre_fuzzer ./fuzzing-with-sanitizers/pcre2/C1/
You may run N independent fuzzer jobs in parallel on M CPUs:
N=100; M=4; ./pcre_fuzzer ./CORPUS -jobs=$N -workers=$M
By default (-reload=1) the fuzzer processes will periodically scan the CORPUS directory and reload any new tests. This way the test inputs found by one process will be picked up by all others.
If -workers=$M is not supplied, min($N,NumberOfCpuCore/2) will be used.
Heartbleed¶
Remember Heartbleed? As was recently shown, fuzzing with AddressSanitizer can find Heartbleed. Indeed, here are the step-by-step instructions to find Heartbleed with LibFuzzer:
wget https://www.openssl.org/source/openssl-1.0.1f.tar.gz
tar xf openssl-1.0.1f.tar.gz
COV_FLAGS="-fsanitize-coverage=edge,indirect-calls" # -fsanitize-coverage=8bit-counters
(cd openssl-1.0.1f/ && ./config &&
make -j 32 CC="clang -g -fsanitize=address $COV_FLAGS")
# Get and build LibFuzzer
svn co http://llvm.org/svn/llvm-project/llvm/trunk/lib/Fuzzer
clang -c -g -O2 -std=c++11 Fuzzer/*.cpp -IFuzzer
# Get examples of key/pem files.
git clone https://github.com/hannob/selftls
cp selftls/server* . -v
cat << EOF > handshake-fuzz.cc
#include <openssl/ssl.h>
#include <openssl/err.h>
#include <assert.h>
SSL_CTX *sctx;
int Init() {
  SSL_library_init();
  SSL_load_error_strings();
  ERR_load_BIO_strings();
  OpenSSL_add_all_algorithms();
  assert (sctx = SSL_CTX_new(TLSv1_method()));
  assert (SSL_CTX_use_certificate_file(sctx, "server.pem", SSL_FILETYPE_PEM));
  assert (SSL_CTX_use_PrivateKey_file(sctx, "server.key", SSL_FILETYPE_PEM));
  return 0;
}
extern "C" void LLVMFuzzerTestOneInput(unsigned char *Data, size_t Size) {
static int unused = Init();
SSL *server = SSL_new(sctx);
BIO *sinbio = BIO_new(BIO_s_mem());
BIO *soutbio = BIO_new(BIO_s_mem());
SSL_set_bio(server, sinbio, soutbio);
SSL_set_accept_state(server);
BIO_write(sinbio, Data, Size);
SSL_do_handshake(server);
SSL_free(server);
}
EOF
# Build the fuzzer.
clang++ -g handshake-fuzz.cc -fsanitize=address \
openssl-1.0.1f/libssl.a openssl-1.0.1f/libcrypto.a Fuzzer*.o
# Run 20 independent fuzzer jobs.
./a.out -jobs=20 -workers=20
Voila:
#1048576 pulse cov 3424 bits 0 units 9 exec/s 24385
=================================================================
==17488==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x629000004748 at pc 0x00000048c979 bp 0x7fffe3e864f0 sp 0x7fffe3e85ca8
READ of size 60731 at 0x629000004748 thread T0
#0 0x48c978 in __asan_memcpy
#1 0x4db504 in tls1_process_heartbeat openssl-1.0.1f/ssl/t1_lib.c:2586:3
#2 0x580be3 in ssl3_read_bytes openssl-1.0.1f/ssl/s3_pkt.c:1092:4
Advanced features¶
Tokens¶
By default, the fuzzer is not aware of the complexities of the input language and when fuzzing e.g. a C++ parser it will mostly stress the lexer. It is very hard for the fuzzer to come up with something like reinterpret_cast<int> from a test corpus that doesn’t have it. See a detailed discussion of this topic at http://lcamtuf.blogspot.com/2015/01/afl-fuzz-making-up-grammar-with.html.
lib/Fuzzer implements a simple technique that allows fuzzing input languages with long tokens. All you need to do is prepare a text file containing up to 253 tokens, one token per line, and pass it to the fuzzer as -tokens=TOKENS_FILE.txt. Three implicit tokens are added: " ", "\t", and "\n". The fuzzer itself will still be mutating a string of bytes, but before passing this input to the target library it will replace every byte b with the b-th token. If there are fewer than b tokens, a space will be added instead.
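The substitution step is easy to picture in code. The following is a minimal sketch of the byte-to-token mapping described above; SubstituteTokens and its arguments are hypothetical names, not part of lib/Fuzzer’s actual interface:
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch of the substitution described above: each byte of the
// mutated input selects a token; bytes past the end of the token list
// become a space.
std::string SubstituteTokens(const std::vector<uint8_t> &Bytes,
                             const std::vector<std::string> &Tokens) {
  std::string Result;
  for (uint8_t B : Bytes) {
    if (B < Tokens.size())
      Result += Tokens[B];  // the b-th token (includes " ", "\t", "\n")
    else
      Result += ' ';        // fewer than b tokens: add a space instead
  }
  return Result;
}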
AFL compatibility¶
LibFuzzer can be used in parallel with AFL on the same test corpus. Both fuzzers expect the test corpus to reside in a directory, one file per input. You can run both fuzzers on the same corpus in parallel:
./afl-fuzz -i testcase_dir -o findings_dir /path/to/program -r @@
./llvm-fuzz testcase_dir findings_dir # Will write new tests to testcase_dir
Periodically restart both fuzzers so that they can use each other’s findings.
How good is my fuzzer?¶
Once you implement your target function LLVMFuzzerTestOneInput and fuzz it to death, you will want to know whether the function or the corpus can be improved further. One easy-to-use metric is, of course, code coverage. You can get the coverage for your corpus like this:
ASAN_OPTIONS=coverage_pcs=1 ./fuzzer CORPUS_DIR -runs=0
This will run all the tests in the CORPUS_DIR but will not generate any new tests and dump covered PCs to disk before exiting. Then you can subtract the set of covered PCs from the set of all instrumented PCs in the binary, see SanitizerCoverage for details.
User-supplied mutators¶
LibFuzzer allows the use of custom (user-supplied) mutators; see FuzzerInterface.h.
Fuzzing components of LLVM¶
clang-format-fuzzer¶
The inputs are random pieces of C++-like text.
Build (make sure to use fresh clang as the host compiler):
cmake -GNinja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_USE_SANITIZER=Address -DLLVM_USE_SANITIZE_COVERAGE=YES -DCMAKE_BUILD_TYPE=Release /path/to/llvm
ninja clang-format-fuzzer
mkdir CORPUS_DIR
./bin/clang-format-fuzzer CORPUS_DIR
Optionally build other kinds of binaries (asan+Debug, msan, ubsan, etc).
TODO: commit the pre-fuzzed corpus to svn (?).
Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=23052
clang-fuzzer¶
The default behavior is very similar to clang-format-fuzzer. Clang can also be fuzzed with Tokens using the -tokens=$LLVM/lib/Fuzzer/cxx_fuzzer_tokens.txt option.
Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=23057
Buildbot¶
We have a buildbot that runs the above fuzzers for LLVM components 24/7/365 at http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer .
Pre-fuzzed test inputs in git¶
The buildbot accumulates large test corpuses over time. The corpuses are stored in git on GitHub and can be used like this:
git clone https://github.com/kcc/fuzzing-with-sanitizers.git
bin/clang-format-fuzzer fuzzing-with-sanitizers/llvm/clang-format/C1
bin/clang-fuzzer fuzzing-with-sanitizers/llvm/clang/C1/
bin/clang-fuzzer fuzzing-with-sanitizers/llvm/clang/TOK1 -tokens=$LLVM/llvm/lib/Fuzzer/cxx_fuzzer_tokens.txt
FAQ¶
Q. Why doesn’t the Fuzzer use any of the LLVM support?¶
There are two reasons.
First, we want this library to be used outside of LLVM without users having to build the rest of LLVM. This may sound unconvincing to many LLVM folks, but in practice the need to build all of LLVM frightens many potential users – and we want more users to use this code.
Second, there is a subtle technical reason not to rely on the rest of LLVM, or any other large body of code (maybe not even STL). When coverage instrumentation is enabled, it will also instrument the LLVM support code which will blow up the coverage set of the process (since the fuzzer is in-process). In other words, by using more external dependencies we will slow down the fuzzer while the main reason for it to exist is extreme speed.
Q. What about Windows then? The Fuzzer contains code that does not build on Windows.¶
The sanitizer coverage support does not work on Windows either as of 01/2015. Once it’s there, we’ll need to re-implement OS-specific parts (I/O, signals).
Q. When is this Fuzzer not a good solution for a problem?¶
- If the test inputs are validated by the target library and the validator asserts/crashes on invalid inputs, the in-process fuzzer is not applicable (we could use fork() without exec, but it comes with extra overhead).
- Bugs in the target library may accumulate without being detected. E.g. a memory corruption that goes undetected at first and then leads to a crash while testing another input. This is why it is highly recommended to run this in-process fuzzer with all sanitizers to detect most bugs on the spot.
- It is harder to protect the in-process fuzzer from excessive memory consumption and infinite loops in the target library (still possible).
- The target library should not have significant global state that is not reset between the runs.
- Many interesting target libs are not designed in a way that supports the in-process fuzzer interface (e.g. require a file path instead of a byte array).
- If a single test run takes a considerable fraction of a second (or more) the speed benefit from the in-process fuzzer is negligible.
- If the target library runs persistent threads (that outlive execution of one test) the fuzzing results will be unreliable.
Q. So, what exactly is this Fuzzer good for?¶
This Fuzzer might be a good choice for testing libraries that have relatively small inputs, each input takes < 1ms to run, and the library code is not expected to crash on invalid inputs. Examples: regular expression matchers, text or binary format parsers.
- LLVM Language Reference Manual
- Defines the LLVM intermediate representation and the assembly form of the different nodes.
- LLVM Atomic Instructions and Concurrency Guide
- Information about LLVM’s concurrency model.
- LLVM Programmer’s Manual
- Introduction to the general layout of the LLVM sourcebase, important classes and APIs, and some tips & tricks.
- LLVM Extensions
- LLVM-specific extensions to tools and formats LLVM seeks compatibility with.
- CommandLine 2.0 Library Manual
- Provides information on using the command line parsing library.
- LLVM Coding Standards
- Details the LLVM coding standards and provides useful information on writing efficient C++ code.
- How to set up LLVM-style RTTI for your class hierarchy
- How to make isa<>, dyn_cast<>, etc. available for clients of your class hierarchy.
- Extending LLVM: Adding instructions, intrinsics, types, etc.
- Look here to see how to add instructions and intrinsics to LLVM.
- Doxygen generated documentation
- (classes) (tarball)
- ViewVC Repository Browser
- Architecture & Platform Information for Compiler Writers
- A list of helpful links for compiler writers.
- LibFuzzer – a library for coverage-guided fuzz testing.
- A library for writing in-process guided fuzzers.
Subsystem Documentation¶
For API clients and LLVM developers.
LLVM Alias Analysis Infrastructure¶
- Introduction
- AliasAnalysis Class Overview
- Writing a new AliasAnalysis Implementation
- Using alias analysis results
- Existing alias analysis implementations and clients
- Memory Dependence Analysis
Introduction¶
Alias Analysis (aka Pointer Analysis) is a class of techniques which attempt to determine whether or not two pointers ever can point to the same object in memory. There are many different algorithms for alias analysis and many different ways of classifying them: flow-sensitive vs. flow-insensitive, context-sensitive vs. context-insensitive, field-sensitive vs. field-insensitive, unification-based vs. subset-based, etc. Traditionally, alias analyses respond to a query with a Must, May, or No alias response, indicating that two pointers always point to the same object, might point to the same object, or are known to never point to the same object.
The LLVM AliasAnalysis class is the primary interface used by clients and implementations of alias analyses in the LLVM system. This class is the common interface between clients of alias analysis information and the implementations providing it, and is designed to support a wide range of implementations and clients (but currently all clients are assumed to be flow-insensitive). In addition to simple alias analysis information, this class exposes Mod/Ref information from those implementations which can provide it, allowing for powerful analyses and transformations to work well together.
This document contains information necessary to successfully implement this interface, use it, and to test both sides. It also explains some of the finer points about what exactly results mean. If you feel that something is unclear or should be added, please let me know.
AliasAnalysis Class Overview¶
The AliasAnalysis class defines the interface that the various alias analysis implementations should support. This class exports two important enums: AliasResult and ModRefResult which represent the result of an alias query or a mod/ref query, respectively.
The AliasAnalysis interface exposes information about memory, represented in several different ways. In particular, memory objects are represented as a starting address and size, and function calls are represented as the actual call or invoke instructions that perform the call. The AliasAnalysis interface also exposes some helper methods which allow you to get mod/ref information for arbitrary instructions.
All AliasAnalysis interfaces require that in queries involving multiple values, values which are not constants are all defined within the same function.
Representation of Pointers¶
Most importantly, the AliasAnalysis class provides several methods which are used to query whether or not two memory objects alias, whether function calls can modify or read a memory object, etc. For all of these queries, memory objects are represented as a pair of their starting address (a symbolic LLVM Value*) and a static size.
Representing memory objects as a starting address and a size is critically important for correct Alias Analyses. For example, consider this (silly, but possible) C code:
int i;
char C[2];
char A[10];
/* ... */
for (i = 0; i != 10; ++i) {
  C[0] = A[i]; /* One byte store */
  C[1] = A[9-i]; /* One byte store */
}
In this case, the basicaa pass will disambiguate the stores to C[0] and C[1] because they are accesses to two distinct locations one byte apart, and the accesses are each one byte. In this case, the Loop Invariant Code Motion (LICM) pass can use store motion to remove the stores from the loop. In contrast, the following code:
int i;
char C[2];
char A[10];
/* ... */
for (i = 0; i != 10; ++i) {
  ((short*)C)[0] = A[i]; /* Two byte store! */
  C[1] = A[9-i]; /* One byte store */
}
In this case, the two stores to C do alias each other, because the access to the &C[0] element is a two byte access. If size information wasn’t available in the query, even the first case would have to conservatively assume that the accesses alias.
The alias method¶
The alias method is the primary interface used to determine whether or not two memory objects alias each other. It takes two memory objects as input and returns MustAlias, PartialAlias, MayAlias, or NoAlias as appropriate.
Like all AliasAnalysis interfaces, the alias method requires that either the two pointer values be defined within the same function, or at least one of the values is a constant.
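As a concrete illustration of a client-side query, here is a minimal sketch using the chaining-era signature shown later in this document; AA, P1, P2 and the sizes are assumed to be supplied by the enclosing pass:
// Sketch of a client-side alias query. AA is assumed to come from
// getAnalysis<AliasAnalysis>(); P1/P2 are the two pointer values and
// Size1/Size2 the access sizes in bytes.
switch (AA.alias(P1, Size1, P2, Size2)) {
case AliasAnalysis::NoAlias:      /* accesses are independent */       break;
case AliasAnalysis::MayAlias:     /* must be treated conservatively */ break;
case AliasAnalysis::PartialAlias: /* overlap, different start */       break;
case AliasAnalysis::MustAlias:    /* identical start address */        break;
}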
Must, May, and No Alias Responses¶
The NoAlias response may be used when there is never an immediate dependence between any memory reference based on one pointer and any memory reference based on the other. The most obvious example is when the two pointers point to non-overlapping memory ranges. Another is when the two pointers are only ever used for reading memory. Another is when the memory is freed and reallocated between accesses through one pointer and accesses through the other — in this case, there is a dependence, but it’s mediated by the free and reallocation. An exception to this rule is the noalias keyword, for which the “irrelevant” dependencies are ignored.
The MayAlias response is used whenever the two pointers might refer to the same object.
The PartialAlias response is used when the two memory objects are known to be overlapping in some way, but do not start at the same address.
The MustAlias response may only be returned if the two memory objects are guaranteed to always start at exactly the same location. A MustAlias response implies that the pointers compare equal.
The getModRefInfo methods¶
The getModRefInfo methods return information about whether the execution of an instruction can read or modify a memory location. Mod/Ref information is always conservative: if an instruction might read or write a location, ModRef is returned.
The AliasAnalysis class also provides a getModRefInfo method for testing dependencies between function calls. This method takes two call sites (CS1 & CS2), and returns NoModRef if neither call writes to memory read or written by the other, Ref if CS1 reads memory written by CS2, Mod if CS1 writes to memory read or written by CS2, or ModRef if CS1 might read or write memory written to by CS2. Note that this relation is not commutative.
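For example, a client might use this to ask whether two calls are independent with respect to memory; a minimal sketch under the interface described above (the helper name is hypothetical, and the call sites are assumed to come from the enclosing pass):
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/IR/CallSite.h"
using namespace llvm;

// If neither call reads or writes memory used by the other, the two calls
// are independent as far as memory is concerned.
bool callsAreIndependent(AliasAnalysis &AA, CallSite CS1, CallSite CS2) {
  return AA.getModRefInfo(CS1, CS2) == AliasAnalysis::NoModRef;
}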
Other useful AliasAnalysis methods¶
Several other tidbits of information are often collected by various alias analysis implementations and can be put to good use by various clients.
The pointsToConstantMemory method¶
The pointsToConstantMemory method returns true if and only if the analysis can prove that the pointer only points to unchanging memory locations (functions, constant global variables, and the null pointer). This information can be used to refine mod/ref information: it is impossible for an unchanging memory location to be modified.
The doesNotAccessMemory and onlyReadsMemory methods¶
These methods are used to provide very simple mod/ref information for function calls. The doesNotAccessMemory method returns true for a function if the analysis can prove that the function never reads or writes to memory, or if the function only reads from constant memory. Functions with this property are side-effect free and only depend on their input arguments, allowing them to be eliminated if they form common subexpressions or be hoisted out of loops. Many common functions behave this way (e.g., sin and cos) but many others do not (e.g., acos, which modifies the errno variable).
The onlyReadsMemory method returns true for a function if analysis can prove that (at most) the function only reads from non-volatile memory. Functions with this property are side-effect free, only depending on their input arguments and the state of memory when they are called. This property allows calls to these functions to be eliminated and moved around, as long as there is no store instruction that changes the contents of memory. Note that all functions that satisfy the doesNotAccessMemory method also satisfy onlyReadsMemory.
Writing a new AliasAnalysis Implementation¶
Writing a new alias analysis implementation for LLVM is quite straightforward. There are already several implementations that you can use as examples, and the following information should help fill in any details. For examples, take a look at the various alias analysis implementations included with LLVM.
Different Pass styles¶
The first step is determining what type of LLVM pass you need to use for your Alias Analysis. As is the case with most other analyses and transformations, the answer should be fairly obvious from the type of problem you are trying to solve:
- If you require interprocedural analysis, it should be a Pass.
- If you are a function-local analysis, subclass FunctionPass.
- If you don’t need to look at the program at all, subclass ImmutablePass.
In addition to the pass that you subclass, you should also inherit from the AliasAnalysis interface, of course, and use the RegisterAnalysisGroup template to register as an implementation of AliasAnalysis.
Required initialization calls¶
Your subclass of AliasAnalysis is required to invoke two methods on the AliasAnalysis base class: getAnalysisUsage and InitializeAliasAnalysis. In particular, your implementation of getAnalysisUsage should explicitly call into the AliasAnalysis::getAnalysisUsage method in addition to declaring any pass dependencies your pass has. Thus you should have something like this:
void getAnalysisUsage(AnalysisUsage &AU) const {
  AliasAnalysis::getAnalysisUsage(AU);
  // declare your dependencies here.
}
Additionally, you must invoke the InitializeAliasAnalysis method from your analysis run method (run for a Pass, runOnFunction for a FunctionPass, or InitializePass for an ImmutablePass). For example (as part of a Pass):
bool run(Module &M) {
  InitializeAliasAnalysis(this);
  // Perform analysis here...
  return false;
}
Required methods to override¶
You must override the getAdjustedAnalysisPointer method on all subclasses of AliasAnalysis. An example implementation of this method would look like:
void *getAdjustedAnalysisPointer(const void* ID) override {
  if (ID == &AliasAnalysis::ID)
    return (AliasAnalysis*)this;
  return this;
}
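Putting the pieces from this section together, a skeleton implementation might look like the following. This is only a sketch against the interface described in this document; the pass name is hypothetical and the RegisterAnalysisGroup registration boilerplate is omitted:
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Pass.h"
using namespace llvm;

namespace {
// Hypothetical example: an analysis that never looks at the program,
// so it subclasses ImmutablePass.
struct MyAliasAnalysis : public ImmutablePass, public AliasAnalysis {
  static char ID;
  MyAliasAnalysis() : ImmutablePass(ID) {}

  void getAnalysisUsage(AnalysisUsage &AU) const override {
    AliasAnalysis::getAnalysisUsage(AU); // required superclass call
    // declare your own pass dependencies here
  }

  void initializePass() override {
    InitializeAliasAnalysis(this);       // required initialization call
  }

  void *getAdjustedAnalysisPointer(const void *ID) override {
    if (ID == &AliasAnalysis::ID)
      return (AliasAnalysis *)this;
    return this;
  }

  // Override only the queries you can improve; everything else chains
  // to the next alias analysis, as described in the next section.
};
}
char MyAliasAnalysis::ID = 0;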
Interfaces which may be specified¶
All of the AliasAnalysis virtual methods default to providing chaining to another alias analysis implementation, which ends up returning conservatively correct information (returning “May” Alias and “Mod/Ref” for alias and mod/ref queries respectively). Depending on the capabilities of the analysis you are implementing, you just override the interfaces you can improve.
AliasAnalysis chaining behavior¶
With only one special exception (the -no-aa pass) every alias analysis pass chains to another alias analysis implementation (for example, the user can specify “-basicaa -ds-aa -licm” to get the maximum benefit from both alias analyses). The alias analysis class automatically takes care of most of this for methods that you don’t override. For methods that you do override, in code paths that return a conservative MayAlias or Mod/Ref result, simply return whatever the superclass computes. For example:
AliasResult alias(const Value *V1, unsigned V1Size,
                  const Value *V2, unsigned V2Size) {
  if (...)
    return NoAlias;
  ...
  // Couldn't determine a must or no-alias result.
  return AliasAnalysis::alias(V1, V1Size, V2, V2Size);
}
In addition to analysis queries, if you override LLVM update notification methods you must make sure to unconditionally pass them to the superclass as well, which allows all alias analyses in a chain to be updated.
Updating analysis results for transformations¶
Alias analysis information is initially computed for a static snapshot of the program, but clients will use this information to make transformations to the code. All but the most trivial forms of alias analysis will need to have their analysis results updated to reflect the changes made by these transformations.
The AliasAnalysis interface exposes four methods which are used to communicate program changes from the clients to the analysis implementations. Various alias analysis implementations should use these methods to ensure that their internal data structures are kept up-to-date as the program changes (for example, when an instruction is deleted), and clients of alias analysis must be sure to call these interfaces appropriately.
The deleteValue method¶
The deleteValue method is called by transformations when they remove an instruction or any other value from the program (including values that do not use pointers). Typically alias analyses keep data structures that have entries for each value in the program. When this method is called, they should remove any entries for the specified value, if they exist.
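For instance, an implementation that caches per-value data might look like the following sketch; the MyAliasAnalysis class and its ValueInfo map are the hypothetical names from the skeleton above, not real LLVM code:
// Drop our cached entry for V, then forward the notification so that
// chained analyses are updated too.
void MyAliasAnalysis::deleteValue(Value *V) {
  ValueInfo.erase(V);            // e.g. DenseMap<const Value*, Info> ValueInfo;
  AliasAnalysis::deleteValue(V); // keep the rest of the chain consistent
}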
The copyValue method¶
The copyValue method is used when a new value is introduced into the program. There is no way to introduce a value into the program that did not exist before (this doesn’t make sense for a safe compiler transformation), so this is the only way to introduce a new value. This method indicates that the new value has exactly the same properties as the value being copied.
The replaceWithNewValue method¶
This method is a simple helper that is provided to make clients easier to write. It is implemented by copying the old analysis information to the new value and then deleting the old value. This method cannot be overridden by alias analysis implementations.
The addEscapingUse method¶
The addEscapingUse method is used when the uses of a pointer value have changed in ways that may invalidate precomputed analysis information. Implementations may either use this callback to provide conservative responses for points whose uses have changed since analysis time, or may recompute some or all of their internal state to continue providing accurate responses.
In general, any new use of a pointer value is considered an escaping use, and must be reported through this callback, except for the uses below:
- A bitcast or getelementptr of the pointer
- A store through the pointer (but not a store of the pointer)
- A load through the pointer
Efficiency Issues¶
From the LLVM perspective, the only thing you need to do to provide an efficient alias analysis is to make sure that alias analysis queries are serviced quickly. The actual calculation of the alias analysis results (the “run” method) is only performed once, but many (perhaps duplicate) queries may be performed. Because of this, try to move as much computation to the run method as possible (within reason).
Limitations¶
The AliasAnalysis infrastructure has several limitations which make writing a new AliasAnalysis implementation difficult.
There is no way to override the default alias analysis. It would be very useful to be able to do something like “opt -my-aa -O2” and have it use -my-aa for all passes which need AliasAnalysis, but there is currently no support for that, short of changing the source code and recompiling. Similarly, there is also no way of setting a chain of analyses as the default.
There is no way for transform passes to declare that they preserve AliasAnalysis implementations. The AliasAnalysis interface includes deleteValue and copyValue methods which are intended to allow a pass to keep an AliasAnalysis consistent, however there’s no way for a pass to declare in its getAnalysisUsage that it does so. Some passes attempt to use AU.addPreserved<AliasAnalysis>, however this doesn’t actually have any effect.
AliasAnalysisCounter (-count-aa) and AliasDebugger (-debug-aa) are implemented as ModulePass classes, so if your alias analysis uses FunctionPass, it won’t be able to use these utilities. If you try to use them, the pass manager will silently route alias analysis queries directly to BasicAliasAnalysis instead.
Similarly, the opt -p option introduces ModulePass passes between each pass, which prevents the use of FunctionPass alias analysis passes.
The AliasAnalysis API does have functions for notifying implementations when values are deleted or copied, however these aren’t sufficient. There are many other ways that LLVM IR can be modified which could be relevant to AliasAnalysis implementations but which cannot be expressed through this API.
The AliasAnalysisDebugger utility seems to suggest that AliasAnalysis implementations can expect that they will be informed of any relevant Value before it appears in an alias query. However, popular clients such as GVN don’t support this, and are known to trigger errors when run with the AliasAnalysisDebugger.
Due to several of the above limitations, the most obvious use for the AliasAnalysisCounter utility, collecting stats on all alias queries in a compilation, doesn’t work, even if the AliasAnalysis implementations don’t use FunctionPass. There’s no way to set a default, much less a default sequence, and there’s no way to preserve it.
The AliasSetTracker class (which is used by LICM) makes a non-deterministic number of alias queries. This can cause stats collected by AliasAnalysisCounter to have fluctuations among identical runs, for example. Another consequence is that debugging techniques involving pausing execution after a predetermined number of queries can be unreliable.
Many alias queries can be reformulated in terms of other alias queries. When multiple AliasAnalysis queries are chained together, it would make sense to start those queries from the beginning of the chain, with care taken to avoid infinite looping, however currently an implementation which wants to do this can only start such queries from itself.
Using alias analysis results¶
There are several different ways to use alias analysis results. In order of preference, these are:
Using the MemoryDependenceAnalysis Pass¶
The memdep pass uses alias analysis to provide high-level dependence information about memory-using instructions. This will tell you which store feeds into a load, for example. It uses caching and other techniques to be efficient, and is used by Dead Store Elimination, GVN, and memcpy optimizations.
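A minimal sketch of a local query, using the MemoryDependenceAnalysis interface from the era of this document (the helper name is hypothetical):
#include "llvm/Analysis/MemoryDependenceAnalysis.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Ask memdep for the nearest preceding instruction in the same block that
// defines or clobbers the memory read by this load.
Instruction *localDependence(MemoryDependenceAnalysis &MDA, LoadInst *Load) {
  MemDepResult Dep = MDA.getDependency(Load);
  if (Dep.isDef() || Dep.isClobber())
    return Dep.getInst(); // e.g. the store that feeds this load
  return nullptr;         // dependence is non-local or unknown
}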
Using the AliasSetTracker class¶
Many transformations need information about alias sets that are active in some scope, rather than information about pairwise aliasing. The AliasSetTracker class is used to efficiently build these Alias Sets from the pairwise alias analysis information provided by the AliasAnalysis interface.
First you initialize the AliasSetTracker by using the “add” methods to add information about various potentially aliasing instructions in the scope you are interested in. Once all of the alias sets are completed, your pass should simply iterate through the constructed alias sets, using the AliasSetTracker begin()/end() methods.
The AliasSets formed by the AliasSetTracker are guaranteed to be disjoint, calculate mod/ref information and volatility for the set, and keep track of whether or not all of the pointers in the set are Must aliases. The AliasSetTracker also makes sure that sets are properly folded due to call instructions, and can provide a list of pointers in each set.
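A minimal usage sketch following that recipe (the function name is hypothetical; the interface is the one described in this document):
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/AliasSetTracker.h"
#include "llvm/IR/Function.h"
using namespace llvm;

// Build the alias sets for a function, then walk the disjoint sets.
void inspectAliasSets(AliasAnalysis &AA, Function &F) {
  AliasSetTracker AST(AA);
  for (BasicBlock &BB : F)
    AST.add(BB); // feed every memory-referencing instruction in the scope
  for (const AliasSet &AS : AST) {
    if (AS.isMustAlias()) {
      // all pointers in this set provably start at the same address
    }
  }
}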
As an example user of this, the Loop Invariant Code Motion pass uses AliasSetTrackers to calculate alias sets for each loop nest. If an AliasSet in a loop is not modified, then all load instructions from that set may be hoisted out of the loop. If any alias sets are stored to and are must alias sets, then the stores may be sunk to outside of the loop, promoting the memory location to a register for the duration of the loop nest. Both of these transformations only apply if the pointer argument is loop-invariant.
The AliasSetTracker implementation¶
The AliasSetTracker class is implemented to be as efficient as possible. It uses the union-find algorithm to efficiently merge AliasSets when a pointer is inserted into the AliasSetTracker that aliases multiple sets. The primary data structure is a hash table mapping pointers to the AliasSet they are in.
The AliasSetTracker class must maintain a list of all of the LLVM Value*s that are in each AliasSet. Since the hash table already has entries for each LLVM Value* of interest, the AliasSets thread the linked list through these hash-table nodes to avoid having to allocate memory unnecessarily, and to make merging alias sets extremely efficient (the linked list merge is constant time).
You shouldn’t need to understand these details if you are just a client of the AliasSetTracker, but if you look at the code, hopefully this brief description will help make sense of why things are designed the way they are.
Using the AliasAnalysis interface directly¶
If neither of these utility classes is what your pass needs, you should use the interfaces exposed by the AliasAnalysis class directly. Try to use the higher-level methods when possible (e.g., use mod/ref information instead of the alias method directly if possible) to get the best precision and efficiency.
Existing alias analysis implementations and clients¶
If you’re going to be working with the LLVM alias analysis infrastructure, you should know what clients and implementations of alias analysis are available. In particular, if you are implementing an alias analysis, you should be aware of the clients that are useful for monitoring and evaluating different implementations.
Available AliasAnalysis implementations¶
This section lists the various implementations of the AliasAnalysis interface. With the exception of the -no-aa implementation, all of these chain to other alias analysis implementations.
The -no-aa pass¶
The -no-aa pass is just what it sounds like: an alias analysis that never returns any useful information. This pass can be useful if you think that alias analysis is doing something wrong and are trying to narrow down a problem.
The -basicaa pass¶
The -basicaa pass is an aggressive local analysis that knows many important facts:
- Distinct globals, stack allocations, and heap allocations can never alias.
- Globals, stack allocations, and heap allocations never alias the null pointer.
- Different fields of a structure do not alias (see the short example after this list).
- Indexes into arrays with statically differing subscripts cannot alias.
- Many common standard C library functions never access memory or only read memory.
- Pointers that obviously point to constant globals “pointsToConstantMemory”.
- Function calls cannot modify or reference stack allocations if they never escape from the function that allocates them (a common case for automatic arrays).
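The short example promised above illustrates the struct-field and array-subscript facts: basicaa can prove that each pair of stores below is independent, so they may be freely reordered:
// basicaa proves NoAlias for both pairs of stores below: distinct struct
// fields, and statically different array subscripts.
struct Pair { int a, b; };
void example(Pair &S, int (&A)[2]) {
  S.a = 1;  // cannot alias the store to S.b
  S.b = 2;
  A[0] = 3; // cannot alias the store to A[1]
  A[1] = 4;
}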
The -globalsmodref-aa pass¶
This pass implements a simple context-sensitive mod/ref and alias analysis for internal global variables that don’t “have their address taken”. If a global does not have its address taken, the pass knows that no pointers alias the global. This pass also keeps track of functions that it knows never access memory or never read memory. This allows certain optimizations (e.g. GVN) to eliminate call instructions entirely.
The real power of this pass is that it provides context-sensitive mod/ref information for call instructions. This allows the optimizer to know that calls to a function do not clobber or read the value of the global, allowing loads and stores to be eliminated.
Note
This pass is somewhat limited in its scope (it only supports non-address-taken globals), but it is a very quick analysis.
The -steens-aa pass¶
The -steens-aa pass implements a variation on the well-known “Steensgaard’s algorithm” for interprocedural alias analysis. Steensgaard’s algorithm is a unification-based, flow-insensitive, context-insensitive, and field-insensitive alias analysis that is also very scalable (effectively linear time).
The LLVM -steens-aa pass implements a “speculatively field-sensitive” version of Steensgaard’s algorithm using the Data Structure Analysis framework. This gives it substantially more precision than the standard algorithm while maintaining excellent analysis scalability.
Note
-steens-aa is available in the optional “poolalloc” module. It is not part of the LLVM core.
The -ds-aa pass¶
The -ds-aa pass implements the full Data Structure Analysis algorithm. Data Structure Analysis is a modular unification-based, flow-insensitive, context-sensitive, and speculatively field-sensitive alias analysis that is also quite scalable, usually at O(n * log(n)).
This algorithm is capable of responding to a full variety of alias analysis queries, and can provide context-sensitive mod/ref information as well. The only major facility not implemented so far is support for must-alias information.
Note
-ds-aa is available in the optional “poolalloc” module. It is not part of the LLVM core.
The -scev-aa pass¶
The -scev-aa pass implements AliasAnalysis queries by translating them into ScalarEvolution queries. This gives it a more complete understanding of getelementptr instructions and loop induction variables than other alias analyses have.
Alias analysis driven transformations¶
LLVM includes several alias-analysis driven transformations which can be used with any of the implementations above.
The -adce pass¶
The -adce pass, which implements Aggressive Dead Code Elimination, uses the AliasAnalysis interface to delete calls to functions that do not have side-effects and are not used.
The -licm pass¶
The -licm pass implements various Loop Invariant Code Motion related transformations. It uses the AliasAnalysis interface for several different transformations:
- It uses mod/ref information to hoist or sink load instructions out of loops if there are no instructions in the loop that modify the memory loaded.
- It uses mod/ref information to hoist function calls out of loops that do not write to memory and are loop-invariant.
- It uses alias information to promote memory objects that are loaded and stored to in loops to live in a register instead. It can do this if there are no may aliases to the loaded/stored memory location.
The -argpromotion pass¶
The -argpromotion pass promotes by-reference arguments to be passed by value instead. In particular, if a pointer argument is only loaded from, it passes in the loaded value instead of the address to the function. This pass uses alias information to make sure that the value loaded from the argument pointer is not modified between the entry of the function and any load of the pointer.
The -gvn, -memcpyopt, and -dse passes¶
These passes use AliasAnalysis information to reason about loads and stores.
Clients for debugging and evaluation of implementations¶
These passes are useful for evaluating the various alias analysis implementations. You can use them with commands like:
% opt -ds-aa -aa-eval foo.bc -disable-output -stats
The -print-alias-sets pass¶
The -print-alias-sets pass is exposed as part of the opt tool to print out the Alias Sets formed by the AliasSetTracker class. This is useful if you’re using the AliasSetTracker class. To use it, use something like:
% opt -ds-aa -print-alias-sets -disable-output
The -count-aa pass¶
The -count-aa pass is useful to see how many queries a particular pass is making and what responses are returned by the alias analysis. As an example:
% opt -basicaa -count-aa -ds-aa -count-aa -licm
will print out how many queries (and what responses are returned) by the -licm pass (of the -ds-aa pass) and how many queries are made of the -basicaa pass by the -ds-aa pass. This can be useful when debugging a transformation or an alias analysis implementation.
The -aa-eval pass¶
The -aa-eval pass simply iterates through all pairs of pointers in a function and asks an alias analysis whether or not the pointers alias. This gives an indication of the precision of the alias analysis. Statistics are printed indicating the percent of no/may/must aliases found (a more precise algorithm will have a lower number of may aliases).
Memory Dependence Analysis¶
If you’re just looking to be a client of alias analysis information, consider using the Memory Dependence Analysis interface instead. MemDep is a lazy, caching layer on top of alias analysis that is able to answer the question of what preceding memory operations a given instruction depends on, either at an intra- or inter-block level. Because of its laziness and caching policy, using MemDep can be a significant performance win over accessing alias analysis directly.
LLVM Bitcode File Format¶
- Abstract
- Overview
- Bitstream Format
- Bitcode Wrapper Format
- Native Object File Wrapper Format
- LLVM IR Encoding
- Basics
- MODULE_BLOCK Contents
- PARAMATTR_BLOCK Contents
- TYPE_BLOCK Contents
- TYPE_CODE_NUMENTRY Record
- TYPE_CODE_VOID Record
- TYPE_CODE_HALF Record
- TYPE_CODE_FLOAT Record
- TYPE_CODE_DOUBLE Record
- TYPE_CODE_LABEL Record
- TYPE_CODE_OPAQUE Record
- TYPE_CODE_INTEGER Record
- TYPE_CODE_POINTER Record
- TYPE_CODE_FUNCTION Record
- TYPE_CODE_STRUCT Record
- TYPE_CODE_ARRAY Record
- TYPE_CODE_VECTOR Record
- TYPE_CODE_X86_FP80 Record
- TYPE_CODE_FP128 Record
- TYPE_CODE_PPC_FP128 Record
- TYPE_CODE_METADATA Record
- CONSTANTS_BLOCK Contents
- FUNCTION_BLOCK Contents
- TYPE_SYMTAB_BLOCK Contents
- VALUE_SYMTAB_BLOCK Contents
- METADATA_BLOCK Contents
- METADATA_ATTACHMENT Contents
Abstract¶
This document describes the LLVM bitstream file format and the encoding of the LLVM IR into it.
Overview¶
What is commonly known as the LLVM bitcode file format (also, sometimes anachronistically known as bytecode) is actually two things: a bitstream container format and an encoding of LLVM IR into the container format.
The bitstream format is an abstract encoding of structured data, very similar to XML in some ways. Like XML, bitstream files contain tags, and nested structures, and you can parse the file without having to understand the tags. Unlike XML, the bitstream format is a binary encoding, and unlike XML it provides a mechanism for the file to self-describe “abbreviations”, which are effectively size optimizations for the content.
LLVM IR files may be optionally embedded into a wrapper structure, or in a native object file. Both of these mechanisms make it easy to embed extra data along with LLVM IR files.
This document first describes the LLVM bitstream format, describes the wrapper format, then describes the record structure used by LLVM IR files.
Bitstream Format¶
The bitstream format is literally a stream of bits, with a very simple structure. This structure consists of the following concepts:
- A “magic number” that identifies the contents of the stream.
- Encoding primitives like variable bit-rate integers.
- Blocks, which define nested content.
- Data Records, which describe entities within the file.
- Abbreviations, which specify compression optimizations for the file.
Note that the llvm-bcanalyzer tool can be used to dump and inspect arbitrary bitstreams, which is very useful for understanding the encoding.
Magic Numbers¶
The first two bytes of a bitcode file are ‘BC’ (0x42, 0x43). The second two bytes are an application-specific magic number. Generic bitcode tools can look at only the first two bytes to verify the file is bitcode, while application-specific programs will want to look at all four.
Primitives¶
A bitstream literally consists of a stream of bits, which are read in order starting with the least significant bit of each byte. The stream is made up of a number of primitive values that encode a stream of unsigned integer values. These integers are encoded in two ways: either as Fixed Width Integers or as Variable Width Integers.
Fixed Width Integers¶
Fixed-width integer values have their low bits emitted directly to the file. For example, a 3-bit integer value encodes 1 as 001. Fixed width integers are used when there are a well-known number of options for a field. For example, boolean values are usually encoded with a 1-bit wide integer.
Variable Width Integers¶
Variable-width integer (VBR) values encode values of arbitrary size, optimizing for the case where the values are small. Given a 4-bit VBR field, any 3-bit value (0 through 7) is encoded directly, with the high bit set to zero. Values larger than N-1 bits emit their bits in a series of N-1 bit chunks, where all but the last set the high bit.
For example, the value 27 (0x1B) is encoded as 1011 0011 when emitted as a vbr4 value. The first set of four bits indicates the value 3 (011) with a continuation piece (indicated by a high bit of 1). The next word indicates a value of 24 (011 << 3) with no continuation. The sum (3+24) yields the value 27.
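The decoding rule can be written down directly. Below is a sketch, where ReadBits is a hypothetical primitive that returns the next width bits of the stream, least significant bit first:
#include <cstdint>

// Decode one VBR-encoded integer: each chunk carries Width-1 data bits,
// and a set high bit means another chunk follows. The first chunk holds
// the least significant bits, matching the vbr4 example above.
uint64_t ReadVBR(unsigned Width, uint64_t (*ReadBits)(unsigned)) {
  uint64_t Result = 0;
  unsigned Shift = 0;
  for (;;) {
    uint64_t Chunk = ReadBits(Width);
    Result |= (Chunk & ((1ULL << (Width - 1)) - 1)) << Shift;
    if (!(Chunk & (1ULL << (Width - 1))))
      return Result; // high bit clear: this was the last chunk
    Shift += Width - 1;
  }
}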
6-bit characters¶
6-bit characters encode common characters into a fixed 6-bit field. They represent the following characters with the following 6-bit values:
'a' .. 'z' --- 0 .. 25
'A' .. 'Z' --- 26 .. 51
'0' .. '9' --- 52 .. 61
'.' --- 62
'_' --- 63
This encoding is only suitable for encoding characters and strings that consist only of the above characters. It is completely incapable of encoding characters not in the set.
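The mapping is simple enough to express directly; a sketch (the function name is ours, not part of the actual bitstream reader):
#include <cstdint>

// Map one character to its char6 value per the table above; returns 0xFF
// for characters the encoding cannot represent.
uint8_t EncodeChar6(char C) {
  if (C >= 'a' && C <= 'z') return C - 'a';      //  0 .. 25
  if (C >= 'A' && C <= 'Z') return C - 'A' + 26; // 26 .. 51
  if (C >= '0' && C <= '9') return C - '0' + 52; // 52 .. 61
  if (C == '.') return 62;
  if (C == '_') return 63;
  return 0xFF; // not encodable in char6
}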
Word Alignment¶
Occasionally, it is useful to emit zero bits until the bitstream is a multiple of 32 bits. This ensures that the bit position in the stream can be represented as a multiple of 32-bit words.
Abbreviation IDs¶
A bitstream is a sequential series of Blocks and Data Records. Both of these start with an abbreviation ID encoded as a fixed-bitwidth field. The width is specified by the current block, as described below. The value of the abbreviation ID specifies either a builtin ID (which have special meanings, defined below) or one of the abbreviation IDs defined for the current block by the stream itself.
The set of builtin abbrev IDs is:
- 0 - END_BLOCK — This abbrev ID marks the end of the current block.
- 1 - ENTER_SUBBLOCK — This abbrev ID marks the beginning of a new block.
- 2 - DEFINE_ABBREV — This defines a new abbreviation.
- 3 - UNABBREV_RECORD — This ID specifies the definition of an unabbreviated record.
Abbreviation IDs 4 and above are defined by the stream itself, and specify an abbreviated record encoding.
Blocks¶
Blocks in a bitstream denote nested regions of the stream, and are identified by a content-specific id number (for example, LLVM IR uses an ID of 12 to represent function bodies). Block IDs 0-7 are reserved for standard blocks whose meaning is defined by Bitcode; block IDs 8 and greater are application specific. Nested blocks capture the hierarchical structure of the data encoded in it, and various properties are associated with blocks as the file is parsed. Block definitions allow the reader to efficiently skip blocks in constant time if the reader wants a summary of blocks, or if it wants to efficiently skip data it does not understand. The LLVM IR reader uses this mechanism to skip function bodies, lazily reading them on demand.
When reading and encoding the stream, several properties are maintained for the block. In particular, each block maintains:
- A current abbrev id width. This value starts at 2 at the beginning of the stream, and is set every time a block record is entered. The block entry specifies the abbrev id width for the body of the block.
- A set of abbreviations. Abbreviations may be defined within a block, in which case they are only defined in that block (neither subblocks nor enclosing blocks see the abbreviation). Abbreviations can also be defined inside a BLOCKINFO block, in which case they are defined in all blocks that match the ID that the BLOCKINFO block is describing.
As sub blocks are entered, these properties are saved and the new sub-block has its own set of abbreviations, and its own abbrev id width. When a sub-block is popped, the saved values are restored.
ENTER_SUBBLOCK Encoding¶
[ENTER_SUBBLOCK, blockid(vbr8), newabbrevlen(vbr4), <align32bits>, blocklen_32]
The ENTER_SUBBLOCK abbreviation ID specifies the start of a new block record. The blockid value is encoded as an 8-bit VBR identifier, and indicates the type of block being entered, which can be a standard block or an application-specific block. The newabbrevlen value is a 4-bit VBR, which specifies the abbrev id width for the sub-block. The blocklen value is a 32-bit aligned value that specifies the size of the subblock in 32-bit words. This value allows the reader to skip over the entire block in one jump.
END_BLOCK Encoding¶
[END_BLOCK, <align32bits>]
The END_BLOCK abbreviation ID specifies the end of the current block record. Its end is aligned to 32 bits to ensure that the size of the block is an even multiple of 32 bits.
Data Records¶
Data records consist of a record code and a number of (up to) 64-bit integer values. The interpretation of the code and values is application specific and may vary between different block types. Records can be encoded either using an unabbrev record, or with an abbreviation. In the LLVM IR format, for example, there is a record which encodes the target triple of a module. The code is MODULE_CODE_TRIPLE, and the values of the record are the ASCII codes for the characters in the string.
UNABBREV_RECORD Encoding¶
[UNABBREV_RECORD, code(vbr6), numops(vbr6), op0(vbr6), op1(vbr6), ...]
An UNABBREV_RECORD provides a default fallback encoding, which is both completely general and extremely inefficient. It can describe an arbitrary record by emitting the code and operands as VBRs.
For example, emitting an LLVM IR target triple as an unabbreviated record requires emitting the UNABBREV_RECORD abbrevid, a vbr6 for the MODULE_CODE_TRIPLE code, a vbr6 for the length of the string (which is equal to the number of operands), and a vbr6 for each character. Because there are no letters with values less than 32, each letter would need to be emitted as at least a two-part VBR, which means that each letter would require at least 12 bits. This is not an efficient encoding, but it is fully general.
Abbreviated Record Encoding¶
[<abbrevid>, fields...]
An abbreviated record is an abbreviation id followed by a set of fields that are encoded according to the abbreviation definition. This allows records to be encoded significantly more densely than records encoded with the UNABBREV_RECORD type, and allows the abbreviation types to be specified in the stream itself, which allows the files to be completely self describing. The actual encoding of abbreviations is defined below.
The record code, which is the first field of an abbreviated record, may be encoded in the abbreviation definition (as a literal operand) or supplied in the abbreviated record (as a Fixed or VBR operand value).
Abbreviations¶
Abbreviations are an important form of compression for bitstreams. The idea is to specify a dense encoding for a class of records once, then use that encoding to emit many records. It takes space to emit the encoding into the file, but the space is recouped (hopefully plus some) when the records that use it are emitted.
Abbreviations can be determined dynamically per client, per file. Because the abbreviations are stored in the bitstream itself, different streams of the same format can contain different sets of abbreviations according to the needs of the specific stream. As a concrete example, LLVM IR files usually emit an abbreviation for binary operators. If a specific LLVM module contained no or few binary operators, the abbreviation does not need to be emitted.
DEFINE_ABBREV Encoding¶
[DEFINE_ABBREV, numabbrevops(vbr5), abbrevop0, abbrevop1, ...]
A DEFINE_ABBREV record adds an abbreviation to the list of currently defined abbreviations in the scope of this block. This definition only exists inside this immediate block — it is not visible in subblocks or enclosing blocks. Abbreviations are implicitly assigned IDs sequentially starting from 4 (the first application-defined abbreviation ID). Any abbreviations defined in a BLOCKINFO record for the particular block type receive IDs first, in order, followed by any abbreviations defined within the block itself. Abbreviated data records reference this ID to indicate what abbreviation they are invoking.
An abbreviation definition consists of the DEFINE_ABBREV abbrevid followed by a VBR that specifies the number of abbrev operands, then the abbrev operands themselves. Abbreviation operands come in three forms. They all start with a single bit that indicates whether the abbrev operand is a literal operand (when the bit is 1) or an encoding operand (when the bit is 0).
- Literal operands — [1(1), litvalue(vbr8)] — Literal operands specify that the value in the result is always a single specific value. This specific value is emitted as a vbr8 after the bit indicating that it is a literal operand.
- Encoding info without data — [0(1), encoding(3)] — Operand encodings that do not have extra data are just emitted as their code.
- Encoding info with data — [0(1), encoding(3), value(vbr5)] — Operand encodings that do have extra data are emitted as their code, followed by the extra data.
The possible operand encodings are:
- Fixed (code 1): The field should be emitted as a fixed-width value, whose width is specified by the operand’s extra data.
- VBR (code 2): The field should be emitted as a variable-width value, whose width is specified by the operand’s extra data.
- Array (code 3): This field is an array of values. The array operand has no extra data, but expects another operand to follow it, indicating the element type of the array. When reading an array in an abbreviated record, the first integer is a vbr6 that indicates the array length, followed by the encoded elements of the array. An array may only occur as the last operand of an abbreviation (except for the one final operand that gives the array’s type).
- Char6 (code 4): This field should be emitted as a char6-encoded value. This operand type takes no extra data. Char6 encoding is normally used as an array element type.
- Blob (code 5): This field is emitted as a vbr6, followed by padding to a 32-bit boundary (for alignment) and an array of 8-bit objects. The array of bytes is further followed by tail padding to ensure that its total length is a multiple of 4 bytes. This makes it very efficient for the reader to decode the data without having to make a copy of it: it can use a pointer to the data in the mapped in file and poke directly at it. A blob may only occur as the last operand of an abbreviation.
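As a sketch of the blob layout rules just described (a hypothetical helper of ours, not bitstream writer code), the bit cost of a blob after its vbr6 length field can be computed like this:
#include <cstddef>

// Sketch: bits a blob of NumBytes occupies, starting from the bit position
// immediately after the vbr6 length field. Alignment lets the reader point
// directly at the bytes in the mapped file instead of copying them.
static size_t blobBits(size_t CurBitPos, size_t NumBytes) {
  size_t Bits = 0;
  if (CurBitPos % 32 != 0)
    Bits += 32 - CurBitPos % 32;    // pad up to a 32-bit boundary
  Bits += NumBytes * 8;             // the raw 8-bit elements
  if (NumBytes % 4 != 0)
    Bits += (4 - NumBytes % 4) * 8; // tail pad to a multiple of 4 bytes
  return Bits;
}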
For example, target triples in LLVM modules are encoded as a record of the form
[TRIPLE, 'a', 'b', 'c', 'd']
. Consider if the bitstream emitted the
following abbrev entry:
[0, Fixed, 4]
[0, Array]
[0, Char6]
When emitting a record with this abbreviation, the above entry would be emitted as:
[4_abbrevwidth, 2_4, 4_vbr6, 0_6, 1_6, 2_6, 3_6]
These values are:
- The first value, 4, is the abbreviation ID for this abbreviation.
- The second value, 2, is the record code for TRIPLE records within LLVM IR file MODULE_BLOCK blocks.
- The third value, 4, is the length of the array.
- The rest of the values are the char6 encoded values for "abcd".
With this abbreviation, the triple is emitted with only 37 bits (assuming an abbrev ID width of 3: 3 bits for the abbrev ID, 4 bits for the record code, 6 bits for the array length, and 4 × 6 bits for the char6 characters). Without the abbreviation, significantly more space would be required to emit the target triple. Also, because the TRIPLE value is not emitted as a literal in the abbreviation, the abbreviation can also be used for any other string value.
Standard Blocks¶
In addition to the basic block structure and record encodings, the bitstream also defines specific built-in block types. These block types specify how the stream is to be decoded, along with other metadata. In the future, new standard blocks may be added. Block IDs 0-7 are reserved for standard blocks.
#0 - BLOCKINFO Block¶
The BLOCKINFO
block allows the description of metadata for other blocks.
The currently specified records are:
[SETBID (#1), blockid]
[DEFINE_ABBREV, ...]
[BLOCKNAME, ...name...]
[SETRECORDNAME, RecordID, ...name...]
The SETBID
record (code 1) indicates which block ID is being described.
SETBID
records can occur multiple times throughout the block to change which
block ID is being described. There must be a SETBID
record prior to any
other records.
Standard DEFINE_ABBREV
records can occur inside BLOCKINFO
blocks, but
unlike their occurrence in normal blocks, the abbreviation is defined for blocks
matching the block ID we are describing, not the BLOCKINFO
block
itself. The abbreviations defined in BLOCKINFO
blocks receive abbreviation
IDs as described in DEFINE_ABBREV.
The BLOCKNAME
record (code 2) can optionally occur in this block. The
elements of the record are the bytes of the string name of the block.
llvm-bcanalyzer can use this to dump out bitcode files symbolically.
The SETRECORDNAME
record (code 3) can also optionally occur in this block.
The first operand value is a record ID number, and the rest of the elements of
the record are the bytes for the string name of the record. llvm-bcanalyzer can
use this to dump out bitcode files symbolically.
Note that although the data in BLOCKINFO
blocks is described as “metadata,”
the abbreviations they contain are essential for parsing records from the
corresponding blocks. It is not safe to skip them.
Bitcode Wrapper Format¶
Bitcode files for LLVM IR may optionally be wrapped in a simple wrapper structure. This structure contains a simple header that indicates the offset and size of the embedded BC file. This allows additional information to be stored alongside the BC file. The structure of this file header is:
[Magic32, Version32, Offset32, Size32, CPUType32]
Each of the fields is a 32-bit field stored in little-endian form (as with the
rest of the bitcode file fields). The Magic number is always 0x0B17C0DE
and
the version is currently always 0
. The Offset field is the offset in bytes
to the start of the bitcode stream in the file, and the Size field is the size
in bytes of the stream. CPUType is a target-specific value that can be used to
encode the CPU of the target.
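As an illustration, a reader might parse this header as follows (a sketch assuming a little-endian host; this is not the reader LLVM itself uses):
#include <cstdint>
#include <cstring>

// The five 32-bit little-endian wrapper fields described above.
struct BCWrapperHeader {
  uint32_t Magic, Version, Offset, Size, CPUType;
};

// Sketch: validate the wrapper and locate the embedded bitcode stream, which
// lives at Buf[H.Offset .. H.Offset + H.Size).
static bool parseWrapper(const uint8_t *Buf, size_t Len, BCWrapperHeader &H) {
  if (Len < sizeof(BCWrapperHeader))
    return false;
  std::memcpy(&H, Buf, sizeof(H)); // fields are little-endian, as is the host
  if (H.Magic != 0x0B17C0DE || H.Version != 0)
    return false;
  return H.Offset <= Len && H.Size <= Len - H.Offset;
}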
Native Object File Wrapper Format¶
Bitcode files for LLVM IR may also be wrapped in a native object file
(i.e. ELF, COFF, Mach-O). The bitcode must be stored in a section of the
object file named .llvmbc
. This wrapper format is useful for accommodating
LTO in compilation pipelines where intermediate objects must be native object
files which contain metadata in other sections.
Not all tools support this format.
LLVM IR Encoding¶
LLVM IR is encoded into a bitstream by defining blocks and records. It uses blocks for things like constant pools, functions, symbol tables, etc. It uses records for things like instructions, global variable descriptors, type descriptions, etc. This document does not describe the set of abbreviations that the writer uses, as these are fully self-described in the file, and the reader is not allowed to build in any knowledge of this.
Basics¶
LLVM IR Magic Number¶
The magic number for LLVM IR files is:
[0x0_4, 0xC_4, 0xE_4, 0xD_4]
When combined with the bitcode magic number and viewed as bytes, this is
"BC 0xC0DE"
.
Signed VBRs¶
Variable Width Integer encoding is an efficient way to encode arbitrary-sized unsigned values, but is extremely inefficient for encoding signed values, as signed values would otherwise be treated as maximally large unsigned values.
As such, signed VBR values of a specific width are emitted as follows:
- Positive values are emitted as VBRs of the specified width, but with their value shifted left by one.
- Negative values are emitted as VBRs of the specified width, but the negated value is shifted left by one, and the low bit is set.
With this encoding, small positive and small negative values can both be emitted
efficiently. Signed VBR encoding is used in CST_CODE_INTEGER
and
CST_CODE_WIDE_INTEGER
records within CONSTANTS_BLOCK
blocks.
It is also used for phi instruction operands in MODULE_CODE_VERSION 1.
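The transform can be sketched in C++ as follows (helper names are ours; the result is then emitted as an ordinary VBR of the chosen width):
#include <cstdint>

// Sketch of the signed-VBR value mapping: the sign moves into the low bit so
// that small magnitudes of either sign stay small.
static uint64_t encodeSignedVBRValue(int64_t V) {
  if (V >= 0)
    return uint64_t(V) << 1;        // positive: value shifted left by one
  return (uint64_t(-V) << 1) | 1;   // negative: negated value, low bit set
}

static int64_t decodeSignedVBRValue(uint64_t V) {
  return (V & 1) ? -int64_t(V >> 1) : int64_t(V >> 1);
}
For example, -3 maps to 7 and +3 maps to 6, so both fit in a single vbr4 chunk.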
LLVM IR Blocks¶
LLVM IR is defined with the following blocks:
- 8 — MODULE_BLOCK — This is the top-level block that contains the entire module, and describes a variety of per-module information.
- 9 — PARAMATTR_BLOCK — This enumerates the parameter attributes.
- 10 — TYPE_BLOCK — This describes all of the types in the module.
- 11 — CONSTANTS_BLOCK — This describes constants for a module or function.
- 12 — FUNCTION_BLOCK — This describes a function body.
- 13 — TYPE_SYMTAB_BLOCK — This describes the type symbol table.
- 14 — VALUE_SYMTAB_BLOCK — This describes a value symbol table.
- 15 — METADATA_BLOCK — This describes metadata items.
- 16 — METADATA_ATTACHMENT — This contains records associating metadata with function instruction values.
MODULE_BLOCK Contents¶
The MODULE_BLOCK
block (id 8) is the top-level block for LLVM bitcode files,
and each bitcode file must contain exactly one. In addition to records
(described below) containing information about the module, a MODULE_BLOCK
block may contain the following sub-blocks:
- BLOCKINFO
- PARAMATTR_BLOCK
- TYPE_BLOCK
- TYPE_SYMTAB_BLOCK
- VALUE_SYMTAB_BLOCK
- CONSTANTS_BLOCK
- FUNCTION_BLOCK
- METADATA_BLOCK
MODULE_CODE_VERSION Record¶
[VERSION, version#]
The VERSION
record (code 1) contains a single value indicating the format
version. Versions 0 and 1 are supported at this time. The difference between
version 0 and 1 is in the encoding of instruction operands in
each FUNCTION_BLOCK.
In version 0, each value defined by an instruction is assigned an ID
unique to the function. Function-level value IDs are assigned starting from
NumModuleValues
since they share the same namespace as module-level
values. The value enumerator resets after each function. When a value is
an operand of an instruction, the value ID is used to represent the operand.
For large functions or large modules, these operand values can be large.
The encoding in version 1 attempts to avoid large operand values in common cases. Instead of using the value ID directly, operands are encoded as relative to the current instruction. Thus, if an operand is the value defined by the previous instruction, the operand will be encoded as 1.
For example, instead of
#n = load #n-1
#n+1 = icmp eq #n, #const0
br #n+1, label #(bb1), label #(bb2)
version 1 will encode the instructions as
#n = load #1
#n+1 = icmp eq #1, (#n+1)-#const0
br #1, label #(bb1), label #(bb2)
Note in the example that operands which are constants also use the relative encoding, while operands like basic block labels do not use the relative encoding.
Forward references will result in a negative value. This can be inefficient, as operands are normally encoded as unsigned VBRs. However, forward references are rare, except in the case of phi instructions. For phi instructions, operands are encoded as Signed VBRs to deal with forward references.
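The operand computation itself is simple; the following is an illustrative helper (ours, not the bitcode writer's):
#include <cstdint>

// Sketch: version 1 encodes an operand as the distance back from the value ID
// the current instruction will define. A forward reference (OpID >= InstID)
// yields a value <= 0, which is why phi operands need signed VBRs.
static int64_t relativeOperand(uint64_t InstID, uint64_t OpID) {
  return int64_t(InstID) - int64_t(OpID); // 1 == previous instruction's value
}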
MODULE_CODE_TRIPLE Record¶
[TRIPLE, ...string...]
The TRIPLE
record (code 2) contains a variable number of values representing
the bytes of the target triple
specification string.
MODULE_CODE_DATALAYOUT Record¶
[DATALAYOUT, ...string...]
The DATALAYOUT
record (code 3) contains a variable number of values
representing the bytes of the target datalayout
specification string.
MODULE_CODE_ASM Record¶
[ASM, ...string...]
The ASM
record (code 4) contains a variable number of values representing
the bytes of module asm
strings, with individual assembly blocks separated
by newline (ASCII 10) characters.
MODULE_CODE_SECTIONNAME Record¶
[SECTIONNAME, ...string...]
The SECTIONNAME
record (code 5) contains a variable number of values
representing the bytes of a single section name string. There should be one
SECTIONNAME
record for each section name referenced (e.g., in global
variable or function section
attributes) within the module. These records
can be referenced by the 1-based index in the section fields of GLOBALVAR
or FUNCTION
records.
MODULE_CODE_DEPLIB Record¶
[DEPLIB, ...string...]
The DEPLIB
record (code 6) contains a variable number of values representing
the bytes of a single dependent library name string, one of the libraries
mentioned in a deplibs
declaration. There should be one DEPLIB
record
for each library name referenced.
MODULE_CODE_GLOBALVAR Record¶
[GLOBALVAR, pointer type, isconst, initid, linkage, alignment, section, visibility, threadlocal, unnamed_addr, externally_initialized, dllstorageclass, comdat]
The GLOBALVAR
record (code 7) marks the declaration or definition of a
global variable. The operand fields are:
- pointer type: The type index of the pointer type used to point to this global variable
- isconst: Non-zero if the variable is treated as constant within the module, or zero if it is not
- initid: If non-zero, the value index of the initializer for this variable, plus 1.
- linkage: An encoding of the linkage type for this variable:
  - external: code 0
  - weak: code 1
  - appending: code 2
  - internal: code 3
  - linkonce: code 4
  - dllimport: code 5
  - dllexport: code 6
  - extern_weak: code 7
  - common: code 8
  - private: code 9
  - weak_odr: code 10
  - linkonce_odr: code 11
  - available_externally: code 12
  - deprecated: code 13
  - deprecated: code 14
- alignment: The logarithm base 2 of the variable’s requested alignment, plus 1
- section: If non-zero, the 1-based section index in the table of MODULE_CODE_SECTIONNAME entries.
- visibility: If present, an encoding of the visibility of this variable:
  - default: code 0
  - hidden: code 1
  - protected: code 2
- threadlocal: If present, an encoding of the thread local storage mode of the variable:
  - not thread local: code 0
  - thread local; default TLS model: code 1
  - localdynamic: code 2
  - initialexec: code 3
  - localexec: code 4
- unnamed_addr: If present and non-zero, indicates that the variable has unnamed_addr
- dllstorageclass: If present, an encoding of the DLL storage class of this variable:
  - default: code 0
  - dllimport: code 1
  - dllexport: code 2
MODULE_CODE_FUNCTION Record¶
[FUNCTION, type, callingconv, isproto, linkage, paramattr, alignment, section, visibility, gc, prologuedata, dllstorageclass, comdat, prefixdata, personalityfn]
The FUNCTION
record (code 8) marks the declaration or definition of a
function. The operand fields are:
- type: The type index of the function type describing this function
- callingconv: The calling convention number:
  - ccc: code 0
  - fastcc: code 8
  - coldcc: code 9
  - webkit_jscc: code 12
  - anyregcc: code 13
  - preserve_mostcc: code 14
  - preserve_allcc: code 15
  - x86_stdcallcc: code 64
  - x86_fastcallcc: code 65
  - arm_apcscc: code 66
  - arm_aapcscc: code 67
  - arm_aapcs_vfpcc: code 68
- isproto: Non-zero if this entry represents a declaration rather than a definition
- linkage: An encoding of the linkage type for this function
- paramattr: If nonzero, the 1-based parameter attribute index into the table of PARAMATTR_CODE_ENTRY entries.
- alignment: The logarithm base 2 of the function’s requested alignment, plus 1
- section: If non-zero, the 1-based section index in the table of MODULE_CODE_SECTIONNAME entries.
- visibility: An encoding of the visibility of this function
- gc: If present and nonzero, the 1-based garbage collector index in the table of MODULE_CODE_GCNAME entries.
- unnamed_addr: If present and non-zero, indicates that the function has
unnamed_addr
- prologuedata: If non-zero, the value index of the prologue data for this function, plus 1.
- dllstorageclass: An encoding of the dllstorageclass of this function
- comdat: An encoding of the COMDAT of this function
- prefixdata: If non-zero, the value index of the prefix data for this function, plus 1.
- personalityfn: If non-zero, the value index of the personality function for this function, plus 1.
MODULE_CODE_ALIAS Record¶
[ALIAS, alias type, aliasee val#, linkage, visibility, dllstorageclass]
The ALIAS
record (code 9) marks the definition of an alias. The operand
fields are
- alias type: The type index of the alias
- aliasee val#: The value index of the aliased value
- linkage: An encoding of the linkage type for this alias
- visibility: If present, an encoding of the visibility of the alias
- dllstorageclass: If present, an encoding of the dllstorageclass of the alias
MODULE_CODE_PURGEVALS Record¶
[PURGEVALS, numvals]
The PURGEVALS
record (code 10) resets the module-level value list to the
size given by the single operand value. Module-level value list items are added
by GLOBALVAR
, FUNCTION
, and ALIAS
records. After a PURGEVALS
record is seen, new value indices will start from the given numvals value.
MODULE_CODE_GCNAME Record¶
[GCNAME, ...string...]
The GCNAME
record (code 11) contains a variable number of values
representing the bytes of a single garbage collector name string. There should
be one GCNAME
record for each garbage collector name referenced in function
gc
attributes within the module. These records can be referenced by 1-based
index in the gc fields of FUNCTION
records.
PARAMATTR_BLOCK Contents¶
The PARAMATTR_BLOCK
block (id 9) contains a table of entries describing the
attributes of function parameters. These entries are referenced by 1-based index
in the paramattr field of module block FUNCTION records, or within the
attr field of function block INST_INVOKE
and INST_CALL
records.
Entries within PARAMATTR_BLOCK
are constructed to ensure that each is unique
(i.e., no two indices represent equivalent attribute lists).
PARAMATTR_CODE_ENTRY Record¶
[ENTRY, paramidx0, attr0, paramidx1, attr1...]
The ENTRY
record (code 1) contains an even number of values describing a
unique set of function parameter attributes. Each paramidx value indicates
which set of attributes is represented, with 0 representing the return value
attributes, 0xFFFFFFFF representing function attributes, and other values
representing 1-based function parameters. Each attr value is a bitmap with the
following interpretation:
- bit 0: zeroext
- bit 1: signext
- bit 2: noreturn
- bit 3: inreg
- bit 4: sret
- bit 5: nounwind
- bit 6: noalias
- bit 7: byval
- bit 8: nest
- bit 9: readnone
- bit 10: readonly
- bit 11: noinline
- bit 12: alwaysinline
- bit 13: optsize
- bit 14: ssp
- bit 15: sspreq
- bits 16-31: align n
- bit 32: nocapture
- bit 33: noredzone
- bit 34: noimplicitfloat
- bit 35: naked
- bit 36: inlinehint
- bits 37-39: alignstack n, represented as the logarithm base 2 of the requested alignment, plus 1
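A short sketch of decoding a few of these fields (illustrative names of ours, not LLVM's attribute code):
#include <cstdint>

// Sketch: a few fields pulled out of the attribute bitmap laid out above.
struct DecodedAttrs {
  bool ZeroExt, SExt, ByVal;
  unsigned Align;      // 0 if absent
  unsigned AlignStack; // 0 if absent
};

static DecodedAttrs decodeAttrBitmap(uint64_t Attr) {
  DecodedAttrs D;
  D.ZeroExt = Attr & (1ULL << 0);
  D.SExt    = Attr & (1ULL << 1);
  D.ByVal   = Attr & (1ULL << 7);
  D.Align   = unsigned((Attr >> 16) & 0xFFFF); // bits 16-31: the align n field
  // Bits 37-39 hold log2(alignstack) + 1, so an encoded 0 means "not present".
  unsigned Enc = unsigned((Attr >> 37) & 0x7);
  D.AlignStack = Enc ? (1u << (Enc - 1)) : 0;
  return D;
}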
TYPE_BLOCK Contents¶
The TYPE_BLOCK
block (id 10) contains records which constitute a table of
type operator entries used to represent types referenced within an LLVM
module. Each record (with the exception of NUMENTRY) generates a single type
table entry, which may be referenced by 0-based index from instructions,
constants, metadata, type symbol table entries, or other type operator records.
Entries within TYPE_BLOCK
are constructed to ensure that each entry is
unique (i.e., no two indices represent structurally equivalent types).
TYPE_CODE_NUMENTRY Record¶
[NUMENTRY, numentries]
The NUMENTRY
record (code 1) contains a single value which indicates the
total number of type code entries in the type table of the module. If present,
NUMENTRY
should be the first record in the block.
TYPE_CODE_HALF Record¶
[HALF]
The HALF
record (code 10) adds a half
(16-bit floating point) type to
the type table.
TYPE_CODE_FLOAT Record¶
[FLOAT]
The FLOAT
record (code 3) adds a float
(32-bit floating point) type to
the type table.
TYPE_CODE_DOUBLE Record¶
[DOUBLE]
The DOUBLE
record (code 4) adds a double
(64-bit floating point) type to
the type table.
TYPE_CODE_OPAQUE Record¶
[OPAQUE]
The OPAQUE
record (code 6) adds an opaque
type to the type table. Note
that distinct opaque
types are not unified.
TYPE_CODE_INTEGER Record¶
[INTEGER, width]
The INTEGER
record (code 7) adds an integer type to the type table. The
single width field indicates the width of the integer type.
TYPE_CODE_POINTER Record¶
[POINTER, pointee type, address space]
The POINTER
record (code 8) adds a pointer type to the type table. The
operand fields are
- pointee type: The type index of the pointed-to type
- address space: If supplied, the target-specific numbered address space where the pointed-to object resides. Otherwise, the default address space is zero.
TYPE_CODE_FUNCTION Record¶
[FUNCTION, vararg, ignored, retty, ...paramty... ]
The FUNCTION
record (code 9) adds a function type to the type table. The
operand fields are
- vararg: Non-zero if the type represents a varargs function
- ignored: This value field is present for backward compatibility only, and is ignored
- retty: The type index of the function’s return type
- paramty: Zero or more type indices representing the parameter types of the function
TYPE_CODE_STRUCT Record¶
[STRUCT, ispacked, ...eltty...]
The STRUCT
record (code 10) adds a struct type to the type table. The
operand fields are
- ispacked: Non-zero if the type represents a packed structure
- eltty: Zero or more type indices representing the element types of the structure
TYPE_CODE_ARRAY Record¶
[ARRAY, numelts, eltty]
The ARRAY
record (code 11) adds an array type to the type table. The
operand fields are
- numelts: The number of elements in arrays of this type
- eltty: The type index of the array element type
TYPE_CODE_VECTOR Record¶
[VECTOR, numelts, eltty]
The VECTOR
record (code 12) adds a vector type to the type table. The
operand fields are
- numelts: The number of elements in vectors of this type
- eltty: The type index of the vector element type
TYPE_CODE_X86_FP80 Record¶
[X86_FP80]
The X86_FP80
record (code 13) adds an x86_fp80
(80-bit floating point)
type to the type table.
TYPE_CODE_FP128 Record¶
[FP128]
The FP128
record (code 14) adds an fp128
(128-bit floating point) type
to the type table.
TYPE_CODE_PPC_FP128 Record¶
[PPC_FP128]
The PPC_FP128
record (code 15) adds a ppc_fp128
(128-bit floating point)
type to the type table.
TYPE_CODE_METADATA Record¶
[METADATA]
The METADATA
record (code 16) adds a metadata
type to the type table.
CONSTANTS_BLOCK Contents¶
The CONSTANTS_BLOCK
block (id 11) ...
FUNCTION_BLOCK Contents¶
The FUNCTION_BLOCK
block (id 12) ...
In addition to the record types described below, a FUNCTION_BLOCK block may contain the following sub-blocks:
- CONSTANTS_BLOCK
- VALUE_SYMTAB_BLOCK
- METADATA_ATTACHMENT
TYPE_SYMTAB_BLOCK Contents¶
The TYPE_SYMTAB_BLOCK
block (id 13) contains entries which map between
module-level named types and their corresponding type indices.
TST_CODE_ENTRY Record¶
[ENTRY, typeid, ...string...]
The ENTRY
record (code 1) contains a variable number of values, with the
first giving the type index of the designated type, and the remaining values
giving the character codes of the type name. Each entry corresponds to a single
named type.
VALUE_SYMTAB_BLOCK Contents¶
The VALUE_SYMTAB_BLOCK
block (id 14) ...
METADATA_BLOCK Contents¶
The METADATA_BLOCK
block (id 15) ...
METADATA_ATTACHMENT Contents¶
The METADATA_ATTACHMENT
block (id 16) ...
LLVM Block Frequency Terminology¶
Introduction¶
Block Frequency is a metric for estimating the relative frequency of different
basic blocks. This document describes the terminology that the
BlockFrequencyInfo
and MachineBlockFrequencyInfo
analysis passes use.
Branch Probability¶
Blocks with multiple successors have probabilities associated with each outgoing edge. These are called branch probabilities. For a given block, the sum of its outgoing branch probabilities should be 1.0.
Branch Weight¶
Rather than storing fractions on each edge, we store an integer weight. Weights are relative to the other edges of a given predecessor block. The branch probability associated with a given edge is its own weight divided by the sum of the weights on the predecessor’s outgoing edges.
For example, consider this IR:
define void @foo() {
; ...
A:
br i1 %cond, label %B, label %C, !prof !0
; ...
}
!0 = metadata !{metadata !"branch_weights", i32 7, i32 8}
and this simple graph representation:
A -> B (edge-weight: 7)
A -> C (edge-weight: 8)
The probability of branching from block A to block B is 7/15, and the probability of branching from block A to block C is 8/15.
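A minimal helper mirroring this rule (our sketch, not an LLVM API):
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch: probability of edge I given all outgoing edge weights of one block.
// Summing in 64 bits avoids overflow of the unsigned 32-bit weights.
static double edgeProbability(const std::vector<uint32_t> &Weights, size_t I) {
  uint64_t Sum = 0;
  for (uint32_t W : Weights)
    Sum += W;
  return Sum ? double(Weights[I]) / double(Sum) : 0.0;
}
// edgeProbability({7, 8}, 0) == 7.0 / 15.0, matching the example above.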
See LLVM Branch Weight Metadata for details about the branch weight IR representation.
Block Frequency¶
Block frequency is a relative metric that represents the number of times a block executes. The ratio of a block frequency to the entry block frequency is the expected number of times the block will execute per entry to the function.
Block frequency is the main output of the BlockFrequencyInfo
and
MachineBlockFrequencyInfo
analysis passes.
Implementation: a series of DAGs¶
The implementation of the block frequency calculation analyses each loop, bottom-up, ignoring backedges; i.e., as a DAG. After each loop is processed, it’s packaged up to act as a pseudo-node in its parent loop’s (or the function’s) DAG analysis.
Block Mass¶
For each DAG, the entry node is assigned a mass of UINT64_MAX
and mass is
distributed to successors according to branch weights. Block Mass uses a
fixed-point representation where UINT64_MAX
represents 1.0
and 0
represents a number just above 0.0
.
After mass is fully distributed, in any cut of the DAG that separates the exit
nodes from the entry node, the sum of the block masses of the nodes succeeded
by a cut edge should equal UINT64_MAX
. In other words, mass is conserved
as it “falls” through the DAG.
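For instance, mass distribution by branch weight can be sketched like this (our illustration under the fixed-point convention above, not the pass's implementation; assumes at least one nonzero weight):
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch: split Mass across successors proportionally to their weights, giving
// any rounding remainder to the last successor so total mass is conserved.
static std::vector<uint64_t> distributeMass(uint64_t Mass,
                                            const std::vector<uint32_t> &W) {
  uint64_t Sum = 0;
  for (uint32_t Wi : W)
    Sum += Wi;
  std::vector<uint64_t> Out;
  uint64_t Given = 0;
  for (size_t I = 0; I < W.size(); ++I) {
    // __uint128_t (a GCC/Clang extension) avoids overflow of Mass * weight.
    uint64_t Part = (I + 1 == W.size())
                        ? Mass - Given // last successor absorbs rounding error
                        : uint64_t((__uint128_t)Mass * W[I] / Sum);
    Out.push_back(Part);
    Given += Part;
  }
  return Out;
}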
If a function’s basic block graph is a DAG, then block masses are valid block frequencies. This works poorly in practice, though, since downstream users rely on adding block frequencies together without hitting the maximum.
Loop Scale¶
Loop scale is a metric that indicates how many times a loop iterates per entry. As mass is distributed through the loop’s DAG, the (otherwise ignored) backedge mass is collected. This backedge mass is used to compute the exit frequency, and thus the loop scale.
Implementation: Getting from mass and scale to frequency¶
After analysing the complete series of DAGs, each block has a mass (local to its containing loop, if any), and each loop pseudo-node has a loop scale and its own mass (from its parent’s DAG).
We can get an initial frequency assignment (with entry frequency of 1.0) by multiplying these masses and loop scales together. A given block’s frequency is the product of its mass, the mass of containing loops’ pseudo nodes, and the containing loops’ loop scales.
Since downstream users need integers (not floating point), this initial
frequency assignment is shifted as necessary into the range of uint64_t
.
Block Bias¶
Block bias is a proposed absolute metric to indicate a bias toward or away from a given block during a function’s execution. The idea is that bias can be used in isolation to indicate whether a block is relatively hot or cold, or to compare two blocks to indicate whether one is hotter or colder than the other.
The proposed calculation involves calculating a reference block frequency, where:
- every branch weight is assumed to be 1 (i.e., every branch probability distribution is even) and
- loop scales are ignored.
This reference frequency represents what the block frequency would be in an unbiased graph.
The bias is the ratio of the block frequency to this reference block frequency.
LLVM Branch Weight Metadata¶
Introduction¶
Branch Weight Metadata represents branch weights as the likelihood that each
branch will be taken (see LLVM Block Frequency Terminology). Metadata is
assigned to the TerminatorInst as an MDNode of the MD_prof kind. The first
operand is always an MDString node with the string “branch_weights”. The
number of operands depends on the terminator type.
Branch weights may be fetched from a profiling file, or generated based on the __builtin_expect builtin.
All weights are represented as unsigned 32-bit values, where a higher value indicates a greater chance of the branch being taken.
Supported Instructions¶
BranchInst
¶
Metadata is only assigned to conditional branches. There are two extra operands, for the true and the false branch.
!0 = metadata !{
metadata !"branch_weights",
i32 <TRUE_BRANCH_WEIGHT>,
i32 <FALSE_BRANCH_WEIGHT>
}
SwitchInst
¶
Branch weights are assigned to every case (including the default
case which
is always case #0).
!0 = metadata !{
metadata !"branch_weights",
i32 <DEFAULT_BRANCH_WEIGHT>
[ , i32 <CASE_BRANCH_WEIGHT> ... ]
}
IndirectBrInst
¶
Branch weights are assigned to every destination.
!0 = metadata !{
metadata !"branch_weights",
i32 <LABEL_BRANCH_WEIGHT>
[ , i32 <LABEL_BRANCH_WEIGHT> ... ]
}
Built-in expect
Instructions¶
The __builtin_expect(long exp, long c) builtin provides branch prediction
information. The return value is the value of exp.
It is especially useful in conditional statements. Currently Clang supports two conditional statements:
if
statement¶
The exp
parameter is the condition. The c
parameter is the expected
comparison value. If it is equal to 1 (true), the condition is likely to be
true; otherwise, the condition is likely to be false. For example:
if (__builtin_expect(x > 0, 1)) {
// This block is likely to be taken.
}
switch
statement¶
The exp
parameter is the value. The c
parameter is the expected
value. If the expected value does not appear in the case list, the default
case is assumed to be likely taken.
switch (__builtin_expect(x, 5)) {
default: break;
case 0: // ...
case 3: // ...
case 5: // This case is likely to be taken.
}
CFG Modifications¶
Branch Weight Metadata is not proof against CFG changes. If terminator operands are changed, some action should be taken; otherwise, misoptimizations may occur due to incorrect branch prediction information.
Function Entry Counts¶
To allow comparing different functions during inter-procedural analysis and
optimization, MD_prof
nodes can also be assigned to a function definition.
The first operand is a string indicating the name of the associated counter.
Currently, one counter is supported: “function_entry_count”. This is a 64-bit counter that indicates the number of times that this function was invoked (in the case of instrumentation-based profiles). In the case of sampling-based profiles, this counter is an approximation of how many times the function was invoked.
For example, in the code below, the instrumentation for function foo() indicates that it was called 2,590 times at runtime.
define i32 @foo() !prof !1 {
ret i32 0
}
!1 = !{!"function_entry_count", i64 2590}
LLVM bugpoint tool: design and usage¶
Description¶
bugpoint
narrows down the source of problems in LLVM tools and passes. It
can be used to debug three types of failures: optimizer crashes, miscompilations
by optimizers, or bad native code generation (including problems in the static
and JIT compilers). It aims to reduce large test cases to small, useful ones.
For example, if opt
crashes while optimizing a file, it will identify the
optimization (or combination of optimizations) that causes the crash, and reduce
the file down to a small example which triggers the crash.
For detailed case scenarios, such as debugging opt
, or one of the LLVM code
generators, see How to submit an LLVM bug report.
Design Philosophy¶
bugpoint
is designed to be a useful tool without requiring any hooks into
the LLVM infrastructure at all. It works with any and all LLVM passes and code
generators, and does not need to “know” how they work. Because of this, it may
appear to do stupid things or miss obvious simplifications. bugpoint
is
also designed to trade off programmer time for computer time in the
compiler-debugging process; consequently, it may take a long period of
(unattended) time to reduce a test case, but we feel it is still worth it. Note
that bugpoint
is generally very quick unless debugging a miscompilation
where each test of the program (which requires executing it) takes a long time.
Automatic Debugger Selection¶
bugpoint
reads each .bc
or .ll
file specified on the command line
and links them together into a single module, called the test program. If any
LLVM passes are specified on the command line, it runs these passes on the test
program. If any of the passes crash, or if they produce malformed output (which
causes the verifier to abort), bugpoint
starts the crash debugger.
Otherwise, if the -output
option was not specified, bugpoint
runs the
test program with the “safe” backend (which is assumed to generate good code) to
generate a reference output. Once bugpoint
has a reference output for the
test program, it tries executing it with the selected code generator. If the
selected code generator crashes, bugpoint
starts the crash debugger on
the code generator. Otherwise, if the resulting output differs from the
reference output, it assumes the difference resulted from a code generator
failure, and starts the code generator debugger.
Finally, if the output of the selected code generator matches the reference
output, bugpoint
runs the test program after all of the LLVM passes have
been applied to it. If its output differs from the reference output, it assumes
the difference resulted from a failure in one of the LLVM passes, and enters the
miscompilation debugger. Otherwise, there is no problem bugpoint
can
debug.
Crash debugger¶
If an optimizer or code generator crashes, bugpoint
will try as hard as it
can to reduce the list of passes (for optimizer crashes) and the size of the
test program. First, bugpoint
figures out which combination of optimizer
passes triggers the bug. This is useful when debugging a problem exposed by
opt
, for example, because it runs over 38 passes.
Next, bugpoint
tries removing functions from the test program, to reduce its
size. Usually it is able to reduce a test program to a single function, when
debugging intraprocedural optimizations. Once the number of functions has been
reduced, it attempts to delete various edges in the control flow graph, to
reduce the size of the function as much as possible. Finally, bugpoint
deletes any individual LLVM instructions whose absence does not eliminate the
failure. At the end, bugpoint
should tell you what passes crash, give you a
bitcode file, and give you instructions on how to reproduce the failure with
opt
or llc
.
Code generator debugger¶
The code generator debugger attempts to narrow down the amount of code that is being miscompiled by the selected code generator. To do this, it takes the test program and partitions it into two pieces: one piece which it compiles with the “safe” backend (into a shared object), and one piece which it runs with either the JIT or the static LLC compiler. It uses several techniques to reduce the amount of code pushed through the LLVM code generator, to reduce the potential scope of the problem. After it is finished, it emits two bitcode files (called “test” [to be compiled with the code generator] and “safe” [to be compiled with the “safe” backend], respectively), and instructions for reproducing the problem. The code generator debugger assumes that the “safe” backend produces good code.
Miscompilation debugger¶
The miscompilation debugger works similarly to the code generator debugger. It works by splitting the test program into two pieces, running the optimizations specified on one piece, linking the two pieces back together, and then executing the result. It attempts to narrow down the list of passes to the one (or few) which are causing the miscompilation, then reduce the portion of the test program which is being miscompiled. The miscompilation debugger assumes that the selected code generator is working properly.
Advice for using bugpoint¶
bugpoint
can be a remarkably useful tool, but it sometimes works in
non-obvious ways. Here are some hints and tips:
- In the code generator and miscompilation debuggers, bugpoint only works with programs that have deterministic output. Thus, if the program outputs argv[0], the date, time, or any other “random” data, bugpoint may misinterpret differences in these data, when output, as the result of a miscompilation. Programs should be temporarily modified to disable outputs that are likely to vary from run to run.
- In the code generator and miscompilation debuggers, debugging will go faster if you manually modify the program or its inputs to reduce the runtime, but still exhibit the problem.
- bugpoint is extremely useful when working on a new optimization: it helps track down regressions quickly. To avoid having to relink bugpoint every time you change your optimization, however, have bugpoint dynamically load your optimization with the -load option.
- bugpoint can generate a lot of output and run for a long period of time. It is often useful to capture the output of the program to a file. For example, in the C shell, you can run:
  $ bugpoint ... |& tee bugpoint.log
  to get a copy of bugpoint’s output in the file bugpoint.log, as well as on your terminal.
- bugpoint cannot debug problems with the LLVM linker. If bugpoint crashes before you see its “All input ok” message, you might try llvm-link -v on the same set of input files. If that also crashes, you may be experiencing a linker bug.
- bugpoint is useful for proactively finding bugs in LLVM. Invoking bugpoint with the -find-bugs option will cause the list of specified optimizations to be randomized and applied to the program. This process will repeat until a bug is found or the user kills bugpoint.
What to do when bugpoint isn’t enough¶
Sometimes, bugpoint
is not enough. In particular, InstCombine and
TargetLowering both have visitor structured code with lots of potential
transformations. If the process of using bugpoint has left you with still too
much code to figure out and the problem seems to be in instcombine, the
following steps may help. These same techniques are useful with TargetLowering
as well.
Turn on -debug-only=instcombine
and see which transformations within
instcombine are firing by selecting out lines with “IC
” in them.
At this point, you have a decision to make. Is the number of transformations small enough to step through them using a debugger? If so, then try that.
If there are too many transformations, then a source modification approach may
be helpful. In this approach, you can modify the source code of instcombine to
disable just those transformations that are being performed on your test input
and perform a binary search over the set of transformations. One set of places
to modify is the “visit*
” methods of InstCombiner
(e.g.
visitICmpInst
) by adding a “return false
” as the first line of the
method.
If that still doesn’t remove enough, then change the caller of
InstCombiner::DoOneIteration
, InstCombiner::runOnFunction
to limit the
number of iterations.
You may also find it useful to use “-stats
” now to see what parts of
instcombine are firing. This can guide where to put additional reporting code.
At this point, if the number of transformations is still too large, then inserting code to limit whether or not to execute the body of the code in the visit function can be helpful. Add a static counter which is incremented on every invocation of the function. Then add code which simply returns false on desired ranges. For example:
static int calledCount = 0;
calledCount++;
// Skip all calls outside the suspect range, then carve out individual calls
// so that only calls 212 and 217 actually perform the transformation.
DEBUG(if (calledCount < 212) return false);
DEBUG(if (calledCount > 217) return false);
DEBUG(if (calledCount == 213) return false);
DEBUG(if (calledCount == 214) return false);
DEBUG(if (calledCount == 215) return false);
DEBUG(if (calledCount == 216) return false);
DEBUG(dbgs() << "visitXOR calledCount: " << calledCount << "\n");
DEBUG(dbgs() << "I: "; I->dump());
could be added to visitXOR to limit it to being applied only to
calls 212 and 217. This is from an actual test case and raises an important
point—a simple binary search may not be sufficient, as transformations that
interact may require isolating more than one call. In TargetLowering, use
return SDNode();
instead of return false;
.
Now that the number of transformations is down to a manageable number, try
examining the output to see if you can figure out which transformations are
being done. If that can be figured out, then do the usual debugging. If which
code corresponds to the transformation being performed isn’t obvious, set a
breakpoint after the call count based disabling and step through the code.
Alternatively, you can use “printf
” style debugging to report waypoints.
The LLVM Target-Independent Code Generator¶
- Introduction
- Target description classes
- Machine code description classes
- The “MC” Layer
- Target-independent code generation algorithms
- Instruction Selection
- Introduction to SelectionDAGs
- SelectionDAG Instruction Selection Process
- Initial SelectionDAG Construction
- SelectionDAG LegalizeTypes Phase
- SelectionDAG Legalize Phase
- SelectionDAG Optimization Phase: the DAG Combiner
- SelectionDAG Select Phase
- SelectionDAG Scheduling and Formation Phase
- Future directions for the SelectionDAG
- SSA-based Machine Code Optimizations
- Live Intervals
- Register Allocation
- Prolog/Epilog Code Insertion
- Late Machine Code Optimizations
- Code Emission
- VLIW Packetizer
- Instruction Selection
- Implementing a Native Assembler
- Target-specific Implementation Notes
Warning
This is a work in progress.
Introduction¶
The LLVM target-independent code generator is a framework that provides a suite of reusable components for translating the LLVM internal representation to the machine code for a specified target—either in assembly form (suitable for a static compiler) or in binary machine code format (usable for a JIT compiler). The LLVM target-independent code generator consists of six main components:
- Abstract target description interfaces which capture important properties
about various aspects of the machine, independently of how they will be used.
These interfaces are defined in
include/llvm/Target/
. - Classes used to represent the code being generated for a target. These
classes are intended to be abstract enough to represent the machine code for
any target machine. These classes are defined in
include/llvm/CodeGen/
. At this level, concepts like “constant pool entries” and “jump tables” are explicitly exposed. - Classes and algorithms used to represent code as the object file level, the MC Layer. These classes represent assembly level constructs like labels, sections, and instructions. At this level, concepts like “constant pool entries” and “jump tables” don’t exist.
- Target-independent algorithms used to implement various phases of native
code generation (register allocation, scheduling, stack frame representation,
etc). This code lives in
lib/CodeGen/
. - Implementations of the abstract target description interfaces for
particular targets. These machine descriptions make use of the components
provided by LLVM, and can optionally provide custom target-specific passes,
to build complete code generators for a specific target. Target descriptions
live in
lib/Target/
- The target-independent JIT components. The LLVM JIT is completely target independent (it uses the TargetJITInfo structure to interface for target-specific issues). The code for the target-independent JIT lives in lib/ExecutionEngine/JIT.
Depending on which part of the code generator you are interested in working on, different pieces of this will be useful to you. In any case, you should be familiar with the target description and machine code representation classes. If you want to add a backend for a new target, you will need to implement the target description classes for your new target and understand the LLVM code representation. If you are interested in implementing a new code generation algorithm, it should only depend on the target-description and machine code representation classes, ensuring that it is portable.
Required components in the code generator¶
The two pieces of the LLVM code generator are the high-level interface to the code generator and the set of reusable components that can be used to build target-specific backends. The two most important interfaces (TargetMachine and DataLayout) are the only ones that are required to be defined for a backend to fit into the LLVM system, but the others must be defined if the reusable code generator components are going to be used.
This design has two important implications. The first is that LLVM can support completely non-traditional code generation targets. For example, the C backend does not require register allocation, instruction selection, or any of the other standard components provided by the system. As such, it only implements these two interfaces, and does its own thing. Note that the C backend was removed from trunk as of the LLVM 3.1 release. Another example of a code generator like this is a (purely hypothetical) backend that converts LLVM to the GCC RTL form and uses GCC to emit machine code for a target.
This design also implies that it is possible to design and implement radically different code generators in the LLVM system that do not make use of any of the built-in components. Doing so is not recommended at all, but could be required for radically different targets that do not fit into the LLVM machine description model: FPGAs for example.
The high-level design of the code generator¶
The LLVM target-independent code generator is designed to support efficient and quality code generation for standard register-based microprocessors. Code generation in this model is divided into the following stages:
- Instruction Selection — This phase determines an efficient way to express the input LLVM code in the target instruction set. This stage produces the initial code for the program in the target instruction set, then makes use of virtual registers in SSA form and physical registers that represent any required register assignments due to target constraints or calling conventions. This step turns the LLVM code into a DAG of target instructions.
- Scheduling and Formation — This phase takes the DAG of target instructions produced by the instruction selection phase, determines an ordering of the instructions, then emits the instructions as MachineInstrs with that ordering. Note that we describe this in the instruction selection section because it operates on a SelectionDAG.
- SSA-based Machine Code Optimizations — This optional stage consists of a series of machine-code optimizations that operate on the SSA-form produced by the instruction selector. Optimizations like modulo-scheduling or peephole optimization work here.
- Register Allocation — The target code is transformed from an infinite virtual register file in SSA form to the concrete register file used by the target. This phase introduces spill code and eliminates all virtual register references from the program.
- Prolog/Epilog Code Insertion — Once the machine code has been generated for the function and the amount of stack space required is known (used for LLVM alloca’s and spill slots), the prolog and epilog code for the function can be inserted and “abstract stack location references” can be eliminated. This stage is responsible for implementing optimizations like frame-pointer elimination and stack packing.
- Late Machine Code Optimizations — Optimizations that operate on “final” machine code can go here, such as spill code scheduling and peephole optimizations.
- Code Emission — The final stage actually puts out the code for the current function, either in the target assembler format or in machine code.
The code generator is based on the assumption that the instruction selector will use an optimal pattern matching selector to create high-quality sequences of native instructions. Alternative code generator designs based on pattern expansion and aggressive iterative peephole optimization are much slower. This design permits efficient compilation (important for JIT environments) and aggressive optimization (used when generating code offline) by allowing components of varying levels of sophistication to be used for any step of compilation.
In addition to these stages, target implementations can insert arbitrary target-specific passes into the flow. For example, the X86 target uses a special pass to handle the 80x87 floating point stack architecture. Other targets with unusual requirements can be supported with custom passes as needed.
Using TableGen for target description¶
The target description classes require a detailed description of the target
architecture. These target descriptions often have a large amount of common
information (e.g., an add
instruction is almost identical to a sub
instruction). In order to allow the maximum amount of commonality to be
factored out, the LLVM code generator uses the
TableGen tool to describe big chunks of the
target machine, which allows the use of domain-specific and target-specific
abstractions to reduce the amount of repetition.
As LLVM continues to be developed and refined, we plan to move more and more of
the target description to the .td
form. Doing so gives us a number of
advantages. The most important is that it makes it easier to port LLVM because
it reduces the amount of C++ code that has to be written, and the surface area
of the code generator that needs to be understood before someone can get
something working. Second, it makes it easier to change things. In particular,
if tables and other things are all emitted by tblgen
, we only need a change
in one place (tblgen
) to update all of the targets to a new interface.
Target description classes¶
The LLVM target description classes (located in the include/llvm/Target
directory) provide an abstract description of the target machine independent of
any particular client. These classes are designed to capture the abstract
properties of the target (such as the instructions and registers it has), and do
not incorporate any particular pieces of code generation algorithms.
All of the target description classes (except the DataLayout class) are designed to be subclassed by the concrete target implementation, and have virtual methods implemented. To get to these implementations, the TargetMachine class provides accessors that should be implemented by the target.
The TargetMachine
class¶
The TargetMachine
class provides virtual methods that are used to access the
target-specific implementations of the various target description classes via
the get*Info
methods (getInstrInfo
, getRegisterInfo
,
getFrameInfo
, etc.). This class is designed to be specialized by a concrete
target implementation (e.g., X86TargetMachine
) which implements the various
virtual methods. The only required target description class is the
DataLayout class, but if the code
generator components are to be used, the other interfaces should be implemented
as well.
The DataLayout
class¶
The DataLayout
class is the only required target description class, and it
is the only class that is not extensible (you cannot derive a new class from
it). DataLayout
specifies information about how the target lays out memory
for structures, the alignment requirements for various data types, the size of
pointers in the target, and whether the target is little-endian or
big-endian.
The TargetLowering
class¶
The TargetLowering
class is used by SelectionDAG based instruction selectors
primarily to describe how LLVM code should be lowered to SelectionDAG
operations. Among other things, this class indicates:
- an initial register class to use for various
ValueType
s, - which operations are natively supported by the target machine,
- the return type of
setcc
operations, - the type to use for shift amounts, and
- various high-level characteristics, like whether it is profitable to turn division by a constant into a multiplication sequence.
The TargetRegisterInfo
class¶
The TargetRegisterInfo
class is used to describe the register file of the
target and any interactions between the registers.
Registers are represented in the code generator by unsigned integers. Physical
registers (those that actually exist in the target description) are unique
small numbers, and virtual registers are generally large. Note that
register #0
is reserved as a flag value.
Each register in the processor description has an associated
TargetRegisterDesc
entry, which provides a textual name for the register
(used for assembly output and debugging dumps) and a set of aliases (used to
indicate whether one register overlaps with another).
In addition to the per-register description, the TargetRegisterInfo
class
exposes a set of processor specific register classes (instances of the
TargetRegisterClass
class). Each register class contains sets of registers
that have the same properties (for example, they are all 32-bit integer
registers). Each SSA virtual register created by the instruction selector has
an associated register class. When the register allocator runs, it replaces
virtual registers with a physical register in the set.
The target-specific implementations of these classes are auto-generated from a TableGen description of the register file.
The TargetInstrInfo
class¶
The TargetInstrInfo
class is used to describe the machine instructions
supported by the target. Descriptions define things like the mnemonic for
the opcode, the number of operands, the list of implicit register uses and defs,
whether the instruction has certain target-independent properties (accesses
memory, is commutable, etc), and hold any target-specific flags.
The TargetFrameLowering
class¶
The TargetFrameLowering
class is used to provide information about the stack
frame layout of the target. It holds the direction of stack growth, the known
stack alignment on entry to each function, and the offset to the local area.
The offset to the local area is the offset from the stack pointer on function
entry to the first location where function data (local variables, spill
locations) can be stored.
The TargetSubtarget
class¶
The TargetSubtarget
class is used to provide information about the specific
chip set being targeted. A sub-target informs code generation of which
instructions are supported, instruction latencies and instruction execution
itinerary; i.e., which processing units are used, in what order, and for how
long.
The TargetJITInfo
class¶
The TargetJITInfo
class exposes an abstract interface used by the
Just-In-Time code generator to perform target-specific activities, such as
emitting stubs. If a TargetMachine
supports JIT code generation, it should
provide one of these objects through the getJITInfo
method.
Machine code description classes¶
At the high-level, LLVM code is translated to a machine specific representation
formed out of MachineFunction ,
MachineBasicBlock , and
MachineInstr instances (defined in
include/llvm/CodeGen
). This representation is completely target agnostic,
representing instructions in their most abstract form: an opcode and a series of
operands. This representation is designed to support both an SSA representation
for machine code and a register allocated, non-SSA form.
The MachineInstr
class¶
Target machine instructions are represented as instances of the MachineInstr
class. This class is an extremely abstract way of representing machine
instructions. In particular, it only keeps track of an opcode number and a set
of operands.
The opcode number is a simple unsigned integer that only has meaning to a
specific backend. All of the instructions for a target should be defined in the
*InstrInfo.td
file for the target. The opcode enum values are auto-generated
from this description. The MachineInstr
class does not have any information
about how to interpret the instruction (i.e., what the semantics of the
instruction are); for that you must refer to the
TargetInstrInfo class.
The operands of a machine instruction can be of several different types: a register reference, a constant integer, a basic block reference, etc. In addition, a machine operand should be marked as a def or a use of the value (though only registers are allowed to be defs).
By convention, the LLVM code generator orders instruction operands so that all
register definitions come before the register uses, even on architectures that
are normally printed in other orders. For example, the SPARC add instruction:
“add %i1, %i2, %i3
” adds the “%i1” and “%i2” registers and stores the
result into the “%i3” register. In the LLVM code generator, the operands should
be stored as “%i3, %i1, %i2
”: with the destination first.
Keeping destination (definition) operands at the beginning of the operand list has several advantages. In particular, the debugging printer will print the instruction like this:
%r3 = add %i1, %i2
Also if the first operand is a def, it is easier to create instructions whose only def is the first operand.
Using the MachineInstrBuilder.h
functions¶
Machine instructions are created by using the BuildMI
functions, located in
the include/llvm/CodeGen/MachineInstrBuilder.h
file. The BuildMI
functions make it easy to build arbitrary machine instructions. Usage of the
BuildMI
functions looks like this:
// Create a 'DestReg = mov 42' (rendered in X86 assembly as 'mov DestReg, 42')
// instruction. The '1' specifies how many operands will be added.
MachineInstr *MI = BuildMI(X86::MOV32ri, 1, DestReg).addImm(42);
// Create the same instr, but insert it at the end of a basic block.
MachineBasicBlock &MBB = ...
BuildMI(MBB, X86::MOV32ri, 1, DestReg).addImm(42);
// Create the same instr, but insert it before a specified iterator point.
MachineBasicBlock::iterator MBBI = ...
BuildMI(MBB, MBBI, X86::MOV32ri, 1, DestReg).addImm(42);
// Create a 'cmp Reg, 0' instruction, no destination reg.
MI = BuildMI(X86::CMP32ri, 2).addReg(Reg).addImm(0);
// Create an 'sahf' instruction which takes no operands and stores nothing.
MI = BuildMI(X86::SAHF, 0);
// Create a self looping branch instruction.
BuildMI(MBB, X86::JNE, 1).addMBB(&MBB);
The key thing to remember with the BuildMI
functions is that you have to
specify the number of operands that the machine instruction will take. This
allows for efficient memory allocation. Also note that operands
default to be uses of values, not definitions. If you need to add a definition
operand (other than the optional destination register), you must explicitly mark
it as such:
MI.addReg(Reg, RegState::Define);
Fixed (preassigned) registers¶
One important issue that the code generator needs to be aware of is the presence of fixed registers. In particular, there are often places in the instruction stream where the register allocator must arrange for a particular value to be in a particular register. This can occur due to limitations of the instruction set (e.g., the X86 can only do a 32-bit divide with the EAX/EDX registers), or external factors like calling conventions. In any case, the instruction selector should emit code that copies a virtual register into or out of a physical register when needed.
For example, consider this simple LLVM fragment:
define i32 @test(i32 %X, i32 %Y) {
%Z = sdiv i32 %X, %Y
ret i32 %Z
}
The X86 instruction selector might produce this machine code for the div and ret:
;; Start of div
%EAX = mov %reg1024 ;; Copy X (in reg1024) into EAX
%reg1027 = sar %reg1024, 31
%EDX = mov %reg1027 ;; Sign extend X into EDX
idiv %reg1025 ;; Divide by Y (in reg1025)
%reg1026 = mov %EAX ;; Read the result (Z) out of EAX
;; Start of ret
%EAX = mov %reg1026 ;; 32-bit return value goes in EAX
ret
By the end of code generation, the register allocator would coalesce the registers and delete the resultant identity moves, producing the following code:
;; X is in EAX, Y is in ECX
mov %EAX, %EDX
sar %EDX, 31
idiv %ECX
ret
This approach is extremely general (if it can handle the X86 architecture, it can handle anything!) and allows all of the target specific knowledge about the instruction stream to be isolated in the instruction selector. Note that physical registers should have a short lifetime for good code generation, and all physical registers are assumed dead on entry to and exit from basic blocks (before register allocation). Thus, if you need a value to be live across basic block boundaries, it must live in a virtual register.
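For instance, an instruction selector that needs a value in a fixed register typically emits plain copies around the constrained instruction. A hedged sketch using TargetOpcode::COPY and the BuildMI overload that takes a DebugLoc and MCInstrDesc (VRegX, VRegZ, InsertPt, DL, and TII are assumed to be set up by the surrounding selector code):
// Move the virtual register into the fixed physical register just before
// the constrained instruction, and copy the fixed result back out after it.
BuildMI(MBB, InsertPt, DL, TII->get(TargetOpcode::COPY), X86::EAX)
    .addReg(VRegX);
// ... emit the instruction that implicitly reads EAX and defines EAX/EDX ...
BuildMI(MBB, InsertPt, DL, TII->get(TargetOpcode::COPY), VRegZ)
    .addReg(X86::EAX);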
Call-clobbered registers¶
Some machine instructions, like calls, clobber a large number of physical registers. Rather than adding <def,dead> operands for all of them, it is possible to use an MO_RegisterMask operand instead. The register mask operand holds a bit mask of preserved registers, and everything else is considered to be clobbered by the instruction.
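A sketch of how a pass might query such an operand (MachineOperand::isRegMask and clobbersPhysReg are the relevant predicates; the helper name is illustrative):
// Returns true if MI carries a register-mask operand that does not preserve
// PhysReg, i.e. the instruction clobbers that register.
static bool clobbersPhysicalReg(const MachineInstr &MI, unsigned PhysReg) {
  for (const MachineOperand &MO : MI.operands())
    if (MO.isRegMask() && MO.clobbersPhysReg(PhysReg))
      return true;
  return false;
}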
Machine code in SSA form¶
MachineInstrs are initially selected in SSA form, and are maintained in SSA form until register allocation happens. For the most part, this is trivially simple since LLVM is already in SSA form; LLVM PHI nodes become machine code PHI nodes, and virtual registers are only allowed to have a single definition.
After register allocation, machine code is no longer in SSA-form because there are no virtual registers left in the code.
The MachineBasicBlock class¶
The MachineBasicBlock class contains a list of machine instructions (MachineInstr instances). It roughly corresponds to the LLVM code input to the instruction selector, but there can be a one-to-many mapping (i.e. one LLVM basic block can map to multiple machine basic blocks). The MachineBasicBlock class has a "getBasicBlock" method, which returns the LLVM basic block that it comes from.
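For example, a pass can walk a function's blocks and recover the originating IR blocks. A small sketch (getBasicBlock may return null when a machine basic block has no single IR counterpart):
// Report which IR basic block each machine basic block was selected from.
for (const MachineBasicBlock &MBB : MF) {
  if (const BasicBlock *BB = MBB.getBasicBlock())
    errs() << "MBB from IR block '" << BB->getName() << "'\n";
  else
    errs() << "MBB with no originating IR block\n";
}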
The MachineFunction class¶
The MachineFunction class contains a list of machine basic blocks (MachineBasicBlock instances). It corresponds one-to-one with the LLVM function input to the instruction selector. In addition to a list of basic blocks, the MachineFunction contains a MachineConstantPool, a MachineFrameInfo, a MachineFunctionInfo, and a MachineRegisterInfo. See include/llvm/CodeGen/MachineFunction.h for more information.
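Those helper objects are reached through accessors on MachineFunction. A sketch (whether these return pointers or references has varied across LLVM releases):
// Typical accesses to per-function code generation state.
MachineRegisterInfo &MRI = MF.getRegInfo();      // virtual register info
MachineFrameInfo *MFI = MF.getFrameInfo();       // stack frame objects
MachineConstantPool *MCP = MF.getConstantPool(); // constant pool entries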
MachineInstr Bundles¶
The LLVM code generator can model sequences of instructions as MachineInstr bundles. A MI bundle can model a VLIW group / pack which contains an arbitrary number of parallel instructions. It can also be used to model a sequential list of instructions (potentially with data dependencies) that cannot be legally separated (e.g. ARM Thumb2 IT blocks).
Conceptually a MI bundle is a MI with a number of other MIs nested within:
--------------
| Bundle | ---------
-------------- \
| ----------------
| | MI |
| ----------------
| |
| ----------------
| | MI |
| ----------------
| |
| ----------------
| | MI |
| ----------------
|
--------------
| Bundle | --------
-------------- \
| ----------------
| | MI |
| ----------------
| |
| ----------------
| | MI |
| ----------------
| |
| ...
|
--------------
| Bundle | --------
-------------- \
|
...
MI bundle support does not change the physical representations of MachineBasicBlock and MachineInstr. All the MIs (including top level and nested ones) are stored as a sequential list of MIs. The "bundled" MIs are marked with the 'InsideBundle' flag. A top level MI with the special BUNDLE opcode is used to represent the start of a bundle. It's legal to mix BUNDLE MIs with individual MIs that are not inside bundles and do not themselves represent bundles.
MachineInstr passes should operate on a MI bundle as a single unit. Member methods have been taught to correctly handle bundles and MIs inside bundles. The MachineBasicBlock iterator has been modified to skip over bundled MIs to enforce the bundle-as-a-single-unit concept. An alternative iterator instr_iterator has been added to MachineBasicBlock to allow passes to iterate over all of the MIs in a MachineBasicBlock, including those which are nested inside bundles. The top level BUNDLE instruction must have the correct set of register MachineOperands that represent the cumulative inputs and outputs of the bundled MIs.
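The two views look like this in practice (visitTopLevel and visitEveryMI are hypothetical helpers):
// Bundle-as-a-unit view: the ordinary iterator skips MIs inside bundles.
for (MachineBasicBlock::iterator I = MBB.begin(), E = MBB.end(); I != E; ++I)
  visitTopLevel(*I);   // sees unbundled MIs and top-level BUNDLE MIs only

// Exhaustive view: instr_iterator visits every MI, bundled or not.
for (MachineBasicBlock::instr_iterator I = MBB.instr_begin(),
                                       E = MBB.instr_end(); I != E; ++I)
  visitEveryMI(*I);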
Packing / bundling of MachineInstrs should be done as part of the register allocation super-pass. More specifically, the pass which determines what MIs should be bundled together must run after the code generator exits SSA form (i.e. after the two-address pass, PHI elimination, and copy coalescing). Bundles should only be finalized (i.e. adding BUNDLE MIs and input and output register MachineOperands) after virtual registers have been rewritten into physical registers. This requirement eliminates the need to add virtual register operands to BUNDLE instructions, which would effectively double the virtual register def and use lists.
The “MC” Layer¶
The MC Layer is used to represent and process code at the raw machine code level, devoid of “high level” information like “constant pools”, “jump tables”, “global variables” or anything like that. At this level, LLVM handles things like label names, machine instructions, and sections in the object file. The code in this layer is used for a number of important purposes: the tail end of the code generator uses it to write a .s or .o file, and it is also used by the llvm-mc tool to implement standalone machine code assemblers and disassemblers.
This section describes some of the important classes. There are also a number of important subsystems that interact at this layer; they are described later in this manual.
The MCStreamer API¶
MCStreamer is best thought of as an assembler API. It is an abstract API which is implemented in different ways (e.g. to output a .s file, output an ELF .o file, etc) but whose API corresponds directly to what you see in a .s file. MCStreamer has one method per directive, such as EmitLabel, EmitSymbolAttribute, SwitchSection, EmitValue (for .byte, .word), etc, which directly correspond to assembly level directives. It also has an EmitInstruction method, which is used to output an MCInst to the streamer.
This API is most important for two clients: the llvm-mc stand-alone assembler, which is effectively a parser that parses a line and then invokes a method on MCStreamer; and the Code Emission phase of the code generator, which lowers higher level LLVM IR and Machine* constructs down to the MC layer, emitting directives through MCStreamer.
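As an illustration, emitting "foo:" followed by ".byte 4" through the streamer might look like the following sketch. The method names follow the directive-per-method scheme described above, but exact spellings and signatures have changed across LLVM releases, and the Ctx, Streamer, and TextSection objects are assumed to exist:
// Emit:   foo:
//         .byte 4
MCSymbol *Foo = Ctx.getOrCreateSymbol("foo"); // uniqued by the MCContext
Streamer.SwitchSection(TextSection);          // make a section current
Streamer.EmitLabel(Foo);                      // "foo:"
Streamer.EmitIntValue(4, 1);                  // ".byte 4" (value, size in bytes)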
On the implementation side of MCStreamer, there are two major implementations: one for writing out a .s file (MCAsmStreamer), and one for writing out a .o file (MCObjectStreamer). MCAsmStreamer is a straight-forward implementation that prints out a directive for each method (e.g. EmitValue -> .byte), but MCObjectStreamer implements a full assembler.
For target specific directives, the MCStreamer has a MCTargetStreamer instance. Each target that needs it defines a class that inherits from it and is a lot like MCStreamer itself: it has one method per directive. Two classes inherit from the target streamer, a target object streamer and a target asm streamer. The target asm streamer just prints the directive (e.g. emitFnStart -> .fnstart), and the object streamer implements the assembler logic for it.
To make LLVM use these classes, the target initialization must call TargetRegistry::RegisterAsmStreamer and TargetRegistry::RegisterMCObjectStreamer, passing callbacks that allocate the corresponding target streamer and pass it to createAsmStreamer or to the appropriate object streamer constructor.
The MCContext class¶
The MCContext class is the owner of a variety of uniqued data structures at the MC layer, including symbols, sections, etc. As such, this is the class that you interact with to create symbols and sections. This class cannot be subclassed.
The MCSymbol class¶
The MCSymbol class represents a symbol (aka label) in the assembly file. There are two interesting kinds of symbols: assembler temporary symbols, and normal symbols. Assembler temporary symbols are used and processed by the assembler but are discarded when the object file is produced. The distinction is usually represented by adding a prefix to the label, for example “L” labels are assembler temporary labels in MachO.
MCSymbols are created by MCContext and uniqued there. This means that MCSymbols can be compared for pointer equivalence to find out if they are the same symbol. Note that pointer inequality does not guarantee the labels will end up at different addresses though. It’s perfectly legal to output something like this to the .s file:
foo:
bar:
.byte 4
In this case, both the foo and bar symbols will have the same address.
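Because creation goes through MCContext, requesting the same name twice yields the same object. A sketch (the getOrCreateSymbol spelling is the one used in recent releases):
MCSymbol *A = Ctx.getOrCreateSymbol("foo");
MCSymbol *B = Ctx.getOrCreateSymbol("foo");
assert(A == B && "MCContext uniques symbols; pointer equality is identity");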
The MCSection class¶
The MCSection class represents an object-file specific section. It is subclassed by object file specific implementations (e.g. MCSectionMachO, MCSectionCOFF, MCSectionELF) and these are created and uniqued by MCContext. The MCStreamer has a notion of the current section, which can be changed with the SwitchToSection method (which corresponds to a ".section" directive in a .s file).
The MCInst class¶
The MCInst class is a target-independent representation of an instruction. It is a simple class (much more so than MachineInstr) that holds a target-specific opcode and a vector of MCOperands. MCOperand, in turn, is a simple discriminated union of three cases: 1) a simple immediate, 2) a target register ID, 3) a symbolic expression (e.g. "Lfoo-Lbar+42") as an MCExpr.
MCInst is the common currency used to represent machine instructions at the MC layer. It is the type used by the instruction encoder, the instruction printer, and the type generated by the assembly parser and disassembler.
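Constructing one by hand is straightforward. A sketch (the MCOperand factory functions are spelled createReg/createImm in recent releases and CreateReg/CreateImm in older ones; Opcode and DstReg are assumed inputs):
MCInst Inst;
Inst.setOpcode(Opcode);                        // target-specific opcode number
Inst.addOperand(MCOperand::createReg(DstReg)); // a target register ID
Inst.addOperand(MCOperand::createImm(42));     // a simple immediate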
Target-independent code generation algorithms¶
This section documents the phases described in the high-level design of the code generator. It explains how they work and some of the rationale behind their design.
Instruction Selection¶
Instruction Selection is the process of translating LLVM code presented to the code generator into target-specific machine instructions. There are several well-known ways to do this in the literature. LLVM uses a SelectionDAG based instruction selector.
Portions of the DAG instruction selector are generated from the target description (*.td) files. Our goal is for the entire instruction selector to be generated from these .td files, though currently there are still things that require custom C++ code.
Introduction to SelectionDAGs¶
The SelectionDAG provides an abstraction for code representation in a way that is amenable to instruction selection using automatic techniques (e.g. dynamic-programming based optimal pattern matching selectors). It is also well-suited to other phases of code generation; in particular, instruction scheduling (SelectionDAGs are very close to scheduling DAGs post-selection). Additionally, the SelectionDAG provides a host representation where a large variety of very-low-level (but target-independent) optimizations may be performed; ones which require extensive information about the instructions efficiently supported by the target.
The SelectionDAG is a Directed-Acyclic-Graph whose nodes are instances of the SDNode class. The primary payload of the SDNode is its operation code (Opcode) that indicates what operation the node performs and the operands to the operation. The various operation node types are described at the top of the include/llvm/CodeGen/ISDOpcodes.h file.
Although most operations define a single value, each node in the graph may define multiple values. For example, a combined div/rem operation will define both the quotient and the remainder. Many other situations require multiple values as well. Each node also has some number of operands, which are edges to the node defining the used value. Because nodes may define multiple values, edges are represented by instances of the SDValue class, which is a <SDNode, unsigned> pair, indicating the node and result value being used, respectively. Each value produced by an SDNode has an associated MVT (Machine Value Type) indicating what the type of the value is.
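In code, the pair is accessed through SDValue. A minimal sketch:
#include "llvm/CodeGen/SelectionDAGNodes.h"
using namespace llvm;

// An SDValue names one result of one node: a (node, result-number) pair.
static void inspect(SDValue V) {
  SDNode *N = V.getNode();        // the node that produces the value
  unsigned ResNo = V.getResNo();  // which of N's result values this is
  EVT VT = V.getValueType();      // the type of that result value
  (void)N; (void)ResNo; (void)VT; // silence unused-variable warnings
}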
SelectionDAGs contain two different kinds of values: those that represent data flow and those that represent control flow dependencies. Data values are simple edges with an integer or floating point value type. Control edges are represented as "chain" edges which are of type MVT::Other. These edges provide an ordering between nodes that have side effects (such as loads, stores, calls, returns, etc). All nodes that have side effects should take a token chain as input and produce a new one as output. By convention, token chain inputs are always operand #0, and chain results are always the last value produced by an operation. However, after instruction selection, the machine nodes have their chain after the instruction's operands, and may be followed by glue nodes.
A SelectionDAG has designated "Entry" and "Root" nodes. The Entry node is always a marker node with an Opcode of ISD::EntryToken. The Root node is the final side-effecting node in the token chain. For example, in a single basic block function it would be the return node.
One important concept for SelectionDAGs is the notion of a “legal” vs. “illegal” DAG. A legal DAG for a target is one that only uses supported operations and supported types. On a 32-bit PowerPC, for example, a DAG with a value of type i1, i8, i16, or i64 would be illegal, as would a DAG that uses a SREM or UREM operation. The legalize types and legalize operations phases are responsible for turning an illegal DAG into a legal DAG.
SelectionDAG Instruction Selection Process¶
SelectionDAG-based instruction selection consists of the following steps:
- Build initial DAG — This stage performs a simple translation from the input LLVM code to an illegal SelectionDAG.
- Optimize SelectionDAG — This stage performs simple optimizations on the SelectionDAG to simplify it, and recognize meta instructions (like rotates and div/rem pairs) for targets that support these meta operations. This makes the resultant code more efficient and the select instructions from DAG phase (below) simpler.
- Legalize SelectionDAG Types — This stage transforms SelectionDAG nodes to eliminate any types that are unsupported on the target.
- Optimize SelectionDAG — The SelectionDAG optimizer is run to clean up redundancies exposed by type legalization.
- Legalize SelectionDAG Ops — This stage transforms SelectionDAG nodes to eliminate any operations that are unsupported on the target.
- Optimize SelectionDAG — The SelectionDAG optimizer is run to eliminate inefficiencies introduced by operation legalization.
- Select instructions from DAG — Finally, the target instruction selector matches the DAG operations to target instructions. This process translates the target-independent input DAG into another DAG of target instructions.
- SelectionDAG Scheduling and Formation — The last phase assigns a linear order to the instructions in the target-instruction DAG and emits them into the MachineFunction being compiled. This step uses traditional prepass scheduling techniques.
After all of these steps are complete, the SelectionDAG is destroyed and the rest of the code generation passes are run.
One great way to visualize what is going on here is to take advantage of a few llc command line options. The following options pop up a window displaying the SelectionDAG at specific times (if you only get errors printed to the console while using this, you probably need to configure your system to add support for it).
- -view-dag-combine1-dags displays the DAG after being built, before the first optimization pass.
- -view-legalize-dags displays the DAG before Legalization.
- -view-dag-combine2-dags displays the DAG before the second optimization pass.
- -view-isel-dags displays the DAG before the Select phase.
- -view-sched-dags displays the DAG before Scheduling.
The -view-sunit-dags option displays the Scheduler's dependency graph. This graph is based on the final SelectionDAG, with nodes that must be scheduled together bundled into a single scheduling-unit node, and with immediate operands and other nodes that aren't relevant for scheduling omitted.
The -filter-view-dags option lets you select the name of the basic block that you want to visualize; it restricts all of the previous view-*-dags options to that block.
Initial SelectionDAG Construction¶
The initial SelectionDAG is naïvely peephole expanded from the LLVM input by the SelectionDAGBuilder class. The intent of this pass is to expose as much low-level, target-specific detail to the SelectionDAG as possible. This pass is mostly hard-coded (e.g. an LLVM add turns into an SDNode add while a getelementptr is expanded into the obvious arithmetic). This pass requires target-specific hooks to lower calls, returns, varargs, etc. For these features, the TargetLowering interface is used.
SelectionDAG LegalizeTypes Phase¶
The Legalize phase is in charge of converting a DAG to only use the types that are natively supported by the target.
There are two main ways of converting values of unsupported scalar types to values of supported types: converting small types to larger types (“promoting”), and breaking up large integer types into smaller ones (“expanding”). For example, a target might require that all f32 values are promoted to f64 and that all i1/i8/i16 values are promoted to i32. The same target might require that all i64 values be expanded into pairs of i32 values. These changes can insert sign and zero extensions as needed to make sure that the final code has the same behavior as the input.
There are two main ways of converting values of unsupported vector types to values of supported types: splitting vector types, multiple times if necessary, until a legal type is found, and extending vector types by adding elements to the end to round them out to legal types ("widening"). If a vector gets split all the way down to single-element parts with no supported vector type being found, the elements are converted to scalars ("scalarizing").
A target implementation tells the legalizer which types are supported (and which register class to use for them) by calling the addRegisterClass method in its TargetLowering constructor.
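For example, a hypothetical XYZ target whose 32-bit integers live in a general-purpose register class might write the following in its TargetLowering constructor (all XYZ names are invented for illustration, and the computeRegisterProperties signature has varied across releases):
// Declare which value types are legal, and which register class holds each.
addRegisterClass(MVT::i32, &XYZ::GPRRegClass);
addRegisterClass(MVT::f64, &XYZ::FPRRegClass);
// Let the generic code derive the remaining register/type properties.
computeRegisterProperties(Subtarget.getRegisterInfo());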
SelectionDAG Legalize Phase¶
The Legalize phase is in charge of converting a DAG to only use the operations that are natively supported by the target.
Targets often have weird constraints, such as not supporting every operation on every supported datatype (e.g. X86 does not support byte conditional moves and PowerPC does not support sign-extending loads from a 16-bit memory location). Legalize takes care of this by open-coding another sequence of operations to emulate the operation (“expansion”), by promoting one type to a larger type that supports the operation (“promotion”), or by using a target-specific hook to implement the legalization (“custom”).
A target implementation tells the legalizer which operations are not supported (and which of the above three actions to take) by calling the setOperationAction method in its TargetLowering constructor.
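Continuing the hypothetical XYZ constructor above, one line per unsupported (operation, type) pair selects among the three actions:
setOperationAction(ISD::SREM,  MVT::i32, Expand);  // open-code via other ops
setOperationAction(ISD::CTPOP, MVT::i8,  Promote); // perform in a wider type
setOperationAction(ISD::BR_CC, MVT::i32, Custom);  // handled by a C++ hook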
Prior to the existence of the Legalize passes, we required that every target selector supported and handled every operator and type even if they were not natively supported. The introduction of the Legalize phases allows all of the canonicalization patterns to be shared across targets, and makes it very easy to optimize the canonicalized code because it is still in the form of a DAG.
SelectionDAG Optimization Phase: the DAG Combiner¶
The SelectionDAG optimization phase is run multiple times for code generation, immediately after the DAG is built and once after each legalization. The first run of the pass allows the initial code to be cleaned up (e.g. performing optimizations that depend on knowing that the operators have restricted type inputs). Subsequent runs of the pass clean up the messy code generated by the Legalize passes, which allows Legalize to be very simple (it can focus on making code legal instead of focusing on generating good and legal code).
One important class of optimizations performed is optimizing inserted sign and zero extension instructions. We currently use ad-hoc techniques, but could move to more rigorous techniques in the future. Here are some good papers on the subject:
“Widening integer arithmetic”
Kevin Redwine and Norman Ramsey
International Conference on Compiler Construction (CC) 2004
“Effective sign extension elimination”
Motohiro Kawahito, Hideaki Komatsu, and Toshio Nakatani
Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design
and Implementation.
SelectionDAG Select Phase¶
The Select phase is the bulk of the target-specific code for instruction selection. This phase takes a legal SelectionDAG as input, pattern matches the instructions supported by the target to this DAG, and produces a new DAG of target code. For example, consider the following LLVM fragment:
%t1 = fadd float %W, %X
%t2 = fmul float %t1, %Y
%t3 = fadd float %t2, %Z
This LLVM code corresponds to a SelectionDAG that looks basically like this:
(fadd:f32 (fmul:f32 (fadd:f32 W, X), Y), Z)
If a target supports floating point multiply-and-add (FMA) operations, one of the adds can be merged with the multiply. On the PowerPC, for example, the output of the instruction selector might look like this DAG:
(FMADDS (FADDS W, X), Y, Z)
The FMADDS instruction is a ternary instruction that multiplies its first two operands and adds the third (as single-precision floating-point numbers). The FADDS instruction is a simple binary single-precision add instruction. To perform this pattern match, the PowerPC backend includes the following instruction definitions:
def FMADDS : AForm_1<59, 29,
(ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRC, F4RC:$FRB),
"fmadds $FRT, $FRA, $FRC, $FRB",
[(set F4RC:$FRT, (fadd (fmul F4RC:$FRA, F4RC:$FRC),
F4RC:$FRB))]>;
def FADDS : AForm_2<59, 21,
(ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRB),
"fadds $FRT, $FRA, $FRB",
[(set F4RC:$FRT, (fadd F4RC:$FRA, F4RC:$FRB))]>;
The pattern at the end of each instruction definition (the bracketed [(set ...)] list) is the pattern used to match the instruction. The DAG operators (like fmul/fadd) are defined in the include/llvm/Target/TargetSelectionDAG.td file. "F4RC" is the register class of the input and result values.
The TableGen DAG instruction selector generator reads the instruction patterns in the .td file and automatically builds parts of the pattern matching code for your target. It has the following strengths:
- At compiler-compiler time, it analyzes your instruction patterns and tells you if your patterns make sense or not.
- It can handle arbitrary constraints on operands for the pattern match. In particular, it is straight-forward to say things like "match any immediate that is a 13-bit sign-extended value". For examples, see the immSExt16 and related tblgen classes in the PowerPC backend.
- It knows several important identities for the patterns defined. For example, it knows that addition is commutative, so it allows the FMADDS pattern above to match "(fadd X, (fmul Y, Z))" as well as "(fadd (fmul X, Y), Z)", without the target author having to specially handle this case.
- It has a full-featured type-inferencing system. In particular, you should rarely have to explicitly tell the system what type parts of your patterns are. In the FMADDS case above, we didn't have to tell tblgen that all of the nodes in the pattern are of type 'f32'. It was able to infer and propagate this knowledge from the fact that F4RC has type 'f32'.
- Targets can define their own (and rely on built-in) "pattern fragments". Pattern fragments are chunks of reusable patterns that get inlined into your patterns during compiler-compiler time. For example, the integer "(not x)" operation is actually defined as a pattern fragment that expands as "(xor x, -1)", since the SelectionDAG does not have a native 'not' operation. Targets can define their own short-hand fragments as they see fit. See the definition of 'not' and 'ineg' for examples.
- In addition to instructions, targets can specify arbitrary patterns that map to one or more instructions using the 'Pat' class. For example, the PowerPC has no way to load an arbitrary integer immediate into a register in one instruction. To tell tblgen how to do this, it defines:

  // Arbitrary immediate support.  Implement in terms of LIS/ORI.
  def : Pat<(i32 imm:$imm),
            (ORI (LIS (HI16 imm:$imm)), (LO16 imm:$imm))>;

  If none of the single-instruction patterns for loading an immediate into a register match, this will be used. This rule says "match an arbitrary i32 immediate, turning it into an ORI ('or a 16-bit immediate') and an LIS ('load 16-bit immediate, where the immediate is shifted to the left 16 bits') instruction". To make this work, the LO16/HI16 node transformations are used to manipulate the input immediate (in this case, take the high or low 16-bits of the immediate).
- When using the 'Pat' class to map a pattern to an instruction that has one or more complex operands (like e.g. X86 addressing modes), the pattern may either specify the operand as a whole using a ComplexPattern, or else it may specify the components of the complex operand separately. The latter is done e.g. for pre-increment instructions by the PowerPC back end:

  def STWU : DForm_1<37, (outs ptr_rc:$ea_res), (ins GPRC:$rS, memri:$dst),
                     "stwu $rS, $dst", LdStStoreUpd, []>,
                     RegConstraint<"$dst.reg = $ea_res">, NoEncode<"$ea_res">;

  def : Pat<(pre_store GPRC:$rS, ptr_rc:$ptrreg, iaddroff:$ptroff),
            (STWU GPRC:$rS, iaddroff:$ptroff, ptr_rc:$ptrreg)>;

  Here, the pair of ptroff and ptrreg operands is matched onto the complex operand dst of class memri in the STWU instruction.
- While the system does automate a lot, it still allows you to write custom C++ code to match special cases if there is something that is hard to express.
While it has many strengths, the system currently has some limitations, primarily because it is a work in progress and is not yet finished:
- Overall, there is no way to define or match SelectionDAG nodes that define multiple values (e.g. SMUL_LOHI, LOAD, CALL, etc). This is the biggest reason that you currently still have to write custom C++ code for your instruction selector.
- There is no great way to support matching complex addressing modes yet. In the future, we will extend pattern fragments to allow them to define multiple values (e.g. the four operands of the X86 addressing mode, which are currently matched with custom C++ code). In addition, we'll extend fragments so that a fragment can match multiple different patterns.
- We don't automatically infer flags like isStore/isLoad yet.
- We don't automatically generate the set of supported registers and operations for the Legalizer yet.
- We don't have a way of tying in custom legalized nodes yet.
Despite these limitations, the instruction selector generator is still quite useful for most of the binary and logical operations in typical instruction sets. If you run into any problems or can’t figure out how to do something, please let Chris know!
SelectionDAG Scheduling and Formation Phase¶
The scheduling phase takes the DAG of target instructions from the selection phase and assigns an order. The scheduler can pick an order depending on various constraints of the machine (e.g., an order that minimizes register pressure, or one that covers instruction latencies). Once an order is established, the DAG is converted to a list of MachineInstrs and the SelectionDAG is destroyed.
Note that this phase is logically separate from the instruction selection phase, but is tied to it closely in the code because it operates on SelectionDAGs.
Future directions for the SelectionDAG¶
- Optional function-at-a-time selection.
- Auto-generate entire selector from .td file.
SSA-based Machine Code Optimizations¶
To Be Written
Live Intervals¶
Live Intervals are the ranges (intervals) where a variable is live. They are used by some register allocator passes to determine if two or more virtual registers which require the same physical register are live at the same point in the program (i.e., they conflict). When this situation occurs, one virtual register must be spilled.
Live Variable Analysis¶
The first step in determining the live intervals of variables is to calculate the set of registers that are immediately dead after the instruction (i.e., the instruction calculates the value, but it is never used) and the set of registers that are used by the instruction, but are never used after the instruction (i.e., they are killed). Live variable information is computed for each virtual register and register allocatable physical register in the function. This is done in a very efficient manner because it uses SSA to sparsely compute lifetime information for virtual registers (which are in SSA form) and only has to track physical registers within a block. Before register allocation, LLVM can assume that physical registers are only live within a single basic block. This allows it to do a single, local analysis to resolve physical register lifetimes within each basic block. If a physical register is not register allocatable (e.g., a stack pointer or condition codes), it is not tracked.
Physical registers may be live in to or out of a function. Live in values are typically arguments in registers. Live out values are typically return values in registers. Live in values are marked as such, and are given a dummy "defining" instruction during live intervals analysis. If the last basic block of a function is a return, then it's marked as using all live out values in the function.
PHI nodes need to be handled specially, because the calculation of the live variable information from a depth first traversal of the CFG of the function won't guarantee that a virtual register used by the PHI node is defined before it's used. When a PHI node is encountered, only the definition is handled, because the uses will be handled in other basic blocks.
For each PHI node of the current basic block, we simulate an assignment at the end of the current basic block and traverse the successor basic blocks. If a successor basic block has a PHI node and one of the PHI node's operands is coming from the current basic block, then the variable is marked as alive within the current basic block and all of its predecessor basic blocks, until the basic block with the defining instruction is encountered.
Live Intervals Analysis¶
We now have the information available to perform the live intervals analysis and build the live intervals themselves. We start off by numbering the basic blocks and machine instructions. We then handle the "live-in" values. These are in physical registers, so the physical register is assumed to be killed by the end of the basic block. Live intervals for virtual registers are computed for some ordering of the machine instructions [1, N]. A live interval is an interval [i, j), where 1 <= i <= j <= N, for which a variable is live.
Note
More to come...
Register Allocation¶
The Register Allocation problem consists of mapping a program Pv, which can use an unbounded number of virtual registers, to a program Pp that contains a finite (possibly small) number of physical registers. Each target architecture has a different number of physical registers. If the number of physical registers is not enough to accommodate all the virtual registers, some of them will have to be mapped into memory. These virtuals are called spilled virtuals.
How registers are represented in LLVM¶
In LLVM, physical registers are denoted by integer numbers that normally range from 1 to 1023. To see how this numbering is defined for a particular architecture, you can read the GenRegisterNames.inc file for that architecture. For instance, by inspecting lib/Target/X86/X86GenRegisterInfo.inc we see that the 32-bit register EAX is denoted by 43, and the MMX register MM0 is mapped to 65.
Some architectures contain registers that share the same physical location. A notable example is the X86 platform. For instance, in the X86 architecture, the registers EAX, AX and AL share the first eight bits. These physical registers are marked as aliased in LLVM. Given a particular architecture, you can check which registers are aliased by inspecting its RegisterInfo.td file. Moreover, the class MCRegAliasIterator enumerates all the physical registers aliased to a register.
Physical registers, in LLVM, are grouped in Register Classes. Elements in the same register class are functionally equivalent, and can be interchangeably used. Each virtual register can only be mapped to physical registers of a particular class. For instance, in the X86 architecture, some virtuals can only be allocated to 8 bit registers. A register class is described by TargetRegisterClass objects. To discover if a virtual register is compatible with a given physical register, this code can be used:
bool RegMapping_Fer::compatible_class(MachineFunction &mf,
unsigned v_reg,
unsigned p_reg) {
assert(TargetRegisterInfo::isPhysicalRegister(p_reg) &&
"Target register must be physical");
const TargetRegisterClass *trc = mf.getRegInfo().getRegClass(v_reg);
return trc->contains(p_reg);
}
Sometimes, mostly for debugging purposes, it is useful to change the number of physical registers available in the target architecture. This must be done statically, inside the TargetRegisterInfo.td file. Just grep for RegisterClass, the last parameter of which is a list of registers. Commenting some of them out is one simple way to avoid them being used. A more polite way is to explicitly exclude some registers from the allocation order. See the definition of the GR8 register class in lib/Target/X86/X86RegisterInfo.td for an example of this.
Virtual registers are also denoted by integer numbers. Contrary to physical registers, different virtual registers never share the same number. Whereas physical registers are statically defined in a TargetRegisterInfo.td file and cannot be created by the application developer, that is not the case with virtual registers. In order to create new virtual registers, use the method MachineRegisterInfo::createVirtualRegister(). This method will return a new virtual register. Use an IndexedMap<Foo, VirtReg2IndexFunctor> to hold information per virtual register. If you need to enumerate all virtual registers, use the function TargetRegisterInfo::index2VirtReg() to find the virtual register numbers:
for (unsigned i = 0, e = MRI->getNumVirtRegs(); i != e; ++i) {
unsigned VirtReg = TargetRegisterInfo::index2VirtReg(i);
stuff(VirtReg);
}
Before register allocation, the operands of an instruction are mostly virtual registers, although physical registers may also be used. In order to check if a given machine operand is a register, use the boolean function MachineOperand::isRegister(). To obtain the integer code of a register, use MachineOperand::getReg(). An instruction may define or use a register. For instance, ADD reg:1026 := reg:1025 reg:1024 defines register 1026, and uses registers 1025 and 1024. Given a register operand, the method MachineOperand::isUse() informs if that register is being used by the instruction. The method MachineOperand::isDef() informs if that register is being defined.
We will call physical registers present in the LLVM bitcode before register allocation pre-colored registers. Pre-colored registers are used in many different situations, for instance, to pass parameters of function calls, and to store results of particular instructions. There are two types of pre-colored registers: the ones implicitly defined, and those explicitly defined. Explicitly defined registers are normal operands, and can be accessed with MachineInstr::getOperand(int)::getReg(). In order to check which registers are implicitly defined by an instruction, use TargetInstrInfo::get(opcode)::ImplicitDefs, where opcode is the opcode of the target instruction. One important difference between explicit and implicit physical registers is that the latter are defined statically for each instruction, whereas the former may vary depending on the program being compiled. For example, an instruction that represents a function call will always implicitly define or use the same set of physical registers. To read the registers implicitly used by an instruction, use TargetInstrInfo::get(opcode)::ImplicitUses. Pre-colored registers impose constraints on any register allocation algorithm. The register allocator must make sure that none of them are overwritten by the values of virtual registers while still alive.
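A sketch of reading those static lists (in current trees the per-opcode description is an MCInstrDesc, reached through TargetInstrInfo::get(opcode); the mark* helpers are hypothetical):
const MCInstrDesc &Desc = TII->get(Opcode);
// The implicit def/use lists are zero-terminated arrays of physical registers.
for (const uint16_t *R = Desc.getImplicitDefs(); R && *R; ++R)
  markPreColoredDef(*R);
for (const uint16_t *R = Desc.getImplicitUses(); R && *R; ++R)
  markPreColoredUse(*R);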
Mapping virtual registers to physical registers¶
There are two ways to map virtual registers to physical registers (or to memory slots). The first way, that we will call direct mapping, is based on the use of methods of the classes TargetRegisterInfo and MachineOperand. The second way, that we will call indirect mapping, relies on the VirtRegMap class to insert loads and stores that move values to and from memory.
The direct mapping provides more flexibility to the developer of the register allocator; however, it is more error prone, and demands more implementation work. Basically, the programmer will have to specify where load and store instructions should be inserted in the target function being compiled in order to get and store values in memory. To assign a physical register to a virtual register present in a given operand, use MachineOperand::setReg(p_reg). To insert a store instruction, use TargetInstrInfo::storeRegToStackSlot(...), and to insert a load instruction, use TargetInstrInfo::loadRegFromStackSlot.
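Put together, a direct-mapping allocator rewrites the operand and places the spill code by hand. A hedged sketch (parameter lists are paraphrased; see TargetInstrInfo.h for the exact signatures):
MO.setReg(PhysReg);                     // rewrite the virtual operand in place
// Reload the spilled value just before the use...
TII->loadRegFromStackSlot(MBB, MI, PhysReg, FrameIndex, RC, TRI);
// ...and store a defined value back just after the definition.
TII->storeRegToStackSlot(MBB, std::next(MI), PhysReg, /*isKill=*/true,
                         FrameIndex, RC, TRI);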
The indirect mapping shields the application developer from the complexities of inserting load and store instructions. In order to map a virtual register to a physical one, use VirtRegMap::assignVirt2Phys(vreg, preg). In order to map a certain virtual register to memory, use VirtRegMap::assignVirt2StackSlot(vreg). This method will return the stack slot where vreg's value will be located. If it is necessary to map another virtual register to the same stack slot, use VirtRegMap::assignVirt2StackSlot(vreg, stack_location). One important point to consider when using the indirect mapping is that even if a virtual register is mapped to memory, it still needs to be mapped to a physical register. This physical register is the location where the virtual register is supposed to be found before being stored or after being reloaded.
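A sketch of the indirect-mapping calls for one spilled virtual register:
VRM.assignVirt2Phys(VReg, PhysReg);      // where VReg lives while in a register
int SS = VRM.assignVirt2StackSlot(VReg); // the stack slot it spills to and reloads from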
If the indirect strategy is used, after all the virtual registers have been mapped to physical registers or stack slots, it is necessary to use a spiller object to place load and store instructions in the code. Every virtual that has been mapped to a stack slot will be stored to memory after being defined and will be loaded before being used. The implementation of the spiller tries to recycle load/store instructions, avoiding unnecessary instructions. For an example of how to invoke the spiller, see RegAllocLinearScan::runOnMachineFunction in lib/CodeGen/RegAllocLinearScan.cpp.
Handling two address instructions¶
With very rare exceptions (e.g., function calls), the LLVM machine code instructions are three address instructions. That is, each instruction is expected to define at most one register, and to use at most two registers. However, some architectures use two address instructions. In this case, the defined register is also one of the used registers. For instance, an instruction such as ADD %EAX, %EBX in X86 is actually equivalent to %EAX = %EAX + %EBX.
In order to produce correct code, LLVM must convert three address instructions that represent two address instructions into true two address instructions. LLVM provides the pass TwoAddressInstructionPass for this specific purpose. It must be run before register allocation takes place. After its execution, the resulting code may no longer be in SSA form. This happens, for instance, in situations where an instruction such as %a = ADD %b %c is converted to two instructions such as:
%a = MOVE %b
%a = ADD %a %c
Notice that, internally, the second instruction is represented as ADD %a[def/use] %c. That is, the register operand %a is both used and defined by the instruction.
The SSA deconstruction phase¶
An important transformation that happens during register allocation is called the SSA Deconstruction Phase. The SSA form simplifies many analyses that are performed on the control flow graph of programs. However, traditional instruction sets do not implement PHI instructions. Thus, in order to generate executable code, compilers must replace PHI instructions with other instructions that preserve their semantics.
There are many ways in which PHI instructions can safely be removed from the target code. The most traditional PHI deconstruction algorithm replaces PHI instructions with copy instructions. That is the strategy adopted by LLVM. The SSA deconstruction algorithm is implemented in lib/CodeGen/PHIElimination.cpp. In order to invoke this pass, the identifier PHIEliminationID must be marked as required in the code of the register allocator.
Instruction folding¶
Instruction folding is an optimization performed during register allocation that removes unnecessary copy instructions. For instance, a sequence of instructions such as:
%EBX = LOAD %mem_address
%EAX = COPY %EBX
can be safely substituted by the single instruction:
%EAX = LOAD %mem_address
Instructions can be folded with the TargetRegisterInfo::foldMemoryOperand(...) method. Care must be taken when folding instructions; a folded instruction can be quite different from the original instruction. See LiveIntervals::addIntervalsForSpills in lib/CodeGen/LiveIntervalAnalysis.cpp for an example of its use.
Built in register allocators¶
The LLVM infrastructure provides the application developer with four different register allocators:
- Fast — This register allocator is the default for debug builds. It allocates registers on a basic block level, attempting to keep values in registers and reusing registers as appropriate.
- Basic — This is an incremental approach to register allocation. Live ranges are assigned to registers one at a time in an order that is driven by heuristics. Since code can be rewritten on-the-fly during allocation, this framework allows interesting allocators to be developed as extensions. It is not itself a production register allocator but is a potentially useful stand-alone mode for triaging bugs and as a performance baseline.
- Greedy — The default allocator. This is a highly tuned implementation of the Basic allocator that incorporates global live range splitting. This allocator works hard to minimize the cost of spill code.
- PBQP — A Partitioned Boolean Quadratic Programming (PBQP) based register allocator. This allocator works by constructing a PBQP problem representing the register allocation problem under consideration, solving this using a PBQP solver, and mapping the solution back to a register assignment.
The type of register allocator used in llc can be chosen with the command line option -regalloc=...:
$ llc -regalloc=basic file.bc -o ba.s
$ llc -regalloc=fast file.bc -o fa.s
$ llc -regalloc=pbqp file.bc -o pbqp.s
Prolog/Epilog Code Insertion¶
Compact Unwind
Throwing an exception requires unwinding out of a function. The information on how to unwind a given function is traditionally expressed in DWARF unwind (a.k.a. frame) info. But that format was originally developed for debuggers to backtrace, and each Frame Description Entry (FDE) requires ~20-30 bytes per function. There is also the cost of mapping from an address in a function to the corresponding FDE at runtime. An alternative unwind encoding is called compact unwind and requires just 4 bytes per function.
The compact unwind encoding is a 32-bit value, which is encoded in an architecture-specific way. It specifies which registers to restore and from where, and how to unwind out of the function. When the linker creates a final linked image, it will create a __TEXT,__unwind_info section. This section is a small and fast way for the runtime to access unwind info for any given function. If we emit compact unwind info for the function, that compact unwind info will be encoded in the __TEXT,__unwind_info section. If we emit DWARF unwind info, the __TEXT,__unwind_info section will contain the offset of the FDE in the __TEXT,__eh_frame section in the final linked image.
For X86, there are three modes for the compact unwind encoding:
- Function with a Frame Pointer (EBP or RBP)
  An EBP/RBP-based frame, where EBP/RBP is pushed onto the stack immediately after the return address, then ESP/RSP is moved to EBP/RBP. Thus to unwind, ESP/RSP is restored with the current EBP/RBP value, then EBP/RBP is restored by popping the stack, and the return is done by popping the stack once more into the PC. All non-volatile registers that need to be restored must have been saved in a small range on the stack that starts EBP-4 to EBP-1020 (RBP-8 to RBP-1020). The offset (divided by 4 in 32-bit mode and 8 in 64-bit mode) is encoded in bits 16-23 (mask: 0x00FF0000). The registers saved are encoded in bits 0-14 (mask: 0x00007FFF) as five 3-bit entries from the following table:

  Compact Number   i386 Register   x86-64 Register
  1                EBX             RBX
  2                ECX             R12
  3                EDX             R13
  4                EDI             R14
  5                ESI             R15
  6                EBP             RBP
- Frameless with a Small Constant Stack Size (EBP or RBP is not used as a frame pointer)
  To return, a constant (encoded in the compact unwind encoding) is added to the ESP/RSP. Then the return is done by popping the stack into the PC. All non-volatile registers that need to be restored must have been saved on the stack immediately after the return address. The stack size (divided by 4 in 32-bit mode and 8 in 64-bit mode) is encoded in bits 16-23 (mask: 0x00FF0000). There is a maximum stack size of 1024 bytes in 32-bit mode and 2048 in 64-bit mode. The number of registers saved is encoded in bits 9-12 (mask: 0x00001C00). Bits 0-9 (mask: 0x000003FF) contain which registers were saved and their order. (See the encodeCompactUnwindRegistersWithoutFrame() function in lib/Target/X86FrameLowering.cpp for the encoding algorithm.)
for the encoding algorithm.) - Frameless with a Large Constant Stack Size (``EBP`` or ``RBP`` is not used as a frame pointer)
- This case is like the “Frameless with a Small Constant Stack Size” case, but
the stack size is too large to encode in the compact unwind encoding. Instead
it requires that the function contains “
subl $nnnnnn, %esp
” in its prolog. The compact encoding contains the offset to the$nnnnnn
value in the function in bits 9-12 (mask:0x00001C00
).
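As a worked example of the frameless-small layout, the fields can be pulled out of a 32-bit encoding with exactly the masks quoted above (a hedged sketch; the variable names are illustrative):
uint32_t StackSize = (Encoding & 0x00FF0000) >> 16; // stack size, in 4- or 8-byte units
uint32_t RegCount  = (Encoding & 0x00001C00) >> 10; // how many registers were saved
uint32_t RegPerm   =  Encoding & 0x000003FF;        // which registers, and their order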
Late Machine Code Optimizations¶
Note
To Be Written
Code Emission¶
The code emission step of code generation is responsible for lowering from the code generator abstractions (like MachineFunction, MachineInstr, etc) down to the abstractions used by the MC layer (MCInst, MCStreamer, etc). This is done with a combination of several different classes: the (misnamed) target-independent AsmPrinter class, target-specific subclasses of AsmPrinter (such as SparcAsmPrinter), and the TargetLoweringObjectFile class.
Since the MC layer works at the level of abstraction of object files, it doesn’t have a notion of functions, global variables etc. Instead, it thinks about labels, directives, and instructions. A key class used at this time is the MCStreamer class. This is an abstract API that is implemented in different ways (e.g. to output a .s file, output an ELF .o file, etc) that is effectively an “assembler API”. MCStreamer has one method per directive, such as EmitLabel, EmitSymbolAttribute, SwitchSection, etc, which directly correspond to assembly level directives.
If you are interested in implementing a code generator for a target, there are three important things that you have to implement for your target:
- First, you need a subclass of AsmPrinter for your target. This class implements the general lowering process converting MachineFunction’s into MC label constructs. The AsmPrinter base class provides a number of useful methods and routines, and also allows you to override the lowering process in some important ways. You should get much of the lowering for free if you are implementing an ELF, COFF, or MachO target, because the TargetLoweringObjectFile class implements much of the common logic.
- Second, you need to implement an instruction printer for your target. The instruction printer takes an MCInst and renders it to a raw_ostream as text. Most of this is automatically generated from the .td file (when you specify something like "add $dst, $src1, $src2" in the instructions), but you need to implement routines to print operands.
- Third, you need to implement code that lowers a MachineInstr to an MCInst, usually implemented in "<target>MCInstLower.cpp". This lowering process is often target specific, and is responsible for turning jump table entries, constant pool indices, global variable addresses, etc into MCLabels as appropriate. This translation layer is also responsible for expanding pseudo ops used by the code generator into the actual machine instructions they correspond to. The MCInsts that are generated by this are fed into the instruction printer or the encoder. (A sketch of this lowering loop follows the list.)
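As an illustration of that third step, here is a minimal, hypothetical sketch of such a lowering routine (the XYZMCInstLower class and lowerSymbolOperand helper are invented names; real targets handle more operand kinds):
void XYZMCInstLower::lower(const MachineInstr *MI, MCInst &OutMI) const {
  OutMI.setOpcode(MI->getOpcode());
  for (const MachineOperand &MO : MI->operands()) {
    switch (MO.getType()) {
    case MachineOperand::MO_Register:
      if (MO.isImplicit())
        continue;                 // implicit operands are not encoded
      OutMI.addOperand(MCOperand::createReg(MO.getReg()));
      break;
    case MachineOperand::MO_Immediate:
      OutMI.addOperand(MCOperand::createImm(MO.getImm()));
      break;
    case MachineOperand::MO_GlobalAddress:
      OutMI.addOperand(lowerSymbolOperand(MO)); // build an MCExpr operand
      break;
    default:
      llvm_unreachable("unhandled operand type");
    }
  }
}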
Finally, at your choosing, you can also implement a subclass of MCCodeEmitter which lowers MCInst’s into machine code bytes and relocations. This is important if you want to support direct .o file emission, or would like to implement an assembler for your target.
VLIW Packetizer¶
In a Very Long Instruction Word (VLIW) architecture, the compiler is responsible for mapping instructions to functional-units available on the architecture. To that end, the compiler creates groups of instructions called packets or bundles. The VLIW packetizer in LLVM is a target-independent mechanism to enable the packetization of machine instructions.
Mapping from instructions to functional units¶
Instructions in a VLIW target can typically be mapped to multiple functional units. During the process of packetizing, the compiler must be able to reason about whether an instruction can be added to a packet. This decision can be complex since the compiler has to examine all possible mappings of instructions to functional units. Therefore to alleviate compilation-time complexity, the VLIW packetizer parses the instruction classes of a target and generates tables at compiler build time. These tables can then be queried by the provided machine-independent API to determine if an instruction can be accommodated in a packet.
How the packetization tables are generated and used¶
The packetizer reads instruction classes from a target’s itineraries and creates a deterministic finite automaton (DFA) to represent the state of a packet. A DFA consists of three major elements: inputs, states, and transitions. The set of inputs for the generated DFA represents the instruction being added to a packet. The states represent the possible consumption of functional units by instructions in a packet. In the DFA, transitions from one state to another occur on the addition of an instruction to an existing packet. If there is a legal mapping of functional units to instructions, then the DFA contains a corresponding transition. The absence of a transition indicates that a legal mapping does not exist and that the instruction cannot be added to the packet.
To generate tables for a VLIW target, add TargetGenDFAPacketizer.inc as a target to the Makefile in the target directory. The exported API provides three functions: DFAPacketizer::clearResources(), DFAPacketizer::reserveResources(MachineInstr *MI), and DFAPacketizer::canReserveResources(MachineInstr *MI). These functions allow a target packetizer to add an instruction to an existing packet and to check whether an instruction can be added to a packet. See llvm/CodeGen/DFAPacketizer.h for more information.
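A sketch of the greedy packetization loop that these three functions support (the packet-bookkeeping helpers are hypothetical):
DFA->clearResources();                  // start with an empty packet
for (MachineInstr *MI : Candidates) {
  if (!DFA->canReserveResources(MI)) {  // MI does not fit in this packet
    finalizePacket();                   // close the current packet
    DFA->clearResources();              // and start a fresh one
  }
  DFA->reserveResources(MI);            // commit MI's functional units
  addToPacket(MI);
}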
Implementing a Native Assembler¶
Though you’re probably reading this because you want to write or maintain a compiler backend, LLVM also fully supports building a native assembler. We’ve tried hard to automate the generation of the assembler from the .td files (in particular the instruction syntax and encodings), which means that a large part of the manual and repetitive data entry can be factored and shared with the compiler.
Instruction Parsing¶
Note
To Be Written
Instruction Alias Processing¶
Once the instruction is parsed, it enters the MatchInstructionImpl function. The MatchInstructionImpl function performs alias processing and then does actual matching.
Alias processing is the phase that canonicalizes different lexical forms of the same instructions down to one representation. There are several different kinds of alias that are possible to implement and they are listed below in the order that they are processed (which is in order from simplest/weakest to most complex/powerful). Generally you want to use the first alias mechanism that meets the needs of your instruction, because it will allow a more concise description.
Mnemonic Aliases¶
The first phase of alias processing is simple instruction mnemonic remapping for classes of instructions which are allowed with two different mnemonics. This phase is a simple and unconditional remapping from one input mnemonic to one output mnemonic. It isn't possible for this form of alias to look at the operands at all, so the remapping must apply for all forms of a given mnemonic. Mnemonic aliases are defined simply, for example X86 has:
def : MnemonicAlias<"cbw", "cbtw">;
def : MnemonicAlias<"smovq", "movsq">;
def : MnemonicAlias<"fldcww", "fldcw">;
def : MnemonicAlias<"fucompi", "fucomip">;
def : MnemonicAlias<"ud2a", "ud2">;
... and many others. With a MnemonicAlias definition, the mnemonic is remapped simply and directly. Though MnemonicAliases can't look at any aspect of the instruction (such as the operands) they can depend on global modes (the same ones supported by the matcher), through a Requires clause:
def : MnemonicAlias<"pushf", "pushfq">, Requires<[In64BitMode]>;
def : MnemonicAlias<"pushf", "pushfl">, Requires<[In32BitMode]>;
In this example, the mnemonic gets mapped into a different one depending on the current instruction set.
Instruction Aliases¶
The most general phase of alias processing occurs while matching is happening: it provides new forms for the matcher to match along with a specific instruction to generate. An instruction alias has two parts: the string to match and the instruction to generate. For example:
def : InstAlias<"movsx $src, $dst", (MOVSX16rr8W GR16:$dst, GR8 :$src)>;
def : InstAlias<"movsx $src, $dst", (MOVSX16rm8W GR16:$dst, i8mem:$src)>;
def : InstAlias<"movsx $src, $dst", (MOVSX32rr8 GR32:$dst, GR8 :$src)>;
def : InstAlias<"movsx $src, $dst", (MOVSX32rr16 GR32:$dst, GR16 :$src)>;
def : InstAlias<"movsx $src, $dst", (MOVSX64rr8 GR64:$dst, GR8 :$src)>;
def : InstAlias<"movsx $src, $dst", (MOVSX64rr16 GR64:$dst, GR16 :$src)>;
def : InstAlias<"movsx $src, $dst", (MOVSX64rr32 GR64:$dst, GR32 :$src)>;
This shows a powerful example of the instruction aliases, matching the same mnemonic in multiple different ways depending on what operands are present in the assembly. The result of instruction aliases can include operands in a different order than the destination instruction, and can use an input multiple times, for example:
def : InstAlias<"clrb $reg", (XOR8rr GR8 :$reg, GR8 :$reg)>;
def : InstAlias<"clrw $reg", (XOR16rr GR16:$reg, GR16:$reg)>;
def : InstAlias<"clrl $reg", (XOR32rr GR32:$reg, GR32:$reg)>;
def : InstAlias<"clrq $reg", (XOR64rr GR64:$reg, GR64:$reg)>;
This example also shows that tied operands are only listed once. In the X86 backend, XOR8rr has two input GR8’s and one output GR8 (where an input is tied to the output). InstAliases take a flattened operand list without duplicates for tied operands. The result of an instruction alias can also use immediates and fixed physical registers which are added as simple immediate operands in the result, for example:
// Fixed Immediate operand.
def : InstAlias<"aad", (AAD8i8 10)>;
// Fixed register operand.
def : InstAlias<"fcomi", (COM_FIr ST1)>;
// Simple alias.
def : InstAlias<"fcomi $reg", (COM_FIr RST:$reg)>;
Instruction aliases can also have a Requires clause to make them subtarget specific.
If the back-end supports it, the instruction printer can automatically emit the alias rather than what’s being aliased. It typically leads to better, more readable code. If it’s better to print out what’s being aliased, then pass a ‘0’ as the third parameter to the InstAlias definition.
Instruction Matching¶
Note
To Be Written
Target-specific Implementation Notes¶
This section of the document explains features or design decisions that are specific to the code generator for a particular target. First we start with a table that summarizes what features are supported by each target.
Target Feature Matrix¶
Note that this table does not include the C or C++ backends, since they do not use the target independent code generator infrastructure. It also doesn’t list features that are not supported fully by any target yet. It considers a feature to be supported if at least one subtarget supports it. A feature being supported means that it is useful and works for most cases; it does not indicate that there are zero known bugs in the implementation. Here is the key:

- Unknown
- Not Applicable
- No support
- Partial Support
- Complete Support
Here is the table:

(The matrix itself does not survive in this rendering; it has one column per target, namely ARM, Hexagon, MSP430, Mips, NVPTX, PowerPC, Sparc, SystemZ, X86, XCore, and eBPF, and one row per feature described in the sections below.)
Is Generally Reliable¶
This box indicates whether the target is considered to be production quality. This indicates that the target has been used as a static compiler to compile large amounts of code by a variety of different people and is in continuous use.
Assembly Parser¶
This box indicates whether the target supports parsing target specific .s files by implementing the MCAsmParser interface. This is required for llvm-mc to be able to act as a native assembler and is required for inline assembly support in the native .o file writer.
Disassembler¶
This box indicates whether the target supports the MCDisassembler API for disassembling machine opcode bytes into MCInst’s.
Inline Asm¶
This box indicates whether the target supports most popular inline assembly constraints and modifiers.
JIT Support¶
This box indicates whether the target supports the JIT compiler through the ExecutionEngine interface.
The ARM backend has basic support for integer code in ARM codegen mode, but lacks NEON and full Thumb support.
.o File Writing¶
This box indicates whether the target supports writing .o files (e.g. MachO, ELF, and/or COFF) directly from the target. Note that the target also must include an assembly parser and general inline assembly support for full inline assembly support in the .o writer.
Targets that don’t support this feature can obviously still write out .o files; they just rely on having an external assembler to translate from a .s file to a .o file (as is the case for many C compilers).
Tail Calls¶
This box indicates whether the target supports guaranteed tail calls. These are calls marked “tail” and use the fastcc calling convention. Please see the tail call section for more details.
Segmented Stacks¶
This box indicates whether the target supports segmented stacks. This replaces the traditional large C stack with many linked segments. It is compatible with the gcc implementation used by the Go front end.
Basic support exists on the X86 backend. Currently vararg doesn’t work and the object files are not marked the way the gold linker expects, but simple Go programs can be built by dragonegg.
Tail call optimization¶
Tail call optimization, callee reusing the stack of the caller, is currently supported on x86/x86-64 and PowerPC. It is performed if:
- Caller and callee have the calling convention fastcc, cc 10 (GHC calling convention) or cc 11 (HiPE calling convention).
- The call is a tail call - in tail position (ret immediately follows call and ret uses value of call or is void).
- The option -tailcallopt is enabled.
- Platform-specific constraints are met.
x86/x86-64 constraints:
- No variable argument lists are used.
- On x86-64 when generating GOT/PIC code only module-local calls (visibility = hidden or protected) are supported.
PowerPC constraints:
- No variable argument lists are used.
- No byval parameters are used.
- On ppc32/64 GOT/PIC only module-local calls (visibility = hidden or protected) are supported.
Example:
Call as llc -tailcallopt test.ll.
declare fastcc i32 @tailcallee(i32 inreg %a1, i32 inreg %a2, i32 %a3, i32 %a4)
define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {
%l1 = add i32 %in1, %in2
%tmp = tail call fastcc i32 @tailcallee(i32 %in1 inreg, i32 %in2 inreg, i32 %in1, i32 %l1)
ret i32 %tmp
}
Implications of -tailcallopt:
To support tail call optimization in situations where the callee has more arguments than the caller, a ‘callee pops arguments’ convention is used. This currently causes each fastcc call that is not tail call optimized (because one or more of the above constraints are not met) to be followed by a readjustment of the stack. So performance might be worse in such cases.
Sibling call optimization¶
Sibling call optimization is a restricted form of tail call optimization. Unlike tail call optimization described in the previous section, it can be performed automatically on any tail calls when the -tailcallopt option is not specified.
Sibling call optimization is currently performed on x86/x86-64 when the following constraints are met:
- Caller and callee have the same calling convention. It can be either c or fastcc.
- The call is a tail call - in tail position (ret immediately follows call and ret uses value of call or is void).
- Caller and callee have matching return type, or the callee result is not used.
- If any of the callee arguments are being passed on the stack, they must be available in the caller’s own incoming argument stack and the frame offsets must be the same.
Example:
declare i32 @bar(i32, i32)
define i32 @foo(i32 %a, i32 %b, i32 %c) {
entry:
%0 = tail call i32 @bar(i32 %a, i32 %b)
ret i32 %0
}
The X86 backend¶
The X86 code generator lives in the lib/Target/X86 directory. This code generator is capable of targeting a variety of x86-32 and x86-64 processors, and includes support for ISA extensions such as MMX and SSE.
X86 Target Triples supported¶
The following are the known target triples that are supported by the X86 backend. This is not an exhaustive list, and it would be useful to add those that people test.
- i686-pc-linux-gnu — Linux
- i386-unknown-freebsd5.3 — FreeBSD 5.3
- i686-pc-cygwin — Cygwin on Win32
- i686-pc-mingw32 — MingW on Win32
- i386-pc-mingw32msvc — MingW crosscompiler on Linux
- i686-apple-darwin* — Apple Darwin on X86
- x86_64-unknown-linux-gnu — Linux
X86 Calling Conventions supported¶
The following target-specific calling conventions are known to the backend (a brief IR illustration follows the list):
- x86_StdCall — stdcall calling convention seen on Microsoft Windows platform (CC ID = 64).
- x86_FastCall — fastcall calling convention seen on Microsoft Windows platform (CC ID = 65).
- x86_ThisCall — Similar to X86_StdCall. Passes first argument in ECX, others via stack. Callee is responsible for stack cleaning. This convention is used by MSVC by default for methods in its ABI (CC ID = 70).
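In LLVM IR, these conventions appear as explicit calling-convention keywords on declarations and call sites. A minimal sketch (the function names here are hypothetical):

declare x86_stdcallcc i32 @StdCallFunc(i32)
declare x86_fastcallcc i32 @FastCallFunc(i32, i32)
declare x86_thiscallcc i32 @ThisCallMethod(i8* %this, i32)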
Representing X86 addressing modes in MachineInstrs¶
The x86 has a very flexible way of accessing memory. It is capable of forming memory addresses of the following expression directly in integer instructions (which use ModR/M addressing):
SegmentReg: Base + [1,2,4,8] * IndexReg + Disp32
In order to represent this, LLVM tracks no less than 5 operands for each memory operand of this form. This means that the “load” form of ‘mov’ has the following MachineOperands in this order:

Index:        0     |     1        2        3            4           5
Meaning:   DestReg, |  BaseReg,  Scale,  IndexReg,  Displacement  Segment
OperandTy: VirtReg, |  VirtReg, UnsImm,  VirtReg,    SignExtImm   PhysReg
Stores, and all other instructions, treat the five memory operands in the same way and in the same order. If the segment register is unspecified (regno = 0), then no segment override is generated. “Lea” operations do not have a segment register specified, so they only have 4 operands for their memory reference.
X86 address spaces supported¶
x86 has a feature which provides the ability to perform loads and stores to different address spaces via the x86 segment registers. A segment override prefix byte on an instruction causes the instruction’s memory access to go to the specified segment. LLVM address space 0 is the default address space, which includes the stack, and any unqualified memory accesses in a program. Address spaces 1-255 are currently reserved for user-defined code. The GS-segment is represented by address space 256, while the FS-segment is represented by address space 257. Other x86 segments have yet to be allocated address space numbers.
While these address spaces may seem similar to TLS via the thread_local keyword, and often use the same underlying hardware, there are some fundamental differences.

The thread_local keyword applies to global variables and specifies that they are to be allocated in thread-local memory. There are no type qualifiers involved, and these variables can be pointed to with normal pointers and accessed with normal loads and stores. The thread_local keyword is target-independent at the LLVM IR level (though LLVM doesn’t yet have implementations of it for some configurations).
Special address spaces, in contrast, apply to static types. Every load and store has a particular address space in its address operand type, and this is what determines which address space is accessed. LLVM ignores these special address space qualifiers on global variables, and does not provide a way to directly allocate storage in them. At the LLVM IR level, the behavior of these special address spaces depends in part on the underlying OS or runtime environment, and they are specific to x86 (and LLVM doesn’t yet handle them correctly in some cases).
Some operating systems and runtime environments use (or may in the future use) the FS/GS-segment registers for various low-level purposes, so care should be taken when considering them.
Instruction naming¶
An instruction name consists of the base name, a default operand size, and a a character per operand with an optional special size. For example:
ADD8rr -> add, 8-bit register, 8-bit register
IMUL16rmi -> imul, 16-bit register, 16-bit memory, 16-bit immediate
IMUL16rmi8 -> imul, 16-bit register, 16-bit memory, 8-bit immediate
MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory
The PowerPC backend¶
The PowerPC code generator lives in the lib/Target/PowerPC directory. The code generation is retargetable to several variations or subtargets of the PowerPC ISA, including ppc32, ppc64 and altivec.
LLVM PowerPC ABI¶
LLVM follows the AIX PowerPC ABI, with two deviations. First, LLVM uses PC-relative (PIC) or static addressing for accessing global values, so no TOC (r2) is used. Second, r31 is used as a frame pointer to allow dynamic growth of a stack frame. LLVM takes advantage of having no TOC to provide space to save the frame pointer in the PowerPC linkage area of the caller frame. Other details of the PowerPC ABI can be found at PowerPC ABI. Note: This link describes the 32 bit ABI. The 64 bit ABI is similar except that space for GPRs is 8 bytes wide (not 4) and r13 is reserved for system use.
Frame Layout¶
The size of a PowerPC frame is usually fixed for the duration of a function’s invocation. Since the frame is fixed size, all references into the frame can be accessed via fixed offsets from the stack pointer. The exception to this is when dynamic alloca or variable sized arrays are present; then a base pointer (r31) is used as a proxy for the stack pointer, and the stack pointer is free to grow or shrink. A base pointer is also used if llvm-gcc is not passed the -fomit-frame-pointer flag. The stack pointer is always aligned to 16 bytes, so that space allocated for altivec vectors will be properly aligned.
An invocation frame is laid out as follows (low memory at top):
Linkage
Parameter area
Dynamic area
Locals area
Saved registers area
Previous Frame
The linkage area is used by a callee to save special registers prior to allocating its own frame. Only three entries are relevant to LLVM. The first entry is the previous stack pointer (sp), aka link. This allows probing tools like gdb or exception handlers to quickly scan the frames in the stack. A function epilog can also use the link to pop the frame from the stack. The third entry in the linkage area is used to save the return address from the lr register. Finally, as mentioned above, the last entry is used to save the previous frame pointer (r31). The entries in the linkage area are the size of a GPR, thus the linkage area is 24 bytes long in 32 bit mode and 48 bytes in 64 bit mode.
32 bit linkage area:

0    Saved SP (r1)
4    Saved CR
8    Saved LR
12   Reserved
16   Reserved
20   Saved FP (r31)
64 bit linkage area:

0    Saved SP (r1)
8    Saved CR
16   Saved LR
24   Reserved
32   Reserved
40   Saved FP (r31)
The parameter area is used to store arguments being passed to a callee function. Following the PowerPC ABI, the first few arguments are actually passed in registers, with the space in the parameter area unused. However, if there are not enough registers or the callee is a thunk or vararg function, these register arguments can be spilled into the parameter area. Thus, the parameter area must be large enough to store all the parameters for the largest call sequence made by the caller. The size must also be minimally large enough to spill registers r3-r10. This allows callees blind to the call signature, such as thunks and vararg functions, enough space to cache the argument registers. Therefore, the parameter area is minimally 32 bytes (64 bytes in 64 bit mode). Also note that since the parameter area is a fixed offset from the top of the frame, a callee can access its spilled arguments using fixed offsets from the stack pointer (or base pointer).
Combining the information about the linkage and parameter areas with the alignment requirements, a stack frame is minimally 64 bytes in 32 bit mode and 128 bytes in 64 bit mode.
The dynamic area starts out as size zero. If a function uses dynamic alloca then space is added to the stack, the linkage and parameter areas are shifted to top of stack, and the new space is available immediately below the linkage and parameter areas. The cost of shifting the linkage and parameter areas is minor since only the link value needs to be copied. The link value can be easily fetched by adding the original frame size to the base pointer. Note that allocations in the dynamic space need to observe 16 byte alignment.
The locals area is where the llvm compiler reserves space for local variables.
The saved registers area is where the llvm compiler spills callee saved registers on entry to the callee.
Prolog/Epilog¶
The llvm prolog and epilog are the same as described in the PowerPC ABI, with the following exceptions. Callee saved registers are spilled after the frame is created. This allows the llvm epilog/prolog support to be common with other targets. The base pointer callee saved register r31 is saved in the TOC slot of the linkage area. This simplifies allocation of space for the base pointer and makes it convenient to locate programmatically and during debugging.
Dynamic Allocation¶
Note
TODO - More to come.
The NVPTX backend¶
The NVPTX code generator under lib/Target/NVPTX is an open-source version of the NVIDIA NVPTX code generator for LLVM. It is contributed by NVIDIA and is a port of the code generator used in the CUDA compiler (nvcc). It targets the PTX 3.0/3.1 ISA and can target any compute capability greater than or equal to 2.0 (Fermi).
This target is of production quality and should be completely compatible with the official NVIDIA toolchain.
Code Generator Options:

sm_20   Set shader model/compute capability to 2.0
sm_21   Set shader model/compute capability to 2.1
sm_30   Set shader model/compute capability to 3.0
sm_35   Set shader model/compute capability to 3.5
ptx30   Target PTX 3.0
ptx31   Target PTX 3.1
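These options are passed to llc as the target CPU and feature strings; a typical invocation for a Fermi-class device (the file names here are illustrative) might look like:

% llc -march=nvptx64 -mcpu=sm_20 kernel.ll -o kernel.ptx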
The extended Berkeley Packet Filter (eBPF) backend¶
Extended BPF (or eBPF) is similar to the original (“classic”) BPF (cBPF) used to filter network packets. The bpf() system call performs a range of operations related to eBPF. For both cBPF and eBPF programs, the Linux kernel statically analyzes the programs before loading them, in order to ensure that they cannot harm the running system. eBPF is a 64-bit RISC instruction set designed for one-to-one mapping to 64-bit CPUs. Opcodes are 8-bit encoded, and 87 instructions are defined. There are 10 registers, grouped by function as outlined below.
R0 return value from in-kernel functions; exit value for eBPF program
R1 - R5 function call arguments to in-kernel functions
R6 - R9 callee-saved registers preserved by in-kernel functions
R10 stack frame pointer (read only)
Instruction encoding (arithmetic and jump)¶
eBPF reuses most of the opcode encoding from classic BPF to simplify conversion of classic BPF programs to eBPF. For arithmetic and jump instructions the 8-bit ‘code’ field is divided into three parts:
+----------------+--------+--------------------+
| 4 bits | 1 bit | 3 bits |
| operation code | source | instruction class |
+----------------+--------+--------------------+
(MSB) (LSB)
The three LSB bits store the instruction class, which is one of:
BPF_LD 0x0
BPF_LDX 0x1
BPF_ST 0x2
BPF_STX 0x3
BPF_ALU 0x4
BPF_JMP 0x5
(unused) 0x6
BPF_ALU64 0x7
When BPF_CLASS(code) is BPF_ALU, BPF_ALU64 or BPF_JMP, the 4th bit encodes the source operand:
BPF_X 0x0 use src_reg register as source operand
BPF_K 0x1 use 32 bit immediate as source operand
and the four MSB bits store the operation code:
BPF_ADD 0x0 add
BPF_SUB 0x1 subtract
BPF_MUL 0x2 multiply
BPF_DIV 0x3 divide
BPF_OR 0x4 bitwise logical OR
BPF_AND 0x5 bitwise logical AND
BPF_LSH 0x6 left shift
BPF_RSH 0x7 right shift (zero extended)
BPF_NEG 0x8 arithmetic negation
BPF_MOD 0x9 modulo
BPF_XOR 0xa bitwise logical XOR
BPF_MOV 0xb move register to register
BPF_ARSH 0xc right shift (sign extended)
BPF_END 0xd endianness conversion
If BPF_CLASS(code) == BPF_JMP, BPF_OP(code) is one of
BPF_JA 0x0 unconditional jump
BPF_JEQ 0x1 jump ==
BPF_JGT 0x2 jump >
BPF_JGE 0x3 jump >=
BPF_JSET 0x4 jump if (DST & SRC)
BPF_JNE 0x5 jump !=
BPF_JSGT 0x6 jump signed >
BPF_JSGE 0x7 jump signed >=
BPF_CALL 0x8 function call
BPF_EXIT 0x9 function return
Instruction encoding (load, store)¶
For load and store instructions the 8-bit ‘code’ field is divided as:
+--------+--------+-------------------+
| 3 bits | 2 bits | 3 bits |
| mode | size | instruction class |
+--------+--------+-------------------+
(MSB) (LSB)
Size modifier is one of
BPF_W 0x0 word
BPF_H 0x1 half word
BPF_B 0x2 byte
BPF_DW 0x3 double word
Mode modifier is one of
BPF_IMM 0x0 immediate
BPF_ABS 0x1 used to access packet data
BPF_IND 0x2 used to access packet data
BPF_MEM 0x3 memory
(reserved) 0x4
(reserved) 0x5
BPF_XADD 0x6 exclusive add
Packet data access (BPF_ABS, BPF_IND)¶
Two non-generic instructions, (BPF_ABS | <size> | BPF_LD) and (BPF_IND | <size> | BPF_LD), are used to access packet data. Register R6 is an implicit input that must contain a pointer to the sk_buff. Register R0 is an implicit output which contains the data fetched from the packet. Registers R1-R5 are scratch registers and must not be used to store data across BPF_ABS | BPF_LD or BPF_IND | BPF_LD instructions. These instructions also have an implicit program exit condition: when an eBPF program tries to access data beyond the packet boundary, the interpreter aborts execution of the program.
For example, BPF_IND | BPF_W | BPF_LD is equivalent to:

R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32))
eBPF maps¶
eBPF maps are provided for sharing data between kernel and user-space. Currently implemented types are hash and array, with potential extension to support bloom filters, radix trees, etc. A map is defined by its type, maximum number of elements, key size and value size in bytes. The bpf() syscall supports create, update, find and delete operations on maps.
Function calls¶
Function call arguments are passed using up to five registers (R1 - R5). The return value is passed in a dedicated register (R0). Four additional registers (R6 - R9) are callee-saved, and the values in these registers are preserved within kernel functions. R0 - R5 are scratch registers within kernel functions, so eBPF programs must therefore store/restore values in these registers if needed across function calls. The stack can be accessed using the read-only frame pointer R10. eBPF registers map 1:1 to hardware registers on x86_64 and other 64-bit architectures. For example, the x86_64 in-kernel JIT maps them as follows:
R0 - rax
R1 - rdi
R2 - rsi
R3 - rdx
R4 - rcx
R5 - r8
R6 - rbx
R7 - r13
R8 - r14
R9 - r15
R10 - rbp
since the x86_64 ABI mandates rdi, rsi, rdx, rcx, r8 and r9 for argument passing, and rbx, r12 - r15 are callee saved.
Program start¶
An eBPF program receives a single argument and contains a single eBPF main routine; the program does not contain eBPF functions. Function calls are limited to a predefined set of kernel functions. The size of a program is limited to 4K instructions: this ensures fast termination and a limited number of kernel function calls. Prior to running an eBPF program, a verifier performs static analysis to prevent loops in the code and to ensure valid register usage and operand types.
Exception Handling in LLVM¶
Introduction¶
This document is the central repository for all information pertaining to exception handling in LLVM. It describes the format that LLVM exception handling information takes, which is useful for those interested in creating front-ends or dealing directly with the information. Further, this document provides specific examples of what exception handling information is used for in C and C++.
Itanium ABI Zero-cost Exception Handling¶
Exception handling for most programming languages is designed to recover from conditions that rarely occur during general use of an application. To that end, exception handling should not interfere with the main flow of an application’s algorithm by performing checkpointing tasks, such as saving the current pc or register state.
The Itanium ABI Exception Handling Specification defines a methodology for providing outlying data in the form of exception tables without inlining speculative exception handling code in the flow of an application’s main algorithm. Thus, the specification is said to add “zero-cost” to the normal execution of an application.
A more complete description of the Itanium ABI exception handling runtime support can be found at Itanium C++ ABI: Exception Handling. A description of the exception frame format can be found at Exception Frames, with details of the DWARF 4 specification at DWARF 4 Standard. A description for the C++ exception table formats can be found at Exception Handling Tables.
Setjmp/Longjmp Exception Handling¶
Setjmp/Longjmp (SJLJ) based exception handling uses LLVM intrinsics llvm.eh.sjlj.setjmp and llvm.eh.sjlj.longjmp to handle control flow for exception handling.
For each function which does exception processing — be it try/catch blocks or cleanups — that function registers itself on a global frame list. When exceptions are unwinding, the runtime uses this list to identify which functions need processing.
Landing pad selection is encoded in the call site entry of the function context. The runtime returns to the function via llvm.eh.sjlj.longjmp, where a switch table transfers control to the appropriate landing pad based on the index stored in the function context.
In contrast to DWARF exception handling, which encodes exception regions and frame information in out-of-line tables, SJLJ exception handling builds and removes the unwind frame context at runtime. This results in faster exception handling at the expense of slower execution when no exceptions are thrown. As exceptions are, by their nature, intended for uncommon code paths, DWARF exception handling is generally preferred to SJLJ.
Windows Runtime Exception Handling¶
Windows runtime based exception handling uses the same basic IR structure as Itanium ABI based exception handling, but it relies on the personality functions provided by the native Windows runtime library: __CxxFrameHandler3 for C++ exceptions, __C_specific_handler for 64-bit SEH, or _frame_handler3/4 for 32-bit SEH. This results in a very different execution model and requires some minor modifications to the initial IR representation and a significant restructuring just before code generation.
General information about the Windows x64 exception handling mechanism can be found at MSDN Exception Handling (x64).
Overview¶
When an exception is thrown in LLVM code, the runtime does its best to find a handler suited to processing the circumstance.
The runtime first attempts to find an exception frame corresponding to the function where the exception was thrown. If the programming language supports exception handling (e.g. C++), the exception frame contains a reference to an exception table describing how to process the exception. If the language does not support exception handling (e.g. C), or if the exception needs to be forwarded to a prior activation, the exception frame contains information about how to unwind the current activation and restore the state of the prior activation. This process is repeated until the exception is handled. If the exception is not handled and no activations remain, then the application is terminated with an appropriate error message.
Because different programming languages have different behaviors when handling exceptions, the exception handling ABI provides a mechanism for supplying personalities. An exception handling personality is defined by way of a personality function (e.g. __gxx_personality_v0 in C++), which receives the context of the exception, an exception structure containing the exception object type and value, and a reference to the exception table for the current function. The personality function for the current compile unit is specified in a common exception frame.
The organization of an exception table is language dependent. For C++, an exception table is organized as a series of code ranges defining what to do if an exception occurs in that range. Typically, the information associated with a range defines which types of exception objects (using C++ type info) are handled in that range, and an associated action that should take place. Actions typically pass control to a landing pad.
A landing pad corresponds roughly to the code found in the catch portion of a try/catch sequence. When execution resumes at a landing pad, it receives an exception structure and a selector value corresponding to the type of exception thrown. The selector is then used to determine which catch should actually process the exception.
LLVM Code Generation¶
From a C++ developer’s perspective, exceptions are defined in terms of the throw and try/catch statements. In this section we will describe the implementation of LLVM exception handling in terms of C++ examples.
Throw¶
Languages that support exception handling typically provide a throw operation to initiate the exception process. Internally, a throw operation breaks down into two steps.
- A request is made to allocate exception space for an exception structure. This structure needs to survive beyond the current activation. This structure will contain the type and value of the object being thrown.
- A call is made to the runtime to raise the exception, passing the exception structure as an argument.
In C++, the allocation of the exception structure is done by the __cxa_allocate_exception runtime function. The exception raising is handled by __cxa_throw. The type of the exception is represented using a C++ RTTI structure.
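As an illustration, a C++ statement like throw 42; might lower to IR along these lines. This is a minimal sketch assuming a 64-bit target and an int payload; the exact allocation size and type info are chosen by the front-end:

@_ZTIi = external constant i8*

declare i8* @__cxa_allocate_exception(i64)
declare void @__cxa_throw(i8*, i8*, i8*)

  %exn = call i8* @__cxa_allocate_exception(i64 4)   ; space for the i32 payload
  %payload = bitcast i8* %exn to i32*
  store i32 42, i32* %payload
  call void @__cxa_throw(i8* %exn, i8* bitcast (i8** @_ZTIi to i8*), i8* null)
  unreachable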
Try/Catch¶
A call within the scope of a try statement can potentially raise an exception. In those circumstances, the LLVM C++ front-end replaces the call with an invoke instruction. Unlike a call, the invoke has two potential continuation points (sketched below):

- where to continue when the call succeeds as per normal, and
- where to continue if the call raises an exception, either by a throw or the unwinding of a throw.
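A minimal sketch of such an invoke (the callee name here is hypothetical):

  %res = invoke i32 @may_throw(i32 %x)
            to label %cont unwind label %lpad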
The place where an invoke continues after an exception is called a landing pad. LLVM landing pads are conceptually alternative function entry points where an exception structure reference and a type info index are passed in as arguments. The landing pad saves the exception structure reference and then proceeds to select the catch block that corresponds to the type info of the exception object.
The LLVM ‘landingpad’ Instruction is used to convey information about the landing pad to the back end. For C++, the landingpad instruction returns a pointer and integer pair corresponding to the pointer to the exception structure and the selector value respectively.
The landingpad instruction takes a reference to the personality function to be used for this try/catch sequence. The remainder of the instruction is a list of cleanup, catch, and filter clauses. The exception is tested against the clauses sequentially from first to last. The clauses have the following meanings:
catch <type> @ExcType

- This clause means that the landingpad block should be entered if the exception being thrown is of type @ExcType or a subtype of @ExcType. For C++, @ExcType is a pointer to the std::type_info object (an RTTI object) representing the C++ exception type.
- If @ExcType is null, any exception matches, so the landingpad should always be entered. This is used for C++ catch-all blocks (“catch (...)”).
- When this clause is matched, the selector value will be equal to the value returned by “@llvm.eh.typeid.for(i8* @ExcType)”. This will always be a positive value.

filter <type> [<type> @ExcType1, ..., <type> @ExcTypeN]

- This clause means that the landingpad should be entered if the exception being thrown does not match any of the types in the list (which, for C++, are again specified as std::type_info pointers).
- C++ front-ends use this to implement C++ exception specifications, such as “void foo() throw (ExcType1, ..., ExcTypeN) { ... }”.
- When this clause is matched, the selector value will be negative.
- The array argument to filter may be empty; for example, “[0 x i8**] undef”. This means that the landingpad should always be entered. (Note that such a filter would not be equivalent to “catch i8* null”, because filter and catch produce negative and positive selector values respectively.)

cleanup

- This clause means that the landingpad should always be entered.
- C++ front-ends use this for calling objects’ destructors.
- When this clause is matched, the selector value will be zero.
- The runtime may treat “cleanup” differently from “catch <type> null”. In C++, if an unhandled exception occurs, the language runtime will call std::terminate(), but it is implementation-defined whether the runtime unwinds the stack and calls object destructors first. For example, the GNU C++ unwinder does not call object destructors when an unhandled exception occurs. The reason for this is to improve debuggability: it ensures that std::terminate() is called from the context of the throw, so that this context is not lost by unwinding the stack. A runtime will typically implement this by searching for a matching non-cleanup clause, and aborting if it does not find one, before entering any landingpad blocks.
Once the landing pad has the type info selector, the code branches to the code for the first catch. The catch then checks the value of the type info selector against the index of type info for that catch. Since the type info index is not known until all the type infos have been gathered in the backend, the catch code must call the llvm.eh.typeid.for intrinsic to determine the index for a given type info. If the catch fails to match the selector then control is passed on to the next catch.
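For instance, the selector dispatch for a catch of int might look like this in IR (a sketch; block and global names are illustrative):

lpad:
  %lp = landingpad { i8*, i32 }
           personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*)
           catch i8* bitcast (i8** @_ZTIi to i8*)
  %sel = extractvalue { i8*, i32 } %lp, 1
  %id = call i32 @llvm.eh.typeid.for(i8* bitcast (i8** @_ZTIi to i8*))
  %matches = icmp eq i32 %sel, %id
  br i1 %matches, label %catch.int, label %eh.resume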
Finally, the entry and exit of catch code is bracketed with calls to __cxa_begin_catch and __cxa_end_catch.

- __cxa_begin_catch takes an exception structure reference as an argument and returns the value of the exception object.
- __cxa_end_catch takes no arguments. This function:
  - Locates the most recently caught exception and decrements its handler count,
  - Removes the exception from the caught stack if the handler count goes to zero, and
  - Destroys the exception if the handler count goes to zero and the exception was not re-thrown by throw.

Note
A rethrow from within the catch may replace this call with a __cxa_rethrow.
Cleanups¶
A cleanup is extra code which needs to be run as part of unwinding a scope. C++ destructors are a typical example, but other languages and language extensions provide a variety of different kinds of cleanups. In general, a landing pad may need to run arbitrary amounts of cleanup code before actually entering a catch block. To indicate the presence of cleanups, a ‘landingpad’ Instruction should have a cleanup clause. Otherwise, the unwinder will not stop at the landing pad if there are no catches or filters that require it to.
Note
Do not allow a new exception to propagate out of the execution of a cleanup. This can corrupt the internal state of the unwinder. Different languages describe different high-level semantics for these situations: for example, C++ requires that the process be terminated, whereas Ada cancels both exceptions and throws a third.
When all cleanups are finished, if the exception is not handled by the current function, resume unwinding by calling the resume instruction, passing in the result of the landingpad instruction for the original landing pad.
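A cleanup-only landing pad therefore typically runs the destructors and resumes, roughly as follows (a sketch; the destructor and type names are hypothetical):

lpad:
  %lp = landingpad { i8*, i32 }
           personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*)
           cleanup
  call void @_ZN3FooD1Ev(%class.Foo* %f)   ; run the local object's destructor
  resume { i8*, i32 } %lp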
Throw Filters¶
C++ allows the specification of which exception types may be thrown from a function. To represent this, a top level landing pad may exist to filter out invalid types. To express this in LLVM code the ‘landingpad’ Instruction will have a filter clause. The clause consists of an array of type infos. landingpad will return a negative value if the exception does not match any of the type infos. If no match is found then a call to __cxa_call_unexpected should be made, otherwise _Unwind_Resume. Each of these functions requires a reference to the exception structure. Note that the most general form of a landingpad instruction can have any number of catch, cleanup, and filter clauses (though having more than one cleanup is pointless). The LLVM C++ front-end can generate such landingpad instructions due to inlining creating nested exception handling scopes.
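A landingpad implementing a filter for a hypothetical void foo() throw (int) specification might look like this sketch:

  %lp = landingpad { i8*, i32 }
           personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*)
           filter [1 x i8*] [i8* bitcast (i8** @_ZTIi to i8*)]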
Restrictions¶
The unwinder delegates the decision of whether to stop in a call frame to that call frame’s language-specific personality function. Not all unwinders guarantee that they will stop to perform cleanups. For example, the GNU C++ unwinder doesn’t do so unless the exception is actually caught somewhere further up the stack.
In order for inlining to behave correctly, landing pads must be prepared to handle selector results that they did not originally advertise. Suppose that a function catches exceptions of type A, and it’s inlined into a function that catches exceptions of type B. The inliner will update the landingpad instruction for the inlined landing pad to include the fact that B is also caught. If that landing pad assumes that it will only be entered to catch an A, it’s in for a rude awakening. Consequently, landing pads must test for the selector results they understand and then resume exception propagation with the resume instruction if none of the conditions match.
C++ Exception Handling using the Windows Runtime¶
(Note: Windows C++ exception handling support is a work in progress and is not yet fully implemented. The text below describes how it will work when completed.)
The Windows runtime function for C++ exception handling uses a multi-phase approach. When an exception occurs it searches the current callstack for a frame that has a handler for the exception. If a handler is found, it then calls the cleanup handler for each frame above the handler which has a cleanup handler before calling the catch handler. These calls are all made from a stack context different from the original frame in which the handler is defined. Therefore, it is necessary to outline these handlers from their original context before code generation.
Catch handlers are called with a pointer to the handler itself as the first argument and a pointer to the parent function’s stack frame as the second argument. The catch handler uses the llvm.localrecover intrinsic to get a pointer to a frame allocation block that is created in the parent frame using the llvm.localescape intrinsic.
The WinEHPrepare pass will have created a structure definition for the contents of this block. The first two members of the structure will always be (1) a 32-bit integer that the runtime uses to track the exception state of the parent frame for the purposes of handling chained exceptions and (2) a pointer to the object associated with the exception (roughly, the parameter of the catch clause). These two members will be followed by any frame variables from the parent function which must be accessed in any of the function’s unwind or catch handlers. The catch handler returns the address at which execution should continue.
Cleanup handlers perform any cleanup necessary as the frame goes out of scope, such as calling object destructors. The runtime handles the actual unwinding of the stack. If an exception occurs in a cleanup handler the runtime manages termination of the process. Cleanup handlers are called with the same arguments as catch handlers (a pointer to the handler and a pointer to the parent stack frame) and use the same mechanism described above to access frame variables in the parent function. Cleanup handlers do not return a value.
The IR generated for Windows runtime based C++ exception handling is initially very similar to the landingpad mechanism described above. Calls to libc++abi functions (such as __cxa_begin_catch/__cxa_end_catch and __cxa_throw_exception) are replaced with calls to intrinsics or Windows runtime functions (such as llvm.eh.begincatch/llvm.eh.endcatch and __CxxThrowException).
During the WinEHPrepare pass, the handler code is outlined into handler functions and the original landing pad code is replaced with a call to the llvm.eh.actions intrinsic, which describes the order in which handlers will be processed from the logical location of the landing pad, and an indirect branch to the return value of the llvm.eh.actions intrinsic. The llvm.eh.actions intrinsic is defined as returning the address at which execution will continue. This is a temporary construct which will be removed before code generation, but it allows for the accurate tracking of control flow until then.
A typical landing pad will look like this after outlining:
lpad:
  %vals = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)
          cleanup
          catch i8* bitcast (i8** @_ZTIi to i8*)
          catch i8* bitcast (i8** @_ZTIf to i8*)
  %recover = call i8* (...)* @llvm.eh.actions(
      i32 3, i8* bitcast (i8** @_ZTIi to i8*), i8* (i8*, i8*)* @_Z4testb.catch.1,
      i32 2, i8* null, void (i8*, i8*)* @_Z4testb.cleanup.1,
      i32 1, i8* bitcast (i8** @_ZTIf to i8*), i8* (i8*, i8*)* @_Z4testb.catch.0,
      i32 0, i8* null, void (i8*, i8*)* @_Z4testb.cleanup.0)
  indirectbr i8* %recover, [label %try.cont1, label %try.cont2]
In this example, the landing pad represents an exception handling context with two catch handlers and a cleanup handler that have been outlined. If an exception is thrown with a type that matches _ZTIi, the _Z4testb.catch.1 handler will be called and no cleanup is needed. If an exception is thrown with a type that matches _ZTIf, first the _Z4testb.cleanup.1 handler will be called to perform unwind-related cleanup, then the _Z4testb.catch.0 handler will be called. If an exception is thrown which does not match either of these types and the exception is handled by another frame further up the call stack, first the _Z4testb.cleanup.1 handler will be called, then the _Z4testb.cleanup.0 handler (which corresponds to a different scope) will be called, and exception handling will continue at the next frame in the call stack. One of the catch handlers will return the address of %try.cont1 in the parent function and the other will return the address of %try.cont2, meaning that execution continues at one of those blocks after an exception is caught.
Exception Handling Intrinsics¶
In addition to the landingpad and resume instructions, LLVM uses several intrinsic functions (name prefixed with llvm.eh) to provide exception handling information at various points in generated code.
llvm.eh.typeid.for¶
i32 @llvm.eh.typeid.for(i8* %type_info)
This intrinsic returns the type info index in the exception table of the current function. This value can be used to compare against the result of the landingpad instruction. The single argument is a reference to a type info. Uses of this intrinsic are generated by the C++ front-end.
llvm.eh.begincatch
¶
void @llvm.eh.begincatch(i8* %ehptr, i8* %ehobj)
This intrinsic marks the beginning of catch handling code within the blocks following a landingpad instruction. The exact behavior of this function depends on the compilation target and the personality function associated with the landingpad instruction.
The first argument to this intrinsic is a pointer that was previously extracted from the aggregate return value of the landingpad instruction. The second argument to the intrinsic is a pointer to stack space where the exception object should be stored. The runtime handles the details of copying the exception object into the slot. If the second parameter is null, no copy occurs.
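A catch block using this intrinsic might begin as follows. This is a sketch assuming an i32 exception payload; llvm.eh.endcatch is described below:

  %exn = extractvalue { i8*, i32 } %lp, 0
  %obj.slot = alloca i32                        ; stack space for the caught object
  %obj.ptr = bitcast i32* %obj.slot to i8*
  call void @llvm.eh.begincatch(i8* %exn, i8* %obj.ptr)
  ; ... catch handler body ...
  call void @llvm.eh.endcatch()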
Uses of this intrinsic are generated by the C++ front-end. Many targets will use implementation-specific functions (such as __cxa_begin_catch) instead of this intrinsic. The intrinsic is provided for targets that require a more abstract interface.
When used in the native Windows C++ exception handling implementation, this intrinsic serves as a placeholder to delimit code before a catch handler is outlined. When the handler is outlined, this intrinsic will be replaced by instructions that retrieve the exception object pointer from the frame allocation block.
llvm.eh.endcatch¶
void @llvm.eh.endcatch()
This intrinsic marks the end of catch handling code within the current block, which will be a successor of a block which called llvm.eh.begincatch. The exact behavior of this function depends on the compilation target and the personality function associated with the corresponding landingpad instruction.
There may be more than one call to llvm.eh.endcatch for any given call to llvm.eh.begincatch, with each llvm.eh.endcatch call corresponding to the end of a different control path. All control paths following a call to llvm.eh.begincatch must reach a call to llvm.eh.endcatch.
Uses of this intrinsic are generated by the C++ front-end. Many targets will use implementation-specific functions (such as __cxa_end_catch) instead of this intrinsic. The intrinsic is provided for targets that require a more abstract interface.
When used in the native Windows C++ exception handling implementation, this intrinsic serves as a placeholder to delimit code before a catch handler is outlined. After the handler is outlined, this intrinsic is simply removed.
llvm.eh.actions¶
void @llvm.eh.actions()
This intrinsic represents the list of actions to take when an exception is thrown. It is typically used by Windows exception handling schemes where cleanup outlining is required by the runtime. The arguments are a sequence of i32 sentinels indicating the action type followed by some pre-determined number of arguments required to implement that action.
A code of i32 0 indicates a cleanup action, which expects one additional argument. The argument is a pointer to a function that implements the cleanup action.
A code of i32 1 indicates a catch action, which expects three additional arguments. Different EH schemes give different meanings to the three arguments, but the first argument indicates whether the catch should fire, the second is the localescape index of the exception object, and the third is the code to run to catch the exception.
For Windows C++ exception handling, the first argument for a catch handler is a pointer to the RTTI type descriptor for the object to catch. The second argument is an index into the argument list of the llvm.localescape call in the main function. The exception object will be copied into the provided stack object. If the exception object is not required, this argument should be -1. The third argument is a pointer to a function implementing the catch. This function returns the address of the basic block where execution should resume after handling the exception.
For Windows SEH, the first argument is a pointer to the filter function, which indicates if the exception should be caught or not. The second argument is typically negative one. The third argument is the address of a basic block where the exception will be handled. In other words, catch handlers are not outlined in SEH. After running cleanups, execution immediately resumes at this PC.
In order to preserve the structure of the CFG, a call to ‘llvm.eh.actions‘ must be followed by an ‘indirectbr‘ instruction that jumps to the result of the intrinsic call.
SJLJ Intrinsics¶
The llvm.eh.sjlj intrinsics are used internally within LLVM’s backend. Uses of them are generated by the backend’s SjLjEHPrepare pass.
llvm.eh.sjlj.setjmp¶
i32 @llvm.eh.sjlj.setjmp(i8* %setjmp_buf)
For SJLJ based exception handling, this intrinsic forces register saving for the current function and stores the address of the following instruction for use as a destination address by llvm.eh.sjlj.longjmp. The buffer format and the overall functioning of this intrinsic is compatible with the GCC __builtin_setjmp implementation, allowing code built with clang and GCC to interoperate.
The single parameter is a pointer to a five word buffer in which the calling context is saved. The front end places the frame pointer in the first word, and the target implementation of this intrinsic should place the destination address for a llvm.eh.sjlj.longjmp in the second word. The following three words are available for use in a target-specific manner.
llvm.eh.sjlj.longjmp¶
void @llvm.eh.sjlj.longjmp(i8* %setjmp_buf)
For SJLJ based exception handling, the llvm.eh.sjlj.longjmp intrinsic is used to implement __builtin_longjmp(). The single parameter is a pointer to a buffer populated by llvm.eh.sjlj.setjmp. The frame pointer and stack pointer are restored from the buffer, then control is transferred to the destination address.
llvm.eh.sjlj.lsda¶
i8* @llvm.eh.sjlj.lsda()
For SJLJ based exception handling, the llvm.eh.sjlj.lsda intrinsic returns the address of the Language Specific Data Area (LSDA) for the current function. The SJLJ front-end code stores this address in the exception handling function context for use by the runtime.
llvm.eh.sjlj.callsite¶
void @llvm.eh.sjlj.callsite(i32 %call_site_num)
For SJLJ based exception handling, the llvm.eh.sjlj.callsite intrinsic identifies the callsite value associated with the following invoke instruction. This is used to ensure that landing pad entries in the LSDA are generated in matching order.
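Conceptually, the prepared IR pairs each call site number with the invoke it precedes, roughly like this sketch (the callee name is hypothetical):

  call void @llvm.eh.sjlj.callsite(i32 1)
  invoke void @may_throw()
          to label %cont unwind label %lpad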
Asm Table Formats¶
There are two tables that are used by the exception handling runtime to determine which actions should be taken when an exception is thrown.
Exception Handling Frame¶
An exception handling frame eh_frame is very similar to the unwind frame used by DWARF debug info. The frame contains all the information necessary to tear down the current frame and restore the state of the prior frame. There is an exception handling frame for each function in a compile unit, plus a common exception handling frame that defines information common to all functions in the unit.
Exception Tables¶
An exception table contains information about what actions to take when an exception is thrown in a particular part of a function’s code. There is one exception table per function, except for leaf functions and functions that have calls only to non-throwing functions; they do not need an exception table.
LLVM Link Time Optimization: Design and Implementation¶
Description¶
LLVM features powerful intermodular optimizations which can be used at link time. Link Time Optimization (LTO) is another name for intermodular optimization when performed during the link stage. This document describes the interface and design between the LTO optimizer and the linker.
Design Philosophy¶
The LLVM Link Time Optimizer provides complete transparency, while doing intermodular optimization, in the compiler tool chain. Its main goal is to let the developer take advantage of intermodular optimizations without making any significant changes to the developer’s makefiles or build system. This is achieved through tight integration with the linker. In this model, the linker treats LLVM bitcode files like native object files and allows mixing and matching among them. The linker uses libLTO, a shared object, to handle LLVM bitcode files. This tight integration between the linker and LLVM optimizer helps to do optimizations that are not possible in other models. The linker input allows the optimizer to avoid relying on conservative escape analysis.
Example of link time optimization¶
The following example illustrates the advantages of LTO’s integrated approach and clean interface. This example requires a system linker which supports LTO through the interface described in this document. Here, clang transparently invokes the system linker.
- Input source file a.c is compiled into LLVM bitcode form.
- Input source file main.c is compiled into native object code.
--- a.h ---
extern int foo1(void);
extern void foo2(void);
extern void foo4(void);
--- a.c ---
#include "a.h"
static signed int i = 0;
void foo2(void) {
i = -1;
}
static int foo3() {
foo4();
return 10;
}
int foo1(void) {
int data = 0;
if (i < 0)
data = foo3();
data = data + 42;
return data;
}
--- main.c ---
#include <stdio.h>
#include "a.h"
void foo4(void) {
printf("Hi\n");
}
int main() {
return foo1();
}
To compile, run:
% clang -emit-llvm -c a.c -o a.o # <-- a.o is LLVM bitcode file
% clang -c main.c -o main.o # <-- main.o is native object file
% clang a.o main.o -o main # <-- standard link command without modifications
- In this example, the linker recognizes that foo2() is an externally visible symbol defined in an LLVM bitcode file. The linker completes its usual symbol resolution pass and finds that foo2() is not used anywhere. This information is used by the LLVM optimizer and it removes foo2().
- As soon as foo2() is removed, the optimizer recognizes that the condition i < 0 is always false, which means foo3() is never used. Hence, the optimizer also removes foo3().
- And this, in turn, enables the linker to remove foo4().
This example illustrates the advantage of tight integration with the linker. Here, the optimizer can not remove foo3() without the linker’s input.
Alternative Approaches¶
- Compiler driver invokes link time optimizer separately.
In this model the link time optimizer is not able to take advantage of information collected during the linker’s normal symbol resolution phase. In the above example, the optimizer can not remove foo2() without the linker’s input because it is externally visible. This in turn prohibits the optimizer from removing foo3().
- Use separate tool to collect symbol information from all object files.
In this model, a new, separate, tool or library replicates the linker’s capability to collect information for link time optimization. Not only is this code duplication difficult to justify, but it also has several other disadvantages. For example, the linking semantics and the features provided by the linker on various platforms are not uniform. This means this new tool needs to support all such features and platforms in one super tool, or a separate tool per platform is required. This increases the maintenance cost of the link time optimizer significantly, which is not necessary. This approach also requires staying synchronized with linker developments on various platforms, which is not the main focus of the link time optimizer. Finally, this approach increases the end user’s build time due to the duplication of work done by this separate tool and the linker itself.
Multi-phase communication between libLTO and linker¶
The linker collects information about symbol definitions and uses in various link objects which is more accurate than any information collected by other tools during typical build cycles. The linker collects this information by looking at the definitions and uses of symbols in native .o files and using symbol visibility information. The linker also uses user-supplied information, such as a list of exported symbols. LLVM optimizer collects control flow information, data flow information and knows much more about program structure from the optimizer’s point of view. Our goal is to take advantage of tight integration between the linker and the optimizer by sharing this information during various linking phases.
Phase 1 : Read LLVM Bitcode Files¶
The linker first reads all object files in natural order and collects symbol information. This includes native object files as well as LLVM bitcode files. To minimize the cost to the linker in the case that all .o files are native object files, the linker only calls lto_module_create() when a supplied object file is found to not be a native object file. If lto_module_create() returns that the file is an LLVM bitcode file, the linker then iterates over the module using lto_module_get_symbol_name() and lto_module_get_symbol_attribute() to get all symbols defined and referenced. This information is added to the linker’s global symbol table.
The lto* functions are all implemented in a shared object libLTO. This allows the LLVM LTO code to be updated independently of the linker tool. On platforms that support it, the shared object is lazily loaded.
Phase 2 : Symbol Resolution¶
In this stage, the linker resolves symbols using global symbol table. It may report undefined symbol errors, read archive members, replace weak symbols, etc. The linker is able to do this seamlessly even though it does not know the exact content of input LLVM bitcode files. If dead code stripping is enabled then the linker collects the list of live symbols.
Phase 3 : Optimize Bitcode Files¶
After symbol resolution, the linker tells the LTO shared object which symbols are needed by native object files. In the example above, the linker reports that only foo1() is used by native object files using lto_codegen_add_must_preserve_symbol(). Next the linker invokes the LLVM optimizer and code generators using lto_codegen_compile(), which returns a native object file created by merging the LLVM bitcode files and applying various optimization passes.
Phase 4 : Symbol Resolution after optimization¶
In this phase, the linker reads the optimized native object file and updates the internal global symbol table to reflect any changes. The linker also collects information about any changes in use of external symbols by LLVM bitcode files. In the example above, the linker notes that foo4() is not used any more. If dead code stripping is enabled then the linker refreshes the live symbol information appropriately and performs dead code stripping.
After this phase, the linker continues linking as if it never saw LLVM bitcode files.
libLTO¶
libLTO is a shared object that is part of the LLVM tools, and is intended for use by a linker. libLTO provides an abstract C interface to use the LLVM interprocedural optimizer without exposing details of LLVM’s internals. The intention is to keep the interface as stable as possible even when the LLVM optimizer continues to evolve. It should even be possible for a completely different compilation technology to provide a different libLTO that works with their object files and the standard linker tool.
lto_module_t¶
A non-native object file is handled via an lto_module_t
. The following
functions allow the linker to check if a file (on disk or in a memory buffer) is
a file which libLTO can process:
lto_module_is_object_file(const char*)
lto_module_is_object_file_for_target(const char*, const char*)
lto_module_is_object_file_in_memory(const void*, size_t)
lto_module_is_object_file_in_memory_for_target(const void*, size_t, const char*)
If the object file can be processed by libLTO
, the linker creates a
lto_module_t
by using one of:
lto_module_create(const char*)
lto_module_create_from_memory(const void*, size_t)
and when done, the handle is released via
lto_module_dispose(lto_module_t)
The linker can introspect the non-native object file by getting the number of symbols and getting the name and attributes of each symbol via:
lto_module_get_num_symbols(lto_module_t)
lto_module_get_symbol_name(lto_module_t, unsigned int)
lto_module_get_symbol_attribute(lto_module_t, unsigned int)
The attributes of a symbol include the alignment, visibility, and kind.
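Put together, a linker-side scan of one input file might look like the following sketch (error handling is minimal, and printing stands in for updating the linker’s global symbol table):
#include <llvm-c/lto.h>
#include <cstdio>

// Sketch: enumerate the symbols of one input file the way a linker would
// when merging an LLVM bitcode file into its global symbol table.
static void collectSymbols(const char *Path) {
  if (!lto_module_is_object_file(Path))
    return; // a native object file: handled by the linker's usual path
  lto_module_t M = lto_module_create(Path);
  if (!M)
    return;
  unsigned N = lto_module_get_num_symbols(M);
  for (unsigned I = 0; I != N; ++I)
    printf("%s (attributes 0x%x)\n",
           lto_module_get_symbol_name(M, I),
           (unsigned)lto_module_get_symbol_attribute(M, I));
  lto_module_dispose(M);
}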
lto_code_gen_t¶
Once the linker has loaded each non-native object file into an
lto_module_t
, it can request libLTO
to process them all and generate a
native object file. This is done in a couple of steps. First, a code generator
is created with:
lto_codegen_create()
Then, each non-native object file is added to the code generator with:
lto_codegen_add_module(lto_code_gen_t, lto_module_t)
The linker then has the option of setting some codegen options. Whether or not to generate DWARF debug info is set with:
lto_codegen_set_debug_model(lto_code_gen_t)
Which kind of position independence to generate is set with:
lto_codegen_set_pic_model(lto_code_gen_t)
And each symbol that is referenced by a native object file or otherwise must not be optimized away is set with:
lto_codegen_add_must_preserve_symbol(lto_code_gen_t, const char*)
After all these settings are done, the linker requests that a native object file be created from the modules with the settings using:
lto_codegen_compile(lto_code_gen_t, size_t*)
which returns a pointer to a buffer containing the generated native object file. The linker then parses that and links it with the rest of the native object files.
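Putting the calls above in order, Phase 3 reduces to roughly the following sketch (the module array, the preserved symbol name, and the omitted option setting are illustrative):
#include <llvm-c/lto.h>

// Sketch of Phase 3: merge previously loaded bitcode modules and produce a
// single native object file in memory. The buffer remains owned by CG, so
// lto_codegen_dispose(CG) is deferred until the linker has consumed it.
static const void *optimizeAndCompile(lto_module_t *Mods, unsigned Count,
                                      size_t *Len) {
  lto_code_gen_t CG = lto_codegen_create();
  for (unsigned I = 0; I != Count; ++I)
    lto_codegen_add_module(CG, Mods[I]);
  // Illustrative: foo1 is referenced from native code, so it must not be
  // internalized or optimized away.
  lto_codegen_add_must_preserve_symbol(CG, "foo1");
  return lto_codegen_compile(CG, Len);
}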
Segmented Stacks in LLVM¶
Introduction¶
Segmented stacks allow stack space to be allocated incrementally rather than as a
monolithic chunk (of some worst case size) at thread initialization. This is
done by allocating stack blocks (henceforth called stacklets) and linking them
into a doubly linked list. The function prologue is responsible for checking if
the current stacklet has enough space for the function to execute; and if not,
call into the libgcc runtime to allocate more stack space. Segmented stacks are
enabled with the "split-stack"
attribute on LLVM functions.
The runtime functionality is already there in libgcc.
Implementation Details¶
Allocating Stacklets¶
As mentioned above, the function prologue checks if the current stacklet has
enough space. The current approach is to use a slot in the TCB to store the
current stack limit (minus the amount of space needed to allocate a new block) -
this slot’s offset is again dictated by libgcc
. The generated
assembly looks like this on x86-64:
leaq -8(%rsp), %r10
cmpq %fs:112, %r10
jg .LBB0_2
# More stack space needs to be allocated
movabsq $8, %r10 # The amount of space needed
movabsq $0, %r11 # The total size of arguments passed on stack
callq __morestack
ret # The reason for this extra return is explained below
.LBB0_2:
# Usual prologue continues here
The size of function arguments on the stack needs to be passed to
__morestack
(this function is implemented in libgcc
) since that number
of bytes has to be copied from the previous stacklet to the current one. This is
so that SP (and FP) relative addressing of function arguments work as expected.
The unusual ret
is needed to have the function which made a call to
__morestack
return correctly. __morestack
, instead of returning, calls
into .LBB0_2
. This is possible since both the size of the ret
instruction and the PC of the call to __morestack
are known. When the function
body returns, control is transferred back to __morestack
. __morestack
then de-allocates the new stacklet, restores the correct SP value, and does a
second return, which returns control to the correct caller.
Variable Sized Allocas¶
The section on allocating stacklets assumed that every stack
frame is of fixed size. However, LLVM also allows dynamically sized blocks of
memory to be allocated on the stack with the alloca instruction. When
faced with such a variable-sized alloca, code is generated to:
- Check if the current stacklet has enough space. If yes, just bump the SP, like in the normal case.
- If not, generate a call to libgcc, which allocates the memory from the heap.
The memory allocated from the heap is linked into a list in the current stacklet, and freed when the stacklet itself is deallocated. This prevents a memory leak.
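Rendered as pseudo-C++, the generated logic is roughly the following; every name here is invented for illustration, since the real code is emitted as target instructions:
#include <cstddef>

// Illustration only: these globals stand in for the SP register and the
// TCB stack-limit slot, and heapAlloc stands in for the libgcc fallback.
extern char *StackPointer;
extern char *StackLimit;
void *heapAlloc(std::size_t Size);

void *variableAlloca(std::size_t Size) {
  if (StackPointer - Size >= StackLimit) {
    StackPointer -= Size; // fits in the current stacklet: just bump the SP
    return StackPointer;
  }
  // Does not fit: allocate from the heap. The block is linked into a list
  // in the current stacklet so that it is freed when the stacklet is.
  return heapAlloc(Size);
}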
TableGen Fundamentals¶
TableGen¶
TableGen BackEnds¶
- Introduction
- LLVM BackEnds
- Clang BackEnds
- ClangAttrClasses
- ClangAttrParserStringSwitches
- ClangAttrImpl
- ClangAttrList
- ClangAttrPCHRead
- ClangAttrPCHWrite
- ClangAttrSpellings
- ClangAttrSpellingListIndex
- ClangAttrVisitor
- ClangAttrTemplateInstantiate
- ClangAttrParsedAttrList
- ClangAttrParsedAttrImpl
- ClangAttrParsedAttrKinds
- ClangAttrDump
- ClangDiagsDefs
- ClangDiagGroups
- ClangDiagsIndexName
- ClangCommentNodes
- ClangDeclNodes
- ClangStmtNodes
- ClangSACheckers
- ClangCommentHTMLTags
- ClangCommentHTMLTagsProperties
- ClangCommentHTMLNamedCharacterReferences
- ClangCommentCommandInfo
- ClangCommentCommandList
- ArmNeon
- ArmNeonSema
- ArmNeonTest
- AttrDocs
- How to write a back-end
Introduction¶
TableGen backends are at the core of TableGen’s functionality. The source files provide the semantics to a generated (in memory) structure, but it’s up to the backend to print this out in a way that is meaningful to the user (normally a C program that includes a generated file, or a textual list of warnings, options and error messages).
TableGen is used by both LLVM and Clang with very different goals. LLVM uses it as a way to automate the generation of massive amounts of information regarding instructions, schedules, cores and architecture features. Some backends generate output that is consumed by more than one source file, so they need to be created in a way that makes it easy to use pre-processor tricks. Some backends can also print C code structures, so that they can be directly included as-is.
Clang, on the other hand, uses it mainly for diagnostic messages (errors, warnings, tips) and attributes, so more on the textual end of the scale.
LLVM BackEnds¶
Warning
This document is raw. Each section below needs three sub-sections: description of its purpose with a list of users, output generated from generic input, and finally why it needed a new backend (in case there’s something similar).
Overall, each backend will take the same TableGen file type and transform it into similar output for different targets/uses. There is an implicit contract between the TableGen files, the back-ends and their users.
For instance, a global contract is that each back-end produces macro-guarded sections. Based on whether the file is included by a header or a source file, or even in which context of each file the include is being used, you have to define a macro just before including it, to get the right output:
#define GET_REGINFO_TARGET_DESC
#include "ARMGenRegisterInfo.inc"
And just part of the generated file would be included. This is useful if you need the same information in multiple formats (instantiation, initialization, getter/setter functions, etc) from the same source TableGen file without having to re-compile the TableGen file multiple times.
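For example, the same generated register-info file typically carries a declarations block for headers and a definitions block for source files. The pairing below follows the ARM back-end convention; treat the exact macro names as an assumption:
// In the target's header: pull in only the class declarations.
#define GET_REGINFO_HEADER
#include "ARMGenRegisterInfo.inc"

// In the target's source file: pull in the tables and definitions.
#define GET_REGINFO_TARGET_DESC
#include "ARMGenRegisterInfo.inc"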
Sometimes, multiple macros might be defined before the same include file to output multiple blocks:
#define GET_REGISTER_MATCHER
#define GET_SUBTARGET_FEATURE_NAME
#define GET_MATCHER_IMPLEMENTATION
#include "ARMGenAsmMatcher.inc"
The macros will be undef’d automatically by the include file as they’re used.
On all LLVM back-ends, the llvm-tblgen
binary will be executed on the root
TableGen file <Target>.td
, which should include all others. This guarantees
that all information needed is accessible, and that no duplication is needed
in the TableGen files.
CodeEmitter¶
Purpose: CodeEmitterGen uses the descriptions of instructions and their fields to construct an automated code emitter: a function that, given a MachineInstr, returns the (currently, 32-bit unsigned) value of the instruction.
Output: C++ code, implementing the target’s CodeEmitter
class by overriding the virtual functions as <Target>CodeEmitter::function()
.
Usage: Used to include directly at the end of <Target>MCCodeEmitter.cpp
.
RegisterInfo¶
Purpose: This tablegen backend is responsible for emitting a description of a target register file for a code generator. It uses instances of the Register, RegisterAliases, and RegisterClass classes to gather this information.
Output: C++ code with enums and structures representing the register mappings, properties, masks, etc.
Usage: Both on <Target>BaseRegisterInfo
and <Target>MCTargetDesc
(headers
and source files), with macros selecting whether the declaration or the
initialization part of the generated file is included.
InstrInfo¶
Purpose: This tablegen backend is responsible for emitting a description of the target instruction set for the code generator. (what are the differences from CodeEmitter?)
Output: C++ code with enums and structures representing the register mappings, properties, masks, etc.
Usage: Both on <Target>BaseInstrInfo
and <Target>MCTargetDesc
(headers
and source files), with macros selecting whether the declaration or the
initialization part of the generated file is included.
AsmWriter¶
Purpose: Emits an assembly printer for the current target.
Output: Implementation of <Target>InstPrinter::printInstruction()
, among
other things.
Usage: Included directly into InstPrinter/<Target>InstPrinter.cpp
.
AsmMatcher¶
Purpose: Emits a target specifier matcher for
converting parsed assembly operands into MCInst structures. It also
emits a matcher for custom operand parsing. Extensive documentation is
written on the AsmMatcherEmitter.cpp
file.
Output: Assembler parsers’ matcher functions, declarations, etc.
Usage: Used in back-ends’ AsmParser/<Target>AsmParser.cpp
for
building the AsmParser class.
Disassembler¶
Purpose: Contains disassembler table emitters for various
architectures. Extensive documentation is written on the
DisassemblerEmitter.cpp
file.
Output: Decoding tables, static decoding functions, etc.
Usage: Directly included in Disassembler/<Target>Disassembler.cpp
to cater for all default decodings, after all hand-made ones.
PseudoLowering¶
Purpose: Generate pseudo instruction lowering.
Output: Implements ARMAsmPrinter::emitPseudoExpansionLowering()
.
Usage: Included directly into <Target>AsmPrinter.cpp
.
CallingConv¶
Purpose: Responsible for emitting descriptions of the calling conventions supported by this target.
Output: Implements static functions to deal with calling conventions, chained by matching styles, each returning true if it fails to handle the value (see the sketch below).
Usage: Used in ISelLowering and FastISel as function pointers to the implementation returned by a CC selection function.
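As a sketch, a generated calling-convention function has the following shape; the target, the register number and the single rule are invented, while the CCAssignFn signature and CCState calls are the real LLVM APIs:
#include "llvm/CodeGen/CallingConvLower.h"
using namespace llvm;

namespace MyTarget { enum { R0 = 1 }; } // invented register number

// One link of the chain: assign i32 values to R0 while it is available,
// otherwise report failure so the caller can try the next rule or diagnose.
static bool CC_MyTarget_C(unsigned ValNo, MVT ValVT, MVT LocVT,
                          CCValAssign::LocInfo LocInfo,
                          ISD::ArgFlagsTy ArgFlags, CCState &State) {
  if (LocVT == MVT::i32)
    if (unsigned Reg = State.AllocateReg(MyTarget::R0)) {
      State.addLoc(CCValAssign::getReg(ValNo, ValVT, Reg, LocVT, LocInfo));
      return false; // handled
    }
  return true; // failed to handle this value
}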
DAGISel¶
Purpose: Generate a DAG instruction selector.
Output: Creates huge functions for automating DAG selection.
Usage: Included in <Target>ISelDAGToDAG.cpp
inside the target’s
implementation of SelectionDAGISel
.
DFAPacketizer¶
Purpose: This class parses the Schedule.td file and produces an API that can be used to reason about whether an instruction can be added to a packet on a VLIW architecture. The class internally generates a deterministic finite automaton (DFA) that models all possible mappings of machine instructions to functional units as instructions are added to a packet.
Output: Scheduling tables for VLIW back-ends (Hexagon, AMD GPUs).
Usage: Included directly on <Target>InstrInfo.cpp
.
FastISel¶
Purpose: This tablegen backend emits code for use by the “fast” instruction selection algorithm. See the comments at the top of lib/CodeGen/SelectionDAG/FastISel.cpp for background. This file scans through the target’s tablegen instruction-info files and extracts instructions with obvious-looking patterns, and it emits code to look up these instructions by type and operator.
Output: Generates Predicate
and FastEmit
methods.
Usage: Implements private methods of the targets’ implementation
of FastISel
class.
Subtarget¶
Purpose: Generate subtarget enumerations.
Output: Enums, globals, local tables for sub-target information.
Usage: Populates <Target>Subtarget
and
MCTargetDesc/<Target>MCTargetDesc
files (both headers and source).
OptParserDefs¶
Purpose: Print enum values for a class.
Clang BackEnds¶
ClangAttrClasses¶
Purpose: Creates Attrs.inc, which contains semantic attribute class
declarations for any attribute in Attr.td
that has not set ASTNode = 0
.
This file is included as part of Attr.h
.
ClangAttrParserStringSwitches¶
Purpose: Creates AttrParserStringSwitches.inc, which contains
StringSwitch::Case statements for parser-related string switches. Each switch
is given its own macro (such as CLANG_ATTR_ARG_CONTEXT_LIST
, or
CLANG_ATTR_IDENTIFIER_ARG_LIST
), which is expected to be defined before
including AttrParserStringSwitches.inc, and undefined after.
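The consuming pattern in the parser looks roughly like the sketch below, modeled on Clang’s usage; the include path and the helper name are assumptions:
#include "clang/Basic/IdentifierTable.h"
#include "llvm/ADT/StringSwitch.h"

// Sketch: expand only the identifier-argument switch cases here, then drop
// the macro so other switches in the same file stay unexpanded.
static bool attributeHasIdentifierArg(const clang::IdentifierInfo &II) {
#define CLANG_ATTR_IDENTIFIER_ARG_LIST
  return llvm::StringSwitch<bool>(II.getName())
#include "clang/Parse/AttrParserStringSwitches.inc"
      .Default(false);
#undef CLANG_ATTR_IDENTIFIER_ARG_LIST
}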
ClangAttrImpl¶
Purpose: Creates AttrImpl.inc, which contains semantic attribute class
definitions for any attribute in Attr.td
that has not set ASTNode = 0
.
This file is included as part of AttrImpl.cpp
.
ClangAttrList¶
Purpose: Creates AttrList.inc, which is used when a list of semantic
attribute identifiers is required. For instance, AttrKinds.h
includes this
file to generate the list of attr::Kind
enumeration values. This list is
separated out into multiple categories: attributes, inheritable attributes, and
inheritable parameter attributes. This categorization happens automatically
based on information in Attr.td
and is used to implement the classof
functionality required for dyn_cast
and similar APIs.
ClangAttrPCHRead¶
Purpose: Creates AttrPCHRead.inc, which is used to deserialize attributes
in the ASTReader::ReadAttributes
function.
ClangAttrPCHWrite¶
Purpose: Creates AttrPCHWrite.inc, which is used to serialize attributes in
the ASTWriter::WriteAttributes
function.
ClangAttrSpellings¶
Purpose: Creates AttrSpellings.inc, which is used to implement the
__has_attribute
feature test macro.
ClangAttrSpellingListIndex¶
Purpose: Creates AttrSpellingListIndex.inc, which is used to map parsed
attribute spellings (including which syntax or scope was used) to an attribute
spelling list index. These spelling list index values are internal
implementation details exposed via
AttributeList::getAttributeSpellingListIndex
.
ClangAttrVisitor¶
Purpose: Creates AttrVisitor.inc, which is used when implementing recursive AST visitors.
ClangAttrTemplateInstantiate¶
Purpose: Creates AttrTemplateInstantiate.inc, which implements the
instantiateTemplateAttribute
function, used when instantiating a template
that requires an attribute to be cloned.
ClangAttrParsedAttrList¶
Purpose: Creates AttrParsedAttrList.inc, which is used to generate the
AttributeList::Kind
parsed attribute enumeration.
ClangAttrParsedAttrImpl¶
Purpose: Creates AttrParsedAttrImpl.inc, which is used by
AttributeList.cpp
to implement several functions on the AttributeList
class. This functionality is implemented via the AttrInfoMap ParsedAttrInfo
array, which contains one element per parsed attribute object.
ClangAttrParsedAttrKinds¶
Purpose: Creates AttrParsedAttrKinds.inc, which is used to implement the
AttributeList::getKind
function, mapping a string (and syntax) to a parsed
attribute AttributeList::Kind
enumeration.
ClangAttrDump¶
Purpose: Creates AttrDump.inc, which dumps information about an attribute.
It is used to implement ASTDumper::dumpAttr
.
ClangDiagsDefs¶
Generate Clang diagnostics definitions.
ClangDiagGroups¶
Generate Clang diagnostic groups.
ClangDiagsIndexName¶
Generate Clang diagnostic name index.
ClangCommentNodes¶
Generate Clang AST comment nodes.
ClangDeclNodes¶
Generate Clang AST declaration nodes.
ClangStmtNodes¶
Generate Clang AST statement nodes.
ClangSACheckers¶
Generate Clang Static Analyzer checkers.
ClangCommentHTMLTags¶
Generate efficient matchers for HTML tag names that are used in documentation comments.
ClangCommentHTMLTagsProperties¶
Generate efficient matchers for HTML tag properties.
ClangCommentHTMLNamedCharacterReferences¶
Generate function to translate named character references to UTF-8 sequences.
ClangCommentCommandInfo¶
Generate command properties for commands that are used in documentation comments.
ClangCommentCommandList¶
Generate list of commands that are used in documentation comments.
ArmNeonSema¶
Generate ARM NEON sema support for clang.
ArmNeonTest¶
Generate ARM NEON tests for clang.
How to write a back-end¶
TODO.
Until we get a step-by-step HowTo for writing TableGen backends, you can at least grab the boilerplate (build system, new files, etc.) from Clang’s r173931.
TODO: How they work, how to write one. This section should not contain details
about any particular backend, except maybe -print-enums
as an example. This
should highlight the APIs in TableGen/Record.h
.
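In the meantime, a minimal backend against the TableGen/Record.h APIs looks roughly like this sketch (the backend name, the emitted enum, and the class name are invented):
#include "llvm/TableGen/Record.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

// Invented example backend, in the spirit of -print-enums: emit one enum
// entry per record deriving from a given TableGen class.
void EmitMyEnums(RecordKeeper &Records, raw_ostream &OS) {
  OS << "enum MyEnum {\n";
  for (Record *R : Records.getAllDerivedDefinitions("MyClass"))
    OS << "  " << R->getName() << ",\n";
  OS << "};\n";
}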
TableGen Language Reference¶
Warning
This document is extremely rough. If you find something lacking, please fix it, file a documentation bug, or ask about it on llvm-dev.
Introduction¶
This document is meant to be a normative spec about the TableGen language in and of itself (i.e. how to understand a given construct in terms of how it affects the final set of records represented by the TableGen file). If you are unsure if this document is really what you are looking for, please read the introduction to TableGen first.
Notation¶
The lexical and syntax notation used here is intended to imitate Python’s. In particular, for lexical definitions, the productions operate at the character level and there is no implied whitespace between elements. The syntax definitions operate at the token level, so there is implied whitespace between tokens.
Lexical Analysis¶
TableGen supports BCPL (// ...
) and nestable C-style (/* ... */
)
comments.
The following is a listing of the basic punctuation tokens:
- + [ ] { } ( ) < > : ; . = ? #
Numeric literals take one of the following forms:
TokInteger     ::= DecimalInteger | HexInteger | BinInteger
DecimalInteger ::= ["+" | "-"] ("0"..."9")+
HexInteger     ::= "0x" ("0"..."9" | "a"..."f" | "A"..."F")+
BinInteger     ::= "0b" ("0" | "1")+
One aspect to note is that the DecimalInteger
token includes the
+
or -
, as opposed to having +
and -
be unary operators as
most languages do.
Also note that BinInteger
creates a value of type bits<n>
(where n
is the number of bits). This will implicitly convert to
integers when needed.
TableGen has identifier-like tokens:
ualpha        ::= "a"..."z" | "A"..."Z" | "_"
TokIdentifier ::= ("0"..."9")* ualpha (ualpha | "0"..."9")*
TokVarName    ::= "$" ualpha (ualpha | "0"..."9")*
Note that unlike most languages, TableGen allows TokIdentifier
to
begin with a number. In case of ambiguity, a token will be interpreted as a
numeric literal rather than an identifier.
TableGen also has two string-like literals:
TokString       ::= '"' <non-'"' characters and C-like escapes> '"'
TokCodeFragment ::= "[{" <shortest text not containing "}]"> "}]"
TokCodeFragment
is essentially a multiline string literal
delimited by [{
and }]
.
Note
The current implementation accepts the following C-like escapes:
\\ \' \" \t \n
TableGen also has the following keywords:
bit bits class code dag
def foreach defm field in
int let list multiclass string
TableGen also has “bang operators” which have a wide variety of meanings:
BangOperator ::= one of
                 !eq     !if      !head    !tail      !con
                 !add    !shl     !sra     !srl       !and
                 !cast   !empty   !subst   !foreach   !listconcat   !strconcat
Syntax¶
TableGen has an include
mechanism. It does not play a role in the
syntax per se, since it is lexically replaced with the contents of the
included file.
IncludeDirective ::= "include" TokString
TableGen’s top-level production consists of “objects”.
TableGenFile ::= Object*
Object       ::= Class | Def | Defm | Let | MultiClass | Foreach
classes¶
Class ::= "class" TokIdentifier [TemplateArgList] ObjectBody
A class
declaration creates a record which other records can inherit
from. A class can be parametrized by a list of “template arguments”, whose
values can be used in the class body.
A given class can only be defined once. A class
declaration is
considered to define the class if any of the following is true:
- The TemplateArgList is present.
- The Body in the ObjectBody is present and is not empty.
- The BaseClassList in the ObjectBody is present.
You can declare an empty class by giving an empty TemplateArgList
and an empty ObjectBody. This can serve as a restricted form of
forward declaration: note that records deriving from the forward-declared
class will inherit no fields from it, since the record expansion is done
when the record is parsed.
TemplateArgList ::= "<" Declaration ("," Declaration)* ">"
Declarations¶
The declaration syntax is pretty much what you would expect as a C++ programmer.
Declaration ::= Type TokIdentifier ["=" Value]
It assigns the value to the identifier.
Types¶
Type    ::= "string" | "code" | "bit" | "int" | "dag"
          | "bits" "<" TokInteger ">"
          | "list" "<" Type ">"
          | ClassID
ClassID ::= TokIdentifier
Both string
and code
correspond to the string type; the difference
is purely to indicate programmer intention.
The ClassID
must identify a class that has been previously
declared or defined.
Values¶
Value       ::= SimpleValue ValueSuffix*
ValueSuffix ::= "{" RangeList "}"
              | "[" RangeList "]"
              | "." TokIdentifier
RangeList   ::= RangePiece ("," RangePiece)*
RangePiece  ::= TokInteger
              | TokInteger "-" TokInteger
              | TokInteger TokInteger
The peculiar last form of RangePiece is due to the fact that the
“-” is included in the TokInteger, hence 1-5 gets lexed as
two consecutive TokIntegers, with values 1 and -5,
instead of “1”, “-”, and “5”.
The RangeList
can be thought of as specifying “list slice” in some
contexts.
SimpleValue
has a number of forms:
SimpleValue ::= TokIdentifier
The value will be the variable referenced by the identifier. It can be one of:
- the name of a def, such as the use of Bar in:
  def Bar : SomeClass { int X = 5; }
  def Foo { SomeClass Baz = Bar; }
- a value local to a def, such as the use of Bar in:
  def Foo { int Bar = 5; int Baz = Bar; }
- a template arg of a class, such as the use of Bar in:
  class Foo<int Bar> { int Baz = Bar; }
- a value local to a multiclass, such as the use of Bar in:
  multiclass Foo { int Bar = 5; int Baz = Bar; }
- a template arg to a multiclass, such as the use of Bar in:
  multiclass Foo<int Bar> { int Baz = Bar; }
SimpleValue ::= TokInteger
This represents the numeric value of the integer.
SimpleValue ::= TokString+
Multiple adjacent string literals are concatenated like in C/C++. The value is the concatenation of the strings.
SimpleValue ::= TokCodeFragment
The value is the string value of the code fragment.
SimpleValue ::= "?"
?
represents an “unset” initializer.
SimpleValue ::= "{" ValueList "}"
ValueList   ::= [ValueListNE]
ValueListNE ::= Value ("," Value)*
This represents a sequence of bits, as would be used to initialize a
bits<n>
field (where n
is the number of bits).
SimpleValue ::= ClassID "<" ValueListNE ">"
This generates a new anonymous record definition (as would be created by an
unnamed def
inheriting from the given class with the given template
arguments) and the value is the value of that record definition.
SimpleValue ::= "[" ValueList "]" ["<" Type ">"]
A list initializer. The optional Type
can be used to indicate a
specific element type, otherwise the element type will be deduced from the
given values.
SimpleValue ::= "(" DagArg DagArgList ")"
DagArgList  ::= DagArg ("," DagArg)*
DagArg      ::= Value [":" TokVarName] | TokVarName
The initial DagArg
is called the “operator” of the dag.
SimpleValue ::= BangOperator ["<" Type ">"] "(" ValueListNE ")"
Bodies¶
ObjectBody      ::= BaseClassList Body
BaseClassList   ::= [":" BaseClassListNE]
BaseClassListNE ::= SubClassRef ("," SubClassRef)*
SubClassRef     ::= (ClassID | MultiClassID) ["<" ValueList ">"]
DefmID          ::= TokIdentifier
The version with the MultiClassID is only valid in the BaseClassList of a defm.
The MultiClassID should be the name of a multiclass.
It is after parsing the base class list that the “let stack” is applied.
Body     ::= ";" | "{" BodyList "}"
BodyList ::= BodyItem*
BodyItem ::= Declaration ";"
           | "let" TokIdentifier [RangeList] "=" Value ";"
The let
form allows overriding the value of an inherited field.
def¶
Def ::= "def" TokIdentifier ObjectBody
Defines a record whose name is given by the TokIdentifier
. The
fields of the record are inherited from the base classes and defined in the
body.
Special handling occurs if this def
appears inside a multiclass
or
a foreach
.
defm¶
Defm ::= "defm" TokIdentifier ":" BaseClassListNE ";"
Note that in the BaseClassList, all of the multiclasses must
precede any classes that appear.
foreach¶
Foreach ::= "foreach" Declaration "in" "{" Object* "}"
          | "foreach" Declaration "in" Object
The value assigned to the variable in the declaration is iterated over and the object or object list is reevaluated with the variable set at each iterated value.
Top-Level let¶
Let     ::= "let" LetList "in" "{" Object* "}"
          | "let" LetList "in" Object
LetList ::= LetItem ("," LetItem)*
LetItem ::= TokIdentifier [RangeList] "=" Value
This is effectively equivalent to let
inside the body of a record
except that it applies to multiple records at a time. The bindings are
applied at the end of parsing the base classes of a record.
multiclass¶
MultiClass         ::= "multiclass" TokIdentifier [TemplateArgList]
                       [":" BaseMultiClassList] "{" MultiClassObject+ "}"
BaseMultiClassList ::= MultiClassID ("," MultiClassID)*
MultiClassID       ::= TokIdentifier
MultiClassObject   ::= Def | Defm | Let | Foreach
TableGen Language Introduction¶
Warning
This document is extremely rough. If you find something lacking, please fix it, file a documentation bug, or ask about it on llvm-dev.
Introduction¶
This document is not meant to be a normative spec about the TableGen language in and of itself (i.e. how to understand a given construct in terms of how it affects the final set of records represented by the TableGen file). For the formal language specification, see TableGen Language Reference.
TableGen syntax¶
TableGen doesn’t care about the meaning of data (that is up to the backend to define), but it does care about syntax, and it enforces a simple type system. This section describes the syntax and the constructs allowed in a TableGen file.
TableGen primitives¶
TableGen comments¶
TableGen supports C++ style “//
” comments, which run to the end of the
line, and it also supports nestable “/* */
” comments.
The TableGen type system¶
TableGen files are strongly typed, in a simple (but complete) type-system. These types are used to perform automatic conversions, check for errors, and to help interface designers constrain the input that they allow. Every value definition is required to have an associated type.
TableGen supports a mixture of very low-level types (such as bit
) and very
high-level types (such as dag
). This flexibility is what allows it to
describe a wide range of information conveniently and compactly. The TableGen
types are:
bit
- A ‘bit’ is a boolean value that can hold either 0 or 1.
int
- The ‘int’ type represents a simple 32-bit integer value, such as 5.
string
- The ‘string’ type represents an ordered sequence of characters of arbitrary length.
bits<n>
- A ‘bits’ type is an arbitrary, but fixed, size integer that is broken up into individual bits. This type is useful because it can handle some bits being defined while others are undefined.
list<ty>
- This type represents a list whose elements are some other type. The contained type is arbitrary: it can even be another list type.
Class type
- Specifying a class name in a type context means that the defined value must be a subclass of the specified class. This is useful in conjunction with the list type, for example, to constrain the elements of the list to a common base class (e.g., a list<Register> can only contain definitions derived from the “Register” class).
dag
- This type represents a nestable directed graph of elements.
To date, these types have been sufficient for describing things that TableGen has been used for, but it is straight-forward to extend this list if needed.
TableGen values and expressions¶
TableGen allows for a pretty reasonable number of different expression forms when building up values. These forms allow the TableGen file to be written in a natural syntax and flavor for the application. The current expression forms supported include:
?
- uninitialized field
0b1001011
- binary integer value. Note that this is sized by the number of bits given and will not be silently extended/truncated.
07654321
- octal integer value (indicated by a leading 0)
7
- decimal integer value
0x7F
- hexadecimal integer value
"foo"
- string value
[{ ... }]
- usually called a “code fragment”, but is just a multiline string literal
[ X, Y, Z ]<type>
- list value. <type> is the type of the list element and is usually optional. In rare cases, TableGen is unable to deduce the element type in which case the user must specify it explicitly.
{ a, b, 0b10 }
- initializer for a “bits<4>” value. 1-bit from “a”, 1-bit from “b”, 2-bits from 0b10.
value
- value reference
value{17}
- access to one bit of a value
value{15-17}
- access to multiple bits of a value
DEF
- reference to a record definition
CLASS<val list>
- reference to a new anonymous definition of CLASS with the specified template arguments.
X.Y
- reference to the subfield of a value
list[4-7,17,2-3]
- A slice of the ‘list’ list, including elements 4,5,6,7,17,2, and 3 from it. Elements may be included multiple times.
foreach <var> = [ <list> ] in { <body> }
foreach <var> = [ <list> ] in <def>
- Replicate <body> or <def>, replacing instances of <var> with each value in <list>. <var> is scoped at the level of the foreach loop and must not conflict with any other object introduced in <body> or <def>. Currently only defs are expanded within <body>.
foreach <var> = 0-15 in ...
foreach <var> = {0-15,32-47} in ...
- Loop over ranges of integers. The braces are required for multiple ranges.
(DEF a, b)
- a dag value. The first element is required to be a record definition, the remaining elements in the list may be arbitrary other values, including nested ‘dag‘ values.
!listconcat(a, b, ...)
- A list value that is the result of concatenating the ‘a’ and ‘b’ lists. The lists must have the same element type. More than two arguments are accepted with the result being the concatenation of all the lists given.
!strconcat(a, b, ...)
- A string value that is the result of concatenating the ‘a’ and ‘b’ strings. More than two arguments are accepted with the result being the concatenation of all the strings given.
str1#str2
- “#” (paste) is a shorthand for !strconcat. It may concatenate things that are not quoted strings, in which case an implicit !cast<string> is done on the operand of the paste.
!cast<type>(a)
- A symbol of type type obtained by looking up the string ‘a’ in the symbol table. If the type of ‘a’ does not match type, TableGen aborts with an error. !cast<string> is a special case in that the argument must be an object defined by a ‘def’ construct.
!subst(a, b, c)
- If ‘a’ and ‘b’ are of string type or are symbol references, substitute ‘b’ for ‘a’ in ‘c.’ This operation is analogous to $(subst) in GNU make.
!foreach(a, b, c)
- For each member of dag or list ‘b’ apply operator ‘c.’ ‘a’ is a dummy variable that should be declared as a member variable of an instantiated class. This operation is analogous to $(foreach) in GNU make.
!head(a)
- The first element of list ‘a.’
!tail(a)
- The 2nd-N elements of list ‘a.’
!empty(a)
- An integer {0,1} indicating whether list ‘a’ is empty.
!if(a,b,c)
- ‘b’ if the result of ‘int’ or ‘bit’ operator ‘a’ is nonzero, ‘c’ otherwise.
!eq(a,b)
- ‘bit 1’ if string a is equal to string b, 0 otherwise. This only operates on string, int and bit objects. Use !cast<string> to compare other types of objects.
!shl(a,b)
!srl(a,b)
!sra(a,b)
!add(a,b)
!and(a,b)
- The usual binary and arithmetic operators.
Note that all of the values have rules specifying how they convert to values
for different types. These rules allow you to assign a value like “7
”
to a “bits<4>
” value, for example.
Classes and definitions¶
As mentioned in the introduction, classes and definitions (collectively known as
‘records’) in TableGen are the main high-level unit of information that TableGen
collects. Records are defined with a def
or class
keyword, the record
name, and an optional list of “template arguments”. If the record has
superclasses, they are specified as a comma separated list that starts with a
colon character (“:
”). If value definitions or let expressions are
needed for the class, they are enclosed in curly braces (“{}
”); otherwise,
the record ends with a semicolon.
Here is a simple TableGen file:
class C { bit V = 1; }
def X : C;
def Y : C {
string Greeting = "hello";
}
This example defines two definitions, X
and Y
, both of which derive from
the C
class. Because of this, they both get the V
bit value. The Y
definition also gets the Greeting member.
In general, classes are useful for collecting together the commonality between a group of records and isolating it in a single place. Also, classes permit the specification of default values for their subclasses, allowing the subclasses to override them as they wish.
Value definitions¶
Value definitions define named entries in records. A value must be defined before it can be referred to as the operand for another value definition or before the value is reset with a let expression. A value is defined by specifying a TableGen type and a name. If an initial value is available, it may be specified after the type with an equal sign. Value definitions require terminating semicolons.
‘let’ expressions¶
A record-level let expression is used to change the value of a value definition
in a record. This is primarily useful when a superclass defines a value that a
derived class or definition wants to override. Let expressions consist of the
‘let
‘ keyword followed by a value name, an equal sign (“=
”), and a new
value. For example, a new class could be added to the example above, redefining
the V
field for all of its subclasses:
class D : C { let V = 0; }
def Z : D;
In this case, the Z
definition will have a zero value for its V
value,
despite the fact that it derives (indirectly) from the C
class, because the
D
class overrode its value.
Class template arguments¶
TableGen permits the definition of parameterized classes as well as normal concrete classes. Parameterized TableGen classes specify a list of variable bindings (which may optionally have defaults) that are bound when used. Here is a simple example:
class FPFormat<bits<3> val> {
bits<3> Value = val;
}
def NotFP : FPFormat<0>;
def ZeroArgFP : FPFormat<1>;
def OneArgFP : FPFormat<2>;
def OneArgFPRW : FPFormat<3>;
def TwoArgFP : FPFormat<4>;
def CompareFP : FPFormat<5>;
def CondMovFP : FPFormat<6>;
def SpecialFP : FPFormat<7>;
In this case, template arguments are used as a space efficient way to specify a
list of “enumeration values”, each with a “Value
” field set to the specified
integer.
The more esoteric forms of TableGen expressions are useful in conjunction with template arguments. As an example:
class ModRefVal<bits<2> val> {
bits<2> Value = val;
}
def None : ModRefVal<0>;
def Mod : ModRefVal<1>;
def Ref : ModRefVal<2>;
def ModRef : ModRefVal<3>;
class Value<ModRefVal MR> {
// Decode some information into a more convenient format, while providing
// a nice interface to the user of the "Value" class.
bit isMod = MR.Value{0};
bit isRef = MR.Value{1};
// other stuff...
}
// Example uses
def bork : Value<Mod>;
def zork : Value<Ref>;
def hork : Value<ModRef>;
This is obviously a contrived example, but it shows how template arguments can
be used to decouple the interface provided to the user of the class from the
actual internal data representation expected by the class. In this case,
running llvm-tblgen
on the example prints the following definitions:
def bork { // Value
bit isMod = 1;
bit isRef = 0;
}
def hork { // Value
bit isMod = 1;
bit isRef = 1;
}
def zork { // Value
bit isMod = 0;
bit isRef = 1;
}
This shows that TableGen was able to dig into the argument and extract a piece of information that was requested by the designer of the “Value” class. For more realistic examples, please see existing users of TableGen, such as the X86 backend.
Multiclass definitions and instances¶
While classes with template arguments are a good way to factor commonality
between two instances of a definition, multiclasses allow a convenient notation
for defining multiple definitions at once (instances of implicitly constructed
classes). For example, consider a 3-address instruction set whose instructions
come in two forms: “reg = reg op reg
” and “reg = reg op imm
”
(e.g. SPARC). In this case, you’d like to specify in one place that this
commonality exists, then in a separate place indicate what all the ops are.
Here is an example TableGen fragment that shows this idea:
def ops;
def GPR;
def Imm;
class inst<int opc, string asmstr, dag operandlist>;
multiclass ri_inst<int opc, string asmstr> {
def _rr : inst<opc, !strconcat(asmstr, " $dst, $src1, $src2"),
(ops GPR:$dst, GPR:$src1, GPR:$src2)>;
def _ri : inst<opc, !strconcat(asmstr, " $dst, $src1, $src2"),
(ops GPR:$dst, GPR:$src1, Imm:$src2)>;
}
// Instantiations of the ri_inst multiclass.
defm ADD : ri_inst<0b111, "add">;
defm SUB : ri_inst<0b101, "sub">;
defm MUL : ri_inst<0b100, "mul">;
...
The names of the resultant definitions have the multiclass def names appended
to them, so this defines ADD_rr
, ADD_ri
, SUB_rr
, etc. A defm may
inherit from multiple multiclasses, instantiating definitions from each
multiclass. Using a multiclass this way is exactly equivalent to instantiating
the classes multiple times yourself, e.g. by writing:
def ops;
def GPR;
def Imm;
class inst<int opc, string asmstr, dag operandlist>;
class rrinst<int opc, string asmstr>
: inst<opc, !strconcat(asmstr, " $dst, $src1, $src2"),
(ops GPR:$dst, GPR:$src1, GPR:$src2)>;
class riinst<int opc, string asmstr>
: inst<opc, !strconcat(asmstr, " $dst, $src1, $src2"),
(ops GPR:$dst, GPR:$src1, Imm:$src2)>;
// Instantiations of the ri_inst multiclass.
def ADD_rr : rrinst<0b111, "add">;
def ADD_ri : riinst<0b111, "add">;
def SUB_rr : rrinst<0b101, "sub">;
def SUB_ri : riinst<0b101, "sub">;
def MUL_rr : rrinst<0b100, "mul">;
def MUL_ri : riinst<0b100, "mul">;
...
A defm
can also be used inside a multiclass providing several levels of
multiclass instantiations.
class Instruction<bits<4> opc, string Name> {
bits<4> opcode = opc;
string name = Name;
}
multiclass basic_r<bits<4> opc> {
def rr : Instruction<opc, "rr">;
def rm : Instruction<opc, "rm">;
}
multiclass basic_s<bits<4> opc> {
defm SS : basic_r<opc>;
defm SD : basic_r<opc>;
def X : Instruction<opc, "x">;
}
multiclass basic_p<bits<4> opc> {
defm PS : basic_r<opc>;
defm PD : basic_r<opc>;
def Y : Instruction<opc, "y">;
}
defm ADD : basic_s<0xf>, basic_p<0xf>;
...
// Results
def ADDPDrm { ...
def ADDPDrr { ...
def ADDPSrm { ...
def ADDPSrr { ...
def ADDSDrm { ...
def ADDSDrr { ...
def ADDY { ...
def ADDX { ...
defm declarations can inherit from classes too. The rule to follow is that
the class list must start after the last multiclass, and there must be at least
one multiclass before them.
class XD { bits<4> Prefix = 11; }
class XS { bits<4> Prefix = 12; }
class I<bits<4> op> {
bits<4> opcode = op;
}
multiclass R {
def rr : I<4>;
def rm : I<2>;
}
multiclass Y {
defm SS : R, XD;
defm SD : R, XS;
}
defm Instr : Y;
// Results
def InstrSDrm {
bits<4> opcode = { 0, 0, 1, 0 };
bits<4> Prefix = { 1, 1, 0, 0 };
}
...
def InstrSSrr {
bits<4> opcode = { 0, 1, 0, 0 };
bits<4> Prefix = { 1, 0, 1, 1 };
}
File scope entities¶
File inclusion¶
TableGen supports the ‘include
‘ token, which textually substitutes the
specified file in place of the include directive. The filename should be
specified as a double quoted string immediately after the ‘include
‘ keyword.
Example:
include "foo.td"
‘let’ expressions¶
“Let” expressions at file scope are similar to “let” expressions within a record, except they can specify a value binding for multiple records at a time, and may be useful in certain other cases. File-scope let expressions are really just another way that TableGen allows the end-user to factor out commonality from the records.
File-scope “let” expressions take a comma-separated list of bindings to apply, and one or more records to bind the values in. Here are some examples:
let isTerminator = 1, isReturn = 1, isBarrier = 1, hasCtrlDep = 1 in
def RET : I<0xC3, RawFrm, (outs), (ins), "ret", [(X86retflag 0)]>;
let isCall = 1 in
// All calls clobber the non-callee saved registers...
let Defs = [EAX, ECX, EDX, FP0, FP1, FP2, FP3, FP4, FP5, FP6, ST0,
MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7,
XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, EFLAGS] in {
def CALLpcrel32 : Ii32<0xE8, RawFrm, (outs), (ins i32imm:$dst,variable_ops),
"call\t${dst:call}", []>;
def CALL32r : I<0xFF, MRM2r, (outs), (ins GR32:$dst, variable_ops),
"call\t{*}$dst", [(X86call GR32:$dst)]>;
def CALL32m : I<0xFF, MRM2m, (outs), (ins i32mem:$dst, variable_ops),
"call\t{*}$dst", []>;
}
File-scope “let” expressions are often useful when a couple of definitions need
to be added to several records, and the records do not otherwise need to be
opened, as in the case with the CALL*
instructions above.
It’s also possible to use “let” expressions inside multiclasses, providing more ways to factor out commonality from the records, especially when using several levels of multiclass instantiation. This also avoids the need to use “let” expressions within subsequent records inside a multiclass.
multiclass basic_r<bits<4> opc> {
let Predicates = [HasSSE2] in {
def rr : Instruction<opc, "rr">;
def rm : Instruction<opc, "rm">;
}
let Predicates = [HasSSE3] in
def rx : Instruction<opc, "rx">;
}
multiclass basic_ss<bits<4> opc> {
let IsDouble = 0 in
defm SS : basic_r<opc>;
let IsDouble = 1 in
defm SD : basic_r<opc>;
}
defm ADD : basic_ss<0xf>;
Looping¶
TableGen supports the ‘foreach
‘ block, which textually replicates the loop
body, substituting iterator values for iterator references in the body.
Example:
foreach i = [0, 1, 2, 3] in {
def R#i : Register<...>;
def F#i : Register<...>;
}
This will create objects R0
, R1
, R2
and R3
. foreach
blocks
may be nested. If there is only one item in the body the braces may be
elided:
foreach i = [0, 1, 2, 3] in
def R#i : Register<...>;
Code Generator backend info¶
Expressions used by code generator to describe instructions and isel patterns:
(implicit a)
- an implicitly defined physical register. This tells the dag instruction selection emitter that the input pattern’s extra definitions match implicit physical register definitions.
TableGen Deficiencies¶
Introduction¶
Despite being very generic, TableGen has some deficiencies that have been pointed out numerous times. The common theme is that, while TableGen allows you to build Domain-Specific-Languages, the final languages that you create lack the power of other DSLs, which in turn considerably increases the size and complexity of TableGen files.
At the same time, TableGen allows you to create virtually any meaning of the basic concepts via custom-made back-ends, which can pervert the original design and make it very hard for newcomers to understand it.
There are some in favour of extending the semantics even more, while making sure back-ends adhere to strict rules. Others suggest we should move to more powerful DSLs designed with specific purposes, or even re-use existing DSLs.
Known Problems¶
TODO: Add here frequently asked questions about why TableGen doesn’t do what you want, how it might, and how we could extend/restrict it to be more user friendly.
Introduction¶
TableGen’s purpose is to help a human develop and maintain records of domain-specific information. Because there may be a large number of these records, it is specifically designed to allow writing flexible descriptions and for common features of these records to be factored out. This reduces the amount of duplication in the description, reduces the chance of error, and makes it easier to structure domain specific information.
The core part of TableGen parses a file, instantiates the declarations, and hands the result off to a domain-specific backend for processing.
The current major users of TableGen are The LLVM Target-Independent Code Generator and the Clang diagnostics and attributes.
Note that if you work on TableGen much and use emacs or vim, you can find
an emacs “TableGen mode” and a vim language file in the llvm/utils/emacs
and
llvm/utils/vim
directories of your LLVM distribution, respectively.
The TableGen program¶
TableGen files are interpreted by the TableGen program: llvm-tblgen, available in your build directory under bin. It is not installed in the system (or where your sysroot is set to), since it has no use beyond LLVM’s build process.
Running TableGen¶
TableGen runs just like any other LLVM tool. The first (optional) argument
specifies the file to read. If a filename is not specified, llvm-tblgen
reads from standard input.
To be useful, one of the backends must be used. These backends are
selectable on the command line (type ‘llvm-tblgen -help
‘ for a list). For
example, to get a list of all of the definitions that subclass a particular type
(which can be useful for building up an enum list of these records), use the
-print-enums
option:
$ llvm-tblgen X86.td -print-enums -class=Register
AH, AL, AX, BH, BL, BP, BPL, BX, CH, CL, CX, DH, DI, DIL, DL, DX, EAX, EBP, EBX,
ECX, EDI, EDX, EFLAGS, EIP, ESI, ESP, FP0, FP1, FP2, FP3, FP4, FP5, FP6, IP,
MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, R10, R10B, R10D, R10W, R11, R11B, R11D,
R11W, R12, R12B, R12D, R12W, R13, R13B, R13D, R13W, R14, R14B, R14D, R14W, R15,
R15B, R15D, R15W, R8, R8B, R8D, R8W, R9, R9B, R9D, R9W, RAX, RBP, RBX, RCX, RDI,
RDX, RIP, RSI, RSP, SI, SIL, SP, SPL, ST0, ST1, ST2, ST3, ST4, ST5, ST6, ST7,
XMM0, XMM1, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, XMM2, XMM3, XMM4, XMM5,
XMM6, XMM7, XMM8, XMM9,
$ llvm-tblgen X86.td -print-enums -class=Instruction
ABS_F, ABS_Fp32, ABS_Fp64, ABS_Fp80, ADC32mi, ADC32mi8, ADC32mr, ADC32ri,
ADC32ri8, ADC32rm, ADC32rr, ADC64mi32, ADC64mi8, ADC64mr, ADC64ri32, ADC64ri8,
ADC64rm, ADC64rr, ADD16mi, ADD16mi8, ADD16mr, ADD16ri, ADD16ri8, ADD16rm,
ADD16rr, ADD32mi, ADD32mi8, ADD32mr, ADD32ri, ADD32ri8, ADD32rm, ADD32rr,
ADD64mi32, ADD64mi8, ADD64mr, ADD64ri32, ...
The default backend prints out all of the records.
If you plan to use TableGen, you will most likely have to write a backend that extracts the information specific to what you need and formats it in the appropriate way.
Example¶
With no other arguments, llvm-tblgen parses the specified file and prints out all
of the classes, then all of the definitions. This is a good way to see what the
various definitions expand to fully. Running this on the X86.td
file prints
this (at the time of this writing):
...
def ADD32rr { // Instruction X86Inst I
string Namespace = "X86";
dag OutOperandList = (outs GR32:$dst);
dag InOperandList = (ins GR32:$src1, GR32:$src2);
string AsmString = "add{l}\t{$src2, $dst|$dst, $src2}";
list<dag> Pattern = [(set GR32:$dst, (add GR32:$src1, GR32:$src2))];
list<Register> Uses = [];
list<Register> Defs = [EFLAGS];
list<Predicate> Predicates = [];
int CodeSize = 3;
int AddedComplexity = 0;
bit isReturn = 0;
bit isBranch = 0;
bit isIndirectBranch = 0;
bit isBarrier = 0;
bit isCall = 0;
bit canFoldAsLoad = 0;
bit mayLoad = 0;
bit mayStore = 0;
bit isImplicitDef = 0;
bit isConvertibleToThreeAddress = 1;
bit isCommutable = 1;
bit isTerminator = 0;
bit isReMaterializable = 0;
bit isPredicable = 0;
bit hasDelaySlot = 0;
bit usesCustomInserter = 0;
bit hasCtrlDep = 0;
bit isNotDuplicable = 0;
bit hasSideEffects = 0;
InstrItinClass Itinerary = NoItinerary;
string Constraints = "";
string DisableEncoding = "";
bits<8> Opcode = { 0, 0, 0, 0, 0, 0, 0, 1 };
Format Form = MRMDestReg;
bits<6> FormBits = { 0, 0, 0, 0, 1, 1 };
ImmType ImmT = NoImm;
bits<3> ImmTypeBits = { 0, 0, 0 };
bit hasOpSizePrefix = 0;
bit hasAdSizePrefix = 0;
bits<4> Prefix = { 0, 0, 0, 0 };
bit hasREX_WPrefix = 0;
FPFormat FPForm = ?;
bits<3> FPFormBits = { 0, 0, 0 };
}
...
This definition corresponds to the 32-bit register-register add
instruction
of the x86 architecture. def ADD32rr
defines a record named
ADD32rr
, and the comment at the end of the line indicates the superclasses
of the definition. The body of the record contains all of the data that
TableGen assembled for the record, indicating that the instruction is part of
the “X86” namespace, the pattern indicating how the instruction is selected by
the code generator, that it is a two-address instruction, has a particular
encoding, etc. The contents and semantics of the information in the record are
specific to the needs of the X86 backend, and are only shown as an example.
As you can see, a lot of information is needed for every instruction supported by the code generator, and specifying it all manually would be unmaintainable, prone to bugs, and tiring to do in the first place. Because we are using TableGen, all of the information was derived from the following definition:
let Defs = [EFLAGS],
isCommutable = 1, // X = ADD Y,Z --> X = ADD Z,Y
isConvertibleToThreeAddress = 1 in // Can transform into LEA.
def ADD32rr : I<0x01, MRMDestReg, (outs GR32:$dst),
(ins GR32:$src1, GR32:$src2),
"add{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (add GR32:$src1, GR32:$src2))]>;
This definition makes use of the custom class I
(extended from the custom
class X86Inst
), which is defined in the X86-specific TableGen file, to
factor out the common features that instructions of its class share. A key
feature of TableGen is that it allows the end-user to define the abstractions
they prefer to use when describing their information.
Each def
record has a special entry called “NAME”. This is the name of the
record (“ADD32rr
” above). In the general case def
names can be formed
from various kinds of string processing expressions and NAME
resolves to the
final value obtained after resolving all of those expressions. The user may
refer to NAME
anywhere she desires to use the ultimate name of the def
.
NAME
should not be defined anywhere else in user code to avoid conflicts.
Syntax¶
TableGen has a syntax that is loosely based on C++ templates, with built-in types and specifications. In addition, TableGen’s syntax introduces some automation concepts like multiclass, foreach, let, etc.
Basic concepts¶
TableGen files consist of two key parts: ‘classes’ and ‘definitions’, both of which are considered ‘records’.
TableGen records have a unique name, a list of values, and a list of superclasses. The list of values is the main data that TableGen builds for each record; it is this that holds the domain specific information for the application. The interpretation of this data is left to a specific backend, but the structure and format rules are taken care of and are fixed by TableGen.
TableGen definitions are the concrete form of ‘records’. These generally do
not have any undefined values, and are marked with the ‘def
‘ keyword.
def FeatureFPARMv8 : SubtargetFeature<"fp-armv8", "HasFPARMv8", "true",
"Enable ARMv8 FP">;
In this example, FeatureFPARMv8 is a SubtargetFeature record initialised
with some values. The names of the classes are defined via the
keyword class, either in the same file or in some other included file. Most target
TableGen files include the generic ones in include/llvm/Target.
TableGen classes are abstract records that are used to build and describe other records. These classes allow the end-user to build abstractions for either the domain they are targeting (such as “Register”, “RegisterClass”, and “Instruction” in the LLVM code generator) or for the implementor to help factor out common properties of records (such as “FPInst”, which is used to represent floating point instructions in the X86 backend). TableGen keeps track of all of the classes that are used to build up a definition, so the backend can find all definitions of a particular class, such as “Instruction”.
class ProcNoItin<string Name, list<SubtargetFeature> Features>
: Processor<Name, NoItineraries, Features>;
Here, the class ProcNoItin, which receives a parameter Name of type string and a list of target features, specializes the class Processor by passing the arguments down and hard-coding NoItineraries.
TableGen multiclasses are groups of abstract records that are instantiated all at once. Each instantiation can result in multiple TableGen definitions. If a multiclass inherits from another multiclass, the definitions in the sub-multiclass become part of the current multiclass, as if they were declared in the current multiclass.
multiclass ro_signed_pats<string T, string Rm, dag Base, dag Offset, dag Extend,
dag address, ValueType sty> {
def : Pat<(i32 (!cast<SDNode>("sextload" # sty) address)),
(!cast<Instruction>("LDRS" # T # "w_" # Rm # "_RegOffset")
Base, Offset, Extend)>;
def : Pat<(i64 (!cast<SDNode>("sextload" # sty) address)),
(!cast<Instruction>("LDRS" # T # "x_" # Rm # "_RegOffset")
Base, Offset, Extend)>;
}
defm : ro_signed_pats<"B", Rm, Base, Offset, Extend,
!foreach(decls.pattern, address,
!subst(SHIFT, imm_eq0, decls.pattern)),
i8>;
See the TableGen Language Introduction for more generic information on the usage of the language, and the TableGen Language Reference for more in-depth description of the formal language specification.
TableGen backends¶
TableGen files have no real meaning without a back-end. The default operation
of running llvm-tblgen
is to print the information in a textual format, but
that’s only useful for debugging of the TableGen files themselves. The power
in TableGen is, however, to interpret the source files into an internal
representation that can be generated into anything you want.
Current usage of TableGen is to create huge include files with tables that you can either include directly (if the output is in the language you’re coding) or use in pre-processing via macros surrounding the include of the file.
Direct output can be used if the back-end already prints a table in C format or if the output is just a list of strings (for error and warning messages). Pre-processed output should be used if the same information needs to be used in different contexts (like Instruction names), so your back-end should print a meta-information list that can be shaped into different compile-time formats.
See the TableGen BackEnds for more information.
TableGen Deficiencies¶
Despite being very generic, TableGen has some deficiencies that have been pointed out numerous times. The common theme is that, while TableGen allows you to build Domain-Specific-Languages, the final languages that you create lack the power of other DSLs, which in turn considerably increases the size and complexity of TableGen files.
At the same time, TableGen allows you to create virtually any meaning of the basic concepts via custom-made back-ends, which can pervert the original design and make it very hard for newcomers to understand the evil TableGen file.
There are some in favour of extending the semantics even more, while making sure back-ends adhere to strict rules. Others suggest we should move to other, more powerful DSLs designed with specific purposes, or even re-use existing DSLs.
Either way, this is a discussion that will likely span across several years, if not decades. You can read more in the TableGen Deficiencies document.
Debugging JIT-ed Code With GDB¶
Background¶
Without special runtime support, debugging dynamically generated code with GDB (as well as most debuggers) can be quite painful. Debuggers generally read debug information from the object file of the code, but for JITed code, there is no such file to look for.
In order to communicate the necessary debug info to GDB, an interface for
registering JITed code with debuggers has been designed and implemented for
GDB and LLVM MCJIT. At a high level, whenever MCJIT generates new machine code,
it does so in an in-memory object file that contains the debug information in
DWARF format. MCJIT then adds this in-memory object file to a global list of
dynamically generated object files and calls a special function
(__jit_debug_register_code
) marked noinline that GDB knows about. When
GDB attaches to a process, it puts a breakpoint in this function and loads all
of the object files in the global list. When MCJIT calls the registration
function, GDB catches the breakpoint signal, loads the new object file from
the inferior’s memory, and resumes the execution. In this way, GDB can get the
necessary debug information.
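For reference, the registration interface GDB expects is small; the following sketch shows the structures involved, following the layout documented in the GDB manual. MCJIT populates these internally, so this is only to illustrate the mechanism:
#include <stdint.h>

typedef enum {
  JIT_NOACTION = 0,
  JIT_REGISTER_FN,   /* a new in-memory object file was added */
  JIT_UNREGISTER_FN  /* an in-memory object file was removed */
} jit_actions_t;

struct jit_code_entry {
  struct jit_code_entry *next_entry;
  struct jit_code_entry *prev_entry;
  const char *symfile_addr;  /* the in-memory object file, with DWARF */
  uint64_t symfile_size;
};

struct jit_descriptor {
  uint32_t version;          /* currently 1 */
  uint32_t action_flag;      /* a jit_actions_t value */
  struct jit_code_entry *relevant_entry;
  struct jit_code_entry *first_entry;  /* global list of object files */
};

/* GDB sets a breakpoint in this function, so it must not be inlined. */
void __attribute__((noinline)) __jit_debug_register_code(void) {}

/* GDB reads this descriptor to find the registered object files. */
struct jit_descriptor __jit_debug_descriptor = { 1, 0, 0, 0 };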
GDB Version¶
In order to debug code JIT-ed by LLVM, you need GDB 7.0 or newer, which is available on most modern distributions of Linux. The version of GDB that Apple ships with Xcode has been frozen at 6.3 for a while. LLDB may be a better option for debugging JIT-ed code on Mac OS X.
Debugging MCJIT-ed code¶
The emerging MCJIT component of LLVM allows full debugging of JIT-ed code with GDB. This is due to MCJIT’s ability to use the MC emitter to provide full DWARF debugging information to GDB.
Note that lli has to be passed the -use-mcjit
flag to JIT the code with
MCJIT instead of the old JIT.
Example¶
Consider the following C code (with line numbers added to make the example easier to follow):
1 int compute_factorial(int n)
2 {
3 if (n <= 1)
4 return 1;
5
6 int f = n;
7 while (--n > 1)
8 f *= n;
9 return f;
10 }
11
12
13 int main(int argc, char** argv)
14 {
15 if (argc < 2)
16 return -1;
17 char firstletter = argv[1][0];
18 int result = compute_factorial(firstletter - '0');
19
20 // Returned result is clipped at 255...
21 return result;
22 }
Here is a sample command line session that shows how to build and run this
code via lli
inside GDB:
$ $BINPATH/clang -cc1 -O0 -g -emit-llvm showdebug.c
$ gdb --quiet --args $BINPATH/lli -use-mcjit showdebug.ll 5
Reading symbols from $BINPATH/lli...done.
(gdb) b showdebug.c:6
No source file named showdebug.c.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (showdebug.c:6) pending.
(gdb) r
Starting program: $BINPATH/lli -use-mcjit showdebug.ll 5
[Thread debugging using libthread_db enabled]
Breakpoint 1, compute_factorial (n=5) at showdebug.c:6
6 int f = n;
(gdb) p n
$1 = 5
(gdb) p f
$2 = 0
(gdb) n
7 while (--n > 1)
(gdb) p f
$3 = 5
(gdb) b showdebug.c:9
Breakpoint 2 at 0x7ffff7ed404c: file showdebug.c, line 9.
(gdb) c
Continuing.
Breakpoint 2, compute_factorial (n=1) at showdebug.c:9
9 return f;
(gdb) p f
$4 = 120
(gdb) bt
#0 compute_factorial (n=1) at showdebug.c:9
#1 0x00007ffff7ed40a9 in main (argc=2, argv=0x16677e0) at showdebug.c:18
#2 0x3500000001652748 in ?? ()
#3 0x00000000016677e0 in ?? ()
#4 0x0000000000000002 in ?? ()
#5 0x0000000000d953b3 in llvm::MCJIT::runFunction (this=0x16151f0, F=0x1603020, ArgValues=...) at /home/ebenders_test/llvm_svn_rw/lib/ExecutionEngine/MCJIT/MCJIT.cpp:161
#6 0x0000000000dc8872 in llvm::ExecutionEngine::runFunctionAsMain (this=0x16151f0, Fn=0x1603020, argv=..., envp=0x7fffffffe040)
at /home/ebenders_test/llvm_svn_rw/lib/ExecutionEngine/ExecutionEngine.cpp:397
#7 0x000000000059c583 in main (argc=4, argv=0x7fffffffe018, envp=0x7fffffffe040) at /home/ebenders_test/llvm_svn_rw/tools/lli/lli.cpp:324
(gdb) finish
Run till exit from #0 compute_factorial (n=1) at showdebug.c:9
0x00007ffff7ed40a9 in main (argc=2, argv=0x16677e0) at showdebug.c:18
18 int result = compute_factorial(firstletter - '0');
Value returned is $5 = 120
(gdb) p result
$6 = 23406408
(gdb) n
21 return result;
(gdb) p result
$7 = 120
(gdb) c
Continuing.
Program exited with code 0170.
(gdb)
The LLVM gold plugin¶
Introduction¶
Building with link time optimization requires cooperation from the system linker. LTO support on Linux systems requires that you use the gold linker which supports LTO via plugins. This is the same mechanism used by the GCC LTO project.
The LLVM gold plugin implements the gold plugin interface on top of libLTO. The same plugin can also be used by other tools such as ar and nm.
How to build it¶
You need to have gold with plugin support and build the LLVMgold plugin.
Check whether you have gold by running /usr/bin/ld -v. It will report “GNU gold” or else “GNU ld” if not. If you have gold, check for plugin support by running /usr/bin/ld -plugin. If it complains “missing argument” then you have plugin support. If instead you get an “unknown option” error, you will either need to build gold or install a version with plugin support.
Download, configure and build gold with plugin support:
$ git clone --depth 1 git://sourceware.org/git/binutils-gdb.git binutils
$ mkdir build
$ cd build
$ ../binutils/configure --enable-gold --enable-plugins --disable-werror
$ make all-gold
That should leave you with build/gold/ld-new which supports the -plugin option. Running make will additionally build build/binutils/ar and nm-new binaries supporting plugins.
Build the LLVMgold plugin. If building with autotools, run configure with --with-binutils-include=/path/to/binutils/include and run make. If building with CMake, run cmake with -DLLVM_BINUTILS_INCDIR=/path/to/binutils/include. The correct include path will contain the file plugin-api.h.
Usage¶
The linker takes a -plugin option that points to the path of the plugin .so file. To find out what link command gcc would run in a given situation, run gcc -v [...] and look for the line where it runs collect2. Replace that with ld-new -plugin /path/to/LLVMgold.so to test it out. Once you’re ready to switch to using gold, back up your existing /usr/bin/ld then replace it with ld-new.
You should produce bitcode files from clang with the option -flto. This flag will also cause clang to look for the gold plugin in the lib directory under its prefix and pass the -plugin option to ld. It will not look for an alternate linker, which is why you need gold to be the installed system linker in your path.
ar and nm also accept the -plugin option and it’s possible to install LLVMgold.so to /usr/lib/bfd-plugins for a seamless setup. If you built your own gold, be sure to install the ar and nm-new you built to /usr/bin.
Example of link time optimization¶
The following shows a worked example of the gold plugin mixing LLVM bitcode and native code.
--- a.c ---
#include <stdio.h>
extern void foo1(void);
extern void foo4(void);
void foo2(void) {
printf("Foo2\n");
}
void foo3(void) {
foo4();
}
int main(void) {
foo1();
}
--- b.c ---
#include <stdio.h>
extern void foo2(void);
void foo1(void) {
foo2();
}
void foo4(void) {
printf("Foo4");
}
--- command lines ---
$ clang -flto a.c -c -o a.o # <-- a.o is LLVM bitcode file
$ ar q a.a a.o # <-- a.a is an archive with LLVM bitcode
$ clang b.c -c -o b.o # <-- b.o is native object file
$ clang -flto a.a b.o -o main # <-- link with LLVMgold plugin
Gold informs the plugin that foo3 is never referenced outside the IR, leading LLVM to delete that function. However, unlike in the libLTO example, gold does not currently eliminate foo4.
Quickstart for using LTO with autotooled projects¶
Once your system ld, ar, and nm all support LLVM bitcode, everything is in place for an easy to use LTO build of autotooled projects:
Follow the instructions on how to build LLVMgold.so.
Install the newly built binutils to $PREFIX.
Copy Release/lib/LLVMgold.so to $PREFIX/lib/bfd-plugins/.
Set environment variables ($PREFIX is where you installed clang and binutils):
export CC="$PREFIX/bin/clang -flto"
export CXX="$PREFIX/bin/clang++ -flto"
export AR="$PREFIX/bin/ar"
export NM="$PREFIX/bin/nm"
export RANLIB=/bin/true # ranlib is not needed, and doesn't support .bc files in .a
Or you can just set your path:
export PATH="$PREFIX/bin:$PATH"
export CC="clang -flto"
export CXX="clang++ -flto"
export RANLIB=/bin/true
Configure and build the project as usual:
% ./configure && make && make check
The environment variable settings may work for non-autotooled projects too, but you may need to set the LD environment variable as well.
Licensing¶
Gold is licensed under the GPLv3. LLVMgold uses the interface file plugin-api.h from gold, which means that the resulting LLVMgold.so binary is also GPLv3. It can still be used to link non-GPLv3 programs, just as gold could without the plugin.
LLVM’s Optional Rich Disassembly Output¶
Introduction¶
LLVM’s default disassembly output is raw text. To allow consumers more ability to introspect the instructions’ textual representation, or to reformat for a more user-friendly display, there is an optional rich disassembly output.
This optional output is sufficient to reference into individual portions of the instruction text. This is intended for clients like disassemblers, list file generators, and pretty-printers, which need more than the raw instructions and the ability to print them.
To provide this functionality the assembly text is marked up with annotations. The markup is simple enough in syntax to be robust even in the case of version mismatches between consumers and producers. That is, the syntax generally does not carry semantics beyond “this text has an annotation,” so consumers can simply ignore annotations they do not understand or do not care about.
After calling LLVMCreateDisasm() to create a disassembler context, the optional output is enabled with this call:
LLVMSetDisasmOptions(DC, LLVMDisassembler_Option_UseMarkup);
Then subsequent calls to LLVMDisasmInstruction()
will return output strings
with the marked up annotations.
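Putting the pieces together, a minimal consumer might look like the sketch below. The target triple and instruction bytes are illustrative; the calls are the C API functions described in this section:
#include <llvm-c/Disassembler.h>
#include <llvm-c/Target.h>
#include <stdio.h>

int main(void) {
  /* Register targets so a disassembler context can be created. */
  LLVMInitializeAllTargetInfos();
  LLVMInitializeAllTargetMCs();
  LLVMInitializeAllDisassemblers();

  LLVMDisasmContextRef DC =
      LLVMCreateDisasm("x86_64-unknown-linux-gnu", NULL, 0, NULL, NULL);
  if (!DC)
    return 1;

  /* Ask for the marked-up annotations described above. */
  LLVMSetDisasmOptions(DC, LLVMDisassembler_Option_UseMarkup);

  unsigned char Bytes[] = { 0x55 }; /* x86-64 "push rbp" */
  char Text[128];
  size_t Consumed =
      LLVMDisasmInstruction(DC, Bytes, sizeof(Bytes), 0, Text, sizeof(Text));
  if (Consumed)
    printf("%s\n", Text); /* annotated text, e.g. with <reg:...> operands */

  LLVMDisasmDispose(DC);
  return 0;
}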
Instruction Annotations¶
Contextual markups¶
Annotated assembly display will supply contextual markup to help clients more efficiently implement things like pretty printers. Most markup will be target independent, so clients can effectively provide good display without any target specific knowledge.
Annotated assembly goes through the normal instruction printer, but optionally includes contextual tags on portions of the instruction string. An annotation is any ‘<’ ‘>’ delimited section of text(1).
annotation: '<' tag-name tag-modifier-list ':' annotated-text '>'
tag-name: identifier
tag-modifier-list: comma delimited identifier list
The tag-name is an identifier which gives the type of the annotation. For the first pass, this will be very simple, with memory references, registers, and immediates having the tag names “mem”, “reg”, and “imm”, respectively.
The tag-modifier-list is typically additional target-specific context, such as register class.
Clients should accept and ignore any tag-names or tag-modifiers they do not understand, allowing the annotations to grow in richness without breaking older clients.
For example, a possible annotation of an ARM load of a stack-relative location might be annotated as:
ldr <reg gpr:r0>, <mem regoffset:[<reg gpr:sp>, <imm:#4>]>
1: For assembly dialects in which ‘<’ and/or ‘>’ are legal tokens, a literal token is escaped by following immediately with a repeat of the character. For example, a literal ‘<’ character is output as ‘<<’ in an annotated assembly string.
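To show how cheaply a consumer can ignore annotations it does not care about, here is a hypothetical helper (not part of the C API) that reduces an annotated string back to plain assembly text, honoring the doubled-character escapes described in the footnote above:
#include <stdio.h>
#include <string.h>

/* Hypothetical consumer-side helper: copy annotated assembly text to Out
   with all annotations removed. "<<" and ">>" escape literal characters. */
static void strip_markup(const char *In, char *Out) {
  while (*In) {
    if (In[0] == '<' && In[1] == '<') { *Out++ = '<'; In += 2; }
    else if (In[0] == '>' && In[1] == '>') { *Out++ = '>'; In += 2; }
    else if (*In == '<') {
      /* Skip this annotation's "tag-name tag-modifier-list:" header. */
      const char *Colon = strchr(In, ':');
      In = Colon ? Colon + 1 : In + 1;
    } else if (*In == '>') {
      ++In; /* closes an annotation */
    } else {
      *Out++ = *In++;
    }
  }
  *Out = '\0';
}

int main(void) {
  char Plain[128];
  strip_markup("ldr <reg gpr:r0>, <mem regoffset:[<reg gpr:sp>, <imm:#4>]>",
               Plain);
  printf("%s\n", Plain); /* prints: ldr r0, [sp, #4] */
  return 0;
}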
C API Details¶
The intended consumers of this information use the C API; therefore, the C API for the disassembler provides an option to produce disassembled instructions with annotations: LLVMSetDisasmOptions() and the LLVMDisassembler_Option_UseMarkup option (see above).
System Library¶
Abstract¶
This document provides some details on LLVM’s System Library, located in the
source at lib/System
and include/llvm/System
. The library’s purpose is
to shield LLVM from the differences between operating systems for the few
services LLVM needs from the operating system. Much of LLVM is written using
portability features of standard C++. However, in a few areas, system dependent
facilities are needed and the System Library is the wrapper around those system
calls.
By centralizing LLVM’s use of operating system interfaces, we make it possible
for the LLVM tool chain and runtime libraries to be more easily ported to new
platforms since (theoretically) only lib/System
needs to be ported. This
library also unclutters the rest of LLVM from #ifdef use and special cases for
specific operating systems. Such uses are replaced with simple calls to the
interfaces provided in include/llvm/System
.
Note that the System Library is not intended to be a complete operating system wrapper (such as the Adaptive Communications Environment (ACE) or Apache Portable Runtime (APR)), but only provides the functionality necessary to support LLVM.
The System Library was written by Reid Spencer who formulated the design based on similar work originating from the eXtensible Programming System (XPS). Several people helped with the effort; especially, Jeff Cohen and Henrik Bach on the Win32 port.
Keeping LLVM Portable¶
In order to keep LLVM portable, LLVM developers should adhere to a set of portability rules associated with the System Library. Adherence to these rules should help the System Library achieve its goal of shielding LLVM from the variations in operating system interfaces and doing so efficiently. The following sections define the rules needed to fulfill this objective.
Don’t Include System Headers¶
Except in lib/System
, no LLVM source code should directly #include
a
system header. Care has been taken to remove all such #includes
from LLVM
while lib/System
was being developed. Specifically this means that header
files like “unistd.h
”, “windows.h
”, “stdio.h
”, and “string.h
”
are forbidden to be included by LLVM source code outside the implementation of
lib/System
.
To obtain system-dependent functionality, existing interfaces to the system
found in include/llvm/System
should be used. If an appropriate interface is
not available, it should be added to include/llvm/System
and implemented in
lib/System
for all supported platforms.
Don’t Expose System Headers¶
The System Library must shield LLVM from all system headers. To obtain
system level functionality, LLVM source must #include "llvm/System/Thing.h"
and nothing else. This means that Thing.h
cannot expose any system header
files. This protects LLVM from accidentally using system specific functionality
and only allows it via the lib/System
interface.
Use Standard C Headers¶
The standard C headers (the ones beginning with “c”) are allowed to be
exposed through the lib/System
interface. These headers and the things they
declare are considered to be platform agnostic. LLVM source files may include
them directly or obtain their inclusion through lib/System
interfaces.
Use Standard C++ Headers¶
The standard C++ headers from the standard C++ library and standard
template library may be exposed through the lib/System
interface. These
headers and the things they declare are considered to be platform agnostic.
LLVM source files may include them or obtain their inclusion through
lib/System
interfaces.
High Level Interface¶
The entry points specified in the interface of lib/System
must be aimed at
completing some reasonably high level task needed by LLVM. We do not want to
simply wrap each operating system call. It would be preferable to wrap several
operating system calls that are always used in conjunction with one another by
LLVM.
For example, consider what is needed to execute a program, wait for it to
complete, and return its result code. On Unix, this involves the following
operating system calls: getenv
, fork
, execve
, and wait
. The
correct thing for lib/System
to provide is a function, say
ExecuteProgramAndWait
, that implements the functionality completely. What
we don’t want is wrappers for the operating system calls involved.
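For concreteness, here is a minimal Unix-only sketch of what such a high-level entry point might look like; the name and signature are illustrative, not the actual lib/System interface:
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative only: run a program, wait for it to complete, and return
   its exit code, or -1 on a hard error. */
int ExecuteProgramAndWait(const char *Path, char *const Argv[]) {
  pid_t Child = fork();
  if (Child == -1)
    return -1;          /* could not fork */
  if (Child == 0) {
    execv(Path, Argv);  /* only returns on failure */
    _exit(127);
  }
  int Status;
  if (waitpid(Child, &Status, 0) == -1)
    return -1;
  return WIFEXITED(Status) ? WEXITSTATUS(Status) : -1;
}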
There must not be a one-to-one relationship between operating system calls and the System library’s interface. Any such interface function will be suspicious.
No Unused Functionality¶
There must be no functionality specified in the interface of lib/System
that isn’t actually used by LLVM. We’re not writing a general purpose operating
system wrapper here, just enough to satisfy LLVM’s needs. And, LLVM doesn’t
need much. This design goal aims to keep the lib/System
interface small and
understandable which should foster its actual use and adoption.
No Duplicate Implementations¶
The implementation of a function for a given platform must be written exactly once. This implies that it must be possible to apply a function’s implementation to multiple operating systems if those operating systems can share the same implementation. This rule applies to the set of operating systems supported for a given class of operating system (e.g. Unix, Win32).
No Virtual Methods¶
The System Library interfaces can be called quite frequently by LLVM. In order
to make those calls as efficient as possible, we discourage the use of virtual
methods. There is no need to use inheritance for implementation differences; it
just adds complexity. The #include
mechanism works just fine.
No Exposed Functions¶
Any functions defined by system libraries (i.e. not defined by lib/System
)
must not be exposed through the lib/System
interface, even if the header
file for that function is not exposed. This prevents inadvertent use of system
specific functionality.
For example, the stat
system call is notorious for having variations in the
data it provides. lib/System
must not declare stat
nor allow it to be
declared. Instead it should provide its own interface to discovering
information about files and directories. Those interfaces may be implemented in
terms of stat
but that is strictly an implementation detail. The interface
provided by the System Library must be implemented on all platforms (even those
without stat
).
No Exposed Data¶
Any data defined by system libraries (i.e. not defined by lib/System
) must
not be exposed through the lib/System
interface, even if the header file
for that data is not exposed. As with functions, this prevents inadvertent
use of data that might not exist on all platforms.
Minimize Soft Errors¶
Operating system interfaces will generally provide error results for every little thing that could go wrong. In almost all cases, you can divide these error results into two groups: normal/good/soft and abnormal/bad/hard. That is, some of the errors are simply information like “file not found”, “insufficient privileges”, etc. while other errors are much harder like “out of space”, “bad disk sector”, or “system call interrupted”. We’ll call the first group “soft” errors and the second group “hard” errors.
lib/System
must always attempt to minimize soft errors. This is a design
requirement because the minimization of soft errors can affect the granularity
and the nature of the interface. In general, if you find that you’re wanting to
throw soft errors, you must review the granularity of the interface because it
is likely you’re trying to implement something that is too low level. The rule
of thumb is to provide interface functions that can’t fail, except when
faced with hard errors.
For a trivial example, suppose we wanted to add an “OpenFileForWriting
”
function. For many operating systems, if the file doesn’t exist, attempting to
open the file will produce an error. However, lib/System
should not simply
throw that error if it occurs because it’s a soft error. The problem is that the
interface function, OpenFileForWriting
, is too low level. It should be
OpenOrCreateFileForWriting
. In the case of the soft “doesn’t exist” error,
this function would just create it and then open it for writing.
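For illustration (the names here are invented, not the actual interface), the point is that the interface absorbs the soft “doesn’t exist” case rather than surfacing it:
#include <fcntl.h>

/* Illustrative only: open a file for writing, creating it if it does not
   exist, so the "doesn't exist" soft error never reaches the caller. Only
   hard errors (out of space, bad permissions, ...) yield -1. */
int OpenOrCreateFileForWriting(const char *Path) {
  return open(Path, O_WRONLY | O_CREAT, 0666);
}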
This design principle needs to be maintained in lib/System
because it
avoids the propagation of soft error handling throughout the rest of LLVM.
Hard errors will generally just cause a termination for an LLVM tool so don’t
be bashful about throwing them.
Rules of thumb:
- Don’t throw soft errors, only hard errors.
- If you’re tempted to throw a soft error, re-think the interface.
- Handle internally the most common normal/good/soft error conditions so the rest of LLVM doesn’t have to.
No throw Specifications¶
None of the lib/System
interface functions may be declared with C++
throw()
specifications on them. This requirement makes sure that the
compiler does not insert additional exception handling code into the interface
functions. This is a performance consideration: lib/System
functions are at
the bottom of many call chains and as such can be frequently called. We need
them to be as efficient as possible. However, no routines in the system
library should actually throw exceptions.
Code Organization¶
Implementations of the System Library interface are separated by their general
class of operating system. Currently only Unix and Win32 classes are defined
but more could be added for other operating system classifications. To
distinguish which implementation to compile, the code in lib/System
uses
the LLVM_ON_UNIX
and LLVM_ON_WIN32
#defines
provided via configure
through the llvm/Config/config.h
file. Each source file in lib/System
,
after implementing the generic (operating system independent) functionality
needs to include the correct implementation using a set of
#if defined(LLVM_ON_XYZ)
directives. For example, if we had
lib/System/File.cpp
, we’d expect to see in that file:
#if defined(LLVM_ON_UNIX)
#include "Unix/File.cpp"
#endif
#if defined(LLVM_ON_WIN32)
#include "Win32/File.cpp"
#endif
The implementation in lib/System/Unix/File.cpp
should handle all Unix
variants. The implementation in lib/System/Win32/File.cpp
should handle all
Win32 variants. What this does is quickly differentiate the basic class of
operating system that will provide the implementation. The specific details for
a given platform must still be determined through the use of #ifdef
.
Consistent Semantics¶
The implementation of a lib/System
interface can vary drastically between
platforms. That’s okay as long as the end result of the interface function is
the same. For example, a function to create a directory is pretty
straightforward on all operating systems. System V IPC, on the other hand, isn’t even
supported on all platforms. Instead of “supporting” System V IPC,
lib/System
should provide an interface to the basic concept of
inter-process communications. The implementations might use System V IPC if
that was available or named pipes, or whatever gets the job done effectively
for a given operating system. In all cases, the interface and the
implementation must be semantically consistent.
Source Level Debugging with LLVM¶
Introduction¶
This document is the central repository for all information pertaining to debug information in LLVM. It describes the actual format that the LLVM debug information takes, which is useful for those interested in creating front-ends or dealing directly with the information. Further, this document provides specific examples of what debug information for C/C++ looks like.
Philosophy behind LLVM debugging information¶
The idea of the LLVM debugging information is to capture how the important pieces of the source-language’s Abstract Syntax Tree map onto LLVM code. Several design aspects have shaped the solution that appears here. The important ones are:
- Debugging information should have very little impact on the rest of the compiler. No transformations, analyses, or code generators should need to be modified because of debugging information.
- LLVM optimizations should interact in well-defined and easily described ways with the debugging information.
- Because LLVM is designed to support arbitrary programming languages, LLVM-to-LLVM tools should not need to know anything about the semantics of the source-level-language.
- Source-level languages are often widely different from one another. LLVM should not put any restrictions on the flavor of the source-language, and the debugging information should work with any language.
- With code generator support, it should be possible to use an LLVM compiler to compile a program to native machine code and standard debugging formats. This allows compatibility with traditional machine-code level debuggers, like GDB or DBX.
The approach used by the LLVM implementation is to use a small set of intrinsic functions to define a mapping between LLVM program objects and the source-level objects. The description of the source-level program is maintained in LLVM metadata in an implementation-defined format (the C/C++ front-end currently uses working draft 7 of the DWARF 3 standard).
When a program is being debugged, a debugger interacts with the user and turns the stored debug information into source-language specific information. As such, a debugger must be aware of the source-language, and is thus tied to a specific language or family of languages.
Debug information consumers¶
The role of debug information is to provide meta information normally stripped away during the compilation process. This meta information provides an LLVM user a relationship between generated code and the original program source code.
Currently, debug information is consumed by DwarfDebug to produce dwarf information used by the gdb debugger. Other targets could use the same information to produce stabs or other debug forms.
It would also be reasonable to use debug information to feed profiling tools for analysis of generated code, or, tools for reconstructing the original source from generated code.
TODO - expound a bit more.
Debugging optimized code¶
An extremely high priority of LLVM debugging information is to make it interact well with optimizations and analysis. In particular, the LLVM debug information provides the following guarantees:
- LLVM debug information always provides information to accurately read the source-level state of the program, regardless of which LLVM optimizations have been run, and without any modification to the optimizations themselves. However, some optimizations may impact the ability to modify the current state of the program with a debugger, such as setting program variables, or calling functions that have been deleted.
- As desired, LLVM optimizations can be upgraded to be aware of the LLVM debugging information, allowing them to update the debugging information as they perform aggressive optimizations. This means that, with effort, the LLVM optimizers could optimize debug code just as well as non-debug code.
- LLVM debug information does not prevent optimizations from happening (for example inlining, basic block reordering/merging/cleanup, tail duplication, etc).
- LLVM debug information is automatically optimized along with the rest of the program, using existing facilities. For example, duplicate information is automatically merged by the linker, and unused information is automatically removed.
Basically, the debug information allows you to compile a program with
“-O0 -g
” and get full debug information, allowing you to arbitrarily modify
the program as it executes from a debugger. Compiling a program with
“-O3 -g
” gives you full debug information that is always available and
accurate for reading (e.g., you get accurate stack traces despite tail call
elimination and inlining), but you might lose the ability to modify the program
and call functions that were optimized out of the program, or inlined away
completely.
The LLVM test suite provides a framework to test the optimizer’s handling of debugging information. It can be run like this:
% cd llvm/projects/test-suite/MultiSource/Benchmarks # or some other level
% make TEST=dbgopt
This will test the impact of debugging information on optimization passes. If debugging information influences optimization passes then it will be reported as a failure. See the LLVM Testing Infrastructure Guide for more information on LLVM test infrastructure and how to run various tests.
Debugging information format¶
LLVM debugging information has been carefully designed to make it possible for the optimizer to optimize the program and debugging information without necessarily having to know anything about debugging information. In particular, the use of metadata avoids duplicated debugging information from the beginning, and the global dead code elimination pass automatically deletes debugging information for a function if it decides to delete the function.
To do this, most of the debugging information (descriptors for types, variables, functions, source files, etc) is inserted by the language front-end in the form of LLVM metadata.
Debug information is designed to be agnostic about the target debugger and debugging information representation (e.g. DWARF/Stabs/etc). It uses a generic pass to decode the information that represents variables, types, functions, namespaces, etc: this allows for arbitrary source-language semantics and type-systems to be used, as long as there is a module written for the target debugger to interpret the information.
To provide basic functionality, the LLVM debugger does have to make some assumptions about the source-level language being debugged, though it keeps these to a minimum. The only common features that the LLVM debugger assumes exist are source files, and program objects. These abstract objects are used by a debugger to form stack traces, show information about local variables, etc.
This section of the documentation first describes the representation aspects common to any source-language. C/C++ front-end specific debug information describes the data layout conventions used by the C and C++ front-ends.
Debug information descriptors are specialized metadata nodes, first-class subclasses of Metadata
.
Debugger intrinsic functions¶
LLVM uses several intrinsic functions (name prefixed with “llvm.dbg
”) to
provide debug information at various points in generated code.
llvm.dbg.declare
¶
void @llvm.dbg.declare(metadata, metadata, metadata)
This intrinsic provides information about a local element (e.g., variable). The first argument is metadata holding the alloca for the variable. The second argument is a local variable containing a description of the variable. The third argument is a complex expression.
llvm.dbg.value
¶
void @llvm.dbg.value(metadata, i64, metadata, metadata)
This intrinsic provides information when a user source variable is set to a new value. The first argument is the new value (wrapped as metadata). The second argument is the offset in the user source variable where the new value is written. The third argument is a local variable containing a description of the variable. The fourth argument is a complex expression.
Object lifetimes and scoping¶
In many languages, the local variables in functions can have their lifetimes or scopes limited to a subset of a function. In the C family of languages, for example, variables are only live (readable and writable) within the source block that they are defined in. In functional languages, values are only readable after they have been defined. Though this is a very obvious concept, it is non-trivial to model in LLVM, because it has no notion of scoping in this sense, and does not want to be tied to a language’s scoping rules.
In order to handle this, the LLVM debug format uses the metadata attached to llvm instructions to encode line number and scoping information. Consider the following C fragment, for example:
1. void foo() {
2. int X = 21;
3. int Y = 22;
4. {
5. int Z = 23;
6. Z = X;
7. }
8. X = Y;
9. }
Compiled to LLVM, this function would be represented like this:
; Function Attrs: nounwind ssp uwtable
define void @foo() #0 {
entry:
%X = alloca i32, align 4
%Y = alloca i32, align 4
%Z = alloca i32, align 4
call void @llvm.dbg.declare(metadata i32* %X, metadata !11, metadata !13), !dbg !14
store i32 21, i32* %X, align 4, !dbg !14
call void @llvm.dbg.declare(metadata i32* %Y, metadata !15, metadata !13), !dbg !16
store i32 22, i32* %Y, align 4, !dbg !16
call void @llvm.dbg.declare(metadata i32* %Z, metadata !17, metadata !13), !dbg !19
store i32 23, i32* %Z, align 4, !dbg !19
%0 = load i32, i32* %X, align 4, !dbg !20
store i32 %0, i32* %Z, align 4, !dbg !21
%1 = load i32, i32* %Y, align 4, !dbg !22
store i32 %1, i32* %X, align 4, !dbg !23
ret void, !dbg !24
}
; Function Attrs: nounwind readnone
declare void @llvm.dbg.declare(metadata, metadata, metadata) #1
attributes #0 = { nounwind ssp uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { nounwind readnone }
!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!7, !8, !9}
!llvm.ident = !{!10}
!0 = !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 3.7.0 (trunk 231150) (llvm/trunk 231154)", isOptimized: false, runtimeVersion: 0, emissionKind: 1, enums: !2, retainedTypes: !2, subprograms: !3, globals: !2, imports: !2)
!1 = !DIFile(filename: "/dev/stdin", directory: "/Users/dexonsmith/data/llvm/debug-info")
!2 = !{}
!3 = !{!4}
!4 = !DISubprogram(name: "foo", scope: !1, file: !1, line: 1, type: !5, isLocal: false, isDefinition: true, scopeLine: 1, isOptimized: false, function: void ()* @foo, variables: !2)
!5 = !DISubroutineType(types: !6)
!6 = !{null}
!7 = !{i32 2, !"Dwarf Version", i32 2}
!8 = !{i32 2, !"Debug Info Version", i32 3}
!9 = !{i32 1, !"PIC Level", i32 2}
!10 = !{!"clang version 3.7.0 (trunk 231150) (llvm/trunk 231154)"}
!11 = !DILocalVariable(tag: DW_TAG_auto_variable, name: "X", scope: !4, file: !1, line: 2, type: !12)
!12 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
!13 = !DIExpression()
!14 = !DILocation(line: 2, column: 9, scope: !4)
!15 = !DILocalVariable(tag: DW_TAG_auto_variable, name: "Y", scope: !4, file: !1, line: 3, type: !12)
!16 = !DILocation(line: 3, column: 9, scope: !4)
!17 = !DILocalVariable(tag: DW_TAG_auto_variable, name: "Z", scope: !18, file: !1, line: 5, type: !12)
!18 = distinct !DILexicalBlock(scope: !4, file: !1, line: 4, column: 5)
!19 = !DILocation(line: 5, column: 11, scope: !18)
!20 = !DILocation(line: 6, column: 11, scope: !18)
!21 = !DILocation(line: 6, column: 9, scope: !18)
!22 = !DILocation(line: 8, column: 9, scope: !4)
!23 = !DILocation(line: 8, column: 7, scope: !4)
!24 = !DILocation(line: 9, column: 3, scope: !4)
This example illustrates a few important details about LLVM debugging
information. In particular, it shows how the llvm.dbg.declare
intrinsic and
location information, which are attached to an instruction, are applied
together to allow a debugger to analyze the relationship between statements,
variable definitions, and the code used to implement the function.
call void @llvm.dbg.declare(metadata i32* %X, metadata !11, metadata !13), !dbg !14
; [debug line = 2:9] [debug variable = X]
The first intrinsic @llvm.dbg.declare
encodes debugging information for the
variable X
. The metadata !dbg !14
attached to the intrinsic provides
scope information for the variable X
.
!14 = !DILocation(line: 2, column: 9, scope: !4)
!4 = !DISubprogram(name: "foo", scope: !1, file: !1, line: 1, type: !5,
isLocal: false, isDefinition: true, scopeLine: 1,
isOptimized: false, function: void ()* @foo,
variables: !2)
Here !14
is metadata providing location information. In this example, scope is encoded by !4
, a
subprogram descriptor. This way the location
information attached to the intrinsics indicates that the variable X
is
declared at line number 2 at a function level scope in function foo
.
Now let’s take another example.
call void @llvm.dbg.declare(metadata i32* %Z, metadata !17, metadata !13), !dbg !19
; [debug line = 5:11] [debug variable = Z]
The third intrinsic @llvm.dbg.declare
encodes debugging information for
variable Z
. The metadata !dbg !19
attached to the intrinsic provides
scope information for the variable Z
.
!18 = distinct !DILexicalBlock(scope: !4, file: !1, line: 4, column: 5)
!19 = !DILocation(line: 5, column: 11, scope: !18)
Here !19
indicates that Z
is declared at line number 5 and column
number 11 inside of lexical scope !18
. The lexical scope itself resides
inside of subprogram !4
described above.
The scope information attached with each instruction provides a straightforward way to find instructions covered by a scope.
C/C++ front-end specific debug information¶
The C and C++ front-ends represent information about the program in a format that is effectively identical to DWARF 3.0 in terms of information content. This allows code generators to trivially support native debuggers by generating standard dwarf information, and contains enough information for non-dwarf targets to translate it as needed.
This section describes the forms used to represent C and C++ programs. Other languages could pattern themselves after this (which itself is tuned to representing programs in the same way that DWARF 3 does), or they could choose to provide completely different forms if they don’t fit into the DWARF model. As support for debugging information gets added to the various LLVM source-language front-ends, the information used should be documented here.
The following sections provide examples of a few C/C++ constructs and the debug
information that would best describe those constructs. The canonical
references are the DIDescriptor
classes defined in
include/llvm/IR/DebugInfo.h
and the implementations of the helper functions
in lib/IR/DIBuilder.cpp
.
C/C++ source file information¶
llvm::Instruction
provides easy access to metadata attached with an
instruction. One can extract line number information encoded in LLVM IR using
Instruction::getMetadata()
and DILocation::getLineNumber()
.
if (MDNode *N = I->getMetadata("dbg")) { // Here I is an LLVM instruction
DILocation Loc(N); // DILocation is in DebugInfo.h
unsigned Line = Loc.getLineNumber();
StringRef File = Loc.getFilename();
StringRef Dir = Loc.getDirectory();
}
C/C++ global variable information¶
Given an integer global variable declared as follows:
int MyGlobal = 100;
a C/C++ front-end would generate the following descriptors:
;;
;; Define the global itself.
;;
@MyGlobal = global i32 100, align 4
;;
;; List of debug info of globals
;;
!llvm.dbg.cu = !{!0}
;; Some unrelated metadata.
!llvm.module.flags = !{!6, !7}
;; Define the compile unit.
!0 = !DICompileUnit(language: DW_LANG_C99, file: !1,
producer:
"clang version 3.7.0 (trunk 231150) (llvm/trunk 231154)",
isOptimized: false, runtimeVersion: 0, emissionKind: 1,
enums: !2, retainedTypes: !2, subprograms: !2, globals:
!3, imports: !2)
;;
;; Define the file
;;
!1 = !DIFile(filename: "/dev/stdin",
directory: "/Users/dexonsmith/data/llvm/debug-info")
;; An empty array.
!2 = !{}
;; The Array of Global Variables
!3 = !{!4}
;;
;; Define the global variable itself.
;;
!4 = !DIGlobalVariable(name: "MyGlobal", scope: !0, file: !1, line: 1,
type: !5, isLocal: false, isDefinition: true,
variable: i32* @MyGlobal)
;;
;; Define the type
;;
!5 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
;; Dwarf version to output.
!6 = !{i32 2, !"Dwarf Version", i32 2}
;; Debug info schema version.
!7 = !{i32 2, !"Debug Info Version", i32 3}
C/C++ function information¶
Given a function declared as follows:
int main(int argc, char *argv[]) {
return 0;
}
a C/C++ front-end would generate the following descriptors:
;;
;; Define the anchor for subprograms.
;;
!4 = !DISubprogram(name: "main", scope: !1, file: !1, line: 1, type: !5,
isLocal: false, isDefinition: true, scopeLine: 1,
flags: DIFlagPrototyped, isOptimized: false,
function: i32 (i32, i8**)* @main, variables: !2)
;;
;; Define the subprogram itself.
;;
define i32 @main(i32 %argc, i8** %argv) {
...
}
Debugging information format¶
Debugging Information Extension for Objective C Properties¶
Introduction¶
Objective C provides a simpler way to declare and define accessor methods using declared properties. The language provides features to declare a property and to let the compiler synthesize accessor methods.
The debugger lets developers inspect Objective C interfaces and their instance variables and class variables. However, the debugger does not know anything about the properties defined in Objective C interfaces. The debugger consumes information generated by the compiler in DWARF format. The format does not support encoding of Objective C properties. This proposal describes DWARF extensions to encode Objective C properties, which the debugger can use to let developers inspect Objective C properties.
Proposal¶
Objective C properties exist separately from class members. A property can be
defined only by “setter” and “getter” selectors, and be calculated anew on each
access. Or a property can just be a direct access to some declared ivar.
Finally it can have an ivar “automatically synthesized” for it by the compiler,
in which case the property can be referred to in user code directly using the
standard C dereference syntax as well as through the property “dot” syntax, but
there is no entry in the @interface
declaration corresponding to this ivar.
To facilitate debugging of these properties, we will add a new DWARF TAG into the
DW_TAG_structure_type
definition for the class to hold the description of a
given property, and a set of DWARF attributes that provide said description.
The property tag will also contain the name and declared type of the property.
If there is a related ivar, there will also be a DWARF property attribute placed
in the DW_TAG_member
DIE for that ivar referring back to the property TAG
for that property. And in the case where the compiler synthesizes the ivar
directly, the compiler is expected to generate a DW_TAG_member
for that
ivar (with the DW_AT_artificial
set to 1), whose name will be the name used
to access this ivar directly in code, and with the property attribute pointing
back to the property it is backing.
The following examples will serve as illustration for our discussion:
@interface I1 {
int n2;
}
@property int p1;
@property int p2;
@end
@implementation I1
@synthesize p1;
@synthesize p2 = n2;
@end
This produces the following DWARF (this is a “pseudo dwarfdump” output):
0x00000100: TAG_structure_type [7] *
AT_APPLE_runtime_class( 0x10 )
AT_name( "I1" )
AT_decl_file( "Objc_Property.m" )
AT_decl_line( 3 )
0x00000110 TAG_APPLE_property
AT_name ( "p1" )
AT_type ( {0x00000150} ( int ) )
0x00000120: TAG_APPLE_property
AT_name ( "p2" )
AT_type ( {0x00000150} ( int ) )
0x00000130: TAG_member [8]
AT_name( "_p1" )
AT_APPLE_property ( {0x00000110} "p1" )
AT_type( {0x00000150} ( int ) )
AT_artificial ( 0x1 )
0x00000140: TAG_member [8]
AT_name( "n2" )
AT_APPLE_property ( {0x00000120} "p2" )
AT_type( {0x00000150} ( int ) )
0x00000150: AT_type( ( int ) )
Note, the current convention is that the name of the ivar for an auto-synthesized property is the name of the property from which it derives with an underscore prepended, as is shown in the example. But we actually don’t need to know this convention, since we are given the name of the ivar directly.
Also, it is common practice in ObjC to have different property declarations in the @interface and @implementation - e.g. to provide a read-only property in the interface, and a read-write one in the implementation. In that case, the compiler should emit whichever property declaration will be in force in the current translation unit.
Developers can decorate a property with attributes which are encoded using
DW_AT_APPLE_property_attribute
.
@property (readonly, nonatomic) int pr;
TAG_APPLE_property [8]
AT_name( "pr" )
AT_type ( {0x00000147} (int) )
AT_APPLE_property_attribute (DW_APPLE_PROPERTY_readonly, DW_APPLE_PROPERTY_nonatomic)
The setter and getter method names are attached to the property using
DW_AT_APPLE_property_setter
and DW_AT_APPLE_property_getter
attributes.
@interface I1
@property (setter=myOwnP3Setter:) int p3;
-(void)myOwnP3Setter:(int)a;
@end
@implementation I1
@synthesize p3;
-(void)myOwnP3Setter:(int)a{ }
@end
The DWARF for this would be:
0x000003bd: TAG_structure_type [7] *
AT_APPLE_runtime_class( 0x10 )
AT_name( "I1" )
AT_decl_file( "Objc_Property.m" )
AT_decl_line( 3 )
0x000003cd TAG_APPLE_property
AT_name ( "p3" )
AT_APPLE_property_setter ( "myOwnP3Setter:" )
AT_type( {0x00000147} ( int ) )
0x000003f3: TAG_member [8]
AT_name( "_p3" )
AT_type ( {0x00000147} ( int ) )
AT_APPLE_property ( {0x000003cd} )
AT_artificial ( 0x1 )
New DWARF Tags¶
TAG | Value |
---|---|
DW_TAG_APPLE_property | 0x4200 |
New DWARF Attributes¶
Attribute | Value | Classes |
---|---|---|
DW_AT_APPLE_property | 0x3fed | Reference |
DW_AT_APPLE_property_getter | 0x3fe9 | String |
DW_AT_APPLE_property_setter | 0x3fea | String |
DW_AT_APPLE_property_attribute | 0x3feb | Constant |
New DWARF Constants¶
Name | Value |
---|---|
DW_APPLE_PROPERTY_readonly | 0x01 |
DW_APPLE_PROPERTY_getter | 0x02 |
DW_APPLE_PROPERTY_assign | 0x04 |
DW_APPLE_PROPERTY_readwrite | 0x08 |
DW_APPLE_PROPERTY_retain | 0x10 |
DW_APPLE_PROPERTY_copy | 0x20 |
DW_APPLE_PROPERTY_nonatomic | 0x40 |
DW_APPLE_PROPERTY_setter | 0x80 |
DW_APPLE_PROPERTY_atomic | 0x100 |
DW_APPLE_PROPERTY_weak | 0x200 |
DW_APPLE_PROPERTY_strong | 0x400 |
DW_APPLE_PROPERTY_unsafe_unretained | 0x800 |
Name Accelerator Tables¶
Introduction¶
The “.debug_pubnames
” and “.debug_pubtypes
” formats are not what a
debugger needs. The “pub
” in the section name indicates that the entries
in the table are publicly visible names only. This means no static or hidden
functions show up in the “.debug_pubnames
”. No static variables or private
class variables are in the “.debug_pubtypes
”. Many compilers add different
things to these tables, so we can’t rely upon the contents between gcc, icc, or
clang.
The typical query given by users tends not to match up with the contents of
these tables. For example, the DWARF spec states that “In the case of the name
of a function member or static data member of a C++ structure, class or union,
the name presented in the “.debug_pubnames
” section is not the simple name
given by the DW_AT_name attribute
of the referenced debugging information
entry, but rather the fully qualified name of the data or function member.”
So the only name in these tables for a complex C++ entry is its fully
qualified name. Debugger users tend not to enter their search strings as
“a::b::c(int,const Foo&) const
”, but rather as “c
”, “b::c
” , or
“a::b::c
”. So the name entered in the name table must be demangled in
order to chop it up appropriately and additional names must be manually entered
into the table to make it effective as a name lookup table for debuggers to
use.
All debuggers currently ignore the “.debug_pubnames
” table as a result of
its inconsistent and useless public-only name content making it a waste of
space in the object file. These tables, when they are written to disk, are not
sorted in any way, leaving every debugger to do its own parsing and sorting.
These tables also include an inlined copy of the string values in the table
itself making the tables much larger than they need to be on disk, especially
for large C++ programs.
Can’t we just fix the sections by adding all of the names we need to this table? No, because that is not what the tables are defined to contain and we won’t know the difference between the old bad tables and the new good tables. At best we could make our own renamed sections that contain all of the data we need.
These tables are also insufficient for what a debugger like LLDB needs. LLDB
uses clang for its expression parsing where LLDB acts as a PCH. LLDB is then
often asked to look for type “foo
” or namespace “bar
”, or list items in
namespace “baz
”. Namespaces are not included in the pubnames or pubtypes
tables. Since clang asks a lot of questions when it is parsing an expression,
we need to be very fast when looking up names, as it happens a lot. Having new
accelerator tables that are optimized for very quick lookups will benefit this
type of debugging experience greatly.
We would like to generate name lookup tables that can be mapped into memory from disk, and used as is, with little or no up-front parsing. We would also be able to control the exact content of these different tables so they contain exactly what we need. The Name Accelerator Tables were designed to fix these issues. In order to solve these issues we need to:
- Have a format that can be mapped into memory from disk and used as is
- Lookups should be very fast
- Extensible table format so these tables can be made by many producers
- Contain all of the names needed for typical lookups out of the box
- Strict rules for the contents of tables
Table size is important and the accelerator table format should allow the reuse of strings from common string tables so the strings for the names are not duplicated. We also want to make sure the table is ready to be used as-is by simply mapping the table into memory with minimal header parsing.
The name lookups need to be fast and optimized for the kinds of lookups that debuggers tend to do. Optimally we would like to touch as few parts of the mapped table as possible when doing a name lookup and be able to quickly find the name entry we are looking for, or discover there are no matches. In the case of debuggers we optimized for lookups that fail most of the time.
Each table that is defined should have strict rules on exactly what is in the accelerator tables and documented so clients can rely on the content.
Hash Tables¶
Standard Hash Tables¶
Typical hash tables have a header, buckets, and each bucket points to the bucket contents:
.------------.
| HEADER |
|------------|
| BUCKETS |
|------------|
| DATA |
`------------'
The BUCKETS are an array of offsets to DATA for each hash:
.------------.
| 0x00001000 | BUCKETS[0]
| 0x00002000 | BUCKETS[1]
| 0x00002200 | BUCKETS[2]
| 0x000034f0 | BUCKETS[3]
| | ...
| 0xXXXXXXXX | BUCKETS[n_buckets]
'------------'
So for bucket[3]
in the example above, we have an offset into the table
0x000034f0 which points to a chain of entries for the bucket. Each bucket must
contain a next pointer, full 32 bit hash value, the string itself, and the data
for the current string value.
.------------.
0x000034f0: | 0x00003500 | next pointer
| 0x12345678 | 32 bit hash
| "erase" | string value
| data[n] | HashData for this bucket
|------------|
0x00003500: | 0x00003550 | next pointer
| 0x29273623 | 32 bit hash
| "dump" | string value
| data[n] | HashData for this bucket
|------------|
0x00003550: | 0x00000000 | next pointer
| 0x82638293 | 32 bit hash
| "main" | string value
| data[n] | HashData for this bucket
`------------'
The problem with this layout for debuggers is that we need to optimize for the
negative lookup case where the symbol we’re searching for is not present. So
if we were to look up “printf
” in the table above, we would make a 32 bit hash
for “printf
”, it might match bucket[3]
. We would need to go to the
offset 0x000034f0 and start looking to see if our 32 bit hash matches. To do
so, we need to read the next pointer, then read the hash, compare it, and skip
to the next bucket. Each time we are skipping many bytes in memory and
touching new cache pages just to do the compare on the full 32 bit hash. All
of these accesses then tell us that we didn’t have a match.
Name Hash Tables¶
To solve the issues mentioned above we have structured the hash tables a bit differently: a header, buckets, an array of all unique 32 bit hash values, followed by an array of hash value data offsets, one for each hash value, then the data for all hash values:
.-------------.
| HEADER |
|-------------|
| BUCKETS |
|-------------|
| HASHES |
|-------------|
| OFFSETS |
|-------------|
| DATA |
`-------------'
The BUCKETS
in the name tables are an index into the HASHES
array. By
making all of the full 32 bit hash values contiguous in memory, we allow
ourselves to efficiently check for a match while touching as little memory as
possible. Most often checking the 32 bit hash values is as far as the lookup
goes. If it does match, it usually is a match with no collisions. So for a
table with “n_buckets
” buckets, and “n_hashes
” unique 32 bit hash
values, we can clarify the contents of the BUCKETS
, HASHES
and
OFFSETS
as:
.-------------------------.
| HEADER.magic | uint32_t
| HEADER.version | uint16_t
| HEADER.hash_function | uint16_t
| HEADER.bucket_count | uint32_t
| HEADER.hashes_count | uint32_t
| HEADER.header_data_len | uint32_t
| HEADER_DATA | HeaderData
|-------------------------|
| BUCKETS | uint32_t[n_buckets] // 32 bit hash indexes
|-------------------------|
| HASHES | uint32_t[n_hashes] // 32 bit hash values
|-------------------------|
| OFFSETS | uint32_t[n_hashes] // 32 bit offsets to hash value data
|-------------------------|
| ALL HASH DATA |
`-------------------------'
So taking the exact same data from the standard hash example above we end up with:
.------------.
| HEADER |
|------------|
| 0 | BUCKETS[0]
| 2 | BUCKETS[1]
| 5 | BUCKETS[2]
| 6 | BUCKETS[3]
| | ...
| ... | BUCKETS[n_buckets]
|------------|
| 0x........ | HASHES[0]
| 0x........ | HASHES[1]
| 0x........ | HASHES[2]
| 0x........ | HASHES[3]
| 0x........ | HASHES[4]
| 0x........ | HASHES[5]
| 0x12345678 | HASHES[6] hash for BUCKETS[3]
| 0x29273623 | HASHES[7] hash for BUCKETS[3]
| 0x82638293 | HASHES[8] hash for BUCKETS[3]
| 0x........ | HASHES[9]
| 0x........ | HASHES[10]
| 0x........ | HASHES[11]
| 0x........ | HASHES[12]
| 0x........ | HASHES[13]
| 0x........ | HASHES[n_hashes]
|------------|
| 0x........ | OFFSETS[0]
| 0x........ | OFFSETS[1]
| 0x........ | OFFSETS[2]
| 0x........ | OFFSETS[3]
| 0x........ | OFFSETS[4]
| 0x........ | OFFSETS[5]
| 0x000034f0 | OFFSETS[6] offset for BUCKETS[3]
| 0x00003500 | OFFSETS[7] offset for BUCKETS[3]
| 0x00003550 | OFFSETS[8] offset for BUCKETS[3]
| 0x........ | OFFSETS[9]
| 0x........ | OFFSETS[10]
| 0x........ | OFFSETS[11]
| 0x........ | OFFSETS[12]
| 0x........ | OFFSETS[13]
| 0x........ | OFFSETS[n_hashes]
|------------|
| |
| |
| |
| |
| |
|------------|
0x000034f0: | 0x00001203 | String offset into .debug_str ("erase")
| 0x00000004 | A 32 bit array count - number of HashData with name "erase"
| 0x........ | HashData[0]
| 0x........ | HashData[1]
| 0x........ | HashData[2]
| 0x........ | HashData[3]
| 0x00000000 | String offset into .debug_str (terminate data for hash)
|------------|
0x00003500: | 0x00001203 | String offset into .debug_str ("collision")
| 0x00000002 | A 32 bit array count - number of HashData with name "collision"
| 0x........ | HashData[0]
| 0x........ | HashData[1]
| 0x00001203 | String offset into .debug_str ("dump")
| 0x00000003 | A 32 bit array count - number of HashData with name "dump"
| 0x........ | HashData[0]
| 0x........ | HashData[1]
| 0x........ | HashData[2]
| 0x00000000 | String offset into .debug_str (terminate data for hash)
|------------|
0x00003550: | 0x00001203 | String offset into .debug_str ("main")
| 0x00000009 | A 32 bit array count - number of HashData with name "main"
| 0x........ | HashData[0]
| 0x........ | HashData[1]
| 0x........ | HashData[2]
| 0x........ | HashData[3]
| 0x........ | HashData[4]
| 0x........ | HashData[5]
| 0x........ | HashData[6]
| 0x........ | HashData[7]
| 0x........ | HashData[8]
| 0x00000000 | String offset into .debug_str (terminate data for hash)
`------------'
So we still have all of the same data, we just organize it more efficiently for
debugger lookup. If we repeat the same “printf
” lookup from above, we
would hash “printf
” and find it matches BUCKETS[3]
by taking the 32 bit
hash value modulo n_buckets
. BUCKETS[3]
contains “6” which
is the index into the HASHES
table. We would then compare any consecutive
32 bit hash values in the HASHES
array as long as the hashes would be in
BUCKETS[3]
. We do this by verifying that each subsequent hash value modulo
n_buckets
is still 3. In the case of a failed lookup we would access the
memory for BUCKETS[3]
, and then compare a few consecutive 32 bit hashes
before we know that we have no match. We don’t end up marching through
multiple words of memory and we really keep the number of processor data cache
lines being accessed as small as possible.
The string hash that is used for these lookup tables is the Daniel J.
Bernstein hash which is also used in the ELF GNU_HASH
sections. It is a
very good hash for all kinds of names in programs with very few hash
collisions.
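For reference, the Bernstein hash is only a few lines; a minimal sketch:
#include <stdint.h>

/* Daniel J. Bernstein hash (h = h * 33 + c, seeded with 5381), as used by
   these tables and the ELF GNU_HASH sections. */
static uint32_t djb_hash(const char *S) {
  uint32_t H = 5381;
  for (; *S; ++S)
    H = H * 33 + (unsigned char)*S;
  return H;
}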
Empty buckets are designated by using an invalid hash index of UINT32_MAX
.
Details¶
These name hash tables are designed to be generic where specializations of the
table get to define additional data that goes into the header (“HeaderData
”),
how the string value is stored (“KeyType
”) and the content of the data for each
hash value.
Header Layout¶
The header has a fixed part, and the specialized part. The exact format of the header is:
struct Header
{
uint32_t magic; // 'HASH' magic value to allow endian detection
uint16_t version; // Version number
uint16_t hash_function; // The hash function enumeration that was used
uint32_t bucket_count; // The number of buckets in this hash table
uint32_t hashes_count; // The total number of unique hash values and hash data offsets in this table
uint32_t header_data_len; // The bytes to skip to get to the hash indexes (buckets) for correct alignment
// Specifically the length of the following HeaderData field - this does not
// include the size of the preceding fields
HeaderData header_data; // Implementation specific header data
};
The header starts with a 32 bit “magic” value which must be 'HASH' encoded as an ASCII integer. This allows the detection of the start of the hash table and also allows the table’s byte order to be determined so the table can be correctly extracted. The “magic” value is followed by a 16 bit version number which allows the table to be revised and modified in the future. The current version number is 1. hash_function is a uint16_t enumeration that specifies which hash function was used to produce this table. The current values for the hash function enumerations include:
enum HashFunctionType
{
eHashFunctionDJB = 0u, // Daniel J Bernstein hash function
};
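Returning to the “magic” value for a moment, byte-order detection can be sketched as follows (an illustration only, assuming 'HASH' is stored as the 32 bit value 0x48415348 in the producer's byte order; __builtin_bswap32 is a GCC/Clang builtin):
#include <cstdint>

// Sketch: if the magic reads back byte-swapped, the table was produced in
// the opposite byte order and every field must be swapped on read. A value
// matching neither form means this is not a valid table start.
static bool NeedsByteSwap(uint32_t Magic) {
  const uint32_t HASH = 0x48415348; // 'H' 'A' 'S' 'H'
  return Magic != HASH && Magic == __builtin_bswap32(HASH);
}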
bucket_count is a 32 bit unsigned integer that represents how many buckets are in the BUCKETS array. hashes_count is the number of unique 32 bit hash values that are in the HASHES array, and is also the number of offsets contained in the OFFSETS array. header_data_len specifies the size in bytes of the HeaderData that is filled in by specialized versions of this table.
Fixed Lookup¶
The header is followed by the buckets, hashes, offsets, and hash value data.
struct FixedTable
{
uint32_t buckets[Header.bucket_count]; // An array of hash indexes into the "hashes[]" array below
uint32_t hashes [Header.hashes_count]; // Every unique 32 bit hash for the entire table is in this table
uint32_t offsets[Header.hashes_count]; // An offset that corresponds to each item in the "hashes[]" array above
};
buckets is an array of 32 bit indexes into the hashes array. The hashes array contains all of the 32 bit hash values for all names in the hash table. Each hash in the hashes table has an offset in the offsets array that points to the data for the hash value.
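To make the mechanism concrete, here is a minimal lookup sketch over the fixed part (hypothetical FixedTableView type and field names; assumes the table has already been loaded in the reader's byte order):
#include <cstdint>

// Hypothetical in-memory view of the fixed part of a loaded table.
struct FixedTableView {
  uint32_t BucketCount;
  uint32_t HashesCount;
  const uint32_t *Buckets; // [BucketCount] indexes into Hashes
  const uint32_t *Hashes;  // [HashesCount] unique 32 bit hash values
  const uint32_t *Offsets; // [HashesCount] offsets to the hash data
};

// Returns the hash data offset for Hash, or UINT32_MAX if absent.
static uint32_t Lookup(const FixedTableView &T, uint32_t Hash) {
  uint32_t Bucket = Hash % T.BucketCount;
  uint32_t Idx = T.Buckets[Bucket];
  if (Idx == UINT32_MAX) // empty bucket
    return UINT32_MAX;
  // Scan consecutive hashes for as long as they still land in this bucket.
  for (uint32_t I = Idx;
       I < T.HashesCount && T.Hashes[I] % T.BucketCount == Bucket; ++I)
    if (T.Hashes[I] == Hash)
      return T.Offsets[I];
  return UINT32_MAX; // no match
}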
This table setup makes it very easy to repurpose these tables to contain different data, while keeping the lookup mechanism the same for all tables. This layout also makes it possible to save the table to disk and map it in later and do very efficient name lookups with little or no parsing.
DWARF lookup tables can be implemented in a variety of ways and can store a lot of information for each name. We want to make the DWARF tables extensible and able to store the data efficiently so we have used some of the DWARF features that enable efficient data storage to define exactly what kind of data we store for each name.
The HeaderData contains a definition of the contents of each HashData chunk. We might want to store an offset to all of the debug information entries (DIEs) for each name. To keep things extensible, we create a list of items, or Atoms, that are contained in the data for each name. First comes the type of the data in each atom:
enum AtomType
{
eAtomTypeNULL = 0u,
eAtomTypeDIEOffset = 1u, // DIE offset, check form for encoding
eAtomTypeCUOffset = 2u, // DIE offset of the compile unit header that contains the item in question
eAtomTypeTag = 3u, // DW_TAG_xxx value, should be encoded as DW_FORM_data1 (if no tags exceed 255) or DW_FORM_data2
eAtomTypeNameFlags = 4u, // Flags from enum NameFlags
eAtomTypeTypeFlags = 5u, // Flags from enum TypeFlags
};
The enumeration values and their meanings are:
- eAtomTypeNULL - a termination atom that specifies the end of the atom list
- eAtomTypeDIEOffset - an offset into the .debug_info section for the DWARF DIE for this name
- eAtomTypeCUOffset - an offset into the .debug_info section for the CU that contains the DIE
- eAtomTypeTag - the DW_TAG_XXX enumeration value so you don't have to parse the DWARF to see what it is
- eAtomTypeNameFlags - flags for functions and global variables (isFunction, isInlined, isExternal...)
- eAtomTypeTypeFlags - flags for types (isCXXClass, isObjCClass, ...)
Then we allow each atom to define its type and how its data is encoded:
struct Atom
{
uint16_t type; // AtomType enum value
uint16_t form; // DWARF DW_FORM_XXX defines
};
The form type above is from the DWARF specification and defines the exact encoding of the data for the Atom type. See the DWARF specification for the DW_FORM_ definitions.
struct HeaderData
{
uint32_t die_offset_base;
uint32_t atom_count;
Atom atoms[atom_count];
};
HeaderData defines the base DIE offset that should be added to any atoms that are encoded using the DW_FORM_ref1, DW_FORM_ref2, DW_FORM_ref4, DW_FORM_ref8 or DW_FORM_ref_udata forms. It also defines what is contained in each HashData object: Atom.form tells us how large each field will be in the HashData, and Atom.type tells us how this data should be interpreted.
For the current implementations of the “.apple_names” (all functions + globals), “.apple_types” (names of all types that are defined), and “.apple_namespaces” (all namespaces) sections, we currently set the Atom array to be:
HeaderData.atom_count = 1;
HeaderData.atoms[0].type = eAtomTypeDIEOffset;
HeaderData.atoms[0].form = DW_FORM_data4;
This defines the contents to be the DIE offset (eAtomTypeDIEOffset) that is encoded as a 32 bit value (DW_FORM_data4). This allows a single name to have multiple matching DIEs in a single file, which can happen with an inlined function, for instance. Future tables could include more information about the DIE, such as flags indicating whether the DIE is a function, method, block, or inlined.
The KeyType for the DWARF table is a 32 bit string table offset into the “.debug_str” table. The “.debug_str” table is the string table for the DWARF, which may already contain copies of all of the strings. This helps make sure, with help from the compiler, that we reuse the strings between all of the DWARF sections and keeps the hash table size down. Another benefit to having the compiler generate all strings as DW_FORM_strp in the debug info is that DWARF parsing can be made much faster.
After a lookup is made, we get an offset into the hash data. The hash data needs to be able to deal with 32 bit hash collisions, so the chunk of data at the offset in the hash data consists of a triple:
uint32_t str_offset
uint32_t hash_data_count
HashData[hash_data_count]
If “str_offset” is zero, then the bucket contents are done. 99.9% of the hash data chunks contain a single item (no 32 bit hash collision):
.------------.
| 0x00001023 | uint32_t KeyType (.debug_str[0x0001023] => "main")
| 0x00000004 | uint32_t HashData count
| 0x........ | uint32_t HashData[0] DIE offset
| 0x........ | uint32_t HashData[1] DIE offset
| 0x........ | uint32_t HashData[2] DIE offset
| 0x........ | uint32_t HashData[3] DIE offset
| 0x00000000 | uint32_t KeyType (end of hash chain)
`------------'
If there are collisions, you will have multiple valid string offsets:
.------------.
| 0x00001023 | uint32_t KeyType (.debug_str[0x0001023] => "main")
| 0x00000004 | uint32_t HashData count
| 0x........ | uint32_t HashData[0] DIE offset
| 0x........ | uint32_t HashData[1] DIE offset
| 0x........ | uint32_t HashData[2] DIE offset
| 0x........ | uint32_t HashData[3] DIE offset
| 0x00002023 | uint32_t KeyType (.debug_str[0x0002023] => "print")
| 0x00000002 | uint32_t HashData count
| 0x........ | uint32_t HashData[0] DIE offset
| 0x........ | uint32_t HashData[1] DIE offset
| 0x00000000 | uint32_t KeyType (end of hash chain)
`------------'
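A reader might walk such a chunk as follows (a minimal sketch with a hypothetical MatchChunk helper; it assumes the HashData values are DW_FORM_data4 DIE offsets, as configured above, and that the pointer is positioned at the chunk's first triple):
#include <cstdint>
#include <cstring>
#include <vector>

// Walk the (str_offset, count, HashData[count]) triples for one OFFSETS
// entry, collecting DIE offsets whose key string matches Name. A zero
// str_offset terminates the chain.
static std::vector<uint32_t> MatchChunk(const uint32_t *P,
                                        const char *DebugStr,
                                        const char *Name) {
  std::vector<uint32_t> Dies;
  while (uint32_t StrOffset = *P++) {
    uint32_t Count = *P++; // number of HashData values for this string
    bool Match = std::strcmp(DebugStr + StrOffset, Name) == 0;
    for (uint32_t I = 0; I < Count; ++I, ++P)
      if (Match)
        Dies.push_back(*P);
  }
  return Dies;
}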
Current testing with real world C++ binaries has shown around one 32 bit hash collision per 100,000 name entries.
Contents¶
As we said, we want to strictly define exactly what is included in the different tables. For DWARF, we have 3 tables: “.apple_names”, “.apple_types”, and “.apple_namespaces”.
“.apple_names” sections should contain an entry for each DWARF DIE whose DW_TAG is a DW_TAG_label, DW_TAG_inlined_subroutine, or DW_TAG_subprogram that has address attributes: DW_AT_low_pc, DW_AT_high_pc, DW_AT_ranges or DW_AT_entry_pc. They also contain DW_TAG_variable DIEs that have a DW_OP_addr in the location (global and static variables). All global and static variables should be included, including those scoped within functions and classes. For example, using the following code:
static int var = 0;
void f ()
{
static int var = 0;
}
Both of the static var variables would be included in the table. All functions should emit both their full names and their basenames. For C or C++, the full name is the mangled name (if available), which is usually in the DW_AT_MIPS_linkage_name attribute, and the DW_AT_name contains the function basename. If global or static variables have a mangled name in a DW_AT_MIPS_linkage_name attribute, this should be emitted along with the simple name found in the DW_AT_name attribute.
“.apple_types” sections should contain an entry for each DWARF DIE whose tag is one of:
- DW_TAG_array_type
- DW_TAG_class_type
- DW_TAG_enumeration_type
- DW_TAG_pointer_type
- DW_TAG_reference_type
- DW_TAG_string_type
- DW_TAG_structure_type
- DW_TAG_subroutine_type
- DW_TAG_typedef
- DW_TAG_union_type
- DW_TAG_ptr_to_member_type
- DW_TAG_set_type
- DW_TAG_subrange_type
- DW_TAG_base_type
- DW_TAG_const_type
- DW_TAG_file_type
- DW_TAG_namelist
- DW_TAG_packed_type
- DW_TAG_volatile_type
- DW_TAG_restrict_type
- DW_TAG_interface_type
- DW_TAG_unspecified_type
- DW_TAG_shared_type
Only entries with a DW_AT_name attribute are included, and the entry must not be a forward declaration (a DW_AT_declaration attribute with a non-zero value). For example, using the following code:
int main ()
{
int *b = 0;
return *b;
}
We get a few type DIEs:
0x00000067: TAG_base_type [5]
AT_encoding( DW_ATE_signed )
AT_name( "int" )
AT_byte_size( 0x04 )
0x0000006e: TAG_pointer_type [6]
AT_type( {0x00000067} ( int ) )
AT_byte_size( 0x08 )
The DW_TAG_pointer_type is not included because it does not have a DW_AT_name.
“.apple_namespaces” sections should contain all DW_TAG_namespace DIEs. If we run into a namespace that has no name, this is an anonymous namespace, and the name should be output as “(anonymous namespace)” (without the quotes). Why? This matches the output of the abi::__cxa_demangle() function in the standard C++ library that demangles mangled names.
Language Extensions and File Format Changes¶
Objective-C Extensions¶
“.apple_objc” sections should contain all DW_TAG_subprogram DIEs for an Objective-C class. The name used in the hash table is the name of the Objective-C class itself. If the Objective-C class has a category, then an entry is made for both the class name without the category, and for the class name with the category. So if we have a DIE at offset 0x1234 with the method name “-[NSString(my_additions) stringWithSpecialString:]”, we would add an entry for “NSString” that points to DIE 0x1234, and an entry for “NSString(my_additions)” that points to 0x1234. This allows us to quickly track down all Objective-C methods for an Objective-C class when doing expressions. It is needed because of the dynamic nature of Objective-C, where anyone can add methods to a class. The DWARF for Objective-C methods is also emitted differently from C++ classes: the methods are not usually contained in the class definition, they are scattered about across one or more compile units. Categories can also be defined in different shared libraries. So we need to be able to quickly find all of the methods and class functions given the Objective-C class name, or quickly find all methods and class functions for a class + category name. This table does not contain any selector names, it just maps Objective-C class names (or class names + category) to all of the methods and class functions. The selectors are added as function basenames in the “.debug_names” section.
In the “.apple_names” section for Objective-C functions, the full name is the entire function name with the brackets (“-[NSString stringWithCString:]”) and the basename is the selector only (“stringWithCString:”).
Mach-O Changes¶
The section names above for the Apple hash tables apply to non-Mach-O files. For Mach-O files, the sections should be contained in the __DWARF segment with names as follows:
- “.apple_names” -> “__apple_names”
- “.apple_types” -> “__apple_types”
- “.apple_namespaces” -> “__apple_namespac” (16 character limit)
- “.apple_objc” -> “__apple_objc”
Auto-Vectorization in LLVM¶
LLVM has two vectorizers: The Loop Vectorizer, which operates on Loops, and the SLP Vectorizer. These vectorizers focus on different optimization opportunities and use different techniques. The SLP vectorizer merges multiple scalars that are found in the code into vectors while the Loop Vectorizer widens instructions in loops to operate on multiple consecutive iterations.
Both the Loop Vectorizer and the SLP Vectorizer are enabled by default.
The Loop Vectorizer¶
Usage¶
The Loop Vectorizer is enabled by default, but it can be disabled through clang using the command line flag:
$ clang ... -fno-vectorize file.c
Command line flags¶
The loop vectorizer uses a cost model to decide on the optimal vectorization factor and unroll factor. However, users of the vectorizer can force the vectorizer to use specific values. Both ‘clang’ and ‘opt’ support the flags below.
Users can control the vectorization SIMD width using the command line flag “-force-vector-width”.
$ clang -mllvm -force-vector-width=8 ...
$ opt -loop-vectorize -force-vector-width=8 ...
Users can control the unroll factor using the command line flag “-force-vector-unroll”:
$ clang -mllvm -force-vector-unroll=2 ...
$ opt -loop-vectorize -force-vector-unroll=2 ...
Pragma loop hint directives¶
The #pragma clang loop directive allows loop vectorization hints to be specified for the subsequent for, while, do-while, or C++11 range-based for loop. The directive allows vectorization and interleaving to be enabled or disabled. Vector width as well as interleave count can also be manually specified. The following example explicitly enables vectorization and interleaving:
#pragma clang loop vectorize(enable) interleave(enable)
while(...) {
...
}
The following example implicitly enables vectorization and interleaving by specifying a vector width and interleaving count:
#pragma clang loop vectorize_width(2) interleave_count(2)
for(...) {
...
}
See the Clang language extensions for details.
Diagnostics¶
Many loops cannot be vectorized, including loops with complicated control flow, unvectorizable types, and unvectorizable calls. The loop vectorizer generates optimization remarks which can be queried using command line options to identify and diagnose loops that are skipped by the loop vectorizer.
Optimization remarks are enabled using:
- -Rpass=loop-vectorize identifies loops that were successfully vectorized.
- -Rpass-missed=loop-vectorize identifies loops that failed vectorization and indicates if vectorization was specified.
- -Rpass-analysis=loop-vectorize identifies the statements that caused vectorization to fail.
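For example, all three kinds of remarks can be requested at once (hypothetical file name):
$ clang -O3 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize -c file.cpp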
Consider the following loop:
#pragma clang loop vectorize(enable)
for (int i = 0; i < Length; i++) {
switch(A[i]) {
case 0: A[i] = i*2; break;
case 1: A[i] = i; break;
default: A[i] = 0;
}
}
The command line -Rpass-missed=loop-vectorize prints the remark:
no_switch.cpp:4:5: remark: loop not vectorized: vectorization is explicitly enabled [-Rpass-missed=loop-vectorize]
And the command line -Rpass-analysis=loop-vectorize indicates that the switch statement cannot be vectorized:
no_switch.cpp:4:5: remark: loop not vectorized: loop contains a switch statement [-Rpass-analysis=loop-vectorize]
switch(A[i]) {
^
To ensure line and column numbers are produced, include the command line options -gline-tables-only and -gcolumn-info. See the Clang user manual for details.
Features¶
The LLVM Loop Vectorizer has a number of features that allow it to vectorize complex loops.
Loops with unknown trip count¶
The Loop Vectorizer supports loops with an unknown trip count. In the loop below, the iteration start and finish points are unknown, and the Loop Vectorizer has a mechanism to vectorize loops that do not start at zero. In this example, ‘n’ may not be a multiple of the vector width, and the vectorizer has to execute the last few iterations as scalar code. Keeping a scalar copy of the loop increases the code size.
void bar(float *A, float* B, float K, int start, int end) {
for (int i = start; i < end; ++i)
A[i] *= B[i] + K;
}
Runtime Checks of Pointers¶
In the example below, if the pointers A and B point to consecutive addresses, then it is illegal to vectorize the code because some elements of A will be written before they are read from array B.
Some programmers use the ‘restrict’ keyword to notify the compiler that the pointers are disjoint, but in our example, the Loop Vectorizer has no way of knowing that the pointers A and B are unique. The Loop Vectorizer handles this loop by placing code that checks, at runtime, if the arrays A and B point to disjoint memory locations. If arrays A and B overlap, then the scalar version of the loop is executed.
void bar(float *A, float* B, float K, int n) {
for (int i = 0; i < n; ++i)
A[i] *= B[i] + K;
}
Reductions¶
In this example the sum variable is used by consecutive iterations of the loop. Normally, this would prevent vectorization, but the vectorizer can detect that ‘sum’ is a reduction variable. The variable ‘sum’ becomes a vector of integers, and at the end of the loop the elements of the vector are added together to create the correct result. We support a number of different reduction operations, such as addition, multiplication, XOR, AND and OR.
int foo(int *A, int *B, int n) {
unsigned sum = 0;
for (int i = 0; i < n; ++i)
sum += A[i] + 5;
return sum;
}
We support floating point reduction operations when -ffast-math is used.
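For illustration (an added example, not from the original text), a floating point reduction of this form is only vectorized when reassociation is permitted, e.g. by compiling with clang -O2 -ffast-math:
float foo(float *A, int n) {
  float sum = 0;
  for (int i = 0; i < n; ++i)
    sum += A[i];
  return sum;
}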
Inductions¶
In this example the value of the induction variable i is saved into an array. The Loop Vectorizer knows to vectorize induction variables.
void bar(float *A, float* B, float K, int n) {
for (int i = 0; i < n; ++i)
A[i] = i;
}
If Conversion¶
The Loop Vectorizer is able to “flatten” the IF statement in the code and generate a single stream of instructions. The Loop Vectorizer supports any control flow in the innermost loop. The innermost loop may contain complex nesting of IFs, ELSEs and even GOTOs.
int foo(int *A, int *B, int n) {
unsigned sum = 0;
for (int i = 0; i < n; ++i)
if (A[i] > B[i])
sum += A[i] + 5;
return sum;
}
Pointer Induction Variables¶
This example uses the “accumulate” function of the standard C++ library. This loop uses C++ iterators, which are pointers, and not integer indices. The Loop Vectorizer detects pointer induction variables and can vectorize this loop. This feature is important because many C++ programs use iterators.
int baz(int *A, int n) {
return std::accumulate(A, A + n, 0);
}
Reverse Iterators¶
The Loop Vectorizer can vectorize loops that count backwards.
void foo(int *A, int *B, int n) {
  for (int i = n; i > 0; --i)
    A[i] += 1;
}
Scatter / Gather¶
The Loop Vectorizer can vectorize code that becomes a sequence of scalar instructions that scatter/gather memory.
void foo(int *A, int *B, int n) {
  for (intptr_t i = 0; i < n; ++i)
    A[i] += B[i * 4];
}
In many situations the cost model will inform LLVM that this is not beneficial and LLVM will only vectorize such code if forced with “-mllvm -force-vector-width=#”.
Vectorization of Mixed Types¶
The Loop Vectorizer can vectorize programs with mixed types. The Vectorizer cost model can estimate the cost of the type conversion and decide if vectorization is profitable.
void foo(int *A, char *B, int n, int k) {
  for (int i = 0; i < n; ++i)
    A[i] += 4 * B[i];
}
Global Structures Alias Analysis¶
Access to global structures can also be vectorized, with alias analysis being used to make sure accesses don’t alias. Run-time checks can also be added on pointer access to structure members.
Many variations are supported, but some that rely on undefined behaviour being ignored (as other compilers do) are still left un-vectorized.
struct { int A[100], K, B[100]; } Foo;
void foo() {
  for (int i = 0; i < 100; ++i)
    Foo.A[i] = Foo.B[i] + 100;
}
Vectorization of function calls¶
The Loop Vectorizer can vectorize intrinsic math functions. See the table below for a list of these functions.
pow  | exp   | exp2
sin  | cos   | sqrt
log  | log2  | log10
fabs | floor | ceil
fma  | trunc | nearbyint
fmuladd
The loop vectorizer knows about special instructions on the target and will vectorize a loop containing a function call that maps to the instructions. For example, the loop below will be vectorized on Intel x86 if the SSE4.1 roundps instruction is available.
void foo(float *f) {
for (int i = 0; i != 1024; ++i)
f[i] = floorf(f[i]);
}
Partial unrolling during vectorization¶
Modern processors feature multiple execution units, and only programs that contain a high degree of parallelism can fully utilize the entire width of the machine. The Loop Vectorizer increases the instruction level parallelism (ILP) by performing partial-unrolling of loops.
In the example below the entire array is accumulated into the variable ‘sum’. This is inefficient because only a single execution port can be used by the processor. By unrolling the code the Loop Vectorizer allows two or more execution ports to be used simultaneously.
int foo(int *A, int *B, int n) {
unsigned sum = 0;
for (int i = 0; i < n; ++i)
sum += A[i];
return sum;
}
The Loop Vectorizer uses a cost model to decide when it is profitable to unroll loops. The decision to unroll the loop depends on the register pressure and the generated code size.
Performance¶
This section shows the execution time of Clang on a simple benchmark: gcc-loops. This benchmark is a collection of loops from the GCC autovectorization page by Dorit Nuzman.
The chart below compares GCC-4.7, ICC-13, and Clang-SVN with and without loop vectorization at -O3, tuned for “corei7-avx”, running on a Sandybridge iMac. The Y-axis shows the time in msec. Lower is better. The last column shows the geomean of all the kernels.

And Linpack-pc with the same configuration. Result is Mflops, higher is better.

The SLP Vectorizer¶
Details¶
The goal of SLP vectorization (a.k.a. superword-level parallelism) is to combine similar independent instructions into vector instructions. Memory accesses, arithmetic operations, comparison operations, and PHI-nodes can all be vectorized using this technique.
For example, the following function performs very similar operations on its inputs (a1, b1) and (a2, b2). The basic-block vectorizer may combine these into vector operations.
void foo(int a1, int a2, int b1, int b2, int *A) {
A[0] = a1*(a1 + b1)/b1 + 50*b1/a1;
A[1] = a2*(a2 + b2)/b2 + 50*b2/a2;
}
The SLP-vectorizer processes the code bottom-up, across basic blocks, in search of scalars to combine.
Usage¶
The SLP Vectorizer is enabled by default, but it can be disabled through clang using the command line flag:
$ clang -fno-slp-vectorize file.c
LLVM has a second basic block vectorization phase which is more compile-time intensive (the BB vectorizer). This optimization can be enabled through clang using the command line flag:
$ clang -fslp-vectorize-aggressive file.c
Writing an LLVM Backend¶
How To Use Instruction Mappings¶
Introduction¶
This document contains information about adding instruction mapping support for a target. The motivation behind this feature comes from the need to switch between different instruction formats during various optimizations. One approach could be to use switch cases which list all the instructions along with the formats they can transition to. However, this has a large maintenance overhead because of the hardcoded instruction names; whenever a new instruction is added in the .td files, all the relevant switch cases must be modified accordingly. Instead, the same functionality can be achieved with TableGen and some support from the .td files for a fraction of the maintenance cost.
InstrMapping
Class Overview¶
TableGen uses relationship models to map instructions with each other. These models are described using the InstrMapping class as a base. Each model sets various fields of the InstrMapping class such that they can uniquely describe all the instructions using that model. TableGen parses all the relation models and uses the information to construct relation tables which relate instructions with each other. These tables are emitted in the XXXInstrInfo.inc file along with the functions to query them. Following is the definition of the InstrMapping class defined in the Target.td file:
class InstrMapping {
// Used to reduce search space only to the instructions using this
// relation model.
string FilterClass;
// List of fields/attributes that should be same for all the instructions in
// a row of the relation table. Think of this as a set of properties shared
// by all the instructions related by this relationship.
list<string> RowFields = [];
// List of fields/attributes that are same for all the instructions
// in a column of the relation table.
list<string> ColFields = [];
// Values for the fields/attributes listed in 'ColFields' corresponding to
// the key instruction. This is the instruction that will be transformed
// using this relation model.
list<string> KeyCol = [];
// List of values for the fields/attributes listed in 'ColFields', one for
// each column in the relation table. These are the instructions a key
// instruction will be transformed into.
list<list<string> > ValueCols = [];
}
Sample Example¶
Let’s say that we want to have a function int getPredOpcode(uint16_t Opcode, enum PredSense inPredSense) which takes a non-predicated instruction and returns its predicated true or false form depending on some input flag, inPredSense. The first step in the process is to define a relationship model that relates predicated instructions to their non-predicated form by assigning appropriate values to the InstrMapping fields. For this relationship, non-predicated instructions are treated as key instructions since they are the ones used to query the interface function.
def getPredOpcode : InstrMapping {
// Choose a FilterClass that is used as a base class for all the
// instructions modeling this relationship. This is done to reduce the
// search space only to these set of instructions.
let FilterClass = "PredRel";
// Instructions with same values for all the fields in RowFields form a
// row in the resulting relation table.
// For example, if we want to relate 'ADD' (non-predicated) with 'Add_pt'
// (predicated true) and 'Add_pf' (predicated false), then all 3
// instructions need to have same value for BaseOpcode field. It can be any
// unique value (Ex: XYZ) and should not be shared with any other
// instruction not related to 'add'.
let RowFields = ["BaseOpcode"];
// List of attributes that can be used to define key and column instructions
// for a relation. Key instruction is passed as an argument
// to the function used for querying relation tables. Column instructions
// are the instructions they (key) can transform into.
//
// Here, we choose 'PredSense' as ColFields since this is the unique
// attribute of the key (non-predicated) and column (true/false)
// instructions involved in this relationship model.
let ColFields = ["PredSense"];
// The key column contains non-predicated instructions.
let KeyCol = ["none"];
// Two value columns - first column contains instructions with
// PredSense=true while second column has instructions with PredSense=false.
let ValueCols = [["true"], ["false"]];
}
TableGen uses the above relationship model to emit a relation table that maps non-predicated instructions to their predicated forms. It also outputs the interface function int getPredOpcode(uint16_t Opcode, enum PredSense inPredSense) to query the table. Here, the function getPredOpcode takes two arguments, the opcode of the current instruction and the PredSense of the desired instruction, and returns the predicated form of the instruction, if found in the relation table.
In order for an instruction to be added into the relation table, it needs to include relevant information in its definition. For example, consider the following to be the current definitions of the ADD, ADD_pt (true) and ADD_pf (false) instructions:
def ADD : ALU32_rr<(outs IntRegs:$dst), (ins IntRegs:$a, IntRegs:$b),
"$dst = add($a, $b)",
[(set (i32 IntRegs:$dst), (add (i32 IntRegs:$a),
(i32 IntRegs:$b)))]>;
def ADD_Pt : ALU32_rr<(outs IntRegs:$dst),
(ins PredRegs:$p, IntRegs:$a, IntRegs:$b),
"if ($p) $dst = add($a, $b)",
[]>;
def ADD_Pf : ALU32_rr<(outs IntRegs:$dst),
(ins PredRegs:$p, IntRegs:$a, IntRegs:$b),
"if (!$p) $dst = add($a, $b)",
[]>;
In this step, we modify these instructions to include the information required by the relationship model, getPredOpcode, so that they can be related.
def ADD : PredRel, ALU32_rr<(outs IntRegs:$dst), (ins IntRegs:$a, IntRegs:$b),
"$dst = add($a, $b)",
[(set (i32 IntRegs:$dst), (add (i32 IntRegs:$a),
(i32 IntRegs:$b)))]> {
let BaseOpcode = "ADD";
let PredSense = "none";
}
def ADD_Pt : PredRel, ALU32_rr<(outs IntRegs:$dst),
(ins PredRegs:$p, IntRegs:$a, IntRegs:$b),
"if ($p) $dst = add($a, $b)",
[]> {
let BaseOpcode = "ADD";
let PredSense = "true";
}
def ADD_Pf : PredRel, ALU32_rr<(outs IntRegs:$dst),
(ins PredRegs:$p, IntRegs:$a, IntRegs:$b),
"if (!$p) $dst = add($a, $b)",
[]> {
let BaseOpcode = "ADD";
let PredSense = "false";
}
Please note that all the above instructions use PredRel as a base class. This is extremely important since TableGen uses it as a filter for selecting instructions for the getPredOpcode model. Any instruction not derived from PredRel is excluded from the analysis. BaseOpcode is another important field. Since it is selected as a RowFields entry of the model, it is required to have the same value for all 3 instructions in order for them to be related. Next, PredSense is used to determine their column positions by comparing its value with KeyCol and ValueCols. If an instruction sets its PredSense value to something not used in the relation model, it will not be assigned a column in the relation table.
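As a sketch of how the emitted interface might then be used from a C++ optimization pass (hypothetical fragment; it assumes the usual LLVM MachineInstr/TargetInstrInfo headers, and the exact failure value returned by the generated function should be checked against your target's XXXInstrInfo.inc):
// Sketch (hypothetical): swap an instruction to its predicated-true form.
// getPredOpcode and the PredSense enum come from the TableGen-emitted
// XXXInstrInfo.inc described above.
static void predicateTrue(MachineInstr *MI, const TargetInstrInfo *TII) {
  int NewOpcode = getPredOpcode(MI->getOpcode(), PredSense_true);
  if (NewOpcode >= 0)                 // assumption: negative means "no mapping"
    MI->setDesc(TII->get(NewOpcode)); // rewrite the instruction descriptor
}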
Introduction¶
This document describes techniques for writing compiler backends that convert the LLVM Intermediate Representation (IR) to code for a specified machine or other languages. Code intended for a specific machine can take the form of either assembly code or binary code (usable for a JIT compiler).
The backend of LLVM features a target-independent code generator that may create output for several types of target CPUs — including X86, PowerPC, ARM, and SPARC. The backend may also be used to generate code targeted at SPUs of the Cell processor or GPUs to support the execution of compute kernels.
The document focuses on existing examples found in subdirectories of llvm/lib/Target in a downloaded LLVM release. In particular, this document focuses on the example of creating a static compiler (one that emits text assembly) for a SPARC target, because SPARC has fairly standard characteristics, such as a RISC instruction set and straightforward calling conventions.
Audience¶
The audience for this document is anyone who needs to write an LLVM backend to generate code for a specific hardware or software target.
Prerequisite Reading¶
These essential documents must be read before reading this document:
- LLVM Language Reference Manual — a reference manual for the LLVM assembly language.
- The LLVM Target-Independent Code Generator — a guide to the components (classes and code generation algorithms) for translating the LLVM internal representation into machine code for a specified target. Pay particular attention to the descriptions of code generation stages: Instruction Selection, Scheduling and Formation, SSA-based Optimization, Register Allocation, Prolog/Epilog Code Insertion, Late Machine Code Optimizations, and Code Emission.
- TableGen — a document that describes the TableGen (tblgen) application that manages domain-specific information to support LLVM code generation. TableGen processes input from a target description file (.td suffix) and generates C++ code that can be used for code generation.
- Writing an LLVM Pass — the assembly printer is a FunctionPass, as are several SelectionDAG processing steps.
To follow the SPARC examples in this document, have a copy of The SPARC Architecture Manual, Version 8 for reference. For details about the ARM instruction set, refer to the ARM Architecture Reference Manual. For more about the GNU Assembler format (GAS), see Using As, especially for the assembly printer. “Using As” contains a list of target machine dependent features.
Basic Steps¶
To write a compiler backend for LLVM that converts the LLVM IR to code for a specified target (machine or other language), follow these steps:
- Create a subclass of the TargetMachine class that describes characteristics of your target machine. Copy existing examples of specific TargetMachine class and header files; for example, start with SparcTargetMachine.cpp and SparcTargetMachine.h, but change the file names for your target. Similarly, change code that references “Sparc” to reference your target.
- Describe the register set of the target. Use TableGen to generate code for register definition, register aliases, and register classes from a target-specific RegisterInfo.td input file. You should also write additional code for a subclass of the TargetRegisterInfo class that represents the class register file data used for register allocation and also describes the interactions between registers.
- Describe the instruction set of the target. Use TableGen to generate code for target-specific instructions from target-specific versions of TargetInstrFormats.td and TargetInstrInfo.td. You should write additional code for a subclass of the TargetInstrInfo class to represent machine instructions supported by the target machine.
- Describe the selection and conversion of the LLVM IR from a Directed Acyclic Graph (DAG) representation of instructions to native target-specific instructions. Use TableGen to generate code that matches patterns and selects instructions based on additional information in a target-specific version of TargetInstrInfo.td. Write code for XXXISelDAGToDAG.cpp, where XXX identifies the specific target, to perform pattern matching and DAG-to-DAG instruction selection. Also write code in XXXISelLowering.cpp to replace or remove operations and data types that are not supported natively in a SelectionDAG.
- Write code for an assembly printer that converts LLVM IR to a GAS format for your target machine. You should add assembly strings to the instructions defined in your target-specific version of TargetInstrInfo.td. You should also write code for a subclass of AsmPrinter that performs the LLVM-to-assembly conversion and a trivial subclass of TargetAsmInfo.
- Optionally, add support for subtargets (i.e., variants with different capabilities). You should also write code for a subclass of the TargetSubtarget class, which allows you to use the -mcpu= and -mattr= command-line options.
- Optionally, add JIT support and create a machine code emitter (subclass of TargetJITInfo) that is used to emit binary code directly into memory.
In the .cpp and .h files, initially stub up these methods and then implement them later. Initially, you may not know which private members the class will need or which components will need to be subclassed.
Preliminaries¶
To actually create your compiler backend, you need to create and modify a few files. The absolute minimum is discussed here. But to actually use the LLVM target-independent code generator, you must perform the steps described in the LLVM Target-Independent Code Generator document.
First, you should create a subdirectory under lib/Target to hold all the files related to your target. If your target is called “Dummy”, create the directory lib/Target/Dummy.
In this new directory, create a Makefile. It is easiest to copy a Makefile of another target and modify it. It should at least contain the LEVEL, LIBRARYNAME and TARGET variables, and then include $(LEVEL)/Makefile.common. The library can be named LLVMDummy (for example, see the MIPS target). Alternatively, you can split the library into LLVMDummyCodeGen and LLVMDummyAsmPrinter, the latter of which should be implemented in a subdirectory below lib/Target/Dummy (for example, see the PowerPC target).
Note that these two naming schemes are hardcoded into llvm-config. Using any other naming scheme will confuse llvm-config and produce a lot of (seemingly unrelated) linker errors when linking llc.
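A minimal sketch of such a Makefile for the single-library LLVMDummy scheme (illustrative values; the LEVEL path assumes the file lives at lib/Target/Dummy):
LEVEL = ../../..
LIBRARYNAME = LLVMDummy
TARGET = Dummy
include $(LEVEL)/Makefile.common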
To make your target actually do something, you need to implement a subclass of TargetMachine. This implementation should typically be in the file lib/Target/DummyTargetMachine.cpp, but any file in the lib/Target directory will be built and should work. To use LLVM’s target independent code generator, you should do what all current machine backends do: create a subclass of LLVMTargetMachine. (To create a target from scratch, create a subclass of TargetMachine.)
To get LLVM to actually build and link your target, you need to add it to the TARGETS_TO_BUILD variable. To do this, you modify the configure script to know about your target when parsing the --enable-targets option. Search the configure script for TARGETS_TO_BUILD, add your target to the lists there (some creativity required), and then reconfigure. Alternatively, you can change autoconf/configure.ac and regenerate configure by running ./autoconf/AutoRegen.sh.
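After reconfiguring, the new target can then be selected like any other (hypothetical example, assuming the target was added to the lists as “dummy”):
$ ./configure --enable-targets=x86,sparc,dummy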
Target Machine¶
LLVMTargetMachine is designed as a base class for targets implemented with the LLVM target-independent code generator. The LLVMTargetMachine class should be specialized by a concrete target class that implements the various virtual methods. LLVMTargetMachine is defined as a subclass of TargetMachine in include/llvm/Target/TargetMachine.h. The TargetMachine class implementation (TargetMachine.cpp) also processes numerous command-line options.
To create a concrete target-specific subclass of LLVMTargetMachine, start by copying an existing TargetMachine class and header. You should name the files that you create to reflect your specific target. For instance, for the SPARC target, name the files SparcTargetMachine.h and SparcTargetMachine.cpp.
For a target machine XXX, the implementation of XXXTargetMachine must have access methods to obtain objects that represent target components. These methods are named get*Info, and are intended to obtain the instruction set (getInstrInfo), register set (getRegisterInfo), stack frame layout (getFrameInfo), and similar information. XXXTargetMachine must also implement the getDataLayout method to access an object with target-specific data characteristics, such as data type size and alignment requirements.
For instance, for the SPARC target, the header file SparcTargetMachine.h declares prototypes for several get*Info and getDataLayout methods that simply return a class member.
namespace llvm {
class Module;
class SparcTargetMachine : public LLVMTargetMachine {
const DataLayout DataLayout; // Calculates type size & alignment
SparcSubtarget Subtarget;
SparcInstrInfo InstrInfo;
TargetFrameInfo FrameInfo;
protected:
virtual const TargetAsmInfo *createTargetAsmInfo() const;
public:
SparcTargetMachine(const Module &M, const std::string &FS);
virtual const SparcInstrInfo *getInstrInfo() const { return &InstrInfo; }
virtual const TargetFrameInfo *getFrameInfo() const { return &FrameInfo; }
virtual const TargetSubtarget *getSubtargetImpl() const { return &Subtarget; }
virtual const TargetRegisterInfo *getRegisterInfo() const {
return &InstrInfo.getRegisterInfo();
}
virtual const DataLayout *getDataLayout() const { return &DataLayout; }
static unsigned getModuleMatchQuality(const Module &M);
// Pass Pipeline Configuration
virtual bool addInstSelector(PassManagerBase &PM, bool Fast);
virtual bool addPreEmitPass(PassManagerBase &PM, bool Fast);
};
} // end namespace llvm
- getInstrInfo()
- getRegisterInfo()
- getFrameInfo()
- getDataLayout()
- getSubtargetImpl()
For some targets, you also need to support the following methods:
- getTargetLowering()
- getJITInfo()
Some architectures, such as GPUs, do not support jumping to an arbitrary program location; they implement branching using masked execution, and looping using special instructions around the loop body. In order to avoid CFG modifications that introduce irreducible control flow not handled by such hardware, a target must call setRequiresStructuredCFG(true) when being initialized.
In addition, the XXXTargetMachine constructor should specify a TargetDescription string that determines the data layout for the target machine, including characteristics such as pointer size, alignment, and endianness. For example, the constructor for SparcTargetMachine contains the following:
SparcTargetMachine::SparcTargetMachine(const Module &M, const std::string &FS)
: DataLayout("E-p:32:32-f128:128:128"),
Subtarget(M, FS), InstrInfo(Subtarget),
FrameInfo(TargetFrameInfo::StackGrowsDown, 8, 0) {
}
Hyphens separate portions of the TargetDescription string.
- An upper-case “E” in the string indicates a big-endian target data model. A lower-case “e” indicates little-endian.
- “p:” is followed by pointer information: size, ABI alignment, and preferred alignment. If only two figures follow “p:”, then the first value is pointer size, and the second value is both ABI and preferred alignment.
- Then a letter for numeric type alignment: “i”, “f”, “v”, or “a” (corresponding to integer, floating point, vector, or aggregate). “i”, “v”, or “a” are followed by ABI alignment and preferred alignment. “f” is followed by three values: the first indicates the size of a long double, then ABI alignment, and then ABI preferred alignment.
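As a worked example of these rules, the SPARC string above, “E-p:32:32-f128:128:128”, reads as: big-endian (“E”); 32-bit pointers with both ABI and preferred alignment of 32 bits (“p:32:32”, only two figures); and a 128-bit long double with 128-bit ABI alignment and 128-bit preferred alignment (“f128:128:128”).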
Target Registration¶
You must also register your target with the TargetRegistry, which is what other LLVM tools use to look up and use your target at runtime. The TargetRegistry can be used directly, but for most targets there are helper templates which should take care of the work for you.
All targets should declare a global Target object which is used to represent the target during registration. Then, in the target’s TargetInfo library, the target should define that object and use the RegisterTarget template to register the target. For example, the Sparc registration code looks like this:
Target llvm::TheSparcTarget;
extern "C" void LLVMInitializeSparcTargetInfo() {
RegisterTarget<Triple::sparc, /*HasJIT=*/false>
X(TheSparcTarget, "sparc", "Sparc");
}
This allows the TargetRegistry to look up the target by name or by target triple. In addition, most targets will also register additional features which are available in separate libraries. These registration steps are separate, because some clients may wish to only link in some parts of the target — the JIT code generator does not require the use of the assembler printer, for example. Here is an example of registering the Sparc assembly printer:
extern "C" void LLVMInitializeSparcAsmPrinter() {
RegisterAsmPrinter<SparcAsmPrinter> X(TheSparcTarget);
}
For more information, see “llvm/Target/TargetRegistry.h”.
Register Set and Register Classes¶
You should describe a concrete target-specific class that represents the register file of a target machine. This class is called XXXRegisterInfo (where XXX identifies the target) and represents the class register file data that is used for register allocation. It also describes the interactions between registers.
You also need to define register classes to categorize related registers. A register class should be added for groups of registers that are all treated the same way for some instruction. Typical examples are register classes for integer, floating-point, or vector registers. A register allocator allows an instruction to use any register in a specified register class to perform the instruction in a similar manner. Register classes allocate virtual registers to instructions from these sets, and register classes let the target-independent register allocator automatically choose the actual registers.
Much of the code for registers, including register definition, register aliases, and register classes, is generated by TableGen from XXXRegisterInfo.td input files and placed in XXXGenRegisterInfo.h.inc and XXXGenRegisterInfo.inc output files. Some of the code in the implementation of XXXRegisterInfo requires hand-coding.
Defining a Register¶
The XXXRegisterInfo.td file typically starts with register definitions for a target machine. The Register class (specified in Target.td) is used to define an object for each register. The specified string n becomes the Name of the register. The basic Register object does not have any subregisters and does not specify any aliases.
class Register<string n> {
string Namespace = "";
string AsmName = n;
string Name = n;
int SpillSize = 0;
int SpillAlignment = 0;
list<Register> Aliases = [];
list<Register> SubRegs = [];
list<int> DwarfNumbers = [];
}
For example, in the X86RegisterInfo.td file, there are register definitions that utilize the Register class, such as:
def AL : Register<"AL">, DwarfRegNum<[0, 0, 0]>;
This defines the register AL and assigns it values (with DwarfRegNum) that are used by gcc, gdb, or a debug information writer to identify a register. For register AL, DwarfRegNum takes an array of 3 values representing 3 different modes: the first element is for X86-64, the second for exception handling (EH) on X86-32, and the third is generic. -1 is a special Dwarf number that indicates the gcc number is undefined, and -2 indicates the register number is invalid for this mode.
From the previously described line in the X86RegisterInfo.td file, TableGen generates this code in the X86GenRegisterInfo.inc file:
static const unsigned GR8[] = { X86::AL, ... };
const unsigned AL_AliasSet[] = { X86::AX, X86::EAX, X86::RAX, 0 };
const TargetRegisterDesc RegisterDescriptors[] = {
...
{ "AL", "AL", AL_AliasSet, Empty_SubRegsSet, Empty_SubRegsSet, AL_SuperRegsSet }, ...
From the register info file, TableGen generates a TargetRegisterDesc object for each register. TargetRegisterDesc is defined in include/llvm/Target/TargetRegisterInfo.h with the following fields:
struct TargetRegisterDesc {
const char *AsmName; // Assembly language name for the register
const char *Name; // Printable name for the reg (for debugging)
const unsigned *AliasSet; // Register Alias Set
const unsigned *SubRegs; // Sub-register set
const unsigned *ImmSubRegs; // Immediate sub-register set
const unsigned *SuperRegs; // Super-register set
};
TableGen uses the entire target description file (.td) to determine text names for the register (in the AsmName and Name fields of TargetRegisterDesc) and the relationships of other registers to the defined register (in the other TargetRegisterDesc fields). In this example, other definitions establish the registers “AX”, “EAX”, and “RAX” as aliases for one another, so TableGen generates a null-terminated array (AL_AliasSet) for this register alias set.
The Register class is commonly used as a base class for more complex classes. In Target.td, the Register class is the base for the RegisterWithSubRegs class that is used to define registers that need to specify subregisters in the SubRegs list, as shown here:
class RegisterWithSubRegs<string n, list<Register> subregs> : Register<n> {
let SubRegs = subregs;
}
In SparcRegisterInfo.td, additional register classes are defined for SPARC: a Register subclass, SparcReg, and further subclasses: Ri, Rf, and Rd. SPARC registers are identified by 5-bit ID numbers, which is a feature common to these subclasses. Note the use of “let” expressions to override values that are initially defined in a superclass (such as the SubRegs field in the Rd class).
class SparcReg<string n> : Register<n> {
field bits<5> Num;
let Namespace = "SP";
}
// Ri - 32-bit integer registers
class Ri<bits<5> num, string n> :
SparcReg<n> {
let Num = num;
}
// Rf - 32-bit floating-point registers
class Rf<bits<5> num, string n> :
SparcReg<n> {
let Num = num;
}
// Rd - Slots in the FP register file for 64-bit floating-point values.
class Rd<bits<5> num, string n, list<Register> subregs> : SparcReg<n> {
let Num = num;
let SubRegs = subregs;
}
In the SparcRegisterInfo.td file, there are register definitions that utilize these subclasses of Register, such as:
def G0 : Ri< 0, "G0">, DwarfRegNum<[0]>;
def G1 : Ri< 1, "G1">, DwarfRegNum<[1]>;
...
def F0 : Rf< 0, "F0">, DwarfRegNum<[32]>;
def F1 : Rf< 1, "F1">, DwarfRegNum<[33]>;
...
def D0 : Rd< 0, "F0", [F0, F1]>, DwarfRegNum<[32]>;
def D1 : Rd< 2, "F2", [F2, F3]>, DwarfRegNum<[34]>;
The last two registers shown above (D0 and D1) are double-precision floating-point registers that are aliases for pairs of single-precision floating-point sub-registers. In addition to aliases, the sub-register and super-register relationships of the defined register are in fields of a register’s TargetRegisterDesc.
Defining a Register Class¶
The RegisterClass class (specified in Target.td) is used to define an object that represents a group of related registers and also defines the default allocation order of the registers. A target description file XXXRegisterInfo.td that uses Target.td can construct register classes using the following class:
class RegisterClass<string namespace,
list<ValueType> regTypes, int alignment, dag regList> {
string Namespace = namespace;
list<ValueType> RegTypes = regTypes;
int Size = 0; // spill size, in bits; zero lets tblgen pick the size
int Alignment = alignment;
// CopyCost is the cost of copying a value between two registers
// default value 1 means a single instruction
// A negative value means copying is extremely expensive or impossible
int CopyCost = 1;
dag MemberList = regList;
// for register classes that are subregisters of this class
list<RegisterClass> SubRegClassList = [];
code MethodProtos = [{}]; // to insert arbitrary code
code MethodBodies = [{}];
}
To define a RegisterClass, use the following 4 arguments:
- The first argument of the definition is the name of the namespace.
- The second argument is a list of ValueType register type values that are defined in include/llvm/CodeGen/ValueTypes.td. Defined values include integer types (such as i16, i32, and i1 for Boolean), floating-point types (f32, f64), and vector types (for example, v8i16 for an 8 x i16 vector). All registers in a RegisterClass must have the same ValueType, but some registers may store vector data in different configurations. For example, a register that can process a 128-bit vector may be able to handle 16 8-bit integer elements, 8 16-bit integers, 4 32-bit integers, and so on.
- The third argument of the RegisterClass definition specifies the alignment required of the registers when they are stored or loaded to memory.
- The final argument, regList, specifies which registers are in this class. If an alternative allocation order method is not specified, then regList also defines the order of allocation used by the register allocator. Besides simply listing registers with (add R0, R1, ...), more advanced set operators are available. See include/llvm/Target/Target.td for more information.
In SparcRegisterInfo.td, three RegisterClass objects are defined: FPRegs, DFPRegs, and IntRegs. For all three register classes, the first argument defines the namespace with the string “SP”. FPRegs defines a group of 32 single-precision floating-point registers (F0 to F31); DFPRegs defines a group of 16 double-precision registers (D0-D15).
// F0, F1, F2, ..., F31
def FPRegs : RegisterClass<"SP", [f32], 32, (sequence "F%u", 0, 31)>;
def DFPRegs : RegisterClass<"SP", [f64], 64,
(add D0, D1, D2, D3, D4, D5, D6, D7, D8,
D9, D10, D11, D12, D13, D14, D15)>;
def IntRegs : RegisterClass<"SP", [i32], 32,
(add L0, L1, L2, L3, L4, L5, L6, L7,
I0, I1, I2, I3, I4, I5,
O0, O1, O2, O3, O4, O5, O7,
G1,
// Non-allocatable regs:
G2, G3, G4,
O6, // stack ptr
I6, // frame ptr
I7, // return address
G0, // constant zero
G5, G6, G7 // reserved for kernel
)>;
Using SparcRegisterInfo.td with TableGen generates several output files that are intended for inclusion in other source code that you write. SparcRegisterInfo.td generates SparcGenRegisterInfo.h.inc, which should be included in the header file for the implementation of the SPARC register implementation that you write (SparcRegisterInfo.h). In SparcGenRegisterInfo.h.inc a new structure is defined called SparcGenRegisterInfo that uses TargetRegisterInfo as its base. It also specifies types, based upon the defined register classes: DFPRegsClass, FPRegsClass, and IntRegsClass.
SparcRegisterInfo.td also generates SparcGenRegisterInfo.inc, which is included at the bottom of SparcRegisterInfo.cpp, the SPARC register implementation. The code below shows only the generated integer registers and associated register classes. The order of registers in IntRegs reflects the order in the definition of IntRegs in the target description file.
// IntRegs Register Class...
static const unsigned IntRegs[] = {
SP::L0, SP::L1, SP::L2, SP::L3, SP::L4, SP::L5,
SP::L6, SP::L7, SP::I0, SP::I1, SP::I2, SP::I3,
SP::I4, SP::I5, SP::O0, SP::O1, SP::O2, SP::O3,
SP::O4, SP::O5, SP::O7, SP::G1, SP::G2, SP::G3,
SP::G4, SP::O6, SP::I6, SP::I7, SP::G0, SP::G5,
SP::G6, SP::G7,
};
// IntRegsVTs Register Class Value Types...
static const MVT::ValueType IntRegsVTs[] = {
MVT::i32, MVT::Other
};
namespace SP { // Register class instances
DFPRegsClass DFPRegsRegClass;
FPRegsClass FPRegsRegClass;
IntRegsClass IntRegsRegClass;
...
// IntRegs Sub-register Classes...
static const TargetRegisterClass* const IntRegsSubRegClasses [] = {
NULL
};
...
// IntRegs Super-register Classes...
static const TargetRegisterClass* const IntRegsSuperRegClasses [] = {
NULL
};
...
// IntRegs Register Class sub-classes...
static const TargetRegisterClass* const IntRegsSubclasses [] = {
NULL
};
...
// IntRegs Register Class super-classes...
static const TargetRegisterClass* const IntRegsSuperclasses [] = {
NULL
};
IntRegsClass::IntRegsClass() : TargetRegisterClass(IntRegsRegClassID,
IntRegsVTs, IntRegsSubclasses, IntRegsSuperclasses, IntRegsSubRegClasses,
IntRegsSuperRegClasses, 4, 4, 1, IntRegs, IntRegs + 32) {}
}
The register allocators will avoid using reserved registers, and callee saved registers are not used until all the volatile registers have been used. That is usually good enough, but in some cases it may be necessary to provide custom allocation orders.
Implement a subclass of TargetRegisterInfo
¶
The final step is to hand code portions of XXXRegisterInfo, which implements the interface described in TargetRegisterInfo.h (see The TargetRegisterInfo class). These functions return 0, NULL, or false, unless overridden. Here is a list of functions that are overridden for the SPARC implementation in SparcRegisterInfo.cpp:
- getCalleeSavedRegs — Returns a list of callee-saved registers in the order of the desired callee-save stack frame offset.
- getReservedRegs — Returns a bitset indexed by physical register numbers, indicating if a particular register is unavailable.
- hasFP — Return a Boolean indicating if a function should have a dedicated frame pointer register.
- eliminateCallFramePseudoInstr — If call frame setup or destroy pseudo instructions are used, this can be called to eliminate them.
- eliminateFrameIndex — Eliminate abstract frame indices from instructions that may use them.
- emitPrologue — Insert prologue code into the function.
- emitEpilogue — Insert epilogue code into the function.
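For instance, getReservedRegs might be implemented along these lines (a sketch based on the non-allocatable registers annotated in the IntRegs definition above; the actual SparcRegisterInfo.cpp may differ and reserve additional registers):
// Sketch: mark registers the allocator must never hand out.
BitVector SparcRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
  BitVector Reserved(getNumRegs());
  Reserved.set(SP::G0); // constant zero
  Reserved.set(SP::O6); // stack pointer
  Reserved.set(SP::I6); // frame pointer
  Reserved.set(SP::I7); // return address
  return Reserved;
}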
Instruction Set¶
During the early stages of code generation, the LLVM IR code is converted to a SelectionDAG with nodes that are instances of the SDNode class containing target instructions. An SDNode has an opcode, operands, type requirements, and operation properties: for example, whether an operation is commutative, or whether it loads from memory. The various operation node types are described in the include/llvm/CodeGen/SelectionDAGNodes.h file (values of the NodeType enum in the ISD namespace).
TableGen uses the following target description (.td) input files to generate much of the code for instruction definition:
- Target.td — Where the Instruction, Operand, InstrInfo, and other fundamental classes are defined.
- TargetSelectionDAG.td — Used by SelectionDAG instruction selection generators; contains SDTC* classes (selection DAG type constraint), definitions of SelectionDAG nodes (such as imm, cond, bb, add, fadd, sub), and pattern support (Pattern, Pat, PatFrag, PatLeaf, ComplexPattern).
- XXXInstrFormats.td — Patterns for definitions of target-specific instructions.
- XXXInstrInfo.td — Target-specific definitions of instruction templates, condition codes, and instructions of an instruction set. For architecture modifications, a different file name may be used. For example, for Pentium with SSE instruction, this file is X86InstrSSE.td, and for Pentium with MMX, this file is X86InstrMMX.td.
There is also a target-specific XXX.td file, where XXX is the name of the target. The XXX.td file includes the other .td input files, but its contents are only directly important for subtargets.
You should describe a concrete target-specific class XXXInstrInfo that represents machine instructions supported by a target machine. XXXInstrInfo contains an array of XXXInstrDescriptor objects, each of which describes one instruction. An instruction descriptor defines:
- Opcode mnemonic
- Number of operands
- List of implicit register definitions and uses
- Target-independent properties (such as memory access, is commutable)
- Target-specific flags
The Instruction class (defined in Target.td) is mostly used as a base for more complex instruction classes.
class Instruction {
string Namespace = "";
dag OutOperandList; // A dag containing the MI def operand list.
dag InOperandList; // A dag containing the MI use operand list.
string AsmString = ""; // The .s format to print the instruction with.
list<dag> Pattern; // Set to the DAG pattern for this instruction.
list<Register> Uses = [];
list<Register> Defs = [];
list<Predicate> Predicates = []; // predicates turned into isel match code
... remainder not shown for space ...
}
A SelectionDAG node (SDNode) should contain an object representing a target-specific instruction that is defined in XXXInstrInfo.td. The instruction objects should represent instructions from the architecture manual of the target machine (such as the SPARC Architecture Manual for the SPARC target).
A single instruction from the architecture manual is often modeled as multiple target instructions, depending upon its operands. For example, a manual might describe an add instruction that takes a register or an immediate operand. An LLVM target could model this with two instructions named ADDri and ADDrr.
You should define a class for each instruction category and define each opcode as a subclass of the category with appropriate parameters such as the fixed binary encoding of opcodes and extended opcodes. You should map the register bits to the bits of the instruction in which they are encoded (for the JIT). Also you should specify how the instruction should be printed when the automatic assembly printer is used.
As is described in the SPARC Architecture Manual, Version 8, there are three
major 32-bit formats for instructions. Format 1 is only for the CALL
instruction. Format 2 is for branch on condition codes and SETHI
(set high
bits of a register) instructions. Format 3 is for other instructions.
Each of these formats has corresponding classes in SparcInstrFormat.td. InstSP is a base class for other instruction classes. Additional base classes are specified for more precise formats: for example in SparcInstrFormat.td, F2_1 is for SETHI, and F2_2 is for branches. There are three other base classes: F3_1 for register/register operations, F3_2 for register/immediate operations, and F3_3 for floating-point operations. SparcInstrInfo.td also adds the base class Pseudo for synthetic SPARC instructions.
SparcInstrInfo.td largely consists of operand and instruction definitions for the SPARC target. In SparcInstrInfo.td, the following target description file entry, LDrr, defines the Load Integer instruction for a Word (the LD SPARC opcode) from a memory address to a register. The first parameter, the value 3 (11₂), is the operation value for this category of operation. The second parameter (000000₂) is the specific operation value for LD/Load Word. The third parameter is the output destination, which is a register operand and defined in the Register target description file (IntRegs).
def LDrr : F3_1 <3, 0b000000, (outs IntRegs:$dst), (ins MEMrr:$addr),
"ld [$addr], $dst",
[(set i32:$dst, (load ADDRrr:$addr))]>;
The fourth parameter is the input source, which uses the address operand MEMrr that is defined earlier in SparcInstrInfo.td:
def MEMrr : Operand<i32> {
let PrintMethod = "printMemOperand";
let MIOperandInfo = (ops IntRegs, IntRegs);
}
The fifth parameter is a string that is used by the assembly printer and can be left as an empty string until the assembly printer interface is implemented. The sixth and final parameter is the pattern used to match the instruction during the SelectionDAG Select Phase described in The LLVM Target-Independent Code Generator. This parameter is detailed in the next section, Instruction Selector.
Instruction class definitions are not overloaded for different operand types, so separate versions of instructions are needed for register, memory, or immediate value operands. For example, to perform a Load Integer instruction for a Word from an immediate operand to a register, the following instruction class is defined:
def LDri : F3_2 <3, 0b000000, (outs IntRegs:$dst), (ins MEMri:$addr),
"ld [$addr], $dst",
[(set i32:$dst, (load ADDRri:$addr))]>;
Writing these definitions for so many similar instructions can involve a lot of cut and paste. In .td files, the multiclass directive enables the creation of templates to define several instruction classes at once (using the defm directive). For example in SparcInstrInfo.td, the multiclass pattern F3_12 is defined to create two instruction classes each time F3_12 is invoked:
multiclass F3_12 <string OpcStr, bits<6> Op3Val, SDNode OpNode> {
def rr : F3_1 <2, Op3Val,
(outs IntRegs:$dst), (ins IntRegs:$b, IntRegs:$c),
!strconcat(OpcStr, " $b, $c, $dst"),
[(set i32:$dst, (OpNode i32:$b, i32:$c))]>;
def ri : F3_2 <2, Op3Val,
(outs IntRegs:$dst), (ins IntRegs:$b, i32imm:$c),
!strconcat(OpcStr, " $b, $c, $dst"),
[(set i32:$dst, (OpNode i32:$b, simm13:$c))]>;
}
So when the defm directive is used for the XOR and ADD instructions, as seen below, it creates four instruction objects: XORrr, XORri, ADDrr, and ADDri.
defm XOR : F3_12<"xor", 0b000011, xor>;
defm ADD : F3_12<"add", 0b000000, add>;
SparcInstrInfo.td also includes definitions for condition codes that are referenced by branch instructions. The following definitions in SparcInstrInfo.td indicate the bit location of the SPARC condition code. For example, the 10th bit represents the “greater than” condition for integers, and the 22nd bit represents the “greater than” condition for floats.
def ICC_NE : ICC_VAL< 9>; // Not Equal
def ICC_E : ICC_VAL< 1>; // Equal
def ICC_G : ICC_VAL<10>; // Greater
...
def FCC_U : FCC_VAL<23>; // Unordered
def FCC_G : FCC_VAL<22>; // Greater
def FCC_UG : FCC_VAL<21>; // Unordered or Greater
...
(Note that Sparc.h also defines enums that correspond to the same SPARC condition codes. Care must be taken to ensure the values in Sparc.h correspond to the values in SparcInstrInfo.td. I.e., SPCC::ICC_NE = 9, SPCC::FCC_U = 23 and so on.)
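A minimal sketch of the corresponding Sparc.h side, using the values quoted above (abbreviated; the real header lists every condition code):
namespace SPCC {
  enum CondCodes {
    ICC_E  = 1,   // Equal
    ICC_NE = 9,   // Not Equal
    ICC_G  = 10,  // Greater
    // ...
    FCC_UG = 21,  // Unordered or Greater
    FCC_G  = 22,  // Greater
    FCC_U  = 23   // Unordered
    // ...
  };
}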
Instruction Operand Mapping¶
The code generator backend maps instruction operands to fields in the instruction. Operands are assigned to unbound fields in the instruction in the order they are defined. Fields are bound when they are assigned a value. For example, the Sparc target defines the XNORrr instruction as an F3_1 format instruction having three operands.
def XNORrr : F3_1<2, 0b000111,
(outs IntRegs:$dst), (ins IntRegs:$b, IntRegs:$c),
"xnor $b, $c, $dst",
[(set i32:$dst, (not (xor i32:$b, i32:$c)))]>;
The instruction templates in SparcInstrFormats.td show the base class for F3_1 is InstSP.
class InstSP<dag outs, dag ins, string asmstr, list<dag> pattern> : Instruction {
field bits<32> Inst;
let Namespace = "SP";
bits<2> op;
let Inst{31-30} = op;
dag OutOperandList = outs;
dag InOperandList = ins;
let AsmString = asmstr;
let Pattern = pattern;
}
InstSP leaves the op field unbound.
class F3<dag outs, dag ins, string asmstr, list<dag> pattern>
: InstSP<outs, ins, asmstr, pattern> {
bits<5> rd;
bits<6> op3;
bits<5> rs1;
let op{1} = 1; // Op = 2 or 3
let Inst{29-25} = rd;
let Inst{24-19} = op3;
let Inst{18-14} = rs1;
}
F3 binds the op field and defines the rd, op3, and rs1 fields. F3 format instructions bind their operands to the rd, op3, and rs1 fields.
class F3_1<bits<2> opVal, bits<6> op3val, dag outs, dag ins,
string asmstr, list<dag> pattern> : F3<outs, ins, asmstr, pattern> {
bits<8> asi = 0; // asi not currently used
bits<5> rs2;
let op = opVal;
let op3 = op3val;
let Inst{13} = 0; // i field = 0
let Inst{12-5} = asi; // address space identifier
let Inst{4-0} = rs2;
}
F3_1 binds the op and op3 fields and defines the rs2 field. F3_1 format instructions bind their operands to the rd, rs1, and rs2 fields. This results in the XNORrr instruction binding the $dst, $b, and $c operands to the rd, rs1, and rs2 fields respectively.
Instruction Operand Name Mapping¶
TableGen will also generate a function called getNamedOperandIdx() which can be used to look up an operand’s index in a MachineInstr based on its TableGen name. Setting the UseNamedOperandTable bit in an instruction’s TableGen definition will add all of its operands to an enumeration in the llvm::XXX::OpName namespace and also add an entry for it into the OperandMap table, which can be queried using getNamedOperandIdx().
int DstIndex = SP::getNamedOperandIdx(SP::XNORrr, SP::OpName::dst); // => 0
int BIndex = SP::getNamedOperandIdx(SP::XNORrr, SP::OpName::b); // => 1
int CIndex = SP::getNamedOperandIdx(SP::XNORrr, SP::OpName::c); // => 2
int DIndex = SP::getNamedOperandIdx(SP::XNORrr, SP::OpName::d); // => -1
...
The entries in the OpName enum are taken verbatim from the TableGen definitions, so operands with lowercase names will have lower case entries in the enum.
To include the getNamedOperandIdx() function in your backend, you will need to define a few preprocessor macros in XXXInstrInfo.cpp and XXXInstrInfo.h. For example:
XXXInstrInfo.cpp:
#define GET_INSTRINFO_NAMED_OPS // For getNamedOperandIdx() function
#include "XXXGenInstrInfo.inc"
XXXInstrInfo.h:
#define GET_INSTRINFO_OPERAND_ENUM // For OpName enum
#include "XXXGenInstrInfo.inc"
namespace XXX {
int16_t getNamedOperandIdx(uint16_t Opcode, uint16_t NamedIndex);
} // End namespace XXX
Instruction Operand Types¶
TableGen will also generate an enumeration consisting of all named Operand
types defined in the backend, in the llvm::XXX::OpTypes namespace.
Some common immediate Operand types (for instance i8, i32, i64, f32, f64) are defined for all targets in include/llvm/Target/Target.td, and are available in each Target’s OpTypes enum. Also, only named Operand types appear in the enumeration: anonymous types are ignored.
For example, the X86 backend defines brtarget and brtarget8, both instances of the TableGen Operand class, which represent branch target operands:
def brtarget : Operand<OtherVT>;
def brtarget8 : Operand<OtherVT>;
This results in:
namespace X86 {
namespace OpTypes {
enum OperandType {
...
brtarget,
brtarget8,
...
i32imm,
i64imm,
...
OPERAND_TYPE_LIST_END
}; // End of OperandType enum
} // End namespace OpTypes
} // End namespace X86
In typical TableGen fashion, to use the enum, you will need to define a preprocessor macro:
#define GET_INSTRINFO_OPERAND_TYPES_ENUM // For OpTypes enum
#include "XXXGenInstrInfo.inc"
Instruction Scheduling¶
Instruction itineraries can be queried using MCInstrDesc::getSchedClass(). The value can be named by an enumeration in the llvm::XXX::Sched namespace generated by TableGen in XXXGenInstrInfo.inc. The names of the scheduling classes are the same as those provided in XXXSchedule.td, plus a default NoItinerary class.
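As a minimal sketch of such a query (the itinerary class name IIC_iu_instr below is hypothetical; real names come from your XXXSchedule.td, and TII and MI are assumed to be the usual instruction-info pointer and current machine instruction):
const MCInstrDesc &Desc = TII->get(MI->getOpcode());
unsigned SchedClass = Desc.getSchedClass();   // index into the generated Sched enum
if (SchedClass == SP::Sched::IIC_iu_instr) {  // hypothetical scheduling class
  // Special-case instructions that occupy the integer unit.
}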
Instruction Relation Mapping¶
This TableGen feature is used to relate instructions with each other. It is particularly useful when you have multiple instruction formats and need to switch between them after instruction selection. This entire feature is driven by relation models which can be defined in XXXInstrInfo.td files according to the target-specific instruction set. Relation models are defined using the InstrMapping class as a base. TableGen parses all the models and generates instruction relation maps using the specified information. Relation maps are emitted as tables in the XXXGenInstrInfo.inc file along with the functions to query them. For detailed information on how to use this feature, please refer to How To Use Instruction Mappings.
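A minimal, hypothetical sketch of how such a generated map might be queried (the function name getPredicatedOpcode is illustrative only; the actual name is derived from your InstrMapping model, and MI and TII are the usual machine instruction and instruction-info handles):
// Query the relation map emitted into XXXGenInstrInfo.inc and, if a
// related (e.g. predicated) form exists, switch the instruction over.
int PredOpc = SP::getPredicatedOpcode(MI->getOpcode());
if (PredOpc != -1)
  MI->setDesc(TII->get(PredOpc));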
Implement a subclass of TargetInstrInfo¶
The final step is to hand code portions of XXXInstrInfo, which implements the interface described in TargetInstrInfo.h (see The TargetInstrInfo class). These functions return 0 or a Boolean or they assert, unless overridden. Here’s a list of functions that are overridden for the SPARC implementation in SparcInstrInfo.cpp (a sketch of the first hook follows the list):
- isLoadFromStackSlot — If the specified machine instruction is a direct load from a stack slot, return the register number of the destination and the FrameIndex of the stack slot.
- isStoreToStackSlot — If the specified machine instruction is a direct store to a stack slot, return the register number of the source and the FrameIndex of the stack slot.
- copyPhysReg — Copy values between a pair of physical registers.
- storeRegToStackSlot — Store a register value to a stack slot.
- loadRegFromStackSlot — Load a register value from a stack slot.
- storeRegToAddr — Store a register value to memory.
- loadRegFromAddr — Load a register value from memory.
- foldMemoryOperand — Attempt to combine instructions of any load or store instruction for the specified operand(s).
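For instance, the SPARC implementation of the first hook (condensed from SparcInstrInfo.cpp of this era) simply pattern-matches the load opcodes and their operands:
unsigned SparcInstrInfo::isLoadFromStackSlot(const MachineInstr *MI,
                                             int &FrameIndex) const {
  if (MI->getOpcode() == SP::LDri || MI->getOpcode() == SP::LDFri ||
      MI->getOpcode() == SP::LDDFri) {
    // A direct stack-slot load has a frame-index operand plus a zero offset.
    if (MI->getOperand(1).isFI() && MI->getOperand(2).isImm() &&
        MI->getOperand(2).getImm() == 0) {
      FrameIndex = MI->getOperand(1).getIndex();
      return MI->getOperand(0).getReg();
    }
  }
  return 0;  // not a stack-slot load
}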
Branch Folding and If Conversion¶
Performance can be improved by combining instructions or by eliminating instructions that are never reached. The AnalyzeBranch method in XXXInstrInfo may be implemented to examine conditional instructions and remove unnecessary instructions. AnalyzeBranch looks at the end of a machine basic block (MBB) for opportunities for improvement, such as branch folding and if conversion. The BranchFolder and IfConverter machine function passes (see the source files BranchFolding.cpp and IfConversion.cpp in the lib/CodeGen directory) call AnalyzeBranch to improve the control flow graph that represents the instructions.
Several implementations of AnalyzeBranch (for ARM, Alpha, and X86) can be examined as models for your own AnalyzeBranch implementation. Since SPARC does not implement a useful AnalyzeBranch, the ARM target implementation is shown below.
AnalyzeBranch returns a Boolean value and takes four parameters:
- MachineBasicBlock &MBB — The incoming block to be examined.
- MachineBasicBlock *&TBB — A destination block that is returned. For a conditional branch that evaluates to true, TBB is the destination.
- MachineBasicBlock *&FBB — For a conditional branch that evaluates to false, FBB is returned as the destination.
- std::vector<MachineOperand> &Cond — List of operands to evaluate a condition for a conditional branch.
In the simplest case, if a block ends without a branch, then it falls through to the successor block. No destination blocks are specified for either TBB or FBB, so both parameters return NULL. The start of AnalyzeBranch (see code below for the ARM target) shows the function parameters and the code for the simplest case.
bool ARMInstrInfo::AnalyzeBranch(MachineBasicBlock &MBB,
MachineBasicBlock *&TBB,
MachineBasicBlock *&FBB,
std::vector<MachineOperand> &Cond) const
{
MachineBasicBlock::iterator I = MBB.end();
if (I == MBB.begin() || !isUnpredicatedTerminator(--I))
return false;
If a block ends with a single unconditional branch instruction, then AnalyzeBranch (shown below) should return the destination of that branch in the TBB parameter.
if (LastOpc == ARM::B || LastOpc == ARM::tB) {
TBB = LastInst->getOperand(0).getMBB();
return false;
}
If a block ends with two unconditional branches, then the second branch is never reached. In that situation, as shown below, remove the last branch instruction and return the penultimate branch in the TBB parameter.
if ((SecondLastOpc == ARM::B || SecondLastOpc == ARM::tB) &&
(LastOpc == ARM::B || LastOpc == ARM::tB)) {
TBB = SecondLastInst->getOperand(0).getMBB();
I = LastInst;
I->eraseFromParent();
return false;
}
A block may end with a single conditional branch instruction that falls through to the successor block if the condition evaluates to false. In that case, AnalyzeBranch (shown below) should return the destination of that conditional branch in the TBB parameter and a list of operands in the Cond parameter to evaluate the condition.
if (LastOpc == ARM::Bcc || LastOpc == ARM::tBcc) {
// Block ends with fall-through condbranch.
TBB = LastInst->getOperand(0).getMBB();
Cond.push_back(LastInst->getOperand(1));
Cond.push_back(LastInst->getOperand(2));
return false;
}
If a block ends with both a conditional branch and an ensuing unconditional branch, then AnalyzeBranch (shown below) should return the conditional branch destination (assuming it corresponds to a conditional evaluation of “true”) in the TBB parameter and the unconditional branch destination in the FBB (corresponding to a conditional evaluation of “false”). A list of operands to evaluate the condition should be returned in the Cond parameter.
unsigned SecondLastOpc = SecondLastInst->getOpcode();
if ((SecondLastOpc == ARM::Bcc && LastOpc == ARM::B) ||
(SecondLastOpc == ARM::tBcc && LastOpc == ARM::tB)) {
TBB = SecondLastInst->getOperand(0).getMBB();
Cond.push_back(SecondLastInst->getOperand(1));
Cond.push_back(SecondLastInst->getOperand(2));
FBB = LastInst->getOperand(0).getMBB();
return false;
}
For the last two cases (ending with a single conditional branch or ending with one conditional and one unconditional branch), the operands returned in the Cond parameter can be passed to methods of other instructions to create new branches or perform other operations. An implementation of AnalyzeBranch requires the helper methods RemoveBranch and InsertBranch to manage subsequent operations (a sketch of RemoveBranch follows).
AnalyzeBranch should return false indicating success in most circumstances. AnalyzeBranch should only return true when the method is stumped about what to do, for example, if a block has three terminating branches. AnalyzeBranch may also return true if it encounters a terminator it cannot handle, such as an indirect branch.
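As a rough sketch of one of these helpers (the signature matches the TargetInstrInfo interface of this era; the opcode checks are ARM-specific and abbreviated), RemoveBranch walks backward from the end of the block, erasing branch terminators and returning the number removed:
unsigned ARMInstrInfo::RemoveBranch(MachineBasicBlock &MBB) const {
  MachineBasicBlock::iterator I = MBB.end();
  unsigned Count = 0;
  while (I != MBB.begin()) {
    --I;
    unsigned Opc = I->getOpcode();
    if (Opc != ARM::B && Opc != ARM::tB &&
        Opc != ARM::Bcc && Opc != ARM::tBcc)
      break;                  // stop at the first non-branch instruction
    I->eraseFromParent();     // delete the branch terminator
    I = MBB.end();
    ++Count;
  }
  return Count;
}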
Instruction Selector¶
LLVM uses a SelectionDAG to represent LLVM IR instructions, and nodes of the SelectionDAG ideally represent native target instructions. During code generation, instruction selection passes are performed to convert non-native DAG instructions into native target-specific instructions. The pass described in XXXISelDAGToDAG.cpp is used to match patterns and perform DAG-to-DAG instruction selection. Optionally, a pass may be defined (in XXXBranchSelector.cpp) to perform similar DAG-to-DAG operations for branch instructions. Later, the code in XXXISelLowering.cpp replaces or removes operations and data types not supported natively (legalizes) in a SelectionDAG.
TableGen generates code for instruction selection using the following target description input files:
- XXXInstrInfo.td — Contains definitions of instructions in a target-specific instruction set, generates XXXGenDAGISel.inc, which is included in XXXISelDAGToDAG.cpp.
- XXXCallingConv.td — Contains the calling and return value conventions for the target architecture, and it generates XXXGenCallingConv.inc, which is included in XXXISelLowering.cpp.
The implementation of an instruction selection pass must include a header that declares the FunctionPass class or a subclass of FunctionPass. In XXXTargetMachine.cpp, a Pass Manager (PM) should add each instruction selection pass into the queue of passes to run.
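A minimal sketch of that registration step for the SPARC target (the hook name and its enclosing class vary across LLVM releases; createSparcISelDag is the factory function for the DAG-to-DAG pass):
bool SparcPassConfig::addInstSelector() {
  // Queue the DAG-to-DAG instruction selection pass.
  addPass(createSparcISelDag(getSparcTargetMachine()));
  return false;
}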
The LLVM static compiler (llc) is an excellent tool for visualizing the contents of DAGs. To display the SelectionDAG before or after specific processing phases, use the command line options for llc, described at SelectionDAG Instruction Selection Process.
To describe instruction selector behavior, you should add patterns for lowering LLVM code into a SelectionDAG as the last parameter of the instruction definitions in XXXInstrInfo.td. For example, in SparcInstrInfo.td, this entry defines a register store operation, and the last parameter describes a pattern with the store DAG operator.
def STrr : F3_1< 3, 0b000100, (outs), (ins MEMrr:$addr, IntRegs:$src),
"st $src, [$addr]", [(store i32:$src, ADDRrr:$addr)]>;
ADDRrr is a memory mode that is also defined in SparcInstrInfo.td:
def ADDRrr : ComplexPattern<i32, 2, "SelectADDRrr", [], []>;
The definition of ADDRrr refers to SelectADDRrr, which is a function defined in an implementation of the Instruction Selector (such as SparcISelDAGToDAG.cpp).
In lib/Target/TargetSelectionDAG.td, the DAG operator for store is defined below:
def store : PatFrag<(ops node:$val, node:$ptr),
(st node:$val, node:$ptr), [{
if (StoreSDNode *ST = dyn_cast<StoreSDNode>(N))
return !ST->isTruncatingStore() &&
ST->getAddressingMode() == ISD::UNINDEXED;
return false;
}]>;
XXXInstrInfo.td also generates (in XXXGenDAGISel.inc) the SelectCode method that is used to call the appropriate processing method for an instruction. In this example, SelectCode calls Select_ISD_STORE for the ISD::STORE opcode.
SDNode *SelectCode(SDValue N) {
...
MVT::ValueType NVT = N.getNode()->getValueType(0);
switch (N.getOpcode()) {
case ISD::STORE: {
switch (NVT) {
default:
return Select_ISD_STORE(N);
break;
}
break;
}
...
The pattern for STrr is matched, so elsewhere in XXXGenDAGISel.inc, code for STrr is created for Select_ISD_STORE. The Emit_22 method is also generated in XXXGenDAGISel.inc to complete the processing of this instruction.
SDNode *Select_ISD_STORE(const SDValue &N) {
SDValue Chain = N.getOperand(0);
if (Predicate_store(N.getNode())) {
SDValue N1 = N.getOperand(1);
SDValue N2 = N.getOperand(2);
SDValue CPTmp0;
SDValue CPTmp1;
// Pattern: (st:void i32:i32:$src,
// ADDRrr:i32:$addr)<<P:Predicate_store>>
// Emits: (STrr:void ADDRrr:i32:$addr, IntRegs:i32:$src)
// Pattern complexity = 13 cost = 1 size = 0
if (SelectADDRrr(N, N2, CPTmp0, CPTmp1) &&
N1.getNode()->getValueType(0) == MVT::i32 &&
N2.getNode()->getValueType(0) == MVT::i32) {
return Emit_22(N, SP::STrr, CPTmp0, CPTmp1);
}
...
The SelectionDAG Legalize Phase¶
The Legalize phase converts a DAG to use types and operations that are natively supported by the target. For natively unsupported types and operations, you need to add code to the target-specific XXXTargetLowering implementation to convert unsupported types and operations to supported ones.
In the constructor for the XXXTargetLowering class, first use the addRegisterClass method to specify which types are supported and which register classes are associated with them. The code for the register classes is generated by TableGen from XXXRegisterInfo.td and placed in XXXGenRegisterInfo.h.inc. For example, the implementation of the constructor for the SparcTargetLowering class (in SparcISelLowering.cpp) starts with the following code:
addRegisterClass(MVT::i32, SP::IntRegsRegisterClass);
addRegisterClass(MVT::f32, SP::FPRegsRegisterClass);
addRegisterClass(MVT::f64, SP::DFPRegsRegisterClass);
You should examine the node types in the ISD namespace (include/llvm/CodeGen/SelectionDAGNodes.h) and determine which operations the target natively supports. For operations that do not have native support, add a callback to the constructor for the XXXTargetLowering class, so the instruction selection process knows what to do. The TargetLowering class callback methods (declared in llvm/Target/TargetLowering.h) are:
- setOperationAction — General operation.
- setLoadExtAction — Load with extension.
- setTruncStoreAction — Truncating store.
- setIndexedLoadAction — Indexed load.
- setIndexedStoreAction — Indexed store.
- setConvertAction — Type conversion.
- setCondCodeAction — Support for a given condition code.
Note: on older releases, setLoadXAction is used instead of setLoadExtAction. Also, on older releases, setCondCodeAction may not be supported. Examine your release to see what methods are specifically supported.
These callbacks are used to determine that an operation does or does not work with a specified type (or types). In all cases, the third parameter is a LegalizeAction enum value: Promote, Expand, Custom, or Legal. SparcISelLowering.cpp contains examples of all four LegalizeAction values.
Promote¶
For an operation without native support for a given type, the specified type may be promoted to a larger type that is supported. For example, SPARC does not support a sign-extending load for Boolean values (i1 type), so in SparcISelLowering.cpp the third parameter below, Promote, changes i1 type values to a larger type before loading.
setLoadExtAction(ISD::SEXTLOAD, MVT::i1, Promote);
Expand¶
For a type without native support, a value may need to be broken down further, rather than promoted. For an operation without native support, a combination of other operations may be used to similar effect. In SPARC, the floating-point sine and cosine trig operations are supported by expansion to other operations, as indicated by the third parameter, Expand, to setOperationAction:
setOperationAction(ISD::FSIN, MVT::f32, Expand);
setOperationAction(ISD::FCOS, MVT::f32, Expand);
Custom¶
For some operations, simple type promotion or operation expansion may be insufficient. In some cases, a special intrinsic function must be implemented.
For example, a constant value may require special treatment, or an operation may require spilling and restoring registers in the stack and working with register allocators.
As seen in the SparcISelLowering.cpp code below, to perform a type conversion from a floating-point value to a signed integer, setOperationAction should first be called with Custom as the third parameter:
setOperationAction(ISD::FP_TO_SINT, MVT::i32, Custom);
In the LowerOperation method, for each Custom operation, a case statement should be added to indicate what function to call. In the following code, an FP_TO_SINT opcode will call the LowerFP_TO_SINT method:
SDValue SparcTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) {
switch (Op.getOpcode()) {
case ISD::FP_TO_SINT: return LowerFP_TO_SINT(Op, DAG);
...
}
}
Finally, the LowerFP_TO_SINT method is implemented, using an FP register to convert the floating-point value to an integer.
static SDValue LowerFP_TO_SINT(SDValue Op, SelectionDAG &DAG) {
assert(Op.getValueType() == MVT::i32);
Op = DAG.getNode(SPISD::FTOI, MVT::f32, Op.getOperand(0));
return DAG.getNode(ISD::BITCAST, MVT::i32, Op);
}
Legal¶
The Legal LegalizeAction enum value simply indicates that an operation is natively supported. Legal represents the default condition, so it is rarely used. In SparcISelLowering.cpp, the action for CTPOP (an operation to count the bits set in an integer) is natively supported only for SPARC v9. The following code enables the Expand conversion technique for non-v9 SPARC implementations.
setOperationAction(ISD::CTPOP, MVT::i32, Expand);
...
if (TM.getSubtarget<SparcSubtarget>().isV9())
setOperationAction(ISD::CTPOP, MVT::i32, Legal);
Calling Conventions¶
To support target-specific calling conventions, XXXCallingConv.td uses interfaces (such as CCIfType and CCAssignToReg) that are defined in lib/Target/TargetCallingConv.td. TableGen can take the target descriptor file XXXCallingConv.td and generate the header file XXXGenCallingConv.inc, which is typically included in XXXISelLowering.cpp. You can use the interfaces in TargetCallingConv.td to specify:
- The order of parameter allocation.
- Where parameters and return values are placed (that is, on the stack or in registers).
- Which registers may be used.
- Whether the caller or callee unwinds the stack.
The following example demonstrates the use of the CCIfType and CCAssignToReg interfaces. If the CCIfType predicate is true (that is, if the current argument is of type f32 or f64), then the action is performed. In this case, the CCAssignToReg action assigns the argument value to the first available register: either R0 or R1.
CCIfType<[f32,f64], CCAssignToReg<[R0, R1]>>
SparcCallingConv.td contains definitions for a target-specific return-value calling convention (RetCC_Sparc32) and a basic 32-bit C calling convention (CC_Sparc32). The definition of RetCC_Sparc32 (shown below) indicates which registers are used for specified scalar return types. A single-precision float is returned to register F0, and a double-precision float goes to register D0. A 32-bit integer is returned in register I0 or I1.
def RetCC_Sparc32 : CallingConv<[
CCIfType<[i32], CCAssignToReg<[I0, I1]>>,
CCIfType<[f32], CCAssignToReg<[F0]>>,
CCIfType<[f64], CCAssignToReg<[D0]>>
]>;
The definition of CC_Sparc32 in SparcCallingConv.td introduces CCAssignToStack, which assigns the value to a stack slot with the specified size and alignment. In the example below, the first parameter, 4, indicates the size of the slot, and the second parameter, also 4, indicates the stack alignment in bytes. (Special cases: if size is zero, then the ABI size is used; if alignment is zero, then the ABI alignment is used.)
def CC_Sparc32 : CallingConv<[
// All arguments get passed in integer registers if there is space.
CCIfType<[i32, f32, f64], CCAssignToReg<[I0, I1, I2, I3, I4, I5]>>,
CCAssignToStack<4, 4>
]>;
CCDelegateTo is another commonly used interface, which tries to find a specified sub-calling convention and, if a match is found, invokes it. In the following example (in X86CallingConv.td), the definition of RetCC_X86_32_C ends with CCDelegateTo. After the current value is assigned to the register ST0 or ST1, RetCC_X86Common is invoked.
def RetCC_X86_32_C : CallingConv<[
CCIfType<[f32], CCAssignToReg<[ST0, ST1]>>,
CCIfType<[f64], CCAssignToReg<[ST0, ST1]>>,
CCDelegateTo<RetCC_X86Common>
]>;
CCIfCC is an interface that attempts to match the given name to the current calling convention. If the name identifies the current calling convention, then a specified action is invoked. In the following example (in X86CallingConv.td), if the Fast calling convention is in use, then RetCC_X86_32_Fast is invoked. If the SSECall calling convention is in use, then RetCC_X86_32_SSE is invoked.
def RetCC_X86_32 : CallingConv<[
CCIfCC<"CallingConv::Fast", CCDelegateTo<RetCC_X86_32_Fast>>,
CCIfCC<"CallingConv::X86_SSECall", CCDelegateTo<RetCC_X86_32_SSE>>,
CCDelegateTo<RetCC_X86_32_C>
]>;
Other calling convention interfaces include:
- CCIf <predicate, action> — If the predicate matches, apply the action.
- CCIfInReg <action> — If the argument is marked with the “inreg” attribute, then apply the action.
- CCIfNest <action> — If the argument is marked with the “nest” attribute, then apply the action.
- CCIfNotVarArg <action> — If the current function does not take a variable number of arguments, apply the action.
- CCAssignToRegWithShadow <registerList, shadowList> — Similar to CCAssignToReg, but with a shadow list of registers.
- CCPassByVal <size, align> — Assign value to a stack slot with the minimum specified size and alignment.
- CCPromoteToType <type> — Promote the current value to the specified type.
- CallingConv <[actions]> — Define each calling convention that is supported.
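As a brief illustration of how the generated convention is consumed, lowering code in XXXISelLowering.cpp typically hands the TableGen-produced function to a CCState analysis. This is a sketch of the usual pattern (the CCState constructor signature varies across releases, and CallConv, IsVarArg, Outs, and DAG come from the surrounding lowering hook):
// Analyze the return values against the generated RetCC_Sparc32 table.
SmallVector<CCValAssign, 16> RVLocs;
CCState CCInfo(CallConv, IsVarArg, DAG.getMachineFunction(),
               RVLocs, *DAG.getContext());
CCInfo.AnalyzeReturn(Outs, RetCC_Sparc32);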
Assembly Printer¶
During the code emission stage, the code generator may utilize an LLVM pass to produce assembly output. To do this, you want to implement the code for a printer that converts LLVM IR to a GAS-format assembly language for your target machine, using the following steps:
- Define all the assembly strings for your target, adding them to the instructions defined in the XXXInstrInfo.td file. (See Instruction Set.) TableGen will produce an output file (XXXGenAsmWriter.inc) with an implementation of the printInstruction method for the XXXAsmPrinter class.
- Write XXXTargetAsmInfo.h, which contains the bare-bones declaration of the XXXTargetAsmInfo class (a subclass of TargetAsmInfo).
- Write XXXTargetAsmInfo.cpp, which contains target-specific values for TargetAsmInfo properties and sometimes new implementations for methods.
- Write XXXAsmPrinter.cpp, which implements the AsmPrinter class that performs the LLVM-to-assembly conversion.
The code in XXXTargetAsmInfo.h is usually a trivial declaration of the XXXTargetAsmInfo class for use in XXXTargetAsmInfo.cpp. Similarly, XXXTargetAsmInfo.cpp usually has a few declarations of XXXTargetAsmInfo replacement values that override the default values in TargetAsmInfo.cpp. For example in SparcTargetAsmInfo.cpp:
SparcTargetAsmInfo::SparcTargetAsmInfo(const SparcTargetMachine &TM) {
Data16bitsDirective = "\t.half\t";
Data32bitsDirective = "\t.word\t";
Data64bitsDirective = 0; // .xword is only supported by V9.
ZeroDirective = "\t.skip\t";
CommentString = "!";
ConstantPoolSection = "\t.section \".rodata\",#alloc\n";
}
The X86 assembly printer implementation (X86TargetAsmInfo) is an example where the target-specific TargetAsmInfo class uses an overridden method: ExpandInlineAsm.
A target-specific implementation of AsmPrinter is written in XXXAsmPrinter.cpp, which implements the AsmPrinter class that converts the LLVM IR to printable assembly. The implementation must include the following headers, which have declarations for the AsmPrinter and MachineFunctionPass classes. The MachineFunctionPass is a subclass of FunctionPass.
#include "llvm/CodeGen/AsmPrinter.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
As a FunctionPass, AsmPrinter first calls doInitialization to set up the AsmPrinter. In SparcAsmPrinter, a Mangler object is instantiated to process variable names.
In XXXAsmPrinter.cpp, the runOnMachineFunction method (declared in MachineFunctionPass) must be implemented for XXXAsmPrinter. In MachineFunctionPass, the runOnFunction method invokes runOnMachineFunction. Target-specific implementations of runOnMachineFunction differ, but generally do the following to process each machine function (a schematic sketch follows this list):
- Call SetupMachineFunction to perform initialization.
- Call EmitConstantPool to print out (to the output stream) constants which have been spilled to memory.
- Call EmitJumpTableInfo to print out jump tables used by the current function.
- Print out the label for the current function.
- Print out the code for the function, including basic block labels and the assembly for the instruction (using printInstruction).
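A schematic sketch of such an implementation, using the helpers named in the steps above (exact signatures vary across releases, so treat this as an outline rather than a drop-in method):
bool SparcAsmPrinter::runOnMachineFunction(MachineFunction &MF) {
  SetupMachineFunction(MF);                   // per-function initialization
  EmitConstantPool(MF.getConstantPool());     // constants spilled to memory
  EmitJumpTableInfo(MF.getJumpTableInfo());   // jump tables for this function
  // ... print the function label, then walk the basic blocks ...
  for (MachineFunction::const_iterator BB = MF.begin(), E = MF.end();
       BB != E; ++BB) {
    // ... print the basic block label ...
    for (MachineBasicBlock::const_iterator MI = BB->begin(), ME = BB->end();
         MI != ME; ++MI)
      printInstruction(&*MI);                 // TableGen-generated printer
  }
  return false;  // the MachineFunction was not modified
}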
The XXXAsmPrinter implementation must also include the code generated by TableGen that is output in the XXXGenAsmWriter.inc file. The code in XXXGenAsmWriter.inc contains an implementation of the printInstruction method that may call these methods:
- printOperand
- printMemOperand
- printCCOperand (for conditional statements)
- printDataDirective
- printDeclare
- printImplicitDef
- printInlineAsm
The implementations of printDeclare, printImplicitDef, printInlineAsm, and printLabel in AsmPrinter.cpp are generally adequate for printing assembly and do not need to be overridden.
The printOperand method is implemented with a long switch/case statement for the type of operand: register, immediate, basic block, external symbol, global address, constant pool index, or jump table index. For an instruction with a memory address operand, the printMemOperand method should be implemented to generate the proper output. Similarly, printCCOperand should be used to print a conditional operand.
doFinalization should be overridden in XXXAsmPrinter, and it should be called to shut down the assembly printer. During doFinalization, global variables and constants are printed to output.
Subtarget Support¶
Subtarget support is used to inform the code generation process of instruction set variations for a given chip set. For example, the LLVM SPARC implementation provided covers three major versions of the SPARC microprocessor architecture: Version 8 (V8, which is a 32-bit architecture), Version 9 (V9, a 64-bit architecture), and the UltraSPARC architecture. V8 has 16 double-precision floating-point registers that are also usable as either 32 single-precision or 8 quad-precision registers. V8 is also purely big-endian. V9 has 32 double-precision floating-point registers that are also usable as 16 quad-precision registers, but cannot be used as single-precision registers. The UltraSPARC architecture combines V9 with UltraSPARC Visual Instruction Set extensions.
If subtarget support is needed, you should implement a target-specific XXXSubtarget class for your architecture. This class should process the command-line options -mcpu= and -mattr=.
TableGen uses definitions in the Target.td and Sparc.td files to generate code in SparcGenSubtarget.inc. In Target.td, shown below, the SubtargetFeature interface is defined. The first four string parameters of the SubtargetFeature interface are a feature name, an attribute set by the feature, the value of the attribute, and a description of the feature. (The fifth parameter is a list of features whose presence is implied, and its default value is an empty array.)
class SubtargetFeature<string n, string a, string v, string d,
list<SubtargetFeature> i = []> {
string Name = n;
string Attribute = a;
string Value = v;
string Desc = d;
list<SubtargetFeature> Implies = i;
}
In the Sparc.td file, the SubtargetFeature class is used to define the following features.
def FeatureV9 : SubtargetFeature<"v9", "IsV9", "true",
"Enable SPARC-V9 instructions">;
def FeatureV8Deprecated : SubtargetFeature<"deprecated-v8",
"V8DeprecatedInsts", "true",
"Enable deprecated V8 instructions in V9 mode">;
def FeatureVIS : SubtargetFeature<"vis", "IsVIS", "true",
"Enable UltraSPARC Visual Instruction Set extensions">;
Elsewhere in Sparc.td, the Proc class is defined and then is used to define particular SPARC processor subtypes that may have the previously described features.
class Proc<string Name, list<SubtargetFeature> Features>
: Processor<Name, NoItineraries, Features>;
def : Proc<"generic", []>;
def : Proc<"v8", []>;
def : Proc<"supersparc", []>;
def : Proc<"sparclite", []>;
def : Proc<"f934", []>;
def : Proc<"hypersparc", []>;
def : Proc<"sparclite86x", []>;
def : Proc<"sparclet", []>;
def : Proc<"tsc701", []>;
def : Proc<"v9", [FeatureV9]>;
def : Proc<"ultrasparc", [FeatureV9, FeatureV8Deprecated]>;
def : Proc<"ultrasparc3", [FeatureV9, FeatureV8Deprecated]>;
def : Proc<"ultrasparc3-vis", [FeatureV9, FeatureV8Deprecated, FeatureVIS]>;
From the Target.td and Sparc.td files, the resulting SparcGenSubtarget.inc specifies enum values to identify the features, arrays of constants to represent the CPU features and CPU subtypes, and the ParseSubtargetFeatures method that parses the features string and sets the specified subtarget options. The generated SparcGenSubtarget.inc file should be included in SparcSubtarget.cpp. The target-specific implementation of the XXXSubtarget method should follow this pseudocode:
XXXSubtarget::XXXSubtarget(const Module &M, const std::string &FS) {
// Set the default features
// Determine default and user specified characteristics of the CPU
// Call ParseSubtargetFeatures(FS, CPU) to parse the features string
// Perform any additional operations
}
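A minimal concrete version of that pseudocode, modeled on the SPARC subtarget (the boolean fields correspond to the SubtargetFeature attributes defined above; the constructor signature varies across releases):
SparcSubtarget::SparcSubtarget(const Module &M, const std::string &FS) {
  // Set the default features.
  IsV9 = false;
  V8DeprecatedInsts = false;
  IsVIS = false;
  // Determine the default CPU; user-specified features arrive in FS.
  std::string CPU = "generic";
  // Parse the features string, setting the fields declared above.
  ParseSubtargetFeatures(FS, CPU);
}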
JIT Support¶
The implementation of a target machine optionally includes a Just-In-Time (JIT) code generator that emits machine code and auxiliary structures as binary output that can be written directly to memory. To do this, implement JIT code generation by performing the following steps:
- Write an XXXCodeEmitter.cpp file that contains a machine function pass that transforms target-machine instructions into relocatable machine code.
- Write an XXXJITInfo.cpp file that implements the JIT interfaces for target-specific code-generation activities, such as emitting machine code and stubs.
- Modify XXXTargetMachine so that it provides a TargetJITInfo object through its getJITInfo method.
There are several different approaches to writing the JIT support code. For instance, TableGen and target descriptor files may be used for creating a JIT code generator, but are not mandatory. For the Alpha and PowerPC target machines, TableGen is used to generate XXXGenCodeEmitter.inc, which contains the binary coding of machine instructions and the getBinaryCodeForInstr method to access those codes. Other JIT implementations do not use TableGen at all.
Both XXXJITInfo.cpp and XXXCodeEmitter.cpp must include the llvm/CodeGen/MachineCodeEmitter.h header file that defines the MachineCodeEmitter class containing code for several callback functions that write data (in bytes, words, strings, etc.) to the output stream.
Machine Code Emitter¶
In XXXCodeEmitter.cpp, a target-specific subclass of the Emitter class is implemented as a function pass (a subclass of MachineFunctionPass). The target-specific implementation of runOnMachineFunction (invoked by runOnFunction in MachineFunctionPass) iterates through each MachineBasicBlock and calls emitInstruction to process each instruction and emit binary code. emitInstruction is largely implemented with case statements on the instruction types defined in XXXInstrInfo.h. For example, in X86CodeEmitter.cpp, the emitInstruction method is built around the following switch/case statements:
switch (Desc->TSFlags & X86::FormMask) {
case X86II::Pseudo: // for not yet implemented instructions
... // or pseudo-instructions
break;
case X86II::RawFrm: // for instructions with a fixed opcode value
...
break;
case X86II::AddRegFrm: // for instructions that have one register operand
... // added to their opcode
break;
case X86II::MRMDestReg:// for instructions that use the Mod/RM byte
... // to specify a destination (register)
break;
case X86II::MRMDestMem:// for instructions that use the Mod/RM byte
... // to specify a destination (memory)
break;
case X86II::MRMSrcReg: // for instructions that use the Mod/RM byte
... // to specify a source (register)
break;
case X86II::MRMSrcMem: // for instructions that use the Mod/RM byte
... // to specify a source (memory)
break;
case X86II::MRM0r: case X86II::MRM1r: // for instructions that operate on
case X86II::MRM2r: case X86II::MRM3r: // a REGISTER r/m operand and
case X86II::MRM4r: case X86II::MRM5r: // use the Mod/RM byte and a field
case X86II::MRM6r: case X86II::MRM7r: // to hold extended opcode data
...
break;
case X86II::MRM0m: case X86II::MRM1m: // for instructions that operate on
case X86II::MRM2m: case X86II::MRM3m: // a MEMORY r/m operand and
case X86II::MRM4m: case X86II::MRM5m: // use the Mod/RM byte and a field
case X86II::MRM6m: case X86II::MRM7m: // to hold extended opcode data
...
break;
case X86II::MRMInitReg: // for instructions whose source and
... // destination are the same register
break;
}
The implementations of these case statements often first emit the opcode and then get the operand(s). Then, depending upon the operand, helper methods may be called to process the operand(s). For example, in X86CodeEmitter.cpp, for the X86II::AddRegFrm case, the first data emitted (by emitByte) is the opcode added to the register operand. Then an object representing the machine operand, MO1, is extracted. Helper methods such as isImmediate, isGlobalAddress, isExternalSymbol, isConstantPoolIndex, and isJumpTableIndex determine the operand type. (X86CodeEmitter.cpp also has private methods such as emitConstant, emitGlobalAddress, emitExternalSymbolAddress, emitConstPoolAddress, and emitJumpTableAddress that emit the data into the output stream.)
case X86II::AddRegFrm:
MCE.emitByte(BaseOpcode + getX86RegNum(MI.getOperand(CurOp++).getReg()));
if (CurOp != NumOps) {
const MachineOperand &MO1 = MI.getOperand(CurOp++);
unsigned Size = X86InstrInfo::sizeOfImm(Desc);
if (MO1.isImmediate())
emitConstant(MO1.getImm(), Size);
else {
unsigned rt = Is64BitMode ? X86::reloc_pcrel_word
: (IsPIC ? X86::reloc_picrel_word : X86::reloc_absolute_word);
if (Opcode == X86::MOV64ri)
rt = X86::reloc_absolute_dword; // FIXME: add X86II flag?
if (MO1.isGlobalAddress()) {
bool NeedStub = isa<Function>(MO1.getGlobal());
bool isLazy = gvNeedsLazyPtr(MO1.getGlobal());
emitGlobalAddress(MO1.getGlobal(), rt, MO1.getOffset(), 0,
NeedStub, isLazy);
} else if (MO1.isExternalSymbol())
emitExternalSymbolAddress(MO1.getSymbolName(), rt);
else if (MO1.isConstantPoolIndex())
emitConstPoolAddress(MO1.getIndex(), rt);
else if (MO1.isJumpTableIndex())
emitJumpTableAddress(MO1.getIndex(), rt);
}
}
break;
In the previous example, XXXCodeEmitter.cpp uses the variable rt, which is a RelocationType enum value that may be used to relocate addresses (for example, a global address with a PIC base offset). The RelocationType enum for that target is defined in the short target-specific XXXRelocations.h file. The RelocationType is used by the relocate method defined in XXXJITInfo.cpp to rewrite addresses for referenced global symbols.
For example, X86Relocations.h specifies the following relocation types for X86 addresses. In all four cases, the relocated value is added to the value already in memory. For reloc_pcrel_word and reloc_picrel_word, there is an additional initial adjustment.
enum RelocationType {
reloc_pcrel_word = 0, // add reloc value after adjusting for the PC loc
reloc_picrel_word = 1, // add reloc value after adjusting for the PIC base
reloc_absolute_word = 2, // absolute relocation; no additional adjustment
reloc_absolute_dword = 3 // absolute relocation; no additional adjustment
};
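An abbreviated sketch of the corresponding relocate method (condensed from the shape of the X86 implementation of this era; only the PC-relative case is shown, and the remaining cases are elided):
void X86JITInfo::relocate(void *Function, MachineRelocation *MR,
                          unsigned NumRelocs, unsigned char *GOTBase) {
  for (unsigned i = 0; i != NumRelocs; ++i, ++MR) {
    void *RelocPos = (char*)Function + MR->getMachineCodeOffset();
    intptr_t ResultPtr = (intptr_t)MR->getResultPointer();
    switch ((X86::RelocationType)MR->getRelocationType()) {
    case X86::reloc_pcrel_word:
      // PC-relative: adjust for the location being patched, then add
      // the relocated value to what is already in memory.
      ResultPtr = ResultPtr - (intptr_t)RelocPos - 4;
      *((unsigned*)RelocPos) += (unsigned)ResultPtr;
      break;
    // ... reloc_picrel_word, reloc_absolute_word, reloc_absolute_dword ...
    }
  }
}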
Target JIT Info¶
XXXJITInfo.cpp implements the JIT interfaces for target-specific code-generation activities, such as emitting machine code and stubs. At minimum, a target-specific version of XXXJITInfo implements the following:
- getLazyResolverFunction — Initializes the JIT, gives the target a function that is used for compilation.
- emitFunctionStub — Returns a native function with a specified address for a callback function.
- relocate — Changes the addresses of referenced globals, based on relocation types.
- Callback functions that are wrappers to a function stub, used when the real target is not initially known.
getLazyResolverFunction is generally trivial to implement. It stores the incoming parameter in the global JITCompilerFunction and returns the callback function that will be used as a function wrapper. For the Alpha target (in AlphaJITInfo.cpp), the getLazyResolverFunction implementation is simply:
TargetJITInfo::LazyResolverFn AlphaJITInfo::getLazyResolverFunction(
JITCompilerFn F) {
JITCompilerFunction = F;
return AlphaCompilationCallback;
}
For the X86 target, the getLazyResolverFunction implementation is a little more complicated, because it returns a different callback function for processors with SSE instructions and XMM registers.
The callback function initially saves and later restores the callee register values, incoming arguments, and frame and return address. The callback function needs low-level access to the registers or stack, so it is typically implemented with assembler.
Garbage Collection with LLVM¶
- Abstract
- Quick Start
- Introduction
- LLVM IR Features
- Built In GC Strategies
- Custom GC Strategies
- Collector Requirements
- Implementing a collector plugin
- Overview of available features
- Computing stack maps
- Initializing roots to null: InitRoots
- Custom lowering of intrinsics: CustomRoots, CustomReadBarriers, and CustomWriteBarriers
- Generating safe points: NeededSafePoints
- Emitting assembly code: GCMetadataPrinter
- References
Abstract¶
This document covers how to integrate LLVM into a compiler for a language which supports garbage collection. Note that LLVM itself does not provide a garbage collector. You must provide your own.
Quick Start¶
First, you should pick a collector strategy. LLVM includes a number of built in ones, but you can also implement a loadable plugin with a custom definition. Note that the collector strategy is a description of how LLVM should generate code such that it interacts with your collector and runtime, not a description of the collector itself.
Next, mark your generated functions as using your chosen collector strategy. From c++, you can call:
F.setGC(<collector description name>);
This will produce IR like the following fragment:
define void @foo() gc "<collector description name>" { ... }
When generating LLVM IR for your functions, you will need to:
- Use @llvm.gcread and/or @llvm.gcwrite in place of standard load and store instructions. These intrinsics are used to represent load and store barriers. If your collector does not require such barriers, you can skip this step.
- Use the memory allocation routines provided by your garbage collector’s runtime library.
- If your collector requires them, generate type maps according to your runtime’s binary interface. LLVM is not involved in this process. In particular, the LLVM type system is not suitable for conveying such information through the compiler.
- Insert any coordination code required for interacting with your collector. Many collectors require running application code to periodically check a flag and conditionally call a runtime function. This is often referred to as a safepoint poll.
You will need to identify roots (i.e. references to heap objects your collector needs to know about) in your generated IR, so that LLVM can encode them into your final stack maps. Depending on the collector strategy chosen, this is accomplished by using either the @llvm.gcroot intrinsics or a gc.statepoint relocation sequence.
Don’t forget to create a root for each intermediate value that is generated when evaluating an expression. In h(f(), g()), the result of f() could easily be collected if evaluating g() triggers a collection.
Finally, you need to link your runtime library with the generated program executable (for a static compiler) or ensure the appropriate symbols are available for the runtime linker (for a JIT compiler).
Introduction¶
What is Garbage Collection?¶
Garbage collection is a widely used technique that frees the programmer from having to know the lifetimes of heap objects, making software easier to produce and maintain. Many programming languages rely on garbage collection for automatic memory management. There are two primary forms of garbage collection: conservative and accurate.
Conservative garbage collection often does not require any special support from either the language or the compiler: it can handle non-type-safe programming languages (such as C/C++) and does not require any special information from the compiler. The Boehm collector is an example of a state-of-the-art conservative collector.
Accurate garbage collection requires the ability to identify all pointers in the program at run-time (which requires that the source-language be type-safe in most cases). Identifying pointers at run-time requires compiler support to locate all places that hold live pointer variables at run-time, including the processor stack and registers.
Conservative garbage collection is attractive because it does not require any special compiler support, but it does have problems. In particular, because the conservative garbage collector cannot know that a particular word in the machine is a pointer, it cannot move live objects in the heap (preventing the use of compacting and generational GC algorithms) and it can occasionally suffer from memory leaks due to integer values that happen to point to objects in the program. In addition, some aggressive compiler transformations can break conservative garbage collectors (though these seem rare in practice).
Accurate garbage collectors do not suffer from any of these problems, but they can suffer from degraded scalar optimization of the program. In particular, because the runtime must be able to identify and update all pointers active in the program, some optimizations are less effective. In practice, however, the locality and performance benefits of using aggressive garbage collection techniques dominate any low-level losses.
This document describes the mechanisms and interfaces provided by LLVM to support accurate garbage collection.
Goals and non-goals¶
LLVM’s intermediate representation provides garbage collection intrinsics that offer support for a broad class of collector models. For instance, the intrinsics permit:
- semi-space collectors
- mark-sweep collectors
- generational collectors
- incremental collectors
- concurrent collectors
- cooperative collectors
- reference counting
We hope that the support built into the LLVM IR is sufficient to support a broad class of garbage collected languages including Scheme, ML, Java, C#, Perl, Python, Lua, Ruby, other scripting languages, and more.
Note that LLVM does not itself provide a garbage collector — this should be part of your language’s runtime library. LLVM provides a framework for describing the garbage collector’s requirements to the compiler. In particular, LLVM provides support for generating stack maps at call sites, polling for a safepoint, and emitting load and store barriers. You can also extend LLVM - possibly through a loadable code generation plugin - to generate code and data structures which conform to the binary interface specified by the runtime library. This is similar to the relationship between LLVM and DWARF debugging info, for example. The difference primarily lies in the lack of an established standard in the domain of garbage collection — thus the need for a flexible extension mechanism.
The aspects of the binary interface with which LLVM’s GC support is concerned are:
- Creation of GC safepoints within code where collection is allowed to execute safely.
- Computation of the stack map. For each safe point in the code, object references within the stack frame must be identified so that the collector may traverse and perhaps update them.
- Write barriers when storing object references to the heap. These are commonly used to optimize incremental scans in generational collectors.
- Emission of read barriers when loading object references. These are useful for interoperating with concurrent collectors.
There are additional areas that LLVM does not directly address:
- Registration of global roots with the runtime.
- Registration of stack map entries with the runtime.
- The functions used by the program to allocate memory, trigger a collection, etc.
- Computation or compilation of type maps, or registration of them with the runtime. These are used to crawl the heap for object references.
In general, LLVM’s support for GC does not include features which can be adequately addressed with other features of the IR and does not specify a particular binary interface. On the plus side, this means that you should be able to integrate LLVM with an existing runtime. On the other hand, it can have the effect of leaving a lot of work for the developer of a novel language. We try to mitigate this by providing built in collector strategy descriptions that can work with many common collector designs and easy extension points. If you don’t already have a specific binary interface you need to support, we recommend trying to use one of these built in collector strategies.
LLVM IR Features¶
This section describes the garbage collection facilities provided by the LLVM intermediate representation. The exact behavior of these IR features is specified by the selected GC strategy description.
Specifying GC code generation: gc "..."¶
define <returntype> @name(...) gc "name" { ... }
The gc function attribute is used to specify the desired GC strategy to the compiler. Its programmatic equivalent is the setGC method of Function.
Setting gc "name" on a function triggers a search for a matching subclass of GCStrategy. Some collector strategies are built in. You can add others using either the loadable plugin mechanism or by patching your copy of LLVM. It is the selected GC strategy which defines the exact nature of the code generated to support GC. If none is found, the compiler will raise an error.
Specifying the GC style on a per-function basis allows LLVM to link together programs that use different garbage collection algorithms (or none at all).
Identifying GC roots on the stack¶
LLVM currently supports two different mechanisms for describing references in compiled code at safepoints. llvm.gcroot is the older mechanism; gc.statepoint has been added more recently. At the moment, you can choose either implementation (on a per GC strategy basis). Longer term, we will probably either migrate away from llvm.gcroot entirely, or substantially merge their implementations. Note that most new development work is focused on gc.statepoint.
Using gc.statepoint¶
This page contains detailed documentation for gc.statepoint.
Using llvm.gcroot¶
void @llvm.gcroot(i8** %ptrloc, i8* %metadata)
The llvm.gcroot intrinsic is used to inform LLVM that a stack variable references an object on the heap and is to be tracked for garbage collection. The exact impact on generated code is specified by the Function’s selected GC strategy. All calls to llvm.gcroot must reside inside the first basic block.
The first argument must be a value referring to an alloca instruction or a bitcast of an alloca. The second contains a pointer to metadata that should be associated with the pointer, and must be a constant or global value address. If your target collector uses tags, use a null pointer for metadata.
A compiler which performs manual SSA construction must ensure that SSA values representing GC references are stored into the alloca passed to the respective gcroot before every call site and reloaded after every call. A compiler which uses mem2reg to raise imperative code using alloca into SSA form need only add a call to @llvm.gcroot for those variables which are pointers into the GC heap.
It is also important to mark intermediate values with llvm.gcroot. For example, consider h(f(), g()). Beware leaking the result of f() in the case that g() triggers a collection. Note that stack variables must be initialized and marked with llvm.gcroot in the function’s prologue.
The %metadata argument can be used to avoid requiring heap objects to have ‘isa’ pointers or tag bits. [Appel89, Goldberg91, Tolmach94] If specified, its value will be tracked along with the location of the pointer in the stack frame.
Consider the following fragment of Java code:
{
Object X; // A null-initialized reference to an object
...
}
This block (which may be located in the middle of a function or in a loop nest) could be compiled to this LLVM code:
Entry:
;; In the entry block for the function, allocate the
;; stack space for X, which is an LLVM pointer.
%X = alloca %Object*
;; Tell LLVM that the stack space is a stack root.
;; Java has type-tags on objects, so we pass null as metadata.
%tmp = bitcast %Object** %X to i8**
call void @llvm.gcroot(i8** %tmp, i8* null)
...
;; "CodeBlock" is the block corresponding to the start
;; of the scope above.
CodeBlock:
;; Java null-initializes pointers.
store %Object* null, %Object** %X
...
;; As the pointer goes out of scope, store a null value into
;; it, to indicate that the value is no longer live.
store %Object* null, %Object** %X
...
Reading and writing references in the heap¶
Some collectors need to be informed when the mutator (the program that needs garbage collection) either reads a pointer from or writes a pointer to a field of a heap object. The code fragments inserted at these points are called read barriers and write barriers, respectively. The amount of code that needs to be executed is usually quite small and not on the critical path of any computation, so the overall performance impact of the barrier is tolerable.
Barriers often require access to the object pointer rather than the derived pointer (which is a pointer to the field within the object). Accordingly, these intrinsics take both pointers as separate arguments for completeness. In this snippet, %object is the object pointer, and %derived is the derived pointer:
;; An array type.
%class.Array = type { %class.Object, i32, [0 x %class.Object*] }
...
;; Load the object pointer from a gcroot.
%object = load %class.Array** %object_addr
;; Compute the derived pointer.
%derived = getelementptr %class.Array* %object, i32 0, i32 2, i32 %n
LLVM does not enforce this relationship between the object and derived pointer (although a particular collector strategy might). However, it would be an unusual collector that violated it.
The use of these intrinsics is naturally optional if the target GC does not require the corresponding barrier. The GC strategy used with such a collector should replace the intrinsic calls with the corresponding load or store instruction if they are used.
One known deficiency of the current design is that the barrier intrinsics do not include the size or alignment of the underlying operation. The operation is currently assumed to be of pointer size, with the target machine’s default alignment.
Write barrier: llvm.gcwrite¶
void @llvm.gcwrite(i8* %value, i8* %object, i8** %derived)
For write barriers, LLVM provides the llvm.gcwrite intrinsic function. It has exactly the same semantics as a non-volatile store to the derived pointer (the third argument). The exact code generated is specified by the Function’s selected GC strategy.
Many important algorithms require write barriers, including generational and concurrent collectors. Additionally, write barriers could be used to implement reference counting.
Read barrier: llvm.gcread¶
i8* @llvm.gcread(i8* %object, i8** %derived)
For read barriers, LLVM provides the llvm.gcread intrinsic function. It has exactly the same semantics as a non-volatile load from the derived pointer (the second argument). The exact code generated is specified by the Function’s selected GC strategy.
Read barriers are needed by fewer algorithms than write barriers, and may have a greater performance impact since pointer reads are more frequent than writes.
Built In GC Strategies¶
LLVM includes built in support for several varieties of garbage collectors.
The Shadow Stack GC¶
To use this collector strategy, mark your functions with:
F.setGC("shadow-stack");
Unlike many GC algorithms which rely on a cooperative code generator to compile stack maps, this algorithm carefully maintains a linked list of stack roots [Henderson2002]. This so-called “shadow stack” mirrors the machine stack. Maintaining this data structure is slower than using a stack map compiled into the executable as constant data, but has a significant portability advantage because it requires no special support from the target code generator, and does not require tricky platform-specific code to crawl the machine stack.
The tradeoff for this simplicity and portability is:
- High overhead per function call.
- Not thread-safe.
Still, it’s an easy way to get started. After your compiler and runtime are up and running, writing a plugin will allow you to take advantage of more advanced GC features of LLVM in order to improve performance.
The shadow stack doesn’t imply a memory allocation algorithm. A semispace collector or building atop malloc are great places to start, and can be implemented with very little code.
When it comes time to collect, however, your runtime needs to traverse the stack roots, and for this it needs to integrate with the shadow stack. Luckily, doing so is very simple. (This code is heavily commented to help you understand the data structure, but there are only 20 lines of meaningful code.)
/// @brief The map for a single function's stack frame. One of these is
/// compiled as constant data into the executable for each function.
///
/// Storage of metadata values is elided if the %metadata parameter to
/// @llvm.gcroot is null.
struct FrameMap {
int32_t NumRoots; //< Number of roots in stack frame.
int32_t NumMeta; //< Number of metadata entries. May be < NumRoots.
const void *Meta[0]; //< Metadata for each root.
};
/// @brief A link in the dynamic shadow stack. One of these is embedded in
/// the stack frame of each function on the call stack.
struct StackEntry {
StackEntry *Next; //< Link to next stack entry (the caller's).
const FrameMap *Map; //< Pointer to constant FrameMap.
void *Roots[0]; //< Stack roots (in-place array).
};
/// @brief The head of the singly-linked list of StackEntries. Functions push
/// and pop onto this in their prologue and epilogue.
///
/// Since there is only a global list, this technique is not threadsafe.
StackEntry *llvm_gc_root_chain;
/// @brief Calls Visitor(root, meta) for each GC root on the stack.
/// root and meta are exactly the values passed to
/// @llvm.gcroot.
///
/// Visitor could be a function to recursively mark live objects. Or it
/// might copy them to another heap or generation.
///
/// @param Visitor A function to invoke for every GC root on the stack.
void visitGCRoots(void (*Visitor)(void **Root, const void *Meta)) {
for (StackEntry *R = llvm_gc_root_chain; R; R = R->Next) {
unsigned i = 0;
// For roots [0, NumMeta), the metadata pointer is in the FrameMap.
for (unsigned e = R->Map->NumMeta; i != e; ++i)
Visitor(&R->Roots[i], R->Map->Meta[i]);
// For roots [NumMeta, NumRoots), the metadata pointer is null.
for (unsigned e = R->Map->NumRoots; i != e; ++i)
Visitor(&R->Roots[i], NULL);
}
}
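As a usage sketch, a simple mark phase can be built directly on this walker. Here markObject is an assumed routine provided by your runtime, not part of LLVM:
// Hypothetical mark phase built on visitGCRoots. markObject is assumed
// to be provided by the runtime; it recursively marks a live object.
extern void markObject(void *Obj);
static void markRoot(void **Root, const void *Meta) {
  (void)Meta; // this runtime ignores per-root metadata
  if (*Root)
    markObject(*Root);
}
void runMarkPhase() {
  visitGCRoots(markRoot);
}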
The ‘Erlang’ and ‘Ocaml’ GCs¶
LLVM ships with two example collectors which leverage the gcroot mechanisms. To our knowledge, these are not actually used by any language runtime, but they do provide a reasonable starting point for someone interested in writing a gcroot-compatible GC plugin. In particular, these are the only in-tree examples of how to produce a custom binary stack map format using a gcroot strategy.
As their names imply, the binary format produced is intended to model that used by the Erlang and OCaml compilers respectively.
The Statepoint Example GC¶
F.setGC("statepoint-example");
This GC provides an example of how one might use the infrastructure provided by gc.statepoint. This example GC is compatible with the PlaceSafepoints and RewriteStatepointsForGC utility passes which simplify gc.statepoint sequence insertion. If you need to build a custom GC strategy around the gc.statepoint mechanisms, it is recommended that you use this one as a starting point.
This GC strategy does not support read or write barriers. As a result, these intrinsics are lowered to normal loads and stores.
The stack map format generated by this GC strategy can be found in the Stack Map Section using a format documented here. This format is intended to be the standard format supported by LLVM going forward.
The CoreCLR GC¶
F.setGC("coreclr");
This GC leverages the gc.statepoint mechanism to support the CoreCLR runtime.
Support for this GC strategy is a work in progress. This strategy will differ from the statepoint-example GC strategy in certain aspects, such as:
- Base-pointers of interior pointers are not explicitly tracked and reported.
- A different format is used for encoding stack maps.
- Safe-point polls are only needed before loop-back edges and before tail-calls (not needed at function-entry).
Custom GC Strategies¶
If none of the built in GC strategy descriptions above meets your needs, you will need to define a custom GCStrategy and possibly a custom LLVM pass to perform lowering. The best place to start when defining a custom GCStrategy is to look at one of the built in strategies.
You may be able to structure this additional code as a loadable plugin library. Loadable plugins are sufficient if all you need is to enable a different combination of built in functionality, but if you need to provide a custom lowering pass, you will need to build a patched version of LLVM. If you think you need a patched build, please ask for advice on llvm-dev. There may be an easy way we can extend the support to make it work for your use case without requiring a custom build.
Collector Requirements¶
You should be able to leverage any existing collector library that includes the following elements:
- A memory allocator which exposes an allocation function your compiled code can call.
- A binary format for the stack map. A stack map describes the location of references at a safepoint and is used by precise collectors to identify references within a stack frame on the machine stack. Note that collectors which conservatively scan the stack don’t require such a structure.
- A stack crawler to discover functions on the call stack, and enumerate the references listed in the stack map for each call site.
- A mechanism for identifying references in global locations (e.g. global variables).
- If your collector requires them, an LLVM IR implementation of your collector’s load and store barriers. Note that since many collectors don’t require barriers at all, LLVM defaults to lowering such barriers to normal loads and stores unless you arrange otherwise.
Implementing a collector plugin¶
User code specifies which GC code generation to use with the gc function attribute or, equivalently, with the setGC method of Function.
To implement a GC plugin, it is necessary to subclass llvm::GCStrategy, which can be accomplished in a few lines of boilerplate code. LLVM’s infrastructure provides access to several important algorithms. For an uncontroversial collector, all that remains may be to compile LLVM’s computed stack map to assembly code (using the binary representation expected by the runtime library). This can be accomplished in about 100 lines of code.
This is not the appropriate place to implement a garbage collected heap or a garbage collector itself. That code should exist in the language’s runtime library. The compiler plugin is responsible for generating code which conforms to the binary interface defined by the library, most essentially the stack map.
To subclass llvm::GCStrategy and register it with the compiler:
// lib/MyGC/MyGC.cpp - Example LLVM GC plugin
#include "llvm/CodeGen/GCStrategy.h"
#include "llvm/CodeGen/GCMetadata.h"
#include "llvm/Support/Compiler.h"
using namespace llvm;
namespace {
class LLVM_LIBRARY_VISIBILITY MyGC : public GCStrategy {
public:
MyGC() {}
};
GCRegistry::Add<MyGC>
X("mygc", "My bespoke garbage collector.");
}
This boilerplate collector does nothing. More specifically:
- llvm.gcread calls are replaced with the corresponding load instruction.
- llvm.gcwrite calls are replaced with the corresponding store instruction.
- No safe points are added to the code.
- The stack map is not compiled into the executable.
Using the LLVM makefiles, this code can be compiled as a plugin using a simple makefile:
# lib/MyGC/Makefile
LEVEL := ../..
LIBRARYNAME = MyGC
LOADABLE_MODULE = 1
include $(LEVEL)/Makefile.common
Once the plugin is compiled, code using it may be compiled using llc -load=MyGC.so (though MyGC.so may have some other platform-specific extension):
$ cat sample.ll
define void @f() gc "mygc" {
entry:
ret void
}
$ llvm-as < sample.ll | llc -load=MyGC.so
It is also possible to statically link the collector plugin into tools, such as a language-specific compiler front-end.
Overview of available features¶
GCStrategy provides a range of features through which a plugin may do useful work. Some of these are callbacks; some are algorithms that can be enabled, disabled, or customized. This matrix summarizes the supported (and planned) features and correlates them with the collection techniques which typically require them.
Algorithm | Done | shadow stack | refcount | mark-sweep | copying | incremental | threaded | concurrent
---|---|---|---|---|---|---|---|---
stack map | ✔ | | | ✘ | ✘ | ✘ | ✘ | ✘
initialize roots | ✔ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘
derived pointers | NO | | | | | | N* | N*
custom lowering | ✔ | | | | | | |
gcroot | ✔ | ✘ | ✘ | | | | |
gcwrite | ✔ | | ✘ | | | ✘ | | ✘
gcread | ✔ | | | | | | | ✘
safe points | | | | | | | |
in calls | ✔ | | | ✘ | ✘ | ✘ | ✘ | ✘
before calls | ✔ | | | | | | ✘ | ✘
for loops | NO | | | | | | N | N
before escape | ✔ | | | | | | ✘ | ✘
emit code at safe points | NO | | | | | | N | N
output | | | | | | | |
assembly | ✔ | | | ✘ | ✘ | ✘ | ✘ | ✘
JIT | NO | | | ? | ? | ? | ? | ?
obj | NO | | | ? | ? | ? | ? | ?
live analysis | NO | | | ? | ? | ? | ? | ?
register map | NO | | | ? | ? | ? | ? | ?
* Derived pointers only pose a hazard to copying collections.
? denotes a feature which could be utilized if available.
To be clear, the collection techniques above are defined as:
- Shadow Stack: The mutator carefully maintains a linked list of stack roots.
- Reference Counting: The mutator maintains a reference count for each object and frees an object when its count falls to zero.
- Mark-Sweep: When the heap is exhausted, the collector marks reachable objects starting from the roots, then deallocates unreachable objects in a sweep phase.
- Copying: As reachability analysis proceeds, the collector copies objects from one heap area to another, compacting them in the process. Copying collectors enable highly efficient “bump pointer” allocation and can improve locality of reference.
- Incremental: (Including generational collectors.) Incremental collectors generally have all the properties of a copying collector (regardless of whether the mature heap is compacting), but bring the added complexity of requiring write barriers.
- Threaded: Denotes a multithreaded mutator; the collector must still stop the mutator (“stop the world”) before beginning reachability analysis. Stopping a multithreaded mutator is a complicated problem. It generally requires highly platform-specific code in the runtime, and the production of carefully designed machine code at safe points.
- Concurrent: In this technique, the mutator and the collector run concurrently, with the goal of eliminating pause times. In a cooperative collector, the mutator further aids with collection should a pause occur, allowing collection to take advantage of multiprocessor hosts. The “stop the world” problem of threaded collectors is generally still present to a limited extent. Sophisticated marking algorithms are necessary. Read barriers may be necessary.
As the matrix indicates, LLVM’s garbage collection infrastructure is already suitable for a wide variety of collectors, but does not currently extend to multithreaded programs. This will be added in the future as there is interest.
Computing stack maps¶
LLVM automatically computes a stack map. One of the most important features of a GCStrategy is to compile this information into the executable in the binary representation expected by the runtime library.
The stack map consists of the location and identity of each GC root in each function in the module. For each root:
- RootNum: The index of the root.
- StackOffset: The offset of the object relative to the frame pointer.
- RootMetadata: The value passed as the %metadata parameter to the @llvm.gcroot intrinsic.
Also, for the function as a whole:
- getFrameSize(): The overall size of the function’s initial stack frame, not accounting for any dynamic allocation.
- roots_size(): The count of roots in the function.
To access the stack map, use GCFunctionMetadata::roots_begin() and -end() from the GCMetadataPrinter:
for (iterator I = begin(), E = end(); I != E; ++I) {
GCFunctionInfo *FI = *I;
unsigned FrameSize = FI->getFrameSize();
size_t RootCount = FI->roots_size();
for (GCFunctionInfo::roots_iterator RI = FI->roots_begin(),
RE = FI->roots_end();
RI != RE; ++RI) {
int RootNum = RI->Num;
int RootStackOffset = RI->StackOffset;
Constant *RootMetadata = RI->Metadata;
}
}
If the llvm.gcroot intrinsic is eliminated before code generation by a custom lowering pass, LLVM will compute an empty stack map. This may be useful for collector plugins which implement reference counting or a shadow stack.
Initializing roots to null: InitRoots¶
MyGC::MyGC() {
InitRoots = true;
}
When set, LLVM will automatically initialize each root to null upon entry to the function. This prevents the GC’s sweep phase from visiting uninitialized pointers, which will almost certainly cause it to crash. This initialization occurs before custom lowering, so the two may be used together.
Since LLVM does not yet compute liveness information, there is no means of distinguishing an uninitialized stack root from an initialized one. Therefore, this feature should be used by all GC plugins. It is enabled by default.
Custom lowering of intrinsics: CustomRoots, CustomReadBarriers, and CustomWriteBarriers¶
For GCs which use barriers or unusual treatment of stack roots, these flags allow the collector to perform arbitrary transformations of the LLVM IR:
class MyGC : public GCStrategy {
public:
MyGC() {
CustomRoots = true;
CustomReadBarriers = true;
CustomWriteBarriers = true;
}
};
If any of these flags are set, LLVM suppresses its default lowering for the corresponding intrinsics. Instead, you must provide a custom Pass which lowers the intrinsics as desired. If you have opted in to custom lowering of a particular intrinsic, your pass must eliminate all instances of the corresponding intrinsic in functions which opt in to your GC. The best example of such a pass is the ShadowStackGC and its ShadowStackGCLowering pass.
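As a rough sketch (a hypothetical standalone helper, not the ShadowStackGC code), lowering llvm.gcwrite calls back into the plain stores they stand for might look like:
// Hypothetical lowering helper: rewrite every llvm.gcwrite in F into the
// plain store it represents. A real pass would also consult the gc
// attribute and handle llvm.gcread and llvm.gcroot as needed.
#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"
using namespace llvm;
static bool lowerGCWrites(Function &F) {
  bool Changed = false;
  for (BasicBlock &BB : F)
    for (auto It = BB.begin(); It != BB.end();) {
      Instruction *I = &*It++; // advance first; we may erase I
      auto *II = dyn_cast<IntrinsicInst>(I);
      if (!II || II->getIntrinsicID() != Intrinsic::gcwrite)
        continue;
      // Operands are (value, object, derived); store value to derived.
      new StoreInst(II->getArgOperand(0), II->getArgOperand(2), II);
      II->eraseFromParent();
      Changed = true;
    }
  return Changed;
}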
There is currently no way to register such a custom lowering pass without building a custom copy of LLVM.
Generating safe points: NeededSafePoints¶
LLVM can compute four kinds of safe points:
namespace GC {
/// PointKind - The type of a collector-safe point.
///
enum PointKind {
Loop, //< Instr is a loop (backwards branch).
Return, //< Instr is a return instruction.
PreCall, //< Instr is a call instruction.
PostCall //< Instr is the return address of a call.
};
}
A collector can request any combination of the four by setting the NeededSafePoints mask:
MyGC::MyGC() {
NeededSafePoints = 1 << GC::Loop
| 1 << GC::Return
| 1 << GC::PreCall
| 1 << GC::PostCall;
}
It can then use the following routines to access safe points.
for (iterator I = begin(), E = end(); I != E; ++I) {
GCFunctionInfo *MD = *I;
size_t PointCount = MD->size();
for (GCFunctionInfo::iterator PI = MD->begin(),
PE = MD->end(); PI != PE; ++PI) {
GC::PointKind PointKind = PI->Kind;
unsigned PointNum = PI->Num;
}
}
Almost every collector requires PostCall safe points, since these correspond to the moments when the function is suspended during a call to a subroutine.
Threaded programs generally require Loop safe points to guarantee that the application will reach a safe point within a bounded amount of time, even if it is executing a long-running loop which contains no function calls.
Threaded collectors may also require Return and PreCall safe points to implement “stop the world” techniques using self-modifying code, where it is important that the program not exit the function without reaching a safe point (because only the topmost function has been patched).
Emitting assembly code: GCMetadataPrinter¶
LLVM allows a plugin to print arbitrary assembly code before and after the rest of a module’s assembly code. At the end of the module, the GC can compile the LLVM stack map into assembly code. (At the beginning, this information is not yet computed.)
Since AsmWriter and CodeGen are separate components of LLVM, a separate abstract base class and registry is provided for printing assembly code: the GCMetadataPrinter and GCMetadataPrinterRegistry. The AsmWriter will look for such a subclass if the GCStrategy sets UsesMetadata:
MyGC::MyGC() {
UsesMetadata = true;
}
This separation allows JIT-only clients to be smaller.
Note that LLVM does not currently have analogous APIs to support code generation in the JIT, nor using the object writers.
// lib/MyGC/MyGCPrinter.cpp - Example LLVM GC printer
#include "llvm/CodeGen/GCMetadataPrinter.h"
#include "llvm/Support/Compiler.h"
using namespace llvm;
namespace {
class LLVM_LIBRARY_VISIBILITY MyGCPrinter : public GCMetadataPrinter {
public:
virtual void beginAssembly(AsmPrinter &AP);
virtual void finishAssembly(AsmPrinter &AP);
};
GCMetadataPrinterRegistry::Add<MyGCPrinter>
X("mygc", "My bespoke garbage collector.");
}
The collector should use AsmPrinter to print portable assembly code. The collector itself contains the stack map for the entire module, and may access the GCFunctionInfo using its own begin() and end() methods. Here’s a realistic example:
#include "llvm/CodeGen/AsmPrinter.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/Target/TargetAsmInfo.h"
#include "llvm/Target/TargetMachine.h"
void MyGCPrinter::beginAssembly(AsmPrinter &AP) {
// Nothing to do.
}
void MyGCPrinter::finishAssembly(AsmPrinter &AP) {
MCStreamer &OS = AP.OutStreamer;
unsigned IntPtrSize = AP.TM.getSubtargetImpl()->getDataLayout()->getPointerSize();
// Put this in the data section.
OS.SwitchSection(AP.getObjFileLowering().getDataSection());
// For each function...
for (iterator FI = begin(), FE = end(); FI != FE; ++FI) {
GCFunctionInfo &MD = **FI;
// A compact GC layout. Emit this data structure:
//
// struct {
// int32_t PointCount;
// void *SafePointAddress[PointCount];
// int32_t StackFrameSize; // in words
// int32_t StackArity;
// int32_t LiveCount;
// int32_t LiveOffsets[LiveCount];
// } __gcmap_<FUNCTIONNAME>;
// Align to address width.
AP.EmitAlignment(IntPtrSize == 4 ? 2 : 3);
// Emit PointCount.
OS.AddComment("safe point count");
AP.EmitInt32(MD.size());
// And each safe point...
for (GCFunctionInfo::iterator PI = MD.begin(),
PE = MD.end(); PI != PE; ++PI) {
// Emit the address of the safe point.
OS.AddComment("safe point address");
MCSymbol *Label = PI->Label;
AP.EmitLabelPlusOffset(Label/*Hi*/, 0/*Offset*/, 4/*Size*/);
}
// Stack information never changes across safe points! Only print info
// from the first call site.
GCFunctionInfo::iterator PI = MD.begin();
// Emit the stack frame size.
OS.AddComment("stack frame size (in words)");
AP.EmitInt32(MD.getFrameSize() / IntPtrSize);
// Emit stack arity, i.e. the number of stacked arguments.
unsigned RegisteredArgs = IntPtrSize == 4 ? 5 : 6;
unsigned StackArity = MD.getFunction().arg_size() > RegisteredArgs ?
MD.getFunction().arg_size() - RegisteredArgs : 0;
OS.AddComment("stack arity");
AP.EmitInt32(StackArity);
// Emit the number of live roots in the function.
OS.AddComment("live root count");
AP.EmitInt32(MD.live_size(PI));
// And for each live root...
for (GCFunctionInfo::live_iterator LI = MD.live_begin(PI),
LE = MD.live_end(PI);
LI != LE; ++LI) {
// Emit live root's offset within the stack frame.
OS.AddComment("stack index (offset / wordsize)");
AP.EmitInt32(LI->StackOffset);
}
}
}
References¶
[Appel89] Runtime Tags Aren’t Necessary. Andrew W. Appel. Lisp and Symbolic Computation 19(7):703-705, July 1989.
[Goldberg91] Tag-free garbage collection for strongly typed programming languages. Benjamin Goldberg. ACM SIGPLAN PLDI‘91.
[Tolmach94] Tag-free garbage collection using explicit type parameters. Andrew Tolmach. Proceedings of the 1994 ACM conference on LISP and functional programming.
[Henderson2002] Accurate Garbage Collection in an Uncooperative Environment. Fergus Henderson. Proceedings of the 2002 International Symposium on Memory Management.
Writing an LLVM Pass¶
- Introduction — What is a pass?
- Quick Start — Writing hello world
- Pass classes and requirements
- Pass Statistics
- Registering dynamically loaded passes
Introduction — What is a pass?¶
The LLVM Pass Framework is an important part of the LLVM system, because LLVM passes are where most of the interesting parts of the compiler exist. Passes perform the transformations and optimizations that make up the compiler, they build the analysis results that are used by these transformations, and they are, above all, a structuring technique for compiler code.
All LLVM passes are subclasses of the Pass class, which implement functionality by overriding virtual methods inherited from Pass. Depending on how your pass works, you should inherit from the ModulePass, CallGraphSCCPass, FunctionPass, LoopPass, RegionPass, or BasicBlockPass class, which gives the system more information about what your pass does, and how it can be combined with other passes. One of the main features of the LLVM Pass Framework is that it schedules passes to run in an efficient way based on the constraints that your pass meets (which are indicated by which class they derive from).
We start by showing you how to construct a pass, everything from setting up the code, to compiling, loading, and executing it. After the basics are down, more advanced features are discussed.
Quick Start — Writing hello world¶
Here we describe how to write the “hello world” of passes. The “Hello” pass is designed to simply print out the names of the non-external functions in the program being compiled. It does not modify the program at all; it just inspects it. The source code and files for this pass are available in the LLVM source tree in the lib/Transforms/Hello directory.
Setting up the build environment¶
First, configure and build LLVM. This needs to be done directly inside the LLVM source tree rather than in a separate objects directory. Next, you need to create a new directory somewhere in the LLVM source base. For this example, we’ll assume that you made lib/Transforms/Hello. Finally, you must set up a build script (Makefile) that will compile the source code for the new pass. To do this, copy the following into Makefile:
# Makefile for hello pass
# Path to top level of LLVM hierarchy
LEVEL = ../../..
# Name of the library to build
LIBRARYNAME = Hello
# Make the shared library become a loadable module so the tools can
# dlopen/dlsym on the resulting library.
LOADABLE_MODULE = 1
# Include the makefile implementation stuff
include $(LEVEL)/Makefile.common
This makefile specifies that all of the .cpp files in the current directory are to be compiled and linked together into a shared object $(LEVEL)/Debug+Asserts/lib/Hello.so that can be dynamically loaded by the opt or bugpoint tools via their -load options.
If your operating system uses a suffix other than .so (such as Windows or Mac OS X), the appropriate extension will be used.
If you are using CMake to build LLVM, see Developing LLVM passes out of source.
Now that we have the build scripts set up, we just need to write the code for the pass itself.
Basic code required¶
Now that we have a way to compile our new pass, we just have to write it. Start out with:
#include "llvm/Pass.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/raw_ostream.h"
These are needed because we are writing a Pass, we are operating on Functions, and we will be doing some printing.
Next we have:
using namespace llvm;
... which is required because the functions from the include files live in the llvm namespace.
Next we have:
namespace {
... which starts out an anonymous namespace. Anonymous namespaces are to C++ what the “static” keyword is to C (at global scope). It makes the things declared inside of the anonymous namespace visible only to the current file. If you’re not familiar with them, consult a decent C++ book for more information.
Next, we declare our pass itself:
struct Hello : public FunctionPass {
This declares a “Hello” class that is a subclass of FunctionPass. The different builtin pass subclasses are described in detail later, but for now, know that FunctionPass operates on a function at a time.
static char ID;
Hello() : FunctionPass(ID) {}
This declares a pass identifier used by LLVM to identify the pass. This allows LLVM to avoid using expensive C++ runtime type information.
bool runOnFunction(Function &F) override {
errs() << "Hello: ";
errs().write_escaped(F.getName()) << "\n";
return false;
}
}; // end of struct Hello
} // end of anonymous namespace
We declare a runOnFunction method, which overrides an abstract virtual method inherited from FunctionPass. This is where we are supposed to do our thing, so we just print out our message with the name of each function.
char Hello::ID = 0;
We initialize the pass ID here. LLVM uses the ID’s address to identify a pass, so the initialization value is not important.
static RegisterPass<Hello> X("hello", "Hello World Pass",
false /* Only looks at CFG */,
false /* Analysis Pass */);
Lastly, we register our class Hello, giving it a command line argument “hello”, and a name “Hello World Pass”. The last two arguments describe its behavior: if a pass walks the CFG without modifying it, then the third argument is set to true; if a pass is an analysis pass (for example, a dominator tree pass), then true is supplied as the fourth argument.
As a whole, the .cpp file looks like:
#include "llvm/Pass.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;
namespace {
struct Hello : public FunctionPass {
static char ID;
Hello() : FunctionPass(ID) {}
bool runOnFunction(Function &F) override {
errs() << "Hello: ";
errs().write_escaped(F.getName()) << '\n';
return false;
}
};
}
char Hello::ID = 0;
static RegisterPass<Hello> X("hello", "Hello World Pass", false, false);
Now that it’s all together, compile the file with a simple “gmake” command in the local directory and you should get a new file “Debug+Asserts/lib/Hello.so” under the top level directory of the LLVM source tree (not in the local directory). Note that everything in this file is contained in an anonymous namespace — this reflects the fact that passes are self contained units that do not need external interfaces (although they can have them) to be useful.
Running a pass with opt¶
Now that you have a brand new shiny shared object file, we can use the opt command to run an LLVM program through your pass. Because you registered your pass with RegisterPass, you will be able to use the opt tool to access it, once loaded.
To test it, follow the example at the end of the Getting Started with the LLVM System to compile “Hello World” to LLVM. We can now run the bitcode file (hello.bc) for the program through our transformation like this (of course, any bitcode file will work):
$ opt -load ../../../Debug+Asserts/lib/Hello.so -hello < hello.bc > /dev/null
Hello: __main
Hello: puts
Hello: main
The -load option specifies that opt should load your pass as a shared object, which makes “-hello” a valid command line argument (which is one reason you need to register your pass). Because the Hello pass does not modify the program in any interesting way, we just throw away the result of opt (sending it to /dev/null).
To see what happened to the other string you registered, try running opt with the -help option:
$ opt -load ../../../Debug+Asserts/lib/Hello.so -help
OVERVIEW: llvm .bc -> .bc modular optimizer
USAGE: opt [options] <input bitcode>
OPTIONS:
Optimizations available:
...
-globalopt - Global Variable Optimizer
-globalsmodref-aa - Simple mod/ref analysis for globals
-gvn - Global Value Numbering
-hello - Hello World Pass
-indvars - Induction Variable Simplification
-inline - Function Integration/Inlining
...
The pass name gets added as the information string for your pass, giving some documentation to users of opt. Now that you have a working pass, you would go ahead and make it do the cool transformations you want. Once you get it all working and tested, it may become useful to find out how fast your pass is. The PassManager provides a nice command line option (--time-passes) that allows you to get information about the execution time of your pass along with the other passes you queue up. For example:
$ opt -load ../../../Debug+Asserts/lib/Hello.so -hello -time-passes < hello.bc > /dev/null
Hello: __main
Hello: puts
Hello: main
===============================================================================
... Pass execution timing report ...
===============================================================================
Total Execution Time: 0.02 seconds (0.0479059 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Pass Name ---
0.0100 (100.0%) 0.0000 ( 0.0%) 0.0100 ( 50.0%) 0.0402 ( 84.0%) Bitcode Writer
0.0000 ( 0.0%) 0.0100 (100.0%) 0.0100 ( 50.0%) 0.0031 ( 6.4%) Dominator Set Construction
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0013 ( 2.7%) Module Verifier
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0033 ( 6.9%) Hello World Pass
0.0100 (100.0%) 0.0100 (100.0%) 0.0200 (100.0%) 0.0479 (100.0%) TOTAL
As you can see, our implementation above is pretty fast. The additional passes listed are automatically inserted by the opt tool to verify that the LLVM emitted by your pass is still valid and well formed, and hasn’t been broken somehow.
Now that you have seen the basics of the mechanics behind passes, we can talk about some more details of how they work and how to use them.
Pass classes and requirements¶
One of the first things that you should do when designing a new pass is to decide what class you should subclass for your pass. The Hello World example uses the FunctionPass class for its implementation, but we did not discuss why or when this should occur. Here we talk about the classes available, from the most general to the most specific.
When choosing a superclass for your Pass, you should choose the most specific class possible, while still being able to meet the requirements listed. This gives the LLVM Pass Infrastructure information necessary to optimize how passes are run, so that the resultant compiler isn’t unnecessarily slow.
The ImmutablePass class¶
The most plain and boring type of pass is the “ImmutablePass” class. This pass type is used for passes that do not have to be run, do not change state, and never need to be updated. This is not a normal type of transformation or analysis, but can provide information about the current compiler configuration.
Although this pass class is very infrequently used, it is important for providing information about the current target machine being compiled for, and other static information that can affect the various transformations.
ImmutablePasses never invalidate other transformations, are never invalidated, and are never “run”.
The ModulePass class¶
The ModulePass class is the most general of all superclasses that you can use. Deriving from ModulePass indicates that your pass uses the entire program as a unit, referring to function bodies in no predictable order, or adding and removing functions. Because nothing is known about the behavior of ModulePass subclasses, no optimization can be done for their execution.
A module pass can use function level passes (e.g. dominators) using the getAnalysis interface getAnalysis<DominatorTree>(llvm::Function *) to provide the function to retrieve the analysis result for, if the function pass does not require any module or immutable passes. Note that this can only be done for functions for which the analysis ran; e.g., in the case of dominators, you should only ask for the DominatorTree for function definitions, not declarations.
To write a correct ModulePass subclass, derive from ModulePass and overload the runOnModule method with the following signature:
The runOnModule method¶
virtual bool runOnModule(Module &M) = 0;
The runOnModule method performs the interesting work of the pass. It should return true if the module was modified by the transformation and false otherwise.
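For instance, a minimal sketch of a complete ModulePass (a hypothetical pass that only counts function definitions) follows the same pattern as the Hello example:
#include "llvm/Pass.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;
namespace {
// Hypothetical example: count the function definitions in a module.
struct CountDefs : public ModulePass {
  static char ID;
  CountDefs() : ModulePass(ID) {}
  bool runOnModule(Module &M) override {
    unsigned N = 0;
    for (Function &F : M)
      if (!F.isDeclaration())
        ++N;
    errs() << "module has " << N << " function definitions\n";
    return false; // the module was not modified
  }
};
}
char CountDefs::ID = 0;
static RegisterPass<CountDefs> X("countdefs", "Count function definitions",
                                 false, false);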
The CallGraphSCCPass class¶
The CallGraphSCCPass is used by passes that need to traverse the program bottom-up on the call graph (callees before callers). Deriving from CallGraphSCCPass provides some mechanics for building and traversing the CallGraph, but also allows the system to optimize execution of CallGraphSCCPasses. If your pass meets the requirements outlined below, and doesn’t meet the requirements of a FunctionPass or BasicBlockPass, you should derive from CallGraphSCCPass.
TODO: explain briefly what SCC, Tarjan’s algo, and B-U mean.
To be explicit, CallGraphSCCPass subclasses are:
- ... not allowed to inspect or modify any Functions other than those in the current SCC and the direct callers and direct callees of the SCC.
- ... required to preserve the current CallGraph object, updating it to reflect any changes made to the program.
- ... not allowed to add or remove SCC’s from the current Module, though they may change the contents of an SCC.
- ... allowed to add or remove global variables from the current Module.
- ... allowed to maintain state across invocations of runOnSCC (including global data).
Implementing a CallGraphSCCPass is slightly tricky in some cases because it has to handle SCCs with more than one node in them. All of the virtual methods described below should return true if they modified the program, or false if they didn’t.
The doInitialization(CallGraph &) method¶
virtual bool doInitialization(CallGraph &CG);
The doInitialization method is allowed to do most of the things that CallGraphSCCPasses are not allowed to do. They can add and remove functions, get pointers to functions, etc. The doInitialization method is designed to do simple initialization type of stuff that does not depend on the SCCs being processed. The doInitialization method call is not scheduled to overlap with any other pass executions (thus it should be very fast).
The runOnSCC method¶
virtual bool runOnSCC(CallGraphSCC &SCC) = 0;
The runOnSCC method performs the interesting work of the pass, and should return true if the module was modified by the transformation, false otherwise.
The doFinalization(CallGraph &) method¶
virtual bool doFinalization(CallGraph &CG);
The doFinalization method is an infrequently used method that is called when the pass framework has finished calling runOnSCC for every SCC in the program being compiled.
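A minimal sketch of a complete CallGraphSCCPass (a hypothetical pass that only prints each SCC) follows the same pattern:
#include "llvm/Analysis/CallGraph.h"
#include "llvm/Analysis/CallGraphSCCPass.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;
namespace {
// Hypothetical example: print the functions in each SCC, bottom-up.
struct PrintSCCs : public CallGraphSCCPass {
  static char ID;
  PrintSCCs() : CallGraphSCCPass(ID) {}
  bool runOnSCC(CallGraphSCC &SCC) override {
    errs() << "SCC:";
    for (CallGraphNode *Node : SCC)
      if (Function *F = Node->getFunction()) // null for external nodes
        errs() << " " << F->getName();
    errs() << "\n";
    return false; // nothing was modified
  }
};
}
char PrintSCCs::ID = 0;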
The FunctionPass class¶
In contrast to ModulePass subclasses, FunctionPass subclasses do have a predictable, local behavior that can be expected by the system. All FunctionPasses execute on each function in the program independent of all of the other functions in the program. FunctionPasses do not require that they are executed in a particular order, and FunctionPasses do not modify external functions.
To be explicit, FunctionPass subclasses are not allowed to:
- Inspect or modify a Function other than the one currently being processed.
- Add or remove Functions from the current Module.
- Add or remove global variables from the current Module.
- Maintain state across invocations of runOnFunction (including global data).
Implementing a FunctionPass is usually straightforward (see the Hello World pass for example).
FunctionPasses may overload three virtual methods to do their work. All of these methods should return true if they modified the program, or false if they didn’t.
The doInitialization(Module &) method¶
virtual bool doInitialization(Module &M);
The doInitialization method is allowed to do most of the things that FunctionPasses are not allowed to do. They can add and remove functions, get pointers to functions, etc. The doInitialization method is designed to do simple initialization type of stuff that does not depend on the functions being processed. The doInitialization method call is not scheduled to overlap with any other pass executions (thus it should be very fast).
A good example of how this method should be used is the LowerAllocations pass. This pass converts malloc and free instructions into platform dependent malloc() and free() function calls. It uses the doInitialization method to get a reference to the malloc and free functions that it needs, adding prototypes to the module if necessary.
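A hedged sketch of that pattern (MyLowering and its MallocFn/FreeFn members are hypothetical; the actual LowerAllocations code differs):
// Hypothetical doInitialization: cache malloc/free declarations once,
// before any function is processed. getOrInsertFunction adds a prototype
// only if one is not already present in the module. MallocFn and FreeFn
// are assumed Constant* members of MyLowering.
bool MyLowering::doInitialization(Module &M) {
  LLVMContext &C = M.getContext();
  Type *BytePtrTy = Type::getInt8PtrTy(C);
  Type *SizeTy = Type::getInt64Ty(C); // assumes a 64-bit size_t for brevity
  MallocFn = M.getOrInsertFunction("malloc", BytePtrTy, SizeTy, nullptr);
  FreeFn = M.getOrInsertFunction("free", Type::getVoidTy(C), BytePtrTy,
                                 nullptr);
  return true; // prototypes may have been added, so the module changed
}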
The runOnFunction method¶
virtual bool runOnFunction(Function &F) = 0;
The runOnFunction method must be implemented by your subclass to do the transformation or analysis work of your pass. As usual, a true value should be returned if the function is modified.
The doFinalization(Module &) method¶
virtual bool doFinalization(Module &M);
The doFinalization method is an infrequently used method that is called when the pass framework has finished calling runOnFunction for every function in the program being compiled.
The LoopPass class¶
All LoopPasses execute on each loop in the function independent of all of the other loops in the function. LoopPass processes loops in loop nest order such that the outermost loop is processed last.
LoopPass subclasses are allowed to update the loop nest using the LPPassManager interface. Implementing a loop pass is usually straightforward; a minimal sketch follows.
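Here is a minimal sketch of a complete loop pass (a hypothetical pass that only reports loop sizes):
#include "llvm/Analysis/LoopPass.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;
namespace {
// Hypothetical example: report how many basic blocks each loop contains.
struct LoopSizes : public LoopPass {
  static char ID;
  LoopSizes() : LoopPass(ID) {}
  bool runOnLoop(Loop *L, LPPassManager &LPM) override {
    errs() << "loop at depth " << L->getLoopDepth() << " has "
           << L->getNumBlocks() << " blocks\n";
    return false; // analysis only; the loop nest is unchanged
  }
};
}
char LoopSizes::ID = 0;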
LoopPasses may overload three virtual methods to do their work. All these methods should return true if they modified the program, or false if they didn’t.
The doInitialization(Loop *, LPPassManager &) method¶
virtual bool doInitialization(Loop *, LPPassManager &LPM);
The doInitialization method is designed to do simple initialization type of stuff that does not depend on the functions being processed. The doInitialization method call is not scheduled to overlap with any other pass executions (thus it should be very fast). The LPPassManager interface should be used to access Function or Module level analysis information.
The runOnLoop method¶
virtual bool runOnLoop(Loop *, LPPassManager &LPM) = 0;
The runOnLoop method must be implemented by your subclass to do the transformation or analysis work of your pass. As usual, a true value should be returned if the function is modified. The LPPassManager interface should be used to update the loop nest.
The doFinalization() method¶
virtual bool doFinalization();
The doFinalization method is an infrequently used method that is called when the pass framework has finished calling runOnLoop for every loop in the program being compiled.
The RegionPass class¶
RegionPass is similar to LoopPass, but executes on each single-entry single-exit region in the function. RegionPass processes regions in nested order such that the outermost region is processed last.
RegionPass subclasses are allowed to update the region tree by using the RGPassManager interface. You may overload three virtual methods of RegionPass to implement your own region pass. All these methods should return true if they modified the program, or false if they did not.
The doInitialization(Region *, RGPassManager &) method¶
virtual bool doInitialization(Region *, RGPassManager &RGM);
The doInitialization method is designed to do simple initialization type of stuff that does not depend on the functions being processed. The doInitialization method call is not scheduled to overlap with any other pass executions (thus it should be very fast). The RGPassManager interface should be used to access Function or Module level analysis information.
The runOnRegion method¶
virtual bool runOnRegion(Region *, RGPassManager &RGM) = 0;
The runOnRegion method must be implemented by your subclass to do the transformation or analysis work of your pass. As usual, a true value should be returned if the region is modified. The RGPassManager interface should be used to update the region tree.
The doFinalization() method¶
virtual bool doFinalization();
The doFinalization method is an infrequently used method that is called when the pass framework has finished calling runOnRegion for every region in the program being compiled.
The BasicBlockPass class¶
BasicBlockPasses are just like FunctionPasses, except that they must limit their scope of inspection and modification to a single basic block at a time. As such, they are not allowed to do any of the following:
- Modify or inspect any basic blocks outside of the current one.
- Maintain state across invocations of runOnBasicBlock.
- Modify the control flow graph (by altering terminator instructions).
- Any of the things forbidden for FunctionPasses.
BasicBlockPasses are useful for traditional local and “peephole” optimizations. They may override the same doInitialization(Module &) and doFinalization(Module &) methods that FunctionPasses have, but also have the following virtual methods that may be implemented:
The doInitialization(Function &) method¶
virtual bool doInitialization(Function &F);
The doInitialization method is allowed to do most of the things that BasicBlockPasses are not allowed to do, but that FunctionPasses can. The doInitialization method is designed to do simple initialization that does not depend on the BasicBlocks being processed. The doInitialization method call is not scheduled to overlap with any other pass executions (thus it should be very fast).
The runOnBasicBlock method¶
virtual bool runOnBasicBlock(BasicBlock &BB) = 0;
Override this function to do the work of the BasicBlockPass. This function is not allowed to inspect or modify basic blocks other than the parameter, and is not allowed to modify the CFG. A true value must be returned if the basic block is modified.
The doFinalization(Function &) method¶
virtual bool doFinalization(Function &F);
The doFinalization method is an infrequently used method that is called when the pass framework has finished calling runOnBasicBlock for every BasicBlock in the program being compiled. This can be used to perform per-function finalization.
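For completeness, a minimal sketch of a whole BasicBlockPass (a hypothetical pass that merely counts instructions) looks like:
#include "llvm/Pass.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;
namespace {
// Hypothetical example: count the instructions in each basic block.
struct BlockSizes : public BasicBlockPass {
  static char ID;
  BlockSizes() : BasicBlockPass(ID) {}
  bool runOnBasicBlock(BasicBlock &BB) override {
    errs() << BB.getName() << ": " << BB.size() << " instructions\n";
    return false; // inspection only; the block is unchanged
  }
};
}
char BlockSizes::ID = 0;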
The MachineFunctionPass class¶
A MachineFunctionPass is a part of the LLVM code generator that executes on the machine-dependent representation of each LLVM function in the program. Code generator passes are registered and initialized specially by TargetMachine::addPassesToEmitFile and similar routines, so they cannot generally be run from the opt or bugpoint commands.
A MachineFunctionPass is also a FunctionPass, so all the restrictions that apply to a FunctionPass also apply to it. MachineFunctionPasses also have additional restrictions. In particular, MachineFunctionPasses are not allowed to do any of the following:
- Modify or create any LLVM IR Instructions, BasicBlocks, Arguments, Functions, GlobalVariables, GlobalAliases, or Modules.
- Modify a MachineFunction other than the one currently being processed.
- Maintain state across invocations of runOnMachineFunction (including global data).
The runOnMachineFunction(MachineFunction &MF) method¶
virtual bool runOnMachineFunction(MachineFunction &MF) = 0;
runOnMachineFunction can be considered the main entry point of a MachineFunctionPass; that is, you should override this method to do the work of your MachineFunctionPass.
The runOnMachineFunction method is called on every MachineFunction in a Module, so that the MachineFunctionPass may perform optimizations on the machine-dependent representation of the function. If you want to get at the LLVM Function for the MachineFunction you’re working on, use MachineFunction‘s getFunction() accessor method — but remember, you may not modify the LLVM Function or its contents from a MachineFunctionPass.
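A minimal sketch of this read-only access (the pass name is hypothetical):
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;
namespace {
// Hypothetical example: inspect (but never modify) the IR Function that
// this machine code was generated from.
struct MachineBlockCount : public MachineFunctionPass {
  static char ID;
  MachineBlockCount() : MachineFunctionPass(ID) {}
  bool runOnMachineFunction(MachineFunction &MF) override {
    errs() << MF.getFunction()->getName() << ": " << MF.size()
           << " machine basic blocks\n";
    return false; // no machine code was changed
  }
};
}
char MachineBlockCount::ID = 0;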
Pass registration¶
In the Hello World example pass we illustrated how pass registration works, and discussed some of the reasons that it is used and what it does. Here we discuss how and why passes are registered.
As we saw above, passes are registered with the RegisterPass template. The template parameter is the name of the pass that is to be used on the command line to specify that the pass should be added to a program (for example, with opt or bugpoint). The first argument is the name of the pass, which is to be used for the -help output of programs, as well as for debug output generated by the --debug-pass option.
If you want your pass to be easily dumpable, you should implement the virtual print method:
The print method¶
virtual void print(llvm::raw_ostream &O, const Module *M) const;
The print method must be implemented by “analyses” in order to print a human readable version of the analysis results. This is useful for debugging an analysis itself, as well as for other people to figure out how an analysis works. Use the opt -analyze argument to invoke this method.
The llvm::raw_ostream parameter specifies the stream to write the results on, and the Module parameter gives a pointer to the top level module of the program that has been analyzed. Note however that this pointer may be NULL in certain circumstances (such as calling the Pass::dump() from a debugger), so it should only be used to enhance debug output; it should not be depended on.
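A hedged sketch of such a print implementation (MyAnalysis is a hypothetical analysis):
// Hypothetical print implementation for an analysis named MyAnalysis.
void MyAnalysis::print(raw_ostream &O, const Module *M) const {
  O << "MyAnalysis results:\n";
  // M may be NULL (e.g. when invoked via Pass::dump() in a debugger),
  // so it is only used to enhance the output.
  if (M)
    O << "  for module " << M->getModuleIdentifier() << "\n";
}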
Specifying interactions between passes¶
One of the main responsibilities of the PassManager is to make sure that passes interact with each other correctly. Because PassManager tries to optimize the execution of passes, it must know how the passes interact with each other and what dependencies exist between the various passes. To track this, each pass can declare the set of passes that are required to be executed before the current pass, and the passes which are invalidated by the current pass.
Typically this functionality is used to require that analysis results are computed before your pass is run. Running arbitrary transformation passes can invalidate the computed analysis results, which is what the invalidation set specifies. If a pass does not implement the getAnalysisUsage method, it defaults to not having any prerequisite passes, and invalidating all other passes.
The getAnalysisUsage method¶
virtual void getAnalysisUsage(AnalysisUsage &Info) const;
By implementing the getAnalysisUsage method, the required and invalidated sets may be specified for your transformation. The implementation should fill in the AnalysisUsage object with information about which passes are required and not invalidated. To do this, a pass may call any of the following methods on the AnalysisUsage object:
The AnalysisUsage::addRequired<> and AnalysisUsage::addRequiredTransitive<> methods¶
If your pass requires a previous pass to be executed (an analysis for example), it can use one of these methods to arrange for it to be run before your pass. LLVM has many different types of analyses and passes that can be required, spanning the range from DominatorSet to BreakCriticalEdges. Requiring BreakCriticalEdges, for example, guarantees that there will be no critical edges in the CFG when your pass has been run.
Some analyses chain to other analyses to do their job. For example, an AliasAnalysis implementation is required to chain to other alias analysis passes. In cases where analyses chain, the addRequiredTransitive method should be used instead of the addRequired method. This informs the PassManager that the transitively required pass should be alive as long as the requiring pass is.
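For example, a hedged sketch (MyAliasClient is a hypothetical analysis that chains to AliasAnalysis):
// Hypothetical getAnalysisUsage for an analysis that chains to
// AliasAnalysis: the dependency is declared transitively so that
// AliasAnalysis stays alive as long as this pass does.
void MyAliasClient::getAnalysisUsage(AnalysisUsage &AU) const {
  AU.addRequiredTransitive<AliasAnalysis>();
  AU.setPreservesAll(); // an analysis; it never modifies the program
}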
The AnalysisUsage::addPreserved<> method¶
One of the jobs of the PassManager is to optimize how and when analyses are run. In particular, it attempts to avoid recomputing data unless it needs to. For this reason, passes are allowed to declare that they preserve (i.e., they don’t invalidate) an existing analysis if it’s available. For example, a simple constant folding pass would not modify the CFG, so it can’t possibly affect the results of dominator analysis. By default, all passes are assumed to invalidate all others.
The AnalysisUsage class provides several methods which are useful in certain circumstances that are related to addPreserved. In particular, the setPreservesAll method can be called to indicate that the pass does not modify the LLVM program at all (which is true for analyses), and the setPreservesCFG method can be used by transformations that change instructions in the program but do not modify the CFG or terminator instructions (note that this property is implicitly set for BasicBlockPasses).
addPreserved is particularly useful for transformations like BreakCriticalEdges. This pass knows how to update a small set of loop and dominator related analyses if they exist, so it can preserve them, despite the fact that it hacks on the CFG.
Example implementations of getAnalysisUsage¶
// This example modifies the program, but does not modify the CFG
void LICM::getAnalysisUsage(AnalysisUsage &AU) const {
AU.setPreservesCFG();
AU.addRequired<LoopInfoWrapperPass>();
}
The getAnalysis<> and getAnalysisIfAvailable<> methods¶
The Pass::getAnalysis<> method is automatically inherited by your class, providing you with access to the passes that you declared that you required with the getAnalysisUsage method. It takes a single template argument that specifies which pass class you want, and returns a reference to that pass. For example:
bool LICM::runOnFunction(Function &F) {
LoopInfo &LI = getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
//...
}
This method call returns a reference to the pass desired. You may get a runtime assertion failure if you attempt to get an analysis that you did not declare as required in your getAnalysisUsage implementation. This method can be called by your run* method implementation, or by any other local method invoked by your run* method.
A module level pass can use function level analysis info using this interface. For example:
bool ModuleLevelPass::runOnModule(Module &M) {
//...
DominatorTree &DT = getAnalysis<DominatorTree>(Func);
//...
}
In the above example, runOnFunction for DominatorTree is called by the pass manager before returning a reference to the desired pass.
If your pass is capable of updating analyses if they exist (e.g., BreakCriticalEdges, as described above), you can use the getAnalysisIfAvailable method, which returns a pointer to the analysis if it is active. For example:
if (DominatorSet *DS = getAnalysisIfAvailable<DominatorSet>()) {
// A DominatorSet is active. This code will update it.
}
Implementing Analysis Groups¶
Now that we understand the basics of how passes are defined, how they are used, and how they are required from other passes, it’s time to get a little bit fancier. All of the pass relationships that we have seen so far are very simple: one pass depends on one other specific pass to be run before it can run. For many applications, this is great, for others, more flexibility is required.
In particular, some analyses are defined such that there is a single simple interface to the analysis results, but multiple ways of calculating them. Consider alias analysis for example. The most trivial alias analysis returns “may alias” for any alias query. The most sophisticated is a flow-sensitive, context-sensitive interprocedural analysis that can take a significant amount of time to execute (and obviously, there is a lot of room between these two extremes for other implementations). To cleanly support situations like this, the LLVM Pass Infrastructure supports the notion of Analysis Groups.
Analysis Group Concepts¶
An Analysis Group is a single simple interface that may be implemented by
multiple different passes. Analysis Groups can be given human readable names
just like passes, but unlike passes, they need not derive from the Pass
class. An analysis group may have one or more implementations, one of which is
the “default” implementation.
Analysis groups are used by client passes just like other passes are: through the
AnalysisUsage::addRequired()
and Pass::getAnalysis()
methods. In order
to resolve this requirement, the PassManager scans the available passes to see if any
implementations of the analysis group are available. If none is available, the
default implementation is created for the pass to use. All standard rules for
interaction between passes still
apply.
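As a hedged sketch (the class name MyClientPass is hypothetical), a client of the AliasAnalysis group looks no different from a client of an ordinary pass:
void MyClientPass::getAnalysisUsage(AnalysisUsage &AU) const {
  AU.addRequired<AliasAnalysis>(); // request the group, not an implementation
}

bool MyClientPass::runOnFunction(Function &F) {
  // Whichever implementation was scheduled (basicaa by default) is returned.
  AliasAnalysis &AA = getAnalysis<AliasAnalysis>();
  // ... issue AA.alias(...) queries ...
  return false;
}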
Although Pass Registration is optional for normal passes, all analysis group implementations must be registered, and must use the INITIALIZE_AG_PASS template to join the implementation pool. Also, a default implementation of the interface must be registered with RegisterAnalysisGroup.
As a concrete example of an Analysis Group in action, consider the AliasAnalysis analysis group. The default implementation of the alias analysis interface (the basicaa pass) just does a few simple checks that don’t require significant analysis to compute (such as: two different globals can never alias each other, etc). Passes that use the AliasAnalysis interface (for example the gcse pass), do not care which implementation of alias analysis is actually provided, they just use the designated interface.
From the user’s perspective, commands work just like normal. Issuing the
command opt -gcse ...
will cause the basicaa
class to be instantiated
and added to the pass sequence. Issuing the command opt -somefancyaa -gcse
...
will cause the gcse
pass to use the somefancyaa
alias analysis
(which doesn’t actually exist, it’s just a hypothetical example) instead.
Using RegisterAnalysisGroup
¶
The RegisterAnalysisGroup
template is used to register the analysis group
itself, while the INITIALIZE_AG_PASS
is used to add pass implementations to
the analysis group. First, an analysis group should be registered, with a
human readable name provided for it. Unlike registration of passes, there is
no command line argument to be specified for the Analysis Group Interface
itself, because it is “abstract”:
static RegisterAnalysisGroup<AliasAnalysis> A("Alias Analysis");
Once the analysis is registered, passes can declare that they are valid implementations of the interface by using the following code:
namespace {
// Declare that we implement the AliasAnalysis interface
INITIALIZE_AG_PASS(FancyAA, AliasAnalysis, "somefancyaa",
"A more complex alias analysis implementation",
false, // Is CFG Only?
true, // Is Analysis?
false); // Is default Analysis Group implementation?
}
This just shows a class FancyAA
that uses the INITIALIZE_AG_PASS
macro
both to register and to “join” the AliasAnalysis analysis group.
Every implementation of an analysis group should join using this macro.
namespace {
// Declare that we implement the AliasAnalysis interface
INITIALIZE_AG_PASS(BasicAA, AliasAnalysis, "basicaa",
"Basic Alias Analysis (default AA impl)",
false, // Is CFG Only?
true, // Is Analysis?
true); // Is default Analysis Group implementation?
}
Here we show how the default implementation is specified (using the final
argument to the INITIALIZE_AG_PASS
template). There must be exactly one
default implementation available at all times for an Analysis Group to be used.
Only the default implementation can derive from ImmutablePass
. Here we declare
that the BasicAliasAnalysis pass is the default
implementation for the interface.
Pass Statistics¶
The Statistic class is
designed to be an easy way to expose various success metrics from passes.
These statistics are printed at the end of a run, when the -stats
command line option is given. See the Statistics
section in the Programmer’s Manual for details.
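For example, a minimal sketch (the pass and counter names here are illustrative, not from the original text) using the STATISTIC macro from llvm/ADT/Statistic.h:
#define DEBUG_TYPE "mypass"
#include "llvm/ADT/Statistic.h"

STATISTIC(NumErased, "Number of dead instructions erased");

bool MyPass::runOnFunction(Function &F) {
  bool Changed = false;
  // ... whenever the pass erases a dead instruction:
  //   ++NumErased;   // printed at the end of the run under 'opt -stats'
  //   Changed = true;
  return Changed;
}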
What PassManager does¶
The PassManager class takes a list of passes, ensures their prerequisites are set up correctly, and then schedules passes to run efficiently. All of the LLVM tools that run passes use the PassManager for execution of these passes.
The PassManager does two main things to try to reduce the execution time of a series of passes:
- Share analysis results. The PassManager attempts to avoid recomputing analysis results as much as possible. This means keeping track of which analyses are available already, which analyses get invalidated, and which analyses need to be run for a pass. An important part of this work is that the PassManager tracks the exact lifetime of all analysis results, allowing it to free memory allocated to holding analysis results as soon as they are no longer needed.
- Pipeline the execution of passes on the program. The PassManager attempts to get better cache and memory usage behavior out of a series of passes by pipelining the passes together. This means that, given a series of consecutive FunctionPasses, it will execute all of them on the first function, then all of them on the second function, etc., until the entire program has been run through the passes. This improves the cache behavior of the compiler, because it is only touching the LLVM program representation for a single function at a time, instead of traversing the entire program. It also reduces the memory consumption of the compiler, because, for example, only one DominatorSet needs to be calculated at a time. This also makes it possible to implement some interesting enhancements in the future.
The effectiveness of the PassManager
is influenced directly by how much
information it has about the behaviors of the passes it is scheduling. For
example, the “preserved” set is intentionally conservative in the face of an
unimplemented getAnalysisUsage
method. Failing to implement it when it should be implemented has the effect of
not allowing any analysis results to live across the execution of your pass.
The PassManager
class exposes a --debug-pass
command line option that
is useful for debugging pass execution, seeing how things work, and diagnosing
when you should be preserving more analyses than you currently are. (To get
information about all of the variants of the --debug-pass
option, just type
“opt -help-hidden
”).
By using the --debug-pass=Structure option, for example, we can see how our Hello World pass interacts with other passes. Let’s try it out with the gcse and licm passes:
$ opt -load ../../../Debug+Asserts/lib/Hello.so -gcse -licm --debug-pass=Structure < hello.bc > /dev/null
Module Pass Manager
Function Pass Manager
Dominator Set Construction
Immediate Dominators Construction
Global Common Subexpression Elimination
-- Immediate Dominators Construction
-- Global Common Subexpression Elimination
Natural Loop Construction
Loop Invariant Code Motion
-- Natural Loop Construction
-- Loop Invariant Code Motion
Module Verifier
-- Dominator Set Construction
-- Module Verifier
Bitcode Writer
--Bitcode Writer
This output shows us when passes are constructed and when the analysis results
are known to be dead (prefixed with “--
”). Here we see that GCSE uses
dominator and immediate dominator information to do its job. The LICM pass
uses natural loop information, which uses dominator sets, but not immediate
dominators. Because immediate dominators are no longer useful after the GCSE
pass, that information is immediately destroyed. The dominator sets are then reused to
compute natural loop information, which is then used by the LICM pass.
After the LICM pass, the module verifier runs (which is automatically added by the opt tool), which uses the dominator set to check that the resultant LLVM code is well formed. After it finishes, the dominator set information is destroyed, having been computed once and shared by three passes.
Let’s see how this changes when we run the Hello World pass in between the two passes:
$ opt -load ../../../Debug+Asserts/lib/Hello.so -gcse -hello -licm --debug-pass=Structure < hello.bc > /dev/null
Module Pass Manager
Function Pass Manager
Dominator Set Construction
Immediate Dominators Construction
Global Common Subexpression Elimination
-- Dominator Set Construction
-- Immediate Dominators Construction
-- Global Common Subexpression Elimination
Hello World Pass
-- Hello World Pass
Dominator Set Construction
Natural Loop Construction
Loop Invariant Code Motion
-- Natural Loop Construction
-- Loop Invariant Code Motion
Module Verifier
-- Dominator Set Construction
-- Module Verifier
Bitcode Writer
--Bitcode Writer
Hello: __main
Hello: puts
Hello: main
Here we see that the Hello World pass has killed the Dominator Set pass, even though it doesn’t modify the code at all! To fix this, we need to add the following getAnalysisUsage method to our pass:
// We don't modify the program, so we preserve all analyses
void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.setPreservesAll();
}
Now when we run our pass, we get this output:
$ opt -load ../../../Debug+Asserts/lib/Hello.so -gcse -hello -licm --debug-pass=Structure < hello.bc > /dev/null
Pass Arguments: -gcse -hello -licm
Module Pass Manager
Function Pass Manager
Dominator Set Construction
Immediate Dominators Construction
Global Common Subexpression Elimination
-- Immediate Dominators Construction
-- Global Common Subexpression Elimination
Hello World Pass
-- Hello World Pass
Natural Loop Construction
Loop Invariant Code Motion
-- Loop Invariant Code Motion
-- Natural Loop Construction
Module Verifier
-- Dominator Set Construction
-- Module Verifier
Bitcode Writer
--Bitcode Writer
Hello: __main
Hello: puts
Hello: main
This shows that we no longer accidentally invalidate dominator information, and therefore do not have to compute it twice.
The releaseMemory
method¶
virtual void releaseMemory();
The PassManager
automatically determines when to compute analysis results,
and how long to keep them around for. Because the lifetime of the pass object
itself is effectively the entire duration of the compilation process, we need
some way to free analysis results when they are no longer useful. The
releaseMemory
virtual method is the way to do this.
If you are writing an analysis or any other pass that retains a significant
amount of state (for use by another pass which “requires” your pass and uses
the getAnalysis method), you should
implement releaseMemory
to, well, release the memory allocated to maintain
this internal state. This method is called after the run*
method for the
class, before the next call of run*
in your pass.
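For example, a hedged sketch (the class and the computeExpensiveResult helper are hypothetical) of an analysis that caches per-function results and releases them on demand:
namespace {
struct ExpensiveAnalysis : public FunctionPass {
  static char ID;
  DenseMap<const Function *, unsigned> Cache; // internal analysis state

  ExpensiveAnalysis() : FunctionPass(ID) {}

  bool runOnFunction(Function &F) override {
    Cache[&F] = computeExpensiveResult(F); // hypothetical helper
    return false;                          // analyses never modify the IR
  }

  // Called by the PassManager once no later pass requires this analysis.
  void releaseMemory() override { Cache.clear(); }
};
}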
Registering dynamically loaded passes¶
Size matters when constructing production quality tools using LLVM, both for the purposes of distribution, and for regulating the resident code size when running on the target system. Therefore, it becomes desirable to selectively use some passes, while omitting others, and to maintain the flexibility to change configurations later on. You want to be able to do all this, and to provide feedback to the user. This is where pass registration comes into play.
The fundamental mechanisms for pass registration are the
MachinePassRegistry
class and subclasses of MachinePassRegistryNode
.
An instance of MachinePassRegistry
is used to maintain a list of
MachinePassRegistryNode
objects. This instance maintains the list and
communicates additions and deletions to the command line interface.
An instance of MachinePassRegistryNode
subclass is used to maintain
information provided about a particular pass. This information includes the
command line name, the command help string and the address of the function used
to create an instance of the pass. A global static constructor of one of these
instances registers with a corresponding MachinePassRegistry
; the static
destructor unregisters. Thus a pass that is statically linked in the tool
will be registered at start up. A dynamically loaded pass will register on
load and unregister at unload.
Using existing registries¶
There are predefined registries to track instruction scheduling
(RegisterScheduler
) and register allocation (RegisterRegAlloc
) machine
passes. Here we will describe how to register a register allocator machine
pass.
Implement your register allocator machine pass. In your register allocator
.cpp
file add the following include:
#include "llvm/CodeGen/RegAllocRegistry.h"
Also in your register allocator .cpp
file, define a creator function in the
form:
FunctionPass *createMyRegisterAllocator() {
return new MyRegisterAllocator();
}
Note that the signature of this function should match the type of
RegisterRegAlloc::FunctionPassCtor
. In the same file add the “installing”
declaration, in the form:
static RegisterRegAlloc myRegAlloc("myregalloc",
"my register allocator help string",
createMyRegisterAllocator);
Note that the two spaces prior to the help string produce a tidy result on the
-help
query.
$ llc -help
...
-regalloc - Register allocator to use (default=linearscan)
=linearscan - linear scan register allocator
=local - local register allocator
=simple - simple register allocator
=myregalloc - my register allocator help string
...
And that’s it. The user is now free to use -regalloc=myregalloc
as an
option. Registering instruction schedulers is similar except use the
RegisterScheduler
class. Note that the
RegisterScheduler::FunctionPassCtor
is significantly different from
RegisterRegAlloc::FunctionPassCtor
.
To force the load/linking of your register allocator into the
llc/lli tools, add your creator function’s global
declaration to Passes.h
and add a “pseudo” call line to
llvm/CodeGen/LinkAllCodegenComponents.h
.
Creating new registries¶
The easiest way to get started is to clone one of the existing registries; we
recommend llvm/CodeGen/RegAllocRegistry.h
. The key things to modify are
the class name and the FunctionPassCtor
type.
Then you need to declare the registry. Example: if your pass registry is
RegisterMyPasses
then define:
MachinePassRegistry RegisterMyPasses::Registry;
And finally, declare the command line option for your passes. Example:
cl::opt<RegisterMyPasses::FunctionPassCtor, false,
RegisterPassParser<RegisterMyPasses> >
MyPassOpt("mypass",
cl::init(&createDefaultMyPass),
cl::desc("my pass option help"));
Here the command option is “mypass
”, with createDefaultMyPass
as the
default creator.
Using GDB with dynamically loaded passes¶
Unfortunately, using GDB with dynamically loaded passes is not as easy as it should be. First of all, you can’t set a breakpoint in a shared object that has not been loaded yet, and second of all there are problems with inlined functions in shared objects. Here are some suggestions to debugging your pass with GDB.
For the sake of discussion, I’m going to assume that you are debugging a transformation invoked by opt, although nothing described here depends on that.
Setting a breakpoint in your pass¶
First thing you do is start gdb on the opt process:
$ gdb opt
GNU gdb 5.0
Copyright 2000 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.6"...
(gdb)
Note that opt has a lot of debugging information in it, so it takes
time to load. Be patient. Since we cannot set a breakpoint in our pass yet
(the shared object isn’t loaded until runtime), we must execute the process,
and have it stop before it invokes our pass, but after it has loaded the shared
object. The most foolproof way of doing this is to set a breakpoint in
PassManager::run
and then run the process with the arguments you want:
(gdb) break llvm::PassManager::run
Breakpoint 1 at 0x2413bc: file Pass.cpp, line 70.
(gdb) run test.bc -load $(LLVMTOP)/llvm/Debug+Asserts/lib/[libname].so -[passoption]
Starting program: opt test.bc -load $(LLVMTOP)/llvm/Debug+Asserts/lib/[libname].so -[passoption]
Breakpoint 1, PassManager::run (this=0xffbef174, M=@0x70b298) at Pass.cpp:70
70 bool PassManager::run(Module &M) { return PM->run(M); }
(gdb)
Once the opt stops in the PassManager::run
method you are now
free to set breakpoints in your pass so that you can trace through execution or
do other standard debugging stuff.
Miscellaneous Problems¶
Once you have the basics down, there are a couple of problems that GDB has, some with solutions, some without.
- Inline functions have bogus stack information. In general, GDB does a pretty good job getting stack traces and stepping through inline functions. When a pass is dynamically loaded however, it somehow completely loses this capability. The only solution I know of is to de-inline a function (move it from the body of a class to a .cpp file).
- Restarting the program breaks breakpoints. After following the information above, you have succeeded in getting some breakpoints planted in your pass. Next thing you know, you restart the program (i.e., you type “run” again), and you start getting errors about breakpoints being unsettable. The only way I have found to “fix” this problem is to delete the breakpoints that are already set in your pass, run the program, and re-set the breakpoints once execution stops in PassManager::run.
Hopefully these tips will help with common case debugging situations. If you’d like to contribute some tips of your own, just contact Chris.
Future extensions planned¶
Although the LLVM Pass Infrastructure is very capable as it stands, and does some nifty stuff, there are things we’d like to add in the future. Here is where we are going:
Multithreaded LLVM¶
Multiple CPU machines are becoming more common and compilation can never be
fast enough: obviously we should allow for a multithreaded compiler. Because
of the semantics defined for passes above (specifically they cannot maintain
state across invocations of their run*
methods), a nice clean way to
implement a multithreaded compiler would be for the PassManager
class to
create multiple instances of each pass object, and allow the separate instances
to be hacking on different parts of the program at the same time.
This implementation would prevent each of the passes from having to implement multithreaded constructs, requiring only the LLVM core to have locking in a few places (for global resources). Although this is a simple extension, we simply haven’t had time (or multiprocessor machines, thus a reason) to implement this. Despite that, we have kept the LLVM passes SMP ready, and you should too.
How To Use Attributes¶
Introduction¶
Attributes in LLVM have changed in some fundamental ways. It was necessary to do this to support expanding the attributes to encompass more than a handful of attributes — e.g. command line options. The old way of handling attributes consisted of representing them as a bit mask of values. This bit mask was stored in a “list” structure that was reference counted. The advantage of this was that attributes could be manipulated with ‘or’s and ‘and’s. The disadvantage of this was that there was limited room for expansion, and virtually no support for attribute-value pairs other than alignment.
In the new scheme, an Attribute
object represents a single attribute that’s
uniqued. You use the Attribute::get
methods to create a new Attribute
object. An attribute can be a single “enum” value (the enum being the
Attribute::AttrKind
enum), a string representing a target-dependent
attribute, or an attribute-value pair. Some examples:
- Target-independent:
noinline
,zext
- Target-dependent:
"no-sse"
,"thumb2"
- Attribute-value pair:
"cpu" = "cortex-a8"
,align = 4
Note: for an attribute value pair, we expect a target-dependent attribute to have a string for the value.
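For illustration, a minimal sketch of creating such attributes (the variable names are ours, and Ctx is assumed to be an LLVMContext):
// Single "enum" attribute
Attribute NoInlineAttr = Attribute::get(Ctx, Attribute::NoInline);
// Target-dependent string attribute
Attribute NoSSE = Attribute::get(Ctx, "no-sse");
// Attribute-value pair with a string value
Attribute CPU = Attribute::get(Ctx, "cpu", "cortex-a8");
// Integer-valued attribute (alignment)
Attribute Align4 = Attribute::getWithAlignment(Ctx, 4);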
Attribute
¶
An Attribute
object is designed to be passed around by value.
Because attributes are no longer represented as a bit mask, you will need to convert any code which does treat them as a bit mask to use the new query methods on the Attribute class.
AttributeSet
¶
The AttributeSet
class replaces the old AttributeList
class. The
AttributeSet
stores a collection of Attribute objects for each kind of
object that may have an attribute associated with it: the function as a
whole, the return type, or the function’s parameters. A function’s attributes
are at index AttributeSet::FunctionIndex
; the return type’s attributes are
at index AttributeSet::ReturnIndex
; and the function’s parameters’
attributes are at indices 1, ..., n (where ‘n’ is the number of parameters).
Most methods on the AttributeSet
class take an index parameter.
An AttributeSet
is also a uniqued and immutable object. You create an
AttributeSet
through the AttributeSet::get
methods. You can add and
remove attributes, which result in the creation of a new AttributeSet
.
An AttributeSet
object is designed to be passed around by value.
Note: It is advised that you do not use the AttributeSet
“introspection”
methods (e.g. Raw
, getRawPointer
, etc.). These methods break
encapsulation, and may be removed in a future release (i.e. LLVM 4.0).
AttrBuilder
¶
Lastly, we have a “builder” class to help create the AttributeSet
object
without having to create several different intermediate uniqued
AttributeSet
objects. The AttrBuilder
class allows you to add and
remove attributes at will. The attributes won’t be uniqued until you call the
appropriate AttributeSet::get
method.
An AttrBuilder
object is not designed to be passed around by value. It
should be passed by reference.
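A minimal sketch of the intended workflow (Ctx is assumed to be an LLVMContext):
AttrBuilder B;
B.addAttribute(Attribute::NoUnwind)
 .addAttribute(Attribute::ReadNone);
// The attributes are uniqued into an AttributeSet only at this point:
AttributeSet AS = AttributeSet::get(Ctx, AttributeSet::FunctionIndex, B);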
Note: It is advised that you do not use the AttrBuilder::addRawValue()
method or the AttrBuilder(uint64_t Val)
constructor. These are for
backwards compatibility and may be removed in a future release (i.e. LLVM 4.0).
And that’s basically it! A lot of functionality is hidden behind these classes, but the interfaces are pretty straightforward.
User Guide for NVPTX Back-end¶
Introduction¶
To support GPU programming, the NVPTX back-end supports a subset of LLVM IR along with a defined set of conventions used to represent GPU programming concepts. This document provides an overview of the general usage of the back-end, including a description of the conventions used and the set of accepted LLVM IR.
Note
This document assumes a basic familiarity with CUDA and the PTX assembly language. Information about the CUDA Driver API and the PTX assembly language can be found in the CUDA documentation.
Conventions¶
Marking Functions as Kernels¶
In PTX, there are two types of functions: device functions, which are only
callable by device code, and kernel functions, which are callable by host
code. By default, the back-end will emit device functions. Metadata is used to
declare a function as a kernel function. This metadata is attached to the
nvvm.annotations
named metadata object, and has the following format:
!0 = metadata !{<function-ref>, metadata !"kernel", i32 1}
The first parameter is a reference to the kernel function. The following
example shows a kernel function calling a device function in LLVM IR. The
function @my_kernel
is callable from host code, but @my_fmad
is not.
define float @my_fmad(float %x, float %y, float %z) {
%mul = fmul float %x, %y
%add = fadd float %mul, %z
ret float %add
}
define void @my_kernel(float* %ptr) {
%val = load float* %ptr
%ret = call float @my_fmad(float %val, float %val, float %val)
store float %ret, float* %ptr
ret void
}
!nvvm.annotations = !{!1}
!1 = metadata !{void (float*)* @my_kernel, metadata !"kernel", i32 1}
When compiled, the PTX kernel functions are callable by host-side code.
Address Spaces¶
The NVPTX back-end uses the following address space mapping:
Address Space | Memory Space |
---|---|
0 | Generic |
1 | Global |
2 | Internal Use |
3 | Shared |
4 | Constant |
5 | Local |
Every global variable and pointer type is assigned to one of these address spaces, with 0 being the default address space. Intrinsics are provided which can be used to convert pointers between the generic and non-generic address spaces.
As an example, the following IR will define an array @g
that resides in
global device memory.
@g = internal addrspace(1) global [4 x i32] [ i32 0, i32 1, i32 2, i32 3 ]
LLVM IR functions can read and write to this array, and host-side code can copy data to it by name with the CUDA Driver API.
Note that since address space 0 is the generic space, it is illegal to have
global variables in address space 0. Address space 0 is the default address
space in LLVM, so the addrspace(N)
annotation is required for global
variables.
Triples¶
The NVPTX target uses the module triple to select between 32/64-bit code
generation and the driver-compiler interface to use. The triple architecture
can be one of nvptx
(32-bit PTX) or nvptx64
(64-bit PTX). The
operating system should be one of cuda
or nvcl
, which determines the
interface used by the generated code to communicate with the driver. Most
users will want to use cuda
as the operating system, which makes the
generated PTX compatible with the CUDA Driver API.
Example: 32-bit PTX for CUDA Driver API: nvptx-nvidia-cuda
Example: 64-bit PTX for CUDA Driver API: nvptx64-nvidia-cuda
NVPTX Intrinsics¶
Address Space Conversion¶
‘llvm.nvvm.ptr.*.to.gen
‘ Intrinsics¶
Syntax:¶
These are overloaded intrinsics. You can use these on any pointer types.
declare i8* @llvm.nvvm.ptr.global.to.gen.p0i8.p1i8(i8 addrspace(1)*)
declare i8* @llvm.nvvm.ptr.shared.to.gen.p0i8.p3i8(i8 addrspace(3)*)
declare i8* @llvm.nvvm.ptr.constant.to.gen.p0i8.p4i8(i8 addrspace(4)*)
declare i8* @llvm.nvvm.ptr.local.to.gen.p0i8.p5i8(i8 addrspace(5)*)
Overview:¶
The ‘llvm.nvvm.ptr.*.to.gen
‘ intrinsics convert a pointer in a non-generic
address space to a generic address space pointer.
Semantics:¶
These intrinsics modify the pointer value to be a valid generic address space pointer.
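For example, a small IR sketch (the value names are ours) that converts a shared-memory pointer to a generic pointer and stores through it:
; %sp points into shared memory (address space 3)
%gp = call i8* @llvm.nvvm.ptr.shared.to.gen.p0i8.p3i8(i8 addrspace(3)* %sp)
store i8 0, i8* %gp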
‘llvm.nvvm.ptr.gen.to.*
‘ Intrinsics¶
Syntax:¶
These are overloaded intrinsics. You can use these on any pointer types.
declare i8 addrspace(1)* @llvm.nvvm.ptr.gen.to.global.p1i8.p0i8(i8*)
declare i8 addrspace(3)* @llvm.nvvm.ptr.gen.to.shared.p3i8.p0i8(i8*)
declare i8 addrspace(4)* @llvm.nvvm.ptr.gen.to.constant.p4i8.p0i8(i8*)
declare i8 addrspace(5)* @llvm.nvvm.ptr.gen.to.local.p5i8.p0i8(i8*)
Overview:¶
The ‘llvm.nvvm.ptr.gen.to.*
‘ intrinsics convert a pointer in the generic
address space to a pointer in the target address space. Note that these
intrinsics are only useful if the target address space of the pointer is
known. It is not legal to use address space conversion
intrinsics to convert a pointer from one non-generic address space to another
non-generic address space.
Semantics:¶
These intrinsics modify the pointer value to be a valid pointer in the target non-generic address space.
Reading PTX Special Registers¶
‘llvm.nvvm.read.ptx.sreg.*
‘¶
Syntax:¶
declare i32 @llvm.nvvm.read.ptx.sreg.tid.x()
declare i32 @llvm.nvvm.read.ptx.sreg.tid.y()
declare i32 @llvm.nvvm.read.ptx.sreg.tid.z()
declare i32 @llvm.nvvm.read.ptx.sreg.ntid.x()
declare i32 @llvm.nvvm.read.ptx.sreg.ntid.y()
declare i32 @llvm.nvvm.read.ptx.sreg.ntid.z()
declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()
declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.y()
declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.z()
declare i32 @llvm.nvvm.read.ptx.sreg.nctaid.x()
declare i32 @llvm.nvvm.read.ptx.sreg.nctaid.y()
declare i32 @llvm.nvvm.read.ptx.sreg.nctaid.z()
declare i32 @llvm.nvvm.read.ptx.sreg.warpsize()
Overview:¶
The ‘@llvm.nvvm.read.ptx.sreg.*
‘ intrinsics provide access to the PTX
special registers, in particular the kernel launch bounds. These registers
map in the following way to CUDA builtins:
CUDA Builtin | PTX Special Register Intrinsic |
---|---|
threadIdx | @llvm.nvvm.read.ptx.sreg.tid.* |
blockIdx | @llvm.nvvm.read.ptx.sreg.ctaid.* |
blockDim | @llvm.nvvm.read.ptx.sreg.ntid.* |
gridDim | @llvm.nvvm.read.ptx.sreg.nctaid.* |
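As a sketch (the value names are ours), the common one-dimensional global thread index, blockIdx.x * blockDim.x + threadIdx.x in CUDA, can be computed from these intrinsics:
%tid   = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
%ntid  = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x()
%ctaid = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()
%tmp   = mul i32 %ctaid, %ntid
%gid   = add i32 %tmp, %tid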
Barriers¶
‘llvm.nvvm.barrier0
‘¶
Syntax:¶
declare void @llvm.nvvm.barrier0()
Overview:¶
The ‘@llvm.nvvm.barrier0()
‘ intrinsic emits a PTX bar.sync 0
instruction, equivalent to the __syncthreads()
call in CUDA.
Other Intrinsics¶
For the full set of NVPTX intrinsics, please see the
include/llvm/IR/IntrinsicsNVVM.td
file in the LLVM source tree.
Linking with Libdevice¶
The CUDA Toolkit comes with an LLVM bitcode library called libdevice
that
implements many common mathematical functions. This library can be used as a
high-performance math library for any compiler using the LLVM NVPTX target.
The library can be found under nvvm/libdevice/
in the CUDA Toolkit and
there is a separate version for each compute architecture.
For a list of all math functions implemented in libdevice, see libdevice Users Guide.
To accommodate various math-related compiler flags that can affect code
generation of libdevice code, the library code depends on a special LLVM IR
pass (NVVMReflect
) to handle conditional compilation within LLVM IR. This
pass looks for calls to the @__nvvm_reflect
function and replaces them
with constants based on the defined reflection parameters. Such conditional
code often follows a pattern:
float my_function(float a) {
if (__nvvm_reflect("FASTMATH"))
return my_function_fast(a);
else
return my_function_precise(a);
}
The default value for all unspecified reflection parameters is zero.
The NVVMReflect
pass should be executed early in the optimization
pipeline, immediately after the link stage. The internalize
pass is also
recommended to remove unused math functions from the resulting PTX. For an
input IR module module.bc
, the following compilation flow is recommended:
1. Save list of external functions in module.bc
2. Link module.bc with libdevice.compute_XX.YY.bc
3. Internalize all functions not in list from (1)
4. Eliminate all unused internal functions
5. Run NVVMReflect pass
6. Run standard optimization pipeline
Note
linkonce
and linkonce_odr
linkage types are not suitable for the
libdevice functions. It is possible to link two IR modules that have been
linked against libdevice using different reflection variables.
Since the NVVMReflect
pass replaces conditionals with constants, it will
often leave behind dead code of the form:
entry:
..
br i1 true, label %foo, label %bar
foo:
..
bar:
; Dead code
..
Therefore, it is recommended that NVVMReflect
is executed early in the
optimization pipeline before dead-code elimination.
Reflection Parameters¶
The libdevice library currently uses the following reflection parameters to control code generation:
Flag | Description |
---|---|
__CUDA_FTZ=[0,1] | Use optimized code paths that flush subnormals to zero |
Invoking NVVMReflect¶
To ensure that all dead code caused by the reflection pass is eliminated, it
is recommended that the reflection pass is executed early in the LLVM IR
optimization pipeline. The pass takes an optional mapping of reflection
parameter name to an integer value. This mapping can be specified as either a
command-line option to opt
or as an LLVM StringMap<int>
object when
programmatically creating a pass pipeline.
With opt
:
# opt -nvvm-reflect -nvvm-reflect-list=<var>=<value>,<var>=<value> module.bc -o module.reflect.bc
With programmatic pass pipeline:
extern ModulePass *llvm::createNVVMReflectPass(const StringMap<int>& Mapping);
StringMap<int> ReflectParams;
ReflectParams["__CUDA_FTZ"] = 1;
Passes.add(createNVVMReflectPass(ReflectParams));
Executing PTX¶
The most common way to execute PTX assembly on a GPU device is to use the CUDA Driver API. This API is a low-level interface to the GPU driver and allows for JIT compilation of PTX code to native GPU machine code.
Initializing the Driver API:
CUdevice device;
CUcontext context;
// Initialize the driver API
cuInit(0);
// Get a handle to the first compute device
cuDeviceGet(&device, 0);
// Create a compute device context
cuCtxCreate(&context, 0, device);
JIT compiling a PTX string to a device binary:
CUmodule module;
CUfunction function;
// JIT compile a null-terminated PTX string
cuModuleLoadData(&module, (void*)PTXString);
// Get a handle to the "myfunction" kernel function
cuModuleGetFunction(&function, module, "myfunction");
For full examples of executing PTX assembly, please see the CUDA Samples distribution.
Common Issues¶
ptxas complains of undefined function: __nvvm_reflect¶
When linking with libdevice, the NVVMReflect
pass must be used. See
Linking with Libdevice for more information.
Tutorial: A Simple Compute Kernel¶
To start, let us take a look at a simple compute kernel written directly in LLVM IR. The kernel implements vector addition, where each thread computes one element of the output vector C from the input vectors A and B. To make this easier, we also assume that only a single CTA (thread block) will be launched, and that it will be one dimensional.
The Kernel¶
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
target triple = "nvptx64-nvidia-cuda"
; Intrinsic to read X component of thread ID
declare i32 @llvm.nvvm.read.ptx.sreg.tid.x() readnone nounwind
define void @kernel(float addrspace(1)* %A,
float addrspace(1)* %B,
float addrspace(1)* %C) {
entry:
; What is my ID?
%id = tail call i32 @llvm.nvvm.read.ptx.sreg.tid.x() readnone nounwind
; Compute pointers into A, B, and C
%ptrA = getelementptr float addrspace(1)* %A, i32 %id
%ptrB = getelementptr float addrspace(1)* %B, i32 %id
%ptrC = getelementptr float addrspace(1)* %C, i32 %id
; Read A, B
%valA = load float addrspace(1)* %ptrA, align 4
%valB = load float addrspace(1)* %ptrB, align 4
; Compute C = A + B
%valC = fadd float %valA, %valB
; Store back to C
store float %valC, float addrspace(1)* %ptrC, align 4
ret void
}
!nvvm.annotations = !{!0}
!0 = metadata !{void (float addrspace(1)*,
float addrspace(1)*,
float addrspace(1)*)* @kernel, metadata !"kernel", i32 1}
We can use the LLVM llc
tool to directly run the NVPTX code generator:
# llc -mcpu=sm_20 kernel.ll -o kernel.ptx
Note
If you want to generate 32-bit code, change p:64:64:64
to p:32:32:32
in the module data layout string and use nvptx-nvidia-cuda
as the
target triple.
The output we get from llc
(as of LLVM 3.4):
//
// Generated by LLVM NVPTX Back-End
//
.version 3.1
.target sm_20
.address_size 64
// .globl kernel
// @kernel
.visible .entry kernel(
.param .u64 kernel_param_0,
.param .u64 kernel_param_1,
.param .u64 kernel_param_2
)
{
.reg .f32 %f<4>;
.reg .s32 %r<2>;
.reg .s64 %rl<8>;
// BB#0: // %entry
ld.param.u64 %rl1, [kernel_param_0];
mov.u32 %r1, %tid.x;
mul.wide.s32 %rl2, %r1, 4;
add.s64 %rl3, %rl1, %rl2;
ld.param.u64 %rl4, [kernel_param_1];
add.s64 %rl5, %rl4, %rl2;
ld.param.u64 %rl6, [kernel_param_2];
add.s64 %rl7, %rl6, %rl2;
ld.global.f32 %f1, [%rl3];
ld.global.f32 %f2, [%rl5];
add.f32 %f3, %f1, %f2;
st.global.f32 [%rl7], %f3;
ret;
}
Dissecting the Kernel¶
Now let us dissect the LLVM IR that makes up this kernel.
Data Layout¶
The data layout string determines the size in bits of common data types, their ABI alignment, and their storage size. For NVPTX, you should use one of the following:
32-bit PTX:
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
64-bit PTX:
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
Target Intrinsics¶
In this example, we use the @llvm.nvvm.read.ptx.sreg.tid.x
intrinsic to
read the X component of the current thread’s ID, which corresponds to a read
of register %tid.x
in PTX. The NVPTX back-end supports a large set of
intrinsics. A short list is shown below; please see
include/llvm/IR/IntrinsicsNVVM.td
for the full list.
Intrinsic | CUDA Equivalent |
---|---|
i32 @llvm.nvvm.read.ptx.sreg.tid.{x,y,z} | threadIdx.{x,y,z} |
i32 @llvm.nvvm.read.ptx.sreg.ctaid.{x,y,z} | blockIdx.{x,y,z} |
i32 @llvm.nvvm.read.ptx.sreg.ntid.{x,y,z} | blockDim.{x,y,z} |
i32 @llvm.nvvm.read.ptx.sreg.nctaid.{x,y,z} | gridDim.{x,y,z} |
void @llvm.cuda.syncthreads() | __syncthreads() |
Address Spaces¶
You may have noticed that all of the pointer types in the LLVM IR example had an explicit address space specifier. What is address space 1? NVIDIA GPU devices (generally) have four types of memory:
- Global: Large, off-chip memory
- Shared: Small, on-chip memory shared among all threads in a CTA
- Local: Per-thread, private memory
- Constant: Read-only memory shared across all threads
These different types of memory are represented in LLVM IR as address spaces. There is also a fifth address space used by the NVPTX code generator that corresponds to the “generic” address space. This address space can represent addresses in any other address space (with a few exceptions). This allows users to write IR functions that can load/store memory using the same instructions. Intrinsics are provided to convert pointers between the generic and non-generic address spaces.
See Address Spaces and NVPTX Intrinsics for more information.
Kernel Metadata¶
In PTX, a function can be either a kernel function (callable from the host
program), or a device function (callable only from GPU code). You can think
of kernel functions as entry-points in the GPU program. To mark an LLVM IR
function as a kernel function, we make use of special LLVM metadata. The
NVPTX back-end will look for a named metadata node called
nvvm.annotations
. This named metadata must contain a list of metadata that
describe the IR. For our purposes, we need to declare a metadata node that
assigns the “kernel” attribute to the LLVM IR function that should be emitted
as a PTX kernel function. These metadata nodes take the form:
metadata !{<function ref>, metadata !"kernel", i32 1}
For the previous example, we have:
!nvvm.annotations = !{!0}
!0 = metadata !{void (float addrspace(1)*,
float addrspace(1)*,
float addrspace(1)*)* @kernel, metadata !"kernel", i32 1}
Here, we have a single metadata declaration in nvvm.annotations
. This
metadata annotates our @kernel
function with the kernel
attribute.
Running the Kernel¶
Generating PTX from LLVM IR is all well and good, but how do we execute it on a real GPU device? The CUDA Driver API provides a convenient mechanism for loading and JIT compiling PTX to a native GPU device, and launching a kernel. The API is similar to OpenCL. A simple example showing how to load and execute our vector addition code is shown below. Note that for brevity this code does not perform much error checking!
Note
You can also use the ptxas
tool provided by the CUDA Toolkit to offline
compile PTX to machine code (SASS) for a specific GPU architecture. Such
binaries can be loaded by the CUDA Driver API in the same way as PTX. This
can be useful for reducing startup time by precompiling the PTX kernels.
#include <iostream>
#include <fstream>
#include <cassert>
#include "cuda.h"
void checkCudaErrors(CUresult err) {
assert(err == CUDA_SUCCESS);
}
/// main - Program entry point
int main(int argc, char **argv) {
CUdevice device;
CUmodule cudaModule;
CUcontext context;
CUfunction function;
CUlinkState linker;
int devCount;
// CUDA initialization
checkCudaErrors(cuInit(0));
checkCudaErrors(cuDeviceGetCount(&devCount));
checkCudaErrors(cuDeviceGet(&device, 0));
char name[128];
checkCudaErrors(cuDeviceGetName(name, 128, device));
std::cout << "Using CUDA Device [0]: " << name << "\n";
int devMajor, devMinor;
checkCudaErrors(cuDeviceComputeCapability(&devMajor, &devMinor, device));
std::cout << "Device Compute Capability: "
<< devMajor << "." << devMinor << "\n";
if (devMajor < 2) {
std::cerr << "ERROR: Device 0 is not SM 2.0 or greater\n";
return 1;
}
std::ifstream t("kernel.ptx");
if (!t.is_open()) {
std::cerr << "kernel.ptx not found\n";
return 1;
}
std::string str((std::istreambuf_iterator<char>(t)),
std::istreambuf_iterator<char>());
// Create driver context
checkCudaErrors(cuCtxCreate(&context, 0, device));
// Create module for object
checkCudaErrors(cuModuleLoadDataEx(&cudaModule, str.c_str(), 0, 0, 0));
// Get kernel function
checkCudaErrors(cuModuleGetFunction(&function, cudaModule, "kernel"));
// Device data
CUdeviceptr devBufferA;
CUdeviceptr devBufferB;
CUdeviceptr devBufferC;
checkCudaErrors(cuMemAlloc(&devBufferA, sizeof(float)*16));
checkCudaErrors(cuMemAlloc(&devBufferB, sizeof(float)*16));
checkCudaErrors(cuMemAlloc(&devBufferC, sizeof(float)*16));
float* hostA = new float[16];
float* hostB = new float[16];
float* hostC = new float[16];
// Populate input
for (unsigned i = 0; i != 16; ++i) {
hostA[i] = (float)i;
hostB[i] = (float)(2*i);
hostC[i] = 0.0f;
}
checkCudaErrors(cuMemcpyHtoD(devBufferA, &hostA[0], sizeof(float)*16));
checkCudaErrors(cuMemcpyHtoD(devBufferB, &hostB[0], sizeof(float)*16));
unsigned blockSizeX = 16;
unsigned blockSizeY = 1;
unsigned blockSizeZ = 1;
unsigned gridSizeX = 1;
unsigned gridSizeY = 1;
unsigned gridSizeZ = 1;
// Kernel parameters
void *KernelParams[] = { &devBufferA, &devBufferB, &devBufferC };
std::cout << "Launching kernel\n";
// Kernel launch
checkCudaErrors(cuLaunchKernel(function, gridSizeX, gridSizeY, gridSizeZ,
blockSizeX, blockSizeY, blockSizeZ,
0, NULL, KernelParams, NULL));
// Retrieve device data
checkCudaErrors(cuMemcpyDtoH(&hostC[0], devBufferC, sizeof(float)*16));
std::cout << "Results:\n";
for (unsigned i = 0; i != 16; ++i) {
std::cout << hostA[i] << " + " << hostB[i] << " = " << hostC[i] << "\n";
}
// Clean up after ourselves
delete [] hostA;
delete [] hostB;
delete [] hostC;
// Clean-up
checkCudaErrors(cuMemFree(devBufferA));
checkCudaErrors(cuMemFree(devBufferB));
checkCudaErrors(cuMemFree(devBufferC));
checkCudaErrors(cuModuleUnload(cudaModule));
checkCudaErrors(cuCtxDestroy(context));
return 0;
}
You will need to link with the CUDA driver and specify the path to cuda.h.
# clang++ sample.cpp -o sample -O2 -g -I/usr/local/cuda-5.5/include -lcuda
We don’t need to specify a path to libcuda.so
since this is installed in a
system location by the driver, not the CUDA toolkit.
If everything goes as planned, you should see the following output when running the compiled program:
Using CUDA Device [0]: GeForce GTX 680
Device Compute Capability: 3.0
Launching kernel
Results:
0 + 0 = 0
1 + 2 = 3
2 + 4 = 6
3 + 6 = 9
4 + 8 = 12
5 + 10 = 15
6 + 12 = 18
7 + 14 = 21
8 + 16 = 24
9 + 18 = 27
10 + 20 = 30
11 + 22 = 33
12 + 24 = 36
13 + 26 = 39
14 + 28 = 42
15 + 30 = 45
Note
You will likely see a different device identifier based on your hardware.
Tutorial: Linking with Libdevice¶
In this tutorial, we show a simple example of linking LLVM IR with the
libdevice library. We will use the same kernel as the previous tutorial,
except that we will compute C = pow(A, B)
instead of C = A + B
.
Libdevice provides an __nv_powf
function that we will use.
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
target triple = "nvptx64-nvidia-cuda"
; Intrinsic to read X component of thread ID
declare i32 @llvm.nvvm.read.ptx.sreg.tid.x() readnone nounwind
; libdevice function
declare float @__nv_powf(float, float)
define void @kernel(float addrspace(1)* %A,
float addrspace(1)* %B,
float addrspace(1)* %C) {
entry:
; What is my ID?
%id = tail call i32 @llvm.nvvm.read.ptx.sreg.tid.x() readnone nounwind
; Compute pointers into A, B, and C
%ptrA = getelementptr float addrspace(1)* %A, i32 %id
%ptrB = getelementptr float addrspace(1)* %B, i32 %id
%ptrC = getelementptr float addrspace(1)* %C, i32 %id
; Read A, B
%valA = load float addrspace(1)* %ptrA, align 4
%valB = load float addrspace(1)* %ptrB, align 4
; Compute C = pow(A, B)
%valC = call float @__nv_powf(float %valA, float %valB)
; Store back to C
store float %valC, float addrspace(1)* %ptrC, align 4
ret void
}
!nvvm.annotations = !{!0}
!0 = metadata !{void (float addrspace(1)*,
float addrspace(1)*,
float addrspace(1)*)* @kernel, metadata !"kernel", i32 1}
To compile this kernel, we perform the following steps:
- Link with libdevice
- Internalize all but the public kernel function
- Run NVVMReflect and set __CUDA_FTZ to 0
- Optimize the linked module
- Codegen the module
These steps can be performed by the LLVM llvm-link
, opt
, and llc
tools. In a complete compiler, these steps can also be performed entirely
programmatically by setting up an appropriate pass configuration (see
Linking with Libdevice).
# llvm-link t2.bc libdevice.compute_20.10.bc -o t2.linked.bc
# opt -internalize -internalize-public-api-list=kernel -nvvm-reflect-list=__CUDA_FTZ=0 -nvvm-reflect -O3 t2.linked.bc -o t2.opt.bc
# llc -mcpu=sm_20 t2.opt.bc -o t2.ptx
Note
The -nvvm-reflect-list=__CUDA_FTZ=0
is not strictly required, as any
undefined variables will default to zero. It is shown here for evaluation
purposes.
This gives us the following PTX (excerpt):
//
// Generated by LLVM NVPTX Back-End
//
.version 3.1
.target sm_20
.address_size 64
// .globl kernel
// @kernel
.visible .entry kernel(
.param .u64 kernel_param_0,
.param .u64 kernel_param_1,
.param .u64 kernel_param_2
)
{
.reg .pred %p<30>;
.reg .f32 %f<111>;
.reg .s32 %r<21>;
.reg .s64 %rl<8>;
// BB#0: // %entry
ld.param.u64 %rl2, [kernel_param_0];
mov.u32 %r3, %tid.x;
ld.param.u64 %rl3, [kernel_param_1];
mul.wide.s32 %rl4, %r3, 4;
add.s64 %rl5, %rl2, %rl4;
ld.param.u64 %rl6, [kernel_param_2];
add.s64 %rl7, %rl3, %rl4;
add.s64 %rl1, %rl6, %rl4;
ld.global.f32 %f1, [%rl5];
ld.global.f32 %f2, [%rl7];
setp.eq.f32 %p1, %f1, 0f3F800000;
setp.eq.f32 %p2, %f2, 0f00000000;
or.pred %p3, %p1, %p2;
@%p3 bra BB0_1;
bra.uni BB0_2;
BB0_1:
mov.f32 %f110, 0f3F800000;
st.global.f32 [%rl1], %f110;
ret;
BB0_2: // %__nv_isnanf.exit.i
abs.f32 %f4, %f1;
setp.gtu.f32 %p4, %f4, 0f7F800000;
@%p4 bra BB0_4;
// BB#3: // %__nv_isnanf.exit5.i
abs.f32 %f5, %f2;
setp.le.f32 %p5, %f5, 0f7F800000;
@%p5 bra BB0_5;
BB0_4: // %.critedge1.i
add.f32 %f110, %f1, %f2;
st.global.f32 [%rl1], %f110;
ret;
BB0_5: // %__nv_isinff.exit.i
...
BB0_26: // %__nv_truncf.exit.i.i.i.i.i
mul.f32 %f90, %f107, 0f3FB8AA3B;
cvt.rzi.f32.f32 %f91, %f90;
mov.f32 %f92, 0fBF317200;
fma.rn.f32 %f93, %f91, %f92, %f107;
mov.f32 %f94, 0fB5BFBE8E;
fma.rn.f32 %f95, %f91, %f94, %f93;
mul.f32 %f89, %f95, 0f3FB8AA3B;
// inline asm
ex2.approx.ftz.f32 %f88,%f89;
// inline asm
add.f32 %f96, %f91, 0f00000000;
ex2.approx.f32 %f97, %f96;
mul.f32 %f98, %f88, %f97;
setp.lt.f32 %p15, %f107, 0fC2D20000;
selp.f32 %f99, 0f00000000, %f98, %p15;
setp.gt.f32 %p16, %f107, 0f42D20000;
selp.f32 %f110, 0f7F800000, %f99, %p16;
setp.eq.f32 %p17, %f110, 0f7F800000;
@%p17 bra BB0_28;
// BB#27:
fma.rn.f32 %f110, %f110, %f108, %f110;
BB0_28: // %__internal_accurate_powf.exit.i
setp.lt.f32 %p18, %f1, 0f00000000;
setp.eq.f32 %p19, %f3, 0f3F800000;
and.pred %p20, %p18, %p19;
@!%p20 bra BB0_30;
bra.uni BB0_29;
BB0_29:
mov.b32 %r9, %f110;
xor.b32 %r10, %r9, -2147483648;
mov.b32 %f110, %r10;
BB0_30: // %__nv_powf.exit
st.global.f32 [%rl1], %f110;
ret;
}
User Guide for AMDGPU Back-end¶
Introduction¶
The AMDGPU back-end provides ISA code generation for AMD GPUs, starting with the R600 family up until the current Volcanic Islands (GCN Gen 3).
Assembler¶
The assembler is currently considered experimental.
For syntax examples look in test/MC/AMDGPU.
Below are some of the currently supported features (modulo bugs). These all apply to the Southern Islands ISA; Sea Islands and Volcanic Islands are also supported but may be missing some instructions and have more bugs:
DS Instructions¶
All DS instructions are supported.
FLAT Instructions¶
These instructions are only present in the Sea Islands and Volcanic Islands instruction sets. All FLAT instructions are supported for these architectures.
MUBUF Instructions¶
All non-atomic MUBUF instructions are supported.
SMRD Instructions¶
Only the s_load_dword* SMRD instructions are supported.
SOP1 Instructions¶
All SOP1 instructions are supported.
SOP2 Instructions¶
All SOP2 instructions are supported.
SOPC Instructions¶
All SOPC instructions are supported.
SOPP Instructions¶
Unless otherwise mentioned, all SOPP instructions that have one or more operands accept integer operands only. No verification is performed on the operands, so it is up to the programmer to be familiar with the range of acceptable values.
s_waitcnt¶
s_waitcnt accepts named arguments to specify which memory counter(s) to wait for.
// Wait for all counters to be 0
s_waitcnt 0
// Equivalent to s_waitcnt 0. Counter names can also be delimited by
// '&' or ','.
s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
// Wait for vmcnt counter to be 1.
s_waitcnt vmcnt(1)
VOP1, VOP2, VOP3, VOPC Instructions¶
All 32-bit and 64-bit encodings should work.
The assembler will automatically detect which encoding size to use for VOP1, VOP2, and VOPC instructions based on the operands. If you want to force a specific encoding size, you can add an _e32 (for 32-bit encoding) or _e64 (for 64-bit encoding) suffix to the instruction. Most, but not all, instructions support an explicit suffix. These are all valid assembly strings:
v_mul_i32_i24 v1, v2, v3
v_mul_i32_i24_e32 v1, v2, v3
v_mul_i32_i24_e64 v1, v2, v3
Assembler Directives¶
.hsa_code_object_version major, minor¶
major and minor are integers that specify the version of the HSA code object that will be generated by the assembler. This value will be stored in an entry of the .note section.
.hsa_code_object_isa [major, minor, stepping, vendor, arch]¶
major, minor, and stepping are all integers that describe the instruction set architecture (ISA) version of the assembly program.
vendor and arch are quoted strings. vendor should always be equal to “AMD” and arch should always be equal to “AMDGPU”.
If no arguments are specified, then the assembler will derive the ISA version, vendor, and arch from the value of the -mcpu option that is passed to the assembler.
ISA version, vendor, and arch will all be stored in a single entry of the .note section.
.amd_kernel_code_t¶
This directive marks the beginning of a list of key / value pairs that are used to specify the amd_kernel_code_t object that will be emitted by the assembler. The list must be terminated by the .end_amd_kernel_code_t directive. For any amd_kernel_code_t values that are unspecified, a default value will be used. The default value for all keys is 0, with the following exceptions:
- kernel_code_version_major defaults to 1.
- machine_kind defaults to 1.
- machine_version_major, machine_version_minor, and machine_version_stepping are derived from the value of the -mcpu option that is passed to the assembler.
- kernel_code_entry_byte_offset defaults to 256.
- wavefront_size defaults to 6.
- kernarg_segment_alignment, group_segment_alignment, and private_segment_alignment default to 4. Note that alignments are specified as a power of two, so a value of n means an alignment of 2^n.
The .amd_kernel_code_t directive must be placed immediately after the function label and before any instructions.
For a full list of amd_kernel_code_t keys, see the examples in test/CodeGen/AMDGPU/hsa.s. For an explanation of the meanings of the different keys, see the comments in lib/Target/AMDGPU/AmdKernelCodeT.h.
Here is an example of a minimal amd_kernel_code_t specification:
.hsa_code_object_version 1,0
.hsa_code_object_isa
.text
hello_world:
.amd_kernel_code_t
enable_sgpr_kernarg_segment_ptr = 1
is_ptr64 = 1
compute_pgm_rsrc1_vgprs = 0
compute_pgm_rsrc1_sgprs = 0
compute_pgm_rsrc2_user_sgpr = 2
kernarg_segment_byte_size = 8
wavefront_sgpr_count = 2
workitem_vgpr_count = 3
.end_amd_kernel_code_t
s_load_dwordx2 s[0:1], s[0:1] 0x0
v_mov_b32 v0, 3.14159
s_waitcnt lgkmcnt(0)
v_mov_b32 v1, s0
v_mov_b32 v2, s1
flat_store_dword v0, v[1:2]
s_endpgm
Stack maps and patch points in LLVM¶
Definitions¶
In this document we refer to the “runtime” collectively as all components that serve as the LLVM client, including the LLVM IR generator, object code consumer, and code patcher.
A stack map records the location of live values
at a particular
instruction address. These live values
do not refer to all the
LLVM values live across the stack map. Instead, they are only the
values that the runtime requires to be live at this point. For
example, they may be the values the runtime will need to resume
program execution at that point independent of the compiled function
containing the stack map.
LLVM emits stack map data into the object code within a designated Stack Map Section. This stack map data contains a record for each stack map. The record stores the stack map’s instruction address and contains an entry for each mapped value. Each entry encodes a value’s location as a register, stack offset, or constant.
A patch point is an instruction address at which space is reserved for
patching a new instruction sequence at run time. Patch points look
much like calls to LLVM. They take arguments that follow a calling
convention and may return a value. They also imply stack map
generation, which allows the runtime to locate the patchpoint and
find the location of live values
at that point.
Motivation¶
This functionality is currently experimental but is potentially useful in a variety of settings, the most obvious being a runtime (JIT) compiler. Example applications of the patchpoint intrinsics are implementing an inline call cache for polymorphic method dispatch or optimizing the retrieval of properties in dynamically typed languages such as JavaScript.
The intrinsics documented here are currently used by the JavaScript compiler within the open source WebKit project, see the FTL JIT, but they are designed to be used whenever stack maps or code patching are needed. Because the intrinsics have experimental status, compatibility across LLVM releases is not guaranteed.
The stack map functionality described in this document is separate from the functionality described in Computing stack maps. GCFunctionMetadata provides the location of pointers into a collected heap captured by the GCRoot intrinsic, which can also be considered a “stack map”. Unlike the stack maps defined above, the GCFunctionMetadata stack map interface does not provide a way to associate live register values of arbitrary type with an instruction address, nor does it specify a format for the resulting stack map. The stack maps described here could potentially provide richer information to a garbage collecting runtime, but that usage will not be discussed in this document.
Intrinsics¶
The following two kinds of intrinsics can be used to implement stack
maps and patch points: llvm.experimental.stackmap
and
llvm.experimental.patchpoint
. Both kinds of intrinsics generate a
stack map record, and they both allow some form of code patching. They
can be used independently (i.e. llvm.experimental.patchpoint
implicitly generates a stack map without the need for an additional
call to llvm.experimental.stackmap
). The choice of which to use
depends on whether it is necessary to reserve space for code patching
and whether any of the intrinsic arguments should be lowered according
to calling conventions. llvm.experimental.stackmap
does not
reserve any space, nor does it expect any call arguments. If the
runtime patches code at the stack map’s address, it will destructively
overwrite the program text. This is unlike
llvm.experimental.patchpoint
, which reserves space for in-place
patching without overwriting surrounding code. The
llvm.experimental.patchpoint
intrinsic also lowers a specified
number of arguments according to its calling convention. This allows
patched code to make in-place function calls without marshaling.
Each instance of one of these intrinsics generates a stack map record in the Stack Map Section. The record includes an ID, allowing the runtime to uniquely identify the stack map, and the offset within the code from the beginning of the enclosing function.
‘llvm.experimental.stackmap
‘ Intrinsic¶
Syntax:¶
declare void
@llvm.experimental.stackmap(i64 <id>, i32 <numShadowBytes>, ...)
Overview:¶
The ‘llvm.experimental.stackmap
‘ intrinsic records the location of
specified values in the stack map without generating any code.
Operands:¶
The first operand is an ID to be encoded within the stack map. The
second operand is the number of shadow bytes following the
intrinsic. The variable number of operands that follow are the live
values
for which locations will be recorded in the stack map.
To use this intrinsic as a bare-bones stack map, with no code patching support, the number of shadow bytes can be set to zero.
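For instance, a bare-bones use (the ID and the recorded value are chosen for illustration) that records the location of %ptr with a zero-byte shadow:
call void (i64, i32, ...)* @llvm.experimental.stackmap(i64 42, i32 0, i64* %ptr)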
Semantics:¶
The stack map intrinsic generates no code in place, unless nops are needed to cover its shadow (see below). However, its offset from function entry is stored in the stack map. This is the relative instruction address immediately following the instructions that precede the stack map.
The stack map ID allows a runtime to locate the desired stack map record. LLVM passes this ID through directly to the stack map record without checking uniqueness.
LLVM guarantees a shadow of instructions following the stack map’s instruction offset during which neither the end of the basic block nor another call to llvm.experimental.stackmap or llvm.experimental.patchpoint may occur. This allows the runtime to patch the code at this point in response to an event triggered from outside the code. The code for instructions following the stack map may be emitted in the stack map’s shadow, and these instructions may be overwritten by destructive patching. Without shadow bytes, this destructive patching could overwrite program text or data outside the current function. We disallow overlapping stack map shadows so that the runtime does not need to consider this corner case.
For example, a stack map with an 8-byte shadow:
call void @runtime()
call void (i64, i32, ...)* @llvm.experimental.stackmap(i64 77, i32 8,
i64* %ptr)
%val = load i64* %ptr
%add = add i64 %val, 3
ret i64 %add
May require one byte of nop-padding:
0x00 callq _runtime
0x05 nop <--- stack map address
0x06 movq (%rdi), %rax
0x07 addq $3, %rax
0x0a popq %rdx
0x0b ret <---- end of 8-byte shadow
Now, if the runtime needs to invalidate the compiled code, it may patch 8 bytes of code at the stack map’s address as follows:
0x00 callq _runtime
0x05 movl $0xffff, %eax <--- patched code at stack map address
0x0a callq *%rax <---- end of 8-byte shadow
This way, after the normal call to the runtime returns, the code will execute a patched call to a special entry point that can rebuild a stack frame from the values located by the stack map.
‘llvm.experimental.patchpoint.*‘ Intrinsic¶
Syntax:¶
declare void
@llvm.experimental.patchpoint.void(i64 <id>, i32 <numBytes>,
i8* <target>, i32 <numArgs>, ...)
declare i64
@llvm.experimental.patchpoint.i64(i64 <id>, i32 <numBytes>,
i8* <target>, i32 <numArgs>, ...)
Overview:¶
The ‘llvm.experimental.patchpoint.*‘ intrinsics create a function call to the specified <target> and record the location of specified values in the stack map.
Operands:¶
The first operand is an ID, the second operand is the number of bytes reserved for the patchable region, the third operand is the target address of a function (optionally null), and the fourth operand specifies how many of the following variable operands are considered function call arguments. The remaining variable number of operands are the live values for which locations will be recorded in the stack map.
Semantics:¶
The patch point intrinsic generates a stack map. It also emits a function call to the address specified by <target> if the address is not a constant null. The function call and its arguments are lowered according to the calling convention specified at the intrinsic’s callsite. Variants of the intrinsic with non-void return type also return a value according to calling convention.
On PowerPC, note that <target> must be the ABI function pointer for the intended target of the indirect call. Specifically, when compiling for the ELF V1 ABI, <target> is the function-descriptor address normally used as the C/C++ function-pointer representation.
Requesting zero patch point arguments is valid. In this case, all variable operands are handled just like llvm.experimental.stackmap.*. The difference is that space will still be reserved for patching, a call will be emitted, and a return value is allowed.
The locations of the arguments are not normally recorded in the stack map because they are already fixed by the calling convention. The remaining live values will have their location recorded, which could be a register, stack location, or constant. A special calling convention, anyregcc, has been introduced for use with stack maps. It forces the arguments to be loaded into registers but allows those registers to be dynamically allocated. These argument registers will have their register locations recorded in the stack map in addition to the remaining live values.
The patch point also emits nops to cover at least <numBytes> of instruction encoding space. Hence, the client must ensure that <numBytes> is enough to encode a call to the target address on the supported targets. If the call target is constant null, then there is no minimum requirement. A zero-byte null target patchpoint is valid.
The runtime may patch the code emitted for the patch point, including the call sequence and nops. However, the runtime may not assume anything about the code LLVM emits within the reserved space. Partial patching is not allowed. The runtime must patch all reserved bytes, padding with nops if necessary.
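For illustration only, here is a minimal C++ sketch of what such destructive patching might look like on x86-64: it writes a movabsq plus an indirect call through a scratch register (the choice of %r11 and the function name are assumptions of this sketch, not part of the intrinsic’s contract), then pads the rest of the reserved region with single-byte nops. A real runtime must also handle instruction-cache coherence and threads concurrently executing the region, which this sketch ignores.
#include <cstddef>
#include <cstdint>
#include <cstring>

// Patch a reserved region of numBytes (>= 13) with:
//   movabsq $target, %r11   ; 10 bytes: 49 BB imm64
//   callq   *%r11           ;  3 bytes: 41 FF D3
// followed by 1-byte nops (0x90) up to numBytes.
void patchCallAt(uint8_t *site, std::size_t numBytes, uint64_t target) {
  std::size_t i = 0;
  site[i++] = 0x49; site[i++] = 0xBB;                   // movabsq imm64, %r11
  std::memcpy(site + i, &target, sizeof target);
  i += sizeof target;
  site[i++] = 0x41; site[i++] = 0xFF; site[i++] = 0xD3; // callq *%r11
  while (i < numBytes)
    site[i++] = 0x90;                                   // nop padding
}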
This example shows a patch point reserving 15 bytes, with one argument in $rdi, and a return value in $rax per native calling convention:
%target = inttoptr i64 -281474976710654 to i8*
%val = call i64 (i64, i32, ...)*
@llvm.experimental.patchpoint.i64(i64 78, i32 15,
i8* %target, i32 1, i64* %ptr)
%add = add i64 %val, 3
ret i64 %add
May generate:
0x00 movabsq $0xffff000000000002, %r11 <--- patch point address
0x0a callq *%r11
0x0d nop
0x0e nop <--- end of reserved 15-bytes
0x0f addq $0x3, %rax
0x10 movq %rax, 8(%rsp)
Note that no stack map locations will be recorded. If the patched code sequence does not need arguments fixed to specific calling convention registers, then the anyregcc convention may be used:
%val = call anyregcc @llvm.experimental.patchpoint(i64 78, i32 15,
i8* %target, i32 1,
i64* %ptr)
The stack map now indicates the location of the %ptr argument and return value:
Stack Map: ID=78, Loc0=%r9 Loc1=%r8
The patch code sequence may now use the argument that happened to be allocated in %r8 and return a value allocated in %r9:
0x00 movslq 4(%r8), %r9 <--- patched code at patch point address
0x03 nop
...
0x0e nop <--- end of reserved 15-bytes
0x0f addq $0x3, %r9
0x10 movq %r9, 8(%rsp)
Stack Map Format¶
The existence of a stack map or patch point intrinsic within an LLVM Module forces code emission to create a Stack Map Section. The format of this section follows:
Header {
  uint8  : Stack Map Version (current version is 1)
  uint8  : Reserved (expected to be 0)
  uint16 : Reserved (expected to be 0)
}
uint32 : NumFunctions
uint32 : NumConstants
uint32 : NumRecords
StkSizeRecord[NumFunctions] {
  uint64 : Function Address
  uint64 : Stack Size
}
Constants[NumConstants] {
  uint64 : LargeConstant
}
StkMapRecord[NumRecords] {
  uint64 : PatchPoint ID
  uint32 : Instruction Offset
  uint16 : Reserved (record flags)
  uint16 : NumLocations
  Location[NumLocations] {
    uint8  : Register | Direct | Indirect | Constant | ConstantIndex
    uint8  : Reserved (location flags)
    uint16 : Dwarf RegNum
    int32  : Offset or SmallConstant
  }
  uint16 : Padding
  uint16 : NumLiveOuts
  LiveOuts[NumLiveOuts] {
    uint16 : Dwarf RegNum
    uint8  : Reserved
    uint8  : Size in Bytes
  }
  uint32 : Padding (only if required to align to 8 bytes)
}
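To make the layout concrete, here is a minimal C++ sketch of a consumer walking the section’s fixed-size prefix. It assumes the section bytes are already in memory and that the host byte order matches the target’s; Cursor and dumpStackMapsPrefix are illustrative names, not LLVM APIs. Decoding the variable-length location and live-out arrays is discussed below.
#include <cstdint>
#include <cstdio>
#include <cstring>

// Minimal read cursor over the raw section bytes.
struct Cursor {
  const uint8_t *p;
  template <typename T> T read() {
    T v;
    std::memcpy(&v, p, sizeof v);
    p += sizeof v;
    return v;
  }
};

void dumpStackMapsPrefix(const uint8_t *data) {
  Cursor c{data};
  std::printf("version %u\n", (unsigned)c.read<uint8_t>()); // currently 1
  c.read<uint8_t>();                                        // reserved
  c.read<uint16_t>();                                       // reserved
  uint32_t numFunctions = c.read<uint32_t>();
  uint32_t numConstants = c.read<uint32_t>();
  uint32_t numRecords = c.read<uint32_t>();

  for (uint32_t i = 0; i < numFunctions; ++i) {
    uint64_t addr = c.read<uint64_t>();
    uint64_t stackSize = c.read<uint64_t>();
    std::printf("function %#llx, stack size %llu\n",
                (unsigned long long)addr, (unsigned long long)stackSize);
  }
  for (uint32_t i = 0; i < numConstants; ++i)
    std::printf("large constant %llu\n",
                (unsigned long long)c.read<uint64_t>());

  // Each StkMapRecord then begins with this fixed prefix; the Location
  // and LiveOuts arrays that follow it are variable length.
  if (numRecords != 0) {
    uint64_t id = c.read<uint64_t>();
    uint32_t offset = c.read<uint32_t>();
    c.read<uint16_t>(); // reserved (record flags)
    uint16_t numLocations = c.read<uint16_t>();
    std::printf("record ID %llu at offset %u, %u locations\n",
                (unsigned long long)id, offset, numLocations);
  }
}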
The first byte of each location encodes a type that indicates how to interpret the RegNum and Offset fields as follows:
Encoding | Type       | Value             | Description
---------|------------|-------------------|--------------------
0x1      | Register   | Reg               | Value in a register
0x2      | Direct     | Reg + Offset      | Frame index value
0x3      | Indirect   | [Reg + Offset]    | Spilled value
0x4      | Constant   | Offset            | Small constant
0x5      | ConstIndex | Constants[Offset] | Large constant
In the common case, a value is available in a register, and the Offset field will be zero. Values spilled to the stack are encoded as Indirect locations. The runtime must load those values from a stack address, typically in the form [BP + Offset]. If an alloca value is passed directly to a stack map intrinsic, then LLVM may fold the frame index into the stack map as an optimization to avoid allocating a register or stack slot. These frame indices will be encoded as Direct locations in the form BP + Offset. LLVM may also optimize constants by emitting them directly in the stack map, either in the Offset of a Constant location or in the constant pool, referred to by ConstantIndex locations.
At each callsite, a “liveout” register list is also recorded. These are the registers that are live across the stackmap and therefore must be saved by the runtime. This is an important optimization when the patchpoint intrinsic is used with a calling convention that by default preserves most registers as callee-save.
Each entry in the liveout register list contains a DWARF register number and size in bytes. The stackmap format deliberately omits specific subregister information. Instead the runtime must interpret this information conservatively. For example, if the stackmap reports one byte at %rax, then the value may be in either %al or %ah. It doesn’t matter in practice, because the runtime will simply save %rax. However, if the stackmap reports 16 bytes at %ymm0, then the runtime can safely optimize by saving only %xmm0.
The stack map format is a contract between an LLVM SVN revision and the runtime. It is currently experimental and may change in the short term, but minimizing the need to update the runtime is important. Consequently, the stack map design is motivated by simplicity and extensibility. Compactness of the representation is secondary because the runtime is expected to parse the data immediately after compiling a module and encode the information in its own format. Since the runtime controls the allocation of sections, it can reuse the same stack map space for multiple modules.
Stackmap support is currently only implemented for 64-bit platforms. However, a 32-bit implementation should be able to use the same format with an insignificant amount of wasted space.
Stack Map Section¶
A JIT compiler can easily access this section by providing its own memory manager via the LLVM C API LLVMCreateSimpleMCJITMemoryManager(). When creating the memory manager, the JIT provides a callback: LLVMMemoryManagerAllocateDataSectionCallback(). When LLVM creates this section, it invokes the callback and passes the section name. The JIT can record the in-memory address of the section at this time and later parse it to recover the stack map data.
On Darwin, the stack map section name is “__llvm_stackmaps”. The segment name is “__LLVM_STACKMAPS”.
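A rough C++ sketch of this setup follows; only the data-section callback does anything interesting, recording where the stack map section lands. The allocator is deliberately simplistic (a real memory manager must track its allocations, and std::aligned_alloc requires the size to be a multiple of the alignment), matching only the Darwin section name is an assumption of the sketch, and llvm-c/ExecutionEngine.h remains the authority on the exact callback signatures.
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include "llvm-c/ExecutionEngine.h"

static uint8_t *StackMapSection = nullptr; // recorded for later parsing

static uint8_t *allocateCode(void *Opaque, uintptr_t Size, unsigned Alignment,
                             unsigned SectionID, const char *SectionName) {
  return (uint8_t *)std::aligned_alloc(Alignment ? Alignment : 8, Size);
}

static uint8_t *allocateData(void *Opaque, uintptr_t Size, unsigned Alignment,
                             unsigned SectionID, const char *SectionName,
                             LLVMBool IsReadOnly) {
  uint8_t *Mem = (uint8_t *)std::aligned_alloc(Alignment ? Alignment : 8, Size);
  // Remember the stack map section's in-memory address when it is created.
  if (std::strcmp(SectionName, "__llvm_stackmaps") == 0)
    StackMapSection = Mem;
  return Mem;
}

static LLVMBool finalizeMemory(void *Opaque, char **ErrMsg) { return 0; }
static void destroy(void *Opaque) {}

LLVMMCJITMemoryManagerRef makeStackMapAwareManager() {
  return LLVMCreateSimpleMCJITMemoryManager(
      /*Opaque=*/nullptr, allocateCode, allocateData, finalizeMemory, destroy);
}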
Stack Map Usage¶
The stack map support described in this document can be used to precisely determine the location of values at a specific position in the code. LLVM does not maintain any mapping between those values and any higher-level entity. The runtime must be able to interpret the stack map record given only the ID, offset, and the order of the locations, which LLVM preserves.
Note that this is quite different from the goal of debug information, which is a best-effort attempt to track the location of named variables at every instruction.
An important motivation for this design is to allow a runtime to commandeer a stack frame when execution reaches an instruction address associated with a stack map. The runtime must be able to rebuild a stack frame and resume program execution using the information provided by the stack map. For example, execution may resume in an interpreter or a recompiled version of the same function.
This usage restricts LLVM optimization. Clearly, LLVM must not move stores across a stack map. However, loads must also be handled conservatively. If the load may trigger an exception, hoisting it above a stack map could be invalid. For example, the runtime may determine that a load is safe to execute without a type check given the current state of the type system. If the type system changes while some activation of the load’s function exists on the stack, the load becomes unsafe. The runtime can prevent subsequent execution of that load by immediately patching any stack map location that lies between the current call site and the load (typically, the runtime would simply patch all stack map locations to invalidate the function). If the compiler had hoisted the load above the stack map, then the program could crash before the runtime could take back control.
To enforce these semantics, stackmap and patchpoint intrinsics are considered to potentially read and write all memory. This may limit optimization more than some clients desire. This limitation may be avoided by marking the call site as “readonly”. In the future we may also allow meta-data to be added to the intrinsic call to express aliasing, thereby allowing optimizations to hoist certain loads above stack maps.
Direct Stack Map Entries¶
As shown in Stack Map Format, a Direct stack map location records the address of a frame index. This address is itself the value that the runtime requested. This differs from Indirect locations, which refer to stack locations from which the requested values must be loaded. Direct locations can communicate the address of an alloca, while Indirect locations handle register spills.
For example:
entry:
%a = alloca i64...
llvm.experimental.stackmap(i64 <ID>, i32 <shadowBytes>, i64* %a)
The runtime can determine this alloca’s relative location on the stack immediately after compilation, or at any time thereafter. This differs from Register and Indirect locations, because the runtime can only read the values in those locations when execution reaches the instruction address of the stack map.
This functionality requires LLVM to treat entry-block allocas specially when they are directly consumed by an intrinsic. (This is the same requirement imposed by the llvm.gcroot intrinsic.) LLVM transformations must not substitute the alloca with any intervening value. The runtime can verify this simply by checking that the stack map’s location is a Direct location type.
Design and Usage of the InAlloca Attribute¶
Introduction¶
The inalloca attribute is designed to allow taking the address of an aggregate argument that is being passed by value through memory. Primarily, this feature is required for compatibility with the Microsoft C++ ABI. Under that ABI, class instances that are passed by value are constructed directly into argument stack memory. Prior to the addition of inalloca, calls in LLVM were indivisible instructions. There was no way to perform intermediate work, such as object construction, between the first stack adjustment and the final control transfer. With inalloca, all arguments passed in memory are modelled as a single alloca, which can be stored to prior to the call. Unfortunately, this complicated feature comes with a large set of restrictions designed to bound the lifetime of the argument memory around the call.
For now, it is recommended that frontends and optimizers avoid producing this construct, primarily because it forces the use of a base pointer. This feature may grow in the future to allow general mid-level optimization, but for now, it should be regarded as less efficient than passing by value with a copy.
Intended Usage¶
The example below is the intended LLVM IR lowering for some C++ code that passes two default-constructed Foo objects to g in the 32-bit Microsoft C++ ABI.
// Foo is non-trivial.
struct Foo { int a, b; Foo(); ~Foo(); Foo(const Foo &); };
void g(Foo a, Foo b);
void f() {
g(Foo(), Foo());
}
%struct.Foo = type { i32, i32 }
declare void @Foo_ctor(%struct.Foo* %this)
declare void @Foo_dtor(%struct.Foo* %this)
declare void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs)

define void @f() {
entry:
  %base = call i8* @llvm.stacksave()
  %memargs = alloca <{ %struct.Foo, %struct.Foo }>
  %b = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 1
  call void @Foo_ctor(%struct.Foo* %b)
  ; If a's ctor throws, we must destruct b.
  %a = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 0
  invoke void @Foo_ctor(%struct.Foo* %a)
      to label %invoke.cont unwind label %invoke.unwind
invoke.cont:
  call void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs)
  call void @llvm.stackrestore(i8* %base)
  ...
invoke.unwind:
  call void @Foo_dtor(%struct.Foo* %b)
  call void @llvm.stackrestore(i8* %base)
  ...
}
To avoid stack leaks, the frontend saves the current stack pointer with a call to llvm.stacksave. Then, it allocates the argument stack space with alloca and calls the default constructor. The default constructor could throw an exception, so the frontend has to create a landing pad. The frontend has to destroy the already constructed argument b before restoring the stack pointer. If the constructor does not unwind, g is called. In the Microsoft C++ ABI, g will destroy its arguments, and then the stack is restored in f.
Design Considerations¶
Lifetime¶
The biggest design consideration for this feature is object lifetime. We cannot model the arguments as static allocas in the entry block, because all calls need to use the memory at the top of the stack to pass arguments. We cannot vend pointers to that memory at function entry because after code generation they will alias.
The rule against allocas between argument allocations and the call site avoids this problem, but it creates a cleanup problem. Cleanup and lifetime is handled explicitly with stack save and restore calls. In the future, we may want to introduce a new construct such as freea or afree to make it clear that this stack adjusting cleanup is less powerful than a full stack save and restore.
Nested Calls and Copy Elision¶
We also want to be able to support copy elision into these argument slots. This means we have to support multiple live argument allocations.
Consider the evaluation of:
// Foo is non-trivial.
struct Foo { int a; Foo(); Foo(const Foo &); ~Foo(); };
Foo bar(Foo b);
int main() {
bar(bar(Foo()));
}
In this case, we want to be able to elide copies into bar’s argument slots. That means we need to have more than one set of argument frames active at the same time. First, we need to allocate the frame for the outer call so we can pass it in as the hidden struct return pointer to the middle call. Then we do the same for the middle call, allocating a frame and passing its address to Foo’s default constructor. By wrapping the evaluation of the inner bar with stack save and restore, we can have multiple overlapping active call frames.
Callee-cleanup Calling Conventions¶
Another wrinkle is the existence of callee-cleanup conventions. On Windows, all methods and many other functions adjust the stack to clear the memory used to pass their arguments. In some sense, this means that the allocas are automatically cleared by the call. However, rather than modelling this as a stack adjustment, LLVM models it as a write of undef to all of the inalloca values passed to the call. Frontends should still restore the stack pointer to avoid a stack leak.
Exceptions¶
There is also the possibility of an exception. If argument evaluation or copy construction throws an exception, the landing pad must do cleanup, which includes adjusting the stack pointer to avoid a stack leak. This means the cleanup of the stack memory cannot be tied to the call itself. There needs to be a separate IR-level instruction that can perform independent cleanup of arguments.
Efficiency¶
Eventually, it should be possible to generate efficient code for this construct. In particular, using inalloca should not require a base pointer. If the backend can prove that all points in the CFG only have one possible stack level, then it can address the stack directly from the stack pointer. While this is not yet implemented, the plan is that the inalloca attribute should not change much, but the frontend IR generation recommendations may change.
Using ARM NEON instructions in big endian mode¶
Introduction¶
Generating code for big endian ARM processors is for the most part straightforward. NEON loads and stores however have some interesting properties that make code generation decisions less obvious in big endian mode.
The aim of this document is to explain the problem with NEON loads and stores, and the solution that has been implemented in LLVM.
In this document the term “vector” refers to what the ARM ABI calls a “short vector”, which is a sequence of items that can fit in a NEON register. This sequence can be 64 or 128 bits in length, and can be made up of 8, 16, 32 or 64 bit items. This document refers to A64 instructions throughout, but is almost entirely applicable to the A32/ARMv7 instruction sets also. The ABI format for passing vectors in A32 is slightly different to A64. Apart from that, the same concepts apply.
Example: C-level intrinsics -> assembly¶
It may be helpful first to illustrate how C-level ARM NEON intrinsics are lowered to instructions.
This trivial C function takes a vector of four ints and sets the zero’th lane to the value “42”:
#include <arm_neon.h>
int32x4_t f(int32x4_t p) {
return vsetq_lane_s32(42, p, 0);
}
arm_neon.h intrinsics generate “generic” IR where possible (that is, normal IR instructions, not llvm.arm.neon.* intrinsic calls). The above generates:
define <4 x i32> @f(<4 x i32> %p) {
%vset_lane = insertelement <4 x i32> %p, i32 42, i32 0
ret <4 x i32> %vset_lane
}
Which then becomes the following trivial assembly:
f: // @f
movz w8, #0x2a
ins v0.s[0], w8
ret
Problem¶
The main problem is how vectors are represented in memory and in registers.
First, a recap. The “endianness” of an item affects its representation in memory only. In a register, a number is just a sequence of bits - 64 bits in the case of AArch64 general purpose registers. Memory, however, is a sequence of addressable units of 8 bits in size. Any number greater than 8 bits must therefore be split up into 8-bit chunks, and endianness describes the order in which these chunks are laid out in memory.
A “little endian” layout has the least significant byte first (lowest in memory address). A “big endian” layout has the most significant byte first. This means that when loading an item from big endian memory, the lowest 8-bits in memory must go in the most significant 8-bits, and so forth.
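A two-line C++ experiment makes this concrete: the program below prints the bytes of a 32-bit value in memory order, so its output differs between little and big endian hosts.
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
  uint32_t x = 0x01020304;
  uint8_t bytes[sizeof x];
  std::memcpy(bytes, &x, sizeof x);
  // Little endian prints: 04 03 02 01 (least significant byte first).
  // Big endian prints:    01 02 03 04 (most significant byte first).
  for (uint8_t b : bytes)
    std::printf("%02x ", (unsigned)b);
  std::printf("\n");
}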
LDR and LD1¶
[Figure: Big endian vector load using LDR.]
A vector is a consecutive sequence of items that are operated on simultaneously. To load a 64-bit vector, 64 bits need to be read from memory. In little endian mode, we can do this by just performing a 64-bit load - LDR q0, [foo]. However if we try this in big endian mode, because of the byte swapping the lane indices end up being swapped! The zero’th item as laid out in memory becomes the n’th lane in the vector.
[Figure: Big endian vector load using LD1. Note that the lanes retain the correct ordering.]
Because of this, the LD1 instruction performs a vector load but byte-swaps the individual items within the vector rather than the entire 64 bits. This means that the register content is the same as it would have been on a little endian system.
It may seem that LD1 should suffice to perform vector loads on a big endian machine. However, there are pros and cons to the two approaches that make it less than simple to decide which register format to pick.
There are two options:
- The content of a vector register is the same as if it had been loaded with an LDR instruction.
- The content of a vector register is the same as if it had been loaded with an LD1 instruction.
Because LD1 == LDR + REV and similarly LDR == LD1 + REV (on a big endian system), we can simulate either type of load with the other type of load plus a REV instruction. So we’re not deciding which instructions to use, but which format to use (which will then influence which instruction is best to use).
Considerations¶
LLVM IR Lane ordering¶
LLVM IR has first class vector types. In LLVM IR, the zero’th element of a vector resides at the lowest memory address. The optimizer relies on this property in certain areas, for example when concatenating vectors together. The intention is for arrays and vectors to have identical memory layouts - [4 x i8] and <4 x i8> should be represented the same in memory. Without this property there would be many special cases that the optimizer would have to cleverly handle.
Use of LDR would break this lane ordering property. This doesn’t preclude the use of LDR, but we would have to do one of two things:
- Insert a REV instruction to reverse the lane order after every LDR.
- Disable all optimizations that rely on lane layout, and for every access to an individual lane (insertelement/extractelement/shufflevector) reverse the lane index.
AAPCS¶
The ARM procedure call standard (AAPCS) defines the ABI for passing vectors between functions in registers. It states:
When a short vector is transferred between registers and memory it is treated as an opaque object. That is a short vector is stored in memory as if it were stored with a single STR of the entire register; a short vector is loaded from memory using the corresponding LDR instruction. On a little-endian system this means that element 0 will always contain the lowest addressed element of a short vector; on a big-endian system element 0 will contain the highest-addressed element of a short vector.
—Procedure Call Standard for the ARM 64-bit Architecture (AArch64), 4.1.2 Short Vectors
The use of LDR and STR as the ABI defines has at least one advantage over LD1 and ST1. LDR and STR are oblivious to the size of the individual lanes of a vector. LD1 and ST1 are not - the lane size is encoded within them. This is important across an ABI boundary, because it would become necessary to know the lane width the callee expects. Consider the following code:
<callee.c>
void callee(uint32x2_t v) {
...
}
<caller.c>
extern void callee(uint32x2_t);
void caller() {
callee(...);
}
If callee changed its signature to uint16x4_t, which is equivalent in register content, and we passed vectors in the LD1 format, we would break this code until caller was updated and recompiled.
There is an argument that if the signatures of the two functions are different then the behaviour should be undefined. But there may be functions that are agnostic to the lane layout of the vector, and treating the vector as an opaque value (just loading it and storing it) would be impossible without a common format across ABI boundaries.
So to preserve ABI compatibility, we need to use the LDR lane layout across function calls.
Alignment¶
In strict alignment mode, LDR qX requires its address to be 128-bit aligned, whereas LD1 only requires it to be as aligned as the lane size. If we canonicalised on using LDR, we’d still need to use LD1 in some places to avoid alignment faults (the result of the LD1 would then need to be reversed with REV).
Most operating systems however do not run with alignment faults enabled, so this is often not an issue.
Summary¶
The following table summarises the instructions that are required to be emitted for each property mentioned above for each of the two solutions.
                          | LDR layout      | LD1 layout
--------------------------|-----------------|-----------
Lane ordering             | LDR + REV       | LD1
AAPCS                     | LDR             | LD1 + REV
Alignment for strict mode | LDR / LD1 + REV | LD1
Neither approach is perfect, and choosing one boils down to choosing the lesser of two evils. Fixing the lane-ordering issue, it was decided, would have required changes to target-agnostic compiler passes and would have resulted in a strange IR in which lane indices were reversed. It was decided that this was worse than the changes that would have to be made to support LD1, so LD1 was chosen as the canonical vector load instruction (and by inference, ST1 for vector stores).
Implementation¶
There are 3 parts to the implementation:
1. Predicate LDR and STR instructions so that they are never allowed to be selected to generate vector loads and stores. The exception is one-lane vectors [1] - these by definition cannot have lane ordering problems so are fine to use LDR/STR.
2. Create code generation patterns for bitconverts that create REV instructions.
3. Make sure appropriate bitconverts are created so that vector values get passed over call boundaries as 1-element vectors (which is the same as if they were loaded with LDR).
Bitconverts¶
[Figure: a bitconvert round trip broken by mismatched LD1 and ST1.]
The main problem with the LD1 solution is dealing with bitconverts (or bitcasts, or reinterpret casts). These are pseudo instructions that only change the compiler’s interpretation of data, not the underlying data itself. A requirement is that if data is loaded and then saved again (called a “round trip”), the memory contents should be the same after the store as before the load. If a vector is loaded and is then bitconverted to a different vector type before storing, the round trip will currently be broken.
Take for example this code sequence:
%0 = load <4 x i32> %x
%1 = bitcast <4 x i32> %0 to <2 x i64>
store <2 x i64> %1, <2 x i64>* %y
This would produce a code sequence such as that in the figure on the right. The mismatched LD1 and ST1 cause the stored data to differ from the loaded data.
When a bitconvert is converting from type X to type Y, what we need to do is to change the in-register representation of the data to be as if it had just been loaded by a LD1 of type Y.
Conceptually this is simple - we can insert a REV undoing the LD1 of type X (converting the in-register representation to the same as if it had been loaded by LDR) and then insert another REV to change the representation to be as if it had been loaded by an LD1 of type Y.
For the previous example, this would be:
LD1 v0.4s, [x]
REV64 v0.4s, v0.4s // There is no REV128 instruction, so it must be synthesized
EXT v0.16b, v0.16b, v0.16b, #8 // with a REV64 then an EXT to swap the two 64-bit elements.
REV64 v0.2d, v0.2d
EXT v0.16b, v0.16b, v0.16b, #8
ST1 v0.2d, [y]
It turns out that these REV pairs can, in almost all cases, be squashed together into a single REV. For the example above, a REV128 4s + REV128 2d is actually a REV64 4s, as shown in the figure on the right.
[1] One lane vectors may seem useless as a concept but they serve to distinguish between values held in general purpose registers and values held in NEON/VFP registers. For example, an i64 would live in an x register, but <1 x i64> would live in a d register.
LLVM Code Coverage Mapping Format¶
Introduction¶
LLVM’s code coverage mapping format is used to provide code coverage analysis using LLVM’s and Clang’s instrumentation based profiling (Clang’s -fprofile-instr-generate option).
This document is aimed at those who use LLVM’s code coverage mapping to provide code coverage analysis for their own programs, and for those who would like to know how it works under the hood. A prior knowledge of how Clang’s profile guided optimization works is useful, but not required.
We start by showing how to use LLVM and Clang for code coverage analysis, then we briefly describe LLVM’s code coverage mapping format and the way that Clang and LLVM’s code coverage tool work with this format. After the basics are down, more advanced features of the coverage mapping format are discussed - such as the data structures, LLVM IR representation and the binary encoding.
Quick Start¶
Here’s a short story that describes how to generate code coverage overview for a sample source file called test.c.
First, compile an instrumented version of your program using Clang’s -fprofile-instr-generate option with the additional -fcoverage-mapping option:
clang -o test -fprofile-instr-generate -fcoverage-mapping test.c
Then, run the instrumented binary. The runtime will produce a file called default.profraw containing the raw profile instrumentation data:
./test
After that, merge the profile data using the llvm-profdata tool:
llvm-profdata merge -o test.profdata default.profraw
Finally, run LLVM’s code coverage tool (llvm-cov) to produce the code coverage overview for the sample source file:
llvm-cov show ./test -instr-profile=test.profdata test.c
High Level Overview¶
LLVM’s code coverage mapping format is designed to be a self-contained data format that can be embedded into the LLVM IR and object files. It’s described in this document as a mapping format because its goal is to store the data that is required for a code coverage tool to map between the specific source ranges in a file and the execution counts obtained after running the instrumented version of the program.
The mapping data is used in two places in the code coverage process:
- When clang compiles a source file with -fcoverage-mapping, it generates the mapping information that describes the mapping between the source ranges and the profiling instrumentation counters. This information gets embedded into the LLVM IR and conveniently ends up in the final executable file when the program is linked.
- It is also used by llvm-cov - the mapping information is extracted from an object file and is used to associate the execution counts (the values of the profile instrumentation counters), and the source ranges in a file. After that, the tool is able to generate various code coverage reports for the program.
The coverage mapping format aims to be a “universal format” that would be suitable for usage by any frontend, and not just by Clang. It also aims to provide the frontend the possibility of generating the minimal coverage mapping data in order to reduce the size of the IR and object files - for example, instead of emitting mapping information for each statement in a function, the frontend is allowed to group the statements with the same execution count into regions of code, and emit the mapping information only for those regions.
Advanced Concepts¶
The remainder of this guide is meant to give you insight into the way the coverage mapping format works.
The coverage mapping format operates on a per-function level as the profile instrumentation counters are associated with a specific function. For each function that requires code coverage, the frontend has to create coverage mapping data that can map between the source code ranges and the profile instrumentation counters for that function.
Mapping Region¶
The function’s coverage mapping data contains an array of mapping regions. A mapping region stores the source code range that is covered by this region, the file id, the coverage mapping counter and the region’s kind. There are several kinds of mapping regions:
Code regions associate portions of source code and coverage mapping counters. They make up the majority of the mapping regions. They are used by the code coverage tool to compute the execution counts for lines, highlight the regions of code that were never executed, and to obtain the various code coverage statistics for a function. For example:
int main(int argc, const char *argv[]) { // Code Region from 1:40 to 9:2
  if (argc > 1) {                        // Code Region from 3:17 to 5:4
    printf("%s\n", argv[1]);
  } else {                               // Code Region from 5:10 to 7:4
    printf("\n");
  }
  return 0;
}
Skipped regions are used to represent source ranges that were skipped by Clang’s preprocessor. They don’t associate with coverage mapping counters, as the frontend knows that they are never executed. They are used by the code coverage tool to mark the skipped lines inside a function as non-code lines that don’t have execution counts. For example:
int main() {     // Code Region from 1:12 to 6:2
#ifdef DEBUG     // Skipped Region from 2:1 to 4:2
  printf("Hello world");
#endif
  return 0;
}
Expansion regions are used to represent Clang’s macro expansions. They have an additional property - expanded file id. This property can be used by the code coverage tool to find the mapping regions that are created as a result of this macro expansion, by checking if their file id matches the expanded file id. They don’t associate with coverage mapping counters, as the code coverage tool can determine the execution count for this region by looking up the execution count of the first region with a corresponding file id. For example:
int func(int x) {
  #define MAX(x,y) ((x) > (y)? (x) : (y))
  return MAX(x, 42); // Expansion Region from 3:10 to 3:13
}
Source Range:¶
The source range record contains the starting and ending location of a certain mapping region. Both locations include the line and the column numbers.
File ID:¶
The file id is an integer value that tells us in which source file or macro expansion this region is located. It enables Clang to produce mapping information for the code defined inside macros, as this example demonstrates:
void func(const char *str) { // Code Region from 1:28 to 6:2 with file id 0
#define PUT printf("%s\n", str) // 2 Code Regions from 2:15 to 2:34 with file ids 1 and 2
if(*str)
PUT; // Expansion Region from 4:5 to 4:8 with file id 0 that expands a macro with file id 1
PUT; // Expansion Region from 5:3 to 5:6 with file id 0 that expands a macro with file id 2
}
Counter:¶
A coverage mapping counter can represent a reference to a profile instrumentation counter. The execution count for a region with such a counter is determined by looking up the value of the corresponding profile instrumentation counter.
It can also represent a binary arithmetical expression that operates on coverage mapping counters or other expressions. The execution count for a region with an expression counter is determined by evaluating the expression’s arguments and then adding them together or subtracting them from one another. In the example below, a subtraction expression is used to compute the execution count for the compound statement that follows the else keyword:
int main(int argc, const char *argv[]) { // Region's counter is a reference to the profile counter #0
if (argc > 1) { // Region's counter is a reference to the profile counter #1
printf("%s\n", argv[1]);
} else { // Region's counter is an expression (reference to the profile counter #0 - reference to the profile counter #1)
printf("\n");
}
return 0;
}
Finally, a coverage mapping counter can also represent an execution count of zero. The zero counter is used to provide coverage mapping for unreachable statements and expressions, like in the example below:
int main() {
return 0;
printf("Hello world!\n"); // Unreachable region's counter is zero
}
The zero counters allow the code coverage tool to display proper line execution counts for the unreachable lines and highlight the unreachable code. Without them, the tool would think that those lines and regions were still executed, as it doesn’t possess the frontend’s knowledge.
LLVM IR Representation¶
The coverage mapping data is stored in the LLVM IR using a single global constant structure variable called __llvm_coverage_mapping with the __llvm_covmap section specifier.
For example, let’s consider a C file and how it gets compiled to LLVM:
int foo() {
return 42;
}
int bar() {
return 13;
}
The coverage mapping variable generated by Clang is:
@__llvm_coverage_mapping = internal constant { i32, i32, i32, i32, [2 x { i8*, i32, i32 }], [40 x i8] }
{ i32 2, ; The number of function records
i32 20, ; The length of the string that contains the encoded translation unit filenames
i32 20, ; The length of the string that contains the encoded coverage mapping data
i32 0, ; Coverage mapping format version
[2 x { i8*, i32, i32 }] [ ; Function records
{ i8*, i32, i32 } { i8* getelementptr inbounds ([3 x i8]* @__llvm_profile_name_foo, i32 0, i32 0), ; Function's name
i32 3, ; Function's name length
i32 9 ; Function's encoded coverage mapping data string length
},
{ i8*, i32, i32 } { i8* getelementptr inbounds ([3 x i8]* @__llvm_profile_name_bar, i32 0, i32 0), ; Function's name
i32 3, ; Function's name length
i32 9 ; Function's encoded coverage mapping data string length
}],
[40 x i8] c"..." ; Encoded data (dissected later)
}, section "__llvm_covmap", align 8
Version:¶
The coverage mapping version number can have the following values:
- 0 — The first (current) version of the coverage mapping format.
Function record:¶
A function record is a structure of the following type:
{ i8*, i32, i32 }
It contains the pointer to the function’s name, function’s name length, and the length of the encoded mapping data for that function.
Encoded data:¶
The encoded data is stored in a single string that contains the encoded filenames used by this translation unit and the encoded coverage mapping data for each function in this translation unit.
The encoded data has the following structure:
[filenames, coverageMappingDataForFunctionRecord0, coverageMappingDataForFunctionRecord1, ..., padding]
If necessary, the encoded data is padded with zeroes so that the size of the data string is rounded up to the nearest multiple of 8 bytes.
Dissecting the sample:¶
Here’s an overview of the encoded data that was stored in the IR for the coverage mapping sample that was shown earlier:
The IR contains the following string constant that represents the encoded coverage mapping data for the sample translation unit:
c"\01\12/Users/alex/test.c\01\00\00\01\01\01\0C\02\02\01\00\00\01\01\04\0C\02\02\00\00"
The string contains values that are encoded in the LEB128 format, which is used throughout for storing integers. It also contains a string value.
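For reference, unsigned LEB128 can be decoded with a few lines of C++ - each byte contributes seven data bits, least significant group first, and the high bit marks continuation. The function name is illustrative.
#include <cstdint>

// Decode one unsigned LEB128 value, advancing the cursor past it.
uint64_t decodeULEB128(const uint8_t *&p) {
  uint64_t value = 0;
  unsigned shift = 0;
  uint8_t byte;
  do {
    byte = *p++;
    value |= (uint64_t)(byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  return value;
}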
The length of the substring that contains the encoded translation unit filenames is the value of the second field in the __llvm_coverage_mapping structure, which is 20, thus the filenames are encoded in this string:
c"\01\12/Users/alex/test.c"
This string contains the following data:
- Its first byte has a value of 0x01. It stores the number of filenames contained in this string.
- Its second byte stores the length of the first filename in this string.
- The remaining 18 bytes are used to store the first filename.
The length of the substring that contains the encoded coverage mapping data for the first function is the value of the third field in the first structure in an array of function records stored in the fifth field of the __llvm_coverage_mapping structure, which is 9. Therefore, the coverage mapping for the first function record is encoded in this string:
c"\01\00\00\01\01\01\0C\02\02"
This string consists of the following bytes:
0x01 - The number of file ids used by this function. There is only one file id used by the mapping data in this function.
0x00 - An index into the filenames array which corresponds to the file “/Users/alex/test.c”.
0x00 - The number of counter expressions used by this function. This function doesn’t use any expressions.
0x01 - The number of mapping regions that are stored in an array for the function’s file id #0.
0x01 - The coverage mapping counter for the first region in this function. The value of 1 tells us that it’s a coverage mapping counter that is a reference to the profile instrumentation counter with an index of 0.
0x01 - The starting line of the first mapping region in this function.
0x0C - The starting column of the first mapping region in this function.
0x02 - The ending line of the first mapping region in this function.
0x02 - The ending column of the first mapping region in this function.
The length of the substring that contains the encoded coverage mapping data for the second function record is also 9. It’s structured like the mapping data for the first function record.
The two trailing bytes are zeroes and are used to pad the coverage mapping data to give it the 8 byte alignment.
Encoding¶
The per-function coverage mapping data is encoded as a stream of bytes with a simple structure. The structure consists of encoding types, such as variable-length unsigned integers, that are used to encode the File ID Mapping, the Counter Expressions and the Mapping Regions.
The format of the structure follows:
[file id mapping, counter expressions, mapping regions]
The translation unit filenames are encoded using the same encoding types as the per-function coverage mapping data, with the following structure:
[numFilenames : LEB128, filename0 : string, filename1 : string, ...]
Types¶
This section describes the basic types that are used by the encoding format
and can appear after :
in the [foo : type]
description.
File ID Mapping¶
[numIndices : LEB128, filenameIndex0 : LEB128, filenameIndex1 : LEB128, ...]
File id mapping in a function’s coverage mapping stream contains the indices into the translation unit’s filenames array.
Counter¶
[value : LEB128]
A coverage mapping counter is stored in a single LEB value. It is composed of two things — the tag which is stored in the lowest 2 bits, and the counter data which is stored in the remaining bits.
Tag:¶
The counter’s tag encodes the counter’s kind and, if the counter is an expression, the expression’s kind. The possible tag values are:
- 0 - The counter is zero.
- 1 - The counter is a reference to the profile instrumentation counter.
- 2 - The counter is a subtraction expression.
- 3 - The counter is an addition expression.
Data:¶
The counter’s data is interpreted in the following manner:
- When the counter is a reference to the profile instrumentation counter, then the counter’s data is the id of the profile counter.
- When the counter is an expression, then the counter’s data is the index into the array of counter expressions.
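Putting the tag and data rules together, decoding a single counter value might look like this C++ sketch (the names are illustrative):
#include <cstdint>

enum CounterTag { Zero = 0, CounterRef = 1, SubExpr = 2, AddExpr = 3 };

struct Counter {
  CounterTag tag; // lowest 2 bits of the LEB value
  uint64_t data;  // profile counter id, or index into the expressions array
};

Counter decodeCounter(uint64_t value) {
  return {static_cast<CounterTag>(value & 0x3), value >> 2};
}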
Counter Expressions¶
[numExpressions : LEB128, expr0LHS : LEB128, expr0RHS : LEB128, expr1LHS : LEB128, expr1RHS : LEB128, ...]
Counter expressions consist of two counters as they represent binary arithmetic operations. The expression’s kind is determined from the tag of the counter that references this expression.
Mapping Regions¶
[numRegionArrays : LEB128, regionsForFile0, regionsForFile1, ...]
The mapping regions are stored in an array of sub-arrays where every region in a particular sub-array has the same file id.
The file id for a sub-array of regions is the index of that sub-array in the main array, e.g. the first sub-array will have the file id of 0.
Sub-Array of Regions¶
[numRegions : LEB128, region0, region1, ...]
The mapping regions for a specific file id are stored in an array that is sorted in an ascending order by the region’s starting location.
Mapping Region¶
[header, source range]
The mapping region record contains two sub-records — the header, which stores the counter and/or the region’s kind, and the source range that contains the starting and ending location of this region.
Header¶
[counter]
or
[pseudo-counter]
The header encodes the region’s counter and the region’s kind.
The value of the counter’s tag distinguishes between the counters and pseudo-counters — if the tag is zero, then this header contains a pseudo-counter, otherwise this header contains an ordinary counter.
Pseudo-Counter:¶
[value : LEB128]
A pseudo-counter is stored in a single LEB value, just like the ordinary counter. It has the following interpretation:
bits 0-1: tag, which is always 0.
bit 2: expansionRegionTag. If this bit is set, then this mapping region is an expansion region.
remaining bits: data. If this region is an expansion region, then the data contains the expanded file id of that region.
Otherwise, the data contains the region’s kind. The possible region kind values are:
- 0 - This mapping region is a code region with a counter of zero.
- 2 - This mapping region is a skipped region.
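Continuing the counter-decoding sketch from earlier, a pseudo-counter value splits into its fields as follows (again illustrative):
#include <cstdint>

struct PseudoCounter {
  bool isExpansion; // bit 2: expansionRegionTag
  uint64_t data;    // expanded file id, or region kind (0 or 2)
};

// Caller has already checked that the tag (bits 0-1) is zero.
PseudoCounter decodePseudoCounter(uint64_t value) {
  return {(value & 0x4) != 0, value >> 3};
}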
Source Range¶
[deltaLineStart : LEB128, columnStart : LEB128, numLines : LEB128, columnEnd : LEB128]
The source range record contains the following fields:
deltaLineStart: The difference between the starting line of the current mapping region and the starting line of the previous mapping region.
If the current mapping region is the first region in the current sub-array, then it stores the starting line of that region.
columnStart: The starting column of the mapping region.
numLines: The difference between the ending line and the starting line of the current mapping region.
columnEnd: The ending column of the mapping region.
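As a sketch of how a reader applies this delta encoding, the helper below reconstructs absolute ranges for one sub-array of regions. The input is assumed to hold the four decoded LEB128 fields per region in encounter order; all names are illustrative.
#include <array>
#include <cstdint>
#include <vector>

struct SourceRange { unsigned lineStart, columnStart, lineEnd, columnEnd; };

std::vector<SourceRange>
decodeRanges(const std::vector<std::array<uint64_t, 4>> &raw) {
  std::vector<SourceRange> out;
  unsigned prevLineStart = 0;
  for (const auto &r : raw) {
    // r = {deltaLineStart, columnStart, numLines, columnEnd}; the delta is
    // absolute for the first region of the sub-array (prevLineStart == 0).
    unsigned lineStart = prevLineStart + (unsigned)r[0];
    out.push_back({lineStart, (unsigned)r[1],
                   lineStart + (unsigned)r[2], (unsigned)r[3]});
    prevLineStart = lineStart;
  }
  return out;
}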
Garbage Collection Safepoints in LLVM¶
Status¶
This document describes a set of experimental extensions to LLVM. Use with caution. Because the intrinsics have experimental status, compatibility across LLVM releases is not guaranteed.
LLVM currently supports an alternate mechanism for conservative garbage collection support using the gcroot intrinsic. The mechanism described here shares little in common with the alternate gcroot implementation and it is hoped that this mechanism will eventually replace the gc_root mechanism.
Overview¶
To collect dead objects, garbage collectors must be able to identify any references to objects contained within executing code, and, depending on the collector, potentially update them. The collector does not need this information at all points in code - that would make the problem much harder - but only at well-defined points in the execution known as ‘safepoints’. For most collectors, it is sufficient to track at least one copy of each unique pointer value. However, for a collector which wishes to relocate objects directly reachable from running code, a higher standard is required.
One additional challenge is that the compiler may compute intermediate results (“derived pointers”) which point outside of the allocation or even into the middle of another allocation. The eventual use of this intermediate value must yield an address within the bounds of the allocation, but such “exterior derived pointers” may be visible to the collector. Given this, a garbage collector can not safely rely on the runtime value of an address to indicate the object it is associated with. If the garbage collector wishes to move any object, the compiler must provide a mapping, for each pointer, to an indication of its allocation.
To simplify the interaction between a collector and the compiled code, most garbage collectors are organized in terms of three abstractions: load barriers, store barriers, and safepoints.
- A load barrier is a bit of code executed immediately after the machine load instruction, but before any use of the value loaded. Depending on the collector, such a barrier may be needed for all loads, merely loads of a particular type (in the original source language), or none at all.
- Analogously, a store barrier is a code fragment that runs immediately before the machine store instruction, but after the computation of the value stored. The most common use of a store barrier is to update a ‘card table’ in a generational garbage collector.
- A safepoint is a location at which pointers visible to the compiled code (i.e. currently in registers or on the stack) are allowed to change. After the safepoint completes, the actual pointer value may differ, but the ‘object’ (as seen by the source language) pointed to will not.
Note that the term ‘safepoint’ is somewhat overloaded. It refers to both the location at which the machine state is parsable and the coordination protocol involved in bringing application threads to a point at which the collector can safely use that information. The term “statepoint” as used in this document refers exclusively to the former.
This document focuses on the last item - compiler support for safepoints in generated code. We will assume that an outside mechanism has decided where to place safepoints. From our perspective, all safepoints will be function calls. To support relocation of objects directly reachable from values in compiled code, the collector must be able to:
- identify every copy of a pointer (including copies introduced by the compiler itself) at the safepoint,
- identify which object each pointer relates to, and
- potentially update each of those copies.
This document describes the mechanism by which an LLVM based compiler can provide this information to a language runtime/collector, and ensure that all pointers can be read and updated if desired. The heart of the approach is to construct (or rewrite) the IR in a manner where the possible updates performed by the garbage collector are explicitly visible in the IR. Doing so requires that we:
- create a new SSA value for each potentially relocated pointer, and ensure that no uses of the original (non-relocated) value are reachable after the safepoint,
- specify the relocation in a way which is opaque to the compiler to ensure that the optimizer can not introduce new uses of an unrelocated value after a statepoint. This prevents the optimizer from performing unsound optimizations.
- record a mapping of live pointers (and the allocation they’re associated with) for each statepoint.
At the most abstract level, inserting a safepoint can be thought of as replacing a call instruction with a call to a multiple return value function which both calls the original target of the call, returns its result, and returns updated values for any live pointers to garbage collected objects.
Note that the task of identifying all live pointers to garbage collected values, transforming the IR to expose a pointer giving the base object for every such live pointer, and inserting all the intrinsics correctly is explicitly out of scope for this document. The recommended approach is to use the utility passes described below.
This abstract function call is concretely represented by a sequence of intrinsic calls known collectively as a “statepoint relocation sequence”.
Let’s consider a simple call in LLVM IR:
define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj)
gc "statepoint-example" {
call void ()* @foo()
ret i8 addrspace(1)* %obj
}
Depending on our language we may need to allow a safepoint during the execution of foo. If so, we need to let the collector update local values in the current frame. If we don’t, we’ll be accessing a potentially invalid reference once we eventually return from the call.
In this example, we need to relocate the SSA value %obj. Since we can’t actually change the value in the SSA value %obj, we need to introduce a new SSA value %obj.relocated which represents the potentially changed value of %obj after the safepoint and update any following uses appropriately. The resulting relocation sequence is:
define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj)
gc "statepoint-example" {
%0 = call i32 (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 0, i8 addrspace(1)* %obj)
%obj.relocated = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(i32 %0, i32 7, i32 7)
ret i8 addrspace(1)* %obj.relocated
}
Ideally, this sequence would have been represented as an M argument, N return value function (where M is the number of values being relocated plus the original call arguments and N is the original return value plus each relocated value), but LLVM does not easily support such a representation.
Instead, the statepoint intrinsic marks the actual site of the safepoint or statepoint. The statepoint returns a token value (which exists only at compile time). To get back the original return value of the call, we use the gc.result intrinsic. To get the relocation of each pointer in turn, we use the gc.relocate intrinsic with the appropriate index. Note that both the gc.relocate and gc.result are tied to the statepoint. The combination forms a “statepoint relocation sequence” and represents the entirety of a parseable call or ‘statepoint’.
When lowered, this example would generate the following x86 assembly:
.globl test1
.align 16, 0x90
pushq %rax
callq foo
.Ltmp1:
movq (%rsp), %rax # This load is redundant (oops!)
popq %rdx
retq
Each of the potentially relocated values has been spilled to the stack, and a record of that location has been recorded to the Stack Map section. If the garbage collector needs to update any of these pointers during the call, it knows exactly what to change.
The relevant parts of the StackMap section for our example are:
# This describes the call site
# Stack Maps: callsite 2882400000
.quad 2882400000
.long .Ltmp1-test1
.short 0
# .. 8 entries skipped ..
# This entry describes the spill slot which is directly addressable
# off RSP with offset 0. Given the value was spilled with a pushq,
# that makes sense.
# Stack Maps: Loc 8: Direct RSP [encoding: .byte 2, .byte 8, .short 7, .int 0]
.byte 2
.byte 8
.short 7
.long 0
This example was taken from the tests for the RewriteStatepointsForGC utility pass. As such, its full StackMap can be easily examined with the following command.
opt -rewrite-statepoints-for-gc test/Transforms/RewriteStatepointsForGC/basics.ll -S | llc -debug-only=stackmaps
GC Transitions¶
As a practical consideration, many garbage-collected systems allow code that is collector-aware (“managed code”) to call code that is not collector-aware (“unmanaged code”). It is common that such calls must also be safepoints, since it is desirable to allow the collector to run during the execution of unmanaged code. Furthermore, it is common that coordinating the transition from managed to unmanaged code requires extra code generation at the call site to inform the collector of the transition. In order to support these needs, a statepoint may be marked as a GC transition, and data that is necessary to perform the transition (if any) may be provided as additional arguments to the statepoint.
Note that although in many cases statepoints may be inferred to be GC transitions based on the function symbols involved (e.g. a call from a function with GC strategy “foo” to a function with GC strategy “bar”), indirect calls that are also GC transitions must also be supported. This requirement is the driving force behind the decision to require that GC transitions are explicitly marked.
Let’s revisit the sample given above, this time treating the call to @foo as a GC transition. Depending on our target, the transition code may need to access some extra state in order to inform the collector of the transition. Let’s assume a hypothetical GC, somewhat unimaginatively named “hypothetical-gc”, that requires that a TLS variable must be written to before and after a call to unmanaged code. The resulting relocation sequence is:
@Flag = thread_local global i32 0, align 4
define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj)
gc "hypothetical-gc" {
%0 = call i32 (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* @foo, i32 0, i32 1, i32 1, i32* @Flag, i32 0, i8 addrspace(1)* %obj)
%obj.relocated = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(i32 %0, i32 8, i32 8)
ret i8 addrspace(1)* %obj.relocated
}
During lowering, this will result in an instruction selection DAG that looks something like:
CALLSEQ_START
...
GC_TRANSITION_START (lowered i32 *@Flag), SRCVALUE i32* Flag
STATEPOINT
GC_TRANSITION_END (lowered i32 *@Flag), SRCVALUE i32 *Flag
...
CALLSEQ_END
In order to generate the necessary transition code, the backend for each target
supported by “hypothetical-gc” must be modified to lower GC_TRANSITION_START
and GC_TRANSITION_END
nodes appropriately when the “hypothetical-gc”
strategy is in use for a particular function. Assuming that such lowering has
been added for X86, the generated assembly would be:
.globl test1
.align 16, 0x90
pushq %rax
movl $1, %fs:Flag@TPOFF
callq foo
movl $0, %fs:Flag@TPOFF
.Ltmp1:
movq (%rsp), %rax # This load is redundant (oops!)
popq %rdx
retq
Note that the design as presented above is not fully implemented: in particular, strategy-specific lowering is not present, and all GC transitions are emitted as a single no-op before and after the call instruction. These no-ops are often removed by the backend during dead machine instruction elimination.
Intrinsics¶
‘llvm.experimental.gc.statepoint’ Intrinsic¶
Syntax:¶
declare i32
@llvm.experimental.gc.statepoint(i64 <id>, i32 <num patch bytes>,
func_type <target>,
i64 <#call args>, i64 <flags>,
... (call parameters),
i64 <# transition args>, ... (transition parameters),
i64 <# deopt args>, ... (deopt parameters),
... (gc parameters))
Overview:¶
The statepoint intrinsic represents a call which is parse-able by the runtime.
Operands:¶
The ‘id’ operand is a constant integer that is reported as the ID field in the generated stackmap. LLVM does not interpret this parameter in any way and its meaning is up to the statepoint user to decide. Note that LLVM is free to duplicate code containing statepoint calls, and this may transform IR that had a unique ‘id’ per lexical call to statepoint to IR that does not.
If ‘num patch bytes’ is non-zero then the call instruction corresponding to the statepoint is not emitted and LLVM emits ‘num patch bytes’ bytes of nops in its place. LLVM will emit code to prepare the function arguments and retrieve the function return value in accordance with the calling convention; the former before the nop sequence and the latter after the nop sequence. It is expected that the user will patch over the ‘num patch bytes’ bytes of nops with a calling sequence specific to their runtime before executing the generated machine code. There are no guarantees with respect to the alignment of the nop sequence. Unlike stack maps and patch points in LLVM, statepoints do not have a concept of shadow bytes.
The ‘target’ operand is the function actually being called. The target can be specified as either a symbolic LLVM function, or as an arbitrary Value of appropriate function type. Note that the function type must match the signature of the callee and the types of the ‘call parameters’ arguments. If ‘num patch bytes’ is non-zero then ‘target’ has to be the constant pointer null of the appropriate function type.
The ‘#call args’ operand is the number of arguments to the actual call. It must exactly match the number of arguments passed in the ‘call parameters’ variable length section.
The ‘flags’ operand is used to specify extra information about the statepoint. This is currently only used to mark certain statepoints as GC transitions. This operand is a 64-bit integer with the following layout, where bit 0 is the least significant bit:
Bit #   Usage
0       Set if the statepoint is a GC transition, cleared otherwise.
1-63    Reserved for future use; must be cleared.
The ‘call parameters’ arguments are simply the arguments which need to be passed to the call target. They will be lowered according to the specified calling convention and otherwise handled like a normal call instruction. The number of arguments must exactly match what is specified in ‘# call args’. The types must match the signature of ‘target’.
The ‘transition parameters’ arguments contain an arbitrary list of Values which need to be passed to GC transition code. They will be lowered and passed as operands to the appropriate GC_TRANSITION nodes in the selection DAG. It is assumed that these arguments must be available before and after (but not necessarily during) the execution of the callee. The ‘# transition args’ field indicates how many operands are to be interpreted as ‘transition parameters’.
The ‘deopt parameters’ arguments contain an arbitrary list of Values which is meaningful to the runtime. The runtime may read any of these values, but is assumed not to modify them. If the garbage collector might need to modify one of these values, it must also be listed in the ‘gc pointer’ argument list. The ‘# deopt args’ field indicates how many operands are to be interpreted as ‘deopt parameters’.
The ‘gc parameters’ arguments contain every pointer to a garbage collector object which potentially needs to be updated by the garbage collector. Note that the argument list must explicitly contain a base pointer for every derived pointer listed. The order of arguments is unimportant. Unlike the other variable length parameter sets, this list is not length prefixed.
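To make the operand layout concrete, here is a minimal sketch of a statepoint call with each section of the argument list annotated. The callee @bar, its argument, and the mangled intrinsic suffix are illustrative assumptions, not taken from the examples above:
%tok = call i32 (i64, i32, void (i32)*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidi32f(
          i64 1234, i32 0,          ; 'id', 'num patch bytes'
          void (i32)* @bar, i32 1,  ; 'target', '# call args'
          i32 0,                    ; 'flags' (bit 0 clear: not a GC transition)
          i32 42,                   ; call parameters (exactly 1, as declared)
          i32 0,                    ; '# transition args' (none follow)
          i32 0,                    ; '# deopt args' (none follow)
          i8 addrspace(1)* %obj)    ; gc parameters (not length prefixed)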
Semantics:¶
A statepoint is assumed to read and write all memory. As a result, memory operations can not be reordered past a statepoint. It is illegal to mark a statepoint as being either ‘readonly’ or ‘readnone’.
Note that legal IR can not perform any memory operation on a ‘gc pointer’ argument of the statepoint in a location statically reachable from the statepoint. Instead, the explicitly relocated value (from a gc.relocate) must be used.
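For example, a hedged sketch reusing the shape of the relocation sequence from the beginning of this document (the operand index 7 assumes %obj is the first gc parameter):
%tok = call i32 (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 0, i8 addrspace(1)* %obj)
; ILLEGAL after the statepoint: %obj may be stale
; %v = load i8, i8 addrspace(1)* %obj
%obj.relocated = call i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(i32 %tok, i32 7, i32 7)
%v = load i8, i8 addrspace(1)* %obj.relocated  ; legal: uses the relocated value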
‘llvm.experimental.gc.result’ Intrinsic¶
Syntax:¶
declare type*
@llvm.experimental.gc.result(i32 %statepoint_token)
Overview:¶
gc.result extracts the result of the original call instruction which was replaced by the gc.statepoint. The gc.result intrinsic is actually a family of three intrinsics due to an implementation limitation. Other than the type of the return value, the semantics are the same.
Operands:¶
The first and only argument is the gc.statepoint which starts the safepoint sequence of which this gc.result is a part. Despite the typing of this as a generic i32, only the value defined by a gc.statepoint is legal here.
Semantics:¶
The gc.result represents the return value of the call target of the statepoint. The type of the gc.result must exactly match the type of the target. If the call target returns void, there will be no gc.result.
A gc.result is modeled as a ‘readnone’ pure function. It has no side effects since it is just a projection of the return value of the previous call represented by the gc.statepoint.
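As a hedged illustration, suppose the original call was to a hypothetical function @bar returning i32; the mangled names of the statepoint and gc.result overloads shown here are assumptions:
%tok = call i32 (i64, i32, i32 ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_i32f(i64 0, i32 0, i32 ()* @bar, i32 0, i32 0, i32 0, i32 0)
%val = call i32 @llvm.experimental.gc.result.i32(i32 %tok)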
‘llvm.experimental.gc.relocate’ Intrinsic¶
Syntax:¶
declare <pointer type>
@llvm.experimental.gc.relocate(i32 %statepoint_token,
i32 %base_offset,
i32 %pointer_offset)
Overview:¶
A gc.relocate returns the potentially relocated value of a pointer at the safepoint.
Operands:¶
The first argument is the gc.statepoint which starts the safepoint sequence of which this gc.relocate is a part. Despite the typing of this as a generic i32, only the value defined by a gc.statepoint is legal here.
The second argument is an index into the statepoint’s list of arguments which specifies the base pointer for the pointer being relocated. This index must land within the ‘gc parameter’ section of the statepoint’s argument list.
The third argument is an index into the statepoint’s list of arguments which specifies the (potentially) derived pointer being relocated. It is legal for this index to be the same as the second argument if-and-only-if a base pointer is being relocated. This index must land within the ‘gc parameter’ section of the statepoint’s argument list.
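A hedged sketch (the indices are hypothetical): if a base pointer %base and a derived pointer %derived into the same object are gc parameters at statepoint operand indices 7 and 8 respectively, their relocations would be requested as:
%base.relocated = call i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(i32 %tok, i32 7, i32 7)
%derived.relocated = call i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(i32 %tok, i32 7, i32 8)
Note that the base relocation passes the same index twice, while the derived relocation names its base via the second operand.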
Semantics:¶
The return value of gc.relocate is the potentially relocated value of the pointer specified by its arguments. It is unspecified how the value of the returned pointer relates to the argument to the gc.statepoint other than that a) it points to the same source language object with the same offset, and b) the ‘based-on’ relationship of the newly relocated pointers is a projection of the unrelocated pointers. In particular, the integer value of the pointer returned is unspecified.
A gc.relocate is modeled as a readnone pure function. It has no side effects since it is just a way to extract information about work done during the actual call modeled by the gc.statepoint.
Stack Map Format¶
Locations for each pointer value which may need to be read and/or updated by the runtime or collector are provided via the Stack Map format specified in the PatchPoint documentation.
Each statepoint generates the following Locations:
- Constant which describes the calling convention of the call target. This constant is a valid calling convention identifier for the version of LLVM used to generate the stackmap. No additional compatibility guarantees are made for this constant over what LLVM provides elsewhere w.r.t. these identifiers.
- Constant which describes the flags passed to the statepoint intrinsic
- Constant which describes the number of following deopt Locations (not operands)
- Variable number of Locations, one for each deopt parameter listed in the IR statepoint (the same number as described by the previous Constant)
- Variable number of Location pairs, one pair for each unique pointer which needs to be relocated. The first Location in each pair describes the base pointer for the object. The second is the derived pointer actually being relocated. It is guaranteed that the base pointer must also appear explicitly as a relocation pair if used after the statepoint. There may be fewer pairs than gc parameters in the IR statepoint. Each unique pair will occur at least once; duplicates are possible.
Note that the Locations used in each section may describe the same physical location. For example, a stack slot may appear as a deopt location, a gc base pointer, and a gc derived pointer.
The LiveOut section of the StkMapRecord will be empty for a statepoint record.
Safepoint Semantics & Verification¶
The fundamental correctness property of the compiled code w.r.t. the garbage collector is a dynamic one. It must be the case that there is no dynamic trace such that an operation involving a potentially relocated pointer is observably-after a safepoint which could relocate it. ‘Observably-after’ in this usage means that an outside observer could observe this sequence of events in a way which precludes the operation being performed before the safepoint.
To understand why this ‘observable-after’ property is required, consider a null comparison performed on the original copy of a relocated pointer. Assuming that control flow follows the safepoint, there is no way to observe externally whether the null comparison is performed before or after the safepoint. (Remember, the original Value is unmodified by the safepoint.) The compiler is free to make either scheduling choice.
The actual correctness property implemented is slightly stronger than this. We require that there be no static path on which a potentially relocated pointer is ‘observably-after’ it may have been relocated. This is slightly stronger than is strictly necessary (and thus may disallow some otherwise valid programs), but greatly simplifies reasoning about correctness of the compiled code.
By construction, this property will be upheld by the optimizer if correctly established in the source IR. This is a key invariant of the design.
The existing IR Verifier pass has been extended to check most of the local restrictions on the intrinsics mentioned in their respective documentation. The current implementation in LLVM does not check the key relocation invariant, but there is ongoing work on developing such a verifier. Please ask on llvm-dev if you’re interested in experimenting with the current version.
Utility Passes for Safepoint Insertion¶
RewriteStatepointsForGC¶
The pass RewriteStatepointsForGC transforms a function’s IR by replacing a gc.statepoint (with an optional gc.result) with a full relocation sequence, including all required gc.relocates. To function, the pass requires that the GC strategy specified for the function be able to reliably distinguish between GC references and non-GC references in the IR it is given.
As an example, given this code:
define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj)
gc "statepoint-example" {
call i32 (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0)
ret i8 addrspace(1)* %obj
}
The pass would produce this IR:
define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj)
gc "statepoint-example" {
%0 = call i32 (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0, i8 addrspace(1)* %obj)
%obj.relocated = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(i32 %0, i32 12, i32 12)
ret i8 addrspace(1)* %obj.relocated
}
In the above examples, the addrspace(1) marker on the pointers is the mechanism that the statepoint-example GC strategy uses to distinguish references from non-references. Address space 1 is not globally reserved for this purpose.
This pass can be used as a utility by a language frontend that doesn’t want to manually reason about liveness, base pointers, or relocation when constructing IR. As currently implemented, RewriteStatepointsForGC must be run after SSA construction (i.e. mem2reg).
In practice, RewriteStatepointsForGC can be run much later in the pass pipeline, after most optimization is already done. This helps to improve the quality of the generated code when compiled with garbage collection support. In the long run, this is the intended usage model. At this time, a few details have yet to be worked out about the semantic model required to guarantee this is always correct. As such, please use with caution and report bugs.
PlaceSafepoints¶
The pass PlaceSafepoints transforms a function’s IR by replacing any call or invoke instructions with appropriate gc.statepoint and gc.result pairs, and inserting safepoint polls sufficient to ensure that running code checks for a safepoint request in a timely manner. This pass is expected to be run before RewriteStatepointsForGC and thus does not produce full relocation sequences.
As an example, given input IR of the following:
define void @test() gc "statepoint-example" {
call void @foo()
ret void
}
declare void @do_safepoint()
define void @gc.safepoint_poll() {
call void @do_safepoint()
ret void
}
This pass would produce the following IR:
define void @test() gc "statepoint-example" {
%safepoint_token = call i32 (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @do_safepoint, i32 0, i32 0, i32 0, i32 0)
%safepoint_token1 = call i32 (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 0)
ret void
}
In this case, we’ve added an (unconditional) entry safepoint poll and converted the call into a gc.statepoint. Note that despite appearances, the entry poll is not necessarily redundant. We’d have to know that foo and test were not mutually recursive for the poll to be redundant. In practice, you’d probably want your poll definition to contain a conditional branch of some form, as in the sketch below.
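A minimal sketch of such a conditional poll, assuming a hypothetical thread-local flag @safepoint_requested that the runtime sets when it wants a safepoint:
@safepoint_requested = external thread_local global i32
define void @gc.safepoint_poll() {
entry:
  %value = load i32, i32* @safepoint_requested
  %do.poll = icmp ne i32 %value, 0
  br i1 %do.poll, label %poll, label %done
poll:
  call void @do_safepoint()
  br label %done
done:
  ret void
}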
At the moment, PlaceSafepoints can insert safepoint polls at method entry and at loop backedges. Extending this to work with return polls would be straightforward if desired.
PlaceSafepoints includes a number of optimizations to avoid placing safepoint polls at particular sites unless needed to ensure timely execution of a poll under normal conditions. PlaceSafepoints does not attempt to ensure timely execution of a poll under worst case conditions such as heavy system paging.
The implementation of a safepoint poll action is specified by looking up a function of the name gc.safepoint_poll in the containing Module. The body of this function is inserted at each desired poll site. While calls or invokes inside this method are transformed to gc.statepoints, recursive poll insertion is not performed.
By default PlaceSafepoints passes in 0xABCDEF00 as the statepoint ID and 0 as the number of patchable bytes to the newly constructed gc.statepoint. These values can be configured on a per-callsite basis using the attributes "statepoint-id" and "statepoint-num-patch-bytes". If a call site is marked with a "statepoint-id" function attribute and its value is a positive integer (represented as a string), then that value is used as the ID of the newly constructed gc.statepoint. If a call site is marked with a "statepoint-num-patch-bytes" function attribute and its value is a positive integer, then that value is used as the ‘num patch bytes’ parameter of the newly constructed gc.statepoint. The "statepoint-id" and "statepoint-num-patch-bytes" attributes are not propagated to the gc.statepoint call or invoke if they could be successfully parsed.
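For example, a call site could be annotated as follows (the attribute values are illustrative; both are string attributes whose values are parsed as integers):
call void @foo() "statepoint-id"="1234" "statepoint-num-patch-bytes"="3"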
If you are scheduling the RewriteStatepointsForGC pass late in the pass order, you should probably schedule this pass immediately before it. The exception would be if you need to preserve abstract frame information (e.g. for deoptimization or introspection) at safepoints. In that case, ask on the llvm-dev mailing list for suggestions.
Bugs and Enhancements¶
Currently known bugs and enhancements under consideration can be tracked by performing a bugzilla search for [Statepoint] in the summary field. When filing new bugs, please use this tag so that interested parties see the newly filed bug. As with most LLVM features, design discussions take place on llvm-dev, and patches should be sent to llvm-commits for review.
MergeFunctions pass, how it works¶
Introduction¶
Sometimes code contains equal functions, or functions that do exactly the same thing even though they are non-equal at the IR level (e.g.: multiplication by 2 and ‘shl 1’). This can happen for several reasons: mainly, the use of templates and automatic code generators. Sometimes, though, a user may simply write the same thing twice :-)
The main purpose of this pass is to recognize such functions and merge them.
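For instance, the following two hypothetical functions differ only in their value names, so the pass would detect them as equal and merge them:
define i32 @f(i32 %x) {
  %sum = add i32 %x, 1
  ret i32 %sum
}
define i32 @g(i32 %y) {
  %res = add i32 %y, 1
  ret i32 %res
}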
Why would I want to read this document?¶
This document is an extension of the in-source pass comments and describes the pass logic. It explains the algorithm used to compare functions and how we can combine equal functions correctly while keeping the module valid.
The material is presented in top-down form, so the reader can start learning the pass from its high-level ideas and end with the low-level algorithm details, in preparation for reading the sources.
The main goal, then, is to describe the algorithm and logic here: the concept. This document is for you if you don’t want to read the source code but do want to understand the pass algorithms. The author has tried not to repeat the source code and to cover only the common cases, thus avoiding the need to update this document after minor code changes.
What should I know to be able to follow along with this document?¶
The reader should be familiar with common compiler-engineering principles and LLVM code fundamentals. In this article we assume the reader is familiar with Static Single Assignment (SSA) concepts. Understanding of IR structure is also important.
We will use terms such as “module”, “function”, “basic block”, “user”, “value”, and “instruction”.
As a good starting point, the Kaleidoscope tutorial can be used:
LLVM Tutorial: Table of Contents
It is especially important to understand chapter 3 of the tutorial:
Kaleidoscope: Code generation to LLVM IR
The reader should also know how passes work in LLVM; the documentation on writing an LLVM pass can be used as a reference and starting point.
What else? Well, perhaps the reader should also have some experience in LLVM pass debugging and bug-fixing.
What do I gain by reading this document?¶
The main purpose is to provide the reader with a comfortable form of algorithm description, namely human-readable text. It can be hard to understand the algorithm straight from the source code, since the pass uses some principles that have to be explained first.
The author wishes everybody to avoid the situation where you read the code from top to bottom again and again and yet still don’t understand why it was implemented that way.
We hope that after reading this article the reader can easily debug and improve the MergeFunctions pass and thus help the LLVM project.
Narrative structure¶
The article consists of three parts. The first part explains the pass functionality at the top level. The second part describes the comparison procedure itself. The third part describes the merging process.
Within each part, the contents are also presented in top-down form: the top-level methods are described first, while the terminal ones come at the end, in the tail of each part. If the reader sees a reference to a method that hasn’t been described yet, its description can be found a bit below.
Basics¶
How to do it?¶
Do we need to merge functions? The obvious answer is yes: duplicates usually do exist, and it would be good to get rid of them. But how do we detect such duplicates? The idea is this: we split functions into small bricks (parts), then we compare the number of “bricks”, and if it is equal, compare the “bricks” themselves, and then draw our conclusions about the functions themselves.
What kind of difference could it be? For example, on a machine with 64-bit pointers (let’s assume we have only one address space), one function stores a 64-bit integer, while another one stores a pointer. If the target is the machine mentioned above, and if the functions are identical except for the parameter type (we can consider it a part of the function type), then we can treat uint64_t and void* as equal.
It was just an example; possible details are described a bit below.
As another example, the reader may imagine two more functions. The first function multiplies by 2, while the second one shifts left by 1.
Possible solutions¶
Let’s briefly consider the possible options for how and what we have to implement in order to create full-featured function merging, and also what it would mean for us.
Equal function detection obviously requires a “detector” method to be implemented; the latter should answer the question “are these functions equal?”. This “detector” method consists of tiny “sub-detectors”, each of which answers exactly the same question, but for function parts.
As the second step, we should merge equal functions, so a “merger” method is needed. The “merger” accepts two functions F1 and F2, and produces the function F1F2, the result of merging.
Having such routines in our hands, we can process the whole module and merge all equal functions.
In this case, we have to compare every function with every other function. As the reader may notice, this way is quite expensive. Of course, we could introduce hashing and other helpers, but that is still just an optimization and does not change the O(N*N) complexity level.
Can we reach another level? Could we introduce logarithmic search, or random-access lookup? The answer is “yes”.
Random-access¶
How could this be done? Just convert each function to a number, and gather all of them in a special hash-table. Functions with equal hashes are equal. Good hashing means that every function part must be taken into account, so we have to convert every function part into some number and then mix it into the hash. Lookup time would be small, but such an approach adds some delay due to the hashing routine.
Logarithmic search¶
We could introduce a total ordering among the function set; once we have it, we can implement a logarithmic search. Lookup time still depends on N, but with only a small delay (log(N)).
Present state¶
Both approaches (random-access and logarithmic) have been implemented and tested, and both of them gave a very good improvement. What was most surprising is that logarithmic search was faster, sometimes by up to 15%. Hashing needs some extra CPU time, and that is the main reason why it works slower; in most cases, total “hashing” time was greater than total “logarithmic-search” time.
So, preference was given to “logarithmic search”.
Though, if needed, logarithmic search (read: “total ordering”) could be used as a milestone on our way to a random-access implementation.
Every comparison is based either on numbers or on flags comparison. In the random-access approach we could use the same comparison algorithm. During comparison we exit once we find a difference, but here we might have to scan the whole function body every time (note, this could be slower). As in “total ordering”, we would track all the numbers and flags, but instead of comparing them we would collect the number sequence and then compute the hash. So, once again, total ordering could be considered a milestone towards an even faster (in theory) random-access approach.
MergeFunctions, main fields and runOnModule¶
There are two fields in the class that are most important:
FnTree: the set of all unique functions. It keeps items that couldn’t be merged with each other. It is defined as:
std::set<FunctionNode> FnTree;
Here FunctionNode is a wrapper for the llvm::Function class, with an implemented “<” operator among the functions set (below we explain how exactly it works; this is a key point in fast function comparison).
Deferred: the merging process can affect the bodies of functions that are already in FnTree. Obviously, such functions should be rechecked. In this case we remove them from FnTree and mark them to be rescanned, namely put them into the Deferred list.
runOnModule¶
The algorithm is pretty simple:
1. Put all the module’s functions into the worklist.
2. Scan the worklist’s functions twice: first enumerate only strong functions, then only weak ones:
2.1. Loop body: take a function from the worklist (call it FCur) and try to insert it into FnTree, i.e. check whether FCur is equal to one of the functions in FnTree. If there is an equal function in FnTree (call it FExists), merge FCur with FExists; otherwise add the function from the worklist to FnTree.
3. Once worklist scanning and merging are complete, check the Deferred list. If it is not empty, refill the worklist from the Deferred list and do step 2 again; if Deferred is empty, exit the method.
Comparison and logarithmic search¶
Let’s recall our task: for every function F from module M, we have to find equal functions F’ in the shortest time, and merge them into a single function.
Defining a total ordering among the function set allows us to organize the functions into a binary tree. The complexity of the lookup procedure is estimated as O(log(N)) in this case. But how do we define the total ordering?
We have to introduce a single rule applicable to every pair of functions, and, following this rule, evaluate which of them is greater. What kind of rule could it be? Let’s declare it as a “compare” method that returns one of 3 possible values:
-1, left is less than right,
0, left and right are equal,
1, left is greater than right.
Of course, this means that we have to maintain the strict and non-strict order relation properties:
- reflexivity (a <= a, a == a, a >= a),
- antisymmetry (if a <= b and b <= a then a == b),
- transitivity (a <= b and b <= c, then a <= c),
- asymmetry (if a < b, then a > b or a == b is false).
As mentioned before, the comparison routine consists of “sub-comparison-routines”, each of which also consists of “sub-comparison-routines”, and so on; finally, it ends up with primitive comparisons.
Below, we will use the following operations:
- cmpNumbers(number1, number2) is a method that returns -1 if left is less than right; 0 if left and right are equal; and 1 otherwise.
- cmpFlags(flag1, flag2) is a hypothetical method that compares two flags. The logic is the same as in cmpNumbers, where true is 1 and false is 0.
The rest of the article is based on the MergeFunctions.cpp source code (<llvm_dir>/lib/Transforms/IPO/MergeFunctions.cpp). We would like to ask the reader to keep this file open nearby, so we can use it as a reference for further explanations.
Now we’re ready to proceed to the next chapter and see how it works.
Functions comparison¶
At first, let’s define how exactly we compare complex objects.
Comparison of complex objects (functions, basic blocks, etc.) is mostly based on the comparison results of their sub-objects. It is similar to the following “tree” comparison:
- For two trees T1 and T2 we perform a depth-first traversal and obtain two sequences as a product: “T1Items” and “T2Items”.
- Then we compare the chains “T1Items” and “T2Items” in most-significant-item-first order. The result of the items comparison is the result of the T1 and T2 comparison itself.
FunctionComparator::compare(void)¶
A brief look at the source code tells us that the comparison starts in the “int FunctionComparator::compare(void)” method.
1. The first parts to be compared are the function’s attributes and some properties that are outside the “attributes” term but could still make the function different without changing its body. This part of the comparison is usually done with simple cmpNumbers or cmpFlags operations (e.g. cmpFlags(F1->hasGC(), F2->hasGC())). Below is the full list of the function’s properties to be compared at this stage:
- Attributes (those returned by the Function::getAttributes() method).
- GC: for equivalence, RHS and LHS should both be either without GC or with the same one.
- Section: just like GC, RHS and LHS should be defined in the same section.
- Variable arguments: LHS and RHS should both be either with or without var-args.
- Calling convention: should be the same.
2. Function type. Checked by the FunctionComparator::cmpType(Type*, Type*) method. It checks the return type and the parameter types; the method itself is described later.
3. Associate function formal parameters with each other. Then, when comparing the function bodies, if we see the usage of the LHS’s i-th argument in the LHS’s body, we want to see the usage of the RHS’s i-th argument at the same place in the RHS’s body; otherwise the functions are different. At this stage we grant preference to those met later in the function body (a value met first would be less). This is done by the “FunctionComparator::cmpValues(const Value*, const Value*)” method (described a bit later).
4. Function body comparison. As written in the method comments:
“We do a CFG-ordered walk since the actual ordering of the blocks in the linked list is immaterial. Our walk starts at the entry block for both functions, then takes each block from each terminator in order. As an artifact, this also means that unreachable blocks are ignored.”
So, using this walk we get BBs from left and right in the same order, and compare them by the “FunctionComparator::compare(const BasicBlock*, const BasicBlock*)” method.
We also associate BBs with each other, as we did with the function formal arguments (see the cmpValues method below).
FunctionComparator::cmpType¶
Consider how type comparison works.
1. Coerce pointer to integer. If the left type is a pointer, try to coerce it to an integer type. This can be done if its address space is 0, or if address spaces are ignored at all. Do the same for the right type.
2. If the left and right types are equal, return 0. Otherwise we need to give preference to one of them, so proceed to the next step.
3. If the types are of different kinds (different type IDs), return the result of the type ID comparison, treating the IDs as numbers (use the cmpNumbers operation).
4. If the types are vectors or integers, return the result of comparing their pointers as numbers.
5. Check whether the type ID belongs to the following group (call it the equivalent-group):
- Void
- Float
- Double
- X86_FP80
- FP128
- PPC_FP128
- Label
- Metadata.
If the ID belongs to the group above, return 0. It is enough to see that the types have the same TypeID; no additional information is required.
6. Left and right are pointers. Return the result of the address space comparison (numbers comparison).
7. Complex types (structures, arrays, etc.). Follow the complex objects comparison technique (see the very first paragraph of this chapter). Both left and right are expanded and their element types are checked the same way. If we get -1 or 1 at some stage, return it; otherwise return 0.
8. Steps 1-7 describe all the possible cases; if we passed steps 1-7 and didn’t reach any conclusion, invoke llvm_unreachable, since this is quite an unexpected case.
cmpValues(const Value*, const Value*)¶
Method that compares local values.
This method answers a very curious question: whether we can treat local values as equal, and which value is greater otherwise. It’s better to start with an example:
Consider the situation when we’re looking at the same place in the left function “FL” and in the right function “FR”, every part of the left place is equal to the corresponding part of the right place, and (!) both parts use Value instances, for example:
instr0 i32 %LV ; left side, function FL
instr0 i32 %RV ; right side, function FR
So, now our conclusion depends on the comparison of the Value instances.
The main purpose of this method is to determine the relation between such values.
What do we expect from equal functions? At the same place, in functions “FL” and “FR”, we expect to see equal values, or values defined at the same places in “FL” and “FR”.
Consider small example here:
define void %f(i32 %pf0, i32 %pf1) {
  instr0 i32 %pf0
  instr1 i32 %pf1
  instr2 i32 123
}
define void %g(i32 %pg0, i32 %pg1) {
  instr0 i32 %pg0
  instr1 i32 %pg0
  instr2 i32 123
}
In this example, pf0 is associated with pg0, pf1 is associated with pg1, and we also declare that pf0 < pf1, and thus pg0 < pg1.
Instructions with opcode “instr0” would be equal, since their types and opcodes are equal, and values are associated.
The instruction with opcode “instr1” from f is greater than the instruction with opcode “instr1” from g; here we have equal types and opcodes, but “pf1” is greater than “pg0”.
And instructions with opcode “instr2” are equal, because their opcodes and types are equal, and the same constant is used as a value.
What do we associate in cmpValues?¶
- Function arguments. i-th argument from left function associated with i-th argument from right function.
- BasicBlock instances. In basic-block enumeration loop we associate i-th BasicBlock from the left function with i-th BasicBlock from the right function.
- Instructions.
- Instruction operands. Note that here we can meet a Value we have never seen before. In this case it is not a function argument, nor a BasicBlock, nor an Instruction. It is a global value; it is a constant, since that is the only kind of global supposed here. The method also compares:
- Constants that are of the same type.
- If the right constant can be losslessly bit-cast to the left one, then we also compare them.
How to implement cmpValues?¶
Association is a case of equality for us: we just treat such values as equal. But, in general, we need to implement an antisymmetric relation. As mentioned above, to understand which value is less, we can use the order in which we meet values. If both values have the same order in their functions (met at the same time), then we treat them as associated; otherwise, it depends on which came first.
Every time we run the top-level compare method, we initialize two identical maps (one for the left side, another one for the right side):
map<Value, int> sn_mapL, sn_mapR;
The key of the map is the Value itself; the mapped value is its order (call it the serial number).
To add a value V we need to perform the following procedure:
sn_map.insert(std::make_pair(V, sn_map.size()));
For the first Value, the map will return 0; for the second Value, the map will return 1; and so on.
Then we can check whether the left and right values met at the same time with a simple comparison:
cmpNumbers(sn_mapL[Left], sn_mapR[Right]);
Of course, we can combine insertion and comparison:
std::pair<iterator, bool>
  LeftRes = sn_mapL.insert(std::make_pair(Left, sn_mapL.size())),
  RightRes = sn_mapR.insert(std::make_pair(Right, sn_mapR.size()));
return cmpNumbers(LeftRes.first->second, RightRes.first->second);
Let’s look at how the whole method could be implemented.
1. We have to start with the bad news. Consider function self-referencing and cross-referencing cases:
// self-reference
unsigned fact0(unsigned n) { return n > 1 ? n * fact0(n-1) : 1; }
unsigned fact1(unsigned n) { return n > 1 ? n * fact1(n-1) : 1; }

// cross-reference
unsigned ping(unsigned n) { return n != 0 ? pong(n-1) : 0; }
unsigned pong(unsigned n) { return n != 0 ? ping(n-1) : 0; }
This comparison was implemented in the initial MergeFunctions pass version. But, unfortunately, it is not transitive, and this is the only case we can’t convert to a less-equal-greater comparison. It is a rare case, 4-5 functions out of 10000 (checked on the test-suite), and, we hope, the reader will forgive us for such a sacrifice in order to get O(log(N)) pass time.
2. If the left/right Value is a constant, we have to compare them. Return 0 if it is the same constant, or use the cmpConstants method otherwise.
3. If the left/right value is an InlineAsm instance, return the result of Value pointer comparison.
4. Explicit association of L (the left value) and R (the right value). We need to find out whether the values met at the same time, and thus are associated, or we need to apply the rule for when we treat L < R. Now it is easy: just return the result of the numbers comparison:
std::pair<iterator, bool>
  LeftRes = sn_mapL.insert(std::make_pair(Left, sn_mapL.size())),
  RightRes = sn_mapR.insert(std::make_pair(Right, sn_mapR.size()));
if (LeftRes.first->second == RightRes.first->second) return 0;
if (LeftRes.first->second < RightRes.first->second) return -1;
return 1;
Now, when cmpValues returns 0, we can proceed with the comparison procedure. Otherwise, if we get -1 or 1, we need to pass this result to the top level and finish the comparison procedure.
cmpConstants¶
Performs constants comparison as follows:
1. Compare the constant types using the cmpType method. If the result is -1 or 1, go to step 2; otherwise proceed to step 3.
2. If the types are different, we can still check whether the constants could be losslessly bitcast to each other. The explanation that follows is a modification of the canLosslesslyBitCastTo method.
2.1. Check whether the constants are of first class types (the isFirstClassType check):
2.1.1. If both constants are not of first class types, return the result of cmpType.
2.1.2. Otherwise, if the left type is not of the first class, return -1. If the right type is not of the first class, return 1.
2.1.3. If both types are of the first class, proceed to the next step (2.1.3.1).
2.1.3.1. If the types are vectors, compare their bitwidths using cmpNumbers. If the result is not 0, return it.
2.1.3.2. Different types, but not vectors:
- if both of them are pointers, good for us, we can proceed to step 3;
- if one of the types is a pointer, return the result of the isPointer flags comparison (cmpFlags operation);
- otherwise we have no way to prove bitcastability, and thus return the result of the types comparison (-1 or 1).
The steps below are for the case when the types are equal, or when the constants are bitcastable:
3. One of the constants is a “null” value. Return the result of the cmpFlags(L->isNullValue, R->isNullValue) comparison.
4. Compare value IDs, and return the result if it is not 0:
if (int Res = cmpNumbers(L->getValueID(), R->getValueID()))
return Res;
5. Compare the contents of the constants. The comparison depends on the kind of constants, but at this stage it is just a lexicographical comparison. Just see how it was described at the beginning of the “Functions comparison” paragraph. Mathematically, it is equivalent to the following: we encode the left constant and the right constant (the same way the bitcode-writer does). Then we compare the left code sequence and the right code sequence.
compare(const BasicBlock*, const BasicBlock*)¶
Compares two BasicBlock instances.
It enumerates instructions from left BB and right BB.
1. It assigns serial numbers to the left and right instructions, using the cmpValues method.
2. If one of the left or right instructions is a GEP (GetElementPtr), then treat a GEP as greater than other instructions; if both instructions are GEPs, use the cmpGEP method for comparison. If the result is -1 or 1, pass it to the top-level comparison (return it).
3.1. Compare the operations. Call the cmpOperation method. If the result is -1 or 1, return it.
3.2. Compare the number of operands; if the result is -1 or 1, return it.
3.3. Compare the operands themselves, using the cmpValues method. Return the result if it is -1 or 1.
3.4. Compare the types of the operands, using the cmpType method. Return the result if it is -1 or 1.
3.5. Proceed to the next instruction.
We can finish instruction enumeration in 3 cases:
4.1. We reached the end of both the left and right basic blocks. We didn’t exit on steps 1-3, so the contents are equal; return 0.
4.2. We reached the end of the left basic block only. Return -1.
4.3. We reached the end of the right basic block only. Return 1.
cmpGEP¶
Compares two GEPs (getelementptr instructions).
It differs from regular operations comparison in only one thing: the possibility to use the accumulateConstantOffset method.
So, if we get constant offsets for both the left and right GEPs, then we compare them as numbers and return the comparison result.
Otherwise we treat them like regular operations (see the previous paragraph).
cmpOperation¶
Compares instruction opcodes and some important operation properties.
1. Compare the opcodes; if they differ, return the result.
2. Compare the number of operands; if it differs, return the result.
3. Compare the operation types, using cmpType. All the same: if the types are different, return the result.
4. Compare subclassOptionalData, obtained with the getRawSubclassOptionalData method, and compare it as numbers.
5. Compare the operand types.
6. For some particular instructions, check the equivalence (relation, in our case) of some significant attributes. For example, we have to compare alignment for load instructions.
Merging process, mergeTwoFunctions¶
Once MergeFunctions has detected that the current function (G) is equal to one that was analyzed before (function F), it calls mergeTwoFunctions(Function*, Function*).
The operation affects the FnTree contents in the following way: F will stay in FnTree. G, being equal to F, will not be added to FnTree. Calls to G will be replaced with something else; this changes the bodies of callers. So, functions that call G will be put into the Deferred set, removed from FnTree, and analyzed again.
The approach is as follows:
1. The most desirable case: when we can use aliases and both F and G are weak. We make both of them aliases to a third strong function H. Actually H is F. See below how it’s done (but it’s better to look straight into the source code). Well, this is a case where we can just replace G with F everywhere; we use the replaceAllUsesWith operation (RAUW) here.
2. F could not be overridden, while G could. It would be good to do the following: after merging, the places where the overridable function was used should still use an overridable stub. So try to make G an alias to F, or create an overridable tail call wrapper around F and replace G with that call.
3. Neither F nor G can be overridden. We can’t use RAUW. We can just change the callers: call F instead of G. That’s what replaceDirectCallers does.
Below is a detailed description of the method body.
If “F” may be overridden¶
As follows from the mayBeOverridden comments: “whether the definition of this global may be replaced by something non-equivalent at link time”. If so, that’s OK: we can use an alias to F instead of G, or change the call instructions themselves.
HasGlobalAliases, removeUsers¶
First consider the case when we have global aliases of one function name to another. Our purpose is to make both of them aliases to a third strong function. Though if we keep F alive and without major changes, we can leave it in FnTree. Let’s try to combine these two goals.
We do a stub replacement of F itself with an alias to F.
1. Create a stub function H, with the same name and attributes as function F. It takes the maximum of the alignments of F and G.
2. Replace all uses of function F with uses of function H. This is really a two-step procedure. First of all, we must take into account that all functions from which F is called will be changed, since we change the call argument (from F to H); we must review these caller functions again after this procedure. We remove the callers from FnTree; the method named removeUsers(F) does that (don’t confuse it with replaceAllUsesWith):
2.1. Inside removeUsers(Value* V) we go through all the values that use value V (or F in our context). If the value is an instruction, we go to the function that holds this instruction and mark it as to-be-analyzed-again (put it into the Deferred set); we also remove the caller from FnTree.
2.2. Now we can do the replacement: call F->replaceAllUsesWith(H).
3. H (which now “officially” plays F’s role) is replaced with an alias to F. Do the same with G: replace it with an alias to F. So finally, everywhere F was used we use H, which is an alias to F, and everywhere G was used we also have an alias to F.
4. Set F’s linkage to private. Make it strong :-)
No global aliases, replaceDirectCallers¶
If global aliases are not supported, we call replaceDirectCallers instead. We just go through all calls to G and replace them with calls to F. If you look into the method you will see that it scans all uses of G, and if a use is a callee (i.e. the user is a call instruction and G is used as the thing being called), we replace it with a use of F.
If “F” could not be overridden, fix it!¶
We call writeThunkOrAlias(Function *F, Function *G). Here we first try to replace G with an alias to F. The following conditions are essential:
- the target should support global aliases,
- the address of G itself should not be significant: not named and not referenced anywhere,
- the function should come with external, local or weak linkage.
Otherwise we write a thunk: a wrapper that has G’s interface and calls F, so G can be replaced with this wrapper.
writeAlias
As follows from the LLVM reference: “Aliases act as a second name for the aliasee value”. So we just want to create a second name for F and use it instead of G:
1. create the global alias itself (GA);
2. adjust the alignment of F so it is the maximum of its current alignment and G’s alignment;
3. replace uses of G:
3.1. first mark all callers of G as to-be-analyzed-again, using the removeUsers method (see the chapter above),
3.2. call G->replaceAllUsesWith(GA);
4. get rid of G.
writeThunk
As written in the method comments:
“Replace G with a simple tail call to bitcast(F). Also replace direct uses of G with bitcast(F). Deletes G.”
In general it does the same as usual when we want to replace a callee, except for the first point:
1. We generate a tail call wrapper around F, but with an interface that allows it to be used instead of G (see the hedged sketch below).
2. “As usual”: removeUsers and replaceAllUsesWith then.
3. Get rid of G.
That’s it.¶
We have described how to detect equal functions and how to merge them, and in the first chapter we described how it all works together. The author hopes that the reader now has a picture of the pass, and that it helps them improve and debug it.
The reader is welcome to send us any questions and proposals ;-)
Bitsets¶
This is a mechanism that allows IR modules to co-operatively build pointer sets corresponding to addresses within a given set of globals. One example of a use case for this is to allow a C++ program to efficiently verify (at each call site) that a vtable pointer is in the set of valid vtable pointers for the type of the class or its derived classes.
To use the mechanism, a client creates a global metadata node named llvm.bitsets. Each element is a metadata node with three elements: the first is a metadata string containing an identifier for the bitset, the second is a global variable and the third is a byte offset into the global variable.
This will cause a link-time optimization pass to generate bitsets from the memory addresses referenced from the elements of the bitset metadata. The pass will lay out the referenced globals consecutively, so their definitions must be available at LTO time. The GlobalLayoutBuilder class is responsible for laying out the globals efficiently to minimize the sizes of the underlying bitsets. An intrinsic, llvm.bitset.test, generates code to test whether a given pointer is a member of a bitset.
Example:
target datalayout = "e-p:32:32"
@a = internal global i32 0
@b = internal global i32 0
@c = internal global i32 0
@d = internal global [2 x i32] [i32 0, i32 0]
!llvm.bitsets = !{!0, !1, !2, !3, !4}
!0 = !{!"bitset1", i32* @a, i32 0}
!1 = !{!"bitset1", i32* @b, i32 0}
!2 = !{!"bitset2", i32* @b, i32 0}
!3 = !{!"bitset2", i32* @c, i32 0}
!4 = !{!"bitset2", i32* @d, i32 4}
declare i1 @llvm.bitset.test(i8* %ptr, metadata %bitset) nounwind readnone
define i1 @foo(i32* %p) {
%pi8 = bitcast i32* %p to i8*
%x = call i1 @llvm.bitset.test(i8* %pi8, metadata !"bitset1")
ret i1 %x
}
define i1 @bar(i32* %p) {
%pi8 = bitcast i32* %p to i8*
%x = call i1 @llvm.bitset.test(i8* %pi8, metadata !"bitset2")
ret i1 %x
}
define void @main() {
%a1 = call i1 @foo(i32* @a) ; returns 1
%b1 = call i1 @foo(i32* @b) ; returns 1
%c1 = call i1 @foo(i32* @c) ; returns 0
%a2 = call i1 @bar(i32* @a) ; returns 0
%b2 = call i1 @bar(i32* @b) ; returns 1
%c2 = call i1 @bar(i32* @c) ; returns 1
%d02 = call i1 @bar(i32* getelementptr ([2 x i32]* @d, i32 0, i32 0)) ; returns 0
%d12 = call i1 @bar(i32* getelementptr ([2 x i32]* @d, i32 0, i32 1)) ; returns 1
ret void
}
FaultMaps and implicit checks¶
Motivation¶
Code generated by managed language runtimes tends to have checks that are required for safety but never fail in practice. In such cases, it is profitable to make the non-failing case cheaper even if it makes the failing case significantly more expensive. This asymmetry can be exploited by folding such safety checks into operations that can be made to fault reliably if the check would have failed, and recovering from such a fault by using a signal handler.
For example, Java requires null checks on objects before they are read from or written to. If the object is null then a NullPointerException has to be thrown, interrupting normal execution. In practice, however, dereferencing a null pointer is extremely rare in well-behaved Java programs, and typically the null check can be folded into a nearby memory operation that operates on the same memory location.
The Fault Map Section¶
Information about implicit checks generated by LLVM is put in a special “fault map” section. On Darwin this section is named __llvm_faultmaps.
The format of this section is:
Header {
uint8 : Fault Map Version (current version is 1)
uint8 : Reserved (expected to be 0)
uint16 : Reserved (expected to be 0)
}
uint32 : NumFunctions
FunctionInfo[NumFunctions] {
uint64 : FunctionAddress
uint32 : NumFaultingPCs
uint32 : Reserved (expected to be 0)
FunctionFaultInfo[NumFaultingPCs] {
uint32 : FaultKind = FaultMaps::FaultingLoad (only legal value currently)
uint32 : FaultingPCOffset
uint32 : HandlerPCOffset
}
}
The ImplicitNullChecks pass¶
The ImplicitNullChecks pass transforms explicit control flow for checking if a pointer is null, like:
%ptr = call i32* @get_ptr()
%ptr_is_null = icmp eq i32* %ptr, null
br i1 %ptr_is_null, label %is_null, label %not_null, !make.implicit !0
not_null:
%t = load i32, i32* %ptr
br label %do_something_with_t
is_null:
call void @HFC()
unreachable
!0 = !{}
to control flow implicit in the instruction loading or storing through the pointer being null checked:
%ptr = call i32* @get_ptr()
%t = load i32, i32* %ptr ;; handler-pc = label %is_null
br label %do_something_with_t
is_null:
call void @HFC()
unreachable
This transform happens at the MachineInstr level, not the LLVM IR level (so the above example is only representative, not literal). The ImplicitNullChecks pass runs during codegen, if -enable-implicit-null-checks is passed to llc.
The ImplicitNullChecks pass adds entries to the __llvm_faultmaps section described above as needed.
make.implicit metadata¶
Making null checks implicit is an aggressive optimization, and it can be a net performance pessimization if too many memory operations end up faulting because of it. A language runtime typically needs to ensure that only a negligible number of implicit null checks actually fault once the application has reached a steady state. A standard way of doing this is by healing failed implicit null checks into explicit null checks via code patching or recompilation. It follows that there are two requirements an explicit null check needs to satisfy for it to be profitable to convert it to an implicit null check:
- The case where the pointer is actually null (i.e. the “failing” case) is extremely rare.
- The failing path heals the implicit null check into an explicit null check so that the application does not repeatedly page fault.
The frontend is expected to mark branches that satisfy (1) and (2) using a !make.implicit metadata node (the actual content of the metadata node is ignored). Only branches that are marked with !make.implicit metadata are considered as candidates for conversion into implicit null checks.
(Note that while we could deal with (1) using profiling data, dealing with (2) requires some information not present in branch profiles.)
- Writing an LLVM Pass
- Information on how to write LLVM transformations and analyses.
- Writing an LLVM Backend
- Information on how to write LLVM backends for machine targets.
- The LLVM Target-Independent Code Generator
- The design and implementation of the LLVM code generator. Useful if you are working on retargeting LLVM to a new architecture, designing a new codegen pass, or enhancing existing components.
- TableGen
- Describes the TableGen tool, which is used heavily by the LLVM code generator.
- LLVM Alias Analysis Infrastructure
- Information on how to write a new alias analysis implementation or how to use existing analyses.
- Garbage Collection with LLVM
- The interfaces source-language compilers should use for compiling GC’d programs.
- Source Level Debugging with LLVM
- This document describes the design and philosophy behind the LLVM source-level debugger.
- Auto-Vectorization in LLVM
- This document describes the current status of vectorization in LLVM.
- Exception Handling in LLVM
- This document describes the design and implementation of exception handling in LLVM.
- LLVM bugpoint tool: design and usage
- Automatic bug finder and test-case reducer description and usage information.
- LLVM Bitcode File Format
- This describes the file format and encoding used for LLVM “bc” files.
- System Library
- This document describes the LLVM System Library (lib/System) and how to keep LLVM source code portable.
- LLVM Link Time Optimization: Design and Implementation
- This document describes the interface between LLVM intermodular optimizer and the linker and its design
- The LLVM gold plugin
- How to build your programs with link-time optimization on Linux.
- Debugging JIT-ed Code With GDB
- How to debug JITed code with GDB.
- MCJIT Design and Implementation
- Describes the inner workings of MCJIT execution engine.
- LLVM Branch Weight Metadata
- Provides information about Branch Prediction Information.
- LLVM Block Frequency Terminology
- Provides information about terminology used in the BlockFrequencyInfo analysis pass.
- Segmented Stacks in LLVM
- This document describes segmented stacks and how they are used in LLVM.
- LLVM’s Optional Rich Disassembly Output
- This document describes the optional rich disassembly output syntax.
- How To Use Attributes
- Answers some questions about the new Attributes infrastructure.
- User Guide for NVPTX Back-end
- This document describes using the NVPTX back-end to compile GPU kernels.
- User Guide for AMDGPU Back-end
- This document describes how to use the AMDGPU back-end.
- Stack maps and patch points in LLVM
- LLVM support for mapping instruction addresses to the location of values and allowing code to be patched.
- Using ARM NEON instructions in big endian mode
- LLVM’s support for generating NEON instructions on big endian ARM targets is somewhat nonintuitive. This document explains the implementation and rationale.
- LLVM Code Coverage Mapping Format
- This describes the format and encoding used for LLVM’s code coverage mapping.
- Garbage Collection Safepoints in LLVM
- This describes a set of experimental extensions for garbage collection support.
- MergeFunctions pass, how it works
- Describes functions merging optimization.
- Design and Usage of the InAlloca Attribute
- Description of the inalloca argument attribute.
- FaultMaps and implicit checks
- LLVM support for folding control flow into faulting machine instructions.
Development Process Documentation¶
Information about LLVM’s development process.
LLVM Developer Policy¶
Introduction¶
This document contains the LLVM Developer Policy which defines the project’s policy towards developers and their contributions. The intent of this policy is to eliminate miscommunication, rework, and confusion that might arise from the distributed nature of LLVM’s development. By stating the policy in clear terms, we hope each developer can know ahead of time what to expect when making LLVM contributions. This policy covers all llvm.org subprojects, including Clang, LLDB, libc++, etc.
This policy is also designed to accomplish the following objectives:
- Attract both users and developers to the LLVM project.
- Make life as simple and easy for contributors as possible.
- Keep the top of Subversion trees as stable as possible.
- Establish awareness of the project’s copyright, license, and patent policies with contributors to the project.
This policy is aimed at frequent contributors to LLVM. People interested in contributing one-off patches can do so in an informal way by sending them to the llvm-commits mailing list and engaging another developer to see it through the process.
Developer Policies¶
This section contains policies that pertain to frequent LLVM developers. We always welcome one-off patches from people who do not routinely contribute to LLVM, but we expect more from frequent contributors to keep the system as efficient as possible for everyone. Frequent LLVM contributors are expected to meet the following requirements in order for LLVM to maintain a high standard of quality.
Stay Informed¶
Developers should stay informed by reading at least the “dev” mailing list for the projects they are interested in, such as llvm-dev for LLVM, cfe-dev for Clang, or lldb-dev for LLDB. If you are doing anything more than just casual work on LLVM, it is suggested that you also subscribe to the “commits” mailing list for the subproject you’re interested in, such as llvm-commits, cfe-commits, or lldb-commits. Reading the “commits” list and paying attention to changes being made by others is a good way to see what other people are interested in and to watch the flow of the project as a whole.
We recommend that active developers register an email account with LLVM Bugzilla and preferably subscribe to the llvm-bugs email list to keep track of bugs and enhancements occurring in LLVM. We really appreciate people who are proactive at catching incoming bugs in their components and dealing with them promptly.
Please be aware that all LLVM mailing lists are public and archived, and that notices of confidentiality or non-disclosure cannot be respected.
Making and Submitting a Patch¶
When making a patch for review, the goal is to make it as easy for the reviewer to read it as possible. As such, we recommend that you:
- Make your patch against the Subversion trunk, not a branch, and not an old version of LLVM. This makes it easy to apply the patch. For information on how to check out SVN trunk, please see the Getting Started Guide.
- Similarly, patches should be submitted soon after they are generated. Old patches may not apply correctly if the underlying code changes between the time the patch was created and the time it is applied.
- Patches should be made with svn diff, or similar. If you use a different tool, make sure it uses the diff -u format and that it doesn’t contain clutter which makes it hard to read.
- If you are modifying generated files, such as the top-level configure script, please separate those changes into a patch distinct from the rest of your changes.
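For example, an up-to-date patch against trunk can be produced as follows (the patch file name is purely illustrative):
% svn up
% svn diff > my-change.patch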
Once your patch is ready, submit it by emailing it to the appropriate project’s commit mailing list (or commit it directly if applicable). Alternatively, some patches get sent to the project’s development list or component of the LLVM bug tracker, but the commit list is the primary place for reviews and should generally be preferred.
When sending a patch to a mailing list, it is a good idea to send it as an attachment to the message, not embedded into the text of the message. This ensures that your mailer will not mangle the patch when it sends it (e.g. by making whitespace changes or by wrapping lines).
For Thunderbird users: Before submitting a patch, please open Preferences > Advanced > General > Config Editor, find the key mail.content_disposition_type, and set its value to 1. Without this setting, Thunderbird sends your attachment using Content-Disposition: inline rather than Content-Disposition: attachment. Apple Mail gamely displays such a file inline, making it difficult to work with for reviewers using that program.
When submitting patches, please do not add confidentiality or non-disclosure notices to the patches themselves. These notices conflict with the LLVM License and may result in your contribution being excluded.
Code Reviews¶
LLVM has a code review policy. Code review is one way to increase the quality of software. We generally follow these policies:
- All developers are required to have significant changes reviewed before they are committed to the repository.
- Code reviews are conducted by email on the relevant project’s commit mailing list, or alternatively on the project’s development list or bug tracker.
- Code can be reviewed either before it is committed or after. We expect major changes to be reviewed before being committed, but smaller changes (or changes where the developer owns the component) can be reviewed after commit.
- The developer responsible for a code change is also responsible for making all necessary review-related changes.
- Code review can be an iterative process, which continues until the patch is ready to be committed. Specifically, once a patch is sent out for review, it needs an explicit “looks good” before it is submitted. Do not assume silent approval, and do not treat a request for objections by some deadline as a substitute for explicit approval.
Sometimes code reviews will take longer than you would hope for, especially for larger features. Accepted ways to speed up review times for your patches are:
- Review other people’s patches. If you help out, everybody will be more willing to do the same for you; goodwill is our currency.
- Ping the patch. If it is urgent, provide reasons why it is important to you to get this patch landed and ping it every couple of days. If it is not urgent, the common courtesy ping rate is one week. Remember that you’re asking for valuable time from other professional developers.
- Ask for help on IRC. Developers on IRC will be able to either help you directly, or tell you who might be a good reviewer.
- Split your patch into multiple smaller patches that build on each other. The smaller your patch, the higher the probability that somebody will take a quick look at it.
Developers should participate in code reviews as both reviewers and reviewees. If someone is kind enough to review your code, you should return the favor for someone else. Note that anyone is welcome to review and give feedback on a patch, but only people with Subversion write access can approve it.
There is a web based code review tool that can optionally be used for code reviews. See Code Reviews with Phabricator.
Code Owners¶
The LLVM Project relies on two features of its process to maintain rapid development in addition to the high quality of its source base: the combination of code review plus post-commit review for trusted maintainers. Having both is a great way for the project to take advantage of the fact that most people do the right thing most of the time, and only commit patches without pre-commit review when they are confident they are right.
The trick to this is that the project has to guarantee that all patches that are committed are reviewed after they go in: you don’t want everyone to assume someone else will review it, allowing the patch to go unreviewed. To solve this problem, we have a notion of an ‘owner’ for a piece of the code. The sole responsibility of a code owner is to ensure that a commit to their area of the code is appropriately reviewed, either by themself or by someone else. The list of current code owners can be found in the file CODE_OWNERS.TXT in the root of the LLVM source tree.
Note that code ownership is completely different than reviewers: anyone can review a piece of code, and we welcome code review from anyone who is interested. Code owners are the “last line of defense” to guarantee that all patches that are committed are actually reviewed.
Being a code owner is a somewhat unglamorous position, but it is incredibly important for the ongoing success of the project. Because people get busy, interests change, and unexpected things happen, code ownership is purely opt-in, and anyone can choose to resign their “title” at any time. For now, we do not have an official policy on how one gets elected to be a code owner.
Test Cases¶
Developers are required to create test cases for any bugs fixed and any new features added. Some tips for getting your testcase approved:
- All feature and regression test cases are added to the llvm/test directory. The appropriate sub-directory should be selected (see the Testing Guide for details).
- Test cases should be written in LLVM assembly language.
- Test cases, especially for regressions, should be reduced as much as possible, by bugpoint or manually. It is unacceptable to place an entire failing program into llvm/test as this creates a time-to-test burden on all developers. Please keep them short.
Note that llvm/test and clang/test are designed for regression and small feature tests only. More extensive test cases (e.g., entire applications, benchmarks, etc.) should be added to the llvm-test test suite. The llvm-test suite is for coverage (correctness, performance, etc.) testing, not feature or regression testing.
Quality¶
The minimum quality standards that any change must satisfy before being committed to the main development branch are:
- Code must adhere to the LLVM Coding Standards.
- Code must compile cleanly (no errors, no warnings) on at least one platform.
- Bug fixes and new features should include a testcase so we know if the fix/feature ever regresses in the future.
- Code must pass the llvm/test test suite.
- The code must not cause regressions on a reasonable subset of llvm-test, where “reasonable” depends on the contributor’s judgement and the scope of the change (more invasive changes require more testing). A reasonable subset might be something like “llvm-test/MultiSource/Benchmarks”.
Additionally, the committer is responsible for addressing any problems found in the future that the change is responsible for. For example:
- The code should compile cleanly on all supported platforms.
- The changes should not cause any correctness regressions in the llvm-test suite and must not cause any major performance regressions.
- The change set should not cause performance or correctness regressions for the LLVM tools.
- The changes should not cause performance or correctness regressions in code compiled by LLVM on all applicable targets.
- You are expected to address any Bugzilla bugs that result from your change.
We prefer for this to be handled before submission but understand that it isn’t possible to test all of this for every submission. Our build bots and nightly testing infrastructure normally find these problems. A good rule of thumb is to check the nightly testers for regressions the day after your change. Build bots will directly email you if a group of commits that included yours caused a failure. You are expected to check the build bot messages to see if they are your fault and, if so, fix the breakage.
Commits that violate these quality standards (e.g. are very broken) may be reverted. This is necessary when the change blocks other developers from making progress. The developer is welcome to re-commit the change after the problem has been fixed.
Commit messages¶
Although we don’t enforce the format of commit messages, we prefer that you follow these guidelines to help with reviewing, searching logs, email formatting, and so on. These guidelines are very similar to rules used by other open source projects.
Most importantly, the contents of the message should be carefully written to convey the rationale of the change (without delving too much in detail). It also should avoid being vague or overly specific. For example, “bits were not set right” will leave the reviewer wondering about which bits, and why they weren’t right, while “Correctly set overflow bits in TargetInfo” conveys almost all there is to the change.
Below are some guidelines about the format of the message itself; a complete hypothetical example follows the list:
- Separate the commit message into title, body and, if you’re not the original author, a “Patch by” attribution line (see below).
- The title should be concise. Because all commits are emailed to the list with the first line as the subject, long titles are frowned upon. Short titles also look better in git log.
- When the changes are restricted to a specific part of the code (e.g. a back-end or optimization pass), it is customary to add a tag to the beginning of the line in square brackets. For example, “[SCEV] ...” or “[OpenMP] ...”. This helps email filters and searches for post-commit reviews.
- The body, if it exists, should be separated from the title by an empty line.
- The body should be concise, but explanatory, including a complete reasoning. Unless it is required to understand the change, examples, code snippets and gory details should be left to bug comments, web review or the mailing list.
- If the patch fixes a bug in bugzilla, please include the PR# in the message.
- Attribution of Changes should be in a separate line, after the end of the body, as simple as “Patch by John Doe.”. This is how we officially handle attribution, and there are automated processes that rely on this format.
- Text formatting and spelling should follow the same rules as documentation and in-code comments, e.g. capitalization, full stops, etc.
- If the commit is a bug fix on top of another recently committed patch, or a revert or reapply of a patch, include the svn revision number of the prior related commit. This could be as simple as “Revert rNNNN because it caused PR#”.
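Putting these guidelines together, a purely hypothetical commit message (the tag, title, reasoning, bug number, and author are all invented for illustration) might look like:

[SCEV] Correctly set overflow bits in TargetInfo

Previously the overflow bits were left unset when the range was
computed from a truncated expression, producing wrong wrap flags.
Fixes PR12345.

Patch by John Doe.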
For minor violations of these recommendations, the community normally favors reminding the contributor of this policy over reverting. Minor corrections and omissions can be handled by sending a reply to the commits mailing list.
Obtaining Commit Access¶
We grant commit access to contributors with a track record of submitting high quality patches. If you would like commit access, please send an email to Chris with the following information:
- The user name you want to commit with, e.g. “hacker”.
- The full name and email address you want messages to llvm-commits to come from, e.g. “J. Random Hacker <hacker@yoyodyne.com>”.
- A “password hash” of the password you want to use, e.g. “2ACR96qjUqsyM”. Note that you don’t ever tell us what your password is; you just give it to us in an encrypted form. To get this, run “htpasswd” (a utility that comes with apache) in crypt mode (often enabled with “-d”), or find a web page that will do it for you.
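For example, assuming apache’s htpasswd utility is installed, the following prints a crypt-mode hash to standard output (the hash shown is just the illustrative one from above):
% htpasswd -nd hacker
New password:
Re-type new password:
hacker:2ACR96qjUqsyM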
Once you’ve been granted commit access, you should be able to check out an LLVM tree with an SVN URL of “https://username@llvm.org/...” instead of the normal anonymous URL of “http://llvm.org/...”. The first time you commit you’ll have to type in your password. Note that you may get a warning from SVN about an untrusted key; you can ignore this. To verify that your commit access works, please do a test commit (e.g. change a comment or add a blank line). Your first commit to a repository may require the autogenerated email to be approved by a mailing list. This is normal and will be done when the mailing list owner has time.
If you have recently been granted commit access, these policies apply:
- You are granted commit-after-approval to all parts of LLVM. To get approval, submit a patch to llvm-commits. When approved, you may commit it yourself.
- You are allowed to commit patches without approval which you think are obvious. This is clearly a subjective decision — we simply expect you to use good judgement. Examples include: fixing build breakage, reverting obviously broken patches, documentation/comment changes, any other minor changes.
- You are allowed to commit patches without approval to those portions of LLVM that you have contributed or maintain (i.e., have been assigned responsibility for), with the proviso that such commits must not break the build. This is a “trust but verify” policy, and commits of this nature are reviewed after they are committed.
- Multiple violations of these policies or a single egregious violation may cause commit access to be revoked.
In any case, your changes are still subject to code review (either before or after they are committed, depending on the nature of the change). You are encouraged to review other peoples’ patches as well, but you aren’t required to do so.
Making a Major Change¶
When a developer begins a major new project with the aim of contributing it back to LLVM, they should inform the community with an email to the llvm-dev email list, to the extent possible. The reason for this is to:
- keep the community informed about future changes to LLVM,
- avoid duplication of effort by preventing multiple parties working on the same thing and not knowing about it, and
- ensure that any technical issues around the proposed work are discussed and resolved before any significant work is done.
The design of LLVM is carefully controlled to ensure that all the pieces fit together well and are as consistent as possible. If you plan to make a major change to the way LLVM works or want to add a major new extension, it is a good idea to get consensus with the development community before you start working on it.
Once the design of the new feature is finalized, the work itself should be done as a series of incremental changes, not as a long-term development branch.
Incremental Development¶
In the LLVM project, we do all significant changes as a series of incremental patches. We have a strong dislike for huge changes or long-term development branches. Long-term development branches have a number of drawbacks:
- Branches must have mainline merged into them periodically. If the branch development and mainline development occur in the same pieces of code, resolving merge conflicts can take a lot of time.
- Other people in the community tend to ignore work on branches.
- Huge changes (produced when a branch is merged back onto mainline) are extremely difficult to code review.
- Branches are not routinely tested by our nightly tester infrastructure.
- Changes developed as monolithic large changes often don’t work until the entire set of changes is done. Breaking it down into a set of smaller changes increases the odds that any of the work will be committed to the main repository.
To address these problems, LLVM uses an incremental development style and we require contributors to follow this practice when making a large/invasive change. Some tips:
- Large/invasive changes usually have a number of secondary changes that are required before the big change can be made (e.g. API cleanup, etc). These sorts of changes can often be done before the major change is done, independently of that work.
- The remaining inter-related work should be decomposed into unrelated sets of changes if possible. Once this is done, define the first increment and get consensus on what the end goal of the change is.
- Each change in the set can be stand-alone (e.g. to fix a bug), or part of a planned series of changes that works towards the development goal.
- Each change should be kept as small as possible. This simplifies your work (into a logical progression), simplifies code review and reduces the chance that you will get negative feedback on the change. Small increments also facilitate the maintenance of a high quality code base.
- Often, an independent precursor to a big change is to add a new API and slowly migrate clients to use the new API. Each change to use the new API is often “obvious” and can be committed without review. Once the new API is in place and used, it is much easier to replace the underlying implementation of the API. This implementation change is logically separate from the API change.
If you are interested in making a large change, and this scares you, please make sure to first discuss the change and gather consensus, and then ask about the best way to go about making it.
Attribution of Changes¶
When contributors submit a patch to an LLVM project, other developers with commit access may commit it for the author once appropriate (based on the progression of code review, etc.). When doing so, it is important to retain correct attribution of contributions to their contributors. However, we do not want the source code to be littered with random attributions like “this code written by J. Random Hacker” (this is noisy and distracting). In practice, the revision control system keeps a perfect history of who changed what, and the CREDITS.txt file describes higher-level contributions. If you commit a patch for someone else, please attribute the change in the simple manner outlined in the commit messages section. Overall, please do not add contributor names to the source code.
Also, don’t commit patches authored by others unless they have submitted the patch to the project or you have been authorized to submit them on their behalf (you work together and your company authorized you to contribute the patches, etc.). The author should first submit them to the relevant project’s commit list, development list, or LLVM bug tracker component. If someone sends you a patch privately, encourage them to submit it to the appropriate list first.
IR Backwards Compatibility¶
When the IR format has to be changed, keep in mind that we try to maintain some backwards compatibility. The rules are intended as a balance between convenience for LLVM users and not imposing a big burden on LLVM developers:
- The textual format is not backwards compatible. We don’t change it too often, but there are no specific promises.
- The bitcode format produced by an X.Y release will be readable by all following X.Z releases and the (X+1).0 release.
- Newer releases can ignore features from older releases, but they cannot miscompile them. For example, if nsw is ever replaced with something else, dropping it would be a valid way to upgrade the IR.
- Debug metadata is special in that it is currently dropped during upgrades.
- Non-debug metadata is defined to be safe to drop, so a valid way to upgrade it is to drop it. That is not very user friendly and a bit more effort is expected, but no promises are made.
Copyright, License, and Patents¶
Note
This section deals with legal matters but does not provide legal advice. We are not lawyers — please seek legal counsel from an attorney.
This section addresses the issues of copyright, license and patents for the LLVM project. The copyright for the code is held by the individual contributors of the code, and the terms of its license to LLVM users and developers are those of the University of Illinois/NCSA Open Source License (with portions dual licensed under the MIT License, see below). As a contributor to the LLVM project, you agree to allow any contributions to the project to be licensed under these terms.
Copyright¶
The LLVM project does not require copyright assignments, which means that the copyright for the code in the project is held by its respective contributors who have each agreed to release their contributed code under the terms of the LLVM License.
An implication of this is that the LLVM license is unlikely to ever change: changing it would require tracking down all the contributors to LLVM and getting them to agree that a license change is acceptable for their contribution. Since there are no plans to change the license, this is not a cause for concern.
As a contributor to the project, this means that you (or your company) retain ownership of the code you contribute, that it cannot be used in a way that contradicts the license (which is a liberal BSD-style license), and that the license for your contributions won’t change without your approval in the future.
License¶
We intend to keep LLVM perpetually open source and to use a liberal open source license. As a contributor to the project, you agree that any contributions you make will be licensed under the terms of the corresponding subproject. All of the code in LLVM is available under the University of Illinois/NCSA Open Source License, which boils down to this:
- You can freely distribute LLVM.
- You must retain the copyright notice if you redistribute LLVM.
- Binaries derived from LLVM must reproduce the copyright notice (e.g. in an included readme file).
- You can’t use our names to promote your LLVM derived products.
- There’s no warranty on LLVM at all.
We believe this fosters the widest adoption of LLVM because it allows commercial products to be derived from LLVM with few restrictions and without a requirement for making any derived works also open source (i.e. LLVM’s license is not a “copyleft” license like the GPL). We suggest that you read the License if further clarification is needed.
In addition to the UIUC license, the runtime library components of LLVM (compiler_rt, libc++, and libclc) are also licensed under the MIT License, which does not contain the binary redistribution clause. As a user of these runtime libraries, this means that you can choose to use the code under either license (and thus don’t need the binary redistribution clause), and as a contributor to the code, you agree that any contributions to these libraries will be licensed under both licenses. We feel that this is important for runtime libraries, because they are implicitly linked into applications and therefore should not subject those applications to the binary redistribution clause. This also means that it is ok to move code from (e.g.) libc++ to the LLVM core without concern, but that code cannot be moved from the LLVM core to libc++ without the copyright owner’s permission.
Note that the LLVM Project does distribute dragonegg, which is GPL. This means that anything “linked” into dragonegg must itself be compatible with the GPL, and must be releasable under the terms of the GPL. This implies that any code linked into dragonegg and distributed to others may be subject to the viral aspects of the GPL (for example, a proprietary code generator linked into dragonegg must be made available under the GPL). This is not a problem for code already distributed under a more liberal license (like the UIUC license), and GPL-containing subprojects are kept in separate SVN repositories whose LICENSE.txt files specifically indicate that they contain GPL code.
We have no plans to change the license of LLVM. If you have questions or comments about the license, please contact the LLVM Developer's Mailing List.
Patents¶
To the best of our knowledge, LLVM does not infringe on any patents (we have actually removed code from LLVM in the past that was found to infringe). Having code in LLVM that infringes on patents would violate an important goal of the project by making it hard or impossible to reuse the code for arbitrary purposes (including commercial use).
When contributing code, we expect contributors to notify us of any potential for patent-related trouble with their changes (including from third parties). If you or your employer own the rights to a patent and would like to contribute code to LLVM that relies on it, we require that the copyright owner sign an agreement that allows any other user of LLVM to freely use your patent. Please contact the oversight group for more details.
LLVM Makefile Guide¶
Introduction¶
This document provides usage information about the LLVM makefile system. While loosely patterned after the BSD makefile system, LLVM has taken a departure from BSD in order to implement additional features needed by LLVM. Although makefile systems such as automake were attempted at one point, it has become clear that the gap between the features needed by LLVM and the Makefile norm is too great for a more limited tool. Consequently, LLVM simply requires GNU Make 3.79, a widely portable makefile processor. LLVM unabashedly makes heavy use of the features of GNU Make, so the dependency on GNU Make is firm. If you’re not familiar with make, it is recommended that you read the GNU Makefile Manual.
While this document is rightly part of the LLVM Programmer’s Manual, it is treated separately here because of the volume of content and because it is often an early source of bewilderment for new developers.
General Concepts¶
The LLVM Makefile System is the component of LLVM that is responsible for building the software, testing it, generating distributions, checking those distributions, installing and uninstalling, etc. It consists of several files throughout the source tree. These files and other general concepts are described in this section.
Projects¶
The LLVM Makefile System is quite generous. It not only builds its own software, but it can build yours too. Built into the system is knowledge of the llvm/projects directory. Any directory under projects that has both a configure script and a Makefile is assumed to be a project that uses the LLVM Makefile system. Building software that uses LLVM does not require the LLVM Makefile System nor even placement in the llvm/projects directory. However, doing so will allow your project to get up and running quickly by utilizing the built-in features that are used to compile LLVM. LLVM compiles itself using the same features of the makefile system as used for projects.
For further details, consult the Projects page.
Variable Values¶
To use the makefile system, you simply create a file named Makefile in your directory and declare values for certain variables. The variables and values that you select determine what the makefile system will do. These variables enable rules and processing in the makefile system that automatically Do The Right Thing (C).
Including Makefiles¶
Setting variables alone is not enough. You must include into your Makefile additional files that provide the rules of the LLVM Makefile system. The various files involved are described in the sections that follow.
Makefile¶
Each directory to participate in the build needs to have a file named Makefile. This is the file first read by make. It has three sections:
- Settable Variables — Required variables that must be set first.
- include $(LEVEL)/Makefile.common — include the LLVM Makefile system.
- Override Variables — Override variables set by the LLVM Makefile system.
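A minimal sketch of this three-part layout (the variable values are illustrative, not prescriptive):

# Settable variables
LEVEL := ../..
LIBRARYNAME := mylib

# Include the LLVM Makefile system
include $(LEVEL)/Makefile.common

# Override variables
CXXFLAGS += -Wextra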
Makefile.common¶
Every project must have a Makefile.common file at its top source directory. This file serves three purposes:
- It includes the project’s configuration makefile to obtain values determined by the configure script. This is done by including the $(LEVEL)/Makefile.config file.
- It specifies any other (static) values that are needed throughout the project. Only values that are used in all or a large proportion of the project’s directories should be placed here.
- It includes the standard rules for the LLVM Makefile system, $(LLVM_SRC_ROOT)/Makefile.rules. This file is the guts of the LLVM Makefile system.
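A hedged sketch of what a project’s Makefile.common might contain, following the three purposes above (the extra include path is an invented example of a project-wide static value):

# Obtain values determined by the configure script
include $(LEVEL)/Makefile.config

# Static values needed throughout the project
CPPFLAGS += -I$(PROJ_SRC_ROOT)/include

# Standard rules for the LLVM Makefile system
include $(LLVM_SRC_ROOT)/Makefile.rules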
Makefile.config¶
Every project must have a Makefile.config at the top of its build directory. This file is generated by the configure script from the pattern provided by the Makefile.config.in file located at the top of the project’s source directory. The contents of this file depend largely on what configuration items the project uses, however most projects can get what they need by just relying on LLVM’s configuration found in $(LLVM_OBJ_ROOT)/Makefile.config.
Makefile.rules¶
This file, located at $(LLVM_SRC_ROOT)/Makefile.rules, is the heart of the LLVM Makefile System. It provides all the logic, dependencies, and rules for building the targets supported by the system. What it does largely depends on the values of make variables that have been set before Makefile.rules is included.
Comments¶
User Makefiles need not have comments in them unless the construction is unusual or it does not strictly follow the rules and patterns of the LLVM makefile system. Makefile comments are invoked with the pound (#) character. The # character and any text following it, to the end of the line, are ignored by make.
Tutorial¶
This section provides some examples of the different kinds of modules you can build with the LLVM makefile system. In general, each directory you provide will build a single object although that object may be composed of additionally compiled components.
Libraries¶
Only a few variable definitions are needed to build a regular library. Normally, the makefile system will build all the software into a single libname.o (pre-linked) object. This means the library is not searchable and that the distinction between compilation units has been dissolved. Optionally, you can ask for a shared library (.so) or archive library (.a) to be built. Archive libraries are the default. For example:
LIBRARYNAME = mylib
SHARED_LIBRARY = 1
BUILD_ARCHIVE = 1
says to build a library named mylib with both a shared library (mylib.so) and an archive library (mylib.a) version. The contents of all the libraries produced will be the same; they are just constructed differently. Note that you normally do not need to specify the sources involved. The LLVM Makefile system will infer the source files from the contents of the source directory.
The LOADABLE_MODULE=1 directive can be used in conjunction with SHARED_LIBRARY=1 to indicate that the resulting shared library should be openable with the dlopen function and searchable with the dlsym function (or your operating system’s equivalents). While this isn’t strictly necessary on Linux and a few other platforms, it is required on systems like HP-UX and Darwin. You should use LOADABLE_MODULE for any shared library that you intend to be loaded into a tool via the -load option. The Pass documentation has an example of why you might want to do this.
Loadable Modules¶
In some situations, you need to create a loadable module. Loadable modules can be loaded into programs like opt or llc to specify additional passes to run or targets to support. Loadable modules are also useful for debugging a pass or providing a pass with another package if that pass can’t be included in LLVM.
LLVM provides complete support for building such a module. All you need to do is use the LOADABLE_MODULE variable in your Makefile. For example, to build a loadable module named MyMod that uses the LLVM libraries LLVMSupport.a and LLVMSystem.a, you would specify:
LIBRARYNAME := MyMod
LOADABLE_MODULE := 1
LINK_COMPONENTS := support system
Use of the LOADABLE_MODULE facility implies several things:
- There will be no “lib” prefix on the module. This differentiates it from a standard shared library of the same name.
- The SHARED_LIBRARY variable is turned on.
- The LINK_LIBS_IN_SHARED variable is turned on.
A loadable module is loaded by LLVM via the facilities of libtool’s libltdl library, which is part of the lib/System implementation.
Tools¶
For building executable programs (tools), you must provide the name of the tool and the names of the libraries you wish to link with the tool. For example:
TOOLNAME = mytool
USEDLIBS = mylib
LINK_COMPONENTS = support system
says that we are to build a tool named mytool and that it requires three libraries: mylib, LLVMSupport.a and LLVMSystem.a.
Note that two different variables are used to indicate which libraries are linked: USEDLIBS and LLVMLIBS. This distinction is necessary to support projects. LLVMLIBS refers to the LLVM libraries found in the LLVM object directory. USEDLIBS refers to the libraries built by your project. In the case of building LLVM tools, USEDLIBS and LLVMLIBS can be used interchangeably since the “project” is LLVM itself and USEDLIBS refers to the same place as LLVMLIBS.
Also note that there are two different ways of specifying a library: with a .a suffix and without. Without the suffix, the entry refers to the re-linked (.o) file, which will include all symbols of the library. This is useful, for example, to include all passes from a library of passes. If the .a suffix is used then the library is linked as a searchable library (with the -l option). In this case, only the symbols that are unresolved at that point will be resolved from the library, if they exist. Other (unreferenced) symbols will not be included when the .a syntax is used. Note that in order to use the .a suffix, the library in question must have been built with the BUILD_ARCHIVE option set.
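A short sketch contrasting the two forms (library names are invented for illustration):

TOOLNAME := mytool
# myPasses (no suffix): link the re-linked .o so every pass is included.
# mySupport.a: link as a searchable archive; only unresolved symbols are pulled in.
USEDLIBS := myPasses mySupport.a

Both libraries would have to be built by the project, and mySupport.a additionally requires having been built with BUILD_ARCHIVE set.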
JIT Tools¶
Many tools will want to use the JIT features of LLVM. To do this, you simply specify that you want an execution ‘engine’, and the makefiles will automatically link in the appropriate JIT for the host or an interpreter if none is available:
TOOLNAME = my_jit_tool
USEDLIBS = mylib
LINK_COMPONENTS = engine
Of course, any additional libraries may be listed as other components. To get a full understanding of how this changes the linker command, it is recommended that you:
% cd examples/Fibonacci
% make VERBOSE=1
Targets Supported¶
This section describes each of the targets that can be built using the LLVM Makefile system. Any target can be invoked from any directory but not all are applicable to a given directory (e.g. “check”, “dist” and “install” will always operate as if invoked from the top level directory).
Target Name | Implied Targets | Target Description
---|---|---
all |  | Compile the software recursively. Default target.
all-local |  | Compile the software in the local directory only.
check |  | Change to the test directory in a project and run the test suite there.
check-local |  | Run a local test suite. Generally this is only defined in the Makefile of the project’s test directory.
clean |  | Remove built objects recursively.
clean-local |  | Remove built objects from the local directory only.
dist | all | Prepare a source distribution tarball.
dist-check | all | Prepare a source distribution tarball and check that it builds.
dist-clean | clean | Clean source distribution tarball temporary files.
install | all | Copy built objects to installation directory.
preconditions | all | Check to make sure configuration and makefiles are up to date.
printvars | all | Prints variables defined by the makefile system (for debugging).
tags |  | Make C and C++ tags files for emacs and vi.
uninstall |  | Remove built objects from installation directory.
all (default)¶
When you invoke make with no arguments, you are implicitly instructing it to seek the all target (goal). This target is used for building the software recursively and will do different things in different directories. For example, in a lib directory, the all target will compile source files and generate libraries. But, in a tools directory, it will link libraries and generate executables.
all-local¶
This target is the same as all but it operates only on the current directory instead of recursively.
check¶
This target can be invoked from anywhere within a project’s directories but always invokes the check-local target in the project’s test directory, if it exists and has a Makefile. A warning is produced otherwise. If TESTSUITE is defined on the make command line, it will be passed down to the invocation of make check-local in the test directory. The intended usage for this is to assist in running specific suites of tests. If TESTSUITE is not set, the implementation of check-local should run all normal tests. It is up to the project to define what different values for TESTSUITE will do. See the Testing Guide for further details.
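For example, to run only a particular suite (the value shown is illustrative; meaningful values are project-defined):
% make check TESTSUITE=Feature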
check-local¶
This target should be implemented by the Makefile in the project’s test directory. It is invoked by the check target elsewhere. Each project is free to define the actions of check-local as appropriate for that project. The LLVM project itself uses the Lit testing tool to run a suite of feature and regression tests. Other projects may choose to use lit or any other testing mechanism.
clean¶
This target cleans the build directory, recursively removing all things that the Makefile builds. The cleaning rules have been made guarded so they shouldn’t go awry (e.g. via rm -f $(UNSET_VARIABLE)/*, which would attempt to erase the entire directory structure).
clean-local¶
This target does the same thing as clean but only for the current (local) directory.
dist¶
This target builds a distribution tarball. It first builds the entire project using the all target and then tars up the necessary files and compresses it. The generated tarball is sufficient for a casual source distribution, but probably not for a release (see dist-check).
dist-check¶
This target does the same thing as the dist target but also checks the distribution tarball. The check is made by unpacking the tarball to a new directory, configuring it, building it, installing it, and then verifying that the installation results are correct (by comparing to the original build). This target can take a long time to run but should be done before a release goes out to make sure that the distributed tarball can actually be built into a working release.
dist-clean¶
This is a special form of the clean target. It performs a normal clean but also removes things pertaining to building the distribution.
install¶
This target finalizes shared objects and executables and copies all libraries, headers, executables and documentation to the directory given with the --prefix option to configure. When completed, the prefix directory will have everything needed to use LLVM.
The LLVM makefiles can generate complete internal documentation for all the classes by using doxygen. By default, this feature is not enabled because it takes a long time and generates a massive amount of data (>100MB). If you want this feature, you must configure LLVM with the --enable-doxygen switch and ensure that a modern version of doxygen (1.3.7 or later) is available in your PATH. You can download doxygen from the doxygen website.
preconditions¶
This utility target checks to see if the Makefile in the object directory is older than the Makefile in the source directory and copies it if so. It also reruns the configure script if that needs to be done and rebuilds the Makefile.config file similarly. Users may overload this target to ensure that sanity checks are run before any building of targets, as all the targets depend on preconditions.
printvars¶
This utility target just causes the LLVM makefiles to print out some of the makefile variables so that you can double check how things are set.
reconfigure¶
This utility target will force a reconfigure of LLVM or your project. It simply runs $(PROJ_OBJ_ROOT)/config.status --recheck to rerun the configuration tests and rebuild the configured files. This isn’t generally useful as the makefiles will reconfigure themselves whenever it is necessary.
spotless¶
Warning
Use with caution!
This utility target, only available when $(PROJ_OBJ_ROOT) is not the same as $(PROJ_SRC_ROOT), will completely clean the $(PROJ_OBJ_ROOT) directory by removing its content entirely and reconfiguring the directory. This returns the $(PROJ_OBJ_ROOT) directory to a completely fresh state. All content in the directory except configured files and top-level makefiles will be lost.
Variables¶
Variables are used to tell the LLVM Makefile System what to do and to obtain information from it. Variables are also used internally by the LLVM Makefile System. Variable names that contain only upper-case alphabetic characters and underscores are intended for use by the end user. All other variables are internal to the LLVM Makefile System and should not be relied upon nor modified. The sections below describe how to use the LLVM Makefile variables.
Control Variables¶
Variables listed in the table below should be set before the inclusion of $(LEVEL)/Makefile.common. These variables provide input to the LLVM make system that tell it what to do for the current directory.
BUILD_ARCHIVE
- If set to any value, causes an archive (.a) library to be built.
BUILT_SOURCES
- Specifies a set of source files that are generated from other source files. These sources will be built before any other target processing to ensure they are present.
CONFIG_FILES
- Specifies a set of configuration files to be installed.
DEBUG_SYMBOLS
- If set to any value, causes the build to include debugging symbols even in optimized objects, libraries and executables. This alters the flags specified to the compilers and linkers. Debugging isn’t fun in an optimized build, but it is possible.
DIRS
- Specifies a set of directories, usually children of the current directory, that should also be made using the same goal. These directories will be built serially.
DISABLE_AUTO_DEPENDENCIES
- If set to any value, causes the makefiles to not automatically generate dependencies when running the compiler. Use of this feature is discouraged and it may be removed at a later date.
ENABLE_OPTIMIZED
- If set to 1, causes the build to generate optimized objects, libraries and executables. This alters the flags specified to the compilers and linkers. Generally debugging won’t be a fun experience with an optimized build.
ENABLE_PROFILING
- If set to 1, causes the build to generate both optimized and profiled objects, libraries and executables. This alters the flags specified to the compilers and linkers to ensure that profile data can be collected from the tools built. Use the gprof tool to analyze the output from the profiled tools (gmon.out).
DISABLE_ASSERTIONS
- If set to 1, causes the build to disable assertions, even if building a debug or profile build. This will exclude all assertion check code from the build. LLVM will execute faster, but with little help when things go wrong.
EXPERIMENTAL_DIRS
- Specify a set of directories that should be built, but if they fail, it should not cause the build to fail. Note that this should only be used temporarily while code is being written.
EXPORTED_SYMBOL_FILE
- Specifies the name of a single file that contains a list of the symbols to be exported by the linker. One symbol per line.
EXPORTED_SYMBOL_LIST
- Specifies a set of symbols to be exported by the linker.
EXTRA_DIST
- Specifies additional files that should be distributed with LLVM. All source files, all built sources, all Makefiles, and most documentation files will be automatically distributed. Use this variable to distribute any files that are not automatically distributed.
KEEP_SYMBOLS
- If set to any value, specifies that when linking executables the makefiles should retain debug symbols in the executable. Normally, symbols are stripped from the executable.
LEVEL (required)
- Specify the level of nesting from the top level. This variable must be set in each makefile as it is used to find the top level and thus the other makefiles.
LIBRARYNAME
- Specify the name of the library to be built. (Required For Libraries)
LINK_COMPONENTS
- When specified for building a tool, the value of this variable will be passed to the llvm-config tool to generate a link line for the tool. Unlike USEDLIBS and LLVMLIBS, not all libraries need to be specified. The llvm-config tool will figure out the library dependencies and add any libraries that are needed. The USEDLIBS variable can still be used in conjunction with LINK_COMPONENTS so that additional project-specific libraries can be linked with the LLVM libraries specified by LINK_COMPONENTS.
LLVMLIBS
- Specifies the set of libraries from the LLVM $(ObjDir) that will be linked into the tool or library.
LOADABLE_MODULE
- If set to any value, causes the shared library being built to also be a loadable module. Loadable modules can be opened with the dlopen() function and searched with dlsym() (or the operating system’s equivalent). Note that setting this variable without also setting SHARED_LIBRARY will have no effect.
NO_INSTALL
- Specifies that the build products of the directory should not be installed but should be built even if the install target is given. This is handy for directories that build libraries or tools that are only used as part of the build process, such as code generators (e.g. tblgen).
OPTIONAL_DIRS
- Specify a set of directories that may be built, if they exist, but it is not an error for them not to exist.
PARALLEL_DIRS
- Specify a set of directories to build recursively and in parallel if the -j option was used with make.
TESTSUITE
- Specifies the directory of tests to run in llvm/test.
TOOLNAME
- Specifies the name of the tool that the current directory should build.
TOOL_VERBOSE
- Implies VERBOSE and also tells each tool invoked to be verbose. This is handy when you’re trying to see the sub-tools invoked by each tool invoked by the makefile. For example, this will pass -v to the GCC compilers, which causes them to print out the command lines used to invoke sub-tools (compiler, assembler, linker).
USEDLIBS
- Specifies the list of project libraries that will be linked into the tool or library.
VERBOSE
- Tells the Makefile system to produce detailed output of what it is doing instead of just summary comments. This will generate a LOT of output.
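Putting a few of these control variables together, a sketch of a leaf directory’s Makefile that builds an archive library (names are illustrative):

LEVEL := ../..
LIBRARYNAME := myPasses
BUILD_ARCHIVE := 1

include $(LEVEL)/Makefile.common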
Override Variables¶
Override variables can be used to override the default values provided by the LLVM makefile system. These variables can be set in several ways:
- In the environment (e.g. setenv, export) — not recommended.
- On the make command line — recommended.
- On the configure command line.
- In the Makefile (only after the inclusion of $(LEVEL)/Makefile.common).
The override variables are given below:
AR (defaulted)
- Specifies the path to the ar tool.
PROJ_OBJ_DIR
- The directory into which the products of build rules will be placed. This might be the same as PROJ_SRC_DIR but typically is not.
PROJ_SRC_DIR
- The directory which contains the source files to be built.
BUILD_EXAMPLES
- If set to 1, build examples in the examples and (if building Clang) tools/clang/examples directories.
BZIP2 (configured)
- The path to the bzip2 tool.
CC (configured)
- The path to the ‘C’ compiler.
CFLAGS
- Additional flags to be passed to the ‘C’ compiler.
CPPFLAGS
- Additional flags passed to the C/C++ preprocessor.
CXX
- Specifies the path to the C++ compiler.
CXXFLAGS
- Additional flags to be passed to the C++ compiler.
DATE (configured)
- Specifies the path to the date program or any program that can generate the current date and time on its standard output.
DOT (configured)
- Specifies the path to the dot tool or false if there isn’t one.
ECHO (configured)
- Specifies the path to the echo tool for printing output.
EXEEXT (configured)
- Provides the extension to be used on executables built by the makefiles. The value may be empty on platforms that do not use file extensions for executables (e.g. Unix).
INSTALL (configured)
- Specifies the path to the install tool.
LDFLAGS (configured)
- Allows users to specify additional flags to pass to the linker.
LIBS (configured)
- The list of libraries that should be linked with each tool.
LIBTOOL (configured)
- Specifies the path to the libtool tool. This tool is renamed mklib by the configure script.
LLVMAS (defaulted)
- Specifies the path to the llvm-as tool.
LLVMGCC (defaulted)
- Specifies the path to the LLVM version of the GCC ‘C’ Compiler.
LLVMGXX (defaulted)
- Specifies the path to the LLVM version of the GCC C++ Compiler.
LLVMLD (defaulted)
- Specifies the path to the LLVM bitcode linker tool.
LLVM_OBJ_ROOT (configured)
- Specifies the top directory into which the output of the build is placed.
LLVM_SRC_ROOT (configured)
- Specifies the top directory in which the sources are found.
LLVM_TARBALL_NAME (configured)
- Specifies the name of the distribution tarball to create. This is configured from the name of the project and its version number.
MKDIR (defaulted)
- Specifies the path to the mkdir tool that creates directories.
ONLY_TOOLS
- If set, specifies the list of tools to build.
PLATFORMSTRIPOPTS
- The options to provide to the linker to specify that a stripped (no symbols) executable should be built.
RANLIB (defaulted)
- Specifies the path to the ranlib tool.
RM (defaulted)
- Specifies the path to the rm tool.
SED (defaulted)
- Specifies the path to the sed tool.
SHLIBEXT (configured)
- Provides the filename extension to use for shared libraries.
TBLGEN (defaulted)
- Specifies the path to the tblgen tool.
TAR (defaulted)
- Specifies the path to the tar tool.
ZIP (defaulted)
- Specifies the path to the zip tool.
Readable Variables¶
Variables listed in the table below can be used by the user’s Makefile but should not be changed. Changing the value will generally cause the build to go wrong, so don’t do it.
bindir
- The directory into which executables will ultimately be installed. This value is derived from the --prefix option given to configure.
BuildMode
- The name of the type of build being performed: Debug, Release, or Profile.
bytecode_libdir
- The directory into which bitcode libraries will ultimately be installed. This value is derived from the --prefix option given to configure.
ConfigureScriptFLAGS
- Additional flags given to the configure script when reconfiguring.
DistDir
- The current directory for which a distribution copy is being made.
Echo
- The LLVM Makefile System output command. This provides the llvm[n] prefix and starts with @ so the command itself is not printed by make.
EchoCmd
- Same as Echo but without the leading @.
includedir
- The directory into which include files will ultimately be installed. This value is derived from the --prefix option given to configure.
libdir
- The directory into which native libraries will ultimately be installed. This value is derived from the --prefix option given to configure.
LibDir
- The configuration specific directory into which libraries are placed before installation.
MakefileConfig
- Full path of the Makefile.config file.
MakefileConfigIn
- Full path of the Makefile.config.in file.
ObjDir
- The configuration and directory specific directory where build objects (compilation results) are placed.
SubDirs
- The complete list of sub-directories of the current directory as specified by other variables.
Sources
- The complete list of source files.
sysconfdir
- The directory into which configuration files will ultimately be installed. This value is derived from the --prefix option given to configure.
ToolDir
- The configuration specific directory into which executables are placed before they are installed.
TopDistDir
- The top most directory into which the distribution files are copied.
Verb
- Use this as the first thing on your build script lines to enable or disable verbose mode. It expands to either an @ (quiet mode) or nothing (verbose mode).
Internal Variables¶
Variables listed below are used by the LLVM Makefile System and considered internal. You should not use these variables under any circumstances.
Archive
AR.Flags
BaseNameSources
BCLinkLib
C.Flags
Compile.C
CompileCommonOpts
Compile.CXX
ConfigStatusScript
ConfigureScript
CPP.Flags
CXX.Flags
DependFiles
DestArchiveLib
DestBitcodeLib
DestModule
DestSharedLib
DestTool
DistAlways
DistCheckDir
DistCheckTop
DistFiles
DistName
DistOther
DistSources
DistSubDirs
DistTarBZ2
DistTarGZip
DistZip
ExtraLibs
FakeSources
INCFiles
InternalTargets
LD.Flags
LibName.A
LibName.BC
LibName.LA
LibName.O
LibTool.Flags
Link
LinkModule
LLVMLibDir
LLVMLibsOptions
LLVMLibsPaths
LLVMToolDir
LLVMUsedLibs
LocalTargets
Module
ObjectsLO
ObjectsO
ObjMakefiles
ParallelTargets
PreConditions
ProjLibsOptions
ProjLibsPaths
ProjUsedLibs
Ranlib
RecursiveTargets
SrcMakefiles
Strip
StripWarnMsg
TableGen
TDFiles
ToolBuildPath
TopLevelTargets
UserTargets
Creating an LLVM Project¶
Overview¶
The LLVM build system is designed to facilitate the building of third party projects that use LLVM header files, libraries, and tools. In order to use these facilities, a Makefile from a project must do the following things:
- Set make variables. There are several variables that a Makefile needs to set to use the LLVM build system:
  - PROJECT_NAME - The name by which your project is known.
  - LLVM_SRC_ROOT - The root of the LLVM source tree.
  - LLVM_OBJ_ROOT - The root of the LLVM object tree.
  - PROJ_SRC_ROOT - The root of the project’s source tree.
  - PROJ_OBJ_ROOT - The root of the project’s object tree.
  - PROJ_INSTALL_ROOT - The root installation directory.
  - LEVEL - The relative path from the current directory to the project’s root ($PROJ_OBJ_ROOT).
- Include Makefile.config from $(LLVM_OBJ_ROOT).
- Include Makefile.rules from $(LLVM_SRC_ROOT).
There are two ways that you can set all of these variables:
- You can write your own Makefiles which hard-code these values.
- You can use the pre-made LLVM sample project. This sample project includes Makefiles, a configure script that can be used to configure the location of LLVM, and the ability to support multiple object directories from a single source directory.
If you want to devise your own build system, studying other projects and LLVM
Makefiles
will probably provide enough information on how to write your own
Makefiles
.
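As a concrete sketch, a minimal hard-coded top-level Makefile might look like the following. The LLVM paths, the project name, and the install prefix are placeholders you would adjust for your own tree:

# Hypothetical top-level Makefile for a project named "sample".
PROJECT_NAME := sample
LLVM_SRC_ROOT := /path/to/llvm/source
LLVM_OBJ_ROOT := /path/to/llvm/objects
PROJ_SRC_ROOT := $(shell pwd)
PROJ_OBJ_ROOT := $(shell pwd)
PROJ_INSTALL_ROOT := /usr/local
LEVEL := .

# Pull in the LLVM build system.
include $(LLVM_OBJ_ROOT)/Makefile.config
include $(LLVM_SRC_ROOT)/Makefile.rules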
Source Tree Layout¶
In order to use the LLVM build system, you will want to organize your source code so that it can benefit from the build system’s features. Mainly, you want your source tree layout to look similar to the LLVM source tree layout.
Underneath your top level directory, you should have the following directories:
lib
This subdirectory should contain all of your library source code. For each library that you build, you will have one directory in lib that will contain that library’s source code.
Libraries can be object files, archives, or dynamic libraries. The lib directory is just a convenient place for libraries as it places them all in a directory from which they can be linked later.
include
This subdirectory should contain any header files that are global to your project. By global, we mean that they are used by more than one library or executable of your project.
By placing your header files in include, they will be found automatically by the LLVM build system. For example, if you have a file include/jazz/note.h, then your source files can include it simply with #include "jazz/note.h".
tools
This subdirectory should contain all of your source code for executables. For each program that you build, you will have one directory in tools that will contain that program’s source code.
test
This subdirectory should contain tests that verify that your code works correctly. Automated tests are especially useful.
Currently, the LLVM build system provides basic support for tests. The LLVM system provides the following:
- LLVM contains regression tests in llvm/test. These tests are run by the Lit testing tool. This test procedure uses RUN lines in the actual test case to determine how to run the test. See the LLVM Testing Infrastructure Guide for more details.
- LLVM contains an optional package called llvm-test, which provides benchmarks and programs that are known to compile with the Clang front end. You can use these programs to test your code, gather statistical information, and compare it to the current LLVM performance statistics.
- Currently, there is no way to hook your tests directly into the llvm/test testing harness. You will simply need to find a way to use the source provided within that directory on your own.
Typically, you will want to build your lib directory first followed by your tools directory.
Writing LLVM Style Makefiles¶
The LLVM build system provides a convenient way to build libraries and executables. Most of your project Makefiles will only need to define a few variables. Below is a list of the variables one can set and what they can do:
Required Variables¶
LEVEL
This variable is the relative path from this Makefile to the top directory of your project's source code. For example, if your source code is in /tmp/src, then the Makefile in /tmp/src/jump/high would set LEVEL to "../..".
Variables for Building Subdirectories¶
DIRS
This is a space separated list of subdirectories that should be built. They will be built, one at a time, in the order specified.
PARALLEL_DIRS
This is a list of directories that can be built in parallel. These will be built after the directories in DIRS have been built.
OPTIONAL_DIRS
This is a list of directories that can be built if they exist, but will not cause an error if they do not exist. They are built serially in the order in which they are listed.
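For example, a top-level Makefile that builds lib before tools could contain the following (a sketch; the directory names are whatever your project uses):

LEVEL := .
DIRS := lib tools

include $(LEVEL)/Makefile.common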
Variables for Building Libraries¶
LIBRARYNAME
This variable contains the base name of the library that will be built. For example, to build a library named libsample.a, LIBRARYNAME should be set to sample.
BUILD_ARCHIVE
By default, a library is a .o file that is linked directly into a program. To build an archive (also known as a static library), set the BUILD_ARCHIVE variable.
SHARED_LIBRARY
If SHARED_LIBRARY is defined in your Makefile, a shared (or dynamic) library will be built.
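Putting these together, a Makefile for a static library in lib/sample might be as small as this sketch (the library name is hypothetical):

LEVEL := ../..
LIBRARYNAME := sample
BUILD_ARCHIVE := 1

include $(LEVEL)/Makefile.common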
Variables for Building Programs¶
TOOLNAME
This variable contains the name of the program that will be built. For example, to build an executable named sample, TOOLNAME should be set to sample.
USEDLIBS
This variable holds a space separated list of libraries that should be linked into the program. These libraries must be libraries that come from your lib directory. The libraries must be specified without their lib prefix. For example, to link libsample.a, you would set USEDLIBS to sample.a.
Note that this works only for statically linked libraries.
LLVMLIBS
This variable holds a space separated list of libraries that should be linked into the program. These libraries must be LLVM libraries. The libraries must be specified without their lib prefix. For example, to link with a driver that performs an IR transformation you might set LLVMLIBS to this minimal set of libraries: LLVMSupport.a LLVMCore.a LLVMBitReader.a LLVMAsmParser.a LLVMAnalysis.a LLVMTransformUtils.a LLVMScalarOpts.a LLVMTarget.a.
Note that this works only for statically linked libraries. LLVM is split into a large number of static libraries, and the list of libraries you require may be much longer than the list above. To see a full list of libraries use: llvm-config --libs all. Using LINK_COMPONENTS as described below obviates the need to set LLVMLIBS.
LINK_COMPONENTS
This variable holds a space separated list of components that the LLVM Makefiles pass to the llvm-config tool to generate a link line for the program. For example, to link with all LLVM libraries use LINK_COMPONENTS = all.
LIBS
To link dynamic libraries, add -l<library base name> to the LIBS variable. The LLVM build system will look in the same places for dynamic libraries as it does for static libraries.
For example, to link libsample.so, you would have the following line in your Makefile:
LIBS += -lsample
Note that LIBS must occur in the Makefile after the inclusion of Makefile.common.
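As an illustration, a Makefile for a program in tools/sample that links the project library above and a few LLVM components might look like this sketch (the tool, library, and component names are examples only):

LEVEL := ../..
TOOLNAME := sample
USEDLIBS := sample.a
LINK_COMPONENTS := bitreader core support

include $(LEVEL)/Makefile.common

# LIBS may only be appended after Makefile.common has been included.
LIBS += -lm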
Miscellaneous Variables¶
CFLAGS & CPPFLAGS
These variables can be used to add options to the C and C++ compiler, respectively. They are typically used to add options that tell the compiler the location of additional directories to search for header files.
It is highly suggested that you append to CFLAGS and CPPFLAGS as opposed to overwriting them. The master Makefiles may already have useful options in them that you may not want to overwrite.
Placement of Object Code¶
The final location of built libraries and executables will depend upon whether you do a Debug, Release, or Profile build.
Libraries
All libraries (static and dynamic) will be stored in PROJ_OBJ_ROOT/<type>/lib, where type is Debug, Release, or Profile for a debug, optimized, or profiled build, respectively.
Executables
All executables will be stored in PROJ_OBJ_ROOT/<type>/bin, where type is Debug, Release, or Profile for a debug, optimized, or profiled build, respectively.
Further Help¶
If you have any questions or need any help creating an LLVM project, the LLVM team would be more than happy to help. You can always post your questions to the LLVM Developers Mailing List.
LLVMBuild Guide¶
Introduction¶
This document describes the LLVMBuild organization and the files we use to describe parts of the LLVM ecosystem. For a description of specific LLVMBuild-related tools, please see the command guide.
LLVM is designed to be a modular set of libraries which can be flexibly mixed together in order to build a variety of tools, like compilers, JITs, custom code generators, optimization passes, interpreters, and so on. Related projects in the LLVM system like Clang and LLDB also tend to follow this philosophy.
In order to support this usage style, LLVM has a fairly strict structure
as to how the source code and various components are organized. The
LLVMBuild.txt
files are the explicit specification of that
structure, and are used by the build systems and other tools in order to
develop the LLVM project.
Project Organization¶
The source code for LLVM projects using the LLVMBuild system (LLVM, Clang, and LLDB) is organized into components, which define the separate pieces of functionality that make up the project. These projects may consist of many libraries, associated tools, build tools, or other utility tools (for example, testing tools).
For the most part, the project contents are organized around defining one main component per subdirectory. Each such directory contains an LLVMBuild.txt which contains the component definitions.
The component descriptions for the project as a whole are automatically
gathered by the LLVMBuild tools. The tools automatically traverse the
source directory structure to find all of the component description
files. NOTE: For performance/sanity reasons, we only traverse into
subdirectories when the parent itself contains an LLVMBuild.txt
description file.
Build Integration¶
The LLVMBuild files themselves are just a declarative way to describe the project structure. The actual building of the LLVM project is handled by another build system (currently we support both Makefiles and CMake).
The build system implementation will load the relevant contents of the LLVMBuild files and use that to drive the actual project build. Typically, the build system will only need to load this information at “configure” time, and use it to generate native build information. Build systems will also handle automatically reconfiguring their information when the contents of the LLVMBuild.txt files change.
Developers generally are not expected to be aware of the details of how the LLVMBuild system is integrated into their build. Ideally, LLVM developers who are not working on the build system would only ever need to modify the contents of the LLVMBuild.txt description files (although we have not reached this goal yet).
For more information on the utility tool we provide to help interfacing with the build system, please see the llvm-build documentation.
Component Overview¶
As mentioned earlier, LLVM projects are organized into logical components. Every component is typically grouped into its own subdirectory. Generally, a component is organized around a coherent group of sources which have some kind of clear API separation from other parts of the code.
LLVM primarily uses the following types of components:
- Libraries - Library components define a distinct API which can be independently linked into LLVM client applications. Libraries typically have private and public header files, and may specify a list of required libraries that they build on top of.
- Build Tools - Build tools are applications which are designed to be run as part of the build process (typically to generate other source files). Currently, LLVM uses one main build tool called TableGen to generate a variety of source files.
- Tools - Command line applications which are built using the LLVM component libraries. Most LLVM tools are small and are primarily frontends to the library interfaces.
Components are described using LLVMBuild.txt
files in the directories
that define the component. See the LLVMBuild Format Reference section
for information on the exact format of these files.
LLVMBuild Format Reference¶
LLVMBuild files are written in a simple variant of the INI or configuration file format (Wikipedia entry). The format defines a list of sections each of which may contain some number of properties. A simple example of the file format is below:
; Comments start with a semi-colon.
; Sections are declared using square brackets.
[component_0]
; Properties are declared using '=' and are contained in the previous section.
;
; We support simple string and boolean scalar values and list values, where
; items are separated by spaces. There is no support for quoting, and so
; property values may not contain spaces.
property_name = property_value
list_property_name = value_1 value_2 ... value_n
boolean_property_name = 1 (or 0)
LLVMBuild files are expected to define a strict set of sections and properties. A typical component description file for a library component would look like the following example:
[component_0]
type = Library
name = Linker
parent = Libraries
required_libraries = Archive BitReader Core Support TransformUtils
A full description of the exact sections and properties which are allowed follows.
Each file may define exactly one common component, named common. The common component may define the following properties:
subdirectories [optional]
If given, a list of the names of the subdirectories from the current subpath to search for additional LLVMBuild files.
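For instance, a common component directing LLVMBuild to descend into two subdirectories might read as follows (the directory names here are hypothetical):

[common]
subdirectories = Analysis Support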
Each file may define multiple components. Each component is described by a section whose name starts with component. The remainder of the section name is ignored, but each section name must be unique. Typically components are just numbered in order for files with multiple components (component_0, component_1, and so on).
Warning
Section names not matching this format (or the common
section) are
currently unused and are disallowed.
Every component is defined by the properties in the section. The exact list of properties that are allowed depends on the component type. Components may not define any properties other than those expected by the component type.
Every component must define the following properties:
type [required]
The type of the component. Supported component types are detailed below. Most components will define additional properties which may be required or optional.
name [required]
The name of the component. Names are required to be unique across the entire project.
parent [required]
The name of the logical parent of the component. Components are organized into a logical tree to make it easier to navigate and organize groups of components. The parents have no semantics as far as the project build is concerned, however. Typically, the parent will be the main component of the parent directory.
Components may reference the root pseudo component using $ROOT to indicate they should logically be grouped at the top-level.
Components may define the following properties:
dependencies [optional]
If specified, a list of names of components which must be built prior to this one. This should only be exactly those components which produce some tool or source code required for building the component.
Note
Group and LibraryGroup components have no semantics for the actual build, and are not allowed to specify dependencies.
The following section lists the available component types, as well as the properties which are associated with that component.
type = Group
Group components exist purely to allow additional arbitrary structuring of the logical components tree. For example, one might define a Libraries group to hold all of the root library components. Group components have no additional properties.
type = Library
Library components define an individual library which should be built from the source code in the component directory.
Components with this type use the following properties:
library_name [optional]
If given, the name to use for the actual library file on disk. If not given, the name is derived from the component name itself.
required_libraries [optional]
If given, a list of the names of Library or LibraryGroup components which must also be linked in whenever this library is used. That is, the link time dependencies for this component. When tools are built, the build system will include the transitive closure of all required_libraries for the components the tool needs.
add_to_library_groups [optional]
If given, a list of the names of LibraryGroup components which this component is also part of. This allows nesting groups of components. For example, the X86 target might define a library group for all of the X86 components. That library group might then be included in the all-targets library group.
installed [optional] [boolean]
Whether this library is installed. Libraries that are not installed are only reported by llvm-config when it is run as part of a development directory.
type = LibraryGroup
LibraryGroup components are a mechanism to allow easy definition of useful sets of related components. In particular, we use them to easily specify things like “all targets”, or “all assembly printers”.
Components with this type use the following properties:
required_libraries [optional]
See the Library type for a description of this property.
add_to_library_groups [optional]
See the Library type for a description of this property.
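As a sketch, a LibraryGroup that collects target backends could be declared like this (the names are illustrative, loosely modeled on the in-tree all-targets group):

[component_0]
type = LibraryGroup
name = all-targets
parent = Libraries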
type = TargetGroup
TargetGroup components are an extension of LibraryGroups, specifically for defining LLVM targets (which are handled specially in a few places).
The name of the component should always be the name of the target.
Components with this type use the LibraryGroup properties in addition to:
has_asmparser [optional] [boolean]
Whether this target defines an assembly parser.
has_asmprinter [optional] [boolean]
Whether this target defines an assembly printer.
has_disassembler [optional] [boolean]
Whether this target defines a disassembler.
has_jit [optional] [boolean]
Whether this target supports JIT compilation.
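For example, a target definition resembling the in-tree X86 one would look roughly like this sketch:

[component_0]
type = TargetGroup
name = X86
parent = Target
has_asmparser = 1
has_asmprinter = 1
has_disassembler = 1
has_jit = 1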
type = Tool
Tool components define standalone command line tools which should be built from the source code in the component directory and linked.
Components with this type use the following properties:
required_libraries [optional]
If given, a list of the names of Library or LibraryGroup components which this tool is required to be linked with.
Note
The values should be the component names, which may not always match up with the actual library names on disk.
Build systems are expected to properly include all of the libraries required by the linked components (i.e., the transitive closure of required_libraries).
Build systems are also expected to understand that those library components must be built prior to linking; they do not also need to be listed under dependencies.
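A sketch of a Tool component description (the tool name and its library list are hypothetical):

[component_0]
type = Tool
name = my-tool
parent = Tools
required_libraries = AsmParser BitReader Core Support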
type = BuildTool
BuildTool components are like Tool components, except that the tool is supposed to be built for the platform where the build is running (instead of the platform being targeted). Build systems are expected to handle the fact that required libraries may need to be built for multiple platforms in order to be able to link this tool.
BuildTool components currently use the exact same properties as Tool components; the type distinction is only used to differentiate what the tool is built for.
How To Release LLVM To The Public¶
Introduction¶
This document contains information about successfully releasing LLVM —
including subprojects: e.g., clang
and dragonegg
— to the public. It
is the Release Manager’s responsibility to ensure that a high quality build of
LLVM is released.
If you’re looking for the document on how to test the release candidates and create the binary packages, please refer to the How To Validate a New Release instead.
Release Timeline¶
LLVM is released on a time based schedule — with major releases roughly every 6 months. In between major releases there may be dot releases. The release manager will determine if and when to make a dot release based on feedback from the community. Typically, dot releases should be made if there are a large number of bug-fixes in the stable branch or a critical bug has been discovered that affects a large number of users.
Unless otherwise stated, dot releases will follow the same procedure as major releases.
The release process is roughly as follows:
- Set code freeze and branch creation date for 6 months after last code freeze date. Announce release schedule to the LLVM community and update the website.
- Create release branch and begin release process.
- Send out release candidate sources for first round of testing. Testing lasts 7-10 days. During the first round of testing, any regressions found should be fixed. Patches are merged from mainline into the release branch. Also, all features need to be completed during this time. Any features not completed at the end of the first round of testing will be removed or disabled for the release.
- Generate and send out the second release candidate sources. Only critical bugs found during this testing phase will be fixed. Any bugs introduced by merged patches will be fixed; if any such patches are merged, a third round of testing is needed.
- The release notes are updated.
- Finally, release!
The release process will be accelerated for dot releases. If the first round of testing finds no critical bugs and no regressions since the last major release, then additional rounds of testing will not be required.
Release Process¶
Release Administrative Tasks¶
This section describes a few administrative tasks that need to be done for the release process to begin. Specifically, it involves:
- Creating the release branch,
- Setting version numbers, and
- Tagging release candidates for the release team to begin testing.
Create Release Branch¶
Branch the Subversion trunk using the following procedure:
Remind developers that the release branching is imminent and to refrain from committing patches that might break the build. E.g., new features, large patches for works in progress, an overhaul of the type system, an exciting new TableGen feature, etc.
Verify that the current Subversion trunk is in decent shape by examining nightly tester and buildbot results.
Create the release branch for llvm, clang, the test-suite, and dragonegg from the last known good revision. The branch's name is release_XY, where X is the major and Y the minor release numbers. The branches should be created using the following commands:
$ svn copy https://llvm.org/svn/llvm-project/llvm/trunk \
           https://llvm.org/svn/llvm-project/llvm/branches/release_XY
$ svn copy https://llvm.org/svn/llvm-project/cfe/trunk \
           https://llvm.org/svn/llvm-project/cfe/branches/release_XY
$ svn copy https://llvm.org/svn/llvm-project/dragonegg/trunk \
           https://llvm.org/svn/llvm-project/dragonegg/branches/release_XY
$ svn copy https://llvm.org/svn/llvm-project/test-suite/trunk \
           https://llvm.org/svn/llvm-project/test-suite/branches/release_XY
Advise developers that they may now check their patches into the Subversion tree again.
The Release Manager should switch to the release branch, because all changes to the release will now be done in the branch. The easiest way to do this is to grab a working copy using the following commands:
$ svn co https://llvm.org/svn/llvm-project/llvm/branches/release_XY llvm-X.Y
$ svn co https://llvm.org/svn/llvm-project/cfe/branches/release_XY clang-X.Y
$ svn co https://llvm.org/svn/llvm-project/dragonegg/branches/release_XY dragonegg-X.Y
$ svn co https://llvm.org/svn/llvm-project/test-suite/branches/release_XY test-suite-X.Y
Update LLVM Version¶
After creating the LLVM release branch, update the release branches' autoconf and configure.ac versions from 'X.Ysvn' to 'X.Y'. Update it on mainline as well to be the next version ('X.Y+1svn'). Regenerate the configure scripts for both llvm and the test-suite.
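Assuming a machine with the required autoconf/automake versions installed, one way to regenerate the scripts is the AutoRegen.sh helper that ships in each project's autoconf directory (shown here as a sketch):

$ (cd llvm-X.Y/autoconf && ./AutoRegen.sh)
$ (cd test-suite-X.Y/autoconf && ./AutoRegen.sh)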
In addition, the version numbers of all the Bugzilla components must be updated for the next release.
Build the LLVM Release Candidates¶
Create release candidates for llvm
, clang
, dragonegg
, and the LLVM
test-suite
by tagging the branch with the respective release candidate
number. For instance, to create Release Candidate 1 you would issue the
following commands:
$ svn mkdir https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_XYZ
$ svn copy https://llvm.org/svn/llvm-project/llvm/branches/release_XY \
https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_XYZ/rc1
$ svn mkdir https://llvm.org/svn/llvm-project/cfe/tags/RELEASE_XYZ
$ svn copy https://llvm.org/svn/llvm-project/cfe/branches/release_XY \
https://llvm.org/svn/llvm-project/cfe/tags/RELEASE_XYZ/rc1
$ svn mkdir https://llvm.org/svn/llvm-project/dragonegg/tags/RELEASE_XYZ
$ svn copy https://llvm.org/svn/llvm-project/dragonegg/branches/release_XY \
https://llvm.org/svn/llvm-project/dragonegg/tags/RELEASE_XYZ/rc1
$ svn mkdir https://llvm.org/svn/llvm-project/test-suite/tags/RELEASE_XYZ
$ svn copy https://llvm.org/svn/llvm-project/test-suite/branches/release_XY \
https://llvm.org/svn/llvm-project/test-suite/tags/RELEASE_XYZ/rc1
Similarly, Release Candidate 2 would be named RC2 and so on. This keeps a permanent copy of the release candidate around for people to export and build as they wish. The final released sources will be tagged in the RELEASE_XYZ directory as Final (cf. Tag the LLVM Final Release).
The Release Manager may supply pre-packaged source tarballs for users. This can be done with the following commands:
$ svn export https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_XYZ/rc1 llvm-X.Yrc1
$ svn export https://llvm.org/svn/llvm-project/cfe/tags/RELEASE_XYZ/rc1 clang-X.Yrc1
$ svn export https://llvm.org/svn/llvm-project/dragonegg/tags/RELEASE_XYZ/rc1 dragonegg-X.Yrc1
$ svn export https://llvm.org/svn/llvm-project/test-suite/tags/RELEASE_XYZ/rc1 llvm-test-X.Yrc1
$ tar -cvf - llvm-X.Yrc1 | gzip > llvm-X.Yrc1.src.tar.gz
$ tar -cvf - clang-X.Yrc1 | gzip > clang-X.Yrc1.src.tar.gz
$ tar -cvf - dragonegg-X.Yrc1 | gzip > dragonegg-X.Yrc1.src.tar.gz
$ tar -cvf - llvm-test-X.Yrc1 | gzip > llvm-test-X.Yrc1.src.tar.gz
Building the Release¶
The builds of llvm
, clang
, and dragonegg
must be free of
errors and warnings in Debug, Release+Asserts, and Release builds. If all
builds are clean, then the release passes Build Qualification.
The make
options for building the different modes:
Mode | Options |
---|---|
Debug | ENABLE_OPTIMIZED=0 |
Release+Asserts | ENABLE_OPTIMIZED=1 |
Release | ENABLE_OPTIMIZED=1 DISABLE_ASSERTIONS=1 |
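As a sketch, the three qualification builds could be driven from an already configured object directory like so (the LLVM Makefile system places each mode's output in its own subdirectory):

$ make ENABLE_OPTIMIZED=0                          # Debug
$ make ENABLE_OPTIMIZED=1                          # Release+Asserts
$ make ENABLE_OPTIMIZED=1 DISABLE_ASSERTIONS=1     # Release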
Build LLVM¶
Build Debug
, Release+Asserts
, and Release
versions
of llvm
on all supported platforms. Directions to build llvm
are here.
Build Clang Binary Distribution¶
Creating the clang
binary distribution (Debug/Release+Asserts/Release)
requires performing the following steps for each supported platform:
- Build clang according to the directions here.
- Build both a Debug and Release version of clang. The binary will be the Release build.
- Package
clang
(details to follow).
Target Specific Build Details¶
The table below specifies which compilers are used for each Arch/OS combination
when qualifying the build of llvm
, clang
, and dragonegg
.
Architecture | OS | compiler |
---|---|---|
x86-32 | Mac OS 10.5 | gcc 4.0.1 |
x86-32 | Linux | gcc 4.2.X, gcc 4.3.X |
x86-32 | FreeBSD | gcc 4.2.X |
x86-32 | mingw | gcc 3.4.5 |
x86-64 | Mac OS 10.5 | gcc 4.0.1 |
x86-64 | Linux | gcc 4.2.X, gcc 4.3.X |
x86-64 | FreeBSD | gcc 4.2.X |
ARMv7 | Linux | gcc 4.6.X, gcc 4.7.X |
Release Qualification Criteria¶
A release is qualified when it has no regressions from the previous release (or baseline). Regressions are related to correctness first and performance second. (We may tolerate some minor performance regressions if they are deemed necessary for the general quality of the compiler.)
Regressions are new failures in the set of tests that are used to qualify each product and only include things on the list. Every release will have some bugs in it. It is the reality of developing a complex piece of software. We need a very concrete and definitive release criterion that ensures we have monotonically improving quality on some metric. The metric we use is described below. This doesn't mean that we don't care about other criteria, but these are the criteria which we found to be most important and which must be satisfied before a release can go out.
Qualify LLVM¶
LLVM is qualified when it has a clean test run without a front-end, and when it has no regressions when using either clang or dragonegg with the test-suite from the previous release.
Qualify Clang¶
Clang
is qualified when front-end specific tests in the llvm
regression
test suite all pass, clang’s own test suite passes cleanly, and there are no
regressions in the test-suite
.
Specific Target Qualification Details¶
Architecture | OS | clang baseline | tests |
---|---|---|---|
x86-32 | Linux | last release | llvm regression tests, clang regression tests, test-suite (including spec) |
x86-32 | FreeBSD | last release | llvm regression tests, clang regression tests, test-suite |
x86-32 | mingw | none | QT |
x86-64 | Mac OS 10.X | last release | llvm regression tests, clang regression tests, test-suite (including spec) |
x86-64 | Linux | last release | llvm regression tests, clang regression tests, test-suite (including spec) |
x86-64 | FreeBSD | last release | llvm regression tests, clang regression tests, test-suite |
ARMv7A | Linux | last release | llvm regression tests, clang regression tests, test-suite |
Community Testing¶
Once all testing has been completed and appropriate bugs filed, the release candidate tarballs are put on the website and the LLVM community is notified. Ask that all LLVM developers test the release in 2 ways:
- Download llvm-X.Y, llvm-test-X.Y, and the appropriate clang binary. Build LLVM. Run make check and the full LLVM test suite (make TEST=nightly report).
- Download llvm-X.Y, llvm-test-X.Y, and the clang sources. Compile everything. Run make check and the full LLVM test suite (make TEST=nightly report).
Ask LLVM developers to submit the test suite report and make check
results
to the list. Verify that there are no regressions from the previous release.
The results are not used to qualify a release, but to spot other potential
problems. For unsupported targets, verify that make check
is at least
clean.
During the first round of testing, all regressions must be fixed before the second release candidate is tagged.
If this is the second round of testing, the testing is only to ensure that bug fixes previously merged in have not created new major problems. This is not the time to solve additional and unrelated bugs! If no patches are merged in, the release is determined to be ready and the release manager may move onto the next stage.
Release Patch Rules¶
Below are the rules regarding patching the release branch:
- Patches applied to the release branch may only be applied by the release manager.
- During the first round of testing, patches that fix regressions or that are small and relatively risk free (verified by the appropriate code owner) are applied to the branch. Code owners are asked to be very conservative in approving patches for the branch. We reserve the right to reject any patch that does not fix a regression as previously defined.
- During the remaining rounds of testing, only patches that fix critical regressions may be applied.
- For dot releases all patches must maintain both API and ABI compatibility with the previous major release. Only bugfixes will be accepted.
Release Final Tasks¶
The final stages of the release process involve tagging the “final” release branch, updating documentation that refers to the release, and updating the demo page.
Update Documentation¶
Review the documentation and ensure that it is up to date. The “Release Notes” must be updated to reflect new features, bug fixes, new known issues, and changes in the list of supported platforms. The “Getting Started Guide” should be updated to reflect the new release version number tag available from Subversion and changes in basic system requirements. Merge both changes from mainline into the release branch.
Tag the LLVM Final Release¶
Tag the final release sources using the following procedure:
$ svn copy https://llvm.org/svn/llvm-project/llvm/branches/release_XY \
https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_XYZ/Final
$ svn copy https://llvm.org/svn/llvm-project/cfe/branches/release_XY \
https://llvm.org/svn/llvm-project/cfe/tags/RELEASE_XYZ/Final
$ svn copy https://llvm.org/svn/llvm-project/dragonegg/branches/release_XY \
https://llvm.org/svn/llvm-project/dragonegg/tags/RELEASE_XYZ/Final
$ svn copy https://llvm.org/svn/llvm-project/test-suite/branches/release_XY \
https://llvm.org/svn/llvm-project/test-suite/tags/RELEASE_XYZ/Final
Update the LLVM Demo Page¶
The LLVM demo page must be updated to use the new release. This consists of
using the new clang
binary and building LLVM.
Update the LLVM Website¶
The website must be updated before the release announcement is sent out. Here is what to do:
- Check out the www module from Subversion.
- Create a new subdirectory X.Y in the releases directory.
- Commit the llvm, test-suite, clang source, clang binaries, dragonegg source, and dragonegg binaries in this new directory.
- Copy and commit the llvm/docs and LICENSE.txt files into this new directory. The docs should be built with BUILD_FOR_WEBSITE=1.
- Commit the index.html to the release/X.Y directory to redirect (use the one from the previous release).
- Update the releases/download.html file with the new release.
- Update the releases/index.html with the new release and link to the release documentation.
- Finally, update the main page (index.html and sidebar) to point to the new release and release announcement. Make sure this all gets committed back into Subversion.
Announce the Release¶
Have Chris send out the release announcement when everything is finished.
Advice on Packaging LLVM¶
Overview¶
LLVM sets certain default configure options to make sure our developers don’t break things for constrained platforms. These settings are not optimal for most desktop systems, and we hope that packagers (e.g., Redhat, Debian, MacPorts, etc.) will tweak them. This document lists settings we suggest you tweak.
LLVM’s API changes with each release, so users are likely to want, for example, both LLVM-2.6 and LLVM-2.7 installed at the same time to support apps developed against each.
Compile Flags¶
LLVM runs much more quickly when it’s optimized and assertions are removed.
However, such a build is currently incompatible with users who build without
defining NDEBUG
, and the lack of assertions makes it hard to debug problems
in user code. We recommend allowing users to install both optimized and debug
versions of LLVM in parallel. The following configure flags are relevant:
--disable-assertions
- Builds LLVM with NDEBUG defined. Changes the LLVM ABI. Also available by setting DISABLE_ASSERTIONS=0|1 in make's environment. Assertions default to enabled regardless of the optimization setting, but they slow things down.
--enable-debug-symbols
- Builds LLVM with -g. Also available by setting DEBUG_SYMBOLS=0|1 in make's environment. This defaults to disabled when optimizing, so you should turn it back on to let users debug their programs.
--enable-optimized
- (For svn checkouts) Builds LLVM with -O2 and, by default, turns off debug symbols. Also available by setting ENABLE_OPTIMIZED=0|1 in make's environment. This defaults to enabled when not in a checkout.
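Building on the flags above, a packager providing parallel optimized and debug installs might configure two build trees along these lines (a sketch; the prefixes are illustrative):

$ ./configure --prefix=/usr/lib/llvm-X.Y --enable-optimized --disable-assertions
$ ./configure --prefix=/usr/lib/llvm-X.Y-debug --disable-optimized --enable-debug-symbols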
C++ Features¶
- RTTI
- LLVM disables RTTI by default. Add REQUIRES_RTTI=1 to your environment while running make to re-enable it. This will allow users to build with RTTI enabled and still inherit from LLVM classes.
Dependencies¶
--enable-libffi
- Depend on libffi to allow the LLVM interpreter to call external functions.
--with-oprofile
- Depend on libopagent (version 0.9.4 or later) to let the LLVM JIT tell oprofile about function addresses and line numbers.
How To Validate a New Release¶
Introduction¶
This document contains information about testing the release candidates that will ultimately be the next LLVM release. For more information on how to manage the actual release, please refer to How To Release LLVM To The Public.
Overview of the Release Process¶
Once the release process starts, the Release Manager will ask for volunteers, and it’ll be the role of each volunteer to:
- Test and benchmark the previous release
- Test and benchmark each release candidate, comparing to the previous release and candidates
- Identify, reduce and report every regression found during tests and benchmarks
- Make sure the critical bugs get fixed and merged to the next release candidate
Not all bugs or regressions are show-stoppers and it’s a bit of a grey area what should be fixed before the next candidate and what can wait until the next release.
It’ll depend on:
- The severity of the bug: how many people it affects and whether it's a regression or a known bug. Known bugs are “unsupported features”, and bugs in recently implemented features can be handled by disabling those features.
- The stage in the release: less critical bugs should be considered for fixing between RC1 and RC2, but not so much toward the end of the process.
- Whether it's a correctness or a performance regression: performance regressions tend to be taken more lightly than correctness regressions.
Scripts¶
The scripts are in the utils/release
directory.
test-release.sh¶
This script will check out, configure and compile LLVM+Clang (+ most add-ons, like compiler-rt, libcxx and clang-extra-tools) in three stages, and will test the final stage.
It'll install the final binaries in the Phase3/Release(+Asserts) directory, and that's the one you should use for the test-suite and other external tests.
To run the script on a specific release candidate run:
./test-release.sh \
-release 3.3 \
-rc 1 \
-no-64bit \
-test-asserts \
-no-compare-files
Each system will require different options. For instance, x86_64 will obviously not need
-no-64bit
while 32-bit systems will, or the script will fail.
The important flags to get right are:
- On the pre-release, you should change -rc 1 to -final. On RC2, change it to -rc 2 and so on.
- On non-release testing, you can use -final in conjunction with -no-checkout, but you'll have to create the final directory by hand and link the correct source dir to final/llvm.src.
- For release candidates, you need -test-asserts, or it won't create a “Release+Asserts” directory, which is needed for release testing and benchmarking. This will take twice as long.
- On the final candidate you just need Release builds, and that's the binary directory you'll have to pack.
This script builds three phases of Clang+LLVM twice each (Release and Release+Asserts), so use screen or nohup to avoid headaches, since it’ll take a long time.
Use the --help option to see all the options and choose them according to your needs.
findRegressions-nightly.py¶
TODO
Test Suite¶
Follow the LNT Quick Start Guide link on how to set up the test-suite.
The binary location you'll have to use for testing is inside the rcN/Phase3/Release+Asserts/llvmCore-REL-RC.install directory.
Link that directory to an easier location and run the test-suite.
An example of the run command line, assuming you created a link from the correct install directory to ~/devel/llvm/install:
./sandbox/bin/python sandbox/bin/lnt runtest \
nt \
-j4 \
--sandbox sandbox \
--test-suite ~/devel/llvm/test/test-suite \
--cc ~/devel/llvm/install/bin/clang \
--cxx ~/devel/llvm/install/bin/clang++
It should have no new regressions, compared to the previous release or release candidate. You don’t need to fix all the bugs in the test-suite, since they’re not necessarily meant to pass on all architectures all the time. This is due to the nature of the result checking, which relies on direct comparison, and most of the time, the failures are related to bad output checking, rather than bad code generation.
If the errors are in LLVM itself, please report every regression found as a blocker, and all the other bugs as important, but not necessarily blocking the release. They can be set as “known failures” to be fixed at a future date.
Pre-Release Process¶
When the release process is announced on the mailing list, you should prepare for the testing by applying to the previous release the same testing you'll do on the release candidates.
You should:
- Download the previous release sources from http://llvm.org/releases/download.html.
- Run the test-release.sh script in final mode (change -rc 1 to -final).
- Once all three stages are done, it'll test the final stage.
- Using the Phase3/Release+Asserts/llvmCore-MAJ.MIN-final.install base, run the test-suite.
If the final phase's make check-all failed, it's a good idea to also test the intermediate stages by going into the obj directory and running make check-all to find out if at least one stage passes (this helps when reducing the error for bug report purposes).
Release Process¶
When the Release Manager sends you the release candidate, download all sources, unzip them into the same directory (there will be sym-links from the appropriate places to them), and run the release test as above.
You should:
- Download the current candidate sources from where the release manager points you (ex. http://llvm.org/pre-releases/3.3/rc1/).
- Repeat the steps above with -rc 1, -rc 2, etc., and run the test-suite the same way.
- Compare the results, report all errors on Bugzilla and publish the binary blob where the release manager can grab it.
Once the release manager announces that the latest candidate is the good one, you have to pack the Release (no Asserts) install directory in Phase3 and that will be the official binary.
- Rename (or link) clang+llvm-REL-ARCH-ENV to the .install directory
- Tar that into the same name with a .tar.gz extension from outside the directory, as sketched below
- Make it available for the release manager to download
Bug Reporting Process¶
If you found regressions or failures when comparing a release candidate with the previous release, follow the rules below:
- Critical bugs on compilation should be fixed as soon as possible, possibly before releasing the binary blobs.
- Check-all tests should be fixed before the next release candidate, but can wait until the test-suite run is finished.
- Bugs in the test suite or unimportant check-all tests can be fixed in between release candidates.
- New features or recent big changes, when close to the release, should be done in a way that makes them easy to disable. If they misbehave, prefer disabling them to releasing an unstable (but untested) binary package.
- LLVM Developer Policy
- The LLVM project’s policy towards developers and their contributions.
- Creating an LLVM Project
- How-to guide and templates for new projects that use the LLVM
infrastructure. The templates (directory organization, Makefiles, and test
tree) allow the project code to be located outside (or inside) the
llvm/
tree, while using LLVM header files and libraries. - LLVMBuild Guide
- Describes the LLVMBuild organization and files used by LLVM to specify component descriptions.
- LLVM Makefile Guide
- Describes how the LLVM makefiles work and how to use them.
- How To Release LLVM To The Public
- This is a guide to preparing LLVM releases. Most developers can ignore it.
- How To Validate a New Release
- This is a guide to validate a new release, during the release process. Most developers can ignore it.
- Advice on Packaging LLVM
- Advice on packaging LLVM into a distribution.
- Code Reviews with Phabricator
- Describes how to use the Phabricator code review tool hosted on http://reviews.llvm.org/ and its command line interface, Arcanist.
Community¶
LLVM has a thriving community of friendly and helpful developers. The two primary communication mechanisms in the LLVM community are mailing lists and IRC.
Mailing Lists¶
If you can’t find what you need in these docs, try consulting the mailing lists.
- Developer’s List (llvm-dev)
- This list is for people who want to be included in technical discussions of LLVM. People post to this list when they have questions about writing code for or using the LLVM tools. It is relatively low volume.
- Commits Archive (llvm-commits)
- This list contains all commit messages that are made when LLVM developers commit code changes to the repository. It also serves as a forum for patch review (i.e. send patches here). It is useful for those who want to stay on the bleeding edge of LLVM development. This list is very high volume.
- Bugs & Patches Archive (llvm-bugs)
- This list gets emailed every time a bug is opened and closed. It is higher volume than the LLVM-dev list.
- Test Results Archive (llvm-testresults)
- A message is automatically sent to this list by every active nightly tester when it completes. As such, this list gets email several times each day, making it a high volume list.
- LLVM Announcements List (llvm-announce)
- This is a low volume list that provides important announcements regarding LLVM. It gets email about once a month.
IRC¶
Users and developers of the LLVM project (including subprojects such as Clang) can be found in #llvm on irc.oftc.net.
This channel has several bots.
- Buildbot reporters
- llvmbb - Bot for the main LLVM buildbot master. http://lab.llvm.org:8011/console
- bb-chapuni - An individually run buildbot master. http://bb.pgr.jp/console
- smooshlab - Apple’s internal buildbot master.
- robot - Bugzilla linker. %bug <number>
- clang-bot - A geordi instance running near-trunk clang instead of gcc.