author Dmitri Gribenko <gribozavr@gmail.com> 2012-11-22 11:56:02 +0000
committer Dmitri Gribenko <gribozavr@gmail.com> 2012-11-22 11:56:02 +0000
commit bbef5ead4cc90d7f7ca2f5dded41751ca3ff3dc9
tree 41991cf65a641c93b3de40a2f17b5fd0c6b5f9a9 /docs
parent 7a3b7e5efc44c3852c5b34b245bd4eedeeac886f
Documentation: convert SourceLevelDebugging.html to reST
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168493 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'docs')
-rw-r--r-- docs/SourceLevelDebugging.html 2858
-rw-r--r-- docs/SourceLevelDebugging.rst 2285
-rw-r--r-- docs/subsystems.rst 3
3 files changed, 2287 insertions, 2859 deletions
diff --git a/docs/SourceLevelDebugging.html b/docs/SourceLevelDebugging.html
deleted file mode 100644
index 546aab9d1a..0000000000
--- a/docs/SourceLevelDebugging.html
+++ /dev/null
@@ -1,2858 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
- "http://www.w3.org/TR/html4/strict.dtd">
-<html>
-<head>
- <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
- <title>Source Level Debugging with LLVM</title>
- <link rel="stylesheet" href="_static/llvm.css" type="text/css">
-</head>
-<body>
-
-<h1>Source Level Debugging with LLVM</h1>
-
-<table class="layout" style="width:100%">
- <tr class="layout">
- <td class="left">
-<ul>
- <li><a href="#introduction">Introduction</a>
- <ol>
- <li><a href="#phil">Philosophy behind LLVM debugging information</a></li>
- <li><a href="#consumers">Debug information consumers</a></li>
- <li><a href="#debugopt">Debugging optimized code</a></li>
- </ol></li>
- <li><a href="#format">Debugging information format</a>
- <ol>
- <li><a href="#debug_info_descriptors">Debug information descriptors</a>
- <ul>
- <li><a href="#format_compile_units">Compile unit descriptors</a></li>
- <li><a href="#format_files">File descriptors</a></li>
- <li><a href="#format_global_variables">Global variable descriptors</a></li>
- <li><a href="#format_subprograms">Subprogram descriptors</a></li>
- <li><a href="#format_blocks">Block descriptors</a></li>
- <li><a href="#format_basic_type">Basic type descriptors</a></li>
- <li><a href="#format_derived_type">Derived type descriptors</a></li>
- <li><a href="#format_composite_type">Composite type descriptors</a></li>
- <li><a href="#format_subrange">Subrange descriptors</a></li>
- <li><a href="#format_enumeration">Enumerator descriptors</a></li>
- <li><a href="#format_variables">Local variables</a></li>
- </ul></li>
- <li><a href="#format_common_intrinsics">Debugger intrinsic functions</a>
- <ul>
- <li><a href="#format_common_declare">llvm.dbg.declare</a></li>
- <li><a href="#format_common_value">llvm.dbg.value</a></li>
- </ul></li>
- </ol></li>
- <li><a href="#format_common_lifetime">Object lifetimes and scoping</a></li>
- <li><a href="#ccxx_frontend">C/C++ front-end specific debug information</a>
- <ol>
- <li><a href="#ccxx_compile_units">C/C++ source file information</a></li>
- <li><a href="#ccxx_global_variable">C/C++ global variable information</a></li>
- <li><a href="#ccxx_subprogram">C/C++ function information</a></li>
- <li><a href="#ccxx_basic_types">C/C++ basic types</a></li>
- <li><a href="#ccxx_derived_types">C/C++ derived types</a></li>
- <li><a href="#ccxx_composite_types">C/C++ struct/union types</a></li>
- <li><a href="#ccxx_enumeration_types">C/C++ enumeration types</a></li>
- </ol></li>
- <li><a href="#llvmdwarfextension">LLVM Dwarf Extensions</a>
- <ol>
- <li><a href="#objcproperty">Debugging Information Extension
- for Objective C Properties</a>
- <ul>
- <li><a href="#objcpropertyintroduction">Introduction</a></li>
- <li><a href="#objcpropertyproposal">Proposal</a></li>
- <li><a href="#objcpropertynewattributes">New DWARF Attributes</a></li>
- <li><a href="#objcpropertynewconstants">New DWARF Constants</a></li>
- </ul>
- </li>
- <li><a href="#acceltable">Name Accelerator Tables</a>
- <ul>
- <li><a href="#acceltableintroduction">Introduction</a></li>
- <li><a href="#acceltablehashes">Hash Tables</a></li>
- <li><a href="#acceltabledetails">Details</a></li>
- <li><a href="#acceltablecontents">Contents</a></li>
- <li><a href="#acceltableextensions">Language Extensions and File Format Changes</a></li>
- </ul>
- </li>
- </ol>
- </li>
-</ul>
-</td>
-</tr></table>
-
-<div class="doc_author">
- <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>
- and <a href="mailto:jlaskey@mac.com">Jim Laskey</a></p>
-</div>
-
-
-<!-- *********************************************************************** -->
-<h2><a name="introduction">Introduction</a></h2>
-<!-- *********************************************************************** -->
-
-<div>
-
-<p>This document is the central repository for all information pertaining to
- debug information in LLVM. It describes the <a href="#format">actual format
- that the LLVM debug information</a> takes, which is useful for those
- interested in creating front-ends or dealing directly with the information.
- Further, this document provides specific examples of what debug information
- for C/C++ looks like.</p>
-
-<!-- ======================================================================= -->
-<h3>
- <a name="phil">Philosophy behind LLVM debugging information</a>
-</h3>
-
-<div>
-
-<p>The idea of the LLVM debugging information is to capture how the important
- pieces of the source-language's Abstract Syntax Tree map onto LLVM code.
- Several design aspects have shaped the solution that appears here. The
- important ones are:</p>
-
-<ul>
- <li>Debugging information should have very little impact on the rest of the
- compiler. No transformations, analyses, or code generators should need to
- be modified because of debugging information.</li>
-
- <li>LLVM optimizations should interact in <a href="#debugopt">well-defined and
- easily described ways</a> with the debugging information.</li>
-
- <li>Because LLVM is designed to support arbitrary programming languages,
- LLVM-to-LLVM tools should not need to know anything about the semantics of
- the source-level language.</li>
-
- <li>Source-level languages are often <b>widely</b> different from one another.
- LLVM should not put any restrictions on the flavor of the source-language,
- and the debugging information should work with any language.</li>
-
- <li>With code generator support, it should be possible to use an LLVM compiler
- to compile a program to native machine code and standard debugging
- formats. This allows compatibility with traditional machine-code level
- debuggers, like GDB or DBX.</li>
-</ul>
-
-<p>The approach used by the LLVM implementation is to use a small set
- of <a href="#format_common_intrinsics">intrinsic functions</a> to define a
- mapping between LLVM program objects and the source-level objects. The
- description of the source-level program is maintained in LLVM metadata
- in an <a href="#ccxx_frontend">implementation-defined format</a>
- (the C/C++ front-end currently uses working draft 7 of
- the <a href="http://www.eagercon.com/dwarf/dwarf3std.htm">DWARF 3
- standard</a>).</p>
-
-<p>When a program is being debugged, a debugger interacts with the user and
- turns the stored debug information into source-language specific information.
- As such, a debugger must be aware of the source-language, and is thus tied to
- a specific language or family of languages.</p>
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
- <a name="consumers">Debug information consumers</a>
-</h3>
-
-<div>
-
-<p>The role of debug information is to provide meta information normally
- stripped away during the compilation process. This meta information provides
- an LLVM user a relationship between generated code and the original program
- source code.</p>
-
-<p>Currently, debug information is consumed by DwarfDebug to produce DWARF
- information used by the gdb debugger. Other targets could use the same
- information to produce stabs or other debug forms.</p>
-
-<p>It would also be reasonable to use debug information to feed profiling tools
- for analysis of generated code, or tools for reconstructing the original
- source from generated code.</p>
-
-<p>TODO - expound a bit more.</p>
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
- <a name="debugopt">Debugging optimized code</a>
-</h3>
-
-<div>
-
-<p>An extremely high priority of LLVM debugging information is to make it
- interact well with optimizations and analysis. In particular, the LLVM debug
- information provides the following guarantees:</p>
-
-<ul>
- <li>LLVM debug information <b>always provides information to accurately read
- the source-level state of the program</b>, regardless of which LLVM
- optimizations have been run, and without any modification to the
- optimizations themselves. However, some optimizations may impact the
- ability to modify the current state of the program with a debugger, such
- as setting program variables, or calling functions that have been
- deleted.</li>
-
- <li>As desired, LLVM optimizations can be upgraded to be aware of the LLVM
- debugging information, allowing them to update the debugging information
- as they perform aggressive optimizations. This means that, with effort,
- the LLVM optimizers could optimize debug code just as well as non-debug
- code.</li>
-
- <li>LLVM debug information does not prevent optimizations from
- happening (for example inlining, basic block reordering/merging/cleanup,
- tail duplication, etc).</li>
-
- <li>LLVM debug information is automatically optimized along with the rest of
- the program, using existing facilities. For example, duplicate
- information is automatically merged by the linker, and unused information
- is automatically removed.</li>
-</ul>
-
-<p>Basically, the debug information allows you to compile a program with
- "<tt>-O0 -g</tt>" and get full debug information, allowing you to arbitrarily
- modify the program as it executes from a debugger. Compiling a program with
- "<tt>-O3 -g</tt>" gives you full debug information that is always available
- and accurate for reading (e.g., you get accurate stack traces despite tail
- call elimination and inlining), but you might lose the ability to modify the
- program and call functions which were optimized out of the program, or
- inlined away completely.</p>
-
-<p>The <a href="TestingGuide.html#quicktestsuite">LLVM test suite</a> provides a
- framework to test the optimizer's handling of debugging information. It can be
- run like this:</p>
-
-<div class="doc_code">
-<pre>
-% cd llvm/projects/test-suite/MultiSource/Benchmarks # or some other level
-% make TEST=dbgopt
-</pre>
-</div>
-
-<p>This will test the impact of debugging information on optimization passes. If
- debugging information influences optimization passes then it will be reported
- as a failure. See <a href="TestingGuide.html">TestingGuide</a> for more
- information on the LLVM test infrastructure and how to run various tests.</p>
-
-</div>
-
-</div>
-
-<!-- *********************************************************************** -->
-<h2>
- <a name="format">Debugging information format</a>
-</h2>
-<!-- *********************************************************************** -->
-
-<div>
-
-<p>LLVM debugging information has been carefully designed to make it possible
- for the optimizer to optimize the program and debugging information without
- necessarily having to know anything about debugging information. In
- particular, the use of metadata avoids duplicated debugging information from
- the beginning, and the global dead code elimination pass automatically
- deletes debugging information for a function if it decides to delete the
- function. </p>
-
-<p>To do this, most of the debugging information (descriptors for types,
- variables, functions, source files, etc) is inserted by the language
- front-end in the form of LLVM metadata. </p>
-
-<p>Debug information is designed to be agnostic about the target debugger and
- debugging information representation (e.g. DWARF/Stabs/etc). It uses a
- generic pass to decode the information that represents variables, types,
- functions, namespaces, etc: this allows for arbitrary source-language
- semantics and type-systems to be used, as long as there is a module
- written for the target debugger to interpret the information. </p>
-
-<p>To provide basic functionality, the LLVM debugger does have to make some
- assumptions about the source-level language being debugged, though it keeps
- these to a minimum. The only common features that the LLVM debugger assumes
- exist are <a href="#format_files">source files</a>,
- and <a href="#format_global_variables">program objects</a>. These abstract
- objects are used by a debugger to form stack traces, show information about
- local variables, etc.</p>
-
-<p>This section of the documentation first describes the representation aspects
- common to any source-language. The <a href="#ccxx_frontend">next section</a>
- describes the data layout conventions used by the C and C++ front-ends.</p>
-
-<!-- ======================================================================= -->
-<h3>
- <a name="debug_info_descriptors">Debug information descriptors</a>
-</h3>
-
-<div>
-
-<p>In consideration of the complexity and volume of debug information, LLVM
- provides a specification for well-formed debug descriptors. </p>
-
-<p>Consumers of LLVM debug information expect the descriptors for program
- objects to start in a canonical format, but the descriptors can include
- additional information appended at the end that is source-language
- specific. All LLVM debugging information is versioned, allowing backwards
- compatibility in the case that the core structures need to change in some
- way. Also, all debugging information objects start with a tag to indicate
- what type of object it is. The source-language is allowed to define its own
- objects, by using unreserved tag numbers. We recommend using tags in
- the range 0x1000 through 0x2000 (there is a defined enum DW_TAG_user_base =
- 0x1000.)</p>
-
-<p>The fields of debug descriptors used internally by LLVM
- are restricted to only the simple data types <tt>i32</tt>, <tt>i1</tt>,
- <tt>float</tt>, <tt>double</tt>, <tt>mdstring</tt> and <tt>mdnode</tt>. </p>
-
-<div class="doc_code">
-<pre>
-!1 = metadata !{
- i32, ;; A tag
- ...
-}
-</pre>
-</div>
-
-<p><a name="LLVMDebugVersion">The first field of a descriptor is always an
- <tt>i32</tt> containing a tag value identifying the content of the
- descriptor. The remaining fields are specific to the descriptor. The values
- of tags are loosely bound to the tag values of DWARF information entries.
- However, that does not restrict the use of the information supplied to DWARF
- targets. To facilitate versioning of debug information, the tag is augmented
- with the current debug version (LLVMDebugVersion = 8 &lt;&lt; 16 or
- 0x80000 or 524288.)</a></p>
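-
-<p>As a rough illustration of the arithmetic above, the following Python sketch
- combines a DWARF tag with the debug version and splits the two apart again.
- The helper names and the 16-bit masks are assumptions made for this example,
- not part of LLVM; only <tt>LLVMDebugVersion = 8 &lt;&lt; 16</tt> and the tag
- value 17 come from the text.</p>

```python
# Sketch only: helper names and masks are hypothetical, not LLVM API.
LLVM_DEBUG_VERSION = 8 << 16   # 0x80000 == 524288, as stated in the text
TAG_MASK = 0x0000FFFF          # assumed: the DWARF tag sits in the low 16 bits
VERSION_MASK = 0xFFFF0000      # assumed: the version sits in the high 16 bits

DW_TAG_compile_unit = 17


def encode_tag(dwarf_tag):
    """Return the first descriptor field: DWARF tag plus debug version."""
    return dwarf_tag + LLVM_DEBUG_VERSION


def decode_tag(field):
    """Split a descriptor's first field into (dwarf_tag, version_number)."""
    return field & TAG_MASK, (field & VERSION_MASK) >> 16
```

-<p>Under these assumptions, a compile unit descriptor would start with
- <tt>i32 524305</tt>, that is, 17 + 524288.</p>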
-
-<p>The details of the various descriptors follow.</p>
-
-<!-- ======================================================================= -->
-<h4>
- <a name="format_compile_units">Compile unit descriptors</a>
-</h4>
-
-<div>
-
-<div class="doc_code">
-<pre>
-!0 = metadata !{
- i32, ;; Tag = 17 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
- ;; (DW_TAG_compile_unit)
- i32, ;; Unused field.
- i32, ;; DWARF language identifier (ex. DW_LANG_C89)
- metadata, ;; Source file name
- metadata, ;; Source file directory (includes trailing slash)
- metadata, ;; Producer (ex. "4.0.1 LLVM (LLVM research group)")
- i1, ;; True if this is a main compile unit.
- i1, ;; True if this is optimized.
- metadata, ;; Flags
- i32, ;; Runtime version
- metadata, ;; List of enum types
- metadata, ;; List of retained types
- metadata, ;; List of subprograms
- metadata ;; List of global variables
-}
-</pre>
-</div>
-
-<p>These descriptors contain a source language ID for the file (we use the DWARF
- 3.0 ID numbers, such as <tt>DW_LANG_C89</tt>, <tt>DW_LANG_C_plus_plus</tt>,
- <tt>DW_LANG_Cobol74</tt>, etc) and three strings: the filename, the
- working directory of the compiler, and an identifier string for the compiler
- that produced it.</p>
-
-<p>Compile unit descriptors provide the root context for objects declared in a
- specific compilation unit. File descriptors are defined using this context.
- These descriptors are collected by a named metadata
- <tt>!llvm.dbg.cu</tt>. Each compile unit descriptor keeps track of subprograms,
- global variables and type information.</p>
-
-</div>
-
-<!-- ======================================================================= -->
-<h4>
- <a name="format_files">File descriptors</a>
-</h4>
-
-<div>
-
-<div class="doc_code">
-<pre>
-!0 = metadata !{
- i32, ;; Tag = 41 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
- ;; (DW_TAG_file_type)
- metadata, ;; Source file name
- metadata, ;; Source file directory (includes trailing slash)
- metadata ;; Unused
-}
-</pre>
-</div>
-
-<p>These descriptors contain information for a file. Global variables and top
- level functions are defined using this context. File descriptors also
- provide context for source line correspondence. </p>
-
-<p>Each input file is encoded as a separate file descriptor in LLVM debugging
- information output. </p>
-
-</div>
-
-<!-- ======================================================================= -->
-<h4>
- <a name="format_global_variables">Global variable descriptors</a>
-</h4>
-
-<div>
-
-<div class="doc_code">
-<pre>
-!1 = metadata !{
- i32, ;; Tag = 52 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
- ;; (DW_TAG_variable)
- i32, ;; Unused field.
- metadata, ;; Reference to context descriptor
- metadata, ;; Name
- metadata, ;; Display name (fully qualified C++ name)
- metadata, ;; MIPS linkage name (for C++)
- metadata, ;; Reference to file where defined
- i32, ;; Line number where defined
- metadata, ;; Reference to type descriptor
- i1, ;; True if the global is local to compile unit (static)
- i1, ;; True if the global is defined in the compile unit (not extern)
- {}* ;; Reference to the global variable
-}
-</pre>
-</div>
-
-<p>These descriptors provide debug information about global variables. They
-provide details such as name, type and where the variable is defined. All
-global variables are collected inside the named metadata
-<tt>!llvm.dbg.cu</tt>.</p>
-
-</div>
-
-<!-- ======================================================================= -->
-<h4>
- <a name="format_subprograms">Subprogram descriptors</a>
-</h4>
-
-<div>
-
-<div class="doc_code">
-<pre>
-!2 = metadata !{
- i32, ;; Tag = 46 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
- ;; (DW_TAG_subprogram)
- i32, ;; Unused field.
- metadata, ;; Reference to context descriptor
- metadata, ;; Name
- metadata, ;; Display name (fully qualified C++ name)
- metadata, ;; MIPS linkage name (for C++)
- metadata, ;; Reference to file where defined
- i32, ;; Line number where defined
- metadata, ;; Reference to type descriptor
- i1, ;; True if the subprogram is local to compile unit (static)
- i1, ;; True if the subprogram is defined in the compile unit (not extern)
- i32, ;; Line number where the scope of the subprogram begins
- i32, ;; Virtuality, e.g. dwarf::DW_VIRTUALITY__virtual
- i32, ;; Index into a virtual function
- metadata, ;; indicates which base type contains the vtable pointer for the
- ;; derived class
- i32, ;; Flags - Artificial, Private, Protected, Explicit, Prototyped.
- i1, ;; isOptimized
- Function *,;; Pointer to LLVM function
- metadata, ;; Lists function template parameters
- metadata, ;; Function declaration descriptor
- metadata ;; List of function variables
-}
-</pre>
-</div>
-
-<p>These descriptors provide debug information about functions, methods and
- subprograms. They provide details such as name, return types and the source
- location where the subprogram is defined.
-</p>
-
-</div>
-
-<!-- ======================================================================= -->
-<h4>
- <a name="format_blocks">Block descriptors</a>
-</h4>
-
-<div>
-
-<div class="doc_code">
-<pre>
-!3 = metadata !{
- i32, ;; Tag = 11 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> (DW_TAG_lexical_block)
- metadata,;; Reference to context descriptor
- i32, ;; Line number
- i32, ;; Column number
- metadata,;; Reference to source file
- i32 ;; Unique ID to identify blocks from a template function
-}
-</pre>
-</div>
-
-<p>This descriptor provides debug information about nested blocks within a
- subprogram. The line and column numbers are used to distinguish
- two lexical blocks at the same depth. </p>
-
-<div class="doc_code">
-<pre>
-!3 = metadata !{
- i32, ;; Tag = 11 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> (DW_TAG_lexical_block)
- metadata,;; Reference to the scope we're annotating with a file change
- metadata ;; Reference to the file the scope is enclosed in.
-}
-</pre>
-</div>
-
-<p>This descriptor provides a wrapper around a lexical scope to handle file
- changes in the middle of a lexical block.</p>
-
-</div>
-
-<!-- ======================================================================= -->
-<h4>
- <a name="format_basic_type">Basic type descriptors</a>
-</h4>
-
-<div>
-
-<div class="doc_code">
-<pre>
-!4 = metadata !{
- i32, ;; Tag = 36 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
- ;; (DW_TAG_base_type)
- metadata, ;; Reference to context
- metadata, ;; Name (may be "" for anonymous types)
- metadata, ;; Reference to file where defined (may be NULL)
- i32, ;; Line number where defined (may be 0)
- i64, ;; Size in bits
- i64, ;; Alignment in bits
- i64, ;; Offset in bits
- i32, ;; Flags
- i32 ;; DWARF type encoding
-}
-</pre>
-</div>
-
-<p>These descriptors define primitive types used in the code, for example int,
- bool and float. The context provides the scope of the type, which is usually
- the top level. Since basic types are not usually user-defined, the context
- and line number can be left as NULL and 0. The size, alignment and offset
- are expressed in bits and can be 64-bit values. The alignment is used to
- round the offset when embedded in a
- <a href="#format_composite_type">composite type</a> (for example, to keep
- doubles on 64-bit boundaries). The offset is the bit offset if embedded in
- a <a href="#format_composite_type">composite type</a>.</p>
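-
-<p>The rounding described above can be sketched in a few lines of Python. This
- is an illustration only: the function name is hypothetical, and it assumes
- that "rounding" means rounding the bit offset up to the next multiple of the
- alignment, which the text does not state explicitly.</p>

```python
def round_up_to_alignment(offset_bits, align_bits):
    """Round offset_bits up to the next multiple of align_bits.

    Assumption for this sketch: an alignment of 0 means "no constraint".
    """
    if align_bits <= 0:
        return offset_bits
    return (offset_bits + align_bits - 1) // align_bits * align_bits
```

-<p>For instance, a 64-bit aligned member that would otherwise start at bit 32
- is placed at bit 64 under this assumption.</p>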
-
-<p>The type encoding provides the details of the type. The values are typically
- one of the following:</p>
-
-<div class="doc_code">
-<pre>
-DW_ATE_address = 1
-DW_ATE_boolean = 2
-DW_ATE_float = 4
-DW_ATE_signed = 5
-DW_ATE_signed_char = 6
-DW_ATE_unsigned = 7
-DW_ATE_unsigned_char = 8
-</pre>
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<h4>
- <a name="format_derived_type">Derived type descriptors</a>
-</h4>
-
-<div>
-
-<div class="doc_code">
-<pre>
-!5 = metadata !{
- i32, ;; Tag (see below)
- metadata, ;; Reference to context
- metadata, ;; Name (may be "" for anonymous types)
- metadata, ;; Reference to file where defined (may be NULL)
- i32, ;; Line number where defined (may be 0)
- i64, ;; Size in bits
- i64, ;; Alignment in bits
- i64, ;; Offset in bits
- i32, ;; Flags to encode attributes, e.g. private
- metadata, ;; Reference to type derived from
- metadata, ;; (optional) Name of the Objective C property associated with
- ;; an Objective-C ivar
- metadata, ;; (optional) Name of the Objective C property getter selector.
- metadata, ;; (optional) Name of the Objective C property setter selector.
- i32 ;; (optional) Objective C property attributes.
-}
-</pre>
-</div>
-
-<p>These descriptors are used to define types derived from other types. The
-value of the tag varies depending on the meaning. The following are possible
-tag values:</p>
-
-<div class="doc_code">
-<pre>
-DW_TAG_formal_parameter = 5
-DW_TAG_member = 13
-DW_TAG_pointer_type = 15
-DW_TAG_reference_type = 16
-DW_TAG_typedef = 22
-DW_TAG_const_type = 38
-DW_TAG_volatile_type = 53
-DW_TAG_restrict_type = 55
-</pre>
-</div>
-
-<p><tt>DW_TAG_member</tt> is used to define a member of
- a <a href="#format_composite_type">composite type</a>
- or <a href="#format_subprograms">subprogram</a>. The type of the member is
- the <a href="#format_derived_type">derived
- type</a>. <tt>DW_TAG_formal_parameter</tt> is used to define a member which
- is a formal argument of a subprogram.</p>
-
-<p><tt>DW_TAG_typedef</tt> is used to provide a name for the derived type.</p>
-
-<p><tt>DW_TAG_pointer_type</tt>, <tt>DW_TAG_reference_type</tt>,
- <tt>DW_TAG_const_type</tt>, <tt>DW_TAG_volatile_type</tt> and
- <tt>DW_TAG_restrict_type</tt> are used to qualify
- the <a href="#format_derived_type">derived type</a>. </p>
-
-<p><a href="#format_derived_type">Derived type</a> location can be determined
- from the context and line number. The size, alignment and offset are
- expressed in bits and can be 64-bit values. The alignment is used to round
- the offset when embedded in a <a href="#format_composite_type">composite
- type</a> (for example, to keep doubles on 64-bit boundaries). The offset is
- the bit offset if embedded in a <a href="#format_composite_type">composite
- type</a>.</p>
-
-<p>Note that the <tt>void *</tt> type is expressed as a type derived from NULL.
-</p>
-
-</div>
-
-<!-- ======================================================================= -->
-<h4>
- <a name="format_composite_type">Composite type descriptors</a>
-</h4>
-
-<div>
-
-<div class="doc_code">
-<pre>
-!6 = metadata !{
- i32, ;; Tag (see below)
- metadata, ;; Reference to context
- metadata, ;; Name (may be "" for anonymous types)
- metadata, ;; Reference to file where defined (may be NULL)
- i32, ;; Line number where defined (may be 0)
- i64, ;; Size in bits
- i64, ;; Alignment in bits
- i64, ;; Offset in bits
- i32, ;; Flags
- metadata, ;; Reference to type derived from
- metadata, ;; Reference to array of member descriptors
- i32 ;; Runtime languages
-}
-</pre>
-</div>
-
-<p>These descriptors are used to define types that are composed of 0 or more
-elements. The value of the tag varies depending on the meaning. The following
-are possible tag values:</p>
-
-<div class="doc_code">
-<pre>
-DW_TAG_array_type = 1
-DW_TAG_enumeration_type = 4
-DW_TAG_structure_type = 19
-DW_TAG_union_type = 23
-DW_TAG_vector_type = 259
-DW_TAG_subroutine_type = 21
-DW_TAG_inheritance = 28
-</pre>
-</div>
-
-<p>The vector flag indicates that an array type is a native packed vector.</p>
-
-<p>The members of array types (tag = <tt>DW_TAG_array_type</tt>) or vector types
- (tag = <tt>DW_TAG_vector_type</tt>) are <a href="#format_subrange">subrange
- descriptors</a>, each representing the range of subscripts at that level of
- indexing.</p>
-
-<p>The members of enumeration types (tag = <tt>DW_TAG_enumeration_type</tt>) are
- <a href="#format_enumeration">enumerator descriptors</a>, each representing
- the definition of an enumeration value for the set. All enumeration type
- descriptors are collected inside the named metadata
- <tt>!llvm.dbg.cu</tt>.</p>
-
-<p>The members of structure (tag = <tt>DW_TAG_structure_type</tt>) or union (tag
- = <tt>DW_TAG_union_type</tt>) types are any one of
- the <a href="#format_basic_type">basic</a>,
- <a href="#format_derived_type">derived</a>
- or <a href="#format_composite_type">composite</a> type descriptors, each
- representing a field member of the structure or union.</p>
-
-<p>For C++ classes (tag = <tt>DW_TAG_structure_type</tt>), member descriptors
- provide information about base classes, static members and member
- functions. If a member is a <a href="#format_derived_type">derived type
- descriptor</a> and has a tag of <tt>DW_TAG_inheritance</tt>, then the type
- represents a base class. If the member is
- a <a href="#format_global_variables">global variable descriptor</a> then it
- represents a static member. And, if the member is
- a <a href="#format_subprograms">subprogram descriptor</a> then it represents
- a member function. For static members and member
- functions, <tt>getName()</tt> returns the member's link name or the C++ mangled
- name. <tt>getDisplayName()</tt> returns the simplified version of the name.</p>
-
-<p>The first member of subroutine (tag = <tt>DW_TAG_subroutine_type</tt>) type
- elements is the return type for the subroutine. The remaining elements are
- the formal arguments to the subroutine.</p>
-
-<p><a href="#format_composite_type">Composite type</a> location can be
- determined from the context and line number. The size, alignment and
- offset are expressed in bits and can be 64-bit values. The alignment is used
- to round the offset when embedded in
- a <a href="#format_composite_type">composite type</a> (for example, to keep
- doubles on 64-bit boundaries). The offset is the bit offset if embedded
- in a <a href="#format_composite_type">composite type</a>.</p>
-
-</div>
-
-<!-- ======================================================================= -->
-<h4>
- <a name="format_subrange">Subrange descriptors</a>
-</h4>
-
-<div>
-
-<div class="doc_code">
-<pre>
-!42 = metadata !{
- i32, ;; Tag = 33 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> (DW_TAG_subrange_type)
- i64, ;; Low value
- i64 ;; High value
-}
-</pre>
-</div>
-
-<p>These descriptors are used to define ranges of array subscripts for an array
- <a href="#format_composite_type">composite type</a>. The low value defines
- the lower bound, typically zero for C/C++. The high value is the upper
- bound. Values are 64 bit. High - low + 1 is the size of the array. If low
- &gt; high, the array bounds are not included in generated debugging information.
-</p>
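-
-<p>The size rule above can be written as a short Python sketch. The function
- name is hypothetical, and returning <tt>None</tt> is only a convention chosen
- here for the case where low is greater than high and the bounds are left out
- of the debugging information.</p>

```python
def subrange_element_count(low, high):
    """Return high - low + 1, or None when low > high (bounds omitted)."""
    if low > high:
        return None
    return high - low + 1
```

-<p>A C array declared as <tt>int a[10]</tt> would, under this reading, have
- low = 0 and high = 9, giving 10 elements.</p>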
-
-</div>
-
-<!-- ======================================================================= -->
-<h4>
- <a name="format_enumeration">Enumerator descriptors</a>
-</h4>
-
-<div>
-
-<div class="doc_code">
-<pre>
-!6 = metadata !{
- i32, ;; Tag = 40 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
- ;; (DW_TAG_enumerator)
- metadata, ;; Name
- i64 ;; Value
-}
-</pre>
-</div>
-
-<p>These descriptors are used to define members of an
- enumeration <a href="#format_composite_type">composite type</a>. Each one
- associates a name with its value.</p>
-
-</div>
-
-<!-- ======================================================================= -->
-<h4>
- <a name="format_variables">Local variables</a>
-</h4>
-
-<div>
-
-<div class="doc_code">
-<pre>
-!7 = metadata !{
- i32, ;; Tag (see below)
- metadata, ;; Context
- metadata, ;; Name
- metadata, ;; Reference to file where defined
- i32, ;; 24 bit - Line number where defined
- ;; 8 bit - Argument number. 1 indicates 1st argument.
- metadata, ;; Type descriptor
- i32, ;; flags
- metadata ;; (optional) Reference to inline location
-}
-</pre>
-</div>
-
-<p>These descriptors are used to define variables local to a subprogram. The
- value of the tag depends on the usage of the variable:</p>
-
-<div class="doc_code">
-<pre>
-DW_TAG_auto_variable = 256
-DW_TAG_arg_variable = 257
-DW_TAG_return_variable = 258
-</pre>
-</div>
-
-<p>An auto variable is any variable declared in the body of the function. An
- argument variable is any variable that appears as a formal argument to the
- function. A return variable is used to track the result of a function and
- has no source correspondent.</p>
-
-<p>The context is either the subprogram or block where the variable is defined.
- Name is the source variable name. Context and line indicate where the
- variable was defined. Type descriptor defines the declared type of the
- variable.</p>
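-
-<p>The descriptor layout above packs the line number (24 bits) and the argument
- number (8 bits) into a single <tt>i32</tt>. The Python sketch below shows one
- way to pack and unpack such a field. It assumes the line number occupies the
- low 24 bits and the argument number the high 8 bits; the layout above gives
- only the widths, and the function names are hypothetical.</p>

```python
LINE_BITS = 24  # width of the line number, from the descriptor layout
ARG_BITS = 8    # width of the argument number, from the descriptor layout


def pack_line_and_arg(line, arg_no):
    """Pack line (assumed low 24 bits) and arg_no (assumed high 8 bits)."""
    if not 0 <= line < (1 << LINE_BITS):
        raise ValueError("line number does not fit in 24 bits")
    if not 0 <= arg_no < (1 << ARG_BITS):
        raise ValueError("argument number does not fit in 8 bits")
    return line | (arg_no << LINE_BITS)


def unpack_line_and_arg(field):
    """Inverse of pack_line_and_arg: return (line, arg_no)."""
    line = field & ((1 << LINE_BITS) - 1)
    arg_no = (field >> LINE_BITS) & ((1 << ARG_BITS) - 1)
    return line, arg_no
```

-<p>With this layout, the first argument of a function declared on line 10
- would be encoded as 10 | (1 &lt;&lt; 24), and an auto variable would use an
- argument number of 0.</p>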
-
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
- <a name="format_common_intrinsics">Debugger intrinsic functions</a>
-</h3>
-
-<div>
-
-<p>LLVM uses several intrinsic functions (name prefixed with "llvm.dbg") to
- provide debug information at various points in generated code.</p>
-
-<!-- ======================================================================= -->
-<h4>
- <a name="format_common_declare">llvm.dbg.declare</a>
-</h4>
-
-<div>
-<pre>
- void %<a href="#format_common_declare">llvm.dbg.declare</a>(metadata, metadata)
-</pre>
-
-<p>This intrinsic provides information about a local element (e.g., variable). The
- first argument is metadata holding the alloca for the variable. The
- second argument is metadata containing a description of the variable.</p>
-</div>
-
-<!-- ======================================================================= -->
-<h4>
- <a name="format_common_value">llvm.dbg.value</a>
-</h4>
-
-<div>
-<pre>
- void %<a href="#format_common_value">llvm.dbg.value</a>(metadata, i64, metadata)
-</pre>
-
-<p>This intrinsic provides information when a user source variable is set to a
- new value. The first argument is the new value (wrapped as metadata). The
- second argument is the offset in the user source variable where the new value
- is written. The third argument is metadata containing a description of the
- user source variable.</p>
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
- <a name="format_common_lifetime">Object lifetimes and scoping</a>
-</h3>
-
-<div>
-<p>In many languages, the local variables in functions can have their lifetimes
- or scopes limited to a subset of a function. In the C family of languages,
- for example, variables are only live (readable and writable) within the
- source block that they are defined in. In functional languages, values are
- only readable after they have been defined. Though this is a very obvious
- concept, it is non-trivial to model in LLVM, because it has no notion of
- scoping in this sense, and does not want to be tied to a language's scoping
- rules.</p>
-
-<p>In order to handle this, the LLVM debug format uses the metadata attached to
- LLVM instructions to encode line number and scoping information. Consider
- the following C fragment, for example:</p>
-
-<div class="doc_code">
-<pre>
-1. void foo() {
-2. int X = 21;
-3. int Y = 22;
-4. {
-5. int Z = 23;
-6. Z = X + Y;
-7. }
-8. X = Y;
-9. }
-</pre>
-</div>
-
-<p>Compiled to LLVM, this function would be represented like this:</p>
-
-<div class="doc_code">
-<pre>
-define void @foo() nounwind ssp {
-entry:
- %X = alloca i32, align 4 ; &lt;i32*&gt; [#uses=4]
- %Y = alloca i32, align 4 ; &lt;i32*&gt; [#uses=4]
- %Z = alloca i32, align 4 ; &lt;i32*&gt; [#uses=3]
- %0 = bitcast i32* %X to {}* ; &lt;{}*&gt; [#uses=1]
- call void @llvm.dbg.declare(metadata !{i32 * %X}, metadata !0), !dbg !7
- store i32 21, i32* %X, !dbg !8
- %1 = bitcast i32* %Y to {}* ; &lt;{}*&gt; [#uses=1]
- call void @llvm.dbg.declare(metadata !{i32 * %Y}, metadata !9), !dbg !10
- store i32 22, i32* %Y, !dbg !11
- %2 = bitcast i32* %Z to {}* ; &lt;{}*&gt; [#uses=1]
- call void @llvm.dbg.declare(metadata !{i32 * %Z}, metadata !12), !dbg !14
- store i32 23, i32* %Z, !dbg !15
- %tmp = load i32* %X, !dbg !16 ; &lt;i32&gt; [#uses=1]
- %tmp1 = load i32* %Y, !dbg !16 ; &lt;i32&gt; [#uses=1]
- %add = add nsw i32 %tmp, %tmp1, !dbg !16 ; &lt;i32&gt; [#uses=1]
- store i32 %add, i32* %Z, !dbg !16
- %tmp2 = load i32* %Y, !dbg !17 ; &lt;i32&gt; [#uses=1]
- store i32 %tmp2, i32* %X, !dbg !17
- ret void, !dbg !18
-}
-
-declare void @llvm.dbg.declare(metadata, metadata) nounwind readnone
-
-!0 = metadata !{i32 459008, metadata !1, metadata !"X",
- metadata !3, i32 2, metadata !6}; [ DW_TAG_auto_variable ]
-!1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ]
-!2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo", metadata !"foo",
- metadata !"foo", metadata !3, i32 1, metadata !4,
- i1 false, i1 true}; [DW_TAG_subprogram ]
-!3 = metadata !{i32 458769, i32 0, i32 12, metadata !"foo.c",
- metadata !"/private/tmp", metadata !"clang 1.1", i1 true,
- i1 false, metadata !"", i32 0}; [DW_TAG_compile_unit ]
-!4 = metadata !{i32 458773, metadata !3, metadata !"", null, i32 0, i64 0, i64 0,
- i64 0, i32 0, null, metadata !5, i32 0}; [DW_TAG_subroutine_type ]
-!5 = metadata !{null}
-!6 = metadata !{i32 458788, metadata !3, metadata !"int", metadata !3, i32 0,
- i64 32, i64 32, i64 0, i32 0, i32 5}; [DW_TAG_base_type ]
-!7 = metadata !{i32 2, i32 7, metadata !1, null}
-!8 = metadata !{i32 2, i32 3, metadata !1, null}
-!9 = metadata !{i32 459008, metadata !1, metadata !"Y", metadata !3, i32 3,
- metadata !6}; [ DW_TAG_auto_variable ]
-!10 = metadata !{i32 3, i32 7, metadata !1, null}
-!11 = metadata !{i32 3, i32 3, metadata !1, null}
-!12 = metadata !{i32 459008, metadata !13, metadata !"Z", metadata !3, i32 5,
- metadata !6}; [ DW_TAG_auto_variable ]
-!13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ]
-!14 = metadata !{i32 5, i32 9, metadata !13, null}
-!15 = metadata !{i32 5, i32 5, metadata !13, null}
-!16 = metadata !{i32 6, i32 5, metadata !13, null}
-!17 = metadata !{i32 8, i32 3, metadata !1, null}
-!18 = metadata !{i32 9, i32 1, metadata !2, null}
-</pre>
-</div>
-
-<p>This example illustrates a few important details about LLVM debugging
- information. In particular, it shows how the <tt>llvm.dbg.declare</tt>
- intrinsic and location information, which are attached to an instruction,
- are applied together to allow a debugger to analyze the relationship between
- statements, variable definitions, and the code used to implement the
- function.</p>
-
-<div class="doc_code">
-<pre>
-call void @llvm.dbg.declare(metadata !{i32 * %X}, metadata !0), !dbg !7
-</pre>
-</div>
-
-<p>The first intrinsic
- <tt>%<a href="#format_common_declare">llvm.dbg.declare</a></tt>
- encodes debugging information for the variable <tt>X</tt>. The metadata
- <tt>!dbg !7</tt> attached to the intrinsic provides scope information for the
- variable <tt>X</tt>.</p>
-
-<div class="doc_code">
-<pre>
-!7 = metadata !{i32 2, i32 7, metadata !1, null}
-!1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ]
-!2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo",
- metadata !"foo", metadata !"foo", metadata !3, i32 1,
- metadata !4, i1 false, i1 true}; [DW_TAG_subprogram ]
-</pre>
-</div>
-
-<p>Here <tt>!7</tt> is metadata providing location information. It has four
- fields: line number, column number, scope, and original scope. The original
- scope represents inline location if this instruction is inlined inside a
- caller, and is null otherwise. In this example, scope is encoded by
- <tt>!1</tt>. <tt>!1</tt> represents a lexical block inside the scope
- <tt>!2</tt>, where <tt>!2</tt> is a
- <a href="#format_subprograms">subprogram descriptor</a>. This way the
- location information attached to the intrinsics indicates that the
- variable <tt>X</tt> is declared at line number 2 at a function level scope in
- function <tt>foo</tt>.</p>
-
-<p>Now let's take another example.</p>
-
-<div class="doc_code">
-<pre>
-call void @llvm.dbg.declare(metadata !{i32 * %Z}, metadata !12), !dbg !14
-</pre>
-</div>
-
-<p>The second intrinsic
- <tt>%<a href="#format_common_declare">llvm.dbg.declare</a></tt>
- encodes debugging information for variable <tt>Z</tt>. The metadata
- <tt>!dbg !14</tt> attached to the intrinsic provides scope information for
- the variable <tt>Z</tt>.</p>
-
-<div class="doc_code">
-<pre>
-!13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ]
-!14 = metadata !{i32 5, i32 9, metadata !13, null}
-</pre>
-</div>
-
-<p>Here <tt>!14</tt> indicates that <tt>Z</tt> is declared at line number 5 and
- column number 9 inside of lexical scope <tt>!13</tt>. The lexical scope
- itself resides inside of lexical scope <tt>!1</tt> described above.</p>
-
-<p>The scope information attached to each instruction provides a
- straightforward way to find instructions covered by a scope.</p>
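-
-<p>To make the idea of "instructions covered by a scope" concrete, here is a
- minimal sketch that models scopes as nodes with a parent link and checks
- whether a location is nested inside a given scope. The <tt>Scope</tt> and
- <tt>Location</tt> structs and <tt>isCoveredBy</tt> are hypothetical
- stand-ins for the metadata nodes shown above, not LLVM classes.</p>
-

```cpp
#include <cassert>

// Hypothetical stand-in for a subprogram or lexical block descriptor.
struct Scope {
  const char *kind;     // "subprogram" or "lexical_block"
  const Scope *parent;  // enclosing scope, nullptr for the outermost one
};

// Hypothetical stand-in for location metadata: line, column, scope.
struct Location {
  unsigned line;
  unsigned column;
  const Scope *scope;
};

// Returns true if the location's scope is `outer` or is nested inside it.
inline bool isCoveredBy(const Location &loc, const Scope &outer) {
  for (const Scope *s = loc.scope; s != nullptr; s = s->parent)
    if (s == &outer)
      return true;
  return false;
}
```

-<p>Modelling <tt>!2</tt>, <tt>!1</tt> and <tt>!13</tt> this way, the location
- <tt>!16</tt> (line 6) is covered by all three scopes, while <tt>!17</tt>
- (line 8) is covered by <tt>!1</tt> and <tt>!2</tt> only.</p>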
-
-</div>
-
-</div>
-
-<!-- *********************************************************************** -->
-<h2>
- <a name="ccxx_frontend">C/C++ front-end specific debug information</a>
-</h2>
-<!-- *********************************************************************** -->
-
-<div>
-
-<p>The C and C++ front-ends represent information about the program in a format
- that is effectively identical
- to <a href="http://www.eagercon.com/dwarf/dwarf3std.htm">DWARF 3.0</a> in
- terms of information content. This allows code generators to trivially
- support native debuggers by generating standard DWARF information, and
- contains enough information for non-DWARF targets to translate it as
- needed.</p>
-
-<p>This section describes the forms used to represent C and C++ programs. Other
- languages could pattern themselves after this (which itself is tuned to
- representing programs in the same way that DWARF 3 does), or they could
- choose to provide completely different forms if they don't fit into the DWARF
- model. As support for debugging information gets added to the various LLVM
- source-language front-ends, the information used should be documented
- here.</p>
-
-<p>The following sections provide examples of various C/C++ constructs and the
- debug information that would best describe those constructs.</p>
-
-<!-- ======================================================================= -->
-<h3>
- <a name="ccxx_compile_units">C/C++ source file information</a>
-</h3>
-
-<div>
-
-<p>Given the source files <tt>MySource.cpp</tt> and <tt>MyHeader.h</tt> located
- in the directory <tt>/Users/mine/sources</tt>, and the following code:</p>
-
-<div class="doc_code">
-<pre>
-#include "MyHeader.h"
-
-int main(int argc, char *argv[]) {
- return 0;
-}
-</pre>
-</div>
-
-<p>a C/C++ front-end would generate the following descriptors:</p>
-
-<div class="doc_code">
-<pre>
-...
-;;
-;; Define the compile unit for the main source file "/Users/mine/sources/MySource.cpp".
-;;
-!2 = metadata !{
- i32 524305, ;; Tag
- i32 0, ;; Unused
- i32 4, ;; Language Id
- metadata !"MySource.cpp",
- metadata !"/Users/mine/sources",
- metadata !"4.2.1 (Based on Apple Inc. build 5649) (LLVM build 00)",
- i1 true, ;; Main Compile Unit
- i1 false, ;; Optimized compile unit
- metadata !"", ;; Compiler flags
- i32 0} ;; Runtime version
-
-;;
-;; Define the file for the file "/Users/mine/sources/MySource.cpp".
-;;
-!1 = metadata !{
- i32 524329, ;; Tag
- metadata !"MySource.cpp",
- metadata !"/Users/mine/sources",
- metadata !2 ;; Compile unit
-}
-
-;;
-;; Define the file for the file "/Users/mine/sources/MyHeader.h"
-;;
-!3 = metadata !{
- i32 524329, ;; Tag
- metadata !"MyHeader.h",
- metadata !"/Users/mine/sources",
- metadata !2 ;; Compile unit
-}
-
-...
-</pre>
-</div>
-
-<p>llvm::Instruction provides easy access to metadata attached to an
-instruction. One can extract line number information encoded in LLVM IR
-using <tt>Instruction::getMetadata()</tt> and
-<tt>DILocation::getLineNumber()</tt>.</p>
-<pre>
- if (MDNode *N = I->getMetadata("dbg")) { // Here I is an LLVM instruction
- DILocation Loc(N); // DILocation is in DebugInfo.h
- unsigned Line = Loc.getLineNumber();
- StringRef File = Loc.getFilename();
- StringRef Dir = Loc.getDirectory();
- }
-</pre>
-</div>
-
-<!-- ======================================================================= -->
-<h3>
- <a name="ccxx_global_variable">C/C++ global variable information</a>
-</h3>
-
-<div>
-
-<p>Given an integer global variable declared as follows:</p>
-
-<div class="doc_code">
-<pre>
-int MyGlobal = 100;
-</pre>
-</div>
-
-<p>a C/C++ front-end would generate the following descriptors:</p>
-
-<div class="doc_code">
-<pre>
-;;
-;; Define the global itself.
-;;
-@MyGlobal = global i32 100
-...
-;;
-;; List of compile units with debug info
-;;
-!llvm.dbg.cu = !{!0}
-
-;; Define the compile unit.
-!0 = metadata !{
- i32 786449, ;; Tag
- i32 0, ;; Context
- i32 4, ;; Language
- metadata !"foo.cpp", ;; File
- metadata !"/Volumes/Data/tmp", ;; Directory
- metadata !"clang version 3.1 ", ;; Producer
- i1 true, ;; Deprecated field
- i1 false, ;; "isOptimized"?
- metadata !"", ;; Flags
- i32 0, ;; Runtime Version
- metadata !1, ;; Enum Types
- metadata !1, ;; Retained Types
- metadata !1, ;; Subprograms
- metadata !3 ;; Global Variables
-} ; [ DW_TAG_compile_unit ]
-
-;; The Array of Global Variables
-!3 = metadata !{
- metadata !4
-}
-
-!4 = metadata !{
- metadata !5
-}
-
-;;
-;; Define the global variable itself.
-;;
-!5 = metadata !{
- i32 786484, ;; Tag
- i32 0, ;; Unused
- null, ;; Unused
- metadata !"MyGlobal", ;; Name
- metadata !"MyGlobal", ;; Display Name
- metadata !"", ;; Linkage Name
- metadata !6, ;; File
- i32 1, ;; Line
- metadata !7, ;; Type
- i32 0, ;; IsLocalToUnit
- i32 1, ;; IsDefinition
- i32* @MyGlobal ;; LLVM-IR Value
-} ; [ DW_TAG_variable ]
-
-;;
-;; Define the file
-;;
-!6 = metadata !{
- i32 786473, ;; Tag
- metadata !"foo.cpp", ;; File
- metadata !"/Volumes/Data/tmp", ;; Directory
- null ;; Unused
-} ; [ DW_TAG_file_type ]
-
-;;
-;; Define the type
-;;
-!7 = metadata !{
- i32 786468, ;; Tag
- null, ;; Unused
- metadata !"int", ;; Name
- null, ;; Unused
- i32 0, ;; Line
- i64 32, ;; Size in Bits
- i64 32, ;; Align in Bits
- i64 0, ;; Offset
- i32 0, ;; Flags
- i32 5 ;; Encoding
-} ; [ DW_TAG_base_type ]
-
-</pre>
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
- <a name="ccxx_subprogram">C/C++ function information</a>
-</h3>
-
-<div>
-
-<p>Given a function declared as follows:</p>
-
-<div class="doc_code">
-<pre>
-int main(int argc, char *argv[]) {
- return 0;
-}
-</pre>
-</div>
-
-<p>a C/C++ front-end would generate the following descriptors:</p>
-
-<div class="doc_code">
-<pre>
-;;
-;; Define the descriptor for the subprogram. Note that the low 16 bits of
-;; the tag (524334) are 46, which is the tag for subprograms
-;; (46 = DW_TAG_subprogram.)
-;;
-!6 = metadata !{
- i32 524334, ;; Tag
- i32 0, ;; Unused
- metadata !1, ;; Context
- metadata !"main", ;; Name
- metadata !"main", ;; Display name
- metadata !"main", ;; Linkage name
- metadata !1, ;; File
- i32 1, ;; Line number
- metadata !4, ;; Type
- i1 false, ;; Is local
- i1 true, ;; Is definition
- i32 0, ;; Virtuality attribute, e.g. pure virtual function
- i32 0, ;; Index into virtual table for C++ methods
- i32 0, ;; Type that holds virtual table.
- i32 0, ;; Flags
- i1 false, ;; True if this function is optimized
- i32 (i32, i8**)* @main, ;; Pointer to llvm::Function
- null ;; Function template parameters
-}
-;;
-;; Define the subprogram itself.
-;;
-define i32 @main(i32 %argc, i8** %argv) {
-...
-}
-</pre>
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
- <a name="ccxx_basic_types">C/C++ basic types</a>
-</h3>
-
-<div>
-
-<p>The following are the basic type descriptors for C/C++ core types:</p>
-
-<!-- ======================================================================= -->
-<h4>
- <a name="ccxx_basic_type_bool">bool</a>
-</h4>
-
-<div>
-
-<div class="doc_code">
-<pre>
-!2 = metadata !{
- i32 524324, ;; Tag
- metadata !1, ;; Context
- metadata !"bool", ;; Name
- metadata !1, ;; File
- i32 0, ;; Line number
- i64 8, ;; Size in Bits
- i64 8, ;; Align in Bits
- i64 0, ;; Offset in Bits
- i32 0, ;; Flags
- i32 2 ;; Encoding
-}
-</pre>
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<h4>
- <a name="ccxx_basic_char">char</a>
-</h4>
-
-<div>
-
-<div class="doc_code">
-<pre>
-!2 = metadata !{
- i32 524324, ;; Tag
- metadata !1, ;; Context
- metadata !"char", ;; Name
- metadata !1, ;; File
- i32 0, ;; Line number
- i64 8, ;; Size in Bits
- i64 8, ;; Align in Bits
- i64 0, ;; Offset in Bits
- i32 0, ;; Flags
- i32 6 ;; Encoding
-}
-</pre>
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<h4>
- <a name="ccxx_basic_unsigned_char">unsigned char</a>
-</h4>
-
-<div>
-
-<div class="doc_code">
-<pre>
-!2 = metadata !{
- i32 524324, ;; Tag
- metadata !1, ;; Context
- metadata !"unsigned char",
- metadata !1, ;; File
- i32 0, ;; Line number
- i64 8, ;; Size in Bits
- i64 8, ;; Align in Bits
- i64 0, ;; Offset in Bits
- i32 0, ;; Flags
- i32 8 ;; Encoding
-}
-</pre>
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<h4>
- <a name="ccxx_basic_short">short</a>
-</h4>
-
-<div>
-
-<div class="doc_code">
-<pre>
-!2 = metadata !{
- i32 524324, ;; Tag
- metadata !1, ;; Context
- metadata !"short int",
- metadata !1, ;; File
- i32 0, ;; Line number
- i64 16, ;; Size in Bits
- i64 16, ;; Align in Bits
- i64 0, ;; Offset in Bits
- i32 0, ;; Flags
- i32 5 ;; Encoding
-}
-</pre>
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<h4>
- <a name="ccxx_basic_unsigned_short">unsigned short</a>
-</h4>
-
-<div>
-
-<div class="doc_code">
-<pre>
-!2 = metadata !{
- i32 524324, ;; Tag
- metadata !1, ;; Context
- metadata !"short unsigned int",
- metadata !1, ;; File
- i32 0, ;; Line number
- i64 16, ;; Size in Bits
- i64 16, ;; Align in Bits
- i64 0, ;; Offset in Bits
- i32 0, ;; Flags
- i32 7 ;; Encoding
-}
-</pre>
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<h4>
- <a name="ccxx_basic_int">int</a>
-</h4>
-
-<div>
-
-<div class="doc_code">
-<pre>
-!2 = metadata !{
- i32 524324, ;; Tag
- metadata !1, ;; Context
- metadata !"int", ;; Name
- metadata !1, ;; File
- i32 0, ;; Line number
- i64 32, ;; Size in Bits
- i64 32, ;; Align in Bits
- i64 0, ;; Offset in Bits
- i32 0, ;; Flags
- i32 5 ;; Encoding
-}
-</pre></div>
-
-</div>
-
-<!-- ======================================================================= -->
-<h4>
- <a name="ccxx_basic_unsigned_int">unsigned int</a>
-</h4>
-
-<div>
-
-<div class="doc_code">
-<pre>
-!2 = metadata !{
- i32 524324, ;; Tag
- metadata !1, ;; Context
- metadata !"unsigned int",
- metadata !1, ;; File
- i32 0, ;; Line number
- i64 32, ;; Size in Bits
- i64 32, ;; Align in Bits
- i64 0, ;; Offset in Bits
- i32 0, ;; Flags
- i32 7 ;; Encoding
-}
-</pre>
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<h4>
- <a name="ccxx_basic_long_long">long long</a>
-</h4>
-
-<div>
-
-<div class="doc_code">
-<pre>
-!2 = metadata !{
- i32 524324, ;; Tag
- metadata !1, ;; Context
- metadata !"long long int",
- metadata !1, ;; File
- i32 0, ;; Line number
- i64 64, ;; Size in Bits
- i64 64, ;; Align in Bits
- i64 0, ;; Offset in Bits
- i32 0, ;; Flags
- i32 5 ;; Encoding
-}
-</pre>
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<h4>
- <a name="ccxx_basic_unsigned_long_long">unsigned long long</a>
-</h4>
-
-<div>
-
-<div class="doc_code">
-<pre>
-!2 = metadata !{
- i32 524324, ;; Tag
- metadata !1, ;; Context
- metadata !"long long unsigned int",
- metadata !1, ;; File
- i32 0, ;; Line number
- i64 64, ;; Size in Bits
- i64 64, ;; Align in Bits
- i64 0, ;; Offset in Bits
- i32 0, ;; Flags
- i32 7 ;; Encoding
-}
-</pre>
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<h4>
- <a name="ccxx_basic_float">float</a>
-</h4>
-
-<div>
-
-<div class="doc_code">
-<pre>
-!2 = metadata !{
- i32 524324, ;; Tag
- metadata !1, ;; Context
- metadata !"float",
- metadata !1, ;; File
- i32 0, ;; Line number
- i64 32, ;; Size in Bits
- i64 32, ;; Align in Bits
- i64 0, ;; Offset in Bits
- i32 0, ;; Flags
- i32 4 ;; Encoding
-}
-</pre>
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<h4>
- <a name="ccxx_basic_double">double</a>
-</h4>
-
-<div>
-
-<div class="doc_code">
-<pre>
-!2 = metadata !{
- i32 524324, ;; Tag
- metadata !1, ;; Context
- metadata !"double", ;; Name
- metadata !1, ;; File
- i32 0, ;; Line number
- i64 64, ;; Size in Bits
- i64 64, ;; Align in Bits
- i64 0, ;; Offset in Bits
- i32 0, ;; Flags
- i32 4 ;; Encoding
-}
-</pre>
-</div>
-
-</div>
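-
-<p>The last field of each basic type descriptor above is a DWARF base type
- encoding. As a reading aid, the sketch below maps the numeric values used
- in these descriptors to their standard <tt>DW_ATE_*</tt> names. Only the
- values that appear above are listed, and <tt>encodingName</tt> is a
- hypothetical helper.</p>
-

```cpp
#include <cassert>
#include <string>

// DWARF base type encodings that appear in the descriptors above.
enum BaseTypeEncoding : unsigned {
  DW_ATE_boolean = 2,
  DW_ATE_float = 4,
  DW_ATE_signed = 5,
  DW_ATE_signed_char = 6,
  DW_ATE_unsigned = 7,
  DW_ATE_unsigned_char = 8
};

// Map an encoding value to its DWARF name; anything not listed is "unknown".
inline std::string encodingName(unsigned encoding) {
  switch (encoding) {
  case DW_ATE_boolean:       return "DW_ATE_boolean";
  case DW_ATE_float:         return "DW_ATE_float";
  case DW_ATE_signed:        return "DW_ATE_signed";
  case DW_ATE_signed_char:   return "DW_ATE_signed_char";
  case DW_ATE_unsigned:      return "DW_ATE_unsigned";
  case DW_ATE_unsigned_char: return "DW_ATE_unsigned_char";
  default:                   return "unknown";
  }
}
```

-<p>For instance, <tt>int</tt>, <tt>short</tt> and <tt>long long</tt> all use
- encoding 5 (<tt>DW_ATE_signed</tt>) and differ only in their size and
- alignment fields.</p>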
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
- <a name="ccxx_derived_types">C/C++ derived types</a>
-</h3>
-
-<div>
-
-<p>Given the following as an example of a C/C++ derived type:</p>
-
-<div class="doc_code">
-<pre>
-typedef const int *IntPtr;
-</pre>
-</div>
-
-<p>a C/C++ front-end would generate the following descriptors:</p>
-
-<div class="doc_code">
-<pre>
-;;
-;; Define the typedef "IntPtr".
-;;
-!2 = metadata !{
- i32 524310, ;; Tag
- metadata !1, ;; Context
- metadata !"IntPtr", ;; Name
- metadata !3, ;; File
- i32 0, ;; Line number
- i64 0, ;; Size in bits
- i64 0, ;; Align in bits
- i64 0, ;; Offset in bits
- i32 0, ;; Flags
- metadata !4 ;; Derived From type
-}
-
-;;
-;; Define the pointer type.
-;;
-!4 = metadata !{
- i32 524303, ;; Tag
- metadata !1, ;; Context
- metadata !"", ;; Name
- metadata !1, ;; File
- i32 0, ;; Line number
- i64 64, ;; Size in bits
- i64 64, ;; Align in bits
- i64 0, ;; Offset in bits
- i32 0, ;; Flags
- metadata !5 ;; Derived From type
-}
-;;
-;; Define the const type.
-;;
-!5 = metadata !{
- i32 524326, ;; Tag
- metadata !1, ;; Context
- metadata !"", ;; Name
- metadata !1, ;; File
- i32 0, ;; Line number
- i64 32, ;; Size in bits
- i64 32, ;; Align in bits
- i64 0, ;; Offset in bits
- i32 0, ;; Flags
- metadata !6 ;; Derived From type
-}
-;;
-;; Define the int type.
-;;
-!6 = metadata !{
- i32 524324, ;; Tag
- metadata !1, ;; Context
- metadata !"int", ;; Name
- metadata !1, ;; File
- i32 0, ;; Line number
- i64 32, ;; Size in bits
- i64 32, ;; Align in bits
- i64 0, ;; Offset in bits
- i32 0, ;; Flags
- i32 5 ;; Encoding
-}
-</pre>
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
- <a name="ccxx_composite_types">C/C++ struct/union types</a>
-</h3>
-
-<div>
-
-<p>Given the following as an example of a C/C++ struct type:</p>
-
-<div class="doc_code">
-<pre>
-struct Color {
- unsigned Red;
- unsigned Green;
- unsigned Blue;
-};
-</pre>
-</div>
-
-<p>a C/C++ front-end would generate the following descriptors:</p>
-
-<div class="doc_code">
-<pre>
-;;
-;; Define basic type for unsigned int.
-;;
-!5 = metadata !{
- i32 524324, ;; Tag
- metadata !1, ;; Context
- metadata !"unsigned int",
- metadata !1, ;; File
- i32 0, ;; Line number
- i64 32, ;; Size in Bits
- i64 32, ;; Align in Bits
- i64 0, ;; Offset in Bits
- i32 0, ;; Flags
- i32 7 ;; Encoding
-}
-;;
-;; Define composite type for struct Color.
-;;
-!2 = metadata !{
- i32 524307, ;; Tag
- metadata !1, ;; Context
- metadata !"Color", ;; Name
- metadata !1, ;; Compile unit
- i32 1, ;; Line number
- i64 96, ;; Size in bits
- i64 32, ;; Align in bits
- i64 0, ;; Offset in bits
- i32 0, ;; Flags
- null, ;; Derived From
- metadata !3, ;; Elements
- i32 0 ;; Runtime Language
-}
-
-;;
-;; Define the Red field.
-;;
-!4 = metadata !{
- i32 524301, ;; Tag
- metadata !1, ;; Context
- metadata !"Red", ;; Name
- metadata !1, ;; File
- i32 2, ;; Line number
- i64 32, ;; Size in bits
- i64 32, ;; Align in bits
- i64 0, ;; Offset in bits
- i32 0, ;; Flags
- metadata !5 ;; Derived From type
-}
-
-;;
-;; Define the Green field.
-;;
-!6 = metadata !{
- i32 524301, ;; Tag
- metadata !1, ;; Context
- metadata !"Green", ;; Name
- metadata !1, ;; File
- i32 3, ;; Line number
- i64 32, ;; Size in bits
- i64 32, ;; Align in bits
- i64 32, ;; Offset in bits
- i32 0, ;; Flags
- metadata !5 ;; Derived From type
-}
-
-;;
-;; Define the Blue field.
-;;
-!7 = metadata !{
- i32 524301, ;; Tag
- metadata !1, ;; Context
- metadata !"Blue", ;; Name
- metadata !1, ;; File
- i32 4, ;; Line number
- i64 32, ;; Size in bits
- i64 32, ;; Align in bits
- i64 64, ;; Offset in bits
- i32 0, ;; Flags
- metadata !5 ;; Derived From type
-}
-
-;;
-;; Define the array of fields used by the composite type Color.
-;;
-!3 = metadata !{metadata !4, metadata !6, metadata !7}
-</pre>
-</div>
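-
-<p>The size, alignment and offset fields in the member descriptors above
- follow directly from the layout of <tt>struct Color</tt>. The sketch below
- recomputes them with <tt>sizeof</tt>, <tt>alignof</tt> and
- <tt>offsetof</tt>. It assumes a target where <tt>unsigned</tt> is 32 bits
- wide, as in the descriptors; <tt>MemberLayout</tt>, <tt>toBits</tt> and
- <tt>colorMember</tt> are hypothetical names.</p>
-

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

struct Color {
  unsigned Red;
  unsigned Green;
  unsigned Blue;
};

// Hypothetical record mirroring the size/align/offset fields of a member
// descriptor. All values are in bits, as in the metadata.
struct MemberLayout {
  const char *name;
  std::size_t sizeInBits;
  std::size_t alignInBits;
  std::size_t offsetInBits;
};

// Convert a byte count to a bit count.
constexpr std::size_t toBits(std::size_t bytes) { return bytes * CHAR_BIT; }

// Return the layout of the index-th member of Color (0 = Red, 2 = Blue).
inline MemberLayout colorMember(std::size_t index) {
  assert(index < 3 && "Color has exactly three members");
  static const MemberLayout members[] = {
      {"Red", toBits(sizeof(unsigned)), toBits(alignof(unsigned)),
       toBits(offsetof(Color, Red))},
      {"Green", toBits(sizeof(unsigned)), toBits(alignof(unsigned)),
       toBits(offsetof(Color, Green))},
      {"Blue", toBits(sizeof(unsigned)), toBits(alignof(unsigned)),
       toBits(offsetof(Color, Blue))},
  };
  return members[index];
}
```

-<p>On such a target the offsets come out as 0, 32 and 64 bits and
- <tt>toBits(sizeof(Color))</tt> is 96, matching the composite type
- descriptor.</p>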
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
- <a name="ccxx_enumeration_types">C/C++ enumeration types</a>
-</h3>
-
-<div>
-
-<p>Given the following as an example of a C/C++ enumeration type:</p>
-
-<div class="doc_code">
-<pre>
-enum Trees {
- Spruce = 100,
- Oak = 200,
- Maple = 300
-};
-</pre>
-</div>
-
-<p>a C/C++ front-end would generate the following descriptors:</p>
-
-<div class="doc_code">
-<pre>
-;;
-;; Define composite type for enum Trees
-;;
-!2 = metadata !{
- i32 524292, ;; Tag
- metadata !1, ;; Context
- metadata !"Trees", ;; Name
- metadata !1, ;; File
- i32 1, ;; Line number
- i64 32, ;; Size in bits
- i64 32, ;; Align in bits
- i64 0, ;; Offset in bits
- i32 0, ;; Flags
- null, ;; Derived From type
- metadata !3, ;; Elements
- i32 0 ;; Runtime language
-}
-
-;;
-;; Define the array of enumerators used by composite type Trees.
-;;
-!3 = metadata !{metadata !4, metadata !5, metadata !6}
-
-;;
-;; Define Spruce enumerator.
-;;
-!4 = metadata !{i32 524328, metadata !"Spruce", i64 100}
-
-;;
-;; Define Oak enumerator.
-;;
-!5 = metadata !{i32 524328, metadata !"Oak", i64 200}
-
-;;
-;; Define Maple enumerator.
-;;
-!6 = metadata !{i32 524328, metadata !"Maple", i64 300}
-
-</pre>
-</div>
-
-</div>
-
-</div>
-
-
-<!-- *********************************************************************** -->
-<h2>
- <a name="llvmdwarfextension">Debugging information format</a>
-</h2>
-<!-- *********************************************************************** -->
-<div>
-<!-- ======================================================================= -->
-<h3>
- <a name="objcproperty">Debugging Information Extension for Objective C Properties</a>
-</h3>
-<div>
-<!-- *********************************************************************** -->
-<h4>
- <a name="objcpropertyintroduction">Introduction</a>
-</h4>
-<!-- *********************************************************************** -->
-
-<div>
-<p>Objective C provides a simpler way to declare and define accessor methods
-using declared properties. The language provides features to declare a
-property and to let the compiler synthesize accessor methods.
-</p>
-
-<p>The debugger lets developers inspect Objective C interfaces and their
-instance variables and class variables. However, the debugger does not know
-anything about the properties defined in Objective C interfaces. The debugger
-consumes information generated by the compiler in DWARF format. The format does
-not support encoding of Objective C properties. This proposal describes DWARF
-extensions to encode Objective C properties, which the debugger can use to let
-developers inspect Objective C properties.
-</p>
-
-</div>
-
-
-<!-- *********************************************************************** -->
-<h4>
- <a name="objcpropertyproposal">Proposal</a>
-</h4>
-<!-- *********************************************************************** -->
-
-<div>
-<p>Objective C properties exist separately from class members. A property
-can be defined only by &quot;setter&quot; and &quot;getter&quot; selectors, and
-be calculated anew on each access. Or a property can just be a direct access
-to some declared ivar. Finally it can have an ivar &quot;automatically
-synthesized&quot; for it by the compiler, in which case the property can be
-referred to in user code directly using the standard C dereference syntax as
-well as through the property &quot;dot&quot; syntax, but there is no entry in
-the @interface declaration corresponding to this ivar.
-</p>
-<p>
-To facilitate debugging these properties, we will add a new DWARF TAG into the
-DW_TAG_structure_type definition for the class to hold the description of a
-given property, and a set of DWARF attributes that provide said description.
-The property tag will also contain the name and declared type of the property.
-</p>
-<p>
-If there is a related ivar, there will also be a DWARF property attribute placed
-in the DW_TAG_member DIE for that ivar referring back to the property TAG for
-that property. And in the case where the compiler synthesizes the ivar directly,
-the compiler is expected to generate a DW_TAG_member for that ivar (with the
-DW_AT_artificial set to 1), whose name will be the name used to access this
-ivar directly in code, and with the property attribute pointing back to the
-property it is backing.
-</p>
-<p>
-The following examples will serve as illustration for our discussion:
-</p>
-
-<div class="doc_code">
-<pre>
-@interface I1 {
- int n2;
-}
-
-@property int p1;
-@property int p2;
-@end
-
-@implementation I1
-@synthesize p1;
-@synthesize p2 = n2;
-@end
-</pre>
-</div>
-
-<p>
-This produces the following DWARF (this is a &quot;pseudo dwarfdump&quot; output):
-</p>
-<div class="doc_code">
-<pre>
-0x00000100: TAG_structure_type [7] *
- AT_APPLE_runtime_class( 0x10 )
- AT_name( "I1" )
- AT_decl_file( "Objc_Property.m" )
- AT_decl_line( 3 )
-
-0x00000110: TAG_APPLE_property
- AT_name ( "p1" )
- AT_type ( {0x00000150} ( int ) )
-
-0x00000120: TAG_APPLE_property
- AT_name ( "p2" )
- AT_type ( {0x00000150} ( int ) )
-
-0x00000130: TAG_member [8]
- AT_name( "_p1" )
- AT_APPLE_property ( {0x00000110} "p1" )
- AT_type( {0x00000150} ( int ) )
- AT_artificial ( 0x1 )
-
-0x00000140: TAG_member [8]
- AT_name( "n2" )
- AT_APPLE_property ( {0x00000120} "p2" )
- AT_type( {0x00000150} ( int ) )
-
-0x00000150: AT_type( ( int ) )
-</pre>
-</div>
-
-<p> Note, the current convention is that the name of the ivar for an
-auto-synthesized property is the name of the property from which it derives with
-an underscore prepended, as is shown in the example.
-But we actually don't need to know this convention, since we are given the name
-of the ivar directly.
-</p>
-
-<p>
-Also, it is common practice in ObjC to have different property declarations in
-the @interface and @implementation - e.g. to provide a read-only property in
-the interface, and a read-write property in the implementation. In that case,
-the compiler should emit whichever property declaration will be in force in the
-current translation unit.
-</p>
-
-<p> Developers can decorate a property with attributes which are encoded using
-DW_AT_APPLE_property_attribute.
-</p>
-
-<div class="doc_code">
-<pre>
-@property (readonly, nonatomic) int pr;
-</pre>
-</div>
-<p>
-Which produces a property tag:
-</p>
-<div class="doc_code">
-<pre>
-TAG_APPLE_property [8]
- AT_name( "pr" )
- AT_type ( {0x00000147} (int) )
- AT_APPLE_property_attribute (DW_APPLE_PROPERTY_readonly, DW_APPLE_PROPERTY_nonatomic)
-</pre>
-</div>
-
-<p> The setter and getter method names are attached to the property using
-DW_AT_APPLE_property_setter and DW_AT_APPLE_property_getter attributes.
-</p>
-<div class="doc_code">
-<pre>
-@interface I1
-@property (setter=myOwnP3Setter:) int p3;
--(void)myOwnP3Setter:(int)a;
-@end
-
-@implementation I1
-@synthesize p3;
--(void)myOwnP3Setter:(int)a{ }
-@end
-</pre>
-</div>
-
-<p>
-The DWARF for this would be:
-</p>
-<div class="doc_code">
-<pre>
-0x000003bd: TAG_structure_type [7] *
- AT_APPLE_runtime_class( 0x10 )
- AT_name( "I1" )
- AT_decl_file( "Objc_Property.m" )
- AT_decl_line( 3 )
-
-0x000003cd: TAG_APPLE_property
- AT_name ( "p3" )
- AT_APPLE_property_setter ( "myOwnP3Setter:" )
- AT_type( {0x00000147} ( int ) )
-
-0x000003f3: TAG_member [8]
- AT_name( "_p3" )
- AT_type ( {0x00000147} ( int ) )
- AT_APPLE_property ( {0x000003cd} )
- AT_artificial ( 0x1 )
-</pre>
-</div>
-
-</div>
-
-<!-- *********************************************************************** -->
-<h4>
- <a name="objcpropertynewtags">New DWARF Tags</a>
-</h4>
-<!-- *********************************************************************** -->
-
-<div>
-<table border="1" cellspacing="0">
- <col width="200">
- <col width="200">
- <tr>
- <th>TAG</th>
- <th>Value</th>
- </tr>
- <tr>
- <td>DW_TAG_APPLE_property</td>
- <td>0x4200</td>
- </tr>
-</table>
-
-</div>
-
-<!-- *********************************************************************** -->
-<h4>
- <a name="objcpropertynewattributes">New DWARF Attributes</a>
-</h4>
-<!-- *********************************************************************** -->
-
-<div>
-<table border="1" cellspacing="0">
- <col width="200">
- <col width="200">
- <col width="200">
- <tr>
- <th>Attribute</th>
- <th>Value</th>
- <th>Classes</th>
- </tr>
- <tr>
- <td>DW_AT_APPLE_property</td>
- <td>0x3fed</td>
- <td>Reference</td>
- </tr>
- <tr>
- <td>DW_AT_APPLE_property_getter</td>
- <td>0x3fe9</td>
- <td>String</td>
- </tr>
- <tr>
- <td>DW_AT_APPLE_property_setter</td>
- <td>0x3fea</td>
- <td>String</td>
- </tr>
- <tr>
- <td>DW_AT_APPLE_property_attribute</td>
- <td>0x3feb</td>
- <td>Constant</td>
- </tr>
-</table>
-
-</div>
-
-<!-- *********************************************************************** -->
-<h4>
- <a name="objcpropertynewconstants">New DWARF Constants</a>
-</h4>
-<!-- *********************************************************************** -->
-
-<div>
-<table border="1" cellspacing="0">
- <col width="200">
- <col width="200">
- <tr>
- <th>Name</th>
- <th>Value</th>
- </tr>
- <tr>
- <td>DW_AT_APPLE_PROPERTY_readonly</td>
- <td>0x1</td>
- </tr>
- <tr>
- <td>DW_AT_APPLE_PROPERTY_readwrite</td>
- <td>0x2</td>
- </tr>
- <tr>
- <td>DW_AT_APPLE_PROPERTY_assign</td>
- <td>0x4</td>
- </tr>
- <tr>
- <td>DW_AT_APPLE_PROPERTY_retain</td>
- <td>0x8</td>
- </tr>
- <tr>
- <td>DW_AT_APPLE_PROPERTY_copy</td>
- <td>0x10</td>
- </tr>
- <tr>
- <td>DW_AT_APPLE_PROPERTY_nonatomic</td>
- <td>0x20</td>
- </tr>
-</table>
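-<p>The values above are distinct powers of two, so a
- DW_AT_APPLE_property_attribute constant can presumably be read as a bit mask.
- The following is a minimal sketch in C of decoding such a mask; the helper
- name describe_property_attributes and its output format are illustrative
- assumptions and are not part of any LLVM API:</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Values copied from the constants table above. */
enum {
  DW_AT_APPLE_PROPERTY_readonly  = 0x1,
  DW_AT_APPLE_PROPERTY_readwrite = 0x2,
  DW_AT_APPLE_PROPERTY_assign    = 0x4,
  DW_AT_APPLE_PROPERTY_retain    = 0x8,
  DW_AT_APPLE_PROPERTY_copy      = 0x10,
  DW_AT_APPLE_PROPERTY_nonatomic = 0x20
};

/* Hypothetical helper: writes a comma separated list of the attribute names
   set in "mask" into "out" (always NUL terminated) and returns "out". */
static char *describe_property_attributes(uint32_t mask, char *out, size_t out_size)
{
  static const struct { uint32_t bit; const char *name; } table[] = {
    { DW_AT_APPLE_PROPERTY_readonly,  "readonly"  },
    { DW_AT_APPLE_PROPERTY_readwrite, "readwrite" },
    { DW_AT_APPLE_PROPERTY_assign,    "assign"    },
    { DW_AT_APPLE_PROPERTY_retain,    "retain"    },
    { DW_AT_APPLE_PROPERTY_copy,      "copy"      },
    { DW_AT_APPLE_PROPERTY_nonatomic, "nonatomic" }
  };
  size_t used = 0;
  if (out_size == 0)
    return out;
  out[0] = '\0';
  for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); ++i) {
    if ((mask & table[i].bit) == 0)
      continue;
    int n = snprintf(out + used, out_size - used, "%s%s",
                     used ? ", " : "", table[i].name);
    if (n < 0 || (size_t)n >= out_size - used)
      break; /* the buffer is full; stop with a truncated list */
    used += (size_t)n;
  }
  return out;
}
```

-<p>Under this reading, the mask 0x21 corresponds to the "readonly, nonatomic"
- property shown in the earlier example.</p>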
-
-</div>
-</div>
-
-<!-- ======================================================================= -->
-<h3>
- <a name="acceltable">Name Accelerator Tables</a>
-</h3>
-<!-- ======================================================================= -->
-<div>
-<!-- ======================================================================= -->
-<h4>
- <a name="acceltableintroduction">Introduction</a>
-</h4>
-<!-- ======================================================================= -->
-<div>
-<p>The .debug_pubnames and .debug_pubtypes formats are not what a debugger
- needs. The "pub" in the section name indicates that the entries in the
- table are publicly visible names only. This means no static or hidden
- functions show up in the .debug_pubnames. No static variables or private class
- variables are in the .debug_pubtypes. Many compilers add different things to
- these tables, so we can't rely upon the contents between gcc, icc, or clang.</p>
-
-<p>The typical query given by users tends not to match up with the contents of
- these tables. For example, the DWARF spec states that "In the case of the
- name of a function member or static data member of a C++ structure, class or
- union, the name presented in the .debug_pubnames section is not the simple
- name given by the DW_AT_name attribute of the referenced debugging information
- entry, but rather the fully qualified name of the data or function member."
- So the only names in these tables for complex C++ entries are fully
- qualified names. Debugger users tend not to enter their search strings as
- "a::b::c(int,const Foo&) const", but rather as "c", "b::c" , or "a::b::c". So
- the name entered in the name table must be demangled in order to chop it up
- appropriately and additional names must be manually entered into the table
- to make it effective as a name lookup table for debuggers to use.</p>
-
-<p>All debuggers currently ignore the .debug_pubnames table as a result of
- its inconsistent and useless public-only name content making it a waste of
- space in the object file. These tables, when they are written to disk, are
- not sorted in any way, leaving every debugger to do its own parsing
- and sorting. These tables also include an inlined copy of the string values
- in the table itself making the tables much larger than they need to be on
- disk, especially for large C++ programs.</p>
-
-<p>Can't we just fix the sections by adding all of the names we need to this
- table? No, because that is not what the tables are defined to contain and we
- won't know the difference between the old bad tables and the new good tables.
- At best we could make our own renamed sections that contain all of the data
- we need.</p>
-
-<p>These tables are also insufficient for what a debugger like LLDB needs.
- LLDB uses clang for its expression parsing where LLDB acts as a PCH. LLDB is
- then often asked to look for type "foo" or namespace "bar", or list items in
- namespace "baz". Namespaces are not included in the pubnames or pubtypes
- tables. Since clang asks a lot of questions when it is parsing an expression,
- we need to be very fast when looking up names, as it happens a lot. Having new
- accelerator tables that are optimized for very quick lookups will benefit
- this type of debugging experience greatly.</p>
-
-<p>We would like to generate name lookup tables that can be mapped into
- memory from disk, and used as is, with little or no up-front parsing. We would
- also like to control the exact content of these different tables so they
- contain exactly what we need. The Name Accelerator Tables were designed
- to fix these issues. In order to solve these issues we need to:</p>
-
-<ul>
- <li>Have a format that can be mapped into memory from disk and used as is</li>
- <li>Lookups should be very fast</li>
- <li>Extensible table format so these tables can be made by many producers</li>
- <li>Contain all of the names needed for typical lookups out of the box</li>
- <li>Strict rules for the contents of tables</li>
-</ul>
-
-<p>Table size is important and the accelerator table format should allow the
- reuse of strings from common string tables so the strings for the names are
- not duplicated. We also want to make sure the table is ready to be used as-is
- by simply mapping the table into memory with minimal header parsing.</p>
-
-<p>The name lookups need to be fast and optimized for the kinds of lookups
- that debuggers tend to do. Optimally we would like to touch as few parts of
- the mapped table as possible when doing a name lookup and be able to quickly
- find the name entry we are looking for, or discover there are no matches. In
- the case of debuggers we optimized for lookups that fail most of the time.</p>
-
-<p>Each table that is defined should have strict, documented rules on exactly
- what it contains, so that clients can rely on the content.</p>
-
-</div>
-
-<!-- ======================================================================= -->
-<h4>
- <a name="acceltablehashes">Hash Tables</a>
-</h4>
-<!-- ======================================================================= -->
-
-<div>
-<h5>Standard Hash Tables</h5>
-
-<p>Typical hash tables have a header, buckets, and each bucket points to the
-bucket contents:
-</p>
-
-<div class="doc_code">
-<pre>
-.------------.
-| HEADER |
-|------------|
-| BUCKETS |
-|------------|
-| DATA |
-`------------'
-</pre>
-</div>
-
-<p>The BUCKETS are an array of offsets to DATA for each hash:</p>
-
-<div class="doc_code">
-<pre>
-.------------.
-| 0x00001000 | BUCKETS[0]
-| 0x00002000 | BUCKETS[1]
-| 0x00002200 | BUCKETS[2]
-| 0x000034f0 | BUCKETS[3]
-| | ...
-| 0xXXXXXXXX | BUCKETS[n_buckets]
-'------------'
-</pre>
-</div>
-
-<p>So for bucket[3] in the example above, we have an offset into the table
- 0x000034f0 which points to a chain of entries for the bucket. Each entry in
- the chain must contain a next pointer, the full 32 bit hash value, the string
- itself, and the data for the current string value.</p>
-
-<div class="doc_code">
-<pre>
- .------------.
-0x000034f0: | 0x00003500 | next pointer
- | 0x12345678 | 32 bit hash
- | "erase" | string value
- | data[n] | HashData for this bucket
- |------------|
-0x00003500: | 0x00003550 | next pointer
- | 0x29273623 | 32 bit hash
- | "dump" | string value
- | data[n] | HashData for this bucket
- |------------|
-0x00003550: | 0x00000000 | next pointer
- | 0x82638293 | 32 bit hash
- | "main" | string value
- | data[n] | HashData for this bucket
- `------------'
-</pre>
-</div>
-
-<p>The problem with this layout for debuggers is that we need to optimize for
- the negative lookup case where the symbol we're searching for is not present.
- So if we were to look up "printf" in the table above, we would make a 32 bit
- hash for "printf", and it might match bucket[3]. We would need to go to the
- offset 0x000034f0 and start looking to see if our 32 bit hash matches. To do
- so, we need to read the next pointer, then read the hash, compare it, and skip
- to the next entry in the chain. Each time we are skipping many bytes in memory
- and touching new cache pages just to do the compare on the full 32 bit hash.
- All of these accesses then tell us that we didn't have a match.</p>
-
-<h5>Name Hash Tables</h5>
-
-<p>To solve the issues mentioned above we have structured the hash tables
- a bit differently: a header, buckets, an array of all unique 32 bit hash
- values, followed by an array of hash value data offsets, one for each hash
- value, then the data for all hash values:</p>
-
-<div class="doc_code">
-<pre>
-.-------------.
-| HEADER |
-|-------------|
-| BUCKETS |
-|-------------|
-| HASHES |
-|-------------|
-| OFFSETS |
-|-------------|
-| DATA |
-`-------------'
-</pre>
-</div>
-
-<p>The BUCKETS in the name tables are an index into the HASHES array. By
- making all of the full 32 bit hash values contiguous in memory, we allow
- ourselves to efficiently check for a match while touching as little
- memory as possible. Most often checking the 32 bit hash values is as far as
- the lookup goes. If it does match, it usually is a match with no collisions.
- So for a table with "n_buckets" buckets, and "n_hashes" unique 32 bit hash
- values, we can clarify the contents of the BUCKETS, HASHES and OFFSETS as:</p>
-
-<div class="doc_code">
-<pre>
-.-------------------------.
-| HEADER.magic | uint32_t
-| HEADER.version | uint16_t
-| HEADER.hash_function | uint16_t
-| HEADER.bucket_count | uint32_t
-| HEADER.hashes_count | uint32_t
-| HEADER.header_data_len | uint32_t
-| HEADER_DATA | HeaderData
-|-------------------------|
-| BUCKETS | uint32_t[bucket_count] // 32 bit hash indexes
-|-------------------------|
-| HASHES | uint32_t[hashes_count] // 32 bit hash values
-|-------------------------|
-| OFFSETS | uint32_t[hashes_count] // 32 bit offsets to hash value data
-|-------------------------|
-| ALL HASH DATA |
-`-------------------------'
-</pre>
-</div>
-
-<p>So taking the exact same data from the standard hash example above we end up
- with:</p>
-
-<div class="doc_code">
-<pre>
- .------------.
- | HEADER |
- |------------|
- | 0 | BUCKETS[0]
- | 2 | BUCKETS[1]
- | 5 | BUCKETS[2]
- | 6 | BUCKETS[3]
- | | ...
- | ... | BUCKETS[n_buckets]
- |------------|
- | 0x........ | HASHES[0]
- | 0x........ | HASHES[1]
- | 0x........ | HASHES[2]
- | 0x........ | HASHES[3]
- | 0x........ | HASHES[4]
- | 0x........ | HASHES[5]
- | 0x12345678 | HASHES[6] hash for BUCKETS[3]
- | 0x29273623 | HASHES[7] hash for BUCKETS[3]
- | 0x82638293 | HASHES[8] hash for BUCKETS[3]
- | 0x........ | HASHES[9]
- | 0x........ | HASHES[10]
- | 0x........ | HASHES[11]
- | 0x........ | HASHES[12]
- | 0x........ | HASHES[13]
- | 0x........ | HASHES[n_hashes]
- |------------|
- | 0x........ | OFFSETS[0]
- | 0x........ | OFFSETS[1]
- | 0x........ | OFFSETS[2]
- | 0x........ | OFFSETS[3]
- | 0x........ | OFFSETS[4]
- | 0x........ | OFFSETS[5]
- | 0x000034f0 | OFFSETS[6] offset for BUCKETS[3]
- | 0x00003500 | OFFSETS[7] offset for BUCKETS[3]
- | 0x00003550 | OFFSETS[8] offset for BUCKETS[3]
- | 0x........ | OFFSETS[9]
- | 0x........ | OFFSETS[10]
- | 0x........ | OFFSETS[11]
- | 0x........ | OFFSETS[12]
- | 0x........ | OFFSETS[13]
- | 0x........ | OFFSETS[n_hashes]
- |------------|
- | |
- | |
- | |
- | |
- | |
- |------------|
-0x000034f0: | 0x00001203 | String offset into .debug_str ("erase")
- | 0x00000004 | A 32 bit array count - number of HashData with name "erase"
- | 0x........ | HashData[0]
- | 0x........ | HashData[1]
- | 0x........ | HashData[2]
- | 0x........ | HashData[3]
- | 0x00000000 | String offset into .debug_str (terminate data for hash)
- |------------|
-0x00003500: | 0x00001203 | String offset into .debug_str ("collision")
- | 0x00000002 | A 32 bit array count - number of HashData with name "collision"
- | 0x........ | HashData[0]
- | 0x........ | HashData[1]
- | 0x00001203 | String offset into .debug_str ("dump")
- | 0x00000003 | A 32 bit array count - number of HashData with name "dump"
- | 0x........ | HashData[0]
- | 0x........ | HashData[1]
- | 0x........ | HashData[2]
- | 0x00000000 | String offset into .debug_str (terminate data for hash)
- |------------|
-0x00003550: | 0x00001203 | String offset into .debug_str ("main")
- | 0x00000009 | A 32 bit array count - number of HashData with name "main"
- | 0x........ | HashData[0]
- | 0x........ | HashData[1]
- | 0x........ | HashData[2]
- | 0x........ | HashData[3]
- | 0x........ | HashData[4]
- | 0x........ | HashData[5]
- | 0x........ | HashData[6]
- | 0x........ | HashData[7]
- | 0x........ | HashData[8]
- | 0x00000000 | String offset into .debug_str (terminate data for hash)
- `------------'
-</pre>
-</div>
-
-<p>So we still have all of the same data, we just organize it more efficiently
- for debugger lookup. If we repeat the same "printf" lookup from above, we
- would hash "printf" and find it matches BUCKETS[3] by taking the 32 bit hash
- value modulo n_buckets. BUCKETS[3] contains "6" which is the index
- into the HASHES table. We would then compare any consecutive 32 bit hash
- values in the HASHES array as long as the hashes would be in BUCKETS[3]. We
- do this by verifying that each subsequent hash value modulo n_buckets is still
- 3. In the case of a failed lookup we would access the memory for BUCKETS[3], and
- then compare a few consecutive 32 bit hashes before we know that we have no match.
- We don't end up marching through multiple words of memory and we really keep the
- number of processor data cache lines being accessed as small as possible.</p>
-
-<p>The string hash that is used for these lookup tables is the Daniel J.
- Bernstein hash which is also used in the ELF GNU_HASH sections. It is a very
- good hash for all kinds of names in programs with very few hash collisions.</p>
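-<p>For reference, a common formulation of the Bernstein string hash can be
- sketched in C as follows. The text above only names the hash, so the seed of
- 5381, the multiply-by-33-and-add variant, and the function name djb_hash are
- assumptions of this sketch:</p>

```c
#include <stdint.h>

/* Bernstein (DJB) string hash: start from 5381, then for every byte compute
   h = h * 33 + byte. Arithmetic wraps modulo 2^32. */
static uint32_t djb_hash(const char *s)
{
  uint32_t h = 5381;
  for (; *s != '\0'; ++s)
    h = (h << 5) + h + (unsigned char)*s; /* (h << 5) + h == h * 33 */
  return h;
}
```

-<p>A producer and a consumer must agree on every detail of the function,
- which is presumably why the table header records which hash function was
- used.</p>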
-
-<p>Empty buckets are designated by using an invalid hash index of UINT32_MAX.</p>
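-<p>The lookup described above, including the empty bucket marker, can be
- sketched in C roughly as follows. The function name find_hash_data_offset, its
- parameter list, and the use of 0 as a "not found" result are assumptions made
- for illustration; the arrays are expected to be laid out as in the diagrams
- above:</p>

```c
#include <assert.h>
#include <stdint.h>

/* Returns the OFFSETS entry that belongs to "hash", or 0 when the table has
   no entry for it. Treating 0 as "not found" is an assumption of this sketch. */
static uint32_t find_hash_data_offset(const uint32_t *buckets, uint32_t bucket_count,
                                      const uint32_t *hashes, const uint32_t *offsets,
                                      uint32_t hashes_count, uint32_t hash)
{
  assert(bucket_count > 0);
  uint32_t bucket = hash % bucket_count;
  uint32_t i = buckets[bucket];
  if (i == UINT32_MAX)
    return 0; /* empty bucket: a single memory access rules the name out */
  for (; i < hashes_count; ++i) {
    if (hashes[i] % bucket_count != bucket)
      break; /* reached the hashes that belong to the next bucket */
    if (hashes[i] == hash)
      return offsets[i]; /* the caller still has to compare the strings */
  }
  return 0;
}
```

-<p>A failed lookup touches one BUCKETS entry and a short run of adjacent
- HASHES entries, and never reads the hash data itself.</p>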
-</div>
-
-<!-- ======================================================================= -->
-<h4>
- <a name="acceltabledetails">Details</a>
-</h4>
-<!-- ======================================================================= -->
-<div>
-<p>These name hash tables are designed to be generic where specializations of
- the table get to define additional data that goes into the header
- ("HeaderData"), how the string value is stored ("KeyType") and the content
- of the data for each hash value.</p>
-
-<h5>Header Layout</h5>
-<p>The header has a fixed part, and the specialized part. The exact format of
- the header is:</p>
-<div class="doc_code">
-<pre>
-struct Header
-{
- uint32_t magic; // 'HASH' magic value to allow endian detection
- uint16_t version; // Version number
- uint16_t hash_function; // The hash function enumeration that was used
- uint32_t bucket_count; // The number of buckets in this hash table
- uint32_t hashes_count; // The total number of unique hash values and hash data offsets in this table
- uint32_t header_data_len; // The bytes to skip to get to the hash indexes (buckets) for correct alignment
- // Specifically the length of the following HeaderData field - this does not
- // include the size of the preceding fields
- HeaderData header_data; // Implementation specific header data
-};
-</pre>
-</div>
-<p>The header starts with a 32 bit "magic" value which must be 'HASH' encoded as
- an ASCII integer. This allows the detection of the start of the hash table and
- also allows the table's byte order to be determined so the table can be
- correctly extracted. The "magic" value is followed by a 16 bit version number
- which allows the table to be revised and modified in the future. The current
- version number is 1. "hash_function" is a uint16_t enumeration that specifies
- which hash function was used to produce this table. The current values for the
- hash function enumerations include:</p>
-<div class="doc_code">
-<pre>
-enum HashFunctionType
-{
- eHashFunctionDJB = 0u, // Daniel J Bernstein hash function
-};
-</pre>
-</div>
-<p>"bucket_count" is a 32 bit unsigned integer that represents how many buckets
- are in the BUCKETS array. "hashes_count" is the number of unique 32 bit hash
- values that are in the HASHES array, and is also the number of offsets
- contained in the OFFSETS array. "header_data_len" specifies the size in
- bytes of the HeaderData that is filled in by specialized versions of this
- table.</p>
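-<p>As an illustration of how the "magic" value allows the byte order to be
- determined, the sketch below classifies a magic value that was read from the
- table as a host-endian 32 bit integer. The constant 0x48415348 is the ASCII
- bytes 'H', 'A', 'S', 'H' packed into one integer; the names HASH_MAGIC,
- ByteOrder and classify_magic are hypothetical:</p>

```c
#include <stdint.h>

#define HASH_MAGIC 0x48415348u /* 'H' 'A' 'S' 'H' */

enum ByteOrder { eByteOrderInvalid, eByteOrderHost, eByteOrderSwapped };

/* Reverses the four bytes of a 32 bit value. */
static uint32_t swap32(uint32_t v)
{
  return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
         ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Classifies a magic value that was read as a host-endian uint32_t. */
static enum ByteOrder classify_magic(uint32_t magic)
{
  if (magic == HASH_MAGIC)
    return eByteOrderHost;    /* table was written in the host byte order */
  if (swap32(magic) == HASH_MAGIC)
    return eByteOrderSwapped; /* every field must be byte swapped when read */
  return eByteOrderInvalid;   /* not the start of a name hash table */
}
```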
-
-<h5>Fixed Lookup</h5>
-<p>The header is followed by the buckets, hashes, offsets, and hash value
- data:</p>
-<div class="doc_code">
-<pre>
-struct FixedTable
-{
- uint32_t buckets[Header.bucket_count]; // An array of hash indexes into the "hashes[]" array below
- uint32_t hashes [Header.hashes_count]; // Every unique 32 bit hash for the entire table is in this table
- uint32_t offsets[Header.hashes_count]; // An offset that corresponds to each item in the "hashes[]" array above
-};
-</pre>
-</div>
-<p>"buckets" is an array of 32 bit indexes into the "hashes" array. The
- "hashes" array contains all of the 32 bit hash values for all names in the
- hash table. Each hash in the "hashes" table has an offset in the "offsets"
- array that points to the data for the hash value.</p>
-
-<p>This table setup makes it very easy to repurpose these tables to contain
- different data, while keeping the lookup mechanism the same for all tables.
- This layout also makes it possible to save the table to disk and map it in
- later and do very efficient name lookups with little or no parsing.</p>
-
-<p>DWARF lookup tables can be implemented in a variety of ways and can store
- a lot of information for each name. We want to make the DWARF tables
- extensible and able to store the data efficiently so we have used some of the
- DWARF features that enable efficient data storage to define exactly what kind
- of data we store for each name.</p>
-
-<p>The "HeaderData" contains a definition of the contents of each HashData
- chunk. We might want to store an offset to all of the debug information
- entries (DIEs) for each name. To keep things extensible, we create a list of
- items, or Atoms, that are contained in the data for each name. First comes the
- type of the data in each atom:</p>
-<div class="doc_code">
-<pre>
-enum AtomType
-{
- eAtomTypeNULL = 0u,
- eAtomTypeDIEOffset = 1u, // DIE offset, check form for encoding
- eAtomTypeCUOffset = 2u, // DIE offset of the compile unit header that contains the item in question
- eAtomTypeTag = 3u, // DW_TAG_xxx value, should be encoded as DW_FORM_data1 (if no tags exceed 255) or DW_FORM_data2
- eAtomTypeNameFlags = 4u, // Flags from enum NameFlags
- eAtomTypeTypeFlags = 5u, // Flags from enum TypeFlags
-};
-</pre>
-</div>
-<p>The enumeration values and their meanings are:</p>
-<div class="doc_code">
-<pre>
- eAtomTypeNULL - a termination atom that specifies the end of the atom list
- eAtomTypeDIEOffset - an offset into the .debug_info section for the DWARF DIE for this name
- eAtomTypeCUOffset - an offset into the .debug_info section for the CU that contains the DIE
- eAtomTypeTag - The DW_TAG_XXX enumeration value so you don't have to parse the DWARF to see what it is
- eAtomTypeNameFlags - Flags for functions and global variables (isFunction, isInlined, isExternal...)
- eAtomTypeTypeFlags - Flags for types (isCXXClass, isObjCClass, ...)
-</pre>
-</div>
-<p>Then each atom specifies its type and how the data for that atom type is
- encoded:</p>
-<div class="doc_code">
-<pre>
-struct Atom
-{
- uint16_t type; // AtomType enum value
- uint16_t form; // DWARF DW_FORM_XXX defines
-};
-</pre>
-</div>
-<p>The "form" type above is from the DWARF specification and defines the
- exact encoding of the data for the Atom type. See the DWARF specification for
- the DW_FORM_ definitions.</p>
-<div class="doc_code">
-<pre>
-struct HeaderData
-{
- uint32_t die_offset_base;
- uint32_t atom_count;
- Atom atoms[atom_count];
-};
-</pre>
-</div>
-<p>"HeaderData" defines the base DIE offset that should be added to any atoms
- that are encoded using the DW_FORM_ref1, DW_FORM_ref2, DW_FORM_ref4,
- DW_FORM_ref8 or DW_FORM_ref_udata. It also defines what is contained in
- each "HashData" object -- Atom.form tells us how large each field will be in
- the HashData and the Atom.type tells us how this data should be interpreted.</p>
-
-<p>For the current implementations of the ".apple_names" (all functions + globals),
- the ".apple_types" (names of all types that are defined), and the
- ".apple_namespaces" (all namespaces), we currently set the Atom array to be:</p>
-<div class="doc_code">
-<pre>
-HeaderData.atom_count = 1;
-HeaderData.atoms[0].type = eAtomTypeDIEOffset;
-HeaderData.atoms[0].form = DW_FORM_data4;
-</pre>
-</div>
-<p>This defines the contents to be the DIE offset (eAtomTypeDIEOffset) that is
- encoded as a 32 bit value (DW_FORM_data4). This allows a single name to have
- multiple matching DIEs in a single file, which could come up with an inlined
- function for instance. Future tables could include more information about the
- DIE such as flags indicating if the DIE is a function, method, block,
- or inlined.</p>
-
-<p>The KeyType for the DWARF table is a 32 bit string table offset into the
- ".debug_str" table. The ".debug_str" is the string table for the DWARF which
- may already contain copies of all of the strings. This helps make sure, with
- help from the compiler, that we reuse the strings between all of the DWARF
- sections and keeps the hash table size down. Another benefit to having the
- compiler generate all strings as DW_FORM_strp in the debug info, is that
- DWARF parsing can be made much faster.</p>
-
-<p>After a lookup is made, we get an offset into the hash data. The hash data
- needs to be able to deal with 32 bit hash collisions, so the chunk of data
- at the offset in the hash data consists of a triple:</p>
-<div class="doc_code">
-<pre>
-uint32_t str_offset
-uint32_t hash_data_count
-HashData[hash_data_count]
-</pre>
-</div>
-<p>If "str_offset" is zero, then the bucket contents are done. 99.9% of the
- hash data chunks contain a single item (no 32 bit hash collision):</p>
-<div class="doc_code">
-<pre>
-.------------.
-| 0x00001023 | uint32_t KeyType (.debug_str[0x0001023] => "main")
-| 0x00000004 | uint32_t HashData count
-| 0x........ | uint32_t HashData[0] DIE offset
-| 0x........ | uint32_t HashData[1] DIE offset
-| 0x........ | uint32_t HashData[2] DIE offset
-| 0x........ | uint32_t HashData[3] DIE offset
-| 0x00000000 | uint32_t KeyType (end of hash chain)
-`------------'
-</pre>
-</div>
-<p>If there are collisions, you will have multiple valid string offsets:</p>
-<div class="doc_code">
-<pre>
-.------------.
-| 0x00001023 | uint32_t KeyType (.debug_str[0x0001023] => "main")
-| 0x00000004 | uint32_t HashData count
-| 0x........ | uint32_t HashData[0] DIE offset
-| 0x........ | uint32_t HashData[1] DIE offset
-| 0x........ | uint32_t HashData[2] DIE offset
-| 0x........ | uint32_t HashData[3] DIE offset
-| 0x00002023 | uint32_t KeyType (.debug_str[0x0002023] => "print")
-| 0x00000002 | uint32_t HashData count
-| 0x........ | uint32_t HashData[0] DIE offset
-| 0x........ | uint32_t HashData[1] DIE offset
-| 0x00000000 | uint32_t KeyType (end of hash chain)
-`------------'
-</pre>
-</div>
-<p>Current testing with real world C++ binaries has shown that there is around 1
- 32 bit hash collision per 100,000 name entries.</p>
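-<p>Walking one of these hash data chunks can be sketched in C as follows. The
- sketch assumes the atom layout described above for the current tables (one 32
- bit DIE offset per HashData) and that the chunk has already been converted to
- host byte order; the function name find_name_in_chunk and its parameters are
- hypothetical:</p>

```c
#include <stdint.h>
#include <string.h>

/* Searches one hash data chunk for "name". "chunk" points at the first
   str_offset word and "debug_str" at the start of the string table.
   Returns a pointer to the matching DIE offsets and stores how many there
   are in *count, or returns NULL when no name in the chain matches. */
static const uint32_t *find_name_in_chunk(const uint32_t *chunk,
                                          const char *debug_str,
                                          const char *name, uint32_t *count)
{
  while (chunk[0] != 0) { /* a str_offset of zero ends the chain */
    uint32_t str_offset = chunk[0];
    uint32_t hash_data_count = chunk[1];
    if (strcmp(debug_str + str_offset, name) == 0) {
      *count = hash_data_count;
      return chunk + 2; /* HashData[0] follows the count */
    }
    chunk += 2 + hash_data_count; /* skip to the next triple */
  }
  *count = 0;
  return NULL;
}
```

-<p>The string comparison is what resolves the rare case where two different
- names share the same 32 bit hash.</p>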
-</div>
-<!-- ======================================================================= -->
-<h4>
- <a name="acceltablecontents">Contents</a>
-</h4>
-<!-- ======================================================================= -->
-<div>
-<p>As we said, we want to strictly define exactly what is included in the
- different tables. For DWARF, we have 3 tables: ".apple_names", ".apple_types",
- and ".apple_namespaces".</p>
-
-<p>".apple_names" sections should contain an entry for each DWARF DIE whose
- DW_TAG is a DW_TAG_label, DW_TAG_inlined_subroutine, or DW_TAG_subprogram that
- has address attributes: DW_AT_low_pc, DW_AT_high_pc, DW_AT_ranges or
- DW_AT_entry_pc. It also contains DW_TAG_variable DIEs that have a DW_OP_addr
- in the location (global and static variables). All global and static variables
- should be included, including those scoped within functions and classes. For
- example using the following code:</p>
-<div class="doc_code">
-<pre>
-static int var = 0;
-
-void f ()
-{
- static int var = 0;
-}
-</pre>
-</div>
-<p>Both of the static "var" variables would be included in the table. All
- functions should emit both their full names and their basenames. For C or C++,
- the full name is the mangled name (if available) which is usually in the
- DW_AT_MIPS_linkage_name attribute, and the DW_AT_name contains the function
- basename. If global or static variables have a mangled name in a
- DW_AT_MIPS_linkage_name attribute, this should be emitted along with the
- simple name found in the DW_AT_name attribute.</p>
-
-<p>".apple_types" sections should contain an entry for each DWARF DIE whose
- tag is one of:</p>
-<ul>
- <li>DW_TAG_array_type</li>
- <li>DW_TAG_class_type</li>
- <li>DW_TAG_enumeration_type</li>
- <li>DW_TAG_pointer_type</li>
- <li>DW_TAG_reference_type</li>
- <li>DW_TAG_string_type</li>
- <li>DW_TAG_structure_type</li>
- <li>DW_TAG_subroutine_type</li>
- <li>DW_TAG_typedef</li>
- <li>DW_TAG_union_type</li>
- <li>DW_TAG_ptr_to_member_type</li>
- <li>DW_TAG_set_type</li>
- <li>DW_TAG_subrange_type</li>
- <li>DW_TAG_base_type</li>
- <li>DW_TAG_const_type</li>
- <li>DW_TAG_constant</li>
- <li>DW_TAG_file_type</li>
- <li>DW_TAG_namelist</li>
- <li>DW_TAG_packed_type</li>
- <li>DW_TAG_volatile_type</li>
- <li>DW_TAG_restrict_type</li>
- <li>DW_TAG_interface_type</li>
- <li>DW_TAG_unspecified_type</li>
- <li>DW_TAG_shared_type</li>
-</ul>
-<p>Only entries with a DW_AT_name attribute are included, and the entry must
- not be a forward declaration (DW_AT_declaration attribute with a non-zero value).
- For example, using the following code:</p>
-<div class="doc_code">
-<pre>
-int main ()
-{
- int *b = 0;
- return *b;
-}
-</pre>
-</div>
-<p>We get a few type DIEs:</p>
-<div class="doc_code">
-<pre>
-0x00000067: TAG_base_type [5]
- AT_encoding( DW_ATE_signed )
- AT_name( "int" )
- AT_byte_size( 0x04 )
-
-0x0000006e: TAG_pointer_type [6]
- AT_type( {0x00000067} ( int ) )
- AT_byte_size( 0x08 )
-</pre>
-</div>
-<p>The DW_TAG_pointer_type is not included because it does not have a DW_AT_name.</p>
-
-<p>".apple_namespaces" section should contain all DW_TAG_namespace DIEs. If
- we run into a namespace that has no name this is an anonymous namespace,
- and the name should be output as "(anonymous namespace)" (without the quotes).
- Why? This matches the output of the abi::__cxa_demangle() that is in the standard
- C++ library that demangles mangled names.</p>
-</div>
-
-<!-- ======================================================================= -->
-<h4>
- <a name="acceltableextensions">Language Extensions and File Format Changes</a>
-</h4>
-<!-- ======================================================================= -->
-<div>
-<h5>Objective-C Extensions</h5>
-<p>".apple_objc" section should contain all DW_TAG_subprogram DIEs for an
- Objective-C class. The name used in the hash table is the name of the
- Objective-C class itself. If the Objective-C class has a category, then an
- entry is made for both the class name without the category, and for the class
- name with the category. So if we have a DIE at offset 0x1234 with a name
- of method "-[NSString(my_additions) stringWithSpecialString:]", we would add
- an entry for "NSString" that points to DIE 0x1234, and an entry for
- "NSString(my_additions)" that points to 0x1234. This allows us to quickly
- track down all Objective-C methods for an Objective-C class when doing
- expressions. It is needed because of the dynamic nature of Objective-C where
- anyone can add methods to a class. The DWARF for Objective-C methods is also
- emitted differently from C++ classes where the methods are not usually
- contained in the class definition, they are scattered about across one or more
- compile units. Categories can also be defined in different shared libraries.
- So we need to be able to quickly find all of the methods and class functions
- given the Objective-C class name, or quickly find all methods and class
- functions for a class + category name. This table does not contain any selector
- names, it just maps Objective-C class names (or class names + category) to all
- of the methods and class functions. The selectors are added as function
- basenames in the ".apple_names" section.</p>
-
-<p>In the ".apple_names" section for Objective-C functions, the full name is the
- entire function name with the brackets ("-[NSString stringWithCString:]") and the
- basename is the selector only ("stringWithCString:").</p>
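-<p>To make the naming rules concrete, the sketch below splits a full
- Objective-C method name into the class name, the class name with its category,
- and the selector. It is only an illustration: the struct ObjCNames, the helper
- names, and the fixed 128 byte buffers are assumptions, and a real producer
- would take these names from its own data structures rather than from string
- parsing:</p>

```c
#include <stddef.h>
#include <string.h>

struct ObjCNames {
  char class_name[128];          /* "NSString" */
  char class_with_category[128]; /* "NSString(my_additions)" */
  char selector[128];            /* "stringWithSpecialString:" */
};

/* Copies "len" bytes into "dst", truncating if needed, and NUL terminates. */
static void copy_span(char *dst, size_t dst_size, const char *begin, size_t len)
{
  if (len >= dst_size)
    len = dst_size - 1;
  memcpy(dst, begin, len);
  dst[len] = '\0';
}

/* Returns 1 on success, or 0 if "full" does not look like "-[Class sel]",
   "+[Class sel]" or the same with a "(Category)" after the class name. */
static int split_objc_method_name(const char *full, struct ObjCNames *out)
{
  size_t len = strlen(full);
  if (len < 5 || (full[0] != '-' && full[0] != '+') ||
      full[1] != '[' || full[len - 1] != ']')
    return 0;
  const char *cls = full + 2;
  const char *space = strchr(cls, ' ');
  if (space == NULL || space == cls)
    return 0;
  const char *paren = memchr(cls, '(', (size_t)(space - cls));
  const char *cls_end = paren ? paren : space;
  copy_span(out->class_name, sizeof out->class_name, cls, (size_t)(cls_end - cls));
  copy_span(out->class_with_category, sizeof out->class_with_category,
            cls, (size_t)(space - cls));
  const char *sel = space + 1;
  copy_span(out->selector, sizeof out->selector, sel,
            (size_t)(full + len - 1 - sel));
  return 1;
}
```

-<p>For a method without a category, class_name and class_with_category come
- out identical, so only one ".apple_objc" entry would be needed.</p>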
-
-<h5>Mach-O Changes</h5>
-<p>The section names for the Apple hash tables are for non-Mach-O files. For
- Mach-O files, the sections should be contained in the "__DWARF" segment with
- names as follows:</p>
-<ul>
- <li>".apple_names" -> "__apple_names"</li>
- <li>".apple_types" -> "__apple_types"</li>
- <li>".apple_namespaces" -> "__apple_namespac" (16 character limit)</li>
- <li>".apple_objc" -> "__apple_objc"</li>
-</ul>
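-<p>The mapping in the list above follows a simple pattern: the leading "." is
- replaced by "__" and the result is cut off at 16 characters. A minimal C
- sketch of that pattern is shown below; the function name macho_section_name is
- hypothetical:</p>

```c
#include <stdio.h>

/* Maps ".apple_xxx" to "__apple_xxx", truncated to the 16 character limit.
   "out" must have room for 16 characters plus the terminating NUL. */
static void macho_section_name(const char *name, char out[17])
{
  if (name[0] == '.')
    ++name; /* drop the leading dot */
  snprintf(out, 17, "__%s", name); /* writes at most 16 characters + NUL */
}
```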
-</div>
-</div>
-</div>
-
-<!-- *********************************************************************** -->
-
-<hr>
-<address>
- <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
- src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a>
- <a href="http://validator.w3.org/check/referer"><img
- src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a>
-
- <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
- <a href="http://llvm.org/">LLVM Compiler Infrastructure</a><br>
- Last modified: $Date$
-</address>
-
-</body>
-</html>
diff --git a/docs/SourceLevelDebugging.rst b/docs/SourceLevelDebugging.rst
new file mode 100644
index 0000000000..2bbf2e2c35
--- /dev/null
+++ b/docs/SourceLevelDebugging.rst
@@ -0,0 +1,2285 @@
+================================
+Source Level Debugging with LLVM
+================================
+
+.. sectionauthor:: Chris Lattner <sabre@nondot.org> and Jim Laskey <jlaskey@mac.com>
+
+.. contents::
+ :local:
+
+Introduction
+============
+
+This document is the central repository for all information pertaining to debug
+information in LLVM. It describes the :ref:`actual format that the LLVM debug
+information takes <format>`, which is useful for those interested in creating
+front-ends or dealing directly with the information. Further, this document
+provides specific examples of what debug information for C/C++ looks like.
+
+Philosophy behind LLVM debugging information
+--------------------------------------------
+
+The idea of the LLVM debugging information is to capture how the important
+pieces of the source-language's Abstract Syntax Tree map onto LLVM code.
+Several design aspects have shaped the solution that appears here. The
+important ones are:
+
+* Debugging information should have very little impact on the rest of the
+ compiler. No transformations, analyses, or code generators should need to
+ be modified because of debugging information.
+
+* LLVM optimizations should interact in :ref:`well-defined and easily described
+ ways <intro_debugopt>` with the debugging information.
+
+* Because LLVM is designed to support arbitrary programming languages,
+ LLVM-to-LLVM tools should not need to know anything about the semantics of
+ the source-level-language.
+
+* Source-level languages are often **widely** different from one another.
+ LLVM should not put any restrictions on the flavor of the source-language,
+ and the debugging information should work with any language.
+
+* With code generator support, it should be possible to use an LLVM compiler
+ to compile a program to native machine code and standard debugging
+ formats. This allows compatibility with traditional machine-code level
+ debuggers, like GDB or DBX.
+
+The approach used by the LLVM implementation is to use a small set of
+:ref:`intrinsic functions <format_common_intrinsics>` to define a mapping
+between LLVM program objects and the source-level objects. The description of
+the source-level program is maintained in LLVM metadata in an
+:ref:`implementation-defined format <ccxx_frontend>` (the C/C++ front-end
+currently uses working draft 7 of the `DWARF 3 standard
+<http://www.eagercon.com/dwarf/dwarf3std.htm>`_).
+
+When a program is being debugged, a debugger interacts with the user and turns
+the stored debug information into source-language specific information. As
+such, a debugger must be aware of the source-language, and is thus tied to a
+specific language or family of languages.
+
+Debug information consumers
+---------------------------
+
+The role of debug information is to provide meta information normally stripped
+away during the compilation process. This meta information provides an LLVM
+user a relationship between generated code and the original program source
+code.
+
+Currently, debug information is consumed by DwarfDebug to produce DWARF
+information used by the GDB debugger. Other targets could use the same
+information to produce stabs or other debug forms.
+
+It would also be reasonable to use debug information to feed profiling tools
+for analysis of generated code, or tools for reconstructing the original
+source from generated code.
+
+TODO - expound a bit more.
+
+.. _intro_debugopt:
+
+Debugging optimized code
+------------------------
+
+An extremely high priority of LLVM debugging information is to make it interact
+well with optimizations and analysis. In particular, the LLVM debug
+information provides the following guarantees:
+
+* LLVM debug information **always provides information to accurately read
+ the source-level state of the program**, regardless of which LLVM
+ optimizations have been run, and without any modification to the
+ optimizations themselves. However, some optimizations may impact the
+ ability to modify the current state of the program with a debugger, such
+ as setting program variables, or calling functions that have been
+ deleted.
+
+* As desired, LLVM optimizations can be upgraded to be aware of the LLVM
+ debugging information, allowing them to update the debugging information
+ as they perform aggressive optimizations. This means that, with effort,
+ the LLVM optimizers could optimize debug code just as well as non-debug
+ code.
+
+* LLVM debug information does not prevent optimizations from
+ happening (for example inlining, basic block reordering/merging/cleanup,
+ tail duplication, etc).
+
+* LLVM debug information is automatically optimized along with the rest of
+ the program, using existing facilities. For example, duplicate
+ information is automatically merged by the linker, and unused information
+ is automatically removed.
+
+Basically, the debug information allows you to compile a program with
+"``-O0 -g``" and get full debug information, allowing you to arbitrarily modify
+the program as it executes from a debugger. Compiling a program with
+"``-O3 -g``" gives you full debug information that is always available and
+accurate for reading (e.g., you get accurate stack traces despite tail call
+elimination and inlining), but you might lose the ability to modify the program
+and call functions which were optimized out of the program, or inlined away
+completely.
+
+The :ref:`LLVM test suite <test-suite-quickstart>` provides a framework to
+test the optimizer's handling of debugging information. It can be run like this:
+
+.. code-block:: bash
+
+ % cd llvm/projects/test-suite/MultiSource/Benchmarks # or some other level
+ % make TEST=dbgopt
+
+This will test the impact of debugging information on optimization passes. If
+debugging information influences optimization passes then it will be reported
+as a failure. See :doc:`TestingGuide` for more information on the LLVM test
+infrastructure and how to run various tests.
+
+.. _format:
+
+Debugging information format
+============================
+
+LLVM debugging information has been carefully designed to make it possible for
+the optimizer to optimize the program and debugging information without
+necessarily having to know anything about debugging information. In
+particular, the use of metadata avoids duplicated debugging information from
+the beginning, and the global dead code elimination pass automatically deletes
+debugging information for a function if it decides to delete the function.
+
+To do this, most of the debugging information (descriptors for types,
+variables, functions, source files, etc) is inserted by the language front-end
+in the form of LLVM metadata.
+
+Debug information is designed to be agnostic about the target debugger and
+debugging information representation (e.g. DWARF/Stabs/etc). It uses a generic
+pass to decode the information that represents variables, types, functions,
+namespaces, etc: this allows for arbitrary source-language semantics and
+type-systems to be used, as long as there is a module written for the target
+debugger to interpret the information.
+
+To provide basic functionality, the LLVM debugger does have to make some
+assumptions about the source-level language being debugged, though it keeps
+these to a minimum. The only common features that the LLVM debugger assumes
+exist are :ref:`source files <format_files>`, and :ref:`program objects
+<format_global_variables>`. These abstract objects are used by a debugger to
+form stack traces, show information about local variables, etc.
+
+This section of the documentation first describes the representation aspects
+common to any source-language. :ref:`ccxx_frontend` describes the data layout
+conventions used by the C and C++ front-ends.
+
+Debug information descriptors
+-----------------------------
+
+In consideration of the complexity and volume of debug information, LLVM
+provides a specification for well formed debug descriptors.
+
+Consumers of LLVM debug information expect the descriptors for program objects
+to start in a canonical format, but the descriptors can include additional
+information appended at the end that is source-language specific. All LLVM
+debugging information is versioned, allowing backwards compatibility in the
+case that the core structures need to change in some way. Also, all debugging
+information objects start with a tag to indicate what type of object it is.
+The source-language is allowed to define its own objects, by using unreserved
+tag numbers. We recommend using tags in the range 0x1000 through 0x2000
+(there is a defined ``enum DW_TAG_user_base = 0x1000``).
+
+The fields of debug descriptors used internally by LLVM are restricted to only
+the simple data types ``i32``, ``i1``, ``float``, ``double``, ``mdstring`` and
+``mdnode``.
+
+.. code-block:: llvm
+
+ !1 = metadata !{
+ i32, ;; A tag
+ ...
+ }
+
+The first field of a descriptor is always an
+``i32`` containing a tag value identifying the content of the descriptor.
+The remaining fields are specific to the descriptor. The values of tags are
+loosely bound to the tag values of DWARF information entries. However, that
+does not restrict the use of the information supplied to DWARF targets. To
+facilitate versioning of debug information, the tag is augmented with the
+current debug version (``LLVMDebugVersion = 8 << 16`` or 0x80000 or
+524288.)
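+
+As a worked illustration of this arithmetic (a sketch that assumes the version
+occupies the bits above the low 16, as the ``<< 16`` suggests), a tag can be
+composed and decomposed like this:
+
+.. code-block:: llvm
+
+ ;; Composing, with LLVMDebugVersion = 8 << 16 = 524288:
+ ;; DW_TAG_compile_unit: 17 + 524288 = 524305
+ ;; DW_TAG_file_type: 41 + 524288 = 524329
+ ;; Decomposing 524305 (0x80011):
+ ;; 524305 & 0xFFFF = 17 ;; tag
+ ;; 524305 >> 16 = 8 ;; debug version
+ ;; Other examples in this document carry other versions, e.g.
+ ;; 786449 = 17 + (12 << 16) and 458769 = 17 + (7 << 16).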
+
+The details of the various descriptors follow.
+
+Compile unit descriptors
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: llvm
+
+ !0 = metadata !{
+ i32, ;; Tag = 17 + LLVMDebugVersion (DW_TAG_compile_unit)
+ i32, ;; Unused field.
+ i32, ;; DWARF language identifier (ex. DW_LANG_C89)
+ metadata, ;; Source file name
+ metadata, ;; Source file directory (includes trailing slash)
+ metadata, ;; Producer (ex. "4.0.1 LLVM (LLVM research group)")
+ i1, ;; True if this is a main compile unit.
+ i1, ;; True if this is optimized.
+ metadata, ;; Flags
+ i32, ;; Runtime version
+ metadata, ;; List of enum types
+ metadata, ;; List of retained types
+ metadata, ;; List of subprograms
+ metadata ;; List of global variables
+ }
+
+These descriptors contain a source language ID for the file (we use the DWARF
+3.0 ID numbers, such as ``DW_LANG_C89``, ``DW_LANG_C_plus_plus``,
+``DW_LANG_Cobol74``, etc), three strings describing the filename, working
+directory of the compiler, and an identifier string for the compiler that
+produced it.
+
+Compile unit descriptors provide the root context for objects declared in a
+specific compilation unit. File descriptors are defined using this context.
+These descriptors are collected by a named metadata ``!llvm.dbg.cu``. Each
+compile unit descriptor keeps track of subprograms, global variables and type
+information.
+
+.. _format_files:
+
+File descriptors
+^^^^^^^^^^^^^^^^
+
+.. code-block:: llvm
+
+ !0 = metadata !{
+ i32, ;; Tag = 41 + LLVMDebugVersion (DW_TAG_file_type)
+ metadata, ;; Source file name
+ metadata, ;; Source file directory (includes trailing slash)
+ metadata ;; Unused
+ }
+
+These descriptors contain information for a file. Global variables and top
+level functions would be defined using this context. File descriptors also
+provide context for source line correspondence.
+
+Each input file is encoded as a separate file descriptor in LLVM debugging
+information output.
+
+.. _format_global_variables:
+
+Global variable descriptors
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: llvm
+
+ !1 = metadata !{
+ i32, ;; Tag = 52 + LLVMDebugVersion (DW_TAG_variable)
+ i32, ;; Unused field.
+ metadata, ;; Reference to context descriptor
+ metadata, ;; Name
+ metadata, ;; Display name (fully qualified C++ name)
+ metadata, ;; MIPS linkage name (for C++)
+ metadata, ;; Reference to file where defined
+ i32, ;; Line number where defined
+ metadata, ;; Reference to type descriptor
+ i1, ;; True if the global is local to compile unit (static)
+ i1, ;; True if the global is defined in the compile unit (not extern)
+ {}* ;; Reference to the global variable
+ }
+
+These descriptors provide debug information about global variables. They
+provide details such as name, type and where the variable is defined. All
+global variables are collected inside the named metadata ``!llvm.dbg.cu``.
+
+.. _format_subprograms:
+
+Subprogram descriptors
+^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: llvm
+
+ !2 = metadata !{
+ i32, ;; Tag = 46 + LLVMDebugVersion (DW_TAG_subprogram)
+ i32, ;; Unused field.
+ metadata, ;; Reference to context descriptor
+ metadata, ;; Name
+ metadata, ;; Display name (fully qualified C++ name)
+ metadata, ;; MIPS linkage name (for C++)
+ metadata, ;; Reference to file where defined
+ i32, ;; Line number where defined
+ metadata, ;; Reference to type descriptor
+ i1, ;; True if the global is local to compile unit (static)
+ i1, ;; True if the global is defined in the compile unit (not extern)
+ i32, ;; Line number where the scope of the subprogram begins
+ i32, ;; Virtuality, e.g. dwarf::DW_VIRTUALITY__virtual
+ i32, ;; Index into a virtual function
+ metadata, ;; indicates which base type contains the vtable pointer for the
+ ;; derived class
+ i32, ;; Flags - Artificial, Private, Protected, Explicit, Prototyped.
+ i1, ;; isOptimized
+ Function * , ;; Pointer to LLVM function
+ metadata, ;; Lists function template parameters
+ metadata, ;; Function declaration descriptor
+ metadata ;; List of function variables
+ }
+
+These descriptors provide debug information about functions, methods and
+subprograms. They provide details such as name, return types and the source
+location where the subprogram is defined.
+
+Block descriptors
+^^^^^^^^^^^^^^^^^
+
+.. code-block:: llvm
+
+ !3 = metadata !{
+ i32, ;; Tag = 11 + LLVMDebugVersion (DW_TAG_lexical_block)
+ metadata,;; Reference to context descriptor
+ i32, ;; Line number
+ i32, ;; Column number
+ metadata,;; Reference to source file
+ i32 ;; Unique ID to identify blocks from a template function
+ }
+
+This descriptor provides debug information about nested blocks within a
+subprogram. The line number and column number are used to distinguish two
+lexical blocks at the same depth.
+
+.. code-block:: llvm
+
+ !3 = metadata !{
+ i32, ;; Tag = 11 + LLVMDebugVersion (DW_TAG_lexical_block)
+ metadata,;; Reference to the scope we're annotating with a file change
+ metadata ;; Reference to the file the scope is enclosed in.
+ }
+
+This descriptor provides a wrapper around a lexical scope to handle file
+changes in the middle of a lexical block.
+
+.. _format_basic_type:
+
+Basic type descriptors
+^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: llvm
+
+ !4 = metadata !{
+ i32, ;; Tag = 36 + LLVMDebugVersion (DW_TAG_base_type)
+ metadata, ;; Reference to context
+ metadata, ;; Name (may be "" for anonymous types)
+ metadata, ;; Reference to file where defined (may be NULL)
+ i32, ;; Line number where defined (may be 0)
+ i64, ;; Size in bits
+ i64, ;; Alignment in bits
+ i64, ;; Offset in bits
+ i32, ;; Flags
+ i32 ;; DWARF type encoding
+ }
+
+These descriptors define primitive types used in the code, for example ``int``,
+``bool`` and ``float``. The context provides the scope of the type, which is
+usually the top level. Since basic types are not usually user defined the
+context and line number can be left as NULL and 0. The size, alignment and
+offset are expressed in bits and can be 64 bit values. The alignment is used
+to round the offset when embedded in a :ref:`composite type
+<format_composite_type>` (for example, to keep doubles on 64-bit boundaries).
+The offset is the bit offset if embedded in a :ref:`composite type
+<format_composite_type>`.
+
+The type encoding provides the details of the type. The values are typically
+one of the following:
+
+.. code-block:: llvm
+
+ DW_ATE_address = 1
+ DW_ATE_boolean = 2
+ DW_ATE_float = 4
+ DW_ATE_signed = 5
+ DW_ATE_signed_char = 6
+ DW_ATE_unsigned = 7
+ DW_ATE_unsigned_char = 8
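+
+For illustration only, a descriptor for ``unsigned char`` might look like the
+following sketch. It assumes an 8-bit, 8-bit-aligned type and debug version
+``8 << 16``, and it omits the context and file references:
+
+.. code-block:: llvm
+
+ !4 = metadata !{
+ i32 524324, ;; Tag = 36 + 524288 (DW_TAG_base_type)
+ null, ;; Context (omitted in this sketch)
+ metadata !"unsigned char", ;; Name
+ null, ;; File (may be NULL)
+ i32 0, ;; Line (may be 0)
+ i64 8, ;; Size in bits
+ i64 8, ;; Alignment in bits
+ i64 0, ;; Offset in bits
+ i32 0, ;; Flags
+ i32 8 ;; Encoding = DW_ATE_unsigned_char
+ }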
+
+.. _format_derived_type:
+
+Derived type descriptors
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: llvm
+
+ !5 = metadata !{
+ i32, ;; Tag (see below)
+ metadata, ;; Reference to context
+ metadata, ;; Name (may be "" for anonymous types)
+ metadata, ;; Reference to file where defined (may be NULL)
+ i32, ;; Line number where defined (may be 0)
+ i64, ;; Size in bits
+ i64, ;; Alignment in bits
+ i64, ;; Offset in bits
+ i32, ;; Flags to encode attributes, e.g. private
+ metadata, ;; Reference to type derived from
+ metadata, ;; (optional) Name of the Objective C property associated with
+ ;; an Objective-C ivar
+ metadata, ;; (optional) Name of the Objective C property getter selector.
+ metadata, ;; (optional) Name of the Objective C property setter selector.
+ i32 ;; (optional) Objective C property attributes.
+ }
+
+These descriptors are used to define types derived from other types. The value
+of the tag varies depending on the meaning. The following are possible tag
+values:
+
+.. code-block:: llvm
+
+ DW_TAG_formal_parameter = 5
+ DW_TAG_member = 13
+ DW_TAG_pointer_type = 15
+ DW_TAG_reference_type = 16
+ DW_TAG_typedef = 22
+ DW_TAG_const_type = 38
+ DW_TAG_volatile_type = 53
+ DW_TAG_restrict_type = 55
+
+``DW_TAG_member`` is used to define a member of a :ref:`composite type
+<format_composite_type>` or :ref:`subprogram <format_subprograms>`. The type
+of the member is the :ref:`derived type <format_derived_type>`.
+``DW_TAG_formal_parameter`` is used to define a member which is a formal
+argument of a subprogram.
+
+``DW_TAG_typedef`` is used to provide a name for the derived type.
+
+``DW_TAG_pointer_type``, ``DW_TAG_reference_type``, ``DW_TAG_const_type``,
+``DW_TAG_volatile_type`` and ``DW_TAG_restrict_type`` are used to qualify the
+:ref:`derived type <format_derived_type>`.
+
+:ref:`Derived type <format_derived_type>` location can be determined from the
+context and line number. The size, alignment and offset are expressed in bits
+and can be 64 bit values. The alignment is used to round the offset when
+embedded in a :ref:`composite type <format_composite_type>` (for example, to
+keep doubles on 64-bit boundaries). The offset is the bit offset if embedded
+in a :ref:`composite type <format_composite_type>`.
+
+Note that the ``void *`` type is expressed as a type derived from NULL.
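+
+A hypothetical ``void *`` descriptor, sketched under the assumptions of a
+target with 64-bit pointers and debug version ``8 << 16``, with the optional
+Objective-C fields left out:
+
+.. code-block:: llvm
+
+ !5 = metadata !{
+ i32 524303, ;; Tag = 15 + 524288 (DW_TAG_pointer_type)
+ null, ;; Context
+ metadata !"", ;; Name (anonymous)
+ null, ;; File
+ i32 0, ;; Line
+ i64 64, ;; Size in bits (assumed 64-bit pointers)
+ i64 64, ;; Alignment in bits
+ i64 0, ;; Offset in bits
+ i32 0, ;; Flags
+ null ;; Type derived from: NULL, i.e. void
+ }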
+
+.. _format_composite_type:
+
+Composite type descriptors
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: llvm
+
+ !6 = metadata !{
+ i32, ;; Tag (see below)
+ metadata, ;; Reference to context
+ metadata, ;; Name (may be "" for anonymous types)
+ metadata, ;; Reference to file where defined (may be NULL)
+ i32, ;; Line number where defined (may be 0)
+ i64, ;; Size in bits
+ i64, ;; Alignment in bits
+ i64, ;; Offset in bits
+ i32, ;; Flags
+ metadata, ;; Reference to type derived from
+ metadata, ;; Reference to array of member descriptors
+ i32 ;; Runtime languages
+ }
+
+These descriptors are used to define types that are composed of 0 or more
+elements. The value of the tag varies depending on the meaning. The following
+are possible tag values:
+
+.. code-block:: llvm
+
+ DW_TAG_array_type = 1
+ DW_TAG_enumeration_type = 4
+ DW_TAG_structure_type = 19
+ DW_TAG_union_type = 23
+ DW_TAG_vector_type = 259
+ DW_TAG_subroutine_type = 21
+ DW_TAG_inheritance = 28
+
+The vector flag indicates that an array type is a native packed vector.
+
+The members of array types (tag = ``DW_TAG_array_type``) or vector types (tag =
+``DW_TAG_vector_type``) are :ref:`subrange descriptors <format_subrange>`, each
+representing the range of subscripts at that level of indexing.
+
+The members of enumeration types (tag = ``DW_TAG_enumeration_type``) are
+:ref:`enumerator descriptors <format_enumerator>`, each representing the
+definition of an enumeration value for the set. All enumeration type descriptors
+are collected inside the named metadata ``!llvm.dbg.cu``.
+
+The members of structure (tag = ``DW_TAG_structure_type``) or union (tag =
+``DW_TAG_union_type``) types are any one of the :ref:`basic
+<format_basic_type>`, :ref:`derived <format_derived_type>` or :ref:`composite
+<format_composite_type>` type descriptors, each representing a field member of
+the structure or union.
+
+For C++ classes (tag = ``DW_TAG_structure_type``), member descriptors provide
+information about base classes, static members and member functions. If a
+member is a :ref:`derived type descriptor <format_derived_type>` and has a tag
+of ``DW_TAG_inheritance``, then the type represents a base class. If the member
+is a :ref:`global variable descriptor <format_global_variables>` then it
+represents a static member. And, if the member is a :ref:`subprogram
+descriptor <format_subprograms>` then it represents a member function. For
+static members and member functions, ``getName()`` returns the member's linkage
+or C++ mangled name. ``getDisplayName()`` returns the simplified version of the
+name.
+
+The first member of subroutine (tag = ``DW_TAG_subroutine_type``) type elements
+is the return type for the subroutine. The remaining elements are the formal
+arguments to the subroutine.
+
+:ref:`Composite type <format_composite_type>` location can be determined from
+the context and line number. The size, alignment and offset are expressed in
+bits and can be 64 bit values. The alignment is used to round the offset when
+embedded in a :ref:`composite type <format_composite_type>` (as an example, to
+keep doubles on 64-bit boundaries). The offset is the bit offset if
+embedded in a :ref:`composite type <format_composite_type>`.
+
+.. _format_subrange:
+
+Subrange descriptors
+^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: llvm
+
+ !42 = metadata !{
+ i32, ;; Tag = 33 + LLVMDebugVersion (DW_TAG_subrange_type)
+ i64, ;; Low value
+ i64 ;; High value
+ }
+
+These descriptors are used to define ranges of array subscripts for an array
+:ref:`composite type <format_composite_type>`. The low value defines the lower
+bound, typically zero for C/C++. The high value is the upper bound. Values
+are 64 bit. ``High - Low + 1`` is the size of the array. If ``Low > High``
+the array bounds are not included in generated debugging information.
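+
+For example, under the convention above a C array declared as ``int A[10]``
+would have ``Low = 0`` and ``High = 9``, giving ``9 - 0 + 1 = 10`` elements. A
+sketch of the corresponding descriptor, assuming debug version ``8 << 16``:
+
+.. code-block:: llvm
+
+ !42 = metadata !{
+ i32 524321, ;; Tag = 33 + 524288 (DW_TAG_subrange_type)
+ i64 0, ;; Low value
+ i64 9 ;; High value
+ }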
+
+.. _format_enumerator:
+
+Enumerator descriptors
+^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: llvm
+
+ !6 = metadata !{
+ i32, ;; Tag = 40 + LLVMDebugVersion (DW_TAG_enumerator)
+ metadata, ;; Name
+ i64 ;; Value
+ }
+
+These descriptors are used to define members of an enumeration :ref:`composite
+type <format_composite_type>`. Each one associates a name with its value.
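+
+As a sketch, the enumerators of a hypothetical ``enum Color { Red, Green = 5 }``
+could be described as follows, assuming debug version ``8 << 16``:
+
+.. code-block:: llvm
+
+ !6 = metadata !{i32 524328, metadata !"Red", i64 0} ;; Tag = 40 + 524288
+ !7 = metadata !{i32 524328, metadata !"Green", i64 5}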
+
+Local variables
+^^^^^^^^^^^^^^^
+
+.. code-block:: llvm
+
+ !7 = metadata !{
+ i32, ;; Tag (see below)
+ metadata, ;; Context
+ metadata, ;; Name
+ metadata, ;; Reference to file where defined
+ i32, ;; 24 bit - Line number where defined
+ ;; 8 bit - Argument number. 1 indicates 1st argument.
+ metadata, ;; Type descriptor
+ i32, ;; flags
+ metadata ;; (optional) Reference to inline location
+ }
+
+These descriptors are used to define variables local to a subprogram. The
+value of the tag depends on the usage of the variable:
+
+.. code-block:: llvm
+
+ DW_TAG_auto_variable = 256
+ DW_TAG_arg_variable = 257
+ DW_TAG_return_variable = 258
+
+An auto variable is any variable declared in the body of the function. An
+argument variable is any variable that appears as a formal argument to the
+function. A return variable is used to track the result of a function and has
+no source correspondent.
+
+The context is either the subprogram or block where the variable is defined.
+Name is the source variable name. Context and line indicate where the variable
+was defined. Type descriptor defines the declared type of the variable.
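+
+The fifth field packs two values into one ``i32``. As a worked example, and
+assuming the line number sits in the low 24 bits with the argument number in
+the high 8 bits (the layout is not spelled out above), the first argument of a
+function, declared on line 7, would be encoded as:
+
+.. code-block:: llvm
+
+ ;; Packing: 7 | (1 << 24) = 7 + 16777216 = 16777223
+ ;; Unpacking: 16777223 & 0xFFFFFF = 7 ;; line number
+ ;; 16777223 >> 24 = 1 ;; argument number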
+
+.. _format_common_intrinsics:
+
+Debugger intrinsic functions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+LLVM uses several intrinsic functions (name prefixed with "``llvm.dbg``") to
+provide debug information at various points in generated code.
+
+``llvm.dbg.declare``
+^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: llvm
+
+ void @llvm.dbg.declare(metadata, metadata)
+
+This intrinsic provides information about a local element (e.g., variable).
+The first argument is metadata holding the alloca for the variable. The second
+argument is metadata containing a description of the variable.
+
+``llvm.dbg.value``
+^^^^^^^^^^^^^^^^^^
+
+.. code-block:: llvm
+
+ void @llvm.dbg.value(metadata, i64, metadata)
+
+This intrinsic provides information when a user source variable is set to a new
+value. The first argument is the new value (wrapped as metadata). The second
+argument is the offset in the user source variable where the new value is
+written. The third argument is metadata containing a description of the user
+source variable.
+
+Object lifetimes and scoping
+============================
+
+In many languages, the local variables in functions can have their lifetimes or
+scopes limited to a subset of a function. In the C family of languages, for
+example, variables are only live (readable and writable) within the source
+block that they are defined in. In functional languages, values are only
+readable after they have been defined. Though this is a very obvious concept,
+it is non-trivial to model in LLVM, because it has no notion of scoping in this
+sense, and does not want to be tied to a language's scoping rules.
+
+In order to handle this, the LLVM debug format uses the metadata attached to
+LLVM instructions to encode line number and scoping information. Consider the
+following C fragment, for example:
+
+.. code-block:: c
+
+ 1. void foo() {
+ 2. int X = 21;
+ 3. int Y = 22;
+ 4. {
+ 5. int Z = 23;
+ 6. Z = X;
+ 7. }
+ 8. X = Y;
+ 9. }
+
+Compiled to LLVM, this function would be represented like this:
+
+.. code-block:: llvm
+
+ define void @foo() nounwind ssp {
+ entry:
+ %X = alloca i32, align 4 ; <i32*> [#uses=4]
+ %Y = alloca i32, align 4 ; <i32*> [#uses=4]
+ %Z = alloca i32, align 4 ; <i32*> [#uses=3]
+ %0 = bitcast i32* %X to {}* ; <{}*> [#uses=1]
+ call void @llvm.dbg.declare(metadata !{i32 * %X}, metadata !0), !dbg !7
+ store i32 21, i32* %X, !dbg !8
+ %1 = bitcast i32* %Y to {}* ; <{}*> [#uses=1]
+ call void @llvm.dbg.declare(metadata !{i32 * %Y}, metadata !9), !dbg !10
+ store i32 22, i32* %Y, !dbg !11
+ %2 = bitcast i32* %Z to {}* ; <{}*> [#uses=1]
+ call void @llvm.dbg.declare(metadata !{i32 * %Z}, metadata !12), !dbg !14
+ store i32 23, i32* %Z, !dbg !15
+ %tmp = load i32* %X, !dbg !16 ; <i32> [#uses=1]
+ %tmp1 = load i32* %Y, !dbg !16 ; <i32> [#uses=1]
+ %add = add nsw i32 %tmp, %tmp1, !dbg !16 ; <i32> [#uses=1]
+ store i32 %add, i32* %Z, !dbg !16
+ %tmp2 = load i32* %Y, !dbg !17 ; <i32> [#uses=1]
+ store i32 %tmp2, i32* %X, !dbg !17
+ ret void, !dbg !18
+ }
+
+ declare void @llvm.dbg.declare(metadata, metadata) nounwind readnone
+
+ !0 = metadata !{i32 459008, metadata !1, metadata !"X",
+ metadata !3, i32 2, metadata !6}; [ DW_TAG_auto_variable ]
+ !1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ]
+ !2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo", metadata !"foo",
+ metadata !"foo", metadata !3, i32 1, metadata !4,
+ i1 false, i1 true}; [DW_TAG_subprogram ]
+ !3 = metadata !{i32 458769, i32 0, i32 12, metadata !"foo.c",
+ metadata !"/private/tmp", metadata !"clang 1.1", i1 true,
+ i1 false, metadata !"", i32 0}; [DW_TAG_compile_unit ]
+ !4 = metadata !{i32 458773, metadata !3, metadata !"", null, i32 0, i64 0, i64 0,
+ i64 0, i32 0, null, metadata !5, i32 0}; [DW_TAG_subroutine_type ]
+ !5 = metadata !{null}
+ !6 = metadata !{i32 458788, metadata !3, metadata !"int", metadata !3, i32 0,
+ i64 32, i64 32, i64 0, i32 0, i32 5}; [DW_TAG_base_type ]
+ !7 = metadata !{i32 2, i32 7, metadata !1, null}
+ !8 = metadata !{i32 2, i32 3, metadata !1, null}
+ !9 = metadata !{i32 459008, metadata !1, metadata !"Y", metadata !3, i32 3,
+ metadata !6}; [ DW_TAG_auto_variable ]
+ !10 = metadata !{i32 3, i32 7, metadata !1, null}
+ !11 = metadata !{i32 3, i32 3, metadata !1, null}
+ !12 = metadata !{i32 459008, metadata !13, metadata !"Z", metadata !3, i32 5,
+ metadata !6}; [ DW_TAG_auto_variable ]
+ !13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ]
+ !14 = metadata !{i32 5, i32 9, metadata !13, null}
+ !15 = metadata !{i32 5, i32 5, metadata !13, null}
+ !16 = metadata !{i32 6, i32 5, metadata !13, null}
+ !17 = metadata !{i32 8, i32 3, metadata !1, null}
+ !18 = metadata !{i32 9, i32 1, metadata !2, null}
+
+This example illustrates a few important details about LLVM debugging
+information. In particular, it shows how the ``llvm.dbg.declare`` intrinsic and
+location information, which are attached to an instruction, are applied
+together to allow a debugger to analyze the relationship between statements,
+variable definitions, and the code used to implement the function.
+
+.. code-block:: llvm
+
+ call void @llvm.dbg.declare(metadata, metadata !0), !dbg !7
+
+The first intrinsic ``llvm.dbg.declare`` encodes debugging information for the
+variable ``X``. The metadata ``!dbg !7`` attached to the intrinsic provides
+scope information for the variable ``X``.
+
+.. code-block:: llvm
+
+ !7 = metadata !{i32 2, i32 7, metadata !1, null}
+ !1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ]
+ !2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo",
+ metadata !"foo", metadata !"foo", metadata !3, i32 1,
+ metadata !4, i1 false, i1 true}; [DW_TAG_subprogram ]
+
+Here ``!7`` is metadata providing location information. It has four fields:
+line number, column number, scope, and original scope. The original scope
+represents inline location if this instruction is inlined inside a caller, and
+is null otherwise. In this example, scope is encoded by ``!1``. ``!1``
+represents a lexical block inside the scope ``!2``, where ``!2`` is a
+:ref:`subprogram descriptor <format_subprograms>`. This way the location
+information attached to the intrinsics indicates that the variable ``X`` is
+declared at line number 2 at a function level scope in function ``foo``.
+
+Now let's take another example.
+
+.. code-block:: llvm
+
+ call void @llvm.dbg.declare(metadata, metadata !12), !dbg !14
+
+The second intrinsic ``llvm.dbg.declare`` encodes debugging information for
+variable ``Z``. The metadata ``!dbg !14`` attached to the intrinsic provides
+scope information for the variable ``Z``.
+
+.. code-block:: llvm
+
+ !13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ]
+ !14 = metadata !{i32 5, i32 9, metadata !13, null}
+
+Here ``!14`` indicates that ``Z`` is declared at line number 5 and
+column number 9 inside of lexical scope ``!13``. The lexical scope itself
+resides inside of lexical scope ``!1`` described above.
+
+The scope information attached to each instruction provides a straightforward
+way to find instructions covered by a scope.
+
+.. _ccxx_frontend:
+
+C/C++ front-end specific debug information
+==========================================
+
+The C and C++ front-ends represent information about the program in a format
+that is effectively identical to `DWARF 3.0
+<http://www.eagercon.com/dwarf/dwarf3std.htm>`_ in terms of information
+content. This allows code generators to trivially support native debuggers by
+generating standard DWARF information, and contains enough information for
+non-DWARF targets to translate it as needed.
+
+This section describes the forms used to represent C and C++ programs. Other
+languages could pattern themselves after this (which itself is tuned to
+representing programs in the same way that DWARF 3 does), or they could choose
+to provide completely different forms if they don't fit into the DWARF model.
+As support for debugging information gets added to the various LLVM
+source-language front-ends, the information used should be documented here.
+
+The following sections provide examples of various C/C++ constructs and the
+debug information that would best describe those constructs.
+
+C/C++ source file information
+-----------------------------
+
+Given the source files ``MySource.cpp`` and ``MyHeader.h`` located in the
+directory ``/Users/mine/sources``, and the following code:
+
+.. code-block:: c
+
+ #include "MyHeader.h"
+
+ int main(int argc, char *argv[]) {
+ return 0;
+ }
+
+a C/C++ front-end would generate the following descriptors:
+
+.. code-block:: llvm
+
+ ...
+ ;;
+ ;; Define the compile unit for the main source file "/Users/mine/sources/MySource.cpp".
+ ;;
+ !2 = metadata !{
+ i32 524305, ;; Tag
+ i32 0, ;; Unused
+ i32 4, ;; Language Id
+ metadata !"MySource.cpp",
+ metadata !"/Users/mine/sources",
+ metadata !"4.2.1 (Based on Apple Inc. build 5649) (LLVM build 00)",
+ i1 true, ;; Main Compile Unit
+ i1 false, ;; Optimized compile unit
+ metadata !"", ;; Compiler flags
+ i32 0} ;; Runtime version
+
+ ;;
+ ;; Define the file for the file "/Users/mine/sources/MySource.cpp".
+ ;;
+ !1 = metadata !{
+ i32 524329, ;; Tag
+ metadata !"MySource.cpp",
+ metadata !"/Users/mine/sources",
+ metadata !2 ;; Compile unit
+ }
+
+ ;;
+ ;; Define the file for the file "/Users/mine/sources/MyHeader.h".
+ ;;
+ !3 = metadata !{
+ i32 524329, ;; Tag
+ metadata !"MyHeader.h",
+ metadata !"/Users/mine/sources",
+ metadata !2 ;; Compile unit
+ }
+
+ ...
+
+``llvm::Instruction`` provides easy access to metadata attached to an
+instruction. One can extract line number information encoded in LLVM IR using
+``Instruction::getMetadata()`` and ``DILocation::getLineNumber()``.
+
+.. code-block:: c++
+
+ if (MDNode *N = I->getMetadata("dbg")) { // Here I is an LLVM instruction
+ DILocation Loc(N); // DILocation is in DebugInfo.h
+ unsigned Line = Loc.getLineNumber();
+ StringRef File = Loc.getFilename();
+ StringRef Dir = Loc.getDirectory();
+ }
+
+C/C++ global variable information
+---------------------------------
+
+Given an integer global variable declared as follows:
+
+.. code-block:: c
+
+ int MyGlobal = 100;
+
+a C/C++ front-end would generate the following descriptors:
+
+.. code-block:: llvm
+
+ ;;
+ ;; Define the global itself.
+ ;;
+ @MyGlobal = global i32 100
+ ...
+ ;;
+ ;; List of debug info of globals
+ ;;
+ !llvm.dbg.cu = !{!0}
+
+ ;; Define the compile unit.
+ !0 = metadata !{
+ i32 786449, ;; Tag
+ i32 0, ;; Context
+ i32 4, ;; Language
+ metadata !"foo.cpp", ;; File
+ metadata !"/Volumes/Data/tmp", ;; Directory
+ metadata !"clang version 3.1 ", ;; Producer
+ i1 true, ;; Deprecated field
+ i1 false, ;; "isOptimized"?
+ metadata !"", ;; Flags
+ i32 0, ;; Runtime Version
+ metadata !1, ;; Enum Types
+ metadata !1, ;; Retained Types
+ metadata !1, ;; Subprograms
+ metadata !3 ;; Global Variables
+ } ; [ DW_TAG_compile_unit ]
+
+ ;; The Array of Global Variables
+ !3 = metadata !{
+ metadata !4
+ }
+
+ !4 = metadata !{
+ metadata !5
+ }
+
+ ;;
+ ;; Define the global variable itself.
+ ;;
+ !5 = metadata !{
+ i32 786484, ;; Tag
+ i32 0, ;; Unused
+ null, ;; Unused
+ metadata !"MyGlobal", ;; Name
+ metadata !"MyGlobal", ;; Display Name
+ metadata !"", ;; Linkage Name
+ metadata !6, ;; File
+ i32 1, ;; Line
+ metadata !7, ;; Type
+ i32 0, ;; IsLocalToUnit
+ i32 1, ;; IsDefinition
+ i32* @MyGlobal ;; LLVM-IR Value
+ } ; [ DW_TAG_variable ]
+
+ ;;
+ ;; Define the file
+ ;;
+ !6 = metadata !{
+ i32 786473, ;; Tag
+ metadata !"foo.cpp", ;; File
+ metadata !"/Volumes/Data/tmp", ;; Directory
+ null ;; Unused
+ } ; [ DW_TAG_file_type ]
+
+ ;;
+ ;; Define the type
+ ;;
+ !7 = metadata !{
+ i32 786468, ;; Tag
+ null, ;; Unused
+ metadata !"int", ;; Name
+ null, ;; Unused
+ i32 0, ;; Line
+ i64 32, ;; Size in Bits
+ i64 32, ;; Align in Bits
+ i64 0, ;; Offset
+ i32 0, ;; Flags
+ i32 5 ;; Encoding
+ } ; [ DW_TAG_base_type ]
+
+C/C++ function information
+--------------------------
+
+Given a function declared as follows:
+
+.. code-block:: c
+
+ int main(int argc, char *argv[]) {
+ return 0;
+ }
+
+a C/C++ front-end would generate the following descriptors:
+
+.. code-block:: llvm
+
+ ;;
+ ;; Define the descriptor for the subprogram. Note that the low 16 bits of
+ ;; the tag field are 46, which is the DWARF tag for subprograms
+ ;; (46 = DW_TAG_subprogram.)
+ ;;
+ !6 = metadata !{
+ i32 524334, ;; Tag
+ i32 0, ;; Unused
+ metadata !1, ;; Context
+ metadata !"main", ;; Name
+ metadata !"main", ;; Display name
+ metadata !"main", ;; Linkage name
+ metadata !1, ;; File
+ i32 1, ;; Line number
+ metadata !4, ;; Type
+ i1 false, ;; Is local
+ i1 true, ;; Is definition
+ i32 0, ;; Virtuality attribute, e.g. pure virtual function
+ i32 0, ;; Index into virtual table for C++ methods
+ i32 0, ;; Type that holds virtual table.
+ i32 0, ;; Flags
+ i1 false, ;; True if this function is optimized
+ Function *, ;; Pointer to llvm::Function
+ null ;; Function template parameters
+ }
+ ;;
+ ;; Define the subprogram itself.
+ ;;
+ define i32 @main(i32 %argc, i8** %argv) {
+ ...
+ }
+
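+The tag values in these examples combine two pieces of information. The
+following is a minimal sketch, assuming the ``LLVMDebugVersion`` convention in
+which the debug version occupies the upper 16 bits of the tag field and the
+DWARF tag the lower 16 bits. The helper names are hypothetical and are not
+part of any LLVM API.
+

```c
#include <assert.h>
#include <stdint.h>

/* Low 16 bits: the DWARF tag, e.g. 46 (0x2e) = DW_TAG_subprogram. */
static uint32_t descriptor_dwarf_tag(uint32_t tag_field) {
    return tag_field & 0xFFFFu;
}

/* High 16 bits: the debug info version used by the producer (assumed layout). */
static uint32_t descriptor_debug_version(uint32_t tag_field) {
    return tag_field >> 16;
}

/* Rebuild a tag field from its two halves. */
static uint32_t make_tag_field(uint32_t version, uint32_t dwarf_tag) {
    assert(version <= 0xFFFFu && dwarf_tag <= 0xFFFFu);
    return (version << 16) | dwarf_tag;
}
```

+Under that assumption, ``524334`` splits into version 8 and tag 46, while
+``786449`` from the global variable example splits into version 12 and tag
+0x11 (``DW_TAG_compile_unit``).
+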
+C/C++ basic types
+-----------------
+
+The following are the basic type descriptors for C/C++ core types:
+
+bool
+^^^^
+
+.. code-block:: llvm
+
+ !2 = metadata !{
+ i32 524324, ;; Tag
+ metadata !1, ;; Context
+ metadata !"bool", ;; Name
+ metadata !1, ;; File
+ i32 0, ;; Line number
+ i64 8, ;; Size in Bits
+ i64 8, ;; Align in Bits
+ i64 0, ;; Offset in Bits
+ i32 0, ;; Flags
+ i32 2 ;; Encoding
+ }
+
+char
+^^^^
+
+.. code-block:: llvm
+
+ !2 = metadata !{
+ i32 524324, ;; Tag
+ metadata !1, ;; Context
+ metadata !"char", ;; Name
+ metadata !1, ;; File
+ i32 0, ;; Line number
+ i64 8, ;; Size in Bits
+ i64 8, ;; Align in Bits
+ i64 0, ;; Offset in Bits
+ i32 0, ;; Flags
+ i32 6 ;; Encoding
+ }
+
+unsigned char
+^^^^^^^^^^^^^
+
+.. code-block:: llvm
+
+ !2 = metadata !{
+ i32 524324, ;; Tag
+ metadata !1, ;; Context
+ metadata !"unsigned char",
+ metadata !1, ;; File
+ i32 0, ;; Line number
+ i64 8, ;; Size in Bits
+ i64 8, ;; Align in Bits
+ i64 0, ;; Offset in Bits
+ i32 0, ;; Flags
+ i32 8 ;; Encoding
+ }
+
+short
+^^^^^
+
+.. code-block:: llvm
+
+ !2 = metadata !{
+ i32 524324, ;; Tag
+ metadata !1, ;; Context
+ metadata !"short int",
+ metadata !1, ;; File
+ i32 0, ;; Line number
+ i64 16, ;; Size in Bits
+ i64 16, ;; Align in Bits
+ i64 0, ;; Offset in Bits
+ i32 0, ;; Flags
+ i32 5 ;; Encoding
+ }
+
+unsigned short
+^^^^^^^^^^^^^^
+
+.. code-block:: llvm
+
+ !2 = metadata !{
+ i32 524324, ;; Tag
+ metadata !1, ;; Context
+ metadata !"short unsigned int",
+ metadata !1, ;; File
+ i32 0, ;; Line number
+ i64 16, ;; Size in Bits
+ i64 16, ;; Align in Bits
+ i64 0, ;; Offset in Bits
+ i32 0, ;; Flags
+ i32 7 ;; Encoding
+ }
+
+int
+^^^
+
+.. code-block:: llvm
+
+ !2 = metadata !{
+ i32 524324, ;; Tag
+ metadata !1, ;; Context
+ metadata !"int", ;; Name
+ metadata !1, ;; File
+ i32 0, ;; Line number
+ i64 32, ;; Size in Bits
+ i64 32, ;; Align in Bits
+ i64 0, ;; Offset in Bits
+ i32 0, ;; Flags
+ i32 5 ;; Encoding
+ }
+
+unsigned int
+^^^^^^^^^^^^
+
+.. code-block:: llvm
+
+ !2 = metadata !{
+ i32 524324, ;; Tag
+ metadata !1, ;; Context
+ metadata !"unsigned int",
+ metadata !1, ;; File
+ i32 0, ;; Line number
+ i64 32, ;; Size in Bits
+ i64 32, ;; Align in Bits
+ i64 0, ;; Offset in Bits
+ i32 0, ;; Flags
+ i32 7 ;; Encoding
+ }
+
+long long
+^^^^^^^^^
+
+.. code-block:: llvm
+
+ !2 = metadata !{
+ i32 524324, ;; Tag
+ metadata !1, ;; Context
+ metadata !"long long int",
+ metadata !1, ;; File
+ i32 0, ;; Line number
+ i64 64, ;; Size in Bits
+ i64 64, ;; Align in Bits
+ i64 0, ;; Offset in Bits
+ i32 0, ;; Flags
+ i32 5 ;; Encoding
+ }
+
+unsigned long long
+^^^^^^^^^^^^^^^^^^
+
+.. code-block:: llvm
+
+ !2 = metadata !{
+ i32 524324, ;; Tag
+ metadata !1, ;; Context
+ metadata !"long long unsigned int",
+ metadata !1, ;; File
+ i32 0, ;; Line number
+ i64 64, ;; Size in Bits
+ i64 64, ;; Align in Bits
+ i64 0, ;; Offset in Bits
+ i32 0, ;; Flags
+ i32 7 ;; Encoding
+ }
+
+float
+^^^^^
+
+.. code-block:: llvm
+
+ !2 = metadata !{
+ i32 524324, ;; Tag
+ metadata !1, ;; Context
+ metadata !"float",
+ metadata !1, ;; File
+ i32 0, ;; Line number
+ i64 32, ;; Size in Bits
+ i64 32, ;; Align in Bits
+ i64 0, ;; Offset in Bits
+ i32 0, ;; Flags
+ i32 4 ;; Encoding
+ }
+
+double
+^^^^^^
+
+.. code-block:: llvm
+
+ !2 = metadata !{
+ i32 524324, ;; Tag
+ metadata !1, ;; Context
+ metadata !"double", ;; Name
+ metadata !1, ;; File
+ i32 0, ;; Line number
+ i64 64, ;; Size in Bits
+ i64 64, ;; Align in Bits
+ i64 0, ;; Offset in Bits
+ i32 0, ;; Flags
+ i32 4 ;; Encoding
+ }
+
+C/C++ derived types
+-------------------
+
+Given the following as an example of C/C++ derived type:
+
+.. code-block:: c
+
+ typedef const int *IntPtr;
+
+a C/C++ front-end would generate the following descriptors:
+
+.. code-block:: llvm
+
+ ;;
+ ;; Define the typedef "IntPtr".
+ ;;
+ !2 = metadata !{
+ i32 524310, ;; Tag
+ metadata !1, ;; Context
+ metadata !"IntPtr", ;; Name
+ metadata !3, ;; File
+ i32 0, ;; Line number
+ i64 0, ;; Size in bits
+ i64 0, ;; Align in bits
+ i64 0, ;; Offset in bits
+ i32 0, ;; Flags
+ metadata !4 ;; Derived From type
+ }
+ ;;
+ ;; Define the pointer type.
+ ;;
+ !4 = metadata !{
+ i32 524303, ;; Tag
+ metadata !1, ;; Context
+ metadata !"", ;; Name
+ metadata !1, ;; File
+ i32 0, ;; Line number
+ i64 64, ;; Size in bits
+ i64 64, ;; Align in bits
+ i64 0, ;; Offset in bits
+ i32 0, ;; Flags
+ metadata !5 ;; Derived From type
+ }
+ ;;
+ ;; Define the const type.
+ ;;
+ !5 = metadata !{
+ i32 524326, ;; Tag
+ metadata !1, ;; Context
+ metadata !"", ;; Name
+ metadata !1, ;; File
+ i32 0, ;; Line number
+ i64 32, ;; Size in bits
+ i64 32, ;; Align in bits
+ i64 0, ;; Offset in bits
+ i32 0, ;; Flags
+ metadata !6 ;; Derived From type
+ }
+ ;;
+ ;; Define the int type.
+ ;;
+ !6 = metadata !{
+ i32 524324, ;; Tag
+ metadata !1, ;; Context
+ metadata !"int", ;; Name
+ metadata !1, ;; File
+ i32 0, ;; Line number
+ i64 32, ;; Size in bits
+ i64 32, ;; Align in bits
+ i64 0, ;; Offset in bits
+ i32 0, ;; Flags
+ i32 5 ;; Encoding
+ }
+
+C/C++ struct/union types
+------------------------
+
+Given the following as an example of C/C++ struct type:
+
+.. code-block:: c
+
+ struct Color {
+ unsigned Red;
+ unsigned Green;
+ unsigned Blue;
+ };
+
+a C/C++ front-end would generate the following descriptors:
+
+.. code-block:: llvm
+
+ ;;
+ ;; Define basic type for unsigned int.
+ ;;
+ !5 = metadata !{
+ i32 524324, ;; Tag
+ metadata !1, ;; Context
+ metadata !"unsigned int",
+ metadata !1, ;; File
+ i32 0, ;; Line number
+ i64 32, ;; Size in Bits
+ i64 32, ;; Align in Bits
+ i64 0, ;; Offset in Bits
+ i32 0, ;; Flags
+ i32 7 ;; Encoding
+ }
+ ;;
+ ;; Define composite type for struct Color.
+ ;;
+ !2 = metadata !{
+ i32 524307, ;; Tag
+ metadata !1, ;; Context
+ metadata !"Color", ;; Name
+ metadata !1, ;; Compile unit
+ i32 1, ;; Line number
+ i64 96, ;; Size in bits
+ i64 32, ;; Align in bits
+ i64 0, ;; Offset in bits
+ i32 0, ;; Flags
+ null, ;; Derived From
+ metadata !3, ;; Elements
+ i32 0 ;; Runtime Language
+ }
+
+ ;;
+ ;; Define the Red field.
+ ;;
+ !4 = metadata !{
+ i32 524301, ;; Tag
+ metadata !1, ;; Context
+ metadata !"Red", ;; Name
+ metadata !1, ;; File
+ i32 2, ;; Line number
+ i64 32, ;; Size in bits
+ i64 32, ;; Align in bits
+ i64 0, ;; Offset in bits
+ i32 0, ;; Flags
+ metadata !5 ;; Derived From type
+ }
+
+ ;;
+ ;; Define the Green field.
+ ;;
+ !6 = metadata !{
+ i32 524301, ;; Tag
+ metadata !1, ;; Context
+ metadata !"Green", ;; Name
+ metadata !1, ;; File
+ i32 3, ;; Line number
+ i64 32, ;; Size in bits
+ i64 32, ;; Align in bits
+ i64 32, ;; Offset in bits
+ i32 0, ;; Flags
+ metadata !5 ;; Derived From type
+ }
+
+ ;;
+ ;; Define the Blue field.
+ ;;
+ !7 = metadata !{
+ i32 524301, ;; Tag
+ metadata !1, ;; Context
+ metadata !"Blue", ;; Name
+ metadata !1, ;; File
+ i32 4, ;; Line number
+ i64 32, ;; Size in bits
+ i64 32, ;; Align in bits
+ i64 64, ;; Offset in bits
+ i32 0, ;; Flags
+ metadata !5 ;; Derived From type
+ }
+
+ ;;
+ ;; Define the array of fields used by the composite type Color.
+ ;;
+ !3 = metadata !{metadata !4, metadata !6, metadata !7}
+
+C/C++ enumeration types
+-----------------------
+
+Given the following as an example of C/C++ enumeration type:
+
+.. code-block:: c
+
+ enum Trees {
+ Spruce = 100,
+ Oak = 200,
+ Maple = 300
+ };
+
+a C/C++ front-end would generate the following descriptors:
+
+.. code-block:: llvm
+
+ ;;
+ ;; Define composite type for enum Trees
+ ;;
+ !2 = metadata !{
+ i32 524292, ;; Tag
+ metadata !1, ;; Context
+ metadata !"Trees", ;; Name
+ metadata !1, ;; File
+ i32 1, ;; Line number
+ i64 32, ;; Size in bits
+ i64 32, ;; Align in bits
+ i64 0, ;; Offset in bits
+ i32 0, ;; Flags
+ null, ;; Derived From type
+ metadata !3, ;; Elements
+ i32 0 ;; Runtime language
+ }
+
+ ;;
+ ;; Define the array of enumerators used by composite type Trees.
+ ;;
+ !3 = metadata !{metadata !4, metadata !5, metadata !6}
+
+ ;;
+ ;; Define Spruce enumerator.
+ ;;
+ !4 = metadata !{i32 524328, metadata !"Spruce", i64 100}
+
+ ;;
+ ;; Define Oak enumerator.
+ ;;
+ !5 = metadata !{i32 524328, metadata !"Oak", i64 200}
+
+ ;;
+ ;; Define Maple enumerator.
+ ;;
+ !6 = metadata !{i32 524328, metadata !"Maple", i64 300}
+
+Debugging information format
+============================
+
+Debugging Information Extension for Objective C Properties
+----------------------------------------------------------
+
+Introduction
+^^^^^^^^^^^^
+
+Objective C provides a simpler way to declare and define accessor methods using
+declared properties. The language provides features to declare a property and
+to let the compiler synthesize accessor methods.
+
+The debugger lets developers inspect Objective C interfaces and their instance
+variables and class variables. However, the debugger does not know anything
+about the properties defined in Objective C interfaces. The debugger consumes
+information generated by the compiler in DWARF format. The format does not support
+encoding of Objective C properties. This proposal describes DWARF extensions to
+encode Objective C properties, which the debugger can use to let developers
+inspect Objective C properties.
+
+Proposal
+^^^^^^^^
+
+Objective C properties exist separately from class members. A property can be
+defined only by "setter" and "getter" selectors, and be calculated anew on each
+access. Or a property can just be a direct access to some declared ivar.
+Finally it can have an ivar "automatically synthesized" for it by the compiler,
+in which case the property can be referred to in user code directly using the
+standard C dereference syntax as well as through the property "dot" syntax, but
+there is no entry in the ``@interface`` declaration corresponding to this ivar.
+
+To facilitate debugging these properties, we will add a new DWARF TAG into the
+``DW_TAG_structure_type`` definition for the class to hold the description of a
+given property, and a set of DWARF attributes that provide said description.
+The property tag will also contain the name and declared type of the property.
+
+If there is a related ivar, there will also be a DWARF property attribute placed
+in the ``DW_TAG_member`` DIE for that ivar referring back to the property TAG
+for that property. And in the case where the compiler synthesizes the ivar
+directly, the compiler is expected to generate a ``DW_TAG_member`` for that
+ivar (with the ``DW_AT_artificial`` set to 1), whose name will be the name used
+to access this ivar directly in code, and with the property attribute pointing
+back to the property it is backing.
+
+The following examples will serve as illustration for our discussion:
+
+.. code-block:: objc
+
+ @interface I1 {
+ int n2;
+ }
+
+ @property int p1;
+ @property int p2;
+ @end
+
+ @implementation I1
+ @synthesize p1;
+ @synthesize p2 = n2;
+ @end
+
+This produces the following DWARF (this is a "pseudo dwarfdump" output):
+
+.. code-block:: none
+
+ 0x00000100: TAG_structure_type [7] *
+ AT_APPLE_runtime_class( 0x10 )
+ AT_name( "I1" )
+ AT_decl_file( "Objc_Property.m" )
+ AT_decl_line( 3 )
+
+ 0x00000110: TAG_APPLE_property
+ AT_name ( "p1" )
+ AT_type ( {0x00000150} ( int ) )
+
+ 0x00000120: TAG_APPLE_property
+ AT_name ( "p2" )
+ AT_type ( {0x00000150} ( int ) )
+
+ 0x00000130: TAG_member [8]
+ AT_name( "_p1" )
+ AT_APPLE_property ( {0x00000110} "p1" )
+ AT_type( {0x00000150} ( int ) )
+ AT_artificial ( 0x1 )
+
+ 0x00000140: TAG_member [8]
+ AT_name( "n2" )
+ AT_APPLE_property ( {0x00000120} "p2" )
+ AT_type( {0x00000150} ( int ) )
+
+ 0x00000150: AT_type( ( int ) )
+
+Note, the current convention is that the name of the ivar for an
+auto-synthesized property is the name of the property from which it derives
+with an underscore prepended, as is shown in the example. But we actually
+don't need to know this convention, since we are given the name of the ivar
+directly.
+
+Also, it is common practice in ObjC to have different property declarations in
+the @interface and @implementation - e.g. to provide a read-only property in
+the interface, and a read-write interface in the implementation. In that case,
+the compiler should emit whichever property declaration will be in force in the
+current translation unit.
+
+Developers can decorate a property with attributes which are encoded using
+``DW_AT_APPLE_property_attribute``.
+
+.. code-block:: objc
+
+ @property (readonly, nonatomic) int pr;
+
+.. code-block:: none
+
+ TAG_APPLE_property [8]
+ AT_name( "pr" )
+ AT_type ( {0x00000147} (int) )
+ AT_APPLE_property_attribute (DW_APPLE_PROPERTY_readonly, DW_APPLE_PROPERTY_nonatomic)
+
+The setter and getter method names are attached to the property using
+``DW_AT_APPLE_property_setter`` and ``DW_AT_APPLE_property_getter`` attributes.
+
+.. code-block:: objc
+
+ @interface I1
+ @property (setter=myOwnP3Setter:) int p3;
+ -(void)myOwnP3Setter:(int)a;
+ @end
+
+ @implementation I1
+ @synthesize p3;
+ -(void)myOwnP3Setter:(int)a{ }
+ @end
+
+The DWARF for this would be:
+
+.. code-block:: none
+
+ 0x000003bd: TAG_structure_type [7] *
+ AT_APPLE_runtime_class( 0x10 )
+ AT_name( "I1" )
+ AT_decl_file( "Objc_Property.m" )
+ AT_decl_line( 3 )
+
+ 0x000003cd: TAG_APPLE_property
+ AT_name ( "p3" )
+ AT_APPLE_property_setter ( "myOwnP3Setter:" )
+ AT_type( {0x00000147} ( int ) )
+
+ 0x000003f3: TAG_member [8]
+ AT_name( "_p3" )
+ AT_type ( {0x00000147} ( int ) )
+ AT_APPLE_property ( {0x000003cd} )
+ AT_artificial ( 0x1 )
+
+New DWARF Tags
+^^^^^^^^^^^^^^
+
++-----------------------+--------+
+| TAG | Value |
++=======================+========+
+| DW_TAG_APPLE_property | 0x4200 |
++-----------------------+--------+
+
+New DWARF Attributes
+^^^^^^^^^^^^^^^^^^^^
+
++--------------------------------+--------+-----------+
+| Attribute | Value | Classes |
++================================+========+===========+
+| DW_AT_APPLE_property | 0x3fed | Reference |
++--------------------------------+--------+-----------+
+| DW_AT_APPLE_property_getter | 0x3fe9 | String |
++--------------------------------+--------+-----------+
+| DW_AT_APPLE_property_setter | 0x3fea | String |
++--------------------------------+--------+-----------+
+| DW_AT_APPLE_property_attribute | 0x3feb | Constant |
++--------------------------------+--------+-----------+
+
+New DWARF Constants
+^^^^^^^^^^^^^^^^^^^
+
++--------------------------------+-------+
+| Name                           | Value |
++================================+=======+
+| DW_APPLE_PROPERTY_readonly     | 0x1   |
++--------------------------------+-------+
+| DW_APPLE_PROPERTY_readwrite    | 0x2   |
++--------------------------------+-------+
+| DW_APPLE_PROPERTY_assign       | 0x4   |
++--------------------------------+-------+
+| DW_APPLE_PROPERTY_retain       | 0x8   |
++--------------------------------+-------+
+| DW_APPLE_PROPERTY_copy         | 0x10  |
++--------------------------------+-------+
+| DW_APPLE_PROPERTY_nonatomic    | 0x20  |
++--------------------------------+-------+
+
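+Because each constant is a distinct power of two, several attributes can share
+one ``DW_AT_APPLE_property_attribute`` value. The sketch below assumes the
+attributes are combined with bitwise OR, as the ``(readonly, nonatomic)``
+example suggests. The function ``describe_property_attributes`` is a
+hypothetical helper, not an existing debugger or LLVM API.
+

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    DW_APPLE_PROPERTY_readonly  = 0x01,
    DW_APPLE_PROPERTY_readwrite = 0x02,
    DW_APPLE_PROPERTY_assign    = 0x04,
    DW_APPLE_PROPERTY_retain    = 0x08,
    DW_APPLE_PROPERTY_copy      = 0x10,
    DW_APPLE_PROPERTY_nonatomic = 0x20
};

/* Writes a comma-separated list of the attributes set in `mask` into `buf`
 * (truncating if `buf` is too small) and returns `buf`. */
static const char *describe_property_attributes(uint32_t mask, char *buf,
                                                size_t size) {
    static const struct { uint32_t bit; const char *name; } names[] = {
        { DW_APPLE_PROPERTY_readonly,  "readonly"  },
        { DW_APPLE_PROPERTY_readwrite, "readwrite" },
        { DW_APPLE_PROPERTY_assign,    "assign"    },
        { DW_APPLE_PROPERTY_retain,    "retain"    },
        { DW_APPLE_PROPERTY_copy,      "copy"      },
        { DW_APPLE_PROPERTY_nonatomic, "nonatomic" }
    };
    size_t i;

    assert(buf != NULL && size > 0);
    buf[0] = '\0';
    for (i = 0; i < sizeof names / sizeof names[0]; ++i) {
        if ((mask & names[i].bit) == 0)
            continue;
        if (buf[0] != '\0')
            strncat(buf, ", ", size - strlen(buf) - 1);
        strncat(buf, names[i].name, size - strlen(buf) - 1);
    }
    return buf;
}
```

+For the ``pr`` property shown earlier, the encoded value under this assumption
+would be ``0x1 | 0x20 = 0x21``.
+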
+Name Accelerator Tables
+-----------------------
+
+Introduction
+^^^^^^^^^^^^
+
+The "``.debug_pubnames``" and "``.debug_pubtypes``" formats are not what a
+debugger needs. The "``pub``" in the section name indicates that the entries
+in the table are publicly visible names only. This means no static or hidden
+functions show up in the "``.debug_pubnames``". No static variables or private
+class variables are in the "``.debug_pubtypes``". Many compilers add different
+things to these tables, so we can't rely upon the contents between gcc, icc, or
+clang.
+
+The typical query given by users tends not to match up with the contents of
+these tables. For example, the DWARF spec states that "In the case of the name
+of a function member or static data member of a C++ structure, class or union,
+the name presented in the "``.debug_pubnames``" section is not the simple name
+given by the ``DW_AT_name`` attribute of the referenced debugging information
+entry, but rather the fully qualified name of the data or function member."
+So the only names in these tables for complex C++ entries are fully
+qualified names. Debugger users tend not to enter their search strings as
+"``a::b::c(int,const Foo&) const``", but rather as "``c``", "``b::c``", or
+"``a::b::c``". So the name entered in the name table must be demangled in
+order to chop it up appropriately and additional names must be manually entered
+into the table to make it effective as a name lookup table for debuggers to
+use.
+
+All debuggers currently ignore the "``.debug_pubnames``" table as a result of
+its inconsistent and useless public-only name content making it a waste of
+space in the object file. These tables, when they are written to disk, are not
+sorted in any way, leaving every debugger to do its own parsing and sorting.
+These tables also include an inlined copy of the string values in the table
+itself making the tables much larger than they need to be on disk, especially
+for large C++ programs.
+
+Can't we just fix the sections by adding all of the names we need to this
+table? No, because that is not what the tables are defined to contain and we
+won't know the difference between the old bad tables and the new good tables.
+At best we could make our own renamed sections that contain all of the data we
+need.
+
+These tables are also insufficient for what a debugger like LLDB needs. LLDB
+uses clang for its expression parsing where LLDB acts as a PCH. LLDB is then
+often asked to look for type "``foo``" or namespace "``bar``", or list items in
+namespace "``baz``". Namespaces are not included in the pubnames or pubtypes
+tables. Since clang asks a lot of questions when it is parsing an expression,
+we need to be very fast when looking up names, as it happens a lot. Having new
+accelerator tables that are optimized for very quick lookups will benefit this
+type of debugging experience greatly.
+
+We would like to generate name lookup tables that can be mapped into memory
+from disk, and used as is, with little or no up-front parsing. We would also
+be able to control the exact content of these different tables so they contain
+exactly what we need. The Name Accelerator Tables were designed to fix these
+issues. In order to solve these issues we need to:
+
+* Have a format that can be mapped into memory from disk and used as is
+* Lookups should be very fast
+* Extensible table format so these tables can be made by many producers
+* Contain all of the names needed for typical lookups out of the box
+* Strict rules for the contents of tables
+
+Table size is important and the accelerator table format should allow the reuse
+of strings from common string tables so the strings for the names are not
+duplicated. We also want to make sure the table is ready to be used as-is by
+simply mapping the table into memory with minimal header parsing.
+
+The name lookups need to be fast and optimized for the kinds of lookups that
+debuggers tend to do. Optimally we would like to touch as few parts of the
+mapped table as possible when doing a name lookup and be able to quickly find
+the name entry we are looking for, or discover there are no matches. In the
+case of debuggers we optimized for lookups that fail most of the time.
+
+Each table that is defined should have strict, documented rules on exactly
+what it contains, so that clients can rely on the content.
+
+Hash Tables
+^^^^^^^^^^^
+
+Standard Hash Tables
+""""""""""""""""""""
+
+Typical hash tables have a header, buckets, and each bucket points to the
+bucket contents:
+
+.. code-block:: none
+
+ .------------.
+ | HEADER |
+ |------------|
+ | BUCKETS |
+ |------------|
+ | DATA |
+ `------------'
+
+The BUCKETS are an array of offsets to DATA for each hash:
+
+.. code-block:: none
+
+ .------------.
+ | 0x00001000 | BUCKETS[0]
+ | 0x00002000 | BUCKETS[1]
+ | 0x00002200 | BUCKETS[2]
+ | 0x000034f0 | BUCKETS[3]
+ | | ...
+ | 0xXXXXXXXX | BUCKETS[n_buckets]
+ '------------'
+
+So for ``bucket[3]`` in the example above, we have an offset into the table
+0x000034f0 which points to a chain of entries for the bucket. Each entry in
+the chain must contain a next pointer, full 32 bit hash value, the string
+itself, and the data for the current string value.
+
+.. code-block:: none
+
+ .------------.
+ 0x000034f0: | 0x00003500 | next pointer
+ | 0x12345678 | 32 bit hash
+ | "erase" | string value
+ | data[n] | HashData for this bucket
+ |------------|
+ 0x00003500: | 0x00003550 | next pointer
+ | 0x29273623 | 32 bit hash
+ | "dump" | string value
+ | data[n] | HashData for this bucket
+ |------------|
+ 0x00003550: | 0x00000000 | next pointer
+ | 0x82638293 | 32 bit hash
+ | "main" | string value
+ | data[n] | HashData for this bucket
+ `------------'
+
+The problem with this layout for debuggers is that we need to optimize for the
+negative lookup case where the symbol we're searching for is not present. So
+if we were to look up "``printf``" in the table above, we would make a 32 bit
+hash for "``printf``", and it might match ``bucket[3]``. We would need to go to
+the offset 0x000034f0 and start looking to see if our 32 bit hash matches. To do
+so, we need to read the next pointer, then read the hash, compare it, and skip
+to the next bucket. Each time we are skipping many bytes in memory and
+touching new cache pages just to do the compare on the full 32 bit hash. All
+of these accesses then tell us that we didn't have a match.
+
+Name Hash Tables
+""""""""""""""""
+
+To solve the issues mentioned above we have structured the hash tables a bit
+differently: a header, buckets, an array of all unique 32 bit hash values,
+followed by an array of hash value data offsets, one for each hash value, then
+the data for all hash values:
+
+.. code-block:: none
+
+ .-------------.
+ | HEADER |
+ |-------------|
+ | BUCKETS |
+ |-------------|
+ | HASHES |
+ |-------------|
+ | OFFSETS |
+ |-------------|
+ | DATA |
+ `-------------'
+
+The ``BUCKETS`` in the name tables are an index into the ``HASHES`` array. By
+making all of the full 32 bit hash values contiguous in memory, we allow
+ourselves to efficiently check for a match while touching as little memory as
+possible. Most often checking the 32 bit hash values is as far as the lookup
+goes. If it does match, it usually is a match with no collisions. So for a
+table with "``n_buckets``" buckets, and "``n_hashes``" unique 32 bit hash
+values, we can clarify the contents of the ``BUCKETS``, ``HASHES`` and
+``OFFSETS`` as:
+
+.. code-block:: none
+
+ .-------------------------.
+ | HEADER.magic | uint32_t
+ | HEADER.version | uint16_t
+ | HEADER.hash_function | uint16_t
+ | HEADER.bucket_count | uint32_t
+ | HEADER.hashes_count | uint32_t
+ | HEADER.header_data_len | uint32_t
+ | HEADER_DATA | HeaderData
+ |-------------------------|
+ | BUCKETS | uint32_t[bucket_count] // 32 bit hash indexes
+ |-------------------------|
+ | HASHES | uint32_t[hashes_count] // 32 bit hash values
+ |-------------------------|
+ | OFFSETS | uint32_t[hashes_count] // 32 bit offsets to hash value data
+ |-------------------------|
+ | ALL HASH DATA |
+ `-------------------------'
+
+So taking the exact same data from the standard hash example above we end up
+with:
+
+.. code-block:: none
+
+ .------------.
+ | HEADER |
+ |------------|
+ | 0 | BUCKETS[0]
+ | 2 | BUCKETS[1]
+ | 5 | BUCKETS[2]
+ | 6 | BUCKETS[3]
+ | | ...
+ | ... | BUCKETS[n_buckets]
+ |------------|
+ | 0x........ | HASHES[0]
+ | 0x........ | HASHES[1]
+ | 0x........ | HASHES[2]
+ | 0x........ | HASHES[3]
+ | 0x........ | HASHES[4]
+ | 0x........ | HASHES[5]
+ | 0x12345678 | HASHES[6] hash for BUCKETS[3]
+ | 0x29273623 | HASHES[7] hash for BUCKETS[3]
+ | 0x82638293 | HASHES[8] hash for BUCKETS[3]
+ | 0x........ | HASHES[9]
+ | 0x........ | HASHES[10]
+ | 0x........ | HASHES[11]
+ | 0x........ | HASHES[12]
+ | 0x........ | HASHES[13]
+ | 0x........ | HASHES[n_hashes]
+ |------------|
+ | 0x........ | OFFSETS[0]
+ | 0x........ | OFFSETS[1]
+ | 0x........ | OFFSETS[2]
+ | 0x........ | OFFSETS[3]
+ | 0x........ | OFFSETS[4]
+ | 0x........ | OFFSETS[5]
+ | 0x000034f0 | OFFSETS[6] offset for BUCKETS[3]
+ | 0x00003500 | OFFSETS[7] offset for BUCKETS[3]
+ | 0x00003550 | OFFSETS[8] offset for BUCKETS[3]
+ | 0x........ | OFFSETS[9]
+ | 0x........ | OFFSETS[10]
+ | 0x........ | OFFSETS[11]
+ | 0x........ | OFFSETS[12]
+ | 0x........ | OFFSETS[13]
+ | 0x........ | OFFSETS[n_hashes]
+ |------------|
+ | |
+ | |
+ | |
+ | |
+ | |
+ |------------|
+ 0x000034f0: | 0x00001203 | String offset into .debug_str ("erase")
+ | 0x00000004 | A 32 bit array count - number of HashData with name "erase"
+ | 0x........ | HashData[0]
+ | 0x........ | HashData[1]
+ | 0x........ | HashData[2]
+ | 0x........ | HashData[3]
+ | 0x00000000 | String offset into .debug_str (terminate data for hash)
+ |------------|
+ 0x00003500: | 0x00001203 | String offset into .debug_str ("collision")
+ | 0x00000002 | A 32 bit array count - number of HashData with name "collision"
+ | 0x........ | HashData[0]
+ | 0x........ | HashData[1]
+ | 0x00001203 | String offset into .debug_str ("dump")
+ | 0x00000003 | A 32 bit array count - number of HashData with name "dump"
+ | 0x........ | HashData[0]
+ | 0x........ | HashData[1]
+ | 0x........ | HashData[2]
+ | 0x00000000 | String offset into .debug_str (terminate data for hash)
+ |------------|
+ 0x00003550: | 0x00001203 | String offset into .debug_str ("main")
+ | 0x00000009 | A 32 bit array count - number of HashData with name "main"
+ | 0x........ | HashData[0]
+ | 0x........ | HashData[1]
+ | 0x........ | HashData[2]
+ | 0x........ | HashData[3]
+ | 0x........ | HashData[4]
+ | 0x........ | HashData[5]
+ | 0x........ | HashData[6]
+ | 0x........ | HashData[7]
+ | 0x........ | HashData[8]
+ | 0x00000000 | String offset into .debug_str (terminate data for hash)
+ `------------'
+
+So we still have all of the same data, we just organize it more efficiently for
+debugger lookup. If we repeat the same "``printf``" lookup from above, we
+would hash "``printf``" and find it matches ``BUCKETS[3]`` by taking the 32 bit
+hash value and modulo it by ``n_buckets``. ``BUCKETS[3]`` contains "6" which
+is the index into the ``HASHES`` table. We would then compare any consecutive
+32 bit hash values in the ``HASHES`` array as long as the hashes would be in
+``BUCKETS[3]``. We do this by verifying that each subsequent hash value modulo
+``n_buckets`` is still 3. In the case of a failed lookup we would access the
+memory for ``BUCKETS[3]``, and then compare a few consecutive 32 bit hashes
+before we know that we have no match. We don't end up marching through
+multiple words of memory and we really keep the number of processor data cache
+lines being accessed as small as possible.
+
+The string hash that is used for these lookup tables is the Daniel J.
+Bernstein hash which is also used in the ELF ``GNU_HASH`` sections. It is a
+very good hash for all kinds of names in programs with very few hash
+collisions.
+
+Empty buckets are designated by using an invalid hash index of ``UINT32_MAX``.
+
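+The lookup described above can be sketched in a few lines of C. This is an
+illustration only: ``djb_hash`` is the classic Bernstein hash (start at 5381,
+multiply by 33, add each byte), and ``find_hash_index`` is a hypothetical
+helper that takes the ``BUCKETS`` and ``HASHES`` arrays as plain in-memory
+arrays already converted to host byte order.
+

```c
#include <assert.h>
#include <stdint.h>

/* Daniel J. Bernstein string hash: h = h * 33 + c, starting from 5381. */
static uint32_t djb_hash(const char *s) {
    uint32_t h = 5381u;
    for (; *s != '\0'; ++s)
        h = (h << 5) + h + (unsigned char)*s;
    return h;
}

/* Returns the index in hashes[] (and therefore offsets[]) whose value equals
 * `hash`, or UINT32_MAX when the table holds no such hash. */
static uint32_t find_hash_index(const uint32_t *buckets, uint32_t bucket_count,
                                const uint32_t *hashes, uint32_t hashes_count,
                                uint32_t hash) {
    uint32_t bucket, i;

    assert(bucket_count != 0);
    bucket = hash % bucket_count;
    i = buckets[bucket];
    if (i == UINT32_MAX)            /* empty bucket */
        return UINT32_MAX;
    /* Walk consecutive hashes while they still belong to this bucket. */
    for (; i < hashes_count && hashes[i] % bucket_count == bucket; ++i) {
        if (hashes[i] == hash)
            return i;
    }
    return UINT32_MAX;
}
```

+A failed lookup reads one ``BUCKETS`` entry and a short run of adjacent
+``HASHES`` entries. After a hash match, a complete implementation would still
+follow ``OFFSETS[i]`` and compare the string itself, since different names can
+share a 32 bit hash.
+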
+Details
+^^^^^^^
+
+These name hash tables are designed to be generic where specializations of the
+table get to define additional data that goes into the header ("``HeaderData``"),
+how the string value is stored ("``KeyType``") and the content of the data for each
+hash value.
+
+Header Layout
+"""""""""""""
+
+The header has a fixed part, and the specialized part. The exact format of the
+header is:
+
+.. code-block:: c
+
+ struct Header
+ {
+ uint32_t magic; // 'HASH' magic value to allow endian detection
+ uint16_t version; // Version number
+ uint16_t hash_function; // The hash function enumeration that was used
+ uint32_t bucket_count; // The number of buckets in this hash table
+ uint32_t hashes_count; // The total number of unique hash values and hash data offsets in this table
+ uint32_t header_data_len; // The bytes to skip to get to the hash indexes (buckets) for correct alignment
+ // Specifically the length of the following HeaderData field - this does not
+ // include the size of the preceding fields
+ HeaderData header_data; // Implementation specific header data
+ };
+
+The header starts with a 32 bit "``magic``" value which must be ``'HASH'``
+encoded as an ASCII integer. This allows the detection of the start of the
+hash table and also allows the table's byte order to be determined so the table
+can be correctly extracted. The "``magic``" value is followed by a 16 bit
+``version`` number which allows the table to be revised and modified in the
+future. The current version number is 1. ``hash_function`` is a ``uint16_t``
+enumeration that specifies which hash function was used to produce this table.
+The current values for the hash function enumerations include:
+
+.. code-block:: c
+
+ enum HashFunctionType
+ {
+ eHashFunctionDJB = 0u, // Daniel J Bernstein hash function
+ };
+
+``bucket_count`` is a 32 bit unsigned integer that represents how many buckets
+are in the ``BUCKETS`` array. ``hashes_count`` is the number of unique 32 bit
+hash values that are in the ``HASHES`` array, which is also the number of
+offsets contained in the ``OFFSETS`` array. ``header_data_len`` specifies the size
+in bytes of the ``HeaderData`` that is filled in by specialized versions of
+this table.
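+
+The byte order check that the "``magic``" value enables could be sketched as
+follows. ``HASH_MAGIC``, ``swap32`` and ``detect_order`` are hypothetical
+names, and the constant assumes ``'HASH'`` is packed with ``'H'`` in the most
+significant byte:
+

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_MAGIC 0x48415348u /* 'H' 'A' 'S' 'H' packed into 32 bits (assumed packing) */

enum ByteOrder { ORDER_INVALID, ORDER_NATIVE, ORDER_SWAPPED };

static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Inspect the first four bytes of a table and classify its byte order. */
static enum ByteOrder detect_order(const uint8_t *data, size_t size)
{
    uint32_t magic;
    assert(data != NULL);
    if (size < sizeof magic)
        return ORDER_INVALID;
    memcpy(&magic, data, sizeof magic); /* avoids unaligned access */
    if (magic == HASH_MAGIC)
        return ORDER_NATIVE;
    if (swap32(magic) == HASH_MAGIC)
        return ORDER_SWAPPED;           /* every later field needs swapping */
    return ORDER_INVALID;
}
```

+
+A reader that gets ``ORDER_SWAPPED`` would byte swap every remaining header
+and table field before using it.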
+
+Fixed Lookup
+""""""""""""
+
+The header is followed by the buckets, hashes, offsets, and hash value data.
+
+.. code-block:: c
+
+ struct FixedTable
+ {
+ uint32_t buckets[Header.bucket_count]; // An array of hash indexes into the "hashes[]" array below
+ uint32_t hashes [Header.hashes_count]; // Every unique 32 bit hash for the entire table is in this table
+ uint32_t offsets[Header.hashes_count]; // An offset that corresponds to each item in the "hashes[]" array above
+ };
+
+``buckets`` is an array of 32 bit indexes into the ``hashes`` array. The
+``hashes`` array contains all of the 32 bit hash values for all names in the
+hash table. Each hash in the ``hashes`` table has an offset in the ``offsets``
+array that points to the data for the hash value.
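+
+A lookup over these three arrays might look like the sketch below. It assumes
+conventions that this section does not restate: the bucket is chosen as
+``hash % bucket_count``, an empty bucket holds ``UINT32_MAX``, and all hashes
+that belong to one bucket are stored contiguously. ``find_hash_data_offset``
+and ``NOT_FOUND`` are hypothetical names:
+

```c
#include <assert.h>
#include <stdint.h>

#define NOT_FOUND UINT32_MAX

/* Return the offsets[] entry that corresponds to `hash`, or NOT_FOUND. */
static uint32_t find_hash_data_offset(const uint32_t *buckets, uint32_t bucket_count,
                                      const uint32_t *hashes, const uint32_t *offsets,
                                      uint32_t hashes_count, uint32_t hash)
{
    assert(bucket_count != 0);
    uint32_t bucket = hash % bucket_count;
    uint32_t i = buckets[bucket];
    if (i == UINT32_MAX)                /* assumed marker for an empty bucket */
        return NOT_FOUND;
    /* Scan only while the hashes still belong to this bucket. */
    for (; i < hashes_count && hashes[i] % bucket_count == bucket; ++i)
        if (hashes[i] == hash)
            return offsets[i];
    return NOT_FOUND;
}
```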
+
+This table setup makes it very easy to repurpose these tables to contain
+different data, while keeping the lookup mechanism the same for all tables.
+This layout also makes it possible to save the table to disk and map it in
+later and do very efficient name lookups with little or no parsing.
+
+DWARF lookup tables can be implemented in a variety of ways and can store a lot
+of information for each name. We want to make the DWARF tables extensible and
+able to store the data efficiently so we have used some of the DWARF features
+that enable efficient data storage to define exactly what kind of data we store
+for each name.
+
+The ``HeaderData`` contains a definition of the contents of each HashData chunk.
+We might want to store an offset to all of the debug information entries (DIEs)
+for each name. To keep things extensible, we create a list of items, or
+Atoms, that are contained in the data for each name. First comes the type of
+the data in each atom:
+
+.. code-block:: c
+
+ enum AtomType
+ {
+ eAtomTypeNULL = 0u,
+ eAtomTypeDIEOffset = 1u, // DIE offset, check form for encoding
+ eAtomTypeCUOffset = 2u, // DIE offset of the compile unit header that contains the item in question
+ eAtomTypeTag = 3u, // DW_TAG_xxx value, should be encoded as DW_FORM_data1 (if no tags exceed 255) or DW_FORM_data2
+ eAtomTypeNameFlags = 4u, // Flags from enum NameFlags
+ eAtomTypeTypeFlags = 5u, // Flags from enum TypeFlags
+ };
+
+The enumeration values and their meanings are:
+
+.. code-block:: none
+
+ eAtomTypeNULL - a termination atom that specifies the end of the atom list
+ eAtomTypeDIEOffset - an offset into the .debug_info section for the DWARF DIE for this name
+ eAtomTypeCUOffset - an offset into the .debug_info section for the CU that contains the DIE
+ eAtomTypeTag - The DW_TAG_XXX enumeration value so you don't have to parse the DWARF to see what it is
+ eAtomTypeNameFlags - Flags for functions and global variables (isFunction, isInlined, isExternal...)
+ eAtomTypeTypeFlags - Flags for types (isCXXClass, isObjCClass, ...)
+
+Each atom then records its type and how the data for that atom is
+encoded:
+
+.. code-block:: c
+
+ struct Atom
+ {
+ uint16_t type; // AtomType enum value
+ uint16_t form; // DWARF DW_FORM_XXX defines
+ };
+
+The ``form`` type above is from the DWARF specification and defines the exact
+encoding of the data for the Atom type. See the DWARF specification for the
+``DW_FORM_`` definitions.
+
+.. code-block:: c
+
+ struct HeaderData
+ {
+ uint32_t die_offset_base;
+ uint32_t atom_count;
+ Atom atoms[atom_count];
+ };
+
+``HeaderData`` defines the base DIE offset that should be added to any atoms
+that are encoded using the ``DW_FORM_ref1``, ``DW_FORM_ref2``,
+``DW_FORM_ref4``, ``DW_FORM_ref8`` or ``DW_FORM_ref_udata`` forms. It also defines
+what is contained in each ``HashData`` object -- ``Atom.form`` tells us how large
+each field will be in the ``HashData`` and the ``Atom.type`` tells us how this data
+should be interpreted.
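+
+As an illustration of how ``Atom.form`` fixes the size of each field, the
+sketch below sums the field sizes of one ``HashData`` entry. It handles only
+the fixed-size ``DW_FORM_data*`` and ``DW_FORM_ref*`` forms, and
+``form_size`` and ``hash_data_size`` are hypothetical helpers, not part of
+any LLVM API:
+

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* DW_FORM_* values from the DWARF specification (fixed-size forms only). */
enum {
    DW_FORM_data2 = 0x05, DW_FORM_data4 = 0x06, DW_FORM_data8 = 0x07,
    DW_FORM_data1 = 0x0b, DW_FORM_ref1 = 0x11, DW_FORM_ref2 = 0x12,
    DW_FORM_ref4 = 0x13, DW_FORM_ref8 = 0x14
};

struct Atom { uint16_t type; uint16_t form; };

/* Byte size of one form, or 0 if this sketch does not handle it. */
static size_t form_size(uint16_t form)
{
    switch (form) {
    case DW_FORM_data1: case DW_FORM_ref1: return 1;
    case DW_FORM_data2: case DW_FORM_ref2: return 2;
    case DW_FORM_data4: case DW_FORM_ref4: return 4;
    case DW_FORM_data8: case DW_FORM_ref8: return 8;
    default: return 0;
    }
}

/* Size of one HashData entry; returns 0 if any atom uses an unhandled form. */
static size_t hash_data_size(const struct Atom *atoms, uint32_t atom_count)
{
    size_t total = 0;
    assert(atoms != NULL || atom_count == 0);
    for (uint32_t i = 0; i < atom_count; ++i) {
        size_t n = form_size(atoms[i].form);
        if (n == 0)
            return 0;
        total += n;
    }
    return total;
}
```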
+
+For the current implementations of the "``.apple_names``" (all functions +
+globals), the "``.apple_types``" (names of all types that are defined), and
+the "``.apple_namespaces``" (all namespaces), we currently set the ``Atom``
+array to be:
+
+.. code-block:: c
+
+ HeaderData.atom_count = 1;
+ HeaderData.atoms[0].type = eAtomTypeDIEOffset;
+ HeaderData.atoms[0].form = DW_FORM_data4;
+
+This defines the contents to be the DIE offset (``eAtomTypeDIEOffset``) that is
+encoded as a 32 bit value (``DW_FORM_data4``). This allows a single name to have
+multiple matching DIEs in a single file, which could come up with an inlined
+function for instance. Future tables could include more information about the
+DIE such as flags indicating if the DIE is a function, method, block,
+or inlined.
+
+The ``KeyType`` for the DWARF table is a 32 bit string table offset into the
+"``.debug_str``" table. The "``.debug_str``" is the string table for the DWARF which
+may already contain copies of all of the strings. This helps make sure, with
+help from the compiler, that we reuse the strings between all of the DWARF
+sections and keeps the hash table size down. Another benefit to having the
+compiler generate all strings as ``DW_FORM_strp`` in the debug info, is that
+DWARF parsing can be made much faster.
+
+After a lookup is made, we get an offset into the hash data. The hash data
+needs to be able to deal with 32 bit hash collisions, so the chunk of data
+at the offset in the hash data consists of a triple:
+
+.. code-block:: c
+
+ uint32_t str_offset
+ uint32_t hash_data_count
+ HashData[hash_data_count]
+
+If "``str_offset``" is zero, then the bucket contents are done. 99.9% of the
+hash data chunks contain a single item (no 32 bit hash collision):
+
+.. code-block:: none
+
+ .------------.
+ | 0x00001023 | uint32_t KeyType (.debug_str[0x00001023] => "main")
+ | 0x00000004 | uint32_t HashData count
+ | 0x........ | uint32_t HashData[0] DIE offset
+ | 0x........ | uint32_t HashData[1] DIE offset
+ | 0x........ | uint32_t HashData[2] DIE offset
+ | 0x........ | uint32_t HashData[3] DIE offset
+ | 0x00000000 | uint32_t KeyType (end of hash chain)
+ `------------'
+
+If there are collisions, you will have multiple valid string offsets:
+
+.. code-block:: none
+
+ .------------.
+ | 0x00001023 | uint32_t KeyType (.debug_str[0x00001023] => "main")
+ | 0x00000004 | uint32_t HashData count
+ | 0x........ | uint32_t HashData[0] DIE offset
+ | 0x........ | uint32_t HashData[1] DIE offset
+ | 0x........ | uint32_t HashData[2] DIE offset
+ | 0x........ | uint32_t HashData[3] DIE offset
+ | 0x00002023 | uint32_t KeyType (.debug_str[0x00002023] => "print")
+ | 0x00000002 | uint32_t HashData count
+ | 0x........ | uint32_t HashData[0] DIE offset
+ | 0x........ | uint32_t HashData[1] DIE offset
+ | 0x00000000 | uint32_t KeyType (end of hash chain)
+ `------------'
+
+Current testing with real world C++ binaries has shown that there is around one
+32 bit hash collision per 100,000 name entries.
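+
+Walking one hash data chunk to resolve such collisions could be sketched as
+below. The sketch assumes the single 32 bit DIE offset atom described above,
+data already in native byte order, and 4 byte aligned storage. ``find_name``
+is a hypothetical helper:
+

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Walk one hash chain and return a pointer to the DIE offsets stored for
 * `name`, or NULL if the name is absent. `chain` points at the first
 * str_offset of the chunk; `debug_str` is the start of the string table.
 */
static const uint32_t *find_name(const uint32_t *chain, const char *debug_str,
                                 const char *name, uint32_t *count_out)
{
    assert(chain && debug_str && name && count_out);
    while (chain[0] != 0) {              /* str_offset == 0 ends the chain */
        uint32_t str_offset = chain[0];
        uint32_t count = chain[1];       /* hash_data_count */
        if (strcmp(debug_str + str_offset, name) == 0) {
            *count_out = count;
            return chain + 2;            /* HashData[0] */
        }
        chain += 2 + count;              /* skip to the next colliding name */
    }
    *count_out = 0;
    return NULL;
}
```

+
+Comparing the actual string is what separates two names that share a 32 bit
+hash; the hash alone only narrows the search to one chain.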
+
+Contents
+^^^^^^^^
+
+As we said, we want to strictly define exactly what is included in the
+different tables. For DWARF, we have 3 tables: "``.apple_names``",
+"``.apple_types``", and "``.apple_namespaces``".
+
+"``.apple_names``" sections should contain an entry for each DWARF DIE whose
+``DW_TAG`` is a ``DW_TAG_label``, ``DW_TAG_inlined_subroutine``, or
+``DW_TAG_subprogram`` that has address attributes: ``DW_AT_low_pc``,
+``DW_AT_high_pc``, ``DW_AT_ranges`` or ``DW_AT_entry_pc``. It also contains
+``DW_TAG_variable`` DIEs that have a ``DW_OP_addr`` in the location (global and
+static variables). All global and static variables should be included,
+including those scoped within functions and classes. For example using the
+following code:
+
+.. code-block:: c
+
+ static int var = 0;
+
+ void f ()
+ {
+ static int var = 0;
+ }
+
+Both of the static ``var`` variables would be included in the table. All
+functions should emit both their full names and their basenames. For C or C++,
+the full name is the mangled name (if available) which is usually in the
+``DW_AT_MIPS_linkage_name`` attribute, and the ``DW_AT_name`` contains the
+function basename. If global or static variables have a mangled name in a
+``DW_AT_MIPS_linkage_name`` attribute, this should be emitted along with the
+simple name found in the ``DW_AT_name`` attribute.
+
+"``.apple_types``" sections should contain an entry for each DWARF DIE whose
+tag is one of:
+
+* DW_TAG_array_type
+* DW_TAG_class_type
+* DW_TAG_enumeration_type
+* DW_TAG_pointer_type
+* DW_TAG_reference_type
+* DW_TAG_string_type
+* DW_TAG_structure_type
+* DW_TAG_subroutine_type
+* DW_TAG_typedef
+* DW_TAG_union_type
+* DW_TAG_ptr_to_member_type
+* DW_TAG_set_type
+* DW_TAG_subrange_type
+* DW_TAG_base_type
+* DW_TAG_const_type
+* DW_TAG_constant
+* DW_TAG_file_type
+* DW_TAG_namelist
+* DW_TAG_packed_type
+* DW_TAG_volatile_type
+* DW_TAG_restrict_type
+* DW_TAG_interface_type
+* DW_TAG_unspecified_type
+* DW_TAG_shared_type
+
+Only entries with a ``DW_AT_name`` attribute are included, and the entry must
+not be a forward declaration (``DW_AT_declaration`` attribute with a non-zero
+value). For example, using the following code:
+
+.. code-block:: c
+
+ int main ()
+ {
+ int *b = 0;
+ return *b;
+ }
+
+We get a few type DIEs:
+
+.. code-block:: none
+
+ 0x00000067: TAG_base_type [5]
+ AT_encoding( DW_ATE_signed )
+ AT_name( "int" )
+ AT_byte_size( 0x04 )
+
+ 0x0000006e: TAG_pointer_type [6]
+ AT_type( {0x00000067} ( int ) )
+ AT_byte_size( 0x08 )
+
+The ``DW_TAG_pointer_type`` is not included because it does not have a ``DW_AT_name``.
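+
+The two inclusion rules can be expressed as a small predicate. ``TypeDie`` is
+a hypothetical, pared-down view of a DIE used only for illustration, and the
+sketch assumes the tag has already been checked against the list above:
+

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, pared-down view of a type DIE; not an LLVM data structure. */
struct TypeDie {
    const char *name;     /* DW_AT_name, or NULL when the attribute is absent */
    bool is_declaration;  /* true when DW_AT_declaration is non-zero */
};

/* Apply the name and forward declaration rules for ".apple_types". */
static bool belongs_in_apple_types(const struct TypeDie *die)
{
    assert(die != NULL);
    return die->name != NULL && !die->is_declaration;
}
```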
+
+The "``.apple_namespaces``" section should contain all ``DW_TAG_namespace`` DIEs.
+A namespace that has no name is an anonymous namespace, and
+its name should be output as "``(anonymous namespace)``" (without the quotes).
+This matches the output of the ``abi::__cxa_demangle()`` function that is in the
+standard C++ library that demangles mangled names.
+
+
+Language Extensions and File Format Changes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Objective-C Extensions
+""""""""""""""""""""""
+
+The "``.apple_objc``" section should contain all ``DW_TAG_subprogram`` DIEs for an
+Objective-C class. The name used in the hash table is the name of the
+Objective-C class itself. If the Objective-C class has a category, then an
+entry is made for both the class name without the category, and for the class
+name with the category. So if we have a DIE at offset 0x1234 for a method named
+"``-[NSString(my_additions) stringWithSpecialString:]``", we would add
+an entry for "``NSString``" that points to DIE 0x1234, and an entry for
+"``NSString(my_additions)``" that points to 0x1234. This allows us to quickly
+track down all Objective-C methods for an Objective-C class when doing
+expressions. It is needed because of the dynamic nature of Objective-C where
+anyone can add methods to a class. The DWARF for Objective-C methods is also
+emitted differently from C++ classes: the methods are not usually contained in
+the class definition, but are scattered across one or more
+compile units. Categories can also be defined in different shared libraries.
+So we need to be able to quickly find all of the methods and class functions
+given the Objective-C class name, or quickly find all methods and class
+functions for a class + category name. This table does not contain any
+selector names; it just maps Objective-C class names (or class names +
+category) to all of the methods and class functions. The selectors are added
+as function basenames in the "``.apple_names``" section.
+
+In the "``.apple_names``" section for Objective-C functions, the full name is
+the entire function name with the brackets ("``-[NSString
+stringWithCString:]``") and the basename is the selector only
+("``stringWithCString:``").
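+
+Deriving the class name, the class + category name, and the selector from a
+full Objective-C method name could be sketched as follows. ``split_objc_name``
+and ``copy_span`` are hypothetical helpers, and the parsing assumes the
+``-[Class(Category) selector]`` spelling shown above:
+

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy src[0..len) into dst as a C string; fails if it does not fit. */
static bool copy_span(char *dst, size_t cap, const char *src, size_t len)
{
    if (len + 1 > cap)
        return false;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return true;
}

/*
 * Split "-[Class(Category) selector]" into the ".apple_objc" keys (class and
 * class + category) and the ".apple_names" basename (selector). Each output
 * buffer must hold `cap` bytes.
 */
static bool split_objc_name(const char *full, char *cls, char *cls_cat,
                            char *sel, size_t cap)
{
    assert(full && cls && cls_cat && sel);
    if ((full[0] != '-' && full[0] != '+') || full[1] != '[')
        return false;
    const char *start = full + 2;
    const char *space = strchr(start, ' ');
    const char *close = strrchr(start, ']');
    if (space == NULL || close == NULL || close < space)
        return false;
    const char *paren = memchr(start, '(', (size_t)(space - start));
    size_t cls_len = (size_t)((paren ? paren : space) - start);
    return copy_span(cls, cap, start, cls_len) &&
           copy_span(cls_cat, cap, start, (size_t)(space - start)) &&
           copy_span(sel, cap, space + 1, (size_t)(close - space - 1));
}
```

+
+For a method without a category, the class and class + category keys come out
+identical, so only one "``.apple_objc``" entry would be needed.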
+
+Mach-O Changes
+""""""""""""""
+
+The section names given above for the Apple hash tables are for non-Mach-O
+files. For Mach-O files, the sections should be contained in the ``__DWARF``
+segment with names as follows:
+
+* "``.apple_names``" -> "``__apple_names``"
+* "``.apple_types``" -> "``__apple_types``"
+* "``.apple_namespaces``" -> "``__apple_namespac``" (16 character limit)
+* "``.apple_objc``" -> "``__apple_objc``"
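+
+The renaming in the list above appears to follow one rule: replace the leading
+"``.``" with "``__``" and keep at most 16 characters. A sketch under that
+assumption, with ``macho_section_name`` as a hypothetical helper:
+

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MACHO_SECT_NAME_MAX 16 /* character limit noted in the list above */

/*
 * Derive the Mach-O section name from the generic one. `out` must hold
 * MACHO_SECT_NAME_MAX + 1 bytes; snprintf truncates longer names.
 */
static void macho_section_name(const char *name, char *out)
{
    assert(name != NULL && out != NULL && name[0] == '.');
    snprintf(out, MACHO_SECT_NAME_MAX + 1, "__%s", name + 1);
}
```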
+
diff --git a/docs/subsystems.rst b/docs/subsystems.rst
index 7bd9617e27..35d7b8111d 100644
--- a/docs/subsystems.rst
+++ b/docs/subsystems.rst
@@ -20,6 +20,7 @@ Subsystem Documentation
MarkedUpDisassembly
HowToUseInstrMappings
SystemLibrary
+ SourceLevelDebugging
.. FIXME: once LangRef is Sphinxified, HowToUseInstrMappings should be put
under LangRef's toctree instead of this page's toctree.
@@ -53,7 +54,7 @@ Subsystem Documentation
The interfaces source-language compilers should use for compiling GC'd
programs.
-* `Source Level Debugging with LLVM <SourceLevelDebugging.html>`_
+* :doc:`Source Level Debugging with LLVM <SourceLevelDebugging>`
This document describes the design and philosophy behind the LLVM
source-level debugger.