author    Eric Christopher <echristo@apple.com>    2012-03-06 02:25:38 +0000
committer Eric Christopher <echristo@apple.com>    2012-03-06 02:25:38 +0000
commit    25e6329e68006abff78cea9c64d229eea8d1291e (patch)
tree      d682a381742fbf764ce7d27a8f0d4ab7b26090bd /docs/SourceLevelDebugging.html
parent    fc7243a1f616c0987c115be5f5be1ac044136a2d (diff)
Add the beginnings of documentation for the Name Accelerator Tables.
Based on a writeup originally by Greg Clayton. Abuse div and pre tags horribly. Needs a bit more cleanup. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152093 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'docs/SourceLevelDebugging.html')
-rw-r--r--  docs/SourceLevelDebugging.html  664
1 file changed, 663 insertions, 1 deletion
diff --git a/docs/SourceLevelDebugging.html b/docs/SourceLevelDebugging.html
index 399187d0da..8c7ae530f4 100644
--- a/docs/SourceLevelDebugging.html
+++ b/docs/SourceLevelDebugging.html
@@ -63,7 +63,14 @@
<li><a href="#objcpropertynewattributes">New DWARF Attributes</a></li>
<li><a href="#objcpropertynewconstants">New DWARF Constants</a></li>
</ul>
-
+ <li><a href="#acceltable">Name Accelerator Tables</a></li>
+ <ul>
+ <li><a href="#acceltableintroduction">Introduction</a></li>
+ <li><a href="#acceltablehashes">Hash Tables</a></li>
+ <li><a href="#acceltabledetails">Details</a></li>
+ <li><a href="#acceltablecontents">Contents</a></li>
+ <li><a href="#acceltableextensions">Language Extensions and File Format Changes</a></li>
+ </ul>
</ol>
</li>
</ul>
@@ -2116,6 +2123,661 @@ The DWARF for this would be:
</div>
</div>
+<div>
+<!-- ======================================================================= -->
+<h3>
+ <a name="acceltable">Name Accelerator Tables</a>
+</h3>
+<!-- ======================================================================= -->
+<!-- ======================================================================= -->
+<h4>
+  <a name="acceltableintroduction">Introduction</a>
+</h4>
+<!-- ======================================================================= -->
+<div>
+<p>The .debug_pubnames and .debug_pubtypes formats are not what a debugger
+ needs. The "pub" in the section name indicates that the entries in the
+ table are publicly visible names only. This means no static or hidden
+ functions show up in the .debug_pubnames. No static variables or private class
+   variables are in the .debug_pubtypes. Many compilers add different things to
+   these tables, so we can't rely upon the contents being consistent between
+   gcc, icc, and clang.
+
+<p>The typical query given by users tends not to match up with the contents of
+ these tables. For example, the DWARF spec states that "In the case of the
+ name of a function member or static data member of a C++ structure, class or
+ union, the name presented in the .debug_pubnames section is not the simple
+ name given by the DW_AT_name attribute of the referenced debugging information
+ entry, but rather the fully qualified name of the data or function member."
+   So the only names in these tables for complex C++ entries are fully
+   qualified names. Debugger users tend not to enter their search strings as
+   "a::b::c(int,const Foo&amp;) const", but rather as "c", "b::c", or "a::b::c".
+   So the names in the table must be demangled in order to chop them up
+   appropriately, and additional names must be manually entered into the table
+   to make it effective as a name lookup table for debuggers to use.
+
+<p>All debuggers currently ignore the .debug_pubnames table as a result of
+   its inconsistent and useless public-only name content, which makes it a
+   waste of space in the object file. These tables, when they are written to
+   disk, are not sorted in any way, leaving every debugger to do its own
+   parsing and sorting. These tables also include an inline copy of the string
+   values in the table itself, making the tables much larger than they need to
+   be on disk, especially for large C++ programs.
+
+<p>Can't we just fix the sections by adding all of the names we need to this
+ table? No, because that is not what the tables are defined to contain and we
+ won't know the difference between the old bad tables and the new good tables.
+ At best we could make our own renamed sections that contain all of the data
+ we need.
+
+<p>These tables are also insufficient for what a debugger like LLDB needs.
+   LLDB uses clang for its expression parsing, where LLDB acts as a PCH
+   (precompiled header). LLDB is then often asked to look for type "foo" or
+   namespace "bar", or to list items in namespace "baz". Namespaces are not
+   included in the pubnames or pubtypes tables. Since clang asks a lot of
+   questions when it is parsing an expression, name lookups need to be very
+   fast, as they happen very often. Having new accelerator tables that are
+   optimized for very quick lookups will benefit this type of debugging
+   experience greatly.
+
+<p>We would like to generate name lookup tables that can be mapped into
+   memory from disk, and used as is, with little or no up-front parsing. We
+   would also like to be able to control the exact content of these different
+   tables so they contain exactly what we need. The Name Accelerator Tables
+   were designed to fix these issues. In order to solve these issues we need:
+<ul>
+  <li>A format that can be mapped into memory from disk and used as is</li>
+  <li>Very fast lookups</li>
+  <li>An extensible table format so these tables can be made by many producers</li>
+  <li>All of the names needed for typical lookups out of the box</li>
+  <li>Strict rules for the contents of tables</li>
+</ul>
+<p>Table size is important and the accelerator table format should allow the
+ reuse of strings from common string tables so the strings for the names are
+ not duplicated. We also want to make sure the table is ready to be used as-is
+ by simply mapping the table into memory with minimal header parsing.
+
+<p>The name lookups need to be fast and optimized for the kinds of lookups
+   that debuggers tend to do. Optimally we would like to touch as few parts of
+   the mapped table as possible when doing a name lookup and be able to quickly
+   find the name entry we are looking for, or discover there are no matches. In
+   the case of debuggers we optimize for lookups that fail most of the time.
+
+<p>Each table that is defined should have strict rules on exactly what is in
+   the accelerator table, and those rules should be documented so clients can
+   rely on the content.
+</div>
+<!-- ======================================================================= -->
+<h4>
+ <a name="acceltablehashes">Hash Tables</a>
+</h4>
+<!-- ======================================================================= -->
+<div>
+<h5>Standard Hash Tables</h5>
+<p>Typical hash tables have a header, buckets, and data; each bucket points to
+   its contents:
+<div class="doc_code">
+<pre>
+.------------.
+| HEADER |
+|------------|
+| BUCKETS |
+|------------|
+| DATA |
+`------------'
+</pre>
+</div>
+<p>The BUCKETS are an array of offsets to DATA for each hash:
+<div class="doc_code">
+<pre>
+.------------.
+| 0x00001000 | BUCKETS[0]
+| 0x00002000 | BUCKETS[1]
+| 0x00002200 | BUCKETS[2]
+| 0x000034f0 | BUCKETS[3]
+| | ...
+| 0xXXXXXXXX | BUCKETS[n_buckets]
+'------------'
+</pre>
+</div>
+<p>So for bucket[3] in the example above, we have an offset into the table of
+   0x000034f0 which points to a chain of entries for the bucket. Each entry in
+   the chain contains a next pointer, the full 32 bit hash value, the string
+   itself, and the data for the current string value.
+<div class="doc_code">
+<pre>
+ .------------.
+0x000034f0: | 0x00003500 | next pointer
+ | 0x12345678 | 32 bit hash
+ | "erase" | string value
+ | data[n] | HashData for this bucket
+ |------------|
+0x00003500: | 0x00003550 | next pointer
+ | 0x29273623 | 32 bit hash
+ | "dump" | string value
+ | data[n] | HashData for this bucket
+ |------------|
+0x00003550: | 0x00000000 | next pointer
+ | 0x82638293 | 32 bit hash
+ | "main" | string value
+ | data[n] | HashData for this bucket
+ `------------'
+</pre>
+</div>
+<p>The problem with this layout for debuggers is that we need to optimize for
+   the negative lookup case where the symbol we're searching for is not present.
+   So if we were to look up "printf" in the table above, we would make a 32 bit
+   hash for "printf", which might match bucket[3]. We would need to go to the
+   offset 0x000034f0 and start looking to see if our 32 bit hash matches. To do
+   so, we need to read the next pointer, then read the hash, compare it, and
+   skip to the next entry in the chain. Each time we are skipping many bytes in
+   memory and touching new cache pages just to do the compare on the full 32 bit
+   hash. All of these accesses then tell us that we didn't have a match.
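+
+<p>As an illustration only (not part of any specification), here is a minimal
+   sketch of walking one of these chains; the read_u32() helper, the function
+   name, and the assumption of a little-endian, memory-mapped table are all
+   invented for this example:
+<div class="doc_code">
+<pre>
+#include &lt;stdint.h&gt;
+
+/* Read a little-endian 32 bit value from the mapped table (an assumption
+   made purely for this sketch). */
+static uint32_t read_u32(const unsigned char *p) {
+  return (uint32_t)p[0] | ((uint32_t)p[1] &lt;&lt; 8) |
+         ((uint32_t)p[2] &lt;&lt; 16) | ((uint32_t)p[3] &lt;&lt; 24);
+}
+
+/* Each probe reads a next pointer and a hash from a different part of the
+   mapped table, touching a new cache line just to compare 32 bits. */
+static int chain_contains_hash(const unsigned char *table, uint32_t offset,
+                               uint32_t hash) {
+  while (offset != 0) {
+    uint32_t next = read_u32(table + offset);      /* next pointer */
+    if (read_u32(table + offset + 4) == hash)      /* 32 bit hash  */
+      return 1;  /* candidate found; the inline string still needs checking */
+    offset = next;
+  }
+  return 0;
+}
+</pre>
+</div>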
+
+<h5>Name Hash Tables</h5>
+
+<p>To solve the issues mentioned above we have structured the hash tables
+ a bit differently: a header, buckets, an array of all unique 32 bit hash
+ values, followed by an array of hash value data offsets, one for each hash
+ value, then the data for all hash values:
+<div class="doc_code">
+<pre>
+.-------------.
+| HEADER |
+|-------------|
+| BUCKETS |
+|-------------|
+| HASHES |
+|-------------|
+| OFFSETS |
+|-------------|
+| DATA |
+`-------------'
+</pre>
+</div>
+<p>The BUCKETS in the Apple tables are indexes into the HASHES array. By
+   making all of the full 32 bit hash values contiguous in memory, we allow
+   ourselves to efficiently check for a match while touching as little
+   memory as possible. Most often, checking the 32 bit hash values is as far as
+   the lookup goes. If it does match, it usually is a match with no collisions.
+   So for a table with "n_buckets" buckets and "n_hashes" unique 32 bit hash
+   values, we can clarify the contents of the BUCKETS, HASHES and OFFSETS as:
+<div class="doc_code">
+<pre>
+.-------------------------.
+| HEADER.magic | uint32_t
+| HEADER.version | uint16_t
+| HEADER.hash_function | uint16_t
+| HEADER.bucket_count | uint32_t
+| HEADER.hashes_count | uint32_t
+| HEADER.header_data_len | uint32_t
+| HEADER_DATA | HeaderData
+|-------------------------|
+| BUCKETS | uint32_t[n_buckets] // 32 bit hash indexes
+|-------------------------|
+| HASHES | uint32_t[n_hashes] // 32 bit hash values
+|-------------------------|
+| OFFSETS | uint32_t[n_hashes] // 32 bit offsets to hash value data
+|-------------------------|
+| ALL HASH DATA |
+`-------------------------'
+</pre>
+</div>
+<p>So taking the exact same data from the standard hash example above we end up
+ with:
+<div class="doc_code">
+<pre>
+ .------------.
+ | HEADER |
+ |------------|
+ | 0 | BUCKETS[0]
+ | 2 | BUCKETS[1]
+ | 5 | BUCKETS[2]
+ | 6 | BUCKETS[3]
+ | | ...
+ | ... | BUCKETS[n_buckets]
+ |------------|
+ | 0x........ | HASHES[0]
+ | 0x........ | HASHES[1]
+ | 0x........ | HASHES[2]
+ | 0x........ | HASHES[3]
+ | 0x........ | HASHES[4]
+ | 0x........ | HASHES[5]
+ | 0x12345678 | HASHES[6] hash for BUCKETS[3]
+ | 0x29273623 | HASHES[7] hash for BUCKETS[3]
+ | 0x82638293 | HASHES[8] hash for BUCKETS[3]
+ | 0x........ | HASHES[9]
+ | 0x........ | HASHES[10]
+ | 0x........ | HASHES[11]
+ | 0x........ | HASHES[12]
+ | 0x........ | HASHES[13]
+ | 0x........ | HASHES[n_hashes]
+ |------------|
+ | 0x........ | OFFSETS[0]
+ | 0x........ | OFFSETS[1]
+ | 0x........ | OFFSETS[2]
+ | 0x........ | OFFSETS[3]
+ | 0x........ | OFFSETS[4]
+ | 0x........ | OFFSETS[5]
+ | 0x000034f0 | OFFSETS[6] offset for BUCKETS[3]
+ | 0x00003500 | OFFSETS[7] offset for BUCKETS[3]
+ | 0x00003550 | OFFSETS[8] offset for BUCKETS[3]
+ | 0x........ | OFFSETS[9]
+ | 0x........ | OFFSETS[10]
+ | 0x........ | OFFSETS[11]
+ | 0x........ | OFFSETS[12]
+ | 0x........ | OFFSETS[13]
+ | 0x........ | OFFSETS[n_hashes]
+ |------------|
+ | |
+ | |
+ | |
+ | |
+ | |
+ |------------|
+0x000034f0: | 0x00001203 | String offset into .debug_str ("erase")
+ | 0x00000004 | A 32 bit array count - number of HashData with name "erase"
+ | 0x........ | HashData[0]
+ | 0x........ | HashData[1]
+ | 0x........ | HashData[2]
+ | 0x........ | HashData[3]
+ | 0x00000000 | String offset into .debug_str (terminate data for hash)
+ |------------|
+0x00003500: | 0x00001300 | String offset into .debug_str ("collision")
+ | 0x00000002 | A 32 bit array count - number of HashData with name "collision"
+ | 0x........ | HashData[0]
+ | 0x........ | HashData[1]
+ | 0x00001350 | String offset into .debug_str ("dump")
+ | 0x00000003 | A 32 bit array count - number of HashData with name "dump"
+ | 0x........ | HashData[0]
+ | 0x........ | HashData[1]
+ | 0x........ | HashData[2]
+ | 0x00000000 | String offset into .debug_str (terminate data for hash)
+ |------------|
+0x00003550: | 0x00001390 | String offset into .debug_str ("main")
+ | 0x00000009 | A 32 bit array count - number of HashData with name "main"
+ | 0x........ | HashData[0]
+ | 0x........ | HashData[1]
+ | 0x........ | HashData[2]
+ | 0x........ | HashData[3]
+ | 0x........ | HashData[4]
+ | 0x........ | HashData[5]
+ | 0x........ | HashData[6]
+ | 0x........ | HashData[7]
+ | 0x........ | HashData[8]
+ | 0x00000000 | String offset into .debug_str (terminate data for hash)
+ `------------'
+</pre>
+</div>
+<p>So we still have all of the same data, we just organize it more efficiently
+   for debugger lookup. If we repeat the same "printf" lookup from above, we
+   would hash "printf" and find that it matches BUCKETS[3] by taking the 32 bit
+   hash value modulo n_buckets. BUCKETS[3] contains "6", which is the index
+   into the HASHES table. We would then compare consecutive 32 bit hash values
+   in the HASHES array for as long as those hashes still belong to BUCKETS[3].
+   We do this by verifying that each subsequent hash value modulo n_buckets is
+   still 3. In the case of a failed lookup we would access the memory for
+   BUCKETS[3], and then compare a few consecutive 32 bit hashes before we know
+   that we have no match. We don't end up marching through multiple words of
+   memory, and we keep the number of processor data cache lines accessed as
+   small as possible.
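+
+<p>For illustration only, here is a minimal sketch of this lookup; the struct
+   and function names below are invented for the example and are not part of
+   any LLVM API:
+<div class="doc_code">
+<pre>
+#include &lt;stdint.h&gt;
+
+struct AppleHashTable {
+  uint32_t bucket_count;
+  uint32_t hashes_count;
+  const uint32_t *buckets;  /* bucket_count indexes into hashes[] */
+  const uint32_t *hashes;   /* hashes_count unique 32 bit hash values */
+  const uint32_t *offsets;  /* one data offset per entry in hashes[] */
+};
+
+/* Returns the offset of the hash data for 'hash', or 0 if there is none.
+   Empty buckets hold the invalid index UINT32_MAX (see below). */
+static uint32_t find_hash_data(const struct AppleHashTable *t, uint32_t hash) {
+  uint32_t bucket = hash % t->bucket_count;
+  uint32_t idx = t->buckets[bucket];
+  if (idx == UINT32_MAX)
+    return 0;
+  /* Compare consecutive hash values for as long as they still belong to
+     this bucket; most failed lookups stop after one or two compares. */
+  while (idx &lt; t->hashes_count &amp;&amp;
+         t->hashes[idx] % t->bucket_count == bucket) {
+    if (t->hashes[idx] == hash)
+      return t->offsets[idx];
+    ++idx;
+  }
+  return 0;
+}
+</pre>
+</div>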
+
+<p>The string hash that is used for these lookup tables is the Daniel J.
+ Bernstein hash which is also used in the ELF GNU_HASH sections. It is a very
+ good hash for all kinds of names in programs with very few hash collisions.
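+
+<p>For reference, a sketch of this hash function (the function name here is
+   just illustrative):
+<div class="doc_code">
+<pre>
+#include &lt;stdint.h&gt;
+
+/* Bernstein ("DJB") hash: h = h * 33 + c, starting from 5381. */
+static uint32_t djb_hash(const char *s) {
+  uint32_t h = 5381;
+  for (; *s; ++s)
+    h = h * 33 + (unsigned char)*s;
+  return h;
+}
+</pre>
+</div>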
+
+<p>Empty buckets are designated by using an invalid hash index of UINT32_MAX.
+</div>
+<!-- ======================================================================= -->
+<h4>
+ <a name="acceltabledetails">Details</a>
+</h4>
+<!-- ======================================================================= -->
+<div>
+<p>These name hash tables are designed to be generic where specializations of
+ the table get to define additional data that goes into the header
+ ("HeaderData"), how the string value is stored ("KeyType") and the content
+ of the data for each hash value.
+
+<h5>Header Layout</h5>
+<p>The header has a fixed part and a specialized part. The exact format of
+   the header is:
+<div class="doc_code">
+<pre>
+struct Header
+{
+ uint32_t magic; // 'HASH' magic value to allow endian detection
+ uint16_t version; // Version number
+ uint16_t hash_function; // The hash function enumeration that was used
+ uint32_t bucket_count; // The number of buckets in this hash table
+ uint32_t hashes_count; // The total number of unique hash values and hash data offsets in this table
+ uint32_t header_data_len; // The bytes to skip to get to the hash indexes (buckets) for correct alignment
+ // Specifically the length of the following HeaderData field - this does not
+ // include the size of the preceding fields
+ HeaderData header_data; // Implementation specific header data
+};
+</pre>
+</div>
+<p>The header starts with a 32 bit "magic" value which must be 'HASH' encoded as
+ an ASCII integer. This allows the detection of the start of the hash table and
+ also allows the table's byte order to be determined so the table can be
+ correctly extracted. The "magic" value is followed by a 16 bit version number
+ which allows the table to be revised and modified in the future. The current
+ version number is 1. "hash_function" is a uint16_t enumeration that specifies
+ which hash function was used to produce this table. The current values for the
+ hash function enumerations include:
+<div class="doc_code">
+<pre>
+enum HashFunctionType
+{
+ eHashFunctionDJB = 0u, // Daniel J Bernstein hash function
+};
+</pre>
+</div>
+<p>"bucket_count" is a 32 bit unsigned integer that represents how many buckets
+ are in the BUCKETS array. "hashes_count" is the number of unique 32 bit hash
+ values that are in the HASHES array, and is the same number of offsets are
+ contained in the OFFSETS array. "header_data_len" specifies the size in
+ bytes of the HeaderData that is filled in by specialized versions of this
+ table.
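+
+<p>As an illustration of the byte order detection described above, a minimal
+   sketch; the constant and helper names are invented for this example:
+<div class="doc_code">
+<pre>
+#include &lt;stdint.h&gt;
+#include &lt;stdbool.h&gt;
+
+#define TABLE_MAGIC 0x48415348u  /* 'HASH' as an ASCII integer */
+
+static uint32_t byte_swap_32(uint32_t v) {
+  return (v >> 24) | ((v >> 8) &amp; 0x0000ff00u) |
+         ((v &lt;&lt; 8) &amp; 0x00ff0000u) | (v &lt;&lt; 24);
+}
+
+/* Returns true if 'magic' identifies an accelerator table, and reports
+   whether the producer used the opposite byte order from the reader. */
+static bool check_magic(uint32_t magic, bool *needs_swap) {
+  if (magic == TABLE_MAGIC)               { *needs_swap = false; return true; }
+  if (magic == byte_swap_32(TABLE_MAGIC)) { *needs_swap = true;  return true; }
+  return false;
+}
+</pre>
+</div>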
+
+<h5>Fixed Lookup</h5>
+<p>The header is followed by the buckets, hashes, offsets, and hash value
+ data.
+<div class="doc_code">
+<pre>
+struct FixedTable
+{
+ uint32_t buckets[Header.bucket_count]; // An array of hash indexes into the "hashes[]" array below
+ uint32_t hashes [Header.hashes_count]; // Every unique 32 bit hash for the entire table is in this table
+ uint32_t offsets[Header.hashes_count]; // An offset that corresponds to each item in the "hashes[]" array above
+};
+</pre>
+</div>
+<p>"buckets" is an array of 32 bit indexes into the "hashes" array. The
+ "hashes" array contains all of the 32 bit hash values for all names in the
+ hash table. Each hash in the "hashes" table has an offset in the "offsets"
+ array that points to the data for the hash value.
+
+<p>This table setup makes it very easy to repurpose these tables to contain
+ different data, while keeping the lookup mechanism the same for all tables.
+ This layout also makes it possible to save the table to disk and map it in
+ later and do very efficient name lookups with little or no parsing.
+
+<p>DWARF lookup tables can be implemented in a variety of ways and can store
+ a lot of information for each name. We want to make the DWARF tables
+ extensible and able to store the data efficiently so we have used some of the
+ DWARF features that enable efficient data storage to define exactly what kind
+ of data we store for each name.
+
+<p>The "HeaderData" contains a definition of the contents of each HashData
+ chunk. We might want to store an offset to all of the debug information
+ entries (DIEs) for each name. To keep things extensible, we create a list of
+ items, or Atoms, that are contained in the data for each name. First comes the
+ type of the data in each atom:
+<div class="doc_code">
+<pre>
+enum AtomType
+{
+ eAtomTypeNULL = 0u,
+ eAtomTypeDIEOffset = 1u, // DIE offset, check form for encoding
+  eAtomTypeCUOffset = 2u, // DIE offset of the compile unit header that contains the item in question
+ eAtomTypeTag = 3u, // DW_TAG_xxx value, should be encoded as DW_FORM_data1 (if no tags exceed 255) or DW_FORM_data2
+ eAtomTypeNameFlags = 4u, // Flags from enum NameFlags
+ eAtomTypeTypeFlags = 5u, // Flags from enum TypeFlags
+};
+</pre>
+</div>
+<p>The enumeration values and their meanings are:
+<div class="doc_code">
+<pre>
+ eAtomTypeNULL - a termination atom that specifies the end of the atom list
+ eAtomTypeDIEOffset - an offset into the .debug_info section for the DWARF DIE for this name
+ eAtomTypeCUOffset - an offset into the .debug_info section for the CU that contains the DIE
+  eAtomTypeTag - The DW_TAG_XXX enumeration value so you don't have to parse the DWARF to see what it is
+ eAtomTypeNameFlags - Flags for functions and global variables (isFunction, isInlined, isExternal...)
+ eAtomTypeTypeFlags - Flags for types (isCXXClass, isObjCClass, ...)
+</pre>
+</div>
+<p>Each atom is then defined by its type and by the form in which the data for
+   that atom is encoded:
+<div class="doc_code">
+<pre>
+struct Atom
+{
+ uint16_t type; // AtomType enum value
+ uint16_t form; // DWARF DW_FORM_XXX defines
+};
+</pre>
+</div>
+<p>The "form" type above is from the DWARF specification and defines the
+ exact encoding of the data for the Atom type. See the DWARF specification for
+ the DW_FORM_ definitions.
+<div class="doc_code">
+<pre>
+struct HeaderData
+{
+ uint32_t die_offset_base;
+ uint32_t atom_count;
+  Atom atoms[atom_count];
+};
+</pre>
+</div>
+<p>"HeaderData" defines the base DIE offset that should be added to any atoms
+ that are encoded using the DW_FORM_ref1, DW_FORM_ref2, DW_FORM_ref4,
+ DW_FORM_ref8 or DW_FORM_ref_udata. It also defines what is contained in
+ each "HashData" object -- Atom.form tells us how large each field will be in
+ the HashData and the Atom.type tells us how this data should be interpreted.
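+
+<p>For illustration only, here is a sketch (with an invented helper name) of
+   how a reader might compute the fixed byte size of one HashData item from the
+   Atom descriptions; the DW_FORM_* constant values are taken from the DWARF
+   specification and only a few fixed-size forms are handled:
+<div class="doc_code">
+<pre>
+#include &lt;stdint.h&gt;
+#include &lt;stddef.h&gt;
+
+enum {
+  DW_FORM_data2 = 0x05, DW_FORM_data4 = 0x06, DW_FORM_data8 = 0x07,
+  DW_FORM_data1 = 0x0b, DW_FORM_ref1 = 0x11, DW_FORM_ref2 = 0x12,
+  DW_FORM_ref4 = 0x13, DW_FORM_ref8 = 0x14
+};
+
+struct Atom { uint16_t type; uint16_t form; };  /* same layout as above */
+
+/* Returns 0 if any atom uses a variable-length or unhandled form. */
+static size_t hash_data_item_size(const struct Atom *atoms, uint32_t count) {
+  size_t size = 0;
+  for (uint32_t i = 0; i &lt; count; ++i) {
+    switch (atoms[i].form) {
+    case DW_FORM_data1: case DW_FORM_ref1: size += 1; break;
+    case DW_FORM_data2: case DW_FORM_ref2: size += 2; break;
+    case DW_FORM_data4: case DW_FORM_ref4: size += 4; break;
+    case DW_FORM_data8: case DW_FORM_ref8: size += 8; break;
+    default: return 0;
+    }
+  }
+  return size;
+}
+</pre>
+</div>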
+
+<p>For the current implementations of ".apple_names" (all functions and
+   globals), ".apple_types" (names of all types that are defined), and
+   ".apple_namespaces" (all namespaces), the Atom array is set to:
+<div class="doc_code">
+<pre>
+HeaderData.atom_count = 1;
+HeaderData.atoms[0].type = eAtomTypeDIEOffset;
+HeaderData.atoms[0].form = DW_FORM_data4;
+</pre>
+</div>
+<p>This defines the contents to be the DIE offset (eAtomTypeDIEOffset) that is
+ encoded as a 32 bit value (DW_FORM_data4). This allows a single name to have
+ multiple matching DIEs in a single file, which could come up with an inlined
+ function for instance. Future tables could include more information about the
+ DIE such as flags indicating if the DIE is a function, method, block,
+ or inlined.
+
+<p>The KeyType for the DWARF table is a 32 bit string table offset into the
+   ".debug_str" table. The ".debug_str" table is the string table for the DWARF,
+   which may already contain copies of all of the strings. This helps ensure,
+   with help from the compiler, that we reuse the strings between all of the
+   DWARF sections and keeps the hash table size down. Another benefit of having
+   the compiler generate all strings as DW_FORM_strp in the debug info is that
+   DWARF parsing can be made much faster.
+
+<p>After a lookup is made, we get an offset into the hash data. The hash data
+ needs to be able to deal with 32 bit hash collisions, so the chunk of data
+ at the offset in the hash data consists of a triple:
+<div class="doc_code">
+<pre>
+uint32_t str_offset
+uint32_t hash_data_count
+HashData[hash_data_count]
+</pre>
+</div>
+<p>If "str_offset" is zero, then the bucket contents are done. 99.9% of the
+ hash data chunks contain a single item (no 32 bit hash collision):
+<div class="doc_code">
+<pre>
+.------------.
+| 0x00001023 | uint32_t KeyType (.debug_str[0x00001023] => "main")
+| 0x00000004 | uint32_t HashData count
+| 0x........ | uint32_t HashData[0] DIE offset
+| 0x........ | uint32_t HashData[1] DIE offset
+| 0x........ | uint32_t HashData[2] DIE offset
+| 0x........ | uint32_t HashData[3] DIE offset
+| 0x00000000 | uint32_t KeyType (end of hash chain)
+`------------'
+</pre>
+</div>
+<p>If there are collisions, you will have multiple valid string offsets:
+<div class="doc_code">
+<pre>
+.------------.
+| 0x00001023 | uint32_t KeyType (.debug_str[0x00001023] => "main")
+| 0x00000004 | uint32_t HashData count
+| 0x........ | uint32_t HashData[0] DIE offset
+| 0x........ | uint32_t HashData[1] DIE offset
+| 0x........ | uint32_t HashData[2] DIE offset
+| 0x........ | uint32_t HashData[3] DIE offset
+| 0x00002023 | uint32_t KeyType (.debug_str[0x00002023] => "print")
+| 0x00000002 | uint32_t HashData count
+| 0x........ | uint32_t HashData[0] DIE offset
+| 0x........ | uint32_t HashData[1] DIE offset
+| 0x00000000 | uint32_t KeyType (end of hash chain)
+`------------'
+</pre>
+</div>
+<p>Current testing with real world C++ binaries has shown that there is roughly
+   one 32 bit hash collision per 100,000 name entries.
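+
+<p>Tying the pieces above together, here is a purely illustrative sketch of
+   walking one hash data chunk; it assumes the ".apple_names" style HashData of
+   a single DW_FORM_data4 DIE offset per entry, a little-endian table, and an
+   invented callback type:
+<div class="doc_code">
+<pre>
+#include &lt;stdint.h&gt;
+
+static uint32_t read_le32(const unsigned char *p) {  /* little-endian read */
+  return (uint32_t)p[0] | ((uint32_t)p[1] &lt;&lt; 8) |
+         ((uint32_t)p[2] &lt;&lt; 16) | ((uint32_t)p[3] &lt;&lt; 24);
+}
+
+typedef void (*name_die_callback)(uint32_t str_offset, uint32_t die_offset);
+
+static void walk_hash_data(const unsigned char *table, uint32_t offset,
+                           name_die_callback callback) {
+  for (;;) {
+    uint32_t str_offset = read_le32(table + offset);   /* KeyType */
+    if (str_offset == 0)
+      return;                                          /* end of hash chain */
+    uint32_t count = read_le32(table + offset + 4);    /* HashData count */
+    offset += 8;
+    for (uint32_t i = 0; i &lt; count; ++i, offset += 4)
+      callback(str_offset, read_le32(table + offset)); /* one DIE offset */
+  }
+}
+</pre>
+</div>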
+</div>
+<!-- ======================================================================= -->
+<h4>
+ <a name="acceltablecontents">Contents</a>
+</h4>
+<!-- ======================================================================= -->
+<div>
+<p>As we said, we want to strictly define exactly what is included in the
+ different tables. For DWARF, we have 3 tables: ".apple_names", ".apple_types",
+ and ".apple_namespaces".
+
+<p>".apple_names" sections should contain an entry for each DWARF DIE whose
+ DW_TAG is a DW_TAG_label, DW_TAG_inlined_subroutine, or DW_TAG_subprogram that
+ has address attributes: DW_AT_low_pc, DW_AT_high_pc, DW_AT_ranges or
+ DW_AT_entry_pc. It also contains DW_TAG_variable DIEs that have a DW_OP_addr
+ in the location (global and static variables). All global and static variables
+ should be included, including those scoped withing functions and classes. For
+ example using the following code:
+<div class="doc_code">
+<pre>
+static int var = 0;
+
+void f ()
+{
+ static int var = 0;
+}
+</pre>
+</div>
+<p>Both of the static "var" variables would be included in the table. All
+ functions should emit both their full names and their basenames. For C or C++,
+ the full name is the mangled name (if available) which is usually in the
+ DW_AT_MIPS_linkage_name attribute, and the DW_AT_name contains the function
+ basename. If global or static variables have a mangled name in a
+ DW_AT_MIPS_linkage_name attribute, this should be emitted along with the
+ simple name found in the DW_AT_name attribute.
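+
+<p>As a purely illustrative example (assuming the Itanium C++ ABI mangling),
+   these would be the entries added to ".apple_names" for a member function:
+<div class="doc_code">
+<pre>
+namespace a { struct b { int c(int); }; }
+int a::b::c(int x) { return x; }
+
+// Hypothetical ".apple_names" entries for a::b::c(int), both pointing at
+// the same DW_TAG_subprogram DIE:
+//   "_ZN1a1b1cEi"  - full (mangled) name from DW_AT_MIPS_linkage_name
+//   "c"            - basename from DW_AT_name
+</pre>
+</div>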
+
+<p>".apple_types" sections should contain an entry for each DWARF DIE whose
+ tag is one of:
+<ul>
+ <li>DW_TAG_array_type</li>
+ <li>DW_TAG_class_type</li>
+ <li>DW_TAG_enumeration_type</li>
+ <li>DW_TAG_pointer_type</li>
+ <li>DW_TAG_reference_type</li>
+ <li>DW_TAG_string_type</li>
+ <li>DW_TAG_structure_type</li>
+ <li>DW_TAG_subroutine_type</li>
+ <li>DW_TAG_typedef</li>
+ <li>DW_TAG_union_type</li>
+ <li>DW_TAG_ptr_to_member_type</li>
+ <li>DW_TAG_set_type</li>
+ <li>DW_TAG_subrange_type</li>
+ <li>DW_TAG_base_type</li>
+ <li>DW_TAG_const_type</li>
+ <li>DW_TAG_constant</li>
+ <li>DW_TAG_file_type</li>
+ <li>DW_TAG_namelist</li>
+ <li>DW_TAG_packed_type</li>
+ <li>DW_TAG_volatile_type</li>
+ <li>DW_TAG_restrict_type</li>
+ <li>DW_TAG_interface_type</li>
+ <li>DW_TAG_unspecified_type</li>
+ <li>DW_TAG_shared_type</li>
+</ul>
+<p>Only entries with a DW_AT_name attribute are included, and the entry must
+ not be a forward declaration (DW_AT_declaration attribute with a non-zero value).
+ For example, using the following code:
+<div class="doc_code">
+<pre>
+int main ()
+{
+ int *b = 0;
+ return *b;
+}
+</pre>
+</div>
+<p>We get a few type DIEs:
+<div class="doc_code">
+<pre>
+0x00000067: TAG_base_type [5]
+ AT_encoding( DW_ATE_signed )
+ AT_name( "int" )
+ AT_byte_size( 0x04 )
+
+0x0000006e: TAG_pointer_type [6]
+ AT_type( {0x00000067} ( int ) )
+ AT_byte_size( 0x08 )
+</pre>
+</div>
+<p>The DW_TAG_pointer_type is not included because it does not have a DW_AT_name.
+
+<p>".apple_namespaces" section should contain all DW_TAG_namespace DIEs. If
+ we run into a namespace that has no name this is an anonymous namespace,
+ and the name should be output as "(anonymous namespace)" (without the quotes).
+ Why? This matches the output of the abi::cxa_demangle() that is in the standard
+ C++ library that demangles mangled names.
+</div>
+
+<!-- ======================================================================= -->
+<h4>
+ <a name="acceltableextensions">Language Extensions and File Format Changes</a>
+</h4>
+<!-- ======================================================================= -->
+<div>
+<h5>Objective-C Extensions</h5>
+<p>".apple_objc" section should contain all DW_TAG_subprogram DIEs for an
+ Objective-C class. The name used in the hash table is the name of the
+ Objective-C class itself. If the Objective-C class has a category, then an
+ entry is made for both the class name without the category, and for the class
+ name with the category. So if we have a DIE at offset 0x1234 with a name
+ of method "-[NSString(my_additions) stringWithSpecialString:]", we would add
+ an entry for "NSString" that points to DIE 0x1234, and an entry for
+ "NSString(my_additions)" that points to 0x1234. This allows us to quickly
+ track down all Objective-C methods for an Objective-C class when doing
+ expressions. It is needed because of the dynamic nature of Objective-C where
+ anyone can add methods to a class. The DWARF for Objective-C methods is also
+ emitted differently from C++ classes where the methods are not usually
+ contained in the class definition, they are scattered about across one or more
+ compile units. Categories can also be defined in different shared libraries.
+ So we need to be able to quickly find all of the methods and class functions
+ given the Objective-C class name, or quickly find all methods and class
+ functions for a class + category name. This table does not contain any selector
+ names, it just maps Objective-C class names (or class names + category) to all
+ of the methods and class functions. The selectors are added as function
+ basenames in the .debug_names section.
+
+<p>In the ".apple_names" section for Objective-C functions, the full name is the
+ entire function name with the brackets ("-[NSString stringWithCString:]") and the
+ basename is the selector only ("stringWithCString:").
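+
+<p>Summarizing the example above, the accelerator table entries for the method
+   at DIE offset 0x1234 would be (illustrative layout only):
+<div class="doc_code">
+<pre>
+".apple_objc" entries:
+  "NSString"                                           -> 0x1234
+  "NSString(my_additions)"                             -> 0x1234
+
+".apple_names" entries:
+  "-[NSString(my_additions) stringWithSpecialString:]" -> 0x1234
+  "stringWithSpecialString:"                           -> 0x1234
+</pre>
+</div>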
+
+<h5>Mach-O Changes</h5>
+<p>The section names above for the Apple hash tables apply to non-Mach-O files.
+   For Mach-O files, the sections should be contained in the "__DWARF" segment
+   with names as follows:
+<ul>
+ <li>".apple_names" -> "__apple_names"</li>
+ <li>".apple_types" -> "__apple_types"</li>
+ <li>".apple_namespaces" -> "__apple_namespac" (16 character limit)</li>
+  <li>".apple_objc" -> "__apple_objc"</li>
+</ul>
+</div>
+</div>
+
<!-- *********************************************************************** -->
<hr>