Diffstat (limited to 'docs/HistoricalNotes')
-rw-r--r--  docs/HistoricalNotes/2000-11-18-EarlyDesignIdeas.txt  74
-rw-r--r--  docs/HistoricalNotes/2000-11-18-EarlyDesignIdeasResp.txt  199
-rw-r--r--  docs/HistoricalNotes/2000-12-06-EncodingIdea.txt  30
-rw-r--r--  docs/HistoricalNotes/2000-12-06-MeetingSummary.txt  83
-rw-r--r--  docs/HistoricalNotes/2001-01-31-UniversalIRIdea.txt  39
-rw-r--r--  docs/HistoricalNotes/2001-02-06-TypeNotationDebate.txt  67
-rw-r--r--  docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp1.txt  75
-rw-r--r--  docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp2.txt  53
-rw-r--r--  docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp4.txt  89
-rw-r--r--  docs/HistoricalNotes/2001-02-09-AdveComments.txt  120
-rw-r--r--  docs/HistoricalNotes/2001-02-09-AdveCommentsResponse.txt  245
-rw-r--r--  docs/HistoricalNotes/2001-02-13-Reference-Memory.txt  39
-rw-r--r--  docs/HistoricalNotes/2001-02-13-Reference-MemoryResponse.txt  47
-rw-r--r--  docs/HistoricalNotes/2001-04-16-DynamicCompilation.txt  12
-rw-r--r--  docs/HistoricalNotes/2001-05-18-ExceptionHandling.txt  202
-rw-r--r--  docs/HistoricalNotes/2001-05-19-ExceptionResponse.txt  45
-rw-r--r--  docs/HistoricalNotes/2001-06-01-GCCOptimizations.txt  63
-rw-r--r--  docs/HistoricalNotes/2001-06-01-GCCOptimizations2.txt  71
18 files changed, 1553 insertions, 0 deletions
diff --git a/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeas.txt b/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeas.txt
new file mode 100644
index 0000000000..f086181192
--- /dev/null
+++ b/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeas.txt
@@ -0,0 +1,74 @@
+Date: Sat, 18 Nov 2000 09:19:35 -0600 (CST)
+From: Vikram Adve <vadve@cs.uiuc.edu>
+To: Chris Lattner <lattner@cs.uiuc.edu>
+Subject: a few thoughts
+
+I've been mulling over the virtual machine problem and I had some
+thoughts about some things for us to think about and discuss:
+
+1. We need to be clear on our goals for the VM. Do we want to emphasize
+ portability and safety like the Java VM? Or shall we focus on the
+ architecture interface first (i.e., consider the code generation and
+ processor issues), since the architecture interface question is also
+ important for portable Java-type VMs?
+
+ This is important because the audiences for these two goals are very
+ different. Architects and many compiler people care much more about
+ the second question. The Java compiler and OS community care much more
+ about the first one.
+
+ Also, while the architecture interface question is important for
+ Java-type VMs, the design constraints are very different.
+
+
+2. Design issues to consider (an initial list that we should continue
+ to modify). Note that I'm not trying to suggest actual solutions here,
+ but just various directions we can pursue:
+
+ a. A single-assignment VM, which we've both already been thinking about.
+
+ b. A strongly-typed VM. One question is do we need the types to be
+ explicitly declared or should they be inferred by the dynamic compiler?
+
+ c. How do we get more high-level information into the VM while keeping
+ to a low-level VM design?
+
+ o Explicit array references as operands? An alternative is
+ to have just an array type, and let the index computations be
+ separate 3-operand instructions.
+
+ o Explicit instructions to handle aliasing, e.g.s:
+ -- an instruction to say "I speculate that these two values are not
+ aliased, but check at runtime", like speculative execution in
+ EPIC?
+ -- or an instruction to check whether two values are aliased and
+ execute different code depending on the answer, somewhat like
+ predicated code in EPIC
+
+ o (This one is a difficult but powerful idea.)
+ A "thread-id" field on every instruction that allows the static
+ compiler to generate a set of parallel threads, and then have
+ the runtime compiler and hardware do what they please with it.
+ This has very powerful uses, but thread-id on every instruction
+ is expensive in terms of instruction size and code size.
+ We would need to compactly encode it somehow.
+
+ Also, this will require some reading on at least two other
+ projects:
+ -- Multiscalar architecture from Wisconsin
+ -- Simultaneous multithreading architecture from Washington
+
+ o Or forget all this and stick to a traditional instruction set?
+
+
+BTW, on an unrelated note, after the meeting yesterday, I did remember
+that you had suggested doing instruction scheduling on SSA form instead
+of a dependence DAG earlier in the semester. When we talked about
+it yesterday, I didn't remember where the idea had come from but I
+remembered later. Just giving credit where it's due...
+
+Perhaps you can save the above as a file under RCS so you and I can
+continue to expand on this.
+
+--Vikram
+
diff --git a/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeasResp.txt b/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeasResp.txt
new file mode 100644
index 0000000000..1c725f5aa7
--- /dev/null
+++ b/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeasResp.txt
@@ -0,0 +1,199 @@
+Date: Sun, 19 Nov 2000 16:23:57 -0600 (CST)
+From: Chris Lattner <sabre@nondot.org>
+To: Vikram Adve <vadve@cs.uiuc.edu>
+Subject: Re: a few thoughts
+
+Okay... here are a few of my thoughts on this (it's good to know that we
+think so alike!):
+
+> 1. We need to be clear on our goals for the VM. Do we want to emphasize
+> portability and safety like the Java VM? Or shall we focus on the
+> architecture interface first (i.e., consider the code generation and
+> processor issues), since the architecture interface question is also
+> important for portable Java-type VMs?
+
+I foresee the architecture looking kinda like this (which is completely
+subject to change):
+
+1. The VM code is NOT guaranteed safe in a Java sense. Doing so makes it
+   basically impossible to support C-like languages. Besides that,
+ certifying a register based language as safe at run time would be a
+ pretty expensive operation to have to do. Additionally, we would like
+ to be able to statically eliminate many bounds checks in Java
+ programs... for example.
+
+2. Instead, we can do the following (eventually):
+ * Java bytecode is used as our "safe" representation (to avoid
+ reinventing something that we don't add much value to). When the
+ user chooses to execute Java bytecodes directly (ie, not
+ precompiled) the runtime compiler can do some very simple
+ transformations (JIT style) to convert it into valid input for our
+ VM. Performance is not wonderful, but it works right.
+ * The file is scheduled to be compiled (rigorously) at a later
+ time. This could be done by some background process or by a second
+ processor in the system during idle time or something...
+ * To keep things "safe" ie to enforce a sandbox on Java/foreign code,
+ we could sign the generated VM code with a host specific private
+ key. Then before the code is executed/loaded, we can check to see if
+ the trusted compiler generated the code. This would be much quicker
+ than having to validate consistency (especially if bounds checks have
+ been removed, for example)
+
+> This is important because the audiences for these two goals are very
+> different. Architects and many compiler people care much more about
+> the second question. The Java compiler and OS community care much more
+> about the first one.
+
+3. By focusing on a more low level virtual machine, we have much more room
+ for value add. The nice safe "sandbox" VM can be provided as a layer
+ on top of it. It also lets us focus on the more interesting compilers
+ related projects.
+
+> 2. Design issues to consider (an initial list that we should continue
+> to modify). Note that I'm not trying to suggest actual solutions here,
+> but just various directions we can pursue:
+
+Understood. :)
+
+> a. A single-assignment VM, which we've both already been thinking
+> about.
+
+Yup, I think that this makes a lot of sense. I am still intrigued,
+however, by the prospect of a minimally allocated VM representation... I
+think that it could have definite advantages for certain applications
+(think very small machines, like PDAs). I don't, however, think that our
+initial implementations should focus on this. :)
+
+Here are some other auxiliary goals that I think we should consider:
+
+1. Primary goal: Support a high performance dynamic compilation
+ system. This means that we have an "ideal" division of labor between
+ the runtime and static compilers. Of course, the other goals of the
+   system somewhat reduce the importance of this point (e.g., portability
+ reduces performance, but hopefully not much)
+2. Portability to different processors. Since we are most familiar with
+ x86 and solaris, I think that these two are excellent candidates when
+ we get that far...
+3. Support for all languages & styles of programming (general purpose
+ VM). This is the point that disallows java style bytecodes, where all
+ array refs are checked for bounds, etc...
+4. Support linking between different language families. For example, call
+ C functions directly from Java without using the nasty/slow/gross JNI
+ layer. This involves several subpoints:
+ A. Support for languages that require garbage collectors and integration
+ with languages that don't. As a base point, we could insist on
+      always using a conservative GC, but implement free as a no-op, e.g.
+
+> b. A strongly-typed VM. One question is do we need the types to be
+> explicitly declared or should they be inferred by the dynamic
+> compiler?
+
+ B. This is kind of similar to another idea that I have: make OOP
+   constructs (virtual function tables, class hierarchies, etc) explicit
+ in the VM representation. I believe that the number of additional
+ constructs would be fairly low, but would give us lots of important
+ information... something else that would/could be important is to
+ have exceptions as first class types so that they would be handled in
+ a uniform way for the entire VM... so that C functions can call Java
+ functions for example...
+
+> c. How do we get more high-level information into the VM while keeping
+> to a low-level VM design?
+> o Explicit array references as operands? An alternative is
+> to have just an array type, and let the index computations be
+> separate 3-operand instructions.
+
+ C. In the model I was thinking of (subject to change of course), we
+ would just have an array type (distinct from the pointer
+ types). This would allow us to have arbitrarily complex index
+ expressions, while still distinguishing "load" from "Array load",
+ for example. Perhaps also, switch jump tables would be first class
+ types as well? This would allow better reasoning about the program.
+
+5. Support dynamic loading of code from various sources. Already
+ mentioned above was the example of loading java bytecodes, but we want
+ to support dynamic loading of VM code as well. This makes the job of
+ the runtime compiler much more interesting: it can do interprocedural
+ optimizations that the static compiler can't do, because it doesn't
+ have all of the required information (for example, inlining from
+ shared libraries, etc...)
+
+6. Define a set of generally useful annotations to add to the VM
+ representation. For example, a function can be analysed to see if it
+   has any side effects when run... also, the MOD/REF sets could be
+ calculated, etc... we would have to determine what is reasonable. This
+ would generally be used to make IP optimizations cheaper for the
+ runtime compiler...
+
+> o Explicit instructions to handle aliasing, e.g.s:
+> -- an instruction to say "I speculate that these two values are not
+> aliased, but check at runtime", like speculative execution in
+> EPIC?
+> -- or an instruction to check whether two values are aliased and
+> execute different code depending on the answer, somewhat like
+> predicated code in EPIC
+
+These are also very good points... if this can be determined at compile
+time. I think that an epic style of representation (not the instruction
+packing, just the information presented) could be a very interesting model
+to use... more later...
+
+> o (This one is a difficult but powerful idea.)
+> A "thread-id" field on every instruction that allows the static
+> compiler to generate a set of parallel threads, and then have
+> the runtime compiler and hardware do what they please with it.
+> This has very powerful uses, but thread-id on every instruction
+> is expensive in terms of instruction size and code size.
+> We would need to compactly encode it somehow.
+
+Yes yes yes! :) I think it would be *VERY* useful to include this kind
+of information (which EPIC architectures *implicitly* encode). The trend
+that we are seeing supports this greatly:
+
+1. Commodity processors are getting massive SIMD support:
+   * Intel/AMD MMX/MMX2
+ * AMD's 3Dnow!
+ * Intel's SSE/SSE2
+ * Sun's VIS
+2. SMP is becoming much more common, especially in the server space.
+3. Multiple processors on a die are right around the corner.
+
+If nothing else, not designing this in would severely limit our future
+expansion of the project...
+
+> Also, this will require some reading on at least two other
+> projects:
+> -- Multiscalar architecture from Wisconsin
+> -- Simultaneous multithreading architecture from Washington
+>
+> o Or forget all this and stick to a traditional instruction set?
+
+Heh... :) Well, from a pure research point of view, it is almost more
+attractive to go with the most extreme/different ISA possible. On one axis
+you get safety and conservatism, and on the other you get degree of
+influence that the results have. Of course the problem with pure research
+is that often times there is no concrete product of the research... :)
+
+> BTW, on an unrelated note, after the meeting yesterday, I did remember
+> that you had suggested doing instruction scheduling on SSA form instead
+> of a dependence DAG earlier in the semester. When we talked about
+> it yesterday, I didn't remember where the idea had come from but I
+> remembered later. Just giving credit where its due...
+
+:) Thanks.
+
+> Perhaps you can save the above as a file under RCS so you and I can
+> continue to expand on this.
+
+I think it makes sense to do so when we get our ideas more formalized and
+bounce it back and forth a couple of times... then I'll do a more formal
+writeup of our goals and ideas. Obviously our first implementation will
+not want to do all of the stuff that I pointed out above... but we will
+want to design the project so that we do not artificially limit ourselves
+at some point in the future...
+
+Anyways, let me know what you think about these ideas... and if they sound
+reasonable...
+
+-Chris
+
diff --git a/docs/HistoricalNotes/2000-12-06-EncodingIdea.txt b/docs/HistoricalNotes/2000-12-06-EncodingIdea.txt
new file mode 100644
index 0000000000..8c452924dd
--- /dev/null
+++ b/docs/HistoricalNotes/2000-12-06-EncodingIdea.txt
@@ -0,0 +1,30 @@
+From: Chris Lattner [mailto:sabre@nondot.org]
+Sent: Wednesday, December 06, 2000 6:41 PM
+To: Vikram S. Adve
+Subject: Additional idea with respect to encoding
+
+Here's another idea with respect to keeping the common case instruction
+size down (less than 32 bits ideally):
+
+Instead of encoding an instruction to operate on two register numbers,
+have it operate on two negative offsets based on the current register
+number. Therefore, instead of using:
+
+r57 = add r55, r56 (r57 is the implicit dest register, of course)
+
+We could use:
+
+r57 = add -2, -1
+
+My guess is that most SSA references are to recent values (especially if
+they correspond to expressions like (x+y*z+p*q/ ...)), so the negative
+numbers would tend to stay small, even at the end of the procedure (where
+the implicit register destination number could be quite large). Of course
+the negative sign is redundant, so you would be storing small integers
+almost all of the time, and 5-6 bits worth of register number would be
+plenty for most cases...
+
+What do you think?
+
+-Chris
+
diff --git a/docs/HistoricalNotes/2000-12-06-MeetingSummary.txt b/docs/HistoricalNotes/2000-12-06-MeetingSummary.txt
new file mode 100644
index 0000000000..b66e18556f
--- /dev/null
+++ b/docs/HistoricalNotes/2000-12-06-MeetingSummary.txt
@@ -0,0 +1,83 @@
+SUMMARY
+-------
+
+We met to discuss the LLVM instruction format and bytecode representation:
+
+ISSUES RESOLVED
+---------------
+
+1. We decided that we shall use a flat namespace to represent our
+ variables in SSA form, as opposed to having a two dimensional namespace
+ of the original variable and the SSA instance subscript.
+
+ARGUMENT AGAINST:
+ * A two dimensional namespace would be valuable when doing alias
+ analysis because the extra information can help limit the scope of
+ analysis.
+
+ARGUMENT FOR:
+ * Including this information would require that all users of the LLVM
+ bytecode would have to parse and handle it. This would slow down the
+ common case and inflate the instruction representation with another
+ infinite variable space.
+
+REASONING:
+ * It was decided that because original variable sources could be
+    reconstructed from SSA form in linear time, it would be an
+    unjustified expense for the common case to include the extra
+    information for one optimization. Alias analysis itself is typically
+    greater than linear in asymptotic complexity, so this extra analysis
+ would not affect the runtime of the optimization in a significant
+ way. Additionally, this would be an unlikely optimization to do at
+ runtime.
+
+
+IDEAS TO CONSIDER
+-----------------
+
+1. Including dominator information in the LLVM bytecode
+ representation. This is one example of an analysis result that may be
+ packaged with the bytecodes themselves. As a conceptual implementation
+ idea, we could include an immediate dominator number for each basic block
+ in the LLVM bytecode program. Basic blocks could be numbered according
+   to the order of occurrence in the bytecode representation.
+
+2. Including loop header and body information. This would facilitate
+ detection of intervals and natural loops.
+
+UNRESOLVED ISSUES
+-----------------
+
+1. Will oSUIF provide enough of an infrastructure to support the research
+ that we will be doing? We know that it has less than stellar
+ performance, but hope that this will be of little importance for our
+ static compiler. This could affect us if we decided to do some IP
+ research. Also we do not yet understand the level of exception support
+ currently implemented.
+
+2. Should we consider the requirements of a direct hardware implementation
+ of the LLVM when we design it? If so, several design issues should
+ have their priorities shifted. The other option is to focus on a
+ software layer interpreting the LLVM in all cases.
+
+3. Should we use some form of packetized format to improve forward
+ compatibility? For example, we could design the system to encode a
+ packet type and length field before analysis information, to allow a
+ runtime to skip information that it didn't understand in a bytecode
+   stream. The obvious benefit would be compatibility; the drawback
+   is that it would tend to splinter the 'standard' LLVM definition.
+
+4. Should we use fixed length instructions or variable length
+ instructions? Fetching variable length instructions is expensive (for
+ either hardware or software based LLVM runtimes), but we have several
+ 'infinite' spaces that instructions operate in (SSA register numbers,
+ type spaces, or packet length [if packets were implemented]). Several
+ options were mentioned including:
+ A. Using 16 or 32 bit numbers, which would be 'big enough'
+ B. A scheme similar to how UTF-8 works, to encode infinite numbers
+      while keeping small numbers small.
+ C. Use something similar to Huffman encoding, so that the most common
+ numbers are the smallest.
+
+-Chris
+
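[Editor's note: option B in the summary above, a UTF-8-like scheme that keeps small numbers small while allowing unbounded ones, corresponds to what is now commonly called a varint (LEB128-style) encoding. A minimal sketch, added here purely for illustration:]

```python
def encode_varint(n):
    """Encode a non-negative integer 7 bits per byte; the high bit of
    each byte means 'more bytes follow'. Values 0-127 take one byte."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit set
        else:
            out.append(byte)         # final byte
            return bytes(out)

def decode_varint(data):
    """Decode the integer back from its variable-length byte string."""
    n = 0
    shift = 0
    for byte in data:
        n |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            break
    return n
```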
diff --git a/docs/HistoricalNotes/2001-01-31-UniversalIRIdea.txt b/docs/HistoricalNotes/2001-01-31-UniversalIRIdea.txt
new file mode 100644
index 0000000000..111706a344
--- /dev/null
+++ b/docs/HistoricalNotes/2001-01-31-UniversalIRIdea.txt
@@ -0,0 +1,39 @@
+Date: Wed, 31 Jan 2001 12:04:33 -0600
+From: Vikram S. Adve <vadve@cs.uiuc.edu>
+To: Chris Lattner <lattner@cs.uiuc.edu>
+Subject: another thought
+
+I have a budding idea about making LLVM a little more ambitious: a
+customizable runtime system that can be used to implement language-specific
+virtual machines for many different languages. E.g., a C vm, a C++ vm, a
+Java vm, a Lisp vm, ..
+
+The idea would be that LLVM would provide a standard set of runtime features
+(some low-level like standard assembly instructions with code generation and
+static and runtime optimization; some higher-level like type-safety and
+perhaps a garbage collection library). Each language vm would select the
+runtime features needed for that language, extending or customizing them as
+needed. Most of the machine-dependent code-generation and optimization
+features as well as low-level machine-independent optimizations (like PRE)
+could be provided by LLVM and should be sufficient for any language,
+simplifying the language compiler. (This would also help interoperability
+between languages.) Also, some or most of the higher-level
+machine-independent features like type-safety and access safety should be
+reusable by different languages, with minor extensions. The language
+compiler could then focus on language-specific analyses and optimizations.
+
+The risk is that this sounds like a universal IR -- something that the
+compiler community has tried and failed to develop for decades, and is
+universally skeptical about. No matter what we say, we won't be able to
+convince anyone that we have a universal IR that will work. We need to
+think about whether LLVM is different or if it has something novel that might
+convince people. E.g., the idea of providing a package of separable
+features that different languages select from. Also, using SSA with or
+without type-safety as the intermediate representation.
+
+One interesting starting point would be to discuss how a JVM would be
+implemented on top of LLVM a bit more. That might give us clues on how to
+structure LLVM to support one or more language VMs.
+
+--Vikram
+
diff --git a/docs/HistoricalNotes/2001-02-06-TypeNotationDebate.txt b/docs/HistoricalNotes/2001-02-06-TypeNotationDebate.txt
new file mode 100644
index 0000000000..c09cf1f03c
--- /dev/null
+++ b/docs/HistoricalNotes/2001-02-06-TypeNotationDebate.txt
@@ -0,0 +1,67 @@
+Date: Tue, 6 Feb 2001 20:27:37 -0600 (CST)
+From: Chris Lattner <sabre@nondot.org>
+To: Vikram S. Adve <vadve@cs.uiuc.edu>
+Subject: Type notation debate...
+
+This is the way that I am currently planning on implementing types:
+
+Primitive Types:
+type ::= void|bool|sbyte|ubyte|short|ushort|int|uint|long|ulong
+
+Method:
+typelist ::= typelisth | /*empty*/
+typelisth ::= type | typelisth ',' type
+type ::= type (typelist)
+
+Arrays (without and with size):
+type ::= '[' type ']' | '[' INT ',' type ']'
+
+Pointer:
+type ::= type '*'
+
+Structure:
+type ::= '{' typelist '}'
+
+Packed:
+type ::= '<' INT ',' type '>'
+
+Simple examples:
+
+[[ %4, int ]] - array of (array of 4 (int))
+[ { int, int } ] - Array of structure
+[ < %4, int > ] - Array of 128 bit SIMD packets
+int (int, [[int, %4]]) - Method taking a 2d array and int, returning int
+
+
+Okay before you comment, please look at:
+
+http://www.research.att.com/~bs/devXinterview.html
+
+Search for "In another interview, you defined the C declarator syntax as
+an experiment that failed. However, this syntactic construct has been
+around for 27 years and perhaps more; why do you consider it problematic
+(except for its cumbersome syntax)?" and read that response for me. :)
+
+Now with this syntax, his example would be represented as:
+
+[ %10, bool (int, int) * ] *
+
+vs
+
+bool (*(*)[10])(int, int)
+
+in C.
+
+Basically, my argument for this type construction system is that it is
+VERY simple to use and understand (although it IS different than C, it is
+very simple and straightforward, which C is NOT). In fact, I would assert
+that most programmers TODAY do not understand pointers to member
+functions, and have to look up an example when they have to write them.
+
+In my opinion, it is critically important to have clear and concise type
+specifications, because types are going to be all over the programs.
+
+Let me know your thoughts on this. :)
+
+-Chris
+
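[Editor's note: for illustration only, and not part of the original message, the example type above can be built mechanically from the proposed grammar; the tiny helpers below are invented for this sketch.]

```python
def ptr(t):
    """Pointer rule: type '*'"""
    return f"{t} *"

def array(t, n=None):
    """Array rule: '[' type ']' without size, '[' %INT ',' type ']' with."""
    return f"[ {t} ]" if n is None else f"[ %{n}, {t} ]"

def func(ret, params):
    """Method rule: type '(' typelist ')'"""
    return f"{ret} ({', '.join(params)})"

# Stroustrup's example, written bool (*(*)[10])(int, int) in C,
# composes inside-out in the proposed notation:
example = ptr(array(ptr(func("bool", ["int", "int"])), 10))
# -> "[ %10, bool (int, int) * ] *"
```

Each constructor wraps its argument from the inside out, which is the point of the proposal: the nesting of the text matches the nesting of the type.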
diff --git a/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp1.txt b/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp1.txt
new file mode 100644
index 0000000000..8bfefbf69f
--- /dev/null
+++ b/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp1.txt
@@ -0,0 +1,75 @@
+Date: Thu, 8 Feb 2001 08:42:04 -0600
+From: Vikram S. Adve <vadve@cs.uiuc.edu>
+To: Chris Lattner <sabre@nondot.org>
+Subject: RE: Type notation debate...
+
+Chris,
+
+> Okay before you comment, please look at:
+>
+> http://www.research.att.com/~bs/devXinterview.html
+
+I read this argument. Even before that, I was already in agreement with you
+and him that the C declarator syntax is difficult and confusing.
+
+But in fact, if you read the entire answer carefully, he came to the same
+conclusion I do: that you have to go with familiar syntax over logical
+syntax because familiarity is such a strong force:
+
+    "However, familiarity is a strong force. To compare, in English,
+we live more or less happily with the absurd rules for "to be" (am,
+are, is, been, was, were, ...) and all attempts to simplify are treated
+with contempt or (preferably) humor. It be a curious world and it
+always beed."
+
+> Basically, my argument for this type construction system is that it is
+> VERY simple to use and understand (although it IS different than C, it is
+> very simple and straightforward, which C is NOT). In fact, I would assert
+> that most programmers TODAY do not understand pointers to member
+> functions, and have to look up an example when they have to write them.
+
+Again, I don't disagree with this at all. But to some extent this
+particular problem is inherently difficult. Your syntax for the above
+example may be easier for you to read because this is the way you have been
+thinking about it. Honestly, I don't find it much easier than the C syntax.
+In either case, I would have to look up an example to write pointers to
+member functions.
+
+But pointers to member functions are nowhere near as common as arrays. And
+the old array syntax:
+ type [ int, int, ...]
+is just much more familiar and clear to people than anything new you
+introduce, no matter how logical it is. Introducing a new syntax that may
+make function pointers easier but makes arrays much more difficult seems
+very risky to me.
+
+> In my opinion, it is critically important to have clear and concise type
+> specifications, because types are going to be all over the programs.
+
+I absolutely agree. But the question is, what is more clear and concise?
+The syntax programmers are used to out of years of experience or a new
+syntax that they have never seen that has a more logical structure. I think
+the answer is the former. Sometimes, you have to give up a better idea
+because you can't overcome sociological barriers to it. Qwerty keyboards
+and Windows are two classic examples of bad technology that are difficult to
+root out.
+
+P.S. Also, while I agree that most of your syntax is more logical, there is
+one part that isn't:
+
+Arrays (without and with size):
+type ::= '[' type ']' | '[' INT ',' type ']'.
+
+The arrays with size lists the dimensions and the type in a single list.
+That is just too confusing:
+ [10, 40, int]
+This seems to be a 3-D array where the third dimension is something strange.
+It is too confusing to have a list of 3 things, some of which are dimensions
+and one is a type. Either of the following would be better:
+
+ array [10, 40] of int
+or
+ int [10, 40]
+
+--Vikram
+
diff --git a/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp2.txt b/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp2.txt
new file mode 100644
index 0000000000..6e9784158a
--- /dev/null
+++ b/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp2.txt
@@ -0,0 +1,53 @@
+Date: Thu, 8 Feb 2001 14:31:05 -0600 (CST)
+From: Chris Lattner <sabre@nondot.org>
+To: Vikram S. Adve <vadve@cs.uiuc.edu>
+Subject: RE: Type notation debate...
+
+> Arrays (without and with size):
+> type ::= '[' type ']' | '[' INT ',' type ']'.
+>
+> The arrays with size lists the dimensions and the type in a single list.
+> That is just too confusing:
+
+> [10, 40, int]
+> This seems to be a 3-D array where the third dimension is something strange.
+> It is too confusing to have a list of 3 things, some of which are dimensions
+> and one is a type.
+
+The above grammar indicates that there is only one integer parameter, ie
+the upper bound. The lower bound is always implied to be zero, for
+several reasons:
+
+* As a low level VM, we want to expose addressing computations
+ explicitly. Since the lower bound must always be known in a high level
+ language statically, the language front end can do the translation
+ automatically.
+* This fits more closely with what Java needs, ie what we need in the
+ short term. Java arrays are always zero based.
+
+If a two element list is too confusing, I would recommend an alternate
+syntax of:
+
+type ::= '[' type ']' | '[' INT 'x' type ']'.
+
+For example:
+ [12 x int]
+ [12x int]
+ [ 12 x [ 4x int ]]
+
+Which is syntactically nicer, and more explicit.
+
+> Either of the following would be better:
+> array [10, 40] of int
+
+I considered this approach for arrays in general (ie array of int/ array
+of 12 int), but found that it made declarations WAY too long. Remember
+that because of the nature of LLVM, you get a lot of types strewn all over
+the program, and using the 'typedef' like facility is not a wonderful
+option, because then types aren't explicit anymore.
+
+I find this email interesting, because you contradict the previous email
+you sent, where you recommend that we stick to C syntax....
+
+-Chris
+
diff --git a/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp4.txt b/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp4.txt
new file mode 100644
index 0000000000..7b9032742a
--- /dev/null
+++ b/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp4.txt
@@ -0,0 +1,89 @@
+> But in fact, if you read the entire answer carefully, he came to the same
+> conclusion I do: that you have to go with familiar syntax over logical
+> syntax because familiarity is such a strong force:
+>     "However, familiarity is a strong force. To compare, in English,
+> we live more or less happily with the absurd rules for "to be" (am,
+> are, is, been, was, were, ...) and all attempts to simplify are treated
+> with contempt or (preferably) humor. It be a curious world and it
+> always beed."
+
+Although you have to remember that his situation was considerably
+different than ours. He was in a position where he was designing a high
+level language that had to be COMPATIBLE with C. Our language is such
+that a new person would have to learn the new, different, syntax
+anyways. Making them learn about the type system does not seem like much
+of a stretch from learning the opcodes and how SSA form works, and how
+everything ties together...
+
+> > Basically, my argument for this type construction system is that it is
+> > VERY simple to use and understand (although it IS different than C, it is
+> > very simple and straightforward, which C is NOT). In fact, I would assert
+> > that most programmers TODAY do not understand pointers to member
+> > functions, and have to look up an example when they have to write them.
+
+> Again, I don't disagree with this at all. But to some extent this
+> particular problem is inherently difficult. Your syntax for the above
+> example may be easier for you to read because this is the way you have been
+> thinking about it. Honestly, I don't find it much easier than the C syntax.
+> In either case, I would have to look up an example to write pointers to
+> member functions.
+
+I would argue that because the lexical structure of the language is
+self-consistent, any person who spent a significant amount of time programming
+in LLVM directly would understand how to do it without looking it up in a
+manual. The reason this does not work for C is because you rarely have to
+declare these pointers, and the syntax is inconsistent with the method
+declaration and calling syntax.
+
+> But pointers to member functions are nowhere near as common as arrays.
+
+Very true. If you're implementing an object oriented language, however,
+remember that you have to do all the pointer to member function stuff
+yourself.... so every time you invoke a virtual method one is involved
+(instead of having C++ hide it for you behind "syntactic sugar").
+
+> And the old array syntax:
+> type [ int, int, ...]
+> is just much more familiar and clear to people than anything new you
+> introduce, no matter how logical it is.
+
+Erm... excuse me but how is this the "old array syntax"? If you are
+arguing for consistency with C, you should be asking for 'type int []',
+which is significantly different from the above (besides, the above
+introduces a new operator and duplicates information
+needlessly). Basically what I am suggesting is exactly the above without
+the fluff. So instead of:
+
+ type [ int, int, ...]
+
+you use:
+
+ type [ int ]
+
+> Introducing a new syntax that may
+> make function pointers easier but makes arrays much more difficult seems
+> very risky to me.
+
+This is not about function pointers. This is about consistency in the
+type system, and consistency with the rest of the language. The point
+above does not make arrays any more difficult to use, and makes the
+structure of types much more obvious than the "C way".
+
+> > In my opinion, it is critically important to have clear and concise type
+> > specifications, because types are going to be all over the programs.
+>
+> I absolutely agree. But the question is, what is more clear and concise?
+> The syntax programmers are used to out of years of experience or a new
+> syntax that they have never seen that has a more logical structure. I think
+> the answer is the former. Sometimes, you have to give up a better idea
+> because you can't overcome sociological barriers to it. Qwerty keyboards
+> and Windows are two classic examples of bad technology that are difficult to
+> root out.
+
+Very true, but you seem to be advocating a completely different type
+system than C has, one that does not offer the advantages of clear
+structure that the system I recommended does... so you don't seem to have
+a problem with changing this, just with what I change it to. :)
+
+-Chris
+
diff --git a/docs/HistoricalNotes/2001-02-09-AdveComments.txt b/docs/HistoricalNotes/2001-02-09-AdveComments.txt
new file mode 100644
index 0000000000..5503233c1e
--- /dev/null
+++ b/docs/HistoricalNotes/2001-02-09-AdveComments.txt
@@ -0,0 +1,120 @@
+Ok, here are my comments and suggestions about the LLVM instruction set.
+We should discuss some now, but can discuss many of them later, when we
+revisit synchronization, type inference, and other issues.
+(We have discussed some of the comments already.)
+
+
+o We should consider eliminating the type annotation in cases where it is
+ essentially obvious from the instruction type, e.g., in br, it is obvious
+ that the first arg. should be a bool and the other args should be labels:
+
+ br bool <cond>, label <iftrue>, label <iffalse>
+
+ I think your point was that making all types explicit improves clarity
+ and readability. I agree to some extent, but it also comes at the cost
+ of verbosity. And when the types are obvious from people's experience
+ (e.g., in the br instruction), it doesn't seem to help as much.
+
+
+o On reflection, I really like your idea of having the two different switch
+ types (even though they encode implementation techniques rather than
+ semantics). It should simplify building the CFG and my guess is it could
+ enable some significant optimizations, though we should think about which.
+
+
+o In the lookup-indirect form of the switch, is there a reason not to make
+ the val-type uint? Most HLL switch statements (including Java and C++)
+ require that anyway. And it would also make the val-type uniform
+ in the two forms of the switch.
+
+ I did see the switch-on-bool examples and, while cute, we can just use
+ the branch instructions in that particular case.
+
+
+o I agree with your comment that we don't need 'neg'.
+
+
+o There's a trade-off with the cast instruction:
+ + it avoids having to define all the upcasts and downcasts that are
+ valid for the operands of each instruction (you probably have thought
+ of other benefits also)
+ - it could make the bytecode significantly larger because there could
+ be a lot of cast operations
+
+
+o Making the second arg. to 'shl' a ubyte seems good enough to me.
+ 255 positions seems adequate for several generations of machines
+ and is more compact than uint.
+
+
+o I still have some major concerns about including malloc and free in the
+ language (either as builtin functions or instructions). LLVM must be
+ able to represent code from many different languages. Languages such as
+ C, C++, Java, and Fortran 90 would not be able to use our malloc anyway
+ because each of them will want to provide a library implementation of it.
+
+ This gets even worse when code from different languages is linked
+ into a single executable (which is fairly common in large apps).
+ Having a single malloc would just not suffice, and instead would simply
+ complicate the picture further because it adds an extra variant in
+ addition to the one each language provides.
+
+ Instead, providing a default library version of malloc and free
+ (and perhaps a malloc_gc with garbage collection instead of free)
+ would make a good implementation available to anyone who wants it.
+
+ I don't recall all your arguments in favor so let's discuss this again,
+ and soon.
+
+
+o 'alloca' on the other hand sounds like a good idea, and the
+ implementation seems fairly language-independent so it doesn't have the
+ problems with malloc listed above.
+
+
+o About indirect call:
+ Your option #2 sounded good to me. I'm not sure I understand your
+ concern about an explicit 'icall' instruction?
+
+
+o A pair of important synchronization instr'ns to think about:
+ load-linked
+ store-conditional
+
+
+o Other classes of instructions that are valuable for pipeline performance:
+ conditional-move
+ predicated instructions
+
+
+o I believe tail calls are relatively easy to identify; do you know why
+ .NET has a tailcall instruction?
+
+
+o I agree that we need a static data space. Otherwise, emulating global
+ data gets unnecessarily complex.
+
+
+o About explicit parallelism:
+
+ We once talked about adding a symbolic thread-id field to each
+ instruction. (It could be optional so single-threaded codes are
+ not penalized.) This could map well to multi-threaded architectures
+ while providing easy ILP for single-threaded ones. But it is probably
+ too radical an idea to include in a base version of LLVM. Instead, it
+ could be a great topic for a separate study.
+
+ What is the semantics of the IA64 stop bit?
+
+
+
+
+o And finally, another thought about the syntax for arrays :-)
+
+ Although this syntax:
+ array <dimension-list> of <type>
+ is verbose, it will be used only in the human-readable assembly code so
+ size should not matter. I think we should consider it because I find it
+ to be the clearest syntax. It could even make arrays of function
+ pointers somewhat readable.
+
diff --git a/docs/HistoricalNotes/2001-02-09-AdveCommentsResponse.txt b/docs/HistoricalNotes/2001-02-09-AdveCommentsResponse.txt
new file mode 100644
index 0000000000..4d2879554a
--- /dev/null
+++ b/docs/HistoricalNotes/2001-02-09-AdveCommentsResponse.txt
@@ -0,0 +1,245 @@
+From: Chris Lattner <sabre@nondot.org>
+To: "Vikram S. Adve" <vadve@cs.uiuc.edu>
+Subject: Re: LLVM Feedback
+
+I've included your feedback in the /home/vadve/lattner/llvm/docs directory
+so that it will live in CVS eventually with the rest of LLVM. I've
+significantly updated the documentation to reflect the changes you
+suggested, as specified below:
+
+> We should consider eliminating the type annotation in cases where it is
+> essentially obvious from the instruction type:
+> br bool <cond>, label <iftrue>, label <iffalse>
+> I think your point was that making all types explicit improves clarity
+> and readability. I agree to some extent, but it also comes at the
+> cost of verbosity. And when the types are obvious from people's
+> experience (e.g., in the br instruction), it doesn't seem to help as
+> much.
+
+Very true. We should discuss this more, but my reasoning is more of a
+consistency argument. There are VERY few instructions that can have all
+of the types eliminated, and doing so when available unnecessarily makes
+the language more difficult to handle. Especially when you see 'int
+%this' and 'bool %that' all over the place, I think it would be
+disorienting to see:
+
+ br %predicate, %iftrue, %iffalse
+
+for branches. Even just typing that once gives me the creeps. ;) Like I
+said, we should probably discuss this further in person...
+
+> On reflection, I really like your idea of having the two different
+> switch types (even though they encode implementation techniques rather
+> than semantics). It should simplify building the CFG and my guess is it
+> could enable some significant optimizations, though we should think
+> about which.
+
+Great. I added a note to the switch section commenting on how the VM
+should just use the instruction type as a hint, and that the
+implementation may choose alternate representations (such as predicated
+branches).
+
+> In the lookup-indirect form of the switch, is there a reason not to
+> make the val-type uint?
+
+No. This was something I was debating for a while, and didn't really feel
+strongly about either way. It is common to switch on other types in HLL's
+(for example signed int's are particularly common), but in this case, all
+that will be added is an additional 'cast' instruction. I removed that
+from the spec.
+
+> I agree with your comment that we don't need 'neg'
+
+Removed.
+
+> There's a trade-off with the cast instruction:
+> + it avoids having to define all the upcasts and downcasts that are
+> valid for the operands of each instruction (you probably have
+> thought of other benefits also)
+> - it could make the bytecode significantly larger because there could
+> be a lot of cast operations
+
+ + You NEED casts to represent things like:
+ void foo(float);
+ ...
+ int x;
+ ...
+ foo(x);
+ in a language like C. Even in a Java like language, you need upcasts
+ and some way to implement dynamic downcasts.
+ + Not all forms of instructions take every type (for example you can't
+ shift by a floating point number of bits), thus SOME programs will need
+ implicit casts.
+
+To be efficient and to avoid your '-' point above, we just have to be
+careful to specify that the instructions shall operate on all common
+types, therefore casting should be relatively uncommon. For example all
+of the arithmetic operations work on almost all data types.
+
+> Making the second arg. to 'shl' a ubyte seems good enough to me.
+> 255 positions seems adequate for several generations of machines
+
+Okay, that comment is removed.
+
+> and is more compact than uint.
+
+No, it isn't. Remember that the bytecode encoding saves value slots into
+the bytecode instructions themselves, not constant values. This is
+another case where we may introduce more cast instructions (but we will
+also reduce the number of opcode variants that must be supported by a
+virtual machine). Because most shifts are by constant values, I don't
+think that we'll have to cast many shifts. :)
+
+> I still have some major concerns about including malloc and free in the
+> language (either as builtin functions or instructions).
+
+Agreed. How about this proposal:
+
+malloc/free are either built in functions or actual opcodes. They provide
+all of the type safety that the document would indicate, blah blah
+blah. :)
+
+Now, because of all of the excellent points that you raised, an
+implementation may want to override the default malloc/free behavior of
+the program. To do this, they simply implement a "malloc" and
+"free" function. The virtual machine will then be defined to use the user
+defined malloc/free function (which return/take void*'s, not typed
+pointers like the builtin function would) if one is available, otherwise
+fall back on a system malloc/free.
+
+Does this sound like a good compromise? It would give us all of the
+typesafety/elegance in the language while still allowing the user to do
+all the cool stuff they want to...
+
+> 'alloca' on the other hand sounds like a good idea, and the
+> implementation seems fairly language-independent so it doesn't have the
+> problems with malloc listed above.
+
+Okay, once we get the above stuff figured out, I'll put it all in the
+spec.
+
+> About indirect call:
+> Your option #2 sounded good to me. I'm not sure I understand your
+> concern about an explicit 'icall' instruction?
+
+I worry too much. :) The other alternative has been removed. 'icall' is
+now up in the instruction list next to 'call'.
+
+> I believe tail calls are relatively easy to identify; do you know why
+> .NET has a tailcall instruction?
+
+Although I am just guessing, I believe it probably has to do with the fact
+that they want languages like Haskell and Lisp to be efficiently runnable
+on their VM. Of course this means that the VM MUST implement tail calls
+'correctly', or else life will suck. :) I would put this into a future
+feature bin, because it could be pretty handy...
+
+> A pair of important synchronization instr'ns to think about:
+> load-linked
+> store-conditional
+
+What is 'load-linked'? I think that (at least for now) I should add these
+to the 'possible extensions' section, because they are not immediately
+needed...
+
+> Other classes of instructions that are valuable for pipeline
+> performance:
+> conditional-move
+> predicated instructions
+
+Conditional move is effectively a special case of a predicated
+instruction... and I think that all predicated instructions can possibly
+be implemented later in LLVM. It would significantly change things, and
+it doesn't seem to be very necessary right now. It would seem to
+complicate flow control analysis a LOT in the virtual machine. I would
+tend to prefer that a predicated architecture like IA64 convert from a
+"basic block" representation to a predicated rep as part of its dynamic
+compilation phase. Also, if a basic block contains ONLY a move, then
+that can be trivially translated into a conditional move...
+
+> I agree that we need a static data space. Otherwise, emulating global
+> data gets unnecessarily complex.
+
+Definitely. Also a later item though. :)
+
+> We once talked about adding a symbolic thread-id field to each
+> ..
+> Instead, it could a great topic for a separate study.
+
+Agreed. :)
+
+> What is the semantics of the IA64 stop bit?
+
+Basically, the IA64 writes instructions like this:
+mov ...
+add ...
+sub ...
+op xxx
+op xxx
+;;
+mov ...
+add ...
+sub ...
+op xxx
+op xxx
+;;
+
+Where the ;; delimits a group of instructions with no dependencies between
+them, which can all be executed concurrently (to the limits of the
+available functional units). The ;; gets translated into a bit set in one
+of the opcodes.
+
+The advantage of this representation is that you don't have to do some
+kind of 'thread id scheduling' pass by having to specify ahead of time how
+many threads to use, and the representation doesn't have a per-instruction
+overhead...
+
+> And finally, another thought about the syntax for arrays :-)
+> Although this syntax:
+> array <dimension-list> of <type>
+> is verbose, it will be used only in the human-readable assembly code so
+> size should not matter. I think we should consider it because I find it
+> to be the clearest syntax. It could even make arrays of function
+> pointers somewhat readable.
+
+My only comment will be to give you an example of why this is a bad
+idea. :)
+
+Here is an example of using the switch statement (with my recommended
+syntax):
+
+switch uint %val, label %otherwise,
+ [%3 x {uint, label}] [ { uint %57, label %l1 },
+ { uint %20, label %l2 },
+ { uint %14, label %l3 } ]
+
+Here it is with the syntax you are proposing:
+
+switch uint %val, label %otherwise,
+ array %3 of {uint, label}
+ array of {uint, label}
+ { uint %57, label %l1 },
+ { uint %20, label %l2 },
+ { uint %14, label %l3 }
+
+Which is ambiguous and very verbose. It would be possible to specify
+constants with [] brackets as in my syntax, which would look like this:
+
+switch uint %val, label %otherwise,
+ array %3 of {uint, label} [ { uint %57, label %l1 },
+ { uint %20, label %l2 },
+ { uint %14, label %l3 } ]
+
+But then the syntax is inconsistent between type definition and constant
+definition (why do []'s enclose the constants but not the types??).
+
+Anyways, I'm sure that there is much debate still to be had over
+this... :)
+
+-Chris
+
+http://www.nondot.org/~sabre/os/
+http://www.nondot.org/MagicStats/
+http://korbit.sourceforge.net/
+
+
diff --git a/docs/HistoricalNotes/2001-02-13-Reference-Memory.txt b/docs/HistoricalNotes/2001-02-13-Reference-Memory.txt
new file mode 100644
index 0000000000..2c7534d9da
--- /dev/null
+++ b/docs/HistoricalNotes/2001-02-13-Reference-Memory.txt
@@ -0,0 +1,39 @@
+Date: Tue, 13 Feb 2001 13:29:52 -0600 (CST)
+From: Chris Lattner <sabre@nondot.org>
+To: Vikram S. Adve <vadve@cs.uiuc.edu>
+Subject: LLVM Concerns...
+
+
+I've updated the documentation to include load store and allocation
+instructions (please take a look and let me know if I'm on the right
+track):
+
+file:/home/vadve/lattner/llvm/docs/LangRef.html#memoryops
+
+I have a couple of concerns I would like to bring up:
+
+1. Reference types
+ Right now, I've spec'd out the language to have a pointer type, which
+ works fine for lots of stuff... except that Java really has
+ references: constrained pointers that cannot be manipulated: added and
+ subtracted, moved, etc... Do we want to have a type like this? It
+ could be very nice for analysis (pointer always points to the start of
+ an object, etc...) and more closely matches Java semantics. The
+ pointer type would be kept for C++ like semantics. Through analysis,
+ C++ pointers could be promoted to references in the LLVM
+ representation.
+
+2. Our "implicit" memory references in assembly language:
+ After thinking about it, this model has two problems:
+ A. If you do pointer analysis and realize that two stores are
+ independent and can share the same memory source object, there is
+ no way to represent this in either the bytecode or assembly.
+ B. When parsing assembly/bytecode, we effectively have to do a full
+ SSA generation/PHI node insertion pass to build the dependencies
+ when we don't want the "pinned" representation. This is not
+ cool.
+ I'm tempted to make memory references explicit in both the assembly and
+ bytecode to get around this... what do you think?
+
+-Chris
+
diff --git a/docs/HistoricalNotes/2001-02-13-Reference-MemoryResponse.txt b/docs/HistoricalNotes/2001-02-13-Reference-MemoryResponse.txt
new file mode 100644
index 0000000000..505343378d
--- /dev/null
+++ b/docs/HistoricalNotes/2001-02-13-Reference-MemoryResponse.txt
@@ -0,0 +1,47 @@
+Date: Tue, 13 Feb 2001 18:25:42 -0600
+From: Vikram S. Adve <vadve@cs.uiuc.edu>
+To: Chris Lattner <sabre@nondot.org>
+Subject: RE: LLVM Concerns...
+
+> 1. Reference types
+> Right now, I've spec'd out the language to have a pointer type, which
+> works fine for lots of stuff... except that Java really has
+> references: constrained pointers that cannot be manipulated: added and
+> subtracted, moved, etc... Do we want to have a type like this? It
+> could be very nice for analysis (pointer always points to the start of
+> an object, etc...) and more closely matches Java semantics. The
+> pointer type would be kept for C++ like semantics. Through analysis,
+> C++ pointers could be promoted to references in the LLVM
+> representation.
+
+
+You're right, having references would be useful. Even for C++ the *static*
+compiler could generate references instead of pointers with fairly
+straightforward analysis. Let's include a reference type for now. But I'm
+also really concerned that LLVM is becoming big and complex and (perhaps)
+too high-level. After we get some initial performance results, we may have
+a clearer idea of what our goals should be and we should revisit this
+question then.
+
+> 2. Our "implicit" memory references in assembly language:
+> After thinking about it, this model has two problems:
+> A. If you do pointer analysis and realize that two stores are
+> independent and can share the same memory source object,
+
+not sure what you meant by "share the same memory source object"
+
+> there is
+> no way to represent this in either the bytecode or assembly.
+> B. When parsing assembly/bytecode, we effectively have to do a full
+> SSA generation/PHI node insertion pass to build the dependencies
+> when we don't want the "pinned" representation. This is not
+> cool.
+
+I understand the concern. But again, let's focus on the performance first
+and then look at the language design issues. E.g., it would be good to know
+how big the bytecode files are before expanding them further. I am pretty
+keen to explore the implications of LLVM for mobile devices. Both bytecode
+size and power consumption are important to consider there.
+
+--Vikram
+
diff --git a/docs/HistoricalNotes/2001-04-16-DynamicCompilation.txt b/docs/HistoricalNotes/2001-04-16-DynamicCompilation.txt
new file mode 100644
index 0000000000..923aa62037
--- /dev/null
+++ b/docs/HistoricalNotes/2001-04-16-DynamicCompilation.txt
@@ -0,0 +1,12 @@
+By Chris:
+
+LLVM has been designed with two primary goals in mind. First we strive to enable the best possible division of labor between static and dynamic compilers, and second, we need a flexible and powerful interface between these two complementary stages of compilation. We feel that providing a solution to these two goals will yield an excellent solution to the performance problem faced by modern architectures and programming languages.
+
+A key insight into current compiler and runtime systems is that a compiler may fall anywhere in a "continuum of compilation" to do its job. On one side, scripting languages statically compile nothing and dynamically compile (or equivalently, interpret) everything. At the other extreme, traditional static compilers process everything statically and nothing dynamically. These approaches have typically been seen as a tradeoff between performance and portability. On a deeper level, however, there are two reasons that optimal system performance may be obtained by a system somewhere in between these two extremes: dynamic application behavior and social constraints.
+
+From a technical perspective, pure static compilation cannot ever give optimal performance in all cases, because applications have varying dynamic behavior that the static compiler cannot take into consideration. Even compilers that support profile-guided optimization generate poor code in the real world, because using such optimization tunes the application to one particular usage pattern, whereas real programs (as opposed to benchmarks) often have several different usage patterns.
+
+On a social level, static compilation is a very shortsighted solution to the performance problem. Instruction set architectures (ISAs) continuously evolve, and each implementation of an ISA (a processor) must choose a set of tradeoffs that make sense in the market context that it is designed for. With every new processor introduced, the vendor faces two fundamental problems: First, there is a lag time between when a processor is introduced to when compilers generate quality code for the architecture. Secondly, even when compilers catch up to the new architecture there is often a large body of legacy code that was compiled for previous generations and will not or can not be upgraded. Thus a large percentage of code running on a processor may be compiled quite sub-optimally for the current characteristics of the dynamic execution environment.
+
+For these reasons, LLVM has been designed from the beginning as a long-term solution to these problems. Its design allows the large body of platform independent, static, program optimizations currently in compilers to be reused unchanged in their current form. It also provides important static type information to enable powerful dynamic and link time optimizations to be performed quickly and efficiently. This combination enables an increase in effective system performance for real world environments.
+
diff --git a/docs/HistoricalNotes/2001-05-18-ExceptionHandling.txt b/docs/HistoricalNotes/2001-05-18-ExceptionHandling.txt
new file mode 100644
index 0000000000..2e0b794075
--- /dev/null
+++ b/docs/HistoricalNotes/2001-05-18-ExceptionHandling.txt
@@ -0,0 +1,202 @@
+Meeting notes: Implementation idea: Exception Handling in C++/Java
+
+The 5/18/01 meeting discussed ideas for implementing exceptions in LLVM.
+We decided that the best solution requires a set of library calls provided by
+the VM, as well as an extension to the LLVM function invocation syntax.
+
+The LLVM function invocation instruction previously looks like this (ignoring
+types):
+
+ call func(arg1, arg2, arg3)
+
+The extension discussed today adds an optional "with" clause that
+associates a label with the call site. The new syntax looks like this:
+
+ call func(arg1, arg2, arg3) with funcCleanup
+
+This cleanup label always stays tightly associated with the call site (being
+encoded directly into the call opcode itself), and should be used whenever
+there is cleanup work that needs to be done for the current function if
+an exception is thrown by func (or if we are in a try block).
+
+To support this, the VM/Runtime provide the following simple library
+functions (all syntax in this document is very abstract):
+
+typedef struct { something } %frame;
+ The VM must export a "frame type": an opaque structure used to
+ implement different types of stack walking that may be used by various
+ language runtime libraries. We imagine that it would be typical to
+ represent a frame with a PC and frame pointer pair, although that is not
+ required.
+
+%frame getStackCurrentFrame();
+ Get a frame object for the current function. Note that if the current
+ function was inlined into its caller, the "current" frame will belong to
+ the "caller".
+
+bool isFirstFrame(%frame f);
+ Returns true if the specified frame is the top level (first activated) frame
+ for this thread. For the main thread, this corresponds to the main()
+ function, for a spawned thread, it corresponds to the thread function.
+
+%frame getNextFrame(%frame f);
+ Return the previous frame on the stack. This function is undefined if f
+ satisfies the predicate isFirstFrame(f).
+
+Label *getFrameLabel(%frame f);
+ If a label was associated with f (as discussed below), this function returns
+ it. Otherwise, it returns a null pointer.
+
+doNonLocalBranch(Label *L);
+ At this point, it is not clear whether this should be a function or
+ intrinsic. It should probably be an intrinsic in LLVM, but we'll deal with
+ this issue later.
+
+
+Here is a motivating example that illustrates how these facilities could be
+used to implement the C++ exception model:
+
+void TestFunction(...) {
+ A a; B b;
+ foo(); // Any function call may throw
+ bar();
+ C c;
+
+ try {
+ D d;
+ baz();
+ } catch (int) {
+ ...int Stuff...
+ // execution continues after the try block: the exception is consumed
+ } catch (double) {
+ ...double stuff...
+ throw; // Exception is propagated
+ }
+}
+
+This function would compile to approximately the following code (heavy
+pseudo code follows):
+
+Func:
+ %a = alloca A
+ A::A(%a) // These ctors & dtors could throw, but we ignore this
+ %b = alloca B // minor detail for this example
+ B::B(%b)
+
+ call foo() with fooCleanup // An exception in foo is propagated to fooCleanup
+ call bar() with barCleanup // An exception in bar is propagated to barCleanup
+
+ %c = alloca C
+ C::C(c)
+ %d = alloca D
+ D::D(d)
+ call baz() with bazCleanup // An exception in baz is propagated to bazCleanup
+ d->~D();
+EndTry: // This label corresponds to the end of the try block
+ c->~C() // These could also throw, these are also ignored
+ b->~B()
+ a->~A()
+ return
+
+Note that this is a very straightforward and literal translation: exactly
+what we want for zero cost (when unused) exception handling. Especially on
+platforms with many registers (e.g., IA64), setjmp/longjmp style exception
+handling is *very* impractical. Also, the "with" clauses describe the
+control flow paths explicitly so that analysis is not adversely affected.
+
+The foo/barCleanup labels are implemented as:
+
+TryCleanup: // Executed if an exception escapes the try block
+ c->~C()
+barCleanup: // Executed if an exception escapes from bar()
+ // fall through
+fooCleanup: // Executed if an exception escapes from foo()
+ b->~B()
+ a->~A()
+ Exception *E = getThreadLocalException()
+ call throw(E) // Implemented by the C++ runtime, described below
+
+Which does the work one would expect. getThreadLocalException is a function
+implemented by the C++ support library. It returns the current exception
+object for the current thread. Note that we do not attempt to recycle the
+shutdown code from before, because performance of the mainline code is
+critically important. Also, obviously fooCleanup and barCleanup may be
+merged and one of them eliminated. This just shows how the code generator
+would most likely emit code.
+
+The bazCleanup label is more interesting. Because the exception may be caught
+by the try block, we must dispatch to its handler... but it does not exist
+on the call stack (it does not have a VM Call->Label mapping installed), so
+we must dispatch statically with a goto. The bazHandler thus appears as:
+
+bazHandler:
+ d->~D(); // destruct D as it goes out of scope when entering catch clauses
+ goto TryHandler
+
+In general, TryHandler is not the same as bazHandler, because multiple
+function calls could be made from the try block. In this case, trivial
+optimization could merge the two basic blocks. TryHandler is the code
+that actually determines the type of exception, based on the Exception object
+itself. For this discussion, assume that the exception object contains *at
+least*:
+
+1. A pointer to the RTTI info for the contained object
+2. A pointer to the dtor for the contained object
+3. The contained object itself
+
+Note that it is necessary to maintain #1 & #2 in the exception object itself
+because objects without virtual function tables may be thrown (as in this
+example). Assuming this, TryHandler would look something like this:
+
+TryHandler:
+ Exception *E = getThreadLocalException();
+ switch (E->RTTIType) {
+ case IntRTTIInfo:
+ ...int Stuff... // The action to perform from the catch block
+ break;
+ case DoubleRTTIInfo:
+ ...double Stuff... // The action to perform from the catch block
+ goto TryCleanup // This catch block rethrows the exception
+ break; // Redundant, eliminated by the optimizer
+ default:
+ goto TryCleanup // Exception not caught, rethrow
+ }
+
+ // Exception was consumed
+ if (E->dtor)
+ E->dtor(E->object) // Invoke the dtor on the object if it exists
+ goto EndTry // Continue mainline code...
+
+And that is all there is to it.
+
+The throw(E) function would then be implemented like this (which may be
+inlined into the caller through standard optimization):
+
+function throw(Exception *E) {
+ // Get the start of the stack trace...
+ %frame %f = call getStackCurrentFrame()
+
+ // Get the label information that corresponds to it
+ label * %L = call getFrameLabel(%f)
+ while (%L == 0 && !isFirstFrame(%f)) {
+ // Loop until a cleanup handler is found
+ %f = call getNextFrame(%f)
+ %L = call getFrameLabel(%f)
+ }
+
+ if (%L != 0) {
+ call setThreadLocalException(E) // Allow handlers access to this...
+ call doNonLocalBranch(%L)
+ }
+ // No handler found!
+ call BlowUp() // Ends up calling the terminate() method in use
+}
+
+That's a brief rundown of how C++ exception handling could be implemented in
+llvm. Java would be very similar, except it only uses destructors to unlock
+synchronized blocks, not to destroy data. Also, it uses two stack walks: a
+nondestructive walk that builds a stack trace, then a destructive walk that
+unwinds the stack as shown here.
+
+It would be trivial to get exception interoperability between C++ and Java.
+
diff --git a/docs/HistoricalNotes/2001-05-19-ExceptionResponse.txt b/docs/HistoricalNotes/2001-05-19-ExceptionResponse.txt
new file mode 100644
index 0000000000..3375365f54
--- /dev/null
+++ b/docs/HistoricalNotes/2001-05-19-ExceptionResponse.txt
@@ -0,0 +1,45 @@
+Date: Sat, 19 May 2001 19:09:13 -0500 (CDT)
+From: Chris Lattner <sabre@nondot.org>
+To: Vikram S. Adve <vadve@cs.uiuc.edu>
+Subject: RE: Meeting writeup
+
+> I read it through and it looks great!
+
+Thanks!
+
+> The finally clause in Java may need more thought. The code for this clause
+> is like a subroutine because it needs to be entered from many points (end of
+> try block and beginning of each catch block), and then needs to *return to
+> the place from where the code was entered*. That's why JVM has the
+> jsr/jsr_w instruction.
+
+Hrm... I guess that is an implementation decision. It can either be
+modelled as a subroutine (as java bytecodes do), which is really
+gross... or it can be modelled as code duplication (emitted once inline,
+then once in the exception path). Because this could, at worst,
+slightly less than double the amount of code in a function (it is
+bounded), I don't think this is a big deal. One of the really nice things
+about the LLVM representation is that it still allows for runtime code
+generation for exception paths (exception paths are not compiled until
+needed). Obviously a static compiler couldn't do this though. :)
+
+In this case, only one copy of the code would be compiled... until the
+other one is needed on demand. Also this strategy fits with the "zero
+cost" exception model... the standard case is not burdened with extra
+branches or "call"s.
+
+> I suppose you could save the return address in a particular register
+> (specific to this finally block), jump to the finally block, and then at the
+> end of the finally block, jump back indirectly through this register. It
+> will complicate building the CFG but I suppose that can be handled. It is
+> also unsafe in terms of checking where control returns (which is I suppose
+> why the JVM doesn't use this).
+
+I think that a code duplication method would be cleaner, and would avoid
+the caveats that you mention. Also, it does not slow down the normal case
+with an indirect branch...
+
+Like everything, we can probably defer a final decision until later. :)
+
+-Chris
+
diff --git a/docs/HistoricalNotes/2001-06-01-GCCOptimizations.txt b/docs/HistoricalNotes/2001-06-01-GCCOptimizations.txt
new file mode 100644
index 0000000000..d542fb478c
--- /dev/null
+++ b/docs/HistoricalNotes/2001-06-01-GCCOptimizations.txt
@@ -0,0 +1,63 @@
+Date: Fri, 1 Jun 2001 16:38:17 -0500 (CDT)
+From: Chris Lattner <sabre@nondot.org>
+To: Vikram S. Adve <vadve@cs.uiuc.edu>
+Subject: Interesting: GCC passes
+
+
+Take a look at this document (which describes the order of optimizations
+that GCC performs):
+
+http://gcc.gnu.org/onlinedocs/gcc_17.html
+
+The rundown is that after RTL generation, the following happens:
+
+1 . [t] jump optimization (jumps to jumps, etc)
+2 . [t] Delete unreachable code
+3 . Compute live ranges for CSE
+4 . [t] Jump threading (jumps to jumps with identical or inverse conditions)
+5 . [t] CSE
+6 . *** Conversion to SSA
+7 . [t] SSA Based DCE
+8 . *** Conversion to LLVM
+9 . UnSSA
+10. GCSE
+11. LICM
+12. Strength Reduction
+13. Loop unrolling
+14. [t] CSE
+15. [t] DCE
+16. Instruction combination, register movement, scheduling... etc.
+
+I've marked optimizations with a [t] to indicate things that I believe to
+be relatively trivial to implement in LLVM itself. The time consuming
+things to reimplement would be SSA based PRE, Strength reduction & loop
+unrolling... these would be the major things we would miss out on if we
+did LLVM creation from tree code [inlining and other high level
+optimizations are done on the tree representation].
+
+Given the lack of "strong" optimizations that would take a long time to
+reimplement, I am leaning a bit more towards creating LLVM from the tree
+code. Especially given that SGI has GPL'd their compiler, including many
+SSA based optimizations that could be adapted (besides the fact that their
+code looks MUCH nicer than GCC :)
+
+Even if we choose to do LLVM code emission from RTL, we will almost
+certainly want to move LLVM emission from step 8 down until at least CSE
+has been rerun... which causes me to wonder if the SSA generation code
+will still work (due to global variable dependencies and stuff). I assume
+that it can be made to work, but might be a little more involved than we
+would like.
+
+I'm continuing to look at the Tree -> RTL code. It is pretty gross
+because they do some of the translation a statement at a time, and some
+of it a function at a time... I'm not quite clear on why and how the
+distinction is drawn, but it does not appear that there is a wonderful
+place to attach extra info.
+
+Anyways, I'm proceeding with the RTL -> LLVM conversion phase for now. We
+can talk about this more on Monday.
+
+Wouldn't it be nice if there were an obvious decision to be made? :)
+
+-Chris
+
diff --git a/docs/HistoricalNotes/2001-06-01-GCCOptimizations2.txt b/docs/HistoricalNotes/2001-06-01-GCCOptimizations2.txt
new file mode 100644
index 0000000000..6c9e0971a0
--- /dev/null
+++ b/docs/HistoricalNotes/2001-06-01-GCCOptimizations2.txt
@@ -0,0 +1,71 @@
+Date: Fri, 1 Jun 2001 17:08:44 -0500 (CDT)
+From: Chris Lattner <sabre@nondot.org>
+To: Vikram S. Adve <vadve@cs.uiuc.edu>
+Subject: RE: Interesting: GCC passes
+
+> That is very interesting. I agree that some of these could be done on LLVM
+> at link-time, but it is the extra time required that concerns me. Link-time
+> optimization is severely time-constrained.
+
+If we were to reimplement any of these optimizations, I assume that we
+could do them a translation unit at a time, just as GCC does now. This
+would lead to a pipeline like this:
+
+Static optimizations, xlation unit at a time:
+.c --GCC--> .llvm --llvmopt--> .llvm
+
+Link time optimizations:
+.llvm --llvm-ld--> .llvm --llvm-link-opt--> .llvm
+
+Of course, many optimizations could be shared between llvmopt and
+llvm-link-opt, but they wouldn't need to be shared... Thus compile time
+could be faster, because we are using a "smarter" IR (SSA based).
+
+> BTW, about SGI, "borrowing" SSA-based optimizations from one compiler and
+> putting it into another is not necessarily easier than re-doing it.
+> Optimization code is usually heavily tied in to the specific IR they use.
+
+Understood. The only reason that I brought this up is because SGI's IR is
+more similar to LLVM than it is different in many respects (SSA based,
+relatively low level, etc), and could be easily adapted. Also their
+optimizations are written in C++ and are actually somewhat
+structured... of course it would be no walk in the park, but it would be
+much less time consuming to adapt, say, SSA-PRE than to rewrite it.
+
+> But your larger point is valid that adding SSA based optimizations is
+> feasible and should be fun. (Again, link time cost is the issue.)
+
+Assuming link-time cost wasn't an issue, the question is:
+Does using GCC's backend buy us anything?
+
+> It also occurs to me that GCC is probably doing quite a bit of back-end
+> optimization (step 16 in your list). Do you have a breakdown of that?
+
+Not really. The irritating part of GCC is that it mixes it all up and
+doesn't have a clean separation of concerns. A lot of the "back end
+optimization" happens right along with other data optimizations (i.e., CSE
+of machine specific things).
+
+As far as REAL back end optimizations go, it looks something like this:
+
+1. Instruction combination: try to make CISCy instructions, if available
+2. Register movement: try to get registers in the right places for the
+architecture to avoid register to register moves. For example, try to get
+the first argument of a function to naturally land in %o0 for sparc.
+3. Instruction scheduling: 'nuff said :)
+4. Register class preferencing: ??
+5. Local register allocation
+6. Global register allocation
+7. Spilling
+8. Local regalloc
+9. Jump optimization
+10. Delay slot scheduling
+11. Branch shortening for CISC machines
+12. Instruction selection & peephole optimization
+13. Debug info output
+
+But none of this would be usable for LLVM anyways, unless we were using
+GCC as a static compiler.
+
+-Chris
+