Open LLVM Projects

This document is meant to be a sort of "big TODO list" for LLVM. Each project in this document is something that would be useful for LLVM to have, and would also be a great way to get familiar with the system. Some of these projects are small and self-contained and could be implemented in a couple of days; others are larger. Several may lead to interesting research projects in their own right. In any case, we welcome all contributions.

If you are thinking about tackling one of these projects, please send a message to the LLVM Developer's mailing list so that we know the project is being worked on. This is also a good way to get more information about a specific project or to suggest other projects to add to this page.

The projects on this page are open-ended. More specific projects are filed as unassigned enhancements in the LLVM bug tracker. See the list of currently outstanding issues if you wish to help improve LLVM.

Improvements to the current infrastructure are always very welcome and tend to be fairly straightforward to implement. Here are some of the key areas that can use improvement:

The LLVM bug tracker occasionally has "code-cleanup" bugs filed in it. Taking one of these and fixing it is a good way to get your feet wet in the LLVM code and discover how some of its components work.

It would be very useful to port glibc to LLVM. This would allow a variety of interprocedural algorithms to be much more effective in the face of library calls. The most important pieces to port are the string library and the stdio-related functions; low-level system calls like 'read' should stay unimplemented in LLVM.

We are always looking for new testcases and benchmarks for use with LLVM. In particular, it is useful to try compiling your favorite C source code with LLVM. If it doesn't compile, try to figure out why or report it to the llvm-bugs list. If you get the program to compile, it would be extremely useful to convert the build system to be compatible with the LLVM Programs testsuite so that we can check it into CVS and the automated tester can use it to track progress of the compiler.

When testing a program, try running it with a variety of optimizations and with all of the back-ends: CBE, llc, and lli.

Add platform-independent prefetch support. The GCC prefetch project page has a good survey of the prefetching capabilities of a variety of modern processors.

Someone needs to look into getting the ranlib tool to index LLVM bytecode files, so that linking in .a files is not hideously slow. They would also then have to implement the reader for this index in gccld.
Rework the PassManager to be more flexible
Some transformations and analyses only work on reducible flow graphs. It would be nice to have a transformation which could be "required" by these passes which makes irreducible graphs reducible. This can easily be accomplished through code duplication. See Making Graphs Reducible with Controlled Node Splitting and perhaps Nesting of Reducible and Irreducible Loops.
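Before a pass can ask for a graph to be made reducible, it helps to be able to tell whether the graph is reducible in the first place. The following is a minimal sketch of the classic T1/T2 test: repeatedly delete self-loops (T1) and merge any node that has a single predecessor into that predecessor (T2); the graph is reducible exactly when it collapses to one node. The function name and the dictionary-based graph representation are assumptions made for illustration and are not LLVM APIs. The sketch only detects irreducibility; it does not perform the node splitting itself.

```python
def is_reducible(succs, entry):
    """Return True if the flow graph collapses to one node under T1/T2.

    succs maps each node to an iterable of its successors (hypothetical
    representation chosen for this sketch).
    """
    # Only nodes reachable from the entry take part in the test.
    reachable = set()
    stack = [entry]
    while stack:
        node = stack.pop()
        if node in reachable:
            continue
        reachable.add(node)
        stack.extend(succs.get(node, ()))
    graph = {n: {s for s in succs.get(n, ()) if s in reachable}
             for n in reachable}

    changed = True
    while changed and len(graph) > 1:
        changed = False
        # T1: remove self-loops.
        for node in graph:
            if node in graph[node]:
                graph[node].discard(node)
                changed = True
        # T2: merge one node with a unique predecessor into that predecessor.
        preds = {n: set() for n in graph}
        for node, targets in graph.items():
            for target in targets:
                preds[target].add(node)
        for node in list(graph):
            if node != entry and len(preds[node]) == 1:
                (pred,) = preds[node]
                graph[pred].discard(node)
                graph[pred] |= graph.pop(node)
                changed = True
                break
    return len(graph) == 1


# A simple loop is reducible; a loop with two entry points is not.
simple_loop = {"A": ["B"], "B": ["B", "C"], "C": ["A"]}
two_entry_loop = {"A": ["B", "C"], "B": ["C"], "C": ["B"]}
```

In the second graph the cycle between B and C can be entered at either node, which is the situation that node splitting (duplicating B or C) is meant to remove.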

Sometimes creating new things is more fun than improving existing things. These projects tend to be more involved and perhaps require more work, but can also be very rewarding.

Many ideas for feature requests are stored in the LLVM Bugzilla. Just search for bugs with the "new-feature" keyword.

We have a strong base for developing both pointer-analysis-based optimizations and pointer analyses themselves. It seems natural to want to take advantage of this:

Implement a flow-sensitive context-sensitive alias analysis algorithm
- Pick one of the somewhat efficient algorithms, but strive for maximum precision
Implement a flow-sensitive context-insensitive alias analysis algorithm
- Just an efficient local algorithm perhaps?
Implement alias-analysis-based optimizations:
- Dead store elimination
- ...
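To make the dead store elimination item concrete, here is a minimal sketch of a block-local version over a toy instruction list. Everything in it is an assumption made for illustration: the tuple-based instruction format, the function name, and the `may_alias` callback standing in for an alias analysis query. It is not how an LLVM pass would be written. A store is removed only when a later store in the same block writes the same address and nothing in between may read it.

```python
def eliminate_dead_stores(block, may_alias=lambda a, b: a == b):
    """Remove stores that are overwritten before any possible read.

    block is a list of tuples: ("store", addr, value), ("load", addr),
    or any other tuple, which is treated as reading all of memory.
    may_alias(a, b) is a hypothetical alias query; the default treats
    two addresses as aliasing only when their names are equal.
    """
    overwritten = set()  # addresses stored to later with no read in between
    kept = []
    for inst in reversed(block):
        op = inst[0]
        if op == "store":
            addr = inst[1]
            if addr in overwritten:
                continue  # a later store wins and nothing reads this one
            overwritten.add(addr)
        elif op == "load":
            addr = inst[1]
            # Any address the load may read is live again above this point.
            overwritten = {a for a in overwritten if not may_alias(a, addr)}
        else:
            # Unknown instruction: conservatively assume it reads everything.
            overwritten.clear()
        kept.append(inst)
    kept.reverse()
    return kept


example_block = [
    ("store", "p", 1),
    ("store", "q", 2),
    ("load", "q"),
    ("store", "p", 3),
]
optimized = eliminate_dead_stores(example_block)
```

With the default alias query the first store to `p` is removed. With a pessimistic query that says everything may alias, the load of `q` might read `p`, so nothing is removed; this is where a more precise alias analysis pays off.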

We now have a unified infrastructure for writing profile-guided transformations, which will work either at offline-compile-time or in the JIT, but we don't have many transformations. We would welcome new profile-guided transformations as well as improvements to the current profiling system.

Ideas for profile-guided transformations:

Superblock formation (with many optimizations)
Loop unrolling/peeling
Profile directed inlining
Code layout
...

Improvements to the existing support:

The current block and edge profiling code that gets inserted is very simple and inefficient. Through the use of control-dependence information, many fewer counters could be inserted into the code. Also, if the execution count of a loop is known to be a compile-time or runtime constant, all of the counters in the loop could be avoided.
You could implement one of the "static profiling" algorithms, which analyze a piece of code and make educated guesses about the relative execution frequencies of its various parts.
You could add path profiling support, or adapt the existing LLVM path profiling code to work with the generic profiling interfaces.
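As a small illustration of why far fewer counters can suffice, the sketch below uses flow conservation (the executions entering a block equal the executions leaving it) to recover every edge count from a few measured ones. Note that this is a different idea from the control-dependence approach mentioned above; it is shown only because it fits in a few lines. The function name, the edge-list representation, and the virtual exit-to-entry edge are assumptions made for this example.

```python
def recover_edge_counts(edges, measured):
    """Derive counts for all edges from a measured subset.

    edges is a list of distinct (src, dst) pairs; measured maps some of
    those edges to their counter values. Edges that cannot be derived
    are left out of the result.
    """
    counts = dict(measured)
    nodes = {n for edge in edges for n in edge}
    progress = True
    while progress and len(counts) < len(edges):
        progress = False
        for node in nodes:
            incoming = [e for e in edges if e[1] == node]
            outgoing = [e for e in edges if e[0] == node]
            unknown = [e for e in incoming + outgoing if e not in counts]
            if len(unknown) != 1:
                continue
            edge = unknown[0]
            known_in = sum(counts[e] for e in incoming if e in counts)
            known_out = sum(counts[e] for e in outgoing if e in counts)
            # Flow conservation: total incoming == total outgoing.
            if edge in incoming:
                counts[edge] = known_out - known_in
            else:
                counts[edge] = known_in - known_out
            progress = True
    return counts


# A diamond-shaped function; the exit->entry edge models "one call".
diamond_edges = [
    ("entry", "A"), ("A", "B"), ("A", "C"),
    ("B", "D"), ("C", "D"), ("D", "exit"), ("exit", "entry"),
]
# Only two counters are instrumented: calls to the function and the A->B edge.
all_counts = recover_edge_counts(
    diamond_edges, {("exit", "entry"): 10, ("A", "B"): 7})
```

Here two counters stand in for seven: the count of the A->C edge, for example, falls out as the difference between the count entering A and the count of the A->B edge.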

Implement a Dependence Analysis Infrastructure
- Design some way to represent and query dep analysis
Implement a strength reduction pass
Value range propagation pass
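For the value range propagation item, the core idea can be sketched with plain interval arithmetic: track a lower and upper bound for each value, combine bounds through arithmetic, and fold comparisons whose outcome the bounds already decide. The `Range` class and `always_less_than` helper below are hypothetical names invented for this sketch, and the sketch ignores integer overflow and control flow, both of which a real pass would have to handle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Closed integer interval [lo, hi] (overflow is ignored in this sketch)."""
    lo: int
    hi: int

    def __add__(self, other):
        return Range(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # The extremes of a product lie at the corners of the two intervals.
        corners = [a * b for a in (self.lo, self.hi)
                   for b in (other.lo, other.hi)]
        return Range(min(corners), max(corners))


def always_less_than(a, b):
    """True if every value in a is smaller than every value in b."""
    return a.hi < b.lo


# i is known to be in [0, 9]; compute the range of i * 4 + 2.
i = Range(0, 9)
index = i * Range(4, 4) + Range(2, 2)
# A check such as "index < 40" is then known to hold on every execution.
check_is_redundant = always_less_than(index, Range(40, 40))
```

In this example the index is bounded by [2, 38], so a comparison against 40 could be replaced by a constant and the guarded branch simplified.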

Implement a better instruction selector
Implement support for the "switch" instruction without requiring the lower-switches pass.
Implement interprocedural register allocation. The CallGraphSCCPass can be used to implement a bottom-up analysis that will determine the *actual* registers clobbered by a function. Use the pass to fine tune register usage in callers based on *actual* registers used by the callee.
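The analysis half of the interprocedural register allocation idea can be sketched as follows: each function's summary is the set of registers its own body writes, plus the summaries of everything it calls. The sketch uses a simple fixed-point iteration, which reaches the same result a bottom-up traversal of the call graph's SCCs would compute, only less efficiently, and which copes with recursion. The function name, the dictionary-based call graph, and the register names are all hypothetical; none of this is the CallGraphSCCPass interface.

```python
def clobbered_registers(call_graph, local_clobbers):
    """Compute, per function, every register it or its callees may write.

    call_graph maps a function name to the names it calls;
    local_clobbers maps a function name to the registers written by its
    own body (both representations are assumptions of this sketch).
    """
    functions = set(call_graph) | set(local_clobbers)
    for callees in call_graph.values():
        functions.update(callees)
    summary = {fn: set(local_clobbers.get(fn, ())) for fn in functions}

    changed = True
    while changed:
        changed = False
        for fn, callees in call_graph.items():
            for callee in callees:
                new_regs = summary[callee] - summary[fn]
                if new_regs:
                    summary[fn] |= new_regs
                    changed = True
    return summary


call_graph = {"main": ["f", "g"], "f": ["g"], "g": []}
local_clobbers = {"main": {"r1"}, "f": {"r2"}, "g": {"r3"}}
summaries = clobbered_registers(call_graph, local_clobbers)
```

A caller could then keep a value live in any register that is absent from the callee's summary across the call, instead of saving and restoring it according to a fixed calling convention.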

Port the Bigloo Scheme compiler, from Manuel Serrano at INRIA Sophia-Antipolis, to output LLVM bytecode. It seems that it can already output .NET bytecode, JVM bytecode, and C, so LLVM would ostensibly be another good candidate.
Write a new frontend for C/C++ in C++, giving us the ability to directly use LLVM C++ classes from within a compiler rather than use C-based wrapper functions a la llvm-gcc. One possible starting point is the C++ yacc grammar by Ed Willink.
Write a new frontend for some other language (Java? OCaml? Forth?)
Write a new backend for a target (IA64? MIPS? MMIX?)
Write a disassembler for machine code that would use TableGen to output MachineInstrs for transformations, optimizations, etc.
Random test vector generator: Use a C grammar to generate random C code; run it through llvm-gcc, then run a random set of passes on it using opt. Try to crash opt. When opt crashes, use bugpoint to reduce the test case and mail the result to yourself. Repeat ad infinitum.
Design a simple, recognizable logo.
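The first step of the random test vector generator project above, producing random C from a grammar, might start out as small as the sketch below. The grammar here is a deliberately tiny assumption (binary expressions over two parameters and small constants), and the function names are hypothetical. Unsigned arithmetic and the absence of division keep the generated code free of undefined behavior. Feeding the output to llvm-gcc, opt, and bugpoint is left out, since it depends on the local setup.

```python
import random


def random_expr(rng, depth=0, max_depth=3):
    """Expand the toy grammar: expr := leaf | "(" expr op expr ")"."""
    if depth >= max_depth or rng.random() < 0.3:
        return rng.choice(["x", "y", str(rng.randint(0, 9))])
    op = rng.choice(["+", "-", "*", "&", "|", "^"])
    left = random_expr(rng, depth + 1, max_depth)
    right = random_expr(rng, depth + 1, max_depth)
    return "(%s %s %s)" % (left, op, right)


def random_c_function(seed):
    """Return the source of one random C function; same seed, same source."""
    rng = random.Random(seed)
    body = random_expr(rng)
    return "unsigned f(unsigned x, unsigned y) {\n  return %s;\n}\n" % body


source = random_c_function(seed=1)
```

Seeding the generator makes every failure reproducible: recording the seed alongside a crash is enough to regenerate the exact input before reducing it.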