Pyston

Pyston is a new, under-development Python implementation built using LLVM and modern JIT techniques with the goal of achieving good performance.

We have a small website pyston.org, which for now just hosts the mailing lists and the blog. We have two mailing lists: pyston-dev@ for development-related discussions, and pyston-announce@ which is for wider announcements (new releases, major project changes).

Current state

Pyston should be considered in early alpha: it "works" in that it can successfully run Python code, but it is still quite far from being useful for end-users.

Currently, Pyston targets Python 2.7, only runs on x86_64 platforms, and has only been tested on Ubuntu. Support for more platforms -- along with Python 3 compatibility -- is desired but currently not on the roadmap.

Note: Pyston does not currently work on Mac OS X, and it is not clear when it will.

Contributing

Pyston welcomes any kind of contribution; please see CONTRIBUTING.md for details.

tl;dr: You will need to sign the Dropbox CLA and run the tests.

Roadmap

v0.1

  • Focus was on building and validating the core Python-to-LLVM JIT infrastructure.
  • Many core parts of the language were missing.

v0.2

  • Focus was on improving language compatibility to the point that we can start running "real code" in the form of existing benchmarks.
  • Many new features:
    • Exceptions
    • Class inheritance, metaclasses
    • Basic native C API support
    • Closures, generators, lambdas, generator expressions
    • Default arguments, keywords, *args, **kwargs
    • Longs, and integer promotion
    • Multithreading support
  • We have allowed performance to regress, sometimes considerably, but (hopefully) in places that allow for more efficient implementations as we have time.

v0.3

  • Better language support
    • Can self-host all of our internal Python scripts
  • Better performance
    • Match CPython's performance on our small benchmark suite

v0.4: Coming soon

Getting started

To get a full development environment for Pyston, you will need pretty recent versions of various tools. The docs/INSTALLING.md file contains information about what the tools are, how to get them, and how to install them; currently it can take up to an hour to get them all built on a quad-core machine.

To simply build and run Pyston, a smaller set of dependencies is required; see docs/INSTALLING.md, but skip the "OPTIONAL DEPENDENCIES" section. Once all the dependencies are installed, you should be able to do

$ make check -j4

And see that hopefully all of the tests pass. (If they don't, please email pyston-dev.)

We also have a new CMake-based build system which is easier and faster to get running; see the end of INSTALLING.md for information.

All pull requests are built and tested by travis-ci.org running Ubuntu 12.04. See travis-ci.org/dropbox/pyston/builds.

Running Pyston

Pyston builds in a few different configurations. Right now there is pyston_dbg, the debug configuration, which contains assertions and debug symbols, and pyston_release, the release configuration, which has no assertions or debug symbols and has full optimizations. You can build them by saying make pyston_dbg or make pyston_release, respectively. If you are interested in seeing how fast Pyston can go, you should try the release configuration, but there is a good chance that it will crash, in which case you can run the debug configuration to see what is happening.

There are a number of other configurations useful for development: "pyston_debug" contains full LLVM debug information, but will weigh in at a few hundred MB. "pyston_prof" contains gprof-style profiling instrumentation; gprof can't profile JIT'd code, reducing its usefulness in this case, but the configuration has stuck around since it gets compiled with gcc, and can expose issues with the normal clang-based build.

You can get a simple REPL by typing make run; it is not very robust right now, and only supports single-line statements, but it can give you an interactive view into how Pyston works. To get more functionality, you can do ./pyston_dbg -i [your_source_file.py], which will go into the REPL after executing the given file, letting you access all the variables you had defined.

Makefile targets

  • make check: run the tests
  • make run: run the REPL
  • make format: run clang-format over the codebase
  • We have a number of helpers of the form make VERB_TESTNAME, where TESTNAME can be any of the tests/benchmarks, and VERB can be one of:
  • make run_TESTNAME: runs the file under pyston_dbg.
  • make run_release_TESTNAME: runs the file under pyston_release.
  • make dbg_TESTNAME: same as run, but runs pyston under gdb.
  • make check_TESTNAME: checks that the script has the same behavior under pyston_dbg as it does under CPython. See tools/tester.py for information about test annotations.
  • make perf_TESTNAME: runs the script in pyston_release, and uses perf to record and display performance statistics.
  • A few lesser used ones; see the Makefile for details.
  • make watch_cmd: meta-command which uses inotifywait to run make cmd every time a source file changes.
  • For example, make watch_pyston_dbg will rebuild pyston_dbg every time you save a source file. This is handy enough to have the alias make watch.
  • make watch_run_TESTNAME will rebuild pyston_dbg and run TESTNAME every time you change a file.
  • make wdbg_TESTNAME is mostly an alias for make watch_dbg_TESTNAME, but will automatically quit GDB for you. This is handy if pyston is crashing and you want to get a C-level stacktrace.

There are a number of common flags you can pass to your make invocations:

  • V=1 or VERBOSE=1: display the full commands being executed
  • ARGS=-v: pass the given args (in this example, -v) to the executable.
  • Note: these will usually end up before the script name, and so apply to the pyston runtime as opposed to appearing in sys.argv. For example, make run_test ARGS=-v will execute ./pyston_dbg -v test.py.
  • BR=breakpoint: when running under gdb, automatically set a breakpoint at the given location.
  • SELF_HOST=1: run all of our Python scripts using pyston_dbg.

For a full list, please check out the [Makefile](https://github.com/dropbox/pyston/blob/master/Makefile).

Pyston command-line options

Pyston-specific flags:

  • -q: Set verbosity to 0.
  • -v: Increase verbosity by 1.
  • -s: Print out the internal stats at exit.
  • -n: Disable the Pyston interpreter. This is mostly used for debugging, to force the use of higher compilation tiers in situations where they wouldn't typically be used.
  • -O: Force Pyston to always run at the highest compilation tier. This doesn't always produce the fastest running time due to the lack of type recording from lower compilation tiers, but similar to -n it can help test the code generator.
  • -I: Force always using the Pyston interpreter. This is mostly used for debugging / testing. (Takes precedence over -n and -O.)
  • -r: Use a stripped stdlib. When running pyston_dbg, the default is to use a stdlib with full debugging symbols enabled. Passing -r changes this behavior to load a slimmer, stripped stdlib.
  • -x: Experimental: use the pypa parser.

Standard Python flags:

  • -i: Go into the REPL after executing the given script.

There are also some lesser-used flags; see src/jit.cpp for more details.


Technical features

Compilation tiers

Pyston currently features four compilation tiers. In increasing order of speed, but also compilation time:

  1. An AST interpreter. We do some basic transformations on the AST beforehand, to make it easier and faster to interpret.
  2. Baseline LLVM compilation. Runs no LLVM optimizations, and no type speculation, and simply hands off the generated code to the LLVM code generator. This tier does type recording for the final tier. We are thinking of replacing this with a simple non-LLVM JIT tier at some point.
  3. Improved LLVM compilation. Behaves very similarly to baseline LLVM compilation, so this tier will probably be removed in the near future.
  4. Full LLVM optimization + compilation. This tier runs full LLVM optimizations, and uses type feedback from lower tiers. This tier currently kicks in after 10000 loop iterations, or 10000 calls to a function.

There are two main ways that Pyston can move up to higher tiers:

  • If a function gets called often, it will get recompiled at a higher tier and the new version will be called instead.
  • If a loop gets iterated enough times, Pyston will OSR to a higher tier within the same function.

Pyston can move back down to the AST interpreter by using our frame introspection machinery to do a deoptimization.
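
As a rough illustration of the counter-based promotion policy described above, here is a minimal C sketch. The names (FunctionEntry, on_function_call) are hypothetical and are not Pyston's actual internals; real tier selection involves more state than a single counter.

/* Hypothetical sketch of counter-based tier promotion; not Pyston's
 * actual data structures. */
#include <stdio.h>

enum Tier { TIER_INTERPRETER, TIER_BASELINE_LLVM, TIER_IMPROVED_LLVM, TIER_FULL_LLVM };

struct FunctionEntry {
    enum Tier tier;   /* tier of the currently-active compiled version */
    long call_count;  /* how many times this function has been called */
};

/* Called on each entry to a Python-level function. */
static void on_function_call(struct FunctionEntry *f) {
    f->call_count++;
    /* The full-LLVM tier kicks in after 10000 calls (per the text above);
     * lower tiers would use smaller thresholds. */
    if (f->tier < TIER_FULL_LLVM && f->call_count >= 10000) {
        f->tier = TIER_FULL_LLVM;  /* recompile at the highest tier */
        printf("promoting function to the full-LLVM tier\n");
    }
}

int main(void) {
    struct FunctionEntry f = { TIER_INTERPRETER, 0 };
    for (int i = 0; i < 10001; i++)
        on_function_call(&f);
    return 0;
}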

Frame introspection

Pyston uses LLVM's patchpoint functionality to convey information from the LLVM code generator to the runtime. By attaching all local variables as stackmap arguments, at any callsite we can access all of the frame's local variables. We use this to implement user-level features such as eval() and locals(), and also to implement internal features such as deoptimization.

We have a blog post that goes into more detail.
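
To make the idea concrete, here is a small self-contained C sketch of what a stackmap-style lookup conceptually does. The structures and names (StackmapEntry, find_local) are invented for illustration; the real implementation records this information through LLVM patchpoints and stackmaps rather than hand-written tables.

/* Conceptual sketch of stackmap-based frame introspection (hypothetical
 * structures; not Pyston's actual implementation). */
#include <stdio.h>
#include <string.h>

struct LocalRecord {
    const char *name;   /* Python-level variable name */
    long stack_offset;  /* where the value lives relative to the frame */
};

struct StackmapEntry {
    void *return_address;  /* identifies the callsite */
    int num_locals;
    struct LocalRecord locals[8];
};

/* locals()-style lookup: given a callsite's stackmap entry and a frame
 * pointer, recover the value of a named local variable. */
static long *find_local(struct StackmapEntry *entry, char *frame_base,
                        const char *name) {
    for (int i = 0; i < entry->num_locals; i++)
        if (strcmp(entry->locals[i].name, name) == 0)
            return (long *)(frame_base + entry->locals[i].stack_offset);
    return NULL;
}

int main(void) {
    char frame[64] = {0};
    *(long *)(frame + 16) = 42;  /* pretend 'x' is spilled at offset 16 */
    struct StackmapEntry e = { NULL, 1, { { "x", 16 } } };
    printf("x = %ld\n", *find_local(&e, frame, "x"));
    return 0;
}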

OSR

Pyston uses OSR (which stands for On-Stack Replacement, though Pyston does not use that particular mechanism) to move up to a higher tier while inside a function -- this can be important for functions that are expensive the very first time they are called.

OSR is implemented in Pyston by keeping a count, per backedge, of the number of times that the backedge is taken. Once a certain threshold is reached (currently 10 for the interpreter, 10000 otherwise), Pyston will compile a special OSR-entry version of the function. This function takes as arguments all the local variables for that point in the program, and continues execution where the previous function left off.

For example, this Python function:

def square(n):
    r = 0
    for i in xrange(n):
        r += n
    return r

will get translated to something similar to:

static int _backedge_trip_count = 0;
int square(int n) {
    int r = 0;
    for (int i = 0; i < n; i++) {
        r += n;
        
        // OSR exit here:
        _backedge_trip_count++;
        if (_backedge_trip_count >= 10000) {
            auto osr_entry = compileOsrEntry();
            return osr_entry(n, i, r);
        }
    }
    return r;
}

The compiled OSR entry will look something similar to:

int square_osrentry(int n, int i, int r) {
    for (; i < n; i++) {
        r += n;
    }
    return r;
}

The pseudo-C shown above doesn't look that different; the benefit of this approach is that the square() function can be compiled at a low (cheap) compilation tier or even interpreted, but the square_osrentry can be compiled at a higher one since the compilation time is much more likely to pay off.

This approach seems to work, but has a couple drawbacks:

  • It's currently tracked per backedge rather than per backedge-target, which can lead to more OSR compilations than necessary.
  • The OSR'd version can be slower due to the optimizations having less context about the source of the arguments, i.e. that they're local variables that haven't escaped.

Inlining

Pyston can inline functions from its runtime into the code that it's JIT'ing. This only happens if, at JIT time, Pyston can guarantee which runtime function would end up getting called, which typically happens if the callee is an attribute of a guaranteed type. For instance, [].append() will end up resolving to the internal listAppend(), since we know what the type of [] is.

Once the Python-level call is resolved to a C-level call to a runtime function, normal inlining heuristics kick in to determine if it is profitable to inline the function. As a side note, the inlining is only possible because the LLVM IR for the runtime is not only compiled to machine code to be run, but also directly embedded as LLVM IR into the pyston binary, so that the LLVM IR can be inlined.
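
A hedged sketch of the idea in plain C: listAppend below is a stand-in with an invented signature, not Pyston's actual runtime function. The point is that once the receiver's type is known, the Python-level call lowers to a direct C-level call, which ordinary inlining heuristics can then act on.

/* Sketch of lowering "x = []; x.append(1)" to a direct runtime call
 * (hypothetical names and layouts, not Pyston's actual runtime). */
#include <stdlib.h>
#include <stdio.h>

struct BoxedList {
    long size, capacity;
    long *elts;
};

/* Runtime helper; in Pyston the runtime's LLVM IR is embedded in the
 * binary so that calls like this can be inlined into JIT'ed code. */
static void listAppend(struct BoxedList *l, long value) {
    if (l->size == l->capacity) {
        l->capacity = l->capacity ? l->capacity * 2 : 4;
        l->elts = realloc(l->elts, l->capacity * sizeof(long));
    }
    l->elts[l->size++] = value;
}

int main(void) {
    /* What JIT'ed code might do once the receiver is known to be a list:
     * a direct call, not a dynamic attribute lookup. */
    struct BoxedList x = { 0, 0, NULL };
    listAppend(&x, 1);
    printf("len = %ld, x[0] = %ld\n", x.size, x.elts[0]);
    free(x.elts);
    return 0;
}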

Object representation

Pyston currently uses an 'everything is boxed' model. It has some ability to deal with unboxed variants of ints, floats, and bools, but those unboxed types are not mixable with boxed types; i.e. if you put an integer into a list, the integer will always get boxed first.
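
A minimal C sketch of what "boxing" means here, with an invented object layout (BoxedClass / BoxedInt are illustrative, not Pyston's actual structures):

/* Hypothetical boxed-object layout for illustration only. */
#include <stdio.h>
#include <stdlib.h>

struct BoxedClass { const char *name; };
static struct BoxedClass int_cls = { "int" };

struct BoxedInt {
    struct BoxedClass *cls;  /* every boxed object knows its class */
    long n;                  /* the unboxed payload */
};

static struct BoxedInt *boxInt(long n) {
    struct BoxedInt *b = malloc(sizeof(*b));
    b->cls = &int_cls;
    b->n = n;
    return b;
}

int main(void) {
    /* Inside JIT'ed code an int may stay unboxed in a register... */
    long unboxed = 7;
    /* ...but storing it into a list (or any boxed container) forces a box: */
    struct BoxedInt *boxed = boxInt(unboxed);
    printf("%s object holding %ld\n", boxed->cls->name, boxed->n);
    free(boxed);
    return 0;
}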

Inline caches

Hidden classes

Type feedback

Currently, tiers 2 and 3 support type recording, and make a record of the types seen at specifically-designated parts of the program.

Tier 4 then looks at the type record; the current heuristic is that if the same type has been seen 100 times in a row, the compiler will speculate that it will continue to see that type, guarding the speculation so that it can deoptimize if that assumption later turns out to be wrong.
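
A small C sketch of the recording heuristic as described; the names (TypeRecorder, shouldSpeculate) are invented, and the real recorder is wired into the JIT'ed code rather than being a standalone structure.

/* Hypothetical sketch of per-site type recording and the
 * "100 identical observations in a row" speculation heuristic. */
#include <stdbool.h>
#include <stdio.h>

struct TypeRecorder {
    const void *last_cls;  /* most recently observed class */
    int run_length;        /* how many times in a row it has been seen */
};

static void recordType(struct TypeRecorder *r, const void *cls) {
    if (cls == r->last_cls) {
        r->run_length++;
    } else {
        r->last_cls = cls;
        r->run_length = 1;
    }
}

/* Tier 4 would consult this when deciding whether to emit a
 * type-specialized path guarded by a runtime type check. */
static bool shouldSpeculate(const struct TypeRecorder *r) {
    return r->run_length >= 100;
}

int main(void) {
    static const int int_cls = 0;  /* stand-in for a class identity */
    struct TypeRecorder r = { NULL, 0 };
    for (int i = 0; i < 100; i++)
        recordType(&r, &int_cls);
    printf("speculate: %s\n", shouldSpeculate(&r) ? "yes" : "no");
    return 0;
}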

Garbage collection

Pyston currently uses a conservative garbage collector -- this means that GC roots aren't tracked directly; instead, all GC-managed memory is scanned for values that could point into the GC heap, and those values are treated conservatively as pointers that keep the pointed-to GC memory alive.

Currently, Pyston's GC is a non-copying, non-generational, stop-the-world collector; i.e. it is a simple implementation that will need to be improved in the future.
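
The core of conservative scanning can be sketched in a few lines of C; the heap layout and mark array below are simplified stand-ins, not Pyston's collector.

/* Sketch of conservative root scanning: treat every word in a block of
 * memory as a potential pointer and mark anything that lands inside the
 * GC heap (hypothetical heap layout). */
#include <stdint.h>
#include <stdio.h>

#define HEAP_SIZE 1024
static char gc_heap[HEAP_SIZE];
static char gc_marks[HEAP_SIZE];  /* one mark byte per heap byte, for simplicity */

static void conservativeScan(const void *start, const void *end) {
    for (const uintptr_t *p = start; (const void *)p < end; p++) {
        uintptr_t v = *p;
        /* If the word happens to point into the heap, conservatively keep
         * the pointed-to memory alive. */
        if (v >= (uintptr_t)gc_heap && v < (uintptr_t)gc_heap + HEAP_SIZE)
            gc_marks[v - (uintptr_t)gc_heap] = 1;
    }
}

int main(void) {
    /* Pretend this region is a thread's stack containing one real pointer
     * and one integer that merely happens to look like data. */
    uintptr_t fake_stack[2] = { (uintptr_t)&gc_heap[128], 12345 };
    conservativeScan(fake_stack, fake_stack + 2);
    printf("heap offset 128 marked: %d\n", gc_marks[128]);
    return 0;
}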

Native extension module support

CPython-style C extension modules can be difficult in a system that doesn't use refcounting, since a GC-managed runtime is forced to provide a refcounted API. PyPy handles this by using a compatibility layer to create refcounted objects; our hope is to do the reverse, and instead of making the runtime refcount-aware, to make the extension module GC-aware.

We have a basic implementation of this that is able to run a number of the CPython standard modules (from the Modules/ directory), and so far seems to be working. The C API is quite large and will take some time to cover.
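
For context, this is roughly what a minimal CPython 2.7 extension module looks like (standard CPython C API, nothing Pyston-specific); the goal described above is for Pyston to load modules like this one and make them cooperate with its GC rather than emulating refcounting.

/* A minimal CPython 2.7-style extension module, shown for context. */
#include <Python.h>

static PyObject *hello_greet(PyObject *self, PyObject *args) {
    const char *name;
    if (!PyArg_ParseTuple(args, "s", &name))
        return NULL;
    return PyString_FromFormat("hello, %s", name);
}

static PyMethodDef HelloMethods[] = {
    {"greet", hello_greet, METH_VARARGS, "Return a greeting string."},
    {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC inithello(void) {
    Py_InitModule("hello", HelloMethods);
}

Such a module would normally be built with distutils and then used from Python as import hello; hello.greet("world").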

Parallelism support

Pyston currently uses a GIL to protect threaded code. The codebase still contains an experimental "GRWL" configuration, which replaces the GIL with a read-write lock. This allows Python code to execute in parallel while still allowing for critical sections (recompilation, C API calls, etc.), and seems to work OK. It doesn't provide the same memory-ordering guarantees that CPython provides.

This approach has mostly been abandoned as infeasible, but you can test it by doing make pyston_grwl.
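
The GRWL idea can be sketched with a POSIX read-write lock: Python threads take the lock in shared mode so they can run in parallel, while operations that need exclusion take it in exclusive mode. The wrapper names below are hypothetical, not Pyston's actual API.

/* Hypothetical sketch of a GIL replaced by a read-write lock. */
#include <pthread.h>

static pthread_rwlock_t grwl = PTHREAD_RWLOCK_INITIALIZER;

/* Python threads execute under the shared (read) side of the lock... */
static void beginPythonExecution(void) { pthread_rwlock_rdlock(&grwl); }
static void endPythonExecution(void)   { pthread_rwlock_unlock(&grwl); }

/* ...while critical sections (recompilation, some C API calls) take the
 * exclusive (write) side, pausing all Python threads. */
static void beginCriticalSection(void) { pthread_rwlock_wrlock(&grwl); }
static void endCriticalSection(void)   { pthread_rwlock_unlock(&grwl); }

int main(void) {
    beginPythonExecution();
    /* ... run Python code, in parallel with other reader threads ... */
    endPythonExecution();

    beginCriticalSection();
    /* ... e.g. recompile a function while all readers are paused ... */
    endCriticalSection();
    return 0;
}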
