Table of Contents for Programming Languages: a survey
Other intermediate target languages designed for implementing a single high-level language
CPython bytecode
Two stacks:
- main stack
- block stack: "Per frame, there is a stack of blocks, denoting nested loops, try statements, and such."
Stack ops:
- various rots, dups
- in-place binary arithmetic on top0, top1 (doesn't consume top1)
Arithmetic:
- arithmetical negation (-6 -> 6), boolean negation (-6 -> False), inversion (-6 -> 5), unary positive (does nothing by default but can be overridden by __pos__ in custom classes)
instructions:
quoted and paraphrased from https://docs.python.org/release/3.3.2/library/dis.html
General instructions:
- NOP
- POP_TOP (pop), ROT_TWO, ROT_THREE, DUP_TOP, DUP_TOP_TWO
Unary operations:
Unary operations take the top of the stack, apply the operation, and push the result back on the stack.
- UNARY_POSITIVE (+TOS), UNARY_NEGATIVE, UNARY_NOT (not TOS), UNARY_INVERT (~TOS), GET_ITER (iter(TOS))
Binary operations:
Binary operations remove the top of the stack (TOS) and the second top-most stack item (TOS1) from the stack. They perform the operation, and put the result back on the stack.
- BINARY_POWER (TOS1 TOS), BINARY_MULTIPLY, BINARY_FLOOR_DIVIDE (TOS1 TOS), BINARY_TRUE_DIVIDE (TOS1 / TOS), BINARY_MODULO, BINARY_ADD, BINARY_SUBTRACT, BINARY_SUBSCR(TOS1[TOS]), BINARY_LSHIFT (TOS1 TOS), BINARY_AND, BINARY_XOR, BINARY_OR
In-place operations:
In-place operations are like binary operations, in that they remove TOS and TOS1, and push the result back on the stack, but the operation is done in-place when TOS1 supports it, and the resulting TOS may be (but does not have to be) the original TOS1.
INPLACE_POWER, INPLACE_MULTIPLY, INPLACE_FLOOR_DIVIDE, INPLACE_TRUE_DIVIDE, INPLACE_MODULO, INPLACE_ADD, INPLACE_SUBTRACT, INPLACE_LSHIFT, INPLACE_RSHIFT, INPLACE_AND, INPLACE_XOR, INPLACE_OR, STORE_SUBSCR (TOS1[TOS] = TOS2), DELETE_SUBSCR (del TOS1[TOS]),
Miscellaneous opcodes:
PRINT_EXPR
- BREAK_LOOP, CONTINUE_LOOP(target) (target is the address to jump to (which should be a FOR_ITER instruction))
- SET_ADD(i) (Calls set.add(TOS1[-i], TOS)), LIST_APPEND(i) (list.append(TOS[-i], TOS)), MAP_ADD(i) (dict.setitem(TOS1[-i], TOS, TOS1)). For all of the SET_ADD, LIST_APPEND and MAP_ADD instructions, while the added value or key/value pair is popped off, the container object remains on the stack so that it is available for further iterations of the loop.
- RETURN_VALUE (Returns with TOS to the caller of the function), YIELD_VALUE (Pops TOS and yields it from a generator), YIELD_FROM (Pops TOS and delegates to it as a subiterator from a generator)
- IMPORT_STAR (Loads all symbols not starting with '_' directly from the module TOS to the local namespace. The module is popped after loading all names. This opcode implements 'from module import *')
- POP_BLOCK (Removes one block from the block stack. Per frame, there is a stack of blocks, denoting nested loops, try statements, and such)
- POP_EXCEPT (Removes one block from the block stack. The popped block must be an exception handler block, as implicitly created when entering an except handler. In addition to popping extraneous values from the frame stack, the last three popped values are used to restore the exception state.)
- END_FINALLY (Terminates a finally clause. The interpreter recalls whether the exception has to be re-raised, or whether the function returns, and continues with the outer-next block)
- LOAD_BUILD_CLASS (builtins.__build_class__() onto the stack. It is later called by CALL_FUNCTION to construct a class)
- SETUP_WITH(delta) (This opcode performs several operations before a with block starts. First, it loads __exit__() from the context manager and pushes it onto the stack for later use by WITH_CLEANUP. Then, __enter__() is called, and a finally block pointing to delta is pushed. Finally, the result of calling the enter method is pushed onto the stack. The next opcode will either ignore it (POP_TOP), or store it in (a) variable(s) (STORE_FAST, STORE_NAME, or UNPACK_SEQUENCE).)
- WITH_CLEANUP (Cleans up the stack when a with statement block exits. TOS is the context manager’s __exit__() bound method. (see https://docs.python.org/release/3.3.2/library/dis.html#opcode-WITH_CLEANUP for details)
- STORE_LOCALS (Pops TOS from the stack and stores it as the current frame’s f_locals. This is used in class construction)
Miscellaneous opcodes with arguments:
All of the following opcodes expect arguments. An argument is two bytes, with the more significant byte last.
- STORE_NAME(namei) (name = TOS; namei is the index of name in the attribute co_names of the code object. The compiler tries to use STORE_FAST or STORE_GLOBAL if possible)
- DELETE_NAME(namei) (del name)
- UNPACK_SEQUENCE(count) (Unpacks TOS into count individual values, which are put onto the stack right-to-left)
- UNPACK_EX(counts) (Implements assignment with a starred target: Unpacks an iterable in TOS into individual values, where the total number of values can be smaller than the number of items in the iterable: one the new values will be a list of all leftover items)
- STORE_ATTR(namei) (TOS.name = TOS1)
- DELETE_ATTR(namei) (del TOS.name)
- STORE_GLOBAL(namei) (STORE_NAME, but stores the name as a global)
- DELETE_GLOBAL(namei)
- LOAD_CONST(consti) (Pushes co_consts[consti] onto the stack), LOAD_NAME(namei) (Pushes the value associated with co_names[namei] onto the stack)
- BUILD_TUPLE(count) (Creates a tuple consuming count items from the stack), BUILD_LIST(count), BUILD_SET(count), BUILD_MAP(count) (Pushes a new dictionary object onto the stack. The dictionary is pre-sized to hold count entries)
- LOAD_ATTR(namei) (getattr(TOS, co_names[namei]))
- COMPARE_OP(opname) (Performs a Boolean operation. The operation name can be found in cmp_op[opname])
- IMPORT_NAME(namei) (Imports the module co_names[namei]. TOS and TOS1 are popped and provide the fromlist and level arguments of __import__(). The module object is pushed onto the stack. The current namespace is not affected: for a proper import statement, a subsequent STORE_FAST instruction modifies the namespace)
- IMPORT_FROM(namei) (Loads the attribute co_names[namei] from the module found in TOS)
- JUMP_FORWARD(delta) (Increments bytecode counter by delta)
- POP_JUMP_IF_TRUE(target) (If TOS is true, sets the bytecode counter to target. TOS is popped)
- POP_JUMP_IF_FALSE(target)
- JUMP_IF_TRUE_OR_POP(target) (If TOS is true, sets the bytecode counter to target and leaves TOS on the stack. Otherwise (TOS is false), TOS is popped)
- JUMP_IF_FALSE_OR_POP(target) (If TOS is false, sets the bytecode counter to target and leaves TOS on the stack. Otherwise (TOS is true), TOS is popped)
- JUMP_ABSOLUTE(target) (Set bytecode counter to target)
- FOR_ITER(delta) (TOS is an iterator. Call its __next__() method. If this yields a new value, push it on the stack (leaving the iterator below it). If the iterator indicates it is exhausted TOS is popped, and the byte code counter is incremented by delta)
- LOAD_GLOBAL(namei) (Loads the global named co_names[namei] onto the stack)
- SETUP_LOOP(delta): Pushes a block for a loop onto the block stack. The block spans from the current instruction with a size of delta bytes.
- SETUP_EXCEPT(delta): Pushes a try block from a try-except clause onto the block stack. delta points to the first except block.
- SETUP_FINALLY(delta): Pushes a try block from a try-except clause onto the block stack. delta points to the finally block.
- STORE_MAP: Store a key and value pair in a dictionary. Pops the key and value while leaving the dictionary on the stack.
- LOAD_FAST(var_num), STORE_FAST(var_num), DELETE_FAST(var_num): local co_varnames[var_num]
- LOAD_CLOSURE(i): Pushes a reference to the cell contained in slot i of the cell and free variable storage. The name of the variable is co_cellvars[i] if i is less than the length of co_cellvars. Otherwise it is co_freevars[i - len(co_cellvars)].
- LOAD_DEREF(i) (Loads the cell contained in slot i of the cell and free variable storage. Pushes a reference to the object the cell contains on the stack), STORE_DEREF(i)
- DELETE_DEREF(i)
- RAISE_VARARGS(argc): Raises an exception. argc indicates the number of parameters to the raise statement, ranging from 0 to 3. The handler will find the traceback as TOS2, the parameter as TOS1, and the exception as TOS.
- CALL_FUNCTION(argc): Calls a function. The low byte of argc indicates the number of positional parameters, the high byte the number of keyword parameters. Pops all function arguments, and the function itself off the stack, and pushes the return value.
- MAKE_FUNCTION(argc): Pushes a new function object on the stack. TOS is the qualified name of the function; TOS1 is the code associated with the function. The function object is defined to have argc default parameters, which are found below TOS1.
- MAKE_CLOSURE(argc): Creates a new function object, sets its __closure__ slot, and pushes it on the stack. TOS is the qualified name of the function, TOS1 is the code associated with the function, and TOS2 is the tuple containing cells for the closure’s free variables. The function also has argc default parameters, which are found below the cells.
- BUILD_SLICE(argc): Pushes a slice object on the stack. argc must be 2 or 3. If it is 2, slice(TOS1, TOS) is pushed; if it is 3, slice(TOS2, TOS1, TOS) is pushed. See the slice() built-in function for more information.
todo
Prefixes any opcode which has an argument too big to fit into the default two bytes. ext holds two additional bytes which, taken together with the subsequent opcode’s argument, comprise a four-byte argument, ext being the two most-significant bytes.
CALL_FUNCTION_VAR(argc)
Calls a function. argc is interpreted as in CALL_FUNCTION. The top element on the stack contains the variable argument list, followed by keyword and positional arguments.
CALL_FUNCTION_KW(argc)
Calls a function. argc is interpreted as in CALL_FUNCTION. The top element on the stack contains the keyword arguments dictionary, followed by explicit keyword and positional arguments.
CALL_FUNCTION_VAR_KW(argc)
Calls a function. argc is interpreted as in CALL_FUNCTION. The top element on the stack contains the keyword arguments dictionary, followed by the variable-arguments tuple, followed by explicit keyword and positional arguments.
- HAVE_ARGUMENT: This is not really an opcode. It identifies the dividing line between opcodes which don’t take arguments < HAVE_ARGUMENT and those which do >= HAVE_ARGUMENT.
Note: as of Python 3.6, "The Python interpreter now uses a 16-bit wordcode instead of bytecode which made a number of opcode optimizations possible. (Contributed by Demur Rumed with input and reviews from Serhiy Storchaka and Victor Stinner in bpo-26647 and bpo-28050.)" [1].
(rationale: "The change is to have all instructions take an argument. This removes the branch on each instruction on whether to load oparg. It then also aligns instructions to always be 2 bytes rather than 1 or 3 by having arguments only take up 1 byte. In the case that an argument to an instruction is greater than 255, it can chain EXTENDED_ARG up to 3 times. In practice this rarely occurs, mostly only for jumps, & abarnert measured stdlib to be ~5% smaller
The rationale is that this offers 3 benefits: Smaller code size, simpler instruction iteration/indexing (One may now scan backwards, as peephole.c does in this patch), which between the two results in a small perf gain (the only way for perf to be negatively impacted is by an increase in EXTENDED_ARGs, when I post benchmarking I'll also post a count of how many more EXTENDED_ARGs are emitted)
This also means that if I want to create something like a tracer that tracks some information for each instruction, I can allocate an array of codesize/2 bytes, then index off of half the instruction index. This isn't currently done in peephole.c, nor does this include halving jump opargs
" ) -- [2]
Links:
Dalvik VM
todo http://www.netmite.com/android/mydroid/dalvik/docs/dalvik-bytecode.html
(todo is this wrong?) interestingly, Dalvik only has 16 registers.
however:
" The Dalvik instruction set implements an interesting compromise: it is register based, but there are a finite number of them as opposed to the theoretically infinite registers of LLVM or Parrot. Dalvik supports 65,536 registers, a vast number compared to hardware CPUs and presumably sufficient to implement SSA (if desired) in reasonably large functions.
Even more interestingly, not all Dalvik instructions can access all registers. Many Dalvik instructions dedicate 4 bits to the register number, requiring their operands to be stored in the first 16 registers. A few more instructions have an 8 bit instruction number, to access the first 256. There are also instructions to copy the value to or from any of the 65,536 registers to a low register, for a subsequent instruction to access.
It took a while to understand the rationale for this choice, and I'm still not confident I fully get it. Clearly the Dalvik designers believe that keeping data in one of the high registers will be faster than explicitly storing it to memory, even if the vast number of registers end up mostly residing in RAM. Addressing data as register numbers instead of memory addresses should make it easier for the VM to dynamically remap Dalvik registers to the real hardware registers. For example, if it can predict that virtual register 257 will likely be used in the near future it can be kept in a CPU register instead of being immediately stored to memory. " -- http://codingrelic.geekhold.com/2010/07/virtual-instruction-sets-opcode.html
Non-deprecated, non-implementation-specific instructions (many instructions have _2ADDR forms, and many also have _LIT8 and _LIT16 forms, and many have _WIDE forms, and many have _FLOAT, _INT, and _LONG forms, and many have _BOOLEAN, _BYTE, _CHAR, _OBJECT, _SHORT, and _WIDE forms, and many have _16 and _32 forms, and many of these are mixed, e.g. ADD_INT_2ADDR. I'll only list the base instruction in these cases)
ADD AGET AND APUT ARRAY_LENGTH CHECK_CAST CMPG CMPL CMP CONST CONST_CLASS CONST_HIGH16 CONST_STRING CONST_STRING_JUMBO DIV DOUBLE_TO_FLOAT DOUBLE_TO_INT DOUBLE_TO_LONG FILLED_NEW_ARRAY FILLED_NEW_ARRAY_RANGE FILL_ARRAY_DATA FLOAT_TO_DOUBLE FLOAT_TO_INT FLOAT_TO_LONG GOTO IF_EQ IF_EQZ IF_GE IF_GEZ IF_GT IF_GTZ IF_LE IF_LEZ IF_LT IF_LTZ IF_NE IF_NEZ IGET INSTANCE_OF INT_TO_BYTE INT_TO_CHAR INT_TO_DOUBLE INT_TO_FLOAT INT_TO_LONG INT_TO_SHORT INVOKE_DIRECT INVOKE_DIRECT_RANGE INVOKE_INTERFACE INVOKE_INTERFACE_RANGE INVOKE_STATIC INVOKE_STATIC_RANGE INVOKE_SUPER INVOKE_SUPER_RANGE INVOKE_VIRTUAL INVOKE_VIRTUAL_RANGE IPUT LONG_TO_DOUBLE LONG_TO_FLOAT LONG_TO_INT MONITOR_ENTER MONITOR_EXIT MOVE MOVE_16 MOVE_EXCEPTION MOVE_FROM16 MOVE_OBJECT MOVE_OBJECT_16 MOVE_OBJECT_FROM16 MOVE_RESULT MOVE_RESULT_OBJECT MOVE_RESULT_WIDE MOVE_WIDE MOVE_WIDE_16 MOVE_WIDE_FROM16 MUL NEG NEW_ARRAY NEW_INSTANCE NOP NOT OR PACKED_SWITCH REM RETURN RETURN_OBJECT RETURN_VOID RETURN_WIDE RSUB SGET SHL SHL SHR SPARSE_SWITCH SPUT SPUT SUB SUB THROW USHR XOR
Links:
Perl6
NQP
https://github.com/perl6/nqp
opcodes (about 268 of them): https://github.com/perl6/nqp/blob/master/docs/ops.markdown
MoarVM
MoarVM? is a VM built for Perl6's Rakudo implementation (the most canonical Perl6 implementation as of this writing).
- Register-based (16-bit registers identifiers, so on the order of 2^16 possible registers)
- primitive types: int, num, str, object
- uses libuv for async I/O, libtommath for multiple precision arithmetic, libatomic_ops for atomic operations, uthash for hashes
MoarVM introduction and features
- "In NQP and Rakudo, all operations that we can perform in a VM-independent way are captured in the nqp:: op set. Covers arithmetic, string manipulation, array and hash operations, object operations, I/O, and a few specialized things for the grammar engine. The MoarVM? instruction set is largely derived from this " -- [3]
- "MoarVM? includes the Unicode Character Database so far as we need it for Perl 6. No external dependencies (like ICU)" -- [4]
- "Threadsafe...We use mutexes in some places. However, many places –especially anything on a hot path –uses atomic operations in place of locks" [5]
"The heart of the VM": "
- Bytecode interpreter
- Most of 6 model
- Generational, parallel GC
- String and Unicode support
- Basic thread support
- Basic I/O support " -- [6]
"What does a VM typically do?
- Execute instructions (possibly by interpreting, possibly by JIT compilation, often a combination)
- Provide memory management (both allocation and deallocation, typically through garbage collection)
- Offer a range of built-in data structures and instructions to operate on them (strings, arrays, objects, ...)
- Abstract away the details of the underlying OS and expose a common interface to IO, threading, etc." " -- [7]
" Some key features provided by MoarVM? include:
- Meta-object programming, using the 6model design
- Precise, generational, and parallel GC
- Unicode 8.0 support (Unicode Character Database, encodings, normalization)
- First-class code objects, lexical variables and closures
- Exceptions
- Continuations
- Bounded serialization
- Code generation from MAST (MoarVM? AST)
- Runtime loading of code
- Big integers
- A range of IO and process support, including asynchronous sockets, signals, timers, and processes
- Native calling and native pointer manipulation
- Threads, mutexes, condition variables, semaphores, and blocking queues
- Bytecode specialization by type, and numerous optimizations (including resolution of method calls and multiple dispatch, dead code elimination, inlining, and on stack replacement)
- JIT compilation
- Instrumentation-based profiling of call frames and allocations " -- [8]
MoarVM AST (MAST)
12 different node types:
- CompUnit?
- Frame
- Op
- SVal
- IVal
- NVal
- Label
- Local
- Lexical
- Call
- Annotated
- HandlerScope?
-- [9]
MoarVM 6model
6model according to http://jnthn.net/papers/2013-yapceu-moarvm.pdf and https://github.com/jnthn/6model/blob/master/overview.pod :
"An object is a blob of memory. The only constraint is that the first thing in the blob must be a pointer/reference to a Shared Table data structure." -- https://github.com/jnthn/6model/blob/master/overview.pod
Object is composed of:
- STable
- Flags, owner
- GC stuff
- <body>
STable is composed of:
- HOW (meta-object)
- REPResentation
- WHAT (type object)
- WHO (stash) ("The package supporting the object." [10]) ("A Stash is a hash that is used for symbol tables at the package scoping level in Perl 6.") -- [11]
- Method cache
- Type check cache
on meta-objects: 6model "provides one meta-object that implements objects with attributes (state) and methods (behavior) - and that's about it. It doesn't enforce one definition of method dispatch, inheritance, interfaces, introspection and so forth. These are all built up by implementing meta-objects that specify their semantics." -- [12]
on representations: " Rather than committing to one view of how to lay out an object in memory, 6model supports "representations". Representations define how attributes are actually stored and accessed, how primitive types (integers, floating point values and strings) are boxed and unboxed - or perhaps both. Additionally, representations are orthogonal to meta-objects, meaning it is possible to define one type (e.g. a class) and use it with different storage strategies. This is known as representation polymorphism...An object may in the abstract be a blob of memory that starts with a pointer to a Shared Table, but of course something has to know what the rest of it means. That's what representations do. A representation is responsible for object allocation, attribute storage and access (both in terms of memory layout and operation), boxing from and unboxing to native types and (depending on the VM) GC interaction. Representations may be like singletons, or they may act more like instances. How a representation is implemented is VM-specific; in fact, pretty much everything the representation has to do is also VM-specific." -- [13]
on meta-objects and caches:
"In an ideal world, every single method dispatch we perform would be conducted by delegating to the meta-object's method that implements method dispatch semantics. In the real world, that's not practical from a performance point of view. Thus 6model provides various mechanisms that a meta-object can "opt in" to in order to allow for muchly increased performance. However, it considers all of these to really just be a kind of "cache". A v-table is just an array of invokable objects published by the meta-object, which it is responsible for maintaining. Similar things apply to type-checking." -- [14]
" REPR API has a common part (allocation, GC marking) along with several sub - protocols for different ways of using memory: Representations are orthogonal to type (and thus dis - interested in method dispatch, type check, etc.) and also non - virtual (if you know the REPR, can inline stuff) "
there are various calls to functions prefixed with 'MVM_6model' in https://github.com/MoarVM/MoarVM/blob/master/src/core/interp.c
these seem to be declared in https://github.com/MoarVM/MoarVM/blob/master/src/6model/6model.h
the list seems to be (for readabilty, i removed the 'MVM' prefixes from everything in this list, and also the '6model' prefixes):
- PUBLIC Object * get_how(ThreadContext? *tc, STable *st);
- PUBLIC Object * get_how_obj(ThreadContext? *tc, Object *obj);
- void find_method(ThreadContext? *tc, Object *obj, String *name, Register *res);
- PUBLIC Object * find_method_cache_only(ThreadContext? *tc, Object *obj, String *name);
- int32 find_method_spesh(ThreadContext? *tc, Object *obj, String *name, int32 ss_idx, Register *res);
- int64 can_method_cache_only(ThreadContext? *tc, Object *obj, String *name);
- void can_method(ThreadContext? *tc, Object *obj, String *name, Register *res);
- void istype(ThreadContext? *tc, Object *obj, Object *type, Register *res);
- PUBLIC int64 istype_cache_only(ThreadContext? *tc, Object *obj, Object *type);
- int64 try_cache_type_check(ThreadContext? *tc, Object *obj, Object *type, int32 *result);
- void invoke_default(ThreadContext? *tc, Object *invokee, Callsite *callsite, Register *args);
- void stable_gc_free(ThreadContext? *tc, STable *st);
- uint64 next_type_cache_id(ThreadContext? *tc);
- void never_repossess(ThreadContext? *tc, Object *obj);
https://github.com/MoarVM/MoarVM/blob/master/docs/bootstrap.markdown describes various primitive types in MoarVM?: strings, arrays, hashes, (external/foreign) functions. All of these are needed for fields within the 'primitive' meta-object type, KnowHOW?. KnowHOW? meta-objects are then constructed for each of the other primitive types. The meta-object for the KnowHOW? class itself is itself a KnowHOW?.
"Every object in MoarVM? is a 6model object - one object system for the whole VM. By object, we mean what you think of as objects (Arrays, Hashes, Boxed integers, floats, etc., Threads, handles)..." [15]
Presumably much of MoarVM?'s 6model implementation is in https://github.com/MoarVM/MoarVM/blob/master/src/6model/
See also http://doc.perl6.org/language/mop
MoarVM links
Haskell GHC languages
Note that Haskell is implemented a bit differently from many languages; this stems from Haskell's properties:
- lazy (ie. at runtime we often pass around suspended expressions or 'thunks' rather than values)
- higher-order functions (ie. functions are first-class values)
- currying (ie. if we apply a function f to two arguments using the expression 'f x y', and the result type is an integer, then (f x) is a function which returns a function which takes one argument and returns an integer; in some implementations, the 'f' in (f x y) could be internally implemented either by (a) a function which takes one argument and returns a function which returns an integer, (b) a function which takes two arguments, and returns an integer (c) a function which takes more than two arguments, in which case the result of 'f x y' is a partially applied function)
It is therefore assumed that Haskell programs will, at runtime, very frequently be dealing with unevaluated thunks, with closures which represent unevaluated or partially evaluated functions, and with the application of functions whose arity was unknown at compile time.
GHC Core
Possible extensions:
Core in one slide: " data Expr b -- "b" for the type of binders, = Var Id
Lit Literal |
App (Expr b) (Arg b) |
Lam b (Expr b) |
Let (Bind b) (Expr b) |
Case (Expr b) b Type [Alt b] |
Type Type |
Cast (Expr b) Coercion |
Coercion Coercion |
Tick (Tickish Id) (Expr b) |
data Bind b = NonRec? b (Expr b)
type Arg b = Expr b
type Alt b = (AltCon?, [b], Expr b)
data AltCon? = DataAlt? DataCon?
LitAlt? Literal | DEFAULT |
" -- [16]
"Core is technically a variant of a System FC (which is itself a variant of System F)" -- [17]
" Values: Both data and functions Unboxed: Primitive (machine level) types, not "boxed up" on the heap Closure: Heap allocated data associated with a method Thunk: A suspended computation Continuations: The cousin of closures. Unlike a closure you aren't simply calling a function, you are continuing a saved execution state. Basically though, if you have closures and tail calls as Haskell does, continuations can be built.
Yes, closures, thunks and continuations are all very similar. One implementation can capture them all, however the terminology is used to capture the different use cases.
...
redex(es):
reducible expression. A expression that can be evaluated further
normal form:
an expression without an redexes
head normal form:
an expression where the top level (head) is neither a redex NOR a lambda abstraction with a reducible body
weak head normal form:
an expression where the top level (head) isn't a redex
unfolding:
unfolding of a function f is just the body of f.
Unfolding = Inlining.
...
evaluation strategies:
call-by-value: arguments evaluated before function entered (copied)
call-by-name: arguments passed unevaluated
call-by-need: arguments passed unevaluated but an expression is only evaluated once (sharing)
no-strict evaluation Vs. lazy evaluation:
non-strict: Includes both call-by-name and call-by-need, general term for evaluation strategies that don't evaluate arguments before entering a function
lazy evaluation: Specific type of non-strict evaluation. Uses call-by-need (for sharing).... scrutinee: The expression you are case'ing on in a case statement " -- [18]
"Graph Reduction: The way that lazy functional languages like Haskell are implemented is through a technique called graph reduction. Its best to use the graph reduction model as an intuitive way to think about how Haskell is evaluated, the actual way GHC implements Haskell is pretty close to how an imperative language works. " -- [19]
GHC STG
spineless tagless G-machine
- http://stackoverflow.com/questions/11921683/understanding-stg
- https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/GeneratedCode
- Simon Marlow and Simon Peyton Jones, How to make a fast curry: push/enter vs eval/apply
- push/enter vs eval/apply: Consider what code the compiler generates when an unknown function f is applied to two arguments x and y (in Haskell, this would be written 'f x y'). At runtime, there are three possibilities; f might be of arity less than 2, or it might be of arity 2, or it might be of arity greater than 2. In the first case, when f's arity is less than 2, f needs to be passed just the arguments it needs, and the output of f will be expected to be another function g to consume the rest of the arguments. In the second case, when f's arity is exactly 2, f needs to be passed all of the arguments, f should be passed the arguments. In the third case, when f's arity is greater than 2, a closure to represent the partially applied function must be constructed and saved for later. push/enter vs eval/apply concerns how this is done.
- In an eval/apply system, the caller function's code handles this choice. This is what you might naively expect to happen. The caller examines metadata of f to find its arity, and then either (a) if f's arity is less than 2, calls f and then calls the function that f returns upon the remaining arguments, (b) if f's arity matches the number of remaining arguments, calls f on all of the arguments and stores its result for later use, or (c) if f's arity is greater than the number of available arguments, constructs a closure (partial function application, with f applied to all available arguments) and stores it for later use.
- In a push/enter system, the arguments x and y are placed on the stack ('push') and a jump is made to the code for f ('enter'); the code for f looks at how many arguments are on the stack and then behaves appropriately, consuming as many arguments as it needs, and then either (a) if its arity is less than 2, tail-calling the function that it outputs in order to consume the result of the arguments, (b) if its arity is 2, returning its result, (c) if its arity is greater than 2, returning a closure (partial function application). So, in a push/enter system, the callee function's code is responsible for choosing between these three possibilities and doing the appropriate thing.
- The original spinless tagless G-machine paper (and older Haskell implementations which followed it) use a push/enter system (following earlier work, the Johnsson G-machine and the Fairbairn and Wray Three Instruction Machine, as discussed in the 'push/enter vs eval/applys Related Work section). The 'push/enter vs eval/apply' paper profiles Haskell variants using eval/apply and push/enter and concludes that (on x86 machines, using C as a target language) neither choice is greatly more efficient than the other, but eval/apply is simpler, therefore eval/apply is recommended (they also point out that perhaps eval/apply would be even faster if more registers were used for argument passing).
- Marlow, Yakushev, Jones, Faster Laziness Using Dynamic Pointer Tagging (2007)
- this paper profiles Haskell variants without tagging, with semi-tagging, and with dynamic pointer tagging, and concludes that semi-tagging is 7-9% faster, and dynamic pointer tagging is an additional 3-5% faster
- an easier-to-read introduction to the question of 'tagging' is in Hammond, The Spineless Tagless G-Machine - NOT! (1993) (this paper also introduces semi-tagging, where one tag bit is used to mark objects as evaluated or unevaluated)
- note that on popular architectures, word-alignment means that each pointer has some unused bits, eg on a 32-bit machine, pointers have 2 unused bits (b/c there are 4 bytes in a word), and these bits can be used as tag bits (provided you mask them out before using the pointer)
- in dynamic pointer tagging, in addition to indicating whether an object has been evaluated or not, the tags indicate, in a type-specific manner, which constructor was used to generate an object. For example, in Haskell, a list might have two 'constructors', CONS and EMPTYLIST, eg the list [3 5 4] might be represented as CONS(3, CONS(5, CONS(4, EMPTYLIST()))). So, with dynamic pointer tagging, the tags for a list object might indicate not only whether the object has been evaluated or not, but also if it has been evaluated, then whether the (outermost) constructor for this list is a CONS or an EMPTYLIST. Without dynamic pointer tagging, this information is stored in the info table within the object (i think). So if there are two tag bits, then for a list object these could be interpreted as: {00: unevaluated, 01: evaluated, EMPTYLIST, 10: evaluated, CONS, 11: (never used)}. A data type with 3 possible constructors could use 11 to indicate the third constructor. A data type with more than 3 possible constructors could use 01 and 10 for the most common construtors and use 11 to indicate 'some other constructor', which would then necessitate an old-fashioned info table lookup to see which constructor it was (note: the paper, however, doesn't use dynamic pointer tagging at all for types with too many constructors, rather than tagging the most common constructors and having an 'other' tag). On 64-bit systems, there are 3 'tag bits' available due to pointer alignment, meaning that a similar coding scheme could use 000 for unevaluated, and then distinguish between up to 7 constructor types. Note that the meaning of tags is type-specific, since different types have different sets of constructors. The point of all this tagging is to save time by not having to do info table lookups in many cases to figure out which constructor a data object is.
- Peyton Jones and Simon L, Implementing Lazy Functional Languages on Stock Hardware: The Spineless Tagless G-machine [http://research.microsoft.com/en-us/um/people/simonpj/Papers/spineless-tagless-gmachine.ps.gz ] (2007); out of date; since then Haskell apparently switched from a 'push/enter' approach to an 'eval/apply' approach (see How to make a fast curry: push/enter vs eval/apply), and apparently they also gave up on the 'tagless' part and on 'vectored returns' (see Faster Laziness Using Dynamic Pointer Tagging). 'Spineless' refers to a property of the graph-reduction strategy, see http://stackoverflow.com/a/16994919/171761 . 'Tagless' referred to the fact that the STG machine didn't tag pointers on the heap with metadata, instead just jumping into the 'entry point' code pointed to by the heap pointer, and letting the object's entry point code handle deciding what to do; as noted, as of this writing Haskell implmentations are apparently no longer 'tagless'. G-machine is short for "graph reduction machine", which refers to the idea of program execution as successive reductions of a graph of expressions.
- http://www.scs.stanford.edu/11au-cs240h/notes/ghc-slides.html
- http://webcache.googleusercontent.com/search?q=cache:zDvHNEwVmusJ:www.scs.stanford.edu/11au-cs240h/notes/ghc-slides.html&client=ubuntu&hl=en&gl=us&strip=1
- https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/StgSynType
- http://lambda.jstolarek.com/2013/06/getting-friendly-with-stg/
- http://stackoverflow.com/questions/11921683/understanding-stg
- http://www.haskell.org/ghc/docs/latest/html/libraries/ghc/StgSyn.html
- if you are wondering why GHC manages its stack separately from the C stack, and does not make use of the CPU's CALL and RET instructions, see section 7.1, "Using call/return instructions", in Marlow, Yakushev, Jones, Faster Laziness Using Dynamic Pointer Tagging (2007). In summary: (a) GHC supports large numbers of threads, each with very small stacks, but operating systems take signals on the user stack, and don't do limit checking (i.e. i think they mean that very small stacks might cause stack overflows during OS signal handling), (b) in order to tell the garbage collector which items in a stack frame are pointers, the code for each 'case continuation' is normally preceded by an 'info table', but when you RET from a CALL, you RET to the instruction right after the CALL instruction, which is where we'd like to put the info table, necessitating an extra JMP (not sure if i'm understanding that part correctly; my understanding is that the info table is embedded in the code and that they save one JMP instruction by skipping over the info tables with their synthetic returns, is that correct?). https://ghc.haskell.org/trac/ghc/wiki/Commentary
- https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts
- http://www.haskell.org/haskellwiki/Ministg
GHC Bytecode (interpreter)
from http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.8.3/src/ByteCodeInstr.html :
" -- Messing with the stack = STKCHECK Word
-- Push locals (existing bits of the stack)
| PUSH_L !Word16{-offset-}
| PUSH_LL !Word16 !Word16{-2 offsets-}
| PUSH_LLL !Word16 !Word16 !Word16{-3 offsets-}
-- Push a ptr (these all map to PUSH_G really)
| PUSH_G Name
| PUSH_PRIMOP PrimOp
| PUSH_BCO (ProtoBCO Name)
-- Push an alt continuation
| PUSH_ALTS (ProtoBCO Name)
| PUSH_ALTS_UNLIFTED (ProtoBCO Name) ArgRep
-- Pushing literals
| PUSH_UBX (Either Literal (Ptr ())) Word16
-- push this int/float/double/addr, on the stack. Word16
-- is # of words to copy from literal pool. Eitherness reflects
-- the difficulty of dealing with MachAddr here, mostly due to
-- the excessive (and unnecessary) restrictions imposed by the
-- designers of the new Foreign library. In particular it is
-- quite impossible to convert an Addr to any other integral
-- type, and it appears impossible to get hold of the bits of
-- an addr, even though we need to assemble BCOs.
-- various kinds of application
| PUSH_APPLY_N
| PUSH_APPLY_V
| PUSH_APPLY_F
| PUSH_APPLY_D
| PUSH_APPLY_L
| PUSH_APPLY_P
| PUSH_APPLY_PP
| PUSH_APPLY_PPP
| PUSH_APPLY_PPPP
| PUSH_APPLY_PPPPP
| PUSH_APPLY_PPPPPP
SLIDE Word16{-this many-} Word16{-down by this much-} |
-- To do with the heap
| ALLOC_AP !Word16 -- make an AP with this many payload words
| ALLOC_AP_NOUPD !Word16 -- make an AP_NOUPD with this many payload words
| ALLOC_PAP !Word16 !Word16 -- make a PAP with this arity / payload words
| MKAP !Word16{-ptr to AP is this far down stack-} !Word16{-number of words-}
| MKPAP !Word16{-ptr to PAP is this far down stack-} !Word16{-number of words-}
| UNPACK !Word16 -- unpack N words from t.o.s Constr
| PACK DataCon !Word16
-- after assembly, the DataCon is an index into the
-- itbl array
-- For doing case trees
| LABEL LocalLabel
| TESTLT_I Int LocalLabel
| TESTEQ_I Int LocalLabel
| TESTLT_W Word LocalLabel
| TESTEQ_W Word LocalLabel
| TESTLT_F Float LocalLabel
| TESTEQ_F Float LocalLabel
| TESTLT_D Double LocalLabel
| TESTEQ_D Double LocalLabel
-- The Word16 value is a constructor number and therefore
-- stored in the insn stream rather than as an offset into
-- the literal pool.
| TESTLT_P Word16 LocalLabel
| TESTEQ_P Word16 LocalLabel
-- For doing calls to C (via glue code generated by libffi)
| CCALL Word16 -- stack frame size
(Ptr ()) -- addr of the glue code
Word16 -- whether or not the call is interruptible
-- (XXX: inefficient, but I don't know
-- what the alignment constraints are.)
-- For doing magic ByteArray passing to foreign calls
| SWIZZLE Word16 -- to the ptr N words down the stack,
Word16 -- add M (interpreted as a signed 16-bit entity)
-- To Infinity And Beyond
| ENTER
| RETURN -- return a lifted value
| RETURN_UBX ArgRep -- return an unlifted value, here's its rep
-- Breakpoints
| BRK_FUN (MutableByteArray# RealWorld) Word16 BreakInfo"
Links:
Haskell YHC
YHC IL
Robert Dockins and Samuel Z. Guyer, Bytecode Verification for Haskell table 1
YHC bytecode
the following is paraphrased/quoted from http://web.archive.org/web/20060714071238/http://www-users.cs.york.ac.uk/~ndm/yhc/bytecodes.html
- ALLOC: ALLOC n creates n 'blackhole' applications in the heap and pushes the pointers to the applications on to the top of the stack. The applications are to the function 'Prelude._black_hole' which will terminate the program if evaluated. ALLOC n is used in conjunction with UPDATE n to create circular heap nodes.
- APPLY: APPLY n takes an application on the top of the stack an application to additional n arguments from the stack to create a new application. If applying an extra n arguments to the application a would super-saturate it (i.e. apply it to more arguments that the functions arity) then APPLY satures the application fully and then builds further applications to the built in function '_apply' to apply the rest of the argument. The function '_apply' is defined in src/runtime/BCKernel/primitive.c ... it evaluates the fully-staturated application which then returns another application, and this application is then applied to the additional argument.
- END_CODE: This instruction is placed at the end of every function, after the RETURN/RETURN_EVAL. If it is executed the machine will crash with an error. The idea is that if the mutator accidentally starts running non-code (for example because the compiler mis-generated a JUMP instruction) then the program will exit swiftly. The value of this instruction code being 0 aids in this.
- EVAL: EVAL takes the value on the top of the stack and evaluates it to weak head normal form (WHNF). If the value on the top of the stack is a constructor or partial application EVAL does nothing. If it is a saturated application then EVAL 'calls' the function specified in the application. It does this by pushing a new frame onto the stack.
- FROM_ENUM: FROM_ENUM takes a pointer to an application to a constructor from the top of the stack and pushes onto the top of the stack a pointer to an heap allocated Int containing the tag number of the constructor. FROM_ENUM is used (perhaps rather unsurprisingly) to implement fromEnum
- INT_SWITCH: INT_SWITCH is similar to LOOKUP_SWITCH but selects on the integer value of an int node rather than on the constructor tag number.
- JUMP: JUMP j unconditionally jumps forward by j bytes.
- JUMP_FALSE: JUMP_FALSE j removes the value from the top of the stack and if it is the node Prelude.False then the program jumps forward by j bytes. If it is not Prelude.False then JUMP_FALSE is a no-op.
- LOOKUP_SWITCH: LOOKUP_SWITCH is similar to TABLE_SWITCH but here lookup-table is an array of tag-offset pairs. When a matching tag is found the program jumps forward by 'offset' number of bytes. If no match is found the program just forward by 'def' number of bytes.
- MK_AP: MK_AP n looks up the nth item of the constant table (which must be a FInfo). It then makes a fully saturated application to this function by taking m arguments off the stack (where m is the arity of the function). It then builds an application in the heap to the given function with those arguments. It then pushes a pointer to the new application onto the top of the stack.
- MK_CON: MK_CON n looks up the nth item of the constant table (which must be a CInfo). It then builds a fully saturated application to the constructor using the correct number of arguments from the stack.
- MK_PAP: MK_PAP n m is a more generalised version of MK_AP that can also build partial applications. As with MK_AP here n refers to the nth constant table item, but unlike MK_AP the arity of the application is specified explicitly in m.
- NEED_HEAP: NEED_HEAP specifies the number of heap words required to run the whole of the next basic block (which is calculated by the compiler).
- POP
- PRIMITIVE: PRIMITIVE takes the first constant table item, which must be an XInfo and calls the external function using the current application node. It pushes on the top of the stack the value returned by the external function. PRIMITIVE is used to implement primitive functions and the FFI.
- Primitives: ADD_W, ADD_F, ADD_D, SUB_W, SUB_F, SUB_D, MUL_W, MUL_F, MUL_D, DIV_W, DIV_F, DIV_D, MOD_W, MOD_F, MOD_D, EQ_W, EQ_F, EQ_D, NE_W, NE_F, NE_D, LE_W, LE_F, LE_D, LT_W, LT_F, LT_D, GE_W, GE_F, GE_D, GT_W, GT_F, GT_D, NEG_W, NEG_F, NEG_D
- PUSH: PUSH takes the nth argument on the stack and pushes the same pointer onto the top
- PUSH_ARG: PUSH_ARG n pushes the nth argument of the 'current application node' (See Yhc/RTS/Machine) onto the top of the stack.
- PUSH_CHAR: PUSH_CHAR n allocates a heap node for the Char n and pushes a pointer to the heap node
- PUSH_CONST: PUSH_CONST n pushes the nth item of the current application node's constant table onto the top of the stack.
- PUSH_INT: PUSH_INT n allocates a heap node for the Int n and pushes a pointer to the heap node on
- PUSH_ZAP: abbreviation for PUSH n ; ZAP_STACK (n+1)
- PUSH_ZAP_ARG: equivilent to PUSH_ARG n ; ZAP_ARG n
- RETURN: RETURN returns the value on the top of the stack as the result of the current function call. It then pops the top item off the frame stack to continue the function below it
- RETURN_EVAL: RETURN_EVAL is equivilent to EVAL ; RETURN except that it doesn't use any stack space. RETURN_EVAL is thus used to implement tail call optimisation.
- SELECT: SELECT n is an abbreviation for UNPACK ; PUSH_ZAP n ; RETURN_EVAL. It is used to implement dictionaries and selectors.
- SELECTOR_EVAL: SELECTOR_EVAL is an abbreviation for PUSH_ARG_0 ; EVAL
- SLIDE: SLIDE n takes the top item off the stack, pops n items from the stack and then pushes the first item removed back on to the top of the stack
- STRING: STRING removes from the top of the stack a pointer to an application to a StringNode?. It then 'unpacks' the string into the equivilent list of characters.
- TABLE_SWITCH: TABLE_SWITCH examines the tag of the item on the top of the stack and then jumps forward by the number of bytes specified by jump-table[tag].
- UNPACK: UNPACK takes from the top of the stack an application to a constructor. It then pushes on the stack all the arguments of that constructor with the first argument on top.
- UPDATE: UPDATE n removes the top item from stack and uses it to 'update' the application pointed to by the nth item of the stack. To 'update' the application it is overwritten with an application the value from the top of the stack. UPDATE is used in conjunction with ALLOC to create cyclic memory structures.
- ZAP_ARG: ZAP_ARG n replaces the nth argument of the 'current application node' with an application to 'Prelude._zap_arg'.
- ZAP_STACK: ZAP_STACK simply replaces the nth stack item with a pointer to a 'Prelude._zap_stack'. This is used to help prevent space leaks: when a value is no longer needed its stack reference can be 'zapped'. If this was the last reference to an object then if the GC runs it can collect the unused application.
Links:
Haskell Hugs
Bytecode:
Quoted or paraphrased from The implementation of the Gofer functional programming system:
Instructions for constructing values:
The following instructions are used to construct frag- ments of program graph, using the stack to store tem- porary values as well as the function parameters and the results of calls to the evaluator.
- LOAD n: Push the value from position n in the current stack frame onto the top of the stack. Used to access the values of function arguments or locally bound variables.
- CELL c : Push the constant cell value specified by c onto the top of the stack. Used to access constants such as constructor functions and su- percombinators.
- CHAR n : Push a character value onto the top of the stack
- INT n : Push the integer n
- FLOAT f : Push the floating point number f
- STRING str : Push the string value str onto the top of the stack.
- MKAP n : Apply the function on the top of the stack to the n argument values immediately be- low it. The top n +1 elements on the stack are replaced by the resulting expression. Note that this instruction does not involve any evaluation of the function, its arguments, or the resulting expression.
- UPDATE n : Removes the value, r say, from the top of the stack and updates the n th element in the current stack frame with a pointer to an in- direction node to r . This instruction is used to overwrite the root of an expression as part of the implementation of lazy evaluation, i.e. to ensure proper sharing of the result of an expression. It is also used to save the values of variables bound in a local def- inition. The indirection cell is used to avoid a loss of laziness see The implementation of the Gofer functional programming system p. 48(?), Section 12.4.
- UPDAP n : This instruction has almost the same effect as a MKAP 1 instruction followed by an UPDATE n instruction; in other words, the n th element in the current stack frame is replaced with the application formed from the top two values on the stack (which are subsequently dis- carded). The difference is that the UPDAP in- struction should only be used when it is known that the n th element on the stack already points to an application node; in this case, the old ap- plication node can be overwritten with the new values, avoiding the need to allocate a new ap- plication node.
- ALLOC n : Allocates n pairs, each initialized so that both first and second components are NIL , and pushes a pointer to each pair allocated onto the stack. This instruction is as part of the pro- cess of initializing (possibly recursive) local vari- able bindings.
- SLIDE n : Slides the value on the top of the stack down n places by removing the n values imme- diately below it.
- DICT n : Replaces the value on the top of the stack—which must be a dictionary value, d say— with the value held in the n th slot of d . This in- struction is used to implement dictionary lookup by compiling expressions of the form (#n d) to an instruction sequence: ... code to build d ... DICT n The type system ensures that the DICT instruc- tion will only ever be used when the value on the top of the stack is a dictionary. Furthermore, since all of the dictionary values required by a program are constructed before it is executed, there is no need to evaluate the dictionary d be- fore the DICT instruction. Thus DICT can be im- plemented very efficiently with just a couple of machine instructions without needing a run-time representation for dictionary selectors.
- ROOT n : Used in the implementation of the root optimization; see Section 9.3 of The implementation of the Gofer functional programming system for further details
Control flow:
The following instructions are used to call the eval- uator, to test the values that it returns, to indicate the end of a particular reduction, or to signal an ir- reducible expression.
- EVAL Pops the top expression off the stack and calls the evaluator to calculate its value.
- INTEQ n addr : If the value in the whnfInt reg- ister is equal to the integer constant n , then continue with the current sequence of instruc- tions. If the two values are not equal, then con- trol transfers to addr . This instruction is used to support pattern matching of integer constants. INTGE n addr : If the value in the whnfInt reg- ister is greater than or equal to n , then the in- teger whnfInt - n is pushed onto the stack. If the test fails, then control passes to the address addr . This instruction is used to support (p+k) pat- terns. For example, an INTGE 2 addr instruc- tion might be used to match the value on the top of the stack against the pattern (v+2) . If the value on the top of the stack is less than 2 , then the match fails and execution transfers to the instruction at addr . Otherwise, the match is successful and the value pushed onto the stack gives the value bound to the variable v. xtending Gofer to allow (p+k) patterns to be matched against values of any type in the Integral class as permitted in Haskell would re- quire some extensions to the INTGE instruction, or, more likely, more significant changes to the program transformations used in compiler.c . However, it seems more likely that future ver- sions of Gofer will eliminate support for (p+k) patterns altogether.
- INTDV n addr : Similar to INTGE , this instruc- tion is used to determine whether the value in the whnfInt register is a positive multiple of the in- teger n . If so, the value of whnfInt/n is pushed onto the stack. Otherwise, control transfers to the instruction at address addr . This instruction is used to support pattern matching of (n*v) patterns, added as an exper- imental feature to the Gofer interpreter. This form of pattern is now considered obsolete. TEST c addr : This instruction compares the valu in whnfHead register with the value of the Cell constant c , branching to the instruction at address addr if the values are not the same. This instruction is used in the implementation of pattern matching. To illustrate this, consider a conditional expression: "if e then t else f" and note that this is equivalent to the case ex- pression: "case e of True -> t; False -> f". The following code can be used to calculate the value of this conditional in situations where it is clear that results of the conditional will always be required (i.e. when the conditional appears in a strict context): "... code to build e ...; EVAL; TEST True label; ... code to build t ...; RETURN label: ... code to build f ...; RETURN;". Each of the sections of code that ‘build’ an ex- pression here are expected to leave the required expression, unevaluated, on the top of the stack.) As an aside, in situations where we cannot be sure that the value of the conditional will be re- quired, we use code of the form: " ... code to build f ...; ... code to build t ...; ... code to build e ...; CELL nameIf; MKAP 3;" to build a delayed version of the conditional ex- pression. The nameIf cell used here refers to a primitive function defined so that: nameIf True t f = t. nameIf False t f = f. Clearly, we would prefer to use the code con- taining the TEST instruction whenever possible because it avoids the need to build code for both t and f when the program executes.
- GOTO addr : Causes an unconditional jump to the instruction at address addr .
- RETURN : Signals the end of (one possible path through) the code for a supercombinator. The RETURN function usually follows the UPDATE 0 or UPDAP 0 instructions used to update the root of the current redex.
- FAIL : Used to signal a failed pattern match, usu- ally resulting in a run-time error. For convenience, the current version of the code generator places a single FAIL instruction at a fixed address, recorded in the variable noMatch during the initialization process. Pattern match- ing failures in other parts of the program are trig- gered by branching to this address.
- SETSTK n : Resets the stack pointer to root+ n . This instruction is used in to set the stack pointer to a known position in situations where its value cannot be determined at compile time.
Links:
FLRC / HRC's MIL
The Functional Language Research Compiler / Haskell Research Compiler has an IL called 'MIL'.
Links:
Smalltalk Slang
From http://wiki.squeak.org/squeak/2267 , the operations available in Slang are:
"&" "
" and: or: not |
"+" "-" "" "
" min: max: bitAnd: bitOr: bitXor: bitShift: "<" "<=" "=" ">" ">=" "~=" "==" isNil notNil whileTrue: whileFalse: to:do: to:by:do: ifTrue: ifFalse: ifTrue:ifFalse: ifFalse:ifTrue: at: at:put: bitInvert32 preIncrement integerValueOf: integerObjectOf: isIntegerObject:
Links:
The Smalltalk-80 Bytecodes Range Bits Function 0-15 0000iiii Push Receiver Variable #iiii 16-31 0001iiii Push Temporary Location #iiii 32-63 001iiiii Push Literal Constant #iiiii 64-95 010iiiii Push Literal Variable #iiiii 96-103 01100iii Pop and Store Receiver Variable #iii 104-111 01101iii Pop and Store Temporary Location #iii 112-119 01110iii Push (receiver, true, false, nil, -1, 0, 1, 2) [iii] 120-123 011110ii Return (receiver, true, false, nil) [ii] From Message 124-125 0111110i Return Stack Top From (Message, Block) [i] 126-127 0111111i unused 128 10000000 jjkkkkkk Push (Receiver Variable, Temporary Location, Literal Constant, Literal Variable) [jj] #kkkkkk 129 10000001 jjkkkkkk Store (Receiver Variable, Temporary Location, Illegal, Literal Variable) [jj] #kkkkkk 130 10000010 jjkkkkkk Pop and Store (Receiver Variable, Temporary Location, Illegal, Literal Variable) [jj] #kkkkkk 131 10000011 jjjkkkkk Send Literal Selector #kkkkk With jjj Arguments 132 10000100 jjjjjjjj kkkkkkkk Send Literal Selector #kkkkkkkk With jjjjjjjj Arguments 133 10000101 jjjkkkkk Send Literal Selector #kkkkk To Superclass With jjj Arguments 134 10000110 jjjjjjjj kkkkkkkk Send Literal Selector #kkkkkkkk To Superclass With jjjjjjjj Arguments 135 10000111 Pop Stack Top 136 10001000 Duplicate Stack Top 137 10001001 Push Active Context 138-143 unused 144-151 10010iii Jump iii + 1 (i.e., 1 through 8) 152-159 10011iii Pop and Jump 0n False iii +1 (i.e., 1 through 8) 160-167 10100iii jjjjjjjj Jump(iii - 4) *256+jjjjjjjj 168-171 101010ii jjjjjjjj Pop and Jump On True ii *256+jjjjjjjj 172-175 101011ii jjjjjjjj Pop and Jump On False ii *256+jjjjjjjj 176-191 1011iiii Send Arithmetic Message #iiii 192-207 1100iiii Send Special Message #iiii 208-223 1101iiii Send Literal Selector #iiii With No Arguments 224-239 1110iiii Send Literal Selector #iiii With 1 Argument 240-255 1111iiii Send Literal Selector #iiii With 2 Arguments
LuaVM
" By default, Lua has a maximum stack frame size of 250. This is encoded as MAXSTACK in llimits.h. The maximum stack frame size in turn limits the maxi mum number of locals per function, which is set at 200, encoded as LUAI_MAXVARS in luaconf.h. Other limits found in the same file include the maximum number of upval ues per function (60), encoded as LUAI_MAXUPVALUES , call depths, the minimum C stack size, etc. Also, wit h an sBx field of 18 bits, jumps and control structures cannot exceed a jump distance of about 131071. "
0 MOVE Copy a value between registers 1 LOADK Load a constant into a register 2 LOADBOOL Load a boolean into a register 3 LOADNIL Load nil values into a range of registers 4 GETUPVAL Read an upvalue into a register 5 GETGLOBAL Read a global variable into a register 6 GETTABLE Read a table element into a register 7 SETGLOBAL Write a register value into a global variable 8 SETUPVAL Write a register value into an upvalue 9 SETTABLE Write a register value into a table element 10 NEWTABLE Create a new table 11 SELF Prepare an object method for calling 12 ADD Addition operator 13 SUB Subtraction operator 14 MUL Multiplication operator 15 DIV Division operator 16 MOD Modulus (remainder) operator 17 POW Exponentiation operator 18 UNM Unary minus operator 19 NOT Logical NOT operator 20 LEN Length operator 21 CONCAT Concatenate a range of registers 22 JMP Unconditional jump 23 EQ Equality test 24 LT Less than test 25 LE Less than or equal to test 26 TEST Boolean test, with conditional jump (if not (R(A) <=> C) then PC++) 27 TESTSET Boolean test, with conditional jump and assignment (if (R(B) <=> C) then R(A) := R(B) else PC++) 28 CALL Call a closure 29 TAILCALL Perform a tail call 30 RETURN Return from function call 31 FORLOOP Iterate a numeric for loop 32 FORPREP Initialization for a numeric for loop 33 TFORLOOP Iterate a generic for loop 34 SETLIST Set a range of array elements for a table 35 CLOSE Close a range of locals being used as upvalues 36 CLOSURE Create a closure of a function prototype 37 VARARG Assign vararg function arguments to registers
this table is taken almost verbatim from http://luaforge.net/docman/83/98/ANoFrillsIntroToLua51VMInstructions.pdf
summary:
- move
- loads (constant, bool, nil)
- get/set (upval, global, table)
- misc: newtable, self, close, closure, vararg
- arithmetic: add, sub, mul, div, mod, pow, unm (unary minus)
- logic: not, test, testset (both test and testset can be used to implement both 'and' and 'or', see http://luaforge.net/docman/83/98/ANoFrillsIntroToLua51VMInstructions.pdf)
- arrays: len, concat, setlist
- jumps: jmp, call, tailcall, return (also some other conditionals)
- conditionals: eq, lt, le (also test and testset)
- loops: forloop, forprep, tforloop
Links:
LuaLIT VM bytecode
http://wiki.luajit.org/Bytecode-2.0
" The suffix(es) of the instruction name distinguish variants of the same basic instruction:
- V variable slot
- S string constant
- N number constant
- P primitive type
- B unsigned byte literal
- M multiple arguments/results
Here are the possible operand types:
- (none): unused operand
- var: variable slot number
- dst: variable slot number, used as a destination
- base: base slot number, read-write
- rbase: base slot number, read-only
- uv: upvalue number
- lit: literal
- lits: signed literal
- pri: primitive type (0 = nil, 1 = false, 2 = true)
- num: number constant, index into constant table
- str: string constant, negated index into constant table
- tab: template table, negated index into constant table
- func: function prototype, negated index into constant table
- cdata: cdata constant, negated index into constant table
- jump: branch target, relative to next instruction, biased with 0x8000 "
Instructions:
Comparison ops: IS{LT/GT/LE/GE] (<, >, <=, >=), ISEQ{V/S/N/P} (==, where 2nd operand is variable, string constant, number constant, primitive type), ISNE{V/S/N/P} (!=). note: "for floating-point comparisons (x < y) is not the same as not (x >= y) in the presence of NaNs?"
Unary Test and Copy ops: IST (jump if true), ISF (jump if false), ISTC (Copy D to A and jump, if D is true), ISFC (Copy D to A and jump, if D is false)
Unary ops: MOV, NOT, UNM (unary minus), LEN (object length)
Binary ops: ADD{VN, NV, VV} (variable and number, number and variable, variable and variable), SUB{VN, NV, VV}, MUL{VN, NV, VV}, DIV{VN, NV, VV}, MOD{VN, NV, VV}, POW, CAT
Constant ops: K{STR, CDATA, SHORT, NUM, PRI} (set A to constant D, where constant is of type string, cdata, short, number, primitive), KNIL (Set slots A thru D to nil)
Upvalue and Function ops: UGET (set A to upvalue D), USET{V, S, N, P} (set upvalue to variable/string constant, number constant, primitive constant), UCLO (Close upvalues for slots ≥ rbase and jump to target D), FNEW (Create new closure from prototype D and store it in A)
Table ops: TNEW (Set A to new table with size D), TDUP (Set A to duplicated template table D), GGET (A = _G[D]), GSET (_G[D] = A), TGET{V, S, B} (A = B[C], where index C is of type variable, string constant, unsigned byte literal), TSET{V, S, B}, TSETM ((A-1)[D], (A-1)[D+1], ... = A, A+1, ..) (note: "MULTRES from the previous bytecode gives the number of table slots to fill...MULTRES is an internal variable that keeps track of the number of results returned by the previous call or by VARG instructions with multiple results. It's used by calls (CALLM or CALLMT) or returns (RETM) with multiple arguments and by a table initializer (TSETM).")
Calls and Vararg Handling: CALL (A, ..., A+B-2 = A(A+1, ..., A+C-1)), CALLM (A, ..., A+B-2 = A(A+1, ..., A+C+MULTRES)), CALLT (Tailcall: return A(A+1, ..., A+D-1)), CALLMT (Tailcall: return A(A+1, ..., A+D+MULTRES)), ITERC (Call iterator: A, A+1, A+2 = A-3, A-2, A-1; A, ..., A+B-2 = A(A+1, A+2)), ITERN (Specialized ITERC, if iterator function A-3 is next()), VARG (Vararg: A, ..., A+B-2 = ...), ISNEXT (Verify ITERN specialization and jump; see http://wiki.luajit.org/Bytecode-2.0#Calls-and-Vararg-Handling)
Returns: "All return instructions copy the results starting at slot A down to the slots starting at one below the base slot (the slot holding the frame link and the currently executing function)." RET (return A, ..., A+D-2), RETM (return A, ..., A+D+MULTRES-1), RET0 (return), RET1 (return A)
Loops and branches: FORI (numeric for loop init), FORL (numeric for loop), ITER (iterator for loop), LOOP ("actually no-ops...only used by the JIT-compiler to speed up data-flow and control-flow analysis...The bytecode instruction itself is needed so the JIT-compiler can patch it to enter the JIT-compiled trace for the loop."). Many of these also have 'J' and 'I' variants which mark JIT-compiled loops and loops which cannot be JIT-compiled, respectively (the default variants "do hotspot detection. Trace recording is triggered if the loop is executed often enough"). See http://wiki.luajit.org/Bytecode-2.0#Loops-and-branches .
Function headers: FUNCF (Fixed-arg Lua function), FUNCV (Vararg Lua function), FUNCC (Pseudo-header for C functions), FUNCCW (Pseudo-header for wrapped C functions), FUNC* (Pseudo-header for fast functions). FUNCF and FUNCV each have 'J' and 'I' variants which mark JIT-compiled and un-JITable functions, as in the 'loops and branches' section.
LuaJIT SSA
http://wiki.luajit.org/SSA-IR-2.0
Erlang
Erlang BEAM VM
The Erlang virtual machine. Elixir also targets it.
Links:
quoted and paraphrased from https://synrc.com/publications/cat/Functional%20Languages/Erlang/beam.txt :
http://erlangonxen.org/more/beam appears to be copied from there. http://erlangonxen.org/more/beam claims that it documents (most of) the instruction set used in BEAM v5.9.1 (which is different from the "generic" instruction set that is shown in dissassembly).
Argument Types
Types of instruction arguments are encoded by single character codes. The meaning of these codes is given in the table below.
Type Description
t An arbitrary term, e.g. {ok,[]}
I An integer literal, e.g. 137
x A register, r.g. R1
y A stack slot
c An immediate term, i.e. atom/small int/nil
a An atom, e.g. 'ok'
f A code label
s Either a literal, a register or a stack slot
d Either a register or a stack slot
r A register R0
P A unsigned integer literal
j An optional code label
e A reference to an export table entry
l A floating-point register
Model Elements
The model of the BEAM virtual machine contains the following elements:
1. Registers R0-R255. R0 is a special 'fast' register.
2. IP (instruction pointer) contains the reference to the instruction being executed.
3. CP (continuation pointer) contains a return address of the current function call.
4. EP (stack pointer) points to the last value pushed on stack.
5. HTOP (heap top) points to the first empty word of the heap.
6. The message queue. SAVE pointer marks the message being currently matched.
7. tmpA and tmpB variables that temporarily hold values for the imminent arithmetic or logical operation.
8. Floating-point registers FR0-FR15.
'Live' refers to the current number of registers that hold values still needed by the process. Many instructions take this parameter to be able to perform garbage collection, if necessary.
Instructions
Allocation and initialization:
- allocate t t: Allocates Arg0 slots on the stack. Arg1 is Live.
- allocate_heap t I t: Allocates Arg0 slots on the stack. See allocate t t. Ensures that heap cntains at least Arg1 words after HTOP. Arg2 is Live.
- deallocate I: Restores CP to the value referenced by EP. Adds Arg0 + to EP to remove the saved CP and Arg0 slots from the stack.
- init y: Sets the Arg0 slot to nil. The op was called 'kill' earlier.
- init2 y y: Sets slots Arg0 and Arg1 to nil.
- init3 y y y: Sets slots Arg0, Arg1, and Arg2 to nil.
- test_heap I t: Ensures that the heap has at least Arg0 words available. Arg1 is Live.
Stack manipulation:
- i_trim I: Removes Arg0 slots from the stack. The saved CP is moved to the new top of the stack.
Control flow:
- i_select_val r/x/y f I: Searches an array of {val,addr} pairs for the value of Arg0. The length of the array is Arg2. If the value is found the execution continues at the corresponding addr. Otherwise, jumps to Arg1. Uses binary search and bitwise comparisons.
- i_select_val2 r/x/y f c f c f: Same as i_select_val rxy f I but uses an array made of {Arg2,Arg3} and {Arg4,Arg5}.
- i_jump_on_val rxy f I I: Implements a 'jump table'. Calculates the index by adding Arg0 to the minimum value Arg2. Jumps to Ar1 if the index falls outside of the table boundaries. Arg3 is the length of the jump table. Picks the label from the table using the calculated index and jumps there.
- i_select_tuple_arity r/x/y f I: Checks that Arg0 is tuple and then searches the array for {arity,addr} for the right arity. See i_select_al rxy f I.
- i_select_tuple_arity2 r/x/y f A f A f: Same as i_select_tuple_arity rxy f I but uses the array made up of two pairs: {Arg2,Arg3} and{Arg4,Arg5}.
- return: IP is set to CP.
- jump f: Continues execution at the location Arg0.
Exception handling:
- i_func_info I a a I: Raises function_clause exception. Arg1, Arg2, Arg3 are MFA. Arg0 is a mystery cming out of nowhere, probably, needed for NIFs.
- catch y f: Creates a new 'catch' context. Saves value of the label from Arg1 to the Arg0 stack slot.
- catch_end y: Pops a 'catch' context. Erases the label saved in the Arg0 slot. Noval in R0indicates that something is caught. If R1 contains atom 'throw' then R0 is set to R2. If R1 contains atom 'error' R0 is set to {'EXIT',{R2,<stack trace>}}.
- try_end y: Pops a 'catch' context. Erases the label saved in the Arg0 slot. Noval in 0 indicate that something is caught. If so, R0 is set to R1, R1 – to R2, R2 – to R3.
- try_case_end s: Raises a try_clause exception with the value read from Arg0.
- case_end rxy: Raises a case_clause exception with the value of Arg0.
- badmatch rxy: Raises a badmatch exception with the value of Arg0.
- if_end: Raises a if_clause exception.
- raise s s: Raises the exception. The instruction is garbled by backward compatibility. Arg0 is astack trace and Arg1 is the value accompanying the exception. The reason of the raised exception is dug up from the stack trace.
- badarg j: If Arg0 is not 0 then jumps to the label Arg0. Otherwis, raises the badarg exception.
- system_limit j: Similar to badarg j but raises system_limit eption.
Composite data manipulation:
- get_list rxy rxy rxy: Gets the head of the list from Arg0 and puts it to Arg1, the tail – to Arg2.
- set_tuple_element s d P: Sets the Arg2'th element of the tuple Arg1 to to the value of Arg0. The index in Arg2 is 1-ased.
- i_get_tuple_element rxy P rxy: Gets the value of the the Arg1'th element of the tuple Arg0 and puts it to Arg2. Arg1 is 1-based.
- i_put_tuple rxy I: Creates a tuple on the heap and places it in Arg0. Arg1 is the arity of the tuple. The elements of te tuple follow the instruction. Some of the elements may refer to registers or stack slots.
- put_list s s d: Create a cons cell on the heap and places in onto Arg2. Arg0 is the hed of the list saved to HTOP[0] and Arg1 is the tail saved to HTOP[1].
- test_arity f rxy A: Jumps to the label Arg0 is the arity of the tuple Arg1 is not Arg2. The value of Arg1 s saved to tmpA.
- extract_next_element xy: The pointer in tmpA moved to the next element of the tuple. The element is read fro the location pointed to by tmpA to Arg0.
Types:
- is_number f rxy: Jumps to Arg0 if Arg1 is not integer and is not float.
- is_tuple_of_arity f rxy A: Jumps to the label Arg0 if Arg1 is not a tuple or its arity is not Arg2. If Arg1 is tuple its valu is saved to tmpA.
- is_tuple f rxy: Jumps to the label Arg0 if Arg1 is not a tuple.
- is_integer f rxy: Jumps to the label Arg0 if Arg1 is not integer.
- is_list f rxy: Jumps to the label Arg0 if Arg1 is not a list.
- is_nonempty_list f rxy: Jumps to the label Arg0 if Arg1 is not an nonempty list.
- is_atom f rxy: Jumps to the label Arg0 if Arg1 is not an atom.
- is_float f rxy: Jumps to the label Arg0 if Arg1 is not a float.
- is_nil f rxy: Jumps to the label Arg0 if Arg1 is not nil.
- is_bitstring f rxy: Jumps to the label Arg0 if Arg1 is not a bitstring (or a binary).
- is_reference f rxy: Jumps to the label Arg0 if Arg1 is not a reference.
- is_pid f rxy: Jumps to the label Arg0 if Arg1 is not a pid.
- is_port f rxy: Jumps to the label Arg0 if Arg1 is not a port.
- is_boolean f rxy: Jumps to the label Arg0 if Arg1 is not true or false.
- is_function2 f s s: Jumps to the label Arg0 if Arg1 is not fun or have an arity different from Arg1.
- is_function f rxy: Jumps to the label Arg0 if Arg1 is not a fun.
Message queues:
- recv_mark f: Saves the last pointer of the message queue and the label of the soon exected loop_rec f r.
- i_recv_set f: Sets the SAVE pointer to the saved last pointer if the label is right. This is someow needed for in-order processing of monitoring messages.
- remove_message: Removes the (matched) message from the message queue.
- timeout: Timeout has occured. Resets the SAVE pointer to the first mssage in the message queue.
- timeout_locked: Same as timeout but with SMP precautions.
- i_loop_rec f r: Picks the next message from the message queue and places it in R0. If there are n messages then jumps to Arg0 which points to wait or wait_timeout instruction.
- loop_rec_end f: Advances the SAVE pointer to the next message as the current message did not match.Jumps to Arg0 that points to loop_rec f r instruction.
- send: Sends the message from R1 to the process R0.
Waiting:
- wait f: Prepares to wait indefinitely for a message to arrive. When te message arrives transfers control to the loop_rec instruction at the label Arg0.
- wait_locked f: Same as wait f.
- wait_unlocked f: Same as wait f but messes with SMP locks.
- i_wait_timeout f I/s: Sets the timer to the appropriate time as indicated by Arg1. Afterwards acts as wait f.
- i_wait_timeout_locked f I/s: Same as i_wait_timeout f I but with SMP trickery.
- i_wait_error: Raises a timeout_value exception.
- i_wait_error_locked: Same as i_wait_error but with due respect to SMP.
- i_hibernate: Puts the process into a hibernated state dropping its stack and minimizing ts heap. When a message arrives or if the message queue is not empty the process is waken up and starts executing a function identified by R0 (module), R1 (function), and R2 (arguments).
Comparisons:
- i_is_eq_exact_immed f rxy c: Checks if Arg1 equals Arg2 using bitwise comparison. Jumps to Arg0 if it does not.
- i_is_ne_exact_immed f rxy c: Checks if Arg1 equals Arg2 using bitwise comparison. Jumps to Arg0 if it does.
- i_is_eq_exact_literal f rxy c: Checks if Arg1 equals Arg2 using the term comparison. Jumps to Arg0 if it does not.
- i_is_ne_exact_literal f rxy c: Checks if Arg1 equals Arg2 using the term comparison. Jumps to Arg0 if it does.
- i_is_eq_exact f: Checks if tmpA 'exactly' equals tmpB. Jumps to the label Arg0 if it does not.
- i_is_ne_exact f: Checks if tmpA 'exactly' equals tmpB. Jumps to the label Arg0 if it does.
- i_is_lt f: Checks if tmpA is less than tmpB. Jumps to the label Arg0 if it is not.
- i_is_ge f: Checks if tmpA is greater than or equal to tmpB. Jumps to the label Arg0 if it is not.
- i_is_eq f: Checks if tmpA is equal to tmpB. Jumps to the label Arg0 if it is not.
- i_is_ne f: Checks if tmpA is equal to tmpB. Jumps to the label Arg0 if it is.
Moves and Loads and stores:
- i_fetch s s: Fetches values from Arg0 and Arg1 and places them in tmpA and tmpB respectively.
- fmove qdl ld: Copies Arg0 to Arg1.
- move_x1 c: Sets R1 to the value of Arg0.
- move_x2 c: Sets R2 to the value of Arg0.
- move rxync rxy: Copies Arg0 to Arg1.
Functions:
- i_apply: Applies the function identified by R0 (module), R1 (function), and R2 arguments). Sets CP to the point after the instruction.
- i_apply_last P: Applies the function identified by R0 (module), R1 (function), and R2 (argumets). Afterwards, restores CP from stack and deallocates Arg0 slots from stack.
- i_apply_only: Applies the function identified by R0 (module), R1 (function), and R2 (arguments).
- apply I: Finds the address from the export entry for the function (possibly BIF) identifed by the module R(N), the function R(N+1), and the arity N. The arity is taken from Arg0. Sets CP to the next instruction and jumps to the found address.
- apply_last I P: Similar to apply I but restores CP from stack and drops Arg1 slots from it beforejumping to the found address.
- i_apply_fun: Applies the fun R0 to the argument list in R1. Set CP to the location right afterthe instruction.
- i_apply_fun_last P: Applies the fun R0 to the argument list in R1. Restores CP from stack and deallocates Ar0 slots from stack.
- i_apply_fun_only: Applies the fun R0 to the argument list in R1.
- i_call f: Sets CP to the location after the instruction and then jumps to the label Arg0.
- i_call_last f P: Restores CP from the stack, drops Arg1 slots from stack, and jumps to the label Arg0.
- i_call_only f: Jumps to Arg0.
- i_call_ext e: Sets CP to the point after the instruction and then jumps to the address of te export entry Arg0.
- i_call_ext_last e P: Restores CP from the stack, drops Arg1 slots from the stack, and then jumps to te address of the export entry Arg0.
- i_call_ext_only e: Jumps to the address of the export entry Arg0.
- i_call_fun I: Calls the fun with the arity Arg0. The fun is found in the arity'th registers. Thus the funcall arguments are in registers starting with R0 and the fun itself immediately follows. The result is returned in R0. Sets CP to the point after the instruction and jumps to the fun's entry.
- i_call_fun_last I P: Same as i_call_fun I but restores CP from stack and drops Arg1 slots from it before proceding to the fun's entry.
- i_make_fun I t: Creates a fun and puts it to R0. Arg0 contains a reference to the entry of the lamdba table of the module. Arg1 is the number of free variables. The values of free variables are taken from registers starting with R0. Only these registers are considered live. It means that Arg1 is Live also.
BIFs:
- call_bif0 e: Calls the BIF from Arg0 without arguments and returns result in R0.
- call_bif1 e: Calls the BIF from Arg0 with R0 as the argument and returns result in R0.
- call_bif2 e: Calls the BIF from Arg0 with arguments in R0 and R1 and returns result in R0.
- call_bif3 e: Calls the BIF from Arg0 with arguments in R0, R1, and R2 and returns result in R0.
- bif1 f b s d: Calls the BIF from Arg1 with the argument Arg2 and places the result to Arg3. Upon error jmps to Arg0 as appropriate for the guard BIF.
- bif1_body b s d: Calls the BIF from Arg0 with the argument Argr1 and places the result to Arg3. This is the cas of a guard BIF in the function body. It fails like an ordinary BIF.
- i_bif2 f b d: Calls the guard BIF from Arg1 with the two arguments in tmpA and tmpB and places the result t Arg2. Upon error jumps to Arg0 as appropriate for a guard BIF.
- i_bif2_body b d: Calls the guard BIF from Arg0 with the two arguments in tmpA and tmpB and places the resut to Arg1. This is the case of a guard BIF in the function body. It fails like an ordinary BIF.
- i_gc_bif1 j I s I d: Executes a guard BIF at the address Arg1 with a single parameter Arg2. Arg3 is Live. Stores the result in Arg4. Jumps to Arg0 on error if it is not zero. Otherwise, raises an exception.
- i_gc_bif2 j I I d: Executes a guard BIF at the address Arg1 with two parameters – tmpA and tmpB. Arg2 is Live. Stores the result in Arg3. Jumps to Arg0 on error if it is not zero. Otherwise, raises an exception.
- i_gc_bif3 j I s I d: Executes a guard BIF at the address Arg1 with three parameters – Arg2, tmpA, and tmpB. Arg3 is Live. Stores the result in Arg4. Jumps to Arg0 on error if it is not zero. Otherwise, raises an exception.
- i_get s d: Reads the process dictionary key Arg0 and places the result to Arg1.
- i_fast_element rxy j I d: Gets the Arg2'th element of the tuple Arg0 and places it to Arg3. Arg1 is effectively disregarded.
- i_element rxy j s d: Gets the Arg2'th element of the tuple Arg0 and places it to Arg3. Arg1 is effectively disregarded.
Processes, meta:
- self rxy: Put the identifier of the current process to Arg0.
- node rxy: Put the node identifier of the virtual machine to Arg0.
Matching:
- i_bs_start_match2 rxy f I I d: Checks that Arg0 is the matching context with enough slots for saved offsets. The number of slots needed is given by Arg3. If there not enough slots of if Arg0 is a regular binary then recreates the matching context and saves it to Arg4. If something does not work jumps to Arg1. Arg2 is Live.
- i_bs_save2 rx I: Saves the offset from the matching context Arg0 to the saved offsets array at the index Arg1.
- i_bs_restore2 rx I: Does the opposite of i_bs_save2 rx I. Restores the offset of the matching context Arg0 from the saved offsets array at index Arg1.
- i_bs_match_string rx f I I: Compares Arg2 bits from the matching context Arg2 with the data referenced by the raw data pointer Arg3. Jumps to the label Arg1 if there is no match.
- i_bs_get_integer_small_imm rx I f I d: Gets a (small) integer of size (in bits) Arg1 from the matching context Arg0 and puts it to Arg4. Arg3 are flags. Jumps to the label Arg2 on failure.
- i_bs_get_integer_imm rx I I f I d: Gets an integer of size (in bits) Arg1 from the matching context Arg0 and puts it to Arg5. Arg4 are flags. Arg2 is Live. Jumps to the label Arg3 on failure.
- i_bs_get_integer f I I d: Gets an integer of size tmpB (in units) from the matching context tmpA and saves it to Arg3. Arg1 is Live. Arg2 are flags and unit. Jumps to Arg0 on failure.
- i_bs_get_integer_8 rx f d: Gets an 8-bit integer from the matching context Arg0 and saves it to Arg2. Jumps to Arg1 if there are not enough bits left.
- i_bs_get_integer_16 rx f d: Gets an 16-bit integer from the matching context Arg0 and saves it to Arg2. Jumps to Arg1 if there are not enough bits left.
- i_bs_get_integer_32 rx f I d: Gets an 32-bit integer from the matching context Arg0 and saves it to Arg3. Jumps to Arg1 if there are not enough bits left. Arg2 is Live as the integer may not be small.
- i_bs_get_binary_imm2 f rx I I I d: Gets a binary from the matching context Arg1 of size (in bits) Arg3. Arg2 is Live. Arg4 are flags. The binary is put to Arg5. Jumps yo Arg0 on failure.
- i_bs_get_binary2 f rx I s I d: Gets a binary from the matching context Arg1 of size (in units) Arg3. Arg2 is Live. Arg4 are flags and unit. Puts result to Arg5. Jumps to Arg0 on failure.
- i_bs_get_binary_all2 f rx I I d: Gets the rest of the matching context Arg1 as a binary and puts it to Arg4. The binary size should be divisable by unit Arg3. Arg2 is Live. Jumps to Arg0 upon failure.
- i_bs_get_binary_all_reuse rx f I: Replaces the matching context Arg0 with the rest of its reminder represented as a binary. Arg2 is the unit. The size of the binary should be divisible by unit. Jumps to the label Arg1 on failure.
- i_bs_get_float2 f rx I s I d: Gets the float value form the matching context Arg1. Arg3 is the size in units. Arg4 are unit and flags. Puts the value to Arg5. Arg2 is Live. Jumps to Arg0 on failure.
- i_bs_skip_bits2_imm2 f rx I: Move the offset of the matching context Arg1 by Arg2 bits. Jumps to Arg0 on failure.
- i_bs_skip_bits2 f rx rxy I: Skips2 Arg2 units of the matching context Arg1. Arg3 is the unit size. Jumps to Arg0 on failure.
- i_bs_skip_bits_all2 f rx I: Skips all remaining bits of the matching context Arg1. The number of bits skipped should be divisible by unit Arg2. Jumps to the label Arg0 on failure.
- bs_test_zero_tail2 f rx: Jumps to the label in Arg0 if the matching context Arg1 still have unmatched bits.
- bs_test_tail_imm2 f rx I: Checks that the matching context Arg1 has exactly Arg2 unmatched bits. Jumps to the label Arg0 if it is not so.
- bs_test_unit f rx I: Checks that the size of the remainder of the matching context Arg1 is divisible by unit Arg2. Jumps to the label Arg0 if it is not so.
- bs_test_unit8 f rx: Checks that the size of the remainder of the matching context Arg1 is divisible by 8. Jumps to the label Arg1 if it is not.
- bs_context_to_binary rxy: Converts the matching context to a (sub)binary using almost the same code as i_bs_get_binary_all_reuse rx f I.
- i_bs_get_utf8 rx f d: Extracts a UTF-8 encoded character from the matching context Arg0 and places it to Arg2. Jumps to Arg1 upon failure.
- i_bs_get_utf16 rx f I d: Extracts a UTF-16 encoded character from the matching context Arg0 using flags Arg2 and places it to Arg3. Jumps to Arg1 upon failure.
- i_bs_validate_unicode_retract j: Verifies that a matched out integer tmpA falls into the valid range for Unicode characters. If not the matching context tmpB is rectracted by 32 bits. Arg0 looks like a failure label but it is never used.
Binary data:
- i_bs_init_fail rxy j I d: The same as i_bs_init I I d but may fail if the size in Arg0 is invalid. Arg1 is not used.
- i_bs_init_fail_heap I j I d: Follows the i_fetch d d that loads binary data size into tmpA. Allocates space for a shared binary data of tmpA size. Creates a ProcBin? referencing the data. Makes sure that heap has enough space for a subbinary and Arg0 more words. Arg2 is Live. Puts the ProcBin? to Arg3.
- i_bs_init I I d: Allocates space for a shared binary data of Arg0 size. Creates a ProcBin? referencing the data. Makes sure that the heap has enough space for a subbinary. Arg1 is Live. Puts the ProcBin? into Arg2.
- i_bs_init_heap I I I d: Allocates space for a shared binary data of Arg0 size. Create a ProcBin? referencing the data. Makes sure thatthe heap has enough space for a subbinary and Arg1 more words.Arg2 is Live. Puts the ProcBin? into Arg3.
- i_bs_init_heap_bin I I d: Allocates a heap binary of size Arg0 and makes sure that the heap has enough space for a subbinary. Arg1 is Live. The heap binary is put to Arg2.
- i_bs_init_heap_bin_heap I I I d: Allocates a heap binary of size Arg0 and makes sure that the heap has enough space for a subbinary and Arg1 more words. Arg2 is Live. The heap binary is put to Arg3.
- i_bs_init_bits_fail rxy j I d: Same as i_bs_init_bits_heap I I I d but the number of extra words allocated on the heap is zero and may fail if the number of bits in Arg0 is not valid. Arg1 is not used.
- i_bs_init_bits_fail_heap I j I d: Same as i_bs_init_bits_heap I I I d but the number of bits is fetched to tmpA by previous instruction and may fail if the number of bits in tmpA is not valid. Arg1 is not used.
- i_bs_init_bits I I d: Same as i_bs_init_bits_heap I I I d but the number of extra words allocated on heap is zero.
- i_bs_init_bits_heap I I I d: Allocates a heap binary of size (in bits) Arg0. If the size is not divisible by 8 than allocates a subbinaryreferencing the exact number of bits of the heap binary.Makes sure that heap has Arg1 extra words available. Arg2 is Live. Puts the result (the heap binary or the subbinary) to Arg3.
- i_bs_add j I d: Calculates the total of the number of bits in tmpA and the number of units in tmpB. Arg1 is the unit. Stores the result to Arg2.
- i_bs_init_writable: Allocates a shared data space of size R0. Creates a ProcBin? on the heap referencing the data. Create a subbinary referencing the ProcBin?. Save the subbinary to R0.
- i_bs_append j I I I d: Appends tmpA bits to a binary tmpB. If the binary is a writable subbinary referencing a ProcBin? with enough emptyspace then a new writable subbinary is created and the oldone is made non-writable. In other cases, creates a new shared data, a new ProcBin?, and a new subbinary. For all heap allocation, a space for more Arg1 words are requested. Arg2 is Live. Arg3 is unit. Saves the resultant subbinary to Arg4.
- i_bs_private_append j I d: Same as i_bs_append j I I I d but assumes that the binary writable. Reallocates the shared data if there is not enough space.This should be safe. Changes the subbinary tmpB in place.No heap allocations. Arg1 is the unit. Save the result to Arg2.
- i_new_bs_put_integer j s I s: Assumes that previous instructions created a binary creation context. Packs an integer value Arg3 into Arg1 units of the context. Arg2 are the unit and flags. The failure label Arg0 is unused.
- i_new_bs_put_integer_imm j I I s: Assumes that previous instructions created a binary creation context. Packs an integer value Arg3 into Arg1 bits of the context. Arg2 are flags. The failure label Arg0 is unused.
- i_bs_utf8_size s d: Calculate the number of bytes needed to encode the source operarand to UTF-8. If the source operand is invalid (e.g. wrongtype or range) we return a nonsense integer result (2 or 4). Wecan get away with that because we KNOW that bs_put_utf16 will do full error checking.
- i_bs_utf16_size s d: Calculate the number of bytes needed to encode the source operarand to UTF-16. If the source operand is invalid (e.g. wrongtype or range) we return a nonsense integer result (2 or 4). Wecan get away with that because we KNOW that bs_put_utf16 will do full error checking
- i_bs_put_utf8 j s: Assumes that previous instructions created a binary creation context. Packs a UTF-8 character Arg1 into the context. The failure label Arg0 is unused.
- i_bs_put_utf16 j I s: Assumes that previous instructions created a binary creation context. Packs a UTF-16 character Arg2 into the context. Arg1 are flags. The failure label Arg0 is unused.
- i_bs_validate_unicode j s: Validates a single unicode character Arg1. Raises badarg exception on error. Arg0 is not used. NB: the existence of the instruction may bedependent on the implementation detail of binary construction;see comment in beam_emu.c, line 3963.
- i_new_bs_put_float j s I s: Assumes that previous instructions created a binary creation context. Packs a float of Arg1 units size into the context. Arg2 are the unit and flags. The failure label Arg0 is not used.
- i_new_bs_put_float_imm j I I s: Assumes that previous instructions created a binary creation context. Packs a float Arg3 of Arg1 bits size into the context. Arg2 are flags. The failure label Arg0 is not used.
- i_new_bs_put_binary j s I s: Assumes that previous instructions created a binary creation context. Packs a binary Arg3 of Arg1 units size into the context. Arg2 are flags and the unit. The failure label Arg0 is not used.
- i_new_bs_put_binary_imm j I s: Assumes that previous instructions created a binary creation context. Packs a binary Arg2 of Arg1 bits size into the context. The failure label Arg0 is not used.
- i_new_bs_put_binary_all j s I: Assumes that previous instructions created a binary creation context. Packs a whole binary Arg1 into the context. Arg2 is the unit. The failure label Arg0 is not used.
- bs_put_string I I: Assumes that previous instructions created a binary creation context. Packs a string of characters Arg0 bytes long referenced by the pointer Arg1 into the context.
Arithmetic:
- fconv d l: Converts Arg0 to float and places the result to Arg1.
- i_fadd l l l: Arg2 = Arg0 + Arg1.
- i_fsub l l l: Arg2 = Arg0 - Arg1.
- i_fmul l l l: Arg2 = Arg0 * Arg1.
- i_fdiv l l l: Arg2 = Arg0 / Arg1.
- i_fnegate l l l: Arg1 = -Arg0.
- i_fcheckerror: Checks if the FR0 contains Nan or +-Inf. Raises badarith exception if this is the case.
- fclearerror: Maybe clears FP error flag.
- i_increment rxy I I d: Arg3 = Arg0 + Arg1. Arg2 is Live.
- i_plus j I d: Arg2 = tmpA + tmpB. Arg1 is Live. Jumps to Arg0 on error if Arg0 is not zero.
- i_minus j I d: Arg2 = tmpA - tmpB. Arg1 is Live. Jumps to Arg0 on error if Arg0 is not zero.
- i_times j I d: Arg2 = tmpA * tmpB. Mixed. Arg1 is Live. Jumps to Arg0 on error if Arg0 is not zero.
- i_m_div j I d: Arg2 = tmpA / tmpB. Mixed. Arg1 is Live. Jumps to Arg0 on error if Arg0 is not zero.
- i_int_div j I d: Arg2 = tmpA / tmpB. Division is integer. Arg1 is Live. Jumps to Arg0 on error if Arg0 is not zero.
- i_rem j I d: Arg2 = tmpA % tmpB. Division is integer. Arg1 is Live. Jumps to Arg0 on error if Arg0 is not zero.
- i_bsl j I d: Arg2 = tmpA tmpB. The direction of the shift depends on the sign of tmpB. Arg1 is Live. Jumps to Arg0 on error if Arg0 is not zero.
- i_bsr j I d: Arg2 = tmpA >>/<< tmpB. The direction of the shift depends on the sign of tmpB. Arg1 is Live. Jumps to Arg0 on error if Arg0 is not zero.
- i_band j I d: Arg2 = tmpA & tmpB. Arg1 is Live. Jumps to Arg0 on error if Arg0 is not zero.
- i_bor j I d: Arg2 = tmpA
tmpB. Arg1 is Live. Jumps to Arg0 on error if Arg0 is not zero. |
- i_bxor j I d: Arg2 = tmpA ^ tmpB. Arg1 is Live. Jumps to Arg0 on error if Arg0 is not zero.
- i_int_bnot j s I d: Arg3 = ~ Arg1. Arg2 is Live. Jumps to Arg0 on error if Arg0 is not zero.
Misc:
- int_code_end: Maybe marks the end of the code.
- label L: The label L is set to the current code location.
- line I: Maybe augments debugging output when backtracking.
Combined instructions for efficiency:
- move_jump f ncxy: Move the value from Arg1 to R0 and then jumps to the label Arg0.
- move2 x y x y / move2 y x y x / move2 x x x x: Combine two move operations.
- move_return xcn r: Combines move xcn r and return.
- move_deallocate_return xycn r Q: Combines move xycn r, deallocate Q, and return.
- deallocate_return Q: Combines deallocate Q and return.
- test_heap_1_put_list I y: Test that the heap has a room for Arg0 words and than conses value of Arg1 with R0.
- extract_next_element2 xy: The pointer in tmpA moved to the next element of the tuple. Copies 2 terms from the rray pointed to by tmpA to the registers or slots pointed to by Arg0. For instance, if Arg0 is R3 then tuple elements are loaded to R3 and R4.
- extract_next_element3 xy: Same as extract_next_element2 xy but copies three elements.
- allocate_init t I y: Combines allocate t I and init y.
- i_move_call c r f: Combines move c r and call f.
- i_move_call_last f P c r: Combines move c r and call_last f P.
- i_move_call_only f c r: Combines move c r and call_only f.
- i_move_call_ext c r e: Combines move c r with i_call_ext e.
- i_move_call_ext_last e P c r: Combines move c r with i_call_last e.
- i_move_call_ext_only e c r: Combines move c r with i_call_only e.
- move_call xy r f: Same as i_move_call xy r f.
- move_call_last xy r f Q: Same as i_move_last f Q xy r.
- move_call_only x r f: Same as i_move_only xy r f.
- allocate_zero t t: Same as allocate t t but sets the slots to nil.
- allocate_heap_zero t I t: Same as allocate_heap t I t but sets the slots to nil.
- i_jump_on_val_zero rxy f I: Same as i_jump_on_val rxy f I I but the minimum value is assumed to be zero.
- is_integer_allocate f rx I I: Combines is_integer f rx and allocate I I.
- is_nonempty_list_allocate f rx I t: Combines is_nonempty_list f rx and allocate I t.
- is_nonempty_list_test_heap f r I t: Combines is_nonempty_list f r and test_heap I t.
Core Erlang
BEAM variants
- LING: alternate unikernel implementation of Erlang's BEAM VM. Fits on MIPS PIC32 MCU
- "To shrink the size of the LING virtual machine a few things were left out (e.g. regular expressions and cryptographic functions) which shrank the resulting image size to about 1M. This meant that LING consumed roughly 50% of Program RAM. The Data RAM was used for heaps of Erlang processes." -- [20]
- https://github.com/cloudozer/ling
Mu's SubX
http://akkartik.name/post/mu-2019-1
https://github.com/akkartik/mu/blob/main/subx.md
https://github.com/akkartik/mu/blob/main/subx_bare.md
https://github.com/akkartik/mu/blob/main/subx_opcodes
https://github.com/akkartik/mu/blob/main/vocabulary.md
http://akkartik.name/akkartik-convivial-20200607.pdf
https://github.com/akkartik/mu
https://github.com/akkartik/mu/blob/main/mu.md
http://akkartik.github.io/mu/html/mu_instructions.html
UCSD Pascal p-code
https://en.wikipedia.org/wiki/UCSD_Pascal
https://en.wikipedia.org/wiki/P-code_machine
Stack-based. "Responsible for wide adoption of Pascal in the 1970s." [21]
continued at [22]