xanxys.net / hs2bf

Introduction

hs2bf is a haskell to brainfuck compiler. I believe this is the first attempt to write a compiler that can translate a high-level language such as Haskell to brainfuck. You can download it from hackageDB or github. The github repository contains test codes under test/, which show some of hs2bf's capability.

Quick Overview of (Un)supported Features

Supported Language Features

Supported Runtime System Features

Missing Features

Supported Code

Limitation

no type inference / type annotation: The only primitive type is Byte.

I/O

hs2bf uses special data structure E to represent behavior of an arbitrary brainfuck program.
data E=Input (Char -> E)
      |Output Char E
      |Halt

main :: E

Examples

echo:
main=Input (\x->Output x main)
hello world:
main=outputStr Halt "Hello World!"

outputStr k []=k
outputStr k (x:xs)=Output x (outputStr k xs)
quicksort:
main=outputStr Halt (qsort "etsb")

qsort []=[]
qsort (x:xs)=qsort (filter (gtByte x) xs)++[x]++qsort (filter (leByte x) xs)

outputStr k []=k
outputStr k (x:xs)=Output x (outputStr k xs)

As you can see, hs2bf supports very powerful subset of Haskell98 syntax.

You can find more examples in hs2bf-0.5-test.tar.bz2 (Actually, these files were used for automated regression testing)

Overall structure

(stub)
3 intermidiate languages, namely Core, GM(G-machine) and SAM(sequntial access machine) are used during compilation. And GM and SAM have interpreters for debugging.

Haskell parser

haskell-src library is used to parse Haskell code.

Core language

... is something like this:
outputStr #a0 #a1=
    case ((XT2 #a0) #a1) of
        XT2 #xa #xb ->
            let
                k = #xa
            in
                case #xb of
                    XCons   ->
                        let
                            k = #xa
                        in
                            case #xb of
                                XCons #xa #xb ->
                                    let
                                        x = #xa
                                    in
                                        let
                                            xs = #xb
                                        in
                                            ((Output x) ((outputStr k) xs))
                    XNil ->
                        k

G-machine

reference: The Implementation of Functional Programming Languages
GM code looks like:
XT2:
    PushArg 2
    PushArg 2
    Pack 0 2
    Slide 3
You should read the paper by SPJ if you haven't.

Sequential Access Machine

This is like an assembly language w/o absolute addressing.

Its code consists of procedures which can take fixed number of arguments. Its state consists of named memory regions and fixed number of registers, and a pointer shared among the memory regions.

Operands of SAM instruction can be either:
  1. relative pointer in specified memory region
  2. a register

Registers can be allocated and deallocated dynamically but they can't be passed across scopes(like variables in C). This pseudo "deallocation" is implemented to ease code generation.

S0 H0

pr #heapRef/addr
    val addr -1
    while addr
        val addr -1
        alloc cnt
        copy $H0 cnt
        while cnt
            val cnt -1
            locate 1
        delete cnt

The first line "S0 H0" declares memory regions which are available throughout the code.

"pr #heapRef/addr" defined a procedure "#heapRef" which takes one argument "addr".

"val addr -1" decrements the value of addr modulo 256.

"locate 1" moves the current pointer forward by one, and this will affect memory reference like "$H0".

Runtime System

Memory Layout

Basic Mechanism

Raw brainfuck memory space is Nat->Byte. So you can combine N such memory spaces [Nat->Byte] by interleaving them.

In addition to this, you'll need registers to carry values around since brainfuck doesn't support absolute addressing. If you want M registers, you can create N+M memory spaces with the previous method and move values in M memory spaces everytime you modify the pointer.

Actual Implementation

memory mapping

As of now, hs2bf uses 3 memory spaces S0,H0 and H1. S0 is used for stack of heap node ids, and H0 and H1 are two heaps in a copying garbage collection.

A node is a variable size structure which ids starting from 1. Address 0 of the stack is marked 0, so it is always possible to return to address 0 wherever you are.

heap node structure:
1B: node size
1B: "reachable" flag (for garbage collection, 0 means unreachable from the root, 1 means otherwise)
1B: node type
*B: payload
1B: node id
1B: node size

State Machine

finite state machine
Since you can't "call" a function, the procedure call is represented by one huge "dispatch" sentence which encompasses all procedures.

brainfuck specific techniques

"if" sentence

"if v>0 then XXX" is easily realized by [[-]XXX], assuming v is currently referred by the pointer.

A little bit diffucult problem is "if v==0 then XXX", and this can be done through negation. For example, let a temporary variable u be 1, and execute "if v>0 then u--; if u>0 then XXX".

comparison of numbers

Suppose you have two numbers x,y :: Byte, and you want to compare them.

equality

Decrement both x and y until x becomes 0, and check if y is 0.

oredering

If there's a way to subtract min x y from x,y, then you can tell which one's larger than (or equal to) the other. So the problem of comparing two numbers is reduced to logical operations which can be done by nested if. But there's a more complex and faster way.

expansion of "switch" sentence in SAM

The following code:
dispatch f
    0
        X
    1
        Y
    2
        Z
can be expanded to:
clr 1 t
while f
    val 1 f -1
    dispatch f
        0
            Y
        1
            Z
    clr 1 t
    val 1 t +1
while t
    X
    clr 1 t
and to:
clr 1 t
while f
    val 1 f -1
    clr 1 t
    while f
        val 1 f -1
        Z
        clr 1 t
        val 1 t +1
    while t
        Y
        clr 1 t
    clr 1 t
    val 1 t +1
while t
    X
    clr 1 t
This expansion method can be applied recursively.

addition w/ carry

eg. Addition of x,y::Byte. if x < x+y && y < x+y, then there's no carry.

Other Reference

You can read the comment in the source code, or my blog(in japanese).