Working Development Tree For the Fuzix Compiler Kit

Design

cc0 is a tool that tokenizes a C file and handles all the messy number conversions and string quoting to produce a token stream for a compiler proper to consume. It also extracts all the identifiers and numbers them, before writing them out in a table.

cc1 takes the tokenized stream and generates an output stream that consists of descriptors of program structure (function/do while/statement etc) with expression trees embedded within.

cc2 will then turn this into code.

In theory it ought to also be possible to add a cc1b that further optimizes the trees from cc1.

Status

The compiler is currently used to build the Fuzix OS for 8080, 8085 and Z80 and can cross build itself to run natively on these systems. The core code should be reasonably stable. There is a lot of performance work to do on the compiler itself and there are still a couple of deviations from spec that would be nice to fix. The backends for 8080/5/Z80 should be fairly stable but are being used to experiment with improvements.

The other processor trees are very much a work in progress.

Installation

As a cross compiler the front end expects everything to live in /opt/fcc/. The tool chain provides the compiler front end and phases. For the preprocessor it currently uses the gcc cpp on Linux and DECUS cpp on Fuzix.

Either make the /opt/fcc directory and make it owned by your user or do the install phase with appropriate privileges.

The assembler and loader tools required live in the Fuzix-Bintools repository. To build it all, first clone the Fuzix-Bintools repository and make install. Then make sure /opt/fcc is on your path.

Now clone this repository. In the Fuzix-Compiler-Kit directory do:

make bootstuff
make install

This will build a bootstrap then build the full tools and install them.

Intended C Subset

The goal is to support the following

Types

char, short, int, long, signed and unsigned
float, double
struct, union
enum
typedef

Currently the compiler requires that the target types all fit into the host unsigned long type.

Currently the compiler hardcodes assumptions that a char is 8bits, a short 16bits and a long 32bits (see tree.c:constify and helpers). This needs to be addressed.

Storage classes

auto, static, extern, typedef, register

register is dependent upon the backend.

C Syntax

standard keywords and flow control
labels, and goto
statements and expressions
declarations
ANSI C function declarations

Intentionally Omitted

Things that add size and complexity or are just pointless.

K&R function declarations
Most C95 stuff - wide char, digraph etc
Most C99 bloat by committee
C11 bloat by committee
struct/union passing, struct/union returns and other related badness
bitfields
const and volatile typing. Doing these properly makes type handling really tricky. They are accepted so that code using them can build, and some magic tricks are done to get volatile right.

Known incompatibilities (some to be fixed)

The constant value -32768 does not always get typed correctly. The reason for this is a complicated story about how cc0/cc1 interact.
Many C compilers permit (void) to 'cast' the result of a call away; we do not.
Local variables have a single function wide scope, not a block scope.

Backend Status

1802

An experimental bytecode engine for the 1802. The bytecode side of the generation appears to be functional (except for floats) and the bytecode simulation passes the basic tests. The next steps are a bytecode format assembler for user bytecode pieces, and to start to build and debug the actual 1802 interpreter. It should also be a good basis for any other CPU needing this sort of treatment.

6303/6803/68HC11

This is an early sketch only based upon the CC6303 code generation and support code.

6502

Early development code for a 6502/65C02 backend. Before this can be effective there will need to be some work on rewriting subtrees to use byte operations when possible.

65C816

An initial 65C816 native port that passes the test suite but probably has some bugs left to find. As this port is designed for Fuzix and to run in any bank, it uses Y as the C stack pointer and uses the CPU stack for temporary values during expression evaluation and for all the actual call/return addresses. Split code/data is supported but not multiple data or code banks in one application (that is, pointers are 16bit). Going beyond that gets very ugly very fast, as on the 8086.

8080/8085

The compiler generates reasonable 8080 code and knows how to use call stubs for argument fetching/storing to get compact code at a performance cost if requested. On the 8085 extensive use is made of LDSI, LHLX and SHLX to get good compact code generation.

Long maths is quite slow but is not trivial to optimize, particularly on the 8080 processor. There is also no option to use RST calls for the most common bits of code for compactness (quite possibly worth 1Kb or more for some stuff). The code generator does not know the fancy tricks for turning constant divides into shift/multiply sets.
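
As an illustration of the kind of trick meant here (it is not something the code generator does today), an unsigned 16bit divide by 10 can be replaced by a multiply and a shift. The function name div10_u16 is made up for this example, and whether it is a win depends on how expensive a multiply is on the target.

```c
#include <stdint.h>

/* x / 10 for any 16bit unsigned x, using a 32bit multiply and a shift.
   0xCCCD is 2^19 / 10 rounded up. */
uint16_t div10_u16(uint16_t x)
{
    return (uint16_t)(((uint32_t)x * 0xCCCDu) >> 19);
}
```

The constant and shift are specific to dividing a 16bit value by 10; other divisors and widths need their own pair.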

The BC register is used as a register variable for either byte or word constants, or a byte pointer. As there is no word sized load/store via BC or easy way to do it the BC register pair is not used for other pointer sizes.

Signed comparison and sign extension are significantly slower than unsigned. This is an instruction set limitation.
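
One common way to build a signed comparison out of an unsigned one is to flip the sign bit of both operands first. It is shown here only to illustrate where the extra work comes from; this is not a statement of what the backends actually emit, and signed_less_u16 is a hypothetical name.

```c
#include <stdint.h>

/* Signed 16bit a < b using only an unsigned comparison.
   a and b hold the two's complement bit patterns. */
int signed_less_u16(uint16_t a, uint16_t b)
{
    return (uint16_t)(a ^ 0x8000u) < (uint16_t)(b ^ 0x8000u);
}
```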

Z8

This port now passes all of the self tests and the code coverage compile tests. It has not yet been used except on test sets so probably contains a few bugs. Split I/D is supported.

Z80 / Z180

The Z80 code generator will generate reasonable Z80 code. The processor itself is difficult to use for C as fetching objects from the stack is slow as on the 8080. The compiler will use BC, IX and IY for register variables and knows how to use offsets from IX or IY when working with structs.

If IX or IY is free it will be used as a frame pointer. If not, the compiler assumes the programmer knows what they are doing and will assign them as register variables whilst using helpers for the locals.

The Z180 is not yet differentiated. This will only matter for the support library code and maybe inlining a few specific multiplication cases.

ThreadCode

An initial backend that turns the C input into a series of helper references and data. This can easily be tweaked to make them calls, and peephole rules can be used to clean up or re-arrange them a bit to suit any need, or to turn the output into bytecode etc.

Default

This is a simple test backend that just turns the input into a lot of calls. It is intended as a reference only, although it may be useful for processors that require a threadcode implementation or to build an interpreted backend.

Internals

cc0

Takes input from stdin and outputs tokens to stdout. The core of the logic is pretty basic, the only oddity is using strchr() in a few places because it's often hand optimized assembler. Tokens are 16bits. C has some specific rules on tokenizing which make it simple at the cost of producing unexpected results from stuff like x+++++y; (x++ ++ +y).
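
The x+++++y result follows from the C rule that the tokenizer always takes the longest token it can. A minimal sketch of that rule, handling only "++", "+" and single characters; next_token and split_tokens are hypothetical helpers and not the cc0 code.

```c
#include <string.h>

/* Copy the next token starting at p into out (at least 3 bytes) and
   return a pointer just past it. Must not be called at end of string. */
const char *next_token(const char *p, char *out)
{
    if (p[0] == '+' && p[1] == '+') {
        /* Longest match wins: "++" is preferred over "+" */
        strcpy(out, "++");
        return p + 2;
    }
    out[0] = *p;
    out[1] = '\0';
    return p + 1;
}

/* Split src into tokens and write them to dst separated by spaces.
   dst must be big enough for twice the length of src. */
void split_tokens(const char *src, char *dst)
{
    char tok[3];
    dst[0] = '\0';
    while (*src) {
        src = next_token(src, tok);
        if (dst[0])
            strcat(dst, " ");
        strcat(dst, tok);
    }
}
```

Calling split_tokens on "x+++++y" gives "x ++ ++ + y", which is not a valid expression even though x++ + ++y would have been.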

All names are translated into a 16bit token number. So for example every occurrence of "fred" might be 0x8004. The cc0 stage has no understanding of C scoping so 0x8004 isn't tied to any kind of scope, merely a group of letters.
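
A minimal sketch of this kind of name numbering, assuming a simple linear table. The names intern_name and NAME_BASE and the table limits are invented for the example, and the base of 0x8000 is chosen only to line up with the 0x8004 example above.

```c
#include <string.h>

#define NAME_BASE 0x8000u   /* assumed first token number used for names */
#define MAX_NAMES 64        /* illustrative limits only */
#define MAX_LEN   32

static char names[MAX_NAMES][MAX_LEN];
static unsigned num_names;

/* Return the token number for a name, adding it to the table if it has
   not been seen before. Returns 0 if the table is full or the name is
   too long. No notion of scope is involved. */
unsigned intern_name(const char *s)
{
    unsigned i;
    if (strlen(s) >= MAX_LEN)
        return 0;
    for (i = 0; i < num_names; i++)
        if (strcmp(names[i], s) == 0)
            return NAME_BASE + i;
    if (num_names == MAX_NAMES)
        return 0;
    strcpy(names[num_names], s);
    return NAME_BASE + num_names++;
}
```

Every later occurrence of the same spelling gets the same number back, so the following phase can compare names by comparing 16bit values.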

After tokenizing it writes the symbol table out to disk as well. It turns out that the compiler phase has no use at all for symbol names and they take a lot of space to store and slow down comparisons.

cc1

This is essentially a hand coded recursive descent parser. Higher level constructs are described by headers and footers. Within these blocks the compiler stores expression trees per statement. Trees do not span statements nor does the compiler do anything at a higher level. There is enough information to turn functions or even entire programs into a single tree if the code generator or an optimizer pass wished.

The biggest challenge on a small machine is the memory management. To keep things tight types are packed into 16bits. Where the type is complex it contains an index to an object in the symbol table which describes the type in question (and if the type is named also has the type naming attached).
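
A sketch of what packing a type into 16 bits can look like. The bit layout and all the names below are invented for illustration and are not the layout cc1 uses.

```c
#include <stdint.h>

/* Hypothetical layout:
   bits 0-2   pointer depth
   bit  3     unsigned flag
   bits 4-15  basic type code, or an index into the symbol table */
#define PTR_MASK    0x0007u
#define UNSIGNED    0x0008u
#define INFO_SHIFT  4

typedef uint16_t ctype;

ctype make_type(unsigned info, int is_unsigned, unsigned ptr_depth)
{
    return (ctype)((info << INFO_SHIFT) |
                   (is_unsigned ? UNSIGNED : 0u) |
                   (ptr_depth & PTR_MASK));
}

unsigned type_ptr_depth(ctype t) { return t & PTR_MASK; }
unsigned type_info(ctype t)      { return t >> INFO_SHIFT; }

/* Taking the address adds one level of pointer. The caller must check
   the depth is below the maximum first. */
ctype type_addrof(ctype t)       { return (ctype)(t + 1); }

/* Dereferencing removes one level. The caller must check the depth is
   non-zero first. */
ctype type_deref(ctype t)        { return (ctype)(t - 1); }
```

With a layout like this, taking the address of an object or dereferencing a pointer is a single add or subtract on the type value, which matters on an 8bit host.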

Various per object fields are packed into runs of 16bit values, such as struct field information and array sizes.

To maximise memory efficiency without losing the checking the compiler packs all functions with the same signature into the same type. As most functions actually have one of a very small number of prototypes this saves a lot of room.
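
The sharing of signatures can be sketched as a lookup that only adds a new entry when no existing one matches. This assumes a signature is stored as a run of 16bit type values (return type first); the structure, names and limits are hypothetical and much simpler than the real symbol table.

```c
#include <stdint.h>
#include <string.h>

#define MAX_SIGS 16
#define MAX_ARGS 8

struct sig {
    uint16_t n;                  /* number of entries used in t[] */
    uint16_t t[MAX_ARGS + 1];    /* return type then argument types */
};

static struct sig sigs[MAX_SIGS];
static unsigned num_sigs;

/* Return the index of a matching signature, adding one if needed.
   Returns -1 if the table is full or there are too many entries. */
int find_signature(const uint16_t *types, unsigned n)
{
    unsigned i;
    if (n > MAX_ARGS + 1)
        return -1;
    for (i = 0; i < num_sigs; i++)
        if (sigs[i].n == n &&
            memcmp(sigs[i].t, types, n * sizeof(uint16_t)) == 0)
            return (int)i;
    if (num_sigs == MAX_SIGS)
        return -1;
    sigs[num_sigs].n = (uint16_t)n;
    memcpy(sigs[num_sigs].t, types, n * sizeof(uint16_t));
    return (int)num_sigs++;
}
```

Two functions declared with the same return and argument types get the same index back, so only one copy of the signature is stored.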

cc2

For now just testing a very simple left hand walking code generator with minimal awareness of consts and names that can be directly accessed. This should suit simpler processors like the 6502, 680x, 8080, 8085 etc but isn't a good model for register oriented ones.
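
A rough sketch of the general idea of a generator that walks the left side first, using a single working register and a stack. The node layout and the load/push/pop_op operations are invented for the example and are not the cc2 tree format or its helper names.

```c
#include <stdio.h>
#include <string.h>

struct node {
    char op;                 /* '+', '-', '*' or 0 for a constant leaf */
    int value;               /* used when op is 0 */
    struct node *left, *right;
};

/* Append the code for n to out, which the caller must make large
   enough. The result of every subtree ends up in the one working
   register; the left side is pushed whilst the right is evaluated. */
void gen(const struct node *n, char *out)
{
    char buf[32];
    if (n->op == 0) {
        sprintf(buf, "load %d\n", n->value);
        strcat(out, buf);
        return;
    }
    gen(n->left, out);
    strcat(out, "push\n");
    gen(n->right, out);
    /* Combine the pushed left value with the working register */
    sprintf(buf, "pop_op %c\n", n->op);
    strcat(out, buf);
}
```

For (1 + 2) * 3 this emits load 1, push, load 2, pop_op +, push, load 3, pop_op *. Nothing here tries to keep values in registers across operations, which is why the approach fits accumulator style processors better than register oriented ones.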

On the other hand it's ludicrously easy to change it to produce fairly bad code for any processor you want.

Credits

The expression parser was created by turning the public domain SmallC 3.0 one into a more traditional tree building recursive parser and testing it in SmallC. The rest of the code is original although the design is influenced by several small C subset compilers and also ANSI pcc.

Licence

Compiler (not any runtime) : GPLv3

copt is from Z88DK. Z88DK is under the Clarified Artistic License
