Skip to content
Phillip Stevens edited this page Feb 3, 2023 · 33 revisions

Benchmark Strategy

In all cases a simple compile is done to verify the programs generate correct results. Then the programs are compiled for a minimal target +test that eliminates as much unnecessary code as possible; this includes elimination of stdio and as many device drivers as possible. Total program size is recorded (this includes the CODE, DATA and BSS sections but does not include the stack) and the execution time is measured by ticks. z88dk-ticks is a command line z80 emulator that comes with z88dk and can measure execution time of program fragments exactly.

Compilers

To put the current compilers and libraries for z88dk into context, the benchmarks are conducted against common alternatives.

HITECH-C CPM v3.09

Hitech-C (CP/M-80) v 3.09 Hitech's last CPM C compiler. One of the most capable native C compilers for CP/M. Runs under CP/M 2.2 and implements a large subset of C89. This compiler represents the best z80 native code generator. Hitech made this available for free many years ago.

Efforts to maintain the v3.09 compiler are being made. Some benchmarks were run using Release 15, containing many patches including back ported from v7.80pl2 amongst many others.

HITECH-C MSDOS v7.80 Patch Level 2

The last z80 compiler from Hi-Tech, cross compiles z80 and z180 code from MSDOS (and Linux). Supports banking into the full z180 address space. Seems to be near complete compliance with C89. Has both an integrated development environment HPDZ and command line ZC options.
Compiler kindly provided by @artrag.

IAR Z80 V4.06A

IAR's last z80 compiler running under windows. Although it's not currently listed on their webpage, they are willing to sell it to anyone who has the cash.

SDCC

sdcc 4.2.0 r13131 (Linux) sdcc is a current open source C cross compiler targeting several small CPUs including the z80. Its primary feature is that it supports a large subset of modern C standards (C89, C99, C11, C23).
In these tests the new register based calling convention __sdcccall(1) is enabled.

Z88DK/SCCZ80_CLASSIC

(Nightly build 28 April 2021) z88dk's native C compiler sccz80 using the classic C library in z88dk. sccz80 is a derivative of small C with most small C limitations eliminated. Its primary feature is a comprehensive (classic) C library written in assembly language.

Z88DK/SDCC_CLASSIC

(Nightly build 10 March 2022) sdcc 4.2.0 r13131 is used to translate C code with z88dk supplying its (classic) C library and startup code for targets.
In these tests the traditional calling conventions __sdcccall(0), with __z88dk_callee and __z88dk_fastcall, are enabled.

Z88DK/SCCZ80_NEW

(Nightly build 28 April 2021) z88dk's native C compiler sccz80 using the new C library in z88dk. sccz80 is a derivative of small C with most small C limitations eliminated. Its primary feature is a comprehensive (new) C library written in assembly language.

Z88DK/SDCC_NEW

(Nightly build 10 March 2022) sdcc 4.2.0 r13131 is used to translate C code with z88dk supplying its (new) C library and startup code for targets.
In these tests the traditional calling conventions __sdcccall(0), with __z88dk_callee and __z88dk_fastcall, are enabled.

Binary-Trees

The purpose of this benchmark is to verify that malloc/free function trouble free and to measure the speed of malloc/free with allocations done in the context of constructing binary trees.

The work is to create binary trees - composed only of tree nodes all the way down-to depth 0, before any of those nodes are GC'd - using at-minimum the number of allocations of Jeremy Zerfas's C program. Don't optimize away the work.

SIZE Z80 Cycles Wall Clock @4Mhz
Hitech-C CPM v3.09 5821 298,416,076 74.60 sec
Hitech-C Z80 v7.80 4247 240,336,355 60.09 sec
IAR Z80 V4.06A 4525 7,358,336,547 30 min 40 sec
SDCC 7158 188,126,191 47.03 sec
Z88DK/SCCZ80_CLASSIC 3800 145,563,150 36.39 sec
Z88DK/SCCZ80_NEW 2711 6,582,763,903 27 min 25 sec
Z88DK/SDCC_CLASSIC 3536 150,118,736 37.52 sec
Z88DK/SDCC_NEW 2689 6,576,349,618 27 min 24 sec

Notes:

  • NEW library Issue #113 Library optimization for fast realloc causes slow free block search when a thousand blocks are allocated in this benchmark.
  • IAR is likely implementing a heap similar to z88dk's new c library where an emphasis is placed on the speed of realloc().

Dhrystone 2.1

Dhrystone was a common synthetic benchmark for measuring the integer performance of compilers in the 1980s until more modern benchmarks replaced it. It attempts to simulate typical programs by executing a set of statements statistically determined from common programs in the wild.

The benchmark package is available for download.

SIZE Z80 Cycles Wall Clock @4Mhz DHRYSTONES/S DMIPS
Hitech-C CPM v3.09 8988 356,235,065 89.06 sec 224.57 0.1278
Hitech-C Z80 v7.80 7002 280,100,135 70.02 sec 285.61 0.1625
IAR Z80 V4.06A 7371 306,860,580 76.72 sec 260.70 0.1484
SDCC 6825 225,522,684 56.38 sec 354.73 0.2019
Z88DK/SDCC_CLASSIC 7882 251,880,052 62.97 sec 317.61 0.1808
Z88DK/SDCC_NEW 7072 254,720,052 63.68 sec 314.07 0.1787

Notes:

  • Hitech-C Z80 v7.80 must be compiled with global optimizer set to two; higher causes the program to hang.
  • Dhrystone 2.1 is deprecated because optimizing compilers can eliminate redundant statements that were intended to add to execution time. However many z80-era compilers ran this benchmark so it is also available in the z88dk repository.

Fannkuch

The fannkuch benchmark is defined by programs in Performing Lisp Analysis of the FANNKUCH Benchmark, Kenneth R. Anderson and Duane Rettig. FANNKUCH is an abbreviation for the German word Pfannkuchen, or pancakes, in analogy to flipping pancakes. The conjecture is that the maximum count is approximated by n*log(n) when n goes to infinity.

SIZE Z80 Cycles Wall Clock @4Mhz
Hitech-C CPM v3.09 2771 56,614,856 14.15 sec
Hitech-C Z80 v7.80 868 51,982,515 12.99 sec
IAR Z80 V4.06A 1347 56,708,022 14.18 sec
SDCC 962 57,325,388 14.33 sec
Z88DK/SCCZ80_CLASSIC 1763 75,381,296 18.84 sec
Z88DK/SCCZ80_NEW 957 77,386,481 19.35 sec
Z88DK/SDCC_CLASSIC 1304 59,756,269 14.94 sec
Z88DK/SDCC_NEW 1070 56,090,095 14.02 sec

Fasta

The program should:

  • generate DNA sequences, by copying from a given sequence.
  • generate DNA sequences, by weighted random selection from 2 alphabets.
  • convert the expected probability of selecting each nucleotide into cumulative probabilities.
  • match a random number against those cumulative probabilities to select each nucleotide (use linear search or binary search).
  • use this naïve linear congruential generator to calculate a random number each time a nucleotide needs to be selected (don't cache the random number sequence).
SIZE Z80 Cycles Wall Clock @4Mhz
Hitech-C CPM v3.09 5638 189,901,647 47.47 sec
Hitech-C Z80 v7.80 4121 DISQ
IAR Z80 V4.06A 6041 223,805,149 55.95 sec
SDCC 5835 373,202,979 93.30 sec
Z88DK/SCCZ80_CLASSIC 3291 243,021,012 60.76 sec
Z88DK/SCCZ80_CLASSIC/MATH32 3978 136,057,474 34.01 sec
Z88DK/SCCZ80_NEW 2998 204,281,085 51.07 sec
Z88DK/SCCZ80_NEW/MATH32 3729 136,057,141 34.01 sec
Z88DK/SDCC_CLASSIC 3583 248,331,410 62.08 sec
Z88DK/SDCC_NEW 3171 245,055,005 61.26 sec

Notes:

  • Hitech-C Z80 v7.80pl2 produces incorrect results on all optimization levels.
  • SDCC's performance is hurt by a floating point package implemented in C.
  • Z88DK/SCCZ80_CLASSIC uses the genmath float library while the other Z88DK compiles use math48.
  • Z88DK/SDCC uses a 48-bit float internally but this is converted to 32-bit at the compiler-library interface since sdcc only understands a 32-bit float type.
  • Z88DK/SCCZ80/MATH32 uses the math32 32-bit IEEE-754 floating point package.

n-Body

Model the orbits of Jovian planets, using the same simple symplectic-integrator. Thanks to Mark C. Lewis for suggesting this task.

Useful symplectic integrators are freely available, for example the HNBody Symplectic Integration Package.

SIZE Z80 Cycles Wall Clock @4Mhz
Hitech-C CPM v3.09 5633 1,594,771,948 6 min 38 sec
Hitech-C Z80 v7.80 3736 1,600,543,903 6 min 40 sec
IAR Z80 V4.06A 4084 2,331,516,019 9 min 43 sec
SDCC 7141 3,163,137,393 13 min 11 sec
Z88DK/SCCZ80_CLASSIC 4493 3,658,052,111 15 min 14 sec
Z88DK/SCCZ80_NEW 3363 2,376,486,525 9 min 53 sec
Z88DK/SCCZ80_NEW/MATH32 5149 754,266,702 3 min 8 sec
Z88DK/SCCZ80_NEW/MATH16 3227 0,384,230,543 1 min 36 sec
Z88DK/SDCC_CLASSIC 5246 2,253,709,929 9 min 23 sec
Z88DK/SDCC_NEW 4332 2,247,889,896 9 min 22 sec

Notes:

  • SDCC's performance is hurt by a floating point package implemented in C.
  • Z88DK/SCCZ80_CLASSIC uses the genmath float library while the other Z88DK compiles use math48.
  • Z88DK/SDCC uses a 48-bit float internally but this is converted to 32-bit at the compiler-library interface since sdcc only understands a 32-bit float type.
  • Z88DK/SCCZ80_NEW/MATH32 uses the math32 32-bit IEEE-754 floating point package.
  • Z88DK/SCCZ80_NEW/MATH16 uses the math16 16-bit IEEE-754 floating point package.

Pi

Pi.c computes pi to 800 decimal places. It is based on an implementation found at crypto.stanford.edu.

Pi.c measures 32-bit integer math performance. The computation can make good use of ldiv() but not all compilers supply this function so the program is run with and without ldiv() for comparison purposes.

Z88DK's new C library has a fast integer math option so the table below shows results for it as well as the normal build using the small integer math option.

The first set of numbers are without the use of ldiv() and the second with using ldiv().

Size Z80 Cycles Wall Clock @4Mhz Size Z80 Cycles Wall Clock @4MHz
Hitech-C CPM v3.09 8342 5,532,347,800 23 min 03 sec
Hitech-C Z80 v7.80 6593 5,528,979,464 23 min 02 sec 6728 5,892,567,264 24 min 33 sec
IAR Z80 V4.06A 6789 8,762,223,085 36 min 31 sec 7006 8,799,503,282 36 min 40 sec
SDCC 6591 6,649,404,381 27 min 42 sec
Z88DK/SCCZ80_CLASSIC 6508 4,012,440,830 16 min 43 sec
Z88DK/SCCZ80_NEW 6269 4,012,440,735 16 min 43 sec 6182 2,576,381,983 10 min 44 sec
Z88DK/SCCZ80_NEW_FAST 8999 1,696,878,309 7 min 04 sec 9131 1,301,832,933 5 min 25 sec
Z88DK/SDCC_CLASSIC 6600 4,169,137,078 17 min 22 sec
Z88DK/SDCC_NEW 6246 4,067,517,071 16 min 57 sec 6388 2,609,489,119 10 min 52 sec
Z88DK/SDCC_NEW_FAST 8997 1,756,864,232 7 min 19 sec 9097 1,339,849,656 5 min 35 sec

Notes:

  • Although HITECH-C Z80 v7.80 supplies ldiv(), it still performs two divisions to get quotient and remainder.
  • SDCC's performance is hurt by having its 32-bit math routines implemented in C.
  • Z88DK's small integer math library demotes long multiplies to integer where possible.
  • Z88DK's fast integer math library is able to reduce most 32-bit divides to 16-bit divides. The loop unrolling option is not enabled.

Sieve of Eratosthenes (Prime Numbers)

Sieve.c finds all the prime numbers in [2,7999]. The algorithm is known as the Sieve of Eratosthenes.

This is a popular benchmark for small machine compilers because just about every compiler is able to compile it. As a benchmarking tool it's mainly measuring loop overhead.

SIZE Z80 Cycles Wall Clock @4Mhz
Hitech-C CPM v3.09 10297 7,916,099 1.9790 sec
Hitech-C Z80 v7.80 8472 3,885,436 0.9713 sec
IAR Z80 V4.06A 8772 3,714,152 0.9285 sec
SDCC 8278 4,219,481 1.0548 sec
Z88DK/SCCZ80_CLASSIC 8589 4,957,733 1.2394 sec
Z88DK/SCCZ80_NEW 8362 4,957,733 1.2394 sec
Z88DK/SDCC_CLASSIC 8558 4,510,806 1.1277 sec
Z88DK/SDCC_NEW 8315 3,665,494 0.9163 sec

Notes:

  • Z88DK/SCCZ80 tries to generate small code by turning primitive compiler operations into subroutine calls. The additional call/ret overhead of these subroutine calls is significant in the small loop code and this is what hurts its performance in comparison to other compilers.

Whetstone 1.2

Whetstone is a synthetic floating point benchmark. The benchmark package is available for download.

Floating point performance depends strongly on the number of mantissa bits in the float type.

Float Size Mantissa Bytes Z80 Cycles Wall Clock @4MHz KWIPS
Hitech-C CPM v3.09 32 24 9076 646,520,995 161.6302 sec 6.1870
Hitech-C Z80 v7.80 32 24 6919 614,748,605 153.6871 sec 6.5067
IAR Z80 V4.06A 32 24 6524 732,360,277 183.0901 sec 5.4618
SDCC 32 24 10935 1,491,668,242 372.9170 sec 2.6816
Z88DK/SCCZ80_CLASSIC 48 40 6359 1,283,271,893 320.8179 sec 3.1170
Z88DK/SCCZ80_NEW 48 40 5362 972,899,568 243.2248 sec 4.1114
Z88DK/SCCZ80/MATH32 32 24 8921 567,396,426 141.8491 sec 7.0497
Z88DK/SDCC_CLASSIC 32(48) 24(40) 7588 920,781,972 230.1954 sec 4.3441
Z88DK/SDCC_NEW 32(48) 24(40) 6221 914,412,771 228.6031 sec 4.3743
Z88DK/SDCC/MATH32 32 24 10113 576,187,434 144.0468 sec 6.9421

Notes:

  • Hitech-C CPM v3.09 produces some results with some error in the third decimal position.
  • SDCC's performance is hurt by a floating point package implemented in C.
  • Z88DK/SCCZ80_CLASSIC uses the genmath float library while the other Z88DK compiles use math48.
  • Z88DK/SCCZ80/MATH32 uses the math32 32-bit IEEE-754 floating point package.
  • Z88DK/SDCC uses a 48-bit float internally but this is converted to 32-bit at the compiler-library interface since sdcc only understands a 32-bit float type.
  • Z88DK/SDCC/MATH32 uses the math32 32-bit IEEE-754 floating point package.
Clone this wiki locally