

©2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. DOI: 10.1109/DSD.2016.24

A. Ablak and I. Damaj, HTCC: Haskell to Handel-C Compiler, The 19th EUROMICRO Conference on Digital System Design, IEEE, Limassol, Cyprus, August 31–September, 2016, pp. 192–199.

<https://doi.org/10.1109/DSD.2016.24>

# HTCC: Haskell to Handel-C Hardware Compiler

Ahmed B. Ablak and Issam Damaj  
 Electrical and Computer Engineering Department  
 American University of Kuwait  
 Salmiya, Kuwait  
 Email: {s00015070, idamaj}@auk.edu.kw

**Abstract**—Functional programming languages, such as Haskell, enable simple, concise, and correct-by-construction hardware development. HTCC compiles a subset of Haskell to Handel-C, a language with hardware output. Moreover, HTCC generates VHDL, Verilog, EDIF, and SystemC programs. The HTCC compiler includes lexical, syntax, and semantic analyzers. HTCC automates a transformational derivation methodology to rapidly produce hardware that maps onto Field Programmable Gate Arrays (FPGAs). HTCC is generated using the ANTLR compiler-compiler tool and is supported by an integrated development environment. This paper presents the design rationale and the implementation of HTCC. Several sample generations of first-class and higher-order functions are presented. In addition, a compilation case-study is presented for the XTEA cipher. The investigation comprises a thorough evaluation and performance analysis. The targeted FPGAs include Cyclone II and Stratix IV from Altera and Virtex-6 from Xilinx.

## I. INTRODUCTION

FPGAs are widely used reconfigurable computing (RC) systems. They have become very popular in research and industrial applications in fields such as security and signal processing. FPGAs have evolved from devices of limited functionality and speed into high-performance processors. Example FPGAs include Stratix from Altera and Virtex from Xilinx [1], [2]. The flexibility of FPGAs, which are sometimes described as seas of gates, enables the development of software paradigms that reconfigure hardware almost instantly.

Recently, there has been considerable focus on the development of high-level synthesis (HLS) and rapid-prototyping hardware/software co-design tools. The targets of co-design tools include high design productivity, simplicity, reduced time-to-prototype, and correctness. Co-design tools convert algorithmic behaviors into digital circuits that can map onto FPGAs. High-level co-design tools now go beyond behavioral VHDL and the other standard tools; the area has witnessed the emergence of programming languages and tools such as Handel-C [3], SystemC [4], MATLAB HDL Coder, and LabVIEW. Modern co-design tools enable the integration and partitioning of computations into communicating hardware and software subsystems.

Handel-C is a high-level language with hardware output. It is based on ANSI C, extended with concepts from the theory of Communicating Sequential Processes (CSP) and from the concurrent programming language occam [5]. Handel-C supports both parallel and sequential implementations and can target different FPGA types. Recent research effort has focused on automating hardware generation, targeting Handel-C and hardware in general, starting from functional specifications such as Haskell [6]–[9].

Haskell is a purely functional programming language in which programs are constructed from functions. Haskell functions have no side effects, so the result of a program does not depend on the order in which its functions are evaluated [10]. Modern functional languages are characterized by being strongly typed, concise, clear, and lazy, and they make it easier to ensure correctness. Developing hardware circuits based on the functional programming paradigm is a promising topic under active investigation [11]–[13]. Considerable research effort has aimed to bring the advantages of functional programming languages to hardware design, including *Lava* [14], *Hawk* [15], [16], *Hydra* [17], *HML* [18], *MHDL* [19], the *DDD* system [20], *SAFL* [21], *MuFP* [22], *Ruby* [23], and *Form* [24].

HTCC compiles a subset of Haskell to Handel-C and automatically generates VHDL, Verilog, EDIF, and SystemC. The HTCC compiler includes lexical, syntax, and semantic analyzers, and it is generated using ANTLR based on a subset of the Haskell grammar. The HTCC Integrated Development Environment (IDE) produces a variety of analysis and schematic files and connects to external tools such as DK Design Suite, Altera Quartus, and ModelSim. The developed compiler targets several FPGA types, as well as the Altera DE2-70 and DE4 FPGA boards. The targeted application area is cryptography, namely the XTEA cipher.

The paper is organized as follows. Section II presents the rapid prototyping methodology adopted by HTCC. Section III details the construction of HTCC, including the compiler and IDE designs. The compiler implementation is presented in Section IV. Sections V and VI present the compilation of first-class and higher-order functions and a case-study from cryptography, respectively. A thorough analysis and evaluation is presented in Section VII. Section VIII concludes the paper and sets the ground for future work.

## II. BACKGROUND

HTCC adopts the transformational derivation and refinement methodology of Abdallah et al. [8], [25]. The adopted methodology refines functional specifications into parallel hardware implementations in Handel-C. Several case-studies of the methodology were carried out by Damaj et al. [9], [27], [28]; however, those implementations did not include a compiler that automates the refinement procedure.

Figure 1 depicts the step-wise refinement procedure, where functional specifications are refined to hardware. The adopted methodology is systematic in the sense that it is carried out using the following step-by-step procedure:

- Specify the algorithm in a functional setting relying on higher-order functions as the main building constructs wherever necessary.
- Apply the predefined set of rules to create the corresponding *CSP* networks according to a chosen degree of parallelism.
- Write the equivalent *Handel-C* code and complete the hardware compilation.

The refinement steps are aided by different compilers and integrated development environments. HTCC automates the development process, including running the existing FPGA vendor tools and the Haskell, Handel-C, VHDL, Verilog, EDIF, and SystemC compilers in the background.

The adopted methodology refines both datatypes and functions. Datatypes are refined to *Items*, *Streams*, and *Vectors* to create communicating entities based on message passing. An *Item* corresponds to a basic type, such as an integer, and is communicated on a single channel. A *Stream* is a purely sequential method of communicating a list of values. A *Vector* is a refinement of a simple list of items that communicates the entire structure in parallel [9].

Fig. 1. The transformational derivation and refinement methodology.

In addition, the methodology refines functions to communicating processes. The refinement comprises a library of standard processes, such as *Produce* and *Store*, that aid the communication of refined datatypes. The *Produce* process produces values on the channels of a given communication construct (*Item*, *Stream*, *Vector*, etc.); these values are received and manipulated by other processes. The process *Store* stores a communication construct in a simple or composite variable [9].
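The datatype refinement and the *Produce*/*Store* processes can be modeled in Haskell itself, which helps when reasoning about a network before synthesis. The sketch below is an illustration under stated assumptions, not HTCC's library: a channel is modeled by an `MVar` (a one-place buffer, whereas Handel-C channels are unbuffered and synchronous), a *Stream* by an unbounded `Chan` terminated with `Nothing`, and the names `Item`, `Stream`, `VectorOfItems`, `produce`, `store`, `sproduce`, `sstore`, `vproduce`, and `vstore` are hypothetical.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, forM_, replicateM)

-- Item: a single value on one channel. An MVar is a one-place buffer,
-- so it only approximates a synchronous CSP/Handel-C channel.
newtype Item a = Item (MVar a)

-- Stream: list values sent one after another on a single channel;
-- Nothing marks the end of the stream (a convention of this sketch).
newtype Stream a = Stream (Chan (Maybe a))

-- Vector: one Item per list element, so all elements can move at once.
newtype VectorOfItems a = VectorOfItems [Item a]

newItem :: IO (Item a)
newItem = Item <$> newEmptyMVar

newStream :: IO (Stream a)
newStream = Stream <$> newChan

newVector :: Int -> IO (VectorOfItems a)
newVector n = VectorOfItems <$> replicateM n newItem

-- Produce sends a value on an item's channel; Store receives it.
produce :: a -> Item a -> IO ()
produce x (Item ch) = putMVar ch x

store :: Item a -> IO a
store (Item ch) = takeMVar ch

-- Stream versions: purely sequential, one value at a time.
sproduce :: [a] -> Stream a -> IO ()
sproduce xs (Stream ch) = mapM_ (writeChan ch . Just) xs >> writeChan ch Nothing

sstore :: Stream a -> IO [a]
sstore s@(Stream ch) = do
  m <- readChan ch
  case m of
    Nothing -> pure []
    Just x  -> (x :) <$> sstore s

-- Vector versions: one concurrent producer per element; the store reads
-- each element's own channel, so results keep their positions.
vproduce :: [a] -> VectorOfItems a -> IO ()
vproduce xs (VectorOfItems items) =
  forM_ (zip xs items) $ \(x, item) -> forkIO (produce x item)

vstore :: VectorOfItems a -> IO [a]
vstore (VectorOfItems items) = forM items store
```

Producing a vector forks one thread per element, echoing the parallel communication of a *Vector*, while a stream pushes its values one at a time through a single channel.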

The methodology also supports a rich set of refined higher-order functions, such as *map*, *zip*, and *zipWith*. Higher-order functions can be refined to processes in stream or vector settings, or a combination of both. In Handel-C, datatypes are refined to structures (*struct*), while processes are refined to *macro procedures* [9]. The Handel-C compiler then generates the hardware circuits that can be mapped onto FPGAs.
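The two settings for refining *map* can be contrasted with a short Haskell sketch in which threads stand in for Handel-C processes and `MVar`s for channels (both simplifying assumptions). `vmap` starts one copy of the element process per position, in the spirit of the `par` replication in a *VMAP* macro, while `smap` uses a single process that handles the elements in sequence. The names `vmap` and `smap` are hypothetical.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, replicateM_)

-- Vector setting: one concurrent copy of the process per element,
-- each with its own output channel.
vmap :: (a -> b) -> [a] -> IO [b]
vmap f xs = do
  outs <- forM xs $ \x -> do
    out <- newEmptyMVar
    _ <- forkIO (putMVar out $! f x)
    pure out
  forM outs takeMVar

-- Stream setting: a single process receives each value on one input
-- channel, applies f, and sends the result on one output channel.
smap :: (a -> b) -> [a] -> IO [b]
smap f xs = do
  inCh  <- newEmptyMVar
  outCh <- newEmptyMVar
  _ <- forkIO $ replicateM_ (length xs) $ do
    x <- takeMVar inCh
    putMVar outCh $! f x
  forM xs $ \x -> putMVar inCh x >> takeMVar outCh
```

In hardware terms, the vector setting roughly trades area (one instance per element) for fewer sequential steps, while the stream setting reuses one instance over many steps.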

## III. COMPILER CONSTRUCTION

HTCC is a compiler that automates the presented refinement methodology. The presented version of HTCC Integrated Development Environment (IDE) supports the following:

- Compiles a subset of Haskell to Handel-C
- Automatically connects to the DK Design Suite from Mentor Graphics to run the Handel-C Compiler; it verifies, generates, and analyzes the corresponding VHDL, Verilog, EDIF, or SystemC code
- Automatically connects to Glasgow Haskell Compiler (GHC) to run and test the Haskell code
- Automatically connects to Altera Quartus II to run, test, analyze hardware designs; place and route; produce bit files; and target specific FPGAs and FPGA boards.
- Provides an easy-to-use, rich, and modern development environment

### A. Compiler Design using ANTLR

HTCC is developed using the compiler-compiler tool ANTLR, which provides an easy-to-use compiler construction framework and is efficient, reliable, and effective [26]. ANTLR uses an adaptive parsing technique that performs grammar analysis at run time [29]. Moreover, ANTLR uses the Extended Backus–Naur Form (EBNF). The efficiency and effectiveness of ANTLR are primarily due to its support for direct left recursion, side-effecting actions (mutators), and semantic predicates in the grammar [30].

Figure 2 demonstrates the state machine diagram of the HTCC compilation procedure. The Lexical Analyzer scans the input Haskell code and produces a numbered list of lexemes, dividing the code according to the provided grammar to prepare it for syntax analysis. The Lexical Analyzer removes all white space between tokens and ignores comments introduced by the symbol “--”.
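The behavior of the lexical stage can be illustrated with a small hand-written Haskell scanner. It is a sketch under simplifying assumptions, not HTCC's ANTLR-generated lexer: it recognizes identifiers (also allowing `_`, as in `vector_add3`), integer literals, the symbols `::`, `->`, `.&.`, and `.|.`, and single-character operators; it drops white space, skips `--` comments, and numbers the resulting lexemes. Layout, string literals, and primed names are not handled, and `Token`, `tokenize`, and `numbered` are hypothetical names.

```haskell
import Data.Char (isAlpha, isAlphaNum, isDigit, isSpace)
import Data.List (find, isPrefixOf)

data Token
  = TId String    -- identifier, e.g. add3
  | TNum Integer  -- integer literal, e.g. 3
  | TSym String   -- operator or punctuation, e.g. "::", "->", "+"
  deriving (Eq, Show)

-- Multi-character symbols, tried before single characters.
multiSyms :: [String]
multiSyms = ["::", "->", ".&.", ".|."]

-- Split source text into tokens, dropping white space and "--" comments.
tokenize :: String -> Either String [Token]
tokenize [] = Right []
tokenize s@(c : cs)
  | isSpace c = tokenize cs
  | "--" `isPrefixOf` s = tokenize (dropWhile (/= '\n') s)
  | Just sym <- find (`isPrefixOf` s) multiSyms =
      (TSym sym :) <$> tokenize (drop (length sym) s)
  | isAlpha c =
      let (name, rest) = span (\x -> isAlphaNum x || x == '_') s
      in (TId name :) <$> tokenize rest
  | isDigit c =
      let (ds, rest) = span isDigit s
      in (TNum (read ds) :) <$> tokenize rest
  | c `elem` "+-*/=()[],@" = (TSym [c] :) <$> tokenize cs
  | otherwise = Left ("unexpected character " ++ show c)

-- Number the lexemes 1, 2, 3, ... in order of appearance.
numbered :: String -> Either String [(Int, Token)]
numbered src = zip [1 ..] <$> tokenize src
```

For example, `tokenize "a .|. b"` returns `Right [TId "a",TSym ".|.",TId "b"]`, and the comment in `add3 :: Int -> Int -- increments` produces no tokens.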



Fig. 2. HTCC compiler state machine.

The syntax analyzer is also generated using ANTLR, and a new parse tree is constructed for every compilation. ANTLR provides the Java library required to construct parse trees and to walk through them starting from the leftmost side. During the walk-through, the program being compiled is checked for errors against the grammar provided to ANTLR.

The third stage of the HTCC compiler is semantic analysis, where the types of all functions are checked and stored in a table for further processing. The semantic analyzer checks the types of the inputs and outputs of each function, walking through the parse-tree nodes using ANTLR’s tree walker. If any datatype is unsupported or mismatched, HTCC terminates the compilation process and reports the error.
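A minimal sketch of such a check, written in Haskell under assumed simplifications: each declaration's signature is entered in a type table, types outside an assumed supported set (`Int`, a 32-bit unsigned type, and lists of these) are rejected, and each definition must be declared and bind exactly one parameter per input type. The first error stops the check, mirroring the behavior described above. All names (`Ty`, `Decl`, `Defn`, `semanticCheck`, and so on) are hypothetical.

```haskell
-- Types assumed to be mappable to hardware in this sketch.
data Ty = TInt | TUInt32 | TList Ty | TOther String
  deriving (Eq, Show)

-- A declaration such as  add3 :: Int -> Int  (input types, then the result).
data Decl = Decl String [Ty]

-- A definition such as  add3 x = x + 3  (only name and parameters are used).
data Defn = Defn String [String]

-- The table of function types collected during semantic analysis.
type TypeTable = [(String, [Ty])]

supported :: Ty -> Bool
supported TInt       = True
supported TUInt32    = True
supported (TList t)  = supported t
supported (TOther _) = False

-- Enter every declaration in the table, rejecting unsupported types.
buildTable :: [Decl] -> Either String TypeTable
buildTable = mapM entry
  where
    entry (Decl name tys) = case filter (not . supported) tys of
      []      -> Right (name, tys)
      (t : _) -> Left ("unsupported type " ++ show t ++ " in " ++ name)

-- Each definition must be declared and bind one parameter per input type.
checkDefn :: TypeTable -> Defn -> Either String ()
checkDefn table (Defn name params) = case lookup name table of
  Nothing -> Left ("no type declaration for " ++ name)
  Just tys
    | length params == length tys - 1 -> Right ()
    | otherwise -> Left ("parameter/type mismatch in " ++ name)

-- Stop at the first error, or return the completed type table.
semanticCheck :: [Decl] -> [Defn] -> Either String TypeTable
semanticCheck decls defns = do
  table <- buildTable decls
  mapM_ (checkDefn table) defns
  pure table
```

Using `Either` lets the first error short-circuit the remaining checks, which corresponds to terminating the compilation and reporting that error.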

After a successful semantic analysis, HTCC proceeds to intermediate code generation and then to final code generation. In the intermediate stage, all input and output interface buses and macros are generated. Then, the number of connections among macros is determined and passed to the final generation stage. During the final stage, the Handel-C bus interfaces and the Handel-C main function are generated, together with the connections among all macros. The current version of HTCC does not include an optimization stage.
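The interface and wiring part of final code generation can be sketched as simple templates. Assuming every input and the output of a first-class function are 32-bit unsigned values, and reusing the naming of the OR example in Section V (`INPUT0`, `INPUT1`, `OUTPUT0`, `O0`, `item0`, ...), the functions below emit the bus interfaces and a `main` function that connects `PRODUCE`, the function's macro, and `STORE`. The functions `arity`, `interfaces`, and `mainFunction` are hypothetical and are not HTCC's actual generator.

```haskell
import Data.List (intercalate)

-- Number of inputs in a signature such as "Int -> Int -> Int"
-- (assumes the arrows are separated from the types by spaces).
arity :: String -> Int
arity sig = length (filter (== "->") (words sig))

-- One bus_in interface per input, plus the output variable and bus_out.
interfaces :: Int -> [String]
interfaces n =
  ["unsigned 32 OUTPUT0;"]
    ++ [ "interface bus_in (unsigned 32 value) INPUT" ++ show i ++ "();"
       | i <- [0 .. n - 1] ]
    ++ ["interface bus_out () O0 (unsigned 32 o = OUTPUT0);"]

-- main: items item0 .. item(n-1) carry the inputs and item n the result;
-- all processes run in parallel and communicate over those items.
mainFunction :: String -> Int -> [String]
mainFunction macro n =
  ["void main () {"]
    ++ [ "  Item(item" ++ show i ++ ", unsigned 32);" | i <- [0 .. n] ]
    ++ ["  par {"]
    ++ [ "    PRODUCE(INPUT" ++ show i ++ ".value, item" ++ show i ++ ");"
       | i <- [0 .. n - 1] ]
    ++ [ "    " ++ macro ++ "(" ++ args ++ ");"
       , "    STORE(item" ++ show n ++ ", OUTPUT0);"
       , "  }"
       , "}"
       ]
  where
    args = intercalate ", " [ "item" ++ show i | i <- [0 .. n] ]
```

For the *or* function, `arity "Int -> Int -> Int"` is 2, and `interfaces 2` and `mainFunction "OR" 2` reproduce the interface declarations and the `main` function of the generated OR listing up to white space.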

Figure 3 depicts the correspondence used to generate Handel-C macros from Haskell functions. An example Haskell function is as follows:

$$\begin{aligned} add3 :: Int \rightarrow Int \\ add3 x = x + 3 \end{aligned}$$

The *add3* function has one input and one output, both of type *Int*. The corresponding Handel-C macro for *add3* is as follows:

```
macro proc add3 (itemIn, itemOut) {
    typeof itemIn.message x;
    itemIn.channel ? x;
    itemOut.channel ! x+3;
}
```



Fig. 3. Code generation of items
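The item-macro template of Figure 3 can be sketched as a function from a small expression tree to Handel-C text. This is an illustrative assumption rather than HTCC's implementation: it covers single-parameter functions whose body combines the parameter with constants, and it maps the Haskell operators `.&.`, `.|.`, `xor`, `shiftL`, and `shiftR` to the Handel-C operators `&`, `|`, `^`, `<<`, and `>>`. The names `Expr`, `Op`, `render`, and `itemMacro` are hypothetical.

```haskell
-- Expressions allowed in the body of a single-parameter function.
data Expr = Var String | Lit Integer | Bin Op Expr Expr
  deriving Show

data Op = Add | Sub | Mul | Div | And | Or | Xor | Shl | Shr
  deriving (Eq, Show)

-- Handel-C spelling of each Haskell operator (.&. -> &, .|. -> |, ...).
opText :: Op -> String
opText op = case op of
  Add -> "+"
  Sub -> "-"
  Mul -> "*"
  Div -> "/"
  And -> "&"
  Or  -> "|"
  Xor -> "^"
  Shl -> "<<"
  Shr -> ">>"

-- Fully parenthesised Handel-C expression.
render :: Expr -> String
render (Var v) = v
render (Lit k) = show k
render (Bin op a b) = "(" ++ render a ++ opText op ++ render b ++ ")"

-- Item macro for "name param = body": receive, compute, send.
itemMacro :: String -> String -> Expr -> [String]
itemMacro name param body =
  [ "macro proc " ++ name ++ " (itemIn, itemOut) {"
  , "    typeof itemIn.message " ++ param ++ ";"
  , "    itemIn.channel ? " ++ param ++ ";"
  , "    itemOut.channel ! " ++ render body ++ ";"
  , "}"
  ]
```

Applied to `Bin Add (Var "x") (Lit 3)`, `itemMacro "add3" "x"` yields the *add3* macro above, with the sent expression written as `(x+3)`.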

Note that the *add3* function can also be used for list processing. The generation correspondence is shown in Figure 4.

$$\begin{aligned} vector\_add3 :: [Int] \rightarrow [Int] \\ vector\_add3 x = map(add3) x \end{aligned}$$

The corresponding Handel-C code includes a version of *add3* based on *items*; the generic implementation of the parallel version of the higher-order function *map* (*VMAP*); the implementation of function *vector\_add3* that invokes *VMAP* macro; and a main function that calls *vector\_add3* with its inputs, outputs, and the number of elements in each vector. The parallel instances of *add3* are replicated using the *par* operator in Handel-C. The generated code is as follows:

```
macro proc add3 (itemIn, itemOut) {
    typeof itemIn.message x;
    itemIn.channel ? x;
    itemOut.channel ! (x+3); }

// Parallel map: one instance of F per vector element
macro proc VMAP (vectorIn, vectorOut, n, F) {
    typeof(n) c;
    par(c=0; c<n; c++) {
        F(vectorIn.elements[c],
           vectorOut.elements[c]); } }

macro proc vector_add3 (vectorIn, vectorOut, n) {
    VMAP (vectorIn, vectorOut, n, add3);
}

void main () {
..
vector_add3 (vector0, vector1, 5);
..
}
```



Fig. 4. Code generation of parallel list processing

### B. IDE Design

The IDE is developed with a separation of concerns, structuring the code into different JAR files. HTCC IDE adopts the iterative and incremental design model (IIDM) [31]. In the IIDM, each component of the IDE is developed separately as a standalone project, which allows it to be integrated into multiple projects. The IDE is implemented in Java under NetBeans [32]. The code editor is implemented using the RSyntaxTextArea Java framework, and the IDE theme is implemented using the JTattoo Java framework. Figure 5 demonstrates the use-case diagram of HTCC IDE. The proposed IDE supports the following:

- Editing and storing project files
- Syntax highlighting and automatic code completion
- File navigation and opening multiple files simultaneously
- Running Haskell code under GHC
- Compiling Haskell code to Handel-C code, simulating the Handel-C code, and generating VHDL, EDIF, Verilog, and SystemC implementations
- Compiling the generated HDL files using Altera Quartus to produce analysis and FPGA mapping files

The IDE connects the HTCC compiler to external tools, such as DK Design Suite, to simulate and generate VHDL, Verilog, EDIF, and SystemC files. In addition, the IDE connects the compiler to Altera Quartus through Tcl commands to synthesize designs, generate timing analyses and pin assignments for FPGA boards, and generate bit files to program the targeted FPGAs. GHC is also connected to the IDE to execute and verify Haskell functions. Figure 6 shows a snapshot of the HTCC IDE.



Fig. 5. Use-Case diagram



Fig. 6. HTCC IDE

## IV. COMPILER IMPLEMENTATION

The following subset of the Haskell grammar is part of the HTCC compiler code. Here, functions are divided into declarations (*dcFun*) and definitions (*dFun*):

```
prog  : stat+ ;
stat  : dcFun ;
dcFun : ID '::' formalType ('->' formalType)* NL+ dFun ;
expr  : expr op=('*' | '/') (DIGIT | expr)
      | expr op=('.&.' | '.|.') (DIGIT | expr)
      | expr op=('+' | '-') (DIGIT | expr)
      | 'xor' expr DIGIT
      | 'shiftL' expr DIGIT
      | 'shiftR' expr DIGIT
      | mPassing (mPassing)*
      | expr mPassing
      | ID*
      ;
```

According to the proposed grammar, an expression (*expr*) has several alternatives that capture the definition of a function. An *expr* can be any arithmetic or logic operation between two or more operands. In addition, an *expr* can call other functions; such calls take place at the *mPassing* node. Figure 7 demonstrates the parse tree of the following function:

$$\begin{aligned} f &:: Int \rightarrow Int \\ f \ x &= x + 3 \end{aligned}$$

Fig. 7. The parse tree of function *f*.

A subset of the lexer grammar is as follows:

```
ID      : [a-zA-Z][a-zA-Z0-9]* ;
NL      : '\r'? '\n' ;
ARROW   : '->' ;
WS      : [ \t]+ -> skip ;
DIGIT   : [0-9]+ ;
COMMENT : '--' .*? '\r'? '\n' -> skip ;
```

## V. FIRST-CLASS AND HIGHER-ORDER HASKELL FUNCTIONS

HTCC can generate both first-class and higher-order functions. First-class functions represent simple binary operations, while higher-order functions can take other functions as parameters and usually operate on lists.

### A. First-Class Functions

A sample generation of the bitwise operation OR is shown in the following:

$$\begin{aligned} or &:: Int \rightarrow Int \rightarrow Int \\ or \ a \ b &= a \mathbin{.|.} b \end{aligned}$$

Compiling the function *or* under HTCC generates Handel-C code that comprises three items, each carrying a 32-bit message: the first two items carry *a* and *b*, and the third carries the result. In addition, HTCC generates the macro *OR* and three interfaces, *INPUT0*, *INPUT1*, and *O0*, for the two inputs and the output. In the main function, HTCC creates the three items that produce the two inputs and store the output. Other first-class functions, such as *AND*, *XOR*, *ADD*, *SUB*, and *DIV*, are generated in the same way. To run the compiled code on the Altera DE2-70 board, HTCC automatically generates the following:

```
set clock = external "AD15";
set reset = external "L8";
#define Item(Name, Msgtype) struct {chan Msgtype channel; Msgtype message;} Name

unsigned 32 OUTPUT0;
interface bus_in (unsigned 32 value) INPUT0();
interface bus_in (unsigned 32 value) INPUT1();
interface bus_out () O0 (unsigned 32 o = OUTPUT0);

// Bitwise OR of the two input items (Haskell .|. maps to Handel-C |)
macro proc OR (xItem, yItem, itemOut) {
    typeof (xItem.message) x, y;
    xItem.channel ? x;
    yItem.channel ? y;
    itemOut.channel ! (x | y);
}

// PRODUCE and STORE are macros from the refinement library (not shown)
void main () {
    Item(item0, unsigned 32);
    Item(item1, unsigned 32);
    Item(item2, unsigned 32);
    par {
        PRODUCE(INPUT0.value, item0);
        PRODUCE(INPUT1.value, item1);
        OR(item0, item1, item2);
        STORE(item2, OUTPUT0);
    }
}
```

### B. Higher-Order Functions

HTCC utilizes parallel and sequential versions of higher-order functions, including *map*, *zipWith*, and *foldr*. The following is a sample generation of the parallel zipping of two lists with multiplication, where each list contains ten elements. The generation employs the *VectorOfItems* structure and the parallel versions of the *produce* and *store* macros.

$$\begin{aligned} mul &:: Int \rightarrow Int \rightarrow Int \\ mul \ x \ y &= x * y \end{aligned}$$

```
two_vectors_mul :: [Int] -> [Int] -> [Int]
two_vectors_mul a b = zipWith mul a b

macro proc mul (xItem, yItem, output) {
  typeof (xItem.message) x, y;
  xItem.channel ? x;
  yItem.channel ? y;
  output.channel ! (x*y);}

macro proc VZIPWITH (vectorIn1, vectorIn2,
  vectorOut, n, F) {
  typeof (n) c;
  par (c = 0; c < n; c++) {
    F(vectorIn1.elements[c], vectorIn2.elements[c],
      vectorOut.elements[c]); } }

macro proc two_vectors_mul (vectorIn1, vectorIn2,
  vectorOut, n) {
  VZIPWITH(vectorIn1, vectorIn2, vectorOut, n, mul);}

void main () {
  VectorOfItems(vector0, 10, unsigned 32);
  VectorOfItems(vector1, 10, unsigned 32);
  VectorOfItems(vector2, 10, unsigned 32);
  par{
    VPRODUCE(INPUT0, vector0, 10);
    VPRODUCE(INPUT1, vector1, 10);
    two_vectors_mul(vector0, vector1, vector2, 10);
    VSTORE(vector2, OUTPUT0);}}
```
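Because the specification is ordinary Haskell, it can also serve as a golden model when simulating the generated hardware: running it under GHC gives the values expected on `OUTPUT0`. The sketch below assumes 32-bit unsigned elements (`Word32`, matching `unsigned 32` in the generated code, so products wrap modulo 2^32) and ten-element test vectors; `expected` is a hypothetical helper name.

```haskell
import Data.Word (Word32)

-- Element operation, at the 32-bit unsigned width used in the hardware.
mul :: Word32 -> Word32 -> Word32
mul x y = x * y

-- Same specification as above, specialised to Word32.
two_vectors_mul :: [Word32] -> [Word32] -> [Word32]
two_vectors_mul a b = zipWith mul a b

-- Expected contents of the output vector for two ten-element inputs.
expected :: [Word32]
expected = two_vectors_mul [1 .. 10] [11 .. 20]
```

In GHCi, `expected` evaluates to `[11,24,39,56,75,96,119,144,171,200]`, which can be compared element by element with the simulated `OUTPUT0` values.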

## VI. CASE-STUDY: THE RAPID PROTOTYPING OF XTEA UNDER HTCC

To test the applicability of the developed compiler, we use the extended tiny encryption algorithm (XTEA) as a case-study. XTEA encrypts 64-bit plaintext blocks with a 128-bit key, using a Feistel structure with a variable number of rounds. Each 64-bit block is divided into two 32-bit integers, $V0$ and $V1$. The key produces a set of 32-bit sub-keys that are distributed to the appropriate rounds. XTEA is a small, lightweight, low-power, and secure block cipher [33]. The following is the functional specification of a single XTEA round in Haskell:

```
import Data.Bits (shiftL, shiftR, xor)
import Data.Word (Word32)

-- User-defined 32-bit unsigned integer type
type UInt32 = Word32

-- Arguments: rounds left, running sum, block (v0, v1), sub-key
xteasround :: Int -> UInt32 -> (UInt32, UInt32) -> UInt32 -> (UInt32, UInt32)
xteasround 0 _ x _ = x
xteasround rounds sum (v0, v1) key0 =
    xteasround (rounds - 1) new_sum (new_v0, new_v1) key0
  where
    new_v0  = xteav0 v0 v1 sum key0
    new_sum = xteasum sum
    new_v1  = xteav1 new_v0 v1 new_sum key0

xteav0 :: UInt32 -> UInt32 -> UInt32 -> UInt32 -> UInt32
xteav0 v0 v1 sum key0 = v0 +
  (xor (key0 + sum) (v1 + (xor (shiftL v1 4) (shiftR v1 5))))

xteasum :: UInt32 -> UInt32
xteasum sum = sum + 0x9e3779b9

xteav1 :: UInt32 -> UInt32 -> UInt32 -> UInt32 -> UInt32
xteav1 v0 v1 sum key0 = v1 +
  (xor (key0 + sum) (v0 + (xor (shiftL v0 4) (shiftR v0 5))))
```
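The specification above uses one sub-key, `key0`, in every half-round. Standard XTEA instead selects one of four 32-bit key words, indexed by `sum .&. 3` before the sum update and by `shiftR sum 11 .&. 3` after it. The sketch below writes the full 32-round encryption in the same style, plus decryption so that the model can be checked by a round trip. It is a hypothetical golden model for checking simulation results, not the code HTCC compiled for Table I; `encrypt`, `decrypt`, `mix`, and `keyWord` are illustrative names. When all four key words equal `key0`, each iteration of `encrypt` reduces to the `xteav0`/`xteasum`/`xteav1` composition above.

```haskell
import Data.Bits (shiftL, shiftR, xor, (.&.))
import Data.Word (Word32)

type Key = (Word32, Word32, Word32, Word32)

delta :: Word32
delta = 0x9e3779b9

-- Mixing function shared by both half-rounds: ((v << 4) ^ (v >> 5)) + v.
mix :: Word32 -> Word32
mix v = (shiftL v 4 `xor` shiftR v 5) + v

-- Select one of the four key words using the two low bits of i.
keyWord :: Key -> Word32 -> Word32
keyWord (k0, k1, k2, k3) i = case i .&. 3 of
  0 -> k0
  1 -> k1
  2 -> k2
  _ -> k3

-- 32 rounds of XTEA encryption on the block (v0, v1).
encrypt :: Key -> (Word32, Word32) -> (Word32, Word32)
encrypt key = go (32 :: Int) 0
  where
    go 0 _ v = v
    go n s (v0, v1) =
      let v0' = v0 + (mix v1 `xor` (s + keyWord key s))
          s'  = s + delta
          v1' = v1 + (mix v0' `xor` (s' + keyWord key (shiftR s' 11)))
      in go (n - 1) s' (v0', v1')

-- Inverse: undo the half-rounds in reverse order, starting from 32 * delta.
decrypt :: Key -> (Word32, Word32) -> (Word32, Word32)
decrypt key = go (32 :: Int) (delta * 32)
  where
    go 0 _ v = v
    go n s (v0, v1) =
      let v1' = v1 - (mix v0 `xor` (s + keyWord key (shiftR s 11)))
          s'  = s - delta
          v0' = v0 - (mix v1' `xor` (s' + keyWord key s'))
      in go (n - 1) s' (v0', v1')
```

Since `Word32` arithmetic wraps modulo 2^32, the additions, subtractions, and `delta * 32` behave like the 32-bit unsigned arithmetic of the hardware.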

The user-defined data type $UInt32$ is a 32-bit unsigned integer. Compiling a single XTEA round generates the following sample main function. When the 32 rounds are replicated to implement the top-level function $xtea$, the function $xteasround$ is compiled into a macro $XTEASROUND$ instead.

```

void main () {
  par{
    PRODUCE(INPUT0.value, item0);
    PRODUCE(INPUT1.value, item1);
    PRODUCE(INPUT2.value, item2);
    PRODUCE(INPUT3.value, item3);
    xteav0(item0, item1, item2, item3, item4);
    xteasum(item3, item5);
    xteav1(item4, item1, item2, item5, item6);
    STORE (item4, OUTPUT0);
    STORE (item5, OUTPUT1);
    STORE (item6, OUTPUT2);}}

```



Fig. 8. A single XTEA round with its internal computational constructs. The crossed square denotes addition, the crossed circle denotes XOR, $>>$ denotes a right shift, and $<<$ denotes a left shift.

## VII. ANALYSIS AND EVALUATION

The proposed compiler allows for the rapid prototyping of hardware circuits at a high level of abstraction based on functional specifications. Functional programming enables designing hardware using clear, concise, and correct-by-construction specifications. Overall, the proposed compiler translates a subset of Haskell to Handel-C and thus enables the use of Haskell as a hardware description language for programming FPGAs.

HTCC adopts an effective transformational derivation approach that enables the systematic development of CSP concurrency descriptions. Accordingly, Handel-C code can be generated automatically and then used to generate VHDL, EDIF, Verilog, and SystemC descriptions. The refinement methodology provides a variety of parallelization techniques for specifying the required degree of parallelism, which enables HTCC to generate a variety of implementations with different parallel characteristics. HTCC benefits from the off-the-shelf first-order, higher-order, and application-specific libraries provided by Damaj et al. [9], [27], [28], and automates the refinement procedure.

HTCC IDE enables testing and evaluating both Haskell and Handel-C code through background connections to their native compilers. HTCC IDE offers options to display analysis reports supported by Quartus, such as power consumption, area utilization, timing, RTL views, and pin assignments. Furthermore, the adopted IIDM technique allows the various parts of the IDE to be developed and integrated rapidly and simply.

Although the use of ANTLR made the compiler implementation simple, additions were necessary. The main addition in HTCC is the semantic analyzer, which was embedded into the adopted ANTLR structure. The embedding enabled effective type checking and error reporting through the supported exception-handling mechanism.

Table I presents the performance analysis results of the XTEA cipher as generated by HTCC and tested on Cyclone II, Stratix IV, and Virtex-6 FPGAs. The Cyclone II FPGA is part of the targeted Altera DE2-70 board, the Stratix IV FPGA is part of the targeted Altera DE4 board, and the Virtex-6 is a high-speed FPGA from Xilinx. The total number of NAND gates, as measured under DK Design Suite, is 467,969, with a total of 192 clock cycles. The highest frequency achieved is 648.54 MHz under the Virtex-6, and the lowest power consumption is 219.62 mW under the Cyclone II. The highest throughput, 219.3 Mbps, is also achieved under the Xilinx Virtex-6 FPGA.
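As a consistency check, the Cyclone II throughput in Table I follows from its $F_{max}$ under two assumptions: the 192 clock cycles encrypt one 64-bit block, and the execution-time row reports a single clock period ($1/F_{max}$). Under these assumptions:

$$\begin{aligned} T_{clk} &= \frac{1}{F_{max}} = \frac{1}{183.18\ \text{MHz}} \approx 5.459\ \text{ns} \\ \text{Throughput} &= \frac{64\ \text{bits}}{192 \times T_{clk}} = \frac{64}{192} F_{max} \approx \frac{183.18}{3}\ \text{Mbps} \approx 61.06\ \text{Mbps} \end{aligned}$$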

TABLE I  
XTEA IMPLEMENTATION RESULTS

|                           | Cyclone II | Stratix IV | Virtex-6     |
|---------------------------|------------|------------|--------------|
| Total logic elements      | 15,573 LE  | 1221 ALUTs | 26660 Slices |
| Fmax (MHz)                | 183.18     | 513.8      | 648.54       |
| Total Execution Time (ns) | 5.46       | 1.95       | 1.52         |
| Throughput (Mbps)         | 61.06      | 171.26     | 219.3        |
| Power consumed (mW)       | 219.62     | 888.47     | 912.4        |

Compared with the performance reported in [33]–[36], the results produced by HTCC achieve the highest throughput, 219.3 Mbps, under the Virtex-6 (see Table II). A behavioral implementation of the XTEA cipher in Verilog achieved 134 Mbps; however, the main purpose of that implementation was a compact and low-power design [33]. The manual Handel-C (HC) implementation achieved a throughput of 44.25 Mbps with an Fmax of 177 MHz and an area of 720 logic elements.

## VIII. CONCLUSION

HTCC is a Haskell to Handel-C hardware compiler that targets FPGAs. HTCC automates a transformational derivation methodology to rapidly produce hardware circuits from functional specifications. The adopted methodology refines functional programs to a formal concurrency framework, namely CSP, and enables the systematic refinement of the CSP descriptions to Handel-C; HTCC makes this process automatic. However, HTCC does not yet output the CSP descriptions themselves; this is identified as future development. The developed compiler effectively produces hardware circuits in various descriptions and languages, such as VHDL, Verilog, EDIF, and SystemC. HTCC connects to a range of hardware design tools to produce a rich set of analysis reports and bit-stream files that can map to different FPGAs. The paper includes a case-study from cryptography that produces results comparable to, and in some instances better than, those reported in the literature. HTCC adopts a functional programming style to benefit from its simplicity, conciseness, and correctness. Future work includes expanding the area of application and widening the pool of implemented Haskell syntax and parallelization options.

## REFERENCES

- [1] Altera, “Web,” Information available from: <https://www.altera.com/>.
- [2] Xilinx, “Web,” Information available from: <http://www.xilinx.com/>.
- [3] Mentor Graphics, “Web,” Information available from: <https://www.mentor.com/>.
- [4] P. R. Panda, “SystemC - A modeling platform supporting multiple design abstractions,” in *Proceedings of ISSS01*, October 2001.
- [5] I. Page, “Closing the gap between hardware and software: hardware-software cosynthesis at Oxford,” in *IEE Colloquium on Hardware-Software Cosynthesis for Reconfigurable Systems*, February 1996, pp. 200–211.
- [6] I. W. Damaj, “Higher-Level Hardware Synthesis of the KASUMI Algorithm,” *Journal of Computer Science and Technology*, vol. 22, no. 1, pp. 60–70, 2007. [Online]. Available: <http://dx.doi.org/10.1007/s11390-007-9007-9>
- [7] J. Hawkins and A. E. Abdallah, “Hardware synthesis of a parallel JPEG decoder from its functional specification,” in *Design Methods and Applications for Distributed Embedded Systems*. Springer, 2004, pp. 197–206.
- [8] A. E. Abdallah and J. Hawkins, “Formal behavioural synthesis of Handel-C parallel hardware implementation for functional specifications,” in *Proceedings of the 36th annual Hawaii international conference on system sciences*. IEEE Computer Society Press, 2003, pp. 278–288.
- [9] I. Damaj, “Parallel Algorithms Development for Programmable Devices with application from cryptography,” *International Journal of Parallel Programming*, vol. 35, no. 6, pp. 529–572, Dec. 2007, DOI: 10.1007/s10766-007-0046-1.
- [10] S. Thompson, *Haskell: The Craft of Functional Programming*. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1997.
- [11] P. Bjesse, K. Claessen, M. Sheeran, and S. Singh, “Lava: Hardware Design in Haskell,” in *Proceedings of the Third ACM SIGPLAN International Conference on Functional Programming*, ser. ICFP ’98. New York, NY, USA: ACM, 1998, pp. 174–184. [Online]. Available: <http://doi.acm.org/10.1145/289423.289440>
- [12] C. Baaij, “Cλash : from Haskell to hardware,” December 2009. [Online]. Available: <http://essay.utwente.nl/59482/>
- [13] A. Acosta, “Hardware synthesis in ForSyDe,” June 2007. [Online]. Available: <http://people.kth.se/~ingo/Papers/ThesisAlfonsoAcosta2007.pdf>
- [14] M. Sheeran, “Hardware design and functional programming: a perfect match.” *J. UCS*, vol. 11, no. 7, pp. 1135–1158, 2005.

- [15] J. Launchbury, J. Lewis, and B. Cook, “On embedding a microarchitectural design language within Haskell,” in *Proceedings of the fourth ACM SIGPLAN international conference on Functional programming*. ACM Press, 1999, pp. 60–69.
- [16] J. Matthews, J. Launchbury, and B. Cook, “Specifying microprocessors in Hawk,” in *Proceedings of the International Conference on Computer Languages*. IEEE, May 1998, pp. 90–101.
- [17] J. O’Donnell, “Hydra: hardware description in a functional language using recursion equations and high order combining forms,” in *The Fusion of Hardware Design and Verification*, G. J. Milne, Ed. Amsterdam: North-Holland, 1988, pp. 309–328.
- [18] Y. Li and M. Leeser, “HML: An innovative hardware design language and its translation to VHDL,” in *Conference on Hardware Design Languages*, June 1995.
- [19] D. Barton, “Advanced modeling features of MHDL,” in *International Conference on Electronic Hardware Description Languages*, January 1995.
- [20] S. Johnson and B. Bose, “DDD: A system for mechanized digital design derivation,” Indiana University, Indiana, Tech. Rep. 323, 1990.
- [21] R. Sharp, “Higher-level hardware synthesis,” Ph.D. dissertation, Robinson College University of Cambridge, Cambridge, November 2002.
- [22] M. Sheeran, “muFP: a language for VLSI design,” in *Proc. ACM Symposium on LISP and Functional Programming*. ACM Press, 1984, pp. 104–112.
- [23] G. Jones and M. Sheeran, “Circuit design in Ruby,” in *Formal Methods for VLSI design*, pp. 13–70, 1990.
- [24] T. Cheung and G. Hellestrand, “Multi-level equivalence in design transformation,” in *Proceedings of International Conference on Computer Hardware Description Languages*, Chiba Japan, September 1996, pp. 559–566.
- [25] A. E. Abdallah, “Functional process modelling,” in *Research Directions in Parallel Functional Programming*. Springer-Verlag, October 1999, pp. 339–360.
- [26] T. Parr, *The Definitive ANTLR 4 Reference*, 2nd ed. Pragmatic Bookshelf, 2013.
- [27] I. Damaj, “Co-designs of Parallel Rijndael,” in *The International Symposium on System-on-Chip*. Tampere, Finland: IEEE, 1-2 November 2011, pp. 72–77.
- [28] ——, “Parallel AES Development for Programmable Devices,” in *The Fourth IASTED International Conference on Parallel and Distributed Computing and Networks*, IASTED. Innsbruck - Austria: Acta Press, February 2009.
- [29] T. Parr, *Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages*, 1st ed. Pragmatic Bookshelf, 2009.
- [30] T. Parr, S. Harwell, and K. Fisher, “Adaptive LL(\*) Parsing: The Power of Dynamic Analysis,” *SIGPLAN Not.*, vol. 49, no. 10, pp. 579–598, Oct. 2014. [Online]. Available: <http://doi.acm.org/10.1145/2714064.2660202>
- [31] I. Jacobson, G. Booch, and J. Rumbaugh, *The Unified Software Development Process*. Reading, MA: Addison-Wesley, 1999.
- [32] D. R. Heffelfinger, *Java EE 7 Development with NetBeans 8*. Packt Publishing Ltd, 2015.
- [33] I. Damaj, S. Hamade, and H. Diab, “Efficient Tiny Hardware Cipher under Verilog,” in *Proceedings of the 2008 High Performance Computing and Simulation Conference*, 2008.
- [34] M. Botta, M. Simek, and N. Mitton, “Comparison of hardware and software based encryption for secure communication in wireless sensor networks,” in *Telecommunications and Signal Processing (TSP), 2013 36th International Conference on*, July 2013, pp. 6–10.
- [35] P. Yalla and J. Kaps, “Lightweight Cryptography for FPGAs,” in *International Conference on Reconfigurable Computing and FPGAs*, 2009, Dec 2009, pp. 225–230.
- [36] I. A. Shweta Gaba and D. Sujata, “Design of Efficient XTEA using Verilog,” *International Journal of Scientific and Research Publications*, vol. 2, June 2012.

TABLE II  
COMPARISON AMONG SIMILAR XTEA HARDWARE IMPLEMENTATIONS

| Reference       | [34]      | [35]     | [36]      | [33]       | Manual HC  | HTCC         |
|-----------------|-----------|----------|-----------|------------|------------|--------------|
| Logic elements  | NA        | 424 LUTs | 1182 LUTs | 539 Slices | 720 LE     | 26660 Slices |
| Fmax (MHz)      | NA        | NA       | 71.11     | 142.4      | 177        | 648.54       |
| Total Exe. Time | 2.48 ms   | NA       | 14.06 ns  | NA         | 5.6 ns     | 1.52 ns      |
| Throughput      | 0.39 kB/s | NA       | NA        | 134 Mbps   | 44.25 Mbps | 219.3 Mbps   |