

# TCE Tour: Screencast

- These are accompanying slides for a set of TCE screencast clips available at  
<http://tce.cs.tut.fi/index.php/home/screencasts>
- The set of videos goes through the most important tools in TCE by means of a simple CRC example application
- Starts from C, ends with a TTA+program running on an FPGA board



# Intro and Exploration (about 7 minutes)

- Start with the application code in C
- minimal.adf is shipped in TCE:
  - a minimal set of resources in TTAs supported by the tcecc compiler
- reflect.vhdl is a VHDL implementation of a custom operation which will be added later to the design





# Intro and Exploration

- Open the minimal.adf to the Processor Design GUI
  - “prode minimal.adf &”





# Intro and Exploration

- The main.c has a simple string “TCE rocks!” for which we are going to compute the CRC



emacs22-gtk@untamo

File Edit Options Buffers Tools C Help

File

mini

```
* Notes: To test a different CRC standard, modify crc.h.  
*  
* Copyright (c) 2000 by Michael Barr. This software is placed into  
* the public domain and may be used for any purpose. However, this  
* notice must not be changed or removed and no warranty is either  
* expressed or implied by its publication or distribution.  
******/  
*  
* TCE version modified by Otto Esko  
*  
* Notes: Only crc-32 is used  
*  
*****/  
#ifdef _DEBUG  
#include <stdio.h>  
#endif /* _DEBUG */  
  
#include "crc.h"  
  
volatile crc result = 0;  
unsigned char test[] = "TCE rocks!";  
#define LENGTH 10  
  
int  
main(void) {  
  
    /*  
     * Compute the CRC of the test message, more efficiently.  
     */  
    result = crcFast(test, LENGTH);  
  
    #ifdef _DEBUG  
    printf("The crcFast() of \"TCE rocks!\" is 0x%08X\n", result);  
    #endif /* _DEBUG */  
  
    return 0;  
}
```

(DOS) — main.c Bot (29,25) (C/I Abbrev)



emacs22-gtk@unitamo

File Edit Options Buffers Tools C Help

File

mini

Func

(DOS)-- crc.c Bot (196,0) (C/1 Abbrev)

```
/*
 * Function:    crcFast()
 *
 * Description: Compute the CRC of a given message.
 *
 * Notes:       crcInit() must be called first.
 *
 * Returns:     The CRC of the message.
 */

crc
crcFast(unsigned char const message[], int nBytes)
{
    crc           remainder = INITIAL_REMAINDER;
    signed char   data;
    int           byte;

    /*
     * Divide the message by the polynomial, a byte at a time.
     */
    for (byte = 0; byte < nBytes; ++byte)
    {
        data = REFLECT_DATA(message[byte]) ^ (remainder >> (WIDTH - 8));
        remainder = crcTable[data] ^ (remainder << 8);
    }

    /*
     * The final remainder is the CRC.
     */
    return (REFLECT_REMAINDER(remainder) ^ FINAL_XOR_VALUE);
} /* crcFast() */

"crc.c" selected (5.8 KB)
```

ner - minimal.adf

intro - File Browser

Reload Home Computer Search 100% Icon View

crc.h crcTable.dat main.c reflect.vhdl

0UL, 7 58365 12216 18556

# Intro and Exploration

- Let's see how well the CRC code runs with the smallest supported TTA
  - Compile the code to the TTA with the retargetable tcecc compiler
  - Load the processor architecture description and the compiled program to the Processor Simulator GUI (Proxim)



emacs22-gtk@untamo

File Edit Options Buffers Tools C Help

File

mini

```
/*
 * Function: crcFast()
 *
 * Description: Compute the CRC of a given message.
 *
 * Notes:     crcInit() must be called first.
 *
 * Returns:   The CRC of the message.
 */

crc
crcFast(unsigned char const message[], int nBytes)
{
    crc           remainder = INITIAL_REMAINDER;
    unsigned char data;
    int          i;

    /* Divide */
    for (by = 0; by < nBytes; by++)
    {
        data = message[by];
        remainder = remainder ^ data;
        for (i = 0; i < 8; i++)
        {
            if ((remainder & 1) == 1)
                remainder = (remainder ^ 0xEDB88320);
            remainder = remainder >> 1;
        }
    }
    /* The remainder */
    return remainder;
} /* crcF
```

otski@untamo: ~/intro

File Edit View Terminal Tabs Help

```
otski@untamo:~/intro$ prode minimal.adf &
[1] 8294
otski@untamo:~/intro$ tcecc -O2 -a minimal.adf -o crc.tpef crc.c main.c
$ proxim minimal.adf crc.tpef
```

ner - minimal.adf

intro - File Browser

elp

Reload Home Computer Search

100% Icon View

# Intro and Exploration

- Proxim's main window displays the disassembly of the TTA program
- The minimal.adf has only one bus, thus the moves cannot be parallelized





# Intro and Exploration

## ➤ Proxim's machine window:

- Visualizes the TTA processor when running the given program
- Single stepping the assembly code highlights the transport paths in the processor accessed by the moves in the current instruction
- Allows inspecting the values in programmer-visible registers of the TTA such as FU ports, the utilization of the components (color coding), etc.
- Most importantly for this case, the simulator displays the total cycle count **6109** (number of TTA instructions executed)
- Also statistics for the different operations executed, registers used etc. can be produced to guide manual exploration of the architecture
  - “info proc stats”





## TTA Processor Simulator

File View Edit Command Source Program Data Help

Machine Program

Run

Resume

Kill

Step1

Next1

|    |        | 0: B1                    |
|----|--------|--------------------------|
| 0  | _start | 16777208 -> RF.0         |
| 1  |        | 4 -> alu_comp.in2        |
| 2  | next>  | RF.0 -> alu_comp.in1.sub |
| 3  |        | alu_comp.out1 -> RF.0    |
| 4  |        | gcu.ra -> lsu.in2        |
| 5  |        | RF.0 -> lsu.in1.stw      |
| 6  |        | _main -> gcu.pc.call     |
| 7  |        | ...                      |
| 8  |        | ...                      |
| 9  |        | ...                      |
| 10 |        | _exit -> gcu.pc.call     |

```
>stepi
1      <_start+1>; 4 -> alu_comp.in2 ;
>stepi
2      <_start+2>; RF.0 -> alu_comp.in1.sub ;
>
```



# TTA Processor Simulator

File View Edit Command Source Program Data Help



Resume



Step!



| 0: B1 |       |                         |
|-------|-------|-------------------------|
| 7     | ...   |                         |
| 8     | ...   |                         |
| 9     | ...   |                         |
| 10    | ...   | _exit -> gcu.pc.call    |
| 11    | ...   |                         |
| 12    | ...   |                         |
| 13    | ...   |                         |
| 14    | _exit | 4 -> alu_comp.in2       |
| 15    | next> | RF0 -> alu_comp.in1.sub |
| 16    |       | alu_comp.out1 -> RF.0   |

```
>stepi  
1      <__start+1>: 4 -> alu_comp.in2 ;  
>stepi  
2      <__start+2>: RF.0 -> alu_comp.in1.sub ;  
>resume  
  
>
```

Finished

Cycles: 6109

Function unit: alu\_comp

Ports:

in1t: 16777200  
in2: 4  
out1: 16777204



Component utilization:

Trigger count: 1182

Operation executions:

ADD: 1047 (88.5787%)  
SUB: 24 (2.03046%)  
EQ: 111 (9.39086%)  
GT: 0 (0%)  
GTU: 0 (0%)

## TTA Processor Simulator

7

File View Edit Command Source Program Data Help



Run



Resume



Kill



Step



Next

|    |  | 0: B1                |  |
|----|--|----------------------|--|
| 8  |  | ...                  |  |
| 9  |  | ...                  |  |
| 10 |  | _exit -> gcu.pc.call |  |
| 11 |  | ...                  |  |

## operations:

|      |                            |
|------|----------------------------|
| ADD  | 17.1386% (1047 executions) |
| AND  | 1.66967% (102 executions)  |
| CALL | 0.0327386% (2 executions)  |
| EQ   | 1.81699% (111 executions)  |
| IOR  | 1.57145% (96 executions)   |
| JUMP | 0.91668% (56 executions)   |
| LDQ  | 0.0818465% (5 executions)  |
| LDQU | 0.163693% (10 executions)  |
| LDW  | 10.3618% (633 executions)  |
| SHL  | 0.818465% (5 executions)   |
| SHRU | 0.91668% (56 executions)   |
| STQ  | 0.0818465% (5 executions)  |
| STW  | 6.46587% (395 executions)  |
| SUB  | 0.392863% (24 executions)  |
| XOR  | 0.491079% (30 executions)  |



Finished Cycles: 6109

## Function unit: alu\_comp

## Ports:

in1t: 16777200  
in2: 4  
out1: 16777204



## Component utilization:

Trigger count: 1182

## Operation executions:

ADD: 1047 (88.5787%)  
SUB: 24 (2.03046%)  
EQ: 111 (9.39086%)  
GT: 0 (0%)  
GTU: 0 (0%)

# Intro and Exploration

- The minimal.adf has only 5 registers, the CRC algorithm can use more as we saw from the stats
- Let's add some more registers using the Processor Designer (ProDe)
  - In this case we double the number to 10 registers
- Recompile the program for the new architecture with 10 registers using tcecc
- This time we'll load the processor+program to the command line interface of the simulator (ttasim)
  - "info proc cycles" works here also and produces the cycle count **3116** which is almost halved from the one we got using a machine with only 5 registers







# Intro and Exploration

- Next we'll try the Design Space Explorer tool
- The tool is used to launch “explorer plugins” which perform modifications to the target and measure their effect to
  - cycle count
  - area estimate
  - energy estimate
  - longest path delay estimate
- The plugins can be fully automated or semi-automated
  - Can implement a loop that explores multiple points in the design space or just generate one new design space point (processor configuration)
- “explore -g” prints a list of available exploration plugins
  - In this example we use the GrowMachine plugin which adds basic resources to the machine until the cycle count does not drop anymore significantly





# Intro and Exploration

- The explorer is used by first adding the software of the application to a Design Space Database (dsdb)
- Then we launch the explorer plugin which produces one or more new “configurations” to the DSDB along with their characteristics data (at least cycle counts)
  - Plugins usually have parameters which can also be configured through explorer
- In this case 2 new configurations were produced after starting the GrowMachine from our minimal.adf starting point
- The best cycle count we got using this explorer plugin is **690**
- Let's see with ProDe how the generated best architecture looks like







# Intro and Exploration

- As we can see, the GrowMachine plugin has added more buses and FUs to the machine
  - Currently a brute-force approach of incrementing the current resource set with a constant factor is used
- For example, the machine has 9 buses (instead of 1), many more function units and additional two register files



# Profiling and Using a Custom Operation (about 4 minutes)

- The GrowMachine plugin managed to squeeze the cycle count down to **690** by just duplicating resources
- We are not happy with this number yet as we know it can get much lower when some custom hardware is used
- This video shows how to profile the application and use a custom operation (special function unit) to accelerate a “hot spot” in the CRC program



# Profiling and Using a Custom Operation

- First we'll compile the program with procedure inlining disabled so we get a proper function profile of the program





# Profiling and Using a Custom Operation

## ➤ Simulate the program:

- Note that the cycle count has increased due to the disabled inlining to **4917**
- Verify the program by dumping the computed CRC number from memory



## TIA Processor Simulator

File View Edit Command Source Program Data Help

Machine Program

Run Resume

Kill Step Next

|    |       | 0: B1                   |
|----|-------|-------------------------|
| 2  |       |                         |
| 3  |       | alu_comp.out1 -> RF0    |
| 4  |       | gcu.ra -> lsu.in2       |
| 5  |       | RF0 -> lsu.init.stw     |
| 6  |       | _main -> gcu.pc.call    |
| 7  |       | ...                     |
| 8  |       | ...                     |
| 9  |       | ...                     |
| 10 |       | _exit -> gcu.pc.call    |
| 11 |       | ...                     |
| 12 |       | ...                     |
| 13 |       | ...                     |
| 14 | _exit | 4 -> alu_comp.in2       |
| 15 | next> | RF0 -> alu_comp.in1:sub |
| 16 |       | alu_comp.out1 -> RF0    |

&gt;run

&gt;x /u w \_result

0x62488e82

&gt;

Finished

Cycles: 4917



# Profiling and Using a Custom Operation

## ➤ Program profile:

- To find out the “hot spot” in the program, we highlight the top executed instructions
- We find out the instructions in the `_reflect()` function are executed very frequently, thus it's a potential candidate for acceleration with a custom operation (special function unit)



## TTA Processor Simulator

3

File View Edit Command Source Program Data Help



Run Resume

Kill Step Next

|    | 0: B1                       |
|----|-----------------------------|
| 40 | 4 -> alu_comp.in2           |
| 41 | lsu.out1 -> gcu.ra          |
| 42 | RF0 -> alu_comp.in1t.add    |
| 43 | alu_comp.out1 -> RF0        |
| 44 | RF3 -> RF1                  |
| 45 | gcu.ra -> gcu.pc.jump       |
| 46 | ...                         |
| 47 | ...                         |
| 48 | ...                         |
| 49 | _reflect: 4 -> alu_comp.in2 |
| 50 | RF0 -> alu_comp.in1t.sub    |
| 51 | alu_comp.out1 -> RF0        |
| 52 | B -> alu_comp.in2           |
| 53 | RF0 -> alu_comp.in1t.add    |
| 54 | alu_comp.out1 -> RF2        |

&gt;run

>x /u w \_result  
0x62488e82

## Top 5 Execution Counts:

| Exec Count | Address range |
|------------|---------------|
| 112        | 112 - 127     |
| 112        | 91 - 97       |
| 57         | 98 - 110      |
| 11         | 129 - 137     |
| 11         | 49 - 90       |
| 5          | 161 - 325     |
| 1          | 327 - 355     |
| 1          | 139 - 160     |
| 1          | 27 - 48       |



Finished

Cycles: 4917

# Profiling and Using a Custom Operation

- We find out that the reflect function is called through macros `REFLECT_DATA` and `REFLECT_REMAINDER` in the core loop of the C code
- The `reflect()` computes a “bit reflection”
  - Reverses bits like a mirror was placed in the middle of the word)
  - We see from the macros that it's done only for word sizes 8 and 32 bits



emacs22-gtk@unitamo

File Edit Options Buffers Tools C Help

File Ed

New

minimal.

ots [2] ots [2] ots

```
/*
 * Function: crcFast()
 * Description: Compute the CRC of a given message.
 * Notes:     crcInit() must be called first.
 * Returns:   The CRC of the message.
 */
crcFast(unsigned char const message[], int nBytes)
{
    crc        remainder = INITIAL_REMAINDER;
    unsigned char data;
    int         byte;

    /*
     * Divide the message by the polynomial, a byte at a time.
     */
    for (byte = 0; byte < nBytes; ++byte)
    {
        data = REFLECT_DATA(message[byte]) ^ (remainder >> (WIDTH - 8));
        remainder = crcTable[data] ^ (remainder << 8);
    }

    /*
     * The final remainder is the CRC.
     */
    return (REFLECT_REMAINDER(remainder) ^ FINAL_XOR_VALUE);
} /* crcFast() */


```

(DOS)-- crc.c Bot (222,28) (C/I Abbrev)

ner - minimal.adf

Options Help

FU: RF: RF:

c.c --disable-inlining

GCU:  
gcu  
{ jump, call }

11 items, Free space: 21.9 GB

emacs22-gtk@unitamo

File Edit Options Buffers Tools C Help

File Editor

New

minimal.adf

```

* Notes: The parameters for each supported CRC standard are
* defined in the header file crc.h. The implementations
* here should stand up to further additions to that list.
*
*
* Copyright (c) 2000 by Michael Barr. This software is placed into
* the public domain and may be used for any purpose. However, this
* notice must not be changed or removed and no warranty is either
* expressed or implied by its publication or distribution.
*****/
```

/\* TCE version modified by Otto Esko

\* Changes:
\* Only CRC-32 version is used, other versions are omitted.
\* crcSlow version is preserved although it's not used.

```
*****/
```

```
#include "crc.h"

/*
 * Derive parameters from the standard-specific parameters in crc.h.
 */
#define WIDTH    32
#define TOPBIT   (1 << (WIDTH - 1))

#define REFLECT_DATA(X)    ((unsigned char)reflect((X), 8))
#define REFLECT_REMAINDER(X) ((crc)reflect((X), WIDTH))

*****/
* Function: reflect()
* Description: Reorder the bits of a binary sequence, by reflecting
* them about the middle position.
*
* Notes: No checking is done that nBits <= 32.
*
* Returns: The reflection of the original data.
*****/
```

```
static unsigned long
-(DOS)-- crc.c      3% (47,0)  (C/I Abbrev)
```

11 items, Free space: 21.9 GB



emacs22-gtk@unitamo

File Edit Options Buffers Tools C Help

File Editor

New

minimal.

ots [2] ots [2] ots

```
#define REFLECT_DATA(X) ((unsigned char)reflect((X), 8))
#define REFLECT_REMAINDER(X) ((crc)reflect((X), WIDTH))

/****************************************************************************
 * Function: reflect()
 * Description: Reorder the bits of a binary sequence, by reflecting
 *               them about the middle position.
 *
 * Notes:      No checking is done that nBits <= 32.
 *
 * Returns:    The reflection of the original data.
 */
static unsigned long
reflect(unsigned long data, unsigned char nBits)
{
    unsigned long reflection = 0x00000000;
    unsigned char bit;

    /* Reflect the data about the center bit. */
    for (bit = 0; bit < nBits; ++bit)
    {
        /*
         * If the LSB bit is set, set the reflection of it.
         */
        if (data & 0x01)
        {
            reflection |= (1 << ((nBits - 1) - bit));
        }

        data = (data >> 1);
    }

    return (reflection);
} /* reflect() */

/****************************************************************************
 * Function: crcSlow()
 */
(DOS)-- crc.c 21% (38.55) (C/I Abbrev)
```

ner - minimal.adf

Options Help

FU: RF: RF:

c.c --disable-inlining

# Profiling and Using a Custom Operation

- The reflect() function is extremely simple and efficient to implement in hardware (just wiring and shifting if necessary), but looks like a heavy loop when implemented in C code
- Let's create a custom operation for the REFLECT
  - Custom operations added to TCE using a tool called Operation Set Editor (OSEd)
- First we add general “static” information about the operation like its name and the number and type of inputs and outputs



**Operation properties**

|                                                                              |                              |
|------------------------------------------------------------------------------|------------------------------|
| <b>Operation properties</b>                                                  | <b>Operation description</b> |
| Name: <input type="text" value="REFLECT"/>                                   |                              |
| <input type="checkbox"/> Reads memory <input type="checkbox"/> Writes memory |                              |
| <input type="checkbox"/> Can trap <input type="checkbox"/> Has side effects  |                              |
| <input type="checkbox"/> Clocked                                             |                              |

  

| <b>Affected by:</b>                                                                                            | <b>Operation inputs</b>                                                                                              |         |      |   |          |
|----------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------|---------|------|---|----------|
| <input type="text" value="operation"/>                                                                         | <table border="1"> <tr> <th>operand</th> <th>type</th> </tr> <tr> <td>1</td> <td>UIntWord</td> </tr> </table>        | operand | type | 1 | UIntWord |
| operand                                                                                                        | type                                                                                                                 |         |      |   |          |
| 1                                                                                                              | UIntWord                                                                                                             |         |      |   |          |
| <input type="button" value="ADDSUB"/> <input type="button" value="Add"/> <input type="button" value="Delete"/> | <input type="button" value="Add..."/> <input type="button" value="Modify..."/> <input type="button" value="Delete"/> |         |      |   |          |

  

| <b>Affects</b>                                                                                                 | <b>Operation outputs</b>                                                                                             |         |      |   |          |
|----------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------|---------|------|---|----------|
| <input type="text" value="operation"/>                                                                         | <table border="1"> <tr> <th>operand</th> <th>type</th> </tr> <tr> <td>2</td> <td>UIntWord</td> </tr> </table>        | operand | type | 2 | UIntWord |
| operand                                                                                                        | type                                                                                                                 |         |      |   |          |
| 2                                                                                                              | UIntWord                                                                                                             |         |      |   |          |
| <input type="button" value="ADDSUB"/> <input type="button" value="Add"/> <input type="button" value="Delete"/> | <input type="button" value="Add..."/> <input type="button" value="Modify..."/> <input type="button" value="Delete"/> |         |      |   |          |

  

Operation behavior module not defined.



# Profiling and Using a Custom Operation

- Then let's add a simulation behavior description for the operation
- We can copy the original C code to the simulation behavior definition, just define:
  - Reads from operation inputs (UINT(1), UINT(2)) to variables in the C code
  - Write result to the operation output (IO(3))
- The simulation behavior is loaded runtime to the processor simulator
  - It's a "plugin" module which needs to be compiled
  - Build it with OSEd
  - Test that the simulation behavior definition works using the operation behavior simulator



emacs22-gtk@untamo

File Edit Options Buffers Tools C Help

```
/*
 * Function: reflect()
 *
 * Description: Reorder the bits of a binary sequence, by reflecting
 * them about the middle position.
 *
 * Notes: No checking is done that nBits <= 32.
 *
 * Returns: The reflection of the original data.
 */
static unsigned long
reflect(unsigned long data, unsigned char nBits)
{
    unsigned long reflection = 0x00000000;
    unsigned char bit;

    /*
     * Reflect the data about the center bit.
     */
    for (bit = 0; bit < nBits; ++bit)
    {
        /*
         * If the LSB bit is set, set the reflection of it.
         */
        if (data & 0x01)
        {
            reflection |= (1 << ((nBits - 1) - bit));
        }

        data = (data >> 1);
    }

    return (reflection);
} /* reflect() */

/*
 * Function: crcSlow()
 *
 * Description: Compute the CRC of a given message.
 *
 * Notes:
 *
 * Returns: The CRC of the message.
--(DOS)-- crc.c      23% (54,47) (C/I Abbrev)-----
```



emacs@untamo

File Edit Options Buffers Tools C++ Help

```
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.

/*
 * OSAL behavior definition file.
 */

#include "OSAL.hh"
OPERATION(REFLECT)
TRIGGER
unsigned long data = UINT(1);
unsigned char nBits = UINT(2);
unsigned long reflection = 0x00000000;
unsigned char bit;

/*
 * Reflect the data about the center bit.
 */
for (bit = 0; bit < nBits; ++bit)
{
    /*
     * If the LSB bit is set, set the reflection of it.
     */
    if (data & 0x01)
    {
        reflection |= (1 << ((nBits - 1) - bit));
    }

    data = (data >> 1);
}

IO(3) = static_cast<unsigned long>(reflection);
END_TRIGGER;
END_OPERATION(REFLECT)
-----
```

GCU:  
gcu  
mp, call }







# Adding SFU to the Machine and Using it in C Code (1.5 minutes)

- Now that we have defined a new custom operation to the TCE, we can use it in our TTA in a special function unit and execute it from our C code
- Add the custom operation to a new function unit in the TTA with the Processor Designer tool
  - Add a function unit
  - Add ports to the function unit
  - Add the operation to the function unit
  - Edit the operations port bindings, pipeline resource usage, and latency
- In this case we are certain that the REFLECT operation can be done in 2 cycles in hardware
  - Probably 1 cycle would be enough due to the operation's simplicity, but we "play it safe"









# Adding SFU to the Machine and Using it in C Code (1.5 minutes)

- Now the architecture supports the REFLECT custom operation with the added function unit
- Let's now use the REFLECT operation from our C code to accelerate the algorithm
- First add:
  - #include "tceops.h"
  - This brings in the macros that are used to invoke TTA operations manually
- Then call the REFLECT operation through a TCE operation macro:
  - \_TCE\_REFLECT(...);



emacs22-gtk@untamo

File Edit Options Buffers Tools C Help

```
/*
 * Filename: crc.c
 *
 * Description: Slow and fast implementations of the CRC standards.
 *
 * Notes: The parameters for each supported CRC standard are
 * defined in the header file crc.h. The implementations
 * here should stand up to further additions to that list.
 *
 *
 * Copyright (c) 2000 by Michael Barr. This software is placed into
 * the public domain and may be used for any purpose. However, this
 * notice must not be changed or removed and no warranty is either
 * expressed or implied by its publication or distribution.
 */
/*
 * TCE version modified by Otto Esko
 *
 * Changes:
 * Only CRC-32 version is used, other versions are omitted.
 * crcSlow version is preserved although it's not used.
 */
#include "crc.h"
#include "tceops.h"

/*
 * Derive parameters from the standard-specific parameters in crc.h,
 */
#define WIDTH 32
#define TOPBIT (1 << (WIDTH - 1))

#define REFLECT_DATA() ((unsigned char)reflect((X), 8))
#define REFLECT_REMAINDER(X) ((crc)reflect((X), WIDTH))

/*
 * Function: reflect()
 *
 * Description: Reorder the bits of a binary sequence, by reflecting
 * them about the middle position.
 *
 * Notes: No checking is done that nBits <= 32.
-(DOS)** crc.c      Top (28,18) (C/I Abbrev)
```



emacs22-gtk@untamo

File Edit Options Buffers Tools C Help

File Edit Options Buffers Tools C Help

```
* Description: Compute the CRC of a given message.  
* Notes:     crcInit() must be called first.  
* Returns:   The CRC of the message.  
*****  
crc  
crcFast(unsigned char const message[], int nBytes)  
{  
    crc      remainder = INITIAL_REMAINDER;  
    unsigned char data;  
    int       byte;  
  
    /* Divide the message by the polynomial, a byte at a time.  
     */  
    for (byte = 0; byte < nBytes; ++byte)  
    {  
        data = REFLECT_DATA(message[byte]) ^ (remainder >> (WIDTH - 8));  
        remainder = crcTable[data] ^ (remainder << 8);  
    }  
  
    /*  
     * The final remainder is the CRC.  
     */  
    return (REFLECT_REMAINDER(remainder) ^ FINAL_XOR_VALUE);  
} /* crcFast() */
```

(DOS)-- crc.c Bot (216,0) (C/I Abbrev)

(DOS)-- main.c Hit (36,47) (L/I Abbrev)

14 items, Free space: 21.9 GB



emacs22-gtk@untamo

File Edit Options Buffers Tools C Help

```

* Description: Compute the CRC of a given message.
*
* Notes:      crcInit() must be called first.
*
* Returns:    The CRC of the message.[]

crcFast(unsigned char const message[], int nBytes)
{
    crc          remainder = INITIAL_REMAINDER;
    unsigned char data;
    int         byte;
    crc input = 0;
    crc output = 0;

    /*
     * Divide the message by the polynomial, a byte at a time.
     */
    for (byte = 0; byte < nBytes; ++byte)
    {
        input = message[byte];
        _TCE_REFLECT(input,8, output);
        data = output ^ (remainder >> (WIDTH - 8));
        remainder = crcTable[data] ^ (remainder << 8);
    }

    /*
     * The final remainder is the CRC.
     */
    _TCE_REFLECT(remainder, WIDTH, output);
    return (output ^ FINAL_XOR_VALUE);
} /* crcFast() */

```

(DOS)-- crc.c Bot (206,38) (C/I Abbrev)



# Adding SFU to the Machine and Using it in C Code (1.5 minutes)

- Finally, recompile the code which now uses the custom operation, verify that the program still works correctly, and see its effect to the cycle count using the simulator
  - Cycle count now dropped to **403**
  - By using custom operation we reached a lower cycle count with much less hardware
- Now we could use explorer to increase the performance
  - Current architecture has only one bus
  - By increasing concurrency we would reach lower cycle count





# Adding Implementation of the SFU to the Hardware Database (50 sec)

- Now we have found a good custom operation to accelerate our algorithm and used it in our architecture and C code
- In order to generate VHDL for the processor, we still need to add an implementation of the SFU to a Hardware Database (HDB)
- Of course, implementing the SFU might take a bit longer than the 50 sec, thus we use a previously implemented VHDL block for demonstration purposes :)
- HDBEditor is a GUI for editing HDBs, we use it to add the implementation to an HDB along with the data needed to generate a processor
  - The names of the input/output ports and the entity name in the VHDL, etc..





File Edit View Search Tools Documents Help

New Open Save Print... Undo Redo Cut Copy Paste Find Replace

reflect.vhdls

```

        end if;
      end case;
    end process reflect_process;

    reflect_data <= reflect_temp;
end reflect;

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;

entity fu_reflect is

  generic (
    dataw : integer;
    busw : integer);

  port (
    tldata : in std_logic_vector(dataw-1 downto 0);
    tiload : in std_logic;
    oldata : in std_logic_vector(dataw-1 downto 0);
    olload : in std_logic;
    r1data : out std_logic_vector(dataw-1 downto 0);
    clk : in std_logic;
    rstsx : in std_logic;
    glock : in std_logic);
end fu_reflect;

architecture rtl of fu_reflect is

  component reflect
    generic (
      dataw : integer;
      size : integer);
    port (
      data_in : in std_logic_vector(dataw-1 downto 0);
      size : in integer;
      reflect_data : out std_logic_vector(dataw-1 downto 0));
  end component;

  signal tlreg : std_logic_vector(dataw-1 downto 0);
  signal olreg : std_logic_vector(dataw-1 downto 0);
  signal r1 : std_logic;
  signal rlreg : std_logic;
  signal control : std_logic;
  signal result_en_r : std_logic;
begin
  fu_arch : reflect
    generic map (
      dataw =>

```

HDB Editor - intro.hdb

File Edit Help

Global lock port: glock

Global lock req. port:

Architecture ports:

| Name   | Architecture port | Load port | Guard port | Width formula |
|--------|-------------------|-----------|------------|---------------|
| tldata | p1                | tiload    |            | dataw         |
| odata  | p2                | olload    |            | dataw         |
| r1data | p3                |           |            | busw          |

No opcodes.

No external ports.

Parameters:

| Name  | Type    | Value |
|-------|---------|-------|
| dataw | integer | 32    |
| busw  | integer | 32    |

Icon View

Places

- otski
- Desktop
- File System
- Network
- 134.0 GB Media
- 32.2 GB Media
- 10.7 GB Media
- 500.1 GB Media
- CD-RW/DVD±RW D...
- Trash
- Documents
- Music
- Pictures
- Videos
- proffa

crc.n crc.tper crcitable.dat

demo.opb demo.opp intro.dsdb

main.c minimal.adf reflect.vhdls

DOS

"reflect.vhdls" selected (3.9 KB)

# Adding Implementation of the SFU to the Hardware Database (50 sec)

- Now we have added an implementation of the REFLECT SFU to a HDB
- Finally we need to connect the architecture of the FU in our TTA architecture file to this implementation
  - Use automated exploration plugin for this
- In TCE, architecture of the processor components and the actual implementation are separated
  - Architecture components (in ADF files edited with ProDe) are connected to HDB implementations through an Implementation Definition File (IDF)
  - Architecture definition file (ADF), implementation definition file (IDF) and one or more Hardware Databases (HDB) form a “processor configuration” that can be outputted as a VHDL implementation



File Edit View Search Tools Documents Help

New Open Save Print... Undo Redo Cut Copy Paste Find Replace

reflect.vhdl

```

        end if;
      end case;
    end process reflect_process;
  
```

File Edit View Tools Options Help

New Open Save Delete Modify Zoom

minimal.adf

TTA Processor Designer - minimal.adf

Call Explorer Plugin

Explorer Plugin: ImplementationSelector

Plugin Description: Creates implementation for the given machine.

Plugin Parameters:

| Name   | Type   | Value               |
|--------|--------|---------------------|
| adf    | string |                     |
| ic_dec | string | DefaultICDecoder    |
| ic_hdb | string | asic_130nm_1.5V.hdb |

Edit Parameter... Run Plugin Close

DOS

```

data_in      : ;
size         : ;
reflect_data : ;
end component;

signal tlreg   : s;
signal olreg   : s;
signal r1      : s;
signal rlreg   : s;
signal control : s;

signal result_en_r

begin

  fu_arch : reflect
  generic map (
    dataw     =>
  
```

"reflect.vhdl" selected (3.9 KB)



# Generating the Processor (32 sec)

- Now we have all we need to generate the processor implementation in VHDL
- For this we use the Processor Generator (ProGe) tool which can be invoked from the command line or from the ProDe GUI









File Edit View Search Tools Documents Help

New Open Save Print... Undo Redo Cut Copy Paste Find Replace

reflect.vhdl toplevel.vhdl

```

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use work.globals.all;
use work.util.all;
use work.imem_mau.all;
use work.toplevel_params.all;

entity toplevel is

    port (
        clk : in std_logic;
        rsts : in std_logic;
        busy : in std_logic;
        imem_en_x : out std_logic;
        imem_addr : out std_logic_vector(IMEMADDRWIDTH-1 downto 0);
        imem_data : in std_logic_vector(IMEMWIDTHINMAUS*IMEMMAUWIDTH-1 downto 0);
        pc_init : in std_logic_vector(IMEMADDRWIDTH-1 downto 0);
        fu_lsu_data_in : in std_logic_vector(fu_lsu_dataw-1 downto 0);
        fu_lsu_data_out : out std_logic_vector(fu_lsu_dataw-1 downto 0);
        fu_lsu_addr : out std_logic_vector(fu_lsu_addrw-2 downto 0);
        fu_lsu_mem_en_x : out std_logic_vector(0 downto 0);
        fu_lsu_wr_en_x : out std_logic_vector(0 downto 0);
        fu_lsu_wr_mask_x : out std_logic_vector(fu_lsu_dataw-1 downto 0));

end toplevel;

architecture structural of toplevel is

    signal inst_fetch_ra_out_wire : std_logic_vector(IMEMADDRWIDTH-1 downto 0);
    signal inst_fetch_ra_in_wire : std_logic_vector(IMEMADDRWIDTH-1 downto 0);
    signal inst_fetch_pc_in_wire : std_logic_vector(IMEMADDRWIDTH-1 downto 0);
    signal inst_fetch_pc_load_wire : std_logic;
    signal inst_fetch_ra_load_wire : std_logic;
    signal inst_fetch_pc_opcode_wire : std_logic_vector(0 downto 0);
    signal inst_fetch_fetch_en_wire : std_logic;
    signal inst_fetch_glock_wire : std_logic;
    signal inst_fetch_fetchblock_wire : std_logic_vector(IMEMWIDTHINMAUS*IMEMMAUWIDTH-1 downto 0);
    signal fu_lsu_tldata_wire : std_logic_vector(23 downto 0);
    signal fu_lsu_tload_wire : std_logic;
    signal fu_lsu_oldata_wire : std_logic_vector(31 downto 0);
    signal fu_lsu_olload_wire : std_logic;
    signal fu_lsu_rldata_wire : std_logic_vector(31 downto 0);
    signal fu_lsu_tlopcode_wire : std_logic_vector(2 downto 0);
    signal fu_lsu_glock_wire : std_logic;
    signal fu_logic_tldata_wire : std_logic_vector(31 downto 0);
    signal fu_logic_tlload_wire : std_logic;
    signal fu_logic_oldata_wire : std_logic_vector(31 downto 0);
    signal fu_logic_olload_wire : std_logic;
    signal fu_logic_rldata_wire : std_logic_vector(31 downto 0);
    signal fu_logic_tlopcode_wire : std_logic_vector(1 downto 0);
    signal fu_logic_glock_wire : std_logic;
    signal fu_reflect_tldata_wire : std_logic_vector(31 downto 0);

```

mininal.adf

FU: File Edit otski@un Warning: Warning: ddress s otski@un

tw\_stq\_u.vhdl  
u.vhdl  
rams\_dl

(DOS)

File Edit Opt

New Open Op

Ln 1, Col 1 INS

Loading file '/home/otski/intro/proge-output/vhdl/toplevel.vhdl'...

# Change Load-store Unit to an Avalon Bus Load/store Unit (36 secs)

- Next we'll use an FPGA board to test the processor
- For this we need to change the load-store unit function unit implementation to one that supports Altera's Avalon interface
  - We'll use the Altera Memory Mapped Interface
  - TTA acts as a master on the bus
  - This way we can use Altera's IP-components
- This can be done quickly with the Processor Designer tool



File Edit View Search Tools Documents Help

New Open Save Print... Undo Redo Cut Copy Paste Find Replace

reflect.vhd़ toplevel.vhd़

```
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use work_globals.all;
```

TTA Processor Designer - minimal.adf

File Edit View Tools Options Help

New Open Save Delete Modify Zoom + Zoom - Fit Win Options Help

minimal.adf

The diagram shows a horizontal bus at index 0. Seven functional units (FUs) are connected to this bus. From left to right, the FUs are: lsu, alu\_comp, logic, ext1, ext2, shifter, and rotat. Each FU has two inputs and one output. The lsu FU is currently selected, indicated by a dashed blue border around its box.

```

signal inst_fetch_fetch_en_wire : std_logic;
signal inst_fetch_glock_wire : std_logic;
signal inst_fetch_fetchblock_wire : std_logic_vector(IMEMWIDTHINMAUS*IMEMMAUWIDTH-1 downto 0);
signal fu_lsu_tldata_wire : std_logic_vector(23 downto 0);
signal fu_lsu_tload_wire : std_logic;
signal fu_lsu_oldata_wire : std_logic_vector(31 downto 0);
signal fu_lsu_olload_wire : std_logic;
signal fu_lsu_rldata_wire : std_logic_vector(31 downto 0);
signal fu_lsu_tlopcode_wire : std_logic_vector(2 downto 0);
signal fu_lsu_glock_wire : std_logic;
signal fu_logic_tldata_wire : std_logic_vector(31 downto 0);
signal fu_logic_tlload_wire : std_logic;
signal fu_logic_oldata_wire : std_logic_vector(31 downto 0);
signal fu_logic_olload_wire : std_logic;
signal fu_logic_rldata_wire : std_logic_vector(31 downto 0);
signal fu_logic_tlopcode_wire : std_logic_vector(1 downto 0);
signal fu_logic_glock_wire : std_logic;
signal fu_reflect_tldata_wire : std_logic_vector(31 downto 0);

```

(DOS)

Ln 5, Col 19 INS

File Edit View Search Tools Documents Help

New Open Save Print... Undo Redo Cut Copy Paste Find Replace

reflect.vhdl toplevel.vhdl

library IEEE;  
use IEEE.std\_logic\_1164.all;  
use IEEE.std\_logic\_arith.all;  
use work\_globals.all;

TTA Processor Designer - minimal.adf

File Edit View Tools Options Help

Undo Ctrl+Z  
Redo Ctrl+Y

Cut Ctrl+X  
Copy Ctrl+C  
Paste Ctrl+V

Add >  
Add From HDB > Function Unit...  
Delete  
Modify...  
Address Spaces...  
Instruction Templates...  
Immediate Slots...  
Transport Bus Order...

Ctrl+X Zoom + Fit Win Options Help

minim... (DOS)

```

signal inst_fetch_fetch_en_wire : std_logic;
signal inst_fetch_glock_wire : std_logic;
signal inst_fetch_fetchblock_wire : std_logic_vector(IMEMWIDTHINMAUS*IMEMMAUWIDTH-1 downto 0);
signal fu_lsu_tldata_wire : std_logic_vector(23 downto 0);
signal fu_lsu_tload_wire : std_logic;
signal fu_lsu_oldata_wire : std_logic_vector(31 downto 0);
signal fu_lsu_olload_wire : std_logic;
signal fu_lsu_rldata_wire : std_logic_vector(31 downto 0);
signal fu_lsu_tlopcode_wire : std_logic_vector(2 downto 0);
signal fu_lsu_glock_wire : std_logic;
signal fu_logic_tldata_wire : std_logic_vector(31 downto 0);
signal fu_logic_tload_wire : std_logic;
signal fu_logicOldData_wire : std_logic_vector(31 downto 0);
signal fu_logic_olload_wire : std_logic;
signal fu_logic_rldata_wire : std_logic_vector(31 downto 0);
signal fu_logic_tlopcode_wire : std_logic_vector(1 downto 0);
signal fu_logic_glock_wire : std_logic;
signal fu_reflect_tldata_wire : std_logic_vector(31 downto 0);

```

Ln 5, Col 19 INS

File Edit View Search Tools Documents Help

New Open Save Print... Undo Redo Cut Copy Paste Find Replace

reflect.vhd़ toplevel.vhd़

```
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use work.aloholic.all.
```

TTA Processor Designer - minimal.adf

File Edit View Tools Options Help

New Open Save Delete Modify Zoom + Zoom - Fit Win Options Help

minimal.adf

**HDB Function Units**

| Latency | Operations                         | Impls | HDB ID                                | HDB |
|---------|------------------------------------|-------|---------------------------------------|-----|
| 1       | and(1), ior(1)                     | 14    | /home/otski/work/tce-1.0/tce/hdb/asic |     |
| 1       | and(1), ior(1)                     | 13    | /home/otski/work/tce-1.0/tce/hdb/asic |     |
| 2       | and(2), ior(2)                     | 12    | /home/otski/work/tce-1.0/tce/hdb/asic |     |
| 2       | and(2), ior(2)                     | 11    | /home/otski/work/tce-1.0/tce/hdb/asic |     |
| 2       | add(2), sub(2)                     | 10    | /home/otski/work/tce-1.0/tce/hdb/asic |     |
| 2       | add(2), sub(2)                     | 9     | /home/otski/work/tce-1.0/tce/hdb/asic |     |
| 1       | add(1), sub(1)                     | 8     | /home/otski/work/tce-1.0/tce/hdb/asic |     |
| 1       | add(1), sub(1)                     | 7     | /home/otski/work/tce-1.0/tce/hdb/asic |     |
| 2       | add(2)                             | 6     | /home/otski/work/tce-1.0/tce/hdb/asic |     |
| 2       | add(2)                             | 5     | /home/otski/work/tce-1.0/tce/hdb/asic |     |
| 1       | add(1)                             | 4     | /home/otski/work/tce-1.0/tce/hdb/asic |     |
| 1       | add(1)                             | 3     | /home/otski/work/tce-1.0/tce/hdb/asic |     |
| 1       | sxhw(1)                            | 2     | /home/otski/work/tce-1.0/tce/hdb/asic |     |
| 1       | sxhw(1)                            | 1     | /home/otski/work/tce-1.0/tce/hdb/asic |     |
| 2       | reflect(2)                         | 1     | /home/otski/intro/intro.hdb           |     |
| 6       | ldh(6), ldhu(6), ldq(6), ldqu(...  | 2     | /home/otski/intro/avalon.hdb          |     |
| 1..3    | ldw(3), ldq(3), ldh(3), stw(1),... | 1     | /home/otski/intro/avalon.hdb          |     |

Add Close

```
signal inst_fetch_fetch_en_wire : std_logic;
signal inst_fetch_glock_wire : std_logic;
signal inst_fetch_fetchblock_wire : std_logic;
signal fu_lsu_tldata_wire : std_logic;
signal fu_lsu_tload_wire : std_logic;
signal fu_lsu_oldata_wire : std_logic;
signal fu_lsu_olload_wire : std_logic;
signal fu_lsu_rldata_wire : std_logic;
signal fu_lsu_tlopcode_wire : std_logic_vector(2 downto 0);
signal fu_lsu_glock_wire : std_logic;
signal fu_logic_tldata_wire : std_logic_vector(31 downto 0);
signal fu_logic_tlload_wire : std_logic;
signal fu_logic_oldata_wire : std_logic_vector(31 downto 0);
signal fu_logic_olload_wire : std_logic;
signal fu_logic_rldata_wire : std_logic_vector(31 downto 0);
signal fu_logic_tlopcode_wire : std_logic_vector(1 downto 0);
signal fu_logic_glock_wire : std_logic;
signal fu_reflect_tldata_wire : std_logic_vector(31 downto 0);
```

Ln 5, Col 19 INS

The screenshot shows the TTA Processor Designer interface. At the top, there's a menu bar with File, Edit, View, Tools, Options, and Help. Below it is a toolbar with icons for New, Open, Save, Delete, Modify, and various zoom and help functions. A file list on the left shows 'reflect.vhd़' and 'toplevel.vhd़'. The main area has two functional units: 'FU: alu\_comp' and 'FU: logic'. The 'alu\_comp' unit contains four inputs and four outputs, while the 'logic' unit contains four inputs and four outputs. Below these are four vertical registers. A horizontal bus labeled '0' connects the outputs of the 'alu\_comp' unit to the inputs of the 'logic' unit. The 'HDB Function Units' dialog box is open, listing various operations with their latencies, implementations, HDB IDs, and paths. One entry for 'ldh(6)' is highlighted. At the bottom, a large code editor window displays VHDL code for signal declarations, including various wires like 'inst\_fetch\_fetch\_en\_wire', 'fu\_lsu\_tldata\_wire', and 'fu\_reflect\_tldata\_wire'.

File Edit View Search Tools Documents Help

New Open Save Print... Undo Redo Cut Copy Paste Find Replace

reflect.vhdl toplevel.vhdl

```
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use work.aloha.all;
```

TTA Processor Designer - minimal.adf

File Edit View Tools Options Help

New Open Save Delete Modify Zoom + Zoom - Fit Win Options Help

minimal.adf

The diagram illustrates a TTA (Throughput-Timed Architectural) processor architecture. It features a row of functional units (FUs) connected to a shared memory bus. From left to right, the FUs are: ext1 (sxhw), ext2 (sqxw), shifter (shl, shr, shru), rotator (rotl, rotr), reflect, and avalon\_mm\_int (ldh, ldhu, ldq, ldqu, ldw, sth, stq). A separate RF (10x32) is also shown. Each FU has two vertical ports, and the RF has one vertical port. The shared bus is represented by a horizontal line with black dots indicating connection points.

```
--> inst_fetch_tldata : std_logic_vector(31 downto 0);
signal inst_fetch_glock : std_logic;
signal inst_fetch_tldata_wire : std_logic_vector(31 downto 0);
signal fu_lsuldata_wire : std_logic_vector(31 downto 0);
signal fu_lsulload_wire : std_logic;
signal fu_lsulodata_wire : std_logic_vector(31 downto 0);
signal fu_lsulload_wire : std_logic;
signal fu_lsurldata_wire : std_logic_vector(31 downto 0);
signal fu_lsutopcode_wire : std_logic_vector(2 downto 0);
signal fu_lsuglock_wire : std_logic;
signal fu_logictldata_wire : std_logic_vector(31 downto 0);
signal fu_logictlload_wire : std_logic;
signal fu_logicOldData_wire : std_logic_vector(31 downto 0);
signal fu_logicOlload_wire : std_logic;
signal fu_logicRldata_wire : std_logic_vector(31 downto 0);
signal fu_logictopcode_wire : std_logic_vector(1 downto 0);
signal fu_logicglock_wire : std_logic;
signal fu_reflect_tldata_wire : std_logic_vector(31 downto 0);
```

Ln 5, Col 19 INS

# Using Avalon LCD for Output (21 secs)

- Now the TTA can interface with the memory (and other I/O) in the FPGA board using the Avalon bus
- Finally, we need a device to produce some output from our CRC computation to verify it actually works
- For this we use an LCD screen connected through the Avalon bus
  - We use the LCD controller from SOPC Builder's IP component library
- The LCD controller is connected to the Avalon Memory Mapped bus interface so we can define a putchar() function (which is used by printf()) that writes characters to the controller's memory mapped registers



File Edit View Search Tools Documents Help

emacs22-gtk@untamo

File Edit Options Buffers Tools C Help

```

* TCE version modified by Otto Esko
* Notes: Only crc-32 is used
***** _DEBUG
#endif /* _DEBUG */

#include "crc.h"
#include "lcd_driver/tta_lcd.h"

volatile crc result = 0;
unsigned char test[] = "TCE rocks!";
#define LENGTH 10

altera_avalon_lcd_16207_state lcd;

/* define our own putchar */
#undef putchar
int putchar(int ch) {
    lcd_write_data(&lcd, ch);
}

int
main(void) {
    /* init lcd */
    altera_avalon_lcd_16207_init(&lcd);
    /*
     * Compute the CRC of the test message, more efficiently.
     */
    result = crcFast(test, LENGTH);

#ifdef _DEBUG
    printf("TCE rocks! is\n 0x%08X\n", result);
#endif /* _DEBUG */

    return 0;
}

```

--(DOS)\*\* main.c Bot (35,26) (C/I Abbrev)

signal fu\_lsu\_rlda  
signal fu\_lsu\_tlop  
signal fu\_lsu\_gloc  
signal fu\_logic\_t1  
signal fu\_logic\_t1  
signal fu\_logic\_o1  
signal fu\_logic\_o1  
signal fu\_logic\_r1  
signal fu\_logic\_t1  
signal fu\_logic\_g1  
signal fu\_reflect

"tta\_lcd.c" selected (17.7 KB)

File Edit Options Buffers Tools C Help

File Edit Options Buffers Tools C Help

lcd\_driver - File Browser

Help

Reload Home Computer Search

100% Icon View

\_pkg.vhdl  
 main.c  
 minimal.adf  
 minimal.bem  
 img  
 prog.tpef  
 prog\_data.img  
 tta\_lcd.c  
 cd.h  
 tta\_sys.c  
 tta\_sys.h

# Generate the Bit Image of the Program Memory and Synthesize the Design (2 minutes)

- Finally, to get the TTA running on the FPGA we need to generate a bit image of the program memory
  - Use command line tool generatebits



New Open Save Print... Undo Redo Cut Copy Paste Find Replace

reflect.vhdl toplevel.vhdl

```
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use work.aloha.all;
```

## TTA Processor Des

File Edit View Tools Options Help



minimal.adf

FU: FU: FU:

otski@untamo: ~/intro

```
File Edit View Terminal Tabs Help
otski@untamo:~/intro$ generatebits -d -w 4 -f array -p crc.tpef minimal.adf
BEM file was not given --> Generating default BEM..
otski@untamo:~/intro$
```

emacs22-gtk@untamo

File Edit Options Buffers Tools C Help



```
typedef int alt_alarm;
typedef struct altera_avalon_lcd_16207_state_s
{
    int          base;
    alt_alarm    alarm;
    int          period;
    char         broken;
    unsigned char x;
```

routines are  
t update the

7\_state\* sp);



# Generate the Bit Image of the Program Memory and Synthesize the Design

- Load the VHDL files of the generated TTA processor to the Altera's Quartus II tool to synthesize the design to the FPGA
  - We add a layer on top of the ProGe generated toplevel.vhdl (not displayed in the video)
    - In this case the instruction memory is very small so we implement as “logic”
    - Synthesize tools optimizes it into a small space of internal memory banks and logic
  - The external interface of the new layer is the external buses of the load-store unit we added, and control signals (clk, reset)
    - The LSU interface is actually the Avalon interface
  - Then in Altera's SOPC builder we export TTA as a component to the design along with the onchip memory and the LCD component and connect them all to the Avalon bus
    - TTA is the Avalon Master and the memory and LCD controller are slaves
  - Synthesize the design to the FPGA and note how many of the logic elements were consumed of the FPGA by our TTA
  - Finally, upload the design to the FPGA board





File Edit View Project Assignments

File Edit Options Buffers Tools VHDL Help

Project Navigator: testi.qip

```

use work.imem_mau.all;
use work.toplevel_params.all;

entity tta_processor is
  port (
    -- CLOCK_50 : in std_logic;
    KEY : in std_logic_vector(3 downto 0);
    clk : in std_logic;
    rst_n : in std_logic;
    d_address : out std_logic_vector(11-1 downto 0);
    d_bytewable : out std_logic_vector(4-1 downto 0);
    d_read : out std_logic;
    d_write : out std_logic;
    d_writedata : out std_logic_vector(32-1 downto 0);
    d_irq : in std_logic_vector(32-1 downto 0);
    d_readdata : in std_logic_vector(32-1 downto 0);
    d_waitrequest : in std_logic
  );
end tta_processor;

architecture top of tta_processor is

component topLevel is
  port (
    clk : in std_logic;
    rstx : in std_logic;
    busy : in std_logic;
    imem_en_x : out std_logic;
    imem_addr : out std_logic_vector(IMEMADDRWIDTH-1 downto 0);
    imem_data : in std_logic_vector(IMEMWIDTHINMAUS*IMEMMAUWIDTH-1 downto 0);
    pc_init : in std_logic_vector(IMEMADDRWIDTH-1 downto 0);
    fu_lsu_d_address : out std_logic_vector(fu_lsu_a_addrw_g-1 downto 0);
    fu_lsu_d_bytewable : out std_logic_vector(fu_lsu_a_bytewable_g-1 downto 0);
    fu_lsu_d_read : out std_logic_vector(0 downto 0);
    fu_lsu_d_write : out std_logic_vector(0 downto 0);
    fu_lsu_d_writedata : out std_logic_vector(fu_lsu_a_dataw_g-1 downto 0);
    fu_lsu_d_irq : in std_logic_vector(fu_lsu_a_irqw_g-1 downto 0);
    fu_lsu_d_readdata : in std_logic_vector(fu_lsu_a_dataw_g-1 downto 0);
    fu_lsu_d_waitrequest : in std_logic_vector(0 downto 0);
  );
end component;

signal dummy_wire : std_logic;
signal clk_wire : std_logic;
signal reset_x_wire : std_logic;
signal busy_wire : std_logic;
signal pc_init_wire : std_logic_vector(IMEMADDRWIDTH-1 downto 0);

signal imem_addr_wire : std_logic_vector(IMEMADDRWIDTH-1 downto 0);
signal imem_data_wire : std_logic_vector(IMEMWIDTHINMAUS*IMEMMAUWIDTH-1 downto 0);
signal d_address_w : std_logic_vector(fu_lsu_a_addrw_g-1 downto 0);
signal d_bytewable_w : std_logic_vector(fu_lsu_a_bytewable_g-1 downto 0);
signal d_read_w : std_logic;
signal d_write_w : std_logic;
signal d_writedata_w : std_logic_vector(fu_lsu_a_dataw_g-1 downto 0);
signal d_irq_w : std_logic_vector(fu_lsu_a_irqw_g-1 downto 0);
signal d_readdata_w : std_logic_vector(fu_lsu_a_dataw_g-1 downto 0);
signal d_waitrequest_w : std_logic;

component inst_mem_logic is
  generic (
    addrw : integer := 10;
    instru : integer := 100);
  port (
    clock : in std_logic;
    addr : in std_logic_vector(addrw-1 downto 0);
    -- tta_processor.vhd 3% (22.33) (VHDL/es)
  );
end component;

```

Hierarchy | Files | Design Units

Tasks: Compilation

Flow: Compilation

Task List:

- Compile Design
- Analysis & Synthesis
- Fitter (Place & Route)
- Assembler (Generate)
- Classic Timing Analysis
- EDA Netlist Writer

Program Device (Open)

Type Message

System Processing Extra Information

Message:

For Help, press F1

"tta\_icd.c" selected (17.7 KB)

The screenshot shows the Idera Virtus II Version 8.0 software interface. On the left, there is a project navigator window for a file named 'testi.qip'. Below it is a tasks list with various compilation steps checked off. A large central area is a VHDL code editor with syntax highlighting for VHDL components like 'tta\_processor' and 'topLevel'. To the right of the code editor is a graphical interface showing a large 'IDERA' logo and 'VIRTUS II Version 8.0' text. There are also buttons for 'Download New Software Release' and 'Documentation'.

File Edit Module System View Tools Help

System Contents System Generation

Altera SOPC Builder

- Create new component...
- Nios II Processor
- Bridges and Adapters
  - Memory Mapped
  - Streaming
- Interface Protocols
- Legacy Components
- Memories and Memory Controllers
- Peripherals
  - Debug and Performance
  - Display
    - Character LCD
    - Pixel Converter (BGR)
    - Video Sync Generator
  - FPGA Peripherals
  - Microcontroller Peripherals
    - Interval Timer
    - PIO (Parallel I/O)
  - Multiprocessor Coordination
- PLL
- USB
- User Logic
  - n2h2\_str
  - tta\_processor
- Video and Image Processing

Target

Device Family: Cyclone II

Clock Settings

| Name | Source   | MHz  |
|------|----------|------|
| clk  | External | 50.0 |

Add Remove

Module Name Description Clock Base End IRQ

|                                                |                                   |     |            |            |  |
|------------------------------------------------|-----------------------------------|-----|------------|------------|--|
| <input checked="" type="checkbox"/> processor  | tta_processor                     | clk | IRQ 0      | IRQ 31     |  |
| <input checked="" type="checkbox"/> onchip_mem | avalon_master                     |     | 0x00000000 | 0x000003ff |  |
| <input checked="" type="checkbox"/> s1         | On-chip memory                    |     |            |            |  |
| <input checked="" type="checkbox"/> lcd        | tta_processor [tta_processor 1.0] | clk | 0x00000700 | 0x0000070f |  |
| control_slave                                  | Character LCD                     |     |            |            |  |
|                                                | Avalon Memory Mapped Slave        |     |            |            |  |

Add... Remove Edit Move Up Move Down Address Map... Filter...

Info: Your system is ready to generate.

For Help, press F1

Exit Help Prev Next Generate

Idle

File Edit Module System View Tools Help

System Contents System Generation

Options  
System module logic will be created in VHDL  
 Simulation. Create project simulator files.

```
Info: functions, and any output files from any of the foregoing
Info: (including device programming or simulation files), and any
Info: associated documentation or information are expressly subject
Info: to the terms and conditions of the Altera Program License
Info: Subscription Agreement, Altera MegaCore Function License
Info: Agreement, or other applicable license agreement, including,
Info: without limitation, that your use is for the sole purpose of
Info: programming logic devices manufactured by Altera and sold by
Info: Altera or its authorized distributors. Please refer to the
Info: applicable agreement for further details:
Info: Processing started: Fri Feb 13 09:58:45 2009
① Info: Command: quartus_sh -t testi_setup_quartus.tcl
① Info: Evaluation of Tcl script testi_setup_quartus.tcl was successful
① Info: Quartus II Shell was successful. 0 errors, 0 warnings
Info: Processing ended: Fri Feb 13 09:58:45 2009
Info: Elapsed time: 00:00:00
Info: Total CPU time (on all processors): 00:00:00
# 2009.02.13 09:58:45 (*) Completed generation for system: testi.
# 2009.02.13 09:58:45 (*) THE FOLLOWING SYSTEM ITEMS HAVE BEEN GENERATED:
SOPC Builder database : /home/otski/hibi/n2h2/poista/testi.ptf
System HDL Model : /home/otski/hibi/n2h2/poista/testi.vhd
System Generation Script : /home/otski/hibi/n2h2/poista/testi_generation_script
# 2009.02.13 09:58:45 (*) SUCCESS: SYSTEM GENERATION COMPLETED.
① Info: System generation was successful.

① Info: Your system is ready to generate.
```

For Help, press F1

Idle

File Edit Processing Tools Window

Hardware Setup...

USB-Blaster [USB 4-1.1]

Mode: JTAG

Progress:

0 %

Enable real-time ISP to allow background programming (for MAX II devices)

Start

Stop

Auto Detect

Delete

Add File...

Change File...

Save File...

Add Device...

Up

Down

| File      | Device     | Checksum | Usercode | Program/<br>Configure               | Verify                   | Blank-<br>Check          | Examine                  | Security<br>Bit          | Erase                    | ISP<br>CLAMP             |  |
|-----------|------------|----------|----------|-------------------------------------|--------------------------|--------------------------|--------------------------|--------------------------|--------------------------|--------------------------|--|
| testi.sof | EP2C35F672 | 004C46B0 | FFFFFFF  | <input checked="" type="checkbox"/> | <input type="checkbox"/> |  |

testi.sof

EP2C35F672

004C46B0

FFFFFFF

Task Flow

For Help, press F1

System (8) / Processing (200) / Extra Info / Info (143) / Warning (56) / Critical Warning (1) / Error / Suppressed (6) / Flag /

Message: 0 of 8

Location:

Locate

For Help, press F1

Idle

"tta\_lcd.c" selected (17.7 KB)

# The End – Thanks for Your Attention!

