2023 | Book

Guide to Computer Processor Architecture

A RISC-V Approach, with High-Level Synthesis

About this book

The book presents a succession of RISC-V processor implementations of increasing difficulty (non-pipelined, pipelined, deeply pipelined, multithreaded, multicore).
Each implementation is given as HLS (High-Level Synthesis) code in C++ that can actually be synthesized and tested on an FPGA-based development board (university professors can obtain such a board free of charge from the Xilinx University Program).
The book is useful for three reasons. First, it offers a novel way to introduce computer architecture: the code it provides can serve as labs for a processor architecture course. Second, its content is based on the RISC-V Instruction Set Architecture, an open-source machine language expected to become the machine language taught in courses, replacing DLX and MIPS. Third, all the designs are implemented through High-Level Synthesis, a tool able to translate a C program into an IP (Intellectual Property). Hence, the book can serve engineers who want to implement processors on FPGAs and researchers who want to develop RISC-V based hardware simulators.

Table of contents

Frontmatter

Single Core Processors

Frontmatter
1. Introduction: What Is an FPGA, What Is High-Level Synthesis or HLS?
Abstract
This chapter shows what an FPGA is and how it is built from Configurable Logic Blocks, or CLBs (the Xilinx term; Altera FPGAs call them LABs, i.e. Logic Array Blocks). It also shows how hardware is mapped onto the CLB resources and how a C program can be used to describe a circuit. An HLS tool transforms the C source code into intermediate code in VHDL or Verilog, and a placement and routing tool builds the bitstream sent to configure the FPGA.
Bernard Goossens
2. Setting up and Using the Vitis_HLS, Vivado, and Vitis IDE Tools
Abstract
This chapter gives you the basic instructions to set up the Xilinx tools, implement a circuit on an FPGA, and test it on a development board. It is presented as a lab that you should carry out. The aim is to learn how to use the Vitis/Vivado tools to design, implement, and run an IP.
Bernard Goossens
3. Installing and Using the RISC-V Tools
Abstract
This chapter gives you the basic instructions to set up the RISC-V tools, i.e. the RISC-V toolchain and the RISC-V simulator/debugger. The toolchain includes a cross-compiler that produces RISC-V RV32I machine code. The spike simulator/debugger is useful to run RISC-V code without RISC-V hardware. The result of a spike simulation is to be compared with the result of a run on an FPGA implementation of a RISC-V processor IP.
Bernard Goossens
4. The RISC-V Architecture
Abstract
This chapter briefly presents the RISC-V architecture and more precisely, its RV32I instruction set with examples taken from the compiler translations of small C codes.
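As an illustration of the RV32I encoding, here is a minimal sketch of how the fields of an I-type instruction (such as addi) can be extracted from a 32-bit word. The names ITypeFields and decode_i_type are hypothetical and not taken from the book; the bit positions follow the RISC-V base instruction format.

```cpp
#include <cassert>
#include <cstdint>

// Fields of an RV32I I-type instruction (e.g. addi, lw, jalr).
// ITypeFields and decode_i_type are hypothetical names for this sketch.
struct ITypeFields {
    uint32_t opcode;  // bits 6..0
    uint32_t rd;      // bits 11..7
    uint32_t funct3;  // bits 14..12
    uint32_t rs1;     // bits 19..15
    int32_t  imm;     // bits 31..20, sign-extended
};

ITypeFields decode_i_type(uint32_t instruction) {
    // Every 32-bit base instruction has its two low-order bits set to 11.
    assert((instruction & 0x3u) == 0x3u);
    ITypeFields f;
    f.opcode = instruction & 0x7fu;
    f.rd     = (instruction >> 7) & 0x1fu;
    f.funct3 = (instruction >> 12) & 0x7u;
    f.rs1    = (instruction >> 15) & 0x1fu;
    // Take the 12-bit immediate, then subtract 4096 when its sign bit is set.
    f.imm = static_cast<int32_t>(instruction >> 20);
    if (instruction & 0x80000000u) f.imm -= 4096;
    return f;
}
```

For example, the word 0x00150513 encodes addi a0, a0, 1 (opcode 0x13, rd and rs1 both x10, immediate 1), and 0xff010113 encodes addi sp, sp, -16.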
Bernard Goossens
5. Building a Fetching, Decoding, and Executing Processor
Abstract
This chapter prepares the building of your first RISC-V processor. First, a fetching machine is implemented. It is only able to fetch successive words from a code memory. Second, the fetching machine is upgraded to include a decoding mechanism. Third, the fetching and decoding machine is completed with an execution engine to run computation and control instructions, but not yet memory accessing ones.
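The first step, a machine that only fetches successive words from a code memory, might be sketched as follows in plain C++ (no HLS pragmas). The function name fetch_until_ret and the stop condition (halting when a ret instruction is fetched) are assumptions made for this illustration, not the book's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption for this sketch: the machine stops when it fetches `ret`
// (jalr x0, 0(x1)), whose RV32I encoding is 0x00008067.
constexpr uint32_t RET = 0x00008067u;

// fetch_until_ret is a hypothetical name. It reads successive words from
// code_mem, starting at word index 0, and returns how many words were
// fetched, including the final ret if one is found.
std::size_t fetch_until_ret(const uint32_t* code_mem, std::size_t size_in_words) {
    assert(code_mem != nullptr);
    std::size_t pc = 0;       // word index, not a byte address
    std::size_t fetched = 0;
    while (pc < size_in_words) {
        uint32_t instruction = code_mem[pc];  // fetch
        ++fetched;
        if (instruction == RET) break;        // stop condition (assumed)
        ++pc;                                 // next sequential word
    }
    return fetched;
}
```

The loop does no decoding beyond the comparison with ret, which mirrors the idea of a machine that is only able to fetch.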
Bernard Goossens
6. Building a RISC-V Processor
Abstract
This chapter makes you build your first RISC-V processor. The implemented microarchitecture proposed in this first version is not pipelined. The IP cycle encompasses the fetch, the decoding, and the execution of an instruction.
Bernard Goossens
7. Testing Your RISC-V Processor
Abstract
This chapter lets you test your first RISC-V processor in three steps: testing all the instructions in their most frequent usage (my six test programs), passing the official riscv-tests, and running benchmark programs from the mibench suite and from the official riscv-tests.
Bernard Goossens
8. Building a Pipelined RISC-V Processor
Abstract
This chapter will make you build your second RISC-V processor. The implemented microarchitecture proposed in this second version is pipelined. Within a single processor cycle, the updated processor fetches and decodes instruction i, executes instruction i-1, accesses memory for instruction i-2 and writes a result back for instruction i-3.
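The overlap described above can be sketched as a function giving the stage an instruction occupies at a given cycle. This is an idealized model, assuming that instruction i enters the pipeline at cycle i and that nothing ever stalls; the names Stage and stage_at are hypothetical.

```cpp
#include <cassert>

// The four stages named in the abstract, plus a marker for instructions
// that have not entered the pipeline yet or have already left it.
enum class Stage { FetchDecode, Execute, MemoryAccess, WriteBack, NotInPipeline };

// Assumption: instruction i is fetched at cycle i and advances one stage
// per cycle, with no stall or branch penalty.
Stage stage_at(int instruction, int cycle) {
    assert(instruction >= 0 && cycle >= 0);
    switch (cycle - instruction) {
        case 0:  return Stage::FetchDecode;
        case 1:  return Stage::Execute;
        case 2:  return Stage::MemoryAccess;
        case 3:  return Stage::WriteBack;
        default: return Stage::NotInPipeline;
    }
}
```

At cycle 5, for instance, instruction 5 is being fetched and decoded while instruction 2 writes its result back.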
Bernard Goossens
9. Building a RISC-V Processor with a Multicycle Pipeline
Abstract
This chapter will make you build your third RISC-V processor. The microarchitecture implemented in this third version handles dependencies by blocking an instruction in the pipeline until all the instructions it depends on have left the pipeline. For this purpose, a new issue stage is added. Moreover, the pipeline stages are organized to allow an instruction to stay multiple cycles in the same stage. The instruction processing is divided into six steps in order to further reduce the processor cycle to two FPGA cycles (i.e. 50 MHz): fetch, decode, issue, execute, memory access, and writeback. This multicycle pipeline microarchitecture is useful when the operators have different latencies, like multicycle arithmetic or memory accesses.
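One common way to block a dependent instruction at the issue stage is a table of register lock bits. The sketch below shows that general technique under stated assumptions: it covers only read-after-write dependencies on the 32 integer registers, and the RegisterLocks type is hypothetical, not the book's implementation.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical lock table: one bit per RV32I integer register.
// A set bit means an instruction still in the pipeline will write it.
struct RegisterLocks {
    std::bitset<32> locked;

    // An instruction may issue only if neither source awaits a result.
    bool can_issue(unsigned rs1, unsigned rs2) const {
        assert(rs1 < 32 && rs2 < 32);
        return !locked.test(rs1) && !locked.test(rs2);
    }

    // Called when an instruction issues: lock its destination register.
    // x0 is hardwired to zero, so it never needs to be locked.
    void lock(unsigned rd) {
        assert(rd < 32);
        if (rd != 0) locked.set(rd);
    }

    // Called at writeback: the result is available, release the register.
    void unlock(unsigned rd) {
        assert(rd < 32);
        locked.reset(rd);
    }
};
```

An instruction whose source is locked simply stays in the issue stage for another cycle, which is why the stages must tolerate multi-cycle occupancy.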
Bernard Goossens
10. Building a RISC-V Processor with a Multiple Hart Pipeline
Abstract
This chapter will make you build your fourth RISC-V processor. The microarchitecture implemented in this fourth version improves the CPI by filling the pipeline with multiple instruction flows. At the OS level, a flow of control is a thread. The processor can be designed to host multiple threads and run them simultaneously (Simultaneous MultiThreading or SMT, as named by Tullsen in [1]). Such thread-dedicated slots in the processor are called harts (for HARdware Threads). The multihart design presented in this chapter can host up to eight harts. The pipeline has six stages. The processor cycle is two FPGA cycles (i.e. 50 MHz).
Bernard Goossens

Multiple Core Processors

Frontmatter
11. Connecting IPs
Abstract
This chapter presents the AXI interconnection system. You will build two multi-IP components. The different IPs are connected via the AXI interconnect IP provided by the Vivado component library. The first design connects a rv32i_npp_ip processor (presented in Chap. 6) to two block memories, one for code and the other for data. This design is intended to show how the AXI interconnection system works. The second design connects two IPs sharing two data memory banks. It is intended to show how multiple memory blocks are shared by multiple IPs, using the AXI interconnection to exchange data.
Bernard Goossens
12. A Multicore RISC-V Processor
Abstract
This chapter will make you build your first multicore RISC-V CPU. The processor is built from multiple IPs, each being a copy of the multicycle_pipeline_ip presented in Chap. 9. Each core has its own code and data memories. The data memory banks are interconnected with an AXI interconnect IP. An example of a parallelized matrix multiplication is used to measure the speedup when increasing the number of cores from one to eight.
Bernard Goossens
13. A Multicore RISC-V Processor with Multihart Cores
Abstract
This chapter will make you build your second multicore RISC-V CPU. The processor is built from multiple IPs, each being a copy of the multihart_ip presented in Chap. 10. Each core runs multiple harts. Each core has its own code and data memories. The code memory is common to all the harts of the core. The data memory of the core is partitioned between the implemented harts. Hence, a processor with c cores of h harts each has h*c data memory partitions embedded in c memory IPs. The data memory banks are interconnected with an AXI interconnect IP. Any hart has a private access to its data memory partition and to any other partition of the same core, and a remote access to any partition of any other core. An example of a parallelized matrix multiplication is used to measure the speedup when increasing the number of cores from one to four and the number of harts from one to eight, with a maximum of 16 harts across all the IPs for simulation and a maximum of eight implementable harts on the FPGA.
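The partition count and the distinction between same-core and remote accesses can be sketched as follows. The numbering scheme (partitions numbered core by core, then hart by hart) and the names Layout, partition_index, and is_local_access are assumptions made for this illustration.

```cpp
#include <cassert>

// Hypothetical description of a processor with c cores of h harts each.
struct Layout {
    unsigned cores;           // c
    unsigned harts_per_core;  // h
};

// Total number of data memory partitions: h*c.
unsigned partition_count(Layout l) { return l.cores * l.harts_per_core; }

// Assumption: partitions are numbered core by core, then hart by hart.
unsigned partition_index(Layout l, unsigned core, unsigned hart) {
    assert(core < l.cores && hart < l.harts_per_core);
    return core * l.harts_per_core + hart;
}

// True when the partition belongs to the accessing core's own memory IP,
// false when reaching it requires a remote access to another core.
bool is_local_access(Layout l, unsigned accessing_core, unsigned partition) {
    assert(accessing_core < l.cores && partition < partition_count(l));
    return partition / l.harts_per_core == accessing_core;
}
```

With four cores of two harts each, there are eight partitions; under this numbering, partition 3 belongs to hart 1 of core 1, so an access from core 0 to it is remote.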
Bernard Goossens
14. Conclusion: Playing with the Pynq-Z1/Z2 Development Board Leds and Push Buttons
Abstract
This chapter makes you play with the LEDs and push buttons of the development board. In a first step, an experiment is built from a driver that runs on the Zynq Processing System and interacts directly with the board buttons and LEDs. Then, the driver is modified to interact with the multicore_multicycle_ip processor presented in Chap. 12. The processor runs a RISC-V program which accesses the board buttons and LEDs. From the general organization of the multicore_multicycle_ip processor design shown in this chapter, you can develop any RISC-V application to access the resources on the development board (switches, buttons and LEDs, DDR3 DRAM, SD card), including the expansion connectors (USB, HDMI, Ethernet RJ45, Pmods and Arduino shield).
Bernard Goossens
Backmatter
Metadata
Title
Guide to Computer Processor Architecture
Written by
Bernard Goossens
Copyright year
2023
Electronic ISBN
978-3-031-18023-1
Print ISBN
978-3-031-18022-4
DOI
https://doi.org/10.1007/978-3-031-18023-1