# RVCore Project, Arch Lab, Tokyo Tech

The RVCore Project is a research and development project for RISC-V soft processors highly optimized for FPGAs.

**RVCoreP** (**R**ISC-**V** **Core** **P**ipelined version) is one of the RISC-V soft processor cores of the RVCore Project. It is an optimized RISC-V soft processor with a five-stage pipeline.

{{:rvcorep.png?nolink&600|}}

**RVCoreP supports the following FPGA boards!**

- [[https://reference.digilentinc.com/reference/programmable-logic/nexys-4-ddr/reference-manual|Nexys 4 DDR board]] with Xilinx Artix-7 FPGA
- [[https://reference.digilentinc.com/reference/programmable-logic/arty-a7/reference-manual|Arty A7-35T board]] with Xilinx Artix-7 FPGA

## What's new

- 2020/06/25 : Released Ver.0.5.3 with support for the Arty A7-35T FPGA board
- [[old2|2020/05/18]] : Added [[binary|the page]] on how to build the RISC-V cross compiler and RISC-V binary files
- 2020/05/17 : Changed the web page structure and released Ver.0.5.1
- [[old1|2020/05/03]] : Added the setting method for the newline code
- 2020/03/04 : Released this page for Ver.0.4.6

## About RVCoreP

The main specifications of RVCoreP are shown below:

- An optimized RISC-V soft processor
- Adopts **RV32I**, the base 32-bit integer instruction set of RISC-V, as its instruction set architecture
- Adopts **five-stage pipelining**
  - Instruction fetch (If)
  - Instruction decode (Id)
  - Instruction execution (Ex)
  - Memory access (Ma)
  - Write back (Wb)
- Applies **three effective optimization methods** to improve the operating frequency
  - Instruction fetch unit optimization including the pipelined branch prediction mechanism
  - ALU optimization
  - Data alignment and sign-extension optimization for data memory output
- Implemented in **Verilog HDL**
- Runs RISC-V programs compiled for RV32I
  - By Verilog HDL simulation using Verilator or Icarus Verilog
  - On FPGA boards with a **Xilinx Artix-7 FPGA**

## Download source file

The latest version of RVCoreP is Ver.0.5.3: {{ :rvcorep_ver053.zip |rvcorep_ver053.zip}}

The older versions are as follows:

- Ver.0.5.1: {{ :rvcorep_ver051.zip |rvcorep_ver051.zip}}
- Ver.0.4.6: {{ :rvcorep_ver046.zip |rvcorep_ver046.zip}}

This source code is released under the MIT License; see LICENSE.txt.

### Change log

- Ver.0.5.3 : The version that supports the Arty A7-35T FPGA board
- Ver.0.5.1 : The version supporting Verilator, Embench, pySerial, and Vivado 2019.2
- Ver.0.4.6 : The version used in our submitted manuscript

## Recommended environment

- Ubuntu 18.04 LTS
- [[https://www.veripool.org/wiki/verilator|Verilator]] for Verilog HDL simulation
- [[http://iverilog.icarus.com|Icarus Verilog]] for Verilog HDL simulation
- [[https://www.xilinx.com/products/design-tools/vivado.html|Xilinx Vivado Design Suite]] 2019.2 for logic synthesis
- [[https://reference.digilentinc.com/reference/programmable-logic/nexys-4-ddr/reference-manual|Nexys 4 DDR board]] or [[https://reference.digilentinc.com/reference/programmable-logic/arty-a7/reference-manual|Arty A7-35T board]] with Xilinx Artix-7 FPGA for placement and routing
- Python 3.6.9
- [[https://pythonhosted.org/pyserial/|pySerial]] for serial communication with the FPGA board

### Install command

Install Verilator with the following command.

```
$ sudo apt install verilator
```

Install Icarus Verilog with the following command.

```
$ sudo apt install iverilog
```

Install pySerial with the following commands.

```
$ sudo apt install python3-pip
$ pip3 install pyserial
```

## Getting started guide

(1) Download the source code of RVCoreP

```
$ wget "http://www.arch.cs.titech.ac.jp/wk/rvcore/lib/exe/fetch.php?media=rvcorep_ver053.zip" -O rvcorep_ver053.zip
```

(2) Extract the downloaded zip file

```
$ unzip rvcorep_ver053.zip
$ cd rvcorep_ver053
```

### Verilog HDL simulation using Verilator

Execute the following commands in the recommended environment.

(1) Compile the source code written in Verilog HDL using Verilator

```
$ make
verilator -DSERIAL_WCNT=2 -DNO_IP --public --top-module top --clk CLK --x-assign 0 --x-initial 0 --no-threads -O2 -Wno-WIDTH -Wno-UNSIGNED --exe sim.cpp --cc top.v main.v uart.v debug.v proc.v
make -j -C obj_dir -f Vtop.mk Vtop
cp obj_dir/Vtop simv
```

The executable file `simv` is generated by the compilation.

(2) Execute the Verilog HDL simulation

By default, the test benchmark is executed. The memory file of the test benchmark is `test/test.mem`.

### Execution result when running `test/test.mem`

```
$ make run
./simv
Run test/test.mem
Initializing : ..........
--------------------------------------------------
---- nqueen ----
Nqueen : N = 6
The number of solutions = 4
----------------
---- qsort ----
Sorted Seqence : 59321 A4C86 AC7D3 B210A 142044 1DEC15 1EC216 2536B2 278BCF 34A2AC
----------------
---- fib ----
Fibonacci Seqence : 1: 1 2: 1 3: 2 4: 3 5: 5 6: 8 7: D 8: 15 9: 22 A: 37
----------------
---- acker ----
acker(0,0) = 1
acker(0,1) = 2
acker(0,2) = 3
acker(1,0) = 2
acker(1,1) = 3
acker(1,2) = 4
acker(2,0) = 3
acker(2,1) = 5
acker(2,2) = 7
----------------
== elapsed clock cycles : 35030
== valid instructions executed : 28934
== IPC : 0.825
== branch prediction hit : 3615
== branch prediction miss : 1430
== branch prediction total : 5045
== branch prediction hit rate : 0.716
== the num of load-use stall : 1897
== estimated clock cycles : 35121
== r_cnt : 000088d6
== r_rout : 000000a0
- top.v:154: Verilog $finish
```

You will see the above output. Information such as the IPC (Instructions Per Cycle) and the branch prediction hit rate is printed to the console after the simulation finishes.

(3) Execute the Dhrystone and Coremark benchmarks by the Verilog HDL simulation

You can compile and execute the Dhrystone and Coremark benchmarks. The memory files, the binary files, and the dump files are stored in the `bench/` directory.

For Dhrystone, three configurations are provided, selected by the parameter `NUMBER_OF_RUNS`.

- benchd : NUMBER\_OF\_RUNS=500
- benchd2 : NUMBER\_OF\_RUNS=2000
- benchd3 : NUMBER\_OF\_RUNS=10000

For Coremark, three configurations are also provided, selected by the parameter `ITERATIONS`.

- benchc : ITERATIONS=1
- benchc2 : ITERATIONS=2
- benchc3 : ITERATIONS=10

These benchmarks can be run with the following commands.

```
$ make [configuration name]
$ make run
```

(4) Execute the [[https://embench.org|Embench]] benchmark by the Verilog HDL simulation

You can compile and execute the 19 benchmarks in Embench. The memory files, the binary files, and the dump files are stored in the `embench/` directory.

These benchmarks can be run with the following commands.

```
$ make [benchmark name]
$ make run
```

The execution results of dhrystone3, coremark3, and the 19 benchmarks of Embench are summarized in the file `result.txt`.

### Implementation and execution on an FPGA board

Execute the following steps in the directory of the downloaded source code in the recommended environment.

(1) Open the project file `main.xpr` in Xilinx Vivado

```
$ vivado main.xpr &
```

(2) Perform logic synthesis, placement and routing, and bitstream generation using Xilinx Vivado

Use the set of logic synthesis and place-and-route runs that matches the FPGA board you want to use.

- When using Nexys 4 DDR board : synth\_1 and impl\_1
- When using Arty A7-35T board : synth\_2 and impl\_2

Execute the following steps for the selected set of runs.

- Right click on synth_* for logic synthesis and select "Make Active"
- Click "Generate Bitstream" in Vivado project manager

By default, the operating frequency of the processor is set to 160MHz.

For the Arty A7 board, the following option is added to the logic synthesis settings of synth\_2:

- `-verilog_define ARTYA7=1`

The source code is switched to the Arty A7 board by the definition of the macro `ARTYA7`, so please be careful when changing the logic synthesis strategy yourself.

(3) Write the generated bitstream to the FPGA board

- Click "Open Hardware Manager" in Vivado project manager to open the hardware manager
- Click "Open target" and "Auto Connect" to recognize the FPGA board
- Click "Program device" and specify the bitstream file
- Click "Program" to write the bitstream to the FPGA board

When the bitstream data is correctly written to the Nexys 4 DDR board, the DONE LED lights up and "00000000" is displayed on the 8-digit 7-segment LEDs.

(4) Prepare for 8M baud serial communication

Move to the `test/` directory for the following steps.

```
$ cd test/
```

For serial communication with RVCoreP, the pySerial program `serial_rvcorep.py` is used. This program sends a binary file to the FPGA board and prints the received characters via pySerial.

Before executing this program, check the serial port setting in the program. By default, `/dev/ttyUSB1` is set.

The usage of this program is as follows:

```
$ python3 serial_rvcorep.py [serial baud rate (Mbaud)] [binary file name]
```

(5) Send the RISC-V program binary to the FPGA board by serial communication and execute the program

To send the test program `test.bin` and start serial communication, execute one of the following commands.

```
$ python3 serial_rvcorep.py 8 "test.bin"
```

or

```
$ python3 serial_rvcorep.py
```

- Send the test benchmark to the FPGA board via serial communication and execute the program on the implemented processor

The following execution result is output via serial communication.

### Execution result when running `test.bin`

```
$ python3 serial_rvcorep.py
serial baud rate : 8000000
send file : test.bin
---- nqueen ----
Nqueen : N = 6
The number of solutions = 4
----------------
---- qsort ----
Sorted Seqence : 59321 A4C86 AC7D3 B210A 142044 1DEC15 1EC216 2536B2 278BCF 34A2AC
----------------
---- fib ----
Fibonacci Seqence : 1: 1 2: 1 3: 2 4: 3 5: 5 6: 8 7: D 8: 15 9: 22 A: 37
----------------
---- acker ----
acker(0,0) = 1
acker(0,1) = 2
acker(0,2) = 3
acker(1,0) = 2
acker(1,1) = 3
acker(1,2) = 4
acker(2,0) = 3
acker(2,1) = 5
acker(2,2) = 7
----------------
```

When using the Nexys 4 DDR board, the 7-segment LEDs show the value of the program counter of the processor at the end of execution. If you execute the test benchmark `test/test.bin`, the output to the 7-segment LEDs is `000000A0`.

When using the Nexys 4 DDR board, the 7-segment LEDs show the number of execution cycles while the button "BTNU" is pressed. If you execute the test benchmark `test/test.bin`, the output to the 7-segment LEDs is `000088D6`.

The program `serial_rvcorep.py` is terminated by Ctrl-C.

If you want to send the binary file of the program to the FPGA board again and execute it, proceed from step (3).

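The 7-segment readings above are hexadecimal, so it can help to convert them to decimal when comparing against the simulation log. The following Python sketch does that; it is only an illustration, the helper names `parse_seven_segment` and `execution_time_us` are hypothetical and not part of the RVCoreP distribution, and the time estimate assumes the default 160MHz operating frequency.

```python
# Hypothetical helpers for interpreting the 8-digit 7-segment display.
# They are not part of the RVCoreP source code.

def parse_seven_segment(text: str) -> int:
    """Interpret an 8-digit hexadecimal 7-segment reading as an integer."""
    return int(text, 16)


def execution_time_us(cycles: int, freq_mhz: float = 160.0) -> float:
    """Convert a cycle count to microseconds.

    Assumes the processor runs at freq_mhz (160MHz is the documented default).
    """
    return cycles / freq_mhz


if __name__ == "__main__":
    cycles = parse_seven_segment("000088D6")  # shown while BTNU is pressed
    pc = parse_seven_segment("000000A0")      # program counter at the end

    print(cycles)                     # 35030
    print(hex(pc))                    # 0xa0
    print(execution_time_us(cycles))  # 218.9375
```

The decoded cycle count, 35030, agrees with the `elapsed clock cycles` value (and `r_cnt : 000088d6`) reported by the Verilator simulation of `test/test.mem`.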
## How to build the RISC-V cross compiler and RISC-V binary files that work with RVCoreP

Please refer to the following page: [[binary|How to build the RISC-V cross compiler and RISC-V binary files that work with RVCoreP]]

## Publication

- [[https://www.jstage.jst.go.jp/article/transinf/E103.D/12/E103.D_2020PAP0015/_article|Hiromu MIYAZAKI, Takuto KANAMORI, Md Ashraful ISLAM, Kenji KISE, RVCoreP: An Optimized RISC-V Soft Processor of Five-Stage Pipelining, IEICE Transactions on Information and Systems, 2020, Volume E103.D, Issue 12, Pages 2494-2503, Released December 01, 2020, Online ISSN 1745-1361, Print ISSN 0916-8532.]]
- Hiromu Miyazaki, Takuto Kanamori, Md Ashraful Islam, Kenji Kise: RVCoreP : An optimized RISC-V soft processor of five-stage pipelining, [[https://arxiv.org/abs/2002.03568|arXiv:2002.03568 [cs.AR]]] (2020-02-10).

## Contact

[[http://www.arch.cs.titech.ac.jp/|Kise Laboratory]], Department of Computer Science, School of Computing, [[https://www.titech.ac.jp/english/|Tokyo Institute of Technology]] (Tokyo Tech)

Maintainer : (E-mail) riscv-support (at) arch.cs.titech.ac.jp

Contributor : Hiromu Miyazaki, Takuto Kanamori, Md Ashraful Islam, Kenji Kise

## Other Project

- [[http://www.arch.cs.titech.ac.jp/wk/rvsoc/doku.php|RVSoC]]

Copyright (c) 2020 Kise Laboratory, Tokyo Institute of Technology