
This old page was copied on May 17, 2020.

RVCore Project, Arch Lab, Tokyo Tech

The RVCore Project is a research and development project on RISC-V soft processors highly optimized for FPGAs.

RVCoreP (RISC-V Core Pipelined version) is one of the RISC-V soft processor cores of the RVCore Project. It is an optimized RISC-V soft processor with a five-stage pipeline.

About RVCoreP

The main specifications of RVCoreP are shown below:

  • An optimized RISC-V soft processor
  • Adopts RV32I, the base 32-bit integer instruction set of RISC-V, as its instruction set architecture
  • Uses a five-stage pipeline
    • Instruction fetch (If)
    • Instruction decode (Id)
    • Instruction execution (Ex)
    • Memory access (Ma)
    • Write back (Wb)
  • Applies three optimizations to improve the operating frequency
    • Instruction fetch unit optimization, including a pipelined branch prediction mechanism
    • ALU optimization
    • Data alignment and sign-extension optimization for data memory output
  • Implemented in Verilog HDL
  • Runs RISC-V programs compiled for RV32I
    • In Verilog HDL simulation using Icarus Verilog or Verilator
    • On FPGA boards, including those with a Xilinx Artix-7 FPGA
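
To make the five stages concrete, here is a minimal Python sketch of an idealized, stall-free five-stage pipeline. It only illustrates the general technique: the stage names come from the list above, while `pipeline_diagram` and `ideal_cycles` are hypothetical helpers and do not model RVCoreP's branch prediction or load-use stalls.

```python
STAGES = ["If", "Id", "Ex", "Ma", "Wb"]  # stage names from the specification list


def ideal_cycles(num_instructions):
    """Cycles needed to finish all instructions when nothing ever stalls."""
    return num_instructions + len(STAGES) - 1


def pipeline_diagram(num_instructions):
    """Return one row per instruction; each row holds the stage occupied in
    every clock cycle, or "--" when the instruction is not in the pipeline."""
    rows = []
    for i in range(num_instructions):
        row = []
        for cycle in range(ideal_cycles(num_instructions)):
            stage = cycle - i  # instruction i enters If in cycle i
            row.append(STAGES[stage] if 0 <= stage < len(STAGES) else "--")
        rows.append(row)
    return rows


for i, row in enumerate(pipeline_diagram(4)):
    print(f"inst{i}: " + " ".join(row))
```

In this idealized model, n instructions finish in n + 4 cycles, so the IPC approaches 1 for long programs. Stalls and branch mispredictions are what keep a real run below that bound.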

Download source file

The source code of RVCoreP Ver.0.4.6: rvcorep_ver046.zip

This source code is released under the MIT License; see LICENSE.txt.

Install Icarus Verilog (iverilog) with the following command.

$ sudo apt install iverilog

Optionally, install Verilator with the following command.

$ sudo apt install verilator

Getting started guide

(1) Download the source code of the RVCoreP

$ wget "http://www.arch.cs.titech.ac.jp/wk/rvcore/lib/exe/fetch.php?media=rvcorep_ver046.zip" -O rvcorep_ver046.zip

(2) Extract the downloaded zip file

$ unzip rvcorep_ver046.zip
$ cd rvcorep_ver046

Verilog HDL simulation using Icarus Verilog

Run the following commands in the recommended environment.

(1) Compile source code written in Verilog HDL using Icarus Verilog

$ make
iverilog -DSERIAL_WCNT=2 -DNO_IP -o simv top.v main.v uart.v debug.v proc.v

Compilation generates the executable file simv.

(2) Execute the Verilog HDL simulation

By default, the test benchmark is executed. The memory file of the test benchmark is test/test.mem.

$ make run
./simv
Run test/test.mem
Initializing : ..........
--------------------------------------------------
---- nqueen ----
Nqueen :
N = 6
The number of solutions = 4
----------------

---- qsort  ----
Sorted Seqence :
59321
A4C86
AC7D3
B210A
142044
1DEC15
1EC216
2536B2
278BCF
34A2AC
----------------

---- fib    ----
Fibonacci Seqence :
1: 1
2: 1
3: 2
4: 3
5: 5
6: 8
7: D
8: 15
9: 22
A: 37
----------------

---- acker  ----
acker(0,0) = 1
acker(0,1) = 2
acker(0,2) = 3
acker(1,0) = 2
acker(1,1) = 3
acker(1,2) = 4
acker(2,0) = 3
acker(2,1) = 5
acker(2,2) = 7
----------------

== elapsed clock cycles        :    35030
== valid instructions executed :    28934
== IPC                         :    0.825
== the num of load-use stall   :     1897
== branch prediction hit       :     3615
== branch prediction miss      :     1430
== branch prediction total     :     5045
== branch prediction hit rate  :    0.716
== estimated clock cycles      :    35121
== r_cnt                       : 000088d6
== r_rout                      : 000000a0

The simulation prints the output shown above. At the end of the run, statistics such as IPC (instructions per cycle) and the branch prediction hit rate are written to the console.
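
The derived statistics can be recomputed from the raw counters in the log. The sketch below does this for the test benchmark. It assumes the simulator truncates ratios to three decimals, which is inferred from the printed IPC (rounding would give 0.826) and not confirmed from the RVCoreP source; `ratio_milli` is a hypothetical helper.

```python
def ratio_milli(numerator, denominator):
    """Return numerator/denominator truncated to three decimals, as text.

    Integer arithmetic is used so that no floating-point rounding sneaks in.
    """
    milli = numerator * 1000 // denominator
    return f"{milli // 1000}.{milli % 1000:03d}"


# Counters copied from the test/test.mem run above.
cycles = 35030
instructions = 28934
bp_hit = 3615
bp_miss = 1430

ipc = ratio_milli(instructions, cycles)            # valid instructions / elapsed cycles
hit_rate = ratio_milli(bp_hit, bp_hit + bp_miss)   # hits / (hits + misses)

print("IPC                        :", ipc)
print("branch prediction hit rate :", hit_rate)
```

With these inputs the script reproduces the logged values 0.825 and 0.716, and the sum of hits and misses equals the logged branch prediction total of 5045.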

(3) Run the Dhrystone and CoreMark benchmarks in the Verilog HDL simulation

Compile and run the Dhrystone benchmark. Its memory file is bench/dhrystone.mem.

$ make dhrystone
make BENCH="bench/dhrystone.mem"
iverilog -DSERIAL_WCNT=2 -DNO_IP -DMEMFILE=\"bench/dhrystone.mem\" -DMEM_SIZE=1024*32 -DNO_SERIAL -DPROGRESS -o simv top.v main.v uart.v debug.v proc.v
$ make run
./simv
Run bench/dhrystone.mem
Initialized.
--------------------------------------------------
............................................................................................
== elapsed clock cycles        :   973054
== valid instructions executed :   909443
== IPC                         :    0.934
== the num of load-use stall   :    18174
== branch prediction hit       :   201153
== branch prediction miss      :    16481
== branch prediction total     :   217634
== branch prediction hit rate  :    0.924
== estimated clock cycles      :   977060
== r_cnt                       : 000ed8fe
== r_rout                      : 0000124c
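
The "estimated clock cycles" values in the two runs above are consistent with a simple cost model: one cycle per valid instruction, one extra cycle per load-use stall, and three extra cycles per branch misprediction. This relation and the penalty of 3 are inferred from the logged numbers only; the page does not state how the simulator computes the estimate, so treat the sketch as an observation and `estimated_cycles` as a hypothetical helper.

```python
MISS_PENALTY = 3  # assumption: inferred from the logs, not from the RVCoreP source


def estimated_cycles(instructions, load_use_stalls, bp_miss, penalty=MISS_PENALTY):
    """Cost model: instructions + load-use stalls + penalty per misprediction."""
    return instructions + load_use_stalls + penalty * bp_miss


# (valid instructions, load-use stalls, mispredictions, reported estimate)
runs = {
    "test/test.mem": (28934, 1897, 1430, 35121),
    "bench/dhrystone.mem": (909443, 18174, 16481, 977060),
}

for name, (inst, stalls, miss, reported) in runs.items():
    model = estimated_cycles(inst, stalls, miss)
    print(f"{name}: model={model} reported={reported} match={model == reported}")
```

The model matches the reported estimate in both runs. The estimate is close to, but not identical to, the measured "elapsed clock cycles".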

Compile and run the CoreMark benchmark in the same way. Its memory file is bench/coremark.mem.

$ make coremark
make BENCH="bench/coremark.mem"
iverilog -DSERIAL_WCNT=2 -DNO_IP -DMEMFILE=\"bench/coremark.mem\" -DMEM_SIZE=1024*32 -DNO_SERIAL -DPROGRESS -o simv top.v main.v uart.v debug.v proc.v
$ make run
./simv
Run bench/coremark.mem
Initialized.
--------------------------------------------------
......................................................................................................................................................
.................................
== elapsed clock cycles        :  1799505
== valid instructions executed :  1481298
== IPC                         :    0.823
== the num of load-use stall   :    34930
== branch prediction hit       :   363439
== branch prediction miss      :    94534
== branch prediction total     :   457973
== branch prediction hit rate  :    0.793
== estimated clock cycles      :  1799830
== r_cnt                       : 001b7551
== r_rout                      : 00002fe0
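
When comparing several runs, it can help to turn the statistics block into a dictionary. The following sketch parses the "== name : value" lines shown above; `parse_stats` and the regular expression are hypothetical and not part of the RVCoreP distribution. It also checks an observation from the logs: r_cnt is the elapsed cycle count printed in hexadecimal.

```python
import re

# One statistics line looks like "== <name>   : <value>".
STAT_LINE = re.compile(r"^==\s*(.+?)\s*:\s*(\S+)\s*$")


def parse_stats(log_text):
    """Map each '== name : value' line to {name: raw value string}."""
    stats = {}
    for line in log_text.splitlines():
        match = STAT_LINE.match(line)
        if match:
            stats[match.group(1)] = match.group(2)
    return stats


# Excerpt copied from the CoreMark run above.
sample = """\
== elapsed clock cycles        :  1799505
== valid instructions executed :  1481298
== IPC                         :    0.823
== r_cnt                       : 001b7551
"""

stats = parse_stats(sample)
cycles = int(stats["elapsed clock cycles"])
r_cnt = int(stats["r_cnt"], 16)  # r_cnt is printed in hexadecimal
print(cycles, r_cnt, cycles == r_cnt)
```

Values are kept as strings so that the caller decides whether a field is decimal, hexadecimal, or a ratio.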

Implementation and execution on the Nexys 4 DDR board

Perform the following steps in the directory of the downloaded source code, in the recommended environment.

(1) Open the project file main.xpr in Xilinx Vivado

$ vivado main.xpr

(2) Run logic synthesis, placement and routing, and bitstream generation in Xilinx Vivado

  • Click “Generate Bitstream” in Vivado project manager

By default, the operating frequency of the processor is set to 170 MHz.

(3) Write the generated bitstream to the FPGA board

  • Click “Open Hardware Manager” in the Vivado project manager to open the hardware manager
  • Click “Open target” and “Auto Connect” to detect the FPGA board
  • Click “Program device” and specify the bitstream file
  • Click “Program” to write the bitstream to the FPGA board

When the bitstream data is correctly written to the FPGA board, the DONE LED lights up and “00000000” is displayed on the 8-digit 7-segment LEDs.

(4) Prepare for serial communication at 5 Mbaud

Open Tera Term, a terminal emulator that supports serial communication, and configure it as follows.

  • Click “File”→“New connection” to make a new connection
  • Select the appropriate USB Serial Port and click “OK” (on Windows, if several COM ports are listed, the one with the larger COM number seems to be the better choice)
  • Click “Setup”→“Terminal”
  • Under “New-line”, set both “Receive” and “Transmit” to “LF” and click “OK”
  • Click “Setup”→“Serial port”
  • Set the baud rate of the serial port to “5000000” and click “OK”
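
For a rough sense of what 5 Mbaud means in practice, the sketch below estimates how long a program binary takes to transfer. It assumes 8N1 framing (one start bit, eight data bits, one stop bit, so 10 bits on the wire per byte), which this page does not state, and it uses a hypothetical 32 KiB payload. `transfer_seconds` is an illustrative helper, not part of RVCoreP.

```python
BAUD_RATE = 5_000_000   # serial speed configured in the steps above
BITS_PER_BYTE = 10      # assumption: 8N1 framing (start + 8 data + stop)


def transfer_seconds(num_bytes, baud=BAUD_RATE, bits_per_byte=BITS_PER_BYTE):
    """Lower bound on the transfer time of num_bytes over the serial link."""
    return num_bytes * bits_per_byte / baud


payload = 32 * 1024  # hypothetical binary size in bytes
print(f"{payload} bytes take at least {transfer_seconds(payload) * 1e3:.1f} ms")
```

The result is a lower bound, because the host and the USB serial bridge may add latency between bytes.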

(5) Send the RISC-V program binary to the FPGA board by serial communication and execute the program

  • Click “File”→“Send file” on Tera Term
  • Check the “Binary” checkbox in the “Option” column
  • Select test/test.bin as the file name and click “Open”
  • The test benchmark is sent to the FPGA board over the serial link and runs on the implemented processor

The following execution result is output via serial communication.

---- nqueen ----
Nqueen :
N = 6
The number of solutions = 4
----------------

---- qsort  ----
Sorted Seqence :
59321
A4C86
AC7D3
B210A
142044
1DEC15
1EC216
2536B2
278BCF
34A2AC
----------------

---- fib    ----
Fibonacci Seqence :
1: 1
2: 1
3: 2
4: 3
5: 5
6: 8
7: D
8: 15
9: 22
A: 37
----------------

---- acker  ----
acker(0,0) = 1
acker(0,1) = 2
acker(0,2) = 3
acker(1,0) = 2
acker(1,1) = 3
acker(1,2) = 4
acker(2,0) = 3
acker(2,1) = 5
acker(2,2) = 7
----------------
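
The benchmark prints its numbers in hexadecimal, which is easy to overlook when checking a run (for example, line "A: 37" is the 10th Fibonacci number, 55). As a cross-check, the Python sketch below regenerates the fib and acker sections from the textbook definitions. It is not the benchmark's source code; `fib`, `acker`, and the formatting are assumptions chosen to match the printed output.

```python
def fib(n):
    """n-th Fibonacci number with fib(1) = fib(2) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def acker(m, n):
    """Ackermann function, standard recursive definition."""
    if m == 0:
        return n + 1
    if n == 0:
        return acker(m - 1, 1)
    return acker(m - 1, acker(m, n - 1))


# Index and value are both printed as uppercase hexadecimal.
fib_lines = [f"{i:X}: {fib(i):X}" for i in range(1, 11)]
acker_lines = [f"acker({m},{n}) = {acker(m, n):X}"
               for m in range(3) for n in range(3)]

print("\n".join(fib_lines + acker_lines))
```

The generated lines agree with the fib and acker sections of the output above.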

At the end of execution, the 7-segment LEDs show the value of the processor's program counter. For the test benchmark test/test.bin, they show 000000A0.

While the button “BTNU” is pressed, the 7-segment LEDs show the number of execution cycles. For the test benchmark test/test.bin, they show 000088D6.
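
The displayed values are hexadecimal. The short sketch below decodes them and converts the cycle count to wall-clock time. It assumes the counter counts processor clock cycles and that the design runs at the default 170 MHz; `decode_display` is a hypothetical helper.

```python
CLOCK_HZ = 170_000_000  # default operating frequency of the processor


def decode_display(hex_digits):
    """Convert the 8 hexadecimal digits shown on the display to an integer."""
    return int(hex_digits, 16)


cycles = decode_display("000088D6")  # shown while BTNU is pressed
pc = decode_display("000000A0")      # program counter at the end of execution
micros = cycles / CLOCK_HZ * 1e6     # execution time in microseconds

print(f"cycles = {cycles}, final PC = {pc}, time = {micros:.1f} us")
```

The decoded cycle count, 35030, equals the "elapsed clock cycles" reported by the simulation of test/test.mem, so the board and the simulator agree on this benchmark.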

To send a program binary to the FPGA board again and execute it, repeat the procedure from step (3).

Publication

RVCoreP is described in a preprint on arXiv.

Hiromu Miyazaki, Takuto Kanamori, Md Ashraful Islam, Kenji Kise: RVCoreP : An optimized RISC-V soft processor of five-stage pipelining, arXiv:2002.03568 [cs.AR] (2020-02-10).

This paper has been submitted to the Institute of Electronics, Information and Communication Engineers (IEICE).

Contact

Kise Laboratory, Department of Computer Science, School of Computing, Tokyo Institute of Technology (Tokyo Tech)

E-mail: miyazaki (at) arch.cs.titech.ac.jp


Copyright © 2020 Kise Laboratory, Tokyo Institute of Technology

old2.1727010349.txt.gz · Last modified: 2024/09/22 22:05 by kise