add what/how explanation to README
parent
5f55ded723
commit
ae9b78d9ef
407
README.md
@ -4,36 +4,35 @@ Rocket Chip Generator
|
||||
This repository contains the Rocket chip generator necessary to instantiate
|
||||
the RISC-V Rocket Core.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
Contributors
|
||||
------------
|
||||
+ [Quick instructions](#quick) for those who want to dive directly into the details without knowing exactly what's in the repository.
|
||||
+ [What's in the Rocket chip generator repository?](#what)
|
||||
+ [How should I use the Rocket chip generator?](#how)
|
||||
+ [Using the high-performance cycle-accurate C++ emulator](#emulator)
|
||||
+ [Mapping a Rocket core down to an FPGA](#fpga)
|
||||
+ [Pushing a Rocket core through the VLSI tools](#vlsi)
|
||||
+ [How can I parameterize my Rocket chip?](#param)
|
||||
+ [Contributors](#contributors)
|
||||
|
||||
- Scott Beamer
|
||||
- Henry Cook
|
||||
- Yunsup Lee
|
||||
- Stephen Twigg
|
||||
- Huy Vo
|
||||
- Andrew Waterman
|
||||
## <a name="quick"></a> Quick Instructions
|
||||
|
||||
### Checkout The Code
|
||||
|
||||
Checkout The Code
|
||||
-----------------
|
||||
$ git submodule update --init
|
||||
$ git submodule update --init riscv-tools/riscv-tests
|
||||
|
||||
$ git submodule update --init --recursive
|
||||
### Setting up the RISCV environment variable
|
||||
|
||||
|
||||
Building The Toolchain
|
||||
----------------------
|
||||
|
||||
To build RISC-V ISA simulator, frontend server, proxy kernel and newlib based GNU toolchain:
|
||||
To build the rocket-chip repository, you must point the RISCV
|
||||
environment variable to your riscv-tools installation directory. If you
|
||||
do not yet have riscv-tools installed, please follow the directions in
|
||||
the
|
||||
[riscv-tools/README](https://github.com/ucb-bar/riscv-tools/blob/master/README.md).
|
||||
|
||||
$ export RISCV=/path/to/riscv/toolchain/installation
|
||||
$ cd riscv-tools
|
||||
$ ./build.sh
|
||||
|
||||
|
||||
Building The Project
|
||||
--------------------
|
||||
### Building The Project
|
||||
|
||||
To build the C simulator:
|
||||
|
||||
@ -45,10 +44,11 @@ To build the VCS simulator:
|
||||
$ cd vsim
|
||||
$ make
|
||||
|
||||
in either case, you can run a set of assembly tests or simple benchmarks:
|
||||
In either case, you can run a set of assembly tests or simple benchmarks
|
||||
(assuming you have N cores on your host system):
|
||||
|
||||
$ make run-asm-tests
|
||||
$ make run-bmarks-test
|
||||
$ make -jN run-asm-tests
|
||||
$ make -jN run-bmarks-test
|
||||
|
||||
To build a C simulator that is capable of VCD waveform generation:
|
||||
|
||||
@ -57,20 +57,369 @@ To build a C simulator that is capable of VCD waveform generation:
|
||||
|
||||
And to run the assembly tests on the C simulator and generate waveforms:
|
||||
|
||||
$ make run-asm-tests-debug
|
||||
$ make run-bmarks-test-debug
|
||||
$ make -jN run-asm-tests-debug
|
||||
$ make -jN run-bmarks-test-debug
|
||||
|
||||
To get FPGA-synthesizable verilog (output will be in `fsim/generated-src`):
|
||||
To generate FPGA-synthesizable verilog (output will be in `fsim/generated-src`):
|
||||
|
||||
$ cd fsim
|
||||
$ make verilog
|
||||
|
||||
Similarly, to generate VLSI-synthesizable verilog (output will be in `vsim/generated-src`):
|
||||
|
||||
Updating To A Newer Version Of Chisel
|
||||
-------------------------------------
|
||||
$ cd vsim
|
||||
$ make verilog
|
||||
|
||||
### Updating To A Newer Version Of Chisel
|
||||
|
||||
To grab a newer version of chisel:
|
||||
|
||||
$ git submodule update --init
|
||||
$ cd chisel
|
||||
$ git pull origin master
|
||||
|
||||
## <a name="what"></a> What's in the Rocket chip generator repository?
|
||||
|
||||
The rocket-chip repository is the head git repository that points to
|
||||
many sub-repositories (e.g. the riscv-tools repository) using [git
|
||||
submodules](http://git-scm.com/book/en/Git-Tools-Submodules). While
|
||||
we're aware of the ongoing debate as to how meta-projects should be
|
||||
managed (i.e. a big monolithic repository vs. smaller repositories
|
||||
tracked as submodules), we've found that for our chip-building projects
|
||||
at Berkeley, the ability to compose a subset of private and public
|
||||
sub-repositories on a per-chip basis is a killer feature of git
|
||||
submodule.
|
||||
|
||||
So, which submodules are actually included in this chip's repository?
|
||||
Here's a look at all the git submodules that are currently tracked in
|
||||
the rocket-chip repository:
|
||||
|
||||
* **chisel**
|
||||
([https://github.com/ucb-bar/chisel](https://github.com/ucb-bar/chisel)):
|
||||
At Berkeley, we write RTL in Chisel. For those who are not familiar
|
||||
with Chisel, please go take a look at
|
||||
[http://chisel.eecs.berkeley.edu](http://chisel.eecs.berkeley.edu). We
|
||||
have submoduled a specific git commit tag of the Chisel compiler rather
|
||||
than pointing to a versioned Chisel release as an external dependency;
|
||||
so far we have been developing Chisel and the Rocket core at the same time,
|
||||
and hence it was easiest to use submodule to track bleeding edge commits
|
||||
to Chisel, which contained a bunch of new features and bug fixes. As
|
||||
Chisel gets more stable, we will likely replace this submodule with an
|
||||
external dependency.
|
||||
* **rocket**
|
||||
([https://github.com/ucb-bar/rocket](https://github.com/ucb-bar/rocket)):
|
||||
The rocket repository holds the actual source code of the Rocket core.
|
||||
Note that the L1 blocking I$ and the L1 non-blocking D$ are considered
|
||||
part of the core, and hence we keep the L1 cache source code in this
|
||||
repository. This repository is not meant to stand alone; it needs to be
|
||||
included in a chip repository (e.g. rocket-chip) that instantiates the
|
||||
core within a memory system and connects it to the outside world.
|
||||
* **uncore**
|
||||
([https://github.com/ucb-bar/uncore](https://github.com/ucb-bar/uncore)):
|
||||
This repository implements the uncore logic, such as the coherence hub
|
||||
(the agent that keeps multiple L1 D$ coherent). The definitions of the
|
||||
coherent interfaces between tiles ("tilelink") and the interface to the
|
||||
host machine ("htif") also live in this repository.
|
||||
* **hardfloat**
|
||||
([https://github.com/ucb-bar/berkeley-hardfloat](https://github.com/ucb-bar/berkeley-hardfloat)):
|
||||
This repository holds the parameterized IEEE 754-2008 compliant
|
||||
floating-point units for fused multiply-add operations, conversions
|
||||
between integer and floating-point numbers, and conversions between
|
||||
floating-point numbers with different precision. The floating-point
|
||||
units in this repository work on an internal recoded format (exponent
|
||||
has an additional bit) to handle subnormal numbers more efficiently in
|
||||
the processor. Please take a look at the
|
||||
[README](https://github.com/ucb-bar/berkeley-hardfloat/blob/master/README.md)
|
||||
in the repository for more information.
|
||||
* **dramsim2**
|
||||
([https://github.com/dramninjasUMD/DRAMSim2](https://github.com/dramninjasUMD/DRAMSim2)):
|
||||
Currently, the DRAM memory system is implemented in the testbench. We
|
||||
use dramsim2 to emulate DRAM timing.
|
||||
* **fpga-zynq**
|
||||
([https://github.com/ucb-bar/fpga-zynq](https://github.com/ucb-bar/fpga-zynq)):
|
||||
We also tag a version of the FPGA infrastructure that works with the RTL
|
||||
committed in the rocket-chip repository.
|
||||
* **riscv-tools**
|
||||
([https://github.com/ucb-bar/riscv-tools](https://github.com/ucb-bar/riscv-tools)):
|
||||
We tag a version of riscv-tools that works with the RTL committed in the
|
||||
rocket-chip repository. Once the software toolchain stabilizes, we
|
||||
might turn this submodule into an external dependency.
|
||||
|
||||
Next, take a look at rocket-chip's src/main/scala directory. There are a
|
||||
couple of Chisel source files including RocketChip.scala, which
|
||||
instantiates both a Rocket core and the uncore logic, and then glues
|
||||
them together. Here's a brief overview of source files found in the
|
||||
rocket-chip repository:
|
||||
|
||||
* **RocketChip.scala**: Top-level source file (Top is the top-level
|
||||
module name), which instantiates a Rocket core, uncore logic, and glues
|
||||
them together.
|
||||
* **Network.scala**: This source file holds the crossbar network used in
|
||||
the uncore for multi-core implementations.
|
||||
* **PublicConfigs.scala**: This holds all the rocket-chip parameters.
|
||||
This is probably the most important file for external users. We
|
||||
will revisit this topic in the next section "How should I use the Rocket
|
||||
chip generator?", and will also post a more detailed explanation of the
|
||||
parameter infrastructure in the near future.
|
||||
* **Backends.scala**: An example of how the Chisel compiler's VLSI
|
||||
backend can be extended to route a pin named "init" to all SRAM blocks
|
||||
used in the design. This separation cleans up the source RTL of the
|
||||
design, since we don't need to add all the vendor-specific stuff in the
|
||||
Chisel source code, yet still can correctly hook up our particular
|
||||
SRAMs. The transformation is just a "compiler pass" in the Chisel
|
||||
backend that happens as the compiler translates the Chisel source code
|
||||
down to Verilog. Pretty neat, huh?
|
||||
* **Vlsi.scala**: This file is pretty specific to our tapeouts. It
|
||||
implements logic to interface with an arbitrary number of slow
|
||||
single-ended digital I/Os when implementing a test chip.
|
||||
|
||||
Now you should take a look at the top-level I/O pins. Open up
|
||||
src/main/scala/RocketChip.scala, and search for TopIO. You will read the
|
||||
following (note, HostIO is defined in uncore/src/main/scala/htif.scala,
|
||||
and MemIO is defined in uncore/src/main/scala/memserdes.scala):
|
||||
|
||||
class TopIO extends Bundle {
|
||||
val host = new HostIO
|
||||
val mem = new MemIO
|
||||
val mem_backup_en = Bool(INPUT)
|
||||
val in_mem_ready = Bool(OUTPUT)
|
||||
val in_mem_valid = Bool(INPUT)
|
||||
val out_mem_ready = Bool(INPUT)
|
||||
val out_mem_valid = Bool(OUTPUT)
|
||||
}
|
||||
|
||||
There are 3 major I/O ports coming out of the top-level module:
|
||||
|
||||
* **Host-target interface (HostIO)**: The host system talks to the
|
||||
target machine via this host-target interface. We serialize a simple
|
||||
protocol over this parameterized interface. More details will come.
|
||||
* **High-performance memory interface (MemIO, mem\_backup\_en=false)**:
|
||||
When mem\_backup\_en is tied low, all memory requests from the processor
|
||||
come out of the MemIO port. The MemIO port uses the same uncore clock, and
|
||||
is intended to be connected to something on the same chip.
|
||||
* **Low-performance memory interface (parts of HostIO, in\_mem\_\*,
|
||||
out\_mem\_\*, mem\_backup\_en=true)**: When mem\_backup\_en is tied
|
||||
high, all memory requests from the processor come out of the
|
||||
low-performance memory interface. To save actual pins on a test chip, we
|
||||
multiplex the data pins of the host-target interface with the serialized
|
||||
low-performance memory port. That's the reason why you only see the
|
||||
control pins (in\_mem\_* and out\_mem\_*).
|
||||
|
||||
Of course, there's a lot more in the actual submodules, but hopefully
|
||||
this will be enough to get you started with using the Rocket chip
|
||||
generator. We will keep documenting more about our designs in the
|
||||
respective README of each submodule, release notes, and even blog
|
||||
posts. In the meantime, please post questions to the hw-dev mailing
|
||||
list.
|
||||
|
||||
## <a name="how"></a> How should I use the Rocket chip generator?
|
||||
|
||||
Chisel can generate code for three targets: a high-performance
|
||||
cycle-accurate C++ emulator, Verilog optimized for FPGAs, and Verilog
|
||||
for VLSI. The Rocket chip generator can target all three backends. You
|
||||
will need a Java runtime installed on your machine, since Chisel is
|
||||
overlaid on top of [Scala](http://www.scala-lang.org/). Chisel RTL (i.e.
|
||||
rocket-chip source code) is a Scala program executing on top of your
|
||||
Java runtime. To begin, ensure that the ROCKETCHIP environment variable
|
||||
points to the rocket-chip repository.
|
||||
|
||||
$ git clone https://github.com/ucb-bar/rocket-chip.git
|
||||
$ cd rocket-chip
|
||||
$ git submodule update --init
|
||||
$ git submodule update --init riscv-tools/riscv-tests
|
||||
$ export ROCKETCHIP=`pwd`
|
||||
|
||||
Before going any further, you must point the RISCV environment variable
|
||||
to your riscv-tools installation directory. If you do not yet have
|
||||
riscv-tools installed, follow the directions in the
|
||||
[riscv-tools/README](https://github.com/ucb-bar/riscv-tools/blob/master/README.md).
|
||||
|
||||
export RISCV=/path/to/install/riscv/toolchain
|
||||
|
||||
Otherwise, you will see the following error message while executing any
|
||||
command in the rocket-chip generator:
|
||||
|
||||
*** Please set environment variable RISCV. Please take a look at README.
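
To catch this before invoking make, a small guard can be run first. This is only a sketch: `check_riscv` is a hypothetical helper name, not something shipped in the repository, and its messages are not the ones printed by the build system.

```shell
# Hypothetical helper (not part of rocket-chip): fail early when RISCV
# is unset, empty, or does not point to an existing directory.
check_riscv() {
  if [ -z "${RISCV:-}" ]; then
    echo "RISCV is not set; point it at your riscv-tools installation" >&2
    return 1
  fi
  if [ ! -d "$RISCV" ]; then
    echo "RISCV does not name a directory: $RISCV" >&2
    return 1
  fi
  return 0
}
```

For example, `check_riscv && make run` would only start the build when the variable looks usable.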
|
||||
|
||||
### <a name="emulator"></a> 1) Using the high-performance cycle-accurate C++ emulator
|
||||
|
||||
Your next step is to get the C++ emulator working. Assuming you have N
|
||||
cores on your host system, do the following:
|
||||
|
||||
$ cd $ROCKETCHIP/emulator
|
||||
$ make -jN run
|
||||
|
||||
By doing so, the build system will generate C++ code for the
|
||||
cycle-accurate emulator, compile the emulator, compile all RISC-V
|
||||
assembly tests and benchmarks, and run both tests and benchmarks on the
|
||||
emulator. If make finishes without any errors, it means that the
|
||||
generated Rocket chip has passed all assembly tests and benchmarks!
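
The N in `make -jN` stands for the number of cores on the host. As a rough sketch, it can be filled in automatically. `jobs_flag` is a hypothetical helper name, and it assumes `getconf _NPROCESSORS_ONLN` is available (it is on common Linux and macOS systems); it falls back to a single job otherwise.

```shell
# Hypothetical helper: print a -jN flag for make, where N is the number
# of online processors reported by getconf (falls back to 1).
jobs_flag() {
  n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
  case "$n" in
    ''|*[!0-9]*) n=1 ;;   # empty or non-numeric answer: use one job
  esac
  echo "-j${n}"
}
```

It could then be used as `make "$(jobs_flag)" run` from the emulator directory.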
|
||||
|
||||
Now take a look in the emulator/generated-src directory. You will find
|
||||
Chisel generated C++ code.
|
||||
|
||||
$ ls $ROCKETCHIP/emulator/generated-src
|
||||
Top.DefaultCPPConfig-0.cpp
|
||||
Top.DefaultCPPConfig-0.o
|
||||
Top.DefaultCPPConfig-1.cpp
|
||||
Top.DefaultCPPConfig-1.o
|
||||
Top.DefaultCPPConfig-2.cpp
|
||||
Top.DefaultCPPConfig-2.o
|
||||
Top.DefaultCPPConfig-3.cpp
|
||||
Top.DefaultCPPConfig-3.o
|
||||
Top.DefaultCPPConfig-4.cpp
|
||||
Top.DefaultCPPConfig-4.o
|
||||
Top.DefaultCPPConfig-5.cpp
|
||||
Top.DefaultCPPConfig-5.o
|
||||
Top.DefaultCPPConfig.cpp
|
||||
Top.DefaultCPPConfig.h
|
||||
emulator.h
|
||||
emulator_api.h
|
||||
emulator_mod.h
|
||||
|
||||
Also, output of the executed assembly tests and benchmarks can be found
|
||||
at emulator/output/\*.out. Each file has a cycle-by-cycle dump of the
|
||||
write-back stage of the pipeline. Here's an excerpt of
|
||||
emulator/output/rv64ui-p-add.out:
|
||||
|
||||
C0: 483 [1] pc=[00000002138] W[r 3=000000007fff7fff][1] R[r 1=000000007fffffff] R[r 2=ffffffffffff8000] inst=[002081b3] add s1, ra, s0
|
||||
C0: 484 [1] pc=[0000000213c] W[r29=000000007fff8000][1] R[r31=ffffffff80007ffe] R[r31=0000000000000005] inst=[7fff8eb7] lui t3, 0x7fff8
|
||||
C0: 485 [0] pc=[00000002140] W[r 0=0000000000000000][0] R[r 0=0000000000000000] R[r 0=0000000000000000] inst=[00000000] unknown
|
||||
|
||||
At cycle 483 on core 0, the first [1] shows that there's a
|
||||
valid instruction at PC 0x2138 in the writeback stage, which is
|
||||
0x002081b3 (add s1, ra, s0). The second [1] tells us that the register
|
||||
file is writing r3 with the corresponding value 0x7fff7fff. When the add
|
||||
instruction was in the decode stage, the pipeline had read r1 and r2
|
||||
with the corresponding values next to it. Similarly at cycle 484,
|
||||
there's a valid instruction (lui instruction) at PC 0x213c in the
|
||||
writeback stage. At cycle 485, there isn't a valid instruction in the
|
||||
writeback stage, perhaps because of an instruction cache miss at PC
|
||||
0x2140.
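
The fields described above can be pulled out of a trace with a short filter. This is a minimal sketch that assumes every line follows the layout of the excerpt shown here; `parse_trace` is a hypothetical name, and lines that do not match are silently skipped.

```shell
# Hypothetical helper: read trace lines on stdin and print
# "core cycle valid pc inst" for each line that matches the layout
# C<core>: <cycle> [<valid>] pc=[<pc>] ... inst=[<inst>] ...
parse_trace() {
  sed -E -n 's/^C([0-9]+): +([0-9]+) +\[([01])\] +pc=\[([0-9a-f]+)\].* inst=\[([0-9a-f]+)\].*$/\1 \2 \3 \4 \5/p'
}
```

For instance, `parse_trace < emulator/output/rv64ui-p-add.out | awk '$3 == 1' | wc -l` would count the cycles in which a valid instruction was in the writeback stage.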
|
||||
|
||||
### <a name="fpga"></a> 2) Mapping a Rocket core down to an FPGA
|
||||
|
||||
We use Synopsys VCS for Verilog simulation. We acknowledge that using a
|
||||
proprietary Verilog simulation tool for an open-source project is not
|
||||
ideal; we ask the community to help us move DirectC routines (VCS's way
|
||||
of gluing Verilog testbenches to arbitrary C/C++ code) into DPI/VPI
|
||||
routines so that we can make Verilog simulation work with an open-source
|
||||
Verilog simulator. In the meantime, you can use the C++ emulator to
|
||||
generate vcd waveforms, which you can view with an open-source waveform
|
||||
viewer such as GTKWave.
|
||||
|
||||
So assuming you have a working Rocket chip, you can generate Verilog for
|
||||
the FPGA tools with the following commands:
|
||||
|
||||
$ cd $ROCKETCHIP/fsim
|
||||
$ make verilog
|
||||
|
||||
The Verilog used for the FPGA tools will be generated in
|
||||
fsim/generated-src. Please proceed further with the directions shown in
|
||||
the [README](https://github.com/ucb-bar/fpga-zynq/blob/master/README.md)
|
||||
of the fpga-zynq repository.
|
||||
|
||||
However, if you have access to VCS, you will be able to run assembly
|
||||
tests and benchmarks with the following commands (again assuming you
|
||||
have N cores on your host machine):
|
||||
|
||||
$ cd $ROCKETCHIP/fsim
|
||||
$ make -jN run
|
||||
|
||||
The generated output looks similar to that generated by the emulator.
|
||||
Look into fsim/output/\*.out for the output of the executed assembly
|
||||
tests and benchmarks.
|
||||
|
||||
### <a name="vlsi"></a> 3) Pushing a Rocket core through the VLSI tools
|
||||
|
||||
You can generate Verilog for your VLSI flow with the following commands:
|
||||
|
||||
$ cd $ROCKETCHIP/vsim
|
||||
$ make verilog
|
||||
|
||||
Now take a look at vsim/generated-src, and the contents of the
|
||||
Top.DefaultVLSIConfig.conf file:
|
||||
|
||||
$ ls $ROCKETCHIP/vsim/generated-src
|
||||
Top.DefaultVLSIConfig.conf
|
||||
Top.DefaultVLSIConfig.prm
|
||||
Top.DefaultVLSIConfig.v
|
||||
consts.DefaultVLSIConfig.vh
|
||||
memdessertMemDessert.DefaultVLSIConfig.v
|
||||
$ cat $ROCKETCHIP/vsim/generated-src/*.conf
|
||||
name MetadataArray_tag_arr depth 128 width 84 ports mwrite,read mask_gran 21
|
||||
name ICache_tag_array depth 128 width 38 ports mrw mask_gran 19
|
||||
name DataArray_T6 depth 512 width 128 ports mwrite,read mask_gran 64
|
||||
name HellaFlowQueue_ram depth 32 width 133 ports write,read
|
||||
name ICache_T157 depth 512 width 128 ports rw
|
||||
|
||||
The conf file contains information for all SRAMs instantiated in the
|
||||
flow. If you take a close look at the $ROCKETCHIP/Makefrag, you will see
|
||||
that during Verilog generation, the build system calls a $(mem\_gen)
|
||||
script with the generated configuration file as an argument, which will
|
||||
fill in the Verilog for the SRAMs. Currently, the $(mem\_gen) script
|
||||
points to vsim/vlsi\_mem\_gen, which simply instantiates behavioral
|
||||
SRAMs. You will see those SRAMs being appended at the end of
|
||||
vsim/generated-src/Top.DefaultVLSIConfig.v. To target vendor-specific
|
||||
SRAMs, you will need to make necessary changes to vsim/vlsi\_mem\_gen.
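
To get a feel for what a replacement memory generator has to consume, the conf file can be summarized with a few lines of awk. This is a sketch only: the key/value layout (`name`, `depth`, `width`) is inferred from the excerpt above, and `sram_bits` is a hypothetical helper, not part of vsim/vlsi\_mem\_gen.

```shell
# Hypothetical helper: read conf lines on stdin and print each SRAM name
# with its capacity in bits (depth * width). Fields are located by
# keyword, so their order on the line does not matter.
sram_bits() {
  awk '{
    name = ""; depth = 0; width = 0
    for (i = 1; i < NF; i++) {
      if ($i == "name") name = $(i + 1)
      else if ($i == "depth") depth = $(i + 1)
      else if ($i == "width") width = $(i + 1)
    }
    if (name != "") printf "%s %d\n", name, depth * width
  }'
}
```

Running `sram_bits < $ROCKETCHIP/vsim/generated-src/Top.DefaultVLSIConfig.conf` would list one line per SRAM macro that the memory generator is expected to provide.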
|
||||
|
||||
Similarly, if you have access to VCS, you can run assembly tests and
|
||||
benchmarks with the following commands (again assuming you have N cores
|
||||
on your host machine):
|
||||
|
||||
$ cd $ROCKETCHIP/vsim
|
||||
$ make -jN run
|
||||
|
||||
The generated output looks similar to that generated by the emulator.
|
||||
Look into vsim/output/\*.out for the output of the executed assembly
|
||||
tests and benchmarks.
|
||||
|
||||
## <a name="param"></a> How can I parameterize my Rocket chip?
|
||||
|
||||
By now, you have probably figured out that all generated files have a
|
||||
configuration name attached, e.g. DefaultCPPConfig and
|
||||
DefaultVLSIConfig. Take a look at src/main/scala/PublicConfigs.scala.
|
||||
Search for NSets and NWays defined in DefaultConfig. You can change
|
||||
those numbers to get a Rocket core with different cache parameters. For
|
||||
example, by changing L1I, NWays to 4, you will get a 32KB 4-way
|
||||
set-associative L1 instruction cache rather than a 16KB 2-way
|
||||
set-associative L1 instruction cache. By searching further for
|
||||
DefaultVLSIConfig and DefaultCPPConfig, you will see that currently both
|
||||
are set to be identical to DefaultConfig.
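
The cache sizes quoted above follow from sets × ways × block size. The arithmetic can be checked with a small sketch; `cache_kib` is a hypothetical helper, the 128 sets match the depth of ICache\_tag\_array in the conf excerpt, and the 64-byte block size is an assumption chosen because it reproduces the 16KB figure, not a value taken from PublicConfigs.scala.

```shell
# Hypothetical helper: cache capacity in KiB from the number of sets,
# the number of ways, and the block size in bytes.
cache_kib() {
  echo $(( $1 * $2 * $3 / 1024 ))
}

cache_kib 128 2 64   # 2-way with the assumed geometry: prints 16
cache_kib 128 4 64   # 4-way with the same sets and block size: prints 32
```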
|
||||
|
||||
Further down, you will be able to see two FPGA configurations:
|
||||
FPGAConfig and FPGASmallConfig. FPGAConfig inherits from DefaultConfig,
|
||||
but overrides the low-performance memory port (i.e., backup memory port)
|
||||
to be turned off. This is because the high-performance memory port is
|
||||
directly connected to the high-performance AXI interface on the ZYNQ
|
||||
FPGA. FPGASmallConfig inherits from FPGAConfig, but changes the cache
|
||||
sizes, disables the FPU, turns off the fast early-out multiplier and
|
||||
divider, and reduces the number of TLB entries. This small configuration
|
||||
is used for the Zybo FPGA board, which has the smallest ZYNQ part.
|
||||
|
||||
Now take a look at fsim/Makefile and vsim/Makefile. Search for the
|
||||
CONFIG variable. DefaultFPGAConfig is used for the FPGA build, while
|
||||
DefaultVLSIConfig is used for the VLSI build. You can also change the
|
||||
CONFIG variable on the make command line:
|
||||
|
||||
$ cd $ROCKETCHIP/vsim
|
||||
$ make -jN CONFIG=DefaultFPGAConfig run
|
||||
|
||||
Or, even by defining CONFIG as an environment variable:
|
||||
|
||||
$ export CONFIG=DefaultFPGAConfig
|
||||
$ make -jN run
|
||||
|
||||
This parameterization is one of the many strengths of processor
|
||||
generators written in Chisel, and will be described in more detail in a future blog
|
||||
post, so please stay tuned.
|
||||
|
||||
## <a name="contributors"></a> Contributors
|
||||
|
||||
- Scott Beamer
|
||||
- Henry Cook
|
||||
- Yunsup Lee
|
||||
- Stephen Twigg
|
||||
- Huy Vo
|
||||
- Andrew Waterman
|
||||
|
||||
|