XiangShan Decode Design Document
- Version: V2R2
- Status: OK
- Date: 2025/02/28
- Commit: xxx
Glossary of Terms
| Abbreviation | Full name | Description |
|---|---|---|
| - | Decode Unit | Decode unit |
| uop | Micro Operation | Micro-operation |
| - | numOfUop | Number of uops an instruction is split into |
| - | numOfWB | Number of uops, among those split from an instruction, that require writeback |
| - | vtypeArch | Architectural vtype configuration, as of the most recently committed instruction |
| - | vtypeSpec | Speculative vtype configuration used by the vector instructions currently being decoded |
| - | walkVType | vtype restored from the vtype buffer during the rollback (walk) after a redirect |
Submodule List
| Submodule | Description |
|---|---|
| DecodeUnit | Decode unit |
| DecodeUnitComp | Vector instruction splitting and processing module |
| FPDecoder | Floating-point instruction decoding module |
| UopInfoGen | Instruction split type and quantity generation unit |
| VecDecoder | Vector instruction decoding module |
| VecExceptionGen | Vector exception check module |
| VTypeGen | Vector instruction vtype configuration generation module |
Design specifications
- Adds a vector configuration generation module (VTypeGen), a vector decoding module (VecDecoder), a vector instruction splitting module (DecodeUnitComp), and a vector exception checking module (VecExceptionGen). All vector instructions are sent to decoderComp for instruction splitting.
- Decodes up to 6 scalar instructions per cycle.
- Decodes up to 1 vector instruction per cycle.
- Translates some instructions into other instructions:
  - zimop instructions are translated into an addi instruction with src x0 and imm 0.
  - Reads of vlenb are translated into an addi instruction with src x0 and imm VLEN/8.
  - Reads of vl are translated into a vset instruction that reads the vl register and writes it to a scalar register.
- Reads of read-only CSRs no longer set waitForward and blockBackward, so they can execute out of order.
- All other functionality is the same as in Nanhu.
Function
Decoding converts the 32-bit instruction encoding into control signals. Vector instructions and AMO_CAS instructions are additionally split: splitting divides the instruction into one or more uops and, according to the split type, reassigns the source register numbers and types, the destination register number and type, the functional unit, and the operation type. The decoded instruction and its control information are then passed to the rename module, which maps the source registers and allocates physical registers for destinations based on the register numbers and types. The decode stage also checks for illegal-instruction and virtual-instruction exceptions and raises the corresponding bits in exceptionVec.
Overall design
The decode stage instantiates 6 DecodeUnit modules to decode incoming instructions. Each DecodeUnit outputs a signal indicating whether its instruction is a vector instruction; vector instructions are passed to the complex decoder, decoderComp, for splitting. Because a vector instruction passes through both DecodeUnit and UopInfoGen before reaching the complex decoder, the critical path is long, so decoderComp registers the instruction for one cycle on entry. In the next cycle it performs the vector exception checks and splits the instruction into one or more uops. If there are more than 6 uops, decoding takes multiple cycles. If all remaining uops can be sent in the current cycle, the next vector instruction can enter decoderComp in that same cycle. Assuming rename is ready, the following cases can occur depending on the order of incoming instructions:
- Scalar instructions only: decoded directly.
- Vector instruction: when decoderComp is ready, the vector instruction is passed to it for splitting. decoderComp handles only one vector instruction at a time.
- Vector instruction followed by scalar instructions: when decoderComp is ready, the vector instruction is passed to it for splitting. Because decoderComp handles only one vector instruction at a time, the scalar instructions behind it are not decoded in the same cycle.
- Scalar instructions followed by a vector instruction: the scalar instructions ahead of the vector instruction are decoded directly. When decoderComp is ready, the vector instruction is passed to it for splitting.
- Split uops followed by scalar instructions: suppose n split uops and m scalar instructions need rename in the current cycle. If n + m ≤ 6, all of them are decoded; otherwise only the first 6 − n scalar instructions are decoded.
- Split uops followed by a vector instruction: handled the same as the vector instruction case.
- Split uops followed by a vector instruction and scalar instructions: handled the same as the scalar-instructions-followed-by-a-vector-instruction case.
- Split uops followed by scalar instructions and a vector instruction: the scalar instructions are handled as in the split-uops-followed-by-scalar-instructions case, and the vector instruction as in the vector instruction case.
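The per-cycle behaviour in these cases can be approximated with a small software model. This is a hedged sketch in plain Scala, not the Chisel implementation. `DecodeSlots`, `accept`, and its parameters are hypothetical names. The model also assumes that a vector instruction is handed to decoderComp only once every scalar instruction ahead of it has been decoded in the same cycle.

```scala
// Hypothetical model of how many instructions decode accepts per cycle (not XiangShan code).
object DecodeSlots {
  val Width = 6 // decode/rename width taken from the text

  sealed trait Inst
  case object ScalarInst extends Inst
  case object VectorInst extends Inst

  /**
    * @param pendingUops      split uops that also need rename slots this cycle (n)
    * @param decoderCompReady whether decoderComp can accept a new vector instruction
    * @param insts            incoming instructions in program order
    * @return (scalar instructions decoded this cycle, whether a vector instruction enters decoderComp)
    */
  def accept(pendingUops: Int, decoderCompReady: Boolean, insts: Seq[Inst]): (Int, Boolean) = {
    val freeSlots = math.max(0, Width - pendingUops)
    // Only scalar instructions ahead of the first vector instruction can be decoded directly.
    val leadingScalars = insts.takeWhile(_ == ScalarInst).length
    val scalarsDecoded = math.min(leadingScalars, freeSlots)
    // Assumption: the vector instruction moves on only if all older scalars were decoded.
    val vectorSent = scalarsDecoded == leadingScalars &&
      insts.lift(leadingScalars).contains(VectorInst) && decoderCompReady
    (scalarsDecoded, vectorSent)
  }
}
```

For example, with 4 pending uops and 5 scalar instructions, the model decodes only 6 − 4 = 2 scalars. With a vector instruction at the head of the group, it decodes no scalars that cycle.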
Overall Block Diagram
Interface list
Refer to the interface documentation.
Submodule VTypeGen
The VTypeGen module maintains the vtype configuration used by the vector instructions currently being decoded. It updates the stored vtype whenever a vset instruction changes vtype or a redirect requires a rollback.
Input
- 32-bit instructions from the frontend instruction stream;
- vtype rollback information from the vtype buffer in the ROB;
- vtype commit information from the vtype buffer in the ROB;
- the vtype of vsetvl instructions from the backend. A vsetvl instruction takes its vtype from a register rather than from its encoding, so the vtype is passed to VTypeGen when the vsetvl instruction writes back.
Output
vtype information sent to the DecodeUnit: the current vtype configuration used by vector instructions in the decode stage.
Design specifications
There are four cases in which vtypeSpec is updated, listed in priority order:
- When a vsetvl instruction commits, vtypeSpec is updated to that instruction's vtype, which was captured when the vsetvl instruction wrote back. Because vsetvl flushes the pipeline, this case does not conflict with the others.
- During a rollback (walk), vtypeSpec is updated to the walkVType supplied by the vtype buffer.
- At the start of a redirect, vtypeSpec is reset to vtypeArch.
- When the instructions being decoded include a vsetivli or vsetvli instruction and no exception occurs, vtypeSpec is updated to that instruction's vtype, which can be read directly from its immediate field. VTypeGen contains a simple decoder that detects these two instruction types. If any are present, the first one is selected with a PriorityMux and its vtype is parsed by the VsetModule module.
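```scala
// vtypeSpec update, highest priority first
when(io.commitVType.hasVsetvl) {
  // 1. A committing vsetvl: use the vtype captured at its writeback
  vtypeSpecNext := io.vsetvlVType
}.elsewhen(io.walkVType.valid) {
  // 2. Rollback walk: restore the vtype supplied by the vtype buffer
  vtypeSpecNext := io.walkVType.bits
}.elsewhen(io.walkToArchVType) {
  // 3. Start of a redirect: restore the architectural vtype
  vtypeSpecNext := vtypeArch
}.elsewhen(inHasVset && io.canUpdateVType) {
  // 4. A vsetvli/vsetivli in the decode group: use its immediate-encoded vtype
  vtypeSpecNext := vtypeNew
}
```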
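VsetModule's exact interface is not described here, so the following plain-Scala sketch only illustrates what it has to extract. It uses the vsetvli/vsetivli encodings from the RISC-V Vector extension 1.0 specification. The vtype immediate sits in inst[30:20] for vsetvli and in inst[29:20] for vsetivli. Within it, vlmul occupies bits 2:0, vsew bits 5:3, vta bit 6, and vma bit 7. `VsetDecode`, `vtypeOf`, and `firstVType` are hypothetical names, and `firstVType` mimics the PriorityMux selection of the first vset in the decode group. ELEN-dependent reserved combinations, such as a fractional LMUL that is too small for the SEW, are not checked.

```scala
// Sketch: extract vtype from vsetvli/vsetivli immediates (RVV 1.0 layouts); names are illustrative.
object VsetDecode {
  // vtype fields; for unsupported encodings vill is set and the other fields are zero
  final case class VType(vill: Boolean, vma: Boolean, vta: Boolean, vsew: Int, vlmul: Int) {
    def sew: Int = 8 << vsew // vsew 0..3 -> SEW 8, 16, 32, 64
  }

  private val OpV = 0x57  // OP-V major opcode (1010111)
  private val OpCfg = 0x7 // funct3 = 111 selects the vset* instructions

  private def isOpCfg(inst: Int): Boolean =
    (inst & 0x7f) == OpV && ((inst >>> 12) & 0x7) == OpCfg

  def isVsetvli(inst: Int): Boolean = isOpCfg(inst) && (inst >>> 31) == 0
  def isVsetivli(inst: Int): Boolean = isOpCfg(inst) && (inst >>> 30) == 0x3

  /** vtype from the immediate; None for vsetvl (vtype comes from rs2) and non-vset instructions. */
  def vtypeOf(inst: Int): Option[VType] = {
    val zimm: Option[Int] =
      if (isVsetvli(inst)) Some((inst >>> 20) & 0x7ff)       // zimm[10:0] = inst[30:20]
      else if (isVsetivli(inst)) Some((inst >>> 20) & 0x3ff) // zimm[9:0]  = inst[29:20]
      else None
    zimm.map { z =>
      val vlmul = z & 0x7
      val vsew = (z >>> 3) & 0x7
      // Reserved vsew/vlmul values or non-zero upper bits are treated as unsupported here
      val unsupported = vsew > 3 || vlmul == 4 || (z >>> 8) != 0
      if (unsupported) VType(vill = true, vma = false, vta = false, vsew = 0, vlmul = 0)
      else VType(vill = false, vma = ((z >>> 7) & 1) == 1, vta = ((z >>> 6) & 1) == 1,
        vsew = vsew, vlmul = vlmul)
    }
  }

  /** Like a PriorityMux: vtype of the first vsetvli/vsetivli in decode order. */
  def firstVType(insts: Seq[Int]): Option[VType] =
    insts.iterator.map(vtypeOf).collectFirst { case Some(vt) => vt }
}
```

For `vsetvli x1, x2, e32, m2, ta, ma` the immediate is 0xD1. The sketch returns vsew = 2 (SEW 32), vlmul = 1 (LMUL 2), and vta = vma = true.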
There are two cases in which vtypeArch is updated:
1. When a vsetvl instruction commits, vtypeArch is updated to the vtype written back by that instruction.
2. When a vsetivli or vsetvli instruction commits, vtypeArch is updated with the vtype commit information from the vtype buffer.
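To make the register semantics explicit, the following hypothetical model computes the next vtypeSpec and vtypeArch together from the current state, as registers do at a clock edge. It follows the priority of the `when` chain and the two vtypeArch cases. vtype is treated as an opaque `Int`, and all names (`VTypeState`, `VTypeGenModel`, `Inputs`) are illustrative rather than taken from the XiangShan source.

```scala
// Hypothetical software model of VTypeGen's two registers (not XiangShan code).
final case class VTypeState(spec: Int, arch: Int)

object VTypeGenModel {
  final case class Inputs(
      commitHasVsetvl: Boolean = false,  // a vsetvl is committing (it also flushes the pipeline)
      vsetvlVType: Int = 0,              // vtype captured when that vsetvl wrote back
      walkVType: Option[Int] = None,     // vtype from the vtype buffer during a rollback walk
      walkToArch: Boolean = false,       // start of a redirect: restore the architectural vtype
      decodedVType: Option[Int] = None,  // vtype of the first vsetvli/vsetivli being decoded
      canUpdateVType: Boolean = false,   // no exception blocks the update
      committedVType: Option[Int] = None // vtype of a committing vsetvli/vsetivli
  )

  /** Next register values, computed only from the current state. */
  def step(s: VTypeState, in: Inputs): VTypeState = {
    val specNext =
      if (in.commitHasVsetvl) in.vsetvlVType
      else if (in.walkVType.isDefined) in.walkVType.get
      else if (in.walkToArch) s.arch
      else if (in.canUpdateVType && in.decodedVType.isDefined) in.decodedVType.get
      else s.spec
    val archNext =
      if (in.commitHasVsetvl) in.vsetvlVType
      else in.committedVType.getOrElse(s.arch)
    VTypeState(specNext, archNext)
  }
}
```

Computing both next values from the old state ensures that a redirect restores the vtypeArch value held before any commit in the same cycle.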
Submodule DecodeUnit
Input and Output
- Input
  - DecodeUnitEnqIO: instruction stream information from the frontend, including the vtype and vstart used by vector instructions
  - CustomCSRCtrlIO: CSR control signals
  - CSRToDecode: CSR control signals
- Output
  - DecodeUnitDeqIO: decoded instruction information, whether the instruction is a vector instruction, and its number of uops
Function
This module is the decode unit of the XiangShan backend. It converts incoming instructions into more informative micro-operations. Each micro-operation carries source register numbers and types, the destination register number and type, the immediate type, the functional unit type, the operation type, and other information.
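Mapping a 32-bit encoding to control fields is essentially a pattern-table lookup. The sketch below imitates that idea in plain Scala with a mask/value matcher similar in spirit to Chisel's `BitPat`. The two-entry table, the `Ctrl` fields, and the string-valued functional unit types are simplified, hypothetical stand-ins for decodeArray and XSDecode.

```scala
// Hypothetical pattern-table decoder illustrating the decodeArray idea (not XiangShan code).
object DecodeTableSketch {
  /** Mask/value pattern: an instruction matches if (inst & mask) == value. */
  final case class Pat(value: Int, mask: Int) {
    def matches(inst: Int): Boolean = (inst & mask) == value
  }

  /** Build a Pat from 32 MSB-first characters of '0', '1' or '?' (don't care). */
  def pat(s: String): Pat = {
    require(s.length == 32 && s.forall(c => "01?".indexOf(c) >= 0))
    s.foldLeft(Pat(0, 0)) { (p, ch) =>
      Pat((p.value << 1) | (if (ch == '1') 1 else 0), (p.mask << 1) | (if (ch == '?') 0 else 1))
    }
  }

  // Heavily reduced control bundle, a stand-in for XSDecode.
  final case class Ctrl(fuType: String, rfWen: Boolean, srcType0: String, srcType1: String)

  // Two-entry stand-in for decodeArray.
  val table: Seq[(Pat, Ctrl)] = Seq(
    pat("?" * 17 + "000" + "?" * 5 + "0010011") -> Ctrl("alu", true, "reg", "imm"),           // ADDI
    pat("0000000" + "?" * 10 + "000" + "?" * 5 + "0110011") -> Ctrl("alu", true, "reg", "reg") // ADD
  )

  /** First matching entry wins; None means no pattern matched (an illegal-instruction candidate). */
  def decode(inst: Int): Option[Ctrl] = table.collectFirst { case (p, c) if p.matches(inst) => c }
}
```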
Design specifications
Decoding information
- XSDecode: DecodeConstants defines decodeArray, which maps the 32-bit encoding of an instruction to an XSDecode bundle with the following fields:
  - srcType0: type of source register 0
  - srcType1: type of source register 1
  - srcType2: type of source register 2, used by fma instructions
  - fuType: functional unit type
  - fuOpType: operation type
  - rfWen: whether the instruction writes back to a scalar (integer) register
  - fpWen: whether the instruction writes back to a floating-point register
  - vfWen: whether the instruction writes back to a vector register
  - isXSTrap: whether the instruction is an XSTrap instruction
  - noSpecExec: whether the instruction must not execute speculatively, i.e., it must wait for all preceding instructions to commit before it executes
  - blockBackward: whether the instruction blocks subsequent instructions, i.e., subsequent instructions must wait for it to commit before entering the ROB
  - flushPipe: whether the pipeline must be flushed after the instruction commits
  - canRobCompress: whether the instruction can be compressed in the ROB (instructions that cannot raise exceptions and are not at an FTQ boundary are treated as compressible)
  - uopSplitType: the split type. All scalar instructions use UopSplitType.SCA_SIM and are not split, while vector instructions and AMO_CAS instructions are split. A vector instruction that needs only one uop and no changes to its control signals uses UopSplitType.dummy, so that it still enters the vector complex decoder for vector exception checking.
- VPUCtrlSignals: vector and floating-point instructions need VPUCtrlSignals, which carry vector configuration such as sew and lmul.
  - Vector instructions: the configuration comes from the vtype provided by VTypeGen in DecodeStage.
  - Floating-point instructions: the floating-point module is independent of the vector module but shares execution units with it. These units take the element width from sew, so a dedicated decoding submodule, FPToVecDecoder, generates VPUCtrlSignals for floating-point instructions.
- FPUCtrlSignals: generated by the decoding submodule FPDecoder. rm controls floating-point rounding, wflags controls the i2f module and fflags updates, and the remaining signals control the i2f module.

  ```scala
  class FPUCtrlSignals(implicit p: Parameters) extends XSBundle {
    val typeTagOut = UInt(2.W) // H S D
    val wflags     = Bool()    // controls i2f and fflags updates
    val typ        = UInt(2.W)
    val fmt        = UInt(2.W)
    val rm         = UInt(3.W) // floating-point rounding mode
  }
  ```
- uopnum: UopInfoGen generates the number of uops for each instruction. Scalar instructions produce 1 uop, and AMO_CAS instructions produce 2 or 4 depending on the type. For vector instructions the count is derived from lmul, and vector memory instructions additionally take sew and eew into account.
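A rough model of these counting rules follows. It is a sketch under stated assumptions, not UopInfoGen itself. AMO_CAS is assumed to split into 2 uops for the W/D forms and 4 for the Q form. Widening instructions are assumed to need one uop per destination register (2 × LMUL). Unit-stride memory accesses use EMUL = (EEW / SEW) × LMUL. A fractional result still occupies one register. `UopCount` and its methods are hypothetical names.

```scala
// Sketch of uop-count rules; formulas are simplified assumptions, not UopInfoGen.
object UopCount {
  /** LMUL (or EMUL) as a fraction num/den, e.g. Lmul(1, 2) for mf2, Lmul(4, 1) for m4. */
  final case class Lmul(num: Int, den: Int) {
    require(num > 0 && den > 0)
    /** Register-group size: a fractional value still occupies one register. */
    def regs: Int = math.max(1, num / den)
  }

  def scalar: Int = 1

  /** Assumption: amocas.w/amocas.d split into 2 uops, amocas.q into 4. */
  def amoCas(isQuad: Boolean): Int = if (isQuad) 4 else 2

  /** Single-width vector arithmetic (e.g. VEC_VVV): one uop per destination register. */
  def vecSingleWidth(lmul: Lmul): Int = lmul.regs

  /** Widening: the destination group is 2 * LMUL registers (assumed one uop each). */
  def vecWidening(lmul: Lmul): Int = Lmul(2 * lmul.num, lmul.den).regs

  /** Unit-stride memory: EMUL = (EEW / SEW) * LMUL, one uop per data register. */
  def vecMemUnitStride(eew: Int, sew: Int, lmul: Lmul): Int =
    Lmul(eew * lmul.num, sew * lmul.den).regs
}
```

For example, a unit-stride load with EEW = 16, SEW = 8, and LMUL = 2 has EMUL = 4, so this model yields 4 uops.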
Translation processing
- move instructions: a move is a special form of addi. It is identified from the instruction fields, and move elimination is performed later in the rename stage.
- zimop instructions: a zimop instruction only has to write 0 to rd, so it is translated into an addi instruction with src x0 and imm 0.
- csrr vlenb: the value of vlenb is fixed, so the read is translated into an addi instruction with src x0 and imm VLEN/8.
- csrr vl: vl has its own register file, so reads of vl support renaming and out-of-order execution. A read of vl is converted into a vset instruction that reads vl and writes it to the corresponding rd.
- Software prefetch instructions: fuType is changed to FuType.ldu.U, and the instruction is sent to that functional unit for processing.
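The csrr vlenb rewrite can be illustrated at the encoding level. The sketch assumes the standard RISC-V encodings. `csrr rd, csr` is `csrrs rd, csr, x0` (opcode 0x73, funct3 010), vlenb is CSR 0xC22, and addi uses opcode 0x13 with funct3 000. `CsrReadTranslate` is a hypothetical name. The decoder may well rewrite control fields rather than re-encode the instruction, so this only demonstrates the equivalence.

```scala
// Sketch: csrr rd, vlenb  ==>  addi rd, x0, VLEN/8 (encoding-level illustration only).
object CsrReadTranslate {
  private val OpSystem = 0x73
  private val OpImm = 0x13
  private val CsrVlenb = 0xc22

  private def rdOf(inst: Int) = (inst >>> 7) & 0x1f
  private def rs1Of(inst: Int) = (inst >>> 15) & 0x1f
  private def funct3Of(inst: Int) = (inst >>> 12) & 0x7
  private def csrOf(inst: Int) = (inst >>> 20) & 0xfff

  /** Encode addi rd, rs1, imm (funct3 = 000); imm must fit in a signed 12-bit field. */
  def addi(rd: Int, rs1: Int, imm: Int): Int = {
    require(imm >= -2048 && imm <= 2047)
    ((imm & 0xfff) << 20) | (rs1 << 15) | (rd << 7) | OpImm
  }

  /** Rewrite csrrs rd, vlenb, x0; every other instruction is returned unchanged. */
  def translate(inst: Int, vlen: Int): Int =
    if ((inst & 0x7f) == OpSystem && funct3Of(inst) == 0x2 && rs1Of(inst) == 0 && csrOf(inst) == CsrVlenb)
      addi(rdOf(inst), 0, vlen / 8)
    else inst
}
```

With VLEN = 128, `csrr x5, vlenb` becomes `addi x5, x0, 16`, whose encoding is 0x01000293.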
Exception handling
DecodeUnit handles two types of exceptions: illegalInstr (exception code 2) and virtualInstr (exception code 22).
- illegalInstr
  - The immediate selection is invalid.
  - The instruction is executed under CSR settings that do not permit it.
  - Vector-related exceptions are not checked here; they are handled in the complex decoder.
- virtualInstr
  - The instruction is executed under CSR settings that make it a virtual-instruction exception.
Submodule DecodeUnitComp
Input and Output
Instruction splitting only modifies the operand register numbers and operand types, so the output uops carry the same decoded-instruction information as the input. The vtype of a vset instruction is obtained by decoding that instruction rather than from VTypeGen. The vtypeBypass signal therefore replaces the vtype used by the vset instruction with that vset instruction's own vtype. DecodeUnitCompIO is defined as follows:
```scala
class DecodeUnitCompIO(implicit p: Parameters) extends XSBundle {
  val redirect = Input(Bool())
  val csrCtrl = Input(new CustomCSRCtrlIO)
  val vtypeBypass = Input(new VType)
  // When the first inst in decode vector is complex inst, pass it in
  val in = Flipped(DecoupledIO(new DecodeUnitCompInput))
  val out = new DecodeUnitCompOutput
  val complexNum = Output(UInt(3.W))
}
```
Function
This module splits a vector instruction into multiple micro-operations based on its split type and lmul, modifying the operand register numbers and operand types of each micro-operation. It also performs exception checking for vector instructions. The module is controlled by a state machine. Its ready signal is high only when no instruction is being processed or the current instruction's processing completes, so that it can accept the next instruction.
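The ready/busy behaviour can be modelled cycle by cycle. In this hypothetical sketch (`SplitterModel` is not a XiangShan class), an accepted instruction is held for one cycle before its uops start to leave, matching the one-cycle input register described earlier. At most 6 uops leave per cycle, and `ready` is high when the splitter is idle or its remaining uops all leave in the current cycle. An instruction offered while `ready` is low is not accepted, so the caller must hold it.

```scala
/** Hypothetical cycle-level model of decoderComp's ready/busy behaviour (not XiangShan code). */
final class SplitterModel(width: Int = 6) {
  private var remaining = 0 // uops of the current instruction not yet sent to rename

  /** High when idle, or when all remaining uops can leave in this cycle. */
  def ready(freeSlots: Int): Boolean = remaining <= math.min(width, freeSlots)

  def busy: Boolean = remaining > 0

  /**
    * Simulates one cycle and returns the number of uops sent to rename.
    * freeSlots: rename slots available to decoderComp this cycle.
    * newInst:   numOfUop of an offered vector instruction; accepted only if ready,
    *            and its uops start leaving in the next cycle.
    */
  def cycle(freeSlots: Int, newInst: Option[Int]): Int = {
    val accept = ready(freeSlots)
    val out = math.min(remaining, math.min(width, freeSlots))
    remaining -= out
    if (accept) newInst.foreach(n => remaining = n)
    out
  }
}
```

For an 8-uop instruction, the model emits 0, 6, and then 2 uops over three cycles. It can accept the next instruction in the third cycle.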
Design specifications
There are currently many split types; they are expected to be consolidated and simplified in the future.
| Split type | Corresponding instruction type |
|---|---|
| AMO_CAS_W/AMO_CAS_D/AMO_CAS_Q | AMO_CAS instructions |
| VSET | vset instructions |
| VEC_VVV | Instructions whose source and destination registers are all vector registers |
| VEC_VFV | Instructions with one floating-point source register, where the other source and the destination are vector registers |
| VEC_EXT2/VEC_EXT4/VEC_EXT8 | Vector zero-/sign-extension instructions |
| VEC_0XV | Scalar-to-vector move instructions |
| VEC_VXV | Instructions with one scalar source register, where the other source and the destination are vector registers |
| VEC_VVW/VEC_VFW/VEC_WVW/VEC_VXW/VEC_WXW/VEC_WVV/VEC_WFW/VEC_WXV | Widening/narrowing vector instructions |
| VEC_VVM/VEC_VFM/VEC_VXM | Vector instructions whose destination is a mask register |
| VEC_SLIDE1UP | vslide1up instructions |
| VEC_FSLIDE1UP | vfslide1up instructions |
| VEC_SLIDE1DOWN | vslide1down instructions |
| VEC_FSLIDE1DOWN | vfslide1down instructions |
| VEC_VRED | Single-width integer reduction instructions |
| VEC_VFRED | Unordered floating-point reduction instructions |
| VEC_VFREDOSUM | Ordered floating-point sum reduction instructions |
| VEC_SLIDEUP | vslideup instructions |
| VEC_SLIDEDOWN | vslidedown instructions |
| VEC_M0X | vcpop instructions |
| VEC_MVV | vid/viota instructions |
| VEC_VWW | Widening integer reduction instructions |
| VEC_RGATHER | vrgather instructions |
| VEC_RGATHER_VX | vrgather instructions with one operand from a scalar register |
| VEC_RGATHEREI16 | vrgatherei16 instructions |
| VEC_COMPRESS | vcompress instructions |
| VEC_MVNR | vmvnr instructions |
| VEC_US_LDST | Unit-stride load/store instructions |
| VEC_S_LDST | Strided load/store instructions |
| VEC_I_LDST | Indexed load/store instructions |
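As an example of what a split type controls, the VEC_VVV case can be sketched as follows. With an LMUL-sized register group, uop *i* reads register *i* of each source group and writes register *i* of the destination group. This is a simplified, hypothetical illustration (`VvvSplit` and `Uop` are not XiangShan names), and other split types use different register assignments.

```scala
// Hypothetical sketch of VEC_VVV register renumbering (not XiangShan code).
object VvvSplit {
  final case class Uop(vd: Int, vs1: Int, vs2: Int, uopIdx: Int)

  /** Uop i works on register i of each LMUL-sized group (lmulRegs = 1 for fractional LMUL). */
  def split(vd: Int, vs1: Int, vs2: Int, lmulRegs: Int): Seq[Uop] =
    (0 until lmulRegs).map(i => Uop(vd + i, vs1 + i, vs2 + i, i))
}
```

For `vadd.vv v8, v16, v24` with LMUL = 4, this yields four uops writing v8 through v11. The last uop reads v27 and v19.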
Submodule VecExceptionGen
- Input
  - inst: 32-bit instruction
  - decodedInst: decoded instruction information
  - vtype: vtype information
  - vstart: vstart information
- Output
  - illegalInst: whether the instruction is illegal
Function
Checks vector instructions for exceptions. All vector exceptions except those related to vector memory accesses are checked in the decode stage.
Design specifications
Vector instruction exceptions fall into the following eight types:
| Exception name | Description |
|---|---|
| inst Illegal | A reserved instruction encoding raises an exception. |
| vill Illegal | When the vill field of vtype is 1, executing any vector instruction other than vset raises an exception. |
| EEW Illegal | Illegal EEW for vector floating-point, extension, widening, and narrowing instructions. |
| EMUL Illegal | Illegal EMUL for vector memory, extension, widening, narrowing, and vrgatherei16 instructions. |
| Reg Number Align | vs1, vs2, or vd is not aligned to lmul. |
| v0 Overlap | Raised when certain instructions read v0 while also writing v0. |
| Src Reg Overlap | Raised when vs1 or vs2 partially overlaps vd. |
| vstart Illegal | When vstart is not 0, executing any vector instruction other than vset and vector memory access instructions raises an exception. |
If any of these checks fails, illegalInst is raised.
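Four of the eight checks (vill, vstart, register alignment, and v0 overlap) are simple enough to sketch directly. This is a hypothetical plain-Scala model, not VecExceptionGen. `Ctx`, `Inst`, and their fields are illustrative. Mask-producing instructions that may legally write v0 are not modelled as exemptions, and the EEW, EMUL, inst, and source-overlap checks are omitted.

```scala
// Hypothetical sketch of four VecExceptionGen checks (not XiangShan code).
object VecExceptionSketch {
  /** CSR state seen by the check; lmulRegs is the register-group size (1 for fractional LMUL). */
  final case class Ctx(vill: Boolean, vstart: Int, lmulRegs: Int)

  /** Simplified instruction view. */
  final case class Inst(
      isVset: Boolean = false,
      isVecMem: Boolean = false,
      masked: Boolean = false,  // vm = 0: v0 is read as the mask
      vd: Int = 0,
      vregGroups: Seq[Int] = Nil // vd/vs1/vs2 operands that are LMUL-sized register groups
  )

  // vill Illegal: only vset may execute while vtype.vill = 1
  def villIllegal(i: Inst, c: Ctx): Boolean = c.vill && !i.isVset
  // vstart Illegal: vstart != 0 is allowed only for vset and vector memory instructions
  def vstartIllegal(i: Inst, c: Ctx): Boolean = c.vstart != 0 && !i.isVset && !i.isVecMem
  // Reg Number Align: every register group must start at a multiple of its size
  def alignIllegal(i: Inst, c: Ctx): Boolean = i.vregGroups.exists(_ % c.lmulRegs != 0)
  // v0 Overlap: a masked write whose aligned destination group contains v0, i.e. vd == 0
  def v0Overlap(i: Inst): Boolean = i.masked && i.vd == 0

  /** illegalInst is raised if any modelled check fails. */
  def illegalInst(i: Inst, c: Ctx): Boolean =
    villIllegal(i, c) || vstartIllegal(i, c) || alignIllegal(i, c) || v0Overlap(i)
}
```

With LMUL = 4, `vadd.vv v8, v16, v24` passes, while the same instruction with vd = v9 fails the alignment check.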