Secondary module PMP&PMA
PMP includes the following modules, with PMA checks incorporated within the PMP module:
- PMP (Distributed PMP & PMA Registers)
  - PMP pmp (Frontend)
  - PMP pmp (Memblock)
  - PMP pmp (L2TLB)
- PMPChecker (PMP & PMA checker, returns results in the same cycle)
  - PMPChecker PMPChecker (Frontend)
  - PMPChecker PMPChecker_1 (Frontend)
  - PMPChecker PMPChecker_2 (Frontend)
  - PMPChecker PMPChecker_3 (Frontend)
  - PMPChecker PMPChecker (L2TLB)
  - PMPChecker PMPChecker_1 (L2TLB)
- PMPChecker_8 (PMP & PMA checker, returns results in the next cycle)
  - PMPChecker_8 PMPChecker (Memblock)
  - PMPChecker_8 PMPChecker_1 (Memblock)
  - PMPChecker_8 PMPChecker_2 (Memblock)
  - PMPChecker_8 PMPChecker_3 (Memblock)
  - PMPChecker_8 PMPChecker_4 (Memblock)
  - PMPChecker_8 PMPChecker_5 (Memblock)
Design specifications
- Supports physical address protection
- Supports physical address attributes
- Supports parallel execution checks for PMP and PMA
- Supports dynamic and static checking
- Supports distributed PMP and distributed PMA
- Supports exception handling mechanism
Function
Supports physical address protection
The Xiangshan processor supports physical address protection (PMP) checks. PMP has 16 entries by default, and the count is parameterizable. For timing reasons, the PMP registers are replicated in a distributed fashion. The PMP registers in the CSR unit serve CSR instructions such as CSRRW. Copies of the PMP registers are kept at the front-end instruction fetch, the back-end memory access unit, and the Page Table Walker. The CSR write signals are routed to each copy so that it stays consistent with the PMP registers in the CSR unit.
For the format, reset values, etc., of PMP registers, please refer to the Xiangshan Open-Source Processor User Manual and the RISC-V Privileged Level Manual.
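As an illustration of the replication scheme described above, the following Python sketch models a CSR unit that forwards every PMP write to its distributed copies. It is a behavioral model only, not the RTL. The class and method names (`PmpRegs`, `CsrUnit`, `attach`, `csr_write`) and the `"cfg"`/`"addr"` selector are hypothetical; only the default of 16 entries comes from the text.

```python
NUM_PMP = 16  # default entry count from the text; parameterizable in the design


class PmpRegs:
    """One set of PMP registers: a config byte and an address per entry."""

    def __init__(self, num_entries=NUM_PMP):
        self.cfg = [0] * num_entries
        self.addr = [0] * num_entries

    def write(self, kind, index, value):
        # kind is "cfg" or "addr" (hypothetical selector for this sketch)
        if kind == "cfg":
            self.cfg[index] = value & 0xFF
        elif kind == "addr":
            self.addr[index] = value
        else:
            raise ValueError(f"unknown register kind: {kind}")


class CsrUnit:
    """Holds the architectural registers and drives the distributed copies."""

    def __init__(self):
        self.regs = PmpRegs()
        self.copies = {}

    def attach(self, name):
        # Create a copy that observes every later CSR write.
        self.copies[name] = PmpRegs()
        return self.copies[name]

    def csr_write(self, kind, index, value):
        # The same write updates the CSR registers and every copy.
        self.regs.write(kind, index, value)
        for copy in self.copies.values():
            copy.write(kind, index, value)


csr = CsrUnit()
frontend = csr.attach("Frontend")
memblock = csr.attach("Memblock")
l2tlb = csr.attach("L2 TLB")
csr.csr_write("addr", 0, 0xBFFF)
csr.csr_write("cfg", 0, 0x1F)
```

After the two writes, all three copies hold the same values as the registers in the CSR unit, which is the consistency property the text relies on.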
Supports physical address attributes
Physical Memory Attributes (PMA) are implemented in a PMP-like manner. PMA uses the two reserved bits of the PMP configuration register as the atomic and cacheable bits, which indicate whether atomic operations are supported and whether the region is cacheable, respectively. PMP registers have no initial values, whereas PMA registers come with default initial values, which must be set manually to match the address attributes of the platform. PMA registers occupy reserved M-mode CSR addresses. There are 16 entries by default, and the count is parameterizable.
For the default PMA configuration, please refer to the Xiangshan Open-Source Processor User Manual.
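The configuration byte described above can be decoded as in the following Python sketch. The positions of R, W, X, A, and L follow the RISC-V privileged specification, which leaves bits 5 and 6 reserved. Which of the two reserved bits holds atomic and which holds cacheable is an assumption of this sketch, and the function name `decode_cfg` is hypothetical.

```python
def decode_cfg(cfg):
    """Split an 8-bit PMP/PMA configuration byte into named fields."""
    return {
        "r": bool(cfg & 0x01),          # bit 0: read
        "w": bool(cfg & 0x02),          # bit 1: write
        "x": bool(cfg & 0x04),          # bit 2: execute
        "a": (cfg >> 3) & 0x3,          # bits 4:3: address-matching mode
        "atomic": bool(cfg & 0x20),     # bit 5: ASSUMED position of atomic
        "cacheable": bool(cfg & 0x40),  # bit 6: ASSUMED position of cacheable
        "l": bool(cfg & 0x80),          # bit 7: lock
    }


fields = decode_cfg(0x6F)
```

For the sample value `0x6F`, the sketch reports R, W, and X set, matching mode 1, both PMA attribute bits set, and the lock bit clear.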
PMP and PMA perform parallel checks
PMP and PMA checks are performed in parallel. If either check fails, the access is illegal. Every physical address access within the core requires a physical address permission check. This covers the physical addresses produced by ITLB and DTLB lookups as well as the memory accesses issued by the Page Table Walker, Hypervisor Page Table Walker, and Last Level Page Table Walker. The distributed PMP and PMA registers, together with the PMP and PMA checkers that serve the ITLB, DTLB, Page Table Walker, Last Level Page Table Walker, and Hypervisor Page Table Walker, are listed in the following table. In other words, Frontend, Memblock, and L2 TLB each maintain a copy of the PMP and PMA registers (see Section 5.2.5), which drives their respective PMP and PMA checkers.
| Module | Channel | Distributed PMP & PMA | PMP & PMA Check Unit |
|---|---|---|---|
| ITLB | requestor(0) | pmp (Frontend) | PMPChecker |
| ITLB | requestor(1) | pmp (Frontend) | PMPChecker_1 |
| ITLB | requestor(2) | pmp (Frontend) | PMPChecker_2 |
| ITLB | requestor(3) | pmp (Frontend) | PMPChecker_3 |
| DTLB_LD | requestor(0) | pmp (Memblock) | PMPChecker |
| DTLB_LD | requestor(1) | pmp (Memblock) | PMPChecker_1 |
| DTLB_LD | requestor(2) | pmp (Memblock) | PMPChecker_2 |
| DTLB_ST | requestor(0) | pmp (Memblock) | PMPChecker_3 |
| DTLB_ST | requestor(1) | pmp (Memblock) | PMPChecker_4 |
| DTLB_PF | requestor(0) | pmp (Memblock) | PMPChecker_5 |
| L2 TLB | Page Table Walker | pmp (L2 TLB) | PMPChecker |
| L2 TLB | Last Level Page Table Walker | pmp (L2 TLB) | PMPChecker_1 |
| L2 TLB | Hypervisor Page Table Walker | pmp (L2 TLB) | PMPChecker_2 |
According to the RISC-V manual, a page fault has higher priority than an access fault. However, if the Page Table Walker or Last Level Page Table Walker hits an access fault during its PMP or PMA check, the page table entry is invalid, which produces a special case in which a page fault and an access fault occur at the same time. In this case Xiangshan reports the access fault. The manual does not explicitly address this scenario, and this choice may be inconsistent with it. In all other cases, page faults take precedence over access faults.
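The priority rule above can be summarized in a small Python sketch. The function name `select_fault` and its boolean arguments are hypothetical; the sketch only encodes the ordering stated in the text and is not taken from the RTL.

```python
def select_fault(page_fault, access_fault, walker_access_fault=False):
    """Pick the fault to report for one translated access.

    walker_access_fault: the Page Table Walker or Last Level Page Table
    Walker failed its own PMP/PMA check while fetching the page table entry.
    """
    if walker_access_fault:
        # Special case from the text: both faults coexist, access fault wins.
        return "access fault"
    if page_fault:
        # Normal case: page fault has priority over access fault.
        return "page fault"
    if access_fault:
        return "access fault"
    return None
```

For example, `select_fault(True, True)` yields a page fault, while `select_fault(True, False, walker_access_fault=True)` yields an access fault.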
Dynamic and static checks
According to the manual, PMP and PMA checks should be dynamic: they are performed after TLB translation, using the translated physical address for the permission check. The Frontend, L2 TLB, and Memblock PMPCheckers (see the table above) all perform dynamic checks. For timing reasons, the PMP & PMA check results for the DTLB can instead be queried in advance and stored in the TLB entry at backfill time, which constitutes static checking. Specifically, when a page table entry from the L2 TLB is backfilled into the DTLB, the entry is sent to PMP and PMA for permission checks at the same time, and the resulting attribute bits (R, W, X, C, and Atomic; see Section 5.4 for their meanings) are stored in the DTLB. These results can then be returned directly to MemBlock without rechecking. Static checking requires the granularity of PMP and PMA to be raised to 4KB.
Note that PMP & PMA checks are currently not a timing bottleneck for Kunminghu, so static checks are not used: all checks are dynamic, performed on the physical address obtained from the TLB lookup. The Kunminghu V1 code contains only dynamic checks and no static checks. For compatibility, however, the granularity of PMP and PMA remains 4KB.
The result information obtained from dynamic and static checks is as follows:
- Dynamic check: Returns whether an inst access fault, load access fault, or store access fault occurred, and whether the physical address belongs to the MMIO address space.
- Static check: Returns the attribute bits of the checked physical address, including R, W, X, C, and Atomic. Note that Kunminghu V1 does not perform static checks by default.
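To make the static-check mechanism concrete, the following Python sketch stores the attribute bits alongside each DTLB entry at backfill time and returns them on a hit without rechecking. It illustrates the idea only: Kunminghu V1 does not enable static checks, and the class `StaticCheckDtlb`, the callable `check_attrs`, and the stand-in `fake_check` are all hypothetical.

```python
class StaticCheckDtlb:
    """Toy DTLB that stores PMP/PMA attribute bits with each entry."""

    def __init__(self, check_attrs):
        # check_attrs is a hypothetical callable: ppn -> dict of attribute bits
        self.check_attrs = check_attrs
        self.entries = {}

    def refill(self, vpn, ppn):
        # Static check: run PMP/PMA once, when the entry is backfilled.
        self.entries[vpn] = (ppn, self.check_attrs(ppn))

    def lookup(self, vpn):
        # A hit returns the stored attributes without rechecking.
        return self.entries.get(vpn)


calls = []


def fake_check(ppn):
    # Stand-in for the PMP/PMA checker; records how often it is invoked.
    calls.append(ppn)
    return {"r": True, "w": False, "x": False, "c": True, "atomic": False}


dtlb = StaticCheckDtlb(fake_check)
dtlb.refill(vpn=0x10, ppn=0x80000)
first = dtlb.lookup(0x10)
second = dtlb.lookup(0x10)
```

The checker runs once per backfill, no matter how many lookups hit the entry. Since one set of attribute bits is stored per entry, this arrangement is consistent with the 4KB granularity requirement stated above.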
Distributed PMP and PMA
The implementation of PMP and PMA spans four parts: the CSR Unit, Frontend, Memblock, and L2 TLB. The CSR Unit serves CSR instructions such as CSRRW that read and write the PMP and PMA registers. Because the CSR Unit is physically far from the ITLB, DTLB, and L2 TLB, copies of the PMP and PMA registers must be kept close to them for physical address and physical attribute checks. This is the distributed PMP and PMA scheme: backup copies of these registers are maintained near the ITLB, DTLB, and L2 TLB.
The backup copies are stored in the Frontend, Memblock, and L2 TLB, which perform the address checks. The CSR write signals are routed to every copy to keep the register contents consistent. Because the L1 TLBs are small, their copies are stored in the Frontend and Memblock, serving the ITLB and DTLB respectively. The L2 TLB is larger, so its copy is stored directly inside it.
PMP and PMA Check Process
After an ITLB or DTLB lookup produces a physical address, and before the L2 TLB's Page Table Walker, Last Level Page Table Walker, or Hypervisor Page Table Walker accesses memory, a physical address check must be performed. ITLB, DTLB, and L2 TLB must provide the PMPChecker with the following: the PMP and PMA configuration registers and address registers; the PMP and PMA masks, i.e., the number of consecutive 1s counted from the low end of the address registers (at least 12, since the granularity of PMP and PMA is 4KB); the physical address to check; and the type of permission required, namely execute (ITLB), read/write (L2 TLB, LoadUnits, and StoreUnits), or atomic read/write (AtomicsUnit).
The information required for PMP and PMA check requests is shown in the following table:
| PMPChecker module | Information required | Source |
|---|---|---|
| Frontend | PMP and PMA configuration registers | Frontend pmp |
| Frontend | PMP and PMA address registers | Frontend pmp |
| Frontend | PMP and PMA mask, i.e., the number of consecutive 1s from low to high in the address registers, with a minimum of 12 | Frontend pmp |
| Frontend | The queried paddr | Icache, IFU |
| Frontend | The queried cmd; for ITLB it is fixed at 2, indicating that execute permission is required | Icache, IFU |
| Memblock | PMP and PMA configuration registers | Memblock pmp |
| Memblock | PMP and PMA address registers | Memblock pmp |
| Memblock | PMP and PMA mask, i.e., the number of consecutive 1s from low to high in the address registers, with a minimum of 12 | Memblock pmp |
| Memblock | The queried paddr | LoadUnits, L1 Load Stream & Stride Prefetch, StoreUnits, AtomicsUnit, SMSprefetcher |
| Memblock | The queried cmd; for DTLB it may be 0, 1, 4, or 5, representing read, write, atom_read, and atom_write permissions respectively | LoadUnits, L1 Load Stream & Stride Prefetch, StoreUnits, AtomicsUnit, SMSprefetcher |
| Memblock static check | PMP and PMA configuration registers | Memblock pmp |
| Memblock static check | PMP and PMA address registers | Memblock pmp |
| Memblock static check | PMP and PMA mask; the lower i bits are 1 and the higher bits are 0, where i is log2 of the size of the address space matched by the entry | Memblock pmp |
| Memblock static check | The queried paddr | PTW returned by L2 TLB |
| L2 TLB | PMP and PMA configuration registers | L2 TLB pmp |
| L2 TLB | PMP and PMA address registers | L2 TLB pmp |
| L2 TLB | PMP and PMA mask; the lower i bits are 1 and the higher bits are 0, where i is log2 of the size of the address space matched by the entry | L2 TLB pmp |
| L2 TLB | The queried paddr | Page Table Walker, Last Level Page Table Walker, Hypervisor Page Table Walker |
| L2 TLB | The queried cmd; for L2 TLB it is fixed at 0, indicating that read permission is required | Page Table Walker, Last Level Page Table Walker, Hypervisor Page Table Walker |
The PMPChecker must report to ITLB, DTLB, and L2 TLB whether an inst access fault (ITLB), load access fault (LoadUnits, L2 TLB), or store access fault (StoreUnits, AtomicsUnit) occurred, and whether the address belongs to MMIO space (ITLB, DTLB, L2 TLB). In addition, static checks must fill the DTLB with the address attribute bits: cacheable, atomic, x, w, and r.
For requests from ITLB and L2 TLB, the PMP and PMA check results are returned in the same cycle; for requests from DTLB, the results are returned in the next cycle. The information returned by PMP and PMA checks is shown in the following table:
| PMPChecker module | Information to be returned | Destination |
|---|---|---|
| Frontend | Whether an inst access fault occurs | Icache, IFU |
| Frontend | Whether the address belongs to MMIO space | Icache, IFU |
| Memblock dynamic check | Whether a load access fault occurs | LoadUnits |
| Memblock dynamic check | Whether a store access fault occurs | StoreUnits, AtomicsUnit |
| Memblock dynamic check | Whether the address belongs to MMIO space | LoadUnits, StoreUnits, AtomicsUnit |
| Memblock static check | Whether the address is cacheable | DTLB |
| Memblock static check | Whether the address is atomic | DTLB |
| Memblock static check | Whether the address is executable | DTLB |
| Memblock static check | Whether the address is writable | DTLB |
| Memblock static check | Whether the address is readable | DTLB |
| L2 TLB | Whether a load access fault occurs | Page Table Walker, Last Level Page Table Walker, Hypervisor Page Table Walker |
| L2 TLB | Whether the address belongs to MMIO space | Page Table Walker, Last Level Page Table Walker, Hypervisor Page Table Walker |
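The request and response fields in the two tables above can be tied together with a short Python sketch that maps a `cmd` value and the entry's permission bits to the access-fault flags. The `cmd` encodings (0, 1, 2, 4, 5) come from the text. The function name `check_response`, the flag names, and the handling of atomic requests are assumptions of this sketch. The MMIO flag is left out because the text does not say how it is derived.

```python
CMD_READ, CMD_WRITE, CMD_EXEC = 0, 1, 2
CMD_ATOM_READ, CMD_ATOM_WRITE = 4, 5


def check_response(cmd, r, w, x, atomic=False):
    """Return the access-fault flags for one request."""
    resp = {"instr": False, "ld": False, "st": False}
    if cmd == CMD_EXEC:
        resp["instr"] = not x
    elif cmd == CMD_READ:
        resp["ld"] = not r
    elif cmd == CMD_WRITE:
        resp["st"] = not w
    elif cmd in (CMD_ATOM_READ, CMD_ATOM_WRITE):
        # ASSUMPTION: an atomic needs the atomic attribute plus R (atom_read)
        # or W (atom_write); a failure is reported as a store access fault,
        # since the text routes AtomicsUnit faults as store access faults.
        perm = r if cmd == CMD_ATOM_READ else w
        resp["st"] = not (perm and atomic)
    else:
        raise ValueError(f"unsupported cmd: {cmd}")
    return resp
```

An ITLB request (`cmd` 2) to a non-executable region sets only the `instr` flag, while a Page Table Walker request (`cmd` 0) to a non-readable region sets only the `ld` flag.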
Exception handling
Exceptions that may arise from PMP and PMA checks include: inst access fault (ITLB), load access fault (LoadUnits, L2 TLB), and store access fault (StoreUnits, AtomicsUnit). Exceptions generated by ITLB and DTLB are delivered to the module that issued the physical address query, according to the request source: ITLB exceptions go to Icache or IFU, and DTLB exceptions go to LoadUnits, StoreUnits, or AtomicsUnit.
Since Page Table Walker, Last Level Page Table Walker, or Hypervisor Page Table Walker must perform PMP and PMA checks on the physical address before accessing memory, L2 TLB may generate an access fault. L2 TLB does not directly handle the generated access fault but returns this information to L1 TLB. Upon detecting an access fault during a query, L1 TLB will generate an inst access fault, load access fault, or store access fault based on the requested cmd and deliver it to the respective modules for processing according to the request source.
Possible exceptions and the MMU module's handling process are shown in the following table:
| Module | Possible exceptions | Processing flow |
|---|---|---|
| ITLB | Generate inst access fault | Delivered to Icache or IFU for processing, based on the request source |
| DTLB | Generate load access fault | Delivered to LoadUnits for processing |
| DTLB | Generate store access fault | Delivered to StoreUnits or AtomicsUnit for processing, based on the request source |
| L2 TLB | Generate access fault | Delivered to L1 TLB, which processes it based on the request source |
Check rules
The PMP and PMA checking rules of the Xiangshan Kunminghu architecture follow the PMP and PMA sections of the RISC-V manual; only the matching modes are introduced here. The physical address range controlled by a PMP or PMA entry is determined jointly by the A field of the PMP or PMA configuration register and the PMP or PMA address register. To support static checking in the DTLB (see Section 5.4.2.4), the granularity of PMP and PMA is raised to 4KB, so the minimum physical address range controlled by a PMP or PMA entry is 4KB.
The A field of the configuration register selects the matching mode: values 0, 1, 2, and 3 correspond to OFF, TOR, NA4, and NAPOT respectively.
- A is 0, OFF mode: This PMP or PMA entry is disabled and does not match any address;
- A is 1, TOR mode (Top of range): Matches addresses from the previous PMP or PMA entry's address register up to the current PMP or PMA entry's address register;
- A is 2, NA4 mode (Naturally Aligned Four-byte regions): Kunminghu architecture in Xiangshan does not support NA4 mode;
- A is 3, NAPOT mode (Naturally Aligned Power-of-two regions): Count the consecutive 1s starting from the least significant bit of the PMP or PMA address register. If the address register is ADDR = yyy...111 (with x trailing 1s), the matched region starts at yyy...000 shifted left by 2 bits and spans \(2^{x+3}\) bytes. Since the Xiangshan Kunminghu architecture specifies a minimum PMP and PMA granularity of 4KB, the smallest matched address range is 4KB.
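The matching modes listed above can be sketched in Python as follows. This is a simplified model under stated assumptions: the function names are hypothetical, the TOR bounds are not rounded to the 4KB granularity, and raising an error for NA4 is a choice of this sketch, since the text only says that NA4 is unsupported.

```python
OFF, TOR, NA4, NAPOT = 0, 1, 2, 3
MIN_LOG2 = 12  # 4KB minimum granularity stated in the text


def trailing_ones(value):
    """Count consecutive 1 bits starting from the least significant bit."""
    count = 0
    while value & 1:
        value >>= 1
        count += 1
    return count


def napot_region(pmpaddr):
    """Return (base, size) in bytes of the NAPOT region encoded by pmpaddr."""
    log2_size = max(trailing_ones(pmpaddr) + 3, MIN_LOG2)
    size = 1 << log2_size
    # The address register holds address bits [..:2], hence the left shift.
    base = (pmpaddr << 2) & ~(size - 1)
    return base, size


def entry_matches(paddr, a, pmpaddr, prev_pmpaddr=0):
    """Check whether paddr falls in the range selected by one entry."""
    if a == OFF:
        return False
    if a == TOR:
        return (prev_pmpaddr << 2) <= paddr < (pmpaddr << 2)
    if a == NAPOT:
        base, size = napot_region(pmpaddr)
        return base <= paddr < base + size
    # NA4 is not supported; raising here is a choice of this sketch.
    raise NotImplementedError("NA4 mode is not supported")
```

With an address register value of `0xBFFF` (14 trailing 1s), `napot_region` returns a 128KB region starting at `0x20000`.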
To simplify address matching, the distributed PMP and PMA send mask signals to the PMPChecker. In the mask, the lower i bits are 1 and the higher bits are 0, where i is log2 of the size of the address space matched by the entry. The mask is updated at the same time as the PMP or PMA entry. Because the Xiangshan Kunminghu architecture supports a minimum PMP and PMA granularity of 4KB, the lower 12 bits of the mask are always 1.
For example, if a PMP entry's pmpaddr is 16'b1111_0000_0000_0000, then because the minimum PMP and PMA granularity supported by the Xiangshan Kunminghu architecture is 4KB, the address range matched in NAPOT mode is \(2^{12}\) B (4KB), and the mask signal value is 18'hfff.
As another example, if the pmpaddr of a PMP entry is 16'b1011_1111_1111_1111, the address range matched in NAPOT mode is \(2^{17}\) B (128KB), and the mask signal value is 18'h1ffff.
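The two examples above can be reproduced with a short Python sketch that derives the mask from the address register. The function names `napot_mask` and `mask_match` are hypothetical. Comparing only the bits above the mask is one common way to use such a mask and is an assumption here, not a description of the PMPChecker logic.

```python
def napot_mask(pmpaddr, min_log2=12):
    """Mask with the low log2(region size) bits set, at least min_log2 bits."""
    ones = 0
    while (pmpaddr >> ones) & 1:
        ones += 1
    # x trailing 1s encode a region of 2**(x + 3) bytes; clamp to 4KB.
    return (1 << max(ones + 3, min_log2)) - 1


def mask_match(paddr, pmpaddr, mask):
    """True when paddr and the entry agree on every bit above the mask."""
    return ((paddr ^ (pmpaddr << 2)) & ~mask) == 0


small = napot_mask(0b1111_0000_0000_0000)
large = napot_mask(0b1011_1111_1111_1111)
```

Here `small` equals `0xfff` and `large` equals `0x1ffff`, matching the mask values given in the text.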
Overall Block Diagram
The overall block diagrams of the PMP module and the PMA module are shown in the two figures below, respectively. The CSR Unit serves CSR instructions such as CSRRW that read and write the PMP and PMA registers. Backup copies of these registers are kept in the Frontend, Memblock, and L2 TLB, which perform the address checks. The CSR write signals are routed to every copy to keep the register contents consistent.


Interface list
Refer to the interface list documentation.
Interface timing
For ITLB and L2 TLB, PMP and PMA checks must return results in the same cycle; for DTLB, PMP and PMA checks return results in the next cycle. The interface timing of the ITLB and L2 TLB PMP modules is shown in the figure below.
The interface timing of the DTLB PMP module is shown in the figure below; the timing is identical for static and dynamic checks.
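The difference between the two timings can be illustrated with a cycle-by-cycle Python model. It is a behavioral sketch only: the class names, the `cycle` method, and the stand-in check function `is_fault` (with its address threshold) are hypothetical and say nothing about where the register sits in the actual design.

```python
class SameCycleChecker:
    """Result is visible in the cycle the request is presented."""

    def __init__(self, check):
        self.check = check

    def cycle(self, req):
        return None if req is None else self.check(req)


class NextCycleChecker:
    """Result is visible one cycle after the request is presented."""

    def __init__(self, check):
        self.check = check
        self._held = None  # result waiting to be returned next cycle

    def cycle(self, req):
        resp = self._held
        self._held = None if req is None else self.check(req)
        return resp


def is_fault(paddr):
    # Hypothetical stand-in for the PMP/PMA check.
    return paddr >= 0x8000_0000


requests = [0x1000, 0x9000_0000, None]  # None means no request this cycle
same = SameCycleChecker(is_fault)
nxt = NextCycleChecker(is_fault)
same_trace = [same.cycle(r) for r in requests]
next_trace = [nxt.cycle(r) for r in requests]
```

The same-cycle trace lines up with the request sequence, while the next-cycle trace is the same sequence of results shifted by one cycle.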