跳转至

[XiangShan Biweekly 87] 20251013

Welcome to XiangShan biweekly column! Through this column, we will regularly share the latest development progress of XiangShan. We look forward to your contribution.

This is the 87th issue of the biweekly report.

In the past two weeks, ~~the XiangShan Team had a great National Day Holiday~~the frontend team continued to fix performance bugs caused by the V3 BPU refactoring. The backend team organized existing code and continued to promote V3 development. The memory and cache team fixed several V2 bugs while conducting code refactoring work to prepare for V3 development.

We also have an announcement to make: the XiangShan Team will be giving a tutorial at the MICRO 2025 conference on October 19th. We are very much looking forward to seeing everyone in Seoul!

Recent Developments

Frontend

  • RTL feature
  • Remove identifiedCfi (#5025)
  • Merge V3 SC skeleton (#5062, #5097)
  • Support WriteBuffer multi-port write (#5081)
  • Optimize V3 ITTAGE, use WriteBuffer, support resolve training (#5099)
  • Bug Fix
  • Fix the issue of cfiPosition field when BPU fallthrough prediction (#5058)
  • Fix several typos in TAGE (#5086, #5090)
  • Fix several typos in MBTB (#5096)
  • Fix the issue of meta SRAM read-write conflict caused by FTQ resolve queue not being flushed by redirect (#5085, #5104)
  • Remove newest target logic in FTQ-backend interface (#5101)
  • Fix the issue of ICache WayLookup read-write conflict assertion condition (#5082)
  • Fix the issue that Direct branches are corrected by IFU, causing the backend failed to mark as mispredicted and BPU cannot be trained (#5103)
  • Code quality improvements
  • Refactor BPU first taken branch selection logic, using CompareMatrix class (#5075)
  • Fix the issue that FTQ resolve queue branch index calculation starts from 1 (#5092)
  • Refactor IFU MMIO fetch handling logic (#5021)
  • Refactor IFU redirect port, moving arbitration logic from FTQ to IFU (#5064)
  • Debugging tools
  • Add BpTrace and related analysis scripts (#5091)

Backend

  • Bug Fix
  • Fix the issue when ROB compress close, fusion which cross two ftq cannot be compressed (#5079)
  • RTL new features
  • Optimize the timing of vialuf in NewMgu
  • Code quality improvements
  • Add jalr/jal/auipc implementation to ALU (#5078)

MemBlock and Cache

  • Bug Fix
  • (V2)Fix incorrect TLB level refill during exceptions, which caused large pages to be recorded as small pages(#5087
  • (V2)fix bitmap check result wakeup l0BitmapReg logic(#5073
  • Move NEMU's PMEMBASE to higher address to prevent conflicts with mmap mappings that use the MAP_FIXED mode. Future work may further eliminate the dependency on MAP_FIXED(NEMU #930
  • Timing optimization
  • Split the data SRAM of CoupledL2 into 4 parts to meet the new physical design backend requirements (CoupledL2 #432)
  • The timing fix of old MMU, LoadQueueReplay, etc. is on going
  • RTL new features
  • The refactoring of MMU, LoadUnit, StoreQueue, L2, etc. is ongoing
  • L1 Acquire gets the way information to save the cost of reading the directory to obtain the number of way during Release. Fixing bugs

Performance Evaluation

SPECint 2006 est. @ 3GHz SPECfp 2006 est. @ 3GHz
400.perlbench 35.87 410.bwaves 67.22
401.bzip2 25.51 416.gamess 41.01
403.gcc 47.87 433.milc 45.07
429.mcf 60.18 434.zeusmp 51.83
445.gobmk 30.62 435.gromacs 33.65
456.hmmer 41.61 436.cactusADM 46.20
458.sjeng 30.62 437.leslie3d 47.80
462.libquantum 122.58 444.namd 28.87
464.h264ref 56.59 447.dealII 73.82
471.omnetpp 41.41 450.soplex 52.48
473.astar 29.30 453.povray 53.50
483.xalancbmk 72.81 454.Calculix 16.38
GEOMEAN 44.66 459.GemsFDTD 39.71
465.tonto 36.72
470.lbm 91.98
481.wrf 40.64
482.sphinx3 49.13
GEOMEAN 44.96

We use SimPoint to sample programs and create checkpoints images based on our custom format. The coverage of SimPoint clustering reaches 100%. Note that the above scores are estimated based on program segments rather than a complete SPEC CPU2006 evaluation, which may deviate from the actual performance of real chips.

Compilation parameters are as follows:

Compiler gcc12
Optimization level O3
Memory library jemalloc
-march RV64GCB
-ffp-contraction fast

Processor and SoC parameters are as follows:

Commit defcc01
Date 10/10/2025
L1 ICache 64KB
L1 DCache 64KB
L2 Cache 1MB
L3 Cache 16MB
LSU 3ld2st
Bus protocol TileLink
Memory latency DDR4-3200

Editors: Zhihao Xu, Junxiong Ji, Zhuo Chen, Junjie Yu, Yanjun Li