2025

[XiangShan Biweekly 92] 20251222

Welcome to XiangShan biweekly column! Through this column, we will regularly share the latest development progress of XiangShan.

This is the 92nd issue of the biweekly report.

In this last biweekly report of 2025, we are excited to announce, for the first time, the performance evaluation results of the current Kunminghu V3 architecture on SPEC CPU2006! Since performance regression testing of Kunminghu V3 began in August this year, we have completed 11 regression runs. These 11 runs bear witness to the XiangShan team working together to rapidly develop and iterate on the design. The initial version of Kunminghu V3 scored only 3.717 points/GHz on SPEC CPU2006. In the latest regression run, V3 reached 16.081 points/GHz, surpassing the score of V2. V3 has also replaced V2 as the new mainline of the XiangShan repository!

Performance Regression Results for XiangShan Kunminghu

During this process, ~~the frontend undoubtedly took the biggest blame~~ the most significant change has been the brand-new V3 frontend. The new frontend greatly improves instruction bandwidth: it can now predict up to 8 branches and supply 32 instructions per cycle. The backend and memory subsystem have raised their throughput to match, including going from 6 to 8 issue ports and resizing various queues.

It is worth noting that the performance curve of V3 vividly reflects the XiangShan team's agile development philosophy. Unlike a traditional waterfall process, V3 was not delivered as a single drop of complete code; it is the result of rapid iteration and continuous evolution from the initial code. We believe this philosophy will bring a new development paradigm to the industry and will certainly help Kunminghu V3 reach new heights, further raising the performance benchmark for open-source processors.

We appreciate your companionship and support for XiangShan, and we look forward to your continued attention to the subsequent progress of Kunminghu V3!

In terms of XiangShan development, the frontend has fixed some BPU-related performance bugs and added numerous performance counters for better performance analysis. The backend continues to advance the design of the new vector unit. The memory subsystem has fixed several bugs in V2 and is continuing with V3 module refactoring and infrastructure construction.

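[2025 XiangShan Getting Started Guide · Doing Memory Access in Room 827] (8) Memory Access Revisited

In the "2025 XiangShan Getting Started Guide" series, we aim to build a hands-on guide to XiangShan based on the June 2025 Kunminghu V2 version (commit hash 6318236), using a sequence of guided introductory articles to help newcomers learn, understand, and eventually master XiangShan.

827 is the main office of the Kunminghu project's memory access team, and the "Doing Memory Access in Room 827" series takes its name from it to introduce the design of XiangShan's memory access subsystem. As the final chapter of the series, this article examines the microarchitectural design details and optimization trade-offs of the memory access subsystem. It first covers parts of the Kunminghu V2 memory access unit that earlier articles did not discuss, mainly the role of micro-operations (uops) and the joint performance optimizations between the memory access pipeline and the memory access queues. It then discusses several memory access optimizations common in modern processors and analyzes the trade-offs among them. Finally, it introduces several memory-access-related RISC-V extensions, closing the systematic summary of the whole memory access system with a look ahead at how the technology is evolving.

Finally, thank you all for your support along the way. We hope to write a new chapter together!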

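[2025 XiangShan Getting Started Guide · Doing Memory Access in Room 827] (7) Everything You Need to Know About the Load/Store Queues

In the "2025 XiangShan Getting Started Guide" series, we aim to build a hands-on guide to XiangShan based on the June 2025 Kunminghu V2 version (commit hash 6318236), using a sequence of guided introductory articles to help newcomers learn, understand, and eventually master XiangShan.

827 is the main office of the Kunminghu project's memory access team, and the "Doing Memory Access in Room 827" series takes its name from it to introduce the design of XiangShan's memory access subsystem. This article is part seven of the series and covers the load/store queues. It analyzes the load/store queue (LSQ) design of the Kunminghu V2 processor and explains how it improves execution efficiency while guaranteeing the correctness of out-of-order memory access. The main topics are: using StoreQueue to maintain store instruction order and forward data to loads; using LoadQueueRAR to handle multi-core coherence issues; managing different memory attributes such as Main Memory, I/O, and NC (Non-Cacheable) in combination with the Svpbmt extension; and using the LoadQueueReplay module, with its complex arbitration logic, to handle the various load replay requirements. The article does not give every detail of the LSQ, but it introduces most of its key functions as a starting point for further discussion.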
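To make the idea of store-to-load forwarding concrete, here is a minimal Python sketch. It is only an illustration of the general technique, not XiangShan's StoreQueue: the `StoreEntry` type and `forward_from_store_queue` function are hypothetical names, and the model assumes every store address is already known and that loads and stores always have the same size and alignment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoreEntry:
    """One hypothetical store queue entry (illustrative, not XiangShan's layout)."""
    addr: int  # address written by the store
    data: int  # value the store will write


def forward_from_store_queue(older_stores: List[StoreEntry], load_addr: int) -> Optional[int]:
    """Return the data of the youngest older store to load_addr, if any.

    older_stores holds the stores that precede the load in program order,
    oldest first. None means no store matches, so the load would read the
    cache or memory instead.
    """
    for entry in reversed(older_stores):  # scan from youngest to oldest
        if entry.addr == load_addr:
            return entry.data
    return None


stores = [StoreEntry(0x100, 1), StoreEntry(0x200, 2), StoreEntry(0x100, 3)]
print(forward_from_store_queue(stores, 0x100))  # 3: the youngest matching store wins
print(forward_from_store_queue(stores, 0x300))  # None: nothing to forward
```

Scanning from the youngest entry backwards is what preserves program order: a load must see the most recent older store to its address, not just any matching one.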

[XiangShan Biweekly 91] 20251208

Welcome to XiangShan biweekly column! Through this column, we will regularly share the latest development progress of XiangShan.

This is the 91st issue of the biweekly report.

The final performance regression of Kunminghu V2 has been successfully completed! Before we knew it, we had completed a total of 46 biweekly performance regression tests. These 46 regressions not only record the performance evolution of Kunminghu V2, but also stand as strong testimony to the vigorous development and continuous progress of the XiangShan processor.

Performance Regression Results for XiangShan Kunminghu V2

Over the past four years, the XiangShan processor has grown from a campus course project into an industrial-grade processor. From "Yanqihu", where it all began, to "Nanhu", which achieved industrial delivery for the first time and has been used in multiple projects, and then to "Kunminghu V2", currently the highest-performance open-source processor core: the evolution of these three generations not only embodies the efforts of every team member but also relies on the continuous attention and strong support of our community partners. We sincerely thank everyone!

Now it is time to bid farewell to Kunminghu V2 and welcome Kunminghu V3! V3 will deliver higher performance than V2, which also means greater challenges. This is uncharted territory for the XiangShan team, and every step we take writes new history. Still, we firmly believe that through the new concept and method of "open source", we can move forward together with the entire community and further raise the performance benchmark for open-source processors.

Thank you for your companionship and support for XiangShan, and we look forward to your continued attention to the progress of Kunminghu V3!

In terms of XiangShan development, the frontend has fixed ~~countless~~ BPU-related performance bugs ~~performance is finally close to the pre-refactoring level~~ and added performance counters for better performance analysis. The backend and memory subsystem have fixed several bugs in V2 and further optimized timing. For V3, the backend continues to advance the design of the new vector unit, and the memory subsystem has carried out module refactoring and testing, along with prefetch performance exploration.

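[2025 XiangShan Getting Started Guide · Doing Memory Access in Room 827] (6) The Lure of Parallel Computing

In the "2025 XiangShan Getting Started Guide" series, we aim to build a hands-on guide to XiangShan based on the June 2025 Kunminghu V2 version (commit hash 6318236), using a sequence of guided introductory articles to help newcomers learn, understand, and eventually master XiangShan.

827 is the main office of the Kunminghu project's memory access team, and the "Doing Memory Access in Room 827" series takes its name from it to introduce the design of XiangShan's memory access subsystem. This article is part six of the series and covers vector memory access. The history of computer hardware is essentially a pursuit of computing speed, and as traditional serial computing hit physical limits in power and heat dissipation, parallel computing became the inevitable path to higher performance. The article introduces the four architecture classes SISD, SIMD, MISD, and MIMD, and focuses on vector computing, a key technology in modern processors. Unlike traditional fixed-width SIMD instruction sets, RISC-V Vector (RVV) introduces the concept of a variable vector length, which lets hardware choose its register length to suit the target scenario and decouples software code from the hardware implementation. Using the Kunminghu V2 processor as an example, the article also briefly explains how the vector load/store unit of its tightly coupled vector unit is implemented, including the split and merge modules that handle vector memory access instructions and the optimization strategies for different access patterns.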
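The decoupling that a variable vector length provides can be sketched with a strip-mined loop. The Python below is a simplified model, not RVV code: `vsetvl` is a hypothetical stand-in for the idea behind the RVV `vsetvli` instruction (the real instruction has more options and may pick other lengths in some cases), and `vlmax` is an assumed parameter representing how many elements a given hardware implementation can process at once.

```python
from typing import List


def vsetvl(remaining: int, vlmax: int) -> int:
    """Simplified model: the hardware grants at most vlmax elements per pass."""
    return min(remaining, vlmax)


def vector_add(a: List[int], b: List[int], vlmax: int) -> List[int]:
    """Strip-mined element-wise add that never hard-codes the vector width."""
    assert len(a) == len(b)
    out: List[int] = []
    i = 0
    while i < len(a):
        vl = vsetvl(len(a) - i, vlmax)  # ask how many elements to handle now
        out.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        i += vl                         # advance by whatever was granted
    return out


a = list(range(10))
b = [1] * 10
# The same loop gives the same result on "narrow" and "wide" hardware.
print(vector_add(a, b, vlmax=4) == vector_add(a, b, vlmax=16))  # True
```

Because the loop asks for the vector length on every pass, the same code runs unchanged whether the assumed hardware handles 4 or 16 elements at a time; only the number of passes differs.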

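[2025 XiangShan Getting Started Guide · Doing Memory Access in Room 827] (5) Atoms Are Indivisible

In the "2025 XiangShan Getting Started Guide" series, we aim to build a hands-on guide to XiangShan based on the June 2025 Kunminghu V2 version (commit hash 6318236), using a sequence of guided introductory articles to help newcomers learn, understand, and eventually master XiangShan.

827 is the main office of the Kunminghu project's memory access team, and the "Doing Memory Access in Room 827" series takes its name from it to introduce the design of XiangShan's memory access subsystem. This article is part five of the series and covers atomic operations. It systematically explains the data race problem in multithreaded concurrency and the basic concepts of the RISC-V atomic instruction set, briefly describes how Kunminghu V2 implements atomic memory operation instructions in hardware, and outlines some possible options for accelerating atomic memory operations in hardware.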
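As a rough illustration of why atomic instructions are needed, the sketch below models the load-reserved / store-conditional (LR/SC) pair from the RISC-V atomic extension in plain Python. It is a toy model under stated assumptions, not a description of the Kunminghu V2 hardware: the `Memory` class, its single shared word, and the rule that any successful write clears all reservations are simplifications, and `store_conditional` returns a boolean rather than the 0-on-success code of the real instruction.

```python
class Memory:
    """Toy single-word memory with per-hart reservations (illustrative only)."""

    def __init__(self, value: int = 0) -> None:
        self.value = value
        self.reservations = set()  # ids of harts holding a valid reservation

    def load_reserved(self, hart: int) -> int:
        self.reservations.add(hart)
        return self.value

    def store_conditional(self, hart: int, new_value: int) -> bool:
        if hart not in self.reservations:
            return False           # reservation lost: the store is not performed
        self.value = new_value
        self.reservations.clear()  # a successful write invalidates all reservations
        return True


def atomic_add(mem: Memory, hart: int, delta: int) -> int:
    """Build an atomic add from an LR/SC retry loop; returns the old value."""
    while True:
        old = mem.load_reserved(hart)
        if mem.store_conditional(hart, old + delta):
            return old


mem = Memory(0)
a = mem.load_reserved(0)                # hart 0 reads 0
b = mem.load_reserved(1)                # hart 1 reads 0
ok1 = mem.store_conditional(1, b + 1)   # hart 1 wins the race
ok0 = mem.store_conditional(0, a + 1)   # hart 0 fails instead of losing an update
atomic_add(mem, 0, 1)                   # hart 0 retries and succeeds
print(ok1, ok0, mem.value)              # True False 2
```

With plain loads and stores, both harts would have written 1 and one increment would be lost; the failed store-conditional forces hart 0 to retry with the fresh value.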

[XiangShan Biweekly 90] 20251124

Welcome to XiangShan biweekly column! Through this column, we will regularly share the latest development progress of XiangShan. We look forward to your contribution.

This is the 90th issue of the biweekly report.

In terms of XiangShan development, the frontend has implemented some new performance features in V3 and fixed multiple performance bugs caused by the BPU refactoring ~~hoping that the performance can reach the pre-refactoring level (x2) by the next biweekly report~~. The backend is advancing the design of the new vector unit in V3 and fixing some legacy bugs in V2. The MemBlock has added the Berti prefetcher, continued code refactoring across its modules, and fixed some V2 functional bugs.
