Acknowledgments
This list cites the published techniques used in the XiangShan RTL code.
Instruction Fetching
[1] Glenn Reinman, Todd Austin, and Brad Calder. "A scalable front-end architecture for fast instruction delivery." 26th International Symposium on Computer Architecture (ISCA). 1999. [RTL Codes]
[2] Alex Ramirez, Oliverio J. Santana, Josep L. Larriba-Pey, and Mateo Valero. "Fetching instruction streams." 35th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 2002. [RTL Codes]
Instruction Prefetch
[1] Glenn Reinman, Brad Calder, and Todd Austin. "Fetch directed instruction prefetching." 32nd Annual ACM/IEEE International Symposium on Microarchitecture (MICRO). 1999. [RTL Codes]
[2] Yasuo Ishii, Jaekyu Lee, Krishnendra Nathella, and Dam Sunwoo. "Rebasing instruction prefetching: An industry perspective." IEEE Computer Architecture Letters 19.2: 147-150. 2020. [RTL Codes]
[3] Yasuo Ishii, Jaekyu Lee, Krishnendra Nathella, and Dam Sunwoo. "Re-establishing fetch-directed instruction prefetching: An industry perspective." 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 2021. [RTL Codes]
Branch Prediction
[1] Kevin Skadron, Pritpal S. Ahuja, Margaret Martonosi, and Douglas W. Clark. "Improving prediction for procedure returns with return-address-stack repair mechanisms." 31st Annual ACM/IEEE International Symposium on Microarchitecture (MICRO). 1998. [RTL Codes]
[2] Pierre Michaud. "A PPM-like, tag-based branch predictor." The Journal of Instruction-Level Parallelism (JILP) 7: 10. 2005. [RTL Codes]
[3] André Seznec, and Pierre Michaud. "A case for (partially) tagged geometric history length branch prediction." The Journal of Instruction-Level Parallelism (JILP) 8: 23. 2006. [RTL Codes]
[4] André Seznec. "A 256 Kbits L-TAGE branch predictor." The Journal of Instruction-Level Parallelism (JILP) Special Issue: The Second Championship Branch Prediction Competition (CBP) 9: 1-6. 2007. [RTL Codes]
[5] André Seznec. "A 64-Kbytes ITTAGE indirect branch predictor." The Journal of Instruction-Level Parallelism (JILP) 2nd JILP Workshop on Computer Architecture Competitions (JWAC): Championship Branch Prediction (CBP). 2011. [RTL Codes]
[6] André Seznec. "TAGE-SC-L branch predictors." The Journal of Instruction-Level Parallelism (JILP) 4th JILP Workshop on Computer Architecture Competitions (JWAC): Championship Branch Prediction (CBP). 2014. [RTL Codes]
[7] André Seznec. "TAGE-SC-L branch predictors again." The Journal of Instruction-Level Parallelism (JILP) 5th JILP Workshop on Computer Architecture Competitions (JWAC): Championship Branch Prediction (CBP). 2016. [RTL Codes]
[8] Tan Hongze, and Wang Jian. "A Return Address Predictor Based on Persistent Stack." Journal of Computer Research and Development (CRAD) 60.6: 1337-1345. 2023. [RTL Codes]
Scheduling
[1] Robert M. Tomasulo. "An efficient algorithm for exploiting multiple arithmetic units." IBM Journal of Research and Development (IBMJ) 11.1: 25-33. 1967. [RTL Codes]
[2] James E. Smith, and Andrew R. Pleszkun. "Implementation of precise interrupts in pipelined processors." 12th Annual International Symposium on Computer Architecture (ISCA). 1985. [RTL Codes]
[3] Fernando Latorre, Grigorios Magklis, Jose González, Pedro Chaparro, and Antonio González. "CROB: implementing a large instruction window through compression." Transactions on High-Performance Embedded Architectures and Compilers III: 115-134. Berlin, Heidelberg: Springer Berlin Heidelberg. 2011. [RTL Codes]
Execution
[1] Andrew D. Booth. "A signed binary multiplication technique." The Quarterly Journal of Mechanics and Applied Mathematics 4.2: 236-240. 1951. [RTL Codes]
[2] Christopher S. Wallace. "A suggestion for a fast multiplier." IEEE Transactions on Electronic Computers 1: 14-17. 1964. [RTL Codes]
[3] Elisardo Antelo, Tomas Lang, Paolo Montuschi, and Alberto Nannarelli. "Digit-recurrence dividers with reduced logical depth." IEEE Transactions on Computers 54.7: 837-851. 2005. [RTL Codes]
Memory Dependence Prediction (MDP)
[1] George Z. Chrysos, and Joel S. Emer. "Memory dependence prediction using store sets." 25th Annual International Symposium on Computer Architecture (ISCA). 1998. [RTL Codes]
[2] Richard Kessler. "The Alpha 21264 microprocessor." IEEE Micro 19.2: 24-36. 1999. [RTL Codes]
TLB
[1] Binh Pham, Viswanathan Vaidyanathan, Aamer Jaleel, and Abhishek Bhattacharjee. "CoLT: Coalesced large-reach TLBs." 45th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 2012. [RTL Codes]
Non-blocking Cache
[1] David Kroft. "Lockup-free instruction fetch/prefetch cache organization." 8th Annual Symposium on Computer Architecture (ISCA). 1981. [RTL Codes]
Multi-Port Data Cache
[1] Gurindar S. Sohi, and Manoj Franklin. "High-bandwidth data memory systems for superscalar processors." 4th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 1991. [RTL Codes]
Data Replacement
[1] Aamer Jaleel, Kevin B. Theobald, Simon C. Steely, and Joel Emer. "High performance cache replacement using re-reference interval prediction (RRIP)." 37th Annual International Symposium on Computer Architecture (ISCA). 2010. [RTL Codes]
Data Prefetch
[1] Jean-Loup Baer, and Tien-Fu Chen. "An effective on-chip preloading scheme to reduce data access penalty." ACM/IEEE Conference on Supercomputing. 1991. [RTL Codes]
[2] Stephen Somogyi, Thomas F. Wenisch, Anastassia Ailamaki, Babak Falsafi, and Andreas Moshovos. "Spatial memory streaming." 33rd International Symposium on Computer Architecture (ISCA). 2006. [RTL Codes]
[3] Santhosh Srinath, Onur Mutlu, Hyesoon Kim, and Yale N. Patt. "Feedback directed prefetching: Improving the performance and bandwidth-efficiency of hardware prefetchers." IEEE 13th International Symposium on High Performance Computer Architecture (HPCA). 2007. [RTL Codes]
[4] Pierre Michaud. "A Best-Offset Prefetcher." 2nd Data Prefetching Championship (DPC). 2015. [RTL Codes]
[5] Pierre Michaud. "Best-Offset Hardware Prefetching." IEEE International Symposium on High Performance Computer Architecture (HPCA). 2016. [RTL Codes]
[6] Hao Wu, Krishnendra Nathella, Joseph Pusdesris, Dam Sunwoo, Akanksha Jain, and Calvin Lin. "Temporal Prefetching Without the Off-Chip Metadata." 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 2019. [RTL Codes]
[7] Sam Ainsworth, and Lev Mukhanov. "Triangel: A High-Performance, Accurate, Timely On-Chip Temporal Prefetcher." ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA). 2024. [RTL Codes]