## A Super Instruction-Flow Architecture

Kenji Kise<sup>†,††</sup>, Takahiro Katagiri<sup>†,††</sup>, Hiroki Honda<sup>†</sup>, and Toshitsugu Yuba<sup>†</sup>

† Graduate School of Information Systems, The University of Electro-Communications

†† PRESTO, Japan Science and Technology Agency (JST)

## Abstract

Microprocessor performance has improved at about 55% per year over the past three decades. To sustain this growth rate, next-generation processors with more than one billion transistors must achieve higher levels of parallelism. In this context, a new paradigm or a new architecture is required to substantially increase the available instruction-level parallelism.

Most high-performance processors on the market predict control flow using a sophisticated branch predictor. However, even with one of the latest branch predictors, such as YAGS [1], prediction accuracy is at most about 95%, so a certain fraction of mispredictions is unavoidable. To reduce the overhead of branch instructions, it is therefore important not only to reduce the number of mispredictions but also to reduce the misprediction penalty.

The aim of this project is to develop a novel processor architecture that mitigates the performance degradation caused by branch instructions. To this end, we propose a super instruction-flow architecture. This architecture is a type of decoupled architecture [2] and has a mechanism for processing multiple instruction flows efficiently. In addition to the architectural proposal, we describe the design of its first-generation processor and report preliminary evaluation results for it. The evaluation results show that the super instruction-flow processor with a 5-stage pipeline achieves about a 26% speedup over a conventional scalar processor.
The processor with a 10-stage pipeline achieves a 33% speedup for matrix multiplication and a 29% speedup for selection sort. These results indicate that the super instruction-flow architecture effectively mitigates the overhead caused by branch mispredictions.

## References

- [1] A. N. Eden and T. Mudge. The YAGS branch prediction scheme. In *Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture*, pages 69–77, 1998.
- [2] James E. Smith. Decoupled access/execute computer architectures. In *Proceedings of the 9th Annual Symposium on Computer Architecture*, pages 112–119, 1982.
- [3] Motokazu Ozawa, Masashi Imai, Yoichiro Ueno, Hiroshi Nakamura, and Takashi Nanya. Performance evaluation of Cascade ALU architecture for asynchronous super-scalar processors. In *Proceedings of ASYNC-2001*, pages 162–172, 2001.
- [4] Glenn Reinman, Brad Calder, and Todd Austin. Optimizations enabled by a decoupled front-end architecture. *IEEE Transactions on Computers*, 50(4):338–355, 2001.