Current Research Focus

Performance and energy optimization in many-core processors using dynamic cooperation of cache memory, NoC and DRAM controllers

Sponsoring Agency : Department of Science & Technology, Govt of India.
Duration & Funding : September 2016 to September 2019, Rs. 19.70 lakhs

Summary:

Efficient management of shared resources in a multicore system is a growing research domain. Interference among applications running on different cores at the shared last-level cache (LLC) is a well-researched topic, but less is known about how applications interfere in NoCs and DRAM, and the impact of this interference on application execution time has not been explored thoroughly. Analyzing and managing multiple applications in a shared NoC and memory is challenging because application interactions in such a parallel system can be complex and chaotic, with many primary and secondary effects such as queuing delays, varying memory-level parallelism, sudden changes in the NoC packet injection rate, the spatial location of cache banks, the dependence of one cache miss on another, the open or closed status of the DRAM row buffer targeted by request packets still in flight in the NoC, and application-level stall cycles caused by DRAM row-buffer conflicts. These interference patterns can have a significant impact on application-level performance.

To supply data to the processor as quickly as possible, researchers have proposed efficient cache designs and replacement policies that make better use of the shared last-level cache. Unfortunately, almost all of these techniques assume a constant main memory latency in their evaluations. In practice, the latency of retrieving data from main memory depends on factors such as NoC congestion and the main memory scheduling policy, and scheduling decisions taken at main memory can significantly affect both latency and power consumption.

At any given time, an NoC carries a wide variety of packets: coherence transactions, cache requests and replies between L1 and L2, and cache miss requests and replies between the LLC and main memory. Unfortunately, existing NoC architectures treat all these categories of packets uniformly. Packets from different cores can have different impacts on application-level performance, depending on the queuing and port allocation policies at intermediate switches, the spatial location of cache banks, memory-level parallelism and many less predictable first- and second-order effects of inter-packet interference. Routing and packet arbitration within routers should therefore be aware of these effects and prioritize accordingly. Such application-level prioritization can bring quantifiable performance improvement and application speedup. Unfortunately, this kind of quality-of-service support is missing in present-day many-core processors.
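The prioritization argued for above can be pictured as a priority-based arbiter for one contended router output port. This is a minimal sketch, not the project's design: the class ranking in `CLASS_RANK`, the `core_priority` table (imagined as coming from some runtime QoS monitor) and all other names here are hypothetical assumptions. Competing packets are ordered by application priority, then packet class, then age, so older packets win ties.

```python
from dataclasses import dataclass
from enum import Enum


class PacketClass(Enum):
    """Packet categories named in the text."""
    COHERENCE = "coherence"      # coherence transactions
    L1_L2 = "l1_l2"              # cache requests/replies between L1 and L2
    LLC_MEMORY = "llc_memory"    # miss requests/replies between LLC and main memory


# Hypothetical ranking (lower value = served first); a real policy would be tuned.
CLASS_RANK = {
    PacketClass.LLC_MEMORY: 0,
    PacketClass.COHERENCE: 1,
    PacketClass.L1_L2: 2,
}


@dataclass
class Packet:
    pid: int
    pclass: PacketClass
    core: int            # source core, i.e. the application the packet serves
    inject_cycle: int    # cycle at which the packet entered the network


def arbitrate(candidates, core_priority):
    """Choose the packet that wins a contended output port this cycle.

    Sort key: application (core) priority, then packet class, then age.
    core_priority maps core id -> rank (lower is more important); cores
    missing from the table are ranked below every listed core.
    """
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda p: (
            core_priority.get(p.core, float("inf")),
            CLASS_RANK[p.pclass],
            p.inject_cycle,  # earlier injection = older, so older packets win ties
        ),
    )


packets = [
    Packet(1, PacketClass.L1_L2, core=0, inject_cycle=10),
    Packet(2, PacketClass.LLC_MEMORY, core=0, inject_cycle=12),
    Packet(3, PacketClass.COHERENCE, core=1, inject_cycle=5),
]
# Core 0 is marked more important than core 1 by the (hypothetical) QoS monitor.
print(arbitrate(packets, core_priority={0: 0, 1: 1}).pid)  # prints 2
```

A strict priority order like this can starve low-priority traffic; a practical router would likely need some aging or promotion mechanism, which the sketch omits.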

In recent years, researchers have proposed several frameworks for providing QoS at these shared resources. Most of these works focus on a single resource and use locally available information for either static or dynamic prioritization of operations on that resource. Coordinated runtime management of multiple QoS-aware shared resources remains an open problem. In this project we primarily focus on the interaction between resources to provide overall QoS and improve application performance.


Power Efficient Fault Tolerant Techniques for Deflection Routers in Large Mesh on-Chip Interconnection Networks

Sponsoring Agency : SuG Scheme, Research and Development Section, IIT Guwahati.
Duration & Funding : February 2016 to February 2018, Rs. 4.92 lakhs

Summary:

Increasing transistor integration density in ICs has made it possible to realise multi-core chips that accommodate thousands of processing elements on a single silicon substrate. Because of their high processing capability, these chips demand modular communication architectures such as Networks-on-Chip (NoCs), which provide distributed communication through a set of interconnected routers and links. NoC designs must meet tight latency and throughput targets within stringent area and power budgets.

Unfortunately, as critical dimensions such as feature size shrink, chip reliability also degrades. Shrinking transistors lead to increasing variability in performance and reliability: static variations such as random dopant fluctuations make device behaviour less predictable, and devices also degrade irreversibly over time. Permanent faults caused by physical damage, manufacturing defects and device wear-out may cause the entire chip to fail, while unpredictable events such as particle strikes and power-grid fluctuations may cause transient faults. For structures in the nanometer range, the probability of link failures, or even of whole routers breaking down, increases. With the growing number of physical defects in advanced manufacturing processes, NoC reliability is a critical challenge, especially since many faults occur after manufacturing. Because the NoC is the critical communication medium, it should be designed to tolerate transient and permanent component failures and to provide reliable communication between nodes on the chip.

Deflection-based NoC routers are a promising candidate for the interconnection backbone of future multi-core architectures. At any given time, an NoC carries a wide variety of packets: coherence transactions, cache requests and replies between L1 and L2, cache miss requests and replies between the LLC and main memory, and control information needed to synchronize applications running on cooperating cores. The NoC therefore plays a critical role in delivering high performance to applications running on the various cores. Deflection routers enable energy-efficient on-chip communication by eliminating the buffers used to store flits in transit: a flit that cannot take its preferred output port is deflected to another free port instead of waiting in a buffer. Because failures are inherent in such nanoscale systems, future NoCs, like future processing units, must be able to cope with them, and NoC components such as routers should be equipped with built-in fault-tolerance mechanisms to keep the chip reliable in the presence of faults. Adaptive deflection routing algorithms can exploit the modularity and path diversity of mesh NoCs, and can therefore tolerate faults to a large extent. A fault-tolerant design can thus bring quantifiable performance benefits to applications running on the cores.
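The behaviour of a bufferless deflection router with fault-aware port selection can be sketched as follows. This is a generic, minimal sketch under stated assumptions rather than the technique developed in this project: it assumes a 2D mesh with `N`/`S`/`E`/`W` ports, productive (minimal) directions computed from coordinates, oldest-first priority (a common choice in bufferless designs), a single ejection port, and a set of output links already known to be faulty. All names (`route_cycle`, `productive_ports`, `Flit`) are hypothetical. Because there are no buffers, every incoming flit must leave in the same cycle: it takes a free, healthy productive port if one exists and is otherwise deflected to any remaining healthy port.

```python
from dataclasses import dataclass

# Output directions of a 2D-mesh router and the (dx, dy) step each one takes.
DIRS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}


@dataclass
class Flit:
    fid: int
    dest: tuple   # (x, y) of the destination router
    age: int      # cycles spent in the network; larger means older


def productive_ports(cur, dest):
    """Directions that bring the flit closer to dest (minimal adaptive routing)."""
    (cx, cy), (dx, dy) = cur, dest
    ports = []
    if dx > cx:
        ports.append("E")
    if dx < cx:
        ports.append("W")
    if dy > cy:
        ports.append("N")
    if dy < cy:
        ports.append("S")
    return ports


def route_cycle(cur, flits, faulty_ports):
    """Give every incoming flit an output in one cycle, with no buffering.

    Assumptions (hypothetical): oldest-first priority, one ejection port,
    and faulty_ports lists output links known to be broken.
    Returns {fid: port}; None means no healthy port was left for that flit.
    """
    free = [d for d in DIRS if d not in faulty_ports]
    assignment = {}
    ejected = False
    for fl in sorted(flits, key=lambda x: x.age, reverse=True):
        if fl.dest == cur and not ejected:
            assignment[fl.fid] = "LOCAL"   # deliver to the attached core
            ejected = True
            continue
        # Prefer a free, healthy port that makes progress towards the destination.
        wanted = [p for p in productive_ports(cur, fl.dest) if p in free]
        if wanted:
            choice = wanted[0]
        elif free:
            choice = free[0]               # deflection: any free healthy port
        else:
            choice = None
        if choice is not None:
            free.remove(choice)
        assignment[fl.fid] = choice
    return assignment


# Router (1, 1) whose east output link is faulty.
flits = [
    Flit(1, dest=(3, 1), age=5),   # wants E, which is faulty -> deflected
    Flit(2, dest=(1, 3), age=9),   # wants N
    Flit(3, dest=(1, 1), age=2),   # has arrived -> ejected
]
print(route_cycle((1, 1), flits, faulty_ports={"E"}))
# prints {2: 'N', 1: 'S', 3: 'LOCAL'}
```

Deflecting around a faulty link trades a few extra hops for the buffer energy saved; a `None` result marks the case where no healthy port remains, which a real design would have to handle separately.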