DAC 2011, SAN DIEGO, USA: ARM announced the latest AMBA 4 interface and protocol specification, featuring the AMBA 4 AXI Coherency Extensions (ACE). Cache coherency is essential in multicore computing applications to efficiently keep the copies of shared data held in each processor's local cache consistent.
The AMBA 4 ACE specification enables system-level cache coherency across clusters of multicore processors, such as the ARM Cortex-A15 MPCore processors and ARM Mali-T604 graphics processors. This helps ensure optimum performance and power efficiency in complex heterogeneous SoC designs. The specification is designed to address next-generation computing across mobile, home, networking and gaming applications.
Compute performance in screen-based devices has increased more than 700 times since the mid-1990s. New technologies, such as high-performance heterogeneous multicore processing, have emerged to help drive this growth. These technologies have increased the demands on System IP, particularly at the memory sub-system, hardware and software levels. Challenges stemming from latency, bandwidth, power and performance still need to be addressed. Effective hardware coherency is therefore becoming increasingly crucial to minimize off-chip memory traffic and software cache maintenance, which saves processor cycles.
AMBA is the de facto standard on-chip interconnect methodology and is supported by the vast majority of the digital electronics industry. The direction of the latest specification has been driven by a wide group of leading semiconductor, EDA and verification vendors, including Arteris, Cadence, Jasper, Marvell, Mentor, Sonics, ST-Ericsson, Synopsys and Xilinx.
“Marvell has been an active contributor to the standardization of hardware coherency within the AMBA 4 specification,” said Hongyi Chen, vice president of engineering in processor design at Marvell Semiconductor. “A key benefit of AMBA 4 ACE is the provision of a development ecosystem with a standard protocol that makes future hardware design far easier. This protocol enables transparent management of cache coherency that removes a significant burden from software engineers.”
“ST-Ericsson has been a longstanding supporter of the need for hardware assist for coherency as systems become ever more complex,” said Jim Nicholas, vice president & general manager of processor subsystems & product lifecycle management at ST-Ericsson. “As one of the first licensees of the ARM CoreLink CCI-400 Cache Coherent Interconnect, we welcome the introduction of the AMBA 4 ACE specification, which will enable our energy-efficient, high-performance wireless platforms to exploit the full potential of heterogeneous multiprocessing.”
“Xilinx fully supports the standardization and advancement of hardware resources for the software community,” said Lawrence Getman, VP of processing platforms, Xilinx. “AMBA 4 ACE strengthens the ecosystem’s ability to create coherent on-chip memory solutions that further optimize the performance of systems driven by a combination of ARM CPU and programmable logic-based processing.”
“Designers of complex heterogeneous, embedded multi-processing SoCs now require robust specifications, design & verification tools and systems IP. This ensures their devices minimize off-chip memory transactions, while maximizing performance and power efficiency,” said Michael Dimelow, marketing director, processor division, ARM. “AMBA 4 ACE is a major component in enabling the successful development and deployment of future Cortex-A and Mali GPU processor sub-systems, ensuring the optimum combination of performance and energy efficiency.”
AMBA 4 ACE specification
The AMBA 4 ACE specification enables system-level cache coherency for high-performance multicore processors, helping them manage increased data and cache sharing and more cross-component communication, and supporting additional processing engines that access shared caches and external memory. Publishing a standard way of managing cache coherency, memory barriers and virtual memory management will reduce software cache maintenance, saving processor cycles and reducing external memory accesses.
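To make the benefit of hardware coherency concrete, here is a toy Python model of two write-back caches (labelled `cpu` and `gpu`) behind a hypothetical snooping interconnect. It is a sketch under stated assumptions, not the AMBA 4 ACE protocol: the names `Cache`, `Interconnect`, `software_clean` and `run` are illustrative only, and real ACE channels, transactions and cache-line states are not modelled. It shows a consumer reading stale data from memory when nothing keeps the caches consistent, the manual clean step software would otherwise have to perform, and a snoop on the read miss returning the producer's dirty data instead.

```python
from dataclasses import dataclass, field

# Toy model only: generic snoop-based coherency between write-back caches.
# This is NOT the AMBA 4 ACE protocol; every name here is hypothetical.

@dataclass
class Cache:
    name: str
    lines: dict = field(default_factory=dict)  # addr -> (value, dirty)


class Interconnect:
    """Hypothetical interconnect that can optionally snoop peer caches."""

    def __init__(self, memory, caches, coherent=True):
        self.memory = memory
        self.caches = caches
        self.coherent = coherent

    def read(self, cache, addr):
        if addr in cache.lines:                 # hit in the requester's own cache
            return cache.lines[addr][0]
        value = self.memory[addr]
        if self.coherent:
            # Snoop peers: a dirty copy is newer than the value in memory.
            for peer in self.caches:
                if peer is not cache and addr in peer.lines:
                    peer_value, dirty = peer.lines[addr]
                    if dirty:
                        value = peer_value
                        self.memory[addr] = peer_value      # write the dirty line back
                        peer.lines[addr] = (peer_value, False)
        cache.lines[addr] = (value, False)
        return value

    def write(self, cache, addr, value):
        if self.coherent:
            # Invalidate every other copy so no cache keeps stale data.
            for peer in self.caches:
                if peer is not cache:
                    peer.lines.pop(addr, None)
        cache.lines[addr] = (value, True)       # write-back: memory is not updated yet


def software_clean(cache, memory, addr):
    """Manual cache maintenance a driver would need without hardware coherency."""
    if addr in cache.lines and cache.lines[addr][1]:
        memory[addr] = cache.lines[addr][0]
        cache.lines[addr] = (cache.lines[addr][0], False)


def run(coherent, clean=False):
    memory = {0x100: 0}
    cpu, gpu = Cache("cpu"), Cache("gpu")
    bus = Interconnect(memory, [cpu, gpu], coherent)
    bus.write(cpu, 0x100, 42)                   # producer writes into its own cache
    if clean:
        software_clean(cpu, memory, 0x100)      # software pushes the data to memory
    return bus.read(gpu, 0x100)                 # consumer reads the same address


print(run(coherent=False))              # 0: stale value from memory
print(run(coherent=False, clean=True))  # 42: correct, but only after manual maintenance
print(run(coherent=True))               # 42: the snoop supplies the dirty line
```

Tracking only a single dirty bit per line keeps the sketch short; a real coherency protocol tracks richer per-line state.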
The introduction of memory barriers throughout the memory sub-system enables system architects to enforce the ordering of memory transactions where necessary, improving system performance. Distributed virtual memory (DVM) signalling extends the memory virtualization introduced with the latest ARM architecture and the Cortex-A15 processor to the system MMUs. This makes more efficient use of external memory and allows multiple operating systems (OS) to share hardware resources under an appropriate hypervisor.
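Why system MMUs need to hear about virtual memory changes can be pictured with a similarly simplified Python sketch, assuming a system MMU that caches translations in a TLB keyed by (ASID, virtual page). The names `SystemMMU`, `dvm_tlb_invalidate`, `remap` and `demo` are hypothetical, and the actual DVM message format is not modelled. When a page-table entry changes, broadcasting an invalidate to every system MMU stops a device from continuing to use a stale translation.

```python
# Toy model only: hypothetical names, not the ACE DVM message encoding.

class SystemMMU:
    def __init__(self, name):
        self.name = name
        self.tlb = {}  # (asid, virtual page) -> physical page

    def translate(self, asid, vpage, page_table):
        key = (asid, vpage)
        if key not in self.tlb:                  # TLB miss: walk the shared page table
            self.tlb[key] = page_table[key]
        return self.tlb[key]

    def dvm_tlb_invalidate(self, asid, vpage):
        # Hypothetical handler for a broadcast TLB-invalidate message.
        self.tlb.pop((asid, vpage), None)


def remap(page_table, mmus, asid, vpage, new_ppage, broadcast=True):
    page_table[(asid, vpage)] = new_ppage        # OS or hypothetical hypervisor updates mapping
    if broadcast:
        for mmu in mmus:                         # distribute the invalidate to every system MMU
            mmu.dvm_tlb_invalidate(asid, vpage)


def demo(broadcast):
    page_table = {(1, 0x40): 0x800}              # ASID 1, virtual page 0x40 -> physical 0x800
    gpu_smmu = SystemMMU("gpu-smmu")
    gpu_smmu.translate(1, 0x40, page_table)      # SMMU caches the old mapping
    remap(page_table, [gpu_smmu], 1, 0x40, 0x900, broadcast)
    return gpu_smmu.translate(1, 0x40, page_table)


print(hex(demo(broadcast=False)))  # 0x800: stale TLB entry still used
print(hex(demo(broadcast=True)))   # 0x900: entry invalidated, page table re-walked
```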
The latest specification represents the second phase of the AMBA 4 protocol. Phase one of the AMBA 4 specification, launched in 2010, included the definition of an expanded family of AXI interconnect protocols. To date, more than 4,000 engineers from 2,500 different businesses and organizations have downloaded this first phase.