
What Makes High Bandwidth Memory Different?


Memory access, throughput, and capacity are collectively the biggest roadblocks in technologies like AI, vision, and edge computing in small embedded systems. With semiconductor scaling laws limiting what can be done on a single die, and with the drive to make components smaller, package designers are stacking their DRAM dice in 3D. High bandwidth memory refers to this type of 3D structure, which offers high capacity and data transfer rates inside a package.

High bandwidth memory is a stacked memory architecture that has been standardized for use in component packaging. Monolithic 3D DRAM dice are still a far-off technology, so newer processors are including high bandwidth memory stacks as a viable path forward for higher compute in advanced systems.

Why Include Memory in the Package?

There are several reasons to include memory directly in a package, rather than placing it as an external component:

  • Standard processors contain only a limited amount of fast on-die memory (SRAM cache), although you might see larger amounts of slower Flash memory.
  • Placing memory modules outside the package increases the time required to fetch data from the external module.
  • External modules can carry higher costs, as they have their own packaging expenses.
  • External modules take up space in the system, and it is typically preferable to decrease the system size.

Over the years, many different types of memory have been introduced to the market, with the most common being Flash, SRAM, and DRAM. Large blocks of Flash have long been placed in SoCs to store configuration or an embedded application, while RAM was external to the component package. When high capacity and high throughput are needed, DRAM is the most popular option that is also cost effective.

JEDEC Definitions for DRAM

The standards body responsible for developing and advancing many component standards, including those for DRAMs, is the Joint Electron Device Engineering Council (JEDEC). The group is responsible for standards such as DDR and thermal standards for certain classes of components. Regarding DRAMs, JEDEC provides the four broad memory definitions below.

  • DDR (double data rate): General-purpose memory that can be placed on-board or on-module.
  • LPDDR (low-power DDR): A lower-power variant of DDR, often used in high-compute mobile devices.
  • GDDR (graphics DDR): Intended for graphics applications (GPUs), but can be used for specialized compute tasks.
  • HBM (high bandwidth memory): Intended to provide very high data transfer rates for more advanced applications.

There has been research into alternative memory technologies that could compete with DRAM in mainstream applications, but DRAM continues to deliver the required performance and has maintained its presence in the memory market. It has therefore persisted into 2.5D and 3D configurations in semiconductor packaging.

High Bandwidth Memory in a Package

Inside a component package for an advanced processor, memory sits on the device substrate or interposer. The currently popular option is to place DRAM chips on an interposer and stack them vertically in the spirit of true 3D heterogeneous integration. The vertical stacks are then connected with through-silicon vias (TSVs), which carry signals vertically between the CPU and the DRAM stacks. The typical architecture is shown below.

Example stacked DRAM in a semiconductor package. [Source]

With placement directly in the package, communication between DRAM and the CPU has lower latency and lower signal loss along the communication channel. The latter factor will allow further increases in memory throughput in the future. The total channel bandwidth is then aggregated across all interconnects in the die stack, yielding aggregate bandwidths in the hundreds of GB/s.
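The aggregation described above is simple arithmetic: per-pin data rate times interface width, summed over the stacks in the package. The sketch below illustrates this with assumed HBM-class figures (a 1024-bit stack interface at 3.2 Gbit/s per pin, four stacks); these numbers are for illustration, not taken from any specific datasheet.

```python
# Illustrative sketch: aggregating stacked-DRAM bandwidth in a package.
# Bus width, pin rate, and stack count are assumed HBM-class values.

def stack_bandwidth_gb_s(bus_width_bits: int, pin_rate_gbit_s: float) -> float:
    """Peak bandwidth of one DRAM stack in GB/s (divide by 8: bits -> bytes)."""
    return bus_width_bits * pin_rate_gbit_s / 8

per_stack = stack_bandwidth_gb_s(1024, 3.2)  # one 1024-bit stack at 3.2 Gbit/s/pin
total = 4 * per_stack                        # four stacks aggregated in the package
print(per_stack, total)                      # 409.6 GB/s per stack, 1638.4 GB/s total
```

Even the single-stack figure lands in the hundreds of GB/s, which is the point of moving DRAM into the package.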

The HBM3 Standard

The HBM3 standard, published and maintained by JEDEC, specifies DRAM stack structural and performance requirements. The current HBM3 standard allows device stacks up to 12 dice tall, with a provision for up to 16 dice in a stack. A 12-high stack provides a total capacity of 48 GB (12 dice at 32 Gbit each). This is sufficient to eliminate an external DRAM module and therefore reduce system size.
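The capacity figure quoted above follows directly from the die count and per-die density; the conversion is just gigabits to gigabytes:

```python
# Checking the stack capacity arithmetic: 12 dice of 32 Gbit each.
dice_per_stack = 12
gbit_per_die = 32
capacity_gb = dice_per_stack * gbit_per_die / 8  # Gbit -> GB
print(capacity_gb)  # 48.0
```

A 16-high stack of the same dice would reach 64 GB under the standard's extended provision.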

Prior HBMx standards allowed only 8 or fewer channels for memory access. HBM3 doubled this to 16 in order to support more vertically stacked dice. As the standard develops, it is possible more stacks will be included in a package, and we will likely see more than one stack, with more channels implemented in the CPU core.
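The channel count directly sets the stack's interface width, which in turn sets peak bandwidth. The sketch below uses the 16-channel figure from the text; the 64-bit channel width and 6.4 Gbit/s per-pin rate are assumed values for illustration.

```python
# Sketch: how channel count sets interface width and peak bandwidth.
# 16 channels matches the HBM3 figure in the text; 64 bits per channel
# and 6.4 Gbit/s per pin are assumed illustrative values.
channels = 16
bits_per_channel = 64
interface_width = channels * bits_per_channel  # 1024-bit stack interface
peak_gb_s = interface_width * 6.4 / 8          # 819.2 GB/s per stack
print(interface_width, peak_gb_s)
```

Doubling the channels from 8 to 16 doubles the interface width, and therefore the peak bandwidth, at a fixed per-pin rate.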

Ultimately, the use of memory in a package, rather than externally as individual components, changes the systems-level design approach for design teams. For systems designers working in advanced areas like AI and vision, these more advanced component packages offer an opportunity to reduce system size without losing capabilities.

Whenever you’re designing advanced components and electronic systems, use the complete set of system analysis tools from Cadence. Only Cadence offers a comprehensive set of circuit, IC, and PCB design tools for any application and any level of complexity. Cadence PCB design products also integrate with a multiphysics field solver for thermal analysis, including verification of heat sink designs.

Subscribe to our newsletter for the latest updates. If you’re looking to learn more about how Cadence has the solution for you, talk to our team of experts.
