This document is relevant for: Inf2, Trn1, Trn2

nki.compiler.psum.alloc

nki.compiler.psum.alloc(func)

Allocate PSUM memory space for each logical block in a tensor using a customized allocation method.

This is one of the NKI direct allocation APIs. We recommend reading the NKI Direct Allocation Developer Guide before using these APIs.

In NKI, a PSUM tensor (declared using NKI tensor creation APIs) can have three kinds of dimensions, in order: logical block (B), partition (P), and free (F). The partition and free dimensions map directly to the PSUM dimensions. Both B and F can be multi-dimensional, while P must be one-dimensional per Neuron ISA constraints. The block dimension describes how many (P, F) logical tiles the tensor has, but does not reflect the number of physical tiles allocated.

ncc.psum.alloc (where ncc is shorthand for nki.compiler) should be assigned to the buffer field of an NKI tensor declaration API. For example:

nki_tensor = nl.ndarray((2, 4, nl.par_dim(128), 512), dtype=nl.float32, buffer=ncc.psum.alloc(...))

ncc.psum.alloc allows programmers to specify the physical location of each logical tile in the tensor. The API accepts a single input func parameter, which is a callable object that takes in:

  1. a tuple of integers idx representing a logical block index,

  2. an integer pdim_size for the number of partitions the logical tile has, and

  3. an integer fdim_size for the number of bytes the logical tile has per partition.

The number of integers in idx must match the number of B dimensions the PSUM tensor has. For example, for the above nki_tensor, we expect the idx tuple to have two integers for a 2D block index.

pdim_size should match the partition dimension size of the NKI tensor exactly. fdim_size should be the product of the F dimension sizes of each logical tile in the tensor, multiplied by the data type size in bytes. For the above nki_tensor, pdim_size should be 128, and fdim_size should be 512*sizeof(nl.float32) = 2048 bytes.
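The size arithmetic above can be checked with a small pure-Python sketch. The helper name tile_sizes and its parameters are hypothetical and not part of NKI; the sketch assumes the caller knows how many leading dimensions are block dimensions and how many bytes the data type occupies (4 for nl.float32).

```python
from math import prod


def tile_sizes(shape, num_block_dims, dtype_size_bytes):
    """Split a PSUM tensor shape into (block_shape, pdim_size, fdim_size).

    Hypothetical helper for illustration only; not an NKI API.
    shape is laid out as (B..., P, F...), matching the dimension order
    described in this document.
    """
    block_shape = tuple(shape[:num_block_dims])   # logical block dims (B)
    pdim_size = shape[num_block_dims]             # single partition dim (P)
    free_shape = shape[num_block_dims + 1:]       # free dims (F)
    # Bytes per partition occupied by one logical tile.
    fdim_size = prod(free_shape) * dtype_size_bytes
    return block_shape, pdim_size, fdim_size


# The nki_tensor from the example: shape (2, 4, par_dim(128), 512), float32.
block_shape, pdim_size, fdim_size = tile_sizes(
    (2, 4, 128, 512), num_block_dims=2, dtype_size_bytes=4
)
print(block_shape, pdim_size, fdim_size)  # (2, 4) 128 2048
```

With these values, the allocation callable would receive a two-integer idx, pdim_size of 128, and fdim_size of 2048.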

Note

In the current release, fdim_size cannot exceed 2KiB, which is the size of a single PSUM bank per partition. Therefore, a physical PSUM tile cannot span multiple PSUM banks. See the Trainium/Inferentia2 Architecture Guide for NKI for more information on PSUM banks.

The func returns a tuple of three integers (bank_id, start_partition, byte_addr) indicating the physical tile location for the input logical block index.

bank_id indicates the PSUM bank ID of the physical tile. start_partition indicates the lowest partition at which the physical tile allocation starts. byte_addr indicates the byte offset, within each partition of the PSUM bank, at which the physical tile starts.

Note

In the current release, start_partition and byte_addr must both be 0.
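As an illustration of the callable described above, here is a minimal sketch of an allocation function that places each logical block in its own bank. It is plain Python and does not import NKI. The factory name make_bank_per_block_alloc is hypothetical, and the bank count of 8 is an assumption that should be checked against the architecture guide for the target device.

```python
PSUM_BANK_BYTES = 2048  # per-partition bank size stated in this document
NUM_PSUM_BANKS = 8      # ASSUMPTION: verify against the architecture guide


def make_bank_per_block_alloc(block_shape, num_banks=NUM_PSUM_BANKS):
    """Build a func(idx, pdim_size, fdim_size) -> (bank_id, start_partition, byte_addr).

    Hypothetical helper for illustration only; not an NKI API.
    """

    def func(idx, pdim_size, fdim_size):
        if len(idx) != len(block_shape):
            raise ValueError("idx must have one integer per block dimension")
        if fdim_size > PSUM_BANK_BYTES:
            raise ValueError("a physical tile cannot span multiple PSUM banks")
        # Flatten the block index in row-major order.
        linear = 0
        for i, extent in zip(idx, block_shape):
            linear = linear * extent + i
        # Wrap around if there are more logical blocks than banks, so
        # several logical tiles may share one physical tile.
        bank_id = linear % num_banks
        # start_partition and byte_addr must both be 0 in the current release.
        return (bank_id, 0, 0)

    return func


alloc_fn = make_bank_per_block_alloc((2, 4))
print(alloc_fn((1, 3), 128, 2048))  # (7, 0, 0)

# In a kernel, the callable would be passed as, for example:
#   nl.ndarray((2, 4, nl.par_dim(128), 512), dtype=nl.float32,
#              buffer=ncc.psum.alloc(alloc_fn))
```

The modulo wrap reflects the point that the block dimension counts logical tiles, not physical ones. Whether sharing a bank between blocks is safe depends on the kernel's data dependencies.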

Note

In the current release, programmers cannot mix NKI tensor declarations using automatic allocation (ncc.psum.auto_alloc() or the SBUF variant) and direct allocation APIs (ncc.psum.alloc(), ncc.psum.mod_alloc() or the SBUF variants) in the same kernel.

Parameters:

func – a callable object that specifies where to place each logical block in PSUM memory.
