This document is relevant for: Inf2, Trn1, Trn2
nki.compiler.psum.alloc
- nki.compiler.psum.alloc(func)
Allocate PSUM memory space for each logical block in a tensor using a customized allocation method.
This is one of the NKI direct allocation APIs. We recommend reading the NKI Direct Allocation Developer Guide before using these APIs.
In NKI, a PSUM tensor (declared using NKI tensor creation APIs) can have three kinds of dimensions, in order: logical block (B), partition (P), and free (F). The partition and free dimensions directly map to the PSUM dimensions. Both B and F can be multi-dimensional, while P must be one-dimensional per Neuron ISA constraints. The block dimension describes how many (P, F) logical tiles this tensor has, but does not reflect the number of physical tiles being allocated.
ncc.psum.alloc should be assigned to the buffer field of an NKI tensor declaration API. For example:
nki_tensor = nl.ndarray((2, 4, nl.par_dim(128), 512), dtype=nl.float32, buffer=ncc.psum.alloc(...))
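As a rough illustration of how the B, P, and F dimensions of the example declaration above relate, the following plain-Python sketch splits the shape by hand and counts the logical tiles. It does not import NKI; the names block_shape, free_shape, and block_indices are introduced here for illustration only.

```python
from itertools import product
from math import prod

# Example shape from the declaration above: (2, 4, par_dim(128), 512).
# The split into B, P, and F is written out by hand for this sketch.
block_shape = (2, 4)   # B: logical block dimensions
pdim_size = 128        # P: partition dimension (always one-dimensional)
free_shape = (512,)    # F: free dimensions

# Number of logical (P, F) tiles described by the block dimensions.
num_logical_tiles = prod(block_shape)

# Every logical block index of the tensor, as tuples of integers.
block_indices = list(product(*(range(n) for n in block_shape)))

print(num_logical_tiles)   # 8
print(block_indices[:3])   # [(0, 0), (0, 1), (0, 2)]
```

The count of logical tiles says nothing about how many physical tiles exist; that is decided by the allocation callable.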
ncc.psum.alloc allows programmers to specify the physical location of each logical tile in the tensor. The API accepts a single input parameter, func, which is a callable object that takes in:
- a tuple of integers idx representing a logical block index,
- an integer pdim_size for the number of partitions the logical tile has, and
- an integer fdim_size for the number of bytes the logical tile has per partition.
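The three inputs listed above can be sketched in plain Python. The helper func_arguments below is hypothetical and not part of NKI; it only shows how the values could be derived for one logical block of a float32 tensor with free shape (512,) and 128 partitions, assuming 4 bytes per float32 element.

```python
from math import prod

def func_arguments(block_index, pdim_size, free_shape, itemsize):
    """Hypothetical helper: build the (idx, pdim_size, fdim_size) triple.

    fdim_size is the number of bytes one logical tile holds per partition.
    """
    fdim_size = prod(free_shape) * itemsize
    return tuple(block_index), pdim_size, fdim_size

# float32 has 4 bytes per element.
args = func_arguments((1, 3), pdim_size=128, free_shape=(512,), itemsize=4)
print(args)  # ((1, 3), 128, 2048)
```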
The number of integers in idx must match the number of B dimensions the PSUM tensor has. For example, for the above nki_tensor, we expect the idx tuple to have two integers for a 2D block index. pdim_size should match the partition dimension size of the NKI tensor exactly. fdim_size should be the number of elements across the F dimensions of each logical tile in the tensor (the product of the F dimension sizes), multiplied by the data type size in bytes. For the above nki_tensor, pdim_size should be 128, and fdim_size should be 512*sizeof(nl.float32) = 2048 bytes.
Note
In the current release, fdim_size cannot exceed 2 KiB, which is the size of a single PSUM bank per partition. Therefore, a physical PSUM tile cannot span multiple PSUM banks. Check out the Trainium/Inferentia2 Architecture Guide for NKI for more information on PSUM banks.
The func callable returns a tuple of three integers (bank_id, start_partition, byte_addr) indicating the physical tile location for the input logical block index. bank_id indicates the PSUM bank ID of the physical tile. start_partition indicates the lowest partition the physical tile allocation starts from. byte_addr indicates the byte offset, within each PSUM bank per partition, at which the physical tile starts.
Note
In the current release, start_partition and byte_addr must both be 0.
Note
In the current release, programmers cannot mix NKI tensor declarations using automatic allocation (ncc.psum.auto_alloc() or the SBUF variant) and direct allocation APIs (ncc.psum.alloc(), ncc.psum.mod_alloc() or the SBUF variants) in the same kernel.
- Parameters:
  func – a callable object to specify how to place the logical block in PSUM memory.
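A minimal sketch of what an allocation callable might look like for the example tensor with a (2, 4) block shape is shown below. The function name one_bank_per_block, the mapping from block index to bank, and the bank count NUM_PSUM_BANKS = 8 are assumptions made for illustration; consult the architecture guide for the actual number of banks. The sketch runs as plain Python and leaves the NKI call in a comment.

```python
PSUM_BANK_BYTES = 2048  # 2 KiB per partition per bank, as stated in the note above
NUM_PSUM_BANKS = 8      # assumption for this sketch; check the architecture guide

def one_bank_per_block(idx, pdim_size, fdim_size):
    """Hypothetical allocation callable for a tensor with block shape (2, 4)."""
    i, j = idx  # two integers, one per B dimension
    if fdim_size > PSUM_BANK_BYTES:
        raise ValueError("a physical PSUM tile cannot span multiple banks")
    # Flatten the 2D block index and wrap it onto the assumed bank count.
    bank_id = (i * 4 + j) % NUM_PSUM_BANKS
    # start_partition and byte_addr must both be 0 in the current release.
    return (bank_id, 0, 0)

# Intended use, not executed here (requires the Neuron compiler package):
# nki_tensor = nl.ndarray((2, 4, nl.par_dim(128), 512), dtype=nl.float32,
#                         buffer=ncc.psum.alloc(one_bank_per_block))

locations = [one_bank_per_block((i, j), 128, 2048)
             for i in range(2) for j in range(4)]
print(locations[5])  # (5, 0, 0)
```

With this mapping each of the eight logical blocks lands in a different bank. A mapping that sends several block indices to the same bank would instead make those logical tiles share one physical tile.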