The SiCortex DMA Engines: Almost Like a Cluster Within a Cluster

The SiCortex DMA Engine

Each SiCortex node chip includes, in addition to its six 64-bit Linux cores, a DMA Engine that is capable of:

To do all this, the DMA Engine executes its own microcoded instruction set, which has been tuned for the MPI library. The instruction set is quite general-purpose and includes the ability to queue operations on other DMA Engines. Because each DMA Engine executes its own instructions and can enlist the help of other DMA Engines, together they act like a cluster inside a cluster.

The DMA Engine Command Set

Three DMA Engine instructions are used to do the heavy MPI lifting. They are:

Send Event: Immediately transmit a packet carrying up to 112 bytes of data to another node's DMA Engine, where it is available to the destination user-mode program.

Send Command: Transmit a command to a destination node where it will be executed by that node's DMA Engine.

Put Buffer: Send a sequence of packets to a destination node according to parameters contained within the DMA command. Put Buffer is a zero-copy operation.
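
As a rough illustration of what "a sequence of packets" means for Put Buffer, the following Python sketch splits a source buffer into per-packet slices without copying the data. It is only a model: the function name packetize and the payload_size parameter are hypothetical, and the real per-packet payload size and command parameters of the DMA Engine are not given here.

```python
from typing import List, Tuple


def packetize(buffer: bytes, payload_size: int) -> List[Tuple[int, memoryview]]:
    """Split a buffer into (offset, payload) pairs, one per packet.

    payload_size is an assumed, caller-chosen value, not a hardware constant.
    Each payload is a memoryview slice, so no data is copied while the
    packet list is built (a loose software analogy for zero-copy).
    """
    if payload_size <= 0:
        raise ValueError("payload_size must be positive")
    view = memoryview(buffer)
    return [
        (offset, view[offset:offset + payload_size])
        for offset in range(0, len(buffer), payload_size)
    ]
```

The last packet is simply shorter when the buffer length is not a multiple of the payload size, and the receiver can place each payload using its offset.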

Notice that no separate "Get" command is needed. A DMA Engine that wants data gets it by queuing a Put Buffer command in the DMA Engine of the node that holds the data; Send Command exists to deliver such a command to the remote engine.
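
The "Get built from a remote Put" idea can be sketched with a toy software model. Everything in the following Python is an assumption made for illustration: the class DmaEngine, the PutBuffer and SendCommand records, their fields, and the dictionary standing in for the interconnect are all hypothetical and do not reflect the real microcode or command formats. Only the 112-byte Send Event limit comes from the description above.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List

MAX_EVENT_BYTES = 112  # Send Event payload limit stated in the text


@dataclass
class PutBuffer:          # hypothetical command record
    dest: int             # node that receives the data
    src_offset: int       # where to read in the executing node's memory
    length: int
    dest_offset: int      # where to write in the destination's memory


@dataclass
class SendCommand:        # hypothetical command record
    dest: int             # node whose engine will execute `command`
    command: PutBuffer


class DmaEngine:
    """Toy model of one node's DMA Engine (not the real hardware)."""

    def __init__(self, node_id: int, fabric: Dict[int, "DmaEngine"], mem_size: int = 256):
        self.node_id = node_id
        self.fabric = fabric              # stands in for the interconnect
        self.memory = bytearray(mem_size)
        self.queue = deque()              # pending commands for this engine
        self.events: List[bytes] = []     # events visible to the user program
        fabric[node_id] = self

    def send_event(self, dest: int, data: bytes) -> None:
        if len(data) > MAX_EVENT_BYTES:
            raise ValueError("Send Event carries at most 112 bytes")
        self.fabric[dest].events.append(bytes(data))

    def get(self, owner: int, src_offset: int, length: int, dest_offset: int) -> None:
        # No Get command: ask the owner's engine to Put the data back to us.
        put = PutBuffer(dest=self.node_id, src_offset=src_offset,
                        length=length, dest_offset=dest_offset)
        self.queue.append(SendCommand(dest=owner, command=put))

    def run(self) -> None:
        """Execute every command currently queued on this engine."""
        while self.queue:
            cmd = self.queue.popleft()
            if isinstance(cmd, SendCommand):
                self.fabric[cmd.dest].queue.append(cmd.command)
            elif isinstance(cmd, PutBuffer):
                data = self.memory[cmd.src_offset:cmd.src_offset + cmd.length]
                target = self.fabric[cmd.dest].memory
                target[cmd.dest_offset:cmd.dest_offset + cmd.length] = data


def demo() -> bytes:
    fabric: Dict[int, DmaEngine] = {}
    requester, owner = DmaEngine(0, fabric), DmaEngine(1, fabric)
    owner.memory[0:5] = b"hello"
    requester.get(owner=1, src_offset=0, length=5, dest_offset=16)
    requester.run()   # forwards the Put Buffer command to node 1
    owner.run()       # node 1 executes the Put, writing into node 0
    return bytes(requester.memory[16:21])
```

In this model the requesting node never reads remote memory itself. It only queues a Send Command, and the owning node's engine performs the transfer, which mirrors the reasoning for why a dedicated Get is unnecessary.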