A dedicated unikernel for microservices

Saturday, July 30, 2022

Implementing MPI_REDUCE in Toro unikernel

This post presents the implementation of MPI_REDUCE in the Toro unikernel. MPI_REDUCE is an MPI operation that applies a reduction operation over a vector. In this example, each task has a vector of 64 elements that is initialised with the rank of the task, i.e., the id of the core on which the task executes. Each task sends this vector to the root core for reduction with the MPI_SUM operation. The root core collects all the vectors and applies an element-wise sum, which results in a new vector of 64 elements. For example, if each vector has three elements all set to 1, i.e., [1, 1, 1], and we have 3 cores, the resulting vector is [3, 3, 3].
In this example, VirtIO is used to send data between the cores. Each core has its own queue to exchange data with the other cores in the system. During the reduction, each core sends its data to the root core. The exchange happens without any locking.
Let’s see how the example works. 

void main(){
    int rank, i;
    int r[VECTOR_LEN];
    int s[VECTOR_LEN];

    // get the id of the core on which this instance runs
    rank = GetRank();

    // initialise the local vector with the rank
    for (i = 0; i < VECTOR_LEN; i++){
        r[i] = rank;
    }
    printf("hello from core %d\n", rank);

First, the program initialises the vector r[] with the rank, i.e., the id of the core on which the program is running. Note that an instance of this program executes on each core in the system. Toro only supports threads and the memory space is flat, i.e., all cores share the same page directory.

 
    // reduce an array of size VECTOR_LEN
    // by using the MPI_SUM operation
    Mpi_Reduce(r, s, VECTOR_LEN, MPI_SUM, root);    

Then, the program calls Mpi_Reduce() with the MPI_SUM operation over the r[] vector. The root core collects all the vectors, applies the element-wise sum, and stores the resulting vector in s[]. This vector is only meaningful on the root core. Mpi_Reduce() relies on VirtIO to send the data to the root core.
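Before looking at how the root core verifies the result, here is a minimal sketch of how such a reduction could be built on top of the SendTo()/RecvFrom() primitives presented in the VirtIOBus post below. The function name Mpi_Reduce_sketch(), the assumption that a whole vector fits in a single buffer, and the hard-coded MPI_SUM operation are simplifications for illustration; this is not Toro's actual implementation.

// Sketch only: assumes one VirtIOBus buffer can carry a whole vector
// and hard-codes the MPI_SUM operation.
void Mpi_Reduce_sketch(int *send, int *recv, int len, int root)
{
    int rank = GetRank();
    int cores = GetCores();
    int tmp[VECTOR_LEN];
    int core, i;

    if (rank != root) {
        // non-root cores simply push their local vector to the root core
        SendTo(root, send);
        return;
    }

    // the root starts from its own vector ...
    for (i = 0; i < len; i++)
        recv[i] = send[i];

    // ... and accumulates one vector per remote core (MPI_SUM)
    for (core = 0; core < cores; core++) {
        if (core == root)
            continue;
        RecvFrom(core, tmp);
        for (i = 0; i < len; i++)
            recv[i] += tmp[i];
    }
}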


    if (rank == root){
        for (i = 0; i < VECTOR_LEN; i++){
            // each element must be the sum of all ranks: ((N - 1) * N) / 2
            if (s[i] != (((GetCores() - 1) * GetCores()) / 2)){
                printf("failed!, core: %d\n", rank);
                break;
            }
        }
        // the loop only reaches VECTOR_LEN if every element matched
        if (i == VECTOR_LEN){
            printf("success! value: %d\n", s[i-1]);
        }
    }
}

Finally, the root core checks that the result is the expected one. Since each core sends a vector filled with its rank, every element of the resulting vector is simply the sum of all the ranks. For example, with 5 cores the ranks are 0 to 4, so every element equals 0 + 1 + 2 + 3 + 4 = 10. I tried this example with 5 cores and the result is shown in the following video, enjoy!

Sunday, July 03, 2022

Using VirtIO for inter-core communication

This article presents a new device named VirtIOBus that allows cores to communicate by relying on VirtIO. The design is simple: each core has a reception queue for every other core in the system. For example, in a three-core system, each core has (N-1) = 2 reception queues. When core #0 wants to send a packet to core #1, core #0 gets a descriptor from the available ring of core #1's reception virtqueue, fills the buffer, and then enqueues the descriptor in the used ring. To notify core #1, core #0 can either trigger an inter-core interrupt or simply continue until core #1 picks up the buffer. In the current implementation, interrupts are disabled, and core #1 just has to poll the used rings to get new buffers. The enqueue and dequeue operations do not require atomic operations or any locking mechanism.
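To illustrate why no locks are needed, the following is a simplified, hypothetical model of one reception queue. Real virtqueues use a descriptor table plus available and used rings, and a real implementation also has to care about memory ordering, but the lock-free property comes down to the idea below: every queue has a single producer (the sending core) and a single consumer (the receiving core), so each index is written by exactly one core. The names RxQueue, enqueue(), dequeue() and the constants are made up for this sketch.

#include <string.h>

#define MAX_CORES 8      // illustrative limits, not Toro's values
#define QUEUE_LEN 64
#define BUF_SIZE  256

// one reception queue per (sender, receiver) pair
struct RxQueue {
    char         bufs[QUEUE_LEN][BUF_SIZE];
    volatile int head;   // written only by the sending core
    volatile int tail;   // written only by the receiving core
};

struct RxQueue queues[MAX_CORES][MAX_CORES];

// sender side: roughly what SendTo(dst, buf) has to do
void enqueue(int src, int dst, const char *buf, int len)
{
    struct RxQueue *q = &queues[src][dst];
    // ring full: spin until the receiver consumes a buffer
    while ((q->head + 1) % QUEUE_LEN == q->tail);
    memcpy(q->bufs[q->head], buf, len);
    q->head = (q->head + 1) % QUEUE_LEN;   // publish the buffer
}

// receiver side: roughly what RecvFrom(src, buf) has to do (polling, no interrupts)
void dequeue(int src, int dst, char *buf, int len)
{
    struct RxQueue *q = &queues[src][dst];
    // queue empty: poll until a new buffer arrives
    while (q->tail == q->head);
    memcpy(buf, q->bufs[q->tail], len);
    q->tail = (q->tail + 1) % QUEUE_LEN;
}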
The API consists of two functions:
  • SendTo(core, buf)
  • RecvFrom(core, buf)
When a core wants to send data, it just has to specify the destination core and the buffer from which the content is copied. To get a new buffer, the core invokes RecvFrom(), which returns the oldest buffer in the queue. If the queue is empty, the function blocks until a new buffer arrives.
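For instance, a minimal, hypothetical usage of these two functions could look like the sketch below, in which core #0 sends a message to every other core and waits for a reply. The buffer size and the strings are assumptions for illustration; GetRank() and GetCores() are the helpers that appear in the MPI example above.

#define MSG_LEN 16   // illustrative buffer size

void main()
{
    char buf[MSG_LEN];
    char ping[] = "ping", pong[] = "pong";
    int core, rank = GetRank();

    if (rank == 0) {
        // core #0 sends a ping to every other core and waits for its reply
        for (core = 1; core < GetCores(); core++) {
            SendTo(core, ping);
            RecvFrom(core, buf);
            printf("Core[%d] -> Core[0]: %s\n", core, buf);
        }
    } else {
        // the other cores block until the ping arrives and then answer
        RecvFrom(0, buf);
        SendTo(0, pong);
    }
}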
You can see the use of these functions in the example here. In this example, data is broadcast from core #0 to all cores in a three-core system. Core #0 sends a “ping” to core #1 and core #2, and then core #1 and core #2 respond with a “pong”. The output is as follows:

VirtIOBus: Core[1]->Core[0] queue has been initiated
VirtIOBus: Core[2]->Core[0] queue has been initiated
VirtIOBus: Core[0]->Core[1] queue has been initiated
VirtIOBus: Core[2]->Core[1] queue has been initiated
VirtIOBus: Core[0]->Core[2] queue has been initiated
VirtIOBus: Core[1]->Core[2] queue has been initiated
Core[0] -> Core[1]: ping
Core[1] -> Core[0]: pong
Core[0] -> Core[2]: ping
Core[2] -> Core[0]: pong

In the future, this mechanism will be used to implement MPI functions like MPI_Send() and MPI_Recv(). The motivation is to port MPI applications to the Toro unikernel, so stay tuned!

P.S.: You can find a demo here.