High Throughput Ring Buffers
Digital Signal Processing (DSP) often requires vast amount of data at high speeds and low latency. An often overlooked method when storing data with locality constraints is the Circular Buffer. This data-structure has a finite cyclic structure with FIFO (first in, first out) logic.
Implementation Issues
Even though the Ring Buffer is relatively simple, it can be a hassle of setting up and using.
// Simple ring buffer
template <typename T>
struct ring_buffer {
T *data;
size_t size;
}When using the buffer checks must be made to make sure that reading and writing is allowed within bounds. This in itself is not wrong, but will require checks to see reading or writing is allowed.
template <typename T>
T read_buffer(ring_buffer<T> *rb, unsigned i) {
unsigned pos = i % rb->size;
return (T)*(rb->data + pos * sizeof(T));
}Sneaky Optimizations
If we pause for a second to think about what we are trying to achieve we may find something interesting. We do not want to read outside our buffer. This is quite simple, if we use an unsigned integer for indexing, we may use \(N\) bits to automatically rollover. This will prevent getting indexes out of bounds. But what happens when we need to read or write \(K\) items? That may be problematic because that will mean that we need to stop reading and jump to the next part of the data expensive…
Abusing the page mapping process
Here comes the fun part! What if we didn’t have to worry about
boundschecking? In theory this should speed up the process due to
less instructions for the CPU. Let me introduce the page mapping
process. When the program asks the kernel for memory the kernel
usually provides what is called virtual memory. So if the program
asks for 16bytes the kernel might give 16bytes of contigious
memory. This is not always true and may introduce performance
penalties if the CPU have to jump around a lot through the memory.
What you can do is ask for contigious memory specifially. This
should when possible return memory with perfect locality another
improvement can be to allign the memory for better access. This
still won’t solve the problem with bounds checking!!! But here is
the key component, the kernel returns a virtual memory chunk to
the program. In practice this means that the memory the program
now holds is contigious somewhere in the physical memory
but to the program it looks like it’s located at some address:
0x0001 or similar. In practice the program memory
0x0001 might actually map to 0x5A7E. So
if the program reads out of bounds i.e 0x0002 this
should result in undefined behaviour or more likely an error. But
if you_ tell the kernel that the next virtual memory is located on
the same physical address it will look
something like this:
0x0001 0x0002 // Virtual
\ / // --------
0x517E // Physicalwhich migh be the following: