Fast shared Array, Buffer and Circular Buffer / Ring Buffer for .NET IPC with Memory Mapped Files

About the SharedMemory library

The SharedMemory class library provides a set of C# classes that utilise a non-persisted memory-mapped file for very fast low-level inter-process communication (IPC) to share data using the following data structures: buffer, array and circular buffer (also known as a ring buffer).

SharedMemory uses the MemoryMappedFile class introduced in .NET 4, and provides a partial P/Invoke wrapper around the Windows API for versions prior to this. The library was originally inspired by the following CodeProject article: "Fast IPC Communication Using Shared Memory and InterlockedCompareExchange".

The library abstracts the use of the memory-mapped file, adding a header that includes information about the size of the shared memory region so that readers are able to open the shared memory without knowing the size beforehand. A helper class to support the fast copying of C# fixed-size structures in a generic fashion is also included.

The project source can be found on GitHub and CodePlex, with binaries available via NuGet and CodePlex.

What is a memory mapped file?

A memory-mapped file is an area of virtual memory that may be backed by a physical file and can be read from and written to by multiple processes on the same system. The file, or portions thereof, is mapped into one or more processes via views, each appearing as a contiguous block of memory. Each view can then be read from and written to like any other memory location.

A memory-mapped file can be created as one of two types:

  • persisted: mapped to a physical file, and
  • non-persisted: in-memory only, commonly used for inter-process communication (IPC).

A memory-mapped file is referenced using a unique name; unlike named pipes, however, a memory-mapped file cannot be accessed directly over a network.
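To make this concrete, here is a minimal sketch using the .NET 4 MemoryMappedFile class directly. The map name "DemoMap" is arbitrary, and for brevity the "second process" is shown inline; note that sharing named maps across processes requires Windows.

```csharp
using System;
using System.IO.MemoryMappedFiles;

class MmfDemo
{
    static void Main()
    {
        // Process A: create a 1 KB non-persisted (in-memory only) map.
        using (var mmf = MemoryMappedFile.CreateNew("DemoMap", 1024))
        using (var writer = mmf.CreateViewAccessor())
        {
            writer.Write(0, 12345); // write an Int32 at offset 0

            // Process B: open the same map by its unique name.
            using (var mmf2 = MemoryMappedFile.OpenExisting("DemoMap"))
            using (var reader = mmf2.CreateViewAccessor())
            {
                Console.WriteLine(reader.ReadInt32(0)); // prints 12345
            }
        }
    }
}
```

This is essentially what the library wraps: SharedMemory adds the size-describing header and synchronisation on top so readers need only the name.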

The Circular Buffer

I began this project when I couldn't find a good .NET implementation of a circular buffer with which to implement fast double/n-buffered transfer of image data between processes. The lock-free approach to thread synchronisation presented in the article mentioned above appealed to me; however, it had a number of limitations and bugs that had to be overcome, such as supporting a minimum of three nodes rather than two.

A circular buffer, circular queue, cyclic buffer or ring buffer is a data structure that uses a single, fixed-size buffer as if it were connected end-to-end. – Wikipedia

To achieve a lock-free implementation, a single node header and multiple node records are used to keep track of which nodes within the buffer are available to be written to, which are currently being written to, and which are currently being read from. Thread synchronisation is performed using a combination of Interlocked.CompareExchange and EventWaitHandles.

[Figure: Circular buffer lock-free implementation]

Each node is essentially a set of integers indicating the node’s index within the buffer, flags for whether it is ready to be written to or read from, and the offset within the shared memory buffer where the data for the node is actually located.
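As an illustration, here is a much-simplified sketch (not the library's actual code) of how a writer can claim a node atomically: Interlocked.CompareExchange flips the node's write flag from 1 to 0, and only one thread can win the exchange.

```csharp
using System;
using System.Threading;

class Node
{
    public int Index;       // the node's position within the ring
    public int DataOffset;  // offset of this node's data in the shared buffer
    public int WriteFlag;   // 1 = available for writing, 0 = not
}

class RingSketch
{
    static Node[] nodes =
    {
        new Node { Index = 0, DataOffset = 0,    WriteFlag = 1 },
        new Node { Index = 1, DataOffset = 1024, WriteFlag = 1 },
    };
    static int writeIndex = 0; // next node a writer should try to claim

    static Node TryClaimWriteNode()
    {
        Node node = nodes[writeIndex % nodes.Length];
        // Atomically flip WriteFlag 1 -> 0; only one thread can succeed.
        if (Interlocked.CompareExchange(ref node.WriteFlag, 0, 1) == 1)
        {
            writeIndex++;
            return node;  // claimed: safe to write at node.DataOffset
        }
        return null;      // node is still being read; writer must wait/retry
    }

    static void Main()
    {
        Node n = TryClaimWriteNode();
        Console.WriteLine(n != null ? n.DataOffset : -1); // prints 0
    }
}
```

In the real library the node records live inside the shared memory region itself, and a failed claim blocks on an EventWaitHandle rather than returning null.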

[Figure: Circular Buffer memory layout within the SharedMemory buffer]


The maximum bandwidth achieved was approximately 20GB/s, using 20 nodes of 1MB each with 1 reader and 1 writer. The .NET 3.5 implementation is markedly slower; I believe this is due to framework-level performance improvements made in .NET 4.

The following chart shows the bandwidth achieved in MB/s using a variety of circular buffer configurations, ranging from 2 nodes to 50 nodes with a varying number of readers/writers, comparing a 1KB vs 1MB node buffer size on .NET 3.5 and .NET 4.

[Figure: Circular buffer bandwidth]

All results are from a machine running Windows 10 64-bit, Intel Core i7-3770K @ 3.50GHz, 16GB DDR3@1200MHz on an ASUS P8Z77-V. The data transferred was selected randomly from an array of 256 buffers that had been populated with random bytes.

The SharedMemory classes

  • SharedMemory.Buffer – an abstract base class that wraps the MemoryMappedFile class, exposing read/write operations and implementing a small header to allow clients to open the shared buffer without knowing its size beforehand.
  • SharedMemory.BufferWithLocks – an abstract class that extends SharedMemory.Buffer to provide simple read/write locking support through the use of EventWaitHandles.
  • SharedMemory.Array – a simple generic array implementation utilising a shared memory buffer. Inherits from SharedMemory.BufferWithLocks to provide support for thread synchronisation.
  • SharedMemory.BufferReadWrite – provides read/write access to a shared memory buffer, with various overloads to support reading and writing structures, copying to and from IntPtr and so on. Inherits from SharedMemory.BufferWithLocks to provide support for thread synchronisation.
  • SharedMemory.CircularBuffer – a lock-free FIFO circular buffer implementation (aka ring buffer). Supporting 2 or more nodes, this implementation supports multiple readers and writers. The lock-free approach is implemented using Interlocked.Exchange and EventWaitHandles.
  • SharedMemory.FastStructure – provides a method of fast generic structure reading/writing using IL generated with DynamicMethod.
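A minimal usage sketch for the circular buffer, based on the class descriptions above. The constructor parameters and the byte-array Read/Write overloads shown here are assumptions about the library's API; check the source or unit tests for exact signatures. Both ends are shown in one process for brevity.

```csharp
using System;
using SharedMemory; // from the SharedMemory NuGet package

class CircularBufferSketch
{
    static void Main()
    {
        // Producer: create a ring of 4 nodes, 1 KB each, named "MyBuffer".
        using (var producer = new CircularBuffer("MyBuffer", 4, 1024))
        // Consumer: open the existing ring by name; the header in shared
        // memory tells it the node count and size.
        using (var consumer = new CircularBuffer("MyBuffer"))
        {
            byte[] outData = { 1, 2, 3 };
            producer.Write(outData);           // copy into the next free node

            byte[] inData = new byte[1024];
            int bytesRead = consumer.Read(inData); // copy from next filled node
            Console.WriteLine(bytesRead);
        }
    }
}
```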


For more complex usage examples see the unit tests and the client/server sample in the library's source code.
FastStructure.PtrToStructure<T> performs approximately 20x faster than System.Runtime.InteropServices.Marshal.PtrToStructure(IntPtr, Type) (8ms vs 160ms), and about 1.6x slower than the non-generic equivalent (8ms vs 5ms).

FastStructure.StructureToPtr<T> performs approximately 8x faster than System.Runtime.InteropServices.Marshal.StructureToPtr(object, IntPtr, bool) (4ms vs 34ms).
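For illustration, a sketch of how these helpers might be called. The exact FastStructure signatures are assumed from the method names above, and Marshal.AllocHGlobal stands in for an address inside a shared memory buffer.

```csharp
using System;
using System.Runtime.InteropServices;
using SharedMemory; // from the SharedMemory NuGet package

struct Vector3
{
    public float X, Y, Z;
}

class FastStructureSketch
{
    static void Main()
    {
        var v = new Vector3 { X = 1f, Y = 2f, Z = 3f };
        IntPtr mem = Marshal.AllocHGlobal(Marshal.SizeOf(typeof(Vector3)));
        try
        {
            // Generated-IL copy of the structure into unmanaged memory...
            FastStructure.StructureToPtr(ref v, mem);
            // ...and back out again, without boxing or reflection per call.
            Vector3 copy = FastStructure.PtrToStructure<Vector3>(mem);
            Console.WriteLine(copy.Y); // prints 2
        }
        finally
        {
            Marshal.FreeHGlobal(mem);
        }
    }
}
```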



4 thoughts on “Fast shared Array, Buffer and Circular Buffer / Ring Buffer for .NET IPC with Memory Mapped Files”

  1. Justin,
    Very cool utility and one that I plan on using in an application as it sort of reminds me of the old days as a Unix type. I have run the examples and the numbers do not match. For example in the SingleProcess example if I use one writer, one reader, buffer size of 8294400 and the default number of elements the Diff value is always greater than 0. Does this mean that data was lost? Is Diff a measurement of the number of data items lost? Using the Client/Server they do not read and write the same number. Is this a problem?

    1. Hi Douglas, this is normal. A “Diff” value in this sample indicates how many items the reader is lagging behind the writer. Can you confirm if the very last entry has a “Diff” value of 0? This indicates that the reader has caught up and the correct number have been processed.
