In early 2024, I was brought on by the Department of Physics at Northern Illinois University to reimplement an interpreter for a scripting language called COSY. It was originally built in Fortran 77 and features a number of high-performance computing acceleration structures and differential algebraic primitives that most languages don't natively support. Over the following year and a half, I worked on building the core parser for the language with the intent to adapt everything over to C++ so that it could be extended and modified to meet the growing demands of beam physics research.
After I graduate, the project will be handed off to other students by Fall 2025. I intend to do one last refactor with the intention of having everything buttoned up for the next unfortunate soul who has to handle the behemoth.
Each section of this developer log covers a component of the project I am building and will expand as I work on it. I wrote this with the intention that each section stands alone as an article you can read on its own.
Anything related to building the underlying runtime architecture of Sigmafox is detailed here.
Memory allocators and related performance optimizations lay within.
Architecture is a somewhat nebulous term here. When I refer to the architecture of this software, I mean the base foundations and facilities needed before writing the true meat and potatoes of the interpreter. For this project in particular, I want fine-grained control over each component that goes into the interpreter. This means developing my own standard library.
It isn't as bad as it sounds.
With most of my projects, I begin by type-defining all my primitive types. It helps to be explicit about the size of your integers, and it makes the eventual mess I am about to make slightly more palatable.
Another component I want to point out is the need to wrap my assertions. Sure, every major vendor provides compiler flags to toggle C assertions, but it doesn't hurt to handle this yourself. Additional macros for checking pointers and creating traps for edge-case behavior are essential for defensive programming. Do you know how often these macros catch a careless mistake for me?
C++
source/common/definitions.hpp
#ifndef SIGMAFOX_SOURCE_COMMON_DEFINITIONS_HPP
#define SIGMAFOX_SOURCE_COMMON_DEFINITIONS_HPP
#include <cstdio>
#include <cstring>
#include <cstdint>
#include <cassert>
typedef uint8_t uint8;
typedef uint16_t uint16;
typedef uint32_t uint32;
typedef uint64_t uint64;
typedef int8_t int8;
typedef int16_t int16;
typedef int32_t int32;
typedef int64_t int64;
typedef float real32;
typedef double real64;
typedef int32_t bool32;
typedef int64_t bool64;
typedef char* cstr;
typedef const char* ccstr;
#define SF_BOOLEAN_ENABLE_STRING(b) (b ? "enabled" : "disabled")
#if defined(SF_DEBUG_MODE) && SF_DEBUG_MODE == 1
# define SF_ASSERT(stm) assert((stm))
# define SF_ENSURE_POINTER(ptr) assert((ptr) != NULL)
# define SF_NO_IMPLEMENTATION() assert(!"Not implemented.")
# define SF_NO_REACH() assert(!"Non-reachable condition.")
#else
# define SF_ASSERT(stm)
# define SF_ENSURE_POINTER(ptr)
# define SF_NO_IMPLEMENTATION()
# define SF_NO_REACH()
#endif
#endif
Imagine, if you will, a function which copies memory from one buffer to another, then clears the source buffer right after. Such a function would have three points of failure. Pause for a moment to think about what they are.
C/C++
static inline void
memory_copy_and_clean(void *source, uint64 source_size,
void *destination, uint64 destination_size)
{
...
}
As programmers, we obviously wouldn't call this function with null pointers or logically incorrect buffer sizes. Yet mistakes happen, and they happen in subtle ways. Small memory bugs creep in, things that don't manifest outright. Somewhere along the way, a data structure breaks down and you aren't sure why. Pedantic checks catch these bugs by acting as traps that trip the moment unintended behavior occurs.
Here is what pedantically checking your parameters looks like for a function like this:
C/C++
static inline void
memory_copy_and_clean(void *source, uint64 source_size,
void *destination, uint64 destination_size)
{
SF_ENSURE_POINTER(source); // no nulls
SF_ENSURE_POINTER(destination); // no nulls
SF_ASSERT(destination_size >= source_size); // big enough?
...
}
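To make the shape of the routine concrete, here is one possible way the body could be filled in. This is my own sketch, not the project's actual implementation, and the SF_* macros are reduced to plain asserts so the snippet is self-contained:

```cpp
#include <cassert>
#include <cstring>
#include <cstdint>

typedef uint64_t uint64;

// Sketch of the full routine; the SF_* macros are replaced with bare asserts.
static inline void
memory_copy_and_clean(void *source, uint64 source_size,
        void *destination, uint64 destination_size)
{
    assert(source != NULL);                     // SF_ENSURE_POINTER(source)
    assert(destination != NULL);                // SF_ENSURE_POINTER(destination)
    assert(destination_size >= source_size);    // big enough?

    memcpy(destination, source, source_size);   // copy the payload over
    memset(source, 0, source_size);             // then clear the source buffer
}
```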
When you disable assertions, you're basically making a contract with the consumer of the program that you have done your due diligence and you trust that your code will function. Will that always be the case? Certainly not. But the vast majority of bugs come not from the total sum of interactions, but from logical inconsistencies the programmer bakes into the code.
Developing good software, in my opinion, comes from a deep understanding of what goes on under the hood. The standard template library acts like a black box that is virtually impossible to debug. Have you ever stepped through it with a debugger? Most projects I write also come packed with their own OS-wrapper libraries.
Take, for example, file system operations. The API below is what I typically end up with for most of my personal projects. The simple, concise intention of each function is often described by its name alone. A name like filesystem_read_entire_file() truly does what it says it does, and if it can't, you'll know simply by the return value. This makes documentation a breeze because the API is self-documenting.
C++
source/platform/filesystem.hpp
#ifndef SIGMAFOX_SOURCE_PLATFORM_FILESYSTEM_HPP
#define SIGMAFOX_SOURCE_PLATFORM_FILESYSTEM_HPP
#include <common/definitions.hpp>
uint64 filesystem_canonicalize_path(const char *path, char *buffer, uint64 buffer_size);
bool filesystem_path_exists(const char *path);
bool filesystem_path_is_file(const char *path);
bool filesystem_path_is_directory(const char *path);
const char* filesystem_get_executable_directory();
const char* filesystem_get_current_directory();
uint64 filesystem_get_file_size(const char *path);
bool filesystem_read_entire_file(const char *path, void *buffer, uint64 buffer_size);
bool filesystem_write_entire_file(const char *path, void *buffer, uint64 buffer_size);
typedef uint64 fptr;
#define SF_INVALID_FILE_HANDLE 0x00
fptr filesystem_read_stream_create(const char *path);
bool filesystem_read_stream_close(fptr file_pointer);
uint64 filesystem_read_stream_offset(fptr file_pointer);
bool filesystem_read_stream_pointer_reset(fptr file_pointer);
bool filesystem_read_stream_shift_pointer_right(fptr file_pointer, uint64 shift);
bool filesystem_read_stream_shift_pointer_left(fptr file_pointer, uint64 shift);
bool filesystem_read_stream_eob(fptr file_pointer);
bool filesystem_read_stream_sob(fptr file_pointer);
bool filesystem_read_stream_test(fptr file_pointer, uint64 size); // Ensures the read will fit.
bool filesystem_read_stream_read(fptr file_pointer, void *buffer, uint64 size);
#endif
Platform abstractions are also quite helpful for making performance optimizations. For example, the above filesystem API allows for streaming data from files, something that we take for granted in the C++ STL. I am not entirely fond of the way the STL handles file streaming. There is too much abstraction and it becomes a pain to reason about what is happening under the hood.
As far as the API above goes, we grab a handle to our file and shift our read pointer left and right. If you drop this code in where you need to load large files, you can easily read in chunks and process them as desired. With some minor tweaks, you can add some concurrency to preload chunks of memory as you work, effectively removing the I/O dependency from the computation.
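The chunked-reading pattern that the stream API enables can be sketched as follows. Since the filesystem_* functions are only declared above, this sketch stands in with plain cstdio calls (fopen, fread) so it runs anywhere; the hypothetical process_file_in_chunks name and the chunk size are my own choices:

```cpp
#include <cstdio>
#include <cstdint>

typedef uint64_t uint64;

// Reads a file one chunk at a time rather than loading it whole; returns
// the total bytes processed. Swap the cstdio calls for the filesystem_*
// stream API to get the same behavior through the platform layer.
static uint64
process_file_in_chunks(const char *path, uint64 chunk_size)
{
    FILE *file = fopen(path, "rb");
    if (file == NULL) return 0;

    char buffer[4096];
    if (chunk_size > sizeof(buffer)) chunk_size = sizeof(buffer);

    uint64 total = 0;
    for (;;)
    {
        size_t bytes_read = fread(buffer, 1, chunk_size, file); // one chunk
        if (bytes_read == 0) break;                             // end of file
        total += bytes_read;                                    // "process" it
    }

    fclose(file);
    return total;
}
```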
The memory subsystem is an incredibly simple yet critical part of this project.
A common misconception about memory management is that you're managing memory. This comment might seem odd at first, but how often are you actually partitioning your calls to new and malloc()? More often than not, we allocate single objects or arrays of objects. Anything more complex gets bundled into an opaque data structure. What you may actually be referring to is managing lifetime: directing when memory should be released.
Effectively, calling new calls malloc() to retrieve a region of memory that it can properly construct and return back to you. When malloc() is called, it first checks to see if it owns a region of memory that it can subdivide and return to you. If it doesn't, malloc() will request memory from the operating system. Some implementations of new may actually bypass malloc() by collecting memory for themselves, but at the end of the day, both of these routines marshal some memory off from the operating system. Their internal data structures are designed to reduce internal and external fragmentation. This management incurs a performance penalty because some internal data structure must be traversed to determine where memory is allocated.
We can skip the middleman entirely, call the same routines the standard library does, and manage the memory ourselves. After all, if we're going through all the trouble of writing our platform architecture libraries, why stop there? On Windows, we have access to VirtualAlloc(), and on Linux, mmap(). They offer equivalent functionality.
Dedicated hardware, the memory management unit, handles mapping regions of virtual address space to physical address space. Virtual allocation is basically the operating system's way of giving user-space applications access to the heap. However, taking ownership of memory this way comes with a small downside: the operating system has a minimum allocation size. You may hear the term "page granularity," which refers to the size of a page, the minimum protectable region of memory.
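You can ask the OS for this granularity directly. A small sketch with a hypothetical wrapper name; on x86-64 desktops the answer is typically 4096 bytes, though the value is platform-defined:

```cpp
#if defined(_WIN32)
#   include <windows.h>
#else
#   include <unistd.h>
#endif

// Query the page size: the minimum granularity of OS-level allocations.
static long
system_page_size(void)
{
#if defined(_WIN32)
    SYSTEM_INFO info;
    GetSystemInfo(&info);
    return (long)info.dwPageSize;
#else
    return sysconf(_SC_PAGESIZE);
#endif
}
```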
This is why we have malloc(). The standard library doesn't know how you're going to use its memory, so it provides you with a general allocator that works well enough for most applications.
This raises the question: how do we manage our memory?
We write our own memory allocator.
The simplest form of allocator is the monotonically growing stack allocator. These allocators rarely show up in the wild in their pure form because they aren't useful without small extensions that allow them to shrink. A monotonic allocator extended with a push-pop interface is often referred to as a memory arena, and it is what you will find in place of monotonic allocators.
The downside is that you don't get random-access allocation. This may seem like a massive downside, since non-temporal allocations occur all the time in software. But when you actually think about when these allocations occur and when the deallocations take place, there is a sense of linear progression that a stack allocator can take advantage of. A common implementation I use allows me to push and pop memory from both the top and the bottom of the buffer so that I can use two separate regions at the same time.
C/C++
source/common/allocators/arena.hpp
#ifndef SIGMAFOX_SOURCE_COMMON_ALLOCATORS_ARENA_HPP
#define SIGMAFOX_SOURCE_COMMON_ALLOCATORS_ARENA_HPP
#include <common/definitions.hpp>
struct MemoryArena
{
uint8 *buffer;
uint64 size;
uint64 offset_top;
uint64 offset_bottom;
};
bool memory_arena_initialize(MemoryArena *arena, void *buffer, uint64 buffer_size);
bool memory_arena_reinitialize(MemoryArena *arena);
template <typename type, typename ...Args> type* memory_arena_push_top(MemoryArena *arena, Args... args);
template <typename type, typename ...Args> type* memory_arena_push_bottom(MemoryArena *arena, Args... args);
template <typename type> void memory_arena_pop_top(MemoryArena *arena);
template <typename type> void memory_arena_pop_bottom(MemoryArena *arena);
void* memory_arena_push_top(MemoryArena *arena, uint64 bytes);
void* memory_arena_push_bottom(MemoryArena *arena, uint64 bytes);
void memory_arena_pop_top(MemoryArena *arena, uint64 bytes);
void memory_arena_pop_bottom(MemoryArena *arena, uint64 bytes);
uint64 memory_arena_get_commit(MemoryArena *arena);
uint64 memory_arena_get_free(MemoryArena *arena);
#endif
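To make the double-ended behavior concrete, here is a deliberately minimal sketch of just the two raw byte-push paths (my own filling-in, not the project's implementation; the struct is restated so the snippet stands alone, and the pops and template paths are left out):

```cpp
#include <cassert>
#include <cstdint>

typedef uint8_t  uint8;
typedef uint64_t uint64;

struct MemoryArena
{
    uint8 *buffer;
    uint64 size;
    uint64 offset_top;      // grows downward from size
    uint64 offset_bottom;   // grows upward from zero
};

// Bump the bottom offset upward and hand back the region below it.
static void*
memory_arena_push_bottom(MemoryArena *arena, uint64 bytes)
{
    assert(arena->offset_bottom + bytes <= arena->offset_top); // regions must not collide
    void *result = arena->buffer + arena->offset_bottom;
    arena->offset_bottom += bytes;
    return result;
}

// Bump the top offset downward and hand back the region above it.
static void*
memory_arena_push_top(MemoryArena *arena, uint64 bytes)
{
    assert(arena->offset_top - arena->offset_bottom >= bytes);  // regions must not collide
    arena->offset_top -= bytes;
    return arena->buffer + arena->offset_top;
}
```

The two offsets walk toward each other, so transient scratch memory can live at one end while longer-lived allocations accumulate at the other.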
Implementing this yourself is straightforward and an exercise I leave to the reader. Instead, I want to direct your attention to something I glossed over. What about the new and delete keywords in C++? How do you handle allocations that rely on RAII? Well, under the hood, C++ defines new and delete as operators with their own functionality that is invoked when you use them. This implies that the standard library takes care of this behavior in some form, which means we can too.
A memory arena, by design, isn't a great tool for RAII-based subsystems, since its deallocation strategy favors bulk deallocation over individual deallocations. The arena has no sense of what the memory is or how it should be "deallocated"; that is something the user must explicitly do themselves. A small price for performance.
C++
template <typename type, typename ...Args> type*
memory_arena_push_bottom(MemoryArena *arena, Args... args)
{
SF_ENSURE_POINTER(arena);
SF_ENSURE_POINTER(arena->buffer);
uint64 free = memory_arena_get_free(arena);
SF_ASSERT(free >= sizeof(type));
if (free < sizeof(type)) return nullptr;
void *location = arena->buffer + arena->offset_bottom;
arena->offset_bottom += sizeof(type);
type *result = new (location) type(args...);
return result;
}
For the constructor, the placement new operator comes in handy, as it constructs the type in place for you. In the above implementation, we use templates to forward the argument list to the type we're constructing, thus properly constructing whatever we're trying to allocate. This is all that is needed to adequately support complex C++ objects.
C++
class Rectangle
{
public:
Rectangle(int width, int height) { this->width = width; this->height = height; }
~Rectangle() { printf("%i,%i", this->width, this->height); }
protected:
int width;
int height;
};
Rectangle *rect = memory_arena_push_bottom<Rectangle>(&my_arena, 8, 16);
rect->~Rectangle(); // Yikes.
I can't really say I love the idea of manually invoking the destructor, but as far as solutions go, there isn't much we can do to integrate this with the memory arena without creating what is effectively a general allocator. You may think of some clever design trick to track which allocations require destruction, but I can tell you it is not worth chasing in the long term. If you absolutely must have automated destruction on deallocation, seek out other allocators like a memory pool (which has performance advantages similar to a stack allocator).
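For reference, a minimal fixed-block pool could look like the sketch below. This is my own illustration, not part of Sigmafox: freed slots are threaded into a free list, and because the pool knows the element type, each release can run the destructor for exactly one object.

```cpp
#include <new>

// Fixed-capacity pool of same-sized slots with O(1) acquire and release.
template <typename Type, int Capacity>
struct MemoryPool
{
    // Each slot is either live object storage or a link in the free list.
    union Slot
    {
        Slot *next;
        alignas(Type) unsigned char storage[sizeof(Type)];
    };

    Slot  slots[Capacity];
    Slot *free_list;

    MemoryPool()
    {
        // Thread every slot onto the free list up front.
        free_list = &slots[0];
        for (int i = 0; i < Capacity - 1; ++i) slots[i].next = &slots[i + 1];
        slots[Capacity - 1].next = nullptr;
    }

    template <typename ...Args> Type*
    acquire(Args... args)
    {
        if (free_list == nullptr) return nullptr;   // pool exhausted
        Slot *slot = free_list;
        free_list = slot->next;
        return new (slot->storage) Type(args...);   // placement-construct in the slot
    }

    void
    release(Type *object)
    {
        object->~Type();                            // automated destruction on release
        Slot *slot = reinterpret_cast<Slot*>(object);
        slot->next = free_list;
        free_list  = slot;
    }
};
```

Because every slot is the same size, there is no internal bookkeeping to traverse, which is what keeps it competitive with a stack allocator.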