In early 2024, I was brought on by the Department of Physics at Northern Illinois University to reimplement an interpreter for a scripting language called COSY. It was originally built in Fortran 77 and features a number of high-performance computing acceleration structures and differential algebraic primitives that most languages don't natively support. Over the following year and a half, I worked on building the core parser for the language with the intent to adapt everything over to C++ so that it could be extended and modified to meet the growing demands of beam physics research.
After I graduate, the project will be handed off to other students by Fall 2025. I intend to do one last refactor with the intention of having everything buttoned up for the next unfortunate soul who has to handle the behemoth.
Each section of this developer log covers a component of the project I am building and will expand as I work on it. I wrote this with the intention that each section stands alone as an article you can read on its own.
Anything related to building the underlying runtime architecture of Sigmafox is detailed here.
Memory allocators and related performance optimizations lay within.
Architecture is a somewhat nebulous term here. When I refer to the architecture of this software, I mean the base foundations and facilities needed before writing the true meat and potatoes of the interpreter. For this project in particular, I want fine-grained control over each component that goes into the interpreter. This means developing my own standard library.
It isn't as bad as it sounds.
With most of my projects, I begin by type-defining all my primitive types. It helps to be explicit about the size of your integers, and it makes the eventual mess I am about to make slightly more palatable.
Another component I want to point out is the need to wrap my assertions. Sure, every major vendor provides compiler flags to toggle C assertions, but it doesn't hurt to handle this yourself. Additional macros for checking pointers and creating traps for edge-case behavior are essential for defensive programming. Do you know how often these macros catch a careless mistake for me?
C++
source/common/definitions.hpp
#ifndef SIGMAFOX_SOURCE_COMMON_DEFINITIONS_HPP
#define SIGMAFOX_SOURCE_COMMON_DEFINITIONS_HPP
#include <cstdio>
#include <cstring>
#include <cstdint>
#include <cassert>
typedef uint8_t uint8;
typedef uint16_t uint16;
typedef uint32_t uint32;
typedef uint64_t uint64;
typedef int8_t int8;
typedef int16_t int16;
typedef int32_t int32;
typedef int64_t int64;
typedef float real32;
typedef double real64;
typedef int32_t bool32;
typedef int64_t bool64;
typedef char* cstr;
typedef const char* ccstr;
#define SF_BOOLEAN_ENABLE_STRING(b) (b ? "enabled" : "disabled")
#if defined(SF_DEBUG_MODE) && SF_DEBUG_MODE == 1
# define SF_ASSERT(stm) assert((stm))
# define SF_ENSURE_POINTER(ptr) assert((ptr) != NULL)
# define SF_NO_IMPLEMENTATION() assert(!"Not implemented.")
# define SF_NO_REACH() assert(!"Non-reachable condition.")
#else
# define SF_ASSERT(stm)
# define SF_ENSURE_POINTER(ptr)
# define SF_NO_IMPLEMENTATION()
# define SF_NO_REACH()
#endif
#endif
Imagine, if you will, a function which copies memory from one buffer to another, then clears the source buffer right after. Such a function would have three points of failure. Pause for a moment to think about what they are.
C/C++
static inline void
memory_copy_and_clean(void *source, uint64 source_size,
void *destination, uint64 destination_size)
{
...
}
As programmers, we obviously wouldn't call this function with null pointers or logically incorrect buffer sizes. Yet mistakes happen, and they happen in subtle ways. Small memory bugs creep in, things that don't manifest outright. Somewhere along the way, a data structure breaks down and you aren't sure why. Pedantic checks catch these bugs by acting as traps that trip the moment unintended behavior occurs.
Here is what pedantically checking your parameters looks like for a function like this:
C/C++
static inline void
memory_copy_and_clean(void *source, uint64 source_size,
void *destination, uint64 destination_size)
{
SF_ENSURE_POINTER(source); // no nulls
SF_ENSURE_POINTER(destination); // no nulls
SF_ASSERT(destination_size >= source_size); // big enough?
...
}
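To make the shape of the routine concrete, here is one possible way the body could be filled in. This is my own sketch, not the project's actual implementation, and the SF_* macros are reduced to plain asserts so the snippet is self-contained:

```cpp
#include <cassert>
#include <cstring>
#include <cstdint>

typedef uint64_t uint64;

// Sketch of the full routine; the SF_* macros are replaced with bare asserts.
static inline void
memory_copy_and_clean(void *source, uint64 source_size,
        void *destination, uint64 destination_size)
{
    assert(source != NULL);                     // SF_ENSURE_POINTER(source)
    assert(destination != NULL);                // SF_ENSURE_POINTER(destination)
    assert(destination_size >= source_size);    // big enough?

    memcpy(destination, source, source_size);   // copy the payload over
    memset(source, 0, source_size);             // then clear the source buffer
}
```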
When you disable assertions, you're basically making a contract with the consumer of the program that you have done your due diligence and you trust that your code will function. Will that always be the case? Certainly not. But the vast majority of bugs come not from the total sum of interactions, but from logical inconsistencies the programmer bakes into the code.
Developing good software, in my opinion, comes from a deep understanding of what goes on under the hood. The standard template library acts like a black box that is virtually impossible to debug. Have you ever stepped through it with a debugger? Most projects I write also come packed with their own OS-wrapper libraries.
Take, for example, file system operations. The API below is what I typically end up with for most of my personal projects. The simple, concise intention of each function is often described by its name alone. A name like filesystem_read_entire_file() truly does what it says it does, and if it can't, you'll know simply by the return value. This makes documentation a breeze because the API is self-documenting.
C++
source/platform/filesystem.hpp
#ifndef SIGMAFOX_SOURCE_PLATFORM_FILESYSTEM_HPP
#define SIGMAFOX_SOURCE_PLATFORM_FILESYSTEM_HPP
#include <common/definitions.hpp>
uint64 filesystem_canonicalize_path(const char *path, char *buffer, uint64 buffer_size);
bool filesystem_path_exists(const char *path);
bool filesystem_path_is_file(const char *path);
bool filesystem_path_is_directory(const char *path);
const char* filesystem_get_executable_directory();
const char* filesystem_get_current_directory();
uint64 filesystem_get_file_size(const char *path);
bool filesystem_read_entire_file(const char *path, void *buffer, uint64 buffer_size);
bool filesystem_write_entire_file(const char *path, void *buffer, uint64 buffer_size);
typedef uint64 fptr;
#define SF_INVALID_FILE_HANDLE 0x00
fptr filesystem_read_stream_create(const char *path);
bool filesystem_read_stream_close(fptr file_pointer);
uint64 filesystem_read_stream_offset(fptr file_pointer);
bool filesystem_read_stream_pointer_reset(fptr file_pointer);
bool filesystem_read_stream_shift_pointer_right(fptr file_pointer, uint64 shift);
bool filesystem_read_stream_shift_pointer_left(fptr file_pointer, uint64 shift);
bool filesystem_read_stream_eob(fptr file_pointer);
bool filesystem_read_stream_sob(fptr file_pointer);
bool filesystem_read_stream_test(fptr file_pointer, uint64 size); // Ensures the read will fit.
bool filesystem_read_stream_read(fptr file_pointer, void *buffer, uint64 size);
#endif
Platform abstractions are also quite helpful for making performance optimizations. For example, the above filesystem API allows for streaming data from files, something that we take for granted in the C++ STL. I am not entirely fond of the way the STL handles file streaming. There is too much abstraction and it becomes a pain to reason about what is happening under the hood.
As far as the API above goes, we grab a handle to our file and shift our read pointer left and right. If you drop this code in where you need to load large files, you can easily read in chunks and process them as desired. With some minor tweaks, you can add some concurrency to preload chunks of memory as you work, effectively removing the I/O dependency from the computation.
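The chunked-reading pattern that the stream API enables can be sketched as follows. Since the filesystem_* functions are only declared above, this sketch stands in with plain cstdio calls (fopen, fread) so it runs anywhere; the hypothetical process_file_in_chunks name and the chunk size are my own choices:

```cpp
#include <cstdio>
#include <cstdint>

typedef uint64_t uint64;

// Reads a file one chunk at a time rather than loading it whole; returns
// the total bytes processed. Swap the cstdio calls for the filesystem_*
// stream API to get the same behavior through the platform layer.
static uint64
process_file_in_chunks(const char *path, uint64 chunk_size)
{
    FILE *file = fopen(path, "rb");
    if (file == NULL) return 0;

    char buffer[4096];
    if (chunk_size > sizeof(buffer)) chunk_size = sizeof(buffer);

    uint64 total = 0;
    for (;;)
    {
        size_t bytes_read = fread(buffer, 1, chunk_size, file); // one chunk
        if (bytes_read == 0) break;                             // end of file
        total += bytes_read;                                    // "process" it
    }

    fclose(file);
    return total;
}
```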
The memory subsystem is an incredibly simple yet critical part of this project.
A common misconception about memory management is that you're managing memory. This comment might seem odd at first, but how often are you actually partitioning your calls to new and malloc()? More often than not, we allocate single objects or arrays of objects. Anything more complex gets bundled into an opaque data structure. What you may actually be referring to is managing lifetime: directing when memory should be released.
Effectively, calling new calls malloc() to retrieve a region of memory that it can properly construct and return back to you. When malloc() is called, it first checks to see if it owns a region of memory that it can subdivide and return to you. If it doesn't, malloc() will request memory from the operating system. Some implementations of new may actually bypass malloc() by collecting memory for themselves, but at the end of the day, both of these routines marshal some memory off from the operating system. Their internal data structures are designed to reduce internal and external fragmentation. This management incurs a performance penalty because some internal data structure must be traversed to determine where memory is allocated.
We can skip the middleman entirely, call the same routines the standard library does, and manage the memory ourselves. After all, if we're going through all the trouble of writing our platform architecture libraries, why stop there? On Windows, we have access to VirtualAlloc(), and on Linux, mmap(). They offer equivalent functionality.
Dedicated hardware, the memory management unit, handles mapping regions of virtual address space to physical address space. Virtual allocation is basically the operating system's way of giving user-space applications access to the heap. However, taking ownership of memory this way comes with a small downside: the operating system has a minimum allocation size. You may hear the term "page granularity," which refers to the size of a page, the minimum protectable region of memory.
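You can ask the OS for this granularity directly. A small sketch with a hypothetical wrapper name; on x86-64 desktops the answer is typically 4096 bytes, though the value is platform-defined:

```cpp
#if defined(_WIN32)
#   include <windows.h>
#else
#   include <unistd.h>
#endif

// Query the page size: the minimum granularity of OS-level allocations.
static long
system_page_size(void)
{
#if defined(_WIN32)
    SYSTEM_INFO info;
    GetSystemInfo(&info);
    return (long)info.dwPageSize;
#else
    return sysconf(_SC_PAGESIZE);
#endif
}
```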
This is why we have malloc(). The standard library doesn't know how you're going to use its memory, so it provides you with a general allocator that works well enough for most applications.
This raises the question: how do we manage our memory?
We write our own memory allocator.
The simplest form of allocator is the monotonically growing stack allocator. These allocators rarely show up in the wild in their pure form because they aren't useful without small extensions that allow them to shrink. A monotonic allocator extended with a push-pop interface is often referred to as a memory arena, and it is what you will find in place of monotonic allocators.
The downside is that you don't get random-access allocation. This may seem like a massive downside, since non-temporal allocations occur all the time in software. But when you actually think about when these allocations occur and when the deallocations take place, there is a sense of linear progression that a stack allocator can take advantage of. A common implementation I use allows me to push and pop memory from both the top and the bottom of the buffer so that I can use two separate regions at the same time.
C/C++
source/common/allocators/arena.hpp
#ifndef SIGMAFOX_SOURCE_COMMON_ALLOCATORS_ARENA_HPP
#define SIGMAFOX_SOURCE_COMMON_ALLOCATORS_ARENA_HPP
#include <common/definitions.hpp>
struct MemoryArena
{
uint8 *buffer;
uint64 size;
uint64 offset_top;
uint64 offset_bottom;
};
bool memory_arena_initialize(MemoryArena *arena, void *buffer, uint64 buffer_size);
bool memory_arena_reinitialize(MemoryArena *arena);
template <typename type, typename ...Args> type* memory_arena_push_top(MemoryArena *arena, Args... args);
template <typename type, typename ...Args> type* memory_arena_push_bottom(MemoryArena *arena, Args... args);
template <typename type> void memory_arena_pop_top(MemoryArena *arena);
template <typename type> void memory_arena_pop_bottom(MemoryArena *arena);
void* memory_arena_push_top(MemoryArena *arena, uint64 bytes);
void* memory_arena_push_bottom(MemoryArena *arena, uint64 bytes);
void memory_arena_pop_top(MemoryArena *arena, uint64 bytes);
void memory_arena_pop_bottom(MemoryArena *arena, uint64 bytes);
uint64 memory_arena_get_commit(MemoryArena *arena);
uint64 memory_arena_get_free(MemoryArena *arena);
#endif
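To make the double-ended behavior concrete, here is a deliberately minimal sketch of just the two raw byte-push paths (my own filling-in, not the project's implementation; the struct is restated so the snippet stands alone, and the pops and template paths are left out):

```cpp
#include <cassert>
#include <cstdint>

typedef uint8_t  uint8;
typedef uint64_t uint64;

struct MemoryArena
{
    uint8 *buffer;
    uint64 size;
    uint64 offset_top;      // grows downward from size
    uint64 offset_bottom;   // grows upward from zero
};

// Bump the bottom offset upward and hand back the region below it.
static void*
memory_arena_push_bottom(MemoryArena *arena, uint64 bytes)
{
    assert(arena->offset_bottom + bytes <= arena->offset_top); // regions must not collide
    void *result = arena->buffer + arena->offset_bottom;
    arena->offset_bottom += bytes;
    return result;
}

// Bump the top offset downward and hand back the region above it.
static void*
memory_arena_push_top(MemoryArena *arena, uint64 bytes)
{
    assert(arena->offset_top - arena->offset_bottom >= bytes);  // regions must not collide
    arena->offset_top -= bytes;
    return arena->buffer + arena->offset_top;
}
```

The two offsets walk toward each other, so transient scratch memory can live at one end while longer-lived allocations accumulate at the other.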
Implementing this yourself is straightforward and an exercise I leave to the reader. Instead, I want to direct your attention to something I glossed over. What about the new and delete keywords in C++? How do you handle allocations that rely on RAII? Well, under the hood, C++ defines new and delete as operators with their own functionality that is invoked when you use them. This implies that the standard library takes care of this behavior in some form, which means we can too.
A memory arena, by design, isn't a great tool for RAII-based subsystems, since its deallocation strategy favors bulk deallocation over individual deallocations. The arena has no sense of what the memory is or how it should be "deallocated"; that is something the user must explicitly do themselves. A small price for performance.
C++
template <typename type, typename ...Args> type*
memory_arena_push_bottom(MemoryArena *arena, Args... args)
{
SF_ENSURE_POINTER(arena);
SF_ENSURE_POINTER(arena->buffer);
uint64 free = memory_arena_get_free(arena);
SF_ASSERT(free >= sizeof(type));
if (free < sizeof(type)) return nullptr;
void *location = arena->buffer + arena->offset_bottom;
arena->offset_bottom += sizeof(type);
type *result = new (location) type(args...);
return result;
}
For the constructor, the placement new operator comes in handy, as it constructs the type in place for you. In the above implementation, we use templates to forward the argument list to the type we're constructing, thus properly constructing whatever we're trying to allocate. This is all that is needed to adequately support complex C++ objects.
C++
class Rectangle
{
public:
Rectangle(int width, int height) { this->width = width; this->height = height; }
~Rectangle() { printf("%i,%i", this->width, this->height); }
protected:
int width;
int height;
};
Rectangle *rect = memory_arena_push_bottom<Rectangle>(&my_arena, 8, 16);
rect->~Rectangle(); // Yikes.
I can't really say I love the idea of manually invoking the destructor, but as far as solutions go, there isn't much we can do to integrate this with the memory arena without creating what is effectively a general allocator. You may think of some clever design trick to track which allocations require destruction, but I can tell you it is not worth chasing in the long term. If you absolutely must have automated destruction on deallocation, seek out other allocators like a memory pool (which has performance advantages similar to a stack allocator).
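For reference, a minimal fixed-block pool could look like the sketch below. This is my own illustration, not part of Sigmafox: freed slots are threaded into a free list, and because the pool knows the element type, each release can run the destructor for exactly one object.

```cpp
#include <new>

// Fixed-capacity pool of same-sized slots with O(1) acquire and release.
template <typename Type, int Capacity>
struct MemoryPool
{
    // Each slot is either live object storage or a link in the free list.
    union Slot
    {
        Slot *next;
        alignas(Type) unsigned char storage[sizeof(Type)];
    };

    Slot  slots[Capacity];
    Slot *free_list;

    MemoryPool()
    {
        // Thread every slot onto the free list up front.
        free_list = &slots[0];
        for (int i = 0; i < Capacity - 1; ++i) slots[i].next = &slots[i + 1];
        slots[Capacity - 1].next = nullptr;
    }

    template <typename ...Args> Type*
    acquire(Args... args)
    {
        if (free_list == nullptr) return nullptr;   // pool exhausted
        Slot *slot = free_list;
        free_list = slot->next;
        return new (slot->storage) Type(args...);   // placement-construct in the slot
    }

    void
    release(Type *object)
    {
        object->~Type();                            // automated destruction on release
        Slot *slot = reinterpret_cast<Slot*>(object);
        slot->next = free_list;
        free_list  = slot;
    }
};
```

Because every slot is the same size, there is no internal bookkeeping to traverse, which is what keeps it competitive with a stack allocator.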