MOD playback
First a little demo:
It should be self-explanatory by now: music can be toggled through the menu and the B button.
That said, here's how it's done:
MOD tracker
MOD files came up in the late 1980s and were widely used in 1990s games and demos. Like MIDI files, they contain instructions for playing music, but they also bundle the samples to be played. This way, the music files are fairly small and can still sound very distinctive. For a collection of MOD files and their derivatives, check out modarchive.org.
The reason I am interested in MOD files is that they are small and can be played quite efficiently on embedded devices, while still allowing long and complex music tracks.
I already experimented with this in 2018 on the Tiny Arcade, so I knew it would work.
HxCMOD
HxCMOD is a MOD player by Jean-François DEL NERO, and I already used it back then.
It is a well-optimized C library intended for embedded devices: it doesn't use dynamic memory allocation and is designed to be fast and efficient - exactly what I need.
General architecture
Raylib comes with various MOD players, but these are not optimized for embedded devices.
Moreover, I need to make sure I can generate music and sound in parallel to the rest of the game, for a few reasons:
- The rendering is already taxing the CPU a lot; utilizing the 2nd core for something useful should help
- The RP2350 directly controls the FUET-5025 magnetic buzzer via interrupts to ensure a stable sound output. However, certain operations can still delay the interrupt handler, causing the sound to stutter. Running it on a separate core should help
- The interrupt handler should be as short as possible to ensure a stable sound output. Optimally, it reads from a small buffer, updating the GPIO pin output value with the next sample
- Therefore, the mod tracker and the interrupt handler should run on the same core so regular render and game update functions don't introduce stutters
So the desired architecture on the RP2350 is as follows:
- The loop of the 2nd core runs an audio process that fills a shared audio buffer (sketched below)
- The interrupt handler reads from the audio buffer and updates the GPIO pin
- The main loop provides instructions to the audio process on what to do
- The audio process provides a channel status update to the main loop
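To make the buffer hand-off concrete, here is roughly what the shared double buffer could look like. The field names match the RP2350 snippets further below, but the struct name, the buffer size, the sample type and the volatile qualifiers are assumptions of this sketch (the actual code casts the banks between signed and unsigned 16 bit in a couple of places):

#include <stdint.h>

#define ENGINE_AUDIO_BUFFER_SIZE 1024 // assumed size; the real value may differ

typedef struct SoundBuffer
{
    // two banks: while the interrupt handler reads one bank,
    // the audio process on the 2nd core fills the other one
    int16_t samplesA[ENGINE_AUDIO_BUFFER_SIZE];
    int16_t samplesB[ENGINE_AUDIO_BUFFER_SIZE];
    // which bank the interrupt handler is currently reading from
    volatile uint8_t currentAudioBank;
    // set by the audio process when a bank has been filled,
    // cleared by the interrupt handler when it swaps banks
    volatile uint8_t bufferReady;
} SoundBuffer;

static SoundBuffer soundBuffer;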
Before the integration of audio, my game code had 2 entry points: init and update. The init function is called once and the update function is called every frame, receiving a context struct that contains the screen buffer and the input states.
To accommodate the audio process, I decided to add a 3rd entry point: audioUpdate. This function is called by the audio process and receives an audio context struct which contains the following fields:
- the audio buffer that is to be filled
- channel status structs for back communication
- the instructions from the main loop
The audio update is only called right after the interrupt handler has finished reading out the audio buffer. This is very similar to how audio streaming works on desktop, where the audio driver calls a callback function whenever new audio data should be provided.
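To make that a bit more concrete, here is a rough sketch of what the audio context and the new entry point could look like. The field and struct names follow the snippets below; the exact types, the channel count and the SFXChannelStatus fields are assumptions of this sketch:

#include <stdint.h>

#define SFX_CHANNELS_COUNT 5 // assumed: channel 0 = MOD music, the rest = sound effects

typedef struct SFXInstruction
{
    uint8_t type;       // e.g. SFXINSTRUCTION_TYPE_PLAY
    uint8_t id;         // which track / effect to start
    uint8_t updateMask; // e.g. SFXINSTRUCTION_UPDATE_MASK_VOLUME
    uint8_t volume;
} SFXInstruction;

typedef struct SFXChannelStatus
{
    uint8_t isPlaying;  // illustrative status fields; the real struct may differ
    uint8_t currentId;
} SFXChannelStatus;

typedef struct AudioContext
{
    void *outBuffer;       // the buffer audioUpdate has to fill with signed 16 bit samples
    uint32_t frames;       // number of samples to produce
    uint32_t sampleRate;   // e.g. 22050
    uint32_t sampleSize;   // e.g. 16
    SFXInstruction inSfxInstructions[SFX_CHANNELS_COUNT];     // from the main loop
    SFXChannelStatus outSfxChannelStatus[SFX_CHANNELS_COUNT]; // back to the main loop
} AudioContext;

// the 3rd entry point, called by the audio process whenever a buffer needs refilling
void audioUpdate(AudioContext *audioCtx);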
Raylib integration
The integration into raylib is quite straightforward: first we create a new audio stream with the desired sample rate and sample size. Then we set the audio stream callback to our audio system callback function and start playing the audio stream:
AudioStream audioStream = LoadAudioStream(SAMPLE_RATE, SAMPLE_SIZE, 1);
SetAudioStreamCallback(audioStream, AudioSystemCallback);
PlayAudioStream(audioStream);
The AudioSystemCallback function is fairly simple:
void AudioSystemCallback(void *buffer, unsigned int frames)
{
    // reset the buffer content
    for (unsigned int i = 0; i < frames; i++)
    {
        ((short *)buffer)[i] = 0;
    }
    // the audioUpdate function is provided by the game code DLL - it may temporarily
    // not be available while the game code is being recompiled
    if (audioUpdate != NULL && (!isPaused || stepAudio))
    {
        stepAudio = 0;
        // this is subject to change; for now it works, and the idea is that each channel
        // can receive one instruction per audio frame update to do something. Channel
        // 0 is the MOD player channel.
        for (int i = 0; i < SFX_CHANNELS_COUNT; i++)
        {
            audioCtx.inSfxInstructions[i] = ctx.outSfxInstructions[i];
            ctx.outSfxInstructions[i] = (SFXInstruction){0};
        }
        audioCtx.frames = frames;
        audioCtx.sampleRate = SAMPLE_RATE;
        audioCtx.sampleSize = SAMPLE_SIZE;
        // the buffer is provided to the audio update function and needs to be
        // filled with the audio data we want to have (e.g. music)
        audioCtx.outBuffer = buffer;

        // call the audio update function provided by the game code
        audioUpdate(&audioCtx);

        // after the audio update, we copy the channel status back to the runtime context
        // so the main loop can react to it
        for (int i = 0; i < SFX_CHANNELS_COUNT; i++)
        {
            ctx.sfxChannelStatus[i] = audioCtx.outSfxChannelStatus[i];
            audioCtx.inSfxInstructions[i] = (SFXInstruction){0};
        }
    }
}
The audioUpdate function is provided by the game code DLL and is platform-independent. It is a bit too long to show here, but it reads out the instructions, fills the audio buffer and provides the channel status updates. Everything meaningful is done in this part of the application - there's no platform-specific code in there.
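To still give an idea of its shape, here is a heavily trimmed sketch of what the MOD part of audioUpdate could look like on top of HxCMOD. This is not the actual implementation: the getModData helper and the instruction handling are simplified assumptions, and it builds on the AudioContext sketch from above; only the hxcmod_* calls are the library's API:

#include <stdint.h>
#include "hxcmod.h"

#define MAX_AUDIO_FRAMES 1024 // assumed upper bound for one buffer refill

// hypothetical helper: looks up the MOD file bytes for a given id
extern void *getModData(int id, int *size);

static modcontext modCtx;
static msample stereoScratch[MAX_AUDIO_FRAMES * 2]; // HxCMOD renders interleaved stereo
static int musicPlaying = 0;

void audioUpdate(AudioContext *audioCtx)
{
    int16_t *out = (int16_t *)audioCtx->outBuffer;
    uint32_t frames = audioCtx->frames > MAX_AUDIO_FRAMES ? MAX_AUDIO_FRAMES : audioCtx->frames;

    // channel 0 is the MOD player channel
    SFXInstruction *in = &audioCtx->inSfxInstructions[0];
    if (in->type == SFXINSTRUCTION_TYPE_PLAY)
    {
        int size = 0;
        void *modData = getModData(in->id, &size);
        hxcmod_init(&modCtx);
        // a real implementation would also set the mixing rate via hxcmod_setcfg
        // so it matches audioCtx->sampleRate
        hxcmod_load(&modCtx, modData, size);
        musicPlaying = 1;
    }

    if (musicPlaying)
    {
        // render the next chunk of music and downmix the stereo output into the mono buffer
        hxcmod_fillbuffer(&modCtx, stereoScratch, frames, NULL);
        for (uint32_t i = 0; i < frames; i++)
        {
            out[i] = (int16_t)((stereoScratch[i * 2] + stereoScratch[i * 2 + 1]) / 2);
        }
    }

    // report the channel status back to the main loop
    audioCtx->outSfxChannelStatus[0].isPlaying = (uint8_t)musicPlaying;
}

The real function also has to mix the sound effect channels into the same buffer and handle the other instruction types.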
Buffer sizes
A little background information: choosing the right buffer size is important. Small buffers allow for low latency but can run out of data more easily; large buffers are more stable but introduce latency.
My first implementation on the RP2350 didn't use the 2nd core, and whenever the game loaded a new scene, the buffer ran empty, since the main loop was blocked by the scene loading and couldn't call the audio update function. When I increased the buffer size, the latency felt quite high (~200 ms, which at 22050 Hz is roughly 4400 buffered samples), and it would still run dry when loading a new scene.
Using the 2nd core
First I have to admit that I don't know much about multicore programming and I am not sure if I am doing it right, but this seems to work. The data exchange uses a shared memory area, and there are currently no locks or semaphores in place.
This is one of the reasons why I want to change the way instructions are exchanged between the main loop and the audio process. It still worked, however, and it wasn't difficult at all:
// during init, we launch the audio core
multicore_launch_core1(Audio_core);

// ...

void Audio_core()
{
    // initialize the audio relevant pins and interrupts
    Audio_init();

    // obtain the shared memory area: the RuntimeContext struct
    RuntimeContext *ctx = RuntimeContext_get();
    while (1)
    {
        // the audio buffer is only filled when it is empty
        if (!soundBuffer.bufferReady)
        {
            AudioContext *audioContext = AudioContext_get();

            // there are two buffers we use in alternation, based on which audio bank
            // is currently being read out by the interrupt handler
            uint16_t *buffer = soundBuffer.currentAudioBank ? soundBuffer.samplesA : soundBuffer.samplesB;
            audioContext->frames = ENGINE_AUDIO_BUFFER_SIZE;
            audioContext->outBuffer = (char *)buffer;
            audioContext->sampleRate = 22050;
            audioContext->sampleSize = 16;

            // sync the instructions & call the audio update - just like with the raylib audio stream
            for (int i = 0; i < 5; i++)
            {
                audioContext->inSfxInstructions[i] = ctx->outSfxInstructions[i];
                ctx->outSfxInstructions[i] = (SFXInstruction){0};
            }
            memset(buffer, 0, ENGINE_AUDIO_BUFFER_SIZE * 2);
            audioUpdate(audioContext);
            for (int i = 0; i < 5; i++)
            {
                ctx->sfxChannelStatus[i] = audioContext->outSfxChannelStatus[i];
                audioContext->inSfxInstructions[i] = (SFXInstruction){0};
            }
            // flag the buffer as ready
            soundBuffer.bufferReady = 1;
        }

        // sleep a bit to not hog the CPU (could be improved, I guess)
        sleep_ms(1);
    }
}
It doesn't look too different from how it's done in raylib (and other similar frameworks). What is perhaps more interesting is the interrupt setup and the handler function:
// I honestly have no idea what exactly is going on here; it is taken verbatim from TinyCircuits'
// Tiny Game Engine, which can be found here:
// https://github.com/TinyCircuits/TinyCircuits-Tiny-Game-Engine/blob/main/src/audio/engine_audio_module.c
// With hardware instructions, it's important to do everything in the right order and with the right
// values, otherwise there's usually nothing working at all in my experience. I therefore wouldn't touch it
// unless I'd have to.
void Audio_init()
{
    // generate the interrupt at the audio sample rate to set the PWM duty cycle
    audio_callback_pwm_pin_slice = pwm_gpio_to_slice_num(AUDIO_CALLBACK_PWM_PIN);
    pwm_clear_irq(audio_callback_pwm_pin_slice);
    pwm_set_irq_enabled(audio_callback_pwm_pin_slice, true);
    irq_set_exclusive_handler(PWM_IRQ_WRAP_0, repeating_audio_callback);
    irq_set_priority(PWM_IRQ_WRAP_0, 1);
    irq_set_enabled(PWM_IRQ_WRAP_0, true);
    audio_callback_pwm_pin_config = pwm_get_default_config();
    pwm_config_set_clkdiv_int(&audio_callback_pwm_pin_config, 1);
    engine_audio_adjust_playback_with_freq(150 * 1000 * 1000);
    pwm_init(audio_callback_pwm_pin_slice, &audio_callback_pwm_pin_config, true);

    engine_audio_setup_playback();
}

// same here: sourced from TinyCircuits' Tiny Game Engine
void engine_audio_setup_playback(){
    // Setup amplifier but make sure it is disabled while PWM is being setup
    gpio_init(AUDIO_ENABLE_PIN);
    gpio_set_dir(AUDIO_ENABLE_PIN, GPIO_OUT);
    gpio_put(AUDIO_ENABLE_PIN, 0);

    // Setup PWM audio pin, bit-depth, and frequency. The duty cycle is the only
    // PWM parameter that gets adjusted as samples are retrieved from channel sources
    uint audio_pwm_pin_slice = pwm_gpio_to_slice_num(AUDIO_PWM_PIN);
    gpio_set_function(AUDIO_PWM_PIN, GPIO_FUNC_PWM);
    pwm_config audio_pwm_pin_config = pwm_get_default_config();
    pwm_config_set_clkdiv_int(&audio_pwm_pin_config, 1);
    pwm_config_set_wrap(&audio_pwm_pin_config, 512); // 9 bit duty cycle range (0-511); the PWM carrier stays far above the audible range
    pwm_init(audio_pwm_pin_slice, &audio_pwm_pin_config, true);

    // Now allow sound to play by enabling the amplifier
    gpio_put(AUDIO_ENABLE_PIN, 1);
}

// the interrupt handler itself is pretty simple though:
void repeating_audio_callback(void){
    // using a simple counter to keep track of the current audio sample position;
    // depending on the math, we select the current audio bank
    uint16_t currentAudioSamplePosition = audioWaveUpdateCounter % ENGINE_AUDIO_BUFFER_SIZE;
    uint8_t currentAudioBank = (audioWaveUpdateCounter / ENGINE_AUDIO_BUFFER_SIZE) % 2;

    // if the audio bank has changed, we check if the buffer was flagged as ready and I
    // use the LED to signal the status for debugging purposes - it turned red a few times
    // when I used a single core, but the 2nd core keeps up steadily without missing a beat
    if (soundBuffer.currentAudioBank != currentAudioBank) {
        if (!soundBuffer.bufferReady) {
            setRGB(1, 0, 0);
        }
        else
        {
            setRGB(0, 1, 0);
        }
        // signal that the buffer has swapped and we need the loop to fill the buffer again
        soundBuffer.bufferReady = 0;
        soundBuffer.currentAudioBank = currentAudioBank;
    }

    // select the current audio bank and the current audio sample position.
    // this is where I wasted a huge amount of time: the samples are 16 bit SIGNED integers.
    // Initially I thought they were 16 bit UNSIGNED, and this caused the sound to be faint
    // and crackle a lot. I only figured that out after visualizing the signal on the screen
    // like with an oscilloscope.
    int16_t *bufferBank = currentAudioBank == 0 ? soundBuffer.samplesA : soundBuffer.samplesB;
    // the samples are 16 bit signed integers, so we need to adjust the range to 0-511, which
    // is the operating range of the PWM. Currently, I am just dividing by 32; when
    // dividing by 128, like the math would suggest (converting 16 bit to 9 bit), the sound
    // was extremely faint. I am cutting off the negative values and the values above 511
    // to keep it in the right range.
    int16_t sample = bufferBank[currentAudioSamplePosition] / 32 + 256;
    if (sample < 0)
    {
        sample = 0;
    }
    else if (sample > 511)
    {
        sample = 511;
    }

    // set the PWM level
    pwm_set_gpio_level(AUDIO_PWM_PIN, (uint16_t) sample);

    audioWaveUpdateCounter++;
    pwm_clear_irq(audio_callback_pwm_pin_slice);
}
The implementation works quite well and the sound is stable, even when the main loop is blocked for a longer period of time. The latency is low and the sound quality is good.
Here's what the oscilloscope view looks like on the device:
Web version
In theory, the web version is nearly identical: the audio update function is called by the audio stream callback and the audio buffer is filled with the audio data.
The devil is in the details, however: the audio worklet runs in an isolated context and can't access the main thread directly; it can only communicate with it via messages. The problem is that I am not aware of a way to access the main thread's WASM instance from the audio worklet.
My initial attempt was to use a shared memory array and fill it with the audio data. While this worked, it suffered from the same problem as on the RP2350: when the main loop was blocking, the audio buffer ran empty unless the buffer was far too large to be suitable for gameplay. I was hoping this simple solution would be good enough, but it wasn't.
Not being well versed in web development, I decided to pull an old-school trick: just run two instances of the same program and let them communicate via messages. To do this, the worklet needs to create its own WASM instance next to the main loop's WASM instance. There's much to improve, but the principle works: the main thread fetches the WASM file (which means downloading the WASM file twice 🙁) and pushes it to the audio worklet (which can't fetch the bytes itself). Creating the WASM instance is not a smooth experience, since this is usually done through the JS file that's typically compiled together with the WASM file. I managed to get it working eventually, but it feels quite hacky. The resulting audio quality, however, is pretty good.
Some notes:
- Shared memory is nice, but it seems to be complicated due to security restrictions. I decided to avoid it
- Pushing a Uint8Array through the message channel killed the performance. Using a plain array of numbers works.
- When building the release version with the -O3 option, my audio worklet didn't work anymore. The reason is that the names got minified, so my manual binding of functions by name no longer worked. Switching to -O2 made it work again; the size didn't change much in my case.
Conclusion
I now have an audio system in place for 3 platforms: Desktop, RP2350 and Web. Since the audio generation is platform-independent, I should be able to add more features without having to worry about platform specifics.
I haven't measured the performance on the RP2350, but since it more or less kept up even during rendering and loading, I am thinking of increasing the specs. Originally I planned to support 1 music channel and 4 sound effect channels, but I am now considering 2 music channels and 8 sound effect channels instead. This would allow music transitions and more complex sound effects.
The way I send instructions to the audio channels is, I believe, not optimal. I am thinking of using a ring buffer with incrementing IDs instead; I believe that could be more stable in a threaded shared-memory environment (a rough sketch follows below). That said, I have to admit that the current way of instructing the audio channels to play something is dead simple:
// 8 bytes for instructions; channel 0 = music channel
ctx->outSfxInstructions[0] = (SFXInstruction)
{
    .type = SFXINSTRUCTION_TYPE_PLAY,
    .id = musicId,
    .updateMask = SFXINSTRUCTION_UPDATE_MASK_VOLUME,
    .volume = 150
};
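And here's the rough, not yet implemented sketch of the ring-buffer variant I have in mind: a single producer (the main loop), a single consumer (the audio core), and incrementing IDs that tell the consumer whether there is a new entry. It reuses the SFXInstruction struct from above; the queue size and the function names are made up for this draft:

#include <stdint.h>

#define SFX_QUEUE_SIZE 16 // power of two, assumed

typedef struct SFXInstructionQueue
{
    SFXInstruction entries[SFX_QUEUE_SIZE];
    volatile uint32_t writeId; // incremented by the main loop after writing an entry
    volatile uint32_t readId;  // incremented by the audio core after consuming an entry
} SFXInstructionQueue;

// main loop side: single producer
static int SFXQueue_push(SFXInstructionQueue *q, SFXInstruction instruction)
{
    if (q->writeId - q->readId >= SFX_QUEUE_SIZE)
    {
        return 0; // queue full; drop or retry next frame
    }
    q->entries[q->writeId % SFX_QUEUE_SIZE] = instruction;
    // note: on the RP2350 a memory barrier (e.g. __dmb()) would belong here,
    // so the entry is fully written before the new ID becomes visible
    q->writeId++;
    return 1;
}

// audio core side: single consumer
static int SFXQueue_pop(SFXInstructionQueue *q, SFXInstruction *out)
{
    if (q->readId == q->writeId)
    {
        return 0; // nothing new
    }
    *out = q->entries[q->readId % SFX_QUEUE_SIZE];
    q->readId++;
    return 1;
}

With the incrementing IDs, instructions queue up instead of overwriting each other, which should make the exchange between the two cores more robust.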
But next I want to add WAV support and audio generation. I want to allow procedural sound effects to create noise or electronic sounds (beeps and boops).
When this is in, I will return to making the game work again.
The reason I did sound now is that there's an upcoming game jam I want to try participating in, and without sound it's just not that great.