Gameboy Advance / Nintendo DS / DSi - Technical Info - Extracted from no$gba version 2.8c
About this Document

 GBA Reference

GBA Technical Data
GBA Memory Map

Hardware Programming
GBA LCD Video Controller
GBA Sound Controller
GBA Timers
GBA DMA Transfers
GBA Communication Ports
GBA Keypad Input
GBA Interrupt Control
GBA System Control
GBA Cartridges
GBA Unpredictable Things

ARM CPU Reference
BIOS Functions
External Connectors
 NDS Reference

DS Technical Data
DS I/O Maps
DS Memory Maps

Hardware Programming
DS Memory Control
DS Video
DS 3D Video
DS Sound
DS System and Built-in Peripherals
DS Cartridges, Encryption, Firmware
DS Xboo
DS Wireless Communications

BIOS Functions
ARM CPU Reference
External Connectors
 DSi Reference

DSi Basic Differences to NDS

New Hardware Features
DSi I/O Map
DSi Control Registers (SCFG)
DSi XpertTeak (DSP)
DSi New Shared WRAM (for ARM7, ARM9, DSP)
DSi SoundExt
DSi Advanced Encryption Standard (AES)
DSi Cartridge Header
DSi Touchscreen/Sound Controller
DSi I2C Bus
DSi Cameras
DSi SD/MMC Protocol and I/O Ports
DSi SD/MMC Filesystem
DSi GPIO Registers
DSi Console IDs
DSi Unknown Registers
DSi Notes
DSi Exploits
DSi Regions

 CPU Reference

General ARM7TDMI Information
ARM CPU Overview
ARM CPU Register Set
ARM CPU Flags & Condition Field (cond)
ARM CPU 26bit Memory Interface
ARM CPU Exceptions
ARM CPU Memory Alignments

Further Information
ARM Pseudo Instructions and Directives
ARM CP15 System Control Coprocessor
ARM CPU Instruction Cycle Times
ARM CPU Versions
ARM CPU Data Sheet

 CPU 32bit ARM Mode

ARM 32bit Opcodes (ARM Code)
ARM Instruction Summary
ARM Branch and Branch with Link (B, BL, BX, BLX, SWI, BKPT)
ARM Data Processing (ALU)
ARM Multiply and Multiply-Accumulate (MUL, MLA)
ARM Special ARM9 Instructions (CLZ, QADD/QSUB)
ARM PSR Transfer (MRS, MSR)
ARM Memory: Single Data Transfer (LDR, STR, PLD)
ARM Memory: Halfword, Doubleword, and Signed Data Transfer
ARM Memory: Block Data Transfer (LDM, STM)
ARM Memory: Single Data Swap (SWP)

 CPU 16bit THUMB Mode

ARM 16bit Opcodes (THUMB Code)
When operating in THUMB state, cut-down 16bit opcodes are used.
THUMB is supported on T-variants of ARMv4 and up, ie. ARMv4T, ARMv5T, etc.
THUMB Instruction Summary
THUMB Register Operations (ALU, BX)
THUMB Memory Load/Store (LDR/STR)
THUMB Memory Addressing (ADD PC/SP)
THUMB Memory Multiple Load/Store (PUSH/POP and LDM/STM)
THUMB Jumps and Calls

  GBA Technical Data

CPU Modes
  ARM Mode     ARM7TDMI 32bit RISC CPU, 16.78MHz, 32bit opcodes (GBA)
  THUMB Mode   ARM7TDMI 32bit RISC CPU, 16.78MHz, 16bit opcodes (GBA)
  CGB Mode     Z80/8080-style 8bit CPU, 4.2MHz or 8.4MHz  (CGB compatibility)
  DMG Mode     Z80/8080-style 8bit CPU, 4.2MHz (monochrome gameboy compatib.)
Internal Memory
  BIOS ROM     16 KBytes
  Work RAM     288 KBytes (Fast 32K on-chip, plus Slow 256K on-board)
  VRAM         96 KBytes
  OAM          1 KByte (128 OBJs 3x16bit, 32 OBJ-Rotation/Scalings 4x16bit)
  Palette RAM  1 KByte (256 BG colors, 256 OBJ colors)
Video
  Display      240x160 pixels (2.9 inch TFT color LCD display)
  BG layers    4 background layers
  BG types     Tile/map based, or Bitmap based
  BG colors    256 colors, or 16 colors/16 palettes, or 32768 colors
  OBJ colors   256 colors, or 16 colors/16 palettes
  OBJ size     12 types (in range 8x8 up to 64x64 dots)
  OBJs/Screen  max. 128 OBJs of any size (up to 64x64 dots each)
  OBJs/Line    max. 128 OBJs of 8x8 dots size (under best circumstances)
  Priorities   OBJ/OBJ: 0-127, OBJ/BG: 0-3, BG/BG: 0-3
  Effects      Rotation/Scaling, alpha blending, fade-in/out, mosaic, window
  Backlight    GBA SP only (optionally by light on/off toggle button)
Sound
  Analogue     4 channel CGB compatible (2x square wave, 1x wave, 1x noise)
  Digital      2 DMA sound channels
  Output       Built-in speaker (mono), or headphones socket (stereo)
Controls
  Gamepad      4 Direction Keys, 6 Buttons
Communication Ports
  Serial Port  Various transfer modes, 4-Player Link, Single Game Pak play
External Memory
  GBA Game Pak max. 32MB ROM or flash ROM + max 64K SRAM
  CGB Game Pak max. 32KB ROM + 8KB SRAM (more memory requires banking)
Case Dimensions
  Size (mm)    GBA: 145x81x25 - GBA SP: 82x82x24 (closed), 155x82x24 (stretch)
Power Supply
  Battery GBA  GBA: 2x1.5V DC (AA), Life-time approx. 15 hours
  Battery SP   GBA SP: Built-in rechargeable Lithium ion battery, 3.7V 600mAh
  External     GBA: 3.3V DC 350mA - GBA SP: 5.2V DC 320mA


Original Gameboy Advance (GBA)
   ____/    :  CARTRIDGE  SIO   :    \____
  | L       _____________________  LED  R |
  |        |                     |        |
  |  _||_  |   2.9" TFT SCREEN   |    (A) |
  | |_  _| | 240x160pix  61x40mm | (B)    |
  |   ||   |    NO BACKLIGHT     |  ::::  |
  |        |                     | SPEAKR |
  | STRT() |_____________________|  ::::  |
  |____  OFF-ON  BATTERY 2xAA PHONES  _==_|

   _______________________                                 _
  | _____________________ |                               / /
  ||                     ||                              / /
  ||   2.9" TFT SCREEN   ||                             / /
  || 240x160pix  61x40mm ||                            / /
  ||   WITH BACKLIGHT    ||                           / /
  ||                     ||     GBA SP SIDE VIEWS    / /
  ||_____________________||                         / /
  |  GAME BOY ADVANCE SP  |   _____________________(_)
  |_______________________|  |. . . . . . . .'.'.   _|
  |_|________|________|_|_|  |_CARTRIDGE_:_BATT._:_|_| <-- EXT1/EXT2
  |L    EXT1     EXT2    R|
  |          (*)      LEDSo   _____________________ _
  (VOL_||_           (A)  o  |_____________________(_)
  |  |_  _| ,,,,,(B)      |  |. . . . . . . .'.'.   _|
  |    ||   ;SPK;         |  |_CARTRIDGE_:_BATT._:_|_| <-- EXT1/EXT2
  |         '''''      ON #                         _ _____________________
  |       SLCT STRT    OFF#   _____________________(_)_____________________|
  | CART.  ()   ()        |  |. . . . . . . .'.'.   _|
  |_:___________________:_|  |_CARTRIDGE_:_BATT._:_|_| <-- EXT1/EXT2

Gameboy Micro (GBA Micro)
     | L      __________________      R |
     |       |     GBA-MICRO    |       |
     | _||_  |  2.0" TFT SCREEN |    (A)| +
     ||_  _| |240x160pix 42x28mm| (B)   |VOL
     |  ||   |     BACKLIGHT    |       | -
     |       |__________________|  ...  |
       PWR   <--- CARTRIDGE SLOT ---> PHONES

Nintendo DS (NDS)
    |        _____________________        |
    |       |                     |       |
    |       |    3" TFT SCREEN    |       |
    |       | 256x192pix  61x46mm |       |
    |       |      BACKLIGHT      |       |
    | ::::: |    Original NDS     | ::::: |
    | ::::: |_____________________| ::::: |
   _|        _          ______   _        |_  <-- gap between screens: 22mm
  |L|_______| |________|      |_| |_______|R|     (equivalent to 90 pixels)
  |_______   _____________________   _______|
  |  PWR  | |                     | |SEL STA|
  |   _   | |    3" TFT SCREEN    | |       |
  | _| |_ | | 256x192pix  61x46mm | |   X   |
  ||_   _|| |      BACKLIGHT      | | Y   A |
  |  |_|  | |    TOUCH SCREEN     | |   B   |
  |       | |_____________________| |       |
  |_______|             NintendoDS  |_______|
  |         MIC                LEDS         |
       VOL        SLOT2(GBA)     MIC/PHONES

Nintendo DS Lite (NDS-Lite)
    |        _____________________        |
    |       |                     |       |
    |       |    3" TFT SCREEN    |       |
    |  ...  | 256x192pix  61x46mm |  ...  |
    |  ...  |      BACKLIGHT      |  ...  |
    |       |      NDS-LITE       |       |
    |       |_____________________|       |
    |___  _ _ _ _ _ _ _ _ _ _ _ _ _ _ ____|   <-- gap between screens: 23mm
   L| _ |_____________MIC____________|LEDS|R
    |   _    _____________________        |
    | _| |_ |                     |   X   |
    ||_   _||    3" TFT SCREEN    | Y   A |PWR
    |  |_|  | 256x192pix  61x46mm |   B   |
    |       |      BACKLIGHT      |       |
    |       |    TOUCH SCREEN     |oSTART |
    |       |_____________________|oSELECT|
       VOL        SLOT2(GBA)     MIC/PHONES

Nintendo DSi (DSi)
    |        _____________________        |
    |       |                     |   O o | <-- CAM (O) and LED (o)
    |       |   3.25" TFT SCREEN  |       |     (on backside)
    |       | 256x192pix  66x50mm |       |
    |       |      BACKLIGHT      |       |
    |  __   |         DSi         |   __  |
    | (__)  |_____________________|  (__) |
    |___  _ _ _ _ _ _ _ _ _ _ _ _ _ _ ____|  <-- gap between screens: 23mm
   L|LEDS|__________CAM__MIC_________| __ |R                   (88 pixels)
  + |   _    _____________________        |
 VOL| _| |_ |                     |   X   | <-- SD Card Slot
  - ||_   _||   3.25" TFT SCREEN  | Y   A |
    |  |_|  | 256x192pix  66x50mm |   B   |
    |       |      BACKLIGHT      |       |
    |       |    TOUCH SCREEN     |oSTART |
    | POWERo|_____________________|oSELECT|

Nintendo DSi XL
  As DSi, but bigger case, and bigger 4.2" screens

Gameboy Player (Gamecube Joypad) (GBA Player)
       L____-------         -------____R
       /   ___   \           /   (Y)   \Z
      /   / O \   | (START) |        (X)\   Z      = Gameboy Player Menu
     |    \___/    \_______/      (A)    |  X or Y = Select button
     |\         _   \     /    (B)      /|
     | \___   _| |_  \   /   ___    ___/ |  optionally X/Y can be
     |    |\ |_   _| /   \  / C \  /|    |  swapped with L/R (?)
     |    | \  |_|  /     \ \___/ / |    |
     |    |  \_____/       \_____/  |    |  analogue sticks = ?
      \__/                           \__/

Gameboy Player (Gamecube Bongos) (GBA Player)
       _______     _______
      /   Y   \   /   X   \   Y/B = left bongo rear/front side
     | . . . . |_| . . . . |  X/A = right bongo rear/front side
     |    B    |R|    A    |  S   = start/pause button
     |\_______/|_|\_______/|  R   = microphone (triggers R button)
     |         |_|         |  (the X/Y inputs can be assigned to
     |\_______/| |\_______/|  GBA R/L inputs in GBA player setup)
      \_______/   \_______/

The GBA's separate 8bit/32bit CPU modes cannot be operated simultaneously. Switching is possible only between ARM and THUMB modes (these are the two GBA modes).
This manual does not describe CGB and DMG modes; both are completely different from the GBA modes, and neither can be accessed from inside of GBA modes anyway.

Gameboy Player
A GBA adapter for the Gamecube console, allowing GBA games to be played on a television set.
GBA Gameboy Player

GBA SP Notes
Deluxe version of the original GBA, with backlight, new folding laptop-style case, and built-in rechargeable battery. Appears to be 100% compatible with the GBA; there seems to be no way to detect SPs by software.

Gameboy Micro (GBA Micro)
Miniaturized GBA. Supports 32bit GBA games only (no 8bit DMG/CGB games). The 256K Main RAM is a bit slower than usual (it cannot be "overclocked" via port 4000800h).

Nintendo DS (Dual Screen) Notes
New handheld with two screens, backwards compatible with GBA games. It is NOT backwards compatible with older 8bit games (mono/color gameboys), though.
Also, the DS has no link port, so GBA games work only in single player mode, link-port accessories like printers cannot be used, and, most unfortunately, multiboot won't work (trying to press Select+Start at powerup will just lock up the DS).

iQue Notes
iQue is a brand name used by Nintendo in China; iQue GBA and iQue DS are essentially the same as Nintendo GBA and Nintendo DS.
The iQue DS contains a larger firmware chip (the charset additionally contains about 6700 simplified Chinese characters). The bootmenu still allows selecting (only) six languages (Japanese has been replaced by Chinese). The iQue DS can play normal international NDS games, plus Chinese dedicated games. The latter won't work on normal NDS consoles (reportedly simply due to a firmware-version check contained in the Chinese dedicated games; aside from that check, the games should be fully compatible with NDS consoles).

  GBA Memory Map

General Internal Memory
  00000000-00003FFF   BIOS - System ROM         (16 KBytes)
  00004000-01FFFFFF   Not used
  02000000-0203FFFF   WRAM - On-board Work RAM  (256 KBytes) 2 Wait
  02040000-02FFFFFF   Not used
  03000000-03007FFF   WRAM - On-chip Work RAM   (32 KBytes)
  03008000-03FFFFFF   Not used
  04000000-040003FE   I/O Registers
  04000400-04FFFFFF   Not used
Internal Display Memory
  05000000-050003FF   BG/OBJ Palette RAM        (1 Kbyte)
  05000400-05FFFFFF   Not used
  06000000-06017FFF   VRAM - Video RAM          (96 KBytes)
  06018000-06FFFFFF   Not used
  07000000-070003FF   OAM - OBJ Attributes      (1 Kbyte)
  07000400-07FFFFFF   Not used
External Memory (Game Pak)
  08000000-09FFFFFF   Game Pak ROM/FlashROM (max 32MB) - Wait State 0
  0A000000-0BFFFFFF   Game Pak ROM/FlashROM (max 32MB) - Wait State 1
  0C000000-0DFFFFFF   Game Pak ROM/FlashROM (max 32MB) - Wait State 2
  0E000000-0E00FFFF   Game Pak SRAM    (max 64 KBytes) - 8bit Bus width
  0E010000-0FFFFFFF   Not used
Unused Memory Area
  10000000-FFFFFFFF   Not used (upper 4bits of address bus unused)
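
As a rough illustration of the table above, the following C sketch classifies a CPU address by the listed ranges. The enum and function names are hypothetical, and addresses marked "Not used" in the table are simply reported as unused.

```c
#include <stdint.h>

/* Region names are hypothetical labels for the rows of the memory map. */
typedef enum {
    REGION_BIOS, REGION_WRAM_256K, REGION_WRAM_32K, REGION_IO,
    REGION_PALETTE, REGION_VRAM, REGION_OAM,
    REGION_GAMEPAK_ROM, REGION_GAMEPAK_SRAM, REGION_UNUSED
} GbaRegion;

/* Classify a 32bit CPU address using the ranges listed in the memory map. */
GbaRegion gba_region(uint32_t addr)
{
    if (addr <= 0x00003FFFu) return REGION_BIOS;
    if (addr >= 0x02000000u && addr <= 0x0203FFFFu) return REGION_WRAM_256K;
    if (addr >= 0x03000000u && addr <= 0x03007FFFu) return REGION_WRAM_32K;
    if (addr >= 0x04000000u && addr <= 0x040003FEu) return REGION_IO;
    if (addr >= 0x05000000u && addr <= 0x050003FFu) return REGION_PALETTE;
    if (addr >= 0x06000000u && addr <= 0x06017FFFu) return REGION_VRAM;
    if (addr >= 0x07000000u && addr <= 0x070003FFu) return REGION_OAM;
    if (addr >= 0x08000000u && addr <= 0x0DFFFFFFu) return REGION_GAMEPAK_ROM;
    if (addr >= 0x0E000000u && addr <= 0x0E00FFFFu) return REGION_GAMEPAK_SRAM;
    return REGION_UNUSED;   /* all "Not used" ranges, including 10000000h and up */
}
```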

Default WRAM Usage
By default, the 256 bytes at 03007F00h-03007FFFh in Work RAM are reserved for Interrupt vector, Interrupt Stack, and BIOS Call Stack. The remaining WRAM is free for whatever use (including User Stack, which is initially located at 03007F00h).

Address Bus Width and CPU Read/Write Access Widths
Shows the Bus-Width, supported read and write widths, and the clock cycles for 8/16/32bit accesses.
  Region        Bus   Read      Write     Cycles
  BIOS ROM      32    8/16/32   -         1/1/1
  Work RAM 32K  32    8/16/32   8/16/32   1/1/1
  I/O           32    8/16/32   8/16/32   1/1/1
  OAM           32    8/16/32   16/32     1/1/1 *
  Work RAM 256K 16    8/16/32   8/16/32   3/3/6 **
  Palette RAM   16    8/16/32   16/32     1/1/2 *
  VRAM          16    8/16/32   16/32     1/1/2 *
  GamePak ROM   16    8/16/32   -         5/5/8 **/***
  GamePak Flash 16    8/16/32   16/32     5/5/8 **/***
  GamePak SRAM  8     8         8         5     **
Timing Notes:
  *   Plus 1 cycle if GBA accesses video memory at the same time.
  **  Default waitstate settings, see System Control chapter.
  *** Separate timings for sequential, and non-sequential accesses.
  One cycle equals approx. 59.59ns (ie. 16.78MHz clock).
All memory (except GamePak SRAM) can be accessed by 16bit and 32bit DMA.
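The cycle counts in the table can be converted to time as sketched below. This assumes the rounded 16.78MHz figure quoted in the timing notes; the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Rounded clock figure quoted in the timing notes (16.78MHz). */
#define GBA_CLOCK_HZ 16780000.0

/* Convert a cycle count from the table into nanoseconds. */
double gba_cycles_to_ns(uint32_t cycles)
{
    return (double)cycles * 1e9 / GBA_CLOCK_HZ;
}
```

For example, a 32bit read from GamePak ROM at the default 8 cycles takes roughly 477ns under this assumption.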

GamePak Memory
Only DMA3 (and the CPU of course) may access GamePak ROM. GamePak SRAM can be accessed by the CPU only, restricted to bytewise 8bit transfers. The SRAM region is intended for external FLASH backup memory or for battery-backed SRAM.
For details about configuration of GamePak Waitstates, see:
GBA System Control

VRAM, OAM, and Palette RAM Access
These memory regions can be accessed during H-Blank or V-Blank only (unless the display is disabled by the Forced Blank bit in the DISPCNT register).
There is an additional restriction for OAM memory: Accesses during H-Blank are allowed only if 'H-Blank Interval Free' in DISPCNT is set (which reduces the number of displayable OBJs, though).
However, the CPU appears to be able to access VRAM/OAM/Palette at any time, with a waitstate (one clock cycle) being inserted automatically if the display controller is accessing memory simultaneously. (Ie. unlike the old 8bit gameboy, the data will not get lost.)

CPU Mode Performance
Note that the GamePak ROM bus is limited to 16bits, so executing ARM instructions (32bit opcodes) from GamePak ROM results in poor performance, as each opcode needs two bus accesses to fetch. It is therefore recommended to use THUMB instructions (16bit opcodes), each of which can be read in a single access.
(ARM instructions achieve best performance when the code is copied from GamePak ROM into internal Work RAM.)
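A back-of-the-envelope comparison, using only the default access times from the table in the Address Bus Width section (5 cycles per 16bit and 8 cycles per 32bit GamePak ROM access, 1 cycle per 32bit access to the 32K Work RAM). The names are hypothetical, and the sketch ignores execution time and the faster sequential accesses.

```c
#include <stdint.h>

/* Default access times taken from the bus width table (assumed figures). */
enum {
    ROM_CYCLES_THUMB_OPCODE  = 5,  /* one 16bit access to GamePak ROM      */
    ROM_CYCLES_ARM_OPCODE    = 8,  /* one 32bit access to GamePak ROM      */
    WRAM_CYCLES_ARM_OPCODE   = 1   /* one 32bit access to 32K Work RAM     */
};

/* Number of whole opcodes that can be fetched in the given cycle budget. */
uint32_t opcodes_fetched(uint32_t cycles, uint32_t cycles_per_opcode)
{
    return cycles / cycles_per_opcode;
}
```

With a budget of 40 cycles this gives 5 ARM opcodes or 8 THUMB opcodes from GamePak ROM, and 40 ARM opcodes from the 32K Work RAM.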

Data Format
Even though the ARM CPU itself allows selecting between Little-Endian and Big-Endian format by means of an external circuit, no such circuit exists in the GBA, and the data format is always Little-Endian. That is, when accessing 16bit or 32bit data in memory, the least significant bits are stored in the first byte (smallest address), and the most significant bits in the last byte. (Ie. same as for 80x86 and Z80 CPUs.)
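A minimal C sketch of this byte order, assembling values from a byte array in a host-independent way. The function names are hypothetical.

```c
#include <stdint.h>

/* Read a 16bit value stored in little-endian order (LSB at smallest address). */
uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a 32bit value stored in little-endian order. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

For example, the bytes 78h,56h,34h,12h at ascending addresses form the 32bit value 12345678h.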

  GBA I/O Map

LCD I/O Registers
  4000000h  2    R/W  DISPCNT   LCD Control
  4000002h  2    R/W  -         Undocumented - Green Swap
  4000004h  2    R/W  DISPSTAT  General LCD Status (STAT,LYC)
  4000006h  2    R    VCOUNT    Vertical Counter (LY)
  4000008h  2    R/W  BG0CNT    BG0 Control
  400000Ah  2    R/W  BG1CNT    BG1 Control
  400000Ch  2    R/W  BG2CNT    BG2 Control
  400000Eh  2    R/W  BG3CNT    BG3 Control
  4000010h  2    W    BG0HOFS   BG0 X-Offset
  4000012h  2    W    BG0VOFS   BG0 Y-Offset
  4000014h  2    W    BG1HOFS   BG1 X-Offset
  4000016h  2    W    BG1VOFS   BG1 Y-Offset
  4000018h  2    W    BG2HOFS   BG2 X-Offset
  400001Ah  2    W    BG2VOFS   BG2 Y-Offset
  400001Ch  2    W    BG3HOFS   BG3 X-Offset
  400001Eh  2    W    BG3VOFS   BG3 Y-Offset
  4000020h  2    W    BG2PA     BG2 Rotation/Scaling Parameter A (dx)
  4000022h  2    W    BG2PB     BG2 Rotation/Scaling Parameter B (dmx)
  4000024h  2    W    BG2PC     BG2 Rotation/Scaling Parameter C (dy)
  4000026h  2    W    BG2PD     BG2 Rotation/Scaling Parameter D (dmy)
  4000028h  4    W    BG2X      BG2 Reference Point X-Coordinate
  400002Ch  4    W    BG2Y      BG2 Reference Point Y-Coordinate
  4000030h  2    W    BG3PA     BG3 Rotation/Scaling Parameter A (dx)
  4000032h  2    W    BG3PB     BG3 Rotation/Scaling Parameter B (dmx)
  4000034h  2    W    BG3PC     BG3 Rotation/Scaling Parameter C (dy)
  4000036h  2    W    BG3PD     BG3 Rotation/Scaling Parameter D (dmy)
  4000038h  4    W    BG3X      BG3 Reference Point X-Coordinate
  400003Ch  4    W    BG3Y      BG3 Reference Point Y-Coordinate
  4000040h  2    W    WIN0H     Window 0 Horizontal Dimensions
  4000042h  2    W    WIN1H     Window 1 Horizontal Dimensions
  4000044h  2    W    WIN0V     Window 0 Vertical Dimensions
  4000046h  2    W    WIN1V     Window 1 Vertical Dimensions
  4000048h  2    R/W  WININ     Inside of Window 0 and 1
  400004Ah  2    R/W  WINOUT    Inside of OBJ Window & Outside of Windows
  400004Ch  2    W    MOSAIC    Mosaic Size
  400004Eh       -    -         Not used
  4000050h  2    R/W  BLDCNT    Color Special Effects Selection
  4000052h  2    W    BLDALPHA  Alpha Blending Coefficients
  4000054h  2    W    BLDY      Brightness (Fade-In/Out) Coefficient
  4000056h       -    -         Not used
Sound Registers
  4000060h  2  R/W  SOUND1CNT_L Channel 1 Sweep register       (NR10)
  4000062h  2  R/W  SOUND1CNT_H Channel 1 Duty/Length/Envelope (NR11, NR12)
  4000064h  2  R/W  SOUND1CNT_X Channel 1 Frequency/Control    (NR13, NR14)
  4000066h     -    -           Not used
  4000068h  2  R/W  SOUND2CNT_L Channel 2 Duty/Length/Envelope (NR21, NR22)
  400006Ah     -    -           Not used
  400006Ch  2  R/W  SOUND2CNT_H Channel 2 Frequency/Control    (NR23, NR24)
  400006Eh     -    -           Not used
  4000070h  2  R/W  SOUND3CNT_L Channel 3 Stop/Wave RAM select (NR30)
  4000072h  2  R/W  SOUND3CNT_H Channel 3 Length/Volume        (NR31, NR32)
  4000074h  2  R/W  SOUND3CNT_X Channel 3 Frequency/Control    (NR33, NR34)
  4000076h     -    -           Not used
  4000078h  2  R/W  SOUND4CNT_L Channel 4 Length/Envelope      (NR41, NR42)
  400007Ah     -    -           Not used
  400007Ch  2  R/W  SOUND4CNT_H Channel 4 Frequency/Control    (NR43, NR44)
  400007Eh     -    -           Not used
  4000080h  2  R/W  SOUNDCNT_L  Control Stereo/Volume/Enable   (NR50, NR51)
  4000082h  2  R/W  SOUNDCNT_H  Control Mixing/DMA Control
  4000084h  2  R/W  SOUNDCNT_X  Control Sound on/off           (NR52)
  4000086h     -    -           Not used
  4000088h  2  BIOS SOUNDBIAS   Sound PWM Control
  400008Ah  ..   -    -         Not used
  4000090h 2x10h R/W  WAVE_RAM  Channel 3 Wave Pattern RAM (2 banks!!)
  40000A0h  4    W    FIFO_A    Channel A FIFO, Data 0-3
  40000A4h  4    W    FIFO_B    Channel B FIFO, Data 0-3
  40000A8h       -    -         Not used
DMA Transfer Channels
  40000B0h  4    W    DMA0SAD   DMA 0 Source Address
  40000B4h  4    W    DMA0DAD   DMA 0 Destination Address
  40000B8h  2    W    DMA0CNT_L DMA 0 Word Count
  40000BAh  2    R/W  DMA0CNT_H DMA 0 Control
  40000BCh  4    W    DMA1SAD   DMA 1 Source Address
  40000C0h  4    W    DMA1DAD   DMA 1 Destination Address
  40000C4h  2    W    DMA1CNT_L DMA 1 Word Count
  40000C6h  2    R/W  DMA1CNT_H DMA 1 Control
  40000C8h  4    W    DMA2SAD   DMA 2 Source Address
  40000CCh  4    W    DMA2DAD   DMA 2 Destination Address
  40000D0h  2    W    DMA2CNT_L DMA 2 Word Count
  40000D2h  2    R/W  DMA2CNT_H DMA 2 Control
  40000D4h  4    W    DMA3SAD   DMA 3 Source Address
  40000D8h  4    W    DMA3DAD   DMA 3 Destination Address
  40000DCh  2    W    DMA3CNT_L DMA 3 Word Count
  40000DEh  2    R/W  DMA3CNT_H DMA 3 Control
  40000E0h       -    -         Not used
Timer Registers
  4000100h  2    R/W  TM0CNT_L  Timer 0 Counter/Reload
  4000102h  2    R/W  TM0CNT_H  Timer 0 Control
  4000104h  2    R/W  TM1CNT_L  Timer 1 Counter/Reload
  4000106h  2    R/W  TM1CNT_H  Timer 1 Control
  4000108h  2    R/W  TM2CNT_L  Timer 2 Counter/Reload
  400010Ah  2    R/W  TM2CNT_H  Timer 2 Control
  400010Ch  2    R/W  TM3CNT_L  Timer 3 Counter/Reload
  400010Eh  2    R/W  TM3CNT_H  Timer 3 Control
  4000110h       -    -         Not used
Serial Communication (1)
  4000120h  4    R/W  SIODATA32 SIO Data (Normal-32bit Mode; shared with below)
  4000120h  2    R/W  SIOMULTI0 SIO Data 0 (Parent)    (Multi-Player Mode)
  4000122h  2    R/W  SIOMULTI1 SIO Data 1 (1st Child) (Multi-Player Mode)
  4000124h  2    R/W  SIOMULTI2 SIO Data 2 (2nd Child) (Multi-Player Mode)
  4000126h  2    R/W  SIOMULTI3 SIO Data 3 (3rd Child) (Multi-Player Mode)
  4000128h  2    R/W  SIOCNT    SIO Control Register
  400012Ah  2    R/W  SIOMLT_SEND SIO Data (Local of MultiPlayer; shared below)
  400012Ah  2    R/W  SIODATA8  SIO Data (Normal-8bit and UART Mode)
  400012Ch       -    -         Not used
Keypad Input
  4000130h  2    R    KEYINPUT  Key Status
  4000132h  2    R/W  KEYCNT    Key Interrupt Control
Serial Communication (2)
  4000134h  2    R/W  RCNT      SIO Mode Select/General Purpose Data
  4000136h  -    -    IR        Ancient - Infrared Register (Prototypes only)
  4000138h       -    -         Not used
  4000140h  2    R/W  JOYCNT    SIO JOY Bus Control
  4000142h       -    -         Not used
  4000150h  4    R/W  JOY_RECV  SIO JOY Bus Receive Data
  4000154h  4    R/W  JOY_TRANS SIO JOY Bus Transmit Data
  4000158h  2    R/?  JOYSTAT   SIO JOY Bus Receive Status
  400015Ah       -    -         Not used
Interrupt, Waitstate, and Power-Down Control
  4000200h  2    R/W  IE        Interrupt Enable Register
  4000202h  2    R/W  IF        Interrupt Request Flags / IRQ Acknowledge
  4000204h  2    R/W  WAITCNT   Game Pak Waitstate Control
  4000206h       -    -         Not used
  4000208h  2    R/W  IME       Interrupt Master Enable Register
  400020Ah       -    -         Not used
  4000300h  1    R/W  POSTFLG   Undocumented - Post Boot Flag
  4000301h  1    W    HALTCNT   Undocumented - Power Down Control
  4000302h       -    -         Not used
  4000410h  ?    ?    ?         Undocumented - Purpose Unknown / Bug ??? 0FFh
  4000411h       -    -         Not used
  4000800h  4    R/W  ?         Undocumented - Internal Memory Control (R/W)
  4000804h       -    -         Not used
  4xx0800h  4    R/W  ?         Mirrors of 4000800h (repeated each 64K)

All further addresses at 4XXXXXXh are unused and do not contain mirrors of the I/O area, with the only exception that 4000800h is repeated each 64K (ie. mirrored at 4010800h, 4020800h, etc.)
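The mirroring rule can be expressed as a simple mask test. The following C sketch uses a hypothetical function name and assumes the register occupies the four bytes 4000800h..4000803h, as listed in the table.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr falls into the 4-byte register at 4000800h or into one of
   its mirrors at 4xx0800h (repeated each 64K within the 4XXXXXXh area). */
bool is_4000800h_or_mirror(uint32_t addr)
{
    /* Ignore address bits 16-23 (the 64K mirror index) and bits 0-1. */
    return (addr & 0xFF00FFFCu) == 0x04000800u;
}
```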

  GBA LCD Video Controller

LCD I/O Display Control
LCD I/O Interrupts and Status
LCD I/O BG Control
LCD I/O BG Scrolling
LCD I/O BG Rotation/Scaling
LCD I/O Window Feature
LCD I/O Mosaic Function
LCD I/O Color Special Effects

LCD VRAM Overview
LCD VRAM Character Data
LCD VRAM BG Screen Data Format (BG Map)
LCD VRAM Bitmap BG Modes

LCD OBJ - Overview
LCD OBJ - OAM Attributes
LCD OBJ - OAM Rotation/Scaling Parameters
LCD OBJ - VRAM Character (Tile) Mapping

LCD Color Palettes
LCD Dimensions and Timings

  LCD I/O Display Control

4000000h - DISPCNT - LCD Control (Read/Write)
  Bit   Expl.
  0-2   BG Mode                (0-5=Video Mode 0-5, 6-7=Prohibited)
  3     Reserved / CGB Mode    (0=GBA, 1=CGB; can be set only by BIOS opcodes)
  4     Display Frame Select   (0-1=Frame 0-1) (for BG Modes 4,5 only)
  5     H-Blank Interval Free  (1=Allow access to OAM during H-Blank)
  6     OBJ Character VRAM Mapping (0=Two dimensional, 1=One dimensional)
  7     Forced Blank           (1=Allow FAST access to VRAM,Palette,OAM)
  8     Screen Display BG0  (0=Off, 1=On)
  9     Screen Display BG1  (0=Off, 1=On)
  10    Screen Display BG2  (0=Off, 1=On)
  11    Screen Display BG3  (0=Off, 1=On)
  12    Screen Display OBJ  (0=Off, 1=On)
  13    Window 0 Display Flag   (0=Off, 1=On)
  14    Window 1 Display Flag   (0=Off, 1=On)
  15    OBJ Window Display Flag (0=Off, 1=On)

The table summarizes the facilities of the separate BG modes (video modes).
  Mode  Rot/Scal Layers Size               Tiles Colors       Features
  0     No       0123   256x256..512x512   1024  16/16..256/1 SFMABP
  1     Mixed    012-   (BG0,BG1 as above Mode 0, BG2 as below Mode 2)
  2     Yes      --23   128x128..1024x1024 256   256/1        S-MABP
  3     Yes      --2-   240x160            1     32768        --MABP
  4     Yes      --2-   240x160            2     256/1        --MABP
  5     Yes      --2-   160x128            2     32768        --MABP
Features: S)crolling, F)lip, M)osaic, A)lphaBlending, B)rightness, P)riority.

BG Modes 0-2 are Tile/Map-based. BG Modes 3-5 are Bitmap-based; in these modes 1 or 2 Frames (ie. bitmaps, or 'full screen tiles') exist. If two frames exist, either one can be displayed while the other one is redrawn in the background.
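As an illustration of the DISPCNT bit layout, the following C sketch composes a register value from a BG mode and a set of flags. The constant and function names are hypothetical; on real hardware the result would be written as a 16bit value to 4000000h.

```c
#include <stdint.h>

/* Hypothetical names for the DISPCNT bits listed above. */
enum {
    DISPCNT_FRAME1      = 1u << 4,
    DISPCNT_HBLANK_FREE = 1u << 5,
    DISPCNT_OBJ_1D      = 1u << 6,
    DISPCNT_FORCE_BLANK = 1u << 7,
    DISPCNT_BG0         = 1u << 8,
    DISPCNT_BG1         = 1u << 9,
    DISPCNT_BG2         = 1u << 10,
    DISPCNT_BG3         = 1u << 11,
    DISPCNT_OBJ         = 1u << 12,
    DISPCNT_WIN0        = 1u << 13,
    DISPCNT_WIN1        = 1u << 14,
    DISPCNT_OBJWIN      = 1u << 15
};

/* Combine a BG mode (0-5) with flag bits. Bit 3 (CGB mode) is kept zero,
   since it can be set only by BIOS opcodes. */
uint16_t dispcnt_make(unsigned mode, unsigned flags)
{
    return (uint16_t)((mode & 7u) | (flags & 0xFFF0u));
}
```

For example, BG Mode 3 with only BG2 enabled gives 0403h, and BG Mode 0 with BG0, OBJ, and one dimensional OBJ mapping gives 1140h.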

Blanking Bits
Setting Forced Blank (Bit 7) causes the video controller to display white lines, and all VRAM, Palette RAM, and OAM may be accessed.
"When the internal HV synchronous counter cancels a forced blank during a display period, the display begins from the beginning, following the display of two vertical lines." What ?
Setting H-Blank Interval Free (Bit 5) allows access to OAM during H-Blank time; using this feature reduces the number of sprites that can be displayed per line.

Display Enable Bits
By default, BG0-3 and OBJ Display Flags (Bit 8-12) are used to enable/disable BGs and OBJ. When enabling Window 0 and/or 1 (Bit 13-14), color special effects may be used, and BG0-3 and OBJ are controlled by the window(s).

Frame Selection
In BG Modes 4 and 5 (Bitmap modes), either one of the two bitmaps/frames may be displayed (Bit 4), allowing the user to update the other (invisible) frame in background. In BG Mode 3, only one frame exists.
In BG Modes 0-2 (Tile/Map based modes), a similar effect may be gained by altering the base address(es) of BG Map and/or BG Character data.
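A minimal double-buffering sketch for BG Modes 4 and 5, operating on a copy of the DISPCNT value. The names are hypothetical; a program would typically draw into the hidden frame and then write the flipped value to DISPCNT, for example during V-Blank.

```c
#include <stdint.h>

#define DISPCNT_FRAME_SELECT (1u << 4)   /* Bit 4: Display Frame Select */

/* Toggle the displayed frame (0 <-> 1), leaving all other bits unchanged. */
uint16_t dispcnt_flip_frame(uint16_t dispcnt)
{
    return (uint16_t)(dispcnt ^ DISPCNT_FRAME_SELECT);
}

/* Number of the frame that is currently NOT displayed (safe to redraw). */
unsigned dispcnt_hidden_frame(uint16_t dispcnt)
{
    return ((dispcnt >> 4) & 1u) ^ 1u;
}
```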

4000002h - Undocumented - Green Swap (R/W)
Normally, the red, green, and blue intensities for a group of two pixels are output as BGRbgr (uppercase for left pixel at even xloc, lowercase for right pixel at odd xloc). When the Green Swap bit is set, each pixel group is output as BgRbGr (ie. the green intensities of each two pixels are exchanged).
  Bit   Expl.
  0     Green Swap  (0=Normal, 1=Swap)
  1-15  Not used
This feature appears to be applied to the final picture (ie. after mixing the separate BG and OBJ layers). Possibly intended for other display types (with other pin-outs). With normal GBA hardware it just produces an interesting dirt effect.
The NDS DISPCNT registers are 32bit (4000000h..4000003h), so Green Swap doesn't exist in NDS mode, however, the NDS does support Green Swap in GBA mode.
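The effect can be simulated in software, for instance in an emulator's final output stage. The following C sketch uses a hypothetical pixel structure and function name, and exchanges the green intensities within each group of two pixels of one scanline.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-pixel intensities of the final (mixed) picture. */
typedef struct { uint8_t r, g, b; } Pixel;

/* Exchange the green intensities within each group of two pixels
   (left pixel at even x, right pixel at odd x). */
void green_swap_line(Pixel *line, size_t width)
{
    for (size_t x = 0; x + 1 < width; x += 2) {
        uint8_t g = line[x].g;
        line[x].g = line[x + 1].g;
        line[x + 1].g = g;
    }
}
```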

  LCD I/O Interrupts and Status

4000004h - DISPSTAT - General LCD Status (Read/Write)
Display status and Interrupt control. The H-Blank conditions are generated once per scanline, including for the 'hidden' scanlines during V-Blank.
  Bit   Expl.
  0     V-Blank flag   (Read only) (1=VBlank) (set in line 160..226; not 227)
  1     H-Blank flag   (Read only) (1=HBlank) (toggled in all lines, 0..227)
  2     V-Counter flag (Read only) (1=Match)  (set in selected line)     (R)
  3     V-Blank IRQ Enable         (1=Enable)                          (R/W)
  4     H-Blank IRQ Enable         (1=Enable)                          (R/W)
  5     V-Counter IRQ Enable       (1=Enable)                          (R/W)
  6     Not used (0) / DSi: LCD Initialization Ready (0=Busy, 1=Ready)   (R)
  7     Not used (0) / NDS: MSB of V-Count Setting (LYC.Bit8) (0..262) (R/W)
  8-15  V-Count Setting (LYC)      (0..227)                            (R/W)
The V-Count-Setting value is much the same as LYC of older gameboys: when its value is identical to the content of the VCOUNT register, the V-Counter flag is set (Bit 2), and (if enabled in Bit 5) an interrupt is requested.
Although the drawing time is only 960 cycles (240*4), the H-Blank flag is "0" for a total of 1006 cycles.
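The following C sketch shows one way to compose the writable DISPSTAT bits and to decode the read-only flags on the GBA. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compose the writable part of DISPSTAT: IRQ enables and V-Count Setting. */
uint16_t dispstat_make(unsigned lyc, bool vblank_irq, bool hblank_irq,
                       bool vcount_irq)
{
    return (uint16_t)((vblank_irq ? 1u << 3 : 0u)
                    | (hblank_irq ? 1u << 4 : 0u)
                    | (vcount_irq ? 1u << 5 : 0u)
                    | ((lyc & 0xFFu) << 8));
}

/* Decode the read-only status flags (Bits 0-2). */
bool dispstat_in_vblank(uint16_t v)    { return (v & (1u << 0)) != 0; }
bool dispstat_in_hblank(uint16_t v)    { return (v & (1u << 1)) != 0; }
bool dispstat_vcount_match(uint16_t v) { return (v & (1u << 2)) != 0; }
```

For example, requesting V-Blank and V-Counter interrupts with a V-Count Setting of 100 gives the value 6428h.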

4000006h - VCOUNT - Vertical Counter (Read only)
Indicates the currently drawn scanline, values in range from 160..227 indicate 'hidden' scanlines within VBlank area.
  Bit   Expl.
  0-7   Current Scanline (LY)      (0..227)                              (R)
  8     Not used (0) / NDS: MSB of Current Scanline (LY.Bit8) (0..262)   (R)
  9-15  Not Used (0)
Note: This is much the same as the 'LY' register of older gameboys.
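Note the small difference between the hidden scanlines (160..227) and the lines in which the V-Blank flag of DISPSTAT is set (160..226, not 227). A C sketch for the GBA, with hypothetical function names:

```c
#include <stdbool.h>
#include <stdint.h>

/* True for the 'hidden' scanlines within the VBlank area (160..227). */
bool vcount_is_hidden_line(uint16_t vcount)
{
    unsigned ly = vcount & 0xFFu;   /* Bits 0-7: Current Scanline (GBA) */
    return ly >= 160u && ly <= 227u;
}

/* True for lines in which the DISPSTAT V-Blank flag is set (160..226). */
bool vcount_has_vblank_flag(uint16_t vcount)
{
    unsigned ly = vcount & 0xFFu;
    return ly >= 160u && ly <= 226u;
}
```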

  LCD I/O BG Control

4000008h - BG0CNT - BG0 Control (R/W) (BG Modes 0,1 only)
400000Ah - BG1CNT - BG1 Control (R/W) (BG Modes 0,1 only)
400000Ch - BG2CNT - BG2 Control (R/W) (BG Modes 0,1,2 only)
400000Eh - BG3CNT - BG3 Control (R/W) (BG Modes 0,2 only)
  Bit   Expl.
  0-1   BG Priority           (0-3, 0=Highest)
  2-3   Character Base Block  (0-3, in units of 16 KBytes) (=BG Tile Data)
  4-5   Not used (must be zero)
  6     Mosaic                (0=Disable, 1=Enable)
  7     Colors/Palettes       (0=16/16, 1=256/1)
  8-12  Screen Base Block     (0-31, in units of 2 KBytes) (=BG Map Data)
  13    Display Area Overflow (0=Transparent, 1=Wraparound; BG2CNT/BG3CNT only)
  14-15 Screen Size (0-3)
Internal Screen Size (dots) and size of BG Map (bytes):
  Value  Text Mode      Rotation/Scaling Mode
  0      256x256 (2K)   128x128   (256 bytes)
  1      512x256 (4K)   256x256   (1K)
  2      256x512 (4K)   512x512   (4K)
  3      512x512 (8K)   1024x1024 (16K)
If some or all BGs are set to the same priority, then BG0 has the highest and BG3 the lowest priority among them.
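A C sketch that packs the BGxCNT fields and derives the VRAM addresses of the selected blocks, assuming the VRAM base address 06000000h from the memory map. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack the BGxCNT fields; bits 4-5 are left zero as required. */
uint16_t bgcnt_make(unsigned priority, unsigned char_block,
                    unsigned screen_block, unsigned size,
                    bool colors256, bool mosaic, bool wraparound)
{
    return (uint16_t)((priority & 3u)
                    | ((char_block & 3u) << 2)
                    | (mosaic ? 1u << 6 : 0u)
                    | (colors256 ? 1u << 7 : 0u)
                    | ((screen_block & 31u) << 8)
                    | (wraparound ? 1u << 13 : 0u)
                    | ((size & 3u) << 14));
}

/* Address of the BG Tile Data (Character Base Block, 16 KByte units). */
uint32_t bg_char_base(uint16_t bgcnt)
{
    return 0x06000000u + ((bgcnt >> 2) & 3u) * 0x4000u;
}

/* Address of the BG Map Data (Screen Base Block, 2 KByte units). */
uint32_t bg_screen_base(uint16_t bgcnt)
{
    return 0x06000000u + ((bgcnt >> 8) & 31u) * 0x800u;
}
```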

In 'Text Modes', the screen size is organized as follows: The screen consists of one or more 256x256 pixel (32x32 tiles) areas. When Size=0: only 1 area (SC0), when Size=1 or Size=2: two areas (SC0,SC1 either horizontally or vertically arranged next to each other), when Size=3: four areas (SC0,SC1 in upper row, SC2,SC3 in lower row). SC0 is defined by the normal BG Map base address (Bit 8-12 of BG#CNT), SC1 uses the same address +2K, SC2 address +4K, and SC3 address +6K. When the screen is scrolled, it always wraps around.
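The area arrangement can be sketched as a function that returns the byte offset of a map entry relative to the BG Map base address. This assumes that each 2K area holds 32x32 entries of 16 bits each, stored row by row; the function name is hypothetical.

```c
#include <stdint.h>

/* Byte offset (relative to the BG Map base) of the map entry for tile
   column tx and tile row ty, for Text Mode screen size 0-3. */
uint32_t text_map_offset(unsigned size, unsigned tx, unsigned ty)
{
    unsigned w = (size & 1u) ? 64u : 32u;   /* screen width in tiles   */
    unsigned h = (size & 2u) ? 64u : 32u;   /* screen height in tiles  */
    tx %= w;                                /* text BGs always wrap    */
    ty %= h;
    /* Area number: SC0/SC1 side by side or stacked, SC2/SC3 in lower row. */
    unsigned sc = (tx / 32u) + (ty / 32u) * (w / 32u);
    return sc * 0x800u + ((ty % 32u) * 32u + (tx % 32u)) * 2u;
}
```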

In 'Rotation/Scaling Modes', the screen size is organized as follows: only one area (SC0) of variable size 128x128..1024x1024 pixels (16x16..128x128 tiles) exists. When the screen is rotated/scaled (or scrolled?) so that the LCD viewport reaches outside of the background/screen area, then the BG may be either displayed as transparent or wraparound (Bit 13 of BG#CNT).
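The overflow handling can be sketched as follows. This assumes one byte per map entry stored row by row (consistent with the map sizes in the table above, eg. 16x16 tiles = 256 bytes); the function name and the -1 return value for transparent pixels are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte offset of the map entry covering BG pixel (x,y) in Rotation/Scaling
   mode, or -1 if the pixel is outside of the area and shown as transparent. */
int32_t rotscale_map_offset(unsigned size, int32_t x, int32_t y,
                            bool wraparound)
{
    int32_t dim = (int32_t)(128u << (size & 3u));  /* 128,256,512,1024 */
    if (wraparound) {
        x &= dim - 1;     /* dim is a power of two, so masking wraps */
        y &= dim - 1;
    } else if (x < 0 || y < 0 || x >= dim || y >= dim) {
        return -1;
    }
    return (y / 8) * (dim / 8) + (x / 8);
}
```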

  LCD I/O BG Scrolling

4000010h - BG0HOFS - BG0 X-Offset (W)
4000012h - BG0VOFS - BG0 Y-Offset (W)
  Bit   Expl.
  0-8   Offset (0-511)
  9-15  Not used
Specifies the coordinate of the upperleft first visible dot of BG0 background layer, ie. used to scroll the BG0 area.

4000014h - BG1HOFS - BG1 X-Offset (W)
4000016h - BG1VOFS - BG1 Y-Offset (W)
Same as above BG0HOFS and BG0VOFS for BG1 respectively.

4000018h - BG2HOFS - BG2 X-Offset (W)
400001Ah - BG2VOFS - BG2 Y-Offset (W)
Same as above BG0HOFS and BG0VOFS for BG2 respectively.

400001Ch - BG3HOFS - BG3 X-Offset (W)
400001Eh - BG3VOFS - BG3 Y-Offset (W)
Same as above BG0HOFS and BG0VOFS for BG3 respectively.

The above BG scrolling registers are exclusively used in Text modes, ie. for all layers in BG Mode 0, and for the first two layers in BG mode 1.
In other BG modes (Rotation/Scaling and Bitmap modes) above registers are ignored. Instead, the screen may be scrolled by modifying the BG Rotation/Scaling Reference Point registers.

  LCD I/O BG Rotation/Scaling

4000028h - BG2X_L - BG2 Reference Point X-Coordinate, lower 16 bit (W)
400002Ah - BG2X_H - BG2 Reference Point X-Coordinate, upper 12 bit (W)
400002Ch - BG2Y_L - BG2 Reference Point Y-Coordinate, lower 16 bit (W)
400002Eh - BG2Y_H - BG2 Reference Point Y-Coordinate, upper 12 bit (W)
These registers are replacing the BG scrolling registers which are used for Text mode, ie. the X/Y coordinates specify the source position from inside of the BG Map/Bitmap of the pixel to be displayed at upper left of the GBA display. The normal BG scrolling registers are ignored in Rotation/Scaling and Bitmap modes.
  Bit   Expl.
  0-7   Fractional portion (8 bits)
  8-26  Integer portion    (19 bits)
  27    Sign               (1 bit)
  28-31 Not used
Because values are shifted left by eight, fractional portions may be specified in steps of 1/256 pixels (this would be relevant only if the screen is actually rotated or scaled). Normal signed 32bit values may be written to above registers (the most significant bits will be ignored and the value will be cut-down to 28bits, but this is no actual problem because signed values have set all MSBs to the same value).
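The remark about writing plain signed 32bit values can be illustrated with a small sketch. Both function names are invented for this example; the second one only models how the 28bit field would be interpreted, based on the bit table above.

```c
#include <stdint.h>

/* Hypothetical helper: whole-pixel coordinate -> value to write
   (coordinate shifted left by eight, fraction = 0). */
static int32_t refpoint_from_pixels(int32_t pixels)
{
    return pixels * 256;
}

/* Models the register: bits 28-31 are dropped, bit 27 is the sign. */
static int32_t refpoint_as_seen_by_hw(uint32_t written)
{
    uint32_t v = written & 0x0FFFFFFFu;
    if (v & 0x08000000u)
        return (int32_t)v - 0x10000000;
    return (int32_t)v;
}
```

For any value that fits into 28 bits, the interpreted value equals the written one, which is why ordinary signed 32bit arithmetic can be used.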

Internal Reference Point Registers
The above reference points are automatically copied to internal registers during each vblank, specifying the origin for the first scanline. The internal registers are then incremented by dmx and dmy after each scanline.
Caution: Writing to a reference point register by software outside of the Vblank period does immediately copy the new value to the corresponding internal register, that means: in the current frame, the new value specifies the origin of the <current> scanline (instead of the topmost scanline).

4000020h - BG2PA - BG2 Rotation/Scaling Parameter A (alias dx) (W)
4000022h - BG2PB - BG2 Rotation/Scaling Parameter B (alias dmx) (W)
4000024h - BG2PC - BG2 Rotation/Scaling Parameter C (alias dy) (W)
4000026h - BG2PD - BG2 Rotation/Scaling Parameter D (alias dmy) (W)
  Bit   Expl.
  0-7   Fractional portion (8 bits)
  8-14  Integer portion    (7 bits)
  15    Sign               (1 bit)
See below for details.

400003Xh - BG3X_L/H, BG3Y_L/H, BG3PA-D - BG3 Rotation/Scaling Parameters
Same as above BG2 Reference Point, and Rotation/Scaling Parameters, for BG3 respectively.

dx (PA) and dy (PC)
When transforming a horizontal line, dx and dy specify the resulting gradient and magnification for that line. For example:
Horizontal line, length=100, dx=1, and dy=1. The resulting line would be drawn at 45 degrees, f(y)=1/1*x. Note that this implies that the line is magnified, the new length is sqrt(100^2+100^2)=141.42. Yup, exactly - that's the old a^2 + b^2 = c^2 formula.

dmx (PB) and dmy (PD)
These values define the resulting gradient and magnification for transformation of vertical lines. However, when rotating a square area (which is surrounded by horizontal and vertical lines), then the desired result should be usually a rotated <square> area (ie. not a parallelogram, for example).
Thus, dmx and dmy must be defined in direct relationship to dx and dy, taking the example above, we'd have to set dmx=-1, and dmy=1, f(x)=-1/1*y.
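One way to read the dx/dy and dmx/dmy descriptions is that the source coordinate advances by (dx,dy) per dot within a line, and by (dmx,dmy) per scanline. The sketch below models that reading; the struct and function names are assumptions, and the results are kept in 1/256 pixel units.

```c
#include <stdint.h>

/* Hypothetical model of one rotation/scaling BG (8.8 parameters,
   reference point in 1/256 pixel units). */
typedef struct {
    int16_t pa, pb, pc, pd;   /* dx, dmx, dy, dmy */
    int32_t ref_x, ref_y;     /* BGxX, BGxY       */
} BgAffine;

/* Source position (in 1/256 pixels) shown at screen dot sx of line. */
static void bg_affine_source(const BgAffine *bg, int sx, int line,
                             int32_t *src_x, int32_t *src_y)
{
    *src_x = bg->ref_x + bg->pb * line + bg->pa * sx;
    *src_y = bg->ref_y + bg->pd * line + bg->pc * sx;
}
```

With the example values above (dx=1, dy=1, dmx=-1, dmy=1, ie. 256, 256, -256, 256 in register units), screen dot 100 of line 0 maps to source position 100,100.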

Area Overflow
As a result of rotation/scaling it may often happen that areas outside of the actual BG area are moved into the LCD viewport. Depending on the Area Overflow bit (BG2CNT and BG3CNT, Bit 13) these areas may be either displayed (by wrapping the BG area), or may be displayed transparent.
This works only in BG modes 1 and 2. The area overflow is ignored in Bitmap modes (BG modes 3-5), the outside of the Bitmaps is always transparent.

--- more details and confusing or helpful formulas ---

The following parameters are required for Rotation/Scaling
  Rotation Center X and Y Coordinates (x0,y0)
  Rotation Angle                      (alpha)
  Magnification X and Y Values        (xMag,yMag)
The display is rotated by 'alpha' degrees around the center.
The displayed picture is magnified by 'xMag' along x-Axis (Y=y0) and 'yMag' along y-Axis (X=x0).

Calculating Rotation/Scaling Parameters A-D
  A = Cos (alpha) / xMag    ;distance moved in direction x, same line
  B = Sin (alpha) / xMag    ;distance moved in direction x, next line
  C = Sin (alpha) / yMag    ;distance moved in direction y, same line
  D = Cos (alpha) / yMag    ;distance moved in direction y, next line

Calculating the position of a rotated/scaled dot
Using the following expressions,
  x0,y0    Rotation Center
  x1,y1    Old Position of a pixel (before rotation/scaling)
  x2,y2    New position of above pixel (after rotation scaling)
  A,B,C,D  BG2PA-BG2PD Parameters (as calculated above)
the following formula can be used to calculate x2,y2:
  x2 = A(x1-x0) + B(y1-y0) + x0
  y2 = C(x1-x0) + D(y1-y0) + y0
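The formula above, together with the 8.8 register format, might be sketched as follows. Function names are invented for the example, and rounding to the nearest 1/256 is an arbitrary choice of this sketch.

```c
#include <stdint.h>

/* Hypothetical helper: real number -> signed 8.8 fixed point
   (valid for roughly -128.0 .. +127.99). */
static int16_t rotscale_to_fixed(double v)
{
    double scaled = v * 256.0;
    return (int16_t)(scaled < 0 ? scaled - 0.5 : scaled + 0.5);
}

/* x2 = A(x1-x0) + B(y1-y0) + x0,  y2 = C(x1-x0) + D(y1-y0) + y0 */
static void rotscale_point(double a, double b, double c, double d,
                           double x0, double y0, double x1, double y1,
                           double *x2, double *y2)
{
    *x2 = a * (x1 - x0) + b * (y1 - y0) + x0;
    *y2 = c * (x1 - x0) + d * (y1 - y0) + y0;
}
```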

  LCD I/O Window Feature

The Window Feature may be used to split the screen into four regions. The BG0-3,OBJ layers and Color Special Effects can be separately enabled or disabled in each of these regions.

The DISPCNT Register
DISPCNT Bits 13-15 are used to enable Window 0, Window 1, and/or OBJ Window regions. If any of these regions is enabled then the "Outside of Windows" region is automatically enabled, too.
DISPCNT Bits 8-12 still act as master enable bits for the BG0-3,OBJ layers, a layer is displayed only if both DISPCNT and WININ/OUT enable bits are set.

4000040h - WIN0H - Window 0 Horizontal Dimensions (W)
4000042h - WIN1H - Window 1 Horizontal Dimensions (W)
  Bit   Expl.
  0-7   X2, Rightmost coordinate of window, plus 1
  8-15  X1, Leftmost coordinate of window
Garbage values of X2>240 or X1>X2 are interpreted as X2=240.

4000044h - WIN0V - Window 0 Vertical Dimensions (W)
4000046h - WIN1V - Window 1 Vertical Dimensions (W)
  Bit   Expl.
  0-7   Y2, Bottom-most coordinate of window, plus 1
  8-15  Y1, Top-most coordinate of window
Garbage values of Y2>160 or Y1>Y2 are interpreted as Y2=160.
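A minimal sketch of the window dimension registers, following the garbage-value rule exactly as stated above. The helper names are assumptions; limit would be 240 for WINxH and 160 for WINxV.

```c
#include <stdint.h>

/* Hypothetical helper: first = X1/Y1, end = X2/Y2 (last coordinate plus 1). */
static uint16_t win_pack(unsigned first, unsigned end)
{
    return (uint16_t)(((first & 0xFFu) << 8) | (end & 0xFFu));
}

/* Returns 1 if coord lies inside the range described by reg. */
static int win_contains(uint16_t reg, unsigned coord, unsigned limit)
{
    unsigned end   = reg & 0xFFu;
    unsigned first = (reg >> 8) & 0xFFu;
    if (end > limit || first > end)
        end = limit;              /* garbage values, as described above */
    return coord >= first && coord < end;
}
```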

4000048h - WININ - Control of Inside of Window(s) (R/W)
  Bit   Expl.
  0-3   Window 0 BG0-BG3 Enable Bits     (0=No Display, 1=Display)
  4     Window 0 OBJ Enable Bit          (0=No Display, 1=Display)
  5     Window 0 Color Special Effect    (0=Disable, 1=Enable)
  6-7   Not used
  8-11  Window 1 BG0-BG3 Enable Bits     (0=No Display, 1=Display)
  12    Window 1 OBJ Enable Bit          (0=No Display, 1=Display)
  13    Window 1 Color Special Effect    (0=Disable, 1=Enable)
  14-15 Not used

400004Ah - WINOUT - Control of Outside of Windows & Inside of OBJ Window (R/W)
  Bit   Expl.
  0-3   Outside BG0-BG3 Enable Bits      (0=No Display, 1=Display)
  4     Outside OBJ Enable Bit           (0=No Display, 1=Display)
  5     Outside Color Special Effect     (0=Disable, 1=Enable)
  6-7   Not used
  8-11  OBJ Window BG0-BG3 Enable Bits   (0=No Display, 1=Display)
  12    OBJ Window OBJ Enable Bit        (0=No Display, 1=Display)
  13    OBJ Window Color Special Effect  (0=Disable, 1=Enable)
  14-15 Not used

The OBJ Window
The dimension of the OBJ Window is specified by OBJs that have the "OBJ Mode" attribute set to "OBJ Window". Any non-transparent dots of any such OBJs are marked as OBJ Window area. The OBJ itself is not displayed.
The color, palette, and display priority of these OBJs are ignored. Both DISPCNT Bits 12 and 15 must be set when defining OBJ Window region(s).

Window Priority
If more than one window is enabled, and these windows overlap, then Window 0 has the highest priority, Window 1 medium, and OBJ Window the lowest priority. Outside of Window has zero priority, it is used for all dots which are not inside of any window region.
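The priority rule can be expressed as a simple selection of the 6bit enable field that applies to a dot. This is only a sketch: the function name and the in_win0/in_win1/in_objwin inputs are assumptions, and treating "no window enabled" as "everything permitted" is an assumption as well.

```c
/* Hypothetical helper: returns the 6 control bits (BG0-3, OBJ, Color
   Special Effect) that apply to one dot. */
static unsigned window_control(unsigned dispcnt, unsigned winin,
                               unsigned winout, int in_win0,
                               int in_win1, int in_objwin)
{
    if ((dispcnt & 0xE000u) == 0)
        return 0x3Fu;                       /* assumed: no window restrictions */
    if ((dispcnt & 0x2000u) && in_win0)
        return winin & 0x3Fu;               /* Window 0, highest priority */
    if ((dispcnt & 0x4000u) && in_win1)
        return (winin >> 8) & 0x3Fu;        /* Window 1 */
    if ((dispcnt & 0x8000u) && in_objwin)
        return (winout >> 8) & 0x3Fu;       /* OBJ Window, lowest priority */
    return winout & 0x3Fu;                  /* Outside of Windows */
}
```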

  LCD I/O Mosaic Function

400004Ch - MOSAIC - Mosaic Size (W)
The Mosaic function can be separately enabled/disabled for BG0-BG3 by BG0CNT-BG3CNT Registers, as well as for each OBJ0-127 by OBJ attributes in OAM memory. Also, setting all of the bits below to zero effectively disables the mosaic function.
  Bit   Expl.
  0-3   BG Mosaic H-Size  (minus 1)
  4-7   BG Mosaic V-Size  (minus 1)
  8-11  OBJ Mosaic H-Size (minus 1)
  12-15 OBJ Mosaic V-Size (minus 1)
  16-31 Not used
Example: When setting H-Size to 5, then pixels 0-5 of each display row are colorized as pixel 0, pixels 6-11 as pixel 6, pixels 12-17 as pixel 12, and so on.
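The example can be written as a one-line calculation: each coordinate is snapped down to the start of its mosaic block. The function name is made up for illustration.

```c
/* Hypothetical helper: size_field is the 4bit H-Size or V-Size value
   (block size minus 1). Returns the coordinate whose color is used. */
static unsigned mosaic_source(unsigned coord, unsigned size_field)
{
    unsigned block = (size_field & 15u) + 1u;
    return coord - coord % block;
}
```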

Normally, a 'mosaic-pixel' is colorized by the color of the upperleft covered pixel. In many cases it might be more desirable to use the color of the pixel in the center of the covered area - this effect may be gained by scrolling the background (or by adjusting the OBJ position, as far as upper/left rows/columns of OBJ are transparent).

  LCD I/O Color Special Effects

Two types of Special Effects are supported: Alpha Blending (Semi-Transparency) allows combining the colors of two selected surfaces. Brightness Increase/Decrease adjusts the brightness of the selected surface.

4000050h - BLDCNT - Color Special Effects Selection (R/W)
  Bit   Expl.
  0     BG0 1st Target Pixel (Background 0)
  1     BG1 1st Target Pixel (Background 1)
  2     BG2 1st Target Pixel (Background 2)
  3     BG3 1st Target Pixel (Background 3)
  4     OBJ 1st Target Pixel (Top-most OBJ pixel)
  5     BD  1st Target Pixel (Backdrop)
  6-7   Color Special Effect (0-3, see below)
         0 = None                (Special effects disabled)
         1 = Alpha Blending      (1st+2nd Target mixed)
         2 = Brightness Increase (1st Target becomes whiter)
         3 = Brightness Decrease (1st Target becomes blacker)
  8     BG0 2nd Target Pixel (Background 0)
  9     BG1 2nd Target Pixel (Background 1)
  10    BG2 2nd Target Pixel (Background 2)
  11    BG3 2nd Target Pixel (Background 3)
  12    OBJ 2nd Target Pixel (Top-most OBJ pixel)
  13    BD  2nd Target Pixel (Backdrop)
  14-15 Not used
Selects the 1st Target layer(s) for special effects. For Alpha Blending/Semi-Transparency, it also selects the 2nd Target layer(s), which should have the next lower display priority than the 1st Target.
However, any combinations are possible, including that all layers may be selected as both 1st+2nd target, in that case the top-most pixel will be used as 1st target, and the next lower pixel as 2nd target.

4000052h - BLDALPHA - Alpha Blending Coefficients (W)
Used for Color Special Effects Mode 1, and for Semi-Transparent OBJs.
  Bit   Expl.
  0-4   EVA Coefficient (1st Target) (0..16 = 0/16..16/16, 17..31=16/16)
  5-7   Not used
  8-12  EVB Coefficient (2nd Target) (0..16 = 0/16..16/16, 17..31=16/16)
  13-15 Not used
For this effect, the top-most non-transparent pixel must be selected as 1st Target, and the next-lower non-transparent pixel must be selected as 2nd Target, if so - and only if so, then color intensities of 1st and 2nd Target are mixed together by using the parameters in BLDALPHA register, for each pixel each R, G, B intensities are calculated separately:
  I = MIN ( 31, I1st*EVA + I2nd*EVB )
Otherwise - for example, if only one target exists, or if a non-transparent non-2nd-target pixel is moved between the two targets, or if 2nd target has higher display priority than 1st target - then only the top-most pixel is displayed (at normal intensity, regardless of BLDALPHA).
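A sketch of the blending formula, with EVA/EVB taken in units of 1/16 as in the register description. The function names are invented, and truncating the fraction (rather than rounding) is an assumption of this example.

```c
#include <stdint.h>

/* Coefficients 17..31 are treated as 16/16. */
static unsigned blend_coeff(unsigned ev)
{
    ev &= 31u;
    return ev > 16u ? 16u : ev;
}

/* I = MIN(31, I1st*EVA + I2nd*EVB), for one 5bit intensity. */
static unsigned blend_channel(unsigned i1, unsigned i2,
                              unsigned eva, unsigned evb)
{
    unsigned i = (i1 * blend_coeff(eva) + i2 * blend_coeff(evb)) >> 4;
    return i > 31u ? 31u : i;
}

/* Applies the formula separately to R, G, B of two 15bit colors. */
static uint16_t blend_color(uint16_t c1, uint16_t c2,
                            unsigned eva, unsigned evb)
{
    unsigned r = blend_channel(c1 & 31u, c2 & 31u, eva, evb);
    unsigned g = blend_channel((c1 >> 5) & 31u, (c2 >> 5) & 31u, eva, evb);
    unsigned b = blend_channel((c1 >> 10) & 31u, (c2 >> 10) & 31u, eva, evb);
    return (uint16_t)(r | (g << 5) | (b << 10));
}
```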

4000054h - BLDY - Brightness (Fade-In/Out) Coefficient (W)
Used for Color Special Effects Modes 2 and 3.
  Bit   Expl.
  0-4   EVY Coefficient (Brightness) (0..16 = 0/16..16/16, 17..31=16/16)
  5-31  Not used
For each pixel each R, G, B intensities are calculated separately:
  I = I1st + (31-I1st)*EVY   ;For Brightness Increase
  I = I1st - (I1st)*EVY      ;For Brightness Decrease
The color intensities of any selected 1st target surface(s) are increased or decreased by using the parameter in BLDY register.
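The two brightness formulas might look like this in C, with EVY in units of 1/16. The names are hypothetical, and the fraction is simply truncated in this sketch.

```c
/* Coefficients 17..31 are treated as 16/16. */
static unsigned bldy_coeff(unsigned evy)
{
    evy &= 31u;
    return evy > 16u ? 16u : evy;
}

/* I = I1st + (31-I1st)*EVY, for one 5bit intensity. */
static unsigned brightness_increase(unsigned i, unsigned evy)
{
    return i + (((31u - i) * bldy_coeff(evy)) >> 4);
}

/* I = I1st - (I1st)*EVY, for one 5bit intensity. */
static unsigned brightness_decrease(unsigned i, unsigned evy)
{
    return i - ((i * bldy_coeff(evy)) >> 4);
}
```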

Semi-Transparent OBJs
OBJs that are defined as 'Semi-Transparent' in OAM memory are always selected as 1st Target (regardless of BLDCNT Bit 4), and are always using Alpha Blending mode (regardless of BLDCNT Bit 6-7).
The BLDCNT register may be used to perform Brightness effects on the OBJ (and/or other BG/BD layers). However, if a semi-transparent OBJ pixel overlaps a 2nd target pixel, then semi-transparency takes priority, and the brightness effect will not take place (neither on 1st, nor 2nd target).

The OBJ Layer
Before special effects are applied, the display controller computes the OBJ priority ordering, and isolates the top-most OBJ pixel. As a result, only the top-most OBJ pixel is taken into account when processing special effects. Ie. alpha blending and semi-transparency can be used for OBJ-to-BG or BG-to-OBJ, but not for OBJ-to-OBJ.

  LCD VRAM Overview

The GBA contains 96 KBytes of built-in VRAM, located at address 06000000-06017FFF. Its layout depends on the BG Mode used, as follows:

BG Mode 0,1,2 (Tile/Map based Modes)
  06000000-0600FFFF  64 KBytes shared for BG Map and Tiles
  06010000-06017FFF  32 KBytes OBJ Tiles
The shared 64K area can be split into BG Map area(s), and BG Tiles area(s), the respective addresses for Map and Tile areas are set up by BG0CNT-BG3CNT registers. The Map address may be specified in units of 2K (steps of 800h), the Tile address in units of 16K (steps of 4000h).
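As a small sketch, the Map and Tile addresses can be derived from a BGxCNT value like this (function names are made up for the example):

```c
#include <stdint.h>

/* Character Base Block (Bit 2-3), in steps of 4000h. */
static uint32_t bg_tile_base(uint16_t bgcnt)
{
    return 0x06000000u + ((bgcnt >> 2) & 3u) * 0x4000u;
}

/* Screen Base Block (Bit 8-12), in steps of 800h. */
static uint32_t bg_map_base(uint16_t bgcnt)
{
    return 0x06000000u + ((bgcnt >> 8) & 31u) * 0x800u;
}
```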

BG Mode 0,1 (Tile/Map based Text mode)
The tiles may have 4bit or 8bit color depth, minimum map size is 32x32 tiles, maximum is 64x64 tiles, up to 1024 tiles can be used per map.
  Item        Depth     Required Memory
  One Tile    4bit      20h bytes
  One Tile    8bit      40h bytes
  1024 Tiles  4bit      8000h (32K)
  1024 Tiles  8bit      10000h (64K) - excluding some bytes for BG map
  BG Map      32x32     800h (2K)
  BG Map      64x64     2000h (8K)

BG Mode 1,2 (Tile/Map based Rotation/Scaling mode)
The tiles may have 8bit color depth only, minimum map size is 16x16 tiles, maximum is 128x128 tiles, up to 256 tiles can be used per map.
  Item        Depth     Required Memory
  One Tile    8bit      40h bytes
  256  Tiles  8bit      4000h (16K)
  BG Map      16x16     100h bytes
  BG Map      128x128   4000h (16K)

BG Mode 3 (Bitmap based Mode for still images)
  06000000-06013FFF  80 KBytes Frame 0 buffer (only 75K actually used)
  06014000-06017FFF  16 KBytes OBJ Tiles

BG Mode 4,5 (Bitmap based Modes)
  06000000-06009FFF  40 KBytes Frame 0 buffer (only 37.5K used in Mode 4)
  0600A000-06013FFF  40 KBytes Frame 1 buffer (only 37.5K used in Mode 4)
  06014000-06017FFF  16 KBytes OBJ Tiles

In addition to the above VRAM, the GBA also contains 1 KByte Palette RAM (at 05000000h) and 1 KByte OAM (at 07000000h) which are both used by the display controller as well.

  LCD VRAM Character Data

Each character (tile) consists of 8x8 dots (64 dots in total). The color depth may be either 4bit or 8bit (see BG0CNT-BG3CNT).

4bit depth (16 colors, 16 palettes)
Each tile occupies 32 bytes of memory, the first 4 bytes for the topmost row of the tile, and so on. Each byte representing two dots, the lower 4 bits define the color for the left (!) dot, the upper 4 bits the color for the right dot.

8bit depth (256 colors, 1 palette)
Each tile occupies 64 bytes of memory, the first 8 bytes for the topmost row of the tile, and so on. Each byte selects the palette entry for each dot.
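Reading one dot out of a tile, as described for both depths, could be sketched as follows. The function names are assumptions; tile points to the first byte of the tile data.

```c
#include <stdint.h>

/* 4bit depth: 4 bytes per row, low nibble = left dot, high nibble = right dot. */
static unsigned tile4_pixel(const uint8_t *tile, unsigned x, unsigned y)
{
    uint8_t b = tile[y * 4u + (x >> 1)];
    return (x & 1u) ? (unsigned)(b >> 4) : (unsigned)(b & 15u);
}

/* 8bit depth: 8 bytes per row, one palette index per byte. */
static unsigned tile8_pixel(const uint8_t *tile, unsigned x, unsigned y)
{
    return tile[y * 8u + x];
}
```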

  LCD VRAM BG Screen Data Format (BG Map)

The display background consists of 8x8 dot tiles, the arrangement of these tiles is specified by the BG Screen Data (BG Map). The separate entries in this map are as follows:

Text BG Screen (2 bytes per entry)
Specifies the tile number and attributes. Note that BG tile numbers are always specified in steps of 1 (unlike OBJ tile numbers which are using steps of two in 256 color/1 palette mode).
  Bit   Expl.
  0-9   Tile Number     (0-1023) (a bit less in 256 color mode, because
                           there'd be otherwise no room for the bg map)
  10    Horizontal Flip (0=Normal, 1=Mirrored)
  11    Vertical Flip   (0=Normal, 1=Mirrored)
  12-15 Palette Number  (0-15)    (Not used in 256 color/1 palette mode)
A Text BG Map always consists of 32x32 entries (256x256 pixels), 400h entries = 800h bytes. However, depending on the BG Size, one, two, or four of these Maps may be used together, allowing backgrounds of 256x256, 512x256, 256x512, or 512x512 pixels to be created. If so, the first map (SC0) is located at base+0, the next map (SC1) at base+800h, and so on.
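Decoding a Text BG map entry according to the bit table above might look like this (the struct and function names are invented for the example):

```c
#include <stdint.h>

typedef struct {
    unsigned tile;     /* Bit 0-9   */
    unsigned hflip;    /* Bit 10    */
    unsigned vflip;    /* Bit 11    */
    unsigned palette;  /* Bit 12-15, unused in 256 color/1 palette mode */
} TextEntry;

static TextEntry text_entry_decode(uint16_t entry)
{
    TextEntry e;
    e.tile    = entry & 0x3FFu;
    e.hflip   = (entry >> 10) & 1u;
    e.vflip   = (entry >> 11) & 1u;
    e.palette = (entry >> 12) & 15u;
    return e;
}
```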

Rotation/Scaling BG Screen (1 byte per entry)
In this mode, only 256 tiles can be used. There are no x/y-flip attributes, the color depth is always 256 colors/1 palette.
  Bit   Expl.
  0-7   Tile Number     (0-255)
The dimensions of Rotation/Scaling BG Maps depend on the BG size. For sizes 0-3 these are: 16x16 tiles (128x128 pixels), 32x32 tiles (256x256 pixels), 64x64 tiles (512x512 pixels), or 128x128 tiles (1024x1024 pixels).

The size and VRAM base address of the separate BG maps for BG0-3 are set up by BG0CNT-BG3CNT registers.

  LCD VRAM Bitmap BG Modes

In BG Modes 3-5 the background is defined in form of a bitmap (unlike the Tile/Map based BG modes). Bitmaps are implemented as BG2, with Rotation/Scaling support. As bitmap modes occupy 80 KBytes of BG memory, only 16 KBytes of VRAM can be used for OBJ tiles.

BG Mode 3 - 240x160 pixels, 32768 colors
Two bytes are associated to each pixel, directly defining one of the 32768 colors (without using palette data, and thus not supporting a 'transparent' BG color).
  Bit   Expl.
  0-4   Red Intensity   (0-31)
  5-9   Green Intensity (0-31)
  10-14 Blue Intensity  (0-31)
  15    Not used in GBA Mode (in NDS Mode: Alpha=0=Transparent, Alpha=1=Normal)
The first 480 bytes define the topmost line, the next 480 the next line, and so on. The background occupies 75 KBytes (06000000-06012BFF), most of the 80 Kbytes BG area, not allowing to redraw an invisible second frame in background, so this mode is mostly recommended for still images only.
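For illustration, the color format and the Mode 3 pixel layout translate to the following helpers (names are hypothetical):

```c
#include <stdint.h>

/* Packs 5bit red, green, blue intensities into a 15bit color. */
static uint16_t rgb15(unsigned r, unsigned g, unsigned b)
{
    return (uint16_t)((r & 31u) | ((g & 31u) << 5) | ((b & 31u) << 10));
}

/* Address of pixel x (0-239), y (0-159): 480 bytes per line, 2 per pixel. */
static uint32_t mode3_pixel_addr(unsigned x, unsigned y)
{
    return 0x06000000u + (y * 240u + x) * 2u;
}
```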

BG Mode 4 - 240x160 pixels, 256 colors (out of 32768 colors)
One byte is associated to each pixel, selecting one of the 256 palette entries. Color 0 (backdrop) is transparent, and OBJs may be displayed behind the bitmap.
The first 240 bytes define the topmost line, the next 240 the next line, and so on. The background occupies 37.5 KBytes, allowing two frames to be used (06000000-060095FF for Frame 0, and 0600A000-060135FF for Frame 1).

BG Mode 5 - 160x128 pixels, 32768 colors
Colors are defined as for Mode 3 (see above), but horizontal and vertical size are cut down to 160x128 pixels only - smaller than the physical dimensions of the LCD screen.
The background occupies exactly 40 KBytes, so that BG VRAM may be split into two frames (06000000-06009FFF for Frame 0, and 0600A000-06013FFF for Frame 1).

In BG modes 4,5, one Frame may be displayed (selected by DISPCNT Bit 4), the other Frame is invisible and may be redrawn in background.
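A sketch of the two-frame addressing for Modes 4 and 5. All names are assumptions; hidden_frame returns the frame that is currently not displayed and may thus be redrawn.

```c
#include <stdint.h>

static uint32_t bitmap_frame_base(unsigned frame)
{
    return frame ? 0x0600A000u : 0x06000000u;
}

/* DISPCNT Bit 4 selects the displayed frame, the other one is invisible. */
static unsigned hidden_frame(uint16_t dispcnt)
{
    return ((dispcnt >> 4) & 1u) ^ 1u;
}

/* Mode 4: 240x160, one byte per pixel. */
static uint32_t mode4_pixel_addr(unsigned frame, unsigned x, unsigned y)
{
    return bitmap_frame_base(frame) + y * 240u + x;
}

/* Mode 5: 160x128, two bytes per pixel. */
static uint32_t mode5_pixel_addr(unsigned frame, unsigned x, unsigned y)
{
    return bitmap_frame_base(frame) + (y * 160u + x) * 2u;
}
```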

  LCD OBJ - Overview

Objects (OBJs) are moveable sprites. Up to 128 OBJs (of any size, up to 64x64 dots each) can be displayed per screen, and under best circumstances up to 128 OBJs (of small 8x8 dots size) can be displayed per horizontal display line.

Maximum Number of Sprites per Line
The total available OBJ rendering cycles per line are
  1210  (=304*4-6)   If "H-Blank Interval Free" bit in DISPCNT register is 0
  954   (=240*4-6)   If "H-Blank Interval Free" bit in DISPCNT register is 1
The required rendering cycles are (depending on horizontal OBJ size)
  Cycles per <n> Pixels    OBJ Type              OBJ Type Screen Pixel Range
  n*1 cycles               Normal OBJs           8..64 pixels
  10+n*2 cycles            Rotation/Scaling OBJs 8..64 pixels   (area clipped)
  10+n*2 cycles            Rotation/Scaling OBJs 16..128 pixels (double size)
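The cycle tables above can be turned into a rough budget check. The function names are invented, and the sketch ignores any effects not listed in the tables.

```c
/* Total OBJ rendering cycles per line, by "H-Blank Interval Free" bit. */
static unsigned obj_cycle_budget(int hblank_interval_free)
{
    return hblank_interval_free ? 240u * 4u - 6u : 304u * 4u - 6u;
}

/* Cycles for one OBJ of the given horizontal size (8..64 pixels). */
static unsigned obj_cycles(unsigned width, int rotscale, int double_size)
{
    unsigned n;
    if (!rotscale)
        return width;                        /* n*1 cycles    */
    n = double_size ? width * 2u : width;    /* screen pixels */
    return 10u + n * 2u;                     /* 10+n*2 cycles */
}
```

For example, 954 cycles allow at most 119 normal 8-pixel OBJs per line, while a single double-size 64-pixel Rotation/Scaling OBJ already takes 266 cycles.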
The maximum number of OBJs per line is also affected by undisplayed (offscreen) OBJs which are having higher priority than displayed OBJs.
To avoid this, move displayed OBJs to the beginning of OAM memory (ie. OBJ0 has highest priority, OBJ127 lowest).
Otherwise (in case that the program logic expects OBJs at fixed positions in OAM) at least take care to set the OBJ size of undisplayed OBJs to 8x8 with Rotation/Scaling disabled (this reduces the overload).
Does the above also apply for VERTICALLY OFFSCREEN (or VERTICALLY not on CURRENT LINE) sprites ?

VRAM - Character Data
OBJs are always combined of one or more 8x8 pixel Tiles (much like BG Tiles in BG Modes 0-2). However, OBJ Tiles are stored in a separate area in VRAM: 06010000-06017FFF (32 KBytes) in BG Mode 0-2, or 06014000-06017FFF (16 KBytes) in BG Mode 3-5.
Depending on the size of the above area (16K or 32K), and on the OBJ color depth (4bit or 8bit), 256-1024 8x8 dots OBJ Tiles can be defined.

OAM - Object Attribute Memory
This memory area contains Attributes which specify position, size, color depth, etc. appearance for each of the 128 OBJs. Additionally, it contains 32 OBJ Rotation/Scaling Parameter groups. OAM is located at 07000000-070003FF (sized 1 KByte).

  LCD OBJ - OAM Attributes

OBJ Attributes
There are 128 entries in OAM for each OBJ0-OBJ127. Each entry consists of 6 bytes (three 16bit Attributes). Attributes for OBJ0 are located at 07000000, for OBJ1 at 07000008, OBJ2 at 07000010, and so on.

As you can see, there are blank spaces at 07000006, 0700000E, 07000016, etc. - these 16bit values are used for OBJ Rotation/Scaling (as described in the next chapter) - they are not directly related to the separate OBJs.

OBJ Attribute 0 (R/W)
  Bit   Expl.
  0-7   Y-Coordinate           (0-255)
  8     Rotation/Scaling Flag  (0=Off, 1=On)
  When Rotation/Scaling used (Attribute 0, bit 8 set):
    9     Double-Size Flag     (0=Normal, 1=Double)
  When Rotation/Scaling not used (Attribute 0, bit 8 cleared):
    9     OBJ Disable          (0=Normal, 1=Not displayed)
  10-11 OBJ Mode  (0=Normal, 1=Semi-Transparent, 2=OBJ Window, 3=Prohibited)
  12    OBJ Mosaic             (0=Off, 1=On)
  13    Colors/Palettes        (0=16/16, 1=256/1)
  14-15 OBJ Shape              (0=Square,1=Horizontal,2=Vertical,3=Prohibited)
Caution: A very large OBJ (of 128 pixels vertically, ie. a 64 pixels OBJ in a Double Size area) located at Y>128 will be treated as at Y>-128, the OBJ is then displayed partly offscreen at the TOP of the display, it is then NOT displayed at the bottom.

OBJ Attribute 1 (R/W)
  Bit   Expl.
  0-8   X-Coordinate           (0-511)
  When Rotation/Scaling used (Attribute 0, bit 8 set):
    9-13  Rotation/Scaling Parameter Selection (0-31)
          (Selects one of the 32 Rotation/Scaling Parameters that
          can be defined in OAM, for details read next chapter.)
  When Rotation/Scaling not used (Attribute 0, bit 8 cleared):
    9-11  Not used
    12    Horizontal Flip      (0=Normal, 1=Mirrored)
    13    Vertical Flip        (0=Normal, 1=Mirrored)
  14-15 OBJ Size               (0..3, depends on OBJ Shape, see Attr 0)
          Size  Square   Horizontal  Vertical
          0     8x8      16x8        8x16
          1     16x16    32x8        8x32
          2     32x32    32x16       16x32
          3     64x64    64x32       32x64

OBJ Attribute 2 (R/W)
  Bit   Expl.
  0-9   Character Name          (0-1023=Tile Number)
  10-11 Priority relative to BG (0-3; 0=Highest)
  12-15 Palette Number   (0-15) (Not used in 256 color/1 palette mode)
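The Shape/Size table can be expressed as a lookup, sketched here with hypothetical names. Shape 3 (Prohibited) is reported as 0x0 in this example.

```c
#include <stdint.h>

/* Returns OBJ width/height in pixels from Attribute 0 and Attribute 1. */
static void obj_dimensions(uint16_t attr0, uint16_t attr1,
                           unsigned *width, unsigned *height)
{
    static const uint8_t dims[3][4][2] = {
        { { 8,  8}, {16, 16}, {32, 32}, {64, 64} },  /* Square     */
        { {16,  8}, {32,  8}, {32, 16}, {64, 32} },  /* Horizontal */
        { { 8, 16}, { 8, 32}, {16, 32}, {32, 64} },  /* Vertical   */
    };
    unsigned shape = (attr0 >> 14) & 3u;
    unsigned size  = (attr1 >> 14) & 3u;

    if (shape == 3u) {            /* Prohibited */
        *width = 0;
        *height = 0;
        return;
    }
    *width  = dims[shape][size][0];
    *height = dims[shape][size][1];
}
```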


OBJ Mode
The OBJ Mode may be Normal, Semi-Transparent, or OBJ Window.
Semi-Transparent means that the OBJ is used as 'Alpha Blending 1st Target' (regardless of BLDCNT register, for details see chapter about Color Special Effects).
OBJ Window means that the OBJ is not displayed, instead, dots with non-zero color are used as mask for the OBJ Window, see DISPCNT and WINOUT for details.

OBJ Tile Number
There are two situations which may divide the amount of available tiles by two (by four if both situations apply):

1. When using the 256 Colors/1 Palette mode, only each second tile may be used, the lower bit of the tile number should be zero (in 2-dimensional mapping mode, the bit is completely ignored).

2. When using BG Mode 3-5 (Bitmap Modes), only tile numbers 512-1023 may be used. That is because lower 16K of OBJ memory are used for BG. Attempts to use tiles 0-511 are ignored (not displayed).

If the 'Priority relative to BG' is the same as the priority of one of the background layers, then the OBJ has higher priority and is displayed on top of that BG layer.
Caution: Take care not to mess up BG Priority and OBJ priority. For example, the following would cause garbage to be displayed:
  OBJ No. 0 with Priority relative to BG=1   ;hi OBJ prio, lo BG prio
  OBJ No. 1 with Priority relative to BG=0   ;lo OBJ prio, hi BG prio
That is, OBJ0 always has priority above OBJ1-127, so assigning a lower BG Priority to OBJ0 than to OBJ1-127 would be a bad idea.

  LCD OBJ - OAM Rotation/Scaling Parameters

As described in the previous chapter, there are blank spaces between each of the 128 OBJ Attribute Fields in OAM memory. These 128 16bit gaps are used to store OBJ Rotation/Scaling Parameters.

Location of Rotation/Scaling Parameters in OAM
Four 16bit parameters (PA,PB,PC,PD) are required to define a complete group of Rotation/Scaling data. These are spread across OAM as such:
  1st Group - PA=07000006, PB=0700000E, PC=07000016, PD=0700001E
  2nd Group - PA=07000026, PB=0700002E, PC=07000036, PD=0700003E
By using all blank space (128 x 16bit), up to 32 of these groups (4 x 16bit each) can be defined in OAM.
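The interleaving of OBJ Attributes and Rotation/Scaling Parameters follows a regular pattern, which might be sketched like this (function names are invented for the example):

```c
#include <stdint.h>

/* Address of Attribute attr (0-2) of OBJ obj (0-127). */
static uint32_t oam_attr_addr(unsigned obj, unsigned attr)
{
    return 0x07000000u + obj * 8u + attr * 2u;
}

/* Address of parameter param (0=PA, 1=PB, 2=PC, 3=PD) of group (0-31). */
static uint32_t oam_param_addr(unsigned group, unsigned param)
{
    return 0x07000000u + group * 0x20u + param * 8u + 6u;
}
```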

OBJ Rotation/Scaling PA,PB,PC,PD Parameters (R/W)
Each OBJ that uses Rotation/Scaling may select between any of the above 32 parameter groups. For details, refer to the previous chapter about OBJ Attributes.
The meaning of the separate PA,PB,PC,PD values is identical as for BG, for details read the chapter about BG Rotation/Scaling.

OBJ Reference Point & Rotation Center
The OBJ Reference Point is the upper left of the OBJ, ie. OBJ X/Y coordinates: X+0, Y+0.
The OBJ Rotation Center is always (or should be usually?) in the middle of the object, ie. for an 8x32 pixel OBJ, this would be at the OBJ X/Y coordinates: X+4, and Y+16.

OBJ Double-Size Bit (for OBJs that use Rotation/Scaling)
When Double-Size is zero: The sprite is rotated, and then displayed inside of the normal-sized (not rotated) rectangular area - the edges of the rotated sprite will become invisible if they reach outside of that area.
When Double-Size is set: The sprite is rotated, and then displayed inside of the double-sized (not rotated) rectangular area - this ensures that the edges of the rotated sprite remain visible even if they would reach outside of the normal-sized area. (Except that, for example, rotating an 8x32 pixel sprite by 90 degrees would still cut off parts of the sprite as the double-size area isn't large enough.)

  LCD OBJ - VRAM Character (Tile) Mapping

Each OBJ tile consists of 8x8 dots, however, bigger OBJs can be displayed by combining several 8x8 tiles. The horizontal and vertical size of each OBJ is defined in OAM (by OBJ Shape and OBJ Size), possible H/V sizes are 8,16,32,64 dots - allowing 'square' OBJs to be used (such as 8x8, 16x16, etc) as well as 'rectangular' OBJs (such as 8x32, 64x32, etc.)

When displaying an OBJ that consists of more than one 8x8 tile, one of the following two mapping modes can be used. In either case, the tile number of the upperleft tile must be specified in OAM memory.

Two Dimensional Character Mapping (DISPCNT Bit 6 cleared)
This mapping mode assumes that the 1024 OBJ tiles are arranged as a matrix of 32x32 tiles / 256x256 pixels (In 256 color mode: 16x32 tiles / 128x256 pixels). Ie. the upper row of this matrix contains tiles 00h-1Fh, the next row tiles 20h-3Fh, and so on.
For example, when displaying a 16x16 pixel OBJ, with tile number set to 04h; The upper row of the OBJ will consist of tile 04h and 05h, the next row of 24h and 25h. (In 256 color mode: 04h and 06h, 24h and 26h.)

One Dimensional Character Mapping (DISPCNT Bit 6 set)
In this mode, tiles are mapped each after each other from 00h-3FFh.
Using the same example as above, the upper row of the OBJ will consist of tile 04h and 05h, the next row of tile 06h and 07h. (In 256 color mode: 04h and 06h, 08h and 0Ah.)
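Both mapping modes can be summarized in one small function. This is a sketch under the assumptions stated in the comments; the name and parameters are made up, and wraparound beyond tile 3FFh is not modelled.

```c
/* Tile number used for tile column tx / tile row ty inside an OBJ.
   base         = tile number from OBJ Attribute 2
   width_tiles  = OBJ width in tiles (only needed for 1D mapping)
   colors256    = 1 for 256 color/1 palette mode (tile numbers step by 2)
   one_dim      = DISPCNT Bit 6 */
static unsigned obj_tile_number(unsigned base, unsigned tx, unsigned ty,
                                unsigned width_tiles, int colors256,
                                int one_dim)
{
    unsigned step = colors256 ? 2u : 1u;
    if (one_dim)
        return base + (ty * width_tiles + tx) * step;
    return base + ty * 32u + tx * step;   /* 32 tile numbers per matrix row */
}
```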

  LCD Color Palettes

Color Palette RAM
BG and OBJ palettes are using separate memory regions:
  05000000-050001FF - BG Palette RAM (512 bytes, 256 colors)
  05000200-050003FF - OBJ Palette RAM (512 bytes, 256 colors)
Each BG and OBJ palette RAM may be either split into 16 palettes with 16 colors each, or may be used as a single palette with 256 colors.
Note that some OBJs may access palette RAM in 16 color mode, while other OBJs may use 256 color mode at the same time. Same for BG0-BG3 layers.

Transparent Colors
Color 0 of all BG and OBJ palettes is transparent. Even though palettes are described as 16 (256) color palettes, only 15 (255) colors are actually visible.

Backdrop Color
Color 0 of BG Palette 0 is used as backdrop color. This color is displayed if an area of the screen is not covered by any non-transparent BG or OBJ dots.

Color Definitions
Each color occupies two bytes (same as for 32768 color BG modes):
  Bit   Expl.
  0-4   Red Intensity   (0-31)
  5-9   Green Intensity (0-31)
  10-14 Blue Intensity  (0-31)
  15    Not used
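As a small illustration of the palette layout, the address of a color entry could be computed as follows (hypothetical names; in 16 color mode the index is palette*16+color):

```c
#include <stdint.h>

/* Address of color entry index (0-255) in BG (obj=0) or OBJ (obj=1) palette. */
static uint32_t palette_entry_addr(int obj, unsigned index)
{
    return 0x05000000u + (obj ? 0x200u : 0u) + (index & 255u) * 2u;
}

/* Same, for palette (0-15) and color (0-15) in 16 colors/16 palettes mode. */
static uint32_t palette16_entry_addr(int obj, unsigned palette, unsigned color)
{
    return palette_entry_addr(obj, (palette & 15u) * 16u + (color & 15u));
}
```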

Under normal circumstances (light source/viewing angle), the intensities 0-14 are practically all black, and only intensities 15-31 are resulting in visible medium..bright colors.

Note: The intensity problem also appears in the 8bit CGB "compatibility" mode. The original CGB display produced the opposite effect: Intensities 0-14 resulted in dark..medium colors, and intensities 15-31 resulted in bright colors. Any "medium" colors of CGB games will appear invisible/black on GBA hardware, and only very bright colors will be visible.

  LCD Dimensions and Timings

Horizontal Dimensions
The drawing time for each dot is 4 CPU cycles.
  Visible     240 dots,  57.221 us,    960 cycles - 78% of h-time
  H-Blanking   68 dots,  16.212 us,    272 cycles - 22% of h-time
  Total       308 dots,  73.433 us,   1232 cycles - ca. 13.618 kHz
VRAM and Palette RAM may be accessed during H-Blanking. OAM can be accessed only if "H-Blank Interval Free" bit in DISPCNT register is set.

Vertical Dimensions
  Visible (*) 160 lines, 11.749 ms, 197120 cycles - 70% of v-time
  V-Blanking   68 lines,  4.994 ms,  83776 cycles - 30% of v-time
  Total       228 lines, 16.743 ms, 280896 cycles - ca. 59.727 Hz
All VRAM, OAM, and Palette RAM may be accessed during V-Blanking.
Note that no H-Blank interrupts are generated within V-Blank period.

System Clock
The system clock is 16.78MHz (16*1024*1024 Hz), one cycle is thus approx. 59.60ns.
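The cycle counts in the tables above follow directly from the dot and line counts and the system clock. A minimal sketch, with constant and function names chosen for this example only:

```c
#include <stdint.h>

enum {
    GBA_CYCLES_PER_DOT  = 4,
    GBA_DOTS_PER_LINE   = 240 + 68,   /* visible + H-Blanking */
    GBA_LINES_PER_FRAME = 160 + 68    /* visible + V-Blanking */
};
#define GBA_CLOCK_HZ (16u * 1024u * 1024u)

static uint32_t lcd_cycles_per_line(void)
{
    return GBA_CYCLES_PER_DOT * GBA_DOTS_PER_LINE;
}

static uint32_t lcd_cycles_per_frame(void)
{
    return lcd_cycles_per_line() * GBA_LINES_PER_FRAME;
}

/* Line and frame rates in Hz, derived from the system clock. */
static double lcd_line_rate_hz(void)
{
    return (double)GBA_CLOCK_HZ / lcd_cycles_per_line();
}

static double lcd_frame_rate_hz(void)
{
    return (double)GBA_CLOCK_HZ / lcd_cycles_per_frame();
}
```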

(*) Even though vertical screen size is 160 lines, the upper 8 lines are not <really> visible, these lines are covered by a shadow when holding the GBA orientated towards a light source, the lines are effectively black - and should not be used to display important information.

The LCD display is using some sort of interlace in which even scanlines are dimmed in each second frame, and odd scanlines are dimmed in each other frame (it does always render ALL lines in ALL frames, but half of them are dimmed).
The effect can be seen when displaying some horizontal lines in each second frame, and hiding them in each other frame: the hardware will randomly show the lines in dimmed or non-dimmed form (depending on whether the test was started in an even or odd frame).
Unknown if it's possible to determine the even/odd frame state by software (or possibly to reset the hardware to this or that state by software).
Note: The NDS is applying some sort of frameskip to GBA games, about every 3 seconds there will be a missing (or maybe: inserted) frame, ie. a GBA game that is updating the display in sync with GBA interlace will get out of sync on NDS consoles.

  GBA Sound Controller

The GBA supplies four 'analogue' sound channels for Tone and Noise (mostly compatible to CGB sound), as well as two 'digital' sound channels (which can be used to replay 8bit DMA sample data).

GBA Sound Channel 1 - Tone & Sweep
GBA Sound Channel 2 - Tone
GBA Sound Channel 3 - Wave Output
GBA Sound Channel 4 - Noise
GBA Sound Channel A and B - DMA Sound

GBA Sound Control Registers
GBA Comparison of CGB and GBA Sound

The GBA has only a single (mono) built-in speaker. Each channel may be output to the left and/or right channel when using the external line-out connector (for stereo headphones, etc).

  GBA Sound Channel 1 - Tone & Sweep

4000060h - SOUND1CNT_L (NR10) - Channel 1 Sweep register (R/W)
  Bit        Expl.
  0-2   R/W  Number of sweep shift      (n=0-7)
  3     R/W  Sweep Frequency Direction  (0=Increase, 1=Decrease)
  4-6   R/W  Sweep Time; units of 7.8ms (0-7, min=7.8ms, max=54.7ms)
  7-15  -    Not used
Sweep is disabled by setting Sweep Time to zero, if so, the direction bit should be set.
The change of frequency (NR13,NR14) at each shift is calculated by the following formula where X(0) is initial freq & X(t-1) is last freq:
  X(t) = X(t-1) +/- X(t-1)/2^n
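
The sweep formula can be traced step by step. A minimal Python sketch, assuming that the division by 2^n truncates (a right shift) and that the sequence simply ends once the value would leave the 11bit range; both are assumptions of this sketch, and sweep_sequence is a hypothetical helper name:

```python
def sweep_sequence(x0, n, decrease, steps):
    """Return the frequency register values X(0), X(1), ... for a sweep.

    x0       -- initial 11bit frequency value (0..2047)
    n        -- number of sweep shift (0..7)
    decrease -- sweep direction (False=Increase, True=Decrease)
    steps    -- maximum number of sweep shifts to apply
    """
    values = [x0]
    x = x0
    for _ in range(steps):
        delta = x >> n                      # X(t-1)/2^n, truncated (assumption)
        x = x - delta if decrease else x + delta
        if not 0 <= x <= 2047:              # left the 11bit range (assumption)
            break
        values.append(x)
    return values


print(sweep_sequence(1024, 2, False, 5))
```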

4000062h - SOUND1CNT_H (NR11, NR12) - Channel 1 Duty/Len/Envelope (R/W)
  Bit        Expl.
  0-5   W    Sound length; units of (64-n)/256s  (0-63)
  6-7   R/W  Wave Pattern Duty                   (0-3, see below)
  8-10  R/W  Envelope Step-Time; units of n/64s  (1-7, 0=No Envelope)
  11    R/W  Envelope Direction                  (0=Decrease, 1=Increase)
  12-15 R/W  Initial Volume of envelope          (1-15, 0=No Sound)
Wave Duty:
  0: 12.5% ( -_______-_______-_______ )
  1: 25%   ( --______--______--______ )
  2: 50%   ( ----____----____----____ ) (normal)
  3: 75%   ( ------__------__------__ )
The Length value is used only if Bit 6 in NR14 is set.

4000064h - SOUND1CNT_X (NR13, NR14) - Channel 1 Frequency/Control (R/W)
  Bit        Expl.
  0-10  W    Frequency; 131072/(2048-n)Hz  (0-2047)
  11-13 -    Not used
  14    R/W  Length Flag  (1=Stop output when length in NR11 expires)
  15    W    Initial      (1=Restart Sound)
  16-31 -    Not used
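
As an illustration of the frequency field, the following Python sketch converts between register value n and output frequency, and assembles a SOUND1CNT_X value from the bits listed above. The function names are hypothetical, and rounding to the nearest register value is a choice made for this sketch:

```python
def tone_frequency_hz(n):
    """Output frequency for an 11bit frequency value n (0..2047)."""
    if not 0 <= n <= 2047:
        raise ValueError("n must be in range 0..2047")
    return 131072 / (2048 - n)


def tone_register_value(freq_hz):
    """Nearest register value for a target frequency, clamped to 0..2047."""
    n = round(2048 - 131072 / freq_hz)
    return max(0, min(2047, n))


def sound1cnt_x(n, use_length=False, restart=False):
    """Assemble SOUND1CNT_X: bits 0-10 frequency, bit 14 length flag, bit 15 initial."""
    return (n & 0x7FF) | (int(use_length) << 14) | (int(restart) << 15)


n = tone_register_value(440)
print(n, tone_frequency_hz(n), hex(sound1cnt_x(n, restart=True)))
```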

  GBA Sound Channel 2 - Tone

This sound channel works exactly as channel 1, except that it doesn't have a Tone Envelope/Sweep Register.

4000068h - SOUND2CNT_L (NR21, NR22) - Channel 2 Duty/Length/Envelope (R/W)
400006Ah - Not used
400006Ch - SOUND2CNT_H (NR23, NR24) - Channel 2 Frequency/Control (R/W)
For details, refer to channel 1 description.

  GBA Sound Channel 3 - Wave Output

This channel can be used to output digital sound, the length of the sample buffer (Wave RAM) can be either 32 or 64 digits (4bit samples). This sound channel can also be used to output normal tones when initializing the Wave RAM with a square wave. This channel doesn't have a volume envelope register.

4000070h - SOUND3CNT_L (NR30) - Channel 3 Stop/Wave RAM select (R/W)
  Bit        Expl.
  0-4   -    Not used
  5     R/W  Wave RAM Dimension   (0=One bank/32 digits, 1=Two banks/64 digits)
  6     R/W  Wave RAM Bank Number (0-1, see below)
  7     R/W  Sound Channel 3 Off  (0=Stop, 1=Playback)
  8-15  -    Not used
The currently selected Bank Number (Bit 6) will be played back, while reading/writing to/from wave RAM will address the other (not selected) bank. When dimension is set to two banks, output will start by replaying the currently selected bank.

4000072h - SOUND3CNT_H (NR31, NR32) - Channel 3 Length/Volume (R/W)
  Bit        Expl.
  0-7   W    Sound length; units of (256-n)/256s  (0-255)
  8-12  -    Not used.
  13-14 R/W  Sound Volume  (0=Mute/Zero, 1=100%, 2=50%, 3=25%)
  15    R/W  Force Volume  (0=Use above, 1=Force 75% regardless of above)
The Length value is used only if Bit 6 in NR34 is set.

4000074h - SOUND3CNT_X (NR33, NR34) - Channel 3 Frequency/Control (R/W)
  Bit        Expl.
  0-10  W    Sample Rate; 2097152/(2048-n) Hz   (0-2047)
  11-13 -    Not used
  14    R/W  Length Flag  (1=Stop output when length in NR31 expires)
  15    W    Initial      (1=Restart Sound)
  16-31 -    Not used
The above sample rate specifies the number of wave RAM digits per second, the actual tone frequency depends on the wave RAM content, for example:
  Wave RAM, single bank 32 digits   Tone Frequency
  FFFFFFFFFFFFFFFF0000000000000000  65536/(2048-n) Hz
  FFFFFFFF00000000FFFFFFFF00000000  131072/(2048-n) Hz
  FFFF0000FFFF0000FFFF0000FFFF0000  262144/(2048-n) Hz
  FF00FF00FF00FF00FF00FF00FF00FF00  524288/(2048-n) Hz
  F0F0F0F0F0F0F0F0F0F0F0F0F0F0F0F0  1048576/(2048-n) Hz
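
The table entries can be reproduced by dividing the digit rate by the length of the shortest repeating unit in Wave RAM. A small Python sketch of that reasoning (pattern_period and wave_tone_frequency_hz are hypothetical names; a constant pattern is reported with period 1 although it produces no audible tone):

```python
def pattern_period(digits):
    """Length of the shortest repeating unit of a Wave RAM digit string."""
    size = len(digits)
    for period in range(1, size + 1):
        if size % period == 0 and digits[:period] * (size // period) == digits:
            return period


def wave_tone_frequency_hz(n, digits):
    """Tone frequency for sample rate value n (0..2047) and the given digits."""
    digits_per_second = 2097152 / (2048 - n)
    return digits_per_second / pattern_period(digits)


for row in ("F" * 16 + "0" * 16, "FFFF0000" * 4, "F0" * 16):
    print(row, wave_tone_frequency_hz(0, row), "Hz at n=0")
```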

4000090h - WAVE_RAM0_L - Channel 3 Wave Pattern RAM (W/R)
4000092h - WAVE_RAM0_H - Channel 3 Wave Pattern RAM (W/R)
4000094h - WAVE_RAM1_L - Channel 3 Wave Pattern RAM (W/R)
4000096h - WAVE_RAM1_H - Channel 3 Wave Pattern RAM (W/R)
4000098h - WAVE_RAM2_L - Channel 3 Wave Pattern RAM (W/R)
400009Ah - WAVE_RAM2_H - Channel 3 Wave Pattern RAM (W/R)
400009Ch - WAVE_RAM3_L - Channel 3 Wave Pattern RAM (W/R)
400009Eh - WAVE_RAM3_H - Channel 3 Wave Pattern RAM (W/R)
This area contains 16 bytes (32 x 4bits) Wave Pattern data which is output by channel 3. Data is played back ordered as follows: MSBs of 1st byte, followed by LSBs of 1st byte, followed by MSBs of 2nd byte, and so on - this results in a confusing ordering when filling Wave RAM in units of 16bit data - ie. samples would be then located in Bits 4-7, 0-3, 12-15, 8-11.
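
To avoid mistakes with the nibble ordering when writing 16bit units, the samples can be packed by a helper. A Python sketch (pack_wave_ram is a hypothetical name) that turns 32 samples in playback order into the eight 16bit values for WAVE_RAM0_L through WAVE_RAM3_H:

```python
def pack_wave_ram(samples):
    """Pack 32 four-bit samples (playback order) into eight 16bit values.

    Within each 16bit value the four samples are located in
    Bits 4-7, 0-3, 12-15, 8-11 (in that playback order).
    """
    if len(samples) != 32 or any(not 0 <= s <= 15 for s in samples):
        raise ValueError("need exactly 32 samples in range 0..15")
    halfwords = []
    for i in range(0, 32, 4):
        s0, s1, s2, s3 = samples[i:i + 4]
        halfwords.append((s0 << 4) | s1 | (s2 << 12) | (s3 << 8))
    return halfwords


# Square wave: 16 digits high, 16 digits low.
print([hex(h) for h in pack_wave_ram([15] * 16 + [0] * 16)])
```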

In the GBA, two Wave Patterns exist (each 32 x 4bits), either one may be played (as selected in NR30 register), the other bank may be accessed by the user. After all 32 samples have been played, output of the same bank (or other bank, as specified in NR30) will be automatically restarted.

Internally, Wave RAM is a giant shift-register, there is no pointer which is addressing the currently played digit. Instead, the entire 128 bits are shifted, and the 4 least significant bits are output.
Thus, when reading from Wave RAM, data might have changed its position. And, when writing to Wave RAM, all data should be updated (it is not safe to assume that old data is still located at the same position where it was previously written).

  GBA Sound Channel 4 - Noise

This channel is used to output white noise. This is done by randomly switching the amplitude between high and low at a given frequency. Depending on the frequency the noise will appear 'harder' or 'softer'.

It is also possible to influence the function of the random generator, so that the output becomes more regular, resulting in a limited ability to output Tone instead of Noise.

4000078h - SOUND4CNT_L (NR41, NR42) - Channel 4 Length/Envelope (R/W)
  Bit        Expl.
  0-5   W    Sound length; units of (64-n)/256s  (0-63)
  6-7   -    Not used
  8-10  R/W  Envelope Step-Time; units of n/64s  (1-7, 0=No Envelope)
  11    R/W  Envelope Direction                  (0=Decrease, 1=Increase)
  12-15 R/W  Initial Volume of envelope          (1-15, 0=No Sound)
  16-31 -    Not used
The Length value is used only if Bit 6 in NR44 is set.

400007Ch - SOUND4CNT_H (NR43, NR44) - Channel 4 Frequency/Control (R/W)
The amplitude is randomly switched between high and low at the given frequency. A higher frequency will make the noise appear 'softer'.
When Bit 3 is set, the output will become more regular, and some frequencies will sound more like Tone than Noise.
  Bit        Expl.
  0-2   R/W  Dividing Ratio of Frequencies (r)
  3     R/W  Counter Step/Width (0=15 bits, 1=7 bits)
  4-7   R/W  Shift Clock Frequency (s)
  8-13  -    Not used
  14    R/W  Length Flag  (1=Stop output when length in NR41 expires)
  15    W    Initial      (1=Restart Sound)
  16-31 -    Not used
Frequency = 524288 Hz / r / 2^(s+1) ;For r=0 assume r=0.5 instead
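
The frequency formula, including the r=0 special case, written out as a short Python sketch (noise_frequency_hz is a hypothetical helper name):

```python
def noise_frequency_hz(r, s):
    """Noise shift frequency for dividing ratio r (0..7) and shift clock s (0..15)."""
    if not (0 <= r <= 7 and 0 <= s <= 15):
        raise ValueError("r must be 0..7 and s must be 0..15")
    divider = 0.5 if r == 0 else r          # for r=0 assume r=0.5 instead
    return 524288 / divider / 2 ** (s + 1)


print(noise_frequency_hz(0, 0), noise_frequency_hz(1, 0), noise_frequency_hz(2, 1))
```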

Noise Random Generator (aka Polynomial Counter)
Noise randomly switches between HIGH and LOW levels, the output levels are calculated by a shift register (X), at the selected frequency, as such:
  7bit:  X=X SHR 1, IF carry THEN Out=HIGH, X=X XOR 60h ELSE Out=LOW
  15bit: X=X SHR 1, IF carry THEN Out=HIGH, X=X XOR 6000h ELSE Out=LOW
The initial value when (re-)starting the sound is X=40h (7bit) or X=4000h (15bit). The data stream repeats after 7Fh (7bit) or 7FFFh (15bit) steps.
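
The shift register rules above can be modelled directly. The following Python sketch (function names are hypothetical) produces the output stream, and counts the steps until the register returns to its initial value:

```python
def _start_and_mask(width_7bit):
    """Initial value and XOR mask for the 7bit or 15bit counter width."""
    return (0x40, 0x60) if width_7bit else (0x4000, 0x6000)


def noise_bits(width_7bit, count):
    """Return the first `count` output levels (1=HIGH, 0=LOW)."""
    x, mask = _start_and_mask(width_7bit)
    out = []
    for _ in range(count):
        carry = x & 1           # bit shifted out by X=X SHR 1
        x >>= 1
        if carry:
            x ^= mask           # X=X XOR 60h (or 6000h)
        out.append(carry)
    return out


def noise_period(width_7bit):
    """Number of steps until the shift register repeats."""
    start, mask = _start_and_mask(width_7bit)
    x, steps = start, 0
    while True:
        carry = x & 1
        x >>= 1
        if carry:
            x ^= mask
        steps += 1
        if x == start:
            return steps


print(noise_bits(True, 16), hex(noise_period(True)), hex(noise_period(False)))
```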

  GBA Sound Channel A and B - DMA Sound

The GBA contains two DMA sound channels (A and B), each able to replay digital sound (signed 8bit data, ie. -128..+127). Data can be transferred from INTERNAL memory (not sure if EXTERNAL memory works also ?) to FIFO by using DMA channel 1 or 2, the sample rate is generated by using one of the Timers.

40000A0h - FIFO_A_L - Sound A FIFO, Data 0 and Data 1 (W)
40000A2h - FIFO_A_H - Sound A FIFO, Data 2 and Data 3 (W)
These two registers may receive 32bit (4 bytes) of audio data (Data 0-3, Data 0 being located in least significant byte which is replayed first).
Internally, the capacity of the FIFO is 8 x 32bit (32 bytes), allowing to buffer a small amount of samples. As the name says (First In First Out), oldest data is replayed first.

40000A4h - FIFO_B_L - Sound B FIFO, Data 0 and Data 1 (W)
40000A6h - FIFO_B_H - Sound B FIFO, Data 2 and Data 3 (W)
Same as above, for Sound B.

Initializing DMA-Sound Playback
- Select Timer 0 or 1 in SOUNDCNT_H control register.
- Clear the FIFO.
- Manually write a sample byte to the FIFO.
- Initialize transfer mode for DMA 1 or 2.
- Initialize DMA Sound settings in sound control register.
- Start the timer.

DMA-Sound Playback Procedure
The pseudo-procedure below is automatically repeated.
  If Timer overflows then
    Move 8bit data from FIFO to sound circuit.
    If FIFO contains only 4 x 32bits (16 bytes) then
      Request more data per DMA
      Receive 4 x 32bit (16 bytes) per DMA
This playback mechanism will be repeated forever, regardless of the actual length of the sample buffer.
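
The playback procedure can be simulated to see how often DMA requests occur. A minimal Python model under stated assumptions: the FIFO starts completely filled, "contains only 16 bytes" is read as "16 bytes or fewer", and itertools.count() merely stands in for the sample buffer. All names are hypothetical:

```python
import itertools
from collections import deque

FIFO_CAPACITY = 32      # 8 x 32bit
REFILL_THRESHOLD = 16   # 4 x 32bit
DMA_BURST = 16          # 4 x 32bit received per DMA request


def simulate_playback(sample_source, overflows):
    """Model a number of timer overflows.

    sample_source -- iterator which supplies the sample bytes (DMA source)
    overflows     -- number of timer overflows to simulate
    Returns (list of bytes moved to the sound circuit, number of DMA requests).
    """
    fifo = deque(next(sample_source) for _ in range(FIFO_CAPACITY))
    played, requests = [], 0
    for _ in range(overflows):
        played.append(fifo.popleft())           # move 8bit data to sound circuit
        if len(fifo) <= REFILL_THRESHOLD:       # threshold reading is an assumption
            requests += 1
            fifo.extend(next(sample_source) for _ in range(DMA_BURST))
    return played, requests


played, requests = simulate_playback(itertools.count(), 64)
print(len(played), "bytes played,", requests, "DMA requests")
```

In this model one DMA request occurs for each 16 sample bytes, which matches the "each 16th sample byte" figure given for sound DMA IRQs.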

Synchronizing Sample Buffers
The buffer-end may be determined by counting sound Timer IRQs (each sample byte), or sound DMA IRQs (each 16th sample byte). Both methods would require a lot of CPU time (IRQ processing), and both would fail if interrupts are disabled for a longer period.
Better solutions would be to synchronize the sample rate/buffer length with V-blanks, or to use a second timer (in count up/slave mode) which produces an IRQ after the desired number of samples.

The Sample Rate
The GBA hardware internally re-samples all sound output to 32.768kHz (default SOUNDBIAS setting). It thus does not make much sense to use higher DMA/Timer rates. Best re-sampling accuracy can be gained by using DMA/Timer rates of 32.768kHz, 16.384kHz, or 8.192kHz (ie. integer fractions of the physical output rate).
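
For the rates mentioned above, the timer reload value can be computed as follows. This Python sketch assumes prescaler F/1 and that the 16bit timer overflows when counting past FFFFh; timer_reload_for_rate is a hypothetical helper name:

```python
SYSTEM_CLOCK_HZ = 16 * 1024 * 1024


def timer_reload_for_rate(sample_rate_hz):
    """16bit reload value which makes the timer overflow at the given rate."""
    cycles = round(SYSTEM_CLOCK_HZ / sample_rate_hz)   # cycles per sample
    if not 1 <= cycles <= 0x10000:
        raise ValueError("rate not reachable with prescaler F/1")
    return (0x10000 - cycles) & 0xFFFF


for rate in (32768, 16384, 8192):
    print(rate, "Hz ->", hex(timer_reload_for_rate(rate)))
```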

  GBA Sound Control Registers

4000080h - SOUNDCNT_L (NR50, NR51) - Channel L/R Volume/Enable (R/W)
  Bit   Expl.
  0-2   Sound 1-4 Master Volume RIGHT (0-7)
  3     Not used
  4-6   Sound 1-4 Master Volume LEFT (0-7)
  7     Not used
  8-11  Sound 1-4 Enable Flags RIGHT (each Bit 8-11, 0=Disable, 1=Enable)
  12-15 Sound 1-4 Enable Flags LEFT (each Bit 12-15, 0=Disable, 1=Enable)

4000082h - SOUNDCNT_H (GBA only) - DMA Sound Control/Mixing (R/W)
  Bit   Expl.
  0-1   Sound # 1-4 Volume   (0=25%, 1=50%, 2=100%, 3=Prohibited)
  2     DMA Sound A Volume   (0=50%, 1=100%)
  3     DMA Sound B Volume   (0=50%, 1=100%)
  4-7   Not used
  8     DMA Sound A Enable RIGHT (0=Disable, 1=Enable)
  9     DMA Sound A Enable LEFT  (0=Disable, 1=Enable)
  10    DMA Sound A Timer Select (0=Timer 0, 1=Timer 1)
  11    DMA Sound A Reset FIFO   (1=Reset)
  12    DMA Sound B Enable RIGHT (0=Disable, 1=Enable)
  13    DMA Sound B Enable LEFT  (0=Disable, 1=Enable)
  14    DMA Sound B Timer Select (0=Timer 0, 1=Timer 1)
  15    DMA Sound B Reset FIFO   (1=Reset)

4000084h - SOUNDCNT_X (NR52) - Sound on/off (R/W)
Bits 0-3 are automatically set when starting sound output, and are automatically cleared when a sound ends. (Ie. when the length expires, as far as length is enabled. The bits are NOT reset when a volume envelope ends.)
  Bit   Expl.
  0     Sound 1 ON flag (Read Only)
  1     Sound 2 ON flag (Read Only)
  2     Sound 3 ON flag (Read Only)
  3     Sound 4 ON flag (Read Only)
  4-6   Not used
  7     PSG/FIFO Master Enable (0=Disable, 1=Enable) (Read/Write)
  8-31  Not used
While Bit 7 is cleared, both PSG and FIFO sounds are disabled, and all PSG registers (4000060h..4000081h) are reset to zero (and must be re-initialized after re-enabling sound). However, registers 4000082h and 4000088h are kept read/write-able (of which, 4000082h has no function when sound is off, whilst 4000088h does work even when sound is off).

4000088h - SOUNDBIAS - Sound PWM Control (R/W, see below)
This register controls the final sound output. The default setting is 0200h, it is normally not required to change this value.
  Bit    Expl.
  0-9    Bias Level     (Default=200h, converting signed samples into unsigned)
  10-13  Not used
  14-15  Amplitude Resolution/Sampling Cycle (Default=0, see below)
  16-31  Not used
Amplitude Resolution/Sampling Cycle (0-3):
  0  9bit / 32.768kHz   (Default, best for DMA channels A,B)
  1  8bit / 65.536kHz
  2  7bit / 131.072kHz
  3  6bit / 262.144kHz  (Best for PSG channels 1-4)
For more information on this register, read the descriptions below.

400008Ch - Not used
400008Eh - Not used

Max Output Levels (with max volume settings)
Each of the two FIFOs can span the FULL output range (+/-200h).
Each of the four PSGs can span one QUARTER of the output range (+/-80h).
The current output levels of all six channels are added together by hardware.
So together, the FIFOs and PSGs, could reach THRICE the range (+/-600h).
The BIAS value is added to that signed value. With default BIAS (200h), the possible range becomes -400h..+800h, however, values that exceed the unsigned 10bit output range of 0..3FFh are clipped to MinMax(0,3FFh).
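
The summing and clipping described above, as a minimal Python sketch. It assumes that the channel values passed in are already scaled by their volume settings; mix_output is a hypothetical name:

```python
def mix_output(fifo_a, fifo_b, psg, bias=0x200):
    """Return the unsigned 10bit output level.

    fifo_a, fifo_b -- signed FIFO levels, each in range -200h..+200h
    psg            -- four signed PSG levels, each in range -80h..+80h
    bias           -- SOUNDBIAS bias level
    """
    total = fifo_a + fifo_b + sum(psg) + bias
    return max(0, min(0x3FF, total))        # clip to MinMax(0,3FFh)


print(hex(mix_output(0x100, 0, [0, 0, 0, 0])))
```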

Resampling to 32.768kHz / 9bit (default)
The PSG channels 1-4 are internally generated at 262.144kHz, and DMA sound A-B could be theoretically generated at timer rates up to 16.78MHz. However, the final sound output is resampled to a rate of 32.768kHz, at 9bit depth (the above 10bit value, divided by two). If necessary, rates higher than 32.768kHz can be selected in the SOUNDBIAS register, that would result in a depth smaller than 9bit though.

PWM (Pulse Width Modulation) Output 16.78MHz / 1bit
Okay, now comes the actual output. The GBA can output only two voltages (low and high), these 'bits' are output at system clock speed (16.78MHz). If using the default 32.768kHz sampling rate, then 512 bits are output per sample (512*32K=16M). Each sample value (9bit range, N=0..511), would be then output as N low bits, followed by 512-N high bits. The resulting 'noise' is smoothed down by capacitors, by the speaker, and by human hearing, so that it will effectively sound like clean D/A converted 9bit voltages at 32kHz sampling rate.
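
The relation between sampling rate, bit depth, and PWM bits per sample can be checked with a short Python sketch. The low/high ordering is taken from the description above; applying the same scheme to the non-default SOUNDBIAS settings is an extrapolation made for this sketch, and the function names are hypothetical:

```python
SYSTEM_CLOCK_HZ = 16 * 1024 * 1024


def pwm_parameters(setting):
    """Return (bit depth, sampling rate in Hz, PWM bits per sample).

    setting -- Amplitude Resolution/Sampling Cycle (SOUNDBIAS Bits 14-15, 0..3)
    """
    if not 0 <= setting <= 3:
        raise ValueError("setting must be 0..3")
    rate = 32768 << setting
    return 9 - setting, rate, SYSTEM_CLOCK_HZ // rate


def pwm_bits(sample, setting=0):
    """PWM bit stream for one sample: N low bits, then the remaining high bits."""
    depth, _, period = pwm_parameters(setting)
    if not 0 <= sample < (1 << depth):
        raise ValueError("sample out of range for this resolution")
    return [0] * sample + [1] * (period - sample)


for setting in range(4):
    print(setting, pwm_parameters(setting))
```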

Changing the BIAS Level
Normally use 200h for clean sound output. A value of 000h might make sense during periods when no sound is output (causing the PWM circuit to output low-bits only, which possibly reduces the power consumption, and/or prevents 32kHz noise). Note: The SoundBias function (SWI 19h) allows the level to be changed by slowly incrementing or decrementing it (without hard scratch noise).

Low Power Mode
When not using sound output, power consumption can be reduced by setting both 4000084h (PSG/FIFO) and 4000088h (BIAS) to zero.

  GBA Comparison of CGB and GBA Sound

The GBA sound controller is mostly the same as that of the older monochrome gameboy and CGB. The following changes have been made:

New Sound Channels
Two new sound channels have been added that may be used to replay 8bit digital sound. Sample rate and sample data must be supplied by using a Timer and a DMA channel.

New Control Registers
The SOUNDCNT_H register controls the new DMA channels - as well as mixing with the four old channels. The SOUNDBIAS register controls the final sound output.

Sound Channel 3 Changes
The length of the Wave RAM is doubled by dividing it into two banks of 32 digits each, either one or both banks may be replayed (one after each other), for details check NR30 Bit 5-6. Optionally, the sound may be output at 75% volume, for details check NR32 Bit 7.

Changed Control Registers
NR50 does not support Vin signals (that was an external sound input from the cartridge).

Changed I/O Addresses
The GBA's sound registers are located at 04000060-040000AE instead of at FF10-FF3F as in CGB and monochrome gameboy. However, note that new blank spaces have been inserted between some of the separate registers - therefore it is NOT possible to port CGB software to GBA just by changing the sound base address.

Accessing I/O Registers
In some cases two of the old 8bit registers are packed into a 16bit register and may be accessed as such.

  GBA Timers

The GBA includes four incrementing 16bit timers.
Timer 0 and 1 can be used to supply the sample rate for DMA sound channel A and/or B.

4000100h - TM0CNT_L - Timer 0 Counter/Reload (R/W)
4000104h - TM1CNT_L - Timer 1 Counter/Reload (R/W)
4000108h - TM2CNT_L - Timer 2 Counter/Reload (R/W)
400010Ch - TM3CNT_L - Timer 3 Counter/Reload (R/W)
Writing to these registers initializes the <reload> value (but does not directly affect the current counter value). Reading returns the current <counter> value (or the recent/frozen counter value if the timer has been stopped).
The reload value is copied into the counter only in the following two situations: automatically upon timer overflows, or when the timer start bit is changed from 0 to 1.
Note: When simultaneously changing the start bit from 0 to 1, and setting the reload value at the same time (by a single 32bit I/O operation), then the newly written reload value is recognized as new counter value.

4000102h - TM0CNT_H - Timer 0 Control (R/W)
4000106h - TM1CNT_H - Timer 1 Control (R/W)
400010Ah - TM2CNT_H - Timer 2 Control (R/W)
400010Eh - TM3CNT_H - Timer 3 Control (R/W)
  Bit   Expl.
  0-1   Prescaler Selection (0=F/1, 1=F/64, 2=F/256, 3=F/1024)
  2     Count-up Timing   (0=Normal, 1=See below)
  3-5   Not used
  6     Timer IRQ Enable  (0=Disable, 1=IRQ on Timer overflow)
  7     Timer Start/Stop  (0=Stop, 1=Operate)
  8-15  Not used
When Count-up Timing is enabled, the prescaler value is ignored, instead the timer is incremented each time the previous timer overflows. This function cannot be used for Timer 0 (as it is the first timer).
F = System Clock (16.78MHz).
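
As an example of the control bits, the Python sketch below assembles a TMxCNT_H value and computes the overflow rate for a given reload value and prescaler. The names are hypothetical, and the timer is assumed to overflow when counting past FFFFh:

```python
SYSTEM_CLOCK_HZ = 16 * 1024 * 1024
PRESCALER_DIVIDERS = (1, 64, 256, 1024)     # selections 0..3


def timer_control(prescaler=0, count_up=False, irq=False, start=False):
    """Assemble a TMxCNT_H value from the fields in the table above."""
    if not 0 <= prescaler <= 3:
        raise ValueError("prescaler selection must be 0..3")
    return (prescaler
            | (int(count_up) << 2)
            | (int(irq) << 6)
            | (int(start) << 7))


def overflow_rate_hz(reload, prescaler=0):
    """Overflows per second for a 16bit reload value (not in count-up mode)."""
    ticks = 0x10000 - (reload & 0xFFFF)
    return SYSTEM_CLOCK_HZ / PRESCALER_DIVIDERS[prescaler] / ticks


print(hex(timer_control(prescaler=1, irq=True, start=True)))
print(overflow_rate_hz(0xFE00), overflow_rate_hz(0, 3))
```

A second timer with count_up=True and reload 10000h-N would overflow (and optionally request an IRQ) after N overflows of the previous timer, which is the count-up arrangement suggested for counting samples in the DMA sound chapter.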

  GBA DMA Transfers

The GBA includes four DMA channels, the highest priority is assigned to DMA0, followed by DMA1, DMA2, and DMA3. DMA Channels with lower priority are paused until channels with higher priority have completed.
The CPU is paused when DMA transfers are active, however, the CPU is operating during the periods when Sound/Blanking DMA transfers are paused.

Special features of the separate DMA channels
DMA0 - highest priority, best for timing critical transfers (eg. HBlank DMA).
DMA1 and DMA2 - can be used to feed digital sample data to the Sound FIFOs.
DMA3 - can be used to write to Game Pak ROM/FlashROM (but not GamePak SRAM).
Besides that, each DMA 0-3 may be used for any general purpose.

40000B0h,0B2h - DMA0SAD - DMA 0 Source Address (W) (internal memory)
40000BCh,0BEh - DMA1SAD - DMA 1 Source Address (W) (any memory)
40000C8h,0CAh - DMA2SAD - DMA 2 Source Address (W) (any memory)
40000D4h,0D6h - DMA3SAD - DMA 3 Source Address (W) (any memory)
The most significant address bits are ignored, only the least significant 27 or 28 bits are used (max 07FFFFFFh internal memory, or max 0FFFFFFFh any memory - except SRAM ?!).

40000B4h,0B6h - DMA0DAD - DMA 0 Destination Address (W) (internal memory)
40000C0h,0C2h - DMA1DAD - DMA 1 Destination Address (W) (internal memory)
40000CCh,0CEh - DMA2DAD - DMA 2 Destination Address (W) (internal memory)
40000D8h,0DAh - DMA3DAD - DMA 3 Destination Address (W) (any memory)
The most significant address bits are ignored, only the least significant 27 or 28 bits are used (max. 07FFFFFFh internal memory or 0FFFFFFFh any memory - except SRAM ?!).

40000B8h - DMA0CNT_L - DMA 0 Word Count (W) (14 bit, 1..4000h)
40000C4h - DMA1CNT_L - DMA 1 Word Count (W) (14 bit, 1..4000h)
40000D0h - DMA2CNT_L - DMA 2 Word Count (W) (14 bit, 1..4000h)
40000DCh - DMA3CNT_L - DMA 3 Word Count (W) (16 bit, 1..10000h)
Specifies the number of data units to be transferred, each unit is 16bit or 32bit depending on the transfer type, a value of zero is treated as max length (ie. 4000h, or 10000h for DMA3).

40000BAh - DMA0CNT_H - DMA 0 Control (R/W)
40000C6h - DMA1CNT_H - DMA 1 Control (R/W)
40000D2h - DMA2CNT_H - DMA 2 Control (R/W)
40000DEh - DMA3CNT_H - DMA 3 Control (R/W)
  Bit   Expl.
  0-4   Not used
  5-6   Dest Addr Control  (0=Increment,1=Decrement,2=Fixed,3=Increment/Reload)
  7-8   Source Adr Control (0=Increment,1=Decrement,2=Fixed,3=Prohibited)
  9     DMA Repeat                   (0=Off, 1=On) (Must be zero if Bit 11 set)
  10    DMA Transfer Type            (0=16bit, 1=32bit)
  11    Game Pak DRQ  - DMA3 only -  (0=Normal, 1=DRQ <from> Game Pak, DMA3)
  12-13 DMA Start Timing  (0=Immediately, 1=VBlank, 2=HBlank, 3=Special)
          The 'Special' setting (Start Timing=3) depends on the DMA channel:
          DMA0=Prohibited, DMA1/DMA2=Sound FIFO, DMA3=Video Capture
  14    IRQ upon end of Word Count   (0=Disable, 1=Enable)
  15    DMA Enable                   (0=Off, 1=On)
After changing the Enable bit from 0 to 1, wait 2 clock cycles before accessing any DMA related registers.
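
The control word can be assembled from the fields in the table above. A Python sketch with a hypothetical dma_control helper; it rejects the two combinations which the table marks as not allowed:

```python
def dma_control(dest_ctrl=0, src_ctrl=0, repeat=False, word32=False,
                gamepak_drq=False, timing=0, irq=False, enable=False):
    """Assemble a DMAxCNT_H value.

    dest_ctrl -- 0=Increment, 1=Decrement, 2=Fixed, 3=Increment/Reload
    src_ctrl  -- 0=Increment, 1=Decrement, 2=Fixed (3=Prohibited)
    timing    -- 0=Immediately, 1=VBlank, 2=HBlank, 3=Special
    """
    if not (0 <= dest_ctrl <= 3 and 0 <= src_ctrl <= 3 and 0 <= timing <= 3):
        raise ValueError("field value out of range")
    if src_ctrl == 3:
        raise ValueError("source address control 3 is prohibited")
    if repeat and gamepak_drq:
        raise ValueError("repeat must be zero if Bit 11 is set")
    return ((dest_ctrl << 5)
            | (src_ctrl << 7)
            | (int(repeat) << 9)
            | (int(word32) << 10)
            | (int(gamepak_drq) << 11)
            | (timing << 12)
            | (int(irq) << 14)
            | (int(enable) << 15))


# Sound FIFO transfer: fixed destination, repeat, 32bit, Special timing.
print(hex(dma_control(dest_ctrl=2, repeat=True, word32=True, timing=3, enable=True)))
```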

When accessing OAM (7000000h) or OBJ VRAM (6010000h) by HBlank Timing, then the "H-Blank Interval Free" bit in DISPCNT register must be set.

Source and Destination Address and Word Count Registers
The SAD, DAD, and CNT_L registers are holding the initial start addresses, and initial length. The hardware does NOT change the content of these registers during or after the transfer.
The actual transfer takes place by using internal pointer/counter registers. The initial values are copied into internal regs under the following circumstances:
Upon DMA Enable (Bit 15) changing from 0 to 1: Reloads SAD, DAD, CNT_L.
Upon Repeat: Reloads CNT_L, and optionally DAD (Increment+Reload).

DMA Repeat bit
If the Repeat bit is cleared: The Enable bit is automatically cleared after the specified number of data units has been transferred.
If the Repeat bit is set: The Enable bit remains set after the transfer, and the transfer will be restarted each time when the Start condition (eg. HBlank, Fifo) becomes true. The specified number of data units is transferred <each> time when the transfer is (re-)started. The transfer will be repeated forever, until it gets stopped by software.

Sound DMA (FIFO Timing Mode) (DMA1 and DMA2 only)
In this mode, the DMA Repeat bit must be set, and the destination address must be FIFO_A (040000A0h) or FIFO_B (040000A4h).
Upon DMA request from sound controller, 4 units of 32bits (16 bytes) are transferred (both Word Count register and DMA Transfer Type bit are ignored). The destination address will not be incremented in FIFO mode.
Keep in mind that DMA channels of higher priority may hold off sound DMA. For example, when using a 64 kHz sample rate, 16 bytes of sound DMA data are requested each 0.25ms (4 kHz), at this time another 16 bytes are still in the FIFO so that there's still 0.25ms time to satisfy the DMA request. Thus DMAs with higher priority should not be operated for longer than 0.25ms. (This problem does not arise for HBlank transfers as HBlank time is limited to 16.212us.)

Game Pak DMA
Only DMA 3 may be used to transfer data to/from Game Pak ROM or Flash ROM - it cannot access Game Pak SRAM though (as SRAM data bus is limited to 8bit units). In normal mode, DMA is requested until Word Count becomes zero. When setting the 'Game Pak DRQ' bit, the cartridge must contain an external circuit which outputs a /DREQ signal. Note that there is only one pin for /DREQ and /IREQ, thus the cartridge may not supply /IREQs while using DRQ mode.

Video Capture Mode (DMA3 only)
Intended to copy a bitmap from memory (or from external hardware/camera) to VRAM. When using this transfer mode, set the repeat bit, and write the number of data units (per scanline) to the word count register. Capture works similarly to HBlank DMA, however, the transfer is started when VCOUNT=2, it is then repeated each scanline, and it gets stopped when VCOUNT=162.

Transfer End
The DMA Enable flag (Bit 15) is automatically cleared upon completion of the transfer. The user may also clear this bit manually in order to stop the transfer (obviously this is possible for Sound/Blanking DMAs only, in all other cases the CPU is stopped until the transfer completes by itself).

Transfer Rate/Timing
Except for the first data unit, all units are transferred by sequential reads and writes. For n data units, the DMA transfer time is:
  2N+2(n-1)S+xI
Of which, 1N+(n-1)S are read cycles, and the other 1N+(n-1)S are write cycles, the actual number of cycles depends on the waitstates and bus-width of the source and destination areas (as described in CPU Instruction Cycle Times chapter). Internal time for DMA processing is 2I (normally), or 4I (if both source and destination are in gamepak memory area).
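
Summing the read cycles, write cycles, and internal cycles described above gives the total transfer time. A Python sketch in which the N and S cycle lengths of the source and destination areas are passed as parameters (their actual values depend on waitstates and bus-width); the function name is hypothetical:

```python
def dma_transfer_cycles(n, read_n, read_s, write_n, write_s, both_gamepak=False):
    """Total clock cycles for a DMA transfer of n data units.

    read_n, read_s   -- length of one non-sequential/sequential read (in cycles)
    write_n, write_s -- length of one non-sequential/sequential write (in cycles)
    both_gamepak     -- True if source and destination are both in gamepak area
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    internal = 4 if both_gamepak else 2             # 4I or 2I
    read_cycles = read_n + (n - 1) * read_s         # 1N+(n-1)S
    write_cycles = write_n + (n - 1) * write_s      # 1N+(n-1)S
    return read_cycles + write_cycles + internal


# Four units between areas in which every access takes one cycle.
print(dma_transfer_cycles(4, 1, 1, 1, 1))
```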

DMA lockup when stopping while starting ???
Capture delayed, Capture Enable=AutoCleared ???

  GBA Communication Ports

The GBA's Serial Port may be used in various different communication modes. Normal mode may exchange data between two GBAs (or to transfer data from master GBA to several slave GBAs in one-way direction).
Multi-player mode may exchange data between up to four GBAs. UART mode works much like a RS232 interface. JOY Bus mode uses a standardized Nintendo protocol. And General Purpose mode allows to mis-use the 'serial' port as bi-directional 4bit parallel port.
Note: The Nintendo DS does not include a Serial Port.

SIO Normal Mode
SIO Multi-Player Mode
SIO General-Purpose Mode
SIO Control Registers Summary

Wireless Adapter
GBA Wireless Adapter

Infrared Communication Adapters
Even though early GBA prototypes have been intended to support IR communication, this feature has been removed.
However, Nintendo is apparently considering to provide an external IR adapter (to be connected to the SIO connector, being accessed in General Purpose mode).
Also, it'd be theoretically possible to include IR ports built-in in game cartridges (as done for some older 8bit/monochrome Hudson games).

  SIO Normal Mode

This mode is used to communicate between two units.
Transfer rates of 256Kbit/s or 2Mbit/s can be selected, however, the fast 2Mbit/s is intended ONLY for special hardware expansions that are DIRECTLY connected to the GBA link port (ie. without a cable being located between the GBA and expansion hardware). In normal cases, always use 256Kbit/s transfer rate which provides stable results.
Transfer lengths of 8bit or 32bit may be used, the 8bit mode is the same as for older DMG/CGB gameboys, however, the voltages for "GBA cartridges in GBAs" are different from those for "DMG/CGB cartridges in DMG/CGB/GBAs", ie. it is not possible to communicate between DMG/CGB games and GBA games.

4000134h - RCNT (R) - Mode Selection, in Normal/Multiplayer/UART modes (R/W)
  Bit   Expl.
  0-3   Undocumented (current SC,SD,SI,SO state, as for General Purpose mode)
  4-8   Not used     (Should be 0, bits are read/write-able though)
  9-13  Not used     (Always 0, read only)
  14    Not used     (Should be 0, bit is read/write-able though)
  15    Must be zero (0) for Normal/Multiplayer/UART modes

4000128h - SIOCNT - SIO Control, usage in NORMAL Mode (R/W)
  Bit   Expl.
  0     Shift Clock (SC)        (0=External, 1=Internal)
  1     Internal Shift Clock    (0=256KHz, 1=2MHz)
  2     SI State (opponents SO) (0=Low, 1=High/None) --- (Read Only)
  3     SO during inactivity    (0=Low, 1=High) (applied ONLY when Bit7=0)
  4-6   Not used                (Read only, always 0 ?)
  7     Start Bit               (0=Inactive/Ready, 1=Start/Active)
  8-11  Not used                (R/W, should be 0)
  12    Transfer Length         (0=8bit, 1=32bit)
  13    Must be "0" for Normal Mode
  14    IRQ Enable              (0=Disable, 1=Want IRQ upon completion)
  15    Not used                (Read only, always 0)
The Start bit is automatically reset when the transfer completes, ie. when all 8 or 32 bits are transferred, at that time an IRQ may be generated.

400012Ah - SIODATA8 - SIO Normal Communication 8bit Data (R/W)
For 8bit normal mode. Contains 8bit data (only lower 8bit are used). Outgoing data should be written to this register before starting the transfer. During transfer, transmitted bits are shifted-out (MSB first), and received bits are shifted-in simultaneously. Upon transfer completion, the register contains the received 8bit value.

4000120h - SIODATA32_L - SIO Normal Communication lower 16bit data (R/W)
4000122h - SIODATA32_H - SIO Normal Communication upper 16bit data (R/W)
Same as above SIODATA8, for 32bit normal transfer mode respectively.
SIOCNT/RCNT must be set to 32bit normal mode <before> writing to SIODATA32.

First, initialize RCNT register. Second, set mode/clock bits in SIOCNT with startbit cleared. For master: select internal clock, and (in most cases) specify 256KHz as transfer rate. For slave: select external clock, the local transfer rate selection is then ignored, as the transfer rate is supplied by the remote GBA (or other computer, which might supply custom transfer rates).
Third, set the startbit in SIOCNT with mode/clock bits unchanged.

Recommended Communication Procedure for SLAVE unit (external clock)
- Initialize data which is to be sent to master.
- Set Start flag.
- Set SO to LOW to indicate that master may start now.
- Wait for IRQ (or for Start bit to become zero). (Check timeout here!)
- Set SO to HIGH to indicate that we are not ready.
- Process received data.
- Repeat procedure if more data is to be transferred.
(or is so=high done automatically? would be fine - more stable - otherwise master may still need delay)

Recommended Communication Procedure for SLAVE unit (external clock)
- Initialize data which is to be sent to master.
- Set Start=0 and SO=0 (SO=LOW indicates that slave is (almost) ready).
- Set Start=1 and SO=1 (SO=HIGH indicates not ready, applied after transfer).
  (Expl. Old SO=LOW kept output until 1st clock bit received).
  (Expl. New SO=HIGH is automatically output at transfer completion).
- Set SO to LOW to indicate that master may start now.
- Wait for IRQ (or for Start bit to become zero). (Check timeout here!)
- Process received data.
- Repeat procedure if more data is to be transferred.

Recommended Communication Procedure for MASTER unit (internal clock)
- Initialize data which is to be sent to slave.
- Wait for SI to become LOW (slave ready). (Check timeout here!)
- Set Start flag.
- Wait for IRQ (or for Start bit to become zero).
- Process received data.
- Repeat procedure if more data is to be transferred.

Cable Protocol
During inactive transfer, the shift clock (SC) is high. The transmit (SO) and receive (SI) data lines may be manually controlled as described above.
When master sends SC=LOW, each master and slave must output the next outgoing data bit to SO. When master sends SC=HIGH, each master and slave must read out the opponent's data bit from SI. This is repeated for each of the 8 or 32 bits, and when completed SC will be kept high again.
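
The bit-by-bit exchange can be modelled with two shift registers. A minimal Python sketch of the protocol described above (exchange is a hypothetical name; the data registers are shifted MSB first, as described for SIODATA8):

```python
def exchange(master_data, slave_data, bits=8):
    """Simulate one normal mode transfer; return (master, slave) data registers."""
    mask = (1 << bits) - 1
    m, s = master_data & mask, slave_data & mask
    for _ in range(bits):
        # SC=LOW: each unit outputs its next data bit (MSB first) to SO.
        m_out = (m >> (bits - 1)) & 1
        s_out = (s >> (bits - 1)) & 1
        # SC=HIGH: each unit reads the opponent's data bit from SI.
        m = ((m << 1) & mask) | s_out
        s = ((s << 1) & mask) | m_out
    return m, s


print([hex(v) for v in exchange(0xA5, 0x3C)])
```

After the last clock pulse each data register contains the value which was sent by the other unit.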

Transfer Rates
Either 256KHz or 2MHz rates can be selected for SC, so max 32KBytes (256Kbit) or 256KBytes (2Mbit) can be transferred per second. However, the software must process each 8bit or 32bit of transmitted data separately, so the actual transfer rate will be reduced by the time spent on handling each data unit.
Only 256KHz provides stable results in most cases (such as when linking two GBAs). The 2MHz rate is intended for special expansion hardware (with very short wires) only.
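
The effect of per-unit software overhead on the usable rate can be estimated as below. This is a rough sketch only: the function name and the overhead parameter are hypothetical, and the clock is passed in Hz so that the caller decides how "256KHz" and "2MHz" are interpreted.

```c
/* Hypothetical helper: upper bound on payload bytes per second when every
   data unit of unit_bits (8 or 32) costs an extra overhead_us of CPU time. */
static double sio_normal_bytes_per_sec(double clock_hz, int unit_bits,
                                       double overhead_us)
{
    double shift_time = unit_bits / clock_hz;            /* seconds on the wire */
    double unit_time  = shift_time + overhead_us * 1e-6; /* plus handling time  */
    return (unit_bits / 8.0) / unit_time;
}
```

With overhead_us set to zero the function returns the raw wire rate. Any non-zero overhead lowers the result, and 8bit units suffer more than 32bit units because the same overhead is paid four times as often.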

Using Normal mode for One-Way Multiplayer communication
When using normal mode with multiplay-cables, data isn't exchanged between first and second GBA as usual. Instead, data is shifted from first to last GBA (the first GBA receives zero, because master SI is shorted to GND).
This behaviour may be used for fast ONE-WAY data transfer from master to all other GBAs. For example (3 GBAs linked):
  Step         Sender      1st Recipient   2nd Recipient
  Transfer 1:  DATA #0 --> UNDEF      -->  UNDEF     -->
  Transfer 2:  DATA #1 --> DATA #0    -->  UNDEF     -->
  Transfer 3:  DATA #2 --> DATA #1    -->  DATA #0   -->
  Transfer 4:  DATA #3 --> DATA #2    -->  DATA #1   -->
The recipients should not output any data of their own; instead they should forward the previously received data to the next recipient during the next transfer (just keep the incoming data unmodified in the data register).
Due to the delayed forwarding, the 2nd recipient should ignore the first incoming data. After the last transfer, the sender must send one (or more) dummy data unit(s), so that the last data is forwarded to the 2nd (or further) recipient(s).
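
The forwarding delay can be reproduced with a small host-side model. This is only an illustration of the table above: CHAIN_UNITS and oneway_transfer are hypothetical names, and each array element stands for the data register of one GBA.

```c
#include <stdint.h>

#define CHAIN_UNITS 3   /* sender + two recipients, as in the table above */

/* Hypothetical host-side model: reg[0] is the sender's data register,
   reg[1..] are the recipients' data registers. One call = one transfer. */
static void oneway_transfer(uint32_t reg[CHAIN_UNITS], uint32_t next_from_sender)
{
    /* All units shift at the same time, so walk from the end of the
       chain towards the sender to avoid overwriting unsent data. */
    for (int i = CHAIN_UNITS - 1; i > 0; i--)
        reg[i] = reg[i - 1];
    reg[0] = next_from_sender;   /* sender loads the following data unit */
}
```

Starting with reg[0] holding DATA #0, the 2nd recipient sees DATA #0 only after the second call, and the sender needs one extra call with dummy data to push the final unit through.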

  SIO Multi-Player Mode

Multi-Player mode can be used to communicate between up to 4 units.

4000134h - RCNT (R) - Mode Selection, in Normal/Multiplayer/UART modes (R/W)
  Bit   Expl.
  0-3   Undocumented (current SC,SD,SI,SO state, as for General Purpose mode)
  4-8   Not used     (Should be 0, bits are read/write-able though)
  9-13  Not used     (Always 0, read only)
  14    Not used     (Should be 0, bit is read/write-able though)
  15    Must be zero (0) for Normal/Multiplayer/UART modes
Note: Even though undocumented, many Nintendo games use Bit 0 to test the current SC state in multiplay mode.

4000128h - SIOCNT - SIO Control, usage in MULTI-PLAYER Mode (R/W)
  Bit   Expl.
  0-1   Baud Rate     (0-3: 9600,38400,57600,115200 bps)
  2     SI-Terminal   (0=Parent, 1=Child)                  (Read Only)
  3     SD-Terminal   (0=Bad connection, 1=All GBAs Ready) (Read Only)
  4-5   Multi-Player ID     (0=Parent, 1-3=1st-3rd child)  (Read Only)
  6     Multi-Player Error  (0=Normal, 1=Error)            (Read Only)
  7     Start/Busy Bit      (0=Inactive, 1=Start/Busy) (Read Only for Slaves)
  8-11  Not used            (R/W, should be 0)
  12    Must be "0" for Multi-Player mode
  13    Must be "1" for Multi-Player mode
  14    IRQ Enable          (0=Disable, 1=Want IRQ upon completion)
  15    Not used            (Read only, always 0)
The ID Bits are undefined until the first transfer has completed.

400012Ah - SIOMLT_SEND - Data Send Register (R/W)
Outgoing data (16 bit) which is to be sent to the other GBAs.

4000120h - SIOMULTI0 - SIO Multi-Player Data 0 (Parent) (R/W)
4000122h - SIOMULTI1 - SIO Multi-Player Data 1 (1st child) (R/W)
4000124h - SIOMULTI2 - SIO Multi-Player Data 2 (2nd child) (R/W)
4000126h - SIOMULTI3 - SIO Multi-Player Data 3 (3rd child) (R/W)
These registers are automatically reset to FFFFh upon transfer start.
After transfer, these registers contain incoming data (16bit each) from all remote GBAs (if any / otherwise still FFFFh), as well as the local outgoing SIOMLT_SEND data.
Ie. after the transfer, all connected GBAs will contain the same values in their SIOMULTI0-3 registers.

Initialization
- Initialize RCNT Bit 14-15 and SIOCNT Bit 12-13 to select Multi-Player mode.
- Read SIOCNT Bit 3 to verify that all GBAs are in Multi-Player mode.
- Read SIOCNT Bit 2 to detect whether this is the Parent/Master unit.
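
The bit positions from the SIOCNT table above can be wrapped in a few small helpers. The names below are hypothetical and the functions only operate on values passed in by the caller (they do not access 4000128h themselves).

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helpers; bit positions follow the Multi-Player SIOCNT table. */
static uint16_t multi_siocnt_init(unsigned baud, bool irq)
{
    /* Bit 12 = 0 and Bit 13 = 1 select Multi-Player mode. */
    return (uint16_t)((baud & 3u) | (1u << 13) | (irq ? (1u << 14) : 0u));
}

static bool     multi_is_parent(uint16_t siocnt) { return (siocnt & (1u << 2)) == 0; }
static bool     multi_all_ready(uint16_t siocnt) { return (siocnt & (1u << 3)) != 0; }
static unsigned multi_id(uint16_t siocnt)        { return (siocnt >> 4) & 3u; }
static bool     multi_error(uint16_t siocnt)     { return (siocnt & (1u << 6)) != 0; }
```

Note that multi_id() is meaningful only after the first completed transfer, since the ID bits are undefined before that.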

Recommended Transmission Procedure
- Write outgoing data to SIOMLT_SEND.
- Master must set Start bit.
- All units must process received data in SIOMULTI0-3 when transfer completed.
- After the first successful transfer, ID Bits in SIOCNT are valid.
- If more data is to be transferred, repeat procedure.
The parent unit blindly sends data regardless of whether the children have already processed old data or supplied new data. So, the parent unit might need to insert delays between transfers, and/or perform error checking.
Also, slave units may signal that they are not ready by temporarily switching into another communication mode (which does not output SD High, as Multi-Player mode does during inactivity).

Transfer Protocol
- The master's SI pin is always LOW.
- When all GBAs are in Multiplayer mode (ready) SD is HIGH.
- When master starts the transfer, it sets SC=LOW, slaves receive Busy bit.
Step A
- ID Bits in master unit are set to 0.
- Master outputs Startbit (LOW), 16bit Data, Stopbit (HIGH) through SD.
- This data is written to SIOMULTI0 of all GBAs (including master).
- Master forwards LOW from its SO to 1st child's SI.
- Transfer ends if next child does not output data after a certain time.
Step B
- ID Bits in 1st child unit are set to 1.
- 1st Child outputs Startbit (LOW), 16bit Data, Stopbit (HIGH) through SD.
- This data is written to SIOMULTI1 of all GBAs (including 1st child).
- 1st child forwards LOW from its SO to 2nd child's SI.
- Transfer ends if next child does not output data after a certain time.
Step C
- ID Bits in 2nd child unit are set to 2.
- 2nd Child outputs Startbit (LOW), 16bit Data, Stopbit (HIGH) through SD.
- This data is written to SIOMULTI2 of all GBAs (including 2nd child).
- 2nd child forwards LOW from its SO to 3rd child's SI.
- Transfer ends if next child does not output data after a certain time.
Step D
- ID Bits in 3rd child unit are set to 3.
- 3rd Child outputs Startbit (LOW), 16bit Data, Stopbit (HIGH) through SD.
- This data is written to SIOMULTI3 of all GBAs (including 3rd child).
- Transfer ends (this was the last child).
Transfer end
- Master sets SC=HIGH, all GBAs set SO=HIGH.
- The Start/Busy bits of all GBAs are automatically cleared.
- Interrupts are requested in all GBAs (as far as enabled).
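
The net result of Steps A-D can be summarized by a small host-side model. The function names are hypothetical, and the model ignores timing and error conditions; it only shows which values end up in SIOMULTI0-3.

```c
#include <stdint.h>

/* Hypothetical host-side model of one Multi-Player transfer.
   send[i] is the outgoing data of unit i (0=parent); 'units' is the number
   of connected GBAs (1..4). Every GBA ends up with the same multi[]. */
static void multi_transfer(const uint16_t send[4], int units, uint16_t multi[4])
{
    for (int i = 0; i < 4; i++)
        multi[i] = 0xFFFF;            /* registers are reset at transfer start */
    for (int i = 0; i < units && i < 4; i++)
        multi[i] = send[i];           /* Step A..D: unit i drives SD */
}

/* Counts the leading slots that hold data. A unit that really sent FFFFh
   looks the same as a missing unit, so this is only a heuristic. */
static int multi_units_seen(const uint16_t multi[4])
{
    int n = 0;
    while (n < 4 && multi[n] != 0xFFFF)
        n++;
    return n;
}
```

Because FFFFh doubles as the "no data" marker, protocols built on this mode usually avoid sending FFFFh as payload.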

Error Bit
This bit is set when a slave did not receive SI=LOW even though SC=LOW signaled a transfer (this might happen when connecting more than 4 GBAs, or when the previous child is not connected). Also, the bit is set when a Stopbit wasn't HIGH.
The error bit may be undefined during active transfer - read only after transfer completion (the transfer continues and completes as normal even if errors have occurred for some or all GBAs).
Unknown: whether the bit is automatically reset/initialized with each transfer, or must be manually reset.

Transmission Time
The transmission time depends on the selected Baud rate, on the number of bits (16 data bits plus start/stop bits for each GBA), on the delays between data for each GBA, and on the final timeout (if fewer than 4 GBAs are connected). That is, depending on the number of connected GBAs:
  GBAs    Bits    Delays   Timeout
  1       18      None     Yes
  2       36      1        Yes
  3       54      2        Yes
  4       72      3        None
(The average Delay and Timeout periods are unknown?)
Above is not counting the additional CPU time that must be spent on initiating and processing each transfer.
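
A lower bound for the duration of one transfer follows directly from the bit counts in the table. The sketch below uses hypothetical function names and deliberately leaves out the inter-unit delays and the final timeout, since their lengths are unknown.

```c
/* Hypothetical helpers: bits on the wire for one Multi-Player transfer
   (start bit + 16 data bits + stop bit per GBA) and the resulting minimum
   duration. Delays and the final timeout are NOT included. */
static unsigned multi_bits(unsigned gbas)
{
    return 18u * gbas;
}

static double multi_min_time_us(unsigned gbas, unsigned baud)
{
    return multi_bits(gbas) * 1e6 / baud;
}
```

The real transfer time is longer than this value by the unknown delays, plus the CPU time spent on initiating and processing each transfer.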

Fast One-Way Transmission
Besides the actual SIO Multiplayer mode, you can also use SIO Normal mode for fast one-way data transfer from the Master unit to all Child unit(s). See the chapter about SIO Normal mode for details.


  SIO UART Mode

This mode works much like an RS232 port; however, the voltages are unknown (probably 0/3V rather than +/-12V?). SI and SO are data lines (with crossed wires), SC and SD signal Clear to Send (also with crossed wires, which requires a special cable when linking two GBAs?).

4000134h - RCNT (R) - Mode Selection, in Normal/Multiplayer/UART modes (R/W)
  Bit   Expl.
  0-3   Undocumented (current SC,SD,SI,SO state, as for General Purpose mode)
  4-8   Not used     (Should be 0, bits are read/write-able though)
  9-13  Not used     (Always 0, read only)
  14    Not used     (Should be 0, bit is read/write-able though)
  15    Must be zero (0) for Normal/Multiplayer/UART modes

4000128h - SIOCNT - SIO Control, usage in UART Mode (R/W)
  Bit   Expl.
  0-1   Baud Rate  (0-3: 9600,38400,57600,115200 bps)
  2     CTS Flag   (0=Send always/blindly, 1=Send only when SC=LOW)
  3     Parity Control (0=Even, 1=Odd)
  4     Send Data Flag      (0=Not Full,  1=Full)    (Read Only)
  5     Receive Data Flag   (0=Not Empty, 1=Empty)   (Read Only)
  6     Error Flag          (0=No Error,  1=Error)   (Read Only)
  7     Data Length         (0=7bits,   1=8bits)
  8     FIFO Enable Flag    (0=Disable, 1=Enable)
  9     Parity Enable Flag  (0=Disable, 1=Enable)
  10    Send Enable Flag    (0=Disable, 1=Enable)
  11    Receive Enable Flag (0=Disable, 1=Enable)
  12    Must be "1" for UART mode
  13    Must be "1" for UART mode
  14    IRQ Enable          (0=Disable, 1=IRQ when any Bit 4/5/6 become set)
  15    Not used            (Read only, always 0)

400012Ah - SIODATA8 - usage in UART Mode (R/W)
Addresses the send/receive shift register, or (when FIFO is used) the send/receive FIFO. In either case only the lower 8bit of SIODATA8 are used, the upper 8bit are not used.
The send/receive FIFO may store up to four 8bit data units each. For example, while 1 unit is still transferred from the send shift register, it is possible to deposit another 4 units in the send FIFO, which are then automatically moved to the send shift register one after each other.

Send/Receive Enable, CTS Feedback
The receiver outputs SD=LOW (which is input as SC=LOW at the remote side) when it is ready to receive data (that is, when Receive Enable is set, and the Receive shift register (or receive FIFO) isn't full).
When CTS flag is set to always/blindly, then the sender transmits data immediately when Send Enable is set, otherwise data is transmitted only when Send Enable is set and SC is LOW.

Error Flag
The error flag is set when a bad stop bit has been received (stop bit must be 0), when a parity error has occurred (if enabled), or when new data has been completely received while the receive data register (or receive FIFO) is already full.
The error flag is automatically reset when reading from SIOCNT register.

Init & Initback
The content of the FIFO is reset when FIFO is disabled in UART mode, thus, when entering UART mode initially set FIFO=disabled.
The Send/Receive enable bits must be reset before switching from UART mode into another SIO mode!
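
The UART control bits and the two rules above (enter UART mode with FIFO disabled, and the CTS send condition) can be sketched as follows. The struct, its field names, and the function names are hypothetical; only the bit positions are taken from the SIO Control table for UART mode.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical configuration record; the field names are illustrative. */
struct uart_cfg {
    unsigned baud;           /* 0-3: 9600, 38400, 57600, 115200 bps */
    bool cts;                /* true = send only when SC=LOW */
    bool parity_odd;
    bool parity_enable;
    bool data8;              /* false = 7 bits, true = 8 bits */
    bool fifo, send, recv, irq;
};

/* Assembles the SIO Control value for UART mode. */
static uint16_t uart_siocnt(const struct uart_cfg *c)
{
    unsigned v = c->baud & 3u;
    v |= (unsigned)c->cts           << 2;
    v |= (unsigned)c->parity_odd    << 3;
    v |= (unsigned)c->data8         << 7;
    v |= (unsigned)c->fifo          << 8;
    v |= (unsigned)c->parity_enable << 9;
    v |= (unsigned)c->send          << 10;
    v |= (unsigned)c->recv          << 11;
    v |= 3u << 12;                       /* bits 12,13 = 1: UART mode */
    v |= (unsigned)c->irq           << 14;
    return (uint16_t)v;
}

/* First value to write when entering UART mode: same settings with the
   FIFO bit cleared (which resets the FIFO). Write uart_siocnt() afterwards. */
static uint16_t uart_siocnt_entry(const struct uart_cfg *c)
{
    return (uint16_t)(uart_siocnt(c) & ~(1u << 8));
}

/* Sender-side rule from "Send/Receive Enable, CTS Feedback". */
static bool uart_may_send(uint16_t siocnt, bool sc_low)
{
    bool send_enabled = (siocnt & (1u << 10)) != 0;
    bool cts_checked  = (siocnt & (1u << 2)) != 0;
    return send_enabled && (!cts_checked || sc_low);
}
```

On hardware the caller would write the value from uart_siocnt_entry() first and the value from uart_siocnt() second, and would clear the Send/Receive enable bits again before leaving UART mode.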


  SIO JOY BUS Mode

This communication mode uses Nintendo's standardized JOY Bus protocol. When using this communication mode, the GBA is always operated as SLAVE!

In this mode, SI and SO pins are data lines (apparently synchronized by Start/Stop bits?), SC and SD are set to low (including during active transfer?), the transfer rate is unknown?

4000134h - RCNT (R) - Mode Selection, in JOY BUS mode (R/W)
  Bit   Expl.
  0-3   Undocumented (current SC,SD,SI,SO state, as for General Purpose mode)
  4-8   Not used     (Should be 0, bits are read/write-able though)
  9-13  Not used     (Always 0, read only)
  14    Must be "1" for JOY BUS Mode
  15    Must be "1" for JOY BUS Mode

4000128h - SIOCNT - SIO Control, not used in JOY BUS Mode
This register is not used in JOY BUS mode.

4000140h - JOYCNT - JOY BUS Control Register (R/W)
  Bit   Expl.
  0     Device Reset Flag     (Command FFh)          (Read/Acknowledge)
  1     Receive Complete Flag (Command 14h or 15h?)  (Read/Acknowledge)
  2     Send Complete Flag    (Command 15h or 14h?)  (Read/Acknowledge)
  3-5   Not used
  6     IRQ when receiving a Device Reset Command  (0=Disable, 1=Enable)
  7-31  Not used
Bit 0-2 are working much like the bits in the IF register: Write a "1" bit to reset (acknowledge) the respective bit.
UNCLEAR: Interrupts can be requested for Send/Receive commands also?

4000150h - JOY_RECV_L - Receive Data Register low (R/W)
4000152h - JOY_RECV_H - Receive Data Register high (R/W)
4000154h - JOY_TRANS_L - Send Data Register low (R/W)
4000156h - JOY_TRANS_H - Send Data Register high (R/W)
Send/receive data registers.

4000158h - JOYSTAT - Receive Status Register (R/W)
  Bit   Expl.
  0     Not used
  1     Receive Status Flag   (0=Remote GBA is/was receiving) (Read Only?)
  2     Not used
  3     Send Status Flag      (1=Remote GBA is/was sending)   (Read Only?)
  4-5   General Purpose Flag  (Not assigned, may be used for whatever purpose)
  6-31  Not used
Bit 1 is automatically set when writing to local JOY_TRANS.
Bit 3 is automatically reset when reading from local JOY_RECV.

Below are the four possible commands which can be received by the GBA. Note that the GBA (slave) cannot send any commands itself, all it can do is to read incoming data, and to provide 'reply' data which may (or may not) be read out by the master unit.

Command FFh - Device Reset
  Receive FFh (Command)
  Send    00h (GBA Type number LSB (or MSB?))
  Send    04h (GBA Type number MSB (or LSB?))
  Send    XXh (lower 8bits of JOYSTAT register)

Command 00h - Type/Status Data Request
  Receive 00h (Command)
  Send    00h (GBA Type number LSB (or MSB?))
  Send    04h (GBA Type number MSB (or LSB?))
  Send    XXh (lower 8bits of JOYSTAT register)

Command 15h - GBA Data Write (to GBA)
  Receive 15h (Command)
  Receive XXh (Lower 8bits of JOY_RECV_L)
  Receive XXh (Upper 8bits of JOY_RECV_L)
  Receive XXh (Lower 8bits of JOY_RECV_H)
  Receive XXh (Upper 8bits of JOY_RECV_H)
  Send    XXh (lower 8bits of JOYSTAT register)

Command 14h - GBA Data Read (from GBA)
  Receive 14h (Command)
  Send    XXh (Lower 8bits of JOY_TRANS_L)
  Send    XXh (Upper 8bits of JOY_TRANS_L)
  Send    XXh (Lower 8bits of JOY_TRANS_H)
  Send    XXh (Upper 8bits of JOY_TRANS_H)
  Send    XXh (lower 8bits of JOYSTAT register)
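
The four command sequences above can be expressed as a small responder function for testing on a host machine. This is a sketch under assumptions: the struct and function names are hypothetical, the status byte is taken to be the lower 8 bits of JOYSTAT (4000158h), and the flag side effects in JOYCNT/JOYSTAT are not modelled.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical model of the GBA-side JOY BUS registers (not real I/O). */
struct joybus {
    uint32_t recv;     /* JOY_RECV_H:JOY_RECV_L   */
    uint32_t trans;    /* JOY_TRANS_H:JOY_TRANS_L */
    uint8_t  joystat;  /* lower 8 bits of the status register */
};

/* Builds the reply to one command. 'in' holds the command byte followed by
   any payload bytes. Returns the number of reply bytes written to 'out',
   or 0 for an unknown or truncated command. */
static size_t joybus_reply(struct joybus *j, const uint8_t *in, size_t in_len,
                           uint8_t out[5])
{
    if (in_len == 0)
        return 0;
    switch (in[0]) {
    case 0xFF:                          /* Device Reset */
    case 0x00:                          /* Type/Status Data Request */
        out[0] = 0x00;                  /* GBA type number, as listed above */
        out[1] = 0x04;
        out[2] = j->joystat;
        return 3;
    case 0x15:                          /* GBA Data Write (to GBA) */
        if (in_len < 5)
            return 0;
        j->recv = (uint32_t)in[1] | ((uint32_t)in[2] << 8) |
                  ((uint32_t)in[3] << 16) | ((uint32_t)in[4] << 24);
        out[0] = j->joystat;
        return 1;
    case 0x14:                          /* GBA Data Read (from GBA) */
        out[0] = (uint8_t)(j->trans);
        out[1] = (uint8_t)(j->trans >> 8);
        out[2] = (uint8_t)(j->trans >> 16);
        out[3] = (uint8_t)(j->trans >> 24);
        out[4] = j->joystat;
        return 5;
    default:
        return 0;
    }
}
```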

  SIO General-Purpose Mode

In this mode, the SIO is 'misused' as a 4bit bi-directional parallel port, each of the SI,SO,SC,SD pins may be directly controlled, each can be separately declared as input (with internal pull-up) or as output signal.

4000134h - RCNT (R) - SIO Mode, usage in GENERAL-PURPOSE Mode (R/W)
Interrupts can be requested when SI changes from HIGH to LOW. As General Purpose mode does not require a serial shift clock, this interrupt may be produced even when the GBA is in Stop (low power standby) state.
  Bit   Expl.
  0     SC Data Bit         (0=Low, 1=High)
  1     SD Data Bit         (0=Low, 1=High)
  2     SI Data Bit         (0=Low, 1=High)
  3     SO Data Bit         (0=Low, 1=High)
  4     SC Direction        (0=Input, 1=Output)
  5     SD Direction        (0=Input, 1=Output)
  6     SI Direction        (0=Input, 1=Output, but see below)
  7     SO Direction        (0=Input, 1=Output)
  8     SI Interrupt Enable (0=Disable, 1=Enable)
  9-13  Not used
  14    Must be "0" for General-Purpose Mode
  15    Must be "1" for General-Purpose or JOYBUS Mode
SI should be always used as Input to avoid problems with other hardware which does not expect data to be output there.

4000128h - SIOCNT - SIO Control, not used in GENERAL-PURPOSE Mode
This register is not used in general purpose mode. That is, the separate bits of SIOCNT still exist and are read- and/or write-able in the same manner as for Normal, Multiplay, or UART mode (depending on SIOCNT Bit 12,13), but have no effect on data being output to the link port.

  SIO Control Registers Summary

Mode Selection (by RCNT.15-14 and SIOCNT.13-12)
  R.15 R.14 S.13 S.12 Mode
  0    x    0    0    Normal 8bit
  0    x    0    1    Normal 32bit
  0    x    1    0    Multiplay 16bit
  0    x    1    1    UART (RS232)
  1    0    x    x    General Purpose
  1    1    x    x    JOY BUS
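
The mode selection table translates into a short decoder. The enum and function names below are hypothetical; the logic follows the table row by row.

```c
#include <stdint.h>

enum sio_mode {
    SIO_NORMAL8, SIO_NORMAL32, SIO_MULTI, SIO_UART, SIO_GPIO, SIO_JOYBUS
};

/* Hypothetical helper: decodes the active mode from RCNT and SIOCNT
   values as listed in the mode selection table. */
static enum sio_mode sio_decode_mode(uint16_t rcnt, uint16_t siocnt)
{
    if (rcnt & 0x8000)                       /* R.15 = 1: SIOCNT is ignored */
        return (rcnt & 0x4000) ? SIO_JOYBUS : SIO_GPIO;

    switch ((siocnt >> 12) & 3) {            /* R.15 = 0: S.13,S.12 decide */
    case 0:  return SIO_NORMAL8;
    case 1:  return SIO_NORMAL32;
    case 2:  return SIO_MULTI;
    default: return SIO_UART;
    }
}
```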

  Bit    0      1    2     3      4 5 6   7     8    9      10   11
  Normal Master Rate SI/In SO/Out - - -   Start -    -      -    -
  Multi  Baud   Baud SI/In SD/In  ID# Err Start -    -      -    -
  UART   Baud   Baud CTS   Parity S R Err Bits  FIFO Parity Send Recv

  GBA Wireless Adapter

GBA Wireless Adapter (AGB-015 or OXY-004)
GBA Wireless Adapter Games
GBA Wireless Adapter Login
GBA Wireless Adapter Commands
GBA Wireless Adapter Component Lists

  GBA Wireless Adapter Games

GBA Wireless Adapter compatible Games
  bit Generations series (Japan only)
  Boktai 2: Solar Boy Django (Konami)
  Boktai 3: Sabata's Counterattack
  Classic NES Series: Donkey Kong
  Classic NES Series: Dr. Mario
  Classic NES Series: Ice Climber
  Classic NES Series: Pac-Man
  Classic NES Series: Super Mario Bros.
  Classic NES Series: Xevious
  Digimon Racing (Bandai) (No Wireless Adapter support in European release)
  Dragon Ball Z: Buu's Fury (Atari)
  Famicom Mini Series: #13 Balloon Fight
  Famicom Mini Series: #12 Clu Clu Land
  Famicom Mini Series: #16 Dig Dug
  Famicom Mini Series: #02 Donkey Kong
  Famicom Mini Series: #15 Dr. Mario
  Famicom Mini Series: #03 Ice Climber
  Famicom Mini Series: #18 Makaimura
  Famicom Mini Series: #08 Mappy
  Famicom Mini Series: #11 Mario Bros.
  Famicom Mini Series: #06 Pac-Man
  Famicom Mini Series: #30 SD Gundam World Scramble Wars
  Famicom Mini Series: #01 Super Mario Bros.
  Famicom Mini Series: #21 Super Mario Bros. 2
  Famicom Mini Series: #19 Twin Bee
  Famicom Mini Series: #14 Wrecking Crew
  Famicom Mini Series: #07 Xevious
  Hamtaro: Ham-Ham Games (Nintendo)
  Lord of the Rings: The Third Age, The (EA Games)
  Mario Golf: Advance Tour (Nintendo)
  Mario Tennis: Power Tour (Nintendo)
  Mega Man Battle Network 5: Team Protoman (Capcom)
  Mega Man Battle Network 5: Team Colonel (Capcom)
  Mega Man Battle Network 6: Cybeast Falzar
  Mega Man Battle Network 6: Cybeast Gregar
  Momotaro Dentetsu G: Make a Gold Deck! (Japan only)
  Pokemon Emerald (Nintendo)
  Pokemon FireRed (Nintendo)
  Pokemon LeafGreen (Nintendo)
  Sennen Kazoku (Japan only)
  Shrek SuperSlam
  Sonic Advance 3

  GBA Wireless Adapter Login

GBA Wireless Adapter Login
  rcnt=8000h    ;\
  rcnt=80A0h    ;
  rcnt=80A2h    ; reset adapter or so
  wait          ;
  rcnt=80A0h    ;/
  siocnt=5003h  ;\set 32bit normal mode, 2MHz internal clock
  rcnt=0000h    ;/
  passes=0, index=0
 @@lop:
  passes=passes+1, if passes>32 then ERROR  ;give up (usually only 10 passes)
  recv.lo=siodata AND FFFFh    ;response from adapter
  recv.hi=siodata/10000h       ;adapter's own "NI" data
  if send.hi<>recv.lo then index=0, goto @@stuck  ;<-- fallback to index=0
  if (send.lo XOR FFFFh)<>recv.lo then goto @@stuck
  if (send.hi XOR FFFFh)<>recv.hi then goto @@stuck
  index=index+1                        ;<-- reply matched, next halfword
 @@stuck:
  send.lo=halfword[@@key_string+index*2]
  send.hi=recv.hi XOR FFFFh
  siodata=send.lo+(send.hi*10000h)
  siocnt.bit7=1                        ;<-- start transmission
  wait until siocnt.bit7=0             ;<-- wait for transfer completion
  if index<4 then goto @@lop
  ret
 @@key_string db 'NINTENDO',01h,80h    ;10 bytes (5 halfwords; index=0..4)

Data exchanged during Login
               GBA                         ADAPTER
               xxxx494E ;\     <-->        xxxxxxxx
               xxxx494E ; "NI" <--> "NI"/; 494EB6B1 ;\
  NOT("NI") /; B6B1494E ;/     <-->     \; 494EB6B1 ; NOT("NI")
            \; B6B1544E ;\"NT" <--> "NT"/; 544EB6B1 ;/
  NOT("NT") /; ABB1544E ;/     <-->     \; 544EABB1 ;\NOT("NT")
            \; ABB14E45 ;\"EN" <--> "EN"/; 4E45ABB1 ;/
  NOT("EN") /; B1BA4E45 ;/     <-->     \; 4E45B1BA ;\NOT("EN")
            \; B1BA4F44 ;\"DO" <--> "DO"/; 4F44B1BA ;/
  NOT("DO") /; B0BB4F44 ;/     <-->     \; 4F44B0BB ;\NOT("DO")
            \; B0BB8001 ;-fin  <-->  fin-; 8001B0BB ;/
                 \   \                      \   \
                  \   LSBs=Own               \   LSBs=Inverse of
                   \   Data.From.Gba          \   Prev.Data.From.Gba
                    \                          \
                     MSBs=Inverse of            MSBs=Own
                      Prev.Data.From.Adapter     Data.From.Adapter
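
The GBA column of the exchange table can be reproduced with a per-transfer step function. This is a sketch, not adapter documentation: the names wl_login and wl_login_step are hypothetical, and it assumes that the index advances by one whenever the adapter's reply matches, and that the outgoing lower halfword is always the key string halfword at the current index (both are suggested by the table).

```c
#include <stdint.h>

/* Hypothetical GBA-side state for the login handshake. */
struct wl_login {
    uint16_t send_lo, send_hi;   /* halves of the word sent last */
    int index;                   /* 0..4, position in the key string */
};

/* 'NINTENDO',01h,80h read as five little-endian halfwords. */
static const uint16_t wl_key[5] = { 0x494E, 0x544E, 0x4E45, 0x4F44, 0x8001 };

/* One pass: takes the 32bit word last received from the adapter and
   returns the next 32bit word the GBA should send. Login is complete
   once s->index has reached 4 and that word has been sent. */
static uint32_t wl_login_step(struct wl_login *s, uint32_t recv)
{
    uint16_t recv_lo = (uint16_t)(recv & 0xFFFF);
    uint16_t recv_hi = (uint16_t)(recv >> 16);

    if (s->send_hi != recv_lo) {
        s->index = 0;                               /* out of sync: restart */
    } else if ((uint16_t)(s->send_lo ^ 0xFFFF) == recv_lo &&
               (uint16_t)(s->send_hi ^ 0xFFFF) == recv_hi) {
        if (s->index < 4)
            s->index++;                             /* reply matched: advance */
    }
    /* otherwise "stuck": repeat the current halfword */

    s->send_lo = wl_key[s->index];
    s->send_hi = (uint16_t)(recv_hi ^ 0xFFFF);
    return ((uint32_t)s->send_hi << 16) | s->send_lo;
}
```

Feeding the ADAPTER column of the table into this function yields the GBA column, with each key halfword sent twice (once with the old and once with the new inverted upper half).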

  GBA Wireless Adapter Commands

Wireless Command/Parameter Transmission
  GBA       Adapter
  9966ppcch 80000000h   ;-send command (cc), and num param_words (pp)
  <param01> 80000000h   ;\
  <param02> 80000000h   ; send "pp" parameter word(s), if any
  ...       ...         ;/
  80000000h 9966rraah   ;-recv ack (aa=cc+80h), and num response_words (rr)
  80000000? <reply01>   ;\
  80000000? <reply02>   ; recv "rr" response word(s), if any
  ...       ...         ;/
Wireless 32bit Transfers
  wait until [4000128h].Bit2=0  ;want SI=0
  set [4000128h].Bit3=1         ;set SO=1
  wait until [4000128h].Bit2=1  ;want SI=1
  set [4000128h].Bit3=0,Bit7=1  ;set SO=0 and start 32bit transfer
All command/param/reply transfers should be done at Internal Clock (except, Response Words for command 25h,27h,35h,37h should use External Clock).
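
The header and acknowledge words shown under "Wireless Command/Parameter Transmission" can be built and checked as sketched below. The function names are hypothetical; the word layout (9966h in the upper half, count in bits 8-15, command or acknowledge code in bits 0-7) is taken from that listing.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: builds the 9966ppcch word for command 'cmd' with
   'num_params' parameter words following. */
static uint32_t wl_cmd_header(uint8_t cmd, uint8_t num_params)
{
    return 0x99660000u | ((uint32_t)num_params << 8) | cmd;
}

/* Hypothetical helper: checks a 9966rraah acknowledge word for command
   'cmd' (aa must equal cc+80h). On success stores rr in *num_replies. */
static bool wl_parse_ack(uint32_t ack, uint8_t cmd, uint8_t *num_replies)
{
    if ((ack >> 16) != 0x9966u)
        return false;
    if ((ack & 0xFF) != (uint8_t)(cmd + 0x80))
        return false;
    *num_replies = (uint8_t)(ack >> 8);
    return true;
}
```

A caller would send the header, then the parameter words, then keep exchanging 80000000h until an acknowledge word arrives, and finally read the announced number of response words.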

Wireless Commands
  Cmd Para Reply Name
  10h -    -     Hello (send immediately after login)
  11h -    1     Good/Bad response to cmd 16h ?
  13h -    1
  16h 6    -     Introduce (send game/user name)
  17h 1    -     Config (send after Hello) (eg. param=003C0420h or 003C043Ch)
  1Ch -    -
  1Dh -    NN    Get Directory? (receive list of game/user names?)
  1Eh -    NN    Get Directory? (receive list of game/user names?)
  1Fh 1    -     Select Game for Download (send 16bit Game_ID)

  20h -    1
  21h -    1     Good/Bad response to cmd 1Fh ?
  24h -    -
  25h                                       ;use EXT clock!
  26h -    -
  27h -    -     Begin Download ?           ;use EXT clock!

  30h 1    -
  35h                                       ;use EXT clock!
  37h                                       ;use EXT clock!
  3Dh -    -     Bye (return to language select)
Special Response 996601EEh for error or so? (only at software side?)

  GBA Wireless Adapter Component Lists

Main Chipset
  U1 32pin Freescale MC13190 (2.4 GHz ISM band transceiver)
  U2 48pin Freescale CT3000 or CT3001 (depending on adapter version)
  X3  2pin 9.5MHz crystal
The MC13190 is a Short-Range, Low-Power 2.4 GHz ISM band transceiver.
The processor is Motorola's 32-bit M-Core RISC engine. (?) MCT3000 (?)

Version with GERMAN Postal Code on sticker:
  Sticker on Case:
    "Pat.Pend.Made in Philipines, CE0125(!)B"
    "MODEL NO./MODELE NO.AGB-015 D-63760 Grossosteim P/AGB-A-WA-EUR-2 E3"
  PCB: "19-C046-04, A-7" (top side) and "B-7" and Microchip ",\\" (bottom side)
  PCB: white stamp "3104, 94V-0, RU, TW-15"
  PCB: black stamp "22FDE"
  U1 32pin "Freescale 13190, 4WFQ" (MC13190) (2.4 GHz ISM band transceiver)
  U2 48pin "Freescale CT3001, XAC0445"  (bottom side)
  X3  2pin "D959L4I" (9.5MHz)           (top side) (ca. 19 clks per 2us)
Further components... top side (A-7)
  D1   5pin "D6F, 44"   (top side, below X3)
  U71  6pin ".., () 2"  (top side, right of X3, tiny black chip)
  B71  6pin "[]"        (top side, right of X3, small white chip)
  ANT  2pin on-board copper wings
  Q?   3pin             (top side, above CN1)
  Q?   3pin             (top side, above CN1)
  D?   2pin "72"        (top side, above CN1)
  D3   2pin "F2"        (top side, above CN1)
  U200 4pin "MSV"       (top side, above CN1)
  U202 5pin "LXKA"      (top side, right of CN1)
  U203 4pin "M6H"       (top side, right of CN1)
  CN1  6pin connector to GBA link port (top side)
Further components... bottom side (B-7)
  U201 5pin "LXVB"      (bottom side, near CN1)
  U72  4pin "BMs"       (bottom side, near ANT, tiny black chip)
  FL70 ?pin "[] o26"    (bottom side, near ANT, bigger white chip)
  B70  6pin "[]"        (bottom side, near ANT, small white chip)
Plus, resistors and capacitors (without any markings).

Version WITHOUT sticker:
  Sticker on Case: N/A
  PCB: "19-C046-03, A-1" (top side) and "B-1" and Microchip ",\\" (bottom side)
  PCB: white stamp "3204, TW-15, RU, 94V-0"
  PCB: black stamp "23MN" or "23NH" or so (smeared)
  U1 32pin "Freescale 13190, 4FGD"      (top side)
  U2 48pin "Freescale CT3000, XAB0425"  (bottom side) ;CT3000 (not CT3001)
  X3  2pin "9.5SKSS4GT"                 (top side)
Further components... top side (A-1)
  D1   5pin "D6F, 31"   (top side, below X3)
  U71  6pin "P3, () 2"  (top side, right of X3, tiny black chip)
  B71  6pin "[]"        (top side, right of X3, small white chip)
  ANT  2pin on-board copper wings
  Q70  3pin             (top side, above CN1)
  D?   2pin "72"        (top side, above CN1)
  D3   2pin "F2"        (top side, above CN1)
  U200 4pin "MSV"       (top side, above CN1)
  U202 5pin "LXKH"      (top side, right of CN1)
  U203 4pin "M6H"       (top side, right of CN1)
  CN1  6pin connector to GBA link port (top side)
Further components... bottom side (B-1)
  U201 5pin "LXV2"      (bottom side, near CN1)
  U70  6pin "AAG"       (bottom side, near ANT, tiny black chip)
  FL70 ?pin "[] o26"    (bottom side, near ANT, bigger white chip)
  B70  6pin "[]"        (bottom side, near ANT, small white chip)
Plus, resistors and capacitors (without any markings).

Major Differences
  Sticker      "N/A"                     vs "Grossosteim P/AGB-A-WA-EUR-2 E3"
  PCB-markings "19-C046-03, A-1, 3204"   vs "19-C046-04, A-7, 3104"
  U2           "CT3000, XAB0425"         vs "CT3001, XAC0445"
  Transistors  One transistor (Q70)      vs Two transistors (both nameless)
  U70/U72      U70 "AAG" (6pin)          vs U72 "BMs" (4pin)
Purpose of the changes is unknown (either older/newer revisions, or different regions with different FCC regulations).

  GBA Infrared Communication

Early GBA prototypes were intended to include a built-in IR port for sending and receiving IR signals. Among other things, this port could have been used to communicate with other GBAs, older CGB models, TV Remote Controls, etc.

The prototype specifications were as shown below.

Keep in mind that the IR signal may be interrupted by any object moving between sender and receiver; the IR port isn't recommended for programs that require realtime data exchange (such as action games).

4000136h - IR - Infrared Register (R/W)
  Bit   Expl.
  0     Transmission Data  (0=LED Off, 1=LED On)
  1     READ Enable        (0=Disable, 1=Enable)
  2     Reception Data     (0=None, 1=Signal received) (Read only)
  3     AMP Operation      (0=Off, 1=On)
  4     IRQ Enable Flag    (0=Disable, 1=Enable)
  5-15  Not used
When IRQ is enabled, an interrupt is requested if the incoming signal was 0.119us Off (2 cycles), followed by 0.536us On (9 cycles) - minimum timing periods each.

Transmission Notes
When transmitting an IR signal, note that it is not a good idea to keep the LED turned On for a very long period (such as when sending a 1 second synchronization pulse). The recipient's circuit would treat such a long signal as "normal IR pollution which is in the air" after a while, and thus ignore the signal.

Reception Notes
Received data is internally latched. Latched data may be read out by setting both READ and AMP bits.
Note: Provided that you don't want to receive your own IR signal, be sure to set Bit 0 to zero before attempting to receive data.

After using the IR port, be sure to reset the register to zero in order to reduce battery power consumption.

  GBA Keypad Input

The built-in GBA gamepad has 4 direction keys, and 6 buttons.

4000130h - KEYINPUT - Key Status (R)
  Bit   Expl.
  0     Button A        (0=Pressed, 1=Released)
  1     Button B        (etc.)
  2     Select          (etc.)
  3     Start           (etc.)
  4     Right           (etc.)
  5     Left            (etc.)
  6     Up              (etc.)
  7     Down            (etc.)
  8     Button R        (etc.)
  9     Button L        (etc.)
  10-15 Not used
It is usually recommended to read this register only once per frame, and to store the current state in memory. As a side effect, this method avoids problems caused by switch bounce when a key is newly released or pressed.
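
A once-per-frame key update can be sketched as follows. The struct and function names are hypothetical; the raw value is passed in by the caller (on hardware it would be read from 4000130h), so the function itself can be tested on any machine.

```c
#include <stdint.h>

#define KEY_MASK 0x03FFu   /* bits 0-9 of KEYINPUT */

struct keys {
    uint16_t held;       /* 1 = currently pressed */
    uint16_t pressed;    /* 1 = newly pressed this frame */
    uint16_t released;   /* 1 = newly released this frame */
};

/* Hypothetical once-per-frame update. 'keyinput' is the raw register
   value (0 = pressed), 'prev_held' is 'held' from the previous frame. */
static struct keys keys_update(uint16_t keyinput, uint16_t prev_held)
{
    struct keys k;
    k.held     = (uint16_t)(~keyinput & KEY_MASK);   /* invert: 1 = pressed */
    k.pressed  = (uint16_t)(k.held & ~prev_held);
    k.released = (uint16_t)(prev_held & ~k.held);
    return k;
}
```

Storing the inverted state keeps the rest of the program free of the "0 = pressed" convention, and the unused bits 10-15 never leak into the result.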

4000132h - KEYCNT - Key Interrupt Control (R/W)
The keypad IRQ function is intended to terminate the very-low-power Stop mode; it is not suitable for processing normal user input. For that purpose, most programs invoke their keypad handlers from within the VBlank IRQ.
  Bit   Expl.
  0     Button A        (0=Ignore, 1=Select)
  1     Button B        (etc.)
  2     Select          (etc.)
  3     Start           (etc.)
  4     Right           (etc.)
  5     Left            (etc.)
  6     Up              (etc.)
  7     Down            (etc.)
  8     Button R        (etc.)
  9     Button L        (etc.)
  10-13 Not used
  14    IRQ Enable Flag (0=Disable, 1=Enable)
  15    IRQ Condition   (0=Logical OR, 1=Logical AND)
In logical OR mode, an interrupt is requested when at least one of the selected buttons is pressed.
In logical AND mode, an interrupt is requested when ALL of the selected buttons are pressed.
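
The two IRQ conditions can be written as a predicate over the KEYINPUT and KEYCNT values. The function name is hypothetical, and this is a model of the description above rather than of the hardware: with an empty selection it reports a request in AND mode, a case that is not described here.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of the keypad IRQ condition. */
static bool keypad_irq_requested(uint16_t keyinput, uint16_t keycnt)
{
    uint16_t held     = (uint16_t)(~keyinput & 0x03FFu);  /* 1 = pressed */
    uint16_t selected = (uint16_t)(keycnt & 0x03FFu);

    if (!(keycnt & (1u << 14)))
        return false;                          /* IRQ Enable Flag is off */
    if (keycnt & (1u << 15))
        return (held & selected) == selected;  /* logical AND: all pressed */
    return (held & selected) != 0;             /* logical OR: any pressed */
}
```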

In 8bit gameboy compatibility mode, the L and R Buttons are used to toggle the screen size between normal 160x144 pixels and stretched 240x144 pixels.
The GBA SP additionally has a * Button used to toggle the backlight on and off (controlled by separate hardware logic; there's no way to detect or change the current backlight state by software).

  GBA Interrupt Control

4000208h - IME - Interrupt Master Enable Register (R/W)
  Bit   Expl.
  0     Disable all interrupts         (0=Disable All, 1=See IE register)
  1-31  Not used

4000200h - IE - Interrupt Enable Register (R/W)
  Bit   Expl.
  0     LCD V-Blank                    (0=Disable)
  1     LCD H-Blank                    (etc.)
  2     LCD V-Counter Match            (etc.)
  3     Timer 0 Overflow               (etc.)
  4     Timer 1 Overflow               (etc.)
  5     Timer 2 Overflow               (etc.)
  6     Timer 3 Overflow               (etc.)
  7     Serial Communication           (etc.)
  8     DMA 0                          (etc.)
  9     DMA 1                          (etc.)
  10    DMA 2                          (etc.)
  11    DMA 3                          (etc.)
  12    Keypad                         (etc.)
  13    Game Pak (external IRQ source) (etc.)
  14-15 Not used
Note that there is another 'master enable flag' directly in the CPU's Status Register (CPSR) accessible in privileged modes, see CPU reference for details.

4000202h - IF - Interrupt Request Flags / IRQ Acknowledge (R/W, see below)
  Bit   Expl.
  0     LCD V-Blank                    (1=Request Interrupt)
  1     LCD H-Blank                    (etc.)
  2     LCD V-Counter Match            (etc.)
  3     Timer 0 Overflow               (etc.)
  4     Timer 1 Overflow               (etc.)
  5     Timer 2 Overflow               (etc.)
  6     Timer 3 Overflow               (etc.)
  7     Serial Communication           (etc.)
  8     DMA 0                          (etc.)
  9     DMA 1                          (etc.)
  10    DMA 2                          (etc.)
  11    DMA 3                          (etc.)
  12    Keypad                         (etc.)
  13    Game Pak (external IRQ source) (etc.)
  14-15 Not used
Interrupts must be manually acknowledged by writing a "1" to one of the IRQ bits; the IRQ bit will then be cleared.

"[Cautions regarding clearing IME and IE]
A corresponding interrupt could occur even while a command to clear IME or each flag of the IE register is being executed. When clearing a flag of IE, you need to clear IME in advance so that mismatching of interrupt checks will not occur." ?

"[When multiple interrupts are used]
When the timing of clearing of IME and the timing of an interrupt agree, multiple interrupts will not occur during that interrupt. Therefore, set (enable) IME after saving IME during the interrupt routine." ?

BIOS Interrupt handling
Upon interrupt execution, the CPU is switched into IRQ mode, and the physical interrupt vector is called - as this address is located in BIOS ROM, the BIOS will always execute the following code before it forwards control to the user handler:
  00000018  b      128h                ;IRQ vector: jump to actual BIOS handler
  00000128  stmfd  r13!,r0-r3,r12,r14  ;save registers to SP_irq
  0000012C  mov    r0,4000000h         ;ptr+4 to 03FFFFFC (mirror of 03007FFC)
  00000130  add    r14,r15,0h          ;retadr for USER handler $+8=138h
  00000134  ldr    r15,[r0,-4h]        ;jump to [03FFFFFC] USER handler
  00000138  ldmfd  r13!,r0-r3,r12,r14  ;restore registers from SP_irq
  0000013C  subs   r15,r14,4h          ;return from IRQ (PC=LR-4, CPSR=SPSR)
As shown above, a pointer to the 32bit/ARM-code user handler must be setup in [03007FFCh]. By default, 160 bytes of memory are reserved for interrupt stack at 03007F00h-03007F9Fh.

Recommended User Interrupt handling
- If necessary switch to THUMB state manually (handler is called in ARM state)
- Determine reason(s) of interrupt by examining IF register
- User program may freely assign priority to each reason by own logic
- Process the most important reason of your choice
- User MUST manually acknowledge by writing to IF register
- If user wants to allow nested interrupts, save SPSR_irq, then enable IRQs.
- If using other registers than BIOS-pushed R0-R3, manually save R4-R11 also.
- Note that Interrupt Stack is used (which may have limited size)
- So, for memory consuming stack operations use system mode (=user stack).
- When calling subroutines in system mode, save LR_usr also.
- Restore SPSR_irq and/or R4-R11 if you've saved them above.
- Finally, return to BIOS handler by BX LR (R14_irq) instruction.
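
The "examine IF, pick the most important reason" steps above might be sketched as follows. This is plain C that models the selection logic only (no register access, no mode switching); irq_pick and its priority list are hypothetical names chosen for this example.

```c
#include <stdint.h>

/* Returns the single IF bit mask that should be serviced next, or 0 if
   nothing is pending. 'priority' lists IF bit masks from most to least
   important; the order is entirely up to the user program. */
uint16_t irq_pick(uint16_t ie, uint16_t iflags,
                  const uint16_t *priority, int count)
{
    uint16_t pending = (uint16_t)(ie & iflags);
    for (int i = 0; i < count; i++) {
        if (pending & priority[i])
            return priority[i];
    }
    return 0;
}
```

A real handler would then write the returned mask to IF to acknowledge it, process that reason, and return to the BIOS handler. Any reasons still pending cause the IRQ to be taken again.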

Default memory usage at 03007FXX (and mirrored to 03FFFFXX)
  Addr.    Size Expl.
  3007FFCh 4    Pointer to user IRQ handler (32bit ARM code)
  3007FF8h 2    Interrupt Check Flag (for IntrWait/VBlankIntrWait functions)
  3007FF4h 4    Allocated Area
  3007FF0h 4    Pointer to Sound Buffer
  3007FE0h 16   Allocated Area
  3007FA0h 64   Default area for SP_svc Supervisor Stack (4 words/time)
  3007F00h 160  Default area for SP_irq Interrupt Stack (6 words/time)
Memory below 7F00h is free for User Stack and user data. The three stack pointers are initially set to the TOP of the respective areas:
  SP_svc=03007FE0h
  SP_irq=03007FA0h
  SP_usr=03007F00h
The user may redefine these addresses and move stacks into other locations, however, the addresses for system data at 7FE0h-7FFFh are fixed.

Not sure, is following free for user ?
Registers R8-R12_fiq, R13_fiq, R14_fiq, SPSR_fiq
Registers R13-R14_abt, SPSR_abt
Registers R13-R14_und, SPSR_und

Fast Interrupt (FIQ)
The ARM CPU provides two interrupt sources, IRQ and FIQ. In the GBA only IRQ is used. In normal GBAs, the FIQ signal is shorted to VDD35, ie. the signal is always high, and there is no way to generate a FIQ by hardware. The registers R8..12_fiq could be used by software (when switching into FIQ mode by writing to CPSR) - however, this might make the game incompatible with hardware debuggers (which are reportedly using FIQs for debugging purposes).

  GBA System Control

4000204h - WAITCNT - Waitstate Control (R/W)
This register is used to configure game pak access timings. The game pak ROM is mirrored to three address regions at 08000000h, 0A000000h, and 0C000000h, these areas are called Wait State 0-2. Different access timings may be assigned to each area (this might be useful in case that a game pak contains several ROM chips with different access times each).
  Bit   Expl.
  0-1   SRAM Wait Control          (0..3 = 4,3,2,8 cycles)
  2-3   Wait State 0 First Access  (0..3 = 4,3,2,8 cycles)
  4     Wait State 0 Second Access (0..1 = 2,1 cycles)
  5-6   Wait State 1 First Access  (0..3 = 4,3,2,8 cycles)
  7     Wait State 1 Second Access (0..1 = 4,1 cycles; unlike above WS0)
  8-9   Wait State 2 First Access  (0..3 = 4,3,2,8 cycles)
  10    Wait State 2 Second Access (0..1 = 8,1 cycles; unlike above WS0,WS1)
  11-12 PHI Terminal Output        (0..3 = Disable, 4.19MHz, 8.38MHz, 16.78MHz)
  13    Not used
  14    Game Pak Prefetch Buffer (Pipe) (0=Disable, 1=Enable)
  15    Game Pak Type Flag  (Read Only) (0=GBA, 1=CGB) (IN35 signal)
  16-31 Not used
At startup, the default setting is 0000h. Currently manufactured cartridges are using the following settings: WS0/ROM=3,1 clks; SRAM=8 clks; WS2/EEPROM: 8,8 clks; prefetch enabled; that is, WAITCNT=4317h, for more info see "GBA Cartridges" chapter.
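
As a cross-check of the value 4317h, the bit fields can be packed by a small helper. This is a sketch only; waitcnt_encode is a hypothetical name, and the arguments are the raw 0..3 / 0..1 selector values from the table above, not cycle counts.

```c
#include <stdint.h>

/* Packs the WAITCNT bit fields. All arguments are raw selector values
   as listed in the bit table (eg. ws0_first=1 selects 3 cycles). */
uint16_t waitcnt_encode(unsigned sram,
                        unsigned ws0_first, unsigned ws0_second,
                        unsigned ws1_first, unsigned ws1_second,
                        unsigned ws2_first, unsigned ws2_second,
                        unsigned phi, unsigned prefetch)
{
    return (uint16_t)( ((sram       & 3u) << 0)
                     | ((ws0_first  & 3u) << 2)
                     | ((ws0_second & 1u) << 4)
                     | ((ws1_first  & 3u) << 5)
                     | ((ws1_second & 1u) << 7)
                     | ((ws2_first  & 3u) << 8)
                     | ((ws2_second & 1u) << 10)
                     | ((phi        & 3u) << 11)
                     | ((prefetch   & 1u) << 14) );
}
```

With SRAM=3 (8 clks), WS0=1,1 (3,1 clks), WS1 left at 0,0, WS2=3,0 (8,8 clks), PHI disabled and prefetch enabled, the helper yields 4317h.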

First Access (Non-sequential) and Second Access (Sequential) define the waitstates for N and S cycles, the actual access time is 1 clock cycle PLUS the number of waitstates.
GamePak uses 16bit data bus, so that a 32bit access is split into TWO 16bit accesses (of which, the second fragment is always sequential, even if the first fragment was non-sequential).
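
The two rules above (1 clock plus waitstates, and 32bit accesses split into two 16bit accesses) can be written out as arithmetic. A minimal sketch with hypothetical helper names; prefetch effects are not modelled.

```c
/* Cycles for one 16bit GamePak access: 1 clock plus the waitstates. */
unsigned rom_cycles_16(unsigned waits)
{
    return 1u + waits;
}

/* Cycles for one 32bit GamePak access: two 16bit accesses, of which the
   second one is always sequential. 'first_waits' is the N or S waitstate
   count that applies to the first fragment. */
unsigned rom_cycles_32(unsigned first_waits, unsigned seq_waits)
{
    return rom_cycles_16(first_waits) + rom_cycles_16(seq_waits);
}
```

For example, with the 3,1 setting a non-sequential 32bit read takes 4+2=6 cycles, and with the startup setting 4,2 it takes 5+3=8 cycles.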

GBA GamePak Prefetch

The GBA forcefully uses non-sequential timing at the beginning of each 128K-block of gamepak ROM, eg. "LDMIA [801fff8h],r0-r7" will have non-sequential timing at 8020000h.
The PHI Terminal output (PHI Pin of Gamepak Bus) should be disabled.

4000300h - POSTFLG - BYTE - Undocumented - Post Boot / Debug Control (R/W)
After initial reset, the GBA BIOS initializes the register to 01h, and any further execution of the Reset vector (00000000h) will pass control to the Debug vector (0000001Ch) when sensing the register to be still set to 01h.
  Bit   Expl.
  0     Undocumented. First Boot Flag  (0=First, 1=Further)
  1-7   Undocumented. Not used.
Normally the debug handler rejects control unless it detects Debug flags in cartridge header, in that case it may redirect to a cut-down boot procedure (bypassing Nintendo logo and boot delays, much like nocash burst boot for multiboot software). I am not sure if it is possible to reset the GBA externally without automatically resetting register 300h though.

4000301h - HALTCNT - BYTE - Undocumented - Low Power Mode Control (W)
Writing to this register switches the GBA into battery saving mode.
In Halt mode, the CPU is paused as long as (IE AND IF)=0, this should be used to reduce power-consumption during periods when the CPU is waiting for interrupt events.
In Stop mode, most of the hardware including sound and video are paused, this very-low-power mode could be used much like a screensaver.
  Bit   Expl.
  0-6   Undocumented. Not used.
  7     Undocumented. Power Down Mode  (0=Halt, 1=Stop)
The current GBA BIOS addresses only the upper eight bits of this register (by writing 00h or 80h to address 04000301h), however, as the register isn't officially documented, some or all of the bits might have different meanings in future GBA models.
For best forwards compatibility, it is generally preferable to use the BIOS Functions SWI 2 (Halt) or SWI 3 (Stop) rather than writing to this register directly.

4000410h - Undocumented - Purpose Unknown ? 8bit (W)
The BIOS writes the 8bit value 0FFh to this address. Purpose Unknown.
Probably just another bug in the BIOS.

4000800h - 32bit - Undocumented - Internal Memory Control (R/W)
Supported by GBA and GBA SP only - NOT supported by DS (even in GBA mode).
Also supported by GBA Micro - but crashes on "overclocked" WRAM setting.
Initialized to 0D000020h (by hardware). Unlike all other I/O registers, this register is mirrored across the whole I/O area (in increments of 64K, ie. at 4000800h, 4010800h, 4020800h, ..., 4FF0800h)
  Bit   Expl.
  0     Disable 32K+256K WRAM (0=Normal, 1=Disable) (when off: empty/prefetch)
  1-3   Unknown          (Read/Write-able)
  4     Unknown          (Always zero, not used or write only)
  5     Enable 256K WRAM (0=Disable, 1=Normal) (when off: mirror of 32K WRAM)
  6-23  Unknown          (Always zero, not used or write only)
  24-27 Wait Control WRAM 256K (0-14 = 15..1 Waitstates, 15=Lockup)
  28-31 Unknown          (Read/Write-able)
The default value 0Dh in Bits 24-27 selects 2 waitstates for 256K WRAM (ie. 3/3/6 cycles 8/16/32bit accesses). The fastest possible setting would be 0Eh (1 waitstate, 2/2/4 cycles for 8/16/32bit), that works on GBA and GBA SP only, the GBA Micro locks up with that setting (its on-chip RAM is too slow, and works only with 2 or more waitstates).
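
The relation between Bits 24-27 and the access times quoted above can be sketched like this. The helper names are hypothetical, and the doubling for 32bit accesses is simply inferred from the 3/3/6 and 2/2/4 figures.

```c
#include <stdint.h>

/* Waitstates selected by bits 24-27 (0-14 = 15..1 waitstates).
   Returns -1 for the value 15, which locks up the system. */
int wram256k_waits(uint32_t memctl)
{
    unsigned n = (memctl >> 24) & 0xFu;
    return (n == 15u) ? -1 : (int)(15u - n);
}

/* Cycles for an 8, 16 or 32bit access to 256K WRAM, or -1 on lockup.
   A 32bit access costs twice as much as an 8/16bit access. */
int wram256k_cycles(uint32_t memctl, int width_bits)
{
    int waits = wram256k_waits(memctl);
    if (waits < 0)
        return -1;
    return (width_bits == 32) ? 2 * (1 + waits) : (1 + waits);
}
```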

Note: One cycle equals approx. 59.59ns (ie. 16.78MHz clock).

  GBA GamePak Prefetch

GamePak Prefetch can be enabled in WAITCNT register. When prefetch buffer is enabled, the GBA attempts to read opcodes from Game Pak ROM during periods when the CPU is not using the bus (if any). Memory access is then performed with 0 Waits if the CPU requests data which is already stored in the buffer. The prefetch buffer stores up to eight 16bit values.

GamePak ROM Opcodes
The prefetch feature works only with <opcodes> fetched from GamePak ROM. Opcodes executed in RAM or BIOS are not affected by the prefetch feature (even if those opcodes read <data> from GamePak ROM).

Prefetch Enable
For GamePak ROM opcodes, prefetch may occur in two situations:
  1) opcodes with internal cycles (I) which do not change R15, shift/rotate
     register-by-register, load opcodes (ldr,ldm,pop,swp), multiply opcodes
  2) opcodes that load/store memory (ldr,str,ldm,stm,etc.)

Prefetch Disable Bug
When Prefetch is disabled, the Prefetch Disable Bug will occur for all
  "Opcodes in GamePak ROM with Internal Cycles which do not change R15"
for those opcodes, the bug changes the opcode fetch time from 1S to 1N.
Note: Affected opcodes (with I cycles) are: Shift/rotate register-by-register opcodes, multiply opcodes, and load opcodes (ldr,ldm,pop,swp).

  GBA Cartridges

GBA Cartridge Header
GBA Cartridge ROM

Backup Media
Aside from ROM, cartridges may also include one of the following backup media, used to store game positions, highscore tables, options, or other data.
GBA Cart Backup IDs
GBA Cart Backup EEPROM
GBA Cart Backup Flash ROM
GBA Cart Backup DACS

GBA Cart I/O Port (GPIO)
GBA Cart Real-Time Clock (RTC)
GBA Cart Solar Sensor
GBA Cart Tilt Sensor
GBA Cart Gyro Sensor
GBA Cart Rumble
GBA Cart e-Reader
GBA Cart Unknown Devices
GBA Cart Protections

Other Accessories
GBA Flashcards
GBA Cheat Devices

  GBA Cartridge Header

The first 192 bytes at 8000000h-80000BFh in ROM are used as cartridge header. The same header is also used for Multiboot images at 2000000h-20000BFh (plus some additional multiboot entries at 20000C0h and up).

Header Overview
  Address Bytes Expl.
  000h    4     ROM Entry Point  (32bit ARM branch opcode, eg. "B rom_start")
  004h    156   Nintendo Logo    (compressed bitmap, required!)
  0A0h    12    Game Title       (uppercase ascii, max 12 characters)
  0ACh    4     Game Code        (uppercase ascii, 4 characters)
  0B0h    2     Maker Code       (uppercase ascii, 2 characters)
  0B2h    1     Fixed value      (must be 96h, required!)
  0B3h    1     Main unit code   (00h for current GBA models)
  0B4h    1     Device type      (usually 00h) (bit7=DACS/debug related)
  0B5h    7     Reserved Area    (should be zero filled)
  0BCh    1     Software version (usually 00h)
  0BDh    1     Complement check (header checksum, required!)
  0BEh    2     Reserved Area    (should be zero filled)
  --- Additional Multiboot Header Entries ---
  0C0h    4     RAM Entry Point  (32bit ARM branch opcode, eg. "B ram_start")
  0C4h    1     Boot mode        (init as 00h - BIOS overwrites this value!)
  0C5h    1     Slave ID Number  (init as 00h - BIOS overwrites this value!)
  0C6h    26    Not used         (seems to be unused)
  0E0h    4     JOYBUS Entry Pt. (32bit ARM branch opcode, eg. "B joy_start")
Note: With all entry points, the CPU is initially set into system mode.

000h - Entry Point, 4 Bytes
Space for a single 32bit ARM opcode that redirects to the actual startaddress of the cartridge, this should usually be a "B <start>" instruction.
Note: This entry is ignored by Multiboot slave GBAs (in fact, the entry is then overwritten and redirected to a separate Multiboot Entry Point, as described below).

004h..09Fh - Nintendo Logo, 156 Bytes
Contains the Nintendo logo which is displayed during the boot procedure. Cartridge won't work if this data is missing or modified.
In detail: This area contains Huffman compression data (but excluding the compression header which is hardcoded in the BIOS, so that it'd be probably not possible to hack the GBA by producing de-compression buffer overflows).
A copy of the compression data is stored in the BIOS, the GBA will compare this data and lock itself up if the BIOS data isn't exactly the same as in the cartridge (or multiboot header). The only exception are the two entries below which are allowed to have variable settings in some bits.

09Ch Bit 2,7 - Debugging Enable
This is part of the above Nintendo Logo area, and must be commonly set to 21h, however, Bit 2 and Bit 7 may be set to other values.
When both bits are set (ie. A5h), the FIQ/Undefined Instruction handler in the BIOS becomes unlocked, the handler then forwards these exceptions to the user handler in cartridge ROM (entry point defined in 80000B4h, see below).
Other bit combinations currently do not seem to have special functions.

09Eh Bit 0,1 - Cartridge Key Number MSBs
This is part of the above Nintendo Logo area, and must be commonly set to F8h, however, Bit 0-1 may be set to other values.
During startup, the BIOS performs some dummy-reads from a stream of pre-defined addresses, even though these reads seem to be meaningless, they might be intended to unlock a read-protection inside of commercial cartridge. There are 16 pre-defined address streams - selected by a 4bit key number - of which the upper two bits are gained from 800009Eh Bit 0-1, and the lower two bits from a checksum across header bytes 09Dh..0B7h (bytewise XORed, divided by 40h).

0A0h - Game Title, Uppercase Ascii, max 12 characters
Space for the game title, padded with 00h (if less than 12 chars).

0ACh - Game Code, Uppercase Ascii, 4 characters
This is the same code as the AGB-UTTD code which is printed on the package and sticker on (commercial) cartridges (excluding the leading "AGB-" part).
  U  Unique Code          (usually "A" or "B" or special meaning)
  TT Short Title          (eg. "PM" for Pac Man)
  D  Destination/Language (usually "J" or "E" or "P" or specific language)
The first character (U) is usually "A" or "B", in detail:
  A  Normal game; Older titles (mainly 2001..2003)
  B  Normal game; Newer titles (2003..)
  C  Normal game; Not used yet, but might be used for even newer titles
  F  Famicom/Classic NES Series (software emulated NES games)
  K  Yoshi and Koro Koro Puzzle (acceleration sensor)
  P  e-Reader (dot-code scanner)
  R  Warioware Twisted (cartridge with rumble and z-axis gyro sensor)
  U  Boktai 1 and 2 (cartridge with RTC and solar sensor)
  V  Drill Dozer (cartridge with rumble)
The second/third characters (TT) are:
  Usually an abbreviation of the game title (eg. "PM" for "Pac Man") (unless
  that gamecode was already used for another game, then TT is just random)
The fourth character (D) indicates Destination/Language:
  J  Japan             P  Europe/Elsewhere   F  French          S  Spanish
  E  USA/English       D  German             I  Italian

0B0h - Maker code, Uppercase Ascii, 2 characters
Identifies the (commercial) developer. For example, "01"=Nintendo.

0B2h - Fixed value, 1 Byte
Must be 96h.

0B3h - Main unit code, 1 Byte
Identifies the required hardware. Should be 00h for current GBA models.

0B4h - Device type, 1 Byte
Normally, this entry should be zero. With Nintendo's hardware debugger Bit 7 identifies the debugging handlers entry point and size of DACS (Debugging And Communication System) memory: Bit7=0: 9FFC000h/8MBIT DACS, Bit7=1: 9FE2000h/1MBIT DACS. The debugging handler can be enabled in 800009Ch (see above), normal cartridges do not have any memory (nor any mirrors) at these addresses though.

0B5h - Reserved Area, 7 Bytes
Reserved, zero filled.

0BCh - Software version number
Version number of the game. Usually zero.

0BDh - Complement check, 1 Byte
Header checksum, cartridge won't work if incorrect. Calculate as such:
chk=0:for i=0A0h to 0BCh:chk=chk-[i]:next:chk=(chk-19h) and 0FFh
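
The same calculation in C, as a minimal sketch. gba_header_checksum is a hypothetical name, and 'header' is assumed to point to at least the first 0BDh bytes of the ROM image.

```c
#include <stddef.h>
#include <stdint.h>

/* Computes the complement check for header bytes 0A0h..0BCh.
   The result belongs at offset 0BDh. */
uint8_t gba_header_checksum(const uint8_t *header)
{
    unsigned chk = 0;
    for (size_t i = 0xA0; i <= 0xBC; i++)
        chk -= header[i];          /* unsigned arithmetic wraps around */
    return (uint8_t)((chk - 0x19u) & 0xFFu);
}
```

For a header whose bytes 0A0h..0BCh are all zero except the fixed value 96h at 0B2h, this gives 51h.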

0BEh - Reserved Area, 2 Bytes
Reserved, zero filled.

Below required for Multiboot/slave programs only. For Multiboot, the above 192 bytes are required to be transferred as header-block (loaded to 2000000h-20000BFh), and some additional header-information must be located at the beginning of the actual program/data-block (loaded to 20000C0h and up). This extended header consists of Multiboot Entry point(s) which must be set up correctly, and of two reserved bytes which are overwritten by the boot procedure:

0C0h - Normal/Multiplay mode Entry Point
This entry is used only if the GBA has been booted by using Normal or Multiplay transfer mode (but not by Joybus mode).
Typically deposit an ARM 32bit "B <start>" branch opcode at this location, which points to your actual initialization procedure.

0C4h (BYTE) - Boot mode
The slave GBA download procedure overwrites this byte by a value which is indicating the used multiboot transfer mode.
  Value  Expl.
  01h    Joybus mode
  02h    Normal mode
  03h    Multiplay mode
Typically set this byte to zero by inserting DCB 00h in your source.
Be sure that your uploaded program does not contain important program code or data at this location, or at the ID-byte location below.

0C5h (BYTE) - Slave ID Number
If the GBA has been booted in Normal or Multiplay mode, this byte becomes overwritten by the slave ID number of the local GBA (that'd be always 01h for normal mode).
  Value  Expl.
  01h    Slave #1
  02h    Slave #2
  03h    Slave #3
Typically set this byte to zero by inserting DCB 00h in your source.
When booted in Joybus mode, the value is NOT changed and remains the same as uploaded from the master GBA.

0C6h..0DFh - Not used
Appears to be unused.

0E0h - Joybus mode Entry Point
If the GBA has been booted by using Joybus transfer mode, then the entry point is located at this address rather than at 20000C0h. Either put your initialization procedure directly at this address, or redirect to the actual boot procedure by depositing a "B <start>" opcode here (either one using 32bit ARM code). Or, if you are not intending to support joybus mode (which is probably rarely used), ignore this entry.

  GBA Cartridge ROM

ROM Size
The games F-ZERO and Super Mario Advance use ROMs of 4 MBytes each. Zelda uses 8 MBytes. Not sure if other sizes are manufactured.

ROM Waitstates
The GBA starts the cartridge with 4,2 waitstates (N,S) and prefetch disabled. The program may change these settings by writing to WAITCNT, the games F-ZERO and Super Mario Advance use 3,1 waitstates (N,S) each, with prefetch enabled.
Third-party flashcards are reportedly running unstable with these settings. Also, prefetch and shorter waitstates allow more data and opcodes to be read from ROM in less time, the downside is that this increases the power consumption.

ROM Chip
Because of how 24bit addresses are squeezed through the Gamepak bus, the cartridge must include a circuit that latches the lower 16 address bits on non-sequential access, and that increments these bits on sequential access. Nintendo includes this circuit directly in the ROM chip.
Also, the ROM must have 16bit data bus (or a circuit which converts two 8bit data units into one 16bit unit - by not exceeding the waitstate timings).

  GBA Cart Backup IDs

Nintendo didn't include a backup-type entry in the ROM header, however, the required type can be detected by ID strings in the ROM-image. Nintendo's tools are automatically inserting these strings (as part of their library headers). When using other tools, you may insert ID strings by hand.

ID Strings
The ID string must be located at a word-aligned memory location, the string length should be a multiple of 4 bytes (padded with zeros).
  EEPROM_Vnnn    EEPROM 512 bytes or 8 Kbytes (4Kbit or 64Kbit)
  SRAM_Vnnn      SRAM 32 Kbytes (256Kbit)
  FLASH_Vnnn     FLASH 64 Kbytes (512Kbit) (ID used in older files)
  FLASH512_Vnnn  FLASH 64 Kbytes (512Kbit) (ID used in newer files)
  FLASH1M_Vnnn   FLASH 128 Kbytes (1Mbit)
For Nintendo's tools, "nnn" is a 3-digit library version number. When using other tools, best keep it set to "nnn" rather than inserting numeric digits.

No$gba does auto-detect most backup types, even without ID strings, except for 128K FLASH (without ID "FLASH1M_Vnnn", the FLASH size defaults to 64K). Ideally, for faster detection, the ID should be put into the first few bytes of the ROM-image (ie. somewhere right after the ROM header).
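
A tool that wants to detect the backup type from a ROM-image could scan the word-aligned locations for the ID strings listed above. The following is only a sketch of that idea (it is not how any particular emulator does it); the type and function names are hypothetical, and only the prefix up to "_V" is compared so that any "nnn" version matches.

```c
#include <stddef.h>
#include <string.h>

typedef enum {
    BACKUP_NONE, BACKUP_EEPROM, BACKUP_SRAM,
    BACKUP_FLASH64K, BACKUP_FLASH128K
} backup_type;

/* Scans word-aligned offsets of the ROM-image for a backup ID string
   and returns the type of the first one found. */
backup_type detect_backup(const unsigned char *rom, size_t size)
{
    static const struct { const char *tag; backup_type type; } ids[] = {
        { "EEPROM_V",   BACKUP_EEPROM    },
        { "SRAM_V",     BACKUP_SRAM      },
        { "FLASH512_V", BACKUP_FLASH64K  },
        { "FLASH1M_V",  BACKUP_FLASH128K },
        { "FLASH_V",    BACKUP_FLASH64K  },
    };
    for (size_t pos = 0; pos + 4 <= size; pos += 4) {
        for (size_t k = 0; k < sizeof ids / sizeof ids[0]; k++) {
            size_t len = strlen(ids[k].tag);
            if (pos + len <= size && memcmp(rom + pos, ids[k].tag, len) == 0)
                return ids[k].type;
        }
    }
    return BACKUP_NONE;
}
```

The EEPROM size (512 bytes or 8 Kbytes) cannot be told from the ID string alone, so the sketch reports EEPROM without a size.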

  GBA Cart Backup SRAM/FRAM

SRAM - 32 KBytes (256Kbit) Lifetime: Depends on back-up battery
FRAM - 32 KBytes (256Kbit) Lifetime: 10,000,000,000 read/write per bit

Hyundai GM76V256CLLFW10 SRAM (Static RAM) (eg. F-Zero)
Fujitsu MB85R256 FRAM (Ferroelectric RAM) (eg. Warioware Twisted)

Addressing and Waitstates
SRAM/FRAM is mapped to E000000h-E007FFFh, it should be accessed with 8 waitstates (write a value of 3 into Bit0-1 of WAITCNT).

Databus Width
The SRAM/FRAM databus is restricted to 8 bits, it should be accessed by LDRB, LDRSB, and STRB opcodes only.

Reading and Writing
Reading from SRAM/FRAM should be performed by code executed in WRAM only (but not by code executed in ROM). There is no such restriction for writing.

Preventing Data Loss
The GBA SRAM/FRAM carts do not include a write-protect function (unlike older 8bit gameboy carts). This seems to be a problem and may cause data loss when a cartridge is removed or inserted while the GBA is still turned on. As far as I understand, this is not so much a hardware problem, but rather a software problem, ie. theoretically you could remove/insert the cartridge as many times as you want, but you should take care that your program does not crash (and write blindly into memory).

Recommended Workaround
Enable the Gamepak Interrupt (it'll most likely get triggered when removing the cartridge), and hang-up the GBA in an endless loop when your interrupt handler senses a Gamepak IRQ. For obvious reason, your interrupt handler should be located in WRAM, ie. not in the (removed) ROM cartridge. The handler should process Gamepak IRQs at highest priority. Periods during which interrupts are disabled should be kept as short as possible, if necessary allow nested interrupts.

When to use the above Workaround
A program that relies wholly on code and data in WRAM, and that does not crash even when ROM is removed, may keep operating without having to use the above mechanism.
Do NOT use the workaround for programs that run without a cartridge inserted (ie. single gamepak/multiboot slaves), or for programs that use Gamepak IRQ/DMA for other purposes.
All other programs should use it. It may even be a good idea to include it in programs that do not use SRAM/FRAM themselves (eg. otherwise removing a SRAM/FRAM-less cartridge may lock up the GBA, and may cause it to destroy backup data when inserting a SRAM/FRAM cartridge).

FRAM (Ferroelectric RAM) is a newer technology, used in newer GBA carts, unlike SRAM (Static RAM), it doesn't require a battery to hold the data. At software side, it is accessed exactly like SRAM, ie. unlike EEPROM/FLASH, it doesn't require any Write/Erase commands/delays.

In SRAM/FRAM cartridges, the /REQ pin (Pin 31 of Gamepak bus) should be a little bit shorter than the other pins; when removing the cartridge, this causes the gamepak IRQ signal to get triggered before the other pins are disconnected.

  GBA Cart Backup EEPROM

9853 - EEPROM 512 Bytes (0200h) (4Kbit) (eg. used by Super Mario Advance)
9854 - EEPROM 8 KBytes (2000h) (64Kbit) (eg. used by Boktai)
Lifetime: 100,000 writes per address

Addressing and Waitstates
The eeprom is connected to Bit0 of the data bus, and to the upper 1 bit (or upper 17 bits in case of large 32MB ROM) of the cartridge ROM address bus, communication with the chip takes place serially.
The eeprom must be used with 8 waitstates (set WAITCNT=X3XXh; 8,8 clks in WS2 area), the eeprom can be then addressed at DFFFF00h..DFFFFFFh.
Respectively, with eeprom, ROM is restricted to 8000000h-9FFFEFFh (max. 1FFFF00h bytes = 32MB minus 256 bytes). On carts with 16MB or smaller ROM, eeprom can be alternately accessed anywhere at D000000h-DFFFFFFh.

Data and Address Width
Data can be read from (or written to) the EEPROM in units of 64bits (8 bytes). Writing automatically erases the old 64bits of data. Addressing works in units of 64bits respectively, that is, for 512 Bytes EEPROMS: an address range of 0-3Fh, 6bit bus width; and for 8KByte EEPROMs: a range of 0-3FFh, 14bit bus width (only the lower 10 address bits are used, upper 4 bits should be zero).

Set Address (For Reading)
Prepare the following bitstream in memory:
  2 bits "11" (Read Request)
  n bits eeprom address (MSB first, 6 or 14 bits, depending on EEPROM)
  1 bit "0"
Then transfer the stream to eeprom by using DMA.
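
Preparing that bitstream might look as follows. This is a sketch under the buffer layout described in the "Using DMA" notes (one halfword per bit, only bit0 used); eeprom_read_request is a hypothetical name, and the DMA transfer itself is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Fills 'buf' with the read request for 64bit unit number 'addr'.
   'addr_bits' is 6 (512 byte EEPROM) or 14 (8 Kbyte EEPROM).
   Returns the number of halfwords written (9 or 17). */
size_t eeprom_read_request(uint16_t *buf, unsigned addr, unsigned addr_bits)
{
    size_t n = 0;
    buf[n++] = 1;                                   /* "11" = Read Request */
    buf[n++] = 1;
    for (unsigned bit = addr_bits; bit-- > 0; )     /* address, MSB first  */
        buf[n++] = (uint16_t)((addr >> bit) & 1u);
    buf[n++] = 0;                                   /* terminating "0"     */
    return n;
}
```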

Read Data
Read a stream of 68 bits from EEPROM by using DMA,
then decipher the received data as follows:
  4 bits - ignore these
 64 bits - data (conventionally MSB first)
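
Deciphering the received stream could be sketched like this, assuming the 68 halfwords were placed in a buffer by DMA and that the first data bit is treated as the most significant one (the "conventional" order mentioned above). eeprom_decode_read is a hypothetical name.

```c
#include <stdint.h>

/* Converts 68 received halfwords (bit0 of each is one EEPROM bit)
   into a 64bit value. The first 4 bits are ignored; the first data
   bit ends up in bit 63 of the result. */
uint64_t eeprom_decode_read(const uint16_t *buf)
{
    uint64_t data = 0;
    for (int i = 4; i < 68; i++)
        data = (data << 1) | (uint64_t)(buf[i] & 1u);
    return data;
}
```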

Write Data to Address
Prepare the following bitstream in memory, then transfer the stream to eeprom by using DMA, it'll take ca. 108368 clock cycles (ca. 6.5ms) until the old data is erased and new data is programmed.
  2 bits "10" (Write Request)
  n bits eeprom address (MSB first, 6 or 14 bits, depending on EEPROM)
 64 bits data (conventionally MSB first)
  1 bit "0"
After the DMA, keep reading from the chip, by normal LDRH [DFFFF00h], until Bit 0 of the returned data becomes "1" (Ready). To prevent your program from locking up in case of malfunction, generate a timeout if the chip does not reply after 10ms or longer.
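
The write bitstream can be prepared in the same one-halfword-per-bit layout. Again a sketch only: eeprom_write_request is a hypothetical name, data is sent with bit 63 first, and neither the DMA transfer nor the ready-polling with timeout is shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Fills 'buf' with the write request for 64bit unit number 'addr'.
   'addr_bits' is 6 or 14. Returns the number of halfwords written
   (73 or 81). */
size_t eeprom_write_request(uint16_t *buf, unsigned addr, unsigned addr_bits,
                            uint64_t data)
{
    size_t n = 0;
    buf[n++] = 1;                                   /* "10" = Write Request */
    buf[n++] = 0;
    for (unsigned bit = addr_bits; bit-- > 0; )     /* address, MSB first   */
        buf[n++] = (uint16_t)((addr >> bit) & 1u);
    for (int bit = 63; bit >= 0; bit--)             /* 64 data bits         */
        buf[n++] = (uint16_t)((data >> bit) & 1u);
    buf[n++] = 0;                                   /* terminating "0"      */
    return n;
}
```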

Using DMA
Transferring a bitstream to/from the EEPROM by LDRH/STRH opcodes does not work, this might be because of timing problems, or because of how the GBA squeezes non-sequential memory addresses through the external address/data bus.
For this reason, a buffer in memory must be used (that buffer would be typically allocated temporarily on stack, one halfword for each bit, bit1-15 of the halfwords are don't care, only bit0 is of interest).
The buffer must be transferred as a whole to/from EEPROM by using DMA3 (only DMA 3 is valid to read & write external memory), use 16bit transfer mode, both source and destination address incrementing (ie. DMA3CNT=80000000h+length).
DMA channels of higher priority should be disabled during the transfer (ie. H/V-Blank or Sound FIFO DMAs). And, of course any interrupts that might mess with DMA registers should be disabled.
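
The "length" in the DMA3CNT value is the number of halfwords in the bitstream, which follows directly from the stream layouts given earlier. A small sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Number of bits (= halfwords in the buffer) for each stream type. */
enum { EEPROM_READ_REPLY_BITS = 68 };

unsigned eeprom_read_request_bits(unsigned addr_bits)
{
    return 2u + addr_bits + 1u;           /* "11", address, "0"        */
}

unsigned eeprom_write_request_bits(unsigned addr_bits)
{
    return 2u + addr_bits + 64u + 1u;     /* "10", address, data, "0"  */
}

/* 32bit value for DMA3CNT: enable bit plus halfword count. */
uint32_t eeprom_dma3cnt(unsigned halfwords)
{
    return 0x80000000u + halfwords;
}
```

That gives 9 or 17 halfwords for a read request, 68 for the reply, and 73 or 81 for a write request (6bit or 14bit addressing respectively).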

The EEPROM chips have only 8 pins, these are connected, Pin 1..8, to ROMCS, RD, WR, AD0, GND, GND, A23, VDD of the GamePak bus. Carts with 32MB ROM must have A7..A22 logically ANDed with A23.

There seems to be no autodetection mechanism, so that a hardcoded bus width must be used.

  GBA Cart Backup Flash ROM

64 KBytes - 512Kbits Flash ROM - Lifetime: 10,000 writes per sector
128 KBytes - 1Mbit Flash ROM - Lifetime: ??? writes per sector

Chip Identification (all device types)
  [E005555h]=AAh, [E002AAAh]=55h, [E005555h]=90h  (enter ID mode)
  dev=[E000001h], man=[E000000h]                  (get device & manufacturer)
  [E005555h]=AAh, [E002AAAh]=55h, [E005555h]=F0h  (terminate ID mode)
Used to detect the type (and presence) of FLASH chips. See Device Types below.
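
The identification sequence above might be coded as shown below. To keep the example runnable off-target, bus accesses go through two function pointers, and a tiny simulated chip is included as a stand-in for real hardware. Everything here (flash_bus, flash_read_id, flash_sim and the simulated behaviour) is a hypothetical illustration, not an existing library interface.

```c
#include <stdint.h>

/* Bus access hooks. On hardware these would be 8bit accesses to
   E000000h+offset. The remaining fields are used by the simulated
   chip only. */
typedef struct flash_bus {
    void    (*write8)(struct flash_bus *bus, uint16_t offset, uint8_t value);
    uint8_t (*read8)(struct flash_bus *bus, uint16_t offset);
    int     step;       /* position in the AAh,55h,cmd sequence */
    int     id_mode;    /* 1 while ID mode is active            */
    uint8_t man, dev;   /* codes returned in ID mode            */
} flash_bus;

static void flash_command(flash_bus *bus, uint8_t cmd)
{
    bus->write8(bus, 0x5555, 0xAA);
    bus->write8(bus, 0x2AAA, 0x55);
    bus->write8(bus, 0x5555, cmd);
}

/* Returns the ID with device type in the MSB, manufacturer in the LSB. */
uint16_t flash_read_id(flash_bus *bus)
{
    flash_command(bus, 0x90);                 /* enter ID mode     */
    uint8_t dev = bus->read8(bus, 0x0001);
    uint8_t man = bus->read8(bus, 0x0000);
    flash_command(bus, 0xF0);                 /* terminate ID mode */
    return (uint16_t)((dev << 8) | man);
}

/* --- simulated chip, for testing the sequence without hardware --- */
static void sim_write8(flash_bus *bus, uint16_t offset, uint8_t value)
{
    if (bus->step == 0 && offset == 0x5555 && value == 0xAA) { bus->step = 1; return; }
    if (bus->step == 1 && offset == 0x2AAA && value == 0x55) { bus->step = 2; return; }
    if (bus->step == 2 && offset == 0x5555) {
        if (value == 0x90) bus->id_mode = 1;
        if (value == 0xF0) bus->id_mode = 0;
    }
    bus->step = 0;
}

static uint8_t sim_read8(flash_bus *bus, uint16_t offset)
{
    if (!bus->id_mode)
        return 0xFF;                          /* pretend memory is erased */
    return (offset & 1u) ? bus->dev : bus->man;
}

flash_bus flash_sim(uint8_t man, uint8_t dev)
{
    flash_bus bus = { sim_write8, sim_read8, 0, 0, man, dev };
    return bus;
}
```

On a real GBA the read hook would have to run from WRAM, and WAITCNT should be set to 8 clk SRAM waitstates before the first detection attempt.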

Reading Data Bytes (all device types)
  dat=[E00xxxxh]                                  (read byte from address xxxx)

Erase Entire Chip (all device types)
  [E005555h]=AAh, [E002AAAh]=55h, [E005555h]=80h  (erase command)
  [E005555h]=AAh, [E002AAAh]=55h, [E005555h]=10h  (erase entire chip)
  wait until [E000000h]=FFh (or timeout)
Erases all memory in chip, erased memory is FFh-filled.

Erase 4Kbyte Sector (all device types, except Atmel)
  [E005555h]=AAh, [E002AAAh]=55h, [E005555h]=80h  (erase command)
  [E005555h]=AAh, [E002AAAh]=55h, [E00n000h]=30h  (erase sector n)
  wait until [E00n000h]=FFh (or timeout)
Erases memory at E00n000h..E00nFFFh, erased memory is FFh-filled.

Erase-and-Write 128 Bytes Sector (only Atmel devices)
  old=IME, IME=0                                  (disable interrupts)
  [E005555h]=AAh, [E002AAAh]=55h, [E005555h]=A0h  (erase/write sector command)
  [E00xxxxh+00h..7Fh]=dat[00h..7Fh]               (write 128 bytes)
  IME=old                                         (restore old IME state)
  wait until [E00xxxxh+7Fh]=dat[7Fh] (or timeout)
Interrupts (and DMAs) should be disabled during command/write phase. Target address must be a multiple of 80h.

Write Single Data Byte (all device types, except Atmel)
  [E005555h]=AAh, [E002AAAh]=55h, [E005555h]=A0h  (write byte command)
  [E00xxxxh]=dat                                  (write byte to address xxxx)
  wait until [E00xxxxh]=dat (or timeout)
The target memory location must have been previously erased.

Terminate Command after Timeout (only Macronix devices, ID=1CC2h)
  [E005555h]=F0h                            (force end of write/erase command)
Use if timeout occurred during "wait until" periods, for Macronix devices only.

Bank Switching (devices bigger than 64K only)
  [E005555h]=AAh, [E002AAAh]=55h, [E005555h]=B0h  (select bank command)
  [E000000h]=bnk                                  (write bank number 0..1)
Specifies 64K bank number for read/write/erase operations.
Required because gamepak flash/sram addressbus is limited to 16bit width.
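
For a 128K chip, a linear save-data offset therefore splits into a bank number and a 16bit offset within E000000h..E00FFFFh. A minimal sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Bank number (0..1) for a linear offset 0..1FFFFh into 128K FLASH. */
unsigned flash_bank(uint32_t linear)
{
    return (unsigned)((linear >> 16) & 1u);
}

/* Offset within the selected 64K bank (added to E000000h). */
uint16_t flash_offset(uint32_t linear)
{
    return (uint16_t)(linear & 0xFFFFu);
}
```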

Device Types
Nintendo puts different FLASH chips in commercial game cartridges. Developers should thus detect & support all chip types. For Atmel chips it'd be recommended to simulate 4K sectors by software, though reportedly Nintendo doesn't use Atmel chips in newer games anymore. Also mind that different timings should not disturb compatibility and performance.
  ID     Name       Size  Sectors  AverageTimings  Timeouts/ms   Waits
  D4BFh  SST        64K   16x4K    20us?,?,?       10,  40, 200  3,2
  1CC2h  Macronix   64K   16x4K    ?,?,?           10,2000,2000  8,3
  1B32h  Panasonic  64K   16x4K    ?,?,?           10, 500, 500  4,2
  3D1Fh  Atmel      64K   512x128  ?,?,?           ...40..,  40  8,8
  1362h  Sanyo      128K  ?        ?,?,?           ?    ?    ?    ?
  09C2h  Macronix   128K  ?        ?,?,?           ?    ?    ?    ?
Identification Codes MSB=Device Type, LSB=Manufacturer.
Size in bytes, and numbers of sectors * sector size in bytes.
Average timings for Write, Erase Sector, Erase Chip (mostly unknown).
Timeouts in milliseconds for Write, Erase Sector, Erase Chips.
Waitstates for Writes, and Reads in clock cycles.
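
Software that supports all chip types would typically keep the table above as a lookup list. The following sketch carries over only the ID, name, size and sector columns (entries shown as "?" in the table are stored as 0); the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t    id;            /* MSB=device type, LSB=manufacturer */
    const char *name;
    unsigned    size_kbytes;
    unsigned    sectors;       /* 0 = not listed in the table */
    unsigned    sector_bytes;  /* 0 = not listed in the table */
} flash_device;

static const flash_device flash_devices[] = {
    { 0xD4BF, "SST",        64,  16, 4096 },
    { 0x1CC2, "Macronix",   64,  16, 4096 },
    { 0x1B32, "Panasonic",  64,  16, 4096 },
    { 0x3D1F, "Atmel",      64, 512,  128 },
    { 0x1362, "Sanyo",     128,   0,    0 },
    { 0x09C2, "Macronix",  128,   0,    0 },
};

/* Returns the table entry for 'id', or NULL for unknown chips. */
const flash_device *flash_lookup(uint16_t id)
{
    for (size_t i = 0; i < sizeof flash_devices / sizeof flash_devices[0]; i++) {
        if (flash_devices[i].id == id)
            return &flash_devices[i];
    }
    return NULL;
}
```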

Accessing FLASH Memory
FLASH memory is located in the "SRAM" area at E000000h..E00FFFFh, which is restricted to 16bit address and 8bit data buswidths. Respectively, the memory can be accessed <only> by 8bit read/write LDRB/STRB opcodes.
Also, reading anything (data or status/busy information) can be done <only> by opcodes executed in WRAM (not from opcodes in ROM) (there's no such restriction for writing).

FLASH Waitstates
Use 8 clk waitstates for initial detection (WAITCNT Bits 0,1 both set). After detection of certain device types smaller wait values may be used for write/erase, and even smaller wait values for raw reading, see Device Types table.
In practice, games seem to use smaller values only for write/erase (even though those operations are slow anyways), whilst raw reads are always done at 8 clk waits (even though reads could actually benefit slightly from smaller wait values).

Verify Write/Erase and Retry
Even though the device signals the completion of write/erase operations, it is recommended to read/confirm the content of the changed memory area by software. In practice, Nintendo's "erase-write-verify-retry" function typically repeats the operation up to three times in case of errors.
Also, for SST devices only, the "erase-write" and "erase-write-verify-retry" functions repeat the erase command up to 80 times, additionally followed by one further erase command if no retries were needed, otherwise followed by six further erase commands.

FLASH (64Kbytes) is used by the game Sonic Advance, and possibly others.

  GBA Cart Backup DACS

128 KBytes - 1Mbit DACS - Lifetime: 100,000 writes.
1024 KBytes - 8Mbit DACS - Lifetime: 100,000 writes.

DACS (Debugging And Communication System) is used in Nintendo's hardware debugger only, DACS is NOT used in normal game cartridges.

Part of the DACS memory is used to store the debugging exception handlers (entry point/size defined in cartridge header); the remaining memory could be used to store game positions or other data. The address space is the upper end of the 32MB ROM area. The memory can be read directly by the CPU, so program code can be executed in this area.

  GBA Cart I/O Port (GPIO)

4bit General Purpose I/O Port (GPIO) - contained in the ROM-chip

Used by Boktai for RTC and Solar Sensor:
GBA Cart Real-Time Clock (RTC)
GBA Cart Solar Sensor
And by Warioware Twisted for Rumble and Z-Axis Sensor:
GBA Cart Rumble
GBA Cart Gyro Sensor
Might also be used by other games for other purposes, such as other sensors, or SRAM bank switching, etc.

The I/O registers are mapped to a 6-byte region in the ROM-area at 80000C4h; the 6-byte region should be zero-filled in the ROM-image. In Boktai, the size of the zero-filled region is 0E0h bytes - that is probably due to an incorrect definition (the additional bytes contain neither extra ports nor mirrors of the ports in the 6-byte region). Observe that ROM-bus writes are limited to 16bit/32bit access (STRB opcodes are ignored; that, only in DS mode?).

80000C4h - I/O Port Data (selectable W or R/W)
  bit0-3  Data Bits 0..3 (0=Low, 1=High)
  bit4-15 not used (0)

80000C6h - I/O Port Direction (for above Data Port) (selectable W or R/W)
  bit0-3  Direction for Data Port Bits 0..3 (0=In, 1=Out)
  bit4-15 not used (0)

80000C8h - I/O Port Control (selectable W or R/W)
  bit0    Register 80000C4h..80000C8h Control (0=Write-Only, 1=Read/Write)
  bit1-15 not used (0)
In write-only mode, reads return 00h (or possibly other data, if the ROM contains non-zero data at that location).
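
For emulator-style experiments, the three registers can be modelled roughly as follows. This is a toy sketch: the class and its names are made up for illustration, and the rule that reads of the Data port return the output latch for output bits and the external pin level for input bits is an assumption, not something stated above.

```python
class GpioPort:
    """Toy model of the 4bit cartridge GPIO at 80000C4h..80000C8h."""
    DATA, DIRECTION, CONTROL = 0xC4, 0xC6, 0xC8

    def __init__(self, rom_fill=0x0000):
        self.regs = {self.DATA: 0, self.DIRECTION: 0, self.CONTROL: 0}
        self.rom_fill = rom_fill   # halfword stored in the ROM image at these offsets
        self.pins_in = 0           # levels driven by external chips (bit0-3)

    def write16(self, offset, value):
        # Control has only bit0, Data/Direction have bit0-3
        mask = 0x1 if offset == self.CONTROL else 0xF
        self.regs[offset] = value & mask

    def read16(self, offset):
        if not (self.regs[self.CONTROL] & 1):
            return self.rom_fill   # write-only mode: the ROM content is seen
        if offset == self.DATA:
            out_mask = self.regs[self.DIRECTION]
            # assumption: outputs read back the latch, inputs read the pin
            return (self.regs[self.DATA] & out_mask) | (self.pins_in & ~out_mask & 0xF)
        return self.regs[offset]
```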

Connection Examples
  GPIO       | Boktai  | Wario
  Bit Pin    | RTC SOL | GYR RBL
  0   ROM.1  | SCK CLK | RES -
  1   ROM.2  | SIO RST | CLK -
  2   ROM.21 | CS  -   | DTA -
  3   ROM.22 | -   FLG | -   MOT
  IRQ ROM.43 | IRQ -   | -   -

Aside from the I/O Port, the ROM-chip also includes an inverter (used for inverting the RTC /IRQ signal), and some sort of an (unused) address decoder output (which appears to be equal or related to A23 signal) (ie. reacting on ROM A23, or SRAM D7, which share the same pin on GBA slot).

  GBA Cart Real-Time Clock (RTC)

S3511 - 8pin RTC with 3-wire serial bus (used in Boktai)

The RTC chip is (almost) the same as used in NDS consoles:
DS Real-Time Clock (RTC)
The chip is accessed via 4bit I/O port (only 3bits are used for RTC):
GBA Cart I/O Port (GPIO)

Comparison of RTC Registers
  NDS         GBA         Notes
  stat2       control     (1-byte)
  datetime    datetime    (7-byte)
  time        time        (3-byte)
  stat1       force reset (0-byte)
  clkadjust   force irq   (0-byte)
  alarm1/int1 always FFh  (boktai contains code for writing 1-byte to it)
  alarm2      always FFh  (unused)
  free        always FFh  (unused)

Control Register
  Bit Dir Expl.
  0   -   Not used
  1   R/W IRQ duty/hold related?
  2   -   Not used
  3   R/W Per Minute IRQ (30s duty)        (0=Disable, 1=Enable)
  4   -   Not used
  5   R/W Unknown?
  6   R/W 12/24-hour Mode                  (0=12h, 1=24h) (usually 1)
  7   R   Power-Off (auto cleared on read) (0=Normal, 1=Failure)
Setting after Battery-Shortcut is 82h. Setting after Force-Reset is 00h.
Unused bits seem to be always zero, but might be read-only or write-only?

Datetime and Time Registers
Same as NDS, except AM/PM flag moved from hour.bit6 (NDS) to hour.bit7 (GBA).
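
A minimal sketch of decoding the GBA hour byte, assuming (as on the NDS) that the hour digits are stored in BCD in the low bits, with the AM/PM flag in bit7. The function name is hypothetical.

```python
def decode_gba_hour(raw):
    """Split the GBA cartridge RTC hour byte into (hour, pm_flag).

    Assumes BCD digits in bits 0-5 and the AM/PM flag in bit 7.
    """
    pm = bool(raw & 0x80)
    bcd = raw & 0x3F
    hour = (bcd >> 4) * 10 + (bcd & 0x0F)
    return hour, pm
```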

Force Reset/Irq Registers
Used to reset all RTC registers (all used registers become 00h, except day/month which become 01h), or to drag the IRQ output LOW for a short moment. These registers are strobed by ANY access to them, ie. by both writing to, as well as reading from these registers.

Pin-Outs / IRQ Signal
The package has the same pin-out as the NDS chip, although it is slightly larger than the miniature chip in the DS.
For whatever reason, the RTC's /IRQ output is passed through an inverter (contained in the ROM-chip); the inverted signal is then passed to the /IRQ pin on the cartridge slot. So IRQs will be triggered on the "wrong" edge - possibly somehow related to detecting cartridge-removal IRQs?

  GBA Cart Solar Sensor

Uses a Photo Diode as Solar Sensor (used in Boktai, where vampires can be defeated when the cartridge is exposed to sunlight). The cartridge comes in a transparent case, and it's slightly longer than normal carts, so the sensor reaches out of the cartridge slot. According to the manual, the sensor works only with sunlight, but actually it works with any strong light source (eg. a 100 Watt bulb at 1-2 centimeters distance). The sensor is accessed via 4bit I/O port (only 3bits used), which is contained in the ROM-chip.
GBA Cart I/O Port (GPIO)

A/D Conversion
The cartridge uses a self-made A/D converter, which is (possibly) better than measuring a capacitor charge-up time, and/or less expensive than a real ADC-chip:
It contains a 74LV4040 12bit binary counter (clocked by the CPU via the I/O port), of which only the lower 8 bits are used. These are passed to a resistor-ladder D/A converter, which generates a linearly increasing voltage. That voltage is passed to a TLV272 voltage comparator, which signals the I/O port when the counter voltage becomes greater than the sensor voltage.

Example Code
  strh  0001h,[80000c8h] ;-enable R/W mode
  strh  0007h,[80000c6h] ;-init I/O direction (bit0-2=output, bit3=input)
  strh  0002h,[80000c4h] ;-reset counter to zero (high=reset) (I/O bit1)
  strh  0000h,[80000c4h] ;-clear reset (low=normal)
  mov   r0,0             ;-initial level
 @@lop:
  strh  0001h,[80000c4h] ;-clock high ;\increase counter      (I/O bit0)
  strh  0000h,[80000c4h] ;-clock low  ;/
  ldrh  r1,[80000c4h]    ;-read port                          (I/O bit3)
  tst   r1,08h           ;\
  addeq r0,1             ; loop until voltage match (exit with r0=00h..FFh),
  tsteq r0,100h          ; or until failure/timeout (exit with r0=100h)
  beq   @@lop            ;/
The results vary depending on the clock rate used. In the above example, ensure that IRQs or DMAs do not interrupt the function. Alternately, use a super-slow clock rate (eg. like 666Hz used in Boktai) so that additional small IRQ/DMA delays have little effect on the overall timing. Results should be roughly:
  E8h  total darkness (including daylight on rainy days)
  Dxh  close to a 100 Watt Bulb
  5xh  reaches max level in boktai's solar gauge
  00h  close to a tactical nuclear bomb dropped on your city
The exact values may change from cartridge to cartridge, so it is recommended to include a darkness calibration function, prompting the user to cover the sensor for a moment.
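
The counting loop can be checked on a PC with a small simulation. This is a sketch under simplifying assumptions: the sensor voltage is reduced to an integer level 0..255 on the same scale as the ladder output, and timing effects are ignored. The function name is hypothetical.

```python
def ramp_adc(sensor_level):
    """Simulate the counter/comparator loop for a sensor level of 0..255.

    Returns 00h..FFh on a voltage match, or 100h on failure/timeout.
    """
    counter = 0                    # 74LV4040 after reset
    result = 0                     # r0 in the assembly listing
    while True:
        counter = (counter + 1) & 0xFFF   # one clock pulse
        ladder = counter & 0xFF           # only the lower 8 bits feed the D/A ladder
        if ladder > sensor_level:         # comparator output (I/O bit3)
            return result
        result += 1
        if result == 0x100:               # counter wrapped without a match
            return result
```

A darkness calibration would then store the value returned while the sensor is covered, and interpret later readings relative to it.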

  GBA Cart Tilt Sensor

Yoshi's Universal Gravitation / Yoshi Topsy Turvy (X/Y-Axis)
Koro Koro Puzzle (probably same as Yoshi, X/Y-Axis, too) (?)

Yoshi-Type (X/Y-Axis)
All of the registers are one byte wide, mapped into the top "half" of the SRAM memory range.
  E008000h (W) Write 55h to start sampling
  E008100h (W) Write AAh to start sampling
  E008200h (R) Lower 8 bits of X axis
  E008300h (R) Upper 4 bits of X axis, and Bit7: ADC Status (0=Busy, 1=Ready)
  E008400h (R) Lower 8 bits of Y axis
  E008500h (R) Upper 4 bits of Y axis
You must set SRAM wait control to 8 clocks to access it correctly.
You must also set the cartridge PHI terminal to 4 MHz to make it work.
Sampling routine (typically executed once a frame during VBlank):
  wait until [E008300h].Bit7=1 or until timeout ;wait ready
  x = ([E008300h] AND 0Fh)*100h + [E008200h]    ;get x
  y = ([E008500h] AND 0Fh)*100h + [E008400h]    ;get y
  [E008000h]=55h, [E008100h]=AAh                ;start next conversion
Example values (may vary on different carts and on temperature, etc):
  X ranged between 0x2AF to 0x477, center at 0x392.    Huh?
  Y ranged between 0x2C3 to 0x480, center at 0x3A0.    Huh?
Thanks to Flubba for Yoshi-Type information.
Unknown if the Yoshi-Type sensors are sensing rotation, or orientation, or motion, or something else? In case of rotation, rotation around the X-axis would result in motion in Y-direction, so it is not clear which meaning X and Y have.
Most probably, the sensors are measuring (both) static acceleration (gravity), and dynamic acceleration (eg. shaking the device left/right).
The X/Y values are likely to be mirrored depending on using a back-loading cartridge slot (original GBA), or front-loading cartridge slot (newer GBA SP, and NDS, and NDS-Lite).
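
The Yoshi-Type sampling routine above could look like this in a test harness. read8 and write8 are hypothetical accessors for 8bit reads/writes in the SRAM area, and the poll limit is an arbitrary choice.

```python
def tilt_value(upper_byte, lower_byte):
    """Combine the upper 4 bits and lower 8 bits into a 12bit reading."""
    return ((upper_byte & 0x0F) << 8) | lower_byte

def tilt_sample(read8, write8, max_polls=1000):
    """Return (x, y), or None if the ADC never reports ready.

    read8(addr) and write8(addr, value) are hypothetical accessors.
    """
    for _ in range(max_polls):
        if read8(0xE008300) & 0x80:       # Bit7: 1=Ready
            break
    else:
        return None                       # timeout
    x = tilt_value(read8(0xE008300), read8(0xE008200))
    y = tilt_value(read8(0xE008500), read8(0xE008400))
    write8(0xE008000, 0x55)               # start next conversion
    write8(0xE008100, 0xAA)
    return x, y
```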

  GBA Cart Gyro Sensor

Warioware Twisted (Z-Axis Gyro Sensor, plus Rumble)

Wario-Type (Z-Axis)
Uses a single-axis sensor, which senses rotation around the Z-axis. The sensor is connected to an analogue-in, serial-out ADC chip, which is accessed via lower 3 bits of the GPIO,
GBA Cart I/O Port (GPIO)
The four I/O Lines are connected like so,
  GPIO.Bit0 (W) Start Conversion
  GPIO.Bit1 (W) Serial Clock
  GPIO.Bit2 (R) Serial Data
  GPIO.Bit3 (W) Used for Rumble (not gyro related)
There should be at least <three sequential 32bit ARM opcodes executed in WS0 region> between the STRH opcodes which toggle the CLK signal. Wario uses WAITCNT=45B7h (SRAM=8clks, WS0/WS1/WS2=3,1clks, Prefetch=On, PHI=Off).
The data stream consists of: 4 dummy bits (usually zero), followed by 12 data bits, followed by endless unused bits (usually zero).
  mov  r1,8000000h      ;-cartridge base address
  mov  r0,01h           ;\enable R/W access
  strh r0,[r1,0c8h]     ;/
  mov  r0,0bh           ;\init direction (gpio2=input, others=output)
  strh r0,[r1,0c6h]     ;/
  ldrh r2,[r1,0c4h]     ;-get current state (for keeping gpio3=rumble)
  orr  r2,3                     ;\
  strh r2,[r1,0c4h] ;gpio0=1    ; start ADC conversion
  bic  r2,1                     ;
  strh r2,[r1,0c4h] ;gpio0=0    ;/
  mov  r0,00010000h ;stop-bit           ;\
  bic  r2,2                             ;
 @@lop:                                 ;
  ldrh r3,[r1,0c4h] ;get gpio2=data     ; read 16 bits
  strh r2,[r1,0c4h] ;gpio1=0=clk=low    ; (4 dummy bits, plus 12 data bits)
  movs r3,r3,lsr 3  ;gpio2 to cy=data   ;
  adcs r0,r0,r0     ;merge data, cy=done;
  orr  r3,r2,2      ;set bit1 and delay ;
  strh r3,[r1,0c4h] ;gpio1=1=clk=high   ;
  bcc  @@lop                            ;/
  bic  r0,0f000h                 ;-strip upper 4 dummy bits (isolate 12bit adc)
  bx   lr
Example values (may vary on different carts, battery charge, temperature, etc):
  354h  rotated in anti-clockwise direction (shock-speed)
  64Dh  rotated in anti-clockwise direction (normal fast)
  6A3h  rotated in anti-clockwise direction (slow)
  6C0h  no rotation                         (stopped)
  6DAh  rotation in clockwise direction     (slow)
  73Ah  rotation in clockwise direction     (normal fast)
  9E3h  rotation in clockwise direction     (shock-speed)
For detection, values 000h and FFFh would indicate that there's no sensor.
The Z-axis always points in the same direction, regardless of front-loading or back-loading cartridge slots.
Thanks to Momo Vampire for contributing a Wario cartridge.
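
Post-processing of the 16bit serial stream can be sketched as follows. The rest value 6C0h is taken from the example table and would need per-cartridge calibration in practice; the function names are hypothetical.

```python
GYRO_REST = 0x6C0   # "no rotation" example value; varies per cartridge

def decode_gyro(bits):
    """bits: the 16 serial data bits in arrival order (first bit = MSB)."""
    word = 0
    for b in bits:
        word = (word << 1) | (b & 1)
    return word & 0x0FFF          # strip the 4 leading dummy bits

def gyro_present(value):
    """000h and FFFh indicate that there is no sensor."""
    return value not in (0x000, 0xFFF)

def gyro_rate(value, rest=GYRO_REST):
    """Signed deviation from rest; positive = clockwise per the table."""
    return value - rest
```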

X-Axis and Y-Axis are meant to follow the screen's X and Y coordinates, so the Z-Axis points in the screen's depth direction.

DSi Cameras
DSi consoles can mis-use the built-in cameras as Gyro sensor (as done by the System Flaw DSi game).

  GBA Cart Rumble

Warioware Twisted (Rumble, plus Z-Axis Gyro Sensor)
Drill Dozer (Rumble only) <-- and ALSO supports Gameboy Player rumble?

GBA Rumble Carts contain a small motor, which vibrates for as long as it is switched on (unlike DS Rumble, which must be repeatedly toggled on/off).

In Warioware Twisted, rumble is controlled via GPIO.Bit3 (Data 0=Low=Off, 1=High=On) (and Direction 1=Output), the other GPIO Bits are used for the gyro sensor.
GBA Cart I/O Port (GPIO)
Note: GPIO3 is connected to an external pulldown resistor (so the HighZ level gets dragged to Low=Off when direction is set to Input).

Unknown if Drill Dozer is controlled via GPIO.Bit3, too?

DS Rumble Pak
Additionally, there's a Rumble Pak for the NDS, which connects to the GBA slot, so it can be used also for GBA games (provided that the game doesn't require the GBA slot, eg. GBA multiboot games).
DS Cart Rumble Pak

Gamecube Rumble
Moreover, GBA games running on a Gameboy Player have access to the Rumble function of Gamecube joypads.
GBA Gameboy Player

  GBA Cart e-Reader

GBA Cart e-Reader Overview
GBA Cart e-Reader I/O Ports
GBA Cart e-Reader Dotcode Format
GBA Cart e-Reader Data Format
GBA Cart e-Reader Program Code
GBA Cart e-Reader API Functions
GBA Cart e-Reader VPK Decompression
GBA Cart e-Reader Error Correction
GBA Cart e-Reader File Formats

  |   ShortStrip   |
  |L              L|
  |o    Center    o|
  |n    Region    n|
  |g              g|
  |  may contain   |
  |S   pictures,  S|
  |t instructions t|
  |r     etc.     r|
  |i              i|
  |p              p|

  GBA Cart e-Reader Overview

The e-Reader is a large GBA cartridge (about as big as the GBA console), with built-in dotcode scanning hardware. Dotcodes are tiny strips of black and white pixels printed on the edges of cardboard cards. The cards have to be pulled through a slot on the e-Reader, which feels much like using a magnetic card reader. The binary data on the dotcodes contains small games, either in native GBA code (ARM/THUMB), or in software emulated 8bit Z80 or NES/Famicom (6502) code.

The e-Reader Hardware
The hardware consists of regular 8MByte ROM and 128KByte FLASH chips, two link ports, a custom PGA chip, the camera module (with two red LEDs, used as light source), and some analogue components for generating the LED voltages, etc. The camera supports 402x302 pixels with 7bit monochrome color depth, but the PGA clips it to max 320 pixels per scanline with 1bit color depth.

Link Port Plug/Socket
The e-Reader's two link ports are simply interconnected with each other; without connection to the rest of the e-Reader hardware. These ports are used only on the original GBA (where the large e-Reader cartridge would be covering the GBA's link socket). When trying to insert the e-Reader into an original NDS (or GBA-Micro), then the e-Reader's link plug will hit against the case of the NDS, so it works only with some minor modification to the hardware. There's no such problem with GBA-SP and NDS-Lite.

There are 3 different e-Readers: Japanese/Original, Japanese/Plus, and Non-Japanese. The Original version has only 64K FLASH, no Link Port, and reportedly supports only Z80 code, but no NES/GBA code. The Plus and Non-Japanese versions should be almost identical, except that they reject cards from the wrong region, and that the title strings aren't ASCII in Japan. The Plus version should be backwards compatible with the Original one.

The Problem
Nintendo's current programmers are definitely unable to squeeze a Pac-Man style game into less than 4MBytes. Their solution has been: MORE memory. That is, they've put a whopping 8MByte BIOS ROM into the e-Reader, which contains the User Interface, and software emulation for running some of their 20 years old 8bit NES and Game&Watch titles, which do fit on a few dotcode strips.

  GBA Cart e-Reader I/O Ports

DF80000h Useless Register (R/W)
  0     Output to PGA.Pin93 (which seems to be not connected to anything)
  1-3   Unknown, read/write-able (not used by e-Reader BIOS)
  4-15  Always zero (0)

DFA0000h Reset Register (R/W)
  0    Always zero              (0)
  1    Reset Something?         (0=Normal, 1=Reset)
  2    Unknown, always set      (1)
  3    Unknown, read/write-able (not used by e-Reader BIOS)
  4-7  Always zero              (0)
  8    Unknown, read/write-able (not used by e-Reader BIOS)
  9-15 Always zero              (0)

DFC0000h..DFC0027h Scanline Data (R)
Scanline data (40 bytes, for 320 pixels, 1bit per pixel, 0=black, 1=white).
The first (leftmost) pixel is located in the LSB of the LAST byte.
Port E00FFB1h.Bit1 (and [4000202h].Bit13) indicates when a new scanline is present, the data should be then transferred to RAM via DMA3 (SAD=DFC0000h, DAD=buf+y*28h, CNT=80000014h; a slower non-DMA transfer method would result in missed scanlines). After the DMA, software must reset E00FFB1h.Bit1.
Note: The scanning resolution is 1000 DPI.
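
Unpacking one scanline into pixels might be done as below. Only the position of the first pixel (LSB of the last byte) is stated above; that the remaining pixels continue upwards through the bits and backwards through the bytes is an assumption. The function name is hypothetical.

```python
def unpack_scanline(data):
    """data: the 40 bytes read from DFC0000h..DFC0027h.

    Returns 320 pixels, leftmost first (0=black, 1=white).
    """
    if len(data) != 40:
        raise ValueError("expected 40 bytes")
    pixels = []
    for x in range(320):
        byte = data[39 - x // 8]           # leftmost pixels are in the LAST byte
        pixels.append((byte >> (x % 8)) & 1)
    return pixels
```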

DFC0028h+(0..2Fh*2) Brightest Pixels of 8x6 Blocks (R)
  0-6  Max Brightness (00h..7Fh; 00h=All black, 7Fh=One or more white)
  7-15 Always zero
Can be used to adjust the Port E00FF80h..E00FFAFh settings.

DFC0088h Darkest Pixel of whole Image (R)
  0-7  Max Darkness   (00h..7Fh; 00h=One or more black, 7Fh=All white)
  8-15 Always zero
Can be used to adjust the Port E00FF80h..E00FFAFh settings.

E00FF80h..E00FFAFh Intensity Boundaries for 8x6 Blocks (R/W)
The 320x246 pixel camera input is split into 8x6 blocks (40x41 pixels each), with Block00h=Upper-right, Block07h=Upper-left, ..., Block2Fh=Lower-left. The boundary values for the separate blocks are used for 128-grayscale to 2-color conversion, probably done like "IF Pixel>Boundary THEN white ELSE black".
  0-6  Block Intensity Boundaries (0..7Fh; 7Fh=Whole block gets black)
  7    Always zero
The default boundary values are stored in FLASH memory. The values typically range from 28h (outer edges) to 34h (center image), which reflects the light source (the two LEDs emit more light to the center region).
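
The block layout and the presumed threshold rule can be expressed as follows. The corner blocks follow the description above; the row-by-row, right-to-left numbering in between is an assumption, as is the exact comparison (the text itself only says "probably"). The function names are hypothetical.

```python
BLOCK_W, BLOCK_H = 40, 41      # 320/8 and 246/6 pixels

def block_index(x, y):
    """Index (00h..2Fh) of the 8x6 block containing pixel (x, y)."""
    col = x // BLOCK_W         # 0 = leftmost column of blocks
    row = y // BLOCK_H
    return row * 8 + (7 - col) # Block00h is the upper-right one

def to_1bit(pixel, boundary):
    """Presumed conversion: 1=white if the 7bit pixel exceeds the boundary."""
    return 1 if pixel > boundary else 0
```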

E00FFB0h Control Register 0 (R/W)
  0    Serial Data       (Low/High)
  1    Serial Clock      (Low/High)
  2    Serial Direction  (0=Input, 1=Output)
  3    Led/Irq Enable    (0=Off, 1=On; Enable LED and Gamepak IRQ)
  4    Start Scan        (0=Off, 1=Start) (0-to-1 --> Resync line 0)
  5    Phi 16MHz Output  (0=Off, 1=On; Enable Clock for Camera, and for LED)
  6    Power 3V Enable   (0=Off, 1=On; Enable 3V Supply for Camera)
  7    Not used          (always 0) (sometimes 1) (Read only)

E00FFB1h Control Register 1 (R/W)
  0    Not used          (always 0)
  1    Scanline Flag     (1=Scanline Received, 0=Acknowledge)
  2-3  Not used          (always 0)
  4    Strange Bit       (0=Normal, 1=Force Resync/Line0 on certain interval?)
  5    LED Anode Voltage (0=3.0V, 1=5.1V; requires E00FFB0h.Bit3+5 to be set)
  6    Not used          (always 0)
  7    Input from PGA.Pin22, always high (not used by e-Reader) (Read Only)
Bit1 can be SET by hardware only, software can only RESET that bit, the Gamepak IRQ flag (Port 4000202h.Bit13) becomes set on 0-to-1 transitions.

E00FFB2h Light Source LED Cathode Duration (LSB) (R/W)
E00FFB3h Light Source LED Cathode Duration (MSB) (R/W)
Selects the LED Cathode=LOW Duration, aka the LED=ON Duration. This acts as a pulse width modulated LED brightness selection (the camera seems to react slowly enough to view the light as being dimmed to medium, rather than seeing the actual light ON and OFF states). The PWM timer seems to be clocked at 8MHz. The hardware clips timer values 2000h..FFFFh to max 2000h (=1ms). Additionally, the e-Reader BIOS clips values to max 11B3h. Default setting is found in FLASH calibration data. A value of 0000h disables the LED.
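
As a quick check of the numbers, the LED on-time for a given register value can be computed like this, assuming the 8MHz PWM clock guessed above. The function name is hypothetical.

```python
PWM_CLOCK_HZ = 8_000_000       # "seems to be clocked at 8MHz"

def led_on_time_us(value, bios_limit=False):
    """LED=ON duration in microseconds for a 16bit duration value."""
    value &= 0xFFFF
    limit = 0x11B3 if bios_limit else 0x2000   # BIOS clip vs. hardware clip
    ticks = min(value, limit)
    return ticks * 1_000_000 / PWM_CLOCK_HZ
```

With these assumptions, 2000h corresponds to 1024us, which matches the "=1ms" remark.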

Serial Port Registers (Camera Type 1) (DV488800) (calib_data[3Ch]=1)
All 16bit values are ordered MSB,LSB. All registers are whole 8bit Read/Write-able, except 00h,57h-5Ah (read only), and 53h-55h (2bit only).
  Port     Expl.               (e-Reader Setting)
  00h      Maybe Chip ID (12h) (not used by e-Reader BIOS) (Read Only)
  01h                          (05h)    ;-Bit0: 1=auto-repeat scanning?
  02h                          (0Eh)
  10h-11h  Vertical Scroll     (calib_data[30h]+7)
  12h-13h  Horizontal Scroll   (0030h)
  14h-15h  Vertical Size       (00F6h=246)
  16h-17h  Horizontal Size     (0140h=320)
  20h-21h  H-Blank Duration    (00C4h)
  22h-23h                      (0400h)  ;-Upper-Blanking in dot-clock units?
  25h                          (var)    ;-bit1: 0=enable [57h..5Ah] ?
  26h                          (var)    ;\maybe a 16bit value
  27h                          (var)    ;/
  28h                          (00h)
  30h      Brightness/contrast (calib_data[31h]+/-nn)
  31h-33h                      (014h,014h,014h)
  34h      Brightness/contrast (02h)
  50h-52h  8bit Read/Write     (not used by e-Reader BIOS)
  53h-55h  2bit Read/Write     (not used by e-Reader BIOS)
  56h      8bit Read/Write     (not used by e-Reader BIOS)
  57h-58h  16bit value, used to autodetect/adjust register[30h] (Read Only)
  59h-5Ah  16bit value, used to autodetect/adjust register[30h] (Read Only)
  80h-FFh  Mirrors of 00h..7Fh (not used by e-Reader BIOS)
All other ports are unused, writes to those ports are ignored, and reads are returning data mirrored from other ports; that is typically data from 2 or more ports, ORed together.

Serial Port Registers (Camera Type 2) (calib_data[3Ch]=2)
All 16bit values are using more conventional LSB,MSB ordering, and port numbers are arranged in a more reasonable way. The e-Reader BIOS doesn't support (or doesn't require) brightness adjustment for this camera module.
  Port     Expl.             (e-Reader Setting)
  00h                        (22h)
  01h                        (50h)
  02h-03h  Vertical Scroll   (calib_data[30h]+28h)
  04h-05h  Horizontal Scroll (001Eh)
  06h-07h  Vertical Size     (00F6h)    ;=246
  08h-09h  Horizontal Size   (0140h)    ;=320
  0Ah-0Ch                    (not used by e-Reader BIOS)
  0Dh                        (01h)
  0Eh-0Fh                    (01EAh)    ;=245*2
  10h-11h                    (00F5h)    ;=245
  12h-13h                    (20h,F0h)  ;maybe min/max values?
  14h-15h                    (31h,C0h)  ;maybe min/max values?
  16h                        (00h)
  17h-18h                    (77h,77h)
  19h-1Ch                    (30h,30h,30h,30h)
  1Dh-20h                    (80h,80h,80h,80h)
  21h-FFh                    (not used by e-Reader BIOS)
This appears to be a Micron (aka Aptina) camera (resembling the DSi cameras).
My own e-Reader uses a Type 1 camera module. Not sure if Nintendo has ever manufactured any e-Readers with Type 2 cameras?

Calibration Data in FLASH Memory (Bank 0, Sector 0Dh)
  E00D000 14h  ID String ('Card-E Reader 2001',0,0)
  E00D014 2    Sector Checksum (NOT(x+x/10000h); x=sum of all other halfwords)
Begin of actual data (40h bytes)
  E00D016 8x6  [00h] Intensity Boundaries for 8x6 blocks ;see E00FF80h..AFh
  E00D046 1    [30h] Vertical scroll (0..36h)  ;see type1.reg10h/type2.reg02h
  E00D047 1    [31h] Brightness or contrast    ;see type1.reg30h
  E00D048 2    [32h] LED Duration              ;see E00FFB2h..B3h
  E00D04A 2    [34h] Not used?   (0000h)
  E00D04C 2    [36h] Signed value, related to adjusting the 8x6 blocks
  E00D04E 4    [38h] Not used?   (00000077h)
  E00D052 4    [3Ch] Camera Type (0=none,1=DV488800,2=Whatever?)
Remaining bytes in this Sector...
  E00D056 FAAh Not used (zerofilled) (included in above checksum)
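
A sketch for checking and parsing a dumped calibration sector. It assumes little-endian halfwords and that the checksum covers the whole 1000h-byte sector except the checksum field itself; the function and key names are hypothetical.

```python
import struct

def calib_checksum(sector):
    """NOT(x + x/10000h), x = sum of all halfwords except the checksum."""
    if len(sector) != 0x1000:
        raise ValueError("expected one 1000h-byte sector")
    halfwords = struct.unpack("<2048H", sector)
    x = sum(halfwords) - halfwords[0x14 // 2]   # skip the checksum at offset 14h
    return ~(x + (x >> 16)) & 0xFFFF

def parse_calibration(sector):
    """Extract the documented fields of the 40h-byte data area."""
    data = sector[0x16:0x56]
    return {
        "boundaries": list(data[0x00:0x30]),
        "vertical_scroll": data[0x30],
        "brightness": data[0x31],
        "led_duration": struct.unpack_from("<H", data, 0x32)[0],
        "camera_type": struct.unpack_from("<I", data, 0x3C)[0],
    }
```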

Flowchart for Overall Camera Access
  call ereader_power_on
  call ereader_initialize
  for z=1 to number_of_frames
   for y=0 to 245
    Wait until E00FFB1h.Bit1 gets set by hardware (can be handled by IRQ)
    Copy 14h halfwords from DFC0000h to buf+y*28h via DMA3
    Reset E00FFB1h.Bit1 by software
   next y
   ;(could now check DFC0028h..DFC0086h/DFC0088h for adjusting E00FF80h..AFh)
   ;(could now show image on screen, that may require to stop/pause scanning)
  next z
  call ereader_power_off
  Ret
 ereader_power_on:
  [4000204h]=5803h   ;Init waitstates, and enable Phi 16MHz
  [E00FFB0h]=40h     ;Enable Power3V and reset other bits
  [E00FFB1h]=20h     ;Enable Power5V and reset other bits
  [E00FFB1h].Bit4=0  ;...should be already 0 ?
  [E00FFB0h]=40h+27h ;Phi16MHz=On, SioDtaClkDir=HighHighOut
  Ret
 ereader_power_off:
  [E00FFB0h]=04h    ;Power3V=Off, Disable Everything, SioDtaClkDir=LowLowOut
  [DFA0000h].Bit1=0 ;...should be already 0
  [E00FFB1h].Bit5=0 ;Power5V=Off
  Ret
 ereader_initialize:
  IF calib_data[3Ch] AND 03h = 1 THEN init_camera_type1
  [E00FFB0h].Bit4=1 ;ScanStart
  IF calib_data[3Ch] AND 03h = 2 THEN init_camera_type2
  Copy calib_data[00h..2Fh] to [E00FF80h+00h..2Fh]  ;Intensity Boundaries
  Copy calib_data[32h..33h] to [E00FFB2h+00h..01h]  ;LED Duration LSB,MSB
  [E00FFB0h].Bit3=1                                 ;LedIrqOn
  Ret
 init_camera_type1:
  Set Sio Registers (as shown for Camera Type 1, except below values...)
  Set Sio Registers [30h]=x [25h]=04h, [26h]=58h, [27h]=6Ch
  ;(could now detect/adjust <x> based on Sio Registers [57h..5Ah])
  Set Sio Registers [30h]=x [25h]=06h, [26h]=E8h, [27h]=6Ch
  Ret
 init_camera_type2:
  Set Sio Registers (as shown for Camera Type 2)
  Ret

Accessing Serial Registers via E00FFB0h
      Begin   Write(A) Write(B) Read(C) Read(D) End     Idle    PwrOff
  Dir ooooooo ooooooo  ooooooo  iiiiiii iiiiiii ooooooo ooooooo ooooooo
  Dta ---____ AAAAAAA  BBBBBBB  xxxxxCx xxxxxDx ______- ------- _______
  Clk ------_ ___---_  ___---_  ___---_ ___---_ ___---- ------- _______

Flowchart for accessing Serial Registers via E00FFB0h (looks like I2C bus)
 Delay:
  Wait circa 2.5us, Ret
 SioBegin:
  SioDta=1, SioDir=Out, SioClk=1, Delay, SioDta=0, Delay, SioClk=0, Ret
 SioEnd:
  SioDta=0, SioDir=Out, Delay, SioClk=1, Delay, SioDta=1, Ret
 SioRead1bit:   ;out: databit
  SioDir=In, Delay, SioClk=1, Delay, databit=SioDta, SioClk=0, Ret
 SioWrite1bit:  ;in: databit
  SioDta=databit, SioDir=Out, Delay, SioClk=1, Delay, SioClk=0, Ret
 SioReadByte:   ;in: endflag - out: data
  for i=7 to 0, data.bit<i>=SioRead1bit, next i, SioWrite1bit(endflag), Ret
 SioWriteByte:  ;in: data - out: errorflag
  for i=7 to 0, Delay(huh/why?), SioWrite1bit(data.bit<i>), next i
  errorflag=SioRead1bit, SioDir=Out(huh/why?), Ret
 SioWriteRegisters:  ;in: index, len, buffer
  SioWriteByte(22h)        ;command (set_index) (and write_data)
  SioWriteByte(index)      ;index
  for i=0 to len-1
   SioWriteByte(buffer[i]) ;write data (and auto-increment index)
 SioReadRegisters:   ;in: index, len - out: buffer
  SioWriteByte(22h)        ;command (set_index) (without any write_data here)
  SioWriteByte(index)      ;index
  SioWriteByte(23h)        ;command (read_data) (using above index)
  for i=0 to len-1
   if i=len-1 then endflag=1 else endflag=0
   buffer[i]=SioReadByte(endflag)  ;read data (and auto-increment index)
Caution: Accessing the SIO registers appears highly unstable, and seems to require error handling with retries. Not sure what is causing that problem, possibly the registers cannot be accessed during camera-data-scans...?
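
To illustrate the bit order, the following sketch generates the values that would be written to E00FFB0h for the eight data bits of one byte (MSB first), and decodes them again by sampling the data bit on each rising clock edge. The delays, the trailing acknowledge/error bit, and any start/stop conditions are left out; the base value 60h (Phi16MHz and Power3V kept on) and the function names are assumptions for illustration.

```python
SIO_DTA, SIO_CLK, SIO_DIR_OUT = 0x01, 0x02, 0x04   # E00FFB0h bits 0, 1, 2

def sio_write_byte_trace(data, base=0x60):
    """Register values for the 8 data bits of SioWriteByte, MSB first."""
    trace = []
    for i in range(7, -1, -1):
        bit = (data >> i) & 1
        low = base | SIO_DIR_OUT | (SIO_DTA if bit else 0)
        trace.append(low)              # SioDta=databit, SioDir=Out (clock low)
        trace.append(low | SIO_CLK)    # SioClk=1
        trace.append(low)              # SioClk=0
    return trace

def sample_on_clock_high(trace):
    """Rebuild the byte as a receiver would, sampling on rising clock edges."""
    value, prev_clk = 0, 0
    for v in trace:
        clk = (v >> 1) & 1
        if clk and not prev_clk:
            value = (value << 1) | (v & SIO_DTA)
        prev_clk = clk
    return value
```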

The e-Reader BIOS uses WAITCNT [4000204h]=5803h when accessing the PGA, that is, gamepak 16.78MHz phi output (bit11-12=3), 8 waits for SRAM region (bit0-1=3), gamepak prefetch enabled (bit14=1), also sets WS0 to 4,2 waits (bit2-4=0), and sets WS2 to odd 4,8 waits (bit8-10=0). The WS2 (probably WS0 too) settings are nonsense, and faster timings should work (the e-Reader can be accessed in NDS mode, which doesn't support such slow timings).
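
The WAITCNT values quoted in this document can be decoded with a small helper. The bit layout and waitstate tables used here are those of the GBA WAITCNT register as described in the System Control chapter; the function name and output format are hypothetical.

```python
FIRST_ACCESS = (4, 3, 2, 8)                        # clks for field values 0..3
SECOND_ACCESS = {"ws0": (2, 1), "ws1": (4, 1), "ws2": (8, 1)}
PHI = ("off", "4.19MHz", "8.38MHz", "16.78MHz")

def decode_waitcnt(value):
    """Decode the waitstate, PHI and prefetch fields of WAITCNT."""
    out = {"sram": FIRST_ACCESS[value & 3]}
    for n, name in enumerate(("ws0", "ws1", "ws2")):
        shift = 2 + 3 * n                          # WS0 at bit2, WS1 at bit5, WS2 at bit8
        first = FIRST_ACCESS[(value >> shift) & 3]
        second = SECOND_ACCESS[name][(value >> (shift + 2)) & 1]
        out[name] = (first, second)
    out["phi"] = PHI[(value >> 11) & 3]
    out["prefetch"] = bool(value & 0x4000)
    return out
```

For 5803h this yields SRAM=8, WS0=(4,2), WS2=(4,8), PHI=16.78MHz, prefetch on, in line with the description above.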

e-Reader Memory and I/O Map (with all used/unused/mirrored regions)
  C000000h-C7FFFFFh  ROM (8MB)
  C800000h-DF7FFFFh  Open Bus
  DF80000h-DF80001h  Useless Register (R/W)
  DF80002h-DF9FFFFh  Mirrors of DF80000h-DF80001h
  DFA0000h-DFA0001h  Reset Register (R/W)
  DFA0002h-DFBFFFFh  Mirrors of DFA0000h-DFA0001h
  DFC0000h-DFC0027h  Scanline Data (320 Pixels) (R)
  DFC0028h-DFC0087h  Brightest Pixels of 8x6 Blocks (R)
  DFC0088h           Darkest Pixel of whole Image (R)
  DFC0089h-DFC00FFh  Always zero
  DFC0100h-DFDFFFFh  Mirrors of DFC0000h-DFC00FFh
  DFE0000h-DFFFFFFh  Open Bus
  E000000h-E00CFFFh  FLASH Bank 0 - Data
  E00D000h-E00DFFFh  FLASH Bank 0 - Calibration Data
  E00E000h-E00EFFFh  FLASH Bank 0 - Copy of Calibration Data
  E00F000h-E00FF7Fh  FLASH Bank 0 - Unused region
  E000000h-E00EFFFh  FLASH Bank 1 - Data
  E00F000h-E00FF7Fh  FLASH Bank 1 - Unused region
  E00FF80h-E00FFAFh  Intensity Boundaries for 8x6 Blocks (R/W)
  E00FFB0h           Control Register 0 (R/W)
  E00FFB1h           Control Register 1 (R/W)
  E00FFB2h-E00FFB3h  LED Duration (16bit) (R/W)
  E00FFB4h-E00FFBFh  Always zero
  E00FFC0h-E00FFFFh  Mirror of E00FF80h-E00FFBFh
Mind that WS2 should be accessed by LDRH/STRH, and SRAM region by LDRB/STRB.
Additionally about 32 serial bus registers are contained in the camera module.

Camera Module Notes
The Type 1 power-on default is 402x302 pixels, of which the e-Reader uses only 320x246. The full vertical resolution could probably be used without problems. Ports DFC0000h-DFC0027h are restricted to 320 pixels, so larger horizontal resolutions could probably be obtained only by changing the horizontal scroll offset on every 2nd scan.
The camera output is 128 grayscales (via parallel 7bit databus), but the PGA converts it to 2 colors (1bit depth). For still images, it might be possible to get 4 grayshades via 3 scans with different block intensity boundary settings.
No idea if the camera supports serial commands other than 22h and 23h. Receiving the bitmap via the 2-wire serial bus (as an alternative to the 7bit databus) <would> be a quite obvious and basic feature; if supported, it'd allow to get 7bit images, bypassing the 1bit PGA conversion.
When used as an actual camera (by cutting an opening in the case), the main problem is the 1bit color depth, which allows only black and white images; if that problem were solved, focusing might be a further problem.

Either the camera or the PGA seems to have a problem with white-to-black transitions in vertical direction: the topmost few black pixels come out striped or dithered. For example, the large sync marks are scanned as:
  Actual Shape    Scanned Shape
     XXXXX            X  X
    XXXXXXX           X  X X
   XXXXXXXXX        X X  X XX
   XXXXXXXXX        X X  X XX
    XXXXXXX          XXXXXXX
     XXXXX            XXXXX
That appears only on large black shapes (the smaller data dots look better). Probably the image is scanned from the bottom upwards (and the camera senses only the initial transition at the bottom, and then loses track of what it is doing).

  GBA Cart e-Reader Dotcode Format

Resolution is 342.39 DPI (almost 10 blocks per inch).
Resolution is 134.8 dots/cm (almost 4 blocks per centimeter).
The width and height of each block, and the spacing to the bottom edge of the card is ca. 1/10 inch, or ca. 2.5 millimeters.

   XXX            BLOCK 1             XXX            BLOCK 2             XXX
  XXXXX                              XXXXX                              XXXXX
  XXXXX                              XXXXX                              XXXXX
   XXX   HHHHHHHHHHHHHHHHHHHH......   XXX   HHHHHHHHHHHHHHHHHHHH......   XXX
         ..........................         ..........................
         ...... 3 short lines .....         ..........................
    A....      26 long lines       ....A........ X = Sync Marks   ........A..
    A....  (each 34 data dots)     ....A........ H = Block Header ........A..
    A....(not all lines shown here)....A........ . = Data Bits    ........A..
    A..................................A........ A = Address Bits ........A..
         ...... 3 short lines .....         ..........................
         ...(each 26 data dots)....         ..........................
   XXX   ..........................   XXX   ..........................   XXX
  XXXXX                              XXXXX                              XXXXX
  XXXXX                              XXXXX                              XXXXX
   XXX                                XXX                                XXX
             <ca. 35 blank lines>

Address Columns
Each Column consists of 26 dots. From top to bottom: 1 black dot, 8 blank dots, 16 address dots (MSB topmost), and 1 blank dot. The 16bit address values can be calculated as:
  addr[0] = 03FFh
  for i = 1 to 53
    addr[i] = addr[i-1] xor ((i and (-i)) * 769h)
    if (i and 07h)=0 then addr[i] = addr[i] xor (769h)
    if (i and 0Fh)=0 then addr[i] = addr[i] xor (769h*2)
    if (i and 1Fh)=0 then addr[i] = addr[i] xor (769h*4) xor (769h)
  next i
Short strips use addr[1..19], long strips use addr[25..53], left to right.
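
The address recurrence above translates directly into code. The function name is hypothetical; the arithmetic follows the listing line by line.

```python
def dotcode_addresses():
    """Return addr[0..53] as defined by the recurrence above."""
    addr = [0x03FF]
    for i in range(1, 54):
        a = addr[i - 1] ^ ((i & -i) * 0x769)   # i and (-i) isolates the lowest set bit
        if (i & 0x07) == 0:
            a ^= 0x769
        if (i & 0x0F) == 0:
            a ^= 0x769 * 2
        if (i & 0x1F) == 0:
            a ^= (0x769 * 4) ^ 0x769
        addr.append(a)
    return addr

ADDR = dotcode_addresses()
SHORT_STRIP_ADDR = ADDR[1:20]     # addr[1..19], left to right
LONG_STRIP_ADDR = ADDR[25:54]     # addr[25..53], left to right
```

The slices give 19 address columns for a short strip and 29 for a long strip, that is, one more than the number of blocks on each.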

Block Header
The 18h-byte Block Header is taken from the 1st two bytes (20 dots) of the 1st 0Ch blocks (and is then repeated in the 1st two bytes of further blocks).
  00h      Unknown              (00h)
  01h      Dotcode type         (02h=Short, 03h=Long)
  02h      Unknown              (00h)
  03h      Address of 1st Block (01h=Short, 19h=Long)
  04h      Total Fragment Size  (40h) ;64 bytes per fragment, of which,
                                      ;48 bytes are actual data, the remaining
  05h      Error-Info Size      (10h) ;16 bytes are error-info
  06h      Unknown              (00h)
  07h      Interleave Value     (1Ch=Short, 2Ch=Long)
  08h..17h 16 bytes Reed-Solomon error correction info for Block Header

Data 4-Bit to 5-bit Conversion
In the Block Header (HHHHH), and Data Region (.....), each 4bit are expanded to 5bit, so one byte occupies 10 dots, and each block (1040 data dots) contains 104 bytes.
  4bit  00h 01h 02h 03h 04h 05h 06h 07h 08h 09h 0Ah 0Bh 0Ch 0Dh 0Eh 0Fh
  5bit  00h 01h 02h 12h 04h 05h 06h 16h 08h 09h 0Ah 14h 0Ch 0Dh 11h 10h
That formatting ensures that there are no more than two continuous black dots (in horizontal direction), neither inside of a 5bit value, nor between two 5bit values. However, the address bars are violating that rule, and up to 5 continuous black dots can appear at the (..A..) block boundaries.
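
A minimal Python sketch of the conversion table, assuming that a set bit stands for a black dot. The function names byte_to_dots and dots_to_byte are made up for this example.

```python
NIBBLE_TO_5BIT = [0x00, 0x01, 0x02, 0x12, 0x04, 0x05, 0x06, 0x16,
                  0x08, 0x09, 0x0A, 0x14, 0x0C, 0x0D, 0x11, 0x10]
FIVEBIT_TO_NIBBLE = {code: nibble for nibble, code in enumerate(NIBBLE_TO_5BIT)}

def byte_to_dots(value):
    """Expand one byte to 10 dots (upper 4bit first), as a 10bit integer."""
    return (NIBBLE_TO_5BIT[value >> 4] << 5) | NIBBLE_TO_5BIT[value & 0x0F]

def dots_to_byte(dots):
    """Convert 10 dots back to a byte, or None if a 5bit value is invalid."""
    hi = FIVEBIT_TO_NIBBLE.get(dots >> 5)
    lo = FIVEBIT_TO_NIBBLE.get(dots & 0x1F)
    if hi is None or lo is None:
        return None
    return (hi << 4) | lo
```

An invalid 5bit value (one that isn't in the table) indicates a scanning error at a known location, which is useful for the error correction.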

Data Order
Data starts with the upper bit of the 5bit value for the upper 4bit of the first byte, which is located at the leftmost dot of the upper line of the leftmost block. It then extends towards the rightmost dot of that block, continues in the next line until reaching the bottom of the block, and then continues in the next block. The 1st two bytes of each block contain a portion of the Block Header, the remaining 102 bytes in each block contain data.

Data Size
A long strip consists of 28 blocks (28*104 = 2912 bytes), a short strip of 18 blocks (18*104 = 1872 bytes). Of which, less than 75% can be actually used for program code, the remaining data contains error correction info, and various headers. See Data Format for more info.

Interleaved Fragments
The Interleave Value (I) specifies the number of fragments, and does also specify the step to the next byte inside of a fragment; except that, at the block boundaries (every 104 bytes), the step is 2 bigger (for skipping the next two Block Header bytes).
  RAW Offset  Content
  000h..001h  1st 2 bytes of RAW Header
  002h        1st byte of 1st fragment
  003h        1st byte of 2nd fragment
  ...         ...
  002h+I-1    1st byte of last fragment
  002h+I      2nd byte of 1st fragment
  003h+I      2nd byte of 2nd fragment
  ...         ...
  002h+I*2-1  2nd byte of last fragment
  ...         ...
Each fragment consists of 48 actual data bytes, followed by 16 error correction bytes, followed by 0..2 unused bytes (since I*40h doesn't exactly match num_blocks*102).
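
The interleaving can be undone as sketched below in Python. Stripping the two Block Header bytes from each 104-byte block first makes the "step is 2 bigger" rule implicit. The sketch assumes that the raw bytes are already converted from 5bit to 4bit form, and that the Interleave Value in the Block Header is intact (no error correction is done here). The names split_raw and deinterleave are made up for this example.

```python
BLOCK_SIZE = 104       # bytes per block
HEADER_PART = 2        # Block Header bytes at the start of each block
HEADER_SIZE = 0x18     # size of the whole Block Header
FRAGMENT_SIZE = 0x40   # Total Fragment Size (data + error-info)

def split_raw(raw):
    """Separate the Block Header bytes from the interleaved payload."""
    header, payload = bytearray(), bytearray()
    for off in range(0, len(raw), BLOCK_SIZE):
        header += raw[off:off + HEADER_PART]
        payload += raw[off + HEADER_PART:off + BLOCK_SIZE]
    # the header is repeated in further blocks, only the 1st copy is kept
    return bytes(header[:HEADER_SIZE]), bytes(payload)

def deinterleave(raw):
    """Return (block_header, list of 40h-byte fragments)."""
    header, payload = split_raw(raw)
    interleave = header[0x07]          # 1Ch=Short, 2Ch=Long
    fragments = [bytes(payload[k + n * interleave] for n in range(FRAGMENT_SIZE))
                 for k in range(interleave)]
    return header, fragments
```

Each returned fragment consists of 30h data bytes followed by 10h error-info bytes; the unused bytes at the end of the payload are ignored.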

  GBA Cart e-Reader Data Format

Data Strip Format
The size of the data region is I*48 bytes (I=Interleave Value, see Dotcode Format), the first 48-byte fragment contains the Data Header, the remaining (I-1) fragments are Data Fragments (which contain title(s), and VPK compressed program code).

First Strip
  Data Header  (48 bytes)
  Main-Title   (17 bytes, or 33 bytes)
  Sub-Title(s) (3+18 bytes, or 33 bytes) (for each strip) (optional)
  VPK Size     (2 byte value, total length of VPK Data in ALL strips)
  NULL Value   (4 bytes, contained ONLY in 1st strip of GBA strips)
  VPK Data     (length as defined in VPK Size entry, see above)

Further Strip(s)
  Data Header  (48 bytes)
  Main-Title   (17 bytes, or 33 bytes)
  Sub-Title(s) (3+18 bytes, or 33 bytes) (for each strip) (optional)
  VPK Data     (continued from previous strip)

Data Header (30h bytes) (1st fragment)
  00h-01h  Fixed         (00h,30h)
  02h      Fixed         (01h)     ;01h="Do not calculate Global Checksum" ?
  03h      Primary Type  (see below)
  04h-05h  Fixed         (00h,01h) (don't care)
  06h-07h  Strip Size    (0510h=Short, 0810h=Long Strip) ((I-1)*30h) (MSB,LSB)
  08h-0Bh  Fixed         (00h,00h,10h,12h)
  0Ch-0Dh  Region/Type   (see below)
  0Eh      Strip Type    (02h=Short Strip, 01h=Long Strip) (don't care)
  0Fh      Fixed         (00h) (don't care)
  10h-11h  Unknown       (whatever) (don't care)
  12h      Fixed         (10h)     ;10h="Do calculate Data Checksum" ?
  13h-14h  Data Checksum (see below) (MSB,LSB)
  15h-19h  Fixed         (19h,00h,00h,00h,08h)
  1Ah-21h  ID String     ('NINTENDO')
  22h-25h  Fixed         (00h,22h,00h,09h)
  26h-29h  Size Info     (see below)
  2Ah-2Dh  Flags         (see below)
  2Eh      Header Checksum (entries [0Ch-0Dh,10h-11h,26h-2Dh] XORed together)
  2Fh      Global Checksum (see below)
Primary Type [03h] is 8bit,
  0      Card Type (upper bit) (see below)
  1      Unknown (usually opposite of Bit0) (don't care)
  2-7    Unknown (usually zero)
Region/Type [0Ch..0Dh] is 16bit,
  0-3    Unknown (don't care)
  4-7    Card Type (lower bits) (see below)
  8-11   Region/Version (0=Japan/Original, 1=Non-japan, 2=Japan/Plus)
  12-15  Unknown (don't care)
Size Info [26h-29h] is 32bit,
  0      Unknown            (don't care)
  1-4    Strip Number       (01h..Number of strips)
  5-8    Number of Strips   (01h..0Ch) (01h..08h for Japan/Original version)
  9-23   Size of all Strips (excluding Headers and Main/Sub-Titles)
         (same as "VPK Size", but also including the 2-byte "VPK Size" value,
         plus the 4-byte NULL value; if it is present)
  24-31  Fixed              (02h) (don't care)
Flags [2Ah-2Dh] is 32bit,
  0      Permission to save (0=Start Immediately, 1=Prompt for FLASH Saving)
  1      Sub-Title Flag     (0=Yes, 1=None)    (Japan/Original: always 0=Yes)
  2      Application Type   (0=GBA/Z80, 1=NES) (Japan/Original: always 0=Z80)
  3-31   Zero (0) (don't care)
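
The Size Info and Flags bit fields can be extracted as sketched below in Python. The functions take the 32bit values as integers; the byte order in which the four header bytes are combined isn't stated above and is left to the caller. The names decode_size_info and decode_flags are made up for this example.

```python
def decode_size_info(value):
    """Split the 32bit Size Info value [26h-29h] into its bit fields."""
    return {
        "strip_number": (value >> 1) & 0x0F,           # bits 1-4
        "number_of_strips": (value >> 5) & 0x0F,       # bits 5-8
        "size_of_all_strips": (value >> 9) & 0x7FFF,   # bits 9-23
        "fixed": (value >> 24) & 0xFF,                 # bits 24-31 (02h)
    }

def decode_flags(value):
    """Split the 32bit Flags value [2Ah-2Dh] into its bit fields."""
    return {
        "prompt_for_flash_saving": bool(value & 1),    # bit 0
        "has_sub_titles": not (value & 2),             # bit 1 (0=Yes, 1=None)
        "is_nes": bool(value & 4),                     # bit 2 (0=GBA/Z80, 1=NES)
    }
```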
Data Checksum [13h-14h] is the complement (NOT) of the sum of all halfwords in all Data Fragments, however, it's all done in reversed byte order: checksum is calculated with halfwords that are read in MSB,LSB order, and the resulting checksum is stored in MSB,LSB order in the Header Fragment.
Global Checksum [2Fh] is the complement (NOT) of the sum of the first 2Fh bytes in the Data Header plus the sum of all Data Fragment checksums; the Data Fragment checksums are all 30h bytes in a fragment XORed with each other.
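
The three checksums can be sketched in Python as follows. The sketch assumes that header is the 30h-byte Data Header (with the Header Checksum at [2Eh] already filled in when computing the Global Checksum), and that data_fragments is a list of the 30h-byte Data Fragments without error-info. All function names are made up for this example.

```python
def data_checksum(data_fragments):
    """NOT of the sum of all halfwords (read in MSB,LSB order), 16bit."""
    total = 0
    for frag in data_fragments:
        for off in range(0, len(frag), 2):
            total += (frag[off] << 8) | frag[off + 1]
    return ~total & 0xFFFF

def header_checksum(header):
    """Entries [0Ch-0Dh,10h-11h,26h-2Dh] XORed together, 8bit."""
    result = 0
    for off in (0x0C, 0x0D, 0x10, 0x11, *range(0x26, 0x2E)):
        result ^= header[off]
    return result

def fragment_checksum(frag):
    """All bytes of one Data Fragment XORed with each other."""
    result = 0
    for value in frag:
        result ^= value
    return result

def global_checksum(header, data_fragments):
    """NOT of (sum of header[00h..2Eh] + sum of fragment checksums), 8bit."""
    total = sum(header[:0x2F])
    total += sum(fragment_checksum(frag) for frag in data_fragments)
    return ~total & 0xFF
```

The Data Checksum result is stored in MSB,LSB order at [13h-14h].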

Titles (3+N bytes, or N bytes)
Titles can be 33 bytes for both Main and Sub (Format 0Eh), or Main=17 bytes and Sub=3+18 bytes (Formats 02h..05h). In the 3+N bytes form, the first 3 bytes (24bit) are used to display "stats" information in form of "HP: h1 ID: i1-i2-i3", defined as:
  Bit    Expl.
  0-3    h1, values 1..15 shown as "10..150", value 0 is not displayed
  4-6    i3, values 0..7 shown as "A..G,#"
  7-13   i2, values 0..98 shown as "01..99" values 99..127 as "A0..C8"
  14-18  i1, values 0..31 shown as "A..Z,-,_,{HP},.,{ID?},:"
  19-22  Unknown
  23     Disable stats (0=Show as "HP: h1 ID: i1-i2-i3", 1=Don't show it)
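
The stats bit fields can be decoded as sketched below in Python. The function takes the 24bit value as an integer (the byte order of the 3 bytes isn't stated above), and it assumes that only the "HP:" part is omitted when h1 is zero. The names decode_stats and format_i2 are made up for this example.

```python
I1_CHARS = [chr(ord("A") + n) for n in range(26)] + ["-", "_", "{HP}", ".", "{ID?}", ":"]
I3_CHARS = "ABCDEFG#"

def format_i2(i2):
    """Values 0..98 are shown as 01..99, values 99..127 as A0..C8."""
    if i2 <= 98:
        return "%02d" % (i2 + 1)
    n = i2 - 99
    return "ABC"[n // 10] + str(n % 10)

def decode_stats(value):
    """Return the displayed stats string, or None if stats are disabled."""
    if value & (1 << 23):
        return None
    h1 = value & 0x0F            # bits 0-3
    i3 = (value >> 4) & 0x07     # bits 4-6
    i2 = (value >> 7) & 0x7F     # bits 7-13
    i1 = (value >> 14) & 0x1F    # bits 14-18
    ident = "ID: %s-%s-%s" % (I1_CHARS[i1], format_i2(i2), I3_CHARS[i3])
    if h1 == 0:
        return ident             # assumption: HP isn't shown when h1=0
    return "HP: %d %s" % (h1 * 10, ident)
```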
The N bytes portion contains the actual title, which must be terminated by 00h (so the max length is N-1 characters, if it is shorter than N-1, then the unused bytes are padded by further 00h's). The character set is normal ASCII for non-Japan (see Region/Version entry in header), and 2-byte SHIFT-JIS for Japanese long-titles (=max 16 2-byte chars) with values as so:
  00h          --> end-byte
  81h,40h      --> SPC
  81h,43h..97h --> punctuation marks
  82h,4Fh..58h --> "0..9"
  82h,60h..79h --> "A..Z"
  82h,81h..9Ah --> "a..z"
And 1-byte chars for Japanese short-titles,
  00     = end-byte
  01     = spc
  02..0B = 0..9
  0C..AF = japanese
  B0..B4 = dash, male, female, comma, round-dot
  B5..C0 = !"%&~?/+-:.'
  C1..DA = A..Z
  DB..DF = unused (blank)
  E0..E5 = japanese
  E6..FF = a..z
  N/A    = #$()*;<=>@[\]^_`{|}
Additionally to the Main-Title, optional Sub-Titles for each strip can be included (see Sub-Title Flag in header). If enabled, then ALL strip titles are included in each strip (allowing to show a preview of which strips have/haven't been scanned yet).
The e-Reader can display maximum of 8 sub-titles, if the data consists of more than 8 strips, then sub-titles aren't displayed (so it'd be waste of space to include them in the dotcodes).
The Main Title gets clipped to 128 pixels width (that are, circa 22 characters), and, the e-Reader BIOS acts confused on multi-strip games with Main Titles longer than 26 characters (so the full 33 bytes may be used only in Japan; with 16bit charset).
If the title is empty (00h-filled), and there is only one card in the application, then the application is started immediately. That, without allowing the user to save it in FLASH memory.
Caution: Although shorter Titles do save memory, they do act unpleasant: the text "(C) P-Letter" will be displayed at the bottom of the loading screen.
On Japanese/Original, 8bit sub-titles can be up to 18 characters (without any end-byte) (or less when stats are enabled, due to limited screen width).

Card Types (Primary Type.Bit0 and Region/Type.Bit4-7)
  00h..01h  Blank Screen (?)
  02h..03h  Dotcode Application with 17byte-title, with stats, load music A
  04h..05h  Dotcode Application with 17byte-title, with stats, load music B
  06h..07h  P-Letter Attacks
  08h..09h  Construction Escape
  0Ah..0Bh  Construction Action
  0Ch..0Dh  Construction Melody Box
  0Eh       Dotcode Application with 33byte-title, without stats, load music A
  0Fh       Game specific cards
  10h..1Dh  P-Letter Viewer
  1Eh..1Fh  Same as 0Eh and 0Fh (see above)
The 'Application' types are meant to be executable GBA/Z80/NES programs.

  GBA Cart e-Reader Program Code

The GBA/Z80/NES program code is stored in the VPK compressed area.
NES-type is indicated by header [2Ah].Bit2, GBA-type is indicated by the NULL value inserted between VPK Size and VPK Data, otherwise Z80-type is used.

GBA Format
Load Address and Entrypoint are at 2000000h (in ARM state). The 32bit word at 2000008h may get destroyed by the e-Reader. Namely,
  IF e-Reader is Non-Japanese,
  AND [2000008h] is outside of range of 2000000h..20000E3h,
  AND only if booted from camera (not when booted from FLASH?),
  THEN [2000008h]=[2000008h]-0001610Ch ELSE [2000008h] kept intact
Existing multiboot-able GBA binaries can be converted to e-Reader format by,
  Store "B 20000C0h" at 2000000h   ;redirect to RAM-entrypoint
  Zerofill 2000004h..20000BFh      ;erase header (for better compression rate)
  Store 01h,01h at 20000C4h        ;indicate RAM boot
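
The three conversion steps can be sketched in Python as shown below, assuming that the multiboot image is held in memory so that offset 0 corresponds to address 2000000h. The value EA00002Eh is the ARM encoding of "B 20000C0h" when the opcode is located at 2000000h. The function name multiboot_to_ereader is made up for this example.

```python
def multiboot_to_ereader(image):
    """Patch a multiboot GBA image (loaded at 2000000h) for e-Reader use."""
    out = bytearray(image)
    # "B 20000C0h": offset field = (0C0h - 8) / 4 = 2Eh, stored little-endian
    out[0x00:0x04] = (0xEA00002E).to_bytes(4, "little")
    # erase the header for a better compression rate
    out[0x04:0xC0] = bytes(0xC0 - 0x04)
    # indicate RAM boot
    out[0xC4:0xC6] = b"\x01\x01"
    return bytes(out)
```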
The GBA code has full access to the GBA hardware, and may additionally use whatever API functions contained in the e-Reader BIOS. With the incoming LR register value, "mov r0,N, bx lr" returns to the e-Reader BIOS (with N being 0=Restart, or 2=To_Menu). No idea if it's necessary to preserve portions of RAM when returning to the e-Reader BIOS?
Caution: Unlike for normal GBA cartridges/multiboot files, the hardware is left uninitialized when booting dotcodes (among others: sound DMA is active, and brightness is set to zero), use "mov r0,0feh, swi 010000h" to get the normal settings.

NES Format
Emulates a NES (Nintendo Entertainment System) console (aka Family Computer).
The visible 240x224 pixel NES/NTSC screen resolution is resampled to 240x160 to match the smaller vertical resolution of the GBA hardware. So, writing e-Reader games in NES format will result in blurred screen output. The screen/sound/joypad is accessed via emulated NES I/O ports, and program code is running on an emulated 6502 8bit CPU. For more info on the NES hardware, see the no$nes debugger specifications.
The e-Reader's NES emulator supports only 16K PRG ROM, followed by 8K VROM. The emulation accuracy is very low, barely working with some of Nintendo's own NES titles; running the no$nes diagnostics program on it has successfully failed on ALL hardware tests ;-)
The load address for the 16K PRG-ROM is C000h, the 16bit NMI vector at [FFFAh] is encrypted like so:
  for i=17h to 0
   for j=07h to 0, nmi = nmi shr 1, if carry then nmi = nmi xor 8646h, next j
   nmi = nmi xor (byte[dmca_data+i] shl 8)
  next i
  dmca_data: db 0,0,'DMCA NINTENDO E-READER'
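
The same loop as a Python sketch, assuming that the function input is the plain 16bit NMI vector, and that the result is what gets stored at [FFFAh]. The function name encrypt_nmi_vector is made up for this example.

```python
DMCA_DATA = bytes([0, 0]) + b"DMCA NINTENDO E-READER"   # 18h bytes

def encrypt_nmi_vector(nmi):
    """Apply the NMI vector scrambling shown in the pseudocode above."""
    for i in range(0x17, -1, -1):
        for _ in range(8):
            carry = nmi & 1          # bit shifted out by "shr 1"
            nmi >>= 1
            if carry:
                nmi ^= 0x8646
        nmi ^= DMCA_DATA[i] << 8
    return nmi
```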
The 16bit reset vector at [FFFCh] contains:
  Bit0-14  Lower bits of Entrypoint (0..7FFFh = Address 8000h..FFFFh)
  Bit15    Nametable Mode (0=Vertical Mirroring, 1=Horizontal Mirroring)
   (NES limitations, 1 16K program rom + 1-2 8K CHR rom, mapper 0 and 1)
   ines mapper 1 would be MMC1, rather than CNROM (ines mapper 3)?
   but, there are more or less NONE games that have 16K PRG ROM + 16K VROM?
The L+R Button key-combination allows to reset the NES, however, there seems to be no way to return to the e-Reader BIOS.

Z80/8080 Format
The e-Reader doesn't support the following Z80 opcodes:
  CB [Prefix]     E0 RET PO   E2 JP PO,nn   E4 CALL PO,nn   27 DAA    76 HALT
  ED [Prefix]     E8 RET PE   EA JP PE,nn   EC CALL PE,nn   D3 OUT (n),A
  DD [IX Prefix]  F3 DI       08 EX AF,AF'  F4 CALL P,nn    DB IN A,(n)
  FD [IY Prefix]  FB EI       D9 EXX        FC CALL M,nn    xx RST 00h..38h
That is leaving not more than six supported Z80 opcodes (DJNZ, JR, JR c/nc/z/nz), everything else are 8080 opcodes. Custom opcodes are:
  76 WAIT A frames, D3 WAIT n frames, and C7/CF RST 0/8 used for API calls.
The load address and entrypoint are at 0100h in the emulated Z80 address space. The Z80 doesn't have direct access to the GBA hardware, instead video/sound/joypad are accessed via API functions, invoked via RST 0 and RST 8 opcodes, followed by an 8bit data byte, and with parameters in the Z80 CPU registers. For example, "ld a,02h, rst 8, db 00h" does return to the e-Reader BIOS.
The Z80/8080 emulation is incredibly inefficient, written in HLL code, developed by somebody who knew nothing about emulation nor about ARM nor about Z80/8080 processors.

Running GBA-code on Japanese/Original e-Reader
Original e-Reader supports Z80 code only, but can be tweaked to run GBA-code:
  retry:
   ld bc,data // ld hl,00c8h      ;src/dst
  lop:
   ld a,[bc] // inc bc // ld e,a  ;lsb
   ld a,[bc] // inc bc // ld d,a  ;msb
   dw 0bcfh ;aka rst 8 // db 0bh  ;[4000000h+hl]=de (DMA registers)
   inc hl // inc  hl // ld a,l
   cp a,0dch // jr nz,lop
  mod1 equ $+1
   dw 37cfh ;aka rst 8 // db 37h  ;bx 3E700F0h
  ;below executed only on jap/plus... on jap/plus, above 37cfh is hl=[400010Ch]
   ld a,3Ah // ld [mod1],a                  ;bx 3E700F0h (3Ah instead 37h)
   ld hl,1 // ld [mod2],hl // ld [mod3],hl  ;base (0200010Ch instead 0201610Ch)
   jr retry
  data:
  mod2 equ $+1
   dd loader         ;40000C8h dma2sad (loader)            ;\
   dd 030000F0h      ;40000CCh dma2dad (mirrored 3E700F0h) ; relocate loader
   dd 8000000ah      ;40000D0h dma2cnt (copy 0Ah x 16bit)  ;/
  mod3 equ $+1
   dd main           ;40000D4h dma3sad (main)              ;\prepare main reloc
   dd 02000000h      ;40000D8h dma3dad (2000000h)          ;/dma3cnt see loader
   .align 2          ;alignment for 16bit-halfword
  org $+201600ch     ;jap/plus: adjusted to org $+200000ch
  loader:
   mov r0,80000000h  ;(dma3cnt, copy 10000h x 16bit)
   mov r1,04000000h  ;i/o base
   strb r1,[r1,208h] ;ime=0 (better disable ime before moving ram)
   str r0,[r1,0DCh]  ;dma3cnt (relocate to 2000000h)
   mov r15,2000000h  ;start relocated code at 2000000h in ARM state
  main:
   ;...insert/append whatever ARM code here...

  GBA Cart e-Reader API Functions

Z80 Interface (Special Opcodes)
  db 76h       ;Wait8bit A
  db D3h,xxh   ;Wait8bit xxh
  db C7h,xxh   ;RST0_xxh
  db CFh,xxh   ;RST8_xxh
  ld r,[00xxh]       ;get system values (addresses differ on jap/ori)
  ld r,[00C2h..C3h]  ;GetKeyStateSticky (jap/ori: 9F02h..9F03h)
  ld r,[00C4h..C5h]  ;GetKeyStateRaw    (jap/ori: 9F04h..9F05h)
  ld r,[00C0h..C1h]  ;see Exit and ExitRestart
  ld r,[00D0h..D3h]  ;see Mul16bit
For jap/ori, 9Fxxh isn't forwards compatible with jap/plus, so it'd be better to check joypad via IoRead.

GBA Interface
  bx [30075FCh] ;ApiVector ;in: r0=func_no,r1,r2,r3,[sp+0],[sp+4],[sp+8]=params
  bx lr         ;Exit      ;in: r0 (0=Restart, 2=To_Menu)

The various Wait opcodes and functions are waiting as many frames as specified. Many API functions have no effect until the next Wait occurs.

Z80 RST0_xxh Functions / GBA Functions 02xxh
  RST0_00h FadeIn, A speed, number of frames (0..x)
  RST0_01h FadeOut
  RST0_02h BlinkWhite
  RST0_03h  (?)
  RST0_04h  (?) blend_func_unk1
  RST0_05h  (?)
  RST0_06h  (?)
  RST0_07h  (?)
  RST0_08h  (?)
  RST0_09h  (?) _020264CC_check
  RST0_0Ah  (?) _020264CC_free
  RST0_0Bh N/A (bx 0)
  RST0_0Ch N/A (bx 0)
  RST0_0Dh N/A (bx 0)
  RST0_0Eh N/A (bx 0)
  RST0_0Fh N/A (bx 0)
  RST0_10h LoadSystemBackground, A number of background (1..101), E bg# (0..3)
  RST0_11h SetBackgroundOffset, A=bg# (0..3), DE=X, BC=Y
  RST0_12h SetBackgroundAutoScroll
  RST0_13h SetBackgroundMirrorToggle
  RST0_14h  (?)
  RST0_15h  (?)
  RST0_16h  (?) write_000000FF_to_02029494_
  RST0_17h  (?)
  RST0_18h  (?)
  RST0_19h SetBackgroundMode, A=mode (0..2)
  RST0_1Ah  (?)
  RST0_1Bh  (?)
  RST0_1Ch  (?)
  RST0_1Dh  (?)
  RST0_1Eh  (?)
  RST0_1Fh  (?)
  RST0_20h LayerShow
  RST0_21h LayerHide
  RST0_22h  (?)
  RST0_23h  (?)
  RST0_24h ... [20264DCh+A*20h+1Ah]=DE, [20264DCh+A*20h+1Ch]=BC
  RST0_25h  (?)
  RST0_26h  (?)
  RST0_27h  (?)
  RST0_28h  (?)
  RST0_29h  (?)
  RST0_2Ah  (?)
  RST0_2Bh  (?)
  RST0_2Ch  (?)
  RST0_2Dh LoadCustomBackground, A bg# (0..3), DE pointer to struct_background,
           max. tile data size = 3000h bytes, max. map data size = 1000h bytes
  RST0_2Eh GBA: N/A - Z80: (?)
  RST0_2Fh  (?)
  RST0_30h CreateSystemSprite, - -   (what "- -" ???)
  RST0_31h SpriteFree, HL sprite handle
  RST0_32h SetSpritePos, HL=sprite handle, DE=X, BC=Y
  RST0_33h  (?) sprite_unk2
  RST0_34h SpriteFrameNext
  RST0_35h SpriteFramePrev
  RST0_36h SetSpriteFrame, HL=sprite handle, E=frame number (0..x)
  RST0_37h  (?) sprite_unk3
  RST0_38h  (?) sprite_unk4
  RST0_39h SetSpriteAutoMove, HL=sprite handle, DE=X, BC=Y
  RST0_3Ah  (?) sprite_unk5
  RST0_3Bh  (?) sprite_unk6
  RST0_3Ch SpriteAutoAnimate
  RST0_3Dh  (?) sprite_unk7
  RST0_3Eh SpriteAutoRotateUntilAngle
  RST0_3Fh SpriteAutoRotateByAngle
  RST0_40h SpriteAutoRotateByTime
  RST0_41h  (?) sprite_unk8
  RST0_42h SetSpriteAutoMoveHorizontal
  RST0_43h SetSpriteAutoMoveVertical
  RST0_44h  (?) sprite_unk9
  RST0_45h SpriteDrawOnBackground
  RST0_46h SpriteShow, HL=sprite handle
  RST0_47h SpriteHide, HL=sprite handle
  RST0_48h SpriteMirrorToggle
  RST0_49h  (?) sprite_unk10
  RST0_4Ah  (?) sprite_unk11
  RST0_4Bh  (?) sprite_unk12
  RST0_4Ch GetSpritePos
  RST0_4Dh CreateCustomSprite
  RST0_4Eh  (?)
  RST0_4Fh  (?) sprite_unk14
  RST0_50h  (?) sprite_unk15
  RST0_51h  (?) sprite_unk16
  RST0_52h  (?) sprite_unk17
  RST0_53h  (?) sprite_unk18
  RST0_54h  (?)
  RST0_55h  (?) sprite_unk20
  RST0_56h  (?)
  RST0_57h SpriteMove
  RST0_58h  (?) sprite_unk22
  RST0_59h  (?) sprite_unk23
  RST0_5Ah  (?) sprite_unk24
  RST0_5Bh SpriteAutoScaleUntilSize, C=speed (higher value is slower),
           HL=sprite handle, DE=size (0100h = normal size,
           lower value = larger, higher value = smaller)
  RST0_5Ch SpriteAutoScaleBySize
  RST0_5Dh SpriteAutoScaleWidthUntilSize
  RST0_5Eh SpriteAutoScaleHeightBySize
  RST0_5Fh  (?)
  RST0_60h  (?)
  RST0_61h  (?)
  RST0_62h  (?)
  RST0_63h  (?)
  RST0_64h hl=[[2024D28h+a*4]+12h]
  RST0_65h  (?) sprite_unk25
  RST0_66h SetSpriteVisible, HL=sprite handle, E=(0=not visible, 1=visible)
  RST0_67h  (?) sprite_unk26
  RST0_68h  (?) set_sprite_unk27
  RST0_69h  (?) get_sprite_unk27
  RST0_6Ah  (?)
  RST0_6Bh  (?)
  RST0_6Ch  (?)
  RST0_6Dh  (?)
  RST0_6Eh hl=[hl+000Ah]  ;r0=[r1+0Ah]
  RST0_6Fh  (?)
  RST0_70h  (?)
  RST0_71h  (?)
  RST0_72h  (?)
  RST0_73h  (?)
  RST0_74h  (?)
  RST0_75h  (?)
  RST0_76h  (?)
  RST0_77h  (?)
  RST0_78h  (?)
  RST0_79h  (?)
  RST0_7Ah  (?)
  RST0_7Bh  (?)
  RST0_7Ch  (?) _0202FD2C_unk12
  RST0_7Dh Wait16bit ;HL=num_frames (16bit variant of Wait8bit opcode/function)
  RST0_7Eh SetBackgroundPalette, HL=src_addr, DE=offset, C=num_colors (1..x)
  RST0_7Fh GetBackgroundPalette(a,b,c)
  RST0_80h SetSpritePalette, HL=src_addr, DE=offset, C=num_colors (1..x)
  RST0_81h GetSpritePalette(a,b,c)
  RST0_82h ClearPalette
  RST0_83h  (?) _0202FD2C_unk11
  RST0_84h  (?)
  RST0_85h  (?)
  RST0_86h  (?)
  RST0_87h  (?) _0202FD2C_unk8
  RST0_88h  (?) _0202FD2C_unk7
  RST0_89h  (?)
  RST0_8Ah  (?) _0202FD2C_unk6
  RST0_8Bh  (?) _0202FD2C_unk5
  RST0_8Ch GBA: N/A - Z80: (?)
  RST0_8Dh GBA: N/A - Z80: (?)
  RST0_8Eh  (?)
  RST0_8Fh WindowHide
  RST0_90h CreateRegion, H=bg# (0..3), L=palbank# (0..15),
           D,E,B,C=x1,y1,cx,cy (in tiles), return: n/a (no$note: n/a ???)
  RST0_91h SetRegionColor
  RST0_92h ClearRegion
  RST0_93h SetPixel
  RST0_94h GetPixel
  RST0_95h DrawLine
  RST0_96h DrawRect
  RST0_97h  (?) _0202FD2C_unk4
  RST0_98h SetTextColor, A=region handle, D=color foreground (0..15),
           E=color background (0..15)
  RST0_99h DrawText, A=region handle, BC=pointer to text, D=X, E=Y
           (non-japan uses ASCII text, but japanese e-reader's use STH ELSE?)
  RST0_9Ah SetTextSize
  RST0_9Bh  (?) RegionUnk7
  RST0_9Ch  (?) _0202FD2C_unk3
  RST0_9Dh  (?) _0202FD2C_unk2
  RST0_9Eh  (?) _0202FD2C_unk1
  RST0_9Fh Z80: (?) - GBA: SetBackgroundModeRaw
  RST0_A0h  (?)
  RST0_A1h  (?)
  RST0_A2h  (?) RegionUnk6
  RST0_A3h GBA: N/A - Z80: (?)
  RST0_A4h GBA: N/A - Z80: (?)
  RST0_A5h  (?)
  RST0_A6h  (?)
  RST0_A7h  (?)
  RST0_A8h  (?)
  RST0_A9h  (?)
  RST0_AAh  (?)
  RST0_ABh  (?)
  RST0_ACh  (?)
  RST0_ADh  (?) RegionUnk5
  RST0_AEh [202FD2Ch+122h]=A
  RST0_AFh [202FD2Ch+123h]=A
  RST0_B0h [202FD2Ch+124h]=A
  RST0_B1h  (?)
  RST0_B2h  (?)
  RST0_B3h GBA: N/A - Z80: Sqrt   ;hl=sqrt(hl)
  RST0_B4h GBA: N/A - Z80: ArcTan ;hl=ArcTan2(hl,de)
  RST0_B5h Sine                   ;hl=sin(a)*de
  RST0_B6h Cosine                 ;hl=cos(a)*de
  RST0_B7h  (?)
  RST0_B8h  (?)
  RST0_B9h N/A (bx 0)
  RST0_BAh N/A (bx 0)
  RST0_BBh N/A (bx 0)
  RST0_BCh N/A (bx 0)
  RST0_BDh N/A (bx 0)
  RST0_BEh N/A (bx 0)
  RST0_BFh N/A (bx 0)
  Below Non-Japan and Japan/Plus only (not Japan/Ori)
  RST0_C0h GetTextWidth(a,b)
  RST0_C1h GetTextWidthEx(a,b,c)
  RST0_C2h  (?)
  RST0_C3h Z80: N/A (bx 0) - GBA: (?)
  RST0_C4h  (?)
  RST0_C5h  (?)
  RST0_C6h  (?)
  RST0_C7h  (?)
  RST0_C8h  (?)
  RST0_C9h  (?)
  RST0_CAh  (?)
  RST0_CBh  (?)
  RST0_CCh  (?)
  RST0_CDh N/A (bx lr)
  RST0_CEh ;same as RST0_3Bh, but with 16bit mask
  RST0_CFh ;same as RST0_3Eh, but with 16bit de
  RST0_D0h ;same as RST0_3Fh, but with 16bit de
  RST0_D1h ;same as RST0_5Bh, but with 16bit de
  RST0_D2h ;same as RST0_5Ch, but with 16bit de
  RST0_D3h ;same as RST0_5Dh, but with 16bit de
  RST0_D4h ;same as RST0_5Eh, but with 16bit de
  RST0_D5h  (?)
  RST0_D6h  (?)
  RST0_D7h ;[202FD2Ch+125h]=A
  RST0_D8h  (?)
  RST0_D9h  (?)
  RST0_DAh  (?)
  RST0_DBh ;A=[3003E51h]
  RST0_DCh ;[3004658h]=01h
  RST0_DDh DecompressVPKorNonVPK
  RST0_DEh FlashWriteSectorSingle(a,b)
  RST0_DFh FlashReadSectorSingle(a,b)
  RST0_E0h SoftReset
  RST0_E1h GetCartridgeHeader     ;[hl+0..BFh]=[8000000h..80000BFh]
  RST0_E2h GBA: N/A - Z80: bx hl  ;in: hl=addr, af,bc,de,sp=param, out: a
  RST0_E3h Z80: N/A (bx 0) - GBA: (?)
  RST0_E4h  (?)
  RST0_E5h  (?)
  RST0_E6h  (?)
  RST0_E7h  (?)
  RST0_E8h  (?)
  RST0_E9h ;[2029498h]=0000h
  RST0_EAh Z80: N/A (bx 0) - GBA: InitMemory(a)
  RST0_EBh  (?) BL_irq_sio_dma3
  RST0_ECh ;hl = [3003E30h]*100h + [3003E34h]
  RST0_EDh FlashWriteSectorMulti(a,b,c)
  RST0_EEh FlashReadPart(a,b,c)
  RST0_EFh ;A=((-([2029416h] xor 1)) OR (+([2029416h] xor 1))) SHR 31
  RST0_F0h  (?) _unk1
  RST0_F1h RandomInit     ;in: hl=random_seed
  RST0_F2h  (?)
  Below Japan/Plus only
  RST0_F3h  (?)
  RST0_F4h  (?)
  RST0_F5h  (?)
  RST0_F6h  (?)
  RST0_F7h GBA: N/A - Z80: (?)
  Below is undefined/garbage (values as so in Z80 mode)
  Jap/Ori: RST0_C0h      N/A (bx 0)
  Jap/Ori: RST0_C1h..FFh Overlaps RST8 jump list
  Non-Jap: RST0_F3h..FFh Overlaps RST8 jump list
  Jap/Pls: RST0_F8h..FFh Overlaps RST8 jump list

Z80 RST8_xxh Functions / GBA Functions 01xxh
  RST8_00h GBA: N/A - Z80: Exit       ;[00C0h]=a ;(1=restart, 2=exit)
  RST8_01h GBA: N/A - Z80: Mul8bit    ;hl=a*e
  RST8_02h GBA: N/A - Z80: Mul16bit   ;hl=hl*de, s32[00D0h]=hl*de
  RST8_03h Div                        ;hl=hl/de
  RST8_04h DivRem                     ;hl=hl mod de
  RST8_05h PlaySystemSound            ;in: hl=sound_number
  RST8_06h  (?) sound_unk1
  RST8_07h Random8bit                 ;a=random(0..FFh)
  RST8_08h SetSoundVolume
  RST8_09h BcdTime                    ;[de+0..5]=hhmmss(hl*bc)
  RST8_0Ah BcdNumber                  ;[de+0..4]=BCD(hl), [de+5]=00h
  RST8_0Bh IoWrite                    ;[4000000h+hl]=de
  RST8_0Ch IoRead                     ;de=[4000000h+hl]
  RST8_0Dh GBA: N/A - Z80:  (?)
  RST8_0Eh GBA: N/A - Z80:  (?)
  RST8_0Fh GBA: N/A - Z80:  (?)
  RST8_10h GBA: N/A - Z80:  (?)
  RST8_11h DivSigned                  ;hl=hl/de, signed
  RST8_12h RandomMax                  ;a=random(0..a-1)
  RST8_13h SetSoundSpeed
  RST8_14h  hl=[202FD20h]=[2024CACh]
  RST8_15h  hl=[2024CACh]-[202FD20h]
  RST8_16h SoundPause
  RST8_17h SoundResume
  RST8_18h PlaySystemSoundEx
  RST8_19h IsSoundPlaying
  RST8_1Ah  (?)
  RST8_1Bh  (?)
  RST8_1Ch  (?)
  RST8_1Dh GetExitCount               ;a=[2032D34h]
  RST8_1Eh Permille                   ;hl=de*1000/hl
  RST8_1Fh GBA: N/A - Z80: ExitRestart;[2032D38h]=a, [00C0h]=0001h  ;a=?
  RST8_20h GBA: N/A - Z80: WaitJoypad ;wait until joypad<>0, set hl=joypad
  RST8_21h GBA: N/A - Z80:  (?)
  RST8_22h  (?) _sound_unk7
  RST8_23h  (?) _sound_unk8
  RST8_24h  (?) _sound_unk9
  RST8_25h  (?) _sound_unk10
  RST8_26h Mosaic     ;bg<n>cnt.bit6=a.bit<n>, [400004Ch]=de
  RST8_27h  (?)
  RST8_28h  (?)
  RST8_29h  (?)
  RST8_2Ah  (?) get_8bit_from_2030110h
  RST8_2Bh  (?)
  RST8_2Ch  (?) get_16bit_from_2030112h ;jap/ori: hl=[20077B2h]
  RST8_2Dh  (?) get_16bit_from_2030114h ;jap/ori: hl=[20077B4h]
  RST8_2Eh  (?)
  RST8_2Fh PlayCustomSound(a,b)
  Below not for Japanese/Original
  (the renumbered functions can be theoretically used on japanese/original)
  (but, doing so would blow forwards compatibility with japanese/plus)
  RST8_30h (ori: none)      GBA: N/A - Z80: (?)
  RST8_31h (ori: none)      PlayCustomSoundEx(a,b,c)
  RST8_32h (ori: RST8_30h)  BrightnessHalf   ;[4000050h]=00FFh,[4000054h]=0008h
  RST8_33h (ori: RST8_31h)  BrightnessNormal ;[4000050h]=0000h
  RST8_34h (ori: RST8_32h)  N/A (bx lr)
  RST8_35h (ori: RST8_33h)   (?)
  RST8_36h (ori: RST8_34h)  ResetTimer ;[400010Ch]=00000000h, [400010Eh]=A+80h
  RST8_37h (ori: RST8_35h)  GetTimer   ;hl=[400010Ch]
  RST8_38h (ori: none)      GBA: N/A - Z80:  (?)
  Below is undefined/reserved/garbage (values as so in Z80 mode)
  (can be used to tweak jap/ori to start GBA-code from inside of Z80-code)
  (that, after relocating code to 3000xxxh via DMA via IoWrite function)
  RST8_39h (ori: RST8_36h)  bx 0140014h
  RST8_3Ah (ori: RST8_37h)  bx 3E700F0h
  RST8_3Bh (ori: RST8_38h)  bx 3E70000h+1
  RST8_3Ch (ori: RST8_39h)  bx 3E703E6h+1
  RST8_3Dh (ori: RST8_3Ah)  bx 3E703E6h+1
  RST8_3Eh (ori: RST8_3Bh)  bx 3E703E6h+1
  RST8_3Fh (ori: RST8_3Ch)  bx 3E703E6h+1
  40h-FFh  (ori: 3Dh-FFh)   bx ...

GBA Functions 03xxh (none such in Z80 mode)
  RSTX_00h Wait8bit  ;for 16bit: RST0_7Dh
  RSTX_01h GetKeyStateSticky()
  RSTX_02h GetKeyStateRaw()
  RSTX_03h  (?)
  RSTX_04h  (?)

  GBA Cart e-Reader VPK Decompression

vpk_decompress(src,dest)
  collected32bit=80000000h  ;initially empty (endflag in bit31)
  for i=0 to 3, id[i]=read_bits(8), next i, if id[0..3]<>'vpk0' then error
  dest_end=dest+read_bits(32)     ;size of decompressed data (of all strips)
  method=read_bits(8), if method>1 then error
  tree_index=0, read_huffman_tree, disproot=tree_index
  tree_index=tree_index+1, read_huffman_tree, lenroot=tree_index
  ;above stuff is contained only in the first strip. below loop starts at
  ;current location in first strip, and does then continue in further strips.
 decompress_loop:
  if read_bits(1)=0 then                   ;copy one uncompressed data byte,
    [dest]=read_bits(8), dest=dest+1       ;does work without huffman trees
  else
    if disproot=-1 or lenroot=-1 then error  ;compression does require trees
    disp=read_tree(disproot)
    if method=1   ;disp*4 is good for 32bit ARM opcodes
      if disp>2 then disp=disp*4-8 else disp=disp+4*read_tree(disproot)-7
    len=read_tree(lenroot)
    if len=0 or disp<=0 or dest+len-1>dest_end then error ;whoops
    for j=1 to len, [dest]=[dest-disp], dest=dest+1, next j
  endif
  if dest<dest_end then decompress_loop
  ret

read_bits(num)
  mov  data=0
  for i=1 to num
    shl collected32bit,1   ;move next bit to carry, or set zeroflag if empty
    if zeroflag
      collected32bit=[src+0]*1000000h+[src+1]*10000h+[src+2]*100h+[src+3]
      src=src+4            ;read data in 32bit units, in reversed byte-order
      carryflag=1          ;endbit
      rcl collected32bit,1 ;move bit31 to carry (and endbit to bit0)
    endif
    rcl data,1             ;move carry to data
  next i
  ret(data)

read_tree(root_index)
  i=root_index
  while node[i].right<>-1  ;loop until reaching data node
    if read_bits(1)=1 then i=node[i].right else i=node[i].left
  endwhile
  i=node[i].left           ;get number of bits
  i=read_bits(i)           ;read that number of bits
  ret(i)                   ;return that value

read_huffman_tree
  stacktop=sp
  if read_bits(1)=1 then tree_index=-1, ret  ;exit (empty)
  node[tree_index].right=-1                  ;indicate data node
  node[tree_index].left=read_bits(8)         ;store data value
  if read_bits(1)=1 then ret                 ;exit (only 1 data node at root)
  push tree_index                     ;save previous (child) node
  tree_index=tree_index+1
  jmp data_injump
 load_loop:
  push tree_index                     ;save previous (child) node
  tree_index=tree_index+1
  if read_bits(1)=1 then parent_node
 data_injump:
  node[tree_index].right=-1           ;indicate data node
  node[tree_index].left=read_bits(8)  ;store data value
  jmp load_loop
 parent_node:
  pop node[tree_index].right          ;store 1st child
  pop node[tree_index].left           ;store 2nd child
  if sp<>stacktop then jmp load_loop
  if read_bits(1)=0 then error        ;end bit (must be 1)
  ret
The best values for the huffman trees that I've found are 6,9,12-bit displacements for method 0 (best for NES/Z80 code), and two less for method 1, ie. 4,7,10-bit (best for GBA code). And 2,4,10-bit for the length values. The smallest value in node 0, and the other values in node 10 and 11.

The decompression works similar to the GBA BIOS'es LZ77 decompression function, but without using fixed bit-widths of length=4bit and displacement=12bit, instead, the bit-widths are read from huffman trees (which can also define fixed bit-widths; if data is located directly in the root node).
Unlike the GBA BIOS'es Huffman decompression function, the trees are starting with data entries, and are ending with the root entry. The above load function deciphers the data, and returns the root index.
With the variable bit-widths, the VPK compression rate is quite good, only, it's a pity that the length/disp values are zero-based, eg. for 2bit and 4bit lengths, it'd be much better to assign 2bit as 2..5, and 4bit as 6..21.

The e-Reader additionally supports an alternate decompression function, indicated by the absence of the "vpk0" ID, which supports compression of increasing byte-values, which isn't useful for program code.
Bit15 of the VPK Size value seems to disable (de-)compression, the VPK Data field is then containing plain uncompressed data.
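
The "vpk0" decompression described in this chapter can be sketched in Python as shown below. This is an illustration only: it takes the whole VPK Data (of all strips) as one byte string, it doesn't handle the alternate non-vpk0 function, and its error checks are slightly stricter than the pseudocode (it also rejects displacements that point before the start of the output). The names BitReader, read_huffman_tree, read_tree and vpk_decompress are made up for this example.

```python
class BitReader:
    """Reads bits MSB-first; the source is fetched in 32bit big-endian units."""
    def __init__(self, src):
        self.src, self.pos, self.word, self.left = bytes(src), 0, 0, 0

    def read(self, num):
        data = 0
        for _ in range(num):
            if self.left == 0:
                chunk = self.src[self.pos:self.pos + 4].ljust(4, b"\0")
                self.word, self.pos, self.left = int.from_bytes(chunk, "big"), self.pos + 4, 32
            self.left -= 1
            data = (data << 1) | ((self.word >> self.left) & 1)
        return data

def read_huffman_tree(bits, nodes):
    """Append one tree to nodes (as [left, right]); return root index or -1."""
    if bits.read(1) == 1:
        return -1                                  # empty tree
    nodes.append([bits.read(8), -1])               # right=-1 marks a data node
    if bits.read(1) == 1:
        return len(nodes) - 1                      # only 1 data node at root
    stack = [len(nodes) - 1]
    nodes.append([bits.read(8), -1])
    while True:
        stack.append(len(nodes) - 1)               # save previous (child) node
        if bits.read(1) == 1:                      # parent node
            right, left = stack.pop(), stack.pop()
            nodes.append([left, right])
            if not stack:
                break
        else:                                      # further data node
            nodes.append([bits.read(8), -1])
    if bits.read(1) == 0:
        raise ValueError("missing end bit")
    return len(nodes) - 1

def read_tree(bits, nodes, root):
    i = root
    while nodes[i][1] != -1:                       # loop until reaching data node
        i = nodes[i][1] if bits.read(1) else nodes[i][0]
    return bits.read(nodes[i][0])                  # data node holds the bit-width

def vpk_decompress(src):
    bits = BitReader(src)
    if bytes(bits.read(8) for _ in range(4)) != b"vpk0":
        raise ValueError("missing vpk0 ID")
    size, method = bits.read(32), bits.read(8)
    if method > 1:
        raise ValueError("unknown method")
    nodes = []
    disproot = read_huffman_tree(bits, nodes)
    lenroot = read_huffman_tree(bits, nodes)
    out = bytearray()
    while len(out) < size:
        if bits.read(1) == 0:
            out.append(bits.read(8))               # uncompressed data byte
            continue
        if disproot == -1 or lenroot == -1:
            raise ValueError("compression does require trees")
        disp = read_tree(bits, nodes, disproot)
        if method == 1:
            disp = disp * 4 - 8 if disp > 2 else disp + 4 * read_tree(bits, nodes, disproot) - 7
        length = read_tree(bits, nodes, lenroot)
        if length == 0 or disp <= 0 or disp > len(out) or len(out) + length > size:
            raise ValueError("bad length/displacement")
        for _ in range(length):
            out.append(out[-disp])
    return bytes(out)
```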

  GBA Cart e-Reader Error Correction

The Error Correction Information that is appended at the end of the Block Header & Data Fragments consists of standard Reed-Solomon codes, which are also used for CD/DVD disks, DSL modems, and digital DVB television signals. That info allows to locate and repair a number of invalid data bytes.

Below code shows how to create and verify error-info (but not how to do the actual error correction). The dtalen,errlen values should be 18h,10h for the Block Header, and 40h,10h for Data Fragments; the latter settings might be possible to get changed to other values though?

  for i=dtalen-1 to errlen  ;loop across data portion
    z = rev[ data[i] xor data[errlen-1] ]
    for j=errlen-1 to 0     ;loop across error-info portion
      if j=0 then x=00h else x=data[j-1]
      if z<>FFh then
        y=gg[j], if y<>FFh then
          y=y+z, if y>=FFh then y=y-FFh
          x=x xor pow[y]
      data[j]=x             ;store updated error-info byte
    next j
  next i

  for i=78h to 78h+errlen-1
    x=0, z=0
    for j=0 to dtalen-1
      y=data[j]             ;data[] must be in logarithmic (rev) form here
      if y<>FFh then
        y=y+z, if y>=FFh then y=y-FFh
        x=x xor pow[y]
      z=z+i, if z>=FFh then z=z-FFh
    next j
    if x<>0 then error
  next i
  ;(if errors occurred, could correct them now)

  for i=0 to len-1, data[i]=rev[data[i]], next i       ;convert values to logarithms

  for i=0 to len-1, data[i]=pow[data[i]], next i       ;convert logarithms back to values

  for i=0 to len-1, data[i]=data[i] xor FFh, next i    ;invert all bytes

  for i=0 to len-1, data[i]=00h, next i                ;zero-fill

  for i=0 to (len-1)/2, x=data[i], data[i]=data[len-1-i], data[len-1-i]=x, next i  ;reverse byte order

  x=01h, pow[FFh]=00h, rev[00h]=FFh
  for i=00h to FEh
    pow[i]=x, rev[x]=i, x=x*2, if x>=100h then x=x xor 187h
  next i

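  gg[0]=pow[78h]
  for i=1 to errlen-1
    gg[i]=01h
    for j=i downto 0
      if j=0 then y=00h else y=gg[j-1]
      x=gg[j], if x<>00h then
        x=rev[x]+78h+i, if x>=FFh then x=x-FFh
        y=y xor pow[x]
      gg[j]=y
    next j
  next i
  for i=0 to errlen-1, gg[i]=rev[gg[i]], next i        ;convert gg to logarithms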
With above value of 78h, and errlen=10h, gg[00h..0Fh] will be always:
So using a hardcoded table should take up less memory than calculating it.
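
As a cross-check of the table generation, here is a minimal Python sketch. It builds pow/rev with the 187h polynomial and forms a generator polynomial as the product of (x + pow[78h+i]) for i=0..errlen-1, which is the usual Reed-Solomon construction for b0=78h. The function names are hypothetical, and the coefficient layout (plain values, lowest degree first, explicit leading 1) is an assumption that need not match the gg[] layout used above.

```python
def make_tables(poly=0x187):
    """Build the antilog (pow) and log (rev) tables for GF(256)."""
    pow_tab = [0] * 256
    rev_tab = [0] * 256
    pow_tab[0xFF] = 0x00          # unused slot, as in the pseudocode
    rev_tab[0x00] = 0xFF          # log(0) is undefined; FFh marks "no value"
    x = 0x01
    for i in range(0xFF):
        pow_tab[i] = x
        rev_tab[x] = i
        x <<= 1
        if x >= 0x100:
            x ^= poly
    return pow_tab, rev_tab

def gf_mul(a, b, pow_tab, rev_tab):
    """Multiply two field elements via the log tables."""
    if a == 0 or b == 0:
        return 0
    return pow_tab[(rev_tab[a] + rev_tab[b]) % 0xFF]

def make_generator(errlen, pow_tab, rev_tab, b0=0x78):
    """Coefficients (lowest degree first) of prod (x + alpha^(b0+i))."""
    g = [1]
    for i in range(errlen):
        root = pow_tab[(b0 + i) % 0xFF]
        nxt = [0] * (len(g) + 1)
        for j, c in enumerate(g):
            nxt[j] ^= gf_mul(c, root, pow_tab, rev_tab)  # c * root * x^j
            nxt[j + 1] ^= c                              # c * x^(j+1)
        g = nxt
    return g

def poly_eval(coeffs, x, pow_tab, rev_tab):
    """Evaluate a polynomial at x using Horner's scheme."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x, pow_tab, rev_tab) ^ c
    return acc
```

By construction, every pow[78h+i] is a root of the resulting polynomial, which makes the sketch easy to self-check.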

The actual error correction should be able to fix up to "errlen" errors at known locations (eg. data from blocks that haven't been scanned, or whose 5bit-to-4bit conversion had failed due to an invalid 5bit value), or up to "errlen/2" errors at unknown locations. The corrected data isn't guaranteed to be correct (even if it looks okay to the "verify" function), so the Data Header checksums should be checked, too.

More Info
For more info, I've found Reed-Solomon source code from Simon Rockliff, and an updated version from Robert Morelos-Zaragoza and Hari Thirumoorthy to be useful. For getting started with that source, some important relationships & differences are:
  pow = alpha_to, but generated as shown above
  rev = index_of, ditto
  b0  = 78h
  nn  = dtalen
  kk  = dtalen-errlen
  %nn = MOD FFh (for the ereader that isn't MOD dtalen)
  -1  = FFh
And, the ereader processes data/errinfo backwards, starting at the last byte.
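
The verify step boils down to computing errlen syndromes and checking that all are zero. A minimal Python sketch, assuming the block is passed as plain byte values in the order the verify loop indexes them; the reversed byte order and the inversion of the error-info bytes are not modelled here, and all names are hypothetical:

```python
def make_pow(poly=0x187):
    """Antilog table for GF(256), generated as in the table-creation loop."""
    tab, x = [], 1
    for _ in range(0xFF):
        tab.append(x)
        x <<= 1
        if x >= 0x100:
            x ^= poly
    return tab

POW = make_pow()
LOG = {value: index for index, value in enumerate(POW)}

def syndromes(block, errlen=0x10, b0=0x78):
    """Return the errlen syndromes S_i = sum(block[j] * alpha^(i*j))."""
    out = []
    for i in range(b0, b0 + errlen):
        s = 0
        for j, byte in enumerate(block):
            if byte:                      # zero bytes contribute nothing
                s ^= POW[(LOG[byte] + i * j) % 0xFF]
        out.append(s)
    return out

def looks_valid(block, errlen=0x10):
    """True if every syndrome is zero (no detectable error)."""
    return not any(syndromes(block, errlen))
```

A block that passes this test can still be wrong after a failed correction attempt, which is why the Data Header checksums should be checked as well.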

  GBA Cart e-Reader File Formats

.BMP Files (homebrew 300 DPI strips)
Contains a picture of the whole dotcode strip with address bars and sync marks (see Dotcode chapter) in Microsoft's Bitmap format. The image is conventionally surrounded by a blank 2-pixel border, resulting in a size of 989x44 pixels for long strips. The file should have 1bit color depth. The pixels per meter entry should match the desired printing resolution, either 300 DPI or 360 DPI. But, resolution of printer hardware is typically specified in inches rather than in meters, so an exact match isn't supported by Microsoft. Most homebrew .BMP files contain nonsense resolutions like 200 DPI, or 300 dots per meter (ca. 8 DPI).

.JPG Files (scanned 1200 DPI strips)
Same as BMP, but should contain a dotcode scanned at 1200 DPI, with correct orientation (the card-edge side at the bottom of the image), and containing only the dotcode (not the whole card), so the JPG size should be about 3450x155 pixels for long strips.
No$gba currently doesn't work with progressive JPGs. Scans with white background can be saved as monochrome JPG. Scans with red/yellow background should contain a correct RED layer (due to the red LED light source) (the brightness of the green/blue layers can be set to zero for better compression).

.RAW Files
Contains the "raw" information from the BMP format, that is, 2-byte block header, 102-byte data, 2-byte block header, 102-byte data, etc. The data portion is interleaved, and includes the full 48-byte data header, titles, vpk compressed data, error-info, and unused bytes. RAW files are excluding Address Bars, Sync Marks, and 4bit-to-5bit encoding.
Each RAW file contains one or more strip(s), so the RAW filesize is either 18*104 bytes (short strip), or 28*104 bytes (long strip), or a multiple thereof (if it contains more than one strip) (although multi-strip games are often stored in separate files for each strip; named file1.raw, file2.raw, etc).

.BIN Files
Filesize should be I*30h, with I=1Ch for short strips, and I=2Ch for long strips, or a multiple thereof (if it contains more than one strip). Each strip consists of the 48-byte Data Header, followed by title(s), and vpk compressed data. Unlike .RAW files, .BIN files aren't interleaved, and do not contain Block Headers, nor error-info, nor unused bytes (in last block). The files do contain padding bytes to match a full strip-size of I*30h.
Caution: Older .BIN files used a size-reduced 12-byte header (taken from entries 0Dh, 0Ch, 10h-11h, 26h-2Dh of the 48-byte Data Header; in that order). Those files never contained more than one strip per file, so the filesize should be exactly I*30h-36. The size-reduced header doesn't contain a Primary Type entry, so it's anyone's guess which Card Type is to be used (hint: the 12-byte headers were based on the assumption that Primary Type would be always 01h on Short Strips, and 02h on Long Strips).
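
The size rules for .RAW and .BIN files can be turned into a quick plausibility check. The following Python sketch uses hypothetical function names and returns every interpretation that fits, since some sizes are ambiguous (for example, a large multi-strip file may be a multiple of both strip sizes):

```python
RAW_BLOCK = 104                            # 2-byte block header + 102-byte data
RAW_BLOCKS = {"short": 18, "long": 28}     # blocks per strip
BIN_UNIT = 0x30
BIN_UNITS = {"short": 0x1C, "long": 0x2C}  # the value "I" from the text
OLD_HEADER_SAVING = 36                     # 48-byte header reduced to 12 bytes

def classify_raw(size):
    """Return (strip kind, strip count) pairs matching a .RAW filesize."""
    return [(kind, size // (n * RAW_BLOCK))
            for kind, n in RAW_BLOCKS.items()
            if size and size % (n * RAW_BLOCK) == 0]

def classify_bin(size):
    """Return (strip kind, strip count, header type) for a .BIN filesize."""
    hits = []
    for kind, n in BIN_UNITS.items():
        strip = n * BIN_UNIT
        if size and size % strip == 0:
            hits.append((kind, size // strip, "48-byte header"))
        if size == strip - OLD_HEADER_SAVING:
            hits.append((kind, 1, "12-byte header"))
    return hits
```

An empty result means the filesize matches none of the layouts described above.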

.SAV Files
Contains a copy of the e-Reader's 128Kbyte FLASH memory. With the saved e-Reader application being located in the 2nd 64K-bank, the data consists of a header with title and gba/nes/z80 format info, followed by the vpk compressed data. The FLASH memory does also contain e-Reader calibration settings, the remaining 100Kbytes are typically FFh-filled.

  GBA Cart Unknown Devices

GBA Infra-Red Port (AGB-006)
No info?

  GBA Cart Protections

Classic NES Series
These are some NES/Famicom games ported or emulated to work on GBA. The games are doing some uncommon stuff that can cause compatibility problems when not using original GBA consoles or cartridges.
- CPU pipeline (selfmodifying code that shall NOT affect prefetched opcodes)
- STMDA write to I/O ports (writes in INCREASING order, not DECREASING order)
- SRAM detection (refuses to run if SRAM exists; the games do contain EEPROM)
- ROM mirrors (instead of the usual increasing numbers in unused ROM area)
- RAM mirrors (eg. main RAM accessed at 2F00000h instead of 2000000h)
Note: These games can be detected by checking [80000ACh]="F" (ie. game code="Fxxx").
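
For a tool working on a ROM image rather than on the live bus, the check at [80000ACh] corresponds to file offset ACh. A minimal Python sketch (the function name is hypothetical):

```python
def is_classic_nes(rom: bytes) -> bool:
    """True if the game code at ROM offset ACh starts with "F"."""
    return len(rom) > 0xAC and rom[0xAC] == ord("F")
```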

  GBA Flashcards

Flashcards are re-writable cartridges using FLASH memory, allowing to test even multiboot-incompatible GBA software on real hardware, providing a good development environment when used in combination with a reasonable software debugger.

The carts can be written to from external tools, or directly from GBA programs.
Below are pseudo code flowcharts for detect, erase, and write operations.
All flash reads/writes are meant to be 16bit (ldrh/strh) memory accesses.

 configure_flashcard(9E2468Ah,9413h)    ;unlock flash advance cards
 turbo=1, send_command(8000000h,90h)    ;enter ID mode (both chips, if any)
 maker=[8000000h], device=[8000000h+2]
 IF maker=device THEN device=[8000000h+4] ELSE turbo=0
 flashcard_read_mode                    ;exit ID mode
 search (maker+device*10000h) in device_list
 total/erase/write_block_size = list_entry SHL turbo

 FOR x=1 to len/erase_block_size
  send_command(dest,20h)        ;erase sector command
  send_command(dest,D0h)        ;confirm erase sector
  dest=dest+erase_block_size    ;advance to next sector
  IF wait_busy=okay THEN NEXT x
 enter_read_mode                ;exit erase/status mode

 FOR x=1 to len/siz
  IF siz=2 THEN send_command(dest,10h)  ;write halfword command
  IF siz>2 THEN send_command(dest,E8h)  ;write to buffer command
  IF siz>2 THEN send_command(dest,16-1) ;buffer size 16 halfwords (per chip)
  FOR y=1 TO siz/2
   [dest]=[src], dest=dest+2, src=src+2 ;write data to buffer
  NEXT y
  IF siz>2 THEN send_command(dest,D0h)  ;confirm write to buffer
 IF wait_busy=okay THEN NEXT x
 enter_read_mode                        ;exit write/status mode

 [adr]=val                      ;send_command: write command to 1st chip
 IF turbo THEN [adr+2]=val      ;and to 2nd chip (turbo cards)

 send_command(8000000h,FFh)     ;exit status mode
 send_command(8000000h,FFh)     ;again maybe more stable (as in jeff's source)

 start=time
 REPEAT
  stat=[8000000h] XOR 80h
  IF turbo THEN stat=stat OR ([8000000h+2] XOR 80h)
  IF (stat AND 7Fh)>0 THEN error
  IF (stat AND 80h)=0 THEN ready
  IF time-start>5secs THEN timeout
 UNTIL ready OR error OR timeout
 IF error OR timeout THEN send_command(8000000h,50h)    ;clear status

configure_flashcard(adr,val): ;required for Flash Advance cards only
 [802468Ah]=1234h, repeated 500 times
 [802468Ah]=5678h, repeated 500 times
 [802468Ah]=ABCDh, repeated 500 times

init_backup: ;no info how to use that exactly

device_list: (id code, total/erase/write sizes in bytes)
  ID Code    Total   Erase  Write  Name
  -??-00DCh      ?       ?      ?  Hudson Cart (???)
  00160089h     4M    128K     32  Intel i28F320J3A (Flash Advance)
  00170089h     8M    128K     32  Intel i28F640J3A (Flash Advance)
  00180089h    16M    128K     32  Intel i28F128J3A (Flash Advance)
  00E200B0h      ?     64K      2  Sharp LH28F320BJE ? (Nintendo)

All flashcards should work at 4,2 waitstates (power on default); most commercial games change waits to 3,1, which may be unstable with some/older FA flashcards. Intel FLASH is specified to have a lifetime of 100,000 erases, and an average block erase time of 1 second (up to 5 seconds in worst cases).
Aside from the main FLASH memory, Flash Advance (FA) (aka Visoly) cards additionally contain battery buffered SRAM backup, and FLASH backup, and in some cases also EEPROM backup.
Turbo FA cards are containing two chips interlaced (at odd/even halfword addresses), allowing to write/erase both chips simultaneously, resulting in twice as fast programming time.
Standard Nintendo flash carts have to be modified before you can actually write to them. This is done by removing resistor R7 and putting it at empty location R8.
Mind that write/erase/detect modes output status information in the ROM area, so in those modes all GBA program code (and any interrupt handlers) must be executed in WRAM, not in ROM.
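
The detection logic (maker/device compare, turbo flag, and size scaling) can be modelled off-target. The Python sketch below takes the three halfwords read in ID mode from 8000000h, 8000000h+2, and 8000000h+4 as plain arguments. The function name and the returned dictionary are hypothetical, and only the Intel entries with known sizes are listed:

```python
# (name, total bytes, erase block bytes, write block bytes)
DEVICE_LIST = {
    0x00160089: ("Intel i28F320J3A", 4 << 20, 128 << 10, 32),
    0x00170089: ("Intel i28F640J3A", 8 << 20, 128 << 10, 32),
    0x00180089: ("Intel i28F128J3A", 16 << 20, 128 << 10, 32),
}

def identify(id0, id2, id4):
    """Mirror the detect flowchart; returns None for unknown ID codes."""
    maker, device, turbo = id0, id2, 1
    if maker == device:
        device = id4        # two interlaced chips: both report the maker first
    else:
        turbo = 0
    entry = DEVICE_LIST.get(maker + device * 0x10000)
    if entry is None:
        return None
    name, total, erase, write = entry
    return {"name": name, "turbo": bool(turbo),
            "total": total << turbo,    # sizes double with two chips
            "erase": erase << turbo,
            "write": write << turbo}
```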

Thanks to Jeff Frohwein for his FAQ and CARTLIB sample in FLGBA at

  GBA Cheat Devices

Codebreaker (US) aka Xploder (EUR).
Gameshark (US) aka Action Replay (EUR).

GBA Cheat Codes - General Info
GBA Cheat Codes - Codebreaker/Xploder
GBA Cheat Codes - Gameshark/Action Replay V1/V2
GBA Cheat Codes - Pro Action Replay V3

  GBA Cheat Codes - General Info

Cheat devices are external adapters, connected between the GBA and the game cartridge. The devices include a BIOS ROM which is, among others, used to prompt the user to enter cheat codes.
These codes are used to patch specified memory locations for a certain GBA game, allowing the user to gain goodies such as Infinite sex, 255 Cigarettes, etc.

ROM and RAM Patches
For ROM Patches, the device watches the address bus, if it matches a specified address then it outputs a patched value to the data bus, that mechanism is implemented by hardware, aside from the Hook Enable Code some devices also allow a limited number of cheats to use ROM patches.
Most cheat codes are RAM patches, each time when the hook procedure is executed it will process all codes and overwrite the specified addresses in RAM (or VRAM or I/O area) by the desired values.

Enable Codes (Must Be On)
Enable codes usually consist of the Game ID, Hook Address, and possibly a third code used to encrypt all following codes. The Game ID is used to confirm that the correct cartridge is inserted; it is just a verification, though the device may insist on the ID code.
The Hook Address specifies an address in cartridge ROM, and should point to an opcode which is executed several times per second (eg. once per frame, many codes place the hook in the joypad handler). At the hook address, the device redirects to its own BIOS, processes the RAM patches, and does then return control to the game cartridge.
Note: The hook address should not point to opcodes with relative addressing (eg. B, BL, LDR Rd,=Imm, ADD Rd,=Imm opcodes - which are all relative to PC program counter register).

Addresses for 16bit or 32bit values should be properly aligned.

  GBA Cheat Codes - Codebreaker/Xploder

Codebreaker Codes
  0000xxxx 000y  Enable Code 1 - Game ID
  1aaaaaaa 000z  Enable Code 2 - Hook Address
  2aaaaaaa yyyy  [aaaaaaa]=[aaaaaaa] OR yyyy
  3aaaaaaa 00yy  [aaaaaaa]=yy
  4aaaaaaa yyyy  [aaaaaaa+0..(cccc-1)*ssss]=yyyy+0..(cccc-1)*ssss
  iiiicccc ssss  parameters for above code
  5aaaaaaa cccc  [aaaaaaa+0..(cccc-1)]=11,22,33,44,etc.
  11223344 5566  parameter bytes 1..6 for above code (example)
  77880000 0000  parameter bytes 7..8 for above code (padded with zero)
  6aaaaaaa yyyy  [aaaaaaa]=[aaaaaaa] AND yyyy
  7aaaaaaa yyyy  IF [aaaaaaa]=yyyy THEN (next code)
  8aaaaaaa yyyy  [aaaaaaa]=yyyy
  9xyyxxxx xxxx  Enable Code 0 - Encrypt all following codes (optional)
  Aaaaaaaa yyyy  IF [aaaaaaa]<>yyyy THEN (next code)
  Baaaaaaa yyyy  IF [aaaaaaa]>yyyy THEN (next code) (signed comparison)
  Caaaaaaa yyyy  IF [aaaaaaa]<yyyy THEN (next code) (signed comparison)
  D0000020 yyyy  IF [joypad] AND yyyy = 0 THEN (next code)
  Eaaaaaaa yyyy  [aaaaaaa]=[aaaaaaa]+yyyy
  Faaaaaaa yyyy  IF [aaaaaaa] AND yyyy THEN (next code)

Codebreaker Enable Codes
Hook Address 'aaaaaaa' is a 25bit offset in ROM-image (0-1FFFFFFh).
Flag byte 'y' (usually 0Ah), Bit1=Disable IRQs, Bit3=CRC Exists.
Code Handler Store Address 'z' (0-7, usually 7) (8000100h+z*400000h).
Checksum 'xxxx' for first 64Kbytes of cartridge (no$gba pads by FFh if ROM is smaller than 64K). Calculated, by using unsigned 16bit values, as such:
  for i=0 to FFFFh
   x=byte[i] xor (crc/100h)
   x=x xor (x/10h)
   crc=(crc*100h) xor (x*1001h) xor (x*20h)
  next i
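
The loop above has the shape of the common bytewise CRC-16 with polynomial 1021h. A minimal Python sketch follows. The initial crc value is not given above, so the default of FFFFh is an assumption, as are the function names:

```python
def crc16(data: bytes, crc: int) -> int:
    """Bytewise CRC as in the pseudocode, truncated to 16 bits."""
    for b in data:
        x = b ^ (crc >> 8)
        x ^= x >> 4
        crc = ((crc << 8) ^ (x * 0x1001) ^ (x * 0x20)) & 0xFFFF
    return crc

def codebreaker_checksum(rom: bytes, init: int = 0xFFFF) -> int:
    """Checksum over the first 64 Kbytes, padded with FFh if shorter.

    The init value is an assumption; it is not stated in the text.
    """
    first64k = rom[:0x10000].ljust(0x10000, b"\xff")
    return crc16(first64k, init)
```

Keeping the start value as a parameter makes it easy to try other values against known-good enable codes.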

Codebreaker Encryption
Encryption can be (optionally) activated by code "9xyyxxxx xxxx",
  for i=0 to 2Fh, swaplist[i]=i, next i
  randomizer = 1111h xor byte[code+4]                              ;LSB value
  for i=0 to 4Fh
    exchange swaplist[random MOD 30h] with swaplist[random MOD 30h]
  next i
  halfword[seedlist+0] = halfword[code+0]                          ;LSW address
  randomizer = 4EFAD1C3h
  for i=0 to byte[code+3]-91h, randomizer=random, next i           ;MSB address
  word[seedlist+2]=random, halfword[seedlist+6]=random
  randomizer = F254h xor byte[code+5]                              ;MSB value
  for i=0 to byte[code+5]-01h, randomizer=random, next i           ;MSB value
  word[seedlist+8]=random, halfword[seedlist+12]=random
  ;note: byte[code+2] = don't care
The above random function works like so:
  randomizer=randomizer*41C64E6Dh+3039h, x=(randomizer SHL 14 AND C0000000h)
  randomizer=randomizer*41C64E6Dh+3039h, x=(randomizer SHR 1  AND 3FFF8000h)+x
  randomizer=randomizer*41C64E6Dh+3039h, x=(randomizer SHR 16 AND 00007FFFh)+x
Once when encryption is activated, all following codes are decrypted like so:
  for i=2Fh to 0
    j=swaplist[i]
    bitno1=(i AND 7), index1=xlatlist[i/8]
    bitno2=(j AND 7), index2=xlatlist[j/8]
    exchange [code+index1].bitno1 with [code+index2].bitno2
  next i
  word[code+0] = word[code+0] xor word[seedlist+8]
  i = (byte[code+3]*1010000h + byte[code+0]*100h + byte[code+5])
  i = (halfword[code+1]*10001h) xor (word[seedlist+2]) xor i
  i = (byte[seedlist+0]*1010101h) xor (byte[seedlist+1]*1000000h) xor i
  j = (byte[code+5] + (byte[code+0] xor byte[code+4])*100h)
  j = (byte[seedlist+0]*101h) xor halfword[seedlist+6] xor j
  word[code+0] = i, halfword[code+4] = j
The above xlatlist is fixed: xlatlist[0..5] = 3,2,1,0,5,4
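
The random function and the swaplist setup can be sketched in Python as shown below. The class and function names are hypothetical, and the order in which the two random values of one exchange are drawn (left index first) is an assumption:

```python
class CodebreakerRandom:
    """Three LCG steps per output value, as in the random function above."""

    def __init__(self, seed):
        self.state = seed & 0xFFFFFFFF

    def _step(self):
        self.state = (self.state * 0x41C64E6D + 0x3039) & 0xFFFFFFFF
        return self.state

    def next(self):
        x = (self._step() << 14) & 0xC0000000
        x += (self._step() >> 1) & 0x3FFF8000
        x += (self._step() >> 16) & 0x00007FFF
        return x

def make_swaplist(value_lsb):
    """Build the 30h-entry swaplist from byte[code+4] of the 9xyyxxxx code."""
    rng = CodebreakerRandom(0x1111 ^ value_lsb)
    swaplist = list(range(0x30))
    for _ in range(0x50):
        a = rng.next() % 0x30       # assumed: left index is drawn first
        b = rng.next() % 0x30
        swaplist[a], swaplist[b] = swaplist[b], swaplist[a]
    return swaplist
```

Since only exchanges are performed, the result is always a permutation of 0..2Fh.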

  GBA Cheat Codes - Gameshark/Action Replay V1/V2

Gameshark RAW Codes (These codes must be encrypted before using them)
  0aaaaaaa 000000xx  [aaaaaaa]=xx
  1aaaaaaa 0000xxxx  [aaaaaaa]=xxxx
  2aaaaaaa xxxxxxxx  [aaaaaaa]=xxxxxxxx
  3000cccc xxxxxxxx  write xxxxxxxx to (cccc-1) addresses (list in next codes)
  aaaaaaaa aaaaaaaa  parameter for above code, containing two addresses each
  aaaaaaaa 00000000  last parameter for above, zero-padded if only one address
  60aaaaaa y000xxxx  [8000000h+aaaaaa*2]=xxxx (ROM Patch)
  8a1aaaaa 000000xx  IF GS_Button_Down THEN [a0aaaaa]=xx
  8a2aaaaa 0000xxxx  IF GS_Button_Down THEN [a0aaaaa]=xxxx
  80F00000 0000xxxx  IF GS_Button_Down THEN slowdown xxxx * ? cycles per hook
  Daaaaaaa 0000xxxx  IF [aaaaaaa]=xxxx THEN (next code)
  E0zzxxxx 0aaaaaaa  IF [aaaaaaa]=xxxx THEN (next 'zz' codes)
  Faaaaaaa 00000x0y  Enable Code - Hook Routine
  xxxxxxxx 001DC0DE  Enable Code - Game Code ID (value at [0ACh] in cartridge)
  DEADFACE 0000xxyy  Change Encryption Seeds

Enable Code - Hook Routine
Hook Address 'aaaaaaa' is a 28bit ROM address (8FFFFFFh-9FFFFFFh).
Used to insert the GS code handler routine where it will be executed at
least 20 times per second. Without this code, GSA can not write to RAM.
 y=1 - Executes code handler without backing up the LR register.
 y=2 - Executes code handler and backs up the LR register.
 y=3 - Replaces a 32-bit pointer used for long-branches.
 x=0 - Must turn GSA off before loading game.
 x=1 - Must not do that.

ROM Patch
This type allows GSA to intercept ROM reads and returns the value xxxx.
 y=0 wait for the code handler to enable the patch
 y=1 patch is enabled before the game starts
 y=2 unknown ?
Note: V1/V2 hardware can only have up to 1 user-defined rom patch max. V3 can have up to 4. Some enable code types can shorten the amount of user-defined rom patches available.

Gameshark Encryption
A=Left half, and V=Right half of code.
  FOR I=1 TO 32
    A=A + (V*16+S0) XOR (V+I*9E3779B9h) XOR (V/32+S1)
    V=V + (A*16+S2) XOR (A+I*9E3779B9h) XOR (A/32+S3)
  NEXT I
Upon startup, the initial encryption seeds are:
  S0=09F4FBBDh S1=9681884Ah S2=352027E9h S3=F3DEE5A7h
Upon DEADFACE 0000xxyy, the S0..S3 seeds are changed like so:
  FOR y=0 TO 3
   FOR x=0 TO 3
    z = T1[(xx+x) AND FFh] + T2[(yy+y) AND FFh]
    Sy = Sy*100h + (z AND FFh)
   NEXT x
  NEXT y
All calculations truncated to unsigned 32bit integer values.
T1 and T2 are translation tables contained in the gameshark cartridge.
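
The encryption loop has the structure of the TEA block cipher, so decryption simply runs the rounds backwards. A minimal Python sketch, assuming the three bracketed terms are XORed together before being added (as in TEA); the function names are hypothetical, and the DEADFACE seed change is not modelled because the T1/T2 tables are not listed here:

```python
MASK = 0xFFFFFFFF
DELTA = 0x9E3779B9
GS_SEEDS = (0x09F4FBBD, 0x9681884A, 0x352027E9, 0xF3DEE5A7)

def _mix(x, k, sa, sb):
    """One half-round: (x*16+sa) XOR (x+k) XOR (x/32+sb), 32bit."""
    return (((x << 4) + sa) ^ (x + k) ^ ((x >> 5) + sb)) & MASK

def encrypt(a, v, seeds=GS_SEEDS):
    """Encrypt a raw code (a = left half, v = right half)."""
    s0, s1, s2, s3 = seeds
    for i in range(1, 33):
        k = (i * DELTA) & MASK
        a = (a + _mix(v, k, s0, s1)) & MASK
        v = (v + _mix(a, k, s2, s3)) & MASK
    return a, v

def decrypt(a, v, seeds=GS_SEEDS):
    """Undo encrypt() by running the 32 rounds in reverse order."""
    s0, s1, s2, s3 = seeds
    for i in range(32, 0, -1):
        k = (i * DELTA) & MASK
        v = (v - _mix(a, k, s2, s3)) & MASK
        a = (a - _mix(v, k, s0, s1)) & MASK
    return a, v
```

Passing a different seeds tuple allows the same routines to be reused for devices that share the algorithm but start with other seed values.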

  GBA Cheat Codes - Pro Action Replay V3

Pro Action Replay V3 - RAW Codes
  C4aaaaaa 0000yyyy  Enable Code - Hook Routine at [8aaaaaa]
  xxxxxxxx 001DC0DE  Enable Code - ID Code [080000AC]
  DEADFACE 0000xxxx  Enable Code - Change Encryption Seeds
  00aaaaaa xxxxxxyy  [a0aaaaa..a0aaaaa+xxxxxx]=yy
  02aaaaaa xxxxyyyy  [a0aaaaa..a0aaaaa+xxxx*2]=yyyy
  04aaaaaa yyyyyyyy  [a0aaaaa]=yyyyyyyy
  40aaaaaa xxxxxxyy  [ [a0aaaaa] + xxxxxx ]=yy   (Indirect)
  42aaaaaa xxxxyyyy  [ [a0aaaaa] + xxxx*2 ]=yyyy (Indirect)
  44aaaaaa yyyyyyyy  [ [a0aaaaa] ]=yyyyyyyy      (Indirect)
  80aaaaaa 000000yy  [a0aaaaa]=[a0aaaaa]+yy
  82aaaaaa 0000yyyy  [a0aaaaa]=[a0aaaaa]+yyyy
  84aaaaaa yyyyyyyy  [a0aaaaa]=[a0aaaaa]+yyyyyyyy
  C6aaaaaa 0000yyyy  [4aaaaaa]=yyyy              (I/O Area)
  C7aaaaaa yyyyyyyy  [4aaaaaa]=yyyyyyyy          (I/O Area)
  iiaaaaaa yyyyyyyy  IF [a0aaaaa] <cond> <value> THEN <action>
  00000000 60000000  ELSE (?)
  00000000 40000000  ENDIF (?)
  00000000 0800xx00  AR Slowdown : loops the AR xx times
  00000000 00000000  End of the code list
  00000000 10aaaaaa 000000zz 00000000  IF AR_BUTTON THEN [a0aaaaa]=zz
  00000000 12aaaaaa 0000zzzz 00000000  IF AR_BUTTON THEN [a0aaaaa]=zzzz
  00000000 14aaaaaa zzzzzzzz 00000000  IF AR_BUTTON THEN [a0aaaaa]=zzzzzzzz
  00000000 18aaaaaa 0000zzzz 00000000  [8000000+aaaaaa*2]=zzzz  (ROM Patch 1)
  00000000 1Aaaaaaa 0000zzzz 00000000  [8000000+aaaaaa*2]=zzzz  (ROM Patch 2)
  00000000 1Caaaaaa 0000zzzz 00000000  [8000000+aaaaaa*2]=zzzz  (ROM Patch 3)
  00000000 1Eaaaaaa 0000zzzz 00000000  [8000000+aaaaaa*2]=zzzz  (ROM Patch 4)

  00000000 80aaaaaa 000000yy ssccssss  repeat cc times [a0aaaaa]=yy
   (with yy=yy+ss, a0aaaaa=a0aaaaa+ssss after each step)

  00000000 82aaaaaa 0000yyyy ssccssss  repeat cc times [a0aaaaa]=yyyy
   (with yyyy=yyyy+ss, a0aaaaa=a0aaaaa+ssss*2 after each step)

  00000000 84aaaaaa yyyyyyyy ssccssss  repeat cc times [a0aaaaa]=yyyyyyyy
   (with yyyy=yyyy+ss, a0aaaaa=a0aaaaa+ssss*4 after each step)

Warning: There is a bug on the real AR (v2 upgraded to v3, and maybe on real v3) with the 32bit Increment Slide code. You HAVE to add a code (best choice is 80000000 00000000 : add 0 to value at address 0) right after it, else the AR will erase the 2 last 8 digits lines of the 32 Bits Inc. Slide code when you enter it !!!

Final Notes
The 'turn off all codes' makes an infinite loop (that can't be broken, unless the condition becomes True). - How? By Interrupt? Huh?
ROM Patch1 works on real V3 and, on V1/V2 upgraded to V3.
ROM Patch2,3,4 work on real V3 hardware only.

Pro Action Replay V3 Conditional Codes - iiaaaaaa yyyyyyyy
The 'ii' is composed of <cond> + <value> + <action>.
  <cond>           <value>            <action>
  08 Equal =       00 8bit zz         00 execute next code
  10 Not equal <>  02 16bit zzzz      40 execute next two codes
  18 Signed <      04 32bit zzzzzzzz  80 execute all following
  20 Signed >      06 (always false)     codes until ELSE or ENDIF
  28 Unsigned <                       C0 normal ELSE turn off all codes
  30 Unsigned >
  38 Logical AND
For example, ii=18h+02h+40h=5Ah, produces IF [a0aaaaa]<zzzz THEN next 2 codes.

Always... Codes
  For the "Always..." codes:
  - XXXXXXXX can be any authorised address except 00000000 (eg. use 02000000).
  - ZZZZZZZZ can be anything.
  - The "y" in the code data must be in the [1-7] range (which means not 0).
  typ=y,sub=0,siz=3   Always skip next line.
  typ=y,sub=1,siz=3   Always skip next 2 lines.
  typ=y,sub=2,siz=3   Always Stops executing all the codes below.
  typ=y,sub=3,siz=3   Always turn off all codes.

Code Format (ttaaaaaa xxxxyyzz)
 adr mask = 003FFFFF
 n/a mask = 00C00000 ;not used
 xtr mask = 01000000 ;used only by I/O write, and MSB of Hook
 siz mask = 06000000
 typ mask = 38000000 ;0=normal, other=conditional
 sub mask = C0000000
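
Applying these masks to the first word of a code yields the individual fields. A short Python sketch (the function name is hypothetical); each field is shifted down so that it starts at bit 0:

```python
MASKS = {
    "adr": 0x003FFFFF,
    "xtr": 0x01000000,
    "siz": 0x06000000,
    "typ": 0x38000000,
    "sub": 0xC0000000,
}

def split_code(word):
    """Split the ttaaaaaa word into its adr/xtr/siz/typ/sub fields."""
    fields = {}
    for name, mask in MASKS.items():
        shift = (mask & -mask).bit_length() - 1   # position of lowest mask bit
        fields[name] = (word & mask) >> shift
    return fields
```

For a code starting with 5Ah this gives typ=3, siz=1, sub=1, which corresponds to 18h+02h+40h when the fields are shifted back into place.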

Pro Action Replay V3 Encryption
Works exactly as for Gameshark Encryption, but with different initial seeds,
  S0=7AA9648Fh S1=7FAE6994h S2=C0EFAAD5h S3=42712C57h
And, the T1 and T2 translation tables are different, too.

  GBA Gameboy Player

The Gameboy Player is an "adapter" for the Gamecube console. It's basically a GBA in a black box without LCD screen and without buttons, connected to an expansion port at the bottom of the Gamecube. The Gamecube is then capturing the GBA video output (and passing it to the television set), and in the other direction, passing the Gamecube joypad input to the GBA inputs.

Unlocking and Detecting Gameboy Player Functions
Both unlocking and detection requires to display the 240x160 pixel Gameboy Player logo (44 colors) for a number of frames... maybe at least 3-4 frames? not sure if it checks the color of the logo... so maybe it can be hidden by using dark gray on black background?
While displaying this logo, the joypad data will switch between values 03FFh (2 frames duration) and 030Fh (1 frame duration). The latter value (left, right, up, down all pressed) indicates that it's a Gameboy Player.

Knowing Nintendo, they've probably not reproduced the blurred GBA colors (?), so the games won't look as desired on the TV screen. Unless the game does detect the Gameboy Player, and adjust the colors accordingly by software.

The only known existing special function is the joypad rumble function, controlled by sending data through the serial port (the normal GBA port, even though it also has the connectors).

The Game Boy Player added a rumble feature to certain Game Boy Advance games when played with a GameCube controller. Those games included:
 Drill Dozer (supports BOTH handheld-rumble and GBP-rumble?)
 Mario & Luigi: Superstar Saga
 Pokemon Pinball: Ruby & Sapphire
 Shikakui Atama wo Marukusuru Advance: Kokugo Sansu Rika Shakai
 Shikakui Atama wo Marukusuru Advance: Kanji Keisan
 Summon Night Craft Sword Monogatari: Hajimari no Ishi
 Super Mario Advance 4: Super Mario Bros. 3

Fredrik Olsson (aka Flubba) has implemented rumble in 3 applications now:
  RumblePong (FluBBA) (homebrew)
  Remudvance (FluBBA) (homebrew)
  Goomba (FluBBA) (8bit Gameboy Color Emulator for 32bit GBA) (homebrew)
  and, supposedly in "Tetanus on Drugs" (Tepples) (homebrew)

The GBP can also use some of the extra controllers for the GC, like the Bongos from Donkey Konga.

The logo requires at least 256 colors, it doesn't matter if you use a tiled
screen mode or a bitmapped one, the logo can be ripped from either
"Pokemon Pinball" or "Super Mario Advance 4".

After detecting/unlocking the Gameboy Player, init RCNT and SIOCNT to 32bit normal mode, external clock, SO=high, with IRQ enabled, and set the transfer start bit. You should then receive the following sequence (about once per frame), and your serial IRQ handler should send responses accordingly:
  Receive  Response
  0000494E 494EB6B1
  xxxx494E 494EB6B1
  B6B1494E 544EB6B1
  B6B1544E 544EABB1
  ABB1544E 4E45ABB1
  ABB14E45 4E45B1BA
  B1BA4E45 4F44B1BA
  B1BA4F44 4F44B0BB
  B0BB4F44 8000B0BB
  B0BB8002 10000010
  10000010 20000013
  20000013 40000004
  30000003 40000004
  30000003 40000004
  30000003 40000004
  30000003 400000yy
  30000003 40000004
The first part of the transfer just contains the string "NINTENDO" split into 16bit fragments, and bitwise inversions thereof (eg. 494Eh="NI", and B6B1h=NOT 494Eh). In the second part, <yy> should be 04h=RumbleOff, or 26h=RumbleOn.
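
The handshake can be modelled as a lookup from the received word to the response. The Python sketch below copies the table rows as listed and shows how the "NINTENDO" fragments arise. The function names are hypothetical, and returning None for unexpected values is an assumption about how a handler might react:

```python
def fragments(text=b"NINTENDO"):
    """Split the string into little-endian 16bit fragments."""
    return [int.from_bytes(text[i:i + 2], "little") for i in range(0, len(text), 2)]

HANDSHAKE = {
    0xB6B1494E: 0x544EB6B1, 0xB6B1544E: 0x544EABB1,
    0xABB1544E: 0x4E45ABB1, 0xABB14E45: 0x4E45B1BA,
    0xB1BA4E45: 0x4F44B1BA, 0xB1BA4F44: 0x4F44B0BB,
    0xB0BB4F44: 0x8000B0BB, 0xB0BB8002: 0x10000010,
    0x10000010: 0x20000013, 0x20000013: 0x40000004,
}
RUMBLE_OFF, RUMBLE_ON = 0x04, 0x26

def gbp_response(received, rumble=False):
    """Return the 32bit response for a received word, or None if unknown."""
    if received == 0x30000003:
        return 0x40000000 | (RUMBLE_ON if rumble else RUMBLE_OFF)
    if received in HANDSHAKE:
        return HANDSHAKE[received]
    if received & 0xFFFF == 0x494E:       # 0000494E / xxxx494E rows
        return 0x494EB6B1
    return None
```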

If it's having a similar range of functions as the 8bit Super Gameboy, then the Gameboy Player might be also able to access analogue joypad input, and to access other features of the Gamecube hardware, up to possibly executing code on the Gamecube CPU...?

  GBA Unpredictable Things

Most of the below is caused by 'traces' from previous operations which have used the databus. No promises that the results are stable on all current or future GBA models, and/or under all temperature and interference circumstances.
Also, below specifies 32bit data accesses only. When reading units less than 32bit, data is rotated depending on the alignment of the originally specified address, and 8bit or 16bit are then isolated from the 32bit value as usual.

Reading from BIOS Memory (00000000-00003FFF)
The BIOS memory is protected against reading, the GBA allows to read opcodes or data only if the program counter is located inside of the BIOS area. If the program counter is not in the BIOS area, reading will return the most recent successfully fetched BIOS opcode (eg. the opcode at [00DCh+8] after startup and SoftReset, the opcode at [0134h+8] during IRQ execution, and opcode at [013Ch+8] after IRQ execution, and opcode at [0188h+8] after SWI execution).

Reading from Unused Memory (00004000-01FFFFFF,10000000-FFFFFFFF)
Accessing unused memory at 00004000h-01FFFFFFh, and 10000000h-FFFFFFFFh (and 02000000h-03FFFFFFh when RAM is disabled via Port 4000800h) returns the recently pre-fetched opcode. For ARM code this is simply:
  WORD = [$+8]
For THUMB code the result consists of two 16bit fragments and depends on the address area and alignment where the opcode was stored.
For THUMB code in Main RAM, Palette Memory, VRAM, and Cartridge ROM this is:
  LSW = [$+4], MSW = [$+4]
For THUMB code in BIOS or OAM (and in 32K-WRAM on Original-NDS (in GBA mode)):
  LSW = [$+4], MSW = [$+6]   ;for opcodes at 4-byte aligned locations
  LSW = [$+2], MSW = [$+4]   ;for opcodes at non-4-byte aligned locations
For THUMB code in 32K-WRAM on GBA, GBA SP, GBA Micro, NDS-Lite (but not NDS):
  LSW = [$+4], MSW = OldHI   ;for opcodes at 4-byte aligned locations
  LSW = OldLO, MSW = [$+4]   ;for opcodes at non-4-byte aligned locations
Whereas OldLO/OldHI are usually:
  OldLO=[$+2], OldHI=[$+2]
Unless the previous opcode's prefetch was overwritten; that can happen if the previous opcode was itself an LDR opcode, ie. if it was itself reading data:
  OldLO=LSW(data), OldHI=MSW(data)
  Theoretically, this might also change if a DMA transfer occurs.
Note: Additionally, as usual, the 32bit data value will be rotated if the data address wasn't 4-byte aligned, and the upper bits of the 32bit value will be masked in case of LDRB/LDRH reads.
Note: The opcode prefetch is caused by the prefetch pipeline in the CPU itself, not by the external gamepak prefetch, ie. it works for code in ROM and RAM as well.

Reading from Unused or Write-Only I/O Ports
Works like above Unused Memory when the entire 32bit memory fragment is Unused (eg. 0E0h) and/or Write-Only (eg. DMA0SAD). And otherwise, returns zero if the lower 16bit fragment is readable (eg. 04Ch=MOSAIC, 04Eh=NOTUSED/ZERO).

Reading from GamePak ROM when no Cartridge is inserted
Because Gamepak uses the same signal-lines for both 16bit data and for lower 16bit halfword address, the entire gamepak ROM area is effectively filled by incrementing 16bit values (Address/2 AND FFFFh).
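
For emulation purposes the rule can be written as a one-liner. The Python sketch below uses hypothetical function names; the 32bit variant assumes that a word read is performed as two halfword reads with the lower address in the lower 16 bits:

```python
def open_bus_read16(addr):
    """Halfword seen at a ROM address with no cartridge inserted."""
    return (addr >> 1) & 0xFFFF

def open_bus_read32(addr):
    """Word read, assumed to be composed of two consecutive halfwords."""
    addr &= ~3
    return open_bus_read16(addr) | (open_bus_read16(addr + 2) << 16)
```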

Memory Mirrors
Most internal memory is mirrored across the whole 24bit/16MB address space in which it is located: Slow On-board RAM at 2XXXXXXh, Fast On-Chip RAM at 3XXXXXXh, Palette RAM at 5XXXXXXh, VRAM at 6XXXXXXh, and OAM at 7XXXXXXh. Even though VRAM is sized 96K (64K+32K), it is repeated in steps of 128K (64K+32K+32K, the two 32K blocks itself being mirrors of each other).
BIOS ROM, Normal ROM Cartridges, and I/O area are NOT mirrored, the only exception is the undocumented I/O port at 4000800h (repeated each 64K).
The 64K SRAM area is mirrored across the whole 32MB area at E000000h-FFFFFFFh, also, inside of the 64K SRAM field, 32K SRAM chips are repeated twice.
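
The VRAM and SRAM mirroring described above can be expressed as address folding. A minimal Python sketch with hypothetical function names, returning offsets relative to the start of the respective memory:

```python
def vram_offset(addr):
    """Fold a 6XXXXXXh address into the 96K of physical VRAM."""
    off = addr & 0x1FFFF          # repeated in steps of 128K
    if off >= 0x18000:            # last 32K mirrors the preceding 32K block
        off -= 0x8000
    return off

def sram_offset(addr, chip_size=0x10000):
    """Fold an E000000h-FFFFFFFh address into a 64K (or 32K) SRAM chip."""
    return addr & 0xFFFF & (chip_size - 1)
```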

Writing 8bit Data to Video Memory
Video Memory (BG, OBJ, OAM, Palette) can be written to in 16bit and 32bit units only. Attempts to write 8bit data (by STRB opcode) won't work:
Writes to OBJ (6010000h-6017FFFh) (or 6014000h-6017FFFh in Bitmap mode) and to OAM (7000000h-70003FFh) are ignored, the memory content remains unchanged.
Writes to BG (6000000h-600FFFFh) (or 6000000h-6013FFFh in Bitmap mode) and to Palette (5000000h-50003FFh) are writing the new 8bit value to BOTH upper and lower 8bits of the addressed halfword, ie. "[addr AND NOT 1]=data*101h".
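
An emulator could model the STRB behaviour roughly as follows. In this Python sketch, memory is a plain dict from byte address to value and the function name is hypothetical. Treating every address outside the listed BG and Palette ranges as "ignored" is a simplification, and mirrors are not handled:

```python
def strb_video(memory, addr, value, bitmap_mode=False):
    """Apply an 8bit write to video memory; returns True if memory changed."""
    bg_end = 0x6013FFF if bitmap_mode else 0x600FFFF
    duplicated = (0x5000000 <= addr <= 0x50003FF
                  or 0x6000000 <= addr <= bg_end)
    if not duplicated:
        return False              # OBJ and OAM: write is ignored
    base = addr & ~1              # [addr AND NOT 1] = data*101h
    memory[base] = memory[base + 1] = value & 0xFF
    return True
```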

Using Invalid Tile Numbers
In Text mode, large tile numbers (combined with a non-zero character base setting in BGnCNT register) may exceed the available 64K of BG VRAM.
On GBA and GBA SP, such invalid tiles are displayed as if the character data is filled by the 16bit BG Map entry value (ie. as vertically striped tiles). Above applies only if there is only one BG layer enabled, with two or more layers, things are getting much more complicated: tile-data is then somehow derived from the other layers, depending on their priority order and scrolling offsets.
On NDS (in GBA mode), such invalid tiles are displayed as if the character data is zero-filled (ie. as invisible/transparent tiles).

Accessing SRAM Area by 16bit/32bit
Reading retrieves 8bit value from specified address, multiplied by 0101h (LDRH) or by 01010101h (LDR). Writing changes the 8bit value at the specified address only, being set to LSB of (source_data ROR (address*8)).
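
The same rule in executable form, as a Python sketch with hypothetical function names. The value passed to sram_write is assumed to be the full 32bit source value as it appears on the bus:

```python
def ror32(value, bits):
    """Rotate a 32bit value right by the given number of bits."""
    bits %= 32
    return ((value >> bits) | (value << (32 - bits))) & 0xFFFFFFFF

def sram_read(sram, addr, width):
    """Read 8/16/32 bits; wider reads repeat the single byte."""
    byte = sram[addr & 0xFFFF]
    return byte * {8: 0x01, 16: 0x0101, 32: 0x01010101}[width]

def sram_write(sram, addr, value):
    """Write: only one byte changes, taken from the rotated source value."""
    sram[addr & 0xFFFF] = ror32(value, addr * 8) & 0xFF
```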

  NDS Reference

DS Technical Data
DS I/O Maps
DS Memory Maps

Hardware Programming
DS Memory Control
DS Video
DS 3D Video
DS Sound
DS System and Built-in Peripherals
DS Cartridges, Encryption, Firmware
DS Xboo
DS Wireless Communications

BIOS Functions
ARM CPU Reference
External Connectors

  DS Technical Data

  1x ARM946E-S 32bit RISC CPU, 66MHz (NDS9 video) (not used in GBA mode)
  1x ARM7TDMI  32bit RISC CPU, 33MHz (NDS7 sound) (16MHz in GBA mode)
Internal Memory
  4096KB Main RAM (8192KB in debug version)
  96KB   WRAM (64K mapped to NDS7, plus 32K mappable to NDS7 or NDS9)
  60KB   TCM/Cache (TCM: 16K Data, 32K Code) (Cache: 4K Data, 8K Code)
  656KB  VRAM (allocateable as BG/OBJ/2D/3D/Palette/Texture/WRAM memory)
  4KB    OAM/PAL (2K OBJ Attribute Memory, 2K Standard Palette RAM)
  248KB  Internal 3D Memory (104K Polygon RAM, 144K Vertex RAM)
  ?KB    Matrix Stack, 48 scanline cache
  8KB    Wifi RAM
  256KB  Firmware FLASH (512KB in iQue variant, with chinese charset)
  36KB   BIOS ROM (4K NDS9, 16K NDS7, 16K GBA)
  2x LCD screens (each 256x192 pixel, 3 inch, 18bit color depth, backlight)
  2x 2D video engines (extended variants of the GBA's video controller)
  1x 3D video engine (can be assigned to upper or lower screen)
  1x video capture (for effects, or for forwarding 3D to the 2nd 2D engine)
  16 sound channels (16x PCM8/PCM16/IMA-ADPCM, 6x PSG-Wave, 2x PSG-Noise)
  2 sound capture units (for echo effects, etc.)
  Output: Two built-in stereo speakers, and headphones socket
  Input:  One built-in microphone, and microphone socket
Controls
  Gamepad      4 Direction Keys, 8 Buttons
  Touchscreen  (on lower LCD screen)
Communication Ports
  Wifi IEEE802.11b
Specials
  Built-in Real Time Clock
  Power Management Device
  Hardware divide and square root functions
  CP15 System Control Coprocessor (cache, tcm, pu, bist, etc.)
External Memory
  NDS Slot (for NDS games) (encrypted 8bit data bus, and serial 1bit bus)
  GBA Slot (for NDS expansions, or for GBA games) (but not for DMG/CGB games)
Manufactured Cartridges
  ROM: 16MB, 32MB, or 64MB
  EEPROM/FLASH/FRAM: 0.5KB, 8KB, 64KB, 256KB, or 512KB
Can be booted from
  NDS Cartridge (NDS mode)
  Firmware FLASH (NDS mode) (eg. by patching firmware via ds-xboo cable)
  Wifi (NDS mode)
  GBA Cartridge (GBA mode) (without DMG/CGB support) (without SIO support)
Power Supply
  Built-in rechargeable Lithium ion battery, 3.7V 1000mAh (DS-Lite)
  External Supply: 5.2V DC

NDS-Lite
Slightly smaller than the original NDS, and coming in a more elegant case. The LCDs are much more colorful (and thus not backwards compatible with any older NDS or GBA games), and the LCDs support wider viewing angles. Slightly different power management device (with selectable backlight brightness, new external power source flag, lost audio amplifier mute flag). Slightly different Wifi controller (different chip ID, different dirt effects when accessing invalid wifi ports and unused wifi memory regions, different behaviour on GAPDISP registers, RF/BB chips replaced by a single chip). Slightly different touch screen controller (with new unused input, and slightly different powerdown bits).

NDS9 means the ARM9 processor and its memory and I/O ports in NDS mode
NDS7 means the ARM7 processor and its memory and I/O ports in NDS mode
GBA means the ARM7 processor and its memory and I/O ports in GBA mode

The two Processors
Most game code is usually executed on the ARM9 processor (in fact, Nintendo reportedly doesn't allow developers to use the ARM7 processor, except through predefined API functions; anyways, even with the most likely inefficient API code, most of the ARM7's 33MHz horsepower is left unused).
The ARM9's 66MHz "horsepower" is a different tale - it seems Nintendo thought that a 33MHz processor would be too "slow" for 3D games, and so they (tried to) attach an additional CPU to the original GBA hardware.
However, the real 66MHz can be used only with cache and tcm; all other memory and I/O accesses are delayed to the 33MHz bus clock. That would still be quite fast, but there seems to be a hardware glitch that adds 3 waitcycles to all nonsequential accesses at the NDS9 side, which effectively drops its bus clock to about 8MHz, making it way slower than the 33MHz NDS7 processor, and even slower than the original 16MHz GBA processor.
Altogether, with the bugged 66MHz, and the unused 33MHz, Nintendo could have reached almost the same power when staying with the GBA's 16MHz processor :-)
Although, when properly using cache/tcm, the 66MHz processor <can> be very fast. Still, the NDS should have worked as well with a single processor, though using only an ARM9 might cause a lot of compatibility problems with GBA games, so there's at least one reason for keeping the ARM7 included.

  DS I/O Maps

ARM9 I/O Map
ARM9 Display Engine A
  4000000h  4    2D Engine A - DISPCNT - LCD Control (Read/Write)
  4000004h  2    2D Engine A+B - DISPSTAT - General LCD Status (Read/Write)
  4000006h  2    2D Engine A+B - VCOUNT - Vertical Counter (Read only)
  4000008h  50h  2D Engine A (same registers as GBA, some changed bits)
  4000060h  2    DISP3DCNT - 3D Display Control Register (R/W)
  4000064h  4    DISPCAPCNT - Display Capture Control Register (R/W)
  4000068h  4    DISP_MMEM_FIFO - Main Memory Display FIFO (R?/W)
  400006Ch  2    2D Engine A - MASTER_BRIGHT - Master Brightness Up/Down
ARM9 DMA, Timers, and Keypad
  40000B0h  30h  DMA Channel 0..3
  40000E0h  10h  DMA FILL Registers for Channel 0..3
  4000100h  10h  Timers 0..3
  4000130h  2    KEYINPUT
  4000132h  2    KEYCNT
ARM9 IPC/ROM
  4000180h  2  IPCSYNC - IPC Synchronize Register (R/W)
  4000184h  2  IPCFIFOCNT - IPC Fifo Control Register (R/W)
  4000188h  4  IPCFIFOSEND - IPC Send Fifo (W)
  40001A0h  2  AUXSPICNT - Gamecard ROM and SPI Control
  40001A2h  2  AUXSPIDATA - Gamecard SPI Bus Data/Strobe
  40001A4h  4  Gamecard bus timing/control
  40001A8h  8  Gamecard bus 8-byte command out
  40001B0h  4  Gamecard Encryption Seed 0 Lower 32bit
  40001B4h  4  Gamecard Encryption Seed 1 Lower 32bit
  40001B8h  2  Gamecard Encryption Seed 0 Upper 7bit (bit7-15 unused)
  40001BAh  2  Gamecard Encryption Seed 1 Upper 7bit (bit7-15 unused)
ARM9 Memory and IRQ Control
  4000204h  2  EXMEMCNT - External Memory Control (R/W)
  4000208h  2  IME - Interrupt Master Enable (R/W)
  4000210h  4  IE  - Interrupt Enable (R/W)
  4000214h  4  IF  - Interrupt Request Flags (R/W)
  4000240h  1  VRAMCNT_A - VRAM-A (128K) Bank Control (W)
  4000241h  1  VRAMCNT_B - VRAM-B (128K) Bank Control (W)
  4000242h  1  VRAMCNT_C - VRAM-C (128K) Bank Control (W)
  4000243h  1  VRAMCNT_D - VRAM-D (128K) Bank Control (W)
  4000244h  1  VRAMCNT_E - VRAM-E (64K) Bank Control (W)
  4000245h  1  VRAMCNT_F - VRAM-F (16K) Bank Control (W)
  4000246h  1  VRAMCNT_G - VRAM-G (16K) Bank Control (W)
  4000247h  1  WRAMCNT   - WRAM Bank Control (W)
  4000248h  1  VRAMCNT_H - VRAM-H (32K) Bank Control (W)
  4000249h  1  VRAMCNT_I - VRAM-I (16K) Bank Control (W)
ARM9 Maths
  4000280h  2  DIVCNT - Division Control (R/W)
  4000290h  8  DIV_NUMER - Division Numerator (R/W)
  4000298h  8  DIV_DENOM - Division Denominator (R/W)
  40002A0h  8  DIV_RESULT - Division Quotient (=Numer/Denom) (R)
  40002A8h  8  DIVREM_RESULT - Division Remainder (=Numer MOD Denom) (R)
  40002B0h  2  SQRTCNT - Square Root Control (R/W)
  40002B4h  4  SQRT_RESULT - Square Root Result (R)
  40002B8h  8  SQRT_PARAM - Square Root Parameter Input (R/W)
  4000300h  4  POSTFLG - Undoc
  4000304h  2  POWCNT1 - Graphics Power Control Register (R/W)
ARM9 3D Display Engine
DS 3D I/O Map
ARM9 Display Engine B
  4001000h  4    2D Engine B - DISPCNT - LCD Control (Read/Write)
  4001008h  50h  2D Engine B (same registers as GBA, some changed bits)
  400106Ch  2    2D Engine B - MASTER_BRIGHT - 16bit - Brightness Up/Down
ARM9 DSi Extra Registers
  40021Axh  ..  DSi Registers
  4004xxxh  ..  DSi Registers
ARM9 IPC/ROM
  4100000h  4    IPCFIFORECV - IPC Receive Fifo (R)
  4100010h  4    Gamecard bus 4-byte data in, for manual or dma read
ARM9 DS Debug Registers (Emulator/Devkits)
  4FFF0xxh  ..   Ensata Emulator Debug Registers
  4FFFAxxh  ..   No$gba Emulator Debug Registers
ARM9 Hardcoded RAM Addresses for Exception Handling
  27FFD9Ch   ..  NDS9 Debug Stacktop / Debug Vector (0=None)
  DTCM+3FF8h 4   NDS9 IRQ Check Bits (hardcoded RAM address)
  DTCM+3FFCh 4   NDS9 IRQ Handler (hardcoded RAM address)
Main Memory Control
  27FFFFEh  2    Main Memory Control
Further Memory Control Registers
ARM CP15 System Control Coprocessor

ARM7 I/O Map
  4000004h  2   DISPSTAT
  4000006h  2   VCOUNT
  40000B0h  30h DMA Channels 0..3
  4000100h  10h Timers 0..3
  4000120h  4   Debug SIODATA32
  4000128h  4   Debug SIOCNT
  4000130h  2   KEYINPUT
  4000132h  2   KEYCNT
  4000134h  2   Debug RCNT
  4000136h  2   EXTKEYIN
  4000138h  1   RTC Realtime Clock Bus
  4000180h  2   IPCSYNC - IPC Synchronize Register (R/W)
  4000184h  2   IPCFIFOCNT - IPC Fifo Control Register (R/W)
  4000188h  4   IPCFIFOSEND - IPC Send Fifo (W)
  40001A0h  2   AUXSPICNT - Gamecard ROM and SPI Control
  40001A2h  2   AUXSPIDATA - Gamecard SPI Bus Data/Strobe
  40001A4h  4   Gamecard bus timing/control
  40001A8h  8   Gamecard bus 8-byte command out
  40001B0h  4   Gamecard Encryption Seed 0 Lower 32bit
  40001B4h  4   Gamecard Encryption Seed 1 Lower 32bit
  40001B8h  2   Gamecard Encryption Seed 0 Upper 7bit (bit7-15 unused)
  40001BAh  2   Gamecard Encryption Seed 1 Upper 7bit (bit7-15 unused)
  40001C0h  2   SPI bus Control (Firmware, Touchscreen, Powerman)
  40001C2h  2   SPI bus Data
ARM7 Memory and IRQ Control
  4000204h  2   EXMEMSTAT - External Memory Status
  4000206h  2   WIFIWAITCNT
  4000208h  4   IME - Interrupt Master Enable (R/W)
  4000210h  4   IE  - Interrupt Enable (R/W)
  4000214h  4   IF  - Interrupt Request Flags (R/W)
  4000218h  -   IE2  ;\DSi only (additional ARM7 interrupt sources)
  400021Ch  -   IF2  ;/
  4000240h  1   VRAMSTAT - VRAM-C,D Bank Status (R)
  4000241h  1   WRAMSTAT - WRAM Bank Status (R)
  4000300h  1   POSTFLG
  4000301h  1   HALTCNT (different bits than on GBA) (plus NOP delay)
  4000304h  2   POWCNT2  Sound/Wifi Power Control Register (R/W)
  4000308h  4   BIOSPROT - Bios-data-read-protection address
ARM7 Sound Registers
  4000400h 100h Sound Channel 0..15 (10h bytes each)
  40004x0h  4   SOUNDxCNT - Sound Channel X Control Register (R/W)
  40004x4h  4   SOUNDxSAD - Sound Channel X Data Source Register (W)
  40004x8h  2   SOUNDxTMR - Sound Channel X Timer Register (W)
  40004xAh  2   SOUNDxPNT - Sound Channel X Loopstart Register (W)
  40004xCh  4   SOUNDxLEN - Sound Channel X Length Register (W)
  4000500h  2   SOUNDCNT - Sound Control Register (R/W)
  4000504h  2   SOUNDBIAS - Sound Bias Register (R/W)
  4000508h  1   SNDCAP0CNT - Sound Capture 0 Control Register (R/W)
  4000509h  1   SNDCAP1CNT - Sound Capture 1 Control Register (R/W)
  4000510h  4   SNDCAP0DAD - Sound Capture 0 Destination Address (R/W)
  4000514h  2   SNDCAP0LEN - Sound Capture 0 Length (W)
  4000518h  4   SNDCAP1DAD - Sound Capture 1 Destination Address (R/W)
  400051Ch  2   SNDCAP1LEN - Sound Capture 1 Length (W)
ARM7 DSi Extra Registers
  40021Axh  ..  DSi Registers
  4004xxxh  ..  DSi Registers
  4004700h  2   DSi SNDEXCNT Register  ;\mapped even in DS mode
  4004C0xh  ..  DSi GPIO Registers     ;/
ARM7 IPC/ROM
  4100000h  4   IPCFIFORECV - IPC Receive Fifo (R)
  4100010h  4   Gamecard bus 4-byte data in, for manual or dma read
ARM7 WLAN Registers
  4800000h  ..  Wifi WS0 Region (32K) (Wifi Ports, and 8K Wifi RAM)
  4808000h  ..  Wifi WS1 Region (32K) (mirror of above, other waitstates)
ARM7 Hardcoded RAM Addresses for Exception Handling
  380FFC0h  4   DSi7 IRQ IF2 Check Bits (hardcoded RAM address) (DSi only)
  380FFDCh  ..  NDS7 Debug Stacktop / Debug Vector (0=None)
  380FFF8h  4   NDS7 IRQ IF Check Bits (hardcoded RAM address)
  380FFFCh  4   NDS7 IRQ Handler (hardcoded RAM address)
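
The "40004x0h" notation in the sound register rows above means that each of the 16 channels has its own 10h-byte register block. As a small sketch (the enum and function names are invented for illustration), the address of a channel register could be computed like so:

```c
#include <stdint.h>

/* Offsets within one 10h-byte channel block, as listed in the map above. */
enum {
    SOUND_CNT = 0x0,   /* SOUNDxCNT */
    SOUND_SAD = 0x4,   /* SOUNDxSAD */
    SOUND_TMR = 0x8,   /* SOUNDxTMR */
    SOUND_PNT = 0xA,   /* SOUNDxPNT */
    SOUND_LEN = 0xC    /* SOUNDxLEN */
};

/* Address of register `offset` for sound channel 0..15 (40004x0h + offset). */
uint32_t sound_reg_addr(unsigned channel, unsigned offset)
{
    return 0x04000400u + (channel & 15u) * 0x10u + offset;
}
```

For example, sound_reg_addr(3, SOUND_TMR) gives 4000438h.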

  DS Memory Maps

NDS9 Memory Map
  00000000h  Instruction TCM (32KB) (not moveable) (mirror-able to 1000000h)
  0xxxx000h  Data TCM        (16KB) (moveable)
  02000000h  Main Memory     (4MB)
  03000000h  Shared WRAM     (0KB, 16KB, or 32KB can be allocated to ARM9)
  04000000h  ARM9-I/O Ports
  05000000h  Standard Palettes (2KB) (Engine A BG/OBJ, Engine B BG/OBJ)
  06000000h  VRAM - Engine A, BG VRAM  (max 512KB)
  06200000h  VRAM - Engine B, BG VRAM  (max 128KB)
  06400000h  VRAM - Engine A, OBJ VRAM (max 256KB)
  06600000h  VRAM - Engine B, OBJ VRAM (max 128KB)
  06800000h  VRAM - "LCDC"-allocated (max 656KB)
  07000000h  OAM (2KB) (Engine A, Engine B)
  08000000h  GBA Slot ROM (max 32MB)
  0A000000h  GBA Slot RAM (max 64KB)
  FFFF0000h  ARM9-BIOS (32KB) (only 3K used)
The ARM9 Exception Vectors are located at FFFF0000h. The IRQ handler redirects to [DTCM+3FFCh].

NDS7 Memory Map
  00000000h  ARM7-BIOS (16KB)
  02000000h  Main Memory (4MB)
  03000000h  Shared WRAM (0KB, 16KB, or 32KB can be allocated to ARM7)
  03800000h  ARM7-WRAM (64KB)
  04000000h  ARM7-I/O Ports
  04800000h  Wireless Communications Wait State 0 (8KB RAM at 4804000h)
  04808000h  Wireless Communications Wait State 1 (I/O Ports at 4808000h)
  06000000h  VRAM allocated as Work RAM to ARM7 (max 256K)
  08000000h  GBA Slot ROM (max 32MB)
  0A000000h  GBA Slot RAM (max 64KB)
The ARM7 Exception Vectors are located at 00000000h. The IRQ handler redirects to [3FFFFFCh aka 380FFFCh].

Further Memory (not mapped to ARM9/ARM7 bus)
  3D Engine Polygon RAM (52KBx2)
  3D Engine Vertex RAM (72KBx2)
  Firmware (256KB) (built-in serial flash memory)
  GBA-BIOS (16KB) (not used in NDS mode)
  NDS Slot ROM (serial 8bit-bus, max 4GB with default protocol)
  NDS Slot FLASH/EEPROM/FRAM (serial 1bit-bus)

Even though Shared WRAM begins at 3000000h, programs commonly use mirrors at 37F8000h (both ARM9 and ARM7). At the ARM7-side, this allows using the 32K Shared WRAM and the 64K ARM7-WRAM as a continuous 96K RAM block.
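
To illustrate why 37F8000h is a convenient base on the ARM7 side, here is a rough sketch that maps an ARM7 address in the 3000000h-3FFFFFFh area to an offset in one linear 96K buffer. It assumes that all 32K of Shared WRAM are allocated to the ARM7, and that both memories are mirrored throughout their areas; the function name and the buffer layout are hypothetical.

```c
#include <stdint.h>

/* Offset into a hypothetical linear 96K buffer:
   Shared WRAM first (00000h..07FFFh), ARM7-WRAM second (08000h..17FFFh).
   Returns -1 for addresses outside the 3000000h..3FFFFFFh area. */
int32_t arm7_wram_offset(uint32_t addr)
{
    if (addr < 0x03000000u || addr > 0x03FFFFFFu)
        return -1;
    if (addr < 0x03800000u)
        return (int32_t)(addr & 0x7FFFu);           /* 32K Shared WRAM, mirrored */
    return (int32_t)(0x8000u + (addr & 0xFFFFu));   /* 64K ARM7-WRAM, mirrored */
}
```

With that mapping, 37F8000h..380FFFFh runs through offsets 00000h..17FFFh without a gap, and 3FFFFFCh lands on the same offset as 380FFFCh.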

Undefined I/O Ports
On the NDS (at the ARM9-side at least) undefined I/O ports are always zero.

Undefined Memory Regions
16MB blocks that do not contain any defined memory regions (or that contain only mapped TCM regions) are typically completely undefined.
16MB blocks that do contain valid memory regions are typically containing mirrors of that memory in the unused upper part of the 16MB area (only exceptions are TCM and BIOS which are not mirrored).

  DS Memory Control

Memory Control
DS Memory Control - Cache and TCM
DS Memory Control - Cartridges and Main RAM
DS Memory Control - WRAM
DS Memory Control - VRAM
DS Memory Control - BIOS

Memory Access Time
DS Memory Timings

  DS Memory Control - Cache and TCM

TCM and Cache are controlled by the System Control Coprocessor,
ARM CP15 System Control Coprocessor

The specifications for the NDS9 are:

Tightly Coupled Memory (TCM)
  ITCM 32K, base=00000000h (fixed, not move-able)
  DTCM 16K, base=moveable  (default base=27C0000h)
Note: Although ITCM is NOT moveable, the NDS Firmware configures the ITCM size to 32MB, and so, produces ITCM mirrors at 0..1FFFFFFh. Furthermore, the PU can be used to lock/unlock memory in that region. That trick allows moving ITCM anywhere within the lower 32MB of memory.

Cache
  Data Cache 4KB, Instruction Cache 8KB
  4-way set associative method
  Cache line 8 words (32 bytes)
  Read-allocate method (ie. writes are not allocating cache lines)
  Round-robin and Pseudo-random replacement algorithms selectable
  Cache Lockdown, Instruction Prefetch, Data Preload
  Data write-through and write-back modes selectable

Protection Unit (PU)
Recommended/default settings are:
  Region  Name            Address   Size   Cache WBuf Code Data
  -       Background      00000000h 4GB    -     -    -    -
  0       I/O and VRAM    04000000h 64MB   -     -    R/W  R/W
  1       Main Memory     02000000h 4MB    On    On   R/W  R/W
  2       ARM7-dedicated  027C0000h 256KB  -     -    -    -
  3       GBA Slot        08000000h 128MB  -     -    -    R/W
  4       DTCM            027C0000h 16KB   -     -    -    R/W
  5       ITCM            01000000h 32KB   -     -    R/W  R/W
  6       BIOS            FFFF0000h 32KB   On    -    R    R
  7       Shared Work     027FF000h 4KB    -     -    -    R/W
Notes: In Nintendo's hardware-debugger, Main Memory is expanded to 8MB (for that reason, some addresses are at 27NN000h instead of 23NN000h) (some of the extra memory is reserved for the debugger, some can be used for game development). Region 2 and 7 are not understood? GBA Slot should be max 32MB+64KB, rounded up to 64MB, no idea why it is 128MB? DTCM and ITCM do not use Cache and Write-Buffer because TCM is fast. Above settings do not allow access to Shared Memory at 37F8000h? Do not use cache/wbuf for I/O, doing so might suppress writes, and/or might read outdated values.
The main purpose of the Protection Unit is debugging; a major problem with GBA programs has been faulty accesses to memory address 00000000h and up (due to [base+offset] addressing with uninitialized (zero) base values). This problem has been fixed in the NDS, for the ARM9 processor at least, but there are still various leaks: For example, the 64MB I/O and VRAM area contains only ca. 660KB valid addresses, and the ARM7 probably doesn't have a Protection Unit at all. Altogether, the protection is better than in GBA, but it's still pretty crude compared with software debugging tools.
Region address/size are unified (same for code and data), however, cacheability and access rights are non-unified (and may be separately defined for code and data).
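
As a sketch of how the address/size columns of the table above could be turned into region register values: the bit layout used here (bit0 = enable, bit1-5 = size X with size = 2 SHL X, bit12-31 = base address) is an assumption taken from the CP15 chapter and is not restated in this chapter, and the function name pu_region is made up.

```c
#include <stdint.h>

/* Encode one protection region value.
   base      = region base address (must be aligned to the region size)
   size_log2 = log2 of the region size in bytes, 12 (4KB) .. 32 (4GB)
   enable    = nonzero to enable the region
   Assumed layout: bit0=enable, bit1-5=X where size=(2 SHL X), bit12-31=base. */
uint32_t pu_region(uint32_t base, unsigned size_log2, int enable)
{
    uint32_t x = size_log2 - 1;   /* size = 2 SHL X  ->  X = log2(size) - 1 */
    return (base & 0xFFFFF000u) | (x << 1) | (enable ? 1u : 0u);
}
```

Under those assumptions, the 4MB Main Memory region at 02000000h would be encoded as 0200002Bh, and the 16KB DTCM region at 027C0000h as 027C001Bh.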

Note: The NDS7 doesn't have any TCM, Cache, or CP15.

  DS Memory Control - Cartridges and Main RAM

4000204h - NDS9 - EXMEMCNT - 16bit - External Memory Control (R/W)
4000204h - NDS7 - EXMEMSTAT - 16bit - External Memory Status (R/W..R)
  0-1   32-pin GBA Slot SRAM Access Time    (0-3 = 10, 8, 6, 18 cycles)
  2-3   32-pin GBA Slot ROM 1st Access Time (0-3 = 10, 8, 6, 18 cycles)
  4     32-pin GBA Slot ROM 2nd Access Time (0-1 = 6, 4 cycles)
  5-6   32-pin GBA Slot PHI-pin out   (0-3 = Low, 4.19MHz, 8.38MHz, 16.76MHz)
  7     32-pin GBA Slot Access Rights     (0=ARM9, 1=ARM7)
  8-10  Not used (always zero)
  11    17-pin NDS Slot Access Rights     (0=ARM9, 1=ARM7)
  12    Not used (always zero)
  13    Not used (always set ?)
  14    Main Memory Interface Mode Switch (0=Async/GBA/Reserved, 1=Synchronous)
  15    Main Memory Access Priority       (0=ARM9 Priority, 1=ARM7 Priority)
Bit0-6 can be changed by both NDS9 and NDS7, changing these bits affects the local EXMEM register only, not that of the other CPU.
Bit7-15 can be changed by NDS9 only, changing these bits affects both EXMEM registers, ie. both NDS9 and NDS7 can read the current NDS9 setting.
Bit14=0 is intended for GBA mode, however, writes to this bit appear to be ignored?
DS Main Memory Control
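
The bit assignments above can be decoded as in the following sketch. The struct and function names are hypothetical; only the bit positions and cycle counts are taken from the table.

```c
#include <stdint.h>

typedef struct {
    unsigned sram_cycles;       /* bit0-1: 10, 8, 6, 18 cycles */
    unsigned rom_1st_cycles;    /* bit2-3: 10, 8, 6, 18 cycles */
    unsigned rom_2nd_cycles;    /* bit4:   6 or 4 cycles */
    unsigned phi;               /* bit5-6: 0=Low, 1=4.19MHz, 2=8.38MHz, 3=16.76MHz */
    int gba_slot_arm7;          /* bit7:   0=ARM9, 1=ARM7 */
    int nds_slot_arm7;          /* bit11:  0=ARM9, 1=ARM7 */
    int main_mem_sync;          /* bit14:  1=Synchronous */
    int main_mem_arm7_prio;     /* bit15:  0=ARM9 Priority, 1=ARM7 Priority */
} exmem_t;

/* Split a 16bit EXMEMCNT/EXMEMSTAT value into its fields. */
exmem_t exmem_decode(uint16_t value)
{
    static const unsigned first_access[4] = { 10, 8, 6, 18 };
    exmem_t e;
    e.sram_cycles        = first_access[value & 3];
    e.rom_1st_cycles     = first_access[(value >> 2) & 3];
    e.rom_2nd_cycles     = (value & 0x10) ? 4 : 6;
    e.phi                = (value >> 5) & 3;
    e.gba_slot_arm7      = (value >> 7) & 1;
    e.nds_slot_arm7      = (value >> 11) & 1;
    e.main_mem_sync      = (value >> 14) & 1;
    e.main_mem_arm7_prio = (value >> 15) & 1;
    return e;
}
```

For example, a value of E880h decodes as: both slots mapped to ARM7, synchronous Main Memory, ARM7 priority, and 10-cycle SRAM/ROM access.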

GBA Slot (8000000h-AFFFFFFh)
The GBA Slot can be mapped to ARM9 or ARM7 via EXMEMCNT.7.
For the selected CPU, memory at 8000000h-9FFFFFFh contains the "GBA ROM" region, and memory at A000000h-AFFFFFFh contains the "GBA SRAM" region (repeated every 64Kbytes). If there is no cartridge in GBA Slot, then the ROM/SRAM regions will contain open-bus values: SRAM region is FFh-filled (High-Z). And ROM region is filled by increasing 16bit values (Addr/2), possibly ORed with garbage depending on the selected ROM Access Time:
  6 clks   --> returns "Addr/2"
  8 clks   --> returns "Addr/2"
  10 clks  --> returns "Addr/2 OR FE08h" (or similar garbage)
  18 clks  --> returns "FFFFh" (High-Z)
For the deselected CPU, all memory at 8000000h-AFFFFFFh becomes 00h-filled. This is required for bugged games like Digimon Story: Super Xros Wars (which is accidentally reading deselected GBA SRAM at [main_ram_base+main_ram_addr*4], whereas it presumably wants to read Main RAM at [main_ram_base+index*4]).

  DS Memory Control - WRAM

4000247h - NDS9 - WRAMCNT - 8bit - WRAM Bank Control (R/W)
4000241h - NDS7 - WRAMSTAT - 8bit - WRAM Bank Status (R)
Should not be changed when using Nintendo's API.
  0-1   ARM9/ARM7 (0-3 = 32K/0K, 2nd 16K/1st 16K, 1st 16K/2nd 16K, 0K/32K)
  2-7   Not used
The ARM9 WRAM area is 3000000h-3FFFFFFh (16MB range).
The ARM7 WRAM area is 3000000h-37FFFFFh (8MB range).
The allocated 16K or 32K are mirrored everywhere in the above areas.
De-allocation (0K) is a special case: At the ARM9-side, the WRAM area is then empty (containing undefined data). At the ARM7-side, the WRAM area is then containing mirrors of the 64KB ARM7-WRAM (the memory at 3800000h and up).
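
The four WRAMCNT settings can be summarised by a small decoder. This is a sketch only; the struct layout and names are invented, and "start" means the offset of the allocated part inside the 32K Shared WRAM block.

```c
#include <stdint.h>

typedef struct {
    uint32_t arm9_bytes, arm9_start;   /* ARM9 share: size, offset in the 32K block */
    uint32_t arm7_bytes, arm7_start;   /* ARM7 share: size, offset in the 32K block */
} wram_split_t;

/* Decode WRAMCNT/WRAMSTAT bit0-1 (bit2-7 are not used and are ignored here). */
wram_split_t wramcnt_decode(uint8_t wramcnt)
{
    wram_split_t s = { 0, 0, 0, 0 };
    switch (wramcnt & 3) {
    case 0:  /* 32K / 0K */
        s.arm9_bytes = 0x8000;
        break;
    case 1:  /* ARM9: 2nd 16K, ARM7: 1st 16K */
        s.arm9_bytes = 0x4000; s.arm9_start = 0x4000;
        s.arm7_bytes = 0x4000; s.arm7_start = 0x0000;
        break;
    case 2:  /* ARM9: 1st 16K, ARM7: 2nd 16K */
        s.arm9_bytes = 0x4000; s.arm9_start = 0x0000;
        s.arm7_bytes = 0x4000; s.arm7_start = 0x4000;
        break;
    case 3:  /* 0K / 32K */
        s.arm7_bytes = 0x8000;
        break;
    }
    return s;
}
```

A size of zero corresponds to the de-allocation special case described above.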

  DS Memory Control - VRAM

4000240h - NDS7 - VRAMSTAT - 8bit - VRAM Bank Status (R)
  0     VRAM C enabled and allocated to NDS7  (0=No, 1=Yes)
  1     VRAM D enabled and allocated to NDS7  (0=No, 1=Yes)
  2-7   Not used (always zero)
The register indicates if VRAM C/D are allocated to NDS7 (as Work RAM), ie. if VRAMCNT_C/D are enabled (Bit7=1), with MST=2 (Bit0-2). However, it does not reflect the OFS value.

4000240h - NDS9 - VRAMCNT_A - 8bit - VRAM-A (128K) Bank Control (W)
4000241h - NDS9 - VRAMCNT_B - 8bit - VRAM-B (128K) Bank Control (W)
4000242h - NDS9 - VRAMCNT_C - 8bit - VRAM-C (128K) Bank Control (W)
4000243h - NDS9 - VRAMCNT_D - 8bit - VRAM-D (128K) Bank Control (W)
4000244h - NDS9 - VRAMCNT_E - 8bit - VRAM-E (64K) Bank Control (W)
4000245h - NDS9 - VRAMCNT_F - 8bit - VRAM-F (16K) Bank Control (W)
4000246h - NDS9 - VRAMCNT_G - 8bit - VRAM-G (16K) Bank Control (W)
4000248h - NDS9 - VRAMCNT_H - 8bit - VRAM-H (32K) Bank Control (W)
4000249h - NDS9 - VRAMCNT_I - 8bit - VRAM-I (16K) Bank Control (W)
  0-2   VRAM MST              ;Bit2 not used by VRAM-A,B,H,I
  3-4   VRAM Offset (0-3)     ;Offset not used by VRAM-E,H,I
  5-6   Not used
  7     VRAM Enable (0=Disable, 1=Enable)
There is a total of 656KB of VRAM in Blocks A-I.
Table below shows the possible configurations.
  VRAM    SIZE  MST  OFS   ARM9, Plain ARM9-CPU Access (so-called LCDC mode)
  A       128K  0    -     6800000h-681FFFFh
  B       128K  0    -     6820000h-683FFFFh
  C       128K  0    -     6840000h-685FFFFh
  D       128K  0    -     6860000h-687FFFFh
  E       64K   0    -     6880000h-688FFFFh
  F       16K   0    -     6890000h-6893FFFh
  G       16K   0    -     6894000h-6897FFFh
  H       32K   0    -     6898000h-689FFFFh
  I       16K   0    -     68A0000h-68A3FFFh
  VRAM    SIZE  MST  OFS   ARM9, 2D Graphics Engine A, BG-VRAM (max 512K)
  A,B,C,D 128K  1    0..3  6000000h+(20000h*OFS)
  E       64K   1    -     6000000h
  F,G     16K   1    0..3  6000000h+(4000h*OFS.0)+(10000h*OFS.1)
  VRAM    SIZE  MST  OFS   ARM9, 2D Graphics Engine A, OBJ-VRAM (max 256K)
  A,B     128K  2    0..1  6400000h+(20000h*OFS.0)  ;(OFS.1 must be zero)
  E       64K   2    -     6400000h
  F,G     16K   2    0..3  6400000h+(4000h*OFS.0)+(10000h*OFS.1)
  VRAM    SIZE  MST  OFS   2D Graphics Engine A, BG Extended Palette
  E       64K   4    -     Slot 0-3  ;only lower 32K used
  F,G     16K   4    0..1  Slot 0-1 (OFS=0), Slot 2-3 (OFS=1)
  VRAM    SIZE  MST  OFS   2D Graphics Engine A, OBJ Extended Palette
  F,G     16K   5    -     Slot 0  ;16K each (only lower 8K used)
  VRAM    SIZE  MST  OFS   Texture/Rear-plane Image
  A,B,C,D 128K  3    0..3  Slot OFS(0-3)   ;(Slot2-3: Texture, or Rear-plane)
  VRAM    SIZE  MST  OFS   Texture Palette
  E       64K   3    -     Slots 0-3                 ;OFS=don't care
  F,G     16K   3    0..3  Slot (OFS.0*1)+(OFS.1*4)  ;ie. Slot 0, 1, 4, or 5
  VRAM    SIZE  MST  OFS   ARM9, 2D Graphics Engine B, BG-VRAM (max 128K)
  C       128K  4    -     6200000h
  H       32K   1    -     6200000h
  I       16K   1    -     6208000h
  VRAM    SIZE  MST  OFS   ARM9, 2D Graphics Engine B, OBJ-VRAM (max 128K)
  D       128K  4    -     6600000h
  I       16K   2    -     6600000h
  VRAM    SIZE  MST  OFS   2D Graphics Engine B, BG Extended Palette
  H       32K   2    -     Slot 0-3
  VRAM    SIZE  MST  OFS   2D Graphics Engine B, OBJ Extended Palette
  I       16K   3    -     Slot 0  ;(only lower 8K used)
  VRAM    SIZE  MST  OFS   <ARM7>, Plain <ARM7>-CPU Access
  C,D     128K  2    0..1  6000000h+(20000h*OFS.0)  ;OFS.1 must be zero

In Plain-CPU modes, VRAM can be accessed only by the CPU (and by the Capture Unit, and by VRAM Display mode). In "Plain <ARM7>-CPU Access" mode, the VRAM blocks are allocated as Work RAM to the NDS7 CPU.
In BG/OBJ VRAM modes, VRAM can be accessed by the CPU at specified addresses, and by the display controller.
In Extended Palette and Texture Image/Palette modes, VRAM is not mapped to CPU address space, and can be accessed only by the display controller (so, to initialize or change the memory, it should be temporarily switched to Plain-CPU mode).
All VRAM (and Palette, and OAM) can be written to only in 16bit and 32bit units (STRH, STR opcodes), 8bit writes are ignored (by STRB opcode). The only exception is "Plain <ARM7>-CPU Access" mode: The ARM7 CPU can use STRB to write to VRAM (the reason for this special feature is that, in GBA mode, two 128K VRAM blocks are used to emulate the GBA's 256K Work RAM).
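
As a worked example for the VRAMCNT bit layout and the Engine A BG-VRAM rows of the table above, the following sketch builds a control byte and computes the resulting CPU address. The function names are hypothetical; only the MST=1 rows are covered.

```c
#include <stdint.h>

/* Build an enabled VRAMCNT byte: bit0-2 MST, bit3-4 Offset, bit7 Enable. */
uint8_t vramcnt(unsigned mst, unsigned ofs)
{
    return (uint8_t)(0x80u | ((ofs & 3u) << 3) | (mst & 7u));
}

/* Engine A BG-VRAM base for banks A,B,C,D (MST=1): 6000000h+(20000h*OFS). */
uint32_t bg_a_base_abcd(unsigned ofs)
{
    return 0x06000000u + 0x20000u * (ofs & 3u);
}

/* Engine A BG-VRAM base for banks F,G (MST=1):
   6000000h+(4000h*OFS.0)+(10000h*OFS.1). */
uint32_t bg_a_base_fg(unsigned ofs)
{
    return 0x06000000u + 0x4000u * (ofs & 1u) + 0x10000u * ((ofs >> 1) & 1u);
}
```

For example, writing vramcnt(1, 1) = 89h to VRAMCNT_A would place bank A at 6020000h, and the same value written to VRAMCNT_F would place bank F at 6004000h.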

Other Video RAM
Aside from the map-able VRAM blocks, there are also some video-related memory regions at fixed addresses:
  5000000h Engine A Standard BG Palette (512 bytes)
  5000200h Engine A Standard OBJ Palette (512 bytes)
  5000400h Engine B Standard BG Palette (512 bytes)
  5000600h Engine B Standard OBJ Palette (512 bytes)
  7000000h Engine A OAM (1024 bytes)
  7000400h Engine B OAM (1024 bytes)

  DS Memory Control - BIOS

4000308h - NDS7 - BIOSPROT - Bios-data-read-protection address
Used to double-protect the first few KBytes of the NDS7 BIOS. The BIOS is split into two protection regions, one always active, one controlled by the BIOSPROT register. The overall idea is that only the BIOS can read from itself; any other attempts to read from those regions return FFh-bytes.
  Opcodes at...      Can read from      Expl.
  0..[BIOSPROT]-1    0..3FFFh           Double-protected (when BIOSPROT is set)
  [BIOSPROT]..3FFFh  [BIOSPROT]..3FFFh  Normal-protected (always active)
The initial BIOSPROT setting on power-up is zero (disabled). Before starting the cartridge, the BIOS boot code sets the register to 1204h (actually 1205h, but the mis-aligned low-bit is ignored). Once initialized, further writes to the register are ignored.
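
The two protection regions from the table above can be expressed as a simple predicate. This is a sketch of the rule as stated in the table, not a description of the actual hardware logic; the function name is invented, and addresses outside the BIOS are simply reported as readable because they are not covered by this model.

```c
#include <stdint.h>
#include <stdbool.h>

#define BIOS7_SIZE 0x4000u   /* NDS7 BIOS occupies 0..3FFFh */

/* True if an opcode located at `pc` may read BIOS address `addr`;
   false means the read would return FFh-bytes. */
bool bios7_can_read(uint32_t pc, uint32_t addr, uint32_t biosprot)
{
    if (addr >= BIOS7_SIZE)
        return true;            /* not a BIOS address, outside this model */
    if (pc >= BIOS7_SIZE)
        return false;           /* only the BIOS can read from itself */
    if (pc < biosprot)
        return true;            /* opcodes at 0..[BIOSPROT]-1 can read 0..3FFFh */
    return addr >= biosprot;    /* opcodes above can read [BIOSPROT]..3FFFh only */
}
```

With BIOSPROT=1204h, an opcode at 05ECh can read address 0030h, whereas an opcode at 2000h cannot.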

The double-protected region contains the exception vectors, some bytes of code, and the cartridge KEY1 encryption seed (about 4KBytes). As far as I know, it is impossible to unlock the memory once it is locked; however, with some trickery, it is possible to execute code before it gets locked. Also, the two THUMB opcodes at 05ECh can be used to read all memory at 0..3FFFh,
  05ECh  ldrb r3,[r3,12h]      ;requires incoming r3=src-12h
  05EEh  pop  r2,r4,r6,r7,r15  ;requires dummy values & THUMB retadr on stack
Additionally most BIOS functions (eg. CpuSet), include a software-based protection which rejects source addresses in the BIOS area (the only exception is GetCRC16, though it still cannot bypass the BIOSPROT setting).

The NDS9 BIOS doesn't include any software or hardware based read protection.

  DS Memory Timings

System Clock
  Bus clock  = 33MHz (33.513982 MHz) (1FF61FEh Hertz)
  NDS7 clock = 33MHz (same as bus clock)
  NDS9 clock = 66MHz (internally twice bus clock; for cache/tcm)
Most timings in this document are specified for 33MHz clock (not for the 66MHz clock). Respectively, NDS9 timings are counted in "half" cycles.

Memory Access Times
Tables below show the different access times for code/data fetches on arm7/arm9 cpus, measured for sequential/nonsequential 32bit/16bit accesses.
  NDS7/CODE             NDS9/CODE
  N32 S32 N16 S16 Bus   N32 S32 N16 S16 Bus
  9   2   8   1   16    9   9   4.5 4.5 16  Main RAM (read) (cache off)
  1   1   1   1   32    4   4   2   2   32  WRAM,BIOS,I/O,OAM
  2   2   1   1   16    5   5   2.5 2.5 16  VRAM,Palette RAM
  16  12  10  6   16    19  19  9.5 9.5 16  GBA ROM (example 10,6 access)
  -   -   -   -   -     0.5 0.5 0.5 0.5 32  TCM, Cache_Hit
  -   -   -   -   -     (--Load 8 words--)  Cache_Miss

  NDS7/DATA             NDS9/DATA
  N32 S32 N16 S16 Bus   N32 S32 N16 S16 Bus
  10  2   9   1   16    10  2   9   1   16  Main RAM (read) (cache off)
  1   1   1   1   32    4   1   4   1   32  WRAM,BIOS,I/O,OAM
  1?  2   1   1   16    5   2   4   1   16  VRAM,Palette RAM
  15  12  9   6   16    19  12  13  6   16  GBA ROM (example 10,6 access)
  9   10  9   10  8     13  10  13  10  8   GBA RAM (example 10 access)
  -   -   -   -   -     0.5 0.5 0.5 -   32  TCM, Cache_Hit
  -   -   -   -   -     (--Load 8 words--)  Cache_Miss
  -   -   -   -   -     11  11  11  -   32  Cache_Miss (BIOS)
  -   -   -   -   -     23  23  23  -   16  Cache_Miss (Main RAM)
All timings are counted in 33MHz units (so "half" cycles can occur on NDS9).
Note: 8bit data accesses have the same timings as 16bit data.

*** DS Memory Timing Notes ***

The NDS timings are altogether pretty messed up, with different timings for CODE and DATA fetches, and different timings for NDS7 and NDS9...

NDS7/CODE
Timings for this region can be considered as "should be" timings.

NDS7/DATA
Quite the same as NDS7/CODE. Except that nonsequential Main RAM accesses are 1 cycle slower, and, more strangely, nonsequential GBA Slot accesses are 1 cycle faster.

NDS9/CODE
This is the messiest timing. An infamous PENALTY of 3 cycles is added to all nonsequential accesses (except cache, tcm, and main ram). And, all opcode fetches are forcefully made nonsequential 32bit (the NDS9 simply doesn't support fast sequential opcode fetches). That applies also for THUMB code (two 16bit opcodes are fetched by a single nonsequential 32bit access) (so the time per 16bit opcode is one half of the 32bit fetch) (unless a branch causes only one of the two 16bit opcodes to be executed, then that opcode will have the full 32bit access time).

NDS9/DATA
Allows both sequential and nonsequential access, and both 16bit and 32bit access, so it's faster than NDS9/CODE. Nevertheless, it still has the 3 cycle PENALTY on nonsequential accesses. And, similar to NDS7/DATA, it also adds 1 cycle to nonsequential Main RAM accesses.

*** More Timing Notes / Lots of unsorted Info ***

Actual CPU Performance
The 33MHz NDS7 is running more or less nicely at 33MHz. However, the so-called "66MHz" NDS9 has <much> higher waitstates, and its effective bus speed is barely about 8..16MHz; the only exception is code/data in cache/tcm, which eventually reaches real 66MHz (that, assuming cache HITS; otherwise, in case of cache MISSES, the cached memory timing might even drop to 1.4MHz or so?).
ARM9 opcode fetches are always N32 + 3 waits.
  S16 and N16 do not exist (because thumb-double-fetching) (see there).
  S32 becomes N32 (ie. the ARM9 does NOT support fast sequential timing).
That N32 is having same timing as normal N32 access on NDS7, plus 3 waits.
  Eg. an ARM9 N32 or S32 to 16bit bus will take: N16 + S16 + 3 waits.
  Eg. an ARM9 N32 or S32 to 32bit bus will take: N32 + 3 waits.
Main Memory is ALWAYS having the nonsequential 3 wait PENALTY (even on ARM7).
ARM9 Data fetches however are allowed to use sequential timing, as well as raw 16bit accesses (which aren't forcefully expanded to slow 32bit accesses).
Nevertheless, the 3 wait PENALTY is added to any NONSEQUENTIAL accesses.
Only exceptions are cache and tcm which do not have that penalty.
 Eg. LDRH on 16bit-data-bus is N16+3waits.
 Eg. LDR  on 16bit-data-bus is N16+S16+3waits.
 Eg. LDM  on 16bit-data-bus is N16+(n*2-1)*S16+3waits.
Eventually, data fetches can take place parallel with opcode fetches.
 That is NOT true for LDM (works only for LDR/LDRB/LDRH).
 That is NOT true for DATA in SAME memory region than CODE.
 That is NOT true for DATA in ITCM (no matter if CODE is in ITCM).
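
The three "Eg." formulas for ARM9 data fetches on a 16bit data bus can be written down as follows (a sketch with invented function names; all values are in 33MHz cycles, and n16/s16 are the raw nonsequential/sequential 16bit timings of the target memory).

```c
/* The 3 wait PENALTY added to nonsequential NDS9 accesses. */
#define ARM9_PENALTY 3u

/* LDRH on 16bit-data-bus: N16 + 3 waits. */
unsigned arm9_ldrh_16bit_bus(unsigned n16)
{
    return n16 + ARM9_PENALTY;
}

/* LDR on 16bit-data-bus: N16 + S16 + 3 waits. */
unsigned arm9_ldr_16bit_bus(unsigned n16, unsigned s16)
{
    return n16 + s16 + ARM9_PENALTY;
}

/* LDM of `words` registers (words >= 1) on 16bit-data-bus:
   N16 + (n*2-1)*S16 + 3 waits. */
unsigned arm9_ldm_16bit_bus(unsigned n16, unsigned s16, unsigned words)
{
    return n16 + (words * 2u - 1u) * s16 + ARM9_PENALTY;
}
```

Using the NDS7/DATA values for VRAM (N16=1, S16=1), this gives 4 cycles for LDRH and 5 cycles for LDR, which matches the N16 and N32 entries in the NDS9/DATA row for VRAM; an LDM of four words would take 11 cycles by the same formula.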

NDS9 Busses
Unlike ARM7, the ARM9 has separate code and data busses, allowing it to perform code and data fetches simultaneously (provided that both are in different memory regions).
Normally, opcode execution times are calculated as "(codetime+datatime)", with the two busses, it can (ideally) be "MAX(codetime,datatime)", so the data access time may virtually take "NULL" clock cycles.
In practice, DTCM and Data Cache access can take NULL cycles (however, data access to ITCM can't).
When executing code in cache/itcm, data access to non-cache/tcm won't be any faster than with only one bus (at best, it could subtract 0.5 cycles from datatime, but the access must be "aligned" to the bus-clock, so the "datatime-0.5" will be rounded back to the original "datatime").
When executing code in uncached main ram, and accessing data (elsewhere than in main memory, cache/tcm), then execution time is typically "codetime+datatime-2".

NDS9 Internal Cycles
In addition to codetime+datatime, some opcodes include one or more internal cycles. Compared with ARM7, the behaviour of these internal cycles is slightly different on ARM9. First off, on the NDS9, the internal cycles are of course "half" cycles (ie. counted in 66MHz units, not in 33MHz units) (although they may get rounded to "full" cycles upon the next memory access outside tcm/cache). And, the ARM9 is in some cases "skipping" the internal cycles, often depending on whether or not the next opcode is using the result of the current opcode.
Another big difference is that the ARM9 has lost the fast-multiply feature for small numbers; in some cases that may result in faster execution, but may also result in slower execution (one workaround would be to manually replace MUL opcodes by the new ARM9 halfword multiply opcodes); the slowest cases are MUL opcodes that do update flags (eg. MULS, MLAS, SMULLS, etc. in ARM mode, and ALL multiply opcodes in THUMB mode).

NDS9 Thumb Code
In thumb mode, the NDS9 is fetching two 16bit opcodes by a single 32bit read. In case of a 32bit bus, this reduces the amount of memory traffic and may result in faster execution time; of course that works only if the two opcodes are within a word-aligned region (eg. loops at word-aligned addresses will be faster than non-aligned loops). However, the double-opcode-fetching is also done on 16bit bus memory, including for unnecessary fetches, such as opcodes after branch commands, so the feature may cause heavy slowdowns.

Main Memory
Reportedly, the main memory access times would be 5 cycles (nonsequential read), 4 cycles (nonsequential write), and 1 cycle (sequential read or write). Plus whatever termination cycles. Plus 3 cycles on nonsequential access to the last 2-bytes of a 32-byte block.
That's of course all wrong. Reads are much slower than 5 cycles. Not yet tested if writes are faster. And, I haven't been able to reproduce the 3 cycles on last 2-bytes effect; actually, it looks more as if those 3 cycles are accidentally added to ALL nonsequential accesses, at ALL main memory addresses, and even to most OTHER memory regions... which might be the source of the PENALTY which occurs on VRAM/WRAM/OAM/Palette and I/O accesses.

In some cases DMA main memory read cycles are reportedly performed simultaneously with DMA write cycles to other memory.

On the NDS9, all external memory access (and I/O) is delayed to bus clock (or actually MUCH slower due to the massive waitstates), so the full 66MHz can be used only internally in the NDS9 CPU core, ie. with cache and TCM.

Bus Clock
The exact bus clock is specified as 33.513982 MHz (1FF61FEh Hertz). However, on my own NDS, measured in relation to the RTC seconds IRQ, it appears more like 1FF6231h; that inaccuracy of 1 cycle per 657138 cycles (about one second per week) on either oscillator isn't too significant though.

GBA Slot
The access time for GBA slot can be configured via EXMEMCNT register.

VRAM Waitstates
Additionally, on NDS9, a one cycle wait can be added to VRAM accesses (when the video controller simultaneously accesses it) (that can be disabled by Forced Blank, see DISPCNT.Bit7). Moreover, additional VRAM waitstates occur when using the video capture function.
Note: VRAM being mapped to NDS7 is always free of additional waits.

  DS Video

The NDS has two 2D Video Engines, each basically the same as in GBA, see
GBA LCD Video Controller

NDS Specific 2D Video Features
DS Video Stuff
DS Video BG Modes / Control
DS Video OBJs
DS Video Extended Palettes
DS Video Capture and Main Memory Display Mode
DS Video Display System Block Diagram

For Display Power Control (and Display Swap), and VRAM Allocation, see
DS Power Management
DS Memory Control - VRAM

  DS Video Stuff

DS Display Dimensions / Timings
Dot clock = 5.585664 MHz (=33.513982 MHz / 6)
H-Timing: 256 dots visible, 99 dots blanking, 355 dots total (15.7343KHz)
V-Timing: 192 lines visible, 71 lines blanking, 263 lines total (59.8261 Hz)
The V-Blank cycle for the 3D Engine consists of the 23 lines, 191..213.
Screen size 62.5mm x 47.0mm (each) (256x192 pixels)
Vertical space between screens 22mm (equivalent to 90 pixels)
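
The timing figures above follow from the bus clock by plain arithmetic. A minimal sketch of that derivation (the macro and function names are made up for illustration, they are not official names):

```c
#include <stdint.h>

/* Constants as listed in the timing table above; names are illustrative. */
#define NDS_BUS_CLOCK_HZ    33513982u  /* 1FF61FEh */
#define NDS_CYCLES_PER_DOT  6u         /* dot clock = bus clock / 6 */
#define NDS_DOTS_PER_LINE   355u       /* 256 visible + 99 blanking */
#define NDS_LINES_PER_FRAME 263u       /* 192 visible + 71 blanking */

/* Bus clock cycles per scanline (355 dots * 6 cycles). */
static uint32_t nds_cycles_per_line(void) {
    return NDS_DOTS_PER_LINE * NDS_CYCLES_PER_DOT;
}

/* Bus clock cycles per frame (263 scanlines). */
static uint32_t nds_cycles_per_frame(void) {
    return nds_cycles_per_line() * NDS_LINES_PER_FRAME;
}

/* Vertical refresh rate in Hz. */
static double nds_frame_rate_hz(void) {
    return (double)NDS_BUS_CLOCK_HZ / (double)nds_cycles_per_frame();
}
```

That gives 2130 cycles per scanline and 560190 cycles per frame, which matches the 59.8261 Hz listed above.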

400006Ch - NDS9 - MASTER_BRIGHT - 16bit - Master Brightness Up/Down
  0-4   Factor used for 6bit R,G,B Intensities (0-16, values >16 same as 16)
          Brightness up:   New = Old + (63-Old) * Factor/16
          Brightness down: New = Old - Old      * Factor/16
  5-13  Not used
  14-15 Mode (0=Disable, 1=Up, 2=Down, 3=Reserved)
  16-31 Not used
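
The brightness formulas can be tried out on the host as below. This is only a sketch: the function name is hypothetical, the truncating integer division is an assumption about how the hardware rounds, and treating the reserved mode 3 like "Disable" is an assumption too.

```c
#include <stdint.h>

/* Apply MASTER_BRIGHT to one 6bit intensity (0..63).
   mode: 0=Disable, 1=Up, 2=Down, 3=Reserved (handled like Disable here,
   which is an assumption). Factors above 16 behave like 16. */
static uint8_t master_bright_apply(uint8_t old, unsigned mode, unsigned factor) {
    if (factor > 16u) factor = 16u;
    if (mode == 1u) return (uint8_t)(old + (63u - old) * factor / 16u);
    if (mode == 2u) return (uint8_t)(old - old * factor / 16u);
    return old;
}
```

With Factor=16, mode "Up" turns any intensity into 63 (white), and mode "Down" turns it into 0 (black).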

The LY and LYC values are in range 0..262, so LY/LYC values have been expanded to 9bit values: LY = VCOUNT Bit 0..8, and LYC=DISPSTAT Bit8..15,7.
VCOUNT register is write-able, allowing to synchronize linked DS consoles.
For proper synchronization:
  write new LY values only in range of 202..212
  write only while old LY values are in range of 202..212
DISPSTAT/VCOUNT supported by NDS9 (Engine A Ports, without separate Engine B Ports), and by NDS7 (allowing to synchronize NDS7 with display timings).
Similar as on GBA, the VBlank flag isn't set in the last line (ie. only in lines 192..261, but not in line 262).
Although the drawing time is only 1536 cycles (256*6), the NDS9 H-Blank flag is "0" for a total of 1606 cycles (and, for whatever reason, a bit longer, 1613 cycles in total, on NDS7).
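
A possible way to handle the 9bit LYC value and the VCOUNT write rule in software. The helper names are hypothetical, and the bit order (LYC bits 0..7 in DISPSTAT bits 8..15, LYC bit 8 in DISPSTAT bit 7) is my reading of the "Bit8..15,7" notation above.

```c
#include <stdint.h>
#include <stdbool.h>

/* Insert a 9bit LYC value (0..262) into a DISPSTAT value:
   LYC bits 0..7 go to DISPSTAT bits 8..15, LYC bit 8 goes to DISPSTAT bit 7.
   DISPSTAT bits 0..6 are left unchanged. */
static uint16_t dispstat_set_lyc(uint16_t dispstat, unsigned lyc) {
    dispstat &= 0x007Fu;
    dispstat |= (uint16_t)((lyc & 0xFFu) << 8);
    dispstat |= (uint16_t)(((lyc >> 8) & 1u) << 7);
    return dispstat;
}

/* True if both the old and the new LY value are in the range 202..212
   that is recommended above for VCOUNT writes. */
static bool vcount_write_is_safe(unsigned old_ly, unsigned new_ly) {
    return old_ly >= 202u && old_ly <= 212u &&
           new_ly >= 202u && new_ly <= 212u;
}
```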

VRAM Waitstates
The display controller performs VRAM-reads once every 6 clock cycles, a 1 cycle waitstate is generated if the CPU simultaneously accesses VRAM. With capture enabled, additionally VRAM-writes take place once every 6 cycles, so the total VRAM-read/write access rate is then once every 3 cycles.

DS Window Glitches
The DS counts scanlines in range 0..262 (0..106h), of which only the lower 8bit are compared with the WIN0V/WIN1V register settings. Respectively, Y1 coordinates 00h..06h will be triggered in scanlines 100h-106h by mistake. That means, the window gets activated within VBlank period, and will be active in scanline 0 and up (that is no problem with Y1=0, but Y1=1..6 will appear as if Y1 would be 0). Workaround would be to disable the Window during VBlank, or to change Y1 during VBlank (to a value that does not occur during VBlank period, ie. 7..191).
Also, there's a problem to fit the 256 pixel horizontal screen resolution into 8bit values: X1=00h is treated as 0 (left-most), X2=00h is treated as 100h (right-most). However, the window is not displayed if X1=X2=00h; the window width can be max 255 pixels.

2D Engines
Includes two 2D Engines, called A and B. Both engines are accessed by the ARM9 processor, each using different memory and register addresses:
  Region______Engine A______________Engine B___________
  I/O Ports   4000000h              4001000h
  Palette     5000000h (1K)         5000400h (1K)
  BG VRAM     6000000h (max 512K)   6200000h (max 128K)
  OBJ VRAM    6400000h (max 256K)   6600000h (max 128K)
  OAM         7000000h (1K)         7000400h (1K)
Engine A additionally supports 3D and large-screen 256-color Bitmaps, plus main-memory-display and vram-display modes, plus capture unit.

Viewing Angles
The LCD screens are best viewed at viewing angles of 90 degrees. Colors may appear distorted, and may even become invisible at other viewing angles.
When the console is handheld, both screens can be turned into preferred direction. When the console is settled on a table, only the upper screen can be turned, but the lower screen is stuck in horizontal position - which results in rather bad visibility (unless the user moves his/her head directly above it).

4000070h - NDS9 - TVOUTCNT - Unknown (W)
  Bit0-3  "COMMAND"  (?)
  Bit4-7  "COMMAND2" (?)
  Bit8-11 "COMMAND3" (?)
This register has been mentioned in an early I/O map from Nintendo, as far as I know, the register isn't used by any games/firmware/bios, not sure if it does really exist on release-version, or if it's been prototype stuff...?

DS-Lite Screens
The screens in the DS-Lite seem to allow a wider range of vertical angles.
The bad news is that the colors of the DS-Lite are (no surprise) not backwards compatible with older NDS and GBA displays. The good news is that Nintendo has finally reached near-CRT-quality (without blurred colors), so one could hope that they won't show up with more displays with other colors in future.
Don't know if there's an official/recommended way to detect DS-Lite displays (?) possible methods would be whatever values in Firmware header, or by functionality of Power Management device, or (not too LCD-related) by Wifi Chip ID.

  DS Video BG Modes / Control

4000000h - NDS9 - DISPCNT
  Bit  Engine Expl.
  0-2   A+B   BG Mode
  3     A     BG0 2D/3D Selection (instead of CGB Mode) (0=2D, 1=3D)
  4     A+B   Tile OBJ Mapping        (0=2D; max 32KB, 1=1D; max 32KB..256KB)
  5     A+B   Bitmap OBJ 2D-Dimension (0=128x512 dots, 1=256x256 dots)
  6     A+B   Bitmap OBJ Mapping      (0=2D; max 128KB, 1=1D; max 128KB..256KB)
  7-15  A+B   Same as GBA
  16-17 A+B   Display Mode (Engine A: 0..3, Engine B: 0..1, GBA: Green Swap)
  18-19 A     VRAM block (0..3=VRAM A..D) (For Capture & above Display Mode=2)
  20-21 A+B   Tile OBJ 1D-Boundary   (see Bit4)
  22    A     Bitmap OBJ 1D-Boundary (see Bit5-6)
  23    A+B   OBJ Processing during H-Blank (was located in Bit5 on GBA)
  24-26 A     Character Base (in 64K steps) (merged with 16K step in BGxCNT)
  27-29 A     Screen Base (in 64K steps) (merged with 2K step in BGxCNT)
  30    A+B   BG Extended Palettes   (0=Disable, 1=Enable)
  31    A+B   OBJ Extended Palettes  (0=Disable, 1=Enable)

BG Mode
Engine A BG Mode (DISPCNT LSBs) (0-6, 7=Reserved)
  Mode  BG0      BG1      BG2      BG3
  0     Text/3D  Text     Text     Text
  1     Text/3D  Text     Text     Affine
  2     Text/3D  Text     Affine   Affine
  3     Text/3D  Text     Text     Extended
  4     Text/3D  Text     Affine   Extended
  5     Text/3D  Text     Extended Extended
  6     3D       -        Large    -
Of which, the "Extended" modes are sub-selected by BGxCNT bits:
  BGxCNT.Bit7 BGxCNT.Bit2 Extended Affine Mode Selection
  0           CharBaseLsb rot/scal with 16bit bgmap entries (Text+Affine mixup)
  1           0           rot/scal 256 color bitmap
  1           1           rot/scal direct color bitmap
Engine B: Same as above, except that: Mode 6 is reserved (no Large screen bitmap), and BG0 is always Text (no 3D support).
Affine = formerly Rot/Scal mode (with 8bit BG Map entries)
Large Screen Bitmap = rot/scal 256 color bitmap (using all 512K of 2D VRAM)

Display Mode (DISPCNT.16-17):
  0  Display off (screen becomes white)
  1  Graphics Display (normal BG and OBJ layers)
  2  Engine A only: VRAM Display (Bitmap from block selected in DISPCNT.18-19)
  3  Engine A only: Main Memory Display (Bitmap DMA transfer from Main RAM)
Mode 2-3 display a raw direct color bitmap (15bit RGB values, the upper bit in each halfword is unused), without any further BG,OBJ,3D layers; these modes are completely bypassing the 2D/3D engines as well as any 2D effects, however the Master Brightness effect can be applied to these modes. Mode 2 is particularly useful to display captured 2D/3D images (in that case it can indirectly use the 2D/3D engine).

character base extended from bit2-3 to bit2-5 (bit4-5 formerly unused)
  engine A screen base: BGxCNT.bits*2K + DISPCNT.bits*64K
  engine B screen base: BGxCNT.bits*2K + 0
  engine A char base: BGxCNT.bits*16K + DISPCNT.bits*64K
  engine B char base: BGxCNT.bits*16K + 0
char base is used only in tile/map modes (not bitmap modes)
screen base is used in tile/map modes,
screen base used in bitmap modes as BGxCNT.bits*16K, without DISPCNT.bits*64K
screen base however NOT used at all for Large screen bitmap mode
  bgcnt size  text     rotscal    bitmap   large bmp
  0           256x256  128x128    128x128  512x1024
  1           512x256  256x256    256x256  1024x512
  2           256x512  512x512    512x256  -
  3           512x512  1024x1024  512x512  -
bitmaps that require more than 128K VRAM are supported on engine A only.
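
The base address rules for tile/map modes can be sketched as follows. The function names are hypothetical; the DISPCNT fields are bits 24-26 and 27-29 as listed above, the character base field is BGxCNT bits 2-5, and the screen base field position (BGxCNT bits 8-12) is taken from the GBA BGxCNT layout. For Engine B, pass a DISPCNT value with these fields zero.

```c
#include <stdint.h>

/* Character base offset in BG VRAM (tile/map modes):
   BGxCNT.bits2-5 * 16K + DISPCNT.bits24-26 * 64K. */
static uint32_t bg_char_base(uint32_t dispcnt, uint16_t bgcnt) {
    return ((bgcnt >> 2) & 0xFu) * 0x4000u
         + ((dispcnt >> 24) & 7u) * 0x10000u;
}

/* Screen base offset in BG VRAM (tile/map modes):
   BGxCNT.bits8-12 * 2K + DISPCNT.bits27-29 * 64K. */
static uint32_t bg_screen_base(uint32_t dispcnt, uint16_t bgcnt) {
    return ((bgcnt >> 8) & 0x1Fu) * 0x800u
         + ((dispcnt >> 27) & 7u) * 0x10000u;
}
```

The returned values are offsets relative to the start of the engine's BG VRAM (6000000h for Engine A, 6200000h for Engine B).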

For BGxCNT.Bit7 and BGxCNT.Bit2 in Extended Affine modes, see above BG Mode description (extended affine doesn't include 16-color modes, so color depth bit can be used for mode selection. Also, bitmap modes do not use charbase, so charbase.0 can be used for mode selection as well).

for BG0, BG1 only: bit13 selects extended palette slot
                   (BG0: 0=Slot0, 1=Slot2, BG1: 0=Slot1, 1=Slot3)

Direct Color Bitmap BG, and Direct Color Bitmap OBJ
BG/OBJ Supports 32K colors (15bit RGB value) - so far same as GBAs BG.
However, the upper bit (Bit15) is used as Alpha flag. That is, Alpha=0=Transparent, Alpha=1=Normal (ie. on the NDS, Direct Color values 0..7FFFh are NOT displayed).
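
For illustration, a direct color halfword with the Alpha flag could be composed like this (hypothetical helper; the 5bit R,G,B field positions are the same as in the GBA color format):

```c
#include <stdint.h>

/* Pack 5bit R,G,B (0..31 each) into a direct color halfword.
   visible != 0 sets Bit15 (Alpha=1=Normal), visible == 0 leaves it
   cleared (Alpha=0=Transparent). */
static uint16_t nds_direct_color(unsigned r, unsigned g, unsigned b, int visible) {
    uint16_t c = (uint16_t)((r & 31u) | ((g & 31u) << 5) | ((b & 31u) << 10));
    return visible ? (uint16_t)(c | 0x8000u) : c;
}
```

Bitmap data that was prepared for GBA (with Bit15 cleared) would thus need Bit15 set in each pixel to become visible on the NDS.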

Unlike GBA bitmap modes, NDS bitmap modes are supporting the Area Overflow bit (BG2CNT and BG3CNT, Bit 13).

  DS Video OBJs

DS OBJ Priority
The GBA has been assigning OBJ priority in respect to the 7bit OAM entry number, regardless of the OBJs 2bit BG-priority attribute (which allowed to specify invalid priority orders). That problem has been fixed in DS mode by combining the above two values into a 9bit priority value.

OBJ Tile Mapping (DISPCNT.4,20-21):
  Bit4  Bit20-21  Dimension Boundary Total ;Notes
  0     x         2D        32       32K   ;Same as GBA 2D Mapping
  1     0         1D        32       32K   ;Same as GBA 1D Mapping
  1     1         1D        64       64K
  1     2         1D        128      128K
  1     3         1D        256      256K  ;Engine B: 128K max
TileVramAddress = TileNumber * BoundaryValue
Even if the boundary gets changed, OBJs are kept composed of 8x8 tiles.
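
The tile mapping table can be expressed in code roughly as below. The names are made up for this sketch; the 10bit tile number range (0..3FFh) is assumed from the OBJ attribute layout.

```c
#include <stdint.h>

/* Boundary in bytes: always 32 in 2D mapping (DISPCNT.4=0),
   else 32 << DISPCNT.20-21 (32, 64, 128, 256). */
static uint32_t obj_tile_boundary(uint32_t dispcnt) {
    if (((dispcnt >> 4) & 1u) == 0u) return 32u;
    return 32u << ((dispcnt >> 20) & 3u);
}

/* TileVramAddress = TileNumber * BoundaryValue (offset in OBJ VRAM). */
static uint32_t obj_tile_vram_offset(uint32_t dispcnt, unsigned tile_number) {
    return (tile_number & 0x3FFu) * obj_tile_boundary(dispcnt);
}
```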

Bitmap OBJ Mapping (DISPCNT.6,5,22):
Bitmap OBJs are 15bit Direct Color data, plus 1bit Alpha flag (in bit15).
  Bit6 Bit5 Bit22 Dimension    Boundary   Total ;Notes
  0    0    x     2D/128 dots  8x8 dots   128K  ;Source Bitmap width 128 dots
  0    1    x     2D/256 dots  8x8 dots   128K  ;Source Bitmap width 256 dots
  1    0    0     1D           128 bytes  128K  ;Source Width = Target Width
  1    0    1     1D           256 bytes  256K  ;Engine A only
  1    1    x     Reserved
In 1D mapping mode, the Tile Number is simply multiplied by the boundary value.
  1D_BitmapVramAddress = TileNumber(0..3FFh) * BoundaryValue(128..256)
  2D_BitmapVramAddress = (TileNo AND MaskX)*10h + (TileNo AND NOT MaskX)*80h
In 2D mode, the Tile Number is split into X and Y indices, the X index is located in the LSBs (ie. MaskX=0Fh, or MaskX=1Fh, depending on DISPCNT.5).
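
A small sketch of the two bitmap address formulas (function and parameter names are hypothetical; the flags correspond to DISPCNT.22 and DISPCNT.5 respectively):

```c
#include <stdint.h>

/* 1D mapping: TileNumber * Boundary (128 or 256 bytes, see DISPCNT.22). */
static uint32_t obj_bitmap_offset_1d(unsigned tile_no, int boundary_256) {
    return (tile_no & 0x3FFu) * (boundary_256 ? 256u : 128u);
}

/* 2D mapping: X index in the LSBs of the tile number.
   MaskX=0Fh for 128 dot wide source bitmaps, 1Fh for 256 dots (DISPCNT.5). */
static uint32_t obj_bitmap_offset_2d(unsigned tile_no, int width_256) {
    unsigned mask_x = width_256 ? 0x1Fu : 0x0Fu;
    tile_no &= 0x3FFu;
    return (tile_no & mask_x) * 0x10u + (tile_no & ~mask_x) * 0x80u;
}
```

Eg. tile number 11h is X=1,Y=1 in a 128 dot wide bitmap (offset 810h), but X=11h,Y=0 in a 256 dot wide bitmap (offset 110h).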

OBJ Attribute 0 and 2
Setting the OBJ Mode bits (Attr 0, Bit10-11) to a value of 3 has been prohibited in GBA, however, in NDS it selects the new Bitmap OBJ mode; in that mode, the Color depth bit (Attr 0, Bit13) should be set to zero; also in that mode, the color bits (Attr 2, Bit 12-15) are used as Alpha-OAM value (instead of as palette setting).

OBJ Vertical Wrap
On the GBA, a large OBJ (with 64pix height, scaled into double-size region of 128pix height) located near the bottom of the screen has been wrapped to the top of the screen (and was NOT displayed at the bottom of the screen).
This problem has been "corrected" in the NDS (except in GBA mode), that is, on the NDS, the OBJ appears BOTH at the top and bottom of the screen. That isn't necessarily better - the advantage is that one can manually enable/disable the OBJ in the desired screen-half on IRQ level; that'd be required only if the wrapped portion is non-transparent.

  DS Video Extended Palettes

Extended Palettes
When allocating extended palettes, the allocated memory is not mapped to the CPU bus, so the CPU can access extended palette only when temporarily de-allocating it.

Color 0 of all standard/extended palettes is transparent, color 0 of BG standard palette 0 is used as backdrop. Extended palette memory must be allocated to VRAM.

BG Extended Palette enabled in DISPCNT Bit 30, when enabled,
 standard palette --> 16-color tiles (with 16bit bgmap entries) (text)
                      256-color tiles (with 8bit bgmap entries) (rot/scal)
                      256-color bitmaps
                      backdrop-color (color 0)
 extended palette --> 256-color tiles (with 16bit bgmap entries)(text,rot/scal)
Allocated VRAM is split into 4 slots of 8K each (32K used in total), normally BG0..3 are using Slot 0..3, however BG0 and BG1 can be optionally changed to BG0=Slot2, and BG1=Slot3 via BG0CNT and BG1CNT.

OBJ Extended Palette enabled in DISPCNT Bit 31, when enabled,
 16 colors x 16 palettes --> standard palette memory (=256 colors)
 256 colors x 16 palettes --> extended palette memory (=4096 colors)
Extended OBJ palette memory must be allocated to VRAM F, G, or I (which are 16K) of which only the first 8K are used for extended palettes (=1000h 16bit entries).

  DS Video Capture and Main Memory Display Mode

4000064h - NDS9 - DISPCAPCNT - 32bit - Display Capture Control Register (R/W)
Capture is supported for Display Engine A only.
  0-4   EVA               (0..16 = Blending Factor for Source A)
  5-7   Not used
  8-12  EVB               (0..16 = Blending Factor for Source B)
  13-15 Not used
  16-17 VRAM Write Block  (0..3 = VRAM A..D) (VRAM must be allocated to LCDC)
  18-19 VRAM Write Offset (0=00000h, 1=08000h, 2=10000h, 3=18000h)
  20-21 Capture Size      (0=128x128, 1=256x64, 2=256x128, 3=256x192 dots)
  22-23 Not used
  24    Source A          (0=Graphics Screen BG+3D+OBJ, 1=3D Screen)
  25    Source B          (0=VRAM, 1=Main Memory Display FIFO)
  26-27 VRAM Read Offset  (0=00000h, 1=08000h, 2=10000h, 3=18000h)
  28    Not used
  29-30 Capture Source    (0=Source A, 1=Source B, 2/3=Sources A+B blended)
  31    Capture Enable    (0=Disable/Ready, 1=Enable/Busy)
VRAM Read Block (VRAM A..D) is selected in DISPCNT Bits 18-19.
VRAM Read Block can be (or must be ?) allocated to LCDC (MST=0).
VRAM Read Offset is ignored (zero) in VRAM Display Mode (DISPCNT.16-17).
VRAM Read/Write Offsets wrap to 00000h when exceeding 1FFFFh (max 128K).
Capture Sizes less than 256x192 capture the upper-left portion of the screen.
Blending factors EVA and EVB are used only if "Source A+B blended" selected.
After setting the Capture Enable bit, capture starts at next line 0, and the capture enable/busy bit is then automatically cleared (in line 192, regardless of the capture size).
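
As an illustration of the bit layout, a DISPCAPCNT value might be assembled like this. The struct, its member names, and the function name are made up for this sketch; the offset fields are taken as 2bit values that select 8000h steps.

```c
#include <stdint.h>

/* Field values for DISPCAPCNT (illustrative names, not official ones). */
typedef struct {
    unsigned eva, evb;       /* blending factors, 0..16 */
    unsigned write_block;    /* 0..3 = VRAM A..D */
    unsigned write_offset;   /* 0..3 = 00000h..18000h in 8000h steps */
    unsigned size;           /* 0=128x128, 1=256x64, 2=256x128, 3=256x192 */
    unsigned source_a;       /* 0=Graphics Screen BG+3D+OBJ, 1=3D Screen */
    unsigned source_b;       /* 0=VRAM, 1=Main Memory Display FIFO */
    unsigned read_offset;    /* 0..3, same steps as write_offset */
    unsigned capture_source; /* 0=Source A, 1=Source B, 2/3=A+B blended */
} dispcap_cfg;

/* Build the 32bit register value, with Capture Enable (bit 31) set. */
static uint32_t dispcapcnt_value(const dispcap_cfg *c) {
    return  (uint32_t)(c->eva & 0x1Fu)
         | ((uint32_t)(c->evb & 0x1Fu) << 8)
         | ((uint32_t)(c->write_block & 3u) << 16)
         | ((uint32_t)(c->write_offset & 3u) << 18)
         | ((uint32_t)(c->size & 3u) << 20)
         | ((uint32_t)(c->source_a & 1u) << 24)
         | ((uint32_t)(c->source_b & 1u) << 25)
         | ((uint32_t)(c->read_offset & 3u) << 26)
         | ((uint32_t)(c->capture_source & 3u) << 29)
         | (1u << 31);
}
```

Eg. capturing the 3D Screen at 256x192 into VRAM B gives 81310000h. On real hardware the value would be written to Port 4000064h, and bit 31 polled until it gets cleared.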

Capture data is 15bit color depth (even when capturing 18bit 3D-images).
Capture A: Dest_Intensity = SrcA_Intensity ; Dest_Alpha=SrcA_Alpha.
Capture B: Dest_Intensity = SrcB_Intensity ; Dest_Alpha=SrcB_Alpha.
Capture A+B (blending):
 Dest_Intensity = (  (SrcA_Intensity * SrcA_Alpha * EVA)
                   + (SrcB_Intensity * SrcB_Alpha * EVB) ) / 16
 Dest_Alpha = (SrcA_Alpha AND (EVA>0)) OR (SrcB_Alpha AND (EVB>0))
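
The blending formulas can be checked on the host with something like the sketch below. The function names are hypothetical, intensities are taken as 5bit values (0..31) since capture data is 15bit, and the clamping to 31 is an assumption (the formula above doesn't say what happens when the sum overflows).

```c
/* Blend one 5bit color component of Source A and B.
   Alpha values are 0 or 1, EVA/EVB are expected to be 0..16. */
static unsigned capture_blend_intensity(unsigned a, unsigned a_alpha, unsigned eva,
                                        unsigned b, unsigned b_alpha, unsigned evb) {
    unsigned v = (a * a_alpha * eva + b * b_alpha * evb) / 16u;
    return v > 31u ? 31u : v;  /* clamping is an assumption */
}

/* Resulting Alpha flag (0 or 1) of the captured pixel. */
static unsigned capture_blend_alpha(unsigned a_alpha, unsigned eva,
                                    unsigned b_alpha, unsigned evb) {
    return (a_alpha && eva > 0u) || (b_alpha && evb > 0u);
}
```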

Capture provides a couple of interesting effects.
For example, 3D Engine output can be captured via source A (to LCDC-allocated VRAM), in the next frame, either Graphics Engine A or B can display the captured 3D image in VRAM as BG2, BG3, or OBJ (from BG/OBJ-allocated VRAM); this method requires to switch between LCDC- and BG/OBJ-allocation.
Another example would be to capture Engine A output, the captured image can be displayed (via VRAM Display mode) in the following frames, simultaneously the new Engine A output can be captured, blended with the old captured image; in that mode moved objects will leave traces on the screen; this method works with a single LCDC-allocated VRAM block.
DS Video Display System Block Diagram

4000068h - NDS9 - DISP_MMEM_FIFO - 32bit - Main Memory Display FIFO (R?/W)
Intended to send 256x192 pixel 32K color bitmaps by DMA directly
 - to Screen A             (set DISPCNT to Main Memory Display mode), or
 - to Display Capture unit (set DISPCAPCNT to Main Memory Source).
The FIFO can receive 4 words (8 pixels) at a time, each pixel is a 15bit RGB value (the upper bit, bit15, is unused).
Set DMA to Main Memory mode, 32bit transfer width, word count set to 4, destination address to DISP_MMEM_FIFO, source address must be in Main Memory.
Transfer starts at next frame.
Main Memory Display/Capture is supported for Display Engine A only.

  DS Video Display System Block Diagram
             _____________               __________
  VRAM A -->| 2D Graphics |--------OBJ->|          |
  VRAM B -->| Engine A    |--------BG3->| Layering |
  VRAM C -->|             |--------BG2->| and      |
  VRAM D -->|             |--------BG1->| Special  |
  VRAM E -->|             |   ___       | Effects  |
  VRAM F -->|             |->|SEL|      |          |          ______
  VRAM G -->| - - - - - - |  |BG0|-BG0->|          |----+--->|      |
            | 3D Graphics |->|___|      |__________|    |    |Select|
            | Engine      |                             |    |Video |
            |_____________|--------3D----------------+  |    |Input |
             _______      _______              ___   |  |    |      |
            |       |    |       |<-----------|SEL|<-+  |    |and   |-->
            |       |    |       |    _____   |A  |     |    |      |
  VRAM A <--|Select |    |Select |   |     |<-|___|<----+    |Master|
  VRAM B <--|Capture|<---|Capture|<--|Blend|   ___           |Bright|
  VRAM C <--|Dest.  |    |Source |   |_____|<-|SEL|<----+    |A     |
  VRAM D <--|       |    |       |            |B  |     |    |      |
            |_______|    |_______|<-----------|___|<-+  |    |      |
             _______                                 |  |    |      |
  VRAM A -->|Select |                                |  |    |      |
  VRAM B -->|Display|--------------------------------+------>|      |
  VRAM C -->|VRAM   |                                   |    |      |
  VRAM D -->|_______|   _____________                   |    |      |
                       |Main Memory  |                  |    |      |
  Main   ------DMA---->|Display FIFO |------------------+--->|______|
  Memory               |_____________|
             _____________               __________           ______
  VRAM C -->| 2D Graphics |--------OBJ->| Layering |         |      |
  VRAM D -->| Engine B    |--------BG3->| and      |         |Master|
  VRAM H -->|             |--------BG2->| Special  |-------->|Bright|-->
  VRAM I -->|             |--------BG1->| Effects  |         |B     |
            |_____________|--------BG0->|__________|         |______|

  DS 3D Video

DS 3D Overview
DS 3D I/O Map
DS 3D Display Control
DS 3D Geometry Commands
DS 3D Matrix Load/Multiply
DS 3D Matrix Types
DS 3D Matrix Stack
DS 3D Matrix Examples (Projection)
DS 3D Matrix Examples (Rotate/Scale/Translate)
DS 3D Matrix Examples (Maths Basics)
DS 3D Polygon Attributes
DS 3D Polygon Definitions by Vertices
DS 3D Polygon Light Parameters
DS 3D Shadow Polygons
DS 3D Texture Attributes
DS 3D Texture Formats
DS 3D Texture Coordinates
DS 3D Texture Blending
DS 3D Toon, Edge, Fog, Alpha-Blending, Anti-Aliasing
DS 3D Status
DS 3D Tests
DS 3D Rear-Plane
DS 3D Final 2D Output

3D is more or less (about 92%) understood and described.

  DS 3D Overview

The NDS 3D hardware consists of a Geometry Engine, and a Rendering Engine.

Geometry Engine (Precalculate coordinates & assign polygon attributes)
Geometry commands can be sent via Ports 4000440h and up (or alternately, written directly to Port 4000400h).
The commands include matrix and vector multiplications, the purpose is to rotate/scale/translate coordinates (vertices), the resulting coordinates are stored in Vertex RAM.
Moreover, it allows to assign attributes to the polygons and vertices, that includes vertex colors (or automatically calculated light colors), texture attributes, number of vertices per polygon (three or four), and a number of flags, these attributes are stored in Polygon RAM. Polygon RAM also contains pointers to the corresponding vertices in Vertex RAM.

Swap Buffers (Pass data from the Geometry Engine to the Rendering Engine)
The hardware includes two sets of Vertex/Polygon RAM, one used by the Geometry Engine, one by the Rendering Engine. The SwapBuffers command simply exchanges these buffers (so the new Geometry Data is passed to the Rendering Engine) (and the old buffer is emptied, so the Geometry engine can write new data to it). Additionally, the two parameter bits from the <previous> SwapBuffers command are copied to the Geometry Engine.
Data that is NOT swapped: SwapBuffers obviously can't swap Texture memory (so software must take care that Texture memory is kept mapped throughout rendering). Moreover, the rendering control registers (ports 4000060h, and 4000330h..40003BFh) are not swapped (so those values must be kept intact during rendering, too).

Rendering Engine (Display Output)
The Rendering Engine draws the various Polygons, and outputs them as BG0 layer to the 2D Video controller (which may then output them to the screen, or to the video capture unit). The Rendering part is done automatically by hardware, so the software has little influence on it.
Rendering is done scanline-by-scanline, so there's only a limited number of clock cycles per scanline, which is limiting the maximum number of polygons per scanline. However, due to the 48-line cache (see below), some scanlines are allowed to exceed that maximum.
Rendering starts 48 lines in advance (while still in the Vblank period) (and does then continue throughout the whole display period), the rendered data is written to a small cache that can hold up to 48 scanlines.

Scanline Cache vs Framebuffer
Note: There's only the 48-line cache (not a full 192-line framebuffer to store the whole rendered image). That is perfectly reasonable since animated data is normally drawn only once (so there would be no need to store it). That, assuming that the Geometry Engine presents new data every frame (otherwise, if the Geometry software is too slow, or if the image isn't animated, then the hardware is automatically rendering the same image again, and again).

  DS 3D I/O Map

3D I/O Map
  Address  Siz Name            Expl.
  Rendering Engine (per Frame settings)
  4000060h 2   DISP3DCNT       3D Display Control Register (R/W)
  4000320h 1   RDLINES_COUNT   Rendered Line Count Register (R)
  4000330h 10h EDGE_COLOR      Edge Colors 0..7 (W)
  4000340h 1   ALPHA_TEST_REF  Alpha-Test Comparison Value (W)
  4000350h 4   CLEAR_COLOR     Clear Color Attribute Register (W)
  4000354h 2   CLEAR_DEPTH     Clear Depth Register (W)
  4000356h 2   CLRIMAGE_OFFSET Rear-plane Bitmap Scroll Offsets (W)
  4000358h 4   FOG_COLOR       Fog Color (W)
  400035Ch 2   FOG_OFFSET      Fog Depth Offset (W)
  4000360h 20h FOG_TABLE       Fog Density Table, 32 entries (W)
  4000380h 40h TOON_TABLE      Toon Table, 32 colors (W)
  Geometry Engine (per Polygon/Vertex settings)
  4000400h 40h GXFIFO          Geometry Command FIFO (W)
  4000440h ... ...             Geometry Command Ports (see below)
  4000600h 4   GXSTAT          Geometry Engine Status Register (R and R/W)
  4000604h 4   RAM_COUNT       Polygon List & Vertex RAM Count Register (R)
  4000610h 2   DISP_1DOT_DEPTH 1-Dot Polygon Display Boundary Depth (W)
  4000620h 10h POS_RESULT      Position Test Results (R)
  4000630h 6   VEC_RESULT      Vector Test Results (R)
  4000640h 40h CLIPMTX_RESULT  Read Current Clip Coordinates Matrix (R)
  4000680h 24h VECMTX_RESULT   Read Current Directional Vector Matrix (R)

Geometry Commands (can be invoked by Port Address, or by Command ID)
Table shows Port Address, Command ID, Number of Parameters, and Clock Cycles.
  Address  Cmd Pa.Cy.
  N/A      00h -  -   NOP - No Operation (for padding packed GXFIFO commands)
  4000440h 10h 1  1   MTX_MODE - Set Matrix Mode (W)
  4000444h 11h -  17  MTX_PUSH - Push Current Matrix on Stack (W)
  4000448h 12h 1  36  MTX_POP - Pop Current Matrix from Stack (W)
  400044Ch 13h 1  17  MTX_STORE - Store Current Matrix on Stack (W)
  4000450h 14h 1  36  MTX_RESTORE - Restore Current Matrix from Stack (W)
  4000454h 15h -  19  MTX_IDENTITY - Load Unit Matrix to Current Matrix (W)
  4000458h 16h 16 34  MTX_LOAD_4x4 - Load 4x4 Matrix to Current Matrix (W)
  400045Ch 17h 12 30  MTX_LOAD_4x3 - Load 4x3 Matrix to Current Matrix (W)
  4000460h 18h 16 35* MTX_MULT_4x4 - Multiply Current Matrix by 4x4 Matrix (W)
  4000464h 19h 12 31* MTX_MULT_4x3 - Multiply Current Matrix by 4x3 Matrix (W)
  4000468h 1Ah 9  28* MTX_MULT_3x3 - Multiply Current Matrix by 3x3 Matrix (W)
  400046Ch 1Bh 3  22  MTX_SCALE - Multiply Current Matrix by Scale Matrix (W)
  4000470h 1Ch 3  22* MTX_TRANS - Mult. Curr. Matrix by Translation Matrix (W)
  4000480h 20h 1  1   COLOR - Directly Set Vertex Color (W)
  4000484h 21h 1  9*  NORMAL - Set Normal Vector (W)
  4000488h 22h 1  1   TEXCOORD - Set Texture Coordinates (W)
  400048Ch 23h 2  9   VTX_16 - Set Vertex XYZ Coordinates (W)
  4000490h 24h 1  8   VTX_10 - Set Vertex XYZ Coordinates (W)
  4000494h 25h 1  8   VTX_XY - Set Vertex XY Coordinates (W)
  4000498h 26h 1  8   VTX_XZ - Set Vertex XZ Coordinates (W)
  400049Ch 27h 1  8   VTX_YZ - Set Vertex YZ Coordinates (W)
  40004A0h 28h 1  8   VTX_DIFF - Set Relative Vertex Coordinates (W)
  40004A4h 29h 1  1   POLYGON_ATTR - Set Polygon Attributes (W)
  40004A8h 2Ah 1  1   TEXIMAGE_PARAM - Set Texture Parameters (W)
  40004ACh 2Bh 1  1   PLTT_BASE - Set Texture Palette Base Address (W)
  40004C0h 30h 1  4   DIF_AMB - MaterialColor0 - Diffuse/Ambient Reflect. (W)
  40004C4h 31h 1  4   SPE_EMI - MaterialColor1 - Specular Ref. & Emission (W)
  40004C8h 32h 1  6   LIGHT_VECTOR - Set Light's Directional Vector (W)
  40004CCh 33h 1  1   LIGHT_COLOR - Set Light Color (W)
  40004D0h 34h 32 32  SHININESS - Specular Reflection Shininess Table (W)
  4000500h 40h 1  1   BEGIN_VTXS - Start of Vertex List (W)
  4000504h 41h -  1   END_VTXS - End of Vertex List (W)
  4000540h 50h 1  392 SWAP_BUFFERS - Swap Rendering Engine Buffer (W)
  4000580h 60h 1  1   VIEWPORT - Set Viewport (W)
  40005C0h 70h 3  103 BOX_TEST - Test if Cuboid Sits inside View Volume (W)
  40005C4h 71h 2  9   POS_TEST - Set Position Coordinates for Test (W)
  40005C8h 72h 1  5   VEC_TEST - Set Directional Vector for Test (W)
All cycle timings are counted in 33.51MHz units. NORMAL commands take 9..12 cycles, depending on the number of enabled lights in PolyAttr (Huh, 9..12 (four timings) cycles for 0..4 (five settings) lights?) Total execution time of SwapBuffers is Duration until VBlank, plus 392 cycles.
In MTX_MODE=2 (Simultaneous Set), MTX_MULT/TRANS take additional 30 cycles.
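
Judging from the table above, the port address of each command appears to be 4000400h plus four times the Command ID. A one-line sketch of that relation (the function name is hypothetical, and the relation is only derived from the listed values):

```c
#include <stdint.h>

/* Port address for a Geometry Command ID (10h..72h),
   eg. 10h -> 4000440h, 50h -> 4000540h. */
static uint32_t gx_cmd_port(unsigned cmd_id) {
    return 0x04000400u + (cmd_id & 0xFFu) * 4u;
}
```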

  DS 3D Display Control

4000060h - DISP3DCNT - 3D Display Control Register (R/W)
  0     Texture Mapping      (0=Disable, 1=Enable)
  1     PolygonAttr Shading  (0=Toon Shading, 1=Highlight Shading)
  2     Alpha-Test           (0=Disable, 1=Enable) (see ALPHA_TEST_REF)
  3     Alpha-Blending       (0=Disable, 1=Enable) (see various Alpha values)
  4     Anti-Aliasing        (0=Disable, 1=Enable)
  5     Edge-Marking         (0=Disable, 1=Enable) (see EDGE_COLOR)
  6     Fog Color/Alpha Mode (0=Alpha and Color, 1=Only Alpha) (see FOG_COLOR)
  7     Fog Master Enable    (0=Disable, 1=Enable)
  8-11  Fog Depth Shift      (FOG_STEP=400h shr FOG_SHIFT) (see FOG_OFFSET)
  12    Color Buffer RDLINES Underflow (0=None, 1=Underflow/Acknowledge)
  13    Polygon/Vertex RAM Overflow    (0=None, 1=Overflow/Acknowledge)
  14    Rear-Plane Mode                (0=Blank, 1=Bitmap)
  15-31 Not used
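
A few of the fields above expressed as C definitions. This is just a sketch: the macro and function names are made up and don't come from any official header.

```c
#include <stdint.h>

/* Some DISP3DCNT flag bits (illustrative names). */
#define DISP3DCNT_TEXTURE     (1u << 0)
#define DISP3DCNT_ALPHA_TEST  (1u << 2)
#define DISP3DCNT_ALPHA_BLEND (1u << 3)
#define DISP3DCNT_ANTI_ALIAS  (1u << 4)
#define DISP3DCNT_FOG_ENABLE  (1u << 7)

/* FOG_STEP = 400h shr FOG_SHIFT, with FOG_SHIFT in DISP3DCNT bits 8-11. */
static uint32_t disp3dcnt_fog_step(uint32_t disp3dcnt) {
    return 0x400u >> ((disp3dcnt >> 8) & 0xFu);
}
```

Note that, with this formula, shift amounts above 10 would result in a step of zero.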

4000540h - Cmd 50h - SWAP_BUFFERS - Swap Rendering Engine Buffer (W)
SwapBuffers exchanges the two sets of Polygon/Vertex RAM buffers, that is, the newly defined polygons/vertices are passed to the rendering engine (and will be displayed in following frame(s)). The other buffer is emptied, and passed to the Geometry Engine (to be filled with new polygons/vertices by Geometry Commands).
  0     Translucent polygon Y-sorting (0=Auto-sort, 1=Manual-sort)
  1     Depth Buffering  (0=With Z-value, 1=With W-value)
        (mode 1 does not function properly with orthogonal projections)
  2-31  Not used
SwapBuffers isn't executed until next VBlank (Scanline 192) (the Geometry Engine is halted for that duration). SwapBuffers should not be issued within Begin/End. The two parameter bits of the SwapBuffers command are used for the following gxcommands (ie. not for the old gxcommands prior to SwapBuffers).
SwapBuffers does lock-up the 3D hardware if an incomplete polygon list has been defined (eg. a triangle with only 2 vertices). On lock-up, only 2D video is kept working, any wait-loops for GXSTAT.27 will hang the program. Once lock-up has occurred, there seems to be no way to recover by software, not by sending the missing vertic(es), and not even by pulsing POWCNT1.Bit2-3.

4000580h - Cmd 60h - VIEWPORT - Set Viewport (W)
  0-7   Screen/BG0 Coordinate X1 (0..255) (For Fullscreen: 0=Left-most)
  8-15  Screen/BG0 Coordinate Y1 (0..191) (For Fullscreen: 0=Bottom-most)
  16-23 Screen/BG0 Coordinate X2 (0..255) (For Fullscreen: 255=Right-most)
  24-31 Screen/BG0 Coordinate Y2 (0..191) (For Fullscreen: 191=Top-most)
Coordinate 0,0 is the lower-left (unlike for 2D where it'd be upper-left).
The 3D view-volume (size as defined by the Projection Matrix) is automatically scaled to fit into the Viewport area. Although polygon vertices are clipped to the view-volume, some vertices may still exceed the X2,Y1 (lower-right) boundary by one pixel, due to some sort of rounding errors. The Viewport settings don't affect the size or position of the 3D Rear-Plane. Viewport should not be issued within Begin/End.
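
As a rough illustration of the bit layout above, the following Python sketch packs the four coordinates into the 32bit VIEWPORT parameter. The function name viewport_param is hypothetical (it is not part of any SDK), and the range checks simply mirror the 0..255 and 0..191 limits listed above.

```python
def viewport_param(x1, y1, x2, y2):
    """Pack VIEWPORT coordinates (Cmd 60h) into one 32bit parameter word."""
    if not (0 <= x1 <= 255 and 0 <= x2 <= 255):
        raise ValueError("X coordinates must be in range 0..255")
    if not (0 <= y1 <= 191 and 0 <= y2 <= 191):
        raise ValueError("Y coordinates must be in range 0..191")
    # Bit 0-7 = X1, Bit 8-15 = Y1, Bit 16-23 = X2, Bit 24-31 = Y2
    return x1 | (y1 << 8) | (x2 << 16) | (y2 << 24)

# Fullscreen viewport: lower-left (0,0) to upper-right (255,191)
fullscreen = viewport_param(0, 0, 255, 191)
```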

4000610h - DISP_1DOT_DEPTH - 1-Dot Polygon Display Boundary Depth (W)
1-Dot Polygons are very small, or very distant polygons, which would be rendered as a single pixel on screen. Polygons with a depth value greater (more distant) than DISP_1DOT_DEPTH can be automatically hidden; in order to reduce memory consumption, or to reduce dirt on the screen.
  0-14  W-Coordinate (Unsigned, 12bit integer, 3bit fractional part)
  15-31 Not used                 (0000h=Closest, 7FFFh=Most Distant)
The DISP_1DOT_DEPTH comparison can be enabled/disabled per polygon (via POLYGON_ATTR.Bit13), so "important" polygons can be displayed regardless of their size and distance.
Note: The comparison is always using the W-coordinate of the vertex (not the Z-coordinate) (ie. no matter if using Z-buffering, or W-buffering). The polygon is rendered if at least one of its vertices has a w-coordinate less than or equal to DISP_1DOT_DEPTH. NB. despite checking the w-coords of ALL vertices, the polygon is rendered using the color/depth/texture of its FIRST vertex.
Note: The hardware does round-up the width and height of all polygons to at least 1, so polygons of 0x0, 1x0, 0x1, and 1x1 dots will be all rounded-up to a size of 1x1. Of which, the so-called "1dot" depth check is applied only to the 0x0 dot variant (so "0dot" depth check would be a better name for it).
Caution: Although DISP_1DOT_DEPTH is a Geometry Engine parameter, it is NOT routed through GXFIFO, ie. changes will take place immediately, and will affect all following polygons, including such that are still in GXFIFO. Workaround: ensure that GXFIFO is empty before changing this parameter.

4000340h - ALPHA_TEST_REF - Alpha-Test Comparison Value (W)
Alpha Test can be enabled in DISP3DCNT.Bit2. When enabled, pixels are rendered only if their Alpha value is GREATER than ALPHA_TEST_REF. Otherwise, when disabled, pixels are rendered only if their Alpha value is GREATER than zero. Alpha Test is performed on the final polygon pixels (ie. after texture blending).
  0-4   Alpha-Test Comparison Value (0..31) (Draw pixels if Alpha>AlphaRef)
  5-31  Not used
Value 00h is effectively the same as when Alpha Test is disabled. Value 1Fh hides all polygons, including opaque ones.

  DS 3D Geometry Commands

4000400h - GXFIFO - Geometry Command FIFO (W) (mirrored up to 400043Fh?)
Used to send packed commands, unpacked commands,
  0-7   First  Packed Command (or Unpacked Command)
  8-15  Second Packed Command (or 00h=None)
  16-23 Third  Packed Command (or 00h=None)
  24-31 Fourth Packed Command (or 00h=None)
and parameters,
  0-31  Parameter data for the previously sent (packed) command(s)
to the Geometry engine.

FIFO / PIPE Number of Entries
The FIFO has 256 entries, additionally, there is a PIPE with four entries (giving a total of 260 entries). If the FIFO is empty, and if the PIPE isn't full, then data is moved directly into the PIPE, otherwise it is moved into the FIFO. If the PIPE runs half empty (less than 3 entries) then 2 entries are moved from the FIFO to the PIPE. The state of the FIFO can be obtained in GXSTAT.Bit16-26, observe that there may be still data in the PIPE, even if the FIFO is empty. Check the busy flag in GXSTAT.Bit27 to see if the PIPE or FIFO contains data (or if a command is still executing).
Each PIPE/FIFO entry consists of 40bits of data (8bit command code, plus 32bit parameter value). Commands without parameters occupy 1 entry, and Commands with N parameters occupy N entries.

Sending Commands by Ports 4000440h..40005FFh
Geometry commands can be indirectly sent to the FIFO via ports 4000440h and up.
For a command with N parameters: issue N writes to the port.
For a command without parameters: issue one dummy-write to the port.
That mechanism puts the 8bit command + 32bit parameter into the FIFO/PIPE.
If the FIFO is full, then a wait is generated until data is removed from the FIFO, ie. the STR opcode gets frozen; during the wait, the bus cannot be used even by DMA, interrupts, or by the NDS7 CPU.

GXFIFO Access via DMA
Larger pre-calculated data blocks can be sent directly to the FIFO. This is usually done via DMA (use DMA in Geometry Command Mode, 32bit units, Dest=4000400h/fixed, Length=NumWords, Repeat=0). The timings are handled automatically, ie. the system should not freeze when the FIFO is full (see below Overkill note though). DMA starts when the FIFO becomes less than half full, the DMA does then write 112 words to the GXFIFO register (or less, if the remaining DMA transfer length reaches zero).

If desired, STR,STRD,STM opcodes can be used to write to the FIFO.
Opcodes that write more than one 32bit value (ie. STRD and STM) can be used to send ONE UNPACKED command, plus any parameters which belong to that command. After that, there must be a 1 cycle delay before sending the next command (ie. one cannot send more than one command at once with a single opcode, each command must be invoked by a new opcode). STRD and STM can be used because the GXFIFO register is mirrored to 4000400h..43Fh (16 words).
As with Ports 4000440h and up, the CPU gets stopped if (and as long as) the FIFO is full.

GXFIFO / Unpacked Commands
  - command1 (upper 24bit zero)
  - parameter(s) for command1 (if any)
  - command2 (upper 24bit zero)
  - parameter(s) for command2 (if any)
  - command3 (upper 24bit zero)
  - parameter(s) for command3 (if any)

GXFIFO / Packed Commands
  - command1,2,3,4 packed into one 32bit value (all bits used)
  - parameter(s) for command1 (if any)
  - parameter(s) for command2 (if any)
  - parameter(s) for command3 (if any)
  - parameter(s) for command4 (top-most packed command MUST have parameters)
  - command5,6 packed into one 32bit value (upper 16bit zero)
  - parameter(s) for command5 (if any)
  - parameter(s) for command6 (top-most packed command MUST have parameters)
  - command7,8,9 packed into one 32bit value (upper 8bit zero)
  - parameter(s) for command7 (if any)
  - parameter(s) for command8 (if any)
  - parameter(s) for command9 (top-most packed command MUST have parameters)
Packed commands are first decompressed and then stored in the command FIFO.
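
The packed layout above can be sketched in Python as follows. This is only an illustration under stated assumptions: pack_commands is a hypothetical helper, the caller supplies each command together with its parameter list, and the grouping strategy (shrink a group until its top-most command has parameters, sending a lone command unpacked) is just one conservative way to satisfy the rule listed above, not necessarily what any real tool does.

```python
def pack_commands(commands):
    """Build a GXFIFO word stream from a list of (cmd_byte, [params]) tuples."""
    words = []
    i = 0
    while i < len(commands):
        group = commands[i:i + 4]
        # The top-most packed command must have parameters, so drop trailing
        # parameterless commands (a single remaining command is sent unpacked).
        while len(group) > 1 and not group[-1][1]:
            group = group[:-1]
        header = 0
        for slot, (cmd, _) in enumerate(group):
            header |= (cmd & 0xFF) << (8 * slot)
        words.append(header)
        # Parameters follow the header in command order.
        for _, params in group:
            words.extend(p & 0xFFFFFFFF for p in params)
        i += len(group)
    return words

# MTX_MODE(1), MTX_IDENTITY, MTX_TRANS(1.0, 0, 0) packed into one header word
stream = pack_commands([(0x10, [1]), (0x15, []), (0x1C, [0x1000, 0, 0])])
```

Note that the greedy grouping may emit more header words than strictly necessary; it trades density for simplicity.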

GXFIFO DMA Overkill on Packed Commands Without Parameters
Normally, the 112 word limit ensures that the FIFO (256 entries) doesn't get full, however, this limit is much too high for sending a lot of "Packed Commands Without Parameters" (ie. PUSH, IDENTITY, or END) - eg. sending 112 x Packed(00151515h) to GXFIFO would write 336 x Cmd(15h) to the FIFO, which is causing the FIFO to get full, and which is causing the DMA (and CPU) to be paused (for several seconds, in WORST case) until enough FIFO commands have been processed to allow the DMA to finish the 112 word transfer.
Not sure if there's much chance to get Overkills in practice. Normally most commands DO have parameters, and so, usually even LESS than 112 FIFO entries are occupied (since 8bit commands with 32bit parameters are merged into single 40bit FIFO entries).
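
The arithmetic behind the Overkill case can be checked with a few lines of Python. The names below are hypothetical; the sketch only counts FIFO entries for header words whose commands all lack parameters (each such command occupies one entry), and it ignores the four PIPE entries and any commands drained during the transfer.

```python
FIFO_SIZE = 256        # FIFO entries (the 4-entry PIPE is not counted here)
DMA_BURST_WORDS = 112  # words written per DMA burst

def entries_for_parameterless_burst(packed_word, words=DMA_BURST_WORDS):
    """FIFO entries produced by repeating one packed, parameterless word."""
    cmds = [(packed_word >> shift) & 0xFF for shift in (0, 8, 16, 24)]
    per_word = sum(1 for c in cmds if c != 0)  # 00h means "no command"
    return words * per_word

needed = entries_for_parameterless_burst(0x00151515)  # 3 x MTX_IDENTITY
overflows = needed > FIFO_SIZE
```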

Invalid GX commands
Invalid commands (anything other than 10h..1Ch, 20h..2Bh, 30h..33h, 40h..41h, 50h, 60h, or 70h..72h) seem to be simply ignored by the hardware (at least, testing has confirmed that they do not fetch any parameters from the gxfifo).

  DS 3D Matrix Load/Multiply

4000440h - Cmd 10h - MTX_MODE - Set Matrix Mode (W)
  0-1   Matrix Mode (0..3)
         0  Projection Matrix
         1  Position Matrix (aka Modelview Matrix)
         2  Position & Vector Simultaneous Set mode (used for Light+VEC_TEST)
         3  Texture Matrix (see DS 3D Texture Coordinates chapter)
  2-31  Not used
Selects the current Matrix, all following MTX commands (load, multiply, push, pop, etc.) are applied to that matrix. In Mode 2, all MTX commands are applied to both the Position and Vector matrices (except for MTX_SCALE which doesn't change the Vector Matrix, even in Mode 2).

4000454h - Cmd 15h - MTX_IDENTITY - Load Unit Matrix to Current Matrix (W)
Sets C=I. Parameters: None
The Identity Matrix (I), aka Unit Matrix, consists of all zeroes, with a diagonal row of ones. A matrix multiplied by the Unit Matrix is left unchanged.

4000458h - Cmd 16h - MTX_LOAD_4x4 - Load 4x4 Matrix to Current Matrix (W)
Sets C=M. Parameters: 16, m[0..15]

400045Ch - Cmd 17h - MTX_LOAD_4x3 - Load 4x3 Matrix to Current Matrix (W)
Sets C=M. Parameters: 12, m[0..11]

4000460h - Cmd 18h - MTX_MULT_4x4 - Multiply Current Matrix by 4x4 Matrix (W)
Sets C=M*C. Parameters: 16, m[0..15]

4000464h - Cmd 19h - MTX_MULT_4x3 - Multiply Current Matrix by 4x3 Matrix (W)
Sets C=M*C. Parameters: 12, m[0..11]

4000468h - Cmd 1Ah - MTX_MULT_3x3 - Multiply Current Matrix by 3x3 Matrix (W)
Sets C=M*C. Parameters: 9, m[0..8]

400046Ch - Cmd 1Bh - MTX_SCALE - Multiply Current Matrix by Scale Matrix (W)
Sets C=M*C. Parameters: 3, m[0..2] (MTX_SCALE doesn't change Vector Matrix)

4000470h - Cmd 1Ch - MTX_TRANS - Mult. Curr. Matrix by Translation Matrix (W)
Sets C=M*C. Parameters: 3, m[0..2] (x,y,z position)

4000640h..67Fh - CLIPMTX_RESULT - Read Current Clip Coordinates Matrix (R)
This 64-byte region (16 words) contains the m[0..15] values of the Current Clip Coordinates Matrix, arranged in 4x4 Matrix format. Make sure that the Geometry Engine is stopped (GXSTAT.27) before reading from these registers.
The Clip Matrix is internally used to convert vertices to screen coordinates, and is internally re-calculated anytime when changing the Position or Projection matrices:
  ClipMatrix = PositionMatrix * ProjectionMatrix
To read only the Position Matrix, or only the Projection Matrix: Use Load Identity on the OTHER matrix, so the ClipMatrix becomes equal to the DESIRED matrix (multiplied by the Identity Matrix, which has no effect on the result).

4000680h..6A3h - VECMTX_RESULT - Read Current Directional Vector Matrix (R)
This 36-byte region (9 words) contains the m[0..8] values of the Current Directional Vector Matrix, arranged in 3x3 Matrix format (the fourth row/column may contain any values).
Make sure that the Geometry Engine is stopped (GXSTAT.27) before reading from these registers.

  DS 3D Matrix Types

Essentially, all matrices in the NDS are 4x4 Matrices, consisting of 16 values, m[0..15]. Each element is a signed fixed-point 32bit number, with a fractional part in the lower 12bits.
The other Matrix Types are used to reduce the number of parameters being transferred, for example, 3x3 Matrix requires only nine parameters, the other seven elements are automatically set to 0 or 1.0 (whereas "1.0" means "1 SHL 12" in 12bit fixed-point notation).

   _      4x4 Matrix       _        _    Identity Matrix    _
  | m[0]  m[1]  m[2]  m[3]  |      |  1.0   0     0     0    |
  | m[4]  m[5]  m[6]  m[7]  |      |  0     1.0   0     0    |
  | m[8]  m[9]  m[10] m[11] |      |  0     0     1.0   0    |
  |_m[12] m[13] m[14] m[15]_|      |_ 0     0     0     1.0 _|

   _      4x3 Matrix       _        _  Translation Matrix   _
  | m[0]  m[1]  m[2]   0    |      |  1.0   0     0     0    |
  | m[3]  m[4]  m[5]   0    |      |  0     1.0   0     0    |
  | m[6]  m[7]  m[8]   0    |      |  0     0     1.0   0    |
  |_m[9]  m[10] m[11]  1.0 _|      |_m[0]  m[1]  m[2]   1.0 _|

   _      3x3 Matrix       _        _     Scale Matrix      _
  | m[0]  m[1]  m[2]   0    |      | m[0]   0     0     0    |
  | m[3]  m[4]  m[5]   0    |      |  0    m[1]   0     0    |
  | m[6]  m[7]  m[8]   0    |      |  0     0    m[2]   0    |
  |_ 0     0     0     1.0 _|      |_ 0     0     0     1.0 _|
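
The following Python sketch shows how the shorter parameter lists could be expanded to full 4x4 matrices, following the layouts drawn above. All function names are hypothetical, matrices are flat lists m[0..15] in the same order as the 4x4 diagram, and to_fixed uses ordinary rounding (which rounding the reader's own tools use is an assumption).

```python
ONE = 1 << 12  # "1.0" in 12bit fixed-point notation

def to_fixed(value):
    """Convert a float to signed fixed-point with 12bit fraction."""
    return int(round(value * ONE))

def expand_4x3(m):
    """12 parameters -> 16 elements (fourth column is 0,0,0,1.0)."""
    rows = [list(m[i:i + 3]) for i in range(0, 12, 3)]
    return rows[0] + [0] + rows[1] + [0] + rows[2] + [0] + rows[3] + [ONE]

def expand_3x3(m):
    """9 parameters -> 16 elements (fourth row and column are 0,0,0,1.0)."""
    return expand_4x3(list(m) + [0, 0, 0])

def expand_scale(m):
    """3 parameters -> diagonal Scale Matrix."""
    return [m[0], 0, 0, 0,  0, m[1], 0, 0,  0, 0, m[2], 0,  0, 0, 0, ONE]

def expand_translation(m):
    """3 parameters -> Translation Matrix (offsets in the bottom row)."""
    return [ONE, 0, 0, 0,  0, ONE, 0, 0,  0, 0, ONE, 0,  m[0], m[1], m[2], ONE]

IDENTITY = expand_scale([ONE, ONE, ONE])
```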

  DS 3D Matrix Stack

Matrix Stack
The NDS has three Matrix Stacks, and two Matrix Stack Pointers (the Coordinate Matrix stack pointer is also shared for Directional Matrix Stack).
  Matrix Stack        Valid Stack Area    Stack Pointer
  Projection Stack    0..0  (1 entry)     0..1  (1bit) (GXSTAT: 1bit)
  Coordinate Stack    0..30 (31 entries)  0..63 (6bit) (GXSTAT: 5bit only)
  Directional Stack   0..30 (31 entries)  (uses Coordinate Stack Pointer)
  Texture Stack       One..None?          0..1  (1bit) (GXSTAT: N/A)
The initial value of the Stack Pointers is zero, the current value of the pointers can be read from GXSTAT (read-only), that register does also indicate stack overflows (error flag gets set on read/write to invalid entries, ie. entries 1 or 1Fh..3Fh). For all stacks, the upper half (ie. 1 or 20h..3Fh) are mirrors of the lower half (ie. 0 or 0..1Fh).

4000444h - Cmd 11h - MTX_PUSH - Push Current Matrix on Stack (W)
Parameters: None. Sets [S]=C, and then S=S+1.

4000448h - Cmd 12h - MTX_POP - Pop Current Matrix from Stack (W)
Sets S=S-N, and then C=[S].
  Parameter Bit0-5:  Stack Offset (signed value, -30..+31) (usually +1)
  Parameter Bit6-31: Not used
Offset N=(+1) pops the most recently pushed value, larger offsets of N>1 will "deallocate" N values (and load the Nth value into C). Zero or negative values can be used to pop previously "deallocated" values.
The stack has only one level (at address 0) in projection mode, in that mode, the parameter value is ignored, the offset is always +1 in that mode.
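
The PUSH/POP rules for the Coordinate Stack can be modelled roughly as below. This is a toy model, not a description of the actual hardware: the class name is hypothetical, the 6bit offset is assumed to be plain two's complement, and the error flag and mirroring follow the Matrix Stack notes above (entries 1Fh..3Fh flag an error, upper half mirrors the lower half).

```python
class CoordinateStackModel:
    """Toy model of the Coordinate Matrix Stack (not the Projection stack)."""

    def __init__(self):
        self.entries = [None] * 32  # entry 31 exists, but flags an error
        self.sp = 0                 # 6bit stack pointer S
        self.error = False          # models the GXSTAT overflow flag
        self.current = None         # the current matrix C

    def _slot(self, index):
        index &= 0x3F
        if index >= 31:             # entries 1Fh..3Fh are invalid
            self.error = True
        return index & 0x1F         # upper half mirrors the lower half

    def push(self):
        """MTX_PUSH: [S]=C, and then S=S+1."""
        self.entries[self._slot(self.sp)] = self.current
        self.sp = (self.sp + 1) & 0x3F

    def pop(self, param):
        """MTX_POP: S=S-N, and then C=[S] (N = signed Bit0-5 of param)."""
        n = param & 0x3F
        if n & 0x20:                # assumed two's complement sign bit
            n -= 0x40
        self.sp = (self.sp - n) & 0x3F
        self.current = self.entries[self._slot(self.sp)]
```

Eg. after pushing matrices "A" and "B", pop(1) reloads "B", a second pop(1) reloads "A", and pop(3Fh) (ie. N=-1) moves back up to the previously "deallocated" entry "B".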

400044Ch - Cmd 13h - MTX_STORE - Store Current Matrix on Stack (W)
Sets [N]=C. The stack pointer S is not used, and is left unchanged.
  Parameter Bit0-4:  Stack Address (0..30) (31 causes overflow in GXSTAT.15)
  Parameter Bit5-31: Not used
The stack has only one level (at address 0) in projection mode, in that mode, the parameter value is ignored.

4000450h - Cmd 14h - MTX_RESTORE - Restore Current Matrix from Stack (W)
Sets C=[N]. The stack pointer S is not used, and is left unchanged.
  Parameter Bit0-4:  Stack Address (0..30) (31 causes overflow in GXSTAT.15)
  Parameter Bit5-31: Not used
The stack has only one level (at address 0) in projection mode, in that mode, the parameter value is ignored.

In Projection mode, the parameter for POP, STORE, and RESTORE is unused - not sure if the parameter (ie. a dummy value) is - or is not - to be written to the command FIFO?
There appear to be actually 32 entries in Coordinate & Directional Stacks, entry 31 appears to exist, and appears to be read/write-able (although the stack overflow flag gets set when accessing it).

  DS 3D Matrix Examples (Projection)

The most important matrix is the Projection Matrix (to be initialized with MTX_MODE=0 via MTX_LOAD_4x4 command). It does specify the dimensions of the view volume.

With Perspective Projections more distant objects will appear smaller, with Orthogonal Projections the size of the objects is always the same regardless of their distance.

  Perspective Projection     Orthogonal Projection
                   __                  __________
       top __..--''  |            top |          |
          |   view   |                |   view   |
  Eye ----|--------->|        Eye ----|--------->|
          |__volume  |                |  volume  |
     bottom  ''--..__|          bottom|__________|
        near        far             near        far

Correctly initializing the projection matrix (as shown in the examples below) can be quite difficult (mind that fixed point multiply/divide requires to adjust the fixed-point width before/after calculation). For beginners, it may be recommended to start with a simple Identity Matrix (MTX_IDENTITY command) used as Projection Matrix (ie. Ortho with t,b,l,r set to +/-1).

Orthogonal Projections (Ortho)
  | (2.0)/(r-l)       0             0            0     |
  |      0       (2.0)/(t-b)        0            0     |
  |      0            0        (2.0)/(n-f)       0     |
  | (l+r)/(l-r)  (b+t)/(b-t)   (n+f)/(n-f)      1.0    |
n,f specify the distance from eye to near and far clip planes. t,b,l,r are the coordinates of near clip plane (top,bottom,left,right). For a symmetrical view (ie. the straight-ahead view line centered in the middle of viewport) t,b,l,r should be usually t=+ysiz/2, b=-ysiz/2, r=+xsiz/2, l=-xsiz/2; the (xsiz/ysiz) ratio should be usually equal to the viewport's (width/height) ratio. Examples for an asymmetrical view would be b=0 (frog's view), or t=0 (bird's view).
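
A minimal Python sketch of the Ortho matrix above, producing the 16 values for MTX_LOAD_4x4 in 12bit fixed-point. The names ortho_matrix and fx are hypothetical, the inputs are ordinary floats, and the conversion rounds to nearest (an assumption; other tools may truncate).

```python
ONE = 1 << 12  # 1.0 in 12bit fixed-point

def fx(value):
    """Float to fixed-point with 12bit fraction (round to nearest)."""
    return int(round(value * ONE))

def ortho_matrix(l, r, b, t, n, f):
    """Orthogonal projection as flat m[0..15] list, row by row."""
    return [
        fx(2.0 / (r - l)), 0, 0, 0,
        0, fx(2.0 / (t - b)), 0, 0,
        0, 0, fx(2.0 / (n - f)), 0,
        fx((l + r) / (l - r)), fx((b + t) / (b - t)), fx((n + f) / (n - f)), ONE,
    ]

# t,b,l,r = +/-1 (and n=+1, f=-1 chosen so that the third row becomes unity)
unit = ortho_matrix(-1.0, 1.0, -1.0, 1.0, 1.0, -1.0)
```

With these particular values the result is the same as the Identity Matrix, which is a handy sanity check when debugging the fixed-point conversion.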

Left-Right Asymmetrical Perspective Projections (Frustum)
  | (2*n)/(r-l)       0             0            0     |
  |      0       (2*n)/(t-b)        0            0     |
  | (r+l)/(r-l)  (t+b)/(t-b)   (n+f)/(n-f)     -1.0    |
  |      0            0       (2*n*f)/(n-f)      0     |
n,f,t,b,l,r have same meanings as above (Ortho), the difference is that more distant objects will appear smaller with Perspective Projection (unlike Orthogonal Projection where the size isn't affected by the distance).
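
The Frustum matrix can be sketched the same way. As before, frustum_matrix and fx are hypothetical names, inputs are floats, and round-to-nearest conversion is an assumption.

```python
ONE = 1 << 12  # 1.0 in 12bit fixed-point

def fx(value):
    """Float to fixed-point with 12bit fraction (round to nearest)."""
    return int(round(value * ONE))

def frustum_matrix(l, r, b, t, n, f):
    """Perspective (Frustum) projection as flat m[0..15] list, row by row."""
    return [
        fx(2 * n / (r - l)), 0, 0, 0,
        0, fx(2 * n / (t - b)), 0, 0,
        fx((r + l) / (r - l)), fx((t + b) / (t - b)), fx((n + f) / (n - f)), -ONE,
        0, 0, fx(2 * n * f / (n - f)), 0,
    ]

# Symmetrical example: near plane 2x2 units at distance 1, far plane at 3
proj = frustum_matrix(-1.0, 1.0, -1.0, 1.0, 1.0, 3.0)
```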

Left-Right Symmetrical Perspective Projections (Perspective)
  | cos/(asp*sin)     0             0            0     |
  |      0         cos/sin          0            0     |
  |      0            0        (n+f)/(n-f)     -1.0    |
  |      0            0       (2*n*f)/(n-f)      0     |
Quite the same as above (Frustum), but with symmetrical t,b values (which are in this case obtained from a vertical view range specified in degrees; sin and cos are taken from half of that angle), and l,r are matched to the aspect ratio of the viewport (asp=width/height).

Moving the Camera
After initializing the Projection Matrix, you may multiply it with Rotate and/or Translation Matrices to change the camera's position and view direction.

  DS 3D Matrix Examples (Rotate/Scale/Translate)

Identity Matrix
The MTX_IDENTITY command can be used to initialize the Position Matrix before doing any Translation/Scaling/Rotation, for example:
  Load(Identity)                           ;no rotation/scaling used
  Load(Identity), Mul(Rotate), Mul(Scale)  ;rotation/scaling (not so efficient)
  Load(Rotate), Mul(Scale)                 ;rotation/scaling (more efficient)

Rotation Matrices
Rotation can be performed with MTX_MULT_3x3 command, simple examples are:
  Around X-Axis          Around Y-Axis          Around Z-Axis
  | 1.0  0     0   |     | cos   0    sin |     | cos   sin   0   |
  | 0    cos   sin |     | 0     1.0  0   |     | -sin  cos   0   |
  | 0    -sin  cos |     | -sin  0    cos |     | 0     0     1.0 |
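
The three rotation matrices above can be generated as MTX_MULT_3x3 parameter lists with a short Python sketch. The function name rotation_3x3 and the degrees-based interface are assumptions made for illustration; values are converted to 12bit fixed-point with round-to-nearest.

```python
import math

ONE = 1 << 12  # 1.0 in 12bit fixed-point

def rotation_3x3(axis, degrees):
    """Return m[0..8] for a rotation around the 'x', 'y' or 'z' axis."""
    c = int(round(math.cos(math.radians(degrees)) * ONE))
    s = int(round(math.sin(math.radians(degrees)) * ONE))
    if axis == "x":
        return [ONE, 0, 0,  0, c, s,  0, -s, c]
    if axis == "y":
        return [c, 0, s,  0, ONE, 0,  -s, 0, c]
    if axis == "z":
        return [c, s, 0,  -s, c, 0,  0, 0, ONE]
    raise ValueError("axis must be 'x', 'y' or 'z'")

quarter_turn = rotation_3x3("z", 90)
```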

Scale Matrix
The MTX_SCALE command allows adjusting the size of the polygon. The x,y,z parameters should normally all have the same value, x=y=z (unless you want to change only the height of the object, for example). Identical results can be obtained with MTX_MULT commands, however, when using lighting (MTX_MODE=2), then scaling should be done ONLY with MTX_SCALE (which keeps the length of the light's directional vector intact).

Translation Matrix
The MTX_TRANS command allows to move polygons to the desired position. The polygon VTX commands are spanning only a small range of coordinates (near zero-coordinate), so translation is required to move the polygons to other locations in the world coordinates. Aside from that, translation is useful for moved objects (at variable coordinates), and for re-using an object at various locations (eg. you can create a forest by translating a tree to different coordinates).

Matrix Multiply Order
The Matrix must be set up BEFORE sending the Vertices (which are then automatically multiplied by the matrix). When using multiple matrices multiplied with each other: Mind that, for matrix maths A*B is NOT the same as B*A. For example, if you combine Rotate and Translate Matrices, the object will be either rotated around its own zero-coordinate, or around world-space zero-coordinate, depending on the multiply order.

  DS 3D Matrix Examples (Maths Basics)

Below is a crash-course on matrix maths. Most of it is carried out automatically by the hardware. So this chapter is relevant only if you are interested in details about what happens inside of the 3D engine.

Matrix-by-Matrix Multiplication
Matrix multiplication, C = A * B, is possible only if the number of columns in A is equal to the number of rows in B, so it works fine with the 4x4 matrices which are used in the NDS. For the multiplication, assume matrix C to consist of elements cyx, and respectively, matrix A and B to consist of elements ayx and byx. So that C = A * B looks like:
  | c11 c12 c13 c14 |     | a11 a12 a13 a14 |     | b11 b12 b13 b14 |
  | c21 c22 c23 c24 |  =  | a21 a22 a23 a24 |  *  | b21 b22 b23 b24 |
  | c31 c32 c33 c34 |     | a31 a32 a33 a34 |     | b31 b32 b33 b34 |
  | c41 c42 c43 c44 |     | a41 a42 a43 a44 |     | b41 b42 b43 b44 |
Each element in C is calculated by multiplying the elements from one row in A by the elements from the corresponding column in B, and then taking the sum of the products, ie.
  cyx = ay1*b1x + ay2*b2x + ay3*b3x + ay4*b4x
In total, that requires 64 multiplications (four multiplications for each of the 16 cyx elements), and 48 additions (three per cyx element), the hardware carries out that operation at a relatively decent speed of 30..35 clock cycles, possibly by performing several multiplications simultaneously with separate multiply units.
Observe that for matrix multiplication, A*B is NOT the same as B*A.
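
The cyx formula, and the fact that the multiply order matters, can be tried out with a small Python sketch. Plain integers are used instead of fixed-point values to keep the numbers readable; mat_mul and the two sample matrices are hypothetical names chosen for this example.

```python
def mat_mul(a, b):
    """C = A * B for 4x4 matrices given as lists of 4 rows."""
    # cyx = ay1*b1x + ay2*b2x + ay3*b3x + ay4*b4x
    return [[sum(a[y][k] * b[k][x] for k in range(4)) for x in range(4)]
            for y in range(4)]

# Translation by 5 along the X-axis, and a 90 degree rotation around Z
translate = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [5, 0, 0, 1]]
rotate_z90 = [[0, 1, 0, 0], [-1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

tr = mat_mul(translate, rotate_z90)  # bottom row becomes 0,5,0,1
rt = mat_mul(rotate_z90, translate)  # bottom row stays   5,0,0,1
```

In the first product the translation offset itself gets rotated, in the second one it does not, which is the difference between rotating around the world origin and rotating around the object's own origin.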

Matrix-by-Vector & Vector-by-Matrix Multiplication
Vectors are Matrices with only one row, or only one column. Multiplication works as for normal matrices; the number of rows/columns must match up, respectively, row-vectors can be multiplied by matrices; and matrices can be multiplied by column-vectors (but not vice-versa). Eg. C = A * B:
                                                  | b11 b12 b13 b14 |
  | c11 c12 c13 c14 |  =  | a11 a12 a13 a14 |  *  | b21 b22 b23 b24 |
                                                  | b31 b32 b33 b34 |
                                                  | b41 b42 b43 b44 |
The formula for calculating the separate elements is same as above,
  cyx = ay1*b1x + ay2*b2x + ay3*b3x + ay4*b4x
Of which, C and A have only one y-index, so one may replace "cyx and ayx" by "c1x and a1x", or completely leave out the y-index, ie. "cx and ax".

Matrix-by-Number Multiplication
Simply multiply all elements of the Matrix by the number, C = A * n:
  cyx = ayx*n
Of course, works also with vectors (matrices with only one row/column).

Matrix-to-Matrix Addition/Subtraction
Both matrices must have the same number of rows & columns, add/subtract all elements with corresponding elements in other matrix, C = A +/- B:
  cyx = ayx +/- byx
Of course, works also with vectors (two matrices with only one row/column).

A vector, for example (x,y,z), consists of offsets along x-,y-, and z-axis. The line from origin to origin-plus-offset is having two characteristics: A direction, and a length.
The length (aka magnitude) can be calculated as L=sqrt(x^2+y^2+z^2).

Vector-by-Vector Multiplication
This can be processed as RowVector*ColumnVector, so the result is a number (aka scalar) (aka a matrix with only 1x1 elements). Multiplying two (normalized) vectors results in: "cos(angle)=vec1*vec2", ie. the cosine of the angle between the two vectors (eg. used for light vectors). Multiplying a vector with itself, and taking the square root of the result obtains its length, ie. "length=sqrt(vec^2)".
That stuff should be done with 3-dimensional vectors (not 4-dimensionals).

Normalized Vectors
Normalized Vectors (aka Unit Vectors) are vectors with length=1.0. To normalize a vector, divide its coordinates by its length, ie. x=x/L, y=y/L, z=z/L, the direction remains the same, but the length is now 1.0.
On the NDS, normalized vectors should have a length of something less than 1.0 (eg. something like 0.99) because several NDS registers are limited to 1bit sign, 0bit integer, Nbit fraction part (so vectors that are parallel to the x,y,z axes, or that become parallel to them after rotation, cannot have a length of 1.0).
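
The vector operations above (length, normalization, vector-by-vector multiply) and the "cannot be exactly 1.0" restriction can be sketched in Python as below. All names are hypothetical. The helper to_signed_fraction assumes a two's complement field with 1bit sign and frac_bits fraction bits, and simply clamps +1.0 down to the largest representable value; that clamping policy is an assumption, not documented hardware behaviour.

```python
import math

def dot(a, b):
    """Vector-by-vector multiplication (result is a scalar)."""
    return sum(x * y for x, y in zip(a, b))

def length(v):
    """L = sqrt(x^2 + y^2 + z^2)"""
    return math.sqrt(dot(v, v))

def normalize(v):
    """Divide each coordinate by the length, giving length 1.0."""
    l = length(v)
    return tuple(c / l for c in v)

def to_signed_fraction(value, frac_bits):
    """Encode as 1bit sign, 0bit integer, frac_bits fraction (clamped)."""
    limit = 1 << frac_bits
    raw = int(round(value * limit))
    raw = max(-limit, min(limit - 1, raw))  # +1.0 does not fit, so clamp
    return raw & ((limit << 1) - 1)
```

Eg. to_signed_fraction(1.0, 9) yields 1FFh (slightly less than 1.0) rather than overflowing into the sign bit.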

Fixed-Point Numbers
The NDS uses fixed-point numbers (rather than floating point numbers). Addition and Subtraction works as with normal integers, provided that the fractional part is the same for both numbers. If it is not the same: Shift-left the value with the smaller fractional part.
For multiplication, the fractional part of result is the sum of the fractional parts (eg. 12bit fraction * 12bit fraction = 24bit fraction; shift-right the result by 12 to convert it to a 12bit fraction). The NDS matrix multiply unit is maintaining the full 24bit fraction when processing the
  cyx = ay1*b1x + ay2*b2x + ay3*b3x + ay4*b4x
formula, ie. the three additions are using full 24bit fractions (with carry-outs to upper bits), the final result of the additions is then shifted-right by 12.
For division, it's vice versa, the fractions of the operands are subtracted, 24bit fraction / 12bit fraction = 12bit fraction. When dividing two 12bit numbers, shift-left the first number by 12 before division to get a result with 12bit fractional part.
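
The fixed-point rules above translate into Python roughly as follows. The helper names are hypothetical. Python's >> and // round towards negative infinity; whether the hardware rounds negative results the same way is not stated here, so treat that detail as an assumption.

```python
FRAC = 12  # number of fractional bits

def fx_mul(a, b):
    """12bit fraction * 12bit fraction = 24bit fraction, shifted back by 12."""
    return (a * b) >> FRAC

def fx_div(a, b):
    """Shift-left the first operand by 12, so the result keeps 12bit fraction."""
    return (a << FRAC) // b

def fx_row_times_column(row, column):
    """One cyx element: sum the products at 24bit fraction, then shift once."""
    acc = sum(a * b for a, b in zip(row, column))
    return acc >> FRAC

three = fx_mul(0x1800, 0x2000)        # 1.5 * 2.0
one_and_half = fx_div(0x3000, 0x2000) # 3.0 / 2.0
```

Summing at full precision and shifting only once (as in fx_row_times_column) loses less accuracy than shifting each product separately.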

Four-Dimensional Matrices
The NDS uses four-dimensional matrices and vectors, ie. matrices with 4x4 elements, and vectors with 4 elements. The first three elements are associated with the X,Y,Z-axes of the three-dimensional space. The fourth element is somewhat a "W-axis".
With 4-dimensional matrices, the Translate matrix can be used to move an object to another position. Ie. once you've set up a matrix (which may consist of pre-multiplied scaling, rotation, translation matrices), then that matrix can be used on vertices to perform the rotation, scaling, translation all-at-once; by a single Vector*Matrix operation.
With 3-dimensional matrices, translation would require a separate addition, additionally to the multiply operation.

  DS 3D Polygon Attributes

40004A4h - Cmd 29h - POLYGON_ATTR - Set Polygon Attributes (W)
  0-3   Light 0..3 Enable Flags (each bit: 0=Disable, 1=Enable)
  4-5   Polygon Mode  (0=Modulation,1=Decal,2=Toon/Highlight Shading,3=Shadow)
  6     Polygon Back Surface   (0=Hide, 1=Render)  ;Line-segments are always
  7     Polygon Front Surface  (0=Hide, 1=Render)  ;rendered (no front/back)
  8-10  Not used
  11    Depth-value for Translucent Pixels    (0=Keep Old, 1=Set New Depth)
  12    Far-plane intersecting polygons       (0=Hide, 1=Render/clipped)
  13    1-Dot polygons behind DISP_1DOT_DEPTH (0=Hide, 1=Render)
  14    Depth Test, Draw Pixels with Depth    (0=Less, 1=Equal) (usually 0)
  15    Fog Enable                            (0=Disable, 1=Enable)
  16-20 Alpha      (0=Wire-Frame, 1..30=Translucent, 31=Solid)
  21-23 Not used
  24-29 Polygon ID (00h..3Fh, used for translucent, shadow, and edge-marking)
  30-31 Not used
Writes to POLYGON_ATTR have no effect until next BEGIN_VTXS command.
Changes to the Light bits have no effect until lighting is re-calculated by Normal command. The interior of Wire-frame polygons is transparent (Alpha=0), and only the lines at the polygon edges are rendered, using a fixed Alpha value of 31.
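
The bit layout of POLYGON_ATTR can be captured in a small Python helper. The function name, argument names, and default values (front surface rendered, far-plane clipping enabled, solid alpha) are choices made for this sketch only.

```python
def polygon_attr(lights=0, mode=0, back=False, front=True, new_depth=False,
                 far_clip=True, one_dot=False, depth_equal=False, fog=False,
                 alpha=31, polygon_id=0):
    """Pack POLYGON_ATTR (Cmd 29h) fields into one 32bit parameter word."""
    if not (0 <= lights <= 0xF and 0 <= mode <= 3):
        raise ValueError("lights is 4bit, mode is 2bit")
    if not (0 <= alpha <= 31 and 0 <= polygon_id <= 0x3F):
        raise ValueError("alpha is 5bit, polygon_id is 6bit")
    return (lights                    # Bit 0-3   Light 0..3 Enable Flags
            | mode << 4               # Bit 4-5   Polygon Mode
            | int(back) << 6          # Bit 6     Back Surface
            | int(front) << 7         # Bit 7     Front Surface
            | int(new_depth) << 11    # Bit 11    Depth for Translucent Pixels
            | int(far_clip) << 12     # Bit 12    Far-plane intersecting
            | int(one_dot) << 13      # Bit 13    1-Dot polygons
            | int(depth_equal) << 14  # Bit 14    Depth Test Less/Equal
            | int(fog) << 15          # Bit 15    Fog Enable
            | alpha << 16             # Bit 16-20 Alpha
            | polygon_id << 24)       # Bit 24-29 Polygon ID

solid_front_only = polygon_attr()
```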

4000480h - Cmd 20h - COLOR - Directly Set Vertex Color (W)
  Parameter 1, Bit 0-4    Red
  Parameter 1, Bit 5-9    Green
  Parameter 1, Bit 10-14  Blue
  Parameter 1, Bit 15-31  Not used
The 5bit RGB values are internally expanded to 6bit RGB as follows: X=X*2+(X+31)/32, ie. zero remains zero, all other values are X=X*2+1.
Aside from by using the Color command, the color can be also changed by MaterialColor0 command (if MaterialColor0.Bit15 is set, it acts identical as the Color Command), and by the Normal command (which calculates the color based on light/material parameters).
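
The COLOR parameter layout and the 5bit to 6bit expansion formula can be sketched in Python as below (rgb15 and expand_5_to_6 are hypothetical helper names).

```python
def rgb15(r, g, b):
    """Pack 5bit R,G,B into the COLOR parameter (Bit 0-4, 5-9, 10-14)."""
    return (r & 31) | ((g & 31) << 5) | ((b & 31) << 10)

def expand_5_to_6(x):
    """X=X*2+(X+31)/32, ie. zero stays zero, all others become X*2+1."""
    return x * 2 + (x + 31) // 32

white = rgb15(31, 31, 31)
```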

Depth Test
The Depth Test compares the depth of the pixels of the polygon with the depth of previously rendered polygons (or of the rear plane if there have been none rendered yet). The new pixels are drawn if the new depth is Less (closer to the camera), or if it is Equal, as selected by POLYGON_ATTR.Bit14.
Normally, Depth Equal would work only for exact matches (ie. if the overlapping polygons have exactly the same coordinates; and thus have the same rounding errors), however, the NDS hardware is allowing "Equal" to have a tolerance of +/-200h (within the 24bit depth range of 0..FFFFFFh), that may bypass rounding errors, but it may also cause nearby polygons to be accidentally treated to have equal depth.
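
As a sketch of the rule described above, the Depth Test could be modelled in Python like this. The function name is hypothetical, and whether a difference of exactly 200h still counts as "Equal" is an assumption of this model (it is treated as inclusive here).

```python
DEPTH_EQUAL_TOLERANCE = 0x200  # tolerance within the 24bit depth range

def depth_test_passes(new_depth, old_depth, equal_mode):
    """Return True if the new pixel should be drawn (POLYGON_ATTR.Bit14)."""
    if equal_mode:
        # "Equal" mode: accept depths within +/-200h of the old depth
        return abs(new_depth - old_depth) <= DEPTH_EQUAL_TOLERANCE
    # "Less" mode: accept only pixels closer to the camera
    return new_depth < old_depth
```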

  DS 3D Polygon Definitions by Vertices

The DS supports polygons with 3 or 4 edges, triangles and quadrilaterals.
The position of the edges is defined by vertices, each consisting of (x,y,z) values.

For Line Segments, use Triangles with the same vertex specified twice. Line Segments are always rendered, because they do not have any front and back sides.
Prohibited Quad shapes may produce unintended results; namely, these are Quads with crossed sides, and Quads with angles greater than 180 degrees.

  Separate Tri.     Triangle Strips   Line Segment
  v0                 v2___v4____v6
  |\      v3         /|\  |\    /\     v0    v1
  | \     /\      v0( | \ | \  /  \     ------
  |__\   /__\        \|__\|__\/____\         v2
  v1 v2 v4  v5       v1   v3  v5   v7

  Separate Quads          Quadrilateral Strips       Prohibited Quads
    v0__v3                 v0__v2____v4     v10__    v0__v3     v4
     /  \   v4____v7        /  \     |\ _____ / /v11   \/       |\
    /    \   |    \        /    \    | |v6 v8| /       /\     v5| \
   /______\  |_____\      /______\___|_|_____|/       /__\     /___\
   v1    v2  v5    v6     v1    v3  v5 v7   v9       v2   v1   v6   v7

The vertices are normally arranged anti-clockwise, except that: in triangle-strips each second polygon uses clockwise arranged vertices, and quad-strips are sorts of "up-down" arranged (whereas "up" and "down" may be anywhere due to rotation). Other arrangements may result in quads with crossed lines, or may swap the front and back sides of the polygon (above examples are showing the front sides).

4000500h - Cmd 40h - BEGIN_VTXS - Start of Vertex List (W)
  Parameter 1, Bit 0-1    Primitive Type (0..3, see below)
  Parameter 1, Bit 2-31   Not used
Indicates the Start of a Vertex List, and its Primitive Type:
  0  Separate Triangle(s)      ;3*N vertices per N triangles
  1  Separate Quadrilateral(s) ;4*N vertices per N quads
  2  Triangle Strips           ;3+(N-1) vertices per N triangles
  3  Quadrilateral Strips      ;4+(N-1)*2 vertices per N quads
The BEGIN_VTXS command should be followed by VTX_-commands to define the Vertices of the list, and should be then terminated by END_VTXS command.
BEGIN_VTXS additionally applies changes to POLYGON_ATTR.
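
The vertex counts per Primitive Type listed above can be expressed as a small Python function (vertices_needed is a hypothetical name, used here only to check a vertex list length before sending it).

```python
def vertices_needed(primitive_type, n):
    """Number of VTX commands required for n polygons of the given type."""
    if n < 1:
        raise ValueError("need at least one polygon")
    if primitive_type == 0:   # Separate Triangle(s)
        return 3 * n
    if primitive_type == 1:   # Separate Quad(s)
        return 4 * n
    if primitive_type == 2:   # Triangle Strips
        return 3 + (n - 1)
    if primitive_type == 3:   # Quad Strips
        return 4 + (n - 1) * 2
    raise ValueError("primitive type must be 0..3")
```

Eg. the Triangle Strip drawn above (v0..v7) has 8 vertices for 6 triangles, and the Quad Strip (v0..v11) has 12 vertices for 5 quads.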

4000504h - Cmd 41h - END_VTXS - End of Vertex List (W)
Parameters: None. This is a Dummy command for OpenGL compatibility. It should be used to terminate a BEGIN_VTXS, VTX_<values> sequence. END_VTXS is possibly required for Nintendo's software emulator? On real NDS consoles (and in no$gba) it has no effect, it can be left out, or can be issued multiple times inside of a vertex list, without disturbing the display.

400048Ch - Cmd 23h - VTX_16 - Set Vertex XYZ Coordinates (W)
  Parameter 1, Bit 0-15   X-Coordinate (signed, with 12bit fractional part)
  Parameter 1, Bit 16-31  Y-Coordinate (signed, with 12bit fractional part)
  Parameter 2, Bit 0-15   Z-Coordinate (signed, with 12bit fractional part)
  Parameter 2, Bit 16-31  Not used

4000490h - Cmd 24h - VTX_10 - Set Vertex XYZ Coordinates (W)
  Parameter 1, Bit 0-9    X-Coordinate (signed, with 6bit fractional part)
  Parameter 1, Bit 10-19  Y-Coordinate (signed, with 6bit fractional part)
  Parameter 1, Bit 20-29  Z-Coordinate (signed, with 6bit fractional part)
  Parameter 1, Bit 30-31  Not used
Same as VTX_16, with only one parameter, with smaller fractional part.
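
The VTX_16 and VTX_10 parameter formats can be compared with the following Python sketch. The function names are hypothetical, coordinates are passed as floats, and values outside the representable range raise an error rather than wrapping (a choice made for this example).

```python
def vtx_16(x, y, z):
    """Return the two VTX_16 parameter words (16bit, 12bit fraction)."""
    def s16(v):
        raw = int(round(v * 4096))
        if not -0x8000 <= raw <= 0x7FFF:
            raise ValueError("coordinate does not fit into 16bit")
        return raw & 0xFFFF
    return [s16(x) | (s16(y) << 16), s16(z)]

def vtx_10(x, y, z):
    """Return the single VTX_10 parameter word (10bit, 6bit fraction)."""
    def s10(v):
        raw = int(round(v * 64))
        if not -0x200 <= raw <= 0x1FF:
            raise ValueError("coordinate does not fit into 10bit")
        return raw & 0x3FF
    return s10(x) | (s10(y) << 10) | (s10(z) << 20)

params_16 = vtx_16(1.0, -1.0, 0.5)
param_10 = vtx_10(1.0, -1.0, 0.5)
```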

4000494h - Cmd 25h - VTX_XY - Set Vertex XY Coordinates (W)
  Parameter 1, Bit 0-15   X-Coordinate (signed, with 12bit fractional part)
  Parameter 1, Bit 16-31  Y-Coordinate (signed, with 12bit fractional part)
The Z-Coordinate is kept unchanged, and re-uses the value from previous VTX.

4000498h - Cmd 26h - VTX_XZ - Set Vertex XZ Coordinates (W)
  Parameter 1, Bit 0-15   X-Coordinate (signed, with 12bit fractional part)
  Parameter 1, Bit 16-31  Z-Coordinate (signed, with 12bit fractional part)
The Y-Coordinate is kept unchanged, and re-uses the value from previous VTX.

400049Ch - Cmd 27h - VTX_YZ - Set Vertex YZ Coordinates (W)
  Parameter 1, Bit 0-15   Y-Coordinate (signed, with 12bit fractional part)
  Parameter 1, Bit 16-31  Z-Coordinate (signed, with 12bit fractional part)
The X-Coordinate is kept unchanged, and re-uses the value from previous VTX.

40004A0h - Cmd 28h - VTX_DIFF - Set Relative Vertex Coordinates (W)
  Parameter 1, Bit 0-9    X-Difference (signed, with 9/12bit fractional part)
  Parameter 1, Bit 10-19  Y-Difference (signed, with 9/12bit fractional part)
  Parameter 1, Bit 20-29  Z-Difference (signed, with 9/12bit fractional part)
  Parameter 1, Bit 30-31  Not used
Sets XYZ-Coordinate relative to the XYZ-Coordinates from previous VTX. In detail: The 9bit fractional values are divided by 8 (sign expanded to 12bit fractions, in range +/-0.125), and that 12bit fraction is then added to the old vtx coordinates. The result of the addition should not overflow 16bit vertex coordinate range (1bit sign, 3bit integer, 12bit fraction).
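The relative update can be sketched as below. Dividing a 9bit fraction by 8 is the same as reading the raw 10bit value as a 12bit fraction, so the sign-extended field is added directly to the old 16bit coordinate. Raising an error on overflow is an assumption of this sketch; the text above only says that the result should not overflow.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low 'bits' bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def apply_vtx_diff(old_xyz, param: int):
    """old_xyz: three signed 16bit coordinates (12bit fraction).
    param: the VTX_DIFF parameter word. Returns the new coordinates."""
    new_xyz = []
    for axis in range(3):
        diff = sign_extend((param >> (10 * axis)) & 0x3FF, 10)  # -512..+511
        coord = old_xyz[axis] + diff                            # +/-0.125 max
        if not -0x8000 <= coord <= 0x7FFF:
            raise OverflowError("result exceeds 16bit vertex range")
        new_xyz.append(coord)
    return tuple(new_xyz)
```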

Notes on VTX commands
On each VTX command, the viewport coordinates of the vertex are calculated and stored in Vertex RAM,
  ( xx, yy, zz, ww ) = ( x, y, z, 1.0 ) * ClipMatrix
The actual screen position (in pixels) is then,
  screen_x = (xx+ww)*viewport_width / (2*ww) + viewport_x1
  screen_y = (yy+ww)*viewport_height / (2*ww) + viewport_y1
Each VTX command that completes the definition of a polygon (ie. each 3rd for Separate Triangles) additionally stores data in Polygon List RAM.
VTX commands may be issued only between Begin and End commands.
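The viewport formulas above can be sketched in floating point as follows (the hardware works with fixed-point values; the function and the IDENTITY constant are hypothetical names used for illustration only):

```python
IDENTITY = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def vertex_to_screen(x, y, z, clip_matrix, viewport):
    """clip_matrix: 4 rows of 4 numbers; viewport: (x1, y1, width, height)."""
    v = (x, y, z, 1.0)
    # row vector * matrix: result[c] = sum over r of v[r] * m[r][c]
    xx, yy, zz, ww = (sum(v[r] * clip_matrix[r][c] for r in range(4))
                      for c in range(4))
    if ww == 0:
        raise ZeroDivisionError("w is zero, vertex cannot be projected")
    x1, y1, width, height = viewport
    screen_x = (xx + ww) * width / (2 * ww) + x1
    screen_y = (yy + ww) * height / (2 * ww) + y1
    return screen_x, screen_y
```

With an identity ClipMatrix and a 256x192 viewport, the origin lands in the middle of the viewport, and x=+1.0 on its right border.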

Polygons are clipped to the 6 sides of the view volume (ie. to the left, right, top, bottom, near, and far edges). If one or more vertices exceed one of these sides, then these vertices are replaced by two newly created vertices (which are located on the intersections of the polygon edges and the view volume edge).
Depending on the number of clipped vertices, this may increase or decrease the number of entries in Vertex RAM (ie. minus N clipped vertices, plus 2 new vertices). Also, clipped polygons which are part of polygon strips are converted to separate polygons (which does increase the number of entries in Vertex RAM). Polygons that are fully outside of the View Volume aren't stored in Vertex RAM, nor in Polygon RAM (the only exception are polygons that are located exactly one pixel below, or right of, the lower/right edges, which appear to be accidentally stored in memory).

  DS 3D Polygon Light Parameters

The lighting operation is performed by executing the Normal command (which sets the VertexColor based on the Light/Material parameters) (to the rest of the hardware it doesn't matter if the VertexColor was set by Color command or by Normal command). Light is calculated only for the Front side of the polygon (assuming that the Normal is matched to that side), so the Back side will be (incorrectly) using the same color.

40004C8h - Cmd 32h - LIGHT_VECTOR - Set Light's Directional Vector (W)
Sets direction of the specified light (ie. the light selected in Bit30-31).
  0-9   Directional Vector's X component (1bit sign + 9bit fractional part)
  10-19 Directional Vector's Y component (1bit sign + 9bit fractional part)
  20-29 Directional Vector's Z component (1bit sign + 9bit fractional part)
  30-31 Light Number                     (0..3)
Upon executing this command, the incoming vector is multiplied by the current Directional Matrix, the result is then applied as LightVector. This allows the light direction to be rotated. However, normally, to keep the light unrotated, be sure to use LoadIdentity (in MtxMode=2) before setting the LightVector.
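A sketch of packing the LIGHT_VECTOR parameter from real numbers. The NORMAL command uses the same three 10bit fields. Clamping to 511/512 is an assumption of this sketch, made because +1.0 is not representable in a 1bit sign + 9bit fraction field (the helper names are hypothetical):

```python
def pack_vec10(x: float, y: float, z: float) -> int:
    """Pack three components as 1bit sign + 9bit fraction each."""
    def comp(v: float) -> int:
        raw = int(round(v * 512))
        raw = max(-512, min(511, raw))    # clamp, +1.0 would overflow
        return raw & 0x3FF
    return comp(x) | (comp(y) << 10) | (comp(z) << 20)

def light_vector_param(light: int, x: float, y: float, z: float) -> int:
    """Parameter word for LIGHT_VECTOR, light number in bit 30-31."""
    if not 0 <= light <= 3:
        raise ValueError("light number must be 0..3")
    return pack_vec10(x, y, z) | (light << 30)
```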

40004CCh - Cmd 33h - LIGHT_COLOR - Set Light Color (W)
Sets the color of the specified light (ie. the light selected in Bit30-31).
  0-4   Red          (0..1Fh)      ;\light color this will be combined with
  5-9   Green        (0..1Fh)      ; diffuse, specular, and ambient colors
  10-14 Blue         (0..1Fh)      ;/upon execution of the normal command
  15-29 Not used
  30-31 Light Number (0..3)

40004C0h - Cmd 30h - DIF_AMB - MaterialColor0 - Diffuse/Ambient Reflect. (W)
  0-4   Diffuse Reflection Red     ;\light(s) that directly hits the polygon,
  5-9   Diffuse Reflection Green   ; ie. max when NormalVector has opposite
  10-14 Diffuse Reflection Blue    ;/direction of LightVector
  15    Set Vertex Color (0=No, 1=Set Diffuse Reflection Color as Vertex Color)
  16-20 Ambient Reflection Red     ;\light(s) that indirectly hits the polygon,
  21-25 Ambient Reflection Green   ; ie. assuming that light is reflected by
  26-30 Ambient Reflection Blue    ;/walls/floor, regardless of LightVector
  31    Not used
With Bit15 set, the lower 15bits are applied as VertexColor (exactly as when executing the Color command). The purpose is to use it as a default color (eg. when commenting out the Normal command); normally, when using lighting, the color setting gets overwritten (as soon as the Normal command is executed).
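The DIF_AMB bit layout above can be sketched as follows (helper names are hypothetical; colors are given as (red, green, blue) tuples with 5bit components):

```python
def rgb15(red: int, green: int, blue: int) -> int:
    """Pack three 5bit components: bit 0-4 red, 5-9 green, 10-14 blue."""
    for component in (red, green, blue):
        if not 0 <= component <= 31:
            raise ValueError("color components must be 0..1Fh")
    return red | (green << 5) | (blue << 10)

def dif_amb_param(diffuse, ambient, set_vertex_color: bool = False) -> int:
    """Parameter word for DIF_AMB (MaterialColor0)."""
    return (rgb15(*diffuse)
            | (int(set_vertex_color) << 15)   # also apply diffuse as VertexColor
            | (rgb15(*ambient) << 16))
```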

40004C4h - Cmd 31h - SPE_EMI - MaterialColor1 - Specular Ref. & Emission (W)
  0-4   Specular Reflection Red    ;\light(s) reflected towards the camera,
  5-9   Specular Reflection Green  ; ie. max when NormalVector is in middle of
  10-14 Specular Reflection Blue   ;/LightVector and ViewDirection
  15    Specular Reflection Shininess Table (0=Disable, 1=Enable)
  16-20 Emission Red               ;\light emitted by the polygon itself,
  21-25 Emission Green             ; ie. regardless of light colors/vectors,
  26-30 Emission Blue              ;/and no matter if any lights are enabled
  31    Not used
Caution: Specular Reflection WON'T WORK when the ProjectionMatrix is rotated.

40004D0h - Cmd 34h - SHININESS - Specular Reflection Shininess Table (W)
Write 32 parameter words (each 32bit word containing four 8bit entries), entries 0..3 in the first word, through entries 124..127 in the last word:
  0-7   Shininess 0 (unsigned fixed-point, 0bit integer, 8bit fractional part)
  8-15  Shininess 1 ("")
  16-23 Shininess 2 ("")
  24-31 Shininess 3 ("")
If the table is disabled (by MaterialColor1.Bit15), then reflection will act as if the table would be filled with linear increasing numbers.
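Packing the 128 table entries into the 32 parameter words might look like this (the function name is hypothetical):

```python
def pack_shininess(entries):
    """Pack 128 8bit entries into 32 words, entry 0 in bit 0-7 of word 0."""
    if len(entries) != 128:
        raise ValueError("the shininess table has exactly 128 entries")
    words = []
    for first in range(0, 128, 4):
        word = 0
        for k in range(4):
            entry = entries[first + k]
            if not 0 <= entry <= 0xFF:
                raise ValueError("entries are unsigned 8bit values")
            word |= entry << (8 * k)
        words.append(word)
    return words
```

The 32 resulting words would then be written as the parameters of one SHININESS command.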

4000484h - Cmd 21h - NORMAL - Set Normal Vector (W)
In short, this command does calculate the VertexColor, based on the various light-parameters.
In detail, upon executing this command, the incoming vector is multiplied by the current Directional Matrix, the result is then applied as NormalVector (giving it the same rotation as used for the following polygon vertices).
  0-9   X-Component of Normal Vector (1bit sign + 9bit fractional part)
  10-19 Y-Component of Normal Vector (1bit sign + 9bit fractional part)
  20-29 Z-Component of Normal Vector (1bit sign + 9bit fractional part)
  30-31 Not used
Defines the Polygon's Normal, and then updates the Vertex Color, taking into account the View Direction, the NormalVector, the LightVector(s), and the Light/Material Colors. The execution time of the Normal command varies depending on the number of enabled light(s).

Additional Light Registers
Additionally to above registers, light(s) must be enabled in PolygonAttr (mind that changes to PolygonAttr aren't applied until next Begin command). And, the Directional Matrix must be set up correctly (in MtxMode=2) for the LightVector and NormalVector commands.

Normal Vector
The Normal vector must point "away from the polygon surface" (eg. for the floor, the Normal should point upwards). That direction is implied by the polygon vertices, however, the hardware cannot automatically calculate it, so it must be set manually with the Normal command (prior to the VTX-commands).
When using lighting, the Normal command must be re-executed after switching Lighting on/off, or after changing light/material parameters. And, of course, also before defining polygons with different orientation. Polygons with same orientation (eg. horizontal polygon surfaces) and same material color can use the same Normal. Changing the Normal per polygon gives differently colored polygons with flat surfaces, changing the Normal per vertex gives the illusion of curved surfaces.

Light Vector
Each light consists of parallel beams; similar to sunlight, which appears to us (due to the great distance) to consist of parallel beams, all emitted in the same direction; towards Earth.
In reality, light is emitted into ALL directions, originated from the light source (eg. a candle), the hardware doesn't support that type of non-parallel light. However, the light vectors can be changed per polygon, so a polygon that is located north of the light source may use different light direction than a polygon that is east of the light source.
And, of course, Light 0..3 may (and should) have different directions.

Normalized Vectors
The Normal Vector and the Light Vectors should be normalized (ie. their length should be 1.0) (in practice: something like 0.99, since the registers have only fractional parts) (a length of 1.0 can cause overflows).

Lighting Limitations
The functionality of the light feature is limited to reflecting light to the camera (light is not reflected to other polygons, nor does it cast shadows on other polygons). However, independently of the lighting feature, the DS hardware does allow to create shadows, see:
DS 3D Shadow Polygons

Internal Operation on Normal Command
  IF TexCoordTransformMode=2 THEN TexCoord=NormalVector*Matrix (see TexCoord)
  VertexColor = EmissionColor
  FOR i=0 to 3
   IF PolygonAttrLight[i]=enabled THEN
    DiffuseLevel = max(0,-(LightVector[i]*NormalVector))
    ShininessLevel = max(0,(-HalfVector[i])*(NormalVector))^2
    IF TableEnabled THEN ShininessLevel = ShininessTable[ShininessLevel]
    ;note: below processed separately for the R,G,B color components...
    VertexColor = VertexColor + SpecularColor*LightColor[i]*ShininessLevel
    VertexColor = VertexColor + DiffuseColor*LightColor[i]*DiffuseLevel
    VertexColor = VertexColor + AmbientColor*LightColor[i]
  NEXT i

Internal Operation on Light_Vector Command (for Light i)
  LightVector[i] = (LightVector*DirectionalMatrix)
  HalfVector[i] = (LightVector[i]+LineOfSightVector)/2
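The two pseudocode blocks above can be sketched in floating point as below. This is only an approximation of the fixed-point hardware: colors are real numbers in 0..1, the light vectors are assumed to be already multiplied by the Directional Matrix, the shininess table is left out, and the final saturation to 1.0 is an assumption. All names are hypothetical.

```python
LINE_OF_SIGHT = (0.0, 0.0, -1.0)

def dot(a, b):
    """Scalar product of two 3-component vectors."""
    return sum(p * q for p, q in zip(a, b))

def half_vector(light_vec):
    """HalfVector = (LightVector + LineOfSightVector) / 2"""
    return tuple((l + s) / 2 for l, s in zip(light_vec, LINE_OF_SIGHT))

def normal_command(normal, lights, diffuse, ambient, specular, emission):
    """lights: list of (enabled, light_vector, light_color) tuples.
    Returns the resulting VertexColor as an (r, g, b) tuple."""
    color = list(emission)
    for enabled, light_vec, light_color in lights:
        if not enabled:
            continue
        diffuse_level = max(0.0, -dot(light_vec, normal))
        negated_half = tuple(-h for h in half_vector(light_vec))
        shininess_level = max(0.0, dot(negated_half, normal)) ** 2
        for c in range(3):                 # separately for R, G, B
            color[c] += specular[c] * light_color[c] * shininess_level
            color[c] += diffuse[c] * light_color[c] * diffuse_level
            color[c] += ambient[c] * light_color[c]
    return tuple(min(1.0, c) for c in color)   # saturation is an assumption
```

A surface facing the camera, lit head-on by a white light, with a pure red diffuse material, comes out as pure red.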

LineOfSightVector (how it SHOULD work)
Ideally, the LineOfSightVector should point from the camera to the vertic(es), however, the vertic(es) are still unknown at time of normal command, so it is just pointing from the camera to the screen, ie.
  LineOfSightVector = (0,0,-1.0)
Moreover, the LineOfSightVector should be multiplied by the Projection Matrix (so the vector would get rotated accordingly when the camera gets rotated), and, after multiplication by a scaled matrix, it'd be required to normalize the resulting vector.

LineOfSightVector (how it DOES actually work)
However, the NDS cannot normalize vectors by hardware, and therefore, it does completely leave out the LineOfSightVector*ProjectionMatrix multiplication. So, the LineOfSightVector is always (0,0,-1.0), no matter of any camera rotation. That means,
  Specular Reflection WON'T WORK when the ProjectionMatrix is rotated (!)
So, if you want to rotate the "camera" (in MTX_MODE=0), then you must instead rotate the "world" in the opposite direction (in MTX_MODE=2).
That problem applies only to Specular Reflection, ie. only if Lighting is used, and only if the Specular Material Color is nonzero.

Maths Notes
Note on Vector*Vector multiplication: Processed as RowVector*ColumnVector, so the result is a number (aka scalar) (aka a 1x1 matrix), multiplying two (normalized) vectors results in: "cos(angle)=vec1*vec2", ie. the cosine of the angle between the two vectors.
The various Normal/Light/Half/Sight vectors are only 3-dimensional (x,y,z), ie. only the upper-left 3x3 matrix elements are used on multiplications with the 4x4 DirectionalMatrix.

  DS 3D Shadow Polygons

The DS hardware's Light-function allows to reflect light to the camera, it does not reflect light to other polygons, and it does not cast any shadows. For shadows at fixed locations it'd be best to pre-calculate their shape and position, and to change the vertex color of the shaded polygons.
Additionally, the Shadow Polygon feature can be used to create animated shadows, ie. moved objects and variable light sources.

Shadow Polygons and Shadow Volume
The software must define a Shadow Volume (ie. the region which doesn't contain light), the hardware does then automatically draw the shadow on all pixels whose x/y/z-coordinates are inside of that region.
The Shadow Volume must be defined by several Shadow Polygons which are enclosing the shaded region. The 'top' of the shadow volume should be usually translated to the position of the object that casts the shadow, if the light direction changes then the shadow volume should be also rotated to match the light direction. The 'length' of the shadow volume should be (at least) long enough to reach from the object to the walls/floor where the shadow is to be drawn. The shadow volume must be passed TWICE to the hardware:

Step 1 - Shadow Volume for Mask
Set Polygon_Attr Mode=Shadow, PolygonID=00h, Back=Render, Front=Hide, Alpha=01h..1Eh, and pass the shadow volume (ie. the shadow polygons) to the geometry engine.
The Back=Render / Front=Hide setting causes the 'rear-side' of the shadow volume to be rendered, of course only as far as it is in front of other polygons. The Mode=Shadow / ID=00h setting causes the polygon NOT to be drawn to the Color Buffer - instead, flags are set in the Stencil Buffer (to be used in Step 2).

Step 2 - Shadow Volume for Rendering
Simply repeat step 1, but with Polygon_Attr Mode=Shadow, PolygonID=01h..3Fh, Back=Render(what/why?), Front=Render, Alpha=01h..1Eh.
The Front=Render setting causes the 'front-side' of the shadow volume to be rendered, again, only as far as it is in front of other polygons. The Mode=Shadow / ID>00h setting causes the polygon to be drawn to the Color Buffer as usual, but only if the Stencil Buffer bits are zero (ie. the portion from Step 1 is excluded) (additionally, Step 2 resets the stencil bits after checking them). Moreover, the shadow is rendered only if its Polygon ID differs from the ID in the Attribute Buffer.

Shadow Alpha and Shadow Color
The Alpha=Translucent setting in Step 1 and 2 ensures that the Shadow is drawn AFTER the normal (opaque) polygons have been rendered. In Step 2 it does additionally specify the 'intensity' of the shadow. For normal shadows, the Vertex Color should be usually black, however, the shadow volume may be also used as 'spotlight volume' when using other colors.

Rendering Order
The Mask Volume must be rendered prior to the Rendering Volume, ie. Step 1 and 2 must be performed in that order, and, to keep that order intact, Auto-sorting must have been disabled in the previous Swap_Buffers command.
The shadow volume must be rendered after the 'target' polygons have been rendered, for opaque targets this is done automatically (due to the translucent alpha setting; translucent polygons are always rendered last, even with auto-sort disabled).

Translucent Targets
Casting shadows on Translucent Polygons. First draw the translucent target (with update depth buffer enabled, required for the shadow z-coordinates), then draw the Shadow Mask/Rendering volumes.
Due to the updated depth buffer the shadow will be cast only on the translucent target (not on any other polygons underneath the translucent polygon). If you want the shadow to appear on both: Draw the Shadow Mask/Rendering volume TWICE (once before, and once after drawing the translucent target).

Polygon ID and Fog Enable
The "Render only if Polygon ID differs" feature (see Step 2) allows to prevent the shadow to be cast on the object that casts the shadow (ie. the object and shadow should have the same IDs). The feature also allows to select whether overlapping shadows (with same/different IDs) are shaded once or twice.
The old Fog Enable flag in the Attribute Buffer is ANDed with the Fog Enable flag of the Shadow Polygons, this allows to exclude Fog in shaded regions.

Shadow Volume Open/Closed Shapes
Normally, the shadow volume should have a closed shape, ie. should have rear-sides (step 1), and corresponding front-sides (step 2) for all possible viewing angles. That is required for the shadow to be drawn correctly, and also for the Stencil Buffer to be reset to zero (in step 2, so that the stencil bits won't disturb other shadow volumes).
Due to that, drawing errors may occur if the shadow volume's front or rear side gets clipped by near/far clip plane.
One exception is that the volume doesn't need a bottom-side (with a suitable volume length, the bottom may be left open, since it vanishes in the floor/walls anyways).

  DS 3D Texture Attributes

4000488h - Cmd 22h - TEXCOORD - Set Texture Coordinates (W)
Specifies the texture source coordinates within the texture bitmap which are to be associated with the next vertex.
  Parameter 1, Bit 0-15   S-Coordinate (X-Coordinate in Texture Source)
  Parameter 1, Bit 16-31  T-Coordinate (Y-Coordinate in Texture Source)
  Both values are 1bit sign + 11bit integer + 4bit fractional part.
  A value of 1.0 (=1 SHL 4) equals to one Texel.
With Position 0.0 , 0.0 drawing starts from upperleft of the Texture.
With positive offsets, drawing origin starts more "within" the texture.
With negative offsets, drawing starts "before" the texture.
"When texture mapping, the Geometry Engine works faster if you issue commands in the order TexCoord -> Normal -> Vertex."
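A sketch of building the TEXCOORD parameter from coordinates given in texels, assuming round-to-nearest conversion (the function name is hypothetical):

```python
def texcoord_param(s_texels: float, t_texels: float) -> int:
    """Pack S and T as 1bit sign + 11bit integer + 4bit fraction each."""
    def fixed(value: float) -> int:
        raw = int(round(value * 16))      # 1.0 texel equals 1 SHL 4
        if not -0x8000 <= raw <= 0x7FFF:
            raise ValueError("coordinate out of range")
        return raw & 0xFFFF
    return fixed(s_texels) | (fixed(t_texels) << 16)
```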

40004A8h - Cmd 2Ah - TEXIMAGE_PARAM - Set Texture Parameters (W)
  0-15  Texture VRAM Offset div 8 (0..FFFFh -> 512K RAM in Slot 0,1,2,3)
        (VRAM must be allocated as Texture data, see Memory Control chapter)
  16    Repeat in S Direction (0=Clamp Texture, 1=Repeat Texture)
  17    Repeat in T Direction (0=Clamp Texture, 1=Repeat Texture)
  18    Flip in S Direction   (0=No, 1=Flip each 2nd Texture) (requires Repeat)
  19    Flip in T Direction   (0=No, 1=Flip each 2nd Texture) (requires Repeat)
  20-22 Texture S-Size        (for N=0..7: Size=(8 SHL N); ie. 8..1024 texels)
  23-25 Texture T-Size        (for N=0..7: Size=(8 SHL N); ie. 8..1024 texels)
  26-28 Texture Format        (0..7, see below)
  29    Color 0 of 4/16/256-Color Palettes (0=Displayed, 1=Made Transparent)
  30-31 Texture Coordinates Transformation Mode (0..3, see below)
Texture Formats:
  0  No Texture
  1  A3I5 Translucent Texture
  2  4-Color Palette Texture
  3  16-Color Palette Texture
  4  256-Color Palette Texture
  5  4x4-Texel Compressed Texture
  6  A5I3 Translucent Texture
  7  Direct Texture
Texture Coordinates Transformation Modes:
  0  Do not Transform texture coordinates
  1  TexCoord source
  2  Normal source
  3  Vertex source
The S-Direction equals to the horizontal direction of the source bitmap.
The T-Direction, T-repeat, and T-flip are the same in vertical direction.
For a "/" shaped texture, the S-clamp, S-repeat, and S-flip look like so:
  Clamp _____  Repeat       Repeat+Flip
  _____/       ///////////  /\/\/\/\/\/
With "Clamp", the texture coordinates are clipped to MinMax(0,Size-1), so the texels at the edges of the texture bitmap are repeated (to avoid that effect, fill the bitmap edges by texels with alpha=0, so they become invisible).
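The TEXIMAGE_PARAM fields can be assembled as sketched below. The function and its keyword names are hypothetical; the bit positions are those listed above.

```python
def teximage_param(vram_offset, s_size, t_size, tex_format,
                   repeat_s=False, repeat_t=False,
                   flip_s=False, flip_t=False,
                   color0_transparent=False, transform_mode=0):
    """Build the TEXIMAGE_PARAM word; sizes are given in texels."""
    def size_code(size):
        for n in range(8):
            if size == 8 << n:            # Size=(8 SHL N)
                return n
        raise ValueError("size must be 8, 16, 32, ... 1024 texels")
    if vram_offset % 8 or not 0 <= vram_offset < 0x80000:
        raise ValueError("offset must be a multiple of 8 within 512K")
    if not 0 <= tex_format <= 7 or not 0 <= transform_mode <= 3:
        raise ValueError("format is 0..7, transformation mode is 0..3")
    return ((vram_offset // 8)
            | (int(repeat_s) << 16) | (int(repeat_t) << 17)
            | (int(flip_s) << 18) | (int(flip_t) << 19)
            | (size_code(s_size) << 20) | (size_code(t_size) << 23)
            | (tex_format << 26)
            | (int(color0_transparent) << 29)
            | (transform_mode << 30))
```

For example, a 64x128 Direct Color texture at offset 20000h with S-repeat gives 1E314000h.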

40004ACh - Cmd 2Bh - PLTT_BASE - Set Texture Palette Base Address (W)
  0-12   Palette Base Address (div8 or div10h, see below)
         (Not used for Texture Format 7: Direct Color Texture)
         (0..FFF8h/8 for Texture Format 2: ie. 4-color-palette Texture)
         (0..17FF0h/10h for all other Texture formats)
  13-31  Not used
The palette data occupies 16bit per color, Bit0-4: Red, Bit5-9: Green, Bit10-14: Blue, Bit15: Not used.
(VRAM must be allocated as Texture Palette, there can be up to 6 Slots allocated, ie. the addressable 18000h bytes, see Memory Control chapter)
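The div8/div10h rule can be sketched as follows. The function name is hypothetical, and rejecting format 0 (No Texture) is an assumption; the text above only states that format 7 does not use the palette base.

```python
def pltt_base_param(palette_addr: int, tex_format: int) -> int:
    """Convert a byte offset in Texture Palette memory to PLTT_BASE."""
    if tex_format in (0, 7):
        raise ValueError("this texture format does not use a palette")
    if not 1 <= tex_format <= 6:
        raise ValueError("texture format must be 0..7")
    if tex_format == 2:                   # 4-color palette: 8-byte steps
        step, limit = 8, 0xFFF8
    else:                                 # all other formats: 16-byte steps
        step, limit = 0x10, 0x17FF0
    if palette_addr % step or not 0 <= palette_addr <= limit:
        raise ValueError("palette address misaligned or out of range")
    return palette_addr // step
```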

TexImageParam and TexPlttBase
Can be issued per polygon (except within polygon strips).

  DS 3D Texture Formats

Format 2: 4-Color Palette Texture
Each Texel occupies 2bit, the first Texel is located in LSBs of 1st byte.
In this format, the Palette Base is specified in 8-byte steps; all other formats use 16-byte steps (see PLTT_BASE register).

Format 3: 16-Color Palette Texture
Each Texel occupies 4bit, the 1st Texel is located in LSBs of 1st byte.

Format 4: 256-Color Palette Texture
Each Texel occupies 8bit, the 1st Texel is located in 1st byte.

Format 7: Direct Color Texture
Each Texel occupies 16bit, the 1st Texel is located in 1st halfword.
Bit0-4: Red, Bit5-9: Green, Bit10-14: Blue, Bit15: Alpha

Format 1: A3I5 Translucent Texture (3bit Alpha, 5bit Color Index)
Each Texel occupies 8bit, the 1st Texel is located in 1st byte.
  Bit0-4: Color Index (0..31) of a 32-color Palette
  Bit5-7: Alpha       (0..7; 0=Transparent, 7=Solid)
The 3bit Alpha value (0..7) is internally expanded into a 5bit Alpha value (0..31) as follows: Alpha=(Alpha*4)+(Alpha/2).

Format 6: A5I3 Translucent Texture (5bit Alpha, 3bit Color Index)
Each Texel occupies 8bit, the 1st Texel is located in 1st byte.
  Bit0-2: Color Index (0..7) of a 8-color Palette
  Bit3-7: Alpha       (0..31; 0=Transparent, 31=Solid)
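Decoding the two translucent texel formats, including the 3bit to 5bit alpha expansion given above, might look like this (function names are hypothetical):

```python
def decode_a3i5(texel: int):
    """Format 1: returns (color index 0..31, alpha expanded to 0..31)."""
    index = texel & 0x1F
    alpha3 = (texel >> 5) & 7
    alpha5 = alpha3 * 4 + alpha3 // 2     # Alpha=(Alpha*4)+(Alpha/2)
    return index, alpha5

def decode_a5i3(texel: int):
    """Format 6: returns (color index 0..7, alpha 0..31)."""
    return texel & 7, (texel >> 3) & 0x1F
```

The expansion maps the eight 3bit alpha values to 0, 4, 9, 13, 18, 22, 27, 31.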

Format 5: 4x4-Texel Compressed Texture
Consists of 4x4 Texel blocks in Slot 0 or 2, 32bit per block, 2bit per Texel,
  Bit0-7   Upper 4-Texel row (LSB=first/left-most Texel)
  Bit8-15  Next  4-Texel row ("")
  Bit16-23 Next  4-Texel row ("")
  Bit24-31 Lower 4-Texel row ("")
Additional Palette Index Data for each 4x4 Texel Block is located in Slot 1,
  Bit0-13  Palette Offset in 4-byte steps; Addr=(PLTT_BASE*10h)+(Offset*4)
  Bit14-15 Transparent/Interpolation Mode (0..3, see below)
whereas, the Slot 1 offset is related to above Slot 0 or 2 offset,
  slot1_addr = slot0_addr / 2           ;lower 64K of Slot1 assoc to Slot0
  slot1_addr = slot2_addr / 2 + 10000h  ;upper 64K of Slot1 assoc to Slot2
The 2bit Texel values (0..3) are interpreted depending on the Mode (0..3),
  Texel  Mode 0       Mode 1             Mode 2         Mode 3
  0      Color 0      Color0             Color 0        Color 0
  1      Color 1      Color1             Color 1        Color 1
  2      Color 2      (Color0+Color1)/2  Color 2        (Color0*5+Color1*3)/8
  3      Transparent  Transparent        Color 3        (Color0*3+Color1*5)/8
Mode 1 and 3 are using only 2 Palette Colors (which requires only half as much Palette memory), the 3rd (and 4th) Texel Colors are automatically set to above values (eg. to gray-shades if color 0 and 1 are black and white).
Note: The maximum size for 4x4-Texel Compressed Textures is 1024x512 or 512x1024 (which are both occupying the whole 128K in slot 0 or 2, plus 64K in slot1), a larger size of 1024x1024 cannot be used because of the gap between slot 0 and 2.
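A sketch of decoding one compressed 4x4 block, following the tables above. All names are hypothetical, offsets are relative to the start of the respective slot, and the truncating integer division used for the interpolated colors is an assumption (the exact rounding is not stated above).

```python
def slot1_offset(slot: int, offset: int) -> int:
    """Offset in Slot 1 of the index data for a block in Slot 0 or 2."""
    if slot == 0:
        return offset // 2
    if slot == 2:
        return offset // 2 + 0x10000
    raise ValueError("compressed texels must be in Slot 0 or 2")

def block_palette_addr(pltt_base: int, index_word: int):
    """Returns (palette byte address, mode) for a 16bit index word."""
    return pltt_base * 0x10 + (index_word & 0x3FFF) * 4, index_word >> 14

def block_colors(mode: int, pal):
    """pal: palette entries at the block's palette address.
    Returns the colors for texel values 0..3 (None = transparent)."""
    def mix(a, b, wa, wb, div):
        return tuple((x * wa + y * wb) // div for x, y in zip(a, b))
    c0, c1 = pal[0], pal[1]
    if mode == 0:
        return [c0, c1, pal[2], None]
    if mode == 1:
        return [c0, c1, mix(c0, c1, 1, 1, 2), None]
    if mode == 2:
        return [c0, c1, pal[2], pal[3]]
    if mode == 3:
        return [c0, c1, mix(c0, c1, 5, 3, 8), mix(c0, c1, 3, 5, 8)]
    raise ValueError("mode must be 0..3")

def decode_block(texels32: int, mode: int, pal):
    """Returns 4 rows of 4 colors; the upper row is in bit 0-7."""
    colors = block_colors(mode, pal)
    rows = []
    for row in range(4):
        bits = (texels32 >> (8 * row)) & 0xFF
        rows.append([colors[(bits >> (2 * col)) & 3] for col in range(4)])
    return rows
```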

  DS 3D Texture Coordinates

For textured polygons, a texture coordinate must be associated with each vertex of the polygon. The coordinates (S,T) are defined by TEXCOORD command (typically issued prior to each VTX command), and can be optionally automatically transformed, by the Transformation Mode selected in TEXIMAGE_PARAM register.

Texture Matrix
Although the texture matrix is 4x4, with values m[0..15], only the left two columns of this matrix are actually used. In Mode 2 and 3, the bottom row of the matrix is replaced by S and T values from most recent TEXCOORD command.

Texture Coordinates Transformation Mode 0 - No Transform
The values are set upon executing the TEXCOORD command,
  ( S' T' )  =  ( S  T )
Simple coordinate association, without using the Texture Matrix at all.

Texture Coordinates Transformation Mode 1 - TexCoord source
The values are calculated upon executing the TEXCOORD command,
                                     | m[0]  m[1]  |
  ( S' T' )  =  ( S  T 1/16 1/16 ) * | m[4]  m[5]  |
                                     | m[8]  m[9]  |
                                     | m[12] m[13] |
Can be used to produce a simple texture scrolling, rotation, or scaling, by setting a translate, rotate, or scale matrix for the texture matrix.

Texture Coordinates Transformation Mode 2 - Normal source
The values are calculated upon executing the NORMAL command,
                                     | m[0]  m[1]  |
  ( S' T' )  =  ( Nx  Ny  Nz 1.0 ) * | m[4]  m[5]  |
                                     | m[8]  m[9]  |
                                     | S     T     |
Can be used to produce spherical reflection mapping by setting the texture matrix to the current directional vector matrix, multiplied by a scaling matrix that expands the directional vector space from -1.0..+1.0 to one half of the texture size. For that purpose, translate the origin of the texture coordinate to the center of the spherical texture by using TexCoord command (spherical texture means a bitmap that contains some circle-shaped image).

Texture Coordinates Transformation Mode 3 - Vertex source
The values are calculated upon executing any VTX commands,
                                     | m[0]  m[1]  |
  ( S' T' )  =  ( Vx  Vy  Vz 1.0 ) * | m[4]  m[5]  |
                                     | m[8]  m[9]  |
                                     | S     T     |
Can be used to produce texture scrolls dependent on the View coordinates by copying the current position coordinate matrix into the texture matrix. For example, the PositionMatrix can be obtained via CLIPMTX_RESULT (see there for details), and those values can then be manually copied to the TextureMatrix.

Sign+Integer+Fractional Parts used in above Formulas
  Matrix    m[..]     1+19+12 (32bit)
  Vertex    Vx,Vy,Vz  1+3+12  (16bit)
  Normal    Nx,Ny,Nz  1+0+9   (10bit)
  Constant  1.0       0+1+0   (1bit)
  Constant  1/16      0+0+4   (4bit)
  TexCoord  S,T       1+11+4  (16bit)
  Result    S',T'     1+11+4  (16bit) <-------- clipped to that size !
Observe that the S',T' values are clipped to 16bit size. Ie. after the Vector*Matrix calculation, the result is shifted right (so that it has a 4bit fraction), and the value is then masked to 16bit size.
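The four transformation modes can be sketched in floating point as below. This is not the exact fixed-point arithmetic of the hardware: the matrix and the coordinates are treated as real numbers, and only the final result is converted to the 16bit 1+11+4 format and masked. The function names are hypothetical.

```python
import math

def wrap16(value: float) -> int:
    """Convert to a 4bit fraction, mask to 16bit, return as signed raw."""
    raw = math.floor(value * 16) & 0xFFFF
    return raw - 0x10000 if raw & 0x8000 else raw

def transform_texcoord(mode, m, s, t, src=(0.0, 0.0, 0.0)):
    """m: texture matrix m[0..15]; s,t: values from TEXCOORD (in texels);
    src: the Normal (mode 2) or Vertex (mode 3) components.
    Returns the raw (S', T') values."""
    if mode == 0:
        out = (s, t)
    elif mode == 1:
        vec = (s, t, 1 / 16, 1 / 16)
        out = tuple(sum(vec[r] * m[4 * r + c] for r in range(4))
                    for c in range(2))
    elif mode in (2, 3):
        # the bottom row of the matrix is replaced by S and T
        out = tuple(src[0] * m[c] + src[1] * m[4 + c] + src[2] * m[8 + c]
                    + (s, t)[c] for c in range(2))
    else:
        raise ValueError("transformation mode must be 0..3")
    return tuple(wrap16(v) for v in out)
```

Note how a result of 2048.0 texels no longer fits into the 1+11+4 format and wraps around to -2048.0 (raw value -32768).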

  DS 3D Texture Blending

Polygon pixels consist of a Vertex Color, and of Texture Colors.
These colors can be blended as described below. Or, to use only either one:
To use only the Vertex Color: Select No Texture in TEXIMAGE_PARAM.
To use only the Texture Color: Select Modulation Mode and Alpha=31 in POLYGON_ATTR, and set COLOR to 7FFFh (white), or to gray values (to decrease brightness of the texture color).

Vertex Color (Rv,Gv,Bv,Av)
The Vertex Color (Rv,Gv,Bv) can be changed per Vertex (either by Color, Normal, or Material0 command), pixels between vertices are shaded to medium values of the surrounding vertices. The Vertex Alpha (Av), can be changed only per polygon (by PolygonAttr command).

Texture Colors (Rt,Gt,Bt,At)
The Texture Colors (Rt,Gt,Bt), and Alpha value (At), are defined by the Texture Bitmap. For formats without Alpha value, assume At=31 (solid), and for formats with 1bit Alpha assume At=A*31.

Shading Table Colors (Rs,Gs,Bs)
In Toon/Highlight Shading Mode, the red component of the Vertex Color (Rv) is mis-used as an index in the Shading Table, ie. Rv is used to read Shading Colors (Rs,Gs,Bs) from the table; the green and blue components of the Vertex Color (Gv,Bv) are unused in this mode. The Vertex Alpha (Av) is kept used.
Shading is used in Polygon Mode 2, whether it is Toon or Highlight Shading is selected in DISP3DCNT; this is a per-frame selection, so only either one can be used.

Texture Blending - Modulation Mode (Polygon Attr Mode 0)
  R = ((Rt+1)*(Rv+1)-1)/64
  G = ((Gt+1)*(Gv+1)-1)/64
  B = ((Bt+1)*(Bv+1)-1)/64
  A = ((At+1)*(Av+1)-1)/64
The multiplication results in decreased intensity (unless both factors are 63).

Texture Blending - Decal Mode (Polygon Attr Mode 1)
  R = (Rt*At + Rv*(63-At))/64  ;except, when At=0: R=Rv, when At=31: R=Rt
  G = (Gt*At + Gv*(63-At))/64  ;except, when At=0: G=Gv, when At=31: G=Gt
  B = (Bt*At + Bv*(63-At))/64  ;except, when At=0: B=Bv, when At=31: B=Bt
  A = Av
The At value is used (only) as ratio for Texture color vs Vertex Color.

Texture Blending - Toon Shading (Polygon Mode 2, DISP3DCNT=Toon)
The vertex color Red component (Rv) is used as an index in the toon table.
  R = ((Rt+1)*(Rs+1)-1)/64   ;Rs=ToonTableRed[Rv]
  G = ((Gt+1)*(Gs+1)-1)/64   ;Gs=ToonTableGreen[Rv]
  B = ((Bt+1)*(Bs+1)-1)/64   ;Bs=ToonTableBlue[Rv]
  A = ((At+1)*(Av+1)-1)/64
This is the same as Modulation Mode, but using Rs,Gs,Bs instead of Rv,Gv,Bv.

Texture Blending - Highlight Shading (Polygon Mode 2, DISP3DCNT=Highlight)
  R = ((Rt+1)*(Rs+1)-1)/64+Rs ;truncated to MAX=63
  G = ((Gt+1)*(Gs+1)-1)/64+Gs ;truncated to MAX=63
  B = ((Bt+1)*(Bs+1)-1)/64+Bs ;truncated to MAX=63
  A = ((At+1)*(Av+1)-1)/64
Same as Toon Shading, with an additional offset added; the addition may increase the intensity, however, it may also change the hue of the color.

Above formulas are for 6bit RGBA values, ie. 5bit values internally expanded to 6bit as such: IF X>0 THEN X=X*2+1.
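The Modulation, Toon, and Highlight formulas above can be sketched with one function (Decal Mode is left out here). The function and mode names are hypothetical; inputs are 5bit values that get expanded to 6bit as described.

```python
def expand6(x5: int) -> int:
    """Expand a 5bit value to 6bit: IF X>0 THEN X=X*2+1"""
    return x5 * 2 + 1 if x5 > 0 else 0

def blend(mode, tex5, vtx5, toon_table=None):
    """tex5, vtx5: (r, g, b, a) 5bit texture and vertex colors.
    toon_table: 32 entries of 5bit (r, g, b), used for toon/highlight.
    Returns the 6bit (r, g, b, a) result."""
    rt, gt, bt, at = (expand6(c) for c in tex5)
    av = expand6(vtx5[3])
    if mode == "modulation":
        shade = tuple(expand6(c) for c in vtx5[:3])
    elif mode in ("toon", "highlight"):
        # red component of the vertex color is the table index
        shade = tuple(expand6(c) for c in toon_table[vtx5[0]])
    else:
        raise ValueError("unknown blending mode")
    offset = shade if mode == "highlight" else (0, 0, 0)
    rgb = tuple(min(63, ((t + 1) * (s + 1) - 1) // 64 + o)
                for t, s, o in zip((rt, gt, bt), shade, offset))
    alpha = ((at + 1) * (av + 1) - 1) // 64
    return rgb + (alpha,)
```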

Uni-Colored Textures
Although textures are normally containing "pictures", in some cases it makes sense to use "blank" textures that are filled with a single color:
Wire-frame polygons are always having Av=31, however, they can be made transparent by using Translucent Textures (ie. A5I3 or A3I5 formats) with At<31.
In Toon/Highlight shading modes, the Vertex Color is mis-used as table index, however, Toon/Highlight shading can be used on uni-colored textures, which is more or less the same as using Toon/Highlight shading on uni-colored Vertex-colors.

  DS 3D Toon, Edge, Fog, Alpha-Blending, Anti-Aliasing

4000380h..3BFh - TOON_TABLE - Toon Table (W)
This 64-byte region contains the 32 toon colors (16bit per color), used for both Toon and Highlight Shading. In both modes, the Red (R) component of the RGBA vertex color is mis-used as index to obtain the new RGB value from the toon table, vertex Alpha (A) is kept used as is.
  Bit0-4: Red, Bit5-9: Green, Bit10-14: Blue, Bit15: Not Used
Shading can be enabled (per polygon) in Polygon_Attr, whether it is Toon or Highlight Shading is set (per frame) in DISP3DCNT. For more info on shading, see:
DS 3D Texture Blending

4000330h..33Fh - EDGE_COLOR - Edge Colors 0..7 (W)
This 16-byte region contains the 8 edge colors (16bit per color), Edge Color 0 is used for Polygon ID 00h..07h, Color 1 for ID 08h..0Fh, and so on.
  Bit0-4: Red, Bit5-9: Green, Bit10-14: Blue, Bit15: Not Used
Edge Marking allows to mark the edges of an object (whose polygons all have the same ID) in a wire-frame style. Edge Marking can be enabled (per frame) in DISP3DCNT. When enabled, the polygon edges are drawn at the edge color, but only if the old ID value in the Attribute Buffer is different than the Polygon ID of the new polygon, so no edges are drawn between connected or overlapping polygons with same ID values.
Edge Marking is applied ONLY to opaque polygons (including wire-frames).
Edge Marking increases the size of opaque polygons (see notes below).
Edge Marking doesn't work very well with Anti-Aliasing (see Anti-Aliasing).
Technically, when rendering a polygon, its edges (ie. the wire-frame region) are flagged as possible-edges (but it's still rendered normally, without using the edge-color). Once all opaque polygons (*) have been rendered, the edge color is applied to these flagged pixels, under the following conditions: At least one of the four surrounding pixels (up, down, left, right) must have a different polygon_id than the edge, and, the edge depth must be LESS than the depth of that surrounding pixel (ie. no edges are rendered if the depth is GREATER or EQUAL, even if the polygon_id differs). At the screen borders, edges seem to be rendered in respect to the rear-plane's polygon_id entry (see Port 4000350h).
(*) Actually, edge-marking is reportedly not performed until all opaque AND translucent polygons have been rendered. That brings up some effects/problems when edges are covered by translucent polys: The edge-color is probably drawn as is (ie. it'll overwrite the translucent color, rather than being blended with the translucent color). And, any translucent polygons that do update the depth buffer will cause total edge-marking malfunction (since edge-marking involves the comparison of the current/surrounding pixel's depth values).
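The edge color selection and the condition for a flagged pixel can be sketched as follows (hypothetical helper names; each pixel is represented as a (polygon_id, depth) tuple, and the handling of the screen borders is left out):

```python
def edge_color_index(polygon_id: int) -> int:
    """Edge Color 0 for ID 00h..07h, Color 1 for ID 08h..0Fh, and so on."""
    if not 0 <= polygon_id <= 0x3F:
        raise ValueError("polygon ID must be 00h..3Fh")
    return polygon_id >> 3

def is_edge_marked(center, neighbours) -> bool:
    """center: (polygon_id, depth) of a pixel flagged as possible-edge.
    neighbours: the up/down/left/right pixels, in the same format."""
    center_id, center_depth = center
    return any(n_id != center_id and center_depth < n_depth
               for n_id, n_depth in neighbours)
```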

4000358h - FOG_COLOR - Fog Color (W)
Fog can be used to let more distant polygons disappear in foggy grayness (or in darkness, or another color). This is particularly useful to "hide" the far clip plane. Fog can be enabled in DISP3DCNT.Bit7; moreover, when enabled, it can be activated or deactivated per polygon (POLYGON_ATTR.Bit15), and per Rear-plane (see there).
  0-4    Fog Color, Red     ;\
  5-9    Fog Color, Green   ; used only when DISP3DCNT.Bit6 is zero
  10-14  Fog Color, Blue    ;/
  15     Not used
  16-20  Fog Alpha          ;-used regardless of DISP3DCNT.Bit6
  21-31  Not used
Whether or not fog is applied to a pixel depends on the Fog flag in the framebuffer, the initial value of that flag can be defined in the rear-plane. When rendering opaque pixels, the framebuffer's fog flag gets replaced by PolygonAttr.Bit15. When rendering translucent pixels, the old flag in the framebuffer gets ANDed with PolygonAttr.Bit15.

400035Ch - FOG_OFFSET - Fog Depth Offset (W)
  0-14   Fog Offset (Unsigned) (0..7FFFh)
  15-31  Not used
FogDepthBoundary[0..31] (for FogDensity[0..31]) are defined as:
  FogDepthBoundary[n] = FOG_OFFSET + FOG_STEP*(n+1)   ;with n = 0..31
Whereas FOG_STEP is derived from the FOG_SHIFT value in DISP3DCNT.Bit8-11 (FOG_STEP=400h shr FOG_SHIFT). Normally FOG_SHIFT should be 0..10 (bigger shift amounts of 11..15 would cause FOG_STEP to become zero, so only Density[0] and Density[31] would be used).
The meaning of the depth values depends on whether z-values or w-values are stored in the framebuffer (see SwapBuffers.Bit1).
For translucent polygons, the depth value (and therefore: the amount of fog) depends on the depth update bit (see PolygonAttr.Bit11).
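As a small worked example, the boundary formula above can be written as a Python helper. The function name is hypothetical; the arithmetic follows the FOG_OFFSET/FOG_STEP definition given in the text.

```python
def fog_depth_boundaries(fog_offset, fog_shift):
    """Return FogDepthBoundary[0..31] for the given FOG_OFFSET and FOG_SHIFT."""
    fog_step = 0x400 >> (fog_shift & 0xF)   # becomes zero for shifts 11..15
    offset = fog_offset & 0x7FFF            # FOG_OFFSET is 15bit unsigned
    return [offset + fog_step * (n + 1) for n in range(32)]
```

With FOG_OFFSET=0 and FOG_SHIFT=0 the 32 boundaries run from 400h to 8000h; with FOG_SHIFT=11 all boundaries collapse onto FOG_OFFSET.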

4000360h..37Fh - FOG_TABLE - Fog Density Table (W)
This 32-byte region contains FogDensity[0..31] (used at FogDepthBoundary[n]),
  0-6    Fog Density (00h..7Fh = None..Full) (usually increasing values)
  7      Not used
FogDensity[0] is used for all pixels closer than FogDepthBoundary[0], FogDensity[31] is used for all pixels more distant than FogDepthBoundary[31].
Density is linearly interpolated for pixels that are between two Density depth boundaries. The formula for Fog Blending is:
  FrameBuffer[R] = (FogColor[R]*Density + FrameBuffer[R]*(128-Density)) / 128
  FrameBuffer[G] = (FogColor[G]*Density + FrameBuffer[G]*(128-Density)) / 128
  FrameBuffer[B] = (FogColor[B]*Density + FrameBuffer[B]*(128-Density)) / 128
  FrameBuffer[A] = (FogColor[A]*Density + FrameBuffer[A]*(128-Density)) / 128
If DISP3DCNT.Bit6 is set (=Alpha Only), then only FrameBuffer[A] is updated, and FrameBuffer[RGB] are kept unchanged. Density=127 is handled as if Density=128.
Fog Glitch: The fog_alpha value appears to be ignored (treated as fog_alpha=1Fh) in the region up to the first density boundary. However, normally that value will be multiplied by zero (assuming that density[0] is usually zero), so you won't ever notice that hardware glitch.
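The density lookup and the blending formula can be sketched as below. The exact interpolation arithmetic of the hardware is not given in the text, so the integer interpolation here is an assumption, the function names are hypothetical, and the Fog Glitch is not modeled.

```python
def fog_density_at(depth, boundaries, densities):
    """Pick FogDensity for a depth value, interpolating between boundaries.

    boundaries and densities are 32-entry lists (FogDepthBoundary, FogDensity).
    """
    if depth <= boundaries[0]:
        return densities[0]
    if depth >= boundaries[31]:
        return densities[31]
    for n in range(31):
        lo, hi = boundaries[n], boundaries[n + 1]
        if lo <= depth < hi:
            # Assumed plain integer linear interpolation between two entries.
            return densities[n] + (densities[n + 1] - densities[n]) * (depth - lo) // (hi - lo)
    return densities[31]


def fog_blend_channel(fog, fb, density):
    """Blend one color/alpha channel; Density=127 is handled as 128."""
    d = 128 if density == 127 else density
    return (fog * d + fb * (128 - d)) // 128
```

In Alpha Only mode (DISP3DCNT.Bit6 set), fog_blend_channel would be applied to the alpha channel only.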

Alpha-Blending (Polygon vs FrameBuffer)
Alpha-Blending occurs for pixels of translucent polygons,
  FrameBuf[R] = (Poly[R]*(Poly[A]+1) + FrameBuf[R]*(31-Poly[A])) / 32
  FrameBuf[G] = (Poly[G]*(Poly[A]+1) + FrameBuf[G]*(31-Poly[A])) / 32
  FrameBuf[B] = (Poly[B]*(Poly[A]+1) + FrameBuf[B]*(31-Poly[A])) / 32
  FrameBuf[A] = max(Poly[A],FrameBuf[A])
There are three situations in which Alpha-Blending is bypassed (the old Framebuf[R,G,B,A] value is then simply overwritten by Poly[R,G,B,A]):
  1) Alpha-Blending is disabled                       (DISP3DCNT.Bit3=0)
  2) The polygon pixel is opaque                      (Poly[A]=31)
  3) The old framebuffer value is totally transparent (FrameBuf[A]=0)
The third case can happen if the rear-plane was initialized with Alpha=0, which causes the polygon not to be blended with the rear-plane (which may give better results when subsequently blending the 3D layer with the 2D engine).
Note: Totally transparent pixels (with Poly[A]=0) are not rendered (ie. neither FrameBuf[R,G,B,A] nor FrameBuf[Depth,Fog,PolyID,etc.] are updated).
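The blending rules and the three bypass cases can be combined into one small function. This is a sketch under the rules stated above; the (R,G,B,A) tuple layout and the function name are assumptions for illustration.

```python
def blend_pixel(poly, frame, blending_enabled=True):
    """Combine a polygon pixel with the framebuffer; both are (R, G, B, A) tuples.

    Color components are 5bit (0..31), alpha is 5bit (0..31).
    """
    pr, pg, pb, pa = poly
    fr, fg, fbl, fa = frame
    if pa == 0:
        return frame                      # totally transparent: not rendered
    if not blending_enabled or pa == 31 or fa == 0:
        return poly                       # bypass: framebuffer is overwritten

    def mix(p, f):
        return (p * (pa + 1) + f * (31 - pa)) // 32

    return (mix(pr, fr), mix(pg, fg), mix(pb, fbl), max(pa, fa))
```

For example, a half-transparent red pixel (alpha=15) over solid green yields (15,15,0,31).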

Anti-Aliasing
Anti-Aliasing can be enabled in DISP3DCNT, when enabled, the edges of opaque polygons will be anti-aliased (ie. the pixels at the edges may become translucent).
Anti-Aliasing is not applied to translucent polygons. And, Anti-Aliasing is not applied to the interiors of the polygons (eg. an 8x8 chessboard texture will be anti-aliased only at the board edges, not at the edges of the 64 fields).
Anti-Aliasing is (accidentally) applied to opaque 1dot polygons, line-segments and wire-frames (which results in dirty lines with missing pixels, 1dot polys become totally invisible), workaround is to use translucent dots, lines and wires (eg. with alpha=30).
Anti-Aliasing is (correctly) not applied to edges of Edge-Marked polygons, in that special case even opaque line-segments and wire-frames are working even if anti-aliasing is enabled (provided that they are edge-marked, ie. if their polygon ID differs from the framebuffer's ID).
Anti-Aliasing is (accidentally) making the edges of Edge-Marked polygons translucent (with alpha=16 or so?), that reduces the contrast of the edge colors. Moreover, if two of these translucent edges do overlap, then they are blended twice (even if they have the same polygon_id, and even if the depth_update bit in polygon_attr is set; both should normally prevent double-blending), that scatters the brightness of such edges.

Polygon Size
In some cases, the NDS hardware doesn't render the lower/right edges of certain polygons. That feature reduces rendering load, and, when rendering connected polygons (eg. strips), it'd be unnecessary to render those edges (since they'd overlap with the upper/left edges of the other polygon). On the other hand, if there's no connected polygon displayed, then the polygon may appear smaller than expected. Small polygons with excluded edges are:
  Opaque polygons (except wire-frames) without Edge-Marking and Anti-Aliasing,
  and, all polygons with vertical right-edges (except line-segments).
  Plus, Translucent Polys when Alpha-Blending is disabled in DISP3DCNT.Bit3.
All other polygons are rendered at full size with all edges included (except vertical right edges). Note: To disable the small-polygon feature, you can enable edge-marking (which does increase the polygon size, even if no edges are drawn, ie. even if all polys do have the same ID).

  DS 3D Status

4000600h - GXSTAT - Geometry Engine Status Register (R and R/W)
Bit 30-31 are R/W. Writing "1" to Bit15 does reset the Error Flag (Bit15), and additionally resets the Projection Stack Pointer (Bit13), and probably (?) also the Texture Stack Pointer. All other GXSTAT bits are read-only.
  0     BoxTest,PositionTest,VectorTest Busy (0=Ready, 1=Busy)
  1     BoxTest Result  (0=All Outside View, 1=Parts or Fully Inside View)
  2-7   Not used
  8-12  Position & Vector Matrix Stack Level (0..31) (lower 5bit of 6bit value)
  13    Projection Matrix Stack Level        (0..1)
  14    Matrix Stack Busy (0=No, 1=Yes; Currently executing a Push/Pop command)
  15    Matrix Stack Overflow/Underflow Error (0=No, 1=Error/Acknowledge/Reset)
  16-24 Number of 40bit-entries in Command FIFO  (0..256)
 (24)   Command FIFO Full (MSB of above)  (0=No, 1=Yes; Full)
  25    Command FIFO Less Than Half Full  (0=No, 1=Yes; Less than Half-full)
  26    Command FIFO Empty                (0=No, 1=Yes; Empty)
  27    Geometry Engine Busy (0=No, 1=Yes; Busy; Commands are executing)
  28-29 Not used
  30-31 Command FIFO IRQ (0=Never, 1=Less than half full, 2=Empty, 3=Reserved)
When GXFIFO IRQ is enabled (setting 1 or 2), the IRQ flag (IF.Bit21) is set while and as long as the IRQ condition is true (and attempts to acknowledge the IRQ by writing to IF.Bit21 have no effect). Therefore, the IRQ handler must either fill the FIFO, or disable the IRQ (setting 0), BEFORE trying to acknowledge the IRQ.
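For reference, the GXSTAT bit layout above can be unpacked like this. The dictionary keys are names chosen for this example only; the bit positions are taken from the table.

```python
def decode_gxstat(value):
    """Split a 32bit GXSTAT value into its documented fields."""
    return {
        "test_busy":           bool(value & 1),
        "box_test_result":     bool((value >> 1) & 1),
        "pos_vec_stack_level": (value >> 8) & 0x1F,
        "proj_stack_level":    (value >> 13) & 1,
        "stack_busy":          bool((value >> 14) & 1),
        "stack_error":         bool((value >> 15) & 1),
        "fifo_entries":        (value >> 16) & 0x1FF,   # 0..256
        "fifo_full":           bool((value >> 24) & 1),  # MSB of fifo_entries
        "fifo_less_than_half": bool((value >> 25) & 1),
        "fifo_empty":          bool((value >> 26) & 1),
        "engine_busy":         bool((value >> 27) & 1),
        "fifo_irq_mode":       (value >> 30) & 3,
    }
```

A typical use would be polling test_busy after BoxTest/PositionTest/VectorTest, or checking stack_error after matrix Push/Pop sequences.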

4000604h - RAM_COUNT - Polygon List & Vertex RAM Count Register (R)
  0-11   Number of Polygons currently stored in Polygon List RAM (0..2048)
  12-15  Not used
  16-28  Number of Vertices currently stored in Vertex RAM       (0..6144)
  29-31  Not used
If a SwapBuffers command has been sent, then the counters are reset 10 cycles (at 33.51MHz clock) after next VBlank.

4000320h - RDLINES_COUNT - Rendered Line Count Register (R)
Rendering starts in scanline 214, the rendered lines are stored in a buffer that can hold up to 48 scanlines. The actual screen output begins after scanline 262, the lines are then read from the buffer and sent to the display. Simultaneously, the rendering engine keeps writing new lines to the buffer (ideally at the same speed as the display output, so the buffer would always contain 48 pre-calculated lines).
  0-5    Minimum Number (minus 2) of buffered lines in previous frame (0..46)
  6-31   Not used
If rendering becomes slower than the display output, then the number of buffered lines decreases. Smaller values in RDLINES indicate that additional load to the rendering engine may cause buffer underflows in further frames, if so, the program should reduce the number of polygons to avoid display glitches.
Even if RDLINES becomes zero, it doesn't indicate whether actual buffer underflows have occurred or not (underflows are indicated in DISP3DCNT Bit12).

  DS 3D Tests

40005C0h - Cmd 70h - BOX_TEST - Test if Cuboid Sits inside View Volume (W)
The BoxTest result indicates whether one or more of the 6 faces of the box are fully or partly inside the view volume. This can be used to reduce unnecessary load, ie. if the result is false, then the program can skip drawing objects which are inside the box.
BoxTest verifies only whether the faces of the box are inside the view volume, and so, it will return false if the whole view volume is located inside the box (even though objects inside the box may still be inside the view).
  Parameter 1, Bit 0-15   X-Coordinate
  Parameter 1, Bit 16-31  Y-Coordinate
  Parameter 2, Bit 0-15   Z-Coordinate
  Parameter 2, Bit 16-31  Width  (presumably: X-Offset?)
  Parameter 3, Bit 0-15   Height (presumably: Y-Offset?)
  Parameter 3, Bit 16-31  Depth  (presumably: Z-Offset?)
  All values are 1bit sign, 3bit integer, 12bit fractional part
The result of the "coordinate+offset" additions should not overflow 16bit vertex coordinate range (1bit sign, 3bit integer, 12bit fraction).
Before using BoxTest, be sure that far-plane-intersecting & 1-dot polygons are enabled, if they aren't: Send the PolygonAttr command (with bit12,13 set to enable them), followed by dummy Begin and End commands (required to apply the new PolygonAttr settings). BoxTest should not be issued within Begin/End.
After sending the BoxTest command, wait until GXSTAT.Bit0 indicates Ready, then read the result from GXSTAT.Bit1.
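The three BOX_TEST parameter words can be assembled as sketched below. The helper names are hypothetical, and rounding floats to the nearest 1/4096 is an assumption of this example; the overflow check mirrors the "coordinate+offset" restriction mentioned above.

```python
def to_fx16(x):
    """Convert a float to 1bit sign, 3bit integer, 12bit fraction (as unsigned 16bit)."""
    raw = int(round(x * 4096))
    if not -0x8000 <= raw <= 0x7FFF:
        raise ValueError("value does not fit into the 16bit vertex range")
    return raw & 0xFFFF


def box_test_params(x, y, z, width, height, depth):
    """Return the three 32bit parameter words for Cmd 70h (BOX_TEST)."""
    for base, size in ((x, width), (y, height), (z, depth)):
        to_fx16(base + size)          # raises if coordinate+offset overflows
    p = [to_fx16(v) for v in (x, y, z, width, height, depth)]
    return [p[0] | p[1] << 16, p[2] | p[3] << 16, p[4] | p[5] << 16]
```

For a 2x2x2 box with its corner at (-1,-1,-1), this yields F000F000h, 2000F000h, 20002000h.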

40005C4h - Cmd 71h - POS_TEST - Set Position Coordinates for Test (W)
  Parameter 1, Bit 0-15   X-Coordinate
  Parameter 1, Bit 16-31  Y-Coordinate
  Parameter 2, Bit 0-15   Z-Coordinate
  Parameter 2, Bit 16-31  Not used
  All values are 1bit sign, 3bit integer, 12bit fractional part.
Multiplies the specified line-vector (x,y,z,1) by the clip coordinate matrix.
After sending the command, wait until GXSTAT.Bit0 indicates Ready, then read the result from POS_RESULT registers. POS_TEST can be issued anywhere (except within polygon strips, huh?).
Caution: POS_TEST overwrites the internal VTX registers, so the next vertex should be <fully> defined by VTX_10 or VTX_16, otherwise, when using VTX_XY, VTX_XZ, VTX_YZ, or VTX_DIFF, then the new vertex will be relative to the POS_TEST coordinates (rather than to the previous vertex).

4000620h..62Fh - POS_RESULT - Position Test Results (R)
This 16-byte region (4 words) contains the resulting clip coordinates (x,y,z,w) from the POS_TEST command. Each value is 1bit sign, 19bit integer, 12bit fractional part.

40005C8h - Cmd 72h - VEC_TEST - Set Directional Vector for Test (W)
  Parameter 1, Bit 0-9    X-Component
  Parameter 1, Bit 10-19  Y-Component
  Parameter 1, Bit 20-29  Z-Component
  Parameter 1, Bit 30-31  Not used
  All values are 1bit sign, 9bit fractional part.
Multiplies the specified line-vector (x,y,z,0) by the directional vector matrix. As with the NORMAL command, it requires Matrix Mode 2 (ie. Position & Vector Simultaneous Set mode).
After sending the command, wait until GXSTAT.Bit0 indicates Ready, then read the result ("the directional vector in the View coordinate space") from VEC_RESULT registers.

4000630h..635h - VEC_RESULT - Vector Test Results (R)
This 6-byte region (3 halfwords) contains the resulting vector (x,y,z) from the VEC_TEST command. Each value is 4bit sign, 0bit integer, 12bit fractional part. The 4bit sign is either 0000b (positive) or 1111b (negative).
There is no integer part, so values >=1.0 or <-1.0 will cause overflows.
(Eg. +1.0 aka 1000h will be returned as -1.0 aka F000h due to overflow and sign-expansion).
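Packing the VEC_TEST parameter and decoding a VEC_RESULT halfword can be sketched as follows. The function names are hypothetical, and clamping the inputs to the representable range is an assumption of this example (the hardware formats themselves are taken from the text).

```python
def pack_vec_test(x, y, z):
    """Pack three components (1bit sign, 9bit fraction each) into one parameter word."""
    def fx10(v):
        raw = int(round(v * 512))
        raw = max(-512, min(511, raw))    # assumed clamping to -1.0..+0.998
        return raw & 0x3FF
    return fx10(x) | fx10(y) << 10 | fx10(z) << 20


def vec_result_to_float(halfword):
    """Decode one VEC_RESULT halfword (4bit sign, 12bit fraction)."""
    halfword &= 0xFFFF
    if halfword & 0x8000:
        halfword -= 0x10000
    return halfword / 4096.0
```

Decoding F000h gives -1.0, which matches the overflow case described above for a +1.0 result.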

  DS 3D Rear-Plane

Other docs seem to refer to this as Clear-plane, rather than Rear-plane, anyways, the plane can be an image, so it isn't always "cleared".
The view order is as such:
  --> 2D Layers --> 3D Polygons --> 3D Rear-plane --> 2D Layers --> 2D Backdrop
The rear-plane can be disabled (by making it transparent; alpha=0), so that the 2D layers become visible as background.
2D layers can be moved in front of, or behind the 3D layer-group (which is represented as BG0 to the 2D Engine), 2D layers behind BG0 can be used instead of, or additionally to the rear-plane.

The rear-plane can be initialized via below two registers (so all pixels in the plane have the same colors and attributes), this method is used when DISP3DCNT.14 is zero:

4000350h - CLEAR_COLOR - Clear Color Attribute Register (W)
  0-4    Clear Color, Red
  5-9    Clear Color, Green
  10-14  Clear Color, Blue
  15     Fog (enables Fog to the rear-plane) (doesn't affect Fog of polygons)
  16-20  Alpha
  21-23  Not used
  24-29  Clear Polygon ID (affects edge-marking, at the screen-edges?)
  30-31  Not used

4000354h - CLEAR_DEPTH - Clear Depth Register (W)
  0-14   Clear Depth (0..7FFFh) (usually 7FFFh = most distant)
  15     Not used
  16-31  See Port 4000356h, CLRIMAGE_OFFSET
The 15bit Depth is expanded to 24bit as "X=(X*200h)+((X+1)/8000h)*1FFh".
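The expansion formula can be checked with a short Python function (the function name is made up for this example): only the maximum value 7FFFh receives the extra 1FFh, so that it maps to the full 24bit maximum.

```python
def expand_depth(x):
    """Expand a 15bit clear-depth value to 24bit: X*200h + ((X+1)/8000h)*1FFh."""
    x &= 0x7FFF
    return x * 0x200 + ((x + 1) // 0x8000) * 0x1FF
```

For example, 7FFFh expands to FFFFFFh, while 7FFEh expands to FFFC00h.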

Rear Color/Depth Bitmaps
Alternatively, the rear-plane can be initialized by bitmap data (allowing different colors & attributes to be assigned to each pixel), this method is used when DISP3DCNT.14 is set:
The rear-plane then consists of two bitmaps (one with color data, one with depth data), each containing 256x256 16bit entries, and so, each occupying a whole 128K slot,
  Rear Color Bitmap (located in Texture Slot 2)
    0-4    Clear Color, Red
    5-9    Clear Color, Green
    10-14  Clear Color, Blue
    15     Alpha (0=Transparent, 1=Solid) (equivalent to 5bit-alpha 0 and 31)
  Rear Depth Bitmap (located in Texture Slot 3)
    0-14   Clear Depth, expanded to 24bit as X=(X*200h)+((X+1)/8000h)*1FFh
    15     Clear Fog (Initial fog enable value)
This method requires VRAM to be allocated to Texture Slot 2 and 3 (see Memory Control chapter). Of course, in that case the VRAM is used as Rear-plane, and cannot be used for Textures.
The bitmap method is restricted to 1bit alpha values (the register-method allows to use a 5bit alpha value).
The Clear Polygon ID is kept defined in the CLEAR_COLOR register, even in bitmap mode.

4000356h - CLRIMAGE_OFFSET - Rear-plane Bitmap Scroll Offsets (W)
The visible portion of the bitmap is 256x192 pixels (regardless of the viewport setting, which is used only for polygon clipping). Internally, the bitmap is 256x256 pixels, so the bottom-most 64 rows are usually offscreen, unless scrolling is used to move them into view.
  Bit0-7   X-Offset (0..255; 0=left column of bitmap)
  Bit8-15  Y-Offset (0..255; 0=upper row of bitmap)
The bitmap wraps to the upper/left edges when exceeding the lower/right edges.
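The scrolling and wrapping behaviour can be illustrated as below. This sketch assumes 8bit X and Y offset fields (0..255 each) and a row-major 256x256 entry layout; the function name is hypothetical.

```python
def rear_bitmap_index(screen_x, screen_y, clrimage_offset):
    """Map a screen pixel (0..255, 0..191) to an entry index in the 256x256 bitmap."""
    x_ofs = clrimage_offset & 0xFF
    y_ofs = (clrimage_offset >> 8) & 0xFF
    bx = (screen_x + x_ofs) & 0xFF    # wraps to the left edge
    by = (screen_y + y_ofs) & 0xFF    # wraps to the upper edge
    return by * 256 + bx              # assumed row-major entry order
```

With a Y-Offset of 64, the bottom-most screen line (191) reads bitmap row 255, ie. the normally offscreen rows are moved into view.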

  DS 3D Final 2D Output

The final 3D image (consisting of polygons and rear-plane) is passed to 2D Engine A as BG0 layer (provided that DISPCNT is configured to use 3D as BG0).

The BG0HOFS register (4000010h) can be used to scroll the 3D layer horizontally, the scroll region is 512 pixels, consisting of 256 pixels for the 3D image, followed by 256 transparent pixels, and then wrapped to the 3D image again. Vertical scrolling (and rotation/scaling) cannot be used on the 3D layer.

BG Priority Order
The lower 2bit of the BG0CNT register (4000008h) control the priority relative to other BGs and OBJs, so the 3D layer can be in front of or behind 2D layers. All other bits in BG0CNT have no effect on 3D, namely, mosaic cannot be used on the 3D layer.

Special Effects
Special Effects Registers (4000050h..54h) can be used as such:
  Brightness up/down with BG0 as 1st Target via EVY   (as for 2D)
  Blending with BG0 as 2nd Target via EVA/EVB         (as for 2D)
  Blending with BG0 as 1st Target via 3D Alpha-values (unlike as for 2D)
The latter method probably (?) uses per-pixel 3D alpha values as such: EVA=A/2, and EVB=16-A/2, without using the EVA/EVB settings in 4000052h.

Window Feature
Window Feature (4000040h..4Bh) can be used as for 2D.
"If the 3D screen has highest priority, then alpha-blending is always enabled, regardless of the Window Control register's color effect enable flag [ie. regardless of Bit5 of WIN0IN, WIN1IN, WINOBJ, WINOUT registers]"... not sure if that is true, and if it supersedes the effect selection in Port 4000050h...?

  DS Sound

The DS contains 16 hardware sound channels.
The console contains two speakers, arranged left and right of the upper screen, and so, provides stereo sound even without using the headphone socket.

DS Sound Channels 0..15
DS Sound Control Registers
DS Sound Capture
DS Sound Block Diagrams
DS Sound Notes

Power control
When restoring power supply to the sound circuit, do not output any sound during the first 15 milliseconds.

  DS Sound Channels 0..15

Each of the 16 sound channels occupies 16 bytes in the I/O region, starting with channel 0 at 4000400h..400040Fh, up to channel 15 at 40004F0h..40004FFh.

40004x0h - NDS7 - SOUNDxCNT - Sound Channel X Control Register (R/W)
  Bit0-6    Volume Mul   (0..127=silent..loud)
  Bit7      Not used     (always zero)
  Bit8-9    Volume Div   (0=Normal, 1=Div2, 2=Div4, 3=Div16)
  Bit10-14  Not used     (always zero)
  Bit15     Hold         (0=Normal, 1=Hold last sample after one-shot sound)
  Bit16-22  Panning      (0..127=left..right) (64=half volume on both speakers)
  Bit23     Not used     (always zero)
  Bit24-26  Wave Duty    (0..7) ;HIGH=(N+1)*12.5%, LOW=(7-N)*12.5% (PSG only)
  Bit27-28  Repeat Mode  (0=Manual, 1=Loop Infinite, 2=One-Shot, 3=Prohibited)
  Bit29-30  Format       (0=PCM8, 1=PCM16, 2=IMA-ADPCM, 3=PSG/Noise)
  Bit31     Start/Status (0=Stop, 1=Start/Busy)
All channels support ADPCM/PCM formats, PSG rectangular wave can be used only on channels 8..13, and white noise only on channels 14..15.
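A SOUNDxCNT value can be composed from the fields above as sketched here. The function and parameter names are chosen for this example; the bit positions follow the table.

```python
def make_soundcnt(volume, pan=64, fmt=0, repeat=1, div=0, duty=0,
                  hold=False, start=True):
    """Build a 32bit SOUNDxCNT value from its documented fields."""
    return ((volume & 0x7F)
            | (div & 3) << 8
            | int(hold) << 15
            | (pan & 0x7F) << 16
            | (duty & 7) << 24
            | (repeat & 3) << 27
            | (fmt & 3) << 29
            | int(start) << 31)
```

For example, a looped PCM16 sound at full volume and centered panning gives A840007H.. or more exactly make_soundcnt(127, pan=64, fmt=1, repeat=1) = A840007Fh.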

40004x4h - NDS7 - SOUNDxSAD - Sound Channel X Data Source Register (W)
  Bit0-26  Source Address (must be word aligned, bit0-1 are always zero)
  Bit27-31 Not used

40004x8h - NDS7 - SOUNDxTMR - Sound Channel X Timer Register (W)
  Bit0-15  Timer Value, Sample frequency, timerval=-(33513982/2)/freq
The PSG Duty Cycles are composed of eight "samples", and so, the frequency for Rectangular Wave is 1/8th of the selected sample frequency.
For PSG Noise, the noise frequency is equal to the sample frequency.
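The timer formula can be worked through in Python as below. The function names are hypothetical, and truncating the division (rather than rounding to nearest) is an assumption of this example.

```python
BUS_CLOCK_HZ = 33513982


def sound_timer(sample_freq_hz):
    """Compute the 16bit SOUNDxTMR value: timerval = -(33513982/2)/freq."""
    period = (BUS_CLOCK_HZ // 2) // sample_freq_hz
    if not 1 <= period <= 0x10000:
        raise ValueError("frequency out of range for a 16bit timer")
    return (-period) & 0xFFFF


def psg_timer(tone_freq_hz):
    """Rectangular wave: one cycle is eight samples, so sample at 8x the tone."""
    return sound_timer(tone_freq_hz * 8)
```

For instance, a 32768 Hz sample rate gives FE01h, and a 440 Hz rectangular wave (3520 Hz sample rate) gives ED68h.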

40004xAh - NDS7 - SOUNDxPNT - Sound Channel X Loopstart Register (W)
  Bit0-15  Loop Start, Sample loop start position
           (counted in words, ie. N*4 bytes)

40004xCh - NDS7 - SOUNDxLEN - Sound Channel X Length Register (W)
The number of samples for N words is 4*N PCM8 samples, 2*N PCM16 samples, or 8*(N-1) ADPCM samples (the first word containing the ADPCM header). The Sound Length is not used in PSG mode.
  Bit0-21  Sound length (counted in words, ie. N*4 bytes)
  Bit22-31 Not used
Minimum length (the sum of PNT+LEN) is 4 words (16 bytes), smaller values (0..3 words) cause hang-ups (the busy bit remains set indefinitely, but no sound output occurs).

In One-shot mode, the sound length is the sum of (PNT+LEN).
In Looped mode, the length is (1*PNT+Infinite*LEN), ie. the first part (PNT) is played once, the second part (LEN) is repeated infinitely.
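The word/sample relations and the minimum length rule can be expressed as follows. The function names and the format strings are assumptions made for this example.

```python
SAMPLES_PER_WORD = {"PCM8": 4, "PCM16": 2}


def samples_in_words(n_words, fmt):
    """Number of samples stored in n_words 32bit words for the given format."""
    if fmt == "ADPCM":
        # The first word holds the ADPCM header, the rest holds 8 samples each.
        return 8 * (n_words - 1) if n_words > 0 else 0
    return SAMPLES_PER_WORD[fmt] * n_words


def total_words(pnt, length):
    """Return PNT+LEN, rejecting totals below the 4 word minimum."""
    total = pnt + length
    if total < 4:
        raise ValueError("PNT+LEN below 4 words would hang the channel")
    return total
```

In looped mode, the first pnt words play once and the following length words repeat.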

  DS Sound Control Registers

4000500h - NDS7 - SOUNDCNT - Sound Control Register (R/W)
  Bit0-6   Master Volume       (0..127=silent..loud)
  Bit7     Not used            (always zero)
  Bit8-9   Left Output from    (0=Left Mixer, 1=Ch1, 2=Ch3, 3=Ch1+Ch3)
  Bit10-11 Right Output from   (0=Right Mixer, 1=Ch1, 2=Ch3, 3=Ch1+Ch3)
  Bit12    Output Ch1 to Mixer (0=Yes, 1=No) (both Left/Right)
  Bit13    Output Ch3 to Mixer (0=Yes, 1=No) (both Left/Right)
  Bit14    Not used            (always zero)
  Bit15    Master Enable       (0=Disable, 1=Enable)
  Bit16-31 Not used            (always zero)

4000504h - NDS7 - SOUNDBIAS - Sound Bias Register (R/W)
  Bit0-9   Sound Bias    (0..3FFh, usually 200h)
  Bit10-31 Not used      (always zero)
After applying the master volume, the signed left/right audio signals are in range -200h..+1FFh (with medium level zero), the Bias value is then added to convert the signed numbers into unsigned values (with medium level 200h).
BIAS output is always enabled, even when Master Enable (SOUNDCNT.15) is off.
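The bias step can be illustrated with a one-line helper. The function name is hypothetical; the clipping range 0..3FFh is the one given in the Sound Notes chapter for speaker outputs.

```python
def speaker_level(signed_sample, bias=0x200):
    """Add SOUNDBIAS to a signed mixer sample and clip to the 10bit range."""
    return max(0, min(0x3FF, signed_sample + bias))
```

With the usual bias of 200h, the signed range -200h..+1FFh maps exactly onto 0..3FFh.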

The sampling frequency of the mixer is 1.04876 MHz with an amplitude resolution of 24 bits, but the sampling frequency after mixing with PWM modulation is 32.768 kHz with an amplitude resolution of 10 bits.

  DS Sound Capture

The DS contains 2 built-in sound capture devices that can capture output waveform data to memory.
Sound capture 0 can capture output from left-mixer or output from channel 0.
Sound capture 1 can capture output from right-mixer or output from channel 2.

4000508h - NDS7 - SNDCAP0CNT - Sound Capture 0 Control Register (R/W)
4000509h - NDS7 - SNDCAP1CNT - Sound Capture 1 Control Register (R/W)
  Bit0     Control of Associated Sound Channels (ANDed with Bit7)
            SNDCAP0CNT: Output Sound Channel 1 (0=As such, 1=Add to Channel 0)
            SNDCAP1CNT: Output Sound Channel 3 (0=As such, 1=Add to Channel 2)
            Caution: Addition mode works only if BOTH Bit0 and Bit7 are set.
  Bit1     Capture Source Selection
            SNDCAP0CNT: Capture 0 Source (0=Left Mixer, 1=Channel 0/Bugged)
            SNDCAP1CNT: Capture 1 Source (0=Right Mixer, 1=Channel 2/Bugged)
  Bit2     Capture Repeat        (0=Loop, 1=One-shot)
  Bit3     Capture Format        (0=PCM16, 1=PCM8)
  Bit4-6   Not used              (always zero)
  Bit7     Capture Start/Status  (0=Stop, 1=Start/Busy)

4000510h - NDS7 - SNDCAP0DAD - Sound Capture 0 Destination Address (R/W)
4000518h - NDS7 - SNDCAP1DAD - Sound Capture 1 Destination Address (R/W)
  Bit0-26  Destination address (word aligned, bit0-1 are always zero)
  Bit27-31 Not used (always zero)
Capture start address (also used as re-start address for looped capture).

4000514h - NDS7 - SNDCAP0LEN - Sound Capture 0 Length (W)
400051Ch - NDS7 - SNDCAP1LEN - Sound Capture 1 Length (W)
  Bit0-15  Buffer length (1..FFFFh words) (ie. N*4 bytes)
  Bit16-31 Not used
Minimum length is 1 word (attempts to use 0 words are interpreted as 1 word).

SOUND1TMR - NDS7 - Sound Channel 1 Timer shared as Capture 0 Timer
SOUND3TMR - NDS7 - Sound Channel 3 Timer shared as Capture 1 Timer
There are no separate capture frequency registers, instead, the sample frequency of Channel 1/3 is shared for Capture 0/1. These channels are intended to output the captured data, so it makes sense that both capture and sound output use the same frequency.

In the following, the placeholders a, b, and x (in ch(a), ch(b), and SNDCAPxCNT) are:
  For Capture 0, a=0, b=1, x=0.
  For Capture 1, a=2, b=3, x=1.

Capture Bugs
The NDS contains two hardware bugs which do occur when capturing data from ch(a) (SNDCAPxCNT.Bit1=1), if so, either bug occurs depending on whether ch(a)+ch(b) addition is enabled or disabled (SNDCAPxCNT.Bit0).
  1) Both Negative Bug - SNDCAPxCNT Bit1=1, Bit0=0 (addition disabled)
     Capture data is accidentally set to -8000h if ch(a) and ch(b) are both <0.
     Otherwise the correct capture result is returned, ie. plain ch(a) data,
     not being affected by ch(b) (since addition is disabled).
     Workaround: Ensure that ch(a) and/or ch(b) are >=0 (or disabled).
  2) Overflow Bug - SNDCAPxCNT Bit1=1, Bit0=1 (addition enabled)
     In this mode, Capture data isn't clipped to MinMax(-8000h,+7FFFh),
     instead, it is ANDed with FFFFh, so the sign bit is lost if the
     addition result ch(a)+ch(b) is less/greater than -8000h/+7FFFh.
     Workaround: Reduce ch(a)/ch(b) volume or data to avoid overflows.
These bugs occur only for capture (speaker output remains intact), and they occur only when capturing ch(a) (capturing mixer-output works flawlessly).
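For emulation purposes, the two bugs might be modeled roughly like this. It is a sketch only: the function name is hypothetical, fractional bits are ignored, and reading the ANDed 16bit value back as a signed number is an assumption about how the captured data is interpreted.

```python
def capture_from_ch_a(ch_a, ch_b, addition_enabled):
    """Model the value captured when the capture source is ch(a) (Bit1=1)."""
    if not addition_enabled:
        # Both Negative Bug: result is forced to -8000h if both inputs are <0.
        if ch_a < 0 and ch_b < 0:
            return -0x8000
        return ch_a
    # Overflow Bug: the sum is ANDed with FFFFh instead of being clipped.
    total = (ch_a + ch_b) & 0xFFFF
    return total - 0x10000 if total & 0x8000 else total
```

For example, 7FFFh plus 1 with addition enabled wraps around to -8000h instead of being clipped to +7FFFh.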

ch(a)+ch(b) Channel Addition
The ch(a)+ch(b) addition unit has 2 outputs, with slightly different results:
 1) Addition Result for Capture(x) when using capture source=ch(a):
  Addition is performed always, no matter of SOUNDCNT.Bit12/13.
  And, no matter of ch(a) enable, result is plain ch(b) if ch(a) is disabled.
  Result is 16bit (plus fraction) with overflow error (see Capture Bugs).
 2) Addition Result for Mixer (towards speakers, and capture source=mixer):
  Ch(b) is muted if ch(a) is disabled.
  Ch(b) is muted if ch(b) SOUNDCNT.Bit12/13 is set to "Ch(b) not to mixer".
  Result is 17bit (plus fraction) without overflow error.
Addition mode can be used only if the <corresponding> capture unit is enabled, ie. if SNDCAPxCNT (Bit0 AND Bit7)=1. If so, addition affects both mixers (and so, may also affect the <other> capture unit if it reads from mixer).

  DS Sound Block Diagrams

Left Mixer with Capture 0
(Right Mixer with Capture 1, respectively)
  Ch0.L ------------->|     |  +------------------------------> to Capture 0
               ___    |     |  |                  ___
  Ch1.L ---+->|Sel|-->|     |  |       Ch0..Ch15 |   |
           |  |___|   |Left |--+---------------->|   |
  Ch2.L ---|--------->|Mixer|                    |Sel|   ______    ____
           |   ___    |     |                Ch1 |   |  |Master|  |Add |
  Ch3.L -+-|->|Sel|-->|     | +----------------->|   |->|Volume|->|Bias|-> L
         | |  |___|   |     | |                  |   |  |______|  |____|
  Ch4.L -|-|--------->|     | |              Ch3 |   |
  ...   -|-|--------->|     | | +--------------->|   |
  Ch15.L-|-|--------->|_____| | |   ___          |   |
         | +------------------+-|->|Add| Ch1+Ch3 |   |
         +----------------------+->|___|-------->|___|

Channel 0 and 1, Capture 0 with input from Left Mixer
(Channel 2 and 3, Capture 1 with input from Right Mixer, respectively)
  ____     _________     ___     ___      ___
 |FIFO|-->|Channel 0|-->|Vol|-->|Add|-+->|Pan|--> Ch0.L
 |____|   |_________|   |___|   |___| |  |___|--> Ch0.R
  ____     _________     ___      ^   |
 |FIFO|<--|Capture 0|<--|Sel|<----|---+
 |____|   |_ _____ _|   |___|<----|-------------- Left Mixer
  ____     _:Timer:_     ___     _|_      ___
 |FIFO|-->|Channel 1|-->|Vol|-->|Sel|--->|Pan|--> Ch1.L
 |____|   |_________|   |___|   |___|    |___|--> Ch1.R

Channel 4 (Channel 5..15, respectively)
  ____     _________     ___              ___
 |FIFO|-->|Channel 4|-->|Vol|----------->|Pan|--> Ch4.L
 |____|   |_________|   |___|            |___|--> Ch4.R

The FIFO isn't used in PSG/Noise modes (supported on channel 8..15).

  DS Sound Notes

Sound delayed Start/Restart (timing glitch)
A sound will be started/restarted when changing its start bit from 0 to 1, however, the sound won't start immediately: PSG/Noise starts after 1 sample, PCM starts after 3 samples, and ADPCM starts after 11 samples (3 dummy samples as for PCM, plus 8 dummy samples for the ADPCM header).

Sound Stop (timing note)
In one-shot mode, the Busy bit gets cleared automatically at the BEGIN of the last sample period, nevertheless (despite the cleared Busy bit) the last sample is kept output until the END of the last sample period (or, if the Hold flag is set, then the last sample is kept output infinitely, that is, until Hold gets cleared, or until the sound gets restarted).

Hold Flag (appears useless/bugged)
The Hold flag allows keeping the last sample being output infinitely after the end of one-shot sounds. This feature is probably intended to allow playing two continuous one-shot sound blocks (without producing any scratch noise upon small delays between both blocks, which would occur if the output level would drop to zero).
However, the feature doesn't work as intended. As described above, PCM8/PCM16 sound starts are delayed by 3 samples. With the Hold flag set, the old output level is actually kept intact during the 1st sample, but the output level drops to zero during the 2nd-3rd sample, before starting the new sound in the 4th sample.

7bit Volume and Panning Values
  data.vol   = data*N/128
  pan.left   = data*(128-N)/128
  pan.right  = data*N/128
  master.vol = data*N/128/64
Register settings of 0..126,127 are interpreted as N=0..126,128.
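The four formulas and the 127-to-128 mapping can be sketched as below. The function names are hypothetical, and the sketch works on plain integers with floor division, whereas the hardware keeps fractional bits between these steps (see Channel/Mixer Bit-Widths).

```python
def n7(reg):
    """Map a 7bit register setting 0..126,127 to N=0..126,128."""
    reg &= 0x7F
    return 128 if reg == 127 else reg


def apply_volume(data, vol_reg):
    return data * n7(vol_reg) // 128


def apply_pan(data, pan_reg):
    """Return the (left, right) pair for one channel."""
    n = n7(pan_reg)
    return data * (128 - n) // 128, data * n // 128


def apply_master(data, master_reg):
    return data * n7(master_reg) // 128 // 64
```

At maximum settings, a full-scale PCM16 sample of +7FFFh ends up as +1FFh after the master volume, ie. the top of the 10bit output range.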

Max Output Levels
When configured to max volume (and left-most or right-most panning), each channel can span the full 10bit output range (-200h..1FFh) on one speaker, as well as the full 16bit input range (-8000h..7FFFh) on one capture unit.
(It needs 2 channels to span the whole range on BOTH speakers/capture units.)
Together, all sixteen channels could thus reach levels up to -1E00h..21F0h (with default BIAS=200h) on one speaker, and -80000h..+7FFF0h on one capture unit. However, to avoid overflows, speaker outputs are clipped to MinMax(0,3FFh), and capture inputs to MinMax(-8000h..+7FFFh).

Channel/Mixer Bit-Widths
  Step                           Bits  Min        Max
  0 Incoming PCM16 Data          16.0  -8000h     +7FFFh
  1 Volume Divider (div 1..16)   16.4  -8000h     +7FFFh
  2 Volume Factor (mul N/128)    16.11 -8000h     +7FFFh
  3 Panning (mul N/128)          16.18 -8000h     +7FFFh
  4 Rounding Down (strip 10bit)  16.8  -8000h     +7FFFh
  5 Mixer (add channel 0..15)    20.8  -80000h    +7FFF0h
  6 Master Volume (mul N/128/64) 14.21 -2000h     +1FF0h
  7 Strip fraction               14.0  -2000h     +1FF0h
  8 Add Bias (0..3FFh, def=200h) 15.0  -2000h+0   +1FF0h+3FFh
  9 Clip (min/max 0h..3FFh)      10.0  0          +3FFh
Table shows integer.fractional bits, and min/max values (without fraction).

Capture Clipping/Rounding
Incoming ch(a) is NOT clipped, ch(a)+ch(b) may overflow (see Capture Bugs).
Incoming mixer data (20.8bits) is clipped to 16.8bits (MinMax -8000h..7FFFh).
For PCM8 capture format, the 16.8 bits are divided by 100h (=8.16 bits).
If the MSB of the fractional part is set, then data is rounded towards zero.
(Positive values are rounded down, negative values are rounded up.)
The fractional part is then discarded, and plain integer data is captured.

PSG Sound
The output volume equals to PCM16 values +7FFFh (HIGH) and -7FFFh (LOW).
PSG sound is always Infinite (the SOUNDxLEN Register, and the SOUNDxCNT Repeat Mode bits have no effect). The PSG hardware doesn't support sound length, sweep, or volume envelopes, however, these effects can be produced by software with little overhead (or, more typically, with enormous overhead, depending on the programming language used).

PSG Wave Duty (channel 8..13 in PSG mode)
Each duty cycle consists of eight HIGH or LOW samples, so the sound frequency is 1/8th of the selected sample rate. The duty cycle always starts at the beginning of the LOW period when the sound gets (re-)started.
  0  12.5% "_______-_______-_______-"
  1  25.0% "______--______--______--"
  2  37.5% "_____---_____---_____---"
  3  50.0% "____----____----____----"
  4  62.5% "___-----___-----___-----"
  5  75.0% "__------__------__------"
  6  87.5% "_-------_-------_-------"
  7   0.0% "________________________"
The Wave Duty bits exist and are read/write-able on all channels (although they are actually used only in PSG mode on channels 8-13).
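One duty cycle can be generated as shown below. The sketch follows the table above (including setting 7 producing 0%, ie. all LOW) and uses the PSG output levels +7FFFh/-7FFFh; the names are chosen for this example.

```python
PSG_HIGH, PSG_LOW = 0x7FFF, -0x7FFF


def psg_duty_cycle(duty):
    """Return the eight samples of one PSG cycle, starting with the LOW period."""
    duty &= 7
    if duty == 7:
        return [PSG_LOW] * 8              # 0.0% setting: never HIGH
    return [PSG_LOW] * (7 - duty) + [PSG_HIGH] * (duty + 1)
```

Setting 3 (50%) thus produces four LOW samples followed by four HIGH samples.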

PSG Noise (channel 14..15 in PSG mode)
Noise randomly switches between HIGH and LOW samples, the output levels are calculated, at the selected sample rate, as such:
  X=X SHR 1, IF carry THEN Out=LOW, X=X XOR 6000h ELSE Out=HIGH
The initial value when (re-)starting the sound is X=7FFFh. The formula is more or less the same as the "15bit polynomial counter" used on 8bit Gameboy and GBA.
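A direct transcription of the noise formula, as a sketch (psg_noise is a hypothetical helper name; True stands for HIGH and False for LOW):

```python
def psg_noise(count, x=0x7FFF):
    """Return (levels, x): `count` output levels and the final counter value."""
    levels = []
    for _ in range(count):
        carry = x & 1          # bit shifted out by "X SHR 1"
        x >>= 1
        if carry:
            levels.append(False)   # Out=LOW
            x ^= 0x6000
        else:
            levels.append(True)    # Out=HIGH
    return levels, x
```

One call per sample at the selected sample rate corresponds to count=1, passing the returned x back in on the next call.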

PCM8 and PCM16
Signed samples in range -80h..+7Fh (PCM8), or -8000h..+7FFFh (PCM16).
The output volume of PCM8=NNh is equal to PCM16=NN00h.

IMA-ADPCM is an Adaptive Differential Pulse Code Modulation (ADPCM) variant, designed by the Interactive Multimedia Association (IMA). The format is used, among others, in IMA-ADPCM compressed Windows .WAV files.
The NDS data consist of a 32bit header, followed by 4bit values (so each byte contains two values, the first value in the lower 4bits, the second in upper 4 bits). The 32bit header contains initial values:
  Bit0-15   Initial PCM16 Value (Pcm16bit = -7FFFh..+7FFFh) (not -8000h)
  Bit16-22  Initial Table Index Value (Index = 0..88)
  Bit23-31  Not used (zero)
In theory, the 4bit values are decoded into PCM16 values, as such:
  Diff = ((Data4bit AND 7)*2+1)*AdpcmTable[Index]/8      ;see rounding-error
  IF (Data4bit AND 8)=0 THEN Pcm16bit = Max(Pcm16bit+Diff,+7FFFh)
  IF (Data4bit AND 8)=8 THEN Pcm16bit = Min(Pcm16bit-Diff,-7FFFh)
  Index = MinMax (Index+IndexTable[Data4bit AND 7],0,88)
In practice, the first line works like so (with rounding-error):
  Diff = AdpcmTable[Index]/8
  IF (data4bit AND 1) THEN Diff = Diff + AdpcmTable[Index]/4
  IF (data4bit AND 2) THEN Diff = Diff + AdpcmTable[Index]/2
  IF (data4bit AND 4) THEN Diff = Diff + AdpcmTable[Index]/1
And, a note on the second/third lines (with clipping-error; here "Max" means clipping to the upper limit, and "Min" means clipping to the lower limit):
  Max(+7FFFh) leaves -8000h unclipped (can happen if initial PCM16 was -8000h)
  Min(-7FFFh) clips -8000h to -7FFFh (possibly unlike windows .WAV files?)
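Whereas, IndexTable[0..7] = -1,-1,-1,-1,2,4,6,8. And AdpcmTable [0..88] =
  0007h,0008h,0009h,000Ah,000Bh,000Ch,000Dh,000Eh,0010h,0011h,0013h,0015h
  0017h,0019h,001Ch,001Fh,0022h,0025h,0029h,002Dh,0032h,0037h,003Ch,0042h
  0049h,0050h,0058h,0061h,006Bh,0076h,0082h,008Fh,009Dh,00ADh,00BEh,00D1h
  00E6h,00FDh,0117h,0133h,0151h,0173h,0198h,01C1h,01EEh,0220h,0256h,0292h
  02D4h,031Ch,036Ch,03C3h,0424h,048Eh,0502h,0583h,0610h,06ABh,0756h,0812h
  08E0h,09C3h,0ABDh,0BD0h,0CFFh,0E4Ch,0FBAh,114Ch,1307h,14EEh,1706h,1954h
  1BDCh,1EA5h,21B6h,2515h,28CAh,2CDFh,315Bh,364Bh,3BB9h,41B2h,4844h,4F7Eh
  5771h,602Fh,69CEh,7462h,7FFFh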
The closest way to reproduce the AdpcmTable with 32bit integer maths appears to be:
  X=000776d2h, FOR I=0 TO 88, Table[I]=X SHR 16, X=X+(X/10), NEXT I
  Table[3]=000Ah, Table[4]=000Bh, Table[88]=7FFFh, Table[89..127]=0000h
When using ADPCM and loops, set the loopstart position to the data part, rather than the header. At the loop end, the SAD value is reloaded to the loop start location, additionally index and pcm16 values are reloaded to the values that have originally appeared at that location. Do not change the ADPCM loop start position during playback.
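The decoding steps can be sketched as below. This is not a reference implementation: it builds the table with the 32bit integer formula quoted above, uses the "in practice" Diff calculation and the clipping limits as described, and ignores looping. The function names, and the clamping of an out-of-range header index to 88, are assumptions made for the example.

```python
INDEX_TABLE = [-1, -1, -1, -1, 2, 4, 6, 8]


def make_adpcm_table():
    """Build AdpcmTable[0..88] with the integer formula from the text."""
    table = []
    x = 0x000776D2
    for _ in range(89):
        table.append(x >> 16)
        x += x // 10
    # Fix-ups listed in the text.
    table[3], table[4], table[88] = 0x000A, 0x000B, 0x7FFF
    return table


ADPCM_TABLE = make_adpcm_table()


def decode_adpcm(data):
    """Decode a 32bit header plus 4bit values into a list of PCM16 samples."""
    header = int.from_bytes(data[:4], "little")
    pcm = header & 0xFFFF
    if pcm & 0x8000:
        pcm -= 0x10000                       # sign-extend Bit0-15
    index = min((header >> 16) & 0x7F, 88)   # clamping is an assumption
    samples = []
    for byte in data[4:]:
        # First value in the lower 4bits, second value in the upper 4bits.
        for nibble in (byte & 0x0F, byte >> 4):
            step = ADPCM_TABLE[index]
            diff = step // 8                 # variant with rounding-error
            if nibble & 1:
                diff += step // 4
            if nibble & 2:
                diff += step // 2
            if nibble & 4:
                diff += step
            if nibble & 8:
                pcm = max(pcm - diff, -0x7FFF)   # lower limit -7FFFh
            else:
                pcm = min(pcm + diff, 0x7FFF)    # upper limit +7FFFh
            index = max(0, min(88, index + INDEX_TABLE[nibble & 7]))
            samples.append(pcm)
    return samples
```

Note that, as in the clipping-error note, an initial value of -8000h is left unclipped by the positive branch.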

Microphone Input
For Microphone (and Touchscreen) inputs, see
DS Touch Screen Controller (TSC)

  DS System and Built-in Peripherals

DS DMA Transfers
DS Timers
DS Interrupts
DS Maths
DS Inter Process Communication (IPC)
DS Keypad
DS Absent Link Port
DS Real-Time Clock (RTC)
DS Serial Peripheral Interface Bus (SPI)
DS Touch Screen Controller (TSC)
DS Power Management
DS Backwards-compatible GBA-Mode
DS Debug Registers (Emulator/Devkits)

  DS DMA Transfers

The DS includes four DMA channels for each CPU (ie. eight channels in total), which are working more or less the same as on GBA:
GBA DMA Transfers
All NDS9 and NDS7 DMA Registers are R/W. The gamepak bit (Bit 27) has been removed (on the NDS9 the bit is used to expand the mode setting to 3bits).

On the NDS9, the word count of all channels is expanded to 21bits (max 1..1FFFFFh units, or 0=200000h units), and SAD/DAD registers for all channels support ranges of 0..0FFFFFFEh. The NDS9 transfer modes (DMACNT Bit27-29) are:
  0  Start Immediately
  1  Start at V-Blank
  2  Start at H-Blank (paused during V-Blank)
  3  Synchronize to start of display
  4  Main memory display
  5  DS Cartridge Slot
  6  GBA Cartridge Slot
  7  Geometry Command FIFO

On the NDS7, Word Count, SAD, and DAD are R/W; aside from that, they have the same restrictions as on GBA (max 4000h or 10000h units, some addresses limited to 0..07FFFFFEh). DMACNT Bit27 is unused on NDS7. The NDS7 transfer modes (DMACNT Bit28-29) are:
  0  Start Immediately
  1  Start at V-Blank
  2  DS Cartridge Slot
  3  DMA0/DMA2: Wireless interrupt, DMA1/DMA3: GBA Cartridge Slot
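
The two mode fields differ in width and position, which is easy to get wrong when sharing code between both CPUs. A small sketch for extracting them from a 32bit DMACNT value (the helper names are hypothetical; only the bit positions and mode names are taken from the lists above):

```python
NDS9_DMA_MODES = [
    "Start Immediately", "Start at V-Blank", "Start at H-Blank",
    "Synchronize to start of display", "Main memory display",
    "DS Cartridge Slot", "GBA Cartridge Slot", "Geometry Command FIFO",
]


def nds9_dma_mode(dmacnt):
    """NDS9: 3bit transfer mode in DMACNT Bit27-29."""
    return (dmacnt >> 27) & 7


def nds7_dma_mode(dmacnt):
    """NDS7: 2bit transfer mode in DMACNT Bit28-29 (Bit27 is unused)."""
    return (dmacnt >> 28) & 3


def nds7_dma_mode_name(dmacnt, channel):
    """Mode 3 depends on the channel number (0..3)."""
    mode = nds7_dma_mode(dmacnt)
    if mode == 3:
        return "Wireless interrupt" if channel in (0, 2) else "GBA Cartridge Slot"
    return ["Start Immediately", "Start at V-Blank", "DS Cartridge Slot"][mode]


def nds9_dma_units(dmacnt):
    """NDS9: 21bit word count, where 0 means 200000h units."""
    count = dmacnt & 0x1FFFFF
    return count if count else 0x200000
```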

40000E0h - NDS9 only - DMA0FILL - DMA 0 Filldata (R/W)
40000E4h - NDS9 only - DMA1FILL - DMA 1 Filldata (R/W)
40000E8h - NDS9 only - DMA2FILL - DMA 2 Filldata (R/W)
40000ECh - NDS9 only - DMA3FILL - DMA 3 Filldata (R/W)
  Bit0-31 Filldata
The DMA Filldata registers contain 16 bytes of general purpose WRAM, intended to be used as fixed source addresses for DMA memfill operations.
This is useful because DMA cannot read from TCM, and reading from Main RAM would require dealing with the cache & write buffer.

NDS7 Sound DMA
The NDS additionally includes 16 Sound DMA channels, plus 2 Sound Capture DMA channels (see Sound chapter). The priority of these channels is unknown.

NDS9 Cache, Writebuffer, DTCM, and ITCM
Cache and tightly coupled memory are connected directly to the NDS9 CPU, without using the system bus. Consequently, DMA cannot access DTCM/ITCM, and access to cached memory regions must be handled with care: Drain the writebuffer before DMA-reads, and invalidate the cache after DMA-writes. See,
ARM CP15 System Control Coprocessor
The CPU can be kept running during DMA, provided that it is accessing only TCM (or cached memory), otherwise the CPU is halted until DMA finishes.
Respectively, interrupts executed during DMA will usually halt the CPU (unless the IRQ handler uses only TCM and cache; the IRQ vector at FFFF00xxh must be cached, or relocated to ITCM at 000000xxh, and the IRQ handler may not access IE, IF, or other I/O ports).

NDS Sequential Main Memory DMA
Main RAM has different access time for sequential and non-sequential access. Normally DMA uses sequential access (except for the first word), however, if the source and destination addresses are both in Main RAM, then all accesses become non-sequential. In that case it would be faster to use two DMA transfers, one from Main RAM to a scratch buffer in WRAM, and one from WRAM to Main RAM.

  DS Timers

Same as GBA, except F = 33.513982 MHz (for both NDS9 and NDS7).
GBA Timers
Both NDS9 and NDS7 have four Timers each, eight Timers in total.
The NDS sound controller has its own frequency generators (unlike the GBA, which needed to use Timers to drive channel A/B sounds).

  DS Interrupts

4000208h - NDS9/NDS7 - IME - Interrupt Master Enable (R/W)
  0     Disable all interrupts  (0=Disable All, 1=See IE register)
  1-31  Not used

4000210h - NDS9/NDS7 - IE - 32bit - Interrupt Enable (R/W)
4000214h - NDS9/NDS7 - IF - 32bit - Interrupt Request Flags (R/W)
Bits in the IE register are 0=Disable, 1=Enable.
Reading IF returns 0=No request, 1=Interrupt Request.
Writing IF acts as 0=No change, 1=Acknowledge (clears that bit).
  0     LCD V-Blank
  1     LCD H-Blank
  2     LCD V-Counter Match
  3     Timer 0 Overflow
  4     Timer 1 Overflow
  5     Timer 2 Overflow
  6     Timer 3 Overflow
  7     NDS7 only: SIO/RCNT/RTC (Real Time Clock)
  8     DMA 0
  9     DMA 1
  10    DMA 2
  11    DMA 3
  12    Keypad
  13    GBA-Slot (external IRQ source) / DSi: None such
  14    Not used                       / DSi9: NDS-Slot Card change?
  15    Not used                       / DSi: ditto for 2nd NDS-Slot?
  16    IPC Sync
  17    IPC Send FIFO Empty
  18    IPC Recv FIFO Not Empty
  19    NDS-Slot Game Card Data Transfer Completion
  20    NDS-Slot Game Card IREQ_MC
  21    NDS9 only: Geometry Command FIFO
  22    NDS7 only: Screens unfolding
  23    NDS7 only: SPI bus
  24    NDS7 only: Wifi    / DSi9: XpertTeak DSP
  25    Not used           / DSi9: Camera
  26    Not used           / DSi9: Undoc, IF.26 set on FFh-filling 40021Axh
  27    Not used           / DSi:  Maybe IREQ_MC for 2nd gamecard?
  28    Not used           / DSi: NewDMA0
  29    Not used           / DSi: NewDMA1
  30    Not used           / DSi: NewDMA2
  31    Not used           / DSi: NewDMA3
  ?     DSi7: any further new IRQs on ARM7 side...?
Raw TCM-only IRQs can be processed even during DMA ?
Trying to set all IE bits gives FFFFFFFFh (DSi7) or FFFFFF7Fh (DSi9).
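
The read/write behaviour of IME, IE, and IF can be summarized in a tiny model. This is only a sketch of the register semantics described above (function names are made up), not of the actual interrupt hardware:

```python
def irq_pending(ime, ie, if_):
    """Return the mask of requests that are both enabled and requested."""
    if not (ime & 1):            # IME Bit0 = 0 disables all interrupts
        return 0
    return ie & if_ & 0xFFFFFFFF


def write_if(if_, value):
    """Writing IF: 0=No change, 1=Acknowledge (clears that bit)."""
    return if_ & ~value & 0xFFFFFFFF
```

For example, acknowledging V-Blank (Bit0) while H-Blank (Bit1) is also pending leaves only Bit1 set.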

4000218h - DSi7 - IE2 - DSi7 Extra Interrupt Enable Bits
400021Ch - DSi7 - IF2 - DSi7 Extra Interrupt Flags
  0     DSi7: GPIO18[0]   ;\
  1     DSi7: GPIO18[1]   ; maybe 1.8V signals?
  2     DSi7: GPIO18[2]   ;/
  3     DSi7: Unused (0)
  4     DSi7: GPIO33[0] unknown (related to "GPIO330" testpoint on mainboard?)
  5     DSi7: GPIO33[1] Headphone connect (HP#SP) (static state)
  6     DSi7: GPIO33[2] Powerbutton interrupt (short pulse upon key-down)
  7     DSi7: GPIO33[3]
  8     DSi7: SD/MMC Controller   ;-Onboard eMMC and External SD Slot
  9     DSi7: SD Slot Data1 pin   ;-For SDIO hardware in External SD Slot
  10    DSi7: SDIO Controller     ;\Atheros Wifi Unit
  11    DSi7: SDIO Data1 pin      ;/
  12    DSi7: AES interrupt
  13    DSi7: I2C interrupt
  14    DSi7: Microphone Extended interrupt
  15-31 DSi7: Unused (0)
Trying to set all IE2 bits gives 00007FF7h (DSi7) or 00000000h (DSi9).

DTCM+3FFCh - NDS9 - IRQ Handler (hardcoded DTCM address)
380FFFCh - NDS7 - IRQ Handler (hardcoded RAM address)
  Bit 0-31  Pointer to IRQ Handler
NDS7 Handler must use ARM code, NDS9 Handler can be ARM/THUMB (Bit0=Thumb).

DTCM+3FF8h - NDS9 - IRQ Check Bits (hardcoded DTCM address)
380FFF8h - NDS7 - IRQ Check Bits (hardcoded RAM address)
  Bit 0-31  IRQ Flags (same format as IE/IF registers)
When processing & acknowledging interrupts via IF register, the user interrupt handler should also set the corresponding bits of the IRQ Check value (required for BIOS IntrWait and VBlankIntrWait SWI functions).

380FFC0h - DSi7 only - Extra IRQ Check Bits for IE2/IF2 (hardcoded RAM addr)
Same as the above 380FFF8h value, but for the new IE2/IF2 registers, intended for use with IntrWait and VBlankIntrWait functions. However, those functions are BUGGED on DSi and won't actually work in practice (they support only the new 380FFC0h bits, but accidentally ignore the old 380FFF8h bits).

--- Below for other (non-IRQ) exceptions ---

27FFD9Ch - RAM - NDS9 Debug Stacktop / Debug Vector (0=None)
380FFDCh - RAM - NDS7 Debug Stacktop / Debug Vector (0=None)
These addresses contain a 32bit pointer to the Debug Handler, and the memory below the addresses is used as Debug Stack. The debug handler is called on undefined instruction exceptions, on data/prefetch aborts (caused by the protection unit), and on FIQ (possibly caused by hardware debuggers). It is also called by accidental software-jumps to the reset vector, and by unused SWI numbers within range 0..1Fh.

  DS Maths

4000280h - NDS9 - DIVCNT - Division Control (R/W)
  0-1   Division Mode    (0-2=See below) (3=Reserved; same as Mode 1)
  2-13  Not used
  14    Division by zero (0=Okay, 1=Division by zero error; 64bit Denom=0)
  15    Busy             (0=Ready, 1=Busy) (Execution time see below)
  16-31 Not used
Division Modes and Busy Execution Times
  Mode  Numer / Denom = Result, Remainder ; Cycles
  0     32bit / 32bit = 32bit , 32bit     ; 18 clks
  1     64bit / 32bit = 64bit , 32bit     ; 34 clks
  2     64bit / 64bit = 64bit , 64bit     ; 34 clks
Division is started when writing to any of the DIVCNT/NUMER/DENOM registers.

4000290h - NDS9 - DIV_NUMER - 64bit Division Numerator (R/W)
4000298h - NDS9 - DIV_DENOM - 64bit Division Denominator (R/W)
Signed 64bit values (or signed 32bit values in 32bit modes, the upper 32bits are then unused, with one exception: the DIV0 flag in DIVCNT is set only if the full 64bit DIV_DENOM value is zero, even in 32bit mode).

40002A0h - NDS9 - DIV_RESULT - 64bit Division Quotient (=Numer/Denom) (R)
40002A8h - NDS9 - DIVREM_RESULT - 64bit Remainder (=Numer MOD Denom) (R)
Signed 64bit values (in 32bit modes, the values are sign-expanded to 64bit).

Division Overflows
Overflows occur on "DIV0" and "-MAX/-1" (eg. -80000000h/-1 in 32bit mode):
  DIV0     -->  REMAIN=NUMER, RESULT=+/-1 (with sign opposite of NUMER)
  -MAX/-1  -->  RESULT=-MAX               (instead of +MAX)
On overflows in 32bit/32bit=32bit mode: the upper 32bit of the sign-expanded 32bit result are inverted. This feature produces a correct 64bit (+MAX) result in case of the incorrect 32bit (-MAX) result. The feature also applies on DIV0 errors (which makes the sign-expanded 64bit result even more messed-up than the normal 32bit result).
The DIV0 flag in DIVCNT.14 indicates DENOM=0 errors (it does not indicate "-MAX/-1" errors). The DENOM=0 check relies on the full 64bit value (so, in 32bit mode, the flag works only if the unused upper 32bit of DENOM are zero).
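The overflow rules for the 32bit/32bit mode can be sketched as a small software model. It returns the raw 64bit register contents. Two things are assumptions and not stated in the text: that normal divisions truncate towards zero (remainder with the sign of the numerator), and that NUMER=0 counts as positive in the DIV0 case. The DIV0 flag itself is not modelled.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def _trunc_divmod(n, d):
    """Signed division truncating towards zero (assumed behaviour)."""
    q = abs(n) // abs(d)
    if (n < 0) != (d < 0):
        q = -q
    return q, n - q * d


def div32(numer, denom):
    """Model of Mode 0; returns raw 64bit (DIV_RESULT, DIVREM_RESULT)."""
    overflow = False
    if denom == 0:
        # DIV0: REMAIN=NUMER, RESULT=+/-1 with sign opposite of NUMER.
        quot, rem, overflow = (-1 if numer >= 0 else 1), numer, True
    elif numer == -0x80000000 and denom == -1:
        # -MAX/-1: the 32bit result stays -MAX.
        quot, rem, overflow = -0x80000000, 0, True
    else:
        quot, rem = _trunc_divmod(numer, denom)
    result = quot & MASK64               # sign-expanded to 64bit
    if overflow:
        result ^= 0xFFFFFFFF00000000     # upper 32bit are inverted
    return result, rem & MASK64
```

With this model, -80000000h/-1 yields 0000000080000000h (the correct +MAX as 64bit value), while 5/0 yields the messed-up 00000000FFFFFFFFh.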

40002B0h - NDS9 - SQRTCNT - Square Root Control (R/W)
  0     Mode (0=32bit input, 1=64bit input)
  1-14  Not used
  15    Busy (0=Ready, 1=Busy) (Execution time is 13 clks, in either Mode)
  16-31 Not used
Calculation is started when writing to any of the SQRTCNT/PARAM registers.

40002B4h - NDS9 - SQRT_RESULT - 32bit - Square Root Result (R)
40002B8h - NDS9 - SQRT_PARAM - 64bit - Square Root Parameter Input (R/W)
Unsigned 64bit parameter, and unsigned 32bit result.

IRQ Notes
Push all DIV/SQRT values (parameters and control registers) when using DIV/SQRT registers on interrupt level, and, after restoring them, be sure to wait until the busy flag goes off, before leaving the IRQ handler.

BIOS Notes
The NDS9 and NDS7 BIOSes additionally contain software based division and square root functions, which are NOT using above hardware registers (even the NDS9 functions are raw software).

Timing Notes
The Div/Sqrt timings are counted in 33.51MHz units. Although the calculations are quite fast, mind that reading/writing the result/parameter registers takes up additional clock cycles (especially due to the PENALTY cycle glitch for non-sequential accesses; parts of that problem can possibly be bypassed by using sequential STMIA/LDMIA opcodes). Nevertheless, in some cases, software may actually be faster than the hardware registers (eg. for small 8bit numbers); that, of course, NOT by using the BIOS software functions, which are endlessly inefficient.

  DS Inter Process Communication (IPC)

Allows exchanging status information between the ARM7 and ARM9 CPUs.
The register can be accessed simultaneously by both CPUs (without violating access permissions, and without generating waitstates at either side).

4000180h - NDS9/NDS7 - IPCSYNC - IPC Synchronize Register (R/W)
  Bit   Dir  Expl.
  0-3   R    Data input from IPCSYNC Bit8-11 of remote CPU (00h..0Fh)
  4-7   -    Not used
  8-11  R/W  Data output to IPCSYNC Bit0-3 of remote CPU   (00h..0Fh)
  12    -    Not used
  13    W    Send IRQ to remote CPU      (0=None, 1=Send IRQ)
  14    R/W  Enable IRQ from remote CPU  (0=Disable, 1=Enable)
  15-31 -    Not used
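
As a sketch of how the two copies of IPCSYNC relate to each other (hypothetical helper names; "written" means the last value each CPU wrote to its own IPCSYNC):

```python
def ipcsync_read(local_written, remote_written):
    """Value read from IPCSYNC by the local CPU."""
    value = local_written & 0x4F00             # own R/W bits: 8-11 and 14
    value |= (remote_written >> 8) & 0x000F    # remote Bit8-11 -> local Bit0-3
    return value


def ipcsync_irq_sent(sender_written, receiver_written):
    """True if a write with Bit13=1 raises the IRQ on the remote CPU."""
    return bool(sender_written & 0x2000) and bool(receiver_written & 0x4000)
```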

4000184h - NDS9/NDS7 - IPCFIFOCNT - IPC Fifo Control Register (R/W)
  Bit   Dir  Expl.
  0     R    Send Fifo Empty Status      (0=Not Empty, 1=Empty)
  1     R    Send Fifo Full Status       (0=Not Full, 1=Full)
  2     R/W  Send Fifo Empty IRQ         (0=Disable, 1=Enable)
  3     W    Send Fifo Clear             (0=Nothing, 1=Flush Send Fifo)
  4-7   -    Not used
  8     R    Receive Fifo Empty          (0=Not Empty, 1=Empty)
  9     R    Receive Fifo Full           (0=Not Full, 1=Full)
  10    R/W  Receive Fifo Not Empty IRQ  (0=Disable, 1=Enable)
  11-13 -    Not used
  14    R/W  Error, Read Empty/Send Full (0=No Error, 1=Error/Acknowledge)
  15    R/W  Enable Send/Receive Fifo    (0=Disable, 1=Enable)
  16-31 -    Not used

4000188h - NDS9/NDS7 - IPCFIFOSEND - IPC Send Fifo (W)
  Bit0-31  Send Fifo Data (max 16 words; 64bytes)

4100000h - NDS9/NDS7 - IPCFIFORECV - IPC Receive Fifo (R)
  Bit0-31  Receive Fifo Data (max 16 words; 64bytes)

When IPCFIFOCNT.15 is disabled: Writes to IPCFIFOSEND are ignored (no data is stored in the FIFO, the error bit doesn't get set though), and reads from IPCFIFORECV return the oldest FIFO word (as usual) (but without removing the word from the FIFO).
When the Receive FIFO is empty: Reading from IPCFIFORECV returns the most recently received word (if any), or ZERO (if there was no data, or if the FIFO was cleared via IPCFIFOCNT.3), and, in either case the error bit gets set.
The Fifo-IRQs are edge triggered, IF.17 gets set when the condition "(IPCFIFOCNT.2 AND IPCFIFOCNT.0)" changes from 0-to-1, and IF.18 gets set when "(IPCFIFOCNT.10 AND NOT IPCFIFOCNT.8)" changes from 0-to-1. The IRQ flags can be acknowledged even while those conditions are true.
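The FIFO rules above (16 words, error bit, behaviour when disabled or empty) can be sketched as a model of one direction, ie. the Send Fifo of one CPU, which is the Receive Fifo of the other. The class and attribute names are made up for the example; IRQs are not modelled, and reading an empty FIFO while disabled is assumed to behave like any other empty read.

```python
class IpcFifo:
    SIZE = 16                       # max 16 words (64 bytes)

    def __init__(self):
        self.words = []
        self.last = 0               # most recently received word
        self.send_enabled = True    # sender's IPCFIFOCNT.15
        self.recv_enabled = True    # receiver's IPCFIFOCNT.15
        self.send_error = False     # sender's IPCFIFOCNT.14
        self.recv_error = False     # receiver's IPCFIFOCNT.14

    def send(self, word):
        """Write to IPCFIFOSEND."""
        if not self.send_enabled:
            return                  # ignored, error bit doesn't get set
        if len(self.words) >= self.SIZE:
            self.send_error = True  # Send Full error
            return
        self.words.append(word & 0xFFFFFFFF)

    def recv(self):
        """Read from IPCFIFORECV."""
        if not self.words:
            self.recv_error = True  # Read Empty error
            return self.last        # most recent word, or zero
        if not self.recv_enabled:
            return self.words[0]    # oldest word, not removed
        self.last = self.words.pop(0)
        return self.last

    def clear(self):
        """Flush via IPCFIFOCNT.3; further empty reads return zero."""
        self.words.clear()
        self.last = 0
```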

  DS Keypad

For the GBA-buttons: Same as GBA, both ARM7 and ARM9 have keyboard input registers, and each has its own keypad IRQ control register.
GBA Keypad Input

For Touchscreen (and Microphone) inputs, see
DS Touch Screen Controller (TSC)

4000136h - NDS7 - EXTKEYIN - Key X/Y Input (R)
  0      Button X     (0=Pressed, 1=Released)
  1      Button Y     (0=Pressed, 1=Released)
  3      DEBUG button (0=Pressed, 1=Released/None such)
  6      Pen down     (0=Pressed, 1=Released/Disabled) (always 0 in DSi mode)
  7      Hinge/folded (0=Open, 1=Closed)
  2,4,5  Unknown / set
  8..15  Unknown / zero
The Hinge stuff is a magnetic sensor somewhere underneath the Start/Select buttons (NDS) or between the A/B/X/Y buttons (DSi), it will be triggered by the magnetic field from the right speaker when the console is closed. The hinge generates an interrupt request (there seems to be no way to disable this, unlike all other IRQ sources), however, the interrupt execution can be disabled in the IE register (as for other IRQ sources).
The Pen Down is the /PENIRQ signal from the Touch Screen Controller (TSC), if it is enabled in the TSC control register, then it will notify the program when the screen is pressed, the program should then read data from the TSC (if there's no /PENIRQ then doing unnecessary TSC reads would just waste CPU power). However, the user may release the screen before the program performs the TSC read, so treat the screen as not pressed if you get invalid TSC values (even if /PENIRQ was LOW).
Not sure if the TSC /PENIRQ is actually triggering an IRQ in the NDS?
The Debug Button should be connected to R03 and GND (on original NDS, R03 is the large soldering point between the SL1 jumper and the VR1 potentiometer) (there is no R03 signal visible on the NDS-Lite board).
Interrupts are reportedly not supported for X,Y buttons.
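
A sketch for decoding an EXTKEYIN value according to the bit list above (decode_extkeyin is a hypothetical name; note that the buttons are active-low, while the hinge bit is 1 when closed):

```python
def decode_extkeyin(value):
    """Translate a 16bit EXTKEYIN value into named booleans."""
    return {
        "x": not (value & 0x01),         # Bit0: 0=Pressed
        "y": not (value & 0x02),         # Bit1: 0=Pressed
        "debug": not (value & 0x08),     # Bit3: 0=Pressed
        "pen_down": not (value & 0x40),  # Bit6: 0=Pressed
        "folded": bool(value & 0x80),    # Bit7: 1=Closed
    }
```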

  DS Absent Link Port

The DS doesn't have a Serial Link Port Socket, however, internally, the NDS7 contains the complete set of Serial I/O Ports, as contained in the GBA:
GBA Communication Ports

In GBA mode, the ports are working as on real GBA (as when no cable is connected). In NDS mode, the ports are even containing some additional bits:

NDS7 SIO Bits (according to an early I/O map from Nintendo)
  NDS7 4000128h SIOCNT   Bit15 "CKUP"  New Bit in NORMAL/MULTI/UART mode (R/W)
  NDS7 4000128h SIOCNT   Bit14 "N/A"   Removed IRQ Bit in UART mode (?)
  NDS7 400012Ah SIOCNT_H Bit14 "TFEMP" New Bit (R/W)
  NDS7 400012Ah SIOCNT_H Bit15 "RFFUL" New Bit (always zero?)
  NDS7 400012Ch SIOSEL   Bit0  "SEL"   New Bit (always zero?)
  NDS7 4000140h JOYCNT   Bit7  "MOD"   New Bit (R/W)
The "CKUP" bit doubles the internal clock transfer rate (selected in SIOCNT.1) (tested in normal mode) (probably works also in multi/uart mode?).

NDS7 DS-Lite 4001080h (W) (?)
DS-Lite Firmware writes FFFFh to this address (prior to accessing SIOCNT), so it's probably SIO or debugging related (might be as well a bug or so). Reading from the port always returns 0000h on both DS and DS-Lite.

NDS9 SIO Bits (according to an early I/O map from Nintendo)
  NDS9 4000120h SIODATA32 Bit0-31 Data            (always zero?)
  NDS9 4000128h SIOCNT    Bit2    "TRECV" New Bit (always zero?)
  NDS9 4000128h SIOCNT    Bit3    "TSEND" New Bit (always zero?)
  NDS9 400012Ch SIOSEL    Bit0    "SEL"   New Bit (always zero?)
Not sure if these ports really exist in the release-version, or if it's been prototype stuff?

RCNT (4000134h) should be set to 80xxh (general purpose mode) before accessing EXTKEYIN (4000136h) or RTC (4000138h). No idea why (except when using RTC/SI-interrupt).

DS Serial Port
The SI line is labeled "INT" on the NDS mainboard, it is connected to Pin 1 of the RTC chip (ie. the /INT interrupt pin).
I have no idea where to find SO, SC, and SD. I've written a test proggy that pulsed all four RCNT bits - but all I could find was the SI signal. However, the BIOS contains some code that uses SIO normal mode transfers (for the debug version), so at least SI, SO, SC should exist...?

  DS Real-Time Clock (RTC)

Seiko Instruments Inc. S-35180 (compatible with S-35190A)
Miniature 8pin RTC with 3-wire serial bus

4000138h - NDS7 - Real Time Clock Register
  Bit  Expl.
  0    Data I/O   (0=Low, 1=High)
  1    Clock Out  (0=Low, 1=High)
  2    Select Out (0=Low, 1=High/Select)
  4    Data  Direction  (0=Read, 1=Write)
  5    Clock Direction  (should be 1=Write)
  6    Select Direction (should be 1=Write)
  3,8-11   Unused I/O Lines
  7,12-15  Direction for Bit3,8-11 (usually 0)
  16-31    Not used

Serial Transfer Flowchart
Chipselect and Command/Parameter Sequence:
  Init CS=LOW and /SCK=HIGH, and wait at least 1us
  Switch CS=HIGH, and wait at least 1us
  Send the Command byte (see bit-transfer below)
  Send/receive Parameter byte(s) associated with the command (see below)
  Switch CS to LOW
Bit transfer (repeat 8 times per cmd/param byte) (bits transferred LSB first):
  Output /SCK=LOW and SIO=databit (when writing), then wait at least 5us
  Output /SCK=HIGH, wait at least 5us, then read SIO=databit (when reading)
  In either direction, data is output on (or immediately after) falling edge.
Ideally, <both> commands and parameters should be transmitted LSB-first (unlike the original Seiko document, which recommends LSB-first for data, and MSB-first for commands).
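The flowchart can be illustrated by generating the sequence of values that would be written to the RTC register (4000138h) for sending one command byte. This is only a sketch: the required delays (1us, 5us) are left to the caller, all three lines are driven as outputs (direction bits 4-6 set), and the names are invented for the example.

```python
RTC_DATA, RTC_SCK, RTC_CS = 0x01, 0x02, 0x04
RTC_DIR_WRITE = 0x70   # Bit4-6: data, clock, and select direction = Write


def rtc_send_byte_sequence(byte):
    """Register values for one byte, bits transferred LSB first."""
    seq = []
    for i in range(8):
        bit = (byte >> i) & 1
        # /SCK=LOW and SIO=databit, then wait at least 5us
        seq.append(RTC_DIR_WRITE | RTC_CS | bit)
        # /SCK=HIGH, then wait at least 5us
        seq.append(RTC_DIR_WRITE | RTC_CS | RTC_SCK | bit)
    return seq


def rtc_command_sequence(command):
    """Chipselect handling around a command byte without parameters."""
    seq = [RTC_DIR_WRITE | RTC_SCK]                  # CS=LOW, /SCK=HIGH
    seq.append(RTC_DIR_WRITE | RTC_CS | RTC_SCK)     # CS=HIGH
    seq += rtc_send_byte_sequence(command)
    seq.append(RTC_DIR_WRITE | RTC_SCK)              # CS=LOW
    return seq
```

For reading parameter bytes, the data direction bit (Bit4) would be cleared instead, and Bit0 sampled after each rising /SCK edge.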

Command Register
  Command Register
    Fwd  Rev
    0-3  7-4 Fixed Code (must be 06h = 0110b) (same for Fwd and Rev)
    4-6  3-1 Command
             Fwd Rev Parameter bytes (read/write access)
             0   0   1 byte, status register 1
             4   1   1 byte, status register 2
             2   2   7 bytes, date & time (year,month,day,day_of_week,hh,mm,ss)
             6   3   3 bytes, time (hh,mm,ss)
             1*  4*  1 byte, int1, frequency duty setting
             1*  4*  3 bytes, int1, alarm time 1 (day_of_week, hour, minute)
             5   5   3 bytes, int2, alarm time 2 (day_of_week, hour, minute)
             3   6   1 byte, clock adjustment register
             7   7   1 byte, free register
    7    0   Parameter Read/Write Access (0=Write, 1=Read)
* INT1: Type and number of parameters depend on INT1 setting in stat reg2.
The "Fwd" bit numbers and command values are for LSB-first command transfers (ie. both commands and parameters use the same bit-order).
The "Rev" numbers/values are for MSB-first command transfers (ie. commands using opposite bit-order than parameters, as being suggested by Seiko).
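The relation between the "Fwd" and "Rev" columns can be checked with a short sketch (function names are hypothetical): the same command byte is built from either column, and one is the bit-reversed form of the other.

```python
RTC_FIXED_CODE = 0x6   # 0110b, reads the same in both bit orders


def rtc_command_fwd(command_fwd, read):
    """Command byte for LSB-first transfers ("Fwd" column)."""
    return RTC_FIXED_CODE | (command_fwd << 4) | (0x80 if read else 0x00)


def rtc_command_rev(command_rev, read):
    """Command byte for MSB-first transfers ("Rev" column)."""
    return (RTC_FIXED_CODE << 4) | (command_rev << 1) | (1 if read else 0)


def bit_reverse8(value):
    """Reverse the bit order of an 8bit value."""
    return int(f"{value & 0xFF:08b}"[::-1], 2)
```

For example, reading date & time is A6h in "Fwd" notation and 65h in "Rev" notation.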

Control and Status Registers
  Status Register 1
    0   W   Reset                (0=Normal, 1=Reset)
    1   R/W 12/24 hour mode      (0=12 hour, 1=24 hour)
    2-3 R/W General purpose bits
    4   R   Interrupt 1 Flag (1=Yes)                      ;auto-cleared on read
    5   R   Interrupt 2 Flag (1=Yes)                      ;auto-cleared on read
    6   R   Power Low Flag (0=Normal, 1=Power is/was low) ;auto-cleared on read
    7   R   Power Off Flag (0=Normal, 1=Power was off)    ;auto-cleared on read
    Power off indicates that the battery was removed or fully discharged,
    all registers are reset to 00h (or 01h), and must be re-initialized.
  Status Register 2
    0-3 R/W INT1 Mode/Enable
            0000b Disable
            0x01b Selected Frequency steady interrupt
            0x10b Per-minute edge interrupt
            0011b Per-minute steady interrupt 1 (duty 30.0 seconds)
            0100b Alarm 1 interrupt
            0111b Per-minute steady interrupt 2 (duty 0.0079 seconds)
            1xxxb 32kHz output
    4-5 R/W General purpose bits
    6   R/W INT2 Enable
            0b    Disable
            1b    Alarm 2 interrupt
    7   R/W Test Mode (0=Normal, 1=Test, don't use) (cleared on Reset)
  Clock Adjustment Register (to compensate oscillator inaccuracy)
    0-7 R/W Adjustment (00h=Normal, no adjustment)
  Free Register
    0-7 R/W General purpose bits

Date Registers
  Year Register
    0-7 R/W Year     (BCD 00h..99h = 2000..2099)
  Month Register
    0-4 R/W Month    (BCD 01h..12h = January..December)
    5-7 -   Not used (always zero)
  Day Register
    0-5 R/W Day      (BCD 01h..28h,29h,30h,31h, range depending on month/year)
    6-7 -   Not used (always zero)
  Day of Week Register (septenary counter)
    0-2 R/W Day of Week (00h..06h, custom assignment, usually 0=Monday?)
    3-7 -   Not used (always zero)

Time Registers
  Hour Register
    0-5 R/W Hour     (BCD 00h..23h in 24h mode, or 00h..11h in 12h mode)
    6   *   AM/PM    (0=AM before noon, 1=PM after noon)
            * 24h mode: AM/PM flag is read only (PM=1 if hour = 12h..23h)
            * 12h mode: AM/PM flag is read/write-able
            * 12h mode: Observe that 12 o'clock is defined as 00h (not 12h)
    7   -   Not used (always zero)
  Minute Register
    0-6 R/W Minute   (BCD 00h..59h)
    7   -   Not used (always zero)
  Second Register
    0-6 R/W Second   (BCD 00h..59h)
    7   -   Not used (always zero)
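
A sketch for converting the seven date & time parameter bytes into plain integers, including the 12h mode quirk that 12 o'clock is stored as 00h. The function names are made up; the byte order follows the command table (year, month, day, day_of_week, hh, mm, ss).

```python
def bcd(value):
    """Convert a packed BCD byte (eg. 59h) to an integer (59)."""
    return (value >> 4) * 10 + (value & 0x0F)


def decode_hour(raw, mode_24h):
    """Return the hour as 0..23, for either 12h or 24h mode."""
    hour = bcd(raw & 0x3F)
    if not mode_24h and (raw & 0x40):
        hour += 12            # PM flag set; 12 o'clock is stored as 00h
    return hour


def decode_datetime(params, mode_24h=True):
    """Decode the 7 parameter bytes of the date & time command."""
    year, month, day, dow, hh, mm, ss = params
    return (2000 + bcd(year), bcd(month & 0x1F), bcd(day & 0x3F), dow & 7,
            decode_hour(hh, mode_24h), bcd(mm & 0x7F), bcd(ss & 0x7F))
```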

Alarm 1 and Alarm 2 Registers
  Alarm1 and Alarm2 Day of Week Registers (INT1 and INT2 each)
    0-2 R/W Day of Week (00h..06h)
    3-6 -   Not used (always zero)
    7   R/W Compare Enable (0=Alarm every day, 1=Alarm only at specified day)
  Alarm1 and Alarm2 Hour Registers (INT1 and INT2 each)
    0-5 R/W Hour     (BCD 00h..23h in 24h mode, or 00h..11h in 12h mode)
    6   R/W AM/PM    (0=AM, 1=PM) (must be correct even in 24h mode?)
    7   R/W Compare Enable (0=Alarm every hour, 1=Alarm only at specified hour)
  Alarm1 and Alarm2 Minute Registers (INT1 and INT2 each)
    0-6 R/W Minute   (BCD 00h..59h)
    7   R/W Compare Enable (0=Alarm every min, 1=Alarm only at specified min)
  Selected Frequency Steady Interrupt Register (INT1 only) (when Stat2/Bit2=0)
    0   R/W Enable 1Hz Frequency  (0=Disable, 1=Enable)
    1   R/W Enable 2Hz Frequency  (0=Disable, 1=Enable)
    2   R/W Enable 4Hz Frequency  (0=Disable, 1=Enable)
    3   R/W Enable 8Hz Frequency  (0=Disable, 1=Enable)
    4   R/W Enable 16Hz Frequency (0=Disable, 1=Enable)
            The signals are ANDed when two or more frequencies are enabled,
            ie. the /INT signal gets LOW when either of the signals is LOW.
    5-7 R/W General purpose bits
Note: There is only one register shared as "Selected Frequency Steady Interrupt" (accessed as single byte parameter when Stat2/Bit2=0) and as "Alarm1 Minute" (accessed as 3rd byte of 3-byte parameter when Stat2/Bit2=1), changing either value will also change the other value.

There's only one /INT signal, shared for both INT1 and INT2.
In the NDS, it is connected to the SI-input of the SIO unit (and so, also shared with SIO interrupts). To enable the interrupt, RCNT should be set to 8144h (Bit14-15=General Purpose mode, Bit8=SI Interrupt Enable, Bit6,2=SI Output/High).
The Output/High setting seems to be used as pullup (giving faster reactions on low-to-high transitions) (nevertheless, in most cases it also seems to work okay as Input, ie. with RCNT=8100h).
The RCNT interrupt is generated on high-to-low transitions on the SI line (but only if the IRQ is enabled in RCNT.8, and only if RCNT is set to general purpose mode) (note: changing RCNT.8 from off-to-on does NOT generate IRQs, even when SI is LOW).

  1 /INT      8 VDD
  2 XOUT      7 SIO
  3 XIN       6 /SCK
  4 GND       5 CS

  DS Serial Peripheral Interface Bus (SPI)

Serial Peripheral Interface Bus
SPI Bus is a 4-wire (Data In, Data Out, Clock, and Chipselect) serial bus.
The NDS supports the following SPI devices (each with its own chipselect).
DS Firmware Serial Flash Memory
DS Touch Screen Controller (TSC)
DS Power Management

40001C0h - NDS7 - SPICNT - SPI Bus Control/Status Register
  0-1   Baudrate (0=4MHz/Firmware, 1=2MHz/Touchscr, 2=1MHz/Powerman., 3=512KHz)
  2-6   Not used            (Zero)
  7     Busy Flag           (0=Ready, 1=Busy) (presumably Read-only)
  8-9   Device Select       (0=Powerman., 1=Firmware, 2=Touchscr, 3=Reserved)
  10    Transfer Size       (0=8bit/Normal, 1=16bit/Bugged)
  11    Chipselect Hold     (0=Deselect after transfer, 1=Keep selected)
  12-13 Not used            (Zero)
  14    Interrupt Request   (0=Disable, 1=Enable)
  15    SPI Bus Enable      (0=Disable, 1=Enable)
The "Hold" flag should be cleared BEFORE transferring the LAST data unit, the chipselect will be then automatically cleared after the transfer, the program should issue a WaitByLoop(3) manually AFTER the LAST transfer.
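
A sketch for composing SPICNT values from the bit list above, including the rule that "Hold" is cleared before the last data unit. The constant and function names are made up; the pairing of baudrates with devices simply follows the hints in the Baudrate line.

```python
SPI_DEV_POWERMAN, SPI_DEV_FIRMWARE, SPI_DEV_TOUCHSCR = 0, 1, 2
SPI_BAUD_4MHZ, SPI_BAUD_2MHZ, SPI_BAUD_1MHZ, SPI_BAUD_512KHZ = 0, 1, 2, 3


def spicnt(device, baud, hold, irq=False, enable=True):
    """Compose a SPICNT value for 8bit transfers."""
    value = (baud & 3) | ((device & 3) << 8)
    if hold:
        value |= 1 << 11      # Chipselect Hold
    if irq:
        value |= 1 << 14      # Interrupt Request
    if enable:
        value |= 1 << 15      # SPI Bus Enable
    return value


def spicnt_values_for_transfer(device, baud, length):
    """SPICNT value to set before each of `length` data units."""
    return [spicnt(device, baud, hold=(i < length - 1)) for i in range(length)]
```

For a three-byte touchscreen access this gives 8A01h, 8A01h, 8201h.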

40001C2h - NDS7 - SPIDATA - SPI Bus Data/Strobe Register (R/W)
The SPI transfer is started on writing to this register, so one must <write> a dummy value (should be zero) even when intending to <read> from SPI bus.
  0-7   Data
  8-15  Not used (always zero, even in bugged-16bit mode)
During transfer, the Busy flag in SPICNT is set, and the written SPIDATA value is transferred to the device (via output line), simultaneously data is received (via input line). Upon transfer completion, the Busy flag goes off (with optional IRQ), and the received value can be then read from SPIDATA, if desired.

SPICNT Bits 12,13 appear to be unused (always zero), although the BIOS (attempts to) set Bit13=1, and Bit12=Bit11 when accessing the firmware.
The SPIDATA register is restricted to 8bit, so that only each 2nd byte will appear in SPIDATA when attempting to use the bugged-16bit mode.

Cartridge Backup Auxiliary SPI Bus
The NDS Cartridge Slot uses a separate SPI bus (with other I/O Ports), see
DS Cartridge Backup

  DS Touch Screen Controller (TSC)

Texas Instruments TSC2046 (NDS)
Asahi Kasei Microsystems AK4148AVT (NDS-Lite)
The Touch Screen Controller (for lower LCD screen) is accessed via SPI bus,
DS Serial Peripheral Interface Bus (SPI)

Control Byte (transferred MSB first)
  0-1  Power Down Mode Select
  2    Reference Select (0=Differential, 1=Single-Ended)
  3    Conversion Mode  (0=12bit, max CLK=2MHz, 1=8bit, max CLK=3MHz)
  4-6  Channel Select   (0-7, see below)
  7    Start Bit (Must be set to access Control Byte)

  0 Temperature 0 (requires calibration, step 2.1mV per 1'C accuracy)
  1 Touchscreen Y-Position  (somewhat 0B0h..F20h, or FFFh=released)
  2 Battery Voltage         (not used, connected to GND in NDS, always 000h)
  3 Touchscreen Z1-Position (diagonal position for pressure measurement)
  4 Touchscreen Z2-Position (diagonal position for pressure measurement)
  5 Touchscreen X-Position  (somewhat 100h..ED0h, or 000h=released)
  6 AUX Input               (connected to Microphone in the NDS)
  7 Temperature 1 (difference to Temp 0, without calibration, 2'C accuracy)
All channels can be accessed in Single-Ended mode.
In differential mode, only channel 1,3,4,5 (Y,Z1,Z2,X) can be accessed.
On AK4148AVT, channel 6 (AUX) is split into two separate channels, IN1 and IN2, separated by Bit2 (Reference Select). IN1 is selected when Bit2=1, IN2 is selected when Bit2=0 (regardless of the Bit2 setting, both IN1 and IN2 are using single ended mode). On the NDS-Lite, IN1 connects to the microphone (as on original NDS), and the new IN2 input is simply wired to VDD3.3 (which is equal to the external VREF voltage, so IN2 is always FFFh).
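
The control byte layout can be sketched as follows (tsc_control_byte is a hypothetical helper; parameter names follow the bit list above):

```python
def tsc_control_byte(channel, single_ended=False, eight_bit=False, power_down=0):
    """Compose a TSC control byte (Bit7 = Start Bit is always set)."""
    value = 0x80                       # Bit7: Start Bit
    value |= (channel & 7) << 4        # Bit4-6: Channel Select
    if eight_bit:
        value |= 0x08                  # Bit3: 1=8bit conversion
    if single_ended:
        value |= 0x04                  # Bit2: 1=Single-Ended
    value |= power_down & 3            # Bit0-1: Power Down Mode
    return value
```

With the defaults (12bit, differential, Power Down Mode 0), the X position is requested with D0h and the Y position with 90h.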

Power Down Mode
  Mode /PENIRQ   VREF  ADC   Recommended use
  0    Enabled   Auto  Auto  Differential Mode (Touchscreen, Penirq)
  1    Disabled  Off   On    Single-Ended Mode (Temperature, Microphone)
  2    Enabled   On    Off   Don't use
  3    Disabled  On    On    Don't use
These modes allow enabling/disabling the /PENIRQ output, the internal reference voltage (VREF), and the Analogue-Digital Converter.
For AK4148AVT, Power Down modes are slightly different (among others, /PENIRQ is enabled in Mode 0..2).

Reference Voltage (VREF)
VREF is used as reference voltage in single ended mode, at 12bit resolution one ADC step equals VREF/4096. The TSC generates an internal VREF of 2.5V (+/-0.05V), however, the NDS uses an external VREF of 3.33V (sinks to 3.31V at low battery charge), the external VREF is always enabled, no matter if internal VREF is on or off. Power Down Mode 1 disables the internal VREF, which may reduce power consumption in single ended mode. After conversion, Power Down Mode 0 should be restored to re-enable the Penirq signal.

Sending the first Command after Chip-Select
Switch chipselect low, then output the command byte (MSB first).

Reply Data
The following reply data is received (via Input line) after the Command byte has been transferred: One dummy bit (zero), followed by the 8bit or 12bit conversion result (MSB first), followed by endless padding (zero).
Note: The returned ADC value may become unreliable if there are longer delays between sending the command, and receiving the reply byte(s).

Sending further Commands during/after receiving Reply Data
In general, the Output line should be LOW during the reply period, however, once Data bit6 has been received (or anytime later), a new Command can be invoked (started by sending the HIGH-startbit, ie. Command bit7), simultaneously, the remaining reply-data bits (bit5..0) can be received.
In other words, the new command can be output after receiving 3 bits in 8bit mode (the dummy bit, and data bits 7..6), or after receiving 7 bits in 12bit mode (the dummy bit, and data bits 11..6).
In practice, the NDS SPI register always transfers 8 bits at once, so that one would usually receive 8 bits (rather than above 3 or 7 bits), before outputting a new command.
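The reply format above (one dummy bit, then the result MSB first, then zero padding) can be unpacked as sketched below. This assumes that the reply is collected as two consecutive 8bit SPI transfers following the command byte; the function names are hypothetical.

```python
def decode_reply_12bit(byte1, byte2):
    """Extract a 12bit result from two reply bytes.

    byte1 = dummy bit + data bits 11..5, byte2 = data bits 4..0 + 3 pad bits.
    """
    return (((byte1 << 8) | byte2) >> 3) & 0xFFF


def decode_reply_8bit(byte1, byte2):
    """Extract an 8bit result from two reply bytes.

    byte1 = dummy bit + data bits 7..1, byte2 = data bit 0 + 7 pad bits.
    """
    return (((byte1 << 8) | byte2) >> 7) & 0xFF
```

With these helpers, the reply bytes 55h,E0h decode to ABCh in 12bit mode.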

Touchscreen Position
Read the X and Y positions in 12bit differential mode, then convert the touchscreen values (adc) to screen/pixel positions (scr), as such:
  scr.x = (adc.x-adc.x1) * (scr.x2-scr.x1) / (adc.x2-adc.x1) + (scr.x1-1)
  scr.y = (adc.y-adc.y1) * (scr.y2-scr.y1) / (adc.y2-adc.y1) + (scr.y1-1)
The X1,Y1 and X2,Y2 calibration points are found in Firmware User Settings,
DS Firmware User Settings
scr.x1,y1,x2,y2 are originated at 1,1 (converted to 0,0 by above formula).
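A minimal sketch of the above conversion for one axis, using integer arithmetic. The calibration numbers in the usage line are invented for illustration, real values must be taken from the User Settings.

```python
def adc_to_screen(adc, adc1, adc2, scr1, scr2):
    """Convert one touchscreen ADC value to a 0-based pixel position.

    adc1/adc2 and scr1/scr2 are the two calibration points for this axis
    (scr1/scr2 originated at 1, as stored in the User Settings).
    """
    if adc2 == adc1:
        raise ValueError("invalid calibration (adc1 equals adc2)")
    return (adc - adc1) * (scr2 - scr1) // (adc2 - adc1) + (scr1 - 1)


# Example with made-up calibration points (adc 200h at pixel 32, E00h at 224)
x = adc_to_screen(0x800, 0x200, 0xE00, 32, 224)
```

The same function is used for X and Y, each with its own pair of calibration points.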

Touchscreen Pressure (not supported on DSi)
To calculate the pressure resistance, in respect to X/Y/Z positions and X/Y plate resistances, either of below formulas can be used,
  Rtouch = (Rx_plate*Xpos*(Z2pos/Z1pos-1))/4096
  Rtouch = (Rx_plate*Xpos*(4096/Z1pos-1)-Ry_plate*(1-Ypos))/4096
The second formula requires less CPU load (as it doesn't require to measure Z2), the downside is that one must know both X and Y plate resistance (or at least their ratio). The first formula doesn't require that ratio, and so Rx_plate can be set to any value, setting it to 4096 results in
  touchval = Xpos*(Z2pos/Z1pos-1)
Of course, in that case, touchval is just a number, not a resistance in Ohms.
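The simplified first formula (with Rx_plate set to 4096) might be computed as below. Returning None for Z1=0 is an assumption of this sketch (treated here as "not pressed"), not documented hardware behaviour.

```python
def touch_value(xpos, z1pos, z2pos):
    """Return Xpos*(Z2pos/Z1pos-1), a unitless pressure indicator.

    Returns None when Z1 is zero, since the ratio is then undefined
    (assumed here to mean that the screen is not pressed).
    """
    if z1pos == 0:
        return None
    return xpos * (z2pos / z1pos - 1)
```

Since touchval is proportional to the touch resistance, smaller values correspond to firmer pressure.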

Touchscreen Notes
It may be impossible to press locations close to the screen borders.
When pressing two or more locations the TSC values will be somewhere in the middle of these locations.
The TSC values may be garbage if the screen becomes newly pressed or released, to avoid invalid inputs: read TSC values at least two times, and ignore BOTH positions if ONE position was invalid.

Microphone / AUX Channel
Observe that the microphone amplifier is switched off after power up, see:
DS Power Management

Temperature Calculation (not supported on DSi)
TP0 decreases by circa 2.1mV per degree Kelvin. The voltage difference between TP1 minus TP0 increases by circa 0.39mV (1/2573 V) per degree Kelvin. At VREF=3.33V, one 12bit ADC step equals to circa 0.8mV (VREF/4096).
Temperature can be calculated at best resolution when using the current TP0 value, and two calibration values (an ADC value, and the corresponding temperature in degrees kelvin):
  K = (CAL.TP0-ADC.TP0) * 0.4 + CAL.KELVIN
Alternately, temperature can be calculated at rather bad resolution, but without calibration, by using the difference between TP1 and TP0:
  K = (ADC.TP1-ADC.TP0) * 8568 / 4096
To convert Kelvin to other formats,
  Celsius:     C = (K-273.15)
  Fahrenheit:  F = (K-273.15)*9/5+32
  Reaumur:     R = (K-273.15)*4/5
  Rankine:     X = (K)*9/5
The Temperature Range for the TSC 2046 chip is -40'C..+85'C (for AK4181AVT only -20'C..+70'C). According to Nintendo, the DS should not be exposed to "extreme" heat or cold, the optimal battery charging temperature is specified as +10'C..+40'C.
The original firmware does not support temperature calibration, calibration is supported by nocash firmware (if present). See Extended Settings,
DS Firmware Extended Settings
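The temperature formulas above can be sketched as follows; the function and parameter names are chosen for this example only.

```python
def kelvin_calibrated(adc_tp0, cal_tp0, cal_kelvin):
    """Temperature from TP0 and one calibration pair (ADC value, Kelvin)."""
    return (cal_tp0 - adc_tp0) * 0.4 + cal_kelvin


def kelvin_uncalibrated(adc_tp0, adc_tp1):
    """Temperature from the TP1-TP0 difference (coarse, no calibration)."""
    return (adc_tp1 - adc_tp0) * 8568 / 4096


def to_celsius(k):
    return k - 273.15


def to_fahrenheit(k):
    return (k - 273.15) * 9 / 5 + 32


def to_reaumur(k):
    return (k - 273.15) * 4 / 5


def to_rankine(k):
    return k * 9 / 5
```

The uncalibrated variant changes by roughly 2 Kelvin per ADC step (8568/4096), which is where the 2'C accuracy of the Temperature 1 channel comes from.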

  VCC  1|o       |16 DCLK
  X+   2|        |15 /CS
  Y+   3|  TSC   |14 DIN
  X-   4|  2046  |13 BUSY
  Y-   5|        |12 DOUT
  GND  6|        |11 /PENIRQ
  VBAT 7|        |10 IOVDD
  AUX  8|________|9  VREF

For AK4181AVT, same pins as above, except that IOVDD is replaced by the new IN2 input, the pin is wired to VDD3.3 (so IN2 is always equal to VREF, which is wired to VDD3.3, too) (and AUX is renamed to IN1, and is still used for MIC input).

DSi Touchscreen Controller (in NDS mode)
DSi in NDS mode supports only X, Y, and MIC (all other channels return FFFh in 12bit mode, and FFh in 8bit mode, ie. no pressure, no temperature, and no GNDed battery sensor). On DSi, MIC returns data in both single-ended and differential mode (unlike on real NDS).

DSi Touchscreen Controller (in DSi mode)
The DSi touchscreen controller supports an NDS backwards compatibility mode. But, in DSi mode, it works entirely differently (it's still accessed via SPI bus, but with some new MODE/INDEX values).
DSi Touchscreen/Sound Controller
The NDS Touchscreen controller did additionally allow to read Temperature and Touchscreen Pressure - unknown if the DSi is also supporting such stuff (via whatever DSi-specific registers).
The touchscreen hardware can be switched to NDS compatibility mode (for older games), but unknown how to do that.

  DS Power Management

The DS contains several Power Management functions, some accessed via I/O ports, some accessed via SPI bus (described later on below).

4000304h - NDS9 - POWCNT1 - Graphics Power Control Register (R/W)
  0     Enable Flag for both LCDs (0=Disable) (Prohibited, see notes)
  1     2D Graphics Engine A      (0=Disable) (Ports 008h-05Fh, Pal 5000000h)
  2     3D Rendering Engine       (0=Disable) (Ports 320h-3FFh)
  3     3D Geometry Engine        (0=Disable) (Ports 400h-6FFh)
  4-8   Not used
  9     2D Graphics Engine B      (0=Disable) (Ports 1008h-105Fh, Pal 5000400h)
  10-14 Not used
  15    Display Swap (0=Send Display A to Lower Screen, 1=To Upper Screen)
  16-31 Not used
Use SwapBuffers command once after enabling Rendering/Geometry Engine.
Improper use of Bit0 may damage the hardware?
When disabled, corresponding Ports become Read-only, corresponding (palette-) memory becomes read-only-zero-filled.
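For illustration, a POWCNT1 value can be composed from the above bits as sketched below. The constant and parameter names are hypothetical; Bit0 is always kept set here, since clearing it is marked as prohibited.

```python
POWCNT1_LCD       = 1 << 0   # enable flag for both LCDs
POWCNT1_2D_A      = 1 << 1   # 2D graphics engine A
POWCNT1_3D_RENDER = 1 << 2   # 3D rendering engine
POWCNT1_3D_GEOM   = 1 << 3   # 3D geometry engine
POWCNT1_2D_B      = 1 << 9   # 2D graphics engine B
POWCNT1_SWAP      = 1 << 15  # 1=send display A to upper screen


def powcnt1(engine_a=False, render3d=False, geometry3d=False,
            engine_b=False, a_on_upper=False):
    """Compose a POWCNT1 value (LCD enable bit is always set)."""
    value = POWCNT1_LCD
    if engine_a:
        value |= POWCNT1_2D_A
    if render3d:
        value |= POWCNT1_3D_RENDER
    if geometry3d:
        value |= POWCNT1_3D_GEOM
    if engine_b:
        value |= POWCNT1_2D_B
    if a_on_upper:
        value |= POWCNT1_SWAP
    return value
```

For example, powcnt1(engine_a=True, a_on_upper=True) gives 8003h.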

4000304h - NDS7 - POWCNT2 - Sound/Wifi Power Control Register (R/W)
  Bit   Expl.
  0     Sound Speakers (0=Disable, 1=Enable) (Initial setting = 1)
  1     Wifi           (0=Disable, 1=Enable) (Initial setting = 0)
  2-31  Not used
Note: Bit0 disables the internal Speaker only, headphones are not disabled.
Bit1 disables Port 4000206h, and Ports 4800000h-480FFFFh.

4000206h - NDS7 - WIFIWAITCNT - Wifi Waitstate Control
  Bit   Expl.
  0-2   Wifi WS0 Control (0-7) (Ports 4800000h-4807FFFh)
  3-5   Wifi WS1 Control (0-7) (Ports 4808000h-480FFFFh)
  6-15  Not used (zero)
This register is initialized by firmware on power-up, don't change.
Note: WIFIWAITCNT can be accessed only when enabled in POWCNT2.

4000301h - NDS7 - HALTCNT - Low Power Mode Control (R/W)
In Halt mode, the CPU is paused as long as (IE AND IF)=0.
In Sleep mode, most of the hardware including sound and video are paused, this very-low-power mode could be used much like a screensaver.
  Bit   Expl.
  0-5   Not used (zero)
  6-7   Power Down Mode  (0=No function, 1=Enter GBA Mode, 2=Halt, 3=Sleep)
The HALTCNT register should not be accessed directly. Instead, the BIOS Halt, Sleep, CustomHalt, IntrWait, or VBlankIntrWait SWI functions should be used.
BIOS Halt Functions
ARM CP15 System Control Coprocessor
The NDS9 does not have a HALTCNT register, instead, the Halt function uses the co-processor opcode "mcr p15,0,r0,c7,c0,4" - this opcode locks up if interrupts are disabled via IME=0 (unlike NDS7 HALTCNT method which doesn't check IME).

4000300h - NDS7/NDS9 - POSTFLG - BYTE - Post Boot Flag (R/W)
The NDS7 and NDS9 post boot flags are usually set upon BIOS/Firmware boot completion, once when set the reset vector is redirected to the debug handler of Nintendo's hardware debugger. That allows the NDS7 debugger to capture accidental jumps to address 0, that appears to be a common problem with HLL-programmers, asm-coders know that (and why) they should not jump to 0.
  Bit   Expl.
  0     Post Boot Flag (0=Boot in progress, 1=Boot completed)
  1     NDS7: Not used (always zero), NDS9: Bit1 is read-writeable
  2-7   Not used (always zero)
There are some write-restrictions: The NDS7 register can be written to only from code executed in BIOS. Bit0 of both NDS7 and NDS9 registers cannot be cleared (except by Reset) once when it is set.

Power Management Device - Mitsumi 3152A (NDS) / Mitsumi 3205B (NDS-LITE)
The Power Management Device is accessed via SPI bus,
DS Serial Peripheral Interface Bus (SPI)
To access the device, write the Index Register, then read or write the data register, and release the chipselect line when finished.
  Index Register
  Bit0-6 Register Select          (0..3) (0..4 for DS-Lite) (0..7Fh for DSi)
  Bit7   Register Direction       (0=Write, 1=Read)
  Register 0 - Powermanagement Control (R/W)
  Bit0   Sound Amplifier Enable   (0=Disable, 1=Enable)
         (Old-DS:  Disabled: Sound is very silent, but still audible)
         (DS-Lite: Disabled: Sound is NOT audible)
         (DSi in NDS Mode: R/W, but effect is unknown yet)
         (DSi in DSi Mode: Not used, Bit0 is always 1)
  Bit1   Sound Amplifier Mute     (0=Normal, 1=Mute) (Old-DS Only, not DS-Lite)
         (Old-DS:  Muted: Sound is NOT audible, that works only if Bit0=1)
         (DS-Lite: Not used, Bit1 is always zero)
         (DSi in NDS Mode: R/W, but effect is unknown yet)
         (DSi in DSi Mode: R/W, but effect is unknown yet)
  Bit2   Lower Backlight          (0=Disable, 1=Enable)
  Bit3   Upper Backlight          (0=Disable, 1=Enable)
  Bit4   Power LED Blink Enable   (0=Always ON, 1=Blinking OFF/ON)
  Bit5   Power LED Blink Speed    (0=Slow, 1=Fast) (only if Blink enabled)
         (DSi: Power LED Blinking isn't supported, neither in NDS nor DSi mode)
  Bit6   DS System Power          (0=Normal, 1=Shut Down)
  Bit7   Not used                 (always 0)
  Register 1 - Battery Status (R)
  Bit0   Battery Power LED Status (0=Power Good/Green, 1=Power Low/Red)
         (DSi: Usually 0, not tested if it changes upon Power=Low)
  Bit1-7 Not used
  Register 2 - Microphone Amplifier Control (R/W)
  Bit0   Amplifier                (0=Disable, 1=Enable)
  Bit1-7 Not used                 (always 0)
  (DSi in NDS Mode: looks same as NDS, ie. only bit0 is R/W)
  (DSi in DSi Mode: Not used, always FFh)
  Register 3 - Microphone Amplifier Gain Control (R/W)
  Bit0-1 Gain                     (0..3=Gain 20, 40, 80, 160)
  Bit2-7 Not used                 (always 0)
  (DSi in NDS Mode: looks same as NDS, ie. only bit0-1 are R/W)
  (DSi in DSi Mode: Not used, always FFh)
  Register 4 - DS-Lite and DSi Only - Backlight Levels/Power Source (R/W)
  Bit0-1 Backlight Brightness (0..3=Low,Med,High,Max)   (R/W)
         (when bit2+3 are both set, then reading bit0-1 always returns 3)
  Bit2   Force Max Brightness when Bit3=1 (0=No, 1=Yes) (R/W)
  Bit3   External Power Present           (0=No, 1=Yes) (Read-Only)
  Bit4-7 Unknown (Always 4) (Read-Only)
  (DSi in NDS Mode: looks same as in DSi mode)
  (DSi in DSi Mode: Bit0-1 are R/W, but ignored, bit2-3 are always 0)
  Register 10h - DSi Only - Backlight Mirrors & Reset (R/W)
  Bit0   Reset (0=No, 1=Reboot) (same/similar as BPTWL reset feature?)
  Bit1   Unknown (R/W) (note: whatever it is, it isn't warmboot flag)
  Bit2-3 Mirror of Register 0, bit2-3 (backlight enable bits) (R/W)
  Bit4-7 Not used (always 0)
  (DSi in NDS Mode: seems to behave same as in DSi mode, except that, reset
  defaults to warmboot, since BPTWL always has warmboot enabled in NDS mode)
On Old-DS, registers 4..7Fh are mirrors of 0..3. On DS-Lite, registers 5,6,7 are mirrors of 4, register 8..7Fh are mirrors of 0-7.
On DSi (in DS mode), index 0,1,2,3,4,10h are used (reads as 0Fh,00h,00h,01h,41h,0Fh - regardless of backlight level, and power source), index 5..0Fh and 11h..7Fh return 00h (ie. unlike DS and DS-Lite, there are no mirrors; aside from the 3 bits in register 10h).
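The Index Register byte and the Register 0 bits might be handled as in the following sketch (helper names are made up; the actual SPI transfer is not shown).

```python
def pm_index(register, read):
    """Build the index byte: bit0-6=register number, bit7=1 for read."""
    if not 0 <= register <= 0x7F:
        raise ValueError("register must be 0..7Fh")
    return (0x80 if read else 0x00) | register


def decode_reg0(value):
    """Split Register 0 (Powermanagement Control) into named flags."""
    return {
        "sound_amp_enable": bool(value & 0x01),
        "sound_amp_mute":   bool(value & 0x02),
        "lower_backlight":  bool(value & 0x04),
        "upper_backlight":  bool(value & 0x08),
        "led_blink":        bool(value & 0x10),
        "led_blink_fast":   bool(value & 0x20),
        "shut_down":        bool(value & 0x40),
    }
```

For example, pm_index(0, read=True) gives 80h, and decode_reg0(09h) reports the sound amplifier and the upper backlight as enabled.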

Backlight Dimming / Backlight caused Shut-Down(s)
The above bits are essentially used to switch Backlights on or off. However, there are a number of strange effects. Backlight dimming is possible by pulse width modulation, ie. by using a timer interrupt to issue pulse widths of N% ON, and 100-N% OFF. Too long pulses are certainly resulting in flickering. Too short pulses are ignored, the backlights will remain OFF, even if the ON and OFF pulses are having the same length. Much too short pulses cause the power supply to shut-down; after changing the backlight state, further changes must not occur within the next (circa) 2500 clock cycles. The mainboard can be operated without screens & backlights connected, however, if so, the power supply will shut-down as soon as backlights are enabled.
Pulse width modulated dimming does also work on the DS-Lite, allowing to use smoother fade in/out effects as when using the five "hardware" levels (Off,Low,Med,High,Max).
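A sketch for planning the ON/OFF phases of such a pulse width modulation is shown below. It only checks the (circa) 2500 cycle minimum mentioned above; the limit is approximate, and the flicker and "pulse ignored" thresholds are not known, so they are not modelled.

```python
MIN_CYCLES_BETWEEN_CHANGES = 2500  # approximate value, see text above


def pwm_phases(period_cycles, percent_on):
    """Return (on_cycles, off_cycles) for one PWM period.

    Raises ValueError if a phase is so short that the next backlight
    change would follow the previous one too quickly.
    """
    if not 0 <= percent_on <= 100:
        raise ValueError("percent_on must be 0..100")
    on_cycles = period_cycles * percent_on // 100
    off_cycles = period_cycles - on_cycles
    for phase in (on_cycles, off_cycles):
        # A zero-length phase means the backlight is never toggled.
        if 0 < phase < MIN_CYCLES_BETWEEN_CHANGES:
            raise ValueError("phase too short, power supply may shut down")
    return on_cycles, off_cycles
```

For example, pwm_phases(100000, 25) gives 25000 cycles ON and 75000 cycles OFF, while pwm_phases(4000, 50) is rejected.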

Memory Power Down Functions
DS Main Memory Control
DS Firmware Serial Flash Memory

  DS Main Memory Control

Main Memory
The DS Main Memory is 2Mx16bit (4MByte), 1.8V Pseudo SRAM (PSRAM); all Dynamic RAM refresh is handled internally, the chip doesn't require any external refresh signals, and altogether behaves like Static RAM. Non-sequential access time is 70ns, sequential (burst) access time is 12ns.

Main Memory Control
The memory chips contain built-in Control functions, which can be accessed via Port 27FFFFEh and/or by EXMEMCNT Bit 14. Nintendo is using at least two different types of memory chips in DS consoles, Fujitsu 82DBS02163C-70L, and ST M69AB048BL70ZA8, both appear to have different control mechanisms, other chips (with 8MB size) are used in the semi-professional DS hardware debuggers, and further chips may be used in future, so using the memory control functions may lead to compatibility problems.

Power Consumption / Power Control
Power Consumption during operation (read/write access) is somewhat 30mA, in standby mode (no read/write access) consumption is reduced to 100uA.
Furthermore, a number of power-down modes are supported: In "Deep" Power Down mode the refresh is fully disabled, consumption is 10uA (and all data will be lost), in "Partial" Power Down modes only a fragment of memory is refreshed, for smallest fragments, consumption goes down to circa 50uA. The chip cannot be accessed while it is in Deep or Partial Power Down mode.

Fujitsu 82DBS02163C-70L
The Configuration Register (CR) can be written to by the following sequence:
  LDRH R0,[27FFFFEh]      ;read one value
  STRH R0,[27FFFFEh]      ;write should be same value as above
  STRH R0,[27FFFFEh]      ;write should be same value as above
  STRH R0,[27FFFFEh]      ;write any value
  STRH R0,[27FFFFEh]      ;write any value
  LDRH R0,[2400000h+CR*2] ;read, address-bits are defining new CR value
Do not access any other Main Memory addresses during above sequence (ie. disable interrupts, and do not execute the sequence by code located in Main Memory). The CR value is write-only. The CR bits are:
  Bit    Expl.
  0-6    Reserved         (Must be 7Fh)
  7      Write Control
           0=WE Single Clock Pulse Control without Write Suspend Function
           1=WE Level Control with Write Suspend Function
          Burst Read/Single Write is not supported at WE Single Clock Mode.
  8      Reserved         (Must be 1)
  9      Valid Clock Edge (0=Falling Edge, 1=Rising Edge)
  10     Single Write     (0=Burst Read/Burst Write, 1=Burst Read/Single Write)
  11     Burst Sequence   (0=Reserved, 1=Sequential)
  12-14  Read Latency     (1=3 clocks, 2=4 clocks, 3=5 clocks, other=Reserved)
  15     Mode
           0=Synchronous:  Burst Read, Burst Write
           1=Asynchronous: Page Read, Normal Write
          In Mode 1 (Async), only the Partial Size bits are used,
          all other bits, CR bits 0..18, must be "1".
  16-18  Burst Length     (2=8 Words, 3=16 Words, 7=Continuous, other=Reserved)
  19-20  Partial Size     (0=1MB, 1=512KB, 2=Reserved, 3=Deep/0 bytes)
The Power Down mode is entered by setting CE2=LOW, this can be probably done by setting EXMEMCNT Bit14 to zero.
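Since the new CR value is passed in the address bits of the final read, it may help to compute that address beforehand. The sketch below packs the CR fields listed above and derives the address as 2400000h+CR*2; the function and parameter names are hypothetical, and reserved field values are not rejected.

```python
CR_ADDRESS_BASE = 0x2400000


def fujitsu_cr(write_control, clock_edge, single_write, burst_sequence,
               read_latency, mode, burst_length, partial_size):
    """Pack the Configuration Register fields into a 21bit CR value."""
    fields = (
        (write_control, 1, 7), (clock_edge, 1, 9), (single_write, 1, 10),
        (burst_sequence, 1, 11), (read_latency, 3, 12), (mode, 1, 15),
        (burst_length, 3, 16), (partial_size, 2, 19),
    )
    cr = 0x7F | (1 << 8)          # reserved bits 0-6 = 7Fh, bit 8 = 1
    for value, width, shift in fields:
        if not 0 <= value < (1 << width):
            raise ValueError("field value out of range")
        cr |= value << shift
    return cr


def cr_read_address(cr):
    """Address of the final LDRH that latches the new CR value."""
    return CR_ADDRESS_BASE + cr * 2
```

With all fields set to their maximum values (CR=1FFFFFh), the read address becomes 27FFFFEh, ie. the same address as used for the unlock sequence.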

ST Microelectronics M69AB048BL70ZA8
The chip name decodes as PSRAM (M69), Asynchronous (A), 1.8V Burst (B), 2Mx16 (048), Two Chip Enables (B), Low Leakage (L), 70ns (70), Package (ZA), -30..+85'C (8).
There are three data sheets for different PSRAM chips available (unfortunately none for M69AB048BL70ZA8), each using different memory control mechanisms.

The NDS9 BIOS contains the following Main Memory initialization code, that method doesn't match up with any ST (nor Fujitsu) data sheets that I've seen. At its best, it looks like a strange (and presumably non-functional) mix-up of different ST control methods.
  STRH 2000h,[4000204h]
  LDRH R0,[27FFFFEh]
  STRH R0,[27FFFFEh]
  STRH R0,[27FFFFEh]
  STRH E732h,[27FFFFEh]
  LDRH R0,[27E57FEh]
  STRH 6000h,[4000204h]
In the above BIOS code, EXMEMCNT.14 appears to be used to unlock the control register. However, the NDS Firmware appears to use EXMEMCNT.14 to switch Main Memory into Power Down mode before entering GBA mode.

  DS Backwards-compatible GBA-Mode

When booting a 32pin GBA cartridge, the NDS is automatically switched into GBA mode, in that mode all NDS related features are disabled, and the console behaves (almost) like a GBA.

GBA Features that are NOT supported on NDS in GBA Mode.
Unlike real GBA, the NDS does not support 8bit DMG/CGB cartridges.
The undocumented Internal Memory Control register (Port 800h) isn't supported, so the NDS doesn't allow to use 'overclocked' RAM.
The NDS doesn't have a link-port, so GBA games can be played only in single player mode, link-port accessories cannot be used, and the NDS cannot run GBA code via multiboot.

GBA Features that are slightly different on NDS in GBA Mode.
The CPU, Timers, and Sound Frequencies are probably clocked at 16.76MHz; 33.51MHz/2; a bit slower than the original GBA's 16.78MHz clock?
In the BIOS, a single byte in a formerly 00h-filled area has been changed from 00h to 01h, resulting in SWI 0Dh returning a different BIOS checksum.
The GBA picture can be shown on upper or lower screen (selectable in boot-menu), the backlight for the selected screen is always on, resulting in different colors & much better visibility than original GBA. Unlike GBA-SP, the NDS doesn't have a backlight-button.

Screen Border in GBA mode
The GBA screen is centered in the middle of the NDS screen. The surrounding pixels are defined by 32K-color bitmap data in VRAM Block A and B. Each frame, the GBA picture is captured into one block, and is displayed in the next frame (while capturing new data to the other block).
To get a flicker-free border, both blocks should be initialized to contain the same image before entering GBA mode (usually both are zero-filled, resulting in a plain black border).
Note: When using two different borders, the flickering will be irregular - so there appears to be a frame inserted or skipped once every some seconds in GBA mode?!

Switching from NDS Mode to GBA Mode
  --- NDS9: ---
  ZEROFILL VRAM A,B     ;init black screen border (or other color/image)
  POWCNT=8003h          ;enable 2D engine A on upper screen (0003h=lower)
  EXMEMCNT=...          ;set Async Main Memory mode (clear bit14)
  IME=0                 ;disable interrupts
  SWI 06h               ;halt with interrupts disabled (lockdown)
  --- NDS7: ---
  POWERMAN.REG0=09h     ;enable sound amplifier & upper backlight (05h=lower)
  IME=0                 ;disable interrupts
  wait for VCOUNT=200   ;wait until VBlank
  SWI 1Fh with R2=40h   ;enter GBA mode, by CustomHalt(40h)
After that, the GBA BIOS will be booted, the GBA Intro will be displayed, and the GBA cartridge (if any) will be started.

  DS Debug Registers (Emulator/Devkits)

No$gba Emulator Pseudo I/O Ports (no$gba) (NDS9)
  4FFFA00h..A0Fh R Emulation ID (16 bytes, eg. "no$gba v2.7", padded with 20h)
  4FFFA10h       W String Out (raw)
  4FFFA14h       W String Out (with %param's)
  4FFFA18h       W String Out (with %param's, plus linefeed)
  4FFFA1Ch       W Char Out (nocash)
  4FFFA20h..A27h R Clock Cycles (64bit)
  4FFFA28h..A3Fh - N/A

Ensata Emulator Pseudo I/O Ports (NDS9)
  4000640h (32bit) ;aka CLIPMTX_RESULT (mis-used to invoke detection)
  4000006h (16bit) ;aka VCOUNT (mis-used to get detection result)
  4FFF010h (32bit) ;use to initialize/unlock/reset something
  4FFF000h (8bit)  ;debug message character output (used when Ensata detected)
The Ensata detection works by mis-using CLIPMTX_RESULT and VCOUNT registers:
  [4000640h]=2468ACE0h      ;CLIPMTX_RESULT (on real hardware it's read-only)
  if ([4000006h] AND 1FFh)=10Eh ;VCOUNT (on real hardware it's 000h..106h)
    [4FFF010h]=13579BDFh        ;\initialize/reset something
    [4FFF010h]=FDB97531h        ;/
Once when a commercial game has detected Ensata, it stops communicating with the ARM7, and instead it does seem to want to communicate with the Ensata executable (which has little to do with real NDS hardware). Ie. aside from "unlocking" port 4FFF000h, it does also "lock" access to the ARM7 hardware (like sound, touchscreen, RTC, etc).

ISD (Intelligent Systems Debugger or so) I/O Ports
The ISD ports seem to be real (non-emulated) debugging ports, mapped to the GBA Slot region at 8000000h-9FFFFFFh, and used to output text messages, and possibly also other debugging stuff.
There appear to be two variants: nitroemu and cgbemu (the latter appears to be dating back to old 8bit CGB hardware; which was apparently still used for the NDS two hardware generations later).

NDS Devkit
In Nintendo's devkit, debug messages are handled in file "os_printf.c", this file detects the available hardware/software based debug I/O ports, and redirects the [OS_PutString] vector to the corresponding string_out function (eg. to OS_PutStringAris for writing a 00h-terminated string to port 4FFF000h). With some minimal efforts, this could be redirected to the corresponding no$gba debug I/O ports.

  DS Cartridges, Encryption, Firmware

DS Cartridge Header
DS Cartridge Secure Area
DS Cartridge Icon/Title
DS Cartridge Protocol
DS Cartridge Backup
DS Cartridge I/O Ports
DS Cartridge NitroROM and NitroARC File Systems
DS Cartridge PassMe/PassThrough
DS Cartridge GBA Slot

DS Cart Rumble Pak
DS Cart Slider with Rumble
DS Cart Expansion RAM
DS Cart Unknown Extras

Special Cartridges
DS Cart Cheat Action Replay DS
DS Cart Cheat Codebreaker DS

DS Encryption by Gamecode/Idcode (KEY1)
DS Encryption by Random Seed (KEY2)

Firmware / Wifi Flash
DS Firmware Serial Flash Memory
DS Firmware Header
DS Firmware Wifi Calibration Data
DS Firmware Wifi Internet Access Points
DS Firmware User Settings
DS Firmware Extended Settings

  DS Cartridge Header

Header Overview (loaded from ROM Addr 0 to Main RAM 27FFE00h on Power-up)
  Address Bytes Expl.
  000h    12    Game Title  (Uppercase ASCII, padded with 00h)
  00Ch    4     Gamecode    (Uppercase ASCII, NTR-<code>)        (0=homebrew)
  010h    2     Makercode   (Uppercase ASCII, eg. "01"=Nintendo) (0=homebrew)
  012h    1     Unitcode    (00h=NDS, 02h=NDS+DSi, 03h=DSi) (bit1=DSi)
  013h    1     Encryption Seed Select (00..07h, usually 00h)
  014h    1     Devicecapacity         (Chipsize = 128KB SHL nn) (eg. 7 = 16MB)
  015h    7     Reserved    (zero filled)
  01Ch    1     Reserved    (zero)                      (except, used on DSi)
  01Dh    1     NDS Region  (00h=Normal, 80h=China, 40h=Korea) (other on DSi)
  01Eh    1     ROM Version (usually 00h)
  01Fh    1     Autostart (Bit2: Skip "Press Button" after Health and Safety)
                (Also skips bootmenu, even in Manual mode & even Start pressed)
  020h    4     ARM9 rom_offset    (4000h and up, align 1000h)
  024h    4     ARM9 entry_address (2000000h..23BFE00h)
  028h    4     ARM9 ram_address   (2000000h..23BFE00h)
  02Ch    4     ARM9 size          (max 3BFE00h) (3839.5KB)
  030h    4     ARM7 rom_offset    (8000h and up)
  034h    4     ARM7 entry_address (2000000h..23BFE00h, or 37F8000h..3807E00h)
  038h    4     ARM7 ram_address   (2000000h..23BFE00h, or 37F8000h..3807E00h)
  03Ch    4     ARM7 size          (max 3BFE00h, or FE00h) (3839.5KB, 63.5KB)
  040h    4     File Name Table (FNT) offset
  044h    4     File Name Table (FNT) size
  048h    4     File Allocation Table (FAT) offset
  04Ch    4     File Allocation Table (FAT) size
  050h    4     File ARM9 overlay_offset
  054h    4     File ARM9 overlay_size
  058h    4     File ARM7 overlay_offset
  05Ch    4     File ARM7 overlay_size
  060h    4     Port 40001A4h setting for normal commands (usually 00586000h)
  064h    4     Port 40001A4h setting for KEY1 commands   (usually 001808F8h)
  068h    4     Icon/Title offset (0=None) (8000h and up)
  06Ch    2     Secure Area Checksum, CRC-16 of [[020h]..00007FFFh]
  06Eh    2     Secure Area Delay (in 131kHz units) (051Eh=10ms or 0D7Eh=26ms)
  070h    4     ARM9 Auto Load List RAM Address (?)
  074h    4     ARM7 Auto Load List RAM Address (?)
  078h    8     Secure Area Disable (by encrypted "NmMdOnly") (usually zero)
  080h    4     Total Used ROM size (remaining/unused bytes usually FFh-padded)
  084h    4     ROM Header Size (4000h)
  088h    38h   Reserved (zero filled) (except, [88h..93h] used on DSi)
  0C0h    9Ch   Nintendo Logo (compressed bitmap, same as in GBA Headers)
  15Ch    2     Nintendo Logo Checksum, CRC-16 of [0C0h-15Bh], fixed CF56h
  15Eh    2     Header Checksum, CRC-16 of [000h-15Dh]
  160h    4     Debug rom_offset   (0=none) (8000h and up)       ;only if debug
  164h    4     Debug size         (0=none) (max 3BFE00h)        ;version with
  168h    4     Debug ram_address  (0=none) (2400000h..27BFE00h) ;SIO and 8MB
  16Ch    4     Reserved (zero filled) (transferred, and stored, but not used)
  170h    90h   Reserved (zero filled) (transferred, but not stored in RAM)
DSi Cartridges are using an extended cartridge header,
DSi Cartridge Header
Newer NDS cartridges are reportedly containing RSA signatures - the format of those signatures is still unknown (probably it's same or similar as in DSi headers), those RSA signatures are required for running NDS carts on DSi consoles (at least with newer DSi firmwares) (the DSi firmware contains a whitelist with known checksums for all existing older NDS games, and requires RSA signatures in newer NDS games - this is making it impossible to run unlicensed/homebrew NDS programs on DSi, unless using trickery such as savegame exploits).

For more info about CRC-16, see description of GetCRC16 BIOS function,
BIOS Misc Functions
For the Logo checksum, the BIOS verifies only [15Ch]=CF56h, it does NOT verify the actual data at [0C0h-15Bh] (nor its checksum), however, the data is verified by the firmware.
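Reading a few of the header entries and checking the Header Checksum could look like the sketch below. Two assumptions that are not stated in the table above: multi-byte entries are taken as little-endian, and the CRC-16 is taken to be the common reflected A001h variant with initial value FFFFh. The function names and the returned dictionary are made up for this example.

```python
import struct


def crc16(data, crc=0xFFFF):
    """CRC-16, reflected polynomial A001h (assumed initial value FFFFh)."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def parse_header(hdr):
    """Extract some cartridge header entries from at least 160h bytes."""
    rom_offset, entry, ram, size = struct.unpack_from("<4I", hdr, 0x020)
    stored_crc, = struct.unpack_from("<H", hdr, 0x15E)
    return {
        "title": hdr[0x000:0x00C].rstrip(b"\x00").decode("ascii", "replace"),
        "gamecode": hdr[0x00C:0x010].decode("ascii", "replace"),
        "unitcode": hdr[0x012],
        "chip_size": (128 * 1024) << hdr[0x014],   # 128KB SHL nn
        "arm9_rom_offset": rom_offset,
        "arm9_entry_address": entry,
        "arm9_ram_address": ram,
        "arm9_size": size,
        "header_crc_ok": stored_crc == crc16(hdr[0x000:0x15E]),
    }
```

The function expects the raw header bytes as read from ROM address 0, for example the first 200h bytes of a ROM-image.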

Secure Area Delay
The Secure Area Delay at header[06Eh] is counted in 130.912kHz units (which can be clocked via one of the hardware timers with prescaler=F/256 and reload=(10000h-((X AND 3FFFh)+2)); for some weird reason, in case of Header checksum it's ANDed with 1FFFh instead of 3FFFh). Commonly used values are X=051Eh (10ms), and X=0D7Eh (26ms).
The delay is used for all Blowfish encrypted commands, the actual usage/purpose differs depending on bit31 of the ROM Chip ID:
When ChipID.Bit31=0 (commands are sent ONCE): The delay is issued BEFORE sending the command:
Older/newer games are using delays of 10ms/26ms (although all known existing cartridges with Bit31=0 would actually work WITHOUT delays).
When ChipID.Bit31=1 (commands are repeated MULTIPLE times): The delay is issued AFTER sending the command for the FIRST time:
  Cmd,Delay,Cmd                               ;for 2x repeat
  Cmd,Delay,Cmd,Cmd,Cmd,Cmd,Cmd,Cmd,Cmd,Cmd   ;for 9x repeat
Known games are using delays of 26ms (although all known existing cartridges (=Cooking Coach) with Bit31=1 would actually work with shorter delays of ca. 6.5ms).
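The relation between the header[06Eh] value, the delay time, and the timer reload value can be sketched as below (helper names are hypothetical; the mask parameter covers the 1FFFh special case mentioned above).

```python
DELAY_UNIT_HZ = 130912   # 130.912kHz units


def delay_ms(x):
    """Convert a Secure Area Delay value to milliseconds."""
    return x * 1000 / DELAY_UNIT_HZ


def timer_reload(x, mask=0x3FFF):
    """Timer reload value for prescaler=F/256, as per above formula."""
    return 0x10000 - ((x & mask) + 2)
```

For example, timer_reload(051Eh) gives FAE0h, and delay_ms gives circa 10ms for 051Eh and circa 26.4ms for 0D7Eh.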

NDS Gamecodes
This is the same code as the NTR-UTTD (NDS) or TWL-UTTD (DSi) code which is printed on the package and sticker on (commercial) cartridges (excluding the leading "NTR-" or "TWL-" part).
  U  Unique Code          (usually "A", "B", "C", or special meaning)
  TT Short Title          (eg. "PM" for Pac Man)
  D  Destination/Language (usually "J" or "E" or "P" or specific language)
The first character (U) is usually "A" or "B", in detail:
  A NDS common games
  B NDS common games
  C NDS common games
  D DSi-exclusive games
  H DSiWare (system utilities and browser) (eg. HNGP=browser)
  I NDS and DSi-enhanced games with built-in Infrared port
  K DSiWare (dsiware games and flipnote) (eg. KGUV=flipnote)
  N NDS nintendo channel demo's japan (NTR-NTRJ-JPN)
  T NDS many games
  U NDS utilities, educational games, or uncommon extra hardware?
  V DSi-enhanced games
  Y NDS many games
The second/third characters (TT) are:
  Usually an abbreviation of the game title (eg. "PM" for "Pac Man") (unless
  that gamecode was already used for another game, then TT is just random)
The fourth character (D) indicates Destination/Language:
  A Asian    E English/USA  I Italian   M Swedish  Q Danish   U Australian
  B N/A      F French       J Japanese  N Nor      R Russian  V EUR+AUS
  C Chinese  G N/A          K Korean    O Int      S Spanish  W..Z Europe #3..5
  D German   H Dutch        L USA #2    P Europe   T USA+AUS
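Splitting a gamecode into its parts is straightforward; the sketch below uses the Destination/Language table from above (the helper name and the returned tuple are made up, and W..Z are simply lumped together).

```python
DESTINATIONS = {
    "A": "Asian", "B": "N/A", "C": "Chinese", "D": "German",
    "E": "English/USA", "F": "French", "G": "N/A", "H": "Dutch",
    "I": "Italian", "J": "Japanese", "K": "Korean", "L": "USA #2",
    "M": "Swedish", "N": "Nor", "O": "Int", "P": "Europe",
    "Q": "Danish", "R": "Russian", "S": "Spanish", "T": "USA+AUS",
    "U": "Australian", "V": "EUR+AUS",
    "W": "Europe #3..5", "X": "Europe #3..5",
    "Y": "Europe #3..5", "Z": "Europe #3..5",
}


def parse_gamecode(code):
    """Split a 4-character gamecode (UTTD) into (U, TT, destination)."""
    if len(code) != 4:
        raise ValueError("gamecode must have 4 characters")
    unique, title, dest = code[0], code[1:3], code[3]
    return unique, title, DESTINATIONS.get(dest, "unknown")
```

For example, parse_gamecode("HNGP") returns ("H", "NG", "Europe").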

  DS Cartridge Secure Area

The Secure Area is located in ROM at 4000h..7FFFh, it can contain normal program code and data, however, it can be used only for ARM9 boot code, it cannot be used for ARM7 boot code, icon/title, filesystem, or other data.

Secure Area Size
The Secure Area exists if the ARM9 boot code ROM source address (src) is located within 4000h..7FFFh, if so, it will be loaded (by BIOS via KEY1 encrypted commands) in 4K portions, starting at src, aligned by 1000h, up to address 7FFFh. The secure area size is thus 8000h-src, regardless of the ARM9 boot code size entry in header.
Note: The BIOS silently skips any NDS9 bootcode at src<4000h.
Cartridges with src>=8000h do not have a secure area.
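The size rule can be written as a small helper (hypothetical name; src is assumed to be aligned by 1000h already, as required by the header).

```python
def secure_area_size(src):
    """Return the Secure Area size for a given ARM9 rom_offset.

    Returns 0 when there is no Secure Area, ie. for src>=8000h, and
    for src<4000h (where the BIOS skips the NDS9 bootcode).
    """
    if 0x4000 <= src <= 0x7FFF:
        return 0x8000 - src
    return 0
```

For example, secure_area_size(4000h) gives 4000h, and secure_area_size(5000h) gives 3000h.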

Secure Area ID
The first 8 bytes of the secure area are containing the Secure Area ID, the ID is required (verified by BIOS boot code), the ID value changes during boot process:
  Value                Expl.
  "encryObj"           raw ID before encryption (raw ROM-image)
  (encrypted)          encrypted ID after encryption (encrypted ROM-image)
  "encryObj"           raw ID after decryption (verified by BIOS boot code)
  E7FFDEFFh,E7FFDEFFh  destroyed ID (overwritten by BIOS after verify)
If the decrypted ID does match, then the BIOS overwrites the first 8 bytes by E7FFDEFFh-values (ie. only the ID is destroyed). If the ID doesn't match, then the first 800h bytes (2K) are overwritten by E7FFDEFFh-values.
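The verify-and-destroy step can be modelled as below. This is only a sketch of the behaviour described above, working on an already decrypted Secure Area; the function name is made up, and E7FFDEFFh is assumed to be stored as little-endian bytes (FFh,DEh,FFh,E7h).

```python
DESTROYED_WORD = bytes.fromhex("FFDEFFE7")   # E7FFDEFFh, little-endian


def verify_secure_area_id(area):
    """Check the ID in a decrypted Secure Area (bytearray, >=800h bytes).

    Overwrites the first 8 bytes if the ID matches, or the first 800h
    bytes if it doesn't. Returns True when the ID was valid.
    """
    valid = bytes(area[:8]) == b"encryObj"
    count = 8 if valid else 0x800
    area[:count] = DESTROYED_WORD * (count // 4)
    return valid
```

After a successful check only the 8 ID bytes are lost, the remaining Secure Area content stays intact.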

Secure Area First 2K Encryption/Content
The first 2K of the Secure Area (if it exists) are KEY1 encrypted. In official games, this 2K region contains data like so (in decrypted form):
  000h..007h  Secure Area ID (see above)
  008h..00Dh  Fixed (FFh,DEh,FFh,E7h,FFh,DEh)
  00Eh..00Fh  CRC16 across following 7E0h bytes, ie. [010h..7FFh]
  010h..7FDh  Unknown/random values, mixed with some THUMB SWI calls
  7FEh..7FFh  Fixed (00h,00h)
Of which, only the ID in the first 8 bytes is verified. Neither BIOS nor (current) firmware versions are verifying the data at 008h..7FFh, so the 7F8h bytes may be also used for normal program code/data.

Avoiding Secure Area Encryption
WLAN files are reportedly same format as cartridges, but without Secure Area, so games with Secure Area cannot be booted via WLAN. No$gba can encrypt and decrypt Secure Areas only if the NDS BIOS-images are present. And, Nintendo's devkit doesn't seem to support Secure Area encryption of unreleased games.
So, unencrypted cartridges are more flexible in use. Ways to avoid encryption (which still work on real hardware) are:
1) Set NDS9 ROM offset to 4000h, and leave the first 800h bytes of the Secure Area 00h-filled, which can be (and will be) safely destroyed during loading; due to the missing "encryObj" ID; that method is used by Nintendo's devkit.
2) Set NDS9 ROM offset to 8000h or higher (cartridge has no Secure Area at all).
3) Set NDS9 ROM offset, RAM address, and size to zero, set NDS7 ROM offset to 200h, and point both NDS9 and NDS7 entrypoints to the loaded NDS7 region. That method avoids waste of unused memory at 200h..3FFFh, and it should be compatible with the NDS console, however, it is not compatible with commercial cartridges - which do silently redirect addresses below 4000h to "addr=8000h+(addr AND 1FFh)". Still, it should work with unofficial flashcards, which do not do that redirection. No$gba emulates the redirection for regular official cartridges, but it disables redirection for homebrew carts if NDS7 rom offset<8000h, and NDS7 size>0.
[One possible problem: Newer "anti-passme" firmware versions reportedly check that the entrypoint isn't set to 80000C0h, those firmwares might also reject NDS9 entrypoints within the NDS7 bootcode region?]

  DS Cartridge Icon/Title

The ROM offset of the Icon/Title is defined in CartHdr[68h]. The size was originally implied by the size of the original Icon/Title structure rounded to 200h-byte sector boundary (ie. A00h bytes for Version 1 or 2), however, later DSi carts are having a size entry at CartHdr[208h] (usually 23C0h).
If it is present (ie. if CartHdr[68h]=nonzero), then Icon/Title are displayed in the bootmenu.
  0000h 2     Version (0001h, 0002h, 0003h, or 0103h)
  0002h 2     CRC16 across entries 0020h..083Fh (all versions)
  0004h 2     CRC16 across entries 0020h..093Fh (Version 0002h and up)
  0006h 2     CRC16 across entries 0020h..0A3Fh (Version 0003h and up)
  0008h 2     CRC16 across entries 1240h..23BFh (Version 0103h and up)
  000Ah 16h   Reserved (zero-filled)
  0020h 200h  Icon Bitmap  (32x32 pix) (4x4 tiles, 4bit depth) (4x8 bytes/tile)
  0220h 20h   Icon Palette (16 colors, 16bit, range 0000h-7FFFh)
              (Color 0 is transparent, so the 1st palette entry is ignored)
  0240h 100h  Title 0 Japanese  (128 characters, 16bit Unicode)
  0340h 100h  Title 1 English   ("")
  0440h 100h  Title 2 French    ("")
  0540h 100h  Title 3 German    ("")
  0640h 100h  Title 4 Italian   ("")
  0740h 100h  Title 5 Spanish   ("")
  0840h 100h  Title 6 Chinese   ("")                 (Version 0002h and up)
  0940h 100h  Title 7 Korean    ("")                 (Version 0003h and up)
  0A40h 800h  Zerofilled (probably reserved for Title 8..15)
Below for animated DSi icons only (Version 0103h and up):
  1240h 1000h Icon Animation Bitmap 0..7 (200h bytes each, format as above)
  2240h 100h  Icon Animation Palette 0..7 (20h bytes each, format as above)
  2340h 80h   Icon Animation Sequence (16bit tokens)
Unused/padding bytes:
  0840h 1C0h  Unused/padding (FFh-filled) in Version 0001h
  0940h C0h   Unused/padding (FFh-filled) in Version 0002h
  23C0h 40h   Unused/padding (FFh-filled) in Version 0103h

  0001h = Original
  0002h = With Chinese Title
  0003h = With Chinese+Korean Titles
  0103h = With Chinese+Korean Titles and animated DSi icon
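
The CRC16 ranges per version can be sketched as below. Note the assumptions: this section does not state the CRC polynomial, the code uses the common reflected A001h polynomial, and the initial value FFFFh is taken from the banner.sav description further below. All names are made up for this example.

```python
def crc16(data, initial=0xFFFF):
    """Bitwise CRC16, reflected polynomial A001h (assumed, see lead-in)."""
    crc = initial
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc

# (header offset of CRC entry, minimum version, range start, range end+1)
CRC_FIELDS = [
    (0x0002, 0x0001, 0x0020, 0x0840),
    (0x0004, 0x0002, 0x0020, 0x0940),
    (0x0006, 0x0003, 0x0020, 0x0A40),
    (0x0008, 0x0103, 0x1240, 0x23C0),
]

def expected_crcs(icon_title):
    """Return {crc_entry_offset: crc} for the entries used by this version."""
    version = int.from_bytes(icon_title[0:2], "little")
    return {
        field: crc16(icon_title[start:end])
        for field, min_version, start, end in CRC_FIELDS
        if version >= min_version
    }
```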

Title Strings
Usually, for non-multilanguage games, the same (english) title is stored in all title entries. The title may consist of ASCII characters 0020h-007Fh, character 000Ah (linefeed), and should be terminated/padded by 0000h.
The whole text should not exceed the dimensions of the DS cart field in the bootmenu (the maximum number of characters differs due to non-proportional font).
The title is usually split into a primary title, optional sub-title, and manufacturer, each separated by 000Ah character(s). For example: "America", 000Ah, "The Axis of War", 000Ah, "Cynicware", 0000h.
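
A title entry could be built like this. It is a sketch only: the function name is hypothetical, and little-endian storage of the 16bit characters is an assumption (the text above only says "16bit Unicode").

```python
def encode_title(text):
    """Encode one 100h-byte title entry, terminated/padded with 0000h."""
    raw = text.encode("utf-16-le")
    if len(raw) > 0x100 - 2:             # keep room for one 0000h terminator
        raise ValueError("title too long for a 100h-byte entry")
    return raw.ljust(0x100, b"\x00")

title = encode_title("America\nThe Axis of War\nCynicware")
```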

Icon Animation Sequence (DSi)
The sequence is represented by 16bit tokens, in the following format:
  15    Flip Vertically   (0=No, 1=Yes)
  14    Flip Horizontally (0=No, 1=Yes)
  13-11 Palette Index     (0..7)
  10-8  Bitmap Index      (0..7)
  7-0   Frame Duration    (01h..FFh) (in 60Hz units)
Value 0000h indicates the end of the sequence. If the first token is 0000h, then the non-animated default image is shown.
Uh, actually, a non-animated icon uses values 01h,00h,00h,01h, followed by 7Ch zerofilled bytes (ie. 0001h, 0100h, 3Eh x 0000h)?
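
The token layout can be decoded as follows (a sketch, with made-up function and field names):

```python
def decode_token(token):
    """Split one 16bit animation token into its bit fields."""
    return {
        "flip_v":   bool((token >> 15) & 1),
        "flip_h":   bool((token >> 14) & 1),
        "palette":  (token >> 11) & 7,
        "bitmap":   (token >> 8) & 7,
        "duration": token & 0xFF,        # in 60Hz units
    }

def decode_sequence(tokens):
    """Decode tokens up to (not including) the 0000h end marker."""
    frames = []
    for token in tokens:
        if token == 0x0000:
            break
        frames.append(decode_token(token))
    return frames
```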

FAT16:\title\000300tt\4ggggggg\data\banner.sav ;if carthdr[1BFh].bit2=1
Some DSi games are having a separate "banner.sav" file stored in the eMMC filesystem, enabled via carthdr[1BFh].bit2 (allowing to indicate the game progress by overriding the default icon). The banner files are 4000h bytes in size, the animation data is same as above, but without title strings and without non-animated icon.
  0000h 2     Version (0103h)
  0002h 6     Reserved (zero-filled)
  0008h 2     CRC16 across entries 0020h..119Fh (with initial value FFFFh)
  000Ah 16h   Reserved (zero-filled)
  0020h 1000h Icon Animation Bitmap 0..7 (200h bytes each)  ;\same format as
  1020h 100h  Icon Animation Palette 0..7 (20h bytes each)  ; in Icon/Title
  1120h 80h   Icon Animation Sequence (16bit tokens)        ;/
  11A0h 2E60h Garbage (random values, maybe due to eMMC decryption)
The feature is used by some Brain Age Express games (for example, Brain Age Express Sudoku: 'title\00030004\4b4e3945\data\banner.sav').
The feature does probably work only for DSiware titles (unless there are any DSi carts with SD/MMC access enabled; or unless there is a feature for storing similar data in cartridge memory).

  DS Cartridge Protocol

Communication with Cartridge ROM relies on sending 8 byte commands to the cartridge. After sending the command, a data stream can be received from the cartridge (the length of the data stream isn't fixed, below descriptions show the default length in brackets, but one may receive more, or fewer bytes, if desired).

Cartridge Memory Map
  0000000h-0000FFFh Header (unencrypted)
  0001000h-0003FFFh Not read-able (zero filled in ROM-images)
  0004000h-0007FFFh Secure Area, 16KBytes (first 2Kbytes with extra encryption)
  0008000h-...      Main Data Area
DSi cartridges are split into a NDS area (as above), and a new DSi area:
  XX00000h-XX02FFFh DSi Not read-able (XX00000h=first megabyte after NDS area)
  XX03000h-XX06FFFh DSi ARM9i Secure Area (usually with modcrypt encryption)
  XX07000h-...      DSi Main Data Area
Cartridge memory must be copied to RAM (the CPU cannot execute code in ROM).

Command Summary, Cmd/Reply-Encryption Type, Default Length
  Command/Params    Expl.                             Cmd  Reply Len
  -- Unencrypted Load --
  9F00000000000000h Dummy (read HIGH-Z bytes)         RAW  RAW   2000h
  0000000000000000h Get Cartridge Header              RAW  RAW   200h DSi:1000h
  9000000000000000h 1st Get ROM Chip ID               RAW  RAW   4
  00aaaaaaaa000000h Unencrypted Data (debug ver only) RAW  RAW   200h
  3Ciiijjjxkkkkkxxh Activate KEY1 Encryption Mode     RAW  RAW   0
  -- Secure Area Load --
  4llllmmmnnnkkkkkh Activate KEY2 Encryption Mode     KEY1 FIX   910h+0
  1lllliiijjjkkkkkh 2nd Get ROM Chip ID               KEY1 KEY2  910h+4
  xxxxxxxxxxxxxxxxh Invalid - Get KEY2 Stream XOR 00h KEY1 KEY2  910h+...
  2bbbbiiijjjkkkkkh Get Secure Area Block (4Kbytes)   KEY1 KEY2  910h+10A8h
  6lllliiijjjkkkkkh Optional KEY2 Disable             KEY1 KEY2  910h+?
  Alllliiijjjkkkkkh Enter Main Data Mode              KEY1 KEY2  910h+0
  -- Main Data Load --
  B7aaaaaaaa000000h Encrypted Data Read               KEY2 KEY2  200h
  B800000000000000h 3rd Get ROM Chip ID               KEY2 KEY2  4
  xxxxxxxxxxxxxxxxh Invalid - Get KEY2 Stream XOR 00h KEY2 KEY2  ...
The parameter digits contained in above commands are:
  aaaaaaaa     32bit ROM address (command B7 can access only 8000h and up)
  bbbb         Secure Area Block number (0004h..0007h for addr 4000h..7000h)
  x,xx         Random, not used in further commands (DSi: always zero)
  iii,jjj,llll Random, must be SAME value in further commands
  kkkkk        Random, must be INCREMENTED after FURTHER commands
  mmm,nnn      Random, used as KEY2-encryption seed
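
As an illustration of the digit layout, the following sketch packs the parameters of a KEY1-phase command into a 64bit value (before KEY1 encryption, which is not shown here). Function names are made up, and the wrap of kkkkk at 20 bits is an assumption.

```python
def key1_command(cmd, llll, iii, jjj, kkkkk):
    """Pack digits 'c llll iii jjj kkkkk' into one 64bit command value.

    For command 2 the llll field carries the block number bbbb, and for
    command 4 the iii/jjj fields carry the mmm/nnn seed digits.
    """
    fields = ((cmd, 4), (llll, 16), (iii, 12), (jjj, 12), (kkkkk, 20))
    value = 0
    for field, bits in fields:
        if not 0 <= field < (1 << bits):
            raise ValueError("parameter out of range")
        value = (value << bits) | field
    return value

def next_k(kkkkk):
    """kkkkk is incremented after each further command (20bit wrap assumed)."""
    return (kkkkk + 1) & 0xFFFFF
```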

++++ Unencrypted Commands (First Part of Boot Procedure) ++++

Cartridge Reset
The /RES Pin switches the cartridge into unencrypted mode. After reset, the first two commands (9Fh and 00h) are transferred at 4MB/s CLK rate.

9F00000000000000h (2000h) - Dummy
Dummy command sent after reset, returns endless stream of HIGH-Z bytes (ie. usually receiving FFh, immediately after sending the command, the first 1-2 received bytes may be equal to the last command byte).

0000000000000000h (200h) (DSi:1000h) - Get Header
Returns RAW unencrypted cartridge header, repeated every 1000h bytes. The interesting area is the first 200h bytes, the rest is typically zero filled (except on DSi carts, which do use the whole 1000h bytes).
The Gamecode header entry is used later on to initialize the encryption. Also, the ROM Control entries define the length of the KEY1 dummy periods (typically 910h clocks), and the CLK transfer rate for further commands (typically faster than the initial 4MB/s after power up).

9000000000000000h (4) - 1st Get ROM Chip ID
Returns RAW unencrypted Chip ID (eg. C2h,0Fh,00h,00h), repeated every 4 bytes.
  1st byte - Manufacturer (eg. C2h=Macronix) (roughly based on JEDEC IDs)
  2nd byte - Chip size (00h..7Fh: (N+1)Mbytes, F0h..FFh: (100h-N)*256Mbytes?)
  3rd byte - Flags (see below)
  4th byte - Flags (see below)
The Flag Bits in 3rd byte can be
  0   Maybe Infrared flag? (in case ROM does contain on-chip infrared stuff)
  1   Unknown (set in some 3DS carts)
  2-7 Zero
The Flag Bits in 4th byte can be
  0-2 Zero
  3   Seems to be NAND flag (0=ROM, 1=NAND) (observed in only ONE cartridge)
  4   3DS Flag (0=NDS/DSi, 1=3DS)
  5   Zero   ... set in ... DSi-exclusive games?
  6   DSi flag (0=NDS/3DS, 1=DSi)
  7   Cart Protocol Variant (0=older/smaller carts, 1=newer/bigger carts)
Existing/known ROM IDs are:
  C2h,07h,00h,00h NDS Macronix 8MB ROM  (eg. DS Vision)
  C2h,0Fh,00h,00h NDS Macronix 16MB ROM (eg. Metroid Demo)
  C2h,1Fh,00h,00h NDS Macronix 32MB ROM (eg. Over the Hedge)
  C2h,1Fh,00h,40h DSi Macronix 32MB ROM (eg. Art Academy, TWL-VAAV)
  80h,3Fh,01h,E0h ?            64MB ROM+Infrared (eg. Walk with Me, NTR-IMWP)
  AEh,3Fh,00h,E0h DSi Noname   64MB ROM (eg. de Blob 2, TWL-VD2V)
  C2h,3Fh,00h,00h NDS Macronix 64MB ROM (eg. Ultimate Spiderman)
  C2h,3Fh,00h,40h DSi Macronix 64MB ROM (eg. Crime Lab, NTR-VAOP)
  80h,7Fh,00h,80h NDS SanDisk  128MB ROM (DS Zelda, NTR-AZEP-0)
  80h,7Fh,01h,E0h ?            128MB ROM+Infrared? (P-letter Soul Silver, IPGE)
  C2h,7Fh,00h,80h NDS Macronix 128MB ROM (eg. Spirit Tracks, NTR-BKIP)
  C2h,7Fh,00h,C0h DSi Macronix 128MB ROM (eg. Cooking Coach/TWL-VCKE)
  ECh,7Fh,00h,88h NDS Samsung  128MB NAND (eg. Warioware D.I.Y.)
  ECh,7Fh,01h,88h NDS Samsung? 128MB NAND+What? (eg. Jam with the Band, UXBP)
  ECh,7Fh,00h,E8h DSi Samsung? 128MB NAND (eg. Face Training, USKV)
  80h,FFh,80h,E0h NDS          256MB ROM (Kingdom Hearts - Re-Coded, NTR-BK9P)
  C2h,FFh,01h,C0h DSi Macronix 256MB ROM+Infrared? (eg. P-Letter White)
  C2h,FFh,00h,80h NDS Macronix 256MB ROM (eg. Band Hero, NTR-BGHP)
  C2h,FEh,01h,C0h DSi Macronix 512MB ROM+Infrared? (eg. P-Letter White 2)
  C2h,FEh,00h,90h 3DS Macronix probably 512MB? ROM (eg. Sims 3)
  45h,FAh,00h,90h 3DS SunDisk? maybe... 1.5GB? ROM (eg. Starfox)
  C2h,F8h,00h,90h 3DS Macronix maybe... 2GB?   ROM (eg. Kid Icarus)
  C2h,7Fh,00h,90h 3DS Macronix 128MB ROM CTR-P-AENJ MMinna no Ennichi
  C2h,FFh,00h,90h 3DS Macronix 256MB ROM CTR-P-AFSJ Pro Yakyuu Famista 2011
  C2h,FEh,00h,90h 3DS Macronix 512MB ROM CTR-P-AFAJ Real 3D Bass FishingFishOn
  C2h,FAh,00h,90h 3DS Macronix 1GB ROM CTR-P-ASUJ Hana to Ikimono Rittai Zukan
  C2h,FAh,02h,90h 3DS Macronix 1GB ROM CTR-P-AGGW Luigis Mansion 2 ASiA CHT
  C2h,F8h,00h,90h 3DS Macronix 2GB ROM CTR-P-ACFJ Castlevania - Lords of Shadow
  C2h,F8h,02h,90h 3DS Macronix 2GB ROM CTR-P-AH4J Monster Hunter 4
  AEh,FAh,00h,90h 3DS          1GB ROM CTR-P-AGKJ Gyakuten Saiban 5
  AEh,FAh,00h,98h 3DS          1GB NAND CTR-P-EGDJ Tobidase Doubutsu no Mori
  45h,FAh,00h,90h 3DS          1GB ROM CTR-P-AFLJ Fantasy Life
  45h,F8h,00h,90h 3DS          2GB ROM CTR-P-AVHJ Senran Kagura Burst - Guren
  C2h,F0h,00h,90h 3DS Macronix 4GB ROM CTR-P-ABRJ Biohazard Revelations
  FFh,FFh,FFh,FFh None (no cartridge inserted)
The Samsung NAND chip appears to use a slightly different protocol (seems as if it allows to read ROM header and ID only once, or as if it gets confused when reading more than 4 ID bytes, or so) (and of course, the protocol is somehow extended, allowing to write data to the NAND memory). The official JEDEC ID for Samsung would be "CEh", but for some reason, Samsung's NDS chip does spit out "ECh" as Maker ID.
ID "45h" ("SunDisk" according to a JEDEC ID list) might refer to "SanDisk"?
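
The ID fields described above can be decoded roughly as below. This is a sketch under the same uncertainties as the text (the size formula for F0h..FFh and several flag meanings are guesses in the source). Field names are made up.

```python
def decode_chip_id(chip_id):
    """Decode a 4-byte ROM Chip ID into maker, size, and flag bits."""
    maker, size, flags3, flags4 = chip_id[:4]
    if size <= 0x7F:
        mbytes = size + 1                    # (N+1) Mbytes
    elif size >= 0xF0:
        mbytes = (0x100 - size) * 256        # (100h-N)*256 Mbytes (uncertain)
    else:
        mbytes = None                        # not covered by the formulas
    return {
        "maker": maker,
        "size_mbytes": mbytes,
        "infrared": bool(flags3 & 0x01),     # "maybe" per the text above
        "nand": bool(flags4 & 0x08),
        "is_3ds": bool(flags4 & 0x10),
        "is_dsi": bool(flags4 & 0x40),
        "protocol_variant": (flags4 >> 7) & 1,
    }
```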

3Ciiijjjxkkkkkxxh (0) - Activate KEY1 Encryption Mode
The 3Ch command returns endless stream of HIGH-Z bytes, all following commands, and their return values, are encrypted. The random parameters iii,jjj,kkkkk must be re-used in further commands; the 20bit kkkkk value is to be incremented by one after each <further> command (it is <not> incremented after the 3Ch command).

3Diiijjjxkkkkkxxh (0) - Activate KEY1 Encryption Mode and Unlock DSi Mode
Same as command 3Ch (but with different initial 1048h-byte encryption values), and works only on DSi carts. Command 3Dh is unlocking two features on DSi carts:
  1) Command 2bbbbiiijjjkkkkkh loads ARM9i secure area (instead of ARM9 area)
  2) Command B7aaaaaaaa000000h allows to read the 'whole' cartridge space
Without command 3Dh, DSi carts will allow to read only the first some megabytes (for example, the first 11 Mbyte of the System Flaw cartridge), and the remaining memory returns mirrors of "addr=8000h+(addr AND 1FFh)".
Note: After reset, the cartridge protocol allows to send only either one of the 3Ch/3Dh commands (DSi consoles can control the cartridge reset pin, so they can first send 3Ch and read the normal secure area, then issue a reset and 3Dh and read the DSi secure area) (on a NDS one could do the same by ejecting/inserting the cartridge instead of toggling the reset pin).

++++ KEY1 Encrypted Commands (2nd Part of Boot procedure) ++++

4llllmmmnnnkkkkkh (910h) - Activate KEY2 Encryption Mode
KEY1 encrypted command, parameter mmmnnn is used to initialize the KEY2 encryption stream. Returns 910h dummy bytes (which are still subject to old KEY2 settings; at pre-initialization time, this is fixed: HIGH-Z, C5h, 3Ah, 81h, etc.). The new KEY2 seeds are then applied, and the first KEY2 byte is then precomputed. The 910h dummy stream is followed by that precomputed byte value endless repeated (this is the same value as that "underneath" of the first HIGH-Z dummy-byte of the next command).
Secure1000h: Returns repeated FFh bytes (instead of the leading C5h, 3Ah, 81h, etc. stuff).
Secure1000h: Returns repeated FFh bytes (instead of the repeated precomputed value).

1lllliiijjjkkkkkh (914h) - 2nd Get ROM Chip ID / Get KEY2 Stream
KEY1 encrypted command. Returns 910h dummy bytes, followed by KEY2 encrypted Chip ID repeated every 4 bytes, which must be identical as for the 1st Get ID command. The BIOS randomly executes this command once or twice. Changing the first command byte to any other value returns an endless KEY2 encrypted stream of 00h bytes, that is the easiest way to retrieve encryption values and to bypass the copyprotection.

2bbbbiiijjjkkkkkh (19B8h) - Get Secure Area Block
KEY1 encrypted command. Used to read a secure area block (bbbb in range 0004h..0007h for addr 4000h..7000h) (or, after sending command 3Dh on a DSi: bbbb in range 0004h..0007h for addr XX03000h..XX06000h).
Each block is 4K, so it requires four Get Secure Area commands to receive the whole Secure Area (ROM locations 4000h-7FFFh), the BIOS is reading these blocks in random order.
Normally (if the upper bit of the Chip ID is set): Returns 910h dummy bytes, followed by 200h KEY2 encrypted Secure Area bytes, followed by 18h KEY2 encrypted 00h bytes, then the next 200h KEY2 encrypted Secure Area bytes, again followed by 18h KEY2 encrypted 00h bytes, and so on. That stream is repeated every 10C0h bytes (8x200h data bytes, plus 8x18h zero bytes).
Alternately (if the upper bit of the Chip ID is zero): Returns 910h dummy bytes, followed by 1000h KEY2 encrypted Secure Area bytes, presumably followed by 18h bytes, too.
Aside from above KEY2 encryption (which is done by hardware), the first 2K of the NDS Secure Area is additionally KEY1 encrypted; which must be resolved after transfer by software (and the DSi Secure Area is usually modcrypted, as specified in the cartridge header).
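
For the "normal" reply layout (upper Chip ID bit set), the 200h-byte portions can be picked out of the reply as sketched below. The input is assumed to be the full reply stream with KEY2 encryption already removed. The function name and default parameters are hypothetical, and the KEY1/modcrypt step for the first 2K is not shown.

```python
def extract_secure_block(stream, dummy=0x910, chunk=0x200, gap=0x18):
    """Return the 1000h data bytes of one Get Secure Area Block reply.

    Layout: 910h dummy bytes, then eight 200h-byte data portions, each
    followed by 18h zero bytes (the last gap may be missing).
    """
    out = bytearray()
    pos = dummy
    for _ in range(8):
        part = stream[pos:pos + chunk]
        if len(part) != chunk:
            raise ValueError("reply stream too short")
        out += part
        pos += chunk + gap
    return bytes(out)
```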

6lllliiijjjkkkkkh (0) - Optional KEY2 Disable
KEY1 encrypted command. Returns 910h dummy bytes (which are still KEY2 affected), followed by endless stream of RAW 00h bytes. KEY2 encryption is disabled for all following commands.
This command is sent only if firmware[18h] matches encrypted string "enPngOFF", and ONLY if firmware get_crypt_keys had completed BEFORE completion of secure area loading, this timing issue may cause unstable results.

Alllliiijjjkkkkkh (910h) - Enter Main Data Mode
KEY1 encrypted command. Returns 910h dummy bytes, followed by endless KEY2 encrypted stream of 00h bytes. All following commands are KEY2 encrypted.

++++ KEY2 Encrypted Commands (Main Data Transfer) ++++

B7aaaaaaaa000000h (200h) - Get Data
KEY2 encrypted command. The desired ROM address is specified, MSB first, in parameter bytes (a). Returned data is KEY2 encrypted.
There is no alignment restriction for the address. However, the datastream wraps to the begin of the current 4K block when address+length crosses a 4K boundary (1000h bytes).
The command can be used only for addresses 8000h and up. Addresses 0..7FFFh are silently redirected to address "8000h+(addr AND 1FFh)". DSi cartridges will also reject XX00000h..XX06FFFh in the same fashion (and also XX07000h and up if the DSi cartridge isn't unlocked via command 3Dh).
Addresses that do exceed the ROM size do mirror to the valid address range (that includes mirroring non-loadable regions like 0..7FFFh to "8000h+(addr AND 1FFh)"; some newer games are using this behaviour for some kind of anti-piracy checks).
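
The redirection of low addresses and the 4K wrap can be modelled as below. The sketch covers only plain NDS carts: mirroring beyond the ROM size and the DSi-specific regions are not modelled. Function names are made up.

```python
def effective_address(addr):
    """Addresses 0..7FFFh are redirected to 8000h+(addr AND 1FFh)."""
    return 0x8000 + (addr & 0x1FF) if addr < 0x8000 else addr

def stream_address(addr, n):
    """ROM address of the n-th byte of a B7 reply (wraps inside the 4K block)."""
    addr = effective_address(addr)
    return (addr & ~0xFFF) | ((addr + n) & 0xFFF)
```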

B800000000000000h (4) - 3rd Get ROM Chip ID
KEY2 encrypted command. Returns KEY2 encrypted Chip ID repeated every 4 bytes.

xxxxxxxxxxxxxxxxh - Invalid Command
Any other command (anything else than above B7h and B8h) in KEY2 command mode causes communication failures. The invalid command returns an endless KEY2 encrypted stream of 00h bytes. After the invalid command, the KEY2 stream is NOT advanced for further command bytes, further commands seem to return KEY2 encrypted 00h bytes, of which, the first returned byte appears to be HIGH-Z.
Ie. the cartridge seems to have switched back to a state similar to the KEY1-phase, although it doesn't seem to be possible to send KEY1 commands.

++++ Notes ++++

KEY1 Command Encryption / 910h Dummy Bytes
All KEY1 encrypted commands are followed by 910h dummy byte transfers, these 910h clock cycles are probably used to decrypt the command at the cartridge side; communication will fail when transferring less than 910h bytes.
The return values for the dummy transfer are: A single HIGH-Z byte, followed by 90Fh KEY2-encrypted 00h bytes. The KEY2 encryption stream is advanced for all 910h bytes, including for the HIGH-Z byte.
Note: Current cartridges are using 910h bytes, however, other carts might use other amounts of dummy bytes, the 910h value can be calculated based on ROM Control entries in cartridge header. For the KEY1 formulas, see:
DS Encryption by Gamecode/Idcode (KEY1)

KEY2 Command/Data Encryption
DS Encryption by Random Seed (KEY2)

Cart Protocol Variants (Chip ID.Bit31)
There are two protocol variants for NDS carts, indicated by Bit31 of the ROM Chip ID (aka bit7 of the 4th ID byte):
  1) Chip ID.Bit31=0  Used by older/smaller carts with up to 64MB ROM
  2) Chip ID.Bit31=1  Used by newer/bigger carts with 64MB or more ROM
The first variant (for older carts) is described above. The second variant includes some differences for KEY1 encrypted commands:
GAPS: The commands have the same 910h-cycle gaps, but without outputting CLK pulses during those gaps (ie. used with ROMCTRL.Bit28=0) (the absence of the CLKs implies that there is no dummy data transferred during gaps, and accordingly, that the KEY2 stream isn't advanced during the 910h gap cycles).
REPEATED COMMANDS and SECURE AREA DELAY: All KEY1 encrypted commands must be sent TWICE (or even NINE times). First, send the command with 0-byte Data transfer length. Second, issue the Secure Area Delay (required; use the delay specified in cart header[06Eh]).
Third, send the command once again with 0-byte or 4-byte data transfer length (usually 0 bytes, or 4-bytes for Chip ID command), or send it eight times with 200h-byte data transfer length (for the 1000h-byte secure area load command).
For those repeats, always resend exactly the same command (namely, kkkkk is NOT incremented during repeats, and there is no extra index needed to select 200h-byte portions within 1000h-byte blocks; the cartridge is automatically outputting the eight portions one after another).
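
The repeat scheme for the second variant can be written down as a list of steps. This is merely a restatement of the text above in code form. The step names and the 'kind' strings are invented for this sketch, and the same (un-incremented) command is meant for every "send" step.

```python
def key1_transfer_plan(kind):
    """Steps for one KEY1 command on Chip ID.Bit31=1 carts.

    kind: 'plain' (0 data bytes), 'chip_id' (4 bytes),
          or 'secure_block' (8 x 200h bytes).
    Returns a list of ('send', data_length) and ('delay', None) steps.
    """
    final_lengths = {
        "plain": [0],
        "chip_id": [4],
        "secure_block": [0x200] * 8,
    }[kind]
    plan = [("send", 0), ("delay", None)]    # delay as per header[06Eh]
    plan += [("send", length) for length in final_lengths]
    return plan
```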

  DS Cartridge Backup

SPI Bus Backup Memory
  Type   Total Size  Page Size  Chip/Example      Game/Example
  EEPROM 0.5K bytes   16 bytes  ST M95040-W       (eg. Metroid Demo)
  EEPROM   8K bytes   32 bytes  ST M95640-W       (eg. Super Mario DS)
  EEPROM  64K bytes  128 bytes  ST M95512-W       (eg. Downhill Jam)
  FLASH  256K bytes  256 bytes  ST M45PE20        (eg. Skateland)
  FLASH  256K bytes             Sanyo LE25FW203T  (eg. Mariokart)
  FLASH  512K bytes  256 bytes  ST M25PE40?       (eg. which/any games?)
  FLASH  512K bytes             ST 45PE40V6       (eg. DS Zelda, NTR-AZEP-0)
  FLASH 1024K bytes             ST 45PE80V6       (eg. Spirit Tracks, NTR-BKIP)
  FLASH 8192K bytes             MX25L6445EZNI-10G (Art Academy only, TWL-VAAV)
  FRAM     8K bytes   No limit  ?                 (eg. which/any games?)
  FRAM    32K bytes   No limit  Ramtron FM25L256? (eg. which/any games?)

Lifetime Stats
  Type      Max Writes per Page    Data Retention
  EEPROM    100,000                40 years
  FLASH     100,000                20 years
  FRAM      No limit               10 years

SPI Bus Backup Memory is accessed via Ports 40001A0h and 40001A2h, see
DS Cartridge I/O Ports

For all EEPROM and FRAM types:
  06h WREN  Write Enable                Cmd, no parameters
  04h WRDI  Write Disable               Cmd, no parameters
  05h RDSR  Read Status Register        Cmd, read repeated status value(s)
  01h WRSR  Write Status Register       Cmd, write one-byte value
  9Fh RDID  Read JEDEC ID (not supported on EEPROM/FRAM, returns FFh-bytes)
For 0.5K EEPROM (8+1bit Address):
  03h RDLO  Read from Memory 000h-0FFh  Cmd, addr lsb, read byte(s)
  0Bh RDHI  Read from Memory 100h-1FFh  Cmd, addr lsb, read byte(s)
  02h WRLO  Write to Memory 000h-0FFh   Cmd, addr lsb, write 1..MAX byte(s)
  0Ah WRHI  Write to Memory 100h-1FFh   Cmd, addr lsb, write 1..MAX byte(s)
For 8K..64K EEPROM and for FRAM (16bit Address):
  03h RD    Read from Memory            Cmd, addr msb,lsb, read byte(s)
  02h WR    Write to Memory             Cmd, addr msb,lsb, write 1..MAX byte(s)
Note: MAX = Page Size (see above chip list) (no limit for FRAM).
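
The command/address bytes for the two addressing schemes can be assembled as below (a sketch with made-up function names; the actual SPI transfer via AUXSPICNT/AUXSPIDATA and the preceding WREN for writes are not shown).

```python
def eeprom_read_header(addr, small):
    """Command + address bytes for a read; small=True for 0.5K EEPROM."""
    if small:
        opcode = 0x0B if addr & 0x100 else 0x03      # RDHI / RDLO
        return bytes([opcode, addr & 0xFF])
    return bytes([0x03, (addr >> 8) & 0xFF, addr & 0xFF])

def eeprom_write_header(addr, small):
    """Command + address bytes for a write (to be followed by 1..MAX bytes)."""
    if small:
        opcode = 0x0A if addr & 0x100 else 0x02      # WRHI / WRLO
        return bytes([opcode, addr & 0xFF])
    return bytes([0x02, (addr >> 8) & 0xFF, addr & 0xFF])
```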

For FLASH backup, commands should be same as for Firmware FLASH memory:
DS Firmware Serial Flash Memory

Status Register
  0   WIP  Write in Progress (1=Busy) (Read only) (always 0 for FRAM chips)
  1   WEL  Write Enable Latch (1=Enable) (Read only, except by WREN,WRDI)
  2-3 WP   Write Protect (0=None, 1=Upper quarter, 2=Upper Half, 3=All memory)
For 0.5K EEPROM:
  4-7 ONEs Not used (all four bits are always set to "1" each)
For 8K..64K EEPROM and for FRAM:
  4-6 ZERO Not used (all three bits are always set to "0" each)
  7   SRWD Status Register Write Disable (0=Normal, 1=Lock) (Only if /W=LOW)
WEL gets reset on Power-up, WRDI, WRSR, WRITE/LO/HI, and on /W=LOW.
The WRSR command allows to change ONLY the two WP bits, and the SRWD bit (if any), these bits are non-volatile (remain intact during power-down), respectively, the WIP bit must be checked to sense WRSR completion.
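
A small decoder for the status bits listed above (sketch, field names are made up):

```python
def decode_status(value, small_eeprom=False):
    """Decode an RDSR value; small_eeprom=True for the 0.5K EEPROM type."""
    info = {
        "wip": bool(value & 0x01),       # write in progress
        "wel": bool(value & 0x02),       # write enable latch
        "wp": (value >> 2) & 3,          # 0=None, 1=Upper quarter, 2=Upper half, 3=All
    }
    if not small_eeprom:
        info["srwd"] = bool(value & 0x80)
    return info
```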

Detection (by examining hardware responses)
The overall memory type and bus-width can be detected by RDSR/RDID commands:
  RDSR  RDID          Type         (bus-width)
  FFh,  FFh,FFh,FFh   None         (none)
  F0h,  FFh,FFh,FFh   EEPROM       (with 8+1bit address bus)
  00h,  FFh,FFh,FFh   EEPROM/FRAM  (with 16bit address bus)
  00h,  xxh,xxh,xxh   FLASH        (usually with 24bit address bus)
And, the RD commands can be used to detect the memory size/mirrors (though that won't work if the memory is empty).
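
The detection table translates to something like the following. The function name and the returned strings are made up, and responses not listed in the table are reported as unknown rather than guessed.

```python
def detect_backup_type(rdsr, rdid):
    """Classify backup memory from one RDSR byte and three RDID bytes."""
    no_id = bytes(rdid[:3]) == b"\xFF\xFF\xFF"
    if rdsr == 0xFF and no_id:
        return "none"
    if rdsr == 0xF0 and no_id:
        return "EEPROM (8+1bit address)"
    if rdsr == 0x00 and no_id:
        return "EEPROM/FRAM (16bit address)"
    if rdsr == 0x00:
        return "FLASH"
    return "unknown"
```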

Pin-Outs for EEPROM and FRAM chips
  Pin Name Expl.
  1  /S    Chip Select
  2  Q     Data Out
  3  /W    Write-Protect (not used in NDS, wired to VCC)
  4  VSS   Ground
  5  D     Data In
  6  C     Clock
  7  /HOLD Transfer-pause (not used in NDS, wired to VCC)
  8  VCC   Supply 2.5 to 5.5V for M95xx0-W

FRAM (Ferroelectric Nonvolatile RAM) is fully backwards compatible with normal EEPROMs, but comes with faster write/erase time (no delays), lower power consumption, and an unlimited number of write/erase cycles. Unlike normal RAM, as far as I understand, the data remains intact without needing any battery.

Other special save memory
  DS Vision (NDS cart with microSD slot... and maybe ALSO with EEPROM?)
  Warioware D.I.Y. (uses a single NAND FLASH chip for both 'ROM' and 'SAVE')
    (the warioware chip is marked "SAMSUNG 004, KLC2811ANB-P204, NTR-UORE-0")
    (the warioware PCB is marked "DI X-7 C17-01")
  and, a few games are said to have "Flash - 64 Mbit" save memory?

DSi Internal eMMC and External SD Card
DSi cartridges usually (maybe always) have SD/MMC access disabled, so they must stick to using EEPROM/FLASH chips inside of the cartridges (which is required for NDS compatibility anyways).
However, DSiware games (downloaded from DSi Shop) are allowed to save data on eMMC, using "private.sav" or "public.sav" files in their data folder. The size of those files is preset in cartridge header.

  DS Cartridge I/O Ports

The Gamecard bus registers can be mapped to NDS7 or NDS9 via EXMEMCNT, see
DS Memory Control

40001A0h - NDS7/NDS9 - AUXSPICNT - Gamecard ROM and SPI Control
  0-1   SPI Baudrate        (0=4MHz/Default, 1=2MHz, 2=1MHz, 3=512KHz)
  2-5   Not used            (always zero)
  6     SPI Hold Chipselect (0=Deselect after transfer, 1=Keep selected)
  7     SPI Busy            (0=Ready, 1=Busy) (presumably Read-only)
  8-12  Not used            (always zero)
  13    NDS Slot Mode       (0=Parallel/ROM, 1=Serial/SPI-Backup)
  14    Transfer Ready IRQ  (0=Disable, 1=Enable) (for ROM, not for AUXSPI)
  15    NDS Slot Enable     (0=Disable, 1=Enable) (for both ROM and AUXSPI)
The "Hold" flag should be cleared BEFORE transferring the LAST data unit, the chipselect will be then automatically cleared after the transfer, the program should issue a WaitByLoop(12) on NDS7 (or longer on NDS9) manually AFTER the LAST transfer.

40001A2h - NDS7/NDS9 - AUXSPIDATA - Gamecard SPI Bus Data/Strobe (R/W)
The SPI transfer is started on writing to this register, so one must <write> a dummy value (should be zero) even when intending to <read> from SPI bus.
  0-7  Data
  8-15 Not used (always zero)
During transfer, the Busy flag in AUXSPICNT is set, and the written DATA value is transferred to the device (via output line), simultaneously data is received (via input line). Upon transfer completion, the Busy flag goes off, and the received value can be then read from AUXSPIDATA, if desired.
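
The handling of the Hold flag across a multi-byte SPI transfer can be sketched as a list of register writes. No hardware access is done here. The function names are hypothetical, and the busy-wait between writes is only indicated in a comment.

```python
def auxspicnt(baud=0, hold=False, spi_mode=True, irq=False, enable=True):
    """Compose an AUXSPICNT value from the bit fields listed above."""
    return ((baud & 3) | (hold << 6) | (spi_mode << 13)
            | (irq << 14) | (enable << 15))

def spi_transfer_plan(tx_bytes):
    """Return (AUXSPICNT value, AUXSPIDATA value) pairs for one transfer.

    Hold is kept set for all but the LAST byte. After each AUXSPIDATA
    write, real code would wait until AUXSPICNT.Bit7 (Busy) is clear.
    """
    last = len(tx_bytes) - 1
    return [(auxspicnt(hold=(i != last)), b) for i, b in enumerate(tx_bytes)]
```

For example, an RDSR access would send 05h and then a dummy 00h (reading the status from AUXSPIDATA after the second byte).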

40001A4h - NDS7/NDS9 - ROMCTRL - Gamecard Bus ROMCTRL (R/W)
  Bit   Expl.
  0-12  KEY1 gap1 length  (0-1FFFh) (forced min 08F8h by BIOS) (leading gap)
  13    KEY2 encrypt data (0=Disable, 1=Enable KEY2 Encryption for Data)
  14     "SE" Unknown? (usually same as Bit13) (does NOT affect timing?)
  15    KEY2 Apply Seed   (0=No change, 1=Apply Encryption Seed) (Write only)
  16-21 KEY1 gap2 length  (0-3Fh)   (forced min 18h by BIOS) (200h-byte gap)
  22    KEY2 encrypt cmd  (0=Disable, 1=Enable KEY2 Encryption for Commands)
  23    Data-Word Status  (0=Busy, 1=Ready/DRQ) (Read-only)
  24-26 Data Block size   (0=None, 1..6=100h SHL (1..6) bytes, 7=4 bytes)
  27    Transfer CLK rate (0=6.7MHz=33.51MHz/5, 1=4.2MHz=33.51MHz/8)
  28    KEY1 Gap CLKs (0=Hold CLK High during gaps, 1=Output Dummy CLK Pulses)
  29     "RESB" Unknown (always 1 ?) (not read/write-able) -- R/W on DSi7 (?!)
  30     "WR"   Unknown (always 0 ?) (read/write-able)
  31    Block Start/Status (0=Ready, 1=Start/Busy) (IRQ See 40001A0h/Bit14)
The cartridge header is booted at 4.2MHz CLK rate, and following transfers are then using ROMCTRL settings specified in cartridge header entries [060h] and [064h], which are usually using 6.7MHz CLK rate.
Transfer length of null, four, and 200h..4000h bytes are supported by the console, however, regular cartridges support only max 1000h bytes.
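
The ROMCTRL fields can be decoded as below. Only the bits with a known meaning in the table above are extracted. Names are made up for this sketch.

```python
def block_size_bytes(field):
    """Bits 24-26: 0=None, 1..6=100h SHL n bytes, 7=4 bytes."""
    if field == 0:
        return 0
    if field == 7:
        return 4
    return 0x100 << field

def romctrl_decode(value):
    """Extract the documented ROMCTRL fields from a 32bit value."""
    return {
        "gap1": value & 0x1FFF,
        "key2_data": bool(value & (1 << 13)),
        "gap2": (value >> 16) & 0x3F,
        "key2_cmd": bool(value & (1 << 22)),
        "data_ready": bool(value & (1 << 23)),
        "block_bytes": block_size_bytes((value >> 24) & 7),
        "slow_clk": bool(value & (1 << 27)),     # 1=4.2MHz, 0=6.7MHz
        "gap_clks": bool(value & (1 << 28)),
        "busy": bool(value & (1 << 31)),
    }
```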

40001A8h - NDS7/NDS9 - Gamecard bus 8-byte Command Out
The separate commands are described in the Cartridge Protocol chapter, however, once the BIOS boot procedure has completed, one would usually only need command "B7aaaaaaaa000000h", for reading data (usually 200h bytes) from address aaaaaaaah (which should be usually aligned by 200h).
  0-7   1st Command Byte (at 40001A8h) (eg. B7h) (MSB)
  8-15  2nd Command Byte (at 40001A9h) (eg. addr bit 24-31)
  16-23 3rd Command Byte (at 40001AAh) (eg. addr bit 16-23)
  24-31 4th Command Byte (at 40001ABh) (eg. addr bit 8-15) (when aligned=even)
  32-39 5th Command Byte (at 40001ACh) (eg. addr bit 0-7)  (when aligned=00h)
  40-47 6th Command Byte (at 40001ADh) (eg. 00h)
  48-55 7th Command Byte (at 40001AEh) (eg. 00h)
  56-63 8th Command Byte (at 40001AFh) (eg. 00h) (LSB)
Observe that the command/parameter MSB is located at the smallest memory location (40001A8h), ie. compared with the CPU, the byte-order is reversed.
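
The reversed byte order is easy to get wrong when writing the command as two 32bit words. The following sketch (function names made up) builds the eight command bytes for B7aaaaaaaa000000h and shows the values a little-endian CPU would write to 40001A8h and 40001ACh.

```python
def b7_command(addr):
    """Eight command bytes, in the order they sit at 40001A8h..40001AFh."""
    return bytes([0xB7]) + addr.to_bytes(4, "big") + bytes(3)

def command_register_words(cmd):
    """32bit values for 40001A8h and 40001ACh (little-endian CPU view)."""
    return (int.from_bytes(cmd[0:4], "little"),
            int.from_bytes(cmd[4:8], "little"))
```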

4100010h - NDS7/NDS9 - Gamecard bus 4-byte Data In (R)
  0-7   1st received Data Byte (at 4100010h)
  8-15  2nd received Data Byte (at 4100011h)
  16-23 3rd received Data Byte (at 4100012h)
  24-31 4th received Data Byte (at 4100013h)
After sending a command, data can be read from this register manually (when the DRQ bit is set), or by DMA (with DMASAD=4100010h, Fixed Source Address, Length=1, Size=32bit, Repeat=On, Mode=DS Gamecard).

40001B0h - 32bit - NDS7/NDS9 - Encryption Seed 0 Lower 32bit (W)
40001B4h - 32bit - NDS7/NDS9 - Encryption Seed 1 Lower 32bit (W)
40001B8h - 16bit - NDS7/NDS9 - Encryption Seed 0 Upper 7bit (bit7-15 unused)
40001BAh - 16bit - NDS7/NDS9 - Encryption Seed 1 Upper 7bit (bit7-15 unused)
These registers are used by the NDS7 BIOS to initialize KEY2 encryption (and there's normally no need to change that initial settings). Writes to the Seed registers do not have direct effect on the internal encryption registers, until the Seed gets applied by writing "1" to ROMCTRL.Bit15.
 For more info:
DS Encryption by Random Seed (KEY2)
Note: There are <separate> Seed registers for both NDS7 and NDS9, which can be applied by ROMCTRL on NDS7 and NDS9 respectively (however, once when applied to the internal registers, the new internal setting is used for <both> CPUs).
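
Each seed is 39 bits wide, split across a 32bit and a 16bit register. A sketch of the split (the function name and the example value are made up; the seed only takes effect once ROMCTRL.Bit15 is written, as described above):

```python
def split_seed(seed):
    """Split a 39bit KEY2 seed into (lower 32bit, upper 7bit) register values."""
    if not 0 <= seed < (1 << 39):
        raise ValueError("seed must fit in 39 bits")
    return seed & 0xFFFFFFFF, seed >> 32

low, high = split_seed(0x7F12345678)     # arbitrary example value
```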

  DS Cartridge NitroROM and NitroARC File Systems

The DS hardware, BIOS, and Firmware do NOT contain any built-in filesystem functions. The ARM9/ARM7 boot code (together max 3903KB), and Icon/Title information are automatically loaded on power-up.
Programs that require to load additional data from cartridge ROM may do that either by implementing whatever functions to translate filenames to ROM addresses, or by reading from ROM directly.

The NitroROM Filesystem is used by many NDS games (at least those that have been developed with Nintendo's tools). It's used for ROM Cartridges, and, on the DSi, it's also used for DSiWare games (in the latter case, NitroROM acts as a 2nd virtual filesystem inside of the DSi's FAT16 filesystem).
  FNT = cart_hdr[040h]     ;\origin as defined in ROM cartridge header
  FAT = cart_hdr[048h]     ;/
  IMG = 00000000h          ;-origin at begin of ROM
Aside from using filenames, NitroROM files can be alternately accessed via Overlay IDs (see later on below).

NitroARC (Nitro Archive)
NARC Files are often found inside of NitroROM Filesystems (ie. NARC is a second virtual filesystem, nested inside of the actual filesystem). The NARC Format is very similar to the NitroROM Format, but with additional Chunk Headers (instead of the Cartridge ROM Header).
  ...  ...  Optional Header (eg. compression header, or RSA signature)
  000h 4    Chunk Name "NARC" (Nitro Archive)                   ;\
  004h 2    Byte Order (FFFEh)                                  ;
  006h 2    Version (0100h)                                     ; NARC
  008h 4    File Size (from "NARC" ID to end of file)           ; Header
  00Ch 2    Chunk Size (0010h)                                  ;
  00Eh 2    Number of following chunks (0003h)                  ;/
  010h 4    Chunk Name "BTAF" (File Allocation Table Block)     ;\
  014h 4    Chunk Size (including above chunk name)             ; File
  018h 2    Number of Files                                     ; Allocation
  01Ah 2    Reserved (0000h)                                    ; Table
  01Ch ...  FAT (see below)                                     ;/
  ...  4    Chunk Name "BTNF" (File Name Table Block)           ;\
  ...  4    Chunk Size (including above chunk name)             ; File Name
  ...  ...  FNT (see below)                                     ; Table
  ...  ..   Padding for 4-byte alignment (FFh-filled, if any)   ;/
  ...  4    Chunk Name "GMIF" (File Image Block)                ;\
  ...  4    Chunk Size (including above chunk name)             ; File Data
  ...  ...  IMG (File Data)                                     ;/
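
The chunk layout above can be walked with a few struct reads. The following is a minimal Python sketch, assuming little-endian fields and no Optional Header in front of the "NARC" ID (pass a different offset otherwise); the function name and the returned dictionary keys are made up for illustration.

```python
import struct


def parse_narc_header(data, offset=0):
    """Parse the NARC header and the BTAF chunk header that follows it."""
    name, byte_order, version, file_size, chunk_size, num_chunks = \
        struct.unpack_from("<4sHHIHH", data, offset)
    if name != b"NARC":
        raise ValueError("missing NARC chunk name")
    btaf = offset + chunk_size                      # normally offset+010h
    btaf_name, btaf_size, num_files, _reserved = \
        struct.unpack_from("<4sIHH", data, btaf)
    if btaf_name != b"BTAF":
        raise ValueError("missing BTAF chunk name")
    return {
        "byte_order": byte_order,
        "version": version,
        "file_size": file_size,
        "num_chunks": num_chunks,
        "num_files": num_files,
        "fat_offset": btaf + 0x0C,                  # FAT entries start here
        "btnf_offset": btaf + btaf_size,            # next chunk ("BTNF")
    }
```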

File Allocation Table (FAT) (base/size defined in cart header)
Contains ROM addresses for up to 61440 files (File IDs 0000h and up).
  Addr Size Expl.
  00h  4    Start address (originated at IMG base) (0=Unused Entry)
  04h  4    End address   (Start+Len...-1?)        (0=Unused Entry)
For NitroROM, addresses must be after Secure Area (at 8000h and up).
For NitroARC, addresses can be anywhere in the IMG area (at 0 and up).
Directories are fully defined in FNT area, and do not require FAT entries.

File Name Table (FNT) (base/size defined in cart header)
Consists of the FNT Directory Table, followed by one or more FNT Sub-Tables.
To interpret the directory tree: Start at the 1st Main-Table entry, which references a Sub-Table. Any directories in that Sub-Table reference Main-Table entries, which reference further Sub-Tables, and so on.

FNT Directory Main-Table (base=FNT+0, size=[FNT+06h]*8)
Consists of a list of up to 4096 directories (Directory IDs F000h and up).
  Addr Size Expl.
  00h  4    Offset to Sub-table             (originated at FNT base)
  04h  2    ID of first file in Sub-table   (0000h..EFFFh)
For first entry (ID F000h, root directory):
  06h  2    Total Number of directories     (1..4096)
Further entries (ID F001h..FFFFh, sub-directories):
  06h  2    ID of parent directory (F000h..FFFEh)

FNT Sub-tables (base=FNT+offset, ends at Type/Length=00h)
Contains ASCII names for all files and sub-directories within a directory.
  Addr Size Expl.
  00h  1    Type/Length
              01h..7Fh File Entry          (Length=1..127, without ID field)
              81h..FFh Sub-Directory Entry (Length=1..127, plus ID field)
              00h      End of Sub-Table
              80h      Reserved
  01h  LEN  File or Sub-Directory Name, case-sensitive, without any ending
              zero, ASCII 20h..7Eh, except for characters \/?"<>*:;|
Below for Sub-Directory Entries only:
  LEN+1 2    Sub-Directory ID (F001h..FFFFh) ;see FNT+(ID AND FFFh)*8
File Entries do not have above ID field. Instead, File IDs are assigned in incrementing order (starting at the "First ID" value specified in the Directory Table).
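
The Main-Table/Sub-Table traversal described above can be sketched as follows. This is an illustrative Python routine (the name list_fnt is hypothetical), assuming little-endian fields and that the caller passes the raw FNT area as a bytes object; it returns a mapping of path names to File IDs.

```python
import struct


def list_fnt(fnt):
    """Return {path: file_id} for every file described by a raw FNT blob."""
    files = {}

    def walk(dir_id, prefix):
        entry = (dir_id & 0xFFF) * 8              # Main-Table entry of this directory
        sub_offset, file_id = struct.unpack_from("<IH", fnt, entry)
        pos = sub_offset
        while True:
            type_len = fnt[pos]
            pos += 1
            if type_len == 0x00:                  # end of Sub-Table
                break
            if type_len == 0x80:
                raise ValueError("reserved Type/Length value 80h")
            length = type_len & 0x7F
            name = fnt[pos:pos + length].decode("ascii")
            pos += length
            if type_len & 0x80:                   # sub-directory: 2-byte ID follows
                (sub_id,) = struct.unpack_from("<H", fnt, pos)
                pos += 2
                walk(sub_id, prefix + name + "/")
            else:                                 # file: IDs count up from "first ID"
                files[prefix + name] = file_id
                file_id += 1

    walk(0xF000, "")                              # root directory
    return files
```

The resulting File IDs can then be used as index into the FAT (8 bytes per entry) to obtain the start/end addresses.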

ARM9 and ARM7 Overlay Tables (OVT) (base/size defined in cart header)
Somehow related to Nintendo's compiler. The table assigns compiler Overlay IDs to filesystem File IDs, and defines additional information such as load addresses.
  Addr Size Expl.
  00h  4    Overlay ID
  04h  4    RAM Address ;Point at which to load
  08h  4    RAM Size    ;Amount to load
  0Ch  4    BSS Size    ;Size of BSS data region
  10h  4    Static initialiser start address
  14h  4    Static initialiser end address
  18h  4    File ID  (0000h..EFFFh)
  1Ch  4    Reserved (zero)
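
Each OVT entry is 20h bytes, so the table can be read as an array of eight 32bit words. A minimal Python sketch, assuming little-endian fields; the field names are chosen here for illustration and are not official.

```python
import struct
from collections import namedtuple

OverlayEntry = namedtuple(
    "OverlayEntry",
    "overlay_id ram_address ram_size bss_size "
    "static_init_start static_init_end file_id reserved",
)


def parse_overlay_table(ovt):
    """Split a raw ARM9 or ARM7 overlay table into 20h-byte entries."""
    if len(ovt) % 0x20:
        raise ValueError("overlay table size must be a multiple of 20h")
    return [OverlayEntry(*struct.unpack_from("<8I", ovt, off))
            for off in range(0, len(ovt), 0x20)]
```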

Cartridge Header
The base/size of FAT, FNT, OVT areas is defined in cartridge header,
DS Cartridge Header

  DS Cartridge PassMe/PassThrough

PassMe is an adapter connected between the DS and an original NDS cartridge, used to boot unencrypted code from a flash cartridge in the GBA slot. It replaces the following entries in the original NDS cartridge header:
  Addr  Siz Patch
  004h  4   E59FF018h  ;opcode LDR PC,[027FFE24h] at 27FFE04h
  01Fh  1   04h        ;set autostart bit
  022h  1   01h        ;set ARM9 rom offset to nn01nnnnh (above secure area)
  024h  4   027FFE04h  ;patch ARM9 entry address to endless loop
  034h  4   080000C0h  ;patch ARM7 entry address in GBA slot
  15Eh  2   nnnnh      ;adjust header crc16
After having verified the encrypted chip IDs (from the original cartridge), the console thinks that it has successfully loaded a NDS cartridge, and then jumps to the (patched) entrypoints.

GBA Flashcard Format
Although the original PassMe requires only the entrypoint, PassMe programs should additionally contain one (or both) of the ID values below, allowing firmware patches to identify & start PassMe games without real PassMe hardware.
  0A0h  GBA-style Title    ("DSBooter")
  0ACh  GBA-style Gamecode ("PASS")
  0C0h  ARM7 Entrypoint    (32bit ARM code)
Of course, that applies only to early homebrew programs, newer games should use normal NDS cartridge headers.

ARM9 Entrypoint
The GBA-slot access rights in the EXMEMCNT register are initially assigned to the ARM7 CPU, so the ARM9 cannot boot from the flashcard, instead it is switched into an endless loop in Main RAM (which contains a copy of the cartridge header at 27FFE00h and up). The ARM7 must thus copy ARM9 code to Main RAM, and then set the ARM9 entry address by writing to [027FFE24h].

  DS Cartridge GBA Slot

Aside from the 17-pin NDS slot, the DS also includes a 32-pin GBA slot. This slot is used for GBA backwards compatibility mode. Additionally, in DS mode, it can be used as expansion port, or for importing data from GBA games.
  NDS:     Normal 32pin slot
  DS Lite: Short 32pin slot (GBA cards stick out)
  DSi:     N/A (dropped support for GBA carts, and for DS-expansions)
In DS mode, ROM, SRAM, FLASH backup, and whatever peripherals contained in older GBA cartridges can be accessed (almost) identically as in GBA mode,
GBA Cartridges

In DS mode, only one ROM-region is present at 8000000h-9FFFFFFh (ie. without the GBA's mirrored WS1 and WS2 regions at A000000h-DFFFFFFh). The expansion region (for SRAM/FLASH/etc) has been moved from E000000h-E00FFFFh (GBA-mode) to A000000h-A00FFFFh (DS-mode).

GBA timings are specified as "waitstates" (excluding 1 access cycle), NDS timings are specified as (total) "access time". And, the NDS bus-clock is twice as fast as for GBA. So, for "N" GBA waitstates, the NDS access time should be "(N+1)*2". Timings are controlled via NDS EXMEMCNT instead of GBA WAITCNT,
DS Memory Control - Cartridges and Main RAM

EEPROMs in GBA carts cannot be accessed in DS mode. The EEPROMs should be accessed with 8 waits on GBA, ie. 18 cycles on NDS on both 1st/2nd access. But 2nd access is restricted to max 6 cycles in NDS mode, which is way too fast.
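
The address and timing conversions above are simple arithmetic; the Python sketch below just restates them (function names are hypothetical).

```python
def nds_access_time(gba_waitstates):
    """Convert N GBA waitstates to an NDS access time in cycles: (N+1)*2."""
    return (gba_waitstates + 1) * 2


def nds_expansion_address(gba_address):
    """Map a GBA-mode SRAM/FLASH address (E000000h..E00FFFFh) to DS mode."""
    if not 0x0E000000 <= gba_address <= 0x0E00FFFF:
        raise ValueError("not in the GBA expansion region")
    return gba_address - 0x0E000000 + 0x0A000000
```

For the EEPROM case, nds_access_time(8) gives the 18 cycles mentioned above, which exceeds the 6 cycle limit for 2nd access.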

  DS Cart Rumble Pak

DS Rumble Option Pak
The Rumble Pak comes bundled with Metroid Prime Pinball. It contains a small actuator made by ALPS to make it rumble. The original device (NTR-008) is sized like a normal GBA cartridge, and there's also a shorter variant for the DS-Lite (USG-006).
The rumble pak is pretty simple internally, it only wires up to a few pins on the GBA Cartridge Port:
  VCC, GND, /WR, AD1, and IRQ (grounded)
AD1 runs into a little 8 pin chip, which is probably just a latch on the rising edge of /WR. A line runs from this chip to a transistor that is directly connected to the actuator. The only other chip on the board is a 5 pin jobber, probably a power component.
For detection, AD1 seems to be pulled low when reading from it, the other AD lines are open bus (containing the halfword address), so one can do:
  for i=0 to 0FFFh
    if halfword[8000000h+i*2]<>(i and FFFDh) then <not_a_ds_rumble_pak>
  next i
The actuator doesn't have an on/off setting like a motor, it rumbles when you switch it between the two settings. Switch frequently for a fast rumble, and fairly slowly for more of a 'tick' effect. That should be done via timer irq:
  rumble_state = rumble_state xor 0002h
Unknown if one of the two states has higher power-consumption than the other, ie. if it's a "pull/release" mechanism, if so, then disabling rumble should be done by using the "release" state, which would be AD1=0, or AD1=1...?
Note: The v3 firmware can detect the Rumble Pak as an option pak, but it does not provide an enable/disable rumble option in the alarm menu.
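
The detection loop and the state toggle from above, written out as a Python sketch. Since Python cannot access the GBA slot, the bus access is passed in as a caller-supplied function (read_halfword is a hypothetical callback returning the 16bit value at the given address); on real hardware that would be a plain halfword read.

```python
def is_ds_rumble_pak(read_halfword):
    """Check that AD1 reads as zero while all other AD lines are open bus."""
    for i in range(0x1000):
        if read_halfword(0x08000000 + i * 2) != (i & 0xFFFD):
            return False
    return True


def next_rumble_state(rumble_state):
    """Toggle AD1; call this from a timer handler to produce rumble."""
    return rumble_state ^ 0x0002
```

The value returned by next_rumble_state would then be written as halfword to the GBA slot ROM area.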

Other DS Rumble device
There's also another DS add-on with rumble. That device uses AD8 (instead of AD1) to control rumble, and it's using a classic motor (ie. it's rumbling while and as long as the latched AD8 value is "1").
DS Cart Slider with Rumble

GBA Rumble Carts
There are also a few GBA games that contain built-in Rumble, and which could be used by NDS games as well. To be user friendly, best support both types.
GBA Cart Rumble

  DS Cart Slider with Rumble

Add-on device for the Japanese title Magukiddo. The optical sensor is attached underneath the console (connected to the GBA slot).
The sensor is an Agilent ADNS-2030 Low Power Optical Mouse Sensor (16pin DIP chip with built-in optical sensor, and external LED light source) with two-wire serial bus (CLK and DTA).

ADNS-2030 Registers (write 1 byte index, then read/write 1 byte data)
Index (Bit7=Direction; 0=Read, 1=Write):
  00h Product_ID (R) (03h)
  01h Revision_ID (R) (10h=Rev. 1.0) (20h=Used in DS-option-pak)
  02h Motion/Status Flags (R)
  03h Delta_X (R) (signed 8bit) (automatically reset to 00h after reading)
  04h Delta_Y (R) (signed 8bit) (automatically reset to 00h after reading)
  05h SQUAL (R) (surface quality) (unsigned 8bit)
  06h Average_Pixel (R) (unsigned 6bit, upper 2bit unused)
  07h Maximum_Pixel (R) (unsigned 6bit, upper 2bit unused)
  08h Reserved
  09h Reserved
  0Ah Configuration_bits (R/W)
  0Bh Reserved
  0Ch Data_Out_Lower (R)
  0Dh Data_Out_Upper (R)
  0Eh Shutter_Lower (R)
  0Fh Shutter_Upper (R)
  10h Frame_Period_Lower (R/W)
  11h Frame_Period_Upper (R/W)
Motion/Status Flags:
  7 Motion since last report or PD (0=None, 1=Motion occurred)
  6 Reserved
  5 LED Fault detected (0=No fault,  1=Fault detected)
  4 Delta Y Overflow (0=No overflow, 1=Overflow occurred)
  3 Delta X Overflow (0=No overflow, 1=Overflow occurred)
  2 Reserved
  1 Reserved
  0 Resolution in counts per inch (0=400, 1=800)
Configuration_bits:
  7 Reset Power up defaults (W) (0=No, 1=Reset)
  6 LED Shutter Mode (0=LED always on, 1=LED only on when shutter is open)
  5 Self Test (W) (0=No, 1=Perform all self tests)
  4 Resolution in counts per inch (0=400, 1=800)
  3 Dump 16x16 Pixel bitmap (0=No, 1=Dump via Data_Out ports)
  2 Reserved
  1 Reserved
  0 Sleep Mode (0=Normal/Sleep after 1 second, 1=Always awake)
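
Once a register byte has been read from the sensor, the Motion/Status Flags (index 02h) and the signed Delta_X/Delta_Y values (index 03h/04h) can be decoded as in the Python sketch below. The function and key names are illustrative only; the serial transfer itself is not shown.

```python
def decode_motion_status(value):
    """Decode the Motion/Status Flags byte (register index 02h)."""
    return {
        "motion": bool(value & 0x80),
        "led_fault": bool(value & 0x20),
        "delta_y_overflow": bool(value & 0x10),
        "delta_x_overflow": bool(value & 0x08),
        "counts_per_inch": 800 if value & 0x01 else 400,
    }


def signed8(value):
    """Interpret a Delta_X/Delta_Y byte as signed 8bit value."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value
```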
                         |74273  |
  /WR -----------------> |CLK    |                       _____
  AD1/SIO CLK ---------> |D1   Q1|--------------> CLK   |74125|
  AD2 power control ---> |D2   Q2|--->     ____         |     |
  AD3/SIO DIR ---------> |D3   Q3|------+-|7400\________|/EN  |
  AD8 rumble on/off ---> |D?   Q?|--->  +-|____/        |     |
  AD0/SIO DTA ----+----> |D5   Q5|----------------------|A   Y|--+--DTA
                  |      |_______|                      |- - -|  |
          ____    +-------------------------------------|Y   A|--+
  /RD ---|7400\______ ____                              |     |
  /RD ---|____/      |7400\_____________________________|/EN  |
  A19 _______________|____/                             |_____|

7400 Quad NAND Gate, 74273 8bit Latch

AD0 Optical Sensor Serial Data (0=Low, 1=High)
AD1 Optical Sensor Serial Clock (0=Low, 1=High)
AD2 Optical Sensor Power (0=Off, 1=On)
AD3 Optical Sensor Serial Direction (0=Read, 1=Write)
AD8 Rumble Motor (0=Off, 1=On)
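
The latched AD bits listed above combine into a single halfword. A small Python sketch of composing that value follows (constant and function names are assumptions made for this example; the write to the GBA slot itself is not shown):

```python
AD0_DATA = 1 << 0      # Optical Sensor Serial Data
AD1_CLOCK = 1 << 1     # Optical Sensor Serial Clock
AD2_POWER = 1 << 2     # Optical Sensor Power
AD3_WRITE = 1 << 3     # Optical Sensor Serial Direction (1=Write)
AD8_RUMBLE = 1 << 8    # Rumble Motor


def latch_value(data=0, clock=0, power=1, write=0, rumble=0):
    """Build the halfword to be latched on /WR."""
    value = 0
    if data:
        value |= AD0_DATA
    if clock:
        value |= AD1_CLOCK
    if power:
        value |= AD2_POWER
    if write:
        value |= AD3_WRITE
    if rumble:
        value |= AD8_RUMBLE
    return value
```

Keeping the rumble bit in the same value as the serial bits avoids accidentally switching the motor off while clocking data to the sensor.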

Thanks: Daniel Palmer

  DS Cart Expansion RAM

DS Memory Expansions
There are several RAM expansions for the NDS. The RAM cartridge connects to the GBA slot; it can then be accessed from cartridges in the DS slot.
  Opera         (8MB RAM) (official RAM expansion for Opera browser)
  EZ3/4/3-in-1  (8-16MB RAM, plus FLASH, plus rumble)
  Supercard     (32MB)
  M3            (32MB)
  G6            (32MB)
The recommended access time (waitstates) for all memory types is unknown. Unknown which programs do use these expansions for which purposes (aside from the Opera browser).
Thanks to Rick "Lick" Wong for info on detection and unlocking.

Opera / DS Memory Expansion Pak (NTR-011 or USG-007)
  base=9000000h, size=800000h (8MB)
  unlock=1, lock=0
  STRH [8240000h],lock/unlock

EZ
  base=8400000h, size=VAR (8MB..16MB)
  locking/unlocking/detection see below

Supercard
  base=8000000h, size=1FFFFFEh (32MB minus last two bytes?)
  unlock=5 (RAM_RW), lock=3 (MEDIA)
  STRH [9FFFFFEh],lock/unlock
  STRH [9FFFFFEh],lock/unlock

M3
  base=8000000h, size=2000000h (32MB)
  unlock=00400006h, lock=00400003h
  LDRH Rd,[8E00002h]
  LDRH Rd,[800000Eh]
  LDRH Rd,[8801FFCh]
  LDRH Rd,[800104Ah]
  LDRH Rd,[8800612h]
  LDRH Rd,[8000000h]
  LDRH Rd,[8801B66h]
  LDRH Rd,[8000000h+(lock/unlock)*2]
  LDRH Rd,[800080Eh]
  LDRH Rd,[8000000h]
  LDRH Rd,[80001E4h]
  LDRH Rd,[80001E4h]
  LDRH Rd,[8000188h]
  LDRH Rd,[8000188h]

G6
  base=8000000h, size=2000000h (32MB)
  unlock=6, lock=3
  LDRH Rd,[9000000h]
  LDRH Rd,[9FFFFE0h]
  LDRH Rd,[9FFFF4Ah]
  LDRH Rd,[9FFFF4Ah]
  LDRH Rd,[9FFFF4Ah]
  LDRH Rd,[9200000h+(lock/unlock)*2]
  LDRH Rd,[9FFFFF0h]
  LDRH Rd,[9FFFFE8h]

For EZ, detection works as follows:
 ez_ram_test:   ;Based on DSLinux Amadeus' detection
  ez_subfunc(9880000h,8000h) ;-SetRompage (OS mode)
  ez_subfunc(9C40000h,1500h) ;-OpenNorWrite
  [08400000h]=1234h          ;\
  if [08400000h]=1234h       ; test writability at 8400000h
    [8000000h]=4321h         ; and non-writability at 8000000h
    if [8000000h]<>4321h     ;
      return true            ;/
  ez_subfunc(9C40000h,D200h) ;CloseNorWrite
  ez_subfunc(9880000h,0160h) ;SetRompage (0160h)
  ez_subfunc(9C40000h,1500h) ;OpenNorWrite
  [8400000h]=1234h           ;\
  if [8400000h]=1234h        ; test writability at 8400000h
    return true              ;/
  return false               ;-failed
 ez_subfunc(addr,data):
  STRH [9FE0000h],D200h
  STRH [8000000h],1500h
  STRH [8020000h],D200h
  STRH [8040000h],1500h
  STRH [addr],data
  STRH [9FC0000h],1500h
For all other types (everything except EZ), simply verify that you can write (when unlocked), and that you can't (when locked).

  DS Cart Unknown Extras

DS Cartridges with built-in Infrared Port
Some NDS and DSi games (those with NTR-Ixxx or TWL-Ixxx gamecodes) contain built-in Infrared ports; used to communicate with pedometers.
The IR-port is accessed via certain SPI bus commands; that bus is also shared to access FLASH memory via other commands.
The FLASH chip seems to return a nonsense chip ID (maybe the cartridge is using uncommon FLASH memory, or maybe the ID command is redirected to the IR-port hardware).
The ROM chip does also respond with an uncommon ID (with one special bit set, which is possibly indicating the presence of the IR-hardware) (maybe the IR-port is contained in the ROM chip, or maybe the SPI-bus sharing is handled inside of the ROM chip).
The IR-related SPI commands are mostly unknown. Except that: command 08h should return 55h (or some other non-FFh value), otherwise the game won't work in emulators; this might be some IR-status byte.

DS Cartridges with NAND memory
Some NDS games (eg. Warioware D.I.Y.) contain NAND memory. This memory contains both the game and save memory (normal NDS games contain separate ROM and FLASH/EEPROM chips for those purposes); the advantage is that NAND allows more storage than the usual FLASH chips.
The Warioware D.I.Y. PCB is marked "DI X-7 C17-01", and it does contain only one single chip, marked "SAMSUNG 004, KLC2811ANB-P204, NTR-UORE-0".
That NAND chip connects directly to the NDS parallel bus (the serial SPI chipselect is left unconnected). Unknown how to write to the chip, and unknown if certain regions are write-protected.

DS Cartridges with built-in MicroSD Card Slot
The DS Vision cartridge contains a built-in microSD card slot. Users can download videos from the internet (against a fee), store the videos on microSD cards, and then view them on the NDS via DS Vision cartridge.
Unknown how the microSD is accessed; via parallel 'ROM' bus and/or via serial SPI bus; by which commands? Also unknown if the thing contains built-in video decoder hardware, or if videos are decoded on ARM cpus.

  DS Cart Cheat Action Replay DS

The first commercial DS cheat code solution, this device was developed by Datel. It supports swapping out cartridges after loading the AR software. For updating, the user may either manually enter codes or use the included proprietary USB cable that comes with the device. The user has been able to manually update codes since firmware version 1.52.

Action Replay DS Codes
  ABCD-NNNNNNNN       Game ID ;ASCII Gamecode [00Ch] and CRC32 across [0..1FFh]
  00000000 XXXXXXXX   manual hook codes (rarely used) (default is auto hook)
  0XXXXXXX YYYYYYYY   word[XXXXXXX+offset] = YYYYYYYY
  1XXXXXXX 0000YYYY   half[XXXXXXX+offset] = YYYY
  2XXXXXXX 000000YY   byte[XXXXXXX+offset] = YY
  3XXXXXXX YYYYYYYY   IF YYYYYYYY > word[XXXXXXX]   ;unsigned    ;\
  4XXXXXXX YYYYYYYY   IF YYYYYYYY < word[XXXXXXX]   ;unsigned    ; for v1.54,
  5XXXXXXX YYYYYYYY   IF YYYYYYYY = word[XXXXXXX]                ; when X=0,
  6XXXXXXX YYYYYYYY   IF YYYYYYYY <> word[XXXXXXX]               ; uses
  7XXXXXXX ZZZZYYYY   IF YYYY > ((not ZZZZ) AND half[XXXXXXX])   ; [offset]
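  8XXXXXXX ZZZZYYYY   IF YYYY < ((not ZZZZ) AND half[XXXXXXX])   ; instead of
  9XXXXXXX ZZZZYYYY   IF YYYY = ((not ZZZZ) AND half[XXXXXXX])   ; [XXXXXXX]
  AXXXXXXX ZZZZYYYY   IF YYYY <> ((not ZZZZ) AND half[XXXXXXX])  ;/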
  BXXXXXXX 00000000   offset = word[XXXXXXX+offset]
  C0000000 YYYYYYYY   FOR loopcount=0 to YYYYYYYY  ;execute Y+1 times
  C4000000 00000000   offset = address of the C4000000 code           ;v1.54
  C5000000 XXXXYYYY   counter=counter+1, IF (counter AND YYYY) = XXXX ;v1.54
  C6000000 XXXXXXXX   [XXXXXXXX]=offset                               ;v1.54
  D0000000 00000000   ENDIF
  D1000000 00000000   NEXT loopcount
  D2000000 00000000   NEXT loopcount, and then FLUSH everything
  D3000000 XXXXXXXX   offset = XXXXXXXX
  D4000000 XXXXXXXX   datareg = datareg + XXXXXXXX
  D5000000 XXXXXXXX   datareg = XXXXXXXX
  D6000000 XXXXXXXX   word[XXXXXXXX+offset]=datareg, offset=offset+4
  D7000000 XXXXXXXX   half[XXXXXXXX+offset]=datareg, offset=offset+2
  D8000000 XXXXXXXX   byte[XXXXXXXX+offset]=datareg, offset=offset+1
  D9000000 XXXXXXXX   datareg = word[XXXXXXXX+offset]
  DA000000 XXXXXXXX   datareg = half[XXXXXXXX+offset]
  DB000000 XXXXXXXX   datareg = byte[XXXXXXXX+offset] ;bugged on pre-v1.54
  DC000000 XXXXXXXX   offset = offset + XXXXXXXX
  EXXXXXXX YYYYYYYY   Copy YYYYYYYY parameter bytes to [XXXXXXXX+offset...]
  44332211 88776655   parameter bytes 1..8 for above code  (example)
  0000AA99 00000000   parameter bytes 9..10 for above code (padded with 00s)
  FXXXXXXX YYYYYYYY   Copy YYYYYYYY bytes from [offset..] to [XXXXXXX...]
IF/ENDIF can be nested up to 32 times. FOR/NEXT cannot be nested, any FOR statement does forcefully terminate any prior loop. FOR does backup the current IF condition flags, and NEXT does restore these flags, so ENDIF(s) aren't required inside of the loop. The NEXT+FLUSH command does (after finishing the loop) reset offset=0, datareg=0, and does clear all condition flags, so further ENDIF(s) aren't required after the loop.
Before v1.54, the DB000000 code did accidentally set offset=offset+XXXXXXX after execution of the code. For all word/halfword accesses, the address should be aligned accordingly. For the COPY commands, addresses should be aligned by four (all data is copied with ldr/str, except, on odd lengths, the last 1..3 bytes do use ldrb/strb).
offset, datareg, loopcount, and counter are internal registers in the action replay software.
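
To show how offset, datareg, and the IF/ENDIF condition flags interact, here is a small Python sketch of an interpreter for a handful of the code types listed above (1, 2, 3..6, D0, D3, D5, D6, DC). It is not the actual Action Replay code handler; memory is modelled as a bytearray mapped at a caller-chosen base address, little-endian access is assumed, and the v1.54 special case for X=0 is ignored.

```python
import struct


def run_ar_codes(codes, mem, base):
    """Run (left, right) 32bit code pairs against bytearray mem mapped at base."""
    offset = 0
    datareg = 0
    cond_stack = []                          # one flag per open IF

    for left, right in codes:
        kind = left >> 28
        addr = left & 0x0FFFFFFF
        if left == 0xD0000000:               # ENDIF is processed unconditionally
            if cond_stack:
                cond_stack.pop()
            continue
        active = all(cond_stack)
        if kind in (3, 4, 5, 6):             # IF YYYYYYYY <cond> word[XXXXXXX]
            value = struct.unpack_from("<I", mem, addr - base)[0] if active else 0
            result = {3: right > value, 4: right < value,
                      5: right == value, 6: right != value}[kind]
            cond_stack.append(active and result)
        elif not active:
            continue                         # skipped by a false IF
        elif kind == 1:                      # half[XXXXXXX+offset] = YYYY
            struct.pack_into("<H", mem, addr + offset - base, right & 0xFFFF)
        elif kind == 2:                      # byte[XXXXXXX+offset] = YY
            mem[addr + offset - base] = right & 0xFF
        elif left == 0xD3000000:             # offset = XXXXXXXX
            offset = right
        elif left == 0xDC000000:             # offset = offset + XXXXXXXX
            offset = (offset + right) & 0xFFFFFFFF
        elif left == 0xD5000000:             # datareg = XXXXXXXX
            datareg = right
        elif left == 0xD6000000:             # word[XXXXXXXX+offset] = datareg
            struct.pack_into("<I", mem, right + offset - base, datareg)
            offset = (offset + 4) & 0xFFFFFFFF
        else:
            raise NotImplementedError("code type not covered by this sketch")
    return offset, datareg
```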

> The condition register is checked, for all code types
> but the D0, D1 and D2 code type
Makes sense.

> and for the C5 code type it's checked AFTER the counter has
> been incremented (so the counter is always incremented
I love that exceptions ;-)

The hook codes consist of a series of nine 00000000 XXXXXXXX codes, and must be marked as (M) code (for not being confused with normal 0XXXXXXX YYYYYYYY codes). For all nine codes, the left 32bit are actually ignored (but should be zero), the meaning of the right 32bit varies from 1st to 9th code.
  1st: Address used prior to launching game (eg. 23xxxxxh)
  2nd: Address to write the hook at (inside the ARM7 executable)
  3rd: Hook final address (huh?)
  4th: Hook mode selection (0=auto, 1=mode1, 2=mode2)
  5th: Opcode that replaces the hooked one (eg. E51DE004h)
  6th: Address to store important stuff  (default 23FE000h)
  7th: Address to store the code handler (default 23FE074h)
  8th: Address to store the code list    (default 23FE564h)
  9th: Must be 1 (00000001h)
For most games, the AR does automatically hook code on the ARM7. Doing that automatically is nice, but hooking ARM7 means that there is no access to VRAM, TCM and Cache, which <might> cause problems since efficient games <should> store all important data in TCM or Cache (though, in practice, I'd doubt that any existing NDS games are that efficient).

Thanks to Kenobi and Dualscreenman from Kodewerx for above ARDS cheat info.

  DS Cart Cheat Codebreaker DS

This is Pelican's entry into the DS cheat-device industry. It supports swapping out the cartridges, and alternately, also gives the user the option of connecting another gamecard onto it. For updating, the user may either manually enter codes, or use Wifi to connect to the Codebreaker update site (that updating will overwrite all manually entered codes though).

Codebreaker DS Codes
  0000CR16 GAMECODE                    Specify Game ID, use Encrypted codes
  8000CR16 GAMECODE                    Specify Game ID, use Unencrypted codes
  BEEFC0DE XXXXXXXX                    Change Encryption Keys
  A0XXXXXX YYYYYYYY                    Bootup-Hook 1, X=Address, Y=Value
  A8XXXXXX YYYYYYYY                    Bootup-Hook 2, X=Address, Y=Value
  F0XXXXXX TYYYYYYY         Code-Hook 1 (T=Type,Y=CheatEngineAddr,X=HookAddr)
  F8XXXXXX TPPPPPPP         Code-Hook 2 (T=Type,X=CheatEngineHookAddr,P=Params)
  ---General codes---
  00XXXXXX 000000YY                    [X]=YY
  10XXXXXX 0000YYYY                    [X]=YYYY
  20XXXXXX YYYYYYYY                    [X]=YYYYYYYY
  60XXXXXX 000000YY ZZZZZZZZ 00000000  [[X]+Z]=YY
  60XXXXXX 0000YYYY ZZZZZZZZ 10000000  [[X]+Z]=YYYY
  30XXXXXX 000000YY                    [X]=[X] + YY
  30XXXXXX 0001YYYY                    [X]=[X] + YYYY
  38XXXXXX YYYYYYYY                    [X]=[X] + YYYYYYYY
  70XXXXXX 000000YY                    [X]=[X] OR  YY
  70XXXXXX 001000YY                    [X]=[X] AND YY
  70XXXXXX 002000YY                    [X]=[X] XOR YY
  70XXXXXX 0001YYYY                    [X]=[X] OR  YYYY
  70XXXXXX 0011YYYY                    [X]=[X] AND YYYY
  70XXXXXX 0021YYYY                    [X]=[X] XOR YYYY
  ---Memory fill/copy---
  40XXXXXX 2NUMSTEP 000000YY 000000ZZ  byte[X+(0..NUM-1)*STEP*1]=Y+(0..NUM-1)*Z
  40XXXXXX 1NUMSTEP 0000YYYY 0000ZZZZ  half[X+(0..NUM-1)*STEP*2]=Y+(0..NUM-1)*Z
  50XXXXXX YYYYYYYY ZZZZZZZZ 00000000  copy Y bytes from [X] to [Z]
  ---Conditional codes (bugged)---
  60XXXXXX 000000YY ZZZZZZZZ 01c100VV  IF [[X]+Z] .. VV   THEN [[X]+Z]=YY
  60XXXXXX 000000YY ZZZZZZZZ 01c0VVVV  IF [[X]+Z] .. VVVV THEN [[X]+Z]=YY
  60XXXXXX 0000YYYY ZZZZZZZZ 11c100VV  IF [[X]+Z] .. VV   THEN [[X]+Z]=YYYY
  ---Conditional codes (working)---
  D0XXXXXX NNc100YY                    IF [X] .. YY   THEN exec max(1,NN) lines
  D0XXXXXX NNc0YYYY                    IF [X] .. YYYY THEN exec max(1,NN) lines
The condition digits (c=0..7), have the following functions:
  0 IF [mem] =  imm THEN ...              4 IF ([mem] AND imm) =  0   THEN ...
  1 IF [mem] <> imm THEN ...              5 IF ([mem] AND imm) <> 0   THEN ...
  2 IF [mem] <  imm THEN ... (unsigned)   6 IF ([mem] AND imm) =  imm THEN ...
  3 IF [mem] >  imm THEN ... (unsigned)   7 IF ([mem] AND imm) <> imm THEN ...
  GAMECODE  Cartridge Header[00Ch] (32bit in reversed byte-order)
  CR16      Cartridge Header[15Eh] (16bit in normal byte-order)
  XXXXXX    27bit addr (actually 7 digits, XXXXXXX, overlaps 5bit code number)
The "bugged" conditional codes (60XXXXXX) are accidentally skipping NN lines when the condition is false, where NN is taken from the upper 8bit of the code's last 32bit value (ie. exactly as for the D0XXXXXX codes). For byte-writes, that would be NN=01h, which can be dealt with if necessary, although there may be compatibility problems with future versions that might fix that bug. For halfword/word writes, NN would be 11h or 21h, so those codes are practically unusable.
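
The condition digits and the D0XXXXXX parameter word can be decoded as in the following Python sketch (function names are made up for this example; only the bit layout shown in the tables above is used).

```python
def cb_condition(c, mem_value, imm):
    """Evaluate condition digit c (0..7) for a memory value and an immediate."""
    if c == 0:
        return mem_value == imm
    if c == 1:
        return mem_value != imm
    if c == 2:
        return mem_value < imm               # unsigned
    if c == 3:
        return mem_value > imm               # unsigned
    if c == 4:
        return (mem_value & imm) == 0
    if c == 5:
        return (mem_value & imm) != 0
    if c == 6:
        return (mem_value & imm) == imm
    if c == 7:
        return (mem_value & imm) != imm
    raise ValueError("condition digit must be 0..7")


def decode_d0_params(word):
    """Split NNc100YY / NNc0YYYY into (lines, condition, is_byte, immediate)."""
    lines = max(1, (word >> 24) & 0xFF)
    cond = (word >> 20) & 0xF
    is_byte = ((word >> 16) & 0xF) == 1
    imm = word & (0xFF if is_byte else 0xFFFF)
    return lines, cond, is_byte, imm
```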

Codebreaker DS / Encrypted Codes
The overall "address value" decryption works like so:
  for i=4Fh to 00h
    if i>13h then y=59E5DC8Ah
    if i>27h then y=054A7818h
    if i>3Bh then y=B1BF0855h
    address = (Key0-value) xor address
    value   = value - Key1 - (address ror 1Bh)
    address = (address xor (value + y)) ror 13h
    if (i>13h) then
      if (i<=27h) or (i>3Bh) then x=Key2 xor Key1 xor Key0
      else x=((Key2 xor Key1) and Key0) xor (Key1 and Key2)
      value=value xor (x+y+address)
      x = Secure[((i*4+00h) and FCh)+000h]
      x = Secure[((i*4+34h) and FCh)+100h] xor x
      x = Secure[((i*4+20h) and FCh)+200h] xor x
      x = Secure[((i*4+08h) and FCh)+300h] xor x
      address = address - (x ror 19h)
  next i
Upon startup, the initial key settings are:
  Secure[0..7FFh] = Copy of the ENCRYPTED 1st 2Kbytes of the game's Secure Area
  Key0 = 0C2EAB3Eh, Key1 = E2AE295Dh, Key2 = E1ACC3FFh, Key3 = 70D3AF46h
Upon BEEFC0DE XXXXXXXX, the keys get changed like so:
  Key0 = Key0 + (XXXXXXXX ror 1Dh)
  Key1 = Key1 - (XXXXXXXX ror 05h)
  Key2 = Key2 xor (Key3 xor Key0)
  Key3 = Key3 xor (Key2  -  Key1)
The above scramble_keys function works like so:
  for i=0 to FFh
    y = byte(xlat_table[i])
    Secure[i*4+000h] = (Secure[i*4+000h] xor Secure[y*4]) + Secure[y*4+100h]
    Secure[i*4+400h] = (Secure[i*4+400h] xor Secure[y*4]) - Secure[y*4+200h]
  next i
  for i=0 to 63h
    Key0 = Key0 xor (Secure[i*4] + Secure[i*4+190h])
    Key1 = Key1 xor (Secure[i*4] + Secure[i*4+320h])
    Key2 = Key2 xor (Secure[i*4] + Secure[i*4+4B0h])
    Key3 = Key3 xor (Secure[i*4] + Secure[i*4+640h])
  next i
  Key0 = Key0  -  Secure[7D0h]
  Key1 = Key1 xor Secure[7E0h]
  Key2 = Key2  +  Secure[7F0h]
  Key3 = Key3 xor Secure[7D0h] xor Secure[7F0h]
The xlat_table consists of 256 fixed 8bit values.
All operations use unsigned 32bit integers.

Thanks to Kenobi and Dualscreenman from Kodewerx for above CBDS cheat info.

  DS Encryption by Gamecode/Idcode (KEY1)

KEY1 - Gamecode / Idcode Encryption
The KEY1 encryption relies only on the gamecode (or firmware idcode), it does not contain any random components. The fact that KEY1 encrypted commands appear random is just because the <unencrypted> commands contain random values, so the encryption result looks random.

KEY1 encryption is used for KEY1 encrypted gamecart commands (ie. for loading the secure area). It is also used for resolving the extra decryption of the first 2K of the secure area, and for firmware decryption, and to decode some encrypted values in gamecart/firmware header.

Initial Encryption Values
The formulas below can be used only with a copy of the 1048h-byte key tables from NDS/DSi BIOS. The values can be found at:
  NDS.ARM7 ROM: 00000030h..00001077h (values 99 D5 20 5F ..) Blowfish/NDS-mode
  DSi.ARM9 ROM: FFFF99A0h..FFFFA9E7h (values 99 D5 20 5F ..) ""
  DSi.TCM Copy: 01FFC894h..01FFD8DBh (values 99 D5 20 5F ..) ""
  DSi.ARM7 ROM: 0000C6D0h..0000D717h (values 59 AA 56 8E ..) Blowfish/DSi-mode
  DSi.RAM Copy: 03FFC654h..03FFD69Bh (values 59 AA 56 8E ..) ""
The DSi ROM sections are disabled after booting, but the RAM/TCM copies can be dumped (at least with some complex main memory hardware mods).

encrypt_64bit(ptr) / decrypt_64bit(ptr)
  Y=[ptr+0]
  X=[ptr+4]
  FOR I=0 TO 0Fh (encrypt), or FOR I=11h TO 02h (decrypt)
    Z=[keybuf+I*4] XOR X
    X=[keybuf+048h+((Z SHR 24) AND FFh)*4]
    X=[keybuf+448h+((Z SHR 16) AND FFh)*4] + X
    X=[keybuf+848h+((Z SHR  8) AND FFh)*4] XOR X
    X=[keybuf+C48h+((Z SHR  0) AND FFh)*4] + X
    X=Y XOR X
    Y=Z
  NEXT I
  [ptr+0]=X XOR [keybuf+40h] (encrypt), or [ptr+0]=X XOR [keybuf+4h] (decrypt)
  [ptr+4]=Y XOR [keybuf+44h] (encrypt), or [ptr+4]=Y XOR [keybuf+0h] (decrypt)
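
For reference, a Python sketch of the two functions. It assumes the usual Blowfish round structure (Y loaded from [ptr+0], X from [ptr+4], and Y=Z at the end of each round), little-endian 32bit words, and a caller-supplied 1048h-byte keybuf; the BIOS key tables themselves are not included here.

```python
import struct


def _word(keybuf, offset):
    return struct.unpack_from("<I", keybuf, offset)[0]


def _crypt_64bit(keybuf, block, rounds, out0_key, out4_key):
    y, x = struct.unpack("<II", block)       # Y=[ptr+0], X=[ptr+4]
    for i in rounds:
        z = _word(keybuf, i * 4) ^ x
        x = _word(keybuf, 0x048 + ((z >> 24) & 0xFF) * 4)
        x = (_word(keybuf, 0x448 + ((z >> 16) & 0xFF) * 4) + x) & 0xFFFFFFFF
        x = _word(keybuf, 0x848 + ((z >> 8) & 0xFF) * 4) ^ x
        x = (_word(keybuf, 0xC48 + (z & 0xFF) * 4) + x) & 0xFFFFFFFF
        x = y ^ x
        y = z
    return struct.pack("<II", x ^ _word(keybuf, out0_key),
                       y ^ _word(keybuf, out4_key))


def encrypt_64bit(keybuf, block):
    """Encrypt one 8-byte block with rounds I=0..0Fh."""
    return _crypt_64bit(keybuf, block, range(0x00, 0x10), 0x40, 0x44)


def decrypt_64bit(keybuf, block):
    """Decrypt one 8-byte block with rounds I=11h..02h."""
    return _crypt_64bit(keybuf, block, range(0x11, 0x01, -1), 0x04, 0x00)
```

With any fixed keybuf, decrypt_64bit(keybuf, encrypt_64bit(keybuf, block)) returns the original block, which is a quick sanity check for an implementation.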

apply_keycode(modulo)
  encrypt_64bit(keycode+4)
  encrypt_64bit(keycode+0)
  [scratch]=0000000000000000h   ;S=0 (64bit)
  FOR I=0 TO 44h STEP 4         ;xor with reversed byte-order (bswap)
    [keybuf+I]=[keybuf+I] XOR bswap_32bit([keycode+(I MOD modulo)])
  NEXT I
  FOR I=0 TO 1040h STEP 8
    encrypt_64bit(scratch)      ;encrypt S (64bit) by keybuf
    [keybuf+I+0]=[scratch+4]    ;write S to keybuf (first upper 32bit)
    [keybuf+I+4]=[scratch+0]    ;write S to keybuf (then lower 32bit)
  NEXT I

init_keycode(idcode,level,modulo,key)
  if key=nds then copy [nds_arm7bios+0030h..1077h] to [keybuf+0..1047h]
  if key=dsi then copy [dsi_arm7bios+C6D0h..D717h] to [keybuf+0..1047h]
  [keycode+0]=[idcode], [keycode+4]=[idcode]/2, [keycode+8]=[idcode]*2
  IF level>=1 THEN apply_keycode(modulo) ;first apply (always)
  IF level>=2 THEN apply_keycode(modulo) ;second apply (optional)
  [keycode+4]=[keycode+4]*2, [keycode+8]=[keycode+8]/2
  IF level>=3 THEN apply_keycode(modulo) ;third apply (optional)

firmware_decryption()
  init_keycode(firmware_header+08h,1,0Ch,nds) ;idcode (usually "MACP"), level 1
  decrypt_64bit(firmware_header+18h)          ;rominfo
  init_keycode(firmware_header+08h,2,0Ch,nds) ;idcode (usually "MACP"), level 2
  decrypt ARM9 and ARM7 bootcode by decrypt_64bit (each 8 bytes)
  decompress ARM9 and ARM7 bootcode by LZ77 function (swi)
  calc CRC16 on decrypted/decompressed ARM9 bootcode followed by ARM7 bootcode
Note: The sizes of the compressed/encrypted bootcode areas are unknown (until they are fully decompressed), one way to solve that problem is to decrypt the next 8 bytes each time when the decompression function requires more data.

gamecart_decryption()
  init_keycode(cart_header+0Ch,1,08h,nds)   ;gamecode, level 1, modulo 8
  decrypt_64bit(cart_header+78h)            ;rominfo (secure area disable)
  init_keycode(cart_header+0Ch,2,08h,nds)   ;gamecode, level 2, modulo 8
  encrypt_64bit all NDS KEY1 commands (1st command byte in MSB of 64bit value)
  after loading the secure_area, calculate secure_area crc, then
  decrypt_64bit(secure_area+0)              ;first 8 bytes of secure area
  init_keycode(cart_header+0Ch,3,08h,nds)   ;gamecode, level 3, modulo 8
  decrypt_64bit(secure_area+0..7F8h)        ;each 8 bytes in first 2K of secure
  init_keycode(cart_header+0Ch,1,08h,dsi)   ;gamecode, level 1, modulo 8
  encrypt_64bit all DSi KEY1 commands (1st command byte in MSB of 64bit value)
After secure area decryption, the ID field in the first 8 bytes should be "encryObj", if it matches then first 8 bytes are filled with E7FFDEFFh, otherwise the whole 2K are filled by that value.

Gamecart Command Register
Observe that the byte-order of the command register [40001A8h] is reversed. The way the CPU stores 64bit data in memory (and the way the "encrypt_64bit" function for KEY1-encrypted commands expects data in memory) is LSB at [addr+0] and MSB at [addr+7]. This value is to be transferred MSB first. However, the DS hardware transfers [40001A8h+0] first, and [40001A8h+7] last. So, the byte order must be reversed when copying the value from memory to the command register.
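
The reversal amounts to a plain byte swap of the 8-byte value. A short Python sketch (the function name is hypothetical, and the result would still have to be written to 40001A8h..40001AFh by other means):

```python
def memory_to_command_register(mem8):
    """Reverse an 8-byte little-endian value so that the MSB comes first."""
    if len(mem8) != 8:
        raise ValueError("command must be exactly 8 bytes")
    return bytes(reversed(mem8))
```

For a 64bit command value stored in memory as LSB..MSB, the first byte of the result (the one going to 40001A8h) is the MSB.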

The KEY1 encryption is based on Bruce Schneier's "Blowfish Encryption Algorithm".

  DS Encryption by Random Seed (KEY2)

KEY2 39bit Seed Values
The pre-initialization settings at cartridge-side (after reset) are:
  Seed0 = 58C56DE0E8h
  Seed1 = 5C879B9B05h
The post-initialization settings (after sending command 4llllmmmnnnkkkkkh to the cartridge, and after writing the Seed values to Port 40001Bxh) are:
  Seed0 = (mmmnnn SHL 15)+6000h+Seedbyte
  Seed1 = 5C879B9B05h
The seedbyte is selected by Cartridge Header [013h].Bit0-2, this index value should be usually in range 0..5, however, possible values for index 0..7 are: E8h,4Dh,5Ah,B1h,17h,8Fh,99h,D5h.
The 24bit random value (mmmnnn) is derived from the real time clock setting, and also scattered by KEY1 encryption, anyways, it's just random and doesn't really matter where it comes from.
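
The post-initialization Seed0 formula can be written out as below. A sketch only; key2_seed0 and its parameter names are made up for this example, and the index is simply masked to Bit0-2 as described above.

```python
SEEDBYTES = (0xE8, 0x4D, 0x5A, 0xB1, 0x17, 0x8F, 0x99, 0xD5)  # index 0..7
SEED1 = 0x5C879B9B05                                          # never changes

def key2_seed0(mmmnnn: int, header_013h: int) -> int:
    """Seed0 = (mmmnnn SHL 15)+6000h+Seedbyte, a 39bit result."""
    seedbyte = SEEDBYTES[header_013h & 7]       # Cartridge Header [013h].Bit0-2
    return ((mmmnnn & 0xFFFFFF) << 15) + 0x6000 + seedbyte
```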

KEY2 Encryption
Relies on two 39bit registers (x and y), which are initialized as such:
  x = reversed_bit_order(seed0)  ;ie. LSB(bit0) exchanged with MSB(bit38), etc.
  y = reversed_bit_order(seed1)
During transfer, x, y, and transferred data are modified as such:
  x = (((x shr 5)xor(x shr 17)xor(x shr 18)xor(x shr 31)) and 0FFh)+(x shl 8)
  y = (((y shr 5)xor(y shr 23)xor(y shr 18)xor(y shr 31)) and 0FFh)+(y shl 8)
  data = (data xor x xor y) and 0FFh
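
The formulas above translate to Python roughly as follows. This is a sketch, not a verified implementation: the function names are invented here, and truncating x and y to 39 bits after each step is an assumption (the text only says that the registers are 39bit wide).

```python
MASK39 = (1 << 39) - 1

def reversed_bit_order(value: int) -> int:
    """Mirror a 39bit value: bit0 <-> bit38, bit1 <-> bit37, etc."""
    return int(format(value & MASK39, "039b")[::-1], 2)

def key2_apply(data: bytes, seed0: int, seed1: int) -> bytes:
    """XOR each byte with the KEY2 stream (same call encrypts and decrypts)."""
    x = reversed_bit_order(seed0)
    y = reversed_bit_order(seed1)
    out = bytearray()
    for byte in data:
        x = ((((x >> 5) ^ (x >> 17) ^ (x >> 18) ^ (x >> 31)) & 0xFF)
             + (x << 8)) & MASK39          # assumed 39bit truncation
        y = ((((y >> 5) ^ (y >> 23) ^ (y >> 18) ^ (y >> 31)) & 0xFF)
             + (y << 8)) & MASK39
        out.append((byte ^ x ^ y) & 0xFF)
    return bytes(out)
```

Because the data is only XORed with the stream, applying key2_apply twice with the same seeds returns the original bytes.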

  DS Firmware Serial Flash Memory

ST Microelectronics SPI Bus Compatible Serial FLASH Memory
  ID 20h,40h,12h - ST M45PE20 - 256 KBytes (Nintendo DS) (in my old DS)
  ID 20h,50h,12h - ST M35PE20 - 256 KBytes (Nintendo DS) (in my DS-Lite)
  ID 20h,80h,13h - ST M25PE40 - 512 KBytes (iQue DS, with chinese charset)
  ID 20h,40h,11h - ST 45PE10V6 - 128 Kbytes (Nintendo DSi) (in my DSi)
  ID 20h,40h,13h - ST 45PE40V6 - 512 KBytes (DS Zelda, NTR-AZEP-0)
  ID 20h,40h,14h - ST 45PE80V6 - 1024 Kbytes (eg. Spirit Tracks, NTR-BKIP)
 +ID 62h,11h,00h - Sanyo ?          - 512 Kbytes (P-Letter Diamond, ADAE)
  ID 62h,16h,00h - Sanyo LE25FW203T - 256 KBytes (Mariokart backup)
 +ID 62h,26h,11h - Sanyo ?          - ? Kbytes (3DS: CTR-P-AXXJ)
 +ID 62h,26h,13h - Sanyo ?          - ? Kbytes (3DS: CTR-P-APDJ)
  ID C2h,22h,11h - Macronix MX25L1021E? 128 Kbytes (eg. 3DS Starfox)
  ID C2h,22h,13h - Macronix ...? 512 Kbytes (eg. 3DS Kid Icarus, 3DS Sims 3)
  ID C2h,20h,17h - Macronix MX25L6445EZNI-10G 8192 Kbytes (DSi Art Academy)
  ID 01h,F0h,00h - Garbage/Infrared on SPI-bus? (eg. P-Letter White)
  ID 03h,F8h,00h - Garbage/Infrared on SPI-bus? (eg. P-Letter White 2)
FLASH has more than 100,000 Write Cycles, more than 20 Year Data Retention
The Firmware Flash Memory is accessed via SPI bus, see:
DS Serial Peripheral Interface Bus (SPI)

Instruction Codes
  06h  WREN Write Enable (No Parameters)
  04h  WRDI Write Disable (No Parameters)
  9Fh  RDID Read JEDEC Identification (Read 1..3 ID Bytes)
             (Manufacturer, Device Type, Capacity)
  05h  RDSR Read Status Register (Read Status Register, endless repeated)
             Bit7-2  Not used (zero)
             Bit1    WEL Write Enable Latch             (0=No, 1=Enable)
             Bit0    WIP Write/Program/Erase in Progress (0=No, 1=Busy)
  03h  READ Read Data Bytes (Write 3-Byte-Address, read endless data stream)
  0Bh  FAST Read Data Bytes at Higher Speed (Write 3-Byte-Address, write 1
             dummy-byte, read endless data stream) (max 25Mbit/s)
  0Ah  PW   Page Write (Write 3-Byte-Address, write 1..256 data bytes)
             (changing bits to 0 or 1) (reads unchanged data, erases the page,
             then writes new & unchanged data) (11ms typ, 25ms max)
  02h  PP   Page Program (Write 3-Byte-Address, write 1..256 data bytes)
             (changing bits from 1 to 0) (1.2ms typ, 5ms max)
  DBh  PE   Page Erase 100h bytes (Write 3-Byte-Address) (10ms typ, 20ms max)
  D8h  SE   Sector Erase 10000h bytes (Write 3-Byte-Address) (1s typ, 5s max)
  B9h  DP   Deep Power-down (No Parameters) (consumption 1uA typ, 10uA max)
             (3us) (ignores all further instructions, except RDP)
  ABh  RDP  Release from Deep Power-down (No Parameters) (30us)
Write/Program may not cross page-boundaries. Write/Program/Erase are rejected during first 1..10ms after power up. The WEL bit is automatically cleared on Power-Up, on /Reset, and on completion of WRDI/PW/PP/PE/SE instructions. WEL is set by WREN instruction (which must be issued before any write/program/erase instructions). Don't know how RDSR behaves when trying to write to the write-protected region?

Communication Protocol
  Set Chip Select LOW to invoke the command
  Transmit the instruction byte
  Transmit any parameter bytes
  Transmit/receive any data bytes
  Set Chip Select HIGH to finish the command
All bytes (and 3-byte addresses) transferred most significant bit/byte first.
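
The byte sequences that go out between Chip Select LOW and HIGH can be sketched as below. The helper names are hypothetical, and the actual SPI port accesses are left out; only the instruction codes, address order, and page rule from this chapter are used.

```python
READ, RDSR, WREN, PW = 0x03, 0x05, 0x06, 0x0A   # instruction codes

def read_frame(address: int) -> bytes:
    """READ: instruction byte, then 3-byte address, MSB first."""
    return bytes([READ]) + address.to_bytes(3, "big")

def page_write_frame(address: int, data: bytes) -> bytes:
    """PW: instruction, 3-byte address, 1..256 data bytes within one page."""
    if not 1 <= len(data) <= 256:
        raise ValueError("PW takes 1..256 data bytes")
    if (address & 0xFF) + len(data) > 0x100:
        raise ValueError("write may not cross a page boundary")
    return bytes([PW]) + address.to_bytes(3, "big") + data

def decode_status(status: int) -> dict:
    """Split the RDSR status byte into its two documented bits."""
    return {"WEL": bool(status & 2), "WIP": bool(status & 1)}
```

A page write would then be WREN (as its own command), followed by the PW frame, followed by RDSR polling until WIP reads as 0.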

  1   D    Serial Data In (latched at rising clock edge)          _________
  2   C    Serial Clock (max 25MHz)                             /|o        |
  3   /RES Reset                                            1 -| |         |- 8
  4   /S   Chip Select (instructions start at falling edge) 2 -| |         |- 7
  5   /W   Write Protect (makes first 256 pages read-only)  3 -| |_________|- 6
  6   VCC  Supply (2.7V..3.6V typ) (4V max) (DS:VDD3.3)     4 -|/          |- 5
  7   VSS  Ground                                              |___________|
  8   Q    Serial Data Out (changes at falling clock edge)

  DS Firmware Header

Firmware Memory Map
  00000h-00029h  Firmware Header
  0002Ah-001FFh  Wifi Settings
  00200h-3F9FFh  Firmware Code/Data    ;-NDS only (not DSi)
  00200h-002FEh  00h-filled            ;\
  002FFh         80h                   ;
  00300h-1F3FEh  FFh-filled            ; DSi only (not NDS)
  1F3FFh         Whatever Bootflags    ;
  1F400h-1F5FFh  Wifi Access Point 4   ;
  1F600h-1F7FFh  Wifi Access Point 5   ;
  1F800h-1F9FFh  Wifi Access Point 6   ;/
  3FA00h-3FAFFh  Wifi Access Point 1
  3FB00h-3FBFFh  Wifi Access Point 2
  3FC00h-3FCFFh  Wifi Access Point 3
  3FD00h-3FDFFh  Not used
  3FE00h-3FEFFh  User Settings Area 1
  3FF00h-3FFFFh  User Settings Area 2
On iQue DS (with 512K flash memory), user settings are moved to 7FE00h and up, and, there seems to be some unknown stuff at 200h..27Fh.

Firmware Header (00000h-001FFh)
  Addr Size Expl.
  000h 2    part3 romaddr/8 (arm9 gui code) (LZ/huffman compression)
  002h 2    part4 romaddr/8 (arm7 wifi code) (LZ/huffman compression)
  004h 2    part3/4 CRC16 arm9/7 gui/wifi code
  006h 2    part1/2 CRC16 arm9/7 boot code
  008h 4    firmware identifier (usually nintendo "MAC",nn) (or nocash "XBOO")
            the 4th byte (nn) occasionally changes in different versions
  00Ch 2    part1 arm9 boot code romaddr/2^(2+shift1) (LZSS compressed)
  00Eh 2    part1 arm9 boot code 2800000h-ramaddr/2^(2+shift2)
  010h 2    part2 arm7 boot code romaddr/2^(2+shift3) (LZSS compressed)
  012h 2    part2 arm7 boot code 3810000h-ramaddr/2^(2+shift4)
  014h 2    shift amounts, bit0-2=shift1, bit3-5=shift2, bit6-8=shift3,
            bit9-11=shift4, bit12-15=firmware_chipsize/128K
  016h 2    part5 data/gfx romaddr/8 (LZ/huffman compression)
  018h 8    Optional KEY1-encrypted "enPngOFF"=Cartridge KEY2 Disable
            (feature isn't used in any consoles, instead contains timestamp)
  018h 5    Firmware version built timestamp (BCD minute,hour,day,month,year)
  01Dh 1    Console type
              FFh=Nintendo DS
              20h=Nintendo DS-lite
              57h=Nintendo DSi
            (The entry was unused (FFh) in older NDS, ie. replace FFh by 00h)
              Bit0   seems to be DSi/iQue related
              Bit1   seems to be DSi/iQue related
              Bit2   seems to be DSi related
              Bit3   zero
              Bit4   seems to be DSi related
              Bit5   seems to be DS-Lite related
              Bit6   indicates presence of "extended" user settings (DSi/iQue)
              Bit7   zero
  01Eh 2    Unused (FFh-filled)
  020h 2    User Settings Offset (div8) (usually last 200h flash bytes)
  022h 2    Unknown (7EC0h or 0B51h)
  024h 2    Unknown (7E40h or 0DB3h)
  026h 2    part5 CRC16 data/gfx
  028h 2    unused (FFh-filled)
  02Ah-1FFh Wifi Calibration Data (see next chapter)
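
Decoding the scaled address fields of the header could look like this. A sketch under assumptions: 16bit fields are read as little-endian, and the RAM entries are interpreted as ramaddr = base - (value SHL (2+shift)), which is one reading of the "2800000h-ramaddr/2^(2+shift2)" notation. The function and key names are invented for the example.

```python
import struct

def parse_firmware_header(header: bytes) -> dict:
    """Decode addresses from the first 2Ah bytes of the firmware header."""
    def u16(offset: int) -> int:
        return struct.unpack_from("<H", header, offset)[0]

    shifts = u16(0x14)
    shift1, shift2 = shifts & 7, (shifts >> 3) & 7
    shift3, shift4 = (shifts >> 6) & 7, (shifts >> 9) & 7
    return {
        "part1_rom": u16(0x0C) << (2 + shift1),
        "part1_ram": 0x2800000 - (u16(0x0E) << (2 + shift2)),  # assumed reading
        "part2_rom": u16(0x10) << (2 + shift3),
        "part2_ram": 0x3810000 - (u16(0x12) << (2 + shift4)),  # assumed reading
        "part3_rom": u16(0x00) * 8,
        "part4_rom": u16(0x02) * 8,
        "part5_rom": u16(0x16) * 8,
        "chip_size": (shifts >> 12) * 128 * 1024,
        "identifier": bytes(header[0x08:0x0C]),
        "console_type": header[0x1D],
        "user_settings": u16(0x20) * 8,
    }
```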

DSi Firmware Header differences (compared to DS)
  000h..01Ch=Zerofilled (bootcode is in new eMMC chip, not on old FLASH chip)
  01Dh..021h=Same as on DS (header: Console Type and User Settings Offset)
  022h..027h=Zerofilled (bootcode is in new eMMC chip, not on old FLASH chip)
  028h..1FCh=Same as on DS (wifi calibration)
  1FDh      =01h for DWM-W015, 02h for DWM-W024 ;\
  1FEh      =20h                                ; this was FFh-filled on DS
  1FFh      =Same as on DS (FFh)                ;/
  200h..2FEh=00h-filled                         ;\
  2FFh      =80h                                ; this was bootcode on DS
  300h..1F3FEh=FFh-filled                       ;
  1F3FFh    =FFh (contains whatever Bootflags)  ;/

  DS Firmware Wifi Calibration Data

Wifi Calibration/Settings (located directly after Firmware Header)
  Addr Size Expl.
  000h-029h Firmware Header (see previous chapter)
  02Ah 2    CRC16 (with initial value 0) of [2Ch..2Ch+config_length-1]
  02Ch 2    config_length (usually 0138h, ie. entries 2Ch..163h)
  02Eh 1    Unused        (00h)
  02Fh 1    Wifi version  (00h=v1..v4, 03h=v5, 05h=v6..v7, 0Fh=DSi)
  030h 6    Unused        (00h-filled)
  036h 6    48bit MAC address (v1-v5: 0009BFxxxxxx, v6-v7: 001656xxxxxx)
  03Ch 2    list of enabled channels ANDed with 7FFE (Bit1..14 = Channel 1..14)
            (usually 3FFEh, ie. only channel 1..13 enabled)
  03Eh 2    Whatever Flags (usually FFFFh)
  040h 1    RF Chip Type (usually 02h)
  041h 1    RF Bits per entry at 0CEh (usually 18h=24bit=3byte) (Bit7=?)
  042h 1    RF Number of entries at 0CEh (usually 0Ch)
  043h 1    Unknown (usually 01h)
  044h 2    Initial Value for [4808146h]
  046h 2    Initial Value for [4808148h]
  048h 2    Initial Value for [480814Ah]
  04Ah 2    Initial Value for [480814Ch]
  04Ch 2    Initial Value for [4808120h]
  04Eh 2    Initial Value for [4808122h]
  050h 2    Initial Value for [4808154h]
  052h 2    Initial Value for [4808144h]
  054h 2    Initial Value for [4808130h]
  056h 2    Initial Value for [4808132h]
  058h 2    Initial Value for [4808140h]
  05Ah 2    Initial Value for [4808142h]
  05Ch 2    Initial Value for [4808038h]
  05Eh 2    Initial Value for [4808124h]
  060h 2    Initial Value for [4808128h]
  062h 2    Initial Value for [4808150h]
  064h 69h  Initial 8bit values for BB[0..68h]
  0CDh 1    Unused (00h)
Below for Type2 (ie. when [040h]=2) (Mitsumi MM3155 and RF9008):
  0CEh 24h  Initial 24bit values for RF[0,4,5,6,7,8,9,0Ah,0Bh,1,2,3]
  0F2h 54h  Channel 1..14 2x24bit values for RF[5,6]
  146h 0Eh  Channel 1..14 8bit values for BB[1Eh] (usually somewhat B1h..B7h)
  154h 0Eh  Channel 1..14 8bit values for RF[9].Bit10..14 (usually 10h-filled)
Below for Type3 (ie. when [040h]=3) (Mitsumi MM3218):
  --- Type3 values are originated at 0CEh, following addresses depend on:  ---
  1) number of initial values, found at [042h]        ;usually 29h
  2) number of BB indices,     found at [0CEh+[042h]] ;usually 02h
  3) number of RF indices,     found at [043h]        ;usually 02h
  --- Below example addresses assume above values to be set to 29h,02h,02h ---
  0CEh 29h  Initial 8bit values for RF[0..28h]
  0F7h 1    Number of BB indices per channel
  0F8h 1    1st BB index
  0F9h 14   1st BB data for channel 1..14
  107h 1    2nd BB index
  108h 14   2nd BB data for channel 1..14
  116h 1    1st RF index
  117h 14   1st RF data for channel 1..14
  125h 1    2nd RF index
  126h 14   2nd RF data for channel 1..14
  134h 46   Unused (FFh-filled)
Below for both Type2 and Type3:
  162h 1    Unknown (usually 19h..1Ch)
  163h 1    Unused (FFh) (Inside CRC16 region, with config_length=138h)
  164h 9Ch  Unused (FFh-filled) (Outside CRC16 region, with config_length=138h)
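
Since the Type3 addresses depend on the counts at [042h], [0CEh+[042h]], and [043h], they need to be computed at runtime. A possible sketch (type3_layout is a hypothetical helper; offsets are relative to firmware address 000h):

```python
def type3_layout(wifi: bytes) -> dict:
    """Locate the variable-position Type3 entries (when [040h]=3)."""
    pos = 0xCE
    layout = {"rf_init": (pos, wifi[0x42])}       # (offset, number of values)
    pos += wifi[0x42]
    bb_count = wifi[pos]                          # number of BB indices
    pos += 1
    layout["bb"] = []
    for _ in range(bb_count):
        layout["bb"].append((wifi[pos], pos + 1)) # (BB index, offset of 14 bytes)
        pos += 1 + 14
    layout["rf"] = []
    for _ in range(wifi[0x43]):                   # number of RF indices
        layout["rf"].append((wifi[pos], pos + 1)) # (RF index, offset of 14 bytes)
        pos += 1 + 14
    layout["end"] = pos                           # first unused byte
    return layout
```

With the usual counts 29h,02h,02h this reproduces the example addresses from the table (0F9h, 108h, 117h, 126h, and the unused area starting at 134h).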
Most of the Wifi settings seem to have the same values on all currently existing consoles, with the following exceptions:
The CRC16 and the 4th-6th bytes of the MAC address are (obviously) different. Also, the initial values for BB[01h] and BB[1Eh], the channel 1..14 values for BB[1Eh], and the unknown entry [162h] contain different calibration settings on all consoles.
Firmware v5 has a new wifi ID [2Fh]=03h, and a different RF[9] setting.
Firmware v6 (dslite) has wifi ID [2Fh]=05h, and the same RF[9] setting as v5. Additionally, v6 and up have different 2nd-3rd bytes of the MAC address.

Moreover, a LOT of values are different with Type3 chips (ie. when [040h]=3).

Unlike for Firmware User Settings, the Firmware Header (and Wifi Settings) aren't stored in RAM upon boot. So the data must be retrieved via SPI bus by software.
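
After reading the first 200h bytes via SPI, software can verify the block with the CRC16 at [02Ah]. The sketch below assumes the bit-reflected polynomial A001h (the common CRC-16 variant, believed to match the BIOS CRC16 function) and little-endian 16bit fields; both function names are made up.

```python
import struct

def crc16(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC16, polynomial A001h (assumed), selectable initial value."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc

def wifi_config_valid(flash: bytes) -> bool:
    """Check CRC16 [02Ah] over [2Ch..2Ch+config_length-1], initial value 0."""
    stored, length = struct.unpack_from("<HH", flash, 0x2A)
    return crc16(flash[0x2C:0x2C + length]) == stored
```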

  DS Firmware Wifi Internet Access Points

NDS (three 100h-byte regions) (also exists on DSi)
These three 100h byte regions are used to memorize known internet access points. The firmware doesn't use these regions, but games that support internet seem to be allowed to read (and write) them.
03FA00-03FAFF: connection data 1
03FB00-03FBFF: connection data 2
03FC00-03FCFF: connection data 3
(07Fxxx for iQue DS)
(01Fxxx for DSi)
  Addr Siz Expl.
  000h 64  Unknown (usually 00h-filled) (no Proxy supported on NDS)
  040h 32  SSID (ASCII name of the access point) (padded with 00h's) see [0E8h]
  060h 32  SSID for WEP64 on AOSS router (each security level has its own SSID)
  080h 16  WEP Key 1 (for type/size, see entry E6h)
  090h 16  WEP Key 2  ;\
  0A0h 16  WEP Key 3  ; (usually 00h-filled)
  0B0h 16  WEP Key 4  ;/
  0C0h 4   IP Address           (0=Auto/DHCP)
  0C4h 4   Gateway              (0=Auto/DHCP)
  0C8h 4   Primary DNS Server   (0=Auto/DHCP)
  0CCh 4   Secondary DNS Server (0=Auto/DHCP)
  0D0h 1   Subnet Mask (0=Auto/DHCP, 1..1Ch=Leading Ones) (eg. 6 = FC.00.00.00)
  0D1h ..  Unknown (usually 00h-filled)
  0E6h 1   WEP Mode (0=None, 1/2/3=5/13/16 byte hex, 5/6/7=5/13/16 byte ascii)
  0E7h 1   Status (00h=Normal, 01h=AOSS, FFh=connection not configured/deleted)
  0E8h 1   SSID Length in characters
  0E9h 1   Unknown (usually 00h)
  0EAh 2   MTU Value (Max transmission unit) (576..1500, usually 1400)
  0ECh 3   Unknown (usually 00h-filled)
  0EFh 1   bit0/1/2 - connection 1/2/3 (1=Configured, 0=Not configured)
  0F0h 6   Nintendo Wifi Connection (WFC) 43bit User ID
           (ID=([F0h] AND 07FFFFFFFFFFFFh)*1000, shown as decimal string
           NNNN-NNNN-NNNN-N000) (the upper 5bit of the last byte are
           containing additional/unknown nonzero data)
  0F6h 8   Unknown (nonzero stuff !?!)
  0FEh 2   CRC16 for Entries 00h..FDh (with initial value 0000h)
For connection 3: entries [EFh..FDh] - always zero-filled?
The location of the first data block seems to be at the User Settings address (see Firmware Header [020h]) minus 400h.
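
Two of the entries above need a little arithmetic before they can be displayed: the Subnet Mask at [0D0h] and the WFC User ID at [0F0h]. A sketch with hypothetical helper names, assuming the 6-byte ID is stored little-endian:

```python
def subnet_mask(leading_ones: int) -> str:
    """Expand entry [0D0h] (1..1Ch leading ones) to a dotted hex mask."""
    if leading_ones == 0:
        return "auto"                                   # 0=Auto/DHCP
    mask = (0xFFFFFFFF << (32 - leading_ones)) & 0xFFFFFFFF
    return ".".join(f"{b:02X}" for b in mask.to_bytes(4, "big"))

def wfc_user_id(entry_0f0h: bytes) -> str:
    """Format the 43bit ID at [0F0h] as NNNN-NNNN-NNNN-N000."""
    value = int.from_bytes(entry_0f0h[:6], "little") & ((1 << 43) - 1)
    digits = f"{value * 1000:016d}"
    return "-".join(digits[i:i + 4] for i in range(0, 16, 4))
```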

DSi (three new 200h-byte regions)
The DSi has three extra regions (for use by DSi games, with the new WPA encryption support, and with additional proxy support), these extra regions are found under "Advanced Setup" in the DSi firmware's "Internet" configuration menu.
01F400-01F5FF: new DSi connection data 4
01F600-01F7FF: new DSi connection data 5
01F800-01F9FF: new DSi connection data 6
  Addr Siz Expl.
  000h 32  Proxy Authentication Username (ASCII string, padded with 00's)
  020h 32  Proxy Authentication Password (ASCII string, padded with 00's)
  040h ..  SSID (ASCII string, padded with 00's) (see [0E8h] for length)
  0xxh ..  Maybe same as NDS
  080h ..  WEP Key (zerofilled for WPA)
  0xxh ..  Maybe same as NDS
  0C0h 4   IP Address           (0=Auto/DHCP)
  0C4h 4   Gateway              (0=Auto/DHCP)
  0C8h 4   Primary DNS Server   (0=Auto/DHCP)
  0CCh 4   Secondary DNS Server (0=Auto/DHCP)
  0D0h 1   Subnet Mask (0=Auto/DHCP, 1..1Ch=Leading Ones) (eg. 6 = FC.00.00.00)
  0D1h ..  Unknown (zerofilled)
  0E6h 1   WEP (00h=None/WPA/WPA2, 01h=WEP/5byteHEX)
  0E7h 1   00h=Normal, 10h=WPA/WPA2 (or FFh=unused/deleted)
  0E8h 1   SSID Length in characters
  0E9h 1   Unknown (usually 00h)
  0EAh 2   MTU Value (Max transmission unit) (576..1500, usually 1400)
  0ECh 3   Unknown (usually 00h-filled)
  0EFh 1   bit0/1/2 - connection 1/2/3 (1=Configured, 0=Not configured)
  0F0h 14  Zerofilled (or maybe ID as on NDS, if any such ID exists for DSi?)
  0FEh 2   Maybe CRC16 ?         (93h,88h)
  100h 32  Some big random hex number? (FEh,72h,...)      ;\all zero for WEP
  120h 16  WPA/WPA2 key (ASCII string, padded with 00's)  ;/
  130h ..  Zerofilled
  181h 1   WPA (0=None or WEP, 4=WPA-TKIP, 5=WPA2-TKIP, 6=WPA-AES, 7=WPA2-AES)
  182h 1   Proxy Enable         (00h=None, 01h=Yes)
  183h 1   Proxy Authentication (00h=None, 01h=Yes)
  184h ..  Proxy Name (ASCII string, padded with 00's)
  1xxh ..  Zerofilled
  1E8h 2   Proxy Port (16bit)
  1EAh ..  Zerofilled
  1FEh 2   Maybe another CRC16 ? (this one is 0000h if unused/deleted)
The location of the first data block (aka settings number 4) seems to be at the User Settings address (see Firmware Header [020h]) minus A00h.
Observe that NDS consoles do have Firmware bootcode/data in that area, so those new regions exist on DSi only.
There is probably also some flag in Firmware Header that indicates the presence of the new wifi regions (or as a general rule: All DSi consoles should have them).
Note that the Proxy feature can be used to redirect internet access (when using a custom proxy server, one could redirect commercial games to homebrew servers; as done by the project) (actually the same should be possible with the DNS server entry, possibly with less traffic).

  DS Firmware User Settings

Current Settings (RAM 27FFC80h-27FFCEFh)
User Settings 0 (Firmware 3FE00h-3FEFFh) ;(DSi & iQue use different address,
User Settings 1 (Firmware 3FF00h-3FFFFh) ;see Firmware Header [020h])
  Addr Size Expl.
  000h  2   Version (5) (Always 5, for all NDS/DSi Firmware versions)
  002h  1   Favorite color (0..15) (0=Gray, 1=Brown, etc.)
  003h  1   Birthday month (1..12) (Binary, non-BCD)
  004h  1   Birthday day   (1..31) (Binary, non-BCD)
  005h  1   Not used (zero)
  006h  20  Nickname string in UTF-16 format
  01Ah  2   Nickname length in characters    (0..10)
  01Ch  52  Message string in UTF-16 format
  050h  2   Message length in characters     (0..26)
  052h  1   Alarm hour     (0..23) (Binary, non-BCD)
  053h  1   Alarm minute   (0..59) (Binary, non-BCD)
  054h  2
  056h  1   80h=enable alarm (huh?), bit 0..6=enable?
  057h  1   Zero (1 byte)
  058h  2x2 Touch-screen calibration point (adc.x1,y1) 12bit ADC-position
  05Ch  2x1 Touch-screen calibration point (scr.x1,y1) 8bit pixel-position
  05Eh  2x2 Touch-screen calibration point (adc.x2,y2) 12bit ADC-position
  062h  2x1 Touch-screen calibration point (scr.x2,y2) 8bit pixel-position
  064h  2   Language and Flags (see below)
  066h  1   Year (2000..2255) (when having entered date in the boot menu)
  067h  1   Unknown (usually 00h...08h or 78h..7Fh or so)
  068h  4   RTC Offset (difference in seconds when RTC time/date was changed)
  06Ch  4   Not used (FFh-filled, sometimes 00h-filled) (=MSBs of above?)
Below not stored in RAM (found only in FLASH memory)...
  070h  2   update counter (used to check latest) (must be 0000h..007Fh)
  072h  2   CRC16 of entries 00h..6Fh (70h bytes)
  074h  8Ch Not used (FFh-filled) (or extended data, see below)
The extended data below was invented for iQue DS (for adding the chinese language setting), and is also included in Nintendo DSi models. Presence of extended data is indicated in Firmware Header entry [1Dh].Bit6.
  074h  1   Unknown (01h) (maybe version?)
  075h  1   Extended Language (0..5=Same as Entry 064h, plus 6=Chinese)
            (for language 6, entry 064h defaults to english; for compatibility)
            (for language 0..5, both entries 064h and 075h have same value)
  076h  2   Bitmask for Supported Languages (Bit0..6)
            (007Eh for iQue DS, ie. with chinese, but without japanese)
            (003Eh for DSi/EUR, ie. without chinese, and without japanese)
  078h  86h Not used (FFh-filled on iQue DS, 00h-filled on DSi)
  0FEh  2   CRC16 of entries 74h..FDh (8Ah bytes)
Note: The DSi has some more settings (eg. Country (additionally to Language), Parental Controls, and a surreal fake Wireless-Disable option, which only disables the Wifi LED; the actual Wifi transmission does still work).
Those new settings are stored in eMMC files. The old/above User Settings are stored in those files too (and a copy of those User Settings is stored in Wifi FLASH, as described above; that copy is intended mainly for backwards compatibility with NDS games).
DSi SD/MMC Firmware System Settings Data Files
DSi Backlight level and DSi sound volume seem to be stored in the BPTWL chip (or possibly in its attached I2C potentiometer).

Language and Flags (Entry 064h)
  0..2 Language (0=Japanese, 1=English, 2=French, 3=German,
       4=Italian, 5=Spanish, 6..7=Reserved) (for Chinese see Entry 075h)
       (the language setting also implies time/date format)
  3    GBA mode screen selection (0=Upper, 1=Lower)
  4-5  Backlight Level    (0..3=Low,Med,High,Max) (DS-Lite only)
  6    Bootmenu Disable   (0=Manual/bootmenu, 1=Autostart Cartridge)
  9    Settings Lost (1=Prompt for User Info, and Language, and Calibration)
  10   Settings Okay (0=Prompt for User Info)
  11   Settings Okay (0=Prompt for User Info) (Same as Bit10)
  12   No function
  13   Settings Okay (0=Prompt for User Info, and Language)
  14   Settings Okay (0=Prompt for User Info) (Same as Bit10)
  15   Settings Okay (0=Prompt for User Info) (Same as Bit10)
The Health and Safety message is skipped if Bit9=1, or if one or more of the following bits is zero: Bits 10,11,13,14,15. However, as soon as entering the bootmenu, the Penalty-Prompt occurs.
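
The bit assignments of entry 064h, including the skip rule for the Health and Safety message, can be decoded as in this sketch (names are illustrative only):

```python
LANGUAGES = ("Japanese", "English", "French", "German", "Italian", "Spanish")
OKAY_BITS = (10, 11, 13, 14, 15)      # the "Settings Okay" bits

def decode_language_flags(value: int) -> dict:
    """Split the 16bit Language and Flags entry into its fields."""
    language = value & 7
    lost = bool(value & (1 << 9))
    skipped = lost or any(not value & (1 << bit) for bit in OKAY_BITS)
    return {
        "language": LANGUAGES[language] if language < 6 else "Reserved",
        "gba_lower_screen": bool(value & (1 << 3)),
        "backlight_level": (value >> 4) & 3,
        "autostart": bool(value & (1 << 6)),
        "settings_lost": lost,
        "health_safety_skipped": skipped,
    }
```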

Note: There are two User Settings areas in the firmware chip, at offset 3FE00h and 3FF00h, if both areas have valid CRCs, then the current/newest area is that whose Update Counter is one bigger than in the other/older area.
  IF count1=((count0+1) AND 7Fh) THEN area1=newer ELSE area0=newer
When changing settings, the older area is overwritten with new data (and incremented Update Counter). The two areas allow recovering the previous settings in case of a write-error (eg. on a battery failure during write).
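
The area selection rule, extended with the obvious fallbacks for invalid CRCs, might be sketched as follows (newest_area is a hypothetical name; the CRC checks are assumed to have been done by the caller):

```python
from typing import Optional

def newest_area(valid0: bool, count0: int,
                valid1: bool, count1: int) -> Optional[int]:
    """Return 0 or 1 for the current User Settings area, None if both are bad."""
    if valid0 and valid1:
        # IF count1=((count0+1) AND 7Fh) THEN area1=newer ELSE area0=newer
        return 1 if count1 == ((count0 + 1) & 0x7F) else 0
    if valid0:
        return 0
    if valid1:
        return 1
    return None
```

The AND 7Fh makes the comparison work across the wrap from 007Fh back to 0000h.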

Battery Removal
Even though the battery is required only for the RTC (not for the firmware flash memory), most of the firmware user settings are reset when removing the battery. This appears to be a strange bug-or-feature of the DS bios, at least, fortunately, it still keeps the rest of the firmware intact.

  DS Firmware Extended Settings

Extended Settings contain some additional information which is not supported by the original firmware (current century, date/time formats, temperature calibration, etc.), the settings are supported by Nocash Firmware, by the no$gba emulator, and may be eventually also supported by other emulators. If present, the values can be used by games, otherwise games should use either whatever default settings, or contain their own configuration menu.

Extended Settings - loaded to 23FEE00h (aka fragments of NDS9 boot code)
  Addr Siz Expl.
  00h  8  ID "XbooInfo"
  08h  2  CRC16 Value [0Ch..0Ch+Length-1]
  0Ah  2  CRC16 Length (from 0Ch and up)
  0Ch  1  Version (currently 01h)
  0Dh  1  Update Count (newer = (older+1) AND FFh)
  0Eh  1  Bootmenu Flags
            Bit6   Important Info  (0=Disable, 1=Enable)
            Bit7   Bootmenu Screen (0=Upper, 1=Lower)
  0Fh  1  GBA Border (0=Black, 1=Gray Line)
  10h  2  Temperature Calibration TP0 ADC value  (x16) (sum of 16 ADC values)
  12h  2  Temperature Calibration TP1 ADC value  (x16) (sum of 16 ADC values)
  14h  2  Temperature Calibration Degrees Kelvin (x100) (0=none)
  16h  1  Temperature Flags
            Bit0-1 Format (0=Celsius, 1=Fahrenheit, 2=Reaumur, 3=Kelvin)
  17h  1  Backlight Intensity (0=Off .. FFh=Full)
  18h  4  Date Century Offset       (currently 20, for years 2000..2099)
  1Ch  1  Date Month Recovery Value (1..12)
  1Dh  1  Date Day Recovery Value   (1..31)
  1Eh  1  Date Year Recovery Value  (0..99)
  1Fh  1  Date/Time Flags
            Bit0-1 Date Format   (0=YYYY-MM-DD, 1=MM-DD-YYYY, 2=DD-MM-YYYY)
            Bit2   Friendly Date (0=Raw Numeric, 1=With Day/Month Names)
            Bit5   Time DST      (0=Hide DST, 1=Show DST=On/Off)
            Bit6   Time Seconds  (0=Hide Seconds, 1=Show Seconds)
            Bit7   Time Format   (0=24 hour, 1=12 hour)
  20h  1  Date Separator      (Ascii, usually Slash, or Dot)
  21h  1  Time Separator      (Ascii, usually Colon, or Dot)
  22h  1  Decimal Separator   (Ascii, usually Comma, or Dot)
  23h  1  Thousands Separator (Ascii, usually Comma, or Dot)
  24h  1  Daylight Saving Time (Nth)
             Bit 0-3 Activate on (0..4 = Last,1st,2nd,3rd,4th)
             Bit 4-7 Deactivate on (0..4 = Last,1st,2nd,3rd,4th)
  25h  1  Daylight Saving Time (Day)
             Bit 0-3 Activate on (0..7 = Mon,Tue,Wed,Thu,Fri,Sat,Sun,AnyDay)
             Bit 4-7 Deactivate on (0..7 = Mon,Tue,Wed,Thu,Fri,Sat,Sun,AnyDay)
  26h  1  Daylight Saving Time (of Month)
             Bit 0-3 Activate DST in Month   (1..12)
             Bit 4-7 Deactivate DST in Month (1..12)
  27h  1  Daylight Saving Time (Flags)
             Bit 0   Current DST State (0=Off, 1=On)
             Bit 1   Adjust DST Enable (0=Disable, 1=Enable)
Note: With the original firmware, the memory region at 23FEE00h and up contains un-initialized, non-zero-filled data (fragments of boot code).
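
Because that region holds garbage unless Nocash Firmware (or an emulator) has filled it, a game would first look for the ID. The sketch below does that and then unpacks the four Daylight Saving Time bytes; it skips the CRC16 check, and decode_dst is a made-up name.

```python
NTH = ("Last", "1st", "2nd", "3rd", "4th")
DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun", "AnyDay")

def decode_dst(settings: bytes):
    """Decode entries 24h..27h; returns None if the ID is missing."""
    if bytes(settings[:8]) != b"XbooInfo":
        return None                      # original firmware: no valid data
    nth, day, month, flags = settings[0x24:0x28]

    def rule(shift: int) -> tuple:
        # shift=0 selects Bit 0-3 (activate), shift=4 selects Bit 4-7 (deactivate)
        return (NTH[(nth >> shift) & 15], DAYS[(day >> shift) & 15],
                (month >> shift) & 15)

    return {"activate": rule(0), "deactivate": rule(4),
            "dst_on": bool(flags & 1), "adjust": bool(flags & 2)}
```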

  DS Wireless Communications

DS Wifi I/O Map
DS Wifi Control
DS Wifi Interrupts
DS Wifi Power-Down Registers
DS Wifi Receive Control
DS Wifi Receive Buffer
DS Wifi Receive Statistics
DS Wifi Transmit Control
DS Wifi Transmit Buffers
DS Wifi Transmit Errors
DS Wifi Status
DS Wifi Timers
DS Wifi Multiplay Master
DS Wifi Multiplay Slave
DS Wifi Configuration Ports
DS Wifi Baseband Chip (BB)
DS Wifi RF Chip
DS Wifi RF9008 Registers
DS Wifi Unknown Registers
DS Wifi Unused Registers
DS Wifi Initialization
DS Wifi Flowcharts
DS Wifi Hardware Headers
DS Wifi Multiboot
DS Wifi IEEE802.11 Frames
DS Wifi IEEE802.11 Management Frames (Type=0)
DS Wifi IEEE802.11 Control and Data Frames (Type=1 and 2)

2.4GHz band, Wireless LAN (WLAN) IEEE802.11b protocol

A very large part of the DS Wifi chapters is based on Stephen Stair's great DS Wifi document, thanks there.

  DS Wifi I/O Map

Wifi Registers & RAM cannot be written to by STRB opcodes (ignored).

Registers - NDS7 - 4808000h..4808FFFh
  Address  Dir   Name            r/w  [Init] Description
  4808000h R     W_ID            ---- [1440] Chip ID (1440h=DS, C340h=DS-Lite)
  4808004h R/W   W_MODE_RST      9fff [0000] Mode/Reset
  4808006h R/W   W_MODE_WEP      --7f [0000] Mode/Wep modes
  4808008h R/W   W_TXSTATCNT     ffff [0000] Beacon Status Request
  480800Ah R/W   W_X_00Ah        ffff [0000] [bit7 - ignore rx duplicates]
  4808010h R/W   W_IF            ackk [0000] Wifi Interrupt Request Flags
  4808012h R/W   W_IE            ffff [0000] Wifi Interrupt Enable
  4808018h R/W   W_MACADDR_0     ffff [0000] Hardware MAC Address, 1st 2 bytes
  480801Ah R/W   W_MACADDR_1     ffff [0000] Hardware MAC Address, next 2 bytes
  480801Ch R/W   W_MACADDR_2     ffff [0000] Hardware MAC Address, last 2 bytes
  4808020h R/W   W_BSSID_0       ffff [0000] BSSID (first 2 bytes)
  4808022h R/W   W_BSSID_1       ffff [0000] BSSID (next 2 bytes)
  4808024h R/W   W_BSSID_2       ffff [0000] BSSID (last 2 bytes)
  4808028h R/W   W_AID_LOW       ---f [0000] usually as lower 4bit of AID value
  480802Ah R/W   W_AID_FULL      -7ff [0000] AID value assigned by a BSS.
  480802Ch R/W   W_TX_RETRYLIMIT ffff [0707] Tx Retry Limit (set from 0x00-0xFF)
  480802Eh R/W   W_INTERNAL      ---1 [0000]
  4808030h R/W   W_RXCNT         ff0e [0000] Receive control
  4808032h R/W   W_WEP_CNT       ffff [0000] WEP engine enable
  4808034h R?    W_INTERNAL      0000 [0000] bit0,1 (see ports 004h,040h,1A0h)
Power-Down Registers (and Random Generator)
  4808036h R/W   W_POWER_US      ---3 [0001]
  4808038h R/W   W_POWER_TX      ---7 [0003]
  480803Ch R/W   W_POWERSTATE    -r-2 [0200]
  4808040h R/W   W_POWERFORCE    8--1 [0000]
  4808044h R     W_RANDOM        0xxx [0xxx]
  4808048h R/W   W_POWER_?       ---3 [0000]
WLAN Memory Ports
  4808050h R/W   W_RXBUF_BEGIN   ffff [4000]
  4808052h R/W   W_RXBUF_END     ffff [4800]
  4808054h R     W_RXBUF_WRCSR   0rrr [0000]
  4808056h R/W   W_RXBUF_WR_ADDR -fff [0000]
  4808058h R/W   W_RXBUF_RD_ADDR 1ffe [0000]
  480805Ah R/W   W_RXBUF_READCSR -fff [0000]
  480805Ch R/W   W_RXBUF_COUNT   -fff [0000]
  4808060h R     W_RXBUF_RD_DATA rrrr [xxxx]
  4808062h R/W   W_RXBUF_GAP     1ffe [0000]
  4808064h R/W   W_RXBUF_GAPDISP -fff [0000]
  4808068h R/W   W_TXBUF_WR_ADDR 1ffe [0000]
  480806Ch R/W   W_TXBUF_COUNT   -fff [0000]
  4808070h W     W_TXBUF_WR_DATA xxxx [xxxx]
  4808074h R/W   W_TXBUF_GAP     1ffe [0000]
  4808076h R/W   W_TXBUF_GAPDISP 0fff [0000]
  4808078h W     W_INTERNAL      mirr [mirr] Read: Mirror of 068h
  4808080h R/W   W_TXBUF_BEACON  ffff [0000] Beacon Transmit Location
  4808084h R/W   W_TXBUF_TIM     --ff [0000] Beacon TIM Index in Frame Body
  4808088h R/W   W_LISTENCOUNT   --ff [0000] Listen Count
  480808Ch R/W   W_BEACONINT     -3ff [0064] Beacon Interval
  480808Eh R/W   W_LISTENINT     --ff [0000] Listen Interval
  4808090h R/W   W_TXBUF_CMD     ffff [0000]    (used by firmware part4)
  4808094h R/W   W_TXBUF_REPLY1  ffff [0000]    (used by firmware part4)
  4808098h R     W_TXBUF_REPLY2  0000 [0000]    (used by firmware part4)
  480809Ch R/W   W_INTERNAL      ffff [0050] value 4x00h --> preamble+x*12h us?
  48080A0h R/W   W_TXBUF_LOC1    ffff [0000]
  48080A4h R/W   W_TXBUF_LOC2    ffff [0000]
  48080A8h R/W   W_TXBUF_LOC3    ffff [0000]
  48080ACh W     W_TXREQ_RESET   fixx [0050]
  48080AEh W     W_TXREQ_SET     fixx [0050]
  48080B0h R     W_TXREQ_READ    --1f [0010]
  48080B4h W     W_TXBUF_RESET   0000 [0000]    (used by firmware part4)
  48080B6h R     W_TXBUSY        0000 [0000]    (used by firmware part4)
  48080B8h R     W_TXSTAT        0000 [0000]
  48080BAh ?     W_INTERNAL      0000 [0000]
  48080BCh R/W   W_PREAMBLE      ---3 [0001]
  48080C0h R/W x W_CMD_TOTALTIME ffff [0000]    (used by firmware part4)
  48080C4h R/W x W_CMD_REPLYTIME ffff [0000]    (used by firmware part4)
  48080C8h ?     W_INTERNAL      0000 [0000]
  48080D0h R/W   W_RXFILTER      1fff [0401]
  48080D4h R/W   W_CONFIG_0D4h   ---3 [0001]
  48080D8h R/W   W_CONFIG_0D8h   -fff [0004]
  48080DAh R/W   W_CONFIG_0DAh   ffff [0602]
  48080E0h R/W   W_RXFILTER2     ---f [0008]
Wifi Timers
  48080E8h R/W   W_US_COUNTCNT   ---1 [0000] Microsecond counter enable
  48080EAh R/W   W_US_COMPARECNT ---1 [0000] Microsecond compare enable
  48080ECh R/W   W_CONFIG_0ECh   3f1f [3F03]
  48080EEh R/W   W_CMD_COUNTCNT  ---1 [0001]
  48080F0h R/W   W_US_COMPARE0   fc-- [FC00] Microsecond compare, bits 0-15
  48080F2h R/W   W_US_COMPARE1   ffff [FFFF] Microsecond compare, bits 16-31
  48080F4h R/W   W_US_COMPARE2   ffff [FFFF] Microsecond compare, bits 32-47
  48080F6h R/W   W_US_COMPARE3   ffff [FFFF] Microsecond compare, bits 48-63
  48080F8h R/W   W_US_COUNT0     ffff [0000] Microsecond counter, bits 0-15
  48080FAh R/W   W_US_COUNT1     ffff [0000] Microsecond counter, bits 16-31
  48080FCh R/W   W_US_COUNT2     ffff [0000] Microsecond counter, bits 32-47
  48080FEh R/W   W_US_COUNT3     ffff [0000] Microsecond counter, bits 48-63
  4808100h ?     W_INTERNAL      0000 [0000]
  4808102h ?     W_INTERNAL      0000 [0000]
  4808104h ?     W_INTERNAL      0000 [0000]
  4808106h ?     W_INTERNAL      0000 [0000]
  480810Ch R/W   W_CONTENTFREE   ffff [0000] ...
  4808110h R/W   W_PRE_BEACON    ffff [0000]
  4808118h R/W   W_CMD_COUNT     ffff [0000]
  480811Ch R/W   W_BEACONCOUNT1  ffff [0000] reloaded with W_BEACONINT
Configuration Ports (and some other Registers)
  4808120h R/W   W_CONFIG_120h   81ff [0048] init from firmware[04Ch]
  4808122h R/W   W_CONFIG_122h   ffff [4840] init from firmware[04Eh]
  4808124h R/W   W_CONFIG_124h   ffff [0000] init from firmware[05Eh], or 00C8h
  4808126h ?     W_INTERNAL      fixx [ 0080]
  4808128h R/W   W_CONFIG_128h   ffff [0000] init from firmware[060h], or 07D0h
  480812Ah ?     W_INTERNAL      fixx [1000] lower 12bit same as W_CONFIG_128h
  4808130h R/W   W_CONFIG_130h   -fff [0142] init from firmware[054h]
  4808132h R/W   W_CONFIG_132h   8fff [8064] init from firmware[056h]
  4808134h R/W   W_BEACONCOUNT2  ffff [FFFF] ...
  4808140h R/W   W_CONFIG_140h   ffff [0000] init from firmware[058h], or xx
  4808142h R/W   W_CONFIG_142h   ffff [2443] init from firmware[05Ah]
  4808144h R/W   W_CONFIG_144h   --ff [0042] init from firmware[052h]
  4808146h R/W   W_CONFIG_146h   --ff [0016] init from firmware[044h]
  4808148h R/W   W_CONFIG_148h   --ff [0016] init from firmware[046h]
  480814Ah R/W   W_CONFIG_14Ah   --ff [0016] init from firmware[048h]
  480814Ch R/W   W_CONFIG_14Ch   ffff [162C] init from firmware[04Ah]
  4808150h R/W   W_CONFIG_150h   ff3f [0204] init from firmware[062h], or 202h
  4808154h R/W   W_CONFIG_154h   7a7f [0058] init from firmware[050h]
Baseband Chip Ports
  4808158h W     W_BB_CNT        mirr [00B5] BB Access Start/Direction/Index
  480815Ah W     W_BB_WRITE      ???? [0000] BB Access data byte to write
  480815Ch R     W_BB_READ       00rr [00B5] BB Access data byte read
  480815Eh R     W_BB_BUSY       000r [0000] BB Access Busy flag
  4808160h R/W   W_BB_MODE       41-- [0100] BB Access Mode
  4808168h R/W   W_BB_POWER      8--f [800D] BB Access Powerdown
Internal Stuff
  480816Ah ?     W_INTERNAL      0000 [0001] (or 0000h?)
  4808170h ?     W_INTERNAL      0000 [0000]
  4808172h ?     W_INTERNAL      0000 [0000]
  4808174h ?     W_INTERNAL      0000 [0000]
  4808176h ?     W_INTERNAL      0000 [0000]
  4808178h W     W_INTERNAL      fixx [0800] Read: mirror of 17Ch
RF Chip Ports
  480817Ch R/W   W_RF_DATA2      ffff [0800]
  480817Eh R/W   W_RF_DATA1      ffff [C008]
  4808180h R     W_RF_BUSY       000r [0000]
  4808184h R/W   W_RF_CNT        413f [0018]
  4808190h R/W   W_INTERNAL      ffff [0000]
  4808194h R/W   W_TX_HDR_CNT    ---7 [0000] used by firmware part4 (0 or 6)
  4808198h R/W   W_INTERNAL      ---f [0000]
  480819Ch R     W_RF_PINS       fixx [0004]
  48081A0h R/W   W_X_1A0h        -933 [0000] used by firmware part4 (0 or 823h)
  48081A2h R/W   W_X_1A2h        ---3 [0001] used by firmware part4
  48081A4h R/W   W_X_1A4h        ffff [0000] "Rate used when signal test..."
Wifi Statistics
  48081A8h R     W_RXSTAT_INC_IF rrrr [0000] Stats Increment Flags
  48081AAh R/W   W_RXSTAT_INC_IE ffff [0000] Stats Increment IRQ Enable
  48081ACh R     W_RXSTAT_OVF_IF rrrr [0000] Stats Half-Overflow Flags
  48081AEh R/W   W_RXSTAT_OVF_IE ffff [0000] Stats Half-Overflow IRQ Enable
  48081B0h R/W   W_RXSTAT        --ff [0000]
  48081B2h R/W   W_RXSTAT        ffff [0000] RX_LengthRateErrorCount
  48081B4h R/W   W_RXSTAT        rrff [0000] ... firmware uses also MSB ... ?
  48081B6h R/W   W_RXSTAT        ffff [0000]
  48081B8h R/W   W_RXSTAT        --ff [0000]
  48081BAh R/W   W_RXSTAT        --ff [0000]
  48081BCh R/W   W_RXSTAT        ffff [0000]
  48081BEh R/W   W_RXSTAT        ffff [0000]
  48081C0h R/W   W_TX_ERR_COUNT  --ff [0000] TransmitErrorCount
  48081C4h R     W_RX_COUNT      fixx [0000]
[1D0 - 1DE are 15 entries related to multiplayer response errors]
  48081D0h R/W   W_CMD_STAT      ff-- [0000]
  48081D2h R/W   W_CMD_STAT      ffff [0000]
  48081D4h R/W   W_CMD_STAT      ffff [0000]
  48081D6h R/W   W_CMD_STAT      ffff [0000]
  48081D8h R/W   W_CMD_STAT      ffff [0000]
  48081DAh R/W   W_CMD_STAT      ffff [0000]
  48081DCh R/W   W_CMD_STAT      ffff [0000]
  48081DEh R/W   W_CMD_STAT      ffff [0000]
Internal Diagnostics Registers (usually not used for anything)
  48081F0h R/W   W_INTERNAL      ---3 [0000]
  4808204h ?     W_INTERNAL      fixx [0000]
  4808208h ?     W_INTERNAL      fixx [0000]
  480820Ch W     W_INTERNAL      fixx [0050]
  4808210h R     W_TX_SEQNO      fixx [0000]
  4808214h R     W_RF_STATUS     XXXX [0009]    (used by firmware part4)
  480821Ch W     W_IF_SET        fbff [0000] Force Interrupt (set bits in W_IF).
  4808220h R/W   W_INTERNAL      ffff [0000] Bit0-1: Enable/Disable WifiRAM
                                             (locks memory at 4000h-5FFFh)
  4808224h R/W   W_INTERNAL      ---3 [0003]
  4808228h W     W_X_228h        fixx [0000]    (used by firmware part4) (bit3)
  4808230h R/W   W_INTERNAL      --ff [0047]
  4808234h R/W   W_INTERNAL      -eff [0EFF]
  4808238h R/W   W_INTERNAL      ffff [0000] ;rx_seq_no-60h+/-x   ;why that?
                                   ;other day: fixed value, not seq_no related?
  480823Ch ?     W_INTERNAL      fixx [0000] like W_TXSTAT... ONLY for beacons?
  4808244h R/W   W_X_244h        ffff [0000]    (used by firmware part4)
  4808248h R/W   W_INTERNAL      ffff [0000]
  480824Ch R     W_INTERNAL      fixx [0000] ;rx_mac_addr_0
  480824Eh R     W_INTERNAL      fixx [0000] ;rx_mac_addr_1
  4808250h R     W_INTERNAL      fixx [0000] ;rx_mac_addr_2
  4808254h ?     W_CONFIG_254h   fixx [0000] (read: FFFFh=DS, EEEEh=DS-Lite)
  4808258h ?     W_INTERNAL      fixx [0000]
  480825Ch ?     W_INTERNAL      fixx [0000]
  4808260h ?     W_INTERNAL      fixx [ 0FEF]
  4808264h R     W_INTERNAL      fixx [0000] ;rx_addr_1 (usually "rxtx_addr-x")
  4808268h R     W_RXTX_ADDR     fixx [0005] ;rxtx_addr
  4808270h R     W_INTERNAL      fixx [0000] ;rx_addr_2 (usually "rx_addr_1-1")
  4808274h ?     W_INTERNAL      fixx [ 0001]
  4808278h R/W   W_INTERNAL      ffff [000F]
  480827Ch ?     W_INTERNAL      fixx [ 000A]
  4808290h (R/W) W_X_290h        fixx [FFFF] bit 0 = ?  (used by firmware part4)
  4808298h W     W_INTERNAL      fixx [0000]
  48082A0h R/W   W_INTERNAL      ffff [0000]
  48082A2h R     W_INTERNAL      XXXX [7FFF] 15bit shift reg (used during tx?)
  48082A4h R     W_INTERNAL      fixx [0000] ;rx_rate_1 not ALWAYS same as 2C4h
  48082A8h W     W_INTERNAL      fixx [0000]
  48082ACh ?     W_INTERNAL      fixx [ 0038]
  48082B0h W     W_INTERNAL      fixx [0000]
  48082B4h R/W   W_INTERNAL      -1-3 [0000]
  48082B8h ?     W_INTERNAL      fixx [0000]
  48082C0h R/W   W_INTERNAL      ---1 [0000]
  48082C4h R     W_INTERNAL      fixx [000A] ;rx_rate_2 (0Ah,14h = 1,2 Mbit/s)
  48082C8h R     W_INTERNAL      fixx [0000] ;rx_duration/length/rate (or so?)
  48082CCh R     W_INTERNAL      fixx [0000] ;rx_framecontrol; from ieee header
  48082D0h DIS   W_INTERNAL                  ;"W_POWERACK" (internal garbage)
                                             ;normally DISABLED (unless FORCE)
  48082F0h R/W   W_INTERNAL      ffff [0000]
  48082F2h R/W   W_INTERNAL      ffff [0000]
  48082F4h R/W   W_INTERNAL      ffff [0000]
  48082F6h R/W   W_INTERNAL      ffff [0000]
All other ports in range 4808000h..4808FFFh are unused.
All registers marked as "W_INTERNAL" aren't used by Firmware part4, and are probably unimportant, except for whatever special diagnostics purposes.
Reading from write-only ports (W) often mirrors to data from other ports.

Additionally, there are 69h Baseband Chip Registers (BB), and 0Fh RF Chip Registers (see BB and RF chapters).

For Wifi Power Management (POWCNT2), for Wifi Waitstates (WIFIWAITCNT), and for the Power LED Blink Feature (conventionally used to indicate Wifi activity) see:
DS Power Management

For Wifi Configuration and Calibration data in Firmware Header, see:
DS Cartridges, Encryption, Firmware

Wifi RAM - NDS7 - Memory (4804000h..4805FFFh)
  4804000h W_MACMEM RX/TX Buffers (2000h bytes) (excluding below specials)
  4805F60h Used for something, not included in the rx circular buffer.
  4805F80h W_WEPKEY_0 (32 bytes)
  4805FA0h W_WEPKEY_1 (32 bytes)
  4805FC0h W_WEPKEY_2 (32 bytes)
  4805FE0h W_WEPKEY_3 (32 bytes)
Unlike all other NDS memory, Wifi RAM is left uninitialized after boot.

5F80h - W_WEPKEY_0 thru W_WEPKEY_3 - Wifi WEP keys (R/W)
These WEP key slots store the WEP keys that are used for encryption for 802.11 key IDs 0-3.

  DS Wifi Control

4808000h - W_ID - Wifi Chip ID (R)
  0-15   Chip ID (1440h on NDS, C340h on NDS-lite)
The NDS-lite is more or less backwards compatible with the original NDS (the W_RXBUF_GAPDISP and W_TXBUF_GAPDISP are different, and most of the garbage effects on unused/mirrored ports are different, too).

4808004h - W_MODE_RST - Wifi Hardware mode / reset (R/W)
  0     Adjust some ports (0/1=see lists below) (R/W)
        TX Master Enable for LOC1..3 and Beacon  (0=Disable, 1=Enable)
  1-12  Unknown (R/W)
  13    Reset some ports (0=No change, 1=Reset/see list below) (Write-Only)
  14    Reset some ports (0=No change, 1=Reset/see list below) (Write-Only)
  15    Unknown (R/W)

4808006h - W_MODE_WEP - Wifi Software mode / Wep mode (R/W)
  0-2   specify a software mode for wifi operation
        (may be related to hardware but a correlation has not yet been found)
  3-5   specify the hardware WEP mode
        0=no WEP, 1=64bit WEP (48bit key), and 3=128bit WEP.
        (Values 2 and 4 exist too, but are nonstandard)
  6     Unknown
  8-15  Always zero
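
As a rough sketch of the bit layout above, the following splits a W_MODE_WEP value into its two documented fields. The function name and the returned dictionary are hypothetical (not part of any firmware or SDK); Bit6 and up are ignored.

```python
# Hypothetical helper, names are illustrative only.
# WEP mode names as listed above; values 2 and 4 exist but are nonstandard.
WEP_MODE_NAMES = {0: "no WEP", 1: "64bit WEP", 3: "128bit WEP"}

def decode_mode_wep(value):
    """Split a 16bit W_MODE_WEP value into software mode and WEP mode."""
    software_mode = value & 0x7        # bits 0-2
    wep_mode = (value >> 3) & 0x7      # bits 3-5
    return {
        "software_mode": software_mode,
        "wep_mode": wep_mode,
        "wep_name": WEP_MODE_NAMES.get(wep_mode, "nonstandard/unknown"),
    }
```

For example, decode_mode_wep(000Ah) gives software mode 2 and WEP mode 1.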

4808018h - W_MACADDR_0 - MAC Address (R/W)
480801Ah - W_MACADDR_1 - MAC Address (R/W)
480801Ch - W_MACADDR_2 - MAC Address (R/W)
48bit MAC Address of the console. Should be initialized from firmware[036h]. The hardware receives only packets that are sent to this address (or to group addresses, like FF:FF:FF:FF:FF:FF).

4808020h - W_BSSID_0 - BSSID (R/W)
4808022h - W_BSSID_1 - BSSID (R/W)
4808024h - W_BSSID_2 - BSSID (R/W)
48bit BSSID stored here. Ie. the MAC address of the host, obtained from Beacon frames (on the host itself, that should be just the same as W_MACADDR). See W_RXFILTER.

4808028h - W_AID_LOW (R/W)
  Bit0-3   Maybe player-number, assuming that HW supports such? (1..15, or 0)
  Bit4-15  Not used
Usually set equal to the lower 4bit of the W_AID_FULL value.

480802Ah - W_AID_FULL - Association ID (R/W)
  Bit0-10  Association ID (AID) (1..2007, or zero)
  Bit11-15 Not used

4808032h - W_WEP_CNT - WEP Engine Enable (R/W)
  0-14  Unknown (usually zero)
  15    WEP Engine Enable  (0=Disable, 1=Enable)
[expl. I - bit15 enables/disables WEP processing of sent/received packets]
[expl. II - bit15 enables wep processing on packets which bear the WEP flag in the 802.11 header]
[expl. III - bit15 seems to react on 0-to-1 transitions]

4808044h - W_RANDOM - Random Generator (R)
  0-10  Random
  11-15 Not used (zero)
The random generator is updated at 33.51MHz rate, as such:
  X = (X AND 1) XOR (X ROL 1)  ;(rotation within 11bit range)
That random sequence goes through 5FDh different values before it restarts.
When reading from the random register, the old latched value is returned to the CPU, and the new current random value is then latched, so reads always return the older value, timed from the previous read.
Occasionally, about once every some thousand reads, the latching appears to occur 4 cycles earlier than normally, so the value on the next read will be 4 cycles older than expected.
The random register has ACTIVE mirrors.
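
The update formula can be tried out in software. This is a minimal sketch of the formula only (the latching behaviour on reads is not modelled); the function names are hypothetical.

```python
def w_random_step(x):
    """One generator update: X = (X AND 1) XOR (X ROL 1), 11bit rotation."""
    rol = ((x << 1) | (x >> 10)) & 0x7FF   # rotate left within 11 bits
    return (x & 1) ^ rol

def cycle_length(seed):
    """Count updates until the sequence returns to the seed value."""
    x = w_random_step(seed)
    n = 1
    while x != seed:
        x = w_random_step(x)
        n += 1
    return n
```

The step is reversible (bit1 of the new value is the old bit0), so cycle_length always terminates. It can be used to compare a given nonzero seed against the 5FDh sequence length stated above. Under this formula the value 000h maps to itself.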

48080BCh - W_PREAMBLE - Preamble Control (R/W)
  Bit   Dir  Expl.
  0     R/W  Unknown                    (this does NOT affect TX)
  1     R/W  Preamble (0=Long, 1=Short) (this does NOT affect TX)
  2     W    Preamble (0=Long, 1=Short) (this does affect TX) (only at 2Mbit/s)
  3-15  -    Always zero
Short preamble works only with 2Mbit/s transfer rate (ie. when set like so in TX hardware header). 1Mbit/s rate always uses long preamble.
  Type   Carrier Signal  SFD Value     PLCP Header     Data
  Long   128bit, 1Mbit   16bit, 1Mbit  48bit, 1Mbit    N bits, 1Mbit or 2Mbit
  Short  56bit, 1Mbit    16bit, 1Mbit  48bit, 2Mbit    N bits, 2Mbit
Length of the Carrier+SFD+PLCP part is thus 192us (long) or 96us (short).
Note: The Carrier+SFD+PLCP part is sent between IRQ14 and IRQ07 (not between IRQ07 and IRQ01).
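
The 192us and 96us figures follow directly from the table. A small worked calculation (the helper name and the list layout are made up for this example):

```python
def header_time_us(fields):
    """fields: list of (bits, rate in Mbit/s); returns time in microseconds."""
    return sum(bits / rate for bits, rate in fields)

# Carrier, SFD, PLCP header, as (bits, Mbit/s) taken from the table above
LONG_PREAMBLE = [(128, 1), (16, 1), (48, 1)]
SHORT_PREAMBLE = [(56, 1), (16, 1), (48, 2)]

long_us = header_time_us(LONG_PREAMBLE)     # 128 + 16 + 48
short_us = header_time_us(SHORT_PREAMBLE)   # 56 + 16 + 24
```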

Writing "0-then-1" to W_MODE_RST.Bit0 does reset following ports:
  [4808034h]=0002h ;W_INTERNAL
  [480819Ch]=0046h ;W_RF_PINS
  [4808214h]=0009h ;W_RF_STATUS
  [480827Ch]=0005h ;W_INTERNAL
  [48082A2h]=?     ;...unstable?

Writing "1-then-0" to W_MODE_RST.Bit0 does reset following ports:
  [480827Ch]=000Ah ;W_INTERNAL

Writing "1" to W_MODE_RST.Bit13 does reset following ports:
  [4808056h]=0000h ;W_RXBUF_WR_ADDR
  [48080C0h]=0000h ;W_CMD_TOTALTIME
  [48080C4h]=0000h ;W_CMD_REPLYTIME
  [48081A4h]=0000h ;W_X_1A4h
  [4808278h]=000Fh ;W_INTERNAL
  ...Also, following may be affected (results are unstable though)...
  [48080AEh]=?     ;or rather the actual port (which it is a mirror of)
  [48080BAh]=?     ;W_INTERNAL (occasionally unstable)
  [4808204h]=?     ;W_INTERNAL
  [480825Ch]=?     ;W_INTERNAL
  [4808268h]=?     ;W_RXTX_ADDR
  [4808274h]=?     ;W_INTERNAL

Writing "1" to W_MODE_RST.Bit14 does reset following ports:
  [4808006h]=0000h ;W_MODE_WEP
  [4808008h]=0000h ;W_TXSTATCNT
  [480800Ah]=0000h ;W_X_00Ah
  [4808018h]=0000h ;W_MACADDR_0
  [480801Ah]=0000h ;W_MACADDR_1
  [480801Ch]=0000h ;W_MACADDR_2
  [4808020h]=0000h ;W_BSSID_0
  [4808022h]=0000h ;W_BSSID_1
  [4808024h]=0000h ;W_BSSID_2
  [4808028h]=0000h ;W_AID_LOW
  [480802Ah]=0000h ;W_AID_FULL
  [480802Ch]=0707h ;W_TX_RETRYLIMIT
  [480802Eh]=0000h ;W_INTERNAL
  [4808050h]=4000h ;W_RXBUF_BEGIN
  [4808052h]=4800h ;W_RXBUF_END
  [4808084h]=0000h ;W_TXBUF_TIM
  [48080BCh]=0001h ;W_PREAMBLE
  [48080D0h]=0401h ;W_RXFILTER
  [48080D4h]=0001h ;W_CONFIG_0D4h
  [48080E0h]=0008h ;W_RXFILTER2
  [48080ECh]=3F03h ;W_CONFIG_0ECh
  [4808194h]=0000h ;W_TX_HDR_CNT
  [4808198h]=0000h ;W_INTERNAL
  [48081A2h]=0001h ;W_X_1A2h
  [4808224h]=0003h ;W_INTERNAL
  [4808230h]=0047h ;W_INTERNAL

  DS Wifi Interrupts

4808010h - W_IF - Wifi Interrupt Request Flags (R/W)
  0   Receive Complete  (packet received and stored in the RX fifo)
  1   Transmit Complete (packet is done being transmitted) (no matter if error)
  2   Receive Event Increment      (IRQ02, see W_RXSTAT_INC_IE)
  3   Transmit Error Increment     (IRQ03, see W_TX_ERR_COUNT)
  4   Receive Event Half-Overflow  (IRQ04, see W_RXSTAT_OVF_IE)
  5   Transmit Error Half-Overflow (IRQ05, see W_TX_ERR_COUNT.Bit7)
  6   Start Receive     (IRQ06, a packet has just started to be received)
  7   Start Transmit    (IRQ07, a packet has just started to be transmitted)
  8   Txbuf Count Expired  (IRQ08, see W_TXBUF_COUNT)
  9   Rxbuf Count Expired  (IRQ09, see W_RXBUF_COUNT)
  10  Not used (always zero, even when trying to set it with W_IF_SET)
  11  RF Wakeup            (IRQ11, see W_POWERSTATE)
  12  Multiplay ...?       (IRQ12, see W_CMD_COUNT)
  13  Post-Beacon Timeslot (IRQ13, see W_BEACONCOUNT2)
  14  Beacon Timeslot      (IRQ14, see W_BEACONCOUNT1/W_US_COMPARE)
  15  Pre-Beacon Timeslot  (IRQ15, see W_BEACONCOUNT1/W_PRE_BEACON)
Write a '1' to a bit to clear it.
The Transmit Start/Complete bits (Bit7,1) are set for EACH packet (including beacons, and including retries).
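
For debugging, a W_IF value can be turned into a list of names. The sketch below only restates the bit table above; the dictionary and function names are hypothetical, and Bit10 is left out since it is not used.

```python
# Bit number -> short name, taken from the W_IF table above (Bit10 unused).
W_IF_NAMES = {
    0: "Receive Complete",
    1: "Transmit Complete",
    2: "Receive Event Increment",
    3: "Transmit Error Increment",
    4: "Receive Event Half-Overflow",
    5: "Transmit Error Half-Overflow",
    6: "Start Receive",
    7: "Start Transmit",
    8: "Txbuf Count Expired",
    9: "Rxbuf Count Expired",
    11: "RF Wakeup",
    12: "Multiplay",
    13: "Post-Beacon Timeslot",
    14: "Beacon Timeslot",
    15: "Pre-Beacon Timeslot",
}

def decode_w_if(value):
    """Return the names of all bits set in a 16bit W_IF value, lowest first."""
    return [name for bit, name in sorted(W_IF_NAMES.items())
            if value & (1 << bit)]
```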

4808012h - W_IE - Wifi Interrupt Enable Flags (R/W)
  0-15  Enable Flags, same bits as W_IF  (0=Disable, 1=Enable)
In W_IE, Bit10 is R/W, but seems to have no function since IRQ10 doesn't exist.

480821Ch - W_IF_SET (W_INTERNAL) - Force Wifi Interrupt Flags (W)
  0-15  Set corresponding bits in W_IF  (0=No change, 1=Set Bit)
Notes: Bit10 cannot be set since no IRQ10 exists. This register does only set IRQ flags, but without performing special actions (such as the W_BEACONCOUNT1 and W_BEACONCOUNT2 reloads that occur on real IRQ14's).

Wifi Primary IRQ Flag (IF.Bit24, Port 4000214h)
IF.Bit24 gets set <only> when (W_IF AND W_IE) changes from 0000h to non-zero.
IF.Bit24 can be reset (ack) <even> when (W_IF AND W_IE) is still non-zero.
  Caution  Caution  Caution  Caution  Caution
  That means, when acknowledging IF.Bit24, then NO FURTHER wifi IRQs
  will be executed whilst and as long as (W_IF AND W_IE) is non-zero.
One work-around is to process/acknowledge ALL wifi IRQs in a loop, including further IRQs that may occur inside of that loop, until (W_IF AND W_IE) becomes 0000h.
Another work-around (for single IRQs) would be to acknowledge IF and W_IF, and then to set W_IE temporarily to 0000h, and then back to the old W_IE setting.
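
The edge-triggered behaviour and both work-arounds can be illustrated with a small software model. This is only a sketch of the rules described above, not real hardware access; the class, its methods, and the two helper functions are hypothetical names.

```python
class WifiIrqModel:
    """Toy model: IF.Bit24 is set only on a 0000h -> non-zero transition
    of (W_IF AND W_IE)."""
    def __init__(self):
        self.w_if = 0
        self.w_ie = 0
        self.if_bit24 = False

    def _pending(self):
        return self.w_if & self.w_ie

    def raise_irq(self, bit):
        """Hardware event: set a W_IF bit."""
        before = self._pending()
        self.w_if |= 1 << bit
        if before == 0 and self._pending() != 0:
            self.if_bit24 = True

    def write_w_if(self, mask):
        """Writing '1' to a W_IF bit clears it."""
        self.w_if &= ~mask & 0xFFFF

    def write_w_ie(self, value):
        before = self._pending()
        self.w_ie = value & 0xFFFF
        if before == 0 and self._pending() != 0:
            self.if_bit24 = True

    def ack_if_bit24(self):
        self.if_bit24 = False

def handle_all(model, handler):
    """Work-around 1: process wifi IRQs in a loop until none are pending."""
    model.ack_if_bit24()
    while True:
        pending = model.w_if & model.w_ie
        if pending == 0:
            break
        model.write_w_if(pending)
        handler(pending)

def ack_single(model, mask):
    """Work-around 2: acknowledge, then toggle W_IE to 0000h and back."""
    model.ack_if_bit24()
    model.write_w_if(mask)
    old_ie = model.w_ie
    model.write_w_ie(0x0000)
    model.write_w_ie(old_ie)
```

In this model, acknowledging IF.Bit24 while a W_IF bit is still pending blocks further requests, whereas handle_all leaves (W_IF AND W_IE) at 0000h so that the next event sets IF.Bit24 again.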

  DS Wifi Power-Down Registers

4808036h - W_POWER_US (R/W)
  0     Disable W_US_COUNT and W_BB_ports  (0=Enable, 1=Disable)
  1     Unknown (usually 0)
  2-15  Always zero
Bit0=0 enables RFU by setting RFU.Pin11=HIGH, which activates the 22.000MHz oscillator on the RFU board, the 22MHz clock is then output to RFU.Pin26.

4808038h - W_POWER_TX (R/W)
Transmit-related power saving, or something similar.
init from firmware[05Ch]
  0     Auto Wakeup (1=Leave Idle Mode a while after IRQ15)
  1     Auto Sleep  (0=Enter Idle Mode on IRQ13)
  2     Unknown
  3     Unknown (Write-only) (used by firmware)
  4-15  Always zero

480803Ch - W_POWERSTATE (R/W)/(R)
  0     Unknown (usually 0)                         (R/W)
  1     Request Power Enable (0=No, 1=Yes/queued)   (R/W, but not always)
  2-7   Always zero
  8     Indicates that Bit9 is about to be cleared (Read only)
  9     Current power state (0=Enabled, 1=Disabled) (Read only)
  10-15 Always zero
[value =1: queue disable power state] ;<-- seems to be incorrect
[value =2: queue enable power state] ;<-- seems to be correct
Enabling causes wakeup interrupt (IRQ11).
Note: That queue stuff seems to work only if W_POWER_US=0 and W_MODE_RST=1.

4808040h - W_POWERFORCE - Force Power State (R/W)
  0     New value for W_POWERSTATE.Bit9  (0=Clear/Delayed, 1=Set/Immediately)
  1-14  Always zero
  15    Apply Bit0 to W_POWERSTATE.Bit9  (0=No, 1=Yes)
Setting W_POWERFORCE=8001h whilst W_POWERSTATE.Bit9=0 acts immediately:
  (Doing this is okay. Switches to power down mode. Similar to IRQ13.)
  [4808034h]=0002h ;W_INTERNAL
  [480803Ch]=02xxh ;W_POWERSTATE
  [48080B0h]=0000h ;W_TXREQ_READ
  [480819Ch]=0046h ;W_RF_PINS
  [4808214h]=0009h ;W_RF_STATUS (idle)
Setting W_POWERFORCE=8000h whilst W_POWERSTATE.Bit9=1 acts delayed:
  (Don't do this. After that sequence, the hardware seems to be messed up)
  W_POWERSTATE.Bit8 gets set to indicate the pending operation,
  while pending, changes to W_POWERFORCE aren't applied to W_POWERSTATE,
  while pending, W_POWERACK becomes Read/Write-able,
  writing 0000h to W_POWERACK does clear W_POWERSTATE.Bit8,
  and does apply POWERFORCE.Bit0 to W_POWERSTATE.Bit9
  and does deactivate Port W_POWERACK again.

4808048h - W_POWER_? (R/W)
  0     Unknown
  1     Unknown
  2-15  Always zero
At whatever time (during transmit or so) it gets set to 0003h by hardware.

See also: POWCNT2, W_BB_POWER.

  DS Wifi Receive Control

4808030h - W_RXCNT - Wifi Receive Control (parts R/W and W)
  0     Copy W_RXBUF_WR_ADDR to W_RXBUF_WRCSR                        (W)
  1-3   Unknown                                                      (R/W)
  4-6   Always zero
  7     Copy [4808094h] to [4808098h], and reset [4808094h] to 0000h (W)
          Ie. Copy W_TXBUF_REPLY1 to W_TXBUF_REPLY2,
          and reset W_TXBUF_REPLY1 to 0000h
  8-14  Unknown                                                      (R/W)
  15    Enable Queuing received data to RX FIFO                      (R/W)

48080D0h - W_RXFILTER - (R/W)
  0     (0=Insist on W_BSSID, 1=Accept no matter of W_BSSID)
  1-6   Unknown (usually zero)
  7     Unknown (0 or 1)
  8     Unknown (0 or 1)
  9     Unknown (0 or 1)
  10    Unknown (0 or 1)       (when set, receives beacons, and maybe others)
  11    Unknown (usually zero)
  12    (0=Normal, 1=Accept even whatever garbage)
  13-15 Not used (always zero)
Specifies what packets to allow.
0000h = Disable receive.
FFFFh = Enable receive.
0400h = Receives management frames (and possibly others, too)

48080E0h - W_RXFILTER2 - (R/W)
  0     Unknown (0=Receive Data Frames, 1=Ignore Data Frames) (?)
  1     Unknown
  2     Unknown
  3     Unknown (usually set)
  4-15  Not used (always zero)
Firmware writes values 08h, 0Bh, 0Dh (aka 1000b, 1011b, 1101b).
Firmware usually has bit0 set, even when receiving data frames, so, in some situations data frames seem to pass-through even when bit0 is set...? Possibly that situation is when W_BSSID matches...?
Control/PS-Poll frames seem to be passed always (even if W_RXFILTER2=0Fh).

  DS Wifi Receive Buffer

The dimensions of the circular Buffer are set with BEGIN/END values, hardware automatically wraps to BEGIN when an incremented pointer hits END address.

Write Area
Memory between WRCSR and READCSR is free for receiving data. The hardware writes incoming packets to this region (to WRCSR and up, but without exceeding READCSR). Once it has successfully received a complete packet, the hardware moves WRCSR after the packet (aligned to a 4-byte boundary).

Read Area
Memory between READCSR and WRCSR contains received data, which can be read by the CPU via RD_ADDR and RD_DATA registers (or directly from memory). Once it has processed that data, the CPU must set READCSR to the end of it.

4808050h - W_RXBUF_BEGIN - Wifi RX Fifo start location (R/W)
4808052h - W_RXBUF_END - Wifi RX Fifo end location (R/W)
  0-15  Byte-offset in Wifi Memory (usually 4000h..5FFEh)
Although the full 16bit are R/W, only the 12bit halfword offset in Bit1-12 is actually used, the other bits seem to have no effect.
Some or all (?) of the below incrementing registers are automatically matched to begin/end, that is, after incrementing, IF adr=end THEN adr=begin.

4808054h - W_RXBUF_WRCSR - Wifi RX Fifo Write or "end" cursor (R)
  0-11  Halfword Address in RAM
  12-15 Always zero
This is a hardware controlled write location - it shows where the next packet will be written.

4808056h - W_RXBUF_WR_ADDR - Wifi RX Fifo Write Cursor Latch value (R/W)
  0-11  Halfword Address in RAM
  12-15 Always zero
This is a value that is latched into W_RXBUF_WRCSR, when the W_RXCNT latch bit (W_RXCNT.Bit0) is written.

4808058h - W_RXBUF_RD_ADDR - Wifi CircBuf Read Address (R/W)
  0     Always zero
  1-12  Halfword Address in RAM for reading via W_RXBUF_RD_DATA
  13-15 Always zero
The circular buffer limits are the same as the range specified for the receive FIFO, however the address can be set outside of that range and will only be affected by the FIFO boundary if it crosses the FIFO end location by reading from the circular buffer.

480805Ah - W_RXBUF_READCSR - Wifi RX Fifo Read or "start" cursor (R/W)
  0-11  Halfword Address in RAM
  12-15 Always zero
This value is specified the same as W_RXBUF_WRCSR - it's purely software controlled, so it's up to the programmer to move the start cursor after loading a packet. If W_RXBUF_READCSR != W_RXBUF_WRCSR, then one or more packets exist in the FIFO that need to be processed (see the section on HW RX Headers for information on calculating packet lengths). Once a packet has been processed, the software should advance the read cursor to the beginning of the next packet.
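
The amount of unread data between the two cursors can be computed as below. This is a sketch under the assumption that both cursors are halfword addresses relative to the start of Wifi RAM (so cursor*2 is a byte offset), that only Bit1-12 of BEGIN/END matter, and that BEGIN and END differ. The function name is hypothetical.

```python
def rx_pending_bytes(begin, end, readcsr, wrcsr):
    """Bytes waiting between READCSR and WRCSR in the circular buffer.

    begin/end:     W_RXBUF_BEGIN/END byte offsets (e.g. 4C00h, 5F60h)
    readcsr/wrcsr: W_RXBUF_READCSR/WRCSR halfword addresses (12bit)
    """
    size = (end - begin) & 0x1FFE      # buffer size in bytes (assumed non-zero)
    rd = (readcsr & 0xFFF) * 2         # halfword address -> byte offset
    wr = (wrcsr & 0xFFF) * 2
    return (wr - rd) % size            # wraps when WRCSR is below READCSR
```

For example, with BEGIN=4C00h, END=5F60h, READCSR=0F00h and WRCSR=0610h, the write cursor has already wrapped, and the result is 180h bytes.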

4808060h - W_RXBUF_RD_DATA - Wifi CircBuf Read Data (R)
  0-15  Data
Returns the 16bit value at the address specified by W_RXBUF_RD_ADDR, and increments W_RXBUF_RD_ADDR by 2. If the increment causes W_RXBUF_RD_ADDR to equal the address specified in W_RXBUF_END, W_RXBUF_RD_ADDR will be reset to the address specified in W_RXBUF_BEGIN.
Ports 1060h, 6060h, 7060h are PASSIVE mirrors of 0060h, reading from these mirrors returns the old latched value from previous read from 0060h, but without reading a new value from RAM, and without incrementing the address.

4808062h - W_RXBUF_GAP - Wifi RX Gap Address (R/W)
  0     Always zero
  1-12  Halfword Address in RAM
  13-15 Always zero
Seems to be intended to define a "gap" in the circular buffer, done like so:
  Addr=Addr+2 and 1FFEh  ;address increment (by W_RXBUF_RD_DATA read)
  if Addr=RXBUF_END then ;normal begin/end wrapping (done before gap wraps)
     Addr=RXBUF_BEGIN
  if Addr=RXBUF_GAP then ;now gap-wrap (may include further begin/end wrap)
     Addr=RXBUF_GAP+RXBUF_GAPDISP*2
     if Addr>=RXBUF_END then Addr=Addr+RXBUF_BEGIN-RXBUF_END  ;wrap more
To disable the gap stuff, set both W_RXBUF_GAP and W_RXBUF_GAPDISP to zero.
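
The read address update can be modelled in software as follows. This is a sketch, not verified against hardware: it assumes that the begin/end wrap sets the address to RXBUF_BEGIN, that the gap-wrap adds RXBUF_GAPDISP*2 (by analogy with W_TXBUF_GAPDISP), and that the result stays within the 13bit address range. The function name is hypothetical.

```python
def rxbuf_rd_addr_advance(addr, begin, end, gap, gapdisp):
    """New W_RXBUF_RD_ADDR after one read from W_RXBUF_RD_DATA.

    addr, gap:  byte offsets within Wifi RAM (bit0 = 0)
    begin, end: W_RXBUF_BEGIN/END values (only Bit1-12 are used)
    gapdisp:    W_RXBUF_GAPDISP halfword offset
    """
    begin &= 0x1FFE
    end &= 0x1FFE
    addr = (addr + 2) & 0x1FFE          # address increment
    if addr == end:                     # normal begin/end wrapping
        addr = begin
    if addr == gap:                     # gap-wrap (assumed: add GAPDISP*2)
        addr = gap + gapdisp * 2
        if addr >= end:                 # wrap more
            addr = addr + begin - end
    return addr & 0x1FFE                # assumed: result masked to 13 bits
```

With GAP=1F00h and GAPDISP=40h in a 4C00h..5F60h buffer, a read at 1EFEh moves the address to 0C20h: the displacement of 80h bytes lands past END and is wrapped back to BEGIN+20h.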

4808064h - W_RXBUF_GAPDISP - Wifi RX Gap Displacement Offset (R/W)
  0-11  Halfword Offset, used with W_RXBUF_GAP (see there)
  12-15 Always zero
Caution: On the DS-Lite, after adding it to W_RXBUF_RD_ADDR, the W_RXBUF_GAPDISP setting is destroyed (reset to 0000h) by hardware. The original DS leaves W_RXBUF_GAPDISP intact.

480805Ch - W_RXBUF_COUNT (R/W)
  0-11  Decremented on reads from W_RXBUF_RD_DATA
  12-15 Always zero
Triggers IRQ09 when it reaches zero, and does then stay at zero (without further decrementing, and without generating further IRQs).
Note: Also decremented on (accidental) writes to read-only W_RXBUF_RD_DATA.

  DS Wifi Receive Statistics

48081A8h - W_RXSTAT_INC_IF - Statistics Increment Flags (R)
  0-12   Increment Flags (see Port 48081B0h..1BFh)
  13-15  Always zero
Bitmask for which statistics have been increased at least once.
Unknown how to reset/acknowledge these bits... possibly by reading from 48081A8h, or by reading from 48081B0h..1BFh, or eventually/obscurely by writing to 48081ACh.

48081AAh - W_RXSTAT_INC_IE - Statistics Increment Interrupt Enable (R/W)
  0-12   Counter Increment Interrupt Enable (see 48081B0h..1BFh) (1=Enable)
  13-15  Unknown (usually zero)
Statistic Interrupt Enable Control register for Count Up.
Note: This seems to trigger IRQ02 ...?

48081ACh - W_RXSTAT_OVF_IF - Statistics Half-Overflow Flags (R)
  0-12   Half-Overflow Flags (see Port 48081B0h..1BFh)
  13-15  Always zero
The W_RXSTAT_OVF_IF bits simply contain the current bit7 value of the corresponding counters; setting or clearing those counter bits is directly reflected in W_RXSTAT_OVF_IF.
The recommended way to acknowledge W_RXSTAT_OVF_IF is to read the corresponding counters (which are reset to 00h after reading). For some reason, the firmware is additionally writing FFFFh to W_RXSTAT_OVF_IF (that is possibly a bug, or it does acknowledge something internally?).

48081AEh - W_RXSTAT_OVF_IE - Statistics Half-Overflow Interrupt Enable (R/W)
  0-12   Half-Overflow Interrupt Enable (see Port 48081B0h..1BFh) (1=Enable)
  13-15  Unknown (usually zero)
Statistic Interrupt Enable for Overflow, bits same as in W_RXSTAT_INC_IE
Note: This seems to trigger IRQ04 ...?

48081B0h..1BFh - W_RXSTAT - Receive Statistics (R/W, except 1B5h: Read-only)
W_RXSTAT is a collection of 8bit counters, which are incremented upon certain events. These entries are automatically reset to 0000h after reading. Should be accessed with LDRH opcodes (using LDRB to read only 8bits does work, but the read is internally expanded to 16bit, and so, the whole 16bit value will be reset to 0000h).
  Port      Dir  Bit  Expl.
  48081B0h  R/W  0    W_RXSTAT  ?
  48081B1h  -    -    Always 0  -
  48081B2h  R/W  1    W_RXSTAT  ?    "RX_RateErrorCount"
  48081B3h  R/W  2    W_RXSTAT  Length>2348 error
  48081B4h  R/W  3    W_RXSTAT  RXBUF Full error
  48081B5h  R    4?   W_RXSTAT  ?    (R) (but seems to exist; used by firmware)
  48081B6h  R/W  5    W_RXSTAT  Length=0 or Wrong FCS Error
  48081B7h  R/W  6    W_RXSTAT  Packet Received Okay
                             (also increments on W_MACADDR mis-match)
                             (also increments on internal ACK packets)
                             (also increments on invalid IEEE type=3)
                             (also increments TOGETHER with 1BCh and 1BEh)
                             (not incremented on RXBUF_FULL error)
  48081B8h  R/W  7    W_RXSTAT  ?
  48081B9h  -    -    Always 0  -
  48081BAh  R/W  8    W_RXSTAT  ?
  48081BBh  -    -    Always 0  -
  48081BCh  R/W  9    W_RXSTAT  WEP Error (when FC.Bit14 is set)
  48081BDh  R/W  10   W_RXSTAT  ?
  48081BEh  R/W  11   W_RXSTAT  (duplicated sequence control)
  48081BFh  R/W  12   W_RXSTAT  ?
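
The read-to-clear behaviour and the relation to W_RXSTAT_OVF_IF can be sketched as a small model. The port-to-flag mapping is taken from the table above; the class and method names are hypothetical, and the 8bit wrap on increment is an assumption (overflow behaviour is not described here).

```python
# Flag bit number -> counter port offset (low 12 bits), from the table above.
FLAG_BIT_TO_PORT = [0x1B0, 0x1B2, 0x1B3, 0x1B4, 0x1B5, 0x1B6, 0x1B7,
                    0x1B8, 0x1BA, 0x1BC, 0x1BD, 0x1BE, 0x1BF]

class RxStatModel:
    def __init__(self):
        self.counters = {port: 0 for port in FLAG_BIT_TO_PORT}

    def increment(self, port):
        # Assumption: counters wrap within 8 bits.
        self.counters[port] = (self.counters[port] + 1) & 0xFF

    def ovf_if(self):
        """W_RXSTAT_OVF_IF: bit7 of each counter, mapped to its flag bit."""
        return sum(1 << bit for bit, port in enumerate(FLAG_BIT_TO_PORT)
                   if self.counters[port] & 0x80)

    def read16(self, port):
        """16bit read; both 8bit counters of the halfword are reset."""
        base = port & ~1
        lo = self.counters.get(base, 0)
        hi = self.counters.get(base + 1, 0)
        for p in (base, base + 1):
            if p in self.counters:
                self.counters[p] = 0
        return lo | (hi << 8)
```

In this model, reading the halfword at 1B6h returns both counters at once and clears the matching half-overflow flags, which is why separate byte reads cannot be used to preserve one of the two counters.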

48081C4h - W_RX_COUNT (W_INTERNAL) (R)
  0-?   Receive Okay Count (increments together with ports 48081B4h, 48081B7h)
  8-?   Receive Error Count (increments together with ports 48081B3h, 48081B6h)
Increments when receiving a packet. Automatically reset to zero after reading.

48081D0h..1DFh - W_CMD_STAT - Multiplay Response Error Counters (R/W)
The multiplay error counters are only used when sending a multiplay command (via W_TXBUF_CMD) to any connected slaves (which must be indicated by flags located in the second halfword of the multiplay command's frame body).
  48081D0h        Not used (always zero)
  48081D1h..1DFh  Client 1..15 Response Error (increments on missing replies)
If one or more of those slaves fail to respond, then the corresponding error counters get incremented (at the master side). Automatically reset to zero after reading.
Unknown if these counters do also increment at the slave side?

  DS Wifi Transmit Control

48080ACh - W_TXREQ_RESET - Reset Transfer Request Bits (W)
  0-3   Reset corresponding bits in W_TXREQ_READ (0=No change, 1=Reset)
  4-15  Unknown (if any)
Firmware writes values 01h,02h,08h,0Dh, and FFFFh.

48080AEh - W_TXREQ_SET - Set Transfer Request Bits (W)
  0-3   Set corresponding bits in W_TXREQ_READ (0=No change, 1=Set)
  4-15  Unknown (if any)
Firmware writes values 01h,02h,05h,08h,0Dh.

48080B0h - W_TXREQ_READ - Get Transfer Request Bits (R)
  0     Send W_TXBUF_LOC1  (1=Transfer, if enabled in W_TXBUF_LOC1.Bit15)
  1     Send W_TXBUF_CMD   (1=Transfer, if enabled in W_TXBUF_CMD.Bit15)
  2     Send W_TXBUF_LOC2  (1=Transfer, if enabled in W_TXBUF_LOC2.Bit15)
  3     Send W_TXBUF_LOC3  (1=Transfer, if enabled in W_TXBUF_LOC3.Bit15)
  4     Unknown (seems to be always 1) (never used by firmware part4)
          Ah, except... Bit4 can be cleared via W_POWERFORCE
  5-15  Unknown/Not used
Bit0-3 can be set/reset via W_TXREQ_SET/W_TXREQ_RESET. The setting in W_TXREQ_READ remains intact even after the transfer(s) have completed.
If more than one of the LOC1,2,3 bits is set, then LOC3 is transferred first, LOC1 last. Beacons are transferred in every Beacon Timeslot (if enabled in W_TXBUF_BEACON.Bit15).
Bit0,2,3 are automatically reset upon IRQ14 (by hardware).

48080B6h - W_TXBUSY (R)
  0     W_TXBUF_LOC1  (1=Requested Transfer busy, or not yet started at all)
  1     W_TXBUF_CMD   (1=Requested Transfer busy, or not yet started at all)
  2     W_TXBUF_LOC2  (1=Requested Transfer busy, or not yet started at all)
  3     W_TXBUF_LOC3  (1=Requested Transfer busy, or not yet started at all)
  4     W_TXBUF_BEACON  (1=Beacon Transfer busy)
  5-15  Unknown (if any)
Busy bits. If all three W_TXBUF_LOC's are sent, then it goes through values 0Dh,05h,01h,00h; ie. LOC3 is transferred first, LOC1 last. The register is updated upon IRQ01 (by hardware).
Bit4 is set only in Beacon Timeslots.
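
The sequence of busy values follows from the transfer order. A minimal sketch (constant and function names are hypothetical; CMD and BEACON are left out because their position in the order is not described here):

```python
LOC1, LOC2, LOC3 = 0x01, 0x04, 0x08    # W_TXBUSY bits 0, 2, 3

def loc_busy_sequence(requested):
    """List the W_TXBUSY values seen while the requested LOC transfers
    complete in the order LOC3, LOC2, LOC1."""
    busy = requested & (LOC1 | LOC2 | LOC3)
    values = [busy]
    for bit in (LOC3, LOC2, LOC1):     # LOC3 first, LOC1 last
        if busy & bit:
            busy &= ~bit
            values.append(busy)
    return values
```

With all three requested, loc_busy_sequence(0Dh) yields 0Dh, 05h, 01h, 00h, matching the values given above.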

48080B8h - W_TXSTAT - RESULT - Status of transmitted frame (R)
For LOC1-3, this register is updated at the end of a transfer (upon the IRQ01 request), if retries occur then it is updated only after the final retry.
For BEACON, this register is updated only if enabled in W_TXSTATCNT.Bit15, and only after successful transfers (since beacon errors result in infinite retries).
For CMD, this register is updated only if enabled in W_TXSTATCNT.Bit13,14.
Bit0/1 act similar to W_IF Bit1/3, however, the W_IF Bits are set after each transmit (including retries).
  0     One (or more) Packet has Completed (1=Yes)
        (No matter if successful, for that info see Bit1)
        (No matter if ALL packets are done, for that info see Bit12-13)
  1     Packet Failed (1=Error)
  2-7   Unknown/Not used
  8-11  Usually 0, ...but firmware is checking for values 03h,08h,0Bh
        (gets set to 07h when transferred W_TXBUF_LOC1/2/3 did have Bit12=set)
        (gets set to 00h otherwise)
        (gets set to 03h after beacons; if enabled in W_TXSTATCNT.Bit15)
        (gets set to 08h or 0Bh after CMD; depending on W_TXSTATCNT.Bit13,14)
  12-13 Packet which has updated W_TXSTAT (0=LOC1/BEACON/CMD, 1=LOC2, 2=LOC3)
  14-15 Unknown/Not used
No idea how to reset bit0/1 once they are set?
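
A W_TXSTAT value can be split into its documented fields like so. The function name and the returned keys are hypothetical; the unknown bits (2-7, 14-15) are ignored.

```python
TXSTAT_SOURCES = {0: "LOC1/BEACON/CMD", 1: "LOC2", 2: "LOC3"}

def decode_txstat(value):
    """Split a 16bit W_TXSTAT value into the fields listed above."""
    return {
        "completed": bool(value & 0x0001),      # bit 0
        "failed": bool(value & 0x0002),         # bit 1
        "code": (value >> 8) & 0xF,             # bits 8-11
        "source": TXSTAT_SOURCES.get((value >> 12) & 0x3, "unknown"),
    }
```

For example, the value 0B01h (written after CMD transmits when W_TXSTATCNT.Bit13 is set) decodes to completed, not failed, code 0Bh.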

4808008h - W_TXSTATCNT (R/W)
  0-12  Unknown (usually zero)
  13    Update W_TXSTAT=0B01h and trigger IRQ01 after CMD transmits  (1=Yes)
  14    Update W_TXSTAT=0800h and trigger IRQ01 after CMD transmits  (1=Yes)
  15    Update W_TXSTAT and trigger IRQ01 after BEACON transmits (0=No, 1=Yes)
If both Bit13 and Bit14 are set, then Bit13 has priority.
Note: LOC1..3 transmits always update W_TXSTAT and trigger IRQ01.

4808194h - W_TX_HDR_CNT - Disable Transmit Header Adjustments (R/W)
  0     IEEE FC.Bit12 and Duration (0=Auto/whatever, 1=Manual/Wifi RAM)
  1     IEEE Frame Check Sequence  (0=Auto/FCS/CRC32, 1=Manual/Wifi RAM)
  2     IEEE Sequence Control      (0=Auto/W_TX_SEQNO, 1=Manual/Wifi RAM)
  3-15  Always zero
Allows disabling the automatic adjustments of the IEEE header and checksum.
Note: W_TX_SEQNO can also be disabled by W_TXBUF_LOCn.Bit13 and by TXHDR[04h].

4808210h - W_TX_SEQNO - Transmit Sequence Number (R)
  0-11   Increments on IRQ07 (Transmit Start Interrupt)
  12-15  Always zero
Also incremented shortly after IRQ12.
When enabled in W_TXBUF_LOCn.Bit13, this value replaces the upper 12bit of the IEEE Frame Header's Sequence Control value (otherwise, when disabled, the original value in Wifi RAM is used, and, in that case, W_TX_SEQNO is NOT incremented).
Aside from W_TXBUF_LOCn.Bit13, other ways to disable W_TX_SEQNO are: Transmit Hardware Header entry TXHDR[04h], and W_TX_HDR_CNT.Bit2.
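
The sequence number handling can be sketched in Python as follows. This is a model of the described behaviour, not verified hardware logic: it assumes that the lower 4 bits (the fragment number) of the Sequence Control halfword in Wifi RAM are preserved, which is one reading of "replaces the upper 12bit". The function name is hypothetical.

```python
def apply_tx_seqno(seq_ctrl_in_ram, tx_seqno, use_ram_value):
    """Return (transmitted Sequence Control, new W_TX_SEQNO).

    use_ram_value models W_TXBUF_LOCn.Bit13=1: the halfword from Wifi RAM
    is sent unchanged and W_TX_SEQNO is NOT incremented.
    """
    if use_ram_value:
        return seq_ctrl_in_ram & 0xFFFF, tx_seqno & 0xFFF
    # Upper 12 bits come from W_TX_SEQNO (ie. W_TX_SEQNO*10h); keeping
    # the lower 4 bits from RAM is an assumption.
    sent = ((tx_seqno & 0xFFF) << 4) | (seq_ctrl_in_ram & 0x000F)
    return sent, (tx_seqno + 1) & 0xFFF  # 12bit counter wraps to 000h

print([hex(v) for v in apply_tx_seqno(0x0003, 0x123, False)])
```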

  DS Wifi Transmit Buffers

4808068h - W_TXBUF_WR_ADDR - Wifi CircBuf Write Address (R/W)
  0     Always zero
  1-12  Halfword Address in RAM for Writes via W_TXBUF_WR_DATA
  13-15 Always zero

4808070h - W_TXBUF_WR_DATA - Wifi CircBuf Write Data (W)
  0-15  Data to be written to address specified in W_TXBUF_WR_ADDR
After writing to this register, W_TXBUF_WR_ADDR is automatically incremented by 2, and, if it then equals W_TXBUF_GAP, it is additionally incremented by W_TXBUF_GAPDISP*2.

4808074h - W_TXBUF_GAP - Wifi CircBuf Write Top (R/W)
  0     Always zero
  1-12  Halfword Address
  13-15 Always zero

4808076h - W_TXBUF_GAPDISP - CircBuf Write Offset from Top to Bottom (R/W)
  0-11  Halfword Offset (added to; if equal to W_TXBUF_GAP)
  12-15 Always zero
Should be "0-write_buffer_size" (wrap from end to begin), or zero (no wrapping).
Caution: On the DS-Lite, after adding it to W_TXBUF_WR_ADDR, the W_TXBUF_GAPDISP setting is destroyed (reset to 0000h) by hardware. The original DS leaves W_TXBUF_GAPDISP intact.

Note: W_TXBUF_GAP and W_TXBUF_GAPDISP may possibly (though not very likely) also be used by transmits via W_TXBUF_LOCn and W_TXBUF_BEACON (not tested).
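
The write-address stepping of the circular buffer can be modelled with the Python sketch below. It assumes that the address wraps within bits 1-12 (mask 1FFEh) and that "0-write_buffer_size" is a 12bit two's complement value counted in halfwords; the function name and the buffer location in the example are made up for illustration.

```python
ADDR_MASK = 0x1FFE  # Bits 1-12 of W_TXBUF_WR_ADDR (Bit0 and Bit13-15 are zero)

def next_wr_addr(wr_addr, gap, gapdisp):
    """Model the W_TXBUF_WR_ADDR update after one W_TXBUF_WR_DATA write."""
    wr_addr = (wr_addr + 2) & ADDR_MASK
    if wr_addr == gap:
        # Skip the gap: add the halfword offset, converted to bytes.
        wr_addr = (wr_addr + gapdisp * 2) & ADDR_MASK
    return wr_addr

# Hypothetical 100h-byte (80h-halfword) buffer at 0400h..04FFh.
gap = 0x0500
gapdisp = (0 - 0x80) & 0xFFF  # "0-write_buffer_size" as 12bit value
print(hex(next_wr_addr(0x04FE, gap, gapdisp)))
```

With these settings a write at 04FEh wraps the address back to 0400h, whereas a GAPDISP of zero would let it continue at 0500h.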

4808080h - W_TXBUF_BEACON - Beacon Transmit Location (R/W)
4808090h - W_TXBUF_CMD - Multiplay Command Transmit Location (R/W)
48080A0h - W_TXBUF_LOC1 - Transmit location 1 (R/W)
48080A4h - W_TXBUF_LOC2 - Transmit location 2 (R/W)
48080A8h - W_TXBUF_LOC3 - Transmit location 3 (R/W)
  0-11  Halfword Address of TX Frame Header in RAM
  12    For LOC1-3: When set, W_TXSTAT.bit8-10 are set to 07h after transfer
                    And, when set, the transferred frame-body gets messed up?
        For BEACON: Unknown, no effect on W_TXSTAT
        For CMD: Unknown, no effect on W_TXSTAT
  13    IEEE Sequence Control (0=From W_TX_SEQNO, 1=Value in Wifi RAM)
        For BEACON: Unknown (always uses W_TX_SEQNO) (no matter of bit13)
  14    Unknown
  15    Transfer Request (1=Request/Pending)
For LOC1..3 and CMD, Bit15 is automatically cleared after (or rather: during?) transfer (no matter if the transfer was successful). For Beacons, bit15 is kept unchanged since beacons are intended to be transferred repeatedly.
The purpose of W_TXBUF_CMD is unknown... maybe for automatic replies...? Pictochat seems to use it for host-to-client data frames. W_TXBUF_CMD.Bit15 can be set ONLY while W_CMD_COUNT is non-zero.
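
A W_TXBUF_LOCn request value can be composed as in this Python sketch. It assumes that the halfword address is simply the byte offset within Wifi RAM divided by two; the function name is hypothetical.

```python
def txbuf_request(byte_offset, use_ram_seq_ctrl=False):
    """Build a W_TXBUF_LOCn value that requests a transfer.

    byte_offset is the (even) offset of the TX frame header in Wifi RAM;
    treating the halfword address as byte_offset/2 is an assumption.
    """
    if byte_offset & 1:
        raise ValueError("TX frame header must be halfword aligned")
    halfword = byte_offset >> 1
    if halfword > 0xFFF:
        raise ValueError("address does not fit into Bit0-11")
    value = 0x8000 | halfword  # Bit15: Transfer Request
    if use_ram_seq_ctrl:
        value |= 0x2000        # Bit13: Sequence Control from Wifi RAM
    return value

print(hex(txbuf_request(0x0AA0)))
```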

48080B4h - W_TXBUF_RESET (W)
  0     Disable LOC1    (0=No change, 1=Reset W_TXBUF_LOC1.Bit15)
  1     Disable CMD     (0=No change, 1=Reset W_TXBUF_CMD.Bit15)
  2     Disable LOC2    (0=No change, 1=Reset W_TXBUF_LOC2.Bit15)
  3     Disable LOC3    (0=No change, 1=Reset W_TXBUF_LOC3.Bit15)
  4-5   Unknown/Not used
  6     Disable REPLY2  (0=No change, 1=Reset W_TXBUF_REPLY2.Bit15)
  7     Disable REPLY1  (0=No change, 1=Reset W_TXBUF_REPLY1.Bit15)
  8-15  Unknown/Not used
Firmware writes values FFFFh, 40h, 02h, xxxx, 09h, 01h, 02h, C0h.

4808084h - W_TXBUF_TIM - Beacon TIM Index in Frame Body (R/W)
  0-7   Location of TIM parameters within Beacon Frame Body
  8-15  Not used/zero
Usually set to 15h, assuming that the preceding Frame Body content is: Timestamp(8), BeaconInterval(2), Capability(2), SuppRatesTagLenParams(4), ChannelTagLenParam(3), TimTagLen(2); so the value points to TimParams (ie. after TimTagLen).
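
The usual value of 15h is just the sum of the field sizes listed above, as this short Python check shows (the list and function names are illustrative):

```python
# Frame body fields that precede the TIM parameters, with sizes in bytes.
BEACON_BODY_FIELDS = [
    ("Timestamp", 8),
    ("BeaconInterval", 2),
    ("Capability", 2),
    ("SuppRatesTagLenParams", 4),
    ("ChannelTagLenParam", 3),
    ("TimTagLen", 2),
]

def tim_index(fields=BEACON_BODY_FIELDS):
    """Offset of TimParams within the beacon frame body."""
    return sum(size for _name, size in fields)

print(hex(tim_index()))
```

A beacon with a different frame body layout would need a correspondingly different W_TXBUF_TIM value.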

480806Ch - W_TXBUF_COUNT (R/W)
  0-11  Decremented on writes to W_TXBUF_WR_DATA
  12-15 Always zero
Triggers IRQ08 when it reaches zero, and does then stay at zero (without further decrementing, and without generating further IRQs).
Note: Not affected by (accidental) reads from write-only W_TXBUF_WR_DATA.

  DS Wifi Transmit Errors

Automatic ACKs
Transmit errors occur on missing ACKs. The NDS hardware automatically responds with an ACK when receiving a packet (if it has been addressed to the recipient's W_MACADDR setting). And, when sending a packet, the NDS hardware automatically checks for ACK responses.
The only exception is packets that are sent to group addresses (ie. Bit0 of the 48bit MAC address being set to "1", eg. Beacons sent to FF:FF:FF:FF:FF:FF); the recipient(s) don't need to respond to such packets, and the sender always passes okay without checking for ACKs.
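
Whether a destination counts as a group address can be tested as in the Python sketch below. Bit0 of the 48bit MAC address is the least significant bit of the first octet; the function name is illustrative.

```python
def is_group_address(mac):
    """True if the MAC address (eg. "FF:FF:FF:FF:FF:FF") is a group address."""
    octets = bytes.fromhex(mac.replace(":", ""))
    if len(octets) != 6:
        raise ValueError("expected a 48bit MAC address")
    return bool(octets[0] & 0x01)

print(is_group_address("FF:FF:FF:FF:FF:FF"))
```

Packets to addresses for which this returns True are not ACKed, so they never produce transmit errors.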

480802Ch - W_TX_RETRYLIMIT (R/W)
Specifies the maximum number of retries on Transmit Errors (eg. 07h means one initial transmit, plus up to 7 retries, ie. max 8 transmits in total).
  0-7   Retry Count (usually 07h)
  8-15  Unknown     (usually 07h)
The Retry Count value is decremented on each Error (unless it is already 00h). There's no automatic reload, so W_TX_RETRYLIMIT should be reinitialized by software prior to each transmit (or, actually, there IS a reload?).
When sending multiple packets (by setting more than one bit with W_TXREQ_SET), then the first packet may eat-up all retries, leaving only a single try to the other packet(s).

48081C0h - W_TX_ERR_COUNT - TransmitErrorCount (R/W)
  0-7   TransmitErrorCount
  8-15  Always zero
Increments on Transmit Errors. Automatically reset to zero after reading.
IRQ03 triggered when W_TX_ERR_COUNT is incremented (for NON-beacons ONLY).
IRQ05 triggered when W_TX_ERR_COUNT > 7Fh (happens INCLUDING for beacons).

Error Notification
Transmit Errors can be sensed via W_TX_ERR_COUNT, IRQ03, IRQ05, TX Hardware Header entry [00h], and W_TXSTAT.Bit1.

As the name says, W_TXBUF_BEACON is intended for sending Beacons to group addresses (which do not require ACK responses). So, transmit errors would occur only when misusing W_TXBUF_BEACON to send packets to individual addresses, but the W_TXBUF_BEACON error handling isn't fully implemented:
First off, W_TX_RETRYLIMIT isn't used; instead, W_TXBUF_BEACON errors will result in infinite retries.
Moreover, W_TXBUF_BEACON errors seem to increment W_TX_ERR_COUNT, but without generating IRQ03; however, IRQ05 is generated when W_TX_ERR_COUNT>7Fh.

Other Errors
The NDS transmit hardware seems to do little error checking on the packet headers. The only known error-checked part is byte [04h] in the TX hardware header (which must be 00h, 01h, or 02h). Aside from that, when sent to a group address, it is passing okay even with invalid IEEE type/subtypes, and even with Length/Rate entries set to zero. However, when sending such data to an individual address, the receiving NDS won't respond by ACKs.

Received ACKs aren't stored in WifiRAM (or, possibly, they ARE stored, but without advancing W_RXBUF_WRCSR, so that the software won't see them, and so that they will be overwritten by the next packet).

  DS Wifi Status

480819Ch - W_RF_PINS - Status of RF-Chip Control Signals (R)
  0    Reportedly "carrier sense" (maybe 1 during RX.DTA?) (usually 0)
  1    TX.MAIN (RFU.Pin17) Transmit Data Phase          (0=No, 1=Active)
  2    Unknown (RFU.Pin3)  Seems to be always high      (Always 1=high?)
  3-5  Not used                                         (Always zero)
  6    TX.ON   (RFU.Pin14) Transmit Preamble+Data Phase (0=No, 1=Active)
         Uhhh, no that seems to be still wrong...
         Bit6 is often set, even when not transmitting anything...
  7    RX.ON   (RFU.Pin15) Receive Mode                 (0=No, 1=Enabled)
  8-15 Not used                                         (Always zero)
Physical state of the RFU board's RX/TX pins. Similar to W_RF_STATUS.

4808214h - W_RF_STATUS - Current Transmit/Receive State (R)
  0-3  Current Transmit/Receive State:
        0 = Initial Value on power-up (before raising W_MODE_RST.Bit0)
        1 = RX Mode enabled (waiting for incoming data)
        2 = Switching from RX to TX (takes a few clock cycles)
        3 = TX Mode active  (sending preamble and data)
        4 = Switching from TX to RX (takes a few clock cycles)
        5 = Unknown, firmware checks for that value (maybe RX busy, or Receive ACK phase?)
        6 = Unknown, firmware checks for that value (maybe RX busy)
        7 = Unknown
        8 = Multiplay related ? (when sending through W_TXBUF_CMD ?)
        9 = Idle (upon IRQ13, and upon raising W_MODE_RST.Bit0)
  4-15 Always zero?
Numeric Status Code. Similar to W_RF_PINS.

4808268h - W_RXTX_ADDR - Current Receive/Transmit Address (R)
  0-11   Halfword address
  12-15  Always zero
Indicates the halfword that is currently being transmitted or received. Can be used by the Start Receive IRQ06 handler to determine how many halfwords of the packet have already been received (allowing portions of the packet header to be pre-examined before the whole packet has been received). Can also be used in the Transmit Start IRQ07 handler to determine which packet is currently being transmitted.

  DS Wifi Timers

48080E8h - W_US_COUNTCNT - Microsecond counter enable (R/W)
  0     Counter Enable (0=Disable, 1=Enable)
  1-15  Always zero
Activates W_US_COUNT, and also W_BEACONCOUNT1 and W_BEACONCOUNT2 (which are decremented when lower 10bit of W_US_COUNT wrap from 3FFh to 000h). Note: W_POWER_US must be enabled, too.

48080F8h - W_US_COUNT0 - Microsecond counter, bits 0-15 (R/W)
48080FAh - W_US_COUNT1 - Microsecond counter, bits 16-31 (R/W)
48080FCh - W_US_COUNT2 - Microsecond counter, bits 32-47 (R/W)
48080FEh - W_US_COUNT3 - Microsecond counter, bits 48-63 (R/W)
  0-63  Counter Value in microseconds (incrementing)
Clocked by the 22.00MHz oscillator on the RFU board (ie. not by the 33.51MHz system clock). The 22.00MHz clock is divided by a 22-step prescaler (giving one increment per microsecond).
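
The four halfword registers form one 64bit value. The Python sketch below shows how they combine, and how the 22-step prescaler relates oscillator ticks to microseconds. The function names are illustrative; on real hardware the four reads are not atomic, which this sketch does not address.

```python
def us_count(count0, count1, count2, count3):
    """Combine W_US_COUNT0..3 (16bit each) into the 64bit counter value."""
    return count0 | (count1 << 16) | (count2 << 32) | (count3 << 48)

def osc_ticks_to_us(ticks):
    """22.00MHz oscillator ticks divided by the 22-step prescaler."""
    return ticks // 22

print(hex(us_count(0x5678, 0x1234, 0, 0)))
print(osc_ticks_to_us(22_000_000))
```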

48080EAh - W_US_COMPARECNT - Microsecond compare enable (R/W)
  0     Compare Enable (0=Disable, 1=Enable) (IRQ14/IRQ15)
  1     Force IRQ14    (0=No, 1=Force Now)   (Write-only)
  2-15  Always zero
Activates IRQ14 on W_US_COMPARE matches, and IRQ14/IRQ15 on W_BEACONCOUNT1.

48080F0h - W_US_COMPARE0 - Microsecond compare, bits 0-15 (R/W)
48080F2h - W_US_COMPARE1 - Microsecond compare, bits 16-31 (R/W)
48080F4h - W_US_COMPARE2 - Microsecond compare, bits 32-47 (R/W)
48080F6h - W_US_COMPARE3 - Microsecond compare, bits 48-63 (R/W)
  0     Always zero... firmware writes 1 though (maybe write-only flag?)
  1-9   Always zero
  10-63 Compare Value in milliseconds (aka microseconds/1024)
Triggers IRQ14 (see IRQ14 notes below) when W_US_COMPARE matches W_US_COUNT.
Usually set to FFFFFFFFFFFFFC00h (ie. almost/practically never). Instead, IRQ14 is usually derived via W_BEACONCOUNT1.
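
Since bits 0-9 of the compare value are always zero, a target time in microseconds is effectively rounded down to a multiple of 1024. A Python sketch of splitting such a value into the four register halfwords (the function name is hypothetical):

```python
def us_compare_halfwords(target_us):
    """Return [W_US_COMPARE0, .., W_US_COMPARE3] for a microsecond target.

    The lower 10 bits are cleared, as those bits always read as zero.
    """
    value = target_us & 0xFFFFFFFFFFFFFC00
    return [(value >> shift) & 0xFFFF for shift in (0, 16, 32, 48)]

print([hex(h) for h in us_compare_halfwords(0xFFFFFFFFFFFFFFFF)])
```

The all-ones input yields the usual "practically never" setting FFFFFFFFFFFFFC00h.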

480811Ch - W_BEACONCOUNT1 (R/W)
Triggers IRQ14 and IRQ15 (see IRQ14/IRQ15 notes below) when it reaches 0000h (if W_PRE_BEACON is non-zero, then IRQ15 occurs that many microseconds in advance).
  0-15  Decrementing Millisecond Counter (reloaded with W_BEACONINT upon IRQ14)
Set to W_BEACONINT upon IRQ14 events (unlike the other W_US_COMPARE related actions, this is done always, even if W_US_COMPARECNT is zero).
When reaching 0000h, it is immediately reloaded (as for US_COUNT matches), so the counting sequence is ..,3,2,1,BEACONINT,.. (not 3,2,1,ZERO,BEACONINT).
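
The counting sequence can be illustrated with a small Python model. It only mirrors the description above (decrement once per millisecond step, immediate reload on reaching zero) and assumes a non-zero start value; the function name is made up.

```python
def beaconcount1_sequence(start, beaconint, ticks):
    """Return (values seen after each millisecond step, number of IRQ14s)."""
    count, values, irq14 = start, [], 0
    for _ in range(ticks):
        count = (count - 1) & 0xFFFF
        if count == 0:
            count = beaconint  # reloaded immediately, 0000h is never seen
            irq14 += 1
        values.append(count)
    return values, irq14

print(beaconcount1_sequence(3, 0xCE, 4))
```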

4808134h - W_BEACONCOUNT2 - Post-Beacon Counter (R/W)
  0-15  Decrementing Millisecond Counter (reloaded with FFFFh upon IRQ14)
Triggers IRQ13 when it reaches 0000h (no matter of W_US_COMPARECNT), and does then stay fixed at 0000h (without any further decrement/wrapping to FFFFh).
Set to FFFFh upon IRQ14 (by hardware), the IRQ14 handler should then adjust the register (by software) by adding the Tag DDh Beacon header's Stepping value (usually 000Ah) to it.
Seems to be used to indicate beacon transmission time (possibly including additional time being reserved for responses)?

480808Ch - W_BEACONINT - Beacon Interval (R/W)
Reload value for W_BEACONCOUNT1.
  0-9   Frequency in milliseconds of beacon transmission
  10-15 Always zero
Should be initialized randomly to 0CEh..0DEh or so. The random setting reduces risk of repeated overlaps with beacons from other hosts.

4808110h - W_PRE_BEACON - Pre-Beacon Time (R/W)
  0-15  Pre-Beacon Time in microseconds (static value, ie. NOT decrementing)
Defines the distance between IRQ15 and IRQ14. The setting doesn't affect the IRQ14 timing (which occurs at the W_BEACONCOUNT1'th millisecond boundary), but IRQ15 occurs in advance (at the W_BEACONCOUNT1'th millisecond boundary minus W_PRE_BEACON microseconds). If W_PRE_BEACON is zero, then both IRQ14 and IRQ15 occur at exactly the same time.

4808088h - W_LISTENCOUNT - Listen Count (R/W)
  0-7   Decremented by hardware at IRQ14 events (ie. once every beacon)
  8-15  Always zero
Reload occurs immediately BEFORE decrement, ie. with W_LISTENINT=04h, it will go through values 03h,02h,01h,00h,03h,02h,01h,00h,etc.

480808Eh - W_LISTENINT - Listen Interval (R/W)
  0-7   Listen Interval, counted in beacons (usually 02h)
  8-15  Always zero
Reload value for W_LISTENCOUNT.
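
The reload-before-decrement behaviour of W_LISTENCOUNT can be modelled as below. This is a Python sketch of the description only; it assumes that the reload happens when the counter is zero at the time of the IRQ14 event, and the function name is hypothetical.

```python
def listencount_sequence(listenint, beacons, start=0):
    """Return the W_LISTENCOUNT values seen after each IRQ14 event."""
    count, values = start, []
    for _ in range(beacons):
        if count == 0:
            count = listenint      # reload from W_LISTENINT first...
        count = (count - 1) & 0xFF  # ...then decrement
        values.append(count)
    return values

print(listencount_sequence(0x04, 8))
```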

480810Ch - W_CONTENTFREE (R/W)
  0-15  Decrementing microsecond counter
Operated always (no matter of W_US_COUNTCNT).
Once it has reached 0000h, it seems to stay fixed at 0000h.
"[Set to the remaining duration of contention-free period when
receiving beacons - only *really* necessary for powersaving mode]"

IRQ13 Notes (Post-Beacon Interrupt)
IRQ13 is generated by W_BEACONCOUNT2. It's simply doing:
  W_IF.Bit13=1      ;interrupt request
If W_POWER_TX.Bit1=0, then additionally enter sleep mode:
  [4808034h]=0002h ;W_INTERNAL   ;(similar to W_POWERFORCE=8001h)
  [480803Ch]=02xxh ;W_POWERSTATE ;(W_TXREQ_READ.Bit4 is kept intact though)
  [480819Ch]=0046h ;W_RF_PINS.7=0;disable receive (enter idle mode) (RX.ON=Low)
  [4808214h]=0009h ;W_RF_STATUS=9;indicate idle mode
Unlike for IRQ14/IRQ15, that's done no matter of W_US_COMPARECNT.

IRQ14 Notes (Beacon Interrupt)
IRQ14 is generated by W_US_COMPARE, and by W_BEACONCOUNT1.
Aside from just setting the IRQ flag in W_IF, the hardware does:
  W_BEACONCOUNT1=W_BEACONINT                             ;next IRQ15/IRQ14
  (Above is NOT done when IRQ14 was forced via W_US_COMPARECNT.Bit1)
If W_US_COMPARECNT is 1, then the hardware does additionally:
  (Below IS ALSO DONE when IRQ14 was forced via W_US_COMPARECNT.Bit1)
  W_BEACONCOUNT2=FFFFh ;about 64 secs (ie. almost never) ;next IRQ13 ("never")
  if W_TXBUF_BEACON.15 then W_TXBUSY.Bit4=1
If W_TXBUF_BEACON.Bit15=1, then following is done shortly after IRQ14:
  W_RF_PINS.Bit7=0  ;disable receive (RX.ON=Low)
  W_RF_STATUS=2     ;indicate switching from RX to TX mode
If W_TXBUF_BEACON.Bit15=1, then following is done a bit later:
  W_RF_PINS.Bit6=1  ;transmit preamble start (TX.ON=High)
  W_RF_STATUS=3     ;indicate TX mode
The IRQ14 handler should then do the following (by software):
  W_BEACONCOUNT2 = W_BEACONCOUNT2 + TagDDhSteppingValue  ;next IRQ13
For using only ONE of the two IRQ14 sources: W_BEACONCOUNT1 can be disabled by setting both W_BEACONCOUNT1 and W_BEACONINT to zero. W_US_COMPARE can be sort of "disabled" by setting it to a value distant from W_US_COUNT, such as compare=count-400h.

IRQ07 Notes (Transmit Start Data; occurs after preamble)
  W_IF.Bit7=1       ;interrupt request
  W_RF_PINS.Bit1=1  ;start data transfer (preamble finished now) (TX.MAIN=High)
Below only if packet was sent through W_TXBUF_BEACON, or if it was sent via W_TXBUF_LOCn, with W_TXBUF_LOCn.Bit13 being zero:
  [TXBUF...] = W_TX_SEQNO*10h   ;auto-adjust IEEE Sequence Control
  W_TX_SEQNO=W_TX_SEQNO+1       ;increase sequence number

IRQ01 Notes (Transmit Done)
The following happens shortly before IRQ01:
  W_RF_PINS.Bit6=0  ;disable TX (TX.ON=Low)
  W_RF_STATUS=4     ;indicate switching from TX to RX mode
Then, upon IRQ01, the following happens:
  W_IF.Bit1=1       ;interrupt request
  W_RF_PINS.Bit1=0  ;disable TX (TX.MAIN=Low)
  W_RF_PINS.Bit7=1  ;enable RX (RX.ON=High)
  W_RF_STATUS=1     ;indicate RX mode

IRQ15 Notes (Pre-Beacon Interrupt)
IRQ15 is generated via W_BEACONCOUNT1 and W_PRE_BEACON. It's simply doing:
  if W_US_COMPARECNT=1 then W_IF.Bit15=1
If W_POWER_TX.Bit0=1, then additionally wakeup from sleep mode:
  W_RF_PINS.Bit7=1  ;enable RX (RX.ON=High) ;\gets set like so a good while
  W_RF_STATUS=1     ;indicate RX mode       ;/after IRQ15 (but not immediately)

Beacon IRQ Sequence
  IRQ15  Pre-Beacon  (beacon will be transferred soon)
  IRQ14  Beacon      (beacon will be transferred very soon) (carrier starts)
  IRQ07  Tx Start    (beacon transfer starts) (if enabled in W_TXBUF_BEACON.15)
  IRQ01  Tx End      (beacon transfer done) (if enabled in W_TXSTATCNT.15)
  IRQ13  Post-Beacon (beacon transferred) (unless next IRQ14 occurs earlier)
The above applies to transmitting beacons. (For receiving, IRQ07/IRQ01 would be replaced by Rx IRQs, provided that a remote unit is sending beacons.)

  DS Wifi Multiplay Master

These registers are used for multiplay host-to-client (aka master to slave) commands.

48080EEh - W_CMD_COUNTCNT (R/W)
  0     Enable W_CMD_COUNT (0=Disable, 1=Enable)
  1-15  Always Zero

4808118h - W_CMD_COUNT (R/W)
  0-15  Decremented once every 10 microseconds (Stopped at 0000h)
Written by firmware. Firmware IRQ14 handler checks for read value<=0Ah.
When it reaches zero, W_TXBUF_CMD is transferred (if enabled in W_TXBUF_CMD.Bit15, and in W_TXREQ_READ.Bit1). It then triggers two (!) transfer start interrupts (IRQ07); transfer end is then indicated by a single IRQ12. Optionally (when enabled in W_TXSTATCNT), IRQ01 (transfer done) is additionally generated (simultaneously with the above IRQ12).
NOPE, above isn't quite right..... when W_CMD_COUNT is set to a very small value, then ONLY IRQ12 is triggered (so it might specify the duration during which the IRQ07's for W_TXBUF_CMD are allowed?)

48080C0h - W_CMD_TOTALTIME - (R/W)
  0-15  Duration per ALL slave response packet(s) in microseconds
Before sending a MASTER packet, this port should be set to the same value as the MASTER packet's IEEE header's Duration/ID entry.

48080C4h - W_CMD_REPLYTIME - (R/W)
  0-15  Duration per SINGLE slave response packet in microseconds
Before sending a MASTER packet, this port should be set to the expected per slave response time.
Note: Nintendo's multiboot/pictochat code is also putting this value in the 1st halfword of the MASTER packet's frame body.

At 2MBit/s transfer rate, the values should be set up roughly as follows:
  master_time = (master_bytes*4)+(60h)     ;60h = 96 decimal = short preamble
  slave_time = (slave_bytes*4)+(0D0h..0D2h)
  all_slave_time = (EAh..F0h)+(slave_time+0Ah)*num_slaves
  txhdr[2]   = slave_bits      ;hardware header (*)
  ieee[2]    = all_slave_time  ;ieee header (duration/id)
  body[0]    = slave_time      ;duration per slave (for multiboot/pictochat)
  body[2]    = slave_bits      ;frame body -- required (*)
  [48080C0h] = all_slave_time  ;
  [48080C4h] = slave_time      ;duration per slave
  [4808118h] = (388h+(num_slaves*slave_time)+master_time+32h)/10
  [4808090h] = 8000h+master_packet_address   ;start transmit
With the byte values counting the ieee frame header+body+fcs.
(*) The hardware doesn't actually seem to use the "slave_bits" entry in the hardware header, instead, it is using the "slave_bits" entry in the frame body(!)
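
The timing formulas above translate directly into the following Python sketch. The function and parameter names are hypothetical; the two overhead parameters default to the lower ends of the documented ranges (0D0h..0D2h and EAh..F0h), and the division by 10 is assumed to round down.

```python
def multiplay_timing(master_bytes, slave_bytes, num_slaves,
                     slave_overhead=0xD0, all_overhead=0xEA):
    """Compute the 2MBit/s multiplay timing values (in microseconds).

    Byte counts include the IEEE frame header, body, and FCS.
    """
    master_time = master_bytes * 4 + 0x60  # 60h = short preamble
    slave_time = slave_bytes * 4 + slave_overhead
    all_slave_time = all_overhead + (slave_time + 0x0A) * num_slaves
    cmd_count = (0x388 + num_slaves * slave_time + master_time + 0x32) // 10
    return {
        "W_CMD_TOTALTIME": all_slave_time,  # [48080C0h], also ieee[2]
        "W_CMD_REPLYTIME": slave_time,      # [48080C4h], also body[0]
        "W_CMD_COUNT": cmd_count,           # [4808118h], 10us units
        "master_time": master_time,
    }

print(multiplay_timing(master_bytes=100, slave_bytes=50, num_slaves=2))
```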

  DS Wifi Multiplay Slave

These registers are used for multiplay client-to-host (aka slave to master) responses.

4808094h - W_TXBUF_REPLY1 - Multiplay Response Transmit Location 1 (R/W)
  0-11  Halfword address
  12-14 Unknown (the bits can be set, ie. they DO exist)
  15    Enable
Response packet address. The register setting probably doesn't directly affect the hardware; its sole purpose seems to be to initialize 4808098h (see there).

4808098h - W_TXBUF_REPLY2 - Multiplay Response Transmit Location 2 (R)
  0-11  Halfword address
  12-14 Unknown (the bits can be set, ie. they DO exist)
  15    Enable
This register seems to contain the actual response packet address. However, since it's read-only, software cannot set it directly. Instead, software must write the address to 4808094h, and then latch it from 4808094h to 4808098h (via W_RXCNT.Bit7).

Not sure if there's also auto-latching (similar to manual W_RXCNT.Bit7)?
Unknown if W_TXBUF_REPLY2.Bit15 is automatically reset after transfer?
Not sure if/how the hardware determines WHEN to send reply packets (eg. it should NOT send them after receiving Beacons). Possibly the Start Receive IRQ handler must examine the incoming packet, and then software must decide if it wants to respond by sending the reply. If there are multiple slaves, the response order is probably automatically handled with respect to the local W_AID_LOW setting (although, if, for example, ONLY slave 5 exists, then it ought to know that slave 5 is the <first> slave; that might happen if slaves 1..4 have left the communication; unless the slaves would be automatically renumbered by software (?), so that slave 5 would become slave 1). Some of the Unknown Registers (namely Ports W_X_244h and W_X_228h) are probably also related to the REPLY function.

  DS Wifi Configuration Ports

4808120h - W_CONFIG_120h (R/W) ;81ff 0048->SAME ...init from firmware[04Ch]
4808122h - W_CONFIG_122h (R/W) ;ffff 4840->SAME ...init from firmware[04Eh]
4808124h - W_CONFIG_124h (R/W) ;ffff 0000->0032 ...init from firmware[05Eh]
4808128h - W_CONFIG_128h (R/W) ;ffff 0000->01F4 ...init from firmware[060h]
4808130h - W_CONFIG_130h (R/W) ;0fff 0142->0140 ...init from firmware[054h]
4808132h - W_CONFIG_132h (R/W) ;8fff 8064->SAME ...init from firmware[056h]
4808140h - W_CONFIG_140h (R/W) ;ffff 0000->E0E0 ...init from firmware[058h]
4808142h - W_CONFIG_142h (R/W) ;ffff 2443->SAME ...init from firmware[05Ah]
4808144h - W_CONFIG_144h (R/W) ;00ff 0042->SAME ...init from firmware[052h]
4808146h - W_CONFIG_146h (R/W) ;00ff 0016->0002 ...init from firmware[044h]
4808148h - W_CONFIG_148h (R/W) ;00ff 0016->0017 ...init from firmware[046h]
480814Ah - W_CONFIG_14Ah (R/W) ;00ff 0016->0026 ...init from firmware[048h]
480814Ch - W_CONFIG_14Ch (R/W) ;ffff 162C->1818 ...init from firmware[04Ah]
4808150h - W_CONFIG_150h (R/W) ;ff3f 0204->0101 ...init from firmware[062h]
4808154h - W_CONFIG_154h (R/W) ;7a7f 0058->SAME ...init from firmware[050h]
These ports are to be initialized from firmware settings.
Above comments show the R/W bits (eg. 81FFh means bit15 and bit8-0 are R/W, bit14-9 are always zero), followed by the initial value on Reset (eg. 0048h), followed by new value after initialization from firmware settings (eg. 0032h, or SAME if the Firmware value is equal to the Reset value), followed by the location in firmware where the new value comes from (these values seem to be identical in all currently existing consoles).
Note: Firmware part4 changes W_CONFIG_124h to C8h, and W_CONFIG_128h to 7D0h, and W_CONFIG_150h to 202h, and W_CONFIG_140h depending on tx rate and preamble:
  W_CONFIG_140h = firmware[058h]+0202h             ;1Mbit/s
  W_CONFIG_140h = firmware[058h]+0202h-6161h       ;2Mbit/s with long preamble
  W_CONFIG_140h = firmware[058h]+0202h-6161h-6060h ;2Mbit/s with short preamble
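
The three W_CONFIG_140h cases can be written as one Python function. This is a sketch of the arithmetic above, assuming the result wraps to 16 bits; the function and parameter names are hypothetical.

```python
def config_140h(fw_058h, rate_mbit, short_preamble=False):
    """Compute W_CONFIG_140h from firmware[058h], tx rate, and preamble."""
    value = fw_058h + 0x0202
    if rate_mbit == 2:
        value -= 0x6161
        if short_preamble:
            value -= 0x6060
    elif rate_mbit != 1:
        raise ValueError("only 1Mbit/s and 2Mbit/s are described")
    return value & 0xFFFF  # 16bit register (wrapping is an assumption)

# firmware[058h]=E0E0h is the value listed in the table above.
for args in ((1, False), (2, False), (2, True)):
    print(hex(config_140h(0xE0E0, *args)))
```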

48080ECh - W_CONFIG_0ECh (R/W) ;firmware writes 3F03h (same as on power-up)
48080D4h - W_CONFIG_0D4h (R/W) ;firmware writes 0003h (affected by W_MODE_RST)
48080D8h - W_CONFIG_0D8h (R/W) ;firmware writes 0004h (same as on power-up)
48080DAh - W_CONFIG_0DAh (R/W) ;firmware writes 0602h (same as on power-up)
4808254h - W_CONFIG_254h (?) ;firmware writes 0000h (read: EEEEh on DS-Lite)
Firmware just initializes these ports with fixed values, without further using them after initialization.

  DS Wifi Baseband Chip (BB)

BB-Chip Mitsumi MM3155 (DS), or BB/RF-Chip Mitsumi MM3218 (DS-Lite)

4808158h - W_BB_CNT - Baseband serial transfer control (W)
  0-7   Index     (00h-68h)
  8-11  Not used  (should be zero)
  12-15 Direction (5=Write BB_WRITE to Chip, 6=Read from Chip to BB_READ)
Transfer is started after writing to this register.

480815Ah - W_BB_WRITE - Baseband serial write data (W)
  0-7   Data to be sent to chip (by following W_BB_CNT transfer)
  8-15  Not used (should be zero)

480815Ch - W_BB_READ - Baseband serial read data (R)
  0-7   Data received from chip (from previous W_BB_CNT transfer)
  8-15  Not used (always zero)

480815Eh - W_BB_BUSY - Baseband serial busy flag (R)
  0     Transfer Busy (0=Ready, 1=Busy)
  1-15  Always zero
Used to sense transfer completion after writes to W_BB_CNT.
Not sure if I am doing something wrong... but the busy flag doesn't seem to get set immediately after W_BB_CNT writes, and works only after waiting a good number of clock cycles?
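
The register values for a BB access can be composed as in the Python sketch below. It only builds the values and lists the port writes in order; the function names are illustrative, and the actual I/O access and the W_BB_BUSY polling are left out.

```python
W_BB_CNT, W_BB_WRITE = 0x4808158, 0x480815A

def bb_cnt_value(index, write):
    """W_BB_CNT value: Direction in Bit12-15 (5=Write, 6=Read), Index in Bit0-7."""
    if not 0 <= index <= 0x68:
        raise ValueError("BB index out of range 00h..68h")
    return ((5 if write else 6) << 12) | index

def bb_write_sequence(index, data):
    """Port writes (address, value) for writing one BB register.

    W_BB_POWER must have been set to 0000h beforehand; afterwards,
    W_BB_BUSY should be polled until the transfer has completed.
    """
    return [(W_BB_WRITE, data & 0xFF), (W_BB_CNT, bb_cnt_value(index, True))]

print([(hex(a), hex(v)) for a, v in bb_write_sequence(0x13, 0x00)])
```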

4808160h - W_BB_MODE (R/W)
  0-7   Always zero
  8     Unknown (usually 1) (no effect no matter what setting?)
  9-13  Always zero
  14    Unknown (usually 0) (W_BB_READ gets unstable when set)
  15    Always zero
This register is initialized by firmware bootcode - don't change.

4808168h - W_BB_POWER (R/W)
  0-3   Disable whatever   (usually 0Dh=disable)
  4-14  Always zero
  15    Disable W_BB_ports (usually 1=Disable)
Must be set to 0000h before accessing BB registers.

Read-Write-Ability of the BB-Chip Mitsumi MM3155 registers (DS)
  Index    Num Dir Expl.
  00h        1 R   always 6Dh (R) (Chip ID)
  01h..0Ch  12 R/W 8bit R/W
  0Dh..12h   6 -   always 00h
  13h..15h   3 R/W 8bit R/W
  16h..1Ah   5 -   always 00h
  1Bh..26h  12 R/W 8bit R/W
  27h        1 -   always 00h
  28h..4Ch  37 R/W 8bit R/W
  4Dh        1 R   always 00h or BFh (depending on other regs)
  4Eh..5Ch  15 R/W 8bit R/W
  5Dh        1 R   always 01h (R)
  5Eh..61h   4 -   always 00h
  62h..63h   2 R/W 8bit R/W
  64h        1 R   always FFh or 3Fh (depending on other regs)
  65h        1 R/W 8bit R/W
  66h        1 -   always 00h
  67h..68h   2 R/W 8bit R/W
  69h..FFh 151 -   always 00h

Read-Write-Ability of the BB/RF-Chip Mitsumi MM3218 (DS-Lite)
Same as above, except that reading always seems to return [5Dh]=00h. And, for whatever reason, Nintendo initializes DS-Lite registers by writing [00h]=03h and [66h]=12h. Nevertheless, the registers always read as [00h]=6Dh and [66h]=00h, ie. same as on original DS.

Important BB Registers
Registers 0..68h are initialized by firmware bootcode, and (most) of these settings do not need to be changed by other programs, except for:
  Addr Initial Meaning
  01h 0x9E    [unsetting/resetting bit 7 initializes/resets the system?]
  02h         unknown (firmware is messing with this register)
  06h         unknown (firmware is messing with this register, too)
  13h 0x00    CCA operation - criteria for receiving
                    0=only use Carrier Sense (CS)
                    1=only use Energy Detection (ED)
                    2=receive if CS OR ED
                    3=receive only if CS AND ED
  1Eh 0xBB    see change channels flowchart (Ext. Gain when RF[09h].bit16=0)
  35h 0x1F    Energy Detection (ED) criteria
              value 0..61 (representing energy levels of -60dBm to -80dBm)

  DS Wifi RF Chip

RF-Chip RF9008 (compatible to RF2958 from RF Micro Devices, Inc.) (Original DS)
BB/RF-Chip Mitsumi MM3218 (DS-Lite)

480817Ch - W_RF_DATA2 - RF chip serial data/transfer enable (R/W)
For Type2 (ie. firmware[040h]<>3):
  0-1   Upper 2bit of 18bit data
  2-6   Index   (00h..1Fh) (firmware uses only 00h..0Bh)
  7     Command (0=Write data, 1=Read data)
  8-15  Should be zero (not used with 24bit transfer)
For Type3 (ie. firmware[040h]=3):
  0-3   Command (5=Write data, 6=Read data)
  4-15  Should be zero (not used with 20bit transfer)
Writing to this register starts the transfer.

480817Eh - W_RF_DATA1 - RF chip serial data (R/W)
For Type2 (ie. firmware[040h]<>3):
  0-15  Lower 16bit of 18bit data
For Type3 (ie. firmware[040h]=3):
  0-7   Data (to be written to chip) (or being received from chip)
  8-15  Index (usually 00h..28h) (index 40h..FFh are mirrors of 00h..3Fh)
This value should be set before setting W_RF_DATA2.

4808180h - W_RF_BUSY - RF chip serial busy flag (R)
  0     Transfer Busy (0=Ready, 1=Busy)
  1-15  Always zero
Used to sense transfer completion after writes to W_RF_DATA2.

4808184h - W_RF_CNT - RF chip serial control (R/W)
  0-5   Transfer length (init from firmware[041h].Bit0-5)
  6-7   Always zero
  8     Unknown         (init from firmware[041h].Bit7)
  9-13  Always zero
  14    Unknown         (usually 0)
  15    Always zero
This register is initialized by firmware bootcode - don't change.
Usually, Type2 has length=24bit and flag=0. Type3 uses length=20bit and flag=1.

Caution For Type2 (ie. firmware[040h]<>3)
Before accessing Type2 RF Registers, first BB[01h] must have been properly initialized (ie. BB[01h].Bit7 must have been toggled from 0-to-1).
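
The W_RF_DATA2/W_RF_DATA1 values for both chip types can be packed as in the following Python sketch. The function names are hypothetical; the bit layouts are those listed above. W_RF_DATA1 should be written first, since writing W_RF_DATA2 starts the transfer.

```python
def rf_type2_regs(index, data, read=False):
    """Return (W_RF_DATA2, W_RF_DATA1) for a Type2 (24bit) transfer."""
    word = (int(read) << 23) | ((index & 0x1F) << 18) | (data & 0x3FFFF)
    return word >> 16, word & 0xFFFF

def rf_type3_regs(index, data, read=False):
    """Return (W_RF_DATA2, W_RF_DATA1) for a Type3 (20bit) transfer."""
    command = 6 if read else 5
    return command, ((index & 0xFF) << 8) | (data & 0xFF)

print([hex(v) for v in rf_type2_regs(0x04, 0x29C03)])
print([hex(v) for v in rf_type3_regs(0x28, 0x55)])
```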

  DS Wifi RF9008 Registers

RF9008 (RF2958 compatible)
2.4GHz Spread-Spectrum Transceiver - RF Micro Devices, Inc.

RF chip data (Type2) (initial NDS settings from firmware, example)
  Firmware   Index   Data
  (24bit)    (4bit)  (18bit)
  00C007h  =  00h  + 0C007h ;-also set to 0C008h for power-down
  129C03h  =  04h  + 29C03h
  141728h  =  05h  + 01728h ;\these are also written when changing channels
  1AE8BAh  =  06h  + 2E8BAh ;/
  1D456Fh  =  07h  + 1456Fh
  23FFFAh  =  08h  + 3FFFAh
  241D30h  =  09h  + 01D30h ;-bit10..14 should be also changed per channel?
  """"50h  =  """  + """50h ;firmware v5 and up uses narrower tx filter
  280001h  =  0Ah  + 00001h
  2C0000h  =  0Bh  + 00000h
  069C03h  =  01h  + 29C03h
  080022h  =  02h  + 00022h
  0DFF6Fh  =  03h  + 1FF6Fh
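
The split of the 24bit firmware values into index and data, as shown in the table above, can be checked with a short Python sketch (the function name is illustrative):

```python
def split_rf_word(word24):
    """Split a 24bit firmware RF value into (index, 18bit data)."""
    return (word24 >> 18) & 0x1F, word24 & 0x3FFFF

for word in (0x00C007, 0x129C03, 0x1AE8BA, 0x0DFF6F):
    index, data = split_rf_word(word)
    print(f"{word:06X}h = {index:02X}h + {data:05X}h")
```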

RF[00h] - Configuration Register 1 (CFG1) (Power on: 00007h)
  17-16 Reserved, program to zero (0)
  15-14 Reference Divider Value (0=Div2, 1=Div3, 2=Div44, 3=Div1)
  3     Sleep Mode Current      (0=Normal, 1=Very Low)
  2     RF VCO Regulator Enable (0=Disable, 1=Enable)
  1     IF VCO Regulator Enable (0=Disable, 1=Enable)
  0     IF VGA Regulator Enable (0=Disable, 1=Enable)

RF[01h] - IF PLL Register 1 (IFPLL1) (Power on: 09003h)
  17    IF PLL Enable                      (0=Disable, 1=Enable)
  16    IF PLL KV Calibration Enable       (0=Disable, 1=Enable)
  15    IF PLL Coarse Tuning Enable        (0=Disable, 1=Enable)
  14    IF PLL Loop Filter Select          (0=Internal, 1=External)
  13    IF PLL Charge Pump Leakage Current (0=Minimum value, 1=2*Minimum value)
  12    IF PLL Phase Detector Polarity     (0=Positive, 1=Negative)
  11    IF PLL Auto Calibration Enable     (0=Disable, 1=Enable)
  10    IF PLL Lock Detect Enable          (0=Disable, 1=Enable)
  9     IF PLL Prescaler Modulus           (0=4/5 Mode, 1=8/9 Mode)
  8-4   Reserved, program to zero (0)
  3-0   IF VCO Coarse Tuning Voltage       (N=Voltage*16/VDD)

RF[02h] - IF PLL Register 2 (IFPLL2) (Power on: 00022h)
  17-16 Reserved, program to zero (0)
  15-0  IF PLL divide-by-N value

RF[03h] - IF PLL Register 3 (IFPLL3) (Power on: 1FF78h)
  17    Reserved, program to zero (0)
  16-8  IF VCO KV Calibration, delta N value (signed)  ;DeltaF=(DN/Fr)
  7-4   IF VCO Coarse Tuning Default Value
  3-0   IF VCO KV Calibration Default Value

RF[04h] - RF PLL Register 1 (RFPLL1) (Power on: 09003h)
  17-10 Same as for RF[01h] (but for RF, not for IF)
  9     RF PLL Prescaler Modulus (0=8/9 Mode, 1=8/10 Mode)
  8-0   Same as for RF[01h] (but for RF, not for IF)

RF[05h] - RF PLL Register 2 (RFPLL2) (Power on: 01780h)
  17-6  RF PLL Divide By N Value
  5-0   RF PLL Numerator Value (Bits 23-18)

RF[06h] - RF PLL Register 3 (RFPLL3) (Power on: 00000h)
  17-0  RF PLL Numerator Value (Bits 17-0)

RF[07h] - RF PLL Register 4 (RFPLL4) (Power on: 14578h)
  17-10 Same as for RF[03h] (but for RF, not for IF) ;and, DN=(deltaF/Fr)*256

RF[08h] - Calibration Register 1 (CAL1) (Power on: 1E742h)
  17-13  VCO1 Warm-up Time  ;TVCO1=(approximate warm-up time)*(Fr/32)
  12-8   VCO1 Tuning Gain Calibration ;TLOCK1=(approximate lock time)*(Fr/128)
  7-3    VCO1 Coarse Tune Calibration Reference  ;VALUE=(average time)*(Fr/32)
  2-0    Lock Detect Resolution (0..7)

RF[09h] - TXRX Register 1 (TXRX1) (Power on: 00120h)
  17    Receiver DC Removal Loop          (0=Enable DC Removal Loop, 1=Disable)
  16    Internal Variable Gain for VGA  (0=Disable/External, 1=Enable/Internal)
  15    Internal Variable Gain Source (0=From TXVGC Bits, 1=From Power Control)
  14-10 Transmit Variable Gain Select (TXVGC)   (0..1Fh = High..low gain)
  9-7   Receive Baseband Low Pass Filter     (0=Wide Bandwidth, 7=Narrow)
  6-4   Transmit Baseband Low Pass Filter    (0=Wide Bandwidth, 7=Narrow)
  3     Mode Switch            (0=Single-ended mode, 1=Differential mode)
  2     Input Buffer Enable TX (0=Input Buffer Controlled by TXEN, 1=By BBEN)
  1     Internal Bias Enable   (0=Disable/External, 1=Enable/Internal)
  0     TX Baseband Filters Bypass        (0=Not Bypassed, 1=Bypassed)

RF[0Ah] - Power Control Register 1 (PCNT1) (Power on: 00000h)
  17-15 Select MID_BIAS Level                          (1.6V through 2.6V)
  14-9  Desired output power at antenna                (N*0.5dBm)
  8-3   Power Control loop-variation-adjustment Offset (signed, N*0.5dB)
  2-0   Desired delay for using a single TX_PE line    (N*0.5us)

RF[0Bh] - Power Control Register 2 (PCNT2) (Power on: 00000h)
  17-12 Desired MAX output power when PABIAS=MAX=2.6V (N*0.5dBm)
  11-6  Desired MAX output power when PABIAS=MID_BIAS (N*0.5dBm)
  5-0   Desired MAX output power when PABIAS=MIN=1.6V (N*0.5dBm)
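
The three 6bit PCNT2 fields all use the same N*0.5dBm scaling, so packing them can be sketched as below. This is only an illustration: the function name is hypothetical, and the fields are assumed to be unsigned (the text above does not mark them as signed).

```python
def encode_pcnt2(max_dbm, mid_dbm, min_dbm):
    """Pack three power limits (in dBm) into an 18bit RF[0Bh] value."""
    def field(dbm):
        n = round(dbm * 2)                 # N = dBm / 0.5
        if not 0 <= n <= 0x3F:             # assumption: unsigned 6bit field
            raise ValueError("power out of range for a 6bit N*0.5dBm field")
        return n
    return (field(max_dbm) << 12) | (field(mid_dbm) << 6) | field(min_dbm)
```

For example, encode_pcnt2(10, 5, 0.5) places N=20 in bits 17-12, N=10 in bits 11-6, and N=1 in bits 5-0.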

RF[0Ch] - VCOT Register 1 (VCOT1) (Power on: 00000h)
  17    IF VCO Band Current Compensation (0=Disable, 1=Enable)
  16    RF VCO Band Current Compensation (0=Disable, 1=Enable)
  15-0  Reserved, program to zero (0)

RF[0Dh..1Ah] - N/A (Power on: 00000h)
  Not used.

RF[1Bh] - Test Register 1 (TEST) (Power on: 0000Fh)
  17-0  This is a test register for internal use only.

RF[1Ch..1Eh] - N/A (Power on: 00000h)
  Not used.

RF[1Fh] - Reset Register (Power on: 00001h)
  17-0  Don't care (writing any value resets the chip)

  DS Wifi Unknown Registers

480800Ah - W_X_00Ah (R/W)
  0-15  Unknown (usually zero)
"[bit7 - ingore rx duplicates]" <--- that is NOT correct (no effect).
Firmware writes 0000h to it, and does so many times. So, presumably some bits in this register get set automatically by hardware in certain situations, otherwise repeatedly writing 0000h to it would be rather useless...?


Below Ports W_X_244h and W_X_228h might be related to deciding when to send multiplay replies...?

4808244h - W_X_244h (R/W) x ffff [0000] (used by firmware part4)
Unknown. Seems to be W_IF/W_IE related. Firmware sets Port 4808244h bits 6,7,12 to 1-then-0 upon IRQ06,IRQ07,IRQ12 respectively.

4808228h - W_X_228h (W) fixx [0000] (used by firmware part4) (bit3)
Unknown. Firmware writes 8-then-0 (done in IRQ06 handler, after Port 4808244h access).


Below Ports 48081A0h, 48081A2h, 48081A4h are somehow related to BB[02h]...

48081A0h - W_X_1A0h - (R/W) -933 [0000]
  0-1   Unknown
  2-3   Always zero
  4-5   Unknown
  6-7   Always zero
  8     Unknown
  9-10  Always zero
  11    Unknown
  12-15 Always zero
Firmware writes values 000h, 823h. Seems to be power-related. The following experimental code toggles RXTX.ON (RFU.Pin4): "x=0 / @@lop: / [48081A0h]=x / [4808036h]=0 / x=x XOR 3 / wait_by_loop(1000h) / b @@lop".
Also, writing to port 48081A0h affects ports 4808034h, 480819Ch, 480821Ch, and 48082A2h.

48081A2h - W_X_1A2h - (R/W) ---3 [0001] (used by firmware part4)
  0-1   Unknown. Firmware writes values 03h, 01h, and VAR.
  2-15  Always zero
Used in combination with Port 48081A0h, so it's probably power-related, too.

48081A4h - W_X_1A4h - (R/W) ffff [0000]
"Rate used when signal test is enabled (0x0A or 0x14 for 1 or 2 mbit)"
(Not too sure if that's correct, there is no visible relation to any rate.)
(This register seems to be R/W only on certain Port 48081A0h settings.)
Unknown. Firmware writes whatever.


4808290h - W_X_290h - (R/W or Disabled)
Reportedly, this is the "antenna" register, which should exist on official devkits, allowing to switch between wired Ethernet, and wireless Wifi mode.
  0     Unknown (R/W) (if present)
  1-15  Not used
On normal NDS release versions, this register seems to be disabled (if it is implemented at all), and trying to read from it acts as for unused registers, ie. reads return FFFFh (or probably 0000h on NDS-lite). The NDS firmware contains code for accessing this port, even in release versions.

All registers marked as "W_INTERNAL" aren't used by Firmware part4, and are probably unimportant, except for whatever special diagnostics purposes.

Wifi DMA
Wifi RAM can be accessed with normal "Start Immediately" DMA transfers (typically by reading through W_RXBUF_RD_DATA, so the DMA automatically wraps from END to BEGIN).
Additionally, DMA0 and DMA2 can be reportedly synchronized to "Wireless Interrupt" (rather than using "Start Immediately" timing), no idea if/how that's working though... and if it gets started on any Wifi IRQ, or only on specific IRQs...?
Possibly some of the above unknown registers, or some unknown bits in other registers, are DMA related...?
Reportedly, early firmwares did use "Wireless Interrupt" DMAs (that'd be firmware v1/v2... or, only earlier unreleased prototype versions?).

  DS Wifi Unused Registers

Wifi WS0 and WS1 Regions in NDS7 I/O Space
Wifi hardware occupies two 32K slots, but most of it is filled with unused or duplicated regions. The timings (waitstates) for WS0 and WS1 are initialized in WIFIWAITCNT (by firmware).
  4800000h-4807FFFh Wifi WS0 Region (32K)
  4808000h-480FFFFh Wifi WS1 Region (32K)
  4810000h-4FFFFFFh Not used (00h-filled)
Structure of the 32K Wifi Regions (WS0 and WS1)
  Wifi-WS0-Region    Wifi-WS1-Region    Content
  4800000h-4800FFFh  4808000h-4808FFFh  Registers
  4801000h-4801FFFh  4809000h-4809FFFh  Registers (mirror)
  4802000h-4803FFFh  480A000h-480BFFFh  Unused
  4804000h-4805FFFh  480C000h-480DFFFh  Wifi RAM  (8K)
  4806000h-4806FFFh  480E000h-480EFFFh  Registers (mirror)
  4807000h-4807FFFh  480F000h-480FFFFh  Registers (mirror)
Wifi Registers (recommended 4808000h-4808FFFh) appear more stable in WS1?
Wifi RAM (recommended 4804000h-4805FFFh) appears more stable in WS0?
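
The region table above can be turned into a small lookup. The following Python sketch classifies an NDS7 address inside the two Wifi slots; the function name and the returned strings are choices made for this example only.

```python
def classify_wifi_address(addr):
    """Return (region, content) for an address in 4800000h..480FFFFh."""
    if not 0x4800000 <= addr <= 0x480FFFF:
        raise ValueError("outside the WS0/WS1 Wifi regions")
    region = "WS1" if addr & 0x8000 else "WS0"   # second 32K slot is WS1
    offset = addr & 0x7FFF                       # offset within the 32K slot
    if offset < 0x1000:
        content = "Registers"
    elif offset < 0x2000:
        content = "Registers (mirror)"
    elif offset < 0x4000:
        content = "Unused"
    elif offset < 0x6000:
        content = "Wifi RAM"
    else:
        content = "Registers (mirror)"
    return region, content
```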

Unused Ports (Original NDS)
Aside from those ports listed in the Wifi I/O Map, all other ports in range 4808000h..4808FFFh are unused. On the original DS, reading from these ports returns FFFFh.

Unused Ports (NDS-Lite)
Reading from unused I/O ports acts as PASSIVE mirror of W_RXBUF_RD_DATA. Exceptions are: Ports 4808188h, and 48082D8h..48082E6h; which always return 0000h.

Unused Memory (Original NDS)
Unused Wifi Memory is at 2000h..3FFFh. On the original DS, reading from that region returns FFFFh.

Unused Memory (NDS-Lite)
Reading from unused memory acts as PASSIVE mirror of WifiRAM, ie. reading from it returns the value most recently read from 4000h..5FFFh. That value is not affected by indirect WifiRAM reads via W_RXBUF_RD_DATA, nor by writes to wifi memory (including writes that overwrite the most recently read value). The mirror works only if WifiRAM is properly enabled, ie. Port 220h.Bits0-1 should be 0.
Moreover, certain addresses are additionally ORed with mirrored I/O Ports. Those addresses are:
  2030h, 2044h, 2056h, 2080h, 2090h, 2094h, 2098h, 209Ch, 20A0h, 20A4h,
  20A8h, 20AAh, 20B0h, 20B6h, 20BAh, 21C0h, 2208h, 2210h, 2244h, 31D0h,
  31D2h, 31D4h, 31D6h, 31D8h, 31DAh, 31DCh, 31DEh.
For example, 2044h is a PASSIVE mirror of WifiRAM, ORed with an ACTIVE mirror of W_RANDOM (Port 044h). Note that some mirrors are at 2000h-2FFFh, and some at 3000h-3FFFh. The W_CMD_STAT mirrors are PASSIVE (but only in the unused memory region; in normal port-mirror regions like 1000h-1FFFh, W_CMD_STAT mirrors are ACTIVE).

Known (W) Mirrors (when reading from Write-only ports)
  Read from (W)           Mirrors to (NDS)       Or to (NDS-Lite)
  078h W_INTERNAL         068h W_TXBUF_WR_ADDR   074h W_TXBUF_GAP
  0ACh W_TXREQ_RESET      09Ch W_INTERNAL        ? (zero)
  0AEh W_TXREQ_SET        09Ch W_INTERNAL        ? (zero)
  0B4h W_TXBUF_RESET      0B6h W_TXBUSY          ? (zero)
  158h W_BB_CNT           15Ch W_BB_READ         ? (zero)
  15Ah W_BB_WRITE         ? (zero)               ? (zero)
  178h W_INTERNAL         17Ch W_RF_DATA2        ? (zero)
  20Ch W_INTERNAL         09Ch W_INTERNAL        ? (zero)
  21Ch W_IF_SET           010h W_IF              010h-OR-05Ch-OR-more?
  228h W_X_228h           ? (zero)               ? (zero)
  298h W_INTERNAL         084h W_TXBUF_TIM       084h W_TXBUF_TIM
  2A8h W_INTERNAL         238h W_INTERNAL        238h W_INTERNAL
  2B0h W_INTERNAL         084h W_TXBUF_TIM       084h W_TXBUF_TIM
Notes: The mirror to W_RXBUF_RD_DATA is a passive mirror.
The DS-Lite mirror at 21Ch consists of several ports ORed with each other (known components are Ports 010h and 05Ch, but there seem to be even more values ORed with it).

Port Mirror Regions
The Wifi Port region at 000h..FFFh is mirrored to 1000h..1FFFh, 6000h..6FFFh, and 7000h..7FFFh. Many of those mirrored ports are PASSIVE mirrors. Eg. reading from 1060h (mirror of Port 060h, W_RXBUF_RD_DATA) returns the old W_RXBUF_RD_DATA value (but without loading a new value from Wifi RAM, and without incrementing W_RXBUF_RD_ADDR). However, other registers, like W_RANDOM, do have ACTIVE mirrors.

  DS Wifi Initialization

Initialization sequence
These events must be done somewhat in sequence. There is some flexibility as to how they can be ordered but it's best to follow this order:
  [4000304h].Bit1 = 1 ;POWCNT2  ;-Enable power to the wifi system
  W_MACADDR = firmware[036h]    ;-Set 48bit Mac address
  reg[012h] = 0000h   ;W_IE     ;-Disable interrupts
Wake Up the wireless system:
  reg[036h] = 0000h ;W_POWER_US ;\clear all powerdown bits
  delay 8 ms                    ; (works without that killer-delay ?)
  reg[168h] = 0000h ;W_BB_POWER ;/
  IF firmware[040h]=02h         ;\
    temp=BB[01h]                ; for wifitype=02h only:
    BB[01h]=temp AND 7Fh        ; reset BB[01h].Bit7, then restore old BB[01h]
    BB[01h]=temp                ; (that BB setting enables the RF9008 chip)
  ENDIF                         ;/
  delay 30 ms                   ;-(more killer-delay now getting REALLY slow)
  call init_sub_functions       ;- same as "Init 16 registers by firmware[..]"
                                ;  and "Init RF registers", below.
                                ;  this or the other one probably not necessary
Init the Mac system:
  reg[004h] = 0000h   - W_MODE_RST       ;set hardware mode
  reg[008h] = 0000h   - W_TXSTATCNT      ;
  reg[00Ah] = 0000h   - ? W_X_00Ah       ;(related to rx filter)
  reg[012h] = 0000h   - W_IE             ;disable interrupts (again)
  reg[010h] = FFFFh   - W_IF             ;acknowledge/clear any interrupts
  reg[254h] = 0000h   - W_CONFIG_254h    ;
  reg[0B4h] = FFFFh   - W_TXBUF_RESET    ;--reset all TXBUF_LOC's
  reg[080h] = 0000h   - W_TXBUF_BEACON   ;disable automatic beacon transmission
  reg[02Ah] = 0000h   - W_AID_FULL       ;\clear AID
  reg[028h] = 0000h   - W_AID_LOW        ;/
  reg[0E8h] = 0000h   - W_US_COUNTCNT    ;disable microsecond counter
  reg[0EAh] = 0000h   - W_US_COMPARECNT  ;disable microsecond compare
  reg[0EEh] = 0001h   - W_CMD_COUNTCNT   ;(is 0001h on reset anyways)
  reg[0ECh] = 3F03h   - W_CONFIG_0ECh    ;
  reg[1A2h] = 0001h   - ?                ;
  reg[1A0h] = 0000h   - ?                ;
  reg[110h] = 0800h   - W_PRE_BEACON     ;
  reg[0BCh] = 0001h   - W_PREAMBLE       ;disable short preamble
  reg[0D4h] = 0003h   - W_CONFIG_0D4h    ;
  reg[0D8h] = 0004h   - W_CONFIG_0D8h    ;
  reg[0DAh] = 0602h   - W_CONFIG_0DAh    ;
  reg[076h] = 0000h   - W_TXBUF_GAPDISP  ;disable gap/skip (offset=zero)
Init 16 registers by firmware[044h..063h]
  reg[146h] = firmware[044h] ;W_CONFIG_146h
  reg[148h] = firmware[046h] ;W_CONFIG_148h
  reg[14Ah] = firmware[048h] ;W_CONFIG_14Ah
  reg[14Ch] = firmware[04Ah] ;W_CONFIG_14Ch
  reg[120h] = firmware[04Ch] ;W_CONFIG_120h
  reg[122h] = firmware[04Eh] ;W_CONFIG_122h
  reg[154h] = firmware[050h] ;W_CONFIG_154h
  reg[144h] = firmware[052h] ;W_CONFIG_144h
  reg[130h] = firmware[054h] ;W_CONFIG_130h
  reg[132h] = firmware[056h] ;W_CONFIG_132h
  reg[140h] = firmware[058h] ;W_CONFIG_140h
  reg[142h] = firmware[05Ah] ;W_CONFIG_142h
  reg[038h] = firmware[05Ch] ;W_POWER_TX
  reg[124h] = firmware[05Eh] ;W_CONFIG_124h
  reg[128h] = firmware[060h] ;W_CONFIG_128h
  reg[150h] = firmware[062h] ;W_CONFIG_150h
Init RF registers
  numbits = BYTE firmware[041h]    ;usually 18h
  numbytes = (numbits+7)/8         ;usually 3
  reg[184h] = (numbits+80h) AND 017Fh  ;W_RF_CNT
  for i=0 to BYTE firmware[042h]-1 ;number of entries (usually 0Ch) (0..0Bh)
   if BYTE firmware[040h]=3
    RF_Write(numbytes at firmware[0CEh+i*numbytes])
Init the BaseBand System
  (this should be not required, already set by firmware bootcode)
  reg[160h] = 0100h  ;W_BB_MODE
  BB[0..68h] = firmware[64h+(0..68h)]
Set Mac address
  copy 6 bytes from firmware[036h] to mac address at 0x04800018  (why again ?)
Now just set some default variables
  reg[02Ch]=0007h  ;W_TX_RETRYLIMIT - XXX needs to be set for every transmit?
  Set channel (see section on changing channels)
  Set Mode 2 -- sets bottom 3 bits of W_MODE_WEP to 2
  Set Wep Mode / key -- Wep mode is bits 3..5 of W_MODE_WEP
  BB[13h] = 00h  ;CCA operation (use only carrier sense, without ED)
  BB[35h] = 1Fh  ;Energy Detection Threshold (ED)
-- To further init wifi to the point that you can properly send
-- and receive data, there are some more variables that need to be set.
  reg[032h] = 8000h -- W_WEP_CNT     ;Enable WEP processing
  reg[134h] = FFFFh -- W_BEACONCOUNT2;reset post-beacon counter to LONG time
  reg[028h] = 0000h -- W_AID_LOW     ;\clear W_AID value, again?!
  reg[02Ah] = 0000h -- W_AID_FULL    ;/
  reg[0E8h] = 0001h -- W_US_COUNTCNT ;enable microsecond counter
  reg[038h] = 0000h -- W_POWER_TX    ;disable transmit power save
  reg[020h] = 0000h -- W_BSSID_0     ;\
  reg[022h] = 0000h -- W_BSSID_1     ; clear BSSID
  reg[024h] = 0000h -- W_BSSID_2     ;/
-- TX prepare
  reg[0AEh] = 000Dh -- W_TXREQ_SET   ;flush all pending transmits (uh?)
-- RX prepare
  reg[030h] = 8000h    W_RXCNT         ;enable RX system (done again below)
  reg[050h] = 4C00h    W_RXBUF_BEGIN   ;(example values)
  reg[052h] = 5F60h    W_RXBUF_END     ;(length = 4960 bytes)
  reg[056h] = 0C00h/2  W_RXBUF_WR_ADDR ;fifo begin latch address
  reg[05Ah] = 0C00h/2  W_RXBUF_READCSR     ;fifo end, same as begin at start.
  reg[062h] = 5F60h-2  W_RXBUF_GAP     ;(set gap<end) (zero should work, too)
  reg[030h] = 8001h    W_RXCNT  ;enable, and latch new fifo values to hardware
  reg[030h] = 8000h    W_RXCNT       enable receive (again?)
  reg[010h] = FFFFh    W_IF          clear interrupt flags
  reg[012h] = whatever W_IE          set enabled interrupts
  reg[1AEh] = 1FFFh    W_RXSTAT_OVF_IE desired STAT Overflow interrupts
  reg[1AAh] = 0000h    W_RXSTAT_INC_IE desired STAT Increase interrupts
  reg[0D0h] = 0181h    W_RXFILTER set to 0x581 when you successfully connect
                        to an access point and fill W_BSSID with a mac
                        address for it. (W_RXFILTER) [not sure on the values
                        for this yet]
  reg[0E0h] = 000Bh  -- W_RXFILTER2     ;
  reg[008h] = 0000h  -- ? W_TXSTATCNT   ;(again?)
  reg[00Ah] = 0000h  -- ? W_X_00Ah      ;(related to rx filter) (again?)
  reg[004h] = 0001h  -- W_MODE_RST      ;hardware mode
  reg[0E8h] = 0001h  -- W_US_COUNTCNT   ;enable microsecond counter (again?)
  reg[0EAh] = 0001h  -- W_US_COMPARECNT ;enable microsecond compare
  reg[048h] = 0000h  -- W_POWER_?    ;[disabling a power saving technique]
  reg[038h].Bit1 = 0 -- W_POWER_TX   ;[this too]
  reg[048h] = 0000h  -- W_POWER_?    ;[umm, it's done again. necessary?]
  reg[0AEh] = 0002h  -- W_TXREQ_SET  ;
  reg[03Ch].Bit1 = 1 -- W_POWERSTATE ;queue enable power (RX power, we believe)
  reg[0ACh] = FFFFh  -- W_TXREQ_RESET;reset LOC1..3
That's it, the DS should be now happy to send and receive packets.
It's very possible that there are some unnecessary registers set in here.
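
The "Init 16 registers by firmware[044h..063h]" step above is a plain table copy, which can be sketched in Python as follows. The names CONFIG_PORTS and config_writes are invented for this example; the firmware image is assumed to be available as a bytes-like object, and the generator only yields (port, value) pairs instead of touching hardware.

```python
# Port offsets in the order listed above (firmware[044h], [046h], ... [062h])
CONFIG_PORTS = [0x146, 0x148, 0x14A, 0x14C, 0x120, 0x122, 0x154, 0x144,
                0x130, 0x132, 0x140, 0x142, 0x038, 0x124, 0x128, 0x150]

def config_writes(firmware):
    """Yield (port, value) for the 16 halfwords at firmware[044h..063h]."""
    for i, port in enumerate(CONFIG_PORTS):
        offset = 0x44 + i * 2
        value = int.from_bytes(firmware[offset:offset + 2], "little")
        yield port, value
```

A caller would then write each value to the matching port, eg. at 4808000h+port.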

  DS Wifi Flowcharts

Wifi Transmit Procedure
To transmit data via wifi (Assuming you've already initialized wifi and changed channels to the channel you want):
 (1) Copy the TX Header followed by the 802.11 packet to send anywhere it
      will fit in MAC memory (halfword-aligned)
 (2) Take the offset from start of MAC memory that you put the packet,
      divide it by 2, and or with 0x8000 - store this in one of the
      W_TXBUF_LOC registers
 (3) Set W_TX_RETRYLIMIT, to allow your packet to be retried until an ack is
      received (set it to 7, or something similar)
 (4) Store the bit associated with the W_TXBUF_LOC register you used
      into W_TXREQ_SET - this will send the packet.
 (5) You can then read the result data in W_TXSTAT when the TX is over
      (you can tell either by polling or interrupt) to find out how many
      retries were used, and if the packet was ACK'd
Of course, this is just the simplest approach, you can be a lot more clever about it.
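
Step (2) above is simple arithmetic; a Python sketch of it follows. The function name is hypothetical, and the range check assumes the 8K Wifi RAM size given in the memory map.

```python
def txbuf_loc_value(byte_offset):
    """Value for a W_TXBUF_LOC register, given a byte offset into MAC memory."""
    if byte_offset & 1:
        raise ValueError("packet must be halfword-aligned")
    if not 0 <= byte_offset < 0x2000:          # 8K of Wifi RAM
        raise ValueError("offset outside Wifi RAM")
    return 0x8000 | (byte_offset >> 1)         # halfword address, ORed with 8000h
```

For example, a packet stored at byte offset 0AA0h gives 8550h.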

Wifi Receive Procedure
To receive data via wifi, you either need to handle the wifi received data interrupt, or you need to poll W_RXBUF_WRCSR - whenever it is != W_RXBUF_READCSR, there is a new packet. When there is a new packet, take the following approach:
 (1) Calculate the length of the new packet (read "received frame length"
      which is +8 bytes from the start of the packet) - total frame length
      is (12 + received frame length) padded to a multiple of 4 bytes.
 (2) Read the data out of the RX FIFO area (keep in mind it's a circular
      buffer and you may have to wrap around the end of the buffer)
 (3) Set the value of W_RXBUF_READCSR to the location of the next packet
      (add the length of the packet, and wrap around if necessary)
Keep in mind, W_RXBUF_READCSR and W_RXBUF_WRCSR must be multiplied by 2 to get a byte offset from the start of MAC memory.
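
Steps (1) and (3) can be sketched in Python as shown below. The function names are invented for this example; begin_off and end_off are assumed to be byte offsets from the start of MAC memory (eg. 0C00h and 1F60h for the example values used in the initialization section), while readcsr is the halfword value as read from W_RXBUF_READCSR.

```python
RXHDR_SIZE = 12   # size of the hardware RX header in bytes

def padded_total_length(frame_length):
    """(12 + received frame length), padded to a multiple of 4 bytes."""
    return (RXHDR_SIZE + frame_length + 3) & ~3

def next_read_cursor(readcsr, frame_length, begin_off, end_off):
    """New W_RXBUF_READCSR value after consuming one packet."""
    pos = readcsr * 2 + padded_total_length(frame_length)  # byte offset
    if pos >= end_off:                                     # circular buffer
        pos = begin_off + (pos - end_off)
    return pos // 2                                        # back to halfwords
```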

Wifi Change Channels Procedure (ch=1..14)
For Type2 or Type5 (ie. firmware[040h]<>3): ;(Type2, used in Original-DS)
  RF[firmware[F2h+(ch-1)*6]/40000h] = firmware[F2h+(ch-1)*6] AND 3FFFFh
  RF[firmware[F5h+(ch-1)*6]/40000h] = firmware[F5h+(ch-1)*6] AND 3FFFFh
  delay a few milliseconds  ;huh?
  IF RF[09h].bit16=0     ;External Gain (default)
   BB[1Eh]=firmware[146h+(ch-1)]                         ;set BB.Gain register
  ELSEIF RF[09h].bit15=0 ;Internal Gain from TXVGC Bits
   RF[09h].Bit10..14 = (firmware[154h+(ch-1)] AND 1Fh)   ;set RF.TXVGC Bits
  ENDIF
For Type3 (ie. firmware[040h]=3): ;(Type3, used in DS-Lite)
  num_initial_regs = firmware[042h]
  num_bb_writes = firmware[addr]
  num_rf_writes = firmware[43h]
  for i=1 to num_bb_writes
    BB[firmware[addr]] = firmware[addr+ch]
  next i
  for i=1 to num_rf_writes
    RF[firmware[addr]] = firmware[addr+ch]
  next i
Congrats, you are now ready to transmit/receive on whatever channel you picked.

The IEEE802.11b standard (and the NDS hardware) support 14 channels (1..14).
Channels 1..13 use frequencies 2412MHz..2472MHz (in 5MHz steps). Channel 14 uses frequency 2484MHz. Which channels are allowed to be used varies from country to country, as indicated by Bit1..14 of firmware[03Ch]. Channel 14 is rarely used (it dates back to an older Japanese standard).
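
The channel-to-frequency mapping and the allowed-channel mask described above can be sketched as follows. Both function names are made up for this example; the mask argument is assumed to be the halfword read from firmware[03Ch].

```python
def channel_frequency_mhz(ch):
    """Center frequency in MHz for channel 1..14."""
    if not 1 <= ch <= 14:
        raise ValueError("channel must be 1..14")
    return 2484 if ch == 14 else 2412 + 5 * (ch - 1)

def allowed_channels(mask):
    """List the channels whose bit (Bit1..14) is set in the mask."""
    return [ch for ch in range(1, 15) if mask & (1 << ch)]
```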

Caution: Nearby channels do overlap, you'll get transmission errors on packets that are transferred simultaneously with packets on nearby channels. But, you won't successfully receive packets from nearby channels (so you won't even "see" that they are there, which is bad, as it doesn't allow you to share the channel synchronized with other hosts; ie. it'd be better if two hosts are using the SAME channel, rather than to use nearby channels).
To avoid that problem, conventionally only channels 1,6,11 are used - however Nintendo uses channels 1,7,13 - which is causing conflicts between channel 6,7, and maybe also between 11,13.

  DS Wifi Hardware Headers

Hardware TX Header (12 bytes) (TXHDR)
The TX header immediately precedes the data to be sent, and should be put at the location that will be given to the register activating a transmission.
  Addr Siz Expl.
  00h  2   Status - In: Don't care - Out: Status (0000h=Failed, 0001h=Okay)
  02h  2   Unknown - In: Don't care
             Bit0: Usually zero.
             Bit1..15 --------> flags for multiboot slaves number 1..15
             (Should be usually zero, except when sending multiplay commands
             via W_TXBUF_CMD. In that case, the slave flags should be ALSO
             stored in the second halfword of the FRAME BODY. Actually, the
             hardware seems to use only that entry (in the BODY), rather than
             using this entry (in the hardware header)).
  04h  1   Unknown - In: Must be 00h..02h (should be 00h)
             (03h..FFh result in error: W_TXSTAT.Bit1 gets set, but
             nevertheless header entry[00h] is kept set to 0001h=Okay)
             ;00h = use W_TX_SEQNO (if enabled in TXBUF_LOCn)
             ;01h = force NOT to use W_TX_SEQNO (even if it is enabled in LOCn)
             ;02h = seems to behave same as 01h
  05h  1   Unknown - In: Don't care - Out: Set to 00h
  06h  2   Unknown - In: Don't care
  08h  1   Transfer Rate (0Ah=1Mbit/s, 14h=2Mbit/s) (other values=1MBit/s, too)
  09h  1   Unknown - In: Don't care
  0Ah  2   Length of IEEE Frame Header+Body+checksum(s) in bytes
           (14bits, upper 2bits are unused/don't care)
The eight "Don't care" bytes should be usually set to zero (although setting them to FFh seems to be working as well). Entries [00h] and [05h] are modified by hardware, all other entries are kept unchanged.

Important note! TX length includes the length of a 4-byte "FCS" (checksum) for the packet. The hardware generates the FCS for you, but you still must include it in the packet length. Also note that if the 802.11 WEP enabled bit is set in the header, the packet will be automatically encrypted via the WEP algorithm; however, the software is responsible for providing the 4-byte IV block with the WEP key ID and the 24bit IV value. ALSO, you must include the length of the *encrypted* FCS used in packets that have WEP enabled (increase the TX length by another 4 bytes). This value is calculated automatically for you, but you are responsible for including it in the length of your packet (if you have data there, it'll be replaced by the FCS).
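
The 12-byte TX header and the length rules above can be sketched in Python as follows. The function name and parameters are hypothetical; ieee_len is assumed to be the size of the IEEE header plus body as prepared by software (including the 4-byte IV block when WEP is used), without any checksum bytes. All "Don't care" entries are set to zero.

```python
import struct

def build_tx_header(ieee_len, rate_mbit=2, wep=False):
    """Build the 12-byte hardware TX header (TXHDR)."""
    rate = {1: 0x0A, 2: 0x14}[rate_mbit]        # 0Ah=1Mbit/s, 14h=2Mbit/s
    length = ieee_len + 4                        # 4-byte FCS is always counted
    if wep:
        length += 4                              # encrypted checksum for WEP
    if length > 0x3FFF:                          # length field is 14 bits
        raise ValueError("frame too long")
    # 00h status, 02h unknown, 04h unknown, 05h unknown, 06h unknown,
    # 08h rate, 09h unknown, 0Ah length
    return struct.pack("<HHBBHBBH", 0, 0, 0, 0, 0, rate, 0, length)
```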

Hardware RX Header (12 bytes) (RXHDR)
The RX header is an informational structure that provides needed information about a received packet. It is written right before the received packet data in the rx circular buffer.
  Addr Siz Expl.
  00h  2   Flags
             Bit0-3: Frame type/subtype:
               0  management/any frame (except beacon and invalid subtypes)
               1  management/beacon frame
               5  control/ps-poll frame
               8  data/any frame (subtype0..7) (ie. except invalid subtypes)
               C,D,E,F  unknown (firmware is checking for those values)
               C    firmware uses it for data/cf-poll frame, FromDs (*)
               D    firmware uses it for data/cf-ack frame, FromDs
               E,F  firmware uses it for data/cf-ack frame, ToDs
               (*) with DA=broadcast
             Bit4:   Seems to be always set
             Bit5-7: Seems to be always zero
             Bit8: Set when FC.Bit10 is set (more fragments)
             Bit9: Set when the lower-4bit of Sequence Control are nonzero,
                   it is also set when FC.Bit10 is set (more fragments)
                   So, probably, it is set on fragment-mismatch-errors
             Bit10-14: Seems to be always zero
             Bit15: Set when Frame Header's BSSID value equals W_BSSID register
  02h  2   Unknown (usually 0040h)
  04h  2   Time since last packet (eg. when receiving beacons: totally random
            for the first few packets, but later it equals the Beacon Interval)
            In other cases, this value is equal to the 1st 2 bytes of the DA ?
            [Above time/da effects might be explained by other reason: maybe
            this entry is left unchanged, simply containing old WifiRAM value?]
  06h  2   Transfer Rate (N*100kbit/s) (ie. 14h for 2Mbit/s)
  08h  2   Length of IEEE Frame Header+Body in bytes (excluding FCS checksum)
  0Ah  1   MAX RSSI
  0Bh  1   MIN RSSI
Important Note: Received frame lengths are always multiples of 4 bytes. While the actual header length + received frame length may be less, when incrementing the read cursor you must pad the length to a multiple of 4 bytes.
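
Reading the RX header back can be sketched like this. The function name and the dictionary keys are invented for this example; only the fields with a documented meaning are decoded, and the rest of the header is ignored.

```python
import struct

def parse_rx_header(data):
    """Decode the 12-byte hardware RX header (RXHDR) from a bytes-like object."""
    flags, _unk, _time, rate, length, rssi_max, rssi_min = \
        struct.unpack_from("<HHHHHBB", data)
    return {
        "frame_type":     flags & 0x000F,          # Bit0-3
        "more_fragments": bool(flags & 0x0100),    # Bit8
        "bssid_match":    bool(flags & 0x8000),    # Bit15
        "rate_kbit":      rate * 100,              # N*100kbit/s
        "length":         length,                  # header+body, without FCS
        "rssi_max":       rssi_max,
        "rssi_min":       rssi_min,
    }
```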

IEEE Header
The above Hardware headers should (must) be followed by valid IEEE headers. Although those headers are to be generated by software, the hardware does do some interaction with the IEEE headers, such as comparing address fields with W_MACADDR and W_BSSID. And, it does modify some entries of it:
1) The sequence control value is replaced by W_TX_SEQNO*10h (when enabled in W_TXBUF_LOCn.Bit13), this replacement does also overwrite the local TXBUF value.
2) The frame control value is modified, namely, the hardware tends to set Bit12 of it. This replacement does NOT modify the local TXBUF, but the remote RXBUF will receive the modified value. Also, Bit0-1 (protocol version) are forcefully set to 0.
3) Transmits via W_TXBUF_BEACON do additionally modify the 64bit timestamp (so W_TXBUF_BEACON should be used ONLY for packets WITH timestamp, ie. Beacons or Probe-Responses). The local TXBUF seems to be left unchanged, but the remote RXBUF will contain the (sender's) W_US_COUNT value.
C) For Control Frames, the hardware headers Length value is transferred as normally (ie. excluding the FCS length, remote RXBUF will contain TXBUF length minus 4), but - no matter of that length value - only 10 or 16 bytes (depending on the subtype) of the IEEE frame are actually transferred and/or stored in RXBUF.
X) For Control Frames with Subtype 0Ah, the AID entry is set to C000h, that, probably ORed with original value in WifiRAM, or with the W_AID_FULL register?
XX) No idea if it's possible to send Control Frames with subtype 0Bh..0Fh, as for now, it seems that either they aren't sent, or the recipient is ignoring them (or processing them internally, but without storing them in RXBUF).

  DS Wifi Multiboot

Available Game Advertisement
WMB uses beacon frames to advertise available games for download. The beacon frames are normally used to advertise available access points in most 802.11 systems, but there is nothing preventing their use in this capacity. The advertisement data is fragmented and stored partially in each beacon frame as the payload of a custom information element (tag: 0xDD).

The DS Download Play menu only lists games when the beacons are broadcast on one of the following channels: 1, 3, 4, 5, 7, 9, 10, 11, 13, and 14 (that is WRONG, firmware_v3 checks only channels 1,7,13). However, the DS hosting mechanism only seems to transmit on channels 1, 7, and 13 (apparently selected at random).

All beacon frames transmitted by a DS host have the following format:
  802.11 management frame
  802.11 beacon header
  Supported rates (tagged IE, advertises 1 Mbit and 2 Mbit)
  DS parameter set (tagged IE, note: Distribution System, not Nintendo DS)
  TIM vector (tagged IE, transmitted as empty)
  Custom extension (tagged IE, tag 0xDD)

Nintendo specific beacon fragment format (information element code 0xDD):
  Offset Description
  00h  Nintendo Beacon ID (00h,09h,BFh,00h)
  04h  Stepping Offset for 4808134h/W_BEACONCOUNT2 (always 000Ah)
  06h  Strange Timestamp (W_US_COUNT*2-VCOUNT*7Fh)/128 (0000h for multiboot)
  08h  01 00
  0Ah  40 00
  0Ch  24 00
  0Eh  40 00
  10h  Randomly generated stream code
  12h  Number of bytes from entry 18h and up (70h for multiboot) (0 if Empty)
  13h  Beacon Type    (0Bh=Multiboot, 01h=Multicart/Pictochat, 09h=Empty)
  14h  0100 0008    (some kind of max,min values?)
For Empty (length zero, is used at very begin of multiboot)
  18h  No data.
For Multicart (variable length)
  18h  Custom data, usually containing the host name, either in 8bit ascii,
       or 16bit unicode format. Sometimes taken from Firmware User Settings,
       and sometimes from Cartridge Backup Memory.
For Pictochat (length 8)
  18h  Fixed (always 2348h)
  1Ah  xxxx
  1Ch  Chatroom number (00h..03h for Chatroom A..D)
  1Dh  Number of users already connected (01h..10h) (including host)
  1Eh  Fixed (always 0004h)
For Multiboot (always 70h bytes)
  18h  24 00 40 00 (varies from game to game)
  1Ch  End of advertisement flag (00 for non-end, 02 for end packets)
  1Dh  Always 00, 01, 02, or 04
  1Eh  Number of players already connected
  1Fh  Sequence number (0 .. total_advertisement_length)
  20h  Checksum (on entries 22h and up)
         chksum=0, for i=22h to 86h step 2, chksum=chksum+halfword[i], next i,
         chksum=FFFFh AND NOT (chksum+chksum/10000h)
  22h  Sequence number in non-final packet, # of players in final packet
  23h  Total advertisement length - 1 (in beacons)
  24h  Datasize in bytes (2 byte little-endian)
           (0062h for seq 0..7, 0048h for seq 8, 0001h for seq 9)
  26h  Data (always 62h bytes, padded with 00h if Datasize<62h)

The advertisement fragments are reordered and assembled according to their internal sequence number, to form the overall advertisement payload, as defined below:
  Offset Size Description
  000h  32  Icon Palette (same as for ROM Cartridge Icon)
  020h  512 Icon Bitmap  (same as for ROM Cartridge Icon)
  220h  1   Unknown (0Bh)
  221h  1   Length of hosting name          ;(probably same as firmware
  222h  20  Name of hosting DS (10 UCS-2)   ;user name?)
  236h  1   Max number of players
  237h  1   Unknown (00h)
  238h  96  Game name (48 UCS-2)   (same as 1st line of ROM Cartridge Title)
  298h  192 Description (96 UCS-2) (same as further lines of ROM Cart Title)
  358h  64  00's if no users are connected  <---WRONG: LEN=1, not 64
  398h  0   End of data if no users are connected

Authentication process
Once a user B chooses a download offered by a host A, the following standard 802.11 authentication process is observed.
  Host A advertises a game in beacon frames as described above
  Client B sends an authentication request (sequence 1) to A
  Host A replies with an ACK
  Host A sends an authentication reply (sequence 2) to B
  Client B replies with an association request
  Host A replies with an ACK
  Host A sends an association response
  Client B responds with an ACK
After this, the two are associated, and will remain so until the transfer is complete or one is idle for several seconds, at which point they will de-associate. For more information on the association process, see the 802.11 standard.

Download process (after authentication)
  Host sends Pings (type 0x01, replies are 0x00, 0x07)
  Host sends RSA frame (type 0x03, replies 0x08)
  Host sends NDS header (type 0x04, replies 0x09)
  Host sends ARM9 binary (type 0x04, replies 0x09)
  Host sends ARM7 binary (type 0x04, replies 0x09)
  Host terminates transfer (type 0x05, no replies)

The WMB protocol ostensibly implements layers 3 to 7 of the OSI network model, but does not define a new type of network addresses. However, it does define a couple of special broadcast-like MAC addresses within the assigned Nintendo namespace (00:09:BF).

The three channels or flows used for all communications after the MAC broadcast beacons take the form 03:09:BF:00:00:xx, where xx is:
  00 for the main data flow, from host to client    (sent via Port 4808090h)
  10 for the client to host replies                 (sent via Port 4808094h)
  03 for the feedback flow, host to client (acknowledges the replies)
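
A receiver can tell the three flows apart by looking at the last byte of the destination address. The following is only a minimal sketch in Python, assuming the address is available as 6 raw bytes; the names wmb_flow and WMB_FLOWS are made up for illustration and are not part of any Nintendo or no$gba interface.

```python
# First five bytes shared by all three WMB flow addresses (03:09:BF:00:00:xx).
WMB_FLOW_PREFIX = bytes([0x03, 0x09, 0xBF, 0x00, 0x00])

# Last address byte -> flow, as listed in the table above.
WMB_FLOWS = {
    0x00: "host-to-client data",
    0x10: "client-to-host reply",
    0x03: "host-to-client ack",
}

def wmb_flow(dest_mac: bytes):
    """Return the flow name for a WMB destination address, or None."""
    if len(dest_mac) != 6 or dest_mac[:5] != WMB_FLOW_PREFIX:
        return None
    return WMB_FLOWS.get(dest_mac[5])
```

Anything that does not match the 03:09:BF:00:00 prefix (eg. broadcast beacons) simply yields None.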

Observed commands:
  Command   Description
  0x01      Ping / Name request
  0x03      RSA signature frame
  0x04      Data packet
  0x05      Post-idle / unknown

Observed replies
  Reply ID  Description
  0x00      Pong (ping reply)
  0x07      Name reply
  0x08      RSA frame reply
  0x09      Data packet reply

The host does something unusual with the 802.11 sequence control field: each packet sent out on the 00 flow has a sequence control number 2 greater than the previous one, even if they are sent sequentially. When the host acknowledges a reply (on flow 03) from the client about a particular packet, it uses the sequence number one after the original packet number it sent out on 00. This is the root of one of the major problems in finding a PC card that can transmit WMB packets, as very few cards provide user control over the sequence control field. Even when a card is capable of 'raw' 802.11 transmission, it typically takes care of the sequence control field in hardware or firmware, filling it with a constantly incrementing number.
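
The numbering scheme can be written down as a small sketch. It assumes that the "sequence control number" refers to the 12bit Sequence Number (bit4-15 of the Sequence Control field) and that it wraps around at 4096; neither is confirmed by the text above, and the function names are hypothetical.

```python
SEQ_MODULO = 1 << 12  # assumption: the 12bit Sequence Number wraps at 4096

def next_host_seq(prev_seq: int) -> int:
    """Sequence number of the next packet on the 00 flow (previous + 2)."""
    return (prev_seq + 2) % SEQ_MODULO

def ack_seq(data_seq: int) -> int:
    """Sequence number the host uses on the 03 flow to acknowledge a reply."""
    return (data_seq + 1) % SEQ_MODULO

def pack_sequence_control(seq: int, fragment: int = 0) -> int:
    """Build the 16bit field: bit0-3 fragment number, bit4-15 sequence number."""
    return ((seq % SEQ_MODULO) << 4) | (fragment & 0xF)
```

So the host occupies every second number on the 00 flow, and the number in between is used for the matching acknowledgement on the 03 flow.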


Host-to-client packets (on the 0x00 flow)
  0  1  2  3  4    5     6..e-3   e-2  e-1  e-0
  06 01 02 00 Size Flags Payload  00   02   00
The first two bytes above are W_CMD_REPLYTIME.
The next two bytes are slave flags (bit1..15 for slave 1..15) (1=connected).
The size field is in terms of half-words (16 bits), and includes the flags byte along with the payload (so a size of 0x03 represents a flag byte, a command byte, and 4 bytes of payload).
When flags is 0x11, the first byte of the payload is a command. There seems to be no important data when flags is not 0x11 (seen occasionally as 0x01), and ignoring them still results in a complete dump.
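
As a rough illustration of the layout above, the following Python sketch splits a host-to-client frame body into its fields. It assumes the two leading 16bit values are little-endian and that the three trailing bytes (00 02 00) directly follow the region counted by the size field; the function name parse_host_packet is made up.

```python
import struct

def parse_host_packet(body: bytes) -> dict:
    """Split a frame body from the 00 flow into header fields and payload."""
    reply_time, slave_flags, size = struct.unpack_from("<HHB", body, 0)
    region = body[5:5 + size * 2]          # size is in halfwords: flags + payload
    if size == 0 or len(region) != size * 2:
        raise ValueError("truncated or empty host packet")
    flags, payload = region[0], region[1:]
    # Only flags=0x11 packets carry a command in the first payload byte.
    command = payload[0] if flags == 0x11 else None
    return {
        "reply_time": reply_time,
        "slave_flags": slave_flags,
        "flags": flags,
        "command": command,
        "payload": payload,
    }
```

For a size of 0x03 the region is 6 bytes long, which gives the flag byte, the command byte, and 4 further payload bytes.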

The Ping messages (type 0x00) have a payload size of 0x03, but always contain zeroes in the payload. They seem to be used only to keep the connection alive while waiting for the host DS to start the transfer, to prevent a time-out de-association.

RSA signature frame payload (type 0x03)
The RSA frame format (type 0x03) sends a table of information about the game being downloaded (most of it redundant with the NDS header, see Appendix), as well as the RSA signature for the DS. I have not looked into computing the signature, as homebrew developers are not privy to Nintendo's private key, making signing a fruitless activity, but it is my understanding that the signature is a 128 byte public key and an 8 byte SHA-1 message digest over the NDS header, ARM9 binary, and ARM7 binary. Notably: the RSA frame itself is not included as part of the data being signed, bringing up various security issues and making Nintendo's firmware engineers look amateurish at best.
There are several abortive sendings of empty RSA frames with a size field of 0x03, before the real frame is sent (always with a size field of 0x75).
  Offset Size Description
  0x00 4   ARM9 execute address
  0x04 4   ARM7 execute address
  0x08 4   0x00
  0x0C 4   Header destination
  0x10 4   Header destination
  0x14 4   Header size (0x160)
  0x18 4   0x00
  0x1C 4   ARM9 destination address
  0x20 4   ARM9 destination address
  0x24 4   ARM9 binary size
  0x28 4   0x00
  0x2C 4   0x022C0000
  0x30 4   ARM7 destination address
  0x34 4   ARM7 binary size
  0x38 4   0x01
  0x3C 136 Signature block
  0xC4 36  0x00's
  0xE8 0   End of frame payload
The offsets in the table are from after the command byte, i.e. two bytes into the 234 bytes of payload including the flags.
The unknown address 0x022C0000 is probably ARM7 related, by comparison with the duplicated header and ARM9 destination addresses 32 and 16 bytes before it, although it has no known significance according to the NDS header.
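
The table maps directly onto fifteen 32bit words followed by the signature block. Below is a minimal Python sketch for unpacking it; the field names are invented for this example (the duplicated and constant fields are simply labelled by offset), and little-endian byte order is assumed as for other Nintendo data.

```python
import struct

# Hypothetical names for the fifteen 32bit fields at 0x00..0x38.
RSA_FIELDS = (
    "arm9_exec", "arm7_exec", "zero_08",
    "header_dest", "header_dest_dup", "header_size", "zero_18",
    "arm9_dest", "arm9_dest_dup", "arm9_size", "zero_28",
    "unknown_2c", "arm7_dest", "arm7_size", "one_38",
)

def parse_rsa_payload(data: bytes) -> dict:
    """Unpack the RSA frame payload (the bytes following the command byte)."""
    if len(data) < 0xE8:
        raise ValueError("RSA frame payload too short")
    info = dict(zip(RSA_FIELDS, struct.unpack_from("<15I", data, 0)))
    info["signature"] = data[0x3C:0xC4]   # 136-byte signature block
    return info
```

The sizes and destination addresses returned here are what a client needs in order to know where the following data packets belong.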

Data packet (type 0x04)
The data packets (type 0x04) include a transport-layer sequence number inside of the data packet itself, but no destination offset or other mechanism to allow the packets to be processed out-of-order. The only way to place the data at the correct location in memory is to re-order the packets according to the sequence number and process them sequentially.
  0  1     2       3   ..  End
  00 [Sequence #]  xx  ..  yy
The sequence number is a zero based little-endian number. Each packet only contains data for one of the three destination blocks (header, ARM9, ARM7), so the change-of-destination check only needs to be made on packet boundaries.
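
Since the packets carry no destination offset, a receiver has to buffer them by sequence number. The following is a sketch of that idea only, assuming the bytes passed in are those after the command byte (as in the layout above); the helper names are hypothetical, and splitting the result into header, ARM9 and ARM7 blocks (using the sizes from the RSA frame) is left out.

```python
def store_data_packet(store: dict, body: bytes) -> int:
    """Remember one data packet; returns its sequence number."""
    if len(body) < 3:
        raise ValueError("data packet too short")
    seq = int.from_bytes(body[1:3], "little")   # zero based, little-endian
    store.setdefault(seq, body[3:])             # keep first copy of a resend
    return seq

def contiguous_data(store: dict) -> bytes:
    """Concatenate packets 0, 1, 2... and stop at the first missing one."""
    out = bytearray()
    seq = 0
    while seq in store:
        out += store[seq]
        seq += 1
    return bytes(out)
```

Packets arriving out of order are kept in the dictionary until the gap before them has been filled.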


Client to Host Replies (on the 0x10 flow)
The replies from client to host are sent on the 0x10 flow. The client uses an incrementing sequence control number for all of its packets, with no unusual trickery. Each reply is sent as a standard 802.11 data frame (typically as a Data + CF-Acknowledgement), consisting of 10 data bytes for the WMB payload. The first two are always 0x04 0x81, with the third byte indicating the type of reply, and the remaining 7 bytes being reply-specific.

Idle / Pong reply (type 0x00)
  0  1  2  3  4  5  6  7  8  9
  04 81 00 00 00 00 00 00 00 00
One type of packet frequently sent before a download gets underway is what I have termed the Idle or Pong packet (in response to 0x00 'Pings'). It has a reply type field of 0x00, and does not contribute any additional information.

Name reply (type 0x07)
  0  1  2  3  4     5      6     7      8     9
  04 81 07 01 [Character0] [Character1] [Character2]
  04 81 07 02 [Character3] [Character4] [Character5]
  04 81 07 03 [Character6] [Character7] [Character8]
  04 81 07 04 [Character9] 01    00     00    00
The name reply (type 0x07) is sent shortly after association is completed, although I am not certain what triggers it. There are a variable number of pings preceding this reply, but most are replied via Pongs. The name reply sends the user-configured DS name (set in the firmware menu) split over four messages (with the 4th byte of the packet specifying which message fragment this is, 1 based). This can be a total length of 10 UCS-2 characters, although all four messages are still sent if it is shorter (padded with nulls to 10 characters, and then 01 and then nulls until the end of the frame).
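
Putting the four fragments back together could look like the sketch below. It assumes each reply is available as the 10 WMB payload bytes and that the UCS-2 characters are little-endian (decoded here via Python's utf-16-le codec); the function name assemble_name is made up.

```python
NAME_REPLY_PREFIX = b"\x04\x81\x07"

def assemble_name(replies):
    """Rebuild the DS user name from name reply fragments 1..4, or None."""
    fragments = {}
    for reply in replies:
        if len(reply) == 10 and reply[:3] == NAME_REPLY_PREFIX and 1 <= reply[3] <= 4:
            fragments[reply[3]] = reply[4:10]   # 6 bytes per fragment
    if len(fragments) < 4:
        return None                             # still waiting for a fragment
    # Fragments 1..3 hold three characters each, fragment 4 holds the tenth.
    raw = fragments[1] + fragments[2] + fragments[3] + fragments[4][:2]
    return raw.decode("utf-16-le").split("\x00")[0]
```

The fragments are indexed by their 1 based number, so they may arrive in any order.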

RSA frame receipt reply (type 0x08)
  0  1  2  3  4  5  6  7  8  9
  04 81 08 xx xx xx xx xx xx xx
The RSA frame receipt reply contains no extra information; it only acknowledges receipt of a type 0x03 host packet on the main flow (0x00). Bizarrely, the xx bytes in the above table are not driven to a particular value when replying to an RSA frame, and usually contain the same data as the second (of four) name response frames.

Data packet receipt reply (type 0x09)
  0  1  2    3    4        5     6     7  8  9
  04 81 09 [Last packet] [Best packet] 00 00 00
[last packet] is the packet number being acknowledged
[best packet] is the highest continuous packet number seen so far
Packet IDs are little-endian numbers, like other Nintendo provided data.
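
The 10-byte reply can be built and checked with a fixed struct layout. This is a small Python sketch of the format shown above, with hypothetical function names.

```python
import struct

# 04 81 09, two little-endian 16bit packet numbers, three zero bytes.
DATA_REPLY = struct.Struct("<BBBHH3x")

def build_data_reply(last_packet: int, best_packet: int) -> bytes:
    """Build a type 0x09 reply acknowledging last_packet."""
    return DATA_REPLY.pack(0x04, 0x81, 0x09, last_packet, best_packet)

def parse_data_reply(reply: bytes):
    """Return (last_packet, best_packet) from a type 0x09 reply."""
    b0, b1, kind, last_packet, best_packet = DATA_REPLY.unpack(reply)
    if (b0, b1, kind) != (0x04, 0x81, 0x09):
        raise ValueError("not a data packet receipt reply")
    return last_packet, best_packet
```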


Host to client acknowledgements (on the 0x03 flow)

These packets contain four data bytes, but three are always zero. The first seems to be random, with no connection to the acknowledged data. The actual indication of acknowledgement is the sequence control number of the packet. It is set to be one greater than the sequence control number of the initial host packet (sent on flow 0x00) that the client has just responded to, to indicate that the reply was received.

Host-to-client acknowledgement
  0  1  2  3
  ?? 00 00 00

The .NDS format is the standard format for Nintendo DS programs; it originated on original game cards and also appears to a limited extent in WMB binaries. The WMB process only transfers the first 0x160 bytes of the header, the ARM9 binary, and the ARM7 binary (in that order), ignoring the file name and file allocation tables, the overlay data, and some information stored in the banner (the rest is transmitted partially via the beacon advertisement process).

  DS Wifi IEEE802.11 Frames

MAC Frame Format
  10..30 bytes    MAC Header
  0..2312 bytes   Frame Body
  4 bytes         Frame Check Sequence (FCS) (aka checksum)

MAC Header (10..30 bytes)
  Size Content
  2    Frame Control Field (FC)
  2    Duration/ID
  6    Address 1
 (6)   Address 2 (if any)
 (6)   Address 3 (if any)
 (2)   Sequence Control (if any)
 (6)   Address 4 (if any)

Frame Control Field (FC)
  Bit  Expl.
  0-1  Protocol Version   (0=Current, 1..3=Reserved)
  2-3  Type               (0=Management, 1=Control, 2=Data, 3=Reserved)
  4-7  Subtype            (see next chapters) (meaning depends on above Type)
  8    To Distribution System (DS)
  9    From Distribution System (DS)
  10   More Fragments
  11   Retry
  12   Power Management   (0=Active, 1=STA will enter Power-Save mode after..)
  13   More Data
  14   Wired Equivalent Privacy (WEP) Encryption (0=No, 1=Yes)
  15   Order
Bit 8-11 and Bit 13-15 are always 0 in Control Frames.
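
The bit assignments above can be decoded as follows. This is just a sketch in Python, assuming the FC field has already been read as a 16bit little-endian value from the first two header bytes; the function and key names are made up.

```python
FRAME_TYPES = ("Management", "Control", "Data", "Reserved")

def decode_frame_control(fc: int) -> dict:
    """Split the 16bit Frame Control field into its sub-fields."""
    return {
        "version": fc & 0x3,
        "type": FRAME_TYPES[(fc >> 2) & 0x3],
        "subtype": (fc >> 4) & 0xF,
        "to_ds": bool(fc & (1 << 8)),
        "from_ds": bool(fc & (1 << 9)),
        "more_fragments": bool(fc & (1 << 10)),
        "retry": bool(fc & (1 << 11)),
        "power_management": bool(fc & (1 << 12)),
        "more_data": bool(fc & (1 << 13)),
        "wep": bool(fc & (1 << 14)),
        "order": bool(fc & (1 << 15)),
    }
```

For example, 0080h decodes as a Management frame with subtype 8 (Beacon), and 00D4h as a Control frame with subtype Dh (ACK).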

Duration/ID Field (16bit)
  0000h..7FFFh  Duration (0-32767)
  8000h         Fixed value within frames transmitted during the CFP
                (CFP=Contention Free Period)
  8001h..BFFFh  Reserved
  C000h         Reserved
  C001h..C7D7h  Association ID (AID) (1..2007) in PS-Poll frames
  C7D8h..FFFFh  Reserved
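
The value ranges above translate into a simple classification. A minimal Python sketch, with a made-up function name and made-up result labels:

```python
def decode_duration_id(value: int):
    """Classify a 16bit Duration/ID value as (kind, number)."""
    if 0x0000 <= value <= 0x7FFF:
        return ("duration", value)
    if value == 0x8000:
        return ("cfp", None)                 # fixed value during the CFP
    if 0xC001 <= value <= 0xC7D7:
        return ("aid", value - 0xC000)       # Association ID 1..2007
    return ("reserved", None)
```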

48bit MAC Addresses
MAC Addresses are 48bit (6 bytes) (Bit0 is the LSB of the 1st byte),
  0     Group Flag (0=Individual Address, 1=Group Address)
  1     Local Flag (0=Universally Administered Address, 1=Locally Administered)
  2-23  22bit Manufacturer ID (assigned by IEEE)
  24-47 24bit Device ID (assigned by the Manufacturer)
Special NDS related Addresses:
  00 09 BF xx xx xx  NDS-Consoles (Original NDS with firmware v1-v5)
  00 16 56 xx xx xx  NDS-Consoles (Newer NDS-Lite with firmware v6 and up)
  00 23 CC xx xx xx  DSi-Consoles (Original DSi with early mainboard; nocash)
  00 24 1E xx xx xx  DSi-Consoles (Another DSi; scanlime)
  03 09 BF 00 00 00  NDS-Multiboot: host to client (main data flow)
  03 09 BF 00 00 10  NDS-Multiboot: client to host (replies)
  03 09 BF 00 00 03  NDS-Multiboot: host to client (acknowledges replies)
  FF FF FF FF FF FF  Broadcast to all stations (eg. Beacons)
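
The Group and Local flags sit in the two lowest bits of the first address byte. The sketch below (Python, hypothetical function name) extracts them; note that the 03 09 BF multiboot addresses are the 00 09 BF Nintendo prefix with both flags set.

```python
def mac_flags(mac: bytes) -> dict:
    """Return the Group/Local flags and the two 3-byte halves of a MAC address."""
    if len(mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return {
        "group": bool(mac[0] & 0x01),   # bit0: 1=Group Address
        "local": bool(mac[0] & 0x02),   # bit1: 1=Locally Administered
        "prefix": mac[:3],              # flags + Manufacturer ID
        "device": mac[3:],              # Device ID
    }
```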

Sequence Control Field
  Bit  Expl.
  0-3  Fragment Number (0=First (or only) fragment)
  4-15 Sequence Number
(increment by 1, except on retransmissions, ie. retries)

WEP Frame Body
  3 bytes     Initialization Vector
  1 byte      Pad (6bit, all zero), Key ID (2bit)
  1..? bytes  Data (encrypted data)
  4 bytes     ICV (encrypted CRC32 across Data)
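
Splitting a WEP frame body into these parts is straightforward. In the Python sketch below, the Key ID is assumed to occupy the upper two bits of the fourth byte (with the six pad bits below it), which is how I understand the 802.11 standard but is not stated in the table above; the function name is made up, and no decryption is attempted.

```python
def split_wep_body(body: bytes) -> dict:
    """Split a WEP frame body into IV, Key ID, encrypted data and ICV."""
    if len(body) < 3 + 1 + 1 + 4:
        raise ValueError("WEP frame body too short")
    return {
        "iv": body[0:3],
        "key_id": body[3] >> 6,   # assumption: Key ID in the upper 2 bits
        "data": body[4:-4],       # still encrypted
        "icv": body[-4:],         # still encrypted
    }
```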

  DS Wifi IEEE802.11 Management Frames (Type=0)

All Management Frames have a 24-byte Frame Header, with the following values:
  FC(2), Duration(2), DA(6), SA(6), BSSID(6), Sequence Control(2)
The content of the Frame Body depends on the FC's Subtype:
  Subtype                   Frame Body
  0 Association request     Capability, ListenInterval, SSID, SuppRates
  1 Association response    Capability, Status, AID, SuppRates
  2 Reassociation request   Capability, ListenInterval, CurrAP, SSID, SuppRates
  3 Reassociation response  Capability, Status, AID, SuppRates
  4 Probe request           SSID, SuppRates
  5 Probe response          Same as for Beacon (but without TIM)
  8 Beacon                  Timestamp,BeaconInterval,Capability,SSID,SuppRates,
                             FH Parameter Set (when using Frequency Hopping),
                             DS Parameter Set (when using Direct Sequence),
                             CF Parameter Set (when supporting PCF),
                             IBSS Parameter Set (when in an IBSS),
                             TIM (when generated by AP)
  9 Announcement traffic indication message (ATIM)    Body is "null" (=none?)
  A Disassociation          ReasonCode
  B Authentication          AuthAlgorithm, AuthSequence, Status, ChallengeText
  C Deauthentication        ReasonCode
Subtypes 6..7, and D..F are Reserved.

The separate components of the Frame Body are...
64bit Parameters (8 bytes)
  Timestamp: value of the TSFTIMER (see 11.1) of a frame's source. Uh?
48bit Parameters (6 bytes)
  Current AP (Access Point): MAC Address of AP with which station is associated
16bit Parameters (2 bytes)
  Capability Information (see list below)
  Status code (see list below) (0000h=Successful, other=Error code)
  Reason code (see list below) (Error code)
  Association ID (AID) (C000h+1..2007)
  Authentication Algorithm (0=Open System, 1=Shared Key, 2..FFFFh=Reserved)
  Authentication Transaction Sequence Number (Open System:1-2, Shared Key:1-4)
  Beacon Interval (Time between beacons, N*1024 us)
  Listen Interval (see note below)
Information elements (1byte ID, 1byte LEN, followed by LEN byte(s) data)
  ID      LEN      Expl.
  00h     00h-20h  SSID (LEN=0 for broadcast SSID)
  01h     01h-08h  Supported rates; each (nn AND 7Fh)*500kbit/s, bit7=flag
  02h     05h      FH (Frequency Hopping) Parameter Set
                     DwellTime(16bit), HopSet, HopPattern, HopIndex
  03h     01h      DS (Direct Sequence) Parameter Set; Channel (01h..0Eh)
  04h     06h      CF Parameter Set; Count, Period, MaxDuration, RemainDuration
  05h     04h..FEh TIM; Count,Period,Control, 1-251 bytes PartialVirtualBitmap
  06h     02h      IBSS Parameter Set; ATIM Window length (16bit)
  07h-0Fh -        Reserved
  10h     02h..FEh Challenge text; 1-253 bytes Authentication data
                    (Used only for Shared Key sequence no 2,3)
                    (none such for Open System)
                    (none such for Shared key sequence no 1,4)
  11h-1Fh -        Reserved for challenge text extension
  20h-FFh -        Reserved
  DDh     var      Reserved but used by Nintendo for NDS-Multiboot beacons
IDs 20h-FFh are commonly used; I've received values 2xh..3xh and DDh (from non-nintendo network routers in the neighborhood); no idea if these "Reserved" IDs are somewhere officially documented?
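
Because every information element uses the same ID/LEN/data layout, unknown IDs (such as DDh) can simply be skipped. A minimal Python sketch of walking the elements, with hypothetical function names; it expects only the element part of the frame body (ie. after any fixed-size parameters such as Timestamp or Capability).

```python
def iter_elements(body: bytes):
    """Yield (id, data) for each information element in body."""
    pos = 0
    while pos + 2 <= len(body):
        element_id, length = body[pos], body[pos + 1]
        data = body[pos + 2:pos + 2 + length]
        if len(data) < length:
            break                      # truncated element, stop parsing
        yield element_id, data
        pos += 2 + length

def rates_kbit(data: bytes):
    """Convert a Supported rates element (ID 01h) to kbit/s values."""
    return [(value & 0x7F) * 500 for value in data]
```

For example, the rate bytes 82h,84h give 1000 and 2000 kbit/s.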

Capability Information
  Bit0    ESS
  Bit1    IBSS
  Bit2    CF-Pollable
  Bit3    CF-Poll Request
  Bit4    Privacy
  Bit5    Short Preamble  (IEEE802.11b only)
  Bit6    PBCC            (IEEE802.11b only)
  Bit7    Channel Agility (IEEE802.11b only)
  Bit5-7  Reserved (0) (original IEEE802.11 specs)
  Bit8-15 Reserved (0)

Listen Interval
  ... used to indicate to the AP how often an STA wakes to listen to Beacon
  management frames. The value of this parameter is the STA's Listen Interval
  parameter of the MLME-Associate.request primitive and is expressed in
  units of Beacon Interval.

Reason codes
  00h Reserved
  01h Unspecified reason
  02h Previous authentication no longer valid
  03h Deauthenticated because sending station is leaving (or has left) IBSS
       or ESS
  04h Disassociated due to inactivity
  05h Disassociated because AP is unable to handle all currently associated
       stations
  06h Class 2 frame received from nonauthenticated station
  07h Class 3 frame received from nonassociated station
  08h Disassociated because sending station is leaving (or has left) BSS
  09h Station requesting (re)association is not authenticated with responding
       station
  0Ah..FFFFh Reserved

Status codes
  00h Successful
  01h Unspecified failure
  02h..09h Reserved
  0Ah Cannot support all requested cap's in the Capability Information field
  0Bh Reassociation denied due to inability to confirm that association exists
  0Ch Association denied due to reason outside the scope of this standard
  0Dh Responding station doesn't support the specified authentication algorithm
  0Eh Received an Authentication frame with authentication transaction sequence
       number out of expected sequence
  0Fh Authentication rejected because of challenge failure
  10h Authentication rejected due to timeout waiting for next frame in sequence
  11h Association denied because AP is unable to handle additional associated
       stations
  12h Association denied due to requesting station not supporting all of the
       data rates in the BSSBasicRateSet parameter
  13h Association denied due to requesting station not supporting
       the Short Preamble option (IEEE802.11b only)
  14h Association denied due to requesting station not supporting
       the PBCC Modulation option (IEEE802.11b only)
  15h Association denied due to requesting station not supporting
       the Channel Agility option (IEEE802.11b only)
  13h-15h Reserved (original IEEE802.11 specs)
  16h..FFFFh Reserved

  DS Wifi IEEE802.11 Control and Data Frames (Type=1 and 2)

Control Frames (Type=1)
All Control Frames have 10-byte or 16-byte headers, depending on the Subtype:
  Subtype                          Frame Header
  A   Power Save (PS)-Poll         FC  AID       BSSID  TA
  B   Request To Send (RTS)        FC  Duration  RA     TA
  C   Clear To Send (CTS)          FC  Duration  RA     -
  D   Acknowledgment (ACK)         FC  Duration  RA     -
  E   Contention-Free (CF)-End     FC  Duration  RA     BSSID
  F   CF-End + CF-Ack              FC  Duration  RA     BSSID
Subtypes 0..9 are Reserved. Control Frames do not have a Frame Body, so the Header is directly followed by the FCS.

Data Frames (Type=2)
All Data Frames consist of the following components:
  FC, Duration/ID, Address 1, Address 2, Address 3, Sequence Control,
  Address 4 (only on From DS to DS), Frame Body, FCS.
The meaning of the 3 or 4 addresses depends on Frame Control FromDS/ToDS bits:
  Frame Control    Address 1  Address 2  Address 3  Address 4
  From STA to STA  DA         SA         BSSID      -
  From DS  to STA  DA         BSSID      SA         -
  From STA to DS   BSSID      SA         DA         -
  From DS  to DS   RA         TA         DA         SA
Frame Control Subtypes for Data Frames (Type=2) are:
  0   Data
  1   Data + CF-Ack
  2   Data + CF-Poll
  3   Data + CF-Ack + CF-Poll
  4   Null function (no data)
  5   CF-Ack (no data)
  6   CF-Poll (no data)
  7   CF-Ack + CF-Poll (no data)
  8-F Reserved
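
The address table for Data Frames can be expressed as a lookup on the ToDS/FromDS bits (FC bit8 and bit9). The following Python sketch uses made-up names and returns None for a missing Address 4.

```python
# (to_ds, from_ds) -> meaning of Address 1..4
ADDRESS_ROLES = {
    (False, False): ("DA", "SA", "BSSID", None),      # From STA to STA
    (False, True):  ("DA", "BSSID", "SA", None),      # From DS  to STA
    (True, False):  ("BSSID", "SA", "DA", None),      # From STA to DS
    (True, True):   ("RA", "TA", "DA", "SA"),         # From DS  to DS
}

def address_roles(fc: int):
    """Return the roles of Address 1..4 for a Data Frame's FC value."""
    to_ds = bool(fc & (1 << 8))
    from_ds = bool(fc & (1 << 9))
    return ADDRESS_ROLES[(to_ds, from_ds)]
```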

  DS Xboo

The DS Xboo cable allows uploading NDS ROM-Images (max 3.9MBytes) to the console via a parallel port connection. It should be the best, simplest, easiest, and fastest way to test code on real hardware. And, at a relatively decent price of 11 cents per diode, it should be by far the least expensive way. You'll have to touch classic tools (screwdrivers, knives, saws, tweezers, and solder), which will probably scare most of you to hell.

DS XBOO Connection Schematic
  Console Pin/Names             Parallel Port Pin/Names
  RFU.9    FMW.1 D    ---|>|--- DSUB.14    CNTR.14    AutoLF
  RFU.6    FMW.2 C    ---|>|--- DSUB.1     CNTR.1     Strobe
  RFU.10   FMW.3 /RES ---|>|--- DSUB.16    CNTR.31    Init
  RFU.7    FMW.4 /S   ---|>|--- DSUB.17    CNTR.36    Select
  RFU.5    FMW.5 /W   --. SL1A  -          -          N.C.
  RFU.28   FMW.6 VCC  __| SL1B  -          -          N.C.
  RFU.2,12 FMW.7 VSS  --------- DSUB.18-25 CNTR.19-30 Ground
  RFU.8    FMW.8 Q    --------- DSUB.11    CNTR.11    Busy
  P00 Joypad-A        ---|>|--- DSUB.2     CNTR.2     D0
  P01 Joypad-B        ---|>|--- DSUB.3     CNTR.3     D1
  P02 Joypad-Select   ---|>|--- DSUB.4     CNTR.4     D2
  P03 Joypad-Start    ---|>|--- DSUB.5     CNTR.5     D3
  P04 Joypad-Right    ---|>|--- DSUB.6     CNTR.6     D4
  P05 Joypad-Left     ---|>|--- DSUB.7     CNTR.7     D5
  P06 Joypad-Up       ---|>|--- DSUB.8     CNTR.8     D6
  P07 Joypad-Down     ---|>|--- DSUB.9     CNTR.9     D7
  RTC.1 INT aka SI    --------- DSUB.10    CNTR.10    /Ack
Parts List: 15 wires, four (DS) or twelve (DS-Lite) "BAT 85" diodes, 1 parallel port socket.

DS XBOO Connection Notes
The Firmware chip (FMW.Pins) hides underneath the RFU shielding plate, so it'd be easier to connect the wires to the RFU.Pins (except DS-Lite: the RFU pins are terribly small (and have different pin-numbers), so either using FMW.Pins, or using mainboard vias (see below GIF) would be easier). The easiest way for the /W-to-VCC connection is to shortcut SL1 by putting some solder onto it.
The P00..P07 and INT signals are labeled on the switch-side of the mainboard; however, there should be more room for the cables when connecting them to vias at the bottom-side (except DS-Lite: P01 is found only at switch-side). The image below may help to locate those pins (GIF-Image, 7.5KBytes).
At the parallel port side, DSUB.Pins or CNTR.Pins can be used for 25pin DSUB or 36pin Centronics sockets, the latter allowing the use of a standard printer cable.
The ring printed on the diodes points towards the parallel port side. The 4 diodes are required to prevent the parallel port from pulling up LOW levels on the NDS side. Be sure to use BAT85 diodes; cheaper ones like 1N4148 lose too much voltage and won't give stable LOW levels.
The power management chip in the DS-Lite simply refuses to react to the Power-On button when P00..P07 are dragged high by the parallel port (even if it is in HighZ state); the 8 diodes in the data-lines solve that problem (they are required on DS-Lite only, not on original DS).

DS XBOO Operation Notes
The main Upload function is found in no$gba Utility menu, together with further functions in Remote Access sub-menu.
Before uploading anything: download the original firmware. The file is saved as FIRMnnnn.BIN, where "nnnn" is equal to the last 16bit of the console's 48bit MAC address, so firmware images from different consoles have unique filenames. If you haven't already done so, also download the NDS BIOS; the BIOS contains encryption seed data required to encrypt/decrypt the secure area, and without it no$gba will work only with unencrypted ROM-images. Next, select Patch Firmware to install the nocash firmware.
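
For scripts that need to find the right backup file, the naming rule can be sketched as below. This assumes that "nnnn" is written as four uppercase hex digits in the same byte order as the MAC address is printed; the text above does not spell that out, so treat it as a guess. The function name is hypothetical.

```python
def firmware_backup_name(mac: bytes) -> str:
    """Guess the FIRMnnnn.BIN filename from a 6-byte MAC address."""
    if len(mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    # Assumption: nnnn = last two MAC bytes as uppercase hex digits.
    return "FIRM%02X%02X.BIN" % (mac[4], mac[5])
```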

DS XBOO Troubleshooting
Be sure that the console is switched on, that the XBOO cable is connected, and that you have selected the correct parallel port in no$gba setup (the "multiboot" options in Various Options screen), and, of course, try to avoid fiddling with the joypad during uploads.
I've tested the cable on two computers, the overall upload/download stuff should work stable. The firmware access functions - which are required only for (un-)installation - worked only with one of the two computers; try using a different computer/parallel port in case of problems.

Nocash Firmware
The primary purpose is to receive uploaded NDS-images via parallel port connection; additionally it contains bootmenu and setup screens similar to the original firmware. The user interface has less cryptic symbols and should be altogether faster and easier to use. Important Information about Whatever is supported (but it can be disabled). The setup contains a couple of additional options like automatic daylight saving time adjustment.
The bootmenu allows booting normal NDS and GBA carts, and additionally allows booting NDS-images (or older PassMe-images) from flashcards in the GBA slot. Furthermore, thanks to the benefits of asm coding, the nocash firmware occupies less than 32KBytes, allowing smaller NDS-images to be stored (and booted) in the unused portion of the firmware memory (about 224KBytes). The zero-filled region between cart header and secure area, at 200h..3FFFh, is automatically excluded, so the image may be slightly bigger than the available free memory space.

Unlike the original firmware, the current version cannot yet boot via WLAN.

  DSi Reference

Basic Hardware Features (mostly same as NDS)
NDS Reference
DSi Basic Differences to NDS

New Hardware Features
DSi I/O Map
DSi Control Registers (SCFG)
DSi XpertTeak (DSP)
DSi New Shared WRAM (for ARM7, ARM9, DSP)
DSi SoundExt
DSi Advanced Encryption Standard (AES)
DSi Cartridge Header
DSi Touchscreen/Sound Controller
DSi I2C Bus
DSi Cameras
DSi SD/MMC Protocol and I/O Ports
DSi SD/MMC Filesystem
DSi Atheros Wifi SDIO Interface
DSi Atheros Wifi Internal Hardware
DSi GPIO Registers
DSi Console IDs
DSi Unknown Registers
DSi Notes
DSi Exploits
DSi Regions

General Info
ARM CPU Reference
BIOS Functions
External Connectors

Credits: (now spammed)

  DSi Basic Differences to NDS

There are several new interrupt sources in IE/IF registers, plus further ones in new IE2/IF2 registers.
DS Interrupts

The video hardware is essentially the same as for NDS. Some details can be changed in SCFG_EXT. For the 2D Engine, DISPSTAT.Bit6 contains a new "LCD Initialization Ready" flag on both ARM7 and ARM9 side (the bit is checked by DSi System Menu) (the bit is supposedly used at power-up, maybe also for wake-up from certain sleep modes).

BIOS SWI Functions
Some SWI Functions are changed (bugged in some cases), new SHA1 and RSA functions are added, and the initial RAM contents are moved from 27FFxxxh to 2FFFxxxh (with some extra fields, eg. a copy of extended DSi cart header).
BIOS Functions

Revised Hardware Functions
Some hardware features have been slightly revised (for example, the division by 0 flag was fixed). The revised functions can be enabled/disabled via SCFG registers.
DSi Control Registers (SCFG)

NDS Slot / Cartridges
DSi carts are using an extended cart header (1000h bytes), with RSA signature (making it problematic to run unlicensed/homebrew code), the icon/title format has been also extended, and the cartridge protocol contains a new command (command 3Dh, for unlocking extra DSi regions on the cartridge, and for reading new DSi secure area blocks).
The NDS Slot's Reset signal can be controlled by software (required because otherwise one could use only command 3Ch or 3Dh, but not both). The Power supply pin can also be controlled by software (yet not 100% confirmed how?). Moreover, there's a new cartridge inserted sensor. And, DSi prototypes did have two NDS slots; DSi retail consoles have only one slot, but they still contain prototype relics internally (like extra registers and extra irq sources for a second slot) (there also appear to be unused extra pins on the CPU, but they couldn't be used without desoldering the whole chip).

Enable Bits
One new DSi invention is that setting Enable Bits (eg. for NDMA or CAM registers) is write-protecting the corresponding registers (ie. those registers can be initialized only while the Enable Bits are off).

SPI Touchscreen Controller
This chip works entirely differently in DSi mode. It's still accessed via SPI bus, but with some new MODE/INDEX values.
DSi Touchscreen/Sound Controller
The NDS Touchscreen controller did additionally allow to read Temperature and Touchscreen Pressure - unknown if the DSi is also supporting such stuff (via whatever DSi-specific registers).
The touchscreen hardware can be switched to NDS compatibility mode (for older games), but unknown how to do that.

SPI Power Management Device
The Power Management Device contains some changed registers, and some new extra registers. Internally, it is actually split into two devices: the power management chipselect signal connects to the U3 and U4 chips. Ie. some SPI registers are processed by U3 (power down, and backlight enable), and others by U4 (audio out and microphone).
Further functions like LED control and backlight brightness are moved to the BPTWL chip, accessed via I2C bus instead of SPI bus - the power LED blink feature (which was used on Wifi access) seems to be no longer working, however, the Wifi LED seems to blink automatically on Wifi access; the changed backlight brightness mechanism shouldn't cause compatibility issues since that feature is somewhat reserved for being controlled by the firmware.

SPI FLASH Memory (Wifi Calibration, User Settings, Firmware)
This memory does still exist, but it's only 128Kbytes in DSi (instead of 256K), and most of it is empty (since the DSi Firmware is stored in the eMMC chip).
Reportedly, newer DSi consoles are somehow disabling access to the FLASH memory and to the Wifi hardware (unknown if that's true; it might actually disable both the SPI chipselect and Wifi hardware, or there might be just some issue with a different/smaller FLASH chip being used - and problems with reading the Wifi Calibration would indirectly cause problems to use the Wifi hardware).

The RTC should be compatible with NDS. But it seems to contain extra registers?
One of the RTC outputs does also seem to supply some (32kHz?) clock to some other mainboard components?
[XXX see Seiko S-35199A01 datasheet].

The Wifi hardware supports new WPA and WPA2 encryption (unknown how to use them yet), and supports higher transfer rates (also unknown how yet, maybe just by changing the 10=1Mbit/s and 20=2MBit/s settings to higher values).
As said above (see SPI FLASH), the Wifi hardware can be reportedly disabled.
SPI FLASH contains three new access point settings (for WPA/WPA2/proxy support):
DS Firmware Wifi Internet Access Points
The access point configuration can be done via Firmware (unlike as on NDS, where it needed to be done by the games).

GBA Slot
The GBA Slot has been removed. The memory regions and IRQ bits do still exist internally, but the DSi does basically behave as if there is no GBA cartridge inserted. Reading GBA ROM areas does return FFFFh halfwords instead of the usual open bus values though.

NDS Mode
In NDS mode, the DSi is basically working same as NDS: The new extra hardware is disabled, original NDS BIOS ROMs are mapped, and the hardware is simulating the old touchscreen controller.
Nonetheless, there are still some small differences to real NDS consoles:
- Unlicensed NDS carts don't work (requires RSA, or whitelist for older games)
- GBA Slot is removed (more or less behaves as if no cart inserted)
- DSi ports 4004700h and 4004C0xh can be read (and written?) even in DS mode
- SPI Power Management has some added/removed/changed registers
- SPI Touchscreen controller doesn't support pressure & temperature
- SPI FLASH exists, but it's smaller, and has extra access point info, etc.
- ARM7 BIOS has only first 20h bytes locked (instead first 1204h bytes)
- Power Button issues a Reset (goes to boot menu) (instead of plain power off)
- Maybe some new Wifi features are available even in DS mode (?)
- RTC extra registers (if they do really exist) should exist in DS mode (?)
Unknown: does hot-swapping auto-power-off the nds-cart-slot in nds mode?

  DSi I/O Map

DSi Memory Map
The overall memory map is same as on NDS. New/changed areas are:
  0000000h  64Kbyte ARM7 BIOS   (unlike NDS which had only 16KB)
  2000000h  16MByte Main RAM    (unlike NDS which had only 4MB)
  3000000h  800Kbyte Shared RAM (unlike NDS which had only 32KB)
  4004000h  New DSi I/O Ports
  8000000h  Fake GBA Slot (32MB+64KB) (FFh-filled; when mapped to current CPU)
  C000000h  Mirror of 16Mbyte Main RAM
  D000000h  Open Bus? in retail version, Extra 16Mbyte MainRAM in debug version
  FFFF000h  64Kbyte ARM9 BIOS   (unlike NDS which had only 4KB)

DSi I/O Maps
The overall DSi I/O Maps are same as on NDS,
DS I/O Maps
additional new/changed registers are:

ARM9 NDS Registers that are changed in DSi mode
  4000004h 2   DISPSTAT (new Bit6, LCD Initialization Ready Flag)
  4000204h 2   EXMEMCNT (removed Bit0-7, ie. the GBA-slot related bits)
  4000210h 4   IE       (new interrupt sources, removed GBA-slot IRQ)
  4000214h 4   IF       (new interrupt sources, removed GBA-slot IRQ)
  40021A0h 4   Unknown, nonzero, probably same/similar as on DSi7 side
  40021A4h 4   Unknown, zero, probably same/similar as on DSi7 side
ARM9 DSi Control
  4004000h 2   SCFG_A9ROM DSi - NDS9 - ROM Status (R) [0000h]
  4004004h 2   SCFG_CLK   DSi - NDS9 - New Block Clock Control (R/W)
  4004006h 2   SCFG_RST   DSi - NDS9 - New Block Reset (R/W)
  4004008h 4   SCFG_EXT   DSi - NDS9 - Extended Features (R/W)
  4004010h 2   SCFG_MC    Memory Card Interface Status (16bit) (undocumented)
ARM9 DSi WRAM Bank Control
  4004040h 4   MBK1       WRAM-A Slots for Bank 0,1,2,3  ;\Global ARM7+ARM9
  4004044h 4   MBK2       WRAM-B Slots for Bank 0,1,2,3  ; Slot Mapping
  4004048h 4   MBK3       WRAM-B Slots for Bank 4,5,6,7  ; (R or R/W, depending
  400404Ch 4   MBK4       WRAM-C Slots for Bank 0,1,2,3  ; on MBK9 setting)
  4004050h 4   MBK5       WRAM-C Slots for Bank 4,5,6,7  ;/
  4004054h 4   MBK6       WRAM-A Address Range           ;\Local ARM9 Side
  4004058h 4   MBK7       WRAM-B Address Range           ; (R/W)
  400405Ch 4   MBK8       WRAM-C Address Range           ;/
  4004060h 4   MBK9       WRAM-A/B/C Slot Master Selection (R)
  4004100h 4   NDMAGCNT   NewDMA Global Control                     ;-Control
  4004104h 4   NDMA0SAD   NewDMA0 Source Address                    ;\
  4004108h 4   NDMA0DAD   NewDMA0 Destination Address               ;
  400410Ch 4   NDMA0TCNT  NewDMA0 Total Length for Repeats          ; NewDMA0
  4004110h 4   NDMA0WCNT  NewDMA0 Logical Block Size                ;
  4004114h 4   NDMA0BCNT  NewDMA0 Block Transfer Timing/Interval    ;
  4004118h 4   NDMA0FDATA NewDMA0 Fill Data                         ;
  400411Ch 4   NDMA0CNT   NewDMA0 Control                           ;/
  4004120h 4   NDMA1SAD                                             ;\
  4004124h 4   NDMA1DAD                                             ;
  4004128h 4   NDMA1TCNT                                            ; NewDMA1
  400412Ch 4   NDMA1WCNT                                            ;
  4004130h 4   NDMA1BCNT                                            ;
  4004134h 4   NDMA1FDATA                                           ;
  4004138h 4   NDMA1CNT                                             ;/
  400413Ch 4   NDMA2SAD                                             ;\
  4004140h 4   NDMA2DAD                                             ;
  4004144h 4   NDMA2TCNT                                            ; NewDMA2
  4004148h 4   NDMA2WCNT                                            ;
  400414Ch 4   NDMA2BCNT                                            ;
  4004150h 4   NDMA2FDATA                                           ;
  4004154h 4   NDMA2CNT                                             ;/
  4004158h 4   NDMA3SAD                                             ;\
  400415Ch 4   NDMA3DAD                                             ;
  4004160h 4   NDMA3TCNT                                            ; NewDMA3
  4004164h 4   NDMA3WCNT                                            ;
  4004168h 4   NDMA3BCNT                                            ;
  400416Ch 4   NDMA3FDATA                                           ;
  4004170h 4   NDMA3CNT                                             ;/
ARM9 DSi Camera Module
  4004200h 2   CAM_MCNT   Camera Module Control (16bit)
  4004202h 2   CAM_CNT    Camera Control (16bit)
  4004204h 4   CAM_DAT    Camera Data (32bit)
  4004210h 4   CAM_SOFS   Camera Trimming Starting Position Setting (32bit)
  4004214h 4   CAM_EOFS   Camera Trimming Ending Position Setting (32bit)
ARM9 DSi DSP - XpertTeak processor
  4004300h 2   DSP_PDATA  DSP Transfer Data    (16bit)
  4004304h 2   DSP_PADR   DSP Transfer Address (16bit)
  4004308h 2   DSP_PCFG   DSP Configuration    (16bit)
  400430Ch 2   DSP_PSTS   DSP Status           (16bit)
  4004310h 2   DSP_PSEM   DSP ARM9-to-DSP Semaphore        (16bit)
  4004314h 2   DSP_PMASK  DSP DSP-to-ARM9 Semaphore Mask   (16bit)
  4004318h 2   DSP_PCLEAR DSP DSP-to-ARM9 Semaphore Clear (W) (16bit)
  400431Ch 2   DSP_SEM    DSP DSP-to-ARM9 Semaphore Data   (16bit)
  4004320h 2   DSP_CMD0   DSP Command Register 0 (16bit)
  4004324h 2   DSP_REP0   DSP Reply Register 0   (16bit)
  4004328h 2   DSP_CMD1   DSP Command Register 1 (16bit)
  400432Ch 2   DSP_REP1   DSP Reply Register 1   (16bit)
  4004330h 2   DSP_CMD2   DSP Command Register 2 (16bit)
  4004334h 2   DSP_REP2   DSP Reply Register 2   (16bit)
  4004340h 40h Unknown (looks like mirror of 4004300h..400433Fh)
  4004380h 40h Unknown (looks like mirror of 4004300h..400433Fh)
  40043C0h 40h Unknown (looks like mirror of 4004300h..400433Fh)
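
The four NewDMA channels listed above are spaced 1Ch bytes apart (NDMA0SAD at 4004104h, NDMA1SAD at 4004120h, and so on), so a register address can be computed from the channel number. The following Python lines are only a sketch of that arithmetic; the names NDMA_BASE, NDMA_STRIDE, NDMA_REGS and ndma_addr are made up for illustration and don't come from any SDK:

```python
# Address arithmetic for the NewDMA registers, derived from the list above.
NDMA_BASE = 0x4004104      # NDMA0SAD
NDMA_STRIDE = 0x1C         # distance between two channels
NDMA_REGS = {"SAD": 0x00, "DAD": 0x04, "TCNT": 0x08, "WCNT": 0x0C,
             "BCNT": 0x10, "FDATA": 0x14, "CNT": 0x18}

def ndma_addr(channel, reg):
    """Return the I/O address of NDMA<channel><reg> (channel 0..3)."""
    if not 0 <= channel <= 3:
        raise ValueError("NewDMA channel must be 0..3")
    return NDMA_BASE + channel * NDMA_STRIDE + NDMA_REGS[reg]

print(hex(ndma_addr(3, "CNT")))   # address of NDMA3CNT
```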

ARM7 NDS Registers that are changed in DSi mode
  4000004h 2   DISPSTAT (new Bit6, LCD Initialization Ready Flag) (as DSi9?)
  4000204h 2   EXMEMCNT (removed Bit0-7: GBA-slot related bits)   (as DSi9?)
  4000210h 4   IE       (new interrupt sources, removed GBA-slot IRQ)
  4000214h 4   IF       (new interrupt sources, removed GBA-slot IRQ)
  4000218h     IE2      (new register with more new interrupt sources)
  400021Ch     IF2      (new register with more new interrupt sources)
ARM7 DSi Maybe 2nd ROM slot (DSi prototypes did have 2 cartridge slots)
  40021A0h 4   Unknown, nonzero, probably related to below 40021A4h
  40021A4h 4   Unknown, related to 40001A4h (Gamecard Bus ROMCTRL)
ARM7 DSi Unknown
  4004000h 1   SCFG...? used by BIOS and SystemFlaw  (maybe A9ROM) (bit0,1)
  4004001h 1   SCFG...? used by BIOS and SystemFlaw  (maybe A7ROM) (bit0,1,2)
  4004004h ?   SCFG...? used by SystemFlaw           (maybe CLK)
  4004006h ?   SCFG...? used by SystemFlaw           (maybe RST)
  4004008h ?   SCFG...? used by SystemFlaw           (maybe EXT)
  4004010h 2   SCFG_MC - Memory Card Interface Control (R/W)
  4004012h 2   Unknown, there is something           (?) (SysMenu: 1988h)
  4004014h 2   Unknown, there is something           (?) (SysMenu: 264Ch)
  4004020h ?   ...?     used by SystemFlaw  ;bit0 = wifi?   (?)
  4004024h ?   ...?     used by SystemFlaw  ;bit0-1 = ?     (?)
  4004040h 20  Unknown, probably read-only mirror of ARM9's MBK1..MBK5
  4004054h 12  Unknown, probably local version of ARM9's MBK6..MBK8
  4004060h 1   MBK9...?  used by BIOS
  4004061h 1   MBK9...?  used by BIOS
  4004062h 1   MBK9...?  used by BIOS
  4004100h 74h NewDMA (new DMA, as on ARM9i, see there)
ARM7 DSi AES Encryption Unit
  4004400h 4   AES_CNT    (R/W)
  4004404h 4   AES_BLKCNT (W)
  4004408h 4   AES_WRFIFO (W)
  400440Ch 4   AES_RDFIFO (R)
  4004420h 16  AES_IV     (W)
  4004430h 16  AES_MAC    (W)
  4004440h 48  AES_KEY0   (W) ;used for modcrypt
  4004470h 48  AES_KEY1   (W) ;used for ?
  40044A0h 48  AES_KEY2   (W) ;used for JPEG signatures
  40044D0h 48  AES_KEY3   (W) ;used for eMMC sectors
ARM7 DSi I2C Bus
  4004500h 1   I2C_DATA
  4004501h 1   I2C_CNT
ARM7 DSi Microphone ?
  4004600h 2   MIC_CNT ?
  4004604h 4   MIC_DATA ?
  4004700h 2   SNDEXCNT             <-- can be read even in DS mode!
ARM7 DSi SD/MMC Registers for Memory Card access (SD Card and onboard eMMC)
  4004800h 2   SD_CMD              Command and Response/Data Type
  4004802h 2   SD_CARD_PORT_SELECT   (SD/MMC:020Fh, SDIO:010Fh)
  4004804h 4   SD_CMD_PARAM0-1     Argument (32bit, 2 halfwords)
  400480Ah 2   SD_DATA16_BLK_COUNT        "Transfer Block Count"
  400480Ch 16  SD_RESPONSE0-7 (128bit, 8 halfwords)
  400481Ch 4   SD_IRQ_STATUS0-1  ;IRQ Status  (0=ack, 1=req)
  4004820h 4   SD_IRQ_MASK0-1    ;IRQ Disable (0=enable, 1=disable)
  4004824h 2   SD_CARD_CLK_CTL Card Clock Control
  4004826h 2   SD_DATA16_BLK_LEN  Memory Card Transfer Data Length
  4004828h 2   SD_CARD_OPTION Memory Card Option Setup (can be C0FFh)
  400482Ah 2   Fixed always zero?
  400482Ch 4   SD_ERROR_DETAIL_STATUS0-1   Error Detail Status
  4004830h 2   SD_DATA16_FIFO        Data Port  (SD_FIFO?)
  4004832h 2   Fixed always zero?       ;(TC6371AF:BUF1 Data MSBs?)
  400483Ah 2   Fixed always zero?       ;(SDCTL_SDIO_HOST_INFORMATION)
  400483Ch 2   Fixed always zero?       ;(SDCTL_ERROR_CONTROL)
  400483Eh 2   Fixed always zero?       ;(TC6387XB: LED_CONTROL)
  4004840h 2   Fixed always 003Fh?
  4004842h 2   Fixed always 002Ah?
  4004844h 6Eh Fixed always zerofilled?
  40048B2h 2   Fixed always FFFFh?
  40048B4h 6   Fixed always zerofilled?
  40048BAh 2   Fixed always 0200h?
  40048BCh 1Ch Fixed always zerofilled?
  40048D8h 2   SD_DATA_CTL
  40048DAh 6   Fixed always zerofilled?
  40048E0h 2   SD_SOFT_RESET  Software Reset (bit0=SRST=0=reset)
  40048E2h 2   Fixed always 0009h?      ;(RESERVED2/9, TC6371AF:CORE_REV)
  40048E4h 2   Fixed always zero?
  40048E6h 2   Fixed always zero?       ;(RESERVED3, TC6371AF:BUF_ADR)
  40048E8h 2   Fixed always zero?       ;(TC6371AF:Resp_Header)
  40048EAh 6   Fixed always zerofilled?
  40048F0h 2   Fixed always zero?       ;(RESERVED10)
  40048F2h 2   ? Can be 0003h
  40048F4h 2   ? Can be 0770h
  40048F6h 2   ? Firmware tests bit0 (but, always 0?)       (RESERVED4)
  40048F8h 2   Fixed always 0004h?   (nonzero, unlike SDIO) (RESERVED5)
  40048FAh 2   ? Can be 0000h..0007h (nonzero, unlike SDIO) (RESERVED6)
  40048FCh 2   ? Can be 0024h..00FFh?                       (RESERVED7)
  40048FEh 2   ? Can be 0024h..00FFh?   (RESERVED8 / TC6371AF:Revision)
  4004900h 2   SD_DATA32_IRQ
  4004902h 2   Fixed always zero?
  4004904h 2   SD_DATA32_BLK_LEN
  4004906h 2   Fixed always zero?
  4004908h 2   SD_DATA32_BLK_COUNT
  400490Ah 2   Fixed always zero?
  400490Ch 4   SD_DATA32_FIFO
  4004910h F0h Fixed always zerofilled?
ARM7 DSi SD/MMC Registers for SDIO access (for Atheros Wifi)
  4004A00h 512 SDIO_xxx (same as SD_xxx at 4004800h..40049FFh, see there)
  4004A02h 2   SDIO_CARD_PORT_SELECT (slightly different than 4004802h)
  4004AF8h 2   Fixed always zero? (unlike SD_xxx at 40048F8h) (RESERVED5)
  4004AFAh 2   Fixed always zero? (unlike SD_xxx at 40048FAh) (RESERVED6)
ARM7 DSi General Purpose I/O (GPIO) (headphone connect, power button)
  4004C00h 1   GPIO Data In               (R)  (even in DS mode)
  4004C00h 1   GPIO Data Out              (W)
  4004C01h 1   GPIO Data Direction        (R/W)
  4004C02h 1   GPIO Interrupt Edge Select (R/W)
  4004C03h 1   GPIO Interrupt Enable      (R/W)
  4004C04h 1   GPIO? Unknown  ;\maybe GPIO related, or something else
  4004C05h 1   GPIO? Unknown  ;/
ARM7 DSi CPU/Console ID (used as eMMC key)
  4004D00h 8   CPU/Console ID Code (64bit)
  4004D08h 2   CPU/Console ID Flag (1bit)
ARM7 DSi Junk?
  8030200h 2   GBA area, accessed alongside SDIO port [4004A30h] (bug?)
Additional/Unknown, expected features are:
  NDS-slot-swap-bit (would allow to boot NDS carts from 2nd NDS-slot)

TEAK DSi Ports
The Teak processor should also contain whatever I/O ports:
Some counterpart to the ARM9 ports (eg. semaphores).
And possibly direct access to sound/microphone hardware?

  DSi Control Registers (SCFG)

4004000h - DSi9 - SCFG_A9ROM - ROM Status (R) [0000h]
  0-1   System ROM Status (0=NITRO, 1=TWL, 2-3=?) (somehow controlled via NDS7)
  2-15  Unused (0)
  16-31 Unspecified (0)

4004000h - DSi7 - SCFG_A9ROM? - ROM Control (R/W)
  0     Upper 32K half of DSi BIOS? (0=Enabled, 1=Disabled)
  1     NDS Mode (0=DSi BIOS?, 1=NDS BIOS?)
  2-15  Unknown/Unused
  8-15  See Port 4004001h
  16-31 Unknown/Unused

4004001h - DSi7 - SCFG_A7ROM?? - ROM Control (R/W)
  0     Upper 32K half of DSi BIOS? (0=Enabled, 1=Disabled)
  1     NDS Mode (0=DSi BIOS?, 1=NDS BIOS?)
  2     Unknown (set before starting Cartridges or DSiware files)
  3-7   Unknown/Unused
The System Menu sets 4004001h.Bit2 shortly before starting any Cartridges or DSiware files (except System Base Tools) (for NDS mode, after having set Bit2, it's also setting 4004000h.Bit1 and 4004001h.Bit1).
Setting Bit1 of 4004000h/4004001h does probably map NDS BIOS ROMs (instead of DSi BIOS ROMs) (however, the SCFG registers are disabled in NDS mode, so one can't see if Bit1 is set or not).

4004004h - DSi9 - SCFG_CLK - New Block Clock Control (R/W) [0084h]
  0     ARM9 CPU Clock         (0=NITRO/67.03MHz, 1=TWL/134.06MHz) (TCM/Cache)
  1     DSP Block Clock        (0=Stop, 1=Run)
  2     Camera Interface Clock (0=Stop, 1=Run)
  3-6   Unused (0)
  7     New Shared RAM Clock   (0=Stop, 1=Run)   (R?) (always set?)
  8     Camera External Clock  (0=Disable, 1=Enable) ("outputs at 16.76MHz")
  9-15  Unused (0)
  16-31 See below (Port 4004006h)
Change ARM9 clock only from code within ITCM (and wait at least 8 cycles before accessing any non-ITCM memory).
Disable the corresponding modules before stopping their clocks.

4004006h - DSi9 - SCFG_RST - New Block Reset (R/W) [0000h]
  0     DSP Block Reset (0=Apply Reset, 1=Release Reset)
  1-15  Unused (0)

4004008h - DSi9 - SCFG_EXT - Extended Features (R/W) [8307F100h]
  0     Revised DMA Circuit            (0=NITRO, 1=Revised)
  1     Revised Geometry Circuit       (0=NITRO, 1=Revised)
  2     Revised Renderer Circuit       (0=NITRO, 1=Revised)
  3     Revised 2D Engine Circuit      (0=NITRO, 1=Revised)
  4     Revised Divider Circuit        (0=NITRO, 1=Revised)
  5-6   Unused (0)
  7     Revised Card Interface Circuit (0=NITRO, 1=Revised)
  8     Extended Interrupt Circuit     (0=NITRO, 1=Extended)
  9-11  Unused (0)
  12    Extended LCD Circuit           (0=NITRO, 1=Extended)
  13    Extended VRAM Access           (0=NITRO, 1=Extended)
  14-15 Main Memory RAM Limit (0..1=4MB/DS, 2=16MB/DSi, 3=32MB/DSiDebugger)
  16    Access to New DMA Controller   (0=Disable, 1=Enable)
  17    Access to Camera Interface     (0=Disable, 1=Enable)
  18    Access to DSP Block            (0=Disable, 1=Enable)
  19-23 Unused (0)
  24    Access to New Shared WRAM      (0=Disable, 1=Enable)
  25    Undocumented/Unknown (can be set)
  26-30 Unused (0)
  31    System Control Block Access (0=Disable, 1=Enable) (lock 4004000h-4004063h)
Main RAM mapping depending on bit14-15:
  Mode         2000000h-2FFFFFFh   C000000h-CFFFFFFh   D000000h-DFFFFFFh
  4MB (0 or 1) 1st 4MB (+mirrors)  Zerofilled          Zerofilled
  16MB (2)     1st 16MB            1st 16MB (mirror)   1st 16MB (mirror)
  32MB (3)     1st 16MB            1st 16MB (mirror)   Open bus (or 2nd 16MB)
DSi9 SCFG_EXT.bit14-15 affect the Main RAM mapping on <both> ARM9 and ARM7 side. The 32MB mode requires an extra RAM chip (present in DSi debug version only; DSi retail consoles return 16bit open bus values instead of extra memory). RAM Size/Openbus detection is conventionally done by trying to read/write a BYTE at [0DFFFFFAh].
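
As a rough illustration of the SCFG_EXT bit layout and the RAM limit field, the Python sketch below splits a 32bit value into some of the fields listed above. The function name and the dictionary keys are invented for this example, and only a subset of the bits is decoded:

```python
# Sketch only: field names are hypothetical, bit positions are from the
# SCFG_EXT table above.
RAM_LIMIT_MB = {0: 4, 1: 4, 2: 16, 3: 32}   # bit14-15 -> Main RAM size in MB

def decode_scfg_ext(value):
    """Split a 32bit SCFG_EXT value into a few named fields."""
    ram_limit = (value >> 14) & 3
    return {
        "revised_dma": bool(value & (1 << 0)),
        "ext_irq":     bool(value & (1 << 8)),
        "ram_limit":   ram_limit,
        "ram_size_mb": RAM_LIMIT_MB[ram_limit],
        "new_dma":     bool(value & (1 << 16)),
        "camera":      bool(value & (1 << 17)),
        "dsp":         bool(value & (1 << 18)),
        "new_wram":    bool(value & (1 << 24)),
        "scfg_access": bool(value & (1 << 31)),
    }

# The bracketed value from the register heading, used as sample input.
fields = decode_scfg_ext(0x8307F100)
print(fields)
```

Note that the RAM limit field only tells which mapping is selected; whether the extra 16MB chip is really present still needs the read/write probe described above.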

4004010h - DSi9 - SCFG_MC - Memory Card Interface Status (R)
4004010h - DSi7 - SCFG_MC - Memory Card Interface Control (R/W)
  0     1st NDS Slot Game Cartridge (0=Inserted, 1=Ejected)               (R)
  1     1st NDS Slot Unknown/Undocumented (0)
  2-3   1st NDS Slot Power State (0=Off, 1=PrepareOn, 2=On, 3=RequestOff) (R/W)
  4     2nd NDS Slot Game Cartridge (always 1=Ejected) ;\DSi              (R)
  5     2nd NDS Slot Unknown/Undocumented (0)          ; prototype
  6-7   2nd NDS Slot Power State    (always 0=Off)     ;/relict           (R/W)
  8-15  Unknown/Undocumented (0)
  16-31 ARM7: See Port 4004012h, ARM9: Unspecified (0)
NDS-Slot related. Bit3 (and maybe Bit2) are probably R/W on ARM7 side (though the register is disabled on ARM7 side in cooking coach exploit, so R/W isn't possible in practice).
Note: Additionally, the NDS slot Reset pin can be toggled (via ROMCTRL.Bit29; that bit is writeable on ARM7 side on DSi; which wasn't supported on NDS).
Power state values:
  0=Power is off
  1=Prepare Power on (shall be MANUALLY changed to state=2)
  2=Power is on
  3=Request Power off (will be AUTOMATICALLY changed to state=0)
Power on sequence:
  wait until state<>3                   ;wait if pwr off busy?
  exit if state<>0                      ;exit if already on?
  wait 1ms, then set state=1            ;prepare pwr on?       or want RESET ?
  wait 10ms, then set state=2           ;apply pwr on?
  wait 27ms, then set ROMCTRL=20000000h ;reset cart?  or rather RELEASE reset?
  wait 120ms                            ;more insane delay?
Power off sequence:
  wait until state<>3                   ;wait if pwr off busy?
  exit if state<>2                      ;exit if already off?
  set state=3                           ;request pwr off?
  wait until state=0                    ;wait until pwr off?
Power Off is also done automatically by hardware when ejecting the cartridge.
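
The two step lists above (first power on, then power off) can be written down as plain functions. The Python sketch below is not taken from any firmware; the register accessors are passed in as callbacks, and the FakeSlot class is a purely hypothetical stand-in for SCFG_MC.bit2-3 and ROMCTRL so that the sequence can be tried off-target. The purpose of the individual steps is as uncertain as the question marks in the comments above suggest:

```python
def cart_power_on(read_state, write_state, write_romctrl, delay_ms):
    """Power on sequence; returns False if the slot was not in state 0."""
    while read_state() == 3:        # wait if power off is still busy
        pass
    if read_state() != 0:           # not off: nothing to do
        return False
    delay_ms(1)
    write_state(1)                  # prepare power on
    delay_ms(10)
    write_state(2)                  # power is on
    delay_ms(27)
    write_romctrl(0x20000000)       # ROMCTRL.Bit29 (reset pin)
    delay_ms(120)
    return True

def cart_power_off(read_state, write_state):
    """Power off sequence; returns False if the slot was not in state 2."""
    while read_state() == 3:        # wait if power off is still busy
        pass
    if read_state() != 2:           # not on: nothing to do
        return False
    write_state(3)                  # request power off
    while read_state() != 0:        # hardware changes 3 to 0 by itself
        pass
    return True

class FakeSlot:
    """Hypothetical model of the power state bits, for testing only."""
    def __init__(self):
        self.state = 0
        self.log = []
    def read_state(self):
        state = self.state
        if state == 3:              # model the automatic 3 -> 0 change
            self.state = 0
        return state
    def write_state(self, value):
        self.log.append(("state", value))
        self.state = value
    def write_romctrl(self, value):
        self.log.append(("romctrl", value))
    def delay_ms(self, ms):
        self.log.append(("delay", ms))

slot = FakeSlot()
cart_power_on(slot.read_state, slot.write_state, slot.write_romctrl, slot.delay_ms)
print(slot.state, slot.log)
```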

  DSi XpertTeak (DSP)

The DSi includes an XpertTeak Digital Signal Processor (DSP), consisting of a TeakLite II processor plus some "expert" features (like DMA support). It appears to be intended for audio/video decoding, but it's left unused by most DSi games. However, it's used by the "Nintendo DSi Sound" and "Nintendo Zone" system utilities, and by the "Cooking Coach" cartridge.

DSi Teak Misc
DSi Teak I/O Ports (on ARM9 Side)
DSi TeakLite II Instruction Set Encoding
DSi TeakLite II Operand Encoding

  DSi Teak Misc

The DSi is using an "XpertTeak" (according to Nintendo). The "Xpert" part seems to refer to the DMA and Serial I/O stuff, and the actual instruction set seems to be "TeakLite II".

Teak Instruction Set References
There aren't any official references for the Teak instruction set. However, there's one document that has leaked onto the internet (plus some docs for the older Oak instruction set):
  TeakLite Architecture Specification Revision 4.41 (DSP Group Inc.)
  OakDSPCore Technical Manuals for CWDSP1640 or CWDSP167x (LSI Logic)
  OakDSPCore DSP Subsystem AT75C (Atmel)
TeakLite II supports lots of additional opcodes; the only available info has leaked in the form of .DLLs which were (apparently by mistake) bundled with a specific RVDS release version:
  TeakLite II disassembler dll in RVDS (RealView Developer Suite) 4.0 Pro
There's no known way to use the disassembler GUI to decipher Teak binaries. However, Normmatt found a way to get the .DLLs to disassemble code manually (via LoadLibrary and GetProcAddress), which in turn made it possible to disassemble all 65536 possible combinations for all opcodes.

Teak COFF Files
The DSi Sound utility contains a file called "aac.a" (inside of its nitro filesystem); this appears to be a COFF file with Teak code (and aside from the binary, it's also including a COFF symbol table with labels in ASCII format).

Teak Undefined Opcodes
There are several kinds of "Undefined" opcodes: any opcodes that have no instruction assigned in the opcode encoding table (or that are explicitly assigned as "undefined" in the table), and opcodes with invalid parameters (eg. ArArp set to 6..7).
Some opcodes also have "Unused" operand bits; these bits should usually be zero (nonzero would supposedly mirror to the same instruction, but one shouldn't do that).
Moreover, there are various special cases saying that certain opcodes may not be used with certain registers, eg. "addh" shall not be used with operands Ax,Bx,p (with unknown results when violating that rule).

Teak Memory Map
TeakLite II supports 18bit addressing (unlike Teak/Oak which supported only 16bit addresses). Memory is addressed in 16bit WORD units (not in bytes) with separate Instruction and Data busses. So 18bit can address 256Kwords (=512Kbytes), for code/data each. Whereas, the DSi can map only half that much memory to the DSP (ie. max 256Kbytes code, plus 256Kbytes data):
DSi New Shared WRAM (for ARM7, ARM9, DSP)
Unknown if there are any further internal memory locations apart from the above WRAM (such as internal fast RAM, or memory mapped I/O ports).
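
The size figures above follow directly from the address width and the 16bit word size. A small Python sketch of that arithmetic (all names here are chosen for the example only):

```python
WORD_BYTES = 2   # Teak memory is addressed in 16bit words, not bytes

def address_space(address_bits):
    """Return (words, bytes) reachable per bus with the given address width."""
    words = 1 << address_bits
    return words, words * WORD_BYTES

tl2_words, tl2_bytes = address_space(18)     # TeakLite II, 18bit addresses
old_words, old_bytes = address_space(16)     # older cores, 16bit addresses
dsi_words = (256 * 1024) // WORD_BYTES       # 256Kbytes mappable per bus on DSi

print(tl2_words // 1024, "Kwords =", tl2_bytes // 1024, "Kbytes")
print("DSi maps at most", hex(dsi_words), "words each for code and data")
```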

Teak Speed
Cycles per opcode are defined in the TeakLite document (not covering TeakLite II opcodes though). Most instructions (even Multiply opcodes) can complete in a single clock cycle. The main bottleneck appears to be memory access cycles: Code and Data memory can be accessed in parallel, so the overall rule would be:
  NumCycles = max(NumberOfOpcodeWords, NumberOfDataReadsWrites)
Some exceptions with extra cycles are opcodes that are changing PC, or that do read/write program memory (movd and movp). Opcodes exp, max, maxd, min are having restrictions saying that the result may not be used by the "following instruction".
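
The rule of thumb above is easy to express as a function. The Python sketch below only implements the basic max() rule; it deliberately ignores the exceptions just mentioned (opcodes changing PC, movd/movp), and the sample numbers are hypothetical rather than measured:

```python
def teak_cycles(opcode_words, data_accesses):
    """Estimated cycles: code and data memory are accessed in parallel,
    so the slower of the two determines the duration."""
    if opcode_words < 1 or data_accesses < 0:
        raise ValueError("need at least one opcode word and no negative accesses")
    return max(opcode_words, data_accesses)

# Hypothetical cases: (opcode words, data reads/writes)
for words, accesses in [(1, 0), (1, 1), (2, 1), (1, 2)]:
    print(words, accesses, "->", teak_cycles(words, accesses))
```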
The overall clock speed in the DSi is unknown; some years ago somebody seems to have claimed it to be around 130MHz, but it's unclear where that info came from. The ARM9 can access WRAM at 33MHz, so one may doubt that the Teak could do it much faster, unless it's using a cache, or unless it's packing two contiguous 16bit accesses into a 32bit access.

Teak I/O ports (at DSP side) are unknown
There should be some counterparts to the ARM9 ports (ie. the Semaphore, Command/Reply, Fifo stuff).
The XpertTeak is also said to support DMA and Serial I/O (the latter presumably unused in DSi; unless it's referring to the Audio I2S bus).
And, there should be some way to output sound directly (and maybe also to input microphone directly). See SNDEXCNT register.

  DSi Teak I/O Ports (on ARM9 Side)

4004300h - DSi9 - DSP_PDATA - DSP Transfer Data Read FIFO (R)
  0-15  Data (one stage of the 16-stage Read FIFO)

4004300h - DSi9 - DSP_PDATA - DSP Transfer Data Write FIFO (W)
  0-15  Data (one stage of the 16-stage Write FIFO)

4004304h - DSi9 - DSP_PADR - DSP Transfer Address (W)
  0-15  Lower 16bit of Address in DSP Memory
Note: The upper 16bit of Address must be configured in the DMA register (inside of the DSP).

4004308h - DSi9 - DSP_PCFG - DSP Configuration (R/W) (16bit)
  0     DSP Reset (0=Release, 1=Reset)  ;should be held "1" for 8 DSP clks
  1     Address Auto-Increment (0=Off, 1=On)
  2-3   DSP Read Data Length (0=1 word, 1=8 words, 2=16 words, 3=Free-Run)
  4     DSP Read Start Flag (mem transfer via Read FIFO) (1=Start)
  5     Interrupt Enable Read FIFO Full      (0=Off, 1=On)
  6     Interrupt Enable Read FIFO Not-Empty (0=Off, 1=On)
  7     Interrupt Enable Write FIFO Full     (0=Off, 1=On)
  8     Interrupt Enable Write FIFO Empty    (0=Off, 1=On)
  9     Interrupt Enable Reply Register 0    (0=Off, 1=On)
  10    Interrupt Enable Reply Register 1    (0=Off, 1=On)
  11    Interrupt Enable Reply Register 2    (0=Off, 1=On)
  12-15 DSP Memory Transfer (0=Data Memory, 1=MMIO Register, 5=Program Memory)
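
For illustration, a DSP_PCFG value can be assembled from the fields above as follows. This is just a Python sketch of the bit packing; the constant and parameter names are invented here, and the seven interrupt enable bits (bit5-11) are passed as one 7bit mask for brevity:

```python
MEM_DATA, MEM_MMIO, MEM_PROGRAM = 0, 1, 5       # values for bit12-15
LEN_1, LEN_8, LEN_16, LEN_FREE = 0, 1, 2, 3     # values for bit2-3

def make_pcfg(reset=False, auto_inc=False, read_len=LEN_1,
              read_start=False, irq_mask=0, mem=MEM_DATA):
    """Pack the DSP_PCFG fields into a 16bit value.
    irq_mask bit0 = Read FIFO Full ... bit6 = Reply Register 2."""
    if not 0 <= read_len <= 3:
        raise ValueError("read_len must be 0..3")
    if not 0 <= irq_mask <= 0x7F:
        raise ValueError("irq_mask must fit in 7 bits")
    if not 0 <= mem <= 15:
        raise ValueError("mem must fit in 4 bits")
    return (int(reset)
            | (int(auto_inc) << 1)
            | (read_len << 2)
            | (int(read_start) << 4)
            | (irq_mask << 5)
            | (mem << 12))

# Example: start a 16-word auto-incrementing read from program memory.
value = make_pcfg(auto_inc=True, read_len=LEN_16, read_start=True,
                  mem=MEM_PROGRAM)
print(hex(value))
```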

400430Ch - DSi9 - DSP_PSTS - DSP Status (R) (16bit)
  0     Read Transfer Underway Flag  (0=No, 1=Yes/From DSP Memory)
  1     Write Transfer Underway Flag (0=No, 1=Yes/To DSP Memory)
  2     Peripheral Reset Flag (0=No/Ready, 1=Reset/Busy)
  3-4   Unused
  5     Read FIFO Full Flag      (0=No, 1=Yes)
  6     Read FIFO Not-Empty Flag (0=No, 1=Yes) ;ARM9 may read DSP_PDATA
  7     Write FIFO Full Flag     (0=No, 1=Yes)
  8     Write FIFO Empty Flag    (0=No, 1=Yes)
  9     Semaphore IRQ Flag (0=None, 1=IRQ)
  10    Reply Register 0 Update Flag (0=Was Written by DSP, 1=No)
  11    Reply Register 1 Update Flag (0=Was Written by DSP, 1=No)
  12    Reply Register 2 Update Flag (0=Was Written by DSP, 1=No)
  13    Command Register 0 Read Flag (0=Was Read by DSP, 1=No)
  14    Command Register 1 Read Flag (0=Was Read by DSP, 1=No)
  15    Command Register 2 Read Flag (0=Was Read by DSP, 1=No)
Unknown if/when bit10-15 get reset... maybe after reading the status?
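
When reading DSP_PSTS, note that bit10-15 are listed with inverted polarity (0 means the event has happened). The Python sketch below decodes a status value accordingly; the key names are made up, and the sample value is arbitrary rather than captured from hardware:

```python
def decode_psts(value):
    """Decode a 16bit DSP_PSTS value into named flags."""
    return {
        "read_busy":         bool(value & 0x0001),
        "write_busy":        bool(value & 0x0002),
        "in_reset":          bool(value & 0x0004),
        "rd_fifo_full":      bool(value & 0x0020),
        "rd_fifo_not_empty": bool(value & 0x0040),
        "wr_fifo_full":      bool(value & 0x0080),
        "wr_fifo_empty":     bool(value & 0x0100),
        "sem_irq":           bool(value & 0x0200),
        # bit10-12: 0 = reply register n was written by the DSP
        "reply_written": [((value >> (10 + n)) & 1) == 0 for n in range(3)],
        # bit13-15: 0 = command register n was read by the DSP
        "cmd_read":      [((value >> (13 + n)) & 1) == 0 for n in range(3)],
    }

status = decode_psts(0xF940)   # arbitrary sample value
print(status["reply_written"], status["rd_fifo_not_empty"])
```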

4004310h - DSi9 - DSP_PSEM - ARM9-to-DSP Semaphore (R/W) (16bit)
  0-15  ARM9-to-DSP Semaphore 0..15 Flags (0=Off, 1=On)
Reportedly these flags are sent in ARM9-to-DSP direction (=seems correct).
Confusingly, the other DSP_Pxxx registers are for opposite direction?

4004314h - DSi9 - DSP_PMASK - DSP-to-ARM9 Semaphore Mask (R/W) (16bit)
  0-15  DSP-to-ARM9 Semaphore 0..15 Interrupt Disable (0=Enable, 1=Disable)

4004318h - DSi9 - DSP_PCLEAR - DSP-to-ARM9 Semaphore Clear (W) (16bit)
  0-15  DSP-to-ARM9 Semaphore 0..15 Clear (0=No Change, 1=Clear)
Reportedly clears bits in DSP_PSEM/4004310h. [that's probably nonsense, clearing bits in DSP_SEM/400431Ch would make more sense]

400431Ch - DSi9 - DSP_SEM - DSP-to-ARM9 Semaphore Data (R) (16bit)
  0-15  DSP-to-ARM9 Semaphore 0..15 Flags (0=Off, 1=On)
Reportedly these flags are received in DSP-to-ARM9 direction.

4004320h - DSi9 - DSP_CMD0 - DSP Command Reg. 0 (R/W) (ARM9 to DSP) (16bit)
4004328h - DSi9 - DSP_CMD1 - DSP Command Reg. 1 (R/W) (ARM9 to DSP) (16bit)
4004330h - DSi9 - DSP_CMD2 - DSP Command Reg. 2 (R/W) (ARM9 to DSP) (16bit)
  0-15  Command/Data to DSP

4004324h - DSi9 - DSP_REP0 - DSP Reply Register 0 (R) (DSP to ARM9) (16bit)
400432Ch - DSi9 - DSP_REP1 - DSP Reply Register 1 (R) (DSP to ARM9) (16bit)
4004334h - DSi9 - DSP_REP2 - DSP Reply Register 2 (R) (DSP to ARM9) (16bit)
  0-15  Reply/Data from DSP

  DSi TeakLite II Instruction Set Encoding

The opcodes are 16bits wide (some followed by an additional 16bit parameter word, namely those with "@16" operands). The encoding is very messy (fixed opcode bits randomly mixed/interleaved with variable parameter bits, and with new TL2 opcodes squeezed in formerly unused locations), making it pretty much impossible to decode that unpleasant stuff by software/logic.
The only reasonable decoding way is using a huge table with 65536 entries (which could be generated temporarily from the information in below table, using the Base number plus all variable bit combinations, for example, "6100h TL mov Direct8@0, Ab@11" has variable bits in bit0-7 and bit11-12, so the opcode would be mapped at 6100h-61FFh, 6900h-69FFh, 7100h-71FFh, 7900h-79FFh).
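
Generating such a table could look roughly like the Python sketch below. It takes a base value plus the list of variable bit positions and fills all matching table slots, using the "6100h mov Direct8@0, Ab@11" example from above. The function names and the entry format are assumptions made for this example; deriving the bit positions from the operand names, and deciding which entry wins when two of them overlap, are not handled here (later entries simply overwrite earlier ones):

```python
def expand(base, variable_bits):
    """Yield base combined with every combination of the variable bits."""
    mask = 0
    for bit in variable_bits:
        mask |= 1 << bit
    sub = 0
    while True:
        yield base | sub
        sub = (sub - mask) & mask   # next subset of mask, ascending order
        if sub == 0:
            break

def build_table(entries):
    """Build a 65536-entry lookup table from (base, bits, name) tuples."""
    table = [None] * 0x10000
    for base, variable_bits, name in entries:
        for opcode in expand(base, variable_bits):
            table[opcode] = name
    return table

# Direct8@0 occupies bit0-7, Ab@11 occupies bit11-12.
entries = [(0x6100, list(range(0, 8)) + [11, 12], "mov Direct8@0, Ab@11")]
table = build_table(entries)

used = [op for op in range(0x10000) if table[op] is not None]
print(len(used), hex(used[0]), hex(used[-1]))
```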

TeakLite I (TL) and TeakLite II (TL2) Opcodes
  Base  Ver Opcode (with parameter bits located at @bitnumber and up)
  D4FBh TL  add  Direct16@16, Ax@8
  A600h TL  add  Direct8@0, Ax@8
  86C0h TL  add  Imm16@16, Ax@8
  C600h TL  add  Imm8u@0, Ax@8
  D4DBh TL  add  MemR7Imm16@16, Ax@8
  4600h TL  add  MemR7Imm7@0, Ax@8
  8680h TL  add  MemRn@0, Ax@8 || Rn@0stepZIDS@3
  86A0h TL  add  RegisterP0@0, Ax@8
  D2DAh TL2 add  Ab@10, Bx@0
  5DF0h TL2 add  Bx@1, Ax@0
  9070h TL2 add  MemR01@8, sv, Abh@2 || sub MemR01@8offsZI@0, sv, Abl@2
             || mov Abl@2, MemR45@8 || R01@8stepII2@0, R45@8stepII2@1
  5DB0h TL2 add  MemR04@1, sv, Abh@2 || sub MemR04@1offsZI@0, sv, Abl@2
             || R04@1stepII2@0
  6F80h TL2 add  MemR45@2, MemR01@2, Abh@3
             || add MemR45@2offsZI@1, MemR01@2offsZI@0, Abl@3
             || R01@2stepII2@0, R45@2stepII2@1
  6FA0h TL2 add  MemR45@2, MemR01@2, Abh@3
             || sub MemR45@2offsZI@1, MemR01@2offsZI@0, Abl@3
             || R01@2stepII2@0, R45@2stepII2@1
  5E30h TL2 add  MemR45@8, sv, Abh@2 || sub MemR45@8offsZI@1, sv, Abl@2
             || mov Abl@2, MemR01@8 || R01@8stepII2@0, R45@8stepII2@1
  5DC0h TL2 add  p0, p1, Ab@2
  D782h TL2 add  p1, Ax@0
  5DF8h TL2 add  Px@1, Bx@0
  D38Bh TL2 add  r6, Ax@4
  4590h TL2 add3 p0, p1, Ab@2
  4592h TL2 add3a p0, p1, Ab@2
  4593h TL2 add3aa p0, p1, Ab@2
  5DC1h TL2 adda p0, p1, Ab@2
  B200h TL  addh Direct8@0, Ax@8
  9280h TL  addh MemRn@0, Ax@8 || Rn@0stepZIDS@3
  92A0h TL  addh Register@0, Ax@8
  9464h TL2 addh r6, Ax@0
  90E0h TL2 addhp MemR0426@2, Px@4, Ax@8 || R0426@2stepII2D2S@0
  B400h TL  addl Direct8@0, Ax@8
  9480h TL  addl MemRn@0, Ax@8 || Rn@0stepZIDS@3
  94A0h TL  addl Register@0, Ax@8
  9466h TL2 addl r6, Ax@0
  906Ch TL2 addsub  p0, p1, Ab@0
  49C2h TL2 addsub  p1, p0, Ab@4
  916Ch TL2 addsuba p0, p1, Ab@0
  49C3h TL2 addsuba p1, p0, Ab@4
  E700h TL  addv Imm16@16, Direct8@0
  86E0h TL  addv Imm16@16, MemRn@0 || Rn@0stepZIDS@3
  87E0h TL  addv Imm16@16, Register@0
  47BBh TL2 addv Imm16@16, r6
  D4F9h TL  and  Direct16@16, Ax@8
  A200h TL  and  Direct8@0, Ax@8
  82C0h TL  and  Imm16@16, Ax@8
  C200h TL  and  Imm8u@0, Ax@8
  D4D9h TL  and  MemR7Imm16@16, Ax@8
  4200h TL  and  MemR7Imm7@0, Ax@8
  8280h TL  and  MemRn@0, Ax@8 || Rn@0stepZIDS@3
  82A0h TL  and  RegisterP0@0, Ax@8
  6770h TL2 and  Ab@2, Ab@0, Ax@12                ;TL2 only
  D389h TL2 and  r6, Ax@4
  4B80h TL  banke BankFlags6@0  ;{r0}{,r1}{,r4}{,cfgi}{,r7}{,cfgj}
  8CDFh TL2 bankr       ;without operand ?
  8CDCh TL2 bankr Ar@0
  8CD0h TL2 bankr Ar@2, Arp@0
  8CD8h TL2 bankr Arp@0
  5EB8h TL2 bitrev Rn@0
  D7E8h TL2 bitrev Rn@0, dbrv
  D7E0h TL2 bitrev Rn@0, ebrv
  5C00h TL  bkrep NoReverse, Imm8u@0, Address16@16
  5D00h TL  bkrep NoReverse, Register@0, Address18@16and5
  8FDCh TL2 bkrep NoReverse, r6, Address18@16and0
  DA9Ch TL2 bkreprst MemR0426@0
  5F48h TL2 bkreprst MemSp, Unused2@0
  DADCh TL2 bkrepsto MemR0426@0, Unused1@10
  9468h TL2 bkrepsto MemSp, Unused3@0
  4180h TL  br   Address18@16and4, Cond@0
  D3C0h TL  break               ;break
  5000h TL  brr  RelAddr7@4, Cond@0
  41C0h TL  call Address18@16and4, Cond@0
  D480h TL  calla Axl@8
  D381h TL2 calla Ax@4
  1000h TL  callr RelAddr7@4, Cond@0
  9068h TL2 cbs  Axh@0, Axh@not0, r0, ge
  9168h TL2 cbs  Axh@0, Axh@not0, r0, gt
  D49Eh TL2 cbs  Axh@8, Bxh@5, r0, ge
  D49Fh TL2 cbs  Axh@8, Bxh@5, r0, gt
  D5C0h TL2 cbs  MemR01@2, MemR45@2, ge || R01@2stepII2@0, R45@2stepII2@1
  D5C8h TL2 cbs  MemR01@2, MemR45@2, gt || R01@2stepII2@0, R45@2stepII2@1
  E500h TL  chng Imm16@16, Direct8@0
  84E0h TL  chng Imm16@16, MemRn@0 || Rn@0stepZIDS@3
  85E0h TL  chng Imm16@16, Register@0
  47BAh TL2 chng Imm16@16, r6
  0038h TL2 chng Imm16@16, SttMod@0
  6760h TL  clr  Implied ConstZero,  Ax@12, Cond@0 ;aX=0
  6F60h TL  clr  Implied ConstZero,  Bx@12, Cond@0 ;bX=0
  8ED0h TL2 clr  Implied ConstZero, Ab@2, Ab@0
  5DFEh TL2 clrp p0
  5DFFh TL2 clrp p0, p1
  5DFDh TL2 clrp p1
  67C0h TL  clrr Implied Const8000h, Ax@12, Cond@0 ;aX=8000h
  6F70h TL2 clrr Implied Const8000h, Bx@12, Cond@0 ;bX=8000h
  8DD0h TL2 clrr Implied Const8000h, Ab@2, Ab@0
  D4FEh TL  cmp  Direct16@16, Ax@8
  AC00h TL  cmp  Direct8@0, Ax@8
  8CC0h TL  cmp  Imm16@16, Ax@8
  CC00h TL  cmp  Imm8u@0, Ax@8
  D4DEh TL  cmp  MemR7Imm16@16, Ax@8
  4C00h TL  cmp  MemR7Imm7@0, Ax@8
  8C80h TL  cmp  MemRn@0, Ax@8 || Rn@0stepZIDS@3
  8CA0h TL  cmp  RegisterP0@0, Ax@8
  4D8Ch TL2 cmp  Ax@1, Bx@0
  D483h TL2 cmp  b0, b1
  D583h TL2 cmp  b1, b0
  DA9Ah TL2 cmp  Bx@10, Ax@0
  8B63h TL2 cmp  p1, Ax@4
  D38Eh TL2 cmp  r6, Ax@4
  BE00h TL  cmpu Direct8@0, Ax@8
  9E80h TL  cmpu MemRn@0, Ax@8 || Rn@0stepZIDS@3
  9EA0h TL  cmpu Register@0, Ax@8
  8A63h TL2 cmpu r6, Ax@3
  ED00h TL  cmpv Imm16@16, Direct8@0
  8CE0h TL  cmpv Imm16@16, MemRn@0 || Rn@0stepZIDS@3
  8DE0h TL  cmpv Imm16@16, Register@0
  47BEh TL2 cmpv Imm16@16, r6
  D390h TL  cntx r  ;restore shadows
  D380h TL  cntx s  ;store shadows
  67F0h TL  copy Implied Ax@not12,   Ax@12, Cond@0 ;aX=aY
  67E0h TL  dec  Implied Const1,     Ax@12, Cond@0 ;aX=aX-1
  43C0h TL  dint        ;IE=0, interrupt disable
  0E00h TL  divs Direct8@0, Ax@8
  4380h TL  eint        ;IE=1, interrupt enable
  9460h TL  exp  Bx@0, Implied sv
  9060h TL  exp  Bx@0, Implied sv, Ax@8
  9C40h TL  exp  MemRn@0, Implied sv || Rn@0stepZIDS@3
  9840h TL  exp  MemRn@0, Implied sv, Ax@8 || Rn@0stepZIDS@3
  9040h TL  exp  RegisterP0@0, Implied sv, Ax@8
  9440h TL  exp  RegisterP0@0, Implied sv
  D7C1h TL2 exp  r6, Implied sv
  D382h TL2 exp  r6, Implied sv, Ax@4
  67D0h TL  inc  Implied Const1,     Ax@12, Cond@0 ;aX=aX+1
  49C0h TL  lim  a0     ;aka a0,a0
  49D0h TL  lim  a0, a1
  49F0h TL  lim  a1     ;aka a1,a1
  49E0h TL  lim  a1, a0
  4D80h TL  load Imm2u@0, ps      ;st1.bit11-10=imm2
  DB80h TL  load Imm7s@0, stepi   ;cfgi.LSB=imm7
  DF80h TL  load Imm7s@0, stepj   ;cfgj.LSB=imm7
  0400h TL  load Imm8u@0, page    ;st1.LSBs=imm8 ;aka "lpg"
  0200h TL  load Imm9u@0, modi    ;cfgi.MSB=imm9
  0A00h TL  load Imm9u@0, modj    ;cfgj.MSB=imm9
  D7D8h TL2 load Imm2u@1, movpd, Unused1@0
  0010h TL2 load Imm4u@0, ps01
  D400h TL  maa  MemR45@2, MemR0123@0, Ax@11
             || R0123@0stepZIDS@3, R45@2stepZIDS@5
  8400h TL  maa  MemRn@0, Imm16@16, Ax@11 || Rn@0stepZIDS@3
  8420h TL  maa  y0, MemRn@0, Ax@11 || Rn@0stepZIDS@3
  8440h TL  maa  y0, Register@0, Ax@11
  E400h TL  maa  y0, Direct8@0, Ax@11
  5EA8h TL2 maa  y0, r6, Ax@0
  D700h TL  maasu MemR45@2, MemR0123@0, Ax@11
             || R0123@0stepZIDS@3, R45@2stepZIDS@5
  8700h TL  maasu MemRn@0, Imm16@16, Ax@11 || Rn@0stepZIDS@3
  8720h TL  maasu y0, MemRn@0, Ax@11 || Rn@0stepZIDS@3
  8740h TL  maasu y0, Register@0, Ax@11
  5EAEh TL2 maasu y0, r6, Ax@0
  D200h TL  mac  MemR45@2, MemR0123@0, Ax@11
             || R0123@0stepZIDS@3, R45@2stepZIDS@5
  8200h TL  mac  MemRn@0, Imm16@16, Ax@11 || Rn@0stepZIDS@3
  8220h TL  mac  y0, MemRn@0, Ax@11 || Rn@0stepZIDS@3
  8240h TL  mac  y0, Register@0, Ax@11
  E200h TL  mac  y0, Direct8@0, Ax@11
  5EA4h TL2 mac  y0, r6, Ax@0
  4D84h TL2 mac  y0, x1, Ax@1, Unused1@0
  5E28h TL2 mac1 MemR45@2, MemR01@2, Ax@8 || R01@2stepII2@0, R45@2stepII2@1
  D600h TL  macsu MemR45@2, MemR0123@0, Ax@11
             || R0123@0stepZIDS@3, R45@2stepZIDS@5
  8600h TL  macsu MemRn@0, Imm16@16, Ax@11 || Rn@0stepZIDS@3
  E600h TL  macsu y0, Direct8@0, Ax@11
  8620h TL  macsu y0, MemRn@0, Ax@11 || Rn@0stepZIDS@3
  8640h TL  macsu y0, Register@0, Ax@11
  5EACh TL2 macsu y0, r6, Ax@0
  D300h TL  macus MemR45@2, MemR0123@0, Ax@11
             || R0123@0stepZIDS@3, R45@2stepZIDS@5
  8300h TL  macus MemRn@0, Imm16@16, Ax@11 || Rn@0stepZIDS@3
  8320h TL  macus y0, MemRn@0, Ax@11 || Rn@0stepZIDS@3
  8340h TL  macus y0, Register@0, Ax@11
  5EA6h TL2 macus y0, r6, Ax@0
  D500h TL  macuu MemR45@2, MemR0123@0, Ax@11
             || R0123@0stepZIDS@3, R45@2stepZIDS@5
  8500h TL  macuu MemRn@0, Imm16@16, Ax@11 || Rn@0stepZIDS@3
  8520h TL  macuu y0, MemRn@0, Ax@11 || Rn@0stepZIDS@3
  8540h TL  macuu y0, Register@0, Ax@11
  5EAAh TL2 macuu y0, r6, Ax@0
  8460h TL  max  NoReverse, Ax@8, Implied Ax@not8, Bogus MemR0, ge,
             Implied mixp, Implied r0 || R0stepZIDS@3   ;when aY >= aX
  8660h TL  max  NoReverse, Ax@8, Implied Ax@not8, Bogus MemR0, gt,
             Implied mixp, Implied r0 || R0stepZIDS@3   ;when aY > aX
  5E21h TL2 max  a0h, a1h || max a0l, a1l || vtrshr
  5F21h TL2 max  a1h, a0h || max a1l, a0l || vtrshr
  D784h TL2 max  Axh@1, Bxh@0 || max Axl@1, Bxl@0 || vtrshr
  4A40h TL2 max  Axh@3, Bxh@4 || max Axl@3, Bxl@4 || mov Axl@not3, MemR04@1
             || vtrshr || R04@1stepII2@0
  4A44h TL2 max  Axh@3, Bxh@4 || max Axl@3, Bxl@4 || mov Axh@not3, MemR04@1
             || vtrshr || R04@1stepII2@0
  45A0h TL2 max  Axh@4, Bxh@3 || max Axl@4, Bxl@3 || mov Axh@not4, MemR45@2
             || mov Axl@not4, MemR01@2 || vtrshr
             || R01@2stepII2@0, R45@2stepII2@1
  D590h TL2 max  Axh@6, Bxh@5 || max Axl@6, Bxl@5 || mov Axh@not6, MemR01@2
             || mov Axl@not6, MemR45@2 || vtrshr
             || R01@2stepII2@0, R45@2stepII2@1
  4A60h TL2 max  Bxh@4, Axh@3 || max Bxl@4, Axl@3 || mov Bxl@not4, MemR04@1
             || vtrshr || R04@1stepII2@0
  4A64h TL2 max  Bxh@4, Axh@3 || max Bxl@4, Axl@3 || mov Bxh@not4, MemR04@1
             || vtrshr || R04@1stepII2@0
  8060h TL  maxd NoReverse, Ax@8, MemR0, ge, Implied mixp, Implied r0
             || R0stepZIDS@3   ;when (r0) >= aX
  8260h TL  maxd NoReverse, Ax@8, MemR0, gt, Implied mixp, Implied r0
             || R0stepZIDS@3   ;when (r0) > aX
  8860h TL  min  NoReverse, Ax@8, Implied Ax@not8, Bogus MemR0, le,
             Implied mixp, Implied r0 || R0stepZIDS@3   ;when aY <= aX
  8A60h TL  min  NoReverse, Ax@8, Implied Ax@not8, Bogus MemR0, lt,
             Implied mixp, Implied r0 || R0stepZIDS@3   ;when aY < aX
  43C2h TL2 min  Axh@0, Axh@not0 || min Axl@0, Axl@not0 || vtrshr
  D2B8h TL2 min  Axh@11, Bxh@10 || min Axl@11, Bxl@10
             || mov Axh@not11, MemR01@2 || mov Axl@not11, MemR45@2
             || vtrshr || R01@2stepII2@0, R45@2stepII2@1
  4A00h TL2 min  Axh@3, Bxh@4 || min Axl@3, Bxl@4 || mov Axl@not3, MemR04@1
             || vtrshr || R04@1stepII2@0
  4A04h TL2 min  Axh@3, Bxh@4 || min Axl@3, Bxl@4 || mov Axh@not3, MemR04@1
             || vtrshr || R04@1stepII2@0
  45E0h TL2 min  Axh@4, Bxh@3 || min Axl@4, Bxl@3 || mov Axh@not4, MemR45@2
             || mov Axl@not4, MemR01@2 || vtrshr
             || R01@2stepII2@0, R45@2stepII2@1
  D4BAh TL2 min  Axh@8, Bxh@0 || min Axl@8, Bxl@0 || vtrshr
  4A20h TL2 min  Bxh@4, Axh@3 || min Bxl@4, Axl@3 || mov Bxl@not4, MemR04@1
             || vtrshr || R04@1stepII2@0
  4A24h TL2 min  Bxh@4, Axh@3 || min Bxl@4, Axl@3 || mov Bxh@not4, MemR04@1
             || vtrshr || R04@1stepII2@0
  47A0h TL2 mind NoReverse, Ax@3, MemR0, le, Implied mixp, Implied r0
             || R0stepZIDS@0
  47A4h TL2 mind NoReverse, Ax@3, MemR0, lt, Implied mixp, Implied r0
             || R0stepZIDS@0
  0080h TL  modr MemRn@0stepZIDS@3
  00A0h TL  modr MemRn@0stepZIDS@3, dmod  ;Disable modulo
  D294h TL2 modr MemR0123@10stepII2D2S0@0 || modr MemR4567@10stepII2D2S0@5
  0D80h TL2 modr MemR0123@5stepII2D2S0@1  || modr MemR4567@5stepII2D2S0@3, dmod
  0D81h TL2 modr MemR0123@5stepII2D2S0@1, dmod
             || modr MemR4567@5stepII2D2S0@3, dmod
  8464h TL2 modr MemR0123@8stepII2D2S0@0, dmod || modr MemR4567@8stepII2D2S0@3
  5DA0h TL2 modr MemRn@0stepD2
  5DA8h TL2 modr MemRn@0stepD2, dmod
  4990h TL2 modr MemRn@0stepI2
  4998h TL2 modr MemRn@0stepI2, dmod
  D290h TL  mov  Ab@10, Ab@5
  D298h TL  mov  Abl@10, dvm
  D2D8h TL  mov  Abl@10, x0
  3000h TL  mov  Ablh@9, Direct8@0
  D4BCh TL  mov  Axl@8, Direct16@16
  D49Ch TL  mov  Axl@8, MemR7Imm16@16
  DC80h TL  mov  Axl@8, MemR7Imm7@0
  D4B8h TL  mov  Direct16@16, Ax@8
  6100h TL  mov  Direct8@0, Ab@11
  6200h TL  mov  Direct8@0, Ablh@10
  6500h TL  mov  Direct8@0, Axh@12, eu   ;aka Axheu
  6000h TL  mov  Direct8@0, R0123457y0@10
  6D00h TL  mov  Direct8@0, sv
  D491h TL  mov  dvm, Ab@5
  D492h TL  mov  icr, Ab@5
  5E20h TL  mov  Imm16@16, Bx@8
  5E00h TL  mov  Imm16@16, Register@0
  4F80h TL  mov  Imm5u@0, icr
  2500h TL  mov  Imm8s@0, Axh@12         ;signed!
  2900h TL  mov  Imm8s@0, ext0
  2D00h TL  mov  Imm8s@0, ext1
  3900h TL  mov  Imm8s@0, ext2
  3D00h TL  mov  Imm8s@0, ext3
  2300h TL  mov  Imm8s@0, R0123457y0@10  ;signed!
  0500h TL  mov  Imm8s@0, sv
  2100h TL  mov  Imm8u@0, Axl@12         ;unsigned!
  D498h TL  mov  MemR7Imm16@16, Ax@8
  D880h TL  mov  MemR7Imm7@0, Ax@8
  98C0h TL  mov  MemRn@0, Bx@8 || Rn@0stepZIDS@3
  1C00h TL  mov  MemRn@0, Register@5 || Rn@0stepZIDS@3
  47E0h TL  mov  MemSp, Register@0
  47C0h TL  mov  mixp, Register@0
  2000h TL  mov  R0123457y0@9, Direct8@0
  4FC0h TL  mov  Register@0, icr
  5E80h TL  mov  Register@0, mixp
  1800h TL  mov  Register@5, MemRn@0 || Rn@0stepZIDS@3
  5EC0h TL  mov  RegisterP0@0, Bx@5
  5800h TL  mov  RegisterP0@0, Register@5
  D490h TL  mov  repc, Ab@5
  7D00h TL  mov  sv, Direct8@0
  D493h TL  mov  x0, Ab@5
  D49Bh TL2 mov  a0h, stepi0
  D59Bh TL2 mov  a0h, stepj0
  4390h TL2 mov  a0h, MemR0426@2 || mov y0, MemR0426@2offsZIDZ@0
             || R0426@2stepII2D2S@0
  43D0h TL2 mov  a1h, MemR0426@2 || mov y0, MemR0426@2offsZIDZ@0
             || R0426@2stepII2D2S@0
  8FD4h TL2 mov  Ab@0, p0
  43A0h TL2 mov  Abh@3, MemR01@2 || mov Abl@3, MemR45@2
             || R01@2stepII2@0, R45@2stepII2@1
  43E0h TL2 mov  Abh@3, MemR45@2 || mov Abl@3, MemR01@2
             || R01@2stepII2@0, R45@2stepII2@1
  9D40h TL2 mov  Abh@4, MemR04@1 || mov Abh@2, MemR04@1offsZI@0
             || R04@1stepII2@0
  9164h TL2 mov  Abl@0, prpage
  9064h TL2 mov  Abl@0, repc
  D394h TL2 mov  Abl@0, x1
  D384h TL2 mov  Abl@0, y1
  9540h TL2 mov  Abl@3, ArArp@0
  9C60h TL2 mov  Abl@3, SttMod@0
  9560h TL2 mov  ArArp@0, Abl@3
  D488h TL2 mov  ArArp@0, MemR04@8 || R04@8stepII2@5
  5F50h TL2 mov  ArArpSttMod@0, MemR7Imm16@16
  886Bh TL2 mov  Ax@8, pc
  8C60h TL2 mov  Axh@4, MemR4567@8 || mov MemR0123@8, Axh@4
             || R0123@8stepII2D2S@0, R4567@8stepII2D2S@2
  4800h TL2 mov  Axh@6, MemR0123@4 || movr MemR4567@4, Axh@6
             || R0123@4stepII2D2S@0, R4567@4stepII2D2S@2
  4900h TL2 mov  Axh@6, MemR0123@4 || mov  MemR4567@4, Axh@6
             || R0123@4stepII2D2S@0, R4567@4stepII2D2S@2
  7F80h TL2 mov  Axh@6, MemR4567@4 || movr MemR0123@4, Axh@6
             || R0123@4stepII2D2S@0, R4567@4stepII2D2S@2
  8863h TL2 mov  Bx@8, pc
  0008h TL2 mov  Imm16@16, ArArp@0
  0023h TL2 mov  Imm16@16, r6
  0001h TL2 mov  Imm16@16, repc
  8971h TL2 mov  Imm16@16, stepi0
  8979h TL2 mov  Imm16@16, stepj0
  0030h TL2 mov  Imm16@16, SttMod@0
  5DD0h TL2 mov  Imm4u@0, prpage
  80C4h TL2 mov  MemR01@9, Abh@10 || mov MemR45@9, Abl@10
             || R01@9stepII2@0, R45@9stepII2@8
  D292h TL2 mov  MemR0426@10_MemR0426@10offsZIDZ@5, Px@0
             || R0426@10stepII2D2S@5
  D7D4h TL2 mov  MemR04@1, repc || R04@1stepII2@0
  5F4Ch TL2 mov  MemR04@1, sv || sub3 MemR04@1, p0, p1, b0 || R04@1stepII2@0
  D4B4h TL2 mov  MemR04@1, sv || sub3rnd MemR04@1, p0, p1, b1 || R04@1stepII2@0
  DE9Ch TL2 mov  MemR04@1, sv || sub3rnd MemR04@1, p0, p1, b0 || R04@1stepII2@0
  4B40h TL2 mov  MemR04@3, sv || addsub    MemR04@3, p1, p0, Bx@0
             || R04@3stepII2@2
  4B42h TL2 mov  MemR04@3, sv || addsubrnd MemR04@3, p1, p0, Bx@0
             || R04@3stepII2@2
  8062h TL2 mov  MemR04@4, ArArp@8  || R04@4stepII2@3
  8063h TL2 mov  MemR04@4, SttMod@8 || R04@4stepII2@3
  9960h TL2 mov  MemR04@4, sv || addsub    MemR04@4, p1, p0, Bx@2
             || R04@4stepD2S@3  ;<-- ordered p1, p0 here !
  99E0h TL2 mov  MemR04@4, sv || addsubrnd MemR04@4, p1, p0, Bx@2
             || R04@4stepD2S@3  ;<-- ordered p1, p0 here !
  9860h TL2 mov  MemR04@4, sv || sub3      MemR04@4, p0, p1, Bx@2
             || R04@4stepD2S@3
  98E0h TL2 mov  MemR04@4, sv || sub3rnd   MemR04@4, p0, p1, Bx@2
             || R04@4stepD2S@3
  8873h TL2 mov  MemR04@8, sv || sub3 MemR04@8, p0, p1, b1 || R04@8stepII2@3
  D4C0h TL2 mov  MemR45@5, Abh@2 || mov MemR01@5, Abl@2
             || R01@5stepII2@0, R45@5stepII2@1
  4D90h TL2 mov  MemR7Imm16@16, ArArpSttMod@0
  D2DCh TL2 mov  MemR7Imm16@16, repc, Unused2@0, Unused1@10
  1B20h TL2 mov  MemRn@0, r6 || Rn@0stepZIDS@3 ;override 1800h (mov a1,MemRn@0)
  D29Ch TL2 mov  MemSp, r6, Unused2@0, Unused1@10
  8A73h TL2 mov  mixp, Bx@3
  4381h TL2 mov  mixp, r6
  4382h TL2 mov  p0h, Bx@0
  D3C2h TL2 mov  p0h, r6
  4B60h TL2 mov  p0h, Register@0     ;<-- here "p0h" as source
  8FD8h TL2 mov  p1, Ab@0
  88D0h TL2 mov  Px@1, MemR0426@8_MemR0426@8offsZIDZ@2   || R0426@8stepII2D2S@2
  88D1h TL2 mov  Px@1, MemR0426@8_MemR0426@8offsZIDZ@2,s || R0426@8stepII2D2S@2
  D481h TL2 mov  r6, Bx@8
  1B00h TL2 mov  r6, MemRn@0 || Rn@0stepZIDS@3 ;override 1800h (mov a0,MemRn@0)
  43C1h TL2 mov  r6, mixp
  5F00h TL2 mov  r6, Register@0
  5F60h TL2 mov  Register@0, r6
  D2D9h TL2 mov  repc, Abl@10
  D7D0h TL2 mov  repc, MemR04@1 || R04@1stepII2@0
  D3C8h TL2 mov  repc, MemR7Imm16@16, Unused3@0
  D482h TL2 mov  stepi0, a0h
  D582h TL2 mov  stepj0, a0h
  D2F8h TL2 mov  SttMod@0, Abl@10
  49C1h TL2 mov  x1, Ab@4
  D299h TL2 mov  y1, Ab@10
  5EB0h TL2 mov  prpage, Abl@0
  49A0h TL2 mov  SttMod@0, MemR04@4 || R04@4stepII2@3
  4DC0h TL2 mova Ab@4, MemR0426@2_MemR0426@2offsZIDZ@0 || R0426@2stepII2D2S@0
  4BC0h TL2 mova MemR0426@2_MemR0426@2offsZIDZ@0, Ab@4 || R0426@2stepII2D2S@0
  5F80h TL  movd MemR0123@0,ProgMemR45@2 || R0123@0stepZIDS@3, R45@2stepZIDS@5
  0040h TL  movp ProgMemAxl@5, Register@0
  0D40h TL2 movp ProgMemAx@5, Register@0
  0600h TL  movp ProgMemRn@0, MemR0123@5 || R0123@5stepZIDS@7, Rn@0stepZIDS@3
  D499h TL2 movpdw MemAx@8_MemAx@8stepI, pc
  8864h TL  movr MemR0426@3, Abh@8 || R0426@3stepII2D2S@0   ;op*10000h+8000h
  9CE0h TL  movr MemRn@0, Ax@8 || Rn@0stepZIDS@3
  9CC0h TL  movr RegisterP0@0, Ax@8
  5DF4h TL2 movr Bx@1, Ax@0
  8961h TL2 movr r6, Ax@3
  6300h TL  movs Implied sv, Direct8@0, Ab@11
  0180h TL  movs Implied sv, MemRn@0, Ab@5 || Rn@0stepZIDS@3
  0100h TL  movs Implied sv, RegisterP0@0, Ab@5
  5F42h TL2 movs Implied sv, r6, Ax@0
  4080h TL  movsi Implied Imm5s@0, R0123457y0@9, Ab@5, Bogus Imm5s@0
  D000h TL  mpy  MemR45@2, MemR0123@0 || R0123@0stepZIDS@3, R45@2stepZIDS@5
  8000h TL  mpy  MemRn@0, Imm16@16    || Rn@0stepZIDS@3
  8020h TL  mpy  y0, MemRn@0          || Rn@0stepZIDS@3
  8040h TL  mpy  y0, Register@0
  E000h TL  mpy  y0, Direct8@0
  5EA0h TL2 mpy  y0, r6
  CB00h TL2 mpy  MemR45@5, MemR01@5 || mpysu MemR45@5offsZI@4, MemR01@5offsZI@3
             || sub3   p0, p1, Ab@6 || R01@5stepII2@3, R45@5stepII2@4
  CB01h TL2 mpy  MemR45@5, MemR01@5 || mpyus MemR45@5offsZI@4, MemR01@5offsZI@3
             || sub3   p0, p1, Ab@6 || R01@5stepII2@3, R45@5stepII2@4
  CB02h TL2 mpy  MemR45@5, MemR01@5 || mpysu MemR45@5offsZI@4, MemR01@5offsZI@3
             || sub3a  p0, p1, Ab@6 || R01@5stepII2@3, R45@5stepII2@4
  CB03h TL2 mpy  MemR45@5, MemR01@5 || mpyus MemR45@5offsZI@4, MemR01@5offsZI@3
             || sub3a  p0, p1, Ab@6 || R01@5stepII2@3, R45@5stepII2@4
  CB04h TL2 mpy  MemR45@5, MemR01@5 || mpysu MemR45@5offsZI@4, MemR01@5offsZI@3
             || add3   p0, p1, Ab@6 || R01@5stepII2@3, R45@5stepII2@4
  CB05h TL2 mpy  MemR45@5, MemR01@5 || mpyus MemR45@5offsZI@4, MemR01@5offsZI@3
             || add3   p0, p1, Ab@6 || R01@5stepII2@3, R45@5stepII2@4
  CB06h TL2 mpy  MemR45@5, MemR01@5 || mpysu MemR45@5offsZI@4, MemR01@5offsZI@3
             || add3a  p0, p1, Ab@6 || R01@5stepII2@3, R45@5stepII2@4
  CB07h TL2 mpy  MemR45@5, MemR01@5 || mpyus MemR45@5offsZI@4, MemR01@5offsZI@3
             || add3a  p0, p1, Ab@6 || R01@5stepII2@3, R45@5stepII2@4
  D5E0h TL2 mpy  MemR04@1, x1 || mpy y1, x0 || sub3 p0, p1, Ax@3
             || R04@1stepII2@0
  D5E4h TL2 mpy  MemR04@1, x1 || mpy y1, x0 || add3 p0, p1, Ax@3
             || R04@1stepII2@0
  C800h TL2 mpy  MemR4567@4, MemR0123@4
             || mpy MemR4567@4offsZIDZ@2, MemR0123@4offsZIDZ@0
             || add3 p0, p1, Ab@6 || R0123@4stepII2D2S@0, R4567@4stepII2D2S@2
  C900h TL2 mpy  MemR4567@4, MemR0123@4
             || mpy MemR4567@4offsZIDZ@2, MemR0123@4offsZIDZ@0
             || sub3 p0, p1, Ab@6 || R0123@4stepII2D2S@0, R4567@4stepII2D2S@2
  80C2h TL2 mpy  MemR45@0, MemR01@0 || mpy MemR45@0offsZI@9, MemR01@0offsZI@8
             || add3a p0, p1, Ab@10 || R01@0stepII2@8, R45@0stepII2@9
  49C8h TL2 mpy  MemR45@2, MemR01@2 || mpy MemR45@2offsZI@1, MemR01@2offsZI@0
             || sub3a p0, p1, Ab@4 || R01@2stepII2@0, R45@2stepII2@1
  80C8h TL2 mpy  MemR45@2, MemR01@2 || mpy MemR45@2offsZI@1, MemR01@2offsZI@0
             || addsub  p0, p1, Ab@10 || R01@2stepII2@0, R45@2stepII2@1
  81C8h TL2 mpy  MemR45@2, MemR01@2 || mpy MemR45@2offsZI@1, MemR01@2offsZI@0
             || addsuba p0, p1, Ab@10 || R01@2stepII2@0, R45@2stepII2@1
  82C8h TL2 mpy  MemR45@2, MemR01@2 || mpy MemR45@2offsZI@1, MemR01@2offsZI@0
             || add     p0, p1, Ab@10 || R01@2stepII2@0, R45@2stepII2@1
  83C8h TL2 mpy  MemR45@2, MemR01@2 || mpy MemR45@2offsZI@1, MemR01@2offsZI@0
             || adda    p0, p1, Ab@10 || R01@2stepII2@0, R45@2stepII2@1
  00C0h TL2 mpy  MemR45@3, MemR01@3 || mpy MemR45@3offsZI@2, MemR01@3offsZI@1
             || sub  p0, p1, Ab@4 || R01@3stepII2@1, R45@3stepII2@2
  00C1h TL2 mpy  MemR45@3, MemR01@3 || mpy MemR45@3offsZI@2, MemR01@3offsZI@1
             || suba p0, p1, Ab@4 || R01@3stepII2@1, R45@3stepII2@2
  0D20h TL2 mpy  MemR45@3, MemR01@3 || mpyus MemR45@3offsZI@2, MemR01@3offsZI@1
             || add3a p0, p1, Ax@0, dmodi || R01@3stepII2@1, R45@3stepII2@2
  0D30h TL2 mpy  MemR45@3, MemR01@3 || mpyus MemR45@3offsZI@2, MemR01@3offsZI@1
             || add3a p0, p1, Ax@0, dmodj || R01@3stepII2@1, R45@3stepII2@2
  4B50h TL2 mpy  MemR45@3, MemR01@3 || mpyus MemR45@3offsZI@2, MemR01@3offsZI@1
             || add3a p0, p1, Ax@0, dmodij || R01@3stepII2@1, R45@3stepII2@2
  D7A0h TL2 mpy  MemR45@3, MemR01@3 || mpy MemR45@3offsZI@2, MemR01@3offsZI@1
             || add3    sv, p0, p1, Ax@4 || R01@3stepII2@1, R45@3stepII2@2
  D7A1h TL2 mpy  MemR45@3, MemR01@3 || mpy MemR45@3offsZI@2, MemR01@3offsZI@1
             || add3rnd sv, p0, p1, Ax@4 || R01@3stepII2@1, R45@3stepII2@2
  9861h TL2 mpy  MemR45@4, MemR01@4 || mpy MemR45@4offsZI@3, MemR01@4offsZI@2
             || add3  p0, p1, Ax@8, dmodj  || R01@4stepII2@2, R45@4stepII2@3
  9862h TL2 mpy  MemR45@4, MemR01@4 || mpy MemR45@4offsZI@3, MemR01@4offsZI@2
             || add3  p0, p1, Ax@8, dmodi  || R01@4stepII2@2, R45@4stepII2@3
  9863h TL2 mpy  MemR45@4, MemR01@4 || mpy MemR45@4offsZI@3, MemR01@4offsZI@2
             || add3  p0, p1, Ax@8, dmodij || R01@4stepII2@2, R45@4stepII2@3
  98E1h TL2 mpy  MemR45@4, MemR01@4 || mpy MemR45@4offsZI@3, MemR01@4offsZI@2
             || add3a p0, p1, Ax@8, dmodj  || R01@4stepII2@2, R45@4stepII2@3
  98E2h TL2 mpy  MemR45@4, MemR01@4 || mpy MemR45@4offsZI@3, MemR01@4offsZI@2
             || add3a p0, p1, Ax@8, dmodi  || R01@4stepII2@2, R45@4stepII2@3
  98E3h TL2 mpy  MemR45@4, MemR01@4 || mpy MemR45@4offsZI@3, MemR01@4offsZI@2
             || add3a p0, p1, Ax@8, dmodij || R01@4stepII2@2, R45@4stepII2@3
  4DA0h TL2 mpy  y0, MemR04@3 || mpyus y1, MemR04@3offsZI@2
             || sub3  p0, p1, Ax@4 || R04@3stepII2@2
  4DA1h TL2 mpy  y0, MemR04@3 || mpyus y1, MemR04@3offsZI@2
             || sub3a p0, p1, Ax@4 || R04@3stepII2@2
  4DA2h TL2 mpy  y0, MemR04@3 || mpyus y1, MemR04@3offsZI@2
             || add3  p0, p1, Ax@4 || R04@3stepII2@2
  4DA3h TL2 mpy  y0, MemR04@3 || mpyus y1, MemR04@3offsZI@2
             || add3a p0, p1, Ax@4 || R04@3stepII2@2
  94E0h TL2 mpy  y0, MemR04@4 || mpy   y1, MemR04@4offsZI@3
             || sub3  p0, p1, Ax@8 || R04@4stepII2@3
  94E2h TL2 mpy  y0, MemR04@4 || mpy   y1, MemR04@4offsZI@3
             || sub3a p0, p1, Ax@8 || R04@4stepII2@3
  94E4h TL2 mpy  y0, MemR04@4 || mpy   y1, MemR04@4offsZI@3
             || add3  p0, p1, Ax@8 || R04@4stepII2@3
  94E6h TL2 mpy  y0, MemR04@4 || mpy   y1, MemR04@4offsZI@3
             || add3a p0, p1, Ax@8 || R04@4stepII2@3
  94E1h TL2 mpy  y0, MemR04@4 || mpysu y1, MemR04@4offsZI@3
             || sub3  p0, p1, Ax@8 || R04@4stepII2@3
  94E3h TL2 mpy  y0, MemR04@4 || mpysu y1, MemR04@4offsZI@3
             || sub3a p0, p1, Ax@8 || R04@4stepII2@3
  94E5h TL2 mpy  y0, MemR04@4 || mpysu y1, MemR04@4offsZI@3
             || add3  p0, p1, Ax@8 || R04@4stepII2@3
  94E7h TL2 mpy  y0, MemR04@4 || mpysu y1, MemR04@4offsZI@3
             || add3a p0, p1, Ax@8 || R04@4stepII2@3
  8862h TL2 mpy  y0, x1 || mpy   MemR04@4, x0 || sub3  p0, p1, Ax@8
             || R04@4stepII2@3
  8A62h TL2 mpy  y0, x1 || mpy   MemR04@4, x0 || add3  p0, p1, Ax@8
             || R04@4stepII2@3
  4D88h TL2 mpy  y0, x1 || mpy   y1, x0 || sub p0, p1, Ax@1
  5E24h TL2 mpy  y0, x1 || mpy   y1, x0 || add p0, p1, Ab@0
  8061h TL2 mpy  y0, x1 || mpy   y1, x0 || add3  p0, p1, Ab@8
  8071h TL2 mpy  y0, x1 || mpy   y1, x0 || add3a p0, p1, Ab@8
  8461h TL2 mpy  y0, x1 || mpy   y1, x0 || sub3  p0, p1, Ab@8
  8471h TL2 mpy  y0, x1 || mpy   y1, x0 || sub3a p0, p1, Ab@8
  D484h TL2 mpy  y0, x1 || mpy   y1, x0 || add3aa p0, p1, Ab@0
  D49Dh TL2 mpy  y0, x1 || mpy   y1, x0 || sub p0, p1, Bx@5
  D4A0h TL2 mpy  y0, x1 || mpy   y1, x0 || addsub p0, p1, Ab@0
  4FA0h TL2 mpy  y0, x1 || mpy y1, x0 || add3 p0, p1, Ab@3
             || mov Axh@6, MemR04@1 || mov Bxh@2, MemR04@1offsZI@0
             || R04@1stepII2@0
  5818h TL2 mpy  y0, x1 || mpy y1, x0 || addsub    sv, p0, p1, Ax@0
             || mov Axh@0, MemR0426@7 || mov Axh@not0, MemR0426@7offsZI@6
             || R0426@7stepII2@6  ;override 5800h+18h (mov a0, Register)
  5838h TL2 mpy  y0, x1 || mpy y1, x0 || addsubrnd sv, p0, p1, Ax@0
             || mov Axh@0, MemR0426@7 || mov Axh@not0, MemR0426@7offsZI@6
             || R0426@7stepII2@6  ;override 5800h+38h (mov a1, Register)
  80D0h TL2 mpy  y0, x1 || mpy y1, x0 || addsub    sv, p0, p1, Ax@10
             || mov Axh@9, MemR04@3 || mov Bxh@8, MemR04@3offsZI@2
             || R04@3stepII2@2
  80D1h TL2 mpy  y0, x1 || mpy y1, x0 || addsubrnd sv, p0, p1, Ax@10
             || mov Axh@9, MemR04@3 || mov Bxh@8, MemR04@3offsZI@2
             || R04@3stepII2@2
  80D2h TL2 mpy  y0, x1 || mpy y1, x0 || add3      sv, p0, p1, Ax@10
             || mov Axh@9, MemR04@3 || mov Bxh@8, MemR04@3offsZI@2
             || R04@3stepII2@2
  80D3h TL2 mpy  y0, x1 || mpy y1, x0 || add3rnd   sv, p0, p1, Ax@10
             || mov Axh@9, MemR04@3 || mov Bxh@8, MemR04@3offsZI@2
             || R04@3stepII2@2
  D3A0h TL2 mpy  y0, x1 || mpy y1, x0 || addsub p0, p1, Ab@3
             || mov Axh@6, MemR04@1 || mov Bxh@2, MemR04@1offsZI@0
             || R04@1stepII2@0
  4D89h TL2 mpy  y0, x1 || mpyus y1, x0 || sub p0, p1, Ax@1
  5F24h TL2 mpy  y0, x1 || mpyus y1, x0 || add p0, p1, Ab@0
  8069h TL2 mpy  y0, x1 || mpyus y1, x0 || add3  p0, p1, Ab@8
  8079h TL2 mpy  y0, x1 || mpyus y1, x0 || add3a p0, p1, Ab@8
  8469h TL2 mpy  y0, x1 || mpyus y1, x0 || sub3  p0, p1, Ab@8
  8479h TL2 mpy  y0, x1 || mpyus y1, x0 || sub3a p0, p1, Ab@8
  D584h TL2 mpy  y0, x1 || mpyus y1, x0 || add3aa p0, p1, Ab@0
  D59Dh TL2 mpy  y0, x1 || mpyus y1, x0 || sub p0, p1, Bx@5
  D5A0h TL2 mpy  y0, x1 || mpyus y1, x0 || addsub p0, p1, Ab@0
  0800h TL  mpyi NoReverse, Implied p0, y0, Imm8s@0   ;multiply  ;aka "mpys"
  D100h TL  mpysu MemR45@2, MemR0123@0 || R0123@0stepZIDS@3, R45@2stepZIDS@5
  8100h TL  mpysu MemRn@0, Imm16@16    || Rn@0stepZIDS@3
  8120h TL  mpysu y0, MemRn@0          || Rn@0stepZIDS@3
  8140h TL  mpysu y0, Register@0
  CA00h TL2 mpysu MemR45@5, MemR01@5
             || mpysu MemR45@5offsZI@4, MemR01@5offsZI@3
             || sub3a  p0, p1, Ab@6 || R01@5stepII2@3, R45@5stepII2@4
  CA01h TL2 mpysu MemR45@5, MemR01@5
             || mpyus MemR45@5offsZI@4, MemR01@5offsZI@3
             || sub3a  p0, p1, Ab@6 || R01@5stepII2@3, R45@5stepII2@4
  CA02h TL2 mpysu MemR45@5, MemR01@5
             || mpysu MemR45@5offsZI@4, MemR01@5offsZI@3
             || sub3aa p0, p1, Ab@6 || R01@5stepII2@3, R45@5stepII2@4
  CA03h TL2 mpysu MemR45@5, MemR01@5
             || mpyus MemR45@5offsZI@4, MemR01@5offsZI@3
             || sub3aa p0, p1, Ab@6 || R01@5stepII2@3, R45@5stepII2@4
  CA04h TL2 mpysu MemR45@5, MemR01@5
             || mpysu MemR45@5offsZI@4, MemR01@5offsZI@3
             || add3a  p0, p1, Ab@6 || R01@5stepII2@3, R45@5stepII2@4
  CA05h TL2 mpysu MemR45@5, MemR01@5
             || mpyus MemR45@5offsZI@4, MemR01@5offsZI@3
             || add3a  p0, p1, Ab@6 || R01@5stepII2@3, R45@5stepII2@4
  CA06h TL2 mpysu MemR45@5, MemR01@5
             || mpysu MemR45@5offsZI@4, MemR01@5offsZI@3
             || add3aa p0, p1, Ab@6 || R01@5stepII2@3, R45@5stepII2@4
  CA07h TL2 mpysu MemR45@5, MemR01@5
             || mpyus MemR45@5offsZI@4, MemR01@5offsZI@3
             || add3aa p0, p1, Ab@6 || R01@5stepII2@3, R45@5stepII2@4
  5EA2h TL2 mpysu y0, r6
  D080h TL  msu  MemR45@2,MemR0123@0,Ax@8 || R0123@0stepZIDS@3, R45@2stepZIDS@5
  90C0h TL  msu  MemRn@0, Imm16@16,  Ax@8 || Rn@0stepZIDS@3 ;multiply, subtract
  9080h TL  msu  y0, MemRn@0, Ax@8 || Rn@0stepZIDS@3
  90A0h TL  msu  y0, Register@0, Ax@8
  B000h TL  msu  y0, Direct8@0, Ax@8
  9462h TL2 msu  y0, r6, Ax@0
  8264h TL2 msusu y0, MemR0426@3, Ax@8 || R0426@3stepII2D2S@0
  6790h TL  neg  Ax@12, Cond@0 ;aX=0-aX
  0000h TL  nop
  94C0h TL  norm Ax@8, Bogus MemRn@0 || Rn@0stepZIDS@3  ;if N=0 (aX=aX*2,rN+/-)
  6780h TL  not  Ax@12, Cond@0 ;aX=not aX
  D4F8h TL  or   Direct16@16, Ax@8
  A000h TL  or   Direct8@0, Ax@8
  80C0h TL  or   Imm16@16, Ax@8
  C000h TL  or   Imm8u@0, Ax@8
  D4D8h TL  or   MemR7Imm16@16, Ax@8
  4000h TL  or   MemR7Imm7@0, Ax@8
  8080h TL  or   MemRn@0, Ax@8 || Rn@0stepZIDS@3
  80A0h TL  or   RegisterP0@0, Ax@8
  D291h TL2 or   Ab@10, Ax@6, Ax@5
  D4A4h TL2 or   Ax@8, Bx@1, Ax@0
  D3C4h TL2 or   b0, Bx@1, Ax@0
  D7C4h TL2 or   b1, Bx@1, Ax@0
  D388h TL2 or   r6, Ax@4
  67B0h TL  pacr Implied Const8000h, Implied p0, Ax@12, Cond@0 ;aX=shfP+8000h
  D7C2h TL2 pacr1 Implied Const8000h, Implied p1, Ax@0
  5E60h TL  pop  Register@0
  47B4h TL2 pop  Abe@0
  80C7h TL2 pop  ArArpSttMod@8
  0006h TL2 pop  Bx@5, Unused1@0
  D7F4h TL2 pop  prpage, Unused2@0
  D496h TL2 pop  Px@0
  0024h TL2 pop  r6, Unused1@0
  D7F0h TL2 pop  repc, Unused2@0
  D494h TL2 pop  x0
  D495h TL2 pop  x1
  0004h TL2 pop  y1, Unused1@0
  47B0h TL2 popa Ab@0
  5F40h TL  push Imm16@16
  5E40h TL  push Register@0
  D7C8h TL2 push Abe@1, Unused1@0
  D3D0h TL2 push ArArpSttMod@0
  D7FCh TL2 push prpage, Unused2@0
  D78Ch TL2 push Px@1, Unused1@0
  D4D7h TL2 push r6, Unused1@5
  D7F8h TL2 push repc, Unused2@0
  D4D4h TL2 push x0, Unused1@5
  D4D5h TL2 push x1, Unused1@5
  D4D6h TL2 push y1, Unused1@5
  4384h TL2 pusha Ax@6, Unused2@0
  D788h TL2 pusha Bx@1, Unused1@0
  0C00h TL  rep  Imm8u@0    ;repeat next opcode N+1 times
  0D00h TL  rep  Register@0 ;repeat next opcode N+1 times
  0002h TL2 rep  r6, Unused1@0
  4580h TL  ret  Cond@0      ;=pop pc
  D780h TL  retd    ;delayed return (after 2 clks)
  45C0h TL  reti Cond@0          ;Don't context switch
  45D0h TL  reti Cond@0, context ;Do context switch
  D7C0h TL  retid   ;delayed, from interrupt
  D3C3h TL2 retid context
  0900h TL  rets Imm8u@0          ;ret+dealloc sp
  67A0h TL  rnd  Implied Const8000h, Ax@12, Cond@0 ;aX=aX+8000h
  6750h TL  rol  Implied Const1,     Ax@12, Cond@0 ;aX=aX rcl 1 (37bit rotate)
  6F50h TL  rol  Implied Const1,     Bx@12, Cond@0 ;bX=bX rcl 1 (37bit rotate)
  6740h TL  ror  Implied Const1,     Ax@12, Cond@0 ;aX=aX rcr 1 (37bit rotate)
  6F40h TL  ror  Implied Const1,     Bx@12, Cond@0 ;bX=bX rcr 1 (37bit rotate)
  E300h TL  rst  Imm16@16, Direct8@0
  82E0h TL  rst  Imm16@16, MemRn@0 || Rn@0stepZIDS@3
  83E0h TL  rst  Imm16@16, Register@0
  47B9h TL2 rst  Imm16@16, r6
  4388h TL2 rst  Imm16@16, SttMod@0
  E100h TL  set  Imm16@16, Direct8@0
  80E0h TL  set  Imm16@16, MemRn@0 || Rn@0stepZIDS@3
  81E0h TL  set  Imm16@16, Register@0
  47B8h TL2 set  Imm16@16, r6
  43C8h TL2 set  Imm16@16, SttMod@0
  D280h TL  shfc Implied sv, Ab@10, Ab@5, Cond@0
  9240h TL  shfi Implied Imm6s@0, Ab@10, Ab@7, Bogus Imm6s@0
  6720h TL  shl  Implied Const1,     Ax@12, Cond@0 ;aX=aX*2
  6F20h TL  shl  Implied Const1,     Bx@12, Cond@0 ;bX=bX*2
  6730h TL  shl4 Implied Const4,     Ax@12, Cond@0 ;aX=aX*10h
  6F30h TL  shl4 Implied Const4,     Bx@12, Cond@0 ;bX=bX*10h
  6700h TL  shr  Implied Const1,     Ax@12, Cond@0 ;aX=aX/2
  6F00h TL  shr  Implied Const1,     Bx@12, Cond@0 ;bX=bX/2
  6710h TL  shr4 Implied Const4,     Ax@12, Cond@0 ;aX=aX/10h
  6F10h TL  shr4 Implied Const4,     Bx@12, Cond@0 ;bX=bX/10h
  BA00h TL  sqr  Direct8@0
  9A80h TL  sqr  MemRn@0 || Rn@0stepZIDS@3
  9AA0h TL  sqr  Register@0
  D790h TL2 sqr  Abh@2 || sqr Abl@2 || add3 p0, p1, Ab@0
  49C4h TL2 sqr  Abh@4 || mpysu Abh@4, Abl@4 || add3a p0, p1, Ab@0
  4B00h TL2 sqr  MemR0426@4 || sqr MemR0426@4offsZIDZ@2 || add3 p0, p1, Ab@0
             || R0426@4stepII2D2S@2
  5F41h TL2 sqr  r6
  BC00h TL  sqra Direct8@0, Ax@8
  9C80h TL  sqra MemRn@0, Ax@8 || Rn@0stepZIDS@3
  9CA0h TL  sqra Register@0, Ax@8
  9062h TL2 sqra r6, Ax@8, Unused1@0
  D4FFh TL  sub  Direct16@16, Ax@8
  AE00h TL  sub  Direct8@0, Ax@8
  8EC0h TL  sub  Imm16@16, Ax@8
  CE00h TL  sub  Imm8u@0, Ax@8
  D4DFh TL  sub  MemR7Imm16@16, Ax@8
  4E00h TL  sub  MemR7Imm7@0, Ax@8
  8E80h TL  sub  MemRn@0, Ax@8 || Rn@0stepZIDS@3
  8EA0h TL  sub  RegisterP0@0, Ax@8
  8A61h TL2 sub  Ab@3, Bx@8
  8861h TL2 sub  Bx@4, Ax@3
  8064h TL2 sub  MemR01@8, sv, Abh@3 || add MemR01@8offsZI@0, sv, Abl@3
             || mov MemR45@8, sv || R01@8stepII2@0, R45@8stepII2@1
  5DE0h TL2 sub  MemR04@1, sv, Abh@2 || add MemR04@1offsZI@0, sv, Abl@2
             || R04@1stepII2@0
  6FC0h TL2 sub  MemR45@2, MemR01@2, Abh@3
             || add MemR45@2offsZI@1, MemR01@2offsZI@0, Abl@3
             || R01@2stepII2@0, R45@2stepII2@1
  6FE0h TL2 sub  MemR45@2, MemR01@2, Abh@3
             || sub MemR45@2offsZI@1, MemR01@2offsZI@0, Abl@3
             || R01@2stepII2@0, R45@2stepII2@1
  5D80h TL2 sub  MemR45@2, sv, Abh@3 || add MemR45@2offsZI@1, sv, Abl@3
             || mov MemR01@2, sv || R01@2stepII2@0, R45@2stepII2@1
  5DC2h TL2 sub  p0, p1, Ab@2
  D4B9h TL2 sub  p1, Ax@8
  8FD0h TL2 sub  Px@1, Bx@0
  D38Fh TL2 sub  r6, Ax@4
  80C6h TL2 sub3 p0, p1, Ab@10
  82C6h TL2 sub3a p0, p1, Ab@10
  83C6h TL2 sub3aa p0, p1, Ab@10
  5DC3h TL2 suba p0, p1, Ab@2
  B600h TL  subh Direct8@0, Ax@8
  9680h TL  subh MemRn@0, Ax@8 || Rn@0stepZIDS@3
  96A0h TL  subh Register@0, Ax@8
  5E23h TL2 subh r6, Ax@8
  B800h TL  subl Direct8@0, Ax@8
  9880h TL  subl MemRn@0, Ax@8 || Rn@0stepZIDS@3
  98A0h TL  subl Register@0, Ax@8
  5E22h TL2 subl r6, Ax@8
  EF00h TL  subv Imm16@16, Direct8@0
  8EE0h TL  subv Imm16@16, MemRn@0 || Rn@0stepZIDS@3
  8FE0h TL  subv Imm16@16, Register@0
  47BFh TL2 subv Imm16@16, r6
  4980h TL  swap SwapTypes4@0
  0020h TL  trap                  ;software interrupt
  A800h TL  tst0 Axl@8, Direct8@0
  8880h TL  tst0 Axl@8, MemRn@0 || Rn@0stepZIDS@3
  88A0h TL  tst0 Axl@8, Register@0
  E900h TL  tst0 Imm16@16, Direct8@0
  88E0h TL  tst0 Imm16@16, MemRn@0 || Rn@0stepZIDS@3
  89E0h TL  tst0 Imm16@16, Register@0
  D38Ch TL2 tst0 Axl@4, r6
  47BCh TL2 tst0 Imm16@16, r6
  9470h TL2 tst0 Imm16@16, SttMod@0
  AA00h TL  tst1 Axl@8, Direct8@0 Implied Not
  8A80h TL  tst1 Axl@8, MemRn@0 Implied Not || Rn@0stepZIDS@3
  8AA0h TL  tst1 Axl@8, Register@0 Implied Not
  EB00h TL  tst1 Imm16@16, Direct8@0 Implied Not
  8AE0h TL  tst1 Imm16@16, MemRn@0 Implied Not || Rn@0stepZIDS@3
  8BE0h TL  tst1 Imm16@16, Register@0 Implied Not
  D38Dh TL2 tst1 Axl@4, r6 Implied Not
  47BDh TL2 tst1 Imm16@16, r6 Implied Not
  9478h TL2 tst1 Imm16@16, SttMod@0 Implied Not
  80C1h TL2 tst4b a0l, MemR0426@10 || R0426@10stepII2D2S@8
  4780h TL2 tst4b a0l, MemR0426@2, Ax@4 || R0426@2stepII2D2S@0
  F000h TL  tstb NoReverse, Direct8@0, Imm4bitno@8
  9020h TL  tstb NoReverse, MemRn@0, Imm4bitno@8 || Rn@0stepZIDS@3
  9000h TL  tstb NoReverse, Register@0, Imm4bitno@8
  9018h TL2 tstb NoReverse, r6, Imm4bitno@8  ;override 9000h+18h (tstb a0,Imm4)
  0028h TL2 tstb NoReverse, SttMod@0, Imm4bitno@16, Unused12@20
  5F45h TL2 vtrclr vtr0
  5F47h TL2 vtrclr vtr0, vtr1
  5F46h TL2 vtrclr vtr1
  D383h TL2 vtrmov Axl@4
  D29Ah TL2 vtrmov vtr0, Axl@0
  D69Ah TL2 vtrmov vtr1, Axl@0
  D781h TL2 vtrshr
  D4FAh TL  xor  Direct16@16, Ax@8
  A400h TL  xor  Direct8@0, Ax@8
  84C0h TL  xor  Imm16@16, Ax@8
  C400h TL  xor  Imm8u@0, Ax@8
  D4DAh TL  xor  MemR7Imm16@16, Ax@8
  4400h TL  xor  MemR7Imm7@0, Ax@8
  8480h TL  xor  MemRn@0, Ax@8 || Rn@0stepZIDS@3
  84A0h TL  xor  RegisterP0@0, Ax@8
  D38Ah TL2 xor  r6, Ax@4
  8800h TL  undefined Unused5@0, Unused1@8 ;(mpy/mpys without A in bit11)
  8820h TL  undefined Unused5@0, Unused1@8 ;(mpy/mpys without A in bit11)
  8840h TL  undefined Unused5@0, Unused1@8 ;(mpy/mpys without A in bit11)
  D800h TL  undefined Unused7@0, Unused1@8 ;(mpy/mpys without A in bit11)
  9B80h TL  undefined Unused6@0  ;(sqr without A in bit8)
  BB00h TL  undefined Unused8@0  ;(sqr without A in bit8)
  E800h TL  undefined Unused8@0  ;(mpy without A in bit11)
  5EA1h TL2 undefined Unused1@1  ;(mpy/mpys without A in bit11)
  5DFCh TL2 undefined
  8CDEh TL2 undefined
  D3C1h TL2 undefined
  5EB4h TL2 undefined Unused2@0
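
  In the list above, the number after the "@" in each operand appears to be
  the bit position of that operand's field within the opcode word. The
  override notes agree with this reading: 1B20h minus 1800h leaves 19h in bits
  5-9, which is a1 in the Register table. The following is a minimal Python
  sketch of pulling such fields out of an opcode. The helper name "field" and
  the field widths (5 bits for Register, 3 for Rn, 2 for the step mode) are
  assumptions inferred from the sizes of the operand tables; they are not
  stated in the list itself.

```python
def field(opcode, pos, width):
    """Return the width-bit field that starts at bit 'pos' of a 16-bit opcode."""
    return (opcode >> pos) & ((1 << width) - 1)

# 1800h TL  mov  Register@5, MemRn@0 || Rn@0stepZIDS@3
BASE = 0x1800

# Build an example word: Register=0Dh (sp), step mode=1 (I), Rn=2 (r2).
opcode = BASE | (0x0D << 5) | (1 << 3) | 2

reg = field(opcode, 5, 5)    # Register@5      (assumed 5 bits wide)
step = field(opcode, 3, 2)   # stepZIDS@3      (assumed 2 bits wide)
rn = field(opcode, 0, 3)     # Rn@0            (assumed 3 bits wide)

print("%04Xh: Register=%02Xh step=%d Rn=%d" % (opcode, reg, step, rn))

# Cross-check against the override comments in the list:
# 1B20h is described as overriding "mov a1,MemRn@0" and 1B00h "mov a0,MemRn@0".
override_a1 = field(0x1B20, 5, 5)   # 19h = a1 in the Register table
override_a0 = field(0x1B00, 5, 5)   # 18h = a0 in the Register table
```

  A full decoder would also need a match mask per opcode, so that the more
  specific TL2 entries are tested before the TL entries they override.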

  DSi TeakLite II Operand Encoding

  Ax:     Axl:    Axh:    Ay:     Ayl:    Ayh:
  0: a0   0: a0l  0: a0h  0: a1   0: a1l  0: a1h
  1: a1   1: a1l  1: a1h  1: a0   1: a0l  1: a0h

  Bx:     Bxl:    Bxh:    Byl:    Byh:
  0: b0   0: b0l  0: b0h  0: b1l  0: b1h
  1: b1   1: b1l  1: b1h  1: b0l  1: b0h

  Ab:     Abl:    Abh:    Abe:
  0: b0   0: b0l  0: b0h  0: b0e
  1: b1   1: b1l  1: b1h  1: b1e
  2: a0   2: a0l  2: a0h  2: a0e
  3: a1   3: a1l  3: a1h  3: a1e
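
  The accumulator tables above can be written as plain lookup lists. The
  Python sketch below decodes the two Ab fields of "D290h mov Ab@10, Ab@5" and
  the Ax field of "8460h max", where the "Ax@not8" operand is taken to mean
  the Ay column (the other accumulator). The function names are hypothetical,
  and the 2-bit and 1-bit field widths are assumptions based on the table
  sizes.

```python
AX = ["a0", "a1"]
AY = ["a1", "a0"]                  # Ay column: the accumulator that Ax is not
AB = ["b0", "b1", "a0", "a1"]

def decode_ab(opcode, pos):
    """Name of the accumulator selected by a 2-bit Ab field at bit 'pos'."""
    return AB[(opcode >> pos) & 3]

def decode_ax_pair(opcode, pos):
    """Return (Ax, Ay) names for a 1-bit Ax field at bit 'pos'."""
    bit = (opcode >> pos) & 1
    return AX[bit], AY[bit]

# D290h TL  mov  Ab@10, Ab@5  with Ab@10=2 and Ab@5=1
mov_word = 0xD290 | (2 << 10) | (1 << 5)
first = decode_ab(mov_word, 10)
second = decode_ab(mov_word, 5)

# 8460h TL  max  ..., Ax@8, Implied Ax@not8, ...  with Ax@8=1
max_word = 0x8460 | (1 << 8)
ax, ay = decode_ax_pair(max_word, 8)

print("%04Xh: %s, %s" % (mov_word, first, second))
print("%04Xh: Ax=%s Ay=%s" % (max_word, ax, ay))
```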

  Ablh:
  0: b0l
  1: b0h
  2: b1l
  3: b1h
  4: a0l
  5: a0h
  6: a1l
  7: a1h

  Cond:
  0: true  ;Always
  1: eq    ;Equal to zero                   Z = 1
  2: neq   ;Not equal to zero               Z = 0
  3: gt    ;Greater than zero               M = 0 and Z = 0
  4: ge    ;Greater or equal to zero        M = 0
  5: lt    ;Less than zero                  M = 1
  6: le    ;Less or equal to zero           M = 1 or Z = 1
  7: nn    ;Normalize flag is cleared       N = 0
  8: c     ;Carry flag is set               C = 1
  9: v     ;Overflow flag is set            V = 1
  A: e     ;Extension flag is set           E = 1
  B: l     ;Limit flag is set               L = 1
  C: nr    ;R flag is cleared               R = 0
  D: niu0  ;Input user pin 0, IUSER0, is cleared
  E: iu0   ;Input user pin 0, IUSER0, is set
  F: iu1   ;Input user pin 1, IUSER1, is set
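
  The condition table above maps directly onto a flag test. The Python sketch
  below evaluates a 4-bit condition code against a set of flags. Holding the
  flags in a dictionary, and passing the IUSER0/IUSER1 pins as if they were
  flags, is an illustration only and not how the hardware or no$gba
  represents them.

```python
COND_NAMES = ["true", "eq", "neq", "gt", "ge", "lt", "le", "nn",
              "c", "v", "e", "l", "nr", "niu0", "iu0", "iu1"]

def cond_passes(cond, f):
    """Return True if condition code 'cond' (0..15) holds for flag dict 'f'."""
    z, m = f["Z"], f["M"]
    checks = [
        True,                  # 0: true
        z == 1,                # 1: eq
        z == 0,                # 2: neq
        m == 0 and z == 0,     # 3: gt
        m == 0,                # 4: ge
        m == 1,                # 5: lt
        m == 1 or z == 1,      # 6: le
        f["N"] == 0,           # 7: nn
        f["C"] == 1,           # 8: c
        f["V"] == 1,           # 9: v
        f["E"] == 1,           # A: e
        f["L"] == 1,           # B: l
        f["R"] == 0,           # C: nr
        f["IUSER0"] == 0,      # D: niu0
        f["IUSER0"] == 1,      # E: iu0
        f["IUSER1"] == 1,      # F: iu1
    ]
    return checks[cond]

# Example: flags after a result that was negative and non-zero.
flags = {"Z": 0, "M": 1, "N": 0, "C": 0, "V": 0, "E": 0, "L": 0, "R": 0,
         "IUSER0": 0, "IUSER1": 0}

# 6790h TL  neg  Ax@12, Cond@0  with Cond=5 (lt)
opcode = 0x6790 | 5
cond = opcode & 0x0F
print(COND_NAMES[cond], cond_passes(cond, flags))
```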

  Px:
  0: p0
  1: p1

  Register:
  00: r0
  01: r1
  02: r2
  03: r3
  04: r4
  05: r5
  06: r7    ;aka rb
  07: y0    ;aka y
  08: st0
  09: st1
  0A: st2
  0B: p0h       ;"p0 / p0h"      ;aka "p / ph"  ?
  0C: pc
  0D: sp
  0E: cfgi
  0F: cfgj
  10: b0h
  11: b1h
  12: b0l
  13: b1l
  14: ext0
  15: ext1
  16: ext2
  17: ext3
  18: a0
  19: a1
  1A: a0l
  1B: a1l
  1C: a0h
  1D: a1h
  1E: lc
  1F: sv

  RegisterP0:
  00: r0
  01: r1
  02: r2
  03: r3
  04: r4
  05: r5
  06: r7    ;aka rb
  07: y0    ;aka y
  08: st0
  09: st1
  0A: st2
  0B: p0   ;<-- "P0" here     ;"p0 / p0h"      ;aka "p / ph"  ?
  0C: pc
  0D: sp
  0E: cfgi
  0F: cfgj
  10: b0h
  11: b1h
  12: b0l
  13: b1l
  14: ext0
  15: ext1
  16: ext2
  17: ext3
  18: a0
  19: a1
  1A: a0l
  1B: a1l
  1C: a0h
  1D: a1h
  1E: lc
  1F: sv
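
  The two 32-entry register tables above are identical except for entry 0Bh
  (p0h in the first, p0 in the second). In the opcode list they appear to
  correspond to the "Register" and "RegisterP0" operands. The Python sketch
  below builds the second table from the first and confirms the single
  difference; the list names are hypothetical.

```python
REGISTER = [
    "r0", "r1", "r2", "r3", "r4", "r5", "r7", "y0",              # 00h-07h
    "st0", "st1", "st2", "p0h", "pc", "sp", "cfgi", "cfgj",      # 08h-0Fh
    "b0h", "b1h", "b0l", "b1l", "ext0", "ext1", "ext2", "ext3",  # 10h-17h
    "a0", "a1", "a0l", "a1l", "a0h", "a1h", "lc", "sv",          # 18h-1Fh
]

# Second table: same encoding, but index 0Bh names the whole p0 register.
REGISTER_P0 = list(REGISTER)
REGISTER_P0[0x0B] = "p0"

# Indices at which the two tables disagree.
diff = [i for i in range(32) if REGISTER[i] != REGISTER_P0[i]]

def reg_name(index, p0_variant=False):
    """Look up a 5-bit register index in either table."""
    table = REGISTER_P0 if p0_variant else REGISTER
    return table[index & 0x1F]

print(["%02Xh" % i for i in diff])
print(reg_name(0x0B), reg_name(0x0B, p0_variant=True))
```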

  R0123457y0:     Rn:                             R01:    R04:    R45:
  0: r0           0: r0                           0: r0   0: r0   0: r4
  1: r1           1: r1                           1: r1   1: r4   1: r5
  2: r2           2: r2
  3: r3           3: r3                           R0123:  R0426:  R4567:
  4: r4           4: r4                           0: r0   0: r0   0: r4
  5: r5           5: r5                           1: r1   1: r4   1: r5
  6: r7  ;aka rb  6: r6  ;TL2 only                2: r2   2: r2   2: r6
  7: y0  ;aka y   7: r7  ;TL2 only aka rb         3: r3   3: r6   3: r7

  ArArpSttMod:    ArArp:          Ar:             SttMod:
  0: ar0          0: ar0          0: ar0          0: stt0
  1: ar1          1: ar1          1: ar1          1: stt1
  2: arp0         2: arp0                         2: stt2
  3: arp1         3: arp1         Arp:            3: reserved
  4: arp2         4: arp2         0: arp0         4: mod0
  5: arp3         5: arp3         1: arp1         5: mod1
  6: reserved     6: reserved     2: arp2         6: mod2
  7: reserved     7: reserved     3: arp3         7: mod3
  8: stt0
  9: stt1
  A: stt2
  B: reserved
  C: mod0
  D: mod1
  E: mod2
  F: mod3

 db '',0,'   ' ;Z  (zero)
 db '+1',0,' ' ;I  (increment)
 db '-1',0,' ' ;D  (decrement)
 db '+s',0,' ' ;S  (add step)           ;XXX ?   see "stepi" and "stepj"
 db '',0,'   ' ;Z  (zero)
 db '+',0,'  ' ;I  (increment)
 db '-',0,'  ' ;D  (decrement)
 db '+s',0,' ' ;S  (add step)           ;XXX ?   see "stepi" and "stepj"
 db '+1',0,' ' ;I  (increment)
 db '+2',0,' ' ;I2 (increment twice)
 db '-2',0,' ' ;D2 (decrement twice)
 db '+s',0,' ' ;S  (add step)           ;XXX ?   see "stepi" and "stepj"
 db '-2',0,' ' ;D2 (decrement twice)
 db '+s',0,' ' ;S  (add step)           ;XXX ?   see "stepi" and "stepj"
 db '+',0,'  ' ;I  (increment)
 db '+2',0,' ' ;I2 (increment twice)
 db '-2',0,' ' ;D2 (decrement twice)
 db '+s0',0,'' ;S0 (add step0 ?)        ;XXX ?   see "stepi" and "stepj"
 db '+1',0,' ' ;I  (increment)
 db '+2',0,' ' ;I2 (increment twice)
 db '+2',0,' ' ;I2 (increment twice)
 db '-2',0,' ' ;D2 (decrement twice)
 db '',0,'   ' ;Z  (zero)
 db '+',0,' '  ;I  (increment)
 db '+',0,' '  ;I  (increment)
 db '',0,'   ' ;Z  (zero)
 db '+',0,'  ' ;I  (increment)
 db '-',0,'  ' ;D  (decrement)
 db '',0,'   ' ;Z  (zero)

 native           nocash         ;meaning
 (a0,b0)          a0,b0          ;0  a0 <--> b0
 (a0,b1)          a0,b1          ;1  a0 <--> b1
 (a1,b0)          a1,b0          ;2  a1 <--> b0
 (a1,b1)          a1,b1          ;3  a1 <--> b1
 (a0,b0),(a1,b1)  a0:a1,b0:b1    ;4  a0 <--> b0 and a1 <--> b1
 (a0,b1),(a1,b0)  a0:a1,b1:b0    ;5  a0 <--> b1 and a1 <--> b0
 (a0,b0,a1)       a1,b0,a0       ;6  a0 --> b0 --> a1
 (a0,b1,a1)       a1,b1,a0       ;7  a0 --> b1 --> a1
 (a1,b0,a0)       a0,b0,a1       ;8  a1 --> b0 --> a0
 (a1,b1,a0)       a0,b1,a1       ;9  a1 --> b1 --> a0
 (b0,a0,b1)       b1,a0,b0       ;A  b0 --> a0 --> b1
 (b0,a1,b1)       b1,a1,b0       ;B  b0 --> a1 --> b1
 (b1,a0,b0)       b0,a0,b1       ;C  b1 --> a0 --> b0
 (b1,a1,b0)       b0,a1,b1       ;D  b1 --> a1 --> b0
 reserved         reserved       ;E  -
 reserved         reserved       ;F  -

  DSi New Shared WRAM (for ARM7, ARM9, DSP)

Shared WRAM (total 800Kbytes)
  Old WRAM-0/1 32Kbytes (2x16K), mappable to ARM7, or ARM9
  New WRAM-A  256Kbytes (4x64K), mappable to ARM7, or ARM9
  New WRAM-B  256Kbytes (8x32K), mappable to ARM7, ARM9, or DSP-program memory
  New WRAM-C  256Kbytes (8x32K), mappable to ARM7, ARM9, or DSP-data memory
New WRAM mapping is done in two steps: first, the physical banks are mapped to logical slots; then, those slots are mapped to actual memory addresses.

4004040h - DSi9 - MBK1.0, WRAM-A0 - 64K, mappable to ARM7, or ARM9
4004041h - DSi9 - MBK1.1, WRAM-A1 - 64K, mappable to ARM7, or ARM9
4004042h - DSi9 - MBK1.2, WRAM-A2 - 64K, mappable to ARM7, or ARM9
4004043h - DSi9 - MBK1.3, WRAM-A3 - 64K, mappable to ARM7, or ARM9
  0    Master (0=ARM9, 1=ARM7)
  1    Not used
  2-3  Offset (0..3) (slot 0..3) (LSB of address in 64Kbyte units)
  4-6  Not used
  7    Enable (0=Disable, 1=Enable)
In cooking coach, above four bytes are locked via MBK9 (not write-able, always 81h,85h,89h,8Dh)?
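
The bit layout above can be illustrated with a small Python sketch that packs and unpacks one MBK1.x byte. The helper names are invented for this example (they are not from any SDK); only the bit positions are taken from the table.

```python
def pack_wram_a_slot(master_arm7, offset, enable):
    """Pack one MBK1.x byte: bit0=Master, bit2-3=Offset, bit7=Enable."""
    if not 0 <= offset <= 3:
        raise ValueError("WRAM-A offset must be 0..3")
    return int(bool(master_arm7)) | (offset << 2) | (int(bool(enable)) << 7)


def unpack_wram_a_slot(value):
    """Split an MBK1.x byte back into its three documented fields."""
    return {
        "master_arm7": bool(value & 0x01),
        "offset": (value >> 2) & 0x03,
        "enable": bool(value & 0x80),
    }


# ARM7 as master, slots 0..3, enabled: the byte values reported as locked above.
locked = [pack_wram_a_slot(True, n, True) for n in range(4)]
print(", ".join(f"{v:02X}h" for v in locked))
```

With ARM7 as master and offsets 0..3, this reproduces the four byte values quoted above.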

4004044h - DSi9 - MBK2.0, WRAM-B0 - 32K, mappable to ARM7, ARM9, or DSP/code
4004045h - DSi9 - MBK2.1, WRAM-B1 - 32K, mappable to ARM7, ARM9, or DSP/code
4004046h - DSi9 - MBK2.2, WRAM-B2 - 32K, mappable to ARM7, ARM9, or DSP/code
4004047h - DSi9 - MBK2.3, WRAM-B3 - 32K, mappable to ARM7, ARM9, or DSP/code
4004048h - DSi9 - MBK3.0, WRAM-B4 - 32K, mappable to ARM7, ARM9, or DSP/code
4004049h - DSi9 - MBK3.1, WRAM-B5 - 32K, mappable to ARM7, ARM9, or DSP/code
400404Ah - DSi9 - MBK3.2, WRAM-B6 - 32K, mappable to ARM7, ARM9, or DSP/code
400404Bh - DSi9 - MBK3.3, WRAM-B7 - 32K, mappable to ARM7, ARM9, or DSP/code
  0-1  Master (0=ARM9, 1=ARM7, 2 or 3=DSP/code)
  2-4  Offset (0..7) (slot 0..7) (LSB of address in 32Kbyte units)
  5-6  Not used (0)
  7    Enable (0=Disable, 1=Enable)

400404Ch - DSi9 - MBK4.0, WRAM-C0 - 32K, mappable to ARM7, ARM9, or DSP/data
400404Dh - DSi9 - MBK4.1, WRAM-C1 - 32K, mappable to ARM7, ARM9, or DSP/data
400404Eh - DSi9 - MBK4.2, WRAM-C2 - 32K, mappable to ARM7, ARM9, or DSP/data
400404Fh - DSi9 - MBK4.3, WRAM-C3 - 32K, mappable to ARM7, ARM9, or DSP/data
4004050h - DSi9 - MBK5.0, WRAM-C4 - 32K, mappable to ARM7, ARM9, or DSP/data
4004051h - DSi9 - MBK5.1, WRAM-C5 - 32K, mappable to ARM7, ARM9, or DSP/data
4004052h - DSi9 - MBK5.2, WRAM-C6 - 32K, mappable to ARM7, ARM9, or DSP/data
4004053h - DSi9 - MBK5.3, WRAM-C7 - 32K, mappable to ARM7, ARM9, or DSP/data
  0-1  Master (0=ARM9, 1=ARM7, 2 or 3=DSP/data)
  2-4  Offset (0..7) (slot 0..7) (LSB of address in 32Kbyte units)
  5-6  Not used (0)
  7    Enable (0=Disable, 1=Enable)

4004054h - DSi9 - MBK6, WRAM-A, 64K..256K mapping
  0-3   Not used (0)
  4-11  Start Address (3000000h+N*10000h)     ;=3000000h..3FF0000h
  12-13 Image Size (0 or 1=64KB/Slot0, 2=128KB/Slot0+1+2??, 3=256KB/Slot0..3)
  14-19 Not used (0)
  20-28 End Address   (3000000h+N*10000h-1)   ;=2FFFFFFh..4FEFFFFh
  29-31 Not used (0)
Uh, does this affect only ARM9 mapping? Or also ARM7 mapping?
Uh, but, ARM7 3800000h..3FFFFFFh contains OTHER memory (ARM7-WRAM) !?

4004058h - DSi9 - MBK7, WRAM-B
400405Ch - DSi9 - MBK8, WRAM-C
  0-2   Not used (0)
  3-11  Start Address (3000000h+N*8000h)      ;=3000000h..3FF8000h
  12-13 Image Size (0=32K/Slot0,1=64KB/Slot0-1,2=128KB/Slot0-3,3=256KB/Slot0-7)
  14-18 Not used (0)
  19-28 End Address   (3000000h+N*8000h-1)    ;=2FFFFFFh..4FF7FFFh
  29-31 Not used (0)
Uh, does this affect only ARM9 mapping? Or also ARM7 and DSP mapping?
Uh, but, ARM7 3800000h..3FFFFFFh contains OTHER memory (ARM7-WRAM) !?
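
A minimal Python sketch of how the MBK6 and MBK7/MBK8 bitfields decode into addresses, following only the bit positions listed above. The function names are hypothetical, and the open questions about which CPU the window applies to are not addressed here.

```python
def decode_mbk6(value):
    """WRAM-A window (64Kbyte units): returns (start, size_code, end)."""
    start = 0x3000000 + ((value >> 4) & 0xFF) * 0x10000
    size_code = (value >> 12) & 0x3
    end = 0x3000000 + ((value >> 20) & 0x1FF) * 0x10000 - 1
    return start, size_code, end


def decode_mbk7_8(value):
    """WRAM-B/C window (32Kbyte units): returns (start, size_code, end)."""
    start = 0x3000000 + ((value >> 3) & 0x1FF) * 0x8000
    size_code = (value >> 12) & 0x3
    end = 0x3000000 + ((value >> 19) & 0x3FF) * 0x8000 - 1
    return start, size_code, end


start, size_code, end = decode_mbk7_8((6 << 3) | (3 << 12) | (12 << 19))
print(f"{start:07X}h..{end:07X}h, size code {size_code}")
```

Setting all start/end bits yields the upper limits given in the comments of the tables (3FF0000h/4FEFFFFh for WRAM-A, 3FF8000h/4FF7FFFh for WRAM-B/C).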

4004060h - DSi9 - MBK9, WRAM-A/B/C Slot Master Selection (undocumented) (R)
  0-3   WRAM-A, Port 4004040h-4004043h Master (0=ARM9, 1=ARM7)
  4-7   Unknown/Unused (0)
  8-15  WRAM-B, Port 4004044h-400404Bh Master (0=ARM9, 1=ARM7)
  16-23 WRAM-C, Port 400404Ch-4004053h Master (0=ARM9, 1=ARM7)
  24-31 Unknown/Unused (0)
Selects which CPU shall control the WRAM slot registers at 4004040h-4004053h (when selecting ARM7 as master, then the registers become Read-Only on ARM9 side).

ARM7 Side
WRAM-related I/O Ports at ARM7 side are unknown (if any).
GUESS: Maybe ports 4004040h..4004053h exist as mirror?
GUESS: Maybe ports 4004054h..400405Fh exist as separate ARM7 registers?
GUESS: Maybe ports 4004060h..4004063h exist as write-able mirror?
Existing DSi exploits don't permit access to those registers on the ARM7 side.

Slots and Image Size vs Start/End Addresses
When using Image Size of 4 slots, then Memory at 3000000h..3FFFFFFh is:
  Slots  0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3,etc.
When start=6, and End=12, then (with above example), only following is mapped:
  Slots  -,-,-,-,-,-,2,3,0,1,2,3,-,-,-,-,etc.
Observe that the mapped region starts with Slot 2 (not Slot 0) in that case.
Moreover, some slots may be empty (disabled, or mapped to another CPU); so, if Slot 3 is empty, then memory would probably look somewhat like this: (?)
  Slots  -,-,-,-,-,-,2,-,0,1,2,-,-,-,-,-,etc.
If so, it is unknown what is mapped to those "empty" areas (mirrors, or underlying WRAMs of lower priority, or zeroes, or garbage, or whatever).
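
The slot patterns above can be reproduced with a short Python sketch. It assumes Start/End are counted in slot-sized units with End exclusive (as implied by the "-1" in the End Address formula); the function name and the `enabled` parameter are made up for this illustration, and the guess about empty slots remains a guess.

```python
def slot_layout(start, end, slots_per_image, enabled=None, units=16):
    """Return the slot visible in each unit of the 3000000h+ region."""
    if enabled is None:
        enabled = range(slots_per_image)
    enabled = set(enabled)
    cells = []
    for unit in range(units):
        slot = unit % slots_per_image          # image repeats across the region
        visible = start <= unit < end and slot in enabled
        cells.append(str(slot) if visible else "-")
    return ",".join(cells)


print(slot_layout(6, 12, 4))
print(slot_layout(6, 12, 4, enabled=[0, 1, 2]))
```

The first call corresponds to the Start=6/End=12 example, the second to the hypothetical case with Slot 3 empty.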

Overlapping WRAM regions
  New Shared-WRAM-A   Highest Priority
  New Shared-WRAM-B   High Priority
  New Shared-WRAM-C   Low Priority
  Old Shared-WRAM-0/1 Lowest Priority
  Old ARM7-WRAM       Whatever Priority (unknown...)
Overlapping WRAM slots
  Unknown what happens when selecting multiple WRAM blocks to the same slot?

  DSi New DMA

The DSi has four new DMA channels for ARM7 and ARM9 (each) (eight new DMA channels in total). The old NDS-style DMA channels do probably still exist, too [though unknown which priority they have in relation to new channels].

4004100h - DSi - NDMAGCNT NewDMA Global Control (R/W) [00000000h]
  0-15  Unused (0)
  16-19 Cycle Selection (0=None, 1..15=1..16384 clks) ;1 SHL (N-1)
  20-30 Unused (0)
  31    DMA Arbitration Mode (0=NDMA0=HighestPriority, 1=RoundRobinPriority)
CycleSelection is used ONLY in RoundRobin mode; if so... then it does specify the number of cycles that can be executed by ARM9 and DSP <CPUs?> during NDMA?

4004104h+x*1Ch - NDMAxSAD - NewDMAx Source Address (R/W) [00000000h]
4004108h+x*1Ch - NDMAxDAD - NewDMAx Destination Address (R/W) [00000000h]
  0-1   Unused (0)
  2-31  DMA Source/Destination Address, in 4-byte steps

400410Ch+x*1Ch - NDMAxTCNT - NewDMAx Total Length for Repeats (R/W) [0]
  0-27  Total Number of Words to Transfer (1..0FFFFFFFh, or 0=10000000h)
  28-31 Unused (0)
Not used in "Start immediately" mode (which doesn't repeat).
Not used in "Repeat infinitely" mode (which repeats forever).
Used only in "Repeat until NDMAxTCNT" mode (for example, to define the total size of the Camera picture).
Total Length isn't required to be a multiple of the Logical or Physical Block Sizes (for example the DSi launcher uses Total=64h with Log=8/Phys=8; in that case only 4 words (instead of 8 words) are transferred for the last block).
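
The arithmetic behind the launcher example can be sketched in Python as below. The function name is hypothetical; the "0 means maximum" handling follows the NDMAxTCNT and NDMAxWCNT field descriptions.

```python
def split_total_length(total_words, logical_words):
    """Return (number of full logical blocks, words in the final short block)."""
    total = total_words or 0x10000000      # NDMAxTCNT: 0 means 10000000h words
    logical = logical_words or 0x1000000   # NDMAxWCNT: 0 means 01000000h words
    return divmod(total, logical)


full_blocks, last_block_words = split_total_length(0x64, 8)
print(full_blocks, last_block_words)
```

For Total=64h and Log=8 this gives twelve full blocks followed by one block of 4 words, matching the description above.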

4004110h+x*1Ch - NDMAxWCNT - NewDMAx Logical Block Size
  0-23  Number of Words to Transfer       (1..00FFFFFFh, or 0=01000000h)
  24-31 Unused (0)
Should be a multiple of the Physical Block Size specified in NDMAxCNT.Bit16-19.
The bus will be monopolized until the selected number of words for (physical) block transfers has completed; a single (physical) block transfer cycle will never be split up.

4004114h+x*1Ch - NDMAxBCNT - NewDMAx Block Transfer Timing/Interval
  0-15  Interval Timer (1..FFFFh, or 0=Infinite/TillTransferEnd)
  16-17 Prescaler (33.514MHz SHR (n*2)) ;0=33MHz, 1=8MHz, 2=2MHz, 3=0.5MHz
  18-31 Unused (0)
Allows a delay to be inserted after each (Physical?) Block.

4004118h+x*1Ch - NDMAxFDATA - NewDMAx Fill Data
  0-31  Fill Data (can be used as Fixed Source Data for memfill's)
This value is used when setting NDMAxCNT.Bit13-14=3, which causes the source data to be read directly (within 0 clock cycles) from the NDMAxFDATA (instead of from the address specified in NDMAxSAD; in this case, the NDMAxSAD is ignored/don't care).

400411Ch+x*1Ch - NDMAxCNT - NewDMAx Control
  0-9   Unused (0)
  10-11 Dest Address Update   (0=Increment, 1=Decrement, 2=Fixed, 3=Reserved)
  12    Dest Address Reload   (0=No, 1=Reload at (logical blk?) transfer end)
  13-14 Source Address Update (0=Increment, 1=Decrement, 2=Fixed, 3=FillData)
  15    Source Address Reload (0=No, 1=Reload at (logical blk?) transfer end)
  16-19 Physical Block Size   (0..0Fh=1..32768 words, aka (1 SHL n) words)
  20-23 Unused (0)
  24-28 DMA Startup Mode      (00h..1Fh, see ARM7/ARM9 startup lists below)
  29    DMA Repeat Mode       (0=Repeat until NDMAxTCNT, 1=Repeat infinitely)
  30    DMA Interrupt Enable  (0=Disable, 1=Enable)
  31    DMA Enable/Busy       (0=Disable, 1=Enable/Busy)
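
As a rough illustration, the following Python sketch computes per-channel register addresses (base+x*1Ch) and assembles an NDMAxCNT value from the bitfields above. All names are invented for this example; it only builds numbers and does not touch hardware.

```python
NDMA_REGS = ("SAD", "DAD", "TCNT", "WCNT", "BCNT", "FDATA", "CNT")


def ndma_reg_addr(channel, name):
    """Address of NDMAx<name>, for channel x = 0..3."""
    if not 0 <= channel <= 3:
        raise ValueError("channel must be 0..3")
    return 0x4004104 + channel * 0x1C + NDMA_REGS.index(name) * 4


def ndma_cnt(dst_update=0, dst_reload=False, src_update=0, src_reload=False,
             phys_block_log2=0, startup=0x10, repeat_infinite=False,
             irq=False, enable=True):
    """Assemble NDMAxCNT; phys_block_log2=n selects (1 SHL n) words."""
    if not (0 <= dst_update <= 2 and 0 <= src_update <= 3):
        raise ValueError("address update mode out of range")
    if not (0 <= phys_block_log2 <= 0x0F and 0 <= startup <= 0x1F):
        raise ValueError("block size or startup mode out of range")
    return ((dst_update << 10) | (int(dst_reload) << 12) |
            (src_update << 13) | (int(src_reload) << 15) |
            (phys_block_log2 << 16) | (startup << 24) |
            (int(repeat_infinite) << 29) | (int(irq) << 30) |
            (int(enable) << 31))


# Memfill sketch: source=FillData (3), start immediately (10h), enabled.
print(f"{ndma_reg_addr(0, 'CNT'):07X}h <- {ndma_cnt(src_update=3):08X}h")
```

The memfill case uses Source Address Update=3, so the data would come from NDMAxFDATA rather than from NDMAxSAD.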

Startup Modes for ARM9:
  00h      Timer0                       ;\
  01h      Timer1                       ; new NDMA-specific modes
  02h      Timer2                       ;
  03h      Timer3                       ;/
  04h      DS Cartridge Slot
  05h      Reserved (maybe 2nd DS-Cart Slot, or GBA slot relict?)
  06h      V-Blank
  07h      H-Blank (but not during V-blank)
  08h      Display Sync (sync to H-blank drawing) ;Uh, what is BLANK-DRAWING ??
  09h      Work RAM (what?) (=probably Main memory display, as on NDS)
  0Ah      Geometry Command FIFO
  0Bh      Camera                       ;-new NDMA-specific mode
  0Ch..0Fh Reserved
  10h..1Fh Start immediately (without repeat)

Startup Modes for ARM7:
  00h      Timer0                       ;\
  01h      Timer1                       ; new NDMA-specific modes
  02h      Timer2                       ;
  03h      Timer3                       ;/
  04h      DS Cartridge Slot
  05h      Reserved? (maybe 2nd DS-Cart Slot, or GBA slot relict?)
  06h      V-Blank
  07h      Wifi
  08h      SD/MMC I/F 1    ;what "1" ?  ;\
  09h      SD/MMC I/F 2    ;what "2" ?  ;
  0Ah      AES in  (AES_WRFIFO)         ; new NDMA-specific modes
  0Bh      AES out (AES_RDFIFO) / MIC?  ;
  0Ch      MIC?                         ;/
  0Dh..0Fh Reserved?
  10h..1Fh Start immediately (without repeat)

Start/repeat modes
There are three different transfer modes.
1) Start immediately (without repeat):
the transfer ends after one Logical Block, without repeat. With single IRQ (after last/only block).
2) Start by Hardware events, Repeat until NDMAxTCNT:
the transfer repeats Logical Blocks until reaching the Total Length. With single IRQ (after last block).
3) Start by Hardware events, Repeat infinitely:
the transfer repeats Logical Blocks infinitely. With multiple IRQs (one IRQ after EACH logical block).

Read-only Effect
There is something that can make port 4004104h..4004173h read-only. For example, when FFh-filling all DSi registers, and then 00h-filling them, then most DMA bits stay set (00h-filling them another time does clear them).
Maybe, during enabled transfers, ONLY the enable/busy bit is writeable?

  DSi SoundExt

4004700h - DSi7 - SNDEXCNT (16bit) (can be 0000C00Fh)
  0-3     NITRO/DSP ratio                   (valid range is 0 to 8)       (R/W)
  4-12    Unknown/Unused (0)                                               (0?)
  13      Sound/Microphone I2S frequency (0=32.73 kHz, 1=47.61 kHz)  (R or R/W)
  14      Mute status                                (?=Mute WHAT?)       (R/W)
  15      Enable Microphone (and Sound Output?)          (1=Enable)       (R/W)
The DSP can generate sound output as well, alongside the old NITRO sound mixer. The following settings configure the ratio between DSP and NITRO mixer output:
  00h      DSP sound 8/8, NITRO sound 0/8 (=DSP sound only)
  01h      DSP sound 7/8, NITRO sound 1/8
  02h      DSP sound 6/8, NITRO sound 2/8
  03h      DSP sound 5/8, NITRO sound 3/8
  04h      DSP sound 4/8, NITRO sound 4/8
  05h      DSP sound 3/8, NITRO sound 5/8
  06h      DSP sound 2/8, NITRO sound 6/8
  07h      DSP sound 1/8, NITRO sound 7/8
  08h      DSP sound 0/8, NITRO sound 8/8 (=NITRO sound only)
  09h..0Fh Reserved
Uh, what is that? Hopefully, a volume-ratio? Preferably, no time-ratio!
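
A small Python sketch of the SNDEXCNT fields as listed above. The function name and dictionary keys are made up for this example, and the meaning of the ratio (volume vs time) remains as uncertain as noted.

```python
def decode_sndexcnt(value):
    """Decode the documented SNDEXCNT bits; reserved ratios yield None."""
    n = value & 0x0F
    ratio = (8 - n, n) if n <= 8 else None     # (DSP eighths, NITRO eighths)
    return {
        "dsp_nitro_eighths": ratio,
        "i2s_hz": 47610 if value & 0x2000 else 32730,
        "mute": bool(value & 0x4000),
        "enable": bool(value & 0x8000),
    }


print(decode_sndexcnt(0x8003))
```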

  DSi Advanced Encryption Standard (AES)

AES I/O Ports
DSi AES I/O Ports

AES Pseudo Code
Little Endian Code (as used in DSi hardware):
DSi AES Little-Endian High Level Functions
DSi AES Little-Endian Core Function and Key Schedule
DSi AES Little-Endian Tables and Test Values
Big Endian Code (as used more commonly, in non-DSi implementations):
DSi AES Big-Endian High Level Functions
DSi AES Big-Endian Core Function and Key Schedule
DSi AES Big-Endian Tables and Test Values
Most AES values are endian-free byte-strings, so different "endianness" does just mean to reverse the byte order of the 16/24/32-byte KEYs, the 16-byte data chunks, and the 16-byte CTR/CFB/CBC/MAC registers (in some of the latter cases it's also referring to actual endianness, eg. for CTR increments).

AES Usage in DSi
AES-CCM is used for several SD/MMC files (using a custom Nintendo-specific CCM variant; consisting of 128K-byte data blocks with 32-byte footers):
DSi ES Block Encryption
AES-CTR is used for the Modcrypt areas defined in Cartridge Header, and for eMMC Boot Sectors and for eMMC MBR/Partitions.

AES Usage in DSi-Wifi
DSi Wifi also supports AES (and TKIP and WEP) encryption; the Wifi AES part is probably implemented via additional AES hardware in the Wifi unit?

AES Usage in DSi-Shop
DSi Shop downloads (and system updates) are using big-endian AES-CBC; this appears to require an AES software implementation because the DSi's AES hardware couldn't decrypt that AES variant.

  DSi AES I/O Ports

4004400h - DSi7 - AES_CNT (parts R/W)
  0-4   Write FIFO Count    (00h..10h words) (00h=Empty, 10h=Full)          (R)
  5-9   Read FIFO Count     (00h..10h words) (00h=Empty, 10h=Full)          (R)
  10    Write FIFO Flush    (0=No change, 1=Flush)                   (N/A or W)
  11    Read FIFO Flush     (0=No change, 1=Flush)                   (N/A or W)
  12-13 Write FIFO DMA Size (0..3 = 16,12,8,4 words) (2=Normal=8)    (R or R/W)
  14-15 Read FIFO DMA Size  (0..3 = 4,8,12,16 words) (1=Normal=8)    (R or R/W)
  16-18 CCM MAC Size, max(4,(N*2+2)) bytes, (usually 7=16 bytes)     (R or R/W)
  19    CCM Pass Associated Data to RDFIFO (0=No/Normal, 1=Yes)      (R or R/W)
          Bit19=1 is a bit glitchy: The data should theoretically arrive in
          RDFIFO immediately after writing 4 words to WRFIFO, but actually,
          Bit19=1 seems to cause 4 words held hidden in neither FIFO, until
          the first Payload block is written (at that point, the hidden
          associated words are suddenly appearing into RDFIFO)
  20    CCM MAC Verify Source (0=From AES_WRFIFO, 1=From AES_MAC)    (R or R/W)
  21    CCM MAC Verify Result (0=Invalid/Busy, 1=Verified/Okay)             (R)
  22-23 Unknown/Unused (0)                                                  (0)
  24    Key Select        (0=No change, 1=Apply key selected in Bit26-27)   (W)
  25    Key Schedule Busy (uh, always 0=ready?) (rather sth else busy?)     (R)
  26-27 Key Slot          (0..3=KEY0..KEY3, applied via Bit24)       (R or R/W)
  28-29 Mode (0=CCM/decrypt, 1=CCM/encrypt, 2=CTR, 3=Same as 2)      (R or R/W)
  30    Interrupt Enable  (0=Disable, 1=Enable IRQ on Transfer End)  (R or R/W)
  31    Start/Enable      (0=Disable/Ready, 1=Enable/Busy)                (R/W)
Bit31 gets cleared automatically shortly after all data (as indicated in AES_BLKCNT) is written to WRFIFO, and the IRQ is generated at the same time; the transfer isn't fully completed at that point since there may still be data (and CCM/encrypt MAC result) in RDFIFO.
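
The field encodings above (FIFO DMA sizes, MAC size, mode) can be sketched as a Python decoder. The function name and keys are hypothetical; only the bit positions and formulas come from the table.

```python
def decode_aes_cnt(value):
    """Decode the main AES_CNT fields into plain numbers and strings."""
    n = (value >> 16) & 0x7
    return {
        "wrfifo_count": value & 0x1F,
        "rdfifo_count": (value >> 5) & 0x1F,
        "wrfifo_dma_words": 16 - 4 * ((value >> 12) & 0x3),   # 16,12,8,4
        "rdfifo_dma_words": 4 + 4 * ((value >> 14) & 0x3),    # 4,8,12,16
        "mac_bytes": max(4, n * 2 + 2),
        "key_slot": (value >> 26) & 0x3,
        "mode": ("CCM/decrypt", "CCM/encrypt", "CTR", "CTR")[(value >> 28) & 0x3],
        "irq_enable": bool(value & (1 << 30)),
        "busy": bool(value & (1 << 31)),
    }


example = (2 << 12) | (1 << 14) | (7 << 16) | (1 << 26) | (2 << 28) | (1 << 31)
print(decode_aes_cnt(example))
```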

4004404h - DSi7 - AES_BLKCNT (W)
Specifies the transfer length, counted in 16-byte blocks.
  0-15  Number of Extra associated data blocks for AES-CCM (unused for AES-CTR)
  16-31 Number of Payload data blocks (0..FFFFh = 0..FFFF0h bytes)
The length values are copied to internal counter registers on transfer start (the value in AES_BLKCNT is left unchanged during/after transfer).

4004408h - DSi7 - AES_WRFIFO (W)
400440Ch - DSi7 - AES_RDFIFO (R)
  0-31  Data
Writing to WRFIFO works even when AES_CNT.bit31=0 (the data does then stay in WRFIFO though, and doesn't arrive in RDFIFO).

4004420h - DSi7 - AES_IV (16 bytes) (W)
This contains the Initialization Vector (aka IV aka Nonce). The hardware does use that value to automatically initialize the internal CTR/CBC registers when starting encryption/decryption:
  For AES-CTR mode:  CTR[00h..0Fh] = AES_IV[00h..0Fh]
                     CBC[00h..0Fh] = not used by AES-CTR mode
  For AES-CCM mode:  CTR[00h..0Fh] = 00h,00h,00h,AES_IV[00h..0Bh],02h
                     CBC[00h..0Fh] = x0h,xxh,0xh,AES_IV[00h..0Bh],flg
The initial CTR/CBC values for AES-CCM mode are following the CCM specifications, but WITHOUT encoding the "extra associated data size" in upper bytes of first block (see CCM pseudo code chapter for details).
The CTR/CBC registers are manipulated during transfer, however, the AES_IV content is kept unchanged during/after transfer.
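
A Python sketch of the AES-CCM initial values described above, for the fixed 12-byte nonce. The flag byte is assembled as in the CCM pseudo code chapter (bit6=extra data present, bit5..3=mac_len/2-1, bit2..0=ctr_len-1); applying that formula to the hardware is an assumption here, and the function name is invented.

```python
def dsi_ccm_initial_blocks(iv12, payload_bytes, mac_bytes, has_extra):
    """Return the initial (CTR, CBC) 16-byte values in little-endian layout."""
    if len(iv12) != 12:
        raise ValueError("AES_IV contributes 12 bytes in CCM mode")
    if payload_bytes % 16 or not 0 <= payload_bytes <= 0xFFFF0:
        raise ValueError("payload must be 0..FFFFh blocks of 16 bytes")
    if mac_bytes % 2 or not 4 <= mac_bytes <= 16:
        raise ValueError("MAC size must be an even value in 4..16")
    ctr = bytes(3) + bytes(iv12) + b"\x02"          # counter=0, iv, ctr_len-1
    flags = (int(bool(has_extra)) << 6) | ((mac_bytes // 2 - 1) << 3) | 2
    cbc = payload_bytes.to_bytes(3, "little") + bytes(iv12) + bytes([flags])
    return ctr, cbc


ctr, cbc = dsi_ccm_initial_blocks(bytes(range(1, 13)), 0x20, 16, False)
print(ctr.hex(), cbc.hex())
```

The three low CBC bytes carry the payload length, which is where the "x0h,xxh,0xh" pattern comes from (multiples of 16, at most FFFF0h).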

4004430h - DSi7 - AES_MAC (16 bytes) (W)
The MAC (Message Authentication Code) is an encrypted checksum, computed alongside the actual data encryption/decryption, and used only in AES-CCM mode. There are three ways in which the DSi deals with MAC values:
  AES-CCM Encryption: MAC is returned in AES_RDFIFO after transfer
  AES-CCM Decryption, AES_CNT.20=0: MAC written to AES_WRFIFO after transfer
  AES-CCM Decryption, AES_CNT.20=1: MAC written to AES_MAC before transfer
The AES_MAC register and the RDFIFO/WRFIFO blocks are always 16 bytes wide; when selecting a smaller MAC size in AES_CNT, then the lower bytes of that 16-byte value are 00h-padded (eg. a 6-byte MAC would appear as 00000000h, 00000000h, xxxx0000h, xxxxxxxxh), for ENCRYPT those 00h-bytes are returned in RDFIFO, for DECRYPT those padding bytes MUST be 00h (else the verification will fail).
The minimum MAC size is 4 bytes (trying to use 2 bytes by setting AES_CNT.16-18 to 00h produces the exact same result as setting it to 01h, ie. 4 bytes).
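
The 00h-padding rule can be illustrated with a short Python sketch that places a short MAC in the upper bytes of a 16-byte block and shows it as four little-endian words. The function name and the sample MAC bytes are arbitrary.

```python
import struct


def pad_mac_block(mac):
    """Return the 16-byte MAC block as four 32bit little-endian words."""
    if not 4 <= len(mac) <= 16:
        raise ValueError("MAC must be 4..16 bytes")
    block = bytes(16 - len(mac)) + bytes(mac)   # lower bytes are 00h padding
    return struct.unpack("<4I", block)


words = pad_mac_block(bytes([0x11, 0x22, 0x33, 0x44, 0x55, 0x66]))
print(", ".join(f"{w:08X}h" for w in words))
```

For a 6-byte MAC this gives the 00000000h, 00000000h, xxxx0000h, xxxxxxxxh pattern mentioned above.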

4004440h - DSi7 - AES_KEY0 (48 bytes) (W)
4004470h - DSi7 - AES_KEY1 (48 bytes) (W)
40044A0h - DSi7 - AES_KEY2 (48 bytes) (W)
40044D0h - DSi7 - AES_KEY3 (48 bytes) (W)
  Byte 00h-0Fh  Normal 128bit Key      ;\use either normal key,
  Byte 10h-1Fh  Special 128bit Key_X   ; or special key_x/y
  Byte 20h-2Fh  Special 128bit Key_Y   ;/
Writing the last word of "Key_Y" (or any of its last four bytes, ie. byte(s) 2Ch..2Fh) causes the Normal Key to be overwritten by the following value:
  Key = ((Key_X XOR Key_Y) + FFFEFB4E295902582A680F5F1A4F3E79h) ROL 42
After changing a key, one must (re-)apply it via AES_CNT.Bits 24,26-27.
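
The key formula above can be sketched in Python with 128bit integers. Treating the register contents as little-endian 128bit numbers (`int.from_bytes(..., "little")`) is an assumption of this sketch, and the function names are invented.

```python
MASK128 = (1 << 128) - 1
SCRAMBLER_CONST = 0xFFFEFB4E295902582A680F5F1A4F3E79


def rol128(value, count):
    """Rotate a 128bit value left by count bits."""
    count %= 128
    return ((value << count) | (value >> (128 - count))) & MASK128


def derive_normal_key(key_x, key_y):
    """Key = ((Key_X XOR Key_Y) + constant) ROL 42, all in 128bit arithmetic."""
    return rol128(((key_x ^ key_y) + SCRAMBLER_CONST) & MASK128, 42)


def derive_normal_key_bytes(key_x_bytes, key_y_bytes):
    # Assumption: the 16 register bytes form a little-endian integer.
    x = int.from_bytes(key_x_bytes, "little")
    y = int.from_bytes(key_y_bytes, "little")
    return derive_normal_key(x, y).to_bytes(16, "little")


print(derive_normal_key_bytes(bytes(16), bytes(16)).hex())
```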

The AES data would usually be transferred via two NDMA channels, one for WRFIFO, one for RDFIFO. The NDMAs should be started BEFORE setting AES_CNT.31 (else the DMA will miss the first WRFIFO data request; and DMA won't start). The DMAs 'Logical Block' sizes should match up with the block sizes selected in AES_CNT (a bigger logical block size would cause FIFO overruns/underruns, a smaller logical block size could work theoretically, but in practice it causes the DMA to hang after the first data request; apparently data requests are somewhat generated upon "empty-not-empty" transitions, rather than upon "enough data/space" status).

Reading Write-Only Values
The AES_IV register and the AES_KEY registers are fully write-able, including with 8bit STRB support; this allows to 'read' the write-only values via brute-force without any noticeable delay (ie. encrypt 16 bytes with original values, then change one byte to values 00h..FFh, and check which of those values gives the same encryption result). AES_BLKCNT can also be dumped by simple counting.
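
The byte-wise brute-force idea can be sketched in Python against a simulated port. Everything here is a stand-in: `SimulatedAesPort` is a made-up class, and it uses SHA-256 as an arbitrary keyed function instead of real AES, since only the search strategy is being illustrated.

```python
import hashlib


class SimulatedAesPort:
    """Stand-in for write-only key registers (NOT real AES hardware)."""

    def __init__(self, secret_key):
        self._key = bytearray(secret_key)    # hidden: no read access offered

    def write_key_byte(self, index, value):
        self._key[index] = value             # models an 8bit STRB write

    def encrypt_block(self, data):
        # Any keyed function works for the demo; hardware would run AES.
        return hashlib.sha256(bytes(self._key) + data).digest()[:16]


def recover_key(port, key_len=16):
    """Recover the hidden key with at most 256 trials per byte."""
    probe = bytes(16)
    reference = port.encrypt_block(probe)    # result with the original key
    recovered = bytearray(key_len)
    for index in range(key_len):
        for guess in range(256):
            port.write_key_byte(index, guess)
            if port.encrypt_block(probe) == reference:
                recovered[index] = guess     # register holds original again
                break
    return bytes(recovered)


print(recover_key(SimulatedAesPort(b"0123456789abcdef")))
```

Each byte is left at its matching (original) value before moving on, so at most 16*256 trial encryptions are needed instead of a search over the whole key.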

Cartheader Key Request Byte
The firmware usually destroys the AES_KEY registers before starting DSi programs. However, bits in CartHeader[1B4h] allow one to "request" certain keys to be left intact.

DSi BIOS & Firmware Keys
The DSi BIOS contains several AES keys in the non-dumpable upper 32K halves; most of those keys are relocated to RAM/TCM, so they can be dumped via main memory hacks (there might be some further keys that cannot be dumped, in case they exist only in early boot stages).
The purpose of most of the dump-able keys is still unknown. Aside from the plain keys, it would also be important to know their corresponding IV values.

  DSi AES Little-Endian High Level Functions

AES-CTR (Counter)
  aes_setkey(ENCRYPT,key,key_size)                                 ;-init key
  [ctr+0..15] = [iv+0..15], n=0                                    ;-init ctr
  while len>0   ;code is 100% same for ENCRYPT and DECRYPT         ;\
    if n=0                                                         ; encrypt
      aes_crypt_block(ENCRYPT,ctr,tmp)                             ; or decrypt
      littleendian(ctr)=littleendian(ctr)+1   ;increment counter   ; message
    [dst] = [src] xor [tmp+n]                                      ;
    src=src+1, dst=dst+1, len=len-1, n=(n+1) and 0Fh               ;/
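
The AES-CTR loop above can be sketched in Python with the block cipher passed in as a function. The names are invented for this example, and `toy_block` is a SHA-256 based stand-in (not AES) used only so that the sketch runs without a crypto library.

```python
import hashlib


def ctr_increment_le(ctr):
    """Add 1 to a 16-byte counter interpreted as a little-endian integer."""
    value = (int.from_bytes(ctr, "little") + 1) & ((1 << 128) - 1)
    return value.to_bytes(16, "little")


def ctr_crypt(block_encrypt, iv, data):
    """Encrypt or decrypt data; both directions use the same routine."""
    ctr = bytes(iv)
    out = bytearray()
    for pos in range(0, len(data), 16):
        keystream = block_encrypt(ctr)       # aes_crypt_block(ENCRYPT,ctr,tmp)
        ctr = ctr_increment_le(ctr)
        chunk = data[pos:pos + 16]
        out += bytes(a ^ b for a, b in zip(chunk, keystream))
    return bytes(out)


def toy_block(block):
    # Stand-in for a keyed AES block encryption (assumption for the demo).
    return hashlib.sha256(b"demo key" + block).digest()[:16]


message = b"same routine encrypts and decrypts"
cipher = ctr_crypt(toy_block, b"Nonce/InitVector", message)
print(ctr_crypt(toy_block, b"Nonce/InitVector", cipher))
```

Because only the counter is ever passed through the block function, the cipher's decrypt direction is never needed, which is why the pseudo code always calls ENCRYPT.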

AES-CCM (Counter with CBC-MAC)
  if mac_len<4 or mac_len>16 or (mac_len and 1)=1 then error       ;\limits
  if iv_len<7 or iv_len>13 then error                              ;/
  aes_setkey(ENCRYPT,key,key_size)                                 ;-init key
  ctr_len = 15-iv_len                                              ;\
  [ctr+15]=ctr_len-1          ;bit3..7=zero   ;1 byte (ctr_len)    ; init ctr
  [ctr+(15-iv_len)..14] = [iv+0..(iv_len-1)]  ;7..13 bytes (iv)    ;
  [ctr+0..(14-iv_len)]=littleendian(0)  ;8..2 bytes (counter=0)    ;/
  [cbc+0..15]=littleendian(msg_len)  ;-[(iv_len+1)..15]=msg_len    ;\
  if [cbc+15..15-iv_len]<>0 then error  ;msg_len overlaps iv/flags ;
  [cbc+(15-iv_len)..14]=[iv+0..iv_len-1] ;-[1..iv_len]=iv/nonce    ;
  [cbc+15].bit7=0  ;reserved/zero   ;\                             ; init cbc
  [cbc+15].bit6=(xtra_len>0)        ; [15]=flags                   ;
  [cbc+15].bit5..3=(mac_len/2-1)    ;                              ;
  [cbc+15].bit2..0=(ctr_len-1)      ;/                             ;
  aes_crypt_block(ENCRYPT,cbc,cbc)      ;UPDATE_CBC_MAC            ;/
  if NintendoDSi then                                              ;\
    a=0 ;the DSi hardware doesn't support xtra_len encoding at all ;
  elseif xtra_len<0FF00h then                                      ;
    [cbc+14..15]=[cbc+14..15] xor littleendian(xtra_len), a=2      ; weird
  elseif xtra_len<100000000h then                                  ; encoding
    [cbc+14..15]=[cbc+14..15] xor littleendian(FFFEh)              ; for
    [cbc+10..13]=[cbc+10..13] xor littleendian(xtra_len), a=6      ; xtra_len
  else                                                             ;
    [cbc+14..15]=[cbc+14..15] xor littleendian(FFFFh)              ;
    [cbc+6..13] =[cbc+6..13]  xor littleendian(xtra_len), a=10     ;/
  while xtra_len>0                                                 ;\scatter
    z=min(xtra_len,16-a)                                           ; cbc by
    [cbc+16-a-z..(15-a)]=[cbc+16-a-z..(15-a)] xor [xtra+0..(z-1)]  ; xtra
    aes_crypt_block(ENCRYPT,cbc,cbc)    ;UPDATE_CBC_MAC            ; (if any)
    xtra=xtra+z, xtra_len=xtra_len-z, a=0                          ;/
  while msg_len>0                                                  ;\
    littleendian(ctr)=littleendian(ctr)+1   ;increment counter     ;
    aes_crypt_block(ENCRYPT,ctr,tmp)    ;CTR_CRYPT                 ;
    z=min(msg_len,16)                                              ; encrypt
    if mode=ENCRYPT                                                ; or decrypt
      [cbc+(16-z)..15] = [cbc+(16-z)..15] xor [src+0..(z-1)]       ; message
    [dst+0..(z-1)] = [src+0..(z-1)] xor [tmp+(16-z)..15]           ; body
    if mode=DECRYPT                                                ;
      [cbc+(16-z)..15] = [cbc+(16-z)..15] xor [dst+0..(z-1)]       ;
    aes_crypt_block(ENCRYPT,cbc,cbc)    ;UPDATE_CBC_MAC            ;
    src=src+z, dst=dst+z, msg_len=msg_len-z                        ;/
  [ctr+0..(14-iv_len)]=littleendian(0)  ;reset counter=0           ;\
  aes_crypt_block(ENCRYPT,ctr,tmp)      ;CTR_CRYPT                 ; message
  [cbc+0..15] = [cbc+0..15] xor [tmp+0..15]                        ; auth code
  z=mac_len                                                        ; (mac)
  IF mode=ENCRYPT then [mac+0..(z-1)] = [cbc+(16-z)..15]           ;
  IF mode=DECRYPT and [mac+0..(z-1)] <> [cbc+(16-z)..15] then error;/

Below are some other AES variants (just for curiosity - those variants aren't used in DSi):

AES-CBC (Cipher-block chaining)
  aes_setkey(mode,key,key_size)                                    ;-init key
  [cbc+0..15] = [iv+0..15]                                         ;-init cbc
  if (len AND 0Fh)>0 then error
  while len>0                                                      ;\
    if mode=ENCRYPT                                                ;
      [dst+0..15] = [src+0..15] xor [cbc+0..15]                    ;
      aes_crypt_block(mode,dst,dst)                                ; encrypt
      [cbc+0..15] = [dst+0..15]                                    ; or decrypt
    if mode=DECRYPT                                                ; message
      [tmp+0..15] = [src+0..15]                                    ;
      aes_crypt_block(mode,src,dst)                                ;
      [dst+0..15] = [dst+0..15] xor [cbc+0..15]                    ;
      [cbc+0..15] = [tmp+0..15]                                    ;
    src=src+16, dst=dst+16, len=len-16                             ;/

AES-CFB128 (Cipher feedback on 128bits, aka 16 bytes)
  aes_setkey(ENCRYPT,key,key_size)                                 ;-init key
  [cfb+0..15] = [iv+0..15], n=0                                    ;-init cfb
  while len>0                                                      ;\
    if n=0 then aes_crypt_block(ENCRYPT,cfb,cfb)                   ; encrypt
    if mode=DECRYPT then c=[src], [dst]=c xor [cfb+n], [cfb+n]=c   ; or decrypt
    if mode=ENCRYPT then c=[cfb+n] xor [src], [cfb+n]=c, [dst]=c   ; message
    src=src+1, dst=dst+1, len=len-1, n=(n+1) and 0Fh               ;/

AES-CFB8 (Cipher feedback on 8bits, aka 1 byte, very inefficient)
  aes_setkey(ENCRYPT,key,key_size)                                 ;-init key
  [cfb+0..15] = [iv+0..15], n=0                                    ;-init cfb
  while len>0                                                      ;\
    aes_crypt_block(ENCRYPT,cfb,tmp)                               ;
    [cfb+1..15] = [cfb+0..14]   ;shift with 8-bit step             ; encrypt
    if mode=DECRYPT then [cfb+0] = [src+(n xor 0Fh)]               ; or decrypt
    [dst+(n xor 0Fh)] = [src+(n xor 0Fh)] xor [tmp+15]  ;shift-in  ; message
    if mode=ENCRYPT then [cfb+0] = [dst+(n xor 0Fh)]               ;
    len=len-1, n=n+1                                               ;/

AES-ECB (Electronic codebook, very basic, very insecure)
  aes_setkey(mode,key,key_size)                                    ;-init key
  if (len AND 0Fh)>0 then error
  while len>0                                                      ;\encrypt
    aes_crypt_block(mode,src,dst)                                  ; or decrypt
    src=src+16, dst=dst+16, len=len-16                             ;/message

  DSi AES Little-Endian Core Function and Key Schedule

 aes_crypt_block(mode,src,dst):  ;in: RK[..], nr (from aes_setkey)
  Y0 = RK[0] xor [src+00h]
  Y1 = RK[1] xor [src+04h]
  Y2 = RK[2] xor [src+08h]
  Y3 = RK[3] xor [src+0Ch]
  ;below code depending on mode:      <---ENCRYPT--->  -or-  <---DECRYPT--->
  for i=1 to nr-1
    X0      = RK[i*4+0] xor scatter32(FT,Y1,Y2,Y3,Y0)  -or-  (RT,Y3,Y2,Y1,Y0)
    X1      = RK[i*4+1] xor scatter32(FT,Y2,Y3,Y0,Y1)  -or-  (RT,Y0,Y3,Y2,Y1)
    X2      = RK[i*4+2] xor scatter32(FT,Y3,Y0,Y1,Y2)  -or-  (RT,Y1,Y0,Y3,Y2)
    X3      = RK[i*4+3] xor scatter32(FT,Y0,Y1,Y2,Y3)  -or-  (RT,Y2,Y1,Y0,Y3)
    Y0=X0, Y1=X1, Y2=X2, Y3=X3
  [dst+00h] = RK[nr*4+0] xor scatter8(FSb,Y1,Y2,Y3,Y0) -or-  (RSb,Y3,Y2,Y1,Y0)
  [dst+04h] = RK[nr*4+1] xor scatter8(FSb,Y2,Y3,Y0,Y1) -or-  (RSb,Y0,Y3,Y2,Y1)
  [dst+08h] = RK[nr*4+2] xor scatter8(FSb,Y3,Y0,Y1,Y2) -or-  (RSb,Y1,Y0,Y3,Y2)
  [dst+0Ch] = RK[nr*4+3] xor scatter8(FSb,Y0,Y1,Y2,Y3) -or-  (RSb,Y2,Y1,Y0,Y3)

 scatter32(TAB,a,b,c,d):              scatter8(TAB,a,b,c,d):
  w=      (TAB[a.bit0..7] ror 24)      w.bit0..7   = TAB[a.bit0..7]
  w=w xor (TAB[b.bit8..15] ror 16)     w.bit8..15  = TAB[b.bit8..15]
  w=w xor (TAB[c.bit16..23] ror 8)     w.bit16..23 = TAB[c.bit16..23]
  w=w xor (TAB[d.bit24..31])           w.bit24..31 = TAB[d.bit24..31]
  return w                             return w
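
A direct Python transcription of the two scatter helpers, as a sketch. The table is passed in as a list; the identity table in the example is only a convenient way to see which byte lane ends up where.

```python
def ror32(value, count):
    """Rotate a 32bit value right by count bits."""
    count &= 31
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def scatter32(tab, a, b, c, d):
    """Combine four 32bit table entries, each rotated into its own position."""
    w = ror32(tab[a & 0xFF], 24)
    w ^= ror32(tab[(b >> 8) & 0xFF], 16)
    w ^= ror32(tab[(c >> 16) & 0xFF], 8)
    w ^= tab[(d >> 24) & 0xFF]
    return w


def scatter8(tab, a, b, c, d):
    """Substitute one byte lane from each input through an 8bit table."""
    return (tab[a & 0xFF]
            | (tab[(b >> 8) & 0xFF] << 8)
            | (tab[(c >> 16) & 0xFF] << 16)
            | (tab[(d >> 24) & 0xFF] << 24))


identity = list(range(256))
args = (0x11223344, 0x55667788, 0x99AABBCC, 0xDDEEFF00)
print(f"{scatter8(identity, *args):08X} {scatter32(identity, *args):08X}")
```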

 aes_setkey(mode,key,keysize):  ;out: RK[0..43/51/59], nr=10/12/14
  aes_generate_tables   ;<-- unless tables are already initialized
  if keysize<>128 and keysize<>192 and keysize<>256 then error  ;size in bits
  rc=01h, j=0, jj=keysize/32, nr=jj+6   ;jj=4,6,8      ;\
  for i=0 to (nr+1)*4-1                 ;nr=10,12,14   ; copy 16/24/32-byte key
    if i<jj then w=[key+(jj-1-i)*4+0..3]               ; to RK[0..3/5/7]
    else w=w xor RK[(i-jj) xor 3]                      ; and, make
    RK[i xor 3]=w, j=j+1                               ; RK[4/6/8..43/51/59]
    if j=jj then                                       ;
      w=w ror 8, w=scatter8(FSb,w,w,w,w) xor (rc shl 24)
      j=0, rc=rc*2, if rc>0FFh then rc=rc xor 11Bh     ;
    if j=4 and jj=8 then w=scatter8(FSb,w,w,w,w)       ;/
  if mode=DECRYPT then
    for i=0 to nr/2-1     ;swap entries (except middle one)
      for j=0 to 3
        w=RK[i*4+j], v=RK[nr*4-i*4+j]
        RK[i*4+j]=v, RK[nr*4-i*4+j]=w
    for i=4 to nr*4-1     ;modify entries (except RK[0..3] and RK[nr*4+0..3])
      w=RK[i], w=scatter8(FSb,w,w,w,w), RK[i]=scatter32(RT,w,w,w,w)

  DSi AES Little-Endian Tables and Test Values

 aes_generate_tables:
  for i=0 to 0FFh               ;compute pow and log tables...
    if i=0 then x=01h, else x=x xor x*2, if x>0FFh then x=x xor 11Bh
    pow[i]=x, log[x]=i
  for i=0 to 0FFh               ;generate the forward and reverse S-boxes...
    x=pow[0FFh-log[i]]          ;multiplicative inverse of i
    x=x xor (x rol 1) xor (x rol 2) xor (x rol 3) xor (x rol 4) xor 63h
    if i=0 then x=63h
    FSb[i]=x, RSb[x]=i
  for i=0 to 0FFh               ;generate the forward and reverse tables...
    x=FSb[i]*2, if x>0FFh then x=x xor 11Bh
    FT[i]=(FSb[i]*00010101h) xor (x*01000001h)
    w=00000000h, x=RSb[i]
    if x<>00h then   ;ie. not at i=63h
      w=w+pow[(log[x]+log[0Eh]) mod 00FFh]*1000000h
      w=w+pow[(log[x]+log[09h]) mod 00FFh]*10000h
      w=w+pow[(log[x]+log[0Dh]) mod 00FFh]*100h
      w=w+pow[(log[x]+log[0Bh]) mod 00FFh]*1h
    RT[i]=w

  pow[00h..FFh] = 01,03,05,0F,11,..,C7,52,F6,01   ;pow  ;\needed temporarily
  log[00h..FFh] = 00,FF,19,01,32,..,C0,F7,70,07   ;log  ;/for table creation
  FSb[00h..FFh] = 63,7C,77,7B,F2,..,B0,54,BB,16   ;Forward S-box
  RSb[00h..FFh] = 52,09,6A,D5,30,..,55,21,0C,7D   ;Reverse S-box
  FT[00h..FFh] = C66363A5,F87C7C84,..,2C16163A    ;Forward Table
  RT[00h..FFh] = 51F4A750,7E416553,..,D0B85742    ;Reverse Table

  key = "AES-Test-Key-Str-1234567-Abcdefg"  ;use only 1st bytes for 128/192bit
  128bit ENCRYPT --> RK[0..9..30..43] = 2D534541..2783080F..93AF7DF0..827EE10D
  192bit ENCRYPT --> RK[0..9..30..51] = 79654B2D..9708FA95..2529372B..C66C19FA
  256bit ENCRYPT --> RK[0..9..30..59] = 3332312D..DF5C92A5..74174E2E..3C8ADAE6
  128bit DECRYPT --> RK[0..9..30..43] = AEABCD4D..ECD33F19..8C87B246..7274532D
  192bit DECRYPT --> RK[0..9..30..51] = AFA9796F..72A3EFE5..455646C7..37363534
  256bit DECRYPT --> RK[0..9..30..59] = 0ED52830..4601F929..415A7D65..67666564

  [key+0..15]    = "AES-Test-Key-Str-1234567-Abcdefg"
  [iv+0..15]     = "Nonce/InitVector"
  [xtra+0..20]   = "Extra-Associated-Data"  ;\for CCM
  iv_len=12, mac_len=16, xtra_len=xx        ;/
  Unencrypted:   [dta+0..113Fh] = "Unencrypted-Data", 190h x "TestPadding"
  AES-ECB:       [dta+0..113Fh] = 20,24,73,88,..,44,A8,D6,A8  ;\
  AES-CBC:       [dta+0..113Fh] = A4,6F,7A,F2,..,58,C9,02,B4  ;
  AES-CFB128:    [dta+0..113Fh] = 20,C6,DB,35,..,9A,83,7F,DB  ; keysize=128
  AES-CFB8:      [dta+0..113Fh] = 55,C7,75,1C,..,24,6E,A6,D1  ;
  AES-CTR:       [dta+0..113Fh] = 20,C6,DB,35,..,AB,09,0C,75  ;
  AES-CCM:       [dta+0..113Fh] = C8,37,D7,F1,..,7B,EF,FC,12  ;
  AES-CCM (ori): [mac+0..0Fh]   = xx,xx,xx,xx,..,xx,xx,xx,xx  ;
  AES-CCM (DSi): [mac+0..0Fh]   = xx,xx,xx,xx,..,xx,xx,xx,xx  ;/
  AES-ECB:       [dta+0..113Fh] = CC,B6,4D,17,..,D3,56,3E,64  ;-keysize=192
  AES-ECB:       [dta+0..113Fh] = A9,A9,9B,3E,..,8A,C6,13,A1  ;-keysize=256

  DSi AES Big-Endian High Level Functions

AES-CTR (Counter)
  aes_setkey(ENCRYPT,key,key_size)                                 ;-init key
  [ctr+0..15] = [iv+0..15], n=0                                    ;-init ctr
  while len>0   ;code is 100% same for ENCRYPT and DECRYPT         ;\
    if n=0                                                         ; encrypt
      aes_crypt_block(ENCRYPT,ctr,tmp)                             ; or decrypt
      bigendian(ctr)=bigendian(ctr)+1     ;increment counter       ; message
    [dst] = [src] xor [tmp+n]                                      ;
    src=src+1, dst=dst+1, len=len-1, n=(n+1) and 0Fh               ;/
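
A minimal Python sketch of the CTR loop above, working on 16-byte chunks instead of single bytes (which gives the same result). The parameter encrypt_block is a stand-in for aes_crypt_block(ENCRYPT,..) with the key already set; it is an assumption of this example, not an existing API.

```python
def ctr_crypt(encrypt_block, iv, data):
    """Encrypt or decrypt data in CTR mode (the operation is its own inverse).

    encrypt_block: callable taking and returning 16 bytes (hypothetical).
    iv:            16-byte initial counter value.
    """
    ctr = int.from_bytes(iv, "big")            # counter is treated as big-endian
    out = bytearray()
    for offset in range(0, len(data), 16):
        keystream = encrypt_block(ctr.to_bytes(16, "big"))
        ctr = (ctr + 1) % (1 << 128)           # increment, wrapping at 128 bits
        chunk = data[offset:offset + 16]
        out += bytes(s ^ k for s, k in zip(chunk, keystream))
    return bytes(out)
```

Passing an identity function as encrypt_block makes the keystream equal to the counter sequence, which is a convenient way to check the big-endian increment (including carry) without a real AES core.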

AES-CCM (Counter with CBC-MAC)
  if mac_len<4 or mac_len>16 or (mac_len and 1)=1 then error       ;\limits
  if iv_len<7 or iv_len>13 then error                              ;/
  aes_setkey(ENCRYPT,key,key_size)                                 ;-init key
  ctr_len = 15-iv_len                                              ;\
  [ctr+0]=ctr_len-1     ;bit3..7=zero   ;1 byte (ctr_len)          ; init ctr
  [ctr+1..iv_len] = [iv+0..(iv_len-1)]  ;7..13 bytes (iv)          ;
  [ctr+(iv_len+1)..15]=bigendian(0)     ;8..2 bytes (counter=0)    ;/
  [cbc+0..15]=bigendian(msg_len)   ;-[(iv_len+1)..15]=msg_len      ;\
  if [cbc+0..iv_len]<>0 then error ;errif msg_len overlaps iv/flags;
  [cbc+1..iv_len]=[iv+0..iv_len-1] ;-[1..iv_len]=iv (aka nonce)    ;
  [cbc+0].bit7=0  ;reserved/zero   ;\                              ; init cbc
  [cbc+0].bit6=(xtra_len>0)        ; [0]=flags                     ;
  [cbc+0].bit5..3=(mac_len/2-1)    ;                               ;
  [cbc+0].bit2..0=(ctr_len-1)      ;/                              ;
  aes_crypt_block(ENCRYPT,cbc,cbc)      ;UPDATE_CBC_MAC            ;/
  if NintendoDSi then                                              ;\
    a=0 ;the DSi hardware doesn't support xtra_len encoding at all ;
  elseif xtra_len<0FF00h then                                      ;
    [cbc+0..1]=[cbc+0..1] xor bigendian(xtra_len), a=2             ; weird
  elseif xtra_len<100000000h then                                  ; encoding
    [cbc+0..1]=[cbc+0..1] xor bigendian(FFFEh)                     ; for
    [cbc+2..5]=[cbc+2..5] xor bigendian(xtra_len), a=6             ; xtra_len
  else                                                             ;
    [cbc+0..1]=[cbc+0..1] xor bigendian(FFFFh)                     ;
    [cbc+2..9]=[cbc+2..9] xor bigendian(xtra_len), a=10            ;/
  while xtra_len>0                                                 ;\scatter
    z=min(xtra_len,16-a)                                           ; cbc by
    [cbc+a..(a+z-1)]=[cbc+a..(a+z-1)] xor [xtra+0..(z-1)]          ; xtra
    aes_crypt_block(ENCRYPT,cbc,cbc)    ;UPDATE_CBC_MAC            ; (if any)
    xtra=xtra+z, xtra_len=xtra_len-z, a=0                          ;/
  while msg_len>0                                                  ;\
    bigendian(ctr)=bigendian(ctr)+1     ;increment counter         ;
    aes_crypt_block(ENCRYPT,ctr,tmp)    ;CTR_CRYPT                 ;
    z=min(msg_len,16)                                              ; encrypt
    if mode=ENCRYPT                                                ; or decrypt
      [cbc+0..(z-1)] = [cbc+0..(z-1)] xor [src+0..(z-1)]           ; message
    [dst+0..(z-1)] = [src+0..(z-1)] xor [tmp+0..(z-1)]             ; body
    if mode=DECRYPT                                                ;
      [cbc+0..(z-1)] = [cbc+0..(z-1)] xor [dst+0..(z-1)]           ;
    aes_crypt_block(ENCRYPT,cbc,cbc)    ;UPDATE_CBC_MAC            ;
    src=src+z, dst=dst+z, msg_len=msg_len-z                        ;/
  [ctr+(iv_len+1)..15]=bigendian(0)     ;reset counter=0           ;\
  aes_crypt_block(ENCRYPT,ctr,tmp)      ;CTR_CRYPT                 ; message
  [cbc+0..15] = [cbc+0..15] xor [tmp+0..15]                        ; auth code
  z=mac_len                                                        ; (mac)
  IF mode=ENCRYPT then [mac+0..(z-1)] = [cbc+0..(z-1)]             ;
  IF mode=DECRYPT and [mac+0..(z-1)] <> [cbc+0..(z-1)] then error  ;/
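
The "init ctr" and "init cbc" steps above (before the first aes_crypt_block call) can be sketched in Python as shown below. The function name and its return convention are assumptions made for this example; the xtra_len length encoding and the actual encryption are left out.

```python
def ccm_initial_blocks(iv, msg_len, mac_len, xtra_len):
    """Build the initial CTR block and the first CBC-MAC block for AES-CCM."""
    iv_len = len(iv)
    if mac_len < 4 or mac_len > 16 or mac_len & 1:
        raise ValueError("mac_len must be an even value in range 4..16")
    if iv_len < 7 or iv_len > 13:
        raise ValueError("iv_len must be in range 7..13")
    ctr_len = 15 - iv_len
    if msg_len >= 1 << (8 * ctr_len):
        raise ValueError("msg_len would overlap the iv/flags bytes")
    # [ctr+0]=ctr_len-1, then the iv, then a zero counter
    ctr = bytes([ctr_len - 1]) + iv + bytes(ctr_len)
    # flags: bit6=xtra present, bit5..3=mac_len/2-1, bit2..0=ctr_len-1
    flags = (int(xtra_len > 0) << 6) | ((mac_len // 2 - 1) << 3) | (ctr_len - 1)
    cbc = bytes([flags]) + iv + msg_len.to_bytes(ctr_len, "big")
    return ctr, cbc
```

For iv_len=12 and mac_len=16 with extra data present, the flags byte works out to 40h+38h+02h=7Ah, and the counter field is 3 bytes wide.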

Below are some other AES variants (just for curiosity - those variants aren't used in DSi):

AES-CBC (Cipher-block chaining)
  aes_setkey(mode,key,key_size)                                    ;-init key
  [cbc+0..15] = [iv+0..15]                                         ;-init cbc
  if (len AND 0Fh)>0 then error
  while len>0                                                      ;\
    if mode=ENCRYPT                                                ;
      [dst+0..15] = [src+0..15] xor [cbc+0..15]                    ;
      aes_crypt_block(mode,dst,dst)                                ; encrypt
      [cbc+0..15] = [dst+0..15]                                    ; or decrypt
    if mode=DECRYPT                                                ; message
      [tmp+0..15] = [src+0..15]                                    ;
      aes_crypt_block(mode,src,dst)                                ;
      [dst+0..15] = [dst+0..15] xor [cbc+0..15]                    ;
      [cbc+0..15] = [tmp+0..15]                                    ;
    src=src+16, dst=dst+16, len=len-16                             ;/

AES-CFB128 (Cipher feedback on 128bits, aka 16 bytes)
  aes_setkey(ENCRYPT,key,key_size)                                 ;-init key
  [cfb+0..15] = [iv+0..15], n=0                                    ;-init cfb
  while len>0                                                      ;\
    if n=0 then aes_crypt_block(ENCRYPT,cfb,cfb)                   ; encrypt
    if mode=DECRYPT then c=[src], [dst]=c xor [cfb+n], [cfb+n]=c   ; or decrypt
    if mode=ENCRYPT then c=[cfb+n] xor [src], [cfb+n]=c, [dst]=c   ; message
    src=src+1, dst=dst+1, len=len-1, n=(n+1) and 0Fh               ;/

AES-CFB8 (Cipher feedback on 8bits, aka 1 byte, very inefficient)
  aes_setkey(ENCRYPT,key,key_size)                                 ;-init key
  [cfb+0..15] = [iv+0..15]                                         ;-init cfb
  while len>0                                                      ;\
    aes_crypt_block(ENCRYPT,cfb,tmp)                               ;
    [cfb+0..14] = [cfb+1..15]   ;shift with 8-bit step             ; encrypt
    if mode=DECRYPT then [cfb+15] = [src]                          ; or decrypt
    [dst] = [src] xor [tmp+0]   ;shift-in new 8-bits               ; message
    if mode=ENCRYPT then [cfb+15] = [dst]                          ;
    src=src+1, dst=dst+1, len=len-1                                ;/

AES-ECB (Electronic codebook, very basic, very insecure)
  aes_setkey(mode,key,key_size)                                    ;-init key
  if (len AND 0Fh)>0 then error
  while len>0                                                      ;\encrypt
    aes_crypt_block(mode,src,dst)                                  ; or decrypt
    src=src+16, dst=dst+16, len=len-16                             ;/message

  DSi AES Big-Endian Core Function and Key Schedule

 aes_crypt_block(mode,src,dst):
  Y0 = RK[0] xor [src+00h]
  Y1 = RK[1] xor [src+04h]
  Y2 = RK[2] xor [src+08h]
  Y3 = RK[3] xor [src+0Ch]
  ;below code depending on mode:      <---ENCRYPT--->  -or-  <---DECRYPT--->
  for i=1 to nr-1
    X0      = RK[i*4+0] xor scatter32(FT,Y0,Y1,Y2,Y3)  -or-  (RT,Y0,Y3,Y2,Y1)
    X1      = RK[i*4+1] xor scatter32(FT,Y1,Y2,Y3,Y0)  -or-  (RT,Y1,Y0,Y3,Y2)
    X2      = RK[i*4+2] xor scatter32(FT,Y2,Y3,Y0,Y1)  -or-  (RT,Y2,Y1,Y0,Y3)
    X3      = RK[i*4+3] xor scatter32(FT,Y3,Y0,Y1,Y2)  -or-  (RT,Y3,Y2,Y1,Y0)
    Y0=X0, Y1=X1, Y2=X2, Y3=X3
  [dst+00h] = RK[nr*4+0] xor scatter8(FSb,Y0,Y1,Y2,Y3) -or-  (RSb,Y0,Y3,Y2,Y1)
  [dst+04h] = RK[nr*4+1] xor scatter8(FSb,Y1,Y2,Y3,Y0) -or-  (RSb,Y1,Y0,Y3,Y2)
  [dst+08h] = RK[nr*4+2] xor scatter8(FSb,Y2,Y3,Y0,Y1) -or-  (RSb,Y2,Y1,Y0,Y3)
  [dst+0Ch] = RK[nr*4+3] xor scatter8(FSb,Y3,Y0,Y1,Y2) -or-  (RSb,Y3,Y2,Y1,Y0)

 scatter32(TAB,a,b,c,d):              scatter8(TAB,a,b,c,d):
  w=      (TAB[a.bit0..7])             w.bit0..7   = TAB[a.bit0..7]
  w=w xor (TAB[b.bit8..15] rol 8)      w.bit8..15  = TAB[b.bit8..15]
  w=w xor (TAB[c.bit16..23] rol 16)    w.bit16..23 = TAB[c.bit16..23]
  w=w xor (TAB[d.bit24..31] rol 24)    w.bit24..31 = TAB[d.bit24..31]
  return w                             return w

 aes_setkey(mode,key,keysize):  ;out: RK[0..43/51/59], nr=10/12/14
  aes_generate_tables   ;<-- unless tables are already initialized
  if keysize<>128 and keysize<>192 and keysize<>256 then error  ;size in bits
  rc=01h, j=0, jj=keysize/32, nr=jj+6   ;jj=4,6,8      ;\
  for i=0 to (nr+1)*4-1                 ;nr=10,12,14   ; copy 16/24/32-byte key
    if i<jj then w=[key+i*4+0..3]                      ; to RK[0..3/5/7]
    else w=w xor RK[i-jj]                              ; and, make
    RK[i]=w, j=j+1                                     ; RK[4/6/8..43/51/59]
    if j=jj then                                       ;
      w=w ror 8, w=scatter8(FSb,w,w,w,w) xor rc        ;
      j=0, rc=rc*2, if rc>0FFh then rc=rc xor 11Bh     ;
    if j=4 and jj=8 then w=scatter8(FSb,w,w,w,w)       ;/
  if mode=DECRYPT then
    for i=0 to nr/2-1     ;swap entries (except middle one)
      for j=0 to 3
        w=RK[i*4+j], v=RK[nr*4-i*4+j]
        RK[i*4+j]=v, RK[nr*4-i*4+j]=w
    for i=4 to nr*4-1     ;modify entries (except RK[0..3] and RK[nr*4+0..3])
      w=RK[i], w=scatter8(FSb,w,w,w,w), RK[i]=scatter32(RT,w,w,w,w)

  DSi AES Big-Endian Tables and Test Values

 aes_generate_tables:
  for i=0 to 0FFh               ;compute pow and log tables...
    if i=0 then x=01h, else x=x xor x*2, if x>0FFh then x=x xor 11Bh
    pow[i]=x, log[x]=i
  for i=0 to 0FFh               ;generate the forward and reverse S-boxes...
    x=pow[0FFh-log[i]]          ;multiplicative inverse of i
    x=x xor (x rol 1) xor (x rol 2) xor (x rol 3) xor (x rol 4) xor 63h
    if i=0 then x=63h
    FSb[i]=x, RSb[x]=i
  for i=0 to 0FFh               ;generate the forward and reverse tables...
    x=FSb[i]*2, if x>0FFh then x=x xor 11Bh
    FT[i]=(FSb[i]*01010100h) xor (x*01000001h)
    w=00000000h, x=RSb[i]
    if x<>00h then   ;ie. not at i=63h
      w=w+pow[(log[x]+log[0Eh]) mod 00FFh]*1h
      w=w+pow[(log[x]+log[09h]) mod 00FFh]*100h
      w=w+pow[(log[x]+log[0Dh]) mod 00FFh]*10000h
      w=w+pow[(log[x]+log[0Bh]) mod 00FFh]*1000000h
    RT[i]=w

  pow[00h..FFh] = 01,03,05,0F,11,..,C7,52,F6,01   ;pow  ;\needed temporarily
  log[00h..FFh] = 00,FF,19,01,32,..,C0,F7,70,07   ;log  ;/for table creation
  FSb[00h..FFh] = 63,7C,77,7B,F2,..,B0,54,BB,16   ;Forward S-box
  RSb[00h..FFh] = 52,09,6A,D5,30,..,55,21,0C,7D   ;Reverse S-box
  FT[00h..FFh] = A56363C6,847C7CF8,..,3A16162C    ;Forward Table
  RT[00h..FFh] = 50A7F451,5365417E,..,4257B8D0    ;Reverse Table
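
As a cross-check of the big-endian table generation, key schedule, and ENCRYPT path, here is a compact Python sketch. It is not a reference implementation: memory words are read as 32bit little-endian values (consistent with RK[0]=2D534541h for the test key), the S-box loop takes the multiplicative inverse pow[0FFh-log[i]] before the affine step, and helper names such as rol, gmul, and aes_setkey_encrypt are introduced only for this example.

```python
def xtime(x):
    """Multiply by 2 in GF(2^8), reducing by the AES polynomial 11Bh."""
    x <<= 1
    return x ^ 0x11B if x > 0xFF else x

def rol(v, n, bits):
    """Rotate a bits-wide value left by n."""
    return ((v << n) | (v >> (bits - n))) & ((1 << bits) - 1)

POW, LOG = [0] * 256, [0] * 256          # powers/logarithms of generator 03h
x = 1
for i in range(256):
    POW[i], LOG[x] = x, i
    x ^= xtime(x)

FSB, RSB = [0] * 256, [0] * 256          # forward and reverse S-boxes
for i in range(256):
    x = POW[255 - LOG[i]]                # multiplicative inverse of i
    x ^= rol(x, 1, 8) ^ rol(x, 2, 8) ^ rol(x, 3, 8) ^ rol(x, 4, 8) ^ 0x63
    if i == 0:
        x = 0x63
    FSB[i], RSB[x] = x, i

def gmul(a, b):
    """GF(2^8) multiplication via the pow/log tables."""
    return POW[(LOG[a] + LOG[b]) % 255] if a and b else 0

FT, RT = [0] * 256, [0] * 256            # forward and reverse tables
for i in range(256):
    s, r = FSB[i], RSB[i]
    FT[i] = (s * 0x01010100) ^ (xtime(s) * 0x01000001)
    RT[i] = (gmul(r, 0x0E) | gmul(r, 0x09) << 8
             | gmul(r, 0x0D) << 16 | gmul(r, 0x0B) << 24)

def scatter32(tab, a, b, c, d):
    return (tab[a & 0xFF] ^ rol(tab[(b >> 8) & 0xFF], 8, 32)
            ^ rol(tab[(c >> 16) & 0xFF], 16, 32)
            ^ rol(tab[(d >> 24) & 0xFF], 24, 32))

def scatter8(tab, a, b, c, d):
    return (tab[a & 0xFF] | tab[(b >> 8) & 0xFF] << 8
            | tab[(c >> 16) & 0xFF] << 16 | tab[(d >> 24) & 0xFF] << 24)

def aes_setkey_encrypt(key):
    """Expand a 16/24/32-byte key into round keys; returns (RK, nr)."""
    jj = len(key) // 4
    nr = jj + 6
    rk, rc, j, w = [], 1, 0, 0
    for i in range((nr + 1) * 4):
        if i < jj:
            w = int.from_bytes(key[i * 4:i * 4 + 4], "little")
        else:
            w ^= rk[i - jj]
        rk.append(w)
        j += 1
        if j == jj:
            w = rol(w, 24, 32)                       # same as ror 8
            w = scatter8(FSB, w, w, w, w) ^ rc
            j, rc = 0, xtime(rc)
        if j == 4 and jj == 8:
            w = scatter8(FSB, w, w, w, w)
    return rk, nr

def aes_encrypt_block(rk, nr, src):
    """Encrypt one 16-byte block (ENCRYPT column of the core function)."""
    y = [rk[k] ^ int.from_bytes(src[k * 4:k * 4 + 4], "little") for k in range(4)]
    for i in range(1, nr):
        y = [rk[i * 4 + k] ^ scatter32(FT, y[k], y[(k + 1) % 4],
                                       y[(k + 2) % 4], y[(k + 3) % 4])
             for k in range(4)]
    out = [rk[nr * 4 + k] ^ scatter8(FSB, y[k], y[(k + 1) % 4],
                                     y[(k + 2) % 4], y[(k + 3) % 4])
           for k in range(4)]
    return b"".join(w.to_bytes(4, "little") for w in out)
```

The sketch covers only encryption; the DECRYPT path would additionally need the round key swap and the RT/RSb variants of the round function shown in the pseudocode.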

  key = "AES-Test-Key-Str-1234567-Abcdefg"  ;use only 1st bytes for 128/192bit
  128bit ENCRYPT --> RK[0..9..30..43] = 2D534541..ED0DC6FA..43DAC81C..0F5026BB
  192bit ENCRYPT --> RK[0..9..30..51] = 2D534541..4AAB3D82..29CA38D2..CA4DFE3B
  256bit ENCRYPT --> RK[0..9..30..59] = 2D534541..1AA51359..CCB886C8..88956C9C
  128bit DECRYPT --> RK[0..9..30..43] = F653079B..47DD8A1C..1C2070A7..7274532D
  192bit DECRYPT --> RK[0..9..30..51] = 3CEC6AFF..C4F96B6F..AE36B4AE..7274532D
  256bit DECRYPT --> RK[0..9..30..59] = DE7ADCD9..8C559ADD..067A387E..7274532D

  [key+0..15]    = "AES-Test-Key-Str-1234567-Abcdefg"
  [iv+0..15]     = "Nonce/InitVector"
  [xtra+0..20]   = "Extra-Associated-Data"  ;\for CCM
  iv_len=12, mac_len=16, xtra_len=21        ;/
  Unencrypted:   [dta+0..113Fh] = "Unencrypted-Data", 190h x "TestPadding"
  AES-ECB:       [dta+0..113Fh] = 5F,BD,04,DB,..,E4,07,F4,B6  ;\
  AES-CBC:       [dta+0..113Fh] = 0B,BB,53,FA,..,DD,28,6D,AE  ;
  AES-CFB128:    [dta+0..113Fh] = F4,75,4F,0E,..,73,B5,D7,E7  ; keysize=128
  AES-CFB8:      [dta+0..113Fh] = F4,10,6A,83,..,BF,1B,16,3E  ;
  AES-CTR:       [dta+0..113Fh] = F4,75,4F,0E,..,04,DF,EB,BA  ;
  AES-CCM:       [dta+0..113Fh] = FD,1A,6D,98,..,EE,FD,68,F6  ;
  AES-CCM (ori): [mac+0..0Fh]   = FD,F9,FE,85,..,4F,50,3C,AF  ;
  AES-CCM (DSi): [mac+0..0Fh]   = xx,xx,xx,xx,..,xx,xx,xx,xx  ;/
  AES-ECB:       [dta+0..113Fh] = 0E,69,F5,1A,..,9A,5F,7A,9A  ;-keysize=192
  AES-ECB:       [dta+0..113Fh] = C6,FB,68,C1,..,14,89,6C,E0  ;-keysize=256

  DSi ES Block Encryption

ES Block Encryption, for lack of a better name, is a Nintendo DSi specific data encryption method. It's used for some SD/MMC files:
  FAT16:\ticket\000300tt\4ggggggg.tik (tickets)
  SD Card: .bin files (aka Tad Files)
  twl-*.der files (within the "verdata" NARC file)

Block Layout
  00000h      BLKLEN   Data Block      (AES-CCM encrypted)
  BLKLEN+00h  10h      Data Checksum   (AES-CCM MAC value on above Data)
  BLKLEN+10h  1        Fixed 3Ah       (AES-CTR encrypted)
  BLKLEN+11h  0Ch      Nonce           (unencrypted)
  BLKLEN+1Dh  1        BLKLEN.bit16-23 (AES-CTR encrypted)
  BLKLEN+1Eh  1        BLKLEN.bit8-15  (AES-CTR encrypted)
  BLKLEN+1Fh  1        BLKLEN.bit0-7   (AES-CTR encrypted)
BLKLEN can be max 20000h. If the Data is bigger than 128Kbytes, then it's split into multiple blocks with BLKLEN=20000h each (the last block can have a smaller BLKLEN).
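
The block splitting can be illustrated with a short Python sketch. It assumes that the blocks are simply stored back to back, each followed by its 20h-byte trailer (MAC plus footer); that storage layout and the function name are assumptions of this example.

```python
MAX_BLKLEN = 0x20000   # largest Data Block size
TRAILER_LEN = 0x20     # 10h-byte MAC + 10h-byte footer (3Ah, nonce, BLKLEN)

def es_block_layout(data_size):
    """Return a list of (file_offset, blklen) for data_size bytes of data."""
    blocks, offset, remaining = [], 0, data_size
    while remaining > 0:
        blklen = min(remaining, MAX_BLKLEN)
        blocks.append((offset, blklen))
        offset += blklen + TRAILER_LEN   # next block starts after the trailer
        remaining -= blklen
    return blocks
```

For example, 45000h bytes of data would be stored as two blocks with BLKLEN=20000h and a final block with BLKLEN=5000h.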

Data Block Encryption/Decryption (AES-CCM)
  IV[00h..0Bh]=[BLKLEN+11h..1Ch]  ;Nonce
  IV[0Ch..0Fh]=Don't care (not used for CCM)
With that IV value, apply AES-CCM on the Data Block:
  00000h      BLKLEN   Data Block      (AES-CCM)
Observe that some DSi files have odd BLKLEN values, so you may need to append padding bytes to the Data Block (the DSi hardware requires full 16-byte chunks for encryption/decryption).

Data Block Padding (16-byte alignment)
For encryption, it's simple: Just append 00h byte(s) as padding value.
For decryption, it's more complicated: The padding values should be ENCRYPTED 00h-bytes (required to get the same MAC result as for encryption). If you don't want to verify the MAC, then you could append whatever dummy bytes. If you want to verify the MAC, then you could pre-calculate the padding values as follows:
  IV[00h..02h]=BLKLEN/10h + 1     ;CTR value for last 16-byte block
  IV[03h..0Eh]=[BLKLEN+11h..1Ch]  ;Nonce
  IV[0Fh]=02h                     ;Indicate 3-byte wide CTR (fixed on DSi)
Then, use AES-CTR (not CCM) to encrypt sixteen 00h-bytes, the last byte(s) of the result can be then used as padding value(s). The padding values should be pre-calculated BEFORE starting the CCM decryption (the DSi hardware allows only one AES task at once, so they cannot be calculated via AES-CTR when AES-CCM decryption is in progress).
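
A small Python sketch of the padding calculation inputs. The IV layout follows the three lines above; storing the 3-byte CTR value with its least significant byte at IV[00h] is an assumption of this example (the text above doesn't state the byte order), and both function names are hypothetical.

```python
def padding_len(blklen):
    """Number of padding bytes needed to reach a 16-byte boundary."""
    return (-blklen) % 0x10

def padding_ctr_iv(nonce, blklen):
    """Build the AES-CTR IV used to pre-calculate the padding bytes."""
    if len(nonce) != 12:
        raise ValueError("nonce must be 12 bytes")
    counter = blklen // 0x10 + 1            # CTR value for last 16-byte block
    # ASSUMPTION: counter is stored LSB first at IV[00h..02h]
    return counter.to_bytes(3, "little") + nonce + b"\x02"
```

Encrypting sixteen 00h-bytes with that IV (via AES-CTR) would then yield the keystream for the last block; its final padding_len(BLKLEN) bytes are the padding values.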

Verifying the Footer values (AES-CTR)
This step is needed only for verification purposes (encryption tools should create these values, but decryption tools may or may not verify them).
  IV[00h]=00h                     ;Zero
  IV[01h..0Ch]=[BLKLEN+11h..1Ch]  ;Nonce
  IV[0Dh..0Fh]=00h,00h,00h        ;Zero
With that IV value (and same Key as for AES-CCM), apply AES-CTR on the last 16 bytes of the block:
  BLKLEN+10h  1        Fixed 3Ah       (AES-CTR encrypted)
  BLKLEN+11h  0Ch      Nonce           (unencrypted)
  BLKLEN+1Dh  1        BLKLEN.bit16-23 (AES-CTR encrypted)
  BLKLEN+1Eh  1        BLKLEN.bit8-15  (AES-CTR encrypted)
  BLKLEN+1Fh  1        BLKLEN.bit0-7   (AES-CTR encrypted)
AES-CTR is XORing the data stream (encrypted bytes will turn into unencrypted bytes, and vice-versa), so the result would look as follows:
  BLKLEN+10h  1        Fixed 3Ah       (unencrypted)       (to be verified)
  BLKLEN+11h  0Ch      Nonce           (AES-CTR encrypted) (useless/garbage)
  BLKLEN+1Dh  1        BLKLEN.bit16-23 (unencrypted)       (to be verified)
  BLKLEN+1Eh  1        BLKLEN.bit8-15  (unencrypted)       (to be verified)
  BLKLEN+1Fh  1        BLKLEN.bit0-7   (unencrypted)       (to be verified)
Mind that BLKLEN can be odd, so data at BLKLEN+00h..1Fh isn't necessarily located at 4-byte aligned addresses.
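
The footer check can be sketched in Python as follows, operating on the 16 bytes from BLKLEN+10h..1Fh after the AES-CTR pass. The function name is an assumption of this example; the AES-CTR step itself is not shown.

```python
def check_footer(footer, blklen):
    """Verify the decrypted footer: fixed 3Ah byte and the 24-bit BLKLEN."""
    if len(footer) != 16:
        raise ValueError("footer must be 16 bytes")
    # bytes 0Dh..0Fh hold BLKLEN.bit16-23, bit8-15, bit0-7 (most significant first)
    stored_len = int.from_bytes(footer[0x0D:0x10], "big")
    # bytes 01h..0Ch are the (now garbled) nonce and are ignored
    return footer[0] == 0x3A and stored_len == blklen
```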

  DSi Cartridge Header

Old NDS Header Entries
The first 180h bytes of the DSi Header are essentially the same as on NDS:
DS Cartridge Header
New/changed entries in DSi carts are:
  012h 1    Unitcode (00h=NDS, 02h=NDS+DSi, 03h=DSi) (bit1=DSi)
  01Ch 1    NDS: Reserved / DSi: Unknown (03h=Normal, 0Bh=System Menu/Settings)
  01Dh 1    NDS: Region   / DSi: Unknown (00h=Normal, 01h=System Settings)
  068h 4    Icon/Title offset (same as NDS, but with new extra entries)
  080h 4    Total Used ROM size, EXCLUDING DSi area
  088h 4    NDS: Reserved / DSi: Unknown (B8h,D0h,04h,00h)
  08Ch 4    NDS: Reserved / DSi: Unknown (44h,05h,00h,00h)
  090h 4    NDS: Reserved / DSi: Unknown (16h,00h,16h,00h)

New DSi Header Entries
  180h 20   Global MBK1..MBK5 Setting, WRAM Slots
  194h 12   Local ARM9 MBK6..MBK8 Setting, WRAM Areas
  1A0h 12   Local ARM7 MBK6..MBK8 Setting, WRAM Areas
  1ACh 3    Global MBK9 Setting, WRAM Slot Master
  1AFh 1    ... whatever, rather not 4000247h WRAMCNT ?
                 (above byte is usually 03h)
                 (but, it's FCh in System Menu?)
                 (but, it's 00h in System Settings?)
  1B0h 4    Region flags (bit0=JPN, bit1=USA, bit2=EUR, bit3=AUS, bit4=CHN,
              bit5=KOR, bit6-31=Reserved) (FFFFFFFFh=Region Free)
  1B4h 4    Access control  (uh ???)  ;whatever Flags (AES Key Select?)      ?
  1B8h 4    ARM7 SCFG EXT mask (controls which devices to enable) (uh ???)
  1BCh 3    Reserved/flags? (zerofilled)
  1BFh 1    Flags? (usually 01h) (DSiware Browser: 0Bh)
              bit2: Custom Icon  (0=No/Normal, 1=Use banner.sav)
  1C0h 4    ARM9i rom offset (usually XX03000h, XX=1MB-boundary after NDS area)
  1C4h 4    Reserved (zero-filled)
  1C8h 4    ARM9i load address
  1CCh 4    ARM9i size
  1D0h 4    ARM7i rom offset
  1D4h 4    Pointer to base address where various structures and parameters
              are passed to the title (=something passed from firmware to ram?)
  1D8h 4    ARM7i load address
  1DCh 4    ARM7i size
  1E0h 4    Digest NTR region offset (usually same as ARM9 rom offs, 0004000h)
  1E4h 4    Digest NTR region length
  1E8h 4    Digest TWL region offset (usually same as ARM9i rom offs, XX03000h)
  1ECh 4    Digest TWL region length
  1F0h 4    Digest Sector Hashtable offset ;\SHA1 HMAC's on all sectors
  1F4h 4    Digest Sector Hashtable length ;/in above NTR+TWL regions
  1F8h 4    Digest Block Hashtable offset  ;\SHA1 HMAC's on each N entries
  1FCh 4    Digest Block Hashtable length  ;/in above Sector Hashtable
  200h 4    Digest Sector size       (eg. 400h bytes per sector)
  204h 4    Digest Block sectorcount (eg. 20h sectors per block)
  208h 4    Icon/Title size (usually 23C0h)
  20Ch 4    Reserved ??? (00 00 01 00)
  210h 4    Total Used ROM size, INCLUDING DSi area
  214h 4    Reserved ?   (00 00 00 00)
  218h 4    Reserved ??? (84 D0 04 00) whatever, resembles header entry [088h]
  21Ch 4    Reserved ??? (2C 05 00 00) whatever, resembles header entry [08Ch]
  220h 4    Modcrypt area 1 offset ;usually same as ARM9i rom offs (XX03000h)
  224h 4    Modcrypt area 1 size   ;usually min(4000h,ARM9iSize+Fh AND not Fh)
  228h 4    Modcrypt area 2 offset (0=None)
  22Ch 4    Modcrypt area 2 size   (0=None)
  230h 4    Title ID, Emagcode (aka Gamecode spelled backwards)
  234h 1    Title ID, Filetype (00h=Cartridge, 04h=DSiware, 05h=System Fun
              Tools, [0Fh=Non-executable datafile without cart header],
              15h=System Base Tools, 17h=System Menu)
  235h 1    Title ID, Zero     (00h=Normal)
  236h 1    Title ID, Three    (03h=Normal, why?)
  237h 1    Title ID, Zero     (00h=Normal)
  238h 4    SD/MMC (DSiware) "public.sav" filesize in bytes  (0=none)
  23Ch 4    SD/MMC (DSiware) "private.sav" filesize in bytes (0=none)
  240h 176  Reserved (zero-filled)
  2F0h 10h  Parental Control Age Ratings (for different countries/areas)
             Bit7: Rating exists for local country/area
             Bit6: Game is prohibited in local country/area?
             Bit5: Unused
             Bit4-0: Age rating for local country/area (years)
  300h 20   SHA1 HMAC hash ARM9 (with encrypted secure area)  ;[020h,02Ch]
  314h 20   SHA1 HMAC hash ARM7                               ;[030h,03Ch]
  328h 20   SHA1 HMAC hash Digest master                      ;[1F8h,1FCh]
  33Ch 20   SHA1 HMAC hash Icon/Title                         ;[068h,208h]
  350h 20   SHA1 HMAC hash ARM9i (decrypted)                  ;[1C0h,1CCh]
  364h 20   SHA1 HMAC hash ARM7i (decrypted)                  ;[1D0h,1DCh]
  378h 40   Reserved (zero-filled)
  3A0h 20   SHA1 HMAC hash ARM9 (without 16Kbyte secure area) ;[020h,02Ch]
  3B4h 2636 Reserved (zero-filled)
  E00h 180h Reserved and unchecked region, always zero. Used for passing
              arguments in debug environment.
  F80h 80h  RSA SHA1 signature across header entries [000h..DFFh]
Note: There should be some Age-rating (for firmware's Parental Controls).
  1000h..3FFFh  Non-Load area in ROMs... but contains sth in DSiWare files!?!
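
A few of the header entries above can be extracted with a short Python sketch. It assumes that multi-byte entries are stored in little-endian order (as elsewhere in the NDS header); the function name and the keys of the returned dictionary are made up for this example.

```python
import struct

REGION_NAMES = ["JPN", "USA", "EUR", "AUS", "CHN", "KOR"]   # bit0..bit5

def parse_dsi_header(header):
    """Extract a handful of DSi header entries from the first 1000h bytes."""
    unitcode = header[0x012]
    (region_flags,) = struct.unpack_from("<I", header, 0x1B0)
    # 1C0h=ARM9i rom offset, 1C4h=reserved, 1C8h=load address, 1CCh=size
    arm9i_offset, _, arm9i_load, arm9i_size = struct.unpack_from("<4I", header, 0x1C0)
    emagcode = header[0x230:0x234]           # gamecode spelled backwards
    return {
        "is_dsi": bool(unitcode & 0x02),     # unitcode bit1=DSi
        "regions": [name for bit, name in enumerate(REGION_NAMES)
                    if (region_flags >> bit) & 1],
        "region_free": region_flags == 0xFFFFFFFF,
        "arm9i": (arm9i_offset, arm9i_load, arm9i_size),
        "gamecode": emagcode[::-1].decode("ascii", "replace"),
        "filetype": header[0x234],
    }
```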

DSiware/System Utilities
Files saved on SD card or internal eMMC memory have the same header, with some differences:
  The ARM7 and ARM9 areas may exceed the 4Mbyte NDS-limit
  Entry 3A0h can be zero-filled (in LAUNCHER)
DSiware files are usually marked as [012h]=03h=DSi (exceptions are the DS Download Play and PictoChat utilities, which are marked [012h]=00h=NDS, since they are actually running in NDS mode).

The SHA1 HMAC's in cart header and Digest tables are SHA1 checksums with a 40h-byte HMAC key (values 21h, 06h, C0h, DEh, BAh, ..., 24h). The key is probably stored in encrypted areas of NAND memory and/or in undumped areas of BIOS memory; the raw unencrypted key is also stored in most DSi cartridges (probably used for verifying Digest values when loading additional data after booting). The key can be used for verifying checksums, but (due to the RSA signature) not for changing them. See BIOS chapter for SHA1/HMAC pseudo code.

The RSA SHA1 value is a normal SHA1 (not SHA1 HMAC) across header entries [000h..DFFh]; the 20-byte value is padded to 127-byte size (01h, 105xFFh, 00h, followed by the 20 SHA1 bytes). It can be decrypted (via SWI 22h) using the 80h-byte RSA public keys located in ARM9BIOS (note that there are at least four different RSA keys, one is used for games, and others for system files), and can then be verified against the SHA1 checksum (computed via SWI 27h).
BIOS RSA Functions (DSi only)
The private key needed for encryption is unknown, which unfortunately prevents booting unlicensed (homebrew) software.
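
The comparison step after RSA decryption can be sketched in Python as shown below. The RSA decryption itself (SWI 22h with the public key) is not shown; the function names are assumptions of this example, and the decrypted signature is taken to be the 127-byte padded value described above.

```python
import hashlib

def expected_signature_block(header):
    """Padded SHA1 block that a valid decrypted signature should equal."""
    digest = hashlib.sha1(header[0x000:0xE00]).digest()   # plain SHA1, no HMAC
    return b"\x01" + b"\xFF" * 105 + b"\x00" + digest     # 1+105+1+20 = 127

def signature_matches(decrypted_signature, header):
    """Compare the RSA-decrypted signature against the header contents."""
    return decrypted_signature == expected_signature_block(header)
```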

Modcrypt (AES-CTR)
Modcrypt is a new additional way of encrypting parts of the NDS ROM executable binary modules using AES CTR. It is mostly used to encrypt the ARM9i and ARM7i binaries. DSi cartridges usually have only the ARM9i binary encrypted (as area 1), while NAND based applications have both the ARM9i and ARM7i binaries encrypted (as area 1 and 2).
The initial AES Counter value (IV) is:
  Modcrypt Area 1 IV[0..F]: First 16 bytes of the ARM9 SHA1 HMAC   [300h..30Fh]
  Modcrypt Area 2 IV[0..F]: First 16 bytes of the ARM7 SHA1 HMAC   [314h..323h]
The AES key depends on flags in the cartridge header:
 IF header[01Ch].Bit2 OR header[1BFh].Bit7 THEN (probably for prototypes)
  Debug KEY[0..F]: First 16 bytes of the header                    [000h..00Fh]
 ELSE (commonly used for retail software)
  Retail KEY_X[0..7]: Fixed 8-byte ASCII string                    ("Nintendo")
  Retail KEY_X[8..B]: The 4-byte gamecode, forwards                [00Ch..00Fh]
  Retail KEY_X[C..F]: The 4-byte gamecode, backwards               [00Fh..00Ch]
  Retail KEY_Y[0..F]: First 16 bytes of the ARM9i SHA1 HMAC        [350h..35Fh]
Theoretically, the modcrypt areas can span over any of the ARM9i/ARM7i and ARM9/ARM7 areas (in practice, cartridges should never use modcrypt for the ARM9/ARM7 areas because NDS consoles would leave them undecrypted; that restriction doesn't apply to DSiware).
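
The IV and key selection can be summarized in a Python sketch that only collects the inputs from the header. Byte indices are taken exactly as listed above; how KEY_X/KEY_Y are combined into the final AES key is not covered here, and the function name and dictionary keys are made up for this example.

```python
def modcrypt_parameters(header):
    """Collect the Modcrypt IVs and key material from a DSi header."""
    params = {
        "iv_area1": header[0x300:0x310],   # first 16 bytes of ARM9 SHA1 HMAC
        "iv_area2": header[0x314:0x324],   # first 16 bytes of ARM7 SHA1 HMAC
    }
    if header[0x01C] & 0x04 or header[0x1BF] & 0x80:
        # debug variant (probably for prototypes): key is the header start
        params["mode"] = "debug"
        params["key"] = header[0x000:0x010]
    else:
        # retail variant: KEY_X and KEY_Y
        gamecode = header[0x00C:0x010]
        params["mode"] = "retail"
        params["key_x"] = b"Nintendo" + gamecode + gamecode[::-1]
        params["key_y"] = header[0x350:0x360]
    return params
```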

The NDS format has been extended with a hash tree to verify the entire contents of an NDS ROM. The NDS ROM is divided into sectors, and each sector is hashed and has its hash stored in the digest sector hashtable. The size of a sector is defined in the header as well. Furthermore, the sector hashtable is partitioned and hashed again to form block hashes. This block hashtable is hashed again into a single hash called the digest master hash. These hashtables can be used to verify that the sectors of an NDS ROM have not been tampered with, since the integrity of a sector hash can be verified by a block hash, which in turn can be verified by the master hash. And this hash is part of the header, which is signed with RSA.
The sector hashtable reaches over the NTR and TWL regions, respectively.
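
The three hash levels can be sketched in Python using the standard hmac module. This is only an outline of the tree structure: treating the input as one contiguous byte string covering the NTR and TWL digest regions, and hashing a short final sector or block without padding, are assumptions of this example; the default sizes are the "eg." values from header entries [200h] and [204h].

```python
import hashlib
import hmac

def hmac_sha1(key, data):
    """20-byte SHA1 HMAC of data."""
    return hmac.new(key, data, hashlib.sha1).digest()

def digest_tree(key, data, sector_size=0x400, sectors_per_block=0x20):
    """Return (sector_hashtable, block_hashtable, master_hash)."""
    # level 1: one hash per sector of the digest regions
    sector_table = b"".join(
        hmac_sha1(key, data[o:o + sector_size])
        for o in range(0, len(data), sector_size))
    # level 2: one hash per group of N sector hashes
    group = sectors_per_block * 20
    block_table = b"".join(
        hmac_sha1(key, sector_table[o:o + group])
        for o in range(0, len(sector_table), group))
    # level 3: master hash across the whole block hashtable (header [328h])
    return sector_table, block_table, hmac_sha1(key, block_table)
```

Changing any single byte of the data changes one sector hash, hence one block hash, hence the master hash, which is what allows per-sector verification against the signed header.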

Cartridge Protocol
The DSi cartridge protocol is the same as on NDS, with one new command (3Dh) for unlocking DSi specific memory regions. For details, see
DS Cartridge Protocol

  DSi Touchscreen/Sound Controller

DSi Touchscreen Access

AIC3000D Registers
DSi TSC, Register Summary
DSi TSC[0:00h..1Ah], Basic PLL and Timing Control
DSi TSC[0:1Bh..23h], Codec Control
DSi TSC[0:24h..32h], Status and Interrupt Flags
DSi TSC[0:33h..3Bh], Pin Control
DSi TSC[0:3Ch..55h], DAC/ADC and Beep
DSi TSC[0:56h..7Fh], AGC and ADC
DSi TSC[1:xxh], DAC and ADC Routing, PGA, Power-Controls and MISC Logic
DSi TSC[3:xxh], Touchscreen/SAR Control and TSC[FCh:xxh], Buffer
DSi TSC[04h..05h:xxh], ADC Digital Filter Coefficient RAM
DSi TSC[08h..0Fh:xxh], DAC Digital Filter Coefficient RAM
DSi TSC[20h..2Bh:xxh], TSC[40h..5Fh:xxh] ADC/DAC Instruction RAM

  DSi Touchscreen Access

The Touch Screen Controller (for lower LCD screen) is accessed via SPI bus,
DS Serial Peripheral Interface Bus (SPI)
so far, it's the same as on NDS, but the SPI touchscreen commands have an entirely different format in DSi mode:
The DSi touchscreen registers are selected via a combination of a MODE byte and an INDEX byte. The MODE byte is located at INDEX=00h, and it effectively 'bankswitches' the contents of INDEX=01h..7Fh. INDEX can be incremented manually or automatically (but, confusingly, the manual increment doesn't work for reading Y coordinates).
SPI clock should be set to 4MHz for DSi Mode touchscreen access (unlike NDS, which used 2MHz). The PENIRQ bit in port 4000136h is always zero in DSi mode.
When reading data: Write dummy 00h-bytes in output direction.


DSi Touchscreen INDEX values
The INDEX/Direction byte is written as first byte after SPI chip select:
  0     Direction for following data bytes (0=Write, 1=Read)
  1-7   INDEX (00h..7Fh) for following data bytes (auto-increasing)
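
A tiny Python sketch of how the first SPI byte is composed from the two fields above. The function names are made up for this example, and the actual SPI transfer (chip select, clocking) is not shown.

```python
def tsc_index_byte(index, read):
    """Combine INDEX (bit1-7) and direction (bit0: 0=Write, 1=Read)."""
    if not 0 <= index <= 0x7F:
        raise ValueError("INDEX must be in range 00h..7Fh")
    return (index << 1) | (1 if read else 0)

def tsc_select_mode(mode):
    """Byte sequence that writes the MODE register at INDEX=00h."""
    return bytes([tsc_index_byte(0x00, read=False), mode])
```

For example, selecting MODE=FCh would send the bytes 00h,FCh, and a following read starting at INDEX=01h would begin with the byte 03h.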
The meaning of the separate INDEX values is:
  00h       R/W  MODE register (should be 03h or FCh)
When MODE=03h (Status/Control Registers)
  01h       R    Unknown (00h)
  02h..06h  mix  Unknown (18h,87h,22h,04h,20h) (writeable: FFh,BFh,F7h,E7h,EDh)
  07h..08h  R    Unknown (00h,00h)
  09h       R    State   (40h=Released, 80h=Pressed)
  0Ah..0Ch  R    Unknown (00h,00h,00h)
  0Dh       mix  Unknown (01h on 1st read, 00h thereafter?) (upper 6bit R/W)
  0Eh       mix  State   (ADh=Released, ACh=Pressed)        (upper 6bit R/W)
  0Fh       R/W  Unknown (A0h,88h,81h)
  12h..14h  mix  Unknown (usually 00h-filled) (writeable: E7h,FFh,07h)
  15h       R    Unknown (00h)
  16h..21h  R/W  Unknown Six 16bit values (0000h..1FFFh) (usually 0000h)
  22h..7Fh  R    Unknown (00h-filled)
When MODE=FCh (Touchscreen X/Y Coordinates)
  01h..0Ah  R    Five Touchscreen X Coordinates (big-endian MSB,LSB each)
  0Bh..14h  R    Five Touchscreen Y Coordinates (big-endian MSB,LSB each)
  15h..7Fh  R    Reserved (garbage) (further Touchscreen X/Y Coordinates)
Unknown what happens when using MODE values other than 03h and FCh (might give access to further registers, or return somehow distorted results, or whatever).
Note: The DSi Sound utility also uses MODEs 00h, 01h, and 08h.

When MODE=00h (?)
  01h..0Fh    00 01 44 00 00 00 00  00 00 00 00 00 00 00 00
  10h..1Fh 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
  20h..2Fh 03 00 00 00 80 99 11 08  00 00 00 00 00 00 00 00
  30h..3Fh 00 00 09 34 32 12 03 02  03 66 60 00 19 05 00 D4
  40h..4Fh 00 08 08 00 19 38 00 00  00 00 00 EE 10 D8 7E E3
  50h..5Fh 00 00 80 00 00 00 00 00  7F 00 00 00 00 00 00 00
  60h..6Fh 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
  70h..7Fh 00 00 00 00 D2 24 00 00  00 00 00 00 00 00 00 00
After FFh-filling, this crashed, and after REBOOT it became:
  01h..0Fh    00 01 44 03 A1 15 00  00 00 00 87 83 00 80 80 ;<--
  10h..1Fh 08 00 87 83 80 80 04 00  00 00 01 00 00 00 01 00 ;<--
  20h..2Fh 00 00 00 00 80 99 11 08  00 00 00 00 00 00 00 00 ;<-
  30h..3Fh 00 00 01 34 32 12 02 02  03 66 60 00 19 05 00 D4 ;<-
  40h..4Fh 00 08 08 00 0F 38 00 00  00 00 00 EE 10 D8 7E E3 ;<-
  50h..5Fh 00 00 80 00 00 00 00 00  7F 00 00 00 00 00 00 00
  60h..6Fh 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
  70h..7Fh 00 00 00 00 D2 24 00 00  00 00 00 00 00 00 00 00
  80h..    00...
Then, after reading, many bytes changed back to 00.

When MODE=01h (?)
  01h..0Fh    00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
  10h..1Fh 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
  20h..2Fh D6 20 F0 44 9E 9E A7 A7  4E 4E 15 15 20 86 00 43 ;
  30h..3Fh 40 40 61 00 00 00 00 00  00 00 00 00 00 00 00 00 ;
  40h..4Fh 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
  50h..5Fh 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
  60h..6Fh 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
  70h..7Fh 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
  80h..    00...
After FFh-filling, this crashed, and after REBOOT it became:
  same as above.

When MODE=02h, 05h..07h, 08h(though used), 09h..FBh, FEh (?)
  All 00h-filled
Unknown if/how coefficient RAM and instruction RAM can be enabled.

When MODE=04h (?)
  01h..0Fh    00 01 17 01 17 7D D3  7F E1 80 1F 7F C1 7F FF
  10h..1Fh 00 00 00 00 00 00 00 00  7F FF 00 00 00 00 00 00
  20h..2Fh 00 00 7F FF 00 00 00 00  00 00 00 00 7F FF 00 00
  30h..3Fh 00 00 00 00 00 00 7F FF  00 00 00 00 00 00 00 00
  40h..4Fh 00 00 00 00 00 00 00 00  7F FF 00 00 00 00 7F FF
  50h..5Fh 00 00 00 00 00 00 00 00  7F FF 00 00 00 00 00 00
  60h..6Fh 00 00 7F FF 00 00 00 00  00 00 00 00 7F FF 00 00
  70h..7Fh 00 00 00 00 00 00 7F FF  00 00 00 00 00 00 00 00
  80h..    00...

Mode FCh
  After index 7Fh, it actually REPEATs the last byte (instead of returning 00s)

When MODE=FDh (?)
  01h..0Fh    00 00 00 00 00 00 00  00 00 00 24 00 00 09 00 ;<--
  10h..1Fh 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
  ...      00...

When MODE=FFh (?)
  01h..0Fh<01>00 00 01 00 01 00 00  00 00 00 00 00 00 00 00 ;<-- !!
  10h..1Fh 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
  ...      00...
  70h..7Fh 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 FF ;<--
  80h...   FF...                                            ;<--
Thereafter, ALL MODES return the above values (except index 0, which returns
00h or 01h, depending on bit0 of the value written to index 0).

Pen Down Testing
  if (TSC[3:09h] AND 40h)<>0 then return(not_pressed)   ;ADC Ready Flag
  if (TSC[3:0Eh] AND 03h)<>0 then return(not_pressed)   ;Undocumented Flags?
  return(pressed)
Note: On NDS, this would be done by reading port 4000136h.bit6, which isn't supported in DSi mode.
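The two checks above can be sketched in Python as follows. This is only an illustration: tsc_read(page, index) is a hypothetical callback (not a real API) that is assumed to return one byte from the given TSC page and index, and the sketch assumes the pen counts as pressed when neither check fails.

```python
def pen_is_down(tsc_read):
    """Return True if the pen appears to be down.

    tsc_read(page, index) is a hypothetical helper that returns one
    register byte; how it talks to the TSC over SPI is not shown here.
    """
    # Page 3, index 09h, bit6: "ADC Ready Flag" (set = not pressed)
    if tsc_read(0x03, 0x09) & 0x40:
        return False
    # Page 3, index 0Eh, bit0-1: undocumented flags (nonzero = not pressed)
    if tsc_read(0x03, 0x0E) & 0x03:
        return False
    return True


def make_fake_tsc(registers):
    """Build a stand-in tsc_read() from a {(page, index): byte} dict (test aid only)."""
    return lambda page, index: registers.get((page, index), 0x00)
```

For example, pen_is_down(make_fake_tsc({(3, 0x09): 0x40})) reports the pen as not pressed, because the flag in bit6 is set.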

X/Y Coordinate Reading
  touchdata[0..19] = TSC[FCh:01h..14h]     ;read page FCh, index(1..20)
  rawx=0, rawy=0
  for i=0 to 8 step 2
    x = touchdata[i+0]*100h+touchdata[i+1]
    y = touchdata[i+10]*100h+touchdata[i+11]
    if (x or y) and F000h then return(not_pressed)
    rawx=rawx+x, rawy=rawy+y
  return(rawx/5, rawy/5)
The resulting 12bit coordinates are the same as on NDS (ie. they need to be further processed using the Calibration Points from User Settings).
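A minimal Python sketch of the averaging step in the pseudocode above. It assumes the 20 bytes from page FCh, index 01h..14h have already been read into a sequence; the function name average_touch is made up for this example, and the SPI transfer itself is not shown.

```python
def average_touch(touchdata):
    """Average five big-endian X/Y samples, or return None if not pressed.

    touchdata: 20 bytes, five 16bit X values followed by five 16bit Y values.
    """
    if len(touchdata) != 20:
        raise ValueError("expected exactly 20 bytes (index 01h..14h)")
    raw_x = 0
    raw_y = 0
    for i in range(0, 10, 2):
        x = (touchdata[i + 0] << 8) | touchdata[i + 1]
        y = (touchdata[i + 10] << 8) | touchdata[i + 11]
        # Any nonzero state bits (bit12-15) mean the sample is not a valid press
        if (x | y) & 0xF000:
            return None
        raw_x += x
        raw_y += y
    # Integer average of the five raw 12bit conversions
    return raw_x // 5, raw_y // 5
```

The result is still a raw 12bit pair; converting it to screen pixels needs the calibration step mentioned above.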

Touchscreen X/Y Coordinates
  0-11   Coordinate (0..FFFh) (usually 000h when not pressed)
  12-14  State (0=Pressed, 7=Released) (or sometimes also 1 or 3=Released)
  15     State Changed (0=No, 1=Newly pressed/released; cleared after read)
Bit12-14 are usually set to 7 when releasing the screen (though sometimes they become 1 or 3 when releasing the screen, and stay so until the screen is pressed again).
Bit15 is cleared after reading (so it will be usually seen only in the first MSB, ie. at INDEX=01h) (though maybe it can also occur elsewhere if it becomes newly set during the SPI transfer).
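The bit layout above can be unpacked as in the following Python sketch. The function name decode_sample and the dictionary keys are made up for this example; the input is one 16bit value already assembled from its MSB and LSB.

```python
def decode_sample(value):
    """Split one 16bit touchscreen value into the fields listed above."""
    return {
        "coordinate": value & 0x0FFF,      # bit0-11: 12bit coordinate
        "state": (value >> 12) & 0x7,      # bit12-14: 0=pressed, 7 (or 1, 3)=released
        "changed": bool(value & 0x8000),   # bit15: newly pressed/released
    }
```

A value of 0123h decodes to coordinate 123h with state 0 (pressed), while F000h decodes to coordinate 000h with state 7 and the changed flag set.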

Odd Effects
Touchscreen coordinates should be read by setting INDEX=01h, and then reading 20 bytes continuously (ie. from automatically increasing indices 01h..14h). Trying to increase the index manually (ie. using 1-byte reads via separate SPI transfers) won't work: The hardware will return only X coordinates for all indices (but no Y coordinates), ie. the upper bits of the index are ignored, though bit0 does properly select MSBs/LSBs of the 16bit values.
Trying to read more than 20 bytes will return further touchscreen coordinates (which might be further conversions, or just mirrors of the first 20 bytes). Basically, there will be five X coords, followed by five Y coords, with a few odd exceptions: INDEX=0Bh..1Ch will return nine Y coords (instead of five), and INDEX=7Fh will have an incomplete 16bit value (MSB only, without LSB at INDEX=80h). INDEX=80h and up will return 00h-bytes (ie. in MODE=FCh, the index doesn't wrap from 7Fh to 00h, unlike MODE=03h, which does wrap from index 7Fh to 00h).
The five normally used X/Y coordinate pairs are apparently the results from the five most recent conversions; it is unknown which of the five values are newest and which are oldest (they might be sorted newest..oldest, or vice-versa, or located at random locations in a ring-buffer; anyways, it doesn't really matter since the values are just added together).

The microphone input was part of the TSC on NDS. In DSi mode it is reportedly somehow changed, using a new "CODEC" (whatever that means). Maybe it's accessed directly via an ARM7 port (and/or TEAK port?), instead of via SPI bus?

NDS Backwards Compatibility Mode
The DSi hardware can emulate the NDS-style touchscreen protocol (with X/Y/MIC channels and with additional PENIRQ flag; but without Pressure or Temperature channels).