nocash mp3 decoder
Here's my approach to mp3 decoding. The project is based on the keyj minimp3 source code, manually ported to 80x86 asm code. The goals are to figure out how to decode mp3 files, and to make it fast enough to work on older hardware.
The current version is for Windows 9x or later; it should work on something like 80486DX2-??MHz (not tested yet). I am quite optimistic that I can port it to PSX or GBA consoles, if necessary with reduced output quality (but without requiring users to convert their .mp3 files to lower bitrates).

Feedback needed...
Before trying to port the code to MIPS or ARM, I would be glad about some feedback on the 80x86 version.
Testing: I still need benchmarks for retro PCs (like a working 80486DX2-66MHz, or anything from 80386 to Pentium I). Please download the .mp3 file (linked in the benchmarks section), and run mp3play with the /test switch, then copy/paste the screen output (the cycle counters may be all zero on older PCs, but the millisecond value should work everywhere). And, of course, check if the sound comes out okay (if the PC is too slow, try using the /fast /mono /half switches).
DOS Version: There's currently only the win32 version (but the HXDOS extension should be able to run win32 code in DOS). I am also considering making a native DOS version (but only if 80486 benchmarks confirm that the code could be fast enough to run on yet older PCs).
Code Reviews: Suggestions for making the code faster would be welcome (for the overall logic you could look at my asm code, or the keyj minimp3 high level code). Maybe there are ways to make the code more straightforward, or to get away with fewer multiplications...?
Something like MMX would probably be faster, but PCs that support MMX are probably fast enough for mp3 anyways (unless one wants to combine audio/video decoding).
Expert Question: Mp3 decoders are converting samples from "frequency domain to time domain". I don't know if that means what I think, and how and where the code is doing that. Anyways, could one tweak those frequencies to produce output at different sample rates? Like converting 48kHz to 44kHz or 24kHz (if the sound hardware doesn't support 48kHz, or if the CPU is too slow for that).
Forums: I've posted about this project in nesdev and psxdev and vogons.
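Regarding the "frequency domain to time domain" question above: in MPEG-1 Layer III, that step is the IMDCT (the imdct36/imdct12 entries in the benchmark profile), followed by the polyphase synthesis filterbank (the synth/dct entries). The following is only a minimal sketch of the textbook IMDCT formula in Python, not how mp3.asm or keyj minimp3 implement it; the function name is made up for illustration, and windowing, overlap-add and the synthesis filterbank are left out.

```python
import math

def imdct_naive(spectrum):
    """Naive O(n^2) IMDCT: n/2 frequency lines in, n time samples out.

    Hypothetical helper for illustration only; real decoders use fast
    factorized versions instead of this double loop.
    """
    half = len(spectrum)   # 18 for long blocks, 6 for short blocks
    n = 2 * half           # 36 or 12 output samples
    out = []
    for i in range(n):
        acc = 0.0
        for k in range(half):
            angle = math.pi / (2 * n) * (2 * i + 1 + half) * (2 * k + 1)
            acc += spectrum[k] * math.cos(angle)
        out.append(acc)
    return out

# One long block: 18 frequency lines become 36 time samples.
samples = imdct_naive([1.0] + [0.0] * 17)
print(len(samples))
```

Each input value is the amplitude of one cosine, and the output is the sum of those cosines, which is all that "frequency to time" means at this stage.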


80x86 asm decoder
mp3 80x86 asm decoder, version 1.4 - 20 Sep 2024 (hopefully fully compatible now, and skips lyrics/ext/tag+/ape/3di)
mp3 80x86 asm decoder, version 1.3 - 11 Sep 2024 (more compatible with newer PCs, and log*.exe with status info)
mp3 80x86 asm decoder, version 1.2 - 08 Sep 2024 (a bit faster, and attempts to be compatible with newer PCs)
mp3 80x86 asm decoder, version 1.1 - 02 Sep 2024 (cleanup, bugfixes, faster, more accurate)
mp3 80x86 asm decoder, version 1.0 - 12 Aug 2024 (messy, with many commented-out code remnants)

The separate files in the .zip package are:
 mp3.asm         source code for the mp3 decoding functions
 mp3play.asm     source code for the windows command line tool
 mp3play.exe     win32 executable with support for all commandline options
 mp3tiny.exe     win32 executable cutdown kkcrunchy-compressed 9.5Kbyte version

There are four output modes, and several optional commandline switches:
 mp3play "file.mp3"             output to speakers
 mp3play "file.mp3" /test       measure timings without output
 mp3play "file.mp3" "file.wav"  output to .wav file
 mp3play "file.bit" "file.pcm"  verify output against lieff's .pcm files
 mp3play ... /fast              about 10% faster    (slightly less accurate)
 mp3play ... /mono              about twice as fast (faster, and still good)
 mp3play ... /half              half sample rate    (faster, and still good)
 mp3play ... /quarter           quarter sample rate (doesn't sound so good)
 mp3play ... /8bit              for old hardware    (good, but not much faster)
 mp3play ... >"file.txt"        redirect screen output to file (as usual)

80x86 Benchmarks
The timings below are for decoding Pisse - Fliegerbombe.mp3, free download without needing to register any account or mailing list, MP3 V0, 2,584,725 bytes, 44100 Hz, 234kbit/s.
The timings were measured on a 1GHz Pentium III, showing only the raw RAM-to-RAM decoding time (without disk loading and audio output).
 mp3play "Pisse - Fliegerbombe.mp3" /test
 nocash mp3 decoder v1.1, 2024 martin korth, press ctrl+c to quit, BDS now
 file: Pisse - Fliegerbombe.mp3
 file size: 2,584,576, id3 size: 110,453, tag size: 0
 input: 44100 hz, 2 channels, 234 kbit/s
 output: 44100 hz, 2 channels, 16 bit
 audio duration 84,584 milliseconds, decoded in 930 milliseconds
  clock cycles per second:
  read header     8,137
  read extra      5,399
  read granule    31,900
  append main     133,763
  read scalefac   49,550
  xlat scalefac   100,201
  read huffman    1,927,429
  ms stereo       175,986
  i stereo        0
  reorder         10,813
  antialias       498,490
  imdct           1,517,604
   imdct36        1,370,519
   imdct12        16,766
   imdct0         72,374
  synth/dct       7,119,406
   synth.dct32    1,832,372
   synth.output   5,051,125
  total           11,645,068
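
For code reviews, it may help to see where the cycles go. The following Python sketch just divides some of the counters from the above v1.1 output by the total (the helper name is made up for illustration). With these numbers, synth/dct takes roughly 61% of the time, read huffman roughly 17%, and imdct roughly 13%.

```python
# Cycle counts copied from the v1.1 /test output above
# (clock cycles needed per second of audio, on the 1GHz Pentium III).
profile = {
    "read huffman": 1_927_429,
    "antialias": 498_490,
    "imdct": 1_517_604,
    "synth/dct": 7_119_406,
}
total = 11_645_068

def share(cycles, total_cycles):
    """Percentage of the total decoding time spent in one stage."""
    return 100.0 * cycles / total_cycles

# Print the stages, most expensive first.
for name, cycles in sorted(profile.items(), key=lambda kv: -kv[1]):
    print(f"{name:14s} {share(cycles, total):5.1f}%")
```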

Timings for different versions on different processors:
 version       processor           tester             clks/s (ms)          cpu load
 mp3play v0.0  Pentium III, 1GHz   nocash         16,xxx,xxx (pre-release) 0.016    ;\
 mp3play v1.0  Pentium III, 1GHz   nocash         12,190,805 (1019ms)      0.012    ;
 mp3play v1.1  Pentium III, 1GHz   nocash         11,644,126 (935ms)       0.012    ;
 mp3play v1.2  Pentium III, 1GHz   nocash         11,477,895 (903ms)       0.011    ;
 mp3play v1.3  Pentium III, 1GHz   nocash         11,468,126 (899ms)       0.011    ;
 mp3play v1.4  Pentium III, 1GHz   nocash         11,458,863 (885ms)       0.011    ;
 mp3play v1.1  C2D E8600, 3.33GHz  TmEE            7,988,837 (207ms)       0.002    ; perfect
 mp3play v1.1  i5-4300U, 2.5GHz,   TmEE            6,476,551 (281ms)       0.002    ; quality
 mp3play v1.1  i7-6700K, 4GHz,     Memblers        6,427,673 (219ms)       0.002    ;
 mp3play v1.1  i7-9750H, 2.6GHz,   GValiente       4,223,386 (235ms)       0.002    ;
 mp3play v1.4  Cyrix 5x86, 120MHz  Many Bothans       ~57.7M (40,663ms)    0.481    ;
 mp3play v1.4  Cyrix 5x86, 50x2MHz Many Bothans       ~54.3M (45,914ms)    0.542    ;
 mp3play v1.4  Cyrix 5x86, 33x3MHz Many Bothans       ~57.9M (48,979ms)    0.579    ;
 mp3play v1.4  Cyrix 5x86, 80MHz   Many Bothans       ~56.7M (59,960ms)    0.708    ;
 mp3play v1.4  Cyrix 5x86, 66MHz   Many Bothans       ~56.5M (72,499ms)    0.857    ;
 mp3play v1.4  AMD Am5x86, 160MHz  Many Bothans       ~94.8M (50,126ms)    0.592    ;
 mp3play v1.4  AMD Am5x86, 150MHz  Many Bothans       ~90.2M (50,855ms)    0.601    ;
 mp3play v1.4  AMD Am5x86, 133MHz  Many Bothans       ~99.9M (63,540ms)    0.751    ;
 mp3play v1.4  AMD Am5x86, 120MHz  Many Bothans       ~93.0M (65,610ms)    0.775    ;
 mp3play v1.4  AMD Am5x86, 100MHz  Many Bothans       ~91.5M (77,355ms)    0.914    ;/
 mp3play v1.4  AMD Am5x86, 75MHz   Many Bothans       ~91.0M (102,654ms)   1.213    ;\
 mp3play v1.4  Intel 486DX2, 66MHz Many Bothans       ~91.7M (117,499ms)   1.389    ; good
 mp3play v1.4  Intel 486DX2, 50MHz Many Bothans       ~92.6M (156,670ms)   1.852    ;/
 mp3play v1.4  Intel 486SX, 33MHz  Many Bothans       ~87.2M (223,563ms)   2.643    ;-low(?)
 mp3play v1.4  Intel 486SX, 25MHz  Many Bothans       ~87.6M (296,439ms)   3.504    ;-too slow
 mp3play       80486DX2-66MHz      -                       ? (more tests welcome)
 mp3play       80486DX4-100MHz     -                       ? (unknown)
 mp3play       80386               -                       ? (unknown)
 mp3play       Pentium I           -                       ? (unknown)
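
For the 486-class rows, the cpu load column appears to be the decoding time divided by the audio duration (84,584 ms), so anything below 1.0 can play in realtime. A minimal Python sketch of that arithmetic, with made-up helper names:

```python
AUDIO_MS = 84_584   # audio duration of the test file, from the /test output

def cpu_load(decode_ms, audio_ms=AUDIO_MS):
    """Fraction of realtime needed for decoding (1.0 = just fast enough)."""
    return decode_ms / audio_ms

def is_realtime(decode_ms, audio_ms=AUDIO_MS):
    """True if the file decodes faster than it plays."""
    return cpu_load(decode_ms, audio_ms) < 1.0

# Two rows from the table above.
print(f"{cpu_load(117_499):.3f}", is_realtime(117_499))  # Intel 486DX2, 66MHz
print(f"{cpu_load(77_355):.3f}", is_realtime(77_355))    # AMD Am5x86, 100MHz
```

Dividing the clks/s value by the CPU clock gives about the same result (for example, ~91.7M clks/s on a 66MHz CPU is about 1.39).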

There are several commandline switches for faster decoding. The timings below are from mp3play v1.4 on the Pentium III, for the above 44100Hz file, and for another 48000Hz file.
 switches                         44100Hz/234kbit/84584ms  48000Hz/256kbit  speed  quality
 /test                            11,449,871 (899ms)       12,413,478       1.000  perfect
 /test /fast                      10,165,734 (794ms)       11,015,452       1.127  good
 /test             /half          8,191,299  (674ms)       8,839,641        1.404  good
 /test /fast       /half          7,528,040  (605ms)       8,139,875        1.525  good
 /test /fast /mono                5,246,163  (445ms)       5,679,721        2.185  good
 /test /fast /mono /half          3,930,032  (320ms)       4,249,539        2.921  good
 /test /fast /mono /quarter       3,386,091  (268ms)       3,659,921        3.392  low
 /test /fast /mono /quarter /8bit 3,353,499  (269ms)       3,633,287        3.416  low

As a last resort, one could resample the mp3 file before playback. That's uncomfortable, but a lower samplerate can be much faster (than the /half /quarter switches), and a lower bitrate can save disk space (but isn't much faster).
 source file                                               clks/s           speed  quality
 pisse_44khz_234kbit.mp3 /test                             11,449,871       1.000  perfect
 pisse_44khz_112kbit.mp3 /test                             10,651,344       1.075  good
 pisse_22khz_112kbit.mp3 /test                             6,028,984        1.899  good
 pisse_44khz_112kbit.mp3 /test /fast /mono /half           3,440,489        3.328  good
 pisse_11khz_56kbit.mp3  /test                             3,034,350        3.773  low
 pisse_22khz_112kbit.mp3 /test /fast /mono                 2,761,704        4.146  good
 pisse_22khz_112kbit.mp3 /test /fast /mono /half           2,099,826        5.453  low
 pisse_11khz_56kbit.mp3  /test /fast /mono                 1,384,470        8.270  low
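
The speed column in the resampling table is simply the baseline clks/s value divided by the clks/s value of each row. A small Python sketch of that calculation (the function name is just for illustration):

```python
BASELINE_CLKS = 11_449_871   # pisse_44khz_234kbit.mp3 /test

def speedup(clks, baseline=BASELINE_CLKS):
    """How many times faster a variant decodes than the baseline."""
    return baseline / clks

# A few rows from the table above.
rows = {
    "pisse_22khz_112kbit.mp3 /test": 6_028_984,
    "pisse_11khz_56kbit.mp3  /test": 3_034_350,
    "pisse_11khz_56kbit.mp3  /test /fast /mono": 1_384_470,
}
for name, clks in rows.items():
    print(f"{name:44s} {speedup(clks):.3f}")
```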


other decoder: fixed-point keyj minimp3, 2007 Martin J. Fiedler
This is a nice small fixed-point decoder (and is itself based on FFmpeg). It claims to be rather slow, although the code doesn't look bad, apart from a few suboptimal things:


other decoder: floating-point lieff minimp3, 2018 lieff
This is another small mp3 decoder, which is confusingly also called minimp3. It's using floating-point, but claims to be very fast and accurate.


other asm decoders
mp3 asm decoders are somewhat hard to find, if they do exist at all. Some source code libraries seem to contain asm code, but that's buried in a huge library, and it's often only supporting exotic things like simd/sse/neon, and only using small asm snippets for the imdct/dct core functions, with other functions coded in high level language.
There's reportedly something called "100% assembly coded mpeg-2/mp3 decoder", but there's little info if it's been released as source code or binary, and for which processor and which operating system.


compression ratios
Historically, mp3 did reach compression ratios of about 1:10 (compared to uncompressed CD audio). But, whilst small is good for compression, many users and distributors think that bigger is better. These days, the typical mp3 ratio is about 1:5, which is barely better (or even worse) than ADPCM. Additionally, mp3 can be vandalized with ID3 headers (which do sometimes contain up to 2 Mbytes of lossless hi-res bitmaps in .png format).
  WAV   44,100Hz, stereo, 16bit      1,411,200 bit/s (1:1)
  FLAC  44,100Hz, stereo, 16bit       ~705,600 bit/s (1:2)
  ADPCM 44,100Hz, stereo, 4bit         352,800 bit/s (1:4)
  ADPCM 44,100Hz, mono, 4bit           176,400 bit/s (1:8)
  ADPCM 22,050Hz, mono, 4bit            88,200 bit/s (1:16)
  MP3   variable rate, high quality   ~240,000 bit/s (1:5.88)
  MP3   320kbit/s (highest)            320,000 bit/s (1:4.41)
  MP3   256kbit/s (common these days)  256,000 bit/s (1:5.51)
  MP3   192kbit/s (common earlier)     192,000 bit/s (1:7.35)
  MP3   128kbit/s (medium)             128,000 bit/s (1:11.02)
  MP3   32kbit/s  (lowest for mpeg1)    32,000 bit/s (1:44.10)
  MP3   8kbit/s   (lowest for mpeg2)     8,000 bit/s (1:176.40)
  OGG   compresses better than mp3
  AAC   compresses yet better than mp3
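
The ratios in the above table are relative to uncompressed CD audio (sample rate x channels x bits per sample). A minimal Python sketch of the arithmetic, with made-up function names:

```python
def pcm_bitrate(sample_rate, channels, bits_per_sample):
    """Bitrate of an uncompressed (or fixed-width ADPCM) stream in bit/s."""
    return sample_rate * channels * bits_per_sample

CD_BITRATE = pcm_bitrate(44_100, 2, 16)   # 1,411,200 bit/s

def ratio(bitrate):
    """Compression ratio 1:x relative to CD audio."""
    return CD_BITRATE / bitrate

print(f"MP3 320kbit/s     1:{ratio(320_000):.2f}")
print(f"MP3 256kbit/s     1:{ratio(256_000):.2f}")
print(f"ADPCM stereo 4bit 1:{ratio(pcm_bitrate(44_100, 2, 4)):.2f}")
```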