Building a Deterministic Media Normalization Pipeline: What the POC Taught Me
Media normalization sounds simple: take arbitrary files, inspect them, and convert them into a clean, Jellyfin-friendly MP4 container. But the moment you start working with real files, you discover that the "simple" part is the illusion. The complexity lives in the details: the container rules, the stream metadata, the codec compatibility, and the failure modes that only show up when you run real media through a real pipeline.
This post documents the early proof-of-concept (POC) for a normalization engine I'm building. The goal wasn't to produce a finished transcoder. The goal was to expose the hidden constraints so the real design can be grounded in reality rather than assumptions.
The POC did exactly that.
1. The Problem: Arbitrary Media In, Deterministic Media Out
A normalization pipeline needs to take whatever the user throws at it (MKV, MP4, WebDLs, scene releases, remuxes, transcodes) and produce a consistent output. That means:
- detecting codecs
- detecting container compatibility
- selecting the correct streams
- dropping invalid ones
- transcoding only when necessary
- producing an atomic, crash-safe output
This is not a "run ffmpeg on a folder" problem.
It's an ingestion problem.
2. Atomic File Promotion: The First Constraint
The POC established a non-negotiable rule:
never expose partial files to Jellyfin.
The solution:
- write to a .tmp file
- explicitly specify the muxer (-f mp4)
- promote via os.replace(tmp, final)
This guarantees:
- no partial indexing
- no corrupted files
- no race conditions
- no cleanup required after crashes
This constraint shapes everything else.
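The promotion step can be sketched in Python under those rules; the helper names (build_remux_cmd, remux_atomic) are hypothetical, but the flags and the os.replace promotion are exactly the mechanism described above.

```python
import os
import subprocess

def build_remux_cmd(src: str, tmp: str) -> list[str]:
    # Force the mp4 muxer explicitly: the .tmp extension would otherwise
    # defeat ffmpeg's extension-based muxer guessing.
    return ["ffmpeg", "-y", "-i", src, "-c", "copy", "-f", "mp4", tmp]

def remux_atomic(src: str, dst: str) -> None:
    """Write to a hidden temp file, then promote atomically."""
    tmp = dst + ".tmp"
    try:
        subprocess.run(build_remux_cmd(src, tmp), check=True)
        # os.replace is atomic on the same filesystem: Jellyfin either
        # sees the old state or the complete new file, never a partial one.
        os.replace(tmp, dst)
    finally:
        # Best-effort cleanup if ffmpeg failed; dst was never touched.
        if os.path.exists(tmp):
            os.remove(tmp)
```

Because the final path only ever appears via os.replace, a crash at any earlier point leaves nothing for the library scanner to index.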
3. The First Real Failure: "moov atom not found"
The first real test file, an 11-minute MKV, immediately failed with:
moov atom not found
Error opening input: Invalid data found when processing input
The file wasn't corrupted.
The ffmpeg build wasn't broken.
The command wasn't wrong.
The problem was the assumption behind the command:
ffmpeg -i input.mkv -c copy -f mp4 output.tmp
This tells ffmpeg to copy every stream into MP4.
The MKV contained:
- 1 HEVC 10-bit video stream
- 12 EAC3 audio tracks
- a 4K PNG cover image
- multiple default flags
- metadata from a dozen languages
MP4 cannot contain:
- attached PNG images
- multiple default audio tracks
- certain EAC3 profiles
- MKV-specific metadata
The POC had no streamâselection logic, so ffmpeg hit an incompatible stream and aborted.
This was the turning point.
4. The Real Work: Stream Selection
The POC made it clear that a normalization pipeline must be opinionated. It must decide what belongs in the output, not guess.
That requires:
4.1. Full ffprobe interrogation
ffprobe -v quiet -print_format json -show_streams file.mkv
This provides:
- codec
- language
- disposition flags
- channel layout
- metadata
- attachments
- subtitle formats
This is the ground truth.
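A minimal interrogation wrapper might look like this; only the ffprobe flags come from the post, while the function names (probe_streams, parse_streams) are hypothetical:

```python
import json
import subprocess

def parse_streams(raw_json: str) -> list[dict]:
    """Extract the stream list from ffprobe's JSON output."""
    return json.loads(raw_json).get("streams", [])

def probe_streams(path: str) -> list[dict]:
    """Run ffprobe and return one dict per stream: codec, language,
    disposition flags, channel layout, and the rest of the metadata."""
    result = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_streams", path],
        capture_output=True, text=True, check=True,
    )
    return parse_streams(result.stdout)
```

Splitting the parse from the subprocess call keeps the selection logic testable without media files on hand.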
4.2. Deterministic scoring
A blind "take the first audio track" approach fails immediately.
A deterministic scoring system is required.
Example audio scoring:
- +100 if language ∈ {eng, en, und}
- +50 if disposition.default = 1
- +20 if codec ∈ {aac, ac3, eac3}
- +10 if channels ≤ 6
- -1000 if title contains "commentary"
Example subtitle scoring:
- +200 if forced
- +100 if English
- -20 if SDH
- +10 if MP4-compatible
The POC implemented a minimal version of this to validate the approach.
4.3. Container-aware mapping
Once streams are selected:
ffmpeg -i input.mkv \
-map 0:v:<best_video> \
-map 0:a:<best_audio> \
-map 0:s:<best_sub> \
-c:v copy \
-c:a copy \
-c:s mov_text \
-f mp4 output.tmp
This avoids:
- incompatible streams
- attachments
- commentary
- invalid metadata
The POC confirmed that this approach works reliably.
5. What the POC Actually Delivered
The POC wasn't meant to be a transcoder.
It was meant to answer questions.
It answered them:
- Atomic rename is essential.
- Temporary files must be invisible to Jellyfin.
- FFmpeg builds differ wildly across distros.
- MP4 remuxing is never blind.
- MKV files often contain incompatible streams.
- Stream selection is the core of the problem.
- Real-world media is messy.
These aren't implementation details.
They're architectural constraints.
The POC's job was to reveal them early, and it did.
6. What Comes Next (Without Pretending It Already Exists)
There is no "final transcoder" yet.
There is only:
- a POC
- a set of validated constraints
- a clearer understanding of the problem space
The next step is to design a system that respects those constraints.
But that design work hasn't happened yet, and it shouldn't be written about as if it has.
For now, the POC stands as a map of the terrain, dragons included.