Building a Deterministic Media Normalization Pipeline: What the POC Taught Me
Media normalization sounds simple: take arbitrary files, inspect them, and convert them into a clean, Jellyfin-friendly MP4 container. But the moment you start working with real files, you discover that the "simple" part is the illusion. The complexity lives in the details: the container rules, the stream metadata, the codec compatibility, and the failure modes that only show up when you run real media through a real pipeline.
This post documents the early proof-of-concept (POC) for a normalization engine I'm building. The goal wasn't to produce a finished transcoder. The goal was to expose the hidden constraints so the real design can be grounded in reality rather than assumptions.
The POC did exactly that.
1. The Problem: Arbitrary Media In, Deterministic Media Out
A normalization pipeline needs to take whatever the user throws at it (MKV, MP4, WebDLs, scene releases, remuxes, transcodes) and produce a consistent output. That means:
- detecting codecs
- detecting container compatibility
- selecting the correct streams
- dropping invalid ones
- transcoding only when necessary
- producing an atomic, crash-safe output
This is not a "run ffmpeg on a folder" problem.
It's an ingestion problem.
2. Atomic File Promotion: The First Constraint
The POC established a non-negotiable rule:
never expose partial files to Jellyfin.
The solution:
- write to a .tmp file
- explicitly specify the muxer (-f mp4)
- promote via os.replace(tmp, final)
This guarantees:
- no partial indexing
- no corrupted files
- no race conditions
- no cleanup required after crashes
This constraint shapes everything else.
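The promotion step can be sketched in Python under those rules; the helper names (build_remux_cmd, remux_atomic) are hypothetical, but the flags and the os.replace promotion are exactly the mechanism described above.

```python
import os
import subprocess

def build_remux_cmd(src: str, tmp: str) -> list[str]:
    # Force the mp4 muxer explicitly: the .tmp extension would otherwise
    # defeat ffmpeg's extension-based muxer guessing.
    return ["ffmpeg", "-y", "-i", src, "-c", "copy", "-f", "mp4", tmp]

def remux_atomic(src: str, dst: str) -> None:
    """Write to a hidden temp file, then promote atomically."""
    tmp = dst + ".tmp"
    try:
        subprocess.run(build_remux_cmd(src, tmp), check=True)
        # os.replace is atomic on the same filesystem: Jellyfin either
        # sees the old state or the complete new file, never a partial one.
        os.replace(tmp, dst)
    finally:
        # Best-effort cleanup if ffmpeg failed; dst was never touched.
        if os.path.exists(tmp):
            os.remove(tmp)
```

Because the final path only ever appears via os.replace, a crash at any earlier point leaves nothing for the library scanner to index.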
3. The First Real Failure: "moov atom not found"
The first real test file, an 11-minute MKV, immediately failed with:
moov atom not found
Error opening input: Invalid data found when processing input
The file wasn't corrupted.
The ffmpeg build wasn't broken.
The command wasn't wrong.
The problem was the assumption behind the command:
ffmpeg -i input.mkv -c copy -f mp4 output.tmp
This tells ffmpeg to copy every stream into MP4.
The MKV contained:
- 1 HEVC 10-bit video stream
- 12 EAC3 audio tracks
- a 4K PNG cover image
- multiple default flags
- metadata from a dozen languages
MP4 cannot contain:
- attached PNG images
- multiple default audio tracks
- certain EAC3 profiles
- MKV-specific metadata
The POC had no streamâselection logic, so ffmpeg hit an incompatible stream and aborted.
This was the turning point.
4. The Real Work: Stream Selection
The POC made it clear that a normalization pipeline must be opinionated. It must decide what belongs in the output, not guess.
That requires:
4.1. Full ffprobe interrogation
ffprobe -v quiet -print_format json -show_streams file.mkv
This provides:
- codec
- language
- disposition flags
- channel layout
- metadata
- attachments
- subtitle formats
This is the ground truth.
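A minimal interrogation wrapper might look like this; only the ffprobe flags come from the post, while the function names (probe_streams, parse_streams) are hypothetical:

```python
import json
import subprocess

def parse_streams(raw_json: str) -> list[dict]:
    """Extract the stream list from ffprobe's JSON output."""
    return json.loads(raw_json).get("streams", [])

def probe_streams(path: str) -> list[dict]:
    """Run ffprobe and return one dict per stream: codec, language,
    disposition flags, channel layout, and the rest of the metadata."""
    result = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_streams", path],
        capture_output=True, text=True, check=True,
    )
    return parse_streams(result.stdout)
```

Splitting the parse from the subprocess call keeps the selection logic testable without media files on hand.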
4.2. Deterministic scoring
A blind "take the first audio track" approach fails immediately.
A deterministic scoring system is required.
Example audio scoring:
- +100 if language ∈ {eng, en, und}
- +50 if disposition.default = 1
- +20 if codec ∈ {aac, ac3, eac3}
- +10 if channels ≤ 6
- -1000 if title contains "commentary"
Example subtitle scoring:
- +200 if forced
- +100 if English
- -20 if SDH
- +10 if MP4-compatible
The POC implemented a minimal version of this to validate the approach.
4.3. Container-aware mapping
Once streams are selected:
ffmpeg -i input.mkv \
-map 0:v:<best_video> \
-map 0:a:<best_audio> \
-map 0:s:<best_sub> \
-c:v copy \
-c:a copy \
-c:s mov_text \
-f mp4 output.tmp
This avoids:
- incompatible streams
- attachments
- commentary
- invalid metadata
The POC confirmed that this approach works reliably.
5. What the POC Actually Delivered
The POC wasn't meant to be a transcoder.
It was meant to answer questions.
It answered them:
- Atomic rename is essential.
- Temporary files must be invisible to Jellyfin.
- FFmpeg builds differ wildly across distros.
- MP4 remuxing is never blind.
- MKV files often contain incompatible streams.
- Stream selection is the core of the problem.
- Real-world media is messy.
These aren't implementation details.
They're architectural constraints.
The POC's job was to reveal them early, and it did.
6. What Comes Next (Without Pretending It Already Exists)
There is no "final transcoder" yet.
There is only:
- a POC
- a set of validated constraints
- a clearer understanding of the problem space
The next step is to design a system that respects those constraints.
But that design work hasn't happened yet, and it shouldn't be written about as if it has.
For now, the POC stands as a map of the terrain, dragons included.