From Research to Production: The Benchmark Said 15 FPS. My Environment Said 10. Dependency Versions Said Nothing. 😑

In my previous post, I wrote about bridging the gap between research code and production systems.

This one is about something I did not pay enough attention to — until I had to. 😅

Library versions. Yes, really. Stay with me. 🙏

I once received a codebase from a PoC engineer who had genuinely done great work.

The model was accurate, the approach was validated, and they had benchmarked inference at 15 FPS on their GPU server.

For our use case — a real-time system where both speed and accuracy were hard requirements — that number was exactly what we needed.

So I set everything up. Same input file, same configuration, same model weights. Did everything right. Felt very professional about it.

Ran it.

"Excuse me, what?" My disappointment was immeasurable and my day was ruined. 🚑 🚑 🚑 🚑

10 FPS. 😱😱😱

Basic confirmation
Before panicking, I did the responsible thing and checked everything I could think of.

So I did what any reasonable engineer does — I assumed I had done something wrong and started checking everything again. 🤔 Twice. Maybe three times. I am not counting.

Same input.
Same config.
Same weights.

Still 10 FPS. 😱

The environment was committed to its choices. Not mine. 🥹

Profiling the main code blocks
When the obvious checks give you nothing, you go deeper. So I did.

So I started profiling. Pre-processing, inference, post-processing — one by one.
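
A minimal sketch of that kind of per-stage timing, using only the standard library. The three stage functions are hypothetical stand-ins, not the real pipeline. If inference runs on a GPU through PyTorch, CUDA calls are asynchronous, so you would also want torch.cuda.synchronize() before reading the clock; that part is left out here to keep the sketch dependency-free.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named pipeline stage."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def mean_ms(self):
        """Average milliseconds per call for every stage seen so far."""
        return {
            name: 1000.0 * self.totals[name] / self.counts[name]
            for name in self.totals
        }


# Hypothetical stand-ins for the real pipeline stages.
def preprocess(frame):
    return [value / 255.0 for value in frame]


def infer(tensor):
    return sum(tensor)


def postprocess(output):
    return round(output, 3)


timer = StageTimer()
frames = [[0, 128, 255]] * 20  # dummy input, 20 "frames"

for frame in frames:
    with timer.stage("preprocess"):
        tensor = preprocess(frame)
    with timer.stage("inference"):
        output = infer(tensor)
    with timer.stage("postprocess"):
        result = postprocess(output)

report = timer.mean_ms()
for name, ms in report.items():
    print(f"{name:12s} {ms:.4f} ms/frame")
```

Run the same script in both environments and compare stage by stage. The stage whose number moves is where to dig.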

The inference timing was clearly off.

And after a while — longer than I would like to admit, and we will leave it at that — I found it.

Dependency Inconsistency 😱
Spoiler: the code was fine. The libraries had simply decided to live their own lives.

The library versions were different. Not one library. Several:

numpy
torch
torchvision
opencv-python

And the tricky part was that there were three different version states all quietly existing at the same time:

Their actual server environment used for benchmarking
Their requirements.txt in the repository
The environment I had built from that file — and before you ask, yes, I built it from the requirements.txt. Turns out that is not always enough. 🥹

None of them fully matched each other. Living their own lives. Unbothered.

Honestly, who would think that library versions could eat 33% of your speed?

Not me. Not that day.

Same code. Same model. Same hardware. Just a few version numbers sitting quietly in a text file, minding their own business, and taking a third of my performance with them. 😑
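
For a real-time system it helps to see that drop as a per-frame time budget as well. A quick back-of-the-envelope sketch, using only the two FPS numbers from this story (the helper name is just for illustration):

```python
def frame_time_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


benchmark_fps = 15.0  # what the PoC benchmark reported
measured_fps = 10.0   # what my environment delivered

throughput_drop = (benchmark_fps - measured_fps) / benchmark_fps
extra_ms = frame_time_ms(measured_fps) - frame_time_ms(benchmark_fps)

print(f"throughput drop: {throughput_drop:.0%}")                # 33%
print(f"benchmark frame time: {frame_time_ms(benchmark_fps):.1f} ms")  # 66.7 ms
print(f"measured frame time:  {frame_time_ms(measured_fps):.1f} ms")   # 100.0 ms
print(f"extra time per frame: {extra_ms:.1f} ms")               # 33.3 ms
```

A third less throughput means every frame takes half as long again.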

Nobody made a mistake. This is just what happens.

How Does This Even Happen?
Nobody changed anything. And yet, everything was slightly different.

When versions are not strictly pinned, pip typically installs the newest release that satisfies the constraints at install time, which is not necessarily what was originally running.

For example, instead of:

torch==2.0.0
numpy==1.24.0

a requirements.txt with loose ranges like:

torch>=1.0
numpy>=1.0

...gives pip a lot of room to decide. And different machines, different times, different existing packages — pip may land on different versions each time.
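
One cheap guard is to flag every line that is not an exact == pin before anyone benchmarks anything. A minimal sketch, assuming a simple requirements.txt with one name-and-specifier per line. The function name and regex are mine, and it deliberately ignores harder cases such as URL installs.

```python
import re

# Matches "name==version" or "name[extras]==version"; wildcards like 2.0.* do not count.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]*\])?\s*==\s*[^*\s]+$")


def find_unpinned(requirements_text):
    """Return the requirement specs that are not pinned to one exact version."""
    unpinned = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments
        if not line or line.startswith("-"):      # skip blanks and options such as -r
            continue
        spec = line.split(";", 1)[0].strip()      # drop environment markers
        if not PINNED.match(spec):
            unpinned.append(spec)
    return unpinned


loose = "torch>=1.0\nnumpy>=1.0\n"
pinned = "torch==2.0.0\nnumpy==1.24.0\n"

print(find_unpinned(loose))   # ['torch>=1.0', 'numpy>=1.0']
print(find_unpinned(pinned))  # []
```

An empty list does not prove the file matches the server. It only means pip has no room left to improvise.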

It gets more interesting from there. Several things can push your environment in a different direction, some of them even when the Python package versions match:

Loose version ranges: pip picks what fits, not what you had
Missing lock files: no single source of truth for exact versions
CUDA wheel differences: torch==2.0.0 built for CUDA 11.7 and torch==2.0.0 built for CUDA 11.8 are different wheels, so pinning the version alone is not enough
Driver versions: same package, different runtime behavior
OS differences: compiled packages like opencv-python can behave differently across Ubuntu versions
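
Since several of these live outside requirements.txt, it is worth capturing them too. A rough sketch, under a few assumptions: the package list is the four libraries from this post, the function names and report layout are made up for illustration, and the driver lookup only works where nvidia-smi is on the PATH. Anything that cannot be detected comes back as None instead of failing.

```python
import importlib.metadata as md
import json
import platform
import shutil
import subprocess

PACKAGES = ["numpy", "torch", "torchvision", "opencv-python"]


def package_version(name):
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return md.version(name)
    except md.PackageNotFoundError:
        return None


def torch_build_info():
    """CUDA/cuDNN versions the installed torch wheel was built against, if torch exists."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "version": torch.__version__,
        "cuda": torch.version.cuda,
        "cudnn": torch.backends.cudnn.version(),
        "cuda_available": torch.cuda.is_available(),
    }


def driver_version():
    """NVIDIA driver version reported by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=False, timeout=10,
        )
    except (subprocess.SubprocessError, OSError):
        return None
    return result.stdout.strip() or None


def environment_report():
    return {
        "python": platform.python_version(),
        "os": platform.platform(),
        "packages": {name: package_version(name) for name in PACKAGES},
        "torch_build": torch_build_info(),
        "nvidia_driver": driver_version(),
    }


print(json.dumps(environment_report(), indent=2))
```

Two reports that match on package versions but differ on the cuda or nvidia_driver fields are exactly the case a plain version pin cannot show you.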

The reason it is easy to miss is that nothing breaks loudly.

The model runs. The results look correct. The gap shows up in performance — and in a real-time system, performance is the product. Same code, different versions. Rude, honestly. 😂

When I finally asked them to share what was actually installed on their server — not what the requirements.txt said, but the real live environment — everything became clear.

The file had captured their intention, not their reality.

What to check when the numbers don't match
If you find yourself in the same situation, here is where to start before going down a long debugging rabbit hole. 🐇

Check these three places first:

The original environment — ask the engineer to run pip freeze on their actual server.
The requirements.txt — compare it against the actual environment. They should be the same. Sometimes they are not, and that is okay, but you need to know.
Your own environment — run pip freeze yourself and compare. All three should tell the same story.

If any of them disagree, that is your starting point.
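
Comparing three pip freeze outputs by eye gets old quickly. A small sketch that does it for you, assuming each source is plain name==version text. The version numbers below are invented for the example and are not the ones from my incident. Lines without == (loose ranges, URL installs) are skipped, so a loosely specified requirements.txt shows up as missing entries.

```python
def parse_freeze(text):
    """Parse `pip freeze` style text into {normalized_name: version}."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip comments, blanks, and anything that is not an exact pin
        name, version = line.split("==", 1)
        versions[name.strip().lower().replace("_", "-")] = version.strip()
    return versions


def diff_environments(envs):
    """envs maps a label to freeze text; returns only the packages that disagree."""
    parsed = {label: parse_freeze(text) for label, text in envs.items()}
    names = sorted(set().union(*parsed.values()))
    mismatches = {}
    for name in names:
        seen = {label: packages.get(name) for label, packages in parsed.items()}
        if len(set(seen.values())) > 1:
            mismatches[name] = seen
    return mismatches


# Illustrative data only.
envs = {
    "their_server": "numpy==1.23.5\ntorch==2.0.0\nopencv-python==4.7.0.72\n",
    "requirements": "numpy==1.24.0\ntorch==2.0.0\nopencv-python==4.7.0.72\n",
    "my_env": "numpy==1.24.0\ntorch==2.0.1\nopencv-python==4.7.0.72\n",
}

mismatches = diff_environments(envs)
for name, seen in mismatches.items():
    print(name, seen)
```

Packages that agree everywhere are left out of the result, so the output is the list of suspects and nothing else.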

One command that makes all of this much easier — run it on the same day you run your benchmark, on the same server:

pip freeze > requirements.txt

That number you are reporting — 15 FPS, 200ms latency, whatever it is — deserves to have its environment captured alongside it.
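
If you want the number and its environment to travel together in one file, something like the sketch below could work. It approximates pip freeze with the standard library's importlib.metadata instead of calling pip, and the file name, function names, and record layout are all hypothetical.

```python
import importlib.metadata as md
import json
import time
from pathlib import Path


def installed_packages():
    """Sorted 'name==version' strings for every installed distribution."""
    pins = set()
    for dist in md.distributions():
        name = dist.metadata.get("Name")
        if name:
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


def save_benchmark_record(path, metric_name, value):
    """Write the benchmark result and the environment it ran in to one JSON file."""
    record = {
        "recorded_at": time.strftime("%Y-%m-%dT%H:%M:%S%z"),
        "metric": metric_name,
        "value": value,
        "packages": installed_packages(),
    }
    Path(path).write_text(json.dumps(record, indent=2), encoding="utf-8")
    return record


record = save_benchmark_record("benchmark_record.json", "fps", 15.0)
print(f"saved {len(record['packages'])} package pins next to {record['value']} {record['metric']}")
```

Commit that file with the benchmark results, and the next person inherits the environment along with the claim.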

Future you, and whoever receives your code next, will be very grateful. 🙏

Been there. Survived it.

Now I run pip freeze like it is part of my morning routine. Coffee, pip freeze, then start the day. ☕

Is that paranoid? Perhaps. But 15 FPS is 15 FPS. 😄
