Nº 012 · Jun 24 · 6 min read · #ai

We trained the model in-house

A recommender doesn't need a foundation model. The recommender we shipped for a hospitality client has a few thousand parameters and retrains every Monday on a single CPU. Most 'AI feature' briefs we see are a ranking problem in a costume.

by Thanos K.

++++

A founder pitches an "AI-powered" feature. We listen, then push back. What they're describing, sorting partner perks so each visitor sees the ones they're most likely to redeem, is a recommender. A recommender is a model. The model doesn't need to be a foundation model. It needs to be small, retrained weekly, and good enough that the operator stops checking it. We shipped one of these last year for a hospitality destination, and the whole thing fits on a developer's laptop.

The recommender behind that destination CRM is a few thousand parameters. Every Monday it eats fresh partner reviews and the previous week's redemption logs, and outputs a ranked list of perks per visitor persona. The training script is eighty lines, runs in twenty minutes on a single CPU, and writes a 240KB file. The deployment is that file plus a tiny service wrapper. No GPU cluster, no API key in a third country, no surprise outage at 3am on a Tuesday because a model provider deprecated an endpoint. This is what we mean by custom systems: the boring infrastructure stays boring on purpose.
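To make "eighty lines on a single CPU" concrete, here is a minimal sketch of what a weekly training script of this kind could look like. It is not the client's script. It assumes the ranker is a plain logistic regression over a few numeric features per perk, trained on redeemed / not-redeemed rows; the function names, the feature layout, and the JSON output format are all illustrative assumptions.

```python
import json
import math
import random


def sigmoid(z: float) -> float:
    # Clamp the input so math.exp cannot overflow on extreme scores.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, n_features, epochs=200, lr=0.1, seed=0):
    """Fit a logistic regression with stochastic gradient descent.

    rows: list of (features, redeemed) pairs, where features is a list of
    floats and redeemed is 0 or 1. The layout is an assumption of this sketch.
    """
    rng = random.Random(seed)  # fixed seed keeps the weekly run reproducible
    weights = [0.0] * n_features
    bias = 0.0
    rows = list(rows)
    for _ in range(epochs):
        rng.shuffle(rows)
        for x, y in rows:
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to the score
            bias -= lr * err
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return {"weights": weights, "bias": bias}


def score(model, x):
    """Predicted redemption probability for one feature vector."""
    return sigmoid(model["bias"] + sum(w * xi for w, xi in zip(model["weights"], x)))


def rank_perks(model, perks):
    """perks: dict of perk name -> feature vector. Returns names, best first."""
    return sorted(perks, key=lambda name: score(model, perks[name]), reverse=True)


def save(model, path):
    # The whole "deployment artefact" in this sketch is one small JSON file.
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(model, fh)
```

In this sketch, a Monday job would call `train` on the latest window of logs, `save` the result, and let the service wrapper load the file and call `rank_perks` per visitor persona.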

The numbers above are the conversation we had with the founder before we shipped. The "obvious" approach (pipe each visitor's context to an LLM, paste the reviews, ask for a ranked list) would have been around two thousand times more expensive per call, eighty times slower, and not reproducible offline. Our model isn't smarter than an LLM. It's matched to the problem. A recommender is a ranking problem, not a language problem, and the AI bill is mostly an argument about which tool you accidentally bought.

LLMs do live in this pipeline — they belong there. A small LLM, batched offline, summarises each new free-text review into a handful of numeric features the ranker can eat. That is a language problem and it is worth the language tool. The summarisation job runs once a week, takes a couple of minutes, and costs under thirty euros a month all-in. The ranker that consumes those features is local and free to call. The split is unromantic: language tools where there is language, ranking tools where there is ranking. The mistake — and we see it weekly — is letting one of them do both.
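The split can be sketched as a small offline batch step that sits in front of the ranker. In the sketch below the LLM is deliberately abstracted away as a `summarise` callable, so no provider API is assumed. The feature names, the cache shape, and the keyword stub `fake_summariser` are hypothetical and exist only so the example runs without a model or a network.

```python
from typing import Callable, Dict, List

# Hypothetical feature names; the real feature set is not given in this post.
FEATURE_NAMES = ("sentiment", "value_for_money", "family_friendly")

Summariser = Callable[[str], Dict[str, float]]


def batch_extract(reviews: Dict[str, str], summarise: Summariser,
                  cache: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Turn new free-text reviews into numeric feature vectors, once each.

    reviews: review id -> text. cache: review id -> vector from earlier runs.
    `summarise` stands in for the offline LLM call and is only invoked for
    reviews that are not already in the cache.
    """
    for review_id, text in reviews.items():
        if review_id in cache:
            continue  # already summarised in a previous weekly run
        raw = summarise(text)
        # Missing features default to 0.0; values are clamped to [0, 1].
        cache[review_id] = [min(1.0, max(0.0, float(raw.get(name, 0.0))))
                            for name in FEATURE_NAMES]
    return cache


def fake_summariser(text: str) -> Dict[str, float]:
    """Keyword stub so the sketch runs without any model or network access."""
    lowered = text.lower()
    return {
        "sentiment": 1.0 if "great" in lowered else 0.0,
        "value_for_money": 1.0 if "cheap" in lowered else 0.0,
        "family_friendly": 1.0 if "kids" in lowered else 0.0,
    }
```

The point of the design is that the language tool only ever sees a review once, in a batch, and the ranker only ever sees numbers. Swapping `fake_summariser` for a real model call would not change anything downstream.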

The curve above is what actually matters for AI ROI. Week one of operation, the recommender hit 61% top-three accuracy. Week twenty, 84%. The model's architecture hasn't changed since week one; the eighty-line script is the same script. The improvement isn't us tuning anything. It's that every Monday at 03:00 the same script trains again on a bigger window of real redemption data. The other model we ship, the on-device rapid-test vision reader for the University of Patras, works the same way: a smaller validation dataset in week one, a better one by month three, identical inference code. Models compound on their data, not on their parameters.
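For readers who want to track the same number, here is one way top-three accuracy could be computed. The definition is an assumption on our part for this sketch: the share of visits where the perk that was actually redeemed appeared among the first three ranked perks. The function and argument names are illustrative.

```python
def top_k_accuracy(ranked_lists, redeemed, k=3):
    """Share of visits where the redeemed perk was in the first k ranked perks.

    ranked_lists: one list of perk names per visit, best first.
    redeemed: the perk actually redeemed on each visit, in the same order.
    """
    if len(ranked_lists) != len(redeemed):
        raise ValueError("ranked_lists and redeemed must be the same length")
    if not redeemed:
        raise ValueError("need at least one visit to score")
    hits = sum(1 for ranking, perk in zip(ranked_lists, redeemed)
               if perk in ranking[:k])
    return hits / len(redeemed)
```

Computed on each week's held-out redemptions and logged next to the retraining run, a metric like this is enough to draw the learning curve without touching the model.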

What we won't sell you is a foundation-model API call wrapped in a buzzword and a margin. If the underlying job is "rank these by a number we can compute," your AI feature is a logistic regression and a learning curve, not an LLM bill. If it's "extract or classify from prose," it's a small model batched offline. If it's "generate something a human will read in real time," the LLM earns its place in the hot path. Three different jobs, three different bills, three different SLOs. We tell you which one your brief is, before we quote it. This is the same posture we wrote about for security work: spend hours on the real thing, not weeks on noise.

What changed since we started saying this out loud: clients arrive with the question pre-narrowed. "We need to rank our partner perks — is this a ranking problem or a language problem?" That brief gets quoted in an afternoon. The ones who arrive with "we need AI" leave with a longer conversation and, more often than not, a smaller bill than they expected. The most expensive AI bill is the one paid for the wrong tool. The cheapest model that ships is the one that fits the question.