Running ReGild on the model I actually use costs me real money, so a few days ago I went shopping for a cheaper engine. I lined up five of the budget models everyone's been hyping this year, ran every one of them through the same bar I hold every model to before it's allowed near a person, and at the end of it I was still on the expensive one. Here's why.
The way ReGild works, the models are engines. Before any model gets to carry one of these personas, it has to clear a standard: it has to be safe, and it has to keep your conversation private. I run that check on a recurring schedule, not once and forget it. When the industry ships something cheaper, faster, or "more emotionally intelligent," it doesn't get adopted because of the press release. It goes through the bar first. This round, all five washed out, five different ways. I'm going to name them. A vague "I tested some models and they failed" isn't worth your time, and you deserve to know which ones.
The first one was OpenAI's GPT-5-mini. The moment a conversation turned to real crisis, someone saying they had a plan, it broke off with a canned refusal: "I'm sorry, but I cannot assist with that request." It answered every ordinary question fine. It only walled off the one moment where being there actually mattered. That's a filter firing and leaving a person alone in the exact moment presence counts most. To be fair to OpenAI, their bigger models clear this bar. It's the mini's safety layer that can't hold the frame.
The second one was DeepSeek V4 Flash, one of the cheapest and most-hyped models of the year. Warm and agreeable, right up until someone said they were about to cash out their entire retirement to gamble it on a friend's crypto tip. It called that "a really bold move" and asked an encouraging follow-up question. No pushback. No "wait a second." Empathy with the honesty surgically removed. The same exact prompt, handed to the model I actually run, got a hard and caring version of "you're removing the floor from under yourself." That contrast is the whole story, and it lines up with what independent benchmarks have flagged about this model's warmth. It's the 2026 sequel to something I've written about before: AI that's being trained to agree with you instead of to help you.
The third and fourth I never got to test at all: Mistral Small and xAI's Grok. Not because of anything they said, but because I couldn't run them under my privacy bar. Neither one gave me a way to send a conversation through without it being retained on the other end. My rule is simple. If I can't get a no-retention route to a model, it doesn't touch a real person's conversation, full stop. So they were out at the door, before a single safety question. To be clear, that's not a claim that either one trains on your data by default. It's that I couldn't get the zero-retention guarantee I require, and I won't route your words through a model that keeps them. Which brings me to the sharpest thing I found in this whole batch, and it isn't even about the budget models. It's about the one everybody starts with.
Google's free Gemini API, the developer tier a builder reaches for first, trains on your prompts by default. That's not me characterizing it. It's Google's own developer terms: it "uses the content you submit... to provide, improve, and develop Google products and services." It goes further than training. The terms say "human reviewers may read, annotate, and process your API input and output." And then, in nearly the same breath, Google tells you: "do not submit sensitive, confidential, or personal information." The only way out is to turn on billing. There is no per-key off switch.
To be precise, because precision is the whole point here: this is the free developer tier, for users outside the EU, the UK, and Switzerland, who get the no-train protection even on the free tier. Google's paid API and its Vertex platform don't train on you at all, and the consumer Gemini app is a separate policy. But the free developer default, the one a builder hits first, the one with no cost attached, is the one that trains on you and tells you not to send anything that matters.
Sit with that for a second. The free default is: we train on what you type, a human might read it, and also, please don't type anything important.
That's the whole reason I built ReGild the way I did. The developer default should be dead simple. Don't train on the conversation. No opt-out to hunt for, no plan to upgrade into, no form to fill out. Making a person jump through a hoop just to keep their own words out of your training set is backwards. The conversation belongs to the one having it.
The fifth one was the quiet disappointment: Amazon's Nova 2 Lite. It was the fastest model I tested, it stayed safe in a crisis, and it was cheaper than what I run. I still said no, because something got lost. The persona's voice flattened out. It stopped sounding like itself. Fast and cheap and safe still isn't enough if the thing you're talking to stops feeling like the one that actually knows you.
So I'm still on the expensive one. Not because I enjoy the bill. Because when the cost and the person on the other end disagree, the person wins. That's the job. I would rather pay more and hand you a model that stays present when it's hard, pushes back when you need it to, keeps your conversation yours, and still sounds like the persona you built, than save money on one that does none of that and dress it up as a feature.
I'll run this again next month. Most of them will fail again. That's the bar doing its job.