The Hidden Complexity Behind Credit-Based Pricing
I recently shipped a new feature that makes my product credit-based, tied directly to token usage.
What looked like a pricing update quickly turned into a product design problem: decimals, token logs, negative balances, and all the hidden billing logic behind AI usage.
Here are the 5 biggest lessons I learned.
1. Store Credits as Integers, Not Money-Like Decimals
One of the first decisions I had to make was how to store usage credits.
At first, it felt natural to store credits like this: 1.25 credits
But once billing logic gets involved, decimals quickly become annoying.
They introduce rounding problems. They also leave you with awkward edge cases, like a user having 0.0000001 credits remaining because of a tiny floating-point error somewhere in the system.
So instead, I used integer scaling. Rather than storing 1 credit, I store 100 credit units.
That means:
1 credit = 100 units
0.5 credit = 50 units
0.25 credit = 25 units
This is the same general idea behind how serious billing systems avoid decimal chaos.
Stripe, for example, expects payment amounts in the smallest currency unit. Instead of sending $10.99, you send 1099 cents.
OpenAI’s usage model is also based on whole-number token counts, such as prompt_tokens, completion_tokens, and total_tokens.
For a credit-based system, that means scaling credits into integer units early. It makes deduction logic, refunds, transaction history, and debugging much easier later.
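A minimal sketch of the integer-scaling idea, assuming the 1 credit = 100 units ratio above. The helper names credits_to_units and units_to_display are hypothetical, chosen only for illustration:

```python
from decimal import Decimal, ROUND_HALF_UP

UNITS_PER_CREDIT = 100  # 1 credit = 100 units, as described above


def credits_to_units(credits: str) -> int:
    """Convert a human-readable credit amount like "1.25" into integer units."""
    scaled = Decimal(credits) * UNITS_PER_CREDIT
    return int(scaled.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def units_to_display(units: int) -> str:
    """Format integer units back into a credit string for the UI."""
    sign = "-" if units < 0 else ""
    whole, frac = divmod(abs(units), UNITS_PER_CREDIT)
    return f"{sign}{whole}.{frac:02d}"


# Repeated float subtraction leaves a tiny residue instead of zero.
float_balance = 0.3 - 0.1 - 0.1 - 0.1
# The same deductions in integer units land exactly on zero.
unit_balance = 30 - 10 - 10 - 10

print(float_balance == 0.0)  # False
print(unit_balance == 0)     # True
```

Amounts are parsed from strings through Decimal so that the conversion itself never passes through a float. Only the display layer turns units back into a decimal-looking number.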
The lesson:
If you are building credit-based billing, store credits as scaled integers from the beginning.
2. Usage-Based Pricing Forces You to Understand Your Real Costs
Before this migration, “one report” was the product unit. That is simple for users: you pay for one report, you get one report. But for the business, this pricing model is quite rigid.
One requirement kept surfacing during in-person user testing: users wanted to edit the prompt. And honestly, that makes a lot of sense. No one writes everything perfectly in one go.
This was the original reason I started migrating the application from report-based pricing to credit-based pricing using tokens.
Once you move to token-based usage, you need to carefully calculate the cost of each model and each task.
Here is an example. Let’s say this is the task:
The user uploads a long document.
The AI reads the document.
Then it generates a structured summary, key points, and action items.
Here is how much the same task would cost across several popular models:
The lesson:
Usage-based pricing turns cost from an invisible backend detail into a product design constraint.
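The per-task cost calculation can be sketched roughly as follows. Everything numeric here is an assumption: the model names, the per-million-token prices, and the token counts are placeholders for illustration, not real prices or measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """USD per 1 million tokens. The values used below are placeholders."""
    input_per_million: float
    output_per_million: float


def estimate_cost_usd(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call: input tokens at the input rate plus output tokens at the output rate."""
    total = (
        input_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    )
    return total / 1_000_000


# Hypothetical models and prices, purely for illustration.
PRICES = {
    "model-small": ModelPrice(input_per_million=0.50, output_per_million=2.00),
    "model-large": ModelPrice(input_per_million=5.00, output_per_million=20.00),
}

# Hypothetical task size: a long document in, a structured summary out.
for name, price in PRICES.items():
    cost = estimate_cost_usd(price, input_tokens=20_000, output_tokens=1_500)
    print(f"{name}: ${cost:.4f}")
```

Swapping in the provider's published rates and your own measured token counts turns this into a quick way to compare what the same task costs on each model before deciding how many credits to charge for it.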
3. Logging Token Usage
Logging sounds boring until the product logic starts getting expensive.
At the beginning, it is easy to treat AI usage as one vague cost bucket. The user clicks a button, the model returns something, and somewhere in the background tokens are being burned. That is fine for a prototype, but it is not enough once the product starts charging users based on usage.
As my product logic became more complex, I realised different actions had very different cost profiles. Editing a prompt, regenerating one card, generating a full report, creating a roadmap, or calculating a score are not the same thing from a token perspective.
Without logging, AI cost becomes a black box.
With logging, every operation becomes traceable.
I can see how many AI calls happened, how many input_tokens were used, how many cached_tokens were reused, how many output_tokens were generated, and roughly how much each operation cost. This matters because credit-based billing is not just a pricing decision. It is also an accounting problem.
Before testing the feature properly, I asked Claude Code to help me add a simple JSON usage log inside the project:
Help me add a token usage logging system to my project.
I want every major AI operation to save a usage log as a JSON file inside my project.
Each record should include:
{
"timestamp": "",
"operation": "",
"calls": 0,
"inputTokens": 0,
"cachedTokens": 0,
"outputTokens": 0,
"costUsd": 0
}
This small logging layer made the whole system much easier to reason about. Later, I can calculate real usage, real cost, and real margin instead of guessing from vibes.
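A minimal sketch of what such a logging layer might look like, using the record fields listed above. It assumes one JSON record per line in an append-only file (a JSON Lines layout, which is an assumption and not necessarily what the agent generated). The function names log_usage and total_cost_by_operation are hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_usage(log_path: Path, operation: str, calls: int, input_tokens: int,
              cached_tokens: int, output_tokens: int, cost_usd: float) -> dict:
    """Append one usage record to the log file and return it."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "operation": operation,
        "calls": calls,
        "inputTokens": input_tokens,
        "cachedTokens": cached_tokens,
        "outputTokens": output_tokens,
        "costUsd": cost_usd,
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def total_cost_by_operation(log_path: Path) -> dict:
    """Sum costUsd per operation across every record in the log."""
    totals: dict = {}
    with log_path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                rec = json.loads(line)
                totals[rec["operation"]] = totals.get(rec["operation"], 0.0) + rec["costUsd"]
    return totals
```

Appending one line per operation keeps writes cheap, and a summary like total_cost_by_operation is enough to see which actions are expensive.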
The lesson:
If users pay based on AI usage, your logs are part of your billing infrastructure.
4. What If a User Only Has 0.1 Credit Left?
One product decision I had to make was what happens when a user only has a tiny amount of credit left, but starts a generation that costs more than their balance.
The strict engineering answer is simple: block the request.
But from a UX perspective, that feels terrible. The user has already decided to try the product. Stopping them at 0.1 credit creates friction at exactly the wrong moment.
So I chose a soft prepaid-credit model. If the user is slightly short, I still let the run finish. The balance can temporarily go negative, and the negative amount is deducted from the next top-up.
For example, if a user has 0.1 credit and the run costs 2.4 credits, their balance becomes -2.3. If they later buy 10 credits, the outstanding 2.3 credits are settled first, leaving them with 7.7 usable credits.
This is similar to how some usage-based billing systems handle delayed cutoff or customer balance adjustments. The key is to cap the possible negative balance so the product protects UX without allowing unlimited free usage.
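The soft negative balance with a cap can be sketched like this, reusing the 0.1 credit / 2.4 credit / 10 credit example in integer units (1 credit = 100 units). The cap value, the Wallet class, and the assumption that the run cost is known or estimated before the run starts are all hypothetical:

```python
from dataclasses import dataclass

UNITS_PER_CREDIT = 100     # 1 credit = 100 units
MAX_NEGATIVE_UNITS = 500   # hypothetical cap: at most 5 credits of debt


class RunBlocked(Exception):
    """Raised when a run must not start."""


@dataclass
class Wallet:
    balance_units: int = 0

    def charge(self, cost_units: int) -> int:
        """Deduct a run. The balance may go negative, but never past the cap."""
        if self.balance_units <= 0:
            raise RunBlocked("no credit left; top up first")
        new_balance = self.balance_units - cost_units
        if new_balance < -MAX_NEGATIVE_UNITS:
            raise RunBlocked("run would exceed the negative-balance cap")
        self.balance_units = new_balance
        return self.balance_units

    def top_up(self, units: int) -> int:
        """Add purchased units; any outstanding debt is settled automatically."""
        self.balance_units += units
        return self.balance_units


wallet = Wallet(balance_units=10)   # 0.1 credit
wallet.charge(240)                  # a 2.4-credit run: balance is now -230 units
wallet.top_up(1000)                 # buy 10 credits: balance is now 770 units
```

Because the balance is a single signed integer, settling the debt needs no special logic: the top-up is simply added, and the 2.3 credits owed are absorbed first.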
The lesson:
Billing logic should protect both the business and the user experience. A hard block is clean for the system, but sometimes terrible for the user. A soft negative balance lets the user finish the value moment, while a strict cap protects the product from abuse.
5. Let Codex and Claude Code Criticise the Product Logic Twice
One implementation habit I’m trying to build is not asking AI agents to simply “write the code,” but asking them to attack the product logic before the code becomes permanent.
In the first round, before writing a single line of code, I asked Claude Code to write a well-documented spec for this specific task in markdown format. After Claude Code generated credit.md, I asked Codex to read the Markdown file under the spec folder and do a critical review. Its job was to leave comments on the logic.
Read the credit.md file under the spec folder.
Do not write code yet.
Tell me whether this logic makes sense.
Point out anything risky, missing, unclear, or overcomplicated and write comments.
Codex then critically reviewed the spec generated by Claude Code and wrote down its comments.
This became the second round of review.
I asked Claude Code to read Codex’s feedback and give me another critical review. I wanted Claude to explain whether Codex’s concerns were valid, whether the spec needed to change, and whether anything still felt over-engineered before implementation.
Then I copied Claude’s review back into Codex. Most of the time, after this back-and-forth, Codex would say the logic was solid enough to move forward. Only then would I ask the agent to start generating the actual code.
The lesson:
For billing features, I do not want one AI agent to vibe-code straight into production. I want two agents to argue over the spec first. If the logic survives that argument, then it earns the right to become code.
I’ve been building IdeaGrit for the last few weeks. It’s a tool for pressure-testing startup ideas. It helps you find the right hard thing to commit to.
Sign up and you’ll automatically get 5 free credits. That’s more than enough to edit your prompt, refine your cards, and explore the product properly. You’ll get a complete report, an actionable roadmap, and a pre-mortem showing six real failed products with similar ideas, so you can learn what went wrong.
When you’re ready to go deeper, you can move to paid and unlock the full score.
If you found IdeaGrit helpful, I’d really appreciate an upvote.


