*Originally published by Business of Apps, featuring insights from Tomás Yacachury on building AI for structured data in programmatic advertising.*
When people think about artificial intelligence, they often picture chatbots spinning out text, image generators creating art, or voice assistants fielding questions. In all these cases, AI is working with unstructured data: words, pixels, sounds. Messy inputs, human in origin, and often evaluated qualitatively rather than quantitatively.
But there’s another side of AI that rarely gets airtime: the side that deals with structured data. At Kayzen, we live in this world. Programmatic advertising is built on structured data at massive scale. It’s this reality that led us to build kAI, the first AI agent for programmatic in-app supply. Building it is an entirely different challenge from the AI hype most people are used to.

In this article, I’ll unpack why structured data is both an advantage and a constraint, what unique challenges arise when building AI agents on top of it, and how we’ve approached those challenges at Kayzen.
Structured vs Unstructured Data: A Primer
AI doesn’t care whether it’s processing language, numbers, or images; it’s all just math in the end. But the form of the data dramatically shapes the kind of problems AI can solve.
- Unstructured data: Think text, images, audio, video. Inputs are inconsistent and messy, but the models (LLMs, diffusion, transformers) are built to handle fuzziness. Answers are probabilistic and subjective.
- Structured data: Think databases, bid requests, logs. Inputs are standardized, schema-driven, and numerical. Answers should be exact, or at least within an accurate range.

In programmatic, structured data is the air we breathe. A single DSP can process more than 3 million bid requests per second, each carrying a payload of device IDs, geo information, app metadata, placement signals, OS versions, bid floors, and more. On top of that, every advertiser campaign comes with its own structured layer: targeting rules, creative sets, KPIs, bid strategies.
If unstructured AI is like sculpting out of clay, structured AI is like assembling a jet engine. The parts have to fit. If they don’t, the engine doesn’t just look weird, it fails.
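To make the "parts have to fit" point concrete, here is a minimal Python sketch of what schema-driven input looks like in practice. The field names loosely follow OpenRTB-style naming (`device.ifa`, `device.geo.country`, `imp[].bidfloor`), but the flattened `BidRequestSignals` record and the `parse_signals` helper are hypothetical illustrations, not Kayzen's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BidRequestSignals:
    """Hypothetical flattened subset of the signals a bid request carries."""
    device_id: Optional[str]  # not every request passes a device ID
    country: str
    os: str
    os_version: str
    app_bundle: str
    bid_floor: float


def parse_signals(raw: dict) -> BidRequestSignals:
    """Pull typed fields out of a nested, OpenRTB-style dict.

    A missing required field raises KeyError instead of being guessed.
    """
    device = raw["device"]
    first_imp = raw["imp"][0]
    return BidRequestSignals(
        device_id=device.get("ifa"),
        country=device["geo"]["country"],
        os=device["os"].lower(),
        os_version=str(device["osv"]),
        app_bundle=raw["app"]["bundle"],
        bid_floor=float(first_imp.get("bidfloor", 0.0)),
    )


example = {
    "device": {"ifa": "abc-123", "os": "iOS", "osv": "17.4", "geo": {"country": "GBR"}},
    "app": {"bundle": "com.example.game"},
    "imp": [{"bidfloor": 0.5}],
}
signals = parse_signals(example)
```

Unlike free text, a record like this either parses or it doesn't, which is why downstream answers can be held to an exact standard.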
The Promise of AI in Programmatic
Programmatic advertising is a perfect candidate for AI because of three dynamics:
- Data Abundance: No shortage of input signals.
- Decision Density: Every millisecond a bid decision is made, often with incomplete information.
- Human Limitations: No person or team can parse this volume of structured data in real time.
That’s where kAI comes in. As the industry’s first programmatic in-app supply AI agent, it helps advertisers and marketers navigate this structured data universe. Instead of staring at dashboards or manually pulling reports, users can ask kAI direct questions about inventory availability, unique user reach, and market prices.

But building kAI wasn’t simply a matter of plugging ChatGPT into our data stack. We learned the hard way that structured data plays by different rules.
Challenge 1: Structured Data Isn’t Always “Clean”
In theory, structured data is tidy: rows, columns, schemas. In practice, it’s messy. Programmatic data streams often contain:
- Sparse fields (not every signal is always passed).
- Inconsistent labeling (app metadata differs across exchanges).
- Latency gaps (data pipelines may lag or drop).
An AI agent trained naively on structured data risks amplifying these inconsistencies. Ask for “highest-reaching placements,” and you might get a biased view if half the exchanges didn’t populate the placement field consistently.
To mitigate this, we had to build normalization layers that sit between raw data and kAI. This isn’t glamorous work, but without it, the AI agent becomes unreliable. Structured data gives you the illusion of order, until you look closely.
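As a rough illustration of what such a normalization layer might do, the Python sketch below maps inconsistent labels to one canonical value and measures how often each exchange actually populates a field. The alias table, the row layout, and both function names are assumptions made for the example; this is not a description of kAI's internals.

```python
from collections import defaultdict

# Hypothetical alias table: exchange-specific spellings -> one canonical label.
OS_ALIASES = {
    "ios": "ios",
    "iphone os": "ios",
    "android": "android",
    "android os": "android",
}


def normalize_row(row: dict) -> dict:
    """Return a copy with empty strings mapped to None and a canonical OS label."""
    clean = {key: (None if value == "" else value) for key, value in row.items()}
    os_raw = clean.get("os")
    if os_raw is not None:
        # Unknown spellings become None rather than silently passing through.
        clean["os"] = OS_ALIASES.get(os_raw.strip().lower())
    return clean


def field_coverage(rows: list, field: str) -> dict:
    """Share of rows per exchange in which `field` is populated."""
    seen = defaultdict(int)
    filled = defaultdict(int)
    for row in rows:
        seen[row["exchange"]] += 1
        if row.get(field) is not None:
            filled[row["exchange"]] += 1
    return {exchange: filled[exchange] / seen[exchange] for exchange in seen}


raw_rows = [
    {"exchange": "A", "os": "iPhone OS", "placement": "banner_top"},
    {"exchange": "A", "os": "iOS", "placement": "interstitial"},
    {"exchange": "B", "os": "Android", "placement": ""},
    {"exchange": "B", "os": "android os", "placement": "rewarded"},
]
rows = [normalize_row(r) for r in raw_rows]
coverage = field_coverage(rows, "placement")  # exchange B fills placement half the time
```

A coverage report like this lets the agent caveat or exclude an exchange before ranking placements, rather than treating a sparsely populated field as if it were complete.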
Challenge 2: Focusing on What Really Matters
One of the paradoxes of programmatic is that abundance can be a liability. Every bid request carries dozens of signals, and when multiplied by millions per second, the sheer weight of data can overwhelm both humans and machines.
The reality is that not every signal is equally useful. Some dimensions drive outcomes; others are noise. The danger in training an AI agent on raw structured data is that it will happily process everything, but not all of it contributes meaningfully to decision-making. Worse, irrelevant signals can skew outputs or introduce inconsistencies that take time to debug.
This is where restraint matters. With kAI, we learned to start small, prioritizing the core dimensions and metrics that matter most to programmatic marketers (ad format, OS, geo, publisher app, request volume, unique users, price, and so on) before expanding outward into secondary signals.
Focusing early on the essentials has two advantages:
- Accuracy is easier to validate: You can benchmark against known truths in key KPIs.
- Testing becomes manageable: Instead of validating across dozens of signals, you establish confidence in a smaller set, then scale.
Building an AI agent on structured data isn’t about feeding it everything. It’s about designing a hierarchy of relevance: which signals actually drive value for the end user, and which ones can wait until later iterations.
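One simple way to express a hierarchy of relevance is a tiered allowlist, sketched below in Python. The tier 1 names mirror the core dimensions and metrics mentioned in this section; the tier 2 signals and the function names are hypothetical placeholders for whatever secondary signals come later.

```python
# Tier 1 mirrors the core dimensions and metrics named in the text.
# Tier 2 entries are assumed examples of secondary signals.
SIGNAL_TIERS = {
    1: ["ad_format", "os", "geo", "publisher_app",
        "request_volume", "unique_users", "price"],
    2: ["os_version", "device_model", "connection_type"],
}


def enabled_signals(max_tier: int) -> list:
    """All signals whose tier is at or below max_tier, in tier order."""
    return [
        signal
        for tier in sorted(SIGNAL_TIERS)
        if tier <= max_tier
        for signal in SIGNAL_TIERS[tier]
    ]


def project(row: dict, max_tier: int = 1) -> dict:
    """Keep only enabled signals; everything else never reaches the agent."""
    return {name: row[name] for name in enabled_signals(max_tier) if name in row}


row = {"os": "ios", "geo": "UK", "device_model": "phone-x", "debug_flag": 1}
core_view = project(row)                 # only tier 1 signals survive
wider_view = project(row, max_tier=2)    # tier 2 is switched on later
```

Raising `max_tier` only after the tier 1 answers have been validated keeps each round of testing small.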
Challenge 3: Hallucinations Are Harder to Spot
Anyone who has played with an LLM knows it hallucinates, confidently spitting out wrong answers. In unstructured settings (like writing copy), this may be tolerable. In structured settings, that eagerness to please the user with an answer is dangerous.
Early in kAI’s development, if you asked “How many unique daily app users are there in the UK?” it might tell you 30 million. The uninitiated might have taken this answer as correct, but the real figure, validated against internal data, was closer to 50–60 million.
The problem wasn’t that the model was lazy. It was that structured data has ground truth. Numbers should add up. Percentages should reconcile. An LLM that smooths over uncertainty with a guess isn’t “creative”, it’s plainly wrong.
We solved this through iterative testing and guardrails. That meant:
- Validating answers against known benchmarks.
- Injecting domain-specific constraints into the system.
- Stress-testing across different user personas, from someone very familiar with programmatic supply data like myself, down to the junior trader taking their first steps into programmatic buying.
Bottom line: when AI meets structured data, you can’t just trust outputs. You need to design systems where wrong answers surface quickly and get corrected.
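The first guardrail in the list, validating answers against known benchmarks, can be sketched in a few lines of Python. The benchmark table, the 10% tolerance, and the `check_answer` function are assumptions for illustration only; the single UK entry reuses the 50–60 million range quoted earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Benchmark:
    low: float
    high: float


# Hypothetical benchmark table keyed by (metric, geo). The UK range is the
# validated figure quoted in the text; the table structure is assumed.
BENCHMARKS = {("unique_daily_users", "UK"): Benchmark(50e6, 60e6)}


def check_answer(metric: str, geo: str, value: float, tolerance: float = 0.10) -> str:
    """Classify a numeric answer as 'ok', 'out_of_range', or 'no_benchmark'."""
    bench = BENCHMARKS.get((metric, geo))
    if bench is None:
        return "no_benchmark"
    lower = bench.low * (1 - tolerance)   # widen the range by the tolerance
    upper = bench.high * (1 + tolerance)
    return "ok" if lower <= value <= upper else "out_of_range"


early_guess = check_answer("unique_daily_users", "UK", 30e6)
validated = check_answer("unique_daily_users", "UK", 55e6)
```

With a check like this in the loop, the 30 million answer is flagged before it reaches the user, and a `no_benchmark` result shows where no ground truth exists yet.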
Challenge 4: Balancing Accessibility with Depth
One of the reasons walled gardens like Meta and Google have been so successful with AI is accessibility. Their tools don’t overwhelm users with data; they abstract complexity away.
Programmatic is a bit different. The ecosystem prides itself on customization and transparency. Marketers want to see levers, controls, and data slices. The challenge is: how do you make an AI agent accessible without flattening programmatic into a black box?
Our approach with kAI has been to strike a balance:
- Include pre-prompts that showcase the types of questions kAI can help with.
- Train the AI agent to ask validating questions so that it answers the user’s prompt accurately.
- Offer varying degrees of complexity in the answer, from plain natural-language responses down to graphs and tables with multiple dimensions and metrics.
- Design follow-up logic that encourages users to dig deeper and break the data down further, if desired.
- Build two versions of kAI: one open and free to the whole industry, and kAI Pro, our enhanced version for Kayzen customers with increased data granularity and capabilities.
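The "validating questions" idea in the list above can be pictured as simple slot filling: before answering, check whether the question pins down everything needed for an exact answer. The Python sketch below is a hypothetical illustration; the intent name, the required slots, and the question wording are all assumptions, not kAI's actual logic.

```python
from typing import Optional

# Hypothetical rule: a reach question needs a geo and a time window
# before it can be answered exactly.
REQUIRED_SLOTS = {"reach": ["geo", "time_window"]}

CLARIFYING_QUESTIONS = {
    "geo": "Which country or region should I look at?",
    "time_window": "Over what period: daily, weekly, or monthly?",
}


def next_clarifying_question(intent: str, slots: dict) -> Optional[str]:
    """Return the question for the first missing slot, or None if ready to answer."""
    for slot in REQUIRED_SLOTS.get(intent, []):
        if not slots.get(slot):
            return CLARIFYING_QUESTIONS[slot]
    return None


# "How many unique users can I reach in the UK?" gives a geo but no period.
question = next_clarifying_question("reach", {"geo": "UK"})
ready = next_clarifying_question("reach", {"geo": "UK", "time_window": "daily"})
```

Asking before answering trades one extra turn for a number the user can trust, which fits the accuracy bar that structured data sets.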
kAI is not about establishing a dichotomy between humans and AI, but rather supercharging marketers with accessible, deep insights to make informed decisions with the speed and accuracy that programmatic marketing demands.
Lessons for Anyone Building AI on Structured Data
If I had to distill our journey with kAI into a few lessons for others working with structured data, they’d be:
- Don’t assume structured = clean: Build normalization and validation layers.
- Start with what actually matters: Begin with the core dimensions and metrics that matter to your users, and build from there.
- Guardrail aggressively: Hallucinations matter more when numbers have to add up.
- Balance UX with transparency: Accessibility shouldn’t mean black-boxing.
Why This Matters Beyond Programmatic
Programmatic advertising might be a niche compared to text generation or image synthesis, but the lessons here apply broadly. Any industry that runs on structured data, such as finance, logistics, or healthcare, faces the same challenges.
The hype cycle around AI has been dominated by unstructured use cases because they’re flashy and visible. I mean, I Ghibli’d the hell out of my family pictures. But the real economic impact of AI may be greatest in structured-data domains, where decisions are high-volume, high-stakes, and measurable.
Getting it right isn’t just about training bigger models. It’s about respecting the structure of the data, building systems that can harness it without distortion, and designing AI agents that integrate with how humans actually work.
That’s the journey we’re on with kAI. And we are just getting started.