Is Your Data Ready for Snowflake Cortex AI?
29 Apr 2026
Recently, almost every conversation I’m having about Snowflake Cortex AI starts the same way. A client is excited, the initiative has momentum, and my first question is always: “Can I see your dbt models?” The answer tells me everything — and more often than not, it tells me we’re not starting an AI project. We’re starting a data modelling one.
What “not ready” looks like in practice
It’s understandable. Cortex AI has real momentum, and there is often pressure from above to move on it. But we regularly work with teams who set the initiative in motion before the foundation is ready — and the problems tend to follow the same pattern.
- No curated transformation layer: Raw or lightly staged tables, no documented lineage, no tested logic. Point Cortex Analyst at an unstaged schema and the outputs look plausible but can’t be trusted.
- Metrics that live only in your BI tool: Revenue, churn, retention — defined once in a Tableau workbook or Power BI measure, never committed to the warehouse. When semantic modelling begins, suddenly everyone has an opinion about what “active customer” means.
- Column names nobody else understands: Inconsistent, undocumented, meaningful only to the engineer who wrote them three years ago. Cortex Analyst doesn’t have the institutional context your team has built up — if the meaning isn’t machine-readable, the model is operating blind.
- Technical debt humans learned to ignore: Date and ID fields stored as VARCHAR, nulls in join keys, duplicate rows. People learn to filter around these things. Models can’t.
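These debts are cheap to detect before a model trips over them. As a sketch, here are the sorts of checks we run first — the `orders` table and its columns are purely illustrative, so swap in your own names and date format:

```sql
-- Dates stored as VARCHAR: rows that won't cast cleanly to DATE
SELECT COUNT(*) AS bad_dates
FROM orders
WHERE order_date IS NOT NULL
  AND TRY_TO_DATE(order_date) IS NULL;

-- Nulls in a join key
SELECT COUNT(*) AS null_join_keys
FROM orders
WHERE customer_id IS NULL;

-- Duplicate rows on the supposed primary key
SELECT order_id, COUNT(*) AS row_count
FROM orders
GROUP BY order_id
HAVING COUNT(*) > 1;
```

Snowflake’s `TRY_TO_DATE` returns NULL instead of erroring on a bad value, which makes it a handy profiling tool: anything it can’t parse is exactly the debt a human analyst has learned to filter around.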
Rubbish in, rubbish out — you’d think we’d be well past that by now, but here we are!
The minimum viable state
Before we scope a Cortex AI engagement, we run a quick gut-check. Four things need to be true.
- A gold or mart layer: Tested, documented dbt models or equivalent. It doesn’t need to cover the whole business, just one domain well enough that a model can trust it.
- Role-based access control: Cortex AI operates within Snowflake’s security perimeter — your access controls need to be real before any model starts querying production data.
- Strong internal stakeholder: Semantic modelling will surface disagreements — it always does. You need someone with the authority to resolve them, not just escalate them.
- Enough data history: For anything trend-based, twelve months is the minimum. Less than that and the insights won’t be meaningful.
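For the first criterion, “tested and documented” has a concrete shape in dbt. A minimal sketch of what we’d expect to see in a mart model’s `schema.yml` — model and column names here are hypothetical:

```yaml
version: 2
models:
  - name: fct_orders
    description: "One row per completed order. Grain and definitions documented so Cortex Analyst isn't guessing."
    columns:
      - name: order_id
        description: "Surrogate key for the order."
        tests:
          - unique
          - not_null
      - name: customer_id
        description: "Foreign key to dim_customers."
        tests:
          - not_null
          - relationships:
              to: ref('dim_customers')
              field: customer_id
```

It doesn’t need to be exhaustive. Uniqueness, not-null, and referential tests on one well-documented domain go a long way towards something a model can trust.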
What happens when teams push ahead before they’re ready
When teams rush into delivery without the right foundations, issues show up quickly — inconsistent outputs, rework, and loss of trust in the solution. Worse, you start seeing what we’d describe as hallucination-adjacent behaviour — not the model fabricating information, but producing technically correct outputs (e.g. SQL) against a flawed schema. The numbers look plausible, but they’re wrong.
Once a bad answer gets screenshotted and shared, stakeholder trust is hard to rebuild. In many cases, teams end up shelving the project entirely after an early incident.
So, how can I get ready quickly?
- Define high-value use cases first: Start with 2-3 clear use cases tied to real business outcomes — not generic AI ideas. If you can’t measure the impact, it’s not ready.
- Validate your core data: Focus on the critical datasets behind those use cases. Check quality, definitions, and lineage — this is where most issues surface.
- Establish lightweight governance: Set clear ownership, access controls, and approval processes. You don’t need heavy frameworks — just enough structure to avoid uncontrolled changes.
- Test in a controlled environment: Run pilots with real users, monitor outputs closely, and validate results before scaling. Catch issues early before they impact stakeholder trust.
Want to know where you stand before committing to a Cortex AI build?
We run focused data readiness assessments for teams preparing for Snowflake Cortex AI. If you want an honest picture before you start, get in touch.
Frequently asked questions
How do I know if my data is ready for Snowflake Cortex AI?
The short version: if you have a tested, documented gold or mart layer for at least one business domain, role-based access controls in Snowflake, twelve months of clean history, and a stakeholder who can resolve definition disputes — you’re in the ballpark. If any of those are shaky, Cortex AI will surface the gaps quickly. A focused readiness assessment is the fastest way to know where you stand.
Can Cortex AI work with raw or lightly staged data?
Technically yes. Practically, the outputs won’t be trustworthy. Cortex Analyst generates SQL based on the schema you point it at — if that schema is undocumented, inconsistent, or full of VARCHAR dates and nulls in join keys, the model will produce plausible-looking answers that are quietly wrong. That’s the hallucination-adjacent behaviour we warn clients about.
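A concrete sketch of how this goes wrong, using a hypothetical `orders` table where `order_date` was loaded as VARCHAR in `DD/MM/YYYY` format:

```sql
-- A generated filter like this compiles and runs, but it compares
-- strings, not dates — the count it returns is quietly wrong:
SELECT COUNT(*) FROM orders
WHERE order_date >= '2025-01-01';

-- The correct version requires knowing the stored format:
SELECT COUNT(*) FROM orders
WHERE TRY_TO_DATE(order_date, 'DD/MM/YYYY') >= DATE '2025-01-01';
```

The model has no way of knowing the format. That knowledge belongs in the transformation layer, where the column is cast once and the ambiguity disappears for every query after it.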
Do I need dbt specifically, or will any transformation layer do?
dbt is what we see most often, but the requirement isn’t the tool — it’s a tested, documented, version-controlled transformation layer with clear lineage. Snowflake Dynamic Tables, Coalesce, or a disciplined SQL framework can all work. What matters is that your business logic lives in the warehouse, not in a Tableau measure or a spreadsheet only one person understands.
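For teams on Dynamic Tables rather than dbt, the equivalent is a declarative, version-controlled definition of the mart model. A minimal sketch — schema, warehouse, and lag values are illustrative:

```sql
CREATE OR REPLACE DYNAMIC TABLE analytics.mart.fct_orders
  TARGET_LAG = '1 hour'
  WAREHOUSE = transform_wh
AS
SELECT
  o.order_id,
  o.customer_id,
  TRY_TO_DATE(o.order_date, 'DD/MM/YYYY') AS order_date,
  o.order_total
FROM analytics.raw.orders AS o
WHERE o.order_id IS NOT NULL;
```

The tool matters less than the properties: the logic is in the warehouse, in source control, and refreshes on a defined cadence.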
How much data history do I actually need?
For anything trend-based or comparative, twelve months is the minimum. Below that, seasonal patterns are invisible and year-on-year comparisons aren’t possible. Point-in-time lookups can get away with less, but most use cases that justify a Cortex AI build need at least a year of clean, consistent data.
What does a data readiness assessment actually involve?
We review your warehouse structure, transformation layer, governance posture, and 2-3 candidate use cases against the readiness criteria above. The output is a clear gap analysis — what’s already in good shape, what needs work before a Cortex AI build, and the realistic effort to get there. It’s typically a one to two week piece of work.
This blog was written by Shirlyn Mishra, Senior Consultant at EdgeRed.
About EdgeRed
EdgeRed is an Australian AI and data consultancy, part of The Omnia Collective group, with teams in Sydney and Melbourne. We build things that work in production — agentic AI, machine learning, data engineering, and Microsoft Fabric implementation. 250+ projects. 100+ clients. 100% Australian on-shore team.
Subscribe to our newsletter for practical data and AI insights, straight to your inbox.