AI for Data Analysts That Knows Your Schema
You need a report on monthly recurring revenue by customer segment. You ask AI to write the SQL.
It gives you a query that looks reasonable. You run it. Error: table 'customers' doesn't exist. Your table is called 'accounts'. And MRR isn't a field — it's calculated from subscription_value and billing_frequency. And "customer segment" in your database is stored as tier_id, not segment_name.
You paste your schema. Explain the MRR calculation. Clarify the tier mapping. AI rewrites the query. Closer, but it's joining on user_id when it should join on account_id, because your users table is different from your accounts table.
Twenty minutes later, you've got working SQL. Tomorrow, different report, same problem — AI forgot your schema. You start over.
Data analysts don't need AI that writes generic SQL. They need AI that knows their database.
Why Generic AI Fails Data Analysts
ChatGPT can write SQL. It just can't write your SQL.
It doesn't know your schema. It doesn't know your business logic. It doesn't know how your tables relate, what your naming conventions are, or what fields mean in your company's context.
Data analysts are working with:
- Database schemas (table names, field names, data types, relationships)
- Business logic (how revenue is calculated, what "active user" means, how churn is defined)
- Reporting formats (what stakeholders want to see, how data should be grouped, what's a metric vs. a dimension)
- Data quality issues (which fields are reliable, which have null problems, which need cleaning)
- Common queries (monthly reports that run the same way every time, dashboards that pull the same data)
When you ask AI to write a query, it guesses. Table names, field names, join logic — all guesses. Sometimes it's close. Usually it's wrong.
The AI can write SQL. It just doesn't know what to write it about.
What Data Analysts Actually Need
You need AI that remembers:
Your database schema. Not generic examples — your actual tables, fields, data types, primary keys, foreign keys, indexes. AI should know that 'accounts' exists but 'customers' doesn't.
Your business logic. How revenue is calculated. What "active user" means (logged in last 30 days? Made a purchase? Opened the app?). How churn is defined. What fields feed into what metrics.
Your table relationships. How users relate to accounts. How transactions relate to subscriptions. What joins work and what joins create duplicates. Which fields are reliable foreign keys and which aren't.
Your reporting standards. How stakeholders want data formatted. What date ranges default reports use. What gets rounded and to how many decimals. What gets grouped by month vs. week vs. day.
Your data quirks. The legacy field that's no longer updated. The table that has null problems. The join that's slow and should be avoided. The calculation that looks simple but has edge cases.
Generic AI can't do this. It needs context files.
How Context Files Work for Data Analysts
Context files are markdown documents that live in Obsidian. AI reads them every time you start a conversation.
One file might be database-schema.md:
- Table names and what they store
- Field names, data types, descriptions
- Primary keys and foreign keys
- Common joins and relationships
- Tables to avoid (deprecated, slow, unreliable)
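A file like that can stay short and plain. Here's a minimal sketch of what database-schema.md might look like, using the hypothetical accounts/tiers schema from the example above (your table and field names will differ):

```markdown
# database-schema.md

## accounts
- account_id (INTEGER, primary key)
- tier_id (INTEGER, foreign key → tiers.tier_id)
- subscription_value (REAL) — amount billed per billing period
- billing_frequency (TEXT) — 'monthly' or 'annual'

## tiers
- tier_id (INTEGER, primary key)
- tier_name (TEXT) — the segment label stakeholders see

## Joins
- accounts.tier_id → tiers.tier_id (safe, one-to-many)
- Do NOT join accounts on user_id — users and accounts are separate tables

## Avoid
- legacy_accounts — deprecated, no longer updated
```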
Another might be business-logic.md:
- How key metrics are calculated (MRR, churn, LTV, CAC)
- Business definitions (what's an active user, what's a qualified lead)
- Segmentation logic (how customers are grouped)
- Edge cases and exceptions
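A business-logic.md file works the same way. A sketch with illustrative definitions (these are examples, not your company's actual rules):

```markdown
# business-logic.md

## MRR (Monthly Recurring Revenue)
- monthly accounts: subscription_value as-is
- annual accounts: subscription_value / 12
- sum per tier, round to the nearest dollar

## Active user
- logged in within the last 30 days
- (not purchase-based or app-open-based)

## Churn
- an account with no active subscription this month
  that had one last month
```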
Another: reporting-standards.md:
- How stakeholders want data formatted
- Default date ranges for monthly/quarterly reports
- Grouping and aggregation preferences
- Chart types and visualization standards
When you ask AI to write SQL for MRR by segment, it reads database-schema.md (knows the accounts table, subscription_value field, tier_id mapping), reads business-logic.md (knows how to calculate MRR from subscription_value and billing_frequency), and reads reporting-standards.md (knows stakeholders want monthly grouping, rounded to nearest dollar).
First query works. No schema pasting. No re-explaining business logic. Just working SQL.
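To make that concrete, here's a minimal sketch of the kind of query those context files would produce, run against a toy in-memory database. Every table and column name here (accounts, tiers, subscription_value, billing_frequency, tier_id) is the hypothetical schema from the example above, not a real product's schema, and the annual-to-monthly conversion is one illustrative MRR rule:

```python
import sqlite3

# Toy database using the hypothetical schema from database-schema.md
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tiers (tier_id INTEGER PRIMARY KEY, tier_name TEXT);
CREATE TABLE accounts (
    account_id INTEGER PRIMARY KEY,
    tier_id INTEGER REFERENCES tiers(tier_id),
    subscription_value REAL,   -- amount billed per billing period
    billing_frequency TEXT     -- 'monthly' or 'annual'
);
INSERT INTO tiers VALUES (1, 'Starter'), (2, 'Enterprise');
INSERT INTO accounts VALUES
    (10, 1, 50.0,   'monthly'),
    (11, 1, 1200.0, 'annual'),
    (12, 2, 900.0,  'monthly');
""")

# MRR rule from business-logic.md: normalize annual contracts to a
# monthly figure, sum per tier, round to the nearest dollar.
query = """
SELECT t.tier_name,
       ROUND(SUM(CASE a.billing_frequency
                     WHEN 'annual' THEN a.subscription_value / 12.0
                     ELSE a.subscription_value
                 END)) AS mrr
FROM accounts a
JOIN tiers t ON t.tier_id = a.tier_id
GROUP BY t.tier_name
ORDER BY t.tier_name;
"""
for tier, mrr in conn.execute(query):
    print(tier, mrr)   # Enterprise 900.0, Starter 150.0
```

Note the join is on tier_id (not user_id), and MRR is computed from subscription_value and billing_frequency rather than read from a nonexistent mrr field — exactly the mistakes the context files prevent.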
Before and After
Before: "Write SQL for MRR by customer segment."
AI writes a query using table 'customers' (doesn't exist), field 'mrr' (doesn't exist), and 'segment' (stored as tier_id).
You paste the schema. Explain the MRR calculation. Clarify the tier mapping. AI rewrites. Now it joins on user_id instead of account_id. You fix the join. Run it. Works, but slowly, because it's querying a deprecated table.
You've spent 20 minutes getting working SQL. Tomorrow, different report — you start over.
After: "Write SQL for MRR by customer segment."
AI reads database-schema.md, business-logic.md, and reporting-standards.md. First query: pulls from accounts table, calculates MRR correctly, joins on account_id, groups by tier_id, formats output as stakeholders expect. You run it. Works.
Next request: "Add churn rate to that report."
AI knows how churn is calculated (from business-logic.md), knows which fields to use, adds it to the query. Another working first draft.
No re-explaining. No schema pasting. AI remembers.
What This Looks Like in Practice
A data analyst at a SaaS company sets up four context files:
- database-schema.md — All tables, fields, relationships, data types
- business-logic.md — Metric definitions, calculations, segmentation rules
- reporting-standards.md — Stakeholder preferences, formatting rules, default date ranges
- common-queries.md — Frequently run reports and their SQL patterns
Total setup time: one afternoon (mostly copy-pasting existing documentation).
Now when she asks AI to write SQL, it knows the schema, the business logic, and the reporting standards. When she asks for Python to clean data, AI knows which fields have null problems. When she asks for a dashboard query, AI knows what stakeholders want to see.
When the schema changes (new table added, field renamed), she updates database-schema.md. Every future query uses the new schema. When business logic changes (new MRR calculation, different churn definition), she updates business-logic.md. AI adjusts automatically.
The context files become the data dictionary. New analysts read them for onboarding. AI reads them for every query. Stakeholders read them to understand where numbers come from.
What You Get
AI that writes SQL using your actual schema, not guesses.
AI that calculates metrics using your business logic, not generic formulas.
AI that formats reports the way stakeholders expect without being told.
AI that avoids data quality issues, slow tables, and deprecated fields.
AI that gets smarter as you update schema docs and business logic.
No more pasting schemas. No more re-explaining calculations. No more fixing broken queries that almost worked.
Data analysts already document schemas and business logic (or should). This just makes AI read it.
Build Your Data Memory System
One markdown file. One afternoon. AI that actually remembers who you are, what you do, and how you work.
Build Your Memory System — $997