AI Memory for Data Analysis: From Code to Business Logic
You ask AI to write a SQL query. "Pull active customers from last quarter."
AI generates syntactically correct SQL that returns the wrong data because it doesn't know: what "active" means in your business, which table actually stores customer status, that your fiscal quarters don't match calendar quarters, or that the customer_status field has three different inactive values you need to exclude.
You spend 20 minutes fixing the query because AI can write code but doesn't know your data.
What Data Analysis Looks Like With Memory
You tell Claude: "Pull Q4 active customers with LTV over $5K."
Claude already knows:
- Your database schema (table names, relationships, field definitions)
- Business logic ("active" means purchased within 90 days and status != 'churned', 'cancelled', or 'paused')
- How you calculate LTV (formula, which tables, exclusions)
- That your fiscal Q4 is Oct-Dec, not Jul-Sep
- Common data quality issues (duplicates in email field, test accounts to exclude)
- Which fields are reliable and which need validation
It writes a query that returns the right data on the first try because it knows how your business defines "active customer" and "LTV."
What Goes in Your Data Analysis Context File
Your data-dictionary.md file stores everything AI needs to write queries that match your business logic:
Database Schema Reference
Not a full technical schema dump. Just the tables and fields you actually use, with business context.
## Core Tables
### customers
Primary table for customer data. Updated nightly from Stripe.
Key fields:
- customer_id (PK, unique)
- email (unique, but has ~200 duplicates from migration — use customer_id as source of truth)
- status: 'active', 'churned', 'cancelled', 'paused', 'trial'
- Active = purchased within 90 days AND status = 'active'
- Churned = no purchase in 90+ days OR status = 'churned'
- created_at (account creation date, UTC)
- first_purchase_date (first successful payment, UTC)
- total_spend (lifetime revenue, in cents)
- mrr (monthly recurring revenue, in cents, NULL for one-time customers)
Relationships:
- customers.customer_id → purchases.customer_id (one to many)
- customers.customer_id → subscriptions.customer_id (one to many)
### purchases
Transaction history. Includes one-time and subscription payments.
Key fields:
- purchase_id (PK, unique)
- customer_id (FK to customers)
- amount (in cents, includes tax)
- purchase_date (UTC timestamp)
- product_id (FK to products)
- status: 'completed', 'refunded', 'failed', 'pending'
- Only count 'completed' for revenue calculations
- Refunded transactions stay in table but marked status='refunded'
### subscriptions
Active and cancelled subscriptions.
Key fields:
- subscription_id (PK, unique)
- customer_id (FK to customers)
- plan_id (FK to plans)
- status: 'active', 'cancelled', 'paused', 'past_due'
- mrr (monthly value in cents)
- start_date (subscription start, UTC)
- cancelled_at (NULL if active, timestamp if cancelled)
Business Logic Definitions
How you actually calculate metrics. Not just formulas — the context around them.
## Key Metrics
### Active Customer
**Definition:** Purchased within last 90 days AND status = 'active'
**SQL logic:**
```sql
WHERE status = 'active'
AND last_purchase_date >= DATE_SUB(CURRENT_DATE, INTERVAL 90 DAY)
```
**Notes:**
- Don't use status field alone (churned customers sometimes show 'active' due to sync lag)
- 90-day window is company standard, matches churn definition
### Customer Lifetime Value (LTV)
**Definition:** Total revenue from customer since account creation, excluding refunds
**SQL logic:**
```sql
SELECT customer_id, SUM(amount) / 100 AS ltv
FROM purchases
WHERE status = 'completed'
GROUP BY customer_id
```
**Notes:**
- Divide by 100 (amounts stored in cents)
- Exclude refunded transactions (status = 'refunded')
- For subscription customers, use combination of purchases + projected MRR
- Does NOT include pending or failed transactions
### Monthly Recurring Revenue (MRR)
**Definition:** Sum of all active subscription monthly values
**SQL logic:**
```sql
SELECT SUM(mrr) / 100 AS total_mrr
FROM subscriptions
-- past_due is counted until formally cancelled (see notes below)
WHERE status IN ('active', 'past_due')
```
**Notes:**
- Already in monthly terms (annual plans divided by 12)
- Paused subscriptions excluded
- Past due subscriptions included (we count them until formally cancelled)
### Churn Rate (Monthly)
**Definition:** Percentage of customers who cancelled subscription in given month
**SQL logic:**
```sql
-- Churn rate for one month (example: June 2025)
SELECT
  COUNT(*) AS customers_at_start,
  SUM(cancelled_at < '2025-07-01') AS churned,
  ROUND(SUM(cancelled_at < '2025-07-01') / COUNT(*) * 100, 1) AS churn_rate_pct
FROM subscriptions
WHERE start_date < '2025-06-01'
  AND (cancelled_at IS NULL OR cancelled_at >= '2025-06-01');
```
**Notes:**
- Only count subscription customers (exclude one-time purchasers)
- Paused subscriptions not counted as churned unless formally cancelled
- Calculate monthly, not rolling 30 days
Fiscal Calendar & Reporting Periods
Your fiscal year doesn't match the calendar. AI needs to know this or every quarterly report is wrong.
## Reporting Periods
### Fiscal Year
Runs Oct 1 - Sep 30 (not Jan-Dec)
Fiscal quarters:
- Q1: Oct, Nov, Dec
- Q2: Jan, Feb, Mar
- Q3: Apr, May, Jun
- Q4: Jul, Aug, Sep
When someone asks for "Q4 2025", they mean Jul-Sep 2025.
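The mapping can be expressed once in SQL and reused. A sketch (MySQL syntax; `purchase_date` stands in for whatever date column you're bucketing): shifting a date forward three months makes the calendar year/quarter functions return fiscal values.

```sql
-- Fiscal year/quarter for an Oct-Sep fiscal calendar (sketch, MySQL).
-- Shifting dates forward 3 months aligns calendar functions with the
-- fiscal calendar: Oct 2024 -> FY2025 Q1, Jul 2025 -> FY2025 Q4.
SELECT
  purchase_date,
  YEAR(DATE_ADD(purchase_date, INTERVAL 3 MONTH)) AS fiscal_year,
  QUARTER(DATE_ADD(purchase_date, INTERVAL 3 MONTH)) AS fiscal_quarter
FROM purchases;
```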
### Reporting Frequency
- Revenue: Daily dashboard, monthly reports, quarterly board reports
- Churn: Monthly only
- Customer acquisition: Weekly trending, monthly detailed
- Product metrics: Daily for monitoring, weekly for analysis
### Comparison Periods
Default comparisons:
- Month over month: compare to same month last year (not previous month)
- Quarter over quarter: compare to same quarter last year
- Week over week: compare to previous week (not same week last year)
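A comparison query following the month-over-month convention might look like this sketch (MySQL; June 2025 vs. June 2024 hard-coded for illustration):

```sql
-- Month over month per company convention: compare to the same
-- month LAST YEAR, not the previous month.
SELECT
  DATE_FORMAT(purchase_date, '%Y-%m') AS month,
  SUM(amount) / 100 AS revenue
FROM purchases
WHERE status = 'completed'
  AND (purchase_date >= '2025-06-01' AND purchase_date < '2025-07-01'
    OR purchase_date >= '2024-06-01' AND purchase_date < '2024-07-01')
GROUP BY month;
```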
Data Quality Issues
Every database has problems. Document them so AI knows what to filter out.
## Known Data Issues
### Test Accounts
Exclude these emails from all customer analysis:
- Any email ending in @test.com
- Any email ending in @example.com
- customer_id values 1-100 (early test accounts)
- Any customer with 'test' in name field
Standard exclusion filter:
```sql
WHERE email NOT LIKE '%@test.com'
  AND email NOT LIKE '%@example.com'
  AND customer_id > 100
  AND name NOT LIKE '%test%'
```
### Duplicate Records
- ~200 duplicate emails in customers table (from 2024 migration)
- Use customer_id as source of truth, not email
- When deduping, keep record with earliest created_at date
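A dedup query following that rule might look like this sketch (assumes MySQL 8+ for window functions):

```sql
-- One row per email, keeping the record with the earliest created_at.
SELECT customer_id, email, created_at
FROM (
  SELECT customer_id, email, created_at,
         ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at) AS rn
  FROM customers
) ranked
WHERE rn = 1;
```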
### Missing Data
- first_purchase_date is NULL for ~500 customers (pre-2023 accounts)
- For these, use MIN(purchase_date) from purchases table
- MRR field is NULL for one-time customers (expected)
- Some purchases missing product_id (payment processor error, ~0.5% of records)
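The first_purchase_date fallback can be written as a COALESCE. A sketch (restricting to completed purchases is an assumption, consistent with the field's "first successful payment" definition):

```sql
-- Backfill NULL first_purchase_date from the purchases table.
SELECT
  c.customer_id,
  COALESCE(
    c.first_purchase_date,
    (SELECT MIN(p.purchase_date)
     FROM purchases p
     WHERE p.customer_id = c.customer_id
       AND p.status = 'completed')
  ) AS first_purchase_date
FROM customers c;
```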
### Timezone Issues
- All timestamps stored in UTC
- When filtering by date, use DATE() function to avoid timezone math
- Dashboard shows PT (convert UTC - 8 hours for PST, UTC - 7 for PDT)
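For display in Pacific time, CONVERT_TZ handles the PST/PDT switch automatically. A sketch (MySQL; assumes the server's timezone tables are loaded, otherwise fall back to the fixed offsets above):

```sql
-- Show a UTC timestamp in Pacific time (handles PST/PDT transitions).
SELECT
  purchase_date AS purchase_date_utc,
  CONVERT_TZ(purchase_date, 'UTC', 'America/Los_Angeles') AS purchase_date_pt
FROM purchases;
```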
Common Query Patterns
You run the same types of queries weekly. Store the patterns so AI doesn't reinvent them.
## Standard Query Templates
### Monthly Revenue Report
```sql
-- Join customers: the test-account filter needs email, which
-- lives on the customers table, not purchases
SELECT
  DATE_FORMAT(p.purchase_date, '%Y-%m') AS month,
  COUNT(DISTINCT p.customer_id) AS customers,
  COUNT(p.purchase_id) AS transactions,
  SUM(p.amount) / 100 AS revenue
FROM purchases p
JOIN customers c ON p.customer_id = c.customer_id
WHERE p.status = 'completed'
  AND p.purchase_date >= '2024-01-01'
  AND c.email NOT LIKE '%@test.com'
GROUP BY month
ORDER BY month DESC;
```
### Customer Cohort Analysis
```sql
-- Customers by signup month, with first purchase behavior
SELECT
DATE_FORMAT(created_at, '%Y-%m') AS signup_month,
COUNT(customer_id) AS signups,
COUNT(first_purchase_date) AS converted,
ROUND(COUNT(first_purchase_date) / COUNT(customer_id) * 100, 1) AS conversion_rate,
AVG(DATEDIFF(first_purchase_date, created_at)) AS days_to_first_purchase
FROM customers
WHERE created_at >= '2024-01-01'
AND email NOT LIKE '%@test.com'
GROUP BY signup_month
ORDER BY signup_month DESC;
```
### High-Value Customer Segment
```sql
-- Customers with LTV > $5K
SELECT
c.customer_id,
c.email,
c.status,
SUM(p.amount) / 100 AS ltv,
COUNT(p.purchase_id) AS purchase_count,
MAX(p.purchase_date) AS last_purchase
FROM customers c
JOIN purchases p ON c.customer_id = p.customer_id
WHERE p.status = 'completed'
AND c.email NOT LIKE '%@test.com'
GROUP BY c.customer_id
HAVING ltv > 5000
ORDER BY ltv DESC;
```
How to Set This Up
Start with your three most-used tables. For each one, list the fields you actually query and what they mean in business terms.
Write definitions for your top 3-5 metrics. Include the SQL logic and any gotchas.
Document one known data quality issue (test accounts, duplicates, whatever you always have to filter out).
That's enough to start. Ask AI to write a query. When it gets something wrong, update the context file with the correct definition. After 10 queries, you'll have documented most of your business logic.
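A bare-bones starter file might look like this (section names follow the examples above; everything in parentheses is a placeholder to fill in):

```markdown
# data-dictionary.md

## Core Tables
### customers
Key fields:
- customer_id (PK)
- status: (list the real values and what each means)

## Key Metrics
### Active Customer
**Definition:** (your exact rule, e.g. the 90-day window)

## Known Data Issues
### Test Accounts
(your standard exclusion filter)
```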
Before and After
Without Memory
You: "Write a query to pull active customers from last quarter with LTV over $5K."
AI:
```sql
SELECT customer_id, email, SUM(purchase_amount) AS ltv
FROM customers
JOIN purchases ON customers.id = purchases.customer_id
WHERE status = 'active'
  AND purchase_date >= '2025-10-01'
  AND purchase_date < '2026-01-01'
GROUP BY customer_id
HAVING ltv > 5000;
```
Problems: Wrong quarter (used calendar Q4, not fiscal), wrong active definition (didn't check 90-day window), wrong LTV calculation (didn't exclude refunds or convert from cents), didn't filter test accounts, used wrong join field names.
You spend 20 minutes fixing it.
With Memory
You: "Pull Q4 active customers with LTV over $5K."
AI:
```sql
-- Q4 FY2025 high-value active customers (Jul-Sep 2025)
SELECT
c.customer_id,
c.email,
c.status,
SUM(p.amount) / 100 AS ltv,
MAX(p.purchase_date) AS last_purchase
FROM customers c
JOIN purchases p ON c.customer_id = p.customer_id
WHERE c.status = 'active'
AND p.status = 'completed'
AND c.email NOT LIKE '%@test.com'
AND c.email NOT LIKE '%@example.com'
AND c.customer_id > 100
  AND p.purchase_date >= '2025-07-01'
  AND p.purchase_date < '2025-10-01'
GROUP BY c.customer_id
HAVING ltv > 5000
  -- 90-day active check belongs in HAVING: aggregates like MAX()
  -- are not allowed in the WHERE clause
  AND MAX(p.purchase_date) >= DATE_SUB(CURRENT_DATE, INTERVAL 90 DAY)
ORDER BY ltv DESC;
```
Comment: Using fiscal Q4 (Jul-Sep), filtering for 90-day active status, excluding test accounts, calculating LTV from completed purchases only, converting from cents.
Correct quarter. Correct active definition. Correct LTV calculation. Test accounts filtered. Runs on first try.
What Changes
First week: Queries work on second or third try instead of fifth. You're still fixing business logic issues but not syntax errors.
First month: Most queries run correctly on first try. You're checking results for accuracy, not fixing SQL.
Three months: Context file has 15 table definitions, 20 metric calculations, common query patterns. AI generates production-ready queries. You spend time analyzing results instead of debugging SQL.
You get answers to business questions in 5 minutes instead of 45 minutes because AI knows your data model and business logic.
Stop Debugging Queries That Should've Worked
One markdown file. One afternoon. AI that actually remembers who you are, what you do, and how you work.
Build Your Memory System — $997