AI Memory for Data Analysis: From Code to Business Logic
You ask AI to write a SQL query. "Pull active customers from last quarter."
AI generates syntactically correct SQL that returns the wrong data because it doesn't know: what "active" means in your business, which table actually stores customer status, that your fiscal quarters don't match calendar quarters, or that the customer_status field has three different inactive values you need to exclude.
You spend 20 minutes fixing the query because AI can write code but doesn't know your data.
What Data Analysis Looks Like With Memory
You tell Claude: "Pull Q4 active customers with LTV over $5K."
Claude already knows:
- Your database schema (table names, relationships, field definitions)
- Business logic ("active" means purchased within 90 days and status != 'churned', 'cancelled', or 'paused')
- How you calculate LTV (formula, which tables, exclusions)
- That your fiscal Q4 is Oct-Dec, not Jul-Sep
- Common data quality issues (duplicates in email field, test accounts to exclude)
- Which fields are reliable and which need validation
It writes a query that returns the right data on the first try because it knows how your business defines "active customer" and "LTV."
What Goes in Your Data Analysis Context File
Your data-dictionary.md file stores everything AI needs to write queries that match your business logic:
Database Schema Reference
Not a full technical schema dump. Just the tables and fields you actually use, with business context.
## Core Tables
### customers
Primary table for customer data. Updated nightly from Stripe.
Key fields:
- customer_id (PK, unique)
- email (unique, but has ~200 duplicates from migration — use customer_id as source of truth)
- status: 'active', 'churned', 'cancelled', 'paused', 'trial'
- Active = purchased within 90 days AND status = 'active'
- Churned = no purchase in 90+ days OR status = 'churned'
- created_at (account creation date, UTC)
- first_purchase_date (first successful payment, UTC)
- total_spend (lifetime revenue, in cents)
- mrr (monthly recurring revenue, in cents, NULL for one-time customers)
Relationships:
- customers.customer_id → purchases.customer_id (one to many)
- customers.customer_id → subscriptions.customer_id (one to many)
### purchases
Transaction history. Includes one-time and subscription payments.
Key fields:
- purchase_id (PK, unique)
- customer_id (FK to customers)
- amount (in cents, includes tax)
- purchase_date (UTC timestamp)
- product_id (FK to products)
- status: 'completed', 'refunded', 'failed', 'pending'
- Only count 'completed' for revenue calculations
- Refunded transactions stay in table but marked status='refunded'
### subscriptions
Active and cancelled subscriptions.
Key fields:
- subscription_id (PK, unique)
- customer_id (FK to customers)
- plan_id (FK to plans)
- status: 'active', 'cancelled', 'paused', 'past_due'
- mrr (monthly value in cents)
- start_date (subscription start, UTC)
- cancelled_at (NULL if active, timestamp if cancelled)
Business Logic Definitions
How you actually calculate metrics. Not just formulas — the context around them.
## Key Metrics
### Active Customer
**Definition:** Purchased within last 90 days AND status = 'active'
**SQL logic:**
```sql
WHERE status = 'active'
AND last_purchase_date >= DATE_SUB(CURRENT_DATE, INTERVAL 90 DAY)
```
**Notes:**
- Don't use status field alone (churned customers sometimes show 'active' due to sync lag)
- 90-day window is company standard, matches churn definition
### Customer Lifetime Value (LTV)
**Definition:** Total revenue from customer since account creation, excluding refunds
**SQL logic:**
```sql
SELECT customer_id, SUM(amount) / 100 AS ltv
FROM purchases
WHERE status = 'completed'
GROUP BY customer_id
```
**Notes:**
- Divide by 100 (amounts stored in cents)
- Exclude refunded transactions (status = 'refunded')
- For subscription customers, use combination of purchases + projected MRR
- Does NOT include pending or failed transactions
### Monthly Recurring Revenue (MRR)
**Definition:** Sum of all active subscription monthly values
**SQL logic:**
```sql
SELECT SUM(mrr) / 100 AS total_mrr
FROM subscriptions
-- past_due is counted until formally cancelled (see notes below)
WHERE status IN ('active', 'past_due')
```
**Notes:**
- Already in monthly terms (annual plans divided by 12)
- Paused subscriptions excluded
- Past due subscriptions included (we count them until formally cancelled)
### Churn Rate (Monthly)
**Definition:** Percentage of customers who cancelled subscription in given month
**SQL logic:**
```sql
-- Churn rate for one month (example: June 2025)
SELECT
  COUNT(*) AS customers_at_start,
  SUM(cancelled_at < '2025-07-01') AS churned,
  ROUND(SUM(cancelled_at < '2025-07-01') / COUNT(*) * 100, 1) AS churn_rate_pct
FROM subscriptions
WHERE start_date < '2025-06-01'
  AND (cancelled_at IS NULL OR cancelled_at >= '2025-06-01');
```
**Notes:**
- Only count subscription customers (exclude one-time purchasers)
- Paused subscriptions not counted as churned unless formally cancelled
- Calculate monthly, not rolling 30 days
Fiscal Calendar & Reporting Periods
Your fiscal year doesn't match the calendar. AI needs to know this or every quarterly report is wrong.
## Reporting Periods
### Fiscal Year
Runs Oct 1 - Sep 30 (not Jan-Dec)
Fiscal quarters:
- Q1: Oct, Nov, Dec
- Q2: Jan, Feb, Mar
- Q3: Apr, May, Jun
- Q4: Jul, Aug, Sep
When someone asks for "Q4 2025", they mean Jul-Sep 2025.
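The mapping can be expressed once in SQL and reused. A sketch (MySQL syntax; `purchase_date` stands in for whatever date column you're bucketing): shifting a date forward three months makes the calendar year/quarter functions return fiscal values.

```sql
-- Fiscal year/quarter for an Oct-Sep fiscal calendar (sketch, MySQL).
-- Shifting dates forward 3 months aligns calendar functions with the
-- fiscal calendar: Oct 2024 -> FY2025 Q1, Jul 2025 -> FY2025 Q4.
SELECT
  purchase_date,
  YEAR(DATE_ADD(purchase_date, INTERVAL 3 MONTH)) AS fiscal_year,
  QUARTER(DATE_ADD(purchase_date, INTERVAL 3 MONTH)) AS fiscal_quarter
FROM purchases;
```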
### Reporting Frequency
- Revenue: Daily dashboard, monthly reports, quarterly board reports
- Churn: Monthly only
- Customer acquisition: Weekly trending, monthly detailed
- Product metrics: Daily for monitoring, weekly for analysis
### Comparison Periods
Default comparisons:
- Month over month: compare to same month last year (not previous month)
- Quarter over quarter: compare to same quarter last year
- Week over week: compare to previous week (not same week last year)
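A comparison query following the month-over-month convention might look like this sketch (MySQL; June 2025 vs. June 2024 hard-coded for illustration):

```sql
-- Month over month per company convention: compare to the same
-- month LAST YEAR, not the previous month.
SELECT
  DATE_FORMAT(purchase_date, '%Y-%m') AS month,
  SUM(amount) / 100 AS revenue
FROM purchases
WHERE status = 'completed'
  AND (purchase_date >= '2025-06-01' AND purchase_date < '2025-07-01'
    OR purchase_date >= '2024-06-01' AND purchase_date < '2024-07-01')
GROUP BY month;
```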
Data Quality Issues
Every database has problems. Document them so AI knows what to filter out.
## Known Data Issues
### Test Accounts
Exclude these emails from all customer analysis:
- Any email ending in @test.com
- Any email ending in @example.com
- customer_id values 1-100 (early test accounts)
- Any customer with 'test' in name field
Standard exclusion filter:
```sql
WHERE email NOT LIKE '%@test.com'
  AND email NOT LIKE '%@example.com'
  AND customer_id > 100
  AND name NOT LIKE '%test%'
```
### Duplicate Records
- ~200 duplicate emails in customers table (from 2024 migration)
- Use customer_id as source of truth, not email
- When deduping, keep record with earliest created_at date
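A dedup query following that rule might look like this sketch (assumes MySQL 8+ for window functions):

```sql
-- One row per email, keeping the record with the earliest created_at.
SELECT customer_id, email, created_at
FROM (
  SELECT customer_id, email, created_at,
         ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at) AS rn
  FROM customers
) ranked
WHERE rn = 1;
```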
### Missing Data
- first_purchase_date is NULL for ~500 customers (pre-2023 accounts)
- For these, use MIN(purchase_date) from purchases table
- MRR field is NULL for one-time customers (expected)
- Some purchases missing product_id (payment processor error, ~0.5% of records)
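The first_purchase_date fallback can be written as a COALESCE. A sketch (restricting to completed purchases is an assumption, consistent with the field's "first successful payment" definition):

```sql
-- Backfill NULL first_purchase_date from the purchases table.
SELECT
  c.customer_id,
  COALESCE(
    c.first_purchase_date,
    (SELECT MIN(p.purchase_date)
     FROM purchases p
     WHERE p.customer_id = c.customer_id
       AND p.status = 'completed')
  ) AS first_purchase_date
FROM customers c;
```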
### Timezone Issues
- All timestamps stored in UTC
- When filtering by date, use DATE() function to avoid timezone math
- Dashboard shows PT (convert UTC - 8 hours for PST, UTC - 7 for PDT)
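For display in Pacific time, CONVERT_TZ handles the PST/PDT switch automatically. A sketch (MySQL; assumes the server's timezone tables are loaded, otherwise fall back to the fixed offsets above):

```sql
-- Show a UTC timestamp in Pacific time (handles PST/PDT transitions).
SELECT
  purchase_date AS purchase_date_utc,
  CONVERT_TZ(purchase_date, 'UTC', 'America/Los_Angeles') AS purchase_date_pt
FROM purchases;
```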
Common Query Patterns
You run the same types of queries weekly. Store the patterns so AI doesn't reinvent them.
## Standard Query Templates
### Monthly Revenue Report
```sql
-- Join customers: the test-account filter needs email, which
-- lives on the customers table, not purchases
SELECT
  DATE_FORMAT(p.purchase_date, '%Y-%m') AS month,
  COUNT(DISTINCT p.customer_id) AS customers,
  COUNT(p.purchase_id) AS transactions,
  SUM(p.amount) / 100 AS revenue
FROM purchases p
JOIN customers c ON p.customer_id = c.customer_id
WHERE p.status = 'completed'
  AND p.purchase_date >= '2024-01-01'
  AND c.email NOT LIKE '%@test.com'
GROUP BY month
ORDER BY month DESC;
```
### Customer Cohort Analysis
```sql
-- Customers by signup month, with first purchase behavior
SELECT
DATE_FORMAT(created_at, '%Y-%m') AS signup_month,
COUNT(customer_id) AS signups,
COUNT(first_purchase_date) AS converted,
ROUND(COUNT(first_purchase_date) / COUNT(customer_id) * 100, 1) AS conversion_rate,
AVG(DATEDIFF(first_purchase_date, created_at)) AS days_to_first_purchase
FROM customers
WHERE created_at >= '2024-01-01'
AND email NOT LIKE '%@test.com'
GROUP BY signup_month
ORDER BY signup_month DESC;
```
### High-Value Customer Segment
```sql
-- Customers with LTV > $5K
SELECT
c.customer_id,
c.email,
c.status,
SUM(p.amount) / 100 AS ltv,
COUNT(p.purchase_id) AS purchase_count,
MAX(p.purchase_date) AS last_purchase
FROM customers c
JOIN purchases p ON c.customer_id = p.customer_id
WHERE p.status = 'completed'
AND c.email NOT LIKE '%@test.com'
GROUP BY c.customer_id
HAVING ltv > 5000
ORDER BY ltv DESC;
```
How to Set This Up
Start with your three most-used tables. For each one, list the fields you actually query and what they mean in business terms.
Write definitions for your top 3-5 metrics. Include the SQL logic and any gotchas.
Document one known data quality issue (test accounts, duplicates, whatever you always have to filter out).
That's enough to start. Ask AI to write a query. When it gets something wrong, update the context file with the correct definition. After 10 queries, you'll have documented most of your business logic.
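A bare-bones starter file might look like this (section names follow the examples above; everything in parentheses is a placeholder to fill in):

```markdown
# data-dictionary.md

## Core Tables
### customers
Key fields:
- customer_id (PK)
- status: (list the real values and what each means)

## Key Metrics
### Active Customer
**Definition:** (your exact rule, e.g. the 90-day window)

## Known Data Issues
### Test Accounts
(your standard exclusion filter)
```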
Before and After
Without Memory
You: "Write a query to pull active customers from last quarter with LTV over $5K."
AI:
```sql
SELECT customer_id, email, SUM(purchase_amount) AS ltv
FROM customers
JOIN purchases ON customers.id = purchases.customer_id
WHERE status = 'active'
  AND purchase_date >= '2025-10-01'
  AND purchase_date < '2026-01-01'
GROUP BY customer_id
HAVING ltv > 5000;
```
Problems: Wrong quarter (used calendar Q4, not fiscal), wrong active definition (didn't check 90-day window), wrong LTV calculation (didn't exclude refunds or convert from cents), didn't filter test accounts, used wrong join field names.
You spend 20 minutes fixing it.
With Memory
You: "Pull Q4 active customers with LTV over $5K."
AI:
```sql
-- Q4 FY2025 high-value active customers (Jul-Sep 2025)
SELECT
c.customer_id,
c.email,
c.status,
SUM(p.amount) / 100 AS ltv,
MAX(p.purchase_date) AS last_purchase
FROM customers c
JOIN purchases p ON c.customer_id = p.customer_id
WHERE c.status = 'active'
AND p.status = 'completed'
AND c.email NOT LIKE '%@test.com'
AND c.email NOT LIKE '%@example.com'
AND c.customer_id > 100
  AND p.purchase_date >= '2025-07-01'
  AND p.purchase_date < '2025-10-01'
GROUP BY c.customer_id
HAVING ltv > 5000
  -- 90-day active check belongs in HAVING: aggregates like MAX()
  -- are not allowed in the WHERE clause
  AND MAX(p.purchase_date) >= DATE_SUB(CURRENT_DATE, INTERVAL 90 DAY)
ORDER BY ltv DESC;
```
Comment: Using fiscal Q4 (Jul-Sep), filtering for 90-day active status, excluding test accounts, calculating LTV from completed purchases only, converting from cents.
Correct quarter. Correct active definition. Correct LTV calculation. Test accounts filtered. Runs on first try.
What Changes
First week: Queries work on second or third try instead of fifth. You're still fixing business logic issues but not syntax errors.
First month: Most queries run correctly on first try. You're checking results for accuracy, not fixing SQL.
Three months: Context file has 15 table definitions, 20 metric calculations, common query patterns. AI generates production-ready queries. You spend time analyzing results instead of debugging SQL.
You get answers to business questions in 5 minutes instead of 45 minutes because AI knows your data model and business logic.
Stop Debugging Queries That Should've Worked
One markdown file. One afternoon. AI that actually remembers who you are, what you do, and how you work.
Build Your Memory System — $997