# Graduation Alliance Brand Study — Wave 0 Stratification Map (`_profile.md`)

**FROZEN 2026-06-25.** This is the denominator sheet every Wave-1 lens references.
Question under study: *Competitive creative strategy for Graduation Alliance* (online diploma / dropout-recovery) vs Grad Solutions · Learn4Life · AHSA · ChanceLight.
Corpus: `capture/bundle/all_mentions.jsonl`, **14,119 records** (post-window, post-dedup; 4,410 dropped from 18,529 raw = 1,004 empty + 0 dupes + 2,688 pre-window stale + 718 Dropout.tv-fandom contamination, the last figure being the remainder by subtraction).
Companion: `_persona-roster.md` (the frozen 4–6 persona set). Rigor rails: `kb/brand-study-recipe.md` §3 (rails 1–8). Bias/sub-tiering source: `COVERAGE.md`.

> **How to use this file.** This is a **creative-led, not a review-led, category**: 1,110 live ads vs 83 third-party reviews, the inverse of an ecommerce VOC study. Every prevalence number a lens reports must name *which base* it is counted against (rail 2). The four bases are defined in §3. The single most-used denominator is **n=2,957 on-topic VOC** (derived in §2, defined as a base in §3); every persona/trigger/barrier/objection % in `prevalence.md` is counted against it. Cross-brand/cross-surface review claims must obey the bias frame in §4 and the window rules in §5. A claim that ignores these does not ship.

---

## 1. Counts by stratum

### By source
| source | records | rating-bearing? |
|---|---:|---|
| reddit | 12,048 | no (social/advice — demand voice) |
| google_ads | 1,029 | no (ad creative; **`text` empty on all** — format/scale only) |
| youtube | 454 | no (creator + comment; dates approximate) |
| instagram | 282 | no (brand posts + comments) |
| tiktok | 112 | no (creator + comment) |
| meta_ads | 81 | no (ad creative; **copy-bearing** — the only readable ad text) |
| google_maps | 79 | **yes** (real third-party tail: 71×5★ · 6×1★ · 2×2★) |
| ownsite | 30 | no (brand-claimed testimonials, curated) |
| niche | 2 | yes (curated featured reviews — directional only) |
| greatschools | 2 | yes (directional only) |
| **TOTAL** | **14,119** | |

### By bucket / brand
| bucket | role | records | ad creatives (meta+google) | third-party reviews |
|---|---|---:|---:|---:|
| **graduation_alliance** | subject (B2B + adult-direct, multi-state) | 788 | 637 (56 meta + 581 google) | 72 gmaps + 1 niche |
| grad_solutions | direct B2C (AR/AZ) | 203 | 33 (4 meta + 29 google) | 7 gmaps |
| learn4life | direct B2C (parent-of-teen) | 435 | 256 (14 meta + 242 google) | 0 in-window (FB recs pre-window) |
| ahsa | direct B2C (Miami/intl, prestige) | 227 | 184 (7 meta + 177 google) | 2 greatschools + 1 niche |
| chancelight | B2B → district (the whitespace) | **0** | **0** | **0** — invisible in consumer market (LinkedIn-recruiting only) |
| **category** | buyer-problem demand voice | 12,466 | — | — (Reddit 12,048 + YouTube 418) |

> `bucket` (not `brand`) is the partition key. ChanceLight's 0 across every consumer surface is a **strategic finding** (most-credible competitor cedes the consumer lane), not a capture failure — see §4 and `03_whitespace_positioning.md` §3.

### By kind / buyer_stage
| kind | buyer_stage | records |
|---|---|---:|
| comment | social/advice | 8,890 |
| post | advice-seeking-or-broadcast | 3,879 |
| ad | advertiser | 1,110 |
| video | creator | 127 |
| review | purchaser (third-party) | 83 |
| testimonial | brand-claimed | 30 |

### By year (coarse — `created_iso[:4]`; ads dated by *launch*, kept by *still-active*)
2023 → 45 · 2024 → 207 · 2025 → 7,666 · 2026 → 6,186. (~16 records carry only relative YouTube timestamps "N years ago" snapped to capture — coarse bins, no exact trend.) **Volume is recency-weighted by the 18-month window; NOT comparable cross-source over time — see §5.**
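
The counts in this section can be re-derived from the bundle with a simple tally. The sketch below is illustrative, not the pipeline's actual code: it assumes each JSONL line is one object carrying `source`, `bucket`, `kind`, and `created_iso` keys (`source` and `kind` are inferred from the table headers above; only `bucket` and `created_iso` are named explicitly in this file), and the helper names are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable

# Assumed record keys; "source" and "kind" are inferred from the table headers.
STRATA = ("source", "bucket", "kind")


def load_records(path: str) -> list[dict]:
    """Read a JSONL file: one JSON object per non-blank line."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def stratum_counts(records: Iterable[dict]) -> dict[str, Counter]:
    """Tally records per stratum, plus a coarse year bin from created_iso[:4]."""
    counts: dict[str, Counter] = {key: Counter() for key in (*STRATA, "year")}
    for rec in records:
        for key in STRATA:
            counts[key][rec.get(key) or "unknown"] += 1
        created = rec.get("created_iso") or ""
        counts["year"][created[:4] or "undated"] += 1
    return counts


# Tiny in-memory sample (made-up records, not rows from the corpus).
sample = [
    {"source": "reddit", "bucket": "category", "kind": "comment", "created_iso": "2025-03-02T10:00:00Z"},
    {"source": "reddit", "bucket": "category", "kind": "post", "created_iso": "2026-01-15T08:30:00Z"},
    {"source": "meta_ads", "bucket": "graduation_alliance", "kind": "ad", "created_iso": "2024-01-12T00:00:00Z"},
    {"source": "youtube", "bucket": "category", "kind": "video", "created_iso": None},
]
counts = stratum_counts(sample)
print(counts["source"].most_common())
print(sorted(counts["year"].items()))
```

Against the real bundle, `stratum_counts(load_records("capture/bundle/all_mentions.jsonl"))` would be expected to reproduce the tables above, provided those key names hold.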

---

## 2. The on-topic / off-topic split — the rail-2 foundation (where the demand voice actually lives)

The corpus was classified full-corpus (gpt-4.1-mini via Zhiyun, schema-validated — `prevalence.md`): **13,075 records classified** (81 Meta ads + 12,994 VOC; the 1,029 Google ads carry no `text` and are not VOC-classifiable). Of the 12,994 VOC records:

| | records | share of VOC |
|---|---:|---:|
| **ON-TOPIC VOC (the demand denominator, n=2,957)** | **2,957** | **23%** |
| off-topic / noise (flagged, excluded from prevalence) | 10,037 | 77% |

**This is the load-bearing stratification fact: only ~23% of the raw social volume is genuine dropout-recovery / diploma-completion intent.** The other 77% is community chatter that matches query tokens but isn't enrollment-intent (teacher-policy debate, post-secondary tangents, generic teen life). **Every persona/trigger/barrier/objection % in `prevalence.md` is counted against the 2,957 on-topic base, never against the 12,994 raw VOC or the 14,119 corpus.** A lens that quotes a prevalence % must say "of on-topic VOC, n=2,957."
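
One way to keep the base explicit is to route every percentage through a helper that will not format a number without naming its denominator. A minimal sketch with a hypothetical function name; the three base sizes are the ones stated in this file.

```python
# Base sizes as stated in this file.
BASES = {
    "on-topic VOC": 2_957,
    "raw VOC": 12_994,
    "corpus": 14_119,
}


def prevalence(count: int, base: str = "on-topic VOC") -> str:
    """Format a share together with the base it is counted against."""
    n = BASES[base]  # a KeyError on an undeclared base is intentional
    if not 0 <= count <= n:
        raise ValueError(f"count {count} is outside 0..{n} for base {base!r}")
    return f"{100 * count / n:.0f}% of {base}, n={n:,}"


# The on-topic share of raw VOC, reproduced from the table above.
print(prevalence(2_957, base="raw VOC"))  # 23% of raw VOC, n=12,994
```

Defaulting `base` to the on-topic VOC mirrors the rule above: any other denominator has to be asked for by name.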

### Reddit sub-tiering (COVERAGE bias rail — signal-cleanliness, not volume)
Reddit is 85% of the corpus but volume ≠ signal. Tier by enrollment-intent cleanliness; **prefer thread/sub diversity over raw comment counts** (a few mega-threads dominate):

| Tier | Communities (records) | Read as |
|---|---|---|
| **Cleanest buyer-intent (LEAD here)** | r/GED 790 · r/highschool 1,615 · r/AdultEducation 46 · r/findapath 1,153 | prospective-student / at-risk-adult problem voice — the demand gold |
| **Youth in-school stall (lead, noisy)** | r/teenagers 2,014 | in-school "might not graduate / dropping out" panic — high volume, use for the youth stall, filter noise |
| **Parent / payer lens** | r/Parenting 1,058 | the *buyer/decision-maker* for the youth track (loose token matches — filter by topic before use) |
| **Educator lens (SEPARATE read, NOT demand)** | r/Teachers 3,519 | supply-side *teacher* sentiment about dropouts, **thread-inflated** (driven by ~2 viral 500-comment-capped threads). Relevant to GA's district/B2B side ONLY. **Never read as buyer demand.** |
| **Post-secondary tangent (weight down)** | r/college 1,853 | post-secondary, mostly off-topic to HS-completion intent |

> r/dropout (Dropout.tv comedy fandom) was auto-excluded by `is_contaminated()` before this corpus — pure noise, not a dropout community. The demand-voice lens (`02_demand_voice.md`) analyzed ~6,011 lead-tier + YouTube records and explicitly **excludes r/Teachers**.
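
The tiering and the "thread diversity over raw counts" rule can be sketched as a lookup plus a distinct-thread tally. This is an illustration under assumptions: the `subreddit` and `thread_id` keys and the tier labels are hypothetical names, not confirmed fields of the bundle.

```python
from collections import defaultdict

# Tier labels are shorthand for the table above; key names are assumed.
TIER_BY_SUB = {
    "GED": "lead",
    "highschool": "lead",
    "AdultEducation": "lead",
    "findapath": "lead",
    "teenagers": "youth_stall",
    "Parenting": "parent_payer",
    "Teachers": "educator",        # separate read, never buyer demand
    "college": "post_secondary",   # weight down
}


def thread_diversity(records: list[dict]) -> dict[str, tuple[int, int]]:
    """Map each subreddit to (record count, distinct thread count)."""
    n_records: dict[str, int] = defaultdict(int)
    threads: dict[str, set] = defaultdict(set)
    for rec in records:
        sub = rec["subreddit"]
        n_records[sub] += 1
        threads[sub].add(rec["thread_id"])
    return {sub: (n_records[sub], len(threads[sub])) for sub in n_records}


def drop_educator(records: list[dict]) -> list[dict]:
    """Exclude the educator tier (r/Teachers) from a demand read."""
    return [r for r in records if TIER_BY_SUB.get(r["subreddit"]) != "educator"]


# Made-up sample: one inflated r/Teachers thread vs two r/GED threads.
sample = (
    [{"subreddit": "Teachers", "thread_id": "t1"}] * 3
    + [{"subreddit": "GED", "thread_id": "g1"}, {"subreddit": "GED", "thread_id": "g2"}]
)
print(thread_diversity(sample))
print(len(drop_educator(sample)))
```

Reporting the pair (records, threads) side by side makes a mega-thread visible: three records from a single thread count as one voice cluster, not three.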

### YouTube category voice (n=418 category comments/videos)
Skews to *documentary-reaction* audiences (dropout docs — Sulaiman/Christine/Marco, "GED coach", "back to school as an adult"). Buyer-adjacent empathy + outcome language; lighter on how-to GED purchase intent. Dates approximate (relative timestamps) → §5.

---

## 3. The four evidence bases (denominators — rail 2)

Every number must declare which of these it is counted against. This category has a *creative* spine and a *demand-voice* spine, and only a thin *brand-sentiment* tail — so the bases differ from an ecommerce VOC study.

| Base | Records | What it is | Use for | NEVER use for |
|---|---:|---|---|---|
| **CREATIVE BASE (advertiser)** | **1,110** | 1,029 Google (format/scale, no copy) + 81 Meta (copy-bearing) | angle / offer / format teardown, saturation, share-of-voice | reading copy off Google ads (empty `text`) |
| **DEMAND-VOICE BASE (on-topic VOC)** | **2,957** | Reddit (tiered) + YouTube category, classified on-topic | triggers, barriers, objections, persona voice, % prevalence | counting as *brand* sentiment |
| **BRAND-SENTIMENT BASE (third-party reviews)** | **83** | GA 72 gmaps (68×5★·2×2★·2×1★) + GS 7 gmaps + 2 greatschools + 2 niche | GA's *actual* customer sentiment (theme incidence, complaint-to-complaint) | cross-brand star comparison; magnitude (solicitation-inflated, §4) |
| **BRAND-CLAIMED BASE (own-site)** | **30** | GA 2 + GradSol 8 + L4L 20 own-site testimonials | "claimed strengths / retention language" | a satisfaction benchmark (100% positive by construction) |

**The brand-sentiment base is tiny by category nature, not by capture failure** (COVERAGE GO/NO-GO: the thin review tail is "a documented characteristic of the vertical"). GA's 72 Google Maps reviews are the *only* real third-party voice of GA's own customers in the study (GradSol's 7 Maps reviews are the only other in-window Maps tail). Read **themes** (the coach = the product), never the 68/72 ratio.

**Per-brand creative denominators** (the cross-brand share-of-voice frame — `prevalence.md` / `03`):

| bucket | meta (copy) | google (scale) | total ads | share of 1,110 |
|---|---:|---:|---:|---:|
| graduation_alliance | 56 | 581 | **637** | **57%** |
| learn4life | 14 | 242 | 256 | 23% |
| ahsa | 7 | 177 | 184 | 17% |
| grad_solutions | 4 | 29 | 33 | 3% |
| chancelight | 0 | 0 | 0 | 0% |
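
The share column is plain arithmetic over the per-bucket ad counts. A short check, using the (meta, google) pairs from the table; the function name is illustrative.

```python
# (meta, google) ad counts per bucket, copied from the table above.
ADS = {
    "graduation_alliance": (56, 581),
    "learn4life": (14, 242),
    "ahsa": (7, 177),
    "grad_solutions": (4, 29),
    "chancelight": (0, 0),
}


def share_of_voice(ads: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Return each bucket's share of all ads, rounded to a whole percent."""
    total = sum(meta + google for meta, google in ads.values())
    return {b: round(100 * (m + g) / total) for b, (m, g) in ads.items()}


total_ads = sum(m + g for m, g in ADS.values())
print(total_ads)             # 1110
print(share_of_voice(ADS))
```

Because Google ads carry no copy, these shares describe volume only, and the GA Meta figure is subject to the capture caveat in §4a.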

---

## 4. BIAS-CORRECTION FRAME — NON-NEGOTIABLE

This is a **creative-led** category: the differentiator is angle/offer teardown + demand voice, not a review distribution. The bias rails differ from those of an ecommerce study but are no less load-bearing.

### 4a. Ad-copy reads
1. **Copy is read off Meta ONLY.** All 1,029 Google ads have empty `text` — they give scale, format, and run-dates, never copy. Any "GA says X" claim cites a **Meta** id. (`text`-less Google ads = volume signal only.)
2. **GA Meta is 56 of ~102.** The missing ~46 are near-duplicate per-state variants of captured templates — **distinct concepts are covered, exact per-state frequency is undercounted.** Report concept coverage, never a precise "X% of GA ads."
3. **Ads are kept by last-shown / `is_active`, not launch date.** A 2023-launched evergreen still running is in — correct for a *creative inventory*, wrong as a "what's new" signal. Don't read ad `created` as recency without checking `is_active`/`end`.
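
These rules lend themselves to small guard functions in a lens script. The sketch below is hypothetical: it assumes ad records expose `source`, `text`, `created`, and `is_active` (the last three are named in this file; their exact types and the function names are assumptions).

```python
WINDOW_START = "2024-12-25"  # window cutoff stated in §5


def can_cite_copy(ad: dict) -> bool:
    """Rule 1: only Meta ads with non-empty text may back a copy claim."""
    return ad.get("source") == "meta_ads" and bool((ad.get("text") or "").strip())


def is_pre_window_evergreen(ad: dict) -> bool:
    """Rule 3: still active but launched before the window (inventory, not news)."""
    created = (ad.get("created") or "")[:10]  # ISO dates compare correctly as strings
    return bool(ad.get("is_active")) and bool(created) and created < WINDOW_START


# Made-up ads for illustration; the text is a placeholder, not captured copy.
google_ad = {"source": "google_ads", "text": "", "created": "2023-07-31", "is_active": True}
meta_ad = {"source": "meta_ads", "text": "placeholder ad copy", "created": "2025-02-01", "is_active": True}

print(can_cite_copy(google_ad), can_cite_copy(meta_ad))
print(is_pre_window_evergreen(google_ad), is_pre_window_evergreen(meta_ad))
```

Rule 2 is deliberately not encoded: concept coverage for GA Meta is a reporting convention, not something a per-record check can enforce.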

### 4b. Review surfaces — never comparable on raw stars
| Surface | Engine | Shape | Read it as |
|---|---|---|---|
| **GA own-site (2)** + L4L/GradSol testimonials | brand-curated | 100% positive | **CLAIMED STRENGTHS — not satisfaction.** Curated; zero negative tail by construction. |
| **GA Google Maps (72)** | Google | 68×5★ · 2×2★ · 2×1★ | The **only real third-party tail.** Use themes + the n=4 complaint base — **NOT the 68/72 ratio** (solicitation-inflated). |
| **Niche (2) / GreatSchools (2)** | curated/featured + PX-walled | featured-only | Directional only; thin sources (<30, rail 6). |

**THE RULES THE LENSES MUST FOLLOW:**
1. **Never compare review-star averages across brands or surfaces.** GA's 5★-heavy Maps profile set against any other brand or surface is a surface artifact, not a quality verdict. (Cross-brand it is mostly moot anyway: only GA and GradSol have in-window Google Maps reviews, and AHSA's 3 GreatSchools/Niche entries are featured-only and directional.)
2. **Compare by THEME INCIDENCE, complaint-base to complaint-base.** The valid read is *"58 of GA's 68 five-star reviews credit a named coach"* and *"GA's n=4 sub-5★ tail is pace-bait-and-switch + unresponsive support"* — never a star average.
3. **5★ magnitude is INFLATED by active review-solicitation** — a 2★ reviewer reports being asked for a Google review *"every single day"* `[graduation_alliance · google_maps · Ci9DQUlRQUNvZENodHljRjlvT2xRMlZGQTNaVzVNVG5WTVYzazFXRWRwU20wdFRGRRAB]`. Use review **themes** as signal, not the ratio. Report the inflation as a finding, not a buried caveat (rail 3).
4. **The on-topic VOC (2,957) is demand voice, NOT customer sentiment.** It tells you what the *market* fears/wants (the messaging gold), not how GA's *customers* feel. Don't blend the two bases.
5. **The "real diploma, NOT a GED" wedge and the state/district white-label are GA's two claim-level differentiators.** The white-label, however, means only **1 of 56** GA Meta ads names "Graduation Alliance": brand equity is forfeited for local legitimacy (a finding for the competitive/angles lenses, not a data error).
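
Rule 2 can be illustrated with a theme-incidence tally that keeps the praise base and the complaint base separate instead of averaging stars. The sketch assumes reviews have already been tagged with a hypothetical `themes` list and carry an integer `stars` field; the sample rows and theme labels are invented and do not reproduce the corpus.

```python
def theme_incidence(reviews: list[dict], theme: str, *, five_star: bool) -> tuple[int, int]:
    """Count a theme within the 5-star base or within the sub-5-star base.

    Returns (hits, base_size) so the base is always reported with the count.
    """
    base = [r for r in reviews if (r["stars"] == 5) == five_star]
    hits = sum(1 for r in base if theme in r.get("themes", ()))
    return hits, len(base)


# Invented sample rows; theme labels are placeholders.
reviews = [
    {"stars": 5, "themes": ["named_coach"]},
    {"stars": 5, "themes": ["named_coach", "flexible_pace"]},
    {"stars": 5, "themes": []},
    {"stars": 2, "themes": ["unresponsive_support"]},
    {"stars": 1, "themes": ["unresponsive_support", "pace_mismatch"]},
]

hits, base = theme_incidence(reviews, "named_coach", five_star=True)
print(f"{hits} of {base} five-star reviews credit a named coach")

hits, base = theme_incidence(reviews, "unresponsive_support", five_star=False)
print(f"{hits} of {base} sub-5-star reviews cite unresponsive support")
```

Returning the pair rather than a ratio keeps the "X of Y" phrasing the rule asks for and makes a tiny complaint base obvious at the point of use.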

---

## 5. WINDOW-ALIGNMENT RULES — which comparisons are valid

| Source | Date span (in-window) | Note |
|---|---|---|
| reddit | 2024-12-25 → 2026-06-24 | full 18-mo window, all 12,048 dated |
| google_ads | 2023-07-31 → 2026-06-16 | **kept by last-shown/active** — launch dates pre-date the window for evergreens |
| **youtube** | 2025-06-24 → 2026-06-24 | **dates APPROXIMATE** — relative timestamps ("N years ago") snapped to capture. **Coarse bins only; NO exact time-trend.** |
| instagram | 2025-01-27 → 2026-06-24 | recent-only |
| tiktok | 2025-01-21 → 2026-06-24 | recent-only |
| meta_ads | 2024-01-12 → 2026-06-23 | kept by active |
| google_maps | 2025-06-24 → 2026-06-17 | recent-only |
| ownsite | 2025-06-04 → 2026-05-28 | recent-only |
| niche / greatschools | 2025-05 → 2025-10 | thin (<30), directional |

**RULES:**
- **Window is aligned at ≥2024-12-25 (18 months).** Within-18-month cross-source theme comparison is valid; **do NOT trend before the cutoff** — that data was dropped by design (2,688 stale records excluded).
- **NO cross-brand time-trend claims.** Several competitor social slices are recent-only and ads are kept-if-active; trend lines would compare misaligned windows. (Facebook "recommend %" recommendations are all dated 2023, so the window rule excludes them; relax the window for Facebook only if that signal is wanted.)
- **YouTube = thematic/persona voice only**, never "growing/declining over time."
- **When a section leans on a recent-only, thin, or single-source slice, flag it** (rail 2 / rail 5 / rail 6).
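
As a rough aid for the flagging rule, a lens script could attach caveats to each source slice before any comparison is drawn. A minimal sketch: the source groupings are read off the table above, the thin-source threshold of 30 follows the "<30" note in this file, and the function and flag names are hypothetical.

```python
from datetime import date

WINDOW_START = date(2024, 12, 25)  # 18-month cutoff
THIN_N = 30                        # "thin (<30)" per the table above

KEPT_BY_ACTIVE = {"google_ads", "meta_ads"}
RECENT_ONLY = {"instagram", "tiktok", "google_maps", "ownsite"}
APPROX_DATES = {"youtube"}


def in_window(created_iso: str) -> bool:
    """True if a dated (non-ad) record falls on or after the cutoff."""
    return date.fromisoformat(created_iso[:10]) >= WINDOW_START


def caveats(source: str, n_records: int) -> list[str]:
    """List the flags a lens should state when leaning on this slice."""
    flags = []
    if source in KEPT_BY_ACTIVE:
        flags.append("kept-by-active: launch date is not recency")
    if source in RECENT_ONLY:
        flags.append("recent-only: no time trend")
    if source in APPROX_DATES:
        flags.append("approximate dates: coarse bins only")
    if n_records < THIN_N:
        flags.append("thin source: directional only")
    return flags


print(in_window("2024-12-24T23:59:00Z"), in_window("2024-12-25T00:00:00Z"))
print(caveats("youtube", 454))
print(caveats("niche", 2))
```

An empty list, as for the Reddit slice, means no source-level caveat applies; thread inflation and single-source themes still have to be flagged separately.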

---

## 6. Lens checklist (carry into every Wave-1 lens)
- [ ] Demand prevalence counted against the **2,957 on-topic VOC base** (state "of on-topic VOC, n=2,957"), with Reddit sub-tier + source mix.
- [ ] Ad-copy claims cite a **Meta** id (Google ads are copy-less); report concept coverage, not a precise %, given GA's 56-of-~102 capture.
- [ ] Brand sentiment read off GA's **72 Maps reviews by theme**, complaint-to-complaint — never the 68/72 ratio; flag solicitation inflation.
- [ ] Own-site testimonials (30) treated as **claimed strengths**, never satisfaction.
- [ ] r/Teachers kept in the **educator/B2B** read, OUT of the demand read; mega-thread inflation respected (sub diversity > raw counts).
- [ ] No theme counted against the 10,037 off-topic VOC as if it were signal.
- [ ] No time-trend across sources; YouTube coarse-bin only; recent-only / thin sources flagged.
- [ ] ChanceLight's 0-everywhere treated as the strategic whitespace finding, not a hole.
- [ ] Single-source themes flagged explicitly.
