Atlas Research Notebook Methodology: 65 Of 75 Symbol Pages Now Pass A Source-Backed Gate
The Atlas research notebook now serves 65 of 75 symbol pages through a source-backed quality gate that refuses any daily context lacking fetched-source provenance, with the rest held back as unvalidated rather than published thin.
That single number, 65 of 75, is the cleanest way to describe the state of the Atlas research surface as of the 2026-06-01 build window. The architecture stopped trying to publish every symbol and started publishing only what survives validation.
Table Of Contents
- What The 65-Of-75 Validation Split Means
- The Source-Provenance Quality Gate
- Source Authority Tiering In The Renderer
- Symbol Coverage And The Six Asset-Class Hubs
- The Free-Source Spine And SEC Company Facts
- Crawl Coverage As The Real Constraint
- Caveats: What Is Not Proven Or Not Live
What The 65-Of-75 Validation Split Means
A full daily-context run on 2026-05-28 (python3 atlas_dynamic_news_pipeline.py, no canary flag) produced 65 of 75 symbols marked validated_for_runtime and serving. The remaining 10 split into two distinct failure classes recorded in the upgrade ledger.
Two symbols were rejected by the gate, which is the fail-safe behaving correctly: XRP carried a disallowed <a> tag inside a snippet, and MA carried advice language flagged as "buy/sell now". Nine symbols (ABNB, AVGO, GOOGL, MU, NKE, SQ, TSLA, UKOIL, WMT) were skipped on parse failure after an NVIDIA HTTP 429 rate limit forced a fallback model that returned non-JSON output. Skipped symbols kept their prior on-disk state, so a failed generation never degrades what is already serving.
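The "skip keeps prior state" behavior can be illustrated with a minimal sketch. The function names and the write-then-swap approach are assumptions for illustration, not the pipeline's actual code; the only behavior taken from the notes is that a non-JSON response leaves the existing file untouched.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def parse_model_output(raw: str) -> Optional[dict]:
    """Return the parsed JSON object, or None when the output is not a JSON object."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def write_context_if_parsed(path: Path, raw: str) -> bool:
    """Replace the on-disk context only when the new output parses (hypothetical helper).

    On a parse failure this returns False and leaves the prior file as it was.
    """
    parsed = parse_model_output(raw)
    if parsed is None:
        return False
    # Write to a temp file in the same directory, then swap it in atomically,
    # so a crash mid-write cannot corrupt the context that is already serving.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(parsed, handle)
    os.replace(tmp_name, path)
    return True
```

Under this sketch, a fallback model returning plain text simply produces a False return, and the symbol keeps serving whatever was last written.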
The rebuild after that run produced 66 pages tagged index,follow (65 validated plus 1 page already trafficked in Google Search Console) and 9 tagged noindex,follow, with a sitemap carrying 67 URLs before hub pages were added. The indexable flag is no longer tied to a file simply existing on disk. A page is index,follow only when it is currently validated or already GSC-trafficked.
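The indexability rule reduces to a two-input decision. The sketch below is a hedged illustration: the function name and boolean inputs are assumptions, while the counts (65 validated, 1 GSC-trafficked, 9 neither) come from the rebuild described above.

```python
from collections import Counter


def robots_meta(validated_for_runtime: bool, gsc_trafficked: bool) -> str:
    """Return the robots directive for a symbol page (hypothetical helper).

    A page is indexable only when it is currently validated or already
    receives Google Search Console traffic; a file existing on disk is not enough.
    """
    if validated_for_runtime or gsc_trafficked:
        return "index,follow"
    return "noindex,follow"


# Reproduce the reported split: 65 validated, 1 trafficked-only, 9 neither.
pages = [(True, False)] * 65 + [(False, True)] + [(False, False)] * 9
split = Counter(robots_meta(validated, trafficked) for validated, trafficked in pages)
print(dict(split))
```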
The Source-Provenance Quality Gate
The gate lives in /root/Atlas_Backend/daily_context_quality_gate.py. It enforces a chain of checks that an early draft of the daily-context system did not have. Validating as_of against the current date rejects stale contexts. The fetched-source provenance requirement means the source manifest must be a subset of the URLs actually retrieved, recorded as source_manifest_source=fetched_news, rather than sources the model invented. The gate also applies stricter stale-forward-year patterns, advice-language patterns, an 8-class asset-class enum, balanced paragraph-tag validation, and a no-HTML-attribute rule.
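A few of these checks can be sketched in a compact form. This is an illustrative approximation, not the contents of daily_context_quality_gate.py: the context keys "source_manifest" and "html", the error labels, the regular expressions, and the assumption that only <p> tags are allowed are all hypothetical. Only the as_of check, the subset rule, and the source_manifest_source=fetched_news value are taken from the description above.

```python
import re
from datetime import date

# Hypothetical patterns; the real gate's pattern lists are not shown in the notes.
ADVICE_PATTERN = re.compile(r"\b(?:buy|sell)\s+now\b", re.IGNORECASE)
TAG_PATTERN = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*)>")


def gate_errors(context: dict, fetched_urls: set, today: date) -> list:
    """Return a sorted list of failure labels; an empty list means the context passes."""
    errors = set()
    if context.get("as_of") != today.isoformat():
        errors.add("stale_as_of")
    manifest = set(context.get("source_manifest", []))
    if (context.get("source_manifest_source") != "fetched_news"
            or not manifest
            or not manifest <= fetched_urls):
        errors.add("manifest_not_fetched")
    html = context.get("html", "")
    if ADVICE_PATTERN.search(html):
        errors.add("advice_language")
    depth, never_negative = 0, True
    for closing, name, rest in TAG_PATTERN.findall(html):
        if name.lower() != "p":
            errors.add("disallowed_tag")  # e.g. an <a> tag inside a snippet
            continue
        if rest.strip():
            errors.add("html_attribute")  # e.g. <p class="x">
        depth += -1 if closing else 1
        never_negative = never_negative and depth >= 0
    if depth != 0 or not never_negative:
        errors.add("unbalanced_paragraphs")
    return sorted(errors)
```

Returning every failure label, rather than stopping at the first, makes a rejection such as the XRP or MA case easier to record in a ledger.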
The reason the gate matters is visible in the quarantine notes. Before the gate existed, an earlier ETH context shipped a "Dencun upgrade, slated for 2024" line into a 2026 page. Once the gate was locked in, revalidating every file in /root/Atlas_Backend/cache/daily_contexts/*.json returned 0 validated contexts, because the existing files predated the fetched-source provenance requirement. The 65 valid contexts described above were regenerated against the stricter rules, not grandfathered in.
Runtime serving carries its own guard. In /root/Atlas_Backend/app.py, _check_daily_context() requires validated_for_runtime, a current as_of, a real stored brief_id, an unexpired TTL, a manifest that is a subset of the database sources, and a source_manifest_hash match before any context reaches a reader. An unvalidated symbol returns None rather than a half-checked page.
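The serve-time guard can be approximated as an all-or-nothing check. The sketch below is an assumption-laden illustration rather than the body of _check_daily_context(): the field names "expires_at_epoch" and "source_manifest", the SHA-256 hash over sorted URLs, and the function names are hypothetical. The list of conditions and the None return are from the description above.

```python
import hashlib
from datetime import datetime, timezone
from typing import Iterable, Optional


def manifest_hash(urls: Iterable[str]) -> str:
    """Hash the manifest in a stable order (the hashing scheme is an assumption)."""
    joined = "\n".join(sorted(urls))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def check_context(context: dict, db_sources: set, known_brief_ids: set,
                  now: datetime) -> Optional[dict]:
    """Return the context only if every runtime condition holds, else None."""
    manifest = context.get("source_manifest", [])
    checks = (
        context.get("validated_for_runtime") is True,
        context.get("as_of") == now.date().isoformat(),      # current as_of
        context.get("brief_id") in known_brief_ids,          # real stored brief_id
        context.get("expires_at_epoch", 0) > now.timestamp(),  # unexpired TTL
        bool(manifest) and set(manifest) <= db_sources,      # subset of DB sources
        context.get("source_manifest_hash") == manifest_hash(manifest),
    )
    return context if all(checks) else None


if __name__ == "__main__":
    now = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)
    urls = ["https://example.com/a"]
    ctx = {
        "validated_for_runtime": True, "as_of": "2026-06-01", "brief_id": "brief-1",
        "expires_at_epoch": now.timestamp() + 3600,
        "source_manifest": urls, "source_manifest_hash": manifest_hash(urls),
    }
    print(check_context(ctx, set(urls), {"brief-1"}, now) is not None)
```

Because the function has a single success path, a context that fails any one condition is indistinguishable from a missing context, which matches the "None rather than a half-checked page" behavior.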
Source Authority Tiering In The Renderer
Not every cited source carries equal weight, and the renderer reflects that. classify_source_tier() and tiered_sources() in /root/Atlas_Backend/renderers.py order citations with official sources (tier 1) above reputable press (tier 2) above aggregators (tier 3), then tag each with a label.
The tier-1 host list was expanded to include fca.org.uk, esma.europa.eu, boj.or.jp, pbc.gov.cn, imf.org, worldbank.org, and bis.org, covering the FCA, ESMA, Bank of Japan, People's Bank of China, IMF, World Bank, and BIS. Tier-2 was extended with nikkei.com, scmp.com, nasdaq.com, and nyse.com. The verification example is concrete: a GBPUSD report serves Bank of England, Federal Reserve, ONS, BLS, and BEA citations ahead of FXStreet, which sits ahead of DailyForex and Forex Factory.
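Host-based tiering of this kind can be sketched as follows. This is not the code in renderers.py: the function names, the suffix-matching rule, and the default of tier 3 for unknown hosts are assumptions, and the host sets contain only the additions named above rather than the full lists.

```python
from urllib.parse import urlsplit

# Only the hosts named in the notes; the renderer's full lists are longer.
TIER1_HOSTS = {"fca.org.uk", "esma.europa.eu", "boj.or.jp", "pbc.gov.cn",
               "imf.org", "worldbank.org", "bis.org"}
TIER2_HOSTS = {"nikkei.com", "scmp.com", "nasdaq.com", "nyse.com"}
TIER_LABELS = {1: "official", 2: "reputable press", 3: "aggregator"}


def classify_tier(url: str) -> int:
    """Map a URL to tier 1, 2, or 3 by exact host or subdomain match."""
    host = (urlsplit(url).hostname or "").lower()
    for tier, hosts in ((1, TIER1_HOSTS), (2, TIER2_HOSTS)):
        if any(host == known or host.endswith("." + known) for known in hosts):
            return tier
    return 3  # assumption: anything unrecognized is treated as an aggregator


def order_by_tier(urls: list) -> list:
    """Return (tier, label, url) tuples, best tier first.

    sorted() is stable, so citations keep their original order within a tier.
    """
    tagged = [(classify_tier(url), url) for url in urls]
    return [(tier, TIER_LABELS[tier], url)
            for tier, url in sorted(tagged, key=lambda item: item[0])]
```

Because the ordering depends only on the URL, it can run at render time against contexts that were validated before tiering existed.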
This tiering shipped as a render-time, additive change, so it applies to the 65 already-validated contexts with no pipeline rerun and no change to the gate's required keys. That design choice is deliberate: adding tier data as a required gate key would have invalidated the live 65 at serve time.
Symbol Coverage And The Six Asset-Class Hubs
Public symbol coverage is browsable through the Atlas symbol coverage index. The build adds 6 asset-class hub pages under /var/www/atlas/stocks/<class>/ for equities, ETF, crypto, forex, commodity, and index, plus related-symbol peer cross-links on every symbol page and a hub bar on the /stocks/ landing. Adding the hubs moved the sitemap from 67 to 73 URLs while the symbol count of 75 and the 66/9 index/noindex split stayed unchanged.
The asset-class enum is not cosmetic. The expanded canary run validated DIA as etf, JP225 as index, AAPL as equity, GBPUSD as forex, BTC as crypto, and UKOIL as commodity, each through the same gate. Cross-asset context, from a US large-cap equity to a sterling forex pair to a spot commodity, runs through one validation path rather than separate ad hoc renderers.
The Free-Source Spine And SEC Company Facts
The Atlas data spine pulls from OpenBB, DefiLlama, and RSS-fed news, the same public source families that let cross-asset market intelligence sit on one schema. The methodology for those data sources is documented publicly by the providers, and the OpenBB documentation and the DefiLlama site are the canonical references for the data shapes Atlas consumes.
The 2026-06-01 build window shows the free-source layer actively caching primary filings. Between 05:00 and 05:15 UTC, the pipeline wrote SEC company-facts and submissions files for a cluster of large-cap names, including sec_companyfacts_0000320193.json (Apple), sec_companyfacts_0000789019.json (Microsoft), sec_companyfacts_0001045810.json (NVIDIA), sec_companyfacts_0001318605.json (Tesla), and sec_companyfacts_0001326801.json (Meta), each into /root/Atlas_Backend/cache/free_sources/. The market-universe lists nasdaqlisted.txt and otherlisted.txt were refreshed at 04:27 UTC. This is the plumbing meant to carry real reported numbers into symbol pages, although the audit marks free_sources.py and fmp_data.py as not yet wired into the daily writer.
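The cache filenames above follow the SEC convention of a 10-digit, zero-padded CIK. A small sketch of that naming, assuming a hypothetical helper (the CIK values and filename prefix are read directly off the filenames listed above):

```python
from pathlib import Path

# Directory taken from the build notes.
CACHE_DIR = Path("/root/Atlas_Backend/cache/free_sources")

# CIK numbers as they appear in the cached filenames.
CIK_BY_NAME = {
    "Apple": 320193,
    "Microsoft": 789019,
    "NVIDIA": 1045810,
    "Tesla": 1318605,
    "Meta": 1326801,
}


def companyfacts_filename(cik: int) -> str:
    """Build the cache filename for a CIK, zero-padded to 10 digits."""
    return f"sec_companyfacts_{cik:010d}.json"


def companyfacts_path(cik: int) -> Path:
    """Return the expected cache path (hypothetical helper; nothing is read or written)."""
    return CACHE_DIR / companyfacts_filename(cik)
```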
The public health probe confirms the backend is live: /api/atlas/health returns ok: true through https://atlas.freedomcore.io/api/atlas/health, and the served HTML carries versioned /css/atlas.css and /js/atlas.js cache tags.
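A probe of this endpoint can be sketched with the standard library. The URL and the ok field are from the text above; the function names, timeout, and the choice to treat any malformed body as unhealthy are assumptions.

```python
import json
import urllib.request

HEALTH_URL = "https://atlas.freedomcore.io/api/atlas/health"


def is_healthy(body: bytes) -> bool:
    """Return True only when the body is a JSON object with ok set to true."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("ok") is True


def probe(url: str = HEALTH_URL, timeout: float = 5.0) -> bool:
    """Fetch the health endpoint and evaluate it; network errors count as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_healthy(response.read())
    except OSError:  # URLError, HTTPError, and timeouts all derive from OSError
        return False


if __name__ == "__main__":
    print("healthy" if probe() else "unhealthy")
```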
Crawl Coverage As The Real Constraint
The bottleneck is not content volume. Google Search Console Coverage exports show "Discovered - currently not indexed" dominating, with an Atlas baseline of 61 such URLs against 3 in "Crawled - currently not indexed", which points at crawl budget and domain authority rather than thin pages. A gsc_index_monitor.py script, backed by gsc_index_snapshots.json, parses the per-property GSC zips and reports per-reason deltas against a prior snapshot. It was validated at the Atlas=61 baseline, with zero delta on rerun.
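The snapshot comparison amounts to a per-reason difference between two count maps. The sketch below assumes a snapshot is a plain mapping from coverage reason to URL count; that shape and the function name are hypothetical, and the actual layout of gsc_index_snapshots.json is not shown in the notes.

```python
def reason_deltas(previous: dict, current: dict) -> dict:
    """Return current minus previous for every reason seen in either snapshot.

    A reason missing from one side is treated as a count of zero, so newly
    appearing and fully resolved reasons both show up in the result.
    """
    reasons = sorted(set(previous) | set(current))
    return {reason: current.get(reason, 0) - previous.get(reason, 0)
            for reason in reasons}


# Baseline counts from the coverage export described above.
baseline = {
    "Discovered - currently not indexed": 61,
    "Crawled - currently not indexed": 3,
}

# Rerunning against the same export should report no movement.
rerun = reason_deltas(baseline, dict(baseline))
print(rerun)
```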
The response was structural, not promotional: dropping a sitemap that advertised noindex URLs, adding tier and style hubs with breadcrumbs and BreadcrumbList JSON-LD, and deriving lastmod from content timestamps rather than rebuild time so the sitemap stops claiming false daily freshness. Schema markup follows the public Schema.org BlogPosting vocabulary. The measurement window is the next GSC export in two to four weeks, expected to show discovered-not-indexed trending down. The broader FreedomCore surface and the Maverick dashboard sit alongside Atlas in the same property family.
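Deriving lastmod from content rather than rebuild time can be sketched as below. The function name, the date-only lastmod format, and the example URL path are assumptions; the point is only that the rebuild clock never enters the calculation.

```python
from datetime import datetime, timezone
from xml.sax.saxutils import escape


def sitemap_entry(loc: str, content_updated: datetime) -> str:
    """Render one <url> element whose lastmod reflects the content timestamp.

    content_updated must be timezone-aware; it is normalized to UTC and
    emitted as a W3C date (YYYY-MM-DD), which the sitemap protocol accepts.
    """
    if content_updated.tzinfo is None:
        raise ValueError("content_updated must be timezone-aware")
    lastmod = content_updated.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"<url><loc>{escape(loc)}</loc><lastmod>{lastmod}</lastmod></url>"


if __name__ == "__main__":
    # Hypothetical page whose content last changed on 2026-05-28, rebuilt later.
    content_time = datetime(2026, 5, 28, 9, 30, tzinfo=timezone.utc)
    print(sitemap_entry("https://atlas.freedomcore.io/stocks/AAPL/", content_time))
```

A page rebuilt daily but unchanged since 2026-05-28 keeps that date in the sitemap, so the sitemap stops signaling updates that did not happen.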
Caveats: What Is Not Proven Or Not Live
Several items are explicitly not finished. The SEC and financial-data layer (free_sources.py, fmp_data.py) is cached but still unwired into the daily writer, so symbol pages do not yet carry those reported numbers. The redesigned render_email_html template is built and previewable at /var/www/freedomcore/atlas_email_preview.html but is not wired into the live send path, which still falls back to render_brief_email. The 9 parse-fail symbols await a JSON-validated fallback model rung so a 429 stops skipping them. key_levels stays an empty array until a real price feed and a levels source manifest exist, and STRIPE_ENABLED remains false and operator-gated.
The crawl-budget result is unmeasured: the GSC trend gate has a defined baseline but no post-change reading yet. None of this constitutes a forecast about any asset; the notebook documents a validation pipeline and the evidence behind each served page. Read the full set in the Atlas notes notebook.