TrueMarket North Star

The Four Pillars

Evaluated in order. Each depends on the one before it. Don't skip ahead.

This file is strategy only. Tactical work lives in PILLAR_<n>_PLAN.md files. Operational issues and ticket-level execution live in Clubhouse.

Pillar 1 — Pool Coverage (CURRENT PRIORITY)

Can the system find the valuer's comps at all? Do they appear anywhere in the candidate pool?

Metric: % of valuer's picked comps found in the raw candidate pool
Target: 95%
Baseline (2026-04-07): 41.2%
Latest: see docs/CURRENT_PILLAR.md for the current phase status and metrics.

Pillar 2 — Ranking (R@20)

Do the valuer's comps land in the top 20 scored candidates?

Metric: R@20 — proportion of valuer comps in the top 20
Target: R@20 ≥ 95%
Baseline: 14.5%
Blocked until Pillar 1 hits 95%

Pillar 3 — Measurement Reliability

Same input = same output every time. Deterministic scoring required.

No numeric target — pass/fail on reproducibility

Pillar 4 — Valuation Accuracy

Final price estimate within ±10% of the valuer's adopted value.

Baseline: Unknown

Pipeline Rules (NON-NEGOTIABLE)

These rules govern how the pipeline operates. Every agent reads this file every session. No exceptions, no drift.

Search

HARD RULE — FIXED RADIUS. NO EXCEPTIONS.

Search uses ONE fixed radius per run. No radius expansion, no iterative widening, no nextRadiusStep across radii.

Rationale: Valuer Dan expands radius iteratively, but only because he hand-reviews each expansion pass with a fine-tooth comb. We can't replicate that yet. So we pick a radius sized to capture ~95% of likely comps for the subject type and stick with it.

Paginating within a single radius is allowed. Widening to the next step is prohibited.

Type-code translation (canonical → source-native codes like HOUSE → RPP HOUSE) is a SEPARATE concern and is allowed. It must never be labelled search_expansions to avoid confusion with radius expansion.

Fixed radius search. One search per run using params from search_parameters_master. No expansion rounds, no radius widening, no fallback searches.
Search params come from the database. search_parameters_master table, keyed by property_family + density. Don't invent parameters.
Date window = assessment_date minus dateRangeMonths, plus 180 days forward (CLB-2091). Valuers use comps up to 6 months after assessment date.
Fetch ALL results from RPP. No early-stop thresholds. Paginate through everything RPP returns, up to 10,000 record API ceiling.
Pagination offset = page index (0, 1, 2, 3...). RPP uses page-index pagination. Do NOT send byte offsets like 0, 200, 400 — those return HTTP 400. [PROMPT-INFRA-OK]
Property types: Send all types (HOUSE+UNIT) in a single combined API call. Do not loop through types separately.
RPP API budget: No pipeline fetches or API-spending operations without Jon's explicit per-conversation approval. Default to NO on ambiguity.

Data

Store ALL candidates. No row caps. Sort by sale_date DESC before storage. Cap at 2000 only for JSONB size.
All addresses must be geocoded (subjects AND comparables) via Google Geocoding before a run counts.
Match on CoreLogic ID first. candidateMatchesTruth() uses 4-tier matching: (1) CoreLogic ID, (2) hardened address, (3) address+date ±30 days, (4) building-level fallback for strata.
Google geocoding preflight: Check google_geocoded_addresses for normalised address before RPP resolution. Use Google's version first, fall back to raw.
RPP UNIT type is family-ambiguous. A UNIT row may be residential, commercial, industrial, or mixed-use. Never infer family from the TYPE field or from subtype strings (unit standard, house one storey). Use zoning, floor area, and subject-family context instead. See docs/DATA_MODEL.md — RPP Property Type Taxonomy. [PROMPT-INFRA-OK]
Subject property type comes from the valuations table, never from RPP. Use valuations.property_type + valuations.property_classification as the authoritative subject family. Never infer subject family from the RPP response (rppType, subtype string, or any derived field). Inferring from RPP re-introduces the same family-ambiguity problem as rule 12. [PROMPT-INFRA-OK]

Logging

Every API call logged with pipeline_run_id and search_run_id. No exceptions. If it touched an external API and isn't in api_call_log, it's a bug. [PROMPT-INFRA-OK]
search_run_id backfilled after persist via repository.js UPDATE.

Measurement

The 451 stratified cohort is the only test set. No separate gold cohort, no ad-hoc addresses.
Pillar 1 = pool coverage. Comp found anywhere in the candidate pool = hit. Not R@20, not ranking. Pool coverage only.
Every pipeline run shows on the dashboard with search params, candidates, pool coverage, and learnings for missed comps.
Learnings are evidence-based. For each missed comp: check CoreLogic ID in pool, check date window, check radius, check property type. Facts, not speculation.

Data Sources

RPP web portal vs CoreLogic API are different data paths. The pipeline uses the RPP web portal via session-cookied HTTP (rpp-direct-client.js + autopilot/adapters/rpp-direct-adapter.js). It does NOT use the CoreLogic partner API (corelogic-client.js / enrichProperty / /details endpoint). The two are different services, different auth, different field shapes, different contractual basis with Cotality. Never substitute one for the other. If a task seems to need partner-API detail calls, fix the RPP-portal path first — the portal already returns the data in most cases. Calling the partner API as a "backfill" is a contractual + cost decision, not a routine fix, and needs explicit Jon approval per Rule 7. If Jon explicitly asks for the partner API address matcher, the endpoint is /search/au/matcher/address and the property ID can live at matchDetails.propertyId; /search/au/address/matcher is the wrong path. See docs/DATA_MODEL.md — Two RPP data paths.

Process

Preflight validation before every run. Verify comps have geocodes and correct address format.
One source of truth. This file. If it's not written here, it's not a rule.

Reliability invariants

No silent fallbacks for Pillar 1 measurement. When the primary RPP source returns ok=false in stage_3_raw_candidates.sourceStats, the run MUST NOT reach status=review as a measurement. Vector-text / pgvector / any non-RPP fallback results carry a source ≠ rpp marker and recall scoring MUST refuse to count them toward Pillar 1. A pilot whose only candidates came from a fallback is invalid by definition. Why: 2026-04-29 Birchgrove subject completed status=review with 114 vector-text candidates from Vincentia/Cessnock/Yeppoon while RPP was rate-limited; the result looked like a normal recall failure when it was an auth blocker.
Deployed code must equal master before a measurement run. /opt/palermo/apps/truemarket/ is a flat copy, not a git checkout — pre-run gate verifies it matches origin/master HEAD (or document the exact commit it's on). Pilots against drifted code are invalid by definition. Why: 2026-04-29 sidecar ran 22h on stale code, masked PR #17/#18/#19 from production for hours after merge.
Process env must equal env-file at restart time. After updating /etc/palermo/truemarket.env, the running pid's /proc/<pid>/environ MUST reflect the new values. Restart procedure is part of the env update, not a follow-up. Why: 2026-04-29 RPP_BROWSER=chrome-linux was added to env file but the sidecar process kept its 23h-old environ until manual restart.

Source-of-Truth Hierarchy

When sources disagree about the current operational state, the higher item wins. Lower items can annotate or remind, never override.

Live code / running service — what actually executes. Always wins over any document or memory.
Clubhouse ticket status — task ownership, scope, acceptance criteria. The contract for "is this work done?"
CHECKPOINT.md / .agent-handoff.md — current state in the project.
Mem0 — curated cross-session decisions, root causes, milestones. Ground truth at the time it was written; verify before acting.
Claude-mem (auto observations) — index, not truth. Use for orientation; confirm against live sources.
git log / git blame — who-changed-what, when.

This mirrors the ordered hierarchy in ~/.claude/rules/core.md. The two files MUST stay in sync on the ordering. The carve-outs below are project-specific elaboration; if they prove generally useful, copy them back to core.md to keep the doctrine unified.

Carve-outs (the hierarchy is not a single linear order for every question):

For "what actually changed in this commit / who changed it / when" — git log / git blame is authoritative, regardless of its position in the operational ranking. Memory and checkpoints can be wrong about history; git cannot.
For "what does the database actually show" — query the DB. Tables (pipeline_runs, valuation_comparables, corelogic_property_cache, api_call_log, search_runs) are the recorded outcome of runs and outrank docs that describe runs.

Practical rule: pick the source with the most direct claim on the question. If a doc says X but grep says it doesn't exist, grep wins. If memory says a ticket is closed but Clubhouse says it's open, Clubhouse wins. If checkpoint says PR #N is blocked but live code shows the fix landed, live code wins.