Methodology
A complete description of how Beeks.ai collects, normalizes, matches, and aggregates prediction market data.
Data collection
A Cloudflare Worker runs on a one-minute cron schedule. On each run it fetches open markets from Polymarket's CLOB API, Kalshi's Trade API, and Manifold's REST API in parallel. Up to 300 markets per run (100 per platform) are ingested.
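A minimal sketch of one collection run follows. It is written in Python for illustration; the production Worker is not shown here. The fetcher functions are hypothetical stand-ins for the Polymarket, Kalshi, and Manifold clients. Only the parallel fan-out and the 100-per-platform cap come from the description above. Skipping a platform whose request fails, instead of aborting the run, is an assumption and not documented behavior.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical fetcher type: takes a limit, returns raw market dicts from one platform.
Fetcher = Callable[[int], Awaitable[list[dict]]]

PER_PLATFORM_LIMIT = 100  # 100 per platform -> at most 300 markets per run


async def collect_run(fetchers: dict[str, Fetcher]) -> dict[str, list[dict]]:
    """Fetch open markets from every platform concurrently, capped per platform."""
    names = list(fetchers)
    results = await asyncio.gather(
        *(fetchers[name](PER_PLATFORM_LIMIT) for name in names),
        return_exceptions=True,  # assumption: one failing API should not abort the run
    )
    batches: dict[str, list[dict]] = {}
    for name, result in zip(names, results):
        if isinstance(result, BaseException):
            batches[name] = []  # skip this platform until the next cron run
        else:
            batches[name] = result[:PER_PLATFORM_LIMIT]  # enforce the cap defensively
    return batches
```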
Each raw API response is normalized into a common schema:
market_id, platform, title,
probability_yes, volume_usd, closes_at,
and source_url. Kalshi prices, quoted in cents, are converted to a 0–1
probability by taking the mid-price, (bid + ask) / 2, and dividing by 100. Polymarket uses the YES token
price directly. Manifold reports probability natively.
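The common schema and the Kalshi conversion can be sketched as below. The `Market` dataclass mirrors the field list above. Returning `None` when either side of the Kalshi quote is missing is an assumption. Polymarket and Manifold need no conversion, because their values already lie on a 0–1 scale.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Market:
    # Common schema fields, as listed above.
    market_id: str
    platform: str
    title: str
    probability_yes: Optional[float]
    volume_usd: Optional[float]
    closes_at: Optional[str]
    source_url: str


def kalshi_probability(bid_cents: Optional[int], ask_cents: Optional[int]) -> Optional[float]:
    """Mid-price of a Kalshi quote in cents, rescaled to a 0-1 probability."""
    if bid_cents is None or ask_cents is None:
        return None  # assumption: no two-sided quote means an unknown probability
    return (bid_cents + ask_cents) / 2 / 100
```

For example, a 62¢ bid and a 66¢ ask give a mid-price of 64¢, stored as `probability_yes = 0.64`.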
Market matching
Markets across platforms are matched into "consensus events" using a two-signal fuzzy similarity score:
- Jaro-Winkler similarity (75% weight) — computed on the URL-slugified title. Handles minor wording differences and common abbreviations.
- Token overlap (25% weight) — the fraction of meaningful words shared between the two titles, after stop-word removal.
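The two-signal score can be sketched as follows. This is an illustrative implementation of the common form of Jaro-Winkler (prefix scale 0.1, common prefix capped at 4 characters), not necessarily the exact variant in use. Several details are assumptions: "token overlap" is taken as the Jaccard ratio of the two token sets, `STOP_WORDS` is a small hypothetical list, and slugification simply lowercases the title and collapses non-alphanumeric runs into hyphens.

```python
import re

# Hypothetical stop-word list; the real list is not documented.
STOP_WORDS = {"a", "an", "the", "will", "be", "by", "in", "on", "of", "to", "for", "before", "at"}


def slugify(title: str) -> str:
    """Lowercase and collapse every run of non-alphanumerics into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def jaro(s1: str, s2: str) -> float:
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        # A character matches only if it appears within `window` positions in s2.
        for j in range(max(0, i - window), min(i + window + 1, len(s2))):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as half-transpositions.
    half_transpositions = 0
    k = 0
    for i, ch in enumerate(s1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if ch != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions / 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):  # common prefix, capped at 4 characters
        if a != b:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)


def tokens(title: str) -> set[str]:
    return {w for w in re.findall(r"[a-z0-9]+", title.lower()) if w not in STOP_WORDS}


def token_overlap(a: str, b: str) -> float:
    """Assumed definition: shared tokens divided by all distinct tokens (Jaccard)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def match_score(a: str, b: str) -> float:
    """75% Jaro-Winkler on slugified titles plus 25% token overlap."""
    return 0.75 * jaro_winkler(slugify(a), slugify(b)) + 0.25 * token_overlap(a, b)
```

With these assumptions, "Will the Fed cut rates?" and "Fed cut rates in June" share 3 of 4 distinct meaningful tokens, so their token overlap is 0.75.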
A combined score ≥ 0.85 triggers a match. Markets below that threshold become
their own consensus event; those with an ambiguous score (0.70 up to, but not including, 0.85)
are additionally queued in pending_matches for manual review.
Matching runs within the same category first (e.g., a sports market is only matched against other sports markets), which dramatically reduces false positives.
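The match decision can be sketched as below. It takes the score function as a parameter, so any similarity function (such as a 75/25 blend like the one described above) can be plugged in. Several choices are assumptions: scoring a market against every title already in an event, naming new events `ev-<market_id>`, and recording pending matches as `(market_id, event_id, score)` tuples. The description says matching runs within the same category "first" but does not specify any later cross-category pass, so the sketch only compares markets within the same category.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

MATCH_THRESHOLD = 0.85
REVIEW_FLOOR = 0.70


@dataclass
class Candidate:
    market_id: str
    title: str
    category: str


@dataclass
class Event:
    event_id: str
    category: str
    titles: list[str] = field(default_factory=list)


def place_market(
    market: Candidate,
    events: list[Event],
    score_fn: Callable[[str, str], float],
    pending: list[tuple[str, str, float]],
) -> Event:
    """Attach `market` to its best same-category event, or start a new event."""
    best: Optional[Event] = None
    best_score = 0.0
    for ev in events:
        if ev.category != market.category:
            continue  # only compare markets within the same category
        s = max((score_fn(market.title, t) for t in ev.titles), default=0.0)
        if s > best_score:
            best, best_score = ev, s
    if best is not None and best_score >= MATCH_THRESHOLD:
        best.titles.append(market.title)
        return best
    if best is not None and REVIEW_FLOOR <= best_score < MATCH_THRESHOLD:
        pending.append((market.market_id, best.event_id, best_score))  # manual review
    new_event = Event(f"ev-{market.market_id}", market.category, [market.title])
    events.append(new_event)
    return new_event
```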
Consensus probability
For each consensus event, the probability is computed as a volume-weighted average across all matched markets that have a non-null probability:
consensus = Σ(probability_i × volume_i) / Σ(volume_i)
When volume is unknown (e.g., Manifold, whose volume is denominated in play-money "mana" rather than USD), an equal weight is applied in place of volume. As a result, Manifold markets have less influence on the consensus probability when paired with high-volume Polymarket or Kalshi markets.
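The weighted average can be sketched as follows. In this sketch, each market with unknown volume receives the same fixed weight, `unknown_volume_weight`. This is a hypothetical parameter that defaults to 1.0. The Limitations section describes Manifold volume as treated as zero for weighting, which corresponds to passing `0.0`. Falling back to a plain mean when every weight is zero is also an assumption.

```python
from typing import Optional, Sequence


def consensus_probability(
    quotes: Sequence[tuple[Optional[float], Optional[float]]],
    unknown_volume_weight: float = 1.0,  # hypothetical weight for markets with no USD volume
) -> Optional[float]:
    """quotes: one (probability_yes, volume_usd) pair per matched market."""
    num = den = 0.0
    known: list[float] = []
    for p, v in quotes:
        if p is None:
            continue  # only markets with a non-null probability contribute
        w = v if v is not None else unknown_volume_weight
        num += p * w
        den += w
        known.append(p)
    if not known:
        return None
    if den == 0:
        return sum(known) / len(known)  # assumption: plain mean if every weight is zero
    return num / den
```

For example, 60% on $30K and 70% on $10K give (0.60 × 30,000 + 0.70 × 10,000) / 40,000 = 0.625. Adding a Manifold market at 90% with weight 1 moves the result by less than 0.001%.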
Spread calculation
For events tracked on multiple platforms, we compute:
- highest_probability: the highest YES probability across all matched markets
- lowest_probability: the lowest YES probability
- spread_pp: the difference in percentage points (e.g., 65% vs 58% = 7pp spread)
A spread of ≥ 5pp on a market with ≥ $10K volume typically indicates a meaningful disagreement between platforms — either a liquidity imbalance, different fee structures, or genuine information asymmetry.
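The spread fields and the 5pp / $10K rule of thumb can be sketched as below. The function names are hypothetical. The sketch assumes that the volume checked is the event's total USD volume. Rounding `spread_pp` to two decimals is only there to hide floating-point noise.

```python
from typing import Optional, Sequence


def spread_stats(probs: Sequence[Optional[float]]) -> Optional[dict]:
    """Highest, lowest, and spread (in percentage points) across matched markets."""
    known = [p for p in probs if p is not None]
    if len(known) < 2:
        return None  # a spread needs at least two markets with a price
    hi, lo = max(known), min(known)
    return {
        "highest_probability": hi,
        "lowest_probability": lo,
        "spread_pp": round((hi - lo) * 100, 2),
    }


def is_notable_spread(spread_pp: float, volume_usd: float,
                      min_pp: float = 5.0, min_volume_usd: float = 10_000.0) -> bool:
    """Rule of thumb from the text: >= 5pp spread on >= $10K volume."""
    return spread_pp >= min_pp and volume_usd >= min_volume_usd
```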
Price history
On each cron run, a price snapshot is recorded for every market with a known probability. These snapshots power the history chart on each event page and the Movers feed. Snapshots are retained indefinitely.
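Snapshot recording can be sketched as follows. Cloudflare D1 is built on SQLite, so the sketch uses Python's `sqlite3` module as a local stand-in. The `price_snapshots` table, its columns, and the helper names are hypothetical. Keying rows on `(market_id, captured_at)` is an assumption that makes a re-run of the same cron tick overwrite its row instead of duplicating it.

```python
import sqlite3
import time
from typing import Iterable, Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS price_snapshots (
    market_id       TEXT    NOT NULL,
    captured_at     INTEGER NOT NULL,  -- Unix seconds of the cron run
    probability_yes REAL    NOT NULL,
    PRIMARY KEY (market_id, captured_at)
)
"""


def record_snapshots(conn: sqlite3.Connection,
                     markets: Iterable[tuple[str, Optional[float]]],
                     captured_at: Optional[int] = None) -> int:
    """Insert one row per market with a known probability; return rows written."""
    ts = int(time.time()) if captured_at is None else captured_at
    rows = [(mid, ts, p) for mid, p in markets if p is not None]  # skip unknown prices
    conn.executemany("INSERT OR REPLACE INTO price_snapshots VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def history(conn: sqlite3.Connection, market_id: str) -> list[tuple[int, float]]:
    """Time-ordered (captured_at, probability_yes) pairs for one market's chart."""
    cur = conn.execute(
        "SELECT captured_at, probability_yes FROM price_snapshots "
        "WHERE market_id = ? ORDER BY captured_at",
        (market_id,),
    )
    return cur.fetchall()
```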
Latency
The cron Worker runs every minute. API-to-database latency is under 10 seconds on typical runs. Page renders read from a 60-second KV cache, with D1 fallback. End-to-end latency from a market move on Polymarket to display on Beeks.ai is typically 60–90 seconds.
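The "60-second cache with D1 fallback" read path can be sketched as a read-through cache. The in-memory `TTLCache` stands in for a KV namespace, and `load_from_db` stands in for a D1 query. Both are hypothetical. Filling the cache on the read path, instead of from the cron Worker, is also an assumption. The clock is injectable so that expiry can be tested without waiting.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """In-memory stand-in for a KV namespace: entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: dict[str, tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # stale: force a fallback read
            return None
        return value

    def put(self, key: str, value: Any) -> None:
        self._store[key] = (self.clock() + self.ttl, value)


def read_page_data(key: str, cache: TTLCache,
                   load_from_db: Callable[[str], Any]) -> Any:
    """Serve from cache when fresh; otherwise fall back to the database and refill."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(key)
    cache.put(key, value)
    return value
```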
API
The public JSON API is available at /api/v1/markets.json.
Parameters: limit (max 100), offset, category
(politics, sports, crypto, economics, science, other).
Results are sorted by total_volume_usd descending.
Rate limit: 60 requests per minute per IP on the free tier.
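A small client-side helper for building request URLs follows. The host `https://beeks.ai` is an assumption, because only the path `/api/v1/markets.json` is given above. The helper's defaults and its lower bound of 1 for `limit` are also assumptions. At 60 requests per minute, a client that stays within the limit sends at most about one request per second on average.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://beeks.ai/api/v1/markets.json"  # assumed host; only the path is documented
CATEGORIES = {"politics", "sports", "crypto", "economics", "science", "other"}
MAX_LIMIT = 100


def markets_url(limit: int = MAX_LIMIT, offset: int = 0,
                category: Optional[str] = None) -> str:
    """Build one request URL, validating parameters against the documented values."""
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    params: dict[str, object] = {"limit": limit, "offset": offset}
    if category is not None:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        params["category"] = category
    return f"{BASE_URL}?{urlencode(params)}"


def page_urls(total: int, page_size: int = MAX_LIMIT,
              category: Optional[str] = None) -> list[str]:
    """URLs covering the first `total` results (sorted by volume) in pages."""
    return [markets_url(page_size, off, category) for off in range(0, total, page_size)]
```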
Limitations
- Non-binary markets (e.g., Manifold multiple-choice, Kalshi scalar contracts) are excluded from consensus matching in v1.
- Volume figures are denominated in USD where available. Manifold volume is reported in mana (play money) and treated as zero for weighting purposes.
- Matching accuracy is ~95% for high-volume political and economic markets, lower for niche or ambiguously titled markets.
- Kalshi multi-leg parlay markets appear as individual events with complex titles — they are not excluded but do not match well cross-platform.