Methodology

How we collect, classify, and surface tenders

Source coverage

We monitor 127 active procurement portals across 50+ countries — central-government feeds (UK Find a Tender, TED Europa, Saudi Etimad, South African eTenders, Norway's Doffin, India's eProcure CPPP, etc.), regional platforms (state, province, and municipality-level publishers), inter-governmental sources (the Asian Development Bank, the Islamic Development Bank, the African Development Bank, the EBRD), and a long tail of sector-specific portals (rail, healthcare, defence). Every active source has its own dedicated landing page under /sources.

The catalog grows continuously. When a user submits a new portal at /submit-source, an AI-driven onboarding analyser renders the page, captures network traffic, infers extraction selectors, and writes the rules — usually within two minutes.

The 5-step extraction ladder

Every page we ingest walks through a five-step ladder. The first step that yields a usable tender list wins; later steps are skipped to keep cost and latency low. A sketch of the fall-through logic appears after the list.

  1. Step 0 — API extractors. When a portal exposes an official feed (TED's Search API v3, UK FTS / Contracts Finder OCDS, Etimad's Async JSON, CEJN's GetTenders POST endpoint, etc.), we hit it directly. Highest quality, lowest cost, no AI involvement.
  2. Step 1 — Knowledge-graph selectors. For portals we've previously learned, we apply the cached CSS selectors directly.
  3. Step 2 — Backend extraction rules. Admin-defined CSS selectors per site, versioned and confidence-scored.
  4. Step 3 — Kimi AI extraction. When deterministic methods fail, the Moonshot Kimi Code CLI reads the rendered HTML and emits structured tenders. AI use is disclosed in our App Store filings and gated behind opt-in consent in the iOS app. See our About page for the full AI declaration.
  5. Step 4 — Site-specific HTML extractors. Hand-written parsers for known portal layouts (Doffin, eTenders Ireland, e-Licitatie Romania, Atexo CMS deployments across francophone Africa, etc.). The fallback when AI also fails to find a clean structure.
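The ladder is a plain fall-through: try each extractor in ascending order and stop at the first one that returns a usable list. A minimal sketch in TypeScript, where the Extractor interface, method names, and Tender type are illustrative assumptions rather than our production code:

```ts
// A minimal sketch of the fall-through ladder. Extractor, canExtract, and
// extract are illustrative assumptions; only the step order comes from the list above.
interface Tender { title: string; sourceUrl: string }

interface Extractor {
  step: number; // 0 = API, 1 = cached selectors, 2 = backend rules, 3 = AI, 4 = site-specific
  canExtract(portalId: string): boolean;
  extract(html: string): Promise<Tender[]>;
}

async function runLadder(portalId: string, html: string, ladder: Extractor[]): Promise<Tender[]> {
  for (const extractor of [...ladder].sort((a, b) => a.step - b.step)) {
    if (!extractor.canExtract(portalId)) continue; // skip steps that don't apply to this portal
    const tenders = await extractor.extract(html);
    if (tenders.length > 0) return tenders;        // first usable list wins; later steps never run
  }
  return [];                                        // nothing usable at any step
}
```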

Normalisation

Every extracted record is reduced to a canonical shape — title, description, organization, country, language, currency, value, deadline, publishedAt, cpvCodes, tenderType, procedureType, sourceUrl, sourceId. Currency stays in the source-disclosed ISO code; we never synthesise or convert. CPV codes follow the EU 2024 taxonomy.
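Read as a TypeScript shape, the canonical record looks roughly like the following; the field names come from the list above, while the exact types and which fields are optional are illustrative assumptions:

```ts
// Canonical tender record. Field names match the list above; types and
// optionality are illustrative assumptions, not the production schema.
interface CanonicalTender {
  title: string;
  description?: string;
  organization?: string;   // buyer name as disclosed at the source
  country?: string;
  language?: string;
  currency?: string;       // ISO code as disclosed; never converted
  value?: number;          // only when the source discloses a value
  deadline?: string;       // ISO 8601
  publishedAt?: string;    // ISO 8601
  cpvCodes?: string[];
  tenderType?: string;
  procedureType?: string;
  sourceUrl: string;
  sourceId?: string;
}
```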

Deduplication

The same tender often appears on two or three portals (national + EU + regional). We dedup using a content hash that prefers (siteId, sourceId) when available, falling back to (canonical title, source URL). Duplicates resolve to the highest-quality record.
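A sketch of the key derivation, assuming Node's built-in crypto module; the helper name and the exact normalisation of the fallback key are assumptions:

```ts
import { createHash } from "node:crypto";

// Prefer the stable (siteId, sourceId) pair; otherwise fall back to a
// normalised (title, sourceUrl) pair. Helper and field names are illustrative.
function dedupKey(t: { siteId?: string; sourceId?: string; title: string; sourceUrl: string }): string {
  const material =
    t.siteId && t.sourceId
      ? `${t.siteId}:${t.sourceId}`
      : `${t.title.trim().toLowerCase()}|${t.sourceUrl}`;
  return createHash("sha256").update(material).digest("hex");
}
```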

Freshness

When the scraping pipeline is active, every monitored portal is re-checked at least every 30 minutes. Detail-page enrichment for Step-0 sources happens on demand. The /sitemap.xml revalidates hourly, so search engines see new tenders on the day they're published.

When the pipeline is paused (for cost control or maintenance), the index continues to serve previously extracted records: pages render with the most recent data, and a lastSuccessAt timestamp is surfaced on each source page.
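A sketch of the re-check decision under those rules; only lastSuccessAt is named above, so the lastCheckedAt field and the paused flag are illustrative assumptions:

```ts
// Decide whether a source is due for a re-check. Field names other than
// lastSuccessAt, and the paused flag, are illustrative assumptions.
const RECHECK_INTERVAL_MS = 30 * 60 * 1000; // "at least every 30 minutes"

function shouldRecheck(
  source: { lastCheckedAt?: Date },
  pipelinePaused: boolean,
  now: Date = new Date()
): boolean {
  if (pipelinePaused) return false;         // paused: keep serving previously extracted records
  if (!source.lastCheckedAt) return true;   // never checked: check now
  return now.getTime() - source.lastCheckedAt.getTime() >= RECHECK_INTERVAL_MS;
}
```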

Title quality and auto-derivation

A small fraction of source portals expose the wrong field as the title — Oman Tender Board returns numeric reference IDs; some German pages return a generic page label. When our heuristic detects that the source title is unusable (numeric-only, generic, or under 8 characters), we derive a presentable display title from the available metadata and surface a small “Auto-derived title” caption so the provenance is visible.
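The heuristic reduces to three checks. A sketch in TypeScript; the generic-label list and the way the display title is assembled are assumptions made for illustration:

```ts
// Flags source titles that are unusable as display titles: numeric-only,
// a known generic page label, or shorter than 8 characters.
// GENERIC_LABELS and displayTitle's assembly logic are illustrative assumptions.
const GENERIC_LABELS = new Set(["ausschreibung", "tender", "notice"]);

function isUnusableTitle(title: string): boolean {
  const t = title.trim();
  if (t.length < 8) return true;               // too short to be meaningful
  if (/^[\d\s./-]+$/.test(t)) return true;     // numeric reference IDs only
  return GENERIC_LABELS.has(t.toLowerCase());  // generic page labels
}

function displayTitle(
  sourceTitle: string,
  meta: { organization?: string; tenderType?: string; country?: string }
): { title: string; autoDerived: boolean } {
  if (!isUnusableTitle(sourceTitle)) return { title: sourceTitle, autoDerived: false };
  const derived = [meta.tenderType, meta.organization, meta.country].filter(Boolean).join(", ");
  return { title: derived || sourceTitle, autoDerived: derived.length > 0 };
}
```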

What we don't do

  • We don't fabricate data. Where a field is empty at the source, it stays empty.
  • We don't synthesise estimated values when none are disclosed.
  • We don't translate tender content into English unless a user explicitly asks for translation on the detail page.
  • We don't republish copyrighted commentary from third-party trade press.
  • We don't accept paid placement or sponsored notices.

Corrections and feedback

If a tender on this site is misclassified, mistranslated, or carries the wrong buyer attribution, email yy@datameshconsulting.co.uk with the URL. We review and correct in place.