Overview

Deep Search is the asynchronous discovery path for cases where instant index-based search is not enough. It ranks companies against natural-language criteria using the web, Extruct DB, Maps, and LinkedIn as sources, and is exposed through the discovery_tasks endpoints in the API Reference.

This Path Works Best When

  • Ranking depends on qualitative criteria that are hard to express as firmographic filters.
  • You need explanations and criterion-level scoring for each result.
  • You are willing to wait for an asynchronous task in exchange for more deliberate evaluation.

Choose Another Path If

  • You want fast recall-first exploration over the Extruct index. Use Semantic Search.
  • You already have a strong seed company and want instant similarity expansion. Use Lookalike Search.

Prerequisites

export EXTRUCT_API_TOKEN="YOUR_API_TOKEN"
Generate tokens in Dashboard API Tokens. For full setup, see Authentication.

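Endpoints used

  • POST /v1/discovery_tasks: create a Deep Search task.
  • GET /v1/discovery_tasks/{task_id}: check task status and counters.
  • GET /v1/discovery_tasks/{task_id}/results: retrieve results with offset and limit.
  • POST /v1/discovery_tasks/{task_id}/resume: request additional results.
  • POST /v1/tables/{table_id}/rows: add shortlisted companies to an AI Table.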

Workflow

1) Create a Deep Search task

Set desired_num_results to the number of companies you want; the target is capped at 250 per task. If you omit criteria, Extruct infers the evaluation criteria from query. The create-task response includes id, which you will reuse as TASK_ID.
TASK_RESPONSE=$(curl -sS -X POST "https://api.extruct.ai/v1/discovery_tasks" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${EXTRUCT_API_TOKEN}" \
  -d '{
    "query": "vertical SaaS companies serving freight forwarding",
    "desired_num_results": 150
  }')

TASK_ID=$(echo "${TASK_RESPONSE}" | jq -r '.id')
echo "${TASK_ID}"
This snippet requires jq. If jq is unavailable, copy id manually from the response.

Optional: to define the scoring rubric yourself, use this alternate create request instead of the minimal one above.
TASK_RESPONSE=$(curl -sS -X POST "https://api.extruct.ai/v1/discovery_tasks" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${EXTRUCT_API_TOKEN}" \
  -d '{
    "query": "vertical SaaS companies serving freight forwarding",
    "desired_num_results": 150,
    "criteria": [
      {
        "key": "has_logistics_focus",
        "name": "Logistics Focus",
        "criterion": "Company serves freight forwarding or logistics operations."
      },
      {
        "key": "b2b_gtm",
        "name": "B2B GTM Fit",
        "criterion": "Company sells primarily to business buyers."
      }
    ]
  }')

TASK_ID=$(echo "${TASK_RESPONSE}" | jq -r '.id')
echo "${TASK_ID}"
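If you script this step outside the shell, it can help to assemble and check the request body before sending it. The following is a minimal Python sketch, not an official client: the helper name build_create_payload is hypothetical, the lower bound of 1 is an assumption, and only the fields shown in the curl examples above are used.

```python
import json

MAX_TARGET = 250  # task target cap stated in this guide


def build_create_payload(query, desired_num_results, criteria=None):
    """Return the JSON body for POST /v1/discovery_tasks as a dict.

    Hypothetical helper: it mirrors only the fields used in the curl examples.
    """
    if not query.strip():
        raise ValueError("query must not be empty")
    # The lower bound of 1 is an assumption; the cap of 250 comes from the guide.
    if not 1 <= desired_num_results <= MAX_TARGET:
        raise ValueError(f"desired_num_results must be between 1 and {MAX_TARGET}")
    payload = {"query": query, "desired_num_results": desired_num_results}
    if criteria:
        for item in criteria:
            # Each criterion in the example request carries these three fields.
            missing = {"key", "name", "criterion"} - set(item)
            if missing:
                raise ValueError(f"criterion is missing fields: {sorted(missing)}")
        payload["criteria"] = list(criteria)
    return payload


body = build_create_payload("vertical SaaS companies serving freight forwarding", 150)
print(json.dumps(body, indent=2))
```

Serializing with json.dumps rather than hand-writing the string avoids quoting mistakes that would otherwise surface as a 422 response.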

2) Check task progress

Use task status and counters to monitor progress (num_results_discovered, num_results_enriched, num_results_evaluated, num_results).
curl --get "https://api.extruct.ai/v1/discovery_tasks/${TASK_ID}" \
  -H "Authorization: Bearer ${EXTRUCT_API_TOKEN}"
When to proceed: continue after num_results_evaluated starts increasing.
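The polling step can be sketched as a small loop. This is an illustrative Python sketch under stated assumptions: wait_for_evaluated and its parameters are hypothetical names, fetch_task is a function you supply that returns the parsed task JSON, and the 15-second default follows the 15-30 second polling guidance in Troubleshooting. It relies only on the num_results_evaluated counter and makes no assumption about status values.

```python
import time


def wait_for_evaluated(fetch_task, min_evaluated=1, interval_s=15.0,
                       max_polls=40, sleep=time.sleep):
    """Poll until num_results_evaluated reaches min_evaluated or polls run out.

    fetch_task: caller-supplied function returning the task JSON as a dict,
    for example the parsed body of GET /v1/discovery_tasks/{TASK_ID}.
    Returns the last task dict seen; the caller decides how to handle a timeout.
    """
    task = {}
    for attempt in range(max_polls):
        task = fetch_task()
        if task.get("num_results_evaluated", 0) >= min_evaluated:
            return task
        # Do not sleep after the final attempt.
        if attempt < max_polls - 1:
            sleep(interval_s)
    return task
```

Passing sleep as a parameter keeps the loop testable without real delays; in normal use the default time.sleep applies.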

3) Retrieve results with pagination

RESULTS_RESPONSE=$(curl -sS --get "https://api.extruct.ai/v1/discovery_tasks/${TASK_ID}/results" \
  -H "Authorization: Bearer ${EXTRUCT_API_TOKEN}" \
  --data-urlencode "offset=0" \
  --data-urlencode "limit=50")

echo "${RESULTS_RESPONSE}" | jq '.results[0]'
Example result fragment:
{
  "company_name": "FreightFlow",
  "company_website": "https://freightflow.example",
  "relevance": 86,
  "scores": {
    "has_logistics_focus": {
      "grade": 5,
      "explanation": "Primary product serves freight forwarding operations.",
      "sources": ["https://freightflow.example/about"]
    },
    "b2b_gtm": {
      "grade": 4,
      "explanation": "Positioning and case studies are B2B-focused.",
      "sources": ["https://freightflow.example/customers"]
    }
  }
}
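The request above fetches a single page (offset=0, limit=50). To walk every page, one possible approach is sketched below in Python. The function names are hypothetical, and the stop condition (a page shorter than limit means the last page) is an assumption, since this guide does not describe a total-count field.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.extruct.ai/v1"


def http_fetch_page(task_id, token):
    """Build a fetch_page(offset, limit) function backed by the results endpoint."""
    def fetch_page(offset, limit):
        qs = urllib.parse.urlencode({"offset": offset, "limit": limit})
        req = urllib.request.Request(
            f"{API_BASE}/discovery_tasks/{task_id}/results?{qs}",
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return fetch_page


def iter_results(fetch_page, limit=50):
    """Yield every result by advancing offset in steps of limit.

    Assumption: a page with fewer than `limit` items (or none) is the last page.
    """
    offset = 0
    while True:
        page = fetch_page(offset, limit).get("results", [])
        yield from page
        if len(page) < limit:
            break
        offset += limit
```

A typical call would be list(iter_results(http_fetch_page(task_id, token))), where task_id and token are the values you already hold in TASK_ID and EXTRUCT_API_TOKEN.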
Rule of thumb: shortlist candidates with a grade of 4-5 on must-have criteria and no grade below 3 on blockers. Use explanation and sources for manual verification before enrichment.
When to proceed: move forward once you have enough high-fit candidates for your workflow.
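The rule of thumb above can be expressed as a small filter over the scores object. This Python sketch is illustrative: is_shortlisted is a hypothetical helper, which criteria count as must-haves or blockers is your own choice, and treating a missing or non-numeric grade as 0 is an assumption.

```python
def is_shortlisted(result, must_have, blockers=(), must_min=4, blocker_min=3):
    """Apply the guide's rule of thumb to one result dict.

    must_have: criterion keys that need a grade of at least must_min (4-5).
    blockers: criterion keys that must not fall below blocker_min (3).
    """
    scores = result.get("scores", {})

    def grade(key):
        value = scores.get(key, {}).get("grade")
        # Assumption: an unscored criterion counts as grade 0 (unproven).
        return value if isinstance(value, (int, float)) else 0

    return (all(grade(k) >= must_min for k in must_have)
            and all(grade(k) >= blocker_min for k in blockers))


example = {
    "company_name": "FreightFlow",
    "scores": {
        "has_logistics_focus": {"grade": 5},
        "b2b_gtm": {"grade": 4},
    },
}
print(is_shortlisted(example, must_have=["has_logistics_focus"], blockers=["b2b_gtm"]))
```

The filter only narrows the list; the explanation and sources fields still deserve a manual look before enrichment.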

4) Resume discovery when needed

Use resume to request additional results while staying within the task cap.
curl -X POST "https://api.extruct.ai/v1/discovery_tasks/${TASK_ID}/resume" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${EXTRUCT_API_TOKEN}" \
  -d '{"desired_new_results": 25}'
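Before resuming, you may want to check how much headroom is left under the cap. The Python sketch below assumes that the 250 cap applies to the cumulative target (initial request plus resumed results); this guide does not spell that out, so treat the arithmetic as an assumption. The helper name is hypothetical.

```python
TASK_CAP = 250  # cap stated in this guide; assumed here to be cumulative


def clamp_new_results(current_target, wanted_new, cap=TASK_CAP):
    """Return how many new results can be requested without exceeding cap."""
    if wanted_new < 1:
        raise ValueError("wanted_new must be at least 1")
    remaining = max(cap - current_target, 0)
    return min(wanted_new, remaining)


# A task created with desired_num_results=150 that wants 25 more results.
n = clamp_new_results(current_target=150, wanted_new=25)
resume_body = {"desired_new_results": n}
print(resume_body)
```

If the helper returns 0, the task is already at the assumed cap and a resume request would likely be rejected.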

5) Move selected companies to AI Tables

After reviewing Deep Search output, you can move shortlisted companies into AI Tables for enrichment and scoring. This is only one handoff path: AI Tables also works independently when you already have your own company list.

Use this handoff snippet to map the returned company_website into AI Tables rows[].data.input:
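# Take the website of the first result; change the index to pick another company.
SHORTLIST_INPUT=$(echo "${RESULTS_RESPONSE}" | jq -r '.results[0].company_website')

# Build the body with jq so the value is JSON-escaped instead of spliced into a string.
ROWS_BODY=$(jq -cn --arg input "${SHORTLIST_INPUT}" '{rows: [{data: {input: $input}}], run: false}')

curl -X POST "https://api.extruct.ai/v1/tables/${TABLE_ID}/rows" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${EXTRUCT_API_TOKEN}" \
  -d "${ROWS_BODY}"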
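The snippet above hands off a single company. To hand off a whole shortlist in one request, the rows array can be built programmatically. This Python sketch is an assumption-laden illustration: build_rows_payload is a hypothetical helper, and it uses only the rows[].data.input and run fields shown above. Whether the endpoint limits the number of rows per request is not covered in this guide.

```python
import json


def build_rows_payload(results, run=False):
    """Map each result's company_website into rows[].data.input.

    Skips results without a website and drops duplicate websites.
    """
    rows = []
    seen = set()
    for result in results:
        website = (result.get("company_website") or "").strip()
        if website and website not in seen:
            seen.add(website)
            rows.append({"data": {"input": website}})
    return {"rows": rows, "run": run}


shortlist = [
    {"company_name": "FreightFlow", "company_website": "https://freightflow.example"},
    {"company_name": "FreightFlow (dup)", "company_website": "https://freightflow.example"},
    {"company_name": "No site"},
]
print(json.dumps(build_rows_payload(shortlist)))
```

The resulting JSON string can be sent as the body of the same POST request to /v1/tables/${TABLE_ID}/rows.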
If you are starting fresh, create TABLE_ID first in AI Tables Basics.

Prefer Deep Search over Semantic Search or Lookalike Search when the ranking depends on criteria like:
  • whether a company serves a specific workflow or sub-vertical
  • whether it sells to a specific buyer or team
  • whether it meets a custom ICP definition
  • whether you need explicit evidence for why each company ranked well
If your need is mostly “find more companies like this” or “search broadly and filter by firmographics,” stay on the instant index-based paths first.

Troubleshooting

401 Unauthorized

The token is missing or invalid. Check that EXTRUCT_API_TOKEN is set and the header is exactly Authorization: Bearer ${EXTRUCT_API_TOKEN}.
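If you call the API from a script, a guard like the following can catch a missing token before any request is sent. It is a minimal Python sketch; auth_headers is a hypothetical helper, and stripping surrounding whitespace is a defensive assumption (a stray newline in the variable is a common cause of a malformed header).

```python
import os


def auth_headers(env=os.environ):
    """Return the Authorization header, or raise if the token is missing."""
    token = env.get("EXTRUCT_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError("EXTRUCT_API_TOKEN is not set or is empty")
    return {"Authorization": f"Bearer {token}"}
```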

422 Unprocessable Entity

Common causes:
  • Invalid JSON body.
  • desired_num_results or desired_new_results out of allowed range.
  • Unsupported request fields.
Validate your request body locally before sending: echo '<json-body>' | jq empty.
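The same local check can be done without jq. The Python sketch below uses the standard json module to report where parsing fails; check_body is a hypothetical helper, and requiring a top-level object is an assumption based on the request bodies shown in this guide. It checks syntax only, not field names or allowed ranges.

```python
import json


def check_body(raw):
    """Return (ok, message) for a raw JSON request body string."""
    try:
        body = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
    # Assumption: every request body in this guide is a JSON object.
    if not isinstance(body, dict):
        return False, "body must be a JSON object"
    return True, "ok"


print(check_body('{"desired_new_results": 25}'))
print(check_body('{"desired_new_results": 25,}'))
```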

Task progress seems slow

Deep Search is asynchronous and includes criteria evaluation per candidate. Track progress through task status and counters. Poll every 15-30 seconds and continue while num_results_evaluated is increasing.