I'm an operator-builder by background, most recently the person you call to make messy GTM and data systems actually work. When I helped stand up InDrive's corporate venture arm, we invested across frontier markets—and quickly hit the limits of off-the-shelf databases.
In 2023 I was tasked to help InDrive (2nd most downloaded ride-hailing app) to set up corporate venture arm. We invested across frontier markets.
Try mapping logistics startups in Pakistan or Egypt using off‑the‑shelf databases and you quickly hit the limits: partial coverage, stale records, and rebrands that fell through the cracks. We'd see three different versions of the same company, miss a quiet pivot, or chase a funding round that never made it into the feeds. That gap—coverage, freshness, identity—kept showing up in the work, not in the slideware. That's the pain that pushed me to join Extruct.
To keep pace, we built our own data solution and turned to web scraping: public registries, funding announcements, LinkedIn/X graphs, hiring pages, job postings, and app store marketplaces. It delivered volume, not judgment; unknown unknowns stayed unknown, and analysts drowned in lists.
Then in 2024, deep‑research agents became practical. They didn't make databases obsolete overnight, but they offered a way to mitigate the biggest issues we kept hitting: data freshness, limited source breadth, and messy identities across systems. That opened a better path for market research—keep the volume tools for what they're great at, and layer structured agents on top when you need depth, recall, and judgment.

General‑purpose LLM chatbots are impressive, but they assume you already know how to ask and what "good" looks like. In practice, investors need structure more than open‑ended chat.
Extruct AI tackles research through a "problem structure" lens: understanding that each company is an independent unit of investigation, and that each research question is distinct. It presents the research as a table—rows as companies, columns as questions—where every cell is handled by its own dedicated AI agent instance. This architecture means the research process leverages a much larger effective context window (thanks to many parallel agents with their own memory and tool calls), enabling deeper dives and more robust outputs per company and question.
Traditional approaches might sequentially process reports, but Extruct's setup allows you to treat each company systematically and independently, with each agent specializing in a unique slice of the research. This makes it much easier to revisit the table and add new columns or data points later on, without redoing the initial groundwork. The result: richer context, higher consistency, stronger recall, and insights that improve the more you ask.
Databases like Crunchbase, PitchBook, Dealroom, CB Insights, and Harmonic are valuable—especially for volume tasks. But they're incomplete for frontier coverage and for questions that require nuance.
Your deal flow doesn't start or end in a database anyway. The real funnel is broader:
The job is to unify those streams, not to pretend one vendor can replace them.
There are jobs where a database is the fastest and cheapest option:
Could Extruct do these? Yes, but it's slower and costlier per unit because each lookup is a "column" in a deep table. Use volume tools for volume work.
💡 Note: For broad counts, generic filtering, or geo‑stage slices, a database wins on cost and speed. Save Extruct for precision work where accuracy matters.
Use Extruct when you need structured judgment, not just rows:
You already know the niche and want to map it precisely.
You want to close recall gaps after you've exhausted database search—"What did we miss?"
You have sharp criteria, e.g., "Private digital payment processors serving the legal cannabis industry that enable compliant, auditable cash‑to‑digital payments in regulated EU markets."
You're enriching known companies with hard‑to‑model attributes: supplier concentration risk, manufacturing complexity, regulatory exposure, founder credibility signals.
You need one search across all your mediums—email, CRM notes, PDFs, vendor databases, and the open web—with de‑duplicated, normalized, rankable results tied to the same company/person IDs.
Because research is a table, you can add columns—"Manufacturing complexity" or "Go‑to‑market motion"—without rerunning everything.
And because those questions cross systems, Extruct doesn't just "chat"; it retrieves across your stack and returns structured answers mapped to your schema. That's why the next step isn't another tool, but an intelligence system that handles normalization and entity resolution.
Two foundational capabilities make the intelligence system useful:
Extruct provides this middle layer so your LLM can answer precise, personalized questions across your mediums—email, CRMs, databases, LinkedIn—on top of clean, connected data. Private markets have no ticker symbols, CUSIPs, or ISINs. Your providers don't agree on identities, and many CRMs use email addresses as primary keys. That's how the same founder spreads across four records.
Think less "one more app" and more MCP server—a modular backend that integrates:
Start with the firm, not the tool:
Then choose the simplest thing that works:

Extruct is an intelligence layer with a deal‑origination function—not the other way around. We minimize manual CRM entry, unify your sources, and let you ask hard, specific questions with confidence.