This is a recap of Stefan’s talk at the first 9th Floor meetup on February 10, 2025 at Builders House. View the slides.
Who I am and why this event
I built outbound engines, from old-school blasts to Clay-orchestrated hyperpersonalized campaigns, before it was cool. I specialized in companies that sell to ecommerce stores (Shopify apps, logistics, payment providers), and about 1.5 years ago I started building my own custom datasets out of frustration with the available data. I like to build my own tooling.
I wanted this community. It didn’t exist. So I made it. I’m not good at everything — I want to help where I can and learn from you. This isn’t a “copy my 20-step workflow” talk. Start simple. Build from there.
The problem: everyone fishes from the same pond
If everyone starts from the same data, they end up with the same leads, the same message, and the same results. You lose the one edge you actually have: knowing why your customers buy from you and when.
Most lead providers are LinkedIn scrapers at their core. Coverage depends entirely on who you sell to. Series B US founders? Apollo and Clay find most of them. They’re on LinkedIn, categorized, filters work.
Dental clinic founders in Iasi? These people either aren’t on LinkedIn, or their profiles don’t have the signals that any platform’s filters would catch. And with LinkedIn’s November update killing public profile company data, your “fresh” leads might already be stale.
Even if you’re not technical, the tools to manipulate and query data so that it serves your goals are now within easy reach. Using them well isn’t easy, but the more you do it, the better you get. And even as models improve, the lessons you learn fighting dumber models compound.
The example: health-advertising.com
I figured it would be a bit stupid to try to convince people to create custom datasets by showing something I’ve been working on for over a year. So I built health-advertising.com as an exploration of alternative company-level data sources. The healthcare advertising landscape in Romania, mapped: 5,461 domains, 7,468 advertisers, 7.2B total impressions, 20,893 ad creatives.
Why Google Ads data?
Google Ads Transparency data is free (with big asterisks), public, and tells you something important: if a business spends on Google Ads, they rely on online customer acquisition. That’s a signal of sophistication and budget.
Almost nobody builds on this data. That’s the whole point — if everyone used it, it wouldn’t be an edge.
It’s not the best option in every situation, but it’s underrated. The value comes when you compose on top of it. You start “talking” to the data: Does their site mention X? Do they use only Google Ads or also Meta, TikTok? How much focus on Maps vs Search? How long have they been running? Did they try and stop? Can I see which agency they work with? Each question adds a layer. Each layer gets you closer to understanding not just who they are, but where they are and what pain they probably have right now.
The honest disclaimers: it lies to you in 25 ways. The two biggest — impression counts cap at 10M+ (they just say “10 million or more”), and the disclosed name is usually the company paying, but sometimes it’s the ad agency. The biggest cost is the hours learning these quirks. After that, it gets way easier.
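One way to keep the impression cap from quietly skewing totals is to store every count as a lower bound plus a "capped" flag. The sketch below is illustrative only: it assumes the raw count has already been parsed into an integer, and the names `Impressions`, `parse_impressions`, and `total_lower_bound` are hypothetical, not part of any Google tooling.

```python
from dataclasses import dataclass

# Per the talk: past this point the data only says "10 million or more".
CAP = 10_000_000


@dataclass(frozen=True)
class Impressions:
    lower_bound: int  # the most we can safely claim
    capped: bool      # True when the real value is unknown beyond the cap


def parse_impressions(raw: int) -> Impressions:
    """Wrap a reported count, flagging values that hit the cap."""
    return Impressions(lower_bound=min(raw, CAP), capped=raw >= CAP)


def total_lower_bound(rows: list[Impressions]) -> tuple[int, int]:
    """Return (sum of lower bounds, number of capped rows)."""
    return sum(r.lower_bound for r in rows), sum(r.capped for r in rows)


rows = [parse_impressions(250_000), parse_impressions(10_000_000)]
total, capped_rows = total_lower_bound(rows)
print(f"at least {total:,} impressions ({capped_rows} capped row(s))")
```

Reporting the number of capped rows next to the total makes it clear that any aggregate is a floor, not an exact figure.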
The pipeline
Six steps from raw ads data to qualified leads. Weekend project — officially took 8 days, to be transparent.
- Pull raw creative data — 1.6M creatives tagged “health” from Google Ads Transparency, 7,468 unique advertiser IDs
- Get the website — You have an advertiser ID and a creative, not a clean URL. Compute the creative link, then OCR it. This alone could be a 1-hour talk.
- Get the CUI — Legal entity names alone aren’t great. You want CUIs and websites. Programmatic search — one of the best tools you should master. Search for the legal entity name, extract the CUI from results. Why not an agent? This is a deterministic task. Agent = more cost, slower, and hallucination risk for no upside.
- Enrich the entity — CUI gives you financial info (turnover, profit, employees) and stakeholder/admin names. This is where the legal entity connection becomes powerful.
- Process the website — Get homepage content, classify the business, ask qualifying questions. “Healthcare” as a generic blob doesn’t help. Dental clinics and hospitals in the same bucket is useless if you sell dental equipment.
- Contact enrichment — This is the punchline. Programmatic search is your enrichment layer.
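The "Get the CUI" step (CUI being the Romanian company registration code) can be sketched as plain text processing, which is why it needs no agent. The example below assumes you already have search result snippets as strings; the search call itself is out of scope. The regex and function names are hypothetical. The checksum uses the commonly documented control key `753217532`; treat that as an assumption and verify it against an official source before relying on it.

```python
import re
from typing import Optional

# Hypothetical pattern: matches "CUI 14399840", "CIF: RO14399840", and similar.
CUI_PATTERN = re.compile(
    r"\b(?:CUI|CIF)\b\s*:?\s*(?:RO)?\s*(\d{2,10})\b", re.IGNORECASE
)
CONTROL_KEY = "753217532"  # assumption: commonly documented CUI control key


def is_valid_cui(cui: str) -> bool:
    """Check the trailing control digit of a candidate CUI."""
    if not cui.isdigit() or not 2 <= len(cui) <= 10:
        return False
    body, check = cui[:-1], int(cui[-1])
    key = CONTROL_KEY[-len(body):]  # align the key with the body from the right
    total = sum(int(d) * int(k) for d, k in zip(body, key))
    expected = (total * 10) % 11
    if expected == 10:
        expected = 0
    return expected == check


def extract_cui(snippets: list[str]) -> Optional[str]:
    """Return the first checksum-valid CUI found in the snippets, if any."""
    for snippet in snippets:
        for candidate in CUI_PATTERN.findall(snippet):
            if is_valid_cui(candidate):
                return candidate
    return None


snippets = ["Firma exemplu, CIF 12345678", "Clinica Exemplu SRL - CUI: RO14399840"]
print(extract_cui(snippets))
```

The checksum filter is what keeps this deterministic path honest: a number that merely looks like a CUI is dropped instead of being passed on to enrichment.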
Programmatic search as enrichment
Most providers are LinkedIn scrapers at their core. Filters, keywords, priority — they don’t fit niche markets well. Your enrichment layer is programmatic search with targeted queries:
- Admin Name site:linkedin.com/in
- Job Title CompanyName site:linkedin.com/in
- "Clinica SRL" site:linkedin.com/in
- Admin Name website.ro
- Legal Entity Name website.ro
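The query patterns above are easy to generate per company once the enrichment fields exist. This is a minimal sketch, assuming each company record carries the five fields shown; the `Company` class and `build_queries` function are hypothetical names, and sending the queries to a search API is left out.

```python
from dataclasses import dataclass

LINKEDIN_FILTER = "site:linkedin.com/in"


@dataclass(frozen=True)
class Company:
    admin_name: str
    job_title: str
    company_name: str
    legal_entity: str
    website: str


def build_queries(c: Company) -> list[str]:
    """Build the targeted search queries for one company record."""
    queries = [
        f"{c.admin_name} {LINKEDIN_FILTER}",
        f"{c.job_title} {c.company_name} {LINKEDIN_FILTER}",
        f'"{c.legal_entity}" {LINKEDIN_FILTER}',  # exact-match legal name
        f"{c.admin_name} {c.website}",
        f"{c.legal_entity} {c.website}",
    ]
    # Drop duplicates while keeping order (e.g. when two fields are identical).
    return list(dict.fromkeys(queries))


company = Company("Ion Popescu", "Manager", "Clinica", "Clinica SRL", "clinica.ro")
for query in build_queries(company):
    print(query)
```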
Plus: compute emails even without LinkedIn. The legal entity connection unlocks admin-based searches and non-domain LinkedIn matching that no provider can replicate.
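One reading of "compute emails" is pattern-based guessing from the admin name and the company domain. The sketch below takes that reading as an assumption; the pattern list and function names are hypothetical, and every candidate still needs verification before you send anything. Note that registry data may list names surname-first, so check the order before feeding names in.

```python
import unicodedata


def ascii_slug(text: str) -> str:
    """Lowercase, strip diacritics (e.g. ș, ț, ă), and keep ASCII letters/digits."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in without_marks.lower() if c.isascii() and c.isalnum())


def email_candidates(full_name: str, domain: str) -> list[str]:
    """Generate common local-part patterns for a 'First Last' style name."""
    parts = [slug for slug in (ascii_slug(p) for p in full_name.split()) if slug]
    if len(parts) < 2:
        return []
    first, last = parts[0], parts[-1]
    patterns = [
        f"{first}.{last}",
        f"{first}{last}",
        f"{first[0]}{last}",
        f"{first[0]}.{last}",
        f"{last}.{first}",
        first,
        last,
    ]
    # dict.fromkeys removes duplicates while preserving order.
    return [f"{local}@{domain}" for local in dict.fromkeys(patterns)]


for email in email_candidates("Ștefan Brânză", "clinica.ro"):
    print(email)
```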
When you reach out with real information — not assumptions — the dynamic completely changes. They see that you know things, that you’re not guessing they have a problem, you have concrete clues that they do. The chances they engage in conversation are much higher.
Agents vs. deterministic paths
Agents are great for ideation and testing. But if you know what you need from where, build deterministic paths. There are deterministic tasks where agents give you no advantages and only disadvantages: cost, speed, and hallucination risk.
Is this for everyone?
No. If you’re validating, grab what’s fast and available. It makes no difference if you find 20,000 vs 10,000 potential customers when you have zero customers. This matters when you’re competing and need an edge.
Reverse engineer your customer
What makes your ideal customer a fit beyond industry, employee count, location? What behavior or public signal tells you they have the problem you solve right now?
High-leverage sources to explore: Google Maps and reviews, job postings (hiring intent), technographics, permit data, accreditations, public sector auctions. Each one tells you something different.
Find YOUR source
For Compendi, after struggling with every provider, the unlock was event sponsor lists. Every business has a non-obvious data source that maps their market. The question isn’t “which tool” — it’s “what signal actually predicts my customer?”
Start simple. Build from there. It compounds.