Product May 8, 2025 7 min read

ICP Scoring Inside an Email Thread: What Signals Actually Matter

By Jo Thomas, CEO & Co-Founder · Enrola


ICP scoring was designed for top-of-funnel qualification. You define firmographics — company size, industry, geography, revenue band — and you score inbound leads against those criteria the moment they enter your CRM. High-fit leads get worked; low-fit leads get deprioritized. The logic is clean.

But there's a category of signal the firmographic model doesn't capture: what the prospect does once they're in a live email thread. The signals that predict whether a specific reply thread will convert are qualitatively different from the signals that predict whether a lead should enter the pipeline at all. And most sales teams treat both the same way — which means they're scoring fit at the wrong stage with the wrong inputs.

What standard ICP scoring captures

The classic ICP scoring model works from a pyramid of three layers. At the base, firmographics: company size by headcount or revenue, industry vertical, geographic market. This layer is easy to populate from a LinkedIn scrape, a Clearbit enrichment pull, or a ZoomInfo record. In HubSpot, it typically maps to standard company properties or custom fields like ideal_customer_profile_tier synced from enrichment. In Salesforce, it might live as a formula field on the Account object that rolls up industry, employee range, and ARR estimate.

The middle layer is technographics: what tools the company runs. This matters for integrations fit — a prospect running Salesforce and Gmail is a better fit for a tool with those native connectors than a prospect running an obscure ERP. Technographic data degrades faster than firmographic data and requires dedicated providers (BuiltWith, HG Insights, Bombora technology intent feeds) to maintain with any reliability.

The top of the pyramid is intent signals — evidence of active purchase consideration. Broadly, this comes from third-party intent data (the Bombora / G2 / TrustRadius category) and first-party behavioural data (page visits, pricing-page time, content downloads). Both are probabilistic, both lag actual intent, and both are blind to the richest signal source available once a prospect enters a live email thread.

The signal category most teams ignore: reply content

When a prospect replies to a sales email, the reply content is a direct, first-party, real-time intent signal. It's not inferred from browsing behaviour. It's not aggregated from anonymous company-level research surges. It's the prospect, in their own words, indicating something about their current situation, interest level, and decision-making context.

The challenge is that reply content is unstructured text, and most CRM-side ICP scoring operates on structured fields. A prospect's reply doesn't populate a field in HubSpot's engagement score automatically. Salesforce's Activity object captures the email record but doesn't parse the content for intent signals.

So the scoring model that determines how much SDR time this prospect deserves is based on firmographic and technographic data that was captured before the prospect said a single word — and may have been captured weeks or months before the current conversation.

The reply signals that actually predict deal progression

In practice, several dimensions of reply content correlate with higher deal conversion rates. We're not claiming these are universal laws: ICP definitions vary, and which signals matter depends on your product and sales motion. But these patterns hold across the reply data we analyse:

Question specificity

A reply that asks a specific question about capability or integration — "does this connect to our existing Salesforce setup?" or "how does it handle multi-rep accounts?" — signals a qualitatively different level of consideration than a reply that says "thanks, I'll have a look." The former implies the prospect has already mentally placed your product into their stack and is evaluating fit. The latter is an acknowledgment, not an intent signal.

Question depth can be scored on a rough scale: generic acknowledgment (low), category question (medium), specific feature/integration question (high), commercial question about pricing or contract terms (very high). A single commercial question in a prospect's first reply is one of the strongest deal-progression indicators in the stack.
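As a minimal sketch of that scale, the four levels can be encoded as an ordered enum and mapped to points. The point values here are illustrative assumptions (only the 0-30 range for question specificity comes from the framework later in this post), and classifying a reply into a level is left to a human or a separate classifier.

```python
from enum import IntEnum


class QuestionDepth(IntEnum):
    """The four levels of the rough scale, ordered low to very high."""
    ACKNOWLEDGMENT = 0   # "thanks, I'll have a look"
    CATEGORY = 1         # general question about the product category
    SPECIFIC = 2         # feature or integration question
    COMMERCIAL = 3       # pricing or contract terms


# Hypothetical point values; only the 30-point ceiling is from the article.
DEPTH_POINTS = {
    QuestionDepth.ACKNOWLEDGMENT: 0,
    QuestionDepth.CATEGORY: 10,
    QuestionDepth.SPECIFIC: 20,
    QuestionDepth.COMMERCIAL: 30,
}


def question_specificity_points(depths):
    """Score a thread by the deepest question seen so far (0 if none)."""
    if not depths:
        return 0
    return DEPTH_POINTS[max(depths)]
```

Scoring by the deepest question, rather than summing, keeps one commercial question from being diluted by several polite acknowledgments in the same thread.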

Stakeholder references

Replies that mention other people — "I'll need to loop in our Head of RevOps" or "our CTO would need to be involved for the technical review" — indicate that the prospect is thinking about internal buy-in, which is a meaningful conversion signal for multi-stakeholder B2B sales. This is information that the firmographic model never captures: company size tells you there are probably multiple stakeholders, but the reply tells you the prospect has already begun the internal advocacy process.

Timeline language

Phrases like "we're evaluating options for Q3" or "we have a review coming up in September" provide explicit buying-timeline context. This is genuinely rare — most prospects don't volunteer timelines — so when they do, it should weight heavily in triage logic. It moves the thread from "general interest" to "active evaluation with a deadline."

Problem framing

A prospect who replies by describing their current pain in specific operational terms ("we're running three SDRs off a shared inbox and it's a mess") is demonstrating active problem awareness. This is distinct from a prospect who merely confirms they received your message. In our thread analysis, problem-framing replies convert at a substantially higher rate because they show the prospect is in buying mode, not just politely engaging.
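To make the last three signal types concrete, here is a deliberately naive keyword sketch that flags stakeholder references, timeline language, and problem framing in a reply. The patterns and the `detect_signals` helper are hypothetical illustrations built from the example phrases above, not a description of how any product works; real reply text would need much broader coverage or a trained classifier.

```python
import re

# Hypothetical patterns seeded from the example phrases in this article.
# "May" is left out of the month list because it collides with the modal verb.
SIGNAL_PATTERNS = {
    "stakeholder": re.compile(
        r"\b(loop in|bring in|our (cto|cfo|ceo|head of \w+|vp of \w+))\b",
        re.IGNORECASE,
    ),
    "timeline": re.compile(
        r"\b(q[1-4]|next (quarter|month|year)|january|february|march|april|"
        r"june|july|august|september|october|november|december)\b",
        re.IGNORECASE,
    ),
    "problem": re.compile(
        r"\b(a mess|struggling|bottleneck|shared inbox)\b",
        re.IGNORECASE,
    ),
}


def detect_signals(reply_text):
    """Return the set of signal names whose pattern appears in the reply."""
    return {
        name
        for name, pattern in SIGNAL_PATTERNS.items()
        if pattern.search(reply_text)
    }
```

A keyword pass like this is mainly useful for a first manual audit of existing threads: it will miss paraphrases and produce false positives, so treat its output as a prompt for review and not as a score.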

Why this matters for SDR queue prioritisation

Here's the practical implication. Your SDR queue at any given time contains a mix of threads in various reply states. If you're prioritising that queue based purely on firmographic ICP scores — which most tools feed into Salesforce or HubSpot for queue ordering — then a highly engaged mid-fit prospect asking a specific commercial question in their reply will rank below a cold high-fit prospect who hasn't responded yet.

That's an inversion. The engaged mid-fit prospect is warmer, more responsive, and in an active response window. The cold high-fit prospect may be higher on the ICP pyramid, but they haven't shown any intent signal. Layering in-thread reply quality as a dynamic overlay on top of static ICP scores produces a better queue.

We're not saying firmographic ICP scoring should be replaced. We're saying it needs to be complemented by a second-pass scoring layer that fires once a reply enters the thread — one that reads the content of what the prospect actually said and adjusts the thread priority accordingly.
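The inversion is easy to see in a small sketch. The `Thread` class, the sample scores, and the 35% intent weight below are all assumptions chosen for illustration; the point is only that the same two threads sort differently once a reply-content score is blended in.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    static_icp: float    # 0-100 firmographic score from CRM enrichment
    intent: float = 0.0  # 0-100 reply-content score; stays 0 until a reply arrives


INTENT_WEIGHT = 0.35  # assumed weight for the second-pass layer


def priority(thread):
    """Blend the static ICP score with the dynamic reply-content score."""
    return (1 - INTENT_WEIGHT) * thread.static_icp + INTENT_WEIGHT * thread.intent


queue = [
    Thread("cold high-fit", static_icp=90),
    Thread("engaged mid-fit", static_icp=72, intent=90),
]

# Ordering by the static score alone puts the silent prospect first.
static_order = [t.name for t in sorted(queue, key=lambda t: t.static_icp, reverse=True)]

# Ordering by the blended score puts the prospect who replied first.
overlay_order = [t.name for t in sorted(queue, key=priority, reverse=True)]
```

Because the intent score defaults to zero, threads with no reply still sort by their firmographic score relative to one another, so the static model is complemented and not replaced.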

The Apollo / Outreach model and its gap

Toolchains like Outreach and Salesloft do a good job of tracking reply rates and tagging replies by sentiment category (positive / neutral / negative / out-of-office). Some have basic intent classification built in. But these tools were designed around sequence management, not thread-level ICP re-scoring. The sentiment tag tells you "this reply was positive" — it doesn't tell you "this positive reply contained a commercial question from a Director-level title at a 150-person company, which pushes this thread from a 72 ICP score to a 91 prioritisation score."

Apollo is strong on firmographic enrichment and lead filtering, but its reply analysis doesn't read the content of what was said in the thread. HubSpot's engagement scoring tracks opens, clicks, and reply events, but it doesn't parse reply content for intent signals either.

The gap is the intersection of thread-level content analysis with ICP criteria. That's not a criticism of those platforms — it's simply not what they were built to solve. It's a distinct layer in the qualification stack.

A practical scoring framework for reply threads

For teams that want to implement this manually before deploying tooling, here's a workable three-dimension framework:

Company Fit — standard ICP firmographic match, scored 0-100 against your defined criteria (employee range, industry, geography, revenue band if available). This is your static baseline score, pulled from CRM enrichment.

Role Fit — prospect's job title matched against your target persona list. A Director or VP of Revenue Operations at a B2B SaaS company is a higher-fit role than a Sales Coordinator at the same company. Scored 0-100, updated if the contact's title changes in CRM.

Intent Signals — scored from the reply content using the categories above: question specificity (0-30 points), stakeholder references (0-20 points), timeline language (0-25 points), problem framing (0-25 points). This layer updates dynamically as the thread develops.
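The four category caps add up to 100, so the Intent Signals dimension lands on the same 0-100 scale as the other two. A minimal sketch of that sub-score follows, assuming the per-category points have already been assigned by a reviewer or classifier; the dictionary keys and the `intent_score` function name are hypothetical.

```python
# Category caps as listed in the framework above.
INTENT_CAPS = {
    "question_specificity": 30,
    "stakeholder_references": 20,
    "timeline_language": 25,
    "problem_framing": 25,
}


def intent_score(points):
    """Sum per-category points, clamping each to its 0..cap range.

    `points` maps category name to awarded points; missing categories
    count as 0 and unknown categories are ignored.
    """
    total = 0
    for category, cap in INTENT_CAPS.items():
        awarded = points.get(category, 0)
        total += min(max(awarded, 0), cap)
    return total
```

Re-running the function each time a new reply lands is one simple way to get the dynamic update: the score only reflects what the thread has shown so far.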

Weighting these three dimensions — say, 35% Company Fit, 30% Role Fit, 35% Intent Signals — produces a composite thread priority score that reflects both who the prospect is and what they're actually showing you in the conversation. A thread with a 70 Company Fit, 85 Role Fit, and 82 Intent Signal score should be worked before a thread with a 90 Company Fit, 85 Role Fit, and 20 Intent Signal score. The first prospect is talking to you; the second hasn't said anything useful yet.
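Working the two example threads through the suggested 35/30/35 split shows how far apart they land. This is a sketch of the weighted sum only; the function name is an assumption, and the weights are the illustrative ones from the paragraph above and should be tuned to your own sales motion.

```python
# Example weights from the paragraph above; adjust for your own motion.
WEIGHTS = {"company_fit": 0.35, "role_fit": 0.30, "intent": 0.35}


def thread_priority(company_fit, role_fit, intent):
    """Weighted composite of the three 0-100 dimension scores."""
    score = (
        WEIGHTS["company_fit"] * company_fit
        + WEIGHTS["role_fit"] * role_fit
        + WEIGHTS["intent"] * intent
    )
    return round(score, 1)


engaged = thread_priority(company_fit=70, role_fit=85, intent=82)  # 78.7
silent = thread_priority(company_fit=90, role_fit=85, intent=20)   # 64.0
```

The engaged thread scores 78.7 against 64.0 for the higher-fit but quiet one, a gap of nearly 15 points in favour of the prospect who is actually talking to you.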

The data from reply threads is already in your inbox. The question is whether your triage logic is using it.
