CRM Data Hygiene & Quality
Help the user clean, deduplicate, normalize, and maintain CRM data quality. This skill is tool-agnostic but includes platform-specific guidance for ZoomInfo OperationsOS, Salesforce native tools, HubSpot Operations Hub, Clay, LeanData, RingLead, Openprise, and DemandTools.
Step 1 — Gather context
Ask the user:
-
What's the main data problem?
- A) Duplicate contacts, leads, or accounts
- B) Stale/outdated records (job changes, company changes)
- C) Missing fields (no phone, no email, incomplete company data)
- D) Inconsistent data (job titles, industries, company names formatted differently)
- E) Compliance issues (opt-outs, GDPR, stale consent)
- F) General data audit — don't know what's wrong yet
- G) Setting up ongoing data hygiene automation
- H) Other — describe it
-
What CRM are you using?
- A) Salesforce
- B) HubSpot
- C) Microsoft Dynamics
- D) Pipedrive
- E) Other CRM
- F) Custom/in-house system
-
How many records are affected?
- A) Under 1,000 (small cleanup)
- B) 1,000-10,000 (moderate)
- C) 10,000-100,000 (large)
- D) 100,000+ (enterprise-scale)
- E) Not sure — need to audit first
-
What tools do you have for data operations?
- A) ZoomInfo OperationsOS
- B) HubSpot Operations Hub
- C) Salesforce native (duplicate management, data.com)
- D) Clay
- E) LeanData / RingLead / Openprise
- F) DemandTools (Validity)
- G) None — using manual processes
- H) Other — describe it
Step 2 — Strategy and approach
Data quality audit framework
Before fixing data, measure what's broken:
-
Completeness — what % of records have all critical fields filled?
- Email: target 95%+ for contacts
- Phone: target 70%+ for key personas
- Company: target 100% for accounts
- Title/department: target 90%+ for contacts
-
Accuracy — what % of filled fields are actually correct?
- Email deliverability: verify a sample with an email verification tool
- Phone connectivity: check a sample of direct dials
- Job title currency: compare against LinkedIn for a sample
-
Duplication rate — what % of records are duplicates?
- Contact-level: same person, multiple records
- Account-level: same company, different spellings
- Cross-object: leads that are also contacts
-
Decay rate — how fast does your data go stale?
- Industry average: 30% of B2B data decays annually
- Sales contacts: ~20% change jobs each year
- Direct dials: ~15% become invalid annually
- Emails: ~22% bounce rate after 12 months without refresh
-
Consistency — are the same things called the same thing?
- Job titles: "VP Sales" vs "Vice President of Sales" vs "VP, Sales"
- Industries: "SaaS" vs "Software" vs "Technology"
- Company names: "IBM" vs "International Business Machines" vs "IBM Corp"
Deduplication strategy
| Approach | When to use | Risk level |
|---|
| Exact match | Email, phone, domain — safest | Low |
| Fuzzy match | Names, company names, addresses | Medium — review matches before merging |
| Rule-based | Combine multiple fields (name + company + title) | Medium |
| ML-based | Large datasets with complex patterns | Low (if trained well) — but expensive |
Merge rules (which record wins):
- Most recently updated record keeps modifiable fields
- Most complete record keeps enrichment data
- Oldest record keeps the original owner/source
- Always preserve: original lead source, first touch date, opt-in status
Data normalization
| Field | Common problems | Solution |
|---|
| Job title | Abbreviations, variations, custom titles | Map to standard taxonomy (C-Level, VP, Director, Manager, IC) |
| Industry | Free-text, overlapping categories | Map to SIC/NAICS or your internal taxonomy |
| Company name | Abbreviations, legal suffixes, DBA names | Normalize to official name, store variants as aliases |
| Phone | Mixed formats, extensions, country codes | E.164 format (+1XXXXXXXXXX) |
| Address | Inconsistent formatting, abbreviations | USPS standardization or Google Maps API |
| Country | Mix of codes and names | ISO 3166-1 alpha-2 codes |
Enrichment automation
Set up ongoing enrichment to prevent decay:
- Trigger-based — enrich when a record is created or updated
- Scheduled — monthly/quarterly batch enrichment of all records
- Decay-based — re-enrich records older than X days
- Event-based — re-enrich when a contact's company has a news event (funding, acquisition)
Step 3 — Platform-specific guidance
In ZoomInfo (OperationsOS)
ZoomInfo OperationsOS is purpose-built for CRM data management at scale.
Deduplication:
- OperationsOS identifies duplicates across contacts, leads, and accounts using fuzzy matching
- Configurable match rules: email, name+company, phone, domain
- Bulk merge with configurable "winning record" rules
- Cross-object dedup: find leads that already exist as contacts
Data orchestration:
- Build automated workflows: new record → match existing → enrich → normalize → route
- Configure matching rules to prevent duplicates before they're created
- Set up enrichment triggers on record creation, field change, or scheduled interval
- Normalization rules for job titles, industries, company names
Data decay management:
- Auto-detect job changes and company changes
- Flag stale records based on last-enriched date
- Configure re-enrichment schedules (monthly recommended for active pipeline)
- Track data quality metrics over time
Setup: ZoomInfo admin → OperationsOS → Data Orchestration → Create Workflow.
In Salesforce (Native)
Duplicate Management:
- Setup → Duplicate Management → Duplicate Rules
- Standard rules: match on email, name+company, phone
- Custom matching rules for complex scenarios
- Block or alert on duplicate creation
Data.com Clean (if licensed):
- Batch cleaning of contacts and accounts
- Auto-enrichment on record creation
- Scheduled cleanups
Limitations: Native dedup is basic — no fuzzy matching, no cross-object dedup, no automated merge. For enterprise-scale, pair with DemandTools or ZoomInfo OperationsOS.
In HubSpot (Operations Hub)
Deduplication:
- Operations Hub includes AI-powered duplicate detection
- Suggests merge candidates with confidence scores
- Bulk merge with "primary record" selection
- Available on Operations Hub Professional+
Data quality automation:
- Programmable automation (Operations Hub Professional): custom code actions for normalization
- Data quality command center: monitor property completeness, formatting issues
- Automated formatting: capitalize names, standardize phone numbers, clean URLs
Limitations: HubSpot dedup is contact/company only — no custom object dedup. Formatting automation requires Operations Hub Professional ($800/mo+).
In Clay
- CRM enrichment & refresh: Import contacts/accounts from Salesforce, HubSpot, or Dynamics 365 into Clay tables. Run waterfall enrichment to fill missing fields and refresh stale data. Push updated records back via bidirectional sync.
- Automated data maintenance: Set up scheduled imports to regularly pull CRM records into Clay, re-enrich, and sync back. Keeps contact data fresh without manual effort.
- Duplicate detection: Use enrichment data (LinkedIn URLs, company domains, verified emails) to identify and flag duplicate records before syncing back to CRM.
- Data standardization: Use Sculptor workflows to normalize job titles, company names, industry classifications, and other fields. Apply consistent formatting before pushing to CRM.
- Plan gate: CRM sync requires Growth plan ($446-495/mo). Free/Launch users can export enriched data as CSV for manual CRM import.
- Best for: RevOps teams wanting to automate CRM data enrichment and cleanup on a recurring basis.
In LeanData / RingLead
LeanData:
- Lead-to-account matching (which leads belong to which accounts)
- Lead routing based on matching results
- Deduplication with merge automation
- Salesforce-native (runs inside SFDC)
RingLead (now ZoomInfo — acquired):
- Duplicate prevention on record creation
- Bulk dedup with configurable match rules
- Data normalization and standardization
- Works with Salesforce, HubSpot, Marketo
In DemandTools (Validity)
- Most powerful Salesforce dedup tool
- Scenario-based: build complex match/merge rules
- Mass operations: update, delete, deduplicate at scale
- Import management: clean data before it enters CRM
- Single Table Dedupe, Table-to-Table Dedupe (cross-object)
In Clearbit
Clearbit (now Breeze Intelligence in HubSpot) focuses on enrichment-driven data hygiene — standardizing and filling CRM records with firmographic and demographic data.
Enrichment-based cleanup:
- Enrich existing CRM records with standardized firmographic/demographic data
- Bulk enrichment via API or Breeze Intelligence in HubSpot
- Continuous data refresh: subscribe to enrichment updates when data changes ( parameter) — records stay current without manual re-enrichment
Normalization:
- Standardized industry codes (NAICS, GICS, SIC) — normalizes messy free-text industry fields
- Normalized role and seniority classifications — standardizes job titles across records into consistent categories
- Corporate hierarchy: parent company and ultimate parent domain fields help deduplicate subsidiary records
Data quality signals:
- flag identifies personal email addresses (gmail, yahoo) vs business emails — useful for filtering low-quality records
- Tech stack data helps identify outdated technology fields in CRM
Best for: Filling missing fields, standardizing industries and titles, flagging personal emails, and keeping records fresh with continuous enrichment. Pair with a dedup tool (ZoomInfo, DemandTools) for full hygiene coverage.
Step 4 — Actionable guidance
Quick wins (do these first)
- Remove obvious duplicates — exact email match dedup is safe and fast
- Fix formatting — standardize phone numbers, capitalize names, normalize countries
- Fill critical gaps — bulk enrich records missing email or phone
- Remove dead records — hard bounces, invalid emails, disconnected phones
Ongoing hygiene program
- Prevent duplicates at entry — enable duplicate rules on record creation
- Enrich on create — auto-enrich new records within minutes of creation
- Monthly dedup sweep — run fuzzy match dedup monthly, review and merge
- Quarterly refresh — re-enrich all active records every 90 days
- Annual purge — remove records with no activity in 12+ months (archive, don't delete)
Metrics to track
- Duplicate rate — % of records with duplicates (target: <2%)
- Field completeness — % of critical fields filled (target: 95%+)
- Bounce rate — email bounce rate on outbound (target: <3%)
- Data age — median days since last enrichment (target: <90)
- Merge rate — duplicates merged per month (should trend down over time)
Gotchas
-
Merge before you enrich — enriching duplicate records wastes credits. Dedup first, then enrich the surviving records.
-
Test dedup rules on a sample first — fuzzy matching can produce false positives (merging records that shouldn't be merged). Always review a sample of 50-100 merge candidates before running bulk operations.
-
Preserve lead source on merge — the most common post-merge complaint is losing original lead source attribution. Configure merge rules to keep the oldest record's lead source.
-
Don't delete — archive — instead of deleting stale records, move them to an archive status. Deleted records lose history; archived records can be reactivated if the contact returns.
-
GDPR and compliance — data hygiene must respect opt-out and consent records. Never re-enrich a contact who has opted out. Check compliance status before any bulk enrichment operation.
Related skills
- — Clay platform help
- — ZoomInfo platform help (for OperationsOS-specific setup)
- — Clearbit platform help (enrichment, reveal, prospector)
- — enrichment strategy across all providers
- — lead assignment and territory rules (often paired with dedup)
- — lead scoring models (depend on clean data)
- — connecting data tools to CRM
- — building prospect lists (data quality at the source)
- — Not sure which skill to use? The router matches any sales objective to the right skill. Install:
npx skills add sales-skills/sales --skills sales-do
Examples
Example 1: CRM data audit
User says: "Our Salesforce has 50,000 contacts and I suspect a lot of them are duplicates or outdated. Where do I start?"
Skill does: Walks through the data quality audit framework — measure completeness, accuracy, duplication rate, and decay. Recommends starting with exact-match email dedup (safest), then running a field completeness report, then sampling 100 records against LinkedIn to estimate accuracy.
Result: User has a data quality scorecard and prioritized cleanup plan.
Example 2: Setting up ongoing hygiene
User says: "We keep getting duplicates in HubSpot and our data goes stale within months. How do we automate this?"
Skill does: Recommends HubSpot Operations Hub for dedup + ZoomInfo or Clay for enrichment. Sets up duplicate prevention rules on creation, auto-enrichment for new records, and a quarterly re-enrichment schedule.
Result: User has an automated hygiene program that prevents duplicates and keeps data fresh.
Example 3: Pre-campaign data cleanup
User says: "We're about to launch a big outbound campaign to 10,000 contacts. How do I make sure the data is clean first?"
Skill does: Recommends a pre-campaign checklist: dedup the list, verify emails with a dedicated verification tool, re-enrich records older than 90 days, remove contacts at companies that no longer fit ICP, and check opt-out/DNC status.
Result: User launches campaign with verified, deduplicated, compliant data — lower bounce rate, higher deliverability.
Troubleshooting
Dedup merging wrong records
Symptom: Fuzzy match dedup merged two different people who happen to have similar names at the same company
Cause: Match rules too loose — matching on name + company without additional criteria
Solution: Tighten match rules: require email OR phone match in addition to name + company. Always run in "review" mode before "auto-merge" mode. Add title or department as a tiebreaker.
Enrichment not filling expected fields
Symptom: Auto-enrichment runs but many records still have empty phone or email fields
Cause: Single enrichment provider doesn't have coverage for all contacts. Coverage varies by geography, seniority, and industry.
Solution: Implement waterfall enrichment — try Provider A, if no result try Provider B, then Provider C. Use
for waterfall setup. Common waterfall: ZoomInfo → Apollo → Lusha.
Data quality metrics not improving
Symptom: Running monthly dedup and enrichment but duplicate rate and completeness aren't improving
Cause: New duplicates are being created faster than they're being merged. Root cause is usually web forms, imports, or integrations creating records without duplicate checks.
Solution: Fix the source — enable duplicate prevention rules on all record creation paths (web forms, API imports, manual creation, integration syncs). Prevention is more effective than cleanup.