Building a Centralized AML Database for Accurate Watchlist Compliance Screening

5M+

Individual & entity records unified

30K+

Global sources connected

1M+

Adverse media records collected
Design for a centralized and reliable AML database

AML Database for Watchlist Screening


Customer Overview

Our client is a US-based FinTech development company building solutions for financial institutions operating under AML (Anti-Money Laundering) regulations. These institutions must screen individual and entity customers and their transactions against sanctions lists, PEP (Politically Exposed Person) and FEP (Financially Exposed Person) databases, UHNWI (Ultra High Net Worth Individual) profiles, and adverse media to ensure they are not linked to crimes such as terrorist financing, fraud, narcotics trafficking, or other illicit activities. Failure to detect these risks can lead to penalties and reputational harm. Our client therefore decided to build a Watchlist Screening Solution, but effective screening depended on an up-to-date AML risk data foundation aligned with compliance standards set by FATF (Financial Action Task Force), OFAC (Office of Foreign Assets Control), FinCEN (Financial Crimes Enforcement Network), the UN Security Council, and the EU AMLD regulatory framework.

Project Overview

The client partnered with TenUp to build an AI-powered Watchlist Screening Solution supporting KYC verification, transaction screening, PEP and UHNWI assessments, ongoing monitoring, and STR (Suspicious Transaction Reporting) investigations across individual and entity customers. From the outset, the effectiveness of these workflows depended on access to complete, continuously refreshed AML risk intelligence from global watchlists and media sources. To meet this requirement, they engaged TenUp to build a centralized AML risk intelligence database that serves as the authoritative data foundation powering all screening, sanctions compliance, adverse media checks, and due diligence operations.

Challenges

Building a unified, accurate, and continuously refreshed risk database to ensure reliable AML screening of individual and entity customers.

  • Fetching and consolidating data across three distinct AML data classes — global sanctions lists, custom-built PEP/FEP/UHNWI datasets, and continuously arriving adverse media intelligence.
  • Aggregating thousands of fragmented sources from national authorities and international organizations worldwide, such as Interpol, that provide data in inconsistent formats (PDF, images, HTML, XML, JSON, Word).
  • Creating custom PEP, FEP, and UHNWI datasets from scratch, parsing thousands of political party sites, government office-holder listings, public disclosures, and financial exposure sources worldwide across multiple languages and jurisdictions.
  • Resolving real-world identity complexity across aliases, transliterations, family and close associates, beneficial ownership chains, and layered entity control structures through advanced entity resolution techniques.
  • Extracting information from websites with varied and frequently changing layouts, including both traditional and dynamic SPA frameworks, while bypassing anti-scraping blocks and ensuring consistent, error-free data capture.
  • Scaling secure correlation across millions of person, entity, media, and event records to enable fast discovery of connected risks, end-to-end investigative traceability, and improved identity-matching accuracy.
  • Continuously monitoring source websites for updates and changes, and prioritizing ingestion cycles to ensure all updates were captured without delay or data latency.
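The alias-resolution challenge above can be sketched as fuzzy name matching. This is a minimal illustration using Python's standard library, assuming a simple record shape; production entity resolution would add transliteration tables, phonetic keys, and corroborating attributes such as date of birth and nationality.

```python
# Minimal sketch of alias-aware identity matching; thresholds and the
# record shape are illustrative assumptions, not the production logic.
import difflib
import unicodedata

def normalize(name: str) -> str:
    """Strip accents, lowercase, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return " ".join(ascii_only.lower().split())

def best_alias_score(candidate: str, aliases: list[str]) -> float:
    """Return the highest similarity between a candidate name and any known alias."""
    cand = normalize(candidate)
    return max(
        difflib.SequenceMatcher(None, cand, normalize(alias)).ratio()
        for alias in aliases
    )

def is_probable_match(candidate: str, aliases: list[str], threshold: float = 0.85) -> bool:
    return best_alias_score(candidate, aliases) >= threshold

# Usage: screen an inbound customer name against a watchlist entry's aliases.
watchlist_aliases = ["Jose Martinez", "José Martínez", "J. Martinez"]
print(is_probable_match("Jose Martìnez", watchlist_aliases))  # True
```

Accent stripping handles the transliteration case shown; a real pipeline would also compare structured fields before flagging a match, since pure string similarity alone produces false positives at scale.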

Solution

TenUp built a centralized AML database providing a unified foundation for sanctions, PEP/FEP/UHNWI, and adverse media intelligence to enable accurate KYC, transaction monitoring, risk scoring, EDD, and STR workflows, while supporting watchlist enrichment and regulatory compliance.

  • Built web scraping and ingestion workflows using Java, Python, and a distributed parallel-processing framework (Akka) to aggregate all three classes of data from thousands of global sources in diverse formats on scheduled refresh cycles, storing all inputs in a SQL database.
  • Collected data for sanctioned and other high-risk individuals listed on global watchlists, including names, aliases, dates of birth, nationality, place of birth, identifying marks, and watchlist classifications, enabling identity matching for sanctions screening.
  • Built PEP/FEP/UHNWI databases by scraping sources across languages for details like political positions, tenure, family, and associates for PEPs, and source of wealth, ownership, transaction funding, and multi-jurisdictional links for FEP and UHNWI.
  • Created entity profiles by collecting registration status, sanctions exposure, beneficial ownership, parent–subsidiary links, control networks, and industry/geography-based risk factors for onboarding, lending, investment, and transaction checks.
  • Aggregated adverse media and intelligence data from news, reports, publications, journals, and other sources, normalizing and linking it with individual and entity profiles to create unified risk records.
  • Deployed OpenSearch for full-text indexing and relationship mapping across persons, entities, media, and events, enabling fast search, cross-record correlation, and analyst-led investigations across the risk database.
  • Exposed the adverse media risk dataset through secure APIs and integrated it into the Watchlist Screening Solution to enable real-time screening and continuous monitoring within KYC, transaction screening, and STR workflows.
  • Built modular scraping logic with pattern-based selectors and rapid update workflows to adapt extraction rules as website layouts and DOM structures changed frequently.
  • Implemented automated framework detection to route collection through lightweight HTTP parsers for static sites or headless browser automation for JavaScript-rendered SPAs.
  • Leveraged TOR network routing, VPN tunneling, rotating proxy pools, session management, user-agent cycling, request throttling, and fingerprint randomization to bypass anti-scraping controls while ensuring consistent data capture.
  • Implemented website change detection using the open-source framework changedetection.io to monitor source updates and trigger priority ingestion workflows, ensuring timely content refresh and minimal data latency.
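The framework-detection routing described above can be sketched as a simple heuristic over fetched HTML. The marker strings and the routing labels below are illustrative assumptions, not the production heuristics; in practice the headless path would hand off to a browser-automation tool such as Playwright or Selenium.

```python
# Hypothetical sketch: decide whether a page needs headless-browser rendering
# (JS-rendered SPA) or a lightweight HTTP parser (static HTML).
SPA_MARKERS = ("data-reactroot", "ng-app", 'id="__next"', "window.__nuxt__")

def needs_browser(html: str) -> bool:
    """Heuristic: framework markers or a near-empty <body> imply client-side rendering."""
    lowered = html.lower()
    if any(marker in lowered for marker in SPA_MARKERS):
        return True
    # A body with almost no server-rendered content suggests a JS-driven page.
    body = lowered.split("<body", 1)[-1]
    return len(body) < 200

def collect(url: str, html: str) -> str:
    """Route collection to the appropriate extraction path."""
    if needs_browser(html):
        return f"headless:{url}"  # e.g. hand off to Playwright/Selenium
    return f"static:{url}"        # e.g. plain HTTP fetch + HTML parser

print(collect("https://example.org", "<html><body data-reactroot></body></html>"))
# headless:https://example.org
```

Routing on cheap signals first keeps the expensive browser pool reserved for the minority of sources that genuinely need it.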
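The source change detection above can be illustrated with content hashing, which is the core idea behind tools like changedetection.io: re-ingest a source only when its observed content digest changes. The class and source identifiers below are hypothetical, and real monitoring would also normalize away volatile page fragments (timestamps, ads) before hashing.

```python
# Hypothetical sketch of hash-based change detection driving priority ingestion.
import hashlib

class ChangeDetector:
    def __init__(self) -> None:
        self._digests: dict[str, str] = {}

    def has_changed(self, source_id: str, content: bytes) -> bool:
        """Return True (and record the new digest) when content differs from last seen."""
        digest = hashlib.sha256(content).hexdigest()
        if self._digests.get(source_id) == digest:
            return False
        self._digests[source_id] = digest
        return True

detector = ChangeDetector()
print(detector.has_changed("ofac-sdn", b"v1"))  # True: first observation
print(detector.has_changed("ofac-sdn", b"v1"))  # False: unchanged, skip ingestion
print(detector.has_changed("ofac-sdn", b"v2"))  # True: updated, trigger ingestion
```

Hashing makes the "changed?" check O(1) per source, so thousands of watchlist pages can be polled frequently while full re-ingestion runs only on actual updates.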

Benefits

TenUp’s centralized AML database, built to power the client’s AI-powered watchlist screening solution, offered the following advantages:

  • Aggregated and normalized data from 30K+ sources, consolidating 5M+ individual and entity records with measurable improvement in identity matching confidence and reduction in false positives.
  • Maintained continuous updates with automated ingestion workflows, ensuring near-real-time data freshness to support regulatory reporting and timely STR filing.
  • Handled high-volume data processing, enabling fast retrieval and linking of 7M+ individuals, entities, and events.
  • Exposed unified risk data as a service, supporting integration with our watchlist screening solution.

Technology

  • Java
  • Python
  • Akka
  • Crawlee
  • SQL Server
  • OpenSearch
  • TOR
  • VPN

Industry

  • FinTech
AML Dashboard flagging compliance risk

Conclusion

TenUp’s centralized AML database provides a unified, continuously updated foundation for sanctions, PEP/FEP/UHNWI, and adverse media intelligence that adheres to global AML regulatory mandates such as FATF, OFAC, FinCEN, and EU AMLD. By consolidating millions of individual and entity records, normalizing diverse global sources, and enabling fast retrieval and linking, this database ensures accurate and comprehensive risk data for AML screening workflows. Its API-based design allows seamless integration with the Watchlist Screening Solution, giving compliance teams reliable access to investigation-ready data and real-time risk updates. With this infrastructure, financial institutions can maintain up-to-date, high-quality risk information, strengthen KYC, EDD, and STR processes, and confidently manage regulatory obligations with reduced operational costs and faster due diligence turnaround.

Frequently asked questions

What is a centralized AML database and why do financial institutions need it?


A centralized AML database is a single source of sanctions, PEP, adverse media, and beneficial ownership data used to screen customers and transactions in real time. Financial institutions rely on it to improve detection accuracy, reduce false positives, and support compliance with standards and regulations like FATF, OFAC, FinCEN, and EU AMLD without delays or fragmented data.

How does a centralized AML database improve PEP and adverse media screening accuracy?


A centralized AML database improves screening accuracy by unifying sanctions, PEP, adverse media, and beneficial ownership data into a single risk profile, updated in real time. With entity resolution for aliases, multilingual news parsing, and AI-based relevance scoring, it helps reduce data gaps, lower false positives, and improve match confidence across jurisdictions.

Which regulatory bodies require ongoing screening using AML databases?


Ongoing AML screening is required by global regulators, including FATF, FinCEN, OFAC, FCA, EU AMLD, and the UN Security Council, which mandate continuous monitoring of sanctions, PEP updates, and adverse media, not one-time checks. These rules apply to banks, fintechs, payment providers, insurers, and virtual asset platforms.

How does an AML database support KYC, EDD, and STR filing?


An AML database strengthens KYC, EDD, and STR filing by centralizing verified identities, sanctions and PEP screening, beneficial ownership data, and adverse media into one risk profile. This allows compliance teams to score risk accurately, justify decisions with evidence, and submit well-documented STRs to regulators like FinCEN, FATF, and FIUs.

Why should FinTech and SaaS compliance platforms integrate AML data via API instead of manual ingestion?


FinTech and compliance platforms should use AML APIs because they deliver real-time sanctions and PEP updates and automatic risk scoring, support faster KYC onboarding, and reduce data gaps. Manual ingestion delays updates, increasing false positives and creating audit failures that regulators like FinCEN and FCA can penalize.

Can AI-powered AML databases reduce operational compliance costs for FinTechs?


Yes. AI-powered AML databases reduce compliance costs by automating risk scoring, entity matching, and adverse media correlation, while cutting false positives, accelerating onboarding, and limiting manual reviews. This helps FinTechs scale without hiring more analysts, while maintaining regulator-grade accuracy required by FATF, FinCEN, and FCA.

How should a CTO or engineering team evaluate AML data quality before integration?


A CTO should evaluate AML data quality by checking real-time update frequency, entity resolution accuracy, global sanctions/PEP/media coverage, availability of beneficial ownership, and API logging performance. Data must be normalized and consistently structured to ensure scalable, audit-ready compliance in production.
