Talend Helps the ICIJ Expose Hidden Wealth in the Paradise Papers
REDWOOD CITY, Calif.,
ICIJ used Talend to load more than 1.4 TB of unstructured data into a Neo4j graph database, with the Linkurious graph visualization platform used to organize and access the information. The data includes emails and Excel, CSV, and PDF documents containing text and images about companies and people who use a hidden system built to avoid paying taxes. ICIJ also used other open source tools to support its “Knowledge Center” and make the information searchable by reporters.
“Talend is our preferred solution when it comes to cleaning, transforming, and integrating the data we receive. It works as a crucial mechanism for enabling us to build a robust database,” said Pierre Romera, CTO at ICIJ. “Working with open source tools like Talend ensures security and reliability of data as our extensive network of investigative journalists review terabytes of files. Backed by an extensive community of contributors, open source solutions enable us to benefit from the latest innovations in data processing, extraction, and visualization.”
Cloud is also a central element of ICIJ’s data journey. The organization uses Amazon Web Services (AWS) to process all the data and broaden access to it. ICIJ set up temporary machines in AWS to parallelize data extraction: the organization uses Ubuntu, Tesseract, and an in-house tool called Extract to perform optical character recognition (OCR) and extract text from files.
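As a rough illustration of the parallel extraction step described above, the following Python sketch runs OCR over a batch of files with a pool of workers. It is not ICIJ's Extract tool, whose internals are not described here. The function names `ocr_with_tesseract` and `extract_all`, the worker count, and the use of a thread pool on a single machine (rather than several temporary AWS machines) are all assumptions made for the example. It also assumes the `tesseract` command-line binary is installed and on the PATH.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, Iterable


def ocr_with_tesseract(path: Path) -> str:
    """Run the Tesseract CLI on one image and return the recognized text.

    Assumption: the `tesseract` binary is installed and on PATH.
    Passing "stdout" as the output base makes Tesseract print the text.
    """
    result = subprocess.run(
        ["tesseract", str(path), "stdout"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def extract_all(
    paths: Iterable[Path],
    extractor: Callable[[Path], str] = ocr_with_tesseract,
    workers: int = 4,  # hypothetical default, tune to the machine
) -> Dict[str, str]:
    """Apply `extractor` to every path in parallel; map each path to its text."""
    paths = list(paths)
    # Threads are enough here: each worker mostly waits on a subprocess.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        texts = list(pool.map(extractor, paths))
    return {str(p): t for p, t in zip(paths, texts)}
```

Because the extractor is passed in as a parameter, the same loop could be pointed at a different OCR backend, or spread across several machines by giving each one its own slice of the file list.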
“Moving to the cloud was obvious due to the nature of our mission and the large volume of data we process. Cloud technology offers the scalability we need when we need it, so we can easily manage our workload. With a robust power for processing and security, AWS was the most suitable choice for us,” explained Romera.
The 13.4 million tell-tale documents were obtained by the German newspaper Süddeutsche Zeitung, which received data from two offshore services firms in countries ranging from Bermuda to Singapore, as well as from 19 corporate registries around the world. For about a year, ICIJ worked with hundreds of journalists and media partners to investigate and expose this new leak, which has had a significant impact on well-known individuals and large organizations.
“Since ICIJ revealed the Panama Papers leak in 2016 for which they won the Pulitzer Prize, we have seen how much data management and processing technologies can impact our society,” said Ciaran Dynes, SVP of Products, Talend. “We are pleased to support in-depth investigative journalism and those seeking meaningful insights from data.”
Talend (NASDAQ: TLND) is a global leader in cloud and big data integration solutions that helps companies turn data into a strategic asset that delivers real-time, organization-wide insight into customers, partners, and operations. Through its open, native, and unified integration platform, Talend delivers the data agility required for companies to meet the constantly evolving demands of modern business. With Talend, companies can easily scale their data infrastructure and rapidly adopt the latest technology innovations in cloud and big data. Talend’s solutions support over 1500 global enterprise customers including AstraZeneca, GE,
A photo accompanying this announcement is available at http://www.globenewswire.com/NewsRoom/AttachmentNg/6c799db9-a422-436e-9d80-42ef14d15a37
Siobhan Lyons, Director, Corp. Communications, Talend, 202-431-9411, firstname.lastname@example.org
Chris Taylor, VP, Corp. Communications, Talend, 408-674-1238, email@example.com