Data experts help expose offshore leaks
Last year, the International Consortium of Investigative Journalists (ICIJ) in Washington received a huge data set of 2.5 million documents on tax havens from unknown sources. The documents contain 130,000 names of people from 170 countries suspected of fraud, among them oligarchs, arms dealers and criminal financial investors. The data set also included more than two million emails and lists of 122,000 covert companies and trusts in those tax havens. The unprecedented research that followed brought together media outlets from 46 countries, which set out to check the data. Here in Germany, the Süddeutsche Zeitung, a leading newspaper, was involved in analyzing the data. In this post, editor Bastian Brinkmann describes how data experts helped to analyze the enormous volumes of data.
The beginning was analogue. Ironically, the hard disk containing leaked data on offshore service providers in tax havens came by post. To be precise, the disk contained 260 gigabytes of secret data, roughly the equivalent of 500,000 printed Bibles. No one could read all that in a lifetime.
The ICIJ faced a huge challenge. How can such a huge data set be analyzed? And, above all, how can you possibly analyze such different types of data: images, encrypted data and more than two million emails? Data experts had to take action even before journalistic research could begin.
The volume of leaked data is enormous. It is about 150 times bigger than the biggest previously published leak, the archive of embassy dispatches released by Wikileaks. Moreover, all those dispatches shared a single format and could therefore be analyzed in a standard way. The offshore hard disk, on the other hand, held a mix of formats: company databases, emails, Word documents, scans and correspondence saved as PDFs. Many of those documents appear two or more times in the data set because they were forwarded as email attachments from one recipient to another.
Identifying duplicates was just one of the challenges the data experts faced. Many documents were saved as images, including copies of the passports of founders of covert companies, which had been mailed to tax havens. Other documents, containing instructions from the actual company owners to the sham CEOs, had first been printed out, signed and then scanned. Such documents were digitized with the help of OCR (optical character recognition), which transforms images into text so that they become machine-readable.
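To illustrate the duplicate problem described above: a common way to find byte-identical copies, such as the same attachment forwarded in several emails, is to compare cryptographic hashes of the file contents. The following is a minimal sketch in Python, not the method the ICIJ's experts actually used, which the article does not describe. The file names and contents are hypothetical sample data.

```python
import hashlib
from collections import defaultdict


def content_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a document's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def group_duplicates(documents: dict[str, bytes]) -> list[list[str]]:
    """Group document names whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for name, data in documents.items():
        by_digest[content_digest(data)].append(name)
    # Keep only digests shared by more than one document.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Hypothetical sample: the same attachment forwarded in two emails.
sample = {
    "mail_001/contract.pdf": b"%PDF-1.4 trust deed",
    "mail_017/contract.pdf": b"%PDF-1.4 trust deed",
    "mail_042/passport.jpg": b"\xff\xd8 scanned passport",
}

duplicates = group_duplicates(sample)
print(duplicates)
```

Exact hashing only catches identical files. A scan and a re-scan of the same page would differ at the byte level and would need fuzzier comparison, for example on the text recovered by OCR.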
Finally, the data set was indexed so that search engines could find specific data. That was a success. The program dtSearch can be given a list of names and search for them across the 260-gigabyte data set. Another program, Nuix, can use keywords to identify documents written in German. It can also discover connections between different pieces of data, for example between an attached PDF file and an email exchange among several people within a certain time period. The Securities and Exchange Commission in the US, for example, uses Nuix when it has seized millions of emails from companies under suspicion.
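The idea of indexing a data set once and then searching it for a list of names can be sketched with a simple inverted index, which maps every word to the documents that contain it. This is only an illustration of the general technique, not a description of how dtSearch or Nuix work internally. The document IDs, texts and names below are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: token -> set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search_names(index: dict[str, set[str]], names: list[str]) -> dict[str, set[str]]:
    """For each name, return the documents that contain all of its tokens."""
    hits = {}
    for name in names:
        tokens = tokenize(name)
        if not tokens:
            hits[name] = set()
            continue
        postings = [index.get(token, set()) for token in tokens]
        hits[name] = set.intersection(*postings)
    return hits


# Hypothetical documents, e.g. email bodies and OCR output from scans.
docs = {
    "email_1": "Please forward the invoice to John Example.",
    "scan_7": "Trust deed signed by EXAMPLE, John",
    "email_2": "Meeting moved to Friday.",
}

index = build_index(docs)
results = search_names(index, ["John Example", "Maria Sample"])
print(results)
```

The sketch matches a document when all parts of a name occur anywhere in it, regardless of order, so it also finds "EXAMPLE, John". The price is occasional false positives, which a journalist would still have to check by reading the document.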
In the meantime, programmers recreated the software used by the offshore service providers. This made it possible to click through the register of companies just as the service providers did and to answer many fundamental questions: Who is the actual founder of a particular trust? Who is the contact person? Has the person been charged? To which address was the invoice sent? Only in this way was it possible to shed light on offshore deals and interdependencies.
Gunther Sachs: Playboy in tax haven
For instance, it took the Süddeutsche Zeitung months to investigate the financial affairs of the German industrialist and playboy Gunther Sachs, both within the data set and in real life. In the end, the intricate offshore scheme he had set up was visualized in a relatively simple way, which nevertheless took a lot of effort.
In the case of Gunther Sachs, the technical details were handled by the data experts Sebastian Mondial from Germany, Duncan Campbell and Matthew Fowler from the UK, as well as Rigoberto Carvajal and Matthew Caruana from Costa Rica. After the basic technical work had been done, the ICIJ decided to distribute further tasks worldwide, since the sheer volume of data made it almost impossible for a small team to analyze. All in all, 86 journalists in 46 countries took part in the evaluation, among them the Süddeutsche Zeitung and the public radio and television broadcaster NDR in Germany, the Washington Post in the USA, Le Monde in France and the Guardian in the UK.
The International Consortium of Investigative Journalists, a project of the Center for Public Integrity in Washington that is primarily financed by foundations in the USA, acted as the research coordinator. A systematic analysis of the data has shown that it includes documents about more than 122,000 covert companies and trusts based in the British Virgin Islands, Cook Islands, Cayman Islands, Labuan Island and the Seychelles, as well as in Samoa, Hong Kong, Singapore and Mauritius.
The documents expose 12,000 middlemen offering these offshore structures and contain data on about 130,000 people with addresses in 170 countries. Each of these numbers can reveal a story. Work on the data set has only just begun.
Translated by Natalia Karbasova
Links:
Original publication in the Süddeutsche Zeitung: Wie Computer-Forensik das Offshore-System entschlüsselte
Secrecy for sale: Inside the global offshore money maze