Trump, FBI and those emails

Trump, FBI, emails. You couldn’t make it up.

In yet another bizarre twist in the US presidential election, the FBI has backtracked on its accusations against Hillary Clinton over her emails, and Donald Trump has erupted in a cloud of steam – again.

“You can’t review 650,000 emails in eight days,” Trump said on Sunday in a campaign speech in Michigan hours after James Comey’s latest update to Congress came out. “You can’t do it, folks.”

Well, Donald, you can. In fact there are so many ways it can be done that they are too numerous to go through here. But just remember an even bigger mass of data: the Panama Papers, 11.5 million documents leaked from Mossack Fonseca, a Panamanian law firm, and made public in April 2016. By Trump’s reckoning, given the number of documents and their likely size, it would take an army of people to analyse them, and it would probably take them hundreds of years. Instead, the journalists who performed the analysis used some Apache project software, notably Solr and Tika. Tika extracted the text from the documents, whatever their formats, and Solr indexed that text using Lucene. Together these systems enabled journalists to search the terabytes of data for keywords, names, banks and countries in real time, with responses in milliseconds.
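To see why an index makes searching so quick, here is a toy sketch in Python. It is not Solr or Lucene, and it is not what the journalists ran; it is just a tiny inverted index (a map from each word to the documents containing it) built with the standard library. The document names and contents are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map every word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every word in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Start with the matches for the first word, then narrow down.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, invented for this example.
docs = {
    "memo-1": "Shell company registered in Panama with an account at Example Bank",
    "memo-2": "Invoice for legal services, payable in Zurich",
    "memo-3": "Transfer from Example Bank to a trust in Panama",
}

index = build_index(docs)
print(sorted(search(index, "panama bank")))
```

The slow part, reading every document, happens once when the index is built. After that a query is a handful of dictionary lookups and set intersections, which is roughly why a real search engine can answer in milliseconds even over millions of documents.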

Now the FBI has probably got one or two clever people working there who know a bit about IT and might well be using these technologies. They haven’t confirmed that they used these systems, but you can bet they have the wherewithal to search and analyse more than 650,000,000 emails in a few days, let alone 650,000. By the time they’d filtered down to just the emails to and from Hillary Clinton and got rid of duplicates and repeats from threads, they probably had a few thousand emails left. Easy or what? Edward Snowden – remember him? – reckons they could have done it in a few hours on an old Windows laptop!
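For the curious, the filter-and-dedupe step might look something like the Python sketch below. Nobody outside the FBI knows what they actually ran, so treat this as a guess at the general idea: keep only messages that involve one address, then hash each message and throw away any whose hash has been seen before. The addresses, the sample messages and the function names are all made up for illustration, and it only handles simple plain-text messages.

```python
import hashlib
from email import message_from_string
from email.utils import getaddresses


def addresses(msg):
    """Collect every address in the From, To, Cc and Bcc headers."""
    fields = []
    for header in ("From", "To", "Cc", "Bcc"):
        fields.extend(msg.get_all(header, []))
    return {addr.lower() for _, addr in getaddresses(fields) if addr}


def fingerprint(msg):
    """Hash the subject and body, ignoring case and extra whitespace."""
    subject = " ".join(str(msg.get("Subject", "")).split()).lower()
    body = " ".join(str(msg.get_payload()).split()).lower()
    return hashlib.sha256((subject + "\n" + body).encode("utf-8")).hexdigest()


def relevant_unique(raw_messages, target):
    """Keep one copy of each message sent to or from the target address."""
    seen = set()
    kept = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        if target.lower() not in addresses(msg):
            continue  # not to or from the person we care about
        digest = fingerprint(msg)
        if digest in seen:
            continue  # an exact repeat of something already kept
        seen.add(digest)
        kept.append(msg)
    return kept


# Hypothetical sample mailbox, invented for this example.
RAW = [
    "From: alice@example.com\nTo: target@example.com\nSubject: Lunch\n\nSee you at noon.\n",
    "From: alice@example.com\nTo: target@example.com\nSubject: Lunch\n\nSee you   at noon.\n",
    "From: bob@example.com\nTo: carol@example.com\nSubject: Other\n\nNot relevant.\n",
    "From: target@example.com\nTo: alice@example.com\nSubject: Re: Lunch\n\nFine.\n",
]

kept = relevant_unique(RAW, "target@example.com")
print([m["Subject"] for m in kept])
```

Hashing is cheap, so even hundreds of thousands of messages go through a loop like this quickly on modest hardware. Spotting near-duplicates, such as a reply that quotes the whole thread, needs something cleverer than an exact hash, but the principle is the same.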

I’ll bet they had plenty of time to enjoy a beer or two while they watched the Cubs game, too.

