Mark Twain apparently said the following: “Data is like garbage. You’d better know what you are going to do with it before you collect it.”
Of course he was quite wrong. What he meant to say was “Data are like garbage …” and so on – very amusing. But his wrongness goes beyond pedantry. I say collect the data, and some day you may learn to understand them.
A few years ago I visited Geoscience Australia in connection with a data collection program being run by the Bureau of Meteorology, and I happened upon someone working on the search for MH370. They gave me some glimpses of the data they were retrieving, and I must say I was deeply impressed. It was not just the scale and precision of the effort that impressed me, but the devotion to the task of harvesting a knowledge asset whilst providing the necessary and somewhat grim service of searching for a lost airliner.
The emphasis on data harvesting is clear to anyone who takes a look at the dataset. You can see the search zone, which they covered in detail, and you can also see the trips to and from the search zone, where the ship continued to map the seabed just to fill out their understanding of the area.
I use this slide in my classes on Big Data to emphasise the point that collecting data is important and useful even if you don’t yet know how you are going to use it. Yes, I know that there are issues here. When data lakes first came to the fore, everyone rushed to grab all of the data, and the next thing we had was the so-called data swamp: lots of data and no way to get any value from it. But the solution to the data swamp problem is not to stop collecting. It is to make sure that you have everything you need to manage those data effectively.
Written by Trevor Christie-Taylor