Exploring data – start with a question
Posted on August 3, 2016
Earlier this week I wrote about the two purposes for visualizing data. This post will offer the list of steps that I take when beginning to explore a dataset.
My five steps of exploring data
While I’d love nothing more than to start digging right into a set of data, doing exactly that has led to significant frustration for me. This series of steps helps me get out of the weeds and gives me an orderly way to approach an analysis. Here’s my process:
- Ask a question
- Gather the data
- Select your tools
- Format the data
- Explore the data
Ask a question
It all starts with a question. In Visualize This, Nathan Yau remarks that data can be boring if you don’t know what you’re looking for, or don’t know that there’s something to look for in the first place (p. 2). Asking a question means you’ve decided there is something to look for, and you’re formulating the basis for retrieving and exploring the data to answer it.
I was on holiday in northern Europe for nearly a month earlier this summer. To pack for the trip, I asked the question, “what’s the temperature in northern Europe expected to be for the month of July?” Where did I gather the data to answer the question? What tools did I select for analysis? How did I format the data to be used with the tools? How did I explore the data? This is a simple example, but it all started with the question. For this analysis I combined many of these steps and packed a single bag of spring clothing with one jacket and a small umbrella.
Are you asking the right question?
I felt freezing cold for three of the four weeks. The predictive data I found told me the temperature would be in the high 60s (Fahrenheit) and that it would rain. On the cruise portion, in the North Sea and Baltic Sea, temperatures reached the 50s at best, with high winds that made me wonder what the windchill factor was. When we arrived in Oslo, Norway – which I expected to be the coldest location – it was sunny and balmy, in the mid-70s. A better question would have been, “what is the wind (and windchill) over the North Sea during July?” We could even ask, “what’s the best clothing to take on a North Sea cruise in July?”
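To put a number on that windchill question, here is a minimal sketch using the US National Weather Service wind chill formula, which takes air temperature in Fahrenheit and wind speed in miles per hour. The function name and the sample readings are hypothetical, chosen only for illustration; they are not measurements from the trip. The formula is only defined for temperatures at or below 50 °F and winds of at least 3 mph, so the sketch simply returns the air temperature outside that range.

```python
def wind_chill_f(temp_f: float, wind_mph: float) -> float:
    """Approximate wind chill in °F (US National Weather Service formula).

    Outside the formula's valid range (temp > 50 °F or wind < 3 mph),
    the air temperature is returned unchanged.
    """
    if temp_f > 50 or wind_mph < 3:
        return temp_f
    v = wind_mph ** 0.16
    return 35.74 + 0.6215 * temp_f - 35.75 * v + 0.4275 * temp_f * v


# Hypothetical readings: 50 °F air with increasingly strong wind.
for wind in (10, 20, 30):
    feels_like = wind_chill_f(50, wind)
    print(f"50 °F air, {wind} mph wind: feels like {feels_like:.1f} °F")
```

Even this small calculation shows why the wind-focused question is the better one: the same air temperature feels noticeably colder as the wind picks up.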
The question above is a personal example, but you can imagine the questions your business or team might ask. Examples include “what products produced the most revenue for our business during the spring quarter?” or “do we have the most efficient process for delivering our best-selling product?”
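As a rough illustration of how a question like the revenue one dictates the data you need, the sketch below answers it for a tiny, made-up set of sales records. Everything here is an assumption for the sake of example: the record layout (product, sale date, revenue), the product names, the figures, and the choice of April 1 through June 30 as the “spring quarter”.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sales records: (product, sale date, revenue). All values are made up.
sales = [
    ("widget", date(2016, 4, 10), 120.0),
    ("gadget", date(2016, 5, 2), 300.0),
    ("widget", date(2016, 6, 20), 80.0),
    ("gadget", date(2016, 8, 1), 500.0),  # outside the quarter, so ignored
    ("gizmo", date(2016, 4, 25), 50.0),
]


def revenue_by_product(records, start, end):
    """Total revenue per product for sales dated start..end, highest first."""
    totals = defaultdict(float)
    for product, sold_on, revenue in records:
        if start <= sold_on <= end:
            totals[product] += revenue
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Assumed definition of the "spring quarter": April 1 through June 30.
ranking = revenue_by_product(sales, date(2016, 4, 1), date(2016, 6, 30))
for product, total in ranking:
    print(f"{product}: {total:.2f}")
```

Notice that the question already told us which fields to collect (product, date, revenue) and which date range to filter on before any exploring began.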
The question informs data retrieval
The weather-related questions, even the predictive question I actually asked, are the types I generally think about. Being trained in engineering, I tend to think in questions that lend themselves to being answered by data from official, peer-reviewed sources. The next post will cover data gathering, but for now I’ll just tease the fact that crowd-sourced data can be just as legitimate as official, peer-reviewed sources. As long as the source can be identified, and decision-makers are made aware of the resource(s) – go for it! In this case, I might have found data that allowed me to make better decisions if I had considered asking questions regardless of the potential data source.
What are your questions?
What questions do you have that require you to explore data? Do you think only in questions that have old-school data sources? Do you limit your questions?
photo credit: Pixabay CC0 Public Domain