Exploring data – find and retrieve
Posted on August 9, 2016
We’re looking at data insights and communications. Basically, that means asking a question, exploring data to find answers, and then explaining the results through visualizations with audience-specific context. We clarified the difference between exploring data (visually searching for the insights it holds) and explaining data (visually communicating those insights to a variety of audiences).
Recap five steps of data exploration
Our five steps were outlined in the data exploration post. We subsequently took a deep dive into the first step, ask a question. Remember, it always starts with a question.
- Ask a question
- Gather the data
- Select your tools
- Format the data
- Explore the data
Gather the data
After formulating a question (or several questions), it’s time to find and retrieve the data that will answer it.
Questions requiring publicly accessible data
Public questions, such as those about climate, the economy, or demographics, have well-vetted data sets that can be retrieved from official or well-reviewed sources. Never forget to cite your sources, and make sure you’re confident the source of the data is legitimate and objective. Nothing is worse than going through a thorough analysis just to have your data source deemed questionable. Using questionable data is akin to the old computer adage, “GIGO”: garbage in, garbage out. It’s useless and it undermines any work you’ve done with it. An excellent public, academic, or government reference librarian can help guide you to objective data sources. Take your time with this.
Questions requiring internal, confidential or sensitive data
Questions about your enterprise, such as those about customer satisfaction, profitability, process efficiency, or safety, can only be answered from the data being captured within your organization. If the question has never been asked before or isn’t asked regularly, this can be a challenge. Don’t underestimate the difficulty you may encounter getting the data; it may also be hard to acquire because it’s deemed sensitive or confidential. Take your time with this too. The same old computer adage, GIGO, applies.
Does the sales department collect satisfaction results? Consistently, not only anecdotally? From all customers, not just complaints from dissatisfied customers or testimonials from happy ones? Is the data captured by an independent third party so clients are more likely to be truthful? Do the customers have an incentive to be truthful?
Does operations collect data on production activities? Consistently, not just when there seems to be a problem or bottleneck?
Is the data in a form that you can handle for exploration? Increasingly, data is collected electronically, but many paper forms are still used when people survey other people, and even clipboards at critical points in a production line or during safety inspections. How is the penmanship of the person who recorded the data? Was that number a 1 or a 7? Is that a decimal point or a comma (. or ,)? Was the person recording using United States or European conventions?
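To make the decimal-versus-comma question concrete, here is a minimal Python sketch. The function name parse_number and the "us"/"eu" labels are my own, invented for illustration, and the sketch assumes you already know which convention the recorder used. It only handles the two common patterns (1,234.56 and 1.234,56), not every regional variation.

```python
from decimal import Decimal


def parse_number(text, convention):
    """Parse a recorded number under a known convention (hypothetical helper).

    convention is "us" for 1,234.56 or "eu" for 1.234,56.
    """
    cleaned = text.strip().replace(" ", "")
    if convention == "us":
        # Commas are thousands separators; the period is the decimal mark.
        cleaned = cleaned.replace(",", "")
    elif convention == "eu":
        # Periods are thousands separators; the comma is the decimal mark.
        cleaned = cleaned.replace(".", "").replace(",", ".")
    else:
        raise ValueError(f"unknown convention: {convention!r}")
    return Decimal(cleaned)


# The same characters on the form mean very different values.
as_us = parse_number("1,234", "us")  # one thousand two hundred thirty-four
as_eu = parse_number("1,234", "eu")  # a little more than one
```

The point is that "1,234" cannot be interpreted from the characters alone. You have to find out how it was recorded before you load it.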
Found and retrieved, what’s next?
Not so fast. As you can imagine, there are a lot of areas to consider when you find the data you believe will answer the questions and retrieve it in a form that can be explored. At the end of this step you will have access to the data you will soon explore. Ideally you’ll have an electronic database containing the data structured in a way you can examine it easily. Take a look at your data set. Look for anomalies. Are any data points missing? If so, why? If you can’t visually take a cursory look, that’s okay. The initial exploration steps will turn up problems in data that you’ll have to investigate in later steps.
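If the data is already electronic, a cursory look for missing data points can be sketched in a few lines of Python. Everything here is an assumption for illustration: the sample rows, the column names, and the list of markers treated as "missing" are made up, so adjust them to whatever your own data set actually uses.

```python
import csv
import io

# Hypothetical placeholders that people often type when a value is unknown.
MISSING_MARKERS = {"", "na", "n/a", "null", "none", "-"}


def count_missing(csv_text):
    """Return (row_count, {column: missing_count}) for CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    missing = {name: 0 for name in fieldnames}
    total = 0
    for row in reader:
        total += 1
        for name in fieldnames:
            value = row.get(name)
            # None means the row was shorter than the header.
            if value is None or value.strip().lower() in MISSING_MARKERS:
                missing[name] += 1
    return total, missing


# Made-up sample data, not from any real survey.
SAMPLE = """customer_id,satisfaction,region
101,4,East
102,,West
103,N/A,
104,5,East
"""

total_rows, missing_by_column = count_missing(SAMPLE)
```

A count like this only tells you where the gaps are. Why they are there is still a question for the people who collected the data.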
Generally, this is not the fun and interesting part of data analysis. But I would assert it is one of the most critical steps. The next most critical, in my opinion, is step 4, formatting the data. Spending time to do these steps well will serve your future exploration well.
photo credit: Pixabay CC0 Public Domain