Exploring data – format data
Posted on August 15, 2016
We’re looking at data insights and communications: asking a question, exploring data to find answers, and then explaining the results through visualizations with audience-specific context. Treating insights and communications separately, we started with gaining insights by exploring the data.
Recap five steps of data exploration
Our five steps were outlined in the data exploration post. We subsequently took a deep dive into the first step – ask a question, the second step – gather the data, and the third step – select your tools.
- Ask a question
- Gather the data
- Select your tools
- Format the data
- Explore the data
In this post we look at formatting data to get it ready to use with the tools selected.
Format the data
The tool(s) you’ve selected to analyze and visualize the data, as you look for answers to your question and find the story, will determine how the data must be structured. For most people this is the least interesting part of the exploration process. At this point, you are making sure the data is in a form the tool can process, so that it can show you patterns (or the lack of patterns). Furthermore, if one tool handles certain types of graphical analysis and its output doesn’t tell you anything, you may need to switch to a different tool and restructure or reformat your data so that the next tool can process it.
This step is critical, and can take quite a bit of time. Don’t become frustrated or give up! Remember, this is the exploration phase. You are searching for something, and truth be told, there may not be a story – and that IS the story (if it’s true).
Some of the data formats you’ll encounter are:
- delimited text (comma delimited text files are quite common)
- JSON (machine readable, and allegedly human readable)
- XML (Extensible Markup Language)
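To make the three formats concrete, here is a minimal sketch that reads the same two records from each one using only the Python standard library. The sample data, the field names (city, population), and the helper functions are invented for illustration; your tool of choice may do this loading for you.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample data: the same two records in each format.
CSV_TEXT = "city,population\nSpringfield,30000\nShelbyville,25000\n"
JSON_TEXT = (
    '[{"city": "Springfield", "population": 30000},'
    ' {"city": "Shelbyville", "population": 25000}]'
)
XML_TEXT = (
    "<cities>"
    "<city><name>Springfield</name><population>30000</population></city>"
    "<city><name>Shelbyville</name><population>25000</population></city>"
    "</cities>"
)


def from_csv(text):
    # csv.DictReader yields every value as a string, so convert numbers explicitly.
    return [
        {"city": row["city"], "population": int(row["population"])}
        for row in csv.DictReader(io.StringIO(text))
    ]


def from_json(text):
    # JSON carries its own basic types, so the numbers arrive as int already.
    return [
        {"city": item["city"], "population": item["population"]}
        for item in json.loads(text)
    ]


def from_xml(text):
    # XML element text is always a string, so convert numbers explicitly.
    root = ET.fromstring(text)
    return [
        {"city": node.findtext("name"), "population": int(node.findtext("population"))}
        for node in root.findall("city")
    ]


records = from_csv(CSV_TEXT)
# All three loaders should produce the same list of dictionaries.
assert records == from_json(JSON_TEXT) == from_xml(XML_TEXT)
```

Whatever the source format, the goal is the same: get the records into one consistent structure that the analysis tool can read.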
There’s not much more to say about data formatting, but it is very important. Remember GIGO (garbage in, garbage out): the data has to be formatted so that the analysis tools can interpret it correctly. You (the analyst) must be confident that you understand the exploration results and that they make sense. A formatting error, such as the tool misreading a data field as the wrong variable, can wreak havoc on your analysis. Take your time and do this part of the data exploration phase carefully.
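One common way a field gets misread is automatic type guessing. The sketch below assumes a hypothetical file with a ZIP code column and an order count column; the column names, the sample values, and both loader functions are made up for illustration. A loader that converts anything that looks numeric silently drops the leading zero from a ZIP code, while a loader given an explicit type for each column keeps it intact.

```python
import csv
import io

# Hypothetical sample: ZIP codes are identifiers, not quantities.
CSV_TEXT = "zip,orders\n02134,12\n10001,7\n"


def naive_load(text):
    # Converts anything that looks like a number, which corrupts the ZIP codes.
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({k: int(v) if v.isdigit() else v for k, v in row.items()})
    return rows


# Explicit schema: each column name maps to the converter it should use.
SCHEMA = {"zip": str, "orders": int}


def typed_load(text, schema):
    # Applies the declared converter to each column instead of guessing.
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({name: convert(row[name]) for name, convert in schema.items()})
    return rows


print(naive_load(CSV_TEXT)[0])          # {'zip': 2134, 'orders': 12}
print(typed_load(CSV_TEXT, SCHEMA)[0])  # {'zip': '02134', 'orders': 12}
```

Spot-checking a few rows after loading, as the two print calls do here, is a cheap way to catch this kind of problem before it reaches your analysis.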
Data formatted, check! Can we finally explore?
Next we finally get to use the tool(s) selected with the prepared data. We look at the actual exploration process in the next post in this category.
photo credit: Pixabay CC0 Public Domain