Machine Learning and Data Visualization

I wrote this tweet based on a conversation in which I tried to explain why machine learning is the future of analytics products and data visualization, and why analytics services that do not begin to solve the machine learning problem today will go the way of the dodo tomorrow. In general, I think the tweet storm that followed was well received, but I thought it would be worth taking a few moments to expand on this topic.

The first point I tried to make was about the two main traps that almost all analytics services fall into:

  1. Data accuracy
  2. Data presentation

These are the main “pillars” of analytics systems, but they are also traps that you can fall into and get lost in.

In general, data accuracy doesn’t really matter. I mean, it matters in the sense that the numbers you’re looking at should be accurate, but many analytics services (in particular web analytics products) use what is known as “modeling” to generate estimated metrics from some sort of normalized data model. Yes, even Google Analytics numbers are modeled in some way and do not exactly represent the most up-to-date quantities or volumes of traffic.

So, given that almost all analytics services model their data sets somehow, what can you do to get the most accurate information? The simple answer, as I tweeted, is to ‘triangulate’ your position by signing up for three or more services and comparing the data you get back. Depending on the model each service uses, you can then work out that the true figure lies within some percentage, plus or minus some margin.
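To make the triangulation idea a bit more concrete, here is a minimal sketch, assuming you have already pulled the same metric (say, monthly visitors) from three or more services. The service names and numbers are made up for illustration, and using the mean plus the sample standard deviation as the “plus or minus” is just one simple way to express the spread, not something any particular service does.

```python
from statistics import mean, stdev


def triangulate(estimates):
    """Combine one metric as reported by several analytics services.

    Returns the mean of the estimates and their spread (sample standard
    deviation) expressed as a percentage of that mean.
    """
    if len(estimates) < 3:
        raise ValueError("triangulation needs 3 or more services")
    values = list(estimates.values())
    centre = mean(values)
    spread_pct = stdev(values) / centre * 100
    return centre, spread_pct


# Hypothetical monthly visitor counts reported by three services.
reported = {"service_a": 10_400, "service_b": 9_800, "service_c": 10_100}
centre, spread = triangulate(reported)
print(f"~{centre:,.0f} visitors, +/- {spread:.1f}%")
```

No single number here is “the truth”; the point is that the range the services agree on tells you more than any one of them does alone.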

As a result, the accuracy of any single service isn’t all that valuable if you’re clever enough to compare information from multiple data sources.

Data presentation, on the other hand, is usually an area where “too many cooks” can and do “spoil the broth”.

In my experience designing dashboards and information systems, the largest hurdle to making good data visualizations is really the internal bar for quality. I hate to say it, but there are a lot of data scientists out there who imagine themselves to be design experts as well, and who believe that simply providing copious volumes of data (as a data scientist might enjoy) is a sufficient means of conveying insight.

But again, this is a trap.

If you let presentation specialists (a.k.a. designers) do their jobs effectively, with usability testing as part of the process, it becomes much easier to develop best-in-breed analytics and data visualizations.

As an analytics provider, your job is not to focus on accuracy and visualizations. The real question to answer is: why are these two things important? When you are creating or building your own analytics product, the answer is that they exist to provide insight to the user.

There are numerous products out there that simply report information. I mean, to get traffic data, you could export your own server logs to CSV files, put those data points into Excel, and BAM! You have a report! But to be clear, a report of information is not the same as insight.
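Just to show how little it takes to get a “report” this way, here is a rough sketch that turns server log lines into a CSV file you could open in Excel, plus a hit count per page. It assumes your server writes the Common Log Format (many web servers can); the regular expression and the sample lines are illustrative, not taken from any particular product.

```python
import csv
import io
import re
from collections import Counter

# Matches Common Log Format lines, e.g.
# 127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def logs_to_csv(lines, out):
    """Write one CSV row per parsed request to `out`; return hits per path."""
    writer = csv.writer(out)
    writer.writerow(["host", "time", "method", "path", "status"])
    hits = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in Common Log Format
        writer.writerow([match["host"], match["time"], match["method"],
                         match["path"], match["status"]])
        hits[match["path"]] += 1
    return hits


sample = [
    '127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2023:13:56:01 +0000] "GET /pricing HTTP/1.1" 200 512',
    '10.0.0.3 - - [10/Oct/2023:13:57:12 +0000] "GET /index.html HTTP/1.1" 304 -',
    'this line is not a log entry',
]
buffer = io.StringIO()  # stands in for an open .csv file
hits = logs_to_csv(sample, buffer)
print(hits.most_common())
```

That is a perfectly good report, and it still says nothing about *why* the numbers look the way they do.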

To gain insight, you really need to start looking at data in comparison to other data! Analytics providers can start designing information systems around this idea of data comparison, rather than simply reporting numbers.
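One simple form of comparison is period over period. The sketch below, assuming two hypothetical weeks of metrics, computes the percentage change for each metric and ranks the metrics by how much they moved, so the biggest shifts surface first instead of being buried in a table of raw totals. It is plain arithmetic, not machine learning, but it shows the shift from reporting numbers to comparing them.

```python
def compare_periods(current, previous):
    """Return the percentage change for every metric present in both periods."""
    changes = {}
    for metric, now in current.items():
        before = previous.get(metric)
        if before:  # skip metrics that are missing or zero in the earlier period
            changes[metric] = (now - before) / before * 100
    return changes


# Hypothetical weekly numbers for a small site.
last_week = {"visits": 4_000, "signups": 80, "pageviews": 12_000}
this_week = {"visits": 4_400, "signups": 70, "pageviews": 11_400}

changes = compare_periods(this_week, last_week)
# Largest movement first, regardless of direction.
for metric, change in sorted(changes.items(), key=lambda item: abs(item[1]),
                             reverse=True):
    print(f"{metric}: {change:+.1f}%")
```

Here visits went up while signups went down, which is exactly the kind of contrast a raw report of totals would hide.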

But to really start analyzing data, we can look to intelligent computers and algorithms to help shape our understanding of the vast data sets at hand.

It’s very easy to start talking about this future, and I totally recognize that. It’s a far different thing to actually build a learning system that knows how to generate insights that are valuable to the user.

In reality, machine learning is a difficult problem, and getting such a system up and running takes non-trivial effort.

In closing, the sooner your service starts solving this problem, the better off your company will be, because starting today helps ‘future-proof’ the business against someone else who does it first! Furthermore, any service that doesn’t look to machine learning to advance its position will simply lag behind and may even become irrelevant.

In the end, the future of data visualization will rely on machine learning for all the reasons (and tweets) I have laid out here. If you have a different thought, feel free to reach out to me on Twitter at @machinehuman. Thanks for reading this!
