Tectonic Trends in Data Analytics
Over the next few months, I’ll be posting a series of articles focusing on data analytics – defining it, describing it and looking at how folks are using it in the real world. Today I will set the stage by highlighting a few large-scale trends. A good analogy here is plate tectonics; in some areas, mountains of opportunity are being created, while in other areas, the gap between users and insights is widening.
1. Analytic maturity: structured -> semi-structured -> unstructured -> unified
As the data-driven economy matures, it is increasingly tackling more difficult data sources. Structured data is often less clean than you might hope, but its well-defined schemas help convey intent, and the techniques for working with it are well understood. Semi-structured data, like log files or other machine data, presents new challenges: first the structure must be inferred, and then the data itself – rarely clean, complete, or consistent – must be handled with newer analytic techniques. Unstructured data, however, often has only the structure inherent in human language. Today, text analytics techniques can identify and extract that structure, but there is still a long way to go before this is as commonplace as structured analytics.
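To make the semi-structured case concrete, here's a minimal sketch of "inferring" structure from a web-server-style log line with a regular expression. The log format and values are hypothetical; real machine data is rarely this tidy.

```python
import re

# Hypothetical web-server log line (Apache-style common log format).
line = '127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'

# The "schema" lives in the pattern, not the data: we impose structure by parsing.
pattern = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d+) (?P<size>\d+)'
)

match = pattern.match(line)
record = match.groupdict() if match else None
print(record)
```

Everything after this step – missing fields, malformed lines, drifting formats – is where the real work of semi-structured analytics begins.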
Perhaps even more valuable than any one data type is the ability to combine information from multiple sources and levels of structure, to provide a more holistic view of the business, customer, product or supply chain. Folks like Sue Feldman at Synthexis and Dave Schubmehl at IDC have called this “Unified Information Access,” a term I love.
2. Retrospective -> predictive (batch -> live)
Another sign of the growing sophistication of the analytic tools and their users is the shift from retrospective analysis – why did X happen? – to predictive analytics. Here, the goal is to use historical data and outcomes to predict future outcomes. In many ways, predictive analytics is possible, or at least approachable, because of advances in machine learning and artificial intelligence.
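As a toy illustration (the sales figures here are made up), predicting a future value from historical results can be as simple as fitting a least-squares trend line and extrapolating:

```python
# Toy example: fit y = slope*x + intercept to hypothetical monthly sales,
# then extrapolate one period ahead. Real predictive analytics uses richer
# models, but the idea is the same: learn from past results to estimate
# future ones.
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 125, 130, 145, 150]

n = len(months)
mean_x = sum(months) / n
mean_y = sum(sales) / n
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(months, sales))
    / sum((x - mean_x) ** 2 for x in months)
)
intercept = mean_y - slope * mean_x

forecast = slope * 7 + intercept  # predict month 7
print(round(forecast, 1))  # about 162.7
```

The "machine learning" in production systems is mostly a more sophisticated version of this: many inputs, nonlinear models, and careful validation, but the same past-to-future logic.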
At the same time, there is a strong desire to transition from basing these analytics on static monthly, weekly, or daily datasets toward operating with live or near-real-time data. From a machine learning algorithm standpoint, this covers two transitions – from “offline” to “online” statistical models, and from batch-oriented to incremental algorithms. If this is a foreign language to you, not to worry – I’ll unpack this for you soon.
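To unpack a small piece of that now: a batch algorithm recomputes its answer from the full dataset each time, while an incremental (online) algorithm updates a small piece of state as each new record arrives. A minimal sketch using a running mean:

```python
class RunningMean:
    """Incremental (online) mean: update state per record, no full re-scan."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, x):
        self.count += 1
        self.mean += (x - self.mean) / self.count  # Welford-style update
        return self.mean


def batch_mean(values):
    """Batch version: needs the entire dataset in hand every time."""
    return sum(values) / len(values)


stream = [4.0, 8.0, 6.0, 2.0]
rm = RunningMean()
for x in stream:
    live_estimate = rm.update(x)  # an answer is available after every record

print(live_estimate, batch_mean(stream))  # both print 5.0
```

The incremental version is what makes near-real-time analytics practical: you never have to stop the world and re-read history to refresh the answer.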
3. Tectonic drift: Deep data science versus end-user analytics
I see a divide slowly forming between the desire to dig deep into large, complex, potentially messy datasets to uncover hidden value, and the need to allow end-users to drive the analytics process around their business needs. This leads to two separate approaches:
- Hire a data science team, outfit it with the latest in big-data software and supporting hardware, and point it at a broad set of potentially useful data.
- Invest in self-service analytics tools that give end-users the ability to explore a more limited set of data on their own.
In some senses, the gap here has widened. The tools available to the data scientist are becoming more powerful, but aren’t always getting easier to use, while the self-service analytics tools are putting a lot of power in the hands of folks who may not know what data is available or how best to wield it.
Industry analysts and the public at large don’t agree about which approach is best.
4. Tectonic convergence: Analytic output integration into business processes
Regardless of who is steering the analytic engines, the insights are being integrated directly into business processes. This trend is being driven by a virtuous circle; small successes drive more investment, which results in larger analytic projects looking at a wider range of data. We may think of Google as one of the first data-driven organizations, but nearly all of today’s successful companies are leaning on data and analytics to help them make sound decisions.
5. Visualization: A wonderful double-edged sword
Intuitively communicating complex information to end-users has been important since the dawn of data. With the growing complexity of source data and continued sophistication of analytic techniques, visualization has never been more important, more powerful – or more dangerous.
Information visualization tools enable an end-user to view high-level summarizations of the data, identify anomalies, and drill in to see the detailed source data where necessary. This can be game-changing for today’s large, fast-moving, varied datasets, which many people call big data. It is important to remember, however, that visualizations can be misleading. Correlation doesn’t mean causation, and different data sources may have varying degrees of accuracy, reliability and consistency.
This doesn’t mean that we should shy away from using visualization; it means that we should keep our visual analytics anchored to business context and use this knowledge to understand how the many data sources fit together to support or contradict taking action or making a decision.
So what do you think? Are you seeing these trends or others in your organization? Let me know your thoughts.