Data Analytics Trends: Deep Data Science vs. End-User Analytics
This is the third follow-up article in my series about trends in data analytics. Article one – Climbing the Analytic Maturity Mountain – discusses how organizations are now thinking about unstructured data as a new source of differentiated value. Article two – The Transition to Predictive Analytics – describes how the world is moving from retrospective analytics to predictive analytics as modeling techniques and supporting data have improved.
In this post, I want to call attention to the growing divide between the desire to dig deep into large, complex, messy datasets to uncover hidden value, and the need to allow business users to drive the analytics process around business needs. The goals are the same – use data more effectively to drive better business decisions – but the approach is very different.
In many ways, this is yet another instance of the Specialist vs. Generalist debate. As a business, should you invest your hard-earned dollars in a specialized data science team, provisioned with the latest big-data technology, or empower your existing business users with sophisticated information visualization and analysis tools? The answer, of course, varies depending on your business, but it’s important to understand the trade-offs.
The tools available for data scientists are evolving at an amazing pace. The prototypical data scientist – if we avoid the temptation to define and debate the term – uses a series of complex tools to clean input data and combine multiple internal and external data sources. From here, they build models to recognize patterns and anomalies. In some cases, the goal is to identify and explain anomalies (e.g. identifying fraud). In other cases, the goal is to build predictive models using historic data from a range of sources to predict future performance (e.g. based on weather, day of week, and oil prices, what is electricity likely to cost tomorrow?).
The benefits can be substantial, though for SMBs the upfront investment in staff and resources can be overwhelming. Assembling a big-data infrastructure is no small task. While Hadoop is the 800-lb gorilla of the big-data market, there are countless competing and complementary technologies, many of which are free to download. Keep in mind that free-to-download tools still carry real costs: configuration, management, and the skilled employees needed to use them. In fact, the challenges of maintaining compatible versions of these frameworks and integrating them efficiently have spawned a number of commercial companies selling validated, supported packages containing a subset of the tools. Another often-overlooked challenge is the additional load these systems place on existing infrastructure. If you pull near-real-time data from your sales database, you may be adding load to your servers and leaving your sales staff to suffer.
End-user data visualization and analysis tools aren’t new – businesses have been using basic tools like Microsoft Excel with gusto since the early days of spreadsheet functions and built-in charts. However, today’s tools offer far more power to aggregate and analyze large datasets. By allowing business users – not data scientists – to interactively build visualizations, businesses can save time and improve responsiveness. Rather than thinking of a question, reserving time with the data science team, and waiting for an answer to pop out, the business user can dig in immediately and refine the analysis based on early results.
Imagine that you are a marketing professional, and you set out to analyze the performance of a recent marketing campaign. Your first question might be “did sales or leads increase this month over last month?” Now imagine that the answer was no – sales and leads both went down. So what happened? The lower numbers don’t mean the campaign was a failure – it may be a seasonal dip or there may be other factors influencing the numbers. The initial question spawned several more. The shorter the turnaround time for each answer, the more efficient and effective the decision making process will be.
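The follow-up analysis in that scenario is easy to sketch: compare this month's sales not just to last month, but also to the same month last year, which helps separate a seasonal dip from a genuine decline. All figures below are invented for illustration.

```python
# Hypothetical monthly sales totals, keyed by "YYYY-MM".
monthly_sales = {
    "2023-07": 120_000, "2023-08": 104_000,
    "2024-06": 131_000, "2024-07": 126_000, "2024-08": 112_000,
}

def pct_change(new, old):
    """Percentage change from old to new."""
    return (new - old) / old * 100

# Month-over-month: the comparison that triggered the alarm.
mom = pct_change(monthly_sales["2024-08"], monthly_sales["2024-07"])

# Year-over-year: the same month last year, controlling for seasonality.
yoy = pct_change(monthly_sales["2024-08"], monthly_sales["2023-08"])

print(f"month-over-month: {mom:+.1f}%")
print(f"year-over-year:   {yoy:+.1f}%")
```

In this made-up dataset, sales are down versus July but up versus last August – exactly the kind of answer that should spawn the next question rather than end the discussion.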
However, there are potential pitfalls here. There are questions around access to data – where is the data stored and how do the business users access it from their visualization and analysis tools? Proprietary or cloud-based systems are sometimes difficult to interface with. The tools themselves can also be a challenge. Desktop applications may require expensive licenses, while cloud-based tools face even more serious data security and integration challenges. Internally hosted analysis applications have high upfront costs, but they spread the overhead across the company and encourage adoption in a way that per-user fees cannot. Regardless of the tool, there are non-trivial training and support requirements for both end-users and IT staff.
Even with the best tools and engaged users, there is another key obstacle – understanding what the data actually describe. As you have likely experienced at some point in your career, this is harder than it sounds. When a user misinterprets the information, the consequences can be disastrous. For example, projected sales numbers from last year are not the same as actual sales from last year; make a decision based on the wrong dataset and that decision is inherently flawed. As more raw data is pushed out to geographically distributed groups of people, the provenance – the full history and lineage – of the data can be lost, and the resulting insights misunderstood or misapplied.
Are you investing in big-data tools and teams, or is your organization actively pursuing end-user visualization and analytics tools?