One of the major themes in last week’s Qonnections event in Dallas, TX, was the importance of having a solid Data Strategy in conjunction with a good Analytics Strategy to build and support a robust Data Platform that not only enables but also nurtures a Data Literacy culture within a company.
This focus on the Data aspect of a BI Solution Architecture is also evident in the recent acquisitions Qlik has made to expand its product portfolio with Attunity and Podium Data (and the subsequent introduction of the Qlik Data Catalyst product), adding Enterprise Data Integration and Delivery and Enterprise Data Management tools to the product family.
Besides having a robust Data Platform (the Products and Tools), another important aspect of the Data Strategy is how these products and tools are integrated and used from a Development point of view. The aim should be not only to have the necessary tools in place, but also to use them to their maximum potential, taking advantage of all their capabilities.
In this post, I want to expand on why it’s important to have a solid Data Strategy as part of a BI Program. I’ll also present a framework that I’ve been developing to enhance this very aspect of Qlik implementations from a Development perspective, using advanced data processing techniques that take advantage of the extensive capabilities of the Qlik products we’re already familiar with (QlikView and Qlik Sense): the Lean Data Processing Framework.
Information is Power
Today, we are living in the information age, where data has become a highly valuable commodity for companies of any kind or size. It has been said that, in this day and age, data is the “new oil”. You may or may not agree with the analogy, but it emphasizes just how important data is for any business.
The truth is, data is an essential ingredient in any decision-making process. Massive volumes of data are generated constantly, at an ever-increasing rate, from almost every aspect of daily life. To be analyzed, data needs to be structurally stored and logically processed: taken from its raw form, then enriched and transformed to make it analytics-ready. That’s the point at which data really becomes useful, and where we can draw constructive observations and insights from it. Just like oil needs to be refined before it is useful, data needs to be processed before we can gather any insight from it.
Chances are that the first step, storing the data, is already taken care of by most companies. Also, almost any BI project requires varying degrees of data processing before we can start displaying charts and analyses. But how do we ensure that this data processing step, taking the raw data and making it analytics-ready, is implemented in a way that makes it fast and efficient?
It is increasingly important that Business Intelligence projects use best practices not only to store but also to process the data. Data Processing, sometimes also referred to as Data Preparation, is a core function in any analytics project, and the more processing power is available, the faster the decision-making process can take place. Notice that I used the term processing power: increasing it does not only mean adding hardware resources; it can also be increased through software efficiencies. I became convinced of this some time ago, and set out to develop a framework to ensure that all my projects involving medium-to-large volumes of data are implemented in a way that supports scalability and makes optimal use of resources to reduce infrastructure cost. This became the Lean Data Processing Framework.
The Lean Data Processing paradigm suggests that, in order to build scalable, high-performance data processing systems, it is important to focus on optimizing the underlying operations being performed during the Data Preparation cycle, with the goal of being able to process data in the least amount of time and with the least amount of hardware resources possible.
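To make the idea of optimizing underlying operations concrete, one classic example in Qlik script is filtering a QVD load with a single-field Exists() clause, which keeps the load “optimized” (reading the QVD at near-disk speed) instead of falling back to a much slower standard load. The following is a minimal sketch of that technique; the file paths, table names, and field names are hypothetical:

```qlik
// Hypothetical example: build the set of keys we want to keep
KeepList:
LOAD OrderID
FROM [lib://Data/ActiveOrders.qvd] (qvd);

// Because the only condition is a single-field Exists(),
// this remains an optimized QVD load
Orders:
LOAD OrderID, OrderDate, Amount
FROM [lib://Data/Orders.qvd] (qvd)
WHERE Exists(OrderID);

// Remove the helper table once the filter has been applied
DROP TABLE KeepList;
```

The same filter written as a standard WHERE condition (for example, matching against a list of values) would force Qlik to unpack every record, which on large QVDs can mean an order-of-magnitude difference in load time.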
The goal of the Lean Data Processing Framework is to be able to process data in the least amount of time and with the least amount of hardware resources possible.
To achieve this goal, we need to learn and use techniques that perform each data prep operation efficiently, and also implement mechanisms to process the data incrementally.
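As a sketch of what incremental processing can look like in a Qlik load script, a common pattern is to extract only new or changed records from the source, append the previously stored history from a QVD while discarding superseded rows with Exists(), and store the result back for the next run. The connection, table, and field names below are hypothetical:

```qlik
// 1. Extract only rows modified since the last successful run
//    (in practice, this timestamp would be read from a log or QVD)
LET vLastExecTime = '2019-01-01 00:00:00';

Orders:
SQL SELECT OrderID, OrderDate, Amount, ModifiedDate
FROM Orders
WHERE ModifiedDate >= '$(vLastExecTime)';

// 2. Append historical rows from the stored QVD,
//    skipping any OrderID already loaded (i.e., rows that were updated)
Concatenate (Orders)
LOAD OrderID, OrderDate, Amount, ModifiedDate
FROM [lib://Data/Orders.qvd] (qvd)
WHERE NOT Exists(OrderID);

// 3. Persist the refreshed history for the next run
STORE Orders INTO [lib://Data/Orders.qvd] (qvd);
```

On the very first run, the QVD would not exist yet, so a production version of this script would typically perform a full extract and create the QVD before switching to the incremental path.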
I just released an online course that covers all of the tools and techniques in the Lean Data Processing Framework. In this course you will learn and explore techniques that make common script operations in QlikView and Qlik Sense run as optimally as possible. You will also learn how to implement incremental data processing in both the Extract and Transform stages, as well as techniques for implementing an efficient data architecture. Learn more about the course here.
The Parsimony Mindset
Another way to understand the philosophy behind the Lean Data Processing paradigm is through the Parsimony Principle. One formulation of the Parsimony Principle is:
It is pointless to do with more what can be done with less.
This principle is also known as the Principle of Economy, or Ockham’s Razor, and although it wasn’t formulated with data processing in mind, it finds very direct application in this area, as it does in many other domains, particularly in science.
The basic idea of this principle is to avoid wasting resources. Resources are valuable in virtually every domain, so any strategy that helps us conserve them is a good strategy.
Having a parsimony mindset, with which we always try to minimize resource usage, is an essential part of adopting the Lean Data Processing paradigm in our QlikView and Qlik Sense implementations.
Pillars of Lean Data Processing
Let’s take a look at what an efficient data processing architecture in QlikView or Qlik Sense looks like.
The following diagram shows how the various strategies that are part of the Lean Data Processing Framework work together to make a highly-optimal data processing architecture.
This post has been an overview of the importance of implementing a well-designed Data Strategy, and of how the Lean Data Processing paradigm can help you reduce infrastructure cost, increase the efficiency of your data prep processes, and shorten the raw-to-ready time of your analytics projects.
In an upcoming post, I’ll dive deeper into the components of a Lean Data Processing architecture in QlikView and Qlik Sense, as well as the benefits such an architecture can bring. Subscribe with your email address to receive updates when the new posts are released.
Thanks for reading.
What are your thoughts? Leave a comment below.