what can be done with less.”
High-quality learning resources one click away.
By joining the Lean Data Processing master class, you’ll get access to high-quality, content-rich training material to take your Qlik skills to the next level and prepare yourself to tackle big data processing challenges.
Video Lectures: 6+ hours of video recordings
Get on-demand access to more than 6 hours of video recordings that will guide you through the concepts and techniques in detail.
Demos and Exercises: Practice the concepts on your own computer
The video lectures are paired with demos showing how to apply the techniques, and guided exercises for you to implement the strategies.
Online Forum: Discuss the topics with your fellow students
Discuss the material with your fellow students. Ask questions, interact with your instructor and connect with other students.
Learning Assessment: Quizzes to test your knowledge
Throughout the course, a series of quizzes and other activities will let you evaluate your progress and learning.
Discover the power of the Lean Data Processing framework.
The following diagram summarizes the optimization strategies we cover in the Lean Data Processing master class. By the end of the course, you will be able to implement all of these strategies in your Qlik projects and will have a clear understanding of when, how and why to implement each of them.
This section of the master class will provide an introduction to the Lean Data Processing paradigm, describing what the goal of implementing this framework is, as well as the architecture elements involved. We’ll cover:
- Why is Lean Data Processing important?
We start the class by outlining the benefits of adopting the LDP framework for optimizing data processing jobs. This will give context to all topics that will come after, and will help us keep focus on what our end goal is.
- Pillars of Lean Data Processing
We will describe the various elements of the LDP framework and how they work together to form a high-performance data processing architecture.
- The Parsimony Mindset
Here we describe the philosophy behind the Lean Data Processing paradigm.
In this part of the master class, we’ll cover the basic techniques that are essential for the actual strategies we’ll cover in future sections. For experienced QlikView and Qlik Sense developers, this section will be an in-depth review of things you’re probably familiar with, and you may pick up a new technique or two. If you’re a newcomer to the Qlik world, this is a great way to start learning simple techniques for optimizing data handling in the Qlik platform.
- What makes and breaks an optimized QVD load
We will explore how to ensure that our scripts use an optimized load when pulling data from QVD files. In certain scenarios we can achieve super-fast loads from a QVD file, and this technique is essential if we want our reload jobs to run efficiently.
- Optimizing Data Transformation
Here, we will be looking at common script operations used for manipulating data, as well as their alternatives in some cases, to understand when to use each one and how to use them efficiently.
- QVD Segmentation
One common technique when dealing with massive datasets that we have to store in QVDs is called QVD segmentation. This technique can be helpful for managing large data files easily, while also providing a level of optimization for our reloads. In this section, we will explain how the QVD Segmentation process works and in which scenarios it can be useful. We’ll also explore a couple of examples that make use of an external script provided with the course files.
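To make the idea of segmentation concrete, here is a minimal sketch of splitting a dataset into per-month segments. It is written in Python rather than Qlik script and is not the external script provided with the course files. The field name `OrderDate`, the `Sales` prefix and the file naming pattern are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def segment_by_month(rows, date_field="OrderDate", prefix="Sales"):
    """Group rows into per-month segments keyed by a hypothetical file name."""
    segments = defaultdict(list)
    for row in rows:
        d = row[date_field]
        # One segment per calendar month, e.g. Sales_2023-01.qvd (assumed naming).
        name = f"{prefix}_{d.year:04d}-{d.month:02d}.qvd"
        segments[name].append(row)
    return dict(segments)


rows = [
    {"OrderID": 1, "OrderDate": date(2023, 1, 15)},
    {"OrderID": 2, "OrderDate": date(2023, 1, 28)},
    {"OrderID": 3, "OrderDate": date(2023, 2, 3)},
]
segments = segment_by_month(rows)
```

With data split this way, a reload that only affects February would only need to rewrite the February segment, leaving the other files untouched.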
In this section of the course, we will explore a set of techniques for processing data incrementally. We will be looking at both the Extract layer, to see how we can pull data from external sources incrementally, as well as at the Transform layer, to see how we can process and transform data in QVDs incrementally.
- Basics of Incremental Data Processing
We start this part of the master class looking at the basic concepts related to incremental load strategies.
Benefits: We’ll discuss the benefits of processing data incrementally and why we should invest in this area as part of our development efforts.
The 3-stage data architecture: Since the incremental strategies we’ll discuss are based on a 3-stage data architecture, we’ll review what this architecture is and what benefits it has.
Identifying what changed: In order to implement an incremental reload strategy, we need some way of identifying what data was updated in the source. Here, we’ll review the most common approaches.
Identifying when data was last refreshed: We’ll also review the various ways in which we can keep track of when the local data repository was last updated, which is essential for implementing delta loads.
- Incremental Extracts
In this section of the master class, we will dive into incremental strategies for the Extract layer, including a review of the various scenarios we may encounter.
- Delta Tags
Here we present a mechanism for easily determining the delta timeframe for each data processing job, based on when the corresponding input and output data was last updated. The Delta Tags framework is compatible with all the delta strategies presented in this class and brings reliability, consistency and efficiency to our incremental processing strategies.
- Incremental Transforms
In this section of the master class we dive into incremental strategies for dealing with data transformation processes.
The 3 Delta Scenarios: The first step to implementing an incremental transform strategy is identifying the scenario we’re dealing with. In this section, we’ll review the characteristics of various delta scenarios we may encounter in our Qlik projects.
The 4 Delta Strategies: We will present four different strategies we can use to deal with incremental data processing jobs, each with varying degrees of reliability and complexity. We will discuss the ins and outs of each strategy, their suitability for different scenarios, as well as their specific advantages and limitations.
A Framework for Delta Transforms: In this section, we’ll present a framework we can follow to facilitate the implementation of incremental strategies in the Transform layer. This framework is based on the most optimal of the delta strategies and is applicable to all delta scenarios.
Practice Case: We will spend some time putting the theory into practice with several exercises that will demonstrate the implementation of different incremental strategies, using an example database provided as part of this master class. We will also compare the performance gains resulting from each alternative.
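The two ingredients of an incremental load described in this section, identifying what changed and tracking when data was last refreshed, can be illustrated with a minimal sketch. It is written in Python rather than Qlik script, is not one of the strategies taught in the class, and assumes the source exposes a primary key and a last-modified timestamp. The field names `ID`, `ModifiedAt` and `Amount` are hypothetical.

```python
from datetime import datetime


def incremental_merge(stored, source, last_refresh, key="ID", modified="ModifiedAt"):
    """Merge rows changed after last_refresh into the stored data.

    Returns the merged rows and the new refresh timestamp.
    Deleted source rows are not handled in this sketch.
    """
    # Identify what changed: rows modified after the last refresh.
    delta = [r for r in source if r[modified] > last_refresh]
    delta_keys = {r[key] for r in delta}
    # Keep stored rows only when no newer version arrived in the delta.
    merged = delta + [r for r in stored if r[key] not in delta_keys]
    # Track when data was last refreshed, for the next run.
    new_refresh = max((r[modified] for r in delta), default=last_refresh)
    return merged, new_refresh


stored = [
    {"ID": 1, "Amount": 100, "ModifiedAt": datetime(2024, 1, 1)},
    {"ID": 2, "Amount": 200, "ModifiedAt": datetime(2024, 1, 1)},
]
source = [
    {"ID": 1, "Amount": 100, "ModifiedAt": datetime(2024, 1, 1)},
    {"ID": 2, "Amount": 250, "ModifiedAt": datetime(2024, 1, 5)},
    {"ID": 3, "Amount": 300, "ModifiedAt": datetime(2024, 1, 6)},
]
merged, new_refresh = incremental_merge(stored, source, datetime(2024, 1, 2))
```

Only rows 2 and 3 are pulled from the source here. Row 1 is reused from the stored data, which is where the savings come from when the unchanged portion is large.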
In this part of the master class, we’ll explore the concept of Data Parallelization, and how it can drastically reduce the reload times of our Qlik data processing jobs. We’ll look at:
- Data Parallelization Overview
We’ll discuss how data parallelization works and how it can help us significantly improve reload times, including:
4 Steps to Parallelization: Overview of the process involved in orchestrating a parallel execution of Qlik reloads.
3 Ways to Parallelize: Ways in which task parallelization is implemented in general.
- QVDs: Essential for Data Parallelization in Qlik
One of the main reasons why we can implement data parallelization in Qlik in an efficient way is the use of QVDs. We’ll briefly discuss their key role in supporting a Parallel Processing architecture.
We will also look at the two approaches we can use to implement Data Parallelization in Qlik:
- Basic approach
With this approach, we implement data parallelization for reload jobs that are easily distributable and that can be performed on a single server. This approach doesn’t require external tools, since the distribution criteria are static. Although it’s not as flexible and reliable as using an orchestration service, it can get the job done in various scenarios.
- Using Orchestration Service, enQ
This approach relies on external add-on software that orchestrates the job execution and distributes the tasks dynamically. We’ll explore how this approach is significantly more flexible than the previous one, and how it provides additional capabilities, such as:
Multi-node reloads: the ability to distribute a single reload job across multiple nodes in coordination.
Centralized interface: with orchestration software, jobs can be managed, monitored and controlled via a web interface.
Failover capabilities: It also provides the ability to re-launch processes when workers fail to finish the job, as well as re-allocate tasks among different worker nodes when something goes wrong.
Logging and analytics: It also makes it possible to collect execution data and provides better logging capabilities.
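The static distribution behind the basic approach can be sketched as follows. This is a generic Python illustration, not the procedure demonstrated in the class. The segment names are hypothetical, and `reload_segment` is a placeholder that stands in for launching one real reload job.

```python
from concurrent.futures import ThreadPoolExecutor


def assign_static(segments, n_workers):
    """Split segments among workers round-robin; the split is fixed up front."""
    return [segments[i::n_workers] for i in range(n_workers)]


def reload_segment(segment):
    # Placeholder: a real job would launch a reload for this segment here.
    return f"reloaded {segment}"


def run_worker(batch):
    """Process one worker's batch of segments sequentially."""
    return [reload_segment(s) for s in batch]


segments = ["2023-01", "2023-02", "2023-03", "2023-04", "2023-05"]
batches = assign_static(segments, 2)

# Run the two batches in parallel on a single machine.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(run_worker, batches))
```

Because the batches are decided before execution starts, a slow or failed worker cannot hand its remaining segments to another one. That is the kind of limitation dynamic task distribution is meant to address.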
As in previous sections of the class, we’ll put the theory into practice by demonstrating the concepts with hands-on examples:
- Using the Basic approach
In the first example, we walk through the process of implementing data parallelization in its most basic form, using the command line on a single server.
- Using Orchestration Service
In the second example, we will introduce an orchestration service, enQ, with which we can easily implement data parallelization for Qlik jobs. This will not be an example the students will be able to perform, since it requires additional software, but rather it will be a product demo and walk-through of the service using a big data use case.
We’ll demonstrate how this service can be used to run Qlik reloads in parallel across multiple servers to easily handle big data use cases.
The software we use here is a paid product and is not included as part of the master class, but this is a good opportunity for students to see how it works and how it can be integrated into their Qlik projects.
What are the benefits of adopting the Lean Data Processing paradigm?
Data Processing is a core function in any analytics project, and it has become increasingly important for BI projects to implement efficient strategies for processing data in the extract and transform layers. Implementing the tools and techniques covered in the LDP master class will help you empower your projects in multiple ways, for example:
Faster reloads, improved lead times
Optimizing the data processing jobs in QlikView and Qlik Sense has the primary benefit of producing faster reload jobs, which in turn makes it possible to refresh the end reports more frequently.
Reduced hardware resource usage
Besides reducing reload times, these techniques also reduce the hardware resource usage (CPU and RAM) that the reload jobs require to process the data (Extract and Transform).
Easier to deal with large amounts of data
No matter how large the data we’re dealing with is, by implementing Lean Data Processing strategies in the transform layer, the data volume becomes a non-issue.
Data growth is an ever-present challenge. Using the techniques covered in the LDP master class will help you build optimized data pipelines, and will make that challenge easier to tackle.
Optimized reload jobs can also bring other indirect efficiencies, like the ability to make decisions faster based on fresh data.
Using less hardware for data processing means there’s more hardware available to handle more jobs or for other uses.
What will you learn today?
Enroll now and start growing your Qlik Skills