THE LEAN MINDSET

"It is pointless to do with more,
what can be done with less."

― The Parsimony Principle

JOIN THE MASTER CLASS

High-quality learning resources one click away.

By joining the Lean Data Processing master class, you’ll get access to high-quality, content-rich training material to take your Qlik skills to the next level and prepare yourself to tackle big data processing challenges.

Video Lectures

6+ hours of video recordings

Get on-demand access to more than 6 hours of video recordings that will guide you through the concepts and techniques in detail.

Demos and Exercises

Practice the concepts on your own computer

The video lectures are paired with demos showing how to apply the techniques, and guided exercises for you to implement the strategies.

Online Forum

Discuss the topics with your fellow students

Discuss the material with your fellow students. Ask questions, interact with your instructor and connect with other students.

Learning assessment

Quizzes to test your knowledge

Throughout the course, there’s a series of quizzes and other activities in which you will be able to evaluate your progress and learning.

Who is this class for?

If you are a QlikView and/or Qlik Sense developer, whatever your skill level, the techniques covered in this class will help you make your reload jobs run faster and more efficiently. You will also learn less common techniques for dealing with long-running reload jobs and with massive amounts of data.

Techniques applicable to both QlikView and Qlik Sense

All the techniques we will cover in this class are related to how we process data using the load script. Since the load script syntax and capabilities are mostly the same for both QlikView and Qlik Sense, all of what we will cover here can be applied to QlikView or Qlik Sense documents.

Class Instructor

My name is Miguel Ángel García and I’ll be your instructor in this class. I have worked with Qlik tools for 10+ years and implemented BI solutions for multiple customers across a variety of industries. I’m the author of the book QlikView 11 for Developers (several editions) and a blogger at AfterSync (a.k.a. iQlik).

I became particularly interested in techniques for efficient data processing with QlikView and Qlik Sense in recent projects that involved massive amounts of data, which led me on a 3-year quest researching the topic and developing all the various strategies, tools and materials that are now available on this site.

Class Schedule


Discover the power of the Lean Data Processing framework.

The following diagram summarizes the optimization strategies we cover in the Lean Data Processing master class. By the end of the course, you will be able to implement all of these strategies in your Qlik projects and will have a clear understanding of when, how and why to implement each of them.

Enroll Now

Syllabus

The class is split into 4 parts to cover the full range of techniques for Lean Data Processing from basic to advanced, starting with the foundation and covering scenarios with gradually increasing complexity.

Estimated Time Effort

The estimated time required to complete the course, including following the exercises on your own computer and completing the assessments, is 18 hours.

LDP Fundamentals

This section of the master class will provide an introduction to the Lean Data Processing paradigm, describing what the goal of implementing this framework is, as well as the architecture elements involved. We’ll cover:

Why is Lean Data Processing important?
We start the class by outlining the benefits of adopting the LDP framework for optimizing data processing jobs. This will give context to all topics that will come after, and will help us keep focus on what our end goal is.
Pillars of Lean Data Processing
We will describe the various elements of the LDP framework and how they work together to form a high-performance data processing architecture.
The Parsimony Mindset
Here we describe the philosophy behind the Lean Data Processing paradigm.

Optimized Data Handling

In this part of the master class, we’ll cover the basic techniques that are essential for the actual strategies we’ll cover in future sections. For experienced QlikView and Qlik Sense developers, this section will be an in-depth review of things you’re probably familiar with, and you may pick up a new technique or two. If you’re a newcomer to the Qlik world, this is a great way to start learning simple techniques for optimizing data handling in the Qlik platform.

What makes and breaks an optimized QVD load
We will explore how to ensure that our scripts use an optimized load when pulling data from QVD files. Super-fast loads from a QVD file are only possible in certain scenarios, and achieving them is essential if we want our reload jobs to run efficiently.
Optimizing Data Transformation
Here, we will be looking at common script operations used for manipulating data, as well as their alternatives in some cases, to understand when to use each one and how to use them efficiently.
QVD Segmentation
One common technique when dealing with massive datasets that we have to store in QVDs is called QVD segmentation. This technique can be helpful for managing large data files easily, while also providing a level of optimization for our reloads. In this section, we will explain how the QVD Segmentation process works and in which scenarios it can be useful. We’ll also explore a couple of examples that make use of an external script provided with the course files.

Incremental Data Processing

In this section of the course, we will explore a set of techniques for processing data incrementally. We will be looking at both the Extract layer, to see how we can pull data from external sources incrementally, as well as at the Transform layer, to see how we can process and transform data in QVDs incrementally.

Basics of Incremental Data Processing
We start this part of the master class looking at the basic concepts related to incremental load strategies.
- Benefits: We’ll discuss the benefits of processing data incrementally and why we should invest in this area as part of our development efforts.
- The 3-stage data architecture: Since the incremental strategies we’ll discuss are based on a 3-stage data architecture, we’ll review what this architecture is and what benefits it has.
- Identifying what changed: In order to implement an incremental reload strategy, we need some way of identifying what data was updated in the source. Here, we’ll review the most common approaches.
- Identifying when data was last refreshed: We’ll also review the various ways in which we can keep track of when the local data repository was last updated, which is essential for implementing delta loads.
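As a rough, tool-agnostic illustration of the last two ideas above (identifying what changed and tracking when data was last refreshed), the sketch below uses plain Python rather than Qlik load script. The field names `id` and `modified_at` and the function `incremental_merge` are hypothetical and are not taken from the course material; it assumes the source exposes a last-modified timestamp per row.

```python
from datetime import datetime

def incremental_merge(stored, source, last_refresh):
    """Merge rows changed since last_refresh into stored.

    Returns the merged rows (sorted by id) and the new refresh watermark.
    """
    # Delta: only rows modified after the previous refresh.
    delta = [row for row in source if row["modified_at"] > last_refresh]
    # Index stored rows by key, then let delta rows overwrite or add to them.
    merged = {row["id"]: row for row in stored}
    for row in delta:
        merged[row["id"]] = row
    # New watermark: latest modification seen, or the old one if nothing changed.
    new_watermark = max((row["modified_at"] for row in delta), default=last_refresh)
    return sorted(merged.values(), key=lambda r: r["id"]), new_watermark
```

Only the delta is read and merged on each run, which is the reason the last-refresh timestamp has to be stored reliably between runs.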
Incremental Extracts
In this section of the master class, we will dive into incremental strategies for the Extract layer, including a review of the various scenarios we may encounter.
Delta Tags
Here we present a mechanism for easily determining the delta timeframe for each data processing job, based on when the corresponding input and output data was last updated. The Delta Tags framework is compatible with all the delta strategies presented in this class and brings reliability, consistency, as well as efficiency to our incremental processing strategies.
Incremental Transforms
In this section of the master class we dive into incremental strategies for dealing with data transformation processes.
- The 3 Delta Scenarios: The first step to implementing an incremental transform strategy is identifying the scenario we’re dealing with. In this section, we’ll review the characteristics of various delta scenarios we may encounter in our Qlik projects.
- The 4 Delta Strategies: We will present four different strategies we can use to deal with incremental data processing jobs, each with varying degrees of reliability and complexity. We will discuss the ins and outs of each strategy, their suitability for different scenarios, as well as their specific advantages and limitations.
- A Framework for Delta Transforms: In this section, we’ll present a framework we can follow to facilitate the implementation of incremental strategies in the Transform layer. This framework is based on the best of the delta strategies and is applicable to all delta scenarios.
- Practice Case: We will spend some time putting the theory into practice with several exercises that will demonstrate the implementation of different incremental strategies, using an example database provided as part of this master class. We will also compare the performance gains resulting from each alternative.

Parallel Data Processing

In this part of the master class, we’ll explore the concept of Data Parallelization, and how it can drastically reduce the reload times of our Qlik data processing jobs. We’ll look at:

Data Parallelization Overview
We’ll discuss how data parallelization works and how it can help us significantly improve reload times, including:
- 4 Steps to Parallelization: Overview of the process involved in orchestrating a parallel execution of Qlik reloads.
- 3 ways to Parallelization: Ways in which task parallelization is implemented in general.
QVDs: Essential for Data Parallelization in Qlik
One of the main reasons why we can implement data parallelization in Qlik in an efficient way is the use of QVDs. We’ll briefly discuss their key role in supporting a Parallel Processing architecture.

We will also look at the two approaches we can use to implement Data Parallelization in Qlik:

Basic approach
With this approach, we implement data parallelization for reload jobs that are easily distributable and that can be performed on a single server. This approach doesn’t require external tools, since the distribution criteria are static and, although it’s not as flexible and reliable as using an orchestration service, it can get the job done in various scenarios.
Using Orchestration Service, enQ
This approach relies on an external add-on software that is in charge of orchestrating the job execution and distributing the tasks dynamically. We’ll explore how this approach is significantly more flexible than the previous one, and how it provides additional capabilities, such as:
- Multi-node reloads: the ability to distribute a single reload job across multiple nodes in coordination.
- Centralized interface: with an orchestration software, jobs can be managed, monitored and controlled via a web interface.
- Failover capabilities: It also provides the ability to re-launch processes when workers fail to finish the job, as well as re-allocate tasks among different worker nodes when something goes wrong.
- Logging and analytics: It also enables the possibility to collect execution data and provide better logging capabilities.
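
To make the basic approach a little more concrete, here is a minimal Python sketch of a static distribution on a single machine. It is not Qlik script and not the course's implementation: the function names, the `year` and `amount` fields, and the use of threads as stand-ins for separate reload processes are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def process_segment(year, rows):
    """Stand-in for one reload job: total the amounts for a single year."""
    return year, sum(r["amount"] for r in rows if r["year"] == year)

def run_parallel(rows, years, max_workers=4):
    """Run one job per year concurrently.

    The fixed list of years is the static distribution criterion.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(process_segment, year, rows) for year in years]
        # Collect each (year, total) pair once its job has finished.
        return dict(f.result() for f in futures)
```

Because the segments are fixed up front, nothing rebalances the work if one segment is much larger than the others, which is the kind of limitation an orchestration service is meant to address.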

As in previous sections of the class, we’ll put the theory into practice by demonstrating the concepts with hands-on examples:

Using the Basic approach
In the first example, we walk through the process of implementing data parallelization in its most basic form, using the command line on a single server.
Using Orchestration Service
In the second example, we will introduce an orchestration service, enQ, with which we can easily implement data parallelization for Qlik jobs. This will not be an example the students will be able to perform, since it requires additional software, but rather it will be a product demo and walk-through of the service using a big data use case.
We’ll demonstrate how this service can be used to run Qlik reloads in parallel across multiple servers to easily handle big data use cases.
The software we use here is a paid product, and is not available for free. It’s also not included as part of the master class, but it’s a good opportunity for students to see how this software works and how it can be integrated into their Qlik projects.

What are the benefits of adopting the Lean Data Processing paradigm?

Data Processing is a core function in any analytics project, and it has become increasingly important for BI projects to implement efficient strategies for processing data in the extract and transform layers. Implementing the tools and techniques covered in the LDP master class will help you empower your projects in multiple ways, for example:

Faster reloads, improved lead times

Optimizing the data processing jobs in QlikView and Qlik Sense has the primary benefit of producing faster reload jobs which, in turn, makes it possible to refresh the end reports more frequently.

Reduced hardware resource usage

Besides reducing reload times, these techniques also reduce the hardware resource usage (CPU and RAM) that the reload jobs require to process the data (Extract and Transform).

Easier to deal with large amounts of data

No matter how large the data we’re dealing with is, by implementing Lean Data Processing strategies in the transform layer, the data volume becomes a non-issue.

Scalability

Data growth is an ever-present challenge. Using the techniques covered in the LDP master class will help you build optimized data pipelines, and will make that challenge easier to tackle.

Increased efficiency

Optimized reload jobs can also bring other indirect efficiencies, like the ability to make decisions faster based on fresh data.

Increased capacity

Using less hardware for data processing means there’s more hardware available to handle more jobs or for other uses.

What will you learn today?

Enroll now and start growing your Qlik Skills
