As you know already, I’m a big fan of using Set Analysis in my QlikView applications, especially for Point In Time Reporting. However, there might be some specific situations in which Set Analysis just won’t cut it and you need an alternative.
I found myself in that situation this month in a project where the QlikView application was required to have Sales Data to the highest detail (resulting in a Fact Table of about 300 million rows for two years worth of data) and several analyses of the Pint-In-Time kind were to be made. So, I started doing the usual stuff: built the Data Model, prepared the calendar with some fields to optimize the use of Set expressions, and started creating the objects. To my surprise, when I created the PITR analysis objects, I noticed a poor performance, a very poor performance indeed. I say “to my surprise” because I was counting on having a high performance app, despite having that amount of data in memory, based on two things:
- The Server I was working on had 512 GB of RAM, and 4 octo core processors. (What else would you need with a server like that, right?)
- The Data Model was not that complex, I had already worked with similar models and required analyses, and had not had any problems with my set expressions.
But there you go, not everything goes as first planned in terms of performance, and I started making some adjustments: I tried summarizing the data, simplifying the Data Model a little bit, and even considered having the pre-built fields for MTD, LYMTD, etc. This last option was not the best actually and I was just not satisfied with settling for that because it would mean the user would not be able to “play” and analyze or make comparisons based on different months.
After several days trying to figure it out, I was presented with an option I discarded instinctively: using Flags in the calendar in order to use expressions like Sum(Sales * FlagMTD). The reason I first doubted using this approach was very obvious to me: How would an expression that had to be evaluated on a row-by-row basis would be better than Set Analysis? Plus, that is an old-fashion method, used when there was no Set Analysis. I was wrong, not in the second part of my reasoning, but in the first.
Turns out that on the first test, Flags performed a lot better than Set Analysis: From around 16 seconds in a single and basic object using Set expressions, to less than 1 second using Flags on the same object. The Dashboard Sheet, which was taking nearly 50 seconds to calculate using Set expressions, took less than 5 using Flags. Sometimes it took less than 1, depending on the selections I made.
I felt relief for being able to deliver a high performing app, and at the same time I was taught a lesson: Try it before you trash it.
So, how does it work? If this approach is new to you, I will advice you to see the following explanation from the “associative” perspective, since I think it would be easier to understand it that way. The following diagram illustrates the concept.
We have the Fact Table to the left, and the Caleendar Table to the right, linked by a field called “KeyDate”, which is not the field used for the UI selections. We have split the Date Field in order to associate ONE date in the Calendar Table to SEVERAL dates in the Fact Table. You can see it as a “Translation” of Dates, if you like, in which you have on one side the Dates that will actually be used for selections in the UI, and the actual Dates with which QlikView will do its work, on the other side. In the above example, Feb 2011 in the “User Date” gets translated/associated to the following “Transaction Dates”:
- Feb 2011 itself, with the Flag for “This Year Month To Date”
- Feb 2010, with the Flag for “Last Year Month To Date”
- Jan and Feb 2010, with the Flag for “Last Year YTD”
- Jan and Feb 2011, with the Flag for “This Year YTD”
The most important thing that you need to understand when using this approach is that, even when the User Date is associated with different Transaction Dates, you will be able to choose which values to actually take into account by using the Flag in your expression. In this way, all the dates that you want to exclude in the calculation, will be zeroed out by the Flag itself.
- For TYMTD Calculation: Sum(SalesAmount * _TYMTD)
- For LYMTD Calculation: Sum(SalesAmount * _LYMTD)
- and so on…
Things to take care of
- Nulls. I’ve noticed that, since what is being done here is a direct multiplication (row by row), having nulls in the fields being multiplied may slow down the calculation. So, for best performance, make sure you don’t have any nulls netiher in the SalesAmount field nor in any of the flags.
- You might want to use variables for your different expressions. It will give you more control and is better for maintenance.
- The user needs to always have one selected value in the Month and year field for this to work. If this is not the case, the calculations will yield unexpected results.