πŸ“‹ Table of Contents

  1. Overview
  2. Architecture Design
  3. Phase 1: Foundation Setup
  4. Phase 2: Data Extraction
  5. Phase 3: Dimension Transformation
  6. Phase 4: Data Integration
  7. Phase 5: Business Logic Application
  8. Phase 6: Incremental Load
  9. Phase 7: SCD Type 2
  10. Phase 8: Master Calendar
  11. Phase 9: Data Persistence
  12. Phase 10: Visualization Model
  13. Testing & Validation

🎯 Overview

Purpose of This Document

This technical implementation guide provides detailed explanations of HOW to build the GlobalTech Electronics data warehouse solution. It bridges the gap between business requirements (covered in the Scenario document) and actual code implementation (covered in the Script Reference document).

Document Structure

Each phase includes:


πŸ—οΈ Architecture Design

Data Flow Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                       SOURCE SYSTEMS                        β”‚
β”‚  (Product DB)  (CRM)  (POS)  (Store DB)  (Daily POS Feed)   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      EXTRACTION LAYER                       β”‚
β”‚                (Load QVD files from source)                 β”‚
β”‚                                                             β”‚
β”‚   DIM_Products    DIM_Customers    FACT_Sales               β”‚
β”‚   DIM_Stores      FACT_Sales_Incremental                    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    TRANSFORMATION LAYER                     β”‚
β”‚                                                             β”‚
β”‚   β€’ Calculate profit margins                                β”‚
β”‚   β€’ Classify customers by loyalty                           β”‚
β”‚   β€’ Join dimensions with facts                              β”‚
β”‚   β€’ Apply business rules                                    β”‚
β”‚   β€’ Add calculated metrics                                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                       LOADING LAYER                         β”‚
β”‚                                                             β”‚
β”‚   β€’ Incremental load (new records only)                     β”‚
β”‚   β€’ SCD Type 2 (track changes)                              β”‚
β”‚   β€’ Master Calendar generation                              β”‚
β”‚   β€’ Save to transformed QVDs                                β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     PRESENTATION LAYER                      β”‚
β”‚                                                             β”‚
β”‚   β€’ Load from transformed QVDs                              β”‚
β”‚   β€’ Qualify fields to avoid synthetic keys                  β”‚
β”‚   β€’ Create star schema                                      β”‚
β”‚   β€’ Ready for visualization                                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β–Ό
                  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                  β”‚   DASHBOARDS & APPS    β”‚
                  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Folder Structure

C:\QlikTraining\
β”‚
β”œβ”€β”€ DataFiles\                          (Source data)
β”‚   β”œβ”€β”€ DIM_Products.qvd
β”‚   β”œβ”€β”€ DIM_Customers.qvd
β”‚   β”œβ”€β”€ FACT_Sales.qvd
β”‚   β”œβ”€β”€ DIM_Stores.qvd
β”‚   └── FACT_Sales_Incremental.qvd
β”‚
└── DataFiles\Transformed\              (Transformed data)
    β”œβ”€β”€ DIM_Products_Transformed.qvd
    β”œβ”€β”€ DIM_Customers_Transformed.qvd
    β”œβ”€β”€ DIM_Stores_Transformed.qvd
    β”œβ”€β”€ FACT_Sales_Transformed.qvd
    β”œβ”€β”€ FACT_Sales_Complete.qvd
    β”œβ”€β”€ DIM_Customers_SCD.qvd
    └── MasterCalendar.qvd

Script Organization

The complete ETL script is organized into 15 logical sections (tabs):

  1. Main - Configuration and documentation
  2. Create_Source_Products - Generate product source file
  3. Create_Source_Customers - Generate customer source file
  4. Create_Source_Sales - Generate sales source file
  5. Create_Source_Stores - Generate store source file
  6. Create_Source_Incremental - Generate incremental source file
  7. Data_Load_Script - Variables and configuration
  8. Extract_Data_E - Extract and join with products
  9. Transformed_Data_T2 - Join with customers
  10. Transformed_Data_T1 - Apply business logic
  11. Incremental_Load - Handle daily updates
  12. SCD_Type2 - Track historical changes
  13. Master_Calendar - Generate date dimension
  14. Save_Transformed_Data - Store to QVD
  15. Load_Data_for_Viz - Final data model

Phase 1: Foundation Setup

Concept: ETL Foundation

Before loading any data, you need to establish the foundation:

Why This Matters

Without proper foundation:

Design Decisions

Decision 1: Use Variables for Paths

Decision 2: Use Variables for Business Rules

Decision 3: Separate Source Creation

Implementation Steps

Step 1.1: Create Physical Folder Structure

What to do:

  1. Open Windows Explorer
  2. Create main folder: C:\QlikTraining\DataFiles
  3. Create subfolder: C:\QlikTraining\DataFiles\Transformed

Why two folders: Keeping raw source QVDs separate from transformed output mirrors the layered architecture and makes it obvious which files are originals and which can be safely regenerated.

Step 1.2: Create Qlik Sense Application

What to do:

  1. Open Qlik Sense Desktop
  2. Create new app: "GlobalTech_ETL_Training"
  3. Open Data Load Editor

Navigation tip: Data Load Editor is where ALL ETL code lives

Step 1.3: Create Data Connection

What to do:

  1. In Data Load Editor β†’ Right panel β†’ Data connections
  2. Create new connection:
    • Type: Folder
    • Name: DataFiles
    • Path: C:\QlikTraining\DataFiles

What this does: Tells Qlik where to find files using a logical name (lib://DataFiles/) instead of physical path

Step 1.4: Generate Source QVD Files

Concept: Source Data Generation

In real projects, source data comes from databases, APIs, or file systems. For training, we'll create inline data and save it as QVD files to simulate real sources.

What is a QVD file? A QVD (QlikView Data) file is Qlik's proprietary, single-table binary format. Because the data is already stored in Qlik's internal representation, reloading from a QVD is dramatically faster than reading from a text file or database.

QVD Benefits:

Create 5 Source Files:

Each source file needs its own script tab:

  1. Create tab: Create_Source_Products
  2. Create tab: Create_Source_Customers
  3. Create tab: Create_Source_Sales
  4. Create tab: Create_Source_Stores
  5. Create tab: Create_Source_Incremental

In each tab:

Run the load: This creates all 5 QVD files in your DataFiles folder

Key Techniques Introduced

Technique 1: INLINE Data Loading

LOAD * INLINE [
Field1, Field2, Field3
Value1, Value2, Value3
];

Technique 2: STORE Statement

STORE TableName INTO [lib://DataFiles/filename.qvd] (qvd);

Technique 3: DROP TABLE

DROP TABLE TableName;

Common Pitfalls

❌ Pitfall 1: Forgetting to create folders before running script

❌ Pitfall 2: Wrong folder path in connection

❌ Pitfall 3: Not dropping tables after STORE

Validation Checklist

βœ… Folders exist: C:\QlikTraining\DataFiles and Transformed subfolder
βœ… Data connection "DataFiles" appears in connections list
βœ… Script runs without errors
βœ… Five QVD files created:


🎯 Phase 2: Data Extraction

Concept: The "E" in ETL

Extraction means reading data from source systems into Qlik's memory so we can work with it.

Key principle: Extract first, transform later

Why Extract to Raw Tables First?

Reason 1: Debugging

Reason 2: Auditability

Reason 3: Performance

Design Decisions

Decision 1: Naming Convention - "_Raw" Suffix

Decision 2: Load All Dimensions Before Facts

Decision 3: Date Formatting During Extraction

Implementation Steps

Step 2.1: Configure Variables

Create tab: Data_Load_Script

What to define:

Path Variables:

Business Rule Variables:

Date Variables:

Status Variables:

SET vs LET:
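The distinction can be sketched as follows. This is a minimal illustration: vSourcePath, vTargetPath, and vActiveStatus are the variable names used later in this guide, while vReloadStart and vCurrentYear are hypothetical examples added here.

// SET stores the right-hand side as literal text — nothing is evaluated
SET vSourcePath = lib://DataFiles/;
SET vTargetPath = lib://DataFiles/Transformed/;
SET vActiveStatus = Active;

// LET evaluates the expression first, then stores the result
LET vReloadStart = Now();          // stores the timestamp value itself
LET vCurrentYear = Year(Today());  // stores a number such as 2026

Rule of thumb: SET for paths and fixed text, LET for anything that must be computed.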

Step 2.2: Extract Dimension - Products

Create tab section: Under Extract_Data_E

What to load:

Fields to load:

Date formatting:

Date(LastModifiedDate, 'YYYY-MM-DD') as Product_LastModifiedDate

Step 2.3: Extract Dimension - Customers

What to load:

Fields to load:

Why geography fields matter:

Step 2.4: Extract Dimension - Stores

What to load:

Fields to load:

Why rename country/region/city? Stores and customers both carry geography fields. If both tables kept identical field names, Qlik would link them on every shared field and create a synthetic key (explained in Phase 4).

Step 2.5: Extract Fact - Sales

What to load:

Fields to load:

WHERE Clause for Filtering:

WHERE ProcessedFlag = 0

Key Techniques Introduced

Technique 1: LOAD FROM QVD

LOAD 
    Field1,
    Field2
FROM [lib://DataFiles/filename.qvd] (qvd);

Technique 2: Optimized Load
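An optimized load streams the QVD's internal format straight into memory. Any expression in the LOAD, or any WHERE clause other than a single-field Exists() test, falls back to a slower standard load. A sketch (file names follow this guide's folder structure):

// Optimized: fields passed straight through, no expressions
LOAD ProductID, ProductName
FROM [lib://DataFiles/DIM_Products.qvd] (qvd);

// Still optimized: WHERE Exists() on one field is the only filter that
// preserves the optimized path
LOAD OrderID, ProductID, Quantity
FROM [lib://DataFiles/FACT_Sales.qvd] (qvd)
WHERE Exists(ProductID);

// NOT optimized: the Date() expression forces a standard load
LOAD OrderID, Date(OrderDate, 'YYYY-MM-DD') as OrderDate
FROM [lib://DataFiles/FACT_Sales.qvd] (qvd);

The progress window tells you which path was taken: it shows "(qvd optimized)" for optimized loads.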

Technique 3: Date Formatting

Date(DateField, 'YYYY-MM-DD') as FormattedDate

Technique 4: Field Renaming

OriginalField as NewField

Technique 5: WHERE Clause Filtering

WHERE ProcessedFlag = 0

Common Pitfalls

❌ Pitfall 1: Forgetting (qvd) format specifier

❌ Pitfall 2: Loading all fields with * when you need to transform

LOAD * FROM file.qvd (qvd);  ← Can't rename fields

❌ Pitfall 3: Date fields loading as text

❌ Pitfall 4: Not renaming conflicting fields

Validation Checklist

βœ… Script runs without errors
βœ… Four tables visible in Data Model Viewer:


Phase 3: Dimension Transformation

Concept: Adding Business Value

Transformation is where we add business intelligence to raw data:

Raw data answers "what happened." Transformed data answers "what does it mean."

Why Transform Dimensions First?

Reason 1: Dimensions are stable

Reason 2: These attributes will be used in fact calculations

Reason 3: Keep transformations separate

Design Decisions

Decision 1: Create New "_Transformed" Tables

Decision 2: Use RESIDENT Loads

Decision 3: Document Business Rules in Comments

Implementation Steps

Step 3.1: Transform Products

Create tab section: Transformed_Data_T1 (or similar)

Objective: Calculate product profitability and classify by price tier

Table name: DIM_Products_Transformed

Source: RESIDENT DIM_Products_Raw

Calculations to add:

1. ProfitPerUnit

ListPrice - UnitCost as ProfitPerUnit

2. ProfitMarginPercent

Round(((ListPrice - UnitCost) / ListPrice) * 100, 2) as ProfitMarginPercent

3. PriceTier Classification

If(ListPrice >= 2000, 'Premium',
   If(ListPrice >= 1000, 'Mid-Range',
      If(ListPrice >= 500, 'Standard', 'Budget'))) as PriceTier

4. IsActive Flag

If(Status = '$(vActiveStatus)', 1, 0) as IsActive

Why keep original fields?

After transformation:

DROP TABLE DIM_Products_Raw;

Step 3.2: Transform Customers

Table name: DIM_Customers_Transformed

Source: RESIDENT DIM_Customers_Raw

Calculations to add:

1. CustomerTenureYears

Year(Today()) - Year(RegistrationDate) as CustomerTenureYears

2. LoyaltyTier

If(Year(Today()) - Year(RegistrationDate) >= 2, 'Gold',
   If(Year(Today()) - Year(RegistrationDate) >= 1, 'Silver', 'Bronze')) as LoyaltyTier

3. MarketType

If(Region = 'North America' or Region = 'Europe', 'Developed', 'Emerging') as MarketType

Why these specific calculations?

After transformation:

DROP TABLE DIM_Customers_Raw;

Key Techniques Introduced

Technique 1: RESIDENT Load

LOAD 
    *,
    NewCalculation
RESIDENT SourceTable;

Technique 2: Arithmetic Calculations

Field1 - Field2 as Difference
Field1 * Field2 as Product
Field1 / Field2 as Ratio

Technique 3: Nested IF Statements

If(condition1, result1,
   If(condition2, result2,
      If(condition3, result3, 
         else_result))) as NewField

Technique 4: Date Functions

Technique 5: Variable Usage in Expressions

If(Status = '$(vActiveStatus)', 1, 0)

Technique 6: Round Function

Round(expression, decimals) as RoundedValue

Common Pitfalls

❌ Pitfall 1: Forgetting to keep original fields

LOAD 
    NewCalculation as Field1  ← Only calculated field
RESIDENT SourceTable;

❌ Pitfall 2: Division by zero

Field1 / Field2 as Ratio  ← What if Field2 = 0?
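One defensive rewrite of the margin calculation from Step 3.1, shown two ways (a sketch; the fallback value 0 is a choice, not a requirement):

// Guard explicitly: only divide when ListPrice is a positive number
If(ListPrice > 0,
   Round(((ListPrice - UnitCost) / ListPrice) * 100, 2),
   0) as ProfitMarginPercent

// Or use Alt(): it returns its first numeric argument, so it also
// catches the NULL produced when the division fails
Alt(Round(((ListPrice - UnitCost) / ListPrice) * 100, 2), 0) as ProfitMarginPercent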

❌ Pitfall 3: Wrong date arithmetic

Today() - RegistrationDate  ← Returns days, not years

❌ Pitfall 4: Nested IF too deep

❌ Pitfall 5: Not using variables for business rules

If(Discount > 0.10, 'Heavy', 'Light')  ← Hardcoded

Validation Checklist

βœ… Script runs without errors
βœ… Two new tables in memory:


πŸ”— Phase 4: Data Integration (Joins)

Concept: Enriching Facts with Dimensions

The Star Schema Principle: Facts should contain keys and measures, dimensions should contain descriptions

Currently:

Solution: JOIN dimension tables to fact table

What is a JOIN?

A JOIN combines two tables based on matching values in a common field (the "key").

Types of Joins:

  1. INNER JOIN: Keep only matching records
  2. LEFT JOIN: Keep all records from left table, matching from right
  3. RIGHT JOIN: Keep all records from right table, matching from left
  4. OUTER JOIN: Keep all records from both tables

Why LEFT JOIN for This Scenario?

We use LEFT JOIN because:

Real-world example:
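A minimal inline illustration of why LEFT JOIN keeps every sale (the three orders and two products here are hypothetical):

Sales:
LOAD * INLINE [
OrderID, ProductID
S001, P001
S002, P002
S003, P999
];

LEFT JOIN (Sales)
LOAD * INLINE [
ProductID, ProductName
P001, Laptop
P002, Monitor
];

// All 3 sales rows survive. S003 keeps its row but gets a NULL
// ProductName because P999 has no match. An INNER JOIN would have
// silently dropped S003 — and its revenue — from the fact table.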

Design Decisions

Decision 1: Join to Facts, Not Create Separate Tables

Decision 2: Rename Fields During Join (Critical!)

Decision 3: Join Products First, Then Customers

Implementation Steps

Step 4.1: Join Sales with Products

Create tab section: Extract_Data_E (continues from extraction)

Objective: Add product details to every sales transaction

What to do:

LEFT JOIN (FACT_Sales_Raw)
LOAD
    ProductID,
    ProductName as Prod_ProductName,
    Category as Prod_Category,
    ... (all product fields, renamed with Prod_ prefix)
RESIDENT DIM_Products_Transformed;

How LEFT JOIN works:

  1. Qlik looks at each row in FACT_Sales_Raw
  2. Finds the ProductID value (e.g., "P001")
  3. Searches for matching ProductID in DIM_Products_Transformed
  4. If found β†’ Adds all product fields to that sales row
  5. If not found β†’ Adds NULLs
  6. Result: FACT_Sales_Raw now has product columns

Fields to add (all renamed with Prod_ prefix):

Why keep "ProductID" without prefix? It is the join key. Qlik matches on the shared field name, so renaming it would break the join entirely (see Pitfall 2 below).

What happens to FACT_Sales_Raw?

Step 4.2: Join Sales with Customers

Create tab section: Transformed_Data_T2

Objective: Add customer details to every sales transaction

What to do:

LEFT JOIN (FACT_Sales_Raw)
LOAD
    CustomerID,
    CustomerName as Cust_CustomerName,
    Country as Cust_Country,
    ... (all customer fields, renamed with Cust_ prefix)
RESIDENT DIM_Customers_Transformed;

Fields to add (all renamed with Cust_ prefix):

After this join: FACT_Sales_Raw now has:

Result: Fully enriched fact table ready for final calculations

Understanding Synthetic Keys (and Why We Avoid Them)

What is a Synthetic Key? When two or more tables share multiple field names (not just one key), Qlik automatically creates a synthetic key table.

Example of Problem:

Table 1: Sales (CustomerID, CustomerName, Country)
Table 2: Customers (CustomerID, CustomerName, Country)

Why Synthetic Keys are Bad:

  1. Performance: Extra table overhead, slower calculations
  2. Memory: Uses more RAM unnecessarily
  3. Confusion: Data model becomes unclear
  4. Calculations: Can produce wrong results in aggregations

How We Avoid Them:
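In this guide the fix is renaming: during every join, each non-key field gets a table prefix, so the intended key is the only field the two tables share. A sketch of the pattern:

LEFT JOIN (FACT_Sales_Raw)
LOAD
    CustomerID,                          // only shared field = the join key
    CustomerName as Cust_CustomerName,   // prefixed, so no accidental link
    Country      as Cust_Country
RESIDENT DIM_Customers_Transformed;

Because CustomerID is now the single common field, Qlik joins on it alone and never needs to manufacture a synthetic key.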

Key Techniques Introduced

Technique 1: LEFT JOIN Syntax

LEFT JOIN (TargetTable)
LOAD 
    KeyField,
    OtherFields
RESIDENT SourceTable;

Technique 2: Field Prefixing

FieldName as Prefix_FieldName

Technique 3: Join Keys

Technique 4: Reading Table in Memory

RESIDENT TableName

Common Pitfalls

❌ Pitfall 1: Forgetting to rename fields

LEFT JOIN (FACT_Sales_Raw)
LOAD 
    ProductID,
    ProductName,  ← Not renamed!
    Category      ← Not renamed!
RESIDENT DIM_Products_Transformed;

❌ Pitfall 2: Renaming the key field

LEFT JOIN (FACT_Sales_Raw)
LOAD 
    ProductID as Prod_ProductID,  ← WRONG!
    ...

❌ Pitfall 3: Using wrong join type

INNER JOIN (FACT_Sales_Raw)  ← Drops non-matching sales

❌ Pitfall 4: Inconsistent prefixes

Product_Name as Prod_ProductName
Product_Category as Prd_Category  ← Different prefix!

❌ Pitfall 5: Circular joins

LEFT JOIN (Table1) ... from Table2
LEFT JOIN (Table2) ... from Table1

Validation Checklist

βœ… Script runs without errors
βœ… FACT_Sales_Raw still has 15 rows (LEFT JOIN doesn't reduce)
βœ… FACT_Sales_Raw now has many more columns:


πŸ’Ό Phase 5: Business Logic Application

Concept: Creating the Final Fact Table

Now we have:

This phase creates the final fact table that will be used for all analysis.

Why Create a New Final Table?

Option 1: Continue modifying FACT_Sales_Raw

Option 2: Create new FACT_Sales_Final table (WE CHOOSE THIS)

Design Decisions

Decision 1: Calculate All Metrics in One Place

Decision 2: Use Meaningful Field Names in Output

Decision 3: Add Metadata Fields

Decision 4: Clean Up After Transformation

Implementation Steps

Step 5.1: Create Final Fact Table

Create tab section: Transformed_Data_T1

Objective: Create FACT_Sales_Final with all business logic applied

Source: RESIDENT FACT_Sales_Raw

Table name: FACT_Sales_Final

Step 5.2: Include Time Intelligence Fields

Why: Business Ask #6 requires time-based analysis

Fields to add:

OrderYear

Year(OrderDate) as OrderYear

OrderMonth

Month(OrderDate) as OrderMonth

OrderQuarter

Ceil(Month(OrderDate)/3) as OrderQuarter

Why Ceil() for Quarter? Month(OrderDate)/3 yields fractions (month 1 β†’ 0.33, month 4 β†’ 1.33, month 12 β†’ 4.0). Ceil() rounds up to the next whole number, so months 1–3 map to quarter 1, months 4–6 to quarter 2, and so on.

OrderDayOfWeek

WeekDay(OrderDate) as OrderDayOfWeek

Step 5.3: Rename Dimension Fields to Clean Names

Customer Fields (remove Cust_ prefix):

CustomerID,                    ← Keep as-is (key field)
Cust_CustomerName as CustomerName,
Cust_Segment as Segment,
Cust_LoyaltyTier as LoyaltyTier,
Cust_MarketType as MarketType,
Cust_Country as Country,
Cust_Region as Region,
Cust_City as City,
Cust_TenureYears as CustomerTenureYears

Product Fields (remove Prod_ prefix):

ProductID,                     ← Keep as-is (key field)
Prod_ProductName as ProductName,
Prod_Category as Category,
Prod_SubCategory as SubCategory,
Prod_Brand as Brand,
Prod_PriceTier as PriceTier

Why remove prefixes now?

Step 5.4: Include Transaction Fields

Original transaction fields:

Quantity,
Prod_ListPrice as ListPrice,
Prod_UnitCost as UnitCost,
Discount,
ShippingCost,
OrderStatus

Step 5.5: Calculate Revenue Metrics

Business Ask #1: Need accurate revenue calculations

GrossRevenue

Quantity * Prod_ListPrice as GrossRevenue

NetRevenue

Quantity * Prod_ListPrice * (1 - Discount) as NetRevenue

Why (1 - Discount)?

Step 5.6: Calculate Cost Metrics

TotalCost

Quantity * Prod_UnitCost as TotalCost

Step 5.7: Calculate Profit Metrics

Business Ask #3: Need profitability analysis

GrossProfit

(Quantity * Prod_ListPrice * (1 - Discount)) - (Quantity * Prod_UnitCost) as GrossProfit

DiscountAmount

Quantity * Prod_ListPrice * Discount as DiscountAmount

Step 5.8: Apply Business Classifications

Business Ask #4: Categorize orders by value and discount level

OrderValueCategory

If((Quantity * Prod_ListPrice * (1 - Discount)) >= 1000, 
   'High Value', 
   'Standard') as OrderValueCategory

DiscountCategory

If(Discount > 0.10, 
   'Heavy Discount', 
   If(Discount > 0, 'Light Discount', 'No Discount')) as DiscountCategory

IsCompleted

If(OrderStatus = 'Completed', 1, 0) as IsCompleted

Step 5.9: Calculate Total Transaction Value

TotalTransactionValue

(Quantity * Prod_ListPrice * (1 - Discount)) + ShippingCost as TotalTransactionValue

Step 5.10: Add Flags and Metadata

IsCurrentYear

If(Year(OrderDate) = 2026, 1, 0) as IsCurrentYear

ProcessedFlag

1 as ProcessedFlag

ETL_LoadDate

'2026-01-29' as ETL_LoadDate
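The hardcoded literal is a training simplification. In a production script you would typically stamp the actual reload time via a variable instead — a sketch (vLoadTimestamp is an illustrative name, not part of this guide's script):

// In the configuration tab (Data_Load_Script):
LET vLoadTimestamp = Timestamp(Now(), 'YYYY-MM-DD hh:mm:ss');

// Then in the fact load:
'$(vLoadTimestamp)' as ETL_LoadDate

This makes every row self-documenting: you can always tell which reload produced it.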

Step 5.11: Save Dimensions to QVD

Important step BEFORE dropping:

STORE DIM_Products_Transformed INTO [$(vTargetPath)DIM_Products_Transformed.qvd] (qvd);
STORE DIM_Customers_Transformed INTO [$(vTargetPath)DIM_Customers_Transformed.qvd] (qvd);

This ensures:

Step 5.12: Clean Up Tables

After saving, now drop:

DROP TABLE FACT_Sales_Raw;
DROP TABLE DIM_Products_Transformed;
DROP TABLE DIM_Customers_Transformed;

Why drop these tables?

Key Techniques Introduced

Technique 1: Complex Calculated Fields

(Field1 * Field2 * (1 - Field3)) - (Field1 * Field4) as Result

Technique 2: Ceil() Function

Ceil(value) as RoundedUp

Technique 3: Multiple Conditions in IF

If(condition1, 'Result1',
   If(condition2, 'Result2', 'DefaultResult')) as Category

Technique 4: Literal Values

1 as ConstantField
'Fixed Text' as TextField

Technique 5: Year/Month/Quarter Functions

Common Pitfalls

❌ Pitfall 1: Wrong order of operations

Quantity * ListPrice * 1 - Discount  ← WRONG
Should be: Quantity * ListPrice * (1 - Discount)

❌ Pitfall 2: Dropping tables before saving

DROP TABLE DIM_Products_Transformed;
STORE DIM_Products_Transformed ...  ← ERROR: Table doesn't exist

❌ Pitfall 3: Not handling NULL values

Field1 / Field2  ← What if Field2 is NULL?

❌ Pitfall 4: Inconsistent field naming

CustomerName  ← Here
Customer_Name  ← There
Cust_Name  ← Somewhere else

❌ Pitfall 5: Forgetting to rename prefixed fields

Prod_ProductName,  ← Still has prefix
Prod_Category      ← Not user-friendly

Validation Checklist

βœ… Script runs without errors
βœ… FACT_Sales_Final table created with 15 rows
βœ… All time fields present (OrderYear, OrderMonth, OrderQuarter, OrderDayOfWeek)
βœ… All dimension fields renamed (CustomerName, ProductName, Segment, Category, etc.)
βœ… All revenue calculations present (GrossRevenue, NetRevenue)
βœ… All cost calculations present (TotalCost)
βœ… All profit calculations present (GrossProfit, DiscountAmount)
βœ… All classifications present (OrderValueCategory, DiscountCategory, IsCompleted)
βœ… Metadata fields present (ProcessedFlag=1, ETL_LoadDate, IsCurrentYear)
βœ… Original tables dropped (FACT_Sales_Raw, DIM_Products_Transformed, DIM_Customers_Transformed)
βœ… Dimension QVDs saved before dropping
βœ… Pick a row and manually verify calculations are correct

Manual verification example:
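For instance, take a hypothetical row with Quantity = 2, ListPrice = 1000, UnitCost = 600, Discount = 0.10, ShippingCost = 25 (substitute the actual values from one of your 15 rows). The formulas from Steps 5.5–5.9 give:

GrossRevenue          = 2 * 1000              = 2000
NetRevenue            = 2 * 1000 * (1 - 0.10) = 1800
TotalCost             = 2 * 600               = 1200
GrossProfit           = 1800 - 1200           = 600
DiscountAmount        = 2 * 1000 * 0.10       = 200
TotalTransactionValue = 1800 + 25             = 1825

If the row in your app does not reproduce its own numbers this way, one of the calculated fields was mistyped.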


πŸ”„ Phase 6: Incremental Load Implementation

Concept: Loading Only New Data

The Problem with Full Loads:

As data grows, full reloads become:

The Incremental Solution:

Benefits:

How Incremental Load Works

Step-by-step process:

  1. Mark what's been loaded

    • Add ProcessedFlag field to source data
    • 0 = not loaded yet, 1 = already loaded
  2. Track what's already in warehouse

    • Save processed data to QVD after each load
    • This becomes the "history"
  3. Filter for new data only

    • Load WHERE ProcessedFlag = 0
    • Only gets unprocessed records
  4. Apply same transformations

    • New data goes through same logic as initial load
    • Ensures consistency
  5. Append to history

    • CONCATENATE new with existing
    • Result: Complete dataset
  6. Mark as processed

    • Set ProcessedFlag = 1
    • Next time: Won't load again
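The steps above compress into script form as follows (paths follow the Folder Structure section; FACT_Sales_New is an illustrative name, and the Phase 5 transformation step is elided because it is identical to the initial load):

// Steps 1 & 3: pick up only rows not yet processed
FACT_Sales_New:
LOAD *
FROM [lib://DataFiles/FACT_Sales_Incremental.qvd] (qvd)
WHERE ProcessedFlag = 0;

// Step 4: apply the same Phase 5 transformations here,
// including setting 1 as ProcessedFlag on the new rows

// Step 5: append the history already on disk
CONCATENATE (FACT_Sales_New)
LOAD * FROM [lib://DataFiles/Transformed/FACT_Sales_Complete.qvd] (qvd);

// Steps 2 & 6: write the combined set back as the new history
STORE FACT_Sales_New INTO [lib://DataFiles/Transformed/FACT_Sales_Complete.qvd] (qvd);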

Design Decisions

Decision 1: Use ProcessedFlag Not Dates

Decision 2: Reload Dimensions Each Time

Decision 3: Use NoConcatenate Keyword

Decision 4: Drop and Reload Pattern

Implementation Steps

Step 6.1: Reload Dimension Tables

Create tab section: Incremental_Load

Why reload dimensions?

What to do:

DIM_Products_Transformed:
LOAD * FROM [$(vTargetPath)DIM_Products_Transformed.qvd] (qvd);

DIM_Customers_Transformed:
LOAD * FROM [$(vTargetPath)DIM_Customers_Transformed.qvd] (qvd);

Tables now in memory:

Step 6.2: Load New Incremental Sales Data

Source: FACT_Sales_Incremental.qvd

What to load:

FACT_Sales_Incremental_Raw:
LOAD 
    OrderID,
    Date(OrderDate, 'YYYY-MM-DD') as OrderDate,
    CustomerID,
    ProductID,
    Quantity,
    Discount,
    ShippingCost,
    OrderStatus,
    ProcessedFlag
FROM [$(vSourcePath)FACT_Sales_Incremental.qvd] (qvd)
WHERE ProcessedFlag = 0;

Key points:

Expected result: 5 new orders loaded (ORD016-ORD020)

Step 6.3: Join Incremental Sales with Dimensions

Join with Products:

LEFT JOIN (FACT_Sales_Incremental_Raw)
LOAD
    ProductID,
    ProductName as Prod_ProductName,
    Category as Prod_Category,
    SubCategory as Prod_SubCategory,
    Brand as Prod_Brand,
    UnitCost as Prod_UnitCost,
    ListPrice as Prod_ListPrice,
    ProfitPerUnit as Prod_ProfitPerUnit,
    PriceTier as Prod_PriceTier,
    IsActive as Prod_IsActive
RESIDENT DIM_Products_Transformed;

Join with Customers:

LEFT JOIN (FACT_Sales_Incremental_Raw)
LOAD
    CustomerID,
    CustomerName as Cust_CustomerName,
    Country as Cust_Country,
    Region as Cust_Region,
    City as Cust_City,
    Segment as Cust_Segment,
    CustomerTenureYears as Cust_TenureYears,
    LoyaltyTier as Cust_LoyaltyTier,
    MarketType as Cust_MarketType
RESIDENT DIM_Customers_Transformed;

Exactly the same as Phase 4 joins!

Step 6.4: Apply Transformations to Incremental Data

Critical: Use NoConcatenate:

NoConcatenate
FACT_Sales_Incremental_Transformed:
LOAD
    ... (all same transformations as Phase 5)
RESIDENT FACT_Sales_Incremental_Raw;

Why NoConcatenate?

Transformations to apply (same as Phase 5):

Result:

Step 6.5: Concatenate with Existing Data

Now explicitly concatenate:

CONCATENATE (FACT_Sales_Final)
LOAD * RESIDENT FACT_Sales_Incremental_Transformed;

What this does:

Why CONCATENATE is safe now:

Step 6.6: Clean Up

Drop temporary tables:

DROP TABLE FACT_Sales_Incremental_Raw;
DROP TABLE FACT_Sales_Incremental_Transformed;
DROP TABLE DIM_Products_Transformed;
DROP TABLE DIM_Customers_Transformed;

Why drop dimensions again?

Final state:

Understanding Auto-Concatenation

Qlik's Auto-Concatenation Rule: When you create a table with identical field names to an existing table, Qlik automatically appends rows.

Example:

Table1:
LOAD
    OrderID,
    OrderDate,
    Quantity
...

Table1:  ← Same name!
LOAD
    OrderID,
    OrderDate,
    Quantity
...
Result: One table "Table1" with all rows combined

Problem: If you're in the middle of transformations, auto-concatenation happens too early.

Solution - Option 1: Different Table Names

Table1:
LOAD ... initial data

Table2:  ← Different name
LOAD ... incremental data

CONCATENATE (Table1)
LOAD * RESIDENT Table2;

Solution - Option 2: NoConcatenate Keyword

Table1:
LOAD ... initial data

NoConcatenate  ← Prevents auto-concat
Table2:
LOAD ... incremental data (would auto-concat without NoConcatenate)

CONCATENATE (Table1)
LOAD * RESIDENT Table2;

We use Option 2: it keeps both tables' roles explicit in the script and makes the table relationships clear.

Key Techniques Introduced

Technique 1: WHERE with ProcessedFlag

WHERE ProcessedFlag = 0

Technique 2: NoConcatenate Keyword

NoConcatenate
TableName:
LOAD ...

Technique 3: Explicit CONCATENATE

CONCATENATE (TargetTable)
LOAD * RESIDENT SourceTable;

Technique 4: Reloading from QVD

TableName:
LOAD * FROM [path/file.qvd] (qvd);

Common Pitfalls

❌ Pitfall 1: Auto-concatenation happens unexpectedly

FACT_Sales_Final:
LOAD ... (creates table)

FACT_Sales_Incremental_Transformed:
LOAD ... (auto-concatenates to FACT_Sales_Final!)

❌ Pitfall 2: Forgetting to reload dimensions

LEFT JOIN (FACT_Sales_Incremental_Raw)
LOAD ... RESIDENT DIM_Products_Transformed;  ← ERROR: Table not found

❌ Pitfall 3: Inconsistent transformations

Initial: NetRevenue = Quantity * ListPrice * (1 - Discount)
Incremental: NetRevenue = Quantity * ListPrice  ← Missing discount!

❌ Pitfall 4: Not updating ProcessedFlag

1 as ProcessedFlag  ← Forgot this line

❌ Pitfall 5: Loading all data instead of new only

WHERE ProcessedFlag = 0  ← Missing this line

Validation Checklist

βœ… Script runs without errors
βœ… Dimensions reloaded successfully from QVD
βœ… Incremental raw data loaded (5 rows)
βœ… Joins completed (product and customer details added)
βœ… Transformations applied (same logic as initial load)
βœ… NoConcatenate created separate table
βœ… CONCATENATE merged tables correctly
βœ… FACT_Sales_Final now has 20 rows (15 initial + 5 incremental)
βœ… All 20 rows have ProcessedFlag = 1
βœ… Temporary tables dropped
βœ… No duplicate records (check OrderID - should have ORD001-ORD020, no repeats)
βœ… Calculations correct on incremental records (spot check a few)

Test that the incremental load works: the next time the script runs:


πŸ“œ Phase 7: Slowly Changing Dimension (SCD Type 2)

Concept: Tracking Historical Changes

The Problem: Customer C001 is currently in the "Corporate" segment. Next month, they move to the "Consumer" segment. The traditional approach simply overwrites the record:

Before: C001 | Corporate
After:  C001 | Consumer  ← Lost history!

The Solution: SCD Type 2:

Version 1: C001 | Corporate | 2024-01-01 | 2024-05-31 | Not Current
Version 2: C001 | Consumer  | 2024-06-01 | 9999-12-31 | Current

SCD Types Overview

Type 0: No Changes Allowed

Type 1: Overwrite (No History)

Before: Address = "123 Old St"
After:  Address = "456 New St"  ← Old address lost

Type 2: Track Full History (WE USE THIS)

Row 1: Address = "123 Old St" | 2020-01-01 | 2024-12-31 | Not Current
Row 2: Address = "456 New St" | 2025-01-01 | 9999-12-31 | Current

Type 3: Previous Value Column

Current_Segment = "Consumer"
Previous_Segment = "Corporate"

Type 4: History Table

We choose Type 2 because:

SCD Type 2 Structure

Key Fields:

  1. Business Key: CustomerID (identifies customer)
  2. Attributes: Segment, LoyaltyTier (what can change)
  3. EffectiveStartDate: When this version became active
  4. EffectiveEndDate: When this version stopped being active
  5. IsCurrent: Flag for current version (1 = current, 0 = historical)
  6. VersionNumber: Sequence of changes (1, 2, 3...)

Example:

CustomerID | Segment    | LoyaltyTier | EffectiveStartDate | EffectiveEndDate | IsCurrent | VersionNumber
-----------|------------|-------------|-------------------|-----------------|-----------|---------------
C001       | Consumer   | Bronze      | 2023-06-15        | 2024-05-31      | 0         | 1
C001       | Corporate  | Silver      | 2024-06-01        | 9999-12-31      | 1         | 2

Interpretation: C001 was a Bronze-tier Consumer from 2023-06-15 through 2024-05-31 (version 1, no longer current). Since 2024-06-01 they are a Silver-tier Corporate customer (version 2, the current row, open-ended until 9999-12-31).

Design Decisions

Decision 1: Track Segment and LoyaltyTier Changes

Decision 2: Use 9999-12-31 for "Current"

Decision 3: Handle Initial Load vs Updates

Decision 4: Close Old Versions When Creating New

Implementation Steps

Step 7.1: Reload Customer Dimension

Create tab section: SCD_Type2

Why reload?

What to do:

DIM_Customers_For_SCD:
LOAD 
    CustomerID,
    CustomerName,
    Segment,
    Country,
    Region,
    City,
    LoyaltyTier
FROM [$(vTargetPath)DIM_Customers_Transformed.qvd] (qvd);

Note: Only load fields relevant for SCD

Step 7.2: Check If SCD File Exists

The logic:

How to check:

IF FileSize('$(vTargetPath)DIM_Customers_SCD.qvd') > 0 THEN
    ... (handle updates)
ELSE
    ... (handle initial load)
ENDIF

FileSize() function: Returns the size of the file in bytes, or NULL if the file does not exist. Since NULL > 0 evaluates to false, the comparison doubles as an existence check.

Step 7.3: Initial Load (First Run)

When: SCD file doesn't exist (ELSE branch)

What to do: Create version 1 for all customers

DIM_Customers_SCD_New:
LOAD
    CustomerID,
    CustomerName,
    Segment,
    Country,
    Region,
    City,
    LoyaltyTier,
    Date(Today()) as EffectiveStartDate,
    Date('9999-12-31') as EffectiveEndDate,
    1 as IsCurrent,
    1 as VersionNumber
RESIDENT DIM_Customers_For_SCD;

Fields added:

Result: 10 rows, one per customer, all version 1

Step 7.4: Update Logic (Subsequent Runs)

When: SCD file exists (THEN branch)

Steps:

  1. Load existing SCD data
  2. Identify current versions
  3. Compare with new data
  4. Detect changes
  5. Create new versions for changed records
  6. Close old versions
  7. Keep unchanged records

Step 7.4.1: Load Existing SCD Data

DIM_Customers_SCD_Existing:
LOAD 
    CustomerID,
    CustomerName,
    Segment,
    Country,
    Region,
    City,
    LoyaltyTier,
    EffectiveStartDate,
    EffectiveEndDate,
    IsCurrent,
    VersionNumber
FROM [$(vTargetPath)DIM_Customers_SCD.qvd] (qvd);

Step 7.4.2: Get Current Active Records

DIM_Customers_Current:
LOAD 
    CustomerID,
    Segment as Old_Segment,
    LoyaltyTier as Old_LoyaltyTier,
    VersionNumber as Old_Version
RESIDENT DIM_Customers_SCD_Existing
WHERE IsCurrent = 1;

What this does:

Step 7.4.3: Join to Find Changes

LEFT JOIN (DIM_Customers_For_SCD)
LOAD 
    CustomerID,
    Old_Segment,
    Old_LoyaltyTier,
    Old_Version
RESIDENT DIM_Customers_Current;

What this does:

Step 7.4.4: Identify Changed Records

NoConcatenate
DIM_Customers_Changed:
LOAD *
RESIDENT DIM_Customers_For_SCD
WHERE Segment <> Old_Segment 
   OR LoyaltyTier <> Old_LoyaltyTier;

What this does:

Step 7.4.5: Check If Any Changes Detected

LET vChangesDetected = NoOfRows('DIM_Customers_Changed');

IF vChangesDetected > 0 THEN
    ... (create new versions)
ELSE
    ... (no changes, keep existing)
ENDIF

NoOfRows() function: returns the number of rows in a previously loaded table, letting the script branch on whether any changes were found.

Step 7.4.6: Create New Versions (If Changes Detected)

DIM_Customers_SCD_New:
NoConcatenate
LOAD
    CustomerID,
    CustomerName,
    Segment,
    Country,
    Region,
    City,
    LoyaltyTier,
    Date(Today()) as EffectiveStartDate,
    Date('9999-12-31') as EffectiveEndDate,
    1 as IsCurrent,
    Old_Version + 1 as VersionNumber
RESIDENT DIM_Customers_Changed;

What this does: creates version N+1 for each changed customer — effective from today, open-ended (9999-12-31), and flagged as current.

Step 7.4.7: Close Old Versions (Changed Customers Only)

First collect the IDs of the changed customers, so the close-out can be limited to them:

ChangedIDs:
LOAD DISTINCT CustomerID as ChangedCustomerID
RESIDENT DIM_Customers_Changed;

CONCATENATE (DIM_Customers_SCD_New)
LOAD
    CustomerID,
    CustomerName,
    Segment,
    Country,
    Region,
    City,
    LoyaltyTier,
    EffectiveStartDate,
    Date(Today()-1) as EffectiveEndDate,
    0 as IsCurrent,
    VersionNumber
RESIDENT DIM_Customers_SCD_Existing
WHERE IsCurrent = 1
  AND Exists(ChangedCustomerID, CustomerID);

What this does: rewrites the previously current rows of the changed customers with EffectiveEndDate = yesterday and IsCurrent = 0. The Exists() test matters: without it, the current rows of unchanged customers would also be closed, leaving them with no active version.

Why Today()-1? The old version must end the day before the new version starts. If both carried today's date, the two versions would overlap for one day.

Step 7.4.8: Keep Unchanged and Historical Records

CONCATENATE (DIM_Customers_SCD_New)
LOAD * 
RESIDENT DIM_Customers_SCD_Existing
WHERE IsCurrent = 0
   OR NOT Exists(ChangedCustomerID, CustomerID);

DROP TABLE ChangedIDs;

What this does: carries forward all already-closed historical rows, plus the still-current rows of the customers that did not change.

Step 7.4.9: Handle No Changes

ELSE  ← If vChangesDetected = 0

DIM_Customers_SCD_New:
NoConcatenate
LOAD * RESIDENT DIM_Customers_SCD_Existing;

What this does: when nothing changed, simply copies the existing SCD table forward under the new table name.

Step 7.5: Clean Up

DROP TABLE DIM_Customers_Changed;
DROP TABLE DIM_Customers_SCD_Existing;
DROP TABLE DIM_Customers_For_SCD;
DROP TABLE DIM_Customers_Current;

Result: Only DIM_Customers_SCD_New remains

SCD Logic Flow Diagram

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚      Does SCD QVD File Exist?        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜
         β”‚ NO                   β”‚ YES
         β–Ό                      β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  INITIAL LOAD   β”‚     β”‚    UPDATE LOGIC     β”‚
β”‚                 β”‚     β”‚                     β”‚
β”‚  Create Ver 1   β”‚     β”‚ 1. Load existing    β”‚
β”‚  for all        β”‚     β”‚ 2. Get current vers β”‚
β”‚  customers      β”‚     β”‚ 3. Compare with new β”‚
β”‚                 β”‚     β”‚ 4. Detect changes   β”‚
β”‚ Start: Today    β”‚     β”‚ 5. Create new vers  β”‚
β”‚ End: 9999-12-31 β”‚     β”‚ 6. Close old vers   β”‚
β”‚ IsCurrent: 1    β”‚     β”‚ 7. Keep unchanged   β”‚
β”‚ Version: 1      β”‚     β”‚                     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚                          β”‚
         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    β–Ό
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚ DIM_Customers_SCD  β”‚
        β”‚ (Complete History) β”‚
        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Techniques Introduced

Technique 1: FileSize() Function

IF FileSize('path/file.qvd') > 0 THEN
    ... file exists
ELSE
    ... file doesn't exist
ENDIF
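
The same existence gate, sketched in Python for intuition (an analogy only — `os.path.getsize` stands in for Qlik's FileSize(), which is not a Python API):

```python
import os

def qvd_exists(path):
    """Mimic the FileSize() > 0 gate: True only for an existing,
    non-empty file; a missing file behaves like NULL in Qlik."""
    try:
        return os.path.getsize(path) > 0
    except OSError:
        return False
```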

Technique 2: NoOfRows() Function

LET vCount = NoOfRows('TableName');

Technique 3: Date Arithmetic

Date(Today()) as StartDate
Date(Today()-1) as EndDate
Date('9999-12-31') as FarFuture

Technique 4: Comparison in WHERE

WHERE Field1 <> Field2

Technique 5: OR Condition

WHERE Condition1 OR Condition2

Technique 6: Version Incrementing

Old_Version + 1 as VersionNumber

Common Pitfalls

❌ Pitfall 1: Not handling first run separately

IF FileSize() > 0 THEN
    ... only has update logic, no ELSE
ENDIF

❌ Pitfall 2: Forgetting to close old versions

← Missing the "update EffectiveEndDate" step

❌ Pitfall 3: Off-by-one date errors

New start: 2024-06-01
Old end:   2024-06-01  ← Overlap! Should be 2024-05-31

❌ Pitfall 4: Not using NoConcatenate

DIM_Customers_SCD_New:  ← Might auto-concatenate!
LOAD ...

❌ Pitfall 5: Dropping tables in wrong order

DROP TABLE DIM_Customers_SCD_Existing;
CONCATENATE (DIM_Customers_SCD_New)
LOAD * RESIDENT DIM_Customers_SCD_Existing;  ← ERROR!

❌ Pitfall 6: Not tracking all changed attributes

WHERE Segment <> Old_Segment  ← Only checking Segment
← Missing LoyaltyTier check

Validation Checklist

βœ… Script runs without errors on first run (no SCD file)
βœ… DIM_Customers_SCD_New created with 10 rows
βœ… All rows have VersionNumber = 1
βœ… All rows have IsCurrent = 1
βœ… All rows have EffectiveStartDate = Today
βœ… All rows have EffectiveEndDate = 9999-12-31
βœ… SCD QVD file saved
βœ… Script runs without errors on second run (SCD file exists)
βœ… No changes detected β†’ Same 10 rows
βœ… Test with actual change:

To test SCD properly:

  1. Run initial load β†’ 10 customers, all version 1
  2. Run again β†’ Still 10 rows (no changes)
  3. Edit source: Change C001 Segment = 'Consumer'
  4. Run again β†’ Now 11 rows (C001 has 2 versions)
  5. Query: WHERE CustomerID = 'C001' β†’ See version history

πŸ“… Phase 8: Master Calendar Creation

Concept: The Importance of a Date Dimension

Why We Need a Calendar Table:

Sales transactions have dates, but dates alone don't answer questions like: Which quarter did this order fall in? Which fiscal year? Was it a business day?

Calendar table provides: one row per date with all of these attributes pre-calculated, so every chart slices time the same way.

Traditional Approach vs Calculated Dimensions

Option 1: Calculate in Charts (DON'T DO THIS)

Chart expression: Quarter(OrderDate)
Problem: Calculated for every chart, every time

Option 2: Master Calendar (BEST PRACTICE)

Calendar table: Pre-calculated Quarter field
Charts: Just use Quarter field

Design Decisions

Decision 1: Generate from Data, Not a Fixed Range — derive Min/Max dates from the fact table so the calendar always covers exactly the loaded data.

Decision 2: Use AUTOGENERATE for Date Sequences — generate rows in the script rather than maintaining a physical date table.

Decision 3: Include Business Logic — pre-calculate fiscal year, business-day and current-period flags once, in the script, not in every chart.

Decision 4: Link to Facts via OrderDate — Qlik associates on identical field names, so the calendar's date key must share a name with the fact table's order date.

Implementation Steps

Step 8.1: Find Date Range from Data

Create tab section: Master_Calendar

What to do:

TempDates:
LOAD 
    Min(OrderDate) as MinDate,
    Max(OrderDate) as MaxDate
RESIDENT FACT_Sales_Final;

What this does: aggregates the entire fact table down to one row containing the earliest and latest order dates.

Result: One row with MinDate and MaxDate

Step 8.2: Store Date Range in Variables

What to do:

LET vMinDate = Peek('MinDate', 0, 'TempDates');
LET vMaxDate = Peek('MaxDate', 0, 'TempDates');
DROP TABLE TempDates;

Peek() function: reads a field value from a specific row of an already-loaded table — Peek('MinDate', 0, 'TempDates') returns MinDate from row 0, the first row.

Why use variables? The helper table is about to be dropped; variables survive and can be expanded with $(...) in later statements.

DROP TABLE TempDates: the one-row helper has served its purpose and would otherwise clutter (and associate with) the data model.

Step 8.3: Generate Date Sequence

Concept: Create one row for each date in range

Traditional SQL approach:

SELECT date FROM date_table
WHERE date BETWEEN '2024-01-15' AND '2024-02-14'

Qlik approach - AUTOGENERATE:

TempCalendar:
LOAD
    Date($(vMinDate) + IterNo() - 1) as TempDate
AUTOGENERATE 1
WHILE $(vMinDate) + IterNo() - 1 <= $(vMaxDate);

How this works:

AUTOGENERATE 1: generates rows without reading any source; the "1" means one base row, which the WHILE clause then expands into many.

IterNo(): returns the current iteration number within the WHILE loop, starting at 1.

WHILE condition: keeps generating rows from the same base row as long as the condition holds — here, until the generated date passes the max date.

Date arithmetic: in Qlik a date is a number (days), so adding an integer adds days:

Iteration 1: $(vMinDate) + 1 - 1 = $(vMinDate) + 0 = Min date
Iteration 2: $(vMinDate) + 2 - 1 = $(vMinDate) + 1 = Min date + 1 day
Iteration 3: $(vMinDate) + 3 - 1 = $(vMinDate) + 2 = Min date + 2 days
...
Iteration N: $(vMinDate) + N - 1 = Max date (stops here)

Example: If MinDate = 2024-01-15 and MaxDate = 2024-02-14:

Result: TempCalendar with 31 rows, one per date
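
The AUTOGENERATE / IterNo() pattern is equivalent to this Python loop (illustrative sketch; the dates match the example above):

```python
from datetime import date, timedelta

def generate_dates(min_date, max_date):
    """One row per date from min_date through max_date inclusive,
    mirroring Date($(vMinDate) + IterNo() - 1) with the WHILE guard."""
    dates = []
    iter_no = 1                                  # IterNo() starts at 1
    while min_date + timedelta(days=iter_no - 1) <= max_date:
        dates.append(min_date + timedelta(days=iter_no - 1))
        iter_no += 1
    return dates
```

generate_dates(date(2024, 1, 15), date(2024, 2, 14)) yields the 31 dates of the example.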

Step 8.4: Add Calendar Attributes

Create full calendar from dates:

MasterCalendar:
LOAD
    TempDate as CalendarDate,
    Week(TempDate) as Week,
    Year(TempDate) as Year,
    Month(TempDate) as Month,
    Day(TempDate) as Day,
    Ceil(Month(TempDate)/3) as Quarter,
    WeekDay(TempDate) as WeekDay,
    WeekName(TempDate) as WeekName,
    MonthName(TempDate) as MonthName,
    'Q' & Ceil(Month(TempDate)/3) as QuarterName,
    Year(TempDate) & '-Q' & Ceil(Month(TempDate)/3) as YearQuarter,
    
    If(Month(TempDate) >= 4, 
       Year(TempDate), 
       Year(TempDate) - 1) as FiscalYear,
    
    If(WeekDay(TempDate) >= 0 and WeekDay(TempDate) <= 4, 1, 0) as IsBusinessDay,
    
    If(TempDate = Today(), 1, 0) as IsToday,
    If(MonthName(TempDate) = MonthName(Today()), 1, 0) as IsCurrentMonth,
    If(Year(TempDate) = Year(Today()), 1, 0) as IsCurrentYear
    
RESIDENT TempCalendar;

DROP TABLE TempCalendar;
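
The derived attributes can be cross-checked with a small Python sketch (illustrative only; note Python's date.weekday() happens to use the same 0 = Monday numbering the script relies on):

```python
import math
from datetime import date

def calendar_row(d, fiscal_start_month=4):
    """Derive the key MasterCalendar attributes for a single date."""
    quarter = math.ceil(d.month / 3)             # Ceil(Month(TempDate)/3)
    return {
        "Year": d.year,
        "Month": d.month,
        "Quarter": quarter,
        "QuarterName": f"Q{quarter}",
        "YearQuarter": f"{d.year}-Q{quarter}",
        # April-December -> current year; January-March -> previous year
        "FiscalYear": d.year if d.month >= fiscal_start_month else d.year - 1,
        # date.weekday(): 0 = Monday ... 6 = Sunday
        "IsBusinessDay": 1 if d.weekday() <= 4 else 0,
    }
```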

Attribute Explanations:

CalendarDate: the key field — one row per date, linked to the fact table's order date.

Week(TempDate): week number within the year (ISO 8601 by default).

Year(TempDate): four-digit year, e.g. 2024.

Month(TempDate): month number 1–12 (a dual value that displays as the month name).

Day(TempDate): day of month, 1–31.

Ceil(Month(TempDate)/3) as Quarter: months 1–3 β†’ 1, 4–6 β†’ 2, 7–9 β†’ 3, 10–12 β†’ 4.

WeekDay(TempDate): day of week; with default settings 0 = Monday through 6 = Sunday.

WeekName(TempDate): year and week number, e.g. 2024/03.

MonthName(TempDate): month and year, e.g. Jan 2024.

'Q' & Ceil(Month(TempDate)/3) as QuarterName: string label Q1–Q4.

Year(TempDate) & '-Q' & Ceil(Month(TempDate)/3) as YearQuarter: combined label such as 2024-Q1, handy for sorting across years.

FiscalYear Calculation (the fiscal year starts April 1: April through December belong to the current calendar year's fiscal year, January through March to the previous one):

If(Month(TempDate) >= 4, 
   Year(TempDate), 
   Year(TempDate) - 1) as FiscalYear

IsBusinessDay (1 for Monday–Friday, 0 for weekends, given WeekDay's 0 = Monday numbering):

If(WeekDay(TempDate) >= 0 and WeekDay(TempDate) <= 4, 1, 0)

IsToday (flags exactly one row — useful for "as of today" KPIs):

If(TempDate = Today(), 1, 0)

IsCurrentMonth (MonthName includes the year, e.g. "Jan 2024", so this only flags the current month of the current year):

If(MonthName(TempDate) = MonthName(Today()), 1, 0)

IsCurrentYear:

If(Year(TempDate) = Year(Today()), 1, 0)

Step 8.5: Link Calendar to Sales

Add sales indicator:

LEFT JOIN (MasterCalendar)
LOAD DISTINCT
    OrderDate as CalendarDate,
    1 as HasSales
RESIDENT FACT_Sales_Final;

What this does: flags each calendar date that appears in the fact table.

DISTINCT keyword: collapses multiple orders on the same date to a single row, so the join adds at most one HasSales value per date.

Why add HasSales? It lets dashboards distinguish "no sales that day" (HasSales is null) from "date missing from the calendar", and makes coverage checks trivial.

Result: MasterCalendar keeps one row per date; dates with at least one order carry HasSales = 1.
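
The effect of the DISTINCT join can be sketched in Python with a set (illustrative; None stands in for Qlik's null on dates without sales):

```python
from datetime import date

def flag_sales_dates(calendar_dates, order_dates):
    """LEFT JOIN of a distinct-date flag onto the calendar: every calendar
    row survives, and HasSales = 1 only where at least one order exists."""
    dates_with_sales = set(order_dates)          # LOAD DISTINCT OrderDate
    return [{"CalendarDate": d,
             "HasSales": 1 if d in dates_with_sales else None}
            for d in calendar_dates]
```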

Key Techniques Introduced

Technique 1: Min/Max Aggregate Functions

Min(Field) as MinValue
Max(Field) as MaxValue

Technique 2: Peek() Function

LET vVariable = Peek('FieldName', RowNumber, 'TableName');

Technique 3: AUTOGENERATE with WHILE

LOAD
    Expression
AUTOGENERATE 1
WHILE condition;

Technique 4: IterNo() Function

Technique 5: Date Arithmetic

Date + Number = Date plus that many days

Technique 6: String Concatenation

'Text' & Field & 'MoreText' as CombinedField

Technique 7: LOAD DISTINCT

LOAD DISTINCT Field1, Field2

Technique 8: Date Functions

Common Pitfalls

❌ Pitfall 1: Off-by-one in date generation

Date($(vMinDate) + IterNo()) as TempDate  ← Wrong!
Should be: $(vMinDate) + IterNo() - 1

❌ Pitfall 2: Wrong fiscal year logic

If(Month(TempDate) > 4, ...)  ← Should be >=

❌ Pitfall 3: Confusing WeekDay numbering

If(WeekDay(TempDate) >= 1 and WeekDay(TempDate) <= 5, ...)  ← Wrong!

❌ Pitfall 4: Not using Date() function

$(vMinDate) + IterNo() - 1 as TempDate  ← Might be number

❌ Pitfall 5: Forgetting DROP TABLE TempCalendar

← Missing DROP TABLE TempCalendar

❌ Pitfall 6: Calendar doesn't cover all sales dates

← Used fixed date range instead of Min/Max

Validation Checklist

βœ… Script runs without errors
βœ… Variables vMinDate and vMaxDate set correctly
βœ… TempCalendar created with correct number of rows
βœ… MasterCalendar has attributes:

Manual verification:


πŸ’Ύ Phase 9: Data Persistence (Saving to QVD)

Concept: The Three-Layer Architecture

Layer 1: Source (Raw Data)

Layer 2: Transformed (Business Logic Applied)

Layer 3: Visualization (Optimized for Dashboards)

Why Save to QVD?

Reason 1: Speed — optimized QVD loads are an order of magnitude faster than reading CSVs or databases.

Reason 2: Incremental Loading Support — the stored QVD is the baseline the next incremental run appends to.

Reason 3: Reusability — other apps can load the same transformed QVDs without repeating the ETL.

Reason 4: Backup/Audit — the files are a persisted snapshot of each load.

Reason 5: Development Efficiency — downstream script tabs can be developed against fast local QVDs instead of re-extracting from source.

Design Decisions

Decision 1: Save Multiple Versions of Fact Table

Decision 2: Drop Tables After Saving

Decision 3: Save SCD and Calendar

Decision 4: Organized Folder Structure

DataFiles/                 ← Source (never modify)
DataFiles/Transformed/     ← Our transformed data

Implementation Steps

Step 9.1: Save Fact Tables

Create tab section: Save_Transformed_Data

Save for incremental loading:

STORE FACT_Sales_Final INTO [$(vTargetPath)FACT_Sales_Transformed.qvd] (qvd);

Save for visualization:

STORE FACT_Sales_Final INTO [$(vTargetPath)FACT_Sales_Complete.qvd] (qvd);

Why save twice? FACT_Sales_Transformed is the baseline for the next incremental load; FACT_Sales_Complete is what the visualization layer reads. Keeping them separate lets either evolve independently.

Step 9.2: Save Dimension Tables

Products:

STORE DIM_Products_Transformed INTO [$(vTargetPath)DIM_Products_Transformed.qvd] (qvd);

Customers:

STORE DIM_Customers_Transformed INTO [$(vTargetPath)DIM_Customers_Transformed.qvd] (qvd);

Stores:

STORE DIM_Stores_Raw INTO [$(vTargetPath)DIM_Stores_Transformed.qvd] (qvd);

Note: DIM_Stores_Raw (not transformed)

Step 9.3: Save SCD Table

STORE DIM_Customers_SCD_New INTO [$(vTargetPath)DIM_Customers_SCD.qvd] (qvd);

Critical: This is the historical tracking table

Step 9.4: Save Master Calendar

STORE MasterCalendar INTO [$(vTargetPath)MasterCalendar.qvd] (qvd);

Why save? The calendar is rebuilt by the ETL each run, and saving it lets the visualization phase (and other apps) load it straight from QVD.

Step 9.5: Drop Tables to Prepare for Visualization

DROP TABLE DIM_Stores_Raw;
DROP TABLE FACT_Sales_Final;
DROP TABLE DIM_Customers_SCD_New;
DROP TABLE MasterCalendar;

Why drop after saving? The visualization phase reloads everything from QVD with clean table and field names; dropping first guarantees no leftover ETL tables create accidental associations or synthetic keys.

Current state in memory: EMPTY

Understanding QVD File Benefits

Performance Comparison:

Loading from:
CSV file:        1000 rows/second
Database:        5000 rows/second  
QVD (optimized): 100,000+ rows/second

Optimized vs Non-Optimized Load:

Optimized Load (fastest):

LOAD * FROM file.qvd (qvd);

Non-Optimized Load (still fast, but slower):

LOAD 
    Field1,
    Field2,
    Field3 * 2 as DoubleField3  ← Transformation
FROM file.qvd (qvd);

Any transformation, renaming aside, breaks the optimized path because each row must be unpacked (a WHERE clause using only Exists() is the notable exception that stays optimized).

Key Techniques Introduced

Technique 1: STORE Statement

STORE TableName INTO [lib://Connection/path/filename.qvd] (qvd);

Technique 2: Multiple STORE of Same Table

STORE Table1 INTO [path/fileA.qvd] (qvd);
STORE Table1 INTO [path/fileB.qvd] (qvd);

Technique 3: STORE then DROP Pattern

STORE TableName INTO [path/file.qvd] (qvd);
DROP TABLE TableName;

Technique 4: File Path with Variables

[$(vTargetPath)filename.qvd]

Common Pitfalls

❌ Pitfall 1: DROP before STORE

DROP TABLE MyTable;
STORE MyTable INTO ...  ← ERROR: Table doesn't exist!

❌ Pitfall 2: Forgetting to save before dropping

DROP TABLE DIM_Products_Transformed;  ← Lost the data!

❌ Pitfall 3: Wrong file extension

STORE Table INTO [path/file.txt] (qvd);

❌ Pitfall 4: Forgetting (qvd) format specifier

STORE Table INTO [path/file.qvd];  ← No format specified

❌ Pitfall 5: Using same filename for different content

STORE DIM_Products INTO [path/Dimensions.qvd] (qvd);
STORE DIM_Customers INTO [path/Dimensions.qvd] (qvd);  ← Overwrites!

❌ Pitfall 6: Saving to wrong folder

STORE Table INTO [$(vSourcePath)file.qvd] (qvd);  ← Wrong folder!
Should be: [$(vTargetPath)file.qvd]

Validation Checklist

βœ… Script runs without errors
βœ… All tables saved successfully
βœ… Check Windows Explorer - files exist in Transformed folder:

File size expectations:

Test QVD files: Create a new script tab:

TEST:
LOAD * FROM [$(vTargetPath)FACT_Sales_Complete.qvd] (qvd);

Run β†’ Should load successfully β†’ Confirms QVD is valid


🎨 Phase 10: Visualization Data Model

Concept: Optimizing for Analysis

ETL is complete, but the tables in memory carry ETL-oriented names and structures. The visualization layer reloads from QVD into a model shaped for dashboard building.

Why a Separate Loading Phase?

Reason 1: Different Requirements — ETL optimizes for processing; dashboards optimize for query speed and readable field names.

Reason 2: Field Naming — business-friendly, prefixed names prevent synthetic keys and make expressions clear.

Reason 3: Selective Loading — load only the fields the dashboards actually use.

Reason 4: Multiple Apps — several dashboard apps can load from the same transformed QVDs.

Star Schema Design

What is Star Schema?

   Dimension                          Dimension
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Products   β”‚                    β”‚  Customers  β”‚
β”‚ ProductID◄──┼──────┐      β”Œβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β–ΊCustomerIDβ”‚
β”‚ Name        β”‚      β”‚      β”‚      β”‚ Name        β”‚
β”‚ Category    β”‚      β”‚      β”‚      β”‚ Segment     β”‚
β”‚ Brand       β”‚      β”‚      β”‚      β”‚ Country     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β”‚      β”‚      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                 β”Œβ”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”
                 β”‚    Sales    β”‚  ← Fact (Center)
                 β”‚ OrderID     β”‚
                 β”‚ ProductID   β”‚
                 β”‚ CustomerID  β”‚
                 β”‚ Revenue     β”‚
                 β”‚ Profit      β”‚
                 β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
                        β”‚
                 β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”
                 β”‚   Calendar   β”‚  ← Dimension
                 β”‚ CalendarDate β”‚
                 β”‚ Year         β”‚
                 β”‚ Quarter      β”‚
                 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Components:

  1. Fact Table (center): Sales transactions with metrics
  2. Dimension Tables (points): Descriptive attributes
  3. Relationships: Via key fields (ProductID, CustomerID, CalendarDate)

Benefits: intuitive for users, fast for aggregations, and easy to extend with new dimensions.

Design Decisions

Decision 1: Qualify Dimension Fields

Decision 2: Keep SCD Table Separate

Decision 3: Fully Qualify SCD Fields

Decision 4: Load Dimensions with Specific Fields

Implementation Steps

Step 10.1: Load Sales Fact Table

Create tab section: Load_Data_for_Viz

What to load:

Sales:
LOAD * FROM [$(vTargetPath)FACT_Sales_Complete.qvd] (qvd);

Table name: Sales (not FACT_Sales_Final)

Fields included: everything stored in FACT_Sales_Complete.qvd — keys, metrics, and classification fields.

This table is the center of the star schema.

Step 10.2: Load Products Dimension

With qualified fields:

Products:
LOAD 
    ProductID,
    ProductName as Dim_ProductName,
    Category as Dim_Category,
    SubCategory as Dim_SubCategory,
    Brand as Dim_Brand,
    UnitCost as Dim_UnitCost,
    ListPrice as Dim_ListPrice,
    Status,
    Product_LastModifiedDate,
    ProfitPerUnit as Dim_ProfitPerUnit,
    ProfitMarginPercent,
    PriceTier as Dim_PriceTier,
    IsActive
FROM [$(vTargetPath)DIM_Products_Transformed.qvd] (qvd);

Key points: ProductID keeps its original name (it is the join key); every descriptive field gets a Dim_ prefix.

Why rename? Fields with identical names in two tables would associate (or form synthetic keys); prefixing guarantees that only ProductID links.

Relationship: Products.ProductID ↔ Sales.ProductID.

Step 10.3: Load Customers Dimension

With qualified fields:

Customers:
LOAD 
    CustomerID,
    CustomerName as Dim_CustomerName,
    Country as Dim_Country,
    Region as Dim_Region,
    City as Dim_City,
    Segment as Dim_Segment,
    RegistrationDate,
    Customer_LastModifiedDate,
    CustomerTenureYears as Dim_CustomerTenureYears,
    LoyaltyTier as Dim_LoyaltyTier,
    MarketType as Dim_MarketType
FROM [$(vTargetPath)DIM_Customers_Transformed.qvd] (qvd);

Same pattern: CustomerID stays unprefixed as the key; descriptive fields get the Dim_ prefix.

Relationship: Customers.CustomerID ↔ Sales.CustomerID.

Step 10.4: Load Stores Dimension

Minimal changes needed:

Stores:
LOAD 
    StoreID,
    StoreName,
    Store_Country,
    Store_Region,
    Store_City,
    StoreType,
    OpeningDate,
    Store_LastModifiedDate
FROM [$(vTargetPath)DIM_Stores_Transformed.qvd] (qvd);

Why minimal changes? The store fields were already prefixed (Store_Country, Store_Region, ...) during extraction, so no further renaming is needed.

Usage: store attributes for store-level views; in the final data model this table stands alone, since it shares no field name with Sales.

Step 10.5: Load Master Calendar

Load all fields:

Calendar:
LOAD * FROM [$(vTargetPath)MasterCalendar.qvd] (qvd);

Why LOAD * here? Every calendar attribute is designed for analysis, and the field names are already unique to this table.

Relationship: the calendar links to Sales on the date key. Remember that Qlik associates on identical field names, so the date field must carry the same name in both tables (e.g. load CalendarDate as OrderDate if the fact table uses OrderDate).

Usage in dashboards: filter panes on Year/Quarter/Month, trend charts over the date, and flags like IsCurrentMonth in set analysis.

Step 10.6: Load Customer History (SCD)

Fully qualified to prevent linking:

Customer_History:
LOAD 
    CustomerID as History_CustomerID,
    CustomerName as History_CustomerName,
    Segment as History_Segment,
    Country as History_Country,
    Region as History_Region,
    City as History_City,
    LoyaltyTier as History_LoyaltyTier,
    EffectiveStartDate as History_StartDate,
    EffectiveEndDate as History_EndDate,
    IsCurrent as History_IsCurrent,
    VersionNumber as History_Version
FROM [$(vTargetPath)DIM_Customers_SCD.qvd] (qvd);

Critical: ALL fields are prefixed — including the key — so this table shares no field names with the rest of the model.

Why separate? The history table holds multiple rows per customer; linking it to Sales would multiply fact rows and distort totals.

Usage: dedicated customer-history sheets and audit views, queried on History_CustomerID.

Understanding the Final Data Model

Tables in memory:

  1. Sales (20 rows) - Fact table
  2. Products (10 rows) - Dimension
  3. Customers (10 rows) - Dimension
  4. Stores (5 rows) - Dimension
  5. Calendar (31 rows) - Dimension
  6. Customer_History (10 rows) - Standalone

Relationships:

Products.ProductID = Sales.ProductID
Customers.CustomerID = Sales.CustomerID
Calendar.CalendarDate = Sales.OrderDate

Standalone: Stores and Customer_History share no field names with the other tables, so they stay unlinked by design.

Field naming in model: unprefixed = key fields; Dim_ = dimension attributes; History_ = SCD fields; Store_ = store attributes.

Key Techniques Introduced

Technique 1: Selective Field Loading

LOAD 
    Field1,
    Field2,
    Field3 as RenamedField3
FROM file.qvd (qvd);

Technique 2: Consistent Prefixing

Dim_FieldName    ← Dimension fields
History_FieldName ← SCD fields
Store_FieldName   ← Store-specific fields

Technique 3: Strategic Key Field Handling

ProductID,           ← Never rename key fields
ProductName as Dim_ProductName  ← Rename descriptive fields

Technique 4: Intentional Relationship Breaking

CustomerID as History_CustomerID  ← Breaks link
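
Qlik links tables on every identically named field, so the association can be modeled as a set intersection. A Python sketch (field sets abbreviated from the steps above) shows why the full History_ prefix leaves Customer_History standalone:

```python
def association_keys(fields_a, fields_b):
    """Qlik associates two tables on the field names they share."""
    return set(fields_a) & set(fields_b)

# Abbreviated field lists from the visualization model
sales = {"OrderID", "ProductID", "CustomerID", "NetRevenue"}
customers = {"CustomerID", "Dim_CustomerName", "Dim_Segment"}
history = {"History_CustomerID", "History_Segment", "History_Version"}
```

association_keys(sales, customers) yields {'CustomerID'} — one clean link — while association_keys(sales, history) is empty, so the history table stays deliberately unlinked.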

Common Pitfalls

❌ Pitfall 1: Not renaming duplicate fields

LOAD 
    ProductID,
    ProductName,  ← Also in Sales!
    Category      ← Synthetic key created
FROM file.qvd (qvd);

❌ Pitfall 2: Renaming key fields unintentionally

LOAD 
    ProductID as Dim_ProductID,  ← WRONG!
    ...

❌ Pitfall 3: Inconsistent prefixes

Dim_ProductName
Prd_Category  ← Different prefix!
Product_Brand ← Yet another prefix!

❌ Pitfall 4: Loading unnecessary fields

LOAD * FROM file.qvd (qvd);  ← Loads everything

❌ Pitfall 5: Forgetting to load a table

← Missing: LOAD Calendar

❌ Pitfall 6: Wrong table names in visualization layer

FACT_Sales_Final:  ← ETL naming
LOAD ...

Validation Checklist

βœ… Script runs without errors
βœ… All 6 tables loaded successfully
βœ… Data Model Viewer shows:

Visual validation in Data Model Viewer:

Expected structure:

        Products
           β”‚ (ProductID)
      β”Œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”
      β”‚  Sales  │──── (OrderDate = CalendarDate) ──── Calendar
      β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜
           β”‚ (CustomerID)
        Customers

      Stores            Customer_History
      (standalone)      (standalone)

Test with filter:

  1. Create a filter pane
  2. Add Dim_Category
  3. Select "Smartphones"
  4. Sales table should filter to smartphone orders only
  5. Confirms relationship works

βœ… Testing & Validation

Complete ETL Validation

End-to-End Test:

  1. Source Files: Check all 5 QVDs exist in DataFiles folder
  2. Transformed Files: Check all 7 QVDs exist in Transformed folder
  3. Final Data Model: Open Data Model Viewer
  4. Synthetic Keys: Verify count is 0
  5. Row Counts: Verify all tables have expected rows
  6. Relationships: Verify all connections are correct
  7. Calculations: Spot-check metrics are accurate
  8. SCD: Verify version tracking works
  9. Incremental: Verify new records append correctly
  10. Calendar: Verify date attributes are correct

Validation Queries

Test 1: Revenue Calculation

Pick order ORD001:
- Quantity: 2
- ListPrice: 1299
- Discount: 0.05
- Expected GrossRevenue: 2 Γ— 1299 = 2598
- Expected NetRevenue: 2598 Γ— 0.95 = 2468.10
- Verify these values in Sales table
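
The arithmetic of Test 1 as executable assertions (a sketch assuming NetRevenue = GrossRevenue Γ— (1 βˆ’ Discount), as the expected values imply):

```python
quantity, list_price, discount = 2, 1299, 0.05

gross_revenue = quantity * list_price                   # 2 x 1299
net_revenue = round(gross_revenue * (1 - discount), 2)  # 2598 x 0.95
```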

Test 2: Join Integrity

Pick order ORD001:
- ProductID: P001
- Should have: ProductName = "iPhone 14 Pro"
- Should have: Category = "Electronics"
- CustomerID: C001
- Should have: CustomerName = "John Smith"
- Should have: Segment = "Corporate"

Test 3: Time Intelligence

Pick order on 2024-01-15:
- OrderYear should be: 2024
- OrderMonth should be: 1
- OrderQuarter should be: 1
- Match to Calendar table:
  - Year: 2024
  - Month: 1
  - Quarter: 1

Test 4: Business Classifications

Find order with >$1000 net revenue:
- OrderValueCategory should be: "High Value"
Find order with >10% discount:
- DiscountCategory should be: "Heavy Discount"

Test 5: Incremental Load

Initial load: 15 orders (ORD001-ORD015)
After incremental: 20 orders (ORD001-ORD020)
Check: No duplicates, all have ProcessedFlag = 1
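
Test 5's expectation, sketched in Python (illustrative; it assumes the incremental phase appends only OrderIDs not already loaded):

```python
def incremental_append(existing_orders, feed_orders):
    """Append only unseen OrderIDs, so a rerun never creates duplicates."""
    seen = {o["OrderID"] for o in existing_orders}
    combined = list(existing_orders)
    for order in feed_orders:
        if order["OrderID"] not in seen:
            combined.append(dict(order, ProcessedFlag=1))
            seen.add(order["OrderID"])
    return combined
```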

Test 6: SCD Versioning

Initial load:
- 10 customers
- All VersionNumber = 1
- All IsCurrent = 1

After change (simulate):
- Edit source: Change C001 Segment to "Consumer"
- Reload
- Check C001 has 2 versions:
  - Ver 1: Corporate, IsCurrent=0, EndDate=Yesterday
  - Ver 2: Consumer, IsCurrent=1, EndDate=9999-12-31

Performance Validation

Load Time Targets:

Memory Usage:

Data Quality Checks

Completeness:

No missing keys — every order row finds a product, a customer, and a calendar date.

Accuracy:

Spot-check calculated metrics against manual arithmetic (see Test 1).

Consistency:

Classification fields (LoyaltyTier, PriceTier, OrderValueCategory) follow the documented rules on every row.

Common Issues and Solutions

Issue 1: Synthetic Keys — two tables share more than one field name. Rename the non-key duplicates (e.g. with a Dim_ prefix).

Issue 2: Wrong Row Counts — usually an unintended auto-concatenation or a join that multiplied rows. Add NoConcatenate and check counts after each step.

Issue 3: NULL Values in Calculations — a fact key with no matching dimension row, or arithmetic on a null field. Verify join keys and supply defaults where needed.

Issue 4: Dates Not Sorting — dates loaded as text. Interpret with Date#() and format with Date().

Issue 5: Incremental Loading All Data — the WHERE filter on new records is missing or its variable doesn't expand. Check the $(variable) expansion in the incremental tab.


πŸŽ“ Key Learnings Summary

ETL Best Practices Learned

  1. Separate Concerns: Extract, Transform, Load are distinct phases
  2. Use Variables: Make code maintainable and flexible
  3. Save Progressively: Store to QVD after each major transformation
  4. Name Meaningfully: Clear naming prevents confusion and synthetic keys
  5. Document Business Logic: Comments explain WHY, not just WHAT
  6. Test Incrementally: Validate each phase before proceeding
  7. Think in Layers: Source β†’ Transformed β†’ Visualization
  8. Optimize for Purpose: ETL different from visualization needs

Qlik-Specific Techniques Mastered

  1. Variable Declaration: SET vs LET
  2. Data Loading: LOAD from QVD, CSV, INLINE, RESIDENT
  3. Transformations: Calculated fields, IF statements, functions
  4. Joins: LEFT JOIN, understanding join keys
  5. Concatenation: Auto vs explicit, NoConcatenate
  6. Data Generation: AUTOGENERATE with WHILE
  7. Control Flow: IF-THEN-ELSE, WHERE filtering
  8. File Operations: STORE to QVD, DROP TABLE
  9. Functions: Date, string, math, aggregate
  10. Data Modeling: Star schema, relationship management

Business Intelligence Skills Developed

  1. Requirements Translation: Business ask β†’ Technical specification
  2. Metric Definition: Revenue, profit, margin calculations
  3. Data Quality: Validation, completeness checks
  4. Historical Tracking: SCD Type 2 implementation
  5. Time Intelligence: Calendar creation, fiscal year logic
  6. Performance Optimization: QVD usage, incremental loading
  7. Documentation: Clear comments, logical structure
  8. Problem Solving: Debugging, troubleshooting

πŸš€ Next Steps

You're Ready For:

  1. Dashboard Creation: Build visualizations using the clean data model
  2. Advanced Analytics: Implement complex KPIs and calculations
  3. Production Deployment: Move solution to server environment
  4. Maintenance: Handle ongoing data updates and changes
  5. Expansion: Add new data sources and dimensions
  6. Optimization: Fine-tune performance for larger datasets
  7. Training Others: Teach colleagues these techniques

Recommended Follow-Up:

  1. Build Dashboards: Create the visualizations mentioned in business requirements
  2. Test SCD: Simulate customer changes, verify version tracking
  3. Optimize: Profile performance, identify bottlenecks
  4. Scale Up: Test with larger datasets (1000s, 10000s of rows)
  5. Add Features: Implement additional business rules
  6. Document: Create user guide for dashboard consumers

πŸ“š Appendix: Quick Reference

Command Quick Reference

Load Data:

LOAD Field1, Field2 FROM [file.qvd] (qvd);
LOAD Field1, Field2 RESIDENT TableName;
LOAD * INLINE [ ... ];

Transform:

Field1 as NewName
Field1 * Field2 as Calculated
If(condition, true_result, false_result) as ConditionalField

Join:

LEFT JOIN (TargetTable) LOAD ... RESIDENT SourceTable;

Concatenate:

CONCATENATE (TargetTable) LOAD ... RESIDENT SourceTable;
NoConcatenate TableName: LOAD ...

Save/Drop:

STORE TableName INTO [file.qvd] (qvd);
DROP TABLE TableName;

Control:

IF condition THEN ... ELSE ... ENDIF
WHERE condition

Function Quick Reference

Date:

Today(), Date(), Year(), Month(), Day(), Week(), WeekDay(), WeekName(), MonthName()

Math:

Ceil()

String:

& (concatenation)

Aggregate:

Min(), Max()

Special:

Peek(), FileSize(), NoOfRows(), IterNo()