πŸ“‹ Table of Contents

  1. Overview
  2. Architecture Design
  3. Phase 1: Foundation Setup
  4. Phase 2: Data Extraction
  5. Phase 3: Dimension Transformation
  6. Phase 4: Data Integration
  7. Phase 5: Business Logic Application
  8. Phase 6: Incremental Load
  9. Phase 7: SCD Type 2
  10. Phase 8: Master Calendar
  11. Phase 9: Data Persistence
  12. Phase 10: Visualization Model
  13. Testing & Validation

🎯 Overview

Purpose of This Document

This technical implementation guide provides detailed explanations of HOW to build the GlobalTech Electronics data warehouse solution. It bridges the gap between business requirements (covered in the Scenario document) and actual code implementation (covered in the Script Reference document).

Document Structure

Each phase includes:


πŸ—οΈ Architecture Design

Data Flow Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                       SOURCE SYSTEMS                        β”‚
β”‚  (Product DB)  (CRM)  (POS)  (Store DB)  (Daily POS Feed)   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      EXTRACTION LAYER                       β”‚
β”‚                (Load QVD files from source)                 β”‚
β”‚                                                             β”‚
β”‚   DIM_Products    DIM_Customers    FACT_Sales               β”‚
β”‚   DIM_Stores      FACT_Sales_Incremental                    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    TRANSFORMATION LAYER                     β”‚
β”‚                                                             β”‚
β”‚   β€’ Calculate profit margins                                β”‚
β”‚   β€’ Classify customers by loyalty                           β”‚
β”‚   β€’ Join dimensions with facts                              β”‚
β”‚   β€’ Apply business rules                                    β”‚
β”‚   β€’ Add calculated metrics                                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                       LOADING LAYER                         β”‚
β”‚                                                             β”‚
β”‚   β€’ Incremental load (new records only)                     β”‚
β”‚   β€’ SCD Type 2 (track changes)                              β”‚
β”‚   β€’ Master Calendar generation                              β”‚
β”‚   β€’ Save to transformed QVDs                                β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     PRESENTATION LAYER                      β”‚
β”‚                                                             β”‚
β”‚   β€’ Load from transformed QVDs                              β”‚
β”‚   β€’ Qualify fields to avoid synthetic keys                  β”‚
β”‚   β€’ Create star schema                                      β”‚
β”‚   β€’ Ready for visualization                                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β–Ό
                  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                  β”‚   DASHBOARDS & APPS    β”‚
                  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Folder Structure

C:\QlikTraining\
β”‚
β”œβ”€β”€ DataFiles\                          (Source data)
β”‚   β”œβ”€β”€ DIM_Products.qvd
β”‚   β”œβ”€β”€ DIM_Customers.qvd
β”‚   β”œβ”€β”€ FACT_Sales.qvd
β”‚   β”œβ”€β”€ DIM_Stores.qvd
β”‚   └── FACT_Sales_Incremental.qvd
β”‚
└── DataFiles\Transformed\              (Transformed data)
    β”œβ”€β”€ DIM_Products_Transformed.qvd
    β”œβ”€β”€ DIM_Customers_Transformed.qvd
    β”œβ”€β”€ DIM_Stores_Transformed.qvd
    β”œβ”€β”€ FACT_Sales_Transformed.qvd
    β”œβ”€β”€ FACT_Sales_Complete.qvd
    β”œβ”€β”€ DIM_Customers_SCD.qvd
    └── MasterCalendar.qvd

Script Organization

The complete ETL script is organized into 15 logical sections (tabs):

  1. Main - Configuration and documentation
  2. Create_Source_Products - Generate product source file
  3. Create_Source_Customers - Generate customer source file
  4. Create_Source_Sales - Generate sales source file
  5. Create_Source_Stores - Generate store source file
  6. Create_Source_Incremental - Generate incremental source file
  7. Data_Load_Script - Variables and configuration
  8. Extract_Data_E - Extract and join with products
  9. Transformed_Data_T2 - Join with customers
  10. Transformed_Data_T1 - Apply business logic
  11. Incremental_Load - Handle daily updates
  12. SCD_Type2 - Track historical changes
  13. Master_Calendar - Generate date dimension
  14. Save_Transformed_Data - Store to QVD
  15. Load_Data_for_Viz - Final data model

Phase 1: Foundation Setup

Concept: ETL Foundation

Before loading any data, you need to establish the foundation:

Why This Matters

Without proper foundation:

Design Decisions

Decision 1: Use Variables for Paths

Decision 2: Use Variables for Business Rules

Decision 3: Separate Source Creation

Implementation Steps

Step 1.1: Create Physical Folder Structure

What to do:

  1. Open Windows Explorer
  2. Create main folder: C:\QlikTraining\DataFiles
  3. Create subfolder: C:\QlikTraining\DataFiles\Transformed

Why two folders: Keeping raw source QVDs separate from transformed output mirrors the layered architecture and makes it obvious which files are originals and which can be safely regenerated.

Step 1.2: Create Qlik Sense Application

What to do:

  1. Open Qlik Sense Desktop
  2. Create new app: "GlobalTech_ETL_Training"
  3. Open Data Load Editor

Navigation tip: Data Load Editor is where ALL ETL code lives

Step 1.3: Create Data Connection

What to do:

  1. In Data Load Editor β†’ Right panel β†’ Data connections
  2. Create new connection:
    • Type: Folder
    • Name: DataFiles
    • Path: C:\QlikTraining\DataFiles

What this does: Tells Qlik where to find files using a logical name (lib://DataFiles/) instead of physical path

Step 1.4: Generate Source QVD Files

Concept: Source Data Generation

In real projects, source data comes from databases, APIs, or file systems. For training, we'll create inline data and save it as QVD files to simulate real sources.

What is a QVD file? A QVD (QlikView Data) file is Qlik's proprietary, single-table binary format. Because the data is already stored in Qlik's internal representation, reloading from a QVD is dramatically faster than reading from a text file or database.

QVD Benefits:

Create 5 Source Files:

Each source file needs its own script tab:

  1. Create tab: Create_Source_Products
  2. Create tab: Create_Source_Customers
  3. Create tab: Create_Source_Sales
  4. Create tab: Create_Source_Stores
  5. Create tab: Create_Source_Incremental

In each tab:

Run the load: This creates all 5 QVD files in your DataFiles folder

Key Techniques Introduced

Technique 1: INLINE Data Loading

LOAD * INLINE [
Field1, Field2, Field3
Value1, Value2, Value3
];

Technique 2: STORE Statement

STORE TableName INTO [lib://DataFiles/filename.qvd] (qvd);

Technique 3: DROP TABLE

DROP TABLE TableName;

Common Pitfalls

❌ Pitfall 1: Forgetting to create folders before running script

❌ Pitfall 2: Wrong folder path in connection

❌ Pitfall 3: Not dropping tables after STORE

Validation Checklist

βœ… Folders exist: C:\QlikTraining\DataFiles and Transformed subfolder
βœ… Data connection "DataFiles" appears in connections list
βœ… Script runs without errors
βœ… Five QVD files created:


🎯 Phase 2: Data Extraction

Concept: The "E" in ETL

Extraction means reading data from source systems into Qlik's memory so we can work with it.

Key principle: Extract first, transform later

Why Extract to Raw Tables First?

Reason 1: Debugging

Reason 2: Auditability

Reason 3: Performance

Design Decisions

Decision 1: Naming Convention - "_Raw" Suffix

Decision 2: Load All Dimensions Before Facts

Decision 3: Date Formatting During Extraction

Implementation Steps

Step 2.1: Configure Variables

Create tab: Data_Load_Script

What to define:

Path Variables:

Business Rule Variables:

Date Variables:

Status Variables:

SET vs LET:
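The distinction can be sketched as follows. This is a minimal illustration: vSourcePath, vTargetPath, and vActiveStatus are the variable names used later in this guide, while vReloadStart and vCurrentYear are hypothetical examples added here.

// SET stores the right-hand side as literal text — nothing is evaluated
SET vSourcePath = lib://DataFiles/;
SET vTargetPath = lib://DataFiles/Transformed/;
SET vActiveStatus = Active;

// LET evaluates the expression first, then stores the result
LET vReloadStart = Now();          // stores the timestamp value itself
LET vCurrentYear = Year(Today());  // stores a number such as 2026

Rule of thumb: SET for paths and fixed text, LET for anything that must be computed.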

Step 2.2: Extract Dimension - Products

Create tab section: Under Extract_Data_E

What to load:

Fields to load:

Date formatting:

Date(LastModifiedDate, 'YYYY-MM-DD') as Product_LastModifiedDate

Step 2.3: Extract Dimension - Customers

What to load:

Fields to load:

Why geography fields matter:

Step 2.4: Extract Dimension - Stores

What to load:

Fields to load:

Why rename country/region/city? Stores and customers both carry geography fields. If both tables kept identical field names, Qlik would link them on every shared field and create a synthetic key (explained in Phase 4).

Step 2.5: Extract Fact - Sales

What to load:

Fields to load:

WHERE Clause for Filtering:

WHERE ProcessedFlag = 0

Key Techniques Introduced

Technique 1: LOAD FROM QVD

LOAD 
    Field1,
    Field2
FROM [lib://DataFiles/filename.qvd] (qvd);

Technique 2: Optimized Load
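An optimized load streams the QVD's internal format straight into memory. Any expression in the LOAD, or any WHERE clause other than a single-field Exists() test, falls back to a slower standard load. A sketch (file names follow this guide's folder structure):

// Optimized: fields passed straight through, no expressions
LOAD ProductID, ProductName
FROM [lib://DataFiles/DIM_Products.qvd] (qvd);

// Still optimized: WHERE Exists() on one field is the only filter that
// preserves the optimized path
LOAD OrderID, ProductID, Quantity
FROM [lib://DataFiles/FACT_Sales.qvd] (qvd)
WHERE Exists(ProductID);

// NOT optimized: the Date() expression forces a standard load
LOAD OrderID, Date(OrderDate, 'YYYY-MM-DD') as OrderDate
FROM [lib://DataFiles/FACT_Sales.qvd] (qvd);

The progress window tells you which path was taken: it shows "(qvd optimized)" for optimized loads.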

Technique 3: Date Formatting

Date(DateField, 'YYYY-MM-DD') as FormattedDate

Technique 4: Field Renaming

OriginalField as NewField

Technique 5: WHERE Clause Filtering

WHERE ProcessedFlag = 0

Common Pitfalls

❌ Pitfall 1: Forgetting (qvd) format specifier

❌ Pitfall 2: Loading all fields with * when you need to transform

LOAD * FROM file.qvd (qvd);  ← Can't rename fields

❌ Pitfall 3: Date fields loading as text

❌ Pitfall 4: Not renaming conflicting fields

Validation Checklist

βœ… Script runs without errors
βœ… Four tables visible in Data Model Viewer:


Phase 3: Dimension Transformation

Concept: Adding Business Value

Transformation is where we add business intelligence to raw data:

Raw data answers "what happened." Transformed data answers "what does it mean."

Why Transform Dimensions First?

Reason 1: Dimensions are stable

Reason 2: These attributes will be used in fact calculations

Reason 3: Keep transformations separate

Design Decisions

Decision 1: Create New "_Transformed" Tables

Decision 2: Use RESIDENT Loads

Decision 3: Document Business Rules in Comments

Implementation Steps

Step 3.1: Transform Products

Create tab section: Transformed_Data_T1 (or similar)

Objective: Calculate product profitability and classify by price tier

Table name: DIM_Products_Transformed

Source: RESIDENT DIM_Products_Raw

Calculations to add:

1. ProfitPerUnit

ListPrice - UnitCost as ProfitPerUnit

2. ProfitMarginPercent

Round(((ListPrice - UnitCost) / ListPrice) * 100, 2) as ProfitMarginPercent

3. PriceTier Classification

If(ListPrice >= 2000, 'Premium',
   If(ListPrice >= 1000, 'Mid-Range',
      If(ListPrice >= 500, 'Standard', 'Budget'))) as PriceTier

4. IsActive Flag

If(Status = '$(vActiveStatus)', 1, 0) as IsActive

Why keep original fields?

After transformation:

DROP TABLE DIM_Products_Raw;

Step 3.2: Transform Customers

Table name: DIM_Customers_Transformed

Source: RESIDENT DIM_Customers_Raw

Calculations to add:

1. CustomerTenureYears

Year(Today()) - Year(RegistrationDate) as CustomerTenureYears

2. LoyaltyTier

If(Year(Today()) - Year(RegistrationDate) >= 2, 'Gold',
   If(Year(Today()) - Year(RegistrationDate) >= 1, 'Silver', 'Bronze')) as LoyaltyTier

3. MarketType

If(Region = 'North America' or Region = 'Europe', 'Developed', 'Emerging') as MarketType

Why these specific calculations?

After transformation:

DROP TABLE DIM_Customers_Raw;

Key Techniques Introduced

Technique 1: RESIDENT Load

LOAD 
    *,
    NewCalculation
RESIDENT SourceTable;

Technique 2: Arithmetic Calculations

Field1 - Field2 as Difference
Field1 * Field2 as Product
Field1 / Field2 as Ratio

Technique 3: Nested IF Statements

If(condition1, result1,
   If(condition2, result2,
      If(condition3, result3, 
         else_result))) as NewField

Technique 4: Date Functions

Technique 5: Variable Usage in Expressions

If(Status = '$(vActiveStatus)', 1, 0)

Technique 6: Round Function

Round(expression, decimals) as RoundedValue

Common Pitfalls

❌ Pitfall 1: Forgetting to keep original fields

LOAD 
    NewCalculation as Field1  ← Only calculated field
RESIDENT SourceTable;

❌ Pitfall 2: Division by zero

Field1 / Field2 as Ratio  ← What if Field2 = 0?
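One defensive rewrite of the margin calculation from Step 3.1, shown two ways (a sketch; the fallback value 0 is a choice, not a requirement):

// Guard explicitly: only divide when ListPrice is a positive number
If(ListPrice > 0,
   Round(((ListPrice - UnitCost) / ListPrice) * 100, 2),
   0) as ProfitMarginPercent

// Or use Alt(): it returns its first numeric argument, so it also
// catches the NULL produced when the division fails
Alt(Round(((ListPrice - UnitCost) / ListPrice) * 100, 2), 0) as ProfitMarginPercent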

❌ Pitfall 3: Wrong date arithmetic

Today() - RegistrationDate  ← Returns days, not years

❌ Pitfall 4: Nested IF too deep

❌ Pitfall 5: Not using variables for business rules

If(Discount > 0.10, 'Heavy', 'Light')  ← Hardcoded

Validation Checklist

βœ… Script runs without errors
βœ… Two new tables in memory:


πŸ”— Phase 4: Data Integration (Joins)

Concept: Enriching Facts with Dimensions

The Star Schema Principle: Facts should contain keys and measures, dimensions should contain descriptions

Currently:

Solution: JOIN dimension tables to fact table

What is a JOIN?

A JOIN combines two tables based on matching values in a common field (the "key").

Types of Joins:

  1. INNER JOIN: Keep only matching records
  2. LEFT JOIN: Keep all records from left table, matching from right
  3. RIGHT JOIN: Keep all records from right table, matching from left
  4. OUTER JOIN: Keep all records from both tables

Why LEFT JOIN for This Scenario?

We use LEFT JOIN because:

Real-world example:
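A minimal inline illustration of why LEFT JOIN keeps every sale (the three orders and two products here are hypothetical):

Sales:
LOAD * INLINE [
OrderID, ProductID
S001, P001
S002, P002
S003, P999
];

LEFT JOIN (Sales)
LOAD * INLINE [
ProductID, ProductName
P001, Laptop
P002, Monitor
];

// All 3 sales rows survive. S003 keeps its row but gets a NULL
// ProductName because P999 has no match. An INNER JOIN would have
// silently dropped S003 — and its revenue — from the fact table.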

Design Decisions

Decision 1: Join to Facts, Not Create Separate Tables

Decision 2: Rename Fields During Join (Critical!)

Decision 3: Join Products First, Then Customers

Implementation Steps

Step 4.1: Join Sales with Products

Create tab section: Extract_Data_E (continues from extraction)

Objective: Add product details to every sales transaction

What to do:

LEFT JOIN (FACT_Sales_Raw)
LOAD
    ProductID,
    ProductName as Prod_ProductName,
    Category as Prod_Category,
    ... (all product fields, renamed with Prod_ prefix)
RESIDENT DIM_Products_Transformed;

How LEFT JOIN works:

  1. Qlik looks at each row in FACT_Sales_Raw
  2. Finds the ProductID value (e.g., "P001")
  3. Searches for matching ProductID in DIM_Products_Transformed
  4. If found β†’ Adds all product fields to that sales row
  5. If not found β†’ Adds NULLs
  6. Result: FACT_Sales_Raw now has product columns

Fields to add (all renamed with Prod_ prefix):

Why keep "ProductID" without prefix? It is the join key. Qlik matches on the shared field name, so renaming it would break the join entirely (see Pitfall 2 below).

What happens to FACT_Sales_Raw?

Step 4.2: Join Sales with Customers

Create tab section: Transformed_Data_T2

Objective: Add customer details to every sales transaction

What to do:

LEFT JOIN (FACT_Sales_Raw)
LOAD
    CustomerID,
    CustomerName as Cust_CustomerName,
    Country as Cust_Country,
    ... (all customer fields, renamed with Cust_ prefix)
RESIDENT DIM_Customers_Transformed;

Fields to add (all renamed with Cust_ prefix):

After this join: FACT_Sales_Raw now has:

Result: Fully enriched fact table ready for final calculations

Understanding Synthetic Keys (and Why We Avoid Them)

What is a Synthetic Key? When two or more tables share multiple field names (not just one key), Qlik automatically creates a synthetic key table.

Example of Problem:

Table 1: Sales (CustomerID, CustomerName, Country)
Table 2: Customers (CustomerID, CustomerName, Country)

Why Synthetic Keys are Bad:

  1. Performance: Extra table overhead, slower calculations
  2. Memory: Uses more RAM unnecessarily
  3. Confusion: Data model becomes unclear
  4. Calculations: Can produce wrong results in aggregations

How We Avoid Them:
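In this guide the fix is renaming: during every join, each non-key field gets a table prefix, so the intended key is the only field the two tables share. A sketch of the pattern:

LEFT JOIN (FACT_Sales_Raw)
LOAD
    CustomerID,                          // only shared field = the join key
    CustomerName as Cust_CustomerName,   // prefixed, so no accidental link
    Country      as Cust_Country
RESIDENT DIM_Customers_Transformed;

Because CustomerID is now the single common field, Qlik joins on it alone and never needs to manufacture a synthetic key.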

Key Techniques Introduced

Technique 1: LEFT JOIN Syntax

LEFT JOIN (TargetTable)
LOAD 
    KeyField,
    OtherFields
RESIDENT SourceTable;

Technique 2: Field Prefixing

FieldName as Prefix_FieldName

Technique 3: Join Keys

Technique 4: Reading Table in Memory

RESIDENT TableName

Common Pitfalls

❌ Pitfall 1: Forgetting to rename fields

LEFT JOIN (FACT_Sales_Raw)
LOAD 
    ProductID,
    ProductName,  ← Not renamed!
    Category      ← Not renamed!
RESIDENT DIM_Products_Transformed;

❌ Pitfall 2: Renaming the key field

LEFT JOIN (FACT_Sales_Raw)
LOAD 
    ProductID as Prod_ProductID,  ← WRONG!
    ...

❌ Pitfall 3: Using wrong join type

INNER JOIN (FACT_Sales_Raw)  ← Drops non-matching sales

❌ Pitfall 4: Inconsistent prefixes

Product_Name as Prod_ProductName
Product_Category as Prd_Category  ← Different prefix!

❌ Pitfall 5: Circular joins

LEFT JOIN (Table1) ... from Table2
LEFT JOIN (Table2) ... from Table1

Validation Checklist

βœ… Script runs without errors
βœ… FACT_Sales_Raw still has 15 rows (LEFT JOIN doesn't reduce)
βœ… FACT_Sales_Raw now has many more columns:


πŸ’Ό Phase 5: Business Logic Application

Concept: Creating the Final Fact Table

Now we have:

This phase creates the final fact table that will be used for all analysis.

Why Create a New Final Table?

Option 1: Continue modifying FACT_Sales_Raw

Option 2: Create new FACT_Sales_Final table (WE CHOOSE THIS)

Design Decisions

Decision 1: Calculate All Metrics in One Place

Decision 2: Use Meaningful Field Names in Output

Decision 3: Add Metadata Fields

Decision 4: Clean Up After Transformation

Implementation Steps

Step 5.1: Create Final Fact Table

Create tab section: Transformed_Data_T1

Objective: Create FACT_Sales_Final with all business logic applied

Source: RESIDENT FACT_Sales_Raw

Table name: FACT_Sales_Final

Step 5.2: Include Time Intelligence Fields

Why: Business Ask #6 requires time-based analysis

Fields to add:

OrderYear

Year(OrderDate) as OrderYear

OrderMonth

Month(OrderDate) as OrderMonth

OrderQuarter

Ceil(Month(OrderDate)/3) as OrderQuarter

Why Ceil() for Quarter? Month(OrderDate)/3 yields fractions (month 1 β†’ 0.33, month 4 β†’ 1.33, month 12 β†’ 4.0). Ceil() rounds up to the next whole number, so months 1–3 map to quarter 1, months 4–6 to quarter 2, and so on.

OrderDayOfWeek

WeekDay(OrderDate) as OrderDayOfWeek

Step 5.3: Rename Dimension Fields to Clean Names

Customer Fields (remove Cust_ prefix):

CustomerID,                    ← Keep as-is (key field)
Cust_CustomerName as CustomerName,
Cust_Segment as Segment,
Cust_LoyaltyTier as LoyaltyTier,
Cust_MarketType as MarketType,
Cust_Country as Country,
Cust_Region as Region,
Cust_City as City,
Cust_TenureYears as CustomerTenureYears

Product Fields (remove Prod_ prefix):

ProductID,                     ← Keep as-is (key field)
Prod_ProductName as ProductName,
Prod_Category as Category,
Prod_SubCategory as SubCategory,
Prod_Brand as Brand,
Prod_PriceTier as PriceTier

Why remove prefixes now?

Step 5.4: Include Transaction Fields

Original transaction fields:

Quantity,
Prod_ListPrice as ListPrice,
Prod_UnitCost as UnitCost,
Discount,
ShippingCost,
OrderStatus

Step 5.5: Calculate Revenue Metrics

Business Ask #1: Need accurate revenue calculations

GrossRevenue

Quantity * Prod_ListPrice as GrossRevenue

NetRevenue

Quantity * Prod_ListPrice * (1 - Discount) as NetRevenue

Why (1 - Discount)?

Step 5.6: Calculate Cost Metrics

TotalCost

Quantity * Prod_UnitCost as TotalCost

Step 5.7: Calculate Profit Metrics

Business Ask #3: Need profitability analysis

GrossProfit

(Quantity * Prod_ListPrice * (1 - Discount)) - (Quantity * Prod_UnitCost) as GrossProfit

DiscountAmount

Quantity * Prod_ListPrice * Discount as DiscountAmount

Step 5.8: Apply Business Classifications

Business Ask #4: Categorize orders by value and discount level

OrderValueCategory

If((Quantity * Prod_ListPrice * (1 - Discount)) >= 1000, 
   'High Value', 
   'Standard') as OrderValueCategory

DiscountCategory

If(Discount > 0.10, 
   'Heavy Discount', 
   If(Discount > 0, 'Light Discount', 'No Discount')) as DiscountCategory

IsCompleted

If(OrderStatus = 'Completed', 1, 0) as IsCompleted

Step 5.9: Calculate Total Transaction Value

TotalTransactionValue

(Quantity * Prod_ListPrice * (1 - Discount)) + ShippingCost as TotalTransactionValue

Step 5.10: Add Flags and Metadata

IsCurrentYear

If(Year(OrderDate) = 2026, 1, 0) as IsCurrentYear

ProcessedFlag

1 as ProcessedFlag

ETL_LoadDate

'2026-01-29' as ETL_LoadDate
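The hardcoded literal is a training simplification. In a production script you would typically stamp the actual reload time via a variable instead — a sketch (vLoadTimestamp is an illustrative name, not part of this guide's script):

// In the configuration tab (Data_Load_Script):
LET vLoadTimestamp = Timestamp(Now(), 'YYYY-MM-DD hh:mm:ss');

// Then in the fact load:
'$(vLoadTimestamp)' as ETL_LoadDate

This makes every row self-documenting: you can always tell which reload produced it.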

Step 5.11: Save Dimensions to QVD

Important step BEFORE dropping:

STORE DIM_Products_Transformed INTO [$(vTargetPath)DIM_Products_Transformed.qvd] (qvd);
STORE DIM_Customers_Transformed INTO [$(vTargetPath)DIM_Customers_Transformed.qvd] (qvd);

This ensures:

Step 5.12: Clean Up Tables

After saving, now drop:

DROP TABLE FACT_Sales_Raw;
DROP TABLE DIM_Products_Transformed;
DROP TABLE DIM_Customers_Transformed;

Why drop these tables?

Key Techniques Introduced

Technique 1: Complex Calculated Fields

(Field1 * Field2 * (1 - Field3)) - (Field1 * Field4) as Result

Technique 2: Ceil() Function

Ceil(value) as RoundedUp

Technique 3: Multiple Conditions in IF

If(condition1, 'Result1',
   If(condition2, 'Result2', 'DefaultResult')) as Category

Technique 4: Literal Values

1 as ConstantField
'Fixed Text' as TextField

Technique 5: Year/Month/Quarter Functions

Common Pitfalls

❌ Pitfall 1: Wrong order of operations

Quantity * ListPrice * 1 - Discount  ← WRONG
Should be: Quantity * ListPrice * (1 - Discount)

❌ Pitfall 2: Dropping tables before saving

DROP TABLE DIM_Products_Transformed;
STORE DIM_Products_Transformed ...  ← ERROR: Table doesn't exist

❌ Pitfall 3: Not handling NULL values

Field1 / Field2  ← What if Field2 is NULL?

❌ Pitfall 4: Inconsistent field naming

CustomerName  ← Here
Customer_Name  ← There
Cust_Name  ← Somewhere else

❌ Pitfall 5: Forgetting to rename prefixed fields

Prod_ProductName,  ← Still has prefix
Prod_Category      ← Not user-friendly

Validation Checklist

βœ… Script runs without errors
βœ… FACT_Sales_Final table created with 15 rows
βœ… All time fields present (OrderYear, OrderMonth, OrderQuarter, OrderDayOfWeek)
βœ… All dimension fields renamed (CustomerName, ProductName, Segment, Category, etc.)
βœ… All revenue calculations present (GrossRevenue, NetRevenue)
βœ… All cost calculations present (TotalCost)
βœ… All profit calculations present (GrossProfit, DiscountAmount)
βœ… All classifications present (OrderValueCategory, DiscountCategory, IsCompleted)
βœ… Metadata fields present (ProcessedFlag=1, ETL_LoadDate, IsCurrentYear)
βœ… Original tables dropped (FACT_Sales_Raw, DIM_Products_Transformed, DIM_Customers_Transformed)
βœ… Dimension QVDs saved before dropping
βœ… Pick a row and manually verify calculations are correct

Manual verification example:
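For instance, take a hypothetical row with Quantity = 2, ListPrice = 1000, UnitCost = 600, Discount = 0.10, ShippingCost = 25 (substitute the actual values from one of your 15 rows). The formulas from Steps 5.5–5.9 give:

GrossRevenue          = 2 * 1000              = 2000
NetRevenue            = 2 * 1000 * (1 - 0.10) = 1800
TotalCost             = 2 * 600               = 1200
GrossProfit           = 1800 - 1200           = 600
DiscountAmount        = 2 * 1000 * 0.10       = 200
TotalTransactionValue = 1800 + 25             = 1825

If the row in your app does not reproduce its own numbers this way, one of the calculated fields was mistyped.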


πŸ”„ Phase 6: Incremental Load Implementation

Concept: Loading Only New Data

The Problem with Full Loads:

As data grows, full reloads become:

The Incremental Solution:

Benefits:

How Incremental Load Works

Step-by-step process:

  1. Mark what's been loaded

    • Add ProcessedFlag field to source data
    • 0 = not loaded yet, 1 = already loaded
  2. Track what's already in warehouse

    • Save processed data to QVD after each load
    • This becomes the "history"
  3. Filter for new data only

    • Load WHERE ProcessedFlag = 0
    • Only gets unprocessed records
  4. Apply same transformations

    • New data goes through same logic as initial load
    • Ensures consistency
  5. Append to history

    • CONCATENATE new with existing
    • Result: Complete dataset
  6. Mark as processed

    • Set ProcessedFlag = 1
    • Next time: Won't load again
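The steps above compress into script form as follows (paths follow the Folder Structure section; FACT_Sales_New is an illustrative name, and the Phase 5 transformation step is elided because it is identical to the initial load):

// Steps 1 & 3: pick up only rows not yet processed
FACT_Sales_New:
LOAD *
FROM [lib://DataFiles/FACT_Sales_Incremental.qvd] (qvd)
WHERE ProcessedFlag = 0;

// Step 4: apply the same Phase 5 transformations here,
// including setting 1 as ProcessedFlag on the new rows

// Step 5: append the history already on disk
CONCATENATE (FACT_Sales_New)
LOAD * FROM [lib://DataFiles/Transformed/FACT_Sales_Complete.qvd] (qvd);

// Steps 2 & 6: write the combined set back as the new history
STORE FACT_Sales_New INTO [lib://DataFiles/Transformed/FACT_Sales_Complete.qvd] (qvd);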

Design Decisions

Decision 1: Use ProcessedFlag Not Dates

Decision 2: Reload Dimensions Each Time

Decision 3: Use NoConcatenate Keyword

Decision 4: Drop and Reload Pattern

Implementation Steps

Step 6.1: Reload Dimension Tables

Create tab section: Incremental_Load

Why reload dimensions?

What to do:

DIM_Products_Transformed:
LOAD * FROM [$(vTargetPath)DIM_Products_Transformed.qvd] (qvd);

DIM_Customers_Transformed:
LOAD * FROM [$(vTargetPath)DIM_Customers_Transformed.qvd] (qvd);

Tables now in memory:

Step 6.2: Load New Incremental Sales Data

Source: FACT_Sales_Incremental.qvd

What to load:

FACT_Sales_Incremental_Raw:
LOAD 
    OrderID,
    Date(OrderDate, 'YYYY-MM-DD') as OrderDate,
    CustomerID,
    ProductID,
    Quantity,
    Discount,
    ShippingCost,
    OrderStatus,
    ProcessedFlag
FROM [$(vSourcePath)FACT_Sales_Incremental.qvd] (qvd)
WHERE ProcessedFlag = 0;

Key points:

Expected result: 5 new orders loaded (ORD016-ORD020)

Step 6.3: Join Incremental Sales with Dimensions

Join with Products:

LEFT JOIN (FACT_Sales_Incremental_Raw)
LOAD
    ProductID,
    ProductName as Prod_ProductName,
    Category as Prod_Category,
    SubCategory as Prod_SubCategory,
    Brand as Prod_Brand,
    UnitCost as Prod_UnitCost,
    ListPrice as Prod_ListPrice,
    ProfitPerUnit as Prod_ProfitPerUnit,
    PriceTier as Prod_PriceTier,
    IsActive as Prod_IsActive
RESIDENT DIM_Products_Transformed;

Join with Customers:

LEFT JOIN (FACT_Sales_Incremental_Raw)
LOAD
    CustomerID,
    CustomerName as Cust_CustomerName,
    Country as Cust_Country,
    Region as Cust_Region,
    City as Cust_City,
    Segment as Cust_Segment,
    CustomerTenureYears as Cust_TenureYears,
    LoyaltyTier as Cust_LoyaltyTier,
    MarketType as Cust_MarketType
RESIDENT DIM_Customers_Transformed;

Exactly the same as Phase 4 joins!

Step 6.4: Apply Transformations to Incremental Data

Critical: Use NoConcatenate:

NoConcatenate
FACT_Sales_Incremental_Transformed:
LOAD
    ... (all same transformations as Phase 5)
RESIDENT FACT_Sales_Incremental_Raw;

Why NoConcatenate?

Transformations to apply (same as Phase 5):

Result:

Step 6.5: Concatenate with Existing Data

Now explicitly concatenate:

CONCATENATE (FACT_Sales_Final)
LOAD * RESIDENT FACT_Sales_Incremental_Transformed;

What this does:

Why CONCATENATE is safe now:

Step 6.6: Clean Up

Drop temporary tables:

DROP TABLE FACT_Sales_Incremental_Raw;
DROP TABLE FACT_Sales_Incremental_Transformed;
DROP TABLE DIM_Products_Transformed;
DROP TABLE DIM_Customers_Transformed;

Why drop dimensions again?

Final state:

Understanding Auto-Concatenation

Qlik's Auto-Concatenation Rule: When you create a table with identical field names to an existing table, Qlik automatically appends rows.

Example:

Table1:
LOAD
    OrderID,
    OrderDate,
    Quantity
...

Table1:  ← Same name!
LOAD
    OrderID,
    OrderDate,
    Quantity
...
Result: One table "Table1" with all rows combined

Problem: If you're in the middle of transformations, auto-concatenation happens too early.

Solution - Option 1: Different Table Names

Table1:
LOAD ... initial data

Table2:  ← Different name
LOAD ... incremental data

CONCATENATE (Table1)
LOAD * RESIDENT Table2;

Solution - Option 2: NoConcatenate Keyword

Table1:
LOAD ... initial data

NoConcatenate  ← Prevents auto-concat
Table2:
LOAD ... incremental data (would auto-concat without NoConcatenate)

CONCATENATE (Table1)
LOAD * RESIDENT Table2;

We use Option 2: it keeps both tables' roles explicit in the script and makes the table relationships clear.

Key Techniques Introduced

Technique 1: WHERE with ProcessedFlag

WHERE ProcessedFlag = 0

Technique 2: NoConcatenate Keyword

NoConcatenate
TableName:
LOAD ...

Technique 3: Explicit CONCATENATE

CONCATENATE (TargetTable)
LOAD * RESIDENT SourceTable;

Technique 4: Reloading from QVD

TableName:
LOAD * FROM [path/file.qvd] (qvd);

Common Pitfalls

❌ Pitfall 1: Auto-concatenation happens unexpectedly

FACT_Sales_Final:
LOAD ... (creates table)

FACT_Sales_Incremental_Transformed:
LOAD ... (auto-concatenates to FACT_Sales_Final!)

❌ Pitfall 2: Forgetting to reload dimensions

LEFT JOIN (FACT_Sales_Incremental_Raw)
LOAD ... RESIDENT DIM_Products_Transformed;  ← ERROR: Table not found

❌ Pitfall 3: Inconsistent transformations

Initial: NetRevenue = Quantity * ListPrice * (1 - Discount)
Incremental: NetRevenue = Quantity * ListPrice  ← Missing discount!

❌ Pitfall 4: Not updating ProcessedFlag

1 as ProcessedFlag  ← Forgot this line

❌ Pitfall 5: Loading all data instead of new only

WHERE ProcessedFlag = 0  ← Missing this line

Validation Checklist

βœ… Script runs without errors
βœ… Dimensions reloaded successfully from QVD
βœ… Incremental raw data loaded (5 rows)
βœ… Joins completed (product and customer details added)
βœ… Transformations applied (same logic as initial load)
βœ… NoConcatenate created separate table
βœ… CONCATENATE merged tables correctly
βœ… FACT_Sales_Final now has 20 rows (15 initial + 5 incremental)
βœ… All 20 rows have ProcessedFlag = 1
βœ… Temporary tables dropped
βœ… No duplicate records (check OrderID - should have ORD001-ORD020, no repeats)
βœ… Calculations correct on incremental records (spot check a few)

Test that the incremental load works: the next time the script runs:


πŸ“œ Phase 7: Slowly Changing Dimension (SCD Type 2)

Concept: Tracking Historical Changes

The Problem: Customer C001 is currently in the "Corporate" segment. Next month, they move to the "Consumer" segment. The traditional approach simply overwrites the record:

Before: C001 | Corporate
After:  C001 | Consumer  ← Lost history!

The Solution: SCD Type 2:

Version 1: C001 | Corporate | 2024-01-01 | 2024-05-31 | Not Current
Version 2: C001 | Consumer  | 2024-06-01 | 9999-12-31 | Current

SCD Types Overview

Type 0: No Changes Allowed

Type 1: Overwrite (No History)

Before: Address = "123 Old St"
After:  Address = "456 New St"  ← Old address lost

Type 2: Track Full History (WE USE THIS)

Row 1: Address = "123 Old St" | 2020-01-01 | 2024-12-31 | Not Current
Row 2: Address = "456 New St" | 2025-01-01 | 9999-12-31 | Current

Type 3: Previous Value Column

Current_Segment = "Consumer"
Previous_Segment = "Corporate"

Type 4: History Table

We choose Type 2 because:

SCD Type 2 Structure

Key Fields:

  1. Business Key: CustomerID (identifies customer)
  2. Attributes: Segment, LoyaltyTier (what can change)
  3. EffectiveStartDate: When this version became active
  4. EffectiveEndDate: When this version stopped being active
  5. IsCurrent: Flag for current version (1 = current, 0 = historical)
  6. VersionNumber: Sequence of changes (1, 2, 3...)

Example:

CustomerID | Segment    | LoyaltyTier | EffectiveStartDate | EffectiveEndDate | IsCurrent | VersionNumber
-----------|------------|-------------|-------------------|-----------------|-----------|---------------
C001       | Consumer   | Bronze      | 2023-06-15        | 2024-05-31      | 0         | 1
C001       | Corporate  | Silver      | 2024-06-01        | 9999-12-31      | 1         | 2

Interpretation: C001 was a Bronze-tier Consumer from 2023-06-15 through 2024-05-31 (version 1, no longer current). Since 2024-06-01 they are a Silver-tier Corporate customer (version 2, the current row, open-ended until 9999-12-31).

Design Decisions

Decision 1: Track Segment and LoyaltyTier Changes

Decision 2: Use 9999-12-31 for "Current"

Decision 3: Handle Initial Load vs Updates

Decision 4: Close Old Versions When Creating New

Implementation Steps

Step 7.1: Reload Customer Dimension

Create tab section: SCD_Type2

Why reload?

What to do:

DIM_Customers_For_SCD:
LOAD 
    CustomerID,
    CustomerName,
    Segment,
    Country,
    Region,
    City,
    LoyaltyTier
FROM [$(vTargetPath)DIM_Customers_Transformed.qvd] (qvd);

Note: Only load fields relevant for SCD

Step 7.2: Check If SCD File Exists

The logic:

How to check:

IF FileSize('$(vTargetPath)DIM_Customers_SCD.qvd') > 0 THEN
    ... (handle updates)
ELSE
    ... (handle initial load)
ENDIF

FileSize() function: Returns the size of the file in bytes, or NULL if the file does not exist. Since NULL > 0 evaluates to false, the comparison doubles as an existence check.

Step 7.3: Initial Load (First Run)

When: SCD file doesn't exist (ELSE branch)

What to do: Create version 1 for all customers

DIM_Customers_SCD_New:
LOAD
    CustomerID,
    CustomerName,
    Segment,
    Country,
    Region,
    City,
    LoyaltyTier,
    Date(Today()) as EffectiveStartDate,
    Date('9999-12-31') as EffectiveEndDate,
    1 as IsCurrent,
    1 as VersionNumber
RESIDENT DIM_Customers_For_SCD;

Fields added:

Result: 10 rows, one per customer, all version 1

Step 7.4: Update Logic (Subsequent Runs)

When: SCD file exists (THEN branch)

Steps:

  1. Load existing SCD data
  2. Identify current versions
  3. Compare with new data
  4. Detect changes
  5. Create new versions for changed records
  6. Close old versions
  7. Keep unchanged records

Step 7.4.1: Load Existing SCD Data

DIM_Customers_SCD_Existing:
LOAD 
    CustomerID,
    CustomerName,
    Segment,
    Country,
    Region,
    City,
    LoyaltyTier,
    EffectiveStartDate,
    EffectiveEndDate,
    IsCurrent,
    VersionNumber
FROM [$(vTargetPath)DIM_Customers_SCD.qvd] (qvd);

Step 7.4.2: Get Current Active Records

DIM_Customers_Current:
LOAD 
    CustomerID,
    Segment as Old_Segment,
    LoyaltyTier as Old_LoyaltyTier,
    VersionNumber as Old_Version
RESIDENT DIM_Customers_SCD_Existing
WHERE IsCurrent = 1;

What this does:

Step 7.4.3: Join to Find Changes

LEFT JOIN (DIM_Customers_For_SCD)
LOAD 
    CustomerID,
    Old_Segment,
    Old_LoyaltyTier,
    Old_Version
RESIDENT DIM_Customers_Current;

What this does:

Step 7.4.4: Identify Changed Records

NoConcatenate
DIM_Customers_Changed:
LOAD *
RESIDENT DIM_Customers_For_SCD
WHERE Segment <> Old_Segment 
   OR LoyaltyTier <> Old_LoyaltyTier;

What this does:

Step 7.4.5: Check If Any Changes Detected

LET vChangesDetected = NoOfRows('DIM_Customers_Changed');

IF vChangesDetected > 0 THEN
    ... (create new versions)
ELSE
    ... (no changes, keep existing)
ENDIF

NoOfRows() function: returns the number of rows in a previously loaded table, letting the script branch on whether any changes were found.

Step 7.4.6: Create New Versions (If Changes Detected)

DIM_Customers_SCD_New:
NoConcatenate
LOAD
    CustomerID,
    CustomerName,
    Segment,
    Country,
    Region,
    City,
    LoyaltyTier,
    Date(Today()) as EffectiveStartDate,
    Date('9999-12-31') as EffectiveEndDate,
    1 as IsCurrent,
    Old_Version + 1 as VersionNumber
RESIDENT DIM_Customers_Changed;

What this does: creates version N+1 for each changed customer — effective from today, open-ended (9999-12-31), and flagged as current.

Step 7.4.7: Close Old Versions (Changed Customers Only)

First collect the IDs of the changed customers, so the close-out can be limited to them:

ChangedIDs:
LOAD DISTINCT CustomerID as ChangedCustomerID
RESIDENT DIM_Customers_Changed;

CONCATENATE (DIM_Customers_SCD_New)
LOAD
    CustomerID,
    CustomerName,
    Segment,
    Country,
    Region,
    City,
    LoyaltyTier,
    EffectiveStartDate,
    Date(Today()-1) as EffectiveEndDate,
    0 as IsCurrent,
    VersionNumber
RESIDENT DIM_Customers_SCD_Existing
WHERE IsCurrent = 1
  AND Exists(ChangedCustomerID, CustomerID);

What this does: rewrites the previously current rows of the changed customers with EffectiveEndDate = yesterday and IsCurrent = 0. The Exists() test matters: without it, the current rows of unchanged customers would also be closed, leaving them with no active version.

Why Today()-1? The old version must end the day before the new version starts. If both carried today's date, the two versions would overlap for one day.

Step 7.4.8: Keep Unchanged and Historical Records

CONCATENATE (DIM_Customers_SCD_New)
LOAD * 
RESIDENT DIM_Customers_SCD_Existing
WHERE IsCurrent = 0
   OR NOT Exists(ChangedCustomerID, CustomerID);

DROP TABLE ChangedIDs;

What this does: carries forward all already-closed historical rows, plus the still-current rows of the customers that did not change.

Step 7.4.9: Handle No Changes

ELSE  ← If vChangesDetected = 0

DIM_Customers_SCD_New:
NoConcatenate
LOAD * RESIDENT DIM_Customers_SCD_Existing;

What this does: when nothing changed, simply copies the existing SCD table forward under the new table name.

Step 7.5: Clean Up

DROP TABLE DIM_Customers_Changed;
DROP TABLE DIM_Customers_SCD_Existing;
DROP TABLE DIM_Customers_For_SCD;
DROP TABLE DIM_Customers_Current;

Result: Only DIM_Customers_SCD_New remains

SCD Logic Flow Diagram

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚      Does SCD QVD File Exist?        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜
         β”‚ NO                   β”‚ YES
         β–Ό                      β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  INITIAL LOAD   β”‚     β”‚    UPDATE LOGIC     β”‚
β”‚                 β”‚     β”‚                     β”‚
β”‚  Create Ver 1   β”‚     β”‚ 1. Load existing    β”‚
β”‚  for all        β”‚     β”‚ 2. Get current vers β”‚
β”‚  customers      β”‚     β”‚ 3. Compare with new β”‚
β”‚                 β”‚     β”‚ 4. Detect changes   β”‚
β”‚ Start: Today    β”‚     β”‚ 5. Create new vers  β”‚
β”‚ End: 9999-12-31 β”‚     β”‚ 6. Close old vers   β”‚
β”‚ IsCurrent: 1    β”‚     β”‚ 7. Keep unchanged   β”‚
β”‚ Version: 1      β”‚     β”‚                     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚                          β”‚
         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    β–Ό
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚ DIM_Customers_SCD  β”‚
        β”‚ (Complete History) β”‚
        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Techniques Introduced

Technique 1: FileSize() Function

IF FileSize('path/file.qvd') > 0 THEN
    ... file exists
ELSE
    ... file doesn't exist
ENDIF
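
The same existence gate, sketched in Python for intuition (an analogy only — `os.path.getsize` stands in for Qlik's FileSize(), which is not a Python API):

```python
import os

def qvd_exists(path):
    """Mimic the FileSize() > 0 gate: True only for an existing,
    non-empty file; a missing file behaves like NULL in Qlik."""
    try:
        return os.path.getsize(path) > 0
    except OSError:
        return False
```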

Technique 2: NoOfRows() Function

LET vCount = NoOfRows('TableName');

Technique 3: Date Arithmetic

Date(Today()) as StartDate
Date(Today()-1) as EndDate
Date('9999-12-31') as FarFuture

Technique 4: Comparison in WHERE

WHERE Field1 <> Field2

Technique 5: OR Condition

WHERE Condition1 OR Condition2

Technique 6: Version Incrementing

Old_Version + 1 as VersionNumber

Common Pitfalls

❌ Pitfall 1: Not handling first run separately

IF FileSize() > 0 THEN
    ... only has update logic, no ELSE
ENDIF

❌ Pitfall 2: Forgetting to close old versions

← Missing the "update EffectiveEndDate" step

❌ Pitfall 3: Off-by-one date errors

New start: 2024-06-01
Old end:   2024-06-01  ← Overlap! Should be 2024-05-31

❌ Pitfall 4: Not using NoConcatenate

DIM_Customers_SCD_New:  ← Might auto-concatenate!
LOAD ...

❌ Pitfall 5: Dropping tables in wrong order

DROP TABLE DIM_Customers_SCD_Existing;
CONCATENATE (DIM_Customers_SCD_New)
LOAD * RESIDENT DIM_Customers_SCD_Existing;  ← ERROR!

❌ Pitfall 6: Not tracking all changed attributes

WHERE Segment <> Old_Segment  ← Only checking Segment
← Missing LoyaltyTier check

Validation Checklist

βœ… Script runs without errors on first run (no SCD file)
βœ… DIM_Customers_SCD_New created with 10 rows
βœ… All rows have VersionNumber = 1
βœ… All rows have IsCurrent = 1
βœ… All rows have EffectiveStartDate = Today
βœ… All rows have EffectiveEndDate = 9999-12-31
βœ… SCD QVD file saved
βœ… Script runs without errors on second run (SCD file exists)
βœ… No changes detected β†’ Same 10 rows
βœ… Test with actual change:

To test SCD properly:

  1. Run initial load β†’ 10 customers, all version 1
  2. Run again β†’ Still 10 rows (no changes)
  3. Edit source: Change C001 Segment = 'Consumer'
  4. Run again β†’ Now 11 rows (C001 has 2 versions)
  5. Query: WHERE CustomerID = 'C001' β†’ See version history

πŸ“… Phase 8: Master Calendar Creation

Concept: The Importance of a Date Dimension

Why We Need a Calendar Table:

Sales transactions have dates, but dates alone don't answer questions like: Which quarter did this order fall in? Which fiscal year? Was it a business day?

Calendar table provides: one row per date with all of these attributes pre-calculated, so every chart slices time the same way.

Traditional Approach vs Calculated Dimensions

Option 1: Calculate in Charts (DON'T DO THIS)

Chart expression: Quarter(OrderDate)
Problem: Calculated for every chart, every time

Option 2: Master Calendar (BEST PRACTICE)

Calendar table: Pre-calculated Quarter field
Charts: Just use Quarter field

Design Decisions

Decision 1: Generate from Data, Not a Fixed Range — derive Min/Max dates from the fact table so the calendar always covers exactly the loaded data.

Decision 2: Use AUTOGENERATE for Date Sequences — generate rows in the script rather than maintaining a physical date table.

Decision 3: Include Business Logic — pre-calculate fiscal year, business-day and current-period flags once, in the script, not in every chart.

Decision 4: Link to Facts via OrderDate — Qlik associates on identical field names, so the calendar's date key must share a name with the fact table's order date.

Implementation Steps

Step 8.1: Find Date Range from Data

Create tab section: Master_Calendar

What to do:

TempDates:
LOAD 
    Min(OrderDate) as MinDate,
    Max(OrderDate) as MaxDate
RESIDENT FACT_Sales_Final;

What this does: aggregates the entire fact table down to one row containing the earliest and latest order dates.

Result: One row with MinDate and MaxDate

Step 8.2: Store Date Range in Variables

What to do:

LET vMinDate = Peek('MinDate', 0, 'TempDates');
LET vMaxDate = Peek('MaxDate', 0, 'TempDates');
DROP TABLE TempDates;

Peek() function: reads a field value from a specific row of an already-loaded table — Peek('MinDate', 0, 'TempDates') returns MinDate from row 0, the first row.

Why use variables? The helper table is about to be dropped; variables survive and can be expanded with $(...) in later statements.

DROP TABLE TempDates: the one-row helper has served its purpose and would otherwise clutter (and associate with) the data model.

Step 8.3: Generate Date Sequence

Concept: Create one row for each date in range

Traditional SQL approach:

SELECT date FROM date_table
WHERE date BETWEEN '2024-01-15' AND '2024-02-14'

Qlik approach - AUTOGENERATE:

TempCalendar:
LOAD
    Date($(vMinDate) + IterNo() - 1) as TempDate
AUTOGENERATE 1
WHILE $(vMinDate) + IterNo() - 1 <= $(vMaxDate);

How this works:

AUTOGENERATE 1: generates rows without reading any source; the "1" means one base row, which the WHILE clause then expands into many.

IterNo(): returns the current iteration number within the WHILE loop, starting at 1.

WHILE condition: keeps generating rows from the same base row as long as the condition holds — here, until the generated date passes the max date.

Date arithmetic: in Qlik a date is a number (days), so adding an integer adds days:

Iteration 1: $(vMinDate) + 1 - 1 = $(vMinDate) + 0 = Min date
Iteration 2: $(vMinDate) + 2 - 1 = $(vMinDate) + 1 = Min date + 1 day
Iteration 3: $(vMinDate) + 3 - 1 = $(vMinDate) + 2 = Min date + 2 days
...
Iteration N: $(vMinDate) + N - 1 = Max date (stops here)

Example: If MinDate = 2024-01-15 and MaxDate = 2024-02-14:

Result: TempCalendar with 31 rows, one per date
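
The AUTOGENERATE / IterNo() pattern is equivalent to this Python loop (illustrative sketch; the dates match the example above):

```python
from datetime import date, timedelta

def generate_dates(min_date, max_date):
    """One row per date from min_date through max_date inclusive,
    mirroring Date($(vMinDate) + IterNo() - 1) with the WHILE guard."""
    dates = []
    iter_no = 1                                  # IterNo() starts at 1
    while min_date + timedelta(days=iter_no - 1) <= max_date:
        dates.append(min_date + timedelta(days=iter_no - 1))
        iter_no += 1
    return dates
```

generate_dates(date(2024, 1, 15), date(2024, 2, 14)) yields the 31 dates of the example.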

Step 8.4: Add Calendar Attributes

Create full calendar from dates:

MasterCalendar:
LOAD
    TempDate as CalendarDate,
    Week(TempDate) as Week,
    Year(TempDate) as Year,
    Month(TempDate) as Month,
    Day(TempDate) as Day,
    Ceil(Month(TempDate)/3) as Quarter,
    WeekDay(TempDate) as WeekDay,
    WeekName(TempDate) as WeekName,
    MonthName(TempDate) as MonthName,
    'Q' & Ceil(Month(TempDate)/3) as QuarterName,
    Year(TempDate) & '-Q' & Ceil(Month(TempDate)/3) as YearQuarter,
    
    If(Month(TempDate) >= 4, 
       Year(TempDate), 
       Year(TempDate) - 1) as FiscalYear,
    
    If(WeekDay(TempDate) >= 0 and WeekDay(TempDate) <= 4, 1, 0) as IsBusinessDay,
    
    If(TempDate = Today(), 1, 0) as IsToday,
    If(MonthName(TempDate) = MonthName(Today()), 1, 0) as IsCurrentMonth,
    If(Year(TempDate) = Year(Today()), 1, 0) as IsCurrentYear
    
RESIDENT TempCalendar;

DROP TABLE TempCalendar;
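
The derived attributes can be cross-checked with a small Python sketch (illustrative only; note Python's date.weekday() happens to use the same 0 = Monday numbering the script relies on):

```python
import math
from datetime import date

def calendar_row(d, fiscal_start_month=4):
    """Derive the key MasterCalendar attributes for a single date."""
    quarter = math.ceil(d.month / 3)             # Ceil(Month(TempDate)/3)
    return {
        "Year": d.year,
        "Month": d.month,
        "Quarter": quarter,
        "QuarterName": f"Q{quarter}",
        "YearQuarter": f"{d.year}-Q{quarter}",
        # April-December -> current year; January-March -> previous year
        "FiscalYear": d.year if d.month >= fiscal_start_month else d.year - 1,
        # date.weekday(): 0 = Monday ... 6 = Sunday
        "IsBusinessDay": 1 if d.weekday() <= 4 else 0,
    }
```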

Attribute Explanations:

CalendarDate: the key field — one row per date, linked to the fact table's order date.

Week(TempDate): week number within the year (ISO 8601 by default).

Year(TempDate): four-digit year, e.g. 2024.

Month(TempDate): month number 1–12 (a dual value that displays as the month name).

Day(TempDate): day of month, 1–31.

Ceil(Month(TempDate)/3) as Quarter: months 1–3 β†’ 1, 4–6 β†’ 2, 7–9 β†’ 3, 10–12 β†’ 4.

WeekDay(TempDate): day of week; with default settings 0 = Monday through 6 = Sunday.

WeekName(TempDate): year and week number, e.g. 2024/03.

MonthName(TempDate): month and year, e.g. Jan 2024.

'Q' & Ceil(Month(TempDate)/3) as QuarterName: string label Q1–Q4.

Year(TempDate) & '-Q' & Ceil(Month(TempDate)/3) as YearQuarter: combined label such as 2024-Q1, handy for sorting across years.

FiscalYear Calculation (the fiscal year starts April 1: April through December belong to the current calendar year's fiscal year, January through March to the previous one):

If(Month(TempDate) >= 4, 
   Year(TempDate), 
   Year(TempDate) - 1) as FiscalYear

IsBusinessDay (1 for Monday–Friday, 0 for weekends, given WeekDay's 0 = Monday numbering):

If(WeekDay(TempDate) >= 0 and WeekDay(TempDate) <= 4, 1, 0)

IsToday (flags exactly one row — useful for "as of today" KPIs):

If(TempDate = Today(), 1, 0)

IsCurrentMonth (MonthName includes the year, e.g. "Jan 2024", so this only flags the current month of the current year):

If(MonthName(TempDate) = MonthName(Today()), 1, 0)

IsCurrentYear:

If(Year(TempDate) = Year(Today()), 1, 0)

Step 8.5: Link Calendar to Sales

Add sales indicator:

LEFT JOIN (MasterCalendar)
LOAD DISTINCT
    OrderDate as CalendarDate,
    1 as HasSales
RESIDENT FACT_Sales_Final;

What this does: flags each calendar date that appears in the fact table.

DISTINCT keyword: collapses multiple orders on the same date to a single row, so the join adds at most one HasSales value per date.

Why add HasSales? It lets dashboards distinguish "no sales that day" (HasSales is null) from "date missing from the calendar", and makes coverage checks trivial.

Result: MasterCalendar keeps one row per date; dates with at least one order carry HasSales = 1.
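
The effect of the DISTINCT join can be sketched in Python with a set (illustrative; None stands in for Qlik's null on dates without sales):

```python
from datetime import date

def flag_sales_dates(calendar_dates, order_dates):
    """LEFT JOIN of a distinct-date flag onto the calendar: every calendar
    row survives, and HasSales = 1 only where at least one order exists."""
    dates_with_sales = set(order_dates)          # LOAD DISTINCT OrderDate
    return [{"CalendarDate": d,
             "HasSales": 1 if d in dates_with_sales else None}
            for d in calendar_dates]
```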

Key Techniques Introduced

Technique 1: Min/Max Aggregate Functions

Min(Field) as MinValue
Max(Field) as MaxValue

Technique 2: Peek() Function

LET vVariable = Peek('FieldName', RowNumber, 'TableName');

Technique 3: AUTOGENERATE with WHILE

LOAD
    Expression
AUTOGENERATE 1
WHILE condition;

Technique 4: IterNo() Function

Technique 5: Date Arithmetic

Date + Number = Date plus that many days

Technique 6: String Concatenation

'Text' & Field & 'MoreText' as CombinedField

Technique 7: LOAD DISTINCT

LOAD DISTINCT Field1, Field2

Technique 8: Date Functions

Common Pitfalls

❌ Pitfall 1: Off-by-one in date generation

Date($(vMinDate) + IterNo()) as TempDate  ← Wrong!
Should be: $(vMinDate) + IterNo() - 1

❌ Pitfall 2: Wrong fiscal year logic

If(Month(TempDate) > 4, ...)  ← Should be >=

❌ Pitfall 3: Confusing WeekDay numbering

If(WeekDay(TempDate) >= 1 and WeekDay(TempDate) <= 5, ...)  ← Wrong!

❌ Pitfall 4: Not using Date() function

$(vMinDate) + IterNo() - 1 as TempDate  ← Might be number

❌ Pitfall 5: Forgetting DROP TABLE TempCalendar

← Missing DROP TABLE TempCalendar

❌ Pitfall 6: Calendar doesn't cover all sales dates

← Used fixed date range instead of Min/Max

Validation Checklist

βœ… Script runs without errors
βœ… Variables vMinDate and vMaxDate set correctly
βœ… TempCalendar created with correct number of rows
βœ… MasterCalendar has attributes:

Manual verification:


πŸ’Ύ Phase 9: Data Persistence (Saving to QVD)

Concept: The Three-Layer Architecture

Layer 1: Source (Raw Data)

Layer 2: Transformed (Business Logic Applied)

Layer 3: Visualization (Optimized for Dashboards)

Why Save to QVD?

Reason 1: Speed — optimized QVD loads are an order of magnitude faster than reading CSVs or databases.

Reason 2: Incremental Loading Support — the stored QVD is the baseline the next incremental run appends to.

Reason 3: Reusability — other apps can load the same transformed QVDs without repeating the ETL.

Reason 4: Backup/Audit — the files are a persisted snapshot of each load.

Reason 5: Development Efficiency — downstream script tabs can be developed against fast local QVDs instead of re-extracting from source.

Design Decisions

Decision 1: Save Multiple Versions of Fact Table

Decision 2: Drop Tables After Saving

Decision 3: Save SCD and Calendar

Decision 4: Organized Folder Structure

DataFiles/                 ← Source (never modify)
DataFiles/Transformed/     ← Our transformed data

Implementation Steps

Step 9.1: Save Fact Tables

Create tab section: Save_Transformed_Data

Save for incremental loading:

STORE FACT_Sales_Final INTO [$(vTargetPath)FACT_Sales_Transformed.qvd] (qvd);

Save for visualization:

STORE FACT_Sales_Final INTO [$(vTargetPath)FACT_Sales_Complete.qvd] (qvd);

Why save twice? FACT_Sales_Transformed is the baseline for the next incremental load; FACT_Sales_Complete is what the visualization layer reads. Keeping them separate lets either evolve independently.

Step 9.2: Save Dimension Tables

Products:

STORE DIM_Products_Transformed INTO [$(vTargetPath)DIM_Products_Transformed.qvd] (qvd);

Customers:

STORE DIM_Customers_Transformed INTO [$(vTargetPath)DIM_Customers_Transformed.qvd] (qvd);

Stores:

STORE DIM_Stores_Raw INTO [$(vTargetPath)DIM_Stores_Transformed.qvd] (qvd);

Note: DIM_Stores_Raw (not transformed)

Step 9.3: Save SCD Table

STORE DIM_Customers_SCD_New INTO [$(vTargetPath)DIM_Customers_SCD.qvd] (qvd);

Critical: This is the historical tracking table

Step 9.4: Save Master Calendar

STORE MasterCalendar INTO [$(vTargetPath)MasterCalendar.qvd] (qvd);

Why save? The calendar is rebuilt by the ETL each run, and saving it lets the visualization phase (and other apps) load it straight from QVD.

Step 9.5: Drop Tables to Prepare for Visualization

DROP TABLE DIM_Stores_Raw;
DROP TABLE FACT_Sales_Final;
DROP TABLE DIM_Customers_SCD_New;
DROP TABLE MasterCalendar;

Why drop after saving? The visualization phase reloads everything from QVD with clean table and field names; dropping first guarantees no leftover ETL tables create accidental associations or synthetic keys.

Current state in memory: EMPTY

Understanding QVD File Benefits

Performance Comparison:

Loading from:
CSV file:        1000 rows/second
Database:        5000 rows/second  
QVD (optimized): 100,000+ rows/second

Optimized vs Non-Optimized Load:

Optimized Load (fastest):

LOAD * FROM file.qvd (qvd);

Non-Optimized Load (still fast, but slower):

LOAD 
    Field1,
    Field2,
    Field3 * 2 as DoubleField3  ← Transformation
FROM file.qvd (qvd);

Any transformation, renaming aside, breaks the optimized path because each row must be unpacked (a WHERE clause using only Exists() is the notable exception that stays optimized).

Key Techniques Introduced

Technique 1: STORE Statement

STORE TableName INTO [lib://Connection/path/filename.qvd] (qvd);

Technique 2: Multiple STORE of Same Table

STORE Table1 INTO [path/fileA.qvd] (qvd);
STORE Table1 INTO [path/fileB.qvd] (qvd);

Technique 3: STORE then DROP Pattern

STORE TableName INTO [path/file.qvd] (qvd);
DROP TABLE TableName;

Technique 4: File Path with Variables

[$(vTargetPath)filename.qvd]

Common Pitfalls

❌ Pitfall 1: DROP before STORE

DROP TABLE MyTable;
STORE MyTable INTO ...  ← ERROR: Table doesn't exist!

❌ Pitfall 2: Forgetting to save before dropping

DROP TABLE DIM_Products_Transformed;  ← Lost the data!

❌ Pitfall 3: Wrong file extension

STORE Table INTO [path/file.txt] (qvd);

❌ Pitfall 4: Forgetting (qvd) format specifier

STORE Table INTO [path/file.qvd];  ← No format specified

❌ Pitfall 5: Using same filename for different content

STORE DIM_Products INTO [path/Dimensions.qvd] (qvd);
STORE DIM_Customers INTO [path/Dimensions.qvd] (qvd);  ← Overwrites!

❌ Pitfall 6: Saving to wrong folder

STORE Table INTO [$(vSourcePath)file.qvd] (qvd);  ← Wrong folder!
Should be: [$(vTargetPath)file.qvd]

Validation Checklist

βœ… Script runs without errors
βœ… All tables saved successfully
βœ… Check Windows Explorer - files exist in Transformed folder:

File size expectations:

Test QVD files: Create a new script tab:

TEST:
LOAD * FROM [$(vTargetPath)FACT_Sales_Complete.qvd] (qvd);

Run β†’ Should load successfully β†’ Confirms QVD is valid


🎨 Phase 10: Visualization Data Model

Concept: Optimizing for Analysis

ETL is complete, but the tables in memory carry ETL-oriented names and structures. The visualization layer reloads from QVD into a model shaped for dashboard building.

Why a Separate Loading Phase?

Reason 1: Different Requirements — ETL optimizes for processing; dashboards optimize for query speed and readable field names.

Reason 2: Field Naming — business-friendly, prefixed names prevent synthetic keys and make expressions clear.

Reason 3: Selective Loading — load only the fields the dashboards actually use.

Reason 4: Multiple Apps — several dashboard apps can load from the same transformed QVDs.

Star Schema Design

What is Star Schema?

   Dimension                          Dimension
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Products   β”‚                    β”‚  Customers  β”‚
β”‚ ProductID◄──┼──────┐      β”Œβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β–ΊCustomerIDβ”‚
β”‚ Name        β”‚      β”‚      β”‚      β”‚ Name        β”‚
β”‚ Category    β”‚      β”‚      β”‚      β”‚ Segment     β”‚
β”‚ Brand       β”‚      β”‚      β”‚      β”‚ Country     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β”‚      β”‚      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                 β”Œβ”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”
                 β”‚    Sales    β”‚  ← Fact (Center)
                 β”‚ OrderID     β”‚
                 β”‚ ProductID   β”‚
                 β”‚ CustomerID  β”‚
                 β”‚ Revenue     β”‚
                 β”‚ Profit      β”‚
                 β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
                        β”‚
                 β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”
                 β”‚   Calendar   β”‚  ← Dimension
                 β”‚ CalendarDate β”‚
                 β”‚ Year         β”‚
                 β”‚ Quarter      β”‚
                 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Components:

  1. Fact Table (center): Sales transactions with metrics
  2. Dimension Tables (points): Descriptive attributes
  3. Relationships: Via key fields (ProductID, CustomerID, CalendarDate)

Benefits: intuitive for users, fast for aggregations, and easy to extend with new dimensions.

Design Decisions

Decision 1: Qualify Dimension Fields

Decision 2: Keep SCD Table Separate

Decision 3: Fully Qualify SCD Fields

Decision 4: Load Dimensions with Specific Fields

Implementation Steps

Step 10.1: Load Sales Fact Table

Create tab section: Load_Data_for_Viz

What to load:

Sales:
LOAD * FROM [$(vTargetPath)FACT_Sales_Complete.qvd] (qvd);

Table name: Sales (not FACT_Sales_Final)

Fields included: everything stored in FACT_Sales_Complete.qvd — keys, metrics, and classification fields.

This table is the center of the star schema.

Step 10.2: Load Products Dimension

With qualified fields:

Products:
LOAD 
    ProductID,
    ProductName as Dim_ProductName,
    Category as Dim_Category,
    SubCategory as Dim_SubCategory,
    Brand as Dim_Brand,
    UnitCost as Dim_UnitCost,
    ListPrice as Dim_ListPrice,
    Status,
    Product_LastModifiedDate,
    ProfitPerUnit as Dim_ProfitPerUnit,
    ProfitMarginPercent,
    PriceTier as Dim_PriceTier,
    IsActive
FROM [$(vTargetPath)DIM_Products_Transformed.qvd] (qvd);

Key points: ProductID keeps its original name (it is the join key); every descriptive field gets a Dim_ prefix.

Why rename? Fields with identical names in two tables would associate (or form synthetic keys); prefixing guarantees that only ProductID links.

Relationship: Products.ProductID ↔ Sales.ProductID.

Step 10.3: Load Customers Dimension

With qualified fields:

Customers:
LOAD 
    CustomerID,
    CustomerName as Dim_CustomerName,
    Country as Dim_Country,
    Region as Dim_Region,
    City as Dim_City,
    Segment as Dim_Segment,
    RegistrationDate,
    Customer_LastModifiedDate,
    CustomerTenureYears as Dim_CustomerTenureYears,
    LoyaltyTier as Dim_LoyaltyTier,
    MarketType as Dim_MarketType
FROM [$(vTargetPath)DIM_Customers_Transformed.qvd] (qvd);

Same pattern: CustomerID stays unprefixed as the key; descriptive fields get the Dim_ prefix.

Relationship: Customers.CustomerID ↔ Sales.CustomerID.

Step 10.4: Load Stores Dimension

Minimal changes needed:

Stores:
LOAD 
    StoreID,
    StoreName,
    Store_Country,
    Store_Region,
    Store_City,
    StoreType,
    OpeningDate,
    Store_LastModifiedDate
FROM [$(vTargetPath)DIM_Stores_Transformed.qvd] (qvd);

Why minimal changes? The store fields were already prefixed (Store_Country, Store_Region, ...) during extraction, so no further renaming is needed.

Usage: store attributes for store-level views; in the final data model this table stands alone, since it shares no field name with Sales.

Step 10.5: Load Master Calendar

Load all fields:

Calendar:
LOAD * FROM [$(vTargetPath)MasterCalendar.qvd] (qvd);

Why LOAD * here? Every calendar attribute is designed for analysis, and the field names are already unique to this table.

Relationship: the calendar links to Sales on the date key. Remember that Qlik associates on identical field names, so the date field must carry the same name in both tables (e.g. load CalendarDate as OrderDate if the fact table uses OrderDate).

Usage in dashboards: filter panes on Year/Quarter/Month, trend charts over the date, and flags like IsCurrentMonth in set analysis.

Step 10.6: Load Customer History (SCD)

Fully qualified to prevent linking:

Customer_History:
LOAD 
    CustomerID as History_CustomerID,
    CustomerName as History_CustomerName,
    Segment as History_Segment,
    Country as History_Country,
    Region as History_Region,
    City as History_City,
    LoyaltyTier as History_LoyaltyTier,
    EffectiveStartDate as History_StartDate,
    EffectiveEndDate as History_EndDate,
    IsCurrent as History_IsCurrent,
    VersionNumber as History_Version
FROM [$(vTargetPath)DIM_Customers_SCD.qvd] (qvd);

Critical: ALL fields are prefixed — including the key — so this table shares no field names with the rest of the model.

Why separate? The history table holds multiple rows per customer; linking it to Sales would multiply fact rows and distort totals.

Usage: dedicated customer-history sheets and audit views, queried on History_CustomerID.

Understanding the Final Data Model

Tables in memory:

  1. Sales (20 rows) - Fact table
  2. Products (10 rows) - Dimension
  3. Customers (10 rows) - Dimension
  4. Stores (5 rows) - Dimension
  5. Calendar (31 rows) - Dimension
  6. Customer_History (10 rows) - Standalone

Relationships:

Products.ProductID = Sales.ProductID
Customers.CustomerID = Sales.CustomerID
Calendar.CalendarDate = Sales.OrderDate

Standalone: Stores and Customer_History share no field names with the other tables, so they stay unlinked by design.

Field naming in model: unprefixed = key fields; Dim_ = dimension attributes; History_ = SCD fields; Store_ = store attributes.

Key Techniques Introduced

Technique 1: Selective Field Loading

LOAD 
    Field1,
    Field2,
    Field3 as RenamedField3
FROM file.qvd (qvd);

Technique 2: Consistent Prefixing

Dim_FieldName    ← Dimension fields
History_FieldName ← SCD fields
Store_FieldName   ← Store-specific fields

Technique 3: Strategic Key Field Handling

ProductID,           ← Never rename key fields
ProductName as Dim_ProductName  ← Rename descriptive fields

Technique 4: Intentional Relationship Breaking

CustomerID as History_CustomerID  ← Breaks link
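
Qlik links tables on every identically named field, so the association can be modeled as a set intersection. A Python sketch (field sets abbreviated from the steps above) shows why the full History_ prefix leaves Customer_History standalone:

```python
def association_keys(fields_a, fields_b):
    """Qlik associates two tables on the field names they share."""
    return set(fields_a) & set(fields_b)

# Abbreviated field lists from the visualization model
sales = {"OrderID", "ProductID", "CustomerID", "NetRevenue"}
customers = {"CustomerID", "Dim_CustomerName", "Dim_Segment"}
history = {"History_CustomerID", "History_Segment", "History_Version"}
```

association_keys(sales, customers) yields {'CustomerID'} — one clean link — while association_keys(sales, history) is empty, so the history table stays deliberately unlinked.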

Common Pitfalls

❌ Pitfall 1: Not renaming duplicate fields

LOAD 
    ProductID,
    ProductName,  ← Also in Sales!
    Category      ← Synthetic key created
FROM file.qvd (qvd);

❌ Pitfall 2: Renaming key fields unintentionally

LOAD 
    ProductID as Dim_ProductID,  ← WRONG!
    ...

❌ Pitfall 3: Inconsistent prefixes

Dim_ProductName
Prd_Category  ← Different prefix!
Product_Brand ← Yet another prefix!

❌ Pitfall 4: Loading unnecessary fields

LOAD * FROM file.qvd (qvd);  ← Loads everything

❌ Pitfall 5: Forgetting to load a table

← Missing: LOAD Calendar

❌ Pitfall 6: Wrong table names in visualization layer

FACT_Sales_Final:  ← ETL naming
LOAD ...

Validation Checklist

βœ… Script runs without errors
βœ… All 6 tables loaded successfully
βœ… Data Model Viewer shows:

Visual validation in Data Model Viewer:

Expected structure:

        Products
           β”‚ (ProductID)
      β”Œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”
      β”‚  Sales  │──── (OrderDate = CalendarDate) ──── Calendar
      β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜
           β”‚ (CustomerID)
        Customers

      Stores            Customer_History
      (standalone)      (standalone)

Test with filter:

  1. Create a filter pane
  2. Add Dim_Category
  3. Select "Smartphones"
  4. Sales table should filter to smartphone orders only
  5. Confirms relationship works

βœ… Testing & Validation

Complete ETL Validation

End-to-End Test:

  1. Source Files: Check all 5 QVDs exist in DataFiles folder
  2. Transformed Files: Check all 7 QVDs exist in Transformed folder
  3. Final Data Model: Open Data Model Viewer
  4. Synthetic Keys: Verify count is 0
  5. Row Counts: Verify all tables have expected rows
  6. Relationships: Verify all connections are correct
  7. Calculations: Spot-check metrics are accurate
  8. SCD: Verify version tracking works
  9. Incremental: Verify new records append correctly
  10. Calendar: Verify date attributes are correct

Validation Queries

Test 1: Revenue Calculation

Pick order ORD001:
- Quantity: 2
- ListPrice: 1299
- Discount: 0.05
- Expected GrossRevenue: 2 Γ— 1299 = 2598
- Expected NetRevenue: 2598 Γ— 0.95 = 2468.10
- Verify these values in Sales table
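
The arithmetic of Test 1 as executable assertions (a sketch assuming NetRevenue = GrossRevenue Γ— (1 βˆ’ Discount), as the expected values imply):

```python
quantity, list_price, discount = 2, 1299, 0.05

gross_revenue = quantity * list_price                   # 2 x 1299
net_revenue = round(gross_revenue * (1 - discount), 2)  # 2598 x 0.95
```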

Test 2: Join Integrity

Pick order ORD001:
- ProductID: P001
- Should have: ProductName = "iPhone 14 Pro"
- Should have: Category = "Electronics"
- CustomerID: C001
- Should have: CustomerName = "John Smith"
- Should have: Segment = "Corporate"

Test 3: Time Intelligence

Pick order on 2024-01-15:
- OrderYear should be: 2024
- OrderMonth should be: 1
- OrderQuarter should be: 1
- Match to Calendar table:
  - Year: 2024
  - Month: 1
  - Quarter: 1

Test 4: Business Classifications

Find order with >$1000 net revenue:
- OrderValueCategory should be: "High Value"
Find order with >10% discount:
- DiscountCategory should be: "Heavy Discount"

Test 5: Incremental Load

Initial load: 15 orders (ORD001-ORD015)
After incremental: 20 orders (ORD001-ORD020)
Check: No duplicates, all have ProcessedFlag = 1
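
Test 5's expectation, sketched in Python (illustrative; it assumes the incremental phase appends only OrderIDs not already loaded):

```python
def incremental_append(existing_orders, feed_orders):
    """Append only unseen OrderIDs, so a rerun never creates duplicates."""
    seen = {o["OrderID"] for o in existing_orders}
    combined = list(existing_orders)
    for order in feed_orders:
        if order["OrderID"] not in seen:
            combined.append(dict(order, ProcessedFlag=1))
            seen.add(order["OrderID"])
    return combined
```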

Test 6: SCD Versioning

Initial load:
- 10 customers
- All VersionNumber = 1
- All IsCurrent = 1

After change (simulate):
- Edit source: Change C001 Segment to "Consumer"
- Reload
- Check C001 has 2 versions:
  - Ver 1: Corporate, IsCurrent=0, EndDate=Yesterday
  - Ver 2: Consumer, IsCurrent=1, EndDate=9999-12-31

Performance Validation

Load Time Targets:

Memory Usage:

Data Quality Checks

Completeness:

No missing keys — every order row finds a product, a customer, and a calendar date.

Accuracy:

Spot-check calculated metrics against manual arithmetic (see Test 1).

Consistency:

Classification fields (LoyaltyTier, PriceTier, OrderValueCategory) follow the documented rules on every row.

Common Issues and Solutions

Issue 1: Synthetic Keys — two tables share more than one field name. Rename the non-key duplicates (e.g. with a Dim_ prefix).

Issue 2: Wrong Row Counts — usually an unintended auto-concatenation or a join that multiplied rows. Add NoConcatenate and check counts after each step.

Issue 3: NULL Values in Calculations — a fact key with no matching dimension row, or arithmetic on a null field. Verify join keys and supply defaults where needed.

Issue 4: Dates Not Sorting — dates loaded as text. Interpret with Date#() and format with Date().

Issue 5: Incremental Loading All Data — the WHERE filter on new records is missing or its variable doesn't expand. Check the $(variable) expansion in the incremental tab.


πŸŽ“ Key Learnings Summary

ETL Best Practices Learned

  1. Separate Concerns: Extract, Transform, Load are distinct phases
  2. Use Variables: Make code maintainable and flexible
  3. Save Progressively: Store to QVD after each major transformation
  4. Name Meaningfully: Clear naming prevents confusion and synthetic keys
  5. Document Business Logic: Comments explain WHY, not just WHAT
  6. Test Incrementally: Validate each phase before proceeding
  7. Think in Layers: Source β†’ Transformed β†’ Visualization
  8. Optimize for Purpose: ETL different from visualization needs

Qlik-Specific Techniques Mastered

  1. Variable Declaration: SET vs LET
  2. Data Loading: LOAD from QVD, CSV, INLINE, RESIDENT
  3. Transformations: Calculated fields, IF statements, functions
  4. Joins: LEFT JOIN, understanding join keys
  5. Concatenation: Auto vs explicit, NoConcatenate
  6. Data Generation: AUTOGENERATE with WHILE
  7. Control Flow: IF-THEN-ELSE, WHERE filtering
  8. File Operations: STORE to QVD, DROP TABLE
  9. Functions: Date, string, math, aggregate
  10. Data Modeling: Star schema, relationship management

Business Intelligence Skills Developed

  1. Requirements Translation: Business ask β†’ Technical specification
  2. Metric Definition: Revenue, profit, margin calculations
  3. Data Quality: Validation, completeness checks
  4. Historical Tracking: SCD Type 2 implementation
  5. Time Intelligence: Calendar creation, fiscal year logic
  6. Performance Optimization: QVD usage, incremental loading
  7. Documentation: Clear comments, logical structure
  8. Problem Solving: Debugging, troubleshooting

πŸš€ Next Steps

You're Ready For:

  1. Dashboard Creation: Build visualizations using the clean data model
  2. Advanced Analytics: Implement complex KPIs and calculations
  3. Production Deployment: Move solution to server environment
  4. Maintenance: Handle ongoing data updates and changes
  5. Expansion: Add new data sources and dimensions
  6. Optimization: Fine-tune performance for larger datasets
  7. Training Others: Teach colleagues these techniques

Recommended Follow-Up:

  1. Build Dashboards: Create the visualizations mentioned in business requirements
  2. Test SCD: Simulate customer changes, verify version tracking
  3. Optimize: Profile performance, identify bottlenecks
  4. Scale Up: Test with larger datasets (1000s, 10000s of rows)
  5. Add Features: Implement additional business rules
  6. Document: Create user guide for dashboard consumers

πŸ“š Appendix: Quick Reference

Command Quick Reference

Load Data:

LOAD Field1, Field2 FROM [file.qvd] (qvd);
LOAD Field1, Field2 RESIDENT TableName;
LOAD * INLINE [ ... ];

Transform:

Field1 as NewName
Field1 * Field2 as Calculated
If(condition, true_result, false_result) as ConditionalField

Join:

LEFT JOIN (TargetTable) LOAD ... RESIDENT SourceTable;

Concatenate:

CONCATENATE (TargetTable) LOAD ... RESIDENT SourceTable;
NoConcatenate TableName: LOAD ...

Save/Drop:

STORE TableName INTO [file.qvd] (qvd);
DROP TABLE TableName;

Control:

IF condition THEN ... ELSE ... ENDIF
WHERE condition

Function Quick Reference

Date:

Today(), Date(), Year(), Month(), Day(), Week(), WeekDay(), WeekName(), MonthName()

Math:

Ceil()

String:

& (concatenation)

Aggregate:

Min(), Max()

Special:

Peek(), FileSize(), NoOfRows(), IterNo()