Time Series Decomposition and Autocorrelation Function in Power BI with Python

Dynamic Time Series Decomposition with Python in Power BI

OVERVIEW

Functionality

Allows end users to dynamically explore their data for seasonality and trends based on any attribute/metric afforded to them in their Power BI report

Features

Decompose and explore data dynamically based on different seasonality grains (daily, monthly, quarterly, etc...)
Robust python scripting allows for local development as well as ingestion of metrics indifferent of naming conventions
Allows Users to review cross sectional data by slicing and filtering other metrics/visuals on their report
Time Series Decomposition includes R^2 values
Dynamic visual headers, x-axis, etc...
And more!

TECHNICAL DEFINITIONS

Time Series Decomposition (TSD): is a method for breaking down a time series into its key components—trend, seasonality, and residuals—to better understand its underlying patterns and make more accurate forecasts. Further Reading: Forecasting: Principles and Practice, Ch3. Time Series Decomposition
Autocorrelation Function (ACF): measures the correlation between a time series and its lagged values, capturing the persistence of patterns over time. Further Reading: Forecasting: Principles and Practice, 2.8 Autocorrelation
Partial Autocorrelation Function (PACF): quantifies the correlation between a time series and its lagged values after accounting for the influence of intermediate lags, isolating the direct relationships. Further Reading: Forecasting: Principles and Practice, 9.5 Non-seasonal ARIMA models

IMPLEMENTATION

Power BI

Create a new field parameter based on a calendar table that has the needed dates as a time series. For example on your calendar table, you will need a calendar[First Day of Quarter] column which repeats the first day of the quarter for every quarter in your date table. Then UPDATE your new field parameter table with the DAX script below. Once complete, rename the fourth column to be: Seasonality[Seasonal Value].

Microsoft, Field Parameters
Analytical Ants, Custom Date Table (TBD)
New Seasonality table DAX:

Seasonality = {
    ("Yearly", NAMEOF('Calendar'[First Day of Year]), 0 , 1),
    ("Quarterly", NAMEOF('Calendar'[First Day of Quarter]), 1 , 4),
    ("Monthly", NAMEOF('Calendar'[First Day of Month]), 2 , 12),
    ("Weekly", NAMEOF('Calendar'[First Day of Week]), 3 , 7) ,
    ("Daily", NAMEOF('Calendar'[Date]), 4 , 365)
}

You will need to create a new measure for your seasonality table

Selected Seasonality = SELECTEDVALUE( Seasonality[Seasonal Value] , 12 )

Now, the column Seasonality[Seasonality] will be used as a filter on your report.

Next, you will bring in the following column/measures into a python visual: Seasonality[Seasonality], [Your Measure], [Selected Seasonality]

Lastly, you will filter your visual so that [Your Measure] is not blank.

Below details the python scripting you will insert into each visual

Python

Insert the following python scripts into your python visual. First:

Follow the prerequisites for working with Python in PowerBI (Microsoft, Python in Power BI Desktop)
Load the following modules to your python environment:

pip install pandas matplotlib numpy statsmodels

Time Series Decomposition (TSD)

Monthly Time Series Decomposition — Time Series Decomposition, Monthly

Quarterly Time Series Decomposition — Time Series Decomposition, Quarterly

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.seasonal import seasonal_decompose, STL
from statsmodels.tools.sm_exceptions import ValueWarning
import warnings
import numpy as np
import os

'''
In Power BI, assume the dataset looks like:
     0 - Date Column
     1 - Data Column
     2 - Selected Seasonality Column
'''
try:
    # If 'dataset' exists, we're likely in the Power BI environment
    df = dataset
except NameError:
    # If dataset does not exist, then run locally
    script_dir = os.getcwd()  # current working directory
    file_name = "Monthly data.csv"
    file_path = os.path.join(script_dir, file_name)
    df = pd.read_csv(file_path)

# Identify column names
date_column = df.columns[0]          # e.g., "Monthly" or "Quarterly" date strings
data_column = df.columns[1]          # e.g., "SUM Data"
seasonality_column = df.columns[2]   # e.g., "Selected Seasonality"

# Clean Data
df[date_column] = pd.to_datetime(df[date_column]) # Convert date column to datetime
df.sort_values(date_column, inplace=True) # Sort by date just in case
df.set_index(date_column, inplace=True) # Set index to date column
# df[data_column] = df[data_column].fillna(method='ffill').fillna(method='bfill') # We can fill missing data if needed; your data might not have NA though
df[data_column] = df[data_column].fillna(df[data_column].mean())

# Grab the first "Selected Seasonality" from the dataset
selected_seasonality = df[seasonality_column].iloc[0]

# Dynamic Formatting
visual_width = 800  # Default width in pixels
visual_height = 600  # Default height in pixels
try:
    if 'width' in dataset.columns and 'height' in dataset.columns:
        visual_width = dataset['width'].iloc[0]
        visual_height = dataset['height'].iloc[0]
except NameError:
    visual_width = 800
    visual_height = 600

# Aspect ratio scaling
aspect_ratio = visual_width / visual_height
base_size = min(visual_width, visual_height) / 100  # Scale based on smaller dimension
figsize = (base_size * aspect_ratio, base_size)

# Dynamic font scaling
font_scale = base_size / 8
plt.rcParams.update({'font.size': 10 * font_scale})

## ---------------------  TDS ---------------------

# Choose decomposition method based on data availability
if len(df) >= 2 * selected_seasonality:  # Check if we have at least two cycles
    result = seasonal_decompose(df[data_column], model='additive', period=selected_seasonality)
else:
    print("Using STL for decomposition due to insufficient cycles for traditional method.")
    result = STL(df[data_column], period=selected_seasonality).fit()

# Calculate R-squared
y = df[data_column]
y_hat = result.trend + result.seasonal
r_squared = 1 - (np.sum((y - y_hat)**2) / np.sum((y - y.mean())**2))

# Plotting
fig, axes = plt.subplots(4, 1, figsize=figsize, gridspec_kw={'height_ratios': [1.5, 1, 1, 1]})

# Information for the first visual
x_axis_label = date_column  # Replace this with the selected X-axis granularity
title_info = f"Time Series Decomposition (TSD)\n X Axis = {x_axis_label} | R² = {r_squared:.4f}"

# Original data
axes[0].set_title(f'{title_info}\n Original Data')
result.observed.plot(ax=axes[0])
axes[0].set_ylabel(data_column)

# Trend component
axes[1].set_title('Trend')
result.trend.plot(ax=axes[1])
axes[1].set_ylabel('Trend')

# Seasonal component
axes[2].set_title('Seasonality')
result.seasonal.plot(ax=axes[2])
axes[2].set_ylabel('Seasonality')

# Residual component
axes[3].set_title('Residuals')
result.resid.plot(ax=axes[3])
axes[3].set_ylabel('Residuals')

for ax in axes:
    ax.set_xlabel("")

plt.tight_layout()
plt.show()

Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF)

Monthly Autocorrelation — Autocorrelation Function, Monthly

Quarterly Autocorrelation — Autocorrelation Function, Quarterly

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.seasonal import seasonal_decompose, STL
from statsmodels.tools.sm_exceptions import ValueWarning
import warnings
import numpy as np
import os


'''
In Power BI, assume the dataset looks like:
     0 - Date Column
     1 - Data Column
     2 - Selected Seasonality Column
'''
try:
    # If 'dataset' exists, we're likely in the Power BI environment
    df = dataset
except NameError:
    # If dataset does not exist, then run locally
    script_dir = os.getcwd()  # current working directory
    file_name = "Monthly data.csv"
    file_path = os.path.join(script_dir, file_name)
    df = pd.read_csv(file_path)

# Identify column names
date_column = df.columns[0]          # e.g., "Monthly" or "Quarterly" date strings
data_column = df.columns[1]          # e.g., "SUM Data"
seasonality_column = df.columns[2]   # e.g., "Selected Seasonality"

# Clean Data
df[date_column] = pd.to_datetime(df[date_column]) # Convert date column to datetime
df.sort_values(date_column, inplace=True) # Sort by date just in case
df.set_index(date_column, inplace=True) # Set index to date column
# df[data_column] = df[data_column].fillna(method='ffill').fillna(method='bfill') # We can fill missing data if needed; your data might not have NA though
df[data_column] = df[data_column].fillna(df[data_column].mean())

# Grab the first "Selected Seasonality" from the dataset
selected_seasonality = df[seasonality_column].iloc[0]

# Dynamic Formatting
visual_width = 800  # Default width in pixels
visual_height = 600  # Default height in pixels
try:
    if 'width' in dataset.columns and 'height' in dataset.columns:
        visual_width = dataset['width'].iloc[0]
        visual_height = dataset['height'].iloc[0]
except NameError:
    visual_width = 800
    visual_height = 600

# Aspect ratio scaling
aspect_ratio = visual_width / visual_height
base_size = min(visual_width, visual_height) / 100  # Scale based on smaller dimension
figsize = (base_size * aspect_ratio, base_size)

# Dynamic font scaling
font_scale = base_size / 8
plt.rcParams.update({'font.size': 10 * font_scale})

## ---------------------  ACF ---------------------

# Set relevant lags
if selected_seasonality == 365:
    frequency = 'Daily'
    max_lags = min(len(df)-1, 365)  # Last Year
elif selected_seasonality == 7:
    frequency = 'Weekly'
    max_lags = min(len(df)-1, 26)  # maybe ~6 months of weekly lags or up to 26
elif selected_seasonality == 12:
    frequency = 'Monthly'
    max_lags = min(len(df)-1, 12)  # up to 12 lags (1 year of monthly)
elif selected_seasonality == 4:
    frequency = 'Quarterly'
    max_lags = min(len(df)-1, 4)   # up to 4 lags (1 year of quarterly)
elif selected_seasonality == 1:
    frequency = 'Yearly'
    max_lags = 1
else:
    frequency = 'Unknown'
    max_lags = min(len(df)-1, 12)  # fallback

# Plot ACF and PACF
if len(df) > 1:
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=figsize ) #figsize=(15, 10))
    try:
        plot_acf(df[data_column], lags=max_lags, ax=ax1)
        ax1.set_title(f'Autocorrelation Function (ACF)\n Frequency: {frequency}, Lags: {max_lags}')

        plot_pacf(df[data_column], lags=max_lags, method='ywm', ax=ax2)
        ax2.set_title(f'Partial Autocorrelation Function (PACF)\n Frequency: {frequency}, Lags: {max_lags}')
    except ValueError as e:
        ax1.set_title(f'ACF Error: {e}')
        ax2.set_title(f'PACF Error: {e}')
else:
    print("Not enough data to calculate ACF or PACF.")

plt.tight_layout()
plt.show()

Power BI Environment

A premium license of some sort (e.g., Pro, PPU, Premium Capacity, etc...) is required to display python visuals in the service. More information: Microsoft, Python Power BI Visuals - Licensing

Now your done! Your data is ready to be sliced, diced, and scrambled (sorry, couldn't resist the pun 😂)!

BONUS

Excel

Here is a cool excel formula that will:

Create a upward trend in year one and a downward trend in year two
Imbed quarterly seasonality to the data

To implement, simply create a time series of dates (daily grain) in column A, and use this to create your data in column B.

=100 + SIN(RADIANS(MONTH(A2)*30)) * 20 + (RAND() * 10 - 5)

DOWNLOADABLE FILES:

Source code and files are found on our GitHub Repo: Analytical Ants, Time Series Decomposition and Autocorrelation Function in Power BI with Python

ABOUT US

Data Engineering, Data Science, and Data Analysis Contact us for your next project! Follow & subscribe for more!