API Reference

This section contains the complete API reference for PyIndexNum.

Core Modules

pyindexnum.utils

Utility functions for the PyIndexNum library.

This module contains helper functions for data processing and calculations used throughout the library.

pyindexnum.utils.aggregate_time(df: DataFrame, date_col: str = 'date', price_col: str = 'price', quantity_col: str | None = None, id_col: str = 'product_id', agg_type: Literal['arithmetic', 'geometric', 'harmonic', 'weighted_arithmetic', 'weighted_geometric', 'weighted_harmonic'] = 'arithmetic', freq: str = '1mo') → DataFrame

Aggregate time series data to a specified frequency.

This function aggregates price and quantity data by grouping on the id column and truncated date periods. Prices are aggregated according to the specified aggregation type, while quantities are always summed if provided.

Parameters:

df – Input polars DataFrame containing the data.
date_col – Name of the column containing dates. Will be parsed to datetime if string.
price_col – Name of the column containing prices. Must be numeric.
quantity_col – Name of the column containing quantities. If None, quantity aggregation is skipped.
id_col – Name of the column containing unique identifiers (e.g., product IDs).
agg_type – Type of aggregation for prices. Options:

‘arithmetic’: Arithmetic mean
‘geometric’: Geometric mean
‘harmonic’: Harmonic mean
‘weighted_arithmetic’: Weighted arithmetic mean (requires quantity_col)
‘weighted_geometric’: Weighted geometric mean (requires quantity_col)
‘weighted_harmonic’: Weighted harmonic mean (requires quantity_col)

freq – Frequency for aggregation (e.g., ‘1d’, ‘1w’, ‘1mo’, ‘1q’, ‘1y’).

Returns:

Aggregated DataFrame with columns – id_col, period (truncated date), aggregated_price, aggregated_quantity (if quantity_col provided).

Raises:

ValueError – If weighted aggregation type is selected but quantity_col is None.
ValueError – If required columns are missing or have wrong types.

Examples

>>> df = pl.DataFrame({
...     "date": ["2023-01-01", "2023-01-15", "2023-02-01"],
...     "product": ["A", "A", "A"],
...     "price": [100, 110, 120],
...     "quantity": [10, 12, 15]
... })
>>> result = aggregate_time(df, "date", "price", "quantity", "product", "arithmetic", "1mo")
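As a rough illustration of how the unweighted aggregation types differ, the sketch below applies the three means to the two January prices from the example above (100 and 110). It uses only Python's standard statistics module and is not PyIndexNum's implementation; the variable names are illustrative.

```python
from statistics import fmean, geometric_mean, harmonic_mean

# January prices for product "A" from the example above.
january_prices = [100, 110]

arithmetic = fmean(january_prices)          # (100 + 110) / 2
geometric = geometric_mean(january_prices)  # sqrt(100 * 110)
harmonic = harmonic_mean(january_prices)    # 2 / (1/100 + 1/110)

# For positive prices the three means are ordered:
# harmonic <= geometric <= arithmetic.
print(arithmetic, geometric, harmonic)
```

The gap between the three means grows with the spread of prices inside a period, so the choice of agg_type matters most for volatile products.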

pyindexnum.utils.carry_backward_imputation(df: DataFrame, value_cols: list[str], id_col: str = 'product_id', time_col: str = 'period') → DataFrame

Create balanced panel and fill missing values using backward imputation.

This function creates a balanced panel dataset by generating all possible combinations of product IDs and time periods, then fills missing values by carrying backward the first future observation for each product.

Parameters:

df – Input polars DataFrame with aggregated data (may be unbalanced).
value_cols – List of column names to impute (e.g., [“aggregated_price”, “aggregated_quantity”]).
id_col – Name of the column containing unique identifiers (default “product_id”).
time_col – Name of the column containing time periods (default “period”).

Returns:

Balanced DataFrame with all product-period combinations and nulls filled using backward imputation.

Raises:

ValueError – If required columns are missing from the DataFrame.

Examples

>>> from datetime import date
>>> df = pl.DataFrame({
...     "product_id": ["A", "A", "B"],
...     "period": [date(2023, 1, 1), date(2023, 2, 1), date(2023, 1, 1)],
...     "aggregated_price": [100.0, 110.0, 200.0]
... })
>>> result = carry_backward_imputation(df, ["aggregated_price"])
>>> # Creates balanced panel: A in both periods, B in both periods
>>> # A: 100.0, 110.0; B: 200.0, null (B has no later observation to carry backward)

pyindexnum.utils.carry_forward_imputation(df: DataFrame, value_cols: list[str], id_col: str = 'product_id', time_col: str = 'period') → DataFrame

Create balanced panel and fill missing values using forward imputation.

This function creates a balanced panel dataset by generating all possible combinations of product IDs and time periods, then fills missing values by carrying forward the last available observation for each product.

Parameters:

df – Input polars DataFrame with aggregated data (may be unbalanced).
value_cols – List of column names to impute (e.g., [“aggregated_price”, “aggregated_quantity”]).
id_col – Name of the column containing unique identifiers (default “product_id”).
time_col – Name of the column containing time periods (default “period”).

Returns:

Balanced DataFrame with all product-period combinations and nulls filled using forward imputation.

Raises:

ValueError – If required columns are missing from the DataFrame.

Examples

>>> from datetime import date
>>> df = pl.DataFrame({
...     "product_id": ["A", "A", "B"],
...     "period": [date(2023, 1, 1), date(2023, 2, 1), date(2023, 1, 1)],
...     "aggregated_price": [100.0, 110.0, 200.0]
... })
>>> result = carry_forward_imputation(df, ["aggregated_price"])
>>> # Creates balanced panel: A in both periods, B in both periods
>>> # A: 100.0, 110.0; B: 200.0, 200.0 (forward filled)
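The two steps described above (balance the panel, then fill gaps) can be sketched in plain Python. This is a minimal illustration under assumed names and integer period labels, not the library's polars-based implementation.

```python
from itertools import product

# Unbalanced panel: product "B" has no observation for period 2.
observed = {("A", 1): 100.0, ("A", 2): 110.0, ("B", 1): 200.0}

ids = sorted({pid for pid, _ in observed})
periods = sorted({t for _, t in observed})

# Step 1: balance the panel; missing product-period pairs become None.
panel = {key: observed.get(key) for key in product(ids, periods)}

# Step 2: carry the last available observation forward within each product.
for pid in ids:
    last_seen = None
    for t in periods:
        if panel[(pid, t)] is None:
            panel[(pid, t)] = last_seen
        else:
            last_seen = panel[(pid, t)]

print(panel)
```

Backward imputation is the same loop run over the periods in reverse order. In either direction, a gap with no observation on the relevant side stays missing.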

pyindexnum.utils.geometric_mean_expr(col: str) → Expr

Compute geometric mean of a column using polars expressions.

Zero and negative values are treated as invalid and excluded from the calculation. If any invalid values are present, the expression returns null.

Parameters:

col – Column name to compute geometric mean for.

Returns:

Polars expression for geometric mean.

pyindexnum.utils.get_summary(df: DataFrame) → dict

Get summary information about a standardized price index DataFrame.

This function provides key statistics about a DataFrame that has been standardized using standardize_columns().

Parameters:

df – Polars DataFrame with standardized columns (date, price, product_id, quantity).

Returns:

Dictionary containing –

n_products: Number of unique product IDs
start_date: Earliest date in the data
end_date: Latest date in the data
quantity: Boolean indicating if quantity column exists and has non-null values

pyindexnum.utils.harmonic_mean_expr(col: str) → Expr

Compute harmonic mean of a column using polars expressions.

Zero and negative values are treated as invalid and excluded from the calculation. If any invalid values are present, the expression returns null.

Parameters:

col – Column name to compute harmonic mean for.

Returns:

Polars expression for harmonic mean.

pyindexnum.utils.remove_unbalanced(df: DataFrame) → DataFrame

Remove products that are not present in all time periods.

This function filters out any product_id that does not appear in every unique time period in the dataset, resulting in a balanced panel dataset. It assumes the DataFrame has been processed by standardize_columns and aggregate_time, with columns “product_id” and “period”.

Parameters:

df – Polars DataFrame with columns “product_id” and “period” (from aggregate_time).

Returns:

Filtered DataFrame containing only products present in all periods.

Raises:

ValueError – If required columns “product_id” or “period” are missing.

Examples

>>> from datetime import date
>>> df = pl.DataFrame({
...     "product_id": ["A", "A", "B", "B", "C"],
...     "period": [date(2023, 1, 1), date(2023, 2, 1), date(2023, 1, 1), date(2023, 2, 1), date(2023, 1, 1)],
...     "aggregated_price": [100, 110, 200, 210, 300]
... })
>>> result = remove_unbalanced(df)
>>> # Only product "C" is removed as it's missing period 2023-02-01
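The filtering rule can be sketched without polars: collect the periods each product appears in and keep only products whose set equals the set of all periods. The tuple layout and names below are assumptions made for illustration.

```python
# (product_id, period, aggregated_price) rows mirroring the example above.
rows = [
    ("A", 1, 100), ("A", 2, 110),
    ("B", 1, 200), ("B", 2, 210),
    ("C", 1, 300),
]

all_periods = {period for _, period, _ in rows}

# Collect the set of periods in which each product is observed.
periods_by_product = {}
for pid, period, _ in rows:
    periods_by_product.setdefault(pid, set()).add(period)

# Keep only products observed in every period.
complete = {pid for pid, seen in periods_by_product.items() if seen == all_periods}
balanced = [row for row in rows if row[0] in complete]

print(balanced)
```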

pyindexnum.utils.standardize_columns(df: DataFrame, date_col: str = 'date', price_col: str = 'price', id_col: str = 'product_id', quantity_col: str | None = None, date_format: str = '%Y-%m-%d') → DataFrame

Standardize column names and types for price index calculations.

This function selects specified columns, renames them to standard nomenclature, converts the date column to Date type, validates numeric types, and filters out rows where quantity is zero if quantity column is provided.

Parameters:

df – Input polars DataFrame.
date_col – Name of the date column in input DataFrame (default “date”).
price_col – Name of the price column in input DataFrame (default “price”).
id_col – Name of the product ID column in input DataFrame (default “product_id”).
quantity_col – Name of the quantity column in input DataFrame (default None).
date_format – Format string for parsing date column (default “%Y-%m-%d”).

Returns:

DataFrame with standardized columns – “date” (Date), “price” (numeric), “product_id”, “quantity” (numeric, if provided).

Raises:

ValueError – If required columns are missing or have invalid types.

pyindexnum.utils.weighted_arithmetic_mean_expr(price_col: str, weight_col: str) → Expr

Compute weighted arithmetic mean using polars expressions.

Parameters:

price_col – Column name for values to average.
weight_col – Column name for weights.

Returns:

Polars expression for weighted arithmetic mean.

pyindexnum.utils.weighted_geometric_mean_expr(price_col: str, weight_col: str) → Expr

Compute weighted geometric mean using polars expressions.

Zero and negative values are treated as invalid and excluded from the calculation. If any invalid values are present, the expression returns null.

Parameters:

price_col – Column name for values to average.
weight_col – Column name for weights.

Returns:

Polars expression for weighted geometric mean.

pyindexnum.utils.weighted_harmonic_mean_expr(price_col: str, weight_col: str) → Expr

Compute weighted harmonic mean using polars expressions.

Zero and negative values are treated as invalid and excluded from the calculation. If any invalid values are present, the expression returns null.

Parameters:

price_col – Column name for values to average.
weight_col – Column name for weights.

Returns:

Polars expression for weighted harmonic mean.
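For reference, the three weighted means can be written as plain Python functions. This is a sketch of the textbook formulas with hypothetical function names, assuming strictly positive values and weights (for example quantities); it does not reproduce the library's polars expressions or their null handling.

```python
import math

def weighted_arithmetic_mean(values, weights):
    # sum(w * v) / sum(w)
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)

def weighted_geometric_mean(values, weights):
    # exp of the weighted mean of logs; requires strictly positive values.
    log_sum = sum(w * math.log(v) for v, w in zip(values, weights))
    return math.exp(log_sum / sum(weights))

def weighted_harmonic_mean(values, weights):
    # sum(w) / sum(w / v); requires non-zero values.
    return sum(weights) / sum(w / v for v, w in zip(values, weights))

prices = [100.0, 110.0]
quantities = [10.0, 12.0]

wam = weighted_arithmetic_mean(prices, quantities)
wgm = weighted_geometric_mean(prices, quantities)
whm = weighted_harmonic_mean(prices, quantities)
print(wam, wgm, whm)
```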

pyindexnum.bilateral

Bilateral price index functions for the PyIndexNum library.

This module contains functions for calculating bilateral price indices, both unweighted (Carli, Dutot, Jevons) and weighted (Laspeyres, Paasche, Fisher, Törnqvist, Walsh), that compare prices between two time periods.

pyindexnum.bilateral.carli(df: DataFrame) → float

Compute the Carli price index (arithmetic mean of price relatives).

The Carli index is calculated as the arithmetic mean of the price relatives (current price / base price) for each product.

Parameters:

df – Polars DataFrame with standardized columns (“date”, “price”, “product_id”) containing data for exactly two periods, with each product having exactly one price per period.

Returns:

The Carli price index as a float.

Raises:

ValueError – If DataFrame doesn’t have exactly two unique dates, or if any product has multiple prices per period, or if products differ between periods, or if price relatives contain negatives.

Examples

>>> import polars as pl
>>> df = pl.DataFrame({
...     "date": ["2023-01-01", "2023-01-01", "2023-02-01", "2023-02-01"],
...     "product_id": ["A", "B", "A", "B"],
...     "price": [100, 200, 110, 190]
... })
>>> carli(df)
1.02...

pyindexnum.bilateral.dutot(df: DataFrame) → float

Compute the Dutot price index (ratio of arithmetic means).

The Dutot index is calculated as the ratio of the arithmetic mean of prices in the current period to the arithmetic mean of prices in the base period.

Parameters:

df – Polars DataFrame with standardized columns (“date”, “price”, “product_id”) containing data for exactly two periods, with each product having exactly one price per period.

Returns:

The Dutot price index as a float.

Raises:

ValueError – If DataFrame doesn’t have exactly two unique dates, or if any product has multiple prices per period, or if products differ between periods, or if prices contain negatives.

Examples

>>> import polars as pl
>>> df = pl.DataFrame({
...     "date": ["2023-01-01", "2023-01-01", "2023-02-01", "2023-02-01"],
...     "product_id": ["A", "B", "A", "B"],
...     "price": [100, 200, 110, 190]
... })
>>> dutot(df)
1.0

pyindexnum.bilateral.fisher(df: DataFrame) → float

Compute the Fisher price index.

The Fisher index is the geometric mean of the Laspeyres and Paasche indices, designed to satisfy the time reversal and factor reversal tests.

Formula: sqrt(Laspeyres * Paasche)

Parameters:

df – Polars DataFrame with standardized columns (“date”, “price”, “product_id”, “quantity”) containing data for exactly two periods, with each product having exactly one price and quantity per period.

Returns:

The Fisher price index as a float.

Raises:

ValueError – If DataFrame doesn’t have exactly two unique dates, or if any product has multiple prices/quantities per period, or if products differ between periods, or if prices or quantities contain negatives or zeros.

Examples

>>> import polars as pl
>>> df = pl.DataFrame({
...     "date": ["2023-01-01", "2023-01-01", "2023-02-01", "2023-02-01"],
...     "product_id": ["A", "B", "A", "B"],
...     "price": [100, 200, 110, 190],
...     "quantity": [10, 20, 10, 20]
... })
>>> fisher(df)
0.98

pyindexnum.bilateral.jevons(df: DataFrame) → float

Compute the Jevons price index (geometric mean of price relatives).

The Jevons index is calculated as the geometric mean of the price relatives (current price / base price) for each product.

Parameters:

df – Polars DataFrame with standardized columns (“date”, “price”, “product_id”) containing data for exactly two periods, with each product having exactly one price per period.

Returns:

The Jevons price index as a float.

Raises:

ValueError – If DataFrame doesn’t have exactly two unique dates, or if any product has multiple prices per period, or if products differ between periods, or if price relatives contain zeros or negatives.

Examples

>>> import polars as pl
>>> df = pl.DataFrame({
...     "date": ["2023-01-01", "2023-01-01", "2023-02-01", "2023-02-01"],
...     "product_id": ["A", "B", "A", "B"],
...     "price": [100, 200, 110, 190]
... })
>>> jevons(df)
1.02...
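To make the definitions of the three unweighted indices (Carli, Dutot, Jevons) concrete, the sketch below evaluates them by hand on the shared example data, taking 2023-01-01 as the base period. It is plain Python written for illustration and does not call PyIndexNum.

```python
import math

base = {"A": 100, "B": 200}     # prices on 2023-01-01
current = {"A": 110, "B": 190}  # prices on 2023-02-01

# Price relatives p_t / p_0 per product: 1.1 for A, 0.95 for B.
relatives = [current[k] / base[k] for k in base]
n = len(relatives)

# Carli: arithmetic mean of price relatives (about 1.025).
carli = sum(relatives) / n

# Dutot: ratio of arithmetic mean prices (150 / 150 = 1.0).
dutot = (sum(current.values()) / n) / (sum(base.values()) / n)

# Jevons: geometric mean of price relatives (about 1.0223).
jevons = math.exp(sum(math.log(r) for r in relatives) / n)

print(carli, dutot, jevons)
```

The Jevons value never exceeds the Carli value because a geometric mean is bounded above by the arithmetic mean of the same relatives.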

pyindexnum.bilateral.laspeyres(df: DataFrame) → float

Compute the Laspeyres price index.

The Laspeyres index is calculated as the ratio of the cost of the basket in the current period using base period quantities to the cost of the basket in the base period.

Formula: sum(p_t * q_0) / sum(p_0 * q_0)

Parameters:

df – Polars DataFrame with standardized columns (“date”, “price”, “product_id”, “quantity”) containing data for exactly two periods, with each product having exactly one price and quantity per period.

Returns:

The Laspeyres price index as a float.

Raises:

ValueError – If DataFrame doesn’t have exactly two unique dates, or if any product has multiple prices/quantities per period, or if products differ between periods, or if prices or quantities contain negatives or zeros.

Examples

>>> import polars as pl
>>> df = pl.DataFrame({
...     "date": ["2023-01-01", "2023-01-01", "2023-02-01", "2023-02-01"],
...     "product_id": ["A", "B", "A", "B"],
...     "price": [100, 200, 110, 190],
...     "quantity": [10, 20, 10, 20]
... })
>>> laspeyres(df)
0.98

pyindexnum.bilateral.paasche(df: DataFrame) → float

Compute the Paasche price index.

The Paasche index is calculated as the ratio of the cost of the basket in the current period using current period quantities to the cost of the basket in the base period using current period quantities.

Formula: sum(p_t * q_t) / sum(p_0 * q_t)

Parameters:

df – Polars DataFrame with standardized columns (“date”, “price”, “product_id”, “quantity”) containing data for exactly two periods, with each product having exactly one price and quantity per period.

Returns:

The Paasche price index as a float.

Raises:

ValueError – If DataFrame doesn’t have exactly two unique dates, or if any product has multiple prices/quantities per period, or if products differ between periods, or if prices or quantities contain negatives or zeros.

Examples

>>> import polars as pl
>>> df = pl.DataFrame({
...     "date": ["2023-01-01", "2023-01-01", "2023-02-01", "2023-02-01"],
...     "product_id": ["A", "B", "A", "B"],
...     "price": [100, 200, 110, 190],
...     "quantity": [10, 20, 15, 25]
... })
>>> paasche(df)
0.98...
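The Laspeyres, Paasche and Fisher formulas differ only in which basket they price. The following hand calculation on the Paasche example data is a minimal sketch in plain Python (the helper basket_cost is hypothetical), with 2023-01-01 as the base period.

```python
import math

p0 = {"A": 100, "B": 200}  # base-period prices
q0 = {"A": 10, "B": 20}    # base-period quantities
pt = {"A": 110, "B": 190}  # current-period prices
qt = {"A": 15, "B": 25}    # current-period quantities

def basket_cost(prices, quantities):
    """Cost of a basket of quantities valued at the given prices."""
    return sum(prices[k] * quantities[k] for k in prices)

laspeyres = basket_cost(pt, q0) / basket_cost(p0, q0)  # 4900 / 5000
paasche = basket_cost(pt, qt) / basket_cost(p0, qt)    # 6400 / 6500
fisher = math.sqrt(laspeyres * paasche)                # geometric mean of the two

print(laspeyres, paasche, fisher)
```

Being a geometric mean, the Fisher value always lies between the Laspeyres and Paasche values.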

pyindexnum.bilateral.tornqvist(df: DataFrame) → float

Compute the Törnqvist price index.

The Törnqvist index is a weighted geometric average of the price relatives. The weights are the arithmetic average of the expenditure shares in the two periods.

Formula: exp(sum(0.5 * (s_0 + s_t) * ln(p_t / p_0))) where s_i = (p_i * q_i) / sum(p_i * q_i)

Parameters:

df – Polars DataFrame with standardized columns (“date”, “price”, “product_id”, “quantity”) containing data for exactly two periods, with each product having exactly one price and quantity per period.

Returns:

The Törnqvist price index as a float.

Raises:

ValueError – If DataFrame doesn’t have exactly two unique dates, or if any product has multiple prices/quantities per period, or if products differ between periods, or if prices or quantities contain negatives or zeros.

Examples

>>> import polars as pl
>>> df = pl.DataFrame({
...     "date": ["2023-01-01", "2023-01-01", "2023-02-01", "2023-02-01"],
...     "product_id": ["A", "B", "A", "B"],
...     "price": [100, 200, 110, 190],
...     "quantity": [10, 20, 15, 25]
... })
>>> tornqvist(df)
0.98...
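The Törnqvist formula can be traced step by step on the example data. The sketch below is plain Python with a hypothetical helper expenditure_shares; it follows the formula stated above rather than the library's code.

```python
import math

p0 = {"A": 100, "B": 200}
q0 = {"A": 10, "B": 20}
pt = {"A": 110, "B": 190}
qt = {"A": 15, "B": 25}

def expenditure_shares(prices, quantities):
    """s_i = p_i * q_i / sum(p * q) within one period."""
    total = sum(prices[k] * quantities[k] for k in prices)
    return {k: prices[k] * quantities[k] / total for k in prices}

s0 = expenditure_shares(p0, q0)  # base shares: 0.2 and 0.8
st = expenditure_shares(pt, qt)  # current shares: 1650/6400 and 4750/6400

# Weighted sum of log price relatives with averaged shares as weights.
log_index = sum(0.5 * (s0[k] + st[k]) * math.log(pt[k] / p0[k]) for k in p0)
tornqvist = math.exp(log_index)

print(tornqvist)  # roughly 0.9824
```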

pyindexnum.bilateral.walsh(df: DataFrame) → float

Compute the Walsh price index.

The Walsh index is a pure price index that uses the geometric mean of the quantities from the two periods as the fixed basket.

Formula: sum(p_t * sqrt(q_0 * q_t)) / sum(p_0 * sqrt(q_0 * q_t))

Parameters:

df – Polars DataFrame with standardized columns (“date”, “price”, “product_id”, “quantity”) containing data for exactly two periods, with each product having exactly one price and quantity per period.

Returns:

The Walsh price index as a float.

Raises:

ValueError – If DataFrame doesn’t have exactly two unique dates, or if any product has multiple prices/quantities per period, or if products differ between periods, or if prices or quantities contain negatives or zeros.

Examples

>>> import polars as pl
>>> df = pl.DataFrame({
...     "date": ["2023-01-01", "2023-01-01", "2023-02-01", "2023-02-01"],
...     "product_id": ["A", "B", "A", "B"],
...     "price": [100, 200, 110, 190],
...     "quantity": [10, 20, 10, 20]
... })
>>> walsh(df)
0.98
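A small sketch of the Walsh formula, written as a standalone Python function (walsh_sketch is a hypothetical name, not the library function). When quantities are identical in both periods, as in the example above, the geometric-mean basket equals the base basket and the result coincides with Laspeyres.

```python
import math

def walsh_sketch(p0, q0, pt, qt):
    # Fixed basket: geometric mean of base and current quantities per product.
    basket = {k: math.sqrt(q0[k] * qt[k]) for k in p0}
    numerator = sum(pt[k] * basket[k] for k in p0)
    denominator = sum(p0[k] * basket[k] for k in p0)
    return numerator / denominator

p0 = {"A": 100, "B": 200}
pt = {"A": 110, "B": 190}

# Same quantities in both periods: basket is (10, 20), index is 4900 / 5000.
same_q = walsh_sketch(p0, {"A": 10, "B": 20}, pt, {"A": 10, "B": 20})

# Quantities that shift between periods give a different basket.
shifted_q = walsh_sketch(p0, {"A": 10, "B": 20}, pt, {"A": 15, "B": 25})

print(same_q, shifted_q)
```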

pyindexnum.multilateral

Multilateral price index functions for the PyIndexNum library.

This module contains functions for calculating multilateral price indices that compare prices across multiple time periods simultaneously.

pyindexnum.multilateral.geary_khamis(df: DataFrame, max_iter: int = 100, tol: float = 1e-08) → DataFrame

Compute the Geary-Khamis multilateral price index.

The Geary-Khamis method is an iterative multilateral index that solves for reference prices and period price levels simultaneously.

Parameters:

df – Polars DataFrame with standardized columns (“product_id”, “period”, “aggregated_price”, “aggregated_quantity”) containing data for multiple periods, with each product having exactly one price and quantity per period.
max_iter – Maximum number of iterations for convergence (default 100).
tol – Tolerance for convergence check (default 1e-8).

Returns:

DataFrame with columns “period” (Date) and “index_value” (float), where index_value represents the multilateral price index for each period relative to the base period (first chronological period = 1.0).

Raises:

ValueError – If DataFrame doesn’t meet requirements or iteration doesn’t converge.
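The iteration can be sketched as alternating updates of reference prices and period price levels until the levels stop changing. The code below is a minimal plain-Python illustration of the standard Geary-Khamis equations; the function name, the list-of-dicts input layout and the convergence test are assumptions and may differ from the library's implementation.

```python
def geary_khamis_sketch(prices, quantities, max_iter=100, tol=1e-8):
    """prices[t][i] and quantities[t][i]: per-period dicts keyed by product."""
    periods = range(len(prices))
    products = list(prices[0])
    levels = [1.0] * len(prices)  # initial guess for the period price levels

    for _ in range(max_iter):
        # Reference price: quantity-weighted average of deflated prices.
        ref = {
            i: sum(prices[t][i] * quantities[t][i] / levels[t] for t in periods)
            / sum(quantities[t][i] for t in periods)
            for i in products
        }
        # Price level: actual expenditure over expenditure at reference prices.
        new_levels = [
            sum(prices[t][i] * quantities[t][i] for i in products)
            / sum(ref[i] * quantities[t][i] for i in products)
            for t in periods
        ]
        # Normalise so that the first period equals 1.0.
        new_levels = [level / new_levels[0] for level in new_levels]
        converged = max(abs(a - b) for a, b in zip(new_levels, levels)) < tol
        levels = new_levels
        if converged:
            return levels
    raise ValueError("Geary-Khamis iteration did not converge")

# All prices rise by 10 percent, so the index should approach 1.1.
prices = [{"A": 100, "B": 200}, {"A": 110, "B": 220}]
quantities = [{"A": 10, "B": 20}, {"A": 12, "B": 18}]
gk_levels = geary_khamis_sketch(prices, quantities)
print(gk_levels)
```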

pyindexnum.multilateral.geks_fisher(df: DataFrame) → DataFrame

Compute the GEKS-Fisher multilateral price index.

The GEKS (Gini-Eltető-Köves-Szulc) method uses bilateral Fisher indices between all pairs of periods. The price level for period t relative to period 1 is the geometric mean of the indirect comparisons obtained by linking through every period k.

Formula: P_geks_t = product_{k=1}^T [P_F(p^k, p^t, q^k, q^t) / P_F(p^k, p^1, q^k, q^1)]^(1/T)

Parameters:

df – Polars DataFrame with standardized columns (“product_id”, “period”, “aggregated_price”, “aggregated_quantity”) containing data for multiple periods, with each product having exactly one price and quantity per period.

Returns:

DataFrame with columns “period” (Date) and “index_value” (float), where index_value represents the multilateral price index for each period relative to the base period (first chronological period = 1.0).

Raises:

ValueError – If DataFrame doesn’t meet requirements (see _validate_multilateral_input).

Examples

>>> import polars as pl
>>> from datetime import date
>>> df = pl.DataFrame({
...     "product_id": ["A", "A", "B", "B"],
...     "period": [date(2023, 1, 1), date(2023, 2, 1), date(2023, 1, 1), date(2023, 2, 1)],
...     "aggregated_price": [100, 110, 200, 210],
...     "aggregated_quantity": [10, 10, 20, 20]
... })
>>> result = geks_fisher(df)
>>> # Returns DataFrame with period and index_value columns

pyindexnum.multilateral.geks_jevons(df: DataFrame) → DataFrame

Compute the GEKS-Jevons multilateral price index.

This is the GEKS method applied with the Jevons index (unweighted geometric mean of price relatives) as the underlying bilateral formula. Since the Jevons index is unweighted, only price information is required; no quantity column is necessary.

Formula: P_geks-J(0,t) = product_{k=0}^{T-1} [P_J(k,t) / P_J(k,0)]^(1/T)

where P_J(a,b) is the Jevons bilateral index between periods a and b: P_J(a,b) = [prod_{i=1}^{N} (p_i^b / p_i^a)]^(1/N)

GEKS-Jevons is particularly useful for web-scraped data where quantity information is unavailable. Despite being unweighted, it has been found to outperform some weighted bilateral methods in empirical studies.

Parameters:

df – Polars DataFrame with columns (“product_id”, “period”, “aggregated_price”) containing data for multiple periods, with each product having exactly one price per period. The “aggregated_quantity” column is optional and, if present, will be ignored (Jevons is an unweighted index).

Returns:

DataFrame with columns “period” (Date) and “index_value” (float), where index_value represents the multilateral price index for each period relative to the base period (first chronological period = 1.0).

Raises:

ValueError – If DataFrame doesn’t meet requirements (see _validate_multilateral_input).

Examples

>>> import polars as pl
>>> from datetime import date
>>> df = pl.DataFrame({
...     "product_id": ["A", "A", "B", "B"],
...     "period": [date(2023, 1, 1), date(2023, 2, 1), date(2023, 1, 1), date(2023, 2, 1)],
...     "aggregated_price": [100, 110, 200, 210],
... })
>>> result = geks_jevons(df)
>>> # Returns DataFrame with period and index_value columns
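The GEKS-Jevons formula above translates almost directly into code. The following is a plain-Python sketch with hypothetical names (jevons_sketch, geks_jevons_sketch) and a list of per-period price dicts as input; it assumes a balanced panel with strictly positive prices and is not the library's implementation.

```python
import math

def jevons_sketch(p_from, p_to):
    """Geometric mean of price relatives p_to / p_from over all products."""
    n = len(p_from)
    return math.exp(sum(math.log(p_to[i] / p_from[i]) for i in p_from) / n)

def geks_jevons_sketch(prices):
    """prices: per-period dicts (product -> price) in chronological order."""
    T = len(prices)
    index = []
    for t in range(T):
        # Average, in logs, the comparisons of t with period 0 through every k.
        log_sum = sum(
            math.log(jevons_sketch(prices[k], prices[t])
                     / jevons_sketch(prices[k], prices[0]))
            for k in range(T)
        )
        index.append(math.exp(log_sum / T))
    return index

prices = [
    {"A": 100, "B": 200},
    {"A": 110, "B": 210},
    {"A": 121, "B": 205},
]
series = geks_jevons_sketch(prices)
print(series)
```

Because the Jevons index is transitive on a balanced panel, every link period gives the same comparison and the GEKS average reproduces the direct Jevons index. The averaging changes the result for non-transitive formulas such as Fisher or Törnqvist.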

pyindexnum.multilateral.geks_tornqvist(df: DataFrame) → DataFrame

Compute the GEKS-Törnqvist (CCDI) multilateral price index.

This is the GEKS method applied with the Törnqvist index as the underlying bilateral formula.

Formula: P_ccdi_t = product_{k=1}^T [P_T(p^k, p^t, s^k, s^t) / P_T(p^k, p^1, s^k, s^1)]^(1/T)

Parameters:

df – Polars DataFrame with standardized columns (“product_id”, “period”, “aggregated_price”, “aggregated_quantity”) containing data for multiple periods, with each product having exactly one price and quantity per period.

Returns:

DataFrame with columns “period” (Date) and “index_value” (float), where index_value represents the multilateral price index for each period relative to the base period (first chronological period = 1.0).

Raises:

ValueError – If DataFrame doesn’t meet requirements (see _validate_multilateral_input).

Examples

>>> import polars as pl
>>> from datetime import date
>>> df = pl.DataFrame({
...     "product_id": ["A", "A", "B", "B"],
...     "period": [date(2023, 1, 1), date(2023, 2, 1), date(2023, 1, 1), date(2023, 2, 1)],
...     "aggregated_price": [100, 110, 200, 210],
...     "aggregated_quantity": [10, 10, 20, 20]
... })
>>> result = geks_tornqvist(df)
>>> # Returns DataFrame with period and index_value columns

pyindexnum.multilateral.time_product_dummy(df: DataFrame, weighted: bool = True) → DataFrame

Compute the Time Product Dummy multilateral price index.

The Time Product Dummy (TPD) method uses regression analysis to estimate price indices. Time and product dummy variables are included in the model, with the index values derived from the time dummy coefficients.

Parameters:

df – Polars DataFrame with standardized columns (“product_id”, “period”, “aggregated_price”) and optionally “aggregated_quantity” if weighted=True. Contains data for multiple periods, with each product having exactly one price per period.
weighted – If True, use weighted least squares with expenditure shares (p*q / sum(p*q) per period) as weights. This requires the “aggregated_quantity” column to be present. If False, use unweighted OLS.

Returns:

DataFrame with columns “period” (Date) and “index_value” (float), where index_value represents the multilateral price index for each period relative to the base period (first chronological period = 1.0).

Raises:

ValueError – If DataFrame doesn’t meet requirements.

Examples

>>> import polars as pl
>>> from datetime import date
>>> df = pl.DataFrame({
...     "product_id": ["A", "A", "B", "B"],
...     "period": [date(2023, 1, 1), date(2023, 2, 1), date(2023, 1, 1), date(2023, 2, 1)],
...     "aggregated_price": [100, 110, 200, 210],
...     "aggregated_quantity": [10, 10, 20, 20]
... })
>>> result = time_product_dummy(df, weighted=True)
>>> # Returns DataFrame with period and index_value columns

pyindexnum.extension

Extension methods for connecting two different multilateral indices.

This module contains functions for splicing two multilateral price indices that are calculated on the same window length but shifted by one period. These methods are used to extend price index series when using rolling windows.

pyindexnum.extension.fixed_base_rolling_window(index1: DataFrame, index2: DataFrame, base_period: str) → DataFrame

Calculate the fixed base rolling window extension method.

The fixed base rolling method calculates the rate of change between the last period of the second window and a reference period common to the first and second window, then uses this rate to connect the base period of the first window to the last period of the second window.

Parameters:

index1 – First multilateral index DataFrame with columns “period” and “index_value”
index2 – Second multilateral index DataFrame with columns “period” and “index_value”
base_period – string indicating the date of the base period in YYYY-MM-DD format

Returns:

DataFrame with the full extended index series including all periods from index1 plus the spliced period

Raises:

ValueError – If input validation fails or if base_period is not found in both indices

Examples

>>> import polars as pl
>>> from datetime import date
>>> idx1 = pl.DataFrame({
...     "period": [date(2023, 1, 1), date(2023, 2, 1), date(2023, 3, 1)],
...     "index_value": [1.0, 1.05, 1.10]
... })
>>> idx2 = pl.DataFrame({
...     "period": [date(2023, 2, 1), date(2023, 3, 1), date(2023, 4, 1)],
...     "index_value": [1.05, 1.10, 1.15]
... })
>>> result = fixed_base_rolling_window(idx1, idx2, "2023-02-01")
>>> # Returns the full extended index series including periods 2023-01-01, 2023-02-01, 2023-03-01, and 2023-04-01
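The linking arithmetic behind the fixed base method can be sketched with plain dicts. The index values below are made up for illustration (the second window is rebased to 1.0 at its own first period), and the dict layout is an assumption rather than the library's DataFrame interface.

```python
# First window: January to March. Second window: February to April.
idx1 = {"2023-01-01": 1.0, "2023-02-01": 1.05, "2023-03-01": 1.10}
idx2 = {"2023-02-01": 1.0, "2023-03-01": 1.04, "2023-04-01": 1.10}

base_period = "2023-02-01"  # must appear in both windows
new_period = max(idx2)      # ISO date strings sort chronologically

# Rate of change from the base period to the new period, measured in window 2.
rate = idx2[new_period] / idx2[base_period]

# Chain that rate onto the first window's value at the base period.
extended = dict(idx1)
extended[new_period] = idx1[base_period] * rate

print(extended)
```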

pyindexnum.extension.half_splice(index1: DataFrame, index2: DataFrame) → DataFrame

Calculate the half splice extension method.

The half splice method uses the period in the middle of the first window as the connecting point: period T/2 if the window length T is even, or (T+1)/2 if it is odd.

Parameters:

index1 – First multilateral index DataFrame with columns “period” and “index_value”
index2 – Second multilateral index DataFrame with columns “period” and “index_value”

Returns:

DataFrame with the full extended index series including all periods from index1 plus the spliced period

Raises:

ValueError – If input validation fails

Examples

>>> import polars as pl
>>> from datetime import date
>>> idx1 = pl.DataFrame({
...     "period": [date(2023, 1, 1), date(2023, 2, 1), date(2023, 3, 1)],
...     "index_value": [1.0, 1.05, 1.10]
... })
>>> idx2 = pl.DataFrame({
...     "period": [date(2023, 2, 1), date(2023, 3, 1), date(2023, 4, 1)],
...     "index_value": [1.05, 1.10, 1.15]
... })
>>> result = half_splice(idx1, idx2)
>>> # Returns the full extended index series including periods 2023-01-01, 2023-02-01, 2023-03-01, and 2023-04-01
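A sketch of how the middle link period might be selected and used, with made-up index values held in plain dicts. The position rule follows the description above (for a window of length 3 the link is the second period); the names are illustrative and not the library's.

```python
idx1 = {"2023-01-01": 1.0, "2023-02-01": 1.05, "2023-03-01": 1.10}
idx2 = {"2023-02-01": 1.0, "2023-03-01": 1.04, "2023-04-01": 1.10}

window = sorted(idx1)  # periods of the first window, in order
T = len(window)

# 1-based middle position: T/2 for even T, (T+1)/2 for odd T.
middle_position = (T + 1) // 2
link_period = window[middle_position - 1]

new_period = max(idx2)
extended = dict(idx1)
extended[new_period] = idx1[link_period] * idx2[new_period] / idx2[link_period]

print(link_period, extended[new_period])
```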

pyindexnum.extension.mean_splice(index1: DataFrame, index2: DataFrame) → DataFrame

Calculate the mean splice extension method (Diewert and Fox, 2018).

The mean splice method uses the geometric mean of all possible choices of splicing, i.e., all periods which are included in the current window and the previous one. This is the most sophisticated splicing method.

Parameters:

index1 – First multilateral index DataFrame with columns “period” and “index_value”
index2 – Second multilateral index DataFrame with columns “period” and “index_value”

Returns:

DataFrame with the full extended index series including all periods from index1 plus the spliced period

Raises:

ValueError – If input validation fails

Examples

>>> import polars as pl
>>> from datetime import date
>>> idx1 = pl.DataFrame({
...     "period": [date(2023, 1, 1), date(2023, 2, 1), date(2023, 3, 1)],
...     "index_value": [1.0, 1.05, 1.10]
... })
>>> idx2 = pl.DataFrame({
...     "period": [date(2023, 2, 1), date(2023, 3, 1), date(2023, 4, 1)],
...     "index_value": [1.05, 1.10, 1.15]
... })
>>> result = mean_splice(idx1, idx2)
>>> # Returns the full extended index series including periods 2023-01-01, 2023-02-01, 2023-03-01, and 2023-04-01
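The mean splice can be read as computing one candidate value per overlapping period and then averaging the candidates geometrically. A minimal sketch with made-up index values in plain dicts (not the library's DataFrame interface):

```python
import math

idx1 = {"2023-01-01": 1.0, "2023-02-01": 1.05, "2023-03-01": 1.10}
idx2 = {"2023-02-01": 1.0, "2023-03-01": 1.04, "2023-04-01": 1.10}

overlap = sorted(set(idx1) & set(idx2))  # periods present in both windows
new_period = max(idx2)

# One candidate value for the new period per possible link period.
candidates = [idx1[t] * idx2[new_period] / idx2[t] for t in overlap]

# Geometric mean of the candidates.
spliced = math.exp(sum(math.log(c) for c in candidates) / len(candidates))

extended = dict(idx1)
extended[new_period] = spliced
print(candidates, spliced)
```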

pyindexnum.extension.movement_splice(index1: DataFrame, index2: DataFrame) → DataFrame

Calculate the movement splice extension method.

The movement splice method calculates the rate of change between the last and second-last period in the second window, then applies this rate to extend the first window by one period.

Parameters:

index1 – First multilateral index DataFrame with columns “period” and “index_value”
index2 – Second multilateral index DataFrame with columns “period” and “index_value”

Returns:

DataFrame with the full extended index series including all periods from index1 plus the spliced period

Raises:

ValueError – If input validation fails

Examples

>>> import polars as pl
>>> from datetime import date
>>> idx1 = pl.DataFrame({
...     "period": [date(2023, 1, 1), date(2023, 2, 1), date(2023, 3, 1)],
...     "index_value": [1.0, 1.05, 1.10]
... })
>>> idx2 = pl.DataFrame({
...     "period": [date(2023, 2, 1), date(2023, 3, 1), date(2023, 4, 1)],
...     "index_value": [1.05, 1.10, 1.15]
... })
>>> result = movement_splice(idx1, idx2)
>>> # Returns the full extended index series including periods 2023-01-01, 2023-02-01, 2023-03-01, and 2023-04-01
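In arithmetic terms, the movement splice chains the latest period-on-period change of the second window onto the end of the first window. A minimal sketch with made-up index values in plain dicts:

```python
idx1 = {"2023-01-01": 1.0, "2023-02-01": 1.05, "2023-03-01": 1.10}
idx2 = {"2023-02-01": 1.0, "2023-03-01": 1.04, "2023-04-01": 1.10}

periods2 = sorted(idx2)
second_last, last = periods2[-2], periods2[-1]

# Latest period-on-period movement, as measured in the second window.
movement = idx2[last] / idx2[second_last]

# Apply it to the last value of the first window.
extended = dict(idx1)
extended[last] = idx1[max(idx1)] * movement

print(extended[last])
```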

pyindexnum.extension.window_splice(index1: DataFrame, index2: DataFrame) → DataFrame

Calculate the window splice extension method.

The window splice method calculates the rate of change between the last and first period of the second window, then uses this rate to connect the second period of the first window to the last period of the second window.

Parameters:

index1 – First multilateral index DataFrame with columns “period” and “index_value”
index2 – Second multilateral index DataFrame with columns “period” and “index_value”

Returns:

DataFrame with the full extended index series including all periods from index1 plus the spliced period

Raises:

ValueError – If input validation fails

Examples

>>> import polars as pl
>>> from datetime import date
>>> idx1 = pl.DataFrame({
...     "period": [date(2023, 1, 1), date(2023, 2, 1), date(2023, 3, 1)],
...     "index_value": [1.0, 1.05, 1.10]
... })
>>> idx2 = pl.DataFrame({
...     "period": [date(2023, 2, 1), date(2023, 3, 1), date(2023, 4, 1)],
...     "index_value": [1.05, 1.10, 1.15]
... })
>>> result = window_splice(idx1, idx2)
>>> # Returns the full extended index series including periods 2023-01-01, 2023-02-01, 2023-03-01, and 2023-04-01
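The window splice links at the first period of the second window, which is the second period of the first window. A minimal sketch with made-up index values in plain dicts:

```python
idx1 = {"2023-01-01": 1.0, "2023-02-01": 1.05, "2023-03-01": 1.10}
idx2 = {"2023-02-01": 1.0, "2023-03-01": 1.04, "2023-04-01": 1.10}

periods2 = sorted(idx2)
first2, last2 = periods2[0], periods2[-1]

# Change over the whole second window.
window_change = idx2[last2] / idx2[first2]

# Chain it onto the first window's value at the link period.
extended = dict(idx1)
extended[last2] = idx1[first2] * window_change

print(extended[last2])
```

With these illustrative inputs the window splice yields about 1.155, while linking at the last common period instead would yield about 1.163. The choice of link period therefore affects the extended series whenever the two windows disagree over their overlap.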