User Guide

Welcome to the PyIndexNum user guide! This section provides comprehensive information on how to use the library effectively for economic index number calculations.

Overview

PyIndexNum is a Python library designed for calculating price and quantity indices using modern data processing techniques. Built on top of Polars, it offers high-performance computation of bilateral and multilateral economic indices.

Key Features

High Performance: Built on Polars for efficient data processing
Comprehensive Index Methods: Support for bilateral and multilateral index calculation
Data Preparation Tools: Built-in utilities for data standardization and aggregation
Extension Methods: Support for index splicing and rolling window calculations
Type Safety: Full type annotations for better IDE support

Installation

pip install pyindexnum

Or using uv:

uv add pyindexnum

Quick Start

See the Examples section for a complete workflow demonstration.

Data Requirements

The library expects data in a standardized format with the following columns:

date: Date or datetime column
price: Numeric price data
product_id: Unique product identifier
quantity: Numeric quantity data (optional, required for weighted indices)

Workflow

Data Standardization: Use standardize_columns() to ensure consistent column names and types
Temporal Aggregation: Use aggregate_time() to aggregate data to desired frequency
Panel Balancing: Optionally remove unbalanced data with remove_unbalanced() or impute missing values
Index Calculation: Compute bilateral or multilateral indices
Extension: Optionally apply splicing methods for chained indices

Best Practices

Always start with standardize_columns() to ensure data consistency
Use aggregate_time() before index calculation to handle high-frequency data
For unbalanced panels, choose between remove_unbalanced() (removes products) or imputation methods (fills missing values)
Bilateral indices compare exactly two periods; multilateral indices can handle multiple periods

Common Pitfalls

Ensure data is sorted by date before aggregation
Check for missing values that could affect index calculations
Verify that product IDs are consistent across periods
Use appropriate aggregation methods based on your data characteristics