User Guide

Welcome to the PyIndexNum user guide! This section provides comprehensive information on how to use the library effectively for economic index number calculations.

Overview

PyIndexNum is a Python library designed for calculating price and quantity indices using modern data processing techniques. Built on top of Polars, it offers high-performance computation of bilateral and multilateral economic indices.

Key Features

  • High Performance: Built on Polars for efficient data processing

  • Comprehensive Index Methods: Support for bilateral and multilateral index calculation

  • Data Preparation Tools: Built-in utilities for data standardization and aggregation

  • Extension Methods: Support for index splicing and rolling window calculations

  • Type Safety: Full type annotations for better IDE support

Installation

pip install pyindexnum

Or using uv:

uv add pyindexnum

Quick Start

See the Examples section for a complete workflow demonstration.

Data Requirements

The library expects data in a standardized format with the following columns:

  • date: Date or datetime column

  • price: Numeric price data

  • product_id: Unique product identifier

  • quantity: Numeric quantity data (optional, required for weighted indices)

Workflow

  1. Data Standardization: Use standardize_columns() to ensure consistent column names and types

  2. Temporal Aggregation: Use aggregate_time() to aggregate data to desired frequency

  3. Panel Balancing: Optionally remove unbalanced data with remove_unbalanced() or impute missing values

  4. Index Calculation: Compute bilateral or multilateral indices

  5. Extension: Optionally apply splicing methods for chained indices

Best Practices

  • Always start with standardize_columns() to ensure data consistency

  • Use aggregate_time() before index calculation to handle high-frequency data

  • For unbalanced panels, choose between remove_unbalanced() (removes products) or imputation methods (fills missing values)

  • Bilateral indices compare exactly two periods; multilateral indices can handle multiple periods

Common Pitfalls

  • Ensure data is sorted by date before aggregation

  • Check for missing values that could affect index calculations

  • Verify that product IDs are consistent across periods

  • Use appropriate aggregation methods based on your data characteristics