polars

Fast DataFrame library (Apache Arrow). Select, filter, group_by, joins, lazy evaluation, CSV/Parquet I/O, expression API, for high-performance data analysis workflows.

NPX install:
npx skill4agent add davila7/claude-code-templates polars

SKILL.md Content

Polars
Overview
Polars is a lightning-fast DataFrame library for Python and Rust built on Apache Arrow. Work with Polars' expression-based API, lazy evaluation framework, and high-performance data manipulation capabilities for efficient data processing, pandas migration, and data pipeline optimization.
Quick Start
Installation and Basic Usage
Install Polars:
```bash
uv pip install polars
```

Basic DataFrame creation and operations:
```python
import polars as pl

# Create DataFrame
df = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["NY", "LA", "SF"]
})

# Select columns
df.select("name", "age")

# Filter rows
df.filter(pl.col("age") > 25)

# Add computed columns
df.with_columns(
    age_plus_10=pl.col("age") + 10
)
```

Core Concepts
Expressions
Expressions are the fundamental building blocks of Polars operations. They describe transformations on data and can be composed, reused, and optimized.
Key principles:
- Use pl.col("column_name") to reference columns
- Chain methods to build complex transformations
- Expressions are lazy and only execute within contexts (select, with_columns, filter, group_by)
Example:
```python
# Expression-based computation
df.select(
    pl.col("name"),
    (pl.col("age") * 12).alias("age_in_months")
)
```

Lazy vs Eager Evaluation
Eager (DataFrame): Operations execute immediately
```python
df = pl.read_csv("file.csv")             # Reads immediately
result = df.filter(pl.col("age") > 25)   # Executes immediately
```

Lazy (LazyFrame): Operations build a query plan that is optimized before execution

```python
lf = pl.scan_csv("file.csv")  # Doesn't read yet
result = lf.filter(pl.col("age") > 25).select("name", "age")
df = result.collect()         # Now executes the optimized query
```

When to use lazy:
- Working with large datasets
- Complex query pipelines
- When only some columns/rows are needed
- Performance is critical
Benefits of lazy evaluation:
- Automatic query optimization
- Predicate pushdown
- Projection pushdown
- Parallel execution
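To see these optimizations in action, a LazyFrame can print its optimized plan before executing. The sketch below is illustrative only; the file name and column names are placeholders, not part of this skill:

```python
import polars as pl

# Build a lazy query; nothing is read or computed yet
lf = (
    pl.scan_csv("sales.csv")          # placeholder file name
    .filter(pl.col("amount") > 100)   # candidate for predicate pushdown
    .select("region", "amount")       # candidate for projection pushdown
)

# Inspect the optimized query plan, then execute it
print(lf.explain())
df = lf.collect()
```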
For detailed concepts, load references/core_concepts.md.

Common Operations
Select
Select and manipulate columns:
```python
# Select specific columns
df.select("name", "age")

# Select with expressions
df.select(
    pl.col("name"),
    (pl.col("age") * 2).alias("double_age")
)

# Select all columns matching a pattern
df.select(pl.col("^.*_id$"))
```

Filter
Filter rows by conditions:
```python
# Single condition
df.filter(pl.col("age") > 25)

# Multiple conditions (cleaner than using &)
df.filter(
    pl.col("age") > 25,
    pl.col("city") == "NY"
)

# Complex conditions
df.filter(
    (pl.col("age") > 25) | (pl.col("city") == "LA")
)
```

With Columns
Add or modify columns while preserving existing ones:
```python
# Add new columns
df.with_columns(
    age_plus_10=pl.col("age") + 10,
    name_upper=pl.col("name").str.to_uppercase()
)

# Parallel computation (both expressions are computed in parallel)
df.with_columns(
    value_x10=pl.col("value") * 10,
    value_x100=pl.col("value") * 100,
)
```

Group By and Aggregations
Group data and compute aggregations:
```python
# Basic grouping
df.group_by("city").agg(
    pl.col("age").mean().alias("avg_age"),
    pl.len().alias("count")
)

# Multiple group keys
df.group_by("city", "department").agg(
    pl.col("salary").sum()
)

# Conditional aggregations
df.group_by("city").agg(
    (pl.col("age") > 30).sum().alias("over_30")
)
```

For detailed operation patterns, load references/operations.md.

Aggregations and Window Functions
Aggregation Functions
Common aggregations within a group_by context:
- pl.len() - count rows
- pl.col("x").sum() - sum values
- pl.col("x").mean() - average
- pl.col("x").min() / pl.col("x").max() - extremes
- pl.first() / pl.last() - first/last values
Window Functions with over()
Use over() to apply aggregations while preserving the row count:
```python
# Add group statistics to each row
df.with_columns(
    avg_age_by_city=pl.col("age").mean().over("city"),
    rank_in_city=pl.col("salary").rank().over("city")
)

# Multiple grouping columns
df.with_columns(
    group_avg=pl.col("value").mean().over("category", "region")
)
```

Mapping strategies:
- group_to_rows (default): Preserves original row order
- explode: Faster but groups rows together
- join: Creates list columns
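As an illustrative (not authoritative) example of a non-default strategy, reusing the columns from the examples above: with "join", every row receives its group's values as a list column:

```python
# Each row gets a list of all ages in its city; the default
# "group_to_rows" strategy would keep one scalar value per row instead.
df.with_columns(
    city_ages=pl.col("age").over("city", mapping_strategy="join")
)
```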
Data I/O
Supported Formats
Polars supports reading and writing:
- CSV, Parquet, JSON, Excel
- Databases (via connectors)
- Cloud storage (S3, Azure, GCS)
- Google BigQuery
- Multiple/partitioned files
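For instance, partitioned datasets and cloud paths can be scanned lazily with glob patterns. The paths, bucket name, and columns below are placeholders for illustration:

```python
import polars as pl

# Scan many partitioned Parquet files at once via a glob pattern
lf = pl.scan_parquet("data/year=*/month=*/*.parquet")

# Cloud object stores use the same API (credentials can be passed via storage_options)
# lf = pl.scan_parquet("s3://my-bucket/events/*.parquet")

df = lf.select("user_id", "amount").collect()
```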
Common I/O Operations
CSV:
```python
# Eager
df = pl.read_csv("file.csv")
df.write_csv("output.csv")

# Lazy (preferred for large files)
lf = pl.scan_csv("file.csv")
result = lf.filter(...).select(...).collect()
```

Parquet (recommended for performance):

```python
df = pl.read_parquet("file.parquet")
df.write_parquet("output.parquet")
```

JSON:

```python
df = pl.read_json("file.json")
df.write_json("output.json")
```

For comprehensive I/O documentation, load references/io_guide.md.

Transformations
Joins
Combine DataFrames:
```python
# Inner join
df1.join(df2, on="id", how="inner")

# Left join
df1.join(df2, on="id", how="left")

# Join on different column names
df1.join(df2, left_on="user_id", right_on="id")
```

Concatenation
Stack DataFrames:
```python
# Vertical (stack rows)
pl.concat([df1, df2], how="vertical")

# Horizontal (add columns)
pl.concat([df1, df2], how="horizontal")

# Diagonal (union with different schemas)
pl.concat([df1, df2], how="diagonal")
```

Pivot and Unpivot
Reshape data:
```python
# Pivot (long to wide format)
df.pivot(on="product", index="date", values="sales")

# Unpivot (wide to long format)
df.unpivot(index="id", on=["col1", "col2"])
```

For detailed transformation examples, load references/transformations.md.

Pandas Migration
Polars offers significant performance improvements over pandas with a cleaner API. Key differences:
Conceptual Differences
- No index: Polars uses integer positions only
- Strict typing: No silent type conversions
- Lazy evaluation: Available via LazyFrame
- Parallel by default: Operations parallelized automatically
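A brief sketch of how two of these differences surface in code, reusing columns from earlier examples (treat it as illustrative, not exhaustive):

```python
# No index: add an explicit row-number column when positions are needed
df.with_row_index("row_nr")

# Strict typing: conversions are explicit casts, never silent
df.with_columns(pl.col("age").cast(pl.Float64))
```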
Common Operation Mappings
| Operation | Pandas | Polars |
|---|---|---|
| Select column | `df["age"]` | `df.select("age")` |
| Filter | `df[df["age"] > 25]` | `df.filter(pl.col("age") > 25)` |
| Add column | `df.assign(x=...)` | `df.with_columns(x=...)` |
| Group by | `df.groupby("city").agg(...)` | `df.group_by("city").agg(...)` |
| Window | `df.groupby("city")["age"].transform("mean")` | `pl.col("age").mean().over("city")` |
Key Syntax Patterns
Pandas sequential (slow):
```python
df.assign(
    col_a=lambda df_: df_.value * 10,
    col_b=lambda df_: df_.value * 100
)
```

Polars parallel (fast):

```python
df.with_columns(
    col_a=pl.col("value") * 10,
    col_b=pl.col("value") * 100,
)
```

For the comprehensive migration guide, load references/pandas_migration.md.

Best Practices
Performance Optimization
- Use lazy evaluation for large datasets:

  ```python
  lf = pl.scan_csv("large.csv")  # Don't use read_csv
  result = lf.filter(...).select(...).collect()
  ```

- Avoid Python functions in hot paths:
  - Stay within the expression API for parallelization
  - Use .map_elements() only when necessary
  - Prefer native Polars operations

- Use streaming for very large data:

  ```python
  lf.collect(streaming=True)
  ```

- Select only needed columns early:

  ```python
  # Good: Select columns early
  lf.select("col1", "col2").filter(...)

  # Bad: Filter on all columns first
  lf.filter(...).select("col1", "col2")
  ```

- Use appropriate data types (see the sketch after this list):
  - Categorical for low-cardinality strings
  - Appropriate integer sizes (i32 vs i64)
  - Date types for temporal data
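A minimal sketch of such casts, assuming a hypothetical string column "signup" alongside the columns used earlier:

```python
df.with_columns(
    pl.col("city").cast(pl.Categorical),       # low-cardinality strings
    pl.col("age").cast(pl.Int32),              # smaller integer size
    pl.col("signup").str.to_date("%Y-%m-%d"),  # hypothetical column parsed into a Date
)
```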
Expression Patterns
Conditional operations:
```python
pl.when(condition).then(value).otherwise(other_value)
```
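For example, a concrete (illustrative) version of this pattern using the age column from the earlier examples:

```python
df.with_columns(
    age_group=pl.when(pl.col("age") > 30)
    .then(pl.lit("over_30"))
    .otherwise(pl.lit("30_or_under"))
)
```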
Column operations across multiple columns:

```python
df.select(pl.col("^.*_value$") * 2)  # Regex pattern
```

Null handling:

```python
pl.col("x").fill_null(0)
pl.col("x").is_null()
pl.col("x").drop_nulls()
```

For additional best practices and patterns, load references/best_practices.md.

Resources
This skill includes comprehensive reference documentation:
references/
- core_concepts.md - Detailed explanations of expressions, lazy evaluation, and the type system
- operations.md - Comprehensive guide to all common operations with examples
- pandas_migration.md - Complete migration guide from pandas to Polars
- io_guide.md - Data I/O operations for all supported formats
- transformations.md - Joins, concatenation, pivots, and reshaping operations
- best_practices.md - Performance optimization tips and common patterns
Load these references as needed when users require detailed information about specific topics.