sap-hana-ml

SAP HANA ML Python Client (hana-ml)

Package Version: 2.22.241011
Last Verified: 2025-11-27

Table of Contents

Installation & Setup

bash
pip install hana-ml
Requirements: Python 3.8+, SAP HANA 2.0 SPS03+ or SAP HANA Cloud

Quick Start

Connection & DataFrame

python
from hana_ml import ConnectionContext

# Connect
conn = ConnectionContext(
    address='<hostname>',
    port=443,
    user='<username>',
    password='<password>',
    encrypt=True
)

# Create DataFrame
df = conn.table('MY_TABLE', schema='MY_SCHEMA')
print(f"Shape: {df.shape}")
df.head(10).collect()
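The connection example above uses placeholder credentials. One way to keep real credentials out of source files is to read them from environment variables. The following is a minimal sketch, not part of hana-ml: the helper name `hana_connection_kwargs` and the `HANA_*` variable names are assumptions made for illustration, while the keyword names it returns are the ones used in the `ConnectionContext` call above.

```python
import os


def hana_connection_kwargs(env=os.environ):
    """Build keyword arguments for ConnectionContext from environment variables.

    The HANA_* variable names are hypothetical; rename them to match your setup.
    """
    required = ("HANA_ADDRESS", "HANA_USER", "HANA_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "address": env["HANA_ADDRESS"],
        "port": int(env.get("HANA_PORT", "443")),  # default matches the example above
        "user": env["HANA_USER"],
        "password": env["HANA_PASSWORD"],
        "encrypt": True,
    }


# Intended use (requires hana-ml and a reachable HANA instance):
#   from hana_ml import ConnectionContext
#   conn = ConnectionContext(**hana_connection_kwargs())
```

The helper fails early with a `KeyError` naming the missing variables, so a misconfigured environment is reported before any connection attempt.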

PAL Classification

python
from hana_ml.algorithms.pal.unified_classification import UnifiedClassification

# Train model
clf = UnifiedClassification(func='RandomDecisionTree')
clf.fit(train_df, features=['F1', 'F2', 'F3'], label='TARGET')

# Predict & evaluate
predictions = clf.predict(test_df, features=['F1', 'F2', 'F3'])
score = clf.score(test_df, features=['F1', 'F2', 'F3'], label='TARGET')

APL AutoML

python
from hana_ml.algorithms.apl.classification import AutoClassifier

# Automated classification
auto_clf = AutoClassifier()
auto_clf.fit(train_df, label='TARGET')
predictions = auto_clf.predict(test_df)

Model Persistence

python
from hana_ml.model_storage import ModelStorage

ms = ModelStorage(conn)
clf.name = 'MY_CLASSIFIER'
ms.save_model(model=clf, if_exists='replace')

Core Libraries

PAL (Predictive Analysis Library)

  • 100+ algorithms executed in-database
  • Categories: Classification, Regression, Clustering, Time Series, Preprocessing
  • Key classes: UnifiedClassification, UnifiedRegression, KMeans, ARIMA
  • See: references/PAL_ALGORITHMS.md for complete list

APL (Automated Predictive Library)

  • AutoML capabilities with automatic feature engineering
  • Key classes: AutoClassifier, AutoRegressor, GradientBoostingClassifier
  • See: references/APL_ALGORITHMS.md for details

DataFrames

  • Lazy evaluation - operations build SQL, and nothing is executed until collect() is called
  • In-database processing - data stays in HANA instead of being pulled into the Python client
  • See: references/DATAFRAME_REFERENCE.md for complete API
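To make the lazy-evaluation idea concrete, here is a toy sketch that runs against an in-memory SQLite database. It is not the hana-ml implementation: `LazyFrame` and its methods are hypothetical stand-ins that only illustrate the pattern of composing SQL text first and executing it when `collect()` is called.

```python
import sqlite3


class LazyFrame:
    """Toy stand-in for a lazily evaluated DataFrame (not the hana-ml class)."""

    def __init__(self, conn, select_statement):
        self.conn = conn
        self.select_statement = select_statement

    def filter(self, condition):
        # Wrap the current query in a subquery; nothing is executed here.
        sql = f"SELECT * FROM ({self.select_statement}) AS t WHERE {condition}"
        return LazyFrame(self.conn, sql)

    def head(self, n):
        # Still only string composition.
        sql = f"SELECT * FROM ({self.select_statement}) AS t LIMIT {int(n)}"
        return LazyFrame(self.conn, sql)

    def collect(self):
        # Only now does the statement reach the database.
        return self.conn.execute(self.select_statement).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 10.0), (2, 250.0), (3, 75.5)],
)

lazy = LazyFrame(conn, "SELECT * FROM sales").filter("amount > 50").head(1)
print(lazy.select_statement)  # the SQL built so far; no query has run yet
rows = lazy.collect()         # the single point where the query executes
```

Each chained call returns a new object holding a longer SQL string, so intermediate results never leave the database. Only the final `collect()` transfers rows to the client.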

Visualizers

  • EDA plots, model explanations, metrics
  • SHAP integration for model interpretability
  • See: references/VISUALIZERS.md for 14 visualization modules

Common Patterns

Train-Test-Validation Split

python
from hana_ml.algorithms.pal.partition import train_test_val_split

train, test, val = train_test_val_split(
    data=df,
    training_percentage=0.7,
    testing_percentage=0.2,
    validation_percentage=0.1
)
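As a rough illustration of what the three percentages mean, the sketch below partitions a plain Python sequence into 70/20/10 subsets. It is a client-side toy and does not show how PAL performs the partition in-database. The function `split_rows`, its `seed` parameter, and the check that the percentages sum to 1.0 are assumptions of this example only.

```python
import random


def split_rows(rows, training_percentage, testing_percentage,
               validation_percentage, seed=0):
    """Shuffle rows and cut them into train/test/validation lists (toy example)."""
    total = training_percentage + testing_percentage + validation_percentage
    if abs(total - 1.0) > 1e-9:
        # Design choice of this sketch, not a documented hana-ml rule.
        raise ValueError("percentages must sum to 1.0")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * training_percentage)
    n_test = round(len(shuffled) * testing_percentage)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    val = shuffled[n_train + n_test:]      # remainder goes to validation
    return train, test, val


train_rows, test_rows, val_rows = split_rows(range(100), 0.7, 0.2, 0.1)
print(len(train_rows), len(test_rows), len(val_rows))
```

The three subsets are disjoint and together cover every input row, which is the property to expect from any train/test/validation partition.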

Feature Importance

python
# APL models
importance = auto_clf.get_feature_importances()

# PAL models
from hana_ml.algorithms.pal.preprocessing import FeatureSelection
fs = FeatureSelection()
fs.fit(train_df, features=features, label='TARGET')

Pipeline

python
from hana_ml.algorithms.pal.pipeline import Pipeline
from hana_ml.algorithms.pal.preprocessing import Imputer, FeatureNormalizer
from hana_ml.algorithms.pal.unified_classification import UnifiedClassification

pipeline = Pipeline([
    ('imputer', Imputer(strategy='mean')),
    ('normalizer', FeatureNormalizer()),
    ('classifier', UnifiedClassification(func='RandomDecisionTree'))
])

Best Practices

  1. Use lazy evaluation - Operations only build SQL; nothing executes until collect() is called
  2. Leverage in-database processing - Keep data in HANA for performance
  3. Use Unified interfaces - Consistent APIs across algorithms
  4. Save models - Use ModelStorage for persistence
  5. Explain predictions - Use SHAP explainers for interpretability
  6. Monitor AutoML - Use PipelineProgressStatusMonitor for long-running jobs

Bundled Resources

Reference Files

  • references/DATAFRAME_REFERENCE.md (479 lines)
    • ConnectionContext API, DataFrame operations, SQL generation
  • references/PAL_ALGORITHMS.md (869 lines)
    • Complete PAL algorithm reference (100+ algorithms)
    • Classification, Regression, Clustering, Time Series, Preprocessing
  • references/APL_ALGORITHMS.md (534 lines)
    • AutoML capabilities, automated feature engineering
    • AutoClassifier, AutoRegressor, GradientBoosting classes
  • references/VISUALIZERS.md (704 lines)
    • 14 visualization modules (EDA, SHAP, metrics, time series)
    • Plot types, configuration, export options
  • references/SUPPORTING_MODULES.md (626 lines)
    • Model storage, spatial analytics, graph algorithms
    • Text mining, statistics, error handling

Error Handling

python
from hana_ml.ml_exceptions import Error

try:
    clf.fit(train_df, features=features, label='TARGET')
except Error as e:
    print(f"HANA ML Error: {e}")

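The try/except pattern above can be wrapped in a small helper that logs the failure instead of printing it. This is a sketch under stated assumptions: `fit_or_none` and the logger name are hypothetical, and the fallback `Error` class exists only so the snippet runs where hana-ml is not installed.

```python
import logging

try:
    from hana_ml.ml_exceptions import Error
except ImportError:
    # Stand-in so the sketch runs where hana-ml is not installed.
    class Error(Exception):
        pass

logger = logging.getLogger("hana_ml_example")  # hypothetical logger name


def fit_or_none(estimator, *args, **kwargs):
    """Call estimator.fit; return the estimator, or None if Error was raised."""
    try:
        estimator.fit(*args, **kwargs)
    except Error as exc:
        logger.error("HANA ML Error: %s", exc)
        return None
    return estimator


# Intended use with the classifier from the Quick Start:
#   fitted = fit_or_none(clf, train_df, features=features, label='TARGET')
```

Returning `None` lets calling code decide whether to fall back, retry, or abort. Exceptions other than `Error` still propagate, so unrelated bugs are not hidden.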

Documentation
