sap-hana-ml
SAP HANA ML Python Client (hana-ml)
Package Version: 2.22.241011
Last Verified: 2025-11-27
Installation & Setup
```bash
pip install hana-ml
```

Requirements: Python 3.8+, SAP HANA 2.0 SPS03+ or SAP HANA Cloud
Quick Start
Connection & DataFrame
```python
from hana_ml import ConnectionContext

# Connect
conn = ConnectionContext(
    address='<hostname>',
    port=443,
    user='<username>',
    password='<password>',
    encrypt=True
)

# Create DataFrame
df = conn.table('MY_TABLE', schema='MY_SCHEMA')
print(f"Shape: {df.shape}")
df.head(10).collect()
```

PAL Classification
```python
from hana_ml.algorithms.pal.unified_classification import UnifiedClassification

# Train model (train_df and test_df are hana-ml DataFrames)
clf = UnifiedClassification(func='RandomDecisionTree')
clf.fit(train_df, features=['F1', 'F2', 'F3'], label='TARGET')

# Predict & evaluate; key names the row-ID column of test_df ('ID' is a placeholder)
predictions = clf.predict(test_df, key='ID', features=['F1', 'F2', 'F3'])
score = clf.score(test_df, key='ID', features=['F1', 'F2', 'F3'], label='TARGET')
```

APL AutoML
```python
from hana_ml.algorithms.apl.classification import AutoClassifier

# Automated classification
auto_clf = AutoClassifier()
auto_clf.fit(train_df, label='TARGET')
predictions = auto_clf.predict(test_df)
```

Model Persistence
```python
from hana_ml.model_storage import ModelStorage

ms = ModelStorage(conn)
clf.name = 'MY_CLASSIFIER'
ms.save_model(model=clf, if_exists='replace')
```

Core Libraries
PAL (Predictive Analysis Library)
- 100+ algorithms executed in-database
- Categories: Classification, Regression, Clustering, Time Series, Preprocessing
- Key classes: `UnifiedClassification`, `UnifiedRegression`, `KMeans`, `ARIMA`
- See: `references/PAL_ALGORITHMS.md` for the complete list
APL (Automated Predictive Library)
- AutoML capabilities with automatic feature engineering
- Key classes: `AutoClassifier`, `AutoRegressor`, `GradientBoostingClassifier`
- See: `references/APL_ALGORITHMS.md` for details
DataFrames
- Lazy evaluation: operations only build SQL; nothing is executed until `collect()` is called
- In-database processing for optimal performance
- See: `references/DATAFRAME_REFERENCE.md` for the complete API
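The lazy-evaluation pattern can be sketched in plain Python. This is a minimal illustration under stated assumptions. The standard-library `sqlite3` module stands in for HANA. `LazyFrame` and its `filter`/`select`/`collect` methods are a hypothetical class, not hana-ml's `DataFrame` API. Each method returns a new object that wraps a longer SQL string. A trace callback records which statements the database actually runs, which shows that nothing executes before `collect()`.

```python
import sqlite3


class LazyFrame:
    """Hypothetical lazy frame: methods compose SQL text, collect() runs it."""

    def __init__(self, db, sql):
        self.db = db
        self.sql = sql  # SELECT statement this frame represents

    def filter(self, condition):
        # Wrap the current statement as a subquery; nothing is sent to the DB.
        return LazyFrame(self.db, f"SELECT * FROM ({self.sql}) AS T WHERE {condition}")

    def select(self, *columns):
        # Same idea: only the SQL text grows.
        return LazyFrame(self.db, f"SELECT {', '.join(columns)} FROM ({self.sql}) AS T")

    def collect(self):
        # Only here is the accumulated SQL executed and the rows fetched.
        return self.db.execute(self.sql).fetchall()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE MY_TABLE (F1 INTEGER, TARGET TEXT)")
db.executemany("INSERT INTO MY_TABLE VALUES (?, ?)", [(1, "a"), (5, "b"), (9, "c")])

executed = []                         # statements the database actually runs
db.set_trace_callback(executed.append)

query = LazyFrame(db, "SELECT * FROM MY_TABLE").filter("F1 > 2").select("TARGET")
ran_before_collect = list(executed)   # nothing has run yet
rows = query.collect()

print(query.sql)
print(rows)
```

Building `query` leaves `executed` empty. A single nested SELECT reaches the database, and only when `collect()` is called.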
Visualizers
- EDA plots, model explanations, metrics
- SHAP integration for model interpretability
- See: `references/VISUALIZERS.md` for 14 visualization modules
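SHAP attributions are based on Shapley values. The sketch below is a self-contained illustration, not hana-ml's explainer API (see `references/VISUALIZERS.md` for that). It computes exact Shapley values for a hypothetical three-feature linear model by enumerating every coalition of features. It assumes that features absent from a coalition take a baseline value, which is one common convention. Exact enumeration grows exponentially with the number of features, which is why practical SHAP explainers rely on approximations or model-specific algorithms.

```python
from itertools import combinations
from math import factorial


def model(x):
    # Hypothetical linear model used only for illustration.
    return 2.0 * x[0] + 3.0 * x[1] - 1.0 * x[2] + 0.5


def shapley_values(f, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            for coalition in combinations(others, size):
                # Standard Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if j in coalition or j == i else baseline[j] for j in range(n)]
                without_i = [x[j] if j in coalition else baseline[j] for j in range(n)]
                phi += weight * (f(with_i) - f(without_i))
        values.append(phi)
    return values


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
print(phi)
```

For a linear model with an all-zero baseline, each value reduces to the feature's weight times its value, so `phi` is approximately `[2.0, 6.0, -3.0]`. The values also sum to `model(x) - model(baseline)`.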
Common Patterns
Train-Test-Validation Split
```python
from hana_ml.algorithms.pal.partition import train_test_val_split

train, test, val = train_test_val_split(
    data=df,
    training_percentage=0.7,
    testing_percentage=0.2,
    validation_percentage=0.1
)
```
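The three percentages are fractions of the input rows, and here they sum to 1.0. The sketch below illustrates only that arithmetic. It shuffles rows locally with a seeded random generator and cuts them into three slices. `split_three_ways` is a hypothetical helper, not part of hana-ml, and PAL's in-database partitioning may round and assign rows differently.

```python
import random


def split_three_ways(rows, training_percentage, testing_percentage,
                     validation_percentage, seed=None):
    """Hypothetical helper: shuffle rows, then slice them by percentage."""
    total = training_percentage + testing_percentage + validation_percentage
    if abs(total - 1.0) > 1e-9:
        raise ValueError("percentages must sum to 1.0")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * training_percentage)
    n_test = round(len(shuffled) * testing_percentage)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    val = shuffled[n_train + n_test:]      # any rounding remainder lands here
    return train, test, val


train, test, val = split_three_ways(range(1000), 0.7, 0.2, 0.1, seed=42)
print(len(train), len(test), len(val))
```

With 1,000 rows and a 0.7/0.2/0.1 split, this sketch yields 700, 200, and 100 rows, and every row lands in exactly one slice.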
Feature Importance
```python
# APL models
importance = auto_clf.get_feature_importances()

# PAL models
from hana_ml.algorithms.pal.preprocessing import FeatureSelection

features = ['F1', 'F2', 'F3']  # feature column names, as in the examples above
# FeatureSelection may also need a selection-method argument;
# see references/PAL_ALGORITHMS.md for its parameters.
fs = FeatureSelection()
fs.fit(train_df, features=features, label='TARGET')
```

Pipeline
```python
from hana_ml.algorithms.pal.pipeline import Pipeline
from hana_ml.algorithms.pal.preprocessing import Imputer, FeatureNormalizer
from hana_ml.algorithms.pal.unified_classification import UnifiedClassification

# Steps run in order: impute missing values, normalize, then classify
pipeline = Pipeline([
    ('imputer', Imputer(strategy='mean')),
    ('normalizer', FeatureNormalizer()),
    ('classifier', UnifiedClassification(func='RandomDecisionTree'))
])
```
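Conceptually, each step in a pipeline like this one is fit on the output of the step before it. The statistics each step learns are then reused when new data is transformed. The plain-Python sketch below illustrates only that fit-and-pass-along pattern, and only for the two preprocessing steps. `MeanImputer`, `ZScoreNormalizer`, and `ToyPipeline` are hypothetical classes, and z-score scaling is an assumed choice. The sketch says nothing about how hana-ml's `Pipeline` runs its PAL steps inside HANA.

```python
from statistics import mean, pstdev


class MeanImputer:
    """Hypothetical step: replace None with the column mean learned in fit()."""

    def fit(self, rows):
        cols = list(zip(*rows))
        self.means = [mean(v for v in col if v is not None) for col in cols]
        return self

    def transform(self, rows):
        return [[m if v is None else v for v, m in zip(row, self.means)] for row in rows]


class ZScoreNormalizer:
    """Hypothetical step: standardize columns with mean/std learned in fit()."""

    def fit(self, rows):
        cols = list(zip(*rows))
        self.stats = [(mean(col), pstdev(col) or 1.0) for col in cols]  # avoid /0
        return self

    def transform(self, rows):
        return [[(v - mu) / sd for v, (mu, sd) in zip(row, self.stats)] for row in rows]


class ToyPipeline:
    def __init__(self, steps):
        self.steps = steps  # list of (name, step) pairs, applied in order

    def fit_transform(self, rows):
        for _name, step in self.steps:
            rows = step.fit(rows).transform(rows)  # fit on this step's input, pass output on
        return rows

    def transform(self, rows):
        for _name, step in self.steps:
            rows = step.transform(rows)  # reuse statistics learned during fit
        return rows


pipe = ToyPipeline([('imputer', MeanImputer()), ('normalizer', ZScoreNormalizer())])
train_out = pipe.fit_transform([[1.0, 10.0], [None, 20.0], [3.0, None]])
print(train_out)
```

`transform` reuses the training statistics. For example, a new row `[None, 15.0]` maps to `[0.0, 0.0]`, because after imputation both values equal the training column means.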
Best Practices
- Use lazy evaluation: operations build SQL without executing it until `collect()` is called
- Leverage in-database processing: keep data in HANA for performance
- Use Unified interfaces: consistent APIs across algorithms
- Save models: use `ModelStorage` for persistence
- Explain predictions: use SHAP explainers for interpretability
- Monitor AutoML: use `PipelineProgressStatusMonitor` for long-running jobs
Bundled Resources
Reference Files
- `references/DATAFRAME_REFERENCE.md` (479 lines): ConnectionContext API, DataFrame operations, SQL generation
- `references/PAL_ALGORITHMS.md` (869 lines): complete PAL algorithm reference (100+ algorithms)
  - Classification, Regression, Clustering, Time Series, Preprocessing
- `references/APL_ALGORITHMS.md` (534 lines): AutoML capabilities, automated feature engineering
  - AutoClassifier, AutoRegressor, GradientBoosting classes
- `references/VISUALIZERS.md` (704 lines): 14 visualization modules (EDA, SHAP, metrics, time series)
  - Plot types, configuration, export options
- `references/SUPPORTING_MODULES.md` (626 lines): model storage, spatial analytics, graph algorithms
  - Text mining, statistics, error handling
Error Handling
```python
from hdbcli import dbapi
from hana_ml.ml_exceptions import Error

try:
    clf.fit(train_df, features=features, label='TARGET')
except Error as e:
    # Errors raised by hana-ml itself
    print(f"HANA ML Error: {e}")
except dbapi.Error as e:
    # SQL/PAL errors reported by the HANA database driver
    print(f"Database Error: {e}")
```