# Binary Classification

This example demonstrates how to evaluate `QCMLClassifier` performance on a binary classification task using cross-validation, and how to compare it with traditional machine learning methods.

## Overview

This comparison study includes:

- Rigorous evaluation using 5-fold stratified cross-validation
- Proper preprocessing with feature standardization
- Multiple metrics to assess performance comprehensively
- A head-to-head comparison with established scikit-learn models

The analysis shows `QCMLClassifier` achieving competitive performance against traditional methods.
## Complete Example

```python
from honeio.integrations.sklearn.qcmlsklearn import QCMLClassifier
import pandas as pd
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler

# Load breast cancer dataset and set up CV
SEED = 0
K_FOLDS = 5

X, y = datasets.load_breast_cancer(return_X_y=True)
kf = StratifiedKFold(n_splits=K_FOLDS, shuffle=True, random_state=SEED)

print(f"X shape: {X.shape}")
print(f"y shape: {y.shape}")
print(f"y num classes: {len(set(y))}")
```
### Dataset Information

The breast cancer dataset provides a good benchmark for binary classification:

```
X shape: (569, 30)
y shape: (569,)
y num classes: 2
```

- 569 samples with 30 features
- Binary classification problem (malignant vs. benign)
- Reasonably balanced classes (212 malignant, 357 benign), suitable for standard metrics
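You can confirm the class distribution directly; a quick sketch using NumPy (labels follow scikit-learn's convention of 0 = malignant, 1 = benign):

```python
import numpy as np
from sklearn import datasets

data = datasets.load_breast_cancer()
X, y = data.data, data.target

# Count samples per class label
counts = np.bincount(y)
print({str(name): int(n) for name, n in zip(data.target_names, counts)})
# {'malignant': 212, 'benign': 357}
```

Because `StratifiedKFold` preserves this ratio in every fold, each test fold sees roughly the same class mix as the full dataset.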
### Cross-Validation Setup

```python
# Initialize models and run 5-fold CV
model_list = [
    QCMLClassifier(),
    LogisticRegression(),
    RandomForestClassifier(),
]
error_funcs = [
    balanced_accuracy_score,
    accuracy_score,
    f1_score,
]

error_stats = {}
for model in model_list:
    model_name = model.__class__.__name__
    print(f"Training {model_name}...")
    for fold, (train_index, test_index) in enumerate(kf.split(X, y)):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]

        # Standardize the features, fitting the scaler on the training fold only
        scaler = StandardScaler()
        X_train_scaled = scaler.fit_transform(X_train)
        X_test_scaled = scaler.transform(X_test)

        # Train the model using the training set
        model.fit(X_train_scaled, y_train)

        # Make predictions using the test set
        y_pred = model.predict(X_test_scaled)
        for error_func in error_funcs:
            error_stats.setdefault((model_name, fold), {})[error_func.__name__] = error_func(y_test, y_pred)
```
### Training Output

During training, you'll see community edition warnings for `QCMLClassifier`:

```
2025-08-07 11:17:14 [warning ]
You are using the community edition of honeio.
There are some limitations that can be lifted by purchasing a commercial license.
Please contact support@qognitive.io for more information.

Training QCMLClassifier...
[Multiple community edition warnings during 5-fold CV]
Training LogisticRegression...
Training RandomForestClassifier...
```
## Results Analysis

```python
# Summarize results: average each metric across folds per model
error_stats_df = pd.DataFrame(error_stats).T
average_error_stats = error_stats_df.groupby(level=0).mean()
average_error_stats.sort_values("balanced_accuracy_score", ascending=False, inplace=True)
print(average_error_stats)
```
### Performance Results

The cross-validation results show competitive performance:

| Model | Balanced Accuracy | Accuracy | F1 Score |
|---|---|---|---|
| QCMLClassifier | 0.9765 | 0.9789 | 0.9832 |
| LogisticRegression | 0.9747 | 0.9789 | 0.9834 |
| RandomForestClassifier | 0.9585 | 0.9649 | 0.9723 |
### Key Findings

**QCMLClassifier Performance**

- Highest balanced accuracy at 97.65%
- Competitive with LogisticRegression across all metrics
- Outperforms RandomForestClassifier on every metric

**Model Comparison Insights**

- QCMLClassifier achieves the best balanced accuracy on this dataset
- Minimal performance gap with traditional linear methods
- Clear edge over the tree-based ensemble
## Best Practices Demonstrated

**Cross-Validation Strategy**

- Stratified K-Fold maintains class balance across folds
- A fixed random seed ensures reproducible results
- Multiple metrics provide a comprehensive evaluation

**Data Preprocessing**

- StandardScaler normalizes features for a fair comparison
- The scaler is fit on each training fold only, so no test-fold statistics leak into training
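Fold-specific scaling can also be expressed with scikit-learn's `Pipeline`, which refits every step inside each CV split automatically. A minimal sketch using `LogisticRegression`; assuming QCMLClassifier follows the standard sklearn estimator API, it should drop into the same slot:

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_breast_cancer(return_X_y=True)

# The scaler inside the pipeline is re-fit on each training fold,
# so no test-fold statistics leak into training.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=kf, scoring="balanced_accuracy")
print(scores.mean())
```

This removes the manual fit/transform bookkeeping from the CV loop entirely.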
**Evaluation Methodology**

- Balanced accuracy accounts for class imbalance
- Standard accuracy measures overall correctness
- F1 score balances precision and recall
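To see why balanced accuracy matters, consider a toy illustration where a classifier simply predicts the majority class on an imbalanced dataset:

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score

# 90 negatives, 10 positives; the "classifier" predicts all negatives
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100

print(accuracy_score(y_true, y_pred))           # 0.9 - looks good
print(balanced_accuracy_score(y_true, y_pred))  # 0.5 - no better than chance
print(f1_score(y_true, y_pred))                 # 0.0 - positives never found
```

On the breast cancer dataset the classes are close enough to balanced that the three metrics agree, but reporting all three guards against this failure mode.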
## Next Steps

**Parameter Tuning**

- Experiment with QCMLClassifier hyperparameters
- Try different numbers of epochs
- Explore `hilbert_space_dim` settings

**Extended Comparisons**

- Include more traditional models (SVM, XGBoost)
- Test on different datasets
- Compare training times and resource usage

**Advanced Analysis**

- Statistical significance testing
- Learning curve analysis
- Feature importance comparison
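Per-fold scores like those collected above can feed a paired significance test. A sketch using `scipy.stats.ttest_rel` on two standard sklearn models (substitute QCMLClassifier for either one; it is omitted here only because its per-fold scores require the honeio package):

```python
from scipy.stats import ttest_rel
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_breast_cancer(return_X_y=True)
kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Evaluate both models on the same folds so the scores are paired per fold
lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
rf = RandomForestClassifier(random_state=0)
lr_scores = cross_val_score(lr, X, y, cv=kf, scoring="balanced_accuracy")
rf_scores = cross_val_score(rf, X, y, cv=kf, scoring="balanced_accuracy")

stat, p_value = ttest_rel(lr_scores, rf_scores)
print(f"t = {stat:.3f}, p = {p_value:.3f}")
```

Note that with only 5 folds the test has little power; repeated CV (e.g. `RepeatedStratifiedKFold`) gives more score pairs and more reliable p-values.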
**Related Examples**

- See Intro to QCML for an introduction to QCML
- Check Multiclass Classification for 10-class classification examples
- Try Regression for continuous target prediction examples
- Explore GPU vs CPU Benchmark for hardware performance optimization
- Review Scikit-learn Integration for parameter details