Multiclass Classification
This example demonstrates how to evaluate QCMLClassifier performance on a multiclass classification task using the digits dataset and compare it with traditional machine learning methods. This extends beyond binary classification to showcase QCMLClassifier’s capabilities on more complex problems.
Overview
This multiclass evaluation includes:
10-class classification on the handwritten digits dataset
Community edition optimization with limited observations (1000 samples)
Batch processing with configured epochs and batch size
Cross-validation comparison with established sklearn models
Performance analysis across multiple classes
In this run, QCMLClassifier scores slightly higher than both sklearn baselines, by less than one percentage point on each metric.
Complete Example
from honeio.integrations.sklearn.qcmlsklearn import QCMLClassifier
import pandas as pd
import torch
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
# Load digits dataset and set up CV
SEED = 0
K_FOLDS = 5
max_obs = 1000 # use only first 1000 observations for community edition
X, y = datasets.load_digits(return_X_y=True)
X = X[:max_obs]
y = y[:max_obs]
kf = StratifiedKFold(n_splits=K_FOLDS, shuffle=True, random_state=SEED)
print(f"X shape: {X.shape}")
print(f"y shape: {y.shape}")
print(f"y num classes: {len(set(y))}")
Dataset Information
The digits dataset provides a challenging benchmark for multiclass classification:
X shape: (1000, 64)
y shape: (1000,)
y num classes: 10
1000 samples with 64 features (8x8 pixel images)
10-class classification problem (digits 0-9)
Approximately balanced across classes for fair evaluation
Limited size optimized for community edition performance
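The class balance of the 1000-sample subset can be checked directly. The following is a minimal sketch that only counts labels; it assumes the same `max_obs` cap as the example and uses `numpy.bincount`, which is not part of the original script.

```python
import numpy as np
from sklearn import datasets

max_obs = 1000  # same cap as in the example above

_, y = datasets.load_digits(return_X_y=True)
y = y[:max_obs]

# Count how many samples each digit (0-9) contributes to the subset
counts = np.bincount(y, minlength=10)
for digit, count in enumerate(counts):
    print(f"digit {digit}: {count} samples ({count / max_obs:.1%})")
```

If any class were far from 10% of the subset, plain accuracy and balanced accuracy could diverge noticeably.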
Community Edition Considerations
This example is optimized for the community edition by:
Limiting observations to 1000 samples for faster execution
Configured batch size (100) for efficient processing
Reasonable epochs (100) to balance performance and training time
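To get a feel for what these settings mean in terms of training work, the arithmetic below estimates the number of mini-batches processed per fold. It is a rough sketch that assumes QCMLClassifier makes one pass over the training fold per epoch in batches of `batch_size`; that behaviour is an assumption, not something confirmed by this page.

```python
import math

n_samples = 1000   # max_obs in the example
k_folds = 5        # K_FOLDS in the example
batch_size = 100
epochs = 100

# Each fold trains on (k - 1) / k of the data
train_size = n_samples * (k_folds - 1) // k_folds

# Assumption: one full pass over the training fold per epoch
batches_per_epoch = math.ceil(train_size / batch_size)
total_batches = batches_per_epoch * epochs

print(f"training samples per fold: {train_size}")
print(f"batches per epoch: {batches_per_epoch}")
print(f"batches per fold over {epochs} epochs: {total_batches}")
```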
Cross-Validation Setup
# Initialize models and run 5-fold CV
model_list = [
    QCMLClassifier(epochs=100, batch_size=100),
    LogisticRegression(),
    RandomForestClassifier(),
]
error_funcs = [
    balanced_accuracy_score,
    accuracy_score,
]
error_stats = {}
for model in model_list:
    model_name = model.__class__.__name__
    print(f"Training {model_name}...")
    for fold, (train_index, test_index) in enumerate(kf.split(X, y)):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        # Standardize the features (scaler is fit on the training fold only)
        scaler = StandardScaler()
        X_train_scaled = scaler.fit_transform(X_train)
        X_test_scaled = scaler.transform(X_test)
        # Train the model using the training sets
        model.fit(X_train_scaled, y_train)
        # Make predictions using the testing set
        y_pred = model.predict(X_test_scaled)
        # Record each metric under the (model, fold) key
        for error_func in error_funcs:
            error_stats.setdefault((model_name, fold), {})[error_func.__name__] = error_func(y_test, y_pred)
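The manual loop above can also be expressed with sklearn's `Pipeline` and `cross_validate`, which refit the scaler inside each fold automatically. The sketch below uses `LogisticRegression` only, so it runs without honeio. Whether `QCMLClassifier` can be dropped into the same pipeline depends on it supporting sklearn's estimator cloning, which is assumed here rather than verified. `max_iter=1000` is an added setting to avoid convergence warnings and is not part of the original example.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_digits(return_X_y=True)
X, y = X[:1000], y[:1000]
kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# The pipeline refits the scaler on each training fold, so no test-fold
# statistics leak into preprocessing.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

scores = cross_validate(
    pipeline, X, y, cv=kf, scoring=["balanced_accuracy", "accuracy"]
)
print("balanced accuracy:", scores["test_balanced_accuracy"].mean())
print("accuracy:", scores["test_accuracy"].mean())
```

The explicit loop in the example keeps every step visible; the pipeline form is shorter and makes it harder to accidentally fit the scaler on test data.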
Training Output
During training, you’ll see community edition warnings throughout the process:
2025-08-07 11:31:13 [warning ]
You are using the community edition of honeio.
There are some limitations that can be lifted by purchasing a commercial license.
Please contact support@qognitive.io for more information.
Training QCMLClassifier...
[Multiple community edition warnings during 5-fold CV]
Training LogisticRegression...
Training RandomForestClassifier...
Results Analysis
# Summarize results
error_stats_df = pd.DataFrame(error_stats).T
average_error_stats = error_stats_df.groupby(level=0).mean()
average_error_stats.sort_values('balanced_accuracy_score', ascending=False, inplace=True)
print(average_error_stats)
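The summary above reports only fold means. Because `error_stats` is keyed by `(model_name, fold)`, the same `groupby(level=0)` can also report the spread across folds. The sketch below uses placeholder scores for two hypothetical models (`ModelA`, `ModelB`); the numbers are made up to show the mechanics and are not results from this example.

```python
import pandas as pd

# Placeholder scores, NOT results from the run above:
# two hypothetical models, three folds each.
error_stats = {
    ("ModelA", 0): {"balanced_accuracy_score": 0.90, "accuracy_score": 0.91},
    ("ModelA", 1): {"balanced_accuracy_score": 0.94, "accuracy_score": 0.93},
    ("ModelA", 2): {"balanced_accuracy_score": 0.92, "accuracy_score": 0.92},
    ("ModelB", 0): {"balanced_accuracy_score": 0.80, "accuracy_score": 0.82},
    ("ModelB", 1): {"balanced_accuracy_score": 0.84, "accuracy_score": 0.84},
    ("ModelB", 2): {"balanced_accuracy_score": 0.82, "accuracy_score": 0.86},
}

# Rows are indexed by (model, fold); columns are the metric names
error_stats_df = pd.DataFrame(error_stats).T

# Mean and sample standard deviation across folds, per model and metric
summary = error_stats_df.groupby(level=0).agg(["mean", "std"])
print(summary)
```

A standard deviation that is comparable to the gap between two models suggests the ranking may not be stable across folds.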
Performance Results
All three models exceed 97% on both metrics in cross-validation, with QCMLClassifier marginally ahead:
| Model | Balanced Accuracy | Accuracy |
|---|---|---|
| QCMLClassifier | 0.9780 | 0.9780 |
| RandomForestClassifier | 0.9741 | 0.9740 |
| LogisticRegression | 0.9720 | 0.9720 |
Key Findings
- QCMLClassifier Excellence
Highest performance across both metrics at 97.80%
Outperforms RandomForest by 0.4 percentage points
Beats Logistic Regression by 0.6 percentage points
- Multiclass Capabilities
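Balanced accuracy and plain accuracy agree to within 0.0001 for every model, as expected when classes are close to evenly represented
Scores are means over the 5 cross-validation folds; per-fold spread is not reported here
Margins over the traditional methods are under one percentage point on this dataset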
- Community Edition Performance
Strong results even with limited observations
Efficient training with optimized batch size
Practical training time with 100 epochs
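The percentage-point margins quoted above follow directly from the results table. As a small worked check, the sketch below recomputes them from the reported means; the dictionary simply transcribes the table, and the variable names are illustrative.

```python
# (balanced accuracy, accuracy) as reported in the results table
results = {
    "QCMLClassifier": (0.9780, 0.9780),
    "RandomForestClassifier": (0.9741, 0.9740),
    "LogisticRegression": (0.9720, 0.9720),
}
metric_names = ("balanced accuracy", "accuracy")
best = "QCMLClassifier"

margins = {}
for name, scores in results.items():
    if name == best:
        continue
    for metric, best_score, score in zip(metric_names, results[best], scores):
        # Difference expressed in percentage points
        margins[(name, metric)] = round((best_score - score) * 100, 2)
        print(f"{best} vs {name} ({metric}): {margins[(name, metric)]:.2f} pp")
```

Since each of the 1000 samples is used as a test sample once across the 5 folds, a 0.4 percentage-point gap in accuracy corresponds to roughly 4 predictions.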
Configuration Insights
- QCMLClassifier Parameters
epochs=100: Enough to reach the accuracy reported above on this dataset
batch_size=100: Trades memory use against gradient stability; not tuned in this example
Default hilbert_space_dim: Automatically sized for multiclass problem
- Dataset Optimization
max_obs=1000: Community edition limitation consideration
Stratified CV: Maintains class balance across folds
StandardScaler: Essential for feature normalization
Best Practices Demonstrated
- Multiclass Considerations
Balanced accuracy crucial for multiclass evaluation
Stratified sampling maintains class proportions
Cross-validation essential for reliable performance estimates
- Community Edition Optimization
Sample size limitation for faster execution
Batch size tuning for memory efficiency
Reasonable epoch count for practical training time
- Preprocessing Standards
Feature standardization critical for neural approaches
Fitting the scaler on the training fold only prevents test data from leaking into preprocessing
Consistent preprocessing across all models
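To illustrate why balanced accuracy is worth reporting alongside plain accuracy, the sketch below computes both by hand on a deliberately imbalanced toy example. Balanced accuracy is the mean of per-class recall, which is how `balanced_accuracy_score` is defined in sklearn; the toy labels are invented for illustration and are unrelated to the digits results.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy data: 8 samples of class 0, 2 of class 1, and a classifier
# that always predicts class 0.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10

print(accuracy(y_true, y_pred))           # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

On the digits subset the two metrics nearly coincide because the classes are close to evenly represented; on skewed data they can differ as sharply as in this toy case.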
Next Steps
- Parameter Exploration
Try different batch sizes (50, 200, 500)
Experiment with epoch counts (50, 200, 300)
Explore hilbert_space_dim values
- Extended Comparisons
Include SVM and XGBoost models
Test on other multiclass datasets
Compare training times and resource usage
- Advanced Analysis
Confusion matrix analysis per class
Class-specific performance metrics
Error analysis and misclassification patterns
- Dataset Scaling
Full digits dataset (1797 samples) with commercial license
Other multiclass datasets (wine, iris, newsgroups)
High-dimensional multiclass problems
- Related Examples
See Intro to QCML for an introduction to QCML
Check Binary Classification for binary classification comparison
Try Regression for continuous target prediction examples
Explore GPU vs CPU Benchmark for hardware performance optimization
Review Scikit-learn Integration for parameter details