Binary Classification

This example demonstrates how to evaluate QCMLClassifier performance on a binary classification task using cross-validation and compare it with traditional machine learning methods.

Overview

This comparison study includes:

  • Rigorous evaluation using 5-fold stratified cross-validation

  • Proper preprocessing with feature standardization

  • Multiple metrics to assess performance comprehensively

  • Head-to-head comparison with established sklearn models

The analysis shows QCMLClassifier achieving competitive performance against traditional methods.

Complete Example

from honeio.integrations.sklearn.qcmlsklearn import QCMLClassifier

import pandas as pd

from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler

# Load breast cancer dataset and set up CV
SEED = 0
K_FOLDS = 5

X, y = datasets.load_breast_cancer(return_X_y=True)
kf = StratifiedKFold(n_splits=K_FOLDS, shuffle=True, random_state=SEED)

print(f"X shape: {X.shape}")
print(f"y shape: {y.shape}")
print(f"y num classes: {len(set(y))}")

Dataset Information

The breast cancer dataset provides a good benchmark for binary classification:

X shape: (569, 30)
y shape: (569,)
y num classes: 2

  • 569 samples with 30 features

  • Binary classification problem (malignant vs benign)

  • Moderately imbalanced (212 malignant, 357 benign), which is why balanced accuracy is reported alongside plain accuracy
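The class distribution can be checked directly from the dataset (a quick standalone check, not part of the main example):

```python
import numpy as np
from sklearn import datasets

data = datasets.load_breast_cancer()

# Count samples per class: label 0 = malignant, label 1 = benign
counts = {str(name): int(n) for name, n in zip(data.target_names, np.bincount(data.target))}
print(counts)  # {'malignant': 212, 'benign': 357}
```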

Cross-Validation Setup

# Initialize models and run 5-fold CV
model_list = [
    QCMLClassifier(),
    LogisticRegression(),
    RandomForestClassifier(),
]

error_funcs = [
    balanced_accuracy_score,
    accuracy_score,
    f1_score,
]

error_stats = {}
for model in model_list:
    model_name = model.__class__.__name__
    print(f"Training {model_name}...")
    for fold, (train_index, test_index) in enumerate(kf.split(X, y)):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]

        # Standardize the features
        scaler = StandardScaler()
        X_train_scaled = scaler.fit_transform(X_train)
        X_test_scaled = scaler.transform(X_test)

        # Train the model using the training sets
        model.fit(X_train_scaled, y_train)

        # Make predictions using the testing set
        y_pred = model.predict(X_test_scaled)

        # Record each metric for this (model, fold) pair
        for error_func in error_funcs:
            error_stats.setdefault((model_name, fold), {})[error_func.__name__] = error_func(y_test, y_pred)

Training Output

During training, you’ll see community edition warnings for QCMLClassifier:

2025-08-07 11:17:14 [warning  ]
You are using the community edition of honeio.
There are some limitations that can be lifted by purchasing a commercial license.
Please contact support@qognitive.io for more information.

Training QCMLClassifier...
[Multiple community edition warnings during 5-fold CV]
Training LogisticRegression...
Training RandomForestClassifier...

Results Analysis

# Summarize results
error_stats_df = pd.DataFrame(error_stats).T
average_error_stats = error_stats_df.groupby(level=0).mean()
average_error_stats.sort_values('balanced_accuracy_score', ascending=False, inplace=True)
print(average_error_stats)
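Averaging alone hides fold-to-fold variability. The same groupby step can report standard deviations as well; a minimal sketch on a stand-in dict with the same `{(model_name, fold): {metric: value}}` shape as `error_stats`:

```python
import pandas as pd

# Stand-in for error_stats, keyed by (model_name, fold)
error_stats = {
    ("ModelA", 0): {"accuracy_score": 0.97},
    ("ModelA", 1): {"accuracy_score": 0.95},
    ("ModelB", 0): {"accuracy_score": 0.93},
    ("ModelB", 1): {"accuracy_score": 0.91},
}

df = pd.DataFrame(error_stats).T
# Group by model name (level 0 of the index) and report mean and spread
summary = df.groupby(level=0).agg(["mean", "std"])
print(summary)
```

The standard deviation gives a rough error bar for each mean score, which is useful when two models are as close as the top two here.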

Performance Results

The cross-validation results show competitive performance:

Model Comparison Results

Model                    Balanced Accuracy   Accuracy   F1 Score
QCMLClassifier           0.9765              0.9789     0.9832
LogisticRegression       0.9747              0.9789     0.9834
RandomForestClassifier   0.9585              0.9649     0.9723

Key Findings

QCMLClassifier Performance
  • Highest balanced accuracy at 97.65%

  • Competitive with LogisticRegression across all metrics

  • Ahead of RandomForestClassifier on every metric reported

Model Comparison Insights
  • QCMLClassifier posts the best balanced accuracy of the three models on this dataset

  • The gap to LogisticRegression is minimal across all metrics

  • Both clearly outperform the tree-based RandomForestClassifier baseline here

Best Practices Demonstrated

Cross-Validation Strategy
  • Stratified K-Fold maintains class balance across folds

  • Fixed random seed ensures reproducible results

  • Multiple metrics provide comprehensive evaluation

Data Preprocessing
  • StandardScaler normalizes features so scale-sensitive models are compared fairly

  • The scaler is fit on each fold's training split only, then applied to the test split, preventing data leakage
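The same leakage-safe pattern can be expressed more compactly with a sklearn Pipeline, which refits the scaler inside every fold automatically. A sketch using LogisticRegression (QCMLClassifier could be dropped into the pipeline the same way):

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_breast_cancer(return_X_y=True)

# The pipeline fits the scaler on each fold's training split only,
# so the test split never influences the standardization parameters.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scores = cross_val_score(pipe, X, y, cv=kf, scoring="balanced_accuracy")
print(scores.mean())
```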

Evaluation Methodology
  • Balanced accuracy accounts for class imbalance

  • Standard accuracy for general performance

  • F1 score balances precision and recall
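On a small hand-made example, the three metrics diverge under imbalance, which is why all three are reported:

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score

# Imbalanced ground truth: eight positives, two negatives
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
# A degenerate classifier that always predicts the majority class
y_pred = [1] * 10

print(accuracy_score(y_true, y_pred))           # 0.8
print(balanced_accuracy_score(y_true, y_pred))  # 0.5, penalizes ignoring the minority class
print(f1_score(y_true, y_pred))                 # ~0.889
```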

Next Steps

Parameter Tuning
  • Experiment with QCMLClassifier hyperparameters

  • Try different numbers of epochs

  • Explore hilbert_space_dim settings
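Hyperparameter tuning can follow the standard sklearn GridSearchCV pattern. The sketch below uses LogisticRegression as a stand-in estimator; for QCMLClassifier, swap in the estimator and replace the grid keys with its actual hyperparameter names (such as the epoch count and hilbert_space_dim) as documented by honeio, since the names here are only illustrative:

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_breast_cancer(return_X_y=True)

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
# Grid keys use the "<step name>__<param>" convention; for QCMLClassifier,
# substitute its own parameters (names here are stand-ins, check the docs).
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(
    pipe,
    param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="balanced_accuracy",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 4))
```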

Extended Comparisons
  • Include more traditional models (SVM, XGBoost)

  • Test on different datasets

  • Compare training times and resource usage

Advanced Analysis
  • Statistical significance testing

  • Learning curves analysis

  • Feature importance comparison
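For significance testing, a paired t-test on the per-fold scores (paired by fold) is a common first pass, with the caveat that CV folds are not fully independent, so the p-value is only indicative. A sketch with hypothetical per-fold balanced-accuracy scores for two models:

```python
from scipy.stats import ttest_rel

# Hypothetical per-fold balanced-accuracy scores (five folds each),
# paired by fold index; substitute the real values from error_stats
scores_a = [0.976, 0.971, 0.982, 0.974, 0.979]
scores_b = [0.958, 0.952, 0.966, 0.955, 0.961]

stat, p_value = ttest_rel(scores_a, scores_b)
print(f"t = {stat:.3f}, p = {p_value:.4f}")
```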

Related Examples