If you use scikit-learn for long enough, you’re bound to run into cryptic errors. For example, when performing hyperparameter tuning with XGBoost using scikit-learn’s RandomizedSearchCV, you might encounter this one: 

AttributeError: 'super' object has no attribute '__sklearn_tags__'

This post explains what this error means, why it occurs, and how to resolve it step by step. We’ll use an XGBRegressor tuned with RandomizedSearchCV as the running example to keep the explanation practical; the same reasoning applies to custom estimators. 

What Are Scikit-learn Tags? 

Scikit-learn, a popular machine learning library in Python, uses a tagging system (the __sklearn_tags__ method) to assign properties to its estimators. These tags identify the capabilities and requirements of an estimator. For instance: 

  • Pipeline Integration: The tags determine how to pass data between different pipeline components. 
  • Validation: The tags help validate input data before processing to avoid runtime errors. 
  • Supervision: Whether the model is supervised or unsupervised. 
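
To make the idea of tags concrete, here is a minimal sketch that reads them from a built-in estimator. It assumes scikit-learn is installed; LinearRegression is only an illustrative choice, and the fallback branch uses the private _get_tags() helper, which is an assumption about releases older than 1.6 rather than a public API.

```python
from sklearn.linear_model import LinearRegression

est = LinearRegression()

if hasattr(est, "__sklearn_tags__"):
    # scikit-learn 1.6 and later: tags come back as a dataclass
    tags = est.__sklearn_tags__()
    estimator_type = tags.estimator_type
    needs_y = tags.target_tags.required
else:
    # Older releases (assumption): tags are a plain dict from a private helper
    tags = est._get_tags()
    estimator_type = getattr(est, "_estimator_type", None)
    needs_y = tags["requires_y"]

print("estimator type:", estimator_type)
print("requires y:", needs_y)
```

The "requires y" value corresponds to the supervision property listed above: a supervised estimator needs a target to fit.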

The error “‘super’ object has no attribute ‘__sklearn_tags__’” typically occurs when Scikit-learn attempts to access this method from a custom estimator, and it is either: 

  1. Misconfigured: The __sklearn_tags__ method is overridden incorrectly in the custom estimator. 
  2. Incompatible: The custom estimator is not aligned with the version of Scikit-learn being used. 
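
The wording of the message comes from Python itself: some method calls super().__sklearn_tags__() and the next class in line does not define it. The sketch below reproduces that message with hypothetical stand-in classes (BaseLike, RegressorMixinLike, GoodOrder and BadOrder are illustrations, not scikit-learn or XGBoost classes). It assumes the failure is caused by inheritance order, which is one plausible cause rather than a confirmed diagnosis of any particular library.

```python
class BaseLike:
    """Stand-in for an estimator base class (hypothetical)."""

    def __sklearn_tags__(self):
        return {"estimator_type": None}


class RegressorMixinLike:
    """Stand-in for a mixin that refines the tags of the next class."""

    def __sklearn_tags__(self):
        # Delegates to whatever class follows this one in the MRO
        tags = super().__sklearn_tags__()
        tags["estimator_type"] = "regressor"
        return tags


class GoodOrder(RegressorMixinLike, BaseLike):
    """Mixin first: super() inside the mixin reaches BaseLike."""


class BadOrder(BaseLike, RegressorMixinLike):
    """Mixin last: super() inside the mixin reaches object."""


print(GoodOrder().__sklearn_tags__())

message = None
try:
    # Calling the mixin's method directly mimics a caller that walks
    # the class hierarchy and asks each class for its tags
    RegressorMixinLike.__sklearn_tags__(BadOrder())
except AttributeError as exc:
    message = str(exc)
    print(message)
```

With the mixin listed last, nothing after it in the method resolution order defines __sklearn_tags__, so the lookup falls through to object and fails.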

Understanding the Context 

When using XGBoost with Scikit-learn’s RandomizedSearchCV for hyperparameter tuning, we rely on Scikit-learn’s tagging system to: 

  • Validate the compatibility between XGBoost and Scikit-learn 
  • Ensure proper data handling in the cross-validation process 
  • Manage the parameter search efficiently 

Reproducing the Error 

Here’s a typical scenario where this error occurs when trying to tune an XGBRegressor: 

Recently, I started working on a Weather Prediction System, a project requiring machine learning models to forecast temperature and precipitation. For this project, I chose to use XGBoost, a powerful gradient boosting algorithm, combined with scikit-learn for hyperparameter tuning using RandomizedSearchCV. 

I used VS Code as my development environment. Here’s how I set up and ran the code. 

  1. Navigate to your desired location and create a folder: 
mkdir weather_prediction
cd weather_prediction
  2. Set up a virtual environment: 
python -m venv venv
source venv/bin/activate
# On Windows:
venv\Scripts\activate
  3. Install the necessary Python packages: 
pip install scikit-learn xgboost numpy
  4. Launch VS Code in the project folder: 
code .
  5. Create a file named train_model.py. 

Add and Run the Code 

  1. Paste the code into train_model.py
from sklearn.model_selection import RandomizedSearchCV 
from xgboost import XGBRegressor 
from sklearn.datasets import make_regression 
import numpy as np 
 
# Generate sample regression data 
X, y = make_regression(n_samples=100, n_features=10, random_state=42) 
 
# Initialize XGBoost regressor 
model = XGBRegressor() 
 
# Define parameter search space 
param_dist = { 
    'max_depth': [3, 4, 5], 
    'learning_rate': [0.01, 0.1], 
    'n_estimators': [100, 200], 
    'min_child_weight': [1, 3], 
    'subsample': [0.8, 0.9] 
} 
 
# Setup RandomizedSearchCV 
search = RandomizedSearchCV( 
    model, 
    param_dist, 
    cv=3, 
    n_iter=4, 
    n_jobs=-1, 
    random_state=42 
) 
 
# This line triggers the error with incompatible versions 
search.fit(X, y) 

 
  2. Run the file: 
python train_model.py 

Encountered Error 

When running the code, I encountered the following error: 

'super' object has no attribute '__sklearn_tags__' 
 

This is how I ran into the error in my Weather Prediction System. 


This error occurs due to an incompatibility between XGBoost and scikit-learn versions. Specifically, the XGBoost version used did not fully support the newer scikit-learn interface. 

It typically arises when scikit-learn 1.6 or newer, the release line that introduced the __sklearn_tags__ interface, is paired with an XGBoost release that predates support for it. 

To avoid the mismatch, install a combination of versions that works together: 

# Option A: Use older scikit-learn
 
pip install "scikit-learn<1.6" 
pip install xgboost 
 
# Option B: Use newer versions with warning instead of error
 
pip install "scikit-learn>=1.6.1" 
pip install xgboost 

You can also print the installed versions to see which combination you have: 

import sklearn 
import xgboost 
 
print(f"scikit-learn version: {sklearn.__version__}") 
print(f"XGBoost version: {xgboost.__version__}") 
 
# Recommended combinations: 
# scikit-learn < 1.6 with any XGBoost version 
# scikit-learn >= 1.6.1 with XGBoost >= 2.0.3 
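
If you would rather check the versions without importing either library, the following sketch reads them from the installed package metadata using only the standard library. The helper names (parse_version, uses_sklearn_tags, installed_version) are hypothetical, and the only threshold it encodes is that scikit-learn 1.6 introduced __sklearn_tags__; it deliberately does not assume a minimum XGBoost version.

```python
import re
from importlib import metadata


def parse_version(text):
    """Return up to three leading numeric components, e.g. '1.6.1rc1' -> (1, 6, 1)."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def uses_sklearn_tags(sklearn_version):
    """True when the scikit-learn version is 1.6 or newer."""
    return parse_version(sklearn_version) >= (1, 6)


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("scikit-learn", "xgboost"):
    print(name, installed_version(name) or "not installed")

skl = installed_version("scikit-learn")
if skl is not None and uses_sklearn_tags(skl):
    print("scikit-learn uses __sklearn_tags__; check that XGBoost is recent enough.")
```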


Resolving the Error 

1. Upgrade or Downgrade Libraries 

As discussed earlier: 

  • Upgrade XGBoost to the latest release, which is the most likely to support the current scikit-learn interface 
pip install --upgrade xgboost 
  • Or downgrade scikit-learn to a release below 1.6 
pip install "scikit-learn<1.6" 
 

2. Use Latest Development Version 

For the bleeding edge fixes: 

pip install git+https://github.com/dmlc/xgboost.git 

3. Alternative: Manual Hyperparameter Search 

Instead of downgrading or upgrading, you can bypass the issue by writing the hyperparameter search yourself. This involves specifying the parameter grid and the evaluation loop directly, without relying on the __sklearn_tags__ mechanism. 

Manual Hyperparameter Search: Instead of relying on RandomizedSearchCV, the code iterates through every combination of hyperparameters using the product function from itertools. 

Model Evaluation: For each hyperparameter combination, the model is trained and scored using mean squared error (MSE). 

Best Parameters: After all combinations have been evaluated, the best parameters are printed along with the best score. 

from itertools import product

from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Step 1: Generate sample data and hold out a validation set,
# so scores are not measured on the data used for training
X, y = make_regression(n_samples=100, n_features=10, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Step 2: Initialize XGBoost Regressor
model = XGBRegressor()

# Step 3: Define Parameter Search Space
param_dist = {
    'max_depth': [3, 4, 5],
    'learning_rate': [0.01, 0.1],
    'n_estimators': [100, 200],
    'min_child_weight': [1, 3],
    'subsample': [0.8, 0.9]
}

# Step 4: Manually Perform Hyperparameter Search
best_score = float('inf')
best_params = None

# Create all combinations of hyperparameter values
param_names = list(param_dist)
param_combinations = product(*param_dist.values())

# Step 5: Loop Through All Combinations
for values in param_combinations:
    params = dict(zip(param_names, values))
    model.set_params(**params)

    # Step 6: Train the Model on the training split
    model.fit(X_train, y_train)

    # Step 7: Evaluate on the validation split Using Mean Squared Error
    predictions = model.predict(X_val)
    score = mean_squared_error(y_val, predictions)

    # Step 8: Track the Best Hyperparameters and Score (lower MSE is better)
    if score < best_score:
        best_score = score
        best_params = params

# Step 9: Display Best Parameters and Best Score
print("Best Parameters:", best_params)
print("Best Score:", best_score)
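
The loop above tries every combination in the search space (48 here), whereas the RandomizedSearchCV call earlier was limited to n_iter=4 candidates. If you want a similar budget in the manual version, one option is to sample a few combinations first. This is a sketch using only the standard library; the seed and the sample size of 4 simply mirror the earlier settings, and it is not a reimplementation of scikit-learn’s own sampler.

```python
import random
from itertools import product

param_dist = {
    'max_depth': [3, 4, 5],
    'learning_rate': [0.01, 0.1],
    'n_estimators': [100, 200],
    'min_child_weight': [1, 3],
    'subsample': [0.8, 0.9],
}

# Build every combination as a dict keyed by parameter name
names = list(param_dist)
all_combinations = [
    dict(zip(names, values)) for values in product(*param_dist.values())
]

# Draw 4 distinct candidates with a fixed seed for reproducibility
rng = random.Random(42)
sampled = rng.sample(all_combinations, k=4)

for params in sampled:
    print(params)
```

Each sampled dict can be passed straight to model.set_params(**params) inside the evaluation loop.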
 

Conclusion 

While building the Weather Prediction System, I ran into this error and resolved it, learning about library compatibility issues along the way. Whether you upgrade or downgrade libraries or write a manual hyperparameter search, the fix comes down to keeping XGBoost and scikit-learn in agreement about the estimator interface. I hope this guide helps you address similar challenges! 
