When working with Natural Language Processing (NLP) models, particularly transformers like BERT, tokenization is a fundamental step. The “tokenizer.encode_plus” method from the Hugging Face “transformers” library is a popular choice for this. However, you might encounter errors, and this guide will help you navigate a common one.
The Problem: TypeError: _tokenize() got an unexpected keyword argument ‘pad_to_max_length’
You might be trying to tokenize sentences for a BERT classifier, using code similar to this:
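(The snippet below is a reconstruction for illustration; the sentences, model name, and max_length value are placeholders, but the encode_plus call follows the pattern that triggers the error.)

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

sentences = ["This movie was great.", "I did not enjoy it at all."]

encoded = [
    tokenizer.encode_plus(
        sentence,
        add_special_tokens=True,     # add [CLS] and [SEP]
        max_length=64,
        pad_to_max_length=True,      # <-- the argument that triggers the TypeError
        return_attention_mask=True,
        return_tensors="pt",
    )
    for sentence in sentences
]
```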
If you encounter the “TypeError: _tokenize() got an unexpected keyword argument ‘pad_to_max_length’”, it means that the padding argument you are passing does not match what your installed version of the library expects.
The Solution: Updating the Padding Argument
The “pad_to_max_length” argument has been deprecated in newer versions of the “transformers” library in favor of a unified “padding” argument. The correct way to pad to the maximum length is now to pass “padding” with the value ‘max_length’.
Here’s how to fix the code:
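Keeping everything else in the call the same, replace pad_to_max_length=True with padding='max_length' (a sketch, continuing the example above):

```python
encoded = [
    tokenizer.encode_plus(
        sentence,
        add_special_tokens=True,
        max_length=64,
        padding="max_length",        # replaces the deprecated pad_to_max_length=True
        truncation=True,             # recent versions also expect an explicit truncation setting with max_length
        return_attention_mask=True,
        return_tensors="pt",
    )
    for sentence in sentences
]
```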
This change aligns with the updated Hugging Face documentation.
Alternate Solution: Library Versioning
Another reason for this error could be an outdated version of the transformers library, especially if it was installed via conda. Older versions (such as 2.1.1) do not recognize the pad_to_max_length keyword at all; the unrecognized argument is passed down to the tokenizer’s internal _tokenize() method, which is exactly where the TypeError is raised.
In such cases, the best approach is to:
- Uninstall the current transformers library.
- Reinstall it using pip: pip install transformers.
- Alternatively, create a fresh conda environment and install the necessary packages with pip, rather than installing the transformers library from the conda-forge channel.
Ensuring you have an up-to-date version of the library can often resolve such unexpected argument errors.
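As a quick sanity check after reinstalling, you can print the installed version and run a minimal encode_plus call with the new padding argument (a sketch; the model name and lengths are arbitrary):

```python
import transformers
from transformers import BertTokenizer

print(transformers.__version__)  # should be a recent release, not an old 2.x version

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
out = tokenizer.encode_plus(
    "A quick test sentence.",
    max_length=16,
    padding="max_length",
    truncation=True,
)
print(len(out["input_ids"]))  # 16 -> padding to max_length is working
```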