When working with Natural Language Processing (NLP) models, particularly transformers like BERT, tokenization is a fundamental step. The “tokenizer.encode_plus” method from the Hugging Face “transformers” library is a popular choice for this. However, you might encounter errors, and this guide will help you navigate a common one.
The Problem: TypeError: _tokenize() got an unexpected keyword argument ‘pad_to_max_length’
You might be trying to tokenize sentences for a BERT classifier, using code similar to this:
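(The snippet below is a reconstruction for illustration; the sentences, model name, and max_length value are placeholders, but the encode_plus call follows the pattern that triggers the error.)

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

sentences = ["This movie was great.", "I did not enjoy it at all."]

encoded = [
    tokenizer.encode_plus(
        sentence,
        add_special_tokens=True,     # add [CLS] and [SEP]
        max_length=64,
        pad_to_max_length=True,      # <-- the argument that triggers the TypeError
        return_attention_mask=True,
        return_tensors="pt",
    )
    for sentence in sentences
]
```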
If you encounter the “TypeError: _tokenize() got an unexpected keyword argument ‘pad_to_max_length’”, it means that the padding argument you are passing does not match what your installed version of the library expects.
The Solution: Updating the Padding Argument
The “pad_to_max_length” argument has been deprecated in newer versions of the “transformers” library in favor of a unified “padding” argument. The correct way to pad to the maximum length is now to pass “padding” with the value ‘max_length’.
Here’s how to fix the code:
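Keeping everything else in the call the same, replace pad_to_max_length=True with padding='max_length' (a sketch, continuing the example above):

```python
encoded = [
    tokenizer.encode_plus(
        sentence,
        add_special_tokens=True,
        max_length=64,
        padding="max_length",        # replaces the deprecated pad_to_max_length=True
        truncation=True,             # recent versions also expect an explicit truncation setting with max_length
        return_attention_mask=True,
        return_tensors="pt",
    )
    for sentence in sentences
]
```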
This change aligns with the updated Hugging Face documentation.
Alternate Solution: Library Versioning
Another reason for this error could be an outdated version of the transformers library, especially if it was installed via conda. Older versions (such as 2.1.1) do not recognize the pad_to_max_length keyword at all; the unrecognized argument is passed down to the tokenizer’s internal _tokenize() method, which is exactly where the TypeError is raised.
In such cases, the best approach is to:
- Uninstall the current transformers library.
- Reinstall it using pip: pip install transformers.
- Alternatively, create a fresh conda environment and install the necessary packages with pip, rather than installing the transformers library from the conda-forge channel.
Ensuring you have an up-to-date version of the library can often resolve such unexpected argument errors.
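As a quick sanity check after reinstalling, you can print the installed version and run a minimal encode_plus call with the new padding argument (a sketch; the model name and lengths are arbitrary):

```python
import transformers
from transformers import BertTokenizer

print(transformers.__version__)  # should be a recent release, not an old 2.x version

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
out = tokenizer.encode_plus(
    "A quick test sentence.",
    max_length=16,
    padding="max_length",
    truncation=True,
)
print(len(out["input_ids"]))  # 16 -> padding to max_length is working
```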