WordPress Importer plugin Content-Length problem solved

As a WordPress developer, I’m often tasked with migrating web sites and content between different WordPress installations. There are a variety of reasons for this: maybe the client wants to switch hosting providers, maybe we’re pushing from a test environment to a production environment, or maybe a new developer has started and needs a working copy of a site.

There are, of course, also a number of ways to perform a task like this: backup plugins, migration plugins, cloning plugins, database dumps, file archives, and more. I’ve used each of these with varying levels of success, so depending on the specifics of what needs to be done, I choose the tool that is most appropriate for the job. Most of the time I just end up using the WordPress Importer plugin because it’s quick and easy.

Despite it being easy, every once in a while I encounter an issue importing photos and other media using this tool. It’s usually a minor problem and I end up manually fixing the missing images … but I always wonder why the problem is so sporadic. It usually only happens with a few of the files, not all of them. I thought, perhaps, it was due to a crappy hosting provider on the remote end being unable to serve the images properly.

After encountering the problem while doing maintenance on our own site, I knew that hosting could not be the issue. I was importing about 100 images and over a third of them were failing. Repeating the import yielded the same problem with exactly the same images. That led me to believe there was something very specific about this group of images that was causing them to fail.

As a developer, there’s only one thing to do: look at the code and narrow down where the problem is. So I dug into the wordpress-importer.php file and searched for the “Failed to import Media” error I was seeing. Based on that section of code, I found the IMPORT_DEBUG option. I turned it on and tried the import again. Now I was able to see more information about the failure; the enhanced logging showed the message “Remote file is incorrect size”.

Okay, so the images are being loaded from the remote server fine, but there is a mismatch with the file size that WordPress expects to receive. The importer raises this error when the value of the Content-Length HTTP header does not match the size of the file actually downloaded to disk. After some head scratching and Internet research, I came across a post on the WordPress support forums from another gentleman having the same issue.

I tested the import again using his proposed changes and was able to get the importer to load all of my images 100% of the time. That still left me wondering why the import had originally failed for only some images and not all of them, so I loaded a few of the failed images and a few of the successful images in my browser to compare the response headers.

I figured out that all of the failing images were very small. Since they were small, the web server could send each one in a single chunk and therefore set a Content-Length header in the response. The Content-Length did not match the file size because the remote server was using HTTP compression: the header reported the size of the compressed response, while the importer compared it against the decompressed file on disk.

The larger images were being chunked (sent in several small pieces), and when chunking is involved, no Content-Length header is sent because the server cannot determine in advance what the final Content-Length will be. Since no Content-Length header was being sent for this group of images, the importer plugin was not doing a size comparison and let these images in without an issue.
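The two cases above can be reproduced in a few lines of plain PHP. This is only a minimal sketch of the idea, not the plugin's actual code: size_check_passes() is a hypothetical helper, the repeated string is a stand-in for a small image, and gzencode() assumes the zlib extension is loaded.

```php
<?php
// Hypothetical helper (not taken from the importer): returns true when the
// downloaded file passes a Content-Length sanity check.
function size_check_passes(array $headers, int $bytesOnDisk): bool
{
    // No Content-Length header (e.g. a chunked response): nothing to compare.
    if (!isset($headers['content-length'])) {
        return true;
    }

    return (int) $headers['content-length'] === $bytesOnDisk;
}

// Stand-in for a small image: 2,000 bytes of highly compressible data.
$original   = str_repeat('pixel', 400);
$compressed = gzencode($original); // requires the zlib extension

// Small file, compressed in transit: the header reports the compressed size,
// but the file written to disk is the decompressed original.
$smallFile = size_check_passes(
    ['content-length' => (string) strlen($compressed)],
    strlen($original)
);

// Large file sent with chunked transfer encoding: no Content-Length at all,
// so the comparison is skipped.
$largeFile = size_check_passes(
    ['transfer-encoding' => 'chunked'],
    5000000
);

var_dump($smallFile); // bool(false)
var_dump($largeFile); // bool(true)
```

The small file fails only because the two numbers describe different things (bytes on the wire versus bytes on disk), which is why the same images failed on every attempt.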

Based on this testing, I determined that the WordPress Importer plugin is not compatible with remote servers that use HTTP compression. This led me to dig deeper into the WordPress core code to determine how WordPress handles compression compatibility in general. The WP_Http class sends an Accept-Encoding request header by default if the PHP installation can support deflate, compress, and/or gzip.
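As a rough illustration of that behavior, the header value can be assembled by checking which decoding functions the PHP installation provides. This is a sketch under my own assumptions rather than the core implementation: build_accept_encoding() is a hypothetical name, and the pairing of each encoding with a zlib function is chosen for illustration.

```php
<?php
// Hypothetical helper: builds an Accept-Encoding value from the decoders
// that this PHP installation actually provides.
function build_accept_encoding(): string
{
    // Assumed mapping of content coding => PHP function needed to decode it.
    $decoders = [
        'deflate'  => 'gzinflate',
        'compress' => 'gzuncompress',
        'gzip'     => 'gzdecode',
    ];

    $supported = [];
    foreach ($decoders as $coding => $function) {
        if (function_exists($function)) {
            $supported[] = $coding;
        }
    }

    // An empty string means no compressed responses are being requested.
    return implode(', ', $supported);
}

echo build_accept_encoding(), PHP_EOL;
```

On an installation with the zlib extension loaded, all three functions exist, so every encoding is advertised and the remote server is free to compress its responses.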

Thankfully, WordPress has great support for filters and hooks, which allow changing the behavior of core functionality without modifying the original source code. The WP_Http_Encoding::accept_encoding() method applies the wp_http_accept_encoding filter. I reverted the wordpress-importer.php file to its original state and instead told the WordPress HTTP client not to send any Accept-Encoding headers by using the following code in my theme’s functions.php file:

add_filter( 'wp_http_accept_encoding', function( $type, $url, $args ) { return array(); }, 10, 3 );

(Note: The above will only work with PHP 5.3.0 and higher, since it uses an anonymous function.)

Bingo! Another successful import with no images failing. This is because compression is no longer being used and now the Content-Length headers match the file size on disk.

I have reported this bug to the maintainers of the WordPress Importer plugin, but until it is fixed, I will include this filter as part of my standard WordPress setup tasks so I can go on importing images in the future.


Updated March 2019:

The original filter I created above will only work through WordPress v4.5.x. In v4.6, WordPress switched from using the WP_Http_Curl class to the Requests_Transport_cURL class, which doesn’t have the wp_http_accept_encoding filter.

For WordPress v4.6 and above, please use the following:

add_filter( 'http_request_args', function( $r, $url ) { $r['headers']['Accept-Encoding'] = 'identity'; return $r; }, 10, 2 );
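To see what that callback does to a request without running WordPress, it can be called directly on a sample array. This is just a sketch: the $args array and the URL are made-up examples shaped like the arguments the http_request_args filter receives, and $force_identity is a name I picked for illustration.

```php
<?php
// The same callback as above, held in a variable so it can be called directly.
$force_identity = function (array $r, string $url): array {
    // "identity" asks the server to send the body without any compression.
    $r['headers']['Accept-Encoding'] = 'identity';
    return $r;
};

// Made-up request arguments for illustration only.
$args = [
    'timeout' => 5,
    'headers' => ['Accept-Encoding' => 'deflate, gzip'],
];

$result = $force_identity($args, 'https://example.com/wp-content/uploads/photo.jpg');

echo $result['headers']['Accept-Encoding'], PHP_EOL; // identity
echo $result['timeout'], PHP_EOL;                    // 5
```

Only the Accept-Encoding header is overwritten; every other request argument passes through untouched. Note that the filter as written applies to every outgoing HTTP request WordPress makes, not just the importer's.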

Thanks to yearn in the comments below for letting me know the original filter no longer works.

