Fine-tuning JavaScript chunks with Webpack 4

Like many developers out there, we used to concatenate and minify our JavaScript codebases, because performance matters. We would try to reduce the number of requests to the server as much as possible, and maybe extract our most repeated dependencies (react, date-fns, maybe one or two lodash libraries not yet replaced by modern ECMAScript) into one big JavaScript chunk.

For context, a chunk is a file created by concatenating some of your modules. The modules are your JavaScript dependencies. The entrypoints are the files that you will include in your web page, which will pull in their dependencies as chunks. Ready to play with all this?

The old world: a single chunk containing all our dependencies

 cacheGroups: {
  // common chunk to the entire application
  vendors: {
    test: /node_modules\/(react|lodash|date-fns)/,
    priority: 10,
    minChunks: 5,
    name: 'vendors'
  }
}

We used to have a single shared chunk with most of our node_modules, which has its upsides and downsides:

  • Easy to understand and review (good)
  • Lowest possible number of files to download, maximizing cacheability (good)
  • Manually specify the modules to be included (bad)
  • Big chunk shared by most (well, all) entrypoints, introducing unnecessary overhead (bad)

The webpack documentation pointed away from this, so it was time to revisit our assumptions. Maybe this is not a good idea anymore.

New in Webpack 4: automatically create additional chunks as needed

We decided to try the default configuration in Webpack as a starting point, and suddenly found dozens of new JavaScript chunks generated by splitChunks.cacheGroups. This did not seem right, as it would impact performance.

However, it seems that under HTTP/2 there is not a big difference between downloading 1 and 50 files, as long as you keep the number relatively small. In our case, webpack would generate a new chunk as soon as two modules would benefit from it, though this is configurable. After playing with some configurations and measuring the results, we ended up back with the defaults.
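To make the "configurable" part concrete, here is a minimal sketch of where that threshold lives. The values mirror the webpack 4 defaults as we understand them from the documentation, so check them against your own version; the `optimization` constant is only there for illustration and would normally be a key inside webpack.config.js.

```javascript
// Sketch only: the sharing threshold of webpack 4's built-in "default" cache group.
// Assumption: these values match the documented webpack 4 defaults.
const optimization = {
  splitChunks: {
    cacheGroups: {
      default: {
        // a module must be shared by at least this many chunks
        // before it is moved into its own chunk
        minChunks: 2,
        priority: -20,
        reuseExistingChunk: true
      }
    }
  }
};

console.log(optimization.splitChunks.cacheGroups.default.minChunks);
```

Raising `minChunks` makes the splitting less eager, at the cost of more duplicated code across entrypoints.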

With this configuration, webpack was creating 5-10 chunks for each entrypoint to minimize the amount of JS code to be downloaded, which is:

  • Harder to understand and review (bad)
  • More chunks to download per web page (bad)
  • Automatically decides the number and distribution of chunks, based on modules to be included (good)
  • Files specific for each entrypoint, minimizing total size (good)

Reviewing the chunks is still possible if you spend enough time with `webpack-bundle-analyzer`, and the number of files is negligible under HTTP/2, which is our case.

Our JS landscape consists of 250+ JS files and 20+ entrypoints. The final size of some of these entrypoints is close to 400 KB before minification and gzipping (yes, we are planning to load some chunks dynamically; this will arrive later). It's clear to us that we could benefit from some reuse.

According to our numbers, the gains are not significant in big entrypoints (less than 10-20% in files larger than 200 KB), but it gets close to 50% for smaller files. It also removes the need to periodically review the distribution of modules across the application, which is always welcome.

However, we found some optimizations and some manual work that was still required for us to make this work.

Enter: our chunk optimizations

Once the code is distributed in chunks, you need to include all the chunks of a given entrypoint in the web page. Our first roadblock was that the chunk names were generated by concatenating names into caterpillars like events-edit~events~invoices~ticket-checkout~ticket~user.js (those are the names of the entry points sharing the same chunk).

This presents some downsides: long file names are unwieldy, and they can leak internal details of our implementation. We decided to use an opaque hash of this filename, base64-encoded to save some space:

// btoa is not available on nodejs:
function btoa(number) {
  return Buffer.from('' + number).toString('base64');
}

// https://stackoverflow.com/questions/7616461/generate-a-hash-from-string-in-javascript
String.prototype.hashCode = function() {
  var hash = 0,
    i,
    chr;
  if (this.length === 0) return hash;
  for (i = 0; i < this.length; i++) {
    chr = this.charCodeAt(i);
    hash = (hash << 5) - hash + chr;
    hash |= 0; // Convert to 32bit integer
  }
  return hash;
};

Now we can use this in our config:

optimization: {
  runtimeChunk: 'single',
  splitChunks: {
    // create chunks for static and async modules
    chunks: 'all',

    // better defaults for HTTP/2
    maxInitialRequests: 20,
    maxAsyncRequests: 20,
    cacheGroups: {
      partial: {
        name(_, chunks, cacheGroupKey) {
          const allChunksNames = chunks.map(item => item.name).join('~');
          return `${cacheGroupKey}-${btoa(allChunksNames.hashCode())}`;
        }
      }
    }
  }
}
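To see what the name callback produces without running a build, the same logic can be sketched as a stand-alone script. The helper names (`hashCode`, `toBase64`, `opaqueChunkName`) and the sample chunk objects are made up for this illustration; webpack passes real chunk instances to the callback, and this version avoids patching String.prototype.

```javascript
// Hypothetical stand-alone version of the naming logic, for illustration only.
function hashCode(str) {
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    hash = (hash << 5) - hash + str.charCodeAt(i);
    hash |= 0; // keep it a 32-bit integer
  }
  return hash;
}

// Base64-encode the decimal representation of a value.
function toBase64(value) {
  return Buffer.from(String(value)).toString('base64');
}

// Same parameter order as the name(module, chunks, cacheGroupKey) callback.
function opaqueChunkName(_module, chunks, cacheGroupKey) {
  const joined = chunks.map(chunk => chunk.name).join('~');
  return `${cacheGroupKey}-${toBase64(hashCode(joined))}`;
}

// Made-up chunk objects that only carry the property the callback reads.
const sampleChunks = [{ name: 'events' }, { name: 'invoices' }, { name: 'user' }];
const sampleName = opaqueChunkName(null, sampleChunks, 'partial');
console.log(sampleName);
```

The same set of entrypoints always yields the same name, so the hashed filenames stay stable between builds as long as the sharing pattern does not change.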

We still need to save the list of entrypoints+chunks somewhere. For this, we add an additional plugin:

plugins: [
  // write a file with the chunk composition
  // (needs `const fs = require('fs');` at the top of the webpack config)
  {
    apply(compiler) {
      compiler.hooks.afterEmit.tap('AfterEmitPlugin', compilation => {
        const stats = compilation.getStats().toJson({ publicPath: true });
        const result = {};
        Object.entries(stats.entrypoints).forEach(([key, value]) => {
          result[key] = value.assets
            // filter out .map resources
            .filter(asset => asset.endsWith('.js'))
            // prefix our public http folder for JS files
            .map(asset => `/js/${asset}`);
        });
        fs.writeFileSync(
          'output_folder/webpack-assets.json',
          JSON.stringify(result, null, 2)
        );
      });
    }
  }
]
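The mapping step inside the plugin can also be pulled out into a pure function, which makes it easy to check without running webpack. This is only a sketch: `buildManifest` and the sample data are hypothetical, and it assumes that each entry in stats.entrypoints lists its assets as plain strings, as the plugin above does.

```javascript
// Hypothetical pure helper mirroring the mapping done inside the plugin.
function buildManifest(entrypoints, publicPrefix) {
  const manifest = {};
  for (const [name, entry] of Object.entries(entrypoints)) {
    manifest[name] = entry.assets
      // filter out .map resources
      .filter(asset => asset.endsWith('.js'))
      // prefix the public HTTP folder for JS files
      .map(asset => `${publicPrefix}${asset}`);
  }
  return manifest;
}

// Made-up data shaped like the entrypoints section of the stats JSON.
const sampleEntrypoints = {
  home: { assets: ['runtime.js', 'runtime.js.map', 'home.js', 'home.js.map'] }
};
const sampleManifest = buildManifest(sampleEntrypoints, '/js/');
console.log(JSON.stringify(sampleManifest, null, 2));
```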

The generated webpack-assets.json looks something like this:

{
  "home": [
    "/js/runtime.bf83fefcacda138fd53d.js",
    "/js/partial-ODM4NjQ1MDc1.1d680b4962c2a5e51611.js",
    "/js/partial-MTA5MjY1ODYwOA==.9d2c4366dbe77bcac7b5.js",
    "/js/home.ff711f34cae1b3e507ad.js"
  ],
  // ...other entrypoints follow 
}

Notice that each filename includes the chunk hash to maximize cacheability, which we get by setting output.filename: '[name].[chunkhash].js'. The first chunk is the runtime chunk, which in this configuration is the same for all entry points.

Now you can link to the chunks for the entrypoint that you want to include in your HTML page, using your server-side technology of choice:

<html>
  <head>
    <title>Home page</title>
    <script src="/js/runtime.bf83fefcacda138fd53d.js" defer></script>
    <script src="/js/partial-ODM4NjQ1MDc1.1d680b4962c2a5e51611.js" defer></script>
    <script src="/js/partial-MTA5MjY1ODYwOA==.9d2c4366dbe77bcac7b5.js" defer></script>
    <script src="/js/home.ff711f34cae1b3e507ad.js" defer></script>
  </head>
</html>
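If your server-side technology happens to be Node.js, generating those tags from webpack-assets.json could look roughly like the sketch below. The `scriptTags` and `escapeAttr` helpers are hypothetical, and the manifest is inlined here to keep the example self-contained; in a real setup you would load it with something like JSON.parse(fs.readFileSync(...)).

```javascript
// Minimal HTML attribute escaping, in case an asset path contains special characters.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;');
}

// Hypothetical helper: turn the asset list of one entrypoint into deferred script tags.
function scriptTags(manifest, entrypoint) {
  const assets = manifest[entrypoint];
  if (!assets) {
    throw new Error(`Unknown entrypoint: ${entrypoint}`);
  }
  return assets
    .map(src => `<script src="${escapeAttr(src)}" defer></script>`)
    .join('\n');
}

// Inlined sample with the same shape as the generated webpack-assets.json.
const manifest = {
  home: [
    '/js/runtime.bf83fefcacda138fd53d.js',
    '/js/home.ff711f34cae1b3e507ad.js'
  ]
};
const homeTags = scriptTags(manifest, 'home');
console.log(homeTags);
```

Keeping the order of the manifest matters: the runtime chunk comes first and the entrypoint itself last, and deferred scripts execute in document order.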

Notice that the <script> tags can live in the <head> thanks to the defer attribute, which lets the browser download them without blocking HTML parsing. The alternative would be to use a prefetch or preload link to start downloading these resources over an HTTP/2 connection as soon as possible.
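For the preload alternative, a rough sketch of the extra tags could look like this. The `preloadLinks` helper is hypothetical and not something we ship. Keep in mind that a preload link only fetches the file; the matching <script> tags are still needed to execute it.

```javascript
// Hypothetical helper: emit one preload hint per JavaScript asset.
function preloadLinks(assets) {
  return assets
    .map(href => `<link rel="preload" href="${href}" as="script">`)
    .join('\n');
}

// Asset paths taken from the "home" entrypoint shown earlier.
const homeAssets = [
  '/js/runtime.bf83fefcacda138fd53d.js',
  '/js/home.ff711f34cae1b3e507ad.js'
];
const homePreloads = preloadLinks(homeAssets);
console.log(homePreloads);
```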

That's it! We will keep posting as we iterate around this solution, so if you want to hear about our next steps, make sure you follow us on Twitter.