Ruby on Rails performance tuning

2022-11-10

Reduce Ruby on Rails memory usage and decrease latency with this comprehensive collection of performance tips from almost 15 years working in Rails.

General tips

Instrument your app in production and spend the majority of your time on the endpoints that are the busiest. Due to Amdahl's law there's almost no reason to spend huge amounts of effort on an endpoint that (for example) your servers only spend ~2% of their time on. Even if you made that endpoint instantaneous, it would represent a 2% improvement at most.
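To make the arithmetic concrete, here is a small sketch of the Amdahl's law bound. The method name and the sample fractions are made up for illustration; plug in the numbers from your own instrumentation.

```ruby
# Share of total server time saved when one endpoint gets faster.
# fraction: share of total time spent in the endpoint (0.0..1.0)
# speedup:  how many times faster it becomes (Float::INFINITY = instantaneous)
def overall_time_saved(fraction, speedup)
  new_total = (1.0 - fraction) + fraction / speedup
  1.0 - new_total
end

# A 2% endpoint made instantaneous saves at most 2% overall.
puts overall_time_saved(0.02, Float::INFINITY).round(4)

# A 40% endpoint made only twice as fast saves 20% overall.
puts overall_time_saved(0.40, 2.0).round(4)
```

Modest gains on a busy endpoint usually beat heroic gains on a quiet one.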

Profiling: derailed_benchmarks is a great gem for profiling Rails to show where memory budget is going. You can also use this to do before/after comparisons to check memory savings and performance improvements from specific changes.

Use native code

Use gems like fast_blank and oj (if you're parsing or emitting JSON) to reduce Ruby object allocations and CPU cycles. If you use Redis heavily, especially for bulk or pipelined operations, consider the hiredis gem, which wraps a client written in C.

These are essentially "free" and can reduce memory pressure, as well as save you latency.

Use memory-efficient gems

If larger app code changes are on the table, you can sometimes go a step further and intentionally pick leaner libraries for certain classes of problems. For example here is a comparison of memory usage among some popular HTTP libraries:

Library    | Memory
Net::HTTP  | 0.16 MB
RestClient | 1.72 MB
faraday    | 2.43 MB
HTTParty   | 5.57 MB
http.rb    | 7.53 MB

Benchmark code.

Note, however, that switching HTTP libraries is hard to pull off in nontrivial apps, since you also have to worry about the HTTP clients that your other libraries pull in as transitive dependencies.

Use fewer gems

Sometimes using a gem to accomplish something is just not worth it. Open source libraries are by definition built to support many use cases and sometimes you just need a single 20-line class in /lib to accomplish what you need. Watch out for convenience gems like Hashie which patch common core classes and can contribute a lot of memory usage.

Don't load gems you don't need

Make sure you aren't accidentally loading development/test gems in production mode - make sure they are added to the intended :group in your Gemfile. This can be the cause of wasted memory or CPU cycles, or worse, security issues.

Sometimes you need a gem but only in a rake task or a background job. In that case, make sure it's not automatically required by your Gemfile:

gem 'whenever', require: false

Then in your rake task, manually require the library: require 'whenever'. This will prevent the gem from being loaded into the server process on app boot, saving memory. Note that this trick is only effective in production when the dependent code isn't in your eager load paths.

Turn off unused Rails components

Rails by default comes with many libraries that your app is unlikely to be using, and which may represent moderate memory savings if disabled. You can disable them from loading and only require the ones you need. For example, maybe your app doesn't use ActiveStorage, ActiveJob, and ActionCable. So don't load them. In config/application.rb:

Replace require "rails/all" with:

  require "rails"

  # Frameworks left disabled; move a line into the list below to re-enable it.
  # active_storage/engine
  # active_job/railtie
  # action_cable/engine
  # action_mailbox/engine
  # action_text/engine
  # rails/test_unit/railtie
  %w(
    active_record/railtie
    action_controller/railtie
    action_view/railtie
    action_mailer/railtie
  ).each do |railtie|
    require railtie
  end

As you can see, I like to keep all the available railties commented out so they can be easily re-enabled later if needed. Once you've removed some of the railties you may need to remove the corresponding configuration options to get the app to boot again.

Warning! You can see the effect of this change using derailed_benchmarks, but only if you use dynamic app benchmarking. The static benchmarking won't notice this change because it only analyzes your Gemfile, so it has no way of knowing that not all of Rails is loaded.

Offload static assets to a CDN

Ideally, don't serve static assets from your web server at all: put a CDN like Cloudflare or Amazon CloudFront in front of it. This frees up CPU and I/O cycles for serving dynamic requests.

Minimize rack middleware

Rails is a rack app. Rack is a ruby web application interface, and works using middleware apps that work in a "pipeline" configuration. Each middleware acts as a linked list node that wraps the next node: it runs some code and calls the next piece of middleware. Rails is the last code to run. Note that this design means order matters! Some middleware (such as static fileserving, caching, or DOS protection libraries) may decide to return a response early without calling some of the later middleware, which means Rails and other middleware can get skipped.
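As a rough illustration of that wrapping, here is a plain-Ruby sketch that follows the Rack calling convention (an object with call(env) returning status, headers, and body) without loading Rack itself. The class names and the /health path are invented for the example.

```ruby
# Each middleware wraps the next app and exposes call(env) -> [status, headers, body].
class Timing
  def initialize(app)
    @app = app
  end

  def call(env)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status, headers, body = @app.call(env)
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    [status, headers.merge("x-elapsed" => elapsed.to_s), body]
  end
end

# Returns early for one path, so nothing later in the chain runs.
class HealthCheck
  def initialize(app)
    @app = app
  end

  def call(env)
    return [200, {}, ["ok"]] if env["PATH_INFO"] == "/health"

    @app.call(env)
  end
end

# Stand-in for the Rails app at the end of the chain; records which paths reach it.
APP_CALLS = []
rails_app = lambda do |env|
  APP_CALLS << env["PATH_INFO"]
  [200, {}, ["hello from the app"]]
end

# Outermost middleware first, like the output of `rails middleware`.
PIPELINE = Timing.new(HealthCheck.new(rails_app))

PIPELINE.call("PATH_INFO" => "/health") # short-circuits, the app never runs
PIPELINE.call("PATH_INFO" => "/users")  # passes through to the app
p APP_CALLS
```

Only the second request reaches the app. That is the skipping behavior described above, and also why every extra middleware sits on the hot path of every request that does get through.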

Here's an example command to check what middleware is loaded in production for a sample rails 7 app:

$ SECRET_KEY_BASE=foo RAILS_ENV=production rails middleware

use ActionDispatch::HostAuthorization
use Rack::Sendfile
use ActionDispatch::Executor
use Rack::Runtime
use Rack::MethodOverride
use ActionDispatch::RequestId
use ActionDispatch::RemoteIp
use Rails::Rack::Logger
use ActionDispatch::ShowExceptions
use ActionDispatch::DebugExceptions
use ActionDispatch::Callbacks
use ActionDispatch::Cookies
use ActionDispatch::Session::CookieStore
use ActionDispatch::Flash
use ActionDispatch::ContentSecurityPolicy::Middleware
use ActionDispatch::PermissionsPolicy::Middleware
use Rack::Head
use Rack::ConditionalGet
use Rack::ETag
use Rack::TempfileReaper
use Warden::Manager
run MyApp::Application.routes

In a default Rails project, there is a handful of middleware I like to turn off even though they don't contribute significantly to latency (ActionDispatch::RequestId, Rack::Runtime, etc). In config/initializers/middleware.rb:

  Rails.application.config.middleware.delete Rack::Runtime
  Rails.application.config.middleware.delete ActionDispatch::RequestId

Gems commonly add middleware. Examples include telemetry middleware, DOS protection like rack-attack, and popular gems like omniauth (which is why you don't need to edit your routes file to use it). Omniauth is particularly bad because each individual provider is a separate middleware, so if you have, say, 50 of them, this ends up becoming a significant overhead on EVERY Rails request, even though those 50 middlewares are no-oping most of the time. One solution I've written for this was a hacky piece of rack middleware that would check the request path and skip all the omniauth middleware if the request couldn't possibly be an oauth request. This saved something like 10ms on every request.
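The original middleware isn't shown here, so the following is only a sketch of the general idea in plain Ruby. PathGatedStack and FakeProvider are hypothetical names, and the "/auth" prefix is an assumption (it matches OmniAuth's default path prefix; adjust it if yours differs).

```ruby
# Sends only matching paths through an expensive middleware chain.
class PathGatedStack
  # app:    the next Rack-style app
  # prefix: only requests under this path go through the gated chain
  # block:  receives the next app and returns it wrapped in the expensive middleware
  def initialize(app, prefix:)
    @app = app
    @prefix = prefix
    @gated = yield(app)
  end

  def call(env)
    target = env["PATH_INFO"].to_s.start_with?(@prefix) ? @gated : @app
    target.call(env)
  end
end

# Stand-in for one provider middleware; counts how often it runs.
class FakeProvider
  @runs = 0
  class << self
    attr_accessor :runs
  end

  def initialize(app)
    @app = app
  end

  def call(env)
    self.class.runs += 1
    @app.call(env)
  end
end

inner_app = ->(_env) { [200, {}, ["app"]] }

GATED = PathGatedStack.new(inner_app, prefix: "/auth") do |next_app|
  # Wrap the next app in 50 provider middlewares.
  50.times.reduce(next_app) { |wrapped, _| FakeProvider.new(wrapped) }
end

GATED.call("PATH_INFO" => "/users")
puts FakeProvider.runs # still 0: the provider chain was skipped

GATED.call("PATH_INFO" => "/auth/github")
puts FakeProvider.runs # 50: every provider ran once
```

The trade-off is one string comparison per request in exchange for skipping the whole chain on the vast majority of them.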

Optimize routes

Rails attempts to match the current request method and path against every route defined in your routes.rb file from top to bottom. This has two implications:

  1. Fewer routes is better. So clean up dead routes, and avoid declaring routes that don't lead anywhere (e.g., don't use the shortcuts that generate 7 routes at a time if you really only need 1 or 2; use the :only option to avoid bloat).
  2. The order of your routes matters. Ideally, your routes are laid out and grouped in some logical way. However, for performance reasons, you actually want to sort your routes so the most popular endpoints appear first. Beware, of course, that the order of routes is important and can affect correctness.

You can audit your routes with rails routes.

In my own benchmarking, this was not a big factor usually, but if you're looking to squeeze out the last ms from everything, it's worth considering if you have an app with hundreds of routes.

Fix common app performance issues

Find and fix your n+1 queries. Using :includes to preload associated models on index pages will most likely lead to large performance gains by reducing roundtrips to the database. The bullet gem can help with this.

Minimize roundtrips to the database: sometimes (but not always) it's better to return all your data in one roundtrip by JOINing the data you need in related tables instead of fetching in another query. This is true even if you've optimized away an n+1 situation.

Use :pluck to fetch data from the database if you don't need ActiveRecord models. This will save you both memory and CPU cycles by not constructing heavyweight ActiveRecord objects if you just need primitives (like an array of strings or an array of numbers, etc). This can provide a minor performance improvement.

Avoid SELECT *: It's better to fetch exactly the data you need rather than all columns. Using SELECT *, which is the default, ends up being slower in at least 4 different ways:

Don't be lazy: specify exactly the columns you need in your hottest code paths:


  @users = User.select('username, email, id').where(...)

Run less ruby code

Don't filter objects from the database in app code: whenever possible, use WHERE clauses to fetch only the records you need. It's typically a code smell to load a bunch of records from the DB and then filter them in memory because it's doubly bad: You've not only made the database fetch and transmit extra records you weren't ultimately going to use, you also made yourself extra work in ruby code filtering it out and allocating a new array to copy the results to.

Minimize heavy use of partials and helpers: these add extra overhead that shows up when you're chasing those last few milliseconds. Common culprits include helpers like link_to -- which starts to add up if you call it dozens of times.

Run fewer, faster SQL queries

Make your database queries fast. This generally boils down to a) use indexes appropriately, and b) minimize data stored and fetched.

Generally speaking, make sure columns referenced in WHERE clauses are indexed. There are rare exceptions to this rule: for example, when your write load is much higher than the read load, and you don't care about the performance of the (rare) read queries. Unnecessary indexes will take up disk space and significantly hamper write performance.

Use compound indexes if two columns are always referenced together in a WHERE clause. While this is not common, this will be slightly more efficient than using 2 indexes.

Avoid duplicate SQL queries: with associations, make sure you are specifying :inverse_of when necessary to avoid fetching the same row more than once in a given request. If you reference or load the same model more than once, make sure you're not going to the database multiple times by explicitly reusing the reference in your code.

Avoid relying on the ActiveRecord query cache: it's commonly a code smell to rely on this. It hides sloppy coding and design, and it wastes memory and cycles populating/checking/flushing the cache. Consider the example given in the Rails guide:


class ProductsController < ApplicationController

  def index
    # Run a find query
    @products = Product.all

    # ...

    # Run the same query again
    @products = Product.all
  end

end

Sure, this allows Product.all to be called twice and only result in one SQL query. Or... just fix your duplicate SQL queries and disable the query cache.

There are limited exceptions to this, based on how your app is architected. Sometimes a plugin system means that lots of subsystems are (by design) completely isolated from each other, so not all code knows what other code is going to run.
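One way to reuse a result explicitly is plain memoization. The sketch below uses a fake model that counts "queries" so it runs without a database; FakeProduct and ProductsPage are invented names, not part of Rails.

```ruby
# Stand-in for a model; counts how many "queries" were issued.
class FakeProduct
  @queries = 0
  class << self
    attr_reader :queries

    def all
      @queries += 1
      ["widget", "gadget"]
    end
  end
end

class ProductsPage
  # Memoize so every caller within this object shares one result.
  def products
    @products ||= FakeProduct.all
  end

  def render
    header = "#{products.size} products"
    list = products.join(", ")
    "#{header}: #{list}"
  end
end

puts ProductsPage.new.render
puts FakeProduct.queries
```

The page reads the list twice but the fake model is hit once, with no cache to populate or flush.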

Minimize joins

Most 1-1 database relationships can and should be consolidated into a single wide table for better performance. This eliminates an entire redundant primary key and the need to join 2 tables. And if you avoid SELECT * you don't pay a query penalty where you don't need the columns. One (usually manageable) downside is that this can make the application code a bit messier if the two tables really did represent 2 distinct logical entities.

Cache expensive operations

Using redis or memcached to cache content and completely avoid SQL queries is usually worth it, but you'll want to cache in the largest chunks you can to minimize roundtrips. When talking to Redis or Memcached it's critical to avoid n+1 issues (just like with your SQL database), which can be more subtle and easier to miss because the roundtrips are more lightweight than SQL queries, and don't show up in the development logs by default. Caching parts of a page involving multiple records can be nontrivial, as it requires calculating a cache key so that you can correctly invalidate the cache at the right time. Be sure that gathering what you need to calculate this does not exceed the benefit of the actual caching, which requires benchmarking and profiling.
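To show how a cache n+1 sneaks in, here is a sketch with an in-memory stand-in that counts roundtrips. CountingCache is hypothetical; its read_multi mirrors the shape of ActiveSupport::Cache::Store#read_multi, which returns a hash of the keys it found.

```ruby
# In-memory stand-in for a remote cache that counts network roundtrips.
class CountingCache
  attr_reader :roundtrips

  def initialize(data)
    @data = data
    @roundtrips = 0
  end

  # One roundtrip per key.
  def read(key)
    @roundtrips += 1
    @data[key]
  end

  # One roundtrip for any number of keys; returns a hash of the keys found.
  def read_multi(*keys)
    @roundtrips += 1
    @data.slice(*keys)
  end
end

keys = (1..100).map { |id| "user/#{id}/badge" }
data = keys.to_h { |key| [key, "cached html for #{key}"] }

one_by_one = CountingCache.new(data)
keys.each { |key| one_by_one.read(key) }

batched = CountingCache.new(data)
batched.read_multi(*keys)

puts one_by_one.roundtrips # 100
puts batched.roundtrips    # 1
```

Each individual read is cheap, which is exactly why a hundred of them in a loop tend to go unnoticed.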

In specific scenarios that require maximum absolute performance, it might make sense to consider ActiveSupport::Cache::MemoryStore. This keeps cache entries in memory in the same Ruby process as the Rails server, which is extremely fast since there's no network boundary to traverse. This has a couple significant tradeoffs:

Make sure you understand race_condition_ttl and enable it: under high load, many requests can try to regenerate a popular cache key at the moment it expires, causing a big load spike.

Reduce log verbosity

Lowering the log level to the absolute minimum you need is a great way to get some free performance back. Making your application code less chatty and consolidating logging calls also doesn't hurt.
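As a small illustration using Ruby's standard Logger (in a Rails app the level is normally set with config.log_level instead), the sketch below shows that a discarded message can still cost you the work of building it unless you pass a block. The expensive_summary helper is made up for the example.

```ruby
require "logger"
require "stringio"

LOG_OUTPUT = StringIO.new
LOGGER = Logger.new(LOG_OUTPUT)
LOGGER.level = Logger::WARN # drop debug and info messages

BUILT = []
def expensive_summary
  BUILT << :built # record that the expensive work happened
  "summary of a large object graph"
end

# String form: the message is built even though it is then discarded.
LOGGER.debug("state: #{expensive_summary}")

# Block form: the block is only called if the level is enabled.
LOGGER.debug { "state: #{expensive_summary}" }

LOGGER.warn("disk almost full")

puts BUILT.size        # 1, only the string form did the work
puts LOG_OUTPUT.string # a single WARN line
```

Raising the level removes the I/O; the block form also removes the string building for messages that are never written.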

Tune server settings

The current Rails default web server, puma, is fantastic when correctly configured. Usually your goal is to get the best performance per dollar spent, and to do that you'll want to maximize CPU utilization per MB of memory budget, since Rails is notoriously memory-hungry.

Under MRI, each process has a global interpreter lock, which means only one thread can execute Ruby code at a time; a single process can therefore saturate at most one CPU core. So under MRI, you will want about 1 process per dedicated physical CPU core. Consider this simplified table:

               | Free memory is low | Free memory is high
CPU usage low  | Reduce process count. Increase threads until memory is exhausted. Memory may be underprovisioned or CPU overprovisioned. | Increase thread count until latencies begin increasing, then increase process count if CPU usage doesn't respond to more threads.
CPU usage high | Ideal | Memory is probably overprovisioned.

The number of threads you need will depend on your application characteristics, mostly the amount of I/O your application is doing. Most Rails apps spend I/O in 1) database calls or 2) networking calls. Generally, you should add threads to increase CPU usage and keep the processor busy while other threads are waiting on I/O. However, additional threads will use more memory since they each have their own stack and will allocate memory to run controller code and render views.

Context switching between threads is expensive, and minimizing it allows the time to be better spent making forward progress serving requests. Optimize your database calls to reduce the number of threads you need to keep the CPU busy.

Usually your database connection pool should exactly equal the number of threads in your Rails process (both are controlled by RAILS_MAX_THREADS). This is the default. You will obviously need to stay below the maximum connection limit your database allows. You may be able to raise the limit, but if your database is the bottleneck to your application performance (e.g., it's constrained by CPU, IO, or memory) then this will not help, and will very likely make things worse. If your database is not the bottleneck and the connection limit has not been reached, you can add more Rails processes / threads to continue scaling up traffic.
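A quick back-of-the-envelope check helps before adding processes or threads. This sketch assumes each process has its own pool sized to its thread count; the method names, the container counts, and the reserved-connection figure are all illustrative.

```ruby
# Worst-case number of database connections opened by the web tier.
def connections_needed(containers:, processes_per_container:, threads_per_process:)
  containers * processes_per_container * threads_per_process
end

# reserved: connections kept free for migrations, consoles, background jobs, etc.
def fits_database?(needed, max_connections:, reserved: 0)
  needed <= max_connections - reserved
end

needed = connections_needed(containers: 4, processes_per_container: 4, threads_per_process: 5)

puts needed                                                         # 80
puts fits_database?(needed, max_connections: 100, reserved: 10)     # true
puts fits_database?(needed * 2, max_connections: 100, reserved: 10) # false
```

Doubling the fleet in this example would need 160 connections, which is the point at which you either raise the limit or put a connection pooler in front of the database.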

If you're only running one process because you don't have multiple cores available, make sure you boot puma in "single" mode so it doesn't spawn two separate processes which will waste memory for no benefit.

Try tuning your glibc memory behavior to trade off between memory and latency with MALLOC_ARENA_MAX. Increase the number of memory pools to reduce lock contention from threads, or reduce the number of pools to minimize memory fragmentation and waste.

Take advantage of copy-on-write

If you're running multiple puma processes in a cluster, make sure you preload the app before forking. This loads all framework and application code first before creating new processes, taking advantage of copy-on-write semantics to share the Rails and application code in the same physical memory among multiple processes.
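A minimal config/puma.rb along these lines might look like the sketch below. The WEB_CONCURRENCY variable name and the default values are assumptions; threads, workers, and preload_app! are Puma's own configuration methods.

```ruby
# config/puma.rb (sketch; environment variable names and defaults are assumptions)
max_threads = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
threads max_threads, max_threads

# Roughly one worker process per core. Setting this to 0 runs Puma in
# single mode, with no forked workers.
workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))

# Load the app in the parent process before forking so workers can share
# that memory through copy-on-write.
preload_app!
```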

Because of this copy-on-write semantic, in general, it is best to fork as many puma processes as possible on the smallest number of machines/containers:

# of processes | App code in RAM / # processes | Memory saved (vs 1 proc / container)
1 process      | 1                             | no reduction
2 processes    | 1/2                           | 50% reduction
3 processes    | 1/3                           | 66% reduction
4 processes    | 1/4                           | 75% reduction

For example, it is usually a win to run a single container with a cluster of 8 puma processes, as opposed to 2 containers each with a cluster of 4 puma processes. This will be constrained by your high-availability needs though.
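The comparison can be sketched with an idealized model: one shared copy of the preloaded code per container, plus private memory per process. Real processes gradually dirty shared pages, so treat this as a best case; the 300 MB and 100 MB figures are invented.

```ruby
# Idealized model: each container holds one shared copy of the preloaded code,
# plus private memory for every process running in it.
def total_memory_mb(containers:, processes_per_container:, shared_mb:, private_mb:)
  containers * (shared_mb + processes_per_container * private_mb)
end

one_big   = total_memory_mb(containers: 1, processes_per_container: 8, shared_mb: 300, private_mb: 100)
two_small = total_memory_mb(containers: 2, processes_per_container: 4, shared_mb: 300, private_mb: 100)

puts one_big   # 1100
puts two_small # 1400
```

Both layouts run 8 processes, but the second pays for the shared code twice.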