Reduce Ruby on Rails memory usage and decrease latency with this comprehensive collection of performance tips from almost 15 years working in Rails.
Instrument your app in production and spend the majority of your time on the endpoints that are the busiest. Due to Amdahl's law there's almost no reason to spend huge amounts of effort on an endpoint that (for example) your servers are only spending ~2% time on. Even if you made that endpoint instantaneous it would only represent a 2% improvement at the most.
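As a rough illustration of the Amdahl's law argument above, the sketch below computes the upper bound on total server time saved when one endpoint gets faster. The method name and the 40% example are hypothetical, chosen only to show the arithmetic.

```ruby
# Fraction of total server time saved by speeding up one endpoint.
# time_share: fraction of total server time spent in the endpoint (0.0..1.0)
# speedup:    how many times faster the endpoint becomes
#             (Float::INFINITY models "instantaneous")
def overall_time_saved(time_share, speedup)
  time_share - (time_share / speedup)
end

puts overall_time_saved(0.02, Float::INFINITY) # a 2% endpoint made instantaneous
puts overall_time_saved(0.40, 2.0)             # a 40% endpoint made twice as fast
```

Even a modest 2x speedup on a busy endpoint saves far more total time than making a rarely used endpoint free.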
Profiling: derailed_benchmarks is a great gem for profiling Rails to show where memory budget is going. You can also use this to do before/after comparisons to check memory savings and performance improvements from specific changes.
Use gems like fast_blank and oj (if you're parsing or emitting JSON) to reduce ruby object allocations and CPU cycles. If you're using redis significantly, especially bulk/pipelined operations, consider ruby-hiredis which wraps a client written in C.
These are essentially "free" and can reduce memory pressure, as well as save you latency.
The getpid call in glibc got slower around 2017, because it was updated to stop caching results. The PIDCache gem can safely cache this on the Ruby side, and significantly reduce latency under Linux. The readme also explains at a high level why the cache was removed from glibc, and why it is safe to cache in Ruby.
If larger app code changes are on the table, you can sometimes go a step further and intentionally pick leaner libraries for certain classes of problems. For example here is a comparison of memory usage among some popular HTTP libraries:
However note that HTTP is a really tough example to implement in nontrivial apps, since you have to worry about transitive dependencies of all your other libraries too.
Sometimes using a gem to accomplish something is just not worth it. Open source libraries are by definition built to support many use cases and sometimes you just need a single 20-line class in /lib to accomplish what you need. Watch out for convenience gems like Hashie which patch common core classes and can contribute a lot of memory usage.
Make sure you aren't accidentally loading development/test gems in production mode: check that they are added to the intended :group in your Gemfile. This can be the cause of wasted memory or CPU cycles, or worse, security issues.
Sometimes you need a gem but only in a rake task or a background job. In that case you should make sure it's not automatically loaded in your Gemfile:
gem 'whenever', require: false
Then in your rake task, manually require the library: require 'whenever'. This will prevent the gem from being loaded into the server process on app boot, saving memory. Note that this trick is only effective in production when the dependent code isn't in your eager load paths.
Rails by default comes with many libraries that your app is unlikely to be using, and which may represent moderate memory savings if disabled. You can disable them from loading and only require the ones you need. For example, maybe your app doesn't use ActiveStorage, ActiveJob, and ActionCable. So don't load them. In
# active_storage/engine
# active_job/railtie
# action_cable/engine
# action_mailbox/engine
# action_text/engine
# rails/test_unit/railtie
%w(
  active_record/railtie
  action_controller/railtie
  action_view/railtie
  action_mailer/railtie
).each do |railtie|
  begin
    require railtie
  end
end
As you can see, I like to keep all the available railties commented out so they can be easily re-enabled later if needed. Once you've removed some of the railties you may need to remove the corresponding configuration options to get the app to boot again.
Warning! You can see the effect of this change using derailed_benchmarks, but only if you use dynamic app benchmarking. The static benchmarking won't notice this change because it only analyzes your Gemfile, so it has no way of knowing that not all of Rails is loaded.
Here's an example command to check what middleware is loaded in production for a sample rails 7 app:
$ SECRET_KEY_BASE=foo RAILS_ENV=production rails middleware
use ActionDispatch::HostAuthorization
use Rack::Sendfile
use ActionDispatch::Executor
use Rack::Runtime
use Rack::MethodOverride
use ActionDispatch::RequestId
use ActionDispatch::RemoteIp
use Rails::Rack::Logger
use ActionDispatch::ShowExceptions
use ActionDispatch::DebugExceptions
use ActionDispatch::Callbacks
use ActionDispatch::Cookies
use ActionDispatch::Session::CookieStore
use ActionDispatch::Flash
use ActionDispatch::ContentSecurityPolicy::Middleware
use ActionDispatch::PermissionsPolicy::Middleware
use Rack::Head
use Rack::ConditionalGet
use Rack::ETag
use Rack::TempfileReaper
use Warden::Manager
run MyApp::Application.routes
In a default Rails project, there is a handful of middleware I like to turn off even though they don't contribute significantly to latency (ActionDispatch::RequestId, Rack::Runtime, etc). In
Rails.application.config.middleware.delete Rack::Runtime
Rails.application.config.middleware.delete ActionDispatch::RequestId
Gems commonly add middleware. Examples include telemetry middleware, DOS protection like rack-attack, and popular gems like omniauth (which is why you don't need to edit your routes file to use it). Omniauth is particularly bad because each individual provider is a separate middleware, so if you have, say, 50 of them, this ends up becoming a significant overhead on EVERY Rails request, even though those 50 middlewares are no-oping most of the time. One solution I've written for this was a hacky piece of rack middleware that would check the request path and skip all the omniauth middleware if the request couldn't possibly be an oauth request. This saved something like 10ms on every request.
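The path-checking idea can be sketched as a small Rack-style wrapper. This is not the original middleware, only a minimal illustration: the class name PathGatedStack, the "/auth" prefix, and the lambdas standing in for the provider middlewares are all hypothetical, and wiring it into a real Rails middleware stack is app-specific and not shown.

```ruby
# Runs an expensive inner stack (for example, a chain of OAuth provider
# middlewares) only when the request path could possibly need it.
class PathGatedStack
  # app:         the rest of the application (anything responding to #call)
  # gated_stack: a callable that wraps `app` with the expensive middlewares
  # prefix:      hypothetical path prefix shared by all OAuth requests
  def initialize(app, gated_stack:, prefix: "/auth")
    @app = app
    @gated = gated_stack.call(app)
    @prefix = prefix
  end

  def call(env)
    path = env["PATH_INFO"].to_s
    target = path.start_with?(@prefix) ? @gated : @app
    target.call(env)
  end
end

# Demo with plain lambdas standing in for real middleware.
calls = []
app = ->(env) { [200, { "content-type" => "text/plain" }, ["ok"]] }
expensive = ->(inner) { ->(env) { calls << env["PATH_INFO"]; inner.call(env) } }

stack = PathGatedStack.new(app, gated_stack: expensive)
stack.call("PATH_INFO" => "/products")             # bypasses the expensive stack
stack.call("PATH_INFO" => "/auth/github/callback") # goes through it
p calls
```

Only the second request touches the expensive stack; every other request pays for a single string comparison.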
Rails attempts to match the current request method and path against every route defined in your routes.rb file from top to bottom. This has two implications:
:only option to avoid bloat).
You can audit your routes with
In my own benchmarking, this was not a big factor usually, but if you're looking to squeeze out the last ms from everything, it's worth considering if you have an app with hundreds of routes.
Don't use Haml, Slim, or related libraries. They are much slower than ERB and use more memory. NEVER use inline coffeescript, a feature of Haml and some other gems - this will cause the coffeescript compiler to boot and run for EVERY page render that has inline coffeescript.
Minimize heavy use of partials and helpers: these add extra overhead that shows up when you're chasing those last few milliseconds. Common culprits like link_to start to add up if you call them a lot.
Generally speaking, make sure columns referenced in WHERE clauses are indexed. There are rare exceptions to this rule: for example, when your write load is much higher than the read load, and you don't care about the performance of the (rare) read queries. In that case unnecessary indexes will just take up disk space and significantly hamper write performance.
Use compound indexes if two columns are always referenced together in a WHERE clause. While this is not common, this will be slightly more efficient than using 2 indexes.
Minimize your 1-1 relationships and their associated joins. Most 1-1 database relationships can and should be consolidated into a single wide table for better performance. This eliminates an entire redundant primary key and the need to join 2 tables. And if you avoid SELECT * you don't pay a query penalty where you don't need the columns. One (usually manageable) downside is that this can make the application code a bit messier if the two tables really did represent 2 distinct logical entities.
Make your database queries fast. This generally boils down to a) use indexes appropriately, and b) minimize data stored and fetched.
Avoid duplicate SQL queries: with associations, make sure you are specifying :inverse_of when necessary to avoid fetching the same row more than once in a given request. If you reference or load the same model more than once, make sure you're not going to the database multiple times by explicitly reusing the reference in your code. This includes counts!
Avoid relying on the ActiveRecord query cache: it's commonly a code smell to rely on this. It hides sloppy coding and design, and it wastes memory and cycles populating/checking/flushing the cache. Consider the example given in the Rails guide:
class ProductsController < ApplicationController
  def index
    # Run a find query
    @products = Product.all

    # ...

    # Run the same query again
    @products = Product.all
  end
end
Sure, this allows Product.all to be called twice and only result in one SQL query. Or... just fix your duplicate SQL queries, and disable the query cache.
There are limited exceptions to this, based on how your app is architected. Sometimes a plugin system means that lots of subsystems are (by design) completely isolated from each other, so not all code knows what other code is going to run. In that case, it might make sense to use the query cache in your app.
Don't filter results from the database in app code: whenever possible, use WHERE clauses to fetch only the records you need. It's typically a code smell to load a bunch of records from the DB and then filter them in memory, because it's doubly bad: you've not only made the database fetch and transmit extra records you weren't ultimately going to use, you've also made extra work for yourself in Ruby code, filtering them out and allocating a new array to copy the results to.
Find and fix your n+1 queries. Using :includes to preload associated models on index pages will most likely lead to large performance gains by reducing roundtrips to the database. The bullet gem can help with this.
Don't load associations unless you really need them. Instead of @book.author.present? you can sometimes get away with @book.author_id.present? instead, which might save you a database call. This pattern is also a frequent source of n+1 queries. The tradeoff with this approach is that the association might be deleted or invalid, but this might not be a real problem depending on whether you have added foreign key constraints, what isolation level your database is running at, and how your app is designed.
Minimize roundtrips to the database: Often (but not always!) it's better to return all your data in one roundtrip by JOINing the data you need in related tables instead of fetching in another query. This is true even if you've optimized away an n+1 situation.
Use :pluck to fetch data from the database if you don't need ActiveRecord models. This will save you both memory and CPU cycles by not constructing heavyweight ActiveRecord objects if you just need primitives (like an array of strings or an array of numbers, etc). This can provide a minor performance improvement.
Avoid SELECT *: It's better to fetch exactly the data you need rather than all columns. Using SELECT *, which is the default, ends up being slower in at least 4 different ways:
So don't be lazy: specify exactly the columns you need in your hottest code paths:
@users = User.select('username, email, id').where(...)
Note, however, that this doesn't matter in the majority of codepaths.
Don't COUNT unless you need the count: Sometimes you want to know if a collection is empty or not, but you don't actually need the precise count. In that case, use exists? instead of count. This is faster because the database can return early once it knows the result is nonempty.
Never load an entire table into memory: When iterating over any collection, you never want to load all the records into memory at once. Instead, you should use find_in_batches to iterate over the records in batches. This will reduce the working memory needed for the request, often significantly. For example, if you have 10 million records, you can iterate over them in batches of 1000 at a time instead of loading them all at once, which means roughly 10,000x fewer records held in memory at any moment.
This can be a sneaky problem because it's hard to catch unless you're testing with sufficiently large test data in dev or pre-production environments. You can use safe_query to catch unintentional queries like this in development.
There is at least one major caveat with find_in_batches: sorting is ignored. Under the covers, Rails implements the required batching and pagination by sorting on the primary key. If you need to load a collection in batches and a custom sort is required, you will need to reimplement this behavior yourself (and you'll want to be extra extra sure that your sort key is indexed).
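One way to reimplement batching with a custom sort is keyset pagination on the pair (sort key, primary key). The sketch below is plain Ruby over an in-memory array so that it runs on its own; the Row struct, the score column, and both method names are hypothetical, and fetch_page stands in for the SQL query shown in its comment.

```ruby
Row = Struct.new(:id, :score)

# Stand-in for a SQL query of the form:
#   SELECT * FROM rows WHERE (score, id) > (?, ?) ORDER BY score, id LIMIT ?
# The primary key is included as a tie-breaker so rows with equal scores
# are neither skipped nor repeated.
def fetch_page(table, after:, limit:)
  table
    .select { |r| after.nil? || ([r.score, r.id] <=> after) > 0 }
    .sort_by { |r| [r.score, r.id] }
    .first(limit)
end

# Yields batches in (score, id) order, remembering the last row seen
# instead of using OFFSET.
def each_batch_by_score(table, batch_size:)
  cursor = nil
  loop do
    batch = fetch_page(table, after: cursor, limit: batch_size)
    break if batch.empty?
    yield batch
    cursor = [batch.last.score, batch.last.id]
  end
end

table = [Row.new(1, 50), Row.new(2, 10), Row.new(3, 50), Row.new(4, 30), Row.new(5, 10)]
each_batch_by_score(table, batch_size: 2) { |batch| p batch.map(&:id) }
# prints [2, 5], then [4, 1], then [3]
```

In a real database the same approach needs an index covering the sort key and the primary key, otherwise every page becomes a full sort.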
There are no exceptions to this rule. Sometimes inexperienced developers do this accidentally, e.g., "I want to sum up all the pageviews of all these pages." Do the work in the database:
SELECT SUM(page_view_count) from pages WHERE ....
Using redis or memcached to cache content and completely avoid SQL queries is usually worth it, but you'll want to cache in the largest chunks you can to minimize roundtrips. When talking to Redis or Memcached it's critical to avoid n+1 issues (just like with your SQL database), which can be more subtle and easier to miss because the roundtrips are more lightweight than SQL queries, and don't show up in the development logs by default. Caching parts of a page involving multiple records can be nontrivial, as it requires calculating a cache key so that you can correctly invalidate the cache at the right time. Be sure that gathering what you need to calculate this does not exceed the benefit of the actual caching, which requires benchmarking and profiling.
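To make the cache n+1 problem concrete, here is a minimal sketch with a fake cache client that only counts roundtrips. CountingCache and the key names are hypothetical; in a Rails app the batched call would typically be something like Rails.cache.read_multi, which ActiveSupport cache stores provide.

```ruby
# Hypothetical stand-in for a remote cache client; it only counts roundtrips.
class CountingCache
  attr_reader :roundtrips

  def initialize(data)
    @data = data
    @roundtrips = 0
  end

  # One network roundtrip per key.
  def read(key)
    @roundtrips += 1
    @data[key]
  end

  # One network roundtrip for any number of keys; returns the entries found.
  def read_multi(*keys)
    @roundtrips += 1
    @data.slice(*keys)
  end
end

data = { "page/1" => "a", "page/2" => "b", "page/3" => "c" }
keys = data.keys

one_by_one = CountingCache.new(data)
keys.each { |k| one_by_one.read(k) }

batched = CountingCache.new(data)
batched.read_multi(*keys)

puts "one by one: #{one_by_one.roundtrips} roundtrips"
puts "batched: #{batched.roundtrips} roundtrip"
```

The per-key loop costs one roundtrip per record, which is exactly the n+1 shape that stays invisible in the development log.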
In specific scenarios that require maximum absolute performance, it might make sense to consider ActiveSupport::Cache::MemoryStore. This keeps cache entries in memory in the same Ruby process as the Rails server, which is extremely fast since there's no network boundary to traverse. This has a couple significant tradeoffs:
Make sure you understand the race_condition_ttl option and enable it. Without it, under high load, the expiry of a popular cache key can cause a big load spike as many requests try to regenerate the same entry at once.
Lower the log level setting in production to the absolute minimum you need. After that, consider consolidating extraneous logging calls to make the code less chatty overall. This is a great way to get some free performance back. At sufficient scale this will lead to cost savings in your logging system too.
The current Rails default web server, puma, is fantastic when correctly configured. Usually your goal is to get the best performance per dollar spent, and to do that you'll want to maximize CPU utilization per MB of memory budget, since Rails is notoriously memory-hungry.
Under MRI, each process has a global interpreter lock, which means only one thread can run on the CPU at a time; therefore it can only saturate a single CPU core at most. So under MRI, you will want about 1 process per dedicated physical CPU core. Consider this simplified table:
| | Free memory is low | Free memory is high |
|---|---|---|
| CPU usage low | Reduce process count. Increase threads until memory is exhausted. Memory may be underprovisioned or CPU overprovisioned. | Increase thread count until latencies begin increasing, then increase process count if CPU usage doesn't respond to more threads. |
| CPU usage high | Ideal | Memory is probably overprovisioned. |
The number of threads you need will depend on your application characteristics, mostly the amount of I/O your application is doing. Most Rails apps spend I/O in 1) database calls or 2) networking calls. Generally, you should add threads to increase CPU usage and keep the processor busy while other threads are waiting on I/O. However, additional threads will use more memory since they each have their own stack and will allocate memory to run controller code and render views.
Context switching between threads is expensive, and minimizing it allows the time to be better spent making forward progress serving requests. Optimize your database calls to reduce the number of threads you need to keep the CPU busy.
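A back-of-the-envelope way to see why faster database calls mean fewer threads: if a request spends some time on the CPU and some time waiting on I/O, one core stays busy with roughly (cpu + io) / cpu threads. This is an idealized model that ignores the interpreter lock, context-switch cost, and uneven request mixes, and the method name and timings are hypothetical; treat the result as a starting point for load testing, not a setting.

```ruby
# Idealized estimate: while one thread waits on I/O another can use the core,
# so a core stays busy with about (cpu + io) / cpu threads.
def threads_to_saturate_core(cpu_ms:, io_ms:)
  ((cpu_ms + io_ms) / cpu_ms.to_f).ceil
end

puts threads_to_saturate_core(cpu_ms: 20, io_ms: 60) # I/O-heavy request
puts threads_to_saturate_core(cpu_ms: 20, io_ms: 10) # after optimizing queries
```

With these example numbers, cutting I/O wait from 60ms to 10ms halves the estimate from 4 threads to 2.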
Usually your database connection pool should exactly equal the number of threads in your Rails process (both are controlled by RAILS_MAX_THREADS). This is the default. You will obviously need to stay below the maximum connection limit your database allows. You may be able to raise the limit, but if your database is the bottleneck to your application performance (e.g., it's constrained by CPU, IO, or memory) then this will not help, and will very likely make things worse. If your database is not the bottleneck and the connection limit has not been reached, you can add more Rails processes / threads to continue scaling up traffic.
If you're only running one process because you don't have multiple cores available, make sure you boot puma in "single" mode so it doesn't spawn two separate processes which will waste memory for no benefit.
Try tuning glibc's memory allocator behavior to trade off between memory and latency with MALLOC_ARENA_MAX. Increase the number of memory pools to reduce lock contention from threads, or reduce the number of pools to minimize memory fragmentation and waste. Most configurations benefit from MALLOC_ARENA_MAX=2 as it represents a good tradeoff between space wasted and contention: 1 causes lots of bottlenecking without saving much memory, and more than 2 often wastes memory without significantly improving lock waits.
Also, consider using jemalloc as the allocator, which often significantly reduces memory use with an algorithm that minimizes fragmentation. With the LD_PRELOAD environment variable you can use jemalloc without recompiling Ruby. Here is an example Heroku buildpack that installs jemalloc into the compiled slug.
If you're running multiple puma processes in a cluster, make sure you preload the app before forking. This loads all framework and application code first before creating new processes, taking advantage of copy-on-write semantics to share the Rails and application code in the same physical memory among multiple processes.
Because of this copy-on-write semantic, in general, it is best to fork as many puma processes as possible on the smallest number of machines/containers:
| # of processes | app code in RAM / # processes | Memory saved (vs 1 proc / container) |
|---|---|---|
| 1 process | 1 | no reduction |
| 2 processes | 1/2 | 50% reduction |
| 3 processes | 1/3 | 66% reduction |
| 4 processes | 1/4 | 75% reduction |
For example, it is usually a win to run a single container with a cluster of 8 puma processes, as opposed to 2 containers each with a cluster of 4 puma processes. This will be constrained by your high-availability needs though.
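The 8-versus-2x4 comparison can be worked through with a simplified model: inside one container the preloaded app code is held once and shared by all forked workers, while each worker adds its own private memory. The 300 MB and 100 MB figures and the method name are hypothetical, and real copy-on-write sharing tends to erode as processes write to shared pages, so this is an upper bound on the savings.

```ruby
# Simplified model: per container, shared (preloaded) code is counted once,
# and each worker adds its own private memory on top.
def cluster_memory_mb(containers:, workers_per_container:, shared_mb:, private_mb:)
  containers * (shared_mb + workers_per_container * private_mb)
end

one_big = cluster_memory_mb(containers: 1, workers_per_container: 8,
                            shared_mb: 300, private_mb: 100)
two_small = cluster_memory_mb(containers: 2, workers_per_container: 4,
                              shared_mb: 300, private_mb: 100)

puts "1 container x 8 workers:  #{one_big} MB"
puts "2 containers x 4 workers: #{two_small} MB"
```

Under these assumptions the same 8 workers need 1100 MB in one container versus 1400 MB split across two, because the shared code is paid for twice.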