Your Mastodon instance is growing. Federation traffic climbs, media storage fills up, and Sidekiq queues start stretching longer than they should. This page sets out the scaling work that instance admins actually do in 2026: caching, object storage, database tuning, Sidekiq configuration, and CDN setup, with enough operational detail to make the next change in the right place.
Where Mastodon usually hits the wall first
Before changing any configuration, work out which layer is under pressure. That sounds obvious. It is also where people waste the most time.
Sidekiq is usually the first place you feel it. Federation delivery, media processing, email, and scheduled jobs all run through it. When those queues back up, federation slows and users notice the lag before the admin dashboard does. PostgreSQL is normally next. It holds user data, posts, and relationships, and it answers the queries behind timeline generation. The heavier timeline queries, such as those for accounts with thousands of follows, can drag response times down even under moderate load.
Media storage is the slow-burn problem. Every federated post with attachments adds files somewhere, either on your server or fetched on demand, and the volume grows faster than most admins expect. Leave cleanup too long and it becomes a chore. The web process and the streaming process both scale with concurrent users, but they rarely become the first bottleneck. They matter once the other layers are under control.
Caching that actually earns its keep
Getting Redis right early
Mastodon uses Redis for session storage, Sidekiq queues, timeline caching, and rate limiting. A few settings are worth getting right from day one. Watch Redis memory use and set a sensible maxmemory limit - running out silently is worse than hitting a cap loudly. If the same Redis instance holds Sidekiq queues, keep the eviction policy at noeviction so that job data is never discarded to make room. If Redis is shared with other services, a dedicated instance avoids unpredictable contention that can be surprisingly hard to diagnose later.
Splitting Redis into two instances is worth considering: one for cache data, one for persistent Sidekiq queues. A cache flush then cannot drain the job queue with it. Enable RDB snapshots for persistence, but tune snapshot frequency carefully - too aggressive and you get periodic CPU spikes at awkward moments.
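As a rough illustration of watching memory against the cap, the sketch below parses the text that the Redis INFO command returns and compares used_memory with maxmemory. The function names and the 80% warning ratio are assumptions made for this example, not Mastodon or Redis defaults.

```python
def parse_redis_info(info_text: str) -> dict[str, str]:
    """Parse the key:value text returned by the Redis INFO command."""
    fields = {}
    for line in info_text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Memory".
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def memory_headroom(info_text: str, warn_ratio: float = 0.8) -> str:
    """Classify memory use as 'no-limit', 'ok', or 'warn'.

    warn_ratio is a hypothetical threshold chosen for illustration.
    """
    fields = parse_redis_info(info_text)
    used = int(fields["used_memory"])
    limit = int(fields.get("maxmemory", "0"))
    if limit == 0:
        # A maxmemory of 0 means no cap is configured.
        return "no-limit"
    return "warn" if used / limit >= warn_ratio else "ok"


if __name__ == "__main__":
    sample = "# Memory\r\nused_memory:900\r\nmaxmemory:1000\r\n"
    print(memory_headroom(sample))  # prints "warn"
```

In practice the INFO text would come from redis-cli or a Redis client library. A "no-limit" result is the case worth fixing first.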
HTTP caching without getting clever
Static assets - CSS, JS, images - should carry long cache headers. Mastodon fingerprints its assets, so aggressive caching is safe here. For API responses, check that your Nginx reverse proxy respects the cache headers Mastodon sends instead of stripping them.
The other high-value target is the media proxy cache. If Nginx caches proxied media from remote instances, repeated fetches drop away. That matters once you federate with busy servers. Without it, you keep paying for the same remote file over and over.
Application-level caching and request pressure
Mastodon caches timelines, account relationships, and other frequently read data in Redis. Beyond the cache itself, the main levers for request pressure are MAX_THREADS for the web process (higher values let each process handle more concurrent requests, at the cost of more database connections), PostgreSQL connection pooling to cut per-request overhead, and the streaming API connection limit. Size the streaming limit for the concurrent users you actually have, not the number you hope to reach one day.
Moving media to object storage
For a growing instance, moving from local disk to object storage is not optional. It is one of the few changes where the payoff is immediate and the downside is close to zero.
The logic is straightforward. Your server disk stops filling up with media files, and storage costs drop because S3-compatible storage is cheap per gigabyte compared with VPS block storage. It also fits naturally with a CDN, and there is no practical storage ceiling to plan around.
The setup is manageable. Choose a provider - AWS S3, Backblaze B2, Wasabi, or MinIO if you want self-hosted - create a bucket with the right access policies, then configure Mastodon’s .env.production with the storage credentials. Copy existing local media into the bucket with a sync tool such as rclone or aws s3 sync, then verify federation and media serving before calling it done.
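Before restarting services, it can help to confirm that the storage settings are actually present. The sketch below checks the text of an .env.production file for the core S3 variables. The helper names are invented for this example, and the list of required keys is a minimal assumption: many providers also need S3_REGION, S3_ENDPOINT, or S3_HOSTNAME.

```python
REQUIRED_KEYS = (
    "S3_ENABLED",
    "S3_BUCKET",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, ignoring comments and blank lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_s3_settings(text: str) -> list[str]:
    """Return the settings that are absent, empty, or switched off."""
    env = parse_env(text)
    problems = [key for key in REQUIRED_KEYS if not env.get(key)]
    enabled = env.get("S3_ENABLED", "")
    if enabled and enabled.lower() != "true":
        problems.append("S3_ENABLED (must be true)")
    return problems


if __name__ == "__main__":
    sample = "S3_ENABLED=true\nS3_BUCKET=media\n"
    print(missing_s3_settings(sample))
```

An empty list means the basics are in place. It does not prove the credentials work, so the verification step still matters.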
Keeping object storage costs under control
Remote media cache is the main cost driver. Mastodon caches media from every federated post it processes, and on a well-connected instance that adds up fast. Run tootctl media remove on a schedule, for example from cron or a systemd timer, to clear out old remote media - weekly is a reasonable starting point. Most providers also support lifecycle policies that delete cached media automatically after a set period, and some offer cold storage tiers for rarely accessed files. Both are worth setting up rather than leaving as manual chores.
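A scheduled cleanup can be as small as a wrapper that a cron job or systemd timer calls. The sketch below builds the tootctl media remove command with its --days option and runs it with RAILS_ENV set to production. The wrapper functions and the install path in the usage line are assumptions for illustration, so adjust them to your own layout.

```python
import os
import subprocess


def media_remove_command(days: int = 7, dry_run: bool = False) -> list[str]:
    """Build the tootctl invocation that removes old cached remote media."""
    if days < 1:
        raise ValueError("days must be at least 1")
    command = ["bin/tootctl", "media", "remove", "--days", str(days)]
    if dry_run:
        # Report what would be removed without deleting anything.
        command.append("--dry-run")
    return command


def run_cleanup(mastodon_dir: str, days: int = 7) -> int:
    """Run the cleanup from the Mastodon directory and return the exit code."""
    env = {**os.environ, "RAILS_ENV": "production"}
    result = subprocess.run(
        media_remove_command(days), cwd=mastodon_dir, env=env, check=False
    )
    return result.returncode


if __name__ == "__main__":
    # Hypothetical install path; only the command is printed here.
    print(" ".join(media_remove_command(days=14, dry_run=True)))
```

A dry run first is a cheap way to see how much a given retention window would remove before committing to it.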
PostgreSQL tuning that holds up under load
Connection pooling first
PgBouncer should be on every production Mastodon server. Putting a connection pooler between Mastodon and PostgreSQL reduces the number of direct database connections and handles bursts more gracefully than PostgreSQL can manage on its own. The setup effort is low. The payoff is steady.
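To see why a pooler helps, it is worth adding up how many connections the instance could ask for at peak. The sketch below assumes the worst case, where every thread in every process holds one connection. The process layout is invented for illustration, and 100 is PostgreSQL's default max_connections.

```python
from dataclasses import dataclass


@dataclass
class ProcessGroup:
    name: str
    processes: int
    threads_per_process: int


def connection_demand(groups: list[ProcessGroup]) -> int:
    """Worst case: every thread in every process holds one connection."""
    return sum(g.processes * g.threads_per_process for g in groups)


def needs_pooler(
    groups: list[ProcessGroup], max_connections: int = 100, reserved: int = 10
) -> bool:
    """True when peak demand exceeds what PostgreSQL can hand out directly.

    reserved is a hypothetical margin kept back for admin and backup sessions.
    """
    return connection_demand(groups) > max_connections - reserved


if __name__ == "__main__":
    # Hypothetical layout for a mid-sized instance.
    layout = [
        ProcessGroup("web", processes=4, threads_per_process=5),
        ProcessGroup("sidekiq", processes=3, threads_per_process=25),
        ProcessGroup("streaming", processes=1, threads_per_process=10),
    ]
    print(connection_demand(layout))  # prints 105
    print(needs_pooler(layout))       # prints True
```

Raising max_connections is the other way out, but each PostgreSQL connection costs memory, which is why pooling usually scales further.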
Indexes and query health
Mastodon ships with sensible indexes, but they only stay useful if the database is maintained properly. Run VACUUM ANALYZE on a schedule, or tune autovacuum to do the work automatically. Query statistics from pg_stat_statements, together with slow query logging via log_min_duration_statement, will surface the statements worth investigating. On large instances, partial indexes tailored to actual usage patterns can help, but profile before adding them speculatively. Indexes added on a hunch often do less than expected and sometimes make writes slower.
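One way to read pg_stat_statements is to rank statements by total execution time and look at each one's share of the whole. The sketch below keeps the query as a constant and ranks rows that have already been fetched. It assumes the PostgreSQL 13+ column name total_exec_time, and the helper name and row shape are inventions for this example.

```python
# Assumes PostgreSQL 13 or later, where the column is total_exec_time.
TOP_STATEMENTS_SQL = """
SELECT query, calls, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
"""


def rank_statements(rows: list[tuple[str, int, float]], limit: int = 3) -> list[dict]:
    """Rank (query, calls, total_ms) rows by total time spent.

    Returns the mean time per call and the share of all measured time.
    """
    grand_total = sum(total_ms for _, _, total_ms in rows)
    ranked = sorted(rows, key=lambda row: row[2], reverse=True)[:limit]
    return [
        {
            "query": query,
            "mean_ms": total_ms / calls if calls else 0.0,
            "share": total_ms / grand_total if grand_total else 0.0,
        }
        for query, calls, total_ms in ranked
    ]


if __name__ == "__main__":
    # Invented rows standing in for a real query result.
    sample = [("SELECT a", 10, 100.0), ("SELECT b", 1, 300.0)]
    for entry in rank_statements(sample):
        print(entry)
```

A statement with a modest mean time but a large share is often a better target than a rare slow one, because it is where the database actually spends its time.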
Read replicas when the primary starts to sweat
For larger instances, a PostgreSQL read replica lets you offload timeline and search queries from the primary. Writes stay on the primary; read-heavy operations point to the replica. Once traffic is high enough to justify the setup, the reduction in primary database load is significant.
Maintenance and backups that are worth testing
Automate daily backups with pg_dump or WAL-based continuous backup. Test the restore procedure quarterly - not annually, quarterly. Watch disk usage and connection counts. Plan major PostgreSQL version upgrades well in advance, and test them thoroughly before touching production. An upgrade that goes wrong in staging costs you an afternoon; one that goes wrong on a live instance costs you downtime and, without a tested backup, possibly data.
Sidekiq tuning is where federation lives or dies
Sidekiq is the part that decides whether federation feels responsive or sluggish. Ignore it, and posts arrive late, outbound delivery falls behind, and nobody notices until the backlog is already hours deep.
Mastodon splits jobs across several queues. The default queue handles work that affects local users, such as home timeline updates and notifications. push manages outbound delivery to other servers, ingress processes incoming activities from remote servers, pull handles lower-priority work such as fetching remote content and running imports, mailers covers email, and scheduler runs periodic tasks. Each queue has different throughput needs. Treating them as if they are interchangeable is a common mistake.
To add capacity, raise the thread count per Sidekiq process or run multiple processes with different queue assignments. The right split depends on traffic pattern. An instance that does heavy outbound federation needs more push capacity than one that mainly receives. Watch queue latency closely. If jobs regularly wait more than a few minutes, add capacity instead of hoping the backlog clears itself.
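Splitting queues across processes comes down to starting Sidekiq several times with different -q (queue) and -c (concurrency) flags. The sketch below generates those command lines from a plan. The process names and thread counts are hypothetical and only show the shape of an outbound-heavy split.

```python
def sidekiq_command(queues: list[str], threads: int) -> list[str]:
    """Build a Sidekiq command line for one process."""
    if not queues:
        raise ValueError("a process needs at least one queue")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    command = ["bundle", "exec", "sidekiq", "-c", str(threads)]
    for queue in queues:
        command += ["-q", queue]
    return command


# Hypothetical plan: extra push capacity for heavy outbound federation.
# The scheduler queue is assigned to exactly one process.
PLAN = {
    "sidekiq-default": (["default", "mailers", "scheduler"], 10),
    "sidekiq-push": (["push"], 25),
    "sidekiq-pull": (["pull"], 10),
}


if __name__ == "__main__":
    for name, (queues, threads) in PLAN.items():
        print(name, "->", " ".join(sidekiq_command(queues, threads)))
```

Each process also needs a database pool at least as large as its thread count, so a higher -c value usually goes together with a matching DB_POOL setting.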
Four metrics are worth tracking consistently: queue depth, processing rate, error rate, and retry queue size. Queue depth shows how many jobs are waiting. Processing rate tells you how much work is actually moving. Error rate and retry queue size usually expose upstream problems before users do. A growing retry queue is often the first clue that something is broken, not just slow. Do not ignore it.
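The four metrics can be derived from two snapshots of Sidekiq's counters taken a known interval apart. The sketch below assumes cumulative processed and failed counts plus the current enqueued and retry sizes, as Sidekiq's stats expose them. The SidekiqSample type and the function name are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class SidekiqSample:
    timestamp: float  # seconds since some fixed point
    processed: int    # cumulative jobs processed
    failed: int       # cumulative jobs failed
    enqueued: int     # jobs currently waiting
    retry_size: int   # jobs currently in the retry set


def sidekiq_metrics(prev: SidekiqSample, curr: SidekiqSample) -> dict[str, float]:
    """Derive the four tracking metrics from two consecutive samples."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    done = curr.processed - prev.processed
    failed = curr.failed - prev.failed
    return {
        "queue_depth": curr.enqueued,
        "processing_rate": done / elapsed,  # jobs per second
        "error_rate": failed / done if done else 0.0,
        "retry_queue_size": curr.retry_size,
        "retry_growth": curr.retry_size - prev.retry_size,
    }


if __name__ == "__main__":
    before = SidekiqSample(0, 1000, 10, 50, 5)
    after = SidekiqSample(60, 1600, 40, 80, 9)
    print(sidekiq_metrics(before, after))
```

The extra retry_growth value is there because the direction of the retry queue matters more than its size at any one moment.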
Our developer notes cover monitoring approaches in more detail.
CDN integration for static assets and media
A CDN takes real bandwidth load off the origin server and improves media delivery. For static assets, configure CDN_HOST in your environment and let the CDN pull from your server and cache globally. Users in distant regions get faster page loads without any per-request work on your part. For media, point the object storage bucket through the CDN, set the right cache headers, and optionally use a custom domain for cleaner URLs.
The bandwidth saving is often the most obvious gain at scale. CDN bandwidth is usually cheaper than origin bandwidth, and the CDN absorbs traffic spikes that would otherwise land directly on your server. On smaller instances the difference is marginal. On larger ones it becomes meaningful quickly.
Monitoring and alerting before things break
Set up monitoring before you need it, not after something has already broken. Prometheus and Grafana are the usual combination for server metrics covering CPU, RAM, disk, and network. Beyond infrastructure, watch application metrics such as request latency and error rates, database metrics including connection count and query performance, and Sidekiq queue depths.
Alerts should target the things that matter most: disk space, queue depth, and error rate. A 90% disk alert on your database server is worth ten dashboards you only glance at occasionally. The point is to learn about problems before your users do.
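A minimal version of that alert logic fits in a few lines. In the sketch below the 90% disk threshold comes from the text above, while the queue depth and error rate limits are placeholder assumptions to be tuned against your own baseline. In a Prometheus setup the same conditions would normally live in alert rules, not in a script.

```python
import shutil


def disk_used_ratio(path: str = "/") -> float:
    """Fraction of the filesystem at path that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def evaluate_alerts(
    disk_ratio: float,
    queue_depth: int,
    error_rate: float,
    *,
    disk_limit: float = 0.90,    # from the text: alert at 90% disk
    queue_limit: int = 10_000,   # hypothetical threshold
    error_limit: float = 0.05,   # hypothetical threshold
) -> list[str]:
    """Return the names of the alerts that should fire."""
    alerts = []
    if disk_ratio >= disk_limit:
        alerts.append("disk")
    if queue_depth >= queue_limit:
        alerts.append("queue_depth")
    if error_rate >= error_limit:
        alerts.append("error_rate")
    return alerts


if __name__ == "__main__":
    print(evaluate_alerts(disk_used_ratio("/"), queue_depth=200, error_rate=0.01))
```

Keeping the list this short is deliberate: three alerts that always mean something get acted on.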
Mistakes that keep showing up
Waiting too long to move to object storage is the most common one. Starting with local disk and retrofitting later is painful because you are migrating live media while the instance is still running. If you can, start with object storage from the beginning.
Not watching Sidekiq queues is a close second. Federation delays are invisible until you look at queue depth, and by the time users complain the backlog can already be hours deep. Ignoring media cleanup is part of the same problem. Remote media cache can consume terabytes if left unmanaged, and it compounds quietly over months.
At the other end of the spectrum, over-optimising before you know where the bottleneck sits wastes time and occasionally makes things worse. Profile first, then act. And do not skip PgBouncer. Connection pooling is almost always worth the setup effort, and the cost of not having it tends to show up at the worst possible moment.
Frequently Asked Questions
How many users before I need to scale?
Activity patterns matter more than raw user count. An instance with 100 very active users can need more resources than one with 500 casual users. Watch the actual metrics and scale on observed load, not headcount.
Is Docker or bare-metal better for scaling?
Both work in practice. Docker simplifies deployment and makes it easier to scale individual services independently. Bare-metal gives you more direct control over the environment. For large instances, Kubernetes deployments are becoming more common. See our tools page for infrastructure resources.
How much does media storage cost?
It varies by provider and instance activity. A moderately active instance might add 50-200 GB of new media per month. With regular remote media cleanup, total storage stays manageable. Budget from your growth rate, not a fixed estimate - the range is wide enough that a single number would mislead you.
Can I scale horizontally?
Yes. Mastodon supports multiple web processes, streaming servers, and Sidekiq workers spread across multiple machines. The database is the main coordination point, which is why connection pooling and read replicas matter once you reach that scale.
When should I consider managed hosting instead?
If infrastructure management is taking more time than you want to give it, managed hosting is a reasonable choice at any instance size - not just at large scale. Check our articles hub for hosting guides.
How do relays affect scaling?
Relays increase federation traffic and media storage. They improve content diversity but add real load. Subscribe to relays carefully and monitor the impact on your instance’s resources - a busy relay can change Sidekiq throughput requirements noticeably, sometimes within hours of subscribing.