Multitenancy

By Owen Diehl

May 22, 2022

Loki is multi-tenant by default, meaning that it can ingest, store, and query data for different users (teams, organizations, etc.) in the same process. The main benefit is cost: it’s much more economical to run a single database that serves many tenants than many databases that each serve a single tenant.

Loki does not perform authentication on its own (it is typically run behind an authenticating proxy), so tenancy is determined by a special X-Scope-OrgID HTTP header attached to every request. A request with

X-Scope-OrgID: tenant1

can only read and write tenant1’s data; it will not be able to see or interfere with data from any other tenant.
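
As an illustration, here is a minimal sketch, using only the Python standard library, of a push request scoped to a single tenant. The `/loki/api/v1/push` endpoint and the JSON body shape follow Loki’s HTTP push API; the base URL `http://localhost:3100`, the `app` label, and the `build_push_request` helper are assumptions made for the example. The request is built but not sent, so the sketch runs without a Loki instance.

```python
import json
import time
import urllib.request


def build_push_request(base_url: str, tenant: str, labels: dict, line: str) -> urllib.request.Request:
    """Build (but do not send) a Loki push request for one tenant. Hypothetical helper."""
    body = {
        "streams": [
            {
                "stream": labels,
                # Each entry is [timestamp in Unix nanoseconds as a string, log line].
                "values": [[str(time.time_ns()), line]],
            }
        ]
    }
    return urllib.request.Request(
        url=f"{base_url}/loki/api/v1/push",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The tenant ID. urllib stores header names via str.capitalize(),
            # which is harmless because HTTP header names are case-insensitive.
            "X-Scope-OrgID": tenant,
        },
        method="POST",
    )


# Assumed local Loki address; "tenant1" matches the header example above.
req = build_push_request("http://localhost:3100", "tenant1", {"app": "demo"}, "hello from tenant1")
print(req.full_url, req.get_header("X-scope-orgid"))
```

Passing `req` to `urllib.request.urlopen` would send it. A query carrying the same header would only see tenant1’s streams.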

One of the harder problems in building multi-tenant systems is ensuring that no single tenant can adversely affect the others (the so-called noisy neighbor problem). Loki handles this in a few ways:

Query Quality of Service

The first way Loki protects against noisy neighbors is by ensuring that no single tenant can consume all of a cluster’s querying capacity. To do this, each tenant is given an independent queue into which its queries are enqueued. Loki then selects a tenant queue at random, dequeues a query, and processes it on one of the querier replicas. Let’s look at a few scenarios:

If the cluster is not under load and all tenants are enqueuing queries, all of the queries will be processed.
If the cluster is under load and all tenants are enqueuing queries, they will all be subject to queueing delays or cancellations.
If the cluster is not under load and one tenant is enqueuing many queries, that tenant can use the spare cluster capacity and all of its queries will be processed.
If the cluster is under moderate load and one tenant is enqueuing many queries, only that tenant will see queueing delays or cancellations; the other tenants will not.

In this way, the cluster can be sized according to overall read usage, and it will automatically limit tenants trying to consume more than their fair share of resources, but only when those resources are in high demand. In low-demand periods, a single tenant is free to take advantage of the spare processing capacity.
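
The per-tenant queueing described above can be illustrated with a toy model. This sketch follows the description literally: each tenant has its own FIFO queue, and the dispatcher picks uniformly at random among tenants that have pending queries. It ignores querier replicas, timeouts, and cancellations, and the `TenantQueues` and `simulate` names are hypothetical; it is not Loki’s scheduler code.

```python
import random
from collections import deque


class TenantQueues:
    """Toy model: one FIFO queue per tenant, dispatched by picking a tenant at random."""

    def __init__(self, seed=None):
        self.queues = {}                 # tenant ID -> deque of pending queries
        self.rng = random.Random(seed)   # seeded so runs are reproducible

    def enqueue(self, tenant, query):
        self.queues.setdefault(tenant, deque()).append(query)

    def dequeue(self):
        """Pick a tenant with pending work uniformly at random; pop its oldest query."""
        active = [t for t, q in self.queues.items() if q]
        if not active:
            return None
        tenant = self.rng.choice(active)
        return tenant, self.queues[tenant].popleft()

    def pending(self, tenant):
        return len(self.queues.get(tenant, ()))


def simulate(capacity, seed=0):
    """One noisy tenant floods the cluster; two quiet tenants send a few queries each."""
    q = TenantQueues(seed)
    for i in range(1000):
        q.enqueue("noisy", f"noisy-{i}")
    for tenant in ("quiet-a", "quiet-b"):
        for i in range(5):
            q.enqueue(tenant, f"{tenant}-{i}")
    processed = {}
    for _ in range(capacity):            # capacity = number of queries the queriers can run
        item = q.dequeue()
        if item is None:                 # every queue is empty
            break
        tenant = item[0]
        processed[tenant] = processed.get(tenant, 0) + 1
    return q, processed


busy_queues, busy = simulate(capacity=200)    # cluster under load
idle_queues, idle = simulate(capacity=2000)   # plenty of spare capacity
print("under load:", busy, "noisy still waiting:", busy_queues.pending("noisy"))
print("idle:", idle)
```

With capacity for 200 queries, the quiet tenants’ five queries each are served while 810 of the noisy tenant’s queries are still waiting; with capacity for 2,000, every query is processed, including all of the noisy tenant’s.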

Per-tenant limits

Not all tenants need to be treated the same. It’s common for some tenants to be much bigger than others and therefore to need different limits.

Loki supports a reloadable overrides configuration file in which these per-tenant differences can be specified:

overrides:
  medium_user:
    ingestion_rate_mb: 10  # 10MB/s ~ 26TB/month (30-day month)
    ingestion_burst_size_mb: 20 # burst allowance; should be at least the largest single push
    max_query_parallelism: 32 # each query can execute 32 subqueries in parallel
    max_global_streams_per_user: 5000
    split_queries_by_interval: '30m' # each query can be split into 30m intervals and executed in parallel
  big_user:
    ingestion_rate_mb: 30  # ~78TB/month
    ingestion_burst_size_mb: 40
    max_query_parallelism: 64 # each query can execute 64 subqueries in parallel
    max_global_streams_per_user: 40000
    split_queries_by_interval: '15m' # each query can be split into 15m intervals and executed in parallel
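
The monthly volumes in the comments are back-of-the-envelope conversions of a sustained ingestion rate. A quick sketch of that arithmetic, assuming a 30-day month and decimal units (1TB = 1,000,000MB); the `monthly_tb` helper is hypothetical:

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30          # assumption: a 30-day month
MB_PER_TB = 1_000_000        # assumption: decimal (SI) units


def monthly_tb(rate_mb_per_s: float) -> float:
    """Convert a sustained ingestion rate in MB/s to TB per 30-day month."""
    return rate_mb_per_s * SECONDS_PER_DAY * DAYS_PER_MONTH / MB_PER_TB


for tenant, rate in {"medium_user": 10, "big_user": 30}.items():
    print(f"{tenant}: {rate}MB/s ~ {monthly_tb(rate):.1f}TB/month")
```

This gives about 25.9TB/month at 10MB/s and about 77.8TB/month at 30MB/s, and only if the tenant ingests at its full limit around the clock.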

With these overrides, Loki will accept sustained writes of up to 10MB/s for medium_user and rate limit (reject) writes beyond that. The big_user tenant, however, won’t see rate limits until it exceeds 30MB/s.
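
A rate limit paired with a burst size is commonly enforced with a token bucket. The sketch below models the two ingestion overrides that way: tokens (here, MB) refill at `ingestion_rate_mb` per second up to `ingestion_burst_size_mb`, and a push that needs more tokens than are available is rejected. This is an illustration under that assumption, not Loki’s distributor code; the `TokenBucket` class and the explicit `now` timestamps (used to keep the example deterministic) are hypothetical.

```python
class TokenBucket:
    """Token-bucket limiter: refills at `rate` units/s and holds at most `burst` units."""

    def __init__(self, rate: float, burst: float, now: float = 0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = burst      # start full, so an initial burst is allowed
        self.last = now

    def allow(self, size: float, now: float) -> bool:
        """Consume tokens and return True if a push of `size` fits; otherwise reject it."""
        # Refill for the time elapsed since the last call, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False


# Hypothetical per-tenant limiters built from the overrides above (units: MB).
limiters = {
    "medium_user": TokenBucket(rate=10, burst=20),
    "big_user": TokenBucket(rate=30, burst=40),
}

# Each tenant sends a 15MB push every 0.5s, i.e. a sustained 30MB/s.
for tenant, limiter in limiters.items():
    accepted = sum(limiter.allow(15, now=t * 0.5) for t in range(1, 21))
    print(tenant, "accepted", accepted, "of 20 pushes")
```

At a sustained 30MB/s, big_user’s limiter accepts all 20 pushes, while medium_user’s accepts only 7: the ones its 20MB starting burst and 10MB/s refill can cover.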

Bonus: the overrides file can also be edited and redeployed, and Loki will pick up the changes without needing a restart!
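
Loki rereads the overrides file periodically (the interval is set by `period` in the `runtime_config` block). The sketch below shows the general poll-and-compare pattern: re-read the file on each poll and apply it only when a content hash has changed. The `OverridesReloader` class is a hypothetical helper under that assumption, not Loki’s implementation, and it leaves parsing the YAML to the `on_change` callback.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Optional


class OverridesReloader:
    """Poll a file and invoke a callback whenever its contents change (hypothetical helper)."""

    def __init__(self, path: Path, on_change: Callable[[bytes], None]):
        self.path = Path(path)
        self.on_change = on_change
        self.last_digest: Optional[str] = None

    def poll(self) -> bool:
        """Re-read the file; call on_change and return True only if the contents differ."""
        data = self.path.read_bytes()
        digest = hashlib.sha256(data).hexdigest()
        if digest == self.last_digest:
            return False
        self.last_digest = digest
        self.on_change(data)
        return True


# Usage sketch: simulate an operator editing the file between polls.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "overrides.yaml"
    path.write_text("overrides: {}\n")
    applied = []
    reloader = OverridesReloader(path, applied.append)
    results = [reloader.poll(), reloader.poll()]   # initial load, then no change
    path.write_text("overrides:\n  medium_user:\n    ingestion_rate_mb: 15\n")
    results.append(reloader.poll())                # the edit is picked up
```

In a long-running process, `poll()` would be called from a timer loop; comparing hashes means an unchanged file is never re-applied.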