Object Storage designates a variety of data storage mechanisms. In our case, we refer specifically to the de facto standard developed under the Amazon S3 umbrella.

This page documents the MinIO server (minio.torproject.org, currently the single server minio-01.torproject.org) managed by TPA, mainly for GitLab's Docker registry, but it could eventually be used for other purposes.

[[TOC]]

Tutorial

Access the web interface

Note: The web interface was crippled by upstream in the community edition, removing all administrative features. The web interface is now only a bucket browser (though it can still be used to create new buckets for the logged-in user).

To see if the service works, you can connect to the web interface through https://minio.torproject.org:9090 with a normal web browser.

If that fails, it means your IP address is not explicitly allowed. In that case, you need to port forward through one of the jump hosts, for example:

ssh -L 9090:minio.torproject.org:9090 ssh-fsn.torproject.org

If you go through a jump host, the interface will be available on localhost instead: https://localhost:9090. In that case, web browsers will show a certificate name mismatch warning, which can safely be ignored. See Security and risk assessment for a discussion of why it is set up that way.

For TPA, the username is admin and the password is in /etc/default/minio on the server (currently minio-01). You should use that account only to create or manage other, normal user accounts with lesser access policies. See authentication for details.

For others, you should have been given a username and password to access the control panel. If not, ask TPA!

Configure the local mc client

Note: this is necessary only if you are not running mc on the minio server directly. If you're an admin, you should run mc on the minio server to manage accounts; it is already configured there. Do not set up the admin credentials on your local machine.

You must use the web interface (above) to create a first access key for the user.

Then record the access key on your account with:

mc alias set minio-01 https://minio-01.torproject.org:9000

This will prompt you for an access key and secret. This is the username and password provided by TPA, and they will be saved in your ~/.mc directory. Ideally, you should instead create an access key in the web interface specifically for the device you're operating from, rather than storing your username and password here.
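If you prefer, the access key and secret can also be passed directly on the command line, as done in the lab setup further below (note that they then end up in your shell history); here is a sketch with placeholder credentials:

mc alias set minio-01 https://minio-01.torproject.org:9000 EXAMPLEACCESSKEY EXAMPLESECRETKEY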

If you don't already have mc installed, you can run it from containers. Here's an alias that will configure mc to run that way:

alias mc="podman run --network=host -v $HOME/.mc:/root/.mc --rm --interactive quay.io/minio/mc"

One thing to keep in mind if you use the minio client through a container like the above is that any time the client needs to access a file on the local disk (for example a file you would like to put in a bucket, or a JSON policy file you wish to import), that file must be accessible from within the container. With the above alias, the only place where files from the host can be accessed inside the container is under ~/.mc on the host, so you'll have to move files there and then pass a path starting with /root/.mc/ to the minio client.
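For example, to upload a local file through the containerized client, you might do something like the following (a sketch; the file and bucket names are illustrative):

# copy the file under the directory that is mounted into the container
cp ./report.tar.gz ~/.mc/
# then refer to it by its path inside the container
mc put /root/.mc/report.tar.gz minio-01/my-bucket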

Further examples below will use the alias. A command like that is already set up on minio-01, as the admin alias:

mc alias set admin https://minio-01.torproject.org:9000

Note that Debian trixie and later ship the minio-client package which can be used instead of the above container, with the minio-client binary. In that case, the alias becomes:

alias mc=minio-client

Note that, in that case, credentials are stored in the ~/.minio-client/ directory.

A note on "aliases"

Above, we define an alias with mc alias set. An alias is essentially a combination of a MinIO URL and an access token, with specific privileges. Therefore, multiple aliases can be used to refer to different privileges on different MinIO servers.

By convention, we currently use the admin alias to refer to a fully-privileged admin access token on the local server.

In this documentation, we also use the play alias which is pre-configured to use the https://play.min.io remote, a demonstration server that can be used for testing.
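To see which aliases are currently configured (and which endpoints they point to), run:

mc alias list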

Create an access key

To create an access key, you should log in to the web interface with a normal user (not admin, see authentication for details) and create a key in the "Access Keys" tab.

An access key can be created for another user (below gitlab) on the commandline with:

mc admin user svcacct add admin gitlab

This will display the credentials in plain text on the terminal, so watch out for shoulder surfing.

The above creates a token with a random name. You might want to use a human-readable one instead:

mc admin user svcacct add admin gitlab --access-key gl-dockerhub-mirror

The key will inherit the policies established above for the user. So unless you want the access key to have the same access as the user, make sure to attach a policy to the access key. This, for example, is an access policy that limits the above access key to the gl-dockerhub-mirror bucket:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BucketAccessForUser",
      "Effect": "Allow",
      "Action": [
        "s3:*"
      ],
      "Resource": [
        "arn:aws:s3:::gl-dockerhub-mirror",
        "arn:aws:s3:::gl-dockerhub-mirror/*"
      ]
    }
  ]
}

You can attach it on creation with:

minio-client admin user svcacct add admin gitlab --access-key gl-dockerhub-mirror --policy gl-dockerhub-mirror.json

... or modify an existing key to add that policy with:

minio-client admin user svcacct edit admin gl-dockerhub-mirror --policy gl-dockerhub-mirror.json

If you have just created a user, you might want to add an alias for that user on the server as well, so that future operations can be done through that user instead of admin, for example:

mc alias set gitlab https://minio-01.torproject.org:9000

Create a bucket

A bucket can be created on a MinIO server using the mc commandline tool.

WARNING: you should NOT create buckets under the main admin account. Create a new account for your application as admin, then as that new account, create a specific access key, as per above.

The following will create a bucket named foo on the play server:

root@minio-01:~# mc mb play/foo
Bucket created successfully `foo`.

Try creating the same bucket again to confirm it really exists; it should fail like this:

root@minio-01:~# mc mb play/foo
mc: <ERROR> Unable to make bucket `local/foo`. Your previous request to create the named bucket succeeded and you already own it.

You should also see the bucket in the web interface.

Here's another example, where we create a gitlab-registry bucket under the gitlab account:

mc mb gitlab/gitlab-registry

Listing buckets

You can list the buckets on the server with mc ls $ALIAS:

root@minio-01:~/.mc# mc ls gitlab
[2023-09-18 19:53:20 UTC]     0B gitlab-ci-runner-cache/
[2025-02-19 14:15:55 UTC]     0B gitlab-dependency-proxy/
[2023-07-19 15:23:23 UTC]     0B gitlab-registry/

Note that this only shows the buckets visible to the configured access token!
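You can also list the objects inside a specific bucket by appending its name (add --recursive to descend into prefixes), for example:

mc ls gitlab/gitlab-registry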

Adding/removing objects

Objects can be added to a foo bucket with mc put:

mc put /tmp/localfile play/foo

and, of course, removed with rm:

mc rm play/foo/localfile
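Whole directories can be copied with mc mirror, which synchronizes a local directory to a bucket prefix (a sketch; the paths are illustrative):

mc mirror /tmp/localdir play/foo/localdir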

Remove a bucket

To remove a bucket, use the rb command:

mc rb play/foo

This is relatively safe in that it only supports removing an empty bucket, unless --force is used. Objects can also be removed recursively with mc rm --recursive.

Use rclone as an object storage client

The incredible rclone tool can talk to object storage and might be the easiest tool for doing manual changes to buckets and object storage remotes in general.

First, you'll need an access key (see above) to configure the remote. This can be done interactively with:

rclone config

Or directly on the commandline with something like:

rclone config create minio s3 provider Minio endpoint https://minio.torproject.org:9000/ access_key_id test secret_access_key [REDACTED]

From there you can do a bunch of things. For example, list existing buckets with:

rclone lsd minio:

Copying a file in a bucket:

rclone copy /etc/motd minio:gitlab

The file should show up in:

rclone ls minio:gitlab

See also the rclone s3 documentation for details.

How-to

Create a user

To create a new user, you can use the mc client configured above. Here, for example, we create a gitlab user:

mc admin user add admin/gitlab

(The username, above, is gitlab, not admin/gitlab. The string admin is the "alias" defined in the "Configure the local mc client" step above.)
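As a sketch, and if you'd rather not be prompted for the secret key, the user can also be created by passing a generated password directly, following the pattern used for the admin password elsewhere on this page (make sure to record that password, e.g. in Trocla, before the shell scrollback is lost):

mc admin user add admin gitlab "$(tr -dc '[:alnum:]' < /dev/urandom | head -c 32)"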

By default, a user has no privileges. You can grant it access by attaching a policy, see below.

Typically, however, you might want to create an access key instead. For example, if you are creating a new bucket for some GitLab service, you would create an access key under the gitlab account instead of an entirely new user account.

Define and grant an access policy

The default policies are quite broad and give access to all buckets on the server, which is almost the same access as the admin user, except for the admin:* namespace. So we need to make a bucket policy. First create a file with this JSON content:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
            "s3:*"
        ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::gitlab/*", "arn:aws:s3:::gitlab"
      ],
      "Sid": "BucketAccessForUser"
    }
  ]
}

This was inspired by Jai Shri Ram's MinIO Bucket Policy Notes, but we actually grant all s3:* privileges on the given gitlab bucket and its contents:

  • arn:aws:s3:::gitlab grants bucket operations access, such as creating the bucket or listing all its contents

  • arn:aws:s3:::gitlab/* grants permissions on all the bucket's objects

That policy needs to be fed to MinIO using the web interface or mc with:

mc admin policy create admin gitlab-bucket-policy /root/.mc/gitlab-bucket-policy.json

Then the policy can be attached to an existing user with, for example:

mc admin policy attach admin gitlab-bucket-policy --user=gitlab
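To confirm the policy is attached, you can inspect the user; the output lists the user's status and the names of attached policies:

mc admin user info admin gitlab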

So far, the policy has been that a user foo has access to a single bucket also named foo. For example, the network-health user has this policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
            "s3:*"
        ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::network-health/*", "arn:aws:s3:::network-health"
      ],
      "Sid": "BucketAccessForUser"
    }
  ]
}

Policies like this can also be attached to access tokens (AKA service accounts).

Possible improvements: multiple buckets per user

This policy could be relaxed to allow more buckets to be created for the user, for example by granting access to any bucket prefixed with the username:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
            "s3:*"
        ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::foo/*", "arn:aws:s3:::foo",
        "arn:aws:s3:::foo*/*", "arn:aws:s3:::foo*/*"
      ],
      "Sid": "BucketAccessForUser"
    }
  ]
}

But this remains to be tested. For now, it's one bucket per "user", but of course users should probably set up access tokens per application to ease revocation.

Checking access policies

This will list the access tokens available under the gitlab account and show their access policies:

for accesskey in $(mc admin user svcacct ls admin gitlab --json | jq -r .accessKey); do 
    mc admin user svcacct info admin $accesskey
done

For example, this might show:

AccessKey: gitlab-ci-osuosl
ParentUser: gitlab
Status: on
Name:
Description: gitlab CI runner object cache for OSUOSL runners, [...]
Policy: embedded
Expiration: no-expiry

The Policy: embedded means there's a policy attached to that access key. The default is Policy: inherited, which means the access token inherits the policy of the parent user.

To see exactly which policy is attached to all users, you can use the --json argument to the info command. This, for example, will list all policies attached to service accounts of the gitlab user:

for accesskey in $(mc admin user svcacct ls admin gitlab --json | jq -r .accessKey); do
    echo $accesskey; mc admin user svcacct info admin $accesskey --json | jq .policy
done

Password resets

MinIO is primarily accessed through access tokens, issued to users. To create a new access token, you need a user account.

If that password is lost, you should follow one of two procedures, depending on whether you need access to the main administrator account (admin, which is the one who can grant access to other accounts) or a normal user account.

Normal user

To reset the password of a normal user, you must log in through the web interface; it doesn't seem possible to reset the password of a normal user through the mc command.

Admin user

The admin user password is set in /etc/default/minio. It can be changed by following a part of the installation instructions, namely:

PASSWORD=$(tr -dc '[:alnum:]' < /dev/urandom | head -c 32)
echo "MINIO_ROOT_PASSWORD=$PASSWORD" > /etc/default/minio
chmod 600 /etc/default/minio

... and then restarting the service:

systemctl restart container-minio.service

Access keys

Access key secrets cannot be reset: the key must be deleted and a new one must be created in its place.

A better way to do this is to create a new key and mark the old one as expiring. To rotate the GitLab secrets, for example, a new key named gitlab-registry-24 was created (24 being the year, but it could be anything), and the gitlab-registry key was marked as expiring 24 hours later. The new key was stored in Trocla and the key name, in Puppet.
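As a sketch of that rotation (the key names and the expiry timestamp are illustrative; check mc admin user svcacct edit --help for the exact date format):

# create the replacement key under the gitlab account
mc admin user svcacct add admin gitlab --access-key gitlab-registry-24
# mark the old key as expiring a day later
mc admin user svcacct edit admin gitlab-registry --expiry 2024-01-02T00:00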

The runner cache token is more problematic, as the Puppet module doesn't update it automatically once the runner is registered. That needs to be modified by hand.

Setting quota for a bucket

Without a quota limiting their usage, buckets are unbounded: they can use all of the space available in the cluster.

We can limit the maximum amount of storage used on the cluster for each bucket on a per-bucket manner.

In this section, we use the gitlab-registry bucket in the cluster alias admin as an example, but any alias/bucket can be used instead.

To see what quota is currently configured on a bucket:

mc quota info admin/gitlab-registry

To set the quota limits for an individual bucket, you can set it with one command:

mc quota set admin/gitlab-registry --size 200gi

Finally you can remove the quota on a bucket:

mc quota clear admin/gitlab-registry

Upstream documentation for mc quota has unfortunately vanished from their new AIStor namespace as of the writing of this section (2025-08). You can check out the deprecated community documentation for quota to get more details, or check out mc quota --help.

An important note about this feature is that minio seems to have completely removed it from AIStor in order to only have it in the enterprise (non-free) version: https://github.com/minio/mc/issues/5014

Server naming in a minio cluster

In a multi-server minio cluster, you must use host names that have a sequential number at the end of the short host name. For example a cluster with a 4-machine pool could have host names that look like this:

  • storage1.torproject.org
  • storage2.torproject.org
  • storage3.torproject.org
  • storage4.torproject.org

If we suppose that each server only has one disk to expose to minio, the above would correspond to the minio server argument https://storage{1...4}.torproject.org/srv/minio

This sequential numbering also needs to be respected when adding new servers in the cluster. New servers should always start being numbered after the current highest host number. If we were to add a new 5-machine server pool to the cluster with the example host names above, we would need to name them storage5.tpo through storage9.tpo.

Note that it is possible to pad the numbers with leading zeros, so for example the above pool could be named storage01.tpo up to storage04.tpo. In the corresponding minio server URL, you then add a leading 0 to tell minio about the padding, so we'd have https://storage{01...04}.torproject.org/srv/minio. This needs to be planned in advance when creating the first machines of the cluster however since their hostnames also need to include the leading 0 in the number.

If you decommission a server pool, then you must not reuse the host names of the decommissioned servers. To continue the examples above, if we were to decommission the 4-machine server pool storage[1-4].tpo after having added the other 5-machine pool, then any new server pool that gets added afterwards needs to have machine names start at storage10.tpo (so you can never reuse the names storage1 through storage4 for that cluster)

Expanding storage on a cluster

minio lets you add more storage capacity to a cluster. This is mainly achieved by adding more server pools (a server pool is a group of machines each with the same amount of disks).

Some important notes about cluster expansion:

  • Once a server pool is integrated into the cluster, it cannot be extended, for example to add more disks or machines to the same pool.
  • The only unit of expansion that minio provides is to add an entirely new server pool.
  • You can decommission a server pool. So you can, in a way, resize a pool: first add a new pool with the desired size, then migrate the data to it, and finally decommission the older pool.
  • Single-server minio deployments cannot be expanded. In that case, to expand you need to create a new multi-server cluster (e.g. one server pool with more than one machine, or multiple server pools) and then migrate all objects to this new cluster.
  • Each server pool has an independent set of erasure sets (you can more or less think of an erasure set as a cross-node RAID setup).
  • If one of the server pools loses enough disks to compromise the redundancy of its erasure sets, then all data activity on the cluster is halted until you can resolve the situation. So all server pools must stay consistent at all times.

Add a server pool

When you add a new server pool, minio determines the erasure coding level depending on how many servers are in the new pool and how many disks each has. This cannot be changed after the pool has been added to the cluster, so it is advised to plan the capacity according to redundancy needs before adding the new server pool. See erasure coding in the reference section for more details.

To add a new server pool,

  • first provision all of the new hosts and set their host names following the sequential server naming
  • make sure that all of the old and new servers are able to reach each other on the minio server port (default 9000). If there's any issue, ensure that firewall rules were created accordingly
  • mount all of the drives in directories placed in the same filesystem path and with sequential numbering in the directory names. For example if a server has 3 disks we could mount them in /mnt/disk[1-3]. Make sure that those mount points will persist across reboots
  • create a backup of the cluster configuration with mc admin cluster bucket export and mc admin cluster iam export
  • prepare all of the current and new servers to have new parameters passed in to the minio server, but do not restart the current servers yet.
  • Each server pool is added as one CLI argument to the server binary.
  • a pool is represented by a URL-looking string that contains two elements glued together: how the minio server should be reached and which paths on the host the disks are mounted on.
    • Variation in the pool URL can only be done using tokens like {1...7} to vary on a range of integers. This explains why hostnames need to look the same but vary only by the number. It also implies that all disks should be mounted in similar paths differing only by numbers.
    • For example, for a 4-machine pool with 3 disks each mounted on /mnt/disk[1-3], the pool specifier passed to the minio server could look like this: https://storage{1...4}.torproject.org/mnt/disk{1...3}
  • if we continue on with the above example, assuming that the first server pool contained 4 servers with 3 disks each, then to add a new 5-machine server pool each with 2 disks, we could end up with something like this for the CLI arguments:

    https://storage{1...4}.torproject.org/mnt/disk{1...3} https://storage{5...9}.torproject.org/mnt/disk{1...2}
    
  • restart the minio service on all servers, old and new, with all of the server pool URLs as server parameters (see the full invocation sketch after this list). At this point, the minio cluster integrates the new servers as a new server pool in the cluster

  • modify the load-balancing reverse proxy in front of all minio servers so that it will load-balance also on all new servers from the new pool.
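Putting the above example together, the full server invocation with both pools might look roughly like this (a sketch; host names, mount points and the console flag are illustrative and must match the actual deployment):

minio server \
  https://storage{1...4}.torproject.org/mnt/disk{1...3} \
  https://storage{5...9}.torproject.org/mnt/disk{1...2} \
  --console-address :9090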

See: upstream documentation about expansion

Creating a tiered storage

minio supports tiered storage for moving files from certain buckets out to a different cluster. This can, for example, be used to have your main cluster on faster SSD/NVMe disks while a secondary cluster is provisioned with slower but bigger HDDs.

Note that since, as noted above, the remote tier is a different cluster, server pool expansion and replication sets need to be handled separately for that cluster.

This section is based on the upstream documentation about tiered storage and shows how this setup can be created in your local lab for testing. The upstream documentation has examples, but none of them are directly usable, which makes it pretty difficult to understand what's supposed to happen where. Replicating this in production should just be a matter of adjusting URLs, access keys/user names and secret keys.

We'll mimic the wording that the upstream documentation is using. Namely:

  • The "source cluster" is the minio cluster being used directly by users. In our example procedure below on the local lab, that's represented by the cluster running in the lab container miniomain and accessed via the alias named main.
  • In the case of the current production that would be minio-01, accessed via the mc alias admin.
  • The "remote cluster" is the second tier of minio, a separate cluster where HDDs are used. In our example procedure below on the local lab, that's represented by the cluster running in the lab container miniosecondary and accessed via the alias named secondary.
  • In the case of the current production that would be minio-fsn-02, accessed via the mc alias warm.

Some important considerations noted in the upstream documentation about object lifecycle (the more general name given to what's being done to achieve a tiered storage) are:

  • minio moves objects from one tier to the other when the policy defines it. This means that the second tier cannot be considered by itself as a backup copy! We still need to investigate bucket replication policies and external backup strategies.
  • Objects in the remote cluster need to be available exclusively by the source cluster. This means that you should not provide access to objects on the remote cluster directly to users or applications. Access to those should be kept through the source cluster only.
  • The remote cluster cannot use transition rules of its own to send data to yet another tier. The source tier assumes that data is directly accessible on the remote cluster
  • The destination bucket on the remote cluster must exist before the tier is created on the source cluster

  • On the remote cluster, create a user and a bucket.

The bucket will contain all objects that were transitioned to the second tier and the user will be used by the source cluster to authenticate on the remote cluster when moving objects and when accessing them:

    mc admin user add secondary lifecycle thispasswordshouldbecomplicated
    mc mb secondary/remotestorage

Next, still on the remote cluster, you should make sure that the new user has access to the remotestorage bucket and all objects under it. See the section about how to grant an access policy.
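As a sketch, reusing the policy shape from that section with the remotestorage bucket name (adjust the file path if you run mc in a container, see the note about file access above):

cat > remotestorage-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BucketAccessForUser",
      "Effect": "Allow",
      "Action": ["s3:*"],
      "Resource": [
        "arn:aws:s3:::remotestorage",
        "arn:aws:s3:::remotestorage/*"
      ]
    }
  ]
}
EOF
mc admin policy create secondary remotestorage-policy remotestorage-policy.json
mc admin policy attach secondary remotestorage-policy --user=lifecycle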

  1. On the source cluster, create remote storage tier of type minio named warm:

    mc ilm tier add minio main warm --endpoint http://localhost:9001/ --access-key lifecycle --secret-key thispasswordshouldbecomplicated --bucket remotestorage
    
  2. Note that in the above command we did not specify a prefix. This means that the entire bucket will contain only objects that get moved from the source cluster. So by extension, the bucket should be empty before the tier is added, otherwise you'll get an error when adding the tier.

  3. Also note how a remote tier is tied in to a pair of user and bucket on the remote cluster. If this tier is used to transition objects from multiple different source buckets, then the objects all get placed in the same bucket on the remote cluster. minio names objects after some unique id so it should in theory not be a problem, but you might want to consider whether or not mixing objects from different buckets can have an impact on backups, security policies and other such details.

  4. Lastly on the source cluster we'll create a transition rule that lets minio know when to move objects from a certain bucket to the remote tier. In this example, we'll make objects (current version and all non-current versions, if bucket revisions are enabled) transition immediately to the second tier, but you can tweak the number of days to have a delayed transition if needed.

  5. Here we're assuming that the bucket named source-bucket on the source cluster already exists. If that's not the case, make sure to create it and create and attach policies to grant access to this bucket to the users that need it before adding a transition rule.

     mc ilm rule add main/source-bucket --transition-tier warm --transition-days 0 --noncurrent-transition-days 0 --noncurrent-transition-tier warm
    

Setting up a lifecycle policy administrator user

In the previous section, we configured a remote tier and setup a transition rule to move objects from one bucket to the remote tier.

There's one step from the upstream documentation that we've skipped: creating a user that only has permission to administer lifecycle policies. That wasn't necessary in our example since we were using the admin access key, which has all the rights to all things. If we wish to separate privileges, though, we can create a user that can only administer lifecycle policies.

Here's how we can achieve this:

First, create a policy on the source cluster. The example below allows managing lifecycle policies for all buckets in the cluster. You may want to adjust that policy as needed, for example to permit managing lifecycle policies only on certain buckets. Save the following to a json file on your computer (ideally in a directory that mc can reach):

{
   "Version": "2012-10-17",
   "Statement": [
      {
            "Action": [
               "admin:SetTier",
               "admin:ListTier"
            ],
            "Effect": "Allow",
            "Sid": "EnableRemoteTierManagement"
      },
      {
            "Action": [
               "s3:PutLifecycleConfiguration",
               "s3:GetLifecycleConfiguration"
            ],
            "Resource": [
                        "arn:aws:s3:::*
            ],
            "Effect": "Allow",
            "Sid": "EnableLifecycleManagementRules"
      }
   ]
}

Then import the policy on the source cluster and attach this new policy to the user that should be allowed to administer lifecycle policies. For this example we'll name the user lifecycleadmin (of course, change the secret key for that user):

mc admin policy create main warm-tier-lifecycle-admin-policy /root/.mc/warm-tier-lifecycle-admin-policy.json
mc admin user add main lifecycleadmin thisisasecrettoeverybody
mc admin policy attach main warm-tier-lifecycle-admin-policy --user lifecycleadmin

Setting up a local lab

Running some commands can have an impact on the service rendered by minio. In order to test some commands without impacting the production service, we can create a local replica of the minio service on our laptop.

Note: minio can be run in single-node mode, which is simpler to start. But once a "cluster" is created in single-node mode it cannot be extended to multi-node. So even for local dev it is suggested to create at least two nodes in each server pool (group of minio nodes).

Here, we'll use podman to run services hooked up together in a manner similar to what the service is currently using. That means that we'll have:

  • A dedicated podman network for the minio containers.
  • This makes containers obtain an IP address automatically and container names resolve to the assigned IP addresses.
  • Two instances of minio mimicking the main cluster, named minio1 and minio2
  • The mc client configured to talk to the above cluster via an alias pointing to minio1. Normally the alias should rather point to a hostname that's load-balanced throughout all cluster nodes but we're simplifying the setup for dev.

In all commands below you can change the root password at your convenience.

Create the storage dirs and the podman network:

mkdir -p ~/miniotest/minio{1,2}
mkdir ~/miniotest/mc
podman network create minio

Start main cluster instances:

podman run -d --name minio1 --rm --network minio -v ~/miniotest/minio1:/data -e "MINIO_ROOT_USER=admin" -e "MINIO_ROOT_PASSWORD=testing1234" quay.io/minio/minio server http://minio{1...2}/data --console-address :9090
podman run -d --name minio2 --rm --network minio -v ~/miniotest/minio2:/data -e "MINIO_ROOT_USER=admin" -e "MINIO_ROOT_PASSWORD=testing1234" quay.io/minio/minio server http://minio{1...2}/data --console-address :9090

Configure mc aliases:

alias mc="podman run --network minio -v $HOME/miniotest/mc:/root/.mc --rm --interactive quay.io/minio/mc"
mc alias set minio1 http://minio1:9000 admin testing1234

Now the setup is complete. You can create users, policies, buckets and other artefacts in each different instance.
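To check that both nodes joined the cluster properly, you can query the server status:

mc admin info minio1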

You can also stop the containers, which will automatically remove them. However as long as you keep the directory where the storage volumes are, you can start the containers back up with the same podman run commands above and resume your work from where you left it.

Note that if your tests involve adding more nodes as a new server pool, the additional nodes in the cluster need to have the same hostname with sequentially incremented numbers, so for example a new pool with two additional nodes should be named minio3 and minio4. Also, if you decommission a pool during your tests, you cannot reuse the same hostnames later and must continue to increment numbers in hostnames sequentially.

Once your tests are all done, you can simply stop the containers and then remove the files on your disk. If you wish you can also remove the podman network if you don't plan on reusing it:

podman stop minio1
podman stop minio2
# stop any additional nodes in the same manner as above
rm -rf ~/miniotest
podman network rm minio

Note: To fully replicate production, we should also set up an nginx reverse proxy in the same network, load-balancing through all minio instances, then configure the mc alias to point to the host used by nginx instead. However, the test setup still works when using just one of the nodes for management.

Pager playbook

Restarting the service

The MinIO service runs under the container-minio.service unit. To restart it if it crashed, simply run:

systemctl restart container-minio.service

Disk filling up

If the MinIO disk fills up, you can look in the web interface for a culprit, or on the commandline:

mc du --depth=2  admin

Disaster recovery

If the server is lost with all data, a new server should be rebuilt (see installation) and a recovery from backups should be attempted.

See also the upstream Recovery after Hardware Failure documentation.

Reference

Installation

We followed the hardware checklist to estimate the memory requirement which happily happened to match the default 8g parameter in our Ganeti VM installation instructions. We also set 2 vCPUs but that might need to change.

We set up the server with a plain backend to save disk space on the nodes, with the understanding that this service has lower availability requirements than other services. This is especially relevant since, if we want higher availability, we'll set up multiple nodes, so network-level RAID is redundant here.

The actual command used to create the VM was:

gnt-instance add \
  -o debootstrap+bookworm \
  -t plain --no-wait-for-sync \
  --net 0:ip=pool,network=gnt-dal-01 \
  --no-ip-check \
  --no-name-check \
  --disk 0:size=10G \
  --disk 1:size=1000G \
  --backend-parameters memory=8g,vcpus=2 \
  minio-01.torproject.org

We assume the above scheme is compatible with the Sequential Hostnames requirements in the MinIO documentation. They use minio{1...4}.example.com but we assume the minio prefix is user-chosen, in our case minio-0.

The profile::minio class must be included in the role (currently role::object_storage) for the affected server. It configures the firewall, podman, and sets up the systemd service supervising the container.

Once the install is completed, you should have the admin password in /etc/default/minio, which can be used to access the admin interface and, from there, pretty much do everything you need.

Region configuration

Some manual configuration was done after installation, namely setting access tokens, configuring buckets and the region. The latter is done with:

mc admin config set admin/ region name=dallas

Example:

root@minio-01:~# mc admin config set admin/ region name=dallas
Successfully applied new settings.
Please restart your server 'mc admin service restart admin/'.
root@minio-01:~# systemctl restart container-minio.service
root@minio-01:~# mc admin config get admin/ region
region name=dallas

Manual installation

Those are notes taken during the original installation. That was later converted to Puppet, in the aforementioned profile::minio class, so you shouldn't need to follow this to set up a new host; Puppet should set up everything correctly.

The quickstart guide is easy enough to follow to get us started, but we do some tweaks to:

  • make the podman commandline more self-explanatory using long options

  • assign a name to the container

  • use /srv instead of ~

  • explicitly generate a (strong) password, store it in a config file, and use that

  • just create the container (and not start it), delegating the container management to systemd instead, as per this guide

This is the actual command we use to create (not start!) the container:

PASSWORD=$(tr -dc '[:alnum:]' < /dev/urandom | head -c 32)
echo "MINIO_ROOT_PASSWORD=$PASSWORD" > /etc/default/minio
chmod 600 /etc/default/minio
mkdir -p /srv/data

podman create \
   --name minio \
   --publish 9000:9000 \
   --publish 9090:9090 \
   --volume /srv/data:/data \
   --env "MINIO_ROOT_USER=admin" \
   --env "MINIO_ROOT_PASSWORD" \
   quay.io/minio/minio server /data --console-address ":9090"

We store the password in a file because it will be used in a systemd unit.

This is how the systemd unit was generated:

podman generate systemd --new --name minio | sed 's,Environment,EnvironmentFile=/etc/default/minio\nEnvironment,' > /etc/systemd/system/container-minio.service

Then the unit was enabled and started with:

systemctl enable container-minio.service && systemctl start container-minio.service

That starts MinIO with a web interface on https://localhost:9090 and the API on https://localhost:9000, even though the console messages mention addresses in the 10.0.0.0/8 network.

You can use the web interface to create the buckets, or the mc client which is also available as a Docker container.

The installation was done in issue tpo/tpa/team#41257 which may have more details.

The actual systemd configuration was modified since then to adapt to various constraints, for example the TLS configuration, container updates, etc.

We could consider Podman's quadlets, but those shipped only in Podman 4.4, which barely missed the bookworm release. To reconsider in Debian Trixie.

Upgrades

Upgrades are handled automatically through the built-in podman self-updater, podman-auto-update. The way this works is that the container is run with --pull=never so that a new image is not pulled when the container is started.

Instead, the container is labeled with io.containers.autoupdate=image and that is what makes podman auto-update pull the new image.

The job is scheduled by the podman package under systemd, you can see the current status with:

systemctl status podman-auto-update
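You can also preview what an update run would do, without applying anything:

podman auto-update --dry-run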

Here are the full logs of an example successful run:

root@minio-01:~# journalctl _SYSTEMD_INVOCATION_ID=`systemctl show -p InvocationID --value podman-auto-update.service` --no-pager
Jul 18 19:28:34 minio-01 podman[14249]: 2023-07-18 19:28:34.331983875 +0000 UTC m=+0.045840045 system auto-update
Jul 18 19:28:35 minio-01 podman[14249]: Trying to pull quay.io/minio/minio:latest...
Jul 18 19:28:36 minio-01 podman[14249]: Getting image source signatures
Jul 18 19:28:36 minio-01 podman[14249]: Copying blob sha256:27aad82ab931fe95b668eac92b551d9f3a1de15791e056ca04fbcc068f031a8d
Jul 18 19:28:36 minio-01 podman[14249]: Copying blob sha256:e87e7e738a3f9a5e31df97ce1f0497ce456f1f30058b166e38918347ccaa9923
Jul 18 19:28:36 minio-01 podman[14249]: Copying blob sha256:5329d7039f252afc1c5d69521ef7e674f71c36b50db99b369cbb52aa9e0a6782
Jul 18 19:28:36 minio-01 podman[14249]: Copying blob sha256:7cdde02446ff3018f714f13dbc80ed6c9aae6db26cea8a58d6b07a3e2df34002
Jul 18 19:28:36 minio-01 podman[14249]: Copying blob sha256:5d3da23bea110fa330a722bd368edc7817365bbde000a47624d65efcd4fcedeb
Jul 18 19:28:36 minio-01 podman[14249]: Copying blob sha256:ea83c9479de968f8e8b5ec5aa98fac9505b44bd0e0de09e16afcadcb9134ceaa
Jul 18 19:28:39 minio-01 podman[14249]: Copying config sha256:819632f747767a177b7f4e325c79c628ddb0ca62981a1a065196c7053a093acc
Jul 18 19:28:39 minio-01 podman[14249]: Writing manifest to image destination
Jul 18 19:28:39 minio-01 podman[14249]: Storing signatures
Jul 18 19:28:39 minio-01 podman[14249]: 2023-07-18 19:28:35.21413655 +0000 UTC m=+0.927992710 image pull  quay.io/minio/minio
Jul 18 19:28:40 minio-01 podman[14249]:             UNIT                     CONTAINER             IMAGE                POLICY      UPDATED
Jul 18 19:28:40 minio-01 podman[14249]:             container-minio.service  0488afe53691 (minio)  quay.io/minio/minio  registry    true
Jul 18 19:28:40 minio-01 podman[14385]: 09b7752e26c27cbeccf9f4e9c3bb7bfc91fa1d2fc5c59bfdc27105201f533545
Jul 18 19:28:40 minio-01 podman[14385]: 2023-07-18 19:28:40.139833093 +0000 UTC m=+0.034459855 image remove 09b7752e26c27cbeccf9f4e9c3bb7bfc91fa1d2fc5c59bfdc27105201f533545

You can also see when the next job will run with:

systemctl status podman-auto-update.timer

SLA

This service is not provided in high availability mode, which was deemed too complex for a first prototype in TPA-RFC-56, particularly using MinIO with a container runtime.

Backups, in particular, are not guaranteed to be functional, see backups for details.

Design and architecture

The design of this service was discussed in tpo/tpa/team#40478 and proposed in TPA-RFC-56. It is currently a single virtual machine in the gnt-dal cluster running MinIO, without any backups or redundancy.

This is assumed to be okay because the data stored on the object storage is considered disposable, as it can be rebuilt. For example, the first service which will use the object storage, GitLab Registry, generates artifacts which can normally be rebuilt from scratch without problems.

If the service becomes more popular and more heavily used, we might set up a more highly available system, but at that stage we'll need to look again more seriously at alternatives from TPA-RFC-56, since MinIO's distributed setups are much more complicated and harder to manage than their competitors'. Garage and Ceph are the more likely alternatives, in that case.

We do not use the advanced distributed capabilities of MinIO, but those are documented in this upstream architecture page and this design document.

Services

The MinIO daemon runs under podman and systemd under the container-minio.service unit.

Storage

In a single node setup, files are stored directly on the local disk, but with extra metadata mangled with the file content. For example, assuming you have a directory setup like this:

mkdir test
cd test
touch empty
printf foo > foo

... and you copy that directory over to a MinIO server:

rclone copy test minio:test-bucket/test

On the MinIO server's data directory, you will find:

./test-bucket/test
./test-bucket/test/foo
./test-bucket/test/foo/xl.meta
./test-bucket/test/empty
./test-bucket/test/empty/xl.meta

The data is stored in the xl.meta files as binary, with a bunch of metadata prefixing the actual data:

root@minio-01:/srv/data# strings gitlab/test/empty/xl.meta | tail
x-minio-internal-inline-data
true
MetaUsr
etag
 d41d8cd98f00b204e9800998ecf8427e
content-type
application/octet-stream
X-Amz-Meta-Mtime
1689172774.182830192
null
root@minio-01:/srv/data# strings gitlab/test/foo/xl.meta | tail
MetaUsr
etag
 acbd18db4cc2f85cedef654fccc4a4d8
content-type
application/octet-stream
X-Amz-Meta-Mtime
1689172781.594832894
null
StbC
Efoo

It is possible that such a data store could be considered consistent if quiescent, but MinIO makes no guarantee about that.

There's also a whole .minio.sys directory next to the bucket directories which contains metadata about the buckets, user policies and configurations, again using the obscure xl.meta storage. This is also assumed to be hard to back up.

According to Stack Overflow, there is a proprietary extension to the mc commandline called mc support inspect that allows inspecting on-disk files, but it requires a "MinIO SUBNET" registration, which is a support contract with MinIO, inc.

Erasure coding

In distributed setups, MinIO uses erasure coding to distribute objects across multiple servers and/or sets of drives. According to their documentation:

MinIO Erasure Coding is a data redundancy and availability feature that allows MinIO deployments to automatically reconstruct objects on-the-fly despite the loss of multiple drives or nodes in the cluster. Erasure Coding provides object-level healing with significantly less overhead than adjacent technologies such as RAID or replication.

This implies that the actual files on disk are not readily readable using normal tools in a distributed setup.

An important tool for capacity planning can help you know how much actual storage space will be available and with how much redundancy given a number of servers and disks.

Erasure coding is automatically determined by minio based on the number of servers and drives provided when creating the cluster. See the upstream documentation about erasure coding.

In addition to the above note about local storage being unavailable for consistently reading data directly from disk, the erasure coding documentation mentions the following important information:

MinIO requires exclusive access to the drives or volumes provided for object storage. No other processes, software, scripts, or persons should perform any actions directly on the drives or volumes provided to MinIO or the objects or files MinIO places on them.

So nobody or nothing (script, cron job) should ever apply modifications to minio's storage files on disk.

To determine the erasure coding that minio currently has set for the cluster, you can look at the output of:

mc admin info alias

This shows information about all nodes and the state of their drives. Towards the end of the output, you also get information about the stripe size (number of data + parity drives in each erasure set) and the number of parity drives, thus showing how many drives you can lose before risking data loss. For example:

┌──────┬────────────────────────┬─────────────────────┬──────────────┐
│ Pool │ Drives Usage           │ Erasure stripe size │ Erasure sets │
│ 1st  │ 23.6% (total: 860 GiB) │ 2                   │ 1            │
│ 2nd  │ 23.6% (total: 1.7 TiB) │ 3                   │ 1            │
└──────┴────────────────────────┴─────────────────────┴──────────────┘

58 KiB Used, 1 Bucket, 2 Objects
5 drives online, 0 drives offline, EC:1

In the above output, we have two pools, one with a stripe size of 2 and one with a stripe size of 3. The cluster has an erasure coding of one (EC:1) which means that each pool can sustain up to 1 disk failure and still be able to recover after the drive has been replaced.

The stripe size is roughly equivalent to the number of available disks within a pool, up to 16. If a pool has more than 16 drives, minio divides the drives into a number of stripes (groups). Each stripe manages erasure coding separately, and the disks for different stripes are spread across machines to minimize the impact of a host going down (so if one host goes down it will affect more stripes simultaneously, but with a smaller impact -- fewer disks go down for each stripe at once).

Setting erasure coding at run time

It is possible to tell minio to change its target for erasure coding while the cluster is running. For that we use the mc admin config set command.

For example, here we'll set our local lab cluster to 4 parity disks in standard configuration (all hosts up/available) and 3 disks for reduced redundancy:

mc admin config set minio1 storage_class standard=EC:4 rrs=EC:3 optimize=availability

When setting this config, standard should always be 1 more than rrs or equal to it.

Also importantly, note that the erasure coding configuration applies to all of the cluster at once. So the values chosen for number of parity disks should be able to apply to all pools at once. In that sense, choose the number of parity disks with the smallest pool in mind.
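You can read back the currently applied values with:

mc admin config get minio1 storage_class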

Note that it is possible to set the number of parity drives to 0 with a value of EC:0 for both standard and rrs. This means that losing a single drive and/or host will incur data loss! But considering that we currently run minio on top of RAID, this could be a way to reduce the amount of physical disk space lost to redundancy. It does increase risks linked to mis-handling things underneath (e.g. accidentally destroying the VM or just the volume when running commands in ganeti). Upstream recommends against running minio on top of RAID, which is probably what we'd want to follow if we were to plan for a very large object storage cluster.

TODO: it is not yet clear for us how the cluster responds to the config change: does it automatically rearrange disks in pool to fit the new requirements?

See: https://github.com/minio/minio/tree/master/docs/config#storage-class

Queues

MinIO has built-in lifecycle management where objects can be configured to have an expiry date. That is done automatically inside MinIO with a low priority object scanner.
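For example, a rule like the following would make objects in a bucket expire after 30 days (a sketch; the bucket name is illustrative, and the exact flag can be double-checked with mc ilm rule add --help):

mc ilm rule add admin/gitlab-ci-runner-cache --expire-days 30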

Interfaces

There are two main interfaces, the S3 API on port 9000 and the MinIO management console on port 9090.

The management console is limited to an allow list including the jump hosts, which might require port forwarding, see Accessing the web interface for details, and Security and risk assessment for a discussion.

The main S3 API is available globally at https://minio.torproject.org:9000, a CNAME that currently points at the minio-01 instance.

Note that this URL, if visited in a web browser, redirects to the 9090 interface, which can be blocked.

Authentication

We use the built-in MinIO identity provider. There are two levels of access controls: control panel access (port 9090) is given to users, which are in turn issued access tokens that can access the "object storage" API (port 9000).

Admin account usage

The admin user is defined in /etc/default/minio on minio-01 and has an access token saved in /root/.mc that can be used with the mc commandline client, see the tests section for details.

The admin user MUST only be used to manage other user accounts, as an access key leakage would be catastrophic. Access keys basically impersonate a user account, and while it's possible to have access policies per token, we've made the decision to do access controls with user accounts instead, as that seemed more straightforward.

Tests can be performed with the play alias instead, which uses the demonstration server from MinIO upstream.

The normal user accounts are typically accessed with tokens saved as aliases on the main minio-01 server. If that access is lost, you can use the password reset procedures to recover.

Each user is currently allowed to access only a single bucket. We could relax that by allowing users to access an arbitrary number of buckets, prefixed with their usernames, for example.

A counter-intuitive fact is that when a user creates a bucket, they don't necessarily have privileges over it. To work around this, we could allow users to create arbitrary bucket names and use bucket notifications, probably through a webhook, to automatically grant rights to the bucket to the caller, but there are security concerns with that approach, as it broadens the attack surface to the webhook endpoint. But this is more typical of how "cloud" services like S3 operate.

Monitoring token

Finally, there's a secret token to access the MinIO statistics that's generated on the fly. See the monitoring and metrics section.

Users and access tokens

There are two distinct authentication mechanisms to talk to MinIO, as mentioned above.

  • user accounts: those grant access to the control panel (port 9090)
  • service accounts: those grant access to the "object storage" API (port 9000)

At least, that was my (anarcat) original understanding. But now that the control panel is gone and that we do everything over the commandline, I suspect those share a single namespace and that they can be used interchangeably.

In other words, the distinction is likely more:

  • user accounts: a "group" of service tokens that hold more power
  • service accounts: a sub-account that allows users to limit the scope of applications, that inherits the user access policy unless a policy is attached to the service account

In general, we try to avoid the proliferation of user accounts. Right now, we grant user accounts per team: we have a network-health user, for example.

We also have per service users, which is a bit counter-intuitive. We have a gitlab user, for example, but that's only because GitLab is so huge and full of different components. Going forward, we should probably create a tpa account and use service accounts per service to isolate different services.

Each service account SHOULD get its own access policy that limits its access to its own bucket, unless the service is designed to have multiple services use the same bucket, in which case it makes sense to have multiple service accounts sharing the same access policy.

TLS certificates

The HTTPS certificate is managed by our normal Let's Encrypt certificate rotation, but required us to pull the DH PARAMS, see this limitation of crypto/tls in Golang and commit letsencrypt-domains@ee1a0f7 (stop appending DH PARAMS to certificates files, 2023-07-11) for details.

Implementation

MinIO is implemented in Golang, as a single binary.

The service is currently used by the Gitlab service. It will also be used by the Network Health team for metrics storage.

Issues

There is no issue tracker specifically for this project. File or search for issues in the team issue tracker with the label ~"Object Storage".

Upstream has an issue tracker on GitHub that is quite clean (22 open issues out of 6628) and active (4 opened, 71 closed issues in the last month as of 2023-07-12).

MinIO offers a commercial support service which provides 24/7 support with a <48h SLA at 10$/TiB/month. Their troubleshooting page also mentions a community Slack channel.

Maintainer

anarcat set up this service in July 2023 and TPA is responsible for managing it. LeLutin did the research and deployment of the multiple nodes.

Users

The service is currently used by the Gitlab service but may be expanded to other services upon request.

Upstream

MinIO is a well-known object storage provider. It is not packaged in Debian. It has regular releases, but they do not have release numbers conforming to the semantic versioning standard. Their support policy is unclear.

Licensing dispute

MinIO are involved in a licensing dispute with commercial storage providers (Weka and Nutanix) because the latter used MinIO in their products without giving attribution. See also this hacker news discussion.

It should also be noted that they switched to the AGPL relatively recently.

This is not seen as a deal-breaker in using MinIO for TPA.

Monitoring and metrics

The main Prometheus server is configured to scrape metrics directly from the minio-01 server. This was done by running the following command on the server:

mc admin prometheus generate admin

... and copying the bearer token into the Prometheus configuration (profile::prometheus::server::internal in Puppet). Look for minio_prometheus_jwt_secret.

The upstream monitoring documentation does not mention it, but there's a range of Grafana dashboards as well. Unfortunately, we couldn't find a working one in our search; even the basic one provided by MinIO, Inc doesn't work.

We did manage to import this dashboard from micah, but it is currently showing mostly empty graphs. It could be that we don't have enough metrics yet for the dashboards to operate correctly.

Fortunately, our MinIO server is configured to talk with the Prometheus server with the MINIO_PROMETHEUS_URL variable, which makes various metrics visible directly in https://localhost:9090/tools/metrics.

Tests

To make sure the service still works after an upgrade, you can try creating a bucket.

Logs

The logs from the last boot of the container-minio.service can be inspected with:

journalctl -u container-minio.service -b

MinIO doesn't seem to keep PII in its logs but PII may of course be recorded in the buckets by the services and users using it. This is considered not the responsibility of the service.

Backups

MinIO uses a storage backend that possibly requires the whole service to be shut down before backups are made in order for backups to be consistent.

It is therefore assumed that backups are not consistent and that recovery from a complete loss of a host is difficult or impossible.

This clearly needs to be improved, see the upstream data recovery options and their stance on business continuity.

This will be implemented as part of TPA-RFC-84, see tpo/tpa/team#41415.

Other documentation

Discussion

Overview

This project was started in response to growing large-scale storage problems, particularly the need to host our own GitLab container registry, which culminated in TPA-RFC-56. That RFC discussed various solutions to the problem and proposed using a single object storage server running MinIO as a backend to the GitLab registry.

Security and risk assessment

Track record

No security audit has been performed on MinIO that we know of.

There have been a few security vulnerabilities in the past, but none published there since March 2021. There is however a steady stream of vulnerabilities on CVE Details, including an alarming disclosure of the MINIO_ROOT_PASSWORD (CVE-2023-28432). It seems like newer vulnerabilities are disclosed through their GitHub security page.

They only support the latest release, so automated upgrades are a requirement for this project.

Disclosure risks

There's an inherent risk of bucket disclosure with object storage APIs. There's been numerous incidents of AWS S3 buckets being leaked because of improper access policies. We have tried to establish good practices on this by having scoped users and limited access keys, but those problems are ultimately in the hands of users, which is fundamentally why this is such a big problem.

Upstream has a few helpful guides there:

Audit logs and integrity

MinIO supports publishing audit logs to an external server, but we do not believe this is currently necessary given that most of the data on the object storage is supposed to be public GitLab data.

MinIO also has many features to ensure data integrity and authenticity, namely erasure coding, object versioning, and immutability.

Port forwarding and container issues

We originally had problems with our container-based configuration, as the podman run --publish lines made it impossible to firewall effectively using our normal tools (see incident tpo/tpa/team#41259). This was due to the NAT tables created by podman, which were forwarding packets before they hit our normal INPUT rules. This made the service globally accessible, while we actually want to somewhat restrict it, at the very least the administration interface.

The fix ended up being running the container with relaxed privileges (--network=host). This could also have been worked around by using an Nginx proxy in front, and upstream has a guide on how to Use Nginx, LetsEncrypt and Certbot for Secure Access to MinIO.

UNIX user privileges

The container is run as the minio user created by Puppet, using podman --user but not the User= directive in the systemd unit. The latter doesn't work as podman expects a systemd --user session; see also upstream issue 12778 for that discussion.

Admin interface access

We're not fully confident that opening up this attack surface is worth it so, for now, we grant access to the admin interface to an allow list of IP addresses. The jump hosts should have access to it. Extra accesses can be granted on an as-needed basis.

It doesn't seem like upstream recommends this kind of extra security, that said.

Currently, the user creation procedures and bucket policies should be good enough to allow public access to the management console, that said. If we change this policy, a review of the documentation here will be required, in particular the interfaces, authentication and Access the web interface sections.

Note: Since the initial discussion around this subject, the admin web interface was stripped out of all administrative features. Only bucket creation and browsing is left.

Technical debt and next steps

Some of the Puppet configuration could be migrated to a Puppet module, if we're willing to abandon the container strategy and switch to upstream binaries. This will impact automated upgrades however. We could also integrate our container strategy in the Puppet module.

Another big problem with this service is the lack of appropriate backups, see the backups section for details.

Proposed Solution

This project was discussed in TPA-RFC-56.

Other alternatives

Other object storage options

See TPA-RFC-56 for a thorough discussion.

MinIO Puppet module

The kogitoapp/minio module provides a way to configure one or many MinIO servers. Unfortunately it suffers from a set of limitations:

  1. it doesn't support Docker as an install method, only binaries (although to its defense it does use a checksum...)

  2. it depends on the deprecated puppet-certs module

  3. even if it would depend on the newer puppet-certificates module, that module clashes with the way we manage our own certificates... we might or might not want to use this module in the long term, but right now it seems too big of a jump to follow

  4. it hasn't been updated in about two years (last release in September 2021, as of July 2023)

We might still want to consider that module if we expand the fleet to multiple servers.

Other object storage clients

In the above guides, we use rclone to talk to the object storage server, as a generic client, but there are obviously many other implementations that can talk with cloud providers such as MinIO.

We picked rclone because it's packaged in Debian, fast, allows us to store access keys encrypted, and is generally useful for many other purposes as well.

Other alternatives include:

  • s3cmd and aws-cli are both packaged in Debian, but it's unclear whether they are usable with remotes other than the Amazon S3 service (see the sketch after this list)
  • boto3 is a Python library that allows one to talk to object storage services, presumably not just Amazon S3; Ruby Fog is the equivalent for Ruby, and is actually used in GitLab
  • restic can backup to S3 buckets, and so can other backup tools (e.g. on Mac, at least Arq, Cyberduck and Transmit apparently can)
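For the record, and untested on our setup, both aws-cli and s3cmd accept a custom endpoint, so something like the following should work against MinIO (the credentials and bucket name are placeholders):

export AWS_ACCESS_KEY_ID=EXAMPLEKEY AWS_SECRET_ACCESS_KEY=EXAMPLESECRET
aws --endpoint-url https://minio.torproject.org:9000 s3 ls s3://gitlab-registry
s3cmd --access_key=EXAMPLEKEY --secret_key=EXAMPLESECRET --host=minio.torproject.org:9000 --host-bucket=minio.torproject.org:9000 ls s3://gitlab-registry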