A "status" dashboard is a simple website that allows service admins to clearly and simply announce down times and recovery.
Note that this could be considered part of the documentation system, but is documented separately.
The site is at https://status.torproject.org/ and the source at https://gitlab.torproject.org/tpo/tpa/status-site/.
[[TOC]]
Tutorial
Local development environment
To install the development environment for the status site, you should have a copy of the Hugo static site generator and the git repository:
sudo apt install hugo
git clone --recursive -b main https://gitlab.torproject.org/tpo/tpa/status-site.git
cd status-site
WARNING: the URL of the Git repository changed! It used to be hosted at GitLab, but is now hosted at Gitolite. The repository is mirrored to GitLab, but pushing there will not trigger build jobs.
Then you can start a local development server to preview the site with:
hugo serve --baseURL=http://localhost/
firefox http://localhost:1313/
The content can also be built into the public/ directory by simply running:
hugo
Creating new issues
Issues are stored in content/issues/. You can create a new issue
with hugo new, for example:
hugo new issues/2021-02-03-testing-cstate-again.md
This creates the file from a pre-filled template (called an
archetype in Hugo)
and puts it in content/issues/2021-02-03-testing-cstate-again.md.
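The file name in the example follows a date-then-slug pattern. As a minimal sketch, assuming you want to derive that name from a start date and a free-form title, a small helper could build the argument for hugo new. The `issue_path` function is hypothetical and not part of Hugo or cState:

```shell
# Hypothetical helper: build the path to pass to "hugo new" from a
# start date (YYYY-MM-DD) and a free-form title.
issue_path() {
    start_date=$1
    title=$2
    # Lowercase the title, turn every run of non-alphanumerics into a
    # single dash, then trim leading and trailing dashes.
    slug=$(printf '%s' "$title" \
        | tr '[:upper:]' '[:lower:]' \
        | sed -e 's/[^a-z0-9][^a-z0-9]*/-/g' -e 's/^-//' -e 's/-$//')
    printf 'issues/%s-%s.md\n' "$start_date" "$slug"
}

# Usage (requires hugo, run from the top of the repository):
#   hugo new "$(issue_path 2021-02-03 'Testing cState again')"
```

Using the date the issue started, rather than today's date, keeps the file name consistent with the date field.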
If you do not have hugo installed locally, you can also copy the
template directly (from themes/cstate/archetypes/default.md), or
copy an existing issue and use it as a template.
Otherwise the upstream guide on how to create issues is fairly thorough and should be followed.
In general, keep in mind that the date field is when the issue
started, not when you posted the issue; see this feature
request asking for an explicit "update" field.
Also note that you can add draft: true to the front-matter (the
block on top) to keep the post from being published on the front page
before it is ready.
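As a rough sketch of how the draft flag could be checked before pushing, assuming the front-matter is a YAML block delimited by --- lines, the hypothetical `is_draft` helper below reports whether an issue file is still marked as a draft:

```shell
# Hypothetical helper: succeed if the front-matter of the given issue
# file contains "draft: true". Assumes the front-matter is delimited
# by "---" lines; anything after the closing delimiter is ignored.
is_draft() {
    awk '
        /^---[[:space:]]*$/ { fences++; if (fences == 2) exit; next }
        fences == 1 && /^draft:[[:space:]]*true[[:space:]]*$/ { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Usage:
#   is_draft content/issues/2021-02-03-testing-cstate-again.md && echo "still a draft"
```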
Uploading site to the static mirror system
Uploading the site is automated by continuous integration. So you simply need to commit and push:
git commit -a -myolo
git push
Note that only the TPA group has access to the repository for now,
but other users can request access as needed.
You can see the progress of build jobs in the GitLab CI pipelines. If all goes well, successful webhook deliveries should show up in this control panel as well.
If all goes well, the changes should propagate to the mirrors within a few seconds to a minute.
See also the disaster recovery options below.
Keep in mind that this is a public website. You might want to talk
with the comms@ people before publishing big or sensitive
announcements.
How-to
Changing categories
cState relies on "systems" which live inside a "category". For example,
the "v3 onion services" are in the "Tor network" category. Those are
defined in the config.yml file, and each issue (in content/issues)
refers to one or more "systems" that are affected by it.
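Since an issue can only refer to systems that exist in config.yml, it can help to list what is currently defined before editing. The following is a minimal sketch, assuming the layout of the upstream cState example configuration, where each system is a `- name: ...` list entry under a `systems:` key. The `list_systems` helper is hypothetical and is not a full YAML parser:

```shell
# Hypothetical helper: print the name of every entry listed under the
# "systems:" key of a cState config file. Assumes space indentation and
# that each entry starts with "- name: ...".
list_systems() {
    awk '
        function indent(s) { match(s, /^ */); return RLENGTH }
        # Skip comments and blank lines.
        /^[[:space:]]*(#|$)/ { next }
        # Entering the systems section: remember how deep the key is.
        /^ *systems:[[:space:]]*$/ { in_systems = 1; base = indent($0); next }
        # A key at the same or a shallower depth closes the section.
        in_systems && !/^ *-/ && indent($0) <= base { in_systems = 0 }
        in_systems && /^ *- *name:/ {
            sub(/^ *- *name:[[:space:]]*/, "")
            gsub(/^"|"[[:space:]]*$/, "")
            print
        }
    ' "$1"
}

# Usage:
#   list_systems config.yml
```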
Theming
The logo lives in static/logo.png. Some colors are defined in
config.yml; search for "Colors throughout cState".
Pager playbook
No monitoring specific to this service exists.
Disaster recovery
It should be possible to deploy the static website anywhere that supports plain HTML, assuming you have a copy of the git repository.
The instructions in all of the subsections below assume you have a copy of the git repository.
Important: make sure you follow the installation instructions to also clone the submodules!
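If the repository was cloned without --recursive, the theme directory will be empty and the build will not render properly. A minimal sketch for detecting and repairing that follows; the helper names are hypothetical, and the check assumes the archetype path mentioned earlier (themes/cstate/archetypes/default.md) is a reliable marker for a populated theme:

```shell
# Hypothetical check: succeed if the cState theme submodule looks
# populated in the given checkout (defaults to the current directory).
theme_present() {
    [ -f "${1:-.}/themes/cstate/archetypes/default.md" ]
}

# If the theme is missing, fetch the submodules after the fact.
ensure_theme() {
    theme_present "${1:-.}" && return 0
    git -C "${1:-.}" submodule update --init --recursive
}

# Usage, from the top of the checkout:
#   ensure_theme && hugo
```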
If the git repository is not available, you could start from scratch using the example repository as well.
From here on, it is assumed you have a copy of the git repository (or the example one).
Those procedures were not tested.
Manual deployment to the static mirror system
If GitLab is down, you can upload the public/ folder content under
/srv/static-gitlab-shim/status.torproject.org/.
The canonical source for the static websites rotation is defined in
Puppet (in modules/staticsync/data/common.yaml) and is
currently set to static-gitlab-shim.torproject.org. This rsync command
should be enough:
rsync -rtP public/ static-gitlab-shim@static-gitlab-shim.torproject.org:/srv/static-gitlab-shim/status.torproject.org/public/
This might require adding your key to
/etc/ssh/userkeys/static-gitlab-shim.more.
Then the new source material needs to be synchronized to the mirrors, with:
sudo -u mirroradm static-update-component status.torproject.org
This requires access to the mirroradm group, although typically the
machine is only accessible to TPA anyway.
Don't forget to push the changes to the git repository, once that is available. It's important so that the next people can start from your changes:
git commit -a -myolo
git push
Netlify deployment
Upstream has instructions to deploy to Netlify, which, in our case, might be as simple as following this link and filling in those settings:
- Build command: hugo
- Publish directory: public
- Add one build environment variable
  - Key: HUGO_VERSION
  - Value: 0.48 (or later)
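Netlify can also read these settings from a netlify.toml file at the root of the repository instead of the web form. The sketch below writes such a file with the values listed above. The `write_netlify_config` helper is hypothetical, and the pinned version is an assumption: pick whichever Hugo release at or above 0.48 you actually want to build with.

```shell
# Hypothetical helper: write a netlify.toml carrying the build settings
# listed above. The target path defaults to ./netlify.toml.
write_netlify_config() {
    cat > "${1:-netlify.toml}" <<'EOF'
[build]
  command = "hugo"
  publish = "public"

[build.environment]
  # Assumption: replace with the Hugo version you want (0.48 or later).
  HUGO_VERSION = "0.48"
EOF
}

# Usage, from the top of the repository:
#   write_netlify_config && git add netlify.toml
```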
Then, of course, DNS needs to be updated to point there.
GitLab pages deployment
A site could also be deployed on another GitLab server with "GitLab pages" enabled. For example, if the repository is pushed to https://gitlab.com/, the GitLab CI/CD system there will automatically pick up the configuration and run it.
Unfortunately, due to the heavy customization we used to deploy the
site to the static mirror system, the stock .gitlab-ci.yml file will
likely not work on another system. An alternate .gitlab-ci-pages.yml
file should be available in the Git repository and can be activated in
the GitLab project in Settings -> CI/CD -> CI/CD configuration file.
That should give you a "test" GitLab pages site with a URL like:
https://user.gitlab.io/tpa-status/
To transfer the real site there, you need to go into the project's
Settings -> Pages section and hit New Domain.
Enter status.torproject.org there, which will ask you to add a
TXT record in the torproject.org zone.
Add the TXT record to domains.git/torproject.org, commit and push,
then hit the "Retry verification" button in the GitLab interface.
Once the domain is verified, point the status.torproject.org domain
to the new backend:
status CNAME user.gitlab.io
For example, in my case, it was:
status CNAME anarcat.gitlab.io
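Once the zone change is pushed, it can be useful to confirm that the record is actually being served. This is a minimal sketch assuming dig is installed and the resolver in use has picked up the change; both helper names are hypothetical:

```shell
# Lowercase a DNS name and drop the trailing dot, so that
# "Anarcat.GitLab.io." and "anarcat.gitlab.io" compare equal.
normalize_dns_name() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/\.$//'
}

# Succeed if the CNAME of $1 currently resolves to $2.
# Needs dig and network access.
cname_points_to() {
    actual=$(dig +short "$1" CNAME | head -n 1)
    [ "$(normalize_dns_name "$actual")" = "$(normalize_dns_name "$2")" ]
}

# Usage:
#   cname_points_to status.torproject.org user.gitlab.io && echo "DNS updated"
```

The normalization step matters because dig prints fully qualified names with a trailing dot, while the zone file entry above is written without one.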
See also the upstream documentation for details.
Those are the currently known mirrors of the status site:
Reference
Installation
See the instructions on how to set up a local development environment and the design section for more information on how this is set up.
Upgrades
Upgrades to the software are performed by updating the cstate submodule.
Since November 2023, the renovate-cron bot passes through the project to make sure that submodule is up to date.
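Should the submodule ever need to be bumped by hand, the steps could look like the sketch below. It assumes the theme lives in themes/cstate (the path used for the archetype earlier) and that the submodule tracks the upstream branch you want. The `run` and `bump_theme` helpers and the commit message are hypothetical. Setting DRY_RUN prints the commands instead of executing them.

```shell
# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Move the cstate submodule to the latest upstream commit and record
# the new commit in the parent repository.
bump_theme() {
    run git submodule update --init --remote themes/cstate &&
    run git add themes/cstate &&
    run git commit -m "Update cstate theme submodule"
}

# Usage, from the top of the repository:
#   DRY_RUN=1 bump_theme   # preview the commands
#   bump_theme && git push
```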
Hugo itself is managed through the Debian packages provided as part of
the bookworm container, and therefore benefits from the normal Debian
support policies. Major Debian upgrades need to be manually performed
in the .gitlab-ci.yml file and are not checked by renovate.
SLA
This service should be highly available. It should tolerate the failure of one or all points of presence: if all fail, it should be easy to deploy it to a third-party provider.
Design and architecture
The status site is part of the static mirror system and is built with cstate, which is a theme for the Hugo static site generator. The site is managed in a git repository on the GitLab server and uses GitLab CI to get built. The static-gitlab-shim service propagates the builds to the static mirror system for high availability.
See the static-gitlab-shim service design document for more information.
Services
No service other than the above external services is required to run this service.
Queues
There are no queues or schedulers for that service, although renovate-cron will pass by the project to check for updates once in a while.
Interfaces
Authentication
Implementation
Status is mostly written in Markdown, but the upstream code is written in Golang and its templating language.
Related services
Issues
File or search for issues in the status-site tracker.
Upstream issues can be found and filed in the GitHub issue tracker.
Users
TPA is the main maintainer of this service and therefore its most likely user, but the network health team are frequent users as well.
Naturally, any person interested in the Tor project and the health of the services is also a potential user.
Upstream
cState is a pretty collaborative and active upstream. It is seeing regular releases and is considered healthy, especially since most of the implementation is actually in hugo, another healthy project.
Monitoring and metrics
No metrics for this service are currently defined in Prometheus, outside of normal web server monitoring.
Tests
New changes to the site are manually checked by browsing a rendered version of the site and clicking around.
This can be done on a local copy before even committing, or it can be done with a review site by pushing a branch and opening a merge request.
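Before clicking around, a quick sanity check can confirm that the build produced a site at all. This is only a sketch of a possible pre-check and not a replacement for the manual review; the `check_build_output` helper is hypothetical:

```shell
# Hypothetical check: succeed if the given directory (default: public)
# looks like a rendered site, meaning it has a non-empty index.html.
check_build_output() {
    [ -s "${1:-public}/index.html" ]
}

# Usage (requires hugo): build into a scratch directory, then check it.
#   out=$(mktemp -d) && hugo --destination "$out" && check_build_output "$out"
```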
Logs
There are no logs or metrics specific to this service; see the static site service for details.
A history of deployments and past versions of the code is of course available in the Git repository history and the GitLab job logs.
Backups
Does not need special backups: backed up as part of the regular static site and git services.
Other documentation
- cState home page
- demo site
- cState wiki, see in particular the usage and configuration guides
Discussion
Overview
This project comes from two places:
- during the 2020 TPA user survey, some respondents suggested to document "down times of 1h or longer" and better communicate about service statuses
- separately, following a major outage in the Tor network due to a DDOS, the network team and network health teams asked for a dashboard to inform tor users about such problems in the future
This is therefore a project spanning multiple teams, with different stakeholders. The general idea is to have a site (say status.torproject.org) that simply shows users how things are going, in an easy to understand form.
Security and risk assessment
No security audit was performed of this service, but considering it only manages static content accessed by trusted users, its exposure is considered minimal.
It might be the target of denial of service attacks, like the rest of the static mirror system. A compromise of the GitLab infrastructure would also naturally give access to the status site.
Finally, if an outage affects the main domain name (torproject.org)
this site could suffer as well.
Technical debt and next steps
The service should probably be moved onto an entirely different domain, managed on a different registrar, using keys stored in a different password manager.
There used to be no upgrades performed on the site, but that was fixed in November 2023, during the Hackweek.
Goals
In general, the goal is to provide a simple interface to provide users with status updates.
Must have
- user-friendly: the public website must be easy to understand by the wider Tor community of users (not just TPI/TPA)
- status updates and progress: "post status problem we know about so the world can learn if problems are known to the Tor team."
- example: "[recent] v3 outage where we could have put out a small FAQ right away (go static HTML!) and then update the world as we figure out the problem but also expected return to normal."
- multi-stakeholder: "easily editable by many of us namely likely the network health team and we could also have the network team to help out"
- simple to deploy and use: pushing an update shouldn't require complex software or procedures. editing a text file, committing and pushing, or building with a single command and pushing the HTML, for example, is simple enough. installing a MySQL database and PHP server, for example, is not simple enough.
- keep it simple
- free-software based
Nice to have
- deployment through GitLab (pages?), with contingency plans
- separate TLD to thwart DNS-based attacks against torproject.org
- same tool for multiple teams
- per-team filtering
- RSS feeds
- integration with social media?
- responsive design
Non-Goals
- automation: updating the site is a manual process. no automatic reports of sensors/metrics or Nagios, as this tends to complicate the implementation and cause false positives
Approvals required
TPA, network team, network health team.
Proposed Solution
We're experimenting with cstate because it's the only static website generator with such a nice template out of the box that we could find.
Cost
Just research and development time. Hosting costs are negligible.
Alternatives considered
Those are the status dashboards we know about and that are still somewhat in active development:
- Cachet
- PHP
- MySQL database
- demo site (test@test.com, test123)
- responsive
- not decentralized
- no Nagios support
- user-friendly
- publicly accessible
- fairly easy to use
- aims for LDAP support
- no Twitter, Identica, IRC or XMPP support for now
- dropped RSS support
- future of the project uncertain (4037, 3968)
- cstate, Hugo-based static site generator, tag-based RSS feeds, easy setup on Netlify, GitLab CI integration, badges, read only API
- Staytus
- ruby
- MySQL database
- responsive
- email notifications
- mobile-friendly
- not distributed
- no Nagios integration
- no Twitter notifications
- user-friendly - seems to be even nicer than Cachet, as there are links to individual announcements and notifications
- no LDAP support
- MIT-licensed
- similar performance problems to Cachet
- vigil-server
- tinystatus
- uptime kuma, more of a monitoring platform
Abandonware
Those were evaluated in a previous life but ended up being abandoned upstream:
- Overseer - used at Disqus.com, Python/Django, user-friendly/simple, administrator non-friendly, twitter integration, Apache2 license, development stopped, Disqus replaced it with Statuspage.io
- Stashboard - used at Twilio, MIT license, demo, Twitter integration, REST API, abandon-ware, no authentication, no Unicode support, depends on Google App engine, requires daily updates
- Baobab - previously used at Gandi, replaced with
statuspage.io, Django based
Hacks
Those were discarded because they do not provide an "out of the box" experience:
- use Jenkins to run jobs that check a bunch of things and report a user-friendly status?
- just use a social network account (e.g. Twitter)
- "just use the wiki"
- use Drupal ("there's a module for that")
- roll our own with Lektor, e.g. using this template
- using GitHub issues
Example sites
- Amazon Service Health Dashboard
- Disqus - based on statuspage.io
- GitLab - based on status.io
- GitHub - "Battle station fully operational", auto-refresh, twitter-connected, simple color coded (see this blog post for more details), not open-source (confirmed in personal email between GitHub support and anarcat on 2013-05-02)
- Potager.org - ikiwiki based
- Riseup.net - RSS feeds
- Signal - simple, plain HTML page
- sr.ht - cState
- Twilio - email, slack, RSS subscriptions, lots of services shown
- Wikimedia - based on proprietary nimsoft software, deprecated in favor of Grafana
Previous implementations
IRC bot
A similar service was run by @weasel around 2014. It would bridge the
status comments on IRC into a website, see this archived version
and the source code, which is still available.
Jenkins jobs
The site used to be built with Jenkins jobs, from a git repository on the git server. This was set up this way because that is how every other static website was built back then.
This involved:
- a new static component owned by torwww (in the tor-puppet.git repository)
- a new build script in the jenkins/tools.git repository
- a new build job in the jenkins/jobs.git repository
- a new entry in the ssh wrapper in the admin/static-builds.git repository
- a new gitolite repository with hooks to ping the Jenkins server and mirror to GitLab
We also considered using GitLab CI for deployment but (a) GitLab pages was not yet set up and (b) it didn't integrate well with the static mirror system at the time. See the broader discussion of the static site system improvements.
Both issues have now been fixed thanks to the static-gitlab-shim service.