Donate-neo is the new Django-based donation site that is the frontend for https://donate.torproject.org.

[[TOC]]

Tutorial

Starting a review app

Pushing a commit to a non-main branch in the project repository will trigger a CI pipeline that includes a deploy-review job. This job will deploy a review app hosted at <branchname>.donate-review.torproject.net.

Commits to the main branch will be deployed to a review app by the deploy-staging job. The deployment process is similar except the app will be hosted at staging.donate-review.torproject.net.

All review apps are automatically stopped and cleaned up once the associated branch is deleted.
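As a rough illustration of the hostname scheme described above, here is a minimal sketch. The slug rule (lowercase, non-alphanumerics turned into dashes, 63 characters at most) is an assumption modelled on how GitLab derives branch slugs; the actual CI job may name hosts differently, and both function names are hypothetical.

```python
import re

REVIEW_DOMAIN = "donate-review.torproject.net"  # domain taken from the text above


def slugify_branch(branch: str) -> str:
    """Assumed slug rule: lowercase, non-alphanumerics to '-', 63 chars max."""
    slug = re.sub(r"[^a-z0-9]+", "-", branch.lower())
    return slug[:63].strip("-")


def review_app_url(branch: str) -> str:
    """Return the URL where a branch's review app is expected to be hosted."""
    # Commits to main are deployed by deploy-staging under the "staging" name.
    host = "staging" if branch == "main" else slugify_branch(branch)
    return f"https://{host}.{REVIEW_DOMAIN}/"


if __name__ == "__main__":
    print(review_app_url("main"))
    print(review_app_url("feature/New_Perks"))
```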

Testing the donation site

This is the DONATE PAGE TESTING PLAN, START TESTING 26 AUGUST 2024 (except crypto any time). It was originally written as a Google Docs document and was converted into this wiki page for future-proofing in August 2024; see tpo/web/donate-neo#14.

The donation process can be tested without a real credit card. When the frontend (donate.torproject.org) is updated, GitLab CI builds and deploys a staging version at <https://staging.donate-review.torproject.net/>.

It's possible to fill in the donation form on this page, and use Stripe test credit card numbers for the payment information. When a donation is submitted on this form, it should be processed by the PHP middleware and inserted into the staging CiviCRM instance. It should also be visible in the "test" Stripe interface.

Note that it is not possible to test real credit card numbers on sites using the "test" Stripe interface, just like it is not possible to use testing card numbers on sites using the "real" Stripe interface.

The same is true for Paypal: a separate "sandbox" application is created for testing purposes, and a test user is created and attached to that application for the sake of testing. Said user is able to make both one-time and recurring transactions, and the states of those transactions are visible in the "sandbox" Paypal interface. As with Stripe, it is not possible to make transactions with that fake user outside of that sandbox environment.

The authentication for that fake, sandboxed user should be available in the password store. (TODO: Can someone with access confirm/phrase better?)

NAIVE USER SITE TESTS

| # | What are we proving | Who's Testing? | Start when? | How are we proving it |
|---|---|---|---|---|
| 1 | Basic tire-kicking testing of non-donation pages and links | Tor staff (any) | 27 August | FAQ, Crypto page, header links, footer links; note any nonfunctional link(s) - WRITE INSTRUCTIONS |
| 2 | Ensure test-card transactions are successful - this is a site navigation / design test | Tor staff | 27 August | Make payment with test cards; take screenshot(s) of final result OR anything that looks out of place, noting OS and browser; record transactions in google sheet - MATT WRITES INSTRUCTIONS |

Crypto tests

| # | What are we proving | Who's Testing? | Start when? | How are we proving it |
|---|---|---|---|---|
| 3 | Ensure that QR codes behave as expected when scanned with wallet app | Al, Stephen | ASAP | Someone with a wallet app should scan each QR code and ensure that the correct crypto address for the correct cryptocurrency is populated in the app, in whichever manner is expected - this should not require us to further ensure that the wallet app itself acts as intended, unless that is desired |
| 4 | Post-transaction screen deemed acceptable (and if we have to make one, we make it) | Al, Stephen | ASAP (before Sue's vacation) | Al? makes a transaction, livestreams or screenshots result |
| 5 | Sue confirms that transaction has gone through to Tor wallet | Al, Sue | ASAP | Al/Stephen make a transaction, Sue confirms receipt |

Mock transaction testing

| # | What are we proving | Who's Testing? | Start when? | How are we proving it |
|---|---|---|---|---|
| 6 | Ensure credit card one-time payments are tracked | Matt, Stephen | ~27 August | Make payment with for-testing CC# and conspicuous donor name, then check donation list in CiviCRM |
| 7 | Ensure credit card errors are not tracked | Matt, Stephen | ~27 August | Make payment with for-testing intentionally-error-throwing CC# (4000 0000 0000 0002) and ensure CiviCRM does not receive data. Ideally, ensure event is logged |
| 8 | Ensure Paypal one-time payments are tracked | Matt, Stephen | ~27 August | Make payment with for-testing Paypal account, then check donation list in CiviCRM |
| 9 | Ensure Stripe recurring payments are tracked | Matt, Stephen | ~27 August | Make payment with for-testing CC# and conspicuous donor name, then check donation list in CiviCRM (and ensure type is "recurring") |
| 10 | Ensure Paypal recurring payments are tracked | Matt, Stephen | ~27 August | Make payment with for-testing Paypal account, then check donation list in CiviCRM (and ensure type is "recurring") |

Stripe clock testing

Note: Stripe does not currently allow for clock tests to be performed with preseeded invoice IDs, so it is currently not possible to perform clock tests in a way which maps CiviCRM user data or donation form data to the donation. Successful Stripe clock tests will appear in CiviCRM Staging as anonymous.

| # | What are we proving | Who's Testing? | Start when? | How are we proving it |
|---|---|---|---|---|
| 11 | Ensure future credit card recurring payments are tracked | Matt, Stephen | ~27 August | Set up clock testing suite in Stripe backend with dummy user and for-testing CC# which starts on ~27 June or July, then advance clock forward until it can be rebilled. Observe behavior in CiviCRM (the donation will be anonymous as noted above). |

Stripe and Paypal recurring transaction webhook event testing

| # | What are we proving | Who's Testing? | Start when? | How are we proving it |
|---|---|---|---|---|
| 12 | Ensure future credit card errors are tracked | Matt, Stephen | ~27 August | Trigger relevant webhook event with Stripe testing tools, inspect result as captured by CiviCRM |
| 13 | Ensure future Paypal recurring payments are tracked | Matt, Stephen | ~27 August | Trigger relevant webhook event with Paypal testing tools, inspect result as captured by CiviCRM |
| 14 | Ensure future Paypal errors are tracked | Matt, Stephen | ~27 August | Trigger relevant webhook event with Stripe testing tools, inspect result as captured by CiviCRM |

NEWSLETTER SIGNUP

| # | What are we proving | Who's Testing? | Start when? | How are we proving it |
|---|---|---|---|---|
| 15 | Test standalone subscription form | Matt, Stephen | ~27 August | CiviCRM receives intent to subscribe and generates - and sends - a confirmation email |
| 16 | Test confirmation email link | Matt, Stephen | ~27 August | Donate-staging should show a success/thank-you page; user should be registered as newsletter subscriber in CiviCRM |
| 17 | Test donation form subscription checkbox | Matt, Stephen | ~27 August | Should generate and send confirmation email just like standalone form |
| 18 | Test "newsletter actions" | Matt, Stephen | ~27 August | Should be able to unsub/resub/cancel sub from bespoke endpoints & have change in status reflected in subscriber status in CiviCRM |

POST LAUNCH transaction tests

| # | What are we proving | Who's Testing? | Start when? | How are we proving it |
|---|---|---|---|---|
| 19 | Ensure gift card transactions are successful | Matt, Stephen | 10 September | Make payment with gift card and conspicuous donor name, then check donation list in CiviCRM |
| 20 | Ensure live Paypal transactions are successful | Matt, Stephen | 10 September | Make payments with personal Paypal accounts, then check donation list in CiviCRM |

Here's the test procedure for steps 15-17:

  • https://staging.donate-review.torproject.net/subscribe/ (tor-www / blank)
  • fill in and submit the form
  • Run the Scheduled Job: https://staging.crm.torproject.org/civicrm/admin/joblog?reset=1&jid=23
  • Remove the kill-switch, if necessary: https://staging.crm.torproject.org/civicrm/admin/setting/torcrm
  • View the email sent: https://staging.crm.torproject.org/civicrm/admin/mailreader?limit=20&order=DESC&reset=1
  • Click on the link to confirm
  • Run the Scheduled Job again: https://staging.crm.torproject.org/civicrm/admin/joblog?reset=1&jid=23
  • Find the contact record (search by email), and confirm that the email was added to the "Tor News" group.

Issue checklist

To be copy-pasted in an issue:

TODO: add newsletter testing

This is a summary of the checklist available in the TPA wiki:

Naive user site testing

  • [ ] 1 Basic tire-kicking testing of non-donation pages and links (Tor staff (any))
  • [ ] 2 Donation form testing with test Stripe CC number (Tor staff (any))

BTCPay tests

  • [ ] 3 Ensure that QR codes behave as expected when scanned with wallet app (Al?, Stephen)
  • [ ] 4 Post-transaction screen deemed acceptable (and if we have to make one, we make it) (Al, Stephen)
  • [ ] 5 Someone with Tor wallet access confirms receipt of transaction (Al, Sue)

Mock transaction testing

  • [ ] 6 Ensure credit card one-time payments are tracked (Matt, Stephen)
  • [ ] 7 Ensure credit card errors are not tracked (Matt, Stephen)
  • [ ] 8 Ensure Paypal one-time payments are tracked (Matt, Stephen)
  • [ ] 9 Ensure credit card recurring payments are tracked
  • [ ] 10 Ensure Paypal recurring payments are tracked

Stripe clock testing

Note: Stripe does not currently allow for clock tests to be performed with preseeded invoice IDs, so it is currently not possible to perform clock tests in a way which maps CiviCRM user data or donation form data to the donation. Successful Stripe clock tests will appear in CiviCRM Staging as anonymous.

  • [ ] 11 Ensure future credit card recurring payments are tracked

Stripe and Paypal recurring transaction webhook event testing

Neither Stripe nor Paypal allow for proper testing against recurring payments failing billing, and Paypal itself doesn't even allow for proper testing of recurring payments as Stripe does above. Therefore, we rely on a combination of manual webhook event generation - which won't allow us to map CiviCRM user data or donation form data to the donation, but which will allow for anonymous donation events to be captured in CiviCRM - and unit testing, both in donate-neo and civicrm.

  • [ ] 12 Ensure future credit card errors are tracked
  • [ ] 13 Ensure future Paypal recurring payments are tracked
  • [ ] 14 Ensure future Paypal errors are tracked

Newsletter infra testing

  • [ ] 15 Test standalone subscription form (Matt, Stephen)
  • [ ] 16 Test confirmation email link (Matt, Stephen)
  • [ ] 17 Test donation form subscription checkbox (Matt, Stephen)
  • [ ] 18 Test "newsletter actions" (Matt, Stephen)

Site goes live

Live transaction testing

  • [ ] 19 Ensure gift card credit card transactions are successful (Matt, Stephen)
  • [ ] 20 Ensure live Paypal transactions are successful (Matt, Stephen)

Pushing to production

If you have to make a change to the donate site, the most reliable way is to follow the normal review apps procedure.

  1. Make a merge request against donate-neo. This will spin up a container and the review app.

  2. Review: once all CI checks pass, test the review app, which can be done in a limited way (e.g. it doesn't have payment processor feedback). Ideally, another developer reviews and approves the merge request.

  3. Merge the branch: that other developer can merge the code once all checks have been done and code looks good.

  4. Test staging: the merge will trigger a deployment to "staging" (https://staging.donate-review.torproject.net/). This can be more extensively tested with actual test credit card numbers (see the full test procedure for major changes).

  5. Deploy to prod: the container built for staging is now ready to be pushed to production. The latest pipeline generated from the merge in step 3 will have a "manual step" (deploy-prod) with a "play" button. Clicking it runs a CI job that tells the production server to pull the new container and reload prod.

For hotfixes, step 2 can be skipped, and the same developer can do all operations.

In theory, it's possible to enter the production container and make changes directly there, but this is strongly discouraged and deliberately not documented here.

How-to

Rotating API tokens

If we feel our API tokens might have been exposed, or if a staff member leaves and we would feel more comfortable replacing those secrets, we need to rotate API tokens. There are two sets to replace: the Stripe keys and the PayPal keys.

Both staging and production sets of Paypal and Stripe API tokens are stored in Trocla on the Puppet server. To rotate them, the general procedure is to generate a new token, add it to Trocla, then run Puppet on either donate-01 (production) or donate-review-01 (staging).
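When rotating, it is easy to paste a key from the wrong environment. The following is a minimal sketch of a sanity check, relying on Stripe's documented key prefixes (sk_, pk_ or rk_ followed by test_ or live_). The function names and the host-to-mode mapping are illustrative assumptions and not part of the Puppet setup; the keys shown are placeholders.

```python
# Hosts come from the text above; the expected modes are an assumption.
EXPECTED_MODE = {"donate-01": "live", "donate-review-01": "test"}


def stripe_key_mode(key: str) -> str:
    """Return 'test' or 'live' based on the Stripe key prefix."""
    parts = key.split("_", 2)
    if (
        len(parts) == 3
        and parts[0] in {"sk", "pk", "rk"}  # secret, publishable, restricted
        and parts[1] in {"test", "live"}
        and parts[2]
    ):
        return parts[1]
    raise ValueError("not a recognizable Stripe API key")


def key_matches_host(key: str, host: str) -> bool:
    """Check that a key's mode is the one expected on the given host."""
    return stripe_key_mode(key) == EXPECTED_MODE[host]


if __name__ == "__main__":
    print(key_matches_host("sk_test_placeholder", "donate-review-01"))
```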

Stripe rotation procedure

Stripe has an excellent roll key procedure. You first need to have a developer account (ask accounting), then head over to the test API keys page to manage API keys used on staging.

PayPal rotation procedure

A similar procedure can be followed for PayPal, but has not been documented thoroughly.

To the best of our knowledge right now, if you log in to the developer dashboard and select "apps & credentials" there should be a section labeled "REST API Apps" which contains the application we're using for the live site - it should have a listing for the client ID and app secret (as well as a separate section somewhere for the sandbox client ID and app secret).

Updating perk data

The perk data is stored in the perks.json file at the root of the project.

Updating the contents of this file should not be done manually as it requires strict synchronization between the tordonate app and CiviCRM.

Instead, the data should be updated first in CiviCRM, then exported using the dedicated JSON export page.

This generated data can directly replace the existing perks.json file.

To do this using the GitLab web interface, follow these instructions:

  • Go to: https://gitlab.torproject.org/tpo/web/donate-neo/-/blob/main/perks.json
  • Click "Edit (single file)"
  • Delete the text (click in the text box, select all, delete)
  • Paste the text copied from CiviCRM
  • Click "Commit changes"
  • Commit message: Adapt the commit message to be a bit more descriptive (e.g. "2025 YEC perks"), and include the issue number if one exists
  • Branch: commit to a new branch, call it something like "yec2025"
  • Check "create a merge request for this change"
  • Then click "commit changes" and continue with the merge-request.

Once the changes are merged, they will be deployed to staging automatically. To deploy the changes to production, after testing, trigger the manual "deploy-prod" CI job.

Pager playbook

High latency

If the site is experiencing high latency, check metrics to look for CPU or I/O contention. Live monitoring (e.g. with htop) might be helpful to track down the cause.

If the app is serving a lot of traffic, gunicorn workers may simply be overwhelmed. In that case, consider increasing the number of workers at least temporarily to see if that helps. See the $gunicorn_workers parameter on the profile::donate Puppet class.
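As a starting point for picking a value, the Gunicorn documentation suggests (2 x cores) + 1 workers. Here is a minimal sketch of that rule of thumb; the function name is hypothetical, and the right value for $gunicorn_workers still depends on what the metrics show.

```python
import os
from typing import Optional


def suggested_workers(cpu_count: Optional[int] = None) -> int:
    """Gunicorn's rule of thumb: (2 x number of cores) + 1."""
    cores = cpu_count or os.cpu_count() or 1
    return 2 * cores + 1


if __name__ == "__main__":
    print(f"suggested gunicorn workers on this host: {suggested_workers()}")
```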

Errors and exceptions

If the application is misbehaving, it's likely an error message or stack trace will be found in the logs. That should provide a clue as to which part of the app is involved in the error, and how to reproduce it.

Stripe card testing

A common problem for non-profits that accept donations via Stripe is "card testing". Card testing is the practice of making small transactions with stolen credit card information to check that the card information is correct and the card is still working. Card testing impacts organizations negatively in several ways: in addition to the bad publicity of taking money from the victims of credit card theft, Stripe will automatically block transactions they deem to be suspicious or fraudulent. Stripe's automated fraud-blocking costs a very small amount of money per blocked transaction, but when tens of thousands of transactions start getting blocked, tens of thousands of dollars can suddenly disappear. It's important for the safety of credit card theft victims and for the safety of the organization to crush card testing as fast as possible.

Most of the techniques used to stop card testing are also antithetical to Tor's mission. The general idea is that the more roadblocks you put in the way of a donation, the more likely it is that card testers will pick someone else to card test. These techniques usually result in blocking users of the Tor network or Tor Browser, either as a primary or side effect:

  • Using Cloudflare
  • Forcing donors to create an account
  • Unusable captchas
  • Proof of work

However, we have identified some techniques that do work, with minimal impact to our legitimate donors.

  • Rate limiting donations
  • Preemptively blocking IP ranges in firewalls
  • Metrics

An example of rate limiting looks something like this: Allow users to make no more than 10 donation attempts in a day. If a user makes 5 failed attempts within 3 minutes, block them for a period of several days to a week. The trick here is to catch malicious users without losing donations from legitimate users who might just be bad at typing in their card details, or might be trying every card they have before they find one that works. This is where metrics and visualization come in handy. If you can establish a pattern, you can find the culprits. For example: the IP range 123.256.0.0/24 is making one attempt per minute, with a 99% failure rate. Now you've established that there's a card testing attack, and you can go into EMERGENCY CARD-TESTING LOCKDOWN MODE, throttling or disabling donations, and blocking IP ranges.
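The example rule at the start of the paragraph above (10 attempts a day, 5 failures in 3 minutes) could be sketched as follows. This is an in-memory illustration only, not how donate-neo implements rate limiting; the class name is hypothetical, clients are assumed to be keyed by IP address, and the 3-day block is an arbitrary pick within "several days to a week".

```python
from collections import defaultdict, deque

DAY = 24 * 3600
MAX_ATTEMPTS_PER_DAY = 10
MAX_FAILURES = 5
FAILURE_WINDOW = 3 * 60   # seconds
BLOCK_DURATION = 3 * DAY  # assumption: "several days to a week"


class DonationRateLimiter:
    def __init__(self):
        self.attempts = defaultdict(deque)  # client -> times of all attempts
        self.failures = defaultdict(deque)  # client -> times of failed attempts
        self.blocked_until = {}             # client -> time the block expires

    def allow(self, client: str, now: float) -> bool:
        """Record an attempt and return whether it may proceed."""
        if self.blocked_until.get(client, 0) > now:
            return False
        attempts = self.attempts[client]
        # Forget attempts that are more than a day old.
        while attempts and attempts[0] <= now - DAY:
            attempts.popleft()
        if len(attempts) >= MAX_ATTEMPTS_PER_DAY:
            return False
        attempts.append(now)
        return True

    def record_failure(self, client: str, now: float) -> None:
        """Record a failed payment; block the client after too many."""
        failures = self.failures[client]
        failures.append(now)
        while failures and failures[0] <= now - FAILURE_WINDOW:
            failures.popleft()
        if len(failures) >= MAX_FAILURES:
            self.blocked_until[client] = now + BLOCK_DURATION
```

Timestamps are passed in explicitly so the thresholds can be tuned and replayed against logged attempts before being enforced.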

Blocking IP ranges is not a silver bullet. The standard is to block all non-residential IP addresses; after all, why would a VPS IP address be donating to the Tor Project? It turns out that some people who like Tor want to donate over the Tor network, and their traffic will most likely be coming from VPS providers - not many people run exit nodes from their residential network. So while blocking all of Digital Ocean is a bad idea, it's less of a bad idea to block individual addresses. Card testers also occasionally use VPS providers that have lax abuse policies, but strict anti-Tor/anti-exit policies; in these situations it's much more acceptable to block an entire AS, since it's extremely unlikely an exit node will get caught in the block.

As mentioned above, metrics are the biggest tool in the fight against card testing. Before you can do anything or even realize that you're being card tested, you'll need metrics. Metrics will let you identify card testers, or even let you know it's time to turn off donations before you get hit with a $10,000 bill from Stripe. Even if your card testing opponents are smart, and use wildly varying IP ranges from different autonomous systems, metrics will show you that you're having abnormally large/expensive amounts of blocked donations.
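To make the kind of pattern described above concrete, here is a minimal sketch that groups donation attempts by network and flags ranges with a high failure rate. The input format, a list of (IP address, succeeded) pairs, and the thresholds are assumptions for illustration; real data would come from logs or a metrics system. The /24 grouping assumes IPv4 addresses.

```python
import ipaddress
from collections import Counter


def failure_rates_by_network(attempts, prefix_len=24):
    """Map each network to (total attempts, failure rate).

    attempts: iterable of (ip_string, succeeded) pairs.
    """
    totals, failures = Counter(), Counter()
    for ip, succeeded in attempts:
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        totals[net] += 1
        if not succeeded:
            failures[net] += 1
    return {net: (totals[net], failures[net] / totals[net]) for net in totals}


def suspicious_networks(attempts, min_attempts=50, min_failure_rate=0.9):
    """Networks with many attempts, nearly all of them failing."""
    rates = failure_rates_by_network(attempts)
    return sorted(
        (net for net, (total, rate) in rates.items()
         if total >= min_attempts and rate >= min_failure_rate),
        key=str,
    )
```

Requiring a minimum number of attempts keeps a single donor who mistyped a card number a few times from being flagged.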

Sometimes, during attacks, log analysis is performed on the ratelimit.log file (below) to ban certain botnets. The block list is maintained in Puppet (modules/profile/files/crm-blocklist.txt) and deployed in /srv/donate.torproject.org/blocklist.txt. That file is hooked into the webserver, which gives a 403 error when an entry is present. A possible improvement to this might be to proactively add IPs to the list once they cross a certain threshold and then redirect users to a 403 page instead of giving a plain error code like this.
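Before adding an entry, it can help to check whether an address is already covered by the list. Here is a minimal sketch, assuming the block list holds one IP address or CIDR range per line with optional # comments; the real file format used by the webserver hook may differ, and both function names are hypothetical.

```python
import ipaddress


def parse_blocklist(text: str):
    """Parse one address or CIDR range per line, ignoring # comments."""
    networks = []
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()
        if entry:
            networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def is_blocked(ip: str, networks) -> bool:
    """Return True if the address falls inside any listed network."""
    addr = ipaddress.ip_address(ip)
    return any(addr.version == net.version and addr in net for net in networks)
```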

donate-neo implements IP rate limiting through django-ratelimit. It should be noted that while this library does allow rate limiting by IP, as well as by various other methods, it has a known limitation wherein information about the particular rate-limiting event is not passed outside of the application core to the handlers of these events - so while it is possible to log or generate metrics from a user hitting the rate limit, those logs and metrics do not have access to why the rate-limit event was fired, or what it fired upon. (The IP address can be scraped from the originating HTTP request, at least.)
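Scraping the IP address from the originating request, as mentioned above, could look like the sketch below. It works on a plain dict shaped like Django's request.META so it can be run on its own. The function name is hypothetical, and whether the X-Forwarded-For header can be trusted depends on the reverse proxy setup, which is an assumption here.

```python
def client_ip(meta: dict, trust_forwarded: bool = False) -> str:
    """Extract the client address from a request.META-like mapping."""
    if trust_forwarded:
        forwarded = meta.get("HTTP_X_FORWARDED_FOR", "")
        if forwarded:
            # With a single trusted proxy, the rightmost entry is the
            # address that connected to that proxy; earlier entries are
            # supplied by the client and can be forged.
            return forwarded.split(",")[-1].strip()
    return meta.get("REMOTE_ADDR", "")


if __name__ == "__main__":
    meta = {
        "REMOTE_ADDR": "127.0.0.1",
        "HTTP_X_FORWARDED_FOR": "203.0.113.7, 198.51.100.4",
    }
    print(client_ip(meta, trust_forwarded=True))
```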

Redis is unreachable from the frontend server

The frontend server depends on being able to contact Redis on the CiviCRM server. Transactions need to interact with Redis in order to complete successfully.

If Redis is unreachable, first check if the VPN is disconnected:

root@donate-01:~# ipsec status
Routed Connections:
civicrm::crm-int-01{1}:  ROUTED, TUNNEL, reqid 1
civicrm::crm-int-01{1}:   49.12.57.139/32 172.30.136.4/32 2a01:4f8:fff0:4f:266:37ff:fe04:d2bd/128 === 172.30.136.1/32 204.8.99.142/32 2620:7:6002:0:266:37ff:fe4d:f883/128
Security Associations (1 up, 0 connecting):
civicrm::crm-int-01[10]: ESTABLISHED 2 hours ago, 49.12.57.139[49.12.57.139]...204.8.99.142[204.8.99.142]
civicrm::crm-int-01{42}:  INSTALLED, TUNNEL, reqid 1, ESP SPIs: c644b828_i cd819116_o
civicrm::crm-int-01{42}:   49.12.57.139/32 172.30.136.4/32 2a01:4f8:fff0:4f:266:37ff:fe04:d2bd/128 === 172.30.136.1/32 204.8.99.142/32 2620:7:6002:0:266:37ff:fe4d:f883/128

If the command shows something other than the status above, then try to reconnect the tunnel:

ipsec up civicrm::crm-int-01

If still unsuccessful, check the output from that command, or logs from strongSwan. See also the IPsec documentation for more troubleshooting tricks.

If the tunnel is up, you can check that you can reach the service from the frontend server. Redis uses a simple text-based protocol over TCP, and there's a PING command you can use to test availability:

echo PING | nc -w 1 crm-int-01-priv 6379
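The same check can be scripted. Here is a minimal sketch in Python that sends an inline PING and looks for the +PONG reply; the function names are hypothetical. Note that a Redis server requiring authentication answers with an error reply instead, which still shows that the service is reachable.

```python
import socket


def is_pong(reply: bytes) -> bool:
    """Redis answers an unauthenticated PING with the line '+PONG'."""
    return reply.startswith(b"+PONG")


def redis_ping(host: str, port: int = 6379, timeout: float = 1.0) -> bool:
    """Return True if the server answers PING, False on any failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")
            return is_pong(sock.recv(64))
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


if __name__ == "__main__":
    print(redis_ping("crm-int-01-priv"))
```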

Or you can try reproducing the blackbox probe directly, with:

curl 'http://localhost:9115/probe?target=crm-int-01-priv:6379&module=redis_banner&debug=true'

If you can't reach the service, check on the CiviCRM server (currently crm-int-01.torproject.org) that the Redis service is correctly running.

Disaster recovery

A disaster, for the donation site, can take two major forms:

  • complete hardware failure or data loss
  • security intrusion or leak

In the event that the production donation server (currently donate-01) or the "review server" (donate-review-01) fails, it must be rebuilt from scratch and restored from backups. See Installation below.

If there's an intrusion on the server, that is a much more severe situation. The machine should immediately be cut off from the network, and a full secrets rotation (Stripe, Paypal) should be started. An audit of the backend CiviCRM server should also be started.

If the Redis server dies, we might lose donations that were being processed at the time, but otherwise it is disposable and data should be recreated as required by the frontend.

Reference

Installation

main donation server

To build a new donation server:

  1. bootstrap a new virtual machine (see new-machine up to Puppet)
  2. add the role: donate parameter to the new machine in hiera-enc on tor-puppet.git
  3. run Puppet on the machine

This will pull the containers.torproject.org/tpo/web/donate-neo/main container image from the GitLab registry and deploy it, along with Apache, TLS certificates and the onion service.

For auto-deployment from GitLab CI to production, the CI variables PROD_DEPLOY_SSH_HOST_KEY (the production server's SSH host key) and PROD_DEPLOY_SSH_PRIVATE_KEY (an SSH key authorized to log in as the tordonate user) must be configured in the project's CI/CD settings.

To set up a new donate-review server:

  1. bootstrap a new virtual machine (see new-machine up to Puppet)
  2. add the role: donate_review parameter to the new machine in tor-puppet-hiera-enc.git
  3. run Puppet on the machine

This should register a new runner in GitLab and start processing jobs.

Upgrades

Most upgrades are performed automatically through Debian packages.

On the staging servers (currently donate-review-01), gitlab-runner is excluded from unattended-upgrades and must be upgraded manually.

The review apps are upgraded when new commits appear in their branch, triggering a rebuild and deployment. Similarly, commits to main are automatically built and deployed to the staging instance.

The production instance is only ever upgraded when a deploy-prod job in the project's pipeline is manually triggered.

SLA

There is no formal SLA for this service, but it's one of the most critical services in our fleet, and outages should probably be prioritized over any other task.

Design and architecture

The donation site is built of two main parts:

  • a Django frontend AKA donate-neo
  • a CiviCRM backend

Those two are interconnected with a Redis server protected by an IPsec tunnel.

The documentation here covers only the frontend, and barely the Redis tunnel.

The frontend is a Django site that's also been called "donate-neo" in the past. Conversely, the old site has been called "donate paleo" as well, to disambiguate the "donate site" name.

The site is deployed with containers run by podman and built in GitLab.

The main donate site is running on a production server (donate-01), where the containers and podman are deployed by Puppet.

The staging site and the development "review apps" run on a separate server (donate-review-01), which is managed by gitlab-runner and driven by GitLab CI.

The Django app is designed to be simple: all it's really doing is some templating, validating a form, implementing the payment vendor APIs, and sending donation information to CiviCRM.

This simplicity is powered, in part, by a dependency injection framework which more straightforwardly allows Django apps to leverage data or methods from parallel apps without constantly instantiating transient instances of those other apps.

Here is a relationship diagram by @stephen outlining this dependency tree:

erDiagram
    Redis ||--|{ CiviCRM : "Redis/Resque DAL"
    CiviCRM ||--|{ "Main app (donation form model & view)": "Perk & minimum-donation data"
    CiviCRM ||--|{ "Stripe app": "Donation-related CRM methods"
    CiviCRM ||--|{ "PayPal app": "Donation-related CRM methods"

Despite this simplicity, donate-neo's final design is more complex than its original thumbnailed design. This is largely due to differences from donate-paleo's implementation of Stripe and PayPal payments, whose requirements have changed and become stricter over time.

In particular, earlier designs for the donate page treated the time-of-transaction result of a donation attempt as canonical. However, both Stripe and PayPal now send webhook messages post-donation intended to serve as the final word on whether a transaction was accepted or rejected. donate-neo therefore requires confirmation of a transaction via webhook before sending donation data to CiviCRM.

Also of note is the way CiviCRM-held perk information and donation minimums are sent to donate-neo. In early design discussions between @mathieu and @kez, this data was intended to be retrieved via straightforward HTTP requests to CiviCRM's API. However, this turned out to be at cross-purposes with the server architecture design, in which communication between the Django server and the CiviCRM server would only occur via IPsec tunnel.

As a result, perk and donation minimum data is exported from CiviCRM and stored in the donate-neo repository as a JSON file. (Note that as of this writing, the raw export of that data by CiviCRM is not valid JSON and must be massaged by hand before donate-neo can read it, see tpo/web/donate-neo#53.)
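Since the raw export may not be valid JSON, it is worth checking the text before committing it. Here is a minimal sketch using Python's standard json module; the function name is hypothetical, and the check only covers JSON syntax, not whether the content matches what donate-neo expects.

```python
import json


def check_perks_json(text: str):
    """Return (data, None) if text is valid JSON, else (None, error)."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as err:
        # lineno and colno point at the first offending character.
        return None, f"line {err.lineno}, column {err.colno}: {err.msg}"


if __name__ == "__main__":
    with open("perks.json", encoding="utf-8") as handle:
        _, error = check_perks_json(handle.read())
    print(error or "perks.json is valid JSON")
```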

Following is a sequence diagram by @stephen describing the donation flow from user-initiated page request to receipt by CiviCRM:

sequenceDiagram
    actor user
    participant donate as donate tpo
    participant pp as payment processor
    participant civi as civicrm
    civi->>donate: Perk data manually pulled
    user->>donate: Visits the donation site
    donate->>user: Responds with a fully-rendered donation form
    pp->>user: Embeds payment interface on page via vendor-hosted JS
    user->>donate: Completes and submits donation form
    donate->>donate: Validates form, creates payment contract with Stripe/PayPal
    donate->>pp: Initiates payment process
    donate->>user: Redirects to donation thank you page
    pp->>donate: Sends webhook confirming results of transaction
    donate->>civi: Submits donation and perk info

Original design

The original sequence diagram built by @kez in January 2023 (tpo/web/donate-static#107) looked like this but shouldn't be considered valid anymore:

sequenceDiagram
    user->>donate.tpo: visits the donation site
    donate.tpo->>civicrm: requests the current perks, and prices
    civicrm->>donate.tpo: stickers: 25, t-shirt: 75...
    donate.tpo->>user: responds with a fully-rendered donation form
    user->>donate.tpo: submits the donation form with stripe/paypal details
    donate.tpo->>donate.tpo: validates form, creates payment contract with stripe/paypal
    donate.tpo->>civicrm: submits donation and perk info
    donate.tpo->>user: redirects to donation thank you page

Another possible implementation was this:

graph TD
    A(user visits donate.tpo)
    A --> B(django backend serves the donation form, with all the active perks)
    B --> C(user submits form)
    C --> D(django frontend creates payment contract with paypal/stripe)
    D --> E(django backend validates form)
    E --> F(django backend passes donation info to civi)
    F --> G(django backend redirects to donation thank you page)
    F --> H(civi gets the donation info from the django backend, and adds it to the civi database without trying to validate the donation amount or perks/swag)

See tpo/web/donate-neo#79 for the task of clarifying those docs.

Review apps

Those are made of three parts:

  • the donate-neo .gitlab-ci.yml file
  • the review-app.conf apache2 configuration file
  • the ci-reviewapp-generate-vhosts script

When a new feature branch is pushed to the project repository, the CI pipeline will build a new container and store it in the project's container registry.

If tests are successful, the pipeline will then run a job on the shell executor to create (or update) a rootless podman container in the gitlab-runner user context. This container is set up to expose its internal port 8000 to a random outside port on the host.

Finally, the ci-reviewapp-generate-vhosts script is executed via sudo. It will inspect all the running review app containers and create a configuration file where each line will instantiate a virtual host macro. These virtual hosts will proxy incoming connections to the appropriate port where the container is listening.
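The generation step described above can be illustrated with the following minimal sketch. It is not the actual ci-reviewapp-generate-vhosts script: the macro name ReviewApp and its two arguments are hypothetical, and the container-to-port mapping is passed in as a dict here rather than obtained by inspecting podman. The "Use" syntax is the one Apache's mod_macro uses to instantiate a macro.

```python
def vhost_lines(containers, macro="ReviewApp",
                domain="donate-review.torproject.net"):
    """Build one macro instantiation line per running review app.

    containers: mapping of review app name -> host port the container
    exposes (port 8000 inside the container, a random port outside).
    """
    lines = [
        f"Use {macro} {name}.{domain} {port}"
        for name, port in sorted(containers.items())
    ]
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    print(vhost_lines({"staging": 32768, "my-branch": 32770}), end="")
```

Regenerating the whole file from the list of running containers means that a stopped review app loses its virtual host on the next run.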

[Diagram: the review app setup, a test and deployment pipeline based on containers.]

A wildcard certificate for *.donate-review.torproject.net is used for all review apps virtual host configurations.

Services

  • apache acts as a reverse proxy for TLS termination and basic authentication
  • podman containers deploy the code, one container per review app
  • gitlab-runner deploys review apps

Storage

Django stores data in an SQLite database, in /home/tordonate/app/db.sqlite3 inside the container. In typical Django fashion, it stores information about user sessions, users, logs, and CAPTCHA tokens.

At present, donate-neo barely leverages Django's database; django-simple-captcha stores the CAPTCHA images it generates there (in captcha_captchastore), and that's all that's kept there beyond what Django creates by default. Site copy is hardcoded into the templates.

donate-neo does leverage the Redis pool, which it shares with CiviCRM, for a handful of transient get-and-set-like operations related to confirming donations and newsletter subscriptions. While this was by design - the intent being to keep all user information as far away from the front end as possible - it is worth mentioning that the Django database layer could also perform this work, if it becomes desirable to keep these operations out of Redis.

Queues

Redis is used as a queue to process transactions from the frontend to the CiviCRM backend. It handles the following types of transactions:

  • One-time donations (successful)
  • Recurring donations (both successful and failed, in order to track when recurring donations lapse)
  • Mailing list subscriptions (essentially middleware between https://newsletter.torproject.org and CiviCRM, so users have a way to click a "confirm subscription" URL without exposing CiviCRM to the open web)
  • Mailing list actions, such as "unsubscribe" and "optout" (acting as middleware, as above, so that newsletters can link to these actions in the footer)

The Redis server runs on the CiviCRM server and is accessed through an IPsec tunnel; see also the Authentication section below. The Django application reimplements the resque queue (originally written in Ruby, ported to PHP by GiantRabbit, and here ported to Python) to pass messages to the CiviCRM backend.
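For reference, a Resque-compatible job is conventionally a JSON object with `class` and `args` keys, pushed onto a Redis list named `resque:queue:<queue>` (with the queue name also recorded in the `resque:queues` set). The sketch below only builds such a message and does not talk to Redis; the queue name, job class and argument fields are hypothetical, and donate-neo's own port may differ in its details.

```python
import json
from typing import Any, Tuple


def resque_message(queue: str, job_class: str, *args: Any) -> Tuple[str, str]:
    """Return the (Redis list key, JSON payload) pair for one Resque job."""
    key = f"resque:queue:{queue}"
    payload = json.dumps({"class": job_class, "args": list(args)})
    return key, payload


# Hypothetical queue name, job class and fields, for illustration only.
key, payload = resque_message(
    "web_donations", "Donation", {"amount": 2500, "currency": "USD"}
)
# With a redis-py client, enqueueing would then be: client.rpush(key, payload)
```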

Both types of donations and mailing list subscriptions are confirmed before they are queued for processing by CiviCRM. In both cases, unconfirmed data notionally bound for CiviCRM is kept temporarily as a key-value pair in Redis. (See Storage above.) The keys for such data are created using information unique to that transaction; payment-specific IDs are generated by payment providers, whereas donate-neo creates its own unique tokens for confirming newsletter subscriptions.

Donations are confirmed via incoming webhook messages from payment providers (see Interfaces below), who must first confirm the validity of the payment method. Webhook messages themselves are validated independently with the payment provider; pertinent data is then retrieved from the message, which includes the aforementioned payment-specific ID used to create the key which the form data has been stored under.

Recurring donations which are being rebilled will generate incoming webhook messages, but they will not pair with any stored form data, so they are passed along to CiviCRM with a recurring_billing_id that CiviCRM uses to group them with a recurring donation series.
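The pairing of webhook messages with stored form data can be sketched as below. A plain dictionary stands in for Redis, and the key prefix, field names and function names are assumptions made for illustration, not donate-neo's actual code.

```python
from typing import Dict, Optional

# Stand-in for the Redis key-value store used for unconfirmed data.
pending: Dict[str, dict] = {}


def store_form_data(payment_id: str, form_data: dict) -> None:
    """Keep unconfirmed form data under a key derived from the payment ID."""
    pending[f"donation:{payment_id}"] = form_data


def on_validated_webhook(payment_id: str,
                         recurring_billing_id: Optional[str] = None) -> dict:
    """Build the message to queue for CiviCRM once a webhook is validated."""
    form_data = pending.pop(f"donation:{payment_id}", None)
    if form_data is not None:
        # First payment: pair the webhook with the stored form data.
        return {"payment_id": payment_id, **form_data}
    # Rebilled recurring donation: there is no stored form data to pair
    # with, so pass along the ID CiviCRM uses to group the series.
    return {"payment_id": payment_id,
            "recurring_billing_id": recurring_billing_id}
```

Removing the stored entry as soon as it is paired keeps the unconfirmed data as short-lived as possible, in line with the goal of holding little user information on the front end.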

Recurring PayPal donations first made on donate-paleo also issue legacy IPN messages, and have a separate handler and validator from webhooks, but contain data conforming to the Resque handler and so are passed to CiviCRM and processed in the same manner.

Confirming mailing list subscriptions works similarly to confirming donations, but we also coordinate the confirmation process ourselves. Donors who check the "subscribe me!" box in the donation form generate an initial "newsletter subscription requested" message (bearing the subscriber's email address and a unique token), which is promptly queued as a Resque message; upon receipt, CiviCRM generates a simple email to that user with a donate-neo URL (containing said token) for them to click.
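A minimal sketch of the token handling, assuming a random URL-safe token and single-use confirmation (both assumptions; the source only says the token is unique). A dictionary again stands in for Redis, and the function names are hypothetical.

```python
import secrets
from typing import Dict, Optional

# Stand-in for Redis: confirmation token -> email address.
unconfirmed: Dict[str, str] = {}


def request_subscription(email: str) -> str:
    """Store the address under a fresh random token and return the token."""
    token = secrets.token_urlsafe(32)
    unconfirmed[token] = email
    return token


def confirm_subscription(token: str) -> Optional[str]:
    """Return the email for a valid token (consuming it), else None."""
    return unconfirmed.pop(token, None)
```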

Mailing list actions have query parameters added to the URL by CiviCRM which donate-neo checks for and passes along; those query parameters and their values act as their own form of validation (which is CiviCRM-y, and therefore outside of the purview of this writeup).

Interfaces

Most of the interactions with donate happen over HTTP. Payment providers ping back the site with webhook endpoints (and, in the case of legacy donate-paleo NVP/SOAP API recurring payments, a PayPal-specific "IPN" endpoint) which have to bypass CSRF protections.

The views handling these endpoints are designed to only reply with HTTP status codes (200 or 400). If the message is legitimate but was malformed for some reason, the payment providers have enough context to know to try resending the message; in other cases, we keep from leaking any useful data to nosy URL-prodders.
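The status-only behaviour can be illustrated with a framework-free sketch. The `is_authentic` callback is a hypothetical stand-in for the provider-specific validation, and the real views are Django views rather than a bare function.

```python
import json
from typing import Callable


def webhook_status(body: bytes, is_authentic: Callable[[bytes], bool]) -> int:
    """Return only an HTTP status code for an incoming webhook message."""
    if not is_authentic(body):
        return 400  # no detail is given to the sender
    try:
        json.loads(body)
    except ValueError:
        return 400  # malformed: a legitimate provider can resend
    return 200
```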

Authentication

donate-neo does not leverage the Django admin interface, and the /admin path has been excluded from the list of paths in tordonate.url; there is therefore no front-end user authentication at all, whether for users or administrators.

The public has access to the donate Django app, but not the backend CiviCRM server. The app and the CiviCRM server talk to each other through a Redis instance, accessible only through an IPsec tunnel (as a 172.16/12 private IP address).

In order to receive contribution data and provide endpoints reachable by Stripe/PayPal, the Django server is configured to receive those requests and pass specific messages using Redis over a secure tunnel to the CRM server.

Both servers have firewalled SSH servers (rules defined in Puppet, profile::civicrm). To get access to the port, ask TPA.

CAPTCHAs

There are two separate CAPTCHA systems in place on the donation form:

  • django-simple-captcha, a four-character text CAPTCHA which sits in the form just above the Stripe or Paypal interface and submit button. It integrates with Django's forms natively and failing to fill it out properly will invalidate the form submission even if all other fields are correct. It has an <audio> player just below the image and text field, to assist those who might have trouble reading the characters. CAPTCHA images and audio are generated on the fly and stored in the Django database (and they are the only things used by donate-neo which are so stored).
  • altcha, a challenge-based CAPTCHA in the style of Google reCAPTCHA or Cloudflare Turnstile. When a user interacts with the donation form, the ALTCHA widget makes a request to /challenge/ and receives a proof-of-work challenge (detailed here, in the ALTCHA documentation). Once done, it passes its result to /verifychallenge/, and the server confirms that the challenge is correct (and that its embedded timestamp isn't too old). If correct, the widget calls the Stripe SDK function which embeds the credit card payment form. We re-validate the proof-of-work challenge when the user attempts to submit the donation form as well; it is not sufficient to simply brute force one's way past the ALTCHA via malicious Javascript, as passing that re-validation is necessary for the donate-neo backend to return the donation-specific client secret, which itself is necessary for the Stripe transaction to be made.
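The proof-of-work exchange can be sketched as follows. This is a simplified illustration of an ALTCHA-style scheme (SHA-256 over salt plus a secret number, with an HMAC signature so the server can recognise its own challenges), not donate-neo's code: the key, the size of the search space and the function names are assumptions, and the timestamp check mentioned above is left out.

```python
import hashlib
import hmac
import secrets
from typing import Optional

HMAC_KEY = b"server-side-secret"  # hypothetical; kept private on the server
MAX_NUMBER = 50_000               # assumed upper bound of the search space


def create_challenge() -> dict:
    """Server side: hide a random number behind a salted SHA-256 hash."""
    salt = secrets.token_hex(12)
    number = secrets.randbelow(MAX_NUMBER + 1)
    challenge = hashlib.sha256(f"{salt}{number}".encode()).hexdigest()
    signature = hmac.new(HMAC_KEY, challenge.encode(), hashlib.sha256).hexdigest()
    return {"salt": salt, "challenge": challenge, "signature": signature}


def solve(salt: str, challenge: str) -> Optional[int]:
    """Client side: brute-force the number (this is the 'work')."""
    for n in range(MAX_NUMBER + 1):
        if hashlib.sha256(f"{salt}{n}".encode()).hexdigest() == challenge:
            return n
    return None


def verify(salt: str, challenge: str, signature: str, number: int) -> bool:
    """Server side: check the solution and that we issued the challenge."""
    expected = hashlib.sha256(f"{salt}{number}".encode()).hexdigest()
    expected_sig = hmac.new(HMAC_KEY, challenge.encode(), hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, challenge)
            and hmac.compare_digest(expected_sig, signature))
```

Because verification is stateless, the same check can be repeated when the form is finally submitted, which matches the re-validation step described above.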

django-simple-captcha works well to prevent automated form submission regardless of payment processor, whereas altcha's role is more specifically to prevent automated card testing using the open Stripe form; their roles overlap but including only one or the other would not be sufficient protection against everything that was being thrown at the old donate site.

review apps

The donate-review runner uses token authentication to pick up jobs from GitLab. To access the review apps, HTTP basic authentication is required to prevent passers-by from stumbling onto the review apps and to keep indexing bots at bay. The username is tor-www and the password is blank.
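For scripted checks against a review app, the basic authentication header can be built by hand. This is a small sketch using only the standard library; the helper name is hypothetical.

```python
import base64
import urllib.request


def review_app_request(url: str, username: str = "tor-www",
                       password: str = "") -> urllib.request.Request:
    """Build a request carrying HTTP basic authentication credentials."""
    credentials = f"{username}:{password}".encode()
    token = base64.b64encode(credentials).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


req = review_app_request("https://staging.donate-review.torproject.net/")
# urllib.request.urlopen(req) would then perform the actual request.
```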

The Django-based review apps don't handle authentication, as there are no management users created by the app deployed from feature branches.

The staging instance deployed from main does have a superuser with access to the management interface. Since the staging instance database is persistent, it's only necessary to create the user account once, manually. The command to do this is:

podman exec --interactive --tty donate-neo_main poetry run ./manage.py createsuperuser

Implementation

Donate is implemented using Django, version 4.2.13 at the time of writing (2024-08-22). A relatively small number of dependencies are documented in the pyproject.toml file and the latest poetry.lock file contains actual versions currently deployed.

Poetry is used to manage dependencies and builds. The frontend CSS / JS code is managed with NPM. The README file has more information about the development setup.

See mainly the CiviCRM server, which provides the backend for this service, handling perks, memberships and mailings.

Issues

File or search for issues in the donate-neo repository.

Maintainer

Mostly TPA (especially for the review apps and production server). A consultant (see upstream below) developed the site but maintenance is performed by TPA.

Users

Anyone donating to the Tor Project through the main website is bound to use the donate site.

Upstream

Django should probably be considered the upstream here. According to Wikipedia, Django "is a free and open-source, Python-based web framework that runs on a web server. It follows the model–template–views (MTV) architectural pattern. It is maintained by the Django Software Foundation (DSF), an independent organization established in the US as a 501(c)(3) non-profit. Some well-known sites that use Django include Instagram, Mozilla, Disqus, Bitbucket, Nextdoor and Clubhouse."

LTS releases are supported for "typically 3 years", see their release process for more background.

Support mostly happens over the community section of the main website, and through Discord, a forum, and GitHub issues.

We had a consultant (stephen) who did a lot of the work on developing the Django app after @kez had gone.

Monitoring and metrics

The donate site is monitored from Prometheus, both at the system level (normal metrics like disk, CPU, memory, etc) and at the application level.

There are a couple of alerts set in the Alertmanager, all "warning", that will pop alerts on IRC if problems come up with the service. All of them have playbooks that link to the pager playbook section here.

The donate neo donations dashboard is the main view of the service in Grafana. It shows the state of the CiviCRM kill switch, transaction rates, errors, the rate limiter, and exception counts. It also has an excerpt of system-level metrics from related servers to draw correlations if there are issues with the service.

There are also links, on the top-right, to Django-specific dashboards that can be used to diagnose performance issues.

Also note that the CiviCRM side of things has its own metrics, see the CiviCRM monitoring and metrics documentation.

Tests

To test donations after upgrades or to confirm everything works, see the Testing the donation site section.

The site's test suite is run in GitLab CI when a merge request is sent, and a full review app is set up to test the site before the branch is merged. Then staging must be tested as well.

The pytest test suite can be run by entering a poetry shell and running:

coverage run manage.py test

This assumes a local development setup with Poetry, see the project's README file for details.

Code is linted with flake8 and type-checked with mypy, and test coverage is measured with coverage.

Logs

The logs may be accessed using the podman logs <container> command, as the user running the container. For the review apps, that user is gitlab-runner while for production, the user is tordonate.

Example command for staging:

sudo -u gitlab-runner -- sh -c "cd ~; podman logs --timestamps donate-neo_staging"

Example command on production:

sudo -u tordonate -- sh -c "cd ~; podman logs --timestamps donate"

On production, the logs are also available in the systemd journal, in the user's context.

Backups

This service has no special backup needs. In particular, all of the donate-review instances are ephemeral, and a new system can be bootstrapped solely from Puppet.

Other documentation

Discussion

Overview

donate-review was created as part of tpo/web/donate-neo#6, tpo/tpa/team#41108 and refactored as part of tpo/web/donate-neo#21.

Donate-review's purpose is to provide a review app deploy target for donate-neo. Most of the other tpo/web sites are static lektor sites, and can be easily deployed to a review app target as simple static sites fronted by Apache. But because donate-neo is a Django application, it needs a specially-created deploy target for review apps.

No formal proposal (i.e. TPA-RFC) was established to build this service, but a discussion happened for the first prototype.

Here is the pitch @kez wrote to explain the motivation behind rebuilding the site in Django:

donate.tpo is currently implemented as a static lektor site that communicates with a "middleware" backend (tpo/web/donate) via javascript. this is counter-intuitive; why are the frontend and backend kept so separate? if we coupled the frontend and the backend a bit more closely, we could drop most of the javascript (including the javascript needed for payment processing), and we could create a system that doesn't need code changes every time we want to update donation perks

with the current approach, the static mirror system serves static html pages built by lektor. these static pages use javascript to make requests to donate-api.tpo, our "middleware" server written in php. the middleware piece then communicates with our civicrm instance; this middleware -> civicrm communication is fragile, and sometimes silently breaks

now consider a flask or django web application. a user visits donate.tpo, and is served a page by the web application server. when the user submits their donation form, it's processed entirely by the flask/django backend as opposed to the frontend javascript validating the forum and submitting it to paypal/stripe. the web application server could even request the currently active donation perks, instead of a developer having to hack around javascript and lektor every time the donation perks change

of course, this would be a big change to donate, and would require a non-trivial time investment for planning and building a web application like this. i figured step 1 would be to create a ticket, and we can go from there as the donate redesign progresses

The idea of using Django instead of the previous custom PHP code split in multiple components was that a unified application would be more secure and less error-prone. In donate paleo, all of our form validation happened on the frontend. The middleware piece just passed the donation data to CiviCRM and hoped it was correct. CiviCRM seems to drop donations that don't validate, but I wouldn't rely on that to always drop invalid donations (and it did mean we silently lost "incorrect" donations instead of letting the user correct them).

There was a debate between a CiviCRM-only implementation and the value of adding yet another "custom" layer in front of CiviCRM that we would have to maintain seemingly forever. In the end, we ended up keeping the Redis queue as an intermediate with CiviCRM, partly on advice from our CiviCRM consultant.

Security and risk assessment

django

Django has a relatively good security record and a good security team. Our challenge will be mainly to keep it up to date.

production site

The production server is separate from the review apps to isolate it from the GitLab attack surface. It was felt that doing full "continuous deployment" was dangerous, and we require manual deployments and reviews before GitLab-generated code can be deployed in that sensitive environment.

donate-review is a shell executor, which means each CI job is executed with no real sandboxing or containerization. There was an attempt to set up the runner using systemd-nspawn, but it was taking too long and we eventually decided against it.

Currently, project members with Developer permission or above in the donate-neo project may edit the CI configuration to execute arbitrary commands as the gitlab-runner user on the machine. Since these users are all trusted contributors, this should pose no problem. However, care should be taken to ensure no untrusted party is allowed to gain this privilege.

Technical debt and next steps

PII handling and Stripe Radar

donate-neo is severely opinionated about user PII; it attempts to handle it as little as is necessary and discard it as soon as possible. This is at odds with Stripe Radar's fraud detection algorithm, which weights a given transaction as "less fraudulent" the more user PII is attached to it. This clash is compounded by the number of well-intended donors using Tor exit node IPs, some of which bear low reputation scores with Stripe due to bad behavior by prior users. This results in some transactions being rejected due to receiving insufficient signals of legitimacy. See Stripe's docs here and here.

Dependencies chase

The renovate-cron project should be used on the donate-neo codebase to ensure timely upgrades to the staging and production deployments. See tpo/web/donate-neo#46. The upgrades section should be fixed when that is done.

Django upgrades

We are running Django 4.2, released in April 2023, an LTS release supported until April 2026. The upgrade to Django 5 will require carefully reviewing release notes for deprecations and removals, see how to upgrade for details.

The next step here is to make the donate-review service fully generic to allow other web projects with special runtime requirements to deploy review apps in the same manner.

Proposed Solution

No upcoming major changes are currently on the table for this service. As of August 2023, we're launching the site and have our hands full with that.

Other alternatives

A Django app is not the only way this could have gone. Previously, we were using a custom PHP-based implementation of a middleware, fronted by the static mirror infrastructure.

We could also consider using CiviCRM more directly, with a thinner layer in front.

This section describes such alternatives.

CiviCRM-only implementation

In January 2023, during donate-neo's design phase, our CiviCRM consultant suggested looking at a CiviCRM extension called inlay, "a framework to help CiviCRM extension developers embed functionality on external websites".

A similar system is civiproxy, which provides some "bastion host" approach in front of CiviCRM. This approach is particularly interesting because it is actually in use by the Wikimedia Foundation (WMF) to handle requests like "please take me off your mailing list" (see below for more information on the WMF setup).

Civiproxy might eventually replace some parts or all of the Django app, particularly things like newsletter.torproject.org. The project hasn't reached 1.0 yet, and WMF doesn't solely rely on it.

Both of those typically assume some sort of CMS lives in front of the system; in our case that would need to be Lektor or some other static site generator. Otherwise we'd probably be okay staying with the Django design.

WMF implementation

As mentioned above, the Wikimedia Foundation (WMF) also uses CiviCRM to handle donations.

Talking with the #wikimedia-fundraising channel (on irc.libera.chat), anarcat learned that they have a setup relatively similar to ours:

  • their civicrm is not publicly available
  • they have a redis queue to bridge a publicly facing site with the civicrm backend
  • they process donations on the frontend

But they also have differences:

  • their frontend is a wikimedia site (they call it donorwiki, it's https://donate.wikimedia.org/)
  • they extensively use queues to do batch processing as CiviCRM is too slow to process entries, their database is massive, with millions of entries

This mediawiki plugin is what runs on the frontend. An interesting thing with their frontend is that it supports handling multiple currencies. For those who remember this, the foundation got some flak recently for soliciting disproportionate donations from users in "poorer" countries, so this is part of that...

It looks like the bits that process the redis queue on the other end are somewhere in this code that eileen linked me to. This is the CiviCRM extension at least, which presumably contains the code which processes the donations.

They're using Redis now, but were using STOMP before, for what that's worth.

They're looking at using coworker to process queues on the CiviCRM side, but I'm not sure that's relevant for us, given our lower transaction rate. I suspect Tor and WMF have an inverse ratio of foundation vs individual donors, which means we have fewer transactions to process than they do (and we're smaller anyway).

The old donate frontend was retired in tpo/tpa/team#41511.

Services

The old donate site was built on a server named crm-ext-01.torproject.org, AKA crm-ext-01, which ran:

  • software:
    • Apache with PHP FPM
  • sites:
    • donate-api.torproject.org: production donation API middleware
    • staging.donate-api.torproject.org: staging API
    • test.donate-api.torproject.org: testing API
    • api.donate.torproject.org: not live yet
    • staging-api.donate.torproject.org: not live yet
    • test-api.donate.torproject.org: test site to rename the API middleware (see issue 40123)
    • those sites live in /srv/donate.torproject.org

There was also the https://donate.torproject.org static site hosted in our static hosting mirror network. A donation campaign had to be set up both inside the static site and CiviCRM.

Authentication

The https://donate.torproject.org website was built with Lektor like all the other torproject.org static websites. It didn't talk to CiviCRM directly. Instead it talked with the donation API middleware through Javascript, through a React component (available in the donate-static repository). GiantRabbit called that middleware API "slim".

In other words, the donate-api PHP app was the component that allowed communications between the donate.torproject.org site and CiviCRM. The public had access to the donate-api app, but not the backend CiviCRM server. The middleware and the CiviCRM server talked to each other through a Redis instance, accessible only through an IPsec tunnel (as a 172.16/12 private IP address).

In order to receive contribution data and provide endpoints reachable by Stripe/PayPal, the API server is configured to receive those requests and pass specific messages using Redis over a secure tunnel to the CRM server.

Both servers have firewalled SSH servers (rules defined in Puppet, profile::civicrm). To get access to the port, ask TPA.

Once inside SSH, regular users must use sudo to access the tordonate (on the external server) and torcivicrm (on the internal server) accounts, e.g.

crm-ext-01$ sudo -u tordonate git -C /srv/donate.torproject.org/htdocs-stag/ status

Logs

The donate side (on crm-ext-01.torproject.org) uses the Monolog framework for logging. Errors that take place in the production environment are currently sent via email to a Giant Rabbit email address and the Tor Project email address donation-drivers@.

The logging configuration is in: crm-ext-01:/srv/donate.torproject.org/htdocs-prod/src/dependencies.php.

Other CAPTCHAs

Tools like anubis, while targeted more at AI scraping bots, could be (re)used as a PoW system if our existing one doesn't work.