This is a description of the critical path for Snowdrift development. I am
going to work on these things in the order presented: secrets storage, backups,
test infrastructure, notifications, product development. This list isn’t set in
stone, but it’s at least set in soft clay(?) so that everyone has an idea what
I’m working on. If you need something that’s not on the list, you might have to
wait a bit! I’ll motivate each item by describing what it will do for us and
drill into its sub-tasks a little bit for completeness.
Something that will raise our bus factor (a good thing) is better management of
operational secrets. These are things like backups, logins, account passphrases,
and certificates. The primary goal is to have a secure location to store secrets
that can be accessed during disaster recovery. Secondarily, we want secrets
stored somewhere that will permit automatic access by systems that have the
proper credentials. The plan for implementation is as follows:
- Prototype an access mechanism — probably using a combination of Vault, PGP,
and SSH keys.
- Implement the access mechanism
- Document the setup
- Implement a NixOps module that instantiates a server with the access mechanism
- Research non-AWS, non-US hosting options for an off-site server.
- Provision an off-site server and install the access mechanism on it
- Populate the secrets
- Distribute access to admins
- Schedule disaster recovery tests (system loss, personnel loss)
Once secret storage is in place, we’ll be able to sigh a collective sigh of relief.
We will also be able to begin developing more automated deployment procedures. A
‘staging’ secrets server could be used to test deployments, for example.
The next critical component for the Snowdrift system is safe, automatic backups.
This will give us actual disaster recovery capability. Losing system data could
significantly set back the project’s adoption and chances for success. Backups
will reduce everyone’s stress levels by mitigating that eventuality.
Backups will be stored as secrets. Thus, access to backups will use the
secrets system designed above. Backups will be automated by creating systemd
timer units. Using systemd requires that the app server be upgraded, which
it desperately needs anyway. Upgrading the server will allow us to flex the
secrets system: all secrets needed to set up the server can be noted and stored.
Ideally, the systemd module would have only append-only push access to the
secret store. The plan:
- Implement systemd timer units for backups; test locally
- Implement a NixOps module that instantiates an app server that includes the
- Provision a new app server and migrate to it
- Document the secrets needed for the app server
- Install a backups timer unit on the aux and discourse servers as well
- Schedule a disaster recovery test (database corruption)
Now we have backups and secrets. The world is our oyster. As a last step before
leaving ops aside for a while, I will update the system architecture diagram,
which is currently a mobile phone photograph of a hand-drawn diagram. (It’s
pretty sweet, tho.)
With the critical ops components in place, coding work can continue.
We need all aspects of the website to be testable. Currently we cannot test
routes that call to Stripe, since there is no way to not actually hit Stripe’s
API. We need to mock out Stripe.com to avoid slow API calls and to test handling
of all possible return values.
This will likely be effected by creating a thin wrapper over the existing
Stripe library, then replacing existing Stripe usage with the new library. Once
complete, all aspects of the website should be testable, and I will spend a
few days expanding our test coverage. I will also give a workshop on testing
methods, particularly as they pertain to our app.
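To make the wrapper idea a bit more concrete, here is a minimal sketch of the pattern. It is written in Python purely for illustration, and every name in it (`PaymentGateway`, `FakeGateway`, `PaymentError`, `process_pledge`) is hypothetical: none of them come from our codebase or from the real Stripe library. The point is only that route logic talks to a small interface, so tests can swap in a fake that never touches the network and can simulate any return value we care about.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Protocol, Tuple


class PaymentError(Exception):
    """Hypothetical error raised when the gateway rejects a charge."""


class PaymentGateway(Protocol):
    """The thin wrapper interface; a real implementation would call Stripe."""

    def charge(self, customer_id: str, amount_cents: int) -> str:
        """Charge a customer and return a charge identifier."""
        ...


@dataclass
class FakeGateway:
    """In-memory stand-in for tests. It makes no network calls."""

    fail_with: Optional[str] = None  # set to simulate a rejected charge
    charges: List[Tuple[str, int]] = field(default_factory=list)

    def charge(self, customer_id: str, amount_cents: int) -> str:
        if self.fail_with is not None:
            raise PaymentError(self.fail_with)
        self.charges.append((customer_id, amount_cents))
        return f"fake_charge_{len(self.charges)}"


def process_pledge(gateway: PaymentGateway, customer_id: str, amount_cents: int) -> str:
    """Example route logic that depends only on the wrapper interface."""
    try:
        charge_id = gateway.charge(customer_id, amount_cents)
    except PaymentError as err:
        return f"payment failed: {err}"
    return f"payment ok: {charge_id}"
```

A test can then build `FakeGateway()` for the happy path or `FakeGateway(fail_with="card_declined")` for a failure, and check both what `process_pledge` returns and what was recorded in `charges`.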
The next critical component for the app is a notification engine. This will
be a subsystem that anything can use to send application notifications to two
locations: the website and user email addresses. We already have code that needs
this subsystem, so we can use those use cases to drive its development. That
will allow us to test, design, implement, and deploy a system that will probably
serve our needs for a good long while.
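As a rough picture of what "a subsystem that anything can use" might look like, here is a small sketch. It is written in Python for illustration only, and the names (`Channel`, `Notification`, `NotificationEngine`) are assumptions, not a design that has been decided. The real shape will come from the existing use cases.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Channel(Enum):
    """The two delivery locations named above."""

    WEBSITE = "website"
    EMAIL = "email"


@dataclass(frozen=True)
class Notification:
    """A hypothetical notification: who it is for and what it says."""

    user_id: int
    message: str


# A sender is any callable that delivers one notification.
Sender = Callable[[Notification], None]


class NotificationEngine:
    """Routes a notification to whichever channels the caller asks for."""

    def __init__(self) -> None:
        self._senders: Dict[Channel, Sender] = {}

    def register(self, channel: Channel, sender: Sender) -> None:
        """Attach the delivery function for one channel."""
        self._senders[channel] = sender

    def notify(self, notification: Notification, channels: List[Channel]) -> None:
        """Deliver the notification once per requested channel."""
        for channel in channels:
            self._senders[channel](notification)
```

In tests, the senders can simply be `list.append` on two lists standing in for the website inbox and the outgoing mail queue, which keeps the engine testable without a mail server.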
The final step of my personal critical path is to delegate to MSiep for the rest
of the path. As product lead, he will decide what steps need to be taken to
develop a product that meets the organization’s needs.
I hope that this big ol’ post is a nice indication that Snowdrift development
isn’t completely stalled. In fact, I have 20 hours a week to devote to Snowdrift
for the next little while, so things should feel downright lively. This is
made even more true thanks to the work that a half-dozen other volunteers
are putting into new features, refactorings, and auxiliary development and
documentation. There should be plenty of work for all of us volunteers, so sign
up at git.snowdrift.coop, sign up on the platform!, and keep interacting here on
aka chreekat, lead dev