I’ve started with only the database changes, but auth and api changes are on-topic here too; I just didn’t want to wait to post (and this post is long enough already).
Continuing the discussion from Plan for Moving Off Haskell (Part 1): Overview:
Database Changes
The database is broken into two groups. There’s the authentication tables, which contain basically just “users”— what you log in as. Then there’s the crowdmatch
tables, which contain information about pledges. The crowdmatch__patron
table links the two: the crowdmatch tables refer to ids in the patron table, and patrons in that table have a user id (usr
).
I think this split was a good idea; it makes changes to our auth system much less painful. That’s because many of the crowdmatch tables are there for a historical record. The patron id is repeated many times; we don’t want to have to update it everywhere if user ids change. This design means we only need to update it once per patron.
For now, I am only going to focus on changes to the crowdmatch tables, so I’ll omit the crowdmatch__
prefix in the discussion below.
Current status
Our database is created using Yesod’s default database interface, Persistent, based on a definition in Haskell. The current definition (shortened slightly, you can find the full version in Model.hs) looks like this.
Patron
usr Integer -- UNIQUE
created UTCTime
paymentToken PaymentToken Maybe -- denormalized, can be retrieved from PaymentTokenHistory
donationPayable DonationUnits -- denormalized, can be calculated from CrowdmatchHistory + PaymentHistory
pledgeSince UTCTime Maybe -- denormalized, can be calculated from PledgeHistory. Used to check who should be included in the crowdmatch
PledgeHistory
patron PatronId
time UTCTime
action StorageAction -- ENUM('CREATE', 'DELETE')
PaymentTokenHistory
patron PatronId
time HistoryTime
action StorageAction -- ENUM('CREATE', 'DELETE')
DonationHistory
patron PatronId
time HistoryTime
amount DonationUnits
fee Cents
CrowdmatchHistory
patron PatronId
date CrowdmatchDay
amount DonationUnits -- denormalized, can be calculated from PledgeHistory
I can’t say I understand how it works, but the results are basically what you’d guess.
- Every table has an
id bigint NOT NULL
added automatically. A type ofPatronId
means that column has the same type and that there’s a foreign key constraint so you can only insert values that already exist inPatron.id
. - A bunch of the types are newTypes (non-Haskellers: read “aliases”[1]) defined in ModelDataTypes.hs;
UTCTime
andHistoryTime
both translate to SQLtimestamp
. - If there’s no
Maybe
it’sNOT NULL
(non-SQLers: read “not nullable”[2]). -
StorageAction
is actually a regular string (err,character varying
) under the hood, but Haskell enforces that it will only be one of those two values.
Note: The denormalized fields exist because they’d be expensive to recompute based on the history. That’s fine. It would be nice to actually do the re-computation to double-check that everything is right, just before we charge people.
Should-Be-Uncontroversial Changes
Patron
usr Integer
created UTCTime
paymentToken PaymentToken Maybe
donationPayable Cents -- was DonationUnits
pledgeSince UTCTime Maybe
PledgeHistory
patron PatronId
time UTCTime
action StorageAction
PaymentTokenHistory
patron PatronId
time HistoryTime
action StorageAction
PaymentHistory -- was DonationHistory
patron PatronId
time HistoryTime
amount Cents -- was DonationUnits
fee Cents
transactionInfo ???? -- [NEW] Research needed to figure out exactly what to store here
CrowdmatchHistory
patron PatronId
date CrowdmatchDay
runAt UTCTime -- [NEW] Suggestions for a better name are welcome
amount Cents -- was DonationUnits
Rationale for each
The term “Donation” is sometimes confusing – does it mean charging someone or adding to their balance? Clarify it.
-
DonationHistory
→PaymentHistory
PaymentHistory
is strictly a historical record, so we want to store the same units the transaction actualy happened in.
-
PaymentHistory.amount
:DonationUnits
→Cents
We decided to round to the nearest cent at crowdmatch time (see part 1).
-
CrowdmatchHistory.amount
:DonationUnits
→Cents
This also means that outstanding balances cannot contain fractions of a cent.
-
Patron.donationPayable
:DonationUnits
→Cents
We currently run crowdmatches on the 3rd of the month. Anyone who was pledged on the 1st, and is still pledged on the 3rd, is counted as part of that month’s crowdmatches. This gives patrons 2 days to check how much they’ll be charged and drop their pledge before being counted in the crowdmatch. However, it also means we need to know the date that we ran the crowdmatches on, in order to re-calculate the value. So we need to store that date/time.
- NEW:
CrowdmatchHistory.runAt UTCTime
I’m pretty sure Stripe sends back some metadata about the transaction. I’m also pretty sure we want to store it, for auditability. I’m not sure exactly what that data is, though. Maybe the best thing here would be to use json in postgres and just store their whole response, as a blob.
At the very least, though, we probably want to store:
- Whether the transaction failed or succeeded (so we can also log failed transactions)
- Some form of transaction Id (so we can identify it to stripe, even if we didn’t store the full response)
Supporting Different Pledge Types
From the mechanism discussion, it’s clear that there are a lot of possible different variations, and that we might want to try some of them. Once we have multiple projects, we might even want to try different variations at the same time.
A coworker at my day job suggested that json-in-postgres might be a good fit here. Its performance is not as good if you need to query into the json structure, but if you are just retrieving the blob (so that the client can do the calculations), the overhead is pretty reasonable. Here’s what I was thinking.
PledgeHistory
patron PatronId
time UTCTime
action StorageAction
metadataId PledgeMetadataId
PledgeMetadata -- Better name suggestions?
info jsonb
For the current iteration of the mechanism, all entries in PledgeHistory
would have the same metadataId
, and the corresponding PledgeMetadata.info
could look something like this. Remember that we only need to include the parameters we want to vary, not all the data, since the client will knows which algorithm to use based on the version
field.
{
"version": 1,
"capCents": 600
}
If we decided on patron-based matching, we might go with something like this:
{
"version": 2,
"goalPatrons": 6000,
"pledgeCents": 600
}
Or a more flexible/explicit version that can support both patrons and dollars:
{
"version": 2,
"goalUnits": "cents",
"goal": 3600000,
"pledgeCents": 600
}
If we decided to go with patron-chosen goals, there would be a different row in PledgeMetadata
for each Pledge/Goal combination, and the PledgeHistory
entries would not all point to the same one. This would run into scaling issues sooner, but I did some really rough back-of-the-napkin estimates (which I inconveniently forgot to write down) and we’d almost certainly be fine until around 10 projects with 10k patrons each.
Anyway, two decisions left:
-
I’m not 100% decided on doing this; I’d like to get some more eyes on it and make sure there’s not a convincing argument against using json-in-postgres like this.
-
Should
version
be a required field in the json blob? If yes, should it be a column outside the json blob instead?
Supporting Multiple Projects
Although we won’t be supporting multiple projects right away, it is just over the horizon. I do not think we should actually add these columns to the DB now. However, we should consider whether they can easily be added later— to make sure it’s just a matter of a new Project
table and a bunch of ProjectId
s added to existing tables, not any major refactoring the tables that exist.
Patron
usr Integer
created UTCTime
paymentToken PaymentToken Maybe
donationPayable Cents
pledgeSince UTCTime Maybe -- Problem
Project
name String
-- Whatever other info we need here
PledgeHistory
patron PatronId
project ProjectId -- NEW
time UTCTime
action StorageAction
PaymentTokenHistory
patron PatronId
project ProjectId -- NEW
time HistoryTime
action StorageAction
metadataId PledgeMetadataId
PledgeMetadata
info jsonb
PaymentHistory
patron PatronId
time HistoryTime
amount Cents
fee Cents
transactionInfo ????
-- Need to verify that we wouldn't get any additional info to record here
CrowdmatchHistory
patron PatronId
project ProjectId -- NEW
date CrowdmatchDay
runAt UTCTime
amount Cents
Mostly smooth, except pledgeSince
needs to be pulled out into a different table, Pledge
, basically a cache of PledgeHistory
(or, from the other perspective, PledgeHistory
being a log of changes to Pledge
). We could do that now, or we could wait— it’s a relatively small refactor, although it does still involve copying data between tables.
Alternatively, I wonder if we just dropped pledgeSince
, whether we could find a fast-enough way to query PledgeHistory
directly for the most-recent update for some patron
and project
combination, given that time
should be monotonic.