DynamoDB provisioned throughput

Hi, we have an on-prem Datomic system that uses DynamoDB for storage (version 0.9.5786; we aren’t renewing our license any more, and that’s one of the most recent versions we can still use). This is a legacy system that I’m trying to trim down to as small a footprint as possible. We are currently using on-demand pricing mode for DynamoDB, because we haven’t had any luck with autoscaling and we don’t know how to provision throughput.

I’m seeing read capacity usage hover under 10 most of the time, with regular spikes to around 100 (I believe these spikes are our backups, which we run periodically). Write capacity hovers under 20, but spikes every few hours to 1200 (some log analysis seems to point to Datomic flushing indexes). Setting the read capacity to 100 would be fine, and I think it would save some money, but setting the write capacity to 1200 is out of the question. Given these numbers, I’m certain our app is writing more than it reads, which isn’t ideal but might be harder for me to change.

Is there a good way to configure Datomic to have a lighter-weight indexing process? Or is there a DynamoDB autoscaling configuration that can handle this load better? The last time I looked at migrating to Datomic Cloud, the pricing just wasn’t good enough (it would cost the same as or more than our current system), and the migration path still isn’t great.

I realize this isn’t a great way to ask a question, since we aren’t actually paying for Datomic any longer, but again, it’s a legacy system that I want to keep running without it being too expensive.

I’m a Datomic on-prem user.

I was under the impression that Datomic reads the properties of the DDB tables and uses the provisioned capacity without going above it.

Some questions/thoughts

  • are you forcing re-indexing at a quiet time, before the indexing workload becomes huge and capacity-consuming?

  • are you doing gc regularly? (see https://tonsky.me/blog/unofficial-guide-to-datomic-internals/; a minimal gc-storage sketch follows after this list)

  • have you tried restoring the db to a pristine DDB table? I don’t see why it should make a huge difference, but it can make more efficient use of the storage
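
For the gc question, a minimal sketch of what a regular gc-storage run could look like from a REPL, assuming an existing peer connection `conn`; the one-week cutoff is only an example, pick whatever matches your setup:

    (require '[datomic.api :as d])

    ;; gc-storage takes the connection and an instant cutoff; garbage
    ;; segments older than the cutoff get deleted from storage.
    (let [week-ago (java.util.Date.
                     (- (System/currentTimeMillis)
                        (* 1000 60 60 24 7)))]
      (d/gc-storage conn week-ago))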

/Linus

On Sat, 16 Jan 2021 at 01:34, Casey Marshall via Datomic Developers <cognitect@discoursemail.com> wrote:

I’m not sure how Datomic handles DynamoDB capacity. Anecdotally, it doesn’t do well when it starts breaching capacity: I had tried enabling auto-scaling for our DynamoDB tables, and it did not scale up fast enough and started failing, causing user-visible errors. This might have been fixed in newer versions of Datomic, but I don’t recall seeing any announcements to that effect.

We are not forcing re-indexing (I’m not sure how to do that; I wasn’t aware it was possible).

We do gc-storage regularly (currently, weekly). I’m not sure what a good cadence is for that.

Restoring to a new table might be a good option; if nothing else, it would swap out the pretty long history we currently have. I still don’t have a great way to do that kind of migration without downtime, though.

Thanks.

To avoid much of the downtime you would need to (see the rough command sketch after the list):

  1. take a backup
  2. restore that backup (this will take a long time)
  3. put the system in read-only mode
  4. take another backup
  5. restore the rest of the backup (this is much faster than step 2)
  6. make the system read-write against the new db
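
Roughly, the commands behind steps 1 to 5 look like this (the URIs are placeholders; I’m shelling out from Clojure for illustration, run from the Datomic directory, but you’d typically just call bin/datomic from a script):

    (require '[clojure.java.shell :refer [sh]])

    (def backup-uri "file:/backups/my-db")                      ;; or an s3:// backup URI
    (def old-db-uri "datomic:ddb://us-east-1/old-table/my-db")  ;; placeholder
    (def new-db-uri "datomic:ddb://us-east-1/new-table/my-db")  ;; placeholder

    ;; Steps 1 and 4: backups into the same backup URI are incremental,
    ;; so the second run only copies segments written since the first.
    (sh "bin/datomic" "backup-db" old-db-uri backup-uri)

    ;; Steps 2 and 5: restoring the same backup into the same target is
    ;; also incremental, which is why step 5 is much faster than step 2.
    (sh "bin/datomic" "restore-db" backup-uri new-db-uri)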

To be able to switch systems without any downtime, you would need the new system to listen to the old system’s writes until it takes over with a fully updated db, or to mirror the transactions from the old system’s sessions while also creating new ones in the new db. That is, assuming the system writes straight to the database without any queues or similar in between the transactions.
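
For the “listen to the old system’s writes” part, the building block on the Datomic side is the transaction report queue. A very rough sketch, assuming an existing connection to the old system; `replay-tx!` is hypothetical, and mapping entity ids between the two databases is the genuinely hard part this leaves out:

    (require '[datomic.api :as d])

    (defn mirror-transactions!
      "Consume the old connection's tx-report-queue and hand each
      transaction's datoms to replay-tx! (hypothetical) for the new db."
      [old-conn replay-tx!]
      (let [queue (d/tx-report-queue old-conn)]
        (future
          (loop []
            ;; each report is a map containing :tx-data (the datoms)
            (let [{:keys [tx-data]} (.take queue)]
              (replay-tx! tx-data)
              (recur))))))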

/Linus

On Tue, 19 Jan 2021 at 04:12, Casey Marshall via Datomic Developers <cognitect@discoursemail.com> wrote:

Switch the storage to PostgreSQL (hosted on a VM with local NVMe); there you won’t be restricted or limited by read/write capacity.

I sympathize: we use dynamo with a large datomic db (> 8 billion datoms), and provisioning is a struggle.

Datomic does not automatically adjust to DynamoDB capacity beyond naive backoff when Dynamo starts rejecting requests. I’m not sure it does write prioritization either. If you exceed provisioned capacity, the database will eventually fail because it cannot write its latest transaction log or even its heartbeat.

Notes on adjusting Datomic for use with DynamoDB are in the Capacity Planning section of the Datomic docs.

The spiky read pattern is backups; the spiky write pattern is indexing and garbage collection. Indexing produces garbage.

You can flatten out your read capacity by:

  • running backups more often, even continuously in a loop (see the sketch after this list). This may write more garbage to your backup depending on your index rate, so your backups may become larger. The only way to eliminate this is to periodically perform a fresh “from scratch” backup into a new storage directory.
  • lowering datomic.readConcurrency (I think) so it reads from Dynamo more slowly.
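
A trivial sketch of the “backups in a loop” idea (URIs are placeholders, and as noted above, backups into the same backup URI are incremental):

    (require '[clojure.java.shell :refer [sh]])

    (defn backup-forever!
      "Run incremental backups back to back, pausing between passes.
      Each pass mostly copies segments written since the previous one."
      [db-uri backup-uri pause-ms]
      (loop []
        (sh "bin/datomic" "backup-db" db-uri backup-uri)
        (Thread/sleep pause-ms)
        (recur)))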

You can flatten out your write capacity by:

  • collecting garbage more often, or never collecting it at all!
  • lowering datomic.writeConcurrency. Do this before any of the steps below…
  • indexing more often. Look at the graph of the Datomic MemoryIndexMB metric. It should show a sawtooth pattern: the memory index grows until it hits the memory-index-threshold transactor setting, then there’s a cliff when the index is written. If the slope is consistent, you can adjust the threshold down so that flushes occur more often and are therefore smaller. This will generate more garbage, though.
  • indexing manually. You could “plan” indexes by temporarily increasing provisioned capacity, running datomic.api/request-index, then lowering it when it’s done (a sketch of this follows after the list). In combination with memory-index-threshold adjustments and knowledge of historical MemoryIndexMB behavior you can more or less ensure that you won’t reach the threshold until you plan to.
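
A sketch of such a planned index run from a REPL, assuming `conn` is an existing peer connection and the Cognitect aws-api library is available; the table name and capacity numbers are placeholders, not recommendations:

    (require '[cognitect.aws.client.api :as aws]  ;; com.cognitect.aws/api + dynamodb deps
             '[datomic.api :as d])

    (def ddb (aws/client {:api :dynamodb}))

    (defn set-write-capacity!
      "Set provisioned throughput on the Datomic table (name and read
      capacity are placeholders). UpdateTable returns immediately; the
      new capacity takes effect shortly afterwards."
      [wcu]
      (aws/invoke ddb {:op :UpdateTable
                       :request {:TableName "my-datomic-table"
                                 :ProvisionedThroughput
                                 {:ReadCapacityUnits  100
                                  :WriteCapacityUnits wcu}}}))

    ;; Raise capacity, force an index, wait for it to finish, lower capacity.
    (set-write-capacity! 1200)
    (d/request-index conn)
    @(d/sync-index conn (d/basis-t (d/db conn)))
    (set-write-capacity! 100)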

Consider moving to an unmetered storage, or some existing SQL storage you have lying around. Datomic uses storage as a pretty dumb key-value store, so this will put nearly no query load on your database, but it will read and write large blobs. There’s not much special about Datomic on DynamoDB vs. other storages, and in fact there’s a lot that isn’t a great fit (e.g. storing binary blobs as strangely-encoded strings, having to split large segments into multiple DynamoDB keys, being relatively expensive, and having very spiky write patterns).
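
If you do go the SQL-storage route, connecting is just a different URI. A sketch with made-up host, database and credentials; the db itself would be created by restoring your backup into this URI:

    (require '[datomic.api :as d])

    ;; Placeholder JDBC details. The datomic_kvs table must exist first
    ;; (Datomic ships SQL setup scripts for that), and the db would be
    ;; populated by a bin/datomic restore-db into this URI.
    (def sql-db-uri
      (str "datomic:sql://my-db"
           "?jdbc:postgresql://db-host:5432/datomic"
           "?user=datomic&password=datomic"))

    (def conn (d/connect sql-db-uri))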
