General Product Questions

A few years back I did some prototype work with Datomic and even wrote a SQL-like-syntax-to-Datomic compiler, but after spending a few months reviewing it I decided against Datomic (some of the reasons behind the decision were specific to my product and architecture, so I’m not going to list out the details). Now, with a cloud offering, I am taking a second look. Also note I haven’t used Amazon’s products (and don’t have an AWS account), so please bear with me; some of these questions could be noobish.

  1. When I review your Amazon store page I see two Fulfillment Options, Solo and Production. After switching between Solo and Production, the pricing options that follow do not change and always show “t2.small Vendor Recommended” (as the default option). I then reviewed the schematics (labeled “CloudFormation templates”) and can only see two distinct differences: (1) a load balancer has been added in Production and (2) CloudWatch has been displaced. I assume the latter is meaningless. The Datomic docs state that Production creates “two or more i3.large Nodes” (source: https://docs.datomic.com/cloud/whatis/architecture.html#production). Why does the AWS page show “t2.small Vendor Recommended” for Production and not “i3.large Nodes”? I ask because I’m trying to get a sense of cost for a base system, but I am not getting that.

edit: Just saw a previous post asking the same question, where I got my answer. Probably worth fixing, though, as I’m sure it will turn off quite a few people.

P.S. You have ‘web applications’ listed twice in your AWS ‘product overview’.

  2. I wrote some code to manage sorting and paging across large data sets, but that work assumed the app lived on the peer, where data from queries was being cached and dynamically updated. With this new architecture I would be pulling much more data across the network (potentially outside the AWS VPC) and losing the benefit of data locality. Is there (or are you considering) some mechanism for temporary caching of query results on the nodes?

  3. In the past you had an option for memcached integration. It looks like this is not part of the Cloud offering. Your Production topology diagram shows the tx nodes use an SSD cache, but the diagram does not show the same for read nodes. The description for “Node” lists “Low-latency caching” as a feature. Can you provide any details on how caching for reads/queries works? i.e., is there in-memory caching on read nodes, or is this just AWS EFS caching?

Non Cloud questions:

  1. Datomic recommends using Datomic squuids for ids exposed to application users. Personally, I don’t think it’s reasonable to recommend using a proprietary algorithm to harden one’s data. If, at some point, people need to move away from Datomic they are going to have a real mess on their hands. Note - I’ve seen a few other libraries and blog posts about creating squuids, but it’s still not the same library, and I don’t want additional layers of complexity added to the system. Can you speak to this? A reasonable solution would be to open source squuids, but I’m assuming you, at least currently, don’t have plans to.

  2. Side question: Clojure specifies that keywords cannot contain dots (see https://clojure.org/reference/reader), yet Datomic is using them. Is this a case of Datomic being non-conforming or of Clojure’s documentation being out of date?

edit: Or is this just the reader spec vs. the keyword spec? (Quick REPL check below.)
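
For what it’s worth, here’s what I’m seeing at the REPL (observed behavior, not a spec ruling; the last keyword is just made up):

    ;; The reader accepts dots in the namespace segment of a keyword,
    ;; which is how Datomic idents like :db.type/string are written:
    (namespace :db.type/string) ;=> "db.type"
    (name      :db.type/string) ;=> "string"

    ;; And the keyword function does no validation at all, so
    ;; programmatically-built keywords can contain dots anywhere:
    (keyword "my.app.order" "line.item") ;=> :my.app.order/line.item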

Thank You.

Hi Tim,

Thanks for the great questions and feedback. Taking them in order:

  1. We are aware of the problems on the AWS Marketplace page and are working with AWS to improve it. In the meantime, happy to help out here!
  2. Recent query results are stored in an LRU cache until you consume them.
  3. The caching story gets substantially better in Cloud with the introduction of valcache and the EFS cache. We have updated the docs to cover this.
  4. Squuids are not really needed now that Datomic has adaptive indexing. That said, the code is in the Clojure cookbook if you want it.
  5. Not sure, let me get back to you.

Thanks.

After reading more of the docs I’m realizing Datomic has changed significantly since I last looked (prior to the ‘pull’ API). I don’t remember ‘limit’, ‘offset’, or async ‘chunk’ options existing (though I may have missed them).
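
For example, if I’m reading the docs right, the arg-map form of the client API’s q would look roughly like this (just a sketch; :inv/sku, the connection, and the page sizes are made up):

    (require '[datomic.client.api :as d])

    ;; db is assumed to be an existing database value, e.g. (d/db conn).
    ;; Page 3 of a result set, 100 rows per page.
    (d/q {:query  '[:find ?e ?sku
                    :where [?e :inv/sku ?sku]]
          :args   [db]
          :offset 200
          :limit  100})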

Re point 2: Am I right to assume the LRU cache is not something I can manage… i.e., there’s no way to send a query and obtain a query reference key to then utilize the results in a downstream query call? Note that previously I was using a sync query call to obtain ids, then dividing them into chunks and running a query for each chunk of ids to obtain the EAVs for many sortable attributes/values. I would then sort each result set and finally do a ‘merge sort’ to process them all together. All of which was done lazily to ensure the data being processed could reside in available memory. Again, this was just prototype work from years ago where the app was the peer.
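
Roughly, that old peer-side pattern looked something like this (a from-memory sketch with made-up attribute names, not the actual prototype code):

    (require '[datomic.api :as d])

    (defn merge-sorted
      "Lazily merge individually sorted seqs into one sorted seq, by keyf."
      [keyf seqs]
      (lazy-seq
        (let [seqs (->> seqs (map seq) (remove nil?))]
          (when (seq seqs)
            (let [best (apply min-key (comp keyf first) seqs)]
              (cons (first best)
                    (merge-sorted keyf
                                  (map #(if (identical? % best) (rest %) %)
                                       seqs))))))))

    (defn names-sorted-by-chunk
      "Sketch: fetch ids, query each chunk for a sortable attribute,
       sort per chunk, then lazily merge the sorted chunks."
      [db chunk-size]
      (->> (d/q '[:find [?e ...] :where [?e :person/name]] db)
           (partition-all chunk-size)
           (map (fn [chunk]
                  (->> (d/q '[:find ?e ?name
                              :in $ [?e ...]
                              :where [?e :person/name ?name]]
                            db chunk)
                       (sort-by second))))
           (merge-sorted second)))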

All that said, I still need to look a little further, because it looks as though I should be doing a single async call and passing in a transducer fn to handle the work on the node. Anyways… I obviously have some homework to do.
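
If the async API works the way I think it does (a core.async channel yielding result chunks), then consuming it with a transducer on my side would look something like this sketch (attribute made up, anomaly handling omitted, db assumed to exist):

    (require '[clojure.core.async :as a]
             '[datomic.client.api.async :as da])

    ;; Sketch only: assumes each take from the channel is a chunk of tuples.
    (let [ch (da/q {:query '[:find ?e ?sku
                             :where [?e :inv/sku ?sku]]
                    :args  [db]
                    :chunk 1000})]
      ;; `cat` flattens chunks back into individual tuples as they arrive.
      (a/<!! (a/transduce cat conj [] ch)))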

edit: P.S. IMO most of your potential customers are likely experienced in RDBMS systems; examples of prominent features like paging and sorting should be easy to find and clearly defined (in depth) for people with differing backgrounds. Datomic is really, really different. Potential customers need to learn an entirely foreign query language (Datalog) and a fairly complex architecture with variances between options (on-prem vs. cloud). So better guides are where it’s at.

Other ‘how to’ examples I can think of would be: fast auto-complete, full-text search, memory optimization guides, db-connection examples (state management), and best practices for querying/updating ordered data (lists, if you will).

And for what it’s worth, I should be able to read about these things without having to download a dataset and set up an environment. And if there are example apps, they should include the above items (or, as an option, list out the open source apps others have created so we can look at code from working real-world apps). Just my two cents.


Hi Tim,

Thanks for the detailed feedback. You are right, the caches are not under your control in Cloud. That said:

  • I think you might be able to do your entire operation in one go with nested queries (rough sketch below).
  • We are actively looking at ways to let you run your own code on Cloud nodes.
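
Something along these lines (untested sketch with made-up attributes; db is an existing database value) - the result of one query is handed straight to the next as an :in binding:

    (require '[datomic.client.api :as d])

    (let [ids (d/q {:query '[:find [?e ...]
                             :where [?e :order/status :order.status/open]]
                    :args  [db]})]
      (d/q {:query '[:find ?e ?total
                     :in $ [?e ...]
                     :where [?e :order/total ?total]]
            :args  [db ids]}))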

We are continuing to develop examples and will announce them here.

Best,
Stu

Just a note, I don’t necessarily need to run code (outside of the APIs) on the Cloud nodes. I would, however, like to store a value in memory on the cloud node such that I could utilize that value in a downstream query call. Either would solve the data locality problem; I’m just clarifying that if you decide against opening door number one, door number two would be surprisingly helpful.

P.S. Expiry options on that in-memory data would be great if it comes to that.
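
To make the expiry idea concrete, here’s roughly the semantics I’m after, illustrated client-side with org.clojure/core.cache (the query fn is hypothetical; the whole point is I’d want this living on the node):

    (require '[clojure.core.cache.wrapped :as cw])

    ;; Illustration only: a keyed result that simply expires after a TTL,
    ;; so downstream calls can reuse it while it is still fresh.
    (def results (cw/ttl-cache-factory {} :ttl (* 5 60 1000))) ; 5 minutes

    ;; run-expensive-query is a hypothetical fn standing in for the real work.
    (cw/lookup-or-miss results :open-order-ids
                       (fn [_] (run-expensive-query)))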

Thanks.

Jumping on an old thread to point out that squuids are still recommended in the on-prem docs. I only found out by chance on Clojurians Slack that they’re not needed.

https://docs.datomic.com/on-prem/identity.html

Tom,

Thanks for the catch. I’ve updated the on-prem documentation to the following:

Datomic’s indexes are optimized to gracefully handle all UUIDs, so Squuids are not strictly necessary. However, you should still prefer Squuids if your ids may ever be indexed in other, non-Datomic systems.
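
For reference, the idea behind a squuid (a sketch of the construction, not necessarily Datomic’s exact implementation) is to overwrite the most significant 32 bits of a random UUID with the current epoch seconds, so ids created over time stay roughly ordered in any index:

    (import 'java.util.UUID)

    (defn squuid-sketch
      "Sketch of the squuid idea: a random UUID whose most significant 32 bits
       are replaced with the current epoch seconds, so values created over time
       sort roughly in creation order. Not necessarily Datomic's exact code."
      []
      (let [uuid   (UUID/randomUUID)
            secs   (quot (System/currentTimeMillis) 1000)
            hi     (.getMostSignificantBits uuid)
            ;; keep the low 32 bits of the random high half, put secs on top
            new-hi (bit-or (bit-shift-left secs 32)
                           (bit-and hi 0x00000000FFFFFFFF))]
        (UUID. new-hi (.getLeastSignificantBits uuid))))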