Datomic Fulltext Search Equivalent

Hey, I’ve been using Datomic On-Prem for a while now and have been reasonably happy with it, and I’m now looking at the feasibility of running one of our products on Datomic Cloud/Ions. What I haven’t seen is how well it integrates with the existing AWS search options, given that :fulltext is not available in Cloud. I’ve seen the odd mention suggesting CloudSearch, although my understanding is that AWS Elasticsearch is currently considered the better offering.

Which should I use? Are there any good resources on connecting search and Datomic/Ions together? Or is this just a case of picking one and reading the docs?

Anyone able to shed some light here?


I looked at CloudSearch and did some initial prototyping with it. Ultimately, though, CloudSearch seems like a barely maintained legacy service, and I wouldn’t recommend it.

In the end, we decided to set up our own Elasticsearch cluster for fulltext search. Exactly how the two coordinate depends on your use case. Obviously, you’re not going to be able to join on fulltext results in a Datalog query.

But storing an ID on your entities that is shared with a document in Elasticsearch is entirely feasible. Elasticsearch documents are mutable, though, so the promise of immutability ends at the border between the two.
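To make the shared-ID idea concrete, here is a minimal sketch in Python (purely for illustration; in practice this would be Clojure talking to Datomic and an Elasticsearch client). `ENTITIES`, `SEARCH_INDEX` and the naive substring `fulltext_search` are hypothetical stand-ins for a Datomic database and an Elasticsearch query, not either system's API. The point is that the join happens in application code: search returns IDs, which are then resolved against the database, and IDs the index still holds but the database no longer does are skipped, since the two stores can drift apart.

```python
from typing import Dict, List

# Hypothetical Datomic entities, keyed by a shared ID that is also stored
# on the matching Elasticsearch document.
ENTITIES: Dict[str, dict] = {
    "a1": {"article/title": "Intro to Datomic", "article/author": "alice"},
    "a2": {"article/title": "Clojure and Elasticsearch", "article/author": "bob"},
}

# Hypothetical stand-in for the Elasticsearch index: shared ID -> indexed text.
# "a3" simulates a stale document whose entity has since been retracted.
SEARCH_INDEX: Dict[str, str] = {
    "a1": "Intro to Datomic: immutable facts and Datalog",
    "a2": "Syncing Clojure apps to Elasticsearch",
    "a3": "Retracted post about Elasticsearch",
}


def fulltext_search(query: str) -> List[str]:
    """Return shared IDs of matching documents (a naive case-insensitive
    substring match standing in for a real Elasticsearch query)."""
    q = query.lower()
    return [doc_id for doc_id, text in SEARCH_INDEX.items() if q in text.lower()]


def search_entities(query: str) -> List[dict]:
    """Join search hits back to entities in application code, skipping IDs
    the index returned but the database no longer has."""
    return [ENTITIES[doc_id] for doc_id in fulltext_search(query) if doc_id in ENTITIES]
```

For example, `search_entities("elasticsearch")` returns only the `a2` entity, even though the index also matches the stale `a3` document.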

I’ve considered solutions for this, but haven’t come to a conclusion yet.

Thanks @eneroth, that was the sense I was getting.


Was it as simple as setting up an Elasticsearch client, connecting to your cluster and then doing a lookup?

Any issues I should watch out for? And what libs did you end up using, if you don’t mind me asking?

I’m stepping right into @vvvvalvalval’s area here (maybe he wants to weigh in), but I’d say that the complexity of indexing the data correctly again depends on your use case.

For connecting to the cluster, we’re using mpenet/spandex (GitHub), an Elasticsearch client for Clojure (built on the new ES 8.x Java client).

The actual cluster setup is more my area. I wanted to align with Datomic here, so I ended up creating a CloudFormation template to set up all the resources. In hindsight, I wonder if Terraform (https://www.terraform.io) would have been better. (As of Terraform 0.12, it will have better JSON support, which I could then use as a compilation target for the small Clojure DSL I wrote for the CloudFormation template.)

For deployments, we’re using CodeDeploy, like Datomic does. The general idea is to keep the infrastructure immutable: changes are made to the templates rather than directly to the running resources, which means that git functions as a log of the changes that have been made to the created resources.

Also, you should work out a strategy for doing rolling upgrades of the cluster nodes as a matter of routine rather than once in a blue moon.

To quote Elastic on the matter,

ElasticSearch releases new versions with bug fixes and performance enhancements at a very fast pace, and it is always a good idea to keep your cluster current.

Upgrading should be a routine process, rather than a once-yearly fiasco that requires countless hours of precise planning.

I tried to do this using CodeDeploy and CloudFormation, but they have hard limits on how long a template update can take: it fails after an hour, which may not be enough, depending on how much data you’re storing and need to shift between nodes as they are decommissioned and provisioned.

I ended up writing my own tool (in Clojure, of course) that handles the logic of taking nodes down and bringing new ones up, checking the health status of the cluster, and so on.
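The node-replacement logic described above might look roughly like the following Python sketch. It is not the tool mentioned in this post; `provision`, `decommission` and `cluster_health` are hypothetical callbacks you would wire up to your own AWS and Elasticsearch calls (for example, `cluster_health` could return the `status` field of Elasticsearch's `_cluster/health` response, which is `green`, `yellow` or `red`). Each wait has its own timeout, rather than one hard limit over the whole upgrade.

```python
import time
from typing import Callable, List


def wait_for_green(cluster_health: Callable[[], str], timeout_s: float, poll_s: float) -> None:
    """Poll until the cluster reports 'green', or raise after timeout_s seconds."""
    deadline = time.monotonic() + timeout_s
    while cluster_health() != "green":
        if time.monotonic() >= deadline:
            raise TimeoutError("cluster did not return to green in time")
        time.sleep(poll_s)


def rolling_replace(
    nodes: List[str],
    provision: Callable[[], str],
    decommission: Callable[[str], None],
    cluster_health: Callable[[], str],
    timeout_s: float = 3600.0,
    poll_s: float = 5.0,
) -> List[str]:
    """Replace every node one at a time, waiting for a healthy cluster after
    each change. Returns the resulting node list."""
    current = list(nodes)
    for old in nodes:
        new = provision()  # bring up the replacement first, so capacity never drops
        current.append(new)
        wait_for_green(cluster_health, timeout_s, poll_s)
        # The callback decides how to retire the node, e.g. by excluding it from
        # shard allocation and letting its shards relocate before terminating it.
        decommission(old)
        current.remove(old)
        wait_for_green(cluster_health, timeout_s, poll_s)
    return current
```

Bringing the replacement up before retiring the old node keeps the cluster at full size throughout, at the cost of briefly running one extra instance.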


Any preexisting resources you used to do any of this? Or is it mostly custom implementation?

It’s been a while since I ran my own Elasticsearch cluster :)…

I’m trying to get a handle on how much of a maintenance burden I’d be taking on by running this, versus dropping the move to Cloud and instead working out how to improve On-Prem performance.

Personally, I’m really surprised the docs don’t say anything about how to do this beyond noting that :fulltext is not supported. Considering Cloud/Ions is pitched as a more managed experience, I would have thought it would be addressed.

@Folcon in my experience, the fulltext support in On-Prem won’t get you very far (it’s very limited in how you can customize it), and you’re likely to want to offload fulltext search to a more capable engine anyway, so I wouldn’t make that the deciding factor between On-Prem and Cloud. You should rather think about your operational requirements and architecture to decide.

Regardless of whether you use On-Prem or Cloud:

  1. Datomic makes it unusually straightforward to sync derived data to specialized stores such as Elasticsearch, because the Log API makes change detection very easy. A basic but often effective approach is to periodically update the derived store in batches: compute the set of entities affected by recent changes, then (re-)compute the documents for those.
  2. Setting up a production-ready ElasticSearch on AWS is a bit of an investment, so you may want to start with something more managed.
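A minimal sketch of the batch approach in point 1, in Python for illustration. It assumes a hypothetical in-memory model of the log (a list of `(t, datoms)` pairs) rather than Datomic's actual Log API, where you would read the same information with `tx-range`; `db_state` stands in for the database value the documents are built from, and `build_document` is a hypothetical document shape. Only datoms on the indexed attributes count as changes, which also filters out the `:db/txInstant` datom that every transaction carries.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical model of the transaction log: each entry is (t, datoms),
# and each datom is (entity_id, attribute, value, added).
Datom = Tuple[int, str, object, bool]
Tx = Tuple[int, List[Datom]]


def affected_entities(log: Iterable[Tx], since_t: int, attrs: Set[str]) -> Tuple[Set[int], int]:
    """Return entity IDs whose indexed attributes changed after since_t,
    plus the latest t seen (the checkpoint for the next batch)."""
    touched: Set[int] = set()
    last_t = since_t
    for t, datoms in log:
        if t <= since_t:
            continue  # already synced in a previous batch
        touched.update(e for e, a, _v, _added in datoms if a in attrs)
        last_t = max(last_t, t)
    return touched, last_t


def build_document(e: int, entity: dict) -> dict:
    """Hypothetical document shape: the entity ID plus its values as text."""
    return {"id": e, "text": " ".join(str(v) for v in entity.values())}


def sync_batch(log: Iterable[Tx], since_t: int, attrs: Set[str],
               db_state: Dict[int, dict], index: Dict[int, dict]) -> int:
    """(Re-)compute documents for affected entities and return the new checkpoint."""
    touched, last_t = affected_entities(log, since_t, attrs)
    for e in touched:
        if e in db_state:
            index[e] = build_document(e, db_state[e])  # upsert the fresh document
        else:
            index.pop(e, None)  # entity retracted: delete its document
    return last_t
```

The returned `t` is the checkpoint to persist and pass as `since_t` on the next run, so each batch only looks at transactions it has not seen yet.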

Hi @vvvvalvalval,

Thanks for the response, that’s good to know.

With regards to using something more managed, do you have any specific suggestions?

I’ve had to set up an ES cluster on AWS before; it was a bit of a pain to set up, but once deployed it wasn’t too problematic. Are there any pain points you’ve experienced that I might not be aware of?

It would be interesting to hear if anyone has any updates (or new best practices) on this, since we’re close to having to implement something like it alongside Datomic Ions 🙂