Datomic Fulltext Search Equivalent


#1

Hey, I’ve been using Datomic on-prem for a while now and have been reasonably happy with it. I’m now looking at the feasibility of running one of our products on Datomic Cloud/Ions. What I haven’t seen is how well it integrates with the existing AWS search options, since :fulltext is not available. I’ve seen the odd mention suggesting CloudSearch, although my understanding is that AWS Elasticsearch is presently considered the better offering.

Which should I use? Are there any good resources on how to connect search and Datomic/Ions together? Or is this just a case of picking one and reading the docs?

Anyone able to shed some light here?


#2

I looked at CloudSearch, and did some initial prototyping around it. Ultimately, CloudSearch seems like a barely maintained legacy service though, and I wouldn’t recommend it.

In the end, we decided to set up our own Elasticsearch cluster for fulltext search. Exactly how Datomic and Elasticsearch coordinate depends on your use case. Obviously, you’re not going to be able to join on the fulltext results in a Datalog query.

But storing an ID on your entities that is shared with a document in Elasticsearch is of course entirely feasible. Bear in mind that ES documents are mutable, so the promise of immutability ends at the border between the two.

I’ve considered solutions for this, but haven’t come to a conclusion yet.
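To illustrate the shared-ID idea, here is a minimal sketch (in Python only to keep it dependency-free; the same shape carries over to Clojure). The attribute names `article/id`, `article/title` and `article/body` are made up for illustration, and `lookup` is a hypothetical stand-in for whatever Datomic query you run to fetch entities by ID, for example something along the lines of `[:find (pull ?e [*]) :in $ [?id ...] :where [?e :article/id ?id]]`. The only thing taken from Elasticsearch itself is the shape of a search response (`hits.hits[]._id`).

```python
from typing import Callable, Dict, Iterable, List, Tuple


def entity_to_doc(entity: Dict) -> Tuple[str, Dict]:
    """Turn an entity map into (document id, document body) for indexing.

    The document id is the shared ID, so a search hit can be traced back
    to exactly one entity. Attribute names are assumptions.
    """
    doc_id = str(entity["article/id"])
    body = {
        "title": entity.get("article/title", ""),
        "body": entity.get("article/body", ""),
    }
    return doc_id, body


def hit_ids(search_response: Dict) -> List[str]:
    """Pull the document IDs out of a search response, in ranking order."""
    return [hit["_id"] for hit in search_response["hits"]["hits"]]


def resolve(ids: List[str], lookup: Callable[[Iterable[str]], Iterable[Dict]]) -> List[Dict]:
    """Fetch the entities behind the hits, preserving search ranking.

    IDs with no matching entity are skipped: the index can lag behind
    or run ahead of the database, so stale hits are expected.
    """
    found = {str(e["article/id"]): e for e in lookup(ids)}
    return [found[i] for i in ids if i in found]
```

The point of `resolve` is that the search index only decides which IDs match and in what order; everything you actually show the user still comes from the database.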


#3

Thanks @eneroth, that was the sense I was getting.

Was it as simple as setting up an Elasticsearch client, connecting to your cluster and then doing a lookup?

Are there any issues I should watch out for? And which libs did you end up using, if you don’t mind me asking?


#4

I’m stepping right into @vvvvalvalval’s area here (maybe he wants to weigh in), but I’d say that the complexity of indexing the data correctly again depends on your use case.

For connecting to the cluster, we’re using https://github.com/mpenet/spandex.

The actual cluster setup is more my area. I wanted to sort of align with Datomic here, so I ended up creating a CloudFormation template to set up all the resources. In hindsight, I wonder if using https://www.terraform.io would have been better. (As of Terraform 0.12, they will have better JSON support, which I could then use as a compilation target for the super small Clojure DSL I wrote for the CloudFormation template.)

For deployments, we’re using CodeDeploy, like Datomic. The general idea is to keep the infrastructure immutable: changes are made to the templates rather than directly to the running resources, which means that git functions as a log of the changes that have been made to the created resources.

Also, you should look into a strategy for doing rolling upgrades of the cluster nodes as a matter of routine, rather than as a once-in-a-blue-moon event.

To quote Elastic on the matter,

“ElasticSearch releases new versions with bug fixes and performance enhancements at a very fast pace, and it is always a good idea to keep your cluster current.”

“Upgrading should be a routine process, rather than a once-yearly fiasco that requires countless hours of precise planning.”

I tried to do this using CodeDeploy and CloudFormation, but they have hard limits on how long a template upgrade can take. An upgrade fails after an hour, which is not much, depending on how much data you’re storing and need to shift between nodes as they are decommissioned and provisioned.

I ended up writing my own tool (Clojure, of course) which deals with the logic of taking down and bringing up nodes, checking the health status of the cluster, and so on.
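For a rough idea of what such a loop looks like, here is a sketch of the generic rolling-restart procedure (this is not the tool described above, and it is in Python only to keep it dependency-free). It restricts shard allocation to primaries, swaps one node, resets allocation, and waits for the cluster to report green before touching the next node. `request` and `replace_node` are hypothetical callables you would supply yourself: the first performs an HTTP call against the cluster and returns the parsed JSON, the second does whatever stopping, upgrading and starting a node means in your setup. The endpoints `/_cluster/settings` and `/_cluster/health` and the `cluster.routing.allocation.enable` setting are standard Elasticsearch.

```python
import time
from typing import Callable, Dict, Iterable, Optional

# Hypothetical HTTP helper: request(method, path, json_body) -> parsed JSON.
Request = Callable[[str, str, Optional[Dict]], Dict]


def set_allocation(request: Request, mode: Optional[str]) -> None:
    """Set shard allocation; None (JSON null) resets it to the default."""
    request("PUT", "/_cluster/settings",
            {"persistent": {"cluster.routing.allocation.enable": mode}})


def wait_for_green(request: Request, attempts: int = 60, delay_s: float = 10.0,
                   sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll cluster health until it is green, or give up after `attempts`."""
    for _ in range(attempts):
        if request("GET", "/_cluster/health", None).get("status") == "green":
            return
        sleep(delay_s)
    raise TimeoutError("cluster did not return to green")


def rolling_replace(nodes: Iterable[str], request: Request,
                    replace_node: Callable[[str], None], **wait_opts) -> None:
    """Replace nodes one at a time, never touching a degraded cluster."""
    wait_for_green(request, **wait_opts)          # refuse to start unless healthy
    for node in nodes:
        set_allocation(request, "primaries")      # avoid replica reshuffling while the node is down
        try:
            replace_node(node)                    # hypothetical: stop, upgrade, start
        finally:
            set_allocation(request, None)         # always re-enable allocation
        wait_for_green(request, **wait_opts)      # let shards recover before the next node
```

Because the waiting happens in your own process, there is no external one-hour ceiling: `attempts` and `delay_s` can be set as generously as the amount of data requires.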


#5

Did you use any preexisting resources to do any of this? Or is it mostly a custom implementation?

It’s been a while since I ran my own Elasticsearch cluster :)…

I’m trying to get a handle on how much of a maintenance burden I’d be taking on by running this, versus dropping the move to Cloud and instead focusing on working out how to improve on-prem performance.

Personally, I’m really surprised the docs don’t mention anything about how to do this, other than that :fulltext is not supported. Considering Cloud/Ions is pitched as a more managed experience, I would have thought it would be addressed?