Hi, my task is to determine all entities which were existing before but are missing in the current version of DB and finally say goodbye to all of them. For example I have 1 million of such entities, all of them are mapped into list of objects made from template {:db/excise e}. Then this list of one million items is passed to transactor. What I see that only one CPU of 8 is used. Are there any ways to make the process faster, 7 hours went, the excision process is still running.
P.S. When I tried to excise the whole DB passing all attributes it took only 1.5 hour. Yes it means to delete all DB, but anyway why partial excision tooks more time then excision of whole db
What are you timing? Everything in the code block? Am I correct in understanding you are trying to identify retracted entities and then excise them? Can you explain why? If you retracted these facts why are you interested in excising them?
P.P.S. Then DB under datomic is H2
Additionally, you are using the dev protocol which does not have the same performance characteristics as other production storage protocols. Are you planning on using this same strategy against a production database?
I should point out that excision was designed specifically to meet the following two scenarios and should be an extremely infrequent operation:
removing data for privacy reasons
removing data older than some domain-defined retention period
Excision is lossy. You cannot ask āwas this datom in the database prior to an excision?ā (that would defeat the purpose of excision.) Excision | Datomic
Excision creates a substantial burden on background indexing jobs in whose execution time is proportional to the size of the entire database, leading to back pressure and reduced write availability.
Am I correct in understanding you are trying to identify retracted entities and then excise them?
Right
Are you planning on using this same strategy against a production database?
No, dev mode only for local development, postgres for production use
Can you explain why? If you retracted these facts why are you interested in excising them?
Would say a lot of data were created in DB by mistake what has lead to increasing memory requirements, since every megabyte is payable we need to rid of unnecessary data
Yeah, it helped to make the excision process much faster, I saw alternation between high CPU usage and disk read(much bigger in comparing to the previous library) but I have faced another issue: 6288656 entities were sent to the excision process, but after the end of the process I still see 6192721 of entities which I have expected to be excised
P.S. Iām trying to do the same actions second time to check the number of excised entities after retrying
P.P.S. Peer has been restarted, also I have executed requestIndex, but judging by CPU usage/disk read/sync-index nothing happened.
P.P.P.S. Is a reindexing the part of the excision process? The doc says: At some point after the excision request, an indexing job will run.
So when I run sync-excise, does it wait the finish of that indexing job? I just look for a way which give me the answer: everything is done, all entities to be excised are already wiped from everywhere
P.P.P.P.S I have restarted the transactor when the future returned from sync-excise has provided ready db(what was the indication for me that all all activities completed)
An Interesting result I have got after the second run of the excision process. There are still the same number of entities which actually were expected to be excised: 6192721.
Futhermore interesting: I have checked one of entities, and I see that datomic has marked this entity as excised(we see two ids of excising because I did two attempts):
@nikolayandr what are you trying to tell here? I am very confused when you use terms like ānothing happened.ā Can you share what you are expecting to have happened when you run excision vs what actually happened?
The screenshots you show seem to show that you are looking at the :db/excise attributes. Per our docs (Excision | Datomic):
Note that the excise attributes themselves are protected from excision, so there is no way to āerase your tracksā. Every excision creates a permanent record. This helps preserve a fundamental value proposition of Datomic - it is a database that greatly facilitates your knowing why it is in the state it is in, and how it got there.
Thus excision strikes a delicate balance between forgetting, and remembering that you forgot.
@jaret
Right, Iām showing that datomic has marked these entities as excised(what is expected) but actually they continue existing in history DB(what is not expected). I thought that it is necessary to request indexing after ending the excision process but the DB still has these entities(looks like nothing happened after requesting index, no index job has started)
In another words: Is it expected that some number of entities sent to the excision process might be still available in the history DB once the excision process has ended?
I did some quick testing and was unable to see the same results. Would it be possible for you to make a gist of a small reproduction of the behavior you are seeing. I think you have most of the steps in this thread, but it would be nice to see a complete gist with an eye towards formatting the gist as we describe in writing a datomic problem report (i.e. environment, expectation, actual results etc).
More specifically, you say:
actually they continue existing in history DB(what is not expected).
Do you mean the counts or are you actually able to query the data? Is this only in console that you see this? I am hoping you can demonstrate this issue using the REPL.
Do you mean the counts or are you actually able to query the data? Is this only in console that you see this? I am hoping you can demonstrate this issue using the REPL.
Iām able to query data in history DB. The test was done in REPL packed in datomic distribution package.
I did some quick testing and was unable to see the same results.
I starting to think that it might be issue related to our data too. So reproducing can be possible using our DB
@nikolayandr Apologies for not being able to immediately review I plan to look at this today and Friday. I noticed that you had previously provided the backup file. If you want to send that confidentially you can share that with me via support@cognitect.com.