Connection reset - MSSQL backend


#1

Hi, we are having some trouble with the transactor using sqlserver as backend.

Datomic transactor reports:

2017-11-28 08:09:08.726 INFO  default    datomic.process-monitor - {:tid 13, :MemoryIndexMB {:lo 0, :hi 0, :sum 0, :count 1}, :AvailableMB 3960.0, :RemotePeers {:lo 1, :hi 1, :sum 1, :count 1}, :HeartbeatMsec {:lo 5000, :hi 5001, :sum 35002, :count 7}, :StorageGetMsec {:lo 
77300, :hi 77300, :sum 77300, :count 1}, :pid 27303, :event :metrics, :SelfDestruct {:lo 1, :hi 1, :sum 1, :count 1}, :MetricsReport {:lo 1, :hi 1, :sum 1, :count 1}, :StorageGetBackoffMsec {:lo 100, :hi 100, :sum 100, :count 1}}
2017-11-28 08:09:08.737 ERROR default    datomic.process - {:message "Critical failure, cannot continue: Start database failed", :pid 27303, :tid 63}
java.util.concurrent.ExecutionException: com.microsoft.sqlserver.jdbc.SQLServerException: Connection reset
        at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.8.0_151]
        at java.util.concurrent.FutureTask.get(FutureTask.java:192) [na:1.8.0_151]
        at clojure.core$deref_future.invokeStatic(core.clj:2208) ~[clojure-1.8.0.jar:na]
        at clojure.core$future_call$reify__6962.deref(core.clj:6688) ~[clojure-1.8.0.jar:na]
        at clojure.core$deref.invokeStatic(core.clj:2228) ~[clojure-1.8.0.jar:na]

Meanwhile in our application:

2017-11-28 08:09:08,777 WARN  org.hornetq.core.client: HQ212037: Connection failure has been detected: HQ119015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
Exception in thread "main" java.util.concurrent.ExecutionException: com.microsoft.sqlserver.jdbc.SQLServerException: Connection reset
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at clojure.core$deref_future.invokeStatic(core.clj:2208)
        at clojure.core$future_call$reify__6962.deref(core.clj:6688)
        at clojure.core$deref.invokeStatic(core.clj:2228)
        at clojure.core$deref.invoke(core.clj:2214)
        at datomic.cluster$uncached_val_lookup$reify__2694.valAt(cluster.clj:192)
        at clojure.lang.RT.get(RT.java:762)
        at datomic.cache$double_lookup$reify__2463.valAt(cache.clj:355)
        at clojure.lang.RT.get(RT.java:762)
        at datomic.cache$lookup_transformer$reify__2454.valAt(cache.clj:242)
        at clojure.lang.RT.get(RT.java:762)
        at datomic.cache$lookup_cache$reify__2457.valAt(cache.clj:287)
        at clojure.lang.RT.get(RT.java:762)
        at datomic.common$getx.invokeStatic(common.clj:188)
        at datomic.common$getx.invoke(common.clj:184)

It seems that we are unable to connect to sqlserver, but we can establish a jdbc connection with no trouble, and we were able to read/write in the database, so it’s not a connectivity problem or even a security configuration problem in sqlserver. Furthermore, we create another “datomic” database within the same server and we were able to use it with no errors. We also restore a backup of that database in another server and we connected with no issues.

Have you guys any ideas?

We can provide to you a backup of the database ~42 MB

Thanks a lot


#2

Sebastián,

we were able to read/write in the database

Were you able to read/write via datomic or are you saying that you were able to read/write directly to the underlying storage MSSQL?

We also restore a backup of that database in another server and we connected with no issues.

The other server used the same backend storage? Are you able to restore the backup to the problematic SQL server? Are you able to restore to a different SQL DB?

Thanks,
Jaret


#3

Jaret,

I work with Sebastián :D.

We were able to read and write directly to the mssql.

The restored was performed in a different server with the same DBMS (sqlserver) the backups were created with tools of sqlserver not with datomic.

The other server is a sqlserver, and we were able to restore the backup in the problematic sqlserver, but we still have the same problem.

Thanks!


#4

Alexis,

You should not use underlying storage’s (sqlserver tools) backup/restore methods as they can corrupt the KV store, but you can rely on Datomic’s backup/restore.

In the same vein, we also recommend against writing directly to storage under the datomic table so as not to overwrite, alter, or corrupt Datomics use of the table as a KV store.

Could you create and use a Datomic backup for this server?
http://docs.datomic.com/backup.html

You should note that you may need to create a new storage per:

Backup and restore are not suitable for cloning of a database within a single storage. If you attempt to restore a database into a storage that already contains that database, but under a different name, the restore operation will fail.

Thanks,
Jaret


#5

Jaret,

If the kv store is corrupt how could we restore it in another server? We didn’t use datomic_kvs for our tests of course :P.

thanks


#6

Hi Jared,

Thanks a lot for your time, we really appreciate.

I think the focus it shouldn’t be the backup, because this problem happened without any backup procedure involved.
The backup was a way for us to discard problems, just that.

Wrapping things up, this is what happened; we saw this error in our transactor, with apparently no changes involved in our infrastructure (hardware, db, networking, etc) or software.

We tried the following to discard things, because we couldn’t determine where the problem is located:

  1. Network issues
    • JDBC connections outside datomic works perfectly, we can query the db, and write things. (with the same driver)
    • If we create a new database, datomic connects perfectly.
  2. Database Server issues
    • Same as (1)
  3. Database corruption
    • We made a backup and restored in another MSSQL server and datomic can connect perfectly and the data is all there.
  4. Disk corruption
    • We detached our database and moved to another disk, but the same error occurs.
  5. Datomic issues
    • If we created another db and datomic can connect without problem.
    • We tryed to connect from another datomic instance to our problematic database but the same error occurs.
  6. DB User/Login issues
    • We could connect via JDCB with the same user and password configured in datomic (same jdbc url)
    • We could connect via Management Console with the same user and password configured in datomic

Another thing that call our attention was this error on our MSSQL server every time datomic try to connect to this database (172.23.12.102 is our datomic node):

This error means something to you?

This is our datomic log at the same time we saw the error on MSSQL side:

2017-11-28 08:09:08.737 ERROR default    datomic.process - {:message "Critical failure, cannot continue: Start database failed", :pid 27303, :tid 63}
java.util.concurrent.ExecutionException: com.microsoft.sqlserver.jdbc.SQLServerException: Connection reset
        at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.8.0_151]
        at java.util.concurrent.FutureTask.get(FutureTask.java:192) [na:1.8.0_151]

Thanks a lot.


#7

Sebastián,

As a troubleshooting step, could you please use Datomic level backup and restore and post your results? If Datomic backup and restore won’t run on this DB it may point to a corrupted segment or issue with storage.

-Jaret