When to use `:db.type/uri`?

What are the benefits and drawbacks of using the `:db.type/uri` value type?

Would you store email addresses as URIs (e.g. `mailto:x@y.z`)?

I can see the following pros:

  1. Validation and encoding of the characters allowed in each part of a URI (though I’m not sure what a good, realistic example of this would be).

and the following drawbacks:

  1. Lack of an official literal syntax, so URI values don’t appear as “just data”.
  2. Getting the “meat” of an email address (the bare `x@y.z`, which is what most external systems expect) would require something like `(.getSchemeSpecificPart (URI. "mailto:x@y.z"))`.
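If you do go with URIs, that interop can at least be hidden behind two small helpers. This is a minimal sketch using only the JDK’s `java.net.URI`; the namespace and function names (`example.email-uri`, `email->uri`, `uri->email`) are hypothetical. It relies on the three-argument `URI(scheme, ssp, fragment)` constructor, which percent-encodes illegal characters instead of throwing, and on `getSchemeSpecificPart`, which returns the decoded value.

```clojure
(ns example.email-uri
  (:import (java.net URI)))

(defn email->uri
  "Wraps a plain email address in a mailto: URI. The (scheme, ssp, fragment)
  constructor quotes characters that are not legal in a URI (e.g. a space
  becomes %20) rather than rejecting them."
  ^URI [^String email]
  (URI. "mailto" email nil))

(defn uri->email
  "Returns the bare address, i.e. the decoded scheme-specific part."
  ^String [^URI u]
  (.getSchemeSpecificPart u))
```

Used at the edges, `(uri->email (email->uri "x@y.z"))` gives back `"x@y.z"`, so external systems never see the `mailto:` prefix.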

Are there any runtime costs or savings using URIs in queries?

I tried it, and I can define a URI attribute with `:db.unique/identity`:

  {:db/ident       :email
   :db/valueType   :db.type/uri
   :db/cardinality :db.cardinality/one
   :db/unique      :db.unique/identity}

and I can use it in lookup refs: `(d/pull db '[*] [:email (URI. "mailto:x@y.z")])`.


I think you have laid out the tradeoffs correctly.

Internally, Datomic (really Fressian) stores a URI as a tagged value: the “uri” tag plus the URI’s string representation. So it is only a few bytes larger than the string itself.

On the Java heap, a `java.net.URI` object has many more String fields (one for each URI part plus one for the whole URI), so it definitely has more memory overhead than a raw string, but this probably doesn’t matter in practice.

I think this comes down to which of these you want: an actual URI type flowing through your whole application stack (possibly including a tag-reader and printer for URIs, plus edn/transit/nippy/whatever handlers); validating and encoding at the edges of the Clojure process while keeping values as URIs internally; or not caring about URIs at all and keeping everything “stringly-typed”.
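For the first option, the edn side can be sketched with a custom `print-method` and a reader function passed to `clojure.edn/read-string`. This is a minimal sketch, not an established convention: the tag `my/uri` and the names `example.uri-edn`, `edn-readers` and `read-edn` are hypothetical. A namespaced tag is used because edn reserves tags without a prefix for built-ins.

```clojure
(ns example.uri-edn
  (:require [clojure.edn :as edn])
  (:import (java.io Writer)
           (java.net URI)))

;; Print java.net.URI as a tagged literal instead of the default
;; #object[java.net.URI ...] form. Hypothetical tag: my/uri.
(defmethod print-method URI [^URI u ^Writer w]
  (.write w "#my/uri ")
  ;; Force the quoted, escaped string form even under print/str,
  ;; so the printed value can always be read back.
  (binding [*print-readably* true]
    (print-method (str u) w)))

;; Reader function for the same tag, for use with clojure.edn.
(def edn-readers
  {'my/uri (fn [^String s] (URI. s))})

(defn read-edn
  "Reads an edn string, turning #my/uri \"...\" into java.net.URI values."
  [s]
  (edn/read-string {:readers edn-readers} s))
```

With this loaded, `(pr-str (URI. "mailto:x@y.z"))` prints `#my/uri "mailto:x@y.z"`, and `read-edn` turns it back into an equal `URI`. The Clojure reader itself would additionally need the tag registered in `data_readers.clj`, and transit/nippy each need their own handlers.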

You can also use attribute predicates to validate URIs while still storing them as strings. In my experience, URI validation always ends up being use-case-specific, because it’s common for URIs not to follow the spec exactly, or you only want some subset of legal URIs (e.g. specific schemes, length limits, valid email domains, rejecting easy-to-abuse characters, etc.).
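A minimal sketch of that approach for the email case from the question, assuming a plain `:db.type/string` attribute. The namespace `my.app.validate`, the function `email-string?` and the policy itself (a 254-character limit, exactly one `@`, no `?` or `#`, must parse as a `mailto:` URI) are hypothetical choices. The predicate is ordinary Clojure; the schema map only names it via `:db.attr/preds`, and Datomic must be able to load that var wherever it processes transactions (for example the transactor classpath on Datomic On-Prem).

```clojure
(ns my.app.validate
  (:require [clojure.string :as string])
  (:import (java.net URI URISyntaxException)))

;; Hypothetical policy limit; adjust to your use case.
(def max-email-length 254)

(defn- parses-as-mailto?
  "True if \"mailto:\" + s is accepted by java.net.URI's single-argument
  constructor, which throws on illegal characters such as spaces."
  [s]
  (try
    (URI. (str "mailto:" s))
    true
    (catch URISyntaxException _ false)))

(defn email-string?
  "Sketch of an attribute predicate: accepts a bare address with exactly
  one @ and non-blank parts on both sides, no ? or # (so no mailto:
  headers or fragments), within the length limit, that also forms a
  valid mailto: URI. Returns true or false."
  [s]
  (and (string? s)
       (<= (count s) max-email-length)
       (not (string/includes? s "?"))
       (not (string/includes? s "#"))
       (let [parts (string/split s #"@" -1)]
         (and (= 2 (count parts))
              (every? (complement string/blank?) parts)))
       (parses-as-mailto? s)))

;; Schema data only (no Datomic call here): the value stays a string,
;; and the predicate is referenced by its fully qualified symbol.
(def email-attr
  {:db/ident       :email
   :db/valueType   :db.type/string
   :db/cardinality :db.cardinality/one
   :db/unique      :db.unique/identity
   :db.attr/preds  'my.app.validate/email-string?})
```

Because the stored value is the bare address, lookup refs become `[:email "x@y.z"]` and no `getSchemeSpecificPart` step is needed on the way out.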
