Synchronization with remote databases

Context and Problem Statement

The data in a library needs to be synchronized with a remote database, while handling conflicts and supporting an offline-first paradigm.

Decision Drivers

Updates from the remote should be pulled in
No updates should get lost
Easy to implement
Easy to maintain

Considered Options

“Optimistic offline lock” with hashes for local file support
Algorithm based on “optimistic offline lock”
Use CRDTs

Decision Outcome

Chosen option: “‘Optimistic offline lock’ with hashes for local file support”, because it is the simplest option that resolves all forces.

Pros and Cons of the Options

“Optimistic offline lock” with hashes for local file support

The Optimistic Offline Lock works well for synchronizing clients with a server when the data on the client side is not modified by any other means. However, users might modify the .bib file outside of JabRef. They might also open an existing .bib file and synchronize it. Thus, additions are needed to handle the local synchronization.

Moreover, the optimistic offline lock does not specify how a set of data is synchronized.

Both shortcomings are resolved by our algorithm. This algorithm is described in Remote JabDrive storage.
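To illustrate why hashes help with local file support, the following is a minimal sketch, not the algorithm from Remote JabDrive storage. It assumes that a hash of each entry is recorded at sync time and compared on the next sync, so that edits made outside of the application become visible. The names `entry_hash` and `locally_modified` and the JSON-based canonical form are hypothetical choices for this example.

```python
import hashlib
import json


def entry_hash(fields: dict[str, str]) -> str:
    """Hash an entry's fields in a canonical form (keys sorted)."""
    canonical = json.dumps(fields, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def locally_modified(fields: dict[str, str], hash_at_last_sync: str) -> bool:
    """True if the entry differs from what was recorded at the last sync."""
    return entry_hash(fields) != hash_at_last_sync


# Hypothetical entry; the hash is recorded when the entry is synchronized.
entry = {"author": "Knuth", "title": "The Art of Computer Programming"}
stored_hash = entry_hash(entry)

# Later, the file is edited with an external tool.
entry["title"] = "TAOCP"
changed = locally_modified(entry, stored_hash)
```

Because the hash is computed from the content alone, the check works regardless of which program changed the file.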

Algorithm based on “optimistic offline lock”

Optimistic Offline Lock is a well-established technique to prevent conflicts in concurrent business transactions. It assumes that the chance of conflict is low. Implementation details are found at https://www.baeldung.com/cs/offline-concurrency-control.

This is implemented for the SQL database synchronization, which is described at Remote SQL Storage.
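The version check at the core of an optimistic offline lock can be sketched as follows. This is an illustrative in-memory example under stated assumptions, not the code used for the SQL synchronization: the `Server` and `Record` classes, the `ConflictError` exception, and the integer version counter are all hypothetical.

```python
from dataclasses import dataclass, field


class ConflictError(Exception):
    """Raised when a write is based on an outdated version."""


@dataclass
class Record:
    data: dict
    version: int = 0


@dataclass
class Server:
    records: dict = field(default_factory=dict)

    def read(self, key):
        """Return a copy of the data together with its current version."""
        record = self.records[key]
        return dict(record.data), record.version

    def write(self, key, data, expected_version):
        """Accept the write only if nobody else has written in the meantime."""
        record = self.records[key]
        if record.version != expected_version:
            raise ConflictError(
                f"expected version {expected_version}, found {record.version}"
            )
        record.data = dict(data)
        record.version += 1
        return record.version


server = Server({"knuth1968": Record({"title": "TAOCP"})})

# Two clients read the same version.
data_a, version_a = server.read("knuth1968")
data_b, version_b = server.read("knuth1968")

# The first write succeeds and bumps the version.
data_a["year"] = "1968"
new_version = server.write("knuth1968", data_a, version_a)

# The second write is based on a stale version and is rejected.
data_b["title"] = "The Art of Computer Programming"
try:
    server.write("knuth1968", data_b, version_b)
    conflict_detected = False
except ConflictError:
    conflict_detected = True
```

No lock is held while a client edits; the conflict only surfaces at write time, which is why the technique relies on conflicts being rare.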

Good, because this algorithm has been in place since 2016 for JabRef synchronizing with a PostgreSQL backend and a MySQL backend.
Bad, because it assumes the client is online 100% of the time and does not handle cases where the client disconnects and alters data in other ways.

Use CRDTs

See https://automerge.org/blog/automerge-2/ for details.

Bad, because one needs to store a lot more metadata locally (e.g., for operation-based CRDTs you essentially need the full history of all edits). This would require another file next to the .bib file to store that metadata.
Bad, because CRDTs are mainly used when low latency and a high frequency of edits are needed (e.g., multi-user chat or text editing). Neither is a priority here.
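The metadata overhead mentioned above can be made concrete with a toy operation-based design. This is a simplified sketch and not how Automerge works internally: the `SetField` operation, the logical `timestamp`, and the last-writer-wins ordering are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SetField:
    timestamp: int  # logical clock value (assumed to be maintained per replica)
    replica: str    # identifier of the replica that issued the operation
    name: str       # field being set
    value: str


def merge(log_a: set, log_b: set) -> set:
    """Merging two replicas is the union of their operation logs."""
    return log_a | log_b


def materialize(log: set) -> dict:
    """Replay all operations in a deterministic order; later writes win."""
    state = {}
    for op in sorted(log, key=lambda op: (op.timestamp, op.replica)):
        state[op.name] = op.value
    return state


log_a = {
    SetField(1, "a", "title", "Draft"),
    SetField(2, "a", "title", "Final"),
}
log_b = {SetField(1, "b", "year", "2016")}

merged = merge(log_a, log_b)
state = materialize(merged)
```

The visible state has two fields, but the log holds three operations and grows with every edit. In this design, that log is the extra data that would have to be stored next to the .bib file.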