This page is part of my digital garden.

This page might be unfinished or have typos. These pages are meant as part of a public living notebook to be edited over time. For more, visit the page explaining the concept of a digitial garden.

Databases

Short version: Files are bad1. Use SQLite2[^sqlite]. If you can’t just use SQLite then use PostgreSQL. If PostgreSQL isn’t enough try reducing load with caching via Redis.

PostgreSQL

SQLite

SQLite is an open source project known for its reliability. It’s written in C, but with an extremely thorough test suite covering all kinds of conditions. Instead of being its own server it runs as part of the application so there is no network or socket communication needed.

In many tests only SQLite gets writing to disk correctly and is often recommended as a better way to write files in general even if you don’t need the full SQL features3.

It’s best to always run SQLite in WAL mode as that’s the safest version and also allows for better multi-threading/process support.

Interesting note: because SQLite is a library inside the application some query patterns that would be horrible for MySQL/PostgreSQL are great for SQLite. Namely making many small queries is faster than trying to do everything within a single query as you might with Postgres.

PlanetScale

Nice serverless database based off Vitess with added DX features around it including advanced migrations built-in to the database.

The free plan is generous.

ElasticSearch

ElasticSearch is a powerful tool for storing documents which can be indexed and then searched any number of ways.

It’s used for Log analysis although it wasn’t originally intended for that.

Generally you want to keep your real data in something like PostgreSQL and then write something to sync changes to ElasticSearch as well as provide a way to recreate the “index” in a zero down time way.

Elastic search has “indexes” which are akin to what you’d call a database in PostgreSQL. You can create aliases for indexes so “my-index” actually points to “my-index-2021-11-27-0” which lets you create a new index “my-index-2021-11-28-0” without affecting the first index. Once it’s created and replicated you can point the original alias to the new index and delete the old index.

Elastic search clustering is very nice because by default a cluster will distribute the data in Shards. You can configure how many shards are made per index as well as how many times the shards should be replicated. The ES instances can determine a master node via election which coordinates changes in the cluster. However, a cool thing, you can talk to any of the instances to do your query and all the relevant nodes with shards will receive a request to do part of the work and return it. This means ElasticSearch does not need any kind of load balancing in front of it because the load balancing takes place inside the cluster itself.


  1. File Consistency - Dan Luu, Files are fraught with peril - Dan Luu, Talk at DeconstructConf 2019 ↩︎

  2. One process notes by David Crawshaw ↩︎

  3. Most databases got it wrong. Even PostgreSQL got it wrong and it revealed to the wider community that sync() in Linux when it fails that calling sync() again does not retry. Instead it clears the error flag and drops the write. SQLite got this correct in WAL mode which is not the default but is the recommended mode. ↩︎

Last updated on