Two sizes fit most: PostgreSQL and Clickhouse
Since the introduction of System R in 1974, relational databases in general, and SQL databases in particular, have risen to become the dominant approach to data persistence in the industry, and have maintained that dominance despite various significant challengers. Though some have rumored the death and decline of traditional relational databases, PostgreSQL has turned out to be an improvement on its predecessors, as well as its supposed successors. In fact, the open-source MySQL database was so ubiquitous that it became part of the eponymous LAMP stack (Linux, Apache, MySQL, Perl) that dominated early web development. The one big exception to this trend is OLAP, where specialized techniques that can drastically improve the performance of certain workloads have met with use-cases that actually require these techniques, with new contenders such as Clickhouse enabling qualitatively different approaches to analytics. One size does not fit all As often happens when a technology becomes dominant, it gets applied unthinkingly even when it may not actually be appropriate, and so all kinds of data was and is being pushed into general-purpose relational databases. Extreme examples could be found, such as developers creating remote Oracle databases for data sets with a total of 5 small elements (not columns, pieces of data) or Apple pushing their system logs into an SQLite database (a mistake they later corrected). Bind10 development started under the premise to solve scaling issues with Bind9 as DNS nameserver, using SQLite as backend. The DNS development was discontinued by ISC in 2014, and the OSS project Bundy remains inactive. PowerDNS focussed on performance scaling with MySQL/PostgreSQL early. In 2005, Michael Stonebraker, database researcher behind Ingres and later PostgreSQL, together with Uğur Çetintemel, penned a paper “One Size Fits All”: An Idea Whose Time Has Come and Gone arguing that this had gone too far too long and backing up that argument with benchmark results. In short, there were many workloads outside of the core application of databases, Online Transaction Processing (OLTP), where the general database architectures were outclassed sufficiently that it did not make sense to use them. It should be noted that Stonebraker and Çetintemel argued not against relational databases or SQL, but against a specific architecture descendent from the original System R and Ingres systems that were and still are being used by most general purpose database systems. This architecture has the following features: Disk and row-oriented storage and indexing structures Multithreading to hide latency Locking-based concurrency control mechanisms Log-based recovery In addition to special-purpose text indexing, the primary use-case for which the traditional architecture was proving inadequate was data warehouses, for which column stores were proving 10-100x more efficient than the traditional row stores. Clickhouse The prediction that OLAP database engines would split off from mainstream databases has largely come to pass in the industry, with OLAP databases now being a significant category in its own right, with vertica, the commercial offshot of the original cstore discussed in the paper, one of the major players. The practical advantages of these databases for analytical work are, as predicted, substantial enough that having a separate database engine is warranted. Or even necessary, as was the case for Yandex’s clickhouse OLAP database, recently spun out into a startup that just received a US $250m series B. The clickhouse developers wanted to have realtime analytics, […]
