This reply was too long for Discord so… let’s try out Substack’s new threads feature!
what's the problem you want to solve with a new database? just the query language? or other things too?
short answer: not just the query language, lots of other things, the query language rants are really rooted in a much deeper observation that I believe is the root cause of my primary criticism: lack of what’s required to facilitate innovation, competition and novel ideas.
SQL as a language is okay (avoiding the more complex parts, the fact that new features need entire new syntax, etc, all the shit I've spoken about before) it's fine, as a way to explore data, it's really good - commercial director asked me a question the other day and I threw together a quick query to answer, amazing.
There are plenty of other databases that don't use SQL, ***but*** they all suffer the same fundamental design issue. EdgeDB, FaunaDB, Neo4J, SurrealDB and many others all do this sort of complicated DSL that either compiles to SQL (which then results in all the weird issues with obscure SQL optimisation problems inherent to the language and implementations) or they invent a new language that's basically inspired by SQL that it inherits all the weirdness (lack of expressions, english-sentence-based syntax, etc)
My problem with a DSL (and, especially, a DSL intended to mimic human language) is that there’s a very high chance the structuring semantics will not match that of the host language. What I mean by that is programs (queries) written in the DSL live within their own context, separate from the host program. They need their own interfacing semantics too (with SQL, that means placeholders for arguments “select * from Table where x = $1” where $1 is provided elsewhere.
But, DSLs do not need to be implemented in raw characters. A DSL may exist within another predefined structure. One example of this is OpenAPI, which is a DSL *implemented* within YAML. It sort of uses YAML as its vessel of distribution, but within the YAML itself there are more strict rules. So, in SQL you’d have a rule that says “select must be followed by one or more column names” in OpenAPI you’d say “the value of $ref must be a path to a schema”.
The fundamental difference here is that I’ve not had to define what “value” means because the OpenAPI DSL exists within a much more generalised and well-defined language which defines keys, values, lists and scalar data types already.
So, if I was crazy, I could define a query language DSL within YAML. I would not need to worry about ordering (have you ever written a query where your “where” clause is below the “order” clause? why does that matter? because SQL defined it that way.) and I wouldn’t need to worry about all the little details, I could just declare the structured semantics of what different shapes keys, values, objects, etc *mean*.
Essentially, decoupling the syntax from the meaning, which is what most languages do. In most languages `if` means the exact same thing in every single context. And expression-based languages mean primitive parts can be composed together. OpenAPI can do this too!
Another benefit of using a “vessel” to hold your semantics is that other tools can easily generate this without needing to worry about syntax. When you generate an OpenAPI document from something else (maybe code or a class diagram) you don’t need to worry about when to insert a “:” character, or when to insert a new-line, or how many spaces to add based on the scope and context. You just create a hash table and let something else turn that into the raw characters.
Databases are really hard to build, apparently, that's what people keep telling me when I float this idea. It's like there's this very conservative "if it ain't broke don't fix it" mentality around this. People often cite Postgres' 30 years of "maturity" which is a valid point, but it's also 30 years of random ass features 7 people use (each with their own custom syntax, not built out of existing language constructs, no, actual new syntax. imagine if JavaScript was like "hey you want to read the filesystem? we invented a whole new set of keywords and syntactical elements just for that!")
Maybe folks said the same thing about programming languages, but then we had an explosion of those in the early 2000s and then more later. One of the most interesting ones I think was Go. Go was fascinating because it stripped away features and made simplicity its flagship.
So I think maybe what I’m throwing around with this radical database idea is “The Golang of Databases” in a sense that it’s incredible simple and conservative in terms of what it does. The vast majority of products I’ve built and people I know have built do not require: triggers, checks, full-text-search, weird JSON query syntax, custom data types, custom operators, custom functions, a full RBAC system, the ability to embed Perl, Python and others into queries.
More will come on this topic, but I must organise my thoughts in a more coherent way than just freestyling into a substack thread at 10:30pm on a thursday. peace ✌️
Building a new database
Building a new database
Building a new database
This reply was too long for Discord so… let’s try out Substack’s new threads feature!
short answer: not just the query language, lots of other things, the query language rants are really rooted in a much deeper observation that I believe is the root cause of my primary criticism: lack of what’s required to facilitate innovation, competition and novel ideas.
SQL as a language is okay (avoiding the more complex parts, the fact that new features need entire new syntax, etc, all the shit I've spoken about before) it's fine, as a way to explore data, it's really good - commercial director asked me a question the other day and I threw together a quick query to answer, amazing.
There are plenty of other databases that don't use SQL, ***but*** they all suffer the same fundamental design issue. EdgeDB, FaunaDB, Neo4J, SurrealDB and many others all do this sort of complicated DSL that either compiles to SQL (which then results in all the weird issues with obscure SQL optimisation problems inherent to the language and implementations) or they invent a new language that's basically inspired by SQL that it inherits all the weirdness (lack of expressions, english-sentence-based syntax, etc)
My problem with a DSL (and, especially, a DSL intended to mimic human language) is that there’s a very high chance the structuring semantics will not match that of the host language. What I mean by that is programs (queries) written in the DSL live within their own context, separate from the host program. They need their own interfacing semantics too (with SQL, that means placeholders for arguments “select * from Table where x = $1” where $1 is provided elsewhere.
But, DSLs do not need to be implemented in raw characters. A DSL may exist within another predefined structure. One example of this is OpenAPI, which is a DSL *implemented* within YAML. It sort of uses YAML as its vessel of distribution, but within the YAML itself there are more strict rules. So, in SQL you’d have a rule that says “select must be followed by one or more column names” in OpenAPI you’d say “the value of $ref must be a path to a schema”.
The fundamental difference here is that I’ve not had to define what “value” means because the OpenAPI DSL exists within a much more generalised and well-defined language which defines keys, values, lists and scalar data types already.
So, if I was crazy, I could define a query language DSL within YAML. I would not need to worry about ordering (have you ever written a query where your “where” clause is below the “order” clause? why does that matter? because SQL defined it that way.) and I wouldn’t need to worry about all the little details, I could just declare the structured semantics of what different shapes keys, values, objects, etc *mean*.
Essentially, decoupling the syntax from the meaning, which is what most languages do. In most languages `if` means the exact same thing in every single context. And expression-based languages mean primitive parts can be composed together. OpenAPI can do this too!
Another benefit of using a “vessel” to hold your semantics is that other tools can easily generate this without needing to worry about syntax. When you generate an OpenAPI document from something else (maybe code or a class diagram) you don’t need to worry about when to insert a “:” character, or when to insert a new-line, or how many spaces to add based on the scope and context. You just create a hash table and let something else turn that into the raw characters.
Databases are really hard to build, apparently, that's what people keep telling me when I float this idea. It's like there's this very conservative "if it ain't broke don't fix it" mentality around this. People often cite Postgres' 30 years of "maturity" which is a valid point, but it's also 30 years of random ass features 7 people use (each with their own custom syntax, not built out of existing language constructs, no, actual new syntax. imagine if JavaScript was like "hey you want to read the filesystem? we invented a whole new set of keywords and syntactical elements just for that!")
Maybe folks said the same thing about programming languages, but then we had an explosion of those in the early 2000s and then more later. One of the most interesting ones I think was Go. Go was fascinating because it stripped away features and made simplicity its flagship.
So I think maybe what I’m throwing around with this radical database idea is “The Golang of Databases” in a sense that it’s incredible simple and conservative in terms of what it does. The vast majority of products I’ve built and people I know have built do not require: triggers, checks, full-text-search, weird JSON query syntax, custom data types, custom operators, custom functions, a full RBAC system, the ability to embed Perl, Python and others into queries.
More will come on this topic, but I must organise my thoughts in a more coherent way than just freestyling into a substack thread at 10:30pm on a thursday. peace ✌️