Introduction and Recap
In the first article, I introduced the SubSelect
interface and its companion TSubSelect
io-ts schema. In this article, we’ll continue the learning journey of implementing SQL in io-ts.
If you haven’t already, please read the first entry in this series of articles where I introduced the problem statement:
we updated our Nestjs backend by adding a new POST endpoint having a fully documented, validated, and parsed input JSON body with complete SQL support
In this article, we’ll dive deeper into the interfaces and schemas we created to describe the SQL SELECT
statement. This article will focus on the ExpressionItem
and all the sub-structures required to describe a general SQL expression.
In the next sections, we’ll dive deeper into each of these structures. But first, to help explain later sections, let’s explore discriminating unions, which is the type pattern chosen for ExpressionItem
.
We found this pattern to be a perfect fit for describing SQL in an io-ts schema. When describing operators, for example, we found that the union of each operator’s discrete interface led to a structure that could be extended indefinitely. What’s more, adding a new operator to this structure is low effort and straightforward.
Next, I’ll describe discriminating unions and how they can be used.
Discriminating Unions
In TypeScript, discriminating union is a pattern for implementing a generalized shape with multiple concrete possibilities.
This pattern can be achieved in two ways:
- Pivot around a required, singleton value;
- Pivot around the presence of disjointed, required properties
To help explain these two, here is a contrived example of the first case:
enum UserType { admin = 'admin', public = 'public', } interface BaseUserOptions { userType: UserType; } interface AdminUserOptions extends BaseUserOptions { userType: UserType.admin; rootAccess: boolean; } interface PublicUserOptions extends BaseUserOptions { userType: UserType.public; enrollDate: string; } type UserOptions = AdminUserOptions | PublicUserOptions; const userOptions: UserOptions[] = [];
Elements of type UserType
could either be admin
or public
. Depending on which one of the two enumerated values is present, additional properties (e.g. rootAccess
and enrollDate
) can be expected.
To access the additional properties, you must first check the userType
property to know which interface is to be used. This is called type narrowing. Since the type narrowing is based on the unique value found in userType
, this makes userType
the discriminating value, a.k.a. the discriminator.
Here is an example of working with the broader UserOptions
interface:
const userOptions: UserOptions[] = someUserOptions(); for (const userOption of userOptions) switch (userOption.userType) { case UserType.admin: console.log(userOption.rootAccess); break; case UserType.public: console.log(userOption.enrollDate); break; }
In the second case, where you pivot around the presence of disjointed, required properties, the technique involves checking for the existence of properties.
Here is a contrived example of the second case:
interface AdminUserOptions { rootAccess: boolean; } interface PublicUserOptions { enrollDate: string; } type UserOptions = AdminUserOptions | PublicUserOptions;
The difference in the second construct is that AdminUserOptions
and PublicUserOptions
do not share any common property. But, they do have required properties of their own that do not overlap each other.
To access the data through the broader UserOptions
interface, you first check for the presence of a required property. In this case, the type narrowing is functioning through checking the shape of the object, not a value of a property shared by the objects.
Here is an example of working with the second form of the broader UserOptions
interface:
const userOptions: UserOptions[] = [...someUserOptions()]; for (const userOption of userOptions) { if ('rootAccess' in userOption) console.log(userOption.rootAccess); if ('enrollDate' in userOption) console.log(userOption.enrollDate); }
📝 It’s important to reiterate that the disjointed properties need to be required. If a shape is comprised of entirely optional properties, then TypeScript cannot eliminate it as a possibility just because you didn’t specify one of its properties.
Now that you’ve become more familiarized with the discriminating union pattern, let’s explore how we used this to describe the general and specific shapes that an ExpressionItem
can be.
Expression Item
The ExpressionItem
structure is a fundamental element of the SubSelect
. This structure can be used in quite a variety of places in the schema.
For example, you can find an ExpressionItem
:
- As a
SELECT
column; - As an array of rows in a
VALUES
predicate; - As criteria for joining two tables with the
ON
join conditions; - As criteria in the
WHERE
,GROUP BY
,HAVING
andORDER BY
clauses; - As arguments for functions (and functions themselves are
ExpressionItem
); - As parameters to any operator such as
=
,LIKE
,CASE
,+
etc.
For the sake of brevity, I want to focus on key themes when describing the different forms of the ExpressionItem
.
Here is the top-level ExpressionItem
structure:
export type ExpressionItem = | ExpressionColumnItem | ExpressionFunctionItem | ExpressionOperatorItem | ExpressionValueItem | SubSelect; export const TExpressionItem: t.Type<ExpressionItem> = t.recursion( 'ExpressionItem', () => t.union([ TExpressionColumnItem, TExpressionFunctionItem, TExpressionOperatorItem, TExpressionValueItem, TSubSelect ]), );
Each ExpressionItem
possibility is a unique interface. Each of those possibilities has at least one required property that is also unique to that interface.
💡 When you use this type of interface, first provide the value for the required property of the desired shape. Once you do that, Typescript narrows the possible remaining properties to those that intersect that required property.
🤯 Notice that
SubSelect
is one of the options for anExpressionItem
. This means anywhere an expression can be used, it can also be a nestedSubSelect
.
To access the specific properties of an ExpressionItem
, you first need to determine the shape of the item. Since this shape uses the second form of discriminating unions, we can discern what interface to use based on the presence of required, disjointed properties.
For example, if column
is present, then it must be an ExpressionColumnItem
since no other possible shape allows for a property named column
. To explain this better, the next sections dive deeper into ExpressionItem
and explore the possible shapes.
Expression Column Item
An ExpressionColumnItem
is an interface that allows for referring to a column, with an optional correlation. A correlation qualifies the column to a specific table reference.
Here is an example of selecting a column and providing the correlation:
SELECT A.status
Defining this interface is straightforward:
export const TExpressionColumnItem = t.intersection([ t.type({ column: t.string, }), t.partial({ correlation: t.string, }), ]); export type ExpressionColumnItem = t.TypeOf<typeof TExpressionColumnItem>;
📝 The discriminator here is the
column
property.
Specifying a value for the column
property implies that the only possible shape for ExpressionItem
is ExpressionColumnItem
. Therefore, the correlation
property is a valid optional argument when an ExpressionItem
has a column
property.
Here is the previous SQL SELECT
statement example encoded with the schema:
{ "select": [ { "column": "status", "correlation": "A" } ] }
Expression Value Item
An ExpressionValueItem
is an interface that allows for referring to a literal value. Depending on its data type (e.g. number, string), the reference may need to be quoted or embedded as is.
The following example demonstrates a SELECT
statement where a column is a literal value:
SELECT 'new' AS status
A literal value could be any one of the following primitive types: boolean
, null
, number
, or string
. To define this interface, specify value
as a union
of those options:
export const TExpressionValueItem = t.type({ value: t.union([ t.boolean, t.null, t.number, t.string ]), }); export type ExpressionValueItem = t.TypeOf<typeof TExpressionValueItem>;
📝 The discriminating property is the
value
property.
Here’s how the previous SQL SELECT
statement example appears as encoded with the schema:
{ "select": [ { "alias": "status", "value": "new" } ] }
Expression Function Item
The ExpressionFunctionItem
interface allows for referring to functions taking on the form of a function name, followed by a left parenthesis, zero or more comma-delimited arguments, and ending with a right parenthesis.
The example below demonstrates a SELECT
statement referencing a function call:
SELECT UPPER(name)
📝 This form does not consider any function that has one or more “noise” words. For example, the
TRIM
function allows you to specify words likeBOTH
andLEFT
to describe where to trim. Functions of these forms are described later with theExpressionOperatorItem
interface.
The ExpressionFunctionItem
interface requires a function name and optional arguments.
🔒 For security reasons, we decided it would be best to restrict the choices for a function name. You could remove this enumeration restriction and leave it open for the database manager to validate both the function signature and required permissions to execute it.
To begin constructing this interface, we start with the available function names:
// Note: For brevity's sake, only a single function categery is shown export enum AggregateFunction { avg = 'AVG', count = 'COUNT', cube = 'CUBE', groupingSets = 'GROUPING SETS', max = 'MAX', min = 'MIN', rollUp = 'ROLLUP', sum = 'SUM', } // Note: Utils.enum returns an io-ts validator, asserting a valid enum value export const TAggregateFunction = Utils.enum<AggregateFunction>(AggregateFunction); // Note: Not an exhaustive listing export const TFunctionName = t.union([ TAggregateFunction, TConditionFunction, TConfigurationSettingFunction, TGeometryFunction, TMathFunction, TObjectManagementFunction, TStringFunction, TSubSelectFunction, ]); export type FunctionName = t.TypeOf<typeof TFunctionName>;
📝 Above,
TFunctionName
demonstrates an example of how to combine multiple enumerations andunion
them together to create a composite enumeration.
Once the possible function names are enumerated, we define the ExpressionFunctionItem
interface:
export interface ExpressionFunctionItem { functionName: FunctionName; arguments?: ExpressionItem[] | ExpressionItem[][]; schemaName?: string; } export const TExpressionFunctionItem: t.Type<ExpressionFunctionItem> = t.recursion('ExpressionFunctionItem', () => t.intersection([ t.type({ functionName: TFunctionName, }), t.partial({ arguments: t.union([ t.array(TExpressionItem), t.array(t.array(TExpressionItem)) ]), schemaName: t.string, }), ]), );
📝 The discriminating property here is the
functionName
.
Not only is the ExpressionFunctionItem
a possible ExpressionItem
shape, each argument is also an ExpressionItem
.
Additionally, the arguments array can be a two-dimensional array. This was done so we could support grouping sets as in the example below:
SELECT status, state, COUNT(*) FROM my_table GROUP BY GROUPING SETS ((status, state), ())
Here is the grouping sets query when encoded in the schema:
{ "select": [ { "column": "status" }, { "column": "state" }, { "functionName": "COUNT", "arguments": [{ "column": "*" }] } ], "from": [ { "operator": "FROM", "tableName": "my_table" } ], "groupBy": [ { "functionName": "GROUPING SETS", "arguments": [[{ "column": "status" }, { "column": "state" }], []] } ] }
Expression Operator Item
SQL has a wide variety of operators. Consider some examples below:
SELECT '$' || TRIM(BOTH '*' FROM currency), transactionCount + 1
Anything resembling an operator e.g. ||
, +
, etc. is implemented as a member of the ExpressionOperatorItem
. This interface uses the discriminating union pattern where each operator
value is a unique value paired with its interface.
📝 When describing the
ExpressionFunctionItem
, I alluded that there are some functions with unique keywords and parameter requirements. Any of these types of functions are also treated as operators, even if their parsed output resembles a function. This is done so that we can use the discriminating union pattern of operators to implement specific interfaces for functions with unique requirements.
The number of operators is too great to expound upon in this article. The following snippet contains two examples of the options we’ve implemented so far, namely CASE
and +
:
export enum ConditionalOperator { case = 'CASE', } export const TConditionalOperator = Utils.enum<ConditionalOperator>(ConditionalOperator); export enum MathOperator { add = '+', } export const TMathOperator = Utils.enum<MathOperator>(MathOperator); export const TExpressionOperator = t.union([ TConditionalOperator, TMathOperator, ]); export type ExpressionOperator = t.TypeOf<typeof TExpressionOperator>; export const TBaseExpressionOperatorItem = t.type({ operator: TExpressionOperator, }); export type BaseExpressionOperatorItem = t.TypeOf< typeof TBaseExpressionOperatorItem >; export interface IExpressionOperatorItem { [ConditionalOperator.case]: ConditionalCaseItem; [MathOperator.add]: MathAddItem; } export type ExpressionOperatorItem = BaseExpressionOperatorItem & IExpressionOperatorItem[ExpressionOperator]; export const TExpressionOperatorItem: t.Type<ExpressionOperatorItem> = t.recursion('ExpressionOperatorItem', () => t.intersection([ TBaseExpressionOperatorItem, t.union([TConditionalCaseItem, TMathAddItem]), ]) );
The key takeaway from this snippet is the structure. We created an interface where each property name is a computed property value, derived from one of the enumerated operators. Then, we pair that with the io-ts
shape to match as usual.
Here’s how the ExpressionOperatorItem
structure is composed:
- Define logical groups of operators e.g.
MathOperator
andConditionalOperator
- Widen the set of available operators by creating a union named
ExpressionOperator
- Create the
BaseExpressionOperatorItem
base interface for all operators to extend - Next, create the
IExpressionOperatorItem
interface, which creates a dictionary of available operator interfaces, keyed on the computed properties of each corresponding operator - Finally, define the mutually recursive
ExpressionOperatorItem
type, which is the intersection of theBaseExpressionOperatorItem
and all the possible operators shapes provided by the dictionary.
Given this structure, we were able to define a wide variety of operators. So far, we’ve implemented quite a few operators and continue to add as we need. To date we’ve implemented:
- Comparison operators
BETWEEN
=
>
>=
IN
IS
<
<=
<>
- Conditional operator
CASE
- Logical operators
AND
OR
NOT
- Math operators
+
/
^
%
*
()
-
- Pattern operators
LIKE
~
!~
SIMILAR TO
- String operators
||
IS NORMALIZED
OVERLAY
POSITION
SUBSTRING
TRIM
- Type operator
CAST
Because each operator is unique, each shape is allowed to have any criteria that may or may not overlap any other operator. This allows for unlimited extension of the structure.
The next sections describe the definitions and provides examples of the +
operator and the CASE
operator.
Simple Math Operator Example
The MathAddItem
operator provides a simple example of implementing the +
SQL operator in JSON.
Here is an example usage of SQL +
:
SELECT A.item_count + 1;
Here is the interface:
export const TMathExpressionItem = t.intersection([ TBaseExpressionOperatorItem, t.type({ operator: TMathOperator, }), ]); export type MathExpressionItem = t.TypeOf<typeof TMathExpressionItem>; export interface MathAddItem extends MathExpressionItem { source: ExpressionItem; operator: MathOperator.add; target?: ExpressionItem; } export const TMathAddItem: t.Type<MathAddItem> = t.recursion('MathAddItem', () => t.intersection([ TMathExpressionItem, t.type({ source: TExpressionItem, operator: t.literal(MathOperator.add), }), t.partial({ target: TExpressionItem, }), ]), );
📝 The
operator
property is the discriminator and, in the case ofMathAddItem
, this is the literal value'+'
.
The source
and target
properties are two addends, both of which are ExpressionItem
.
This means we can sum any of the following items: columns, values, functions (whose parameters are also expressions), nested sub-selects, and nested operators, and any combination thereof.
Here is the simple math example SQL statement above, as encoded in the schema:
{ "select": [ { "source": { "correlation": "A", "column": "item_count" }, "operator": "+", "target": { "value": 1 } } ] }
Complex Case Statement Example
In the more complex ConditionalCaseItem
operator, we build up the representation of an SQL CASE
statement in two parts.
Consider this example of a SELECT
statement containing a CASE
statement:
SELECT CASE WHEN status IS NULL THEN 0 WHEN UPPER(status) IN ('OPEN', 'REMODEL') THEN 1 ELSE 2 END;
In order to support this type of structure, we created an interface to define the shape of the WHEN
. This structure consists of a search condition paired with the result of the statement when that condition evaluates to true.
To complete the overall structure of the CASE
statement, we composed a larger structure having the discriminating operator
, a collection of when
items, and an optional else
expression.
Here is the complete CASE
statement interface:
export interface ConditionalCaseWhenItem { where: ExpressionItem; then: ExpressionItem; } export const TConditionalCaseWhenItem: t.Type<ConditionalCaseWhenItem> = t.recursion('ConditionalCaseWhenItem', () => t.type({ where: TExpressionItem, then: TExpressionItem, }), ); export interface ConditionalCaseItem extends BaseExpressionOperatorItem { operator: ConditionalOperator.case; when: ConditionalCaseWhenItem[]; else?: ExpressionItem; } export const TConditionalCaseItem: t.Type<ConditionalCaseItem> = t.recursion('ConditionalCaseItem', () => t.intersection([ TBaseExpressionOperatorItem, t.type({ operator: t.literal(ConditionalOperator.case), when: t.array(TConditionalCaseWhenItem), }), t.partial({ else: TExpressionItem, }), ]), );
📝 The
operator
property is the discriminator and the value isCASE
.
The previous CASE
example, encoded in the schema, looks like this:
{ "select": [ { "operator": "CASE", "when": [ { "where": { "source": { "column": "status" }, "operator": "IS", "target": { "value": null } }, "then": { "value": 0 } }, { "where": { "source": { "functionName": "UPPER", "arguments": [{ "column": "status" }] }, "operator": "IN", "values": [{ "value": "OPEN" }, { "value": "REMODEL" }] }, "then": { "value": 1 } } ], "else": { "value": 2 } } ] }
What’s Next?
In this article, I explained how we encoded general SQL expression using a number of io-ts schemas. The ExpressionItem
is highly extensible and supports a wide variety of shapes. Some of the shapes, such as ExpressionValueItem
, are simple and straightforward. While other more complex shapes, such as ExpressionOperatorItem
pivot around a common operator
and provide an entire dictionary of possibilities.
Moreover, the ExpressionItem
has a mutually recursive relationship with the SubSelect
. This means that the definition of ExpressionItem
relies on SubSelect
and vice versa. Specifically, one of the possible shapes of an ExpressionItem
is the SubSelect
. Moreover, many parts of the SubSelect
allow for the specification of ExpressionItem
.
In the next article in the series, we’ll explore even more SubSelect
shapes, such as how to join tables with the FromItem
and how to filter results with the WhereItem
. We’ll even go into SQL set processing support with UNION
and INTERSECT
. Finally, we’ll discuss the WithItem
, which provides support for common-table expressions. At the end, we’ll have described a schema with complete FullSelect
support.
In later articles, we’ll dive deep into the SQL parser responsible for converting this complex schema into an SQL SELECT
statement.
But we’re not done there. Remember, all of this is in a Nestjs application. We have yet to discuss the Nestjs DTOs and the beautiful Swagger documentation, all of which is covered later on.