Grammar of JSON Queries
See here for a more comprehensive tutorial
Introduction
The format of this grammar approximates Extended Backus-Naur notation. However it is intended as input to human beings, not to parser generators such as Lex or Yacc. Do not expect formal rigor. Sometimes narrative text will explain things that are clumsy to express in formal notation. More often, the text will restate or summarize the formal productions.
Conventions:
The grammar is a series of productions.
A production consists of a name, followed by "::=", followed by a definition for the name. The name identifies a grammatical construct that can appear on the right side of another production.
Literals (including punctuation) are enclosed in 'single quotes', or in "double quotes" if case is not significant.
A single quotation mark within a literal is escaped with a preceding backslash: 'dog\'s tail'.
If a construct can be defined more than one way, then the alternatives may appear in separate productions; or, they may appear in the same production, separated by pipe symbols. The choice between these representations is of only cosmetic significance.
A construct enclosed within square brackets is optional.
A construct enclosed within curly braces may be repeated zero or more times.
JSON allows arbitrary white space between tokens. To avoid ugly clutter, this grammar ignores the optional white space.
In many cases a production defines a JSON object, i.e. a list of name-value pairs, separated by commas. Since the order of these name/value pairs is not significant, the grammar will not try to show all the possible sequences. In general it will present the required pairs first, if any, followed by any optional elements.
Since both EBNF and JSON use curly braces and square brackets, pay close attention to whether these characters are in single quotes. If they're in single quotes, they are literal elements of the JSON notation. Otherwise they are elements of the EBNF notation.
Primitives
We'll start by defining some primitives, to get them out of the way. They're mostly just what you would expect.
|
When json_query requires an integral value, it will usually accept a quoted string and convert it to an integer by brute force -- to zero if necessary. Likewise it may truncate a floating point number to an integral value. Scientific notation will be accepted but may not give the intended results.
|
The preferred way to encode a boolean is with the JSON reserved word true or false, in lower case without quotation marks. The string "true", in upper, lower, or mixed case, is another way to encode true. Any other string evaluates to false.
As an accommodation to Perl, numbers may be used as booleans. A numeric value of 1 means true, and any other numeric value means false.
Any other valid JSON value, such as an array, will be accepted as a boolean but interpreted as false.
The last couple of primitives aren't really very primitive, but we introduce them here for convenience:
|
A class_name is a special case of a string: the name of a class as defined by the IDL. The class may refer either to a database table or to a source_definition, which is a subquery.
|
A field_name is another special case of a string: the name of a non-virtual field as defined by the IDL. A field_name is also a column name for the table corresponding to the relevant class.
Query
The following production applies not only to the main query but also to most subqueries.
|
Except for the "distinct" and "no_i18n" entries, each name/value pair represents a major clause of the SELECT statement. The name/value pairs may appear in any order.
There is no name/value pair for the GROUP BY clause, because json_query generates it automatically according to information encoded elsewhere.
The "distinct"
entry, if present and true, tells json_query that
it may have to create a GROUP BY clause. If not present, it defaults to false.
The "no_i18n"
entry, if present and true, tells json_query to
suppress internationalization. If not present, it defaults to false. (Note that
"no_i18n"
contains the digit one, not the letter ell.)
The values for limit
and offset
provide the
arguments of the LIMIT and OFFSET clauses, respectively, of the SQL statement. Each
value should be non-negative, if present, or else the SQL won't work.
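For example, here is a sketch of a complete query using several of these optional entries. The "aou" class and its columns come from the examples later in this document; the literal values are arbitrary:
{ "from":"aou", "select": { "aou":[ "id", "name" ] }, "where": { "parent_ou":3 }, "limit":10, "offset":20 }
The resulting SQL would look roughly like this:
SELECT "aou".id AS "id", "aou".name AS "name" FROM actor.org_unit AS "aou" WHERE ( "aou".parent_ou = 3 ) LIMIT 10 OFFSET 20;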
FROM Clause
The object identified by "from" encodes the FROM clause of the SQL. The associated value may be a string, an array, or a JSON object.
|
If from_list is a class_name, then json_query inserts the corresponding table name or subquery into the FROM clause, using the class_name as an alias for the table or subquery. The class must be defined as non-virtual in the IDL.
|
If from_list is a JSON array, then it represents a table-like function from which the SQL statement will select rows, using a SELECT clause consisting of "SELECT *" (regardless of the select_list supplied by the method parameter).
The first entry in the array is the name of the function. It must be a string naming a stored function. Each subsequent entry is a function parameter. If it is a string or a number, json_query will insert it into a comma-separated parameter list, enclosed in quotes, with any special characters escaped as needed. If it is the JSON reserved word null, json_query will insert it into the parameter list as a null value.
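For example, a from_list of this kind might look like the following sketch (the function name is hypothetical, not one defined anywhere; the parameter values are arbitrary):
{ "from": [ "my.stored_function", "param1", 42, null ] }
The resulting SQL would look roughly like this:
SELECT * FROM my.stored_function( 'param1', '42', NULL );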
If from_list is a JSON object, it must contain exactly one entry.
The key of this entry must be the name of a non-virtual class defined in the IDL.
This class will be the top-level class of the FROM clause, the only one named
outside of a JOIN clause.
|
If the associated data is a class_name, json_query will construct an INNER JOIN clause joining the class to the top-level class, using the columns specified by the IDL for such a join.
Otherwise, the associated data must be a JSON object with one or more entries, each entry defining a join:
|
The data portion of the "join_type" entry tells json_query whether to use a left join, right join, full join, or inner join. The values "left", "right", and "full", in upper, lower, or mixed case, have the obvious meanings. If the "join_type" entry has any other value, or is not present, json_query constructs an inner join.
The "field"
and "fkey"
attributes specify the
columns to be equated in the join condition. The "field"
attribute refers to the column in the joined table, i.e. the one named by the
join_def
. The "fkey"
attribute refers to the
corresponding column in the other table, i.e. the one named outside the
join_def
-- either the top-level table or a table named by some
other join_def
.
It may be tempting to suppose that "fkey" stands for "foreign key", and therefore refers to a column in the child table that points to the key of a parent table. Resist the temptation; the labels are arbitrary. The json_query method doesn't care which table is the parent and which is the child.
These relationships are best explained with an example. The following from_list:
{ "aou": { "asv": { "type" : "left", "fkey" : "id", "field" : "owner" } } }
...turns into the following FROM clause:
FROM actor.org_unit AS "aou" LEFT JOIN action.survey AS "asv" ON ( "asv".owner = "aou".id )
Note in this example that "fkey" refers to a column of the class "aou", and "field" refers to a column of the class "asv".
If you specify only one of the two columns, json_query will try to identify the other one from the IDL. However, if you specify only the column from the parent table, this attempt will probably fail.
If you specify both columns, json_query will use the column names you specify, without verifying them with a lookup in the IDL. By this means you can perform a join using a linkage that the IDL doesn't define. Of course, if the columns don't exist in the database, the query will fail when json_query tries to execute it.
Using the columns specified, either explicitly or implicitly, the json_query method constructs a join condition. With raw SQL it is possible (though rarely useful) to join two tables by an inequality. However the json_query method always uses a simple equality condition.
Using a "filter"
entry in the join_def, you can apply one
or more additional conditions to the JOIN clause, typically to restrict the
join to certain rows of the joined table. The data associated with the
"filter"
key is the same sort of
where_condition
that you use for a WHERE clause
(discussed below).
If the string associated with the "filter_op"
entry is
"OR"
in upper, lower, or mixed case, then the json_query
method uses OR to connect the standard join condition to any additional
conditions supplied by a "filter"
entry.
(Note that if the where_condition
supplies multiple
conditions, they will be connected by AND. You will probably want to move
them down a layer -- enclose them in parentheses, in effect -- to avoid a
confusing mixture of ANDs and ORs.)
If the "filter_op"
entry carries any other value, or if
it is absent, then the json_query method uses AND. In the absence of a
"filter"
entry, "filter_op"
has no effect.
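For example, adding a "filter" to the earlier join looks like this sketch (the filter column and literal are illustrative only):
{ "aou": { "asv": { "type":"left", "fkey":"id", "field":"owner", "filter": { "id": { ">": 10 } } } } }
The resulting FROM clause would look roughly like this:
FROM actor.org_unit AS "aou" LEFT JOIN action.survey AS "asv" ON ( "asv".owner = "aou".id AND ( "asv".id > 10 ) )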
A "join"
entry in a join_def
specifies
another layer of join. The class named in the subjoin is joined to the class
named by the join_def
to which it is subordinate. By this
means you can encode multiple joins in a hierarchy.
SELECT Clause
If a query does not contain an entry for "select", json_query will construct a default SELECT clause. The default includes every non-virtual field from the top-level class of the FROM clause, as defined by the IDL. The result is similar to SELECT *, except:
The default includes only the fields defined in the IDL.
The columns will appear in the same order in which they appear in the IDL, regardless of the order in which the database defines them.
There are other ways to specify a default SELECT list, as shown below.
If a "select"
entry is present, the associated value must
be a JSON object, keyed on class names:
|
The class_name must identify either the top-level class or a class belonging to one of the joins. Otherwise json_query will silently ignore the select_list.
|
If a field_list is either the JSON reserved word null (in lower case) or an asterisk in double quotes, json_query constructs a default SELECT list -- provided that the class is the top-level class of the query. If the class belongs to a join somewhere, json_query ignores the field_list.
More commonly, the field_list is a JSON array of zero or more field specifications:
|
If the array is empty, json_query will construct a default SELECT list for the class -- again, provided that the class is the top-level class in the query.
In the simplest case, a field specification may name a non-virtual field defined in the IDL:
|
In some cases json_query constructs a call to the oils_i18n_xlate function to internationalize the value of the selected column. Specifically, it does so if all the following are true:
the settings file defines a locale;
in the field definition for the field in the IDL, the tag "i18n" is present and true;
the query does not include the "no_i18n" tag (or includes it with a value of false).
A field specification may be a JSON object:
|
The "column"
entry provides the column name, which must
be defined as non-virtual in the IDL.
The "alias"
entry provides a column alias. If no alias
is specified, json_query uses the column name as its own alias.
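For example (the alias "org_name" is arbitrary):
{ "from":"aou", "select": { "aou":[ "id", { "column":"name", "alias":"org_name" } ] } }
The resulting SQL would look roughly like this:
SELECT "aou".id AS "id", "aou".name AS "org_name" FROM actor.org_unit AS "aou";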
The "aggregate"
entry has no effect on the SELECT clause
itself. Rather, it affects the construction of a GROUP BY class. If there
is an "aggregate"
entry for any field, then json_query builds
a GROUP BY clause listing every column that is not tagged
for aggregation (or that carries an "aggregate"
entry with
a value of false). If all columns are tagged for
aggregation, then json_query omits the GROUP BY clause.
|
When a transform_spec is present, json_query selects the return value of a function instead of selecting the column directly. The entry for "transform" provides the name of the function, and the column name (as specified by the "column" tag), qualified by the class name, is the argument to the function. For example, you might use such a function to format a date or time, or otherwise transform a column value. You might also use an aggregate function such as SUM, COUNT, or MAX (possibly together with the "aggregate" tag).
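For example, the following sketch counts rows per parent_ou by combining a "transform" of count with the "aggregate" tag (the alias is arbitrary):
{ "from":"aou", "select": { "aou":[ "parent_ou", { "column":"id", "transform":"count", "aggregate":true, "alias":"num_children" } ] } }
The resulting SQL would look roughly like this:
SELECT "aou".parent_ou AS "parent_ou", count("aou".id ) AS "num_children" FROM actor.org_unit AS "aou" GROUP BY "aou".parent_ou;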
The "result_field"
entry, when present, specifies a subcolumn
of the function's return value. The resulting SQL encloses the function call
in parentheses, and follows it with a period and the subcolumn name.
The "params"
entry, if present, provides a possibly empty
array of additional parameter values, either strings, numbers, or nulls:
|
Such parameter values are enclosed in single quotes, with any special characters escaped as needed, and inserted after the column name as additional parameters to the function. You might, for example, use an additional parameter to provide a format string for a reformatting function.
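For example, this sketch passes two extra parameters to the PostgreSQL substr function (the parameter values are arbitrary):
{ "from":"aou", "select": { "aou":[ { "column":"name", "transform":"substr", "params":[ 1, 3 ] } ] } }
The resulting SQL would look roughly like this:
SELECT substr("aou".name, '1', '3' ) AS "name" FROM actor.org_unit AS "aou";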
WHERE Clause
There are two types of where_condition: objects and arrays. Of these, the object type is the more fundamental, and occurs at some level in every where_condition. The array type is mainly a way of circumventing a limitation of the object type.
The object type of where_condition is a comma-separated list of one or more conditions:
|
The generated SQL will include a code fragment for each condition, joined by AND (or in some cases by OR, as described below). As usual for entries in a JSON object, each condition consists of a unique string to serve as a key, a colon, and an associated value.
The key string may be the name of a column belonging to the relevant table, or it may be an operator string. In order to distinguish it from any possible column name, an operator string always begins with a plus sign or minus sign.
JSON requires that every key string be unique within an object. This requirement imposes some awkward limitations on a JSON query. For example, you might want to express two conditions for the same column: id > 10 and id != 25. Since each of those conditions would have the same key string, namely "id", you can't put them into the same JSON object.
The solution is to put such conflicting conditions in separate JSON objects, and put the objects into an array:
|
The resulting SQL encloses each subordinate set of conditions in parentheses, and connects the sets with AND (or in some cases OR, as described below). It's possible to put only a single where_condition in the array; the result is to add a layer of parentheses around the condition.
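For example, the two conditions mentioned above (id > 10 and id != 25) can be expressed like this, assuming the top-level class is "aou":
"where": [ { "id": { ">": 10 } }, { "id": { "!=": 25 } } ]
The resulting WHERE clause would look roughly like this:
WHERE ( "aou".id > 10 ) AND ( "aou".id != 25 )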
There are two kinds of condition where the operator begins with a plus sign. In the simpler case, the associated data is simply a column name:
|
A plus_class is a string that begins with a plus sign. The rest of the string, after the plus sign, must be the class name for the table to which the column belongs.
If the column is a boolean, then the resulting SQL uses it (as qualified by the class name) as a stand-alone condition.
Otherwise, this kind of syntax provides a way to place a column on the right side of a comparison operator. For example:
{ "from":"aou", "select": { "aou":[ "id", "name" ] }, "where": { "id": { ">": { "+aou":"parent_ou" } } } }
The resulting SQL:
SELECT "aou".id AS "id", "aou".name AS "name" FROM actor.org_unit AS "aou" WHERE ( "aou".id > ( "aou".parent_ou ) );
The other type of condition that uses a plus_class applies a specified class name to a where_condition:
|
The resulting SQL is enclosed in parentheses, and qualifies the columns with the specified class name. This syntax provides a mechanism to shift the class context -- i.e. to refer to one class in a context that would otherwise refer to a different class.
Ordinarily the class name must be a valid non-virtual class defined in the IDL, and applicable to the associated where_condition. There is at least one peculiar exception. The JSON fragment:
"+abc": { "+xyz":"frobozz" }
...is rendered as:
( "xyz".frobozz )
...even though neither "abc", nor "xyz", nor "frobozz" is defined in the IDL. The class name "abc" isn't used at all because the "+xyz" operator overrides it. Such a query won't fail until json_query tries to execute it in the database.
The other operators that may occur at this level all begin with a minus sign, and they all represent familiar SQL operators. For example, the "-or" operator joins the conditions within a where_condition by OR (instead of the default AND), and encloses them all in parentheses:
|
In fact the "-or" operator is the only way to get OR into the WHERE clause.
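For example, using columns from the "aou" class (the literal values are arbitrary):
"where": { "-or": { "id": 10, "parent_ou": 3 } }
The resulting WHERE clause would look roughly like this:
WHERE ( "aou".id = 10 OR "aou".parent_ou = 3 )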
The "-and"
operator is similar, except that it uses AND:
|
Arguably the "-and" operator is redundant, because you can get the same effect by wrapping the subordinate where_condition in a JSON array. Either technique merely adds a layer of parentheses, since AND connects successive conditions by default.
The "-not" operator expands the subordinate where_condition within parentheses, and prefaces the result with NOT:
|
The "-exists"
or "-not-exists"
operator
constructs a subquery within an EXISTS or NOT EXISTS clause, respectively:
|
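For example, this sketch reuses the classes from the join example; the subquery simply checks whether any survey is owned by org unit 1 (the literal is arbitrary):
"where": { "-exists": { "from":"asv", "select": { "asv":[ "id" ] }, "where": { "owner": 1 } } }
The resulting WHERE clause would look roughly like this:
WHERE EXISTS ( SELECT "asv".id AS "id" FROM action.survey AS "asv" WHERE ( "asv".owner = 1 ) )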
The remaining kinds of condition all have a field_name on the left and some kind of predicate on the right. A predicate places a constraint on the value of the column -- or, in some cases, on the value of the column as transformed by some function call:
|
The simplest such constraint is to require that the column have a specific value, or be null:
|
You can also compare a column to a literal value using some kind of inequality. However it's a bit more complicated because you have to specify what kind of comparison to make:
|
A compare_op is a string that defines a comparison operator.
Valid values include the following:
= <> != < > <= >= ~ ~* !~ !~* like ilike similar to
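For example, a case-insensitive pattern match (the pattern is arbitrary):
"where": { "name": { "ilike": "%branch%" } }
The resulting WHERE clause would look roughly like this:
WHERE "aou".name ilike '%branch%'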
Strictly speaking, json_query accepts any compare_op that doesn't contain semicolons or white space (or "similar to" as a special exception). As a result, it is possible -- and potentially useful -- to use a custom operator like ">100*" in order to insert an expression that would otherwise be difficult or impossible to create through a JSON query. The ban on semicolons and white space prevents certain kinds of SQL injection.
Note that json_query does not accept two operators that PostgreSQL does accept: "is distinct from" and "is not distinct from".
You can also compare a column to a null value:
|
The equality operator "=" turns into IS NULL. Any other operator turns into IS NOT NULL.
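For example:
"where": { "parent_ou": { "!=": null } }
The resulting WHERE clause would look roughly like this:
WHERE "aou".parent_ou IS NOT NULL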
When a compare_op is paired with an array, it defines a function call:
|
The first entry in the array is the function's name. Subsequent entries in the array, if any, represent the parameters of the function call. They may be strings, numbers, or nulls. In the generated SQL, the function call appears on the right of the comparison.
The "between"
operator creates a BETWEEN clause:
|
Although PostgreSQL allows a null value in a BETWEEN clause, json_query requires literal non-null values. It isn't sensible to use null values in a BETWEEN clause. A few experiments show that the results of the comparison are peculiar and erratic.
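For example (the literal values are arbitrary):
"where": { "id": { "between": [ 10, 20 ] } }
The resulting WHERE clause would look roughly like this:
WHERE ( "aou".id BETWEEN 10 AND 20 )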
There are two ways to create an IN list of allowed values. The simplest is to put literal values into a JSON array:
|
As with BETWEEN clauses, json_query does not accept null values in an IN list, even though PostgreSQL does allow them. Nulls are not sensible in this context because they never match anything.
The other way to create an IN list is to use an explicit "in" operator with an array of literal values. This format also works for the "not in" operator:
|
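For example (the literal values are arbitrary):
"where": { "id": { "not in": [ 1, 3, 5 ] } }
The resulting WHERE clause would look roughly like this:
WHERE "aou".id NOT IN ( 1, 3, 5 )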
Another kind of IN or NOT IN clause uses a subquery instead of a list of values:
|
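For example, the following sketch restricts org units to those that own at least one survey, reusing the classes from the join example:
"where": { "id": { "in": { "from":"asv", "select": { "asv":[ "owner" ] } } } }
The resulting WHERE clause would look roughly like this:
WHERE "aou".id IN ( SELECT "asv".owner AS "owner" FROM action.survey AS "asv" )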
The remaining types of predicate can put a function call on the left of the comparison, by using a transform_spec together with a "value" tag. The transform_spec is optional, and if you don't need it, the same SQL would in many cases be easier to express by other means.
The transform_spec construct was described earlier in connection with the SELECT clause, but here it is again:
|
As in the SELECT clause, the "transform" string names the function. The first parameter is always the column identified by the field_name. Additional parameters, if any, appear in the param_list. The "result_field" string, if present, identifies one column of a multicolumn return value.
Here's a second way to compare a value to a literal value (but not to a null value):
|
...and a way to compare a value to a boolean expression:
|
The final predicate is another way to put a function call on the right side of the comparison:
|
This format is available for the sake of consistency, but offers no advantage over the simpler version.
Having Clause
For the HAVING clause, json_query accepts exactly the same syntax as it accepts for the WHERE clause.
ORDER BY Clause
There are two ways to encode an ORDER BY clause: as an array, or as a list. Either may be empty, in which case the generated SQL will not include an ORDER BY clause:
|
If not empty, the array contains one or more objects, each defining a sort field:
|
The "class"
and "field"
entries are
required, and of course the field must belong to the class. Furthermore, at
least one field from the class must appear in the SELECT clause.
The "direction"
entry, if present, specifies whether the
sort will be ascending or descending for the column in question. If the associated
value begins with "D" or "d", the sort will be descending; otherwise the sort will
be ascending. If the value is a number, it will be treated as a string that does not
start with "D" or "d", resulting in an ascending sort.
In the absence of a "direction"
entry, the sort will be
ascending.
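For example:
"order_by": [ { "class":"aou", "field":"name", "direction":"desc" } ]
The resulting ORDER BY clause would look roughly like this:
ORDER BY "aou".name DESC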
The transform_spec works here the same way it works in the SELECT clause and the WHERE clause, enabling you to pass the column through a transforming function before the sort:
When the order_by_list is an object instead of an array, the syntax is less verbose, but also less flexible. The keys for the object are class names:
Each class must be referenced in the SELECT clause.
As in the SELECT clause, all the fields for a given class must be grouped together. You can't sort by a column from one table, then a column from a second table, then another column from the first table. If you need this kind of sort, you must encode the ORDER BY clause as an array instead of an object.
The data associated with a class_name may be either an array or an object. If an array, it's simply a list of field names, and each field must belong to the class:
|
With this syntax, the direction of sorting will always be ascending.
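For example:
"order_by": { "aou": [ "name", "id" ] }
The resulting ORDER BY clause would look roughly like this:
ORDER BY "aou".name, "aou".id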
If the data is an object, the keys are field names, and as usual the fields must belong to the class:
|
Since the field_name is the key for the object, it cannot appear more than once. As a result, some kinds of sorts are not possible with this syntax. For example, one might want to sort by UPPER( family_name ), and then by family_name with case unchanged, to make sure that "diBona" comes before "Dibona". For situations like this, you must encode the ORDER BY clause as an array rather than an object.
The data associated with each field_name may take either of two forms. In the simplest case, it's a literal value to specify the direction of sorting:
|
If the literal is a string starting with "D" or "d", json_query sorts the field in descending order. Otherwise it sorts the field in ascending order.
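For example (the direction strings are arbitrary):
"order_by": { "aou": { "name": "desc", "id": "asc" } }
The resulting ORDER BY clause would look roughly like this:
ORDER BY "aou".name DESC, "aou".id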
In other cases, the field_name may be paired with an object to specify more details:
|
As before, the value tagged as "direction" specifies the direction of the sort, depending on the first character. If not otherwise specified, the sort direction defaults to ascending.
Also as before, the transform_spec may specify a function through which to pass the column.
Since both the "direction" tag and the transform_spec are optional, the object may be empty:
|
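Putting the pieces together, a sketch of a descending sort on an upper-cased name, using the PostgreSQL upper function, might look like this:
"order_by": { "aou": { "name": { "transform":"upper", "direction":"desc" } } }
The resulting ORDER BY clause would look roughly like this:
ORDER BY upper("aou".name ) DESC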