BigQuery SQL Syntax: Differences with Standard SQL

Dan LeBlanc
April 6, 2023

This article will outline some key differences for writing the dialect of SQL used in BigQuery, called GoogleSQL, as it differs from standard SQL syntax used in Redshift and Snowflake.

Previously, BigQuery supported a more divergent SQL dialect for BigQuery (now referred to as Legacy SQL).

Data Types

BigQuery does not support a number of different data types that are used in other platforms. For instance:

  • STRING is used instead of CHAR or VARCHAR.
  • There is no TIMESTAMP without a timezone in BigQuery. Bigquery uses UTC.
  • BIGDECIMAL is used instead of NUMBER or DECIMAL.
  • You cannot set DEFAULT for data types or for constraints such as NULL or NOT NULL.

Notes on Functions

BigQuery requires certain differences around functions due to differences in its architecture. Three notable ones that we’ve encountered:

  • MD5 is stored as a "byte" in BigQuery. Therefore, you must use TO_HEX(MD5) as a function to convert the MD5 to a string.
  • Truncating timestamps with ::TIMESTAMP or ::DATE does not work. Instead, you must use DATE() or DATETIME_TRUNC() to get just the date.
  • ILIKE (i.e., the case insensitive LIKE) does not exist! You should change your query to use LOWER on the comparisons.

CASCADE Support

The CASCADE argument, which eliminates a dataset and all resources within that dataset, is only available for DROP SCHEMA, not DROP TABLE.

In order to drop all the resources for a table you need to explicitly call out those resources as CASCADE does not work at the table level.

BigQuery SQL Gotchas

Commenting in BigQuery

When running SQL via ODBC avoid using multi-line comments like this:


-- ---------------------------------------------------
-- create common analytics schemas
-- ---------------------------------------------------

Instead, you need to use:


/* 
create common analytics schemas
*/

Or:


--
-- create common analytics schemas
--

Hot Tip: SQL execution via ODBC stalls when using the first mode of multi-line comment.

Using CLUSTER BY in BigQuery

When using CLUSTER BY with more than one field, do not use (), as BigQuery interprets fields inside parenthesis as a STRUCT.


CREATE TABLE IF NOT EXISTS SCHEMA.TABLE_NAME { 
 field_1 STRING, 
 field_2 STRING, 
 field_3 STRING
)
CLUSTER BY (field_1, field_2);

Instead, you need to use:


CREATE TABLE IF NOT EXISTS SCHEMA.TABLE_NAME { 
 field_1 STRING, 
 field_2 STRING, 
 field_3 STRING
)
CLUSTER BY field_1, field_2;

Other SQL Gotchas That We’ve Found

  • Don't name a column and a CTE the same, as it will interpret the column as an array.
  • ::DATE does not work. Use DATE() instead.
  • NVL does not work. Use COALESCE instead.
  • DELETE for upsert (i.e., update or insert) needs to have a JOIN.
  • SPLIT_PART does not work. Must use SPLIT, but that will turn what’s returned into an array, so we recommend something specific: e.g., SPLIT(email, '@')[OFFSET(0)] and SPLIT(email, '@')[OFFSET(1)].
  • UNION needs to be UNION ALL.

Keep Reading

Looking to choose a data warehouse for your brand?

We wrote an article comparing Snowflake, Redshift, and BigQuery: Choosing a Cloud Data Warehouse

Looking to build a custom data stack?

Daasity supports the varied data architecture, analytics, and reporting needs of consumer brands selling via eCommerce, Amazon, retail, and/or wholesale. Using Daasity, teams across an organization get a centralized and normalized view of all their data, regardless of the tools in their tech stack and how their future data needs may change. 

For more information about Daasity, our 60+ integrations, and how the platform drives more profitable growth for 1600+ brands, learn more here.

< src="https://unpkg.com/@lottiefiles/lottie-player@latest/dist/lottie-player.js" defer>