Analytics Space : BQ Stored procedures conversion

New table g1-site.clean.etl_calendar allows tracking the last successfully processed datetime for each table, enabling incremental loading and controlled reprocessing.

For the first run record per table should be inserted manually

Structure:

table_id, dataset_name, table_name, last_calculated_date

Example:

1, ‘clean’, ‘bets’, ‘2025-06-20 13:00:00’

Staging tables will be created with correct data types and partitioned by load_date(DAY) (time when record was put into table) using this logic when we process data not by business date all late arrivings will be successfully processed and not take too much resources as well as new data

For readability purposes all parameters (passed when calling sp) will have prefix p_ and all variables (declared and set inside the sp) prefix v_

By default procedures will be running with all parameters set to NULL

Syntax of a stored procedure

CREATE OR REPLACE `g1-site.<dataset_name>.<sp_name>`
(
  p_brandName ARRAY<STRING>,
  p_date_from TIMESTAMP,
  p_date_to TIMESTAMP
-- optional additional parameters
)
BEGIN
-- variables should be declared only at the beginning of sp
DECLARE v_start_time TIMESTAMP;
DECLARE v_end_time TIMESTAMP;
SET v_start_time = COALESCE(p_date_from, (SELECT last_calculated_date FROM g1-site.clean.etl_calendar WHERE table_name = ‘<clean_table_name>’), ‘2025-06-01’);
SET v_end_time = COALESCE(p_date_to, CURRENT_TIMESTAMP());

-- temporary table used for data control and on complex transformations 

CREATE TEMP TABLE t_main
(
target table schema definition
);

INSERT INTO t_main (...)
SELECT 
  <needed_transformed_columns>,
  v_end_time AS process_date
FROM <staging_table_name>
WHERE (p_brand_name IS NULL OR ARRAY_LENGTH(p_brand_name) = 0 OR BrandName IN UNNEST(p_brand_name))
  AND load_date >= v_start_time
  AND load_date < v_end_time;
-- assuming staging table is partitioned by load_date

MERGE INTO <clean_table_name> a
USING t_main t
  ON <key_columns>
WHEN MATCHED THEN 
  UPDATE SET
    <update_columns_list>
WHEN NOT MATCHED THEN
INSERT (...)
VALUES (...);

/*
For simple loading from table1 to table2 and some joins subquery can be used instead of a temp table
MERGE INTO <clean_table_name> a
USING (SELECT FROM <staging>)
…
*/

UPDATE g1-site.clean.etl_calendar
SET last_calculated_date = GREATEST(last_calculated_date, v_end_time)
WHERE table_name = ‘<clean_table_name>’;

END;

Example can be found here - https://github.com/infopandora/Analitycs/blob/main/BigQuery/development/process_bets.sql

Run this code and stored procedure will be saved in specified dataset under the Rountines section.

To execute stored procedure use

CALL <sp_name>(NULL, NULL, NULL);

To recalculate data for some period or some specific brand specify them in parameters

CALL <sp_name>([‘<full_brand_name_1>’, ‘<full_brand_name_2>’, etc], ‘2025-01-01’, ‘2025-02-01’);

In case of recalculation v_end_time will be ‘2025-02-01’ and because last update uses GREATEST(last_calculated_date, v_end_time) it wont interfere with normal flow and the next day will be calculated from the last calculated date and not the last run date.

Important notes:

Avoid using select * where possible;

Keep indentations between keywords to make code easier to debug;

When more than one table is used specify an alias for each column;