SQL Template Examples
Customer Profile Schema
Name | Type | Description |
customer_key | STRING | |
first_name | STRING | |
last_name | STRING | |
full_name | STRING | |
STRING | ||
email_hash | STRING | |
email_organization | STRING | |
email_tld | STRING | |
mobile | STRING | |
gender_derived | STRING | |
first_order_id | STRING | |
last_order_id | STRING | |
app_id | STRING | |
count_purchase | INTEGER | |
monetary | FLOAT | |
average_monetary | FLOAT | |
monetary_discount | FLOAT | |
average_monetary_discount | FLOAT | |
total_order_rating | INTEGER | |
total_product_rating | INTEGER | |
total_product_price_viewed | FLOAT | |
count_product_price_viewed | INTEGER | |
average_order_rating | FLOAT | |
average_product_rating | FLOAT | |
average_product_price_viewed | FLOAT | |
count_landing | INTEGER | |
count_page_view | INTEGER | |
count_product_view | INTEGER | |
count_checkout_view | INTEGER | |
count_email_open | INTEGER | |
count_email_click | INTEGER | |
count_campaign_click | INTEGER | |
count_order_review | INTEGER | |
count_product_review | INTEGER | |
count_unq_landing_page | INTEGER | |
count_unq_page_view | INTEGER | |
count_unq_product_view | INTEGER | |
count_unq_checkout | INTEGER | |
count_unq_campaign_medium | INTEGER | |
ts_first_purchase | TIMESTAMP | |
ts_last_purchase | TIMESTAMP | |
ts_first_clickstream | TIMESTAMP | |
ts_last_clickstream | TIMESTAMP | |
ts_first_page_view | TIMESTAMP | |
ts_last_page_view | TIMESTAMP | |
ts_first_product_view | TIMESTAMP | |
ts_last_product_view | TIMESTAMP | |
ts_first_checkout | TIMESTAMP | |
ts_last_checkout | TIMESTAMP | |
ts_first_email_open | TIMESTAMP | |
ts_last_email_open | TIMESTAMP | |
ts_first_email_click | TIMESTAMP | |
ts_last_email_click | TIMESTAMP | |
ts_updated | TIMESTAMP |
Customer Events Schema
Name | Type | Description |
app_id | STRING | generally this should match the domain name without the tld, e.g. serenataflowers, hipper |
customer_key | STRING | this is set by UDF: generate key(email, mobile, customer_id) |
STRING | ||
email_hash | STRING | to get hash as a text string from email address: to_hex(sha256(email_address)) |
email_valid_format | BOOLEAN | flag for valid email format |
email_organization | STRING | e.g. gmail, hotmail, live, outlook |
email_tld | STRING | e.g. co.uk, .com, .net |
mobile | STRING | |
first_name | STRING | |
last_name | STRING | |
gender_derived | STRING | uses lookup table: 'sf-hipper.pa.gender` (we should move this elsewhere) based on first_name |
full_name | STRING | concatenated first and last. may also contain 'title' |
customer_id | STRING | customer id as per business source |
ts_event | TIMESTAMP | generally the time the event took place, i.e. not ingestion time |
ts_inserted | TIMESTAMP | the timestamp of the etl |
visitor_id | STRING | from clickstream visitor_domain_id_src which is the snowplow: visitor_domain_id column |
visitor_network_id | STRING | the network_id is non-domain specific, i.e. 3rd party cookie_id |
event_type | STRING | clickstream, purchase, review, page, email, ticket, preference |
event_subtype | STRING | view, landing, click (on site action), impression, order_review, product_review |
event | STRING | generally for clickstream events and email (TBD) only: opt-in, opt-out, subscribe, unsubscribe, page_view, page_landing, product_view, product_click, product_landing, checkout_view, add_to_cart, checkout_landing, order_confirmation_view, blog_view, blog_landing, campaign_impression, experiment_impression, email_send,email_impression, email_unsbuscribe, email_subscribe, email_bounce, product_impression ie. from clickstream_log.event_action |
event_source | STRING | for clickstream events this is: relative_url_domain_org (domain, less tld) (e.g. google for external clicks, serenataflowers for internal) for non clickstream events we set this to the data source, e.g. feefo, trustpilot |
event_source_url | STRING | this is 'relative_url_path' from clickstream_log, i.e.the referring url, less the query string. NULL for other events |
event_source_host | STRING | this is 'relative_url_host' from clickstream_log, i.e.the referring host. NULL for other events |
order_id | STRING | |
orderline_id | STRING | |
order_total_net | FLOAT | order total is always the sum of all order lines Net Revenue, AFTER discount and Tax (currently sales data is stored this way already in sales_amount_net) |
order_total_tax | FLOAT | vat or sales tax amount |
order_total_discount | FLOAT | |
ts_previous_order | TIMESTAMP | the time of the previous order of this customer |
ts_first_order | TIMESTAMP | the time of first order of this customer |
order_sequence | INTEGER | this is the sequence (frequency) of the current order_id for a given customer |
order_recency | INTEGER | recency in days since the last order for this customer |
ts_delivery_scheduled | TIMESTAMP | the scheduled order delivery date and time. if a specific time window then this is set to the max of the timewindow. e.g. if 8-10am on 1/7 is specified, then this is 2018-07-01 10:00:00 |
ts_dispatch_scheduled | TIMESTAMP | |
currency_code | STRING | ISO 3 letter code |
customer_language | STRING | as set by customer on the website for preferred language, get via sales feed |
customer_address_country | STRING | |
customer_address_province | STRING | e.g. Texas for US, Scotland for UK |
customer_address_county | STRING | e.g. Kent for UK |
customer_address_city | STRING | |
customer_address_postal_code | STRING | |
customer_address_line1 | STRING | |
customer_address_line2 | STRING | |
delivery_address_country | STRING | ISO 2 letter code |
delivery_address_province | STRING | |
delivery_address_county | STRING | |
delivery_address_city | STRING | |
delivery_address_locality | STRING | |
delivery_address_postal_code | STRING | |
fulfillment_company | STRING | |
fulfillment_branch | STRING | |
carrier | STRING | |
order_rating | INTEGER | on a 1-5 scale |
order_review_text | STRING | |
product_name | STRING | |
product_sku | STRING | |
product_sku_base | STRING | the product_sku parent or template |
product_type | STRING | |
product_is_addon | BOOLEAN | |
product_price | FLOAT | this is the gross product price before any discount or tax, typically what is displayed in product_views |
product_price_net | FLOAT | this is the net sales mount per product, i.e. product_price minus tax and discount (the accounting price) |
product_quantity | INTEGER | |
product_rating | INTEGER | on a 1-5 scale |
product_review_text | STRING | |
session_id | STRING | |
page_view_id | STRING | |
page_type | STRING | |
page_language | STRING | |
page_title | STRING | this is the meta title of the page, i think this is picked up by snowplow but we currently don't (but should) include it in our clickstream_log transform process |
url_path | STRING | url without the querystring |
url_host | STRING | |
url_query | STRING | we include this for completness, in theory seldom needed, but some site use the query string to render variants of pages and to avoid missing out those nuances we include the fully unparsed query string here |
url_fragment | STRING | some sites use this instead of query params |
site_search_query | STRING | need to be defined, per client, a parsing rule for how to extract the query string from a particular url on the client site |
impression_placement | STRING | generally used only for impression of a page element for 'page events' such as experiments and product impressions. typical sf values are: standard, add-on-listing, recommend |
impression_row | INTEGER | |
impression_column | INTEGER | |
experiment | STRING | used for tracking experiments (a/b testing), this is typically a name given to a particular experiment such as 'responsive_layout' |
experiment_variant | STRING | name of a particular variation in an a/b test, e.g. 'control, 'show_responsive_page' |
campaign_media_type | STRING | generally all campaigns variables are picked up from the standard utm_ parameters (, which are standard across the industry. however we need to extend the standard utm_ variables to include: utm_media_type with possible values of paid/earned. Generally, only paid media campaigns have ANY utm parameters set. So when we detect for example a search visit (e.g. from google) with no utm or other parameter, then we set this to 'earned' |
campaign_source | STRING | generally utm_source as picked up by snowplow |
campaign_medium | STRING | generally utm_medium as picked up by snowplow |
campaign_type | STRING | extended utm param: utm_campaign_type |
campaign | STRING | generally utm_campaign as picked up by snowplow |
campaign_content | STRING | generally utm_content as picked up by snowplow |
campaign_keyword | STRING | |
campaign_search_query | STRING | generally utm_term as picked up by snowplow |
campaign_click_id | STRING | whilst we currently don't get adwords click report, we could, and if so this ties the click as parsed from clickstream into the exact adcreative, camapign, keyword etc. which is useful for attribution. this data is available (via api) for adwords and bing |
sales_promotion_campaign | STRING | from sales/purchase event feed |
sales_promotion_campaign_type | STRING | from sales/purchase event feed |
sales_promotion_code | STRING | from sales/purchase event feed |
sales_promotion_medium | STRING | from sales/purchase event feed |
sales_promotion_discount_type | STRING | discount, percent |
ticket_id | STRING | for customer service tickets (we might want to check the most common columns used by the likes of ZenDesk and other SaaS services that offer a ticketing system. https://github.com/snowplow/snowplow/wiki/Zendesk-webhook-setup |
ticket_type | STRING | |
ticket_text | STRING | free text input of a customer service ticket |
ip_address | STRING | |
ip_number | INTEGER | the ip address represented as an integer |
ip_continent | STRING | |
ip_timezone | STRING | |
ip_country_code | STRING | ISO code |
ip_country | STRING | |
ip_province | STRING | |
ip_county | STRING | |
ip_town | STRING | |
ip_postal_code | STRING | |
ip_longitude | FLOAT | |
ip_latitude | FLOAT | |
ip_organization | STRING | |
ip_is_proxy | BOOLEAN | |
os_family | STRING | |
os_name | STRING | |
browser_family | STRING | e.g. chrome, internet explorer, safari |
browser_version | STRING | |
browser_language | STRING | language and locale, e.g. gb:en, us:en |
browser_width | INTEGER | the width in pixels of the browser window |
browser_height | INTEGER | the height in pixels of the browser window |
browser_pixels | FLOAT | the total number of pixels, width and height of browser divided by 1,000,000 (i.e. MegaPixels) |
device_type | STRING | mobile, computer, tablet, game console |
device_screen_width | INTEGER | the width in pixels of the device |
device_screen_height | INTEGER | the height in pixels of the device |
device_screen_pixels | FLOAT | the total number of pixels, width and height of the device divided by 1,000,000 (i.e. MegaPixels) |
order_review_id | STRING | the source ref for an order review |
product_review_id | STRING | the source ref for a product review |
review_heading | STRING | the headline for the review |
review_sentiment | FLOAT | the sentiment score 0-1 |
is_verified | BOOLEAN | is review done by verified customer |
preference | STRING | name of preference, e.g. ‘newsletter’ |
preference_medium | STRING | where preference was made from e.g. preference center |
preference_frequency | STRING | how often emails can be sent to the subscriber |
preference_delay_from | DATE | |
preference_delay_to | DATE | |
preference_unsubscribe_reason | STRING | |
preference_consent_text | STRING | the text used in checkbox “I agree” |
How to use templates
When creating an sql template we can use 1 dynamic value so called placeholder which user is able to change. We have 2 types of placeholders:
- Condition - allows user to change condition and value for given placeholder
- Value - user can change only the value while the condition is predefined in the template
You should be using conditions when given templates can be used in different scenarios where values can be compared differently. For example recency, could be used when its equal for example 1 - first time buyers, or when > 1 when segmenting all repeat buyers.
Using value placeholder requires setting condition in the template for example days = {X}, so template user can change only the days value. See below for all available placeholders and examples.
Condition Placeholders
- {condition:number} - any number column condition, like recency = 10 or recency > 10, where = and 10 are defined by the template user.
- {condition:datetime_interval} - for comparing timestamp columns with interval starting from now; for example ts_event > current_timestamp - interval 2 hours; here "2 hours" is selected by the user
- {condition:date} - any date column condition, like date(ts_event) = '2020-01-01' or date(ts_event) > '2020-01-01', where = and '2020-01-01' are select by the template user.
- {condition:date_range} - comparing a date value in a range, for example date(ts_event) between '2020-01-01' and '2021-01-01'; template user has to select exact from and to dates in this case.
- {condition:text} - any text column condition, where we can have exact matches event = 'purchase' and also partial matches like url like '%birthday%'
- {condition:number[]} - when user can define more than 1 number value for exact matches, for example recency IN (1,2,3) - user can enter 1,2,3 values
- {condition:text[]} - similar to number[] but for multiple text exact
Value Placeholders
- {value:number} - when condition should be fixed, for example event = 'purchase' - user can only specify event value in this case but not the condition.
- {value:date} - allows to specify date value only
- {value:text} - allows to specify text value only
SQL Template Examples
Visited page reminder template
event IN ('page_view', 'page_landing') AND url_path = {value:text} AND date(ts_event) = current_date - interval 370 days
AND event_type = 'purchase'