SQL Segment Examples
Customer Profile Schema
Name | Type | Description |
customer_key | STRING | |
first_name | STRING | |
last_name | STRING | |
full_name | STRING | |
STRING | ||
email_hash | STRING | |
email_organization | STRING | |
email_tld | STRING | |
mobile | STRING | |
gender_derived | STRING | |
first_order_id | STRING | |
last_order_id | STRING | |
app_id | STRING | |
count_purchase | INTEGER | |
monetary | FLOAT | |
average_monetary | FLOAT | |
monetary_discount | FLOAT | |
average_monetary_discount | FLOAT | |
total_order_rating | INTEGER | |
total_product_rating | INTEGER | |
total_product_price_viewed | FLOAT | |
count_product_price_viewed | INTEGER | |
average_order_rating | FLOAT | |
average_product_rating | FLOAT | |
average_product_price_viewed | FLOAT | |
count_landing | INTEGER | |
count_page_view | INTEGER | |
count_product_view | INTEGER | |
count_checkout_view | INTEGER | |
count_email_open | INTEGER | |
count_email_click | INTEGER | |
count_campaign_click | INTEGER | |
count_order_review | INTEGER | |
count_product_review | INTEGER | |
count_unq_landing_page | INTEGER | |
count_unq_page_view | INTEGER | |
count_unq_product_view | INTEGER | |
count_unq_checkout | INTEGER | |
count_unq_campaign_medium | INTEGER | |
ts_first_purchase | TIMESTAMP | |
ts_last_purchase | TIMESTAMP | |
ts_first_clickstream | TIMESTAMP | |
ts_last_clickstream | TIMESTAMP | |
ts_first_page_view | TIMESTAMP | |
ts_last_page_view | TIMESTAMP | |
ts_first_product_view | TIMESTAMP | |
ts_last_product_view | TIMESTAMP | |
ts_first_checkout | TIMESTAMP | |
ts_last_checkout | TIMESTAMP | |
ts_first_email_open | TIMESTAMP | |
ts_last_email_open | TIMESTAMP | |
ts_first_email_click | TIMESTAMP | |
ts_last_email_click | TIMESTAMP | |
ts_updated | TIMESTAMP |
Customer Events Schema
Name | Type | Description |
app_id | STRING | generally this should match the domain name without the tld, e.g. serenataflowers, hipper |
customer_key | STRING | this is set by UDF: generate key(email, mobile, customer_id) |
STRING | ||
email_hash | STRING | to get hash as a text string from email address: to_hex(sha256(email_address)) |
email_valid_format | BOOLEAN | flag for valid email format |
email_organization | STRING | e.g. gmail, hotmail, live, outlook |
email_tld | STRING | e.g. co.uk, .com, .net |
mobile | STRING | |
first_name | STRING | |
last_name | STRING | |
gender_derived | STRING | uses lookup table: 'sf-hipper.pa.gender` (we should move this elsewhere) based on first_name |
full_name | STRING | concatenated first and last. may also contain 'title' |
customer_id | STRING | customer id as per business source |
ts_event | TIMESTAMP | generally the time the event took place, i.e. not ingestion time |
ts_inserted | TIMESTAMP | the timestamp of the etl |
visitor_id | STRING | from clickstream visitor_domain_id_src which is the snowplow: visitor_domain_id column |
visitor_network_id | STRING | the network_id is non-domain specific, i.e. 3rd party cookie_id |
event_type | STRING | clickstream, purchase, review, page, email, ticket, preference |
event_subtype | STRING | view, landing, click (on site action), impression, order_review, product_review |
event | STRING | generally for clickstream events and email (TBD) only: opt-in, opt-out, subscribe, unsubscribe, page_view, page_landing, product_view, product_click, product_landing, checkout_view, add_to_cart, checkout_landing, order_confirmation_view, blog_view, blog_landing, campaign_impression, experiment_impression, email_send,email_impression, email_unsbuscribe, email_subscribe, email_bounce, product_impression ie. from clickstream_log.event_action |
event_source | STRING | for clickstream events this is: relative_url_domain_org (domain, less tld) (e.g. google for external clicks, serenataflowers for internal) for non clickstream events we set this to the data source, e.g. feefo, trustpilot |
event_source_url | STRING | this is 'relative_url_path' from clickstream_log, i.e.the referring url, less the query string. NULL for other events |
event_source_host | STRING | this is 'relative_url_host' from clickstream_log, i.e.the referring host. NULL for other events |
order_id | STRING | |
orderline_id | STRING | |
order_total_net | FLOAT | order total is always the sum of all order lines Net Revenue, AFTER discount and Tax (currently sales data is stored this way already in sales_amount_net) |
order_total_tax | FLOAT | vat or sales tax amount |
order_total_discount | FLOAT | |
ts_previous_order | TIMESTAMP | the time of the previous order of this customer |
ts_first_order | TIMESTAMP | the time of first order of this customer |
order_sequence | INTEGER | this is the sequence (frequency) of the current order_id for a given customer |
order_recency | INTEGER | recency in days since the last order for this customer |
ts_delivery_scheduled | TIMESTAMP | the scheduled order delivery date and time. if a specific time window then this is set to the max of the timewindow. e.g. if 8-10am on 1/7 is specified, then this is 2018-07-01 10:00:00 |
ts_dispatch_scheduled | TIMESTAMP | |
currency_code | STRING | ISO 3 letter code |
customer_language | STRING | as set by customer on the website for preferred language, get via sales feed |
customer_address_country | STRING | |
customer_address_province | STRING | e.g. Texas for US, Scotland for UK |
customer_address_county | STRING | e.g. Kent for UK |
customer_address_city | STRING | |
customer_address_postal_code | STRING | |
customer_address_line1 | STRING | |
customer_address_line2 | STRING | |
delivery_address_country | STRING | ISO 2 letter code |
delivery_address_province | STRING | |
delivery_address_county | STRING | |
delivery_address_city | STRING | |
delivery_address_locality | STRING | |
delivery_address_postal_code | STRING | |
fulfillment_company | STRING | |
fulfillment_branch | STRING | |
carrier | STRING | |
order_rating | INTEGER | on a 1-5 scale |
order_review_text | STRING | |
product_name | STRING | |
product_sku | STRING | |
product_sku_base | STRING | the product_sku parent or template |
product_type | STRING | |
product_is_addon | BOOLEAN | |
product_price | FLOAT | this is the gross product price before any discount or tax, typically what is displayed in product_views |
product_price_net | FLOAT | this is the net sales mount per product, i.e. product_price minus tax and discount (the accounting price) |
product_quantity | INTEGER | |
product_rating | INTEGER | on a 1-5 scale |
product_review_text | STRING | |
session_id | STRING | |
page_view_id | STRING | |
page_type | STRING | |
page_language | STRING | |
page_title | STRING | this is the meta title of the page, i think this is picked up by snowplow but we currently don't (but should) include it in our clickstream_log transform process |
url_path | STRING | url without the querystring |
url_host | STRING | |
url_query | STRING | we include this for completness, in theory seldom needed, but some site use the query string to render variants of pages and to avoid missing out those nuances we include the fully unparsed query string here |
url_fragment | STRING | some sites use this instead of query params |
site_search_query | STRING | need to be defined, per client, a parsing rule for how to extract the query string from a particular url on the client site |
impression_placement | STRING | generally used only for impression of a page element for 'page events' such as experiments and product impressions. typical sf values are: standard, add-on-listing, recommend |
impression_row | INTEGER | |
impression_column | INTEGER | |
experiment | STRING | used for tracking experiments (a/b testing), this is typically a name given to a particular experiment such as 'responsive_layout' |
experiment_variant | STRING | name of a particular variation in an a/b test, e.g. 'control, 'show_responsive_page' |
campaign_media_type | STRING | generally all campaigns variables are picked up from the standard utm_ parameters (, which are standard across the industry. however we need to extend the standard utm_ variables to include: utm_media_type with possible values of paid/earned. Generally, only paid media campaigns have ANY utm parameters set. So when we detect for example a search visit (e.g. from google) with no utm or other parameter, then we set this to 'earned' |
campaign_source | STRING | generally utm_source as picked up by snowplow |
campaign_medium | STRING | generally utm_medium as picked up by snowplow |
campaign_type | STRING | extended utm param: utm_campaign_type |
campaign | STRING | generally utm_campaign as picked up by snowplow |
campaign_content | STRING | generally utm_content as picked up by snowplow |
campaign_keyword | STRING | |
campaign_search_query | STRING | generally utm_term as picked up by snowplow |
campaign_click_id | STRING | whilst we currently don't get adwords click report, we could, and if so this ties the click as parsed from clickstream into the exact adcreative, camapign, keyword etc. which is useful for attribution. this data is available (via api) for adwords and bing |
sales_promotion_campaign | STRING | from sales/purchase event feed |
sales_promotion_campaign_type | STRING | from sales/purchase event feed |
sales_promotion_code | STRING | from sales/purchase event feed |
sales_promotion_medium | STRING | from sales/purchase event feed |
sales_promotion_discount_type | STRING | discount, percent |
ticket_id | STRING | for customer service tickets (we might want to check the most common columns used by the likes of ZenDesk and other SaaS services that offer a ticketing system. https://github.com/snowplow/snowplow/wiki/Zendesk-webhook-setup |
ticket_type | STRING | |
ticket_text | STRING | free text input of a customer service ticket |
ip_address | STRING | |
ip_number | INTEGER | the ip address represented as an integer |
ip_continent | STRING | |
ip_timezone | STRING | |
ip_country_code | STRING | ISO code |
ip_country | STRING | |
ip_province | STRING | |
ip_county | STRING | |
ip_town | STRING | |
ip_postal_code | STRING | |
ip_longitude | FLOAT | |
ip_latitude | FLOAT | |
ip_organization | STRING | |
ip_is_proxy | BOOLEAN | |
os_family | STRING | |
os_name | STRING | |
browser_family | STRING | e.g. chrome, internet explorer, safari |
browser_version | STRING | |
browser_language | STRING | language and locale, e.g. gb:en, us:en |
browser_width | INTEGER | the width in pixels of the browser window |
browser_height | INTEGER | the height in pixels of the browser window |
browser_pixels | FLOAT | the total number of pixels, width and height of browser divided by 1,000,000 (i.e. MegaPixels) |
device_type | STRING | mobile, computer, tablet, game console |
device_screen_width | INTEGER | the width in pixels of the device |
device_screen_height | INTEGER | the height in pixels of the device |
device_screen_pixels | FLOAT | the total number of pixels, width and height of the device divided by 1,000,000 (i.e. MegaPixels) |
order_review_id | STRING | the source ref for an order review |
product_review_id | STRING | the source ref for a product review |
review_heading | STRING | the headline for the review |
review_sentiment | FLOAT | the sentiment score 0-1 |
is_verified | BOOLEAN | is review done by verified customer |
preference | STRING | name of preference, e.g. ‘newsletter’ |
preference_medium | STRING | where preference was made from e.g. preference center |
preference_frequency | STRING | how often emails can be sent to the subscriber |
preference_delay_from | DATE | |
preference_delay_to | DATE | |
preference_unsubscribe_reason | STRING | |
preference_consent_text | STRING | the text used in checkbox “I agree” |
SQL Examples
Birthday reminders
event IN ('page_view', 'page_landing') AND date(ts_event) = current_date - interval 30 days AND url_path = '/en/uk/flowers/next-day-delivery/birthday'
AND event_type = 'clickstream'