Reach out with questions, bugs, or suggestions. We'll respond ASAP. Your message is sent! We'll respond via email shortly.

Heap SQL Overview

Data Syncing

Heap SQL is a dynamic data warehouse and everything in Heap SQL is completely retroactive just like the Heap dashboard. As you define new events and modify existing events, we seamlessly update your cluster schema with all historical data, which ensures that Heap SQL is always a clean and accurate representation of your raw data since you’ve been using Heap. For more information on our sync process, please refer to our documentation.

Heap SQL Schema

In Heap, all data is hierarchical: you have many users who have many sessions and every session has many events. As a result, you can have metadata attached to every level of this hierarchy, which we refer to as user, session and event-level properties. Any information on the user-level propagates to the session and event level, and any information on the session level propagates to the event level.

For example, any user property (e.g. their email address) is available as a property on the session and event level in Heap, and any session property (e.g. UTM Source) is available as a property on the event level. However, any event level property (e.g. URL Path) is not available on the session or user-level as it pertains to only where the event itself occurred. For a complete list of all of the information we collect out of the box for each type of property, please refer to our documentation.

Heap SQL adopts a similarly hierarchical schema. By default, every environment schema will have a users table (for all user information), a sessions table (for all session information) and a table for every defined event in Heap or logged via our API. We’ll also create a table entitled all_events, which includes basic information about every event in the event tables joined together for ease in aggregate analysis. Please refer to our documentation for a detailed description of each table’s schema.

It is important to note that Heap SQL contains all of your defined data in Heap. Out of the box, Heap will automatically collect all user interactions (pageviews, clicks, etc.), which we refer to as raw data. Using our platform, we enable end-users to associate semantic meaning with this raw data. For instance, clicking on a cta.signup might actually mean that a user signed up. Defined events are created when one associates this semantic meaning and given a name such as Sign Up. After a user defines an event, Heap will automatically sync all of the historical data associated with the user interaction in its own table as well as include these events in the all_events table. You can read more about defining events here.

Table Relationships and Joining with Other Data

Joining Heap SQL Tables

Given the hierarchical nature of the data model, joining user information with event data is simple in Heap SQL. All you need to do is join the user table using Heap’s unique User ID with the table(s) you are analyzing.

main_production.users JOIN main_production.[event_name] ON main_production.users.user_id=main_production.[event_name].user_id

The same approach can be taken using a combination of the user and session ID by joining the sessions table with any event table(s) of interest.

Joining With Other Data Sets

Similarly, Heap SQL allows you to easily combine user interaction data with data from other sources such as support, marketing automation, or revenue data. We recommend sending an internal user ID shared between these sources via our Identify API, which is then exposed retroactively on the users table in Heap and ready to join with other sources. In most cases, you’ll need to join the event and users tables first so that the internal ID is available on the event level.

SELECT main_production.users.email, is_won
FROM main_production.users 
JOIN main_production.viewed_customer_page 
    ON main_production.users.user_id=main_production.viewed_costumer_page.user_id 
JOIN salesforce_.accounts 
    ON main_production.users.email=salesforce._account.name

Depending on the structure of your data, this query would show the account email of users who had viewed the customer page and whether or not that contract was won. Analyzing these two sources together can reveal whether or not the customers page is correlated with the number of deals won and help you answer questions like, “How can my sales team push more prospects to view this documentation?” or “How can I change my page to make it a more valuable resource?”

Next Topic: Aggregate Metrics