NatureBox is an innovative food startup reinventing the way customers buy delicious, healthy food. Founded in 2012, NatureBox offers over 100 unique snacks—all made with high-quality ingredients that are free from artificial colors, flavors, or sweeteners. In addition to being delivered straight to your home or office, NatureBox is now available at Target and on Delta flights.
As VP of Engineering at NatureBox, David Lee oversees the company’s Software Engineering, Product Management, Product Design, Data Science, and Business Intelligence teams. With hundreds of products and many more customers, having flexible analytics infrastructure in place is key for the company’s growth. After trying a number of options, the team started using Heap to get deeper insight into customer behavior.
The Early Era with Google Analytics
The first evolution of the team’s approach to analysis involved a familiar tool: Google Analytics. But for the team, Google Analytics was a poor fit for a few significant reasons:
- Too much instrumentation. “We were turning over our codebase once every ~7 months, so instrumentation was always an afterthought. We struggled to add it to our product specs in the early days,” David said.
- Inflexible and not fully self-serve. “We wanted to enable people outside of engineering to get insights on an ad hoc basis and in a more intuitive way that wouldn’t require a product requirements document and dozens of spreadsheets to complete, only to find out later that you had instrumented it wrong. We wanted more flexibility so that we could learn and adapt to what worked best for us,” said David.
- Leads to false confidence in data accuracy. GA comes with a number of built-in reports around funnels and segments. Says David, “These can be really misleading—they’re all superficial, and a lot of assumptions are being made. If a team using GA doesn’t know whether it’s instrumented completely or appropriately, that can lead to some dangerous business decisions.”
- Data stuck in silos. With just GA, there was no easy way to link backend events to web behavior or to extract that data. According to David, “We do a lot of offline processing, like customer billing, order fulfillment, recommendations, and predictive analysis. Our backend system has a lot of data. As a vertical, it’s important to view everything together; you can’t measure each part of that stack individually.”
- Can’t get detailed event insights. Pageview-level data is not super powerful for most apps, and the team lacked time to instrument events individually. “Anyone running ecommerce will tell you that website merchandising—the ordering, pricing, placement, and presentation of products—is incredibly important to what you sell, how fast you sell, and what you believe your customers like. Filters, sorting behavior—that’s important. Because marketing couldn’t see what people clicked on in GA when it came to website merchandising options, they had to correlate information,” said David.
With GA, the team couldn’t see what was actually driving user behavior. With this critical lack of insight, it was time to try something else.
Building Retroactive Tracking with Splunk x GA
After David joined, the team tried to build better tracking and enrich their GA data with a “homebrew solution” combining GA and Splunk, which David jokingly called “The Poor Man’s Heap.” The goal was to get insights retroactively. While more effective than GA alone, the approach didn’t scale.
“We built this by adding JS classes that sent every single click identified by the childmost element back into a custom endpoint, which then fed into Splunk. Then, we’d go back and retroactively identify these events in Splunk,” said David.
“How getting insights looked in practice was, back in 2013, someone would have to ask me or the team a question and we’d have to go back into Splunk, start identifying events, then build reports, then spit out a .csv that they’d consume. But we were talking about one week’s turnaround time to answer a question. I used to say, ‘You have to ask me the right questions, ask me all the questions upfront so that I can divide up my time and dig into everything at once.’ This obviously doesn’t scale. Splunk isn’t for business users, and there was no self-serve functionality,” David said.
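The retroactive approach David describes (capture every click first, decide later which clicks count as an event) can be illustrated with a minimal sketch. This is not NatureBox's actual code, which was JavaScript feeding Splunk. The `RawClick` fields, the `define_event` and `count_event` helpers, and the sample selectors are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class RawClick:
    """One captured click, stored without knowing yet what it means."""
    user_id: str
    page: str       # path of the page where the click happened
    selector: str   # CSS-like path ending at the innermost clicked element


def define_event(page_prefix: str, selector_suffix: str) -> Callable[[RawClick], bool]:
    """Build an event definition after the fact as a predicate over raw clicks."""
    def matches(click: RawClick) -> bool:
        return (click.page.startswith(page_prefix)
                and click.selector.endswith(selector_suffix))
    return matches


def count_event(clicks: Iterable[RawClick], matcher: Callable[[RawClick], bool]) -> int:
    """Count how many stored clicks satisfy an event definition."""
    return sum(1 for click in clicks if matcher(click))


# Made-up sample of clicks that were logged before any event was defined.
raw_clicks = [
    RawClick("u1", "/snacks", "div.grid > div.card > button.add-to-box"),
    RawClick("u2", "/snacks/sweet", "div.card > button.add-to-box"),
    RawClick("u1", "/snacks", "div.filters > a.sort-price"),
    RawClick("u3", "/account", "form > button.save"),
]

# Events are named only now, long after the clicks were recorded.
add_to_box = define_event("/snacks", "button.add-to-box")
sort_by_price = define_event("/snacks", "a.sort-price")
```

Because the raw clicks are kept, a new question only needs a new definition, for example `count_event(raw_clicks, add_to_box)`, rather than new instrumentation and a wait for fresh data.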
A few months into this experiment, David saw an early version of Heap. “[I was] so happy when I saw the Event Visualizer for the first time. I wouldn’t need to teach people how to use our custom setup anymore. Heap was easy to integrate, easy to onboard, data export was simple, and people could get up and running in just five minutes,” he said.
Using Data in Every Decision
The team now uses Heap data in three ways, solving various analytics challenges from their earlier days:
- Information modeling. With Heap, the team can flexibly define events and model how their data is structured in both Heap and their data warehouse. With consistent definitions around their data, unlike with GA, the team can be confident in their analysis.
- Analysis with Heap SQL. Analysts are able to build their own models in a cohesive and flexible way across the company, as well as analyze data from multiple sources in one view, including Heap data via Heap SQL.
- Analysis using Heap. After the transition, Heap data became the team’s primary source of truth for revenue-generating actions like conversions and core company KPIs. Business users (Marketing, Customer Insights, and Customer Service) can retroactively define events and segments, build quick reports, and answer questions using the Heap dashboard, without Engineering’s support.
Analytics Infrastructure Powered by Heap SQL
Using Heap SQL, Looker, and Redshift, the team can combine diverse data sources and analyze them with confidence. To model this data, NatureBox pumps in multiple data sources—from Heap, ERP systems, order fulfillment data, Facebook ads, and more—and performs the necessary transforms, all in Redshift. Heap data provides important context around customer preferences and behaviors.
“We combine the data because we don’t just want to see transaction data, we want to know preferences. All of that is generated by Heap data and pumped into our warehouse, where it’s combined. The way we cut the data—acquisition source, cohorts by product type—is all defined by Heap as well,” David says.
Everyone now understands how data flows into the warehouse from Heap, and if someone wants to see a cut of data that’s grounded in web behavior, they can do it easily.
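To make the idea of combining behavioral and transactional data concrete, here is a small sketch that joins order data to web-derived attributes and cuts revenue by acquisition source. It uses Python's built-in `sqlite3` as a stand-in for Redshift. The table names (`web_users`, `orders`), the columns, and the rows are assumptions for illustration, not NatureBox's or Heap's actual schema.

```python
import sqlite3

# In-memory database standing in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table of attributes derived from web behavior.
    CREATE TABLE web_users (user_id TEXT PRIMARY KEY, acquisition_source TEXT);
    -- Hypothetical table of transactions from a backend order system.
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id TEXT, amount_usd REAL);

    INSERT INTO web_users VALUES
        ('u1', 'paid_search'), ('u2', 'organic'), ('u3', 'paid_search');
    INSERT INTO orders VALUES
        (1, 'u1', 20.0), (2, 'u1', 25.0), (3, 'u2', 30.0), (4, 'u3', 15.0);
""")


def revenue_by_source(connection: sqlite3.Connection) -> dict:
    """Return {acquisition_source: (order_count, total_revenue)}."""
    rows = connection.execute("""
        SELECT w.acquisition_source, COUNT(o.order_id), SUM(o.amount_usd)
        FROM orders AS o
        JOIN web_users AS w ON w.user_id = o.user_id
        GROUP BY w.acquisition_source
        ORDER BY w.acquisition_source
    """).fetchall()
    return {source: (count, total) for source, count, total in rows}
```

The join key is what matters here: once web behavior and backend transactions share a user identifier in one warehouse, any cut defined on the web side can be applied to the transaction side.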
Deep Investigative Analysis with Heap
Business users lean on Heap for “forensic research” when investigating trends or behavior—why did this happen, and how can I do more of it (if it’s good) or stop it (if it’s bad)? Heap helps them get contextual information on the fly, even if it’s not already defined.
When someone notices unusual behavior in another tool, they’ll dive into Heap to get more detail in just minutes.
“Maybe someone sees a spike in orders for a specific product and wants to know why,” David said. “With the Event Visualizer, they get quick wins just from identifying that event and getting context that they need. That goes a long way. Using Heap, they can answer their question and take action in less than 15-20 minutes.”
Creating and Comparing Cohorts
NatureBox has a few key customer segments—some regular subscribers, some who use NatureBox to order office snacks, others who order after-school treats. What are they buying? What are their snacking preferences? How frequently do they order? Heap helps NatureBox build stronger recommendations and engage users meaningfully through individual and cohort analysis.
“We’ve been trying to celebrate our top customers—people who have ordered a set number of times in the last six weeks. It would take way too long to look into an orders grid to find the customers that have >n orders. In Heap, it’s so simple to create a segment with customers with a set order frequency within a time period and watch it grow. Then, I can use List View to see what their clickstream looks like,” David said.
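The segment David describes, customers with more than n orders inside a recent time window, reduces to a filter and a count. The sketch below shows that logic in plain Python as a rough illustration. It is not how Heap computes segments internally, and the `top_customers` function, its parameters, and the sample orders are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Iterable, List, Tuple


def top_customers(
    orders: Iterable[Tuple[str, date]],
    min_orders: int,
    as_of: date,
    window: timedelta = timedelta(weeks=6),
) -> List[str]:
    """Return user IDs with more than `min_orders` orders in the window ending at `as_of`."""
    start = as_of - window
    # Count only orders whose date falls inside the window (inclusive on both ends).
    counts = Counter(user_id for user_id, placed_on in orders
                     if start <= placed_on <= as_of)
    return sorted(user_id for user_id, n in counts.items() if n > min_orders)


# Made-up (user_id, order_date) pairs.
sample_orders = [
    ("u1", date(2016, 6, 1)),
    ("u1", date(2016, 6, 10)),
    ("u1", date(2016, 6, 20)),
    ("u2", date(2016, 4, 1)),   # outside the six-week window
    ("u2", date(2016, 6, 5)),
    ("u2", date(2016, 6, 25)),
    ("u3", date(2016, 3, 1)),   # outside the six-week window
]
```

Calling `top_customers(sample_orders, min_orders=2, as_of=date(2016, 6, 30))` keeps only customers with at least three orders in the preceding six weeks. Re-running it with a later `as_of` date is the manual equivalent of watching the segment grow.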
When they’d like to go one step deeper, Heap’s Compare feature makes it easy for them to create arbitrary definitions for complex personas and understand how event behavior compares across multiple groups. For example, the team can compare a group of users who’ve purchased 5x in the past month and found NatureBox via paid search vs. inactive customers that discovered NatureBox organically. Then, they can better answer questions like, “Which channels are most effective?” and “How are we retaining different user groups?”
Before, with Google Analytics, channels were predefined, and any changes required significant customization. With Heap, the team can flexibly specify what different criteria and channels mean to them.
Contextualizing A/B Test Data
At NatureBox, the team runs frequent A/B tests on how they order and present their many snack options. Because Heap ingests Optimizely data, the NatureBox team can use Heap to go beyond basic pass/fail test results and identify how their experiments impact larger objectives like feature engagement and retention.
With this granular insight, the team has optimized experiments with longer-term business goals in mind and boosted its conversion rate 5x in just six weeks.
On the Horizon
From powering their analytics infrastructure to helping people answer questions in just minutes, the entire team at NatureBox now touches Heap data in some way, helping the team scale faster.
“We now have a single source of truth in our warehouse vs. in software silos, and Heap was the first to offer this as a feature,” David said. “When someone wants to build a campaign or build recommendations and asks, ‘Does this person consume pineapple snacks? Or peanuts?’ We’re finally able to understand that clearly and give customers a better experience.”