Heap Blog


How We Found a Missing Scala Class

Here’s a fun tale of debugging that starts off with a common alert nagging our on-call rotation, leads us to a very confusing NoClassDefFoundError, and ends up with us knowing a lot more about dynamic tracing in the JVM. An on-call annoyance: We use Flink to compute near-real-time aggregations based on incoming events. Flink is […]

Virtual Events: Making Data-Driven Decisions a Reality

Our mission is to make it possible for anyone to use data to make better decisions. It should be easy for people to be right, instead of making decisions based on gut or loudness of voice. This same mission can be said to apply to at least a dozen other analytics companies, some dating back […]

Data Virtualization is Reshaping Analytics

A top reason customers choose Heap is for the ability to automatically capture customer behaviors on their website and integrate them with the rest of their customer data stack: shopping carts, payment platforms, CRM, email marketing, and A/B testing suites, to name a few. Heap makes sure that you have all the data you need […]

Making Our APIs Solid, By Breaking Them In Production

Collecting a dataset that our customers can trust is a precondition for our product to be useful. The difference between data collection working most of the time and all of the time is the difference between our product being flaky and being stable. You can only get so far by whiteboarding your stack and trying to […]

Testing Database Changes the Right Way

Under the hood, Heap is powered by a petabyte-scale cluster of Postgres instances. Our dataset is large, and our customers run a wide variety of open-ended, ad hoc analytical queries that cover large portions of this dataset. In order to support ever larger customers and more complex queries over such a large amount of data, […]

How We Write Front-end Code

Writing front-end code in a sufficiently complex web app has never been an easy task. With all the view, state management, and routing libraries out there, it can be hard to know how best to fit the pieces together. Through writing many thousands of lines of code for the Heap web app, we’ve found that […]

Enabling New Projects At The Recurse Center

We’re big fans of the Recurse Center. RC runs retreats for programmers, in which engineers spend six to twelve weeks working on self-directed projects. It attracts an eclectic crew of smart, curious people. As such, it’s an excellent organization for us to partner with. We started working with RC in 2015, and since then Recursers […]

How To Structure Permissions In A SaaS App

So you’re building a SaaS product and you want to serve real customers and start making those fat enterprise bucks. Great! Now you need to support weird stuff you’ve never heard of before like LDAP, SAML, SSO, and… RBAC. What is RBAC? Role-Based Access Control is a system for organizing permissions and specifying who […]

Decrypting PgBouncer’s Diagnostic Information

If you use Postgres at scale, at some point you’ll need a connection pooler. Postgres lets you configure a maximum number of concurrent connections via max_connections, but if you need to handle bursts of more than a few dozen connections at a time, you probably don’t want to provision a lot of connection slots to […]

Analyzing Millions of Postgres Query Plans

Making Heap fast is a unique and particularly difficult adventure in performance engineering. Our customers run hundreds of thousands of queries per week and each one is unique. What’s more, our product is designed for rich, ad hoc analyses, so the resulting SQL is unboundedly complex. For some background, Heap is a tool for analyzing […]