Heap Blog


Making Our APIs Solid, By Breaking Them In Production

Collecting a dataset that our customers can trust is a precondition for our product to be useful. The difference between data collection working most of the time and all of the time is the difference between a flaky product and a stable one. You can only get so far by whiteboarding your stack and trying to […]

Testing Database Changes the Right Way

Under the hood, Heap is powered by a petabyte-scale cluster of Postgres instances. Our dataset is large, and our customers run a wide variety of open-ended, ad hoc analytical queries that cover large portions of this dataset. In order to support ever larger customers and more complex queries over such a large amount of data, […]

How We Write Front-end Code

Writing front-end code in a sufficiently complex web app has never been an easy task. With all the view, state management, and routing libraries out there, it can be hard to know how best to fit the pieces together. Through writing many thousands of lines of code for the Heap web app, we’ve found that […]

Enabling New Projects At The Recurse Center

We’re big fans of the Recurse Center. RC runs retreats for programmers, in which engineers spend six to twelve weeks working on self-directed projects. It attracts an eclectic crew of smart, curious people. As such, it’s an excellent organization for us to partner with. We started working with RC in 2015, and since then Recursers […]

How To Structure Permissions In A SaaS App

So you’re building a SaaS product and you want to serve real customers and start making those fat enterprise bucks. Great! Now you need to support weird stuff you’ve never heard of before like LDAP, SAML, SSO, and… RBAC. What is RBAC? Role-Based Access Control is a system for organizing permissions and specifying who […]
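The core idea behind RBAC is small enough to sketch: users are assigned roles, and roles carry permissions, so a permission check reduces to a lookup across the user's roles. The role and permission names below are invented for illustration; they are not Heap's actual model.

```python
# Minimal RBAC sketch: users are granted roles, roles grant permissions.
# All role/permission names here are illustrative.

ROLE_PERMISSIONS = {
    "viewer": {"read_reports"},
    "analyst": {"read_reports", "create_reports"},
    "admin": {"read_reports", "create_reports", "manage_users"},
}

USER_ROLES = {
    "alice": {"admin"},
    "bob": {"viewer", "analyst"},
}

def has_permission(user: str, permission: str) -> bool:
    """A user has a permission if any of their roles grants it."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )

print(has_permission("bob", "create_reports"))  # True: granted via the analyst role
print(has_permission("bob", "manage_users"))    # False: admins only
```

The indirection through roles is the point: granting a new hire access means assigning one role, not editing dozens of per-user permission entries.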

Decrypting PgBouncer’s Diagnostic Information

If you use Postgres at scale, at some point you’ll need a connection pooler. Postgres lets you configure a maximum number of concurrent queries via max_connections, but if you need to handle bursts of more than a few dozen connections at a time, you probably don’t want to provision a lot of connection slots to […]
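A pooler like PgBouncer sits between clients and Postgres, multiplexing many client connections onto a small number of server connection slots. A minimal `pgbouncer.ini` fragment gives the flavor; the hostname, database name, and numbers below are examples, not recommendations:

```ini
; Illustrative pgbouncer.ini fragment -- values are examples only.
[databases]
; Clients connect to "analytics" on the pooler; PgBouncer forwards
; to the actual Postgres server behind it.
analytics = host=10.0.0.5 port=5432 dbname=analytics

[pgbouncer]
listen_port = 6432
; In transaction pooling mode, a server connection is returned to the
; pool after each transaction, so many clients share few server slots.
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
```

With settings like these, a burst of up to 1000 client connections is served by only 20 Postgres backends, keeping `max_connections` on the server small.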

Analyzing Millions of Postgres Query Plans

Making Heap fast is a unique and particularly difficult adventure in performance engineering. Our customers run hundreds of thousands of queries per week and each one is unique. What’s more, our product is designed for rich, ad hoc analyses, so the resulting SQL is unboundedly complex. For some background, Heap is a tool for analyzing […]

Migrating To React + MobX While Shipping New Features

A year ago our front-end was written in a cumbersome combination of Backbone, TypeScript, and a custom state management layer. It was maintainable, but we wanted to ship features faster than it would let us. We wanted to migrate to a React + MobX architecture, but we couldn’t afford to spend six months rewriting most […]

Terraform Gotchas And How We Work Around Them

Heap’s infrastructure runs on AWS, and we manage it using Terraform. This post is a collection of tips and gotchas we’ve picked up along the way. Terraform and infrastructure as code: Terraform is a tool from HashiCorp to help manage infrastructure declaratively. Instead of manually creating instances, networks, and so on in your cloud provider’s […]
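"Declarative" here means you describe the desired end state and let Terraform work out the create/update/delete steps. A minimal HCL sketch, with an invented AMI ID and tag, shows the shape:

```hcl
# Illustrative Terraform resource -- the AMI ID and names are examples.
# This block declares desired state; "terraform plan" shows the diff
# against reality, and "terraform apply" reconciles it.
resource "aws_instance" "web" {
  ami           = "ami-0abcdef1234567890"
  instance_type = "t3.micro"

  tags = {
    Name = "example-web"
  }
}
```

Rerunning `terraform apply` against an unchanged config is a no-op, which is what makes the state declarative rather than a script of imperative steps.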

How Basic Performance Analysis Saved Us Millions

This is the story of how I applied basic performance analysis techniques to find a small change that resulted in a 10x improvement in CPU use for our Postgres cluster and will save Heap millions of dollars over the next year. Indexing data for customer analytics: Heap is a customer analytics tool that automatically captures […]

Redshift Pitfalls And How To Avoid Them

Amazon Redshift is a data warehouse that’s orders of magnitude cheaper than traditional alternatives. Many companies use it because it has made data warehousing viable for smaller companies with a limited budget. Since so many Heap customers use Redshift, we built Heap SQL to allow them to sync their Heap datasets to their own Redshift clusters. […]