Cluster-based access controls

Problem

Cockroach Labs faced a major problem: enterprise customers were unwilling to onboard multiple teams. Expansion revenue within the enterprise customer segment was a growth lever that company leadership was keen to activate further.

Given that a typical evaluation took around 6 months, we had to deliver at least some of the access control asks needed to get teams started by June 2023.

Solution

Supporting enterprise customers' needs for robust access controls required a multi-layered approach. After refactoring the permissions platform supporting CockroachDB Cloud, we layered on features like cluster-based access, unified roles for both human and service accounts, and updated UIs for managing users.

Cluster-scoped access control

Cluster-scoped roles allowed individual engineering teams to manage access to their own CockroachDB clusters.

Unified roles for human and service accounts

The significant refactor of the authorization system to unify human and service account authorization allowed us to not only develop more robust roles, but also assign the same roles to both human and service accounts.

Revised access management UIs

Borrowing patterns from the existing user invitation flow, we updated the role assignment UIs to support both cluster-scoped roles and the paradigm of multiple roles per user.

Impact

10+ new enterprise expansion evaluations

More than 10 enterprise customers began product and service migration evaluations for additional products once cluster-based access controls were delivered.

5 de-risked enterprise contract renewals


Increased developer velocity


Approach

The company had a working hypothesis that enterprise customers would be able to leverage CockroachDB's differentiators most effectively. Barring financial considerations, we had expected more migration evaluations, signaled by more users being granted access to a customer's CockroachDB account.

Yet deal analysis showed that expansion revenue was falling behind projections. User analytics indicated that most customers' new user count nosedived to 0 after the 3-month mark.

This pattern gave my team and me a starting point: excluding financial considerations, what had to be true for enterprise customers to be comfortable adding more users, i.e. onboarding more teams who in turn would begin migration evaluations?

Insights

To answer this question, my team and I scheduled interviews with more than 30 customers in the enterprise segment about what would give them the confidence to onboard their next team to CockroachDB. Of everything we learned, three insights stood out:

Teams had to manage their own clusters

In our interviews, customers mentioned that if they had multiple teams using CockroachDB, at minimum it'd be necessary for those teams to manage access to their own clusters. Jack from team A shouldn't be able to misconfigure clusters managed by Tony's team B and vice versa.

At the time, anyone with access to a customer's account had access to every cluster within that account. Meeting this requirement meant devising access controls that could limit, or "scope", a user's access to individual clusters.
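The scoping requirement described above can be illustrated with a minimal sketch. This is not CockroachDB Cloud's actual implementation: the `RoleGrant` type, the `cluster_admin` role name, the cluster IDs, and the `can_manage` helper are all hypothetical names chosen for illustration. The assumption is simply that each role grant carries an optional cluster, and a grant without one applies account-wide.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class RoleGrant:
    principal: str             # who holds the role (hypothetical user name)
    role: str                  # hypothetical role name
    cluster_id: Optional[str]  # None means the grant covers the whole account


def can_manage(grants: Iterable[RoleGrant], principal: str, cluster_id: str) -> bool:
    """Return True if `principal` holds a managing role that covers `cluster_id`."""
    return any(
        g.principal == principal
        and g.role == "cluster_admin"
        and g.cluster_id in (None, cluster_id)
        for g in grants
    )


# Jack manages team A's cluster, Tony manages team B's cluster.
grants = [
    RoleGrant("jack", "cluster_admin", "team-a-cluster"),
    RoleGrant("tony", "cluster_admin", "team-b-cluster"),
]
```

Under these assumptions, `can_manage(grants, "jack", "team-b-cluster")` is false, which is the guarantee customers asked for: Jack cannot misconfigure a cluster managed by Tony's team.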

Programmatic management of clusters

During our interviews, more than 3/4 of our enterprise customers mentioned they used an infrastructure-as-code tool like Terraform or Pulumi to coordinate 3rd-party cloud infrastructure services within their ecosystem. Typically these tools leveraged service accounts to interact with services like CockroachDB through public APIs.

Knowing this usage pattern signaled that programmatic access via public APIs would be critical to enterprise customers. As CockroachDB service accounts used a different permission model than human accounts, we realized that consistency mattered. We had to find some way for human and service accounts to utilize the same permission model.
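One way to picture a shared permission model is a single role table that is consulted the same way for every kind of account. The sketch below is an assumption-laden illustration, not the real authorization system: the role names, action strings, and the `is_allowed` helper are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Set


class PrincipalKind(Enum):
    HUMAN = "human"
    SERVICE_ACCOUNT = "service_account"


@dataclass(frozen=True)
class Principal:
    id: str
    kind: PrincipalKind


# One role table shared by every kind of principal (hypothetical roles and actions).
ROLE_ACTIONS: Dict[str, Set[str]] = {
    "cluster_operator": {"cluster:read", "cluster:scale"},
    "cluster_viewer": {"cluster:read"},
}


def is_allowed(assignments: Dict[str, Set[str]], principal: Principal, action: str) -> bool:
    """Check `action` against the roles assigned to `principal`.

    The check never inspects `principal.kind`, so human and service
    accounts are authorized by exactly the same rules.
    """
    roles = assignments.get(principal.id, set())
    return any(action in ROLE_ACTIONS.get(role, set()) for role in roles)


alice = Principal("alice", PrincipalKind.HUMAN)
terraform = Principal("terraform-bot", PrincipalKind.SERVICE_ACCOUNT)

# Both principals receive the same role; a principal may hold several roles.
assignments = {
    "alice": {"cluster_operator"},
    "terraform-bot": {"cluster_operator"},
}
```

Because the account kind plays no part in the decision, a role that works for a person in the UI behaves identically for a service account driven by an infrastructure-as-code tool.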

Momentum mattered

Enterprise customers also shared that momentum was key for migration evaluations. If a team was told they could begin experimenting with CockroachDB, it had to be possible to provision access quickly to the team conducting the evaluation.

While access to services like CockroachDB was usually managed through a central identity management tool, in the case of evaluations most enterprises were willing to sidestep the normal access policies. Customers shared that even when tools were approved for use, ongoing evaluations typically meant access was shared from within the service's provided access management UIs.

Thus, while programmatic access via service accounts was key for getting a team running with CockroachDB in production, we had to ensure that our UI was also usable and durable for the long term.

Benchmarking

Though our customer conversations provided insights into the needs we were solving for, there were still a lot of questions about how "cluster-scoped access controls" and "unified user and service account permissions" would actually manifest in our product experience.

Analyzing our existing product experience, I noted how CockroachDB already supported "users" and "roles", and how those concepts were reinforced by the layout of our access management UIs. The challenge was how to introduce the third decision of "scope" in a way that was unobtrusive to our customers.

Design Patterns

Now clear that the main design challenge was introducing this notion of "scope" to our product, I shifted my attention to how scope, users, and roles were presented by industry-leading cloud infrastructure products.

While there were interface and interaction differences from one product to the next, scope was often the first decision presented to users in other products, followed by selecting a user and then finally designating a specific role for that user.

Exploration

Seeing the scope-user-role pattern repeated across multiple industry-leading infrastructure products was a good signal that the pattern had merit and was a good starting point.

Exploring scope-first solutions, I realized that a scope-first solution required dramatically changing how access was granted via our admin dashboard. While introducing new patterns and/or new UI components wasn't off limits, I was concerned that introducing dramatic changes in the midst of this significant permissions platform refactor would place a possibly unnecessary mental load on our enterprise customers.

Discussing this with my design lead and team, we all agreed to explore additional solutions.

Prototype

In search of an alternative, I returned to the concepts of scope, user, and role. Scope-user-role was out, but could another permutation of that mental model work better for us?

Going back to our design system, I also noted that our user invitation modal had a two-input layout and supported inviting multiple users. Was there a way to repurpose or extend this UI to support cluster-based access controls?

Given the existing user-role mental model, I hypothesized that extending it to a user-scope-role model would help our customers adapt to a world where access could be scoped down to specific objects.

Testing

Healthily skeptical of my hypothesis, I scheduled a number of user testing sessions with both customers and folks from our sales and customer success organizations to validate the approach. Unsurprisingly, issues surfaced.

For example, participants couldn't tell what a role allowed them to do from its name alone.

Nor did they really understand some of the affordances I had intended to help our users transition into a world with more roles.

That said, users had no problem with the roles control!


Incorporating the feedback from our user testing and adjusting to some development challenges surfaced by our work, we were able to deliver cluster-based access controls in June 2023.

Delivery & next steps

Over the next several months, we saw an uptick in new users being added to at least 10 enterprise customers' accounts, followed by those customers reaching out to their account management teams to discuss starting evaluations for migrating additional use cases over to CockroachDB.

As customers onboarded more teams, we soon realized that cluster-scoped roles were just the tip of the iceberg and that we had a long way to go towards fully supporting enterprise customers' needs. Towards that goal, my team shifted gears to introducing Folders and updating billing for CockroachDB Cloud.