TL;DR: After solving Twitter’s scaling woes, Evan Weaver and Matt Freels set their sights on creating a massively scalable NoSQL database that doesn’t sacrifice data consistency, security, or relational querying. The result, FaunaDB, is a modern, cloud-native solution for heavily distributed applications that breaks free of traditional client-server-oriented systems. That means financial institutions, eCommerce enterprises, and media or entertainment organizations can seamlessly organize, query, and manage user identities, payments, and other data with accuracy, security, and reliability.
As NVIDIA’s popularization of graphics processing units revolutionized the PC gaming market, parallel computing, and artificial intelligence, the forward-thinking company struggled to keep up with the increasing demand for its customer identity and access management platform.
The system, which enables users to update drivers and optimize settings, as well as capture and share videos, was heavily customized and required significant time and financial investments to change. The user database lived in the cloud but could not reliably scale.
NVIDIA’s Director of Cloud Services Bill Wagner turned to FaunaDB, a relative newcomer in the space of relational NoSQL databases. The cloud-native platform strikes a balance between the operational safety and security of traditional database infrastructures and the agility and scalability of more modern solutions.
“FaunaDB just worked out of the box with minimal operational effort,” Bill said in an online case study. “With FaunaDB, we’re able to support tens of millions of users with a small operational staff. FaunaDB’s advanced features, like global replication, let us maintain high availability and correctness even in the case of unexpected regional outages.”
Since launching FaunaDB services in 2016, NVIDIA has relied on the platform to power several customer-facing services across a variety of cloud environments, including VMware and Amazon Web Services. The databases have provided 100% availability despite partial failures or network partitions and currently serve more than 30,000 requests per second, at one-tenth the cost of ownership of NVIDIA’s previous solution.
Conquering the ‘Fail Whale’ and Applying Real-World Experience
From the outset, FaunaDB has maintained a laser-like focus on the singular goal of building the ideal enterprise database — something that Director of Developer Evangelism and Experience Chris Anderson said makes the company surprisingly unique.
“Everything else has fallen out of an open-source project or was incubated at a big web company,” he said. “When we started out to build an enterprise database, it was based on the co-founder’s experience scaling Twitter. The solution we’re bringing to market is based around what customers want, instead of scratching our own itch.”
As Twitter gained traction in the late 2000s, the service crashed so frequently that users became intimately familiar with Twitter’s colorful over-capacity error page, lovingly dubbed the Fail Whale. Evan Weaver ran the software infrastructure team but couldn’t count on existing databases to scale alongside the rapid mobile-first adoption trends.
Evan and Matt Freels, the technical lead of Twitter’s database team, built special-purpose, highly optimized distributed data storage systems for all of the network’s core workloads. Twitter officially retired the Fail Whale in 2013.
“These systems were inflexible, though, and we always wanted to have a reusable data platform we could rely on so we could stay focused on product development instead of ‘basic’ problems like scaling a site to a half-billion users,” Evan said in a VentureBeat article.
Evan left Twitter in 2011 and started FaunaDB the following year, with Matt joining soon after. The pair aimed to capitalize on the scalability and flexibility of NoSQL databases while also including the enterprise-friendly elements of security, reliability, and consistency found in relational systems. The result harnesses all of the above — while also showcasing business agility, developer productivity, and simplicity that lowers operational costs.
FaunaDB Balances Scaling With Accurate and Consistent Databases
Twitter’s scaling struggles coincided with a larger industry shift to NoSQL, or distributed, databases. Instead of organizing data into the relational tables found in traditional SQL systems, NoSQL databases store it as key-value pairs, documents, graphs, or wide columns, typically without a fixed schema definition.
That flexibility extends to scaling: NoSQL databases scale horizontally by spreading data across additional servers, while traditional SQL databases generally scale vertically, meaning organizations are limited by the amount of CPU, RAM, and storage resources that can be added to a single server.
“NoSQL brought scale to the table, but it had to throw out data integrity to do it,” Chris said. “That was good enough to power a whole wave of database startups going on to becoming mature companies, but it was still not actually what the large companies or enterprises were looking for.”
Unlike other massively scalable databases, FaunaDB holds up to the rigorous demands and regulations of the financial and eCommerce industries. For instance, Capital One recently made FaunaDB a featured technology in the bank’s efforts to revamp legacy business systems.
FaunaDB moves database functionality away from outdated hardware architectures to nimble and secure software-based solutions that operate on all cloud-native platforms, including AWS, Azure, Google Cloud, Docker, Kubernetes, and VMware.
“By putting data integrity and operational safety at the top of the list when we designed Fauna, the product is actually built to be used in demanding environments with real, mission-critical workloads,” Chris said.
Mission-Critical Security, Usability, and Multitenant Capabilities
Beyond data integrity, Chris said he enjoys telling customers and potential users about the simplicity and streamlined functionalities built into FaunaDB. The platform is delivered to customers via a JAR file and runs anywhere you can deploy a Java virtual machine.
“It’s extremely simple to operate and plays well with various parts of the DevOps toolchain,” he said. “FaunaDB is unique among distributed databases for having the simplicity and transactions come together in a way that gives you operational safety.”
FaunaDB runs as a collection of nodes, each of which operates autonomously within a cluster. As nodes are added or removed, they communicate with each other to self-manage the cluster. Nodes are grouped into replicas, each of which contains a full copy of an organization’s data. Redundancy is achieved by operating multiple replicas, Chris said, with safeguards included to make sure organizations are protected from user error or other calamities.
“For instance, the administrative tools won’t let you remove a node that has the last copy of some data,” he said. “We won’t let you put yourself in a position of losing data.”
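The safeguard Chris describes can be pictured with a small toy model. The sketch below is not FaunaDB's administrative tooling or API; the `Cluster` class, the `LastCopyError` exception, and the idea of tracking named data "segments" per node are all hypothetical, introduced only to illustrate refusing a removal that would destroy the last copy of some data.

```python
from dataclasses import dataclass, field


class LastCopyError(Exception):
    """Raised when removing a node would destroy the only copy of some data."""


@dataclass
class Cluster:
    # Hypothetical model: node name -> set of data segment IDs stored on it.
    nodes: dict = field(default_factory=dict)

    def add_node(self, name, segments):
        self.nodes[name] = set(segments)

    def remove_node(self, name):
        held = self.nodes[name]
        # Collect every segment that still exists on some other node.
        elsewhere = set()
        for other, segments in self.nodes.items():
            if other != name:
                elsewhere |= segments
        # Segments that would vanish if this node were removed.
        orphaned = held - elsewhere
        if orphaned:
            raise LastCopyError(
                f"{name} holds the last copy of: {sorted(orphaned)}"
            )
        del self.nodes[name]


if __name__ == "__main__":
    cluster = Cluster()
    cluster.add_node("node-a", {"s1", "s2"})
    cluster.add_node("node-b", {"s1"})
    cluster.remove_node("node-b")  # allowed: s1 is still stored on node-a
    try:
        cluster.remove_node("node-a")  # refused: nothing else holds s1 or s2
    except LastCopyError as err:
        print(err)
```

The point of the design is that the check runs before any state changes, so a refused removal leaves the cluster exactly as it was.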
In addition to redundancy, FaunaDB introduces operational resiliency by enabling enterprises to leverage multitenant environments and shared services in the form of hybrid analytical and transactional processing. Developers can run experiments without affecting the customer-facing experience, for example, or organizations can run secondary analytics queries behind live transactions.
In financial services, for instance, FaunaDB’s multifaceted approach enables firms to assess creditworthiness, portfolio diversity, and minimum balance requirements as a client applies for a loan or weighs investment opportunities.
“The FaunaDB cluster is happiest when it’s servicing your entire enterprise,” Chris said. “You could have one cluster across dozens of datacenters with a single operational plane. Within that, you can have thousands of independent datasets, all with their own security, all with their own priority, and it makes it super easy to manage.”
On the Horizon: Blockchain, Streaming, and Data Retention
As FaunaDB examines the next frontiers of distributed relational NoSQL databases, Chris said staying far ahead of technology trends — as well as careful planning and development — is imperative.
“Correctness is so important, and we’re going to make bigger but slower moves,” he said, pointing toward blockchain-oriented features in the works. “Technically, they’re pretty easy to add, and it opens up a whole new marketplace and a bunch of use cases. But there are other cases where we build an entire product from scratch because we think something is more than a trend.”
In the case of blockchain functionality, according to Chris, the interest in real-time data streams led FaunaDB to examine the overall issue of temporal data organization.
“If you need to go back and look at the history of a data item or need to track changes to your data over time, then that’s on the developer to all of a sudden add another dimension to their schema, which can create complexity everywhere,” he said.
With FaunaDB, users can configure a particular data retention period during which the platform keeps object histories and can run auto-queries. Although the feature certainly pleases enterprise customers, Chris said the company’s forward-thinking implementation also optimizes the on-disk structures for streaming functionality.
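To make the idea of a retention period concrete, here is a minimal sketch of a store that keeps timestamped versions of each item and prunes history older than a configured window. It is an illustration only, not FaunaDB's implementation or query language; `VersionedStore`, `read_at`, and `prune` are hypothetical names, and the sketch assumes writes for a given key arrive in timestamp order.

```python
from bisect import bisect_right


class VersionedStore:
    """Toy key-value store that keeps history for a retention window."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        # key -> list of (timestamp, value); assumed appended in timestamp order
        self._history = {}

    def write(self, key, value, ts):
        self._history.setdefault(key, []).append((ts, value))

    def read_at(self, key, ts):
        """Return the value of `key` as of time `ts`, or None if unknown."""
        versions = self._history.get(key, [])
        # Count of versions written at or before ts.
        idx = bisect_right([t for t, _ in versions], ts)
        return versions[idx - 1][1] if idx else None

    def prune(self, now):
        """Discard history that falls outside the retention window."""
        cutoff = now - self.retention_seconds
        for key, versions in self._history.items():
            idx = bisect_right([t for t, _ in versions], cutoff)
            # versions[idx - 1] is the value in effect at the cutoff, so keep
            # it along with everything written afterwards.
            self._history[key] = versions[max(idx - 1, 0):]


if __name__ == "__main__":
    store = VersionedStore(retention_seconds=100)
    store.write("balance", 10, ts=0)
    store.write("balance", 25, ts=50)
    store.write("balance", 40, ts=200)
    print(store.read_at("balance", 60))   # value as of ts=60
    store.prune(now=250)                  # drops the ts=0 version
    print(store.read_at("balance", 10))   # history before the window is gone
```

Because the history lives in the store rather than in the application schema, a developer can ask for an item's past value without adding a time dimension to every record by hand.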
“Data integrity is always going to be the first thing, but we emphasize the ability to stand up a cluster and service an entire customer base with independent, multitenant connections,” he said. “There’s no easier way to hand out database connections to a group of independent customers.”