Contents |
Introduction
The ability to ingest, analyze, and take action on high-volume, multi-structured data in real-time is quickly becoming a 'must have' capability for enterprises across vertical industries. Such real-time, intelligent decision-making is enabling new lines-of-business and is reinventing both horizontal business functions and industry-specific transactional services.
From an infrastructure perspective, Big, Fast Data requires a rethink of the way enterprises architect analytic and transactional systems. Traditionally, analytic and transactional systems are designed and deployed in relative isolation from one another and operate in a linear or sequential fashion. In order to support real-time, intelligent decision-making, the two systems must be combined to allow analytic insights and transaction processing to inform and support one another in sub-second time frames such that decisions can be made and executed upon fast enough to deliver maximize business value.
Traditional relational database management systems were neither designed nor optimized to support such large-scale, real-time analytic and transaction-oriented use cases simultaneously. Increasingly, new approaches that include one or another flavor of NoSQL database system and the use of flash SSDs and in-memory storage engines is required to make Big Fast Data a reality.
Tapad Disrupts Online Advertising
Tapad is an early adopter of this approach. The company, which operates in the ad-tech sector, enables real-time analytics, bidding and placement of personalized, online advertisements. Tapad employs a NoSQL, real-time analytics engine to facilitate this process. Dag Liodden, the company's Co-Founder and CTO, recently took part in a Wikibon Peer Incite discussion to share best practices and lessons learned as Tapad architected, deployed, and used its internal infrastructure.
In order to benefit from Liodden's insights, it is first important to understand the process behind ad-tech.
Many commercial Web publishers make space available on their Web pages for banner and display advertisements. Typically, when a user opens such a Web page, the browser reaches out to an online ad exchange network and requests an ad unit to serve to that user. The ad exchange broadcasts this information, often enriched with behavior data specific to the user in question, to multiple advertisers. Each advertiser compares the information against its internal ad inventory and existing ad campaigns to determine what that ad impression is "worth" to them. It then decides whether to place a bid and at what amount. Bids are returned to the ad exchange, which determines who the highest bidder is and delivers the winning advertisement.
This entire process - from the user opening the Web page and the ad exchange transmitting the request to the advertisers analyzing the enriched data and delivering the ad itself - must be executed in just milliseconds. Tapad's job is to facilitate this multi-step, real-time process.
The company is just two years old, so it had the luxury of a clean slate when choosing the database technology it used as the foundation of its infrastructure. Liodden identified several factors Tapad took into consideration.
- First was the type of data involved. To deliver the quality of analysis it required, Tapad's system must be able to process multi-structured data. This includes data that identifies the device from which a user is viewing a Web page, its geographic location, and historical browsing patterns and click-through behavior.
- The second factor was speed. Analysis of the data, the bidding process, and ad delivery all must take place in less than 100 milliseconds before the lag becomes noticeable to the user.
- A third factor was latency. The system must be able to incorporate just-created transaction data into its analysis in real time so that advertisers have up-to-the-second data against which to evaluate bids.
- A fourth factor was concurrency. At peak times, Tapad's system must be able to simultaneously handle up to 150,000 ad units per second.
Liodden said Tapad briefly considered traditional relation databases for the job, but quickly ruled them out. Optimizing and scaling an RDBMS for such real-time decision-making workloads would have been time and cost prohibitive, plus relational systems also contained superfluous functionality not applicable to Tapad's use case.
Tapad then looked to the emerging NoSQL space. While there are many flavors of NoSQL data stores, Liodden said Tapad's use case lent itself to the simplest type, a key value store, which supports querying and updating single IDs. Due to the speed requirements, Liodden also determined that spinning disk was not going to provide the performance Tapad required. This meant he needed to consider a NoSQL key value store that enabled in-memory storage and data processing.
Flash Over DRAM
One option was a NoSQL database with dynamic random access memory, or DRAM, serving as the persistent storage layer. DRAM could potentially provide the performance Tapad required, but Liodden determined it would also be prohibitively expensive as the system grew in scale. Tapad's data access pattern varies widely, Liodden said, meaning a DRAM-based caching layer would quickly become saturated, requiring increasing reads from disk. This would result in an unacceptable hit performance. To overcome this scenario, Liodden's team would need to frequently add more DRAM-based hardware as data volumes grew. Each new enterprise-class server added to the cluster would cost tens of thousands dollars, however, a potentially unsustainable approach from a cost perspective relative to Tapad's business model. Reading terabytes of data into DRAM every time new hardware is added to the cluster would also take more time than Liodden was willing to sacrifice.
The better option turned out to be a hybrid flash/DRAM approach from Aerospike, a Mountain View, Calif.-based NoSQL database maker. The Aerospike database is a key value store that uses flash-based solid-state drives (SSDs) as the persistent storage layer, supplemented with DRAM to store the related indices. Flash does not provide the same level of performance as DRAM, but it does deliver significantly higher read/write performance and consistency than disk, and its price-to-performance ratio, when used in combination with limited DRAM-based storage, hits the sweet-spot for supporting real-time, intelligent decision-making applications such as Tapad's, Liodden said.
Aerospike also employs techniques for spreading reads and writes evenly across SSDs, maximizing their lifespan. To wit, Tapad has not had to replace a single SSD in the 18 months the system has been live.
Currently, Tapad runs a five-node Aerospike cluster, with each server utilizing six 120 GB drives. The system manages more than 150 billion ad impressions per month across 2 billion devices. The system routinely supports 50,000 queries per second per server, reaching 150,000 ads per second during peak activity. Total data volume currently stands at 3.6 TBs and growing. Architecturally, the system is latency-aware down to the individual node and is able to intelligently route data to maximize performance, Liodden said.
The Aerospike-based system enables Tapad to serve a growing client base that includes Dell, Evidon, and three of the top four telecom providers. These and other clients rely on Tapad to get their ads in front of the right eyeballs at the right time. The results include improved ad conversion rates across end-user platforms, engagement with new segments of likely customers and, most importantly, increased sales and revenue.
Big Fast Data Use Cases
While Tapad's experience is focused on ad-tech, real-time, intelligent decision-making as enabled by Big Fast Data is applicable to numerous use cases across various industries. These include high-speed financial asset trading in which real-time analysis of multiple, multi-structured data sources must be performed and trading decisions made and executed in sub-second time-frames. Utility exchanges that facilitate bidding for and delivery of electricity and other forms of energy in real time are another logical use case. Recommendations engines that present cross-sell and up-sell offers to online consumers, which currently draw upon analysis of pre-processed data, are also candidates for real-time, intelligent decision-making approaches such as employed by Tapad.
Action Item: Many mission-critical business processes and applications that once took minutes, hours, or even days to complete are now candidates for real-time, intelligent execution thanks to Big Fast Data. Early adopters such as Tapad are living proof that real-time, intelligent decision-making is now a viable and practical option for those enterprises willing to shake the cobwebs from their old ways of doing business and the technologies that support these outdated processes. Those that do so will be in the best position to redefine and dominate existing markets and innovate to create new and highly profitable new lines of business. CIOs must recognize this shift and begin exploring new ways of delivering real-time, intelligent transactional services that customers, both enterprises and consumers, now expect.
Footnotes: