Virtually all major retailers collect data to better understand their customers. eBay, the online retail and auction site, is no exception. The company gathers and stores massive amounts of web click-through data, social data and other data types generated daily by its more than 100 million active users.
With such volumes of multistructured data being generated and stored every day, it's not surprising that eBay made the strategic decision to invest in NoSQL, scale-out technology to support its operations. Storing, processing and analyzing that much data in a relational database would simply be cost- and performance-prohibitive.
The company currently uses a multi-datacenter DataStax Enterprise deployment to store and process much of its user-generated click-through data. DataStax Enterprise is based on Apache Cassandra but also includes Apache Hadoop for large-scale batch analytics. More specifically, eBay has provisioned 250 TB of storage on the platform to accommodate more than 6 billion writes and more than 5 billion reads daily.
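To put those daily figures in perspective, the short calculation below converts them into average per-second rates. It is a back-of-the-envelope sketch only: it treats the "6 billion-plus" and "5 billion-plus" figures as lower bounds and assumes traffic is spread evenly over 24 hours, which real workloads are not, so peak rates would be higher.

```python
# Rough conversion of the daily volumes quoted above into average
# operations per second. Assumes perfectly even traffic across the day,
# which is an illustrative simplification, not a measured profile.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_ops_per_second(ops_per_day: int) -> float:
    """Return the mean per-second rate for a given daily operation count."""
    return ops_per_day / SECONDS_PER_DAY


daily_writes = 6_000_000_000  # lower bound quoted in the article
daily_reads = 5_000_000_000   # lower bound quoted in the article

writes_per_sec = average_ops_per_second(daily_writes)
reads_per_sec = average_ops_per_second(daily_reads)

print(f"Average writes/sec: {writes_per_sec:,.0f}")  # Average writes/sec: 69,444
print(f"Average reads/sec:  {reads_per_sec:,.0f}")   # Average reads/sec:  57,870
```

Even as averages, these rates work out to well over 100,000 combined operations every second.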
eBay leverages DataStax Enterprise for a number of use cases. For example, each time a user clicks the "Like", "Want" or "Own" button to add an item to his or her favorites page, the associated data is loaded and stored in one of eBay's geographically dispersed Cassandra clusters. The platform is engineered for dynamic load balancing to ensure data is distributed evenly and always available.
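Cassandra spreads rows across nodes by hashing each row's partition key onto a ring of tokens. The sketch below illustrates that general idea in miniature. It is not eBay's or DataStax's implementation: the node names, the user IDs, the choice of user ID as partition key and the use of MD5 as the hash function are all assumptions made for illustration (Cassandra's default partitioner uses Murmur3), and replication is left out entirely.

```python
import bisect
import hashlib
from collections import Counter


def token(key: str) -> int:
    """Map a key to a 64-bit position on the ring (MD5 used for illustration)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class TokenRing:
    """Toy token ring: each node owns several ranges via virtual nodes."""

    def __init__(self, nodes, vnodes_per_node=64):
        # Place several virtual tokens per physical node around the ring.
        self._ring = sorted(
            (token(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._tokens = [t for t, _ in self._ring]

    def node_for(self, partition_key: str) -> str:
        # The owner is the first virtual token clockwise from the key's token;
        # the modulo wraps around past the highest token.
        idx = bisect.bisect_right(self._tokens, token(partition_key))
        return self._ring[idx % len(self._ring)][1]


# Hypothetical cluster and hypothetical "Like" events keyed by user ID.
ring = TokenRing(["node-a", "node-b", "node-c"])
events = [(f"user-{n}", "like") for n in range(30_000)]

placement = Counter(ring.node_for(user_id) for user_id, _ in events)
print(placement)  # counts per node; roughly, not exactly, equal
```

Because a key's position depends only on its hash, any client can work out where a row lives without a central lookup, and adding virtual nodes per machine smooths out the share each one receives.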
The inclusion of Hadoop in the DataStax Enterprise platform also enables eBay to analyze this data in the same environment without needing to move it to a separate system for analysis. This speeds up the analysis process and eliminates the risk of losing data in transit.
Action Item: Retailers struggling to manage the growing volume of user-generated and web click-through data with traditional relational technology should evaluate the emerging ecosystem of NoSQL data stores. As the volume of multistructured data associated with online retail operations continues to grow (and grow), relational database technology simply will not suffice to ensure optimal efficiency and performance. Consider approaches like that taken by DataStax Enterprise, which combines a NoSQL data store with Hadoop deployed across scale-out clusters of off-the-shelf hardware, to enable real-time read/write capabilities and analytics.