The following are my notes on a briefing of Wikibon analysts by Quantivo, a big data analysis startup that uses unique technology to discover unexpected answers to common marketing questions. This technology was developed inside a large, unnamed credit card company to identify fraud and associated issues in near real-time and has been spun out as an independent startup. Attending the meeting were several analysts from Wikibon, two officers of Quantivo and two account personnel from Quantivo's PR firm:
- Nick Allen, Wikibon
- Jeff Kelly, Wikibon
- David Floyer, Wikibon CTO
- Bert Latamore, Wikibon
- April Rudish, Account Executive, Trainer Communications
- Jenna Richard, Account Assoc., Trainer Communications
- Amrit Williams CTO, Quantivo
- Jim Chiang, Sr Dir. Of Sales, Quantivo
Notes from e-mail: Quantivo has big plans to shake up the status quo in business analytics, and I’d like to offer you a pre-briefing on its plans before a September 27 announcement under embargo. The race to understand the mind of the consumer is in high gear, with the market opportunity currently pegged at $24 billion annually. The problem is, the tools don’t exist to discover monetizable data patterns from the many, disparate data sources and datasets where the raw data may be buried. Hence, business decisions to develop new offerings that will address consumer needs are little better than a shot in the dark. That will all change on September 27, when Quantivo introduces a powerful “pattern-based” analytics offering that can search billions of records, in any combination of data sets stored online, offline in remote databases, or from both sources, and derive actionable information in under a minute. The result? Consumer offerings that propel action, and increased corporate revenues and profitability for providers. Just as importantly, Quantivo will make powerful analytics affordable to all companies, and available through a secure cloud-based portal.
April: What we'd like to start with is an overview of Quantivo. I'm not sure if any of you have heard of Quantivo before. So we want to give you an overview and our go-to-market strategy, and then give you an idea of what the platform can do. Then we can have Amrit and Jim give you a little bit of their background with Quantivo, and then we can just go from there. Does that sound good?
DF: Excellent.
April: All right. Amrit would you like to go ahead and introduce yourself and give a little of your background with Quantivo?
AW: Thanks, April. I joined the company in the middle of 2011. I come to Quantivo most recently from IBM, where I was the CTO of end-point security and director of emerging security technologies. I focused on things like analytics, especially around security and fraud, and governance; cloud computing security; mobile, especially as it related to smart grids and highly distributed mobile computing environments; as well as social media. I was given this incredibly plum role at IBM where I was the executive to do all this cool stuff because in 2010 they acquired a company called BigFix where I was CTO. One of the things that was made really clear to us was that that acquisition was not just a technology acquisition, it was a ?? acquisition for talent. So they really tried to do a lot for us.
BigFix was a company that I was at for about 3.5 years, just under 4 years prior to the acquisition. I was the CTO at BigFix. I defined their go-to-market, their product, their strategies, their long-term road-maps. The ?? positioning, what we would do, what we wouldn't do. And I spent a long time evangelizing the technology. BTW, it's a systems management technology, so not necessarily end-data or analytics. It was the largest enterprise software acquisition in 2010. So a phenomenal exit for a company that, when I joined it just under four years prior, nobody knew anything about. It was doing less than $2 M in revenue, and when I joined the company I had a group of my peers basically tell me, "What are you doing? You're a smart guy, why would you join this dying patch management company?"
I joined BigFix from Gartner. I was an analyst at Gartner for some years, focused primarily on IT security, particularly those aspects of IT security that have to do with analytics & data, like security information and event management and security analytics. Prior to that I spent a long time doing engineering stuff. I was a research analyst at McAfee, and I did development work; I developed a lot of their anti-virus technologies back in the early '90s when they were a little company. So that is enough about me. What I wanted to do was introduce Jim Chiang, who is my friend and colleague here at Quantivo, and he is focusing on running our marketing programs. Jim, do you want to take a minute and introduce yourself please?
JC: Yeah, I'd be happy to. Thanks, Amrit. Let me give you a little background about me. I've spent most of my career focused on the analytics space. I graduated from MIT back in the early '90s and spent the first 7 years of my career doing development, coding for some Wall Street financial analytics firms. I ended up at Informix Software before the acquisition by IBM at the end of the '90s, and I ended up running their BI product group. So that was in the late-1990s time-frame. Since that time I've been involved with a number of analytics-related startups bringing really interesting technologies to the analytics space: one which dealt with distributed data management, another in RFID event processing, and most recently a company called Starview that did streaming event data analysis. I've been at Quantivo for 3 months now and I am currently running the marketing operations for the company.
AW: Thanks Jim. And I know that we are absolutely fascinating. My mom tells me this all the time. But let's focus on Quantivo. If you have any questions about our backgrounds or what we focused on in the past we'll be happy to answer them. But the reason we are here is to talk about Quantivo and hopefully give you some perspective on what we do in market and the value that we add.
A couple of things for full disclosure. When I left IBM in June I basically said, "I'm not going to do anything for a while. I'm going to go and do yoga, and maybe I will figure out a language that ?? speaks, and spend some time with my kids." Then friends of mine were contacted by Foundation Capital, specifically the new CEO, Dave Robins, whom I worked with at BigFix and IBM. Dave was also the CEO at BigFix, and Foundation contacted Dave because they were interested in finding a new CEO, a new leader, to take Quantivo to market. When I first looked at the technology it was very confusing to me. I don't know if any of you have familiarity with Quantivo or know what Quantivo is, but Quantivo is a company with some very interesting technology that had a very challenging time trying to describe it to the market. The folks who had brought the technology to market had a customer-oriented retail background, so they very much focused on that segment of the verticals out there, retail, trying to do customer behavior analytics and deliver a cloud-based service. And it was a challenge for them. But the core technology itself is really exciting, and that's what got me, as a technologist and somebody who looks at markets, very excited. Because I realized that there are some really dynamic things happening in the market that can be very much benefited and impacted by the type of technology Quantivo has built.
Quantivo was started at a large credit card company by a team of internal & external developers. The technology was designed to analyze every credit card transaction; the company was handling 75 B credit card transactions in short time frames. They needed to look at the raw, granular event data associated with those transactions to discover, for instance, that some device at a gas station in a small town in Ohio had violated a business rule or that somebody was committing fraud there. There was no good technology that could look at data at the individual transaction level and handle those data volumes, rather than summarizing and aggregating, which was the approach at that time.
They also wanted to understand behavior, segment behaviors, and infer behaviors that might occur. For example, if they see event X and then see event Y, is there a positive or negative association for event Z? And is that a rule violation or fraud event? Or is it an opportunity to maximize revenue, convert a customer, or drive greater value to your base? That's what Quantivo built.
In 2008 the developers of the technology & its champions inside this credit card company left and founded this company. The patents are filed; the IP is all owned. They decided to bring this technology to market and put it in the cloud.
So this is a new technology that has been proven in very large environments, and it has some maturity to it.
So what are we bringing to the market? I want to jump over a lot of the normal business slides and get to the meat.
We have this great cloud computing technology that can find patterns in big data. We think we have created an architecture that supports the data reliance that David Vellante and other people are seeing. We think we created an interface that can be put in the hands of a business user, delivered in a form-factor in the cloud that cuts through the barriers a lot of people are experiencing in using advanced analytic tools.
Slide 5: How Quantivo works: We are based on proprietary, breakthrough technology in data processing, query execution, & data storage. We can ingest data very quickly with no requirement for a priori knowledge of data structure or data definition. That is coupled with a data processing engine that lets us identify the keys and structure of the data if we don't know it; we do that through a process that attempts to define cardinality. We also have a lot of tools that allow us to define unstructured data. Once we understand that definition, we remove redundancies and dedupe the data. It is not a row-store or column-store type of database. It is probably most akin to data indexing, where we create an index tree that allows us to store counts and measures & still link to raw detail while eliminating duplication. That usually gives us about 10X compression.
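A rough sketch of the cardinality-profiling idea Amrit describes (this is hypothetical illustration code, not Quantivo's engine): columns with few distinct values look like dimension or key candidates, while high-cardinality columns look like measures or identifiers.

```python
def profile_cardinality(rows):
    """Return the number of distinct values per column for a list of dict records.

    Low-cardinality columns are candidate keys/dimensions; high-cardinality
    columns are likely measures or identifiers.
    """
    distinct = {}
    for row in rows:
        for col, val in row.items():
            distinct.setdefault(col, set()).add(val)
    return {col: len(vals) for col, vals in distinct.items()}

# Tiny made-up transaction sample for illustration.
rows = [
    {"store": "San Jose", "sku": "A1", "amount": 19.99},
    {"store": "San Jose", "sku": "B2", "amount": 5.49},
    {"store": "Austin",   "sku": "A1", "amount": 19.99},
]
print(profile_cardinality(rows))  # {'store': 2, 'sku': 2, 'amount': 2}
```

On real data the gap between, say, a `store` column (hundreds of values) and an `amount` column (millions of values) is what makes this kind of inference useful.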
The other benefit of that is once we have processed the data we can perform all analysis & queries against that condensed state. We can move the data, parallelize it, replicate it, & we can enable its access & use without having to return it to its original state. Most technologies require you to uncompress data before you use it. That becomes challenging if you lack a dynamic infrastructure or you are doing ad hoc queries.
As part of creating that tree we also identify affinities & cardinality & start building patterns in the data. Which events occur together, and how often do they occur together in the context of specific variables? Many of our customers don't know what they have in their data; they have no idea what they are looking for; they have just been told there is lots of value there.
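The "which events occur together" idea can be sketched as simple pairwise co-occurrence counting (again a hypothetical illustration, not the actual index-tree implementation):

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(sessions):
    """Count how often each pair of distinct events appears together in a session."""
    pairs = Counter()
    for events in sessions:
        # sorted(set(...)) gives each unordered pair a canonical key
        for a, b in combinations(sorted(set(events)), 2):
            pairs[(a, b)] += 1
    return pairs

# Made-up shopping sessions.
sessions = [
    ["cookies", "milk"],
    ["cookies", "milk", "bread"],
    ["bread", "milk"],
]
counts = cooccurrence_counts(sessions)
print(counts[("cookies", "milk")])  # 2
```

At scale this is what an affinity index precomputes, so that "what sells with X" queries don't need to rescan raw events.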
We then enable train-of-thought analysis of that data with our interface & query environment. We can provide everything from generic OLAP-style analytics to very advanced probabilistic behavioral segmentation analytics. The reason we have all these different stacks in place (our own interface, data query engine, etc.) is that the data processing engine has capabilities & characteristics that are challenging for third-party applications to work with. So it really is pretty much a full stack.
DF: I had a question on the sources of data. Are you bringing all those sources into one place or taking a distributed approach? And where is this running – in your data center?
AW: Slide 6: Currently Quantivo is delivered as a service hosted in Amazon EC2, & we work with the rest of the Amazon infrastructure. The technology we developed to take advantage of parallelism and scalability in the cloud is not specific to Amazon. We found, given the volumes of data we handle, that we needed to develop a technology that was not proprietary to Amazon. So in addition to running on Amazon & liking the infrastructure they provide us, we can deliver our technology in any environment or a private cloud. Originally it was on premise. The reason we like cloud technology & virtualization is the dynamic nature of analytics in general & because we can leverage the infrastructure to support dynamic use of resources. For example, we are truly multitenant: not just logical but physical separation of tenant data. So not only do you get separation for governance reasons, no tenant will impact any other tenant if their usage spikes. Data volumes of client A won't impact the queries of client B. It also means we can control usage & scalability by customer or tenant if we choose to.
NA: It doesn't sound like multi-tenant to me.
AW: The physical separation is the file system and computing resource. The actual location of our giant computing resource is in the same multi-tenant cluster. So I can administer every tenant from a single place using the same tools & infrastructure. But the tenant's data is not physically housed in the same environment, even when we do parallelization. So I can have a single instance of Quantivo running that has physical & logical separation of the compute resources & storage, even though the infrastructure is all still within the Quantivo cluster. So for instance if you want to use autoscaling and shared volumes inside Amazon, you have query servers that autoscale and load balance. We don't use that model. Every query server uses its own single volume, we don't have shared volumes. So those instances, although they appear as physically separated computing resources, are managed in the same cluster that Quantivo manages.
We use internal storage for all data processing analysis. A lot of people are sending snapshots in to us, but we don't use external storage.
NA: If Amazon is controlling it, how do you know?
AW: Amazon has tools that let you monitor those capabilities. The Amazon environment has usage and licensing models for S3 storage that are different from EC2 computing resources. The S3 infrastructure doesn't have the same requirements for network IO between parallel nodes. If you are doing parallelization & want to run across parallel nodes you need a lot of IO activity.
We have the ability, in addition to the multitenancy, to dynamically scale. So based on usage, number of queries or users, or the type of query, we will scale up or down. This is key to why we deliver consistent service. It is cloud-computing agnostic. We do have support for other cloud environments & have run on them in the past, and we can run on a private cloud for very large customers who want that, or for third-party customers who want to deliver analytics as a service to their customers. We do have customers like that, primarily in the marketing optimization domain. These companies help large customers improve their marketing & want to apply the behavioral segmentation we provide as a service to their customers. Those service providers are running different instances of Quantivo in their own infrastructure.
DF: I want a sounder understanding of your core technology. I heard you describe it, and I have heard a lot of descriptions of that kind of technology, but what are the core techniques used to sort it out? What does the user have to do to help the process along?
AW: Slide 13: This will answer your question. I want to take you through the construct we use for analyzing data & performing everything from basic queries to advanced segmentation queries & talk about why that's challenging for many people. What you see is a screen shot of our UI. At first glance it looks challenging, but the concept is simple, just drag-&-drop. When users give us the data, our technology automatically parses it to see if we can identify the data structure, assuming we were not given that. Then we load the data into Quantivo, so it should be up and running in a couple of days. The initial data load is often pretty large, as sometimes customers don't know what is valuable & what is not. We do initial loads the same way Amazon does: we have customers with high-speed links who can wait for the data to be moved; others ship us media and we load it in. Also, a lot of the data we use is already in the cloud, so we pull it from aggregation points, and a lot of it is cloud friendly.
So once that's up there we transform it and create the environment. When a user logs in they see this screen minus the data. What this is is the metadata – the structure, measures, filters, etc. – that we have pulled out of the data set. So to create a query they need to know what they want to query. In this case the customer wants to look at the context of their invoices. They want to understand all the promotions sold to see what the best promotions are in this store (San Jose). So the user simply drags the commissions, the invoices, and the filters to create the query. What you see is the query response. It shows that at San Jose there were about 300,000 invoices. Out of those, 20,247 were sold as part of 10% off on holiday decorations. So within a quick time you can get a response showing the best promotion. We can use that type of query for any number of things. But that's not terribly exciting. Any tool can do that – you can do it with Excel.
Slide 15: How we do segmentation: It starts to get exciting if you look at Slide 15. So in the last slide it was, "What was my greatest promotion?" And we saw that the top promotion in the San Jose store was 10% off on holiday decorations. That leads to the next question: what sold? And not only what sold, but what had the greatest association and affinity with this promotion compared to the overall population? All the user has to do to add this question is click on the cell that says "10% off on holiday decorations", choose "add this target", which automatically puts the target there, and send the query. What is returned to them is at the top of Slide 15: automatic association & pattern creation. So we found that if you segment or target every customer who purchased something as part of the promotion, this is what they purchased. Compared to every other customer's behavior, they were 12% more likely to buy holiday items, 10X more likely to buy perennial plants, and 5X more likely to buy extension cords. And the invoice margin averages were going up with the target. So if we were giving 10% off of the decorations, why was the margin going up? We will get into that in a minute. With this information a business user without sophisticated analytics knowledge, for instance how to use MapReduce, can answer fairly sophisticated questions. This kind of knowledge can help people optimize customer behavior.
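Those "12% more likely" / "10X more likely" figures read like standard lift ratios: the probability of an item within the target segment divided by its probability across the whole population. A minimal sketch, assuming that definition (the baskets below are invented for illustration):

```python
def lift(target_buyers, population, item):
    """Lift = P(item | target segment) / P(item | overall population)."""
    p_target = sum(item in basket for basket in target_buyers) / len(target_buyers)
    p_all = sum(item in basket for basket in population) / len(population)
    return p_target / p_all

# Hypothetical baskets: target = customers who bought via the promotion.
target = [{"holiday", "plants"}, {"plants", "cords"}, {"plants"}]
everyone = target + [{"milk"}, {"bread"}, {"holiday"}, {"milk", "bread"}]

print(round(lift(target, everyone, "plants"), 2))  # 2.33
```

A lift well above 1 ("plants" here) is exactly the kind of non-obvious affinity the tool surfaces for the business user.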
But that is just the beginning, because this is a simple behavioral segmentation. We can add multi-attribute segmentation. Take this next example from an online media company that delivers movies, videos, & video game material through a web site. Their highest revenue-generating segment is X-box 360 sports and action users. Their question was: how can we increase & cross-sell to the highest segment? You can quickly get a simple answer.
Slide 17: But if you look on Slide 17 you see this type of query expanded out. The concept is: "What are my easiest cross-sell opportunities?" "What's my highest revenue generation segment?" "What is the type of behavior I can influence where I see a pattern that indicates people are more likely to spend?"
This is a very common question from our customers. In almost every situation you get data back that supports your assumptions and common sense. For instance, it is common sense to say that when a segment of my customers buys cookies, probably about 30% are more likely to buy milk than the general population. But what we find is that a lot of times the best opportunity is the one that doesn't make sense, isn't supported by assumption or common sense. In fact that is the key to data analysis – the outcome you can't just guess on. We use data to find that opportunity.
For instance, in this case the analysis showed that the greatest cross-sell opportunity they have with their sports and action segment is to sell music games. They thought it would be fighting or first-person shooters and spent a lot of time promoting cross-selling that way. But what the data showed was that of the segment of people who played sports action, the top thing they did on subsequent visits was play music games.
You'll notice on Slide 17 there is the concept of filters. They were running an ad campaign with Google and wanted to look at everyone referred by Google who then became, in their first visit, part of the sports and action X-box 360 users, and see what every one of those users did the next time they visited the site through the 10th time. That seems like an easy thing to express in English – very difficult with analytics. But what this shows is that these guys were 4X more likely to be attracted to music games than anything else on subsequent visits. Fighting is only 2.9X. First-person shooters and wrestling games aren't even on this list. Also, the analysis showed that when these people came back to the site they spent almost double the time on the site and double the page use with music games. So they were doing more. This is powerful information that suggests they should target every X-box 360 sports-and-action user who hasn't visited the music game site with a program.
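The filter described ("first visit matched segment, then look at visits 2 through 10") is essentially a funnel query over per-user visit sequences. A hypothetical sketch of that logic (the data shape and segment predicate are invented for illustration):

```python
from collections import Counter

def subsequent_activity(visits_by_user, first_visit_segment, max_visit=10):
    """For users whose first visit matched a segment predicate,
    tally the content categories of visits 2 through max_visit."""
    tally = Counter()
    for user, visits in visits_by_user.items():
        if visits and first_visit_segment(visits[0]):
            for visit in visits[1:max_visit]:
                tally.update(visit["categories"])
    return tally

# Made-up visit logs keyed by user id.
visits = {
    "u1": [{"referrer": "google", "categories": ["sports"]},
           {"referrer": None, "categories": ["music"]}],
    "u2": [{"referrer": "bing", "categories": ["sports"]},
           {"referrer": None, "categories": ["fighting"]}],
}
# Segment: referred by Google AND played sports on the first visit.
seg = lambda v: v["referrer"] == "google" and "sports" in v["categories"]
print(subsequent_activity(visits, seg))  # Counter({'music': 1})
```

Comparing these tallies against the overall population (as in the lift example earlier) is what yields the "4X more likely" figures.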
DF: I think I have the fundamentals. I just want to leave time for you to make the other points you want to make.
AW: One more point that I didn't articulate well: a lot of times analysis that looks very advanced is simplistic in human terms, in English. This is just trying to answer the question: "What are my greatest cross-sell opportunities & how does that look in the context of my population?" But it is not easy for most tools to do. For example, traditional RDBMS tools would be challenged by the volume and velocity of big data & the required analysis of fairly large data sets. Active behavioral analysis requires large data sets. We can process billions to hundreds of billions of references very quickly. When you think of OLAP or other pre-aggregation sampling technologies, they can't analyze all the detailed event data. Then there are the advanced analytical solutions from people like Teradata, Oracle, and IBM – the monsters. Those technologies can be cost-prohibitive and challenging to deploy, configure, & use. We're designed for business users.
Hadoop has the same challenges as the other big solutions minus the licensing. Because of its schema-less nature and the way it works, it removes the burden of data ingestion but moves the burden of analysis to analysis time. Using MapReduce & trying to query against schema-less structures can be very challenging. We think that because of this challenge of big data – helping people find what's in it, find patterns, do this easily, & put the business user close to the insight so they can act on it – we have a compelling, differentiated offering.
The corollary is to ensure we deliver on each of the points important to us: that the cloud-based architecture continues to scale and be optimized for high-performance analysis against big data sets, that the interface is easy to use, and that we can continue to innovate on behavioral analytics. What we have added, and want to talk about to the press today, is the ability to do some of this through one click, or for instance to automate the process. One of the things we have is this automatic pattern recognition.
Slide 16: A simple query to construct. I want to look at everything that's sold in the department for cabinets and counter tops. We automate the process of showing the things sold together to give them powerful business information. We can apply this concept of behavioral analytics to many domains. I talked to a company with 40 Tbytes of data. They don't know what's in there, they don't know what might be valuable, & they don't know where to start to look at it to impact their business to help their customers.
Since we have about 20 minutes, I would like to open it up for some Q & A.
NA: Does your technology lend itself to real-time analytics?
AW: Like the Infostreams technology? Yes & no. It does once we understand the structure and have the initial dump of data, and if the structure of the data doesn't change. Then answering queries in real time could be accommodated. But we are not trying to do that now. We are not necessarily designed to perform analysis or define the metadata of the content.
NA: Could you push the compute to the retail store?
AW: Potentially, yes. What we have seen is that when we create these behavioral segments, we automate the process of pushing those segments into other technologies. So it will push the analysis down into the customer's systems. We're trying to close the loop of operational analytics in that way. I haven't looked at moving the compute resources to the customer. I think you are going toward moving the compute resources to the data as opposed to getting the data up to the compute resources. We could do that. The challenge would be updating the environment remotely while maintaining control of what they are running. I would have to look at the bottlenecks. My concern would be ensuring the compute resources they had were being managed and could take advantage of our dynamic scaling.
DF: On the business side, how's it going?
AW: We've only been here for a short period of time. For the most part we see a real opportunity to bring value. We want to create a self-service model so people can get value faster, shortening the process a user has to take from wanting to ask questions to getting value from our tool. Let them onboard data themselves & analyze it in a sandbox environment, lensing & interacting with that data. We intend to offer a self-service preview model starting in early 2012.
For the rest of 2011 and 2012 the bulk of what we are doing will be putting in a foundation to ensure an easy, frictionless experience for our users. The reason is that, as a cloud service, we want to avoid the enterprise sales model. I've been involved in a lot of enterprise sales situations & I hate enterprise sales. It's bulky, big, unpredictable, and they tend to be self-destructive. At BigFix we spent a half million dollars with Salesforce.com & I don't know how many hundreds of thousands with SuccessFactors, and we never once saw a sales person. We were empowered to educate ourselves on that tool, deploy it, start using it, and get provisioned without involving sales. That's important to us. We have enterprise sales guys who maintain accounts, and we are going to maintain that structure, but in 2012 we're going to move to a different model around community nurturing. In this model we can use our own technologies to optimize how we target customers. So we want to build a community site that lets people get access to our tools quickly so they can try them out and sandbox without even talking to us. We hope that will get us into an underserved segment of the market.
We also want to support that with partners involved in business development activities. We have had some success with Logan Todd and Razorfish, companies that focus on providing market optimization to their customers. That gives us access to companies that we have not reached and to geographies we are not provisioned to enter. So we aren't able to go into the UK market – Logan Todd is. We can work with them to gain entry into that market. And we provide them a valuable analytic service that is essentially turn-key that they can monetize into their base.
Slide 18: The Quantivo Solutions Focus: We have a customer behavior analytics engine that provides analytics productivity, data mining, etc., and today we have a customer behavioral analytics technology that fits inside that. That just means we have a lexicon, language, and templates that are customer-behavior oriented. So we are focusing on delivering that customer behavior analytics solution and complementary solutions. We think it is exciting that if you apply customer segmentation to Twitter, we can show you what people say when they also mention you. For instance, it is hard to get computers to understand sarcasm. So imagine a tweet stream that starts with someone posting, "I hate SW. They lost my luggage again. I'll never fly them again." followed by another user tweeting, "I like SW." The computer would say, "Here's another customer who loves Southwest," but the reality is that that comment is connected with tweets about things that really aren't likable at all. We can look not only at what is said but at what other things are said when they talk about you. So we can show that yes, a lot of people talk about Southwest, but here are the top three things they say when they tweet about Southwest.