Graphchi- Analyze Twitter’s Social Graph in under an hour… with a Mac Mini http://ow.ly/cjWdY #bigdata #Graphchi
Evolution and Analytics: An Interview with Dr. Gary Fogel
Earlier this year I had the opportunity to sit down with Gary Fogel, CEO of Natural Selection, Inc. (NSI), to discuss a data analytics approach that his firm specializes in called evolutionary computation.
Evolutionary computation brings the process of natural selection into a computer by 1) generating a population of solutions, 2) scoring and ranking these solutions, 3) retaining and modifying the solutions that perform well, and finally 4) iterating this process thousands of times to find the predictive models with the best performance.
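To make the four steps concrete, here is a minimal sketch in Python. It is not NSI's method: the toy objective, the population size, the mutation scheme, and every name in it are assumptions chosen only to illustrate the generate, score, select, vary, iterate loop.

```python
import random

def evolve(fitness, n_genes=5, pop_size=30, generations=200,
           keep=10, sigma=0.3, seed=0):
    """Minimal evolutionary loop (illustrative; all parameters are assumptions)."""
    rng = random.Random(seed)
    # 1) Generate an initial population of random candidate solutions.
    population = [[rng.uniform(-5, 5) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # 2) Score and rank the solutions (higher fitness is better).
        ranked = sorted(population, key=fitness, reverse=True)
        # 3) Retain the best performers unchanged...
        survivors = ranked[:keep]
        # ...and refill the population with mutated copies of them.
        children = [
            [gene + rng.gauss(0, sigma) for gene in rng.choice(survivors)]
            for _ in range(pop_size - keep)
        ]
        # 4) Iterate with the new population.
        population = survivors + children
    return max(population, key=fitness)

def toy_fitness(solution):
    # Hypothetical objective: every gene should be as close to 1.0 as possible.
    return -sum((gene - 1.0) ** 2 for gene in solution)

best = evolve(toy_fitness)
print(best, toy_fitness(best))
```

Because the best performers are carried over unchanged, the top score never gets worse from one generation to the next. In a real application the toy objective would be replaced by a measure of how well a candidate model predicts the data.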
I found our chat to be highly informative and thought-provoking, particularly with regard to the types of applications this approach is best suited for. Excerpts from our talk are featured in the video below.
Some core insights from the talk include:
- Evolutionary computation is valid not only for comparing learning methodologies, but also for optimizing within them. Gary provides insight on how the approach can be used in cluster analysis (2:54), decision trees (3:49), and neural networks (4:17).
- A primary advantage of the approach is that you can look at data with large feature sets using a broad set of solutions and search them in ways that people have not necessarily considered before. The methodology can evolve its own answers without relying on pre-programmed knowledge. This flexibility allows researchers to find patterns and relationships in data that might not have been previously realized or inferred (7:32).
- The methodology is highly dynamic, which allows it to keep learning as environmental conditions change (9:23).
- Evolutionary computation works through a flow of generating solutions, fitness scoring, ranking/selection, variation, termination and then iteration (10:06).
- The approach can provide insight and adapt even when the importance of data features, and of combinations of these features, changes over time (12:16).
- NSI has successfully applied evolutionary computation methods to homeland security applications such as risk management for container ship traffic (14:33) and safety of food imports (16:25), to life science projects in drug design, cancer diagnostics and personalized medicine (19:20), and to routing and logistics optimization projects for both military and industry (20:32).
Interview with Peter Fader on Customer Centricity
A few months ago, I posted a review of Peter Fader’s book, Customer Centricity. Recently, my colleague Ramesh sat down with Peter to discuss inspiration for the book as well as key ideas within it.
Here are some of the highlights of the interview. If you are a fan of Peter’s like me, I think you will enjoy it.
Are You “Customer Centric”?
More great stuff from the folks at the Wharton Customer Analytics Initiative (WCAI).
Yesterday, Wharton Professor Peter Fader released a new book entitled Customer Centricity. I had the privilege of reviewing a pre-release copy of the book and was struck by some really interesting and insightful points that Peter highlights in it.
Specifically,
Did you realize that some of the greatest brands in business history (Starbucks, Nordstrom and, yes, even Apple) have a long way to go in terms of optimizing the value of their customers?
Shocking, right? But Peter makes a compelling case that some of the most successful (and valuable) companies still have room to grow in terms of deepening their understanding of their existing customers and using this information to create custom offerings that strengthen ties and add to the value of the customer relationships that these companies already have.
Did you know that oversimplification of how you classify customers has significant consequences in terms of the effort/investment that you should be making to make your most important customers happy?
There is a great discussion (and a specific example on the wireless industry) in the book about how an oversimplified view of customer churn can cause companies to overlook and under-invest in some of their most important relationships.
Did you know that analysis of customer value could represent an entirely new way of evaluating investment choices?
In the book, Peter introduces the idea of customer equity (the combined lifetime value of all of a firm's customers) as an increasingly accepted metric for considering the value of certain types of firms. While it is challenging to execute, Peter points out that this may be an approach that is far more tangible and quantifiable than other measures such as brand equity.
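As a rough illustration of the definition, customer equity can be computed by summing a lifetime value over every customer. The sketch below is not the book's method. It assumes a simple model with a constant yearly margin, a constant retention rate, and a fixed discount rate, and all names and figures in it are hypothetical.

```python
def lifetime_value(annual_margin, retention_rate, discount_rate):
    """Expected discounted margin over all future years.

    Assumed model: the customer stays each year with probability
    retention_rate and yields annual_margin in each year retained.
    Summing the geometric series gives m * r / (1 + d - r).
    """
    return annual_margin * retention_rate / (1 + discount_rate - retention_rate)

def customer_equity(customers, discount_rate=0.10):
    """Customer equity: the combined lifetime value of all customers."""
    return sum(
        lifetime_value(c["annual_margin"], c["retention_rate"], discount_rate)
        for c in customers
    )

# Hypothetical two-customer firm, for illustration only.
customers = [
    {"annual_margin": 100.0, "retention_rate": 0.8},
    {"annual_margin": 50.0, "retention_rate": 0.9},
]
equity = customer_equity(customers)
print(round(equity, 2))
```

Note that under these assumed figures the lower-margin customer is worth nearly as much as the higher-margin one because of the higher retention rate, which is the kind of difference a single averaged churn figure would hide.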
These are but a few of the topics that he compellingly covers in this book. Peter is quick to point out that his ideas on customer centricity are more applicable to certain types of companies, but if you are a part of an entity that has large numbers of customers, sells services on a subscription basis (SaaS providers, are you listening?), has the ability to create custom offers, or is interested in investing in companies that have these characteristics, then this book has lots to offer. I would highly recommend it.
For more information on the book and how to get it, click here.
Questions for Panelists at Tonight’s Forum on Data Mining and Predictive Analytics?
This evening, I will be moderating a panel on data mining and predictive analytics. In preparation for this I have been doing a bit of research and wanted to share a few observations that will anchor the agenda for the talk.
We are inundated with data. This is driven by the rapid changes in how we interact, communicate and function in a world that is increasingly connected (both people and things) with the Internet. The amount of data being collected is growing at an explosive rate. In 2010, the data collected was equivalent to the cumulative total of data generated in the previous 5000 years. And the size of this data store is doubling every 2 years.
Consider the following:
- Facebook has at least 30 petabytes of data, with 30B pieces of content shared per month
- YouTube users upload 35 hours of video every minute
- Twitter delivered 25B tweets in 2010
- Zynga generates 15 terabytes of data every day
None of the companies mentioned above existed 7 years ago.
So we have companies that have far more data than ever before. How does this create a new paradigm? What are the ways in which commerce will be transformed? We’ve had companies exploiting large data sets with tools and techniques that have been around for decades- so what’s different now? Are we just dealing with more noise?
These are the topics that I hope we will gain some insight on in tonight’s discussion.
To get the discussion started tonight, I am highlighting three changes that I see.
#1) Data Availability: Exponentially growing data is available to everyone, companies and individuals alike. Whereas massive data sets were once the domain of well capitalized companies, now even startups and individuals can use them to find value.
#2) Tools: Data storage and processing infrastructure and tools are becoming increasingly powerful at fractions of the cost. Open source software, collaboration platforms, and cloud infrastructure provide mechanisms for not only collecting and storing the data, but also processing it at scale.
#3) Investment: Markets are rewarding companies on the promise that they will find value in the massive data sets that they create. Have you checked out the valuation of Facebook, Twitter, or Zynga lately? To be sure, these companies are achieving astonishing growth rates, but investors are placing their bets on new ways of monetizing the data streams that these and other companies collect and control.
I would be very interested in getting my readers’ ideas on what to ask our panelists. I will try to incorporate suggestions into the flow of the discussion. For those of you who will not be able to make the session, I will be blogging on the key takeaways and insights in the days ahead.
Looking forward to seeing many of you this evening.

