Archive for the ‘Data Mining’ Category

Evolution and Analytics: An Interview with Dr. Gary Fogel

Earlier this year, I had the opportunity to sit down with Gary Fogel, CEO of Natural Selection, Inc. (NSI), to discuss a data analytics approach his firm specializes in: evolutionary computation.

Evolutionary computation encapsulates the process of natural selection in a computer: it 1) generates a population of candidate solutions, 2) scores and ranks those solutions, 3) retains and modifies the solutions that perform well, and 4) iterates this process thousands of times to find the predictive models with the best performance.
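To make that loop concrete, here is a minimal Python sketch of the generate/score/select/vary/iterate cycle. The toy fitness function (mean squared error of a simple linear model), the population size, and the mutation scheme are illustrative assumptions of my own, not NSI's implementation.

```python
import random

# Toy data set: the "true" relationship is y = 3x + 2 plus a little noise.
DATA = [(x, 3 * x + 2 + random.gauss(0, 0.5)) for x in range(20)]

def fitness(params):
    """Score a candidate (slope, intercept) by mean squared error; lower is better."""
    slope, intercept = params
    return sum((y - (slope * x + intercept)) ** 2 for x, y in DATA) / len(DATA)

def mutate(params, scale=0.1):
    """Create a variant by perturbing each parameter with small Gaussian noise."""
    return tuple(p + random.gauss(0, scale) for p in params)

def evolve(pop_size=50, generations=200):
    # 1) Generate an initial population of random candidate solutions.
    population = [(random.uniform(-10, 10), random.uniform(-10, 10))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # 2) Score and rank the candidates.
        population.sort(key=fitness)
        # 3) Retain the better half and fill the rest with modified copies.
        survivors = population[: pop_size // 2]
        offspring = [mutate(random.choice(survivors))
                     for _ in range(pop_size - len(survivors))]
        # 4) Iterate on the new population.
        population = survivors + offspring
    return min(population, key=fitness)

if __name__ == "__main__":
    best = evolve()
    print("Best (slope, intercept):", best, "MSE:", round(fitness(best), 3))
```

The same skeleton generalizes well beyond a two-parameter model: the candidate representation and the fitness function can just as easily encode cluster assignments, decision tree structures, or neural network weights, which is where the discussion below picks up.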

I found our chat highly informative and thought-provoking, particularly with regard to the types of applications this approach is best suited for. Excerpts from our talk are featured in the video below.

 

Some core insights from the talk include:

  • Evolutionary computation is valid not only for comparing learning methodologies, but also for optimizing within them. Gary provides insight on how the approach can be used in cluster analysis (2:54), decision trees (3:49), and neural networks (4:17).
  • A primary advantage of the approach is that you can look at data with large feature sets using a broad set of solutions and search them in ways that people have not necessarily considered before. The methodology can evolve its own answers without relying on pre-programmed knowledge. This flexibility allows researchers to find patterns and relationships in data that might not have been previously realized or inferred (7:32).
  • The methodology is highly dynamic, allowing it to engage in continuous learning as environmental conditions change (9:23).
  • Evolutionary computation works through a flow of generating solutions, fitness scoring, ranking/selection, variation, termination and then iteration (10:06).
  • The approach can provide insight and adapt even when the importance of data features, and of combinations of those features, changes over time (12:16).
  • NSI has successfully applied evolutionary computation methods to homeland security applications such as risk management for container-ship traffic (14:33) and the safety of food imports (16:25); to life science projects in drug design, cancer diagnostics, and personalized medicine (19:20); and to routing and logistics optimization projects for both military and industry (20:32).

Transforming Unstructured Data into Market Intelligence: An interview with WiseWindow

January 22, 2012

At the end of last year, I sat down with my good friend and former colleague Marshall Toplansky to talk about how he is driving marketing innovation at WiseWindow, a company that he helped found and now runs as President. WiseWindow takes the unstructured data found on the Internet and in social media and turns it into streams of useful data that marketers can use to track customer attitudes around brands and brand attributes.

You will find excerpts from the interview in the video clip below. We talk about the evolution of sentiment analysis, how WiseWindow is driving innovation in this area (7:45), and the benefits that accrue to companies that have chosen to innovate using his technology (12:30).

My core insights from our discussion included the following:

  • Sentiment analysis is rapidly gaining acceptance as an alternative or supplement to traditional methods of measuring consumer attitudes, including voice of customer, customer satisfaction, and net promoter score.
  • In many ways, sentiment analysis is superior because it can account for a much larger population sample on a broader array of topics than traditional methods, and it can do so continuously. This creates the ability to track changes in attitudes based on the tactics a company might employ to improve its positioning in the marketplace. In short, the fact that sentiment analysis is orders of magnitude faster and cheaper is leading to new paradigms in how companies can track sentiment in near real time and then focus resources to respond.
  • Related to the point above, the predictive power of sentiment analysis is unleashed when companies are able to find relationships between customer attitudes and business performance.
  • WiseWindow is driving innovation in this area by using pattern recognition technology to address the fact that different word combinations reflect different sentiments depending on the context in which a company operates. In sum, it is using web crawling technology in conjunction with natural language processing algorithms to track not only what people are saying about its clients, but also what they are saying about the specific attributes of their products, brands, and competitors (a toy illustration follows this list).
  • While financial services businesses have been early adopters of this technology, successful case studies are emerging in media and entertainment, autos, and election tracking.
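
To illustrate the core idea at toy scale, here is a minimal Python sketch of attribute-level sentiment scoring over a stream of posts. The lexicon, the attribute list, and the scoring rule are illustrative assumptions of my own, not WiseWindow's proprietary technology.

```python
import re
from collections import defaultdict

# Tiny illustrative sentiment lexicon; a production system would rely on far
# richer, context-aware language models and pattern recognition.
POSITIVE = {"love", "great", "fast", "reliable", "amazing"}
NEGATIVE = {"hate", "slow", "broken", "terrible", "expensive"}

# Hypothetical product attributes to track alongside overall brand sentiment.
ATTRIBUTES = {"battery", "screen", "price", "service"}

def score_posts(posts):
    """Return a net sentiment score per attribute across a stream of posts."""
    scores = defaultdict(int)
    for post in posts:
        words = set(re.findall(r"[a-z]+", post.lower()))
        sentiment = len(words & POSITIVE) - len(words & NEGATIVE)
        # Credit the post's sentiment to every attribute it mentions.
        for attribute in words & ATTRIBUTES:
            scores[attribute] += sentiment
    return dict(scores)

if __name__ == "__main__":
    sample = [
        "Love the new screen, it is amazing",
        "Battery life is terrible and the service is slow",
        "Great price, but the battery died",
    ]
    print(score_posts(sample))
```

Tracked over time against a real stream of crawled posts, per-attribute scores like these are what a company would then try to relate to business performance, as discussed above.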

For access to the full interview (a little over 60 minutes), I have included a podcast version of the talk. Enjoy!

Transforming Unstructured Data into Market Intelligence: An Interview with WiseWindow

 

Video from Wharton San Diego Panel Discussion on Analytics

December 11, 2011

I finally wanted to post the video from a panel discussion I hosted a few months back. Great discussion and stellar panelists. I will update this post with my key takeaways, as well as create a short-form version of the talk with highlights, but for now, here is the full-length, unproduced video.

It’s been a while since I have had time to do much on the blog; I have been spending the last few months ramping up new clients and working on a new initiative that I will be launching next year. Stay tuned, there is more interesting and exciting stuff on the way…

Questions for Panelists at Tonight’s Forum on Data Mining and Predictive Analytics?

September 13, 2011

This evening, I will be moderating a panel on data mining and predictive analytics. In preparation, I have been doing a bit of research and wanted to share a few observations that will anchor the agenda for the talk.

We are inundated with data. This is driven by rapid changes in how we interact, communicate, and function in a world where both people and things are increasingly connected to the Internet. The amount of data being collected is growing at an explosive rate: in 2010, the data collected was equivalent to the cumulative data generated in the previous 5,000 years, and the size of this data store is doubling every two years.

Consider the following:

  • Facebook has at least 30 petabytes of data, with 30 billion pieces of content shared per month
  • YouTube users upload 35 hours of video every minute
  • Twitter delivered 25 billion tweets in 2010
  • Zynga generates 15 terabytes of data every day

None of the companies mentioned above existed seven years ago.

So we have companies that have far more data than ever before. How does this create a new paradigm? What are the ways in which commerce will be transformed? We’ve had companies exploiting large data sets with tools and techniques that have been around for decades, so what’s different now? Are we just dealing with more noise?

These are the topics that I hope we will gain some insight on in tonight’s discussion.

To get the discussion started tonight, I am highlighting three changes that I see.

#1) Data Availability: Exponentially growing volumes of data are available to everyone, companies and individuals alike. Whereas massive data sets were once the domain of well-capitalized companies, now even startups and individuals can use them to find value.

#2) Tools: Data storage and processing infrastructure and tools are becoming increasingly powerful at a fraction of the cost. Open source software, collaboration platforms, and cloud infrastructure provide mechanisms not only for collecting and storing data, but also for processing it at scale.

#3) Investment: Markets are rewarding companies on the promise that they will find value in the massive data sets they create. Have you checked out the valuation of Facebook, Twitter, or Zynga lately? To be sure, these companies are achieving astonishing growth rates, but investors are placing their bets on new ways of monetizing the data streams that these and other companies collect and control.

I would be very interested in getting my readers’ ideas on what to ask our panelists. I will try to incorporate suggestions into the flow of the discussion. For those of you who will not be able to make the session, I will be blogging on the key takeaways and insights in the days ahead.

Looking forward to seeing many of you this evening.

Software Eating Your World? Would You Like a Byte?

August 24, 2011

Last week, Marc Andreessen published an essay in the Wall Street Journal entitled “Why Software is Eating the World”.   A particular passage resonated with me:

“Companies in every industry need to assume that a software revolution is coming. …new software ideas will result in the rise of new Silicon Valley-style start-ups that invade existing industries with impunity. Over the next 10 years, the battles between incumbents and software-powered insurgents will be epic.”

Since we are all going to be software companies (eat or be eaten), I think it is important to understand how and where value creation is shifting within the software industry itself. In short, it is a tale of migration from features and functions to data, a trend that many of the enterprise software companies I work with see and understand but are slow to adapt to, primarily because it is highly disruptive to their existing business models.

To examine this shift, let’s look at how software has evolved over time.

Software as Tools

The first business applications were ones that helped individuals do tasks more effectively. Remember Lotus 1-2-3? WordStar? In short, for roughly the first decade of the PC era, software was primarily a tool you used to make some individual function or task more efficient to complete. Software was an interface that allowed us to leverage the rapid advancements in computing power and get the work we were all doing anyway done faster.

Software as a Repository of Best Practice

From point solutions that streamlined highly generic tasks, vendors began to create value (and charge for that value via licensing and services) by embedding specific business process knowledge into applications. People could move down the learning curve more quickly and be managed more effectively by having them operate in a well-defined and highly customized software environment. Software became a way of capturing and propagating best practice within an enterprise, and something that became critical in very specific functional areas. CRM was an early example of this; integrated ERP systems (HR, financial accounting, inventory management, payroll) followed quickly thereafter. The economics of SaaS/PaaS are generating a proliferation of these models across every industry vertical and functional process imaginable. All of these applications create standard work processes and control mechanisms that drive productivity and consistency.

Software as a Communication Medium

As the various parts of a business process started to be connected, and common standards for connectivity (e.g., XML) evolved, communication and collaboration became central functions integrated into applications. Indeed, communication has given rise to a new value creation mechanism for software: transaction platforms. eBay? PayPal? Skype? They didn’t make money by selling software licenses; rather, they made platforms for communication, collaboration, and validation that allowed them to earn money on the transactions they brokered.

Software as a Data Collection Mechanism

So now we arrive at data.

Is Zynga a game company or a data collection mechanism? Is Google a search engine or a data-driven advertising platform? Is Facebook a communication tool or a targeted marketing platform?

We now have access to literally millions of useful applications at little to no cost. To be sure, it is cheaper than ever to create them, but firms are also finding new ways to offer subsidized or free software because of the data they hope to compile through widespread distribution of their products. Software has become a data collection mechanism, and analytic competitors are hoarding data and learning how to make these data streams useful, both to refine their own businesses and to create value for others.

One thing is clear. Across all of the disruptive models that have dominated “bubble 2.0”, none have involved licensing fees. Indeed, the primary source of value that Mr. “no bubble here” Andreessen is so confident in is data. Whether his investments pan out will depend on whether his firms can meaningfully realize value from the petabytes they control.

What does it all mean?

If you are building a software product, you need to incorporate all of the means of generating value that I reference above.  Doing so not only maximizes value for your users, but provides you with flexibility to morph your business model for the future.

As you build, expect that at some point soon you will face a highly disruptive competitor seeking to generate profits through business models vastly different from your own.

Recognize that value is shifting towards data and that to win, you had better become great at collecting, managing, analyzing, and MONETIZING all of the data streams that you control.

So if you are in a business that charges fees for software licenses, or runs a platform that earns fees on transactions, and you have no vision or plan for how you will monetize the data you control, welcome to a decade of pain. The “Silicon Valley-style” startups that Mr. Andreessen is funding are coming to eat your world.

Stay tuned for more on data and analytics. If you are in or around San Diego and interested in the topic, be sure to check out an event I am moderating on 9/13.

Data Mining and Predictive Analytics Panel 9/13

July 25, 2011

I am pleased to announce that I will be moderating a panel of academics and executives for a discussion on data mining and predictive analytics in September. Data are such a critical ingredient in the future development and defense of profitable business models (online or off), and my aim for the talk will be to show how leaders across many industries are adapting and innovating in response to this trend.

Thus far we have confirmed the following speakers:

Dr. Stephen Coggeshall, CTO, ID Analytics

http://bit.ly/oKxaiO

Dr. Elea Feit, Research Director for Wharton’s Customer Analytics Initiative (WCAI)

http://bit.ly/nDa9rK

Scott Gnau, President, Teradata Labs

http://bit.ly/nFk8sA

Dr. Christopher Trepel, Senior Vice President Corporate Affairs and Chief Scientific Officer, Encore Capital Group

http://bit.ly/mZ5xy8

I will be posting insights and thoughts from the talk on my blog, but if you are in San Diego on 9/13 and interested in how data are transforming the competitive dynamics of multiple industries, you are most welcome to attend.  For event details and registration, click here.

Hope to see you there.