
The Magic of Data Science (Part 2)

By Prad Upadrashta | September 12, 2019

A two-blog series on what data science can do, with real-world examples.

Continuing from where I left off, here is another example from my prior experience of what data science can do.


This is about my work with one of the largest renewable energy providers in the country, with a $2.5 billion asset base. I examined their operational data, mostly handwritten technical field notes, and processed it using NLP techniques to extract and structure the underlying failure statistics. From those statistics I developed a set of reliability models that could predict when specific sub-components of a $1.5MM wind turbine would fail. We essentially had a model for each of the 66 or so line replaceable units (the level at which the procurement organization acquires parts for inventory).
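To make the idea of a per-component reliability model concrete, here is a minimal sketch of one common approach: fitting a Weibull life distribution to a component's failure ages by median-rank regression. This is an illustration only, not the method used on the project (which the post does not spell out). The function name, the sample ages, and the assumption that every unit in the sample has already failed (no censoring) are all mine.

```python
import math


def fit_weibull_median_rank(failure_times):
    """Estimate Weibull shape (beta) and scale (eta) by median-rank regression.

    Assumption: every unit in the sample has failed (no censored data).
    """
    times = sorted(failure_times)
    n = len(times)
    if n < 2:
        raise ValueError("need at least two failure times")

    xs, ys = [], []
    for i, t in enumerate(times, start=1):
        # Benard's approximation to the median rank of the i-th failure.
        median_rank = (i - 0.3) / (n + 0.4)
        xs.append(math.log(t))
        ys.append(math.log(-math.log(1.0 - median_rank)))

    # Ordinary least squares on the linearized Weibull CDF:
    # ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    beta = sxy / sxx
    intercept = y_mean - beta * x_mean
    eta = math.exp(-intercept / beta)
    return beta, eta


# Hypothetical failure ages (in years) for one line replaceable unit.
sample_ages = [2.1, 3.4, 4.0, 4.8, 5.5, 6.1, 6.9, 7.7]
shape, scale = fit_weibull_median_rank(sample_ages)
print(f"shape (beta) = {shape:.2f}, scale (eta) = {scale:.2f} years")
```

A fitted shape above 1 indicates a failure rate that rises with age, which is the wear-out behavior a maintenance planner cares about. Real field data would also contain units that have not failed yet, and those need a censoring-aware fit.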

The most interesting and unexpected result was that my data-driven failure models for the gearbox (the most expensive component) reflected the expected failure rate estimated from an independently developed physics model of the gearbox.

A secondary discovery was that they relied on two vendors for one of their electronic components, a motherboard, and that the two classes of motherboard had different useful lives, which was apparent from the data (nearly double for the more expensive board). This was determined blindly, using only the data, as were a few other intimate details of their operations. The customer remarked, “How did you figure that out? Who gave you our operating curve and forecasts?” Of course, their data told me the whole story.

I extended these results to an entire fleet of more than 1,200 wind turbines on one of the largest wind farms in the country. This enabled more streamlined utilization of resources and correct inventory stocking levels for critical spares, and it freed up working capital for the company by eliminating excess inventory. They literally revised their entire 2015 operating plan on the basis of my recommendations. Cranes for O&M repair are expensive and typically charged by the hour; meanwhile, the company had contractual power generation obligations to meet, so eliminating downtime was a key concern.
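As a rough illustration of how a failure forecast turns into a stocking level, the sketch below sizes a spares pool so that demand during the resupply lead time is covered with a chosen probability. It assumes failures arrive as a Poisson process, which is a textbook simplification and not necessarily what was done on the project. The function names, the failure rate, and the lead time are hypothetical.

```python
import math


def poisson_cdf(k, mean_demand):
    """Return P(D <= k) for Poisson-distributed demand D."""
    term = math.exp(-mean_demand)  # P(D = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean_demand / i  # P(D = i) from P(D = i - 1)
        total += term
    return total


def spares_needed(mean_demand, service_level=0.95):
    """Smallest stock level s with P(demand <= s) >= service_level.

    Intended for modest mean demand; very large means would need a
    numerically sturdier CDF than this simple running sum.
    """
    if mean_demand < 0:
        raise ValueError("mean_demand must be non-negative")
    if not 0 < service_level < 1:
        raise ValueError("service_level must be between 0 and 1")
    stock = 0
    while poisson_cdf(stock, mean_demand) < service_level:
        stock += 1
    return stock


# Hypothetical inputs: fleet size, annual failure rate per turbine for one
# component, and the resupply lead time in years.
fleet_size = 1200
annual_failure_rate = 0.02
lead_time_years = 0.25
expected_demand = fleet_size * annual_failure_rate * lead_time_years
print(spares_needed(expected_demand, service_level=0.95))
```

Stocking to a target service level rather than a fixed rule of thumb is what lets excess inventory be identified: any stock above the computed level for a component ties up capital without materially reducing the risk of a stock-out.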

This work freed up millions of dollars in working capital. The “Aha!” moment came to me at 2:30am, while poring over the failure statistics as part of my EDA process. While I was able to predict the failure curves from my data, I was looking for a way to back-calculate the industry benchmark numbers, using published OEM data on the expected age of the asset, to reconstruct a baseline in a rather clever way. I essentially devised a novel way to estimate the budget shortfall that arises from underestimating the failure rates in the accelerating phase of the bathtub curve.
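The post does not describe the actual shortfall calculation, so the following is only a sketch of the general idea under assumptions of my own: wear-out is modeled with a Weibull cumulative hazard (shape greater than 1), failed parts are replaced so the fleet size stays constant, and the budget baseline assumes a flat failure rate. All names and numbers are hypothetical.

```python
def weibull_cumulative_hazard(age, beta, eta):
    """Cumulative hazard H(age) = (age / eta) ** beta for a Weibull life model."""
    return (age / eta) ** beta


def expected_failures(fleet_size, age_start, age_end, beta, eta):
    """Expected failures across the fleet as assets age from age_start to age_end.

    Assumption: each failure is repaired to its pre-failure condition, so the
    expected count per asset is the increase in cumulative hazard.
    """
    growth = (weibull_cumulative_hazard(age_end, beta, eta)
              - weibull_cumulative_hazard(age_start, beta, eta))
    return fleet_size * growth


def budget_shortfall(fleet_size, age_start, age_end, beta, eta,
                     baseline_rate, cost_per_failure):
    """Cost gap between a wear-out forecast and a flat-rate budget baseline."""
    wearout = expected_failures(fleet_size, age_start, age_end, beta, eta)
    baseline = fleet_size * baseline_rate * (age_end - age_start)
    return (wearout - baseline) * cost_per_failure


# Hypothetical example: 100 assets aging from year 8 to year 10, budgeted at
# a flat 2% annual failure rate, with each failure costing 1,000 to repair.
gap = budget_shortfall(fleet_size=100, age_start=8, age_end=10,
                       beta=2.0, eta=20.0,
                       baseline_rate=0.02, cost_per_failure=1000)
print(f"estimated shortfall: {gap:,.0f}")
```

With a shape of exactly 1 the Weibull model reduces to a constant failure rate of 1/eta, and the shortfall against a baseline of the same rate is zero. The gap opens only once the shape exceeds 1 and the fleet is old enough for the rising hazard to outrun the flat assumption.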

In a bathtub curve, the accelerating phase is the wear-out period, when the failure rate begins to rise sharply because of mechanical wear. I sent out an email of cautious triumph to the VP who owned the deal/solution, and it ended up going viral inside the company. No one had previously achieved this result, and it effectively created a whole new methodology for predicting and estimating multiple failure modes from unstructured field data. The customer openly admitted, “You were able to do more with our data than we have in 5 years of operating which is remarkable.” Our work was featured by the customer at the ARC Industry Forum (2015) as an example of how IoT and asset optimization were creating value for OEMs and operators alike.

The solution now underpins their “operational excellence” program. My work has direct implications for hundreds of millions of dollars in O&M cost liabilities at the fleet level.

Data science is never easy or straightforward. The data are rarely given to you in the form you need, certain groups sometimes actively exclude or withhold data from you for political reasons, and the results are never guaranteed. But when things work your way, it is magical, and there is nothing quite like the thrill of discovery.

This re-emphasizes a point I have made previously: in the business of data science, you need to be comfortable with vagueness and uncertainty as your constant companions, and your job is to create clarity while tangibly improving the business and/or bottom line for your customers.

Before my pricing algos were deployed, I had to defend them to a room full of senior pricing managers who were dead set on believing that they knew their customers better than I did — and I had to win hearts and minds, with data. Not just once, but multiple times every week over a 1-year period.

Before my reliability models were deployed, I had to defend them to a room full of engineers who had spent their careers studying wind turbines. I was the clueless one. On that note, don’t ever call a wind turbine a “windmill”, unless you expect the business to throw you out the nearest window.

Every step of the process requires that you shape and structure a solution that overcomes numerous limitations, hurdles, or challenges: technical, logistical, and/or political.

I have had the good fortune of doing a lot of other cool/fun things since those early projects, but there’s always a soft spot in your heart for the first big ones, which will choke you up every time.

The author, “Prad” for short, is a senior analytics executive and experienced data science practitioner with a distinguished track record of driving AI thought leadership, strategy, and innovation at enterprise scale. His focus areas are Artificial Intelligence, Machine/Deep Learning, Blockchain, IIoT/IoT, and Industry 4.0.

Prad Upadrashta

Chief Data Science Officer

Pradyumna, or Prad, is a senior analytics executive and data expert who has driven AI thought leadership, strategy, and innovation at enterprise scale.