Healthcare Big Data: Mining Vs. Discovery
Sir Walter Raleigh’s military attempts to forcibly take over gold mines in South America from the Spanish, and to find new mines, both ended in failure. When he turned his attention to the unexplored area of North America, which he named Virginia in honor of his virgin queen, Elizabeth I, he again failed to find new mines. Instead, he returned to the Queen with two plants unknown in the Old World: tobacco and potatoes. Although Her Majesty was not amused, the newly discovered plants were to have a far greater economic effect than Spain’s gold.
The typical procedures for handling “big data”, in healthcare as in other fields, are customarily referred to as “data mining”. The name is appropriate but self-limiting. The healthcare industry now has an immense universe of data at its disposal, possibly more than any other industry. Each month, new data from pharmacies, insurance carriers, government agencies, journals, hospitals, and data aggregators flows out of their data warehouses. The data is also more complex than in other industries, with a typically large number of fields in each record and multiple identifying numbers for each entity. As McKinsey noted in its report (“The ‘big data’ revolution in healthcare”, Jan 2013), “…increases in data liquidity have brought the industry to the tipping point,” concluding: “Big-data initiatives have the potential to transform healthcare, as they have revolutionized other industries… Healthcare stakeholders that take the lead in investing in innovative data capabilities and promoting data transparency will not only gain a competitive advantage but will lead the industry to a new era.”
This type of data certainly contains plenty of nuggets of useful information to be mined, but it also holds a treasure trove of often unexpected information for those open to the concept of data discovery. Handling and analyzing such large and complex volumes of data requires sophisticated tools, careful planning, and skilled analysts, so projects undertaken to mine healthcare data tend to be expensive. A sophisticated database infrastructure, whether based on a traditional SQL DBMS or a newer technology such as NoSQL or Hadoop, must first be implemented to hold the data, and specially trained data analysts must be engaged to process it. It therefore seems prudent to give such projects well-defined goals and a limited scope.
In practice these goals vary widely: identifying leaders in a particular field, finding trends in treatments, correlating multiple medical factors, and linking individuals and organizations together. But although setting tightly defined goals is sound, standard business practice and seems wise and logical, it means that much insightful information can be missed. There is a wealth of useful information buried in that mountain of data, and, as the old adage goes, “We don’t know what we don’t know.”
In one recent engagement, we were asked to undertake just such a project, with a well-defined set of goals: extracting information about a large number of key individuals and the links between them. We accomplished the main goal, finding seventy-seven thousand healthcare providers closely linked to key practitioners. But, to the client’s delight, our proprietary data discovery tools also revealed very interesting trends and correlations during the analysis. These were outside the project scope but inexpensive to pursue once the data analysis was under way. From the mountain of data we uncovered not just the links to key providers but also their affiliations with thousands of institutions. Interestingly, we also found that Nurse Practitioners were increasingly treating patients for a particular condition, while the traditional specialists were treating it less and less: Nurse Practitioners had gone from prescribing 12% of the key drug to 14.5% within just two years.
So, to really make use of this expensive data, an additional approach should be considered: one of simple exploration and discovery. Given the costs involved, a good way to use resources efficiently is to run hybrid projects that combine a set of defined goals for what is to be pulled from the data with an exploratory component. Of course, general guidelines must be given as to what types of new discoveries are of interest, and the tools and analyst team must be carefully chosen for their exploratory talents.
Snowfish can help optimize your approach to big data, using our purpose-built software tools and extensive experience. We have worked with over two dozen life sciences companies for more than a decade, discovering valuable insights in healthcare databases. We have also developed services such as clinical data gap analysis, KOL identification and mapping, strategic partner identification, and healthcare provider optimization, all based on analyzing and adding value to large healthcare databases. If you are interested in learning more about Snowfish’s industry-leading approach to healthcare data discovery and mining, please feel free to reach out to us.