When I last visited the topic of Big Data (BD) and Analytics, I proposed that Big Data could easily become a
wasteland for health providers and the next EHR boondoggle that could generate wads of cash for system
vendors. I noted that a large investment in Big Data could easily go for naught if we do not pay attention to at
least two key issues: employing bad data as a foundation, and blindly accepting analytics or
mathematical models that do not correctly represent your world.

I received several responses to that piece, some stating that I was opposed to Big Data and Analytics. Not
true. As a onetime practitioner of analytics, back when it was called operations research in commercial
industry, I saw firsthand the value of BD, but also the very large expense and pitfalls. At the close of my
first writing I promised to follow up with a list of safety checks you should employ to avoid drowning in the
big data ocean. Here they are.

    1. Bad data. Big data and bad data do not mix. Before you jump in you should get clear answers to
    these questions. Do you thoroughly understand what is in your data? How old is it? Where and how
    was it originally generated? What coding structures were used? How have the coding structures
    changed over time? How many system conversions and mutations has the data gone through? What
    is the consistency and integrity of your data?
    Scrubbing your data is critical, particularly if it goes back several years and/or spans different
    information systems. A recent HISTalk piece written by Dan Raskin, MD covered this topic
    well. If you can’t answer these questions before you apply analytics, then all the conclusions you
    draw from your sophisticated analytics will rest on a foundation of quicksand. And be aware,
    scrubbing historical data can be very time consuming and costly, which leads us to the next safety
    check.

    2. Focus. Keep your focus as narrow as possible. When you jump in the BD ocean, keep your eyes
    on that floating life preserver. If you do not, you’ll get overwhelmed and sink fast. Most big data
    projects will fail because they try to do too much or are too broad in their goals, which leads to
    loss of control, missed target dates and over-budget situations. It’s very easy to fall into this
    riptide. For example, with a sea of data at our disposal we surely should be able to predict census
    or institution-wide patient volumes for the next five or ten years. The complexity of such an
    analytical model could easily overwhelm. As an alternative, try something more restricted and
    focused. For example, try predicting volumes for just one narrow specialty practice, or identifying
    the three primary causes of re-admits. With a narrow focus the probability of your model being
    useful will be far greater, which takes us to our next safety check.

    3. Validate your model. Run simulations against past time periods with known outcomes. Did you
    get the answer you expected? If not, revise or replace the algorithm(s). Smaller models are easier to
    validate. Apply basic common sense to any prediction. Remember the end user, usually an
    executive or physician group, must buy in to the model logic and have full trust in the data before
    they can accept any predictions. If they do not understand it, they will not trust the forecasts and
    the model will never be used. Once smaller models are validated you can link multiple ones together
    to create larger organization-wide models.

    4. Change can sink your analytics. One of the primary reasons to apply models to big data is to
    predict change, then use that new knowledge to deal with the change before it becomes a problem.
    Unfortunately, there are some changes that your historical big data can’t predict. You need to
    understand them and factor them into any decisions you make. For example, can your model
    anticipate changes within the practice of medicine? Medical protocols change almost every month
    due to new research and new technologies. Hardly a week goes by without reading about a new
    protocol for medications, diagnostic testing, or chronic disease management. Your ocean of big
    data cannot predict these changes, and yet if you are planning a new medical service you need to
    somehow factor in these elements.

    Another very unpredictable element is government regulations. A good deal of industry change will
    be driven by which party wins each election. Today it’s MU, ACOs, P4P, value based purchasing, and
    many other regulations that did not exist five years ago; tomorrow it will be something else. If you
    can predict those changes, you probably would do better in another profession. The analytics and
    models you build will only reflect past practices and governmental policies and, like they say on
    Wall Street, past performance may not be indicative of future results. In model building these are
    known as ad hoc or exogenous variables. You take the model’s output, then make a one-time ‘swag’
    adjustment to reflect your best guess for exogenous factors.

    5. Pick the low hanging fruit first. There are two major kinds of analytics: strategic models and
    operational models. Strategic analytics try to predict enterprise-wide outcomes and volumes five to
    ten years out. They focus on questions such as: What are the population trends in our market?
    What patient programs should we be moving towards? Can they be financially viable? Where should
    they be located? What are the competitive factors?

    Operational models deal with more immediate issues, such as: How can we handle higher patient
    volumes using fewer resources? What can we do to reduce re-admits? What is the ROI on a large
    capital investment? They are by nature near term and usually address efficiency questions.

    Due to their complexity and time horizon, strategic analytics are tough to measure in terms of
    efficacy. Operational models are far easier to measure, while strategic models are more ‘sexy’ and
    costlier to build. Until you have had repeated good results with operational models you should stay
    away from strategic models. The low hanging fruit is in operational analytics. Moreover, there are a
    myriad of them that could quickly generate real ROI and may only require ‘little data’.

    6. Paralysis by analysis. You could spend a long time drifting in the big data ocean, and paralysis
    by analysis could easily set in. Remember, there will always be flaws in your historical data, and no
    model can be perfect, so do not let perfection become the enemy of good. This is not an academic
    exercise and you do not have an unlimited budget. All analytics need to be improved, so do it
    incrementally. Lastly, if after many iterations and revisions the model still does not make sense to
    you based on your real life experiences, toss it out and move on.

    7. Educate and understand. What problems are you really trying to solve? Many organizations
    waste time and money building models for problems they really do not have or understand. Due to
    ‘hype’, department managers come to believe the model will fix operational problems. Department
    managers need to be trained in how to use and interpret these powerful tools. Understand what the
    tool can and can’t do, and what the real limitations of the model are. This step must come first or
    analytics projects can easily run amok.

    If you use outside resources, make sure they understand the health care industry and your particular
    venue. Being expert in quantitative tools is not enough; having a sound footing in the complex
    relationships that drive the delivery of patient care is critical to the success of employing analytical

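Safety check 3 lends itself to a small illustration. The Python sketch below replays a forecast against past periods with known outcomes, using only the data that would have been available at the time, and reports how far off it was on average. The moving-average "model", the function names, and the monthly visit counts are all made up for illustration; substitute your own model and your own scrubbed history.

```python
from statistics import mean


def moving_average_forecast(history, window=3):
    """Hypothetical stand-in model: next period = mean of the last `window` periods."""
    return mean(history[-window:])


def backtest(volumes, forecast, min_history=3):
    """Walk forward through past periods, forecasting each one from prior data only."""
    results = []
    for t in range(min_history, len(volumes)):
        predicted = forecast(volumes[:t])   # the model never sees period t or later
        actual = volumes[t]                 # the known outcome
        pct_error = abs(predicted - actual) / actual
        results.append((t, predicted, actual, pct_error))
    return results


def mean_absolute_percentage_error(results):
    """Average size of the miss, as a fraction of the actual value."""
    return mean(pct_error for _, _, _, pct_error in results)


# Made-up monthly visit counts for one narrow specialty practice (illustration only).
monthly_visits = [410, 425, 398, 440, 452, 431, 467, 480, 455, 490, 502, 478]

results = backtest(monthly_visits, moving_average_forecast)
mape = mean_absolute_percentage_error(results)
print(f"periods tested: {len(results)}, average miss: {mape:.1%}")
```

If the average miss is larger than the end users could live with, that is the signal to revise or replace the algorithm before anyone is asked to trust its forecasts.
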
The annual budget is an excellent example of an operational model. Before you jump into BD, take this
test. How effective is your organization at budgeting? How close do you routinely come to hitting budget
targets? Have you used variable budgeting successfully? If you can’t answer these questions positively,
you are not ready to swim in the BD ocean. Big data and analytics can be powerful tools when used with
foresight and care. Applying BD without clearly identifying your objectives, knowing the
weaknesses of your data, and understanding the limits of mathematical modeling or analytical tools will
be a costly and fruitless exercise.
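
The budgeting test above comes down to simple arithmetic, sketched here in Python. The 5% tolerance, the function names, and the quarterly figures are assumptions made up for illustration, not a standard; pick the tolerance your own organization considers a "hit". The variable budget is shown in its usual textbook form, a fixed cost plus a per-unit cost times the volume actually seen.

```python
def percent_variance(budget, actual):
    """Signed variance as a fraction of budget; positive means over budget."""
    return (actual - budget) / budget


def flexible_budget(fixed_cost, variable_cost_per_unit, actual_volume):
    """Variable (flexible) budget: restate the budget at the volume actually seen."""
    return fixed_cost + variable_cost_per_unit * actual_volume


def hit_rate(budgets, actuals, tolerance=0.05):
    """Share of periods where the actual landed within `tolerance` of budget."""
    hits = sum(
        1 for b, a in zip(budgets, actuals)
        if abs(percent_variance(b, a)) <= tolerance
    )
    return hits / len(budgets)


# Made-up quarterly expense figures (illustration only).
budgets = [1_000_000, 1_050_000, 1_100_000, 1_150_000]
actuals = [1_030_000, 1_120_000, 1_090_000, 1_260_000]

for quarter, (b, a) in enumerate(zip(budgets, actuals), start=1):
    print(f"Q{quarter}: variance {percent_variance(b, a):+.1%}")
print(f"share of quarters within tolerance: {hit_rate(budgets, actuals):.0%}")
```

An organization that routinely misses by wide margins on a model this simple has its answer about readiness for anything larger.
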

This article was first published in HISTalk on 11/21/2013.

Frank Poggio
The Kelzon Group
Copyright 2013,
All rights reserved
Seven Safety Checks before You Dive into the Big Data Ocean
Meet the author, Mr. Poggio, at HIMSS2015 in Chicago and hear his presentation:
"Seven Safety Checks before You Dive into the Big Data Ocean"

To be presented April 15, 2015 - 2:30PM, HIMSS Education Session #181