I am a Texan, and famously we Texans like to keep things simple and uncomplicated. Understanding the core tenets of artificial intelligence is possible without knowing all the technical stuff.
It’s important that industry terms like “Big Data” and “Cloud Computing” be made explainable for any audience, no matter what their background or level of technical experience.
As a pediatric intensivist, I work in a pediatric intensive care unit (PICU) caring for children with life-threatening diseases, and technology and data are critical to my ability to do that job.
Additionally, I’m an informatician, meaning I have done some extra training to better understand how to effectively use information as a tool.
My career has brought me to the point where I feel confident taking a reasonable shot at explaining these concepts to newcomers in the field, and I'm going to use a couple of metaphors to unpack them, or at least give you an entertaining read.
What is Big Data? A Texan’s explanation
In real terms, the answer is: “It’s the biggest, most complex set of data you can actually work with.” But of course, it’s not quite that simple. There are two implications buried in this statement:
1. If you’re not able to get value from the data you have, it’s worthless to you.
2. There’s always a bigger data set. (Hey – I’m Texan, so yes, I’m talking about size!)
Big Data is a truck: When you’re a child, your truck has pedals. You can handle that, and you happily move rocks around in the sandbox. But you can’t work dad’s truck.
Then you mature, and you learn how to drive dad’s truck, and you can haul furniture. But you can’t work the 18-wheeler yet. So, you go to driving school and learn how to drive an 18-wheeler, and you can haul lots of stuff, but you can’t load it by yourself. You need help to do that.
Then you realize you can’t work those massive mining trucks with tires 15 feet tall. What do you do? Well, of course you go get trained to drive those monsters…only to find out you can’t even drive one by yourself, much less load it. To get those trucks on the highway, you need a logistical support group that has mapped out the route, timed the journey to minimize traffic disruption, and arranged escorts from multiple teams.
With each step, the equipment is bigger, more complicated to operate, and more expensive. But with each step, you can do so much more.
The practical question isn’t “What is Big Data?” but rather “How Big does my Data need to be?”
In a competitive market with lots of trucks for sale, it is tempting to go buy the biggest, most expensive truck available. But that truck will only be a good investment if you have the right team of people to put it to good use – in the world of data, the team is much more important than the truck.
The team needs to have expertise at every step where data is involved. “Big Data” activities typically focus on the core tasks: acquisition, storage, retrieval, analysis. But often the biggest challenge is visualization and presentation, where “data” are transformed into “stories” to communicate with the outside world.
Now that we’ve got our Big Data, we need to work with it – Texas style.
This is where The Cloud comes in – a Texan’s analysis
In today’s world, it is common to have more data than a single device has the resources to manage. For instance, Google processes about 40,000 searches every second.
To find out how many times “Big Data” was searched for in 2017, a typical desktop computer would take hours to scan a list of all 1.2 trillion searches made that year. Using cloud technology, however, a much bigger virtual computer can be created almost instantly to accomplish the task in seconds.
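As a quick sanity check on those numbers, here is the back-of-the-envelope arithmetic in Python (a sketch: the 40,000-searches-per-second figure is the rough estimate quoted above):

```python
# Rough arithmetic, assuming ~40,000 Google searches per second.
searches_per_second = 40_000
seconds_per_year = 60 * 60 * 24 * 365        # 31,536,000 seconds
searches_per_year = searches_per_second * seconds_per_year

print(f"{searches_per_year:,}")              # about 1.26 trillion
```

That lands right around the 1.2 trillion searches mentioned above, so the two figures hang together.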
When the task is done, the computing power from the virtual computer is returned to the pool of available resources so that another virtual computer can be created to accomplish the next task (usually for someone else).
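The divide-and-conquer trick behind that big virtual computer can be sketched in a few lines: split the work into slices, hand each slice to a separate worker, run them all at once, then add up the answers. This is a toy illustration, with a hypothetical mini search log standing in for Google's and worker processes on one machine standing in for a fleet of cloud machines:

```python
from multiprocessing import Pool

def count_term(chunk, term="big data"):
    # Each worker scans only its own slice of the search log.
    return sum(1 for query in chunk if term in query)

def parallel_count(queries, workers=4):
    # Split the log into one slice per worker, count in parallel, combine.
    size = max(1, len(queries) // workers)
    chunks = [queries[i:i + size] for i in range(0, len(queries), size)]
    with Pool(workers) as pool:
        return sum(pool.map(count_term, chunks))

if __name__ == "__main__":
    # A made-up log of 4,000 searches; half the repeated block matches.
    log = ["big data trends", "cat videos", "big data jobs", "weather"] * 1000
    print(parallel_count(log))  # 2000
```

With four workers, each scans a quarter of the log at the same time; the cloud does the same thing, just with thousands of machines instead of four processes, and hands the machines back when the sum is done.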
Rather than everyone purchasing their own “really big computer,” cloud customers pay only for the resources they use, when they use them. In this way, customers can spend a couple of hundred dollars to have access to computational resources that would cost hundreds of thousands of dollars to buy outright.
The overheads necessary to keep purchased equipment running and secure are expensive. Beyond the purchase price, there are costs associated with electricity, human resources to keep the system’s software up-to-date, and technician costs to troubleshoot and repair defective equipment – to name a few.
In addition to the cost, there is also the issue of quality. When it comes to supporting purchased equipment, it costs more to provide “perfect support” than it does to provide “adequate support.” There is a diminishing return on investment, so businesses tend to settle for somewhere between these two levels.
For cloud providers, though, “perfect support” IS their job, so the quality of the cloud solution is typically going to be better than what can be achieved with owned resources.
A Texan’s Metaphor for Cloud Computing
Cloud Computing is a shipping company: When your stuff fits into a single shipping container, you can keep that on your property and keep it secure with a fence and a light. When you add a second container to hold more stuff, you have to extend your fence and buy more lights.
When you get 10 containers, you need to set up operations in a dedicated facility and hire a security guard, and you need to think about buying some heavy equipment to move containers around to adapt to changing needs.
With 100 containers, you’re spending a lot of money and energy managing the containers, and you risk losing sight of the fact that your business is managing the stuff inside the containers, not the containers themselves.
If you are suddenly tasked with managing 1,000 containers’ worth of stuff, you’d have to hire a shipping company to manage them, leaving you free to worry about their contents. Without outsourcing, it would be almost impossible to scale to that demand in the time required.
The bottom line is: focus on what you’re good at and make sure you’re using the right tools to do the job you need to do. When it comes to Big Data, Cloud Computing is usually the tool you need.
Also, the bigger the job, the more important the team becomes. Yes, the tools grow too, but the team is what makes them work.
So there it is: Big Data and Cloud Computing as seen through the eyes of a Texan striving to transform the technical into common sense, by way of metaphor. I hope you techies out there will forgive me the liberties I have taken in presenting these topics.
Books are devoted to each of these areas and I have a great deal of respect for those who are practitioners in the field, especially those of you engaged in putting these tools to use in medicine where they are needed most, but where they are hardest to use.
This article originally appeared in AIMed Magazine issue 04.
Curt Kennedy is a pediatric intensivist at Texas Children’s Hospital in Houston, Texas. His primary interest is in prediction modeling using time series analyses that characterize deteriorations preceding many cardiac arrests in critically ill patients. His secondary interest is in automated screening for and detection of potentially actionable problems and the prevention of avoidable harm in critically ill patients. His aim is to equip bedside caregivers with decision support tools that bring them meaningful information across multiple channels, making it easy to use in all cases and hard to miss in cases where stakes are high. Curt personally codes all facets of the decision support platform, including: automated data extraction from the clinical interface of the Epic EMR (an admitted hack, but a current necessity), regular expression parsing to translate quasi-formatted text into relational data, SQL import/export routines, and the creation of multiple channels of decision support output, including web, email, pager, and SMS text messaging.