Patrick Schwerdtfeger is a motivational speaker who can speak about big data, predictive analytics and business intelligence at your next business event.
Full Video Transcript:
Hi, and welcome to another edition of Strategic Business Insights. Today we’re going to talk about big data and exactly what it is. Let’s define big data for people who aren’t familiar with it.
It starts with one progression. This is an important place to start. Historically, data was being generated and accumulated by workers. In other words, employees of companies were entering data into computer systems. But then things evolved to the Internet and now users could generate their own data. So think about websites like Facebook. So all these users are signing up and they’re entering the data themselves.
That’s larger than the first by orders of magnitude. We’re talking about scalability here. It’s scaled up from just employees entering the data to users entering their own data. So all of a sudden the amount of data being accumulated was way higher than it was historically.
Well, now there’s even a third level in this progression, because now machines are accumulating data. The buildings in all of our cities are full of sensors that are monitoring humidity and temperature and electricity usage. There are smart meters on our homes that are measuring the amount of energy that our homes are using. There are satellites around the earth that are monitoring the earth 24 hours a day, taking pictures, accumulating data.
Well, once machines are accumulating data, that’s orders of magnitude higher again than users. So there’s a progression, from employees generating data to users generating data to machines generating data. And we’re at the machine stage. So there are colossal amounts of data being generated.
Now, how has that changed things? Well, back in the good old days, people used relational databases to process data. We don’t need to worry about what that is, but essentially a major shift has taken place. In the good old days, we used to take the data and bring it to the processor, the CPU, the computer chip, to process it. But now there’s so much data that it overwhelms the CPU. It can’t do the processing because there’s too much data.
So now what people are doing is bringing multiple processors to the data. In other words, you might have a whole row of servers, each server holds some small component of the whole data set, and each one has its own processor working on its piece. It’s called parallel processing. So now the data’s being processed in a whole bunch of different places, in parallel, at the same time.
So before, data was being brought to the processor. Now, processors are being brought to the data. And what does that give you? Scale. In the first case, you bring the data to one CPU, but now you can keep adding CPUs across as many individual servers as you need. That’s parallel processing. So the data has grown by orders of magnitude, and now we have a way to process it that scales up as well. So that’s the technological shift.
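To make the idea of bringing processors to the data a little more concrete, here is a minimal sketch in Python. It is only an illustration: the "servers" are simulated with worker threads on one machine, the readings are made-up numbers, and the function names (split_into_chunks, process_chunk, parallel_total) are hypothetical, not part of any real big data product.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, num_servers):
    """Give each hypothetical server one slice of the full data set."""
    return [records[i::num_servers] for i in range(num_servers)]


def process_chunk(chunk):
    """Work done locally on one server: total up the readings it holds."""
    return sum(chunk)


def parallel_total(records, num_servers=4):
    chunks = split_into_chunks(records, num_servers)
    # Each worker thread stands in for a processor sitting next to its own data.
    with ThreadPoolExecutor(max_workers=num_servers) as pool:
        partial_totals = list(pool.map(process_chunk, chunks))
    # Only the small partial results travel to one place to be combined.
    return sum(partial_totals)


readings = list(range(1, 101))  # made-up meter readings, assumed for the example
total = parallel_total(readings)
print(total)
```

The point of the design is that the big data set never moves to a single processor. Each piece is processed where it sits, and only the small results are brought together at the end.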
Now, let’s talk about some of the technologies that are allowing this to happen, and there are two that I want to mention here. One is called Hadoop. What’s Hadoop? Well, Hadoop is an open-source platform. Open-source means the source code is public and the license lets anyone use and modify it. Hadoop is released under the Apache License. It’s developed by developers all across the world and it’s free to use.
Now, there are plenty of other examples. Linux is open-source. Some of the website-building platforms, what they call content management systems, are open-source, like WordPress, Drupal and Joomla. The Apache web server is open-source. And the fact that it’s free is a little bit of an illusion because you need experts to really understand how to use it, how to implement it, how to customize it for the specific usage that you need. But the basic infrastructure of Hadoop is open-source, which means it’s free, and it organizes this parallel processing. It’s the software that allows that to happen.
And the second thing is called MapReduce, which is the way the work gets split up and the answers get combined. Each server runs the same small job on its own piece of the data, that’s the “map” step, and produces a summary, basically a CliffsNotes version of what it found in the data it contains. Then all those summaries go to one central place where they get combined into a single answer, and that’s the “reduce” step. So you can ask a question and be like, “Boom, okay, here’s the answer,” without ever moving the whole data set. That’s something that’s being done through MapReduce and Hadoop together. They work together. So those are technologies that are driving big data.
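As a rough illustration of the MapReduce idea, here is a small sketch in plain Python: each server produces a small summary of its own data (the map step), and those summaries are combined in one place (the reduce step). This is not Hadoop code. Real Hadoop jobs are typically written in Java against Hadoop’s own API, and the sample text, the servers list and the function names here are all assumptions made up for the example.

```python
from collections import Counter
from functools import reduce

# Hypothetical data: each inner list is the text stored on one server.
servers = [
    ["big data", "data everywhere"],
    ["machines generate data"],
    ["big machines"],
]


def map_step(lines):
    """Runs on one server: count the words in the lines that server holds."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def reduce_step(left, right):
    """Combine two per-server summaries into one."""
    return left + right


# Map: every server summarizes its own data independently.
summaries = [map_step(lines) for lines in servers]

# Reduce: the small summaries are merged into a single overall answer.
word_counts = reduce(reduce_step, summaries, Counter())
print(word_counts.most_common(3))
```

Counting words is the usual first example because the per-server summaries are tiny compared with the raw text, which is exactly why combining them centrally is cheap.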
Now, who’s at the cutting edge? Google. Google’s at the cutting edge of so many things. Now, think about how much data is being accumulated by Google, not just in the search capacity and all the websites that they’re indexing, but also in all the other services that they offer like Google Analytics, Google AdWords, Google AdSense, Google Voice, Google Talk. I mean, there are so many different services that Google provides. They’re accumulating colossal amounts of data, and they know that there is a huge opportunity in that data. There are hidden wants and needs in that data that they can get into and develop algorithms providing profitable services that they can sell. They’re going to find ways to monetize this data.
And that’s not a bad thing. The bottom line is that services and business in general are going to become far more intuitive in the years to come because of data and how people are processing this data. So this is all a good thing. It’s going to make our world even better and a lot more fun to be a part of.
But this is a summary to give you an idea of what big data is. The data accumulation has scaled up, our ability to process the data has scaled up, and now companies like Google and many others are using these processes together to discover unbelievable insights that are hidden inside all of that data.
I hope that this video has helped you understand it a little bit better, get your head around what big data is and maybe allow you to speak a little bit more intelligently on the topic when it comes up in your world. Thanks for watching this video. My name is Patrick, reminding you to think bigger about your business, think bigger about your life.
NOTE: Patrick has a keynote program entitled Monetizing Big Data which he updates regularly with new case histories and offers at business conventions and conferences around the world.