So why is this suddenly such a big deal? Hasn’t all this AI / ML stuff been around for a long time?
Well, in theory, yes, but the devil is in the scalability. Scalability basically means “how much data can you process, how long does it take, and how much does it cost?”
Many of these Machine Learning ideas are not new; the basics have been around since the 1950s. But for most of that time it wasn’t practical to implement them the way we do now. General-purpose programs were a lot easier to write, required a lot less physical computing hardware to run, and got the job done, mostly. ML was REALLY expensive, essentially prohibitive in cost, due to the massive amounts of data and computing required to produce anything even remotely accurate. I mean, this beast is HUNGRY! It wants to eat data and CPU cycles to achieve all the accuracy it can, which means it wants to eat your bank account.
But then some things happened, and the rules changed.
Thing 1: BIG DATA!!!
Various chained technology explosions since the creation of the internet have created a LOT of data over the past two decades. The internet has been around since the 1980s and has been in heavy use by a lot of people since around 2000. An appreciable amount of what humans know and do has now migrated to the internet. Wikipedia has almost 7 million articles.1 There are currently almost 2 billion websites2 and hundreds of zettabytes of data.3 Knowledge, art, social interactions, financial transactions, entertainment – you name it – it’s online now. And because these petabytes of data needed to be managed, the tech industry developed methods to manage and query all that data cost effectively. Remember that “Big Data” thing that was all the buzz a few years back? This is that.
And thus, all of a sudden, this massive amount of data was available to train machine learning models. Tools like Amazon S3, Presto, Spark, and Snowflake can easily manage data volumes that would have been impossible to even imagine a few years ago.
In 1999 I worked on a data warehouse that was 600 gigabytes in size. At that time, it was one of the largest data warehouses in the world, so we were invited to conferences to talk about how we pulled it off. Now, the hard drive on your computer is bigger.
Places like Google and Facebook have exabytes of data under management. That’s an increase of 1,666,666,666% over my 600-gigabyte data warehouse. Even a small company like Vibrant Planet can manage petabytes of data easily.
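That percentage sounds made up, so here’s a quick back-of-the-envelope check. It pencils out if we assume “exabytes” means roughly 10 exabytes (an assumption for illustration, not a published figure):

```python
# Back-of-the-envelope: how much bigger is ~10 exabytes than a
# 600-gigabyte data warehouse? The 10 EB figure is assumed for
# illustration only.
warehouse_1999 = 600e9   # 600 gigabytes, in bytes
big_tech_today = 10e18   # ~10 exabytes, in bytes (assumed)

ratio = big_tech_today / warehouse_1999
print(f"{ratio:,.0f}x bigger, or about {ratio * 100:,.0f}%")
# prints: 16,666,667x bigger, or about 1,666,666,667%
```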
Thing 2: $$$Targeted Advertising$$$
As the internet really got going, a few large companies learned how to use some of the vast amounts of data they were gathering to better target online advertisements to their users. Facebook and Google probably get top billing here, however they are not alone. It turned out that online advertising was the perfect incubator for a technology like machine learning.
Targeted advertising has also been around for a while, but in the past it was barely targeted at all: general programming or simple statistics, using relatively coarse-grained and obvious factors. Age. Zip code. Ethnicity. Those things don’t really tell us much about a person, not really, and hence this early kind of targeted advertising didn’t work well. Well enough to be profitable, but with lots of room for improvement. And places like Facebook and Google were reaching so many people that ANYTHING that improved their advertising effectiveness EVEN A LITTLE made those companies a metric shit ton of money.
If you serve a trillion ad impressions a year and you manage to improve the outcome for your advertisers enough to charge 0.01 cents (one one-hundredth of a penny) more for each impression, you just made a hundred million dollars.
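That math really does check out, as a two-line sketch:

```python
# One one-hundredth of a penny is $0.0001.
impressions_per_year = 1_000_000_000_000  # one trillion
extra_per_impression = 0.0001             # dollars

extra_revenue = impressions_per_year * extra_per_impression
print(f"${extra_revenue:,.0f}")  # prints: $100,000,000
```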
So ML didn’t have to be perfect. This wasn’t a driverless car running over granny and dragging her down the road when it messes up; it was forgiving. It just had to be a tiny bit better than the way it was done before. This created the perfect flywheel for iteratively improving the techniques of ML over decades, as they were refined over and over to squeeze every last dime out of those cat pictures and viagra searches. Companies like Facebook and Google invested billions of dollars into making ML better and made billions of dollars more off each improvement.
Some major advances in techniques came out of these activities, most importantly around attention, which addressed some of the weaknesses in traditional neural nets. This new class of models (called transformers) was significantly more powerful than previous architectures.
Thing 3: Christmas Time in the Cloud
Of course being able to operate these models at places like Facebook and Google required a lot of physical computers and a lot of specialized software. This “training” and “inference” stuff is mega expensive, remember? It required more hardware than any but a handful of rich tech companies could afford and football field after football field of buildings stuffed with endless rows of computers always wanting more. For quite a while, this capability was only available to a chosen few organizations.
But Amazon, bless their hearts, had a problem. Their software engineering teams were slowing down as the code base grew bigger. They had also had a pretty successful rollout of merchant.com, an e-commerce-as-a-service platform that offered third-party retailers a way to build their own web stores, which had given them an opportunity to test a lot of new ideas that were gaining traction in the software engineering zeitgeist: service-oriented architectures (SOA), continuous deployment (CI/CD), RESTful APIs, etc. Their engineering leadership saw an opportunity to build a shared development platform that would let their engineers move much faster, focusing on writing code instead of managing hardware.
And hey maybe, this new platform could even turn into a product for external users, kind of like merchant.com only more so?
And thus the Cloud was born.
And all of a sudden anyone with a credit card could rent a ton of hardware whenever they wanted to. For an hour, for a day, for a year.
Vendors like Google and Microsoft are even pioneering next-generation clouds (Microsoft Planetary Computer and Google Earth Engine) built on top of their general-purpose clouds but specifically tailored toward machine learning for geospatial processing and ecological problems. These platforms are very promising and stand to simplify and magnify the power of the underlying, foundational cloud technologies. Originally intended primarily as research platforms for scientists, they are starting to cross the line into sturdy platforms software engineers can build upon.
And why was that important? Remember that “training” part of the ML cycle? It turns out training is the super super expensive part when it comes to hardware. Inference isn’t cheap, but training generally dwarfs it. Fortunately, training is not a frequent task. Train a model, keep an eye on it, when it stops working so well, retrain it. You need a LOT of computers for a week or a month and then you don’t need them anymore for a while. Thus renting turned out to be a lot more affordable than owning for ML.
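The rent-versus-own logic can be sketched with a toy comparison. Every number below is made up purely for illustration; real GPU prices and cloud rates vary wildly:

```python
# Toy rent-vs-own comparison for occasional ML training.
# All figures are hypothetical, for illustration only.
gpus_needed = 64
hours_per_training_run = 3 * 7 * 24  # three weeks
runs_per_year = 4

# Renting: you pay only while you're actually training.
rental_rate = 3.00  # dollars per GPU-hour (assumed)
rent_cost = gpus_needed * hours_per_training_run * runs_per_year * rental_rate

# Owning: you pay for the hardware whether it's busy or idle.
price_per_gpu = 25_000  # dollars (assumed)
amortization_years = 3
own_cost = gpus_needed * price_per_gpu / amortization_years  # per year, before power and staff

print(f"rent: ${rent_cost:,.0f}/yr   own: ${own_cost:,.0f}/yr")
# prints: rent: $387,072/yr   own: $533,333/yr
```

With these made-up numbers, renting wins even before you count electricity, cooling, and the people you’d need to babysit the machines, and the gap only grows the less often you train.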
Thing 4: GPUs, we are all gamers now
The final thing that happened was hands down the weirdest of all and proof positive that God has a sense of humor. As all computer professionals know, one of the primary purposes for computers has always been to play games. Games require drawing things on a screen and then moving them around (graphics). Drawing things and moving them on a screen is all about perspective, geometries, light, angles and things blocking other things. It turns out that to do that you need to do a lot of math very quickly. Most specifically a special kind of math called Linear Algebra. You may remember Matrices and Matrix operations from college if you took a Linear Algebra class; this is that.
You can do these operations on a normal computer CPU just fine, but it turns out that to do them really fast, or to do a lot of them at the same time, it really helps to have specialized hardware built just for computing graphics. You need to do a LOT of these operations every second, even to display something simple moving around, like a cat or a bottle of viagra, and it’s hard for a poor CPU to keep up, especially since you left some app you downloaded running in the background and it’s using your computer on the sly to mine bitcoins.
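To make “graphics is linear algebra” concrete, here is a tiny sketch: rotating a 2D point is one small matrix-vector multiply. A GPU’s whole job is doing billions of these (in 3D, with bigger matrices) every second:

```python
import math

# Rotate a 2D point about the origin using a 2x2 rotation matrix.
# This single matrix-vector multiply is the atom of graphics math.
def rotate(point, degrees):
    theta = math.radians(degrees)
    matrix = [[math.cos(theta), -math.sin(theta)],
              [math.sin(theta),  math.cos(theta)]]
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y,
            matrix[1][0] * x + matrix[1][1] * y)

print(rotate((1.0, 0.0), 90))  # roughly (0.0, 1.0)
```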
So a special kind of hardware evolved with one job in life: to make games go fast and look good. This hardware is called a GPU, and GPUs have been around since the 90s. A couple of smallish companies made them, and they were never a big deal for anything except playing “Call of Duty,” generating animation and special effects for Hollywood, and repeatedly guessing passwords.
Not a big deal, that is, until the crypto boom happened and a bunch of crypto bros figured out they could defraud the gullible more efficiently by using GPUs to mine imaginary currency (bitcoins). Mining bitcoins is just math, after all. It turned out “defrauding the gullible” was a pretty damn good business model for a few years, and NVIDIA was the GPU of choice to do it, and a little gaming company no one had ever heard of was suddenly awash with dollars.
Jensen Huang, the CEO of NVIDIA, rather than just blowing all that money in Vegas, invested a lot of it right back into making it really easy to use his GPUs for things other than playing Fortnite. Most importantly, he kept iterating on a really nice interface library called CUDA that made it much easier for ML engineers to use these GPU things, making them more and more accessible and more and more useful.
Now, all those ML software engineers at Facebook and Google were already using GPUs for ML, and they quickly discovered that the main mathematical operations you need to do machine learning are, you guessed it, matrix operations. They made you sweat through calculus, but it was actually statistics and linear algebra you should have been taking. Another one of God’s little jokes and a great thing to yell at your high school math teacher about.
In order to be cool, the ML crew gave these matrices a cool new name: tensors. And they cursed the cryptobros for hogging all the GPUs.
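And the punchline really is the same math. Here is a toy sketch (with made-up weights and inputs) of a single neural-network layer: multiply inputs by a weight matrix, add a bias, squash. Swap the rotation matrix for a weight matrix and the GPU can’t tell the difference:

```python
import math

# One neural-network layer: a matrix-vector multiply plus bias,
# passed through a sigmoid activation. Weights and inputs are
# made-up toy values for illustration.
def layer(inputs, weights, biases):
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(1 / (1 + math.exp(-total)))  # sigmoid
    return outputs

inputs = [0.5, -1.0, 2.0]                      # one "tensor" of 3 features
weights = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]   # 2 neurons x 3 inputs
biases = [0.0, 0.1]
print(layer(inputs, weights, biases))          # two values between 0 and 1
```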
And then the crypto winter happened, a bunch of people went to jail, and many companies went bankrupt. It turns out that defrauding the public as a business model has some limitations after all. But that was OK; Jensen was in no danger of being kicked out of the four comma club, because all the pieces were now in place for a REAL revolution to kick off.
- A strong cadre of seasoned ML engineers was there, trained on years of selling cat pictures to people on Facebook and helping Google users search for the best deals on viagra.
- The cloud was there, opening the door to a lot of small startup companies that couldn’t afford a football field full of computers.
- The software was there, paid for by the aforementioned cat pictures and viagra ads, plus Jensen’s dreams of the four comma club.
- The hardware and interfaces were there thanks to Jensen, paid for indirectly by the late and unlamented cryptobros stealing grandma’s retirement accounts.
Where does innovation come from?
Most of the time it isn’t walking in the woods and an apple falling on your head, or a single big idea that hits you in the shower. It comes from looking at something that is happening in one industry and figuring out how to apply it to another.
And that is exactly what is happening with the new generation of ML. It’s breaking out of big-tech advertising jail and starting to be applied virtually everywhere. Most importantly, computers are learning how to create things like pictures and text. This field is called “Generative AI.” ChatGPT doing your homework, computer-generated art – it’s all pretty impressive. Computers can fake being creative now, about as reliably as a human tripping hard on LSD. Not something you want writing your legal briefs, but hey, you can create your very own picture of a CAT HOLDING A BOTTLE OF VIAGRA now just by typing a few prompts into Bing.
And that picture is below.
3. https://www.ipxo.com/blog/how-big-is-the-internet/ (Note: a zettabyte is 10^21 bytes. Which is a lot.)