The 7-Decade History of ChatGPT

By MIKE MAGEE

Over the past year, the general popularization of AI, or artificial intelligence, has captured the world’s imagination. Of course, academicians often emphasize historical context. But entrepreneurs tend to agree with Thomas Jefferson, who said, “I like dreams of the future better than the history of the past.”

This particular dream, however, is all about language, its standing and significance in human society. Throughout history, language has been a species accelerant, a secret power that has allowed us to dominate and rise quickly (for better or worse) to the position of “masters of the universe.”

Well before ChatGPT became a household phrase, there was LDT, or the laryngeal descent theory. It professed that humans’ unique capacity for speech was the result of a voice box, or larynx, that sits lower in the throat than in other primates. This permitted the “throat shape, and motor control” to produce vowels that are the cornerstone of human speech. Speech – and therefore language arrival – was pegged to anatomical evolutionary changes dated at between 200,000 and 300,000 years ago.

That theory, as it turns out, had very little scientific evidence. And in 2019, a landmark study set about pushing the date of primate vocalization back to at least 3 to 5 million years ago. The scientists summarized it in three points: “First, even among primates, laryngeal descent is not uniquely human. Second, laryngeal descent is not required to produce contrasting formant patterns in vocalizations. Third, living nonhuman primates produce vocalizations with contrasting formant patterns.”

Language and speech in the academic world are complex fields that go beyond paleoanthropology and primatology. If you want to study speech science, you better have a working knowledge of “phonetics, anatomy, acoustics and human development” say the experts. You could add to this “syntax, lexicon, gesture, phonological representations, syllabic organization, speech perception, and neuromuscular control.”

Professor Paul Pettitt, who makes a living at Durham University interpreting ancient rock paintings in Africa and beyond, sees the birth of civilization in multimodal language terms. He says, “There is now a great deal of support for the notion that symbolic creativity was part of our cognitive repertoire as we began dispersing from Africa.” Google CEO Sundar Pichai maintains a similarly expansive view when it comes to language. In his December 6, 2023, introduction of the company’s groundbreaking LLM (large language model), Gemini (a competitor of ChatGPT), he described the new product as “our largest and most capable AI model with natural image, audio and video understanding and mathematical reasoning.”

Digital Cognitive Strategist, Mark Minevich, echoed Google’s view that the torch of human language had now gone well beyond text alone and had been passed to machines. His review: “Gemini combines data types like never before to unlock new possibilities in machine learning… Its multimodal nature builds on, yet goes far beyond, predecessors like GPT-3.5 and GPT-4 in its ability to understand our complex world dynamically.”

GPT what???

O.K. Let’s take a step back, and give us all a chance to catch up.

What we call AI or “artificial intelligence” rests on a 70-year-old concept, the neural network, that today goes by the name “deep learning.” This was the brainchild of University of Chicago research scientists Warren McCulloch and Walter Pitts, who developed the concept of “neural nets” in 1944. They modeled the theoretical machine learner after the human brain, consisting of multiple overlapping transit fibers joined at synaptic nodes which, with adequate stimulus, could allow gathered information to pass on to the next fiber down the line.

On the strength of that concept, the two moved to MIT in 1952 and launched the Cognitive Science Department, uniting computer scientists and neuroscientists. In the meantime, Frank Rosenblatt, a Cornell psychologist, invented the “first trainable neural network” in 1957. He gave it a futuristic name, the “Perceptron.” It included a data input layer, a sandwich layer that could adjust information packets with “weights” and “firing thresholds,” and a third output layer to allow data that met the threshold criteria to pass down the line.
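
For readers who want to see the idea in motion, here is a minimal sketch of a Perceptron-style learner in Python. It is an illustration under stated assumptions, not a reconstruction of Rosenblatt’s machine: the function names, the learning rate, and the choice of a logical AND task are all hypothetical. Each time the unit answers wrongly, its weights and firing threshold are nudged toward the right answer.

```python
# Toy Perceptron-style unit: inputs are multiplied by weights, summed,
# and the unit "fires" (outputs 1) only if the sum reaches a threshold.
# The task (logical AND) and all names here are illustrative assumptions.

AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def predict(inputs, weights, threshold):
    """Return 1 if the weighted sum reaches the firing threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Adjust weights and threshold whenever the unit answers wrongly."""
    weights = [0.0, 0.0]
    threshold = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(inputs, weights, threshold)
            # Strengthen or weaken each active input's weight by the error.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            # Lower the threshold if the unit should have fired, raise it if not.
            threshold -= learning_rate * error
    return weights, threshold


if __name__ == "__main__":
    weights, threshold = train_perceptron(AND_SAMPLES)
    for inputs, target in AND_SAMPLES:
        print(inputs, "->", predict(inputs, weights, threshold), "expected", target)
```

A single unit like this can only learn tasks where a straight line separates the “fire” cases from the “don’t fire” cases, which is part of why the early critics found it underpowered.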

Back at MIT, the Cognitive Science Department was hijacked in 1969 by mathematicians Marvin Minsky and Seymour Papert, and became the MIT Artificial Intelligence Laboratory. They summarily trashed Rosenblatt’s Perceptron machine, believing it to be underpowered and inefficient in delivering even the most basic computations. By 1980, the department was ready to deliver a “never mind,” as computing power grew and algorithms for encoding thresholds and weights at neural nodes became efficient and practical.

The computing leap, experts now agree, came “courtesy of the computer-game industry,” whose “graphics processing unit” (GPU), which housed thousands of processing cores on a single chip, was effectively the neural net that McCulloch and Pitts had envisioned. By 1977, Atari had developed game cartridges and microprocessor-based hardware, with a successful television interface.

With the launch of the Internet, and the commercial explosion of desktop computing, language – the fuel for human interactions worldwide – grew exponentially in importance. More specifically, the greatest demand was for language that could link humans to machines in a natural way.

With the explosive growth of text data, the focus initially was on Natural Language Processing (NLP), “an interdisciplinary subfield of computer science and linguistics primarily concerned with giving computers the ability to support and manipulate human language.” Training software initially used annotated or referenced texts to address or answer specific questions or tasks precisely. These systems were of limited use and accuracy when faced with inquiries outside their pre-determined training, and that inefficiency undermined their adoption.

But computing power had now advanced far beyond what Warren McCulloch and Walter Pitts could have possibly imagined in 1944, while the concept of “neural nets” couldn’t be more relevant. IBM describes the modern-day version this way:

“Neural networks …are a subset of machine learning and are at the heart of deep learning algorithms. Their name and structure are inspired by the human brain, mimicking the way that biological neurons signal to one another… Artificial neural networks are comprised of node layers, containing an input layer, one or more hidden layers, and an output layer…Once an input layer is determined, weights are assigned. These weights help determine the importance of any given variable, with larger ones contributing more significantly to the output compared to other inputs. All inputs are then multiplied by their respective weights and then summed. Afterward, the output is passed through an activation function, which determines the output. If that output exceeds a given threshold, it “fires” (or activates) the node, passing data to the next layer in the network… it’s worth noting that the “deep” in deep learning is just referring to the depth of layers in a neural network. A neural network that consists of more than three layers—which would be inclusive of the inputs and the output—can be considered a deep learning algorithm. A neural network that only has two or three layers is just a basic neural network.”
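
IBM’s description can be sketched in a few lines of Python. This is a simplified illustration, not IBM’s code: the sigmoid activation, the 0.5 firing threshold, the hand-picked weights, and the rule that a node which does not fire passes along zero are all assumptions made to keep the example small.

```python
import math


def sigmoid(x):
    """A common activation function that squashes any number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, layer_weights, threshold=0.5):
    """Run one layer: each node multiplies inputs by its weights, sums them,
    applies the activation, and passes the result on only if it "fires"."""
    outputs = []
    for node_weights in layer_weights:
        total = sum(x * w for x, w in zip(inputs, node_weights))
        activation = sigmoid(total)
        # Assumption for this sketch: a node below the threshold sends 0.0.
        outputs.append(activation if activation > threshold else 0.0)
    return outputs


def network_forward(inputs, layers, threshold=0.5):
    """Feed the input layer's values through each successive layer in turn."""
    for layer_weights in layers:
        inputs = layer_forward(inputs, layer_weights, threshold)
    return inputs


# Hand-picked, hypothetical weights: one hidden layer of two nodes,
# followed by an output layer with a single node.
HIDDEN_LAYER = [[1.0, 1.0], [-1.0, -1.0]]
OUTPUT_LAYER = [[1.0, 1.0]]

if __name__ == "__main__":
    print(network_forward([1.0, 1.0], [HIDDEN_LAYER, OUTPUT_LAYER]))
```

Stacking more entries in the list of layers is all that “deep” means in this picture. What the sketch leaves out is training, the process by which real networks discover their weights rather than having them typed in by hand.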

The bottom line is that the automated system responds to an internal logic. The computer’s “next choice” is determined by how well it fits in with the prior choices. And it doesn’t matter where the words or “tokens” come from. Feed it data, and it will “train” itself; and by following the rules or algorithms embedded in the middle decision layers or screens, it will “transform” the acquired knowledge into “generated” language that both human and machine understand.

In December 2015, a group of tech entrepreneurs including Elon Musk and Reid Hoffman, believing AI could go astray if restricted or weaponized, formed a non-profit called OpenAI. In 2018, OpenAI released a deep learning model called GPT, the forerunner of ChatGPT. This solution was born out of the marriage of Natural Language Processing and deep learning neural networks, with a stated goal of “enabling humans to interact with machines in a more natural way.”

The GPT stood for “Generative Pre-trained Transformer.” Built into the software was the ability to “consider the context of the entire sentence when generating the next word” – an approach known as “auto-regressive” generation. As a “self-supervised learning model,” GPT is able to learn by itself from ingesting or inputting huge amounts of anonymous text; transform it by passing it through a variety of intermediary weighted screens that jury the content; and allow passage (and survival) of data that is validated. The resultant output? Fluent language that mimics human text.
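
The auto-regressive loop itself is simple enough to sketch in Python. To be clear about the substitution, this toy swaps GPT’s transformer for a table of word-pair counts, so it looks only at the single previous word rather than the whole sentence. The function names and the sample text are hypothetical. What it shares with GPT is the overall shape: the training “labels” come from the text itself (self-supervision), and each generated word is fed back in to help choose the next one.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Self-supervised "training": for every word, count which words follow it.
    No human labels are needed; the text supplies its own answers."""
    words = text.split()
    counts = defaultdict(Counter)
    for previous, following in zip(words, words[1:]):
        counts[previous][following] += 1
    return counts


def generate(counts, start, max_words=5):
    """Auto-regressive generation: repeatedly pick the most frequent follower
    of the latest word, then append it so it becomes the new context."""
    output = [start]
    for _ in range(max_words):
        followers = counts.get(output[-1])
        if not followers:
            break  # the model has never seen anything follow this word
        next_word = followers.most_common(1)[0][0]
        output.append(next_word)
    return " ".join(output)


if __name__ == "__main__":
    model = train_bigram("the cat sat on the cat")
    print(generate(model, "the", max_words=4))
```

A real GPT replaces the counting table with billions of learned weights and samples from a probability distribution rather than always taking the top choice, but the word-by-word loop is the same.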

Microsoft’s leadership was impressed, and in 2019 ponied up $1 billion to participate jointly in development of the product and to serve as OpenAI’s exclusive cloud provider.

OpenAI introduced the first model in the series, GPT-1, in 2018, but ChatGPT itself was not released publicly until November 30, 2022.

GPT-1 was trained on the enormous BooksCorpus dataset. Its design included an input and output layer, with 12 successive transformer layers sandwiched in between. It was so effective in Natural Language Processing that minimal fine-tuning was required on the back end.

OpenAI next released version two, GPT-2, which was 10 times the size of its predecessor with 1.5 billion parameters, and had the capacity to translate and summarize. GPT-3 followed. It had now grown to 175 billion parameters, 100 times the size of GPT-2, and was trained by ingesting a corpus of roughly 500 billion tokens of text (including text from my own book, CODE BLUE). It could now generate long passages on demand, do basic math, write code, and do (what the inventors describe as) “clever tasks.” An intermediate GPT-3.5 absorbed Wikipedia entries, social media posts and news releases.

On March 14, 2023, OpenAI released GPT-4, a large multimodal model that accepts images as well as text as input. Its arrival reflects a convergence of multiple technologies including databases, AI, cloud computing, 5G networks, personal edge computing, and more.

The New York Times headline announced it as “Exciting and Scary.” Their technology columnist wrote, “What we see emerging are machines that know how to reason, are adept at all human languages, and are able to perceive and interact with the physical environment.” He was not alone in his concerns. The Atlantic, at about the same time, ran an editorial titled, “AI is about to make social media (much) more toxic.”

Leonid Zhukov, Ph.D., director of the Boston Consulting Group’s (BCG) Global AI, believes offerings like GPT-4 and Gemini have the potential to become the brains of autonomous agents, which don’t just sense but also act on their environment, in the next 3 to 5 years. This, he says, could pave the way for fully automated workflows.

Were he alive, Leonardo da Vinci would likely be unconcerned. Five hundred years ago, he wrote nonchalantly, “It had long come to my attention that people of accomplishment rarely sat back and let things happen to them. They went out and happened to things.”

Mike Magee MD is a Medical Historian and regular contributor to THCB. He is the author of CODE BLUE: Inside America’s Medical Industrial Complex (Grove/2020).