By MIKE MAGEE
For my parents, March, 1965 was a banner month. First, that was the month that NASA launched the Gemini program, unleashing “transformative capabilities and cutting-edge technologies that paved the way for not only Apollo, but the achievements of the space shuttle, building the International Space Station and setting the stage for human exploration of Mars.” It also was the last month that either of them took a puff of their favored cigarette brand – L&M’s.
They are long gone, but the word “Gemini” and the L’s and the M’s have taken on new meaning and relevance six decades later.
The name Gemini reemerged with great fanfare on December 6, 2023, when Google CEO Sundar Pichai introduced “Gemini: our largest and most capable AI model.” Embedded in the announcement were the L’s and the M’s, as we see here: “From natural image, audio and video understanding to mathematical reasoning, Gemini’s performance exceeds current state-of-the-art results on 30 of the 32 widely-used academic benchmarks used in large language model (LLM) research and development.”
Google’s announcement also offered a head-to-head comparison with GPT-4 (Generative Pretrained Transformer-4). It is the product of a non-profit initiative, and was released on March 14, 2023. Microsoft’s AI search engine, Bing, helpfully informs us that “OpenAI is a research organization that aims to create artificial general intelligence (AGI) that can benefit all of humanity…They have created models such as Generative Pretrained Transformers (GPT) which can understand and generate text or code, and DALL-E, which can generate and edit images given a text description.”
While “Bing” goes all the way back to a Steve Ballmer announcement on May 28, 2009, it was nearly 14 years later, on February 7, 2023, that the company announced a major overhaul that, one month later, would allow Microsoft to broadcast that Bing (by leveraging an agreement with OpenAI) now had more than 100 million users.
Which brings us back to the other LLM (large language model), GPT-4, which the Gemini announcement explores in a head-to-head comparison with its new offering. Google embraces text, image, video, and audio comparisons, and declares Gemini superior to GPT-4.
Mark Minevich, a “highly regarded and trusted Digital Cognitive Strategist,” seems to agree, writing this month in Forbes, “Google rocked the technology world with the unveiling of Gemini – an artificial intelligence system representing their most significant leap in AI capabilities. Hailed as a potential game-changer across industries, Gemini combines data types like never before to unlock new possibilities in machine learning… Its multimodal nature builds on yet goes far beyond predecessors like GPT-3.5 and GPT-4 in its ability to understand our complex world dynamically.”
Expect to hear the word “multimodality” repeatedly in 2024 and with emphasis.
But academics will be quick to remind us that its origins can be traced all the way back to 1952 scholarly debates about “discourse analysis,” at a time when my Mom and Dad were still puffing on their L&M’s. Language and communication experts at the time recognized “a major shift from analyzing language, or mono-mode, to dealing with multi-mode meaning making practices such as: music, body language, facial expressions, images, architecture, and a great variety of communicative modes.”
Minevich believes that “With Gemini’s launch, society has arrived at an inflection point with AI advancement.” The powerhouse consulting firm BCG (Boston Consulting Group) definitely agrees. They’ve upgraded their L&M’s with a new acronym, LMM, standing for “large multimodal model.” Leonid Zhukov, Ph.D., director of the BCG Global AI Institute, believes “LMMs have the potential to become the brains of autonomous agents—which don’t just sense but also act on their environment—in the next 3 to 5 years. This could pave the way for fully automated workflows.”
BCG predicts an explosion of activity among its corporate clients focused on labor productivity, personalized customer experiences, and accelerated R&D, especially in the sciences. But they also see high-volume consumer engagement generating content, new ideas, efficiency gains, and tailored personal experiences.
This seems to be BCG talk for “You ain’t seen nothing yet.” In 2024, they say all eyes are on “autonomous agents.” As they describe what’s coming next: “Autonomous agents are, in effect, dynamic systems that can both sense and act on their environment. In other words, with stand-alone LLMs, you have access to a powerful brain; autonomous agents add arms and legs.”
This kind of talk is making a whole bunch of people nervous. Most have already heard Elon Musk’s famous 2018 quote, “Mark my words, AI is far more dangerous than nukes. I am really quite close to the cutting edge in AI, and it scares the hell out of me.” BCG acknowledges as much, saying, “Using AI, which generates as much hope as it does horror, therefore poses a conundrum for business… Maintaining human control is central to responsible AI; the risks of AI failures are greatest when timely human intervention isn’t possible. It also demands tempering business performance with safety, security, and fairness… scientists usually focus on the technical challenge of building goodness and fairness into AI, which, logically, is impossible to accomplish unless all humans are good and fair.”
Expect in 2024 to see once again the worn-out phrase “Three Pillars.” This time it will be attached to LMM AI, and it will advocate for three forms of “license” to operate:
Legal license – “regulatory permits and statutory obligations.”
Economic license – ROI to shareholders and executives.
Social license – a social contract delivering transparency, equity and justice to society.
BCG suggests that trust will be the core challenge, and that technology is tricky. We’ve been there before. The 1964 Surgeon General’s report knocked the socks off of tobacco company execs who thought high-tech filters would shield them from liability. But the government report burst that bubble by stating “Cigarette smoking is a health hazard of sufficient importance in the United States to warrant appropriate remedial action.” Then came Gemini 6A’s first attempt to launch on December 12, 1965. It was cancelled when its fuel igniter failed.
Generative AI-driven LMMs will “likely be transformative,” but clearly will also have their ups and downs. As BCG cautions, “Trust is critical for social acceptance, especially in cases where AI can act independent of human supervision and have an impact on human lives.”
Mike Magee MD is a Medical Historian and regular contributor to THCB. He is the author of CODE BLUE: Inside America’s Medical Industrial Complex.