Has the Era of the LLM Come to an End?

Large language models (LLMs) continue to grow, raising doubts about their costs, energy consumption and environmental impact. Are we heading towards a new era of small language models (SLMs)?

Large language models (LLMs) started to become popular with the release of OpenAI’s ChatGPT.

Suddenly, everyone discovered that artificial intelligence (AI) was not just a technology buried in the guts of their mobile phones, smart speakers, streaming services and e-commerce recommenders, but something anyone could use directly.

This generative AI is made possible by the LLMs behind it, powered by huge amounts of data. ‘For example, Meta’s Llama 3 is already at about 400 billion parameters. And they are going to grow even more, because it is thought that new capabilities will emerge that will help improve overall AI,’ says Enrique Lizaso, CEO and co-founder of Multiverse Computing.

This allows AI to do more and more things. But this evolution brings with it significant challenges. ‘AI LLMs have become powerful and transformative tools in virtually every field. However, as an emerging technology, they present some interesting challenges,’ says David Hurtado, Chief Innovation Officer at Microsoft.

Exorbitant costs

The first stumbling block is the high cost of training these models. ‘Firstly, training these models requires a large investment in computational resources and data, which can be expensive and complex. The work here is focusing on making the models more efficient to reduce costs and resource consumption,’ he says.

‘The costs associated with acquiring, training and tuning LLMs can be astronomical, as training some of the major models can cost almost $200m, which is prohibitive for many companies. On top of that, there is the cost of customising to the specific requirements or data of each organisation, as well as hiring the skilled professionals who can execute the project,’ says Jan Wildeboer, EMEA evangelist at Red Hat.

In addition, costs are constantly rising. Lizaso points out that the next generation of LLMs is expected to cost close to $1 billion to train.

This has led to funding rounds such as the one Elon Musk has closed for xAI, raising $6 billion.

Rampant energy consumption

There is also the enormous energy consumption of the data centres that drive these LLMs, with the repercussions this has on both operational costs and their environmental impact.

‘LLMs need to be completely retrained every time data is added, which also carries a high energy cost,’ says the Red Hat manager.

‘In some countries, such as Ireland, data centre consumption has spiralled out of control: it is estimated that data centres could account for as much as 30% of total electricity consumption. This is leading to legislation mandating green power consumption. There is legislative and governmental pressure to curb energy use,’ says Multiverse Computing’s CEO.

The big companies in the industry are taking action, as we have reported on various occasions. ‘We recognise the energy impact of these models and are committed to their sustainable development and operation. That’s why we invest in research to measure and reduce the energy use and carbon footprint of AI,’ says Microsoft’s Innovation lead.

Other factors

Those are the main challenges facing LLM developers, but they are not the only ones. ‘Another interesting challenge is the accuracy of the models. In certain very specific or technical contexts, an LLM may not be accurate enough. And accuracy does not always improve with a bigger model. We are currently investing a lot of resources in improving the training processes to make the models more accurate and less prone to hallucination,’ says Hurtado.

Wildeboer also emphasises the doubts surrounding the transparency of LLMs, one of the big challenges for AI in the coming years. ‘They resemble an impenetrable black box. Training them on billions of raw data points makes it difficult to trace the origin of their answers and the logic behind them. This opacity raises doubts about their reliability, makes their decisions difficult to explain and raises serious concerns about fairness and the possible perpetuation of bias in sensitive areas such as justice or medicine.’

In a similar spirit, the Microsoft manager emphasises the challenge of accountability. ‘At Microsoft, we have a very strict RAI (Responsible AI) methodology, guided by key pillars such as fairness, reliability, security, privacy, inclusion, transparency and accountability. These values are translated into guidelines and procedures for all employees,’ he highlights.

Alternatives to LLMs

Despite this, it seems unlikely that we are nearing the end of the LLM era. But technology companies are aware of these challenges and know that they make it difficult for companies to deploy LLMs and develop use cases, so they are responding.

‘There are two options for addressing the LLM challenges, and they run in parallel. One is to steadily improve the efficiency of large models, so that they become smaller and cheaper. The second is the use of Small Language Models (SLMs),’ says Hurtado.

‘SLMs are a tremendously promising solution, since they use a fraction of the computational resources and energy consumption of LLMs, but with similar performance on certain tasks,’ he explains.

‘Both paths, creating SLMs and improving LLMs, go in parallel and are complementary. Everything points to the future being a combination of both,’ he adds.

Lizaso shares this view. ‘The big model makers, such as Meta, OpenAI or Anthropic, have seen this trend. As well as bringing out large models, they are also launching intermediate and smaller ones,’ he says.

What are the tech companies doing?

Following this trend, Microsoft has developed Phi-3, ‘a family of small language models that reimagines what is possible with this type of model,’ says Hurtado.

‘Phi-3 is designed to be highly efficient and adaptable, and offers exceptional performance. Phi-3-mini, with 3.8 billion parameters, has proven to be very efficient in language generation and understanding tasks, outperforming larger models. This model is ideal for applications that require fast and accurate responses in specific domains, such as customer service chatbots, recommender systems and virtual assistants,’ he says.

‘In addition, Phi-3 has been optimised to run on a wide range of devices, from cloud servers to mobile devices. For example, it is capable of running on an iPhone 15 with an A16 Bionic processor, achieving great fluidity. This opens up new possibilities for mobile applications that require natural language processing without relying on constant cloud connectivity,’ he argues.
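
As a rough illustration of what running such a model locally looks like, the sketch below loads the publicly released Phi-3-mini checkpoint with Hugging Face’s transformers library. The prompt is an assumption for demonstration purposes, and an actual on-device deployment (such as on a phone) would typically use a quantised build and a dedicated mobile runtime rather than this desktop-style Python code.

```python
# A minimal sketch of running Phi-3-mini locally with Hugging Face
# transformers (an assumed setup; real mobile deployments would use
# quantised builds and dedicated runtimes instead).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # public checkpoint on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Build a chat-style prompt and generate a short answer.
messages = [{"role": "user", "content": "In one sentence, what is an SLM?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```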

‘Another key advantage is deployment flexibility. Phi-3 can be deployed in the cloud, on the edge or on local devices, allowing organisations to choose the best option for their specific needs. This flexibility is especially valuable in environments where data privacy and latency are critical, such as in healthcare and finance applications,’ he adds.

Finally, he notes that this family of models stands out for its customisability. ‘They are offered as open models that can be adjusted and tuned with domain-specific data to improve their accuracy and relevance in particular contexts. This allows organisations to tailor Phi-3 to their specific needs without the need for large investments’.
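
One common way to do this kind of low-cost customisation is with parameter-efficient adapters such as LoRA. The sketch below, using the Hugging Face peft library, is an illustrative assumption rather than Microsoft’s prescribed fine-tuning recipe: the config values are arbitrary and the target module name reflects Phi-3’s fused attention projection.

```python
# A minimal sketch of parameter-efficient fine-tuning with LoRA adapters,
# one common way to tune an open model on domain data without the cost
# of full retraining. Config values here are illustrative only.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

lora = LoraConfig(
    r=8,                           # adapter rank: small = few new weights
    lora_alpha=16,
    target_modules=["qkv_proj"],   # Phi-3's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # typically well under 1% of all weights

# ...train `model` on domain-specific data with a standard training loop,
# then save just the small adapter: model.save_pretrained("my-adapter")
```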

Another alternative to LLMs is the Swarm AI approach, based on the use of many small models, trained for specific tasks. ‘This innovative approach relies on collaboration between multiple small models, each specialised in a specific task. These models, whether developed in-house or acquired from third parties, are integrated into a meta-level that acts as a conductor, coordinating and combining their capabilities. This creates a modular and versatile AI, capable of addressing a wider range of challenges with greater accuracy and efficiency,’ says the Red Hat manager.

‘Upon receiving a query, the meta-tier strategically selects which model or combination of models is best equipped to provide the most accurate and relevant answer. We find that these smaller models are more agile and flexible, and are more likely to meet business and regulatory expectations,’ he says.
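
To make the ‘conductor’ idea concrete, here is a minimal sketch of such a meta-tier in Python. The specialist models and the keyword-based routing are hypothetical placeholders; a production system would route with a learned classifier or embedding similarity, and the specialists would be real fine-tuned SLMs.

```python
# A minimal sketch of the "meta-tier" idea behind Swarm AI: several small,
# task-specific models sit behind a router that picks the best one for
# each query. Names and keyword routing are hypothetical placeholders.
from typing import Callable, Dict

# Hypothetical specialists, each standing in for a small fine-tuned SLM.
def legal_model(query: str) -> str:
    return f"[legal-slm] answer to: {query}"

def code_model(query: str) -> str:
    return f"[code-slm] answer to: {query}"

def general_model(query: str) -> str:
    return f"[general-slm] answer to: {query}"

SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "contract": legal_model,
    "clause": legal_model,
    "python": code_model,
    "bug": code_model,
}

def meta_tier(query: str) -> str:
    """Route the query to the specialist whose trigger word matches,
    falling back to a general-purpose small model."""
    lowered = query.lower()
    for keyword, model in SPECIALISTS.items():
        if keyword in lowered:
            return model(query)
    return general_model(query)

print(meta_tier("Is this contract clause enforceable?"))   # -> legal-slm
print(meta_tier("Why does my Python loop never stop?"))    # -> code-slm
```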

He further notes that his company has launched Red Hat Enterprise Linux AI (RHEL AI), ‘a foundational model platform that enables users to efficiently develop, test and run generative AI models to power enterprise applications’.

‘In RHEL AI we brought together Granite, a family of LLMs released under an open source licence, and the InstructLab model alignment tools, based on the Large Scale Alignment for Chatbots (LAB) methodology. We’ve packaged it all as a bootable, optimised RHEL image for deployment on individual servers in the hybrid cloud,’ he specifies.

Looking into the future, he believes that ‘smaller, more efficient, custom-built AI models will form a substantial mix of the enterprise IT stack, along with cloud-native applications’. ‘This infrastructure will enable enterprises to access AI and develop applications that fit their business needs,’ he predicts.

For his part, Lizaso explains that Multiverse Computing is betting on another solution. ‘These models can be compressed a lot, as long as you do it in an intelligent way, knowing what you are eliminating. We take a large model and compress it as much as we can, without losing capabilities. Because it is smaller, it has the costs of a small model. It gives companies the performance of the big model, but at the price of the small ones,’ he stresses.

He says his technology is capable of compressing models by up to 70%, with an accuracy loss of between 2% and 4%. Moreover, if the model is to be applied to a company’s internal procedures, it can be ‘sanitised’ with a short, highly specific retraining. And using the company’s own databases makes these models very accurate in the field in which they will be used.
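
The article does not describe Multiverse’s compression technique, so the sketch below illustrates the general idea with a different, widely used method: post-training dynamic quantisation in PyTorch, which stores weights as 8-bit integers instead of 32-bit floats. It demonstrates the size-versus-accuracy trade-off in miniature, not Multiverse’s actual approach.

```python
# An illustrative sketch of model compression via post-training dynamic
# quantisation in PyTorch. This is NOT Multiverse Computing's method
# (the article does not detail it); it only shows the general trade-off
# of a smaller model for a small accuracy cost.
import io

import torch
import torch.nn as nn

# A stand-in "model": in practice these would be an LLM's linear layers.
model = nn.Sequential(
    nn.Linear(1024, 1024),
    nn.ReLU(),
    nn.Linear(1024, 1024),
)

# Convert the float32 linear weights to int8; activations stay in float.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def checkpoint_mb(m: nn.Module) -> float:
    """Serialised size of a model's state dict, in megabytes."""
    buffer = io.BytesIO()
    torch.save(m.state_dict(), buffer)
    return buffer.getbuffer().nbytes / 1e6

print(f"original:  {checkpoint_mb(model):.1f} MB")
print(f"quantized: {checkpoint_mb(quantized):.1f} MB")  # roughly 4x smaller
```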

Multiple advantages over LLMs…

The Microsoft manager points out that SLMs have two advantages over LLMs. ‘First and foremost is the lower cost of training and operation. This allows for two things. On the one hand, it makes it more accessible for any organisation to create or modify its own language model, since creating an SLM does not require the financial and computational resources of an LLM. On the other hand, an SLM is easier to use and integrate into the company’s everyday tools. SLMs mean the democratisation of the creation and use of language models. The second advantage concerns execution outside the cloud. An SLM can run on a mobile phone or a regular computer, which eliminates communication dependencies and greatly reduces usage costs,’ he says.

Along the same lines, the CEO of Multiverse Computing explains that the great advantage of his solutions is that they deliver the same benefits as LLMs on all kinds of devices, from mobile phones and televisions to cars, because they can operate without very powerful hardware and, above all, without an internet connection.

As an example, he points out that these compressed models can be used to control various functions of a vehicle by voice, without the need for buttons or touch screens, reducing production costs by dispensing with those elements. But for this to happen, it is essential to have models that do not require a connection to the cloud. ‘The model can’t hang just because you enter a tunnel,’ he warns.

Similarly, these solutions have applications in the military arena, where an internet connection cannot be relied upon, both because of possible connectivity problems and for security reasons.

Likewise, the security of an offline model is also crucial in environments such as healthcare, where the privacy of confidential and highly-sensitive information must be safeguarded at all costs.

In addition, Wildeboer notes that these smaller models ‘do not remain static after their initial training’. ‘They can continuously learn and update themselves with new data, without the need for comprehensive and costly retraining. This flexibility translates into significantly faster training times’. He also notes that ‘these are models that promote transparency, traceability and reliability, characteristics that are very relevant for the European Union, which is a pioneer in technological regulation’.

Furthermore, he insists that ‘the simplicity of small models democratises access to AI’. He also stresses that ‘their ease of training, optimisation and deployment allows companies, even those with limited resources, to experiment and develop tailor-made solutions, overcoming the barrier of the shortage of AI specialists and the complexity associated with large language models’.

…And some drawbacks

Despite these advantages over LLMs, SLMs also have a number of drawbacks.

‘Although SLMs are efficient and highly adaptive, LLMs are still superior in certain areas. Small models have been able to match large models in certain very specific tasks, but LLMs have greater processing power and language understanding, which makes them more suitable for complex tasks that require deep reasoning and/or processing of medium/large volumes of data. In this sense, in environments that require complex tasks or a high level of precision, LLMs will continue to be the solution,’ Hurtado predicts.

‘On the other hand, an off-cloud solution, deployed on a device, will require regular updates. This is one of the advantages of the cloud: updates are automatic and invisible to the user. And given the speed at which AI is advancing these days, updates are constant,’ he acknowledges.

And the Red Hat manager believes that uptake could also be a problem. ‘Will enough people, companies and projects explore this opportunity? We believe the benefits are compelling and we are already seeing solid traction in some important areas. The key is to grow this movement,’ he concludes.