
Artificial intelligence (AI) already contributes an estimated £3.7bn a year to the UK economy, but it is also forcing organisations to make increasingly strategic choices about how they invest. As AI adoption moves beyond model training into deployment and implementation, one question has become central: should organisations rely on large language models (LLMs) or small language models (SLMs)?
Given the resources required to train and run many models, the answer depends less on hype than on a clear understanding of trade-offs. In practice, the decision rests on four considerations: hardware capability, security, latency and suitability for edge computing.
Despite frequent debate within the industry, the distinction between LLMs and SLMs is rarely ideological. Once organisations define their use cases, the most appropriate model type usually becomes clear.
Hardware cost and complexity
Matching model size to hardware capacity is critical. LLMs demand substantial compute resources, often requiring high-bandwidth memory and multiple GPUs. As a result, most are run in centralised cloud environments, where costs can be managed at scale. Running LLMs locally is typically impractical for all but the most well-resourced organisations.
SLMs, by contrast, contain far fewer parameters, typically fewer than four billion, and therefore require significantly less memory. They do not depend on high-bandwidth memory and can operate efficiently across a much wider range of hardware. This translates into lower infrastructure costs and reduced operational complexity.
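The relationship between parameter count and memory is roughly linear, so a back-of-the-envelope calculation makes the gap concrete. The sketch below uses illustrative figures (a ~4bn-parameter SLM versus a hypothetical 70bn-parameter LLM, at 16-bit and 4-bit precision), not vendor benchmarks, and counts only the weights themselves:

```python
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate memory needed just to hold a model's weights.

    Ignores activations, KV cache and runtime overhead, which add more
    on top of this figure in practice.
    """
    return params_billions * 1e9 * bytes_per_param / 1e9

# A ~4bn-parameter SLM at fp16 (2 bytes per weight) vs a 70bn-parameter LLM:
slm_fp16 = weight_memory_gb(4, 2)    # ~8 GB: within reach of a single GPU or AI PC
llm_fp16 = weight_memory_gb(70, 2)   # ~140 GB: multiple data-centre GPUs required

# 4-bit quantisation (0.5 bytes per weight) shrinks the SLM further:
slm_int4 = weight_memory_gb(4, 0.5)  # ~2 GB: viable on mobile or IoT-class hardware

print(slm_fp16, llm_fp16, slm_int4)
```

The numbers are indicative only, but they show why the smaller model fits on commodity hardware while the larger one does not.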
For environments with constrained compute resources, such as IoT devices, mobile platforms or AI-enabled PCs, SLMs are often the only viable option.
Security and data sovereignty
SLMs also offer advantages in data privacy and security. Data breaches are costly in any context, but the risk is heightened in regulated sectors such as finance, healthcare and transport, where sensitive information cannot easily be transmitted to cloud-based APIs.
UK government guidance on responsible data use, highlighted in toolkits such as the Model for Responsible Innovation, and reinforced by legislation such as UK GDPR, has made data governance an imperative for organisations deploying AI. While self-hosted or isolated LLMs can mitigate some data-in-transit risks, they are often inefficient, requiring complex architectures and significant infrastructure investment.

SLMs can be deployed fully on premises or embedded directly into hardware, allowing organisations to maintain tighter control over sensitive data. In scenarios where data cannot leave a device or data centre, SLMs may be the only practical solution.
Latency and real-time decision-making
Latency is another critical factor. As AI shifts from training to inferencing, models are increasingly deployed in distributed and decentralised environments. Many applications rely on real-time data, leaving little tolerance for delay.
Cloud-based LLMs typically introduce several seconds of processing and network latency. For use cases such as voice assistants, customer service bots or industrial systems, this delay can undermine performance. In extreme cases, where delays cascade into outright downtime, the financial consequences can be serious.
SLMs are designed for speed. By operating closer to where data is generated, they can deliver sub-second responses, making them better suited to latency-sensitive applications. While users may never see the underlying model, they notice the speed and accuracy of its outputs. For organisations prioritising responsiveness and user experience, this can be decisive.
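A simple latency budget helps explain why locality matters. In this sketch the figures are purely illustrative assumptions, not measurements: a cloud round trip pays network time on top of token generation, while an on-device SLM pays only the generation cost:

```python
def response_latency_ms(network_rtt_ms: float, tokens: int, ms_per_token: float) -> float:
    """Total response time: one network round trip plus per-token
    generation time. On-device inference sets the round trip to zero."""
    return network_rtt_ms + tokens * ms_per_token

# Hypothetical figures for a 100-token reply:
cloud_llm = response_latency_ms(network_rtt_ms=150, tokens=100, ms_per_token=30)  # 3150 ms
edge_slm = response_latency_ms(network_rtt_ms=0, tokens=100, ms_per_token=8)      # 800 ms

print(cloud_llm, edge_slm)
```

Even with generous assumptions for the cloud path, the on-device model lands under the one-second threshold that interactive applications typically demand.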
Edge applications
These considerations converge most clearly at the edge. SLMs are purpose-built for edge computing, where connectivity may be limited and resilience is essential. They can support local decision-making in environments ranging from remote medical devices to industrial machinery.
For example, an SLM can enable real-time patient monitoring in healthcare settings or anomaly detection on a factory floor, even where network access is intermittent. These use cases demand lean infrastructure, strong data security and minimal latency, conditions that all favour smaller models.
LLMs, by contrast, are generally impractical at the edge due to their size and infrastructure requirements. As interest in edge AI grows, with around half of UK firms already exploring deployments, SLMs are likely to play an increasingly central role.
The case for LLMs
Not all applications, however, belong at the edge. Some organisations require the depth and breadth of reasoning that only LLMs can provide. Tasks involving complex analysis, broad contextual understanding or cross-domain knowledge often benefit from large, centralised models running in data centres or the cloud.
These use cases prioritise thoroughness over speed. They require substantial compute capacity, and some latency is acceptable. Organisations with the necessary infrastructure and fewer regulatory constraints are best positioned to benefit from LLMs in this context.
Cost and resource considerations
Cost remains a decisive factor. LLMs offer powerful reasoning and generalisation capabilities, but they come at a premium. Beyond access fees for cloud APIs, they require ongoing investment in specialised infrastructure and energy-intensive compute resources. This makes them viable primarily for large-scale initiatives with substantial budgets.
SLMs are comparatively economical. Their lightweight deployments and modest compute requirements make them accessible to small and medium-sized businesses, as well as enterprise edge environments. For routine tasks, templated responses or time-critical operations, they can deliver strong returns without the operational burden associated with larger models.
Needs, not labels
The choice between LLMs and SLMs should be guided by application requirements rather than terminology. End users care about speed, accuracy, privacy and cost, not whether a model is classified as “large” or “small”.
In practice, many organisations will deploy both: SLMs to unlock new capabilities at the edge, and LLMs to support complex, centralised workloads. The challenge lies in understanding where each fits and aligning model choice with business needs, infrastructure constraints and risk tolerance.
Successful AI strategies are rarely one-size-fits-all. They depend on selecting the right model for the task at hand, and on having the expertise to deploy, manage and govern those systems effectively as AI capabilities continue to scale.
Lenovo’s AI Services support organisations in operationalising trust through secure-by-design frameworks, bias mitigation and continuous oversight.