
The Hidden Costs of AI

Dr Utpal Chakraborty highlights the future potential of combining cloud and edge computing to optimize cost, latency, and performance for AI applications. This approach promises to make advanced AI technologies, such as Large Language Models (LLMs), more accessible and cost-effective for businesses of all sizes, overcoming current challenges related to infrastructure costs and scalability.

In the future, a combination of cloud and edge computing will help balance cost, latency, and performance, making AI more accessible for businesses of all sizes, says Dr Utpal Chakraborty.

Representative image. Source: Hitesh Choudhary on Unsplash

The rapid rise of Large Language Models (LLMs) has unlocked incredible opportunities for companies of all sizes, from ambitious startups to global enterprises. Platforms like Hugging Face have made these foundational models available, opening doors to innovations in all domains and industry verticals. But the journey from accessing these models to building tailored applications isn’t without its hurdles. One of the biggest challenges is the significant infrastructure costs tied to training, fine-tuning, and deploying these models at scale. For startups and small businesses, this financial burden can become a major roadblock.

Why are training and fine-tuning LLMs expensive?

While it’s easier to use Pre-Trained models than building one from scratch, fine-tuning them for specific tasks can still be resource-intensive and costly.

Cost Drivers:

1. Computational Power

Fine-tuning LLMs requires access to high-performance hardware, like GPUs or TPUs. Even a moderately sized model can take hours of processing time on powerful GPUs that cost anywhere from $4 to $10 per hour. These expenses can quickly add up to thousands of dollars for a single fine-tuning session, and many such sessions are usually needed before a model reaches the desired accuracy; depending on the use case at hand, the total can run into millions of dollars.
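To make these figures concrete, here is a minimal back-of-the-envelope sketch. The GPU count, hourly rate, session length, and number of runs are illustrative assumptions, not quotes from any provider.

```python
# Back-of-the-envelope fine-tuning cost estimate.
# All inputs below are illustrative assumptions, not provider quotes.

def finetune_cost(gpus: int, rate_per_gpu_hour: float,
                  hours_per_session: float, sessions: int) -> float:
    """Total cost = GPUs x hourly rate x hours per session x number of sessions."""
    return gpus * rate_per_gpu_hour * hours_per_session * sessions

# Example: 8 GPUs at $6/hour, 12-hour sessions, 25 tuning runs.
total = finetune_cost(gpus=8, rate_per_gpu_hour=6.0,
                      hours_per_session=12, sessions=25)
print(f"Estimated fine-tuning spend: ${total:,.0f}")  # -> $14,400
```

Even at these modest assumptions the bill lands in the tens of thousands; scale up the model, the data, or the number of experiments and it grows accordingly.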

2. Data Preparation

Before fine-tuning, there is the task of getting the data ready: cleaning it, formatting it, and sometimes augmenting it. In addition, storing these large datasets can be expensive, especially in the cloud, where providers charge for both storage space and data transfer.
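As a rough illustration, a typical preparation step cleans raw records and writes them into a prompt/completion layout. The field names and JSONL schema here are assumptions; different fine-tuning pipelines expect different formats.

```python
import json
import re

def clean(text: str) -> str:
    """Collapse whitespace and strip stray control characters."""
    return re.sub(r"\s+", " ", text).strip()

# Hypothetical raw records; a real pipeline would read these from disk or a database.
raw_records = [
    {"question": "What is model pruning? ", "answer": " Removing low-value weights.\n"},
]

# Write a JSONL file in a prompt/completion layout (one schema among many).
with open("train.jsonl", "w", encoding="utf-8") as f:
    for rec in raw_records:
        row = {"prompt": clean(rec["question"]), "completion": clean(rec["answer"])}
        f.write(json.dumps(row, ensure_ascii=False) + "\n")
```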

3. Hyperparameter Optimisation

To ensure the model performs at its best, there is often a need for repeated testing and tweaking, known as hyperparameter optimisation. This trial-and-error process can drive up computational costs, as the sketch below illustrates.
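Even a small search grid multiplies into many full training runs, each of which costs real GPU-hours. In this sketch, `train_and_evaluate` is a hypothetical stand-in for a complete fine-tuning job; it returns a dummy score so the example executes.

```python
import itertools

def train_and_evaluate(lr: float, batch_size: int, epochs: int) -> float:
    """Hypothetical stand-in for a full fine-tuning run. Returns a dummy
    score so the sketch runs; each real call would burn GPU-hours."""
    return -abs(lr - 3e-5)  # placeholder scoring, not a real metric

learning_rates = [1e-5, 3e-5, 5e-5]
batch_sizes = [8, 16]
epoch_counts = [2, 3]

grid = list(itertools.product(learning_rates, batch_sizes, epoch_counts))
print(f"{len(grid)} full training runs needed")  # 3 x 2 x 2 = 12 runs

best = max(grid, key=lambda cfg: train_and_evaluate(*cfg))
print("Best configuration:", best)
```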

The high cost of running inference at scale

Once a model is trained and fine-tuned, the next challenge is deploying it. Whether it’s used to generate text or images in real-time or process data in bulk, running LLMs at scale can be expensive.

Cost Implications:

a) Real-Time Applications

Services like chatbots or virtual assistants need to provide responses instantly. To meet these low-latency demands, models must be hosted on high-performance servers, which can get costly fast.

b) Bulk Data Processing

While batch processing can reduce per-inference costs, it still requires substantial infrastructure, especially when handling large volumes of data.
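A simple way to see the trade-off: a server billed by the hour costs the same whether it answers one request at a time or a full batch, so per-inference cost falls as throughput rises. The hourly rate and throughput figures below are illustrative assumptions, not benchmarks.

```python
# Illustrative only: the rate and throughput numbers are assumptions.
HOURLY_RATE = 10.0                 # $ per server-hour (assumed)
REQUESTS_PER_HOUR_SINGLE = 600     # serving one request at a time (assumed)
REQUESTS_PER_HOUR_BATCHED = 6000   # batching 16-32 requests together (assumed)

print(f"Real-time: ${HOURLY_RATE / REQUESTS_PER_HOUR_SINGLE:.4f} per inference")
print(f"Batched:   ${HOURLY_RATE / REQUESTS_PER_HOUR_BATCHED:.4f} per inference")
# Batching cuts per-inference cost ~10x here, but requests must wait for a batch,
# which is why latency-sensitive services cannot always use it.
```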

c) Energy Usage

Running LLMs isn’t just about computing power; they also consume significant energy, further driving up costs.

The financial strain on startups

For smaller businesses, the high costs associated with training, fine-tuning, and running LLMs at scale can be a significant barrier. Unlike large companies with deep pockets, startups often operate with limited budgets, making it harder to justify these investments.

Economic Challenges:

a) High Initial Costs

Accessing the required computing power involves a hefty upfront investment. Whether opting for on-premises hardware or cloud-based solutions, the costs can exceed what most startups can afford.

b) Unpredictable Ongoing Costs

The expenses don’t stop after the initial setup. Maintenance, scaling, and data transfers can lead to unpredictable and sometimes unsustainable costs over time.

c) Competitive Disadvantage

Startups that can’t afford to invest in LLM infrastructure may find themselves at a disadvantage compared to larger players, potentially widening the technology gap.

Cloud solutions are a double-edged sword

Cloud platforms like AWS, Google Cloud, and Azure offer a flexible alternative to on-premises infrastructure, allowing businesses to rent compute resources as needed. But while the flexibility is appealing, the costs can still spiral out of control.

The Pros and Cons:

a) Scalability

Cloud platforms make it easy to scale up or down based on demand, which is particularly beneficial for startups.

b) Escalating Costs

However, as usage increases, so do the costs. For example, an AWS instance with eight high-end NVIDIA A100 GPUs costs on the order of $30–$40 per hour on demand, and training an LLM can keep such instances busy for days or weeks.

c) Data Privacy

Some industries, like finance or healthcare, have strict requirements around data privacy, which cloud-based solutions may not fully meet. This could mean additional investments in security or hybrid cloud strategies, adding further costs.

d) Latency

For real-time applications, cloud-based models can sometimes face latency issues due to network delays.

Strategies to reduce costs

Despite the challenges, there are ways for startups and enterprises to mitigate the costs of adopting LLMs, though none of them is trivial to execute.

a) Model Distillation and Pruning

Reducing model size without sacrificing performance can lower the computational power needed for training and inference, cutting costs.
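As one concrete illustration, PyTorch ships unstructured magnitude pruning in `torch.nn.utils.prune`. The tiny linear layer below is a placeholder for a real network, and the 30% sparsity level is an arbitrary choice; production pruning would be applied across many layers and followed by re-evaluation.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Placeholder layer standing in for part of a real transformer.
layer = nn.Linear(1024, 1024)

# Zero out the 30% of weights with the smallest magnitudes (L1 criterion).
prune.l1_unstructured(layer, name="weight", amount=0.3)

# Fold the pruning mask into the weight tensor to make it permanent.
prune.remove(layer, "weight")

sparsity = (layer.weight == 0).float().mean().item()
print(f"Weight sparsity after pruning: {sparsity:.0%}")  # ~30%
```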

b) Leveraging Pre-Trained Models

Companies can save by using pre-trained models and applying minimal fine-tuning on smaller datasets, reducing compute requirements.
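One common pattern with Hugging Face models is to freeze the pre-trained body and train only a small task head, so far fewer parameters need gradients and optimiser state. The checkpoint name and label count below are arbitrary examples, not a recommendation.

```python
from transformers import AutoModelForSequenceClassification

# Arbitrary example checkpoint and label count.
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# Freeze the pre-trained encoder; only the classification head stays trainable.
for param in model.distilbert.parameters():
    param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Training {trainable:,} of {total:,} parameters ({trainable / total:.1%})")
```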

c) Hybrid Cloud and Edge Computing

Combining cloud-based solutions with edge computing can optimise both cost and performance. Deploying models closer to the data source can reduce latency and bandwidth expenses.
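In practice, a hybrid deployment often comes down to a routing decision: serve short, latency-sensitive requests from a small model on the edge device and forward everything else to a larger cloud-hosted model. The size threshold and both handler functions in this sketch are hypothetical placeholders.

```python
# Hypothetical routing sketch: both handlers are placeholders, and the
# threshold is an assumption to be tuned for a real workload.
EDGE_MAX_PROMPT_CHARS = 500

def run_on_edge(prompt: str) -> str:
    return f"[edge model] {prompt[:20]}..."   # small local model (placeholder)

def run_in_cloud(prompt: str) -> str:
    return f"[cloud model] {prompt[:20]}..."  # large hosted model (placeholder)

def route(prompt: str, needs_low_latency: bool) -> str:
    # Short, latency-sensitive prompts stay local to avoid network round-trips;
    # everything else goes to the larger cloud model.
    if needs_low_latency and len(prompt) <= EDGE_MAX_PROMPT_CHARS:
        return run_on_edge(prompt)
    return run_in_cloud(prompt)

print(route("Translate 'hello' to French.", needs_low_latency=True))
```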

d) Open-Source Resources

Participating in open-source communities or using platforms like Hugging Face can eliminate licensing fees and reduce development effort.

e) Partnerships and Grants

Engaging with cloud providers for credits or applying for research grants can help alleviate the financial burden.

Finding the balance (Accuracy vs. Costs)

One of the ongoing challenges with foundational LLMs is balancing accuracy with cost. While these models are powerful, they are not always optimised for specific tasks out of the box, particularly when dealing with domain-specific language such as legal, financial, or medical jargon. Fine-tuning for higher accuracy adds to both the computational and financial load, creating a barrier to adoption, especially for startups.

What does the future look like?

Dr Utpal Chakraborty

Overcoming the infrastructure cost barrier for LLMs will require a mix of technological advancements and strategic optimisations. Innovations like model pruning and distillation will make it easier to run LLMs on modest infrastructure. Meanwhile, advancements in AI accelerators and energy-efficient hardware will lower operational costs. Cloud providers are also likely to offer more tailored, cost-effective solutions for startups.

In the future, a combination of cloud and edge computing will help balance cost, latency, and performance, making AI more accessible for businesses of all sizes. As the technology matures, we can expect a more inclusive landscape, where both startups and large enterprises can harness the power of AI without breaking the bank.

Dr Utpal Chakraborty is Chief Technology Officer at IntellAI NeoTech, Professor of Practice at VIPS, and Gartner Ambassador (AI). A former Head of Artificial Intelligence at YES Bank, he is an eminent AI, quantum, and data scientist, researcher, and strategist with 21 years of industry experience, including past assignments as Principal Architect at L&T Infotech, IBM, Capgemini, and other MNCs. He is a well-known researcher, writer (author of six books), and speaker on Artificial Intelligence, IoT, Agile, and Lean at TEDx events and conferences around the world.

His recent machine learning research, titled "Layered Approximation for Deep Neural Networks," has been well received at premier conferences, institutions, and universities.
