March 27, 2024

Using LLMs for Enterprise Use Cases: How Much Does It Really Cost?

Rod Rivera

Main takeaways

  1. Techniques like model compression, retrieval-augmented generation (RAG), and using smaller models for narrower tasks can make large language models (LLMs) more efficient and cost-effective.
  2. For enterprises, common use cases for LLMs include semantic search, document processing, text summarization, and generation tasks over unstructured data.
  3. When starting with LLMs, it's essential to set realistic expectations, start small, and quantify the potential benefits to justify the investment.
  4. Cost drivers for using LLMs in the enterprise include computing costs (GPU/CPU), model size, engineering efforts, compliance/legal costs, and token usage optimization.
  5. Options for deploying LLMs include using cloud APIs, self-hosting open-source models on-premises, or hybrid approaches based on data sensitivity. Titan Takeoff enables self-hosting and hybrid approaches without any additional overhead.
  6. Banking and financial services are promising industries for LLM adoption due to the large volumes of unstructured data like research reports, contracts, and communications.
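The cost drivers listed above can be made concrete with back-of-the-envelope arithmetic. The sketch below estimates monthly spend from request volume and token counts; the per-1K-token prices are purely illustrative placeholders, not quotes from any provider:

```python
def estimate_monthly_cost(
    requests_per_day: int,
    input_tokens: int,
    output_tokens: int,
    price_in_per_1k: float,
    price_out_per_1k: float,
    days: int = 30,
) -> float:
    """Rough monthly spend for a single LLM use case, in dollars."""
    per_request = (
        input_tokens / 1000 * price_in_per_1k
        + output_tokens / 1000 * price_out_per_1k
    )
    return per_request * requests_per_day * days

# 10,000 daily requests, 1,500 prompt + 300 completion tokens,
# at illustrative prices of $0.01 in / $0.03 out per 1K tokens:
cost = estimate_monthly_cost(10_000, 1_500, 300, 0.01, 0.03)
# cost == 7200.0 (dollars per month)
```

Running this kind of estimate per use case makes it much easier to weigh expected benefits against spend before committing to a deployment.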

Recent advancements in generative AI, particularly large language models from the GPT family, have created immense excitement and opportunities across industries. CIOs of large enterprises understand this technology's potential to drive efficiencies, augment knowledge workers, and unlock new frontiers of innovation. However, they also recognize the complexities and challenges that come with adopting cutting-edge AI capabilities at an organizational scale.

One key lesson from our recent discussion with AI experts at Dataiku is the importance of setting realistic expectations. While the versatility of large language models is impressive, treating them as a silver bullet would be a mistake. We should approach them as competent but imperfect "interns" who require guidance, oversight, and integration into our existing processes and systems.

Another crucial aspect is cost management. If not managed prudently, these models' compute requirements and data needs can quickly escalate costs. It's essential to quantify the potential benefits of each use case and weigh them against the investment required. The good news is that strategies like model compression, retrieval-augmented generation, and task-specific smaller models can help optimize costs without sacrificing performance.
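To illustrate why retrieval-augmented generation reduces cost: rather than stuffing every document into the prompt, you retrieve only the passages relevant to the query, so far fewer input tokens are billed per request. The sketch below uses naive word overlap purely for illustration; production RAG systems use embedding-based vector search instead:

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Score documents by word overlap with the query; return the top k.

    A toy stand-in for real retrieval (embeddings + vector search).
    """
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

docs = [
    "Quarterly revenue grew 12% on strong loan demand.",
    "The cafeteria menu changes every Monday.",
    "Credit risk exposure fell after the contract review.",
]
# Only the two most relevant passages go into the prompt:
context = retrieve("revenue and credit risk in contracts", docs)
```

The prompt then contains two short passages instead of the whole corpus, which is the entire cost-saving mechanism of RAG, independent of which retriever is used.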

When it comes to deployment, there are several options. Cloud APIs from providers like OpenAI offer a low-friction entry point for experimentation. However, for sensitive data or enterprise-scale production deployments, self-hosting open-source models on-premises may be more suitable, albeit with additional engineering overhead. Hybrid approaches that combine the flexibility of cloud APIs with the control of self-hosting are also worth exploring. Titan Takeoff, our inference server, gives users the freedom to choose between open-source and closed-source models without the hassle, while staying within their own secure premises and internal infrastructure.
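A hybrid setup can be sketched as a simple routing rule: requests that touch sensitive data go to a self-hosted, OpenAI-compatible endpoint, while everything else may use a cloud API. The endpoint URL and model names below are hypothetical placeholders, not real deployments:

```python
def route_request(contains_pii: bool) -> dict:
    """Hybrid routing: sensitive traffic stays on-prem, the rest may use a cloud API.

    Both the internal URL and the model names are illustrative placeholders.
    """
    if contains_pii:
        return {
            "base_url": "http://takeoff.internal:3000/v1",  # hypothetical self-hosted endpoint
            "model": "local-llama",
        }
    return {
        "base_url": "https://api.openai.com/v1",
        "model": "gpt-3.5-turbo",
    }

cfg = route_request(contains_pii=True)
```

Because both targets speak the same API shape, the application code stays identical and only the base URL and model name change per request, which is what makes the hybrid pattern cheap to adopt.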

Regardless of the deployment approach, a robust data and AI platform that integrates with existing systems and provides capabilities like cost tracking, security, and data pipelines is invaluable. Solutions like Dataiku's LLM Mesh offer a comprehensive ecosystem to manage and orchestrate generative AI use cases within the enterprise.

As we roll out AI in the enterprise, starting small and iterating is crucial. Identify low-risk, high-impact use cases that involve processing and understanding large volumes of unstructured data, such as research reports, contracts, and communications. Industries like banking and financial services, which heavily rely on unstructured data, could be prime candidates for early adoption.
