In recent months, generative AI models and large language models (LLMs) like OpenAI's GPT-4 have led to massive improvements in natural language processing (NLP) capabilities. With their ability to understand language context and generate human-like text, these AI models have enabled new opportunities for businesses to gain insights from their textual data. However, relying solely on LLMs also comes with significant downsides, including high costs, security risks, and negative environmental impact.
At Lettria, we work with businesses across industries to build custom NLP solutions. While we’re integrating LLMs into some stages of our pipeline, such as data cleaning and enrichment, we believe developing your own models tailored to your needs is the most sustainable approach for most companies. Our platform uses a hybrid methodology, designed to leverage LLMs and generative AI models where they excel, while focusing our resources on building specialized models for core analysis and prediction tasks.
For example, one of our clients had never built an NLP model before; their team lacked the technical expertise and resources to do so from scratch. They planned to rely on a pre-built LLM, but found that running it at scale on their data volume would put them thousands of dollars over budget. Using our platform, they leveraged an LLM to label and enrich a subset of their data, then trained their own model for sentiment analysis, coming in hundreds of dollars under what they had expected to spend. This hybrid approach gave them the best of both worlds: the power of LLMs with the affordability and control of a model of their own.
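The client's workflow above can be sketched in a few lines of Python: an LLM labels a small, affordable sample, and those labels train a lightweight in-house classifier that runs over the full data volume. This is only an illustration, not Lettria's implementation; `llm_label` is a stub standing in for a real LLM call, and the bag-of-words "model" stands in for a real supervised classifier.

```python
# Sketch of the hybrid approach: an LLM labels a small sample of documents,
# then a cheap supervised model is trained on those labels and used at scale.
from collections import Counter

def llm_label(text: str) -> str:
    """Placeholder (assumed, not a real API) for an LLM labeling call
    that returns 'positive' or 'negative' for a document."""
    return "positive" if any(w in text.lower() for w in ("love", "great")) else "negative"

# Step 1: label only a small, affordable subset with the LLM.
sample = [
    "I love this product",
    "Great support team, very helpful",
    "Terrible experience, full of bugs",
    "The update broke everything",
]
labels = [llm_label(t) for t in sample]

# Step 2: "train" a minimal in-house model on the LLM-labeled subset --
# here a naive bag-of-words word-weight vote, standing in for a real classifier.
word_weights = Counter()
for text, label in zip(sample, labels):
    for word in text.lower().split():
        word_weights[word] += 1 if label == "positive" else -1

def predict(text: str) -> str:
    score = sum(word_weights.get(w, 0) for w in text.lower().split())
    return "positive" if score >= 0 else "negative"

# Step 3: run the cheap in-house model over the full data volume.
print(predict("I love the new release"))
```

The key cost property is that the expensive LLM call happens once per sampled document at training time, while the in-house model handles all subsequent traffic.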
LLMs for Text Analysis
Key Issues With These AI Models
LLMs have enabled huge leaps forward for NLP, but they also come with significant downsides:
- High costs: LLMs require enormous computational power to train and run, which translates into high hardware, software, and energy costs. For smaller businesses, these costs can be a barrier to accessing NLP capabilities.
- Data privacy and security risks: LLMs are trained on massive datasets that can contain sensitive information, making them vulnerable to data privacy leaks and adversarial attacks. There is also a risk of unwanted bias or flawed logic in their outputs.
- Negative environmental impact: The computing required for LLMs relies heavily on fossil fuels and puts a strain on energy resources. Training a single LLM can emit as much CO2 as driving a car for years.
Why Build Your Own NLP?
At Lettria, we believe building your own NLP models tailored to your needs is the most sustainable, cost-effective approach for most businesses. Developing specialized models in-house provides several key benefits:
- Control and transparency: You have full control and visibility into how your models work, using data and methodologies tailored to your needs. This enables explainability and reduces unwanted bias.
- Data privacy and security: Keeping models and data in-house limits exposure of sensitive information. You can also implement robust security practices and controls tailored to your needs. Deploying on private infrastructure rather than relying on public cloud services gives you more control over data privacy and governance.
- Cost savings: Developing focused models requires fewer computational resources than running a broad pre-built LLM, reducing costs for hardware, software, and environmental impact.
- Flexibility: Custom models can be adapted and retrained as your needs evolve, ensuring they continue to meet your business objectives in an optimal way.
- Explainability: Supervised models trained on your data give you a quantified accuracy rate for each label or category, whereas LLMs offer only limited insight into how they arrive at their outputs.
Explainability: Quantifying the Accuracy of Your Results
For many NLP use cases, especially those involving sensitive data or decisions, explainability is crucial. When using a pre-built LLM, it can be difficult to determine exactly how accurate its outputs are for each category or classification.
Custom models trained on your data, however, provide clear accuracy metrics for each label: switching to a supervised learning model gives you a quantified accuracy rate for every classification. For sensitive use cases, this is essential to understanding the reliability and limitations of your model's predictions.
The Lettria platform provides accuracy metrics and insights that enable you to understand exactly how your models are performing for each classification task. This explainability allows for more informed, responsible development of machine learning capabilities. Our hybrid methodology balances pre-built LLMs where they work well with supervised models for core tasks requiring high explainability.
Data Privacy: Keeping Full Control of Your Information
Large language models require huge amounts of data to train and run, raising valid concerns over privacy and data governance. When relying on public cloud infrastructure and services, there is always a possibility of data leaks or unwanted access to information, whether through hacking or issues with providers.
For many organizations, especially those in highly regulated industries like healthcare and finance, maintaining full control over data privacy and compliance is essential. Deploying models on private infrastructure rather than relying on public cloud services provides more control, security, and governance tailored to your needs.
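In practice, self-hosting can be as simple as wrapping the trained model in a small HTTP service bound to a private interface, so documents never leave your infrastructure. A minimal stdlib sketch, where `classify` is a stand-in for a real in-house model rather than any actual Lettria component:

```python
# Minimal sketch of self-hosted inference: a small HTTP endpoint on your
# own server, so sensitive documents never leave your infrastructure.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def classify(text: str) -> str:
    """Placeholder for an in-house model's prediction."""
    return "positive" if "good" in text.lower() else "negative"

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        text = json.loads(body).get("text", "")
        response = json.dumps({"label": classify(text)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(response)

# To serve on a private interface (data stays in-house), you would run:
#   HTTPServer(("127.0.0.1", 8080), PredictHandler).serve_forever()
```

Because the endpoint lives on infrastructure you control, access policies, logging, and retention can all be tailored to your compliance requirements.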