TL;DR
"Just the other day, I was deciding which set of LLM tools to use to build my company's upcoming customer support chatbot, and it was the easiest decision of my life!" - said no one ever 🚩🚩🚩
It has been a while since GPT-4's release, but it still seems like every week a new open-source LLM framework launches, each doing the same thing as its 50+ competitors while desperately explaining how it is better than its predecessor. At the end of the day, what developers like you really want is some quick personal anecdotes to weigh the pros and cons of each. 👨🏻‍💻
So, as someone who has played around with more than a dozen open-source LLM tools, I'm going to tell you my top picks so you don't have to do the boring work yourself. 😌
Let's begin!
1. DeepEval - The LLM Evaluation Framework
DeepEval is the LLM tool that will help you quantify how well your LLM application, such as a customer support chatbot, is performing 🎉
It takes the top spot for two simple reasons:
- Evaluating and testing LLM performance is IMO the most important part of building an LLM application.
- It is the best LLM evaluation framework available, and it's open-source 💯
For those who might not be as familiar, LLM testing is hard because there are infinite possibilities in the responses an LLM can output. 😟 DeepEval makes testing LLM applications, such as those built with LlamaIndex or LangChain, extremely easy. It:
- Offers 14+ research-backed evaluation metrics to test LLM applications built with literally any framework, like LangChain.
- Is simple to use, with great docs, and intuitive to understand. Perfect for those just getting started, but also technical enough for experts to dive deep into this rabbit hole.
- Integrates with Pytest, so you can include it in your CI/CD pipeline for deployment checks.
- Generates synthetic datasets to help you get started with evaluation in case you don't have a dataset ready.
- Scans for LLM safety risks automatically, such as your LLM app being biased or toxic.
After testing, simply go back to the LLM tool used for building your application (I'll reveal my pick later) to iterate on areas that need improvement. Here's a quick example to test how relevant your LLM chatbot's responses are:
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# A test case pairs the user's input with what your LLM app actually returned
test_case = LLMTestCase(
    input="How many evaluation metrics does DeepEval offer?",
    actual_output="14+ evaluation metrics"
)

# Scores how relevant the output is to the input (uses an LLM as the judge)
metric = AnswerRelevancyMetric()
evaluate(test_cases=[test_case], metrics=[metric])
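If you're wondering what a Pytest-style deployment check looks like in practice, the pattern is: compute a score, compare it to a threshold, and fail the test if it falls short. Here's a minimal sketch of that pattern in plain Python. To be clear, `keyword_overlap_score` and the 0.5 threshold are hypothetical stand-ins I made up for illustration. DeepEval's real metrics use an LLM as the judge, not word overlap.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_overlap_score(question: str, answer: str) -> float:
    """Hypothetical toy metric: fraction of answer words that also appear in the question."""
    answer_words = tokenize(answer)
    if not answer_words:
        return 0.0
    return len(answer_words & tokenize(question)) / len(answer_words)


def test_chatbot_answer_is_relevant():
    """Pytest-style check: fail when the score drops below the threshold."""
    question = "How many evaluation metrics does DeepEval offer?"
    answer = "DeepEval offers 14+ evaluation metrics"
    score = keyword_overlap_score(question, answer)
    assert score >= 0.5, f"relevancy score {score:.2f} is below threshold"
```

Because the check is an ordinary test function, Pytest would pick it up and a CI job would go red when the score regresses.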
(DeepEval's humble mascot wants a star)
2. LlamaIndex - Data Framework for LLM applications
While DeepEval evaluates, LlamaIndex builds. LlamaIndex is a data framework specifically designed for integrating large language models (LLMs) with various data sources, particularly for applications involving retrieval-augmented generation (RAG).
For those who haven't heard of RAG, it is the programmatic equivalent of pasting some text into ChatGPT and asking some questions about it. RAG simply gives your LLM application context it would not otherwise have through the process of retrieval, and LlamaIndex makes this extremely easy.
You see, a big problem in RAG is connecting to data sources and parsing unstructured data (like tables in PDFs) from them. It's not hard, but it is extremely tedious to build out.
Here's an example of how you can use LlamaIndex to build a customer support chatbot to answer questions on your private data:
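from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load every file in the local "data" folder
documents = SimpleDirectoryReader("data").load_data()
# Embed the documents and build an in-memory vector index
index = VectorStoreIndex.from_documents(documents)

# Retrieve the relevant chunks and ask the LLM to answer from them
query_engine = index.as_query_engine()
response = query_engine.query("Some question about the data should go here")
print(response)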
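To make the "retrieval" part of RAG less magical, here's a toy version in plain Python: score each document against the question, keep the best match, and paste it into the prompt. This is only a sketch under heavy assumptions. The bag-of-words `embed` function stands in for a real embedding model, and all the function names and sample documents are mine, not LlamaIndex's.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (a real system uses a neural model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two count vectors; 0.0 if either is empty."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k documents most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(documents, key=lambda doc: cosine(query_vector, embed(doc)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, documents: list[str]) -> str:
    """Paste the retrieved context in front of the question."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


docs = [
    "Refunds are processed within 5 business days.",
    "Our support team is available Monday to Friday.",
    "Shipping is free for orders over 50 dollars.",
]
print(build_prompt("How long do refunds take?", docs))
```

The string returned by `build_prompt` is what you would send to the LLM. The tedious parts (loading files, chunking, real embeddings, a vector store) are what a framework like LlamaIndex takes off your hands.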
3. Ollama - Get up and running with large language models
Evaluating and building are important, but what about data privacy?
Ollama is an interesting one because it lets you run LLMs locally. It allows users to run, customize, and interact with LLMs directly on their own hardware, which can improve privacy, reduce dependency on cloud providers, and optimize latency for certain use cases. Ollama streamlines working with open-source LLMs, making them more accessible and manageable for individuals and organizations without needing extensive machine learning expertise or cloud infrastructure.
For instance, using Ollama, you might load a model for customer support automation that runs locally on company servers. This setup keeps customer data private and may reduce response latency compared to a cloud-based setup. Ollama is also suitable for experimentation with open-source LLMs, like customizing models for specific tasks or integrating them into larger applications without relying on external cloud services.
# List the models you have downloaded locally
ollama list
# Run a model with a prompt (for example, `llama3`; use any model you have pulled)
ollama run llama3 "Explain the benefits of using DSPy."
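Since the whole point is plugging a local model into a larger application, here's a rough sketch of calling Ollama from Python through its local REST API, using only the standard library. I'm assuming the server is running on its default port (11434) and that a model named `llama3` has already been pulled. Both are assumptions about your setup, and `build_request` and `generate` are helper names I made up.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumption: default port)
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build the POST request; "stream": False asks for one JSON reply instead of chunks."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server up, `print(generate("llama3", "Explain the benefits of using DSPy."))` should print the model's answer, and the prompt never leaves your machine.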
4. Guidance
Guidance is a framework designed to help developers craft dynamic, efficient prompts for large language models (LLMs). Unlike traditional prompt engineering, which often relies on fixed templates, Guidance allows prompts to be dynamically constructed, leveraging control structures like loops and conditionals directly within the prompt. This flexibility makes it especially useful for generating responses that require complex logic or customized outputs.
A simple example is customer support bots: use conditionals to create prompts that adapt based on the customer's question, providing personalized responses while maintaining consistency in tone and style, instead of prompting manually.
import guidance

# Note: this uses the legacy handlebars-style template API (guidance < 0.1);
# newer releases replaced it with a Python-native interface.
# Set the default LLM (e.g., OpenAI or another model API)
guidance.llm = guidance.llms.OpenAI("gpt-3.5-turbo")  # You can specify another model if available

# Define the dynamic prompt with Guidance
prompt = guidance("""
{{#user~}}
{{#if summary}}
Please provide a brief summary of the topic: {{topic}}.
{{else}}
Provide a detailed explanation of the topic: {{topic}}, covering all relevant details.
{{/if}}
{{~/user}}
{{#assistant~}}
{{gen 'answer'}}
{{~/assistant}}
""")

# Set up input parameters
params = {
    "topic": "Machine Learning",
    "summary": True  # Toggle between True for summary or False for detailed response
}

# Run the prompt (parameters are passed as keyword arguments)
response = prompt(**params)
print(response["answer"])
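For the customer support case specifically, it helps to see what "loops and conditionals in a prompt" boils down to. The sketch below does the same kind of branching in plain Python, with no Guidance involved. The function name, the premium/free split, and the ticket list are all hypothetical details I picked for illustration.

```python
def build_support_prompt(question: str, is_premium: bool, past_tickets: list[str]) -> str:
    """Assemble a support prompt whose content depends on the customer."""
    lines = ["You are a friendly, concise customer support agent."]

    # Conditional: adapt the instructions to the customer's plan
    if is_premium:
        lines.append("The customer is on the premium plan; offer priority escalation.")
    else:
        lines.append("The customer is on the free plan; point to self-serve docs first.")

    # Loop: include one line per previous ticket, if there are any
    if past_tickets:
        lines.append("Previous tickets from this customer:")
        for ticket in past_tickets:
            lines.append(f"- {ticket}")

    lines.append(f"Customer question: {question}")
    return "\n".join(lines)


print(build_support_prompt("Where is my invoice?", True, ["Login issue (resolved)"]))
```

A templating tool like Guidance lets you keep this logic inside the prompt itself, so the tone-setting first line stays identical for every customer while the rest adapts.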
5. DSPy - Algorithmically optimize LM prompts and weights
DSPy is designed to simplify the process of building applications that use LLMs, like those from OpenAI or Hugging Face. It makes it easier to manage how these models respond to inputs without needing to constantly adjust prompts or settings manually.
The benefit of DSPy is that it simplifies and speeds up application development with large language models by separating logic from prompts, automating prompt tuning, and enabling flexible model switching. This means developers can focus on defining tasks rather than on technical details, making it easier to achieve reliable and consistent results.
However, I've personally found DSPy hard to get started with, which is why it ranks lowest on this list.
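If "separating logic from prompts" sounds abstract, here's a tiny plain-Python sketch of the idea: you declare what goes in and what comes out, the prompt text is generated for you, and the model is just a swappable function. None of this is DSPy's actual API. `Signature`, `render_prompt`, `predict`, and the stub model are hypothetical names I wrote to illustrate the concept.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Signature:
    """Declares the task: an instruction, the input field names, and the output field name."""
    instruction: str
    inputs: tuple[str, ...]
    output: str


def render_prompt(sig: Signature, **values: str) -> str:
    """Generate the prompt text from the declaration, so nobody hand-writes it."""
    missing = [name for name in sig.inputs if name not in values]
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    fields = "\n".join(f"{name}: {values[name]}" for name in sig.inputs)
    return f"{sig.instruction}\n{fields}\n{sig.output}:"


def predict(sig: Signature, model: Callable[[str], str], **values: str) -> str:
    """Run the task with any model; swapping models means swapping one argument."""
    return model(render_prompt(sig, **values))


def stub_model(prompt: str) -> str:
    """Stand-in for a real LLM call."""
    return f"(stub reply to a {len(prompt)}-character prompt)"


qa = Signature("Answer the question briefly.", ("question",), "answer")
print(predict(qa, stub_model, question="What is RAG?"))
```

As the section describes it, DSPy goes a step further and tunes the generated prompts automatically, which a sketch this small doesn't attempt.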
So there you have it, the list of top trending open-source LLM tools and frameworks on GitHub you should definitely use to build your next LLM application. Think there's something I've missed? Comment below to let me know!
Thank you for reading, and till next time 😊