From Zero to AI Hero: Create Your Custom Chatbot with LlamaIndex
LlamaIndex is an open-source framework that lets you connect data sources to large language models (LLMs). It’s used to build applications like chatbots and knowledge agents.
LlamaIndex has the following features:
- Data integration: Integrates data from a variety of sources, including vector stores, document stores, graph stores, and SQL databases
- Querying: Orchestrates workflows for querying data, including prompt chains, advanced RAG, and agents
- Performance evaluation: Measures retrieval and LLM response quality
- Agent architecture: Breaks down complex questions, plans out tasks, and calls APIs
Now, let us try LlamaIndex on some sample data.
I have exported my LinkedIn resume as a PDF, and we will use it as the sample input data to query.
Open Google Colab and create a new notebook.
Let us write Python code to query the resume. We will use the following packages:
- llama-index to get access to its querying functions
- openai to send the queries to an LLM model
- pypdf to interact with PDF files
!pip install llama-index openai pypdf
import os
os.environ["OPENAI_API_KEY"] = "<YOUR_API_KEY>"
Next, you need to create an index on this input data. Place the resume PDF in a folder named Private-Data so the reader can load it:
from llama_index.core import TreeIndex, SimpleDirectoryReader
resume = SimpleDirectoryReader("Private-Data").load_data()
new_index = TreeIndex.from_documents(resume)
Now we are ready to query the resume using the query engine:
query_engine = new_index.as_query_engine()
response = query_engine.query("When did Ashish graduate?")
print(response)
print(query_engine.query("What certifications does Ashish have?"))
print(query_engine.query("What skills does Ashish have?"))
As you can see, the model does a reasonably good job and provides accurate answers so far.
Let us ask more questions, but this time via the chat engine:
chat_engine = new_index.as_chat_engine()
print(chat_engine.chat("Ashish was in which company in 2020?"))
print(chat_engine.chat("After Schlumberger which companies did he work for?"))
As you can see from the output, the model hallucinates and provides an inaccurate answer; it should have mentioned Acquia.
This hands-on tutorial demonstrated a simple AI chatbot over your personal data using LlamaIndex.
You can use a combination of Ollama and Mistral (instead of OpenAI) to send queries to a local LLM, with no API key and no rate limits.
Please note that index creation is a time-consuming process, and the time grows quickly as the input data gets larger. In such cases, consider creating the index once and persisting it to the file system:
new_index.storage_context.persist()
Once that is done, we can quickly load the storage context and rebuild the index whenever it is needed:
from llama_index.core import StorageContext, load_index_from_storage
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
That's it for this tutorial. If you liked my work, consider giving a few claps and follow me on LinkedIn for more such updates.