Building a Streamlit-Powered RAG App with ChromaDB and OpenAI

1. Introduction

A friend of mine regularly runs into legal questions at work, raised by his clients. The thing is, he doesn't have a legal background, and it's not really his job to deal with these issues. But navigating laws and their interconnections gets messy fast, and if it's not your core job, there's almost no way to build up the experience to handle them confidently.

That's why he, his team, and probably others rely on a lawyer to get answers. As you can imagine, these lawyers have a heavy workload, and sometimes it takes weeks to get a response.

When he told me about it, I figured this could be a great opportunity to dive into RAG (Retrieval-Augmented Generation) and build an MVP. So I started working on a RAG chatbot. This post is the first iteration, and there will likely be more as the app evolves and becomes more sophisticated.

2. What is Retrieval-Augmented Generation (RAG)?

LLMs like ChatGPT have a vast amount of general knowledge and can help you with everyday tasks in almost every domain. But when it comes to very specific, detailed knowledge, they may have seen it during training yet struggle to recall it precisely. I imagine this as a kind of "foggy mind": they know the general direction but not every detail. In legal documents like laws, these details might be, e.g., the number of a relevant article or the exact wording of a relevant passage.

This "foggy mind" shortcoming can be overcome with Retrieval-Augmented Generation (RAG). So the LLM does not have to answer out of memory but has a storage with the relevant documents and looks the information up. RAG is a hybrid architecture that has two stages:

  1. Retriever: Given a user's query, the retriever searches the document store for the most relevant context (often via vector similarity using embeddings).
  2. Generator: The retrieved context from stage 1 is then passed to an LLM, which grounds its answer in it.
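
To make the two stages concrete, here is a minimal retrieve-then-generate sketch in Python. The collection name, embedding model, and chat model are my assumptions for illustration, not necessarily what the final app uses:

```python
# A minimal RAG loop: embed the query, fetch similar chunks from
# ChromaDB, then let the LLM answer grounded in that context.
import chromadb
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
chroma = chromadb.PersistentClient(path="./chroma_db")
collection = chroma.get_or_create_collection("laws")  # assumed name

def answer(question: str) -> str:
    # Stage 1: Retriever - vector similarity search over the document store
    query_embedding = client.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding
    results = collection.query(query_embeddings=[query_embedding], n_results=4)
    context = "\n\n".join(results["documents"][0])

    # Stage 2: Generator - the LLM bases its answer on the retrieved context
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```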

3. Frameworks

I wanted to get something up and running as fast and as simply as possible, which is why I went for a Python-only stack and no cloud database. This is what I used:

  • Streamlit - Streamlit is a fast, lightweight Python frontend framework that lets you easily build a simple UI. It is mostly used for early MVPs or internal dashboards. It lacks a lot of configuration options and SEO possibilities, but for fast iteration without overhead it is perfect.

  • ChromaDB - ChromaDB is an open-source vector database built on top of SQLite. It writes to a single file and can therefore easily be self-hosted on a VPS (check this blog post to see how to set up a VPS). It is designed for storing and querying embeddings.

  • OpenAI API - It is simply the easiest option, and for an MVP I did not bother setting up a local LLM. The OpenAI API serves both stages: its embedding models power the retriever and its chat models power the generator.
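
To show how little frontend code Streamlit requires, here is a minimal chat skeleton. The `answer()` call is a stand-in for the retrieve-then-generate function sketched in the previous section, and the title is made up:

```python
# streamlit_app.py - a bare-bones chat UI
# (run with: streamlit run streamlit_app.py)
import streamlit as st

st.title("Legal RAG Chatbot")  # assumed title

if "messages" not in st.session_state:
    st.session_state.messages = []  # keep chat history across reruns

# Replay the conversation so far
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

if question := st.chat_input("Ask a legal question"):
    st.session_state.messages.append({"role": "user", "content": question})
    with st.chat_message("user"):
        st.markdown(question)
    reply = answer(question)  # the RAG pipeline sketched above
    st.session_state.messages.append({"role": "assistant", "content": reply})
    with st.chat_message("assistant"):
        st.markdown(reply)
```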

4. Architecture

This is how the different pieces fit together:

Architecture Diagram

How it works:

  1. The user asks a question in the Streamlit app
  2. The app retrieves relevant law passages and articles from ChromaDB
  3. The app sends the context and question to OpenAI
  • LangChain orchestrates the workflow (see the sketch after this list):
    • It takes the user's question and the retrieved context
    • Formats them into a prompt using a template
    • Sends this prompt to OpenAI
    • Parses and returns the answer
  4. The LLM generates an answer, citing sources
  5. The answer and sources are shown to the user
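
The LangChain part of step 3 could look roughly like this. The prompt wording and model name are my assumptions; `context` and `question` would come from steps 1 and 2:

```python
# Prompt template -> LLM -> plain string, composed with LangChain's
# LCEL pipe operator (prompt wording is illustrative).
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below, and cite the law "
    "and article for every claim.\n\nContext:\n{context}\n\nQuestion: {question}"
)

chain = prompt | ChatOpenAI(model="gpt-4o-mini") | StrOutputParser()

def generate_answer(context: str, question: str) -> str:
    # context: retrieved law passages; question: the user's input
    return chain.invoke({"context": context, "question": question})
```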

5. Challenges & Learnings

Since I wanted to be able to cite the exact article of a certain law, I had to store the articles as chunks. Where an article was too long, I had to split it into several chunks sharing the same article metadata. Storing this together with the law as metadata in ChromaDB was initially not trivial. In particular, different laws come in different formats, so I had to write a separate chunking function for each.
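
As an illustration of the basic idea (the field names and character limit are my assumptions, and a real chunker should split on article structure rather than raw characters):

```python
# Split an over-long article into chunks that all carry the same
# law/article metadata, so citations still resolve correctly.
def add_article(collection, law: str, article: str, text: str,
                max_chars: int = 2000) -> None:
    chunks = [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    collection.add(
        ids=[f"{law}-{article}-{i}" for i in range(len(chunks))],
        documents=chunks,
        metadatas=[{"law": law, "article": article, "chunk": i}
                   for i in range(len(chunks))],
    )

# Hypothetical usage with the "laws" collection from earlier:
# add_article(collection, law="ExampleLaw", article="Art. 1",
#             text=full_article_text)
```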

Building such applications has gotten much easier in the age of coding agents and AI-assisted IDEs (I only use Cursor these days). But putting together the different frameworks and libraries for this RAG app did not work by prompting alone. Especially the metadata storage in ChromaDB (which references the law and article number) was a struggle in the beginning. But once you go old-school and visit the docs, you are good to go.

See Fast Private Streamlit Sharing with Systemd for a guide on how to run the app as a systemd service on a VPS and share it with others.

Want to build something similar for your team or business? Contact me