Commit 6481295

Merge pull request #3 from akvo/feature/2-seed-rag-web-ui-with-unep-documents
Feature/2 seed rag web UI with unep documents
2 parents 7437ec4 + 2be061a commit 6481295

13 files changed: +451 additions, -9 deletions

.env.example

Lines changed: 10 additions & 0 deletions
@@ -65,3 +65,13 @@ ACCESS_TOKEN_EXPIRE_MINUTES=10080
 
 # Timezone settings (optional)
 TZ=Asia/Shanghai
+
+# DATA PIPELINE
+RAG_USERNAME="rag_admin"
+RAG_PASSWORD="RAGadmin1"
+
+# CUSTOM DEV ENV
+NGINX_PORT=80
+BACKEND_PORT=8000
+DB_PORT=3306
+CHROMADB_IMAGE_VERSION=latest
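The new `# DATA PIPELINE` variables reach the `script` container through the `environment:` entries added to `docker-compose.dev.yml`. The import script's own code is not part of this view, so the following is only a minimal sketch, assuming the script reads the credentials from the process environment and should fail fast when one is missing. `load_rag_credentials` and `MissingCredentialsError` are hypothetical names.

```python
import os


class MissingCredentialsError(RuntimeError):
    """Raised when a required RAG credential is absent or empty (hypothetical)."""


REQUIRED = ("RAG_USERNAME", "RAG_PASSWORD")


def load_rag_credentials(env=None):
    """Return (username, password) read from RAG_USERNAME / RAG_PASSWORD.

    Hypothetical helper, not code from this commit. `env` defaults to
    os.environ so the lookup can be exercised with a plain dict.
    """
    env = os.environ if env is None else env
    # Collect every missing name so the error lists all of them at once.
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise MissingCredentialsError(
            "Missing environment variable(s): " + ", ".join(missing)
        )
    return env["RAG_USERNAME"], env["RAG_PASSWORD"]
```

Failing before any download starts avoids discovering a missing password only at the upload step of the full process.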

backend/app/api/api_v1/api_keys.py

Lines changed: 2 additions & 2 deletions
@@ -11,7 +11,7 @@
 router = APIRouter()
 logger = logging.getLogger(__name__)
 
-@router.get("/", response_model=List[schemas.APIKey])
+@router.get("", response_model=List[schemas.APIKey])
 def read_api_keys(
     db: Session = Depends(get_db),
     skip: int = 0,
@@ -26,7 +26,7 @@ def read_api_keys(
     )
     return api_keys
 
-@router.post("/", response_model=schemas.APIKey)
+@router.post("", response_model=schemas.APIKey)
 def create_api_key(
     *,
     db: Session = Depends(get_db),

backend/app/api/api_v1/chat.py

Lines changed: 2 additions & 2 deletions
@@ -18,7 +18,7 @@
 
 router = APIRouter()
 
-@router.post("/", response_model=ChatResponse)
+@router.post("", response_model=ChatResponse)
 def create_chat(
     *,
     db: Session = Depends(get_db),
@@ -51,7 +51,7 @@ def create_chat(
     db.refresh(chat)
     return chat
 
-@router.get("/", response_model=List[ChatResponse])
+@router.get("", response_model=List[ChatResponse])
 def get_chats(
     db: Session = Depends(get_db),
     current_user: User = Depends(get_current_user),
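Both `api_keys.py` and `chat.py` change their collection routes from `"/"` to `""`. FastAPI's `include_router` registers each route at `prefix + route.path`, so with a mount prefix such as `/chat` the endpoint moves from `/chat/` to `/chat`. Starlette's router, with `redirect_slashes` enabled by default, answers a request that only matches after adding or removing the trailing slash with a 307 redirect. The commit does not state its motivation, but a likely one is avoiding that extra redirect behind nginx, where the redirect URL is built from the request's host and may not carry the externally published port. The sketch below is a simplified model of that matching, not Starlette's code; the `/chat` prefix is hypothetical because the mount point is not shown. Note also that FastAPI refuses to include a route whose prefix and path are both empty, so these routers must be mounted with a prefix.

```python
def full_path(prefix: str, route_path: str) -> str:
    """Path a route ends up at after include_router(router, prefix=prefix)."""
    return prefix + route_path


def resolve(request_path: str, registered: set) -> tuple:
    """Simplified model of Starlette's trailing-slash handling.

    Returns ("ok", path) on an exact match, ("redirect", target) when only
    the slash-toggled variant is registered (Starlette would send a 307),
    or ("not_found", request_path) otherwise.
    """
    if request_path in registered:
        return ("ok", request_path)
    if request_path != "/":
        # Toggle the trailing slash and try again, as redirect_slashes does.
        if request_path.endswith("/"):
            alternative = request_path.rstrip("/")
        else:
            alternative = request_path + "/"
        if alternative in registered:
            return ("redirect", alternative)
    return ("not_found", request_path)


before = {full_path("/chat", "/")}  # route registered as "/chat/"
after = {full_path("/chat", "")}    # route registered as "/chat"
```

With `before`, a client calling `/chat` gets `("redirect", "/chat/")`. With `after`, the same call matches directly as `("ok", "/chat")`.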

dev.sh

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+#!/usr/bin/env bash
+
+COMPOSE_HTTP_TIMEOUT=180 docker compose \
+-f docker-compose.dev.yml \
+"$@"
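`dev.sh` prefixes every invocation with `docker compose -f docker-compose.dev.yml`, sets `COMPOSE_HTTP_TIMEOUT=180` for that single command only, and forwards its own arguments unchanged through `"$@"`. For anyone driving the dev stack from Python, a rough equivalent could look like this; `build_dev_command` is a hypothetical name, and the function only builds the command without running it.

```python
import os


def build_dev_command(args):
    """Return (argv, env) equivalent to running `./dev.sh <args...>`.

    Hypothetical helper. The timeout is added to a copy of the current
    environment, mirroring the one-shot VAR=value prefix in dev.sh.
    """
    env = dict(os.environ, COMPOSE_HTTP_TIMEOUT="180")
    argv = ["docker", "compose", "-f", "docker-compose.dev.yml", *args]
    return argv, env
```

For example, `build_dev_command(["exec", "script", "python", "-m", "kb_init_unep"])` yields the same command the README runs through `./dev.sh`. Passing the result to `subprocess.run(argv, env=env, check=True)` would execute it, assuming `docker compose` is on `PATH`.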

docker-compose.dev.yml

Lines changed: 22 additions & 5 deletions
@@ -2,7 +2,7 @@ services:
   nginx-dev:
     image: nginx:alpine
     ports:
-      - "80:80"
+      - "${NGINX_PORT}:80"
     volumes:
       - ./nginx.dev.conf:/etc/nginx/nginx.conf:ro
     depends_on:
@@ -28,7 +28,7 @@ services:
       context: ./backend
       dockerfile: Dockerfile.dev
     ports:
-      - "8000:8000"
+      - "${BACKEND_PORT}:8000"
     env_file:
       - .env
     environment:
@@ -63,7 +63,7 @@ services:
       - WATCHPACK_POLLING=true
       - CHOKIDAR_USEPOLLING=true
       - NODE_ENV=development
-      - NEXT_PUBLIC_API_URL=http://localhost/api
+      - NEXT_PUBLIC_API_URL=http://localhost:${NGINX_PORT}/api
     ports:
       - "3000:3000"
     volumes:
@@ -89,14 +89,15 @@ services:
       - MYSQL_PASSWORD=ragwebui
       - TZ=Asia/Shanghai
     ports:
-      - "3306:3306"
+      - "${DB_PORT}:3306"
     volumes:
       - mysql_data:/var/lib/mysql
     networks:
       - app_network
 
   chromadb:
-    image: chromadb/chroma:latest
+    image: chromadb/chroma:${CHROMADB_IMAGE_VERSION}
+    platform: linux/amd64
     ports:
       - "8001:8000"
     volumes:
@@ -118,6 +119,22 @@ services:
     networks:
       - app_network
 
+  script:
+    build:
+      context: ./script
+      dockerfile: Dockerfile
+    depends_on:
+      - backend
+    environment:
+      - RAG_USERNAME=${RAG_USERNAME}
+      - RAG_PASSWORD=${RAG_PASSWORD}
+    volumes:
+      - ./script:/app
+    tty: true
+    command: ["tail", "-f", "/dev/null"]
+    networks:
+      - app_network
+
 volumes:
   mysql_data:
   chroma_data:
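The hard-coded ports and image tag are now `${VAR}` references that Compose interpolates from the shell environment or the project's `.env` file. When a referenced variable is unset, Compose substitutes a blank string and prints a warning, so `.env` needs the `# CUSTOM DEV ENV` entries from `.env.example`. The alternative `${VAR:-default}` form falls back to a default when the variable is unset or empty. The sketch below is a simplified model of just these two forms, assumed for illustration; it is not Compose's implementation and deliberately ignores `$VAR`, `$$`, `${VAR-default}` and `${VAR:?err}`.

```python
import re

# ${NAME} or ${NAME:-default}; other Compose interpolation forms are not handled.
_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def interpolate(value: str, env: dict) -> str:
    """Substitute ${VAR} and ${VAR:-default} references in `value`."""

    def _substitute(match):
        name, default = match.group(1), match.group(2)
        current = env.get(name)
        if current:  # set and non-empty
            return current
        # Unset/empty: use the default if one was given, else a blank string.
        return default if default is not None else ""

    return _PATTERN.sub(_substitute, value)
```

For instance, `interpolate("${NGINX_PORT}:80", {})` collapses to `":80"`, while `interpolate("chromadb/chroma:${CHROMADB_IMAGE_VERSION:-latest}", {})` still resolves to `"chromadb/chroma:latest"`.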

script/.gitignore

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+./downloads

script/Dockerfile

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+FROM python:3.11-slim
+
+ENV PYTHONUNBUFFERED 1
+ENV PYTHONDONTWRITEBYTECODE 1
+
+WORKDIR /app
+
+COPY requirements.txt /app/requirements.txt
+RUN pip install --no-cache-dir -r /app/requirements.txt
+
+COPY . /app
+
+# Set up the command to keep the container running
+CMD ["tail", "-f", "/dev/null"]

script/README.md

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+# 📚 Table of Contents
+
+- [📚 Table of Contents](#-table-of-contents)
+- [🤖 UNEP Knowledge Base import script](#-unep-knowledge-base-import-script)
+- [🔐 Environment Variables](#-environment-variables)
+- [🚀 Running the Script](#-running-the-script)
+- [📁 Directory Structure](#-directory-structure)
+
+---
+
+# 🤖 UNEP Knowledge Base import script
+
+This script automates the process of collecting, saving, and uploading PDF documents from [GlobalPlasticsHub](https://globalplasticshub.org) into a RAG (Retrieval-Augmented Generation) system.
+
+This Python script supports three main operation modes:
+
+1. **CSV Only** – Save PDF URLs to a CSV file.
+2. **CSV + Download** – Save URLs and download the corresponding PDFs.
+3. **Full Process** – Save URLs, download PDFs, and upload/process them in RAG.
+
+## 🔐 Environment Variables
+
+Before running the script, set RAG credentials in your shell or environment:
+
+``` bash
+export RAG_USERNAME="rag_admin"
+export RAG_PASSWORD="RAGadmin1"
+```
+
+## 🚀 Running the Script
+
+To execute the script:
+
+```bash
+./dev.sh exec script python -m kb_init_unep
+```
+
+You will be prompted to:
+- Choose the operation mode:
+1: Save PDF URLs to CSV only.
+2: Save to CSV and download PDFs.
+3: Full process (CSV + download + upload to RAG).
+
+- Enter the number of documents to import.
+- Provide a description for the RAG knowledge base.
+
+## 📁 Directory Structure
+```bash
+./downloads/unep/unep_files.csv – Stores PDF URLs and offsets.
+./downloads/unep/ – Folder where downloaded PDF files are saved.
+```
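The `kb_init_unep` module itself is not among the diffs shown here, so the following is only a hypothetical sketch of how the three README modes could be modelled: each mode runs a cumulative prefix of one pipeline (save CSV, then download, then upload). It also includes a writer for `unep_files.csv`. `Mode`, `steps_for`, `url_csv`, the step names and the `offset,url` column layout are all assumptions; the README says only that the CSV stores PDF URLs and offsets.

```python
import csv
import io
from enum import IntEnum


class Mode(IntEnum):
    CSV_ONLY = 1          # 1: save PDF URLs to CSV only
    CSV_AND_DOWNLOAD = 2  # 2: save to CSV and download PDFs
    FULL = 3              # 3: CSV + download + upload to RAG


# Hypothetical step names; each mode runs the first N of them.
STEPS = ("save_csv", "download", "upload")


def steps_for(choice: str) -> tuple:
    """Map the menu answer ("1", "2" or "3") to the steps it runs.

    Raises ValueError for any other answer.
    """
    mode = Mode(int(choice))
    return STEPS[: mode.value]


def url_csv(rows) -> str:
    """Render (offset, url) pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["offset", "url"])
    writer.writerows(rows)
    return buffer.getvalue()
```

Treating the modes as prefixes of one ordered pipeline keeps the menu and the work in sync: adding a step later changes only `STEPS` and the enum.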

script/__init__.py

Whitespace-only changes.
