Scalable question answering on long documents using a divide-and-conquer pipeline that turns documents into structured tables, performs data reconciliation, and answers via direct reasoning or SQL over DuckDB.
- Modular systems: `sliders` agent, direct LLM baselines, and sequential chunking
- Pluggable components: schema generation, extraction, merging, and answering
- Reconciliation is required: independent chunk extractions are normalized and consolidated via an LLM-based agent that emits declarative SQL over entity tables; this step is foundational for downstream reasoning and transparent queries
- Config-driven experiments (FinanceBench, Loong, BabiLong)
- Built-in logging, caching, and result summaries
```bash
# Requires Python >= 3.11 and uv (https://docs.astral.sh/uv/)
uv venv   # create a virtual environment
uv sync   # install dependencies from pyproject.toml
```

Optional system deps:
- Redis (for zero-temp LLM response caching):

```bash
sudo apt-get install redis-server
sudo service redis-server start
```
Set Azure OpenAI credentials (or compatible) in `.env` at the repo root:

```bash
AZURE_OPENAI_API_KEY=...   # required
AZURE_URL_ENDPOINT=...     # e.g., https://<your-endpoint>.openai.azure.com/
```

The library auto-loads `.env` via `sliders/globals.py` and initializes prompt templates.
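For reference, here is a minimal sketch of what this kind of `.env` auto-loading typically looks like with `python-dotenv`; the actual logic in `sliders/globals.py` may differ.

```python
# Sketch only: typical .env auto-loading with python-dotenv.
# The actual implementation in sliders/globals.py may differ.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory / repo root into os.environ

api_key = os.environ["AZURE_OPENAI_API_KEY"]    # required
endpoint = os.getenv("AZURE_URL_ENDPOINT", "")  # falls back to empty string if unset
```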
The experiment configs reference local dataset paths. Update them to match your environment:
- FinanceBench markdowns and JSONL
- Loong benchmark JSONL and docs directory
- BabiLong generated JSON
See: configs/*.yaml and the docs below for details.
```bash
# Run with a config (recommended)
uv run sliders/runner.py --config configs/finance_bench_sliders_agent.yaml

# Run in parallel (async per-question)
uv run sliders/runner.py --config configs/loong_sliders.yaml --parallel
```

Outputs are written to the `SLIDERS_RESULTS` directory. Set it before running, for example:

```bash
export SLIDERS_RESULTS="$(pwd)/results" && mkdir -p "$SLIDERS_RESULTS"
```

Experiments are driven by YAML configs with three main sections:
- `experiment`: which benchmark to run, e.g., `finance_bench | loong | babilong`
- `system`: which system to use, e.g., `sliders | direct_tool_use | direct_no_tool_use | sequential | rlm`
- `system_config`, `experiment_config`, `output_file`: component-level knobs and I/O
Example: `configs/finance_bench_sliders_agent.yaml`

```yaml
experiment: finance_bench
system: sliders
system_config:
  generate_task_guidelines: false
  generate_schema:
    add_extra_information_class: false
  extract_schema:
    decompose_fields: false
    dedupe_merged_rows: false
    num_samples_per_chunk: 1
  merge_tables:
    merge_strategy: seq_agent  # or objectives_based
  models:
    answer: { model: gpt-4.1, max_tokens: 8192, temperature: 0.0 }
    # ... other model roles ...
experiment_config:
  benchmark_path: /path/to/financebench.jsonl
  files_dir: /path/to/financebench/markdown/pdfs/
  soft_evaluator_model: gpt-4.1
  hard_evaluator_model: gpt-4.1
  num_questions: null
  random_state: 42
  document_config: { chunk_size: 16000, overlap_size: 0 }
output_file: finance_bench_sliders.json
```

See the full configuration reference in the docs.
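For intuition about `document_config`, here is a minimal character-based sketch of how `chunk_size` and `overlap_size` interact; the actual splitter in sliders may work on tokens or document structure instead.

```python
# Illustrative semantics of chunk_size / overlap_size (character-based sketch;
# the real splitter may operate on tokens or markdown structure).
def chunk_document(text: str, chunk_size: int = 16000, overlap_size: int = 0) -> list[str]:
    if chunk_size <= overlap_size:
        raise ValueError("chunk_size must be larger than overlap_size")
    step = chunk_size - overlap_size
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

# A 40,000-character document with chunk_size=16000 and overlap_size=0 yields 3 chunks.
```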
- `sliders` agent: full pipeline
  - Schema generation (`sliders/modules/generate_schema.py`)
  - Schema-based extraction (`sliders/modules/extract_schema.py`)
  - Table merging (`sliders/modules/merge_schema.py`, strategies in `modules/merge_techniques/`)
  - Answering: direct or SQL over DuckDB
Note: Reconciliation/merging is a crucial, non-optional stage when running the sliders system. It consolidates partial, fragmented, or conflicting values into a consistent database-style representation that downstream reasoning depends on.
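To make the reconciliation step concrete, below is a hypothetical, hand-written DuckDB example of consolidating partial per-chunk extractions into one entity table and then answering with SQL; the table names, columns, sample values, and SQL are illustrative only, not the agent's actual output.

```python
# Hypothetical illustration of reconciliation + SQL answering over DuckDB.
# In sliders, the consolidating SQL is emitted by an LLM-based agent; here it is hand-written.
import duckdb

con = duckdb.connect()
con.execute(
    "CREATE TABLE raw_extractions ("
    "company TEXT, fiscal_year INTEGER, revenue_musd DOUBLE, source_chunk INTEGER)"
)
con.execute(
    "INSERT INTO raw_extractions VALUES "
    "('3M', 2018, 32765.0, 4), "   # extracted from chunk 4
    "('3M', 2018, NULL, 9), "      # partial/conflicting row from chunk 9
    "('3M', 2019, 32136.0, 12)"
)

# Reconciliation: one row per (company, fiscal_year), preferring non-NULL values.
con.execute(
    "CREATE TABLE company_financials AS "
    "SELECT company, fiscal_year, MAX(revenue_musd) AS revenue_musd "
    "FROM raw_extractions GROUP BY company, fiscal_year"
)

# Answering: a direct SQL query over the reconciled entity table.
print(con.execute(
    "SELECT revenue_musd FROM company_financials "
    "WHERE company = '3M' AND fiscal_year = 2018"
).fetchone())  # (32765.0,)
```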
- Baselines (`sliders/baselines.py`):
  - `direct_no_tool_use`: prompt-only
  - `direct_tool_use`: ReAct-style with tools (e.g., Python)
  - `sequential`: chunk-by-chunk, stopping when an answer is found
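As a rough illustration of the `sequential` baseline's control flow, here is a hedged sketch of chunk-by-chunk answering that stops once a chunk yields an answer; `ask_llm` and the prompt wording are placeholders, not the repo's actual API.

```python
# Sketch of a sequential chunk-by-chunk baseline (hypothetical helper names).
from typing import Callable, Optional

def sequential_answer(
    chunks: list[str],
    question: str,
    ask_llm: Callable[[str], str],  # placeholder for an LLM call
) -> Optional[str]:
    for i, chunk in enumerate(chunks):
        prompt = (
            f"Context (chunk {i + 1}/{len(chunks)}):\n{chunk}\n\n"
            f"Question: {question}\n"
            "Answer using only the context above, or reply UNANSWERABLE."
        )
        answer = ask_llm(prompt).strip()
        if answer and answer.upper() != "UNANSWERABLE":
            return answer  # stop at the first chunk that yields an answer
    return None
```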
- Logging: prompts, tool calls, and stages are logged via `sliders/callbacks/logging.py`.
- Caching: if temperature is `0.0` and Redis is available, identical calls are cached (`sliders/llm/llm.py`).
- Results: a JSON file is emitted to `SLIDERS_RESULTS/<output_file>_<timestamp>.json` with per-question metadata and an accuracy summary.
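For intuition about the caching behavior, here is a minimal sketch of zero-temperature response caching with Redis; the key layout and exact behavior of `sliders/llm/llm.py` may differ.

```python
# Illustrative zero-temperature LLM response cache (not the repo's exact code).
import hashlib
import json

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def cached_llm_call(model: str, prompt: str, temperature: float, call_llm) -> str:
    if temperature != 0.0:
        return call_llm(model, prompt, temperature)  # sampling is nondeterministic: skip cache
    key = "llm:" + hashlib.sha256(
        json.dumps({"model": model, "prompt": prompt}, sort_keys=True).encode()
    ).hexdigest()
    cached = r.get(key)
    if cached is not None:
        return cached  # identical call seen before
    response = call_llm(model, prompt, temperature)
    r.set(key, response)
    return response
```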
- Getting started and configuration reference: see `docs/CONFIG.md`
- Architecture and pipeline details: see `docs/ARCHITECTURE.md`
- Benchmark-specific guidance: see `docs/EXPERIMENTS.md`
- Development notes (env, caching, logging): see `docs/DEVELOPMENT.md`
For deeper usage and customization, continue in the docs folder.
To run on your own dataset, see the "Use your own dataset" section in docs/EXPERIMENTS.md.
If you use SLIDERS in your research or production, please cite the repository (paper coming soon):
```bibtex
@misc{sliders2025,
  title        = {SLIDERS: Scalable Question Answering on Long Documents with Divide-and-Conquer},
  author       = {Harshit Joshi and Jadelynn Dao and Monica S. Lam},
  year         = {2025},
  howpublished = {GitHub repository},
  url          = {https://github.com/stanford-oval/sliders}
}
```

Beware: documentation is generated using LLMs and may contain errors.
