Accessing LLM tools at JLab (delphi.jlab.org)


This is a stub, please find the full documentation at https://pages.jlab.org/physdiv/ai-ml/llm-deployment-docs/

Overview

This document introduces the open-weight large language models (LLMs) deployed at Jefferson Lab. These services expose high-performance, GPU-accelerated LLMs to internal users through authenticated and auditable interfaces. This will also include access to paid commercial LLM vendors such as Google, OpenAI, and others.

The local LLM system consists of four major components:

  1. LibreChat — The user-facing chat frontend authenticated through CILogon
  2. LiteLLM API Gateway — A consolidated OpenAI-compatible inference gateway
  3. Key Manager — A secure service that issues API keys to users and bots
  4. vLLM Workers — GPU-backed inference servers deployed on dedicated NVIDIA A100 nodes
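Because the LiteLLM gateway is OpenAI-compatible, a script can talk to it with a plain HTTP request once a key has been issued by the Key Manager. The sketch below builds such a request using only the Python standard library. The gateway URL and model name here are placeholders, not the real deployment values; consult the full documentation linked above for the actual endpoint and available models.

```python
import json
import urllib.request

# Placeholder endpoint -- the real gateway URL is given in the full docs.
# The path follows the OpenAI chat-completions convention that LiteLLM exposes.
GATEWAY_URL = "https://example.jlab.org/v1/chat/completions"  # assumption


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat-completion request for the gateway."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The bearer token is the API key issued by the Key Manager.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending the request (requires a valid key and the real gateway URL):
# with urllib.request.urlopen(build_chat_request(key, "some-model", "Hi")) as r:
#     reply = json.load(r)["choices"][0]["message"]["content"]
```

Any OpenAI-compatible client library can be pointed at the gateway the same way by overriding its base URL and supplying the issued key.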

Please refer to the full documentation in the link at the top of this page for details.