LLM Hosting

On our HPC infrastructure, we host multiple stand-alone open-weight Large Language Models (LLMs). Access to these models is available through both an API and a web interface, providing a data-sovereign alternative to commercial providers such as OpenAI/ChatGPT and Meta.

This service enables users to process sensitive data securely, with our IT Center ensuring compliance with data protection regulations. We prioritize privacy by not storing any data or requests made during usage.

Note

This service is currently in the beta phase, and access conditions may change in the future.

Provided Models

For a list of provided models, please refer to Available Models.

Access for API Usage

RWTH Aachen University is a member of the German AI Service Center WestAI. To request access to our services and self-hosted Large Language Models (LLMs) through the API, please contact us at contact@westai.de. Ensure that you include "Access to LLMs at RWTH Aachen University" in the subject line of your email. We will generate an API key for you and send it via email.

Access to Web Interfaces

As we are currently in the beta phase, access to the web interfaces is subject to the following limitations:

  • Members of RWTH Aachen University can utilize KI:connect (RWTHgpt) and select one of the models designated as [Experimental]. In the long term, we are developing solutions to make KI:connect accessible to external users as well.
  • For the time being, other users need to request access to the LLM-Lab hosted by Fraunhofer, which also provides a user interface for accessing our self-hosted models as well as others.

API Compatibility and Endpoint

All our models are accessible through an OpenAI-compatible API under:

https://llm.hpc.itc.rwth-aachen.de/v1
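
To keep the key out of source code, you can store it in an environment variable and read it at run time; a minimal sketch with the official openai client (the variable name LLM_API_KEY is our own choice, not prescribed by the service):

import os
import openai

# Read the API key from the environment instead of hard-coding it.
# LLM_API_KEY is an illustrative variable name, not mandated by the service.
client = openai.OpenAI(
    base_url="https://llm.hpc.itc.rwth-aachen.de/v1",
    api_key=os.environ["LLM_API_KEY"]
)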

Examples

The following example shows how to query available models with curl:

curl https://llm.hpc.itc.rwth-aachen.de/v1/models \
    -H "Authorization: Bearer YOUR-API-KEY"

The following examples show how to perform a simple completion:

cURL

curl https://llm.hpc.itc.rwth-aachen.de/v1/completions \
    -H "Authorization: Bearer YOUR-API-KEY" \
    -H "Content-Type: application/json" \
    -d '{
          "model": "mistralai/Mistral-Small-3.2-24B-Instruct-2506", 
          "prompt": "San Francisco is a", 
          "max_tokens": 300
        }'

Python

import openai

client = openai.OpenAI(
    base_url="https://llm.hpc.itc.rwth-aachen.de/v1",
    api_key="YOUR-API-KEY"
)

response = client.completions.create(
    model="mistralai/Mistral-Small-3.2-24B-Instruct-2506",
    prompt="San Francisco is a",
    max_tokens=300
)

print(response)

LangChain

from langchain_openai import OpenAI

llm = OpenAI(
    base_url="https://llm.hpc.itc.rwth-aachen.de/v1",
    api_key="YOUR-API-KEY",
    model="mistralai/Mistral-Small-3.2-24B-Instruct-2506",
    max_tokens=300
)

prompt = "San Francisco is a"
result = llm.invoke(prompt)
print(result)
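
The openai example above prints the full response object; the generated text itself sits in the first entry of choices, following the standard OpenAI completions schema:

# Extract only the generated text from the completions response.
print(response.choices[0].text)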

The following examples show how to perform a chat completion:

cURL

curl -X POST https://llm.hpc.itc.rwth-aachen.de/v1/chat/completions \
    -H "Authorization: Bearer YOUR-API-KEY" \
    -H "Content-Type: application/json" \
    -d '{
          "model": "mistralai/Mistral-Small-3.2-24B-Instruct-2506",
          "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "San Francisco is a"}
          ],
          "max_tokens": 300
        }'

Python

import openai

client = openai.OpenAI(
    base_url="https://llm.hpc.itc.rwth-aachen.de/v1",
    api_key="YOUR-API-KEY"
)

response = client.chat.completions.create(
    model="mistralai/Mistral-Small-3.2-24B-Instruct-2506",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "San Francisco is a"}
    ],
    max_tokens=300
)

print(response)

LangChain

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://llm.hpc.itc.rwth-aachen.de/v1",
    api_key="YOUR-API-KEY",
    model="mistralai/Mistral-Small-3.2-24B-Instruct-2506",
    max_tokens=300
)

messages = [
    ("system", "You are a helpful assistant."),
    ("human", "San Francisco is a"),
]

result = llm.invoke(messages)
print(result.content)
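
For interactive use you may want tokens as they are generated. Whether streaming is enabled on this endpoint is not stated above, but OpenAI-compatible servers typically honor the standard stream parameter; a sketch under that assumption:

import openai

client = openai.OpenAI(
    base_url="https://llm.hpc.itc.rwth-aachen.de/v1",
    api_key="YOUR-API-KEY"
)

# Request a streamed chat completion and print each token fragment on arrival.
stream = client.chat.completions.create(
    model="mistralai/Mistral-Small-3.2-24B-Instruct-2506",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "San Francisco is a"}
    ],
    max_tokens=300,
    stream=True
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()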

More examples (e.g. for vLLM or Ollama) can be found in our Example Collection.

Quotas

At the moment, we have not established any limits on requests or tokens, so that we can better evaluate demand and utilization. However, should we encounter bottlenecks or a significant increase in demand or user activity, we reserve the right to introduce appropriate limits.
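
If limits are introduced later, an OpenAI-compatible endpoint would typically signal them with HTTP 429 responses, which the openai client raises as openai.RateLimitError. A defensive retry loop might look like the following sketch (the backoff schedule is our own choice):

import time
import openai

client = openai.OpenAI(
    base_url="https://llm.hpc.itc.rwth-aachen.de/v1",
    api_key="YOUR-API-KEY"
)

def complete_with_retry(prompt, retries=5):
    # Back off exponentially if the endpoint ever enforces rate limits.
    for attempt in range(retries):
        try:
            return client.completions.create(
                model="mistralai/Mistral-Small-3.2-24B-Instruct-2506",
                prompt=prompt,
                max_tokens=300
            )
        except openai.RateLimitError:
            time.sleep(2 ** attempt)
    raise RuntimeError("rate limit persisted after all retries")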