Guide
Stream responses
MCP's Streamable HTTP transport supports two response modes: SSE streaming and single JSON responses. Lumin uses JSON-only by default. Here is when that matters.
Streamable HTTP, JSON mode
Lumin runs the web-standard StreamableHTTPServerTransport with enableJsonResponse: true. Each tool call returns a single JSON response over a single HTTP request: no long-lived connection, no server-sent events, no chunking.
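As a sketch of what that configuration looks like (assuming the @modelcontextprotocol/sdk TypeScript package, whose transport constructor accepts an enableJsonResponse option; the stateless sessionIdGenerator setting here is illustrative, not taken from Lumin's actual setup):

```typescript
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";

// enableJsonResponse: true makes each POST return one JSON body
// instead of opening an SSE stream for the response.
const transport = new StreamableHTTPServerTransport({
  sessionIdGenerator: undefined, // illustrative: stateless, no session tracking
  enableJsonResponse: true,      // single JSON response per request, no SSE
});
```

With this flag set, the transport answers every tool call in one HTTP round trip, which is the behavior the rest of this guide assumes.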
When this matters
Most tools return in 200 to 800 ms. The longest single tool call is get_transit_advanced at around 2 seconds for a wide scan. That is well inside HTTP timeouts, so there is no reason to switch to SSE for individual calls.
For the seven-phase reading flow, the LLM client orchestrates 25 to 40 sequential JSON tool calls. Stream the LLM's output to your user while those calls run; that is where the perceived latency goes.
If you need true streaming
Lumin's engine itself is request/response. If your downstream client expects SSE for the LLM layer, that is the client's responsibility; Lumin's tool calls remain plain HTTP request/response underneath.