OmniAI 1.0-M2 is here. This milestone brings streaming support for real-time chat experiences and custom handlers for ultimate flexibility.
Since the first milestone, I've not only added two new built-in AI providers, Mistral and Hugging Face, but also delivered two major features that were on my roadmap: streaming responses and the ability to customize how OmniAI interacts with AI providers. Let's dive in.
Streaming: Token by Token
Remember those "..." typing indicators while waiting for AI to think? With streaming, your users can now watch the response appear in real-time, token by token. This isn't just nicer UX; it makes your application feel alive. You can use AIService#chatStream() to achieve this.
service.chatStream(message, token -> {
    System.out.print(token); // Called for each token.
}).exceptionally(e -> {
    System.out.println("\n\nError: " + e.getMessage()); // Handle error.
    return null;
}).thenRun(() -> {
    System.out.println("\n\n"); // Handle completion.
});
Under the hood, OmniAI uses Server-Sent Events (SSE) to receive the stream from the AI provider. Each token triggers your callback, and you can display it immediately. No buffering, no waiting.
Streaming works with OpenAI, Anthropic, Google AI, xAI, and other providers that extend the OpenAI implementation. You can check support programmatically with AIService#supportsStreaming().
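When a provider doesn't support streaming, you can branch on that check and fall back to a regular blocking call. A minimal sketch, assuming the blocking chat(message) variant from the first milestone:

if (service.supportsStreaming()) {
    service.chatStream(message, System.out::print)
        .thenRun(() -> System.out.println());
}
else {
    System.out.println(service.chat(message)); // Blocking call, assumed from the M1 API.
}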
Custom Handlers: Your API, Your Rules
Every AI provider has its quirks. Maybe you need to add custom headers, track usage metrics, or parse responses differently. In OmniAI 1.0-M2, all handlers have been extracted from the AI service implementations into common interfaces and reusable base implementations, giving you a clean way to customize how requests are built and how responses are parsed.
There are two handler types:
- AITextHandler: for chat, translation, moderation, and all text operations
- AIImageHandler: for image analysis and generation
Here's a simple example that adds user tracking to every OpenAI request:
public class TrackingTextHandler extends OpenAITextHandler {

    @Override
    public JsonObject buildChatPayload(AIService service, ChatInput input, ChatOptions options, boolean streaming) {
        var payload = super.buildChatPayload(service, input, options, streaming);
        return Json.createObjectBuilder(payload)
            .add("user", getCurrentUserIdHash())
            .add("metadata", Json.createObjectBuilder()
                .add("session", getCurrentSessionIdHash()))
            .build();
    }
}
Wire it up with CDI:
@Inject
@AI(provider = OPENAI, apiKey = "#{keys.openai}", textHandler = TrackingTextHandler.class)
private AIService trackedService;
Or programmatically (note that a null handler in the strategy will let the service fall back to the provider's default one):
AIStrategy strategy = new AIStrategy(TrackingTextHandler.class, null);
AIService service = AIConfig.of(AIProvider.OPENAI, apiKey).withStrategy(strategy).createService();
Handlers give you full control over:
- Request payload construction
- Response parsing (including custom JSON paths)
- Streaming event processing
- System prompt templates for summarization, translation, etc.
Each provider has a built-in handler (OpenAITextHandler, AnthropicAITextHandler, GoogleAITextHandler, etc.) that you can extend to override only what you need.
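Because the handlers now share a common contract, the same trick works for other providers. Here's a sketch, assuming buildChatPayload() is part of that shared contract, which adds Anthropic's metadata.user_id field instead of OpenAI's user field:

public class TrackingAnthropicTextHandler extends AnthropicAITextHandler {

    @Override
    public JsonObject buildChatPayload(AIService service, ChatInput input, ChatOptions options, boolean streaming) {
        var payload = super.buildChatPayload(service, input, options, streaming);
        return Json.createObjectBuilder(payload)
            .add("metadata", Json.createObjectBuilder()
                .add("user_id", getCurrentUserIdHash())) // Anthropic's counterpart of OpenAI's "user" field.
            .build();
    }
}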
Installation
The second milestone is already available in Maven Central, and it's only 110 KB (for comparison, LangChain4J is well over 2 MB!):
<dependency>
    <groupId>org.omnifaces</groupId>
    <artifactId>omniai</artifactId>
    <version>1.0-M2</version>
</dependency>
It only requires Jakarta JSON-P and optionally Jakarta CDI and Jakarta EL as dependencies, which are readily available on any Jakarta EE compatible runtime.
In case you're using a non-Jakarta EE runtime, such as Tomcat, you'll have to manually provide JSON-P and CDI implementations.
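For example, you could add Eclipse Parsson as the JSON-P implementation and Weld as the CDI implementation; these coordinates are a suggestion only, so pick the versions matching your Jakarta EE baseline:

<dependency>
    <groupId>org.eclipse.parsson</groupId>
    <artifactId>parsson</artifactId>
    <version><!-- version matching your Jakarta JSON-P baseline --></version>
</dependency>
<dependency>
    <groupId>org.jboss.weld.servlet</groupId>
    <artifactId>weld-servlet-shaded</artifactId>
    <version><!-- version matching your Jakarta CDI baseline --></version>
</dependency>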
Demo
Here's a minimal Jakarta Faces-based "Chat with AI!" demo, which extends the previous demo with the new streaming feature.
The session-scoped backing bean, modified to use chat streaming:
src/main/java/com/example/Chat.java
package com.example;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.joining;
import static java.util.stream.Collectors.toList;

import java.io.Serializable;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.UUID;

import jakarta.enterprise.context.SessionScoped;
import jakarta.faces.push.Push;
import jakarta.faces.push.PushContext;
import jakarta.inject.Inject;
import jakarta.inject.Named;
import jakarta.json.Json;

import org.omnifaces.ai.AIService;
import org.omnifaces.ai.cdi.AI;
@Named
@SessionScoped
public class Chat implements Serializable {

    private static final long serialVersionUID = 1L;

    public record Message(Type type, String content, String id) implements Serializable {
        public enum Type {
            sent, received, stream;
        }

        public String toJson() {
            return Json.createObjectBuilder().add("type", type.name()).add("content", content).add("id", id()).build().toString();
        }
    }

    @Inject @AI(apiKey = "your-openai-api-key") // Get a free one here: https://platform.openai.com/api-keys
    private AIService gpt;

    @Inject @Push
    private PushContext push;

    private String message;
    private List<Message> messages = new ArrayList<>();
    public void onload() { // Any ungrouped/aborted stream events need to be collapsed in case page is refreshed; this is not necessary if bean is view scoped instead of session scoped.
        var grouped = messages.stream().collect(groupingBy(Message::id, LinkedHashMap::new, toList()));

        for (var entry : grouped.entrySet()) {
            if (entry.getValue().size() > 1) {
                messages.removeAll(entry.getValue());
                addMessage(Message.Type.received, concatContent(entry.getValue()), entry.getKey());
            }
        }
    }
    public void send() {
        addMessage(Message.Type.sent, message, UUID.randomUUID().toString());
        var id = UUID.randomUUID().toString();

        gpt.chatStream(message, token -> {
            addMessage(Message.Type.stream, token, id);
        }).exceptionally(e -> {
            addMessage(Message.Type.stream, "[response aborted, please retry]", id);
            e.printStackTrace();
            return null;
        }).thenRun(() -> {
            var streamed = messages.stream().filter(m -> id.equals(m.id())).toList();
            messages.removeAll(streamed);

            if (streamed.isEmpty()) {
                addMessage(Message.Type.received, "[no response]", id);
            } else {
                addMessage(Message.Type.received, concatContent(streamed), id);
            }
        });

        message = null;
    }
    private static String concatContent(List<Message> messages) {
        return messages.stream().map(Message::content).collect(joining()).strip();
    }
    private void addMessage(Message.Type type, String content, String id) {
        var message = new Message(type, content, id);
        messages.add(message);
        push.send(message.toJson());
    }
    public String getMessage() {
        return message;
    }

    public void setMessage(String message) {
        this.message = message;
    }

    public List<Message> getMessages() {
        return messages;
    }
}
The simple XHTML (slightly modified to add the <f:viewAction> and an id to the message <div>):
src/main/webapp/chat.xhtml
<!DOCTYPE html>
<html lang="en"
    xmlns:f="jakarta.faces.core"
    xmlns:h="jakarta.faces.html"
    xmlns:ui="jakarta.faces.facelets"
    xmlns:pt="jakarta.faces.passthrough"
>
    <f:metadata>
        <f:viewAction action="#{chat.onload}" />
    </f:metadata>
    <h:head>
        <title>Chat with AI!</title>
        <h:outputStylesheet name="chat.css" />
        <h:outputScript name="chat.js" />
    </h:head>
    <h:body>
        <h:form id="form">
            <h:inputTextarea id="message" value="#{chat.message}" required="true" pt:placeholder="Ask anything" pt:autofocus="true" />
            <h:commandButton id="send" value="Send" action="#{chat.send}">
                <f:ajax execute="@form" render="message" onevent="chat.onsend" />
            </h:commandButton>
        </h:form>
        <h:panelGroup id="chat" layout="block">
            <ui:repeat value="#{chat.messages}" var="message">
                <div id="id#{message.id()}" class="message #{message.type()}">#{message.content()}</div>
            </ui:repeat>
            <script>chat.scrollToBottom();</script>
        </h:panelGroup>
        <h:form id="websocket">
            <f:websocket channel="push" scope="session" onmessage="chat.onmessage" />
        </h:form>
    </h:body>
</html>
The quick'n'dirty CSS (still the same):
src/main/webapp/resources/chat.css
:root {
    --width: 500px;
}

body {
    font-family: sans-serif;
    width: var(--width);
    margin: 0 auto;
}

#form {
    position: absolute; bottom: 0;
    display: flex; gap: 1em;
    width: calc(var(--width) - 1em);
    margin: 1em 0; padding: 1em;
    border-radius: 1em; box-shadow: 0 0 1em 0 #aaa;
}

#form textarea {
    flex: 1;
    height: 4em;
    padding: .75em;
    resize: none;
}

#form textarea, #form input {
    border: 1px solid #ccc; border-radius: .75em;
}

#chat {
    display: flex; flex-direction: column; gap: 1em;
    max-height: calc(100vh - 10em); overflow: auto;
    padding: 1em;
}

#chat .message {
    width: 66%;
    padding: 1em;
    border-radius: 1em;
    white-space: pre-wrap;
}

#chat .message.sent {
    align-self: flex-end;
    border: 1px solid #aca;
}

#chat .message.received {
    border: 1px solid #aac;
}

#chat .progress {
    min-height: 1em;
}

#chat .progress::after {
    content: "";
    animation: dots 1.5s steps(4, end) infinite;
}

@keyframes dots {
    0% { content: ""; }
    25% { content: "."; }
    50% { content: ".."; }
    75% { content: "..."; }
}
The jQuery-less JS (only the onmessage has been rewritten to support chat streaming):
src/main/webapp/resources/chat.js
window.chat = {
    onsend: (event) => {
        if (event.status == "success") {
            document.getElementById("form:message").focus();
        }
    },
    onmessage: (json) => {
        chat.hideProgress();
        const message = JSON.parse(json);
        let div = document.querySelector(`#id${message.id}`);
        if (!div) {
            div = document.createElement("div");
            div.id = `id${message.id}`;
            div.className = `message ${message.type === "stream" ? "received" : message.type}`;
            document.getElementById("chat").appendChild(div);
        }
        if (message.type == "stream") {
            div.textContent += message.content;
        } else {
            div.textContent = message.content;
        }
        if (message.type == "sent") {
            chat.showProgress();
        }
        chat.scrollToBottom();
    },
    showProgress: () => {
        if (!document.querySelector(".progress")) {
            document.getElementById("chat").insertAdjacentHTML("beforeend", '<div class="progress"></div>');
            chat.scrollToBottom();
        }
    },
    hideProgress: () => {
        document.querySelectorAll(".progress").forEach(el => el.remove());
    },
    scrollToBottom: () => {
        const chat = document.getElementById("chat");
        chat.scrollTo({
            top: chat.scrollHeight,
            behavior: "smooth"
        });
    }
};
Don't forget to create an (empty) src/main/webapp/WEB-INF/beans.xml file and to enable the websocket endpoint in web.xml:
src/main/webapp/WEB-INF/web.xml
<context-param>
    <param-name>jakarta.faces.ENABLE_WEBSOCKET_ENDPOINT</param-name>
    <param-value>true</param-value>
</context-param>
Now your chat shows the AI "typing" in real-time rather than appearing all at once after a long wait.
Real APIs, Real Tests
How do you test a library that talks to external AI services? You test it against the real thing.
OmniAI's integration tests hit actual AI provider APIs. No mocks. No fakes. When OpenAI, Anthropic, or Google changes something, we know immediately.
As you can see in the OpenAIServiceTextHandlerIT example,
@EnabledIfEnvironmentVariable(named = OpenAIServiceTextHandlerIT.API_KEY_ENV_NAME, matches = ".+")
class OpenAIServiceTextHandlerIT extends BaseAIServiceTextHandlerIT {

    protected static final String API_KEY_ENV_NAME = "OPENAI_API_KEY";

    @Override
    protected AIProvider getProvider() {
        return AIProvider.OPENAI;
    }

    @Override
    protected String getApiKeyEnvName() {
        return API_KEY_ENV_NAME;
    }
}
... these tests only run when API keys are present as environment variables. This keeps CI clean for contributors without keys, while still allowing full validation when needed. See DEVELOPERS.md for how to configure these keys.
One challenge with real API testing is rate limits. Hit one, and your entire test suite might fail. OmniAI handles this with a custom JUnit extension, the FailFastOnRateLimitExtension:
public class FailFastOnRateLimitExtension implements BeforeEachCallback, TestExecutionExceptionHandler {

    private static final ConcurrentMap<AIProvider, AtomicBoolean> RATE_LIMIT_HITS = new ConcurrentHashMap<>();

    @Override
    public void beforeEach(ExtensionContext context) throws Exception {
        var provider = getProvider(context);

        if (RATE_LIMIT_HITS.computeIfAbsent(provider, p -> new AtomicBoolean(false)).get()) {
            throw new TestAbortedException("Rate limit hit for " + provider + "; skipping remaining tests for this provider, we better retry later.");
        }
    }

    @Override
    public void handleTestExecutionException(ExtensionContext context, Throwable throwable) throws Throwable {
        if (throwable instanceof AIRateLimitExceededException) {
            RATE_LIMIT_HITS.computeIfAbsent(getProvider(context), p -> new AtomicBoolean(false)).set(true);
        }

        throw throwable;
    }

    private static AIProvider getProvider(ExtensionContext context) {
        if (!(context.getRequiredTestInstance() instanceof AIServiceIT instance)) {
            throw new IllegalStateException("FailFastOnRateLimitExtension must be used on subclasses of AIServiceIT");
        }

        return instance.getProvider();
    }
}
Once a rate limit is hit for a provider, remaining tests for that provider are skipped. No wasted quota, no DOS, no cascading failures.
Tests also include a 1-second delay between calls and verify actual response content: not just that an API was called, but that translations preserve markup, language detection returns correct ISO codes, and summarizations stay within word limits.
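To illustrate what such a content assertion might look like, here's a sketch; getService() and detectLanguage() are hypothetical names standing in for the actual test API:

// Sketch only: getService() and detectLanguage() are hypothetical stand-ins for the actual API.
@Test
void detectLanguageReturnsIsoCode() throws Exception {
    Thread.sleep(1000); // 1-second delay between calls to respect rate limits.
    var language = getService().detectLanguage("Bonjour tout le monde !");
    assertEquals("fr", language); // Verify the actual content, not merely that the API answered.
}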
Give it a try
As always, feedback and contributions are welcome on GitHub!
