Monday, February 23, 2026

OmniHai finds its voice

OmniHai 1.2 is out. After 1.1 gave the library ears and the ability to transcribe audio, 1.2 completes the picture by giving it a voice. Text-to-Speech (TTS) joins the API, alongside improvements to file handling and conversation history that make life easier.

Text-to-Speech

Audio generation is now a first-class citizen alongside audio transcription. The simplest form is a single generateAudio() method call that returns raw audio bytes:

byte[] audio = service.generateAudio("Welcome to OmniHai 1.2!");

If you want to stream the result directly to a file without buffering it in memory, pass a Path instead:

service.generateAudio("Welcome to OmniHai 1.2!", Path.of("/path/to/welcome.mp3"));

An async variant, generateAudioAsync(), is of course also available.

So far, only OpenAI and Google AI are supported. The other providers either do not offer any HTTP-based TTS endpoint yet (Anthropic, Mistral, Meta AI, Azure, OpenRouter, Ollama), do not offer a unified API compatible with all their TTS models (Hugging Face), or only expose TTS through WebSocket streaming (xAI), which does not fit the request/response model OmniHai is built on. Support will be added as these providers introduce suitable endpoints.

Customization is available through the new GenerateAudioOptions builder, which exposes voice, speed, and output format. Here's an example for OpenAI GPT:

var options = GenerateAudioOptions.newBuilder()
    .voice("nova")
    .speed(1.25)
    .outputFormat("opus")
    .build();

gpt.generateAudio("Welcome to OmniHai 1.2!", Path.of("/path/to/welcome.opus"), options);

One thing worth mentioning about Google AI: Gemini returns raw PCM audio rather than a proper audio container. OmniHai transparently adds a WAV header before handing back the result, so callers get consistent, playable audio regardless of which provider is used.
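For the curious, wrapping raw PCM in a WAV container boils down to prepending a 44-byte RIFF header describing the sample format. Below is a minimal sketch of that idea, not OmniHai's actual implementation, assuming 16-bit mono PCM:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class WavHeader {

    /** Prepends a standard 44-byte RIFF/WAV header to raw 16-bit mono PCM data. */
    public static byte[] wrapPcm(byte[] pcm, int sampleRate) {
        int byteRate = sampleRate * 2; // sampleRate * channels (1) * bytesPerSample (2)
        ByteBuffer header = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        header.put("RIFF".getBytes());
        header.putInt(36 + pcm.length);  // remaining chunk size after these first 8 bytes
        header.put("WAVE".getBytes());
        header.put("fmt ".getBytes());
        header.putInt(16);               // fmt chunk size for plain PCM
        header.putShort((short) 1);      // audio format: 1 = PCM
        header.putShort((short) 1);      // channels: mono
        header.putInt(sampleRate);
        header.putInt(byteRate);
        header.putShort((short) 2);      // block align: channels * bytesPerSample
        header.putShort((short) 16);     // bits per sample
        header.put("data".getBytes());
        header.putInt(pcm.length);

        byte[] wav = new byte[44 + pcm.length];
        System.arraycopy(header.array(), 0, wav, 0, 44);
        System.arraycopy(pcm, 0, wav, 44, pcm.length);
        return wav;
    }
}
```

Any media player that refuses the bare PCM bytes will happily play the result, which is exactly the consistency the library is after.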

Transcribing from a Path

The existing transcription API only accepted byte arrays, which meant loading the entire audio file into memory before making the API call. In 1.2, transcribe() and transcribeAsync() now also accept a Path:

String transcript = service.transcribe(Path.of("/recordings/meeting.mp3"));

For providers that support a files API, the file is streamed directly from disk, with no intermediate copy and no heap pressure. The byte array overload from 1.1 still works exactly as before.

Path-backed File Attachments

The same improvement extends to chat attachments. ChatInput.Builder#attach() now accepts Path arguments alongside the existing byte array support:

var input = ChatInput.newBuilder()
    .message("Summarize this contract.")
    .attach(Path.of("/path/to/contract.pdf"))
    .build();

var summary = service.chat(input);

MIME type detection still reads only the file's magic bytes rather than inferring the type from the file extension, and the upload itself streams the content from disk. For large PDFs or images this is a meaningful difference, and it requires no change to calling code beyond passing a path instead of a byte array.
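Magic-byte detection itself amounts to comparing the leading bytes of the content against well-known signatures. A simplified sketch, not OmniHai's actual detector, covering just a handful of types:

```java
import java.util.Arrays;
import java.util.HexFormat;

public class MagicMime {

    /** Detects a few common MIME types from a file's leading "magic" bytes. */
    public static String detect(byte[] content) {
        if (startsWith(content, "%PDF".getBytes())) return "application/pdf";
        if (startsWith(content, HexFormat.of().parseHex("89504e47"))) return "image/png";
        if (startsWith(content, HexFormat.of().parseHex("ffd8ff"))) return "image/jpeg";
        if (startsWith(content, "RIFF".getBytes())) return "audio/wav"; // RIFF is shared with WebP/AVI; a real detector also checks bytes 8-11
        return "application/octet-stream";
    }

    private static boolean startsWith(byte[] content, byte[] magic) {
        return content.length >= magic.length
            && Arrays.equals(content, 0, magic.length, magic, 0, magic.length);
    }
}
```

The advantage over extension-based guessing is obvious: a PDF renamed to .txt is still uploaded as a PDF.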

History Initialisation

Conversation memory has always lived in ChatOptions, but there was no way to seed it with a prior exchange. In 1.2 the ChatOptions.Builder gains a history() method that accepts an existing message list:

// At the end of a session, persist the history
List<Message> saved = options.getHistory();

// On the next session, restore it
var options = ChatOptions.newBuilder()
    .systemPrompt("You are a helpful assistant.")
    .withMemory()
    .history(saved)
    .build();

var response = service.chat("Where were we?", options);

This makes it straightforward to persist a conversation to a database, load it back on the next session, and hand it straight to the service without any manual message reconstruction.
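As an illustration of such a round-trip, here is a minimal sketch that persists a history to a file and reads it back in order. The Message record below is a hypothetical stand-in for illustration only; OmniHai's actual Message type may look different:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class HistoryStore {

    /** Hypothetical stand-in for OmniHai's Message type; the real API may differ. */
    public record Message(String role, String content) {}

    /** Persists each message as a single "role<TAB>content" line. */
    public static void save(List<Message> history, Path file) throws Exception {
        Files.write(file, history.stream()
            .map(m -> m.role() + "\t" + m.content().replace("\n", "\\n"))
            .toList());
    }

    /** Reads the history back in the same order it was written. */
    public static List<Message> load(Path file) throws Exception {
        return Files.readAllLines(file).stream()
            .map(line -> line.split("\t", 2))
            .map(parts -> new Message(parts[0], parts[1].replace("\\n", "\n")))
            .toList();
    }
}
```

In a real application the file would of course be a database table keyed by conversation ID, but the shape of the round-trip is the same.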

Getting 1.2

Add the following dependency to your project and you are ready to go:

<dependency>
    <groupId>org.omnifaces</groupId>
    <artifactId>omnihai</artifactId>
    <version>1.2</version>
</dependency>

Feedback and contributions are welcome on the GitHub repository.

Thursday, February 12, 2026

OmniHai grows ears

OmniHai 1.1 is here! This release brings audio transcription, smarter conversation memory, automatic file cleanup, gzip compression, and a pile of hardening across the board.

If you missed the earlier posts: OmniHai is a lightweight Java utility library that provides a unified API across multiple AI providers for Jakarta EE and MicroProfile applications. Check out the introduction, streaming & custom handlers, and 1.0 release posts for the full backstory.

Here are the Maven coordinates:

<dependency>
    <groupId>org.omnifaces</groupId>
    <artifactId>omnihai</artifactId>
    <version>1.1</version>
</dependency>

Audio Transcription

OmniHai now transcribes audio. Just pass in the bytes:

byte[] audio = Files.readAllBytes(Path.of("meeting.wav"));
String transcription = service.transcribe(audio);

That's it. Supported formats include WAV, MP3, MP4, FLAC, AAC, AIFF, OGG, and WebM. The async variant transcribeAsync() is also available, like all other methods in AIService.

Providers with a native transcription API (OpenAI, Mistral, Hugging Face) use it directly for best accuracy. All other providers fall back to a chat-based approach where the audio is attached to a carefully crafted transcription prompt. This means transcription works everywhere, even on providers that don't have a dedicated endpoint for it. Integration tests are also caught up, and they all pass.

A new AIAudioHandler interface joins the existing AITextHandler and AIImageHandler for customization. The default handler produces a verbatim plain-text transcription, but you might want something different. For example: a medical or legal transcription handler that includes domain-specific terminology hints in the prompt, a handler that adds speaker labels and timestamps, or one that outputs SRT/VTT subtitle format instead of plain text. You can plug in your own via @AI(audioHandler = MyAudioHandler.class) or programmatically through AIStrategy. Speaking of which, AIStrategy now has convenient factory methods:

AIStrategy strategy = AIStrategy.of(MyTextHandler.class);
AIStrategy strategy = AIStrategy.of(MyTextHandler.class, MyImageHandler.class, MyAudioHandler.class);

Smarter Conversation Memory

As a reminder: OmniHai's conversation memory is fully caller-owned. There's no server-side session state, no database, no memory leaks, no lifecycle management to worry about. History lives in your ChatOptions instance, not in the service. You control it, you scope it, you discard it. This remains one of OmniHai's key design advantages.

In 1.0, memory kept everything. That's fine for short conversations, but eventually you'll hit the provider's context window. In 1.1, history is maintained as a sliding window with a default of 20 messages (10 conversational turns). Oldest messages are automatically evicted when the limit is exceeded:

ChatOptions options = ChatOptions.newBuilder()
    .withMemory(50) // Keep up to 50 messages (25 turns)
    .build();
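The eviction behaviour can be modelled as a bounded deque. A simplified sketch of the idea, not OmniHai's actual implementation:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class SlidingWindow<T> {

    private final int maxSize;
    private final Deque<T> messages = new ArrayDeque<>();

    public SlidingWindow(int maxSize) {
        this.maxSize = maxSize;
    }

    /** Appends a message, evicting the oldest one when the window is full. */
    public void add(T message) {
        if (messages.size() == maxSize) {
            messages.removeFirst();
        }
        messages.addLast(message);
    }

    /** Returns the current window contents, oldest first. */
    public List<T> asList() {
        return List.copyOf(messages);
    }
}
```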

File attachments are now tracked in history too. When you upload files in a memory-enabled chat, their references are preserved across turns so the AI can continue referencing previously uploaded documents:

ChatOptions options = ChatOptions.newBuilder()
    .withMemory()
    .build();

ChatInput input = ChatInput.newBuilder()
    .message("Analyze this PDF")
    .attach(Files.readAllBytes(Path.of("report.pdf")))
    .build();

String analysis = service.chat(input, options);
String followUp = service.chat("What's on page 2?", options); // AI still has access to the PDF

When messages slide out of the window, their associated file references are evicted as well. File tracking in history requires the AI provider to support a files API, which is currently the case for OpenAI(-compatible) providers, Anthropic, and Google AI.

Automatic File Cleanup

This one's a behind-the-scenes improvement that you don't have to think about, and that's the point ;) When you upload files via the chat API, they end up on the provider's servers. Some providers automatically clean these up after a day or two, or support expiration metadata, but others support neither expiration nor automatic cleanup, so uploaded files could accumulate there indefinitely. OmniHai now handles this: uploaded files are automatically cleaned up in the background after 2 days in a fire-and-forget task. Only files uploaded by OmniHai are touched. No configuration needed.

By the way, the fire-and-forget task automatically uses the Jakarta EE container-managed ExecutorService if available, else the MicroProfile managed one, and otherwise falls back to a standard Java SE Executors.newSingleThreadExecutor() (for e.g. Tomcat).
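The Jakarta EE part of that chain can be sketched as a JNDI lookup of the default managed executor with the Java SE executor as fallback. This is a simplified model; the real chain also consults MicroProfile in between:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.naming.InitialContext;

public class ExecutorLookup {

    /**
     * Tries the Jakarta EE container-managed executor via its standard JNDI
     * name first, and falls back to a plain Java SE single-thread executor
     * when no container is present (e.g. on Tomcat or in plain SE).
     */
    public static ExecutorService resolve() {
        try {
            return InitialContext.doLookup("java:comp/DefaultManagedExecutorService");
        } catch (Exception containerNotAvailable) {
            return Executors.newSingleThreadExecutor();
        }
    }
}
```

Using the container-managed executor where available matters, because container threads carry the context (security, naming, CDI) that manually spawned threads lack.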

Gzip Compression

All HTTP responses from AI providers are now transparently decompressed when gzip-encoded. OmniHai sends Accept-Encoding: gzip on every request and handles the decompression automatically. This reduces bandwidth usage, which is particularly nice for those verbose JSON responses that AI providers love to return.
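Conceptually, the decompression step amounts to checking the Content-Encoding response header and wrapping the body in a GZIPInputStream when it says gzip. A simplified sketch (the gzip() helper exists only to demonstrate the round-trip):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class Gzip {

    /** Decompresses the body only when the response was actually gzip-encoded. */
    public static byte[] decodeBody(byte[] body, String contentEncoding) throws IOException {
        if (!"gzip".equalsIgnoreCase(contentEncoding)) {
            return body;
        }
        try (var in = new GZIPInputStream(new ByteArrayInputStream(body))) {
            return in.readAllBytes();
        }
    }

    /** Gzip-compresses the given bytes (used here only to demonstrate the round-trip). */
    public static byte[] gzip(byte[] data) throws IOException {
        var out = new ByteArrayOutputStream();
        try (var gz = new GZIPOutputStream(out)) {
            gz.write(data);
        }
        return out.toByteArray();
    }
}
```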

Under the Hood

Beyond the headline features, 1.1 includes a bunch of improvements:

  • ChatOptions#withSystemPrompt() creates a copy of existing options with a different system prompt, useful for reusing a base configuration across different use cases.
  • Hardened file upload handling across providers, especially for Mistral compatibility.
  • The OpenRouterAITextHandler was dropped entirely as improved file upload handling in the base OpenAITextHandler made it redundant.
  • Updated the default OpenAI model version.
  • Various javadoc fixes and additional unit/integration tests.

Give It a Try

As always, feedback and contributions are welcome on GitHub. If you run into any issues, open an issue. Pull requests are welcome too.

Wednesday, February 4, 2026

OmniHai 1.0 released!

After two milestones of a lightweight Java library providing one API across multiple AI providers, 1.0-M1: One API, any AI and 1.0-M2: Real-time AI, Your Way, today the library graduates to its first stable release. It also comes with a new name: OmniHai.

Why "OmniHai"?

The rename from OmniAI to OmniHai was necessary because "OmniAI" was already used by several other products, making it difficult to discover, search for, and distinguish. The new name keeps "AI" clearly audible and visible ("Hai" sounds like "AI") while being more memorable, more brandable, and actually findable on search engines. Also, "Hai" is Japanese for "yes", which felt fitting: one yes to any AI provider.

The Maven coordinates are now:

<dependency>
    <groupId>org.omnifaces</groupId>
    <artifactId>omnihai</artifactId>
    <version>1.0</version>
</dependency>

What's New in 1.0

Since the second milestone last week, five major features have been added: structured outputs, file attachments, conversation memory, proofreading, and MicroProfile Config support.

Structured Outputs

This is probably the most impactful addition. Instead of parsing AI responses as free-text strings, you can now get typed Java objects directly:

record ProductReview(String sentiment, int rating, List<String> pros, List<String> cons) {}

ProductReview review = service.chat("Analyze this review: " + reviewText, ProductReview.class);

Under the hood, OmniHai generates a JSON schema from your record (or bean) class, instructs the AI to return conforming JSON, and deserializes the response back. The JsonSchemaHelper supports primitive types, strings, enums, temporals, collections, arrays, maps, nested types, and Optional fields. If necessary, you can also take manual control:

JsonObject schema = JsonSchemaHelper.buildJsonSchema(ProductReview.class);
ChatOptions options = ChatOptions.newBuilder().jsonSchema(schema).build();
String json = service.chat("Analyze this review: " + reviewText, options);
ProductReview review = JsonSchemaHelper.fromJson(json, ProductReview.class);
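To give an idea of what the schema generation involves, here is a heavily simplified reflection-based sketch. The real JsonSchemaHelper covers far more types and builds proper Jakarta JSON-P objects rather than concatenating strings:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MiniSchema {

    /** Sample record for demonstration, mirroring the ProductReview example. */
    public record Sample(String sentiment, int rating) {}

    /** Maps a record class to a minimal JSON-schema-like string; handles only a few types. */
    public static String of(Class<? extends Record> type) {
        String properties = Arrays.stream(type.getRecordComponents())
            .map(c -> "\"" + c.getName() + "\":{\"type\":\"" + jsonType(c.getType()) + "\"}")
            .collect(Collectors.joining(","));
        return "{\"type\":\"object\",\"properties\":{" + properties + "}}";
    }

    private static String jsonType(Class<?> type) {
        if (type == int.class || type == long.class) return "integer";
        if (type == double.class || type == float.class) return "number";
        if (type == boolean.class) return "boolean";
        if (List.class.isAssignableFrom(type)) return "array";
        return "string";
    }
}
```

Record components are reported in declaration order, which is what makes this mapping deterministic.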

The content moderation internals were also refactored to use structured outputs, making ModerationResult parsing more robust across providers.

File Attachments

Chat input now supports attaching any file: images, PDFs, Word documents, audio, and more:

byte[] document = Files.readAllBytes(Path.of("report.pdf"));
byte[] image = Files.readAllBytes(Path.of("chart.png"));

ChatInput input = ChatInput.newBuilder()
    .message("Compare these files")
    .attach(document, image)
    .build();

String response = service.chat(input);

The ChatInput.Builder#images(byte[]...) method introduced in M2 has been replaced by the more general attach(byte[]...) method, which handles any file type the AI provider supports.

Conversation Memory

Multi-turn conversations are now a first-class feature. Enable memory on ChatOptions and OmniHai tracks the full conversation history for you:

ChatOptions options = ChatOptions.newBuilder()
    .systemPrompt("You are a helpful assistant.")
    .withMemory()
    .build();

String response1 = service.chat("My name is Bob.", options);
String response2 = service.chat("What is my name?", options); // AI remembers: "Bob"

// Access conversation history programmatically
List<Message> history = options.getHistory();

The key design decision here is that history lives in ChatOptions, not in the service. There is no server-side session state, no memory leaks, no lifecycle management. The caller owns the conversation. This aligns with the library's philosophy of being a utility for the AI developer (or framework), not a whole framework.

Proofreading

A small but useful addition: AI-powered grammar and spelling correction:

String corrected = service.proofread(text);

AIService#proofread(String) uses a deterministic temperature to ensure consistent, reliable corrections while preserving the original meaning, tone, and style. An async variant, proofreadAsync(String), is of course also available.

MicroProfile Config Support

Alongside the existing Jakarta EL expressions (#{...} and ${...}), the @AI qualifier now also resolves MicroProfile Config expressions ${config:...}:

@Inject
@AI(provider = AIProvider.OPENAI, apiKey = "${config:openai.api-key}")
private AIService gpt;

This makes OmniHai a natural fit not only for Jakarta EE runtimes, but also for MicroProfile runtimes such as Quarkus. On MicroProfile, secrets can live in microprofile-config.properties, environment variables, or any custom ConfigSource.

Under the Hood

Beyond the headline features, the 1.0 release includes:

  • DefaultAITextHandler and DefaultAIImageHandler replacing the previous abstract base classes, reducing boilerplate for custom providers
  • Improved Attachment model decoupled from OpenAI-specific assumptions
  • Comprehensive package-info Javadoc for all packages
  • Extensive unit tests (total 472, many generated with help of my assistant Claude) covering models, helpers, MIME detection, and expression resolvers
  • More integration tests (total 165), covering all text and image handling features of all 10 AI providers
  • Bug fixes and hardening based on those tests

Size

The library grew from about 70 KB in M1 to about 110 KB in M2 to about 155 KB in 1.0 final. Still at least 35x smaller than LangChain4J per provider module. The dependency story remains the same: only Jakarta JSON-P is required; CDI, EL, and MP Config are optional.

The Road Here

Three releases in roughly a month. M1 established the core API with 8 providers. M2 added chat streaming and custom handlers. This final release fills the remaining gaps for a production-ready library: structured outputs for type-safe responses, file attachments for multi-modal input, conversation memory for multi-turn interactions, and MicroProfile compatibility.

OmniHai is a sharp chef's knife: it does a few things very well. If you need RAG pipelines, agent frameworks, or vector stores, look at LangChain4J or Spring AI. If you need multi-provider chat, text analysis, and content moderation in Jakarta EE or MicroProfile with minimal dependencies, OmniHai is arguably the better choice.