OllamaApiFacade is an open-source library that lets you run your own .NET backend as an Ollama API, based on the Microsoft Semantic Kernel. Clients that expect an Ollama backend, such as Open WebUI, can then interact with your .NET backend directly. The library also supports Semantic Kernel connectors for local LLM/SLM services such as LM Studio and the AI Toolkit for Visual Studio Code, and it is easily extendable to additional interfaces.
- Seamless Ollama Backend Integration: OllamaApiFacade allows you to expose your .NET backend as a local Ollama API, making it compatible with Open WebUI solutions.
- Microsoft Semantic Kernel Support: Fully integrates with Semantic Kernel for building AI-based applications.
- Extensible: Easily add support for more interfaces through pull requests. Community contributions are highly encouraged!
- Custom Model Name Support: The library allows users to configure their own model names and backends.
You can install the OllamaApiFacade via NuGet:

dotnet add package OllamaApiFacade

Prerequisites:

- .NET 8.0 or later
- Microsoft Semantic Kernel
The following example demonstrates how to use the OllamaApiFacade with Microsoft Semantic Kernel and a local LLM/SLM service like LM Studio.
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;
using OllamaApiFacade.DemoWebApi.Plugins;
using OllamaApiFacade.Extensions;
var builder = WebApplication.CreateBuilder(args);
// Configure Ollama API to use a local URL
builder.ConfigureAsLocalOllamaApi();
builder.Services.AddKernel()
    .AddLmStudio() // Adds LM Studio as the local LLM/SLM service
    .Plugins.AddFromType<TimeInformationPlugin>(); // Adds custom Semantic Kernel plugin
var app = builder.Build();
// Map the POST API for chat interaction
app.MapPostApiChat(async (chatRequest, chatCompletionService, httpContext, kernel) =>
{
    var chatHistory = chatRequest.ToChatHistory();

    var promptExecutionSettings = new OpenAIPromptExecutionSettings
    {
        FunctionChoiceBehavior = FunctionChoiceBehavior.Auto()
    };

    await chatCompletionService.GetStreamingChatMessageContentsAsync(chatHistory, promptExecutionSettings, kernel)
        .StreamToResponseAsync(httpContext.Response);
});
app.Run();

As an example, you can now run Open WebUI with Docker after setting up your backend. To do so, simply use the following Docker command:
docker run -d -p 8080:8080 --add-host=host.docker.internal:host-gateway --name open-webui ghcr.io/open-webui/open-webui:main

This command will start Open WebUI and make it accessible locally at http://localhost:8080. The --add-host=host.docker.internal:host-gateway flag is used to allow communication between the Docker container and your host machine.
For more detailed information on how to set up Open WebUI with Docker, including advanced configurations such as GPU support, please refer to the official Open WebUI GitHub repository.
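The Getting Started example above registers a TimeInformationPlugin from the demo project; its implementation is not shown here. As an illustration, a minimal Semantic Kernel plugin of that kind could look like the following sketch (the class and method bodies are assumptions, not the actual demo code):

using System.ComponentModel;
using Microsoft.SemanticKernel;

namespace OllamaApiFacade.DemoWebApi.Plugins;

// Hypothetical sketch: any class exposing [KernelFunction] methods can be
// registered with Plugins.AddFromType<T>() as shown in the example above.
public class TimeInformationPlugin
{
    [KernelFunction]
    [Description("Returns the current date and time in UTC.")]
    public string GetCurrentUtcTime() => DateTime.UtcNow.ToString("R");
}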
If you want to specify your own model names instead of relying on the default configuration, you can do this by using MapOllamaBackendFacade:
var builder = WebApplication.CreateBuilder(args).ConfigureAsLocalOllamaApi();
var app = builder.Build();
// Map the Ollama backend with a custom model name
app.MapOllamaBackendFacade("mymodelname");
// Map the POST API for chat interaction
app.MapPostApiChat(async (chatRequest, chatCompletionService) =>
{
    // Your custom logic here...
});

app.Run();

The ConfigureAsLocalOllamaApi() method automatically configures the backend to run on the URL http://localhost:11434, which is commonly used by Ollama. However, if you prefer to configure your own URL settings, you can do so by modifying the launchSettings.json file. In such cases, using ConfigureAsLocalOllamaApi() is not necessary, as your custom settings will take precedence.
To modify the default URL, you can simply update the launchSettings.json file in your project as shown below:
{
  "$schema": "http://json.schemastore.org/launchsettings.json",
  "profiles": {
    "http": {
      "commandName": "Project",
      "dotnetRunMessages": true,
      "launchBrowser": false,
      "launchUrl": "http://localhost:8080",
      "applicationUrl": "http://localhost:11434",
      "environmentVariables": {
        "ASPNETCORE_ENVIRONMENT": "Development"
      }
    }
  }
}

By adjusting the applicationUrl, you can set your own custom port, and the ConfigureAsLocalOllamaApi() method will no longer be required in the code.
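For illustration, once the applicationUrl is set in launchSettings.json, a Program.cs can omit ConfigureAsLocalOllamaApi() entirely. The following is a minimal sketch that reuses the registrations shown in the Getting Started example; adjust it to your own services:

using Microsoft.SemanticKernel;
using OllamaApiFacade.Extensions;

var builder = WebApplication.CreateBuilder(args); // URL now comes from launchSettings.json

builder.Services.AddKernel()
    .AddLmStudio();

var app = builder.Build();

app.MapPostApiChat(async (chatRequest, chatCompletionService, httpContext) =>
{
    var chatHistory = chatRequest.ToChatHistory();

    await chatCompletionService.GetStreamingChatMessageContentsAsync(chatHistory)
        .StreamToResponseAsync(httpContext.Response);
});

app.Run();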
The OllamaApiFacade allows you to convert incoming messages from the Ollama format into Semantic Kernel data classes, such as using the .ToChatHistory() method to transform a chat request into a format that can be processed by the Semantic Kernel. Responses from the Semantic Kernel can then be converted back into the Ollama format using methods like .StreamToResponseAsync() or .ToChatResponse(), enabling seamless communication between the two systems.
Here's an example of how incoming messages are processed and transformed, with a response sent back in the Ollama format:
app.MapPostApiChat(async (chatRequest, chatCompletionService, httpContext) =>
{
    var chatHistory = chatRequest.ToChatHistory();
    var messages = await chatCompletionService.GetChatMessageContentsAsync(chatHistory);
    var chatResponse = messages.First().ToChatResponse();
    await chatResponse.StreamToResponseAsync(httpContext.Response);
});

In this example:
- The chatRequest is transformed into a chatHistory object using the .ToChatHistory() method, making it compatible with the Semantic Kernel.
- The Semantic Kernel processes the chatHistory and retrieves the chat messages.
- The first message in the list is converted back into an Ollama-compatible format using .ToChatResponse().
- Finally, the response is serialized into JSON format and streamed back to the client with .StreamToResponseAsync().
This ensures seamless communication between the Ollama client and the Semantic Kernel backend, allowing for the integration of advanced AI-driven interactions.
In this API, responses are typically expected to be streamed back to the client. To facilitate this, the StreamToResponseAsync() method is available, which handles the streaming of responses seamlessly. This method automatically supports a variety of data types from the Semantic Kernel, as well as direct ChatResponse types from Ollama. It ensures that the appropriate format is returned to the client, whether you're working with Semantic Kernel-generated content or directly with Ollama responses.
This method simplifies the process of returning streamed responses, making the interaction between the client and backend smooth and efficient.
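As an illustration, both calls below (taken from the examples in this document, used inside a MapPostApiChat handler) end up writing an Ollama-compatible stream to the response:

// 1. Streaming Semantic Kernel content directly to the client
await chatCompletionService.GetStreamingChatMessageContentsAsync(chatHistory)
    .StreamToResponseAsync(httpContext.Response);

// 2. Converting a non-streaming result to an Ollama ChatResponse first
var messages = await chatCompletionService.GetChatMessageContentsAsync(chatHistory);
await messages.First().ToChatResponse().StreamToResponseAsync(httpContext.Response);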
To inspect HTTP traffic between your application and language model APIs, you can use the AddProxyForDebug extension method. This is particularly useful for debugging with tools like Burp Suite Community Edition or OWASP ZAP.
The simplest way to enable proxy debugging is:
builder.Services.AddProxyForDebug();

Default behavior:
- Routes all HTTP/HTTPS traffic through http://127.0.0.1:8080
- Automatically excludes port 6334 (commonly used by vector databases like Qdrant)
- Automatically detects and excludes gRPC traffic (ports 5001, 5051, 50051, 9090)
- Bypasses SSL certificate validation (required for Burp Suite interception)
// Use a different proxy port and exclude additional ports
builder.Services.AddProxyForDebug(
    proxyUrl: "http://127.0.0.1:8888",
    excludedPorts: new[] { 6334, 5001, 5051 }
);

For complete control, use the configuration action:
builder.Services.AddProxyForDebug(options =>
{
    // Proxy URL (default: http://127.0.0.1:8080)
    options.ProxyUrl = "http://192.168.1.100:8080";

    // Exclude specific ports from proxy (default: 6334)
    options.ExcludedPorts = new[] { 6334, 5001, 5051, 443 };

    // Exclude specific hosts from proxy
    options.ExcludedHosts = new[] { "localhost", "grpc-backend.internal" };

    // Auto-detect and exclude gRPC traffic (default: true)
    options.AutoExcludeGrpc = true;

    // Ignore SSL certificate errors for Burp Suite (default: true)
    options.IgnoreSslErrors = true;

    // Set environment proxy variables (default: true)
    options.SetEnvironmentProxyVariables = true;
});

You can also apply proxy settings to individual HttpClient instances:
builder.Services
    .AddHttpClient<MyApiService>()
    .AddProxyForDebug(
        proxyUrl: "http://127.0.0.1:8080",
        excludedPorts: new[] { 6334 }
    );

The proxy automatically detects and bypasses gRPC traffic based on:
- Common gRPC ports: 5001, 5051, 50051, 9090
- Host patterns: Hosts containing "grpc" (e.g., grpc.service.local)
- URL patterns: Paths starting with /grpc. or containing .proto
This ensures that gRPC communication (which doesn't work well through HTTP proxies) is automatically excluded while REST API calls are routed through your debugging proxy.
// Port 6334 (Qdrant) is automatically excluded
builder.Services.AddProxyForDebug();

// gRPC traffic automatically bypasses the proxy
builder.Services.AddProxyForDebug(options =>
{
    options.ExcludedPorts = new[] { 6334, 5001 }; // Qdrant + custom gRPC
});

builder.Services.AddProxyForDebug("http://127.0.0.1:8888");

Avoid using AddProxyForDebug in production environments. Consider wrapping it in conditional compilation:

#if DEBUG
builder.Services.AddProxyForDebug();
#endif

To set up Burp Suite Community Edition:

- Download Burp Suite Community Edition
- Configure Proxy Listener: Go to Proxy → Options → Proxy Listeners
  - Default: 127.0.0.1:8080
  - If port 8080 is occupied, use a different port (e.g., 8888)
- Start Interception: Go to Proxy → Intercept → Enable/Disable interception
- View HTTP History: Go to Proxy → HTTP History to see all requests and responses
Using this setup, you can inspect all HTTP/HTTPS communication between your application and language model APIs, making debugging significantly easier.
We encourage the community to contribute to this project! If there are additional features, interfaces, or improvements you would like to see, feel free to submit a pull request. Contributions of any kind are highly appreciated.
- Fork the repository on GitHub.
- Create a new branch for your feature or bugfix.
- Commit your changes and push the branch to GitHub.
- Submit a pull request, and we will review it as soon as possible.
This project is licensed under the MIT License.
MIT License
Copyright (c) 2024 - 2026 Gregor Biswanger - Microsoft MVP for Azure AI and Web App Development
For any questions or further details, feel free to open an issue on GitHub or reach out directly.
