EphemerAl

EphemerAl: A Simple Self-Hosted Chat Interface for Local AI with Ollama that Accepts Documents and Images

EphemerAl is a lightweight, open-source web interface for interacting with Google’s Gemma 3 LLM locally on your hardware. I designed it for my day job to help keep our team’s sensitive info off cloud services, and to provide a modern AI experience to staff without the per-user cost required to achieve equivalent capabilities online. All responses are generated by Gemma 3 (12b or 27b), and enhanced by any documents, images, or queries you attach during a conversation. Gemma 3 4b will work if you’re hardware limited and want to try it out, but I found it was more Ai than AI.

While it wasn’t built for broad distribution, I’m sharing this generalized version in case it helps others looking for a local-only, account-free, multimodal LLM interface. . . whether to provide an operational tool, a staff learning environment, or bragging rights when friends visit on your home network.

View the full source code on GitHub

A screenshot of EphemerAl, a Docker-based self-hosted AI assistant for local LLM document Q&A and image analysis using Ollama

Core Features

This tool offers a straightforward set of capabilities to facilitate interaction with local LLM models.

Local AI Interaction: Engage in real-time conversations powered by Google’s Gemma 3 model (12B or 27B variants) through Ollama, supporting tasks such as question answering and idea generation without requiring an internet connection.
Document and Image Uploads: Submit files including PDFs, documents, and spreadsheets (handling over 100 formats via Apache Tika), allowing the model to incorporate user content into its responses.
Multimodal Functionality: Leverage Gemma 3’s ability to process images in conjunction with text, enabling analysis of visual elements like diagrams or photographs.
Customizable Interface: Incorporate a personal logo or adjust the appearance if desired using Streamlit, maintaining a clean and user-friendly design.
Ephemeral Nature: Conversations are not retained after a refresh or new session, which prevents sensitive information from accumulating on the system (as it would if chat histories were retained).
Simplified Deployment: The solution utilizes Docker Compose for containerized setup, making it accessible even for those new to such environments.

Technical Stack

EphemerAl leverages a straightforward application stack to ensure reliability and ease of use.

Frontend Framework: Python 3.11 with Streamlit for the web-based chat interface.
AI Backend: Ollama to manage Gemma 3 models, optimized for NVIDIA GPU acceleration.
File Processing: Apache Tika server for extracting text from uploaded documents.
Containerization: Docker and Docker Compose for isolated deployment.
Dependencies: Essential libraries such as requests, pytz, tika, openai client, and pillow.

System Requirements

To run this interface effectively, the following specifications are recommended.

Operating System: Windows 11 Pro or Enterprise, fully updated. WSL will be installed as part of setup (if not already present).
Graphics Processing Unit: One or more discrete NVIDIA GPU(s), preferably from the 30-series or later, with at least 12GB VRAM for the 12B model, 24GB+ suggested for the 27B model.
Nvidia Driver: The most recent WHQL-certified NVIDIA GPU driver. Optional components may be omitted.
Additional Note: If available, connect display to integrated graphics to allocate more VRAM to the NVIDIA GPU(s).

Deployment

Refer to the System Deployment Guide for detailed, step-by-step instructions, which include copy-paste commands suitable for beginners.

For automatic startup, utilize the provided PowerShell script.

Accessing the EphemerAl website

Local Access: Navigate to http://localhost:8501
Network Access: Use http://windows_host_ip_address:8501

Stopping the Application

Execute the following in an Administrator PowerShell window:

wsl --shutdown

To restart, either run wsl or reboot the system if you have the startup script installed.

Known Issues

These may be addressed if needed in the future.

Extended input text goes under the submission arrow for a few characters until line wrapping occurs.
There is no automatic enforcement of context length limits. Gemma 3 accommodates substantial ctx, so gracefully handling this has been deferred to see if it occurs.

Support

This project is provided as a resource for the community as-is. I hope it solves a problem or provides value outside my environment.

If you run into issues, consider submitting error details, including screenshots and system files, to an AI assistant for guidance. This isn’t meant to be snark, it’s amazing how well the big reasoning models can troubleshoot.

License:

MIT - (At least the parts of this stack that are mine to license)

This site is open source. Improve this page.