EphemerAl is a lightweight, open-source web interface (user-facing brand: EphemerAI) for interacting with local LLMs on your hardware via Ollama. I designed it for my day job to help keep our team’s sensitive info off cloud services, and to provide a modern AI experience to staff without the per-user cost required to achieve equivalent capabilities online. The repository now targets Qwen3.6-35B-A3B through a stable local alias (ephemeral-default) that you create from qwen3.6:35b-a3b, and can still be retargeted to other models by changing one environment variable (LLM_MODEL_NAME).
While it wasn’t built for broad distribution, I’m sharing this generalized version in case it helps others looking for a local-only, account-free, multimodal LLM interface, whether as an operational tool, a staff learning environment, or bragging rights when friends visit on your home network.
View the source code on GitHub

Base model tag: qwen3.6:35b-a3b
Local alias: ephemeral-default

Using a stable local alias is intentional: it lets you pin runtime defaults (like context and generation parameters) in one place while the app consistently calls ephemeral-default.
num_ctx 262144.
reasoning_effort="none".
max_tokens unless you explicitly configure LLM_MAX_TOKENS.
LLM_OUTPUT_RESERVE_TOKENS keeps response headroom while loading documents into context; it is not an output cap.
LLM_REQUEST_TIMEOUT_S=1800 and LLM_MAX_RETRIES=0 for local long-running inference without hidden retries.

Set LLM_MODEL_NAME to any available Ollama model tag or local alias:
docker-compose.yml

The app performs model capability/context detection at runtime via Ollama (/api/show) so behavior remains adaptive across models.
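As a rough illustration of what runtime detection through /api/show can look like, here is a minimal Python sketch. It is not the app's actual code. It assumes the response carries a `parameters` text block, a `model_info` mapping with an architecture-prefixed `context_length` key, and a `capabilities` list, which is how I understand current Ollama responses; check your Ollama version. The helper names, the `OLLAMA_URL` variable, and the 8192 fallback are hypothetical. The last helper shows the headroom arithmetic that LLM_OUTPUT_RESERVE_TOKENS implies: the reserve is subtracted from the space available for documents and does not limit output length.

```python
import json
import os
import urllib.request

# Hypothetical variable name; point it at your Ollama host.
OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434")


def fetch_show(model: str, base_url: str = OLLAMA_URL) -> dict:
    """POST to /api/show and return the decoded JSON body."""
    req = urllib.request.Request(
        f"{base_url}/api/show",
        data=json.dumps({"model": model}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def context_length(show: dict, fallback: int = 8192) -> int:
    """Prefer a num_ctx pinned in the alias, then the model's native limit."""
    # Assumed shape: "parameters" is text with one "name value" pair per line.
    for line in (show.get("parameters") or "").splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "num_ctx" and parts[1].isdigit():
            return int(parts[1])
    # Assumed shape: "model_info" has a key such as "<arch>.context_length".
    for key, value in (show.get("model_info") or {}).items():
        if key.endswith(".context_length") and isinstance(value, int):
            return value
    return fallback  # arbitrary illustrative default


def has_vision(show: dict) -> bool:
    """True when the model advertises image input."""
    return "vision" in (show.get("capabilities") or [])


def document_budget(ctx: int, prompt_tokens: int, reserve: int) -> int:
    """Tokens left for documents after the prompt and the response headroom."""
    return max(0, ctx - prompt_tokens - reserve)
```

For example, `context_length(fetch_show("ephemeral-default"))` would report the alias's pinned context if the fields are present, and `document_budget` would then tell you how much of that window is safe to fill with uploaded files.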
EphemerAl is designed to minimize data retention:
st.session_state) for your browser session and are not persisted to disk by this app.
/tmp, so temporary files stay in RAM.

Note that browser caching behavior depends on your browser settings and cache-control headers. For maximum privacy on shared machines, use private/incognito browsing or clear browser data after use.
If you enable a shared Ollama API backend, requests made directly to Ollama bypass the EphemerAl UI/session layer; privacy and logging behavior for those requests depends on the external client and Ollama deployment settings, not EphemerAl session behavior.
EphemerAl is designed for trusted local networks (home, office LAN) and does not implement authentication or transport encryption. The Streamlit container disables CORS and XSRF protection to allow straightforward LAN access. Do not expose this application to the public internet without adding a reverse proxy with authentication and TLS.
ollama/ollama:0.21.0
apache/tika:3.3.0.0-full

Qwen3.6-35B-A3B is a large model and should be planned like one.
ollama ps after deployment.

If this model is too heavy for your machine, retarget LLM_MODEL_NAME to a smaller local model.
To run this interface effectively, the following specifications are recommended.
Use the step-by-step guide:
If your current stack still points at any older model tag, recreate or update your local ephemeral-default alias to point to qwen3.6:35b-a3b using the deployment guide, then confirm docker-compose.yml uses LLM_MODEL_NAME=ephemeral-default.
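If you would rather script the last check than read the file by eye, a small Python sketch along these lines can report which model name the compose file sets. It is an illustration only: the `configured_model` helper is hypothetical, it uses a plain-text regex, not a YAML parser, and it returns `${...}` interpolations verbatim without resolving them.

```python
import re

# Matches both compose styles:
#   - LLM_MODEL_NAME=ephemeral-default      (list form)
#   LLM_MODEL_NAME: "ephemeral-default"     (mapping form)
# Lines commented out with '#' do not match.
PATTERN = re.compile(
    r"^\s*-?\s*[\"']?LLM_MODEL_NAME\s*[=:]\s*[\"']?([^\"'\s#]+)",
    re.MULTILINE,
)


def configured_model(compose_text: str):
    """Return the first LLM_MODEL_NAME value found, or None if absent."""
    match = PATTERN.search(compose_text)
    return match.group(1) if match else None
```

To use it, read the file with `pathlib.Path("docker-compose.yml").read_text()` and compare the result against `"ephemeral-default"`.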
OLLAMA_NO_CLOUD=1) to keep this deployment local-only/privacy-aligned.
docker-compose.api.yml override to expose port 11434.
model: "ephemeral-default"
reasoning_effort: "none"
temperature: 0.7
top_p: 0.8
presence_penalty: 1.5
http://localhost:8501
http://<host-ip>:8501

After deployment (or after UI updates), run this quick manual checklist in a browser:
Execute the following in an Administrator PowerShell window:
wsl --shutdown
To restart, either run wsl or reboot the system if you have the startup script installed.
This project is provided as a resource for the community as-is. I hope it solves a problem or provides value outside my environment.
If you run into issues, consider submitting error details, including screenshots and system files, to an AI assistant for guidance. This isn’t meant to be snark; it’s amazing how well the big reasoning models can troubleshoot.
License:
MIT - (At least the parts of this stack that are mine to license)