Shubham Saboo, a senior AI product manager at Google, published an open-source “Always On Memory Agent” to the official Google Cloud Platform GitHub page this week under the permissive MIT License, which allows commercial use.
The project tackles a persistent problem in agent design: how to ingest information continuously, consolidate it in the background, and retrieve it later — without a conventional vector database.
The repo’s central claim is direct: “No vector database. No embeddings. Just an LLM that reads, thinks, and writes structured memory.”
According to the repository, the agent runs continuously, stores structured memories in SQLite, and performs scheduled memory consolidation every 30 minutes by default. It ingests text, image, audio, video and PDF files, and ships with a local HTTP API and a Streamlit dashboard.
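The ingest-then-consolidate loop can be sketched in a few lines of Python. The schema, column names, and consolidation logic below are illustrative assumptions, not the repo's actual implementation; the `summarize` callback stands in for the LLM call that reads raw memories and writes structured output.

```python
import json
import sqlite3
import time

CONSOLIDATION_INTERVAL_S = 30 * 60  # repo default: one pass every 30 minutes

def init_store(path=":memory:"):
    """Create a minimal structured-memory table (schema is illustrative)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS memories (
               id INTEGER PRIMARY KEY,
               created_at REAL,
               kind TEXT,                      -- e.g. 'note', 'fact', 'summary'
               content TEXT,                   -- JSON payload written by the LLM
               consolidated INTEGER DEFAULT 0
           )"""
    )
    return conn

def ingest(conn, kind, payload):
    """Append one structured memory row."""
    conn.execute(
        "INSERT INTO memories (created_at, kind, content) VALUES (?, ?, ?)",
        (time.time(), kind, json.dumps(payload)),
    )
    conn.commit()

def consolidate(conn, summarize):
    """Fold all unconsolidated raw rows into a single summary row."""
    rows = conn.execute(
        "SELECT id, content FROM memories WHERE consolidated = 0 AND kind != 'summary'"
    ).fetchall()
    if not rows:
        return 0
    summary = summarize([json.loads(content) for _, content in rows])
    ingest(conn, "summary", summary)
    conn.execute(
        "UPDATE memories SET consolidated = 1 WHERE id IN (%s)"
        % ",".join("?" * len(rows)),
        [row_id for row_id, _ in rows],
    )
    conn.commit()
    return len(rows)
```

In a long-running process, `consolidate` would be invoked on a timer every `CONSOLIDATION_INTERVAL_S` seconds; everything else is ordinary SQLite, which is the point of the design.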
What the Architecture Actually Does
The repo uses a multi-agent internal architecture, with specialist subagents handling ingestion, consolidation, and querying. The supplied materials do not claim this is a shared memory framework for multiple independent agents — it is a memory layer built with specialist subagents and persistent storage.
Dropping the traditional retrieval stack removes separate embedding pipelines, vector storage, indexing logic and synchronization work. The tradeoff shifts performance questions from vector search overhead to model latency, memory compaction logic and long-run behavioral stability.
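Under this design, “retrieval” can be as simple as serializing stored rows into the prompt and letting the model reason over them. A minimal sketch of that idea follows; the function name, the ordering assumption, and the character budget are illustrative, not taken from the repo.

```python
def build_query_prompt(memories, question, budget_chars=8000):
    """Serialize memories into the model's context, most relevant first
    (here: in the order given, e.g. summaries before raw notes), stopping
    at a rough character budget instead of doing any vector search."""
    lines, used = [], 0
    for m in memories:
        line = f"[{m['kind']}] {m['text']}"
        if used + len(line) > budget_chars:
            break
        lines.append(line)
        used += len(line)
    return "Known memories:\n" + "\n".join(lines) + f"\n\nQuestion: {question}"
```

There is no embedding step to keep in sync: the compaction pass decides what survives into the budget, and the model does the matching that a vector index would otherwise approximate.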
The system was built with Google’s Agent Development Kit (ADK), introduced in spring 2025, and Gemini 3.1 Flash-Lite, which Google released on March 3, 2026 as its fastest and most cost-efficient Gemini 3 series model.
Why Flash-Lite Makes the Economics Work
Gemini 3.1 Flash-Lite is priced at $0.25 per 1 million input tokens and $1.50 per 1 million output tokens. The company says it is 2.5 times faster than Gemini 2.5 Flash in time to first token and delivers a 45% increase in output speed while maintaining similar or better quality.
On published benchmarks, the model posts an Elo score of 1432 on Arena.ai, 86.9% on GPQA Diamond, and 76.8% on MMMU Pro.
A 24/7 service that periodically re-reads and consolidates memory needs predictable latency and low inference costs. Without that, “always on” becomes prohibitively expensive at production scale.
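The arithmetic behind that claim is easy to check. At the stated prices, a consolidation pass every 30 minutes stays well under a dollar a day; the per-pass token counts below are assumptions for illustration, not measurements from the repo.

```python
# Back-of-envelope cost of the always-on consolidation loop at the
# published Gemini 3.1 Flash-Lite prices.
INPUT_USD_PER_M = 0.25    # $ per 1M input tokens
OUTPUT_USD_PER_M = 1.50   # $ per 1M output tokens

runs_per_day = 24 * 60 // 30           # one consolidation pass every 30 minutes
in_tokens, out_tokens = 20_000, 2_000  # assumed tokens read / written per pass

cost_per_run = (in_tokens * INPUT_USD_PER_M + out_tokens * OUTPUT_USD_PER_M) / 1e6
daily_cost = runs_per_day * cost_per_run
print(f"{runs_per_day} runs/day -> ${daily_cost:.2f}/day")
```

Even a tenfold larger input budget per pass lands around $2.54 a day at these prices, which is the sense in which Flash-Lite economics make “always on” tractable.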
ADK is documented as model-agnostic and deployment-agnostic, with support for workflow agents, multi-agent systems, tools, evaluation, and deployment targets including Cloud Run and Vertex AI Agent Engine. That context makes the memory agent look less like a demo and more like a reference point for where agent infrastructure is heading.
For enterprise teams building support systems, research assistants, internal copilots and workflow automation, the repo matters less as a product than as a signal: persistent memory without vector databases is becoming an engineerable choice, not just a theoretical one. Governance questions follow as soon as memory stops being session-bound.
This article is a curated summary based on third-party sources.