Long-conversation summarization pipeline
Focus of the day
The day was dedicated to defining a practical strategy for summarizing very long texts and conversations in a way that can be used inside an enterprise AI and document-assistance stack. The work went beyond model comparison: it focused on shaping a realistic, cost-aware approach compatible with durable memory management.
Model selection by role
A first round of decisions clarified which models fit which purpose:
- For high-quality synthesis on very large inputs:
  - Gemini 2.5 Pro for its large context window
  - GPT-5.2 Thinking / GPT-5.4 for faithful, well-structured synthesis
  - Claude Opus 4.6 for readability and nuance
- For low-cost extraction workflows:
  - Gemini 2.5 Flash-Lite as the leanest option
  - Gemini 2.5 Flash as a cost/performance compromise
  - DeepSeek V3.x and Qwen 2.5 72B as credible alternatives for structured chunk-based extraction
Architectural decision
The key decision of the day was to avoid a single monolithic summary pass and instead adopt a staged pipeline:
- chunking
- structured extraction per chunk
- merge and deduplication
- final consolidation
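The staged pipeline above can be sketched as two helpers: a chunker with overlap and a merge step that deduplicates per-chunk extractions before final consolidation. This is a minimal illustration, not the actual implementation; the chunk sizes, overlap, and dict-of-lists extraction shape are assumptions for the example.

```python
def chunk_text(text, max_chars=8000, overlap=400):
    """Split a long text into overlapping chunks (sizes are illustrative)."""
    chunks, start = [], 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # overlap preserves context across chunk edges
    return chunks

def merge_extractions(extractions):
    """Merge per-chunk extraction dicts, dropping duplicate items per category."""
    merged = {}
    for ext in extractions:
        for category, items in ext.items():
            seen = merged.setdefault(category, [])
            for item in items:
                if item not in seen:
                    seen.append(item)
    return merged
```

A final consolidation pass would then run once over the merged structure, rather than over the full raw transcript.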
Prompting was reframed accordingly: ask the model for key-idea extraction rather than narrative summarization. The target output includes categories such as main ideas, decisions, constraints, key facts, action items, and open questions, using strict JSON output.
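One way to make the key-idea reframing concrete is a prompt template plus a validator that rejects replies deviating from the expected schema. The categories come from the list above; the exact prompt wording and key names are illustrative assumptions.

```python
import json

EXTRACTION_PROMPT = """Extract the key ideas from the conversation chunk below.
Return ONLY a JSON object with these keys, each a list of short strings:
"main_ideas", "decisions", "constraints", "key_facts", "action_items", "open_questions".
Do not write narrative prose.

Chunk:
{chunk}
"""

EXPECTED_KEYS = {"main_ideas", "decisions", "constraints",
                 "key_facts", "action_items", "open_questions"}

def parse_extraction(raw: str) -> dict:
    """Validate the model's JSON reply; fail loudly on schema drift."""
    data = json.loads(raw)
    missing = EXPECTED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return {k: list(data[k]) for k in EXPECTED_KEYS}
```

Failing loudly here matters: a silently malformed extraction would poison the merge and consolidation stages downstream.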
Cost control and useful memory
A second major track focused on limiting the cost of continuous memory extraction:
- do not summarize every exchange by default;
- filter messages first and retain only durable information;
- prioritize user messages over assistant responses;
- trigger memory extraction only on significant events;
- separate raw recent context, compact durable memory, and full archives accessible through RAG.
The target pipeline therefore becomes: local rules → candidate fragments → deduplication → minimal LLM call → atomic storage.
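The local-rules stages of that pipeline can be sketched without any LLM call at all: a rule filter that keeps only durable-looking user messages, a hash-based deduplicator, and an atomic write (temp file plus rename) for storage. The marker keywords and record format are placeholders; only the deduplicated survivors would go to the minimal LLM call.

```python
import hashlib
import json
import os
import tempfile

# Illustrative markers of durable information; real rules would be richer.
DURABLE_MARKERS = ("decide", "deadline", "prefer", "always", "never")

def is_durable_candidate(message: dict) -> bool:
    """Local rule: keep user messages that look durable; skip the rest."""
    if message.get("role") != "user":
        return False
    text = message.get("content", "").lower()
    return any(marker in text for marker in DURABLE_MARKERS)

def dedup(fragments):
    """Drop fragments already seen (normalized hash, order-preserving)."""
    seen, unique = set(), []
    for frag in fragments:
        key = hashlib.sha256(frag.strip().lower().encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(frag)
    return unique

def store_atomically(path: str, records: list) -> None:
    """Write to a temp file then rename, so readers never see partial data."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(records, f)
    os.replace(tmp, path)  # atomic on POSIX when src and dst share a filesystem
```

Because filtering and deduplication happen before any model call, the LLM only sees the small set of candidate fragments, which is what keeps continuous memory extraction cheap.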
Broader continuity
This work directly extends the monthly effort around OpenWebUI and the sovereign document stack: the goal is no longer only to have an interface or a RAG layer, but to make the extraction, consolidation, and reuse of useful conversational knowledge reliable. At the yearly scale, it strengthens the build-out of a sovereign enterprise AI foundation able to turn raw exchanges into reusable knowledge without uncontrolled cost growth or loss of architectural control.