Teachers need up-to-date averages immediately after grades are published or modified, without waiting for a nightly batch. The system recalculates them via synchronous Domain Events: evaluation statistics (min/max/mean/median), weighted subject averages (normalized to /20), and each student's overall average. Results are stored in denormalized tables with a Redis cache (TTL 5 min). Three API endpoints expose the data with role-based access control. A console command backfills historical data at deployment.
# Semantic Splitting Strategy

When the source content is large (exceeds ~15,000 tokens) or a `token_budget` requires it, split the distillate into semantically coherent sections rather than at arbitrary size breaks.
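The trigger condition can be expressed as a small predicate. This is a minimal sketch under stated assumptions: `estimate_tokens` uses a rough 4-characters-per-token heuristic instead of a real tokenizer, and "a `token_budget` requires it" is read as "a single distillate would not fit in the budget". All function names are hypothetical.

```python
from typing import Optional

# "~15,000 tokens" from the strategy text; treat it as a soft threshold.
SPLIT_THRESHOLD_TOKENS = 15_000


def estimate_tokens(text: str) -> int:
    """Rough token estimate (assumption: ~4 characters per token, no real tokenizer)."""
    return len(text) // 4


def needs_split(source_text: str,
                projected_distillate_tokens: int,
                token_budget: Optional[int] = None) -> bool:
    """Return True when the distillate should be split into semantic sections."""
    # Large source content triggers splitting on its own.
    if estimate_tokens(source_text) > SPLIT_THRESHOLD_TOKENS:
        return True
    # Hypothetical reading of "a token_budget requires it": one distillate
    # would not fit inside the budget.
    return token_budget is not None and projected_distillate_tokens > token_budget
```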
## Why Semantic Over Size-Based

Arbitrary splits (every N tokens) break coherence: a downstream workflow that loads "part 2 of 4" gets fragments stripped of their context. Semantic splits produce self-contained topic clusters that a workflow can load selectively ("give me just the technical decisions section"), which is more useful and more token-efficient for the consumer.
## Splitting Process
### 1. Identify Natural Boundaries

After the initial extraction and deduplication (Steps 1-2 of the compression process), look for natural semantic boundaries:

- Distinct problem domains or functional areas
- Different stakeholder perspectives (users, technical, business)
- Temporal boundaries (current state vs future vision)
- Scope boundaries (in-scope vs out-of-scope vs deferred)
- Phase boundaries (analysis, design, implementation)

Choose boundaries that produce sections a downstream workflow might load independently.
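Choosing boundaries is an editorial judgment, but if extraction has tagged each item with candidate topic labels (an assumption; the process does not prescribe labels), a mechanical first pass can keep only the labels that gather enough items to stand alone as a section. A minimal sketch; `pick_boundaries` and `min_items` are hypothetical names:

```python
from collections import Counter
from typing import List


def pick_boundaries(item_topics: List[List[str]], min_items: int = 2) -> List[str]:
    """Return topic labels worth turning into sections, in first-seen order.

    item_topics holds, for each extracted item, the (hypothetical) topic
    labels assigned during extraction, e.g. ["technical", "scope"].
    """
    # Counter keeps first-insertion order, so sections follow document order.
    counts = Counter(topic for topics in item_topics for topic in topics)
    # A label with too few items would not be worth loading on its own.
    return [topic for topic, n in counts.items() if n >= min_items]
```

The threshold is only a heuristic for "might load independently"; a reviewer should still merge or rename the proposed sections.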
### 2. Assign Items to Sections

Assign each extracted item to its most relevant section. Items that span multiple sections go in the root distillate.

Routing rules for items that could belong to more than one section:

- Constraints that affect all areas → root distillate
- Decisions with broad impact → root distillate
- Section-specific decisions → section distillate
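The routing rules reduce to a simple dispatch. This sketch assumes each item carries topic labels from a hypothetical tagging step and treats an item matching exactly one section as section-specific; an item matching several sections, or none, is treated as cross-cutting and routed to the root distillate. The zero-match case is an assumption, since the rules above do not address it.

```python
from typing import Dict, List

ROOT = "_index"  # key for the root distillate (hypothetical naming)


def assign_items(items: Dict[str, List[str]], sections: List[str]) -> Dict[str, List[str]]:
    """Map each section name (plus ROOT) to the item texts assigned to it.

    items maps an item's text to its topic labels.
    """
    assignment: Dict[str, List[str]] = {name: [] for name in [ROOT, *sections]}
    for text, topics in items.items():
        # Deduplicate labels while keeping their order.
        matches = list(dict.fromkeys(t for t in topics if t in sections))
        if len(matches) == 1:
            # Section-specific: belongs to exactly one section.
            assignment[matches[0]].append(text)
        else:
            # Spans several sections (or none): cross-cutting, so the root holds it.
            assignment[ROOT].append(text)
    return assignment
```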
### 3. Produce Root Distillate

The root distillate contains:

- **Orientation** (3-5 bullets): what was distilled, from which sources, for which consumer, and how many sections
- **Cross-references**: the list of section distillates, each with a 1-line description
- **Cross-cutting items**: facts, decisions, and constraints that span multiple sections
- **Scope summary**: high-level in/out/deferred scope, if applicable
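One way to assemble `_index.md` from its four parts is a small renderer. The heading names and bullet layout are assumptions for illustration (the strategy fixes the contents, not the markup), and `render_index` is a hypothetical helper; it enforces the 3-5 bullet limit on the orientation block.

```python
from typing import Dict, List, Optional, Tuple


def render_index(orientation: List[str],
                 sections: List[Tuple[str, str]],
                 cross_cutting: List[str],
                 scope: Optional[Dict[str, str]] = None) -> str:
    """Render the root distillate as Markdown.

    sections holds (file name, one-line description) pairs;
    scope optionally maps "in"/"out"/"deferred" to a short summary.
    """
    if not 3 <= len(orientation) <= 5:
        raise ValueError("orientation should have 3-5 bullets")
    lines = ["## Orientation", *(f"- {o}" for o in orientation), ""]
    lines += ["## Sections", *(f"- `{name}`: {desc}" for name, desc in sections), ""]
    lines += ["## Cross-cutting items", *(f"- {item}" for item in cross_cutting)]
    if scope:  # the scope summary is optional ("if applicable")
        lines += ["", "## Scope", *(f"- {k}: {v}" for k, v in scope.items())]
    return "\n".join(lines) + "\n"
```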
### 4. Produce Section Distillates

Each section distillate must be self-sufficient: a reader who loads only one section should understand it without the others.

Each section includes:

- **Context header** (1 line): "This section covers [topic]. Part N of M from [source document names]."
- **Section content**: thematically grouped bullets following the same compression rules as a single distillate
- **Cross-references** (if needed): pointers to other sections with related content
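A section file can be rendered the same way: context header first, then the content bullets, then optional pointers. The header string follows the template given above; `render_section` and its parameters are hypothetical names, and the "See also" line is an assumed format for cross-references.

```python
from typing import List, Sequence


def context_header(topic: str, part: int, total: int, sources: Sequence[str]) -> str:
    """Build the one-line header: 'This section covers ... Part N of M from ...'."""
    if not 1 <= part <= total:
        raise ValueError("part must be between 1 and total")
    return f"This section covers {topic}. Part {part} of {total} from {', '.join(sources)}."


def render_section(topic: str, part: int, total: int, sources: Sequence[str],
                   bullets: List[str], see_also: Sequence[str] = ()) -> str:
    """Render one self-contained section distillate as Markdown."""
    lines = [context_header(topic, part, total, sources), ""]
    lines += [f"- {b}" for b in bullets]
    if see_also:  # cross-references only when related content lives elsewhere
        lines += ["", "See also: " + ", ".join(f"`{name}`" for name in see_also)]
    return "\n".join(lines) + "\n"
```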
### 5. Output Structure

Create a folder `{base-name}-distillate/` containing:

```
{base-name}-distillate/
├── _index.md           # Root distillate: orientation, cross-cutting items, section manifest
├── 01-{topic-slug}.md  # Self-contained section
├── 02-{topic-slug}.md
└── 03-{topic-slug}.md
```

Example:

```
product-brief-distillate/
├── _index.md
├── 01-problem-solution.md
├── 02-technical-decisions.md
└── 03-users-market.md
```
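Writing this layout to disk is mechanical. A sketch using only the standard library, assuming the Markdown for the index and each section has already been rendered; `write_distillate` is a hypothetical helper, and the two-digit prefix mirrors the `01-`, `02-` naming shown in the tree.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def write_distillate(out_dir: Path, base_name: str, index_md: str,
                     sections: List[Tuple[str, str]]) -> Path:
    """Create {base_name}-distillate/ with _index.md and numbered section files.

    sections is an ordered list of (topic slug, Markdown content) pairs.
    """
    folder = out_dir / f"{base_name}-distillate"
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "_index.md").write_text(index_md, encoding="utf-8")
    for number, (slug, content) in enumerate(sections, start=1):
        # Zero-padded prefix keeps files sorted in reading order (01-, 02-, ...).
        (folder / f"{number:02d}-{slug}.md").write_text(content, encoding="utf-8")
    return folder


if __name__ == "__main__":
    # Demo in a throwaway directory, reproducing the example tree.
    with tempfile.TemporaryDirectory() as tmp:
        folder = write_distillate(Path(tmp), "product-brief", "# Index\n",
                                  [("problem-solution", "..."),
                                   ("technical-decisions", "..."),
                                   ("users-market", "...")])
        for path in sorted(folder.iterdir()):
            print(path.name)
```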
## Size Targets

When a `token_budget` is specified:

- Root distillate: ~20% of the budget (orientation + cross-cutting items)
- Remaining budget: split proportionally across sections based on content density
- If a section exceeds its proportional share, compress it more aggressively or sub-split it

When no `token_budget` is given but splitting is needed:

- Aim for sections of 3,000-5,000 tokens each
- Keep the root distillate as small as possible while remaining useful standalone
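The size rules are simple arithmetic. A sketch, assuming "content density" is measured as an integer weight per section (for example, its extracted-item count; the text leaves the measure open); `allocate_budget`, `over_share`, and `sections_needed` are hypothetical helpers. Integer division can leave a few tokens unassigned, and this sketch hands them to the root distillate.

```python
import math
from typing import Dict

ROOT_PERCENT = 20  # "~20% of budget" for the root distillate


def allocate_budget(token_budget: int, density: Dict[str, int]) -> Dict[str, int]:
    """Split a token budget between the root ("_index") and the sections.

    density maps section name to a weight (assumption: e.g. extracted-item count).
    """
    total_weight = sum(density.values())
    if total_weight <= 0:
        raise ValueError("density weights must sum to a positive number")
    root = token_budget * ROOT_PERCENT // 100
    remaining = token_budget - root
    # Proportional shares, rounded down to whole tokens.
    shares = {name: remaining * w // total_weight for name, w in density.items()}
    # Rounding leftovers go to the root so the whole budget is used.
    shares["_index"] = token_budget - sum(shares.values())
    return shares


def over_share(section_tokens: int, share: int) -> bool:
    """True when a section must be compressed harder or sub-split."""
    return section_tokens > share


def sections_needed(content_tokens: int, max_section_tokens: int = 5_000) -> int:
    """Without a budget: fewest sections that keep each at or under the ceiling."""
    return max(1, math.ceil(content_tokens / max_section_tokens))
```

Rounding up in `sections_needed` keeps the average section under the 5,000-token ceiling; for content just above a multiple of 5,000 (say 5,100 tokens) the average drops below 3,000, and a single slightly oversized section may be the better call.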