Les enseignants ont besoin de moyennes à jour immédiatement après la publication ou modification des notes, sans attendre un batch nocturne. Le système recalcule via Domain Events synchrones : statistiques d'évaluation (min/max/moyenne/médiane), moyennes matières pondérées (normalisation /20), et moyenne générale par élève. Les résultats sont stockés dans des tables dénormalisées avec cache Redis (TTL 5 min). Trois endpoints API exposent les données avec contrôle d'accès par rôle. Une commande console permet le backfill des données historiques au déploiement.
355 lines
9.6 KiB
Markdown
355 lines
9.6 KiB
Markdown
# Quality Scan Script Opportunities — Reference Guide
|
|
|
|
**Reference: `references/script-standards.md` for script creation guidelines.**
|
|
|
|
This document identifies deterministic operations that should be offloaded from the LLM into scripts for quality validation of BMad agents.
|
|
|
|
---
|
|
|
|
## Core Principle
|
|
|
|
Scripts validate structure and syntax (deterministic). Prompts evaluate semantics and meaning (judgment). Create scripts for checks that have clear pass/fail criteria.
|
|
|
|
---
|
|
|
|
## How to Spot Script Opportunities
|
|
|
|
During build, walk through every capability/operation and apply these tests:
|
|
|
|
### The Determinism Test
|
|
|
|
For each operation the agent performs, ask:
|
|
|
|
- Given identical input, will this ALWAYS produce identical output? → Script
|
|
- Does this require interpreting meaning, tone, context, or ambiguity? → Prompt
|
|
- Could you write a unit test with expected output for every input? → Script
|
|
|
|
### The Judgment Boundary
|
|
|
|
Scripts handle: fetch, transform, validate, count, parse, compare, extract, format, check structure
|
|
Prompts handle: interpret, classify with ambiguity, create, decide with incomplete info, evaluate quality, synthesize meaning
|
|
|
|
### Pattern Recognition Checklist
|
|
|
|
Table of signal verbs/patterns mapping to script types:
|
|
| Signal Verb/Pattern | Script Type |
|
|
|---------------------|-------------|
|
|
| "validate", "check", "verify" | Validation script |
|
|
| "count", "tally", "aggregate", "sum" | Metric/counting script |
|
|
| "extract", "parse", "pull from" | Data extraction script |
|
|
| "convert", "transform", "format" | Transformation script |
|
|
| "compare", "diff", "match against" | Comparison script |
|
|
| "scan for", "find all", "list all" | Pattern scanning script |
|
|
| "check structure", "verify exists" | File structure checker |
|
|
| "against schema", "conforms to" | Schema validation script |
|
|
| "graph", "map dependencies" | Dependency analysis script |
|
|
|
|
### The Outside-the-Box Test
|
|
|
|
Beyond obvious validation, consider:
|
|
|
|
- Could any data gathering step be a script that returns structured JSON for the LLM to interpret?
|
|
- Could pre-processing reduce what the LLM needs to read?
|
|
- Could post-processing validate what the LLM produced?
|
|
- Could metric collection feed into LLM decision-making without the LLM doing the counting?
|
|
|
|
### Your Toolbox
|
|
|
|
**Python is the default** for all script logic (cross-platform: macOS, Linux, Windows/WSL). See `references/script-standards.md` for full rationale and safe bash commands.
|
|
|
|
- **Python:** Standard library (`json`, `pathlib`, `re`, `argparse`, `collections`, `difflib`, `ast`, `csv`, `xml`, etc.) plus PEP 723 inline-declared dependencies (`tiktoken`, `jsonschema`, `pyyaml`, etc.)
|
|
- **Safe shell commands:** `git`, `gh`, `uv run`, `npm`/`npx`/`pnpm`, `mkdir -p`
|
|
- **Avoid bash for logic** — no piping, `jq`, `grep`, `sed`, `awk`, `find`, `diff`, `wc` in scripts. Use Python equivalents instead.
|
|
|
|
If you can express the logic as deterministic code, it's a script candidate.
|
|
|
|
### The --help Pattern
|
|
|
|
All scripts use PEP 723 and `--help`. When a skill's prompt needs to invoke a script, it can say "Run `scripts/foo.py --help` to understand inputs/outputs, then invoke appropriately" instead of inlining the script's interface. This saves tokens in prompts and keeps a single source of truth for the script's API.
|
|
|
|
---
|
|
|
|
## Priority 1: High-Value Validation Scripts
|
|
|
|
### 1. Frontmatter Validator
|
|
|
|
**What:** Validate SKILL.md frontmatter structure and content
|
|
|
|
**Why:** Frontmatter is the #1 factor in skill triggering. Catch errors early.
|
|
|
|
**Checks:**
|
|
|
|
```python
|
|
# checks:
|
|
- name exists and is kebab-case
|
|
- description exists and follows pattern "Use when..."
|
|
- No forbidden fields (XML, reserved prefixes)
|
|
- Optional fields have valid values if present
|
|
```
|
|
|
|
**Output:** JSON with pass/fail per field, line numbers for errors
|
|
|
|
**Implementation:** Python with argparse, no external deps needed
|
|
|
|
---
|
|
|
|
### 2. Template Artifact Scanner
|
|
|
|
**What:** Scan for orphaned template substitution artifacts
|
|
|
|
**Why:** Build process may leave `{if-autonomous}`, `{displayName}`, etc.
|
|
|
|
**Output:** JSON with file path, line number, artifact type
|
|
|
|
**Implementation:** Bash script with JSON output via jq
|
|
|
|
---
|
|
|
|
### 3. Access Boundaries Extractor
|
|
|
|
**What:** Extract and validate access boundaries from memory-system.md
|
|
|
|
**Why:** Security critical — must be defined before file operations
|
|
|
|
**Checks:**
|
|
|
|
```python
|
|
# Parse memory-system.md for:
|
|
- ## Read Access section exists
|
|
- ## Write Access section exists
|
|
- ## Deny Zones section exists (can be empty)
|
|
- Paths use placeholders correctly ({project-root} for _bmad paths, relative for skill-internal)
|
|
```
|
|
|
|
**Output:** Structured JSON of read/write/deny zones
|
|
|
|
**Implementation:** Python with markdown parsing
|
|
|
|
---
|
|
|
|
---
|
|
|
|
## Priority 2: Analysis Scripts
|
|
|
|
### 4. Token Counter
|
|
|
|
**What:** Count tokens in each file of an agent
|
|
|
|
**Why:** Identify verbose files that need optimization
|
|
|
|
**Checks:**
|
|
|
|
```python
|
|
# For each .md file:
|
|
- Total tokens (approximate: chars / 4)
|
|
- Code block tokens
|
|
- Token density (tokens / meaningful content)
|
|
```
|
|
|
|
**Output:** JSON with file path, token count, density score
|
|
|
|
**Implementation:** Python with tiktoken for accurate counting, or char approximation
|
|
|
|
---
|
|
|
|
### 5. Dependency Graph Generator
|
|
|
|
**What:** Map skill → external skill dependencies
|
|
|
|
**Why:** Understand agent's dependency surface
|
|
|
|
**Checks:**
|
|
|
|
```python
|
|
# Parse SKILL.md for skill invocation patterns
|
|
# Parse prompt files for external skill references
|
|
# Build dependency graph
|
|
```
|
|
|
|
**Output:** DOT format (GraphViz) or JSON adjacency list
|
|
|
|
**Implementation:** Python, JSON parsing only
|
|
|
|
---
|
|
|
|
### 6. Activation Flow Analyzer
|
|
|
|
**What:** Parse SKILL.md On Activation section for sequence
|
|
|
|
**Why:** Validate activation order matches best practices
|
|
|
|
**Checks:**
|
|
|
|
Validate that the activation sequence is logically ordered (e.g., config loads before config is used, memory loads before memory is referenced).
|
|
|
|
**Output:** JSON with detected steps, missing steps, out-of-order warnings
|
|
|
|
**Implementation:** Python with regex pattern matching
|
|
|
|
---
|
|
|
|
### 7. Memory Structure Validator
|
|
|
|
**What:** Validate memory-system.md structure
|
|
|
|
**Why:** Memory files have specific requirements
|
|
|
|
**Checks:**
|
|
|
|
```python
|
|
# Required sections:
|
|
- ## Core Principle
|
|
- ## File Structure
|
|
- ## Write Discipline
|
|
- ## Memory Maintenance
|
|
```
|
|
|
|
**Output:** JSON with missing sections, validation errors
|
|
|
|
**Implementation:** Python with markdown parsing
|
|
|
|
---
|
|
|
|
### 8. Subagent Pattern Detector
|
|
|
|
**What:** Detect if agent uses BMAD Advanced Context Pattern
|
|
|
|
**Why:** Agents processing 5+ sources MUST use subagents
|
|
|
|
**Checks:**
|
|
|
|
```python
|
|
# Pattern detection in SKILL.md:
|
|
- "DO NOT read sources yourself"
|
|
- "delegate to sub-agents"
|
|
- "/tmp/analysis-" temp file pattern
|
|
- Sub-agent output template (50-100 token summary)
|
|
```
|
|
|
|
**Output:** JSON with pattern found/missing, recommendations
|
|
|
|
**Implementation:** Python with keyword search and context extraction
|
|
|
|
---
|
|
|
|
## Priority 3: Composite Scripts
|
|
|
|
### 9. Agent Health Check
|
|
|
|
**What:** Run all validation scripts and aggregate results
|
|
|
|
**Why:** One-stop shop for agent quality assessment
|
|
|
|
**Composition:** Runs Priority 1 scripts, aggregates JSON outputs
|
|
|
|
**Output:** Structured health report with severity levels
|
|
|
|
**Implementation:** Bash script orchestrating Python scripts, jq for aggregation
|
|
|
|
---
|
|
|
|
### 10. Comparison Validator
|
|
|
|
**What:** Compare two versions of an agent for differences
|
|
|
|
**Why:** Validate changes during iteration
|
|
|
|
**Checks:**
|
|
|
|
```bash
|
|
# Git diff with structure awareness:
|
|
- Frontmatter changes
|
|
- Capability additions/removals
|
|
- New prompt files
|
|
- Token count changes
|
|
```
|
|
|
|
**Output:** JSON with categorized changes
|
|
|
|
**Implementation:** Bash with git, jq, python for analysis
|
|
|
|
---
|
|
|
|
## Script Output Standard
|
|
|
|
All scripts MUST output structured JSON for agent consumption:
|
|
|
|
```json
|
|
{
|
|
"script": "script-name",
|
|
"version": "1.0.0",
|
|
"agent_path": "/path/to/agent",
|
|
"timestamp": "2025-03-08T10:30:00Z",
|
|
"status": "pass|fail|warning",
|
|
"findings": [
|
|
{
|
|
"severity": "critical|high|medium|low|info",
|
|
"category": "structure|security|performance|consistency",
|
|
"location": { "file": "SKILL.md", "line": 42 },
|
|
"issue": "Clear description",
|
|
"fix": "Specific action to resolve"
|
|
}
|
|
],
|
|
"summary": {
|
|
"total": 10,
|
|
"critical": 1,
|
|
"high": 2,
|
|
"medium": 3,
|
|
"low": 4
|
|
}
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
## Implementation Checklist
|
|
|
|
When creating validation scripts:
|
|
|
|
- [ ] Uses `--help` for documentation
|
|
- [ ] Accepts `--agent-path` for target agent
|
|
- [ ] Outputs JSON to stdout
|
|
- [ ] Writes diagnostics to stderr
|
|
- [ ] Returns meaningful exit codes (0=pass, 1=fail, 2=error)
|
|
- [ ] Includes `--verbose` flag for debugging
|
|
- [ ] Has tests in `scripts/tests/` subfolder
|
|
- [ ] Self-contained (PEP 723 for Python)
|
|
- [ ] No interactive prompts
|
|
|
|
---
|
|
|
|
## Integration with Quality Analysis
|
|
|
|
The Quality Analysis skill should:
|
|
|
|
1. **First**: Run available scripts for fast, deterministic checks
|
|
2. **Then**: Use sub-agents for semantic analysis (requires judgment)
|
|
3. **Finally**: Synthesize both sources into report
|
|
|
|
**Example flow:**
|
|
|
|
```bash
|
|
# Run all validation scripts
|
|
python scripts/validate-frontmatter.py --agent-path {path}
|
|
bash scripts/scan-template-artifacts.sh --agent-path {path}
|
|
|
|
# Collect JSON outputs
|
|
# Spawn sub-agents only for semantic checks
|
|
# Synthesize complete report
|
|
```
|
|
|
|
---
|
|
|
|
## Script Creation Priorities
|
|
|
|
**Phase 1 (Immediate value):**
|
|
|
|
1. Template Artifact Scanner (Bash + jq)
|
|
2. Access Boundaries Extractor (Python)
|
|
|
|
**Phase 2 (Enhanced validation):** 4. Token Counter (Python) 5. Subagent Pattern Detector (Python) 6. Activation Flow Analyzer (Python)
|
|
|
|
**Phase 3 (Advanced features):** 7. Dependency Graph Generator (Python) 8. Memory Structure Validator (Python) 9. Agent Health Check orchestrator (Bash)
|
|
|
|
**Phase 4 (Comparison tools):** 10. Comparison Validator (Bash + Python)
|