Specification: ReadmeExtract Model

Purpose

Defines the ReadmeExtract model class, which holds information interpreted from README frontmatter rather than taken verbatim from the original Markdown.

1. Class Definition

1.1 ReadmeExtract Model

from typing import Any

from pydantic import BaseModel, Field


class ReadmeExtract(BaseModel):
    """Extracted and interpreted information from README frontmatter."""

    # Frontmatter-derived fields (processed per §3)
    authors: list[str] = Field(default_factory=list)
    supervisors: list[str] = Field(default_factory=list)

    # Raw frontmatter for reference
    raw_frontmatter: dict[str, Any] = Field(default_factory=dict)

2. Field Definitions

2.1 Frontmatter-Derived Fields

Field	Type	Description	Source
`authors`	list[str]	All author names	`authors`/`author` keys, processed per §3
`supervisors`	list[str]	Names of authors whose roles include Supervision	`authors`/`author` keys, processed per §3
`raw_frontmatter`	dict[str, Any]	Complete frontmatter for reference	All YAML frontmatter

3. Authors/Supervisors Processing

3.1 Input Formats

The authors/author field in frontmatter can be:

String: Single author name
List of strings: Multiple author names
List of dicts: Author objects with name and roles fields (a list may mix strings and dicts, see §3.3)
Dict: Single author object with name and roles fields

3.2 Processing Rules

All authors: Extract all author names regardless of roles
Supervisors: Extract author names whose roles list contains “Supervision” (case-insensitive)
Name extraction:
- String: Use as-is
- Dict: Extract name field, skip if missing
- List: Process each item recursively
Deduplication: Remove duplicate author names (case-insensitive)
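
The rules above can be sketched as a small standalone function. This is a minimal illustration, not the reference implementation: the names `_iter_entries` and `extract_people` are hypothetical, and several details the spec leaves open are assumptions here (the `authors` key is preferred over `author`, names are stripped of surrounding whitespace, and deduplication keeps the first spelling encountered).

```python
from typing import Any, Iterator


def _iter_entries(value: Any) -> Iterator[tuple[str, list[str]]]:
    """Yield (name, roles) pairs from any supported input shape."""
    if isinstance(value, str):
        # String: use as-is, with no roles attached.
        yield value, []
    elif isinstance(value, dict):
        name = value.get("name")
        if isinstance(name, str):  # Dict without a usable name is skipped.
            roles = value.get("roles")
            if isinstance(roles, str):
                roles = [roles]  # Assumption: tolerate a single role string.
            elif not isinstance(roles, list):
                roles = []
            yield name, [r for r in roles if isinstance(r, str)]
    elif isinstance(value, list):
        # List: process each item recursively.
        for item in value:
            yield from _iter_entries(item)
    # Any other type is silently skipped in this sketch.


def extract_people(frontmatter: dict[str, Any]) -> tuple[list[str], list[str]]:
    """Return (authors, supervisors), deduplicated case-insensitively."""
    # Assumption: "authors" wins when both keys are present.
    raw = frontmatter.get("authors", frontmatter.get("author"))
    authors: list[str] = []
    supervisors: list[str] = []
    seen_authors: set[str] = set()
    seen_supervisors: set[str] = set()
    for name, roles in _iter_entries(raw):
        name = name.strip()
        if not name:
            continue
        key = name.casefold()
        if key not in seen_authors:
            seen_authors.add(key)
            authors.append(name)
        is_supervisor = any(r.strip().casefold() == "supervision" for r in roles)
        if is_supervisor and key not in seen_supervisors:
            seen_supervisors.add(key)
            supervisors.append(name)
    return authors, supervisors
```

Supervisors are tracked separately from authors so that a name listed first as a plain string and later as a dict with a Supervision role still ends up in `supervisors`.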

3.3 Examples

# String format
authors: "Alice Example"

# List of strings
authors: ["Alice Example", "Bob Example"]

# List of dicts
authors:
  - name: "Alice Example"
    roles: ["Supervision", "Conceptualization"]
  - name: "Bob Example"
    roles: ["Validation"]

# Mixed format
authors:
  - "Alice Example"
  - name: "Bob Example"
    roles: ["Supervision"]

4. Construction Process

Parse frontmatter: Extract YAML between --- markers
Process frontmatter fields: Apply the §3 rules to the authors/author key to obtain authors and supervisors
Construct object: Create ReadmeExtract from the extracted lists and the raw frontmatter dict
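
The first step can be sketched with the standard library alone. The function name `split_frontmatter` is hypothetical, and two behaviours are assumptions the spec does not state: the opening --- must be the very first line, and an unterminated block is treated as no frontmatter. The returned YAML text would then be handed to a YAML parser (for example PyYAML's `yaml.safe_load`) before the §3 processing.

```python
def split_frontmatter(text: str) -> tuple[str, str]:
    """Return (yaml_text, body); yaml_text is "" when no frontmatter is found."""
    lines = text.splitlines(keepends=True)
    # Frontmatter must open on the first line (assumption).
    if not lines or lines[0].rstrip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].rstrip() == "---":
            # Everything between the two markers is YAML; the rest is Markdown.
            return "".join(lines[1:i]), "".join(lines[i + 1:])
    # No closing marker: treat the whole text as body (assumption).
    return "", text


doc = "---\ntype: ML\npriority: 5\n---\n\n# Alpha One\n"
yaml_text, body = split_frontmatter(doc)
```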

5. Integration with Readme Model

The Readme model should include:

class Readme(BaseModel):
    # Readme fields (see spec_model_mapping.md or models/readme.py)
    # [...]
    # Extra field holding the extract defined in this spec
    extra: ReadmeExtract  # All interpreted/extracted information from frontmatter

6. Error Handling

Invalid YAML: Log error, continue with empty frontmatter
Missing fields: Use the empty value for the field's type (empty list or dict for the fields in §1.1; “” or None only where a field's type allows it)
Malformed author data: Log warning, skip invalid entries, continue with valid ones
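
The "log and continue" behaviour for invalid YAML might look like the following sketch. The helper `parse_frontmatter_safely` is hypothetical, and the parser is passed in as a callable so the example stays independent of any particular YAML library; `json.loads` is used below purely as a stand-in parser. Treating a non-mapping result as empty frontmatter is an assumption not stated in the rules above.

```python
import json
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def parse_frontmatter_safely(
    yaml_text: str, parse: Callable[[str], Any]
) -> dict[str, Any]:
    """Parse frontmatter text; on any failure, log and return {}."""
    if not yaml_text.strip():
        return {}
    try:
        data = parse(yaml_text)
    except Exception as exc:  # Error types differ between parsers.
        logger.error("Invalid frontmatter, continuing without it: %s", exc)
        return {}
    if not isinstance(data, dict):
        # Assumption: a top-level list or scalar is not usable frontmatter.
        logger.error("Frontmatter is not a mapping (got %s)", type(data).__name__)
        return {}
    return data


# Stand-in parser for illustration; a real caller would pass a YAML loader.
good = parse_frontmatter_safely('{"type": "ML", "priority": 5}', json.loads)
bad = parse_frontmatter_safely("{not valid", json.loads)
```

Here `good` holds the parsed mapping and `bad` is an empty dict, so downstream processing sees the same shape in both cases.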

7. Examples

7.1 Simple README (No Frontmatter)

# My Project

This is the first paragraph of my project.

Extract:

All fields: the empty value for their type (authors: [] and supervisors: [] for the model in §1.1; “” or None only where a field's type allows it)
raw_frontmatter: {}

7.2 README with Frontmatter

---
type: ML
priority: 5
authors:
  - name: Alice Example
    roles: [Supervision]
  - Bob Example
---

# Alpha One

## About

This is the main README for Alpha One.

Extract:

authors: ["Alice Example", "Bob Example"]
supervisors: ["Alice Example"]
raw_frontmatter: {"type": "ML", "priority": 5, "authors": [...]}

8. Non-Goals

Network I/O
File system operations
Rendering or formatting
Validation beyond basic type checking
Content extraction (first paragraph, TODO sections) - see spec_model_mapping.md