---
title: ReadmeExtract Model
---

# Specification: ReadmeExtract Model

## Purpose

Defines the `ReadmeExtract` model class, which holds information interpreted from README frontmatter rather than copied verbatim from the original Markdown.

## 1. Class Definition

### 1.1 ReadmeExtract Model

```python
from typing import Any

from pydantic import BaseModel, Field


class ReadmeExtract(BaseModel):
    """Extracted and interpreted information from README frontmatter."""

    # Frontmatter-derived fields
    authors: list[str] = Field(default_factory=list)
    supervisors: list[str] = Field(default_factory=list)

    # Raw frontmatter for reference
    raw_frontmatter: dict[str, Any] = Field(default_factory=dict)
```

## 2. Field Definitions

### 2.1 Frontmatter-Derived Fields

| Field | Type | Description | Source |
|-------|------|-------------|--------|
| `authors` | `list[str]` | All author names | `authors`/`author` keys, processed per §3 |
| `supervisors` | `list[str]` | Names of authors whose roles include Supervision | `authors`/`author` keys, processed per §3 |
| `raw_frontmatter` | `dict[str, Any]` | Complete parsed frontmatter, kept for reference | All YAML frontmatter |

## 3. Authors/Supervisors Processing

### 3.1 Input Formats

The `authors`/`author` field in frontmatter can be:

- **String**: Single author name
- **List of strings**: Multiple author names
- **List of dicts**: Author objects with `name` and `roles` fields
- **Dict**: Single author object with `name` and `roles` fields

### 3.2 Processing Rules

1. **All authors**: Extract all author names regardless of roles
2. **Supervisors**: Extract author names whose `roles` list contains "Supervision" (case-insensitive)
3. **Name extraction**:
   - String: Use as-is
   - Dict: Extract `name` field, skip if missing
   - List: Process each item recursively
4. **Deduplication**: Remove duplicate author names (case-insensitive)
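The rules above can be sketched as a small pure function. This is only an illustration, not the reference implementation: the names `extract_authors` and `_iter_entries` are hypothetical, and several details the spec leaves open are assumed here (`authors` takes precedence over `author`, names are stripped of surrounding whitespace, the first spelling of a duplicate name is kept, and a `roles` value given as a single string is treated as one role).

```python
from typing import Any, Iterator


def _iter_entries(value: Any) -> Iterator[tuple[str, list[str]]]:
    """Yield (name, roles) pairs from any supported input shape."""
    if isinstance(value, str):
        # String: use as-is, with no roles attached.
        yield value, []
    elif isinstance(value, dict):
        name = value.get("name")
        if isinstance(name, str):
            roles = value.get("roles")
            if isinstance(roles, str):
                roles = [roles]  # assumption: a lone string counts as one role
            elif not isinstance(roles, list):
                roles = []
            yield name, [r for r in roles if isinstance(r, str)]
        # Dicts without a usable `name` are skipped.
    elif isinstance(value, list):
        # List: process each item recursively.
        for item in value:
            yield from _iter_entries(item)
    # Any other type is treated as malformed and skipped.


def extract_authors(frontmatter: dict[str, Any]) -> tuple[list[str], list[str]]:
    """Return (authors, supervisors) derived from the frontmatter mapping."""
    raw = frontmatter.get("authors")
    if raw is None:
        raw = frontmatter.get("author")

    authors: list[str] = []
    supervisors: list[str] = []
    seen_authors: set[str] = set()
    seen_supervisors: set[str] = set()

    for name, roles in _iter_entries(raw):
        name = name.strip()
        if not name:
            continue
        key = name.casefold()  # case-insensitive deduplication key
        if key not in seen_authors:
            seen_authors.add(key)
            authors.append(name)
        is_supervisor = any(r.casefold() == "supervision" for r in roles)
        if is_supervisor and key not in seen_supervisors:
            seen_supervisors.add(key)
            supervisors.append(name)
    return authors, supervisors
```

Keeping the shape handling in one generator means the deduplication and role check only ever see `(name, roles)` pairs, regardless of which input format the frontmatter used.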

### 3.3 Examples

```yaml
# Each variant below is a separate alternative, not one combined document.

# String format
authors: "Alice Example"

# List of strings
authors: ["Alice Example", "Bob Example"]

# List of dicts
authors:
  - name: "Alice Example"
    roles: ["Supervision", "Conceptualization"]
  - name: "Bob Example"
    roles: ["Validation"]

# Mixed format
authors:
  - "Alice Example"
  - name: "Bob Example"
    roles: ["Supervision"]
```

## 4. Construction Process

1. **Parse frontmatter**: Extract the YAML between the opening and closing `---` markers
2. **Process frontmatter fields**: Apply the rules in §3 to derive `authors` and `supervisors`
3. **Construct object**: Create a `ReadmeExtract` instance from the derived values and the raw frontmatter
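As a rough sketch of step 1 only, the frontmatter block can be separated from the body with the standard library before any YAML parsing happens. The function name `split_frontmatter` is hypothetical, and the sketch assumes the frontmatter must start on the very first line and that an unclosed block counts as no frontmatter; the spec does not state either point.

```python
def split_frontmatter(text: str) -> tuple[str, str]:
    """Split README text into (raw_yaml, body).

    Returns ("", text) when the text does not start with a `---` line
    or when the opening marker is never closed.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", text
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            raw_yaml = "".join(lines[1:index])
            body = "".join(lines[index + 1:])
            return raw_yaml, body
    # Opening marker without a closing one: treat as no frontmatter.
    return "", text
```

The returned `raw_yaml` string would then be handed to a YAML parser, and the resulting mapping to the processing rules in §3.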

## 5. Integration with Readme Model

The `Readme` model should include:

```python
class Readme(BaseModel):
    # Readme fields (see spec_model_mapping.md or models/readme.py)
    # [...]
    # Extra field holding the extract defined in this spec
    extra: ReadmeExtract  # All interpreted/extracted information from frontmatter
```

## 6. Error Handling

- **Invalid YAML**: Log error, continue with empty frontmatter
- **Missing fields**: Set to `None`, `""`, or an empty list, as appropriate for the field type
- **Malformed author data**: Log warning, skip invalid entries, continue with valid ones
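The "log and continue" behaviour for invalid YAML might look like the wrapper below. It is a sketch under stated assumptions: `load_frontmatter` is a hypothetical name, the parser is passed in as a callable so the example stays independent of any particular YAML library, and treating a non-mapping result as empty frontmatter is an assumption that goes beyond what this spec says.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def load_frontmatter(raw_yaml: str, parse: Callable[[str], Any]) -> dict[str, Any]:
    """Parse frontmatter text, falling back to {} on any failure."""
    if not raw_yaml.strip():
        return {}
    try:
        data = parse(raw_yaml)
    except Exception as exc:  # parser-specific error types vary by library
        logger.error("Invalid frontmatter, continuing without it: %s", exc)
        return {}
    if not isinstance(data, dict):
        # Assumption: scalar or list frontmatter is ignored rather than rejected.
        logger.warning(
            "Frontmatter is not a mapping (got %s); ignoring it",
            type(data).__name__,
        )
        return {}
    return data
```

With PyYAML installed, `yaml.safe_load` could be passed as `parse`; any other callable that turns a string into Python objects would work the same way.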

## 7. Examples

### 7.1 Simple README (No Frontmatter)

```markdown
# My Project

This is the first paragraph of my project.
```

**Extract**:

- All other fields: the empty value for their type (`""`, an empty list, or `None` where the type allows it)
- `raw_frontmatter`: `{}`

### 7.2 README with Frontmatter

```markdown
---
type: ML
priority: 5
authors:
  - name: Alice Example
    roles: [Supervision]
  - Bob Example
---

# Alpha One

## About

This is the main README for Alpha One.
```

**Extract**:

- `authors`: `["Alice Example", "Bob Example"]`
- `supervisors`: `["Alice Example"]`
- `raw_frontmatter`: `{"type": "ML", "priority": 5, "authors": [...]}`

## 8. Non-Goals

- Network I/O
- File system operations
- Rendering or formatting
- Validation beyond basic type checking
- Content extraction (first paragraph, TODO sections); see `spec_model_mapping.md`