---
title: ReadmeExtract Model
---

# Specification: ReadmeExtract Model

## Purpose

Defines the `ReadmeExtract` model class responsible for extracting interpreted information from README frontmatter, i.e. information that is not a verbatim part of the original Markdown.

## 1. Class Definition

### 1.1 ReadmeExtract Model

```python
from typing import Any

from pydantic import BaseModel, Field


class ReadmeExtract(BaseModel):
    """Extracted and interpreted information from README frontmatter."""

    # Frontmatter-derived fields
    authors: list[str] = Field(default_factory=list)
    supervisors: list[str] = Field(default_factory=list)

    # Raw frontmatter for reference
    raw_frontmatter: dict[str, Any] = Field(default_factory=dict)
```

## 2. Field Definitions

### 2.1 Frontmatter-Derived Fields

| Field | Type | Description | Source |
|-------|------|-------------|--------|
| `authors` | `list[str]` | All author names | `authors`/`author` keys, processed per §3 |
| `supervisors` | `list[str]` | Names of authors with the Supervision role | `authors`/`author` keys, processed per §3 |
| `raw_frontmatter` | `dict[str, Any]` | Complete frontmatter for reference | All YAML frontmatter |

## 3. Authors/Supervisors Processing

### 3.1 Input Formats

The `authors`/`author` field in the frontmatter can be:

- **String**: Single author name
- **List of strings**: Multiple author names
- **List of dicts**: Author objects with `name` and `roles` fields
- **Dict**: Single author object with `name` and `roles` fields

### 3.2 Processing Rules

1. **All authors**: Extract all author names regardless of roles
2. **Supervisors**: Extract the names of authors whose `roles` list contains "Supervision" (case-insensitive)
3. **Name extraction**:
   - String: Use as-is
   - Dict: Extract the `name` field, skip the entry if it is missing
   - List: Process each item recursively
4. **Deduplication**: Remove duplicate author names (case-insensitive)

### 3.3 Examples

```yaml
# String format
authors: "Alice Example"

# List of strings
authors: ["Alice Example", "Bob Example"]

# List of dicts
authors:
  - name: "Alice Example"
    roles: ["Supervision", "Conceptualization"]
  - name: "Bob Example"
    roles: ["Validation"]

# Mixed format
authors:
  - "Alice Example"
  - name: "Bob Example"
    roles: ["Supervision"]
```

## 4. Construction Process

1. **Parse frontmatter**: Extract the YAML between the `---` markers
2. **Process frontmatter fields**: Apply the processing rules from §3 to obtain the extracted data
3. **Construct object**: Create a `ReadmeExtract` with all extracted data

## 5. Integration with Readme Model

The `Readme` model should include:

```python
class Readme(BaseModel):
    # Readme-Fields (see spec_model_mapping.md or models/readme.py)
    # [...]

    # extra-field with the Extract (this spec)
    extra: ReadmeExtract  # All interpreted/extracted information from frontmatter
```

## 6. Error Handling

- **Invalid YAML**: Log an error, continue with empty frontmatter
- **Missing fields**: Set to `None`, `""`, or an empty list, as appropriate for the field's type
- **Malformed author data**: Log a warning, skip invalid entries, continue with the valid ones

## 7. Examples

### 7.1 Simple README (No Frontmatter)

```markdown
# My Project

This is the first paragraph of my project.
```

**Extract**:

- All fields: `""` or empty lists, depending on type, or `None` if the type allows it
- `raw_frontmatter`: `{}`

### 7.2 README with Frontmatter

```markdown
---
type: ML
priority: 5
authors:
  - name: Alice Example
    roles: [Supervision]
  - Bob Example
---

# Alpha One

## About

This is the main README for Alpha One.
```

**Extract**:

- `authors`: `["Alice Example", "Bob Example"]`
- `supervisors`: `["Alice Example"]`
- `raw_frontmatter`: `{"type": "ML", "priority": 5, "authors": [...]}`

## 8. Non-Goals

- Network I/O
- File system operations
- Rendering or formatting
- Validation beyond basic type checking
- Content extraction (first paragraph, TODO sections); see spec_model_mapping.md
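
## Appendix: Sketch of Authors/Supervisors Processing

The processing rules in §3.2 could be sketched roughly as follows. This is a minimal illustration and not the reference implementation. The helper names `_iter_entries`, `_dedupe`, and `extract_authors` are hypothetical. The spec leaves several choices open, so the sketch assumes that `authors` takes precedence over `author` when both keys are present, that the first spelling of a duplicated name is the one kept, and that a bare string in `roles` counts as a single role. The warning log for malformed entries required by §6 is omitted for brevity.

```python
from typing import Any, Iterable, Iterator


def _iter_entries(raw: Any) -> Iterator[tuple[str, list[str]]]:
    """Yield (name, roles) pairs from any input format listed in §3.1."""
    if isinstance(raw, str):
        # String: use as-is, no roles attached.
        yield raw, []
    elif isinstance(raw, dict):
        name = raw.get("name")
        if not isinstance(name, str):
            return  # Missing or malformed name: skip this entry.
        roles = raw.get("roles")
        if isinstance(roles, str):
            roles = [roles]  # Assumption: a bare string is a single role.
        elif not isinstance(roles, list):
            roles = []
        yield name, [r for r in roles if isinstance(r, str)]
    elif isinstance(raw, list):
        # List: process each item recursively.
        for item in raw:
            yield from _iter_entries(item)
    # Any other type (None, numbers, ...) is malformed and skipped.


def _dedupe(names: Iterable[str]) -> list[str]:
    """Drop case-insensitive duplicates, keeping the first spelling seen."""
    seen: set[str] = set()
    result: list[str] = []
    for name in names:
        key = name.casefold()
        if key not in seen:
            seen.add(key)
            result.append(name)
    return result


def extract_authors(frontmatter: dict[str, Any]) -> tuple[list[str], list[str]]:
    """Return (authors, supervisors) derived from a parsed frontmatter dict."""
    # Assumption: `authors` wins over `author` if both keys are present.
    raw = frontmatter.get("authors", frontmatter.get("author"))
    entries = list(_iter_entries(raw))
    authors = _dedupe(name for name, _ in entries)
    supervisors = _dedupe(
        name
        for name, roles in entries
        if any(role.casefold() == "supervision" for role in roles)
    )
    return authors, supervisors


if __name__ == "__main__":
    # Parsed frontmatter from the example in §7.2.
    frontmatter = {
        "type": "ML",
        "priority": 5,
        "authors": [
            {"name": "Alice Example", "roles": ["Supervision"]},
            "Bob Example",
        ],
    }
    print(extract_authors(frontmatter))
    # (['Alice Example', 'Bob Example'], ['Alice Example'])
```

The two returned lists map directly onto the `authors` and `supervisors` fields of `ReadmeExtract`. The sketch works on an already parsed dict, so YAML parsing and model construction (§4) stay outside it.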