Specification: Data Collection Workflow

Goal

Aggregates information from GitLab (groups, projects, README files, issues) into a unified, enriched dataset that downstream renderers consume. The collector orchestrates calls to the GitLab API Client and produces structured overview rows (see Model Mapping).

1. Happy-Path Flow

Retrieve the list of groups visible to the configured API token(s).
For every group, retrieve projects.
For each project:
- Determine the branch used for README lookup: prefer default_branch from project metadata; fall back to main.
- Request README.md raw content; on 404 treat README as missing.
- Request the issues list. Projects without issue tracking yield `issues = None`.
Transform raw JSON/text responses into domain objects via model mapping (see related spec) and compose an overview row containing:
- Group object.
- Project object.
- Optional README object.
- Optional list of Issue objects (may be an empty list).
- Extra metadata extracted from README front-matter (author, priority, etc.).
Return the list of rows to the caller, preserving the original discovery order.
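
The flow above can be sketched as follows. This is a minimal illustration, not the reference implementation: the `Client` protocol, its method names, the `NotFound` exception, and the `OverviewRow` fields are all assumptions made for the example, and raw dicts and strings stand in for the domain objects produced by model mapping. Front-matter extraction is left out because its format is defined elsewhere.

```python
from dataclasses import dataclass, field
from typing import Any, Optional, Protocol


class NotFound(Exception):
    """Hypothetical error the client raises for an HTTP 404."""


class Client(Protocol):
    """Hypothetical client interface; the real API client may differ."""

    def list_groups(self) -> list[dict]: ...
    def list_projects(self, group_id: int) -> list[dict]: ...
    def get_raw_file(self, project_id: int, path: str, ref: str) -> str: ...
    def list_issues(self, project_id: int) -> Optional[list[dict]]: ...


@dataclass
class OverviewRow:
    """One row per project (field names are illustrative)."""

    group: dict
    project: dict
    readme: Optional[str] = None          # None: README not found
    issues: Optional[list[dict]] = None   # None: no issue tracking
    extra: dict[str, Any] = field(default_factory=dict)


def collect(client: Client) -> list[OverviewRow]:
    rows: list[OverviewRow] = []
    for group in client.list_groups():
        for project in client.list_projects(group["id"]):
            # Prefer the project's default branch; fall back to "main".
            branch = project.get("default_branch") or "main"
            try:
                readme = client.get_raw_file(project["id"], "README.md", branch)
            except NotFound:
                readme = None  # a missing README is not an error
            issues = client.list_issues(project["id"])
            rows.append(OverviewRow(group, project, readme, issues))
    # Appending inside the nested loops preserves discovery order.
    return rows
```

Because rows are appended as groups and projects are discovered, the returned list keeps the discovery order without any extra bookkeeping.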

2. Failure Handling

Failure point	Result
Group listing request returns error	Raise Collector Error and abort collection.
Project listing for a single group fails	Propagate error → abort collection (no partial results).
README fetch returns 404	Record `readme = None`; continue processing other artefacts.
README fetch returns non-404 error	Propagate as Collector Error.
Issues request fails	Record `issues = None`; continue processing other artefacts.

Errors are never silently ignored, except in the two explicitly graceful cases listed above: a missing README (404) and a failed issues request (for example, a project without an issue tracker).
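
One way to express the failure table in code is a small wrapper per failure point. The sketch below assumes a hypothetical `ApiError` exception that carries the HTTP status code, and models the spec's Collector Error as a `CollectorError` class; the real client and exception hierarchy may look different.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ApiError(Exception):
    """Hypothetical client error carrying the HTTP status code."""

    def __init__(self, status: int, message: str = ""):
        super().__init__(message or f"HTTP {status}")
        self.status = status


class CollectorError(Exception):
    """Raised when collection must abort (assumed name)."""


def list_or_abort(fetch: Callable[[], T], what: str) -> T:
    """Group and project listings: any error aborts the collection."""
    try:
        return fetch()
    except ApiError as exc:
        raise CollectorError(f"listing {what} failed: {exc}") from exc


def fetch_readme(fetch: Callable[[], str]) -> Optional[str]:
    """README: 404 becomes None, any other error aborts."""
    try:
        return fetch()
    except ApiError as exc:
        if exc.status == 404:
            return None
        raise CollectorError(f"README fetch failed: {exc}") from exc


def fetch_issues(fetch: Callable[[], list]) -> Optional[list]:
    """Issues: a failed request is recorded as None."""
    try:
        return fetch()
    except ApiError:
        return None
```

Each wrapper takes a zero-argument callable so the sketch stays independent of the actual client method signatures.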

3. Concurrency & Rate Limits

Implementations may perform project-level fetches in parallel, but must honour the rate-limit handling strategy defined in the API client.
Parallelism must not reorder the final output; order is defined by the input discovery sequence (§1, final step).
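
A minimal sketch of order-preserving parallelism, assuming the per-project work is wrapped in a hypothetical `fetch_row` callable and that rate-limit handling happens inside the API client that callable uses. `ThreadPoolExecutor.map` returns results in the order of its inputs regardless of which task finishes first, and re-raises a task's exception when its result is retrieved, so errors still propagate.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

P = TypeVar("P")
R = TypeVar("R")


def collect_parallel(
    projects: Iterable[P],
    fetch_row: Callable[[P], R],
    max_workers: int = 4,  # illustrative default, not mandated by the spec
) -> list[R]:
    """Fetch rows concurrently while keeping discovery order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, not completion order.
        return list(pool.map(fetch_row, projects))
```

Collecting futures with `as_completed` instead would return rows in completion order and would need an explicit re-sort to satisfy the ordering rule.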

4. Output Contract

Returns an ordered, in-memory collection of overview rows.
`None` always means the feature is absent or the item was not found (“does not have this feature”, “not found”, etc.). Empty values (such as "", []) indicate that the item exists in the API but is empty.
No persistence or caching is performed at this layer.
Consumers apply sorting and grouping according to their own needs (see Table Sorting).
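
For consumers, the distinction between `None` and empty values means checking identity rather than truthiness. The helper below is a hypothetical example (the function name and the display strings are made up for illustration), showing how an issue count might be derived without conflating the two cases.

```python
from typing import Optional


def issue_summary(issues: Optional[list]) -> str:
    """Summarise an issues field for display (illustrative only)."""
    if issues is None:
        # Feature absent or request failed: not the same as "zero issues".
        return "n/a"
    # An empty list means issue tracking exists but has no entries.
    return str(len(issues))
```

A plain `if not issues:` check would treat `None` and `[]` identically and lose the information the contract is meant to carry.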

5. Non-Goals

Command-line parsing, configuration merging, or environment handling (see Settings).
Rendering concerns of any kind – these are addressed by higher-level specs.