Specification: Data Collection Workflow
Goal
Aggregates information from GitLab (groups, projects, README files, issues) into a unified, enriched dataset that downstream renderers consume. The collector orchestrates calls to the GitLab API Client and produces structured overview rows (see Model Mapping).
1. Happy-Path Flow
Retrieve the list of groups visible to the configured API token(s).
For every group, retrieve projects.
For each project:
Determine the branch used for README lookup: prefer default_branch from project metadata; fall back to main.
Request README.md raw content; on 404 treat README as missing.
Request issues list. Some projects may not have issue tracking, yielding issues=None.
Transform raw JSON/text responses into domain objects via model mapping (see related spec) and compose an overview row containing:
Group object.
Project object.
Optional README object.
Optional list of Issue objects (may be an empty list).
Extra metadata extracted from README front-matter (author, priority, etc.).
Return the list of rows to the caller, preserving the original discovery order.
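The flow above can be sketched roughly as follows. This is a minimal illustration, not the specified implementation: the GitLabClient protocol, its method names, the OverviewRow field names, and the helper resolve_branch are all hypothetical, and raw dictionaries stand in for the domain objects that model mapping would normally produce. Front-matter extraction is left out, so extra stays empty.

```python
from dataclasses import dataclass, field
from typing import Any, Optional, Protocol


class GitLabClient(Protocol):
    """Hypothetical client interface; the real API client is specified elsewhere."""

    def list_groups(self) -> list[dict[str, Any]]: ...
    def list_projects(self, group_id: int) -> list[dict[str, Any]]: ...
    def get_readme(self, project_id: int, ref: str) -> Optional[str]: ...
    def list_issues(self, project_id: int) -> Optional[list[dict[str, Any]]]: ...


@dataclass
class OverviewRow:
    """Assumed row shape; field names are illustrative only."""

    group: dict[str, Any]
    project: dict[str, Any]
    readme: Optional[str]                   # None: README not found
    issues: Optional[list[dict[str, Any]]]  # None: no issue tracking; []: no issues
    extra: dict[str, Any] = field(default_factory=dict)  # README front-matter metadata


def resolve_branch(project: dict[str, Any]) -> str:
    # Prefer default_branch from project metadata; fall back to main.
    return project.get("default_branch") or "main"


def collect(client: GitLabClient) -> list[OverviewRow]:
    rows: list[OverviewRow] = []
    # Nested sequential loops keep rows in discovery order.
    for group in client.list_groups():
        for project in client.list_projects(group["id"]):
            ref = resolve_branch(project)
            readme = client.get_readme(project["id"], ref)
            issues = client.list_issues(project["id"])
            rows.append(OverviewRow(group, project, readme, issues))
    return rows
```

In this sketch the client is assumed to translate a README 404 and a missing issue tracker into None itself; any other client error simply propagates out of collect.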
2. Failure Handling
| Failure point | Result |
|---|---|
| Group listing request returns error | Raise Collector Error and abort collection. |
| Project listing for a single group fails | Propagate error → abort collection (no partial results). |
| README fetch returns 404 | Record the README as missing (None). |
| README fetch returns non-404 error | Propagate as Collector Error. |
| Issues request fails | Record issues=None. |
Errors are not silently ignored (except in the explicitly graceful README-missing and no-issue-tracker cases).
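As an illustration of the README rules, the sketch below maps a 404 to None and wraps every other client error. The class name CollectorError is one possible spelling of the spec's Collector Error; ApiError, its status attribute, and fetch_readme are hypothetical names introduced only for this example.

```python
from typing import Callable, Optional


class CollectorError(Exception):
    """Raised when collection must abort (assumed spelling of 'Collector Error')."""


class ApiError(Exception):
    """Hypothetical client-side error carrying an HTTP status code."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(message or f"HTTP {status}")
        self.status = status


def fetch_readme(fetch: Callable[[], str]) -> Optional[str]:
    """Run a README fetch and apply the 404 / non-404 rules."""
    try:
        return fetch()
    except ApiError as exc:
        if exc.status == 404:
            return None  # graceful case: README is missing
        # Any other failure is not swallowed; it aborts collection.
        raise CollectorError(f"README fetch failed: {exc}") from exc
```

Chaining with `from exc` keeps the original client error available in the traceback, which helps when diagnosing why a collection aborted.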
3. Concurrency & Rate Limits
Implementations may perform project-level fetches in parallel, but must honour the rate-limit handling strategy defined in the API client.
Parallelism must not reorder final output; order is defined by input discovery sequence (§1-5).
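One way to parallelise project-level fetches without reordering output is a thread pool whose map call returns results in input order. The following is a sketch under assumptions: fetch_in_order, fetch_one, and max_workers are hypothetical names, and rate-limit handling is assumed to live inside the API client called by fetch_one.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def fetch_in_order(
    items: Iterable[T],
    fetch_one: Callable[[T], R],
    max_workers: int = 4,
) -> list[R]:
    """Apply fetch_one to every item concurrently, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which call finishes first.
        return list(pool.map(fetch_one, items))
```

An exception raised by fetch_one surfaces when its result is consumed from the iterator, so a failing project still aborts the whole collection rather than producing partial results.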
4. Output Contract
Returns an ordered, in-memory collection of overview rows.
None is always interpreted as “does not have this feature”, “not found”, etc. Empty values (such as "", [], ...) indicate that the item is present in the API but empty.
No persistence or caching is performed at this layer.
The consumer applies sorting/grouping according to their own needs (see Table Sorting).
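For consumers, the practical consequence of the None-versus-empty rule is that values should be tested with an identity check rather than truthiness. The helper below is a hypothetical illustration; describe_issues and its return strings are not part of the spec.

```python
from typing import Optional, Sequence


def describe_issues(issues: Optional[Sequence[object]]) -> str:
    """Distinguish 'feature absent' (None) from 'present but empty' ([])."""
    if issues is None:
        return "no issue tracker"
    if len(issues) == 0:
        return "issue tracker present, no issues"
    return f"{len(issues)} issue(s)"
```

A plain `if not issues:` check would merge the first two cases, which is exactly the distinction the output contract is meant to preserve.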
5. Non-Goals
Command-line parsing, configuration merging, or environment handling (see Settings).
Rendering concerns of any kind – these are addressed by higher-level specs.