Specification: Data Collection Workflow

Goal

The collector aggregates information from GitLab (groups, projects, README files, issues) into a unified, enriched dataset that downstream renderers consume. It orchestrates calls to the GitLab API Client and produces structured overview rows (see Model Mapping).


1. Happy-Path Flow

  1. Retrieve the list of groups visible to the configured API token(s).

  2. For every group, retrieve projects.

  3. For each project:

    • Determine the branch used for README lookup: prefer default_branch from project metadata; fall back to main.

    • Request README.md raw content; on 404 treat README as missing.

    • Request the issues list. Projects without issue tracking yield issues = None.

  4. Transform raw JSON/text responses into domain objects via model mapping (see related spec) and compose an overview row containing:

    • Group object.

    • Project object.

    • Optional README object.

    • Optional list of Issue objects: None when issue tracking is unavailable, an empty list when the tracker exists but holds no issues.

    • Extra metadata extracted from README front-matter (author, priority, etc.).

  5. Return the list of rows to the caller, preserving the original discovery order.
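
The flow above can be sketched as follows. This is a minimal illustration, not the reference implementation: the client method names (list_groups, list_projects, get_readme, list_issues), the NotFound exception, and the OverviewRow fields are assumptions made for the example, raw dicts stand in for the mapped domain objects, and front-matter extraction is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


class NotFound(Exception):
    """Hypothetical error raised by the client on HTTP 404."""


@dataclass
class OverviewRow:
    # Hypothetical row shape; raw dicts stand in for domain objects.
    group: dict[str, Any]
    project: dict[str, Any]
    readme: Optional[str]
    issues: Optional[list[dict[str, Any]]]
    extra: dict[str, str] = field(default_factory=dict)


def collect(client: Any) -> list[OverviewRow]:
    """Walk groups, then projects, appending rows in discovery order."""
    rows: list[OverviewRow] = []
    for group in client.list_groups():
        for project in client.list_projects(group["id"]):
            # Prefer default_branch from project metadata; fall back to main.
            branch = project.get("default_branch") or "main"
            try:
                readme = client.get_readme(project["id"], branch)
            except NotFound:
                readme = None  # a 404 means the README is missing
            issues = client.list_issues(project["id"])  # may be None
            rows.append(OverviewRow(group, project, readme, issues))
    return rows


class FakeClient:
    """In-memory stand-in used only to exercise the sketch."""

    def list_groups(self):
        return [{"id": 1, "name": "platform"}]

    def list_projects(self, group_id):
        return [
            {"id": 10, "default_branch": "develop"},
            {"id": 11, "default_branch": None},
        ]

    def get_readme(self, project_id, branch):
        if project_id == 11:
            raise NotFound()
        return f"README on {branch}"

    def list_issues(self, project_id):
        # Project 10 has a tracker with no issues; project 11 has no tracker.
        return [] if project_id == 10 else None


rows = collect(FakeClient())
```

Because rows are appended inside the nested loops, the output order is the discovery order without any extra bookkeeping.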


2. Failure Handling

Failure point                              | Result
-------------------------------------------|----------------------------------------------------------
Group listing request returns error        | Raise Collector Error and abort collection.
Project listing for a single group fails   | Propagate error → abort collection (no partial results).
README fetch returns 404                   | Record readme = None; continue processing other artefacts.
README fetch returns non-404 error         | Propagate as Collector Error.
Issues request fails                       | Record issues = None; continue processing other artefacts.

Errors are not silently ignored, except in the two explicitly graceful cases above (README missing, issues unavailable).
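
One way to encode these rules is a small wrapper per request type. The sketch below is illustrative only: ApiError with a status attribute is an assumed shape for client errors, CollectorError is modelled as a plain exception class, and the helper names are hypothetical.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ApiError(Exception):
    """Assumed client-side error carrying the HTTP status code."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


class CollectorError(Exception):
    """Raised when collection must be aborted."""


def fetch_listing(fetch: Callable[[], T], what: str) -> T:
    # Group and project listings: any failure aborts the whole collection.
    try:
        return fetch()
    except ApiError as exc:
        raise CollectorError(f"{what} listing failed: {exc}") from exc


def fetch_readme(fetch: Callable[[], str]) -> Optional[str]:
    # 404 is the graceful "README missing" case; anything else propagates.
    try:
        return fetch()
    except ApiError as exc:
        if exc.status == 404:
            return None
        raise CollectorError(f"README fetch failed: {exc}") from exc


def fetch_issues(fetch: Callable[[], list]) -> Optional[list]:
    # A failed issues request is recorded as None; processing continues.
    try:
        return fetch()
    except ApiError:
        return None
```

Only ApiError is caught here, so programming errors such as a TypeError still surface unchanged rather than being mistaken for a graceful case.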


3. Concurrency & Rate Limits

  • Implementations may perform project-level fetches in parallel, but must honour the rate-limit handling strategy defined in the API client.

  • Parallelism must not reorder the final output; order is defined by the input discovery sequence (§1, step 5).
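
A simple way to satisfy the ordering rule is to let the executor return results in submission order. The sketch below uses Python's standard ThreadPoolExecutor, whose map method yields results in input order regardless of completion order. The fetch_project_row function is a hypothetical stand-in for the per-project README and issues requests, and rate-limit handling is assumed to live inside the API client, so it is not shown.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_project_row(project_id: int) -> dict:
    # Stand-in for the real per-project fetches. Lower ids sleep longer,
    # so tasks complete in the reverse of the order they were submitted.
    time.sleep(0.01 * (5 - project_id))
    return {"project_id": project_id}


def collect_rows(project_ids: list[int], max_workers: int = 4) -> list[dict]:
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, not completion order.
        # An exception raised in a worker is re-raised here when its result
        # is consumed, which aborts the collection.
        return list(pool.map(fetch_project_row, project_ids))


ordered = collect_rows([1, 2, 3, 4])
```

Collecting with as_completed instead would require sorting by the original index afterwards, since it yields futures as they finish.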


4. Output Contract

  • Returns an ordered, in-memory collection of overview rows.

  • None always means "feature not available" or "not found". Empty values (such as "" or []) indicate that the item exists in the API but is empty.

  • No persistence or caching is performed at this layer.

  • Consumers apply sorting and grouping according to their own needs (see Table Sorting).
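
On the consumer side, the None-versus-empty distinction means a plain truthiness check is not enough, because None and [] are both falsy. A minimal sketch of how a consumer might tell them apart (describe_issues is a hypothetical helper, not part of this spec):

```python
from typing import Optional


def describe_issues(issues: Optional[list]) -> str:
    # Check identity with None first: None and [] are both falsy,
    # but they mean different things under the output contract.
    if issues is None:
        return "issue tracking unavailable"
    if not issues:
        return "no issues"
    return f"{len(issues)} issue(s)"
```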


5. Non-Goals

  • Command-line parsing, configuration merging, or environment handling (see Settings).

  • Rendering concerns of any kind – these are addressed by higher-level specs.