CI workflows periodically fail fetching configlet (rate limited)

Occasionally the GH CI Actions fail when running the “fetch configlet” step.

curl fails with an HTTP 403 error but typically succeeds on retry: curl: (22) The requested URL returned error: 403.

I discussed this with @BethanyG and we believe this is likely caused by a GH Rate Limit Exceeded error. (Yes, a rate limit response ought to be 429 but it appears GH uses 403. This is not an Exercism-specific bug with the response codes.)

Some viable options are listed below. If you just want my thoughts on what’s best for now, skip to the last bullet :slight_smile:

  • Do nothing. Maintain the current state. Typically this means having a maintainer manually retry the step.
  • Implement rate limiting across repo CIs. I’m not familiar with GH actions but I suspect synchronizing data across repo actions may be … tricky.
  • Cache the configlet somewhere off GH. This would involve running an additional service/cache, which is an “expensive” route.
  • Change the “fetch” pull model to a push model, e.g. on a configlet update, push the new configlet to all the repos. This avoids the whole rate limit issue but means changes to the configlet require pushing a bunch of PRs and shepherding those across a large number (60? 150?) of repos.
  • Add retry logic to the “fetch configlet” logic (e.g. 3 retries with exponential backoff and jitter; or parse the response and see if it contains a “try again in N seconds”). This isn’t a perfect solution as it will simply fail again, and harder, if we’re consistently running near the limit. However, if we only occasionally get rate limited, this is a relatively simple and low cost solution.
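To make the last bullet concrete, here is a minimal sketch of retrying with exponential backoff and full jitter, preferring a server-provided “try again in N seconds” value when there is one. It is not the existing fetch-configlet code. The names `backoff_delay` and `fetch_with_retry`, the `(status, retry_after, body)` tuple shape, and the default base and cap values are all hypothetical and chosen only for illustration.

```python
import random
import time
from typing import Callable, Optional, Tuple

# Hypothetical shape of one fetch attempt: (HTTP status, retry-after seconds or None, body).
FetchResult = Tuple[int, Optional[float], bytes]


def backoff_delay(
    attempt: int,
    base: float = 1.0,
    cap: float = 30.0,
    retry_after: Optional[float] = None,
    rng: Callable[[], float] = random.random,
) -> float:
    """Return how long to wait before the next attempt (attempt starts at 0)."""
    if retry_after is not None:
        # The server told us how long to wait, so prefer that over guessing.
        return max(0.0, retry_after)
    # Exponential growth, capped, then "full jitter": a random point in [0, ceiling).
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


def fetch_with_retry(
    fetch: Callable[[], FetchResult],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call fetch(), retrying on 403/429 up to max_retries times."""
    for attempt in range(max_retries + 1):
        status, retry_after, body = fetch()
        if status not in (403, 429):
            return status, body
        if attempt == max_retries:
            break  # out of retries, report the last failure
        sleep(backoff_delay(attempt, retry_after=retry_after))
    return status, body
```

Passing `sleep` and `rng` in as parameters keeps the sketch testable without real waiting. As noted in the bullet, this only helps if rate limiting is occasional; if the limit is hit consistently, every retry fails too.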

Tagging @ErikSchierboom @kytrinyx for thoughts.


We have retrying built-in: github-actions/fetch-configlet at main · exercism/github-actions · GitHub

I don’t see us passing in a GitHub token though. Maybe that is something that we could try? It would be great if someone could check if that makes any difference.
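As a rough illustration of what “passing in a GitHub token” could look like, the sketch below builds a request that carries an `Authorization` header only when a token is available. This is not how fetch-configlet is implemented. The helper name `build_request`, the use of the `GITHUB_TOKEN` environment variable, and the releases URL are assumptions made for the example.

```python
import os
import urllib.request
from typing import Optional

# Assumed endpoint for illustration; the real action may fetch from a different URL.
RELEASES_URL = "https://api.github.com/repos/exercism/configlet/releases/latest"


def build_request(url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request, authenticated only if a token is supplied."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # Authenticated requests are counted against the token's own rate limit
        # rather than the shared limit for the runner's IP address.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)


if __name__ == "__main__":
    # Assumption: the workflow exposes the token to the step as GITHUB_TOKEN.
    request = build_request(RELEASES_URL, token=os.environ.get("GITHUB_TOKEN"))
    print("authenticated:", request.has_header("Authorization"))
```

Running the same fetch with and without the token set, and comparing how often it gets a 403, would be one way to check whether it makes a difference.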

Could efficacy not be tested by creating a dummy repo and spamming PRs at it?

I think I encountered this same issue.

It should be easy to test, as GitHub responses contain rate limit headers.
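For anyone who wants to check, here is a small sketch that reads the `x-ratelimit-limit`, `x-ratelimit-remaining` and `x-ratelimit-reset` headers from a response and summarizes them. The function name `rate_limit_status` and the shape of the returned dictionary are hypothetical, and the header values in the usage example are made up.

```python
import time
from typing import Mapping, Optional


def rate_limit_status(headers: Mapping[str, str], now: Optional[float] = None) -> dict:
    """Summarize GitHub rate limit headers from a single response."""
    # Header names are case-insensitive, so normalize before looking them up.
    lowered = {name.lower(): value for name, value in headers.items()}

    def as_int(name: str) -> Optional[int]:
        value = lowered.get(name)
        return int(value) if value is not None else None

    limit = as_int("x-ratelimit-limit")
    remaining = as_int("x-ratelimit-remaining")
    reset = as_int("x-ratelimit-reset")  # UTC epoch seconds when the window resets
    now = time.time() if now is None else now
    return {
        "limit": limit,
        "remaining": remaining,
        "seconds_until_reset": None if reset is None else max(0, int(reset - now)),
        "exhausted": remaining == 0,
    }


if __name__ == "__main__":
    # Made-up header values, only to show the shape of the output.
    sample = {
        "X-RateLimit-Limit": "60",
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": "1700000600",
    }
    print(rate_limit_status(sample, now=1700000000))
```

If a failing “fetch configlet” run shows `remaining` at 0 alongside the 403, that would support the rate limit theory.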

I’ve also seen this sometimes, and not just with configlet. I believe we should make fetch-configlet use a token.

Another possibility is using actions/cache, but I’d argue against that for now. It’d be slightly more complex, and it doesn’t address the root problem - there might be enough cache misses that we still end up hitting rate limits.

I’ve opened fetch-configlet: use a token · Issue #101 · exercism/github-actions · GitHub


We’ve merged a commit that should have fixed this.

Please let us know if you see a similar failure of fetch-configlet during track CI in the future.
