New submissions (CLI or editor) should now work. Old solutions need to be reprocessed, but my bastion isn’t up yet and it’s nearly 3am, so I’m going to go to bed and rerun them in the morning.
Sorry that took a bit longer than expected!
There are also a couple of other bits that won’t be working yet (e.g. the LOC counter), but those should be quick fixes in the morning.
The upside is that the changes I’ve made today save Exercism nearly $7k/year. So hopefully it’s a little pain for some big gains.
I haven’t been able to get any Gleam tests to run via CLI or web. I switched to Python to see if it was something wrong on my end, but those tests seem to run.
Anyone else having issues with Gleam? Could it be related to this?
I suspect the test runner might be too slow. Gleam version updated · exercism/gleam-test-runner@ad86fc9 · GitHub shows about 40 seconds to run the tests. Most of the tests should fail pretty quickly, so that run time is mostly made up of two of the test suites. Test runners only get 20 seconds to test student code, so that’s a bit dicey.
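For a rough feel of that budget locally, something like this works (just an illustration; the real harness doesn’t use `timeout`, and `gleam test` here is the standard Gleam CLI run from a project directory):

```bash
# Approximate the runner's 20-second budget on your own machine.
# GNU timeout kills the command if it runs longer than 20s (exit code 124).
timeout 20s gleam test || echo "tests failed or exceeded the 20s budget"
```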
BEAM tracks were offline but are working again now, so Gleam should be good. Sorry about that, and thanks for reporting! I wouldn’t have known otherwise.
Do y’all have some kind of health or metrics dashboard you use for the platform? I’m in platform engineering myself, so I’m always curious about these things.
BEAM languages rely on an internal network that has no external network access (`docker network create --internal internal`). If you give them no network at all, they hang for about 20s before running, so you have to give them a network connection, even if it’s a dead one.
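In shell terms it’s roughly this (the image name and command are placeholders, not the actual test-runner invocation):

```bash
# Create a network with no route to the outside world. --internal blocks
# external traffic, but containers attached to it still get a network
# interface, which (per the above) avoids the ~20s startup hang.
docker network create --internal internal

# Attach a tooling container to that dead network (placeholder image/command).
docker run --rm --network internal exercism/gleam-test-runner bin/run.sh
```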
The new setup didn’t run the script that creates that network, so none of the new tooling machines had the network, so all the runs errored with a docker “no network called internal” or similar message. This is the commit; it just moved the relevant code to a place where it actually got called.
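The fix boils down to making sure that network exists before any tooling runs, something along these lines (a sketch of the idea, not the literal commit):

```bash
# Idempotently ensure the internal network exists on each new tooling machine.
# `docker network inspect` exits non-zero if the network is missing,
# in which case we create it; re-running this on every boot is harmless.
docker network inspect internal >/dev/null 2>&1 \
  || docker network create --internal internal
```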
This has been broken for about 2 years, but the old AMIs had already had a BEAM language run through them when I made them, so the network already existed on them and the error didn’t materialise until I made a new AMI from scratch yesterday.
No. Basically nothing breaks unless I do something, or we get attacked, in which case the “break” is a very slow site, which I pretty much get told about immediately by our users. Then I dig into metrics/logs on AWS to work out what’s going on. But the whole setup is very self-healing/scaling, so as we get increases in traffic, or machines fall over, things just heal themselves. I barely look at the ops side of things as a result. But this week we’d been hit really hard by bots, which was overloading everything, so I spent the week changing things and deploying most of it yesterday.
These were the changes I ran yesterday, along with this PR and the ones referenced in its description.
Oh wow, those are pretty big changes! How do y’all test big changes like that? Do you have something like an infrastructure dev or staging environment?
I guess in a way @BNAndras, you were right! A network timeout did slow the tests, haha.
It helps that I’ve written every line of that Terraform and the vast majority of the backend part of Exercism, so I have a really intimate understanding of how everything fits together. And 30 years of pain from making mistakes, which has honed my knowledge of the mistakes I’m likely to make!
Annoyingly, I’d have had only about 20 mins of downtime yesterday, but I just couldn’t get the Dockerfiles to build on GitHub Actions. They just timed out every time. Eventually, after 45 mins, one finally worked, and then it was smooth sailing. But that was the one bit I struggle with!
That’s honestly so cool! You’ve been in tech for 30 years; that’s actually longer than I’ve been alive.
I’m just 18, still figuring things out, but super inspired by that!
Yeah, I started aged 8. Started selling websites aged 14. Started running businesses aged 23. And now I’m 42. So I’ve been doing this for a while, and built a lot of websites from scratch over that time. So I’ve gone from old-school shared PHP servers to complex micro-service, autoscaling AWS. The secret is to try and keep being curious, keep having fun, and keep staying focused on the problem, not the shiniest new tech!
Honestly, I feel the same way. It’s one thing when you’re in a company working with others. It’s another when you are “the guy (or gal)” who holds up everything yourself. It’s not worth adding all of the overhead and cost of a bunch of tools or processes.
This is the problem we have btw. We just keep getting DDoS’d constantly atm. This is the forum. So normally basically very little traffic. Maybe 200 reqs/s. Then suddenly we get hit by hundreds of IPs at 1.4k reqs/s. Our forum servers are just too small to deal with that, so we 429 constantly. It lasts for a minute, then they stop the attack. So infuriating.
The website one was more extreme (like 50-100k reqs/s), but those webservers can handle that better. It’s still infuriating and expensive for us though!
Maybe. ChatGPT etc. would act responsibly and not flood sites, but other people might do otherwise. But I’m not convinced this forum has anywhere near enough pages to sustain 1.5k reqs/s for a few minutes. It probably only has 1k topics in total. So it feels somewhat malicious.