New submissions (CLI or editor) should now work. Old solutions need to be reprocessed, but my bastion isn’t up yet and it’s nearly 3am, so I’m going to go to bed and rerun them in the morning.
Sorry that took a bit longer than expected!
There are also a couple of other bits that won’t be working yet (e.g. the LOC counter), but those should be quick fixes in the morning.
The upside is that the changes I’ve made today save Exercism nearly $7k/year. So hopefully it’s a little pain for some big gains.
I haven’t been able to get any Gleam tests to run via CLI or web. I switched to Python to see if it was something wrong on my end, but those tests seem to run.
Anyone else having issues with Gleam? Could it be related to this?
I suspect the test runner might be too slow. Gleam version updated · exercism/gleam-test-runner@ad86fc9 · GitHub shows about 40 seconds to run the tests. Most of the tests should fail pretty quickly, so that run time is mostly made up of two of the test suites. Test runners only get 20 seconds to test student code, so that’s a bit dicey.
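For a rough feel of that budget locally, something like this works (just an illustration; the real harness doesn’t use `timeout`, and `gleam test` here is the standard Gleam CLI run from a project directory):

```bash
# Approximate the runner's 20-second budget on your own machine.
# GNU timeout kills the command if it runs longer than 20s (exit code 124).
timeout 20s gleam test || echo "tests failed or exceeded the 20s budget"
```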
BEAM tracks were offline but are working again now, so Gleam should be good. Sorry about that, and thanks for reporting! I wouldn’t have known otherwise.
Do y’all have some kind of health or metrics dashboard you use for the platform? I’m in platform engineering myself, so I’m always curious about these things.
BEAM languages rely on an internal network that has no external network access (`docker network create --internal internal`). If you give them no network at all, they hang for about 20s before running, so you have to give them a network connection, even if it’s a dead one.
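In shell terms it’s roughly this (the image name and command are placeholders, not the actual test-runner invocation):

```bash
# Create a network with no route to the outside world. --internal blocks
# external traffic, but containers attached to it still get a network
# interface, which (per the above) avoids the ~20s startup hang.
docker network create --internal internal

# Attach a tooling container to that dead network (placeholder image/command).
docker run --rm --network internal exercism/gleam-test-runner bin/run.sh
```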
The new setup didn’t run the script that creates that network, so none of the new tooling machines had the network, so all the runs errored with a docker “no network called internal” or similar message. This is the commit; it just moved the relevant code to a place where it actually got called.
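The fix boils down to making sure that network exists before any tooling runs, something along these lines (a sketch of the idea, not the literal commit):

```bash
# Idempotently ensure the internal network exists on each new tooling machine.
# `docker network inspect` exits non-zero if the network is missing,
# in which case we create it; re-running this on every boot is harmless.
docker network inspect internal >/dev/null 2>&1 \
  || docker network create --internal internal
```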
This has been broken for about 2 years, but the old AMIs had already had a BEAM language run through them when I made them, so the network already existed on them and the error didn’t materialise until I made a new AMI from scratch yesterday.
No. Basically nothing breaks unless I do something, or we get attacked, in which case the “break” is a very slow site, which I pretty much get told about immediately by our users. Then I dig into metrics/logs on AWS to work out what’s going on. But the whole setup is very self-healing/scaling, so as we get increases in traffic, or machines fall over, things just heal themselves. I barely look at the ops side of things as a result. But this week we’d been hit really hard by bots, which was overloading everything, so I spent the week changing things and deploying most of it yesterday.
These were the changes I ran yesterday, along with this PR and the ones referenced in its description.
Oh wow, those are pretty big changes! How do y’all test big changes like that? Do you have something like an infrastructure dev or staging environment?
I guess in a way @BNAndras, you were right! A network timeout did slow the tests, haha.
It helps that I’ve written every line of that Terraform and the vast majority of the backend part of Exercism, so I have a really intimate understanding of how everything fits together. And 30 years of pain from making mistakes, which has honed my knowledge of the mistakes I’m likely to make!
Annoyingly, I’d have had only about 20 mins of downtime yesterday, but I just couldn’t get the Dockerfiles to build on GitHub Actions. They just timed out every time. Eventually, after 45 mins, one finally worked, and then it was smooth sailing. But that was the one bit I struggle with!
That’s honestly so cool! You’ve been in tech for 30 years; that’s actually longer than I’ve been alive.
I’m just 18, still figuring things out, but super inspired by that!
Yeah, I started aged 8. Started selling websites aged 14. Started running businesses aged 23. And now I’m 42. So I’ve been doing this for a while, and built a lot of websites from scratch over that time. So I’ve gone from old-school shared PHP servers to complex micro-service, autoscaling AWS. The secret is to try and keep being curious, keep having fun, and keep staying focused on the problem, not the shiniest new tech!
Honestly, I feel the same way. It’s one thing when you’re in a company working with others. It’s another when you are “the guy (or gal)” who holds up everything yourself. It’s not worth adding all of the overhead and cost of a bunch of tools or processes.
This is the problem we have btw. We just keep getting DDoS’d constantly atm. This is the forum. So normally basically very little traffic. Maybe 200 reqs/s. Then suddenly we get hit by hundreds of IPs at 1.4k reqs/s. Our forum servers are just too small to deal with that, so we 429 constantly. It lasts for a minute, then they stop the attack. So infuriating.
The website one was more extreme (like 50-100k reqs/s), but those webservers can handle that better. It’s still infuriating and expensive for us though!
Maybe. ChatGPT etc. would act responsibly and not flood sites, but other people might do otherwise. But I’m not convinced this forum has anywhere near enough pages to sustain 1.5k reqs/s for a few minutes. It probably only has 1k topics in total. So it feels somewhat malicious.