This is an umbrella ☂️ task for the upcoming Northward Switchover.
As of Sept 2023, switchovers take place on predictable dates: the work week of the Solar Equinox.
Important Dates:
- Services: Tuesday, 19 March 2024 @14:00 UTC
- Traffic: Tuesday, 19 March 2024 @14:00 UTC
- MediaWiki: Wednesday, 20 March 2024 @14:00 UTC
- Deployment server: Thursday, 21 March 2024
- codfw repool: Thursday, 27 March 2024 @14:00 UTC
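As a rough illustration of the "work week of the equinox" rule, the sketch below derives the Monday-to-Friday week that contains a given equinox date. The function name `switchover_week` is hypothetical, and the equinox date is passed in by hand (assumed here to be 20 March 2024 UTC) rather than computed astronomically.

```python
from datetime import date, timedelta


def switchover_week(equinox: date) -> dict[str, date]:
    """Return the Monday-to-Friday work week that contains the equinox date."""
    # date.weekday() returns 0 for Monday, so this steps back to that week's Monday.
    monday = equinox - timedelta(days=equinox.weekday())
    names = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
    return {name: monday + timedelta(days=i) for i, name in enumerate(names)}


# Assumption: the March 2024 equinox falls on 20 March 2024 (UTC).
week = switchover_week(date(2024, 3, 20))
for day in ("Tuesday", "Wednesday", "Thursday"):
    print(day, week[day].isoformat())
```

For the assumed equinox date, Tuesday, Wednesday and Thursday of that week line up with the Services/Traffic, MediaWiki and Deployment server dates listed above.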
Day 1 issues:
- Kartotherian started running out of resources, so we had to repool kartotherian on codfw and restart the service on both datacentres
- Thumbor was using swift.discovery.wmnet, so Thumbor on codfw was trying to reach Swift on eqiad with codfw's credentials, causing a large number of 401s.
- mw-on-k8s started working harder than usual, which was expected since we turned off multi-DC. We added more resources to be on the safe side: 53 replicas to mw-web and 10 to mw-api-ext.
- By unfortunate coincidence, changeprop was overwhelmed for unrelated reasons around the time of the services switchover, causing jobs to pile up
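To make the Thumbor/Swift issue from Day 1 concrete, here is a toy model of the failure mode. It is not the real Thumbor or Swift configuration: the credential values and both function names are hypothetical, and it assumes that each datacentre's Swift accepts only its own credentials while the discovery name points at whichever datacentre is active.

```python
# Toy model only; credential values are hypothetical placeholders.
# Assumption: each datacentre's Swift accepts only its own credentials.
SWIFT_ACCEPTS = {"eqiad": "eqiad-creds", "codfw": "codfw-creds"}


def swift_status(target_dc: str, creds: str) -> int:
    """Return 200 if the credentials match the target Swift cluster, else 401."""
    return 200 if SWIFT_ACCEPTS[target_dc] == creds else 401


def thumbor_request(local_dc: str, discovery_dc: str, use_discovery: bool) -> int:
    """Simulate a Thumbor request to Swift, always sent with the local DC's credentials."""
    # With the discovery name, the target follows the active DC, not the local one.
    target_dc = discovery_dc if use_discovery else local_dc
    return swift_status(target_dc, SWIFT_ACCEPTS[local_dc])


# Thumbor on codfw, discovery name pointing at eqiad: credentials do not match.
print(thumbor_request("codfw", "eqiad", use_discovery=True))
# Same Thumbor talking to its local Swift: credentials match.
print(thumbor_request("codfw", "eqiad", use_discovery=False))
```

In this model the mismatch only appears when the discovery name and the local datacentre differ, which is the situation a switchover creates.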
Day 2 issues:
- While stopping all maintenance scripts (01-stop-maintenance), we found a user-triggered script still running. We killed it manually and continued the process
Day 3 issues:
- We switched to deploy1002.eqiad.wmnet without any issues.