The Software Ark: Issue 4
Quote of the week
Success is stumbling from failure to failure with no loss of enthusiasm.
- W. Churchill
Articles
Recovering from Work Stress - HBR
HBR has some good stuff. Stress recovery is a skill and it's a combo of knowing when you need to recover and how. Turns out you're least likely to give time for recovery when you need to - so when you're stressed you end up in a vicious cycle of taking on more work. Some rules of thumb: switch off outside of work, take micro-breaks throughout the day, choose high-effort recovery activities (i.e., not TV which is mindless), and stay connected to nature. A key part of my day is taking a break and playing a quick game of chess - ideally outside and OTB. Turns out that might be what keeps me feeling loose and de-stressed.
For people in leadership roles - it's good to remember that delegation is not abdication. What I've also found useful is Walters' lever of improvement to make sure you're working on high-impact stuff.
The fact that I can develop and deploy an ML model with a few lines of PyTorch is amazing. The growth of low-code data science seems almost inevitable (see this article).
On the other hand - seeing how far AlphaCode has come, it's only a matter of time before we're not writing code as much as instructing an AI on what to build. I wonder if it's appropriate to call that zero-code programming.
I also wonder if building increasingly specialized AI chips might be where the money's at.
If adding someone to treatment A affects the experience of treatment B, the groups are correlated, and A/B testing isn't going to give you worthwhile data. One alternative is a release with "switchbacks" - where you apply treatment A for 30 minutes, and then treatment B, and so on - randomly shuffling the treatments to average out environmental noise. This only really works for non-customer-facing changes (e.g., backend algos). A more powerful technique is "Synthetic Control", where you use a time-series model to predict what the metric would have looked like under the control treatment, and then just cut over to the new treatment. The difference between the prediction and the new ground truth is the average improvement. Since you're not toggling back and forth, there's not that much impact on the customer experience.
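To make the two ideas concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular company runs these experiments: the function names are made up for this example, the metric is one number per time window, and a plain linear trend stands in for whatever time-series model you would really use. It also follows the forecast-and-cut-over description above rather than a full synthetic control setup.

```python
import random
from statistics import mean


def switchback_schedule(n_windows, treatments=("A", "B"), seed=0):
    """Assign one treatment per time window, shuffling the order within each block."""
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_windows:
        block = list(treatments)
        rng.shuffle(block)  # random order inside every block of windows
        schedule.extend(block)
    return schedule[:n_windows]


def switchback_effect(schedule, metric_per_window):
    """Mean metric under treatment B minus mean metric under treatment A."""
    a = [m for t, m in zip(schedule, metric_per_window) if t == "A"]
    b = [m for t, m in zip(schedule, metric_per_window) if t == "B"]
    return mean(b) - mean(a)


def fit_linear_trend(values):
    """Ordinary least squares fit of value = intercept + slope * t, t = 0, 1, 2, ..."""
    ts = list(range(len(values)))
    t_bar, v_bar = mean(ts), mean(values)
    numerator = sum((t - t_bar) * (v - v_bar) for t, v in zip(ts, values))
    denominator = sum((t - t_bar) ** 2 for t in ts)
    slope = numerator / denominator
    return v_bar - slope * t_bar, slope


def cutover_effect(pre, post):
    """Average gap between the observed post-cutover metric and the forecast.

    Assumption: a linear trend fitted on the pre-cutover data is a good enough
    forecast of what the control treatment would have produced.
    """
    intercept, slope = fit_linear_trend(pre)
    predicted = [intercept + slope * (len(pre) + i) for i in range(len(post))]
    return mean([actual - p for actual, p in zip(post, predicted)])


if __name__ == "__main__":
    schedule = switchback_schedule(8)
    metrics = [3.0 if t == "B" else 1.0 for t in schedule]  # toy data
    print(schedule, switchback_effect(schedule, metrics))

    pre = [10 + 2 * t for t in range(10)]          # control-only history
    post = [10 + 2 * t + 5 for t in range(10, 15)]  # after cutting over
    print(cutover_effect(pre, post))
```

With real data you would want a forecast that handles seasonality, and some estimate of forecast error, before trusting the gap as an improvement.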
Going from Platforms to Products
One of the things that I think Amazon did right was its focus on "Customer Obsession" - not what the customer wanted, but what they needed / would be delighted by. This in turn fostered a product mindset in every engineer. If you know that you're being evaluated on how well you delighted the customer, you go out of your way to clear papercuts, and make sure the experience across products is seamless. It also meant that you focused on your end-customer, not your immediate users. AWS' dev-experience kind of sucks - but it's delightful in how featureful and reliable it is. That trumps dev-experience. People don't really ask for that - reliability is a checkbox to most, and people don't evaluate clouds on future features. But that's what they truly needed, even though they wanted better dev-ex.
What I will also say is that the preparedness paradox really comes into play with productizing platforms. It's hard to predict how bad an outage could have been, and using that to decide how much to invest in KTLO/reliability is going to be tough. Moving to a product offering, you need to rethink valuation. A lot of the key deliverables should be invisible to end-users, and unfortunately, they're going to be the projects that aren't sexy and won't sell well to leadership. But remember, internally you can force consumption, but the free market is precisely that: free.
This article goes into a few good KPIs to measure for platform teams.
What an ineffective Staff+ program looks like
I think a lot of this applies to anyone with some level of experience in the field. Everyone needs organizational support.
To the "Yes, Amazon" point - what I will say is that Amazon treated it as a separate job role from standard software development. While PEs / Sr-staff engineers were expected to code, they were given a very wide mandate to do whatever they wanted. It led to some amazing displays of engineering excellence, and provided great indirect mentoring for more Jr-engineers. I'm curious to see how Meta does it.
The more I think of it - the more an apprentice style software guild model makes sense. I know that I learnt a lot from these engineers with 30+ yrs. of exp under their belt, and it's why I've been passionate about giving back as well. But to do that at scale, you need organizational support.
I just joined a new team, so I'm looking forward to getting to know the team dynamics and participating in the team's rituals. It's a new team, so maybe I can sneak some of my own in as well. That said, some of the ones in this article boggle the mind. I love the idea around Shrek ears, but I hope to God I never have to sit around and wait for a block of ice to thaw.
My favorite one is bringing in donuts when you broke prod. Repentance for having kept someone from their well-deserved sleep. The number & quality of the donuts was directly correlated to the degree to which you'd broken prod. I once pushed code on a Friday. For that sin I brought in Dunkin Donuts.
I've always been a fan of documentation. Where others focused on TDD, I bought into DDD - documentation driven design. The key idea being that if writing adds clarity, creating the documentation first before you write any code forces you to refine your thinking to the point where the code naturally flows. In that sense language and its expectations are crucial. Naming something right pays out long-term dividends. Refactorings are clean, a newcomer can immediately understand the codebase, code-search becomes a breeze, etc.
Naming is also one of two hard problems in CS - so this article on linguistic anti-patterns is a welcome breath of fresh air. A related but orthogonal read is this article on managing technical quality in a codebase.
Links
- DeSci - decentralized science
- Ransomware-as-a-service: keep up to date, use zero-trust approaches, and keep backups
- TikTok's browser has a keylogger built-in
- A full-fledged system design course, YMMV - I've not read it yet, but I like collecting them
- I must admit, I love mech keyboards, but this is ugly as sin
- Second order thinking and the market, keep asking "and then what"
- Algos to learn before your next system design interview. Honestly, it's a whole load of BS that you need to learn these before the sys-des interview, but I like algos, so I'm going to take a look.
- Dall-E2 generates some killer thumbnails. I'm using this for my blog as well - got my access and my set of free tokens over the weekend.
- People don't want to work
- Janet Jackson's "Rhythm Nation" contains a resonant frequency that can cause certain older laptop hard drives to fail
- When you don't secure your S3 bucket, terrible things happen
- Negative zero is a thing
- Meta's ESMFold is almost as good, but ~10x as fast at predicting protein structures
- CloudFront supports QUIC
- DDB can import data directly from S3
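
On the negative zero link above: a quick Python sketch of why it's "a thing". This only relies on standard IEEE 754 float behavior and the stdlib `math` module; the helper name `is_negative_zero` is made up for this example.

```python
import math


def is_negative_zero(x):
    """True only for the float -0.0 (compares equal to zero but carries a negative sign)."""
    return x == 0 and math.copysign(1.0, x) < 0


neg_zero = -0.0

# Equality can't tell the two zeros apart...
print(neg_zero == 0.0)

# ...but the sign bit is still there, and some functions care about it.
print(is_negative_zero(neg_zero), is_negative_zero(0.0))
print(math.atan2(0.0, 0.0), math.atan2(0.0, neg_zero))
```

The `atan2` line is where it stops being trivia: the sign of a zero changes which side of the axis you land on.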