On writing small software

I’m a professional software developer.¹ I get paid to understand customer and business needs and come up with a software solution that fulfills them, while being secure, correct, maintainable, and scalable.

But I am also a hobby software developer. And for some time I forgot about the freedoms that the latter gives you. This is the story of how I rediscovered some of the little joys of ~~writing software~~ creating something.

The professional programmer’s mindset

When writing code in a professional setting, there are things to focus on beyond artistic expression, artisanal satisfaction, and the pride of a craftsperson.

While the software specifications have to be met, you also have to think about how your code will fare in the inevitable code review. Don’t get me wrong, I think code review is great. Together with version control and unit testing, it is one of the few practices that make me comfortable calling our profession software engineering instead of software development/programming/coding/scripting. But while code review does increase code quality, it also makes you more conservative in your approach.
Similarly, I think code style enforcement (linting, or full-on code formatters) is great for collaboration on code. I do enjoy how, with the press of a button, it can bring order to the mangled mess of code that I just wrote, but the result is not exactly… personal or fun.
Finally, there is the bus factor, or, more positively, the lottery factor, or, more likely, the job change factor: What happens to this code once the person who wrote it is, for whatever reason, no longer around? I think about this more than might be healthy. “What will the person who has to look at this code once I am gone think of me?”

All of these things should be on your mind when writing code in a professional setting. But I had them weighing on my shoulders even when writing code just for fun. Am I just too professional? /s

The revelation, or, how would I do this?

At the start of this year, a good colleague of mine sent me a link to a blog post by Tom Forbes titled “I scanned every package on PyPi and found 57 live AWS keys”. This is obviously relevant to my interests, so I immediately started thinking about how I would design such a system, while my cursor was still on its way to click the link.

Think about how you would design such a system and compare!

I came up with a pretty standard solution, at least for me:

Python as my language of choice
A cron job to query the Python Package Index (PyPI) API
Some job queue, maybe Redis, to fetch and analyze packages
PostgreSQL to keep track of scanned versions and findings

Nothing too crazy, because we don’t need to store the actual packages. Right?

But reading Tom’s blog post made me see that solution in another light. Because his (idealized) solution consists conceptually of these three lines of bash:

git clone https://github.com/orf/pypi-data.git
fd -t f --min-depth=2 . 'release_data/' | parallel --xargs jq 'to_entries[] | .value.urls[].url' > all_urls.txt
cat all_urls.txt | parallel -I@ "wget @ -O downloads/@ && rg --json -z --multiline $chonky_regex downloads/@" > matches.txt

The script that ended up running on his laptop for 27 hours is more complex, but I still think it’s impressive how simple this is.

This script might not be the most fault-tolerant and “web-scale” solution. But damn is it understandable (and thereby hopefully maintainable). It fits into “human memory”, nay, I want to say, “human registers”, in its entirety. We don’t even need any abstractions to help us conceptualize what is happening, like a “job to download a package” would be. It’s all there, and relatively readable if you are remotely familiar with the command line. The hardest part about this is the regular expression, which admittedly was outsourced as $chonky_regex. And while it is indeed a chonky boi, I have seen way, waay worse.

Write it for yourself

This article got me thinking.

“It’s 10 PM - do you know where your IAM credentials are?”

No, not about my AWS keys or where they currently hang out, but how I tackle my private hobby code:
I seldom write unit tests for my private code that is not intended for public consumption. But more often than I care to admit, I do structure it with an imaginary code reviewer in mind. Or a future open-source contributor who is shaking their head in disgust over my bad abstraction or silly data structure. BUT YOU CAN KEEP SHAKING YOUR HEAD, YOU DON’T EVEN EXIST, STUPID THEORETICAL FUTURE CONTRIBUTOR!

I won’t shake this habit overnight, but I am working on it.

I want to build smaller, less reusable solutions that don’t have to run forever and be fault tolerant to a fault. It’s all about small, ugly, funny, tricky, interesting code instead!

Writing this last paragraph made me realize that my one way out of this professional (not to say enterprise-y) mindset has traditionally been code challenges: be it code golfing like js1k or time-boxed “competitions” like the National Novel Generation Month (NaNoGenMo). Maybe that was one of the reasons I enjoyed them so much!

Embrace the bash

Another takeaway for me was to embrace shell scripting more. I do know my way around the console, but I seldom venture off the beaten path or solve more complex tasks in it: I do know my cd, grep and |, but even just xargs was always kinda sus. More often than not I would resort to higher-level programming languages, and mostly copy/paste and adapt shell scripting solutions I found on the interwebs. But that way, I never remembered them well enough to eventually come up with them by myself from scratch.

But shell scripting is definitely a superpower that I would like to get better at. I am fascinated by the Unix philosophy of writing small, simple programs that can consume the output of other small, simple programs, to form one mighty solution to a problem.
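
To make that idea concrete, here is a minimal sketch that is not taken from any project mentioned in this post: a word-frequency counter glued together from standard tools only. The function name top_words is made up for illustration; tr, grep, sort, uniq, head and awk are the usual Unix utilities.

```shell
#!/usr/bin/env bash
# Hypothetical example: each small program does one thing and hands its
# output to the next one through a pipe.

# top_words N: read text on stdin, print the N most frequent words
# (lowercased) together with their counts, most frequent first.
top_words() {
  tr -cs '[:alpha:]' '\n' |        # turn every run of non-letters into a newline
    grep -v '^$' |                 # drop the empty line a leading non-letter leaves behind
    tr '[:upper:]' '[:lower:]' |   # normalize case
    sort | uniq -c |               # count identical words
    sort -k1,1nr -k2 |             # highest count first, ties alphabetically
    head -n "$1" |                 # keep the top N
    awk '{ print $2, $1 }'         # print as "word count"
}
```

Something like `top_words 10 < notes.txt` (notes.txt being any text file you have lying around) would then list the ten most common words. None of the individual tools knows anything about word statistics; the solution only exists in how they are combined.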

My most recent little hobby project is the RDAP bot, which I already wrote about in my last blog post. It currently consists of 14 lines of bash and stores its state in plain files on the filesystem. It doesn’t have the full functionality that I want it to have, and 14 lines already feel way too complex compared to the 3 lines you saw above, but it is a start! I came across jq, diff, and || before, of course, but this was the first time I got a deeper understanding of their capabilities and limitations. The one thing I dislike about shell scripting is that I always feel like I am halfway into a command injection vulnerability. But I hope that gets better with more experience.
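
As an illustration of the “state in plain files” approach, a minimal sketch could look like the following. It is not the RDAP bot’s actual code; the function name report_change and the idea of one state file per tracked value are assumptions made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the real RDAP bot: notice when a value changes
# between two runs, with a plain file as the only state.

# report_change STATE_FILE NEW_VALUE
# Prints a notice and updates STATE_FILE if NEW_VALUE differs from what is stored.
report_change() {
  local state_file=$1 new_value=$2

  # Create an empty state file on the first run so diff has something to read.
  [ -e "$state_file" ] || : > "$state_file"

  # diff exits 0 when both inputs are identical, so the block after ||
  # only runs when they differ (or when diff itself fails).
  # Every expansion is quoted, so values containing spaces or shell
  # metacharacters are passed along as plain data.
  printf '%s\n' "$new_value" | diff "$state_file" - > /dev/null || {
    echo "changed: $state_file"
    printf '%s\n' "$new_value" > "$state_file"
  }
}
```

Calling `report_change state/example.txt "$some_value"` twice with the same value prints a notice the first time and nothing the second time. The quoting does not make the feeling of being halfway into a command injection go away entirely, but it removes the most common way of getting there.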

Let’s go and write some tiny software.

¹ On top of that, I am also the director of our software development department, which obviously comes with more responsibilities than writing pretty software, but that is beside the point of this article.
