Cloud and AI practitioner. Founder of Perpetual Squared.
I help organisations cut through hype and build AI and cloud systems that actually work. I'm also the type of guy to spend a weekend figuring out how to run a 14B LLM on a laptop with soldered RAM and 8 GB of VRAM, just because it's an interesting problem to solve.
- Cloud architecture: AWS, GCP, Azure
- AI/LLM systems and agent frameworks
- Platform engineering and DevSecOps
- MLOps and AI infrastructure
- Enterprise architecture and cost optimisation at scale
I work across a variety of domains professionally, but in my spare time I've been looking into ways around some hardware limitations I've run into:
- nbd-vram - CUDA daemon that backs an NBD block device with GPU VRAM. Turns your graphics card into a 1.3 GB/s swap device. No kernel module, no P2P API - just `dlopen(libcuda.so)` and a Unix socket.
- llm-fit - LD_PRELOAD hook that redirects `cudaMalloc` to `cudaMallocManaged`. Hypothesis: models slightly over VRAM capacity load fully on GPU via CUDA Unified Memory. Finding: works in a narrow ~200-300 MB sweet spot. Beyond that, PCIe page migration is slower than Ollama's native CPU split.
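
For readers unfamiliar with the LD_PRELOAD trick behind llm-fit, here is a minimal sketch of the general technique. It is not the actual llm-fit source. It assumes the target program links the CUDA runtime dynamically (a statically linked runtime would bypass the hook), and it declares stand-in types and constants so it builds without `cuda_runtime.h`. The names `managed_alloc_fn`, `CUDA_MEM_ATTACH_GLOBAL` and `CUDA_ERROR_MEMORY_ALLOCATION` are made up for illustration.

```c
#define _GNU_SOURCE            /* needed for RTLD_NEXT on glibc */
#include <dlfcn.h>
#include <stddef.h>

/* Stand-ins so the sketch compiles without the CUDA headers (assumptions). */
typedef int cudaError_t;                  /* an enum in the real headers */
#define CUDA_MEM_ATTACH_GLOBAL 0x01u      /* mirrors cudaMemAttachGlobal */
#define CUDA_ERROR_MEMORY_ALLOCATION 2    /* mirrors cudaErrorMemoryAllocation */

/* Shape of cudaMallocManaged as seen from C: pointer, size, attach flags. */
typedef cudaError_t (*managed_alloc_fn)(void **, size_t, unsigned int);

/*
 * Exported under the same name as the runtime's cudaMalloc. When this
 * shared object is preloaded, the dynamic linker resolves calls to
 * cudaMalloc here before it reaches the real CUDA runtime.
 */
cudaError_t cudaMalloc(void **dev_ptr, size_t size)
{
    static managed_alloc_fn real_managed = NULL;

    /* Look up the real cudaMallocManaged in the libraries loaded after us. */
    if (real_managed == NULL) {
        real_managed = (managed_alloc_fn)dlsym(RTLD_NEXT, "cudaMallocManaged");
    }

    /* No CUDA runtime visible: report an allocation failure. */
    if (real_managed == NULL) {
        return CUDA_ERROR_MEMORY_ALLOCATION;
    }

    /* Hand the request to Unified Memory instead of a plain device allocation. */
    return real_managed(dev_ptr, size, CUDA_MEM_ATTACH_GLOBAL);
}
```

Built as a shared object (for example `gcc -shared -fPIC -o libhook.so hook.c -ldl`, where both file names are placeholders) and loaded with `LD_PRELOAD=./libhook.so`, every `cudaMalloc` call in the target process goes through Unified Memory, so pages can spill to host RAM and migrate over PCIe on demand. That migration cost is what limits the approach to the narrow sweet spot described above.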
No-fluff takes on cloud strategy, AI architecture, and the decisions that separate good systems from great ones.





