(More) Statistics Without the Agonizing Pain: Probability Distributions

One of my favorite conference talks of all time is Statistics Without the Agonizing Pain, by John Rauser, who was at that time head of data science at Pinterest. In this talk, he explains the statistical argument underpinning the Student’s t-Test in simple, approachable terms using an unforgettable example involving mosquitoes and beer. It’s about 15 minutes long, and well worth your time, if you haven’t watched it before.

After watching that video, I realized that—like most things—statistics is complex but ultimately straightforward once you understand the underlying ideas. The problem is that the modern approach to teaching statistics often gets in the way of that understanding. Historically, statistical methods were designed for a world where all computation had to be done by hand, so they were optimized to minimize calculation, not to maximize clarity or intuition. That design choice still has value today—efficient algorithms make modern statistical programs fast and practical. But we continue to teach statistics as if computation were still the bottleneck, even though we now all carry supercomputers in our pockets. Seen in that light, it’s obvious that the way we teach statistics has not kept up with the way we practice statistics.

To that end, I thought I’d write down some things I’ve learned about statistics over the years in a way that I hope is clearer than the average statistics textbook. I’m doing this mostly so I don’t forget them, but also in the hope that they’ll be useful to others.


AWS SageMaker Object Detection Training Gotchas

As part of updates to arachn.io, I’ve started tinkering with object detection machine learning models. During my experiments on AWS SageMaker, I found that Autopilot does not support object detection models, so I had to train using notebooks. As a result, I hit some “gotchas” while fine-tuning TensorFlow Object Detection models. While this notebook works a treat on its own training data (at least when run through SageMaker Studio), this discussion will focus on things I learned while trying to run it on my own data on August 31, 2024.
