I’m porting Jonas Wagner’s excellent smartcrop.js, which analyzes an image to recommend a good crop, from JavaScript to Java.
When porting code, I always try to understand it along the way so that if (when) there are bugs, I’ll have an idea of where the bug might be. I used ChatGPT 4 to untangle a particularly terse bit of code, and was — frankly — shocked at how well it did.
The Original Code
Among many other heuristics, smartcrop applies the Rule of Thirds to find a good crop of an image quickly and reliably. It implements this rule with the following code:
// Gets value in the range of [0, 1] where 0 is the center of the pictures
// returns weight of rule of thirds [0, 1]
function thirds(x) {
  x = (((x - 1 / 3 + 1.0) % 2.0) * 0.5 - 0.5) * 16;
  return Math.max(1.0 - x * x, 0.0);
}
Now, I’m no savant, but I have a degree in CS and have been around the metaphorical block a few times, so I like to think I’m smarter than the average bear at reading code and math. That said, I found this code to be completely inscrutable, and how — even if — it implements a heuristic for the rule of thirds completely escaped me.
When I’m faced with dense numerical code that I can’t grok, I usually break out a Jupyter notebook and plug and chug until I understand what’s going on. It works, but it takes time and effort, and I’m just porting a library.
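For reference, a plug-and-chug pass of that kind can be as small as tabulating the function over its stated input range. The sketch below is a direct Java transcription of thirds(); the class name ThirdsTable and the step size of 1/24 are illustrative choices of mine, not part of smartcrop.js.

```java
import java.util.Locale;

final class ThirdsTable {
    // Direct Java transcription of the JavaScript thirds() shown above.
    static float thirds(float x) {
        x = (((x - 1.0f / 3.0f + 1.0f) % 2.0f) * 0.5f - 0.5f) * 16.0f;
        return Math.max(1.0f - x * x, 0.0f);
    }

    public static void main(String[] args) {
        // Sample [0, 1] in steps of 1/24. The step size is an arbitrary
        // choice that happens to land on x = 8/24 = 1/3.
        for (int i = 0; i <= 24; i++) {
            float x = i / 24.0f;
            System.out.printf(Locale.ROOT, "x = %.4f  thirds(x) = %.4f%n", x, thirds(x));
        }
    }
}
```

Reading down the output, the score is 0 over most of the range and climbs to 1 only in a narrow band around x = 1/3.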
So this time, I asked my good buddy, ChatGPT, for some help reverse engineering the code.
Enter ChatGPT
I had the following (slightly abridged) conversation about the above code with ChatGPT:
ANDY
This Java method is supposed to provide a heuristic value for the rule of thirds when cropping images. Can you help me unpack if it works, and how?
private float thirds(float x) {
    x = (((x - 1.0f / 3.0f + 1.0f) % 2.0f) * 0.5f - 0.5f) * 16.0f;
    return Math.max(1.0f - x * x, 0.0f);
}

CHATGPT
The rule of thirds is a principle in photography and visual arts that suggests that an image should be divided into nine equal parts by two equally spaced horizontal lines and two equally spaced vertical lines. Important compositional elements should be placed along these lines or at their intersections to create more tension, energy, and interest in the composition compared to simply centering the subject.
The thirds function you've provided seems to be a heuristic designed to quantify how well a particular point (given by its x coordinate) aligns with the rule of thirds. The function manipulates the x coordinate in a way that likely aims to give higher values (closer to 1) when x is near the thirds (either 1/3 or 2/3 of the way across the frame), and lower values (closer to 0) as x moves away from these points.
Let's break down the function step-by-step:
x - 1.0f / 3.0f + 1.0f: This shifts the x value so that one of the thirds (specifically, the 1/3 mark) is moved to the origin (0 point), and then adds 1 to ensure the value is positive.
((... % 2.0f) * 0.5f - 0.5f): The modulo operation with 2 ensures that the result wraps around every 2 units, creating a repeating pattern. This is scaled down by 0.5 and then shifted by -0.5 to center the peak of the pattern at 0. The pattern repeats every 2 units, so there will be peaks at positions corresponding to the rule of thirds (1/3 and 2/3) due to the initial shift and wrap-around effect.
* 16.0f: This scales the pattern up, increasing the sensitivity of the function around the thirds. The choice of 16 is likely arbitrary and could be adjusted based on how sharply the function should penalize deviations from the rule of thirds.
Math.max(1.0f - x * x, 0.0f): Finally, the function squares the scaled value (increasing the penalty for being away from the thirds), subtracts it from 1 (inverting the curve so that values near the thirds are higher), and ensures that the result is not negative (using Math.max with 0).
In essence, this function is designed to produce a bell-shaped curve that peaks at points corresponding to the rule of thirds, and the curve's width and height are manipulated through the scaling and shifting operations. The closer an element is to a third, the higher the heuristic value returned by the function, with the value decreasing quadratically as the element moves away from the third.
This heuristic could be used in image cropping algorithms to score potential crops based on how well they align with the rule of thirds, favoring compositions where important elements are near these thirds lines.

ANDY
This is an outstanding description. Can you rewrite the above method so that the intent is clear? Using this as commenting throughout the function would be useful.

CHATGPT
Certainly! I'll rewrite the method with more descriptive variable names and include comments to explain each step of the process. This should make the intent and functionality of the method clearer:
/**
 * Calculates a heuristic value for alignment with the rule of
 * thirds. This function evaluates how closely a given point (x)
 * aligns with the rule of thirds, returning a higher value for
 * points closer to the thirds lines (1/3 and 2/3) and lower values
 * for points further away. The function uses a bell-shaped curve
 * that peaks at the thirds lines and decreases quadratically with
 * distance from these points.
 *
 * @param x The normalized position of a point within a frame, where
 * 0 <= x <= 1.
 * @return A heuristic value indicating the alignment with the rule
 * of thirds, where 1 is perfectly aligned, and 0 indicates no
 * alignment.
 */
private float evaluateRuleOfThirds(float x) {
    // Shift x so that 1/3 aligns with 0 and wrap around every 2
    // units to create a repeating pattern
    float shiftedAndWrappedX = (((x - 1.0f / 3.0f + 1.0f) % 2.0f) * 0.5f - 0.5f);
    // Scale the pattern to increase sensitivity around the thirds
    float scaledX = shiftedAndWrappedX * 16.0f;
    // Apply a quadratic function to create a bell-shaped curve that
    // peaks at the thirds and ensure the result is non-negative
    float heuristicValue = Math.max(1.0f - scaledX * scaledX, 0.0f);
    return heuristicValue;
}

ANDY
In the above method, can you break the expression for shiftedAndWrappedX into multiple steps, each with their own comments? The Javadoc and everything else already looks great.

CHATGPT
Sure, I'll break down the expression for shiftedAndWrappedX into multiple steps with individual comments to further clarify each part of the process:
/**
 * Calculates a heuristic value for alignment with the rule of
 * thirds. This function evaluates how closely a given point (x)
 * aligns with the rule of thirds, returning a higher value for
 * points closer to the thirds lines (1/3 and 2/3) and lower values
 * for points further away. The function uses a bell-shaped curve
 * that peaks at the thirds lines and decreases quadratically with
 * distance from these points.
 *
 * @param x The normalized position of a point within a frame, where
 * 0 <= x <= 1.
 * @return A heuristic value indicating the alignment with the rule
 * of thirds, where 1 is perfectly aligned, and 0 indicates no
 * alignment.
 */
private float evaluateRuleOfThirds(float x) {
    // Step 1: Shift 'x' so that the 1/3 mark aligns with 0
    float shiftedX = x - 1.0f / 3.0f;
    // Step 2: Add 1 to ensure the value is positive (necessary for
    // the modulo operation)
    float positiveShiftedX = shiftedX + 1.0f;
    // Step 3: Apply modulo 2 to create a repeating pattern every 2
    // units
    float wrappedX = positiveShiftedX % 2.0f;
    // Step 4: Scale and shift the wrapped value to center the peak
    // of the bell curve at 0. The '* 0.5f' scales the pattern down,
    // and the '- 0.5f' shifts the peak to 0
    float shiftedAndWrappedX = (wrappedX * 0.5f) - 0.5f;
    // Step 5: Scale the pattern to increase sensitivity around the
    // thirds
    float scaledX = shiftedAndWrappedX * 16.0f;
    // Apply a quadratic function to create a bell-shaped curve that
    // peaks at the thirds and ensure the result is non-negative
    float heuristicValue = Math.max(1.0f - scaledX * scaledX, 0.0f);
    return heuristicValue;
}
The Revised Code
That was surprisingly helpful. It picked up the quadratic nature of the algorithm and identified that the term 16.0f is an arbitrary weight used to control the shape of the bell curve, and then explained it clearly and concisely. And it did it all in less time than I would have taken mucking about in Jupyter.
Hopefully the reader agrees that the resulting code is clearer:
/**
 * Calculates a heuristic value for alignment with the rule of thirds.
 * This function evaluates how closely a given point (x) aligns with the
 * rule of thirds, returning a higher value for points closer to the
 * thirds lines (1/3 and 2/3) and lower values for points further away.
 * The function uses a bell-shaped curve that peaks at the thirds lines
 * and decreases quadratically with distance from these points.
 *
 * @param x The normalized position of a point within a frame, where
 * 0 <= x <= 1.
 * @return A heuristic value indicating the alignment with the rule of
 * thirds, where 1 is perfectly aligned, and 0 indicates no alignment.
 */
private float evaluateRuleOfThirds(float x) {
    // Step 1: Shift 'x' so that the 1/3 mark aligns with 0
    float shiftedX = x - 1.0f / 3.0f;
    // Step 2: Add 1 to ensure the value is positive (necessary for the
    // modulo operation)
    float positiveShiftedX = shiftedX + 1.0f;
    // Step 3: Apply modulo 2 to create a repeating pattern every 2 units
    float wrappedX = positiveShiftedX % 2.0f;
    // Step 4: Scale and shift the wrapped value to center the peak of the
    // bell curve at 0. The '* 0.5f' scales the pattern down, and the
    // '- 0.5f' shifts the peak to 0
    float shiftedAndWrappedX = (wrappedX * 0.5f) - 0.5f;
    // Step 5: Scale the pattern to increase sensitivity around the thirds
    float scaledX = shiftedAndWrappedX * 16.0f;
    // Step 6: Apply a quadratic function to create a bell-shaped curve
    // that peaks at the thirds and ensure the result is non-negative
    float heuristicValue = Math.max(1.0f - scaledX * scaledX, 0.0f);
    return heuristicValue;
}
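A refactor like this is only useful if it preserves behavior, so it is worth checking the step-by-step version against the original one-liner. The following is a minimal sketch of such a check; the class name ThirdsEquivalence, the helper maxDifference, and the sample count are illustrative assumptions of mine, not part of smartcrop.js or of the conversation.

```java
final class ThirdsEquivalence {
    // The original one-line formulation, transcribed to Java.
    static float thirds(float x) {
        x = (((x - 1.0f / 3.0f + 1.0f) % 2.0f) * 0.5f - 0.5f) * 16.0f;
        return Math.max(1.0f - x * x, 0.0f);
    }

    // The step-by-step rewrite, with the same operations in the same order.
    static float evaluateRuleOfThirds(float x) {
        float shiftedX = x - 1.0f / 3.0f;
        float positiveShiftedX = shiftedX + 1.0f;
        float wrappedX = positiveShiftedX % 2.0f;
        float shiftedAndWrappedX = (wrappedX * 0.5f) - 0.5f;
        float scaledX = shiftedAndWrappedX * 16.0f;
        return Math.max(1.0f - scaledX * scaledX, 0.0f);
    }

    // Largest absolute difference between the two versions over
    // evenly spaced samples in [0, 1].
    static float maxDifference(int samples) {
        float worst = 0.0f;
        for (int i = 0; i <= samples; i++) {
            float x = i / (float) samples;
            worst = Math.max(worst, Math.abs(thirds(x) - evaluateRuleOfThirds(x)));
        }
        return worst;
    }

    public static void main(String[] args) {
        System.out.println("max difference: " + maxDifference(10_000));
    }
}
```

This only confirms that the two versions agree with each other. It says nothing about whether the comments describe the math accurately, which still needs a human read.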
In a code review setting, I would have rejected the original code for being too hard to understand, and I would have been very pleased with ChatGPT’s rewrite. Of course, this is just one small example, but it’s clear to me that ChatGPT-style conversational models have a role in current and future developer workflows.
ChatGPT Subscription
As /u/TokyoOldMan on Reddit pointed out, it would be useful to know exactly which tool(s) I used to do this work!
I used my bog-standard $20/mo plan on OpenAI, which grants me access to ChatGPT 4. It’s possible one could have done this with a free plan, but I haven’t tested. Also, I’m not sure how ChatGPT 4 stacks up against ChatGPT 3.x in these scenarios, but I’d be very curious to hear from someone with more experience than me!
Update: Conclusions
Following a spirited discussion on Reddit, it’s clear that I didn’t do a good enough job of contextualizing this task, what I hoped ChatGPT would deliver, and my opinions of what it finally did deliver.
First, while I chose a dense bit of mathematical code on purpose, I chose a poor example. While the code “works” in the final analysis, as evidenced by the clear “rule of thirds” grid on the diagnostic image at the head of this post, the code is structured oddly, in that it assumes it is called with coordinates from an odd coordinate system that is not clear from the function itself or its documentation. Also, as /u/SLiV9 pointed out, not all of the terms in the math are required! So this snippet I picked ended up being a poor example because the quality of the original code itself distracts from the discussion about AI.
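To make the point about unnecessary terms concrete, here is one possible reduction. It may or may not be the one /u/SLiV9 had in mind. For 0 <= x <= 1, the quantity x - 1/3 + 1 lies between 2/3 and 5/3, so the modulo by 2 never changes it. What remains is ((x - 1/3 + 1) * 0.5 - 0.5) * 16 = 8 * (x - 1/3), and squaring that gives 64 * (x - 1/3)^2. The sketch below assumes the input really is confined to [0, 1]; the names ThirdsSimplified, thirdsSimplified, and maxDifference are hypothetical.

```java
final class ThirdsSimplified {
    // The original formulation, transcribed to Java.
    static float thirds(float x) {
        x = (((x - 1.0f / 3.0f + 1.0f) % 2.0f) * 0.5f - 0.5f) * 16.0f;
        return Math.max(1.0f - x * x, 0.0f);
    }

    // Hypothetical simplification, valid only under the assumption
    // that 0 <= x <= 1, where the "+ 1" and "% 2" have no effect.
    static float thirdsSimplified(float x) {
        float d = x - 1.0f / 3.0f; // signed distance from the 1/3 mark
        return Math.max(1.0f - 64.0f * d * d, 0.0f);
    }

    // Largest absolute difference between the two over [0, 1].
    static float maxDifference(int samples) {
        float worst = 0.0f;
        for (int i = 0; i <= samples; i++) {
            float x = i / (float) samples;
            worst = Math.max(worst, Math.abs(thirds(x) - thirdsSimplified(x)));
        }
        return worst;
    }

    public static void main(String[] args) {
        System.out.println("max difference: " + maxDifference(10_000));
    }
}
```

Written this way, it is easier to see that the score is nonzero only when x is within 1/8 of 1/3, and that the single peak at 1/3 is measured in whatever coordinate system the caller uses.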
Next, while I think that the contributions ChatGPT made to deciphering and documenting the code were useful, I also wasn’t clear on how they were useful. In my opinion, ChatGPT’s code is clearly a much better submission to a code review than the original code. However, that’s different from saying that ChatGPT’s code would have been accepted by the code review, and I didn’t make that distinction in the original writing. Rather, it’s simply a comment on how easy or hard the original code versus ChatGPT’s code would be to review, and ChatGPT’s code would be much easier to review.
In any case, after seeing ChatGPT in action on this code, it is my strong belief that ChatGPT (or a similar model) has a role in modern software development workflows, and in my opinion this example bears that out. And that’s ultimately what I was trying to communicate.
Or you could google for the comment and find one of the many places the function has been recopied and reused. The first example I saw is an app called Phototonic, but it popped up in lots of places.
https://github.com/oferkv/phototonic/blame/master/SmartCrop.cpp#L162-L167