How Non-Programmers Can Direct AI to Do the Work: Extracting Diagrams and Math Formulas from an Image (Hands-On Example)

A hands-on case study showing how non-programmers can direct AI in Codex to recognize formulas, convert them to LaTeX, and align cropped diagrams with formulas, including practical results and limitations.

Prerequisites

Install VS Code and the Codex extension. You can refer to this article.
Using these tools does not require programming experience. In this workflow, VS Code is mainly for organizing files; you describe your goal, and AI writes and runs the code for you.
This post records the full process to spark ideas and help more people discover practical ways to use AI.

Goal

Prepare an image that contains both math formulas and illustrative diagrams.

We want to achieve three things:

  1. Split diagrams into clean, separate images.
  2. Recognize formulas and convert them to LaTeX for easier editing.
  3. Link each diagram to its corresponding formula.

Process

  1. First, recognize the image, generate a Markdown document, and convert formulas to LaTeX

Conversion result:

# 1.png Recognition Result

## Geometry Formulas (LaTeX)

1. Triangle area (side lengths and circumradius):

$$
S = \frac{abc}{4R}
$$

2. Triangle area (inradius and semiperimeter):

$$
S = pr
$$

3. Frustum volume:

$$
V = \frac{\pi\left(r^2 + Rr + R^2\right)h}{3}
$$

4. Frustum lateral surface area:

$$
S = \pi a(R + r)
$$

5. Cylinder volume:

$$
V = \pi r^2 h
$$

6. Cylinder lateral surface area:

$$
S = 2\pi rh
$$

7. Sphere volume:

$$
V = \frac{4}{3}\pi R^3
$$

8. Sphere surface area:

$$
S = 4\pi R^2
$$

9. Law of sines:

$$
\frac{a}{\sin A}=\frac{b}{\sin B}=\frac{c}{\sin C}=2R
$$

10. Law of cosines:

$$
a^2=b^2+c^2-2bc\cos A
$$

$$
b^2=a^2+c^2-2ac\cos B
$$

$$
c^2=a^2+b^2-2ab\cos C
$$

All formulas were recognized, and AI even added a name for each formula automatically, which is great. However, this step did not produce the cropped diagram images yet.
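Recognition can silently drop an exponent or swap a symbol, so it may be worth spot-checking a few of the recognized formulas with numbers. The sketch below is one possible check, not part of the original workflow. The 3-4-5 right triangle and all variable names are assumptions chosen for illustration, and Heron's formula is used only as an independent reference for the area.

```python
import math

# Hypothetical test case (not from the source image): a 3-4-5 right triangle.
a, b, c = 3.0, 4.0, 5.0
p = (a + b + c) / 2                                  # semiperimeter

# Independent reference area from Heron's formula.
area = math.sqrt(p * (p - a) * (p - b) * (p - c))

# Known facts for a right triangle with hypotenuse c.
R = c / 2              # circumradius
r = (a + b - c) / 2    # inradius

# Formula 1: S = abc / (4R)
area_from_circumradius = a * b * c / (4 * R)

# Formula 2: S = pr
area_from_inradius = p * r

# Formula 9: a / sin A = 2R, where angle A is opposite side a.
angle_A = math.asin(a / c)
law_of_sines_ratio = a / math.sin(angle_A)

# Formula 10: a^2 = b^2 + c^2 - 2bc cos A
a_squared = b**2 + c**2 - 2 * b * c * math.cos(angle_A)

print(area, area_from_circumradius, area_from_inradius)
print(law_of_sines_ratio, 2 * R)
print(a_squared, a**2)
```

If any pair of printed values disagrees, that formula is a good candidate for a second look against the source image.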

  2. Try asking AI to crop the image
# 1.png Recognition Result (side-by-side layout with refined cropping)

<table>
  <tr>
    <td><img src="crops/01_triangle_circum.png" width="220" /></td>
    <td>

$$
S = \frac{abc}{4R}
$$

</td>
  </tr>
  <tr>
    <td><img src="crops/02_triangle_incircle.png" width="220" /></td>
    <td>

$$
S = pr
$$

</td>
  </tr>
  <tr>
    <td><img src="crops/03_frustum.png" width="220" /></td>
    <td>

$$
V = \frac{\pi\left(r^2 + Rr + R^2\right)h}{3}
$$

$$
S = \pi a(R + r)
$$

</td>
  </tr>
  <tr>
    <td><img src="crops/04_cylinder.png" width="220" /></td>
    <td>

$$
V = \pi r^2 h
$$

$$
S = 2\pi rh
$$

</td>
  </tr>
  <tr>
    <td><img src="crops/05_sphere.png" width="220" /></td>
    <td>

$$
V = \frac{4}{3}\pi R^3
$$

$$
S = 4\pi R^2
$$

</td>
  </tr>
  <tr>
    <td><img src="crops/06_laws.png" width="220" /></td>
    <td>

$$
\frac{a}{\sin A}=\frac{b}{\sin B}=\frac{c}{\sin C}=2R
$$

$$
a^2=b^2+c^2-2bc\cos A
$$

$$
b^2=a^2+c^2-2ac\cos B
$$

$$
c^2=a^2+b^2-2ab\cos C
$$

</td>
  </tr>
</table>

Result: The diagrams were cropped and placed in matching positions, but the crops still include noise from nearby areas.
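The post does not show the script Codex wrote for this step. As a rough illustration of how such a side-by-side layout could be produced, the sketch below builds the same kind of HTML table from a list of crop paths and LaTeX strings. The function name `build_table` and the `rows` structure are hypothetical. The file names and the `width="220"` attribute are copied from the result above.

```python
def build_table(rows):
    """Return an HTML table (for embedding in Markdown) pairing each crop with its formulas."""
    lines = ["<table>"]
    for image_path, formulas in rows:
        lines.append("  <tr>")
        lines.append(f'    <td><img src="{image_path}" width="220" /></td>')
        lines.append("    <td>")
        lines.append("")  # blank line so the Markdown renderer parses the $$ blocks
        for formula in formulas:
            lines.extend(["$$", formula, "$$", ""])
        lines.append("</td>")
        lines.append("  </tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Two of the six rows from the result above, used as sample input.
rows = [
    ("crops/01_triangle_circum.png", [r"S = \frac{abc}{4R}"]),
    ("crops/04_cylinder.png", [r"V = \pi r^2 h", r"S = 2\pi rh"]),
]

table_markdown = build_table(rows)
print(table_markdown)
```

Keeping the pairing in a plain list like `rows` makes it easy to correct a wrong diagram-to-formula match by hand and regenerate the table.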

  3. Fix over-cropping: keep the full target area first, then manually remove extra parts
    The output is still inconsistent at this step. It is not yet clear whether the issue comes from prompting or model variability in visual localization.
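One part of the "keep the full area first, then remove the extras" idea can be automated: trimming blank margins from a generous crop. The sketch below shows that single step under simplifying assumptions. The image is represented as a plain grid of grayscale values (0 is black, 255 is white) rather than a real file, and the names `content_bounds`, `crop`, `tolerance`, and `padding` are hypothetical. It only removes empty margins; stray fragments from neighboring diagrams count as ink and would still need manual cleanup.

```python
def content_bounds(pixels, background=255, tolerance=10, padding=1):
    """Return (left, top, right, bottom) around non-background pixels.

    right and bottom are exclusive. Returns None for a blank grid.
    """
    height = len(pixels)
    width = len(pixels[0])

    def is_ink(value):
        return abs(value - background) > tolerance

    ink_rows = [y for y in range(height) if any(is_ink(v) for v in pixels[y])]
    ink_cols = [x for x in range(width)
                if any(is_ink(pixels[y][x]) for y in range(height))]
    if not ink_rows:
        return None

    # Expand the tight box by `padding`, clamped to the grid edges.
    left = max(ink_cols[0] - padding, 0)
    top = max(ink_rows[0] - padding, 0)
    right = min(ink_cols[-1] + 1 + padding, width)
    bottom = min(ink_rows[-1] + 1 + padding, height)
    return left, top, right, bottom


def crop(pixels, bounds):
    """Cut the grid down to the given bounds."""
    left, top, right, bottom = bounds
    return [row[left:right] for row in pixels[top:bottom]]


# Sample input: a 7-wide, 6-tall white grid with two dark pixels.
grid = [[255] * 7 for _ in range(6)]
grid[2][3] = 0
grid[3][4] = 0

bounds = content_bounds(grid)
trimmed = crop(grid, bounds)
print(bounds)
print(len(trimmed), len(trimmed[0]))
```

With a real image, the same logic would run on pixel data loaded through an image library; that part is left out here to keep the sketch dependency-free.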

Summary

Using Codex feels different from chatting directly on chatgpt.com.
On chatgpt.com, it often feels like AI is guiding your work; in Codex, it feels more like AI is executing your instructions.
After you describe your requirement, AI can generate code, run it, and complete the task. The feeling is that you are directing AI to do the work.
This process does not require strong programming skills, and non-programmers can still get real results step by step.
