Reflexes, Cognition, and Thought
Overview
In my previous posts I covered the basics—making LEDs blink and understanding wiring. This entry expands on what the droid will actually need to function, focusing on the Reflex Layer (Arduino prototyping) and the Cognition Layer (computer vision and local AI).
Reflex Layer: Arduino Prototyping
Visual Odometer
I built a visual odometer that used four LEDs to display four bits of a signed char counter. Starting the counter at 120 (just below the +127 limit of a 1-byte signed integer) let me watch the moment the odometer overflowed:
- One increment past +127, the value wrapped to -128: the LEDs flipped over and the Serial Monitor reported a negative distance.
- Lesson: choose the correct data type for sensor values, or the droid will think it's moving backward the moment a limit is reached (a minimal sketch of the overflow follows this list).
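Here is a minimal sketch of that experiment, assuming four LEDs on pins 2-5 showing the low four bits of the counter; the pin choices are placeholders, but the int8_t wrap-around is the point.

// Overflow demo: an int8_t counter wraps from +127 to -128.
const int ledPins[4] = {2, 3, 4, 5};   // assumed wiring: four LEDs on pins 2-5

int8_t distance = 120;                 // start near the +127 ceiling

void setup() {
  Serial.begin(9600);
  for (int i = 0; i < 4; i++) {
    pinMode(ledPins[i], OUTPUT);
  }
}

void loop() {
  distance++;                          // after +127 this wraps to -128
  for (int i = 0; i < 4; i++) {
    digitalWrite(ledPins[i], (distance >> i) & 0x01);   // low four bits on the LEDs
  }
  Serial.println(distance);            // prints -128 right after 127
  delay(500);
}

Switching the counter to an int (or an unsigned long for real odometry) lets the same loop count well past 127.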
Simulating Steps with a Photoresistor
Because I didn’t have a moving chassis yet, I used a photoresistor to simulate “steps.” Each flash of my phone’s light generated a pulse that the Arduino treated as a step. An additional LED changed color based on the detected light, giving immediate visual feedback.
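A minimal sketch of that setup, assuming the photoresistor's voltage divider is on A0, a single feedback LED on pin 13 stands in for the color-changing one, and 400 is an uncalibrated threshold:

const int lightPin = A0;        // photoresistor voltage divider (assumed)
const int feedbackLed = 13;     // stand-in for the color-changing LED
const int threshold = 400;      // illustrative value; tune for your lighting

int stepCount = 0;
bool triggered = false;         // latch so one flash counts as one step

void setup() {
  Serial.begin(9600);
  pinMode(feedbackLed, OUTPUT);
}

void loop() {
  int sensorValue = analogRead(lightPin);

  if (sensorValue < threshold && !triggered) {
    stepCount++;                             // one flash of the phone light = one step
    digitalWrite(feedbackLed, HIGH);         // immediate visual feedback
    Serial.print("Step: ");
    Serial.println(stepCount);
    triggered = true;
  } else if (sensorValue >= threshold) {
    digitalWrite(feedbackLed, LOW);
    triggered = false;                       // re-arm for the next flash
  }

  delay(20);
}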
Distance Calculation with the Pythagorean Theorem
Using the Pythagorean theorem

\[ a^2 + b^2 = h^2 \]

I calculated the straight-line distance from the starting point. The Serial Plotter displayed the stair-stepped \(X\) and \(Y\) coordinates while the computed hypotenuse traced a smooth curve.
#include <math.h>  // sqrt() and pow() for the distance calculation

// ... globals (sensorValue, xPos, yPos, hypotenuse, triggered) declared above ...
// ... logic to detect light pulse ...
if (sensorValue < 400 && !triggered) {  // a light flash registers as one step
  xPos += 5;                            // simulated movement along X
  yPos += 3;                            // simulated movement along Y
  // h = sqrt(x^2 + y^2)
  hypotenuse = sqrt(pow(xPos, 2) + pow(yPos, 2));
  triggered = true;                     // latch until the light drops away again
}
Motor and Servo Hurdles
After the odometer worked, I tried adding hardware to spin a motor based on distance traveled. The Arduino motor shield installed easily, but wiring the Geek Servos proved confusing:
- I could light an LED, but the servo didn’t spin.
- The servo is essentially a motor that requires an external power source.
- The LEGO-compatible servos need proper voltage and ground connections before they'll move (a baseline test sketch follows this list).
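For reference, the control signal itself is simple once power is sorted out. Here is a baseline test sketch using the standard Servo library, assuming the signal wire on pin 9 and the servo running from its own 5-6 V supply with a ground shared with the Arduino:

#include <Servo.h>

Servo geekServo;

void setup() {
  geekServo.attach(9);      // signal pin (assumed)
}

void loop() {
  geekServo.write(0);       // one end of travel (full speed one way on continuous models)
  delay(1000);
  geekServo.write(180);     // other end of travel (full speed the other way)
  delay(1000);
}

If the LED lights but this does nothing, the usual suspects are a missing shared ground or a supply that can't deliver the servo's stall current.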
These challenges pushed me to explore the next brain layer.
Cognition Layer: Raspberry Pi 5 + Vision AI
Setting Up the “High‑Functioning” Brain
I assembled a Raspberry Pi 5 from a CanaKit bundle; the physical setup was quick, followed by the usual round of package updates. With the hardware ready, I moved straight into edge AI.
Camera and Local Vision Language Model
- Connected an ELP 2.0 Megapixel USB camera.
- Installed Ollama and pulled the local Vision Language Model openbmb/minicpm-v4.5.
- Wrote a Python script with OpenCV to capture a frame and send it to the model.
Sample Output
DROID SAYS:
Observing: A human with glasses and purple attire occupies the center of an indoor space;
ceiling fan whirs above while wall decor and doorframes frame background elements—a truly multifaceted environment!
Processing a single frame took about three minutes—slow, but the droid is genuinely “thinking” about its surroundings.
Bridge Between Camera and AI
import cv2
import ollama
import os
import time

def capture_and_analyze():
    # Initialize USB Camera
    cam = cv2.VideoCapture(0)
    if not cam.isOpened():
        print("Error: Could not access /dev/video0. Check USB connection.")
        return

    print("--- Droid Vision Active ---")

    # Warm-up: Skip a few frames so the auto-exposure adjusts
    for _ in range(5):
        cam.read()
        time.sleep(0.1)

    ret, frame = cam.read()
    if ret:
        img_path = 'droid_snapshot.jpg'
        cv2.imwrite(img_path, frame)
        print("Image captured! Sending to MiniCPM-V-4.5...")

        try:
            # Querying the local Ollama model
            response = ollama.chat(
                model='openbmb/minicpm-v:4.5',
                messages=[{
                    'role': 'user',
                    'content': 'Act as a helpful LEGO droid. Describe what you see in one short, robotic sentence.',
                    'images': [img_path]
                }]
            )
            print("\nDROID SAYS:", response['message']['content'])
        except Exception as e:
            print(f"Ollama Error: {e}")

        # Clean up the photo after analysis
        if os.path.exists(img_path):
            os.remove(img_path)
    else:
        print("Error: Could not grab a frame.")

    cam.release()

if __name__ == "__main__":
    capture_and_analyze()
Next Steps
- Motor Integration: Resolve power‑supply wiring for the servos and test actual movement.
- Speeding Up Vision: Experiment with smaller, faster models (e.g., OpenCV face‑recognition, quantized VLMs) to reduce the three‑minute inference time.
- Layer Fusion: Combine reflexive motion control with cognitive perception so the droid can react to visual cues in real time (a hypothetical sketch of the Arduino side is below).
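As a preview of what that fusion could look like on the reflex side, here is a hypothetical Arduino sketch that listens for single-character commands over the USB serial link, the kind of message the Pi's vision script could eventually send. The command letters, pins, and servo behaviour are all placeholders, not something wired up yet.

// Hypothetical reflex-layer listener: the Pi sends 'G' (go) or 'S' (stop)
// over serial once the vision model decides the droid should react.
#include <Servo.h>

Servo driveServo;            // placeholder for the eventual drive servo
const int statusLed = 13;    // built-in LED as a "command received" indicator

void setup() {
  Serial.begin(9600);        // same link the Serial Monitor uses
  pinMode(statusLed, OUTPUT);
  driveServo.attach(9);      // assumed signal pin
}

void loop() {
  if (Serial.available() > 0) {
    char command = Serial.read();
    if (command == 'G') {            // cognition layer says: move
      driveServo.write(180);
      digitalWrite(statusLed, HIGH);
    } else if (command == 'S') {     // cognition layer says: stop
      driveServo.write(90);          // roughly neutral for a continuous-rotation servo
      digitalWrite(statusLed, LOW);
    }
  }
}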