Reflex, Cognition, and Thought

Published: January 7, 2026, 15:55 GMT+8

5 min read

Source: Dev.to


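Overview

In my previous post I covered the basics: blinking an LED and understanding wiring. This post builds on that with what a robot needs to actually run, focusing on the Reflex Layer (Arduino prototyping) and the Cognition Layer (computer vision and local AI).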


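Reflex Layer: Arduino Prototyping

Visual Odometer

I built a visual odometer that uses four LEDs to show four bits of a signed char in binary. By starting the counter at 120 (close to 127, the maximum of a 1-byte signed integer), I could watch the exact moment the odometer overflowed:

When the count was incremented past 127, the LEDs flipped to -128 and the Serial Monitor reported a negative distance.
Lesson: pick the right data type for sensor values, or the robot will think it is driving backward when the value hits its limit.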
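
The wraparound can be reproduced off the board. The following is a minimal Python sketch, not the Arduino code from the build: `to_int8` and `low_nibble_bits` are hypothetical helper names, and it assumes the four LEDs show the lowest four bits of the counter, which the post does not specify.

```python
def to_int8(value: int) -> int:
    """Wrap an integer into the signed 8-bit range [-128, 127]."""
    return (value + 128) % 256 - 128


def low_nibble_bits(value: int) -> str:
    """Return the lowest four bits as a string (assumed LED pattern)."""
    return format(value & 0x0F, "04b")


counter = 120  # same starting point as in the experiment
for _ in range(10):
    print(f"{counter:5d}  LEDs: {low_nibble_bits(counter)}")
    counter = to_int8(counter + 1)  # emulate an 8-bit signed increment
```

Running it shows the counter climbing from 120 to 127 and then jumping to -128, the same jump that made the odometer report a negative distance.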

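Simulating Steps with a Photoresistor

Since I did not have a mobile chassis yet, I used a photoresistor to simulate "steps." Each flash of my phone's light produces a pulse, which the Arduino counts as one step. A second LED changes color with the detected light intensity, giving immediate visual feedback.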
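
To make the "one flash, one step" idea concrete, here is a small Python sketch of the counting logic. It is an illustration under assumptions: the threshold of 400 and the `triggered` flag mirror the Arduino snippet in this post, while the re-arm branch, the function name `count_steps`, and the sample readings are invented for the example.

```python
THRESHOLD = 400  # same cutoff as the article's Arduino snippet


def count_steps(readings, threshold=THRESHOLD):
    """Count one step per pulse, i.e. per run of readings below threshold."""
    steps = 0
    triggered = False
    for value in readings:
        if value < threshold and not triggered:
            steps += 1         # a new pulse begins: count it once
            triggered = True   # latch so the same pulse is not counted again
        elif value >= threshold:
            triggered = False  # pulse is over: re-arm for the next one (assumed)
    return steps


samples = [800, 790, 350, 320, 340, 810, 805, 300, 820]  # made-up readings
print(count_steps(samples))
```

Without the latch, every sample taken during a single flash would count as a separate step, so the odometer would race ahead.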

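Calculating Distance with the Pythagorean Theorem

Using the Pythagorean theorem

\[ a^2 + b^2 = h^2 \]

I calculated the straight-line distance from the starting point to the current position. The Serial Plotter showed staircase-shaped \(X\) and \(Y\) coordinates, while the computed hypotenuse traced a smoother curve.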

#include <math.h>  // assumed header: provides sqrt() and pow()

// ... logic to detect light pulse ...
if (sensorValue < 400 && !triggered) {
    xPos += 5;
    yPos += 3;
    // h = sqrt(x^2 + y^2)
    hypotenuse = sqrt(pow(xPos, 2) + pow(yPos, 2));
    triggered = true;
}
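
As a quick sanity check on the arithmetic, the same calculation can be run on a desktop. This is a sketch only: the per-pulse increments of 5 and 3 come from the snippet above, and `straight_line_distance` is a hypothetical helper, not part of the original sketch.

```python
import math

STEP_X, STEP_Y = 5, 3  # per-pulse increments from the snippet above


def straight_line_distance(pulses: int) -> float:
    """Distance from the origin after a given number of pulses."""
    x = STEP_X * pulses
    y = STEP_Y * pulses
    return math.hypot(x, y)  # same as sqrt(x**2 + y**2)


for n in range(1, 5):
    print(n, STEP_X * n, STEP_Y * n, round(straight_line_distance(n), 2))
```

With fixed increments the distance grows by sqrt(34), about 5.83, per pulse, so three pulses give x = 15, y = 9, and h of roughly 17.49.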

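The Motor and Servo Puzzle

With the odometer working, I tried adding hardware that would drive a motor based on distance traveled. Installing the Arduino motor shield went smoothly, but connecting the Geek Servo left me puzzled:

I could light up an LED, but the servo would not turn.
Servos are essentially motors that need an external power supply.
LEGO-compatible servos need the correct voltage and ground connected before they will move.

These challenges pushed me to explore the next layer of the brain.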

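Cognition Layer: Raspberry Pi 5 + Vision AI

Setting Up the "High-Functioning" Brain

I assembled the Raspberry Pi 5 from a CanaKit kit (quick setup, package updates). With the hardware ready, I went straight into edge AI.

Camera and Local Vision Language Model

Connected an ELP 2.0 Megapixel USB camera.
Installed Ollama and pulled the local vision language model openbmb/minicpm-v4.5.
Wrote a Python script that captures a frame with OpenCV and sends it to the model.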

Example Output

DROID SAYS:
Observing: A human with glasses and purple attire occupies the center of an indoor space;
ceiling fan whirs above while wall decor and doorframes frame background elements—a truly multifaceted environment!

Processing a single frame takes about three minutes. That is slow, but the robot really is "thinking" about its surroundings.
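
A figure like "about three minutes" is easier to track across experiments if it is measured in code. Below is a minimal sketch of one way to do that. The `timed` helper and the `fake_inference` stand-in are hypothetical and simply replace the real model call so the example runs anywhere.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_inference(image_path):
    # Hypothetical stand-in for the real model call
    time.sleep(0.05)
    return f"description of {image_path}"


text, seconds = timed(fake_inference, "droid_snapshot.jpg")
print(f"{text!r} took {seconds:.2f} s")
```

Wrapping the real model call the same way would give a per-frame number to compare when trying smaller or quantized models.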

The Bridge Between Camera and AI

import cv2
import ollama
import os
import time

def capture_and_analyze():
    # Initialize USB Camera
    cam = cv2.VideoCapture(0)

    if not cam.isOpened():
        print("Error: Could not access /dev/video0. Check USB connection.")
        return
    print("--- Droid Vision Active ---")

    # Warm-up: Skip a few frames so the auto-exposure adjusts
    for _ in range(5):
        cam.read()
        time.sleep(0.1)

    ret, frame = cam.read()

    if ret:
        img_path = 'droid_snapshot.jpg'
        cv2.imwrite(img_path, frame)
        print("Image captured! Sending to MiniCPM-V-4.5...")
        try:
            # Querying the local Ollama model
            response = ollama.chat(
                model='openbmb/minicpm-v4.5',  # same name as the model pulled with Ollama
                messages=[{
                    'role': 'user',
                    'content': 'Act as a helpful LEGO droid. Describe what you see in one short, robotic sentence.',
                    'images': [img_path]
                }]
            )
            print("\nDROID SAYS:", response['message']['content'])
        except Exception as e:
            print(f"Ollama Error: {e}")

        # Clean up the photo after analysis
        if os.path.exists(img_path):
            os.remove(img_path)
    else:
        print("Error: Could not grab a frame.")
    cam.release()

if __name__ == "__main__":
    capture_and_analyze()

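Next Steps

Motor Integration: Sort out the servo power wiring and test real movement.
Speeding Up Vision: Experiment with smaller, faster models (e.g., OpenCV face recognition, quantized VLMs) to cut the three-minute inference time.
Layer Fusion: Combine reflexive motor control with cognitive perception so the robot can respond to visual cues in real time.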
