Обзор Fusion HAT+ – Добавление голосового ИИ и управления сервоприводами/моторами к Raspberry Pi для робототехники, умного дома или образования — CNXSoft- новости Android-приставок и встраиваемых систем

Компания SunFounder предоставила образец для обзора платы расширения Fusion HAT+ для Raspberry Pi , разработанной для управления моторами и сервоприводами с использованием аудиовзаимодействия благодаря встроенным микрофону и динамику, а также моделям LLM. Она может использоваться в качестве контроллера робота с поддержкой ИИ, хаба для умного дома, голосового помощника или интерактивной образовательной платформы.

В этом обзоре, после распаковки и установки Fusion HAT+ на Raspberry Pi 5 с 2 ГБ ОЗУ, основное внимание будет уделено части голосового взаимодействия с использованием преобразования текста в речь (TTS), речи в текст (STT), а также локальных и облачных моделей LLM и VLM. Также будет быстро протестировано управление сервоприводом для махания флагом с помощью голосовых команд.

Распаковка SunFounder Fusion HAT+

Образец был получен в розничной упаковке с надписью «SunFounder Fusion HAT+ для Raspberry Pi» и описанием ключевых особенностей: аккумуляторная батарея, 12 каналов ШИМ, встроенные динамик и микрофон, 4 канала 12-битного АЦП, безопасное выключение, 4 драйвера для DC-моторов, расширение интерфейсов I2C/SPI/UART и совместимость с Raspberry Pi 5/4/3B+/Zero 2W.

В комплект входят плата Fusion HAT+, три кабеля для моторов, руководство по комплектации и сборке, литий-ионный аккумулятор 7.4В/2000 мАч, а также несколько винтов и нейлоновых стоек для сборки.

Сама плата Fusion HAT+ построена на микроконтроллере GD32, имеет разъемы для аккумулятора и зарядки через USB-C, конфигурационные перемычки, 40-пиновый GPIO-разъем, разъемы для ШИМ и сервоприводов, четыре порта для управления моторами, разъем I2C, динамик и несколько кнопок со светодиодами.

Микрофон расположен на нижней стороне платы.

Сборка Raspberry Pi 5 и Fusion HAT+

Для тестирования будет использоваться одноплатный компьютер Raspberry Pi 5 с 2 ГБ ОЗУ, сборка довольно проста.

Сначала устанавливаются стойки и закрепляются винтами. Затем плата Fusion HAT+ устанавливается на Raspberry Pi 5, добавляются еще четыре винта, подключается аккумулятор — и готово. Как обычно, SunFounder добавляет несколько запасных деталей на случай их потери во время сборки. В итоге остались две короткие стойки, две длинные стойки и два винта. Всегда приятно, когда компании поступают так, поскольку поиск винтов на полу или под мебелью не входит в список любимых занятий.

Ошибка новичка

После готовности аппаратной части, комплект был подключен к сетевому адаптеру Raspberry Pi 5В/5А через USB-C, и были успешно выполнены инструкции по тестированию различных демонстрационных примеров. Однако, чуть более чем через 24 часа все перестало работать, даже ранее рабочие примеры:

pi@raspberrypi:~/fusion-hat/examples $ sudo python tts_piper.py
2026-01-25 16:53:02.060985944 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card1/device/vendor"
Traceback (most recent call last):
  File "/home/pi/fusion-hat/examples/tts_piper.py", line 3, in <module>
    tts = Piper()
  File "/usr/local/lib/python3.13/dist-packages/fusion_hat/tts.py", line 126, in _piper_init_with_speaker
    enable_speaker()
    ~~~~~~~~~~~~~~^^
  File "/usr/local/lib/python3.13/dist-packages/fusion_hat/device.py", line 145, in wrapper
    raise_if_fusion_hat_not_ready()
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/usr/local/lib/python3.13/dist-packages/fusion_hat/device.py", line 133, in raise_if_fusion_hat_not_ready
    raise IOError("Fusion HAT not connected, check if Fusion Hat is powered on.")
OSError: Fusion HAT not connected, check if Fusion Hat is powered on.

Журнал ядра указывал на аппаратную проблему, так как Pi 5 не мог общаться с Fusion HAT+ по I2C:

[    3.311487] fusion_hat: loading out-of-tree module taints kernel.
[    3.329380] fusion_hat 1-0017: I2C write word failed: -121
[    3.329389] fusion_hat 1-0017: Failed to initialize channel 0: -121
[    3.329391] fusion_hat 1-0017: Failed to initialize PWM subsystem: -121
[    3.330123] fusion_hat 1-0017: I2C write word failed: -121
[    3.330583] fusion_hat 1-0017: I2C write word failed: -121
[    3.330719] fusion_hat 1-0017: I2C write word failed: -121
[    3.330852] fusion_hat 1-0017: I2C write word failed: -121
[    3.330984] fusion_hat 1-0017: I2C write word failed: -121
[    3.331122] fusion_hat 1-0017: I2C write word failed: -121
[    3.331258] fusion_hat 1-0017: I2C write word failed: -121
[    3.331400] fusion_hat 1-0017: I2C write word failed: -121
[    3.331537] fusion_hat 1-0017: I2C write word failed: -121
[    3.331909] fusion_hat 1-0017: I2C write word failed: -121
[    3.332048] fusion_hat 1-0017: I2C write word failed: -121
[    3.332308] fusion_hat 1-0017: I2C write word failed: -121
[    3.343487] fusion_hat 1-0017: probe with driver fusion_hat failed with error -121

Поэтому плата была выключена и предпринята попытка переустановить HAT+, но это не помогло. Утром было замечено, что Raspberry Pi 5 выключился, но поскольку после повторного запуска он работал нормально, этому не придали большого значения. Демо Online LLM с Gemini работало (оно не использует HAT). Четкого решения найти не удалось, поэтому была предположена аппаратная неисправность. Однако, после дополнительного изучения, стало ясно, что два оранжевых светодиода индикации батареи должны гореть при полной зарядке, а они были выключены, как будто батарея разряжена, хотя питание по USB было постоянно подключено. Аккумулятор также был отключен для повторной попытки, но ошибка повторялась.

Момент прозрения наступил вскоре, когда стало понятно, что подключение блока питания USB-C к порту USB-C на самом Raspberry Pi 5, вероятно, было не лучшей идеей…

… вместо порта USB-C «Charge» на самой плате Fusion HAT+. Сначала это не казалось ошибкой, так как аккумулятор был полностью заряжен и все работало гладко, пока на следующий день не выяснилось, что он разрядился.

Это также означает, что аккумулятор не является опциональным и должен быть подключен постоянно для работы Fusion HAT+.

Установка программного обеспечения

В основном следовал инструкциям на сайте документации для этой части и большей части остального обзора.

Первый шаг — запустить raspi-config и перейти в раздел A11 – Shutdown behavior : Необходимо выбрать опцию «B1 Full power off».

После перезагрузки можно установить пакет fusion-hat для получения необходимых библиотек и инструментов для работы с аппаратными средствами Fusion HAT:

curl -sSL https://raw.githubusercontent.com/sunfounder/sunfounder-installer-scripts/main/install-fusion-hat.sh | sudo bash

После еще одной перезагрузки можно выполнить скрипт настройки аудио:

sudo /opt/setup_fusion_hat_audio.sh

Вот вывод команды для справки:

get_sink_index ...
sink index not found.
Sometimes you need to reboot to activate the soundcard.
get_source_index ...
source index not found.
Sometimes you need to reboot to activate the soundcard.
set default Pulseaudio volume to 100% ...

Do you wish to test speaker now? [y/N] y
open speaker ...
/opt/setup_fusion_hat_audio.sh: line 411: /sys/class/fusion_hat/fusion_hat/speaker: No such file or directory
testing speaker ...

speaker-test 1.2.14

Playback device is default
Stream parameters are 48000Hz, S16_LE, 1 channels
WAV file(s)
Rate set to 48000Hz (requested 48000Hz)
Buffer size range from 2048 to 8192
Period size range from 1024 to 1024
Periods = 4
was set period_size = 1024
was set buffer_size = 4096
 0 - Front Left
Time per period = 1.403260
 0 - Front Left
Time per period = 1.495619
 0 - Front Left
Time per period = 1.474478

All done!

На этом этапе звука не было, но после перезагрузки работа аудио была подтверждена.

Варианты преобразования текста в речь (TTS) для Fusion HAT+

SunFounder предлагает поддержку нескольких движков TTS. Начнем с Espeak:

pi@raspberrypi:~ $ cd fusion-hat/examples/
pi@raspberrypi:~/fusion-hat/examples $ sudo python3 tts_espeak.py

Было слышно «Hello! I’m Espeak TTS».

Пользователи могут управлять громкостью, скоростью речи, паузами между словами и тоном:

from fusion_hat.tts import Espeak

# Create Espeak TTS instance
tts = Espeak()
# Set amplitude 0-200, default 100
tts.set_amp(200)
# Set speed 80-260, default 150
tts.set_speed(150)
# Set gap 0-200, default 1
tts.set_gap(1)
# Set pitch 0-99, default 80
tts.set_pitch(80)

tts.say("CNXSoft is in da house!")

Это работает и понятно, но голос довольно роботизированный.

Другой вариант — pico2wave:

pi@raspberrypi:~/fusion-hat/examples $ sudo python tts_pico2wave.py

Было слышно «Hello! I’m Pico2wave TTS» более естественным образом, чем в Espeak.

Отредактируем код:

from fusion_hat.tts import Pico2Wave

# Create Pico2Wave TTS instance
tts = Pico2Wave()

# Set the language
tts.set_lang('en-US')  # en-US, en-GB, de-DE, es-ES, fr-FR, it-IT

# Quick hello (sanity check)
tts.say("CNXSoft is in da house!")

Вот демо с обоими предложениями.

CNXSoft здесь не было распознано, но со стандартным предложением проблем не возникло.

tts.say("Hello from Chiang Mai, Thailand!")

Третий вариант — Piper TTS:

pi@raspberrypi:~/fusion-hat/examples $ sudo python3 tts_piper.py

Было четко слышно «Hello, I’m Piper TTS». Вывод показывает поддержку более широкого диапазона языков:

2026-01-24 21:00:25.335876115 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card1/device/vendor"
['ar_JO', 'ca_ES', 'cs_CZ', 'cy_GB', 'da_DK', 'de_DE', 'el_GR', 'en_GB', 'en_US', 'es_ES', 'es_MX', 'fa_IR', 'fi_FI', 'fr_FR', 'hu_HU', 'is_IS', 'it_IT', 'ka_GE', 'kk_KZ', 'lb_LU', 'lv_LV', 'ml_IN', 'ne_NP', 'nl_BE', 'nl_NL', 'no_NO', 'pl_PL', 'pt_BR', 'pt_PT', 'ro_RO', 'ru_RU', 'sk_SK', 'sl_SI', 'sr_RS', 'sv_SE', 'sw_CD', 'tr_TR', 'uk_UA', 'vi_VN', 'zh_CN']
[]

Код был изменен для смены строки:

from fusion_hat.tts import Piper

tts = Piper()

# List supported languages
print(tts.available_countrys())

# List models for English (en_us)
print(tts.available_models('en_us'))

# Set a voice model (auto-download if not already present)
tts.set_model("en_US-amy-low")

# Say something
tts.say("CNXSoft is in da house!")

Это самое четкое TTS-решение на данный момент.

Финальный вариант — OpenAI TTS, но он не сработал, вероятно, из-за отсутствия купленных кредитов:

pi@raspberrypi:~/fusion-hat/examples $ sudo python tts_openai.py 
2026-01-24 20:57:51.354565538 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card1/device/vendor"
Say: Hello! I'm OpenAI TTS.
[ERROR] OpenAI TTS API request error: 429 Client Error: Too Many Requests for url: https://api.openai.com/v1/audio/speech
Say: with instructions, I can say word sadly, with instructions: 'say it sadly'
[ERROR] OpenAI TTS API request error: 429 Client Error: Too Many Requests for url: https://api.openai.com/v1/audio/speech
Say: or say something dramaticly., with instructions: 'say it dramaticly'
[ERROR] OpenAI TTS API request error: 429 Client Error: Too Many Requests for url: https://api.openai.com/v1/audio/speech

SunFounder предоставляет сравнительную таблицу для всех четырех вариантов TTS, чтобы пользователи могли выбрать наиболее подходящий под свои требования.

Item	Espeak	Pico2Wave	Piper	OpenAI TTS
Runs on	Built-in on Raspberry Pi (offline)	Built-in on Raspberry Pi (offline)	Raspberry Pi / PC (offline, needs model)	Cloud (online, needs API key)
Voice quality	Robotic	More natural than Espeak	Natural (neural TTS)	Very natural / human-like
Controls	Speed, pitch, volume	Limited controls	Choose different voices/models	Choose model and voices
Languages	Many (quality varies)	Limited set	Many voices/languages available	Best in English (others vary by availability)
Latency / speed	Very fast	Fast	Real-time on Pi 4/5 with “low” models	Network-dependent (usually low latency)
Setup	Minimal	Minimal	Download .onnx + .onnx.json models	Create API key, install client
Best for	Quick tests, basic prompts	Slightly better offline voice	Local projects with better quality	Highest quality, rich voice options

Тестирование преобразования речи в текст (STT) с помощью Vosk

Vosk — единственный предоставленный пример для поддержки STT:

sudo python3 stt_vosk_stream.py

Вот вывод команды:

vosk-model-small-en-us-0.15.zip: 100%|█████| 39.3M/39.3M [00:03<00:00, 11.6MB/s]
Say something
final: i am see and excerpt
Say something
final: today the weather is very hot
Say something
final: hello everyone
Say something
final: i am
Say something
final: see an excerpt
Say something
final: can i say something
Say something
final: what are you doing
Say something
final: why don't you listen to me
Say something

Иногда он не понимал речь, возможно, из-за отсутствия американского акцента.

Вот как выглядит Python-код:

from fusion_hat.stt import Vosk as STT

stt = STT(language="en-us")

while True:
   print("Say something")
   for result in stt.listen(stream=True):
      if result["done"]:
            print(f"final:   {result['final']}")
      else:
            print(f"partial: {result['partial']}", end="\r", flush=True)

Vosk поддерживает более 20 языков и диалектов: английский, индийский английский, немецкий, французский, испанский, португальский, китайский, русский, турецкий, вьетнамский, итальянский, голландский, каталанский, арабский, греческий, фарси, филиппинский, украинский, казахский, шведский, японский, эсперанто, хинди, чешский, польский, узбекский, корейский, бретонский, гуджарати, таджикский, телугу и киргизский. Со временем в проект будут добавлены и другие языки.

Тестирование локальных LLM и VLM с помощью Ollama

Следующий шаг — тестирование демо локальных LLM (больших языковых моделей) и VLM (визуально-языковых моделей) с использованием Ollama. Поскольку в наличии только Raspberry Pi 5 с 2 ГБ ОЗУ, запускать Ollama на Pi не планируется. Вместо этого она была установлена на мини-ПК UP Xtreme i11 Edge с 16 ГБ ОЗУ:

curl -fsSL https://ollama.com/install.sh | sh

Автозапуск Ollama был отключен, и она запущена в серверном режиме на компьютере UP Xtreme i11 Edge:

sudo systemctl stop ollama
sudo systemctl disable ollama
ollama pull llama3.1:8b
OLLAMA_HOST=0.0.0.0 ollama serve

Также был отредактирован код на Raspberry Pi в файле ~/fusion-hat/examples/llm_ollama.py для добавления IP-адреса сервера и выбора только что установленной модели:

# Change this to your computer IP, if you run it on your pi, then change it to localhost
llm = Ollama(
    ip="192.168.31.12",          # e.g., "192.168.100.145" if remote
    model="llama3.1:8b"
)

Можно ввести промпт с клавиатуры для получения ответов.

pi@raspberrypi:~/fusion-hat/examples $ sudo python3 llm_ollama.py 
Hello, I am a helpful assistant. How can I help you?
>>> Why is the sky blue?
The color of the sky appears blue because of a phenomenon called scattering. When sunlight enters Earth's atmosphere, it encounters tiny molecules of gases such as nitrogen and oxygen. These molecules scatter the light in all directions, but they scatter shorter (blue) wavelengths more than longer (red) wavelengths.

This is known as Rayleigh scattering, named after the British physicist Lord Rayleigh who first described the phenomenon in the late 19th century. As a result of this scattering, our eyes see the blue light being scattered in all directions and that's why the sky appears blue during the daytime.

It's worth noting that the color of the sky can appear different at different times of day or under different conditions, such as:

* During sunrise and sunset, when the sun is lower in the sky, the light has to travel through more of the atmosphere, which scatters the shorter wavelengths (like blue) even more. This is why the sky often appears red or orange during these times.
* On a cloudy day, the clouds can scatter the light in different ways, making the sky appear gray or white.

But overall, the basic principle of Rayleigh scattering is responsible for the blue color of the sky!

Do you have any other questions on this topic?

Визуально-языковые модели также поддерживаются, поэтому на сервере была установлена llava:7b:

devkit@UPX-i11:~$ ollama pull llava:7b

На Raspberry Pi 5 меняем IP-адрес с localhost на IP-адрес нашего сервера и задаем модель в примере llm_ollama_with_image.py :

llm = Ollama(
    ip="192.168.31.12",          # e.g., "192.168.100.145" if remote
    model="llava:7b"         # change to "moondream:1.8b" or "granite3.2-vision:2b" for 8GB RAM
)

Перед запуском команды к Raspberry Pi 5 была подключена камера Raspberry Pi Camera Module 3:

pi@raspberrypi:~/fusion-hat/examples $ python3 llm_ollama_with_image.py 
[0:00:31.352799647] [1918]  INFO Camera camera_manager.cpp:340 libcamera v0.6.0+rpt20251202
[0:00:31.361772221] [1921]  INFO RPI pisp.cpp:720 libpisp version 1.3.0
[0:00:31.364725184] [1921]  INFO IPAProxy ipa_proxy.cpp:180 Using tuning file /usr/share/libcamera/ipa/rpi/pisp/imx708.json
[0:00:31.373335239] [1921]  INFO Camera camera_manager.cpp:223 Adding camera '/base/axi/pcie@1000120000/rp1/i2c@88000/imx708@1a' for pipeline handler rpi/pisp
[0:00:31.373466406] [1921]  INFO RPI pisp.cpp:1181 Registered camera /base/axi/pcie@1000120000/rp1/i2c@88000/imx708@1a to CFE device /dev/media2 and ISP device /dev/media0 using PiSP variant BCM2712_D0
[0:00:31.376763665] [1918]  INFO Camera camera.cpp:1215 configuring streams: (0) 1280x720-BGR888/sRGB (1) 1536x864-BGGR_PISP_COMP1/RAW
[0:00:31.376872369] [1921]  INFO RPI pisp.cpp:1485 Sensor: /base/axi/pcie@1000120000/rp1/i2c@88000/imx708@1a - Selected sensor format: 1536x864-SBGGR10_1X10/RAW - Selected CFE format: 1536x864-PC1B/RAW
Hello, I am a helpful assistant. How can I help you?
>>> Describe the image
 The image shows an interior space with a desk and items on it. In the foreground, there appears to be a table with some boxes or materials on top of it. Behind the table, there is another table that seems to have a sign or label attached to it, though the details are not clearly visible. To the left of this second table, there is a partial view of what looks like packaging material, possibly cardboard boxes. The wall in the background is white and featureless, which provides a plain backdrop for the scene. There is no text present in the image to provide additional context or information about the location or purpose of the space.

Описание точное. Вот изображение, сохраненное в /tmp/llm-img.jpg.

Тест онлайн LLM Google Gemini

SunFounder предоставляет инструкции для OpenAI, Gemini, Qwen, Grok, DeepSeek и Doubao. Был выбран Google Gemini, в основном потому что это, похоже, единственный вариант с бесплатным тарифом.

Первый шаг — войти в Google AI studio и нажать Get API.

Был создан новый проект «Fusion HAT» перед нажатием «Create key».

Google AI Studio сгенерирует ключ API Gemini, который можно использовать для тестирования.

Скопируйте ключ и вставьте его в secret.py:

GEMINI_API_KEY="AI-your-key"

Файл llm_gemini.py по умолчанию использует модель gemini-2.5-flash, но можно переключиться на другую:

pi@raspberrypi:~/fusion-hat/examples $ python3 llm_gemini.py 
Hello, I am a helpful assistant. How can I help you?
>>> What do you know about CNX Software?           
**CNX Software** is a well-known and highly respected technology blog and website that focuses primarily on **embedded systems, open-source hardware, and software development**.

It's a go-to resource for developers, engineers, makers, and tech enthusiasts interested in the latest advancements in areas like:

1.  **Single Board Computers (SBCs):** They extensively cover popular SBCs like Raspberry Pi, ODROID, NVIDIA Jetson, and many other boards based on chips from Rockchip, Amlogic, Allwinner, NXP, etc.
2.  **Development Boards:** News and reviews on various development boards for microcontrollers (e.g., ESP32, ESP8266, STM32) and more complex SoCs.
3.  **IoT Devices:** Coverage of new Internet of Things hardware, sensors, and platforms.
4.  **Embedded Linux & Android:** Deep dives into custom ROMs, kernels, drivers, and distributions tailored for embedded devices, including Android TV boxes and mini PCs.
5.  **Processors & Architectures:** They report on new chip announcements, focusing heavily on ARM-based processors, but also increasingly on RISC-V and other architectures used in embedded contexts.
6.  **Mini PCs & TV Boxes:** Often with a focus on their hackability, potential for running alternative operating systems, and open-source firmware.
7.  **Open-Source Projects:** Highlighting new open-source hardware designs and software initiatives relevant to the embedded world.

**What kind of content do they provide?**

*   **News and Announcements:** Covering new product launches, crowdfunding campaigns (especially on platforms like Kickstarter and Indiegogo), and industry updates.
*   **Detailed Reviews:** Often including hands-on testing, benchmarks, power consumption analysis, and in-depth analysis of hardware and software performance. They go beyond surface-level marketing.
*   **Tutorials and How-tos:** Guides for setting up development environments, flashing firmware, or using specific features of new hardware.
*   **Technical Insights:** Explanations of new technologies, standards, and components relevant to the embedded space.

**Why is it valuable?**

*   **In-depth Technical Coverage:** They provide real technical specifications and often test actual performance, making it a reliable source for engineers and developers.
*   **Early Adopter Focus:** They are often among the first to report on new embedded hardware coming to market, especially from lesser-known Chinese manufacturers who are prominent in this space.
*   **Specialized Niche:** Their specific focus on embedded Linux/Android and open-source hardware makes them an authority in this particular segment of the tech world.
*   **Resource for Developers:** It helps developers and makers stay informed about new tools, platforms, and solutions for their projects.

In summary, for anyone involved in or curious about the world of embedded systems, open-source hardware, and low-power computing, CNX Software is an indispensable resource.
>>>

Люблю спрашивать о CNX Software, потому что большинство локальных LLM выдают несуразицу, в то время как онлайн LLM обычно предоставляют относительно точную информацию.

Локальный голосовой агент с использованием Piper TTS, Vosk STT и Ollama с Llama3.1:8b

Пока все хорошо, но подключение к локальной или онлайн LLM не использует возможности платы Fusion HAT+. Исправим это, попробовав демо локального голосового чат-бота. Сначала проверим исходный код, чтобы изменить язык для Vosk SST и Piper TTS, если это не американский английский, и задать правильные параметры для Ollama:

# Initialize speech recognition
stt = Vosk(language="en-us")

# Initialize TTS
tts = Piper()
tts.set_model("en_US-amy-low")

# Instructions for the LLM
INSTRUCTIONS = (
    "You are a helpful assistant. Answer directly in plain English. "
    "Do NOT include any hidden thinking, analysis, or tags like <think>."
)
WELCOME = "Hello! I'm your voice chatbot. Speak when you're ready."

# Initialize Ollama connection
llm = Ollama(ip="192.168.31.12", model="llama3.1:8b")
llm.set_max_messages(20)
llm.set_instructions(INSTRUCTIONS)

Это работало довольно хорошо, хотя иногда становилось комично из-за того, что скрипт слышал звуки «а?», возможно, из-за вентилятора мини-ПК UP Xtreme i11 Edge:

pi@raspberrypi:~/fusion-hat/examples $ sudo python3 local_voice_chatbot.py
2026-01-25 19:17:33.762952537 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card1/device/vendor"
Hello! I'm your voice chatbot. Speak when you're ready.

🎤 Listening... (Press Ctrl+C to stop)
[YOU] why is the sky blue
The sky appears blue because of something called Rayleigh scattering. When sunlight enters Earth's atmosphere, it encounters tiny molecules of gases such as nitrogen and oxygen. These molecules scatter the light in all directions, but they scatter shorter (blue) wavelengths more than longer (red) wavelengths.

This is why we see the sky appear blue during the day - our eyes are seeing the scattered blue light that's bouncing around everywhere. It's kind of like when you shine a flashlight through a prism and it breaks into different colors; with sunlight, the entire spectrum is being scattered, but our eyes pick up on the blue part.

At sunrise and sunset, the sky can appear more red or orange because the sun's light has to travel through more of the atmosphere to reach us, scattering away some of the shorter wavelengths.

🎤 Listening... (Press Ctrl+C to stop)
[YOU] huh
That's a pretty normal reaction when someone first learns about it! It might seem a bit weird that the sky is blue and not just gray or white. But it's actually quite cool (and scientifically interesting) how light behaves in our atmosphere. If you want to know more, I can try explaining it in simpler terms or give you some analogies to help make sense of it!

🎤 Listening... (Press Ctrl+C to stop)
[YOU] huh
Let me try again with a different explanation.

Imagine you're holding a big jar of glitter and shining a flashlight through it. The light scatters off the tiny particles of glitter, making it look like the entire jar is filled with sparkling colors. That's kind of what's happening with the sky when sunlight enters our atmosphere. The tiny molecules in the air are like the glitter, scattering the blue light in all directions and making the sky appear blue to us.

Does that make more sense?

🎤 Listening... (Press Ctrl+C to stop)
[YOU] it does thank you
I'm glad I could help explain it in a way that made sense to you. If you have any other questions or need help with anything else, feel free to ask!

🎤 Listening... (Press Ctrl+C to stop)

Приложение продолжает слушать и иногда слышит «а?», даже когда в комнате никого нет:

🎤 Listening... (Press Ctrl+C to stop)
[YOU] huh
Still with the huh's I see! Okay, okay, I get it. Maybe we're just done for real this time. Bye!

🎤 Listening... (Press Ctrl+C to stop)
[YOU] huh
I'm leaving. Seriously this time. Have a great day!

🎤 Listening... (Press Ctrl+C to stop)
[YOU] huh
I'll just close the conversation now. Goodbye!

🎤 Listening... (Press Ctrl+C to stop)
[YOU] huh
Goodbye again. (This is really it, I promise)

🎤 Listening... (Press Ctrl+C to stop)
[YOU] huh
FINAL GOODBYE.

🎤 Listening... (Press Ctrl+C to stop)
[YOU] huh
I'll just leave this conversation now and come back when you're ready to talk. Have a nice day!

🎤 Listening... (Press Ctrl+C to stop)
[YOU] huh
Goodbye, I've left.

🎤 Listening... (Press Ctrl+C to stop)
[YOU] huh
Conversation ended.

🎤 Listening... (Press Ctrl+C to stop)
[YOU] huh
I'm out of here.

🎤 Listening... (Press Ctrl+C to stop)
[YOU] huh
*ended*

🎤 Listening... (Press Ctrl+C to stop)
[YOU] huh
This conversation has been closed.

🎤 Listening... (Press Ctrl+C to stop)
[YOU] huh
Goodbye!

🎤 Listening... (Press Ctrl+C to stop)

Голосовой помощник с ИИ на базе Google Gemini

Сделаем нечто еще более продвинутое, превратив Raspberry Pi 5 в подобие умной колонки с помощью демо голосового помощника с ИИ. Снова полагался на Google Gemini и отредактировал скрипт voice_assistant.py с правильными параметрами:

from fusion_hat.voice_assistant import VoiceAssistant
from fusion_hat.llm import Gemini as LLM
from secret import GEMINI_API_KEY as API_KEY

llm = LLM(
    api_key={API_KEY},
    model="gemini-2.5-flash",
)

# Robot name
NAME = "Buddy"

# Enable image, need to set up a multimodal language model
WITH_IMAGE = True

# Set models and languages
LLM_MODEL = "gemini-2.5-flash"
TTS_MODEL = "en_US-amy-low"
STT_LANGUAGE = "en-us"

Если камера не подключена, также нужно установить WITH_IMAGE в False.

pi@raspberrypi:~/fusion-hat/examples $ sudo python3 voice_assistant.py 
2026-01-31 14:36:12.874392721 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card1/device/vendor"
[0:05:28.231334914] [3337]  INFO Camera camera_manager.cpp:340 libcamera v0.6.0+rpt20251202
[0:05:28.240260201] [3863]  INFO RPI pisp.cpp:720 libpisp version 1.3.0
[0:05:28.242815656] [3863]  INFO IPAProxy ipa_proxy.cpp:180 Using tuning file /usr/share/libcamera/ipa/rpi/pisp/imx708.json
[0:05:28.250784150] [3863]  INFO Camera camera_manager.cpp:223 Adding camera '/base/axi/pcie@1000120000/rp1/i2c@88000/imx708@1a' for pipeline handler rpi/pisp
[0:05:28.250828205] [3863]  INFO RPI pisp.cpp:1181 Registered camera /base/axi/pcie@1000120000/rp1/i2c@88000/imx708@1a to CFE device /dev/media0 and ISP device /dev/media1 using PiSP variant BCM2712_D0
[0:05:28.254004103] [3337]  INFO Camera camera.cpp:1215 configuring streams: (0) 640x480-XBGR8888/sRGB (1) 1536x864-BGGR_PISP_COMP1/RAW
[0:05:28.254121621] [3863]  INFO RPI pisp.cpp:1485 Sensor: /base/axi/pcie@1000120000/rp1/i2c@88000/imx708@1a - Selected sensor format: 1536x864-SBGGR10_1X10/RAW - Selected CFE format: 1536x864-PC1B/RAW
>>> heard: hey buddy
Waked, Listening ...
heard: what is the sky blue

The sky appears blue because of something called **Rayleigh scattering**.

Here's a simplified explanation:

1.  **Sunlight is made of all colors:** The light from the sun looks white, but it's actually a mix of all the colors of the rainbow (red, orange, yellow, green, blue, indigo, violet).
2.  **Earth's atmosphere:** Our atmosphere is made up of tiny gas molecules (like nitrogen and oxygen) and small particles.
3.  **Scattering light:** When sunlight enters the atmosphere, these tiny molecules and particles scatter the different colors of light in different directions.
4.  **Blue light scatters more:** Blue and violet light have shorter wavelengths than red and yellow light. Shorter wavelengths are scattered *much more effectively* by the tiny gas molecules in the atmosphere.
5.  **Why blue, not violet?** While violet light scatters even more than blue, our eyes are more sensitive to blue light. Also, some of the violet light is absorbed higher in the atmosphere. So, the dominant color we perceive is blue.

So, when you look up, you're seeing blue light that has been scattered across the sky and toward your eyes from all directions.
>>>

Поскольку к плате была подключена камера Raspberry Pi Camera Module 3, также была дана команда использовать ее:

heard: hey buddy
Waked, Listening ...
heard: can you use the camera and what do you see

Okay, using a camera on this image, here's what I "see" and can describe:

The image appears to be taken indoors and is rotated 90 degrees clockwise.

**Assuming the image is rotated upright for description:**

*   **Left Side (Wall and Window):**
    *   The upper half of the left side is dominated by a plain, light-colored (off-white or pale gray) wall with some minor smudges or imperfections.
    *   A dark brown horizontal band separates the wall from the window structure below it.
    *   Below the brown band is a window. It has a light-colored (likely aluminum or white) frame and appears to have a screen. Through the window, faint vertical bars are visible, suggesting security bars or a fence outside. The view outside is bright but indistinct.

*   **Right Side (Table/Surface and Objects):**
    *   The entire right side features a light brown surface with a distinct wavy wood grain pattern, resembling a table or shelf. There's a small, dark circular mark (possibly a screw hole or stain) on this surface towards the center.
    *   Stacked on this wooden surface are two boxes:
        *   The **top box** is dark gray or black. Some faint white text is visible on its side, but it's largely illegible, appearing to be technical specifications or branding (e.g., "ROHS © FC" and other smaller text, which is reversed due to the image rotation).
        *   Below it is a **white rectangular box**. On its visible side, there is clear blue text that reads "**Sunfounder**" (this text is also reversed due to the image rotation, but easily deciphered). Below "Sunfounder," smaller text indicates:
            *   **CN0522D**
            *   **Made in China**
            *   A date: **2023.11.26**
    *   Part of a dark blue or dark green fabric item (possibly a bag or clothing) is visible next to and beneath the white box.
    *   Further to the right, a crumpled white plastic bag or sheet is partially visible.

In summary, it's a photograph of a room interior, showing a wall, a window, and a wooden surface with stacked boxes and other items on it.

Довольно аккуратно! Видео здесь не представлено, так как эта часть будет продемонстрирована в другом примере с использованием сервопривода в разделе ниже. Заметим, что иногда звука не было совсем, но было замечено, что выполнение команды:

sudo python3 tts_pico2wave.py

исправляет проблему со звуком в примере voice_assistant.py. SunFounder и я не уверены, почему так происходит, и не могу воспроизвести это со 100% уверенностью.

Управление сервоприводом с TTS

Документация Fusion HAT+ сосредоточена на аудио и управлении LLM/LVM, но SunFounder также работает над «AI Fusion Lab Kit», который будет поставляться вместе с Fusion HAT+, а также моторами, сервоприводами и т.д. Можно проверить документацию для предстоящего набора для инструкций по управлению сервоприводом. К порту 0 был подключен сервопривод SG90, как показано на фото ниже, и для драматического эффекта добавлен тайский флаг 😉

Посмотрим на пример кода для AI Lab kit:

cd ~
git clone https://github.com/sunfounder/ai-lab-kit.git --depth 1
cd ai-lab-kit/python/

и запустим демо сервопривода:

python3 1.7_Servo.py

Сервопривод будет вращаться в бесконечном цикле. Низкоуровневые аспекты разобраны. Соответствующий Python-код:

from fusion_hat.servo import Servo   # Import the Servo class for controlling servos
from time import sleep               # Import sleep for timing delays

servo = Servo(0)                     # Create a Servo object on channel 0

while True:                          # Loop forever
    # Sweep from -90° to +90° in steps of 10°
    for i in range(-90, 91, 10):
        servo.angle(i)               # Set servo to angle i
        sleep(0.1)                   # Small delay for smooth movement

    # Sweep back from +90° to -90° in steps of -10°
    for i in range(90, -91, -10):
        servo.angle(i)               # Set servo to angle i
        sleep(0.1)                   # Small delay for smooth movement

Используем это, чтобы смешать TTS, Vosk STT, LLM Google Gemini и управление сервоприводом. Цель — вращать сервопривод при команде «wave the flag», а в остальных случаях отправлять запрос в Google Gemini. Отметим, что была выполнена смена с модели gemini-2.5-flash на gemini-2.5-flash-lite, так как дневной лимит в 20 кредитов был исчерпан во время тестирования. Другой способ — настроить биллинг и купить кредиты.

Вот код для демо:

from fusion_hat.voice_assistant import VoiceAssistant
from fusion_hat.llm import Gemini
from fusion_hat.servo import Servo
from secret import GEMINI_API_KEY
import time

# Robot name
NAME = "Buddy"

# Enable image, need to set up a multimodal language model
WITH_IMAGE = True

# Set models and languages
LLM_MODEL = "gemini-2.5-flash-lite"
TTS_MODEL = "en_US-ryan-low"
STT_LANGUAGE = "en-us"

# Enable keyboard input
KEYBOARD_ENABLE = True

# Enable wake word
WAKE_ENABLE = True
WAKE_WORD = [f"hey {NAME.lower()}"]
# Set wake word answer, set empty to disable
ANSWER_ON_WAKE = "Hi there"

# Welcome message
WELCOME = f"Hi, I'm {NAME}. Wake me up with: " + ", ".join(WAKE_WORD)

# Set instructions
INSTRUCTIONS = f"""
You are {NAME}, a helpful and friendly voice assistant.
Keep answers short, natural, and fun.
If the user says anything about waving a flag, only reply "Done" and nothing else. It's handled by the hardware.
"""

class FlagWavingAssistant(VoiceAssistant):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.servo = Servo(0) 
        self.answer_on_wake = ""

    def wave_flag(self):
        # Wave the flag back and forth 3 times
        self.tts.say("Waving the flag right now!")  
        for _ in range(3):
            self.servo.angle(-45)
            time.sleep(0.5)
            self.servo.angle(45)
            time.sleep(0.5)
        self.servo.angle(0)  # Reset to center

    def on_heard(self, text: str):
        text_lower = text.lower().strip()

        if "wave the flag" in text_lower or "wave flag" in text_lower:
            self.wave_flag()
            return

# ────────────────────────────────────────────────
# Main setup
# ────────────────────────────────────────────────

# Initialize Gemini LLM 
llm = Gemini(
    api_key=GEMINI_API_KEY,
    model="gemini-2.5-flash-lite",  
)

# Create your custom assistant
va = FlagWavingAssistant(
    llm,
    name=NAME,
    with_image=WITH_IMAGE,
    tts_model=TTS_MODEL,
    stt_language=STT_LANGUAGE,
    keyboard_enable=KEYBOARD_ENABLE,
    wake_enable=WAKE_ENABLE,
    wake_word=WAKE_WORD,
    answer_on_wake=ANSWER_ON_WAKE,
    welcome=WELCOME,
    instructions=INSTRUCTIONS,
)

if __name__ == "__main__":
    try:
        va.run()   # Starts the full loop: wake → listen → think → speak
    except KeyboardInterrupt:
        va.say("Goodbye!")
        print("Stopped by user.")

Демонстрация ниже показывает поддержку wake word, LLM Gemini с общим вопросом и описанием с фото, сделанного камерой Raspberry Pi Camera Module 3, и, наконец, махание флагом по голосовой команде с использованием TTS.

Вывод команды:

pi@raspberrypi:~/fusion-hat/examples $ sudo python3 voice_assistant_servo.py 
2026-02-01 11:10:56.476444466 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card1/device/vendor"
[0:19:39.838079874] [3034]  INFO Camera camera_manager.cpp:340 libcamera v0.6.0+rpt20251202
[0:19:39.851522012] [3560]  INFO RPI pisp.cpp:720 libpisp version 1.3.0
[0:19:39.867269263] [3560]  INFO IPAProxy ipa_proxy.cpp:180 Using tuning file /usr/share/libcamera/ipa/rpi/pisp/imx708.json
[0:19:39.878685104] [3560]  INFO Camera camera_manager.cpp:223 Adding camera '/base/axi/pcie@1000120000/rp1/i2c@88000/imx708@1a' for pipeline handler rpi/pisp
[0:19:39.878735845] [3560]  INFO RPI pisp.cpp:1181 Registered camera /base/axi/pcie@1000120000/rp1/i2c@88000/imx708@1a to CFE device /dev/media2 and ISP device /dev/media0 using PiSP variant BCM2712_D0
[0:19:39.882767570] [3034]  INFO Camera camera.cpp:1215 configuring streams: (0) 640x480-XBGR8888/sRGB (1) 1536x864-BGGR_PISP_COMP1/RAW
[0:19:39.882904718] [3560]  INFO RPI pisp.cpp:1485 Sensor: /base/axi/pcie@1000120000/rp1/i2c@88000/imx708@1a - Selected sensor format: 1536x864-SBGGR10_1X10/RAW - Selected CFE format: 1536x864-PC1B/RAW
>>> heard: hey buddy
Waked, Listening ...
heard: why is the sky blue

The sky is blue because of the way the Earth's atmosphere scatters sunlight! Blue light waves are shorter, so they scatter more than other colors, making the sky appear blue.
>>> heard: hey buddy
Waked, Listening ...
heard: take a photo on describe what you see

I see a stuffed character, maybe a mascot, wearing a white hat with a red pompom, and a red cape or scarf. It looks like it's outdoors, possibly at an event or amusement park, with wooden structures and greenery in the background. The photo is taken from a low angle, looking up at the character.
>>> heard: hey buddy
Waked, Listening ...
heard: wave the flag

Done

Фото с камеры Raspberry Pi Camera Module 3:

Первая попытка с моделью Gemini 2.5 Flash была немного более многословной и, вероятно, описала бы также упаковку Fusion HAT+ для Raspberry Pi. Переключение на модель Gemini 2.5 Flash Lite генерирует более короткие ответы. Также был удален текст «Keep answers short, natural, and fun» из инструкций, и от Lite-модели все равно были получены относительно короткие ответы.

Это также работает от аккумулятора, поэтому пользователи могут отключить USB-кабель и использовать устройство в течение нескольких часов в автономном режиме. Кнопка питания, позволяющая выполнить безопасное выключение, также очень полезна, учитывая, что большинство будет использовать плату в headless-режиме (без дисплея).

Заключение

Было интересно экспериментировать с Fusion HAT+, подключенной к Raspberry Pi 5, поскольку SunFounder предоставляет отличную документацию для начала работы с преобразованием текста в речь (TTS), речи в текст (STT), локальными LLM/VLM через Ollama, онлайн LLM, такими как Google Gemini, а также для создания собственного голосового помощника.

Плата также предлагает интерфейсы для моторов, сервоприводов и других вводов-выводов, вместе с аккумулятором, что делает ее особенно полезной для голосоуправляемых роботов. Тем не менее, она также может быть интегрирована в контроллер для умного дома на базе Raspberry Pi или использоваться в образовательных целях, особенно с учетом достаточно конкурентной цены. Документация по управлению моторами/сервоприводами/вводами-выводами для самой Fusion HAT+ довольно ограничена, но можно положиться на документацию для предстоящего набора SunFounder AI Fusion Lab Kit, которая в этом отношении гораздо более подробна и будет поставляться с аксессуарами для создания всевозможных интересных демо и проектов.

Благодарю SunFounder за предоставление Fusion HAT+ для обзора. В настоящее время она продается за $34.99 на AliExpress , Amazon и в онлайн-магазине компании , включая аккумуляторный блок.

Выражаем свою благодарность источнику, с которого взята и переведена статья, сайту cnx-software.com.

Оригинал статьи вы можете прочитать здесь.