This project aims to create a USB Video Class visualizer for the frequency-domain content of the onboard microphone audio source. Imagine you are in the field and want to analyze the contents of an audio source. The FFT camera will help you achieve just that. Simply plug the FFT camera into the USB socket of a computer, open the Camera app, and voila, the whole spectrum content is presented to you.
What makes this project different from a normal FFT analyzer is its level of integration. Such integration is only possible with a fast real-time MCU such as the iMXRT1010.
In an FFT analyzer, the MCU performs data acquisition, producing a stream of data for the host PC to consume. In other words, data is sampled at regular intervals and then sent to PC software to store, analyze, and present to the user in graphical form. There is a lot of hardware on the market that takes this approach.
In the FFT visualizer that this project builds, the data is sampled, processed, and presented to the user in graphical form entirely on the iMXRT1010. The user does not need to download any special software: any computer with Windows 7 or 10 can just open the Camera app and start seeing the analysis.
The software components include:
1. MCUXpresso with the iMXRT1010 SDK
2. USB UVC device stack
3. CMSIS-DSP
We will use the ARM CMSIS-DSP library to compute the FFT, so the arm_cortexM7lfsp_math.a library needs to be linked and its header files included. This library operates on the float32_t data type, but the original data sampled from the audio source arrives as bytes, so we need to convert it with the correct bit width:
uint32_t sampleRate, uint32_t bitWidth, uint8_t *buffer, float32_t *fftData, float32_t *fftResult)
This operation is done every time a new sample is captured (FFT_CAMERA_DoCapture).
The FFT results are used to draw pixels on a bitmap of a given width and height, which is subsequently streamed over USB. The dimensions of this bitmap are constrained by the memory available on the device.
There are two types of video data that can be sent to the PC: compressed formats such as JPEG and MJPEG, and uncompressed formats such as RGB and YUV. In my initial testing, although JPEG does reduce the amount of streamed data, it demands even more memory, since the conversion to JPEG requires an intermediate buffer. This leaves either an RGB or a YUV format to choose from.
My requirement was a colored spectrum and as large a video frame as possible for high resolution. RGB24 requires 3 bytes per pixel; for a 480x360 frame that is ~518 KB of memory (8x over budget). I then considered RGB565, which requires 2 bytes per pixel; this translates to ~345 KB (5x over budget). Next came 8-bit grayscale at ~172 KB (still over budget). As a last resort I chose black/white, which stores 1 bit of memory per pixel and requires only ~21 KB. From experiments, the largest frame I can fit is 800x360, and this provides decent visualization. Unfortunately, there is only enough memory for a single frame buffer.
USB Video Class
The USB descriptors for UVC are well defined; the important detail in the descriptor is the GUID, which describes the format of the data stream. The identifiers are published here.
Unfortunately, Linux does not implement all of these GUIDs. When I chose MEDIASUBTYPE_RGB565, for example, the USB device was not enumerated correctly and all sorts of errors showed up in the log. I found MEDIASUBTYPE_RGB1 to be supported, and the enumeration then looks like this (no errors):
Processing signals and video streams is inherently memory intensive. The IMXRT1010-EVK comes with the MIMXRT1011DAE5A, which has 64 KB of ROM and 128 KB of RAM.
Assuming a data input size of 2048 bytes, the estimated memory requirements for the FFT buffers are:
fft_data : 0x2000
fft_results : 0x4000
For the camera buffer, the linker map reports:
Memory region    Used Size   Region Size   %age Used
BOARD_FLASH:     112316 B    16 MB         0.67%
SRAM_OC:         61860 B     64 KB         94.39%
NCACHE_REGION:   0 GB        0 GB          -nan%
SRAM_DTC:        22316 B     32 KB         68.10%
SRAM_ITC:        0 GB        32 KB         0.00%
As shown in the memory section, an unconventional data format is used to support high resolution, so a normal camera app cannot interpret this data. However, Linux provides the v4l2 tools to capture raw video data and ImageMagick to convert the raw video data to an image format.
# capture 1 frame from /dev/video1 and save it to a file
v4l2-ctl --device /dev/video1 --stream-mmap --stream-to=frame.raw --stream-count=1
# convert file with GRAY format to PNG image
convert -size 800x360 -depth 1 -colors 2 -format GRAY GRAY:frame.raw frame.png