
A Workflow For TinyML Development


Now that commercial AI applications have gone well beyond recommendation engines and anomaly detection, more companies are looking for cost-effective and power-efficient ways to add AI to their edge devices. While you could bootstrap a stack of GPUs onto a huge motherboard and server power supply in a big metal box, this is simply not practical for today’s cutting-edge applications. Power, footprint, and cost are the biggest barriers to deploying AI on small microcontrollers, and these systems are only possible with a development framework that enables highly efficient AI inference without relying on the cloud.

TinyML is a development concept that stresses all of these points. The goal in TinyML development processes is to arrive at a model that can be deployed on custom hardware without excessive compute and power requirements. However, as AI moves closer to custom embedded devices, more companies will struggle to find a useful workflow and process for developing AI-capable embedded applications. We’ll examine a typical workflow in this article.

TinyML Workflow Example

TinyML development concepts focus on bringing intelligence to the devices deployed at the edge. While there is embedded application development required for the new system, a device deployed at the edge is also a piece of hardware that is typically optimized for the AI-enabled task being performed. Therefore, TinyML development and hardware development are a linked process, where design choices in one area influence the other area.

Hardware Side

Companies often use custom hardware for edge-capable systems requiring an embedded AI application. The hardware platform could be modular, totally custom, or built from an off-the-shelf module (CoM, SoM, SBC, etc.). Many products start from development boards or open-source projects that have a desired chipset and/or peripherals, or a codebase that can be extended to implement AI-driven functionality.

There are many specifications that drive hardware development for edge AI systems, so we won’t repeat them all here. The table below shows the major points to consider in developing a TinyML embedded system.


Processor
  • Speed
  • Built-in cache
  • Interfaces for data Rx/Tx

Peripherals
  • Sensors
  • ASICs
  • Interfaces for data Rx/Tx

Acceleration
  • Hardware-driven (co-processor)
  • Model-based

Processors intended for TinyML applications are primarily small FPGAs and small microcontrollers. This forces the development team to optimize the embedded AI model for minimum compute requirements while still addressing the intended task or problem.

AI Development Side

Developing an AI-enabled application is not just about using AI to accomplish a specific task; it is about building a dataset and model that provide highly accurate results when applied to the task or problem at hand. A workflow for AI development can proceed as follows:

Problem definition and dataset creation: The first step is to understand the problem at hand and define the task for the AI/ML model. This task could be classification, regression, anomaly detection, etc. We then collect and create a dataset relevant to the problem. This dataset must be representative of the real-world scenarios that the device will encounter. It may involve collecting sensor data, images, audio, or other relevant data from the field. Data augmentation techniques might be used to artificially increase the size of the dataset.
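As a sketch of the augmentation idea above, the snippet below expands a small set of 1-D sensor traces by adding Gaussian noise and random amplitude scaling, two common augmentation techniques for sensor data (the helper name `augment` and all parameter values here are illustrative, not from the article):

```python
import random

def augment(samples, copies=4, noise_std=0.05, scale_range=(0.9, 1.1), seed=42):
    """Expand a dataset of 1-D sensor traces by jittering each trace.

    Each augmented copy gets additive Gaussian noise and a random
    amplitude scale, simulating sensor variation in the field.
    """
    rng = random.Random(seed)
    augmented = list(samples)  # keep the originals
    for trace in samples:
        for _ in range(copies):
            scale = rng.uniform(*scale_range)
            augmented.append([scale * x + rng.gauss(0.0, noise_std) for x in trace])
    return augmented

raw = [[0.1, 0.4, 0.9], [0.2, 0.5, 0.7]]
data = augment(raw)
print(len(data))  # 2 originals + 2 traces * 4 copies = 10 samples
```

The same pattern extends to images (random crops, flips) or audio (time shifts, pitch changes), with the augmentation chosen to match the variation the device will actually see.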

Preprocessing and feature extraction: Data preprocessing is an essential step to make raw data suitable for training a machine learning model. This could involve tasks like noise reduction, normalization, or inferring/interpolating missing data. In many cases, especially in signal processing tasks, feature extraction is applied to isolate the portions of the input data that are most informative for training a model.
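One normalization technique mentioned above, z-score scaling, can be sketched in plain Python (an assumed example; the article does not prescribe a specific method):

```python
import math

def zscore_normalize(values):
    """Shift a feature to zero mean and unit variance so that
    features with different raw scales contribute comparably."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(var) or 1.0  # avoid dividing by zero for constant signals
    return [(v - mean) / std for v in values]

readings = [12.0, 15.0, 14.0, 10.0, 13.0]  # e.g., raw temperature samples
normalized = zscore_normalize(readings)
```

Whatever preprocessing is chosen, the exact same transform must later be replicated in the device firmware, or training-time and inference-time inputs will not match.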

Model development: We select an appropriate model architecture based on the problem and the computational resources available on the device. This could be a simple linear model, a decision tree, a neural network, or a more complex architecture. Factors like model size, inference speed, and power consumption are considered while making this decision.
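The model-size factor above can be checked with simple arithmetic before any training happens. The sketch below estimates the parameter count and memory footprint of a small fully connected network (the layer sizes are a hypothetical keyword-spotting model, not from the article):

```python
def dense_param_count(layer_sizes):
    """Parameters of a fully connected network: weights plus biases per layer."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical model: 64 input features -> 32 -> 16 -> 4 output classes
params = dense_param_count([64, 32, 16, 4])
print(params)                    # 2080 + 528 + 68 = 2676 parameters
print(params * 4, "bytes fp32")  # ~10.7 KB stored as float32
print(params * 1, "bytes int8")  # ~2.7 KB after 8-bit quantization
```

Comparing that footprint against the target microcontroller's flash and RAM budget quickly rules architectures in or out before any expensive training runs.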

Model training: We use the prepared dataset to train the model using a suitable machine learning framework, such as TensorFlow, PyTorch, or others. This is usually done in the cloud where GPU resources can be accessed. Once a model is trained, it is validated using a separate validation set. This involves measuring metrics relevant to the task, such as accuracy, precision, recall, or mean absolute error.
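The validation metrics named above can be computed without any framework; the sketch below does so for a binary classification task (the labels are illustrative):

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for a binary task (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

truth = [1, 0, 1, 1, 0, 0, 1, 0]
preds = [1, 0, 0, 1, 0, 1, 1, 0]
acc, prec, rec = classification_metrics(truth, preds)
```

Which metric matters most depends on the task: for anomaly detection, a missed fault (low recall) is usually costlier than a false alarm (low precision).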

Model optimization: Optimization is used to make a model more suitable for running on low-resource devices and involves techniques that reduce a model's size and computational requirements. This includes techniques such as:

  • Quantization
  • Pruning
  • Model compression
  • Conversion to fixed-point arithmetic
  • Sparsity elimination
  • Additional pre-processing of training and input data

This is usually done using tools provided with the machine learning framework (e.g., TensorFlow) used to build and deploy the model.
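To make the first technique in the list above concrete, here is a simplified symmetric int8 quantization scheme in plain Python: each float32 weight is mapped to an 8-bit integer plus a shared scale factor. This is a sketch of the idea only; in practice the framework's own converter (e.g., TensorFlow Lite) performs this step:

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: w ≈ scale * q, with q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for comparison against the originals."""
    return [scale * v for v in q]

weights = [0.42, -1.27, 0.08, 0.91, -0.35]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored weight lies within one quantization step (scale) of the original
```

The payoff is a 4x reduction in weight storage (int8 vs. float32) and the ability to run inference with integer arithmetic, which many small microcontrollers execute far faster than floating point.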

Firmware integration: The optimized model is then converted into a format that can be used for inference on the target device and is integrated into the firmware. An inference engine that is compatible with the hardware is used to load the model and perform inference. The firmware is also responsible for preprocessing input data gathered during operation in the field, and for handling the output of the model within a larger application.
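The shape of that integration step can be sketched as a loop that preprocesses a raw sample, runs the model, and dispatches the result. This sketch is in Python for readability; real firmware would be C/C++ calling an inference engine such as TensorFlow Lite for Microcontrollers, and every name and number here is a hypothetical placeholder:

```python
def preprocess(raw, offset=128, scale=1 / 128):
    """Mirror the training-time preprocessing on-device: center and scale."""
    return [(x - offset) * scale for x in raw]

def run_inference(model, features):
    """Toy stand-in for an inference-engine call: one dense layer plus argmax."""
    scores = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(model["weights"], model["biases"])]
    return scores.index(max(scores))

# Hypothetical 2-class model over 3 input features
model = {"weights": [[0.5, -0.2, 0.1], [-0.4, 0.3, 0.2]],
         "biases": [0.0, 0.1]}

raw_sample = [200, 90, 150]  # e.g., raw ADC readings from a sensor
label = run_inference(model, preprocess(raw_sample))
```

The key design point is that `preprocess` must reproduce exactly what was done to the training data; the application code around the loop then decides what each output label triggers (an alert, an actuator, a logged event).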

Iterative refinement: Trained models that are accessed in an embedded application may require fine-tuning over time, which involves continuous monitoring of inference results against known-good test cases. Based on inference results observed from the field, the model may need to be refined with additional training or a revised dataset. This is an iterative process that continues until the required performance and accuracy are achieved.
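The monitoring of known-good test cases described above can be sketched as a simple regression check that flags the model for refinement when field accuracy drops below a threshold (the threshold and sample labels are illustrative):

```python
def needs_retraining(expected, observed, min_accuracy=0.9):
    """Flag the model for refinement when its accuracy on known-good
    test cases falls below the acceptance threshold."""
    correct = sum(1 for e, o in zip(expected, observed) if e == o)
    accuracy = correct / len(expected)
    return accuracy < min_accuracy

known_good = ["ok", "fault", "ok", "ok", "fault"]
field_results = ["ok", "fault", "ok", "fault", "fault"]  # one regression
flag = needs_retraining(known_good, field_results)       # 0.8 < 0.9 -> True
```

Automating a check like this closes the loop: a raised flag feeds back into dataset revision and retraining rather than requiring ad hoc manual review.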

This process is inherently iterative, and there will often be many loops back and forth as the model is optimized and refined to meet the specifications. With each iteration, the model becomes better suited to its target application.

Whenever your team is building edge computing products with an embedded AI application, make sure you use the complete set of system analysis tools from Cadence to evaluate systems functionality. Only Cadence offers a comprehensive set of circuit, IC, and PCB design tools for any application and any level of complexity. Cadence PCB design products also integrate with a multiphysics field solver for thermal analysis, including verification of thermally sensitive chip and package designs.

Subscribe to our newsletter for the latest updates. If you’re looking to learn more about how Cadence has the solution for you, talk to our team of experts.
