You don't need to be an AI or ML developer to know about the biggest tech trend of 2023. Generative AI and large language models have become a solid part of the mainstream technology lexicon, and there is no chance of these computing paradigms vanishing anytime soon. What people are realizing now is that large language models can be useful for much more than just mimicking human languages. These models could be deployed in a huge range of applications where certain types of autonomy are desirable, but where the data set and input data types are not derived from human language.
More generally, tokenization and vectorization can be applied to various types of data and used to build a generative AI model, which could then be deployed in a variety of scenarios. At the network edge, generative AI currently relies on edge servers or cloud computing resources, but soon full autonomy may be possible with inference directly on the end device. Here is what it will take to reach that level of autonomy, at both the hardware level and the software level.
Edge Generative AI in Hardware
At the hardware level, generative AI takes a significant amount of computing resources, which is why it currently exists only as part of data center architecture. AI generally requires a specialized approach to computing that enables massive parallelization and tensorial computation with low latency, but generative AI kicks these requirements into overdrive. Building generative AI-capable systems at the edge focuses on implementing tensorial compute resources in two areas of the system:
In a processor, which could use a tensorial processing block
In memory, which must allow for fast storage and access of data
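To illustrate why memory is one of the two focus areas, the short Python sketch below estimates how much storage a model's weights occupy and how memory bandwidth limits generation speed. The parameter count, bit widths, and bandwidth figure are hypothetical values chosen only for illustration, and the throughput bound assumes every weight is read from memory once per generated token, which is a simplification rather than a property of any specific device.

```python
def weight_footprint_bytes(num_params: int, bits_per_param: int) -> int:
    """Bytes needed to store the model weights alone (no activations or caches)."""
    return num_params * bits_per_param // 8


def max_tokens_per_second(num_params: int, bits_per_param: int,
                          mem_bandwidth_bytes_per_s: float) -> float:
    """Rough upper bound on generation rate, assuming each weight is
    fetched from memory exactly once per generated token."""
    return mem_bandwidth_bytes_per_s / weight_footprint_bytes(num_params, bits_per_param)


if __name__ == "__main__":
    params = 1_000_000_000   # hypothetical 1-billion-parameter model
    bandwidth = 50e9         # hypothetical 50 GB/s memory bandwidth
    for bits in (16, 8, 4):  # hypothetical storage precisions
        size_gb = weight_footprint_bytes(params, bits) / 1e9
        rate = max_tokens_per_second(params, bits, bandwidth)
        print(f"{bits}-bit weights: {size_gb:.1f} GB, at most {rate:.0f} tokens/s")
```

Under these assumptions, halving the bits stored per weight halves the footprint and doubles the bandwidth-limited token rate, which is one reason memory capacity and access speed are design constraints alongside the processor.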
To meet the demand for on-device AI inference and training, semiconductor companies are already providing small, specialized tensor processing units (TPUs) as efficient alternatives to GPUs. Development with these products is done using a developer toolkit available from the semiconductor vendor, which usually includes code examples and libraries for implementing on-device AI. This has been the dominant approach to AI development, but it has not always enabled efficient development of generative AI in embedded systems.
The Next Generation Will Become Generative
GPUs, TPUs, and even vanilla CPUs can all be used for some type of generative AI. It is really a matter of optimizing the hardware architecture and model architecture to ensure low latency inference with reasonable power consumption. In many embedded systems, form factor is also important as the demand for small devices is evergreen.
What are the processor options for generative AI at the edge? Some examples and their advantages are detailed in the list below.
Small GPUs - Many developers are familiar with these components and have experience with them, and they are proven, low-risk components.
FPGAs - These give developers hardware capabilities not found in conventional chips, and the required compute can be built as custom logic.
Small accelerator ASICs - The startup world is focusing on ASICs and small co-processors that are highly optimized for generative AI compute.
CPUs and MCUs are most likely not going to be the compute platform of choice for generative AI. The reason is quite simple: the compute architecture in CPUs and MCUs is not optimized for the highly parallelized tensorial computation required for inference in AI models. Specialized SoCs may also be an option, but AI compute blocks might be too small for on-device generative AI in these components.
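To make the parallelism argument concrete, the Python sketch below computes a single matrix-vector product, the basic operation inside a neural network layer, in two ways. It is only an illustration: the function names are hypothetical, and Python threads will not actually run this faster. The point is that every output element depends only on its own row of weights, so hardware with many parallel multiply-accumulate units can compute all of them at once, while a mostly sequential core must step through them.

```python
from concurrent.futures import ThreadPoolExecutor


def dot(row, vec):
    """One output element: a chain of multiply-accumulate (MAC) operations."""
    return sum(w * x for w, x in zip(row, vec))


def matvec_sequential(weights, vec):
    """Compute the outputs one after another, as a single scalar core would."""
    return [dot(row, vec) for row in weights]


def matvec_parallel(weights, vec, workers=4):
    """Hand each row to a separate worker; no row depends on another row."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: dot(row, vec), weights))


def mac_count(weights):
    """Total multiply-accumulates needed for one matrix-vector product."""
    return sum(len(row) for row in weights)


if __name__ == "__main__":
    weights = [[1, 2], [3, 4], [5, 6]]  # toy 3x2 weight matrix
    vec = [10, 1]
    print(matvec_sequential(weights, vec))
    print(matvec_parallel(weights, vec))
    print(mac_count(weights), "MACs")
```

Both functions return the same result. The MAC count grows with the product of the layer's dimensions, so large models quickly outgrow what a sequential core can finish within a low-latency budget.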
Instead, developers should focus on making models very specific, as this appears to be the current trend in model building. The proprietary models being developed today technically use the same architecture as LLMs, but they are not all being developed for generating human-readable language. In fact, according to a technical report from snorkel.ai, text generation is the third most common use of LLMs.
Proprietary LLMs rely on a company’s proprietary dataset. But with LLMs, the training dataset does not need to consist of samples of human language. Technically, any dataset could be used for training a generative AI model for an embedded system as long as the data can be vectorized. This is the key to implementing new types of generative autonomy for an embedded system without relying on human language as an intermediary.
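As one possible illustration of vectorizing non-language data, the Python sketch below turns a stream of numeric sensor readings into discrete tokens by uniform binning and then maps each token to a vector through a lookup table. Everything here is an assumption made for the example: the readings, the bin count, the embedding size, and the function names are hypothetical, and uniform binning is only one of many ways such data could be tokenized.

```python
import random


def fit_bins(samples, num_bins):
    """Return (lowest value, bin width) covering the observed range."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / num_bins
    return lo, (width if width > 0 else 1.0)  # avoid zero width for constant data


def tokenize(samples, lo, width, num_bins):
    """Map each reading to an integer token ID in [0, num_bins - 1]."""
    tokens = []
    for x in samples:
        idx = int((x - lo) / width)
        tokens.append(min(max(idx, 0), num_bins - 1))  # clamp out-of-range values
    return tokens


def detokenize(tokens, lo, width):
    """Approximate each reading by the centre of its bin."""
    return [lo + (t + 0.5) * width for t in tokens]


def make_embedding_table(num_bins, dim, seed=0):
    """Hypothetical untrained embedding table: one vector per token ID."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_bins)]


def embed(tokens, table):
    """Vectorize a token sequence by table lookup."""
    return [table[t] for t in tokens]


if __name__ == "__main__":
    readings = [0.0, 1.0, 2.0, 3.0, 4.0]  # hypothetical sensor samples
    lo, width = fit_bins(readings, num_bins=4)
    tokens = tokenize(readings, lo, width, num_bins=4)
    vectors = embed(tokens, make_embedding_table(num_bins=4, dim=8))
    print(tokens, len(vectors), len(vectors[0]))
```

Once the data is in this token-and-vector form, it can be fed to the same kind of sequence model used for text, and generated tokens can be mapped back to approximate readings with detokenize.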
Embedded systems that implement high-compute workloads bring thermal and power demands that can be evaluated with the best set of system analysis tools from Cadence. Only Cadence offers a comprehensive set of circuit, IC, and PCB design tools for any application and any level of complexity. Cadence PCB design products also integrate with a multiphysics field solver for thermal analysis, including verification of thermally sensitive chip and package designs.