Sep 17, 2020
Qualcomm products mentioned within this post are offered by Qualcomm Technologies, Inc. and/or its subsidiaries.
As machine learning (ML) has grown, many open source frameworks have emerged, effectively democratizing what was once reserved for artificial intelligence (AI) researchers. Of these frameworks, TensorFlow has become one of the most popular for building and running ML models, particularly its Python implementation, which has undergone the most development.
Not only does TensorFlow have a cool name, but that name is actually derived from its inherent graph architecture, in which the nodes represent operations and the edges are the tensors that flow in and out of the nodes (hence the name "TensorFlow").
Of course, an ML framework is only part of the ML equation, and it takes a strong hardware platform to run models for inference efficiently. The Qualcomm Snapdragon mobile platform is designed for high-performance ML at the edge, and complements TensorFlow thanks to the rich set of tools in the Qualcomm Neural Processing SDK for artificial intelligence, which converts trained TensorFlow models to representations optimized for Snapdragon.
Given the popularity of TensorFlow, we thought it would be useful to explore the framework itself to see how models are generated. In this blog, we'll look at TensorFlow's API layers, see how models are exported, and review the tools of our Neural Processing SDK for artificial intelligence that bridge these models with the Snapdragon heterogeneous mobile platform.
API layers architecture
TensorFlow is supported on a wide range of hardware platforms and languages, and offers a number of APIs, as summarized in Figure 1:
Starting with its underlying framework, the TensorFlow Distributed Execution Engine (execution engine) is the core of TensorFlow that trains and runs models. As input, it takes TensorFlow programs compiled by the language-specific frontend (e.g., the Python frontend), which contain the graph of operations and tensors. In code, developers perform compilation and training on the execution engine using TensorFlow's Session object. The execution engine then optimizes execution by analyzing the graph and invoking the specified operations in parallel where possible.
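To make the graph idea a little more concrete, here is a minimal sketch in plain Python (not TensorFlow code, and not how the execution engine is actually implemented). The toy graph, its node names, and the `schedule` and `run` helpers are all hypothetical. It only illustrates how operations with no dependencies on each other can be grouped into stages that could, in principle, run in parallel:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical toy graph: each node is an operation, and each edge carries
# the value (standing in for a tensor) produced by one node into the next.
# Format: node name -> (function, names of the nodes it takes input from)
GRAPH = {
    "a": (lambda: 2.0, ()),
    "b": (lambda: 3.0, ()),
    "c": (lambda: 4.0, ()),
    "mul": (lambda x, y: x * y, ("a", "b")),
    "add": (lambda x, y: x + y, ("b", "c")),
    "out": (lambda x, y: x - y, ("mul", "add")),
}


def schedule(graph):
    """Group nodes into stages; nodes within a stage do not depend on each other."""
    sorter = TopologicalSorter({name: deps for name, (_, deps) in graph.items()})
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every node whose inputs are available
        stages.append(ready)
        sorter.done(*ready)
    return stages


def run(graph):
    """Evaluate the graph stage by stage and return every node's value."""
    values = {}
    for stage in schedule(graph):
        for name in stage:  # these could be dispatched concurrently
            fn, deps = graph[name]
            values[name] = fn(*(values[d] for d in deps))
    return values


print(schedule(GRAPH))
print(run(GRAPH)["out"])
```

Here the three constants form the first stage, `mul` and `add` form the second because neither needs the other's result, and `out` runs last.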
On top of the execution engine sits a stack of APIs, each allowing the developers to work at different levels of abstraction. For the Python frontend, these levels include:
- Layers: the first level of abstraction for building deep neural networks using very explicit ML constructs. Note that this layer has been removed in TensorFlow 2.0.
- Estimator: high-level APIs to train a model, evaluate its accuracy, and even perform inference with that model. These APIs include a number of Canned Estimators (described below), or you can customize your own based on the parameters you pass to the tf.estimator.Estimator object.
- Keras: a high-level, Python-based API that began as an implementation outside of TensorFlow. The Keras API currently provides the highest level of abstraction, allowing ML practitioners to focus on building neural networks without having to deal with low-level constructs and computations (e.g., the underlying algorithms and math). Keras has now been fully implemented on top of lower-level TensorFlow APIs and is included with the Python distribution of TensorFlow.
- Datasets: APIs to load and manipulate data, and pipe it into your model. For example, developers can use the tf.data.Dataset object to load training data, transform its shape, and then iterate on its output. The Datasets APIs can be used with a number of other API layers.
- Canned Estimators: pre-defined Estimators included with TensorFlow for common types of ML models and problems (e.g., tf.estimator.LinearRegressor for linear regression).
Of these API levels, Keras' rich, high-level API has become a popular choice for those wanting to quickly build, prototype, and experiment with models. Developers who need more control will likely use the lower-level APIs.
Checkpoints and saving
When training a model, TensorFlow developers have the option to save checkpoints, which serialize the values of the model's parameters (e.g., weights). Checkpoints can be useful in case training is interrupted and needs to be continued at a later point. Checkpoints are different from saved models (described below) in that they don't contain information about the model itself, and are only of use when loaded by the API that constructs the model (e.g., to resume training, share weights with other ML practitioners who have the same model, etc.).
There are a number of files generated by a TensorFlow checkpoint operation including:
- checkpoint: keeps a record of the latest checkpoint file(s) saved.
- .data file(s): a collection of shard files containing the model's weights, biases, gradients, and other saved variable values.
- .index file: stores which weights are stored in which shard.
- .meta file: contains the complete graph structure including variables, operations, etc.
Developers often save checkpoints after a certain number of training steps (e.g., 1000). They also typically encode the number of training steps in the checkpoint filenames to indicate the checkpoint they correspond to. For example, the following files might exist after two checkpoints:
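As a rough sketch of that naming convention, the hypothetical file names below can be grouped by training step with a few lines of Python. The `model.ckpt` prefix, the step numbers 1000 and 2000, and the helper functions are assumptions for illustration, not output from a real training run:

```python
import re
from collections import defaultdict

# Hypothetical file names; the "model.ckpt" prefix and the step numbers
# are assumptions for illustration only.
FILES = [
    "checkpoint",
    "model.ckpt-1000.data-00000-of-00001",
    "model.ckpt-1000.index",
    "model.ckpt-1000.meta",
    "model.ckpt-2000.data-00000-of-00001",
    "model.ckpt-2000.index",
    "model.ckpt-2000.meta",
]

# <prefix>-<step>.<kind>, where kind is data, index, or meta
STEP_PATTERN = re.compile(r"^(?P<prefix>.+)-(?P<step>\d+)\.(?P<kind>data|index|meta)\b")


def group_by_step(filenames):
    """Map each training step to the checkpoint files that belong to it."""
    groups = defaultdict(list)
    for name in filenames:
        match = STEP_PATTERN.match(name)
        if match:  # the bookkeeping file "checkpoint" carries no step and is skipped
            groups[int(match["step"])].append(name)
    return dict(groups)


def latest_step(filenames):
    """Return the highest training step found, or None if there is none."""
    groups = group_by_step(filenames)
    return max(groups) if groups else None


print(latest_step(FILES))
for step, names in sorted(group_by_step(FILES).items()):
    print(step, names)
```

In practice the `checkpoint` bookkeeping file already records the most recent checkpoint, so a helper like this is mainly useful for inspecting or pruning a directory by hand.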
Once the ML practitioner is happy with the trained model, it can be saved (exported) to the SavedModel format (.pb files). This is a Protobuf file format that saves everything about the model, including its execution graph, custom objects, and layers. This format provides maximum portability because it does not require the original source code to reconstruct the model, and it is generally the preferred export format for production.
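On disk, a SavedModel export is typically a directory rather than a single file: it holds `saved_model.pb` alongside a `variables/` subdirectory (and optionally `assets/`). The sketch below is a simple sanity check for that layout, written with the standard library only. The helper name and the placeholder directory are hypothetical, and the check says nothing about whether the model inside is valid:

```python
import tempfile
from pathlib import Path


def looks_like_saved_model(export_dir):
    """Return True if export_dir has the entries a SavedModel export usually contains."""
    root = Path(export_dir)
    return (root / "saved_model.pb").is_file() and (root / "variables").is_dir()


# Build a fake export directory (empty placeholder files, not a real model)
# just to exercise the check.
with tempfile.TemporaryDirectory() as tmp:
    export_dir = Path(tmp) / "exported_model"  # hypothetical path
    (export_dir / "variables").mkdir(parents=True)
    (export_dir / "saved_model.pb").touch()

    CHECK_COMPLETE = looks_like_saved_model(export_dir)
    CHECK_MISSING = looks_like_saved_model(Path(tmp) / "does_not_exist")

print(CHECK_COMPLETE, CHECK_MISSING)
```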
The process of saving the trained model for inference is sometimes called freezing the model because it eliminates any additional metadata (e.g., checkpoints).
Bridging models to Snapdragon-based devices
As mentioned, Snapdragon has rich support for running inference on trained TensorFlow models thanks to our Neural Processing SDK for artificial intelligence (AI). The inherent heterogeneous architecture of Snapdragon means that models can be executed on the platform's Qualcomm® Kryo™ CPU, Qualcomm® Adreno™ GPU, and Qualcomm® Hexagon™ DSP. This is accomplished by converting the trained TensorFlow model to the Snapdragon proprietary .dlc format using command-line tools provided in the SDK:
The SDK's tool that performs this conversion is snpe-tensorflow-to-dlc and forms the core of any TensorFlow-to-dlc pipeline. As described on the TensorFlow Model Conversion page, this tool can take either a frozen TensorFlow model or model checkpoint and convert it to .dlc format.
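As a hedged illustration, the sketch below assembles a conversion command line in Python without executing it. The flag names (`--input_network`, `--input_dim`, `--out_node`, `--output_path`) reflect how the tool has been documented in some SDK releases and should be treated as assumptions; flag names have differed between versions, so check `snpe-tensorflow-to-dlc --help` for your installation. The file names, input shape, and node names are hypothetical:

```python
import shlex


def build_convert_command(frozen_graph, input_name, input_dims, output_node, dlc_path):
    """Assemble (but do not run) a snpe-tensorflow-to-dlc invocation.

    Flag names are assumptions based on some SDK releases; verify them
    against the --help output of your installed SDK version.
    """
    dims = ",".join(str(d) for d in input_dims)
    return [
        "snpe-tensorflow-to-dlc",
        "--input_network", frozen_graph,
        "--input_dim", input_name, dims,
        "--out_node", output_node,
        "--output_path", dlc_path,
    ]


# All values below are hypothetical placeholders.
cmd = build_convert_command(
    frozen_graph="frozen_model.pb",
    input_name="input",
    input_dims=(1, 224, 224, 3),
    output_node="predictions/Softmax",
    dlc_path="model.dlc",
)
print(shlex.join(cmd))
```

Building the argument list separately from running it makes the command easy to log and review; once the flags are confirmed for your SDK version, the list could be passed to `subprocess.run(cmd, check=True)`.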
Before using the SDK, developers should consult the TensorFlow Graph Compatibility page, which lists the TensorFlow graph operations that are compatible with the network layers supported by the SDK. Developers have the option to implement unsupported layers via user-defined operations (UDOs).
The SDK also provides the following command-line tools that developers can augment their ML pipelines with:
- snpe-dlc-quantize: quantizes a .dlc file to optimize its size.
- snpe-dlc-info: saves layer information to a .csv file.
- snpe-dlc-diff: saves the differences between two .dlc files to a .csv file.
- snpe-dlc-viewer: renders the network structure of a .dlc file in a browser.
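A pipeline script might chain some of these tools after conversion. The sketch below only builds the command lines and does not run them. The flags shown (`--input_dlc`, `--input_list`, `--output_dlc` for quantization; `-i` and `-s` for the info tool) are assumptions based on some SDK releases, so confirm them with each tool's `--help` output. All file names are hypothetical:

```python
import shlex


def build_quantize_command(input_dlc, input_list, output_dlc):
    """Assemble a snpe-dlc-quantize invocation (flag names are assumptions)."""
    return [
        "snpe-dlc-quantize",
        "--input_dlc", input_dlc,
        "--input_list", input_list,  # text file listing representative input files
        "--output_dlc", output_dlc,
    ]


def build_info_command(input_dlc, csv_path):
    """Assemble a snpe-dlc-info invocation that saves layer info to a CSV file."""
    return ["snpe-dlc-info", "-i", input_dlc, "-s", csv_path]


# Hypothetical file names for illustration.
PIPELINE = [
    build_quantize_command("model.dlc", "input_list.txt", "model_quantized.dlc"),
    build_info_command("model_quantized.dlc", "model_quantized_layers.csv"),
]

for step in PIPELINE:
    print(shlex.join(step))
```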
Flowing your Tensors onto Snapdragon
TensorFlow represents models as a graph of nodes containing operations and edges composed of tensors that flow through those nodes. It is this flexibility that has helped make TensorFlow one of the most popular open source ML frameworks today. To bring these models to the edge, Qualcomm Technologies, Inc. provides an equally flexible and rich SDK that runs these models with hardware-accelerated inference on mobile devices powered by Snapdragon.