Developer Blog

Explore GenAI applications on TensorOpera AI Platform powered by Qualcomm Cloud AI 100 Accelerators

Written by

Parmeet Kohli

Aug 29, 2024

Introduction

TensorOpera and Qualcomm Technologies announce the next step in their strategic collaboration by showcasing their capabilities through a public SDXL endpoint served on TensorOpera AI Platform powered by Qualcomm Cloud AI 100. This enables AI developers to build, deploy, and scale generative AI applications with improved performance and cost efficiency. For enterprises, this collaboration eases the challenges of developing their own generative AI applications. Together, we provide a comprehensive platform that simplifies the complexities of generative AI development and access to advanced AI hardware.

How to try the public SDXL endpoint on Qualcomm Cloud AI 100?

TensorOpera

Aug 28, 2024 | 2:35

Video Player is loading.

Current Time 0:00

Duration 2:35

Loaded: 3.86%

Stream Type LIVE

Remaining Time 2:35

Sign up on TensorOpera: https://tensoropera.ai/home
Launch the TensorOpera AI Platform and navigate to the Model Marketplace
Pick Qualcomm-SDXL in the Model Marketplace
Try the public endpoint in the Playground
Integrate the endpoint into your application with OpenAI Standard Format under API

The public endpoint on Qualcomm Cloud AI 100 is priced at $0.00005 / Step, amounting to a 50% reduction in the price-to-performance ratio in comparison to SDXL on Nvidia A100.

How to get a scalable dedicated Qualcomm Cloud AI 100 endpoint on TensorOpera?

Qualcomm Cloud AI 100

Aug 28, 2024 | 0:36

Video Player is loading.

Current Time 0:00

Duration 0:35

Loaded: 16.93%

Stream Type LIVE

Remaining Time 0:35

A scalable dedicated endpoint exclusively allocates computing resources with the ability to dynamically adjust resource allocation based on demand, ensuring consistent performance, control, and cost efficiency.

After creating a TensorOpera account, go to the Model Marketplace, choose Qualcomm-SDXL, and move to the Playground.

Hit Deploy and choose your type of compute:
1. Dedicated Qualcomm Cloud AI 100
  1. Contact TensorOpera’s team with your use case specifications and requirements to allocate compute nodes to your account. Dedicated pricing for the Qualcomm Cloud AI 100 is at $0.20 / accelerator h.
2. Serverless Qualcomm Cloud AI 100
  1. Request early access to this feature by sending a request via this form: https://tensoropera.ai/qualcomm-cloud-ai-100.
Choose the number of replicas - we recommend a minimum of 2 Qualcomm Cloud AI 100 cards per replica.
Decide if you want to auto-scale according to your user demand and pick the criteria and decision window to scale up or down.
Hit Deploy!

Also, if you want to deploy your own customized model, simply follow the documentation to create your own Model Card: https://docs.tensoropera.ai/deploy/create_model.

Why use Qualcomm Cloud AI 100 vs NVIDIA A100?

Price <> Latency Comparison	INSTANCE	LATENCY (seconds)	PRICE (card per hour)
	Qualcomm Cloud AI 100 Pro (2 cards)	4.46s	$0.2/hr
	NVIDIA A100 (1 card)	2.89s	$1.30/hr

Qualcomm Cloud AI 100 is a market-leading AI inferencing solution that offers exceptional performance efficiency, density, and cost-effectiveness. With its industry-leading AI cores, Qualcomm Cloud AI 100 offers the same performance as competing offerings at half the price, illustrated in the table above.

How does the TensorOpera and Qualcomm Technologies Collaboration work?

Explore TensorOpera AI powered by Qualcomm

TensorOpera is supporting native deployment by monitoring Qualcomm Cloud AI 100 NPU Utils. When deploying models like SDXL and Llama, users select the specific model through Nexus UI and choose serverless deployment with Qualcomm Cloud AI 100 NPUs. The control plane then monitors the available NPUs and assigns the required number of NPUs to the job. TensorOpera library then builds the docker with Qualcomm Cloud AI SDK on NPUs and serves the model by TensorOpera inference runner. Such a native integration provides a seamless deployment experience. Users can then easily deploy and monitor model endpoints with zero-code effort.

Next Steps

So now it's your turn to access the public SDXL endpoints and deploy your own SDXL endpoint.

From today, you can play around in the Playground and integrate the API to your GenAI application by following the step-by-step approach mentioned in the blog. Don’t have access to Qualcomm Cloud AI 100? Make sure to fill out the request form.

If you would like to learn more about TensorOpera, check out their website and make sure to follow TensorOpera on LinkedIn or X.

AI Machine Learning Cloud Cloud Edge

Opinions expressed in the content posted here are the personal opinions of the original authors, and do not necessarily reflect those of Qualcomm Incorporated or its subsidiaries ("Qualcomm"). The content is provided for informational purposes only and is not meant to be an endorsement or representation by Qualcomm or any other party. This site may also provide links or references to non-Qualcomm sites and resources. Qualcomm makes no representations, warranties, or other commitments whatsoever about any non-Qualcomm sites or third-party resources that may be referenced, accessible from, or linked to this site.

About the Author

Parmeet KohliProduct Manager, Staff