Creating a microservice using the Qualcomm AI Inference Suite
Extending the idea of using AI for customer sentiment analysis from our last blog post, we show how to create a microservice that performs the same task. We use the Go (Golang) programming language because it compiles to a very small executable that can easily be containerized and scaled across any number of AI inference backends.
Setup
When it comes time to use AI inference in production, a common software architecture pattern is to use a microservice that is purpose built to do one thing and do it well. In this case, we want to create a service that does a specific AI inference task.
By encapsulating functionality this way, we can more easily scale the microservice to meet demand and abstract the backend doing the actual inference. It also allows us to use any number of backends to do the same job. We could task the microservice with logging requests, providing observability data, or other typical MLOps work.
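To make the idea of abstracting the backend concrete, here is a minimal sketch of how several backends could sit behind one interface. The names Evaluator, fakeBackend, and roundRobin are hypothetical and not part of the sample repository; the fake backends stand in for real inference endpoints so the sketch runs offline.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Evaluator is a hypothetical interface for anything that can classify feedback.
type Evaluator interface {
	Evaluate(feedback string) (string, error)
}

// fakeBackend stands in for a real inference endpoint so the sketch runs offline.
type fakeBackend struct {
	name string
}

func (b fakeBackend) Evaluate(feedback string) (string, error) {
	return "positive (via " + b.name + ")", nil
}

// roundRobin spreads calls across several backends in turn.
// It satisfies Evaluator itself, so callers cannot tell how many backends exist.
type roundRobin struct {
	next     uint64 // request counter, updated atomically
	backends []Evaluator
}

func (r *roundRobin) Evaluate(feedback string) (string, error) {
	if len(r.backends) == 0 {
		return "", fmt.Errorf("no backends configured")
	}
	i := atomic.AddUint64(&r.next, 1) - 1
	return r.backends[i%uint64(len(r.backends))].Evaluate(feedback)
}

func main() {
	var svc Evaluator = &roundRobin{backends: []Evaluator{
		fakeBackend{name: "backend-a"},
		fakeBackend{name: "backend-b"},
	}}
	for i := 0; i < 3; i++ {
		out, err := svc.Evaluate("These running shoes are fantastic.")
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(out)
	}
}
```

Because the balancer implements the same interface as a single backend, the rest of the service would not need to change when backends are added or swapped.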
Using the Qualcomm AI Inference Suite running on Cirrascale infrastructure (powered by Qualcomm Cloud AI accelerators), I’ll demonstrate how to build a microservice using the Go programming language.
Sample Scenario
Imagine you have been tasked with looking for trends in how customers react to the various product or service offerings of your company. Company leadership wants to know which products customers are happy with and which might need more work to improve. The data you have is a set of customer reviews that buyers have written after purchase.
In the past, you might have purchased a customer sentiment analysis package that ingests these feedback records and then presents a series of dashboards for analysis. AI can do this work without the need for separate software licenses. The approach we suggest here is to use AI to evaluate customer feedback, and then store the result tied back to the products in question. Now you can use standard reporting to see where customers were happy, neutral, or negative relative to your products.
Taking it a step further, you could also use AI to extract the reason(s) why they rated your products the way they did. Of course, this depends on the review. A review saying, “I loved this product,” doesn’t tell you what about it they liked, while a review saying, “The power button is hard to reach,” provides actionable feedback for improvement.
What I’ll show here is a simple microservice that takes the text of a customer’s feedback as input and returns a sentiment as output.
Steps
- Create a simple web service that accepts the text of customer feedback
- Parse the request to ensure it is valid and in the correct format
- Build our prompt to have AI evaluate it
- Call our AI endpoint
- Return the result
- Test the service by calling it with curl
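As an illustration of the prompt-building and result steps in the list above, here is a minimal sketch. The prompt wording and the helper names buildPrompt and normalizeSentiment are assumptions made for this example; the sample repository may phrase its prompt differently.

```go
package main

import (
	"fmt"
	"strings"
)

// buildPrompt wraps the feedback text in an instruction for the model.
// The wording is an assumption, not the prompt used in the sample repository.
func buildPrompt(feedback string) string {
	return "Classify the sentiment of the following customer feedback. " +
		"Answer with exactly one word: positive, neutral, or negative.\n\n" +
		"Feedback: " + strings.TrimSpace(feedback)
}

// normalizeSentiment maps a model reply onto one of the three expected labels.
// It returns false when the reply is not one of them, so the caller can decide
// whether to retry or report an error.
func normalizeSentiment(reply string) (string, bool) {
	cleaned := strings.ToLower(strings.TrimSpace(reply))
	cleaned = strings.Trim(cleaned, ".!\"'")
	switch cleaned {
	case "positive", "neutral", "negative":
		return cleaned, true
	}
	return "", false
}

func main() {
	fmt.Println(buildPrompt("These running shoes are fantastic."))

	label, ok := normalizeSentiment(" Positive.\n")
	fmt.Println(label, ok)

	label, ok = normalizeSentiment("I think the customer is happy")
	fmt.Println(label, ok)
}
```

Normalizing the reply is a design choice: models sometimes add punctuation or capitalization, and clients of the microservice are easier to write if they only ever see one of three fixed labels.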
In this sample, we used only the Go standard libraries and one convenience library (godotenv) to simplify storing our endpoint and API key. There are other third-party libraries that make creation of microservices easier and more convenient, but for the sake of simplicity we have stuck with the bare minimum.
Sample Run
Let’s first set up our server. After getting the code from the GitHub repository, you can run this sample from a terminal window on your PC (assuming Go is already installed) by first adding the godotenv library, then updating your project, and then running the program.
>go get github.com/joho/godotenv
>go mod tidy
>go run .\micro.go
Once it is running, the server listens on port 8080, and we can call it from a different terminal window (the client) using the curl utility. curl is a command-line tool for sending HTTP requests, with options to add headers and content. We use curl here to act as the ‘client’ of our microservice running as a ‘server.’
>curl -X POST -H "Content-Type: application/json" -d '{"Feedback":"These running shoes are fantastic."}' http://localhost:8080/
If everything is working correctly, you’ll see this output on the original (server) window.
>go run .\micro.go
These running shoes are fantastic.
positive
As you can see, the service echoes the feedback text and, on the next line, prints a simple one-word evaluation of positive, neutral, or negative. You can try out different sets of feedback by modifying the curl command to see what happens.
Code Notes
This sample consists of just three functions: main, microHandler, and evalFeedback. The main program simply loads environment variables (endpoint and API key) and starts a server listening on port 8080 using Go’s standard net/http library.
microHandler does some rudimentary error checking and then reads in the JSON format of a request and maps it into a Go data structure. It then extracts the text of customer feedback and calls the third function, evalFeedback.
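A handler along these lines might look like the sketch below. It is not the code from the repository: newMicroHandler, feedbackRequest, and the post helper are hypothetical names, the evaluation function is a stub, and writing the sentiment to the HTTP response is an assumption. The main function exercises the handler in-process so the sketch runs without opening a port.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// feedbackRequest mirrors the JSON body used in the curl example: {"Feedback":"..."}.
type feedbackRequest struct {
	Feedback string `json:"Feedback"`
}

// newMicroHandler returns a handler that validates the request and passes the
// feedback text to evaluate, which stands in for the call to the AI endpoint.
func newMicroHandler(evaluate func(string) (string, error)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "only POST is supported", http.StatusMethodNotAllowed)
			return
		}
		// Cap the body size so an oversized request cannot exhaust memory.
		body := http.MaxBytesReader(w, r.Body, 1<<20)
		var req feedbackRequest
		if err := json.NewDecoder(body).Decode(&req); err != nil {
			http.Error(w, "invalid JSON body", http.StatusBadRequest)
			return
		}
		if strings.TrimSpace(req.Feedback) == "" {
			http.Error(w, "Feedback must not be empty", http.StatusBadRequest)
			return
		}
		sentiment, err := evaluate(req.Feedback)
		if err != nil {
			http.Error(w, "inference backend error", http.StatusBadGateway)
			return
		}
		fmt.Fprintln(w, sentiment)
	}
}

// post sends body to h in-process and returns the status code and response text.
func post(h http.Handler, method, body string) (int, string) {
	req := httptest.NewRequest(method, "/", strings.NewReader(body))
	req.Header.Set("Content-Type", "application/json")
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	// The stub always answers "positive"; a real service would call the AI endpoint.
	stub := func(string) (string, error) { return "positive", nil }
	h := newMicroHandler(stub)

	code, out := post(h, http.MethodPost, `{"Feedback":"These running shoes are fantastic."}`)
	fmt.Println(code, out)

	code, out = post(h, http.MethodPost, `not json`)
	fmt.Println(code, out)
}
```

Passing the evaluation function in as a parameter keeps the handler testable without a live endpoint and API key.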
evalFeedback is where the magic happens. It creates a JSON payload by building up the correct format to submit to the Qualcomm AI Inference Suite API and then makes the call. The result is extracted and returned to the microHandler function, which then outputs the result.
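The request side of that call could be sketched as follows. This assumes an OpenAI-style chat completions body (a model name plus a list of messages) and a bearer-token Authorization header; check the API documentation for the exact format. The model name, the URL, and the helper names buildPayload, callEndpoint, and fakeClient are all hypothetical, and the fake client answers in-process so nothing is sent over the network.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// chatMessage and chatRequest assume an OpenAI-style chat completions body.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// buildPayload serializes a single user message for the given model.
func buildPayload(model, prompt string) ([]byte, error) {
	return json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
	})
}

// callEndpoint posts the payload and returns the raw response body.
// The bearer-token Authorization header is an assumption.
func callEndpoint(client *http.Client, url, apiKey string, payload []byte) ([]byte, error) {
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+apiKey)

	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("endpoint returned %s: %s", resp.Status, body)
	}
	return body, nil
}

// roundTripFunc lets a plain function act as an http.RoundTripper.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// fakeClient answers every request with the given status and body, in-process.
func fakeClient(status int, reply string) *http.Client {
	return &http.Client{Transport: roundTripFunc(func(r *http.Request) (*http.Response, error) {
		return &http.Response{
			StatusCode: status,
			Status:     fmt.Sprintf("%d %s", status, http.StatusText(status)),
			Header:     make(http.Header),
			Body:       io.NopCloser(strings.NewReader(reply)),
			Request:    r,
		}, nil
	})}
}

func main() {
	payload, err := buildPayload("example-model", "Classify: These running shoes are fantastic.")
	if err != nil {
		fmt.Println("payload error:", err)
		return
	}
	fmt.Println(string(payload))

	client := fakeClient(http.StatusOK, `{"choices":[{"message":{"content":"positive"}}]}`)
	body, err := callEndpoint(client, "http://inference.example/v1/chat/completions", "test-key", payload)
	if err != nil {
		fmt.Println("call error:", err)
		return
	}
	fmt.Println(string(body))
}
```

In a real service you would pass an http.Client with a timeout instead of the fake client, and read the endpoint URL and API key from the environment.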
If you aren’t familiar with the Go programming language, the code here may seem a bit convoluted, because you have to define data structures that mirror the expected JSON before you can access the values inside. Use the API documentation to see what JSON is expected and returned for each API endpoint. Here is the code that takes the data returned from the API and extracts the result of our AI call:
// Define a struct matching the relevant part of the JSON response
var result struct {
	Choices []struct {
		Message struct {
			Content string `json:"content"`
		} `json:"message"`
	} `json:"choices"`
}
// Unmarshal the JSON response body into the struct
if err := json.Unmarshal([]byte(body), &result); err != nil {
	panic(err)
}
// Guard against an empty choices array before indexing into it
if len(result.Choices) == 0 {
	panic("response contained no choices")
}
// Extract the content from the first choice's message
content := result.Choices[0].Message.Content
return content
Try it out yourself
Using a pattern where the microservice abstracts the call to AI means that you can achieve added flexibility as you build production systems. Benefits include:
- Changing out your AI model at will without breaking your API contract with clients
- Adding load balancing across more than one AI backend
- Observability – logging call, result, and metric data for analysis
- Billing – tracking client use in a production system
- Caching – return from the cache instead of incurring more AI inference cost
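As one example from the list above, caching can be added as a thin wrapper around whatever function calls the AI backend. This is a minimal sketch with hypothetical names (cachedEvaluator, newCachedEvaluator); it keeps results in memory with no eviction or expiry, which a production cache would need.

```go
package main

import (
	"fmt"
	"sync"
)

// cachedEvaluator remembers the sentiment for feedback text it has already seen.
type cachedEvaluator struct {
	mu       sync.Mutex
	results  map[string]string
	evaluate func(string) (string, error)
}

// newCachedEvaluator wraps evaluate, which stands in for the call to the AI backend.
func newCachedEvaluator(evaluate func(string) (string, error)) *cachedEvaluator {
	return &cachedEvaluator{
		results:  make(map[string]string),
		evaluate: evaluate,
	}
}

// Evaluate returns the cached result when there is one and calls the backend
// otherwise. Errors are not cached. Two concurrent requests for the same new
// text may both reach the backend; the lock is not held during the call so
// that slow inference does not block unrelated requests.
func (c *cachedEvaluator) Evaluate(feedback string) (string, error) {
	c.mu.Lock()
	cached, ok := c.results[feedback]
	c.mu.Unlock()
	if ok {
		return cached, nil
	}

	sentiment, err := c.evaluate(feedback)
	if err != nil {
		return "", err
	}

	c.mu.Lock()
	c.results[feedback] = sentiment
	c.mu.Unlock()
	return sentiment, nil
}

func main() {
	backendCalls := 0
	cache := newCachedEvaluator(func(string) (string, error) {
		backendCalls++ // count how often the (stubbed) backend is reached
		return "positive", nil
	})

	for i := 0; i < 3; i++ {
		sentiment, err := cache.Evaluate("These running shoes are fantastic.")
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(sentiment)
	}
	fmt.Println("backend calls:", backendCalls)
}
```

Clients of the microservice see the same request and response either way, which is what makes it safe to add this kind of optimization later.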
Creating a microservice to perform a single AI task is a common way to simplify the dependencies of code so that you have flexibility in replacing your model and can scale without affecting client code. Try out this sample and let us know over on the Qualcomm Cloud AI Discord channel what you’ve created for your own scenarios.
The process of using inference on a scalable platform like the Qualcomm AI Inference Suite is as straightforward as using any other simple API.
Star this repo to follow updates in the future as we create more code samples.

