What’s covered?
Azure Stream Analytics is a real-time analytics and complex event-processing engine designed to analyze and process high volumes of fast-streaming data from multiple sources simultaneously. It supports the notion of a Job, each of which consists of an input, a query, and an output. Azure Stream Analytics can ingest data from Azure Event Hubs (including Azure Event Hubs for Apache Kafka), Azure IoT Hub, or Azure Blob Storage. The query, which is based on the SQL query language, can be used to easily filter, sort, aggregate, and join streaming data over a period of time.
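For instance, a minimal sketch of such a query that counts incoming events over ten-second windows might look like this (the input alias orders is an assumption for illustration, not taken from any particular job):

```sql
-- Illustrative sketch: count incoming events per 10-second tumbling window.
-- "orders" is an assumed input alias.
SELECT
    COUNT(*) AS orderCount,
    System.Timestamp() AS windowEnd
FROM [orders]
GROUP BY TumblingWindow(second, 10)
```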
Assume you have an application that accepts orders from customers and sends them to Azure Event Hubs. The requirement is to process the "raw" orders data and enrich it with additional customer info such as name, email, location, etc. To get this done, you can build a downstream service that consumes these orders from Event Hubs and processes them. In this example, this service happens to be an Azure Stream Analytics job (which we'll explore later, of course!)
In order to build this app, we would need to fetch this customer data from an external system (for example, a database), and for each customer ID in the order info, we would query this system for the customer details. This approach will suffice for systems with low-velocity data or where end-to-end processing latency isn't a concern, but it poses a challenge for real-time processing of high-velocity streaming data.
Of course, this is not a novel problem! The purpose of this blog post is to showcase how you can use Azure Stream Analytics to implement a solution. Here are the individual components:
An individual order is a JSON payload that looks like this:
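The original payload isn't reproduced here. A plausible shape, given that the query described later matches on a customer ID field named id, might be (all other field names are assumptions for illustration):

```json
{
  "id": "c-001",
  "orderId": "o-1001",
  "amount": 24.99,
  "purchaseTime": "2020-01-01T10:30:00Z"
}
```

Only the id field (the customer ID) matters for the join; everything else is just order detail.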
This is the workhorse of our solution! It joins (a continuous stream of) orders data from Azure Event Hubs with the static reference customers data, based on the matching customer ID (which is id in the customers data set and id in the orders stream).
In this section, you’ll:
Please note that you need to create two topics: orders (the input topic) and customer-orders (the output topic).
Save the JSON below to a file and upload it to the storage container you just created.
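The original reference file isn't reproduced here. Given the enrichment fields mentioned earlier (name, email, location) and a matching id field, a plausible customers data set might look like this (all values are illustrative):

```json
[
  { "id": "c-001", "name": "Jane Doe", "email": "jane@example.com", "location": "Seattle" },
  { "id": "c-002", "name": "John Smith", "email": "john@example.com", "location": "New Delhi" }
]
```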
To configure Azure Event Hubs Input:
Open the Azure Stream Analytics job you just created and configure Azure Event Hubs as an Input. Here are some screenshots which should guide you through the steps:
Choose Inputs from the menu on the left
Select + Add stream Input > Event Hub
Enter Event Hubs details — the portal provides you the convenience of choosing from existing Event Hub namespaces and respective Event Hub in your subscription, so all you need to do is choose the right one.
To configure Azure Blob Storage Input:
Choose Inputs from the menu on the left
Select Add reference input > Blob storage
Enter/choose Blob Storage details
Once you’re done, you should see the following Inputs:
Azure Stream Analytics allows you to test your streaming queries with sample data. In this section, we’ll upload sample data for orders and customer information for the Event Hubs and Blob Storage inputs respectively.
Open the Azure Stream Analytics job, select Query and upload sample orders data for the Event Hubs input.
Save the JSON below to a file and upload it.
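The original sample file isn't shown here. Assuming the illustrative order shape used earlier (an id field holding the customer ID), a small sample might be:

```json
[
  { "id": "c-001", "orderId": "o-1001", "amount": 24.99, "purchaseTime": "2020-01-01T10:30:00Z" },
  { "id": "c-002", "orderId": "o-1002", "amount": 9.50, "purchaseTime": "2020-01-01T10:31:00Z" }
]
```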
Open the Azure Stream Analytics job, select Query and upload sample customers data for the Blob storage input.
You can upload the same JSON file that you uploaded to Blob Storage earlier.
Now, configure and run the below query:
Open the Azure Stream Analytics job, select Query and follow the steps as depicted in the screenshot below:
Select Query > enter the query > Test query and don’t forget to select Save query
The query joins orders data from Event Hubs with the static reference customers data (from Blob storage) based on the matching customer ID (which is id in the customers data set and id in the orders stream).
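The exact query isn't reproduced here, but based on the description above, a sketch would be along these lines (assuming the inputs are named orders and customers, the output is customer-orders, and the field names are the illustrative ones used earlier):

```sql
-- Illustrative sketch of the streaming join.
-- Reference-data joins in Stream Analytics don't need a DATEDIFF condition.
SELECT
    o.orderId,
    o.amount,
    o.purchaseTime,
    c.name,
    c.email,
    c.location
INTO [customer-orders]
FROM [orders] o
JOIN [customers] c
    ON o.id = c.id
```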
It was nice to have the ability to use sample data for testing our streaming solution. Let’s go ahead and try this end to end with actual data (orders) flowing into Event Hubs.
An Output is required in order to run a Job. To configure the Output, select Output > + Add > Event Hub.
Enter Event Hubs details: the portal provides you the convenience of choosing from existing Event Hub namespaces and respective Event Hub in your subscription, so all you need to do is choose the right one.
In the Azure Stream Analytics interface, select Overview, click Start and confirm
Wait for the Job to start; you should see the Status change to Running.
Start a consumer to listen to the Event Hubs output topic.
Create a kafkacat.conf file with Event Hubs connection info:
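The original file contents aren't shown here. For the Event Hubs Kafka endpoint, a kafkacat.conf typically looks like this (replace the namespace and connection string placeholders with your own values; the literal username $ConnectionString is how Event Hubs expects SASL/PLAIN auth):

```
metadata.broker.list=<EVENTHUBS-NAMESPACE>.servicebus.windows.net:9093
security.protocol=SASL_SSL
sasl.mechanisms=PLAIN
sasl.username=$ConnectionString
sasl.password=<EVENTHUBS-CONNECTION-STRING>
```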
Let's first start the consumer process that will connect to the output topic (customer-orders) and receive the enriched order information from Azure Stream Analytics.
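In a terminal, a consumer invocation along these lines should work (kafkacat picks up the config file via the KAFKACAT_CONFIG environment variable; the topic name customer-orders comes from the setup above):

```shell
# Point kafkacat at the Event Hubs config created earlier,
# then consume from the output topic.
export KAFKACAT_CONFIG=kafkacat.conf
kafkacat -C -t customer-orders -o end
```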
This will block, waiting for records from customer-orders. In another terminal, start sending order info to the orders topic.
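A producer invocation might look like this (again assuming kafkacat picks up the same config file):

```shell
# Produce records from stdin to the "orders" input topic.
export KAFKACAT_CONFIG=kafkacat.conf
kafkacat -P -t orders
```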
You can send order data via stdin. Simply paste these one at a time and observe the output in the other terminal:
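For example, using the illustrative order shape from earlier (where id is the customer ID), you might paste records such as:

```json
{"id": "c-001", "orderId": "o-2001", "amount": 42.00, "purchaseTime": "2020-01-01T11:00:00Z"}
{"id": "c-999", "orderId": "o-2002", "amount": 13.25, "purchaseTime": "2020-01-01T11:01:00Z"}
```

(The second record uses a customer ID that isn't in the reference data, so it should produce no enriched output.)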
The output you see on the consumer terminal should be similar to this:
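Assuming reference data that carries name, email, and location fields for the matching customer ID, an enriched record might look similar to this (field values are illustrative):

```json
{"orderId": "o-2001", "amount": 42.00, "purchaseTime": "2020-01-01T11:00:00Z", "name": "Jane Doe", "email": "jane@example.com", "location": "Seattle"}
```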
As expected, you won't see an enriched event for orders placed by customers whose ID isn't present in the reference customer data (in Blob Storage), since the JOIN criterion is based on the customer ID.
This brings us to the end of this tutorial! I hope it helps you get started with Azure Stream Analytics and test the waters before moving on to more involved use cases.
In addition to this, there's plenty of material for you to dig into.
High-velocity, real-time data poses challenges that are hard to deal with using traditional architectures — one such problem is joining these streams of data. Depending on the use case, a custom-built solution might serve you better, but this will take a lot of time and effort to get it right. If possible, you might want to think about extracting parts of your data processing architecture and offloading the heavy lifting to services which are tailor-made for such problems.
In this blog post, we explored a possible solution for implementing streaming joins using a combination of Azure Event Hubs for data ingestion and Azure Stream Analytics for data processing using SQL. These are powerful, off-the-shelf services that you are able to configure and use without setting up any infrastructure, and thanks to the cloud, the underlying complexity of the distributed systems involved in such solutions is completely abstracted from us.