Real-Time Weather Event Processing With HDF, Spark Streaming, and Solr



By now, we’ve all become well acquainted with Hortonworks DataFlow (HDF), which, with help from Apache NiFi, collects, curates, analyzes, and delivers real-time data to data stores quickly and easily, without requiring you to hand-code each of the components needed to deliver the expected results.

My team and I have also been exploring HDF in various projects and POCs, and I have to say, we have grown richer through the experience of working with it. It simply is fantastic!

Today, I am sharing one of our recent uses of HDF, which I hope will let you all implement it effectively, as well.


It’s live weather reporting using HDF, Kafka, Spark, Solr, and Banana.

Here are the environment requirements for implementing:

  • HDF (for HDF 2.0, you need Java 1.8).
  • Kafka.
  • Spark.
  • Solr.
  • Banana.

Now let’s get on to the steps!

1. Create DataFlow

Start HDF with /opt/HDF- start. Then open the NiFi UI at https://localhost:8090/nifi and choose the following processors to create the data flow:
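As a sketch, assuming HDF 2.0 is installed under /opt (the exact directory and version in the path are assumptions; substitute your own install location):

```shell
# Assumed install path; adjust to your HDF version and location.
cd /opt/HDF-2.0.0/nifi
bin/ start     # start NiFi in the background
bin/ status    # confirm it is running
```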

  • InvokeHTTP.
  • SplitJson.
  • EvaluateJsonPath.
  • ReplaceText.

Configurations for Each of the Processors

InvokeHTTP processor properties:


With the help of the InvokeHTTP processor, we connect to the source and fetch data from the weather REST endpoint configured in the processor’s Remote URL property (the field reference below matches the OpenWeatherMap current-weather API).

It gives JSON data as below:

(screenshot: sample JSON response)
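For reference, a current-weather response of this shape typically looks like the following (the values are illustrative, not real readings):

```json
{
  "coord": {"lon": -0.13, "lat": 51.51},
  "weather": [{"id": 300, "main": "Drizzle", "description": "light intensity drizzle", "icon": "09d"}],
  "base": "stations",
  "main": {"temp": 280.32, "pressure": 1012, "humidity": 81, "temp_min": 279.15, "temp_max": 281.15},
  "wind": {"speed": 4.1, "deg": 80},
  "clouds": {"all": 90},
  "dt": 1485789600,
  "sys": {"type": 1, "id": 5091, "message": 0.0103, "country": "GB", "sunrise": 1485762037, "sunset": 1485794875},
  "id": 2643743,
  "name": "London",
  "cod": 200
}
```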

The parameter details are:

  • coord
    • coord.lon: City geo location, longitude.
    • coord.lat: City geo location, latitude.
  • weather
    • weather.id: Weather condition ID.
    • weather.main: Group of weather parameters (rain, snow, extreme etc.).
    • weather.description: Weather condition within the group.
    • weather.icon: Weather icon ID.
  • base: Internal parameter.
  • main
    • main.temp: Temperature (unit default: Kelvin; metric: Celsius; imperial: Fahrenheit).
    • main.pressure: Atmospheric pressure (on the sea level, if there is no sea_level or grnd_level data), hPa.
    • main.humidity: Humidity, %.
    • main.temp_min: Minimum temperature at the moment. This is a deviation from the current temperature that is possible for large cities and megalopolises geographically expanded (use these parameters optionally). Unit default: Kelvin; metric: Celsius; imperial: Fahrenheit.
    • main.temp_max: Maximum temperature at the moment. This is a deviation from the current temperature that is possible for large cities and megalopolises geographically expanded (use these parameters optionally). Unit default: Kelvin; metric: Celsius; imperial: Fahrenheit.
    • main.sea_level: Atmospheric pressure on the sea level, hPa.
    • main.grnd_level: Atmospheric pressure on the ground level, hPa.
  • wind
    • wind.speed: Wind speed (unit default: meter/sec, metric: meter/sec, imperial: miles/hour).
    • wind.deg: Wind direction, degrees (meteorological).
  • clouds
    • clouds.all: Cloudiness, %.
  • rain
    • rain.3h: Rain volume for the last three hours.
  • snow
    • snow.3h: Snow volume for the last three hours.
  • dt: Time of data calculation, unix, UTC.
  • sys
    • sys.type: Internal parameter.
    • sys.id: Internal parameter.
    • sys.message: Internal parameter.
    • sys.country: Country code (GB, JP etc.).
    • sys.sunrise: Sunrise time, unix, UTC.
    • sys.sunset: Sunset time, unix, UTC.
  • id: City ID.
  • name: City name.
  • cod: Internal parameter.

The JSON data is split into separate records.

SplitJson Processor Properties


With the help of the SplitJson processor, we are splitting the JSON data based on the JsonPath Expression value:
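As a sketch: if the endpoint returns several records wrapped in a list element, a JsonPath Expression like the following emits one FlowFile per record (the exact path is an assumption and depends on the endpoint you query):

```
JsonPath Expression : $.list[*]
```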


EvaluateJsonPath Processor

EvaluateJsonPath properties:


With the help of the EvaluateJsonPath processor, we extract the required column values from the JSON data:
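A typical configuration adds one user-defined property per column, each holding a JsonPath expression, with Destination set to flowfile-attribute so the extracted values become FlowFile attributes (the property names here are illustrative, not taken from the original flow):

```
Destination : flowfile-attribute

city       : $.name
weather    : $.weather[0].main
temp       : $.main.temp
humidity   : $.main.humidity
pressure   : $.main.pressure
wind_speed : $.wind.speed
dt         : $.dt
```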


ReplaceText Processor

ReplaceText processor properties:


Select the required fields from the list, in JSON format:


With the help of the ReplaceText processor, we replace the FlowFile content with just the required columns, in the format we need.
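Assuming the attribute names extracted in the previous step (illustrative names, not from the original flow), the Replacement Value uses NiFi Expression Language to rebuild the content:

```
Replacement Value:
{"city":"${city}","weather":"${weather}","temp":"${temp}","humidity":"${humidity}","pressure":"${pressure}","wind_speed":"${wind_speed}","dt":"${dt}"}
```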

PutKafka Processor

  • Ingesting data into Kafka topic weather-in.
  • PutKafka will create a new Kafka topic if the topic does not already exist.


Finally, load data into Kafka by using the PutKafka processor. Then, Spark does the job of reading data from Kafka and processing it.
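To confirm records are landing in the topic, you can attach a console consumer. The install path and broker address below are assumptions; older Kafka versions use --zookeeper localhost:2181 instead of --bootstrap-server:

```shell
# Assumed Kafka location and broker port; adjust to your cluster.
cd /usr/hdp/current/kafka-broker
bin/kafka-console-consumer.sh --bootstrap-server localhost:6667 \
  --topic weather-in --from-beginning
```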

2. Spark-Scala Application to Save Kafka Data in CSV Format

Create a Maven project with the following code:
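The listing itself did not survive, so here is a minimal sketch of what such a job looks like, assuming Spark 1.6-era spark-streaming-kafka APIs; the ZooKeeper address, consumer group, batch interval, and output path are all assumptions, not the original code:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object WeatherKafkaToCsv {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WeatherKafkaToCsv")
    val ssc  = new StreamingContext(conf, Seconds(10))   // 10-second micro-batches

    // Read the weather-in topic (receiver-based API; assumed ZooKeeper quorum).
    val stream = KafkaUtils.createStream(
      ssc, "localhost:2181", "weather-consumer-group", Map("weather-in" -> 1))

    // Each Kafka message value is one record produced by the NiFi flow;
    // persist each non-empty batch as text files for Solr to ingest as CSV.
    stream.map(_._2).foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        rdd.saveAsTextFile(s"/tmp/weather/batch-${System.currentTimeMillis()}")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Submit the assembled JAR with spark-submit; the spark-streaming-kafka dependency for your Spark version goes in the pom.xml.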




3. Indexing Data in Solr

To start Solr, open the terminal and run the below commands:
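The commands did not survive extraction; assuming a standard Solr install directory (the path is an assumption), starting the server looks like:

```shell
# Assumed install directory; adjust as needed.
cd /opt/solr
bin/solr start -p 8983
bin/solr status
```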


Open the following Solr UI in a browser: https://localhost:8983/solr

(screenshot: Solr admin UI)

Create a core in Solr:
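Assuming the core is named weather (the name is an assumption), it can be created from the command line:

```shell
cd /opt/solr
bin/solr create -c weather
```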


Ingest data into Solr from the CSV file generated earlier from the Spark application by using the following code:
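A sketch using Solr’s bundled bin/post tool; the core name and CSV path are assumptions chosen to match the earlier steps:

```shell
# Post the CSV produced by the Spark job into the weather core.
cd /opt/solr
bin/post -c weather /tmp/weather/weather.csv
```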


Now, data is loaded into Solr core.

You can check data inserted into the core in the web UI:

(screenshot: core data in the Solr web UI)

Open the Banana Web Dashboard using https://localhost:8983/solr/banana/index.html.

It will show a default introduction dashboard similar to the image below:

(screenshot: default Banana dashboard)

Click New in the top right corner to create a new dashboard. It will prompt you to choose a type of dashboard:

  • Time-series dashboard, if your core contains a timestamp column.
  • Non-time-series dashboard, if you don’t have any timestamp columns.

(screenshot: time-series dashboard option)

Click Create to create a new dashboard. A successfully created dashboard will look something like this:

(screenshot: newly created dashboard)

Change the Time Window to select the number of rows to show. Go to the Table panel to check whether your data has been parsed correctly. If everything is fine, your data will be displayed as follows:

(screenshot: Table panel)

Go to Dashboard settings at the top right corner of the web page:

(screenshot: dashboard settings)

The webpage will prompt you to create a new panel. Click on that and it will take you to the row settings.


Click on Add panel to empty row.

Select your desired panel from the list. It will show you options based on its properties.

(screenshot: panel selection)

Fill in all the required fields to get the graph:

(screenshot: panel configuration)

The below diagram shows ground-level temperature distributed by the type of weather that day:

(screenshot: temperature by weather type)

Similarly, the humidity on ground level is distributed by wind speed:

(screenshot: humidity by wind speed)

Pressure at sea level distributed by wind speed:

(screenshot: pressure by wind speed)

Follow the below configurations to get a pie chart of various weather days:

(screenshots: pie chart configuration and result)

You will also see the count of different types of weather days (whose data is recorded):

(screenshot: weather-type counts)


I hope this gives you enough perspective on putting HDF to (very effective) use!

This blog was originally published on Dzone…


Business Transformation through Advanced Analytics: How?

That we are well and truly in the digital age is beyond any doubt. There may still be some naysayers, but one cannot deny for long that big data and analytics are here to stay, and that they will revolutionize all walks of life. Most acknowledge the power they give us not only to challenge the status quo but also to enhance the quality of living in general through many mediums such as business, governance, education, health, social aspects, and security, to name a few, and to spur our overall growth and prosperity.

But the adoption of big data and analytics requires confidence in the capabilities of data analytics, the vision to see beyond the ordinary and, above all, the willingness and patience to explore hitherto unexplored areas that could hold so much promise yet are being idly let pass by.


That being said, there are also progressive companies and organizations that have adopted big data and analytics to better their business performance, and who have now taken it a notch higher by extending their investment into advanced analytics. From knowing just what happened and how, they are now empowering themselves and their personnel with the capacity (with a reasonable degree of assurance) to predict and forecast business outcomes, moving beyond the ‘what can happen’ to the ‘by how much’, and further to the various ways (indeed, the best possible way) to achieve business outperformance, organizational efficiency, and general good.

Advanced analytics is an all-encompassing approach, so citing a couple of use cases may not bring forth all the amazing possibilities that can be realized through information, insight, decision, and action: from the simple ‘What happened, and when? Why did it happen?’ to the intricate ‘What needs to be changed? What will happen if a process is changed? What else can be done for the better?’

It is really about choosing the right approach and, accordingly, the right combination of analytics types from among graph analytics, facial recognition, predictive, prescriptive, pre-emptive, behavioral, GIS-enabled, text, and social media sentiment analytics, and text and data mining.

But let it not bother you: there’s help coming your way from us. We at MSRCosmos will back up what we are saying with demos and free-trial offers, aimed at helping you demystify advanced analytics so that you can readily put it to use to get the most value out of your business data, as well as get a glimpse of the various new ways to generate additional revenue.

You will get to experience first-hand the industry-specific advanced analytics solutions we have created that encompass:

  • Facial recognition (Not just image matching but much more: face identification and verification, emotion detection and attention gauging, multi-face recognition, demographics identification, and more)
  • Text analytics (Read what lies between the lines when people, your target audience, talk about you or a reference is made about you, anywhere online)
  • Social media sentiment analysis (Gauge audience sentiment about your company and its offerings hidden in the comments, posts, and shares on various social media that make even the slightest mention of your business), and
  • GIS-enabled analytics (Various analytics solutions that help your business better face situations created before and after natural catastrophes, in terms of response, recovery, prevention, detection, assessment, and preparedness).

This is a suite of smart-analytics solutions that you can test with your business data, seeing the results on user-friendly interfaces with the help of our visualization solutions, to make better sense of it.

Click here to register for a demo or, better still, a free-trial of an industry-solution that suits your needs.

Write in to us or call us (925 399-4218) if you have any specific query regarding advanced analytics.
This blog was originally published on Medium…


Facial Recognition Solutions: Use-Cases


The facial recognition market is expected to grow to more than $2 Bn by 2020. While that’s a small figure compared to the analytics market, which is expected to grow to a whopping $200+ Bn around the same time, the demand for face analytics continues to grow in line with expectations, as does the application of big data and analytics in many spheres of our lives.

The fact that Facebook, Google, Amazon, Microsoft, and a host of other technology majors have acquired (and continue to be on the lookout for) start-ups and companies delving deep into facial recognition is a testimony not only to the growing demand for facial recognition tools but also to the power they can give organizations to do many wonderful things that weren’t even imagined earlier, much less possible. One of the many premises is how companies and organizations understand people (customers, prospects, visitors, strangers, patrons, commoners, suspects, etc.) beyond online footprints and other such touch-points.

Admittedly, facial recognition software has been in use for quite some time now. However, it was limited to a select few, such as state and federal investigating agencies, security organizations and, perhaps, a handful of businesses, and it hadn’t fully matured into a reliable resource.

But, technological advancements, both in terms of hardware and software, have now equipped solution providers to conceive and build solutions that can go beyond the traditional and limited use of facial recognition techniques so as to help users with much more potent information with which to decide and take actions, for the better.

On the hardware front, there are many types of cameras, surveillance systems, and the like that work at various resolutions and frame rates, allowing the capture of more than just an image. On the software side, concepts such as artificial intelligence (AI), cognitive analytics, neural networks, and machine learning are complemented fantastically well by some of the latest visualization tools, such as Microsoft Power BI, Tableau, and Kibana.

These advancements are firing the imagination of solution providers and enabling them to build very advanced and yet practically useful solutions that can be put to use in areas ranging from biometrics, information security and access control to law enforcement, surveillance systems, and smart-cards, etc.

The new solutions have gone beyond just matching a person with an existing image. They are capable of doing a lot more: from identifying and verifying a person, detecting his/her emotions and indeed the depth of those emotions, and gauging that person’s attention, to multi-face recognition (identifying more than one face) in a digital image or video frame, and even helping with demographics identification to boot!


Now, let us look at some of the use-cases of facial recognition:

  • Authorized entry into a stadium, building, or office premises
  • Authenticity verification of patrons throughout the games – regardless of the number of times one moves in or out of the stadium, as well as spotting suspicious behavior or activity at even such a crowded place
  • Virtual persona creation and make-up: Players can get into other people’s skin, say celebrities or sports stars; e.g., video game players can use the avatar of the Undertaker and wrestle with The Rock on the other side, and women can try out various make-up options and arrive at the one that suits them best, without having to actually put on any grease paint at all!
  • Security
    • Facial identification even if the person’s facial features have changed over time from the earlier instance when his/her image was captured and stored in the database
    • Spotting suspicious behavior based on a set of facial emotions (neutral, anger, fear, contempt, etc.)
    • Facial verification in cases such as schools where it is required that only verified personnel have access to students
  • Investigations – Image-based investigations help in quicker identification of facts relating to a crime, in faster disposal of cases, as well as in taking precautionary measures
  • Customer satisfaction/feedback review through sentiment detection. E.g., an existing customer, or a walk-in with the potential to become one, interacts with a business representative (enquires, explores, buys, transacts, etc.) and walks out. A facial recognition tool will instantly gauge the emotion and sentiment of that person and let the relevant stakeholders know whether that person was satisfied or not (based on the positive, negative, or neutral sentiment of the face). This will be extremely useful in all B2C segments where customers interact directly with the brand or business.
  • Since the solutions also identify demographics, they can be employed wherever there’s an age-related prohibition, such as cinema theaters, vending machines (allowing only certain drinks based on age), elections, etc.

While a lot of work is still to be done in this area to arrive at the most impactful use cases, why not explore the benefits of these advanced facial recognition tools to improve your organization’s operational efficiency, and indeed its performance?

Contact our team today to help you set up a free trial in your environment. Or write in to us with your queries; our team will respond with the relevant options and assist you in using a suitable solution. Email us or call us at +1 925 399 4218.


This blog was originally published on Datafloq
