In this tutorial, let me walk you through how words typed into a Unix Netcat channel are streamed through Spark and written to Elasticsearch live, as (word, count) documents in an Elasticsearch index. Here I am relying on the Spark-to-Elasticsearch connector to create the index automatically during ingestion.
The steps I followed:
- Install Elasticsearch (brew install elasticsearch)
- Install the mobz/head plugin (https://github.com/mobz/elasticsearch-head)
- Open a Netcat channel in a Unix shell and type some words
- Launch a Spark program with the org.apache.spark.streaming._ libraries imported
- Read the socketTextStream through the Spark StreamingContext on port 9999
- flatMap the lines into words
- For each RDD in the DStream, convert it to a DataFrame and register a temp table
- Save each DataFrame to an Elasticsearch index
- Open http://localhost:9200/_plugin/head and keep refreshing the page to see the latest (word, count) columns in the ES index.
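The flatMap-then-count transformation at the heart of the steps above can be tried on a plain Scala collection, with no Spark cluster needed; the object and method names here are my own for illustration:

```scala
// The same flatMap + per-word count logic, on an ordinary Scala collection
object WordCountDemo {
  def wordCounts(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))   // split each line into words
      .filter(_.nonEmpty)      // drop empty tokens from double spaces
      .groupBy(identity)       // group identical words together
      .mapValues(_.size)       // count the occurrences in each group
      .toMap

  def main(args: Array[String]): Unit =
    println(wordCounts(Seq("spark streams words", "spark counts words")))
}
```

In the real job the same two operations run on a DStream, so the counts are computed per micro-batch rather than over the whole input.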
Initially, no index is present:
Create the program below in IntelliJ and execute it:
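Since the listing isn't reproduced here, this is a minimal sketch of what such a program could look like, assuming Spark 1.x with the elasticsearch-spark connector on the classpath; the index name sparkdemo/wordcount, the 5-second batch interval, and the class names are my own choices, not the original's:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.elasticsearch.spark.sql._ // adds saveToEs on DataFrames

// One (word, count) document per word per micro-batch
case class WordCount(word: String, count: Long)

object StreamToES {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("NetcatToElasticsearch")
      .setMaster("local[2]")
      .set("es.nodes", "localhost")        // where Elasticsearch runs
      .set("es.index.auto.create", "true") // let ES create the index
    val ssc = new StreamingContext(conf, Seconds(5))

    // Read lines from the Netcat channel on port 9999
    val lines = ssc.socketTextStream("localhost", 9999)
    val words = lines.flatMap(_.split(" ")).filter(_.nonEmpty)
    val counts = words.map((_, 1L)).reduceByKey(_ + _)

    // For each RDD in the DStream: convert it to a DataFrame,
    // register a temp table, and save it to the ES index
    counts.foreachRDD { rdd =>
      val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
      import sqlContext.implicits._
      val df = rdd.map { case (w, c) => WordCount(w, c) }.toDF()
      df.registerTempTable("wordcounts")
      df.saveToEs("sparkdemo/wordcount") // index/type name is an assumption
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

This is a sketch, not a drop-in build: it needs the spark-streaming and elasticsearch-spark dependencies, and a running Elasticsearch on localhost:9200.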
After the execution in IntelliJ, you will see a StreamingContext launched locally, and the job keeps running in the Run console.
Open Netcat, type some words, and press Enter:
$ nc -lk 9999
(At this point, 23 documents are stored.)
Now you can see the words in Elasticsearch, with the index auto-created.
Enter some more words and press Enter.
Now, check the Elasticsearch index again:
You can see the document count has increased to 49, as shown below.
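If you prefer the command line over the head plugin, the document count can also be checked with curl against Elasticsearch's _count and _search endpoints; the index name sparkdemo here is my assumption, so substitute whatever index your job writes to:

```shell
# Count the documents in the index (index name is an assumption)
curl "http://localhost:9200/sparkdemo/_count?pretty"

# Fetch a few of the stored word/count documents
curl "http://localhost:9200/sparkdemo/_search?pretty&size=5"
```

Re-running the _count request after typing more words into Netcat should show the count climbing batch by batch.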
This was all possible with the magic of the Spark, DataFrame, and Elasticsearch APIs. In the same way, you can write the stream into Cassandra, MongoDB, HBase, Oracle, MySQL, or any other destination database or system.
Now you can stop the Spark streaming job. I hope you enjoyed this tutorial.