Realtime Streaming with Unstructured Data Engineering | Get Hired as an Experienced Data Engineer
In this video, you will build a real-time streaming pipeline for unstructured data spanning several data types (text, image, video, CSV, JSON, PDF), using more than 600 datasets.
MORE DATA ENGINEERING VIDEOS AVAILABLE on datamasterylab.com
Like this video?
- Buy me a coffee: https://www.buymeacoffee.com/yusuf.ganiyu
- Support the channel: https://www.youtube.com/@codewithyu/join
Timestamps:
0:00 Introduction
1:50 System Architecture Overview
4:08 System Architecture Design
13:22 Setting up Spark Streaming for Unstructured Data
21:46 Handling multiple unstructured data types
24:31 Creating data schema
30:35 Creating custom user-defined functions for data extraction
51:14 Parsing and extracting text data
1:40:30 Structuring the results into a dataframe
1:46:15 Reading JSON structured files into the streams
1:49:47 Joining Structured and Unstructured Data Streams
1:52:50 Writing Data to AWS S3 Bucket
2:04:20 Creating AWS Glue Crawler for the data
2:08:25 Verifying the crawler results on Athena
2:11:36 Deploying Spark Streams to Spark Clusters
2:26:31 Verification of Results
2:29:40 Outro
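For a rough idea of what the "custom user-defined functions" and "parsing and extracting text data" steps involve, here is a minimal sketch in plain Python. It is not the code from the video: the field names (`salary`, `posted_date`), the regular expressions, and the helper `parse_record` are all hypothetical, chosen only to show how free text can be turned into structured fields.

```python
import re
from typing import Optional

# Hypothetical patterns; the fields extracted in the video may differ.
_SALARY_RE = re.compile(r"\$\s?(\d+(?:,\d{3})*(?:\.\d+)?)")
_DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")


def extract_salary(text: Optional[str]) -> Optional[float]:
    """Return the first dollar amount found in the text, or None."""
    if not text:
        return None
    match = _SALARY_RE.search(text)
    if match is None:
        return None
    # Drop thousands separators before converting to a number.
    return float(match.group(1).replace(",", ""))


def extract_date(text: Optional[str]) -> Optional[str]:
    """Return the first ISO-style date (YYYY-MM-DD) found in the text, or None."""
    if not text:
        return None
    match = _DATE_RE.search(text)
    return match.group(1) if match else None


def parse_record(text: Optional[str]) -> dict:
    """Collect the extracted fields into one structured record."""
    return {
        "salary": extract_salary(text),
        "posted_date": extract_date(text),
    }


if __name__ == "__main__":
    sample = "Posted 2024-01-15. Salary: $85,000 per year"
    print(parse_record(sample))
```

In a Spark job, functions like these would typically be wrapped with `pyspark.sql.functions.udf` (for example with a `DoubleType()` or `StringType()` return type) and applied to the column holding the raw file text. Returning `None` for missing values keeps the resulting columns nullable instead of failing the stream on malformed input.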
👦🏻 My LinkedIn: https://www.linkedin.com/in/yusuf-ganiyu-b90140107/
🚀 X (Twitter): https://x.com/YusufOGaniyu
📝 Medium: https://medium.com/@yusuf.ganiyu
🌟 Please LIKE ❤️ and SUBSCRIBE for more AMAZING content! 🌟
🔗 Useful Links and Resources:
✅ Source Code and Datasets: https://www.buymeacoffee.com/yusuf.ganiyu/source-code-real-time-streaming-pipelines-unstructured-data
✅ Docker Compose Documentation: https://docs.docker.com/compose/
✅ Apache Spark Official Site: https://spark.apache.org/
✅ Confluent Docs: https://docs.confluent.io/home/overview.html
✅ S3 Documentation: https://docs.aws.amazon.com/s3/
✅ AWS IAM Documentation: https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html
✨ Tags ✨
Data Engineering, Apache Spark, Unstructured Data, Docker, Docker Compose, ETL Pipeline, Data Pipeline, Big Data, Streaming Data, Real-time Analytics, Kafka Connect, Spark Master, Spark Worker, Schema Registry, Control Center, Data Streaming
✨ Hashtags ✨
#DataEngineering #ApacheSpark #unstructureddata #Docker #ETLPipeline #DataPipeline #StreamingData #RealTimeAnalytics