Project Upgrade for Spark Streaming #1
  implicit val actorSystem = ActorSystem("ReceiverSystem", ConfigFactory.load("akkaconfig/remoteActors").getConfig("remoteSystem"))
- implicit val actorMaterializer = ActorMaterializer()
+ implicit val actorMaterializer = Materializer(actorSystem)
ActorMaterializer is deprecated since Akka 2.6. Materializer(actorSystem) is the recommended replacement: it creates a materializer that lives as long as the actor system does.
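For illustration, here is a minimal sketch of the replacement, assuming Akka 2.6.x with akka-stream on the classpath. The object name, system name, and the small stream are hypothetical and not taken from this PR.

```scala
import akka.actor.ActorSystem
import akka.stream.Materializer
import akka.stream.scaladsl.Source

import scala.concurrent.Future

object MaterializerSketch {
  // Hypothetical system name, for illustration only.
  implicit val system: ActorSystem = ActorSystem("MaterializerSketch")

  // Replaces the deprecated ActorMaterializer(): this materializer stays alive
  // as long as the actor system does, or until it is shut down explicitly.
  implicit val materializer: Materializer = Materializer(system)

  // Runs a small stream with the materializer above and sums its elements.
  def sumOneToThree(): Future[Int] =
    Source(1 to 3).runFold(0)(_ + _)
}
```

With an implicit ActorSystem in scope, Akka 2.6 can also supply a default materializer implicitly, so the explicit val may not be needed at all.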
  val sentiments = sentences.map { sentence: CoreMap =>
- val tree = sentence.get(classOf[SentimentCoreAnnotations.AnnotatedTree])
+ val tree = sentence.get(classOf[SentimentCoreAnnotations.SentimentAnnotatedTree])
  // convert the score to a double for each sentence
AnnotatedTree is no longer available, so I used SentimentAnnotatedTree instead.
  val kafkaProducer = getProducer()
- val bindingFuture = Http().bindAndHandle(getRoute(kafkaProducer), "localhost", 9988)
+ val bindingFuture = Http().newServerAt("localhost", 9988).bind(getRoute(kafkaProducer))
bindAndHandle is deprecated. I used Http().newServerAt(...).bind(...) to create the server binding.
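A minimal sketch of the new binding style, assuming Akka HTTP 10.2.x. The object name and the "ping" route are hypothetical stand-ins for the project's getRoute(kafkaProducer).

```scala
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.Route

import scala.concurrent.Future

object ServerBindingSketch {
  // Hypothetical system name, for illustration only.
  implicit val system: ActorSystem = ActorSystem("ServerBindingSketch")

  // Hypothetical route standing in for getRoute(kafkaProducer).
  val route: Route = path("ping") { get { complete("pong") } }

  // newServerAt(...).bind(...) replaces the deprecated bindAndHandle(...).
  // The Route is accepted through Akka HTTP's implicit route-to-handler conversion.
  def start(host: String, port: Int): Future[Http.ServerBinding] =
    Http().newServerAt(host, port).bind(route)
}
```

bind is the variant that fits a Route here; bindFlow is the counterpart for handlers expressed as a Flow[HttpRequest, HttpResponse, _].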
@daniel-ciocirlan Many files use StreamingContext and DStream, both of which are deprecated in Apache Spark 3.x. They have been superseded by Structured Streaming, which offers improved performance, scalability, and fault tolerance.
The suggested path is to migrate to Structured Streaming. Correct me if I'm wrong: to migrate our application, we need to replace the StreamingContext object with a SparkSession object and use readStream to create a DataFrame representing our input data stream. Do we need to do this? I need your guidance.
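To make the question concrete, here is a rough sketch of what the Structured Streaming entry point looks like, assuming Spark 3.x with spark-sql on the classpath. The object name, app name, and the choice of the built-in "rate" test source are assumptions for illustration; the project's real sources would differ.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object StructuredStreamingSketch {
  // A SparkSession takes the place of the StreamingContext.
  def session(): SparkSession =
    SparkSession.builder()
      .appName("StructuredStreamingSketch") // hypothetical app name
      .master("local[2]")
      .getOrCreate()

  // readStream returns a DataStreamReader; load() yields a streaming DataFrame.
  // The "rate" source is used only so the sketch needs no external input.
  def rateStream(spark: SparkSession): DataFrame =
    spark.readStream
      .format("rate")
      .option("rowsPerSecond", "1")
      .load()

  def main(args: Array[String]): Unit = {
    val spark = session()
    val query = rateStream(spark).writeStream
      .format("console")
      .outputMode("append")
      .start()
    query.awaitTermination(5000) // wait at most 5 seconds, then shut down
    query.stop()
    spark.stop()
  }
}
```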
daniel-ciocirlan left a comment
Thanks for this, just some minor comments on lib upgrades.
  val sparkVersion = "3.5.0"
  val postgresVersion = "42.6.0"
  val cassandraConnectorVersion = "3.4.1" // preview version at the moment of writing (October, 2023)
  val akkaVersion = "2.8.0"
Please use Akka 2.6 at most, as they changed their license model and we don't want to use non-OSS libraries.
  val postgresVersion = "42.6.0"
  val cassandraConnectorVersion = "3.4.1" // preview version at the moment of writing (October, 2023)
  val akkaVersion = "2.8.0"
  val akkaHttpVersion = "10.5.0"
Same for this one. I don't remember which was the last Akka HTTP version that was still OSS.
  )
  )
  complete {
  Using(Source.fromFile("src/main/html/whackamole.html")) { source =>
nice of you to auto-close the file 👍
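For readers unfamiliar with the pattern, a minimal sketch of how scala.util.Using closes the file automatically, assuming Scala 2.13. The object and method names are hypothetical.

```scala
import scala.io.Source
import scala.util.{Try, Using}

object UsingSketch {
  // Using opens the resource, runs the block, and always closes the Source,
  // even if reading throws. Any failure is captured in the returned Try.
  def readAll(path: String): Try[String] =
    Using(Source.fromFile(path)) { source =>
      source.mkString
    }
}
```

Because the result is a Try, a missing file surfaces as a Failure rather than an uncaught exception, so the caller decides how to respond.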
@Ayush21-AI about DStreams - you can keep it as is for now. Please reply with a +1 if the Twitter project works with this new upgrade. There is a way for us to read regular rows, but that means a more significant overhaul - we need to register a custom data source that implements the DataSource V2 API. That upgrade will involve some more interesting work in the code, if you're up to it. We can make that a separate PR.
force-pushed from 1b068c0 to e0cf1f8
  val cassandraConnectorVersion = "3.4.1" // preview version at the moment of writing (October, 2023)
  val akkaVersion = "2.6.19"
  val akkaHttpVersion = "10.2.10"
  val twitter4jVersion = "4.0.7"
Updated the Akka versions to comply with the Akka licensing change.
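As a sketch, the pinned versions would feed into build.sbt roughly as follows. The artifact list is an assumption about which Akka modules the project uses. To my knowledge, Akka 2.6.x and Akka HTTP 10.2.x are the last release lines published under Apache 2.0.

```scala
// build.sbt fragment (sketch): versions are the ones from this PR
val akkaVersion = "2.6.19"
val akkaHttpVersion = "10.2.10"

libraryDependencies ++= Seq(
  // Assumed modules; adjust to the ones the project actually depends on.
  "com.typesafe.akka" %% "akka-actor" % akkaVersion,
  "com.typesafe.akka" %% "akka-stream" % akkaVersion,
  "com.typesafe.akka" %% "akka-http" % akkaHttpVersion
)
```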
@daniel-ciocirlan I updated the Akka versions. While running the TwitterProject, I'm getting the error below. It indicates that we are trying to serialize a Promise or Future object, which is not possible because these objects are not serializable. What do you suggest?
I don't understand. Can you revert to the original code and check whether it worked?
force-pushed from e0cf1f8 to 0496777
force-pushed from 0496777 to ca3563b
  connection and reads data from the socket.
  */
  var mayBeSocket: Option[Socket] = None

  if a Twitter stream has been created.
  */
  var mayBeStream: Option[TwitterStream] = None
Updated as per your suggestion.
@daniel-ciocirlan Please review this PR.