Extract words from a string in Spark on Hadoop with Scala

Keywords: regex scala apache-spark hadoop


I was using the code below to extract the strings I needed in Spark SQL. Now I am working with a much larger amount of data in Spark on Hadoop, and I need help extracting the strings. I tried the same code, but it does not work.

Can someone help me? Sorry for my bad English.

val sparkConf = new SparkConf().setAppName("myapp").setMaster("local[*]")
val sc = new SparkContext(sparkConf)

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
import org.apache.spark.sql.functions.{col, udf}
import java.util.regex.Pattern

// User-defined function to extract every "@word" token from a string
def toExtract(str: String): String = {
  val pattern = Pattern.compile("@\\w+")
  val tmplst = scala.collection.mutable.ListBuffer.empty[String]
  val matcher = pattern.matcher(str)
  while (matcher.find()) {
    tmplst += matcher.group()
  }
  // Join the matches into one comma-separated string
  tmplst.mkString(",")
}

val Extract = udf(toExtract _)
val values = List("@always_nidhi @YouTube no i dnt understand bt i loved the music nd their dance awesome all the song of this mve is rocking")
val df = sc.parallelize(values).toDF("words")
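
For reference, the extraction step on its own can be sketched in plain Scala with no Spark dependency, which makes it easier to check the regex before running it on a cluster. This is only a minimal sketch: the object name `MentionExtractor` and the helper `extractMentions` are hypothetical names made up for illustration, and treating null input as "no matches" is an assumption, not something stated above.

```scala
import scala.util.matching.Regex

object MentionExtractor {
  // Same pattern as in the question: "@" followed by one or more word characters.
  val mentionPattern: Regex = "@\\w+".r

  // Hypothetical helper: returns every match in order of appearance.
  // Assumption: null input is treated as having no matches.
  def extractMentions(text: String): List[String] =
    if (text == null) Nil
    else mentionPattern.findAllIn(text).toList
}
```

Because the helper is an ordinary function on strings, it could presumably be passed to `flatMap` on an RDD of strings or wrapped with `udf` as in the code above, but that part is untested here.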