Spark Java Dataset filter condition not working

Keywords: apache-spark apache-spark-sql apache-spark-dataset

Question: 

I am trying to filter a dataset using a condition that contains multiple string comparisons.

Example:

"vic_cpc_qid = 'OCC_C_CSI' or vic_cpc_qid = 'OCW_A_RSI' or vic_cpc_qid = 'OCC_C_RSI' or vic_cpc_qid = 'OCW_A_CSI' or vic_cpc_qid = 'OCE_B_RSI' or vic_cpc_qid = 'OCE_B_CSI'"

Here, "vic_cpc_qid" is one of the columns in the dataset.

I am generating this expression dynamically with the code below:

List<String> streams = Arrays.asList("OCC_C_CSI", "OCW_A_RSI", "OCC_C_RSI", "OCW_A_CSI", "OCE_B_RSI", "OCE_B_CSI");
String orClause = " or ";
StringBuilder expressionBuilder = new StringBuilder();
expressionBuilder.append("\"");

for (String stream : streams) {
    expressionBuilder.append("vic_cpc_qid").append(" = ").append("'").append(stream).append("'");
    expressionBuilder.append(orClause);
}

String filterExpression = expressionBuilder.toString();
String expression = filterExpression.substring(0, filterExpression.length() - orClause.length()).concat("\"");

If I pass the resulting expression to the filter, I get the following error:

23:12:06 ERROR ApplicationMaster: User class threw exception: org.apache.spark.sql.AnalysisException: filter expression ''vic_cpc_qid = \'OCC_C_CSI\' or vic_cpc_qid = \'OCC_C_RSI\' or vic_cpc_qid = \'OCE_B_CSI\' or vic_cpc_qid = \'OCE_B_RSI\' or vic_cpc_qid = \'OCE_E_CSI\' or vic_cpc_qid = \'OCE_E_RSI\' or vic_cpc_qid = \'OCE_F_CSI\' or vic_cpc_qid = \'OCW_A_CSI\' or vic_cpc_qid = \'OCW_A_RSI\''' of type string is not a boolean.;;

However, when I hard-code the expression in the filter call, as below, it works fine:

Dataset<Row> filterExample = dedupedDataset.filter("vic_cpc_qid = 'OCW_A_RSI' or vic_cpc_qid = 'OCC_C_CSI'");

Is there anything I am doing wrong here?

Answers:
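
Going by the error text ("of type string is not a boolean"), a likely cause is the pair of literal double quotes that the builder adds around the expression. In the hard-coded call the double quotes are only Java string delimiters and never reach Spark; in the generated version they are part of the string itself, so the parser appears to see one quoted string literal instead of a boolean condition. Below is a minimal sketch of building the same clause without the wrapping quotes. The class name FilterExpressionSketch and the helper buildOrFilter are hypothetical names chosen for illustration, and the sketch assumes a non-empty list whose values contain no single quotes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
final class FilterExpressionSketch {

    // Builds "column = 'v1' or column = 'v2' ..." with NO surrounding double quotes.
    // Assumption: values is non-empty and no value contains a single quote.
    static String buildOrFilter(String column, List<String> values) {
        return values.stream()
                .map(value -> column + " = '" + value + "'")
                .collect(Collectors.joining(" or "));
    }

    public static void main(String[] args) {
        List<String> streams = Arrays.asList(
                "OCC_C_CSI", "OCW_A_RSI", "OCC_C_RSI",
                "OCW_A_CSI", "OCE_B_RSI", "OCE_B_CSI");

        String expression = buildOrFilter("vic_cpc_qid", streams);

        // The printed text is what would be passed to the filter call;
        // it starts with the column name, not with a double quote.
        System.out.println(expression);
    }
}
```

The resulting string can then be passed as dedupedDataset.filter(expression). If string building is not required, dedupedDataset.filter(functions.col("vic_cpc_qid").isin(streams.toArray())) expresses the same check through the Column API and avoids quoting issues altogether.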