Is there a way I can access multiple JSON objects in an array (struct) one by one in pyspark?

Keywords: json apache-spark dataframe pyspark pyspark-sql


I am a bit new to pyspark and JSON parsing, and I am stuck on a certain scenario. Let me first explain what I am trying to do: I have a JSON file in which there is a data element; that data element is an array containing two other JSON objects. The relevant part of the JSON file is below:

    "id": "da20d14c.92ba6",
    "type": "Data Transformation Node",
    "name": "",
    "topic": "",
    "x": 380,
    "y": 240,
    "wires": [


Now what I want to do is iterate over that data array one by one: that is, go to the first JSON object, store it in a dataframe, then go to the second object and store it in another dataframe, and then do a full join (or any other type of join) on them. Is that possible?

If yes, how do I do this in pyspark? So far I have tried to explode the array, but the data is exploded all at once rather than one element at a time:

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, col

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .getOrCreate()

dataFrame = spark.read.option("multiline", "true").json("nodeWithTwoRddJoin.json")

dataNode = dataFrame.select(explode(col("data")).alias("Data_of_node"))

But the above code gives me one collective dataset. Then I used

firstDataSet = dataNode.collect()[0]
secondDataSet =  dataNode.collect()[1] 

These lines give me a Row, which I cannot convert back to a dataframe. Any suggestions and solutions?