Is there a way I can access multiple JSON objects in array(struct) one by one in pyspark

Keywords: json apache-spark dataframe pyspark pyspark-sql

Question: 

I am fairly new to pyspark and JSON parsing, and I am stuck on a certain scenario. Let me explain what I am trying to do: I have a JSON file containing a data element, which is an array holding two other JSON objects. The JSON file is given below:

    {
        "id": "da20d14c.92ba6",
        "type": "Data Transformation Node",
        "name": "",
        "topic": "",
        "x": 380,
        "y": 240,
        "typeofoperation": "join",
        "wires": [
            ["da20d14c.92ba6", "da20d14c.93ba6"]
        ],
        "output": true,
        "data": [
            {
                "metadata_id": "3434",
                "id": "1",
                "first_name": "Brose",
                "last_name": "Eayres",
                "email": "beayres0@archive.org",
                "gender": "Male",
                "postal_code": null
            },
            {
                "metadata_id": "3434",
                "id": "2",
                "first_name": "Brose",
                "last_name": "Eayres",
                "email": "beayres0@archive.org",
                "gender": "Male",
                "postal_code": null
            }
        ]
    }

Now what I want to do is iterate over that data array one element at a time: read the first JSON object into a DataFrame, then read the second object into another DataFrame, and then perform a full join (or any other type of join) on the two. Is this possible?

If yes, how can I do it in pyspark? So far I have tried to explode the array, but that explodes all the elements at once rather than one by one:

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .getOrCreate()

dataFrame = spark.read.option("multiline", "true").json("nodeWithTwoRddJoin.json")

dataNode = dataFrame.select(explode("data").alias("Data_of_node"))

dataNode.show()

But the above code gives me a collective dataset. Then I used:

firstDataSet = dataNode.collect()[0]
secondDataSet = dataNode.collect()[1]

These lines each give me a Row, which I cannot convert back into a DataFrame. Any suggestions or solutions?

Answers: