PySpark - Interview Questions
How to inner join two DataFrames?
We can use the join() method of the PySpark SQL DataFrame. Its signature looks like:
join(other, on=None, how=None)

Where,

* other : The DataFrame on the right side of the join.
* on : The join condition. It can be a column name string, a list of column names, or a join expression (a Column).
* how : The type of join; the default is inner. Supported values include inner, cross, outer, full, left, right, left_outer, right_outer, left_semi and left_anti.

The where() and filter() methods can be chained onto the join expression to filter the joined rows. We can also perform multiple joins by chaining join() calls.

For example, consider two DataFrames named Employee and Department. Employee has the columns emp_id, emp_name and empdept_id, while Department has dept_id and dept_name. We can inner join the Employee DataFrame with the Department DataFrame to get the department information along with the employee information. The code will look like:
emp_dept_df = empDF.join(deptDF, empDF.empdept_id == deptDF.dept_id, "inner")
emp_dept_df.show(truncate=False)