Apache Spark Frequently Used Commands

Action | Syntax
Show DataFrame | df.show()
Stop Spark session | spark.stop()
Count entries in DataFrame | df.count()
Write to Delta table | df.write.format("delta").mode("overwrite").save("/venv/storage")
Read from Delta table | df = spark.read.format("delta").load("/venv/storage")
Remove duplicates | df = df.dropDuplicates()
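The commands listed above are usually chained rather than run one at a time. The following is a minimal sketch of that, assuming an existing SparkSession named spark with Delta Lake enabled. The helper names dedupe_and_save and load_and_count are hypothetical and chosen only for illustration; the Spark calls inside them are the ones from the list.

```python
def dedupe_and_save(df, path, mode="overwrite"):
    """Drop duplicate rows, then write the result as a Delta table at `path`.

    Hypothetical helper: it only wraps df.dropDuplicates() and
    df.write.format("delta").mode(...).save(...).
    """
    deduped = df.dropDuplicates()
    deduped.write.format("delta").mode(mode).save(path)
    return deduped


def load_and_count(spark, path):
    """Read the Delta table at `path` and return (DataFrame, row count).

    Hypothetical helper: it wraps spark.read.format("delta").load(...)
    and df.count().
    """
    df = spark.read.format("delta").load(path)
    return df, df.count()


# Example usage (needs a running Delta-enabled session, so it is left commented out):
#   dedupe_and_save(df, "/venv/storage")
#   df, n = load_and_count(spark, "/venv/storage")
#   df.show()
#   spark.stop()
```

Note that mode("overwrite") replaces the table contents at the target path, so it suits a playground but should be used with care on data that must be kept.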

Set Up Apache Spark – Delta Lake Warehouse

This article explains how to set up a playground environment for a data warehousing solution.

Testing Delta Lake setup and environment

Reference

Delta Lake basic official documentation
https://delta.io/learn/getting-started
https://pypi.org/project/delta-spark
https://pypi.org/project/pyspark/3.3.4

Apache Spark / Delta Lake compatibility matrix
https://stackoverflow.com/questions/76066363/unable-to-write-df-in-delta-format-on-hdfs
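To test the setup, a local Delta-enabled session can be sketched roughly as below. This follows the pattern from the delta-spark package documentation and is not necessarily the exact setup used in this article. It assumes pyspark 3.3.4 and a matching delta-spark release are installed with pip; delta-spark 2.3.0 is my assumption for the matching version, so check it against the compatibility matrix. The names DELTA_CONF, build_delta_session and the app name "delta-playground" are hypothetical.

```python
# Spark settings that enable Delta Lake SQL support and the Delta catalog.
DELTA_CONF = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}


def build_delta_session(app_name="delta-playground"):
    """Create a local SparkSession with Delta Lake enabled.

    Imports are done inside the function so that this module can be loaded
    even where pyspark and delta-spark are not installed.
    """
    from pyspark.sql import SparkSession
    from delta import configure_spark_with_delta_pip

    builder = SparkSession.builder.appName(app_name).master("local[*]")
    for key, value in DELTA_CONF.items():
        builder = builder.config(key, value)
    # configure_spark_with_delta_pip adds the Delta Lake jars that match
    # the installed delta-spark package to the session configuration.
    return configure_spark_with_delta_pip(builder).getOrCreate()


# Example smoke test (needs Java, pyspark and delta-spark, so it is commented out):
#   spark = build_delta_session()
#   spark.range(5).write.format("delta").mode("overwrite").save("/venv/storage")
#   spark.read.format("delta").load("/venv/storage").show()
#   spark.stop()
```

If the write step fails with a class-not-found or version error, a mismatch between the Spark and Delta Lake versions is a likely cause, which is what the compatibility matrix reference above is for.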