Apache Spark Frequently Used Commands

Action                      Syntax
Show DataFrame              df.show()
Stop Spark session          spark.stop()
Count entries in DataFrame  df.count()
Write to Delta table        df.write.format("delta").mode("overwrite").save("/venv/storage")
Read from Delta table       df = spark.read.format("delta").load("/venv/storage")
Remove duplicates           df = df.dropDuplicates()

Set Up Apache Spark – Delta Lake Warehouse

This article explains how to set up a playground environment for a data warehousing solution and how to test the Delta Lake setup and environment.

Reference                                       Link
Delta Lake basic official documentation         https://delta.io/learn/getting-started
                                                https://pypi.org/project/delta-spark
                                                https://pypi.org/project/pyspark/3.3.4
Apache Spark / Delta Lake compatibility matrix  https://stackoverflow.com/questions/76066363/unable-to-write-df-in-delta-format-on-hdfs
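A mismatch between the installed pyspark and delta-spark versions is one common reason a Delta write fails, so it can help to check the pair before starting a session. The sketch below uses only the Python standard library. The `DELTA_TO_SPARK` mapping is a partial, hand-copied excerpt of the compatibility table in the Delta Lake documentation and should be checked against the official table; the helper names `minor`, `is_compatible`, and `check_installed` are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict

# Partial excerpt: delta-spark "major.minor" -> Apache Spark "major.minor".
# Verify against the official Delta Lake compatibility table before relying on it.
DELTA_TO_SPARK: Dict[str, str] = {
    "2.0": "3.2",
    "2.1": "3.3",
    "2.2": "3.3",
    "2.3": "3.3",
    "2.4": "3.4",
    "3.0": "3.5",
    "3.1": "3.5",
    "3.2": "3.5",
}


def minor(ver: str) -> str:
    """Reduce a version string such as '3.3.4' to its 'major.minor' prefix."""
    parts = ver.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected at least major.minor, got {ver!r}")
    return ".".join(parts[:2])


def is_compatible(delta_version: str, spark_version: str) -> bool:
    """Return True if the excerpt pairs this delta-spark line with this Spark line."""
    expected = DELTA_TO_SPARK.get(minor(delta_version))
    return expected is not None and expected == minor(spark_version)


def check_installed() -> bool:
    """Check the installed delta-spark/pyspark pair; False if either is missing."""
    try:
        return is_compatible(version("delta-spark"), version("pyspark"))
    except PackageNotFoundError:
        return False
```

With pyspark 3.3.4, as referenced above, `is_compatible("2.3.0", "3.3.4")` returns True, while `is_compatible("2.4.0", "3.3.4")` returns False because the 2.4.x line targets Spark 3.4.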

SAP DataSphere Reference Documents

Title                                  Access Link
Data Warehouse Modelling Guide         https://erprealm.com/wp-content/uploads/2024/04/DWC_Acquiring_Preparing_Modeling_Data.pdf
CDS Views                              https://help.sap.com/docs/SAP_S4HANA_CLOUD/c0c54048d35849128be8e872df5bea6d/5418de55938d1d22e10000000a44147b.html?q=CDS%20Views%20in%20Procurement
Basic SAP reference guide: DataSphere  https://developers.sap.com/tutorials/data-warehouse-cloud-bi1-connect-sac.html

Attachments: DW Modelling UserGuide