Apache Spark Frequently Used Commands

Action                        Syntax
Show DataFrame                df.show()
Stop Spark session            spark.stop()
Count entries in DataFrame    df.count()
Write to Delta table          df.write.format("delta").mode("overwrite").save("/venv/storage")
Read from Delta table         df = spark.read.format("delta").load("/venv/storage")
Remove duplicates             df = df.dropDuplicates()
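The commands above are usually chained in a small workflow. The following is a minimal sketch of one such workflow, not a definitive implementation: the helper names `overwrite_deduplicated` and `read_and_show` are hypothetical, and both helpers assume they receive a Delta-enabled SparkSession (`spark`) and a DataFrame (`df`) created elsewhere.

```python
def overwrite_deduplicated(df, path):
    """Drop duplicate rows, overwrite the Delta table at `path`, return the row count.

    `df` is assumed to be a PySpark DataFrame; `path` is a storage location
    such as "/venv/storage".
    """
    deduped = df.dropDuplicates()
    deduped.write.format("delta").mode("overwrite").save(path)
    return deduped.count()


def read_and_show(spark, path):
    """Load the Delta table at `path`, print its first rows, and return it.

    `spark` is assumed to be a SparkSession configured for Delta Lake.
    """
    df = spark.read.format("delta").load(path)
    df.show()
    return df
```

With a running session, `overwrite_deduplicated(df, "/venv/storage")` followed by `read_and_show(spark, "/venv/storage")` and finally `spark.stop()` exercises every command in the table once.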

Set Up Apache Spark – Delta Lake Warehouse

This article explains how to set up a playground environment for a data warehousing solution.

Testing Delta Lake setup and environment

Reference

Delta Lake basic official documentation
https://delta.io/learn/getting-started
https://pypi.org/project/delta-spark
https://pypi.org/project/pyspark/3.3.4

Apache Spark / Delta Lake compatibility matrix
https://stackoverflow.com/questions/76066363/unable-to-write-df-in-delta-format-on-hdfs
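To test such a playground, a local Delta-enabled session could be created roughly as follows. This is a sketch under stated assumptions: `pyspark` and `delta-spark` are installed from PyPI in mutually compatible versions, the session runs in local mode, and the names `delta_conf`, `build_delta_session` and `delta-playground` are hypothetical. The two configuration keys and `configure_spark_with_delta_pip` come from the Delta Lake getting-started material.

```python
def delta_conf():
    """Spark settings that enable the Delta Lake SQL extension and catalog."""
    return {
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }


def build_delta_session(app_name="delta-playground"):
    """Create (or reuse) a local SparkSession with Delta Lake enabled."""
    # Imported lazily so this module loads even where Spark is not installed.
    from pyspark.sql import SparkSession
    from delta import configure_spark_with_delta_pip

    builder = SparkSession.builder.appName(app_name).master("local[*]")
    for key, value in delta_conf().items():
        builder = builder.config(key, value)
    # Adds the Delta Lake JARs that match the pip-installed delta-spark package.
    return configure_spark_with_delta_pip(builder).getOrCreate()
```

A quick smoke test is to call `spark = build_delta_session()`, write a tiny DataFrame such as `spark.range(5)` in Delta format to a scratch path, and read it back. If the write fails, a mismatch between the Spark and Delta Lake versions is one likely cause, and the compatibility matrix is the place to check.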

How to access MariaDB from Fabric

A gateway is required: download and install it. The ODBC connector for MariaDB must also be installed, or you will get the following error:

An exception occurred: The 'Driver' property with value '{MariaDB ODBC 3.1 Driver}' doesn't correspond to an installed ODBC driver

https://community.fabric.microsoft.com/t5/Desktop/Connection-to-MariaDB-container-not-possible-due-to-ODBC-3-1/m-p/2815511#M973792
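Before configuring the connection, it can help to confirm that the driver named in the error is registered on the gateway machine. The sketch below is one way to do that, assuming Python and the `pyodbc` package are available there; the helper names `driver_installed` and `check_mariadb_driver` are hypothetical, and the default driver name is taken from the error message above.

```python
def driver_installed(required, installed):
    """Return True if `required` (with or without braces) is in `installed`."""
    wanted = required.strip("{}")
    return wanted in installed


def check_mariadb_driver(required="{MariaDB ODBC 3.1 Driver}"):
    """Check the ODBC drivers registered on this machine for `required`."""
    # Imported lazily; pyodbc is an assumption, not something the article requires.
    import pyodbc

    return driver_installed(required, pyodbc.drivers())
```

If `check_mariadb_driver()` returns False, the MariaDB ODBC connector is either missing or registered under a different name, and the name in the connection settings has to match the installed one exactly.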

SAP Datasphere Reference Documents

Title                                    Access Link
Data Warehouse Modelling Guide           https://erprealm.com/wp-content/uploads/2024/04/DWC_Acquiring_Preparing_Modeling_Data.pdf
CDS Views                                https://help.sap.com/docs/SAP_S4HANA_CLOUD/c0c54048d35849128be8e872df5bea6d/5418de55938d1d22e10000000a44147b.html?q=CDS%20Views%20in%20Procurement
Basic SAP reference guide: DataSphere    https://developers.sap.com/tutorials/data-warehouse-cloud-bi1-connect-sac.html

Attachments: DW Modelling UserGuide