This article explains how to set up a playground environment for a data warehousing solution.

  1. Install a Python version that matches the Delta Lake compatibility matrix.
  2. Check the Java version (a full JDK for development, not only a runtime).
  3. Create a new virtual environment if required.
  4. Install Apache Spark with pip (pip install pyspark), choosing the version that the matrix lists as compatible with Delta Lake.
  5. Install Delta Lake with pip (pip install delta-spark). Note: the version should be based on the compatibility matrix.
  6. Activate the virtual environment and run the following command: pyspark
    • The pyspark shell should load without errors.
    • Ensure that the JAVA_HOME, PYTHON_HOME and PYSPARK_HOME environment variables are defined.
  7. Exit pyspark with exit() or Control + D; Control + C only interrupts the running command.
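Before launching pyspark, a small Python script can report the prerequisites from the steps above in one place. This is a minimal sketch, assuming Python 3.8+ (for importlib.metadata) and the pip distribution names pyspark and delta-spark; the environment variable names are the ones listed in step 6. It only reports what is installed and does not compare anything against the compatibility matrix, which still has to be checked by hand.

```python
"""Sanity-check a local PySpark + Delta Lake playground before starting pyspark."""
import os
import shutil
import subprocess
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def java_version() -> Optional[str]:
    """Return the first line of `java -version`, or None if java is not on PATH."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` prints its banner to stderr, not stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    lines = result.stderr.strip().splitlines()
    return lines[0] if lines else None


def environment_report(env_vars=("JAVA_HOME", "PYTHON_HOME", "PYSPARK_HOME")) -> dict:
    """Collect the interpreter, Java, package versions and environment variables."""
    return {
        "python": ".".join(map(str, sys.version_info[:3])),
        "java": java_version(),
        "pyspark": installed_version("pyspark"),
        "delta-spark": installed_version("delta-spark"),
        # None means the variable is not defined in the current shell.
        "env": {name: os.environ.get(name) for name in env_vars},
    }


if __name__ == "__main__":
    report = environment_report()
    for key, value in report.items():
        print(f"{key}: {value}")
    missing = [name for name, val in report["env"].items() if val is None]
    if missing:
        print("Undefined environment variables:", ", ".join(missing))
```

Run it inside the activated virtual environment so that the pyspark and delta-spark versions it prints are the ones pyspark will actually use.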

Testing Delta Lake setup and environment

  1. Run the following command
    • pyspark --packages --conf "" --conf ""
  2. This should start a Spark session with the io.delta (Delta Lake) package loaded.
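  3. Validate storage by writing a small Delta table and reading it back:
    • data = spark.range(0, 5)
    • data.write.format("delta").save("/tmp/delta-table")
    • spark.read.format("delta").load("/tmp/delta-table").show()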
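The same test can be run from a plain Python script instead of the interactive shell. This is a minimal sketch, assuming pyspark and delta-spark were installed with pip as in the setup steps: configure_spark_with_delta_pip (shipped with delta-spark) adds the io.delta package matching the installed delta-spark version, and the two configuration keys are the ones the Delta Lake quickstart uses to enable Delta SQL support. The helper names (build_delta_session, validate_storage, has_delta_log) are hypothetical, and mode("overwrite") is an addition so the script can be re-run against /tmp/delta-table.

```python
"""Start a Delta-enabled SparkSession from Python and validate a small Delta table."""
import os

# Session settings used in the Delta Lake quickstart to enable Delta SQL support.
DELTA_CONFS = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}


def has_delta_log(table_path: str) -> bool:
    """Return True if table_path holds a _delta_log folder with at least one JSON commit."""
    log_dir = os.path.join(table_path, "_delta_log")
    if not os.path.isdir(log_dir):
        return False
    return any(name.endswith(".json") for name in os.listdir(log_dir))


def build_delta_session(app_name: str = "delta-playground"):
    """Build a local SparkSession with Delta Lake enabled (needs pyspark and delta-spark)."""
    # Imported lazily so the helpers above still work where Spark is not installed.
    from delta import configure_spark_with_delta_pip
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name).master("local[*]")
    for key, value in DELTA_CONFS.items():
        builder = builder.config(key, value)
    # Sets spark.jars.packages to the io.delta artifact matching the pip-installed delta-spark.
    return configure_spark_with_delta_pip(builder).getOrCreate()


def validate_storage(spark, table_path: str = "/tmp/delta-table") -> int:
    """Write spark.range(0, 5) as a Delta table, read it back and return the row count."""
    data = spark.range(0, 5)
    data.write.format("delta").mode("overwrite").save(table_path)
    count = spark.read.format("delta").load(table_path).count()
    if count != 5 or not has_delta_log(table_path):
        raise RuntimeError(f"Delta table at {table_path} did not validate")
    return count


if __name__ == "__main__":
    session = build_delta_session()
    print("rows read back:", validate_storage(session))
    session.stop()
```

Checking for the _delta_log folder is what distinguishes a Delta table from a plain Parquet directory: the data files are Parquet in both cases, but only Delta writes the JSON transaction log.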


Official Delta Lake documentation (basics)

Apache Spark / Delta Lake compatibility matrix
