
Reply
ex_kjetilh
New Contributor II

I am looking for a way of writing unit tests for PySpark transformations.

I want to write tests for functions in Fabric notebooks of the form: given this DataFrame (read from a file or built in code), apply the transformation and check that the resulting DataFrame matches the expected output.

How do I do this in fabric?

I want the test runs to be called from CI/CD.

However, I can find very little written about this. Maybe I am just not looking in the right places.

3 REPLIES
tayloramy
Contributor

Hi @ex_kjetilh

 

Below is a practical way to unit-test PySpark transformations from Fabric notebooks and run them in CI/CD. It boils down to: put your transform logic in plain Python modules, test them with pytest + a local Spark session (or Spark's own testing helpers), and optionally add Fabric-side integration tests for end-to-end coverage.

 

  1. Refactor your notebook code into testable functions.
    Put all transformation logic in /src/your_pkg/transforms.py (imported by your notebook), so tests don't depend on a notebook runtime. See Databricks' pattern (same idea) for testing notebook code by moving logic into modules: Unit testing for notebooks.
  2. Write pytest unit tests that spin up a local Spark.
    Create /tests/conftest.py and /tests/test_transforms.py. Use Spark's built-in test helpers (available since Spark 3.5) like pyspark.testing.assertDataFrameEqual.

# tests/conftest.py
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    return (SparkSession.builder
            .master("local[*]")
            .appName("unit-tests")
            .getOrCreate())
# tests/test_transforms.py
from pyspark.sql import Row
from your_pkg.transforms import clean_and_join
from pyspark.testing import assertDataFrameEqual  # Spark >= 3.5

def test_clean_and_join(spark):
    left = spark.createDataFrame([Row(id=1, v="a "), Row(id=2, v="b")])
    right = spark.createDataFrame([Row(id=1, w=10), Row(id=2, w=20)])
    actual = clean_and_join(left, right)  # your transform
    expected = spark.createDataFrame([Row(id=1, v="a", w=10), Row(id=2, v="b", w=20)])
    assertDataFrameEqual(actual, expected, checkRowOrder=False)

 

  3. Run tests in CI/CD (GitHub Actions or Azure DevOps).
    Pin your local pyspark to the Fabric runtime version to avoid surprises (check your Fabric Spark runtime, then set the same pyspark==x.y.* in tests). Example GitHub Actions job:
name: unit-tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.10' }
      - name: Install Java for Spark
        uses: actions/setup-java@v4
        with: { distribution: 'temurin', java-version: '11' }
      - name: Install deps
        run: |
          pip install "pyspark==<match_Fabric_runtime>" pytest chispa
      - name: Run pytest
        run: pytest -q --maxfail=1 --disable-warnings
Fabric CI/CD background: Deployment pipelines overview, Git integration overview. Good write-ups with examples: Unit tests on Microsoft Fabric items (pytest), Optimizing for CI/CD in Microsoft Fabric.
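
Since Azure DevOps is mentioned as an alternative, a roughly equivalent pipeline could look like the sketch below. Treat the task versions as assumptions to verify against your organization's agents, and keep the pyspark pin matched to your Fabric runtime as above (the ubuntu-latest image already ships with a JDK for Spark):

```yaml
# azure-pipelines.yml -- equivalent of the GitHub Actions job above
trigger: [main]
pool:
  vmImage: 'ubuntu-latest'
steps:
  - task: UsePythonVersion@0
    inputs:
      versionSpec: '3.10'
  - script: pip install "pyspark==<match_Fabric_runtime>" pytest chispa
    displayName: Install dependencies
  - script: pytest -q --maxfail=1 --disable-warnings
    displayName: Run pytest
```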

  4. Add Fabric integration tests.
    Keep unit tests fast/local. For end-to-end checks inside Fabric (e.g., against a Lakehouse table), you can:
    • Trigger a Notebook job in a test workspace that seeds tiny test data, calls your transform, and asserts results (either via Spark asserts or by writing a small "result" table and checking row counts/values).
    • Or orchestrate with deployment pipelines/fabric-cicd and run a smoke-test notebook after deploy. Example concept posts: Automate testing Fabric pipelines with YAML, fabric-cicd library initial tests.
  5. In notebooks, import your package; don't re-implement.
    Your Fabric notebook should import your_pkg.transforms so the code under test and the code you run in Fabric are the same. General notebook authoring doc: Develop, execute, and manage Fabric notebooks.

       

If you found this helpful, consider giving some Kudos. If I answered your question or solved your problem, mark this post as the solution.

v-prasare
Honored Contributor II

Hi @ex_kjetilh,

We would like to confirm whether our community member's answer resolves your query or if you need further help. If you still have any questions or need more support, please feel free to let us know. We are happy to help you.


@tayloramy, thanks for your prompt response.

 

 

Thank you for your patience; we look forward to hearing from you.
Best Regards,
Prashanth Are
MS Fabric community support

KevinChant
Contributor III

I wrote a post some time ago relating to this; I hope it helps:

https://www.kevinrchant.com/2024/08/30/unit-tests-on-microsoft-fabric-items/ 
