wildrose2023
New Contributor II

Hierarchy Modelling for SellOut Data

Hi everyone,

 

I'm looking for some input on the following modeling concept.

The idea is pretty straightforward: we receive SellOut data from different providers. One file contains data at the SKU (product) level, and another contains data at the Brand level (which is the lowest level in that file's hierarchy).

The SKU-level data is accurate for each SKU, but some SKUs are not reported. This means that simply summing the SKU data results in incorrect Brand totals. In these cases, the external file with Brand-level data becomes the source of truth.

 

To keep things clean and simple in Power BI (avoiding complex DAX like ISINSCOPE and instead relying on plain SUM, AVG, etc.), I'm aiming to bring everything into a single fact table. For additive measures, my approach is to assign the missing values to a blank row, which seems to solve the problem neatly.
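For illustration, here is a minimal PySpark sketch of how that blank row could be derived at ETL time. The table names (sku_sales, brand_totals) and column names are hypothetical, not from the original post:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical inputs: SKU-level facts and the trusted brand-level totals.
sku_sales = spark.table("sku_sales")        # columns: brand, sku, sales_amount
brand_totals = spark.table("brand_totals")  # columns: brand, sales_amount

# Sum the reported SKUs per brand.
reported = sku_sales.groupBy("brand").agg(
    F.sum("sales_amount").alias("reported_amount")
)

# The blank row absorbs the gap between the trusted brand total
# and what the reported SKUs add up to.
blank_rows = (
    brand_totals.join(reported, "brand", "left")
    .withColumn(
        "sales_amount",
        F.col("sales_amount") - F.coalesce(F.col("reported_amount"), F.lit(0.0)),
    )
    .where(F.col("sales_amount") != 0)
    .select("brand", F.lit(None).cast("string").alias("sku"), "sales_amount")
)

# One fact table: reported SKUs plus the blank plug rows, so plain SUM works.
fact = sku_sales.unionByName(blank_rows)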

 

However, for non-additive measures, like weighted distribution expressed as a percentage, there needs to be mathematical logic to determine the value attributed to the blank row so that aggregated results remain correct.
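To make that reverse calculation concrete, here is a small sketch. It assumes (the post does not specify the formula) that the brand-level weighted distribution is a weight-averaged SKU WD, so the blank row's value x must satisfy brand_wd = (sum of w_i * wd_i + blank_weight * x) / (sum of w_i + blank_weight); all numbers below are hypothetical:

def blank_row_wd(brand_wd, sku_wds, sku_weights, blank_weight):
    # Solve brand_wd = (sum(w_i * wd_i) + blank_weight * x) / (sum(w_i) + blank_weight)
    # for x, the WD value to stamp on the blank row.
    weighted_sum = sum(w * wd for w, wd in zip(sku_weights, sku_wds))
    total_weight = sum(sku_weights) + blank_weight
    return (brand_wd * total_weight - weighted_sum) / blank_weight

# Example: two reported SKUs, trusted brand-level WD of 80%.
x = blank_row_wd(brand_wd=0.80, sku_wds=[0.90, 0.70],
                 sku_weights=[100, 50], blank_weight=30)
# x = (0.80 * 180 - 125) / 30 = 19 / 30, roughly 0.633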

Does my diagram make sense to you, and does this approach seem solid? Do you think the NON-ADDITIVE part makes it much more complicated to do in Databricks? As always, thank you for the help and input!

(Diagram attached: wildrose2023_0-1744898975160.png)

 

1 ACCEPTED SOLUTION
v-priyankata
Honored Contributor II

Hi @wildrose2023,
Thank you for sharing your detailed explanation and diagram. You've clearly put a lot of thought into modeling your SellOut data for both additive and non-additive measures.

 

Additive Measures:
Your approach of using a blank row to store the "missing" SKU-level data to align with the brand-level source of truth makes sense. This ensures that simple aggregations work cleanly without extra DAX complexity.

 

Non-Additive Measures:
For weighted distributions (non-additive), your method of reverse-calculating the blank row's value so that the overall brand-level measure aggregates to the correct average also looks solid. This technique ensures that the final brand-level WD aligns with the trusted brand-level file.

 

Regarding Databricks (or Spark-based environments):
In principle, the calculation logic for the blank row (reverse-calculating the missing piece) is the same regardless of the underlying platform (Power BI, Databricks, etc.). The challenge in Databricks would typically be replicating the dynamic calculation of that blank row's value in PySpark or SQL. However, if you pre-calculate and stamp the value in your ETL layer (as your diagram suggests), you'd avoid runtime complexity in the query layer, just as in Power BI.
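As a hedged sketch of what "pre-calculate and stamp in the ETL layer" could look like in Databricks, under the same weighted-average assumption as the earlier example (the table names sku_wd and brand_wd and the blank-row weight are all hypothetical):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

sku_wd = spark.table("sku_wd")      # columns: brand, sku, wd, weight
brand_wd = spark.table("brand_wd")  # trusted source: brand, wd

per_brand = sku_wd.groupBy("brand").agg(
    F.sum(F.col("weight") * F.col("wd")).alias("weighted_sum"),
    F.sum("weight").alias("reported_weight"),
)

BLANK_WEIGHT = 30.0  # assumed; in practice this could come from the additive plug

# Stamp the reverse-calculated WD on the blank row at ETL time, so the
# query layer (Power BI or Databricks SQL) only needs a weighted average.
blank_rows = (
    brand_wd.join(per_brand, "brand")
    .withColumn(
        "wd",
        (F.col("wd") * (F.col("reported_weight") + F.lit(BLANK_WEIGHT))
         - F.col("weighted_sum")) / F.lit(BLANK_WEIGHT),
    )
    .select("brand", F.lit(None).cast("string").alias("sku"),
            "wd", F.lit(BLANK_WEIGHT).alias("weight"))
)

fact_wd = sku_wd.unionByName(blank_rows)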

 

If this information is helpful, please "Accept it as a solution" and give a "kudos" to assist other community members in resolving similar issues more efficiently. Thank you.


3 REPLIES
lbendlin
Esteemed Contributor III

"but some SKUs are not reported."

How do you know that? It is very hard to report on the absence of things if you don't know that they are absent.


Thank you @v-priyankata for validating 🙂
