Cross-referencing Notebooks In The Updated Fabric Notebook Copilot
Introduction
At FabCon Atlanta last week, the updated notebook Copilot for data engineering and data science was announced. It brings agentic capabilities to Copilot and is much more intelligent and Fabric-aware than the previous version. You can read the documentation here. For example, you can now ask the Copilot to do things which you couldn't previously:
- Use the Copilot without starting a session. Just open the notebook and Copilot, and start asking questions and making changes. This saves you CUs.
- List items in the workspace: "list all the lakehouses in this workspace"
- Take actions: "mount the lakehouses and set the default lakehouse"
- Get ABFSS paths: "give me the abfss path of lakehouse <lakehouse_1>"
- Create Spark pool configurations using %%configure: "add configuration to use 4 cores"
- Refer to content in a cell by cell number: "explain cell 11"
Give it a try and you will be surprised how well it works.
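On the ABFSS point: OneLake paths follow a predictable format, so you can also build them yourself. A minimal sketch (the workspace and lakehouse names are hypothetical, used only for illustration):

```python
# Hypothetical workspace and lakehouse names for illustration
workspace = "MyWorkspace"
lakehouse = "lakehouse_1"

# OneLake ABFSS URI format:
# abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<item>.<ItemType>/
abfss_path = f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/{lakehouse}.Lakehouse/Files/"
print(abfss_path)
```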
But one of my favorite features is the ability to read and refer to other notebooks. For example, I can ask the Copilot to read notebook_1 from the same workspace. Think of the implications for a second. Below is one example of how this can be helpful.
Cross-referencing notebooks
- In a Fabric workspace, I created a notebook with a markdown cell that includes rules from the Palantir PySpark style guide. This style guide is an opinionated guide to PySpark code style for common situations and the associated best practices, based on the most frequent recurring topics in PySpark code. Below is a summarized version in markdown:
PySpark Style Guide
Purpose: This notebook is a style contract for AI-assisted code generation and review. When referenced from another notebook (e.g., refer to @pyspark_style_guide), you MUST apply every rule below to all PySpark code produced or reviewed in that session. Adapted from the Palantir PySpark Style Guide (MIT License).
VERSIONS
Use only features and APIs supported by the following versions:
- Spark 3.5
- Delta 3.2
- Python 3.11
Enforcement Checklist
When reviewing or generating PySpark code, walk through each check below in order. Flag every violation found. Do not skip checks.
| # | Check | What to look for |
|---|---|---|
| C1 | Imports | Any bare from pyspark.sql.functions import ... or alias other than F, T, W |
| C2 | Column access | Any df.colName dot-access outside of a join on= clause |
| C3 | String column refs | Any F.col('x') that could just be 'x' (Spark 3.0+) |
| C4 | Variable names | Any single-letter dataframe names (df, o, d, t) |
| C5 | Magic values | Any literal string, number, or threshold inline in filter, when, withColumn, select that is not a named constant |
| C6 | Select contract | More than one function per column in a select, or a .when() expression inside a select |
| C7 | withColumnRenamed | Any use. Replace with select + .alias() |
| C8 | Empty columns | Any lit(''), lit('NA'), lit('N/A'). Must be lit(None) |
| C9 | Logical density | More than 3 boolean expressions in a single .filter() or F.when() without named variables |
| C10 | Chain length | More than 5 chained statements in one block |
| C11 | Chain mixing | Joins, filters, withColumn, and selects mixed in the same chain |
| C12 | Join hygiene | Any .join() missing explicit how= |
| C13 | Right joins | Any how='right'. Swap df order, use left |
| C14 | Window frames | Any Window.partitionBy(...).orderBy(...) without explicit .rowsBetween() or .rangeBetween() |
| C15 | Window nulls | F.first() or F.last() without ignorenulls=True |
| C16 | Global windows | Empty W.partitionBy() or window without orderBy used for aggregation. Use .agg() instead |
| C17 | Otherwise fallback | .otherwise(<catch-all value>) masking unexpected data. Use None or omit |
| C18 | Line continuation | Any \ for multiline. Wrap in parentheses instead |
| C19 | UDFs | Any @udf or F.udf(). Rewrite with native functions |
| C20 | Comments | Comments that describe what code does instead of why a decision was made |
| C21 | Dead code | Commented-out code blocks. Remove them |
| C22 | Function size | Functions over ~70 lines or files over ~250 lines |
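Most of these checks require judgment, but some are mechanically verifiable. As a minimal sketch, C22 (function size) could be flagged with Python's ast module — the ~70-line threshold comes from the checklist, while the helper itself is illustrative and not part of the guide:

```python
import ast

MAX_FUNCTION_LINES = 70  # threshold from check C22

def oversized_functions(source: str, max_lines: int = MAX_FUNCTION_LINES):
    """Return names of functions whose definitions span more than max_lines lines."""
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and (node.end_lineno - node.lineno + 1) > max_lines
    ]

# Example: a 3-line function checked against a deliberately low 2-line limit
sample = "def tiny():\n    x = 1\n    return x\n"
print(oversized_functions(sample, max_lines=2))  # ['tiny'] — 3 lines > 2
```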
Anti-Patterns (find and fix these)
Each pattern below is a regex-like signature. If you see it, it is a violation.
AP1: Bare function imports
# VIOLATION: any of these
from pyspark.sql.functions import col, when, sum, lit
import pyspark.sql.functions as func
# FIX: always
from pyspark.sql import functions as F
from pyspark.sql import types as T
from pyspark.sql import Window as W
AP2: Dot-access column references
# VIOLATION: df.column_name anywhere except join on=
df.select(df.order_id, df.amount)
df.withColumn('x', df.price * df.qty)
# FIX: use string refs
df.select('order_id', 'amount')
df.withColumn('x', F.col('price') * F.col('qty'))
AP3: Inline magic values
# VIOLATION: bare literals in logic
df.filter(F.col('amount') > 500)
F.when(F.col('status') == 'shipped', 'In Transit')
df.filter(F.col('days') < 365)
# FIX: named constants at top of cell/function
HIGH_VALUE_THRESHOLD = 500
STATUS_SHIPPED = 'shipped'
LABEL_IN_TRANSIT = 'In Transit'
ONE_YEAR_DAYS = 365
df.filter(F.col('amount') > HIGH_VALUE_THRESHOLD)
F.when(F.col('status') == STATUS_SHIPPED, LABEL_IN_TRANSIT)
df.filter(F.col('days') < ONE_YEAR_DAYS)
AP4: Complex logic inside .when() or .filter()
# VIOLATION: more than 3 conditions inline
df.filter(
(F.col('a') == 'x') & (F.col('b') > 10) & (F.col('c') != 'y')
& ((F.col('d') == 'online') | (F.col('d') == 'partner'))
)
# FIX: named boolean expressions, max 3 in the final filter
is_valid_type = (F.col('a') == TYPE_X)
above_threshold = (F.col('b') > MIN_THRESHOLD)
not_excluded = (F.col('c') != EXCLUDED_STATUS)
is_target_channel = (F.col('d') == CHANNEL_ONLINE) | (F.col('d') == CHANNEL_PARTNER)
flagged = is_valid_type & above_threshold & not_excluded & is_target_channel
df.filter(flagged)
AP5: .when() inside select
# VIOLATION: conditional logic embedded in select
df.select(
'order_id',
F.when(F.col('status') == 'shipped', 'In Transit')
.when(F.col('status') == 'delivered', 'Complete')
.alias('status_label'),
)
# FIX: select plain columns, then withColumn for derived logic
df = df.select('order_id', 'status')
df = df.withColumn(
'status_label',
F.when(F.col('status') == STATUS_SHIPPED, LABEL_IN_TRANSIT)
.when(F.col('status') == STATUS_DELIVERED, LABEL_COMPLETE)
)
AP6: Empty column sentinels
# VIOLATION
df.withColumn('notes', F.lit(''))
df.withColumn('review_date', F.lit('N/A'))
# FIX
df.withColumn('notes', F.lit(None))
df.withColumn('review_date', F.lit(None))
AP7: Missing window frame
# VIOLATION: implicit frame
w = W.partitionBy('customer_id').orderBy('order_date')
# FIX: always explicit
w = (W.partitionBy('customer_id')
.orderBy('order_date')
.rowsBetween(W.unboundedPreceding, 0))
AP8: Blanket .otherwise()
# VIOLATION: masks unexpected values
F.when(..., 'A').when(..., 'B').otherwise('Unknown')
# FIX: omit otherwise (returns null) or use lit(None) explicitly
F.when(..., 'A').when(..., 'B')
AP9: Monster chains
# VIOLATION: mixed concerns, too long
df = (df.select(...).filter(...).withColumn(...).join(...).drop(...).withColumn(...))
# FIX: separate by concern, max 5 per block
df = (
df
.select(...)
.filter(...)
)
df = df.withColumn(...)
df = df.join(..., how='inner')
AP10: Backslash continuation
# VIOLATION
df = df.filter(F.col('a') == 'x') \
.filter(F.col('b') > 10)
# FIX: parentheses
df = (
df
.filter(F.col('a') == 'x')
.filter(F.col('b') > 10)
)
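Because the anti-patterns above are regex-like signatures, a few of them lend themselves to simple lint rules. A minimal sketch (illustrative only, not part of the guide) that flags AP1, AP8, and AP10:

```python
import re

# Regex signatures for a few of the anti-patterns above (illustrative only)
SIGNATURES = {
    "AP1: bare functions import": re.compile(r"from pyspark\.sql\.functions import \w"),
    "AP8: blanket otherwise": re.compile(r"\.otherwise\("),
    "AP10: backslash continuation": re.compile(r"\\\s*$"),
}

def find_violations(source: str):
    """Return (rule, line_number) pairs for every signature match."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits.append((rule, lineno))
    return hits

code = (
    "from pyspark.sql.functions import col, when\n"
    "df = df.filter(col('a') == 'x') \\\n"
    "    .filter(col('b') > 10)\n"
)
print(find_violations(code))
```

A real linter would need AST-level checks for rules like C2 (dot-access) or C14 (window frames), but even a crude pass like this catches the mechanical violations before review.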
Quick Reference (for code generation)
When writing new code, apply these defaults:
- Imports: F, T, W only
- Columns: string refs where possible, F.col() when needed
- Descriptive df names: orders_df, active_orders, not df, o
- Constants: every literal in logic gets a SCREAMING_SNAKE name
- Selects: plain columns + one transform each, no .when() inside
- Chains: max 5 lines, group by concern (filter/select, then enrich, then join)
- Joins: always how=, always left not right, alias for disambiguation
- Windows: always explicit frame, always ignorenulls=True on first/last
- Empty cols: F.lit(None), never lit('') or lit('NA')
- No UDFs, no .otherwise() fallbacks, no \ continuations
- Comments explain why, not what. No commented-out code.
- I named the notebook PYSPARK_STYLE_GUIDE. It’s all caps intentionally (more on this later).
- In another notebook, which already has some PySpark code, I opened Copilot.
- Asked: "List all notebooks in this workspace". I can see the PYSPARK_STYLE_GUIDE notebook:

- My notebook has one cell with a large code block (intentional). I prompted Copilot:
refer to @PYSPARK_STYLE_GUIDE and fix the code without losing the function and purpose
Copilot read the style notebook and applied the rules to the cells in this notebook. You could also use this to extract code patterns from other notebooks, e.g. how did <notebook_name> ingest the data, use the same library as <notebook_name> to create ML features, etc. Super handy.
Your BI/DE/DS team could also create reference pattern notebooks and refer to them to drive consistency and quality. Note that you can list items in another workspace, but you can't reference notebooks cross-workspace.
This was about Copilot in Fabric notebooks. In an upcoming blog, I will share how I use Skills for Fabric for development.
About the author
Sandeep Pawar
Principal Program Manager | AI/ML @ Microsoft Fabric CAT
S, Pawar (06/05/2026) Cross-referencing Notebooks In The Updated Fabric Notebook Copilot. Generative AI Agent Governance – msandbu.org