
visudevaram
New Contributor

Error while reading csv file - java.io.IOException: Pipe has no content

Hi, I'm facing an issue while reading a CSV file containing 10K rows: it takes more than a minute to read the file when I execute a count on the dataframe. The read succeeds, but the logs show the following error, where the 1st attempt to read the file takes 55 seconds and fails, and the 2nd attempt takes less than a second and succeeds.

 

java.io.IOException: Pipe has no content; awaitReadable() returned false [/mnt/vegas/pipes/b6bebd44-8996-43b0-b41a-dc6a637dfc1a.pipe, pos=0,blocksRead=0; bytesRead=0; availInPipe=0]
Vegas Service: Context=11e1107d-52f4-454a-bd25-834059954791, Abandoned pipe - unread by the time it was closed
	at org.apache.hadoop.shaded.com.microsoft.vegas.common.VegasPipeInputStream.pipelineThread(VegasPipeInputStream.java:536)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
 

Strangely, when I change the extension of the file to .txt and read it as a text file, it doesn't produce that error and executes in 2 seconds, so I don't suspect a network issue to be the cause. Also, we are using Large-sized nodes and the size of the file is 500 KB, so I doubt the executor size is an issue. Any other potential root causes here?

 

This error was posted by someone else here, but the resolution provided doesn't seem to be relevant in this case: https://community.fabric.microsoft.com/t5/Data-Engineering/Error-warnings-during-Delta-table-write-i...

 

Following is the code snippet I'm running:

 

abfs_path = "abfss://<my-container-name>@<my-sa-name>.dfs.core.windows.net/input/dir-name/file-name.csv"
file_dataframe = spark.read.option("header", True).csv(abfs_path)
file_dataframe.count()

 

Vinodh247
Contributor II

This issue is not about your code or the file content itself. It is related to how Fabric's underlying I/O layer handles streaming of small files in the Spark runtime during the first read attempt. Let me break this down for you clearly. It's a known issue as far as I know; this isn't the first time I've heard of it.

 

  • When you trigger spark.read.csv(...).count(), Spark sends a request to read the file via the pipe mechanism (used internally by Fabric for data movement).

  • The first attempt times out because the pipe reports Pipe has no content, meaning the pipeline did not deliver any bytes during the initial read window.

  • Spark retries automatically, and the second attempt succeeds almost instantly (under 1 second), which is why your job completes but shows the warning in logs.

This behaviour is seen mostly with:

  • Small files (under a few MBs)

  • Cold paths (files not cached/indexed)

  • First access after cluster spinup or after idle periods

When you rename the file to .txt and use spark.read.text(), the Vegas pipeline is not invoked in the same way. The reader uses a simpler I/O path, so you do not hit the timeout scenario that happens with the CSV reader's initial metadata and schema inference phase.

 

Can you try the following mitigation steps?

 

  • Avoid schema inference during read; this removes extra round trips for inference (see the sketch after this list).

  • Trigger a dummy read or spark.sql("SELECT 1") before reading the file to avoid cold-start latency.

  • If you have many small CSVs, combine them into a single larger file (although your current file is only ~500 KB, this helps if you scale up).

  • Even with large nodes, ensure there is no throttling. Use Spark UI or Fabric monitoring logs to confirm.
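
A minimal sketch of the first two suggestions combined, keeping the placeholder path from the original post; the column names in the schema are assumptions and should be replaced with the actual CSV columns:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Warm up the session first so the read below does not hit cold-start latency.
spark.sql("SELECT 1").collect()

# Hypothetical schema -- replace the fields with the actual columns of your CSV.
schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True),
])

abfs_path = "abfss://<my-container-name>@<my-sa-name>.dfs.core.windows.net/input/dir-name/file-name.csv"

# Explicit schema with inference disabled avoids the extra pass over the file.
file_dataframe = (
    spark.read
    .option("header", True)
    .option("inferSchema", False)
    .schema(schema)
    .csv(abfs_path)
)
file_dataframe.count()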

 

Please 'Kudos' and 'Accept as Solution' if this answered your query.


Regards,
Vinodh
Microsoft MVP [Fabric]

Thanks for the response! I have run the commands after executing spark.sql("SELECT 1") and with schema inference disabled, but it doesn't resolve the issue. In fact, I've been running this on an interactive cluster several times, and every execution results in an execution time of 55 seconds with an underlying failure.
(screenshot attachment: visudevaram_0-1756719436447.png)

 

 
Shahid12523
Honored Contributor

The error comes from Fabric's Vegas I/O layer: the first CSV read fails on the pipe, and the second retry works.

This happens mainly due to schema inference overhead plus a connector hiccup.

TXT works because the text reader doesn't need schema inference.

Fix:

Define schema explicitly (.schema(...)) instead of inferring.

Use .option("inferSchema", False).

Add simple retry logic if needed (see the sketch at the end of this reply).

Root cause is a Vegas connector bug, not file size or executor capacity.
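
A minimal retry sketch for the third suggestion, using the notebook's spark session; read_csv_with_retry is a hypothetical helper name, and the broad except should be narrowed to the exception type you actually see on the action:

import time

def read_csv_with_retry(path, schema, max_attempts=3, delay_seconds=5):
    # Hypothetical helper: retries the read and count if the transient pipe error surfaces.
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            df = (
                spark.read
                .option("header", True)
                .option("inferSchema", False)
                .schema(schema)
                .csv(path)
            )
            df.count()  # force the read so a transient failure shows up here
            return df
        except Exception as error:  # narrow this to the actual exception type raised
            last_error = error
            time.sleep(delay_seconds)
    raise last_error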

Shahed Shaikh

Thanks for the response! I've tried the proposed fix, i.e. defined a schema explicitly and set the infer schema option to false, however the count operation still takes more than 55 seconds with an internal error.

(screenshot attachment: visudevaram_1-1756720435105.png)

v-prasare
Honored Contributor II

Hi @visudevaram ,

If your issue still persists, please consider raising a support ticket for further assistance. Our support team will help you address this issue.
To raise a support ticket for Fabric and Power BI, kindly follow the steps outlined in the following guide:

How to create a Fabric and Power BI Support ticket - Power BI | Microsoft Learn

 

 

Thanks,

Prashanth Are

MS Fabric community support

v-prasare
Honored Contributor II

We are following up once again regarding your query. Could you please confirm if the issue has been resolved through the support ticket with Microsoft?
If the issue has been resolved, we kindly request you to share the resolution or key insights here to help others in the community. If we don't hear back, we'll go ahead and close this thread.
Should you need further assistance in the future, we encourage you to reach out via the Microsoft Fabric Community Forum and create a new thread. We'll be happy to help.

Thank you for your understanding and participation.

Hi, the issue hasn't been resolved yet. As a workaround, I'm copying the file to OneLake and reading it from there, which works properly. I will raise a support request and have this investigated.
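
For anyone landing here with the same symptom, a minimal sketch of that workaround, assuming a Fabric notebook where notebookutils (or mssparkutils) is available; the workspace and lakehouse names are placeholders:

# Copy the file from the ADLS Gen2 path into OneLake, then read it from there.
source_path = "abfss://<my-container-name>@<my-sa-name>.dfs.core.windows.net/input/dir-name/file-name.csv"
onelake_path = "abfss://<workspace-name>@onelake.dfs.fabric.microsoft.com/<lakehouse-name>.Lakehouse/Files/input/file-name.csv"

notebookutils.fs.cp(source_path, onelake_path)  # mssparkutils.fs.cp works the same way

file_dataframe = spark.read.option("header", True).csv(onelake_path)
file_dataframe.count()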

v-prasare
Honored Contributor II

Thanks for the confirmation. Please do keep us posted if you get any updates from the support team.
