
visudevaram
New Contributor

Error while reading csv file - java.io.IOException: Pipe has no content

Hi, I'm facing an issue while reading a CSV file containing 10K rows: it takes more than a minute to read the file when I execute a count on the dataframe. The read succeeds, but the logs show the following error, where the 1st attempt to read the file takes 55 seconds and fails, and the 2nd attempt takes less than a second and succeeds.

 

java.io.IOException: Pipe has no content; awaitReadable() returned false [/mnt/vegas/pipes/b6bebd44-8996-43b0-b41a-dc6a637dfc1a.pipe, pos=0,blocksRead=0; bytesRead=0; availInPipe=0]
Vegas Service: Context=11e1107d-52f4-454a-bd25-834059954791, Abandoned pipe - unread by the time it was closed
	at org.apache.hadoop.shaded.com.microsoft.vegas.common.VegasPipeInputStream.pipelineThread(VegasPipeInputStream.java:536)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
 

Strangely, when I change the extension of the file to .txt and read it as a text file, it doesn't produce that error and executes in 2 seconds, so I don't suspect a network issue to be the cause. Also, we are using Large-sized nodes and the size of the file is 500 KB, so I doubt the executor size is an issue. Any other potential root causes here?

 

This error was posted by someone else here, but the resolution provided doesn't seem to be relevant in this case: https://community.fabric.microsoft.com/t5/Data-Engineering/Error-warnings-during-Delta-table-write-i...

 

Following is the code snippet I'm running:

 

abfs_path = "abfss://<my-container-name>@<my-sa-name>.dfs.core.windows.net/input/dir-name/file-name.csv"
file_dataframe = spark.read.option("header", True).csv(abfs_path)
file_dataframe.count()

 

Vinodh247
Contributor II

This issue is not about your code or the file content itself. It is related to how Fabric's underlying I/O layer handles streaming of small files in the Spark runtime during the first read attempt. Let me break this down for you clearly. It's a known issue as far as I know; this isn't the first time I've heard of it.

 

  • When you trigger spark.read.csv(...).count(), Spark sends a request to read the file via the pipe mechanism (used internally by Fabric for data movement).

  • The first attempt times out because the pipe reports Pipe has no content, meaning the pipeline did not deliver any bytes during the initial read window.

  • Spark retries automatically, and the second attempt succeeds almost instantly (under 1 second), which is why your job completes but shows the warning in logs.

This behaviour is seen mostly with:

  • Small files (under a few MBs)

  • Cold paths (files not cached/indexed)

  • First access after cluster spinup or after idle periods

When you rename the file to .txt and use spark.read.text(), the Vegas pipeline is not invoked in the same way. The reader uses a simpler I/O path, so you do not hit the timeout scenario that happens with the CSV reader's initial metadata and schema inference phase.

 

Can you try the following mitigation steps?

 

  • Avoid schema inference during read; this removes extra round trips for inference (see the sketch after this list).

  • Trigger a dummy read or spark.sql("SELECT 1") before reading the file to avoid cold-start latency.

  • If you have many small CSVs, combine them into a single larger file (although your current file is only ~500 KB, this helps if you scale up).

  • Even with large nodes, ensure there is no throttling. Use Spark UI or Fabric monitoring logs to confirm.
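
A minimal sketch of the first two suggestions combined, keeping the placeholder path from the original post; the column names in the schema are assumptions and should be replaced with the actual CSV columns:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Warm up the session first so the read below does not hit cold-start latency.
spark.sql("SELECT 1").collect()

# Hypothetical schema -- replace the fields with the actual columns of your CSV.
schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True),
])

abfs_path = "abfss://<my-container-name>@<my-sa-name>.dfs.core.windows.net/input/dir-name/file-name.csv"

# Explicit schema with inference disabled avoids the extra pass over the file.
file_dataframe = (
    spark.read
    .option("header", True)
    .option("inferSchema", False)
    .schema(schema)
    .csv(abfs_path)
)
file_dataframe.count()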

 

Please 'Kudos' and 'Accept as Solution' if this answered your query.


Regards,
Vinodh
Microsoft MVP [Fabric]

Thanks for the response! I have run the commands after executing spark.sql("SELECT 1") and with schema inference disabled, but it doesn't resolve the issue. In fact, I've been running this on an interactive cluster several times, and every execution results in an execution time of 55 seconds with an underlying failure.
(screenshot attachment: visudevaram_0-1756719436447.png)

 

 
Shahid12523
Honored Contributor

The error comes from Fabric's Vegas I/O layer: the first CSV read fails on the pipe, and the second retry works.

This happens mainly due to schema inference overhead plus a connector hiccup.

TXT works because the text reader doesn't need schema inference.

Fix:

Define schema explicitly (.schema(...)) instead of inferring.

Use .option("inferSchema", False).

Add simple retry logic if needed (see the sketch at the end of this reply).

Root cause is a Vegas connector bug, not file size or executor capacity.
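
A minimal retry sketch for the third suggestion, using the notebook's spark session; read_csv_with_retry is a hypothetical helper name, and the broad except should be narrowed to the exception type you actually see on the action:

import time

def read_csv_with_retry(path, schema, max_attempts=3, delay_seconds=5):
    # Hypothetical helper: retries the read and count if the transient pipe error surfaces.
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            df = (
                spark.read
                .option("header", True)
                .option("inferSchema", False)
                .schema(schema)
                .csv(path)
            )
            df.count()  # force the read so a transient failure shows up here
            return df
        except Exception as error:  # narrow this to the actual exception type raised
            last_error = error
            time.sleep(delay_seconds)
    raise last_error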

Shahed Shaikh

Thanks for the response! I've tried the proposed fix, i.e. defined a schema explicitly and set the infer schema option to false, however the count operation still takes more than 55 seconds with an internal error.

(screenshot attachment: visudevaram_1-1756720435105.png)

v-prasare
Honored Contributor II

Hi @visudevaram ,

If your issue still persists, please consider raising a support ticket for further assistance. Our support team will help you address this issue.
To raise a support ticket for Fabric and Power BI, kindly follow the steps outlined in the following guide:

How to create a Fabric and Power BI Support ticket - Power BI | Microsoft Learn

 

 

Thanks,

Prashanth Are

MS Fabric community support

v-prasare
Honored Contributor II

We are following up once again regarding your query. Could you please confirm if the issue has been resolved through the support ticket with Microsoft?
If the issue has been resolved, we kindly request you to share the resolution or key insights here to help others in the community. If we don't hear back, we'll go ahead and close this thread.
Should you need further assistance in the future, we encourage you to reach out via the Microsoft Fabric Community Forum and create a new thread. We'll be happy to help.

Thank you for your understanding and participation.

Hi, the issue hasn't been resolved yet. As a workaround, I'm copying the file to OneLake and reading it from there, which works properly. I will raise a support request and have this investigated.
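
For anyone landing here with the same symptom, a minimal sketch of that workaround, assuming a Fabric notebook where notebookutils (or mssparkutils) is available; the workspace and lakehouse names are placeholders:

# Copy the file from the ADLS Gen2 path into OneLake, then read it from there.
source_path = "abfss://<my-container-name>@<my-sa-name>.dfs.core.windows.net/input/dir-name/file-name.csv"
onelake_path = "abfss://<workspace-name>@onelake.dfs.fabric.microsoft.com/<lakehouse-name>.Lakehouse/Files/input/file-name.csv"

notebookutils.fs.cp(source_path, onelake_path)  # mssparkutils.fs.cp works the same way

file_dataframe = spark.read.option("header", True).csv(onelake_path)
file_dataframe.count()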

v-prasare
Honored Contributor II

Thanks for the confirmation. Please do keep us posted if you get any updates from the support team.
