Discussion:
Issue with "Bad file descriptor (9)" when moving lots of files
Alan
2011-04-08 12:55:46 UTC
I have an EON machine running with the CIFS version of snv_130. I've mounted a zpool from it on a Mac OS X machine via 'smb://'.

I don't have the rsync server set up yet on the EON box, so I'm just using a local rsync command to copy files from the Mac to EON, but the process keeps failing.

Running something like:

"rsync -av ./Photos /Volumes/coltrane/Incoming/FromAbacus/"

##################################################

Photos/2010-old/Eric Lindell at Mojo's 2010-06-15/aws-20100619x21204321-01.cr2

...... lots of files move ....
Photos/2010-old/Eric Lindell at Mojo's 2010-06-15/aws-20100620x00565151-01.cr2

Photos/2010-old/Eric Lindell at Mojo's 2010-06-15/aws-20100620x00565179-01.cr2

Photos/2010-old/Eric Lindell at Mojo's 2010-06-15/aws-20100620x00565206-01.cr2

rsync: writefd_unbuffered failed to write 32768 bytes [sender]: Broken pipe (32)

rsync: write failed on "/Volumes/coltrane/Incoming/FromAbacus/Photos/2010-old/Eric Lindell at Mojo's 2010-06-15/aws-20100620x00565206-01.cr2": Bad file descriptor (9)

rsync error: error in file IO (code 11) at /SourceCache/rsync/rsync-40/rsync/receiver.c(268) [receiver=2.6.9]

rsync: connection unexpectedly closed (29618 bytes received so far) [sender]

rsync error: error in rsync protocol data stream (code 12) at /SourceCache/rsync/rsync-40/rsync/io.c(452) [sender=2.6.9]

##################################################

If I start the process again, it picks up where it left off (in this case "Photos/2010-old/Eric Lindell at Mojo's 2010-06-15/aws-20100620x00565206-01.cr2") and the files are fine once they make it over to the EON box. Rsync then chugs along again for a while before hitting the same error again on a different file.
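Since simply re-running the command resumes the transfer, the manual restarts could be automated. The following is only a sketch of that workaround, not a fix for the underlying failure; the function name `retry` and the `RETRY_DELAY` variable are made up for illustration.

```shell
#!/bin/sh
# retry MAX command [args...]
# Runs the command until it exits 0, giving up after MAX attempts.
# RETRY_DELAY (hypothetical knob) is the pause in seconds between attempts.
retry() {
    max=$1
    shift
    attempt=1
    while :; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max failed" >&2
        if [ "$attempt" -ge "$max" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-5}"
    done
}

# Example with the command from this post:
# retry 20 rsync -av ./Photos /Volumes/coltrane/Incoming/FromAbacus/
```

This only hides the symptom; each failed attempt may still leave a partial temporary file behind on the destination.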

(Note: eventually, I'll just use the rsync daemon which hopefully won't have problems. This might be related to a larger issue so I wanted to bring it up.)


My questions:

1. Any ideas what's going on?

2. If I want to see if the smb version of EON has the same issue, is it safe to export my zpool from the cifs version, install the smb version and then import the pool? (From my understanding, this should work fine, I just want to confirm.)
--
This message posted from opensolaris.org
Andre Lue
2011-04-08 13:30:31 UTC
Yes, you can use the smb version without any issues. Just run the proper export and import sequence on the pool before shutdown/boot of each cifs/smb trial.

I'm not sure whether this is related to rsync version 2.6.9. I wonder whether -a is trying to set an attribute that is incompatible with the destination. As a test, try relaxing the switch to -t (timestamps only).

What happens if you try this locally, with an OS X directory as the destination (to exclude EON from the equation)?

I also have rsync 3.0.7 compiled for OS X, which I can send you later to try.
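The relaxed-switch test suggested above might look like the sketch below. The wrapper name `sync_relaxed` is hypothetical, and `-r` is an assumption added so that directories are still copied recursively; `-t` preserves modification times only, so no permissions, owners, or groups are set on the destination.

```shell
#!/bin/sh
# sync_relaxed SRC DEST
# Copies recursively (-r) and preserves modification times (-t) only,
# instead of the full -a set (-rlptgoD).
sync_relaxed() {
    rsync -rtv "$1" "$2"
}

# Example with the paths from the original post:
# sync_relaxed ./Photos /Volumes/coltrane/Incoming/FromAbacus/
```

If the failure still appears with these switches, attribute handling is a less likely cause.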
Alan
2011-04-17 23:47:06 UTC
Running 'rsync -av 2010-old /Volumes/coltrane/test-transfer/' from my 'Photos' directory, about 980 files moved and then I got this:

######################################################################

... bunch of stuff before this....

2010-old/Halloween Workshop with Bob and Mary/aws-20101016x21152402-01.cr2

2010-old/Halloween Workshop with Bob and Mary/aws-20101016x21155143-01.cr2

rsync: writefd_unbuffered failed to write 32768 bytes [sender]: Broken pipe (32)

rsync: write failed on "/Volumes/coltrane/test-transfer/2010-old/Halloween Workshop with Bob and Mary/aws-20101016x21155143-01.cr2": Bad file descriptor (9)

rsync error: error in file IO (code 11) at /SourceCache/rsync/rsync-40/rsync/receiver.c(268) [receiver=2.6.9]

rsync: connection unexpectedly closed (35488 bytes received so far) [sender]

rsync error: error in rsync protocol data stream (code 12) at /SourceCache/rsync/rsync-40/rsync/io.c(452) [sender=2.6.9]

######################################################################

Running 'ls -la | grep aws-20101016x2115' in the local source directory, I get:

-rw-rw-rw-@ 1 alans staff 7469886 Oct 17 2010 aws-20101016x21152402-01.cr2
-rw-rw-rw-@ 1 alans staff 11520662 Oct 17 2010 aws-20101016x21155143-01.cr2
-rw-rw-rw-@ 1 alans staff 11721658 Oct 17 2010 aws-20101016x21155870-01.cr2

And on the EON volume, I get:

-rwx------ 1 alans staff 2342912 Apr 17 19:28 .aws-20101016x21155143-01.cr2.rhlJTG
-rwx------ 1 alans staff 7469886 Oct 17 2010 aws-20101016x21152402-01.cr2

######################################################################

Once again, if I just reissue the rsync command, it picks up from where it left off and moves the problem file over just fine. Running ls -la on the eon directory I now get:

-rwx------ 1 alans staff 2342912 Apr 17 19:28 .aws-20101016x21155143-01.cr2.rhlJTG
-rwx------ 1 alans staff 7469886 Oct 17 2010 aws-20101016x21152402-01.cr2
-rwx------ 1 alans staff 11520662 Oct 17 2010 aws-20101016x21155143-01.cr2
-rwx------ 1 alans staff 11721658 Oct 17 2010 aws-20101016x21155870-01.cr2

So, the dead file is still there, but the fresh working copy is all set.
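The dead file follows rsync's temporary-file naming: a leading dot, the original file name, and a six-character random suffix. A heuristic sketch for listing such leftovers is shown below; the function name `list_rsync_leftovers` is made up, and the pattern can also match unrelated hidden files, so it only prints candidates and deletes nothing.

```shell
#!/bin/sh
# list_rsync_leftovers DIR
# Prints regular files under DIR whose names look like interrupted rsync
# temporaries: ".<name>.<6 characters>". Heuristic only; review before deleting.
list_rsync_leftovers() {
    find "$1" -type f -name '.*.??????' -print
}

# Example with the destination from this post:
# list_rsync_leftovers /Volumes/coltrane/test-transfer/
```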

######################################################################

Now here's the weird thing.... This all happened while going over a 1GB ethernet connection. However, when I disconnect the cable on the Mac and run the rsync over wireless, it doesn't appear to have an issue. I let it run for several hours and it never choked. I'm not going to do a lot of testing on that at this point, but I would have expected it to fail rather quickly as well. I'm sure that the total number of files sent over the slower wireless was well beyond where the wired connection usually fails. The wireless was plugging along nicely until I killed it so I could do a little more testing.

Other things that I've done:

* Mac to FireWire drive rsync - No issues.

* Mac to a Windows network share drive - No issues.

So, at this point, it looks like the issue is only occurring when I'm writing to the EON server over the 1GB ethernet. Next up, I'm going to try the updated rsync client for the Mac to see if that works better. Just wanted to put this in as a status update in the meantime.
Alan
2011-04-18 02:20:59 UTC
I got the 3.0.8 version you sent, but also found this link (http://www.bombich.com/rsync.html) with a mac specific set of patches for 3.0.7 so I ran that. (Not sure if you did the same thing for 3.0.8 or not and figured it was close enough for jazz....)

Anyway, I got similar results running the updated version, but decided to try something else.

I set up a standard "cp -r" to work on a different directory at the same time that rsync was running. It turns out that at the same time that rsync choked, I got a 'Bad file descriptor' message from cp. So, it looks to be something with the connection and not directly related to rsync.

I don't know if this has something to do with EON, my router, the configuration of my Mac, or all of the above. At this point, though, I think rsync can be eliminated from the list of potential sources.
Andre Lue
2011-04-18 02:59:21 UTC
Hi Al,

The -a in rsync -av is the archive switch:
"-a, --archive archive mode; equals -rlptgoD (no -H,-A,-X)"

For this file it will try to set all permissions, including the @ flag, which I don't think is compatible with ZFS:
-rw-rw-rw-@ 1 alans staff 7469886 Oct 17 2010 aws-20101016x21152402-01.cr2

It will not be able to recreate the @ info in "-rw-rw-rw-@" on any non-HFS filesystem. I suspect this may be causing the issue. One test would be to sync files that do not have this perm.

The bad file descriptor is more a filesystem related error than a connection error.
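To pick test files for the experiment suggested above, the `ls -l` listing can be filtered for regular files whose mode column does not end in "@". This is a rough sketch that only inspects the listing text; the function name `without_xattr_flag` is made up for illustration.

```shell
#!/bin/sh
# without_xattr_flag
# Reads `ls -l` output on stdin and prints the lines for regular files
# (mode starts with "-") whose mode column does not end in "@".
without_xattr_flag() {
    awk '$1 ~ /^-/ && $1 !~ /@$/'
}

# Example, run in the source directory:
# ls -l | without_xattr_flag
```

If every file in the source tree carries the flag, this prints nothing, and a freshly created file could serve as the test case instead.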