In my bioinformatics work I often stream files between Linux hosts and Amazon S3. This can look like:
$ scp host:/path/to/file /dev/stdout | \
    aws s3 cp - s3://bucket/path/to/file
This recently stopped working after upgrading:
ftruncate "/dev/stdout": Invalid argument
Couldn't write to "/dev/stdout": Illegal seek
I think I figured out why this is happening:
New versions of scp use the SFTP protocol instead of the SCP protocol. [1] With scp I can give the -O flag:
Use the legacy SCP protocol for file transfers instead of the SFTP protocol. Forcing the use of the SCP protocol may be necessary for servers that do not implement SFTP, for backwards-compatibility for particular filename wildcard patterns and for expanding paths with a ‘~’ prefix for older SFTP servers.
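Concretely, with the same paths as in the pipeline above, that would look something like:

$ scp -O host:/path/to/file /dev/stdout | \
    aws s3 cp - s3://bucket/path/to/file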
This does work, but it doesn't seem ideal: servers will probably drop support for the SCP protocol at some point. I've filed a bug with OpenSSH.
[1] "man scp" gives me: "Since OpenSSH 8.8 (8.7 in Red Hat/Fedora builds), scp has used the SFTP protocol for transfers by default."
Using scp to stdout looks weird to me no matter what. Why not
ssh -n host cat /path/to/file | weird-aws-stuff
… but do you really want to copy everything twice? Why not run weird-aws-stuff on the remote host itself?

The remote host supports SCP and SFTP, but not SSH.
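For reference, those two suggestions with the upload command from the post filled in would look roughly like the lines below; both assume the remote host allows running commands over SSH (which the reply above says isn't the case here), and the second also assumes the AWS CLI and credentials are available on the remote host:

$ ssh -n host cat /path/to/file | \
    aws s3 cp - s3://bucket/path/to/file
$ ssh host aws s3 cp /path/to/file s3://bucket/path/to/file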
The bigger problem is that there is no standard data format or standard tool for moving data bigger than 5 GB. (This includes aws s3, which is not an industry standard.)
Whoever builds the industry standard will get decision-making power over your specific issue.
Is there something special about 5 GB?
What’s wrong with streaming?