Strange behavior in ParallelCopyGCSDirectoryIntoHDFSSpark with NIO ver > 0.66
closed | Created 2019-05-13 | Last updated 2019-09-11 | Posted by SHuang-Broad


NIO Spark bug


Bug Report

Affected tool(s) or class(es)

ParallelCopyGCSDirectoryIntoHDFSSpark

Affected version(s)

  • Latest public release version [version?]
  • Latest master branch as of [2019-05-13]

Description

ParallelCopyGCSDirectoryIntoHDFSSpark behaves in the following strange way:

  • Under master/latest release, it fails to copy a GCS "directory" containing BAMs.
  • Under master/latest release, it successfully copies a GCS "directory" containing a reference.
  • After the NIO library version is changed from 81 to 66 in build.gradle, it successfully copies GCS "directories" containing either a reference or BAMs.
  • See the attached logs.

Steps to reproduce

Both scripts referred to below need to be updated before running, but the changes are trivial.

  • From the master branch, run the attached test.nio.ver.81.sh.

  • Create a branch from master, change the literal 81 to 66 on line 69 of build.gradle, and run the attached test.nio.ver.66.sh.
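
For anyone who would rather script the version swap in the second step than edit build.gradle by hand, here is a minimal sketch in Python. The helper name swap_literal_on_line is hypothetical and not part of GATK. It assumes the literal appears as a standalone number on the given line (not as part of a longer number), and that line 69 still holds the NIO version on the commit being tested, which may not be true on other revisions.

```python
import re


def swap_literal_on_line(text, line_no, old, new):
    """Replace the first standalone occurrence of `old` with `new`,
    touching only the 1-indexed line `line_no` of `text`.

    Hypothetical helper for illustration; raises ValueError instead of
    silently doing nothing when the line or literal is not found.
    """
    lines = text.splitlines(keepends=True)
    if not 1 <= line_no <= len(lines):
        raise ValueError(
            f"line {line_no} is out of range (text has {len(lines)} lines)"
        )
    # Lookarounds keep "81" from matching inside a longer number such as "181".
    pattern = re.compile(rf"(?<!\d){re.escape(old)}(?!\d)")
    updated, count = pattern.subn(lambda _m: new, lines[line_no - 1], count=1)
    if count == 0:
        raise ValueError(f"literal {old!r} not found on line {line_no}")
    lines[line_no - 1] = updated
    return "".join(lines)


# Example use, run from the repository root on the new branch (assumption:
# line 69 is the line described in the report):
#
#   from pathlib import Path
#   gradle = Path("build.gradle")
#   gradle.write_text(swap_literal_on_line(gradle.read_text(), 69, "81", "66"))
```

Failing loudly when the literal is missing from the target line is deliberate: if build.gradle has shifted since this report, the helper stops rather than leaving the NIO version unchanged without notice.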

Expected behavior

Files in the "directories" at the given gs path are copied successfully.

Actual behavior

The copy fails. See the attached logs.


test.nio.paraCopyHDFSSpark.zip

UPDATE: Re-uploaded the attachment.

