I’m seeing a lot of jobs that fail because of socket errors in
TNetXNGFile::Open ERROR [ERROR] Socket error
BTagTestDumper ERROR Couldn't open file: root://xrootd.ft.uam.es:1094//pnfs/ft.uam.es/data/atlas/atlasdatadisk/rucio/mc16_13TeV/08/03/DAOD_FTAG5.19513463._000003.pool.root.1
My understanding is that the jobs are timing out. This isn’t a major problem, since the grid is smart enough to retry the jobs, but it would be nice to know if there’s a standard solution (like increasing the timeout) that people would recommend.
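For concreteness, one workaround I could imagine is retrying the open inside the job before giving up, rather than relying only on the grid-level retry. Below is a minimal sketch of that idea, not a recommended or standard solution. `openWithRetry` is a hypothetical helper of my own (it is not part of ROOT or XRootD), and the attempt count and delays are arbitrary assumptions.

```cpp
#include <chrono>
#include <memory>
#include <thread>

// Hypothetical helper (not a ROOT or XRootD API): call `opener` until it
// returns something that evaluates to true, or until `maxAttempts` calls
// have been made. The wait between attempts doubles each time.
template <typename Opener>
auto openWithRetry(Opener opener,
                   int maxAttempts,
                   std::chrono::milliseconds initialDelay)
    -> decltype(opener())
{
    decltype(opener()) handle{};  // starts out null for pointer-like types
    auto delay = initialDelay;
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        handle = opener();
        if (handle) {
            return handle;  // success: stop retrying
        }
        if (attempt < maxAttempts) {
            // Back off before the next attempt; no sleep after the last one.
            std::this_thread::sleep_for(delay);
            delay *= 2;
        }
    }
    return handle;  // still null: every attempt failed
}
```

With ROOT this could wrap the open call, e.g. `openWithRetry([&] { return std::unique_ptr<TFile>(TFile::Open(url)); }, 3, std::chrono::seconds(5))`, where `url` stands for the `root://` path above. This only retries the open itself and does nothing about the underlying timeout; the XRootD client also reads environment variables such as `XRD_REQUESTTIMEOUT`, but I don't know what values would be sensible here.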
My code is a simple for loop which reads from