MJB RECAST not working (atlas/atlas-conf-2018-041)

Running the mjb recast example from the tutorial (Workflows in ATLAS: The RECAST Command Line Interface):

recast run atlas/atlas-conf-2018-041 --tag mjb

is failing for me. Here is a snippet of the terminal output up to the error:

bash-3.2$ recast run atlas/atlas-conf-2018-041 --tag v1
2022-01-04 20:34:45,525 | packtivity.asyncback |   INFO | configured pool size to 4
2022-01-04 20:34:50,187 |      yadage.creators |   INFO | initializing workflow with initdata: {'databkgcache': 'cache.root', 'did': 375893, 'inputdata_xrootd': 'root://eosuser.cern.ch//eos/project/r/recast/atlas/ATLAS-CONF-2018-041/testdata/signal_evsel_inputs', 'mc16a_pattern': 'r9364', 'mc16d_pattern': 'r10201', 'weightfiles': ['mc16a_weights.json', 'mc16d_weights.json']} discover: True relative: True
2022-01-04 20:34:50,188 |    adage.pollingexec |   INFO | preparing adage coroutine.
2022-01-04 20:34:50,188 |                adage |   INFO | starting state loop.
2022-01-04 20:34:50,350 |     yadage.wflowview |   INFO | added </init:0|defined|unknown>
2022-01-04 20:34:52,379 |     yadage.wflowview |   INFO | added </selectsignals:0|defined|unknown>
2022-01-04 20:34:54,332 |    adage.pollingexec |   INFO | submitting nodes [</init:0|defined|known>]
2022-01-04 20:34:55,808 |       pack.init.step |   INFO | publishing data: <TypedLeafs: {'databkgcache': '/Users/jhaley/work/RECAST/tutorial/recast-v1/init/cache.root', 'did': 375893, 'inputdata_xrootd': 'root://eosuser.cern.ch//eos/project/r/recast/atlas/ATLAS-CONF-2018-041/testdata/signal_evsel_inputs', 'mc16a_pattern': 'r9364', 'mc16d_pattern': 'r10201', 'weightfiles': ['/Users/jhaley/work/RECAST/tutorial/recast-v1/init/mc16a_weights.json', '/Users/jhaley/work/RECAST/tutorial/recast-v1/init/mc16d_weights.json']}>
2022-01-04 20:34:55,808 |                adage |   INFO | unsubmittable: 0 | submitted: 0 | successful: 0 | failed: 0 | total: 2 | open rules: 2 | applied rules: 2
2022-01-04 20:34:57,987 |           adage.node |   INFO | node ready </init:0|success|known>
2022-01-04 20:34:58,016 |    adage.pollingexec |   INFO | submitting nodes [</selectsignals:0|defined|unknown>]
2022-01-04 20:34:58,022 | pack.selectsignals.s |   INFO | starting file logging for topic: step
2022-01-04 20:35:06,777 | pack.selectsignals.s |  ERROR | non-zero return code raising exception
2022-01-04 20:35:06,778 | pack.selectsignals.s |  ERROR | subprocess failed. code: 1,  command docker run --rm -i  --cidfile /Users/jhaley/work/RECAST/tutorial/recast-v1/selectsignals/_packtivity/selectsignals.cid  -e KRB_SETUP_SCRIPT='/recast_auth/getkrb.sh'  -u root  -v /Users/jhaley/work/RECAST/tutorial/recast-v1/selectsignals:/Users/jhaley/work/RECAST/tutorial/recast-v1/selectsignals:rw -v /Users/jhaley/work/RECAST/tutorial/recast-v1/init:/Users/jhaley/work/RECAST/tutorial/recast-v1/init:rw -v /Users/jhaley/work/RECAST/tutorial/authdir:/recast_auth:rw gitlab-registry.cern.ch/recast-atlas/susy/atlas-conf-2018-041/mbj_analysis:ATLAS-CONF-2018-041 sh -c sh
Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/packtivity/handlers/execution_handlers.py", line 338, in execute_and_tail_subprocess
    raise subprocess.CalledProcessError(
subprocess.CalledProcessError: Command 'docker run --rm -i  --cidfile /Users/jhaley/work/RECAST/tutorial/recast-v1/selectsignals/_packtivity/selectsignals.cid  -e KRB_SETUP_SCRIPT='/recast_auth/getkrb.sh'  -u root  -v /Users/jhaley/work/RECAST/tutorial/recast-v1/selectsignals:/Users/jhaley/work/RECAST/tutorial/recast-v1/selectsignals:rw -v /Users/jhaley/work/RECAST/tutorial/recast-v1/init:/Users/jhaley/work/RECAST/tutorial/recast-v1/init:rw -v /Users/jhaley/work/RECAST/tutorial/authdir:/recast_auth:rw gitlab-registry.cern.ch/recast-atlas/susy/atlas-conf-2018-041/mbj_analysis:ATLAS-CONF-2018-041 sh -c sh' returned non-zero exit status 1.
...

And here is the log file from the run of the selectsignals step (selectsignals/_packtivity/selectsignals.run.log), showing an error when running xrdfs:

2022-01-04 20:35:00,399 | pack.selectsignals.r |   INFO | starting file logging for topic: run
2022-01-04 20:35:00,928 | pack.selectsignals.r |   INFO | b'Password for recasttu@CERN.CH:'
2022-01-04 20:35:01,332 | pack.selectsignals.r |   INFO | b'Configured GCC from: /opt/lcg/gcc/6.2.0/x86_64-slc6'
2022-01-04 20:35:01,427 | pack.selectsignals.r |   INFO | b'Configured AnalysisBase from: /usr/AnalysisBase/21.2.27/InstallArea/x86_64-slc6-gcc62-opt'
2022-01-04 20:35:01,852 | pack.selectsignals.r |   INFO | b'Warning in <TInterpreter::ReadRootmapFile>: class  IBTaggingTruthTaggingTool found in libFTagAnalysisInterfacesDict.so  is already in libBTaggingTruthTaggingToolDict.so'
2022-01-04 20:35:01,853 | pack.selectsignals.r |   INFO | b'Warning in <TInterpreter::ReadRootmapFile>: class  BTaggingTruthTaggingTool found in libxAODBTaggingEfficiencyDict.so  is already in libBTaggingTruthTaggingToolDict.so'
2022-01-04 20:35:02,433 | pack.selectsignals.r |   INFO | b'xAOD::Init                INFO    Environment initialised for data access'
2022-01-04 20:35:06,611 | pack.selectsignals.r |   INFO | b'[ERROR] Internal error'
2022-01-04 20:35:06,614 | pack.selectsignals.r |   INFO | b'/build1/atnight/localbuilds/nightlies/21.2/athena/PhysicsAnalysis/D3PDTools/RootCoreUtils/Root/ShellExec.cxx:42:exception: command failed: xrdfs eosuser.cern.ch ls -l /eos/project/r/recast/atlas/ATLAS-CONF-2018-041/testdata/signal_evsel_inputs'
2022-01-04 20:35:06,615 | pack.selectsignals.r |   INFO | b'with output:'
2022-01-04 20:35:06,616 | pack.selectsignals.r |   INFO | b''
2022-01-04 20:35:06,617 | pack.selectsignals.r |   INFO | b'Traceback (most recent call last):'
2022-01-04 20:35:06,618 | pack.selectsignals.r |   INFO | b'File "/MBJ/build/x86_64-slc6-gcc62-opt/bin/MBJ_run.py", line 320, in <module>'
2022-01-04 20:35:06,619 | pack.selectsignals.r |   INFO | b'ROOT.SH.ScanDir().scan(sh_all, sh_list)'
2022-01-04 20:35:06,620 | pack.selectsignals.r |   INFO | b'TypeError: none of the 2 overloaded methods succeeded. Full details:'
2022-01-04 20:35:06,621 | pack.selectsignals.r |   INFO | b'const SH::ScanDir& SH::ScanDir::scan(SH::SampleHandler& sh, const string& dir) =>'
2022-01-04 20:35:06,622 | pack.selectsignals.r |   INFO | b'could not convert argument 2'
2022-01-04 20:35:06,623 | pack.selectsignals.r |   INFO | b'const SH::ScanDir& SH::ScanDir::scan(SH::SampleHandler& sh, SH::DiskList& list) =>'
2022-01-04 20:35:06,624 | pack.selectsignals.r |   INFO | b'/build1/atnight/localbuilds/nightlies/21.2/athena/PhysicsAnalysis/D3PDTools/RootCoreUtils/Root/ShellExec.cxx:42:exception: command failed: xrdfs eosuser.cern.ch ls -l /eos/project/r/recast/atlas/ATLAS-CONF-2018-041/testdata/signal_evsel_inputs'
2022-01-04 20:35:06,625 | pack.selectsignals.r |   INFO | b'with output:'
2022-01-04 20:35:06,626 | pack.selectsignals.r |   INFO | b'(C++ exception of type RCU::ExceptionMsg)'

Am I doing something wrong or is there an issue with this recast?
Thanks.
~Joe

I should note that I am able to successfully run recast for my local version of tutorial/vhbb:

recast run tutorial/vhbb --tag v1

Thanks for the report @jhaley. I’m able to reproduce your error on recast-atlas v0.1.9 with the following debug script:

#!/bin/bash

export RECAST_AUTH_USERNAME=xxx
export RECAST_AUTH_PASSWORD=xxx
export RECAST_AUTH_TOKEN=xxx

eval "$(recast auth setup -a ${RECAST_AUTH_USERNAME} -a ${RECAST_AUTH_PASSWORD} -a ${RECAST_AUTH_TOKEN} -a default)"
eval "$(recast auth write --basedir authdir)"

printf '\n# recast catalogue ls\n'
recast catalogue ls
printf '\n# recast catalogue describe atlas/atlas-conf-2018-041\n'
recast catalogue describe atlas/atlas-conf-2018-041
printf '\n# recast catalogue check atlas/atlas-conf-2018-041\n'
recast catalogue check atlas/atlas-conf-2018-041

# run the workflow
TAG_NAME="debug"
if [ -d "recast-${TAG_NAME}" ];then
    sudo rm -rf "recast-${TAG_NAME}"
fi

recast run atlas/atlas-conf-2018-041 --backend docker --tag "${TAG_NAME}"

Also good catch on the selectsignals/_packtivity/selectsignals.run.log and thank you for reading the logs. :+1:
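For anyone else debugging a failed step, here is a minimal sketch of a way to surface the error lines from the per-step run logs. Note that scan_run_logs is a hypothetical helper, not a recast-atlas command, and it assumes the <workdir>/<step>/_packtivity/<step>.run.log layout seen in this thread.

```shell
#!/bin/bash
# scan_run_logs is a hypothetical helper, not part of recast-atlas.
# Assumption: step logs live at <workdir>/<step>/_packtivity/<step>.run.log,
# as in the recast-v1/selectsignals/_packtivity/selectsignals.run.log path above.
scan_run_logs() {
    local workdir="$1"
    # -H prints the file name and -n the line number of every matching line.
    # grep exits non-zero when nothing matches or no log file exists.
    grep -H -n -E 'ERROR|exception|Traceback' "${workdir}"/*/_packtivity/*.run.log
}
```

Calling scan_run_logs recast-v1 in the directory where the workflow was run should then list lines such as the '[ERROR] Internal error' and ShellExec.cxx exception entries quoted above, together with the log file they came from.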

Off the top of my head I’m not sure what’s wrong. I’ll tag @lheinric here, as he was the one who made the ATLAS-CONF-2018-041 workflow, but that was over 3 years ago, in August 2018, before there were any stable releases of recast-atlas, so it is possible there have been some breaking changes along the way. (Though I guess we know that it worked(?) back in August 2019, when @damacdon taught it at the 2019 US ATLAS Computing Bootcamp, presumably with recast-atlas v0.0.16. The selectsignals.run.log also seems to indicate that this is a workflow software problem as opposed to a recast-atlas one.)

Regardless though, if the atlas/atlas-conf-2018-041 workflow is broken then it shouldn’t be included in the example recast-atlas catalogue.

I’ve created the GitLab issue "Workflow failing with recast-atlas v0.1.9 (#1)" in recast-atlas / susy / ATLAS-CONF-2018-041 to track this for the time being, as well as the GitHub issue "ATLAS-CONF-2018-041 example fails" (Issue #93 in recast-hep/recast-atlas). I’ll link any additional Issues back here.

This has now been resolved in the GitLab issue "Workflow failing with recast-atlas v0.1.9 (#1)" in recast-atlas / susy / ATLAS-CONF-2018-041.

@jhaley, I know that you were having some further credential issues, so if you are still having them please continue to ask questions here and we can try to resolve them. :+1:

Thank you though for first opening this question, as it highlighted a pretty pathological issue on EOS!

@jhaley, something you can add to your Bash script, .gitlab-ci.yml, or however you’re running things is the following set of lines, which should help you determine whether you have the correct authorization for the files you need (see check-access-image and check-access-xrootd):

  # Authentication
  # Authenticate to pull your analysis image(s)
  - eval "$(recast auth setup -a ${RECAST_USER} -a ${RECAST_PASS} -a ${RECAST_TOKEN} -a default)"
  # Authenticate to download inputs from eos via xrootd
  - eval "$(recast auth write --basedir authdir)"

  # Check access to images and files
  - recast auth check-access-image gitlab-registry.cern.ch/recast-atlas/susy/atlas-conf-2018-041/mbj_analysis:ATLAS-CONF-2018-041
  - recast auth check-access-image gitlab-registry.cern.ch/recast-atlas/susy/atlas-conf-2018-041/mbj_histfitter:ATLAS-CONF-2018-041

  - recast auth check-access-xrootd root://eosproject.cern.ch//eos/project/r/recast/atlas/ATLAS-CONF-2018-041/

The advantage here is that these are explicit checks that run before the rest of the workflow, so if there are access problems the job will fail sooner and for a clear reason.
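If you are running from a plain Bash script rather than CI, the same fail-fast idea can be sketched roughly as below. This is only an illustration: run_preflight is a hypothetical helper, not part of recast-atlas, and it just runs each check command in order and stops at the first one that fails.

```shell
#!/bin/bash
# run_preflight is a hypothetical helper, not part of recast-atlas.
# Each argument is one check command given as a single string.
run_preflight() {
    local check
    for check in "$@"; do
        printf '# %s\n' "${check}"
        # Stop at the first failing check so the cause is obvious.
        if ! bash -c "${check}"; then
            printf 'preflight failed: %s\n' "${check}" >&2
            return 1
        fi
    done
}
```

You would pass it the check-access-image and check-access-xrootd commands from the lines above, each as one quoted string, and only call recast run if run_preflight returns successfully.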