Properly using FAXbox for data transfer and storage

If I’m working on an ATLAS Connect login node and want to transfer datasets to FAXbox, so that I can easily run dedicated jobs on that data on HTCondor batch systems, what is the smartest way to do that?

In this example, if I’m looking at MC16 dijet events across all JZ slices, that’s a decent chunk of data.

$ rucio list-dids mc16_13TeV:mc16_13TeV.*.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ*WithSW.deriv.DAOD_JETM1.e7142_s*_r1*_p4049
| SCOPE:NAME                                                                                                          | [DID TYPE]   |
| mc16_13TeV:mc16_13TeV.364711.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ11WithSW.deriv.DAOD_JETM1.e7142_s3126_r10724_p4049 | CONTAINER    |
| mc16_13TeV:mc16_13TeV.364712.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ12WithSW.deriv.DAOD_JETM1.e7142_s3126_r10724_p4049 | CONTAINER    |
| mc16_13TeV:mc16_13TeV.364708.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ8WithSW.deriv.DAOD_JETM1.e7142_s3126_r10724_p4049  | CONTAINER    |
| mc16_13TeV:mc16_13TeV.364710.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ10WithSW.deriv.DAOD_JETM1.e7142_s3126_r10724_p4049 | CONTAINER    |
| mc16_13TeV:mc16_13TeV.364704.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ4WithSW.deriv.DAOD_JETM1.e7142_s3126_r10201_p4049  | CONTAINER    |
| mc16_13TeV:mc16_13TeV.364709.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ9WithSW.deriv.DAOD_JETM1.e7142_s3126_r10201_p4049  | CONTAINER    |
| mc16_13TeV:mc16_13TeV.364706.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ6WithSW.deriv.DAOD_JETM1.e7142_s3126_r10724_p4049  | CONTAINER    |
| mc16_13TeV:mc16_13TeV.364701.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ1WithSW.deriv.DAOD_JETM1.e7142_s3126_r10724_p4049  | CONTAINER    |
| mc16_13TeV:mc16_13TeV.364705.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ5WithSW.deriv.DAOD_JETM1.e7142_s3126_r10724_p4049  | CONTAINER    |
| mc16_13TeV:mc16_13TeV.364700.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ0WithSW.deriv.DAOD_JETM1.e7142_s3126_r10201_p4049  | CONTAINER    |
| mc16_13TeV:mc16_13TeV.364703.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ3WithSW.deriv.DAOD_JETM1.e7142_s3126_r10724_p4049  | CONTAINER    |
| mc16_13TeV:mc16_13TeV.364704.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ4WithSW.deriv.DAOD_JETM1.e7142_s3126_r10724_p4049  | CONTAINER    |
| mc16_13TeV:mc16_13TeV.364703.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ3WithSW.deriv.DAOD_JETM1.e7142_s3126_r10201_p4049  | CONTAINER    |
| mc16_13TeV:mc16_13TeV.364700.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ0WithSW.deriv.DAOD_JETM1.e7142_s3126_r10724_p4049  | CONTAINER    |
| mc16_13TeV:mc16_13TeV.364710.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ10WithSW.deriv.DAOD_JETM1.e7142_s3126_r10201_p4049 | CONTAINER    |
| mc16_13TeV:mc16_13TeV.364701.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ1WithSW.deriv.DAOD_JETM1.e7142_s3126_r10201_p4049  | CONTAINER    |
| mc16_13TeV:mc16_13TeV.364712.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ12WithSW.deriv.DAOD_JETM1.e7142_s3126_r10201_p4049 | CONTAINER    |
| mc16_13TeV:mc16_13TeV.364709.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ9WithSW.deriv.DAOD_JETM1.e7142_s3126_r10724_p4049  | CONTAINER    |
| mc16_13TeV:mc16_13TeV.364707.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ7WithSW.deriv.DAOD_JETM1.e7142_s3126_r10724_p4049  | CONTAINER    |
| mc16_13TeV:mc16_13TeV.364708.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ8WithSW.deriv.DAOD_JETM1.e7142_s3126_r10201_p4049  | CONTAINER    |
| mc16_13TeV:mc16_13TeV.364705.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ5WithSW.deriv.DAOD_JETM1.e7142_s3126_r10201_p4049  | CONTAINER    |
| mc16_13TeV:mc16_13TeV.364702.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ2WithSW.deriv.DAOD_JETM1.e7142_s3126_r10724_p4049  | CONTAINER    |
| mc16_13TeV:mc16_13TeV.364707.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ7WithSW.deriv.DAOD_JETM1.e7142_s3126_r10201_p4049  | CONTAINER    |
| mc16_13TeV:mc16_13TeV.364706.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ6WithSW.deriv.DAOD_JETM1.e7142_s3126_r10201_p4049  | CONTAINER    |
| mc16_13TeV:mc16_13TeV.364711.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ11WithSW.deriv.DAOD_JETM1.e7142_s3126_r10201_p4049 | CONTAINER    |
| mc16_13TeV:mc16_13TeV.364702.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ2WithSW.deriv.DAOD_JETM1.e7142_s3126_r10201_p4049  | CONTAINER    |

If I want these files, what’s the best way to transfer them so I can use them? Should I go to my FAXbox area on ATLAS Connect and then manually initiate a bunch of rucio get commands against these datasets?

$ cd /faxbox2/user/"${USER}"
$ mkdir mc1516
$ cd $_
$ # rucio get scripted commands go here
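
For concreteness, those scripted commands might look like the sketch below: one rucio get per DID read from standard input. The function name get_all_dids and the DRY_RUN switch are illustrative assumptions, not part of rucio, and by default the sketch only prints the commands instead of running them.

```shell
# Hypothetical helper: issue one `rucio get` per DID read from stdin.
# With DRY_RUN=1 (the default here) the commands are only printed.
get_all_dids() {
    local did
    while IFS= read -r did; do
        [ -n "$did" ] || continue          # skip blank lines
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "rucio get $did"
        else
            rucio get "$did"
        fi
    done
}

# Dry run over two of the containers listed above.
printf '%s\n' \
    'mc16_13TeV:mc16_13TeV.364700.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ0WithSW.deriv.DAOD_JETM1.e7142_s3126_r10724_p4049' \
    'mc16_13TeV:mc16_13TeV.364701.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ1WithSW.deriv.DAOD_JETM1.e7142_s3126_r10724_p4049' \
    | get_all_dids
```

Setting DRY_RUN=0 would run the downloads into the current directory, which is the manual approach I’d like to avoid if something better exists.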

Or is there a more intelligent way to tell FAXbox that it should make these files available to me? Or should I do something else entirely?

(This is, I think, tangential to @damacdon’s question, “How do I download files from a directory on eos recursively using xrootd?”)

Assuming you have a rucio storage element (RSE) you can access via FAX, you can ask r2d2 to move things for you. But web GUIs are annoying, so better to script it.

The basic component you need is rucio add-rule (which of course has a --help option).

I have a few functions that do this. For example, to replicate datasets to an arbitrary RSE I use this:

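function rucio-replicate-to () 
{
    # Require at least one RSE name as an argument.
    if [[ $# -lt 1 ]]; then
        echo "No RSE given" 1>&2;
        return 1;
    fi;
    # Read the dataset list from piped stdin (assumed to be a plain $(cat)).
    local DATASETS="";
    if [[ ! -t 0 ]]; then
        DATASETS=$(cat);
    fi;
    if [[ ! -n $DATASETS ]]; then
        echo "Pipe in datasets" 1>&2;
        return 1;
    fi;
    local DS;
    # 1296000 s = 15 days; multiple RSEs are joined with "|" into one RSE expression.
    local OPTS="--lifetime 1296000 --asynchronous";
    local EX=$(echo $@ | sed -r 's/\s+/|/g');
    for DS in $DATASETS;
    do
        rucio add-rule $DS 1 $EX $OPTS > /dev/null;
        echo $DS;
    done
}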
which is used like cat list-of-datasets.txt | rucio-replicate-to SOME_RSE
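
To unpack the two less obvious pieces of the function above: the sed call joins all RSE arguments with "|", which RSE expressions treat as a union, and 1296000 seconds is 15 days. Here is a minimal sketch of both. The helper names rse_expression and lifetime_seconds and the SITE_* RSE names are made up for illustration and are not part of rucio.

```shell
# Hypothetical helpers, not part of rucio.

# Join all arguments with "|" to form an RSE expression (a union of RSEs).
rse_expression() {
    local IFS='|'
    echo "$*"
}

# Convert a number of days into the seconds expected by --lifetime.
lifetime_seconds() {
    echo $(( $1 * 24 * 3600 ))
}

rse_expression SITE_A_SCRATCHDISK SITE_B_SCRATCHDISK   # prints SITE_A_SCRATCHDISK|SITE_B_SCRATCHDISK
lifetime_seconds 15                                    # prints 1296000
```

With a union like this and a copy count of 1, rucio only needs to place one replica on one of the listed RSEs, not on all of them.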

You can also use rucio list-rses to get a list of all of them. I also wrote a bash tab complete for the storage elements, in this case scratch disks:

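_rucio-replicate-to () 
{
    # Cache the RSE list locally so completion does not query rucio every time.
    local LOCAL_RSE_LIST=~/.rses;
    if [[ ! -f $LOCAL_RSE_LIST ]]; then
        rucio list-rses > $LOCAL_RSE_LIST;
    fi;
    # Offer only scratch disks as completions.
    local DISKS=$(grep 'SCRATCHDISK$' ${LOCAL_RSE_LIST});
    local word=$2;
    COMPREPLY=($(compgen -W "$DISKS" -- $word));
    return 0
}
# Assumed registration step, needed for bash to use the function:
complete -F _rucio-replicate-to rucio-replicate-to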

Of course, you can change the grep line to filter for whatever RSE pattern you want. There might be some smarter way to filter for ones that support FAX.


Thanks @dguest — this is already helpful.

“You can also use rucio list-rses to get a list of all of them”

I now have a question that is more related to how ATLAS Connect works, as I don’t actually know what ATLAS Connect node I attach to. As I’m at the University of Illinois I’m physically closest to the MWT2 (and Illinois is part of it), but the MWT2 scratch RSE

$ rucio list-rses | grep SCRATCH | grep MWT2

isn’t guaranteed to be my ATLAS Connect login

$ hostname

I don’t really understand how much it matters, but I should probably figure out where the best place to request replication of datasets is.

This is more a question about how FAXbox works, but my understanding is that FAXbox is just another file system built on the underlying FAX storage system. So if you get the files onto an RSE that FAX can access, there’s no need to use FAXbox: you should already have access via xrootd.

Anyway, this answer is out of alignment with your original question now, especially as it’s summarized in the title. You were asking how to use FAXbox, and although the title doesn’t say so, you’re specifically asking about using it for things that already exist on an RSE.

My answer is basically that you shouldn’t be using FAXbox if you already have a way to transfer to an RSE that you can access via FAX. It looks like you have a few options that might work locally:

rucio list-rses | grep MWT2

but it also might be worth looking at the RSE expressions documentation to see if there’s a more general expression that will move the data to a site you want. It’s not obvious that there’s an RSE attribute for “supports FAX” but you could ask the FAX people for help there.

I also just found some instructions to read data directly off xrootd. The trick seems to be this command:

rucio list-file-replicas --protocol root --pfns --rse MY-FAVORITE_LOCALGROUPDISK container_name

which will give you a list of the files in a format that xrootd can use to read them in.
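
As a rough sketch of how that list might feed a batch job: keep only the root:// lines from the command’s output and write them to a text file that each job reads. The function name pfns_to_filelist is an illustrative assumption, and the PFNs in the example are placeholders, not real replicas.

```shell
# Hypothetical helper: read `rucio list-file-replicas --pfns` output on stdin
# and write only the xrootd URLs (lines starting with root://) to the file $1.
pfns_to_filelist() {
    grep '^root://' > "$1" || true   # no matches just leaves an empty file
}

# Example with placeholder PFNs instead of a live rucio call.
filelist="$(mktemp)"
printf '%s\n' \
    'root://xrootd.example.org:1094//path/to/file_1.root' \
    'some non-PFN line' \
    'root://xrootd.example.org:1094//path/to/file_2.root' \
    | pfns_to_filelist "$filelist"
cat "$filelist"
```

In real use the input would come from the rucio list-file-replicas command above. Each line of the resulting file can then be opened directly by ROOT (for example with TFile::Open) or copied locally with xrdcp.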

So, long story short: you don’t need to use FAXbox for this at all. A few commands should get the datasets to a local RSE, and then you can read them directly.


Thanks, @dguest. This is exactly what I needed. As an explicit example, for me this was

$ rucio list-file-replicas --protocol root --pfns --rse MWT2_UC_SCRATCHDISK user.feickert.periodA.physics_Main.grp16_v01_p4061._2020-09-03_11-38_tree.root | tail