How to select the correct jet author for a b-tagging tool?

As someone who has little experience with actually applying flavor tagging in analyses, I’m not sure how to properly select the JetAuthor property when setting the properties for a b-tagging tool. Naively, it isn’t clear what the name “author” is meant to convey.

Explicitly, I want to get b-tagging score information on ghost tagged VR track jets with tracks ghost associated to them inside of large-R jets (so eventually using the GhostVR30Rmax4Rmin02TrackJetGhostTag aux info in my derivation’s AntiKt10LCTopoJets container to link to the AntiKtVR30Rmax4Rmin02TrackJets). Looking at the (March 2020) recommendations of the Tagger Recommendations for Release 21 TWiki and the Tagger Calibration Recommendations for Release 21 TWiki I find that the supported jet collections I can use are the AntiKtVR30Rmax4Rmin02TrackJets with DL1r and the recommended CDI file is xAODBTaggingEfficiency/13TeV/2020-21-13TeV-MC16-CDI-2020-03-11_v2.root with a time stamp of 201903.

Assuming that I have understood this information correctly, and if I’m initializing my b-tagging tool

ANA_CHECK(m_BJetSelectTool_handle.setProperty("FlvTagCutDefinitionsFileName", m_corrFileName));
ANA_CHECK(m_BJetSelectTool_handle.setProperty("TaggerName", m_taggerName));
ANA_CHECK(m_BJetSelectTool_handle.setProperty("OperatingPoint", m_operatingPt));
ANA_CHECK(m_BJetSelectTool_handle.setProperty("JetAuthor", m_jetAuthor));

then it is clear what I would use except for the JetAuthor:

  • FlvTagCutDefinitionsFileName: xAODBTaggingEfficiency/13TeV/2020-21-13TeV-MC16-CDI-2020-03-11_v2.root
  • TaggerName: DL1r
  • OperatingPoint: your choice depending on what you want (here for an example can use FixedCutBEff_85)
  • JetAuthor: ???

I think from discussions and the Tagger Calibration Recommendations for Release 21 TWiki mentioning that

Note: the JetAuthor is usually the name of the jet collection by convention

that I would want to use AntiKtVR30Rmax4Rmin02TrackJets_BTagging201903, as I want to b-tag AntiKtVR30Rmax4Rmin02TrackJets using the recommended CDI file. However, it isn’t clear to me whether this is right, how I would determine this in general, or what the JetAuthor actually does.

How do I properly determine the JetAuthor in general?

This is obviously very important, and so I want to make sure I understand what I’m doing as it is noted in the Tagger Calibration Recommendations for Release 21 TWiki that

If you configure the tool with JetAuthor = "AntiKt4EMPFlowJets_BTagging201903" , and use it to calibrate a different collection (e.g. "AntiKt4EMTopoJets_BTagging201810" ) the tool will happily return scale factors. These results will be nonsense so this should be avoided.

@dguest if you or Bing have comments here, that would be helpful.

Hi @feickert, @biliu, here’s a table:

| Jet collection (old EDM) | New name in EDM | JetAuthor in new CDI |
|---|---|---|
| AntiKtVR30Rmax4Rmin02TrackJets | AntiKtVR30Rmax4Rmin02TrackJets_BTagging201810 | AntiKtVR30Rmax4Rmin02TrackJets_BTagging201810 |
| | AntiKtVR30Rmax4Rmin02TrackJets_BTagging201903 | AntiKtVR30Rmax4Rmin02TrackJets_BTagging201903 |
| AntiKtVR30Rmax4Rmin02TrackGhostTagJets | AntiKtVR30Rmax4Rmin02TrackGhostTagJets_BTagging201810 | AntiKtVR30Rmax4Rmin02TrackJets_BTagging201810 ¹ |
| AntiKt4EMPFlowJets | AntiKt4EMPFlowJets_BTagging201810 | AntiKt4EMPFlowJets_BTagging201810 |
| | AntiKt4EMPFlowJets_BTagging201903 | AntiKt4EMPFlowJets_BTagging201903 |

¹ No official calibration.
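
The table above can also be written down as a small lookup, which may help catch typos in a job configuration. This is only a sketch: the helper name `jetAuthorForCollection` is hypothetical and not part of any ATLAS package, and the entries are copied by hand from the table rather than read from the CDI file.

```cpp
#include <map>
#include <string>

// Hypothetical helper (not an ATLAS API): maps a jet collection name in the
// new EDM to the JetAuthor key listed in the table above.
// Returns an empty string if the collection is not in the table.
std::string jetAuthorForCollection(const std::string& edmName) {
  static const std::map<std::string, std::string> table = {
    {"AntiKtVR30Rmax4Rmin02TrackJets_BTagging201810",
     "AntiKtVR30Rmax4Rmin02TrackJets_BTagging201810"},
    {"AntiKtVR30Rmax4Rmin02TrackJets_BTagging201903",
     "AntiKtVR30Rmax4Rmin02TrackJets_BTagging201903"},
    // GhostTag jets borrow the non-GhostTag key (no official calibration).
    {"AntiKtVR30Rmax4Rmin02TrackGhostTagJets_BTagging201810",
     "AntiKtVR30Rmax4Rmin02TrackJets_BTagging201810"},
    {"AntiKt4EMPFlowJets_BTagging201810",
     "AntiKt4EMPFlowJets_BTagging201810"},
    {"AntiKt4EMPFlowJets_BTagging201903",
     "AntiKt4EMPFlowJets_BTagging201903"},
  };
  const auto it = table.find(edmName);
  return it == table.end() ? std::string() : it->second;
}
```

An empty return value signals that the collection is not covered by the table, in which case the JetAuthor should be checked against the recommendations rather than guessed.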

If you’re using a new derivation, a new CDI File, and an officially supported jet collection, the JetAuthor is always the name of the jet collection. Internally the CDI has to have a key for each (jet collection, tagger, operating point) combination, and by convention the key of the jet collection matches the jet collection name in the EDM.

This means that, for better or worse, the user is responsible for matching the JetAuthor key to the jet collection key in storegate.
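
Since the matching is left to the user, one option is a small guard in the analysis code that compares the configured JetAuthor with the name of the jet container being read. A minimal sketch, assuming a hypothetical function name (`jetAuthorMatchesContainer` is illustrative and not something the b-tagging tools provide):

```cpp
#include <string>

// Hypothetical guard (not part of any ATLAS tool): returns true only when the
// JetAuthor configured on the b-tagging tool is identical to the name of the
// jet container the tool will be applied to.
bool jetAuthorMatchesContainer(const std::string& jetAuthor,
                               const std::string& containerName) {
  return jetAuthor == containerName;
}
```

A check like this would flag the uncalibrated GhostTag collections as well, since their JetAuthor deliberately differs from the container name, so that case would need an explicit exception.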

Right now we’re in a (hopefully rare) transition period between a few naming conventions, so the general convention is a bit broken. To make things more complicated, you’re looking at experimental collections.

A few notes about the GhostTag collections:

  • The jets themselves are exactly the same as the similarly named collections without GhostTag
  • There is no calibration for these jets, so one should take all the working points and scale factors with a grain of salt. That said, they seem to perform almost identically.
  • The 201903 version of these jets isn’t widely supported in the derivation framework. There are ways to schedule them (see an example in FTAG5), but the way we store things in the EDM there is different from the way it’s done in the rest of the derivations. If you’re interested in using something like this you should ask me first.

Thanks very much @dguest. This was really informative.

For posterity, stepping through the information you gave me, these are the configurations I ended up with.

I am using the latest (March 2020) recommended CDI file: xAODBTaggingEfficiency/13TeV/2020-21-13TeV-MC16-CDI-2020-03-11_v2.root. As I’m interested in the AntiKtVR30Rmax4Rmin02TrackGhostTagJets (old EDM) collection (which you note is the same as the AntiKtVR30Rmax4Rmin02TrackJets collection) the JetAuthor I need is AntiKtVR30Rmax4Rmin02TrackJets_BTagging201810. Given this 201810 timestamp, and the CDI file’s supported and recommended taggers for it, I want to use TaggerName DL1.

So, to summarize the above like I did in my original question, the configuration I want (for my very specific case) is

| Property | Value |
|---|---|
| FlvTagCutDefinitionsFileName | xAODBTaggingEfficiency/13TeV/2020-21-13TeV-MC16-CDI-2020-03-11_v2.root |
| JetAuthor | AntiKtVR30Rmax4Rmin02TrackJets_BTagging201810 |
| TaggerName | DL1 |
| OperatingPoint | FixedCutBEff_85 (example) |
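
For reference, the same configuration can be collected in one place and then forwarded to the `setProperty` calls shown in the original question. This is a sketch only: `bTagToolProperties` is a hypothetical helper name, and the values are just those from the table above.

```cpp
#include <map>
#include <string>

// Hypothetical helper: returns property name -> value for the configuration
// summarized above. The OperatingPoint is only an example choice.
std::map<std::string, std::string> bTagToolProperties() {
  return {
    {"FlvTagCutDefinitionsFileName",
     "xAODBTaggingEfficiency/13TeV/2020-21-13TeV-MC16-CDI-2020-03-11_v2.root"},
    {"JetAuthor", "AntiKtVR30Rmax4Rmin02TrackJets_BTagging201810"},
    {"TaggerName", "DL1"},
    {"OperatingPoint", "FixedCutBEff_85"},
  };
}
```

Each entry could then be passed along in a loop, for example with `ANA_CHECK(m_BJetSelectTool_handle.setProperty(name, value));`, mirroring the calls in the original question.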

As a piece of followup for posterity, the above is true because:

  • GhostVR30Rmax4Rmin02TrackJet are links to ghost associated track jets in large-R jets where the tracks used for matching came from AntiKtVR30Rmax4Rmin02TrackJets containers. The b-tagging scores for them come from using cone associated tracks in b-tagging.
  • GhostVR30Rmax4Rmin02TrackJetGhostTag are links to the same ghost associated track jets as GhostVR30Rmax4Rmin02TrackJet but use ghost associated tracks in b-tagging (ghost tagging).

So both GhostVR30Rmax4Rmin02TrackJet and GhostVR30Rmax4Rmin02TrackJetGhostTag are using the same AntiKtVR30Rmax4Rmin02TrackJets containers to form the jets. The only way they differ is in what tracks are used for b-tagging.

There are two track jet collections and two sets of links to them (ignoring different time stamp collections for a moment):

  • GhostVR30Rmax4Rmin02TrackJet links to AntiKtVR30Rmax4Rmin02TrackJets
  • GhostVR30Rmax4Rmin02TrackJetGhostTag links to AntiKtVR30Rmax4Rmin02TrackGhostTagJets

The reason the two track jet collections are the same is that:

  • the jet constituents (tracks) are the same, and
  • the jet clustering algorithm is the same.

As you said, the only thing that differs is the track selection for b-tagging.

This isn’t terribly efficient, since we build exactly the same jet collection twice (actually three times if we count the 2019 training with cone association), but it was the easiest way to support multiple versions of b-tagging on the same jet collection. Also some rough estimates indicate that b-tagging takes considerably more CPU than jet building.
