Interface between XGBoost and TMVA or C++

Dear experts,

I am currently trying to transfer an XGBoost model that was trained in Python into TMVA. I searched for documentation and found these slides https://indico.cern.ch/event/773049/contributions/3476171/attachments/1936050/3208338/CHEP_2019__Machine_Learning_with_ROOT_TMVA.pdf showing how to convert an XGBoost model into a model readable by TMVA. I tried those lines with ROOT 6.20.02-x86_64-centos7-gcc8-opt and got the following error:

Traceback (most recent call last):
  File "convert_xgboost_tmva.py", line 8, in <module>
    ROOT.TMVA.SaveXGBoost(xgb, "XGBoostBDT", "model.root")
AttributeError: type object 'TMVA' has no attribute 'SaveXGBoost'

Is there any workaround already implemented in TMVA to read a model from XGBoost, or any interface to run XGBoost in C++? Thank you very much in advance for any feedback you have.

Kind regards,

Alvaro Lopez

Hi!

We don’t recommend using TMVA in production. You should use MVAUtils instead; it’s much lighter weight (in memory and maintainability) than TMVA. The README there should help get you started.

XGBoost support was added by @yuchou; maybe he can help more.

Thanks a lot @dguest for your quick feedback! Ok, I will take a look. The README seems quite self-explanatory, but I will ping @yuchou if I get stuck on something.

Best,

Alvaro

Hi,

Just following up on Alvaro’s question: we are trying to implement a multi-class BDT trained with the multi:softprob objective function (into SimpleAnalysis). If I understood correctly, the MVAUtils scripts are only able to convert XGBoost models for binary classification at the moment? I wonder, @yuchou, how hard it would be to add this? If it’s not too hard, I would be happy to give it a go.

Or do you think something like XGBoost -> XML -> MVAUtils would be a better option?

Cheers,
Hamish

I guess I’ll have to let @yuchou comment on how difficult adding an objective function would be. I’m not sure you’d buy much by converting to XML between XGBoost and MVAUtils, though: someone will probably have to add something to the C++ code.

Hi,

Yes, the XGBoost converter currently only supports regression and binary classification.
Some similar functionality for the multi:softmax objective was already implemented by @turra for LGBM.

I guess you need to add something similar here for multi:softprob, and add a similar function to get the prediction.

Since this is a binary tree model, adding these two parts should be enough.
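The interleaving involved here can be sketched in plain Python (a toy illustration, not the actual MVAUtils code): for a multi:softprob model, XGBoost dumps n_rounds * n_classes trees, and tree i contributes to class i % n_classes.

```python
# Toy sketch of the tree-to-class mapping a multi:softprob converter
# would need to handle. All names here are illustrative, not the
# actual MVAUtils API.

def group_trees_by_class(tree_outputs, n_classes):
    """Sum the raw scores of each class's trees (interleaved layout)."""
    raw = [0.0] * n_classes
    for i, score in enumerate(tree_outputs):
        raw[i % n_classes] += score
    return raw

# 2 boosting rounds x 3 classes -> 6 trees, interleaved per round
tree_outputs = [0.5, 0.25, -0.25,   # round 0: class 0, 1, 2
                0.25, -0.5, 0.5]    # round 1: class 0, 1, 2
print(group_trees_by_class(tree_outputs, 3))  # [0.75, -0.25, 0.25]
```

The per-class raw sums would then be turned into probabilities by a softmax, as in the common LGBM/TMVA implementation mentioned above.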

@turra and @christos could probably provide some input on the general structure of the package.

Cheers,
Yuan-Tang

I think @turra should comment on the python part especially.

Anyhow, currently


and then there are overloads.

Actually, currently, if you ask for multi-classification from XGBoost, I think you use the common implementation, which has a comment that is valid for LGBM and TMVA.

Anyhow, more or less the idea is to keep everything that is a Node as simple as possible.
Keep the overall BDT as a single entry point.
We can create as many Forest implementations as we wish.
So in this case, naively, you want to have an actual overload for XGBoost doing the multi-classification,

i.e. not picking up:

        /**
         * Compute the prediction for multiclassification (a score for each class).
         * In addition to the input values need to pass the number of classes
         **/
        // Since TMVA and lgbm are identical the common implementation is here:
        // Return the softmax of the sub-forest raw-response
        virtual std::vector<float> GetMultiResponse(const std::vector<float>& values,
                                                    unsigned int numClasses) const override;
        virtual std::vector<float> GetMultiResponse(const std::vector<float*>& pointers,
                                                    unsigned int numClasses) const override;

which is basically what it will do now if you ask for it: even if a given forest does not officially support multi-classification, the common implementation provides one.
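The “softmax of the sub-forest raw-response” mentioned in the comment can be sketched as follows (a minimal Python illustration; the actual implementation lives in the MVAUtils C++ code):

```python
import math

def softmax(raw):
    """Turn per-class raw BDT responses into probabilities."""
    # Subtract the max for numerical stability before exponentiating.
    m = max(raw)
    exps = [math.exp(x - m) for x in raw]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.0])
print(probs)  # probabilities summing to 1, largest for class 0
```

GetMultiResponse would apply exactly this kind of transformation to the raw sums of each class’s sub-forest.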

Ok, thanks a lot for the help.

I’ll see what I can do and ping if I get stuck.

Thanks!
Hamish

Hi all,

It turns out that the model dump from XGBoost for multi-class models has exactly the same format as LGBM (trees for each class interleaved). Indeed, the implementation is common with LGBM/TMVA. So I just added a multi-class test in the Python converter module and things seemed to work out of the box using GetMultiResponse; I can’t really see the need for overloading that function if the implementation is identical. The only thing required from the user is the number of classes, as I couldn’t find a way of accessing this from the model.
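A quick illustration of why the number of classes has to come from the user (a hypothetical helper, not part of the converter): the dump is just a flat list of trees, so its length alone does not pin down the class count.

```python
# The dump of a multi-class model contains n_rounds * n_classes trees.
# Given only the tree count, several (n_classes, n_rounds) splits are
# consistent with it, so the converter cannot infer n_classes itself.

def possible_class_counts(n_trees):
    """Every divisor >= 2 of n_trees is a plausible class count."""
    return [k for k in range(2, n_trees + 1) if n_trees % k == 0]

print(possible_class_counts(6))  # [2, 3, 6] -- all consistent with 6 trees
```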

I can push the small update to the converter script, if people are happy with this?

Cheers,
Hamish

Yes, please make a merge request! It might help if you reference this thread.

Of course anyone can make a WIP merge request without asking, it’s just a request after all.

yes

If it can use the common implementation, things are obviously easier --> no C++ code needed, as this is what it will do today.

Just for posterity: There’s a merge request to fix this in the works now.