Gamma + jets analysis

Presentation

This is the documentation for the gamma + jets framework, used to derive the L2L3 residual jet energy corrections for CMS from gamma + jets events.

Get a working area

The code is hosted on git, with read access available to anyone. You can find documentation on git on its website: http://git-scm.com/documentation

To set up a working area, here are the steps to follow:

setenv SCRAM_ARCH slc5_amd64_gcc462     # csh
export SCRAM_ARCH=slc5_amd64_gcc462     # bash

cmsrel CMSSW_5_3_9_patch2
cd CMSSW_5_3_9_patch2/src/
cmsenv

addpkg DataFormats/PatCandidates   V06-05-06-10
addpkg PhysicsTools/PatAlgos       V08-09-56
addpkg PhysicsTools/PatUtils       V03-09-28
addpkg DataFormats/CaloRecHit      V02-05-11
addpkg DataFormats/StdDictionaries V00-02-14
addpkg FWCore/GuiBrowsers          V00-00-70
addpkg RecoMET/METProducers        V03-03-12-02

addpkg RecoParticleFlow/PFProducer V15-02-06

addpkg RecoTauTag/RecoTau V01-04-23
addpkg RecoTauTag/Configuration V01-04-10
addpkg CondFormats/EgammaObjects V00-04-00

cvs update -r 1.19.8.1 PhysicsTools/PatAlgos/python/tools/helpers.py

# LumiCalc
cvs co  -r V04-02-03 RecoLuminosity/LumiDB
cvs co -rV04-01-01 RecoLuminosity/LumiDB/python/csvLumibyLSParser.py  # Bug when creating PU profile when trigger is prescaled

# Photon ID
mkdir EGamma; cd EGamma
cvs co -r V00-00-30 -d EGammaAnalysisTools UserCode/EGamma/EGammaAnalysisTools
cd ..

# Quark / Gluon tagging
mkdir QuarkGluonTagger; cd QuarkGluonTagger
cvs co -r v1-2-3 -d EightTeV UserCode/tomc/QuarkGluonTagger/EightTeV
cd ..

# Filters

cvs co -r V00-00-13 RecoMET/METFilters
cvs co -r V00-00-08 RecoMET/METAnalyzers
cvs co -r 1.2 RecoMET/METFilters/python/metFilters_cff.py

cd JetMETCorrections
git clone https://github.com/blinkseb/GammaJetResiduals.git GammaJetFilter

cd ..
scram b -j 9

If everything went fine, you should now have a 'JetMETCorrections/GammaJetFilter' directory. This is the only place you’ll need to work.

Step 1 - Running PAT

All the scripts related to PAT are inside the 'crab' folder.

There are three CMSSW python configuration files: produce_PAT_COMMON.py, produce_PAT_MC.py and produce_PAT_DATA.py. The most important one is produce_PAT_COMMON.py, as it contains all the PAT configuration. The MC and DATA ones only call the COMMON file with some options.

Let’s look a bit further at the main function of the COMMON file (a sketch of how the MC configuration might call it follows the parameter list below):

def createProcess(runOnMC, runCHS, correctMETWithT1, processCaloJets, globalTag):

There are five parameters:

  • runOnMC: if True, indicates that this is an MC sample

  • runCHS: if True, runs a new sequence of PF2PAT with CHS turned on. Please note that whether it’s True or not, a PF2PAT sequence with CHS turned off is always run. So, if runCHS is True, you’ll have two PF2PAT sequences running in parallel (one with a PFlowAK5 postfix, one with a PFlowAK5chs postfix)

  • correctMETWithT1: if True, MET is corrected with Type-I algorithm.

  • processCaloJets: if True, process calo jets alongside the PF jets. This is usually off, but may be needed if you want to study calo jets. Please note that calo jets are not added inside the PF2PAT sequences, but inside the default PAT sequence.

  • globalTag: The global tag to use. It is very important to provide the right global tag, especially if you want the right jet corrections. You can find a list of the global tags on this page: https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideFrontierConditions
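For orientation, a call from the MC configuration might look roughly like the minimal sketch below; the import style and the global tag value are placeholders, so check produce_PAT_MC.py and the Frontier conditions twiki for the actual values.

# Minimal sketch, assuming produce_PAT_MC.py drives the COMMON file this way;
# the global tag is only a placeholder, pick the right one from the twiki above.
from produce_PAT_COMMON import createProcess

process = createProcess(
    runOnMC = True,             # MC sample
    runCHS = True,              # also run the PF2PAT sequence with CHS
    correctMETWithT1 = True,    # apply Type-I MET corrections
    processCaloJets = False,    # usually off
    globalTag = "START53_V27"   # placeholder global tag
)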

You’ll usually need to edit the COMMON file to add new things to the PAT tuples, like a new jet collection, etc. To change the global tag for MC, you only need to edit the MC file. For the global tags related to data, the procedure is a little different, as explained below.

Let’s now assume you have a working PAT configuration (which you have, of course, tested with cmsRun produce_PAT_MC.py). You’ll now need to launch the PAT production on real data and MC datasets. For that, you have two handy scripts: createAndRunMCCrab.py and createAndRunDataCrab.py. The first one launches PAT production on MC, the second on data.

createAndRunMCCrab.py

This one is really simple. At the beginning of the script, you’ll find the definition of a tuple named datasets. You only need to edit this tuple to add, remove or edit datasets. It’s already prefilled with the datasets used for the gamma + jets analysis: the binned G samples, as well as some QCD samples. You’ll also need to edit the email address inside the script (line 32 at the time of writing) to match your own.

Once you have edited the script, you need to tweak the crab configuration template, the file named 'crab_MC.cfg.template.ipnl'. The only things you need to edit are the white list section (at the end of the template) and the output storage_element (in the USER section). Right now, the template is configured for use at Lyon.

You can now launch crab. All you need to do is execute the script 'createAndRunMCCrab.py'. Without arguments (i.e., ./createAndRunMCCrab.py), all it does is create the crab configuration files for all the datasets. With the --run option, it creates the configuration files and also runs crab for each dataset.

createAndRunDataCrab.py

This script does the same thing as 'createAndRunMCCrab.py', but it’s more complicated because, for data, each sample has an associated JSON file and global tag. At the beginning of the script, you’ll find an array named datasets containing, on each row, the definition of a dataset. Each dataset has the following form:

["/Photon/Run2012A-22Jan2013-v1/AOD", "Photon_Run2012A-22Jan2013", "", "FT_53_V21_AN3", [190456, 193621]]

It’s an array which must contain exactly 5 entries. The first entry is the dataset path, as found in DAS. The second entry is the dataset name (i.e., the name you want to give it). The third entry is the JSON file associated with this dataset; if it’s empty, the JSON file for this dataset is read from the global variable global_json. The fourth entry is the global tag for this dataset, and the last entry is an array of two values describing the run range associated with this dataset.
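To make the structure explicit, here is a small sketch of how one such row maps to named fields; the variable names are only illustrative, not the ones used in the script.

# Illustrative unpacking of one datasets row (variable names are not the script's own)
global_json = "my_golden_json.txt"   # placeholder; the real script defines this at the top

path, name, json_file, global_tag, run_range = [
    "/Photon/Run2012A-22Jan2013-v1/AOD", "Photon_Run2012A-22Jan2013", "",
    "FT_53_V21_AN3", [190456, 193621]]

if not json_file:
    json_file = global_json          # empty third entry: fall back to the global JSON

first_run, last_run = run_range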

You also need to tweak the associated crab configuration template, this time named 'crab_data.cfg.template.ipnl'. You need to edit the same things as for the MC file.

This script is launched in the same way as the MC script (i.e., the --run option to launch crab, and no arguments to only generate the configuration files).

For now, we’ll assume you have launched crab for data and MC. After some (very) long babysitting, all your tasks are successful. You now just need to publish all your tasks using crab -publish, and this part is done!

Important
Publishing
It’s very important to write down somewhere the published dataset paths crab gives you. You’ll need them for step 2. One possibility is to create a twiki page at CERN and write them there, as I do. See for example my page: https://twiki.cern.ch/twiki/bin/view/Main/SebastienBrochet

Step 2 - Convert PAT tuples to plain root trees

For this step, you’ll need some published datasets from step 1. If you don’t have any, grab some from my page; they should work: https://twiki.cern.ch/twiki/bin/view/Main/SebastienBrochet

This step will convert PAT tuples to plain root trees, performing a simple selection:

  • Select events with only one good photon: the photon ID is done at this step

  • Choose the first and second jet of the event, with a loose delta phi cut

  • Additionally, if requested, the JEC can be redone at this step, as well as the Type-I MET corrections. More details about that below.

Otherwise, all that is done is to convert PAT objects to root trees. The CMSSW python configuration files can be found in 'analysis/2ndLevel/' and are named 'runFilter_MC.py' and 'runFilter.py'. They are much simpler than those for PAT, because all they do is run the GammaJetFilter responsible for the PAT → trees conversion.

runFilter[_MC].py

These configuration files are really simple: they just configure the GammaJetFilter. A list of options and their meaning is given below, followed by a sketch of how they might appear in the configuration.

  • isMC: If True, indicates we are running on MC.

  • photons: The input tag of the photons collection.

  • json (only for data): Indicates where the script can find the JSON file of valid run and lumi. This file is produced by crab at step 1. You should not need to tweak this option.

  • csv (only for data): Indicates where the script can find the CSV file produced by lumiCalc2, containing the luminosity corresponding for each lumisection. You should not need to tweak this option.

  • filterData (only for data): If True, the json parameter file will be used to filter runs and lumisections according to the content of the file.

  • runOn[Non]CHS: If True, run the filter on the (non-)CHS collections. You need to have produced the corresponding collections at step 1.

  • runPFAK5: If True, run the filter on PF AK5 jets.

  • runPFAK7: If True, run the filter on PF AK7 jets. Those jets need to have been produced at step 1.

  • runCaloAK5: If True, run the filter on calo AK5 jets. Those jets need to have been produced at step 1.

  • runCaloAK7: If True, run the filter on calo AK7 jets. Those jets need to have been produced at step 1.

  • doJetCorrection: If True, redo the jet correction from scratch. The jet correction factors will be read from global tag (by default), or from an external database if configured correctly.

  • correctJecFromRaw: If True, the new JEC factors are computed starting from the raw jets. Turn off only if you know what you are doing.

  • correctorLabel: The corrector label to use for computing the new JEC. The default should be fine for PF AK5 CHS jets.

  • redoTypeIMETCorrection: If True, TypeI MET is recomputed. Automatically True if doJetCorrection is True.
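Here is a hedged sketch of the switches typically adjusted at the top of runFilter_MC.py / runFilter.py; the exact variable layout in the real files may differ, and the data-only options (json, csv, filterData) are omitted.

# Sketch only, assuming plain boolean switches; check the real config files.
isMC = True                     # running on MC
runOnCHS = True                 # filter the CHS collections produced at step 1
runOnNonCHS = False
runPFAK5 = True
runPFAK7 = False                # only if AK7 jets were produced at step 1
runCaloAK5 = False              # only if calo jets were produced at step 1
runCaloAK7 = False
doJetCorrection = False         # set to True to redo the JEC from scratch
correctJecFromRaw = True        # compute the new JEC starting from the raw jets
# correctorLabel: keep the default for PF AK5 CHS jets
redoTypeIMETCorrection = False  # forced to True when doJetCorrection is True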

You can find the code for the GammaJetFilter in 'src/GammaJetFilter.cc'. If an event does not pass the preselection, it’s discarded. The resulting root trees contain only potential gamma + jets events, with exactly one good photon.

Running crab

Like for step 1, you’ll need to run crab for step 2 too. In the 'analysis/2ndLevel/' folder, you’ll find the same createAndRun scripts as for step 1. You’ll need to edit both files to add the dataset paths you obtained from step 1. Don’t forget to also edit the template files, 'crab_data.cfg.template.ipnl' and 'crab_MC.cfg.template.ipnl', to change your storage element.

createAndRunMCCrab.py

This file is very similar to the one for step 1. It has just been extended to include information about the cross-section, the number of processed events, and the generated pt hat. The cross-section can be obtained from PREP, for example.
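As an illustration only, an extended MC entry could carry that extra information along the lines of the sketch below; the field order, the publication path and all the numbers are placeholders, so check the prefilled entries in the script for the real layout.

# Hypothetical layout, placeholder values only.
datasets = [
    # [published path from step 1, name, cross-section (pb), processed events, pt hat bin]
    ["/G_Pt-30to50_TuneZ2star_8TeV_pythia6/sbrochet-G_Pt-30to50_24Apr13-v1/USER",
     "G_Pt-30to50", 20000.0, 2000000, "30to50"],
]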

createAndRunDataCrab.py

This file is very similar to the one for step 1. The format is the same; the only things removed are the JSON file and the run range, which are no longer needed for this step.

Important

In order to automatically compute luminosity, you need to do the following things.

  • First, you need to create a folder for each dataset in your python configuration. These folders must have the same names as the dataset names defined in your configuration. For example, let’s assume you have the following configuration:

datasets = [
    ["/Photon/sbrochet-Photon_Run2012A-22Jan2013_24Apr13-v1-37e3bf2409397e623ffd52beab84a202/USER", "Photon_Run2012A-22Jan2013", "FT_53_V21_AN3"],
    ["/SinglePhoton/sbrochet-SinglePhoton_Run2012B-22Jan2013_24Apr13-v1-37e3bf2409397e623ffd52beab84a202/USER", "SinglePhoton_Run2012B-22Jan2013", "FT_53_V21_AN3"],
    ["/SinglePhoton/sbrochet-SinglePhoton_Run2012C-22Jan2013_24Apr13-v1-37e3bf2409397e623ffd52beab84a202/USER", "SinglePhoton_Run2012C-22Jan2013", "FT_53_V21_AN3"],
]

You’ll need to create three folders, named 'Photon_Run2012A-22Jan2013', 'SinglePhoton_Run2012B-22Jan2013', and 'SinglePhoton_Run2012C-22Jan2013'.

  • Second, inside each of these new folders, there must be two files: 'lumiSummary.json' and 'lumibyls.csv'. The first file is produced by crab at the end of the first step, using the command crab -report; you simply need to copy the file into the right folder. The second file is produced by lumiCalc2 using the following command:

lumiCalc2.py -i lumiSummary.json -o lumibyls.csv lumibyls

This step is mandatory, don’t forget it (a small sanity check is sketched below).
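As a sanity check before moving on, a small helper along these lines (not part of the repository) can verify that every dataset folder contains the two required files.

# Helper sketch: check each dataset folder for the two luminosity files.
import os

datasets = ["Photon_Run2012A-22Jan2013",
            "SinglePhoton_Run2012B-22Jan2013",
            "SinglePhoton_Run2012C-22Jan2013"]

for name in datasets:
    for f in ["lumiSummary.json", "lumibyls.csv"]:
        path = os.path.join(name, f)
        if not os.path.exists(path):
            print "Missing %s" % path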

Once crab is done, the only remaining step is to merge the output in order to have one file per dataset. For that, you have 'mergeMC.py' and 'mergeData.py'. These two scripts rely on a script called 'crabOutputList.py', which reads a crab task and lists its output files. Unfortunately, this script relies heavily on knowledge of the Lyon infrastructure and utilities like rfdir. You’ll probably need to change rfdir to the tool you use, for example eos ls on lxplus. You’ll also need to edit line 48 to adapt it to your own storage element.

Now, let’s assume you have been able to merge the output files. You should have one root file for each MC dataset and one for each data dataset, with the prefix PhotonJet_2ndLevel_. Copy those files somewhere else; a good place is the folder 'analysis/tuples/'. I usually create a folder named after the current date to put the root tuples inside.

You can now go to step 3.

Step 3 - Finalization

For this step, I’ll assume you have the following folder structure:

+ analysis
|- tuples
 |- <date>
  |- toFinalize (containing root files produced at step 2, with prefix PhotonJet_2ndLevel_)
  |- finalized (containing root files we will produce at this step)

The main utility here is the executable named 'gammaJetFinalizer'. It’ll produce root files containing a set of histograms for important variables like the balancing and MPF responses. You can find its sources in the 'bin/' folder, in the file 'gammaJetFinalizer.cc'. Let’s have a look at the possible options:

gammaJetFinalizer  {-i <string> ... |--input-list <string>}
                      [--chs] [--alpha <float>]
                      [--mc-comp] [--mc] --algo <ak5|ak7> --type <pf|calo>
                      -d <string>

Here’s a brief description of each option:

  • -i (multiple times): the input root files

  • --input-list: A text file containing a list of input root files

  • --mc: Tell the finalizer you run an MC sample

  • --alpha: The alpha cut to apply. 0.2 by default

  • --chs: Tell the finalizer you ran on a CHS sample

  • --mc-comp: Apply a cut on pt_gamma > 200 to get rid of trigger prescale. Useful for doing data/MC comparison

  • --algo, ak5 or ak7: Tell the finalizer if we run on AK5 or AK7 jets

  • --type, pf or calo: Tell the finalizer if we run on PF or Calo jets

  • -d: The output dataset name. This will create an output file named 'PhotonJet_<name>.root'

An example command line could be:

gammaJetFinalizer -i PhotonJet_2ndLevel_Photon_Run2012.root -d Photon_Run2012 --type pf --algo ak5 --chs --alpha 0.20

This will process the input file 'PhotonJet_2ndLevel_Photon_Run2012.root', looking for PF AK5 CHS jets, using alpha = 0.20, and producing an output file named 'PhotonJet_Photon_Run2012.root'.

Note

When you have multiple input files (the G MC for example), the easiest way is to create an input list and then use the --input-list option of the finalizer. For example, suppose you have files named 'PhotonJet_2ndLevel_G_Pt-30to50.root', 'PhotonJet_2ndLevel_G_Pt-50to80.root', 'PhotonJet_2ndLevel_G_Pt-80to120.root', … You can create an input file list with

ls PhotonJet_2ndLevel_G_* > mc_G.list

and then pass the 'mc_G.list' file to the --input-list option.
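For instance, assuming the dataset name 'G' and PF AK5 CHS jets, the full command might then look like:

gammaJetFinalizer --input-list mc_G.list -d G --mc --type pf --algo ak5 --chs --alpha 0.20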

Note

You cannot use the --input-list option when running on data, for file structure reasons. If you have multiple data files, you’ll first need to merge them with hadd into a single file, and then use the -i option.

There are two things you need to be aware of before running the finalizer: the pileup reweighting and the trigger selection. Each of them is explained in detail below.

Per-HLT pileup reweighting

The MC is reweighted to data based on the number of vertices in the event, in order to take into account the differences in pileup between simulation and data. In this analysis, the pileup profile for the data is generated separately for each HLT path used during 2012, in order to take into account possible biases due to the prescale of these triggers.

All the utilities to do that are already available in the folder 'analysis/PUReweighting'. The relevant script is 'generatePUProfileForData.py'. As always, all you need to edit is at the beginning of the file.

The trigger list should be fine if you run on 2012 data; otherwise, you’ll need to build it yourself. For the JSON file list, just add all the certified files provided. You can provide a single one for the whole run range, but beware that it’ll take a very long time; it’s better to split it into several JSON files to speed things up.

To run the script, you’ll also need to get the latest pileup json file available. Running something like this should work:

wget --no-check-certificate https://cms-service-dqm.web.cern.ch/cms-service-dqm/CAF/certification/Collisions12/8TeV/PileUp/pileup_latest.txt

Execute the script using

./generatePUProfileForData.py pileup_latest.txt

Once it’s done, you should have a PU profile for each HLT of the analysis.

Trigger selection

To avoid any bias in the selection, we explicitly require that, for each bin in pt_gamma, only one trigger was active. For that, we use an XML description of the triggers of the analysis, which you can find in the 'bin/' folder, in a file named 'triggers.xml'.

The format should be straightforward: it is organized by run ranges as well as by triggers. This trigger selection should be fine for 2012, but you’ll need to come up with your own for other datasets.

The weight of each HLT path is used to reweight various distributions for the prescale. In order to compute it, you need the total luminosity of the run range:

lumiCalc2.py -i <myjsonfile.json> --begin lowrun --end highrun overview

And the recorded luminosity for the trigger. For that, use

lumiCalc2.py -i <myjsonfile.json> --begin lowrun --end highrun --hlt "my_hlt_path_*" recorded

Sum the recorded luminosities for all HLT paths (only if they don’t overlap in time), and divide by the total luminosity to get the weight, as in the sketch below.
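A minimal sketch of the arithmetic; the numbers are placeholders standing in for the lumiCalc2 outputs.

# Weight of an HLT path: recorded luminosity (summed over non-overlapping
# paths) divided by the total luminosity of the run range. Placeholder values.
total_lumi = 890.0                # from the 'overview' command
recorded_lumis = [150.0, 585.0]   # one value per non-overlapping HLT path
weight = sum(recorded_lumis) / total_lumi
print "HLT weight: %.3f" % weight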

There is a similar file for MC, named 'triggers_mc.xml'. In this file there are no run ranges, only a list of HLT paths. This list is used to determine which HLT path the event would have fired if it had been data, in order to perform the PU reweighting. You can also specify multiple HLT paths for one pt bin if there were several active triggers during the data-taking period. In this case, you’ll need to provide a weight for each trigger (of course, the weights must sum to 1). A trigger is then chosen randomly according to these probabilities.
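The weighted random choice can be pictured with a small sketch like the one below; the trigger names and weights are purely illustrative, and the actual logic lives in the finalizer.

# Pick one HLT path according to its weight; the weights must sum to 1.
import random

paths = [("HLT_Photon90_CaloIdVL_IsoL", 0.7),   # illustrative path / weight
         ("HLT_Photon90_CaloIdVL",      0.3)]

chosen = paths[-1][0]            # fallback guards against rounding at the edge
r = random.random()
cumulative = 0.0
for path, weight in paths:
    cumulative += weight
    if r < cumulative:
        chosen = path
        break

print "Chosen trigger:", chosen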

If you follow this documentation on 2012 data, you should now have at least two files (three if you ran on QCD): 'PhotonJet_Photon_Run2012_PFlowAK5chs.root', 'PhotonJet_G_PFlowAK5chs.root', and optionally 'PhotonJet_QCD_PFlowAK5chs.root'. You are now ready to produce some plots!

Step 4 - The plots

First of all, you need to build the drawing utilities. For that, go into 'analysis/draw' and run make. You should now have everything built.

In order to produce the full set of plots, you’ll have to run 3 different utilities. You need to be in the same folder as the files produced at step 3.

  • First, drawPhotonJet_2bkg, like this:

../../../draw/drawPhotonJet_2bkg Photon_Run2012 G QCD pf ak5 LUMI
  • Then, you need to perform the 2nd jet extrapolation using drawPhotonJetExtrap, like this:

../../../draw/drawPhotonJetExtrap --type pf --algo ak5 Photon_Run2012 G QCD
  • Finally, to produce the final plots, run one last utility, draw_ratios_vs_pt, like this:

../../../draw/draw_ratios_vs_pt Photon_Run2012 G QCD pf ak5

The names to pass to these utilities depend on what you used for the -d option in step 3. You can recover them from the names of the root files.

If everything went fine, you should now have a lot of plots in the folder 'PhotonJetPlots_Photon_Run2012_vs_G_plus_QCD_PFlowAK5_LUMI', and some more useful ones in the folder 'PhotonJetPlots_Photon_Run2012_vs_G_plus_QCD_PFlowAK5_LUMI/vs_pt'.

Have fun!
