This repo contains a set of OpenMP-parallelized codes to measure the following clustering statistics in a cosmological box (co-moving XYZ) or on a mock catalog (RA, DEC, CZ). It also contains the associated paper, to be published in the Astronomy & Computing Journal (at some point).
- **Fast** All theory pair-counting is at least an order of magnitude faster than all existing public codes, making it particularly well suited for MCMC.
- **Python Extensions** Python extensions allow you to do the compute-heavy bits in C while retaining all of the user-friendliness of python.
- **Modular** The code is written in a modular fashion and is easily extensible to compute arbitrary clustering statistics.
- **Future-proof** As I get access to newer instruction sets, the codes will be updated to use the latest and greatest CPU features.
- ``make >= 3.80``
- OpenMP capable compiler like ``icc``, ``gcc`` or ``clang >= 3.7``. If not available, please disable the ``USE_OMP`` option in ``theory.options`` and ``mocks.options``. You might need to ask your sys-admin for a system-wide install of the compiler; if you prefer to install your own, then ``conda install gcc`` (MAC/linux) or ``(sudo) port install gcc5`` (on MAC) should work. Note ``gcc`` on macports defaults to ``gcc48`` and the portfile is currently broken on ``El Capitan``.
- ``gsl``. Use either ``conda install -c https://conda.anaconda.org/asmeurer gsl`` (MAC/linux) or ``(sudo) port install gsl`` (MAC) to install ``gsl`` if necessary.
- ``python >= 2.6`` or ``python >= 3.4`` for compiling the C extensions.
- ``numpy >= 1.7`` for compiling the C extensions.

If python and/or numpy are not available, then the C extensions will not be compiled.
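A quick way to check whether the python/numpy pre-requisites above are satisfied (a convenience sketch, not part of the package):

```python
import sys
import numpy as np

def extensions_buildable():
    """Check the minimum versions listed above:
    python >= 2.6 (or >= 3.4) and numpy >= 1.7."""
    py = sys.version_info
    py_ok = py >= (3, 4) or ((2, 6) <= py < (3, 0))
    np_major, np_minor = (int(v) for v in np.__version__.split(".")[:2])
    return py_ok and (np_major, np_minor) >= (1, 7)

print("C extensions can be compiled:", extensions_buildable())
```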
The default compiler on MAC is set to ``clang``; if you want to specify a different compiler, you will have to call ``make CC=yourcompiler``.
```
$ git clone https://github.com/manodeep/Corrfunc/
$ make
$ make install
$ python setup.py install (--user)
$ make tests
```
Assuming you have ``gcc`` in your ``PATH``, ``make`` and ``make install`` should compile and install the C libraries + python extensions within the source directory. If you would like to install the python C extensions in your environment, then ``python setup.py install (--user)`` should be sufficient.
The python package is directly installable via ``pip install Corrfunc``. However, in that case you will lose the ability to recompile the code according to your needs. Not recommended unless you are desperate (i.e., email me if you are having install issues).
If compilation went smoothly, please run ``make tests`` to ensure the code is working correctly. Depending on the hardware and compilation options, the tests might take more than a few minutes. Note that the tests are exhaustive, not traditional unit tests.
While I have tried to ensure that the package compiles and runs out of the box, cross-platform compatibility turns out to be incredibly hard. If you run into any issues during compilation and you have all of the pre-requisites, please see the FAQ or email me. Also, feel free to create a new issue with the ``Installation`` label.
All codes that work on cosmological boxes with co-moving positions are located in the ``xi_theory`` directory. The various clustering measures are:

- ``xi_of_r`` -- Measures auto/cross-correlations between two boxes. The boxes do not need to be cubes.
- ``xi`` -- Measures the 3-d auto-correlation in a cubic cosmological box. Assumes PERIODIC boundary conditions.
- ``wp`` -- Measures the projected auto-correlation function, wp(rp), in a cubic cosmological box. Assumes PERIODIC boundary conditions.
- ``xi_rp_pi`` -- Measures the auto/cross-correlation function between two boxes. The boxes do not need to be cubes.
- ``vpf`` -- Measures the void probability function + counts-in-cells.
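For reference, the projected correlation function reported by ``wp`` is the standard line-of-sight integral of the redshift-space correlation function (the textbook definition, with pimax as the integration limit):

```latex
w_p(r_p) = 2 \int_{0}^{\pi_{\max}} \xi(r_p, \pi)\,\mathrm{d}\pi
```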
All codes that work on mock catalogs (RA, DEC, CZ) are located in the ``xi_mocks`` directory. The various clustering measures are:

- ``DDrppi`` -- The standard auto/cross pair counts between two data sets. The outputs DD, DR and RR can be combined using ``wprp`` to produce the Landy-Szalay estimator for wp(rp).
- ``DDtheta_mocks`` -- Computes angular pair counts between two data sets. The outputs from ``DDtheta_mocks`` need to be combined with ``wtheta`` to get the full ω(θ).
- ``vpf`` -- Computes the void probability function on mocks.
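The Landy-Szalay combination mentioned above has a simple closed form; here is a minimal python sketch (with illustrative, already-normalized pair counts; this is not the package's API):

```python
def landy_szalay(dd, dr, rr):
    """Landy-Szalay estimator xi = (DD - 2*DR + RR) / RR, where
    dd, dr, rr are pair counts normalized by the total number of
    possible pairs in each catalog combination."""
    return (dd - 2.0 * dr + rr) / rr

# One bin with made-up normalized counts:
print(landy_szalay(dd=0.012, dr=0.010, rr=0.009))
```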
- ``PERIODIC`` (ignored in case of wp/xi) -- switches periodic boundary conditions on/off. Enabled by default.
- ``OUTPUT_RPAVG`` -- switches on output of <rp> in each rp bin. Can be a massive performance hit (~2.2x in case of wp). Disabled by default. Needs the code option ``DOUBLE_PREC`` to be enabled as well. For the mocks, ``OUTPUT_RPAVG`` causes only a mild increase in runtime and is enabled by default.
- ``OUTPUT_THETAAVG`` -- switches on output of <θ> in each θ bin. Can be extremely slow (~5x) depending on compiler and CPU capabilities. Disabled by default.
- ``LINK_IN_DEC`` -- creates binning in declination for mocks. Please check that, for your desired binning in rp/θ, this binning does not produce incorrect results (due to numerical precision).
- ``LINK_IN_RA`` -- creates binning in RA once binning in DEC has been enabled. Same numerical issues as ``LINK_IN_DEC``.
- ``FAST_DIVIDE`` -- Divisions are slow but required for DD(rp, π). This Makefile option (in ``mocks.options``) replaces the divisions with a reciprocal followed by a Newton-Raphson step. The code will run ~20% faster at the expense of some numerical precision. Please check that the loss of precision is not important for your use-case. Also, note that the mocks tests for DD(rp, π) will fail if you enable ``FAST_DIVIDE``.
The documentation is currently lacking, but I am actively working on it.
Navigate to the correct directory. Make sure that the options set in either ``theory.options`` or ``mocks.options`` in the root directory are what you want; if not, edit those two files (and possibly ``common.mk``) and recompile. Then, you can use the command-line executables in each individual subdirectory corresponding to the clustering measure you are interested in. For example, if you want to compute the full 3-D correlation function ξ(r), then navigate to ``xi_theory/xi`` and run the executable ``xi``. If you run the executables without any arguments, they will tell you all the required arguments.
Look under ``xi_theory/examples/run_correlations.c`` and ``xi_mocks/examples/run_correlations_mocks.c`` for examples of calling the C API directly. If you run the executables ``run_correlations`` and ``run_correlations_mocks``, the output will also show how to call the command-line interface for the various clustering measures.
If all went well, the codes can be called directly from ``python``. Please see ``Corrfunc/call_correlation_functions.py`` and ``Corrfunc/call_correlation_functions_mocks.py`` for examples of how to use the python interface. Here is an example computing wp:
```python
from __future__ import print_function
import os.path as path
import numpy as np
import Corrfunc
from Corrfunc._countpairs import countpairs_wp as wp

# Setup the problem for wp
boxsize = 500.0
pimax = 40.0
nthreads = 4

# Create a fake data-set
Npts = 100000
x = np.float32(np.random.random(Npts))
y = np.float32(np.random.random(Npts))
z = np.float32(np.random.random(Npts))
x *= boxsize
y *= boxsize
z *= boxsize

# Use a file with histogram bins, containing Nbins pairs of (rmin rmax)
binfile = path.join(path.dirname(path.abspath(Corrfunc.__file__)),
                    "../xi_theory/tests/", "bins")

# Call wp
wp_results = wp(boxsize, pimax, nthreads, binfile, x, y, z)

# Print the results (note: do not name the loop variable `wp`,
# which would shadow the function imported above)
print("###########################################")
print("##    rmin       rmax           wp   npairs")
print("###########################################")
for result in wp_results:
    print("{0:10.4f} {1:10.4f} {2:12.6f} {3:8d}"
          .format(result[0], result[1], result[3], result[4]))
```
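The ``binfile`` used above is a plain text file with one ``rmin rmax`` pair per line. If you want to supply your own bins, a logarithmically spaced set can be generated like this (the file name and radial range are illustrative):

```python
import numpy as np

# 14 logarithmic bins between rmin = 0.1 and rmax = 20.0
rbins = np.logspace(np.log10(0.1), np.log10(20.0), 15)
with open("bins", "w") as f:
    for rlow, rhigh in zip(rbins[:-1], rbins[1:]):
        f.write("{0:12.6f} {1:12.6f}\n".format(rlow, rhigh))
```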
Please see this gist for some benchmarks with current codes.
- ``DOUBLE_PREC`` -- does the calculations in double precision. Disabled by default.
- ``USE_AVX`` -- uses the AVX instruction set found in Intel/AMD CPUs >= 2011 (Intel: Sandy Bridge or later; AMD: Bulldozer or later). Enabled by default; the code will run much slower if the CPU does not support AVX instructions. On Linux, check for "avx" in ``/proc/cpuinfo`` under flags. If you do not have AVX but have an SSE4 system instead, email me and I will send you a copy of the code with SSE4 intrinsics. Or, take the relevant SSE code from the public repo at pairwise.
- ``USE_OMP`` -- uses OpenMP parallelization. Scaling is great for DD (perfect scaling up to 12 threads in my tests) and okay (runtime becomes constant at ~6-8 threads in my tests) for ``DDrppi`` and ``wp``.
Optimization for your architecture

- The values of ``bin_refine_factor`` and/or ``zbin_refine_factor`` in the ``countpairs_*.c`` files control the cache misses and, consequently, the runtime. In my trial-and-error tests, values larger than 3 have always been slower, but some different combination of 1/2 for ``(z)bin_refine_factor`` might be faster on your platform.
- If you have AVX2/AVX-512/KNC, you will need to rewrite the entire AVX section.
Corrfunc is written/maintained by Manodeep Sinha. Please contact the author in case of any issues.
If you use the code, please cite it using the Zenodo DOI. The BibTeX entry for the code is:
```
@misc{manodeep_sinha_2016_49720,
  author = {Manodeep Sinha},
  title  = {Corrfunc: Corrfunc-1.0.0},
  month  = apr,
  year   = 2016,
  doi    = {10.5281/zenodo.49720},
  url    = {http://dx.doi.org/10.5281/zenodo.49720}
}
```
If you have questions or comments about the package, please ask on the mailing list: https://groups.google.com/forum/#!forum/corrfunc
Corrfunc is released under the MIT license. Basically, do what you want with the code, including using it in commercial applications.
- website (https://manodeep.github.io/Corrfunc/)
- version control (https://github.com/manodeep/Corrfunc)