Skip to content

cmcantalupo/SorterDist

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

++++++++++++++++
+  SorterDist  +
++++++++++++++++

Distributed parallel sorter with Python interface and C++ back end.
Will implement OpenMP, MPI, cURL and AMQP network protocols.

--------
Overview
--------

  The functionality for non-mpi sorts is likley better done with 
  a Map/Reduce framework like Hadoop.  

  Will provide a framework for distributed data computing which is
  centered around the power of distributed sorting.  The python
  interface will give access to the fast native C++ functionality and
  also provide additional functionality for loosely coupled
  distributed systems.

  The C++ will serve as a fast engine for the basic functionality of
  the library.  SorterThreaded and SorterDistMPI will be implemented in
  C++ with BOOST bindings for Python.  There will be a native Sorter
  and SorterDist base class in both Python and C++.

  - Current state
  ----------------

    Sketched out the basic python interface.  

    Some work towards implementing the threaded shared memory parallel
    sort in C++: SorterThreaded.  The testing suite needs to be made
    more robust.  Current test suite passes with two threads.   

    Current design is for double data.  Need to see how to implement
    it for python objects through BOOST.

------
Python
------

  - Modules
  ----------

  + Sorter

    A low memory high performance base class holds a sequence of data
    and information about how to compare the data elements.  

    SorterThreaded

    Adds shared memory multi-threading to Sorter.  

  + SorterDist

    Class for distributed parallel sorting.  Will be able to keep
    track of where on other nodes data is stored for future sorts of
    data with the same distribution over the network.  The distributed
    sort will essentially provide a way to sequence data by a user
    defined metric that is distributed over a network.  The real power
    will come from an interface that will allow nodes to request
    specific sub-sequences of the sorted data to be gathered locally.
    This class will be created by a factory.  

  + SorterDistBase
    
    Base class from which distributed sorter's derive.  

  + SorterDistMPI

    MPI parallel sorter for distributed sorting functionality in a
    tightly coupled parallel environment. Implemented in C++.  Derived
    from SorterDist and SorterThreaded classes.

  + SorterDistAMQP

    AMQP parallel sorter for distributed sorting in a loosely coupled
    parallel environment.

  + SorterDistCurl
    
    cURL based parallel sorter for distributed sorting in a loosely 
    coupled parallel environment.  

  + SorterDistNice
    
    A low priority distributed parallel sorter that uses network
    bandwidth as it becomes available and writes the results to fast
    local disk. Can be created with a SorterDistAMQP or a
    SorterDistCurl object.  

---
C++
---

  - Classes
  ---------

  + Sorter

    Pure virtual base class which holds the data and the comparison
    function.

  + SorterThreaded

    Class for threaded parallel sorting.  Parallelism through OpenMP.

  + SorterDist

    Base class for distributed parallel sorting.  

  + SorterDistMPI

    MPI implementation of distributed parallel sorter. 


Chris Cantalupo
cmcantalupo@gmail.com
July 7, 2011

About

Distributed parallel sorter

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published