# cutorch

Cutorch provides a CUDA backend for Torch7.

Cutorch provides the following:

- a new tensor type: `torch.CudaTensor`, which acts like `torch.FloatTensor` but runs all of its operations on the GPU. Most tensor operations are supported by cutorch; a few are still missing and are being implemented. The list of missing operations can be found here: torch#70
- `cutorch.*` - functions to set/get the GPU, query device properties and memory usage, set/get low-level streams, set/get the random number generator's seed, synchronize, etc. They are described in more detail below.
### `torch.CudaTensor`

This new tensor type behaves exactly like a `torch.FloatTensor`, but has a couple of extra functions of note:

- `t:getDevice()` - Given a CudaTensor `t`, you can call `:getDevice` on it to find out the GPU ID on which the tensor memory is allocated.

### `cutorch.*` API

- `cutorch.synchronize()` : All of the CUDA API is asynchronous (barring a few functions), which means that you can queue up operations. To wait for the operations to finish, issue `cutorch.synchronize()` in your code; it makes the code wait for all GPU operations on the current GPU to finish.
- `cutorch.setDevice(i)` : If you have multiple GPUs, you can switch the default GPU (the one on which CUDA tensors are allocated and operations run). GPU IDs are 1-indexed, so with 4 GPUs you can call `setDevice(1)`, `setDevice(2)`, `setDevice(3)`, or `setDevice(4)`. Alternatively, you can use [auto-device mode](#cutorch.api.autodevice).
- `idx = cutorch.getDevice()` : Returns the currently set GPU device index.
- `count = cutorch.getDeviceCount()` : Gets the number of available GPUs.
- `totalMemory, freeMemory = cutorch.getMemoryUsage(devID)` : Gets the total and free memory in bytes for the given device ID.
- `cutorch.seed([devID])` - Sets and returns a random seed for the current or specified device.
- `cutorch.seedAll()` - Sets and returns a random seed for all available GPU devices.
- `cutorch.initialSeed([devID])` - Returns the seed for the current or specified device.
- `cutorch.manualSeed(seed [, device])` - Sets a manually specified RNG seed for the current or specified device.
- `cutorch.manualSeedAll(seed)` - Sets a manually specified RNG seed for all available GPUs.
- `cutorch.getRNGState([device])` - Returns the current RNG state in the form of a byte tensor, for the current or specified device.
- `cutorch.setRNGState(state [, device])` - Sets the RNG state from a previously saved state, on the current or specified device.
- `cutorch.getState()` - Returns the global state of the cutorch package. This state is not intended for users; it stores the raw RNG states, cuBLAS handles, and other thread- and device-specific data.
- `cutorch.withDevice(devID, f)` - A convenience for multi-GPU code that takes a device ID as well as a function `f`. It switches cutorch to the given device, executes the function `f`, and then switches cutorch back to the original device. Alternatively, you can use [auto-device mode](#cutorch.api.autodevice).

A short usage sketch of these functions follows this list.
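The snippet below is a minimal sketch (not part of the original API docs) showing how a few of these calls might be combined; it uses only the functions documented above and assumes cutorch is installed and at least one GPU is present.

```lua
require 'cutorch'

local count = cutorch.getDeviceCount()          -- number of available GPUs
print('GPUs available:', count)

cutorch.setDevice(1)                            -- GPU IDs are 1-indexed
print('current device:', cutorch.getDevice())   -- 1

local total, free = cutorch.getMemoryUsage(1)   -- bytes on device 1
print(string.format('device 1: %d bytes free of %d', free, total))

cutorch.manualSeed(1234)                        -- reproducible RNG on the current device
print('seed:', cutorch.initialSeed())           -- 1234

-- run a function on another device, then switch back automatically
if count >= 2 then
   cutorch.withDevice(2, function()
      local t = torch.CudaTensor(10):uniform()  -- allocated on device 2
      print('allocated on device', t:getDevice())
   end)
end

cutorch.synchronize()                           -- wait for the current GPU to finish
```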

#### Auto-device mode

Computations on CUDA tensors must run on the CUDA device where the tensor resides; running a computation on a tensor from the wrong device leads to a cutorch error.

If the device is set to 0, cutorch automatically determines which device to run the computation on. In this mode, tensors must be created with the `torch.CudaTensorOn(device, ...)`, `:cudaOn(device, ...)`, and `:cloneOn(device)` convenience methods.

```lua
cutorch.setDevice(0)
local t1 = torch.CudaTensorOn(2, 1000)  -- on device 2
local t2 = torch.Tensor(1000):cudaOn(3) -- on device 3
local t3 = t1 + 1                       -- on device 2
```
#### Low-level streams functions (don't use these as a regular user; it is easy to shoot yourself in the foot):

- `cutorch.reserveStreams(n)`: creates `n` user streams for use on every device.
- `n = cutorch.getNumStreams()`: returns the number of user streams available on every device. By default, this is `0`, meaning only the default stream (stream 0) is available.
- `cutorch.setStream(n)`: specifies that the currently active stream for the current device (or any other device) is `n`. This setting is preserved across device switches. Streams 1-N are user streams; `0` is the default stream.
- `n = cutorch.getStream()`: returns the currently active stream. By default, returns `0`.
- `cutorch.setDefaultStream()`: an alias for `cutorch.setStream(0)`.
- `cutorch.streamWaitFor(streamWaiting, {streamsToWaitOn...})`: a 1-to-N-way barrier. `streamWaiting` will wait for the listed streams to finish executing all queued kernels/events/barriers. Does not block any of the `streamsToWaitOn`. Current device only.
- `cutorch.streamWaitForMultiDevice(deviceWaiting, streamWaiting, {[device]={streamsToWaitOn...}...})`: (`deviceWaiting`, `streamWaiting`) will wait on the listed (`device`, `streams...`) pairs; handles a single device or multiple devices. `cutorch.streamWaitForMultiDevice(a, b, {[a]={streams...}})` is equivalent to `cutorch.setDevice(a); cutorch.streamWaitFor(b, {streams...})`.
- `cutorch.streamBarrier({streams...})`: an N-to-N-way barrier between all the listed streams; each stream will wait for the completion of all the others, on the current device only. More efficient than creating the same N-to-N-way dependency via `streamWaitFor`.
- `cutorch.streamBarrierMultiDevice({[device]={streamsToWaitOn...}...})`: as with `streamBarrier`, but allows barriers between streams on arbitrary devices. Creates a cross-device N-to-N-way barrier between all (device, stream) values listed.
- `cutorch.streamSynchronize(stream)`: equivalent to `cudaStreamSynchronize(stream)` for the current device. Blocks the CPU until `stream` completes its queued kernels/events.

A short sketch combining a few of these calls follows this list.
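For illustration only, here is a minimal sketch of how the stream functions above might be combined; it assumes two user streams have been reserved and uses only the calls documented in this list.

```lua
require 'cutorch'

cutorch.reserveStreams(2)         -- create user streams 1 and 2 on every device

cutorch.setStream(1)              -- queue subsequent work on stream 1
local a = torch.CudaTensor(1000):fill(1)
local b = a * 2                   -- runs asynchronously on stream 1

cutorch.setStream(2)              -- switch to stream 2
local c = torch.CudaTensor(1000):fill(3)

-- make stream 2 wait until everything queued on stream 1 has finished
cutorch.streamWaitFor(2, {1})
local d = c + b                   -- safe: b is ready once stream 1 is done

cutorch.streamSynchronize(2)      -- block the CPU until stream 2 finishes
cutorch.setDefaultStream()        -- back to the default stream (stream 0)
```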

Transferring a FloatTensor `src` to the GPU:

```lua
dest = src:cuda() -- dest is on the current GPU
```

Allocating a tensor on a given GPU, here allocating `src` on GPU 3:

```lua
src = torch.CudaTensorOn(3, 100)
```

Copying a CUDA tensor from one GPU to another: given a tensor `src` on GPU 1, you can create its clone on GPU 2 with:

```lua
local dest = src:cloneOn(2)
```
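As a final illustrative sketch, the snippet below combines `cutorch.withDevice`, `:cloneOn`, and `:getDevice` into a small multi-GPU workflow; it assumes a machine with at least two GPUs and is not part of the original examples.

```lua
require 'cutorch'
assert(cutorch.getDeviceCount() >= 2, 'this sketch assumes at least two GPUs')

-- do some work on GPU 1
local src
cutorch.withDevice(1, function()
   src = torch.CudaTensor(100):fill(2)
   src:mul(3)                     -- runs on GPU 1
end)

-- bring the result over to GPU 2 and keep working there
local dest = src:cloneOn(2)
cutorch.withDevice(2, function()
   dest:add(1)                    -- runs on GPU 2
end)

cutorch.synchronize()             -- wait for pending work on the current GPU
print(src:getDevice(), dest:getDevice())  -- 1  2
```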
