# cutorch

Cutorch provides a CUDA backend for Torch7.

Cutorch provides the following:

- a new tensor type: `torch.CudaTensor`, which acts like `torch.FloatTensor` but runs all of its operations on the GPU. Most tensor operations are supported by cutorch; a few are still missing and are being implemented. The list of missing operations can be found here: torch#70
- `cutorch.*` - functions to set/get the GPU, query device properties and memory usage, set/get low-level streams, set/get the random number generator's seed, synchronize, etc. They are described in more detail below.
### `torch.CudaTensor`

This new tensor type behaves exactly like a `torch.FloatTensor`, but has a couple of extra functions of note:

- `t:getDevice()` - Given a CudaTensor `t`, you can call `:getDevice` on it to find out the GPU ID on which the tensor memory is allocated.

### `cutorch.*` API

- `cutorch.synchronize()` : All of the CUDA API is asynchronous (barring a few functions), which means that you can queue up operations. To wait for the operations to finish, issue `cutorch.synchronize()` in your code; it makes the code wait for all GPU operations on the current GPU to finish.
- `cutorch.setDevice(i)` : If you have multiple GPUs, you can switch the default GPU (the one on which CUDA tensors are allocated and operations run). GPU IDs are 1-indexed, so with 4 GPUs you can call `setDevice(1)`, `setDevice(2)`, `setDevice(3)`, or `setDevice(4)`. Alternatively, you can use [auto-device mode](#cutorch.api.autodevice).
- `idx = cutorch.getDevice()` : Returns the currently set GPU device index.
- `count = cutorch.getDeviceCount()` : Gets the number of available GPUs.
- `totalMemory, freeMemory = cutorch.getMemoryUsage(devID)` : Gets the total and free memory in bytes for the given device ID.
- `cutorch.seed([devID])` - Sets and returns a random seed for the current or specified device.
- `cutorch.seedAll()` - Sets and returns a random seed for all available GPU devices.
- `cutorch.initialSeed([devID])` - Returns the seed for the current or specified device.
- `cutorch.manualSeed(seed [, device])` - Sets a manually specified RNG seed for the current or specified device.
- `cutorch.manualSeedAll(seed)` - Sets a manually specified RNG seed for all available GPUs.
- `cutorch.getRNGState([device])` - Returns the current RNG state in the form of a byte tensor, for the current or specified device.
- `cutorch.setRNGState(state [, device])` - Sets the RNG state from a previously saved state, on the current or specified device.
- `cutorch.getState()` - Returns the global state of the cutorch package. This state is not intended for users; it stores the raw RNG states, cuBLAS handles, and other thread- and device-specific data.
- `cutorch.withDevice(devID, f)` - A convenience for multi-GPU code that takes a device ID as well as a function `f`. It switches cutorch to the given device, executes the function `f`, and then switches cutorch back to the original device. Alternatively, you can use [auto-device mode](#cutorch.api.autodevice).

A short usage sketch of these functions follows this list.
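The snippet below is a minimal sketch (not part of the original API docs) showing how a few of these calls might be combined; it uses only the functions documented above and assumes cutorch is installed and at least one GPU is present.

```lua
require 'cutorch'

local count = cutorch.getDeviceCount()          -- number of available GPUs
print('GPUs available:', count)

cutorch.setDevice(1)                            -- GPU IDs are 1-indexed
print('current device:', cutorch.getDevice())   -- 1

local total, free = cutorch.getMemoryUsage(1)   -- bytes on device 1
print(string.format('device 1: %d bytes free of %d', free, total))

cutorch.manualSeed(1234)                        -- reproducible RNG on the current device
print('seed:', cutorch.initialSeed())           -- 1234

-- run a function on another device, then switch back automatically
if count >= 2 then
   cutorch.withDevice(2, function()
      local t = torch.CudaTensor(10):uniform()  -- allocated on device 2
      print('allocated on device', t:getDevice())
   end)
end

cutorch.synchronize()                           -- wait for the current GPU to finish
```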

#### Auto-device mode

Computations on CUDA tensors must run on the CUDA device where the tensor resides; running a computation on a tensor from the wrong device leads to a cutorch error.

If the device is set to 0, cutorch automatically determines which device to run the computation on. In this mode, tensors must be created with the `torch.CudaTensorOn(device, ...)`, `:cudaOn(device, ...)`, and `:cloneOn(device)` convenience methods.

```lua
cutorch.setDevice(0)
local t1 = torch.CudaTensorOn(2, 1000)  -- on device 2
local t2 = torch.Tensor(1000):cudaOn(3) -- on device 3
local t3 = t1 + 1                       -- on device 2
```
#### Low-level streams functions (don't use these as a regular user; it is easy to shoot yourself in the foot):

- `cutorch.reserveStreams(n)`: creates `n` user streams for use on every device.
- `n = cutorch.getNumStreams()`: returns the number of user streams available on every device. By default, this is `0`, meaning only the default stream (stream 0) is available.
- `cutorch.setStream(n)`: specifies that the currently active stream for the current device (or any other device) is `n`. This setting is preserved across device switches. Streams 1-N are user streams; `0` is the default stream.
- `n = cutorch.getStream()`: returns the currently active stream. By default, returns `0`.
- `cutorch.setDefaultStream()`: an alias for `cutorch.setStream(0)`.
- `cutorch.streamWaitFor(streamWaiting, {streamsToWaitOn...})`: a 1-to-N-way barrier. `streamWaiting` will wait for the listed streams to finish executing all queued kernels/events/barriers. Does not block any of the `streamsToWaitOn`. Current device only.
- `cutorch.streamWaitForMultiDevice(deviceWaiting, streamWaiting, {[device]={streamsToWaitOn...}...})`: (`deviceWaiting`, `streamWaiting`) will wait on the listed (`device`, `streams...`) pairs; handles a single device or multiple devices. `cutorch.streamWaitForMultiDevice(a, b, {[a]={streams...}})` is equivalent to `cutorch.setDevice(a); cutorch.streamWaitFor(b, {streams...})`.
- `cutorch.streamBarrier({streams...})`: an N-to-N-way barrier between all the listed streams; each stream will wait for the completion of all the others, on the current device only. More efficient than creating the same N-to-N-way dependency via `streamWaitFor`.
- `cutorch.streamBarrierMultiDevice({[device]={streamsToWaitOn...}...})`: as with `streamBarrier`, but allows barriers between streams on arbitrary devices. Creates a cross-device N-to-N-way barrier between all (device, stream) values listed.
- `cutorch.streamSynchronize(stream)`: equivalent to `cudaStreamSynchronize(stream)` for the current device. Blocks the CPU until `stream` completes its queued kernels/events.

A short sketch combining a few of these calls follows this list.
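For illustration only, here is a minimal sketch of how the stream functions above might be combined; it assumes two user streams have been reserved and uses only the calls documented in this list.

```lua
require 'cutorch'

cutorch.reserveStreams(2)         -- create user streams 1 and 2 on every device

cutorch.setStream(1)              -- queue subsequent work on stream 1
local a = torch.CudaTensor(1000):fill(1)
local b = a * 2                   -- runs asynchronously on stream 1

cutorch.setStream(2)              -- switch to stream 2
local c = torch.CudaTensor(1000):fill(3)

-- make stream 2 wait until everything queued on stream 1 has finished
cutorch.streamWaitFor(2, {1})
local d = c + b                   -- safe: b is ready once stream 1 is done

cutorch.streamSynchronize(2)      -- block the CPU until stream 2 finishes
cutorch.setDefaultStream()        -- back to the default stream (stream 0)
```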

Transferring a FloatTensor `src` to the GPU:

```lua
dest = src:cuda() -- dest is on the current GPU
```

Allocating a tensor on a given GPU, here allocating `src` on GPU 3:

```lua
src = torch.CudaTensorOn(3, 100)
```

Copying a CUDA tensor from one GPU to another: given a tensor `src` on GPU 1, you can create its clone on GPU 2 with:

```lua
local dest = src:cloneOn(2)
```
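As a final illustrative sketch, the snippet below combines `cutorch.withDevice`, `:cloneOn`, and `:getDevice` into a small multi-GPU workflow; it assumes a machine with at least two GPUs and is not part of the original examples.

```lua
require 'cutorch'
assert(cutorch.getDeviceCount() >= 2, 'this sketch assumes at least two GPUs')

-- do some work on GPU 1
local src
cutorch.withDevice(1, function()
   src = torch.CudaTensor(100):fill(2)
   src:mul(3)                     -- runs on GPU 1
end)

-- bring the result over to GPU 2 and keep working there
local dest = src:cloneOn(2)
cutorch.withDevice(2, function()
   dest:add(1)                    -- runs on GPU 2
end)

cutorch.synchronize()             -- wait for pending work on the current GPU
print(src:getDevice(), dest:getDevice())  -- 1  2
```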
