caffe-video_triplet

This code is built on Caffe; see the Caffe project site.

This code implements training of the siamese-triplet network described in the paper:

Xiaolong Wang and Abhinav Gupta. Unsupervised Learning of Visual Representations using Videos. Proc. of IEEE International Conference on Computer Vision (ICCV), 2015. pdf

Codes

Training scripts are in rank_scripts/rank_alexnet.

Because the siamese networks share their weights, the prototxt defines only one network.

The network takes pairs of image patches as input. The two patches in a pair are similar patches taken from the same video track. The label indicates which video a patch comes from: patches from different videos get different labels (the actual value does not matter, it only needs to be an integer). This lets the loss take the third, negative patch of a triplet from any other pair that has a different label.
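The labelling scheme can be sketched as follows. This is only an illustration, not code from this repository: `assign_pair_labels`, `negative_candidates`, the dictionary layout, and the file names are all hypothetical, and placing the two patches of a pair next to each other is an assumption. Check the example training list for the actual format.

```python
def assign_pair_labels(pairs_by_video):
    """Give both patches of a pair the integer label of their source video.

    pairs_by_video: dict mapping a video id to a list of (patch_a, patch_b)
    tuples mined from that video (hypothetical structure).
    Returns a flat list of (patch_path, label), with pair members adjacent.
    """
    entries = []
    for label, video_id in enumerate(sorted(pairs_by_video)):
        for patch_a, patch_b in pairs_by_video[video_id]:
            entries.append((patch_a, label))
            entries.append((patch_b, label))
    return entries


def negative_candidates(entries, pair_index):
    """Indices of all patches whose label differs from the given pair's label."""
    anchor_label = entries[2 * pair_index][1]
    return [i for i, (_, label) in enumerate(entries) if label != anchor_label]


# Hypothetical mined pairs: one pair from video_a, two pairs from video_b.
pairs = {
    "video_a": [("a_000.jpg", "a_001.jpg")],
    "video_b": [("b_000.jpg", "b_001.jpg"), ("b_010.jpg", "b_011.jpg")],
}
entries = assign_pair_labels(pairs)
candidates = negative_candidates(entries, 0)  # patches usable as negatives for pair 0
```

Only patches with a different label qualify as negatives, so the two pairs from `video_b` never serve as negatives for each other.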

For each pair of patches, the loss looks for the third, negative patch within the same batch. There are two ways to select it: random selection and hard negative mining.
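The two selection strategies can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the layer's actual implementation: it assumes the features are already L2-normalized (as the `norml2` bottom name suggests), uses cosine distance, and treats the negatives closest to the anchor as the "hard" ones. The function names are hypothetical.

```python
import random


def cosine_distance(u, v):
    """1 - dot(u, v); this is the cosine distance when u and v have unit length."""
    return 1.0 - sum(a * b for a, b in zip(u, v))


def random_negatives(labels, anchor, k, rng):
    """Pick k negatives uniformly from patches with a different label."""
    candidates = [i for i, l in enumerate(labels) if l != labels[anchor]]
    return rng.sample(candidates, k)


def hard_negatives(features, labels, anchor, k):
    """Pick the k differently-labelled patches closest to the anchor."""
    candidates = [i for i, l in enumerate(labels) if l != labels[anchor]]
    candidates.sort(key=lambda i: cosine_distance(features[anchor], features[i]))
    return candidates[:k]


# Toy batch of unit-length 2-D features: three pairs from three videos.
features = [(1.0, 0.0), (1.0, 0.0), (0.6, 0.8), (0.8, 0.6), (0.0, 1.0), (-1.0, 0.0)]
labels = [0, 0, 1, 1, 2, 2]

hard = hard_negatives(features, labels, anchor=0, k=2)
rand = random_negatives(labels, anchor=0, k=2, rng=random.Random(0))
```

Hard negatives are the ones the current embedding confuses most with the anchor, which is why they are ranked by smallest distance here.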

In the prototxt:

layer {		
	name: "loss"	
	type: "RankHardLoss" 	
	rank_param{		
		neg_num: 4	
		pair_size: 2 	
		hard_ratio: 0.5 	
		rand_ratio: 0.5 	
		margin: 1 	
	} 	
	bottom: "norml2" 	
	bottom: "label" 	
}

neg_num is the number of negative patches used for each pair of patches; with neg_num = 4, each pair yields 4 triplets. pair_size = 2 means that the inputs are pairs of patches. hard_ratio = 0.5 means that half of the negative patches are hard examples, and rand_ratio = 0.5 means that half of the negative patches are selected at random. To start, you can simply set rand_ratio = 1 and hard_ratio = 0. The margin of the loss needs to be tuned for each task; setting margin = 0.5 or margin = 0.1 might make a difference for other tasks.
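To make the parameters concrete, here is a small sketch of how they could interact. It is an assumption-laden illustration rather than the `RankHardLoss` code: `split_negatives` and `ranking_hinge_loss` are hypothetical helpers, the truncating split is a guess at how the ratios are applied, and the loss is the ranking hinge form max(0, D(anchor, positive) - D(anchor, negative) + margin) described in the paper.

```python
def split_negatives(neg_num, hard_ratio, rand_ratio):
    """Number of hard and random negatives per pair (assumed truncating split)."""
    n_hard = int(neg_num * hard_ratio)
    n_rand = int(neg_num * rand_ratio)
    return n_hard, n_rand


def ranking_hinge_loss(d_pos, d_neg, margin):
    """Zero once the negative is farther than the positive by at least margin."""
    return max(0.0, d_pos - d_neg + margin)


# Settings from the prototxt above: 2 hard and 2 random negatives per pair.
n_hard, n_rand = split_negatives(neg_num=4, hard_ratio=0.5, rand_ratio=0.5)

# Suggested starting point: random negatives only.
n_hard_start, n_rand_start = split_negatives(neg_num=4, hard_ratio=0.0, rand_ratio=1.0)

# The same triplet under two margins: a smaller margin is satisfied sooner.
loss_margin_1 = ranking_hinge_loss(d_pos=0.25, d_neg=0.75, margin=1.0)
loss_margin_05 = ranking_hinge_loss(d_pos=0.25, d_neg=0.75, margin=0.5)
```

With margin = 1 the example triplet still contributes a loss, while with margin = 0.5 it contributes none, which illustrates why the margin is worth tuning per task.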

Models

We offer two models trained with our method:

color model: trained with RGB images.
gray model: trained with grayscale images (3-channel inputs).
prototxt: the network definition shared by both models.
mean: the mean file.

Training Patches

The unsupervised mined patches can be downloaded from here: https://www.dropbox.com/sh/b9rgd8nkh498kuy/AACt6Gk8V7-f8Yq7qVSCs5TGa?dl=0

Each tar file contains different patches.

An example training list can be downloaded from here: https://dl.dropboxusercontent.com/u/334666754/unsup_patches/trainlist.txt
