Metric Filler::get_metric(double x, double y, double z, GEntity *ge)
{
  Metric m;
  SMetric3 temp;
  SVector3 v1(1.0, 0.0, 0.0), v2(0.0, 1.0, 0.0), v3(0.0, 0.0, 1.0);
  FieldManager *manager = ge->model()->getFields();
  if(manager->getBackgroundField() > 0){
    Field *field = manager->get(manager->getBackgroundField());
    // The background field is evaluated into temp, but the result is not
    // used below: the returned metric is always the identity frame.
    if(field) (*field)(x, y, z, temp, ge);
  }
  m.set_m11(v1.x()); m.set_m21(v1.y()); m.set_m31(v1.z());
  m.set_m12(v2.x()); m.set_m22(v2.y()); m.set_m32(v2.z());
  m.set_m13(v3.x()); m.set_m23(v3.y()); m.set_m33(v3.z());
  return m;
}
/**
 * \details Calculates inclusive and exclusive values for every metric.
 */
void AggrCube::get_met_tree(vector<double>& excl_sevv,
                            vector<double>& incl_sevv,
                            inclmode cnode_mode, inclmode sys_mode,
                            Cnode* cnode, Sysres* sys) const
{
  const vector<Metric*>& metrics = get_metv();
  size_t num_metrics = metrics.size();
  excl_sevv.resize(num_metrics);
  incl_sevv.resize(num_metrics);
  #pragma omp parallel
  {
    // First pass: inclusive severities. The implicit barrier at the end of
    // this worksharing loop guarantees all inclusive values are available
    // before the exclusive values are derived from them below.
    #pragma omp for
    for (size_t i = 0; i < num_metrics; ++i)
      incl_sevv[i] = get_vcsev(INCL, cnode_mode, sys_mode, metrics[i], cnode, sys);
    // Second pass: exclusive = inclusive minus the children's inclusive values.
    #pragma omp for
    for (size_t i = 0; i < num_metrics; ++i) {
      Metric* met = metrics[i];
      double result = incl_sevv[i];
      for (unsigned int j = 0; j < met->num_children(); ++j)
        result -= incl_sevv[met->get_child(j)->get_id()];
      excl_sevv[i] = result;
    }
  }
}
json::Object metricToJson(const Metric& metric)
{
  json::Object metricJson = metricBaseToJson(metric);
  metricJson["name"] = metric.data().name;
  metricJson["value"] = metric.data().value;
  return metricJson;
}
TEST_F(HIST_PERC_Fixture, TestPercentilesMany)
{
  bool failure = false;
  try {
    factory->process("create percentiles aaa with time_window=30 collect_period=10 ps=50:90:99:99.9");
  } catch (std::runtime_error& ex) {
    cout << "Failed: " << ex.what() << endl;
    failure = true;
  } catch (...) {
    failure = true;
  }
  ASSERT_FALSE(failure);
  Dataset* dataset = Dataset::open(path, CWSE_DATA_ACCESS_READ_ONLY);
  try {
    uint16_t ids[4];
    ids[0] = dataset->findColumn("aaa.50");
    ids[1] = dataset->findColumn("aaa.90");
    ids[2] = dataset->findColumn("aaa.99");
    ids[3] = dataset->findColumn("aaa.99.9");
    for (size_t i = 0; i < sizeof(ids) / sizeof(*ids); i++) {
      ASSERT_TRUE(ids[i] != CWSE_INVALID_COLUMN_ID);
      ASSERT_EQ(0, dataset->get(ids[i]));
    }
    Metric* metric = factory->find("aaa");
    ASSERT_FALSE(metric == NULL);
    Percentiles* p = dynamic_cast<Percentiles*>(metric);
    ASSERT_FALSE(p == NULL);
    p->resetNextTimePoint(10);
    for (int i = 1; i <= 100; i++) {
      metric->update(i);
    }
    p->onTimeTick(10);
    ASSERT_EQ(50.5, dataset->get(ids[0]));
    ASSERT_EQ(90.5, dataset->get(ids[1]));
    ASSERT_EQ(99.5, dataset->get(ids[2]));
    ASSERT_EQ(100, dataset->get(ids[3]));
  } catch (std::runtime_error& ex) {
    cout << "Failed: " << ex.what() << endl;
    failure = true;
  } catch (...) {
    delete dataset;
    ASSERT_TRUE(false);
  }
  delete dataset;
}
TEST_F(HIST_PERC_Fixture, TestPercentilesExample)
{
  bool failure = false;
  try {
    factory->process("create percentiles aaa with time_window=30 collect_period=10 ps=20:50:90");
  } catch (std::runtime_error& ex) {
    cout << "Failed: " << ex.what() << endl;
    failure = true;
  } catch (...) {
    failure = true;
  }
  ASSERT_FALSE(failure);
  Dataset* dataset = Dataset::open(path, CWSE_DATA_ACCESS_READ_ONLY);
  try {
    uint16_t ids[3];
    ids[0] = dataset->findColumn("aaa.20");
    ids[1] = dataset->findColumn("aaa.50");
    ids[2] = dataset->findColumn("aaa.90");
    for (size_t i = 0; i < sizeof(ids) / sizeof(*ids); i++) {
      ASSERT_TRUE(ids[i] != CWSE_INVALID_COLUMN_ID);
      ASSERT_EQ(0, dataset->get(ids[i]));
    }
    Metric* metric = factory->find("aaa");
    ASSERT_FALSE(metric == NULL);
    Percentiles* p = dynamic_cast<Percentiles*>(metric);
    ASSERT_FALSE(p == NULL);
    p->resetNextTimePoint(10);
    double source[] = { 43, 54, 56, 61, 62, 66, 68, 69, 69, 70,
                        71, 72, 77, 78, 79, 85, 87, 88, 89, 93,
                        95, 96, 98, 99, 99 };
    for (size_t i = 0; i < sizeof(source) / sizeof(*source); i++) {
      metric->update(source[i]);
    }
    p->onTimeTick(10);
    ASSERT_EQ(64, dataset->get(ids[0]));
    ASSERT_EQ(77, dataset->get(ids[1]));
    ASSERT_EQ(98, dataset->get(ids[2]));
  } catch (std::runtime_error& ex) {
    cout << "Failed: " << ex.what() << endl;
    failure = true;
  } catch (...) {
    delete dataset;
    ASSERT_TRUE(false);
  }
  delete dataset;
}
double infinity_distance(SPoint3 p1, SPoint3 p2, Metric m)
{
  // Map both points through the metric's frame, then take the Chebyshev
  // (L-infinity) distance in the transformed coordinates.
  double a = m.get_m11(), b = m.get_m21(), c = m.get_m31();
  double d = m.get_m12(), e = m.get_m22(), f = m.get_m32();
  double g = m.get_m13(), h = m.get_m23(), i = m.get_m33();
  double x1 = a*p1.x() + b*p1.y() + c*p1.z();
  double y1 = d*p1.x() + e*p1.y() + f*p1.z();
  double z1 = g*p1.x() + h*p1.y() + i*p1.z();
  double x2 = a*p2.x() + b*p2.y() + c*p2.z();
  double y2 = d*p2.x() + e*p2.y() + f*p2.z();
  double z2 = g*p2.x() + h*p2.y() + i*p2.z();
  return std::max(std::max(fabs(x2 - x1), fabs(y2 - y1)), fabs(z2 - z1));
}
TEST_F(HIST_PERC_Fixture, TestPercentilesAnotherExample)
{
  bool failure = false;
  try {
    factory->process("create percentiles aaa with time_window=30 collect_period=10 ps=25:85");
  } catch (std::runtime_error& ex) {
    cout << "Failed: " << ex.what() << endl;
    failure = true;
  } catch (...) {
    failure = true;
  }
  ASSERT_FALSE(failure);
  Dataset* dataset = Dataset::open(path, CWSE_DATA_ACCESS_READ_ONLY);
  try {
    uint16_t ids[2];
    ids[0] = dataset->findColumn("aaa.25");
    ids[1] = dataset->findColumn("aaa.85");
    for (size_t i = 0; i < sizeof(ids) / sizeof(*ids); i++) {
      ASSERT_TRUE(ids[i] != CWSE_INVALID_COLUMN_ID);
      ASSERT_EQ(0, dataset->get(ids[i]));
    }
    Metric* metric = factory->find("aaa");
    ASSERT_FALSE(metric == NULL);
    Percentiles* p = dynamic_cast<Percentiles*>(metric);
    ASSERT_FALSE(p == NULL);
    p->resetNextTimePoint(10);
    double source[] = { 4, 4, 5, 5, 5, 5, 6, 6, 6, 7,
                        7, 7, 8, 8, 9, 9, 9, 10, 10, 10 };
    for (size_t i = 0; i < sizeof(source) / sizeof(*source); i++) {
      metric->update(source[i]);
    }
    p->onTimeTick(10);
    ASSERT_EQ(5, dataset->get(ids[0]));
    ASSERT_TRUE(9.5 == dataset->get(ids[1])) << " val=" << dataset->get(ids[1]) << endl;
  } catch (std::runtime_error& ex) {
    cout << "Failed: " << ex.what() << endl;
    failure = true;
  } catch (...) {
    delete dataset;
    ASSERT_TRUE(false);
  }
  delete dataset;
}
void StatsD::publishAggregateMetrics()
{
  for (AggregateMetricList::iterator itr = m_aggregate_metrics.begin();
       itr != m_aggregate_metrics.end(); ++itr) {
    Metric *metric = newMetric(itr->second);
    metric->convertToNonAggregateValue();
    publishMetric(*metric);
    delete metric;
  }
}
void Segmentor::getGoldActions(const vector<Instance>& vecInsts, vector<vector<CAction> >& vecActions)
{
  vecActions.clear();
  static Metric eval;
#if USE_CUDA==1
  static CStateItem<gpu> state[m_classifier.MAX_SENTENCE_SIZE];
#else
  static CStateItem<cpu> state[m_classifier.MAX_SENTENCE_SIZE];
#endif
  static vector<string> output;
  static CAction answer;
  eval.reset();
  static int numInstance, actionNum;
  vecActions.resize(vecInsts.size());
  for (numInstance = 0; numInstance < vecInsts.size(); numInstance++) {
    const Instance& instance = vecInsts[numInstance];
    actionNum = 0;
    state[actionNum].initSentence(&instance.chars);
    state[actionNum].clear();
    while (!state[actionNum].IsTerminated()) {
      state[actionNum].getGoldAction(instance.words, answer);
      vecActions[numInstance].push_back(answer);
      state[actionNum].move(state + actionNum + 1, answer);
      actionNum++;
    }
    if (actionNum - 1 != instance.charsize()) {
      std::cout << "action number is not correct, please check" << std::endl;
    }
    state[actionNum].getSegResults(output);
    instance.evaluate(output, eval);
    if (!eval.bIdentical()) {
      std::cout << "error state conversion!" << std::endl;
      exit(1);  // abort with failure: gold-action replay did not reproduce the input
    }
    if ((numInstance + 1) % m_options.verboseIter == 0) {
      cout << numInstance + 1 << " ";
      if ((numInstance + 1) % (40 * m_options.verboseIter) == 0)
        cout << std::endl;
      cout.flush();
    }
    if (m_options.maxInstance > 0 && numInstance == m_options.maxInstance)
      break;
  }
}
int main(void)
{
  Metric mc;
  mc.subwinSize = Size3(16, 16, 1);
  mc.subwinStep = Size3(16, 16, 1);
  string scorefilepath = "/modules/Site/grantest_output.txt";
  mc.searchPath = "/data/totest/gran/";
  mc.subsample = true;
  mc.changeWin = false;
  //cout << mc.subwinSize << endl;
  mc.trainGranularity(mc.searchPath, scorefilepath);
  return 0;
}
bool TextWriter::writeCommon(const Metric& metric)
{
  std::ostringstream path;
  for (uint32_t i = 0; i < _path.size(); ++i) {
    path << _path[i] << ".";
  }
  std::string mypath(path.str());
  path << metric.getMangledName();
  if (_regex.match(path.str())) {
    if (metric.used() || _verbose) {
      _out << "\n" << mypath;
      return true;
    }
  }
  return false;
}
void thundersvm_predict_sub(DataSet& predict_dataset, CMDParser& parser,
                            char* model_file_path, char* output_file_path)
{
  fstream file;
  file.open(model_file_path, std::fstream::in);
  string feature, svm_type;
  file >> feature >> svm_type;
  CHECK_EQ(feature, "svm_type");
  SvmModel *model = nullptr;
  Metric *metric = nullptr;
  if (svm_type == "c_svc") {
    model = new SVC();
    metric = new Accuracy();
  } else if (svm_type == "nu_svc") {
    model = new NuSVC();
    metric = new Accuracy();
  } else if (svm_type == "one_class") {
    model = new OneClassSVC();
    //todo determine a metric
  } else if (svm_type == "epsilon_svr") {
    model = new SVR();
    metric = new MSE();
  } else if (svm_type == "nu_svr") {
    model = new NuSVR();
    metric = new MSE();
  } else {
    // Unknown model type: bail out instead of dereferencing a null model below.
    LOG(ERROR) << "unknown svm_type: " << svm_type;
    return;
  }
#ifdef USE_CUDA
  CUDA_CHECK(cudaSetDevice(parser.gpu_id));
#endif
  model->set_max_memory_size_Byte(parser.param_cmd.max_mem_size);
  model->load_from_file(model_file_path);
  file.close();
  file.open(output_file_path, fstream::out);
  vector<float_type> predict_y = model->predict(predict_dataset.instances(), -1);
  for (size_t i = 0; i < predict_y.size(); ++i) {
    file << predict_y[i] << std::endl;
  }
  file.close();
  if (metric) {
    LOG(INFO) << metric->name() << " = "
              << metric->score(predict_y, predict_dataset.y());
  }
  delete metric;
  delete model;
}
inline Future<Nothing> remove(const Metric& metric)
{
  // The metrics process is instantiated in `process::initialize`.
  process::initialize();
  return dispatch(
      internal::metrics,
      &internal::MetricsProcess::remove,
      metric.name());
}
// You'd think that work items that span entire rows rather than squarish 2d
// regions would be faster because they'd have fewer calls to metric.setTree1,
// but it turns out they're about 5% slower. [Robb P. Matzke 2014-09-27]
void init() {
  WorkItem toDo;
  while (workList.next().assignTo(toDo)) {
    for (size_t i = toDo.iBegin; i < toDo.iEnd; ++i) {
      if (i < functions1.size()) {
        metric.setTree1(functions1[i]);
        for (size_t j = toDo.jBegin; j < toDo.jEnd; ++j) {
          if (j < functions2.size()) {
            distance(i, j) = metric.compute(functions2[j]).relativeCost();
          } else {
            distance(i, j) = 0.0;
          }
        }
      } else {
        for (size_t j = toDo.jBegin; j < toDo.jEnd; ++j) {
          distance(i, j) = 0.0;
        }
      }
    }
  }
}
void Segmentor::getGoldActions(const vector<Instance>& vecInsts, vector<vector<CAction> >& vecActions)
{
  vecActions.clear();
  static Metric segEval, posEval;
  static CStateItem state[m_classifier.MAX_SENTENCE_SIZE];
  static CResult output;
  static CAction answer;
  segEval.reset();
  posEval.reset();
  static int numInstance, actionNum;
  vecActions.resize(vecInsts.size());
  for (numInstance = 0; numInstance < vecInsts.size(); numInstance++) {
    const Instance& instance = vecInsts[numInstance];
    actionNum = 0;
    state[actionNum].initSentence(&instance.chars, &instance.candidateLabels);
    state[actionNum].clear();
    while (!state[actionNum].IsTerminated()) {
      state[actionNum].getGoldAction(instance, m_classifier.fe._postagAlphabet, answer);
      vecActions[numInstance].push_back(answer);
      state[actionNum].move(state + actionNum + 1, answer, m_classifier.fe._postagAlphabet);
      actionNum++;
    }
    if (actionNum - 1 != instance.charsize()) {
      std::cout << "action number is not correct, please check" << std::endl;
    }
    state[actionNum].getSegPosResults(output);
    instance.evaluate(output, segEval, posEval);
    if (!segEval.bIdentical() || !posEval.bIdentical()) {
      std::cout << "error state conversion!" << std::endl;
      std::cout << "output instance:" << std::endl;
      for (int tmpK = 0; tmpK < instance.words.size(); tmpK++) {
        std::cout << instance.words[tmpK] << "_" << instance.postags[tmpK] << " ";
      }
      std::cout << std::endl;
      std::cout << "predicted instance:" << std::endl;
      for (int tmpK = 0; tmpK < output.size(); tmpK++) {
        std::cout << output.words[tmpK] << "_" << output.postags[tmpK] << " ";
      }
      std::cout << std::endl;
      exit(1);  // abort with failure: gold-action replay did not reproduce the input
    }
    if ((numInstance + 1) % m_options.verboseIter == 0) {
      cout << numInstance + 1 << " ";
      if ((numInstance + 1) % (40 * m_options.verboseIter) == 0)
        cout << std::endl;
      cout.flush();
    }
    if (m_options.maxInstance > 0 && numInstance == m_options.maxInstance)
      break;
  }
}
Metric Filler::get_metric(double x, double y, double z)
{
  Metric m;
  STensor3 m2;
  if (CTX::instance()->mesh.smoothCrossField) {
    m2 = Frame_field::findCross(x, y, z);
  } else {
    m2 = Frame_field::search(x, y, z);
  }
  m.set_m11(m2.get_m11()); m.set_m21(m2.get_m21()); m.set_m31(m2.get_m31());
  m.set_m12(m2.get_m12()); m.set_m22(m2.get_m22()); m.set_m32(m2.get_m32());
  m.set_m13(m2.get_m13()); m.set_m23(m2.get_m23()); m.set_m33(m2.get_m33());
  return m;
}
void VirtualPathSelection::createPeerLink(const wns::service::dll::UnicastAddress myself,
                                          const wns::service::dll::UnicastAddress peer,
                                          const Metric linkMetric)
{
  if (useStaticPS) {
    if (wns::simulator::getEventScheduler()->getTime() > staticPSsnapshotTimeout) {
      return;
    }
  }
  assure(mapper.knows(myself), "myself (" << myself << ") is not a known address");
  assure(mapper.knows(peer), "peer (" << peer << ") is not a known address");
  assure(isMeshPoint(myself), "myself (" << myself << ") is not a known MP");
  assure(isMeshPoint(peer), "peer (" << peer << ") is not a known MP");
  assure(linkMetric.isNotInf(), "createPeerLink with metric == inf");
  if (isPortal(myself) && isPortal(peer)) {
    MESSAGE_SINGLE(NORMAL, logger, "createPeerLink: Ignore wireless link between "
                   << myself << " and " << peer << ", both are portals");
  } else {
    int myselfId = mapper.get(myself);
    int peerId = mapper.get(peer);
    assure(linkCosts[myselfId][peerId].isInf(), "createPeerLink with already known linkCosts");
    Metric newLinkMetric = linkMetric;
    if (preKnowledgeCosts[myselfId][peerId].isNotInf()) {
      // Blend the measured metric with pre-knowledge via a weighted average.
      newLinkMetric = newLinkMetric * (1.0 - preKnowledgeAlpha)
                    + preKnowledgeCosts[myselfId][peerId] * preKnowledgeAlpha;
      MESSAGE_SINGLE(NORMAL, logger, "createPeerLink: " << myself << "->" << peer
                     << " has preKnowledge of " << preKnowledgeCosts[myselfId][peerId]
                     << ", " << linkMetric << "->" << newLinkMetric);
    }
    linkCosts[myselfId][peerId] = newLinkMetric;
    MESSAGE_SINGLE(NORMAL, logger, "createPeerLink: " << myself << " --> " << peer
                   << " costs " << linkCosts[myselfId][peerId]);
    pathMatrixIsConsistent = false;
    onNewPathSelectionEntry();
  }
}
void Segmentor::train(const string& trainFile, const string& devFile, const string& testFile,
                      const string& modelFile, const string& optionFile, const string& lexiconFile)
{
  if (optionFile != "")
    m_options.load(optionFile);
  m_options.showOptions();
  vector<Instance> trainInsts, devInsts, testInsts;
  m_pipe.readInstances(trainFile, trainInsts, m_classifier.MAX_SENTENCE_SIZE - 2, m_options.maxInstance);
  if (devFile != "")
    m_pipe.readInstances(devFile, devInsts, m_classifier.MAX_SENTENCE_SIZE - 2, m_options.maxInstance);
  if (testFile != "")
    m_pipe.readInstances(testFile, testInsts, m_classifier.MAX_SENTENCE_SIZE - 2, m_options.maxInstance);
  vector<vector<Instance> > otherInsts(m_options.testFiles.size());
  for (int idx = 0; idx < m_options.testFiles.size(); idx++) {
    m_pipe.readInstances(m_options.testFiles[idx], otherInsts[idx], m_classifier.MAX_SENTENCE_SIZE - 2, m_options.maxInstance);
  }
  createAlphabet(trainInsts);
  m_classifier.init(m_options.delta);
  m_classifier.setDropValue(m_options.dropProb);
  vector<vector<CAction> > trainInstGoldactions;
  getGoldActions(trainInsts, trainInstGoldactions);
  double bestPostagFmeasure = 0;
  int inputSize = trainInsts.size();
  std::vector<int> indexes;
  for (int i = 0; i < inputSize; ++i)
    indexes.push_back(i);
  static Metric eval;
  static Metric segMetric_dev, segMetric_test;
  static Metric postagMetric_dev, postagMetric_test;
  int maxIter = m_options.maxIter * (inputSize / m_options.batchSize + 1);
  int oneIterMaxRound = (inputSize + m_options.batchSize - 1) / m_options.batchSize;
  std::cout << "maxIter = " << maxIter << std::endl;
  int devNum = devInsts.size(), testNum = testInsts.size();
  static vector<CResult> decodeInstResults;
  static CResult curDecodeInst;
  static bool bCurIterBetter;
  static vector<Instance> subInstances;
  static vector<vector<CAction> > subInstGoldActions;
  for (int iter = 0; iter < maxIter; ++iter) {
    std::cout << "##### Iteration " << iter << std::endl;
    srand(iter);
    random_shuffle(indexes.begin(), indexes.end());
    std::cout << "random: " << indexes[0] << ", " << indexes[indexes.size() - 1] << std::endl;
    bool bEvaluate = false;
    if (m_options.batchSize == 1) {
      eval.reset();
      bEvaluate = true;
      for (int idy = 0; idy < inputSize; idy++) {
        subInstances.clear();
        subInstGoldActions.clear();
        subInstances.push_back(trainInsts[indexes[idy]]);
        subInstGoldActions.push_back(trainInstGoldactions[indexes[idy]]);
        double cost = m_classifier.train(subInstances, subInstGoldActions);
        eval.overall_label_count += m_classifier._eval.overall_label_count;
        eval.correct_label_count += m_classifier._eval.correct_label_count;
        if ((idy + 1) % (m_options.verboseIter * 10) == 0) {
          std::cout << "current: " << idy + 1 << ", Cost = " << cost
                    << ", Correct(%) = " << eval.getAccuracy() << std::endl;
        }
        m_classifier.updateParams(m_options.regParameter, m_options.adaAlpha, m_options.adaEps);
      }
      std::cout << "current: " << iter + 1 << ", Correct(%) = " << eval.getAccuracy() << std::endl;
    } else {
      if (iter == 0)
        eval.reset();
      subInstances.clear();
      subInstGoldActions.clear();
      for (int idy = 0; idy < m_options.batchSize; idy++) {
        subInstances.push_back(trainInsts[indexes[idy]]);
        subInstGoldActions.push_back(trainInstGoldactions[indexes[idy]]);
      }
      double cost = m_classifier.train(subInstances, subInstGoldActions);
      eval.overall_label_count += m_classifier._eval.overall_label_count;
      eval.correct_label_count += m_classifier._eval.correct_label_count;
      if ((iter + 1) % (m_options.verboseIter) == 0) {
        std::cout << "current: " << iter + 1 << ", Cost = " << cost
                  << ", Correct(%) = " << eval.getAccuracy() << std::endl;
        eval.reset();
        bEvaluate = true;
      }
      m_classifier.updateParams(m_options.regParameter, m_options.adaAlpha, m_options.adaEps);
    }
    if (bEvaluate && devNum > 0) {
      bCurIterBetter = false;
      if (!m_options.outBest.empty())
        decodeInstResults.clear();
      segMetric_dev.reset();
      postagMetric_dev.reset();
      for (int idx = 0; idx < devInsts.size(); idx++) {
        predict(devInsts[idx], curDecodeInst);
        devInsts[idx].evaluate(curDecodeInst, segMetric_dev, postagMetric_dev);
        if (!m_options.outBest.empty()) {
          decodeInstResults.push_back(curDecodeInst);
        }
      }
      std::cout << "dev:" << std::endl << "Seg: ";
      segMetric_dev.print();
      std::cout << "Postag: ";
      postagMetric_dev.print();
      if (!m_options.outBest.empty() && postagMetric_dev.getAccuracy() > bestPostagFmeasure) {
        m_pipe.outputAllInstances(devFile + m_options.outBest, decodeInstResults);
        bCurIterBetter = true;
      }
      if (testNum > 0) {
        if (!m_options.outBest.empty())
          decodeInstResults.clear();
        segMetric_test.reset();
        postagMetric_test.reset();
        for (int idx = 0; idx < testInsts.size(); idx++) {
          predict(testInsts[idx], curDecodeInst);
          testInsts[idx].evaluate(curDecodeInst, segMetric_test, postagMetric_test);
          if (bCurIterBetter && !m_options.outBest.empty()) {
            decodeInstResults.push_back(curDecodeInst);
          }
        }
        std::cout << "test:" << std::endl << "Seg: ";
        segMetric_test.print();
        std::cout << "Postag: ";
        postagMetric_test.print();
        if (!m_options.outBest.empty() && bCurIterBetter) {
          m_pipe.outputAllInstances(testFile + m_options.outBest, decodeInstResults);
        }
      }
      for (int idx = 0; idx < otherInsts.size(); idx++) {
        std::cout << "processing " << m_options.testFiles[idx] << std::endl;
        if (!m_options.outBest.empty())
          decodeInstResults.clear();
        segMetric_test.reset();
        postagMetric_test.reset();
        for (int idy = 0; idy < otherInsts[idx].size(); idy++) {
          predict(otherInsts[idx][idy], curDecodeInst);
          otherInsts[idx][idy].evaluate(curDecodeInst, segMetric_test, postagMetric_test);
          if (bCurIterBetter && !m_options.outBest.empty()) {
            decodeInstResults.push_back(curDecodeInst);
          }
        }
        std::cout << "test:" << std::endl << "Seg: ";
        segMetric_test.print();
        std::cout << "Postag: ";
        postagMetric_test.print();
        if (!m_options.outBest.empty() && bCurIterBetter) {
          m_pipe.outputAllInstances(m_options.testFiles[idx] + m_options.outBest, decodeInstResults);
        }
      }
      if (m_options.saveIntermediate && postagMetric_dev.getAccuracy() > bestPostagFmeasure) {
        std::cout << "Exceeds best previous performance of " << bestPostagFmeasure
                  << ". Saving model file.." << std::endl;
        bestPostagFmeasure = postagMetric_dev.getAccuracy();
        writeModelFile(modelFile);
      }
    }
  }
}
//void DataSet_load_from_python(DataSet *dataset, float *y, char **x, int len) {dataset->load_from_python(y, x, len);}
void thundersvm_train_sub(DataSet& train_dataset, CMDParser& parser, char* model_file_path)
{
  SvmModel *model = nullptr;
  switch (parser.param_cmd.svm_type) {
    case SvmParam::C_SVC: model = new SVC(); break;
    case SvmParam::NU_SVC: model = new NuSVC(); break;
    case SvmParam::ONE_CLASS: model = new OneClassSVC(); break;
    case SvmParam::EPSILON_SVR: model = new SVR(); break;
    case SvmParam::NU_SVR: model = new NuSVR(); break;
  }
  //todo add this to check_parameter method
  if (parser.param_cmd.svm_type == SvmParam::NU_SVC) {
    train_dataset.group_classes();
    for (int i = 0; i < train_dataset.n_classes(); ++i) {
      int n1 = train_dataset.count()[i];
      for (int j = i + 1; j < train_dataset.n_classes(); ++j) {
        int n2 = train_dataset.count()[j];
        if (parser.param_cmd.nu * (n1 + n2) / 2 > min(n1, n2)) {
          printf("specified nu is infeasible\n");
          delete model;
          return;
        }
      }
    }
  }
  if (parser.param_cmd.kernel_type != SvmParam::LINEAR && !parser.gamma_set) {
    parser.param_cmd.gamma = 1.f / train_dataset.n_features();
  }
#ifdef USE_CUDA
  CUDA_CHECK(cudaSetDevice(parser.gpu_id));
#endif
  vector<float_type> predict_y, test_y;
  if (parser.do_cross_validation) {
    predict_y = model->cross_validation(train_dataset, parser.param_cmd, parser.nr_fold);
  } else {
    model->train(train_dataset, parser.param_cmd);
    model->save_to_file(model_file_path);
    LOG(INFO) << "evaluating training score";
    predict_y = model->predict(train_dataset.instances(), -1);
    //predict_y = model->predict(train_dataset.instances(), 10000);
    //test_y = train_dataset.y();
  }
  Metric *metric = nullptr;
  switch (parser.param_cmd.svm_type) {
    case SvmParam::C_SVC:
    case SvmParam::NU_SVC:
      metric = new Accuracy();
      break;
    case SvmParam::EPSILON_SVR:
    case SvmParam::NU_SVR:
      metric = new MSE();
      break;
    case SvmParam::ONE_CLASS:
      break;  // no obvious score for one-class models
  }
  if (metric) {
    LOG(INFO) << metric->name() << " = " << metric->score(predict_y, train_dataset.y()) << std::endl;
  }
  delete metric;
  delete model;
}
void AdaRanker::learn(const QueryData& data, const Metric& metric)
{
  weakRankerWeights = std::vector<double>(data.nfeature(), 0.0);
  std::vector<int> weakRankerChecked(data.nfeature(), 0);
  // Precompute weak rankers: metric score of ranking each query by a single feature.
  std::vector<std::vector<double> > weakRankerScores(data.nquery(), std::vector<double>(data.nfeature(), 0.0));
  for (int q = 0; q < data.nquery(); q++) {
    for (int f = 0; f < data.nfeature(); f++) {
      weakRankerScores[q][f] = metric.measure(weakRank(data.getQuery(q), f));
    }
  }
  // A log of ranker scores.
  std::vector<double> scores(1, metric.measure(rank(data)));
  // Iterate over each feature.
  std::vector<double> queryWeights(data.nquery(), 1.0 / data.nquery());
  for (int t = 0; t < data.nfeature(); t++) {
    // Compute the query-weighted score of each unchecked weak ranker.
    std::vector<double> weightedScores(data.nfeature(), 0);
    for (int f = 0; f < data.nfeature(); f++) {
      if (weakRankerChecked[f] == 0) {
        for (int q = 0; q < data.nquery(); q++) {
          weightedScores[f] += weakRankerScores[q][f] * queryWeights[q];
        }
      }
    }
    // Find the best weak ranker.
    int bestWR = 0;
    for (int f = 0; f < data.nfeature(); f++) {
      if (weightedScores[f] > weightedScores[bestWR]) {
        bestWR = f;
      }
    }
    // Compute alpha, the weight of the chosen weak ranker.
    double num = 0.0;
    double den = 0.0;
    for (int q = 0; q < data.nquery(); q++) {
      num += queryWeights[q] * (1.0 + weakRankerScores[q][bestWR]);
      den += queryWeights[q] * (1.0 - weakRankerScores[q][bestWR]);
    }
    double alpha = 0.5 * log(num / den);
    weakRankerWeights[bestWR] = alpha;
    weakRankerChecked[bestWR] = 1;
    // Only update query weights if the new weak ranker improves overall performance.
    double newScore = metric.measure(rank(data));
    if ((newScore - scores.back()) > 1e-4) {
      scores.push_back(newScore);
      // Re-weight queries: poorly ranked queries get more weight.
      std::vector<double> queryProbs;
      for (int q = 0; q < data.nquery(); q++) {
        queryProbs.push_back(exp(-metric.measure(rank(data.getQuery(q)))));
      }
      double totalQueryProb = std::accumulate(queryProbs.begin(), queryProbs.end(), 0.0);
      for (int q = 0; q < data.nquery(); q++) {
        queryWeights[q] = queryProbs[q] / totalQueryProb;
      }
    } else {
      weakRankerWeights[bestWR] = 0.0;
    }
  }
}
static void update_freed(uint64_t freed, TR_AllocationKind allocationKind)
{
  if (_enabled & allocationKind)
    _currentMemUsage.update_freed(freed);
}
void Labeler::train(const string& trainFile, const string& devFile, const string& testFile,
                    const string& modelFile, const string& optionFile,
                    const string& wordEmbFile, const string& charEmbFile)
{
  if (optionFile != "")
    m_options.load(optionFile);
  m_options.showOptions();
  m_linearfeat = 0;
  vector<Instance> trainInsts, devInsts, testInsts;
  static vector<Instance> decodeInstResults;
  static Instance curDecodeInst;
  bool bCurIterBetter = false;
  m_pipe.readInstances(trainFile, trainInsts, m_options.maxInstance);
  if (devFile != "")
    m_pipe.readInstances(devFile, devInsts, m_options.maxInstance);
  if (testFile != "")
    m_pipe.readInstances(testFile, testInsts, m_options.maxInstance);
  //Ensure that each file in m_options.testFiles exists!
  vector<vector<Instance> > otherInsts(m_options.testFiles.size());
  for (int idx = 0; idx < m_options.testFiles.size(); idx++) {
    m_pipe.readInstances(m_options.testFiles[idx], otherInsts[idx], m_options.maxInstance);
  }
  //std::cout << "Training example number: " << trainInsts.size() << std::endl;
  //std::cout << "Dev example number: " << trainInsts.size() << std::endl;
  //std::cout << "Test example number: " << trainInsts.size() << std::endl;
  createAlphabet(trainInsts);
  if (!m_options.wordEmbFineTune) {
    addTestWordAlpha(devInsts);
    addTestWordAlpha(testInsts);
    for (int idx = 0; idx < otherInsts.size(); idx++) {
      addTestWordAlpha(otherInsts[idx]);
    }
    cout << "Remain words num: " << m_textWordAlphabet.size() << endl;
  }
  if (!m_options.charEmbFineTune) {
    addTestCharAlpha(devInsts);
    addTestCharAlpha(testInsts);
    for (int idx = 0; idx < otherInsts.size(); idx++) {
      addTestCharAlpha(otherInsts[idx]);
    }
    cout << "Remain char num: " << m_charAlphabet.size() << endl;
  }
  NRMat<double> wordEmb;
  if (wordEmbFile != "") {
    readWordEmbeddings(wordEmbFile, wordEmb);
  } else {
    wordEmb.resize(m_textWordAlphabet.size(), m_options.wordEmbSize);
    wordEmb.randu(1000);
  }
  NRMat<double> charEmb;
  if (charEmbFile != "") {
    readWordEmbeddings(charEmbFile, charEmb);
  } else {
    charEmb.resize(m_charAlphabet.size(), m_options.charEmbSize);
    charEmb.randu(1001);
  }
  m_classifier.init(wordEmb, m_options.wordcontext, charEmb, m_options.charcontext,
                    m_headWordAlphabet.size(), m_options.wordHiddenSize,
                    m_options.charHiddenSize, m_options.hiddenSize);
  m_classifier.resetRemove(m_options.removePool, m_options.removeCharPool);
  m_classifier.setDropValue(m_options.dropProb);
  m_classifier.setWordEmbFinetune(m_options.wordEmbFineTune, m_options.charEmbFineTune);
  vector<Example> trainExamples, devExamples, testExamples;
  initialExamples(trainInsts, trainExamples);
  initialExamples(devInsts, devExamples);
  initialExamples(testInsts, testExamples);
  vector<int> otherInstNums(otherInsts.size());
  vector<vector<Example> > otherExamples(otherInsts.size());
  for (int idx = 0; idx < otherInsts.size(); idx++) {
    initialExamples(otherInsts[idx], otherExamples[idx]);
    otherInstNums[idx] = otherExamples[idx].size();
  }
  double bestDIS = 0;
  int inputSize = trainExamples.size();
  srand(0);
  std::vector<int> indexes;
  for (int i = 0; i < inputSize; ++i)
    indexes.push_back(i);
  static Metric eval, metric_dev, metric_test;
  static vector<Example> subExamples;
  int devNum = devExamples.size(), testNum = testExamples.size();
  int maxIter = m_options.maxIter;
  if (m_options.batchSize > 1)
    maxIter = m_options.maxIter * (inputSize / m_options.batchSize + 1);
  double cost = 0.0;
  std::cout << "maxIter = " << maxIter << std::endl;
  for (int iter = 0; iter < m_options.maxIter; ++iter) {
    std::cout << "##### Iteration " << iter << std::endl;
    eval.reset();
    if (m_options.batchSize == 1) {
      random_shuffle(indexes.begin(), indexes.end());
      for (int updateIter = 0; updateIter < inputSize; updateIter++) {
        subExamples.clear();
        int start_pos = updateIter;
        int end_pos = (updateIter + 1);
        if (end_pos > inputSize)
          end_pos = inputSize;
        for (int idy = start_pos; idy < end_pos; idy++) {
          subExamples.push_back(trainExamples[indexes[idy]]);
        }
        int curUpdateIter = iter * inputSize + updateIter;
        cost = m_classifier.process(subExamples, curUpdateIter);
        eval.overall_label_count += m_classifier._eval.overall_label_count;
        eval.correct_label_count += m_classifier._eval.correct_label_count;
        if ((curUpdateIter + 1) % m_options.verboseIter == 0) {
          //m_classifier.checkgrads(subExamples, curUpdateIter+1);
          std::cout << "current: " << updateIter + 1 << ", total instances: " << inputSize << std::endl;
          std::cout << "Cost = " << cost << ", SA Correct(%) = " << eval.getAccuracy() << std::endl;
        }
        m_classifier.updateParams(m_options.regParameter, m_options.adaAlpha, m_options.adaEps);
      }
    } else {
      cost = 0.0;
      for (int updateIter = 0; updateIter < m_options.verboseIter; updateIter++) {
        random_shuffle(indexes.begin(), indexes.end());
        subExamples.clear();
        for (int idy = 0; idy < m_options.batchSize; idy++) {
          subExamples.push_back(trainExamples[indexes[idy]]);
        }
        int curUpdateIter = iter * m_options.verboseIter + updateIter;
        cost += m_classifier.process(subExamples, curUpdateIter);
        //m_classifier.checkgrads(subExamples, curUpdateIter);
        eval.overall_label_count += m_classifier._eval.overall_label_count;
        eval.correct_label_count += m_classifier._eval.correct_label_count;
        m_classifier.updateParams(m_options.regParameter, m_options.adaAlpha, m_options.adaEps);
      }
      std::cout << "current iter: " << iter + 1 << ", total iter: " << maxIter << std::endl;
      std::cout << "Cost = " << cost << ", SA Correct(%) = " << eval.getAccuracy() << std::endl;
    }
    if (devNum > 0) {
      bCurIterBetter = false;
      if (!m_options.outBest.empty())
        decodeInstResults.clear();
      metric_dev.reset();
      for (int idx = 0; idx < devExamples.size(); idx++) {
        string result_label;
        double confidence = predict(devExamples[idx].m_features, result_label);
        devInsts[idx].Evaluate(result_label, metric_dev);
        if (!m_options.outBest.empty()) {
          curDecodeInst.copyValuesFrom(devInsts[idx]);
          curDecodeInst.assignLabel(result_label, confidence);
          decodeInstResults.push_back(curDecodeInst);
        }
      }
      metric_dev.print();
      if (!m_options.outBest.empty() && metric_dev.getAccuracy() > bestDIS) {
        m_pipe.outputAllInstances(devFile + m_options.outBest, decodeInstResults);
        bCurIterBetter = true;
      }
      if (testNum > 0) {
        if (!m_options.outBest.empty())
          decodeInstResults.clear();
        metric_test.reset();
        for (int idx = 0; idx < testExamples.size(); idx++) {
          string result_label;
          double confidence = predict(testExamples[idx].m_features, result_label);
          testInsts[idx].Evaluate(result_label, metric_test);
          if (bCurIterBetter && !m_options.outBest.empty()) {
            curDecodeInst.copyValuesFrom(testInsts[idx]);
            curDecodeInst.assignLabel(result_label, confidence);
            decodeInstResults.push_back(curDecodeInst);
          }
        }
        std::cout << "test:" << std::endl;
        metric_test.print();
        if (!m_options.outBest.empty() && bCurIterBetter) {
          m_pipe.outputAllInstances(testFile + m_options.outBest, decodeInstResults);
        }
      }
      for (int idx = 0; idx < otherExamples.size(); idx++) {
        std::cout << "processing " << m_options.testFiles[idx] << std::endl;
        if (!m_options.outBest.empty())
          decodeInstResults.clear();
        metric_test.reset();
        for (int idy = 0; idy < otherExamples[idx].size(); idy++) {
          string result_label;
          double confidence = predict(otherExamples[idx][idy].m_features, result_label);
          otherInsts[idx][idy].Evaluate(result_label, metric_test);
          if (bCurIterBetter && !m_options.outBest.empty()) {
            curDecodeInst.copyValuesFrom(otherInsts[idx][idy]);
            curDecodeInst.assignLabel(result_label, confidence);
            decodeInstResults.push_back(curDecodeInst);
          }
        }
        std::cout << "test:" << std::endl;
        metric_test.print();
        if (!m_options.outBest.empty() && bCurIterBetter) {
          m_pipe.outputAllInstances(m_options.testFiles[idx] + m_options.outBest, decodeInstResults);
        }
      }
      if (m_options.saveIntermediate && metric_dev.getAccuracy() > bestDIS) {
        std::cout << "Exceeds best previous performance of " << bestDIS
                  << ". Saving model file.." << std::endl;
        bestDIS = metric_dev.getAccuracy();
        writeModelFile(modelFile);
      }
    }
    // Clear gradients
  }
  // Final evaluation after the last iteration, mirroring the per-iteration block above.
  if (devNum > 0) {
    bCurIterBetter = false;
    if (!m_options.outBest.empty())
      decodeInstResults.clear();
    metric_dev.reset();
    for (int idx = 0; idx < devExamples.size(); idx++) {
      string result_label;
      double confidence = predict(devExamples[idx].m_features, result_label);
      devInsts[idx].Evaluate(result_label, metric_dev);
      if (!m_options.outBest.empty()) {
        curDecodeInst.copyValuesFrom(devInsts[idx]);
        curDecodeInst.assignLabel(result_label, confidence);
        decodeInstResults.push_back(curDecodeInst);
      }
    }
    metric_dev.print();
    if (!m_options.outBest.empty() && metric_dev.getAccuracy() > bestDIS) {
      m_pipe.outputAllInstances(devFile + m_options.outBest, decodeInstResults);
      bCurIterBetter = true;
    }
    if (testNum > 0) {
      if (!m_options.outBest.empty())
        decodeInstResults.clear();
      metric_test.reset();
      for (int idx = 0; idx < testExamples.size(); idx++) {
        string result_label;
        double confidence = predict(testExamples[idx].m_features, result_label);
        testInsts[idx].Evaluate(result_label, metric_test);
        if (bCurIterBetter && !m_options.outBest.empty()) {
          curDecodeInst.copyValuesFrom(testInsts[idx]);
          curDecodeInst.assignLabel(result_label, confidence);
          decodeInstResults.push_back(curDecodeInst);
        }
      }
      std::cout << "test:" << std::endl;
      metric_test.print();
      if (!m_options.outBest.empty() && bCurIterBetter) {
        m_pipe.outputAllInstances(testFile + m_options.outBest, decodeInstResults);
      }
    }
    for (int idx = 0; idx < otherExamples.size(); idx++) {
      std::cout << "processing " << m_options.testFiles[idx] << std::endl;
      if (!m_options.outBest.empty())
        decodeInstResults.clear();
      metric_test.reset();
      for (int idy = 0; idy < otherExamples[idx].size(); idy++) {
        string result_label;
        double confidence = predict(otherExamples[idx][idy].m_features, result_label);
        otherInsts[idx][idy].Evaluate(result_label, metric_test);
        if (bCurIterBetter && !m_options.outBest.empty()) {
          curDecodeInst.copyValuesFrom(otherInsts[idx][idy]);
          curDecodeInst.assignLabel(result_label, confidence);
          decodeInstResults.push_back(curDecodeInst);
        }
      }
      std::cout << "test:" << std::endl;
      metric_test.print();
      if (!m_options.outBest.empty() && bCurIterBetter) {
        m_pipe.outputAllInstances(m_options.testFiles[idx] + m_options.outBest, decodeInstResults);
      }
    }
    if (m_options.saveIntermediate && metric_dev.getAccuracy() > bestDIS) {
      std::cout << "Exceeds best previous performance of " << bestDIS
                << ". Saving model file.." << std::endl;
      bestDIS = metric_dev.getAccuracy();
      writeModelFile(modelFile);
    }
  } else {
    writeModelFile(modelFile);
  }
}
bool testPowerMetrics()
{
  unsigned int nbok = 0;
  unsigned int nb = 0;

  trace.beginBlock( "Testing separable weighted metrics ..." );
  Z2i::Point a(0,0), bbis(4,1), b(5,0), bb(5,-10), bbb(5,5), c(10,0);
  Z2i::Point d(5,-6);
  Z2i::Point starting(0,5), endpoint(10,5);

  typedef ExactPredicateLpPowerSeparableMetric<Z2i::Space, 2> Metric;
  Metric metric;

  trace.info() << "a= " << a << std::endl;
  trace.info() << "b= " << b << std::endl;
  trace.info() << "bb= " << bb << std::endl;
  trace.info() << "bbb= " << bbb << std::endl;
  trace.info() << "c= " << c << std::endl;
  trace.info() << "d= " << d << std::endl;

  bool closer = (metric.closestPower(bbis, a, 0, c, 0) == DGtal::ClosestFIRST);
  nbok += (closer) ? 1 : 0;
  nb++;
  trace.info() << "(" << nbok << "/" << nb << ") " << "a is closer" << std::endl;

  closer = (metric.closestPower(bbis, a, 10, c, 35) == DGtal::ClosestFIRST);
  nbok += (!closer) ? 1 : 0;
  nb++;
  trace.info() << "(" << nbok << "/" << nb << ") " << "c is closer with w_a=10 w_c=35" << std::endl;
  trace.endBlock();

  trace.beginBlock("Testing Hidden with w=0");
  bool hidden = metric.hiddenByPower(a, 0, b, 0, c, 0, starting, endpoint, 0);
  nbok += (!hidden) ? 1 : 0;
  nb++;
  trace.info() << "(" << nbok << "/" << nb << ") " << "(a,b,c) returns false" << std::endl;

  hidden = metric.hiddenByPower(a, 0, bb, 0, c, 0, starting, endpoint, 0);
  nbok += (hidden) ? 1 : 0;
  nb++;
  trace.info() << "(" << nbok << "/" << nb << ") " << "(a,bb,c) returns true" << std::endl;

  hidden = metric.hiddenByPower(a, 0, bbb, 0, c, 0, starting, endpoint, 0);
  nbok += (!hidden) ? 1 : 0;
  nb++;
  trace.info() << "(" << nbok << "/" << nb << ") " << "(a,bbb,c) returns false" << std::endl;

  hidden = metric.hiddenByPower(a, 0, d, 0, c, 0, starting, endpoint, 0);
  nbok += (hidden) ? 1 : 0;
  nb++;
  trace.info() << "(" << nbok << "/" << nb << ") " << "(a,d,c) returns true" << std::endl;
  trace.endBlock();

  trace.beginBlock("Testing Hidden with w!=0");
  hidden = metric.hiddenByPower(a, 0, d, 30, c, 0, starting, endpoint, 0);
  nbok += (hidden) ? 1 : 0;
  nb++;
  trace.info() << "(" << nbok << "/" << nb << ") " << "(a,0,d,30,c,0) returns true" << std::endl;

  hidden = metric.hiddenByPower(a, 10, d, 10, c, 10, starting, endpoint, 0);
  nbok += (hidden) ? 1 : 0;
  nb++;
  trace.info() << "(" << nbok << "/" << nb << ") " << "(a,10,d,10,c,10) returns true" << std::endl;
  trace.endBlock();

  return nbok == nb;
}
void Filler::create_spawns(GEntity* ge, MElementOctree* octree, Node* node, std::vector<Node*>& spawns)
{
  double x, y, z;
  double x1, y1, z1, x2, y2, z2, x3, y3, z3;
  double x4, y4, z4, x5, y5, z5, x6, y6, z6;
  double h;
  double h1, h2, h3, h4, h5, h6;
  Metric m;
  SPoint3 point;

  point = node->get_point();
  x = point.x();
  y = point.y();
  z = point.z();
  h = node->get_size();
  m = node->get_metric();

  // Spawn one candidate node in each direction of the three metric axes.
  h1 = improvement(ge, octree, point, h, SVector3(m.get_m11(), m.get_m21(), m.get_m31()));
  x1 = x + h1*m.get_m11(); y1 = y + h1*m.get_m21(); z1 = z + h1*m.get_m31();

  h2 = improvement(ge, octree, point, h, SVector3(-m.get_m11(), -m.get_m21(), -m.get_m31()));
  x2 = x - h2*m.get_m11(); y2 = y - h2*m.get_m21(); z2 = z - h2*m.get_m31();

  h3 = improvement(ge, octree, point, h, SVector3(m.get_m12(), m.get_m22(), m.get_m32()));
  x3 = x + h3*m.get_m12(); y3 = y + h3*m.get_m22(); z3 = z + h3*m.get_m32();

  h4 = improvement(ge, octree, point, h, SVector3(-m.get_m12(), -m.get_m22(), -m.get_m32()));
  x4 = x - h4*m.get_m12(); y4 = y - h4*m.get_m22(); z4 = z - h4*m.get_m32();

  h5 = improvement(ge, octree, point, h, SVector3(m.get_m13(), m.get_m23(), m.get_m33()));
  x5 = x + h5*m.get_m13(); y5 = y + h5*m.get_m23(); z5 = z + h5*m.get_m33();

  h6 = improvement(ge, octree, point, h, SVector3(-m.get_m13(), -m.get_m23(), -m.get_m33()));
  x6 = x - h6*m.get_m13(); y6 = y - h6*m.get_m23(); z6 = z - h6*m.get_m33();

  *spawns[0] = Node(SPoint3(x1, y1, z1));
  *spawns[1] = Node(SPoint3(x2, y2, z2));
  *spawns[2] = Node(SPoint3(x3, y3, z3));
  *spawns[3] = Node(SPoint3(x4, y4, z4));
  *spawns[4] = Node(SPoint3(x5, y5, z5));
  *spawns[5] = Node(SPoint3(x6, y6, z6));
}
void Filler::print_node(Node* node, std::ofstream& file)
{
  double x, y, z;
  double x1, y1, z1, x2, y2, z2, x3, y3, z3;
  double x4, y4, z4, x5, y5, z5, x6, y6, z6;
  double h;
  Metric m;
  SPoint3 point;

  point = node->get_point();
  x = point.x();
  y = point.y();
  z = point.z();
  h = node->get_size();
  m = node->get_metric();

  x1 = x + k1*h*m.get_m11(); y1 = y + k1*h*m.get_m21(); z1 = z + k1*h*m.get_m31();
  x2 = x - k1*h*m.get_m11(); y2 = y - k1*h*m.get_m21(); z2 = z - k1*h*m.get_m31();
  x3 = x + k1*h*m.get_m12(); y3 = y + k1*h*m.get_m22(); z3 = z + k1*h*m.get_m32();
  x4 = x - k1*h*m.get_m12(); y4 = y - k1*h*m.get_m22(); z4 = z - k1*h*m.get_m32();
  x5 = x + k1*h*m.get_m13(); y5 = y + k1*h*m.get_m23(); z5 = z + k1*h*m.get_m33();
  x6 = x - k1*h*m.get_m13(); y6 = y - k1*h*m.get_m23(); z6 = z - k1*h*m.get_m33();

  print_segment(SPoint3(x, y, z), SPoint3(x1, y1, z1), file);
  print_segment(SPoint3(x, y, z), SPoint3(x2, y2, z2), file);
  print_segment(SPoint3(x, y, z), SPoint3(x3, y3, z3), file);
  print_segment(SPoint3(x, y, z), SPoint3(x4, y4, z4), file);
  print_segment(SPoint3(x, y, z), SPoint3(x5, y5, z5), file);
  print_segment(SPoint3(x, y, z), SPoint3(x6, y6, z6), file);
}
// all linear features are extracted from positive examples
int Segmentor::createAlphabet(const vector<Instance>& vecInsts) {
  cout << "Creating Alphabet..." << endl;

  int numInstance = vecInsts.size();
  hash_map<string, int> word_stat;
  hash_map<string, int> char_stat;
  hash_map<string, int> bichar_stat;
  hash_map<string, int> action_stat;
  hash_map<string, int> feat_stat;
  assert(numInstance > 0);

  static Metric eval;
  static CStateItem state[m_classifier.MAX_SENTENCE_SIZE];
  static Feature feat;
  static vector<string> output;
  static CAction answer;
  static int actionNum;
  m_classifier.initAlphabet();
  eval.reset();

  for (numInstance = 0; numInstance < vecInsts.size(); numInstance++) {
    const Instance &instance = vecInsts[numInstance];
    for (int idx = 0; idx < instance.wordsize(); idx++) {
      word_stat[normalize_to_lowerwithdigit(instance.words[idx])]++;
    }
    for (int idx = 0; idx < instance.charsize(); idx++) {
      char_stat[instance.chars[idx]]++;
    }
    for (int idx = 0; idx < instance.charsize() - 1; idx++) {
      bichar_stat[instance.chars[idx] + instance.chars[idx + 1]]++;
    }
    bichar_stat[instance.chars[instance.charsize() - 1] + m_classifier.fe.nullkey]++;
    bichar_stat[m_classifier.fe.nullkey + instance.chars[0]]++;

    actionNum = 0;
    state[actionNum].initSentence(&instance.chars);
    state[actionNum].clear();
    while (!state[actionNum].IsTerminated()) {
      state[actionNum].getGoldAction(instance.words, answer);
      action_stat[answer.str()]++;
      m_classifier.extractFeature(state + actionNum, answer, feat);
      for (int idx = 0; idx < feat._strSparseFeat.size(); idx++) {
        feat_stat[feat._strSparseFeat[idx]]++;
      }
      state[actionNum].move(state + actionNum + 1, answer);
      actionNum++;
    }
    if (actionNum - 1 != instance.charsize()) {
      std::cout << "action number is not correct, please check" << std::endl;
    }
    state[actionNum].getSegResults(output);
    instance.evaluate(output, eval);
    if (!eval.bIdentical()) {
      std::cout << "error state conversion!" << std::endl;
      exit(0);
    }
    if ((numInstance + 1) % m_options.verboseIter == 0) {
      cout << numInstance + 1 << " ";
      if ((numInstance + 1) % (40 * m_options.verboseIter) == 0)
        cout << std::endl;
      cout.flush();
    }
    if (m_options.maxInstance > 0 && numInstance == m_options.maxInstance)
      break;
  }

  m_classifier.addToActionAlphabet(action_stat);
  m_classifier.addToWordAlphabet(word_stat, m_options.wordEmbFineTune ? m_options.wordCutOff : 0);
  m_classifier.addToCharAlphabet(char_stat, m_options.charEmbFineTune ? m_options.charCutOff : 0);
  m_classifier.addToBiCharAlphabet(bichar_stat, m_options.tagEmbFineTune ? m_options.tagCutOff : 0);
  m_classifier.addToFeatureAlphabet(feat_stat, m_options.featCutOff);

  cout << numInstance << " " << endl;
  cout << "Action num: " << m_classifier.fe._actionAlphabet.size() << endl;
  cout << "Total word num: " << word_stat.size() << endl;
  cout << "Total char num: " << char_stat.size() << endl;
  cout << "Total bichar num: " << bichar_stat.size() << endl;
  cout << "Total feat num: " << feat_stat.size() << endl;
  cout << "Remain word num: " << m_classifier.fe._wordAlphabet.size() << endl;
  cout << "Remain char num: " << m_classifier.fe._charAlphabet.size() << endl;
  cout << "Remain bichar num: " << m_classifier.fe._bicharAlphabet.size() << endl;
  cout << "Remain feat num: " << m_classifier.fe._featAlphabet.size() << endl;
  //m_classifier.setFeatureCollectionState(false);
  return 0;
}
const std::string RmmKeyGenerator::generate_key(const Metric& metric, const Resource& resource) {
  const auto& resource_key = resource.get_unique_key();
  return resource_key + metric.get_component().to_string() + metric.get_name();
}
// all linear features are extracted from positive examples
int Segmentor::createAlphabet(const vector<Instance>& vecInsts) {
  cout << "Creating Alphabet..." << endl;

  int numInstance = vecInsts.size();
  hash_map<string, int> action_stat;
  hash_map<string, int> feat_stat;
  hash_map<string, int> postag_stat;
  assert(numInstance > 0);

  static Metric segEval, posEval;
  static CStateItem state[m_classifier.MAX_SENTENCE_SIZE];
  static Feature feat;
  static CResult output;
  static CAction answer;
  static int actionNum;
  m_classifier.initAlphabet();
  segEval.reset();
  posEval.reset();

  int maxFreqChar = -1;
  int maxFreqWord = -1;

  // First pass: collect POS tags and word/POS constraints.
  for (numInstance = 0; numInstance < vecInsts.size(); numInstance++) {
    const Instance &instance = vecInsts[numInstance];
    for (int idx = 0; idx < instance.postagsize(); idx++) {
      postag_stat[instance.postags[idx]];
      m_classifier.fe._tagConstraints.addWordPOSPair(instance.words[idx], instance.postags[idx]);
    }
  }
  m_classifier.addToPostagAlphabet(postag_stat);

  // Second pass: replay the gold action sequence to collect actions and features.
  for (numInstance = 0; numInstance < vecInsts.size(); numInstance++) {
    const Instance &instance = vecInsts[numInstance];
    actionNum = 0;
    state[actionNum].initSentence(&instance.chars, &instance.candidateLabels);
    state[actionNum].clear();
    while (!state[actionNum].IsTerminated()) {
      state[actionNum].getGoldAction(instance, m_classifier.fe._postagAlphabet, answer);
      action_stat[answer.str()]++;
      m_classifier.extractFeature(state + actionNum, answer, feat);
      for (int idx = 0; idx < feat._strSparseFeat.size(); idx++) {
        feat_stat[feat._strSparseFeat[idx]]++;
      }
      state[actionNum].move(state + actionNum + 1, answer, m_classifier.fe._postagAlphabet);
      actionNum++;
    }
    if (actionNum - 1 != instance.charsize()) {
      std::cout << "action number is not correct, please check" << std::endl;
    }
    state[actionNum].getSegPosResults(output);
    instance.evaluate(output, segEval, posEval);
    if (!segEval.bIdentical() || !posEval.bIdentical()) {
      std::cout << "error state conversion!" << std::endl;
      std::cout << "gold instance:" << std::endl;
      for (int tmpK = 0; tmpK < instance.words.size(); tmpK++) {
        std::cout << instance.words[tmpK] << "_" << instance.postags[tmpK] << " ";
      }
      std::cout << std::endl;
      std::cout << "predicted instance:" << std::endl;
      for (int tmpK = 0; tmpK < output.size(); tmpK++) {
        std::cout << output.words[tmpK] << "_" << output.postags[tmpK] << " ";
      }
      std::cout << std::endl;
      exit(0);
    }
    if ((numInstance + 1) % m_options.verboseIter == 0) {
      cout << numInstance + 1 << " ";
      if ((numInstance + 1) % (40 * m_options.verboseIter) == 0)
        cout << std::endl;
      cout.flush();
    }
    if (m_options.maxInstance > 0 && numInstance == m_options.maxInstance)
      break;
  }

  m_classifier.addToActionAlphabet(action_stat);
  m_classifier.addToFeatureAlphabet(feat_stat, m_options.featCutOff);

  cout << numInstance << " " << endl;
  cout << "Action num: " << m_classifier.fe._actionAlphabet.size() << endl;
  cout << "Pos num: " << m_classifier.fe._postagAlphabet.size() << endl;
  cout << "Total feat num: " << feat_stat.size() << endl;
  cout << "Remain feat num: " << m_classifier.fe._featAlphabet.size() << endl;
  //m_classifier.setFeatureCollectionState(false);
  return 0;
}
void Labeler::train(const string& trainFile, const string& devFile, const string& testFile,
                    const string& modelFile, const string& optionFile, const string& wordEmbFile) {
  if (optionFile != "")
    m_options.load(optionFile);
  m_options.showOptions();

  vector<Instance> trainInsts, devInsts, testInsts;
  static vector<Instance> decodeInstResults;
  static Instance curDecodeInst;
  bool bCurIterBetter = false;

  m_pipe.readInstances(trainFile, trainInsts, m_options.maxInstance);
  if (devFile != "")
    m_pipe.readInstances(devFile, devInsts, m_options.maxInstance);
  if (testFile != "")
    m_pipe.readInstances(testFile, testInsts, m_options.maxInstance);

  //Ensure that each file in m_options.testFiles exists!
  vector<vector<Instance> > otherInsts(m_options.testFiles.size());
  for (int idx = 0; idx < m_options.testFiles.size(); idx++) {
    m_pipe.readInstances(m_options.testFiles[idx], otherInsts[idx], m_options.maxInstance);
  }

  //std::cout << "Training example number: " << trainInsts.size() << std::endl;
  //std::cout << "Dev example number: " << trainInsts.size() << std::endl;
  //std::cout << "Test example number: " << trainInsts.size() << std::endl;

  createAlphabet(trainInsts);
  if (!m_options.wordEmbFineTune) {
    addTestWordAlpha(devInsts);
    addTestWordAlpha(testInsts);
    for (int idx = 0; idx < otherInsts.size(); idx++) {
      addTestWordAlpha(otherInsts[idx]);
    }
    cout << "Remain words num: " << m_wordAlphabet.size() << endl;
  }

  NRMat<dtype> wordEmb;
  if (wordEmbFile != "") {
    readWordEmbeddings(wordEmbFile, wordEmb);
  } else {
    wordEmb.resize(m_wordAlphabet.size(), m_options.wordEmbSize);
    wordEmb.randu(1000);
  }

  NRVec<NRMat<dtype> > tagEmbs(m_tagAlphabets.size());
  for (int idx = 0; idx < tagEmbs.size(); idx++) {
    tagEmbs[idx].resize(m_tagAlphabets[idx].size(), m_options.tagEmbSize);
    tagEmbs[idx].randu(1002 + idx);
  }

  m_classifier.init(m_labelAlphabet.size(), m_featAlphabet.size());
  m_classifier.setDropValue(m_options.dropProb);

  vector<Example> trainExamples, devExamples, testExamples;
  initialExamples(trainInsts, trainExamples);
  initialExamples(devInsts, devExamples);
  initialExamples(testInsts, testExamples);
  vector<int> otherInstNums(otherInsts.size());
  vector<vector<Example> > otherExamples(otherInsts.size());
  for (int idx = 0; idx < otherInsts.size(); idx++) {
    initialExamples(otherInsts[idx], otherExamples[idx]);
    otherInstNums[idx] = otherExamples[idx].size();
  }

  dtype bestDIS = 0;
  int inputSize = trainExamples.size();
  int batchBlock = inputSize / m_options.batchSize;
  if (inputSize % m_options.batchSize != 0)
    batchBlock++;

  srand(0);
  std::vector<int> indexes;
  for (int i = 0; i < inputSize; ++i)
    indexes.push_back(i);

  static Metric eval, metric_dev, metric_test;
  static vector<Example> subExamples;
  int devNum = devExamples.size(), testNum = testExamples.size();

  for (int iter = 0; iter < m_options.maxIter; ++iter) {
    std::cout << "##### Iteration " << iter << std::endl;
    random_shuffle(indexes.begin(), indexes.end());
    eval.reset();
    for (int updateIter = 0; updateIter < batchBlock; updateIter++) {
      subExamples.clear();
      int start_pos = updateIter * m_options.batchSize;
      int end_pos = (updateIter + 1) * m_options.batchSize;
      if (end_pos > inputSize)
        end_pos = inputSize;
      for (int idy = start_pos; idy < end_pos; idy++) {
        subExamples.push_back(trainExamples[indexes[idy]]);
      }
      int curUpdateIter = iter * batchBlock + updateIter;
      dtype cost = m_classifier.process(subExamples, curUpdateIter);
      eval.overall_label_count += m_classifier._eval.overall_label_count;
      eval.correct_label_count += m_classifier._eval.correct_label_count;
      if ((curUpdateIter + 1) % m_options.verboseIter == 0) {
        //m_classifier.checkgrads(subExamples, curUpdateIter+1);
        std::cout << "current: " << updateIter + 1 << ", total block: " << batchBlock << std::endl;
        std::cout << "Cost = " << cost << ", Tag Correct(%) = " << eval.getAccuracy() << std::endl;
      }
      m_classifier.updateParams(m_options.regParameter, m_options.adaAlpha, m_options.adaEps);
    }

    if (devNum > 0) {
      bCurIterBetter = false;
      if (!m_options.outBest.empty())
        decodeInstResults.clear();
      metric_dev.reset();
      for (int idx = 0; idx < devExamples.size(); idx++) {
        vector<string> result_labels;
        predict(devExamples[idx].m_features, result_labels, devInsts[idx].words);
        if (m_options.seg)
          devInsts[idx].SegEvaluate(result_labels, metric_dev);
        else
          devInsts[idx].Evaluate(result_labels, metric_dev);
        if (!m_options.outBest.empty()) {
          curDecodeInst.copyValuesFrom(devInsts[idx]);
          curDecodeInst.assignLabel(result_labels);
          decodeInstResults.push_back(curDecodeInst);
        }
      }
      metric_dev.print();
      if (!m_options.outBest.empty() && metric_dev.getAccuracy() > bestDIS) {
        m_pipe.outputAllInstances(devFile + m_options.outBest, decodeInstResults);
        bCurIterBetter = true;
      }

      if (testNum > 0) {
        if (!m_options.outBest.empty())
          decodeInstResults.clear();
        metric_test.reset();
        for (int idx = 0; idx < testExamples.size(); idx++) {
          vector<string> result_labels;
          predict(testExamples[idx].m_features, result_labels, testInsts[idx].words);
          if (m_options.seg)
            testInsts[idx].SegEvaluate(result_labels, metric_test);
          else
            testInsts[idx].Evaluate(result_labels, metric_test);
          if (bCurIterBetter && !m_options.outBest.empty()) {
            curDecodeInst.copyValuesFrom(testInsts[idx]);
            curDecodeInst.assignLabel(result_labels);
            decodeInstResults.push_back(curDecodeInst);
          }
        }
        std::cout << "test:" << std::endl;
        metric_test.print();
        if (!m_options.outBest.empty() && bCurIterBetter) {
          m_pipe.outputAllInstances(testFile + m_options.outBest, decodeInstResults);
        }
      }

      for (int idx = 0; idx < otherExamples.size(); idx++) {
        std::cout << "processing " << m_options.testFiles[idx] << std::endl;
        if (!m_options.outBest.empty())
          decodeInstResults.clear();
        metric_test.reset();
        for (int idy = 0; idy < otherExamples[idx].size(); idy++) {
          vector<string> result_labels;
          predict(otherExamples[idx][idy].m_features, result_labels, otherInsts[idx][idy].words);
          if (m_options.seg)
            otherInsts[idx][idy].SegEvaluate(result_labels, metric_test);
          else
            otherInsts[idx][idy].Evaluate(result_labels, metric_test);
          if (bCurIterBetter && !m_options.outBest.empty()) {
            curDecodeInst.copyValuesFrom(otherInsts[idx][idy]);
            curDecodeInst.assignLabel(result_labels);
            decodeInstResults.push_back(curDecodeInst);
          }
        }
        std::cout << "test:" << std::endl;
        metric_test.print();
        if (!m_options.outBest.empty() && bCurIterBetter) {
          m_pipe.outputAllInstances(m_options.testFiles[idx] + m_options.outBest, decodeInstResults);
        }
      }

      if (m_options.saveIntermediate && metric_dev.getAccuracy() > bestDIS) {
        std::cout << "Exceeds best previous performance of " << bestDIS << ". Saving model file.." << std::endl;
        bestDIS = metric_dev.getAccuracy();
        writeModelFile(modelFile);
      }
    }
    // Clear gradients
  }
}
void Segmentor::train(const string& trainFile, const string& devFile, const string& testFile,
                      const string& modelFile, const string& optionFile,
                      const string& wordEmbFile, const string& charEmbFile, const string& bicharEmbFile) {
  if (optionFile != "")
    m_options.load(optionFile);
  m_options.showOptions();

  vector<Instance> trainInsts, devInsts, testInsts;
  m_pipe.readInstances(trainFile, trainInsts, m_classifier.MAX_SENTENCE_SIZE - 2, m_options.maxInstance);
  if (devFile != "")
    m_pipe.readInstances(devFile, devInsts, m_classifier.MAX_SENTENCE_SIZE - 2, m_options.maxInstance);
  if (testFile != "")
    m_pipe.readInstances(testFile, testInsts, m_classifier.MAX_SENTENCE_SIZE - 2, m_options.maxInstance);

  vector<vector<Instance> > otherInsts(m_options.testFiles.size());
  for (int idx = 0; idx < m_options.testFiles.size(); idx++) {
    m_pipe.readInstances(m_options.testFiles[idx], otherInsts[idx], m_classifier.MAX_SENTENCE_SIZE - 2, m_options.maxInstance);
  }

  createAlphabet(trainInsts);
  addTestWordAlpha(devInsts);
  addTestWordAlpha(testInsts);

  NRMat<dtype> wordEmb, allwordEmb;
  if (wordEmbFile != "") {
    allWordAlphaEmb(wordEmbFile, allwordEmb);
  } else {
    std::cout << "should not reach here: the all-word embedding must be pretrained." << std::endl;
  }
  wordEmb.resize(m_classifier.fe._wordAlphabet.size(), m_options.wordEmbSize);
  wordEmb.randu(1000);
  cout << "word emb dim is " << wordEmb.ncols() << std::endl;

  NRMat<dtype> charEmb;
  if (charEmbFile != "") {
    readEmbeddings(m_classifier.fe._charAlphabet, charEmbFile, charEmb);
  } else {
    charEmb.resize(m_classifier.fe._charAlphabet.size(), m_options.charEmbSize);
    charEmb.randu(2000);
  }
  cout << "char emb dim is " << charEmb.ncols() << std::endl;

  NRMat<dtype> bicharEmb;
  if (bicharEmbFile != "") {
    readEmbeddings(m_classifier.fe._bicharAlphabet, bicharEmbFile, bicharEmb);
  } else {
    bicharEmb.resize(m_classifier.fe._bicharAlphabet.size(), m_options.bicharEmbSize);
    bicharEmb.randu(2000);
  }
  cout << "bichar emb dim is " << bicharEmb.ncols() << std::endl;

  NRMat<dtype> actionEmb;
  actionEmb.resize(m_classifier.fe._actionAlphabet.size(), m_options.actionEmbSize);
  actionEmb.randu(3000);
  cout << "action emb dim is " << actionEmb.ncols() << std::endl;

  NRMat<dtype> lengthEmb;
  lengthEmb.resize(6, m_options.lengthEmbSize);
  lengthEmb.randu(3000);
  cout << "length emb dim is " << lengthEmb.ncols() << std::endl;  // was actionEmb.ncols(): copy-paste bug

  m_classifier.init(wordEmb, allwordEmb, lengthEmb, m_options.wordNgram, m_options.wordHiddenSize, m_options.wordRNNHiddenSize,
                    charEmb, bicharEmb, m_options.charcontext, m_options.charHiddenSize, m_options.charRNNHiddenSize,
                    actionEmb, m_options.actionNgram, m_options.actionHiddenSize, m_options.actionRNNHiddenSize,
                    m_options.sepHiddenSize, m_options.appHiddenSize, m_options.delta);
  m_classifier.setDropValue(m_options.dropProb);
  m_classifier.setOOVFreq(m_options.wordCutOff);
  m_classifier.setOOVRatio(m_options.oovRatio);
  m_classifier.setWordFreq(m_word_stat);

  vector<vector<CAction> > trainInstGoldactions;
  getGoldActions(trainInsts, trainInstGoldactions);
  double bestFmeasure = 0;

  int inputSize = trainInsts.size();
  std::vector<int> indexes;
  for (int i = 0; i < inputSize; ++i)
    indexes.push_back(i);

  static Metric eval, metric_dev, metric_test;
  int maxIter = m_options.maxIter * (inputSize / m_options.batchSize + 1);
  int oneIterMaxRound = (inputSize + m_options.batchSize - 1) / m_options.batchSize;
  std::cout << "maxIter = " << maxIter << std::endl;
  int devNum = devInsts.size(), testNum = testInsts.size();

  static vector<vector<string> > decodeInstResults;
  static vector<string> curDecodeInst;
  static bool bCurIterBetter;
  static vector<vector<string> > subInstances;
  static vector<vector<CAction> > subInstGoldActions;

  for (int iter = 0; iter < maxIter; ++iter) {
    std::cout << "##### Iteration " << iter << std::endl;
    srand(iter);
    random_shuffle(indexes.begin(), indexes.end());
    std::cout << "random: " << indexes[0] << ", " << indexes[indexes.size() - 1] << std::endl;
    bool bEvaluate = false;

    if (m_options.batchSize == 1) {
      eval.reset();
      bEvaluate = true;
      for (int idy = 0; idy < inputSize; idy++) {
        subInstances.clear();
        subInstGoldActions.clear();
        subInstances.push_back(trainInsts[indexes[idy]].chars);
        subInstGoldActions.push_back(trainInstGoldactions[indexes[idy]]);
        double cost = m_classifier.train(subInstances, subInstGoldActions);
        eval.overall_label_count += m_classifier._eval.overall_label_count;
        eval.correct_label_count += m_classifier._eval.correct_label_count;
        if ((idy + 1) % (m_options.verboseIter * 10) == 0) {
          std::cout << "current: " << idy + 1 << ", Cost = " << cost << ", Correct(%) = " << eval.getAccuracy() << std::endl;
        }
        m_classifier.updateParams(m_options.regParameter, m_options.adaAlpha, m_options.adaEps, m_options.clip);
      }
      std::cout << "current: " << iter + 1 << ", Correct(%) = " << eval.getAccuracy() << std::endl;
    } else {
      if (iter == 0)
        eval.reset();
      subInstances.clear();
      subInstGoldActions.clear();
      for (int idy = 0; idy < m_options.batchSize; idy++) {
        subInstances.push_back(trainInsts[indexes[idy]].chars);
        subInstGoldActions.push_back(trainInstGoldactions[indexes[idy]]);
      }
      double cost = m_classifier.train(subInstances, subInstGoldActions);
      eval.overall_label_count += m_classifier._eval.overall_label_count;
      eval.correct_label_count += m_classifier._eval.correct_label_count;
      if ((iter + 1) % (m_options.verboseIter) == 0) {
        std::cout << "current: " << iter + 1 << ", Cost = " << cost << ", Correct(%) = " << eval.getAccuracy() << std::endl;
        eval.reset();
        bEvaluate = true;
      }
      m_classifier.updateParams(m_options.regParameter, m_options.adaAlpha, m_options.adaEps, m_options.clip);
    }

    if (bEvaluate && devNum > 0) {
      bCurIterBetter = false;
      if (!m_options.outBest.empty())
        decodeInstResults.clear();
      metric_dev.reset();
      for (int idx = 0; idx < devInsts.size(); idx++) {
        predict(devInsts[idx], curDecodeInst);
        devInsts[idx].evaluate(curDecodeInst, metric_dev);
        if (!m_options.outBest.empty()) {
          decodeInstResults.push_back(curDecodeInst);
        }
      }
      std::cout << "dev:" << std::endl;
      metric_dev.print();
      if (!m_options.outBest.empty() && metric_dev.getAccuracy() > bestFmeasure) {
        m_pipe.outputAllInstances(devFile + m_options.outBest, decodeInstResults);
        bCurIterBetter = true;
      }

      if (testNum > 0) {
        if (!m_options.outBest.empty())
          decodeInstResults.clear();
        metric_test.reset();
        for (int idx = 0; idx < testInsts.size(); idx++) {
          predict(testInsts[idx], curDecodeInst);
          testInsts[idx].evaluate(curDecodeInst, metric_test);
          if (bCurIterBetter && !m_options.outBest.empty()) {
            decodeInstResults.push_back(curDecodeInst);
          }
        }
        std::cout << "test:" << std::endl;
        metric_test.print();
        if (!m_options.outBest.empty() && bCurIterBetter) {
          m_pipe.outputAllInstances(testFile + m_options.outBest, decodeInstResults);
        }
      }

      for (int idx = 0; idx < otherInsts.size(); idx++) {
        std::cout << "processing " << m_options.testFiles[idx] << std::endl;
        if (!m_options.outBest.empty())
          decodeInstResults.clear();
        metric_test.reset();
        for (int idy = 0; idy < otherInsts[idx].size(); idy++) {
          predict(otherInsts[idx][idy], curDecodeInst);
          otherInsts[idx][idy].evaluate(curDecodeInst, metric_test);
          if (bCurIterBetter && !m_options.outBest.empty()) {
            decodeInstResults.push_back(curDecodeInst);
          }
        }
        std::cout << "test:" << std::endl;
        metric_test.print();
        if (!m_options.outBest.empty() && bCurIterBetter) {
          m_pipe.outputAllInstances(m_options.testFiles[idx] + m_options.outBest, decodeInstResults);
        }
      }

      if (m_options.saveIntermediate && metric_dev.getAccuracy() > bestFmeasure) {
        std::cout << "Exceeds best previous performance of " << bestFmeasure << ". Saving model file.." << std::endl;
        bestFmeasure = metric_dev.getAccuracy();
        writeModelFile(modelFile);
      }
    }
  }
}