void HwcmScorer::extractHeadWordChain(TreePointer tree, vector<string> & history, vector<map<string, int> > & hwc)
{
  if (tree->GetLength() > 0) {
    string head = getHead(tree);

    if (head.empty()) {
      for (std::vector<TreePointer>::const_iterator it = tree->GetChildren().begin(); it != tree->GetChildren().end(); ++it) {
        extractHeadWordChain(*it, history, hwc);
      }
    } else {
      vector<string> new_history(kHwcmOrder);
      new_history[0] = head;
      hwc[0][head]++;
      for (size_t hist_idx = 0; hist_idx < kHwcmOrder-1; hist_idx++) {
        if (!history[hist_idx].empty()) {
          string chain = history[hist_idx] + " " + head;
          hwc[hist_idx+1][chain]++;
          if (hist_idx+2 < kHwcmOrder) {
            new_history[hist_idx+1] = chain;
          }
        }
      }
      for (std::vector<TreePointer>::const_iterator it = tree->GetChildren().begin(); it != tree->GetChildren().end(); ++it) {
        extractHeadWordChain(*it, new_history, hwc);
      }
    }
  }
}

string HwcmScorer::getHead(TreePointer tree)
{
  // assumption (only true for dependency parses): each constituent has a preterminal label, and the corresponding terminal is the head
  // if a constituent has multiple preterminals, the first one is picked; if it has no preterminals, an empty string is returned
  for (std::vector<TreePointer>::const_iterator it = tree->GetChildren().begin(); it != tree->GetChildren().end(); ++it) {
    TreePointer child = *it;

    if (child->GetLength() == 1 && child->GetChildren()[0]->IsTerminal()) {
      return child->GetChildren()[0]->GetLabel();
    }
  }
  return "";
}