Classes | |
| class | Buffer |
| Class implements words buffer. More... | |
| class | TaggedLemma |
| Class holds lemma and tag indices together. More... | |
| struct | ContextWindow |
| Type holding context window settings. More... | |
| struct | Sieves |
| Lower bounds for evaluated N-grams. More... | |
| struct | Thresholds |
| Thresholds for various statistical tests evaluations. More... | |
| class | PartOfContext |
| Part of context is smallest unit used for context tracing functionality. More... | |
| class | EvaluationTables |
| Class evaluates contingency and expected frequency tables. More... | |
| class | index_overflow |
| Exception is thrown if incrementation of (unsigned) index would cause overflow of index datatype. More... | |
| class | Filter |
| Abstract class declares some basic interface for counting filter rules matches. More... | |
| class | UniRulesFilter |
| Class declares interface of unirules filter. More... | |
| class | MultiRulesFilter |
| Class declares interface of multirules filter. More... | |
| class | HashVector |
| Class for storage of. More... | |
| class | NGram |
| Class designed to keep single NGram data. More... | |
| class | NGramToken |
| Class purpose is to offer creation of temporary NGram instance to be used as a key value (token) when searching in NGramStore hashtables. More... | |
| class | NGramHashCounter |
| Class provides hashing function for N-grams instances. More... | |
| struct | NGramHasher |
| Functor, which wraps hashing function (NGramHashCounter::count_hash()). More... | |
| class | NGramStore |
| Class serves as NGram(s) store. More... | |
| struct | NGramFrequencies |
| Struct holds table of all *-frequencies of certain N-gram. More... | |
| class | NGramsProcessor |
| NGramsProcessor simplifies gathering of frequency stats of given N-gram (and all its downcasted types) from NGramStore container. More... | |
| class | Notifier |
| Notifier is very simple - just binds itself to given stream and flushes passed notification to that stream. More... | |
| class | ProgressNotifier |
| ProgressNotifier holds static stored timer within, which starts timing of program execution time immediately on program start. More... | |
| class | NTreeNode |
| Class holds information of single indexed-tree node. More... | |
| class | Overflow |
| Class serves as mapping between objects of given type and (some) statistics, which shall be stored in _ValueType. More... | |
| class | Parser |
| Parser object binds itself to open ifstream object, which corresponds to certain input datafile. More... | |
| class | MemoryBlock |
| Memory block is designed to keep large continuous chunk of memory within and provides methods for subsequent transfer of small pieces to client subjects. More... | |
| class | MemoryPool |
| Definition of memory pool (abstract) interface. More... | |
| class | RandomSizedMemoryPool |
| RandomSizedMemoryPool, as its name suggests, allows to transfer various-sized chunks of memory. More... | |
| class | VoidPool |
| VoidPool acts as a pool of void type, although internally its implementation is based on a pool of char(s). More... | |
| class | fatal_error |
| Exception class for indication of fatal error occurrence in process phase. More... | |
| struct | DataFileStats |
| Class implements simple statistics counter. More... | |
| struct | NamedDataFileStats |
| Class extends DataFileStats: adds "label" (for filename). More... | |
| class | StringStore |
| Class stores unique C-strings and lets them be accessed (later) by unique numeric indices of a given type (which should be the smallest unsigned type capable of holding as many values as the number of C-strings expected). More... | |
| class | StringStoreLessComparator |
| Functor designed for less than comparison of C-strings stored within StringStore. More... | |
| class | StringStoreEqualComparator |
| Functor designed for equal to comparison of C-strings stored within StringStore. More... | |
| struct | select1st |
| Template for functor class for function returning 1st pair member. More... | |
| struct | select2nd |
| Template for functor class for function returning 2nd pair member. More... | |
| class | SumDecompositions |
| Class purpose is to hold information about all decompositions of given sum (N) into given number of addends (M). More... | |
| class | Word |
| Word class holds all information for single input word entity. More... | |
Namespaces | |
| namespace | constants |
| All program constants happily live together in this nicely named namespace. | |
| namespace | dependency |
| Special namespace used to keep variables with indices of few special dependency values. | |
| namespace | eval |
| Namespace wraps evaluation functions. | |
| namespace | notifier |
| Module declares (global) variables, which are wrapped into separate namespace. | |
| namespace | persistent |
| Namespace holds (global) variables. | |
| namespace | settings |
| Namespace holds (global) program settings and functions checking their correctness. | |
Typedefs | |
| typedef std::pair < words_store_t::const_iterator, words_store_t::const_iterator > | words_range_t |
| Words range is a range within the buffer, so it's based on word buffer container type (see word.h). | |
| typedef unsigned short | word_order_t |
| This typedef determines max_sentence_length constant definition (maximum allowed value of this type is max sentence length). | |
| typedef unsigned int | string_index_t |
| This type must be capable of holding number of all different strings which are stored in certain store. | |
| typedef unsigned char | ngram_size_t |
| Type used for representation of N value and also for indexing of N-gram members (so must be capable of holding constants::max_grammity value). | |
| typedef unsigned char | dependency_index_t |
| This type must be capable of holding number of all different dependency types which may occur within program run. | |
| typedef unsigned char | tag_index_t |
| This type must be capable of holding number of all different tags which may occur within program run. | |
| typedef unsigned char | ngram_type_t |
| This type must be capable of holding |max_grammity| bits (not the value!). | |
| typedef unsigned int | context_frequency_t |
| Definition of type used for context frequency counter. | |
| typedef std::vector < PartOfContext > | context_t |
| Full context is made out from parts of context. | |
| typedef std::map< std::string, dependency_index_t > | string_to_dependency_t |
| typedef std::vector< std::string > | dependency_to_string_t |
| typedef std::vector< double > | exp_table_t |
| Container acts like table of expected frequencies (which are floating point numbers). | |
| typedef std::string | unirule_t |
| Rule is just a single string. | |
| typedef std::vector< std::string > | multirule_t |
| Rule is a bunch of strings. | |
| typedef std::vector< size_t > | filter_matches_stats_t |
| For each rule there's a counter of values matched. | |
| typedef std::pair < ngram_size_t, words_store_t::const_iterator > | raw_ngram_member_t |
| Typedef for pair of values for each raw N-gram member (parent index and word iterator). | |
| typedef std::vector < raw_ngram_member_t > | raw_ngram_t |
| Raw N-gram is just collection of owned members. | |
| typedef unsigned int | freq_counter_t |
| Frequency counter datatype definition. | |
| typedef size_t | frequency_t |
| Frequency datatype definition. | |
| typedef std::vector< frequency_t > | freq_table_t |
| Container for frequency statistics. | |
| typedef boost::progress_display | progress_display_t |
| Simple progress bar display is provided by one of boost libraries. | |
| typedef std::vector< subtree_t > | subtrees_container_t |
| Vector is used as subtrees container. | |
| typedef std::vector < subtrees_container_t > | subtrees_to_merge_t |
| Sometimes we need container for container of subtrees :). | |
| typedef size_t | tree_size_t |
| Type definition for tree size type used to store NTree indices. | |
| typedef std::vector< tree_size_t > | subtree_t |
| To represent subtrees (and resultant n-trees) vector is used. | |
| typedef std::set< tree_size_t > | childs_t |
| Children of certain node are stored in set, because we want them sorted and there's no need for random access. | |
| typedef std::multimap < tree_size_t, subtree_t > | mapped_subtrees_t |
| Mapped subtrees are subtrees with some extra special information - key. | |
| typedef std::vector< NTreeNode > | ntree_t |
| To represent indexed-tree the std::vector of NTreeNode instances is used. | |
| typedef std::map< std::string, Filter::Stats > | morphologic_filter_file_stats_t |
| Mapping between files and their morphologic filter related statistics. | |
| typedef std::vector< std::string > | filenames_t |
| Container for input filenames. | |
| typedef std::vector < NamedDataFileStats > | stats_t |
| Typedef for container of datafiles statistics. | |
| typedef std::vector < std::vector< size_t > > | sum_decompositions_t |
| The container with tables (vectors of unsigned) which are decompositions of certain number (N) into certain number of addends (M). | |
| typedef std::deque< Word > | words_store_t |
| (Forward) declaration of container used for buffering of words (see buffer.h). | |
| typedef std::vector< double > | scores_t |
| Container for scores and statistics thresholds. | |
Functions | |
| string_to_dependency_t | _init_str2dep (void) |
| Inits string to dependency mapping. | |
| dependency_index_t | string2dependency (std::string &str) |
| Converts given string to index representation of dependency value. | |
| const char * | dependency2cstring (dependency_index_t dependency_index) |
| Converts given index to string representation of dependency value. | |
| double | _log10 (double x) |
| "Override" of std::log10 (return DBL_MIN, if given x is zero). | |
| void | _build_tree (const words_range_t &sentence, ntree_t &nodes) |
| Builds a ntree based on grammar dependency tree of given sentence. | |
| words_store_t::const_iterator | _extract_ngram (words_store_t::const_iterator sentence_start, const subtree_t &subtree, raw_ngram_t &raw_ngram) |
| Extracts raw N-gram related to passed subtree. | |
| NGram * | _store_ngram (const raw_ngram_t &raw_ngram, ngram_type_t type) |
| Stores instance of given N-gram. | |
| void | _store_wide_context (NGram *ngram, const raw_ngram_t &raw_ngram, words_range_t context_range) |
| Stores wide context for given N-gram. | |
| void | _store_raw_ngram (const raw_ngram_t &raw_ngram, const Buffer &buffer, NamedDataFileStats &stats, words_store_t::const_iterator head) |
| Stores all *-types for given raw N-gram and updates related file stats counter. | |
| size_t | extract (std::ifstream &input_file, NamedDataFileStats &stats) |
| Procedure extracts N-grams from given input datafile and counts file stats. | |
| bool | match_single_rule (const std::string &rule, const std::string &value_to_match) |
| Returns true, if value to match (VTM) matches the rule (R). | |
| bool | read_and_check_input_datafiles (filenames_t &files_to_process, size_t &files_total_size) |
| Reads filenames of input datafiles (from given file). | |
| bool | startup_init (void) |
| Initializes whatever has to be initialized on startup. | |
| bool | operator== (const NGram &lhs, const NGram &rhs) |
| Equal to operator for NGrams. | |
| bool | operator< (const NGram &lhs, const NGram &rhs) |
| Less than operator for NGrams. | |
| void | print_usage (void) |
| Procedure prints program usage information. | |
| subtrees_to_merge_t::size_type | _product (subtrees_to_merge_t::size_type product_so_far, const subtrees_container_t &subtrees) |
| void | _join_subtrees (const subtrees_to_merge_t &subtrees_by_root_node, subtrees_container_t &results) |
| void | extract_subtrees (ntree_t &nodes, tree_size_t subtree_size, mapped_subtrees_t &results) |
| Extracts indexed subtrees of given size from passed indexed-tree. | |
| bool | operator< (const NTreeNode &lhs, const NTreeNode &rhs) |
| Comparison function for NTreeNode(s). | |
| size_t | _ascii2size_t (const char *str) |
| Performs "safe" conversion from C-string to size_t type. | |
| void | parse_params (int argc, char **argv) |
| Procedure parses program params and initializes members of settings namespace. | |
| void | process (filenames_t &files_to_process, size_t files_total_size) |
| Procedure processes input datafiles from given list (filename). | |
| std::string | unsigned2str (size_t num) |
| Converts given unsigned number to string. | |
| std::string | double2str (double num, unsigned precision=0) |
| Converts given floating point number to string. | |
| size_t | _get_decompositions (size_t sum, size_t addends_count, sum_decompositions_t &results) |
| This helper function gets all possible ways to express `sum` as a sum of `addends_count` addends. | |
| template<typename _Type> | |
| bool | ith_bit (_Type expr, size_t i) |
| template<typename _Type> | |
| _Type | set_ith_bit (_Type expr, size_t i, bool on=true) |
| template<typename _Type> | |
| size_t | bits_in (_Type expr) |
| template<typename _Type> | |
| _Type | ff (void) |
| void | _print_member (std::ostream &output, ngram_size_t index, const NGram::Member *member) |
| Helper procedure - prints given N-gram member to given output stream. | |
| void | _print_members (std::ostream &output, const NGram *ngram) |
| Helper procedure - prints given N-gram members to given output stream. | |
| void | _print_n_members (std::ostream &output, const NGram *ngram) |
| Helper procedure - prints given N-gram members to given output stream. | |
| void | _print_tables (std::ostream &output, const EvaluationTables &tables) |
| Helper procedure - prints certain N-gram evaluation tables to given output stream. | |
| void | _print_stats (std::ostream &output, const scores_t &scores) |
| Helper procedure - prints certain N-gram statistics to given output stream. | |
| void | write (std::ofstream &output_file) |
| Procedure writes extracted ngrams with their frequency count to given output file in plain text (tab-separated) format. | |
| void | _print_context (std::ofstream &output_file, const context_t *ctx, bool narrow=false) |
| void | write_contexts (std::ofstream &output_file) |
| Procedure writes extracted ngrams contexts to given output file. | |
| void | write_stats (std::ofstream &output_file) |
| Procedure writes datafiles statistics to given output file. | |
| void | write_morphologic_filter_stats (std::ofstream &output_file) |
| Procedure writes morphologic filtration statistics to given output file. | |
Variables | |
| const dependency_index_t | number_of_dependencies = 118 |
| const char * | dependency_to_cstring_tab [number_of_dependencies] |
| const ngram_size_t | _max_n = 8 |
| Maximum N for which hashing function bitmask tables are prepared. | |
| const size_t | _lemma_bitmasks [_max_n+1] |
| const size_t | _dependency_bitmasks [_max_n+1] |
| const size_t | _parent_bitmasks [_max_n+1] |
| const size_t | _dp_shift [_max_n+1] |
| const size_t | _bits_per_member [_max_n+1] = { 0, 32, 16, 11, 8, 7, 6, 5, 4 } |
| unsigned | processing_progress_precision = 4 |
| const char | fields_separator = '\t' |
| CSV format separator. | |
Some modules declare their own namespaces.
| typedef std::set<tree_size_t> ace::childs_t |
Children of certain node are stored in set, because we want them sorted and there's no need for random access.
They are determined in the same way as in subtree_t container (by indices stored within container).
| typedef unsigned int ace::context_frequency_t |
Definition of type used for context frequency counter.
| typedef std::vector<PartOfContext> ace::context_t |
Full context is made out from parts of context.
| typedef unsigned char ace::dependency_index_t |
This type must be capable of holding number of all different dependency types which may occur within program run.
| typedef std::vector<std::string> ace::dependency_to_string_t |
| typedef std::vector<double> ace::exp_table_t |
Container acts like table of expected frequencies (which are floating point numbers).
| typedef std::vector<std::string> ace::filenames_t |
Container for input filenames.
| typedef std::vector<size_t> ace::filter_matches_stats_t |
For each rule there's a counter of values matched.
| typedef unsigned int ace::freq_counter_t |
Frequency counter datatype definition.
We should keep in mind, that value of ngram_type_t type is stored within most significant bits of frequency counter, so its datatype should be big enough.
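The note above says that an ngram_type_t value shares the counter's storage, occupying its most significant bits. The following is only a sketch of that kind of packing; the 8/24 bit split and the names `packed_counter_t`, `pack`, `unpack_type` and `unpack_count` are assumptions made for illustration, not taken from the library.

```cpp
#include <cstdint>

// Hypothetical layout: the top 8 bits hold the type, the low 24 bits hold the count.
typedef std::uint32_t packed_counter_t;
const unsigned type_bits = 8;
const unsigned count_bits = 32 - type_bits;
const packed_counter_t count_mask = (packed_counter_t(1) << count_bits) - 1;

// Combines a type value and a count into one word (count is truncated to 24 bits).
inline packed_counter_t pack(unsigned char type, packed_counter_t count) {
    return (packed_counter_t(type) << count_bits) | (count & count_mask);
}

// Recovers the type stored in the most significant bits.
inline unsigned char unpack_type(packed_counter_t value) {
    return static_cast<unsigned char>(value >> count_bits);
}

// Recovers the count stored in the least significant bits.
inline packed_counter_t unpack_count(packed_counter_t value) {
    return value & count_mask;
}
```

Under this assumed split the counter itself can only reach 2^24 - 1, which is the kind of limit the note warns about when choosing the datatype.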
| typedef std::vector<frequency_t> ace::freq_table_t |
Container for frequency statistics.
| typedef size_t ace::frequency_t |
Frequency datatype definition.
It differs from freq_counter_t - frequency_t is frequency output (result) datatype (it's a type stored in overflow counters map and used for statistics evaluations). Despite this it may be the same type as freq_counter_t.
| typedef std::multimap<tree_size_t, subtree_t> ace::mapped_subtrees_t |
Mapped subtrees are subtrees with some extra special information - key.
The key can stand for one of the following:
1. Size of the related subtree (used during extractor processing).
2. Index of the root node of the related subtree (used in the resultant set).
Multimap is used mainly because of the former: it makes it possible to access subtrees of a given size in O(log N) time, where N stands for the number of all subtrees of a certain node.
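The logarithmic lookup mentioned above corresponds to `std::multimap::equal_range`. A minimal sketch follows, with the typedefs repeated so the fragment stands alone; the helper name `subtrees_with_key` is hypothetical and not part of the documented interface.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

typedef std::size_t tree_size_t;
typedef std::vector<tree_size_t> subtree_t;
typedef std::vector<subtree_t> subtrees_container_t;
typedef std::multimap<tree_size_t, subtree_t> mapped_subtrees_t;

// Collects every subtree stored under the given key (e.g. a subtree size).
inline subtrees_container_t subtrees_with_key(const mapped_subtrees_t& mapped,
                                              tree_size_t key) {
    subtrees_container_t found;
    // equal_range locates the first and one-past-last entry with this key.
    std::pair<mapped_subtrees_t::const_iterator,
              mapped_subtrees_t::const_iterator> range = mapped.equal_range(key);
    for (mapped_subtrees_t::const_iterator it = range.first; it != range.second; ++it)
        found.push_back(it->second);
    return found;
}
```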
| typedef std::map<std::string, Filter::Stats> ace::morphologic_filter_file_stats_t |
Mapping between files and their morphologic filter related statistics.
| typedef std::vector<std::string> ace::multirule_t |
Rule is a bunch of strings.
| typedef unsigned char ace::ngram_size_t |
Type used for representation of N value and also for indexing of N-gram members (so must be capable of holding constants::max_grammity value).
| typedef unsigned char ace::ngram_type_t |
This type must be capable of holding |max_grammity| bits (not the value!).
| typedef std::vector<NTreeNode> ace::ntree_t |
To represent indexed-tree the std::vector of NTreeNode instances is used.
It's essential that the vector is indexed in the same fashion as the NTreeNode instances it contains, so the NTreeNode instance at the i-th position has to have its index equal to i. Therefore, the first NTreeNode in ntree_t (at the 0-th position) shall be the abstract root (see definition of NTreeNode class) and is considered that way by the extract_subtrees() procedure.
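The position-equals-index requirement can be checked mechanically. The sketch below uses a stand-in node type, since the real NTreeNode interface is not shown on this page; `NodeSketch`, its `index` member and `indices_match_positions` are assumptions for illustration only.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for NTreeNode: only the stored index matters for this check.
struct NodeSketch {
    std::size_t index;
};

// Returns true if every node sits at the position equal to its own index.
inline bool indices_match_positions(const std::vector<NodeSketch>& nodes) {
    for (std::size_t i = 0; i < nodes.size(); ++i)
        if (nodes[i].index != i) return false;
    return true;
}
```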
| typedef boost::progress_display ace::progress_display_t |
Simple progress bar display is provided by one of boost libraries.
| typedef std::pair<ngram_size_t, words_store_t::const_iterator> ace::raw_ngram_member_t |
Typedef for pair of values for each raw N-gram member:
1. index of parent node in extracted N-gram (0 if root)
2. iterator to the corresponding word instance
| typedef std::vector<raw_ngram_member_t> ace::raw_ngram_t |
Raw N-gram is just collection of owned members.
| typedef std::vector<double> ace::scores_t |
Container for scores and statistics thresholds.
| typedef std::vector<NamedDataFileStats> ace::stats_t |
Typedef for container of datafiles statistics.
| typedef unsigned int ace::string_index_t |
This type must be capable of holding number of all different strings which are stored in certain store.
| typedef std::map<std::string, dependency_index_t> ace::string_to_dependency_t |
| typedef std::vector<tree_size_t> ace::subtree_t |
To represent subtrees (and resultant n-trees) vector is used.
Subtree nodes are determined by indices stored within container, so all values shall be unique (but it's not necessary to have them sorted all the time).
| typedef std::vector<subtree_t> ace::subtrees_container_t |
Vector is used as subtrees container.
| typedef std::vector<subtrees_container_t> ace::subtrees_to_merge_t |
Sometimes we need container for container of subtrees :).
| typedef std::vector<std::vector<size_t> > ace::sum_decompositions_t |
The container with tables (vectors of unsigned) which are decompositions of certain number (N) into certain number of addends (M).
| typedef unsigned char ace::tag_index_t |
This type must be capable of holding number of all different tags which may occur within program run.
| typedef size_t ace::tree_size_t |
Type definition for tree size type used to store NTree indices.
Must be one of the unsigned types!
| typedef std::string ace::unirule_t |
Rule is just a single string.
| typedef unsigned short ace::word_order_t |
This typedef determines max_sentence_length constant definition (maximum allowed value of this type is max sentence length).
| typedef std::pair<words_store_t::const_iterator, words_store_t::const_iterator> ace::words_range_t |
Words range is a range within the buffer, so it's based on word buffer container type (see word.h).
The range stays valid as long as no pop/push operations are performed on the buffer. The range conforms to container ranges behavior (behaves just like begin() and end() iterators).
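Because the range is a half-open iterator pair, it can be walked like any begin()/end() pair. The sketch below uses a stand-in word type; `WordSketch`, its `form` member and `join_forms` are hypothetical names, as the real Word class is documented elsewhere.

```cpp
#include <deque>
#include <iterator>
#include <string>
#include <utility>

// Stand-in for the Word class: a single surface form is enough here.
struct WordSketch {
    std::string form;
};

typedef std::deque<WordSketch> words_store_sketch_t;
typedef std::pair<words_store_sketch_t::const_iterator,
                  words_store_sketch_t::const_iterator> words_range_sketch_t;

// Joins the forms in [range.first, range.second) with single spaces.
inline std::string join_forms(const words_range_sketch_t& range) {
    std::string out;
    for (words_store_sketch_t::const_iterator it = range.first; it != range.second; ++it) {
        if (it != range.first) out += ' ';
        out += it->form;
    }
    return out;
}
```

The validity condition matches the standard container rules: inserting at either end of a std::deque invalidates all of its iterators, so a stored range must not outlive a push on the buffer.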
| typedef std::deque<Word> ace::words_store_t |
| size_t ace::_ascii2size_t | ( | const char * | str | ) |
Performs "safe" conversion from C-string to size_t type.
| str | String to be converted. |
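What "safe" means here is not spelled out, so the following is only one possible reading: reject empty input, signs, trailing characters and out-of-range values. The name `parse_size` and its bool-plus-output-parameter shape are assumptions and differ from the documented signature.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdlib>
#include <limits>

// Parses a non-negative decimal number; returns false instead of guessing on bad input.
inline bool parse_size(const char* str, std::size_t& out) {
    if (str == 0 || *str == '\0') return false;
    // strtoull would accept leading whitespace and a sign; require a digit up front.
    if (*str < '0' || *str > '9') return false;
    errno = 0;
    char* end = 0;
    const unsigned long long value = std::strtoull(str, &end, 10);
    if (errno == ERANGE) return false;              // does not fit unsigned long long
    if (*end != '\0') return false;                 // trailing non-digit characters
    if (value > std::numeric_limits<std::size_t>::max()) return false;
    out = static_cast<std::size_t>(value);
    return true;
}
```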
| void ace::_build_tree | ( | const words_range_t & | sentence, | |
| ntree_t & | nodes | |||
| ) |
Builds a ntree based on grammar dependency tree of given sentence.
Each word in the sentence must have a unique order number!
| sentence | Reference to a pair of iterators pointing to the first word and one past the last word of the sentence. | |
| nodes | Reference to container of nodes which plays the tree role. |
| words_store_t::const_iterator ace::_extract_ngram | ( | words_store_t::const_iterator | sentence_start, | |
| const subtree_t & | subtree, | |||
| raw_ngram_t & | raw_ngram | |||
| ) |
Extracts raw N-gram related to passed subtree.
| sentence_start | Iterator pointing to the beginning of the sentence. | |
| subtree | Subtree of size N extracted from sentence dependency tree. | |
| raw_ngram | Reference to container for extracted (raw N-gram). |
| size_t ace::_get_decompositions | ( | size_t | sum, | |
| size_t | addends_count, | |||
| sum_decompositions_t & | results | |||
| ) |
This helper function gets all possible ways to express `sum` as a sum of `addends_count` addends.
| sum | Sum to be reached. | |
| addends_count | Number of addends to consider. | |
| results | Container to store the results (passed by reference for efficiency). |
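The page does not say whether addends may be zero or whether their order matters. The sketch below assumes ordered, non-negative addends and returns the number of decompositions appended; both choices, and the names `decompose` and `get_decompositions_sketch`, are assumptions rather than the documented behavior.

```cpp
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<std::size_t> > decompositions_sketch_t;

// Fills positions pos.. of `current` so that they add up to `remaining`.
static void decompose(std::size_t remaining, std::size_t pos,
                      std::vector<std::size_t>& current,
                      decompositions_sketch_t& results) {
    if (pos + 1 == current.size()) {
        current[pos] = remaining;          // the last addend takes whatever is left
        results.push_back(current);
        return;
    }
    for (std::size_t value = 0; value <= remaining; ++value) {
        current[pos] = value;
        decompose(remaining - value, pos + 1, current, results);
    }
}

// Appends all decompositions to `results` and returns how many were added.
inline std::size_t get_decompositions_sketch(std::size_t sum, std::size_t addends_count,
                                             decompositions_sketch_t& results) {
    if (addends_count == 0) return 0;      // nothing to enumerate
    const std::size_t before = results.size();
    std::vector<std::size_t> current(addends_count, 0);
    decompose(sum, 0, current, results);
    return results.size() - before;
}
```

For example, with these assumptions the sum 3 split into 2 addends yields (0,3), (1,2), (2,1) and (3,0).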
| string_to_dependency_t ace::_init_str2dep | ( | void | ) |
Inits string to dependency mapping.
| void ace::_join_subtrees | ( | const subtrees_to_merge_t & | subtrees_by_root_node, | |
| subtrees_container_t & | results | |||
| ) |
| subtrees_by_root_node | Vectors of subtrees to join (passed in outer vector). Subtrees within one vector cannot be merged together (they are subtrees of the same root node). | |
| results | Store place for created subtrees. |
| double ace::_log10 | ( | double | x | ) |
"Override" of std::log10 (return DBL_MIN, if given x is zero).
| x | Value to be logged :) |
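A minimal sketch of the behavior described above, assuming the zero check is an exact comparison; the name `log10_sketch` is hypothetical.

```cpp
#include <cfloat>
#include <cmath>

// Returns DBL_MIN for a zero argument, otherwise the ordinary base-10 logarithm.
inline double log10_sketch(double x) {
    return (x == 0.0) ? DBL_MIN : std::log10(x);
}
```

Note that DBL_MIN is the smallest positive normalized double, so the zero case yields a value just above 0 rather than a large negative number.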
| void ace::_print_context | ( | std::ofstream & | output_file, | |
| const context_t * | ctx, | |||
| bool | narrow = false | |||
| ) |
| void ace::_print_member | ( | std::ostream & | output, | |
| ngram_size_t | index, | |||
| const NGram::Member * | member | |||
| ) |
Helper procedure - prints given N-gram member to given output stream.
| output | Reference to output stream. | |
| index | Member index within the N-gram members. | |
| member | Pointer to const instance of NGram member. |
| void ace::_print_members | ( | std::ostream & | output, | |
| const NGram * | ngram | |||
| ) |
Helper procedure - prints given N-gram members to given output stream.
| output | Reference to output stream. | |
| ngram | Pointer to const instance of NGram. |
| void ace::_print_n_members | ( | std::ostream & | output, | |
| const NGram * | ngram | |||
| ) |
Helper procedure - prints given N-gram members to given output stream.
Procedure prints all N members, even if given N-gram is not full-typed (* is printed in place of missing member).
| output | Reference to output stream. | |
| ngram | Pointer to const instance of NGram. |
| void ace::_print_stats | ( | std::ostream & | output, | |
| const scores_t & | scores | |||
| ) |
Helper procedure - prints certain N-gram statistics to given output stream.
Note: Debug version prints formatted statistics (including name). Release version prints statistics in single-row fashion.
| output | Reference to output stream. | |
| scores | Reference to const instance of statistics (scores) table. |
| void ace::_print_tables | ( | std::ostream & | output, | |
| const EvaluationTables & | tables | |||
| ) |
Helper procedure - prints certain N-gram evaluation tables to given output stream.
Note: Debug version prints all tables, Release version only contingency table.
| output | Reference to output stream. | |
| tables | Reference to const instance EvaluationTables. |
| subtrees_to_merge_t::size_type ace::_product | ( | subtrees_to_merge_t::size_type | product_so_far, | |
| const subtrees_container_t & | subtrees | |||
| ) |
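This function carries no description. Its signature has the shape of a binary operation for `std::accumulate`, so one plausible reading is that it multiplies a running product by the number of subtrees in each group. The sketch below illustrates that reading only; `product_sketch` and `combinations_count` are hypothetical names and the behavior is an assumption.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

typedef std::vector<std::size_t> subtree_t;
typedef std::vector<subtree_t> subtrees_container_t;
typedef std::vector<subtrees_container_t> subtrees_to_merge_t;

// Assumed behavior: multiply the running product by the size of one group.
inline subtrees_to_merge_t::size_type product_sketch(
        subtrees_to_merge_t::size_type product_so_far,
        const subtrees_container_t& subtrees) {
    return product_so_far * subtrees.size();
}

// Number of ways to pick exactly one subtree from every group.
inline subtrees_to_merge_t::size_type combinations_count(const subtrees_to_merge_t& groups) {
    return std::accumulate(groups.begin(), groups.end(),
                           subtrees_to_merge_t::size_type(1), product_sketch);
}
```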
| NGram* ace::_store_ngram | ( | const raw_ngram_t & | raw_ngram, | |
| ngram_type_t | type | |||
| ) |
Stores instance of given N-gram.
If morphologic filter is on, the N-gram must pass the filter.
| raw_ngram | Raw N-gram instance to be stored. | |
| type | N-gram type to be stored. |
| void ace::_store_raw_ngram | ( | const raw_ngram_t & | raw_ngram, | |
| const Buffer & | buffer, | |||
| NamedDataFileStats & | stats, | |||
| words_store_t::const_iterator | head | |||
| ) |
Stores all *-types for given raw N-gram and updates related file stats counter.
Also stores N-gram context, if needed.
| raw_ngram | Reference to const raw N-gram to be stored. | |
| buffer | Reference to underlying buffer. | |
| stats | Reference to related file stats (to be updated). |
| void ace::_store_wide_context | ( | NGram * | ngram, | |
| const raw_ngram_t & | raw_ngram, | |||
| words_range_t | context_range | |||
| ) |
Stores wide context for given N-gram.
| ngram | Pointer to ngram instance. | |
| raw_ngram | Related raw N-gram object. | |
| context_range | Valid context range to be processed. |
| size_t ace::bits_in | ( | _Type | expr | ) | [inline] |
| expr | Expression. |
| const char * ace::dependency2cstring | ( | dependency_index_t | dependency_index | ) |
Converts given index to string representation of dependency value.
Index must be valid! Otherwise some kind of range_error is raised.
| dependency_index | Index to be converted. |
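A table lookup with a range check could look roughly like the sketch below. The three labels, the table size and the choice of `std::out_of_range` are illustrative assumptions; the real table has number_of_dependencies entries whose contents are not listed on this page.

```cpp
#include <stdexcept>

// Hypothetical miniature table standing in for dependency_to_cstring_tab.
const unsigned char sketch_count = 3;
const char* const sketch_tab[sketch_count] = { "Undef", "Sb", "Obj" };

// Returns the label for a valid index and throws for anything outside the table.
inline const char* dependency2cstring_sketch(unsigned char dependency_index) {
    if (dependency_index >= sketch_count)
        throw std::out_of_range("dependency index out of range");
    return sketch_tab[dependency_index];
}
```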
| std::string ace::double2str | ( | double | num, | |
| unsigned | precision = 0 | |||
| ) |
Converts given floating point number to string.
| num | Number to become the string. | |
| precision | Output number precision. |
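Whether `precision` counts digits after the decimal point or significant digits is not stated. The sketch below assumes fixed notation, so `precision` is the number of decimals; `double2str_sketch` is a hypothetical name.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Formats a double in fixed notation with the requested number of decimals.
inline std::string double2str_sketch(double num, unsigned precision = 0) {
    std::ostringstream out;
    out << std::fixed << std::setprecision(static_cast<int>(precision)) << num;
    return out.str();
}
```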
| size_t ace::extract | ( | std::ifstream & | input_file, | |
| NamedDataFileStats & | stats | |||
| ) |
Procedure extracts N-grams from given input datafile and counts file stats.
| input_file | Reference to input datafile. | |
| stats | Reference to statistics container (to be filled with file stats). |
| void ace::extract_subtrees | ( | ntree_t & | nodes, | |
| tree_size_t | subtree_size, | |||
| mapped_subtrees_t & | results | |||
| ) |
Extracts indexed subtrees of given size from passed indexed-tree.
For definition of indexed-tree see header file comment. For definition of ntree_t see its typedef description, and do so for definition of mapped_subtrees_t too.
| nodes | Reference to container of NTreeNode instances. | |
| subtree_size | Determines size of extracted subtrees. | |
| results | Reference to container capable of holding resulted subtrees. |
| _Type ace::ff | ( | void | ) | [inline] |
| bool ace::ith_bit | ( | _Type | expr, | |
| size_t | i | |||
| ) | [inline] |
| expr | Expression. | |
| i | Index. |
| bool ace::match_single_rule | ( | const std::string & | rule, | |
| const std::string & | value_to_match | |||
| ) |
Returns true, if value to match (VTM) matches the rule (R).
Note: If the strings differ in length, only the corresponding substring of the longer one is considered. VTM matches R if and only if every character of R, except the insignificant one, matches its counterpart in VTM.
| rule | ||
| value_to_match |
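The matching rule in the note can be sketched as a position-by-position comparison over the shorter length. Which character counts as insignificant is not stated on this page; the `'-'` used below, like the names `insignificant_sketch` and `match_single_rule_sketch`, is purely an assumption for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Assumed placeholder character that matches anything (not confirmed by the source).
const char insignificant_sketch = '-';

// Compares rule and value position by position over the shorter of the two lengths.
inline bool match_single_rule_sketch(const std::string& rule,
                                     const std::string& value_to_match) {
    const std::size_t length = std::min(rule.size(), value_to_match.size());
    for (std::size_t i = 0; i < length; ++i) {
        if (rule[i] != insignificant_sketch && rule[i] != value_to_match[i])
            return false;
    }
    return true;
}
```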
| bool ace::operator< | ( | const NTreeNode & | lhs, | |
| const NTreeNode & | rhs | |||
| ) | [inline] |
| bool ace::operator< | ( | const NGram & | lhs, | |
| const NGram & | rhs | |||
| ) |
Less than operator for NGrams.
| lhs | Left hand argument. | |
| rhs | Right hand argument. |
| bool ace::operator== | ( | const NGram & | lhs, | |
| const NGram & | rhs | |||
| ) |
Equal to operator for NGrams.
| lhs | Left hand argument. | |
| rhs | Right hand argument. |
| void ace::parse_params | ( | int | argc, | |
| char ** | argv | |||
| ) |
| void ace::print_usage | ( | void | ) |
Procedure prints program usage information.
| void ace::process | ( | filenames_t & | files_to_process, | |
| size_t | files_total_size | |||
| ) |
Procedure processes input datafiles from given list (filename).
Extracts and stores extracted n-grams.
| files_to_process | Container with filenames to process. | |
| files_total_size | Size (in chars) of all files (if non-zero, progress will be reported). |
| fatal_error | If fatal error happens while processing. |
| bool ace::read_and_check_input_datafiles | ( | filenames_t & | files_to_process, | |
| size_t & | files_total_size | |||
| ) |
Reads filenames of input datafiles (from given file).
| files_to_process | Container for store of retrieved input filenames. | |
| files_total_size | Reference to total files size counter. |
| _Type ace::set_ith_bit | ( | _Type | expr, | |
| size_t | i, | |||
| bool | on = true | |||
| ) | [inline] |
| expr | Expression. | |
| i | Index. | |
| on | Set to 1 (true) or to 0 (false)? |
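The bit helpers ith_bit and set_ith_bit are documented only through their parameters. The sketch below shows the conventional meaning their names suggest, assuming bit 0 is the least significant bit; the `_sketch` names are hypothetical, and bits_in and ff are left out because their behavior cannot be inferred from this page.

```cpp
#include <cstddef>

// Tests whether bit i of expr is set (bit 0 = least significant, by assumption).
template <typename T>
inline bool ith_bit_sketch(T expr, std::size_t i) {
    return ((expr >> i) & T(1)) != 0;
}

// Returns expr with bit i set (on == true) or cleared (on == false).
template <typename T>
inline T set_ith_bit_sketch(T expr, std::size_t i, bool on = true) {
    const T mask = static_cast<T>(T(1) << i);
    return on ? static_cast<T>(expr | mask) : static_cast<T>(expr & ~mask);
}
```

A type such as ngram_type_t, which must hold one bit per N-gram member, is the natural argument for helpers of this kind.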
| bool ace::startup_init | ( | void | ) |
Initializes whatever has to be initialized on startup.
| dependency_index_t ace::string2dependency | ( | std::string & | str | ) |
Converts given string to index representation of dependency value.
If given param is not found, special index (dependency::Undef) is returned.
| str | String to be converted. |
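The fallback described above amounts to a map lookup with a default. A minimal sketch, in which `undef_sketch` stands in for dependency::Undef (its actual value is not given here) and `lookup_dependency` is a hypothetical helper that takes the map explicitly:

```cpp
#include <map>
#include <string>

typedef unsigned char dependency_index_t;
typedef std::map<std::string, dependency_index_t> string_to_dependency_t;

// Stand-in for dependency::Undef; the real value is an assumption.
const dependency_index_t undef_sketch = 0;

// Returns the mapped index, or the Undef stand-in if the string is unknown.
inline dependency_index_t lookup_dependency(const string_to_dependency_t& mapping,
                                            const std::string& str) {
    const string_to_dependency_t::const_iterator it = mapping.find(str);
    return it == mapping.end() ? undef_sketch : it->second;
}
```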
| std::string ace::unsigned2str | ( | size_t | num | ) |
Converts given unsigned number to string.
| num | Number to become the string. |
| void ace::write | ( | std::ofstream & | output_file | ) |
Procedure writes extracted ngrams with their frequency count to given output file in plain text (tab-separated) format.
| output_file | Reference to opened ofstream object which is bound to output file. |
| void ace::write_contexts | ( | std::ofstream & | output_file | ) |
Procedure writes extracted ngrams contexts to given output file.
| output_file | Reference to opened ofstream object which is bound to output file. |
| void ace::write_morphologic_filter_stats | ( | std::ofstream & | output_file | ) |
Procedure writes morphologic filtration statistics to given output file.
| output_file | Reference to opened ofstream object which is bound to output file. |
| void ace::write_stats | ( | std::ofstream & | output_file | ) |
Procedure writes datafiles statistics to given output file.
| output_file | Reference to opened ofstream object which is bound to output file. |
| const size_t ace::_bits_per_member[_max_n+1] = { 0, 32, 16, 11, 8, 7, 6, 5, 4 } |
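The listed values coincide with 32 divided by N, rounded up, for N from 1 to 8. Whether the author derived the table this way is an assumption; the sketch below only verifies the arithmetic, using the hypothetical names `bits_per_member_doc`, `ceil_div_32` and `table_matches_ceil_div`.

```cpp
#include <cstddef>

// Values copied from the documented initializer of _bits_per_member.
const std::size_t bits_per_member_doc[9] = { 0, 32, 16, 11, 8, 7, 6, 5, 4 };

// Integer ceiling of 32 / n for n >= 1.
inline std::size_t ceil_div_32(std::size_t n) {
    return (32 + n - 1) / n;
}

// Checks that every documented entry for n = 1..8 equals ceil(32 / n).
inline bool table_matches_ceil_div() {
    for (std::size_t n = 1; n <= 8; ++n)
        if (bits_per_member_doc[n] != ceil_div_32(n)) return false;
    return true;
}
```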
| const size_t ace::_dependency_bitmasks[_max_n+1] |
Initial value:
{
0x00,
0x7C,
0x0C,
0x1C,
0x0C,
0x04,
0x00, 0x00, 0x00
}
| const size_t ace::_dp_shift[_max_n+1] |
Initial value:
{
0,
24,
12,
6,
4,
4,
0, 0, 0
}
| const size_t ace::_lemma_bitmasks[_max_n+1] |
Initial value:
{
0x00,
0x00FFFFFF,
0x00000FFF,
0x000000FF,
0x0000003F,
0x0000003F,
0x0000003F,
0x0000001F,
0x0000000F,
}
| const ngram_size_t ace::_max_n = 8 |
Maximum N for which hashing function bitmask tables are prepared.
| const size_t ace::_parent_bitmasks[_max_n+1] |
Initial value:
{
0x00,
0x03,
0x03,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00
}
| const char* ace::dependency_to_cstring_tab[number_of_dependencies] |
| const char ace::fields_separator = '\t' |
CSV format separator.
| const dependency_index_t ace::number_of_dependencies = 118 |
| unsigned ace::processing_progress_precision = 4 |