ace Namespace Reference

All modules within the program are declared in namespace ace to avoid polluting the global namespace.


Classes

class  Buffer
 Class implements the words buffer.
class  TaggedLemma
 Class holds lemma and tag indices together.
struct  ContextWindow
 Type holding context window settings.
struct  Sieves
 Lower bounds for evaluated N-grams.
struct  Thresholds
 Thresholds for various statistical test evaluations.
class  PartOfContext
 Part of context is the smallest unit used by the context tracing functionality.
class  EvaluationTables
 Class evaluates contingency and expected frequency tables.
class  index_overflow
 Exception thrown if incrementing an (unsigned) index would overflow the index datatype.
class  Filter
 Abstract class declaring the basic interface for counting filter rule matches.
class  UniRulesFilter
 Class declares the interface of the unirules filter.
class  MultiRulesFilter
 Class declares the interface of the multirules filter.
class  HashVector
 Class for storage of.
class  NGram
 Class designed to hold the data of a single NGram.
class  NGramToken
 Class offers creation of a temporary NGram instance to be used as a key (token) when searching NGramStore hashtables.
class  NGramHashCounter
 Class provides the hashing function for N-gram instances.
struct  NGramHasher
 Functor which wraps the hashing function (NGramHashCounter::count_hash()).
class  NGramStore
 Class serves as the store of NGram instances.
struct  NGramFrequencies
 Struct holds the table of all *-frequencies of a certain N-gram.
class  NGramsProcessor
 NGramsProcessor simplifies gathering frequency stats of a given N-gram (and all its downcasted types) from the NGramStore container.
class  Notifier
 Notifier is very simple: it binds itself to a given stream and flushes passed notifications to that stream.
class  ProgressNotifier
 ProgressNotifier holds a static timer, which starts timing the program execution immediately on program start.
class  NTreeNode
 Class holds information about a single indexed-tree node.
class  Overflow
 Class serves as a mapping between objects of a given type and (some) statistics, which are stored in _ValueType.
class  Parser
 Parser object binds itself to an open ifstream object, which corresponds to a certain input datafile.
class  MemoryBlock
 Memory block keeps a large contiguous chunk of memory and provides methods for subsequently handing out small pieces to client subjects.
class  MemoryPool
 Definition of the (abstract) memory pool interface.
class  RandomSizedMemoryPool
 RandomSizedMemoryPool, as its name suggests, allows transferring chunks of memory of various sizes.
class  VoidPool
 VoidPool acts as a pool of void type, although internally its implementation is based on a pool of chars.
class  fatal_error
 Exception class indicating that a fatal error occurred in the process phase.
struct  DataFileStats
 Class implements a simple statistics counter.
struct  NamedDataFileStats
 Class extends DataFileStats: adds a "label" (for the filename).
class  StringStore
 Class stores unique C-strings and lets them be accessed (later) by unique numeric indices of a given type (which should be the smallest unsigned type capable of storing as many values as the number of C-strings expected).
class  StringStoreLessComparator
 Functor for less-than comparison of C-strings stored within StringStore.
class  StringStoreEqualComparator
 Functor for equal-to comparison of C-strings stored within StringStore.
struct  select1st
 Template functor class returning the 1st member of a pair.
struct  select2nd
 Template functor class returning the 2nd member of a pair.
class  SumDecompositions
 Class holds information about all decompositions of a given sum (N) into a given number of addends (M).
class  Word
 Word class holds all information for a single input word entity.
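The StringStore entry describes string interning: each distinct string receives a unique index, and the index can later be turned back into the string. A minimal C++ sketch of that idea follows. It is not the PACE implementation: it swaps in std::unordered_map for lookups (the real class presumably relies on StringStoreLessComparator / StringStoreEqualComparator), and the class name SimpleStringStore is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for ace::string_index_t (documented as unsigned int).
typedef unsigned int string_index_t;

// Minimal interning store: each distinct string gets the next free index.
class SimpleStringStore {
public:
    // Returns the index of s, storing s first if it is not known yet.
    string_index_t intern(const std::string &s) {
        auto it = index_of_.find(s);
        if (it != index_of_.end())
            return it->second;
        const string_index_t idx = static_cast<string_index_t>(strings_.size());
        strings_.push_back(s);
        index_of_.emplace(s, idx);
        return idx;
    }

    // Returns the C-string stored under idx; throws std::out_of_range if idx is invalid.
    const char *c_str(string_index_t idx) const { return strings_.at(idx).c_str(); }

    std::size_t size() const { return strings_.size(); }

private:
    std::deque<std::string> strings_;                           // index -> string
    std::unordered_map<std::string, string_index_t> index_of_;  // string -> index
};
```

Storing the strings in a std::deque keeps previously returned C-string pointers valid across later insertions, because push_back on a deque does not invalidate references to existing elements.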

Namespaces

namespace  constants
 All program constants happily live together in this nicely named namespace.
namespace  dependency
 Special namespace used to keep variables with indices of a few special dependency values.
namespace  eval
 Namespace wraps evaluation functions.
namespace  notifier
 Module declares (global) variables, which are wrapped into a separate namespace.
namespace  persistent
 Namespace holds (global) variables.
namespace  settings
 Namespace holds (global) program settings and functions checking their correctness.

Typedefs

typedef std::pair< words_store_t::const_iterator, words_store_t::const_iterator > words_range_t
 Words range is a range within the buffer, so it's based on the word buffer container type (see word.h).
typedef unsigned short word_order_t
 This typedef determines the max_sentence_length constant definition (the maximum allowed value of this type is the maximum sentence length).
typedef unsigned int string_index_t
 This type must be capable of holding the number of all different strings stored in a certain store.
typedef unsigned char ngram_size_t
 Type used for representation of the N value and also for indexing of N-gram members (so it must be capable of holding the constants::max_grammity value).
typedef unsigned char dependency_index_t
 This type must be capable of holding the number of all different dependency types which may occur within a program run.
typedef unsigned char tag_index_t
 This type must be capable of holding the number of all different tags which may occur within a program run.
typedef unsigned char ngram_type_t
 This type must be capable of holding |max_grammity| bits (not the value!).
typedef unsigned int context_frequency_t
 Definition of type used for context frequency counter.
typedef std::vector< PartOfContext > context_t
 Full context is made up of parts of context.
typedef std::map< std::string, dependency_index_t > string_to_dependency_t
typedef std::vector< std::string > dependency_to_string_t
typedef std::vector< double > exp_table_t
 Container acts like a table of expected frequencies (which are floating point numbers).
typedef std::string unirule_t
 Rule is just a single string.
typedef std::vector< std::string > multirule_t
 Rule is a bunch of strings.
typedef std::vector< size_t > filter_matches_stats_t
 For each rule there's a counter of values matched.
typedef std::pair< ngram_size_t, words_store_t::const_iterator > raw_ngram_member_t
 Pair of values for each raw N-gram member: the index of the parent node and an iterator to the corresponding word.
typedef std::vector< raw_ngram_member_t > raw_ngram_t
 Raw N-gram is just a collection of owned members.
typedef unsigned int freq_counter_t
 Frequency counter datatype definition.
typedef size_t frequency_t
 Frequency datatype definition.
typedef std::vector< frequency_t > freq_table_t
 Container for frequency statistics.
typedef boost::progress_display progress_display_t
 A simple progress bar display is provided by one of the Boost libraries.
typedef std::vector< subtree_t > subtrees_container_t
 Vector is used as subtrees container.
typedef std::vector< subtrees_container_t > subtrees_to_merge_t
 Sometimes we need a container for containers of subtrees :).
typedef size_t tree_size_t
 Type definition for tree size type used to store NTree indices.
typedef std::vector< tree_size_t > subtree_t
 To represent subtrees (and resultant n-trees) a vector is used.
typedef std::set< tree_size_t > childs_t
 The children of a certain node are stored in a set, because we want them sorted and there's no need for random access.
typedef std::multimap< tree_size_t, subtree_t > mapped_subtrees_t
 Mapped subtrees are subtrees with an extra piece of information: a key.
typedef std::vector< NTreeNode > ntree_t
 To represent an indexed-tree, a std::vector of NTreeNode instances is used.
typedef std::map< std::string, Filter::Stats > morphologic_filter_file_stats_t
 Mapping between files and their morphologic filter related statistics.
typedef std::vector< std::string > filenames_t
 Container for input filenames.
typedef std::vector< NamedDataFileStats > stats_t
 Typedef for container of datafiles statistics.
typedef std::vector< std::vector< size_t > > sum_decompositions_t
 The container with tables (vectors of unsigned) which are decompositions of a certain number (N) into a certain number of addends (M).
typedef std::deque< Word > words_store_t
 (Forward) declaration of the container used for buffering words (see buffer.h).
typedef std::vector< double > scores_t
 Container for scores and statistics thresholds.

Functions

string_to_dependency_t _init_str2dep (void)
 Initializes the string-to-dependency mapping.
dependency_index_t string2dependency (std::string &str)
 Converts given string to index representation of dependency value.
const char * dependency2cstring (dependency_index_t dependency_index)
 Converts given index to string representation of dependency value.
double _log10 (double x)
 "Override" of std::log10 (return DBL_MIN, if given x is zero).
void _build_tree (const words_range_t &sentence, ntree_t &nodes)
 Builds an ntree based on the grammar dependency tree of the given sentence.
words_store_t::const_iterator _extract_ngram (words_store_t::const_iterator sentence_start, const subtree_t &subtree, raw_ngram_t &raw_ngram)
 Extracts raw N-gram related to passed subtree.
NGram * _store_ngram (const raw_ngram_t &raw_ngram, ngram_type_t type)
 Stores instance of given N-gram.
void _store_wide_context (NGram *ngram, const raw_ngram_t &raw_ngram, words_range_t context_range)
 Stores wide context for given N-gram.
void _store_raw_ngram (const raw_ngram_t &raw_ngram, const Buffer &buffer, NamedDataFileStats &stats, words_store_t::const_iterator head)
 Stores all *-types for given raw N-gram and updates related file stats counter.
size_t extract (std::ifstream &input_file, NamedDataFileStats &stats)
 Procedure extracts N-grams from given input datafile and counts file stats.
bool match_single_rule (const std::string &rule, const std::string &value_to_match)
 Returns true, if value to match (VTM) matches the rule (R).
bool read_and_check_input_datafiles (filenames_t &files_to_process, size_t &files_total_size)
 Reads filenames of input datafiles (from given file).
bool startup_init (void)
 Initializes whatever has to be initialized on startup.
bool operator== (const NGram &lhs, const NGram &rhs)
 Equal to operator for NGrams.
bool operator< (const NGram &lhs, const NGram &rhs)
 Less than operator for NGrams.
void print_usage (void)
 Procedure prints program usage information.
subtrees_to_merge_t::size_type _product (subtrees_to_merge_t::size_type product_so_far, const subtrees_container_t &subtrees)
void _join_subtrees (const subtrees_to_merge_t &subtrees_by_root_node, subtrees_container_t &results)
void extract_subtrees (ntree_t &nodes, tree_size_t subtree_size, mapped_subtrees_t &results)
 Extracts indexed subtrees of given size from passed indexed-tree.
bool operator< (const NTreeNode &lhs, const NTreeNode &rhs)
 Comparison function for NTreeNode(s).
size_t _ascii2size_t (const char *str)
 Performs "safe" conversion from C-string to size_t type.
void parse_params (int argc, char **argv)
 Procedure parses program params and initializes members of settings namespace.
void process (filenames_t &files_to_process, size_t files_total_size)
 Procedure processes the input datafiles from the given list of filenames.
std::string unsigned2str (size_t num)
 Converts given unsigned number to string.
std::string double2str (double num, unsigned precision=0)
 Converts given floating point number to string.
size_t _get_decompositions (size_t sum, size_t addends_count, sum_decompositions_t &results)
 This helper function gets all possible ways to express `sum` as a sum of `addends_count` addends.
template<typename _Type>
bool ith_bit (_Type expr, size_t i)
template<typename _Type>
_Type set_ith_bit (_Type expr, size_t i, bool on=true)
template<typename _Type>
size_t bits_in (_Type expr)
template<typename _Type>
_Type ff (void)
void _print_member (std::ostream &output, ngram_size_t index, const NGram::Member *member)
 Helper procedure - prints given N-gram member to given output stream.
void _print_members (std::ostream &output, const NGram *ngram)
 Helper procedure - prints given N-gram members to given output stream.
void _print_n_members (std::ostream &output, const NGram *ngram)
 Helper procedure - prints given N-gram members to given output stream.
void _print_tables (std::ostream &output, const EvaluationTables &tables)
 Helper procedure - prints certain N-gram evaluation tables to given output stream.
void _print_stats (std::ostream &output, const scores_t &scores)
 Helper procedure - prints certain N-gram statistics to given output stream.
void write (std::ofstream &output_file)
 Procedure writes extracted ngrams with their frequency count to given output file in plain text (tab-separated) format.
void _print_context (std::ofstream &output_file, const context_t *ctx, bool narrow=false)
void write_contexts (std::ofstream &output_file)
 Procedure writes extracted ngrams contexts to given output file.
void write_stats (std::ofstream &output_file)
 Procedure writes datafiles statistics to given output file.
void write_morphologic_filter_stats (std::ofstream &output_file)
 Procedure writes morphologic filtration statistics to given output file.

Variables

const dependency_index_t number_of_dependencies = 118
const char * dependency_to_cstring_tab [number_of_dependencies]
const ngram_size_t _max_n = 8
 Maximum N for which hashing function bitmask tables are prepared.
const size_t _lemma_bitmasks [_max_n+1]
const size_t _dependency_bitmasks [_max_n+1]
const size_t _parent_bitmasks [_max_n+1]
const size_t _dp_shift [_max_n+1]
const size_t _bits_per_member [_max_n+1] = { 0, 32, 16, 11, 8, 7, 6, 5, 4 }
unsigned processing_progress_precision = 4
const char fields_separator = '\t'
 Field separator used in the (tab-separated) CSV-style output.


Detailed Description

All modules within the program are declared in namespace ace in order to not pollute global namespace.

Some modules declare their own namespaces.


Typedef Documentation

typedef std::set<tree_size_t> ace::childs_t

The children of a certain node are stored in a set, because we want them sorted and there's no need for random access.

They are identified in the same way as in the subtree_t container (by the indices stored within the container).

typedef unsigned int ace::context_frequency_t

Definition of type used for context frequency counter.

typedef std::vector<PartOfContext> ace::context_t

Full context is made up of parts of context.

typedef unsigned char ace::dependency_index_t

This type must be capable of holding the number of all different dependency types which may occur within a program run.

typedef std::vector<std::string> ace::dependency_to_string_t

typedef std::vector<double> ace::exp_table_t

Container acts like a table of expected frequencies (which are floating point numbers).

typedef std::vector<std::string> ace::filenames_t

Container for input filenames.

typedef std::vector<size_t> ace::filter_matches_stats_t

For each rule there's a counter of values matched.

typedef unsigned int ace::freq_counter_t

Frequency counter datatype definition.

Keep in mind that the ngram_type_t value is stored in the most significant bits of the frequency counter, so this datatype must be big enough to hold both.
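The note above says the N-gram type shares one freq_counter_t with the count. The sketch below shows one way such packing can work, assuming a 32-bit unsigned int whose top 8 bits (one ngram_type_t) hold the type and whose low 24 bits hold the count. The bit split and the helper names pack, type_of and count_of are assumptions, not taken from the source.

```cpp
#include <cassert>
#include <climits>

typedef unsigned int freq_counter_t;  // as documented
typedef unsigned char ngram_type_t;   // as documented

// Assumption: the type occupies the top CHAR_BIT bits, the count the rest.
const unsigned kTypeBits = sizeof(ngram_type_t) * CHAR_BIT;
const unsigned kCountBits = sizeof(freq_counter_t) * CHAR_BIT - kTypeBits;
const freq_counter_t kCountMask = (freq_counter_t(1) << kCountBits) - 1;

// Combines a type and a count into one counter value.
freq_counter_t pack(ngram_type_t type, freq_counter_t count) {
    assert(count <= kCountMask);  // the count must not spill into the type bits
    return (freq_counter_t(type) << kCountBits) | count;
}

// Extracts the type stored in the most significant bits.
ngram_type_t type_of(freq_counter_t packed) {
    return static_cast<ngram_type_t>(packed >> kCountBits);
}

// Extracts the count stored in the low bits.
freq_counter_t count_of(freq_counter_t packed) { return packed & kCountMask; }
```

With this layout the usable count range shrinks to 24 bits, which is why the type of the counter has to be chosen with the packed field in mind.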

typedef std::vector<frequency_t> ace::freq_table_t

Container for frequency statistics.

typedef size_t ace::frequency_t

Frequency datatype definition.

It differs from freq_counter_t: frequency_t is the frequency output (result) datatype (it is the type stored in the overflow counters map and used for statistics evaluations). Despite this, it may be the same type as freq_counter_t.

typedef std::multimap<tree_size_t, subtree_t> ace::mapped_subtrees_t

Mapped subtrees are subtrees with an extra piece of information: a key.

The key can stand for one of the following:
1. Size of the related subtree (used during extractor processing).
2. Index of the root node of the related subtree (used in the resultant set).
Multimap is used mainly because of the former: it makes it possible to access subtrees of a given size in O(log N) time, where N stands for the number of all subtrees of a certain node.
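With key meaning 1 (subtree size), std::multimap::equal_range locates all subtrees of a given size: a logarithmic search finds the first match, then the matching run is walked. A short sketch; the function name subtrees_of_size is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

typedef std::size_t tree_size_t;
typedef std::vector<tree_size_t> subtree_t;
typedef std::multimap<tree_size_t, subtree_t> mapped_subtrees_t;

// Collects all subtrees stored under key `size` (key = subtree size).
std::vector<subtree_t> subtrees_of_size(const mapped_subtrees_t &m, tree_size_t size) {
    std::vector<subtree_t> out;
    auto range = m.equal_range(size);  // [first, second) holds all entries with this key
    for (auto it = range.first; it != range.second; ++it)
        out.push_back(it->second);
    return out;
}
```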

typedef std::map<std::string, Filter::Stats> ace::morphologic_filter_file_stats_t

Mapping between files and their morphologic filter related statistics.

typedef std::vector<std::string> ace::multirule_t

Rule is a bunch of strings.

typedef unsigned char ace::ngram_size_t

Type used for representation of N value and also for indexing of N-gram members (so must be capable of holding constants::max_grammity value).

typedef unsigned char ace::ngram_type_t

This type must be capable of holding |max_grammity| bits (not the value!).

typedef std::vector<NTreeNode> ace::ntree_t

To represent an indexed-tree, a std::vector of NTreeNode instances is used.

It's essential that the vector is indexed in the same fashion as the NTreeNode instances it contains, so the NTreeNode instance at the i-th position must have its index equal to i. Therefore, the first NTreeNode in ntree_t (at position 0) shall be the abstract root (see the definition of the NTreeNode class), and the extract_subtrees() procedure treats it that way.

typedef boost::progress_display ace::progress_display_t

A simple progress bar display is provided by one of the Boost libraries.

typedef std::pair<ngram_size_t, words_store_t::const_iterator> ace::raw_ngram_member_t

Typedef for pair of values for each raw N-gram member:
1. index of the parent node in the extracted N-gram (0 if root);
2. iterator to the corresponding word instance.

typedef std::vector<raw_ngram_member_t> ace::raw_ngram_t

Raw N-gram is just a collection of owned members.

typedef std::vector<double> ace::scores_t

Container for scores and statistics thresholds.

typedef std::vector<NamedDataFileStats> ace::stats_t

Typedef for container of datafiles statistics.

typedef unsigned int ace::string_index_t

This type must be capable of holding the number of all different strings stored in a certain store.

typedef std::map<std::string, dependency_index_t> ace::string_to_dependency_t

typedef std::vector<tree_size_t> ace::subtree_t

To represent subtrees (and resultant n-trees) a vector is used.

Subtree nodes are identified by the indices stored within the container, so all values must be unique (but they need not be kept sorted at all times).

typedef std::vector<subtree_t> ace::subtrees_container_t

Vector is used as subtrees container.

typedef std::vector<subtrees_container_t> ace::subtrees_to_merge_t

Sometimes we need a container for containers of subtrees :).

typedef std::vector<std::vector<size_t> > ace::sum_decompositions_t

The container with tables (vectors of unsigned) which are decompositions of certain number (N) into certain number of addends (M).

typedef unsigned char ace::tag_index_t

This type must be capable of holding the number of all different tags which may occur within a program run.

typedef size_t ace::tree_size_t

Type definition for tree size type used to store NTree indices.

Must be one of the unsigned types!

typedef std::string ace::unirule_t

Rule is just a single string.

typedef unsigned short ace::word_order_t

This typedef determines max_sentence_length constant definition (maximum allowed value of this type is max sentence length).

typedef std::pair<words_store_t::const_iterator, words_store_t::const_iterator> ace::words_range_t

Words range is a range within the buffer, so it's based on the word buffer container type (see word.h).

The range stays valid as long as no pop/push operations are performed on the buffer. The range conforms to container range behavior (it behaves just like begin() and end() iterators).

typedef std::deque<Word> ace::words_store_t

(Forward) declaration of container used to buffering of words (see buffer.h).

A few modules work with this buffer, but they're only interested in the buffer's underlying container. So it's more convenient to (pre)declare it here, because they include word.h anyway.


Function Documentation

size_t ace::_ascii2size_t ( const char *  str  ) 

Performs "safe" conversion from C-string to size_t type.

Parameters:
str String to be converted.
Returns:
The unsigned number represented by the string, or 0 if the string does not represent a valid number or is negative.
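A minimal sketch of the documented contract, assuming that overflow is also treated as an invalid number (the documentation does not say). It is renamed ascii_to_size_t here because the sketch lives in the global namespace, where names beginning with an underscore are reserved.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <limits>

// Returns 0 for null/empty input, for any non-digit character (which also
// rejects a leading '-'), and (assumption) when the value would overflow.
std::size_t ascii_to_size_t(const char *str) {
    if (str == nullptr || *str == '\0')
        return 0;
    const std::size_t max = std::numeric_limits<std::size_t>::max();
    std::size_t value = 0;
    for (const char *p = str; *p != '\0'; ++p) {
        if (!std::isdigit(static_cast<unsigned char>(*p)))
            return 0;
        const std::size_t digit = static_cast<std::size_t>(*p - '0');
        if (value > (max - digit) / 10)  // value * 10 + digit would overflow
            return 0;
        value = value * 10 + digit;
    }
    return value;
}
```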

void ace::_build_tree ( const words_range_t &  sentence,
ntree_t &  nodes 
)

Builds an ntree based on the grammar dependency tree of the given sentence.

Each word in the sentence must have a unique order number!

Parameters:
sentence Reference to a pair of iterators pointing to the first word and behind the last word of the sentence.
nodes Reference to the container of nodes which represents the tree.
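The sketch below illustrates how a dependency tree can be turned into an ntree_t-style vector, assuming each word carries its 1-based order number and the order number of its parent (0 for the sentence root), and that order numbers run contiguously from 1. SimpleWord, SimpleNode and build_tree are hypothetical stand-ins for Word, NTreeNode and _build_tree.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

typedef std::size_t tree_size_t;
typedef std::set<tree_size_t> childs_t;

// Hypothetical simplified types; the real Word and NTreeNode carry more data.
struct SimpleWord {
    tree_size_t order;   // 1-based position in the sentence (must be unique)
    tree_size_t parent;  // order of the governing word, 0 for the sentence root
};

struct SimpleNode {
    tree_size_t index = 0;
    childs_t children;
};

// nodes[0] is the abstract root and nodes[i] is the word with order i, so each
// node's index equals its position in the vector (as ntree_t requires).
void build_tree(const std::vector<SimpleWord> &sentence, std::vector<SimpleNode> &nodes) {
    nodes.assign(sentence.size() + 1, SimpleNode());
    for (tree_size_t i = 0; i < nodes.size(); ++i)
        nodes[i].index = i;
    for (const SimpleWord &w : sentence) {
        assert(w.order >= 1 && w.order <= sentence.size());
        assert(w.parent <= sentence.size());
        nodes[w.parent].children.insert(w.order);  // link word to its parent
    }
}
```

For "dogs bark loudly" with "bark" as the root, nodes[0] gets child 2 and nodes[2] gets children 1 and 3.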

words_store_t::const_iterator ace::_extract_ngram ( words_store_t::const_iterator  sentence_start,
const subtree_t &  subtree,
raw_ngram_t &  raw_ngram 
)

Extracts raw N-gram related to passed subtree.

Parameters:
sentence_start Iterator pointing to the beginning of the sentence.
subtree Subtree of size N extracted from sentence dependency tree.
raw_ngram Reference to the container for the extracted raw N-gram.
Returns:
Iterator pointing to the word which is the head of the extracted N-gram.

size_t ace::_get_decompositions ( size_t  sum,
size_t  addends_count,
sum_decompositions_t &  results 
)

This helper function gets all possible ways to express `sum` as a sum of `addends_count` addends.

Parameters:
sum Sum to be reached.
addends_count Number of addends to consider.
results Container to store the results (passed by reference for efficiency).
Returns:
Size of `results` at the beginning of function execution.
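A recursive sketch of this helper, assuming ordered decompositions into positive addends (so 4 into 2 addends yields 1+3, 2+2 and 3+1); the documentation does not state whether zero addends are allowed or whether order matters. get_decompositions and decompose are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<std::size_t> > sum_decompositions_t;

// Extends `prefix` with every positive addend that still leaves at least 1
// for each addend not chosen yet.
static void decompose(std::size_t sum, std::size_t addends_left,
                      std::vector<std::size_t> &prefix, sum_decompositions_t &results) {
    if (addends_left == 0) {
        if (sum == 0)
            results.push_back(prefix);
        return;
    }
    for (std::size_t a = 1; a + (addends_left - 1) <= sum; ++a) {
        prefix.push_back(a);
        decompose(sum - a, addends_left - 1, prefix, results);
        prefix.pop_back();
    }
}

// Appends all decompositions to `results`; returns the size it had on entry.
std::size_t get_decompositions(std::size_t sum, std::size_t addends_count,
                               sum_decompositions_t &results) {
    const std::size_t initial_size = results.size();
    std::vector<std::size_t> prefix;
    decompose(sum, addends_count, prefix, results);
    return initial_size;
}
```

Returning the entry size lets a caller that accumulates several calls into one container find where the newly appended decompositions begin.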

string_to_dependency_t ace::_init_str2dep ( void   ) 

Initializes the string-to-dependency mapping.

void ace::_join_subtrees ( const subtrees_to_merge_t &  subtrees_by_root_node,
subtrees_container_t &  results 
)

Parameters:
subtrees_by_root_node Vectors of subtrees to join (passed in an outer vector). Subtrees within one vector cannot be merged together (they are subtrees of the same root node).
results Storage for the created subtrees.
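_join_subtrees is documented only through its parameters. One plausible reading, sketched below as an assumption, is a Cartesian product: every result takes exactly one subtree from each inner vector, so subtrees of the same root node are never combined, and the node indices of the chosen subtrees are concatenated. join_subtrees is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::size_t tree_size_t;
typedef std::vector<tree_size_t> subtree_t;
typedef std::vector<subtree_t> subtrees_container_t;
typedef std::vector<subtrees_container_t> subtrees_to_merge_t;

// Picks one subtree from each group in every possible way and appends the
// concatenated node indices of each combination to `results`.
void join_subtrees(const subtrees_to_merge_t &groups, subtrees_container_t &results) {
    if (groups.empty())
        return;
    subtrees_container_t partial(1);  // one empty combination to start from
    for (const subtrees_container_t &group : groups) {
        subtrees_container_t next;
        for (const subtree_t &prefix : partial)
            for (const subtree_t &choice : group) {
                subtree_t joined(prefix);
                joined.insert(joined.end(), choice.begin(), choice.end());
                next.push_back(joined);
            }
        partial.swap(next);  // an empty group leaves no combinations
    }
    results.insert(results.end(), partial.begin(), partial.end());
}
```

Under this reading the number of results equals the product of the group sizes.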

double ace::_log10 ( double  x  ) 

"Override" of std::log10 (return DBL_MIN, if given x is zero).

Parameters:
x Value to be logged :)
Returns:
log10 of given x.
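A sketch of the documented behavior, renamed safe_log10 here. Note that DBL_MIN is the smallest positive normalized double (about 2.2e-308), not the most negative one, so a zero argument yields a tiny positive result instead of the -infinity that std::log10 would return.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Maps zero to DBL_MIN; otherwise defers to std::log10.
double safe_log10(double x) {
    if (x == 0.0)
        return DBL_MIN;
    return std::log10(x);
}
```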

void ace::_print_context ( std::ofstream &  output_file,
const context_t *  ctx,
bool  narrow = false 
)

void ace::_print_member ( std::ostream &  output,
ngram_size_t  index,
const NGram::Member *  member 
)

Helper procedure - prints given N-gram member to given output stream.

Parameters:
output Reference to output stream.
index Member index within the N-gram members.
member Pointer to const instance of NGram member.

void ace::_print_members ( std::ostream &  output,
const NGram *  ngram 
)

Helper procedure - prints given N-gram members to given output stream.

Parameters:
output Reference to output stream.
ngram Pointer to const instance of NGram.

void ace::_print_n_members ( std::ostream &  output,
const NGram *  ngram 
)

Helper procedure - prints given N-gram members to given output stream.

Procedure prints all N members, even if the given N-gram is not full-typed (* is printed in place of a missing member).

Parameters:
output Reference to output stream.
ngram Pointer to const instance of NGram.
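A sketch of the "* for a missing member" rule, assuming that bit i of ngram_type_t marks whether member i is present (consistent with the requirement that ngram_type_t hold |max_grammity| bits) and that members are separated by spaces. The function signature and member representation are hypothetical; the real procedure takes an NGram pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

typedef unsigned char ngram_type_t;

// `members` lists only the present members, in order; bit i of `type`
// (assumption) tells whether member i is present.
void print_n_members(std::ostream &output, std::size_t n, ngram_type_t type,
                     const std::vector<std::string> &members) {
    std::size_t next = 0;  // next unused entry of `members`
    for (std::size_t i = 0; i < n; ++i) {
        if (i > 0)
            output << ' ';
        if ((type >> i) & 1u)
            output << members.at(next++);
        else
            output << '*';  // placeholder for a missing member
    }
}
```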

void ace::_print_stats ( std::ostream &  output,
const scores_t &  scores 
)

Helper procedure - prints certain N-gram statistics to given output stream.

Note: The Debug version prints formatted statistics (including names). The Release version prints statistics on a single row.

Parameters:
output Reference to output stream.
scores Reference to const instance of statistics (scores) table.

void ace::_print_tables ( std::ostream &  output,
const EvaluationTables &  tables 
)

Helper procedure - prints certain N-gram evaluation tables to given output stream.

Note: The Debug version prints all tables, the Release version only the contingency table.

Parameters:
output Reference to output stream.
tables Reference to const instance of EvaluationTables.

subtrees_to_merge_t::size_type ace::_product ( subtrees_to_merge_t::size_type  product_so_far,
const subtrees_container_t &  subtrees 
)
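_product is undocumented, but its signature (a running product plus one subtrees_container_t) matches the binary operation expected by std::accumulate. The sketch below assumes it is used that way, to count how many combinations a one-subtree-per-root-node join would yield; product and combinations are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

typedef std::vector<std::size_t> subtree_t;
typedef std::vector<subtree_t> subtrees_container_t;
typedef std::vector<subtrees_container_t> subtrees_to_merge_t;

// Folding step: multiplies the running product by the size of one group.
subtrees_to_merge_t::size_type product(subtrees_to_merge_t::size_type product_so_far,
                                       const subtrees_container_t &subtrees) {
    return product_so_far * subtrees.size();
}

// Number of ways to pick one subtree from every group.
subtrees_to_merge_t::size_type combinations(const subtrees_to_merge_t &groups) {
    return std::accumulate(groups.begin(), groups.end(),
                           subtrees_to_merge_t::size_type(1), product);
}
```

Knowing the count up front would, for instance, allow reserving the results container before joining.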

NGram* ace::_store_ngram ( const raw_ngram_t &  raw_ngram,
ngram_type_t  type 
)

Stores instance of given N-gram.

If morphologic filter is on, the N-gram must pass the filter.

Parameters:
raw_ngram Raw N-gram instance to be stored.
type N-gram type to be stored.
Returns:
Pointer to the related N-gram instance within the global N-gram store, or NULL if the N-gram hasn't been stored (didn't make it through the filter).

void ace::_store_raw_ngram ( const raw_ngram_t &  raw_ngram,
const Buffer &  buffer,
NamedDataFileStats &  stats,
words_store_t::const_iterator  head 
)

Stores all *-types for given raw N-gram and updates related file stats counter.

Also stores N-gram context, if needed.

Parameters:
raw_ngram Reference to const raw N-gram to be stored.
buffer Reference to underlying buffer.
stats Reference to related file stats (to be updated).

void ace::_store_wide_context ( NGram *  ngram,
const raw_ngram_t &  raw_ngram,
words_range_t  context_range 
)

Stores wide context for given N-gram.

Parameters:
ngram Pointer to ngram instance.
raw_ngram Related raw N-gram object.
context_range Valid context range to be processed.

template<typename _Type>
size_t ace::bits_in ( _Type  expr  )  [inline]

Parameters:
expr Expression.
Returns:
Number of nonzero bits in given expression.
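A possible implementation of bits_in for unsigned types, using Kernighan's trick: expr & (expr - 1) clears the lowest set bit, so the loop runs once per 1-bit. The technique is a swapped-in choice, not necessarily what PACE does.

```cpp
#include <cassert>
#include <cstddef>

// Counts the 1-bits of an unsigned value by clearing the lowest set bit
// until nothing is left.
template <typename Type>
std::size_t bits_in(Type expr) {
    std::size_t count = 0;
    while (expr) {
        expr &= static_cast<Type>(expr - 1);
        ++count;
    }
    return count;
}
```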

const char * ace::dependency2cstring ( dependency_index_t  dependency_index  ) 

Converts given index to string representation of dependency value.

The index must be valid! Otherwise some kind of range error is raised.

Parameters:
dependency_index Index to be converted.
Returns:
String representation of given index.

std::string ace::double2str ( double  num,
unsigned  precision = 0 
)

Converts given floating point number to string.

Parameters:
num Number to become the string.
precision Output number precision.
Returns:
String representation of given number.

size_t ace::extract ( std::ifstream &  input_file,
NamedDataFileStats &  stats 
)

Procedure extracts N-grams from given input datafile and counts file stats.

Parameters:
input_file Reference to input datafile.
stats Reference to statistics container (to be filled with file stats).

void ace::extract_subtrees ( ntree_t &  nodes,
tree_size_t  subtree_size,
mapped_subtrees_t &  results 
)

Extracts indexed subtrees of given size from passed indexed-tree.

For the definition of indexed-tree see the header file comment. For the definitions of ntree_t and mapped_subtrees_t see their typedef descriptions.

Parameters:
nodes Reference to container of NTreeNode instances.
subtree_size Determines size of extracted subtrees.
results Reference to the container for the resulting subtrees.

template<typename _Type>
_Type ace::ff ( void   )  [inline]

Returns:
Expression with all bits set to 1.

template<typename _Type>
bool ace::ith_bit ( _Type  expr,
size_t  i 
) [inline]

Parameters:
expr Expression.
i Index.
Returns:
True if the i-th bit of the given expression is 1, false otherwise.
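Sketches of ff and ith_bit for unsigned types. The casts matter for types narrower than int: ~ and >> operate on the promoted int value, so the result of ~ must be converted back to Type. Bit 0 is assumed to be the least significant bit.

```cpp
#include <cassert>
#include <cstddef>

// Value of an unsigned type with every bit set.
template <typename Type>
Type ff() {
    return static_cast<Type>(~Type(0));
}

// True if bit i (0 = least significant) of expr is 1.
template <typename Type>
bool ith_bit(Type expr, std::size_t i) {
    return ((expr >> i) & 1u) != 0;
}
```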

bool ace::match_single_rule ( const std::string &  rule,
const std::string &  value_to_match 
)

Returns true if the value to match (VTM) matches the rule (R).

Note: If the string sizes differ, only the corresponding substring of the longer one is considered. VTM matches R if and only if every character of R, except the insignificant ones, matches its counterpart in VTM.

Parameters:
rule Rule (R) to match against.
value_to_match Value (VTM) to be matched.
Returns:
True if value_to_match matches rule, false otherwise.
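A sketch of the matching rule as documented, assuming the "corresponding substring" is the common prefix of both strings. The documentation does not say which character is insignificant; '?' is a hypothetical choice here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical insignificant (wildcard) rule character.
const char kInsignificant = '?';

// Compares the common prefix; each significant rule character must equal the
// value character at the same position.
bool match_single_rule(const std::string &rule, const std::string &value_to_match) {
    const std::size_t len = std::min(rule.size(), value_to_match.size());
    for (std::size_t i = 0; i < len; ++i)
        if (rule[i] != kInsignificant && rule[i] != value_to_match[i])
            return false;
    return true;
}
```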

bool ace::operator< ( const NTreeNode &  lhs,
const NTreeNode &  rhs 
) [inline]

Comparison function for NTreeNode(s).

Comparison is based on NTreeNode indices.

Parameters:
lhs Left hand operand.
rhs Right hand operand.
Returns:
True if the left NTreeNode is less than the right NTreeNode.

bool ace::operator< ( const NGram &  lhs,
const NGram &  rhs 
)

Less than operator for NGrams.

Parameters:
lhs Left hand argument.
rhs Right hand argument.
Returns:
True if the first NGram is less than the second one.

bool ace::operator== ( const NGram &  lhs,
const NGram &  rhs 
)

Equal to operator for NGrams.

Parameters:
lhs Left hand argument.
rhs Right hand argument.
Returns:
True, if both NGrams are equal.

void ace::parse_params ( int  argc,
char **  argv 
)

Procedure parses program params and initializes members of settings namespace.

The params are the same as for main().

Parameters:
argc Number of params.
argv Pointer to array (of size argc) with pointers to C-strings.

void ace::print_usage ( void   ) 

Procedure prints program usage information.

void ace::process ( filenames_t &  files_to_process,
size_t  files_total_size 
)

Procedure processes the input datafiles from the given list of filenames.

Extracts and stores n-grams.

Parameters:
files_to_process Container with filenames to process.
files_total_size Size (in chars) of all files (if non-zero, progress will be reported).
Exceptions:
fatal_error If fatal error happens while processing.

bool ace::read_and_check_input_datafiles ( filenames_t &  files_to_process,
size_t &  files_total_size 
)

Reads filenames of input datafiles (from given file).

Parameters:
files_to_process Container for storing the retrieved input filenames.
files_total_size Reference to total files size counter.
Returns:
True if the filenames were read successfully, false otherwise.

template<typename _Type>
_Type ace::set_ith_bit ( _Type  expr,
size_t  i,
bool  on = true 
) [inline]

Parameters:
expr Expression.
i Index.
on Set to 1 (true) or to 0 (false)?
Returns:
The given expression with its i-th bit set to 1 (on == true) or to 0 (on == false).
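A sketch of set_ith_bit for unsigned types: a one-bit mask is ORed in to set the bit, or its complement is ANDed in to clear it. Because expr is passed by value, the caller's variable is unchanged; the modified copy is returned.

```cpp
#include <cassert>
#include <cstddef>

// Returns a copy of expr with bit i set to 1 (on == true) or cleared to 0.
template <typename Type>
Type set_ith_bit(Type expr, std::size_t i, bool on = true) {
    const Type mask = static_cast<Type>(Type(1) << i);
    return on ? static_cast<Type>(expr | mask) : static_cast<Type>(expr & ~mask);
}
```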

bool ace::startup_init ( void   ) 

Initializes whatever has to be initialized on startup.

Returns:
True, if initialization ends successfully, false otherwise.

dependency_index_t ace::string2dependency ( std::string &  str  ) 

Converts given string to index representation of dependency value.

If the given string is not found, the special index (dependency::Undef) is returned.

Parameters:
str String to be converted.
Returns:
Index representation of given string.
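The three dependency helpers (_init_str2dep, string2dependency, dependency2cstring) amount to a bidirectional mapping between labels and small indices. The sketch uses a shortened, hypothetical label table in place of the real 118-entry dependency_to_cstring_tab, assumes index 0 stands for dependency::Undef, and takes a const reference where the original takes std::string&.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

typedef unsigned char dependency_index_t;
typedef std::map<std::string, dependency_index_t> string_to_dependency_t;

// Hypothetical, shortened label table; index 0 plays dependency::Undef.
const std::vector<std::string> kDependencyNames = {"Undef", "Sb", "Obj", "Atr"};
const dependency_index_t kUndef = 0;

// Builds the reverse (label -> index) mapping once, like _init_str2dep().
string_to_dependency_t init_str2dep() {
    string_to_dependency_t m;
    for (std::size_t i = 0; i < kDependencyNames.size(); ++i)
        m[kDependencyNames[i]] = static_cast<dependency_index_t>(i);
    return m;
}

// Unknown labels map to kUndef.
dependency_index_t string2dependency(const std::string &str) {
    static const string_to_dependency_t str2dep = init_str2dep();
    string_to_dependency_t::const_iterator it = str2dep.find(str);
    return it == str2dep.end() ? kUndef : it->second;
}

// std::vector::at throws std::out_of_range for an invalid index.
const char *dependency2cstring(dependency_index_t index) {
    return kDependencyNames.at(index).c_str();
}
```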

std::string ace::unsigned2str ( size_t  num  ) 

Converts given unsigned number to string.

Parameters:
num Number to become the string.
Returns:
String representation of given number.
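Sketches of unsigned2str and double2str built on std::ostringstream. How the real double2str treats precision = 0 is not documented; here it is assumed to mean default stream formatting, while a positive value selects fixed notation with that many decimals.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <sstream>
#include <string>

std::string unsigned2str(std::size_t num) {
    std::ostringstream out;
    out << num;
    return out.str();
}

// Assumption: precision 0 keeps the default formatting.
std::string double2str(double num, unsigned precision = 0) {
    std::ostringstream out;
    if (precision > 0) {
        out.setf(std::ios::fixed, std::ios::floatfield);
        out.precision(precision);
    }
    out << num;
    return out.str();
}
```

In C++11 and later, std::to_string covers the unsigned2str case directly.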

void ace::write ( std::ofstream &  output_file  ) 

Procedure writes extracted ngrams with their frequency count to given output file in plain text (tab-separated) format.

Parameters:
output_file Reference to an opened ofstream object which is bound to the output file.

void ace::write_contexts ( std::ofstream &  output_file  ) 

Procedure writes extracted ngrams contexts to given output file.

Parameters:
output_file Reference to an opened ofstream object which is bound to the output file.

void ace::write_morphologic_filter_stats ( std::ofstream &  output_file  ) 

Procedure writes morphologic filtration statistics to given output file.

Parameters:
output_file Reference to an opened ofstream object which is bound to the output file.

void ace::write_stats ( std::ofstream &  output_file  ) 

Procedure writes datafiles statistics to given output file.

Parameters:
output_file Reference to an opened ofstream object which is bound to the output file.


Variable Documentation

const size_t ace::_bits_per_member[_max_n+1] = { 0, 32, 16, 11, 8, 7, 6, 5, 4 }

Initial value:

 { 
    0x00, 
    0x7C, 
    0x0C, 
    0x1C, 
    0x0C, 
    0x04, 
    0x00, 0x00, 0x00 
}
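For N from 1 to 8, the _bits_per_member entries { 0, 32, 16, 11, 8, 7, 6, 5, 4 } equal ceil(32 / N), which suggests (an assumption) that a 32-bit hash value is divided among the N members, rounded up. The arithmetic can be checked directly; ceil_32_over is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Table copied from the _bits_per_member declaration.
const std::size_t bits_per_member[9] = {0, 32, 16, 11, 8, 7, 6, 5, 4};

// ceil(32 / n) for n >= 1, in integer arithmetic.
std::size_t ceil_32_over(std::size_t n) { return (32 + n - 1) / n; }
```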

const size_t ace::_dp_shift[_max_n+1]

Initial value:

 {
    0, 
    24, 
    12, 
    6, 
    4, 
    4, 
    0, 0, 0 
}

const size_t ace::_lemma_bitmasks[_max_n+1]

Initial value:

 {
    0x00, 
    0x00FFFFFF, 
    0x00000FFF, 
    0x000000FF, 
    0x0000003F, 
    0x0000003F, 
    0x0000003F, 
    0x0000001F, 
    0x0000000F, 
}

Maximum N for which hashing function bitmask tables are prepared.

const size_t ace::_parent_bitmasks[_max_n+1]

Initial value:

 {
    0x00, 
    0x03, 
    0x03, 
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00 
}

const char ace::fields_separator = '\t'

Field separator used in the (tab-separated) CSV-style output.
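A sketch of emitting one output record with fields_separator. The column contents are hypothetical; a field containing a tab would corrupt the record, so values are assumed to be tab-free.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

const char fields_separator = '\t';

// Writes the fields separated by fields_separator and ends the record with '\n'.
void write_row(std::ostream &out, const std::vector<std::string> &fields) {
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i > 0)
            out << fields_separator;
        out << fields[i];
    }
    out << '\n';
}
```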


Generated on Wed Aug 6 23:25:50 2008 for PACE by  doxygen 1.5.6