author    Daniel Povey    2017-08-21 18:06:11 -0500
committer Daniel Povey    2017-08-21 18:06:11 -0500
commit    92dad5b6542bdbe0ef876b73cebbd16c391c56bf (patch)
tree      a5bd34103a7a96e6ad810c1491af723b63febd80
parent    91d47c1dc4cee5b28a2c13b1c63bbce8ecc8a54d (diff)
[doc] Add some terms to glossary
 src/doc/glossary.dox | 30
 src/doc/graph.dox    | 84
 2 files changed, 72 insertions, 42 deletions
diff --git a/src/doc/glossary.dox b/src/doc/glossary.dox
index 8bb8283e5..e1b3c01f7 100644
--- a/src/doc/glossary.dox
+++ b/src/doc/glossary.dox
@@ -50,11 +50,41 @@ contain alignment information as sequences of transition-ids for each word
sequence in the lattice. The program \ref bin/show-alignments.cc "show-alignments" shows
alignments in a human-readable format.

<b>:cost:</b> Any quantity that is used as a 'cost' in a weighted-FST
algorithm (e.g. acoustic cost, language-model cost; see \ref lattices
for more details). Costs are, generally speaking, interpretable as the
negative log of a likelihood or probability, but there may be scaling
factors involved.
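For instance (an illustration of those scaling factors; the exact convention
varies from tool to tool), the total cost of a path through a decoding graph
typically has the form
\f[ \mathrm{cost}(W) = -\log p_{\mathrm{LM}}(W) \,-\, \alpha \log p(X|W), \f]
where \f$ W \f$ is the word sequence, \f$ X \f$ the acoustic features, and
\f$ \alpha \f$ the acoustic scale.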

<b>:forced alignment:</b> see <b>alignment</b>.

<b>:lattice:</b> A representation of alternative likely transcriptions of an utterance, together
with associated alignment and cost information. See \ref lattices.

<b>:likelihood:</b> A mathematical concept: the value of a function representing
the distribution of a continuous quantity (a probability density function),
evaluated at a particular point. Unlike probabilities, likelihoods can be
greater than one. They are often represented in log space (as log-likelihoods)
because the likelihood values of multi-dimensional features are often too small
or too large to fit in standard floating-point precision. With standard
cross-entropy-trained neural-net systems we obtain "pseudo-likelihoods" by
dividing the probabilities the network outputs by the priors of the
context-dependent states.
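Concretely (a standard identity in hybrid systems, stated here for reference):
for a frame \f$ x_t \f$ and context-dependent state \f$ s \f$, the
pseudo-likelihood is
\f[ \log \tilde{p}(x_t|s) = \log p(s|x_t) - \log p(s), \f]
where \f$ p(s|x_t) \f$ is the network's output and \f$ p(s) \f$ is the prior
of state \f$ s \f$, typically estimated from the training alignments.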

<b>:posterior:</b> "Posterior" is shorthand for "posterior probability", a very
general mathematical concept which generally means "the probability of some random
variable after seeing the relevant data". Posteriors will, in general, sum to one.
In Kaldi terminology, if you encounter the term "posterior", often abbreviated to "post",
without further explanation, it generally means the per-frame posterior probability of
transition-ids. These posteriors may be very peaky (i.e. mostly ones and zeros)
depending on how you obtained them, e.g. from a lattice or from an alignment.
Alignments and lattices can be converted to posteriors over transition-ids
(see \ref ali-to-post.cc and \ref lattice-to-post.cc respectively), and lattices
to posteriors over lattice arcs (see \ref lattice-arc-post.cc). Posteriors over
transition-ids can be converted to posteriors over pdf-ids or over phones; see
the tools \ref post-to-pdf-post.cc and \ref post-to-phone-post.cc.
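As a concrete statement of the normalization property: if \f$ \gamma_t(i) \f$
denotes the posterior of transition-id \f$ i \f$ on frame \f$ t \f$, then
\f$ \sum_i \gamma_t(i) = 1 \f$ for each frame, whether the posteriors came
from a lattice (soft, via forward-backward) or from an alignment (hard, with
a single 1 per frame).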

<b>:pdf-id:</b> The zero-based integer index of a clustered context-dependent HMM state; see
\ref transition_model_identifiers for more information.

<b>:transition-id:</b> A one-based index that encodes the pdf-id (i.e. the clustered context-dependent HMM state),
the phone identity, and information about whether we took the self-loop or forward transition in the HMM.
Transition-ids appear in lattices, decoding graphs and alignments. See \ref transition_model.

diff --git a/src/doc/graph.dox b/src/doc/graph.dox
index 849abd3e7..cf0f99acb 100644
--- a/src/doc/graph.dox
+++ b/src/doc/graph.dox
@@ -24,35 +24,35 @@ namespace fst {
\page graph Decoding graph construction in Kaldi

We cannot hope, on this page, to properly introduce finite state transducers
and how they are used in speech recognition. For that, see
<a href=http://www.cs.nyu.edu/~mohri/pub/hbka.pdf> "Speech Recognition
with Weighted Finite-State Transducers" </a> by Mohri, Pereira and
Riley (in Springer Handbook of Speech Processing, 2008).
The general approach we use is as described there, but some of the details,
particularly with respect to how we handle disambiguation symbols and how
we deal with weight-pushing, differ.

\section graph_overview Overview of graph creation

The overall picture for decoding-graph creation is that we
are constructing the graph HCLG = H o C o L o G (a minimal code sketch of the
first composition stage follows the list below). Here

 - G is an acceptor (i.e. its input and output symbols are the same)
   that encodes the grammar or language model.
 - L is the lexicon; its output symbols are words and its input
   symbols are phones.
 - C represents the context-dependency: its output symbols are phones
   and its input symbols represent context-dependent phones, i.e. windows
   of N phones; see \ref tree_window.
 - H contains the HMM definitions; its output symbols represent context-dependent
   phones and its input symbols are transition-ids, which encode the pdf-id and
   other information (see \ref transition_model_identifiers).

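To make the composition concrete, here is a minimal sketch of the first stage
(building LG) using plain OpenFst calls. This is an illustration, not the
actual Kaldi recipe: the real thing is driven by utils/mkgraph.sh, which uses
tools like fsttablecompose, fstdeterminizestar and fstminimizeencoded, and the
filenames here are just illustrative.
\code
#include <fst/fstlib.h>

int main() {
  fst::StdVectorFst *G = fst::StdVectorFst::Read("G.fst");
  fst::StdVectorFst *L = fst::StdVectorFst::Read("L_disambig.fst");
  if (G == NULL || L == NULL) return 1;

  // Composition requires one side to be label-sorted.
  fst::ArcSort(L, fst::OLabelCompare<fst::StdArc>());

  fst::StdVectorFst LG, det_LG;
  fst::Compose(*L, *G, &LG);      // L o G
  fst::Determinize(LG, &det_LG);  // determinizable thanks to the
                                  // disambiguation symbols in L.
  fst::Minimize(&det_LG);         // note: OpenFst's Minimize() pushes weights;
                                  // Kaldi's fstminimizeencoded avoids this.
  det_LG.Write("LG.fst");
  delete G;
  delete L;
  return 0;
}
\endcode
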
This is the standard recipe. However, there are a lot of details to be
filled in. We want to ensure that the output is determinized and minimized,
and in order for HCLG to be determinizable we have to insert disambiguation
symbols. For details on the disambiguation symbols, see
\ref graph_disambig below.

We also want to ensure that HCLG is stochastic, as far as possible; in the
conventional recipes, this is done (if at all) with the "push-weights"

@@ -72,7 +72,7 @@ namespace fst {
as long as G was stochastic. Of course, G will typically not be quite
stochastic, because of the way ARPA language models with backoff are represented in FSTs,
but at least our approach ensures that the non-stochasticity "stays
put" and does not get worse than it was at the start; this approach avoids
the danger of the "push-weights" operation failing or making things worse.

\section graph_disambig Disambiguation symbols

@@ -80,7 +80,7 @@ namespace fst {
Disambiguation symbols are the symbols \#1, \#2, \#3 and so on that
are inserted at the end of phoneme sequences in the lexicon. When
a phoneme sequence is a prefix of another phoneme sequence in the lexicon,
or appears in more than one word, it needs to have one of
these symbols added after it. These symbols are needed to ensure that
the product L o G is determinizable. We also insert disambiguation symbols
in two more places. We have a symbol \#0 on the backoff arcs in the

@@ -90,22 +90,22 @@ namespace fst {
at the beginning of the utterance (before we start outputting symbols).
This is necessary to fix a rather subtle problem that happens when we have
words with an empty phonetic representation (e.g. the beginning and
end of sentence symbols \<s\> and \</s\>).

We now outline how we would formally prove that the
intermediate stages of graph compilation (e.g. LG, CLG, HCLG) are determinizable;
this is important in ensuring that our recipe never fails. By determinizable,
we mean determinizable after epsilon removal. The general setup is:
first, we stipulate that G must be determinizable. This is why we need the \#0
symbols (G is actually deterministic, hence determinizable). Then we want L to
be such that for any determinizable G, L o G is determinizable. [The same
goes for C, with L o G on the right instead of G.] There are a lot of details
of the theory still to be fleshed out, but I believe it is sufficient for L to have two
properties:
 - \f$ L^{-1} \f$ must be functional
   - equivalently: any input-sequence on L must induce a unique output-sequence
   - equivalently: for any linear acceptor A, A o L is a linear transducer or empty.
 - L must have the twins property, i.e. there are no two states, reachable
   with the same input-symbol sequence, that each have a cycle with the
   same input sequence but different weight or output sequence.


@@ -117,7 +117,7 @@ namespace fst {
The \ref ContextFst object (C) is a dynamically created
FST object that represents a transducer from context-dependent phones to
context-independent phones. The purpose of this object is to avoid us having
to enumerate all possible phones in context, which could be difficult when
the context width (N) or the number of phones is larger than normal. The
constructor ContextFst::ContextFst requires the context-width (N) and
central-position (P), which are further explained in \ref tree_window; their

@@ -125,7 +125,7 @@ namespace fst {
It also requires the integer id of a special symbol, the "subsequential
symbol" (referred to as '$' in the reference given above), which the FST outputs
N-P-1 times after all phones are seen (this ensures that the context FST
is output-deterministic). In addition it requires lists of the
integer ids of the phones and of the disambiguation symbols. The vocabulary
on the output side of the ContextFst consists of the set of phones and
disambiguation symbols plus the subsequential symbol. The vocabulary on the

@@ -135,14 +135,14 @@ namespace fst {
place of epsilon in the "traditional recipe", but which we treat like any other
disambiguation symbol (it is necessary to ensure determinizability in the sense
we prefer, i.e. with epsilon removal). The subsequential symbol '$' has nothing corresponding
to it on the input side (this is also the case in the traditional recipe).
The symbol ids corresponding to the disambiguation
symbols on the input side are not necessarily the same integer identifiers as used on the output
side for the corresponding symbols.
The ContextFst object has a function ILabelInfo()
which returns a reference to an object of type std::vector<std::vector<int32> > that
enables the user to work out the "meaning" of each symbol on the input side. The
correct interpretation of this object is described further in \ref tree_ilabel.

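As a hedged sketch of how such an object might be constructed (the argument
order and exact signature here are illustrative, not authoritative; see
fstext/context-fst.h for the real interface, and note that the symbol ids
below are made up):
\code
#include "fstext/context-fst.h"

std::vector<int32> phones = {1, 2, 3};        // integer ids of the phones
std::vector<int32> disambig_syms = {10, 11};  // output-side ids of #0, #1
int32 subseq_symbol = 12;                     // id chosen for '$'
// Triphone context: context-width N = 3, central-position P = 1.
fst::ContextFst<fst::StdArc> C(subseq_symbol, phones, disambig_syms,
                               3 /* N */, 1 /* P */);
\endcode
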
There is a special "Matcher" object called ContextMatcher which is intended to be
used in composition algorithms involving the ContextFst (a Matcher is something that OpenFst's

@@ -162,14 +162,14 @@ namespace fst {
We deal with the whole question of weight pushing in a slightly different way from
the traditional recipe. Weight pushing in the log semiring can be helpful in speeding
up search (weight pushing is an FST operation that ensures that the arc probabilities
out of each state "sum to one" in the appropriate sense).
However, in some cases weight pushing can have bad effects. The issue is that
statistical language models, when represented as FSTs, generally "add up to more than one",
because some words are counted twice (directly, and via backoff arcs).

We decided to avoid ever pushing weights, and instead we handle the whole issue in a different
way. Firstly, a definition: we call an FST "stochastic" if its weights sum to one; the
reader can assume that we are talking about the log semiring here, not the tropical one,
so that "sum to one" really means sum, and not "take the max".
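Formally (restating this definition as a formula): an FST is stochastic in
the log semiring if, for every state \f$ s \f$,
\f[ \sum_{a \in \mathrm{arcs}(s)} e^{-w(a)} \;+\; e^{-\rho(s)} \;=\; 1, \f]
where \f$ w(a) \f$ is the cost on arc \f$ a \f$ and \f$ \rho(s) \f$ is the
final cost of state \f$ s \f$ (infinite if \f$ s \f$ is not final).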
We ensure that each stage of graph creation has the property that if the previous stage
was stochastic, then the next stage will be stochastic. That is: if G is stochastic,

@@ -179,12 +179,12 @@ namespace fst {
stochasticity". Now, it would be quite possible to do this in a very trivial but not-very-useful
way: for instance, we could just try the push-weights algorithm and, if it seemed to be failing
because, say, the original G fst summed up to more than one, throw up our hands in horror and
announce failure. This would not be very helpful.

We want to preserve stochasticity in
a stronger sense, which is to say: first measure, for G, the min and max over all its states
of the sum of the (arc probabilities plus final-prob) out of those states. This is what our
program "fstisstochastic" does. If G is stochastic, both of these numbers would be one
(you would actually see zeros from the program, because we actually operate in log space; this is
what "log semiring" means). We want to preserve stochasticity in the following sense: that this
min and max never "get worse"; that is, they never get farther away from one. In fact, this

@@ -194,28 +194,28 @@ namespace fst {
 - Determinization
 - Epsilon removal
 - Composition (with particular FSTs on the left)

There are also one or two minor algorithms that need to preserve stochasticity,
like adding a subsequential-symbol loop.
Minimization naturally preserves stochasticity, as long as we don't do any weight pushing
as part of it (we use our program "fstminimizeencoded", which does minimization without
weight pushing). Determinization preserves stochasticity
as long as we do it in the same semiring that we want to preserve stochasticity in (this means
the log semiring; this is why we use our program fstdeterminizestar with the option
--determinize-in-log=true). Regarding epsilon removal: firstly, we have
our own version of epsilon removal, "RemoveEpsLocal()" (fstrmepslocal), which doesn't guarantee
to remove all epsilons but does guarantee never to "blow up". This algorithm is unusual among
FST algorithms in that, to do what we need it to do and preserve stochasticity, it needs to
"keep track of" two semirings
at the same time. That is, if it is to preserve equivalence in the tropical semiring and
stochasticity in the log semiring, which is what we need in practice,
it actually has to "know about" both semirings simultaneously. This seems to be an edge case
where the "semiring" concept somewhat breaks down.
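
To make the "determinize in the log semiring" idea concrete, here is a hedged
sketch using plain OpenFst weight-conversion mappers (an illustration of the
concept only; fstdeterminizestar additionally removes epsilons as it
determinizes, which this sketch does not):
\code
#include <fst/fstlib.h>

// Determinize a tropical-semiring FST while preserving stochasticity in the
// log semiring: convert the weights to the log semiring, determinize there,
// then convert back to the tropical semiring.
void DeterminizeInLogSemiring(const fst::StdVectorFst &ifst,
                              fst::StdVectorFst *ofst) {
  fst::VectorFst<fst::LogArc> ifst_log, det_log;
  fst::ArcMap(ifst, &ifst_log, fst::StdToLogMapper());  // tropical -> log
  fst::Determinize(ifst_log, &det_log);                 // determinize in log
  fst::ArcMap(det_log, ofst, fst::LogToStdMapper());    // log -> tropical
}
\endcode
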
Composition on the left with the lexicon L, the context FST C and the H transducer (which encodes
the HMM structure) all have to preserve stochasticity. Let us discuss this in the abstract:
we ask, when composing A o B, what are sufficient properties that A
must have so that A o B will be stochastic whenever B is stochastic? We believe these properties are:

 - For any symbol sequence that can appear on the input of B, the inverse of A must accept
   that sequence (i.e. it must be possible for A to output that sequence), and:
 - For any such symbol sequence (say, S), if we compose A with an unweighted linear FST with
   S on its input, the result will be stochastic.

@@ -223,14 +223,14 @@ namespace fst {
These properties are true of C, L and H, at least if everything is properly normalized
(i.e. if the lexicon weights sum to one for any given word, and if the HMMs in H are properly
normalized and we don't use a probability scale). However, in practice in our graph-creation
recipes we use a probability scale on the transition probabilities in the HMMs (similar to the
acoustic scale). This means that the very last stages of graph creation typically don't
preserve stochasticity. Also, if we are using a statistical language model, G will typically
not be stochastic in the first place. What we do in this case is measure, at the start,
how much G "deviates from stochasticity" (using the program fstisstochastic), and during
subsequent graph-creation stages (except for the very last one) verify that the
non-stochasticity does not "get worse" than it was at the beginning.
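
To make that measurement concrete, here is a rough sketch of the quantity a
check like fstisstochastic computes (an illustration, not the program's actual
code): for each state, minus the log of the sum of the arc probabilities plus
the final probability; the minimum and maximum of this over all states are
both zero exactly when the FST is stochastic.
\code
#include <algorithm>
#include <cmath>
#include <fst/fstlib.h>

// Returns, via the output parameters, the min and max over all states of
// -log(sum of arc probabilities plus final probability).
void MeasureStochasticity(const fst::StdVectorFst &f,
                          float *min_over_states, float *max_over_states) {
  *min_over_states = INFINITY;
  *max_over_states = -INFINITY;
  for (fst::StateIterator<fst::StdVectorFst> siter(f); !siter.Done();
       siter.Next()) {
    int s = siter.Value();
    double prob_sum = std::exp(-f.Final(s).Value());  // 0 if s is not final.
    for (fst::ArcIterator<fst::StdVectorFst> aiter(f, s); !aiter.Done();
         aiter.Next())
      prob_sum += std::exp(-aiter.Value().weight.Value());
    float log_sum = -std::log(prob_sum);
    *min_over_states = std::min(*min_over_states, log_sum);
    *max_over_states = std::max(*max_over_states, log_sum);
  }
}
\endcode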

*/