<P> Once a machine has digitized the basic blocks of sound, it holds a series of numbers that describe a wave, and waves in turn describe words. Each frame holds one unit block of sound, which is broken into basic sound waves and represented by numbers; after a Fourier transform, these numbers can be evaluated statistically to determine which class of sounds the frame belongs to. Each node in the figure on the slide represents a feature of the sound, and features of the wave are passed from the first layer of nodes to the second layer on the basis of statistical analysis. This analysis depends on the programmer's instructions. The second layer of nodes then represents higher-level features of the sound input, which are again evaluated statistically to see which class they belong to. The last level of nodes should be output nodes that tell us, with high probability, what the original sound really was. </P> <Ul> <Li> Search to match the neural-network output scores for the best word, to determine the word that was most likely uttered. </Li> </Ul> <P> Speech recognition can become a means of attack, theft, or accidental operation. For example, activation words like "Alexa" spoken in an audio or video broadcast can cause devices in homes and offices to start listening for input inappropriately, or possibly to take an unwanted action. Voice-controlled devices are also accessible to visitors to the building, or even to people outside the building if they can be heard inside. Attackers may be able to gain access to personal information, such as calendar and address book contents, private messages, and documents. They may also be able to impersonate the user to send messages or make online purchases. </P>
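<P> The frame-to-word pipeline described earlier (frames, Fourier transform, two layers of nodes, output scores, search for the best word) can be sketched roughly as follows. This is a minimal illustration, not a real recognizer: the vocabulary, the frame length, the layer sizes, and every function name are hypothetical, the weights are random rather than trained, and averaging the per-frame scores is a simplified stand-in for the search step. </P>

```python
import cmath
import math
import random


def frame_signal(samples, frame_len):
    """Split digitized samples into consecutive, non-overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def dft_magnitudes(frame):
    """Magnitude spectrum of one frame (naive discrete Fourier transform)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def dense(inputs, weights, biases):
    """One layer of nodes: each node is a weighted sum of its inputs."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(rng, n_out, n_in):
    """Untrained placeholder weights (assumption: a real system learns these)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def classify_frame(frame, hidden, output):
    """Spectrum features -> first layer -> output layer -> class probabilities."""
    features = dft_magnitudes(frame)
    hidden_values = [max(0.0, v) for v in dense(features, *hidden)]  # ReLU
    return softmax(dense(hidden_values, *output))


def best_word(frame_probs, vocabulary):
    """Simplified search: average the frame scores and pick the top word."""
    count = len(frame_probs)
    avg = [sum(p[i] for p in frame_probs) / count
           for i in range(len(vocabulary))]
    idx = max(range(len(vocabulary)), key=avg.__getitem__)
    return vocabulary[idx], avg[idx]


VOCABULARY = ["yes", "no", "stop"]   # hypothetical word classes
FRAME_LEN = 16                       # hypothetical frame length in samples

rng = random.Random(0)
hidden_layer = random_layer(rng, 8, FRAME_LEN // 2 + 1)
output_layer = random_layer(rng, len(VOCABULARY), 8)

# Synthetic input: a sine wave with 3 cycles per frame, 4 frames long.
samples = [math.sin(2 * math.pi * 3 * t / FRAME_LEN)
           for t in range(FRAME_LEN * 4)]

frames = frame_signal(samples, FRAME_LEN)
frame_probs = [classify_frame(f, hidden_layer, output_layer) for f in frames]
word, score = best_word(frame_probs, VOCABULARY)
print(word, round(score, 3))
```

<P> Because the weights are random, the word printed carries no meaning; the sketch only shows how data flows from frames to spectra to layer outputs to a final choice. A practical system would train the weights on labeled speech and would search over word sequences rather than averaging scores. </P>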

What is step four in the core attributes of speech?