I bought the book with much anticipation since I am a regular reader of Mr. Geitgey’s posts. The book did not disappoint. I particularly enjoyed the introductory section on neural networks, especially the lucid description of forward and back propagation. I have read many references on the web and have taken the famous machine learning course by Andrew Ng, but none of them explained how a neural network works as well as Machine Learning is Fun did.
The code examples are easy to read and well organized.
The VMware virtual machine is a nice touch.
I would have liked to see more discussion about adversarial neural networks and generative neural networks. In addition, more details of commonly used optimization algorithms such as gradient descent would have been welcome.
Finally, a section on how to install the several libraries mentioned would also be handy.
Speech Recognition
The role of bridging technology in speech-recognition
If you are a screen reader user and want to use speech-recognition, then you will be asked to use bridging technology. As of this writing, you have two options.
- J-Say with Dragon Naturally Speaking
- System Access with Windows speech-recognition
This post is not about which technology is better. One question that is often asked is whether you can use speech-recognition without bridging technology. The answer is yes, you can, but there are things that bridging technology helps you to do. You need to decide if these things matter to you. Most of my experience has been with J-Say, so the points below relate to that technology.
- Reading the training text
- This is true of DNS version 8 and I suspect this has not changed for Dragon version 12.5. Reading the training text was very difficult. The difficulty was in knowing what text to speak next. I had to use the Jaws color reporting feature, accessed via Jaws key + 5, to check the color of the text.
- Screen reader control
- Bridging technology gives you screen reader control so you can use your voice to do screen reading tasks. I have found that tasks such as commanding the computer to speak the contents of a dialog box, line or paragraph work nicely but scrolling and editing text is better done by the keyboard.
- Checking the status of the microphone
- When using speech-recognition, it is crucial to know if the microphone is off or on. Bridging technology like J-Say may give you an easier way of determining this.
There are advantages to not using any bridging technology. The main advantage is that you are not tied to any screen reader or other product, so you can update as soon as updates are released. You do get third party programs such as KnowBrainer which give you commands that make formatting easier, but these products are not particularly screen reader compatible. For example, KnowBrainer works by pressing keys to execute various commands. That gets noisy quickly when using a screen reader. The other advantage is cost. It is significantly cheaper to buy just Dragon Naturally Speaking, and Windows speech recognition is built into Windows.
Controlling Jaws for Windows through Dragon NaturallySpeaking using J-Say
This post is going to walk you through the steps of controlling Jaws for Windows from Dragon NaturallySpeaking. You cannot speak the keystrokes to control Jaws for Windows directly since the Jaws keyboard driver runs at a lower level than Dragon’s keyboard driver. There are two ways you can control Jaws from Dragon. One way is to use the COM interface that Jaws provides. The second way is to call Jaws scripts. I am going to be showing the second way since I’m assuming that you have J-Say technology installed.
The first thing you need to do is to determine the name of the script you want to call. You can see the name of the script from the Jaws scripting manager or by looking at the Jaws keyboard manager. Once you have that information, you can begin writing your command. We will use the example of initiating a Jaws tandem session. This is a globally defined Jaws script and is not in the set of current J-Say commands. A Jaws tandem session is initiated by calling the StartOrEndTandemSession script.
I am going to assume that the StartOrEndTandemSession script is installed and working.
1. Go to the desktop.
2. From the tools menu of Dragon, invoke the “add new command” dialog. If you have Dragon running, say “add command”.
3. Dictate or type a name and a description.
4. Make the command a global command as well as an advanced scripting command.
Note:
On my computer, the combo box to select the type of the command is not spoken automatically. You may have to use the Jaws read current line command to read the value that has been selected in the combo box.
5. Keep tabbing until you reach an edit area where you can type your script.
6. A set of begin and end statements will already be inserted for you.
7. Between them, enter the following line.
DllCall "MSGW1004","JFWRunScript","StartOrEndTandemSession"
Note:
I do not know what MSGW1004 stands for but I suspect, after reading the help, that it is a DLL name.
8. Now, activate the Jaws cursor and click the save button. This is found at the bottom of the scripting edit area.
9. Assuming there are no errors, you will be back at the desktop or perhaps in the command browser.
10. In any case, your command is now ready, so all you need to do is speak the name of the command and it should work.
You can have as many DllCall statements as you want in a script. The key thing to remember is that the Jaws and Dragon scripts should be synchronized; that is, they should be loaded at the same time. If the Dragon commands are loaded and the Jaws script files are absent, you will get an “unknown script call” message when you speak the relevant Dragon command.
Let us look at another example. I have created a macro in Outlook that deletes messages that have the same thread. I need to invoke this macro when I say “delete thread”. The Dragon command is below. I have added comments to each statement.
Sub Main
    DllCall "MSGW1004","JFWRunScript","SpeechOff" 'turn Jaws speech off
    SendDragonKeys "{alt+f8}" 'invoke the macro execution dialog
    SendDragonKeys "{alt+r}" 'only 1 macro in Outlook so run it
    DllCall "MSGW1004","JFWRunScript","SpeechOn" 'enable Jaws speech
End Sub
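For comparison, the first route mentioned at the start of this post, the COM interface, can be sketched from outside Dragon. This is a rough Python sketch and not something I have tested against every Jaws version. It assumes that Jaws registers a COM object under the ProgID "FreedomSci.JawsApi", that the object has a RunScript method which returns a true value on success, and that the pywin32 package is installed. The helper names connect_to_jaws and run_jaws_scripts are made up for illustration.

```python
from typing import Any, Iterable, List, Optional


def connect_to_jaws() -> Any:
    """Return a COM automation object for Jaws (Windows only).

    Assumptions, not stated in the post: Jaws registers the ProgID
    "FreedomSci.JawsApi" and the pywin32 package is installed.
    """
    import win32com.client  # imported lazily so this file loads on any system

    return win32com.client.Dispatch("FreedomSci.JawsApi")


def run_jaws_scripts(script_names: Iterable[str], api: Optional[Any] = None) -> List[str]:
    """Run each named Jaws script in order and return the names that failed.

    `api` can be any object with a RunScript(name) method, which makes it
    possible to try the logic without Jaws installed.
    """
    jaws = api if api is not None else connect_to_jaws()
    failed: List[str] = []
    for name in script_names:
        # Assumption: RunScript returns a true value when the script was found and run.
        if not jaws.RunScript(name):
            failed.append(name)
    return failed
```

On a machine with Jaws running, run_jaws_scripts(["StartOrEndTandemSession"]) would be the equivalent of the first Dragon command above. Returning the list of failed names is a design choice that makes the "unknown script call" situation visible to the caller instead of silent.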
From the Keyboard to the Microphone: A More Natural Way of Computing
This is my article in Dialogue magazine (http://www.blindskills.com/dialogue.html). It deals with migrating from using the keyboard to using speech-recognition.
The article also makes a special reference to screen readers and other bridging technology like J-Say.
The article has been published in the November-December issue.
Conversing with your computer: challenges, solutions and the road ahead
By: Pranav Lal
A paper I presented at the National Conference on Information and Communication Technologies (ICT) at Ahmedabad, held on 19 and 20 September 2008
Introduction
When we think of conversing with our computer, pictures of HAL in 2001: A Space Odyssey come to mind. However, the technology for doing this is here and is being used heavily in specialist applications. Namely, speech-recognition engines have been connected to screen readers to allow you to converse with your computer and control it at the same time. This paper will discuss these technologies. It will highlight the unique challenges in connecting screen readers to speech-recognition applications and elucidate how they have been overcome. The paper will focus largely on the Windows operating system from Microsoft since most of the solutions are Windows based.
Before the challenges are discussed, it is important to understand how these technologies work in isolation. When a blind user interacts with a computer, he uses a program called a screen reader. The screen reader, via a speech synthesizer, converts the output of the computer into speech. The screen reader has a series of commands that the user can use to read various parts of the screen. For example, the user can press a keystroke to read a paragraph of text. The screen reader also has to track what is happening on the computer to handle things like popups. For example, if another application such as a CPU temperature monitor pops up a message, the screen reader must interrupt whatever it is speaking and speak that message. If the user has set the reader to ignore such warnings, the screen reader will behave accordingly. The screen reader uses a variety of techniques to monitor what is happening on the computer. For instance, it taps into the Windows messaging subsystem and tracks the messages Windows is sending to different applications. If the screen reader detects a message of significance, it traps and processes it. The screen reader also deals with complex screen layouts and has to render the World Wide Web in a comprehensible form. All screen readers read from left to right and top to bottom. The user has tremendous flexibility on what he wants read and at what time.
Speech-recognition programs, on the other hand, work in reverse; that is, they convert the spoken word into text and commands. A large number of them use hidden Markov models to determine word probabilities and output the most probable match of a word on the screen. Along the way, the speech-recognition program has to take into account the acoustics of the environment, the pronunciation and accent of the user, and so on.
In theory, interlinking a screen reader with a speech-recognition program should not be very difficult. As long as the speech-recognition program uses standard controls, a screen reader would be able to interact with it without any problems. This is indeed the case. However, as we will see in the subsequent section, this does not give the complete picture. There are several other elements that need to be taken into account when these technologies are interlinked. In addition, interlinking these technologies requires a lot of time and effort, and anyone setting out to do this must have a sustainable business model. This paper will also examine the current business model being followed by such companies. Finally, the paper will give a quick glimpse into the future of this bridging technology.
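To make the reference to hidden Markov models a little more concrete, here is a toy Python sketch of Viterbi decoding, the standard way of finding the most probable hidden sequence in such a model. It is heavily simplified: the hidden states are whole words and the observations are made-up acoustic labels ("spk", "lyn"), whereas real recognizers work with sub-word units and acoustic features. Every probability in the tables is invented for illustration and does not come from any product.

```python
from typing import Dict, List, Sequence

Probs = Dict[str, float]


def viterbi(observations: Sequence[str], states: Sequence[str], start_p: Probs,
            trans_p: Dict[str, Probs], emit_p: Dict[str, Probs]) -> List[str]:
    """Return the most probable hidden-state sequence for the observations."""
    if not observations:
        return []
    # best[t][s]: probability of the likeliest path that ends in state s at step t
    best: List[Probs] = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    # back[t][s]: the state that preceded s on that likeliest path
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: best[t - 1][p] * trans_p[p][s])
            best[t][s] = best[t - 1][prev] * trans_p[prev][s] * emit_p[s][observations[t]]
            back[t][s] = prev
    # Walk backwards from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Invented toy model: two candidate words and two acoustic labels.
STATES = ("speak", "line")
START_P = {"speak": 0.8, "line": 0.2}
TRANS_P = {"speak": {"speak": 0.1, "line": 0.9},
           "line": {"speak": 0.5, "line": 0.5}}
EMIT_P = {"speak": {"spk": 0.7, "lyn": 0.3},
          "line": {"spk": 0.2, "lyn": 0.8}}

if __name__ == "__main__":
    print(viterbi(["spk", "lyn"], STATES, START_P, TRANS_P, EMIT_P))
```

The point of the sketch is that the recognizer does not pick each word in isolation. A single ambiguous sound is resolved by the prior and transition probabilities as well as by how well the sound matches each word.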
The challenges of integrating screen readers and speech-recognition programs
1. Technical challenges
There are several technical challenges that need to be overcome to link screen readers to speech-recognition programs. For example, a speech-recognition program usually works at a higher level in the Windows operating system than a screen reader, so there is no easy way for the speech-recognition program to control the screen reader. This means that the user would not be able to give voice commands to control the screen reader, which prevents the user from achieving hands-free control of the computer. Another technical challenge relates to providing suitable feedback to the user on what the speech-recognition program is doing. Whenever a speech-recognition program encounters speech that it cannot recognize, it visually signals the user. For example, Dragon NaturallySpeaking, a leading speech-recognition product from Nuance, shows three question marks on the screen when it cannot recognize what the user is saying. The screen reader has to be able to track when this occurs and give the user suitable feedback without interrupting anything else that is going on. Finally, the technologies need to work seamlessly across different hardware and software combinations.
2. The challenge of usability
Another challenge that must be overcome for successful integration of speech-recognition with screen reading is usability. That is, how easy is it for the user to actually use both technologies? Each of these technologies, taken alone, can be extremely complex. The challenge lies in masking the complexities of these technologies to allow the user seamless interaction with the computer.
3. Environmental challenges
The use of a screen reader means that the computer on which the user is working needs to be able to handle both screen reading and speech-recognition applications. Both these applications require considerable amounts of CPU processing power and RAM. More than that, they also require high-quality sound sources to ensure that simultaneous input and output can be handled.
4. The challenge of development
If a third-party developer has to take on the challenges above, he has to keep up with developments in both technologies. On top of that, the market for this kind of technology is quite fragmented since every customer has her own needs. This involves a fair amount of customization of the given technology to meet the individual requirements of the customer. For example, there could be one customer who would just be using a simple set of productivity applications. There could be another customer who would need a significant degree of customization of both technologies to handle challenging applications such as those used in call centers. Finally, there could be a third customer who would want to experiment with different applications and who would need the ability to extend both technologies at will.
5. Market economics
Any company building such a solution would need to ensure that it had a very strong business model. A lot of research is required in building such a solution. Once that research has been carried out, continuous testing is also required to ensure that the solution works as expected. Finally, teams of specialists are required to install the product and to train people on how to use this kind of technology.
Some solutions
The discussion around the solutions necessarily has to be product based. There is no one framework or strategy that can be used to marry speech-recognition applications and screen readers. The current solution on the market is the J-ware line of products from T&T Consultancy Ltd. Their flagship product is called J-Say. J-Say combines Dragon NaturallySpeaking and Jaws for Windows into a seamless and consistent solution. Jaws for Windows is one of the foremost screen readers on the market. The reason it lends itself so well to integration with speech-recognition is its extremely powerful scripting language. So, let us see how J-Say addresses the challenges we listed above.
1. Technical challenges
The precise details of J-Say’s implementation remain confidential. J-Say relies heavily upon a complex set of Jaws for Windows scripts to provide the necessary oral and Braille-based feedback to the visually impaired user and to create the many special utilities and programs J-Say has available. The Jaws scripting language contains a full programming interface for creating access to applications or automating specific routines. Using this programming language, J-Say interacts with Microsoft Windows through its windows hierarchy, and makes extensive use of the Microsoft Active Accessibility implementation and the Object Model code found within Microsoft applications such as Word, Outlook and Internet Explorer. In this way, J-Say is able to report information back to the user accurately by interacting directly with the operating system or application rather than relying upon screen-based data, which can become temporarily obscured. A minimal amount of scripting is undertaken using Dragon NaturallySpeaking’s macro creation capability. To create a voice command, the voice phrase is specified within the Dragon NaturallySpeaking Command Browser. The Jaws script that will carry out the action the user has in mind is then linked to the voice command via a simple routine called from a DLL file.
2. The challenge of usability
J-Say has adopted a minimal speech approach. This approach cuts both ways: it minimizes the amount of verbal feedback the user receives from Jaws for Windows, and the user is able to accomplish any given task with the minimum of speech input. This has been done by designing several easy to remember commands that provide complete control to the user. For example, the command to read the current line on the screen is simply “speak line”. Similarly, the command to reboot the computer is “restart the computer”. There is also extensive help available while using J-Say, so the user does not really have to remember anything except perhaps the command to invoke the help facility. J-Say also takes into account the shift that occurs when using speech-recognition to carry out everyday tasks. For example, it is easy to select text when using a keyboard. All that the user has to do is hold down the shift key and move the arrow keys. This approach is very tedious when using speech-recognition since, firstly, there is no easy way to hold down a modifier key and, secondly, even the slightest error in utterance could lead to the selected text being replaced with junk text. J-Say has a unique selection facility that allows the user to mark the beginning of a block of text. The user can then use any means at her disposal to navigate to the end of the text and mark it. Finally, the user can issue commands to cut or copy the text to the Windows clipboard.
3. Environmental challenges
There is no getting around the need for a powerful computer when using this kind of bridging technology. Using a computer with low hardware specifications will lead to suboptimal results. At the time of this writing, it is recommended that separate sound sources are used for input of speech and output of synthesized speech. This will ensure that there is no cross talk between these sources. Finally, a high quality microphone is required so that it only picks up the user’s voice and does not get confused by the speech synthesizer’s speech and other environmental noise.
4. The challenge of development
Fortunately, Dragon NaturallySpeaking and Jaws for Windows are very extensible. Both have powerful scripting languages and APIs that allow third party developers to customize both solutions. J-Say allows the user to take advantage of this extensibility by exposing its interface, with which the user can call Jaws for Windows scripts from Dragon NaturallySpeaking. Calling a Jaws script is a matter of executing a single routine from a Dragon NaturallySpeaking script. In addition, T&T Consultancy and the J-Say user community are exceptionally supportive of anyone who is attempting to customize J-Say.
5. Market economics
The adaptive technology industry is littered with examples of companies that had terrific products and growing customer bases but failed. One of the causes of this failure has been the companies’ focus on a single disability. Though J-Say is the flagship product of T&T Consultancy, they do not have it as their sole focus. They have created other products such as the scripts for iTunes. They are also a reseller of several adaptive products. In addition, they have altered their business model to focus on other disabilities besides vision and are actively promoting their products as cross disability products. This has helped them to get a significantly larger market share. They have also elected to go with distributors who are not dedicated to serving only a single kind of customer. For example, their distributor for North America and Canada also supplies products to the medical community. A significant part of T&T’s income also comes from training and onsite support, which are recurring activities. This has made the company less dependent on new sales, which can be slow due to the nature of the access technology market. Finally, T&T listens very closely to its customers. The CEO actively participates on the user list and trains users himself. In addition, the beta testers of the product cover a large cross section of customers. J-Say is not restricted to a single speech-recognition solution. J-Say technology has also been applied to Windows Vista speech-recognition in the form of the J-vist product. This product also meets the needs of customers who would rather use speech-recognition for dictation and control the computer using other means.
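The mark-based selection facility described under the challenge of usability can be illustrated with a small Python model. Since the details of J-Say’s implementation are confidential, this is only a conceptual sketch of the idea of marking a start point, navigating by any means, and then cutting or copying. The class name MarkSelection and all of its methods are hypothetical and do not correspond to any J-Say or Jaws command.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MarkSelection:
    """Hypothetical model of mark-then-navigate selection in a text buffer."""

    text: str
    cursor: int = 0
    mark: Optional[int] = None
    clipboard: str = ""

    def mark_start(self) -> None:
        """Remember the current cursor position as one end of the block."""
        self.mark = self.cursor

    def move_to(self, position: int) -> None:
        """Stand-in for any navigation method; clamps to the buffer bounds."""
        self.cursor = max(0, min(position, len(self.text)))

    def _span(self) -> Tuple[int, int]:
        if self.mark is None:
            raise ValueError("no start mark has been set")
        return min(self.mark, self.cursor), max(self.mark, self.cursor)

    def copy(self) -> str:
        """Copy the text between the mark and the cursor, then clear the mark."""
        start, end = self._span()
        self.clipboard = self.text[start:end]
        self.mark = None
        return self.clipboard

    def cut(self) -> str:
        """Remove the text between the mark and the cursor and return it."""
        start, end = self._span()
        self.clipboard = self.text[start:end]
        self.text = self.text[:start] + self.text[end:]
        self.cursor = start
        self.mark = None
        return self.clipboard
```

In this model nothing is selected while the user navigates, so a misrecognized word spoken between the two marks cannot overwrite a live selection, and no modifier key has to be held down. That is the property the paper attributes to the facility.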
The road ahead
It is not possible to outline specific developments that would take place in this area. However, some general comments can be made. Speech-recognition is gradually being applied in various devices. We see it most frequently in our ability to dictate the assigned name of a contact into our mobile phones. Similarly, speech synthesis is also catching up. Again, note the synthetic voice announcing the name of the caller in a number of mobile phones. In the long run, speech as a mode of input will probably replace the keyboard since it is far more natural for a person. For this to happen, though, a significant number of technical challenges need to be met, the discussion of which is beyond the scope of this paper. In terms of economics, small companies with innovative products will succeed in the market. Some of these products may be niche products initially, so the scale of operations could be small, but over time, as the word spreads and technology improves, they will become commonplace.
Acknowledgements
The author would like to thank the following people for their invaluable contributions: Brian Hartgen of T&T Consultancy Limited, the lead developer of J-Say, for his help with the technical explanation of J-Say; and Edward S. Rosenthal, President and CEO of Next Generation Technologies Inc., for his help with the perspective on speech-recognition and other suggestions for this paper.
References
- T&T Consultancy, the makers of J-Say
- Next Generation Technologies, master distributor of J-Say especially for the USA
- Freedom Scientific, the makers of Jaws for Windows
- Nuance, the makers of Dragon NaturallySpeaking