I had to receive my cousin at the airport. Her flight was landing at 23:45 IST. I have always been concerned about the reaction of security personnel to my video glasses. This time, I was not going inside the terminal building, so I decided to wear my Vuzix M400 device and take a chance.
I need not have worried. No one gave me a second look, or if they did, that look did not translate into any conversation or interference.
The airport was well lit and I was able to sense people moving ahead of me. Sometimes, I saw patches of relative darkness followed by patches of light and then some kind of object like a railing or a sign. This meant that people had clustered around said obstacle and then moved away.
The OCR feature came in handy in reading the various signs.
One thing to remember, at least at Terminal 3 of the Indira Gandhi International Airport, is that the gate numbers are written above the gates, so you need to raise your head to look at them unless you are tall. It is also possible to tilt the lens of the Vuzix device upwards, but that did not help as much as I expected it to.
There were also times when only partial signs were read. In such cases, pan your head until you get something recognizable in focus. You may also need to venture closer to the sign. It is better to frame the sign in the center of your view and then move closer.
Do watch out for people and other moving obstacles like trolleys while moving.
After reading the above pages, you may well ask: what is the point of the vOICe, especially when your phone can do as good a job of reading text?
For one thing, you will need your phone to make phone calls and to coordinate with the person you are picking up.
In addition, the visual context changes almost constantly in such an environment and in case you are moved by the crowd, you will need a mechanism to reestablish visual context. This is where an immersive setup comes in handy.
In this case, I was able to see the sign of a money exchange and was able to tell my cousin where I was standing. I did have sighted assistance with me, and since there were multiple money exchanges, we eventually ended up also referencing an ATM, which did the trick. The sign for the ATM was harder to read, probably because it was in a smaller font or maybe because I was a little further from it.
So, what about those Ray-Ban smart glasses? They could have helped here, but as of this writing, I would need to keep querying them constantly to account for the changing context. Their feature for describing what is around you would have been useful, but I do not know whether they would have read the text on the objects all around me without my sending a lot of questions to the bot. Nor would I have been able to identify the changes in landscape as we moved between the parking area and the airport terminal.
I do plan to use the vOICe inside the airport next, but I will have to see when I get the opportunity to do that.
When the unthinkable happens
My father died on 17 August in hospital. I am not going to elaborate on my emotions in this post.
I wanted to experience every minute of his last physical journey. He was placed on a stretcher which was probably made of bamboo. I rode with his body in the hearse.
I do have my organic memories of that ride but wanted something tangible.
Thanks to Dr. Meijer and the vOICe, I was able to see his body and the flowers it had been draped with as we drove to the crematorium.
I now have a physical and audio / visual memory of him.
Moreover, if I want to see a photograph, I have the ability to do so.
Once again, I am grateful for Dr. Meijer’s design decision that allows the vOICe to run on virtually any modern device. I did not have my video glasses with me; all I had was my iPhone 13 mini. However, in a matter of a minute, by entering the URL of the vOICe web app, I had sight.
My father was excited about AI and what it meant for me. I am placing a soundscape of the body here, with an unedited description generated by ChatGPT above the audio element.
The image shows a close-up of a person who appears to be in a medical setting. The individual has a tanned complexion with visible signs of aging and is resting with their eyes closed. They are wrapped in white medical fabric, similar to gauze or a thin blanket, which envelops most of their head and neck area. There is a blue pad or cloth near the top of their head, possibly a part of a hospital bed or medical equipment.
The person has a white, cotton-like material inserted into their nostrils, commonly used to stop bleeding or to keep the nostrils clear. There are a few small red marks, possibly bruise or minor wound, on the visible side of their face. Their expression is relaxed, suggesting they are unconscious or sleeping.
The image’s surroundings are out of focus and not clearly defined, emphasizing the person and their immediate condition. The color palette consists mainly of blues, whites, and skin tones.
There is no visible text in the image providing additional details or context. The user interface from an image viewer is visible, showing options for editing and information, including the file name “IMG_1921.HEIC” and the file size of “1.9 MB”. The zoom level in the viewer
Reader’s Note
The image in the soundscape is without any of the screen elements mentioned in the above description. I had opened the image in the Windows 11 photo viewer and used the AI Content Describer add-on to get a description of the image from ChatGPT.
Interrogating images for maximum accessibility
Until fairly recently, image accessibility was a no-go unless you could print images in tactile form. If you had the source data, you could generate them with labels assigned, which again needed human intervention.
In some cases, you could use sonification which is still the case and indeed, it remains the only way to “experience” an image on the fly.
However, thanks to large language models and their ability to process images, we can now ask questions of these confounded sets of pixels.
I use the NVDA screen reader, for which there are three add-ons that generate image descriptions.
XPoseImageCaptioner
This add-on has been written by Christopher Millsap. It can generate image descriptions from JPEG and PNG files. It works offline; that is, once the add-on is installed, you do not need to be connected to the internet to generate image descriptions.
It works best for scenes such as photographs of beaches, monuments and so on, and it does not have OCR capabilities. It is extremely fast because no network access is required.
In my experience, the descriptions are sufficiently detailed that you get a better than average understanding of the scene.
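I do not know which model the add-on uses internally. For the curious, here is a minimal Python sketch of offline captioning in general, using a publicly available BLIP model from Hugging Face; the model name and the file name photo.jpg are my own choices, not the add-on's internals.

# Minimal offline captioning sketch; this is NOT the add-on's code, just an illustration.
# Assumes: pip install transformers torch pillow, and a local image called photo.jpg.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("photo.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(output_ids[0], skip_special_tokens=True))

The model is downloaded once; after that, everything runs locally, which is the same property that makes the add-on so fast.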
Cloud Vision
This add-on has been written by Aleks Samos.
It uses the Google Vision API to generate image descriptions. It can also read QR codes, though this is something I have not tried.
The descriptions that are generated are good, and if there is text in the image, that text will also be rendered. Read the help for usage instructions. The home page of this add-on is in Russian, though its help as rendered in the add-on store is in English. A handy feature is that it can place the description in a virtual buffer, letting you review it at leisure or copy it into another document.
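The add-on handles all of the API plumbing for you. As a rough sketch of what direct calls to the Google Vision API look like from Python, assuming the google-cloud-vision package and Google credentials are already configured, and with photo.jpg as a placeholder file name:

# Rough sketch of calling the Google Vision API directly; the add-on does this for you.
# Assumes: pip install google-cloud-vision and GOOGLE_APPLICATION_CREDENTIALS is set.
from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("photo.jpg", "rb") as f:
    image = vision.Image(content=f.read())

# Labels give a rough description of the scene.
labels = client.label_detection(image=image).label_annotations
print("Labels:", ", ".join(label.description for label in labels))

# Text detection returns any text found in the image; the first annotation is the full text.
texts = client.text_detection(image=image).text_annotations
if texts:
    print("Text in the image:", texts[0].description)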
AI Content Describer
This add-on has been written by Carter Temm. It sends the image to be described to OpenAI’s ChatGPT.
Its descriptions are detailed, and text in the image is also rendered. In many cases, the output is like a human reading the image back to you, so you get the text in context. You can specify the initial prompt that you send with the image. This gives you the ability to tweak ChatGPT’s answer the way you want.
E.g., I wanted to analyze a form that I was filling out. Its check boxes were rendered in a non-standard way such that the screen reader detected them as buttons. Here is the prompt I used.
“Describe this image and if it is a screen shot of a form, tell me which fields have errors and or are rendered in a different color or are highlighted in red”
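The add-on sends something like this on your behalf. For the curious, here is a sketch, and only a sketch, of how an image plus a custom prompt can be sent to OpenAI from Python; the model name, file name and packaging are my assumptions, not the add-on's internals.

# Sketch of sending an image plus a custom prompt to OpenAI; this is NOT the add-on's code.
# Assumes: pip install openai and the OPENAI_API_KEY environment variable is set.
import base64
from openai import OpenAI

client = OpenAI()
with open("form_screenshot.png", "rb") as f:  # placeholder file name
    encoded = base64.b64encode(f.read()).decode("utf-8")

prompt = ("Describe this image and if it is a screen shot of a form, tell me which "
          "fields have errors and or are rendered in a different color or are highlighted in red")

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; use whichever vision-capable model you have access to
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{encoded}"}},
        ],
    }],
)
print(response.choices[0].message.content)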
There is an unbound keystroke that captures a screenshot and sends it to OpenAI.
You can open each description in its own dialog box.
The hardest part about using this add-on is getting the OpenAI API key. You need to deal with the OpenAI interface. It is not bad, but it takes some exploring to find where to buy credits. In addition, when I added my credit card, I was charged. This is a verification step that OpenAI performs, and the charge was reversed. I was confused by the charge, thinking that I had bought credit, but that was not the case.
All three of the above add-ons are screen-reader dependent and consequently platform dependent.
Enter Vision
This is a Python script written by Rastislav Kish. You can specify the image file or files that you need described. You can also specify the prompt that you want to send to ChatGPT; however, the default prompt usually works.
The setup of this script is more involved, with you having to install two Python packages and add an environment variable for your OpenAI API key. However, the instructions are easy to follow.
This is the only solution so far that allows you to ask ChatGPT follow-up questions about the specified image. It is screen reader independent and can also act as a command-line ChatGPT client.
So far so good, but how about an example? See the image below, which I took on a recent trip to the Maldives.
I ran all the above solutions with their default prompts on the above image and here is what I got.
XPoseImageCaptioner
a beach with a body of water and a pier in the distance with footprints in the sand.
Cloud Vision
sunset on the tropical beach
Beach ridge: Sea
outdoor, nature, cloud, beach, water, sky, ground, horizon, sand, shore, mudflat, calm, coast, coastal and oceanic landforms, landscape, sunrise, sunset, ocean.
a beach with a pier and a sunset.
AI Content Describer
This image captures a tranquil beach scene at twilight. The photo is taken from the shore, looking out across calm waters towards the horizon where the sky exhibits warm hues of yellow and orange near the setting sun, transitioning to soft blues higher up. A few scattered clouds are visible in the sky, adding texture and depth to the vista. On the right, there’s a silhouette of a pier or structure extending into the water, with reflections glimmering on the surface. The beach in the foreground is scattered with footprints, indicating recent activity, and the waves gently lap at the sand. The overall atmosphere is serene, evoking the quiet calm of an evening by the sea. The image is also tilted, giving a dynamic feel to the composition.
Vision
See the conversation here.
I ran the script on my desktop computer, which runs Linux. I have done minor formatting to improve clarity.
[pranav@archlinux ~]$ ~/vision/vision IMG_1542.jpg
The image captures a tranquil beach scene at twilight. The sun appears to be setting, casting a warm glow across the sky with colors ranging from blue to orange. Gentle waves are lapping at the shore, and you can see footprints scattered on the smooth sand, indicating people have walked there recently. On the right, extending into the sea, there appears to be a row of overwater bungalows or structures, which suggests this may be a resort or a vacation area. In the background, there are several boats on the horizon, further giving the impression of a seaside locale popular for recreation or tourism. The overall effect is peaceful and picturesque, emphasizing the natural beauty of a beachside sunset.
Enter your message describe the boats
In the image, there are a few distant boats visible on the horizon. Due to the distance and resolution of the image, detailed descriptions of the boats are limited. However, they appear as small silhouetted shapes against the backdrop of the lighter sky reflecting on the water. These could likely be leisure boats or vessels used for fishing or transport, common in coastal areas with tourist activities. The boats are too far away to discern any specific details such as the type, size, or features of the boats.
Conversation price: 0.02433
>
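None of the above is Rastislav's actual code. As a rough sketch of how such a follow-up loop can be built on the OpenAI API, the trick is simply to keep the image and every answer in the conversation history so that later questions refer back to them; the model name and file name below are placeholders.

# Rough sketch of a follow-up-question loop; NOT the actual vision script.
# The image stays in the message history, so later questions can refer back to it.
# Assumes: pip install openai and OPENAI_API_KEY is set; IMG_1542.jpg is a placeholder.
import base64
from openai import OpenAI

client = OpenAI()
with open("IMG_1542.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{encoded}"}},
    ],
}]

while True:
    reply = client.chat.completions.create(model="gpt-4o", messages=messages)
    answer = reply.choices[0].message.content
    print(answer)
    messages.append({"role": "assistant", "content": answer})
    question = input("Enter your message: ")
    if not question:
        break
    messages.append({"role": "user", "content": question})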
Conclusion
We are in a better place than even a year ago with respect to access to images. Of course, the descriptions are only as good as the models we have and the prompts we use. In addition, I am hoping for more integration of large language models with image processing tools such that we can get coordinates of specific parts of images. This will allow us to focus on our areas of interest and then to do further analysis such as passing an area of the image through a specialist tool like a sonification program to experience just that part.
Sharing in national pride thanks to the Chandrayaan-3 landing
The Chandrayaan-3’s landing on the moon is old news. Many of us rushed to the television to watch the live feed from ISRO.
Like most Indians, I felt it was a proud moment, given that landing on the south pole of the moon had not been accomplished before.
I wanted to watch a clean video and did try watching the one on YouTube. Subsequently, I found the video from ISRO, which I was able to watch thanks to The vOICe.
I was able to sense the change in the image as the lander got closer and closer to the lunar surface.
For best results, do the following.
- Launch the vOICe for windows first.
- Activate client window sonification with Ctrl+F9.
- Apply 4-fold zoom with F4.
- Turn off high contrast if you want to. (Thanks to Dr. Meijer for this tip.)
- Load the video and switch to full screen view.
Enjoy the extremely short video. Note the change in perspective as the lander gets closer.
For comparison, watch and/or listen to the audio-described video at the link below.
Described video of the Chandrayaan-3 landing
Note:
The video is in Hindi.
The low vision plunge
The start of 2023 has been a rocky one, with Sacheta and me being hurled into the low vision world like never before. She has always had low vision thanks to glaucoma, but it has reduced further due to damage to her optic nerve. I am not going to talk about the causes behind the damage but rather the technical adaptations that we have had to make and are still making. My idea behind writing this post is to help others in our situation.
The first thing to know about low vision is that the characteristics of each person's sight are radically different, so what works for me will not work for you. The settings and adaptations are unique and may change from day to day or from month to month. This is something that you will have to figure out on your own. There may be rehabilitation professionals who can help with some of this, but as of this writing, I have not found any in India who have substantive knowledge in this regard. Moreover, there is a huge emotional component which remains unaddressed.
Sacheta has always had low vision but its characteristics have altered significantly.
The change in the kind of light
Sacheta used to prefer strong white light. We have LEDs across the house.
She now prefers a mix of white and yellow lights. Strong white light hurts her eyes.
Extreme contrast sensitivity
People who have glaucoma suffer from reduced contrast sensitivity. In her case, even moving between rooms is now a huge issue, as her eyes take time to adapt to the change in the levels of light.
The conundrum of natural light
Sacheta sees better in natural light, but bright sunlight virtually destroys her vision for about 6 to 7 hours. Her vision becomes blurry and effectively non-functional.
A lack of depth perception
Given the above changes, she has difficulty gauging depth, so detecting stairs, especially those going down, is a challenge, and the same applies to judging distances.
The Solutions
Tinted glasses
Sacheta now carries three pairs of glasses: her regular glasses, a tinted pair to better handle indoor light-level transitions, and a pair of dark glasses. She keeps changing glasses as needed.
We did consider photochromic lenses, but those would not help in her situation because they only darken in the presence of strong sunlight.
A brighter monitor
Sacheta is just about able to use the screen of her laptop. She used a big screen in any case, but we have had to invest in a brighter monitor. As of this writing, we have finalized the Samsung LS27AG300N monitor.
We did consider monitors from BenQ but settled on the Samsung because of its greater brightness. A BenQ may have worked, and for all we know might have worked better, but its eye-comfort technology reduces brightness, which makes it difficult for her to see text on the screen.
The 27-inch size was determined by Sacheta. We could have gotten a wider monitor, but that is what worked for her.
Increasing the use of screen reading technology
Sacheta has begun to learn the NVDA screen reader and is using it heavily in Microsoft Outlook and Microsoft Word.
The same applies to Google TalkBack.
She does not use them like a totally blind individual would, but they reduce eye strain significantly because she does not need to read text in detail. She mouses to where she has to go, and the fact that the screen reader speaks the text under the mouse helps her know where she is. She is still learning how to go mouse-less. She continues to use the Windows Magnifier.
The good news is that NVDA works seamlessly with the Windows Magnifier, and there is a winMag add-on that adds functionality to NVDA to support various features of the Windows Magnifier and avoid key clashes, such as when navigating within tables.
We have also tried various visual options like inverting colors, inverting the color of only the mouse pointer and various high-contrast themes. The Aquatic theme worked the best for her.
One thing to watch out for is that when you invert colors or change the theme, many icons look different. This became uncomfortable for Sacheta, so she changed the theme and inversion settings back to their defaults.
However, when she is doing work that involves a lot of reading, she does change the theme, but once again, it is a matter of choice.
TalkBack has been a little easier to grasp because there are fewer keystrokes, and she has learnt the basic gestures for navigation. Using the magnifier is also possible and does not involve any change in gestures. One thing that helped her learn the gestures was a tactile book and directions from the excellent personnel at IIT. She is also making heavy use of the Google Assistant to interact with her OnePlus 9 Pro phone. She also uses TalkBack on our Sony Bravia television.
Using the cane
She has begun to carry the standard white cane. This is to help when she cannot see steps in bright sunlight and to also signal to the rest of the world that she may need assistance.
The kitchen
Cooking is a significant part of our lives, and Sacheta has not let this reduction in vision stop her. She has made lighting changes to the kitchen, such as installing different lights above the kitchen slabs. One more thing we are planning to try is using spacefelt tags for labelling food and other items.
Other solutions
We have tried the CCTV cameras from Vision Aid and would also like to try the following solutions, which are not available in India.
eSight eyewear
Oxsight Onyx glasses
We have other solutions as well, such as the vOICe and ara, but we propose to try them a little later. The speed of change needs to be regulated.
A note on lifestyle changes
While I have documented technology solutions in the above post, implementing them involves significant lifestyle changes. Do not underestimate this aspect. E.g., if we go out and the light levels are unusual, Sacheta and I take significantly more help from people around us than before. This loss of independence takes getting used to.
In addition, she has to be extremely careful when travelling in a car during the day, because exposing her eyes to bright sunlight will cause her vision to become non-functional.
Getting help
There are several hospitals that have low vision units, but they were not able to provide any solutions beyond magnifiers and some experimental video glasses. We received tangible help from the following entity.
ASSISTECH group of IIT Delhi
Using WPS with a MikroTik router from the command line
I had to use the WPS functionality on my MikroTik router to pair a specialized device, namely ara.
The only way to connect this device to the internet is by using the WPS functionality.
I am blind and therefore do not use WinBox or the router’s web interface because of their inaccessibility. What I mean is that these interfaces do not yield sufficient information to my screen reading technology for me to navigate and activate the relevant elements like links and buttons.
This is why I went the command-line route. Here is what you need to do. First, enter the wireless interface menu.
/interface wireless
We are not going to concern ourselves with the hardware WPS button, because it is usually multi-function and it is hard to track the lights on the router that indicate the device has entered WPS mode.
We will take the wlan1 interface as an example. To see a list of interfaces, run the following command.
print
Suppose we want to enable wps by pressing a virtual button on the router on wlan1, here is what we do.
set wlan1 wps-mode=push-button-virtual-only
The above command will instruct the router to enable the virtual button that can be triggered from software.
We will now actually activate wps.
wps-push-button wlan1
The above command places the wlan1 interface on the router into WPS server mode. This is when you initiate device pairing.
A date with an indigenous haptic smartwatch
I recently read about a smartwatch that researchers at IIT Kanpur had developed. This watch gives feedback via haptics, that is, by touch.
They gave me a chance to test the prototype and I am writing this article with their permission. Be warned, this unit is not commercially available.
Why the watch
My first question was: why a specialized gadget for the blind when so many mainstream watches work? The answer was one I should have expected: cost. The watch that has been created costs several orders of magnitude less than existing watches.
Mind you, this is an accessible smartwatch, so not only can the blind use it, but it is also one of the few smartwatches that deafblind people can use out of the box.
How does it work?
This is where the real innovation begins.
- The face of the watch has 12 tactile markers that vibrate and also respond to pressure.
- You move your finger along the face of the watch just like you do in a tactile Braille watch.
- As you move your finger to each marker, it vibrates according to the time. Two long vibrations mean that the marker represents the hour.
- Two short vibrations mean that the marker corresponds to the minutes.
- Let me illustrate this with an example. Suppose the time is 03:20. You move your finger to the 3 o'clock position on the face and feel two long vibrations. You move further to the marker corresponding to 20 and feel two short vibrations.
- What about a time like 03:19? You will feel the two short vibrations on the marker corresponding to 15, and if you want greater resolution, you long press that marker and count the number of vibrations: in our example, you will feel 4 of them, that is, 15 + 4 = 19. (See the sketch after this list for the arithmetic.)
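To make the arithmetic concrete, here is a tiny Python sketch of how a time could map onto the twelve markers and the extra long-press vibrations. This is purely my illustration and not the watch's firmware.

# Illustration of the marker arithmetic described above; not the watch's firmware.
def markers_for_time(hour, minute):
    """Return (hour marker, minute marker, extra long-press vibrations)."""
    hour_marker = hour % 12 or 12       # 12 markers on the face, so 0 and 12 map to the 12 o'clock position
    minute_marker = (minute // 5) * 5   # nearest 5-minute marker at or below the minute
    extra = minute - minute_marker      # counted out when you long press the minute marker
    return hour_marker, minute_marker, extra

# 03:19 -> hour marker 3, minute marker 15, plus 4 vibrations on a long press (15 + 4 = 19)
print(markers_for_time(3, 19))

Running it for 03:19 prints (3, 15, 4): hour marker 3, minute marker 15, and four extra vibrations on a long press.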
You may be wondering about other features, like heart rate monitoring.
You invoke the menu by pressing the crown, or at least that is how it is planned. You then have haptic icons. E.g., to measure your heart rate, you place your finger on the corresponding marker, which is at the 12 o'clock position, and feel a vibration like a heartbeat. That is a haptic icon for you.
The same applies to other functions, like measuring your breathing.
Another area in which this watch shines is that there is no need for a mobile phone to configure its basic functions. A mobile app is in the works; however, all the functions can be configured directly from the watch.
The road ahead
The researchers would like to do more tests and are keen to meet potential users across India. In addition, they are also talking to manufacturers to bring the watch to market.
It is possible to attach a speaker as well but as of this writing, this has not been done.
Acknowledgements
My thanks to Professor Siddhartha Panda and Vishwaraj Srivastava, the creators of the watch, for answering my e-mails and questions and for facilitating my seeing the watch.