I was at MKT. The restaurant is set up differently from most such establishments. It is on the lower ground floor of the Chanakya mall and boasts a variety of cuisines. You can read about them at the above link. Visually, it is a fascinating place thanks to the live kitchens. I have spent many hours looking at food, and one of the biggest problems with food is the lack of contrast. It is usually difficult to get good light and enough contrast between the food and the plate to see what you are eating. As usual, I was wearing the Vision 800 glasses running the vOICe. Sacheta and I wanted to see their four open kitchens. Once again, I was in a situation where touching was not appropriate because the chefs were stacking food outside, ready for the waiters to pick up. I wanted to see the activity. The kitchens were behind transparent glass, but they did have large windows to allow the cooks to send the food out. I was able to see the clean rising flame of the Italian kitchen and the round pizzas that emerged from it. The nachos were lined up in baskets outside the live Mexican counter. They were cylindrical, though there were some differences in their shapes and in how they were stacked.
We then had a look at the Indian section, where the cook was making chapatis. I could see these as dark disks lying on the counter. This was one of those situations where I used my other senses to zero in on the object of visual interest. I localized the sound of the stacking and then pointed my head in that direction. Visual scanning would have given me the same information, but I treat vision as a multi-sensory process.
We then moved to the Chinese counter where there was some kind of machine and a lot of activity taking place.
I did once again try to watch the cooks at work but was unable to perceive the actual motion of the people. I did notice the rapid changes in the scenery, so I could tell something was happening.
Sacheta took a few videos after seeking permission from the staff. That gave me the chance to stand still and look. Panning my head from side to side also helped, and when looking through a window, panning up and down gives maximum visual coverage. You can see details without panning, but the devil is in the details, and a little interaction brings a lot of clarity.
My special thanks to Mr. Pankaj Mishra, one of the managers of MKT, for being so welcoming. The service is good and, before I forget, the food was excellent.
Pro food tip: do try the house special when it comes to drinks.
Pro vision tip: when your table has a lot of items on it, scan your vicinity for that tall glass of mocktail so that you can grab it in one shot. Blindness techniques work too but scanning is so much cooler.
A review of Machine Learning is Fun! by Adam Geitgey
I bought the book with much anticipation since I am a regular reader of Mr. Geitgey’s posts. The book did not disappoint. I particularly enjoyed the introductory section on neural networks, especially the lucid description of forward and back propagation. I have read many references on the web and have taken the famous machine learning course by Andrew Ng, but none of those references explained how a neural network works as well as Machine Learning is Fun! did.
The code examples are easy to read and well organized.
The VMware virtual machine is a nice touch.
I would have liked to see more discussion about adversarial neural networks and generative neural networks. In addition, more details of commonly used optimization algorithms such as gradient descent would have been welcome.
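Since I mention gradient descent, here is the kind of minimal sketch I have in mind. The one-variable function and the learning rate below are made up purely for illustration and are not taken from the book.

# Minimal illustration of gradient descent: repeatedly step against the gradient.
# The function f(x) = (x - 3)^2 and the learning rate are made-up values for this sketch.
def f(x):
    return (x - 3) ** 2

def f_gradient(x):
    return 2 * (x - 3)  # derivative of f with respect to x

x = 0.0              # starting guess
learning_rate = 0.1
for step in range(50):
    x = x - learning_rate * f_gradient(x)  # move downhill
print(round(x, 4))   # approaches 3, the minimum of f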
Finally, a section on how to install the several libraries mentioned would also be handy.
Shopping for footwear using synthetic vision
Sacheta wanted to buy footwear, so there we were in Kala Niketan, Janpath, on a Saturday evening, looking for bellies and such. As usual, I was wearing my Vision 800 glasses running the vOICe. One advantage of footwear shopping is that the customer needs to sit to try things on. The shop was not too crowded, and we did not have a problem finding a seat. However, before sitting, we looked around the shop while Sacheta asked the staff for what she wanted. There were shelves packed with footwear. I was unable to understand the shapes until I touched a slipper. It had been stacked on its end such that the strap that goes over the top of your foot was facing the customer. This is different from how shoes are stacked, because shoes are kept with their backs to the customer, or at least that was the way they were kept the last time I checked.
The shapes were uniform from afar, and the shelf boundaries formed hard edges in the scene. The shop had a large range of footwear in several colors. Once I was seated, I began to play with the color filter. Blue, green and red did not yield much feedback. I hit pay dirt when I chose orange. This was strange. I could hear shoppers asking for colors like “rose gold”, whatever that is. Where was this color? I don’t think anyone wears orange shoes, but then what do I know? The analyze option came in really handy. I sat back, letting the scan tell me which filters were working. I then enabled the live OCR, which introduced a new wrinkle. As I panned, I heard “fresh stock.” Hmm, what could that mean? Were they referring to the latest fashion? Was someone baking shoes? I could not ask Sacheta because she was engaged with a persistent salesperson.
Once she had completed her shopping, I was able to clear up some of the mysteries. The orange filter worked probably because the shop used a lot of yellow light. The “fresh stock” label meant that there was no sale on those items. Don’t ask me why they could not say this up front.
I did have a chance to look at shoes, and for distinguishing details, my palm remains the best tool. However, with some more practice, I think I will be able to reliably distinguish between shelves full of slippers and shelves where shoes are stacked.
I did not buy anything, so I did not get a chance to try walking, but I can see myself creating a visual landmark out of a shelf or something similar and walking independently to test shoes.
Some may ask, what is the point of it all? The simple way to buy footwear is to ask for what you want and try until you like something. Yes, this approach works well but my scene reader helped me to get a better idea of what was in the shop and thereby make a more informed choice.
Note
Photography is not allowed in the shop so I did not take any images.
Note 2
Be careful when looking and searching for a place to sit. These shops are full of humans sitting and not paying attention to their surroundings. Stay in the middle of the aisle as much as you can. Let the staff guide you to a seat.
House hunting with artificial vision
I was recently looking at places to rent for my mother-in-law. I was wearing the vOICe as my wife and I looked at prospective houses. For those unfamiliar with the Indian real estate market, here are a few things to note:
- Real estate is an investment for many people. Landlords construct houses specifically for renting.
- Many houses are little more than shells with plumbing.
- You need to check carefully because maintenance is bad here. The landlord’s idea is to get the maximum return for the lowest cost.
- Be careful about the furniture because it can be old and unusable.
- Real estate agents seldom give you all the facts about a place up front.
I used passive echolocation to get a sense of the size of the house.
I used my nose to tell how well the air was circulating in the house, which was key to judging ventilation.
I did not want to touch anything because the houses were not being cleaned regularly. The vOICe, however, came in handy for asking about bits of furniture. I was also able to see and ask about the paint on the walls because I was able to perceive texture differences. In addition, I was able to explore independently, supplement my other senses, and check the quality of the doors.
I was unable to confirm visually whether objects were dusty, but if you see something hazy, it is reasonable to assume it is dusty.
I was also able to get an idea of how the common staircase was maintained by scanning the walls. I did feel them while ascending and descending the staircase but the vOICe gave me a larger field of view.
One example of detecting furniture was when I spotted a stack of something. It turned out to be a wall rack cut out of the slab of the open kitchen.
You may well ask me, why not ask my sighted wife to describe things to me? I could have done that, but she was making her own observations, talking to the agent and navigating me. We did, however, have a discussion in the car where I was able to complement some of her observations.
Our hunt continues but I am glad I have the ability to get data about my environment at a distance.
Access to travel the Planet Abled way
Planet Abled invited me to speak at the Access to Travel conference they conducted on 27 September 2017. I went in expecting the usual conference but was pleasantly surprised. It was held at the Park Hotel. The conference experience began just after the metal detectors. I was barely through the device when I was met by a Planet Abled team member and escorted to the venue. I stopped short as I entered the hall. Something was wrong. The light level dropped, and I was in an enclosed space. I am light independent, so I continued through and entered the auditorium. It turned out that the entrance had been reconfigured as a darkness simulator.
Cameras and Planet Abled staff were everywhere. Unlike other conferences, they did not ask me to settle down. There was an active focus on moving around and interacting, and the staff helped. We, the tourists, had a chance to meet key leaders in the travel industry such as Mr. Subash Goyal. The food was good and, importantly, dry. I have attended more than my share of conferences where the food has gravy and is impossible to eat without two working hands.
It is rare to have a discussion about recreation in India. The Access to Travel conference was one of the few events that not only addressed the challenges of traveling with a disability but also let me see what everyone else was doing to have fun.
The most enjoyable part of the conference for me was the emphasis on stories. Each speaker had their own story to tell and time to tell it. It was also very easy to ask speakers questions and to meet those who stayed on after the conference. As always, the speakers were from across disabilities. I returned home with a greater sense of unity. The senses we used to engage with the world were different, but the problems were the same and had the same broad solution, namely making people better humans and treating each other with dignity and respect.
Artificial vision in the enterprise
I recently acquired the Vision 800 smart glasses. This gave me a compact and convenient setup for running the vOICe, and I have had several visual experiences with it. When I wear the glasses, I am effectively wearing an Android tablet on my head. Yes, it would be nice to do multiple things, but given the specifications of the glasses, I use them as a dedicated vision device. I also use bone conduction headphones.
- I am able to read floor numbers as well as other signage. This means that when the fire marshal asks me to exit from gate 3B, I know what he is talking about. In addition, I can navigate the stairs independently and do not need to count floors.
- I can lean forward and see if my laptop has an error message. It is easier talking to the help desk if I can tell them the problem, and many times I can solve it independently.
- I am better at indoor navigation. I am able to tell when silent humans are in the way.
- The camera on the Vision 800 glasses is on the extreme left. I am not used to scanning, so the narrow field of view and the leftward offset do not match my body’s sense of space. This is taking some getting used to.
- I am also still working out the right time to look down.
- I can derive more information about my environment such as detecting flower pots that have been placed on top of filing cabinets.
- Bone conduction headphones are a double-edged sword. Yes, I can hear environmental sounds, but in situations like lunch time in the office cafeteria they are almost useless. I cannot hear the soundscapes unless I increase the volume significantly in the vOICe.
- I have run the glasses for over 8 hours. They do not heat up much.
- I can better handle situations where colleagues leave things on different parts of my table. For example, a colleague heated my lunch along with his. He set my lunch down but forgot to tell me that he had done so. I scanned the table and was able to get hold of my plate.
Post-processing images, including describing them automatically
As most of you know, I publish plenty of images on this blog, and I ensure that all of them are described. The biggest challenge I have in posting photographs here is captioning them: I have to get images described manually before I put them up. Once I take my photographs, I group them by location thanks to the geotagging my phone does. I then send them to people who were on the trip, who describe the images. I have been searching for solutions that describe images automatically. I was thrilled to learn that WordPress had a plugin that used the Microsoft Cognitive Services API to automatically describe images. The describer plugin, however, did not give me location information, so I rolled my own code in Python. I have created a utility that queries Google for the location and the Microsoft Cognitive Services API for image descriptions and writes them to a text file. I had tried to embed the descriptions in EXIF tags, but that did not work and I cannot tell why.
References
You will need an API key from the link below.
Microsoft Cognitive Services API
The WordPress plugin that uses the Microsoft Cognitive Services API to automatically describe images when uploading
Notes
- You will need to keep your Cognitive Services API key alive by describing images at least once every 90 days, I think.
- Do account for Google’s usage limits for the reverse geocoding API.
- In the code, do adjust where the image files you want described live, as well as where you want the log file to be stored.
- Do ensure you add your API key before you run the code.
import glob
import json

import geocoder
import piexif
import requests
from PIL import Image
from PIL.ExifTags import TAGS, GPSTAGS


def _get_if_exist(data, key):
    if key in data:
        return data[key]
    return None


def get_exif_data(fn):
    """Returns a dictionary from the EXIF data of a PIL Image item. Also converts the GPS tags."""
    image = Image.open(fn)
    exif_data = {}
    info = image._getexif()
    if info:
        for tag, value in info.items():
            decoded = TAGS.get(tag, tag)
            if decoded == "GPSInfo":
                gps_data = {}
                for t in value:
                    sub_decoded = GPSTAGS.get(t, t)
                    gps_data[sub_decoded] = value[t]
                exif_data[decoded] = gps_data
            else:
                exif_data[decoded] = value
    return exif_data


def _convert_to_degrees(value):
    """Helper function to convert the GPS coordinates stored in the EXIF to degrees in float format."""
    d = float(value[0][0]) / float(value[0][1])
    m = float(value[1][0]) / float(value[1][1])
    s = float(value[2][0]) / float(value[2][1])
    return d + (m / 60.0) + (s / 3600.0)


def get_lat_lon(exif_data):
    """Returns the latitude and longitude, if available, from the provided exif_data
    (obtained through get_exif_data above)."""
    lat = None
    lon = None
    if "GPSInfo" in exif_data:
        gps_info = exif_data["GPSInfo"]
        gps_latitude = _get_if_exist(gps_info, "GPSLatitude")
        gps_latitude_ref = _get_if_exist(gps_info, "GPSLatitudeRef")
        gps_longitude = _get_if_exist(gps_info, "GPSLongitude")
        gps_longitude_ref = _get_if_exist(gps_info, "GPSLongitudeRef")
        if gps_latitude and gps_latitude_ref and gps_longitude and gps_longitude_ref:
            lat = _convert_to_degrees(gps_latitude)
            if gps_latitude_ref != "N":
                lat = 0 - lat
            lon = _convert_to_degrees(gps_longitude)
            if gps_longitude_ref != "E":
                lon = 0 - lon
    return lat, lon


def getPlaceName(fn):
    """Reverse geocode the image's GPS coordinates into a human readable address using Google."""
    lli = get_lat_lon(get_exif_data(fn))
    g = geocoder.google(lli, method='reverse')
    return g.address


def getImageDescription(fn):
    """Ask the Microsoft Cognitive Services vision API for a one line description of the image."""
    payload = {'visualFeatures': 'Description'}
    files = {'file': open(fn, 'rb')}
    headers = {'Ocp-Apim-Subscription-Key': 'myKey'}  # replace myKey with your own API key
    r = requests.post('https://api.projectoxford.ai/vision/v1.0/describe',
                      params=payload, files=files, headers=headers)
    data = json.loads(r.text)
    return data['description']['captions'][0]['text']


def tagFile(fn, ds):
    """Attempt to write the description into the image's EXIF data."""
    img = Image.open(fn)
    exif_dict = piexif.load(img.info["exif"])
    # The adjacent string literals below concatenate to 'DescriptionComment', which is not
    # an IFD name piexif recognises, so the tag is silently dropped; this is probably why
    # the embedding never worked for me.
    exif_dict['Description''Comment'] = ds
    exif_bytes = piexif.dump(exif_dict)
    piexif.insert(exif_bytes, fn)
    img.save(fn, exif=exif_bytes)


def createLog(dl):
    """Append a line to the log file."""
    with open('imageDescriberLog.txt', 'a+') as f:
        f.write(dl)
        f.write("\n")


path = "*.jpg"  # adjust this to point at the folder where the images you want described live
for fname in glob.glob(path):
    print("processing:" + fname)
    createLog("processing:" + fname)
    imageLocation = ""
    imageDescription = ""
    try:
        imageLocation = getPlaceName(fname) or ""
    except Exception:
        createLog("error in getting location name for file: " + fname)
    try:
        imageDescription = getImageDescription(fname)
    except Exception:
        createLog("error in getting description of file: " + fname)
    imgString = "Description: " + imageDescription + "\n" + "location: " + imageLocation
    createLog(imgString)
    try:
        tagFile(fname, imgString)
    except Exception:
        createLog("error in writing exif tag to file: " + fname)
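On the EXIF front, my best guess is that the tagFile function above fails because 'Description''Comment' is not a key piexif understands. The snippet below is an untested sketch of how the write might be done using piexif's ImageDescription tag in the 0th IFD; treat the tag choice as an assumption rather than a confirmed fix.

import piexif
from PIL import Image

def tagFileSketch(fn, ds):
    """Sketch: store the description in the 0th IFD's ImageDescription tag."""
    img = Image.open(fn)
    exif_dict = piexif.load(img.info["exif"])
    # piexif expects a numeric tag inside one of its named IFD dictionaries,
    # not a free-form key at the top level of the dictionary.
    exif_dict["0th"][piexif.ImageIFD.ImageDescription] = ds.encode("utf-8")
    exif_bytes = piexif.dump(exif_dict)
    piexif.insert(exif_bytes, fn)  # rewrites the EXIF block of the file in place

If this works, the description travels with the photograph itself, and the text file becomes a convenient backup rather than the only record.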