For a long time, one ability humans had that machines didn't was the ability to look at a picture and accurately describe what was going on in it. Relatively recently, technology advanced to the point that computers can identify what is and isn't a face with a high degree of accuracy, and can even estimate a person's age, albeit with somewhat less accurate results. Microsoft is looking to push this technology further, so that a computer can not only look at a picture and list the objects in it, but also identify the main subject and write out what the picture is trying to convey.
Essentially, Microsoft is trying to teach machines to see pictures as a human would. For example, if a picture showed a woman in a crowd holding a camera as the main subject, the machine would identify it as such, rather than describing it as merely a crowd of people. That is the intended goal, at least; like any new technology, the early iterations are likely to be imperfect.
The potential uses of such technology extend far beyond merely generating captions for images, though. By the end of the year, Microsoft is looking to integrate real-time translation technology into Skype, something that was once the stuff of science fiction. Eventually, maybe your Xbox with Kinect could pause your game without you having to tell it, or even tell when you are getting scared just from your facial expressions. And why stop there, when your Xbox could host an AI that communicates with your phone or computer, constantly learning about you and your world, and following you wherever you go?
Of course, that is actually a terrifying idea, one that is often the subject of Big Brother-esque stories and a huge breach of privacy as we know it. But then again, that is the case with any new technology these days.