Words and images: an essay by Ben Schouten, 2002
If computers can be seen as calculators, then the question arises whether intelligence, and visual intelligence in particular, can be produced by mere calculation. Unfortunately, this question will not be answered in this presentation. One thing is certain: we have set out on a road to (visual) intelligence. We may wonder where this road will lead us and where we are now. Archiving concepts are becoming available in industry, society and art.
Recognition is definitely part of visual intelligence. It relates an emotion, experience or visual input to an earlier event. Image recognition is based on having seen something before, and our brains are very effective at it. Human beings are able to understand and work with concepts; computers are far less intelligent in this respect.
To a certain extent, concepts can be expressed in language. Language enables us to describe concepts; we are able to explain what we see. Although it is a tedious task, someone escorting a blind person can describe to him the things he sees. It is even harder for the blind person to imagine what has been described. After all, the mapping from image space to concepts is not one-to-one: "A picture may be worth a thousand words".
In contemporary multimedia applications, an image is described by features, such as the color or shape of an object. This is a many-to-one mapping, and as a consequence there is no one-to-one reverse mapping. A car can be red, but many other objects are red as well.
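The many-to-one character of such a feature can be illustrated with a toy color feature. The sketch below is an assumption-laden illustration, not the method of any particular system: images are taken to be flat lists of (R, G, B) tuples, the quantiser `color_name` is a hypothetical and deliberately crude function that labels each pixel by its strongest channel, and the "car" and "rose" pixel values are invented.

```python
from collections import Counter
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def color_name(pixel: Pixel) -> str:
    """Hypothetical coarse quantiser: name the strongest channel (ties go to red, then green)."""
    r, g, b = pixel
    if r >= g and r >= b:
        return "red"
    if g >= b:
        return "green"
    return "blue"


def dominant_color(pixels: List[Pixel]) -> str:
    """Feature extractor: the most frequent coarse color name among the pixels."""
    counts = Counter(color_name(px) for px in pixels)
    return counts.most_common(1)[0][0]


# Two different (made-up) objects: a car-like and a rose-like set of pixels.
car = [(200, 10, 10), (210, 20, 20), (190, 0, 0), (90, 90, 90)]
rose = [(120, 0, 30), (180, 20, 40), (20, 120, 20)]

print(dominant_color(car), dominant_color(rose))
```

Both calls yield `red`, so the feature value alone cannot tell us which object was seen: going back from the feature to the object is ambiguous.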
As the example of the human "seeing-eye dog" shows, one approach is to describe the content in language, as is done with keywords. Beyond keywords, however, a more intelligent way of looking for similarities is required. 'Visual information retrieval' (VIR) systems aim to process visual content the way human beings do. 'Content based image retrieval' (CBIR) is based on the principle that images can be retrieved by their similarity to other images.
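Retrieval by similarity can be sketched as "query by example": compute a feature for every image, then rank the collection by how similar each feature is to that of a query image. The sketch below assumes pixels are (R, G, B) tuples, uses a coarse joint color histogram as the feature and histogram intersection as the similarity measure. These are illustrative choices, not the approach of any specific VIR or CBIR system, and the image names and pixel values are invented.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def histogram(pixels: List[Pixel], bins: int = 4) -> List[float]:
    """Normalised joint RGB histogram with `bins` levels per channel."""
    step = 256 // bins
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Clamp so that channel value 255 stays in the last bin for any `bins`.
        ri, gi, bi = (min(c // step, bins - 1) for c in (r, g, b))
        hist[ri * bins * bins + gi * bins + bi] += 1
    return [h / len(pixels) for h in hist]


def intersection(h1: List[float], h2: List[float]) -> float:
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def query_by_example(query: List[Pixel], collection: Dict[str, List[Pixel]], k: int = 3) -> List[str]:
    """Return the names of the k images whose histograms best match the query."""
    q = histogram(query)
    scores = {name: intersection(q, histogram(px)) for name, px in collection.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical mini-collection of "images" (pixel lists).
collection = {
    "sunset": [(250, 120, 30)] * 6 + [(40, 40, 120)] * 4,
    "forest": [(30, 140, 40)] * 8 + [(90, 60, 20)] * 2,
    "sea": [(20, 60, 200)] * 7 + [(230, 230, 230)] * 3,
}
query = [(240, 110, 40)] * 5 + [(50, 30, 110)] * 5

ranked = query_by_example(query, collection)
print(ranked)
```

The query shares most of its color distribution with "sunset" (intersection of about 0.9), so that image ranks first. No keyword was involved: the match comes purely from visual similarity, which is the idea behind CBIR.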
One can distinguish three core components of these systems:
As we live in the age of information technology, with the amount of content growing exponentially, the need to handle and archive this information grows just as fast. We are overwhelmed with information:
http://www.sims.berkeley.edu/how-much-info
and our ability to retrieve and archive this information shrinks relative to the growing amount of data. In the domain of 'visual information systems', information can be processed for several purposes: