The Semantic Gap

Words and images: an essay by Ben Schouten, 2002

If computers can be seen as calculators, then the question arises whether intelligence, and particularly visual intelligence, can be produced by mere calculation. Unfortunately, this question will not be answered in this presentation. One thing is certain: we have taken a road to (visual) intelligence. We may wonder where this road will bring us and where we are now. Concepts for archiving are becoming available in industry, society and art.

Recognition is definitely a part of visual intelligence. It relates an emotion, experience or visual input to an earlier event. Image recognition is based on having seen something before. Our brain is effective at this. Human beings are able to understand and work with concepts. Computers are far less intelligent.

To a certain extent, concepts can be expressed in language. Language enables the description of concepts; we are able to explain what we see. Although it is a tedious task, someone escorting a blind person can describe to that person the things he or she sees. It is even harder for the blind person to imagine what has been described. The mapping from image space to concepts is not one-to-one, after all: "A picture may be worth a thousand words".

In contemporary multimedia applications, an image is described by features, like the color or shape of an object. This is a many-to-one mapping and, as a consequence, there is no one-to-one reverse mapping. A car can be red, but there are many other objects with a red color.
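
The many-to-one character of such a feature can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration: the toy pixel lists, the coarse color thresholds and the helper name `dominant_color` do not come from any particular system.

```python
from collections import Counter

def dominant_color(pixels):
    """Map an image (a list of (R, G, B) pixels) to one coarse color name.

    The thresholds are arbitrary, illustrative choices.
    """
    def name(pixel):
        r, g, b = pixel
        if r > 150 and g < 100 and b < 100:
            return "red"
        if g > 150 and r < 100 and b < 100:
            return "green"
        if b > 150 and r < 100 and g < 100:
            return "blue"
        return "other"

    counts = Counter(name(p) for p in pixels)
    return counts.most_common(1)[0][0]

# Two hypothetical images of different objects.
red_car = [(200, 20, 30)] * 90 + [(40, 40, 40)] * 10
red_apple = [(220, 30, 20)] * 70 + [(30, 160, 40)] * 30

# Group images by feature value: several objects end up under one feature.
feature_to_images = {}
for label, image in [("car", red_car), ("apple", red_apple)]:
    feature_to_images.setdefault(dominant_color(image), []).append(label)

print(feature_to_images)
```

Both images collapse onto the same feature value, so knowing only that the feature is "red" does not tell us which object was depicted.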

As we can learn from the human 'seeing-eye dog', one approach is to describe the content in language, as is done with keywords. But then a more intelligent way of looking for similarities is required. 'Visual information retrieval' (VIR) systems process visual content in the way human beings do. 'Content based image retrieval' (CBIR) is based on the fact that images can be retrieved because of their similarity to other images.

One can distinguish three core components of these systems:

  • Content extraction
    Describe the content in such a way that it can be processed. 'Feature extraction' is one way of doing this. An image or video is described according to several features like color or texture.
  • Similarity
    Once the content has been described, the system has to define how similar this content is to the content of other images. As a result, one has to define metrics that approximate the extent to which different images, represented by their features, are similar.
  • Interfaces
    For the user to communicate with the system, interfaces should be able to display and compose visual information. As the content of images is subjective, an intelligent system should be able to manage this subjectivity.
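
The first two components can be sketched in a few lines. This is only one possible instance, under stated assumptions: a coarse color histogram stands in for feature extraction, histogram intersection stands in for the similarity metric, and the function names and the toy image data are hypothetical. Images are assumed to be non-empty lists of (R, G, B) pixels.

```python
def color_histogram(pixels, bins=4):
    """Content extraction: normalized histogram over bins**3 coarse RGB cells."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        index = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[index] += 1.0
    total = len(pixels)  # assumed to be non-zero
    return [h / total for h in hist]

def histogram_intersection(h1, h2):
    """Similarity: 1.0 for identical normalized histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def rank_by_similarity(query, database):
    """Return (score, name) pairs, most similar image first."""
    q = color_histogram(query)
    scored = [
        (histogram_intersection(q, color_histogram(image)), name)
        for name, image in database.items()
    ]
    return sorted(scored, reverse=True)

# Hypothetical toy database: each image is a flat list of pixels.
database = {
    "sunset": [(250, 40, 20)] * 50 + [(20, 20, 60)] * 50,
    "forest": [(20, 120, 30)] * 90 + [(90, 60, 20)] * 10,
    "tomato": [(230, 40, 30)] * 100,
}
query = [(240, 50, 25)] * 60 + [(20, 20, 60)] * 40

ranking = rank_by_similarity(query, database)
for score, name in ranking:
    print(f"{name}: {score:.2f}")
```

The third component, the interface, is where subjectivity would enter: a user could, for example, choose between several metrics or mark results as relevant, which this sketch does not attempt.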

As we are living in the age of information technology, with the amount of content growing exponentially, the need to handle and archive this information grows just as fast. We are overwhelmed with information, and our ability to retrieve and archive it shrinks in proportion to the growing amount of data. In the domain of 'visual information systems', information can be processed for several purposes:

  • Compression
    MPEG-4, MPEG-7 and MPEG-21 add standards for describing content in multimedia databases, beyond mere compression.
  • Retrieval
    To browse, query and download information from the web or other information databases.
  • Visualization
    In fashion and design, for instance, much depends on personal appreciation. Applications should offer ways to examine clothing as it appears ('intelligent clothing') and to search for products by visual means.
  • Security and authentication
    Document ownership, access and facilitation. Authorizing or blocking content (for example, pornography). Filtering.
  • Quality control
    Visual inspection of products like textiles.
  • Delivery on demand
    Personal content, as in television on demand. Extracting personal content from a larger and more general body of content. Indexing.
  • Manipulation
    Content is processed in such a way that new content is created. Examples are art, music and video. Tasks: archiving, billing, accounting, virtual shops, etc.