
The Dawn of Computer Vision: From Concept to Early Models (1950-70s)

What do a cat and the first computer vision models have in common?

Introduction

Vision is more than meets the eye. This statement holds profound truth in the realms of both biology and technology. From Hermann von Helmholtz in the 19th century, who first posited that our perception is an active brain function, to the pioneering strides in neural networks by Warren McCulloch and Walter Pitts in the 20th century, the quest to understand and replicate human vision has been relentless and fascinating. These early intellectual adventures laid the groundwork for today’s sophisticated computer vision systems, blending insights from biology, psychology, and computer science.

What pivotal developments occurred between 1950 and 1970 that cemented vision's importance in AI? In this episode, we're diving into the era's transformative milestones, such as the invention of the ophthalmoscope, Claude Shannon's revolutionary communication theory, and the influential role of the perceptron in AI. How did the insights of David Hubel, Torsten Wiesel, and Frank Rosenblatt drive the interplay between neural discoveries and technological advances? This exploration not only honors historical achievements but also paves the way for future advancements in vision systems. Join us as we embark on this inspiring journey, sure to ignite the imaginations of a new generation of visionaries.

We also explain what all of this has to do with demons!

Understanding Biological Foundations of Vision


Since the 19th century, scientists have tried to understand how the human eye and brain process visual information. Hermann von Helmholtz, one of the key figures of his time, proposed a "theory of vision" built on unconscious inference, suggesting that perception is an active, interpretative act by the brain rather than a passive process. His ideas on unconscious inference had a lasting impact on computational vision methods. Helmholtz also developed a theory of color vision, published works on physiological optics, and invented vision-related instruments like the ophthalmoscope for examining the eye's interior.

Fast forward to the 20th century: in 1943, Warren McCulloch and Walter Pitts, inspired by the human brain, created a computational model of neural activity. It formed the conceptual foundation of future neural networks, as discussed in the third episode of our historical series about LLMs.

On the theoretical front, Claude Shannon's seminal 1948 paper, "A Mathematical Theory of Communication," showed how to measure information and established the limits of signal transmission and compression – concepts crucial for signal processing and pattern recognition in computer vision.

Building on McCulloch and Pitts's model, Frank Rosenblatt introduced the perceptron in 1958, an early neural network for pattern recognition that ran on IBM's room-sized 704 computer. Initially promising, the perceptron received considerable attention for its potential to solve complex problems. However, as we'll see later, a decade on Marvin Minsky and Seymour Papert demonstrated its limitations on certain classes of problems, leading to a decline in neural network research funding.

An image of the perceptron from Rosenblatt's “The Design of an Intelligent Automaton,” Summer 1958

Frank Rosenblatt ’50, Ph.D. ‘56 works on the “perceptron” – what he described as the first machine “capable of having an original idea.”
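To make the idea concrete, here is a minimal sketch of a Rosenblatt-style perceptron in modern Python with NumPy. The toy data, learning rate, and training loop are illustrative assumptions – Rosenblatt's original ran on custom hardware and the IBM 704, not code like this.

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Train a single-layer perceptron on labels y in {-1, +1}."""
    w = np.zeros(X.shape[1])   # weights
    b = 0.0                    # bias
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Predict with the sign of the weighted sum; update only on mistakes.
            if yi * (np.dot(w, xi) + b) <= 0:
                w += lr * yi * xi
                b += lr * yi
    return w, b

# Toy example: a linearly separable 2-D task (only (1, 1) is the positive class)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])

w, b = train_perceptron(X, y)
print("weights:", w, "bias:", b, "predictions:", np.sign(X @ w + b))
```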

Meanwhile, David Hubel and Torsten Wiesel from Harvard Medical School worked on the biological foundations of vision and made an important discovery bridging neuroscience and computer vision. Their experiments on cats revealed specialized neurons responding to specific visual cues like edges, angles, and oriented lines. "Center-surround" retinal cells detect basic contrasts, while "simple cells" in the visual cortex are tuned to more complex features like line orientations. They also identified "complex cells" that react to moving lines, highlighting visual processing's dynamic aspect. Here is an article with many fascinating details from the history of this discovery.

In the classic neuroscience experiment, Hubel and Wiesel discovered a cat's visual cortex neuron (right) that fires strongly and selectively for a bar (left) when it is in certain positions and orientations

Hubel and Wiesel's findings, which earned them the Nobel Prize in Physiology or Medicine in 1981, uncovered the visual cortex's hierarchical and modular organization. Neurons at different levels process progressively more intricate visual features, mirroring the approach of many modern computer vision systems. Their work inspired early computer vision algorithms that mimicked the hierarchical processing observed in the brain.

One of the pioneering works in this regard was the "Pandemonium" system proposed by Oliver Selfridge in 1959. The name is not random: Pandemonium is the capital of Hell in John Milton's epic poem Paradise Lost. It was a hierarchical, parallel-processing model in which "demons" process visual stimuli in stages, each group handling a specific aspect of recognition:

  • Image Demon: Captures the initial image as it appears on the retina.

  • Feature Demons: Specialize in recognizing specific visual features like lines or curves. Each demon "yells" upon detecting its feature.

  • Cognitive Demons: Listen to the feature demons' "yells" and respond based on the presence of patterns they are trained to recognize. Their response strength depends on the detected pattern's match to their designated feature.

  • Decision Demon: Hears the cognitive demons' responses and chooses the loudest, effectively determining the final perception.

While still shaped by the template-matching theories that preceded it, the architecture introduced the concept of feature detection, in which visual stimuli are broken down into constituent features for analysis. This connectionist approach to pattern recognition influenced later developments in artificial intelligence and cognitive science, both of which took shape as disciplines in the 1950s.
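To see how this demon hierarchy maps onto code, here is a purely illustrative toy sketch in Python. The feature names, letter templates, and scoring are invented for this example; Selfridge's original proposal was specified far more loosely.

```python
# A toy Pandemonium: feature demons "yell" about the features they detect,
# cognitive demons add up the yells relevant to their pattern, and the
# decision demon picks the loudest cognitive demon.

# Image demon: the raw input, here already reduced to a set of primitive features.
observed_features = {"vertical_line", "horizontal_line"}

# Feature demons: one per primitive feature; each reports 1 if its feature is present.
FEATURE_DEMONS = ["vertical_line", "horizontal_line", "oblique_line", "curve"]

# Cognitive demons: each letter is defined by the features it expects (invented templates).
COGNITIVE_DEMONS = {
    "A": {"oblique_line", "horizontal_line"},
    "H": {"vertical_line", "horizontal_line"},
    "O": {"curve"},
}

def recognize(features):
    yells = {f: int(f in features) for f in FEATURE_DEMONS}          # feature demons
    shouts = {letter: sum(yells[f] for f in wanted)                  # cognitive demons
              for letter, wanted in COGNITIVE_DEMONS.items()}
    return max(shouts, key=shouts.get), shouts                       # decision demon

letter, shouts = recognize(observed_features)
print(shouts)            # e.g. {'A': 1, 'H': 2, 'O': 0}
print("decision:", letter)
```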

Digital Imaging and Signal Processing Advances

During the 1960s, the ability to digitize images evolved significantly, enabling the transformation of images into a digital format for computer processing. This period also marked a surge of interest in what was then known as "machine perception" – the capability of a computer system to interpret data in the way humans use their senses to relate to the world around them.

In 1963, Larry Roberts at MIT advanced this field with his thesis "Machine Perception of Three-Dimensional Solids," where he introduced algorithms that could reconstruct the three-dimensionality of a scene from a two-dimensional image.

Image Credit: Larry Roberts’ thesis "Machine Perception of Three-Dimensional Solids"

Concurrently, the development of the Fast Fourier Transform (FFT) algorithm by James Cooley and John Tukey in 1965 revolutionized digital signal processing. The FFT efficiently computes the Discrete Fourier Transform (DFT), which converts a signal from its spatial domain, such as an image, to the frequency domain. Here, different frequencies reveal various image features, including edges, textures, and brightness levels. Utilizing the FFT, computer vision techniques began performing essential tasks:

  • Image filtering: separating desired information from noise by manipulating specific frequencies.

  • Noise reduction: identifying and removing unwanted high-frequency components.

  • Feature extraction: isolating specific frequency ranges corresponding to important features like edges or textures, aiding object recognition.

These functions are crucial for enhancing image clarity, ensuring accurate object recognition, and improving overall performance in various computer vision applications.
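As a rough modern illustration of these frequency-domain operations, the sketch below low-pass filters an image with NumPy's FFT routines. The synthetic image and cutoff radius are placeholder assumptions; the principle – transform, suppress selected frequencies, transform back – is the one the FFT made practical.

```python
import numpy as np

def fft_lowpass(image, cutoff=20):
    """Keep only frequencies within `cutoff` of the spectrum centre (low-pass)."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))      # to the frequency domain, centred
    rows, cols = image.shape
    r, c = np.ogrid[:rows, :cols]
    mask = (r - rows // 2) ** 2 + (c - cols // 2) ** 2 <= cutoff ** 2
    filtered = spectrum * mask                          # zero out high frequencies
    return np.real(np.fft.ifft2(np.fft.ifftshift(filtered)))  # back to the spatial domain

# Toy input: a noisy gradient image standing in for a real photograph
rng = np.random.default_rng(0)
image = np.tile(np.linspace(0, 1, 128), (128, 1)) + 0.2 * rng.standard_normal((128, 128))
smoothed = fft_lowpass(image, cutoff=15)
print(image.std(), smoothed.std())   # the filtered image has less high-frequency variation
```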

Spatial filtering emerged as another influential technique – conceptually akin to modern social media filters but rooted in more rigorous mathematics. Built on ideas pioneered in the 19th century and adapted for digital signals in the mid-20th century, filters such as the Gaussian, Laplacian, and median are used for:

  • Smoothing: Reducing noise to produce a cleaner signal.

  • Edge detection: Identifying sharp transitions in intensity, crucial for recognizing objects.

  • Noise removal: Eliminating unwanted electrical or environmental interference.
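In modern terms these classic filters are one-liners; the snippet below is a small sketch using SciPy's ndimage module on a placeholder random image.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(1)
image = rng.random((64, 64))                          # placeholder for a real grayscale image

smoothed = ndimage.gaussian_filter(image, sigma=2)    # smoothing: Gaussian blur
edges = ndimage.laplace(smoothed)                     # edge detection: Laplacian of the smoothed image
denoised = ndimage.median_filter(image, size=3)       # noise removal: median filter

print(smoothed.shape, edges.shape, denoised.shape)
```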

AI Labs and Their Contributions to Computer Vision

The 1960s marked the emergence of AI as a formal academic discipline. The field took root at MIT, where Marvin Minsky and John McCarthy established the pioneering MIT Artificial Intelligence Laboratory. This lab served as a foundational training ground for scientists who went on to create their own research labs at Carnegie Tech and Stanford University by 1962. This AI research was supported by the Advanced Research Projects Agency (ARPA) and other military and defense departments in the US.

In 1966 at MIT, Marvin Minsky and Seymour Papert launched the "Summer Vision Project" to teach computers to "see." Despite widespread anecdotes suggesting Marvin Minsky assigned this project to his undergraduate student Gerald Sussman, it was, in fact, a team effort with Sussman coordinating. The New Yorker described the team as "a group of hackers – Gerald Sussman, William Gosper, Jack Holloway, Richard Greenblatt, Thomas Knight, Russell Noftsker, and others," who focused on segmenting images into objects, background, and chaos. This project, like many others of its era, highlighted the complexity and challenges inherent in early AI tasks.

Around the same time, John McCarthy, another pioneer in artificial intelligence, moved from MIT to Stanford University in 1963 and established the Stanford Artificial Intelligence Laboratory (SAIL). This laboratory quickly evolved from a small group into a robust department, attracting top talent like Edward Feigenbaum and focusing on specialized research areas such as hand-eye coordination and speech recognition.

In its formative years, SAIL became known for integrating vision with robotics. Here are some of the fascinating projects:

  • Stanford Arm: Designed in 1969 by Victor Scheinman, a Mechanical Engineering student at SAIL, this all-electric mechanical manipulator was among the first robots designed specifically for computer control. The Stanford Arm could perform simple tasks and was a significant step toward robots capable of interacting with the physical world.

  • SHAKEY the Robot: Developed from 1966 to 1972, SHAKEY was capable of planning, route-finding, and rearranging simple objects. Equipped with cameras and bump sensors, SHAKEY could navigate predefined environments, avoid obstacles, and interact with objects. Its development played a crucial role in advancing scene understanding and robot motion planning, foundational elements in the evolution of mobile robotics.

  • Stanford Cart: Originally built in 1960 by James L. Adams for remote vehicle control studies using video feedback, the Stanford Cart was paused after President Kennedy's Moon mission announcement in 1962. It was rediscovered in 1966 by Les Earnest at SAIL and repurposed for autonomous road vehicle research. Rodney Schmidt developed new control links, and by 1971, the cart could autonomously follow a white line at slow speeds. From 1971 to 1980, Hans Moravec enhanced its capabilities, introducing multi-ocular vision to navigate complex environments, culminating in successful autonomous navigation of a cluttered room in 1979.

Stanford Cart with cable, 1961

A young Hans Moravec with the Stanford Cart c1977

SHAKEY THE ROBOT – 1966, Source: https://www.sri.com/hoi/shakey-the-robot/

In 1967, in the middle of the Cold War, researchers from the USSR also made significant contributions to computer vision. Russian engineers Aleksandr Arkadev and Emmanuil Braverman published "Computers and Pattern Recognition," where they tackled advanced problems such as distinguishing male from female portraits and differentiating letters like 'a' from 'b'. Their work effectively linked Soviet advancements with American developments in machine learning methods for computer vision.

Hardware Advances

The 1969 invention of charge-coupled devices (CCDs) by Willard Boyle and George E. Smith at Bell Labs enabled efficient transfer and storage of analog signals, revolutionizing digital imaging and early computer vision systems. In the 1970s, solid-state image sensors based on CCD technology drove further hardware advances, and later work by researchers such as Nobukazu Teranishi and Eric Fossum paved the way for modern CMOS (complementary metal–oxide–semiconductor) sensors.

As the decade progressed, the theoretical frameworks of computer vision evolved into more complex models and algorithms. Researchers expanded their focus from simple object recognition to interpreting scenes and contexts, reflecting the complexities of human visual perception.

Computational Approach and the First Commercial Applications of Image Processing

David Marr, who joined MIT's AI Lab in 1973 and became a tenured Psychology professor by 1980, initially focused on a general theory of the brain but shifted to the study of vision. He was probably the first to advocate a computational approach to vision, focusing on specific tasks and mechanisms rather than broad theories like neural nets. His work emphasized computational and algorithmic accounts of vision, which he believed were essential for truly understanding any information-processing system.

During his short life of just 35 years, Marr made significant contributions to the field of vision and computational neuroscience:

  • Theory of Binocular Stereopsis (Marr and Poggio, 1976, 1979): Marr developed a theory of how depth perception works in the brain through binocular stereopsis – the method by which the brain interprets different images from each eye to perceive depth. This work laid the groundwork for 3D perception in computer vision.

  • Primal Sketch (Marr and Nishihara, 1978): Marr proposed the concept of the "primal sketch" as a way to describe how the visual system interprets edges and textures in the environment. This representation was an early step in Marr’s vision processing theory, serving as the initial stage where the visual system detects simple elements in the visual field (a toy illustration of this edge-detection stage follows after this list).

  • Book "Vision": His posthumous book, "Vision: A Computational Investigation into the Human Representation and Processing of Visual Information," published in 1982, encapsulates his theories and has been highly influential in cognitive science and computer vision.
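One concrete way to picture that first stage is the Laplacian-of-Gaussian "zero crossing" edge detector that Marr later formalized with Ellen Hildreth. Below is a rough NumPy/SciPy sketch of the idea; the test image and smoothing scale are arbitrary choices for illustration.

```python
import numpy as np
from scipy import ndimage

def zero_crossing_edges(image, sigma=2.0):
    """Marr–Hildreth-style edges: zero crossings of the Laplacian of a Gaussian-smoothed image."""
    log = ndimage.gaussian_laplace(image, sigma=sigma)
    # A pixel is marked as an edge if the LoG changes sign between it and a neighbour.
    edges = np.zeros_like(image, dtype=bool)
    edges[:-1, :] |= np.signbit(log[:-1, :]) != np.signbit(log[1:, :])   # vertical neighbours
    edges[:, :-1] |= np.signbit(log[:, :-1]) != np.signbit(log[:, 1:])   # horizontal neighbours
    return edges

# Toy image: a bright square on a dark background
image = np.zeros((64, 64))
image[20:44, 20:44] = 1.0
edges = zero_crossing_edges(image)
print("edge pixels:", int(edges.sum()))   # non-zero: the square's outline is detected
```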

Simultaneously, the hierarchy model of the visual nervous system proposed by Hubel and Wiesel echoed in Kunihiko Fukushima's work on the Cognitron (1975) and later the Neocognitron (1980). Fukushima introduced self-organizing capabilities in neural networks, essential for pattern recognition independent of position, enhancing applications in automated visual recognition.

Azriel Rosenfeld at the University of Maryland made foundational contributions to digital image processing:

  • Research in Digital Geometry and Digital Topology (1960s-1970s): Rosenfeld's work became fundamental in developing industrial vision inspection systems applied in various sectors, from automotive to electronics manufacturing.

  • Publication of the First Textbook on Digital Image Analysis (1969): Rosenfeld wrote the first textbook on digital image processing that helped formalize the discipline and guide future researchers and practitioners.

  • Founding Editor of the First Journal on Image Processing (1972): The "Computer Graphics and Image Processing" journal became a leading publication for scholars to share their research and advancements in image analysis.

  • Leadership in Conferences and Academic Growth (1970s): His organizational roles in conferences during the 1970s set the stage for the first international conference on computer vision in 1987.

The 1970s also witnessed advancements that broadened access and fueled the growth of digital technology:

  • Bridging the Gap for Accessibility: In 1974, Ray Kurzweil's groundbreaking development – the first omni-font Optical Character Recognition (OCR) system – transformed the lives of visually impaired individuals. This system converted printed text into spoken words, granting them a newfound level of independence by making written information accessible.

  • Revolutionizing Digital Media: The same year, Nasir Ahmed introduced the Discrete Cosine Transform (DCT), a revolutionary technique for digital signal processing. DCT excelled at image compression, significantly reducing the data needed to store and transmit images and videos. This breakthrough proved crucial as the use of digital media, especially with the rise of the internet, began to explode. DCT's impact continues to be felt today, with its incorporation into influential standards like JPEG and MPEG, enabling the efficient and high-quality distribution of multimedia content worldwide.
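For a feel of how the DCT enables compression, here is a small sketch using SciPy: it transforms an 8×8 block (the block size JPEG uses), keeps only a handful of low-frequency coefficients, and reconstructs an approximation. The toy block and the number of kept coefficients are arbitrary assumptions.

```python
import numpy as np
from scipy.fft import dctn, idctn

# Toy 8x8 block: a smooth horizontal gradient (stands in for a block of a real photo)
block = np.tile(np.linspace(0, 255, 8), (8, 1))

coeffs = dctn(block, norm="ortho")        # 2-D DCT: energy concentrates in low frequencies
kept = np.zeros_like(coeffs)
kept[:3, :3] = coeffs[:3, :3]             # keep only 9 of 64 coefficients (crude "compression")
approx = idctn(kept, norm="ortho")        # reconstruct from the truncated coefficients

print("max reconstruction error:", np.abs(block - approx).max())
```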

Legacy and Impact

The 1950s through the 1970s were a crucial period for computer vision. Pioneering researchers, faced with severe limits on processing power and storage, weren't deterred. By the end of the 1970s, computer vision had moved from theoretical concepts to tackling real-world problems – object recognition, scene understanding, and robot navigation.

This period laid the groundwork for the dramatic advancements that followed. As Moore's Law (the observation that the number of transistors on a microchip doubles approximately every two years) continued to hold and computing power grew exponentially, researchers were able to develop ever more sophisticated algorithms. The next chapter in this story will explore these advancements and the rise of deep learning techniques that revolutionized the field in the following decades. Stay tuned!

Next episode: The Expansion of Theory and Practice: 1980s
