Before Convolutional Neural Networks were introduced, practitioners typically processed image data by treating each pixel as an independent feature, extracting values such as its RGB intensities and feeding them to a fully connected network.
The problem with this method is that it discards spatial structure: to the network, the order of the pixels doesn't matter. The pixels could be jumbled for all we care, because a fully connected network treats every input feature interchangeably and would learn equally well on the scrambled image.
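This indifference to pixel order can be made concrete. In the sketch below (a minimal illustration, not from the book), a fully connected layer applied to a jumbled image with correspondingly jumbled weight columns produces exactly the same output as the original, so nothing in the layer itself prefers the true pixel arrangement:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((10, 784))   # fully connected layer: 784 inputs -> 10 outputs
x = rng.standard_normal(784)         # a flattened 28x28 "image"

perm = rng.permutation(784)          # jumble the pixels
y_original = W @ x
y_jumbled = W[:, perm] @ x[perm]     # same weights, columns jumbled the same way

# The layer cannot tell the two orderings apart
print(np.allclose(y_original, y_jumbled))  # True
```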
Instead, we want models that build in assumptions about images, such as nearby pixels being related to each other, so that learning from image data becomes efficient.
CNNs are built exactly for this purpose, and today they underpin nearly every image-processing model.
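The core operation behind CNNs, 2D cross-correlation, bakes in exactly this locality assumption: each output value depends only on a small window of nearby pixels. A minimal NumPy sketch (the function name `corr2d` is my own choice):

```python
import numpy as np

def corr2d(X, K):
    """2D cross-correlation: slide kernel K over X, summing elementwise products."""
    h, w = K.shape
    out = np.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output element sees only a local h-by-w window of the input
            out[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return out

X = np.arange(9.0).reshape(3, 3)
K = np.array([[0.0, 1.0], [2.0, 3.0]])
print(corr2d(X, K))  # [[19. 25.]
                     #  [37. 43.]]
```

Because every window is processed with the same kernel, the operation is also trivially data-parallel, which is what makes convolutions so fast on GPUs.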
CNNs are loosely inspired by human vision and are far more parameter-efficient than fully connected architectures. Convolutions are also easy to parallelize across GPU cores.
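The parameter savings are easy to quantify. As a back-of-the-envelope comparison (the layer sizes here are illustrative, not from the book), a single dense layer on a modestly sized image already dwarfs a typical convolutional layer:

```python
# Dense layer mapping a flattened 224x224 RGB image to 256 hidden units:
# one weight per (input pixel value, hidden unit) pair, plus biases
dense_params = (224 * 224 * 3) * 256 + 256

# Conv layer: 64 filters, each a 3x3 window over 3 input channels, plus biases.
# The same small kernel is reused at every spatial position.
conv_params = (3 * 3 * 3) * 64 + 64

print(dense_params)  # 38535424
print(conv_params)   # 1792
```

The convolutional layer gets away with roughly 20,000x fewer parameters because its weights are shared across all spatial positions rather than tied to individual pixels.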
As a result, practitioners now use CNNs wherever possible, even on tasks with one-dimensional sequential data such as audio, text, and time series, where RNNs were conventionally used.
This article is a summary of the chapter "Introduction to Convolutional Neural Networks" from the book Dive into Deep Learning.