The field of Artificial Intelligence (AI) research was founded at a Dartmouth College workshop in 1956. In the years that followed, computers were winning at checkers, solving algebraic word problems, proving logical theorems, and speaking English. However, these systems were programmed by humans to perform each of these actions, and AI research slowed as the difficulty of the remaining tasks (those which would render a machine capable of doing any work that a person can do) was realized. Decreased funding also played a role, as financial resources were allocated to more productive projects. AI returned briefly in the 1980s due to the success of expert systems, only to fall into disrepute again late in the decade. It wasn’t until the late 1990s and early 2000s that AI, aided by increased computational power, began to be used for logistics and data mining.

Machine Learning (a term coined in 1959 by AI and computer gaming pioneer Arthur Samuel) is a subset of AI within the field of computer science. According to Samuel, machine learning gives computers “the ability to learn without being explicitly programmed.” Machine learning evolved from the study of pattern recognition and computational learning theory in artificial intelligence. It explores the study and construction of algorithms that learn from data and make predictions about it. These algorithms build a model from sample data provided during a “training” period, then apply that model to new data.
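To make "building a model from sample data" concrete, here is a minimal sketch in Python. It assumes the simplest possible model, a straight line fitted by least squares, and the training numbers are invented for illustration. Real machine learning systems use far richer models, but the pattern is the same: fit on sample data, then predict on new data.

```python
def fit_line(xs, ys):
    """Learn a slope and intercept from sample data (ordinary least squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: the program is never told the rule y = 2x + 1.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

# "Training" period: the model parameters are derived from the samples.
slope, intercept = fit_line(train_x, train_y)


def predict(x):
    """Apply the learned model to an input it has not seen before."""
    return slope * x + intercept


print("learned slope:", slope, "learned intercept:", intercept)
print("prediction for x = 10:", predict(10.0))
```

Nothing in the code states the relationship between x and y explicitly; it is recovered from the examples, which is the sense in which the program "learns."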

The training period may be supervised or unsupervised. In a supervised training period, a human feeds the data set to the computer along with the correct answer for each example, and the algorithm must build a model that maps each example to its correct answer. In an unsupervised training period, the data set is provided without any answers, and the algorithm must discover structure in the data (such as groups of similar examples) on its own. A trained model typically doesn’t function in a “this is the correct answer and that is an incorrect answer” manner. Instead, it “scores” each candidate answer, for example 85% for the first answer, 10% for the second, 2% for the third, and so on for a total of 100% (the answer list could be longer), and the computer settles on the answer with the highest score.
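The scoring step can be sketched in a few lines of Python. One common way to turn a model's raw outputs into scores that total 100% is the softmax function; the post does not say which method is used, so treat softmax, the labels, and the raw numbers below as assumptions chosen for illustration.

```python
import math


def softmax(raw_scores):
    """Convert arbitrary real-valued scores into probabilities that sum to 1."""
    # Subtracting the maximum first keeps math.exp from overflowing.
    peak = max(raw_scores)
    exps = [math.exp(score - peak) for score in raw_scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs of a classifier for three candidate answers.
labels = ["cat", "dog", "bird"]
raw_scores = [4.0, 1.9, 0.3]

probabilities = softmax(raw_scores)

# The computer settles on the answer with the highest score.
best_label = labels[probabilities.index(max(probabilities))]

for label, probability in zip(labels, probabilities):
    print(f"{label}: {probability:.1%}")
print("prediction:", best_label)
```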

Deep Learning is a subclass of machine learning algorithms that:

  • Rely on a cascade of multiple layers of nonlinear processing units for feature extraction and transformation with each successive layer using the output from the previous layer as the input. These algorithms may be supervised or unsupervised and applications include classification (supervised) and pattern analysis (unsupervised).

  • Are based on the (unsupervised) learning of multiple levels of features or representations of the data. Higher-level features are derived from lower-level features to form a hierarchical representation.

  • Are part of the broader machine learning field of learning representations of data.

  • Learn multiple levels of representations which correspond to different levels of abstraction. These levels form a hierarchy of concepts.
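The "cascade of multiple layers" described above can be sketched as a forward pass in plain Python. This is an illustration under stated assumptions, not a trained model: the layer sizes, the tanh nonlinearity, and the random weights are all arbitrary choices made for the example.

```python
import math
import random


def dense_layer(inputs, weights, biases):
    """One layer: a weighted sum per unit, followed by a nonlinearity (tanh)."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def random_layer(n_in, n_out, rng):
    """Create untrained weights and biases for a layer (illustrative only)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, biases


def forward(inputs, layers):
    """Feed the input through every layer, keeping each level's representation."""
    activations = [inputs]
    for weights, biases in layers:
        # Each successive layer uses the previous layer's output as its input.
        activations.append(dense_layer(activations[-1], weights, biases))
    return activations


rng = random.Random(0)
layer_sizes = [4, 3, 3, 2]  # 4 inputs, two hidden layers of 3 units, 2 outputs
layers = [
    random_layer(n_in, n_out, rng)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]

activations = forward([0.5, -1.0, 0.25, 2.0], layers)
for level, values in enumerate(activations):
    print(f"level {level}: {[round(v, 3) for v in values]}")
```

Each printed level is a new representation of the same input. In a trained network, the later levels would correspond to the higher-level features in the hierarchy.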

Deep learning algorithms transform their inputs through more layers than shallow learning algorithms do. At each layer, the signal is transformed by a processing unit that functions like a neuron, whose parameters are adjusted at each iteration of the training process.
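As a rough illustration of parameters being adjusted through each iteration, the Python sketch below trains a single neuron-like unit. The task (the logical OR function), the sigmoid activation, the learning rate, and the gradient-descent update rule are all assumptions made for the example; the post itself does not specify a training method.

```python
import math


def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(weights, bias, inputs):
    """A single processing unit: weighted sum of inputs, then a nonlinearity."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


# Labeled examples of logical OR: (inputs, correct answer).
samples = [
    ((0.0, 0.0), 0.0),
    ((0.0, 1.0), 1.0),
    ((1.0, 0.0), 1.0),
    ((1.0, 1.0), 1.0),
]

weights = [0.0, 0.0]
bias = 0.0
learning_rate = 1.0  # arbitrary choice for this sketch

for iteration in range(2000):
    for inputs, target in samples:
        error = neuron_output(weights, bias, inputs) - target
        # Nudge each parameter in the direction that reduces the error.
        weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
        bias -= learning_rate * error

predictions = [
    1 if neuron_output(weights, bias, inputs) > 0.5 else 0
    for inputs, _ in samples
]
print("learned weights:", weights, "learned bias:", bias)
print("predictions:", predictions)
```

A deep network repeats this kind of adjustment for many units across many layers, which is why the training process needs a way to decide how much each parameter contributed to the final error.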

A few terms are used to describe how deep a model is:

  • Credit assignment path (CAP) – A chain of transformations from input to output. CAPs describe potentially causal connections between the input and the output.

  • CAP depth – The number of layers of processing involved. It varies with the type of neural network being used.

  • Deep/shallow learning – There is no universally agreed-upon depth threshold that separates deep learning from shallow learning, but most researchers in the field agree that deep learning involves multiple nonlinear layers (CAP > 2). Juergen Schmidhuber considers CAP > 10 to be “very deep” learning.
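The thresholds above can be written down as a small Python helper. One assumption goes beyond the text: for a feedforward network, CAP depth is commonly counted as the number of hidden layers plus one, because the output layer also has adjustable parameters. The function names are hypothetical, and the cutoffs are the ones quoted above rather than a universal standard.

```python
def feedforward_cap_depth(hidden_layers):
    """CAP depth of a feedforward network, assuming the common convention of
    hidden layers plus one (the output layer is also parameterized)."""
    return hidden_layers + 1


def describe_depth(cap_depth):
    """Label a CAP depth using the thresholds quoted in the text."""
    if cap_depth > 10:
        return "very deep"
    if cap_depth > 2:
        return "deep"
    return "shallow"


for hidden_layers in (1, 2, 10):
    depth = feedforward_cap_depth(hidden_layers)
    print(f"{hidden_layers} hidden layer(s): CAP depth {depth}, {describe_depth(depth)}")
```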

Hopefully, this brief overview has provided you with a better understanding of Deep Learning. If you're eager for more information, be sure to regularly visit BOXX Blogs as this post is only the first in an ongoing Deep Learning series.

Next post in the Deep Learning series: Getting Deeper with Deep Learning: Part 1