PROC. OF THE IEEE, NOVEMBER 1998
Gradient-Based Learning Applied to Document Recognition
Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner
Abstract
Multilayer Neural Networks trained with the backpropagation algorithm constitute the best example of a successful Gradient-Based Learning technique. Given an appropriate network architecture, Gradient-Based Learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional Neural Networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques.
Real-life document recognition systems are composed of multiple modules including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called Graph Transformer Networks (GTN), allows such multi-module systems to be trained globally using Gradient-Based methods so as to minimize an overall performance measure.
Two systems for on-line handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of Graph Transformer Networks.
A Graph Transformer Network for reading bank checks is also described. It uses Convolutional Neural Network character recognizers combined with global training techniques to provide record accuracy on business and personal checks. It is deployed commercially and reads several million checks per day.
Keywords
Neural Networks, OCR, Document Recognition, Machine Learning, Gradient-Based Learning, Convolutional Neural Networks, Graph Transformer Networks, Finite State Transducers.
Nomenclature
GT Graph transformer.
GTN Graph transformer network.
HMM Hidden Markov model.
HOS Heuristic oversegmentation.
K-NN K-nearest neighbor.
NN Neural network.
OCR Optical character recognition.
PCA Principal component analysis.
RBF Radial basis function.
RS-SVM Reduced-set support vector method.
SDNN Space displacement neural network.
SVM Support vector method.
TDNN Time delay neural network.
V-SVM Virtual support vector method.
The authors are with the Speech and Image Processing Services Research Laboratory, AT&T Labs-Research, 100 Schulz Drive, Red Bank, NJ 07701. E-mail: {yann,leonb,yoshua,haffner}@research.att.com. Yoshua Bengio is also with the Département d'Informatique et de Recherche Opérationnelle, Université de Montréal, C.P. 6128 Succ. Centre-Ville, 2920 Chemin de la Tour, Montréal, Québec, Canada H3C 3J7.
I. Introduction
Over the last several years, machine learning techniques, particularly when applied to neural networks, have played an increasingly important role in the design of pattern recognition systems. In fact, it could be argued that the availability of learning techniques has been a crucial factor in the recent success of pattern recognition applications such as continuous speech recognition and handwriting recognition.
The main message of this paper is that better pattern recognition systems can be built by relying more on automatic learning, and less on hand-designed heuristics. This is made possible by recent progress in machine learning and computer technology. Using character recognition as a case study, we show that hand-crafted feature extraction can be advantageously replaced by carefully designed learning machines that operate directly on pixel images. Using document understanding as a case study, we show that the traditional way of building recognition systems by manually integrating individually designed modules can be replaced by a unified and well-principled design paradigm, called Graph Transformer Networks, that allows training all the modules to optimize a global performance criterion.
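The idea of training all modules against a single global criterion can be illustrated with a minimal sketch. This is not the paper's GTN machinery, and far simpler: two differentiable modules, a toy "feature extractor" and a toy "classifier", are composed end to end, and the gradient of one global loss is backpropagated through both. All sizes, names, and the squared-error criterion here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Module 1: a trainable "feature extractor" (here just a linear map + tanh).
W1 = rng.normal(scale=0.1, size=(8, 4))
# Module 2: a trainable "classifier" (a linear map to one output score).
W2 = rng.normal(scale=0.1, size=(4, 1))

x = rng.normal(size=(16, 8))   # a batch of raw input patterns
y = rng.normal(size=(16, 1))   # targets for this toy demonstration

def forward(x):
    h = np.tanh(x @ W1)        # features produced by module 1
    out = h @ W2               # prediction produced by module 2
    return h, out

def loss(out):
    # One overall performance measure for the whole system.
    return float(np.mean((out - y) ** 2))

h, out = forward(x)
before = loss(out)

# Backpropagation: the chain rule carries the gradient of the global
# loss through module 2 and then into module 1, so both modules are
# adjusted jointly rather than designed or trained in isolation.
d_out = 2.0 * (out - y) / len(x)
dW2 = h.T @ d_out
d_h = d_out @ W2.T
dW1 = x.T @ (d_h * (1.0 - h ** 2))   # tanh'(z) = 1 - tanh(z)^2

lr = 0.1
W1 -= lr * dW1
W2 -= lr * dW2

_, out2 = forward(x)
after = loss(out2)
```

Because every module is differentiable, the same chain-rule step that trains the classifier also adjusts the feature extractor, which is the essence of global gradient-based training of a multi-module system.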
Since the early days of pattern recognition it has been known that the variability and richness of natural data, be it speech, glyphs, or other types of patterns, make it almost impossible to build an accurate recognition system entirely by hand. Consequently, most pattern recognition systems are built using a combination of automatic learning techniques and hand-crafted algorithms. The usual method of recognizing individual patterns consists in dividing the system into two main modules shown in figure 1. The first module, called the feature extractor, transforms the input patterns so that they can be represented by low-dimensional vectors or short strings of symbols that (a) can be easily matched or compared, and (b) are relatively invariant with respect to transformations and distortions of the input patterns that do not change their nature. The feature extractor contains most of the prior knowledge and is rather specific to the task. It is also the focus of most of the design effort, because it is often entirely hand-crafted. The classifier, on the other hand, is often general-purpose and trainable. One of the main problems with this approach is that the recognition accuracy is largely determined by the ability of the designer to come up with an appropriate set of features. This turns out to be a daunting task which, unfortunately, must be redone for each new problem. A large amount of the pattern recognition literature is devoted to describing and comparing the relative