๐Ÿ‘ฆ ๋‚ด์ผ๋ฐฐ์›€์บ ํ”„/TIL(Today I Learned)

TIL_220516_Machine Learning Project Basics


๋”ฅ ๋Ÿฌ๋‹์ด๋ž€?

๋”ฅ ๋Ÿฌ๋‹ :

  • ๋จธ์‹  ๋Ÿฌ๋‹์˜ ํ•œ ๋ถ„์•ผ
  • ์ธต(Layer)์„ ๊นŠ๊ฒŒ(Deep) ์Œ“๋Š”๋‹ค๊ณ  ํ•ด์„œ ๋”ฅ๋Ÿฌ๋‹
  • ๋”ฅ๋Ÿฌ๋‹์˜ ๋‹ค๋ฅธ ๋‹จ์–ด ํ‘œํ˜„
    1. ๋”ฅ๋Ÿฌ๋‹(Deep learning)
    2. Deep neural networks
    3. Multilayer Perceptron(MLP)
  • ๋”ฅ๋Ÿฌ๋‹์˜ ์ฃผ์š” ๊ฐœ๋…๊ณผ ๊ธฐ๋ฒ•
    • ๋ฐฐ์น˜ ์‚ฌ์ด์ฆˆ์™€ ์—ํญ
    • ํ™œ์„ฑํ™” ํ•จ์ˆ˜
    • ๊ณผ์ ํ•ฉ๊ณผ ๊ณผ์†Œ์ ํ•ฉ
    • ๋ฐ์ดํ„ฐ ์ฆ๊ฐ•
    • ๋“œ๋ž์•„์›ƒ
    • ์•™์ƒ๋ธ”
    • ํ•™์Šต๋ฅ  ์กฐ์ •

 

๋ถ„์•ผ ์„ค๋ช…

 


 

๋”ฅ๋Ÿฌ๋‹์˜ ์—ญ์‚ฌ

XOR ๋ฌธ์ œ : 

๊ธฐ์กด์˜ ๋จธ์‹ ๋Ÿฌ๋‹์€ AND, OR ๋ฌธ์ œ๋กœ๋ถ€ํ„ฐ ์‹œ์ž‘

 

 

Perceptron(ํผ์…‰ํŠธ๋ก ) : 

๋…ผ๋ฆฌํšŒ๊ท€์œผ๋กœ ํผ์…‰ํŠธ๋ก ์„ ์„ค๋ช…ํ•˜๋Š” ์ˆ˜์‹
Perceptron(ํผ์…‰ํŠธ๋ก )์˜ ๋ชจ์–‘

 

ํ•˜์ง€๋งŒ ํ•™์Šต ์‹œํ‚ค๊ธฐ์—๋Š” XOR๋ฌธ์ œ๋ฅผ ํ’€์ง€ ๋ชปํ–ˆ๋‹ค.

Multilayer Perceptrons (MLP)๋ผ๋Š” ๊ฐœ๋…์„ ํ†ตํ•ด ๋ฌธ์ œ๋ฅผ ํ’€์–ด๋ณด๋ ค๊ณ  ํ–ˆ์œผ๋‚˜ ์‹คํŒจ.

Multilayer Perceptrons (MLP) ์˜ ๋ชจ์–‘

 

Backpropagation (์—ญ์ „ํŒŒ) :

1974๋…„์— ๋ฐœํ‘œ๋œ Paul Werbos(ํด)์ด๋ผ๋Š” ์‚ฌ๋žŒ์˜ ๋ฐ•์‚ฌ ๋…ผ๋ฌธ์˜ ์‹œ์ž‘

Backpropagation ์˜ ๋ชจ์–‘

  1. ์šฐ๋ฆฌ๋Š” W(weight)์™€ b(bias)๋ฅผ ์ด์šฉํ•ด์„œ ์ฃผ์–ด์ง„ ์ž…๋ ฅ์„ ๊ฐ€์ง€๊ณ  ์ถœ๋ ฅ์„ ๋งŒ๋“ค์–ด ๋‚ผ ์ˆ˜ ์žˆ๋‹ค.
  2. ๊ทธ๋Ÿฐ๋ฐ MLP๊ฐ€ ๋งŒ๋“ค์–ด๋‚ธ ์ถœ๋ ฅ์ด ์ •๋‹ต๊ฐ’๊ณผ ๋‹ค๋ฅผ ๊ฒฝ์šฐ W์™€ b๋ฅผ ์กฐ์ ˆํ•ด์•ผํ•œ๋‹ค.
  3. ๊ทธ๊ฒƒ์„ ์กฐ์ ˆํ•˜๋Š” ๊ฐ€์žฅ ์ข‹์€ ๋ฐฉ๋ฒ•์€ ์ถœ๋ ฅ์—์„œ Error(์˜ค์ฐจ)๋ฅผ ๋ฐœ๊ฒฌํ•˜์—ฌ ๋’ค์—์„œ ์•ž์œผ๋กœ ์ ์ฐจ ์กฐ์ ˆํ•˜๋Š” ๋ฐฉ๋ฒ•์ด ํ•„์š”ํ•˜๋‹ค.

 

' 1986๋…„์— Hinton ๊ต์ˆ˜๊ฐ€ ๋˜‘๊ฐ™์€ ๋ฐฉ๋ฒ•์„ ๋…์ž์ ์œผ๋กœ ๋ฐœํ‘œ : ํ•ต์‹ฌ๋ฐฉ๋ฒ•์€ ๋ฐ”๋กœ ์—ญ์ „ํŒŒ ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ๋ฐœ๊ฒฌ '

 

์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ๋ชจ์–‘

 


 

How to Build Deep Neural Networks

Stacking layers:

  • Input layer: the input part of the network — the x values we want to train on.
  • Output layer: the output part of the network — the predicted values, i.e., y.
  • Hidden layers: the intermediate layers between the input and output layers.
  • The shapes of the input and output layers are determined by the problem being solved.
  • The layers that need the most attention are the hidden layers: fully connected layers (Fully connected layer = Dense layer).

๋„คํŠธ์›Œํฌ์˜ ๊ตฌ์กฐ 3๊ฐ€์ง€

 

๊ธฐ๋ณธ์ ์ธ ๋‰ด๋Ÿด ๋„คํŠธ์›Œํฌ(Deep neural networks) ๊ตฌ์„ฑ :

  • ์ž…๋ ฅ์ธต์˜ ๋…ธ๋“œ ๊ฐœ์ˆ˜ 4๊ฐœ
  • ์ฒซ ๋ฒˆ์งธ ์€๋‹‰์ธต ๋…ธ๋“œ ๊ฐœ์ˆ˜ 8๊ฐœ
  • ๋‘ ๋ฒˆ์งธ ์€๋‹‰์ธต ๋…ธ๋“œ ๊ฐœ์ˆ˜ 16๊ฐœ
  • ์„ธ ๋ฒˆ์งธ ์€๋‹‰์ธต ๋…ธ๋“œ๊ฐœ์ˆ˜ 8๊ฐœ
  • ์ถœ๋ ฅ์ธต ๋…ธ๋“œ๊ฐœ์ˆ˜ 3๊ฐœ
  • ํ™œ์„ฑํ™” ํ•จ์ˆ˜(activation function)๋ฅผ ๋ณดํŽธ์ ์ธ ๊ฒฝ์šฐ ๋ชจ๋“  ์€๋‹‰์ธต ๋ฐ”๋กœ ๋’ค์— ์œ„์น˜

๋‰ด๋Ÿด ๋„คํŠธ์›Œํฌ(Deep neural networks) ๊ตฌ์„ฑ

 

๋„คํŠธ์›Œํฌ์˜ Width(๋„ˆ๋น„)์™€ Depth(๊นŠ์ด) ๊ฐœ๋…

Baseline model(๋ฒ ์ด์Šค๋ผ์ธ ๋ชจ๋ธ) : ์ ๋‹นํ•œ ์ •ํ™•๋„์˜ ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ

  • ์ž…๋ ฅ์ธต: 4
  • ์ฒซ ๋ฒˆ์งธ ์€๋‹‰์ธต: 8
  • ๋‘ ๋ฒˆ์งธ ์€๋‹‰์ธต: 4
  • ์ถœ๋ ฅ์ธต: 1

 

๊ธฐ๋ณธ์ ์ธ ์‹คํ—˜(ํŠœ๋‹)์˜ ์˜ˆ์‹œ

1) ๋„คํŠธ์›Œํฌ์˜ ๋„ˆ๋น„๋ฅผ ๋Š˜๋ฆฌ๋Š” ๋ฐฉ๋ฒ•

  • ์ž…๋ ฅ์ธต: 4
  • ์ฒซ ๋ฒˆ์งธ ์€๋‹‰์ธต: 8 * 2 = 16
  • ๋‘ ๋ฒˆ์งธ ์€๋‹‰์ธต: 4 * 2 = 8
  • ์ถœ๋ ฅ์ธต: 1

 

2) ๋„คํŠธ์›Œํฌ์˜ ๊นŠ์ด๋ฅผ ๋Š˜๋ฆฌ๋Š” ๋ฐฉ๋ฒ•

  • ์ž…๋ ฅ์ธต: 4
  • ์ฒซ ๋ฒˆ์งธ ์€๋‹‰์ธต: 4
  • ๋‘ ๋ฒˆ์งธ ์€๋‹‰์ธต: 8
  • ์„ธ ๋ฒˆ์งธ ์€๋‹‰์ธต: 8
  • ๋„ค ๋ฒˆ์งธ ์€๋‹‰์ธต: 4
  • ์ถœ๋ ฅ์ธต: 1

 

3) ๋„ˆ๋น„์™€ ๊นŠ์ด๋ฅผ ์ „๋ถ€ ๋Š˜๋ฆฌ๋Š” ๋ฐฉ๋ฒ•

  • ์ž…๋ ฅ์ธต: 4
  • ์ฒซ ๋ฒˆ์งธ ์€๋‹‰์ธต: 8
  • ๋‘ ๋ฒˆ์งธ ์€๋‹‰์ธต: 16
  • ์„ธ ๋ฒˆ์งธ ์€๋‹‰์ธต: 16
  • ๋„ค ๋ฒˆ์งธ ์€๋‹‰์ธต: 8
  • ์ถœ๋ ฅ์ธต: 1
์‹ค๋ฌด์—์„œ๋Š” ๋„คํŠธ์›Œํฌ์˜ ๋„ˆ๋น„์™€ ๊นŠ์ด๋ฅผ ๋ฐ”๊พธ๋ฉด์„œ ์‹คํ—˜์„ ๋งŽ์ด ํ•จ. ๊ทธ๋งŒํผ ์‹œ๊ฐ„๋„ ๋งŽ์ด ๋“ค๊ณ  ์ง€๋ฃจํ•œ ์ž‘์—…. ๊ณผ์ ํ•ฉ๊ณผ ๊ณผ์†Œ์ ํ•ฉ์„ ํ”ผํ•˜๊ธฐ์œ„ํ•ด์„œ๋Š” ๊ผญ ํ•„์š”ํ•œ ๋…ธ๊ฐ€๋‹ค์ด๋‹ค.

 


 

๋”ฅ๋Ÿฌ๋‹์˜ ์ฃผ์š” ๊ฐœ๋…

 

Batch size, Epoch

Batch and iteration

  • The dataset is split into small units for training; each unit is called a batch.
  • Each repetition of this process is called an iteration.

Epoch

  • Going through the whole problem set multiple times: each full pass is an epoch.
  • No matter how many batches the data is split into, one epoch is complete once the entire dataset has been seen once.

 

๋”ฐ๋ผ์„œ 1์ฒœ๋งŒ๊ฐœ์˜ ๋ฐ์ดํ„ฐ์…‹์„ 1์ฒœ๊ฐœ ๋‹จ์œ„์˜ ๋ฐฐ์น˜๋กœ ์ชผ๊ฐœ๋ฉด, 1๋งŒ๊ฐœ์˜ ๋ฐฐ์น˜๊ฐ€ ๋˜๊ณ , ์ด 1๋งŒ๊ฐœ์˜ ๋ฐฐ์น˜๋ฅผ 100์—ํญ์„ ๋ˆ๋‹ค๊ณ  ํ•˜๋ฉด 1๋งŒ * 100 = 100๋งŒ๋ฒˆ์˜ ์ดํ„ฐ๋ ˆ์ด์…˜์„ ๋„๋Š” ๊ฒƒ์ด ๋ฉ๋‹ˆ๋‹ค!

 

 

๋”ฅ๋Ÿฌ๋‹ ์ชผ๊ฐœ๊ธฐ ๋ฐ˜๋ณต ๋ชจ์–‘

 

Activation functions

  • Activation functions are nonlinear functions; a classic example is the sigmoid function.

Researchers mathematically modeled the way neurons transmit signals: because the next neuron only activates when the electrical signal crosses a threshold, these functions are called activation functions.

MLP์˜ ์—ฐ๊ฒฐ ๊ตฌ์กฐ๋ฅผ ์—ฌ๋Ÿฌ๊ฐœ์˜ ๋‰ด๋Ÿฐ์ด ์—ฐ๊ฒฐ๋œ ๋ชจ์Šต

 

  • ๋Œ€ํ‘œ์ ์ธ ์˜ˆ : ์‹œ๊ทธ๋ชจ์ด๋“œ ํ•จ์ˆ˜

๋น„์„ ํ˜• ํ•จ์ˆ˜ ์ž๋ฆฌ์— ์‹œ๊ทธ๋ชจ์ด๋“œ ๋ฐฐ์น˜

 

  • ํ™œ์„ฑํ™” ํ•จ์ˆ˜์˜ ์—ฌ๋Ÿฌ๊ฐ€์ง€ ์ข…๋ฅ˜
  • ๋”ฅ๋Ÿฌ๋‹์—์„œ ๊ฐ€์žฅ ๋งŽ์ด ๋ณดํŽธ์ ์œผ๋กœ ์“ฐ์ด๋Š” ํ™œ์„ฑํ™”ํ•จ์ˆ˜๋Š” ๋‹จ์—ฐ ReLU(๋ ๋ฃจ)
    • ๋‹ค๋ฅธ ํ™œ์„ฑํ™” ํ•จ์ˆ˜์— ๋น„ํ•ด ํ•™์Šต์ด ๋น ๋ฅด๊ณ , ์—ฐ์‚ฐ ๋น„์šฉ์ด ์ ๊ณ , ๊ตฌํ˜„์ด ๊ฐ„๋‹จ

์—ฌ๋Ÿฌ๊ฐ€์ง€ ์ข…๋ฅ˜

 

Overfitting, Underfitting

Overfitting

  • The point where training loss keeps falling while validation loss starts rising
  • Occurs most often when the model's complexity is high relative to the difficulty of the problem

Underfitting

  • The opposite: when the model's complexity is too low for the difficulty of the problem, it cannot solve the problem properly

๊ณผ์ ํ•ฉ ํ˜„์ƒ(Overfitting), ๊ณผ์†Œ์ ํ•ฉ(Underfitting)

์šฐ๋ฆฌ๋Š” ์ ๋‹นํ•œ ๋ณต์žก๋„๋ฅผ ๊ฐ€์ง„ ๋ชจ๋ธ์„ ์ฐพ์•„์•ผ ํ•˜๊ณ  ์ˆ˜์‹ญ๋ฒˆ์˜ ํŠœ๋‹ ๊ณผ์ •์„ ๊ฑฐ์ณ ์ตœ์ ํ•ฉ(Best fit)์˜ ๋ชจ๋ธ์„ ์ฐพ์•„์•ผํ•œ๋‹ค.

๊ณผ์ ํ•ฉ(Overfitting)์„ ํ•ด๊ฒฐํ•˜๋Š” ๋ฐฉ๋ฒ•์—๋Š” ์—ฌ๋Ÿฌ๊ฐ€์ง€ ๋ฐฉ๋ฒ•์ด ์žˆ์ง€๋งŒ ๋Œ€ํ‘œ์ ์ธ ๋ฐฉ๋ฒ•์œผ๋กœ๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ๋” ๋ชจ์œผ๊ธฐ, Data augmenation, Dropout ๋“ฑ์ด ์žˆ๋‹ค.

 


 

๋”ฅ๋Ÿฌ๋‹์˜ ์ฃผ์š” ์Šคํ‚ฌ

Data augmentation (๋ฐ์ดํ„ฐ ์ฆ๊ฐ•๊ธฐ๋ฒ•)

  • ๊ณผ์ ํ•ฉ์„ ํ•ด๊ฒฐํ•  ๊ฐ€์žฅ ์ข‹์€ ๋ฐฉ๋ฒ•์€ ๋ฐ์ดํ„ฐ์˜ ๊ฐœ์ˆ˜๋ฅผ ๋Š˜๋ฆฌ๋Š” ๋ฐฉ๋ฒ•
  • ๋ถ€์กฑํ•œ ๋ฐ์ดํ„ฐ๋ฅผ ๋ณด์ถฉํ•˜๊ธฐ์œ„ํ•ด ์šฐ๋ฆฌ๋Š” ๋ฐ์ดํ„ฐ ์ฆ๊ฐ•๊ธฐ๋ฒ•์ด๋ผ๋Š” ๊ผผ์ˆ˜์•„๋‹Œ ๊ผผ์ˆ˜๋ฅผ ์‚ฌ์šฉ
  • ์ด๋ฏธ์ง€ ์ฒ˜๋ฆฌ ๋ถ„์•ผ์˜ ๋”ฅ๋Ÿฌ๋‹์—์„œ ์ฃผ๋กœ ์‚ฌ์šฉํ•˜๋Š” ๊ธฐ๋ฒ•

[Figure: data augmentation examples]

 

Dropout (๋“œ๋ž์•„์›ƒ)

  • ๊ณผ์ ํ•ฉ์„ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ๋Š” ๊ฐ€์žฅ ๊ฐ„๋‹จํ•œ ๋ฐฉ๋ฒ•
  • ๊ฐ ๋…ธ๋“œ๋“ค์ด ์ด์–ด์ง„ ์„ ์„ ๋นผ์„œ ์—†์• ๋ฒ„๋ฆฐ๋‹ค๋Š” ์˜๋ฏธ
  • ๊ฐ ๋ฐฐ์น˜๋งˆ๋‹ค ๋žœ๋คํ•œ ๋…ธ๋“œ๋ฅผ ๋Š์–ด๋ฒ„๋ฆผ. ์ฆ‰ ๋‹ค์Œ ๋…ธ๋“œ๋กœ ์ „๋‹ฌํ•  ๋•Œ ๋žœ๋คํ•˜๊ฒŒ ์ถœ๋ ฅ์„ 0์œผ๋กœ ๋งŒ๋“ค์–ด๋ฒ„๋ฆฌ๋Š” ๊ฒƒ๊ณผ ๊ฐ™๋‹ค.

[Figure: a network before (A) and after (B) dropout]
Only a sufficient subset of "experts" is selected each time, and results are drawn from them repeatedly.

 

Ensemble (์•™์ƒ๋ธ”)

  • ์ปดํ“จํŒ… ํŒŒ์›Œ๋งŒ ์ถฉ๋ถ„ํ•˜๋‹ค๋ฉด ๊ฐ€์žฅ ์‹œ๋„ํ•ด๋ณด๊ธฐ ์‰ฌ์šด ๋ฐฉ๋ฒ•
  • ์—ฌ๋Ÿฌ๊ฐœ์˜ ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ์„ ๋งŒ๋“ค์–ด ๊ฐ๊ฐ ํ•™์Šต์‹œํ‚จ ํ›„ ๊ฐ๊ฐ์˜ ๋ชจ๋ธ์—์„œ ๋‚˜์˜จ ์ถœ๋ ฅ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ํˆฌํ‘œ๋ฅผ ํ•˜๋Š” ๋ฐฉ๋ฒ•
  • ๋žœ๋ค ํฌ๋ ˆ์ŠคํŠธ์˜ ๊ธฐ๋ฒ•๊ณผ ๋น„์Šท
  • ์—ฌ๋Ÿฌ๊ฐœ์˜ ๋ชจ๋ธ์—์„œ ๋‚˜์˜จ ์ถœ๋ ฅ์—์„œ ๋‹ค์ˆ˜๊ฒฐ๋กœ ํˆฌํ‘œ(Majority voting)๋ฅผ ํ•˜๋Š” ๋ฐฉ๋ฒ•๋„ ์žˆ๊ณ , ํ‰๊ท ๊ฐ’์„ ๊ตฌํ•˜๋Š” ๋ฐฉ๋ฒ•๋„ ์žˆ๊ณ , ๋งˆ์ง€๋ง‰์— ๊ฒฐ์ •ํ•˜๋Š” ๋ ˆ์ด์–ด๋ฅผ ๋ถ™์ด๋Š” ๊ฒฝ์šฐ ๋“ฑ ๋‹ค์–‘ํ•œ ๋ฐฉ๋ฒ•์œผ๋กœ ์‘์šฉ์ด ๊ฐ€๋Šฅ
  • ์•™์ƒ๋ธ”์„ ์‚ฌ์šฉํ•  ๊ฒฝ์šฐ ์ตœ์†Œ 2% ์ด์ƒ์˜ ์„ฑ๋Šฅ ํ–ฅ์ƒ ํšจ๊ณผ๋ฅผ ๋ณผ ์ˆ˜ ์žˆ๋‹ค

Ensemble (์•™์ƒ๋ธ”)

 

Learning rate decay (learning rate schedules)

  • A technique frequently used in practice as well
  • Used when you want to reach a local minimum quickly
  • Left (decaying): (preferred) take big steps at first, then smaller and smaller steps later on, finding the local minimum efficiently
  • Right (constant): what it looks like when the learning rate is held fixed
  • Commonly used in Keras: adjust the learning rate with tf.keras.callbacks.LearningRateScheduler() and tf.keras.callbacks.ReduceLROnPlateau()

Learning rate decay ๊ธฐ๋ฒ•
Local minimum์„ ํšจ๊ณผ์ ์œผ๋กœ ์ฐพ๋„๋ก ๋„์™€์คŒ.

 


 

3์ฃผ์ฐจ ์ˆ™์ œ

 

'๋จธ์‹ ๋Ÿฌ๋‹์—์„œ ๊ฐ€์žฅ ์œ ๋ช…ํ•œ ๋ฐ์ดํ„ฐ์…‹ ์ค‘ ํ•˜๋‚˜์ธ MNIST ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค๋ฅผ ์ง์ ‘ ๋ถ„์„ํ•ด๋ณด๋„๋ก ํ•ฉ์‹œ๋‹ค!'

 

 

MNIST ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค๋Š” ์†์œผ๋กœ ์“ด 0 ~ 9 ๊นŒ์ง€์˜ ์ˆซ์ž ์ด๋ฏธ์ง€ ๋ชจ์Œ์ด๋ผ๊ณ  ํ•œ๋‹ค.

 

์ˆ™์ œ์ด๋‹ˆ.. ๋ถ„์„ํ•ด๋ณด๋„๋ก ํ•˜์ž.

 

 


๊ธฐ๋ณธ ํ‹€

๋”๋ณด๊ธฐ
# ๋ฐ์ดํ„ฐ์…‹ ๋‹ค์šด๋กœ๋“œ
!kaggle datasets download -d oddrationale/mnist-in-csv
!unzip mnist-in-csv.zip


# ํŒจํ‚ค์ง€ ๋กœ๋“œ
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.optimizers import Adam, SGD
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder


# ๋ฐ์ดํ„ฐ์…‹ ๋กœ๋“œ
train_df = pd.read_csv('mnist_train.csv')
train_df.head()
test_df = pd.read_csv('mnist_test.csv')
test_df.head()


# ๋ผ๋ฒจ ๋ถ„ํฌ
sns.countplot(train_df['label'])
plt.show()


# ์ „์ฒ˜๋ฆฌ
train_df = train_df.astype(np.float32)
# ์†Œ์ˆ˜์  float32 (๋น„ํŠธ) ๋กœ ๋ฐ”๊ฟˆ.
x_train = train_df.drop(columns=['label'], axis=1).values
# x ๊ฐ’์—๋Š” label ๋งŒ ๋นผ์ฃผ๊ณ , (().values) ๋Š” ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์—์„œ np.array ๋กœ ๋ณ€ํ™˜
y_train = train_df[['label']].values
# y ๊ฐ’์—๋Š” label ๋งŒ ๋„ฃ์–ด์คŒ.

test_df = test_df.astype(np.float32)
x_test = test_df.drop(columns=['label']).values
y_test = test_df[['label']].values

print(x_train.shape, y_train.shape)
# training: 60000 rows; 784 input nodes (28 * 28 pixels) / 1 output label column
print(x_test.shape, y_test.shape)
# test: 10000 rows; 784 input nodes (28 * 28 pixels) / 1 output label column


# ๋ฐ์ดํ„ฐ ๋ฏธ๋ฆฌ๋ณด๊ธฐ
index = 1
plt.title(str(y_train[index]))
plt.imshow(x_train[index].reshape((28, 28)), cmap='gray')
# reshape ์œผ๋กœ ์ด์ฐจ์›์œผ๋กœ ๋ณ€ํ™˜ (x:28px, y:28px), cmap='gray' ๊ทธ๋ ˆ์ด์Šค์ผ€์ผ ๋ณ€ํ™˜
plt.show()
# 6=G๋ฒˆ์— ํ•ด๋‹นํ•˜๋Š” (28, 28) ์ด๋ฏธ์ง€
# ์ •์ƒ์ ์œผ๋กœ ๋ณ€ํ™˜์ด ๋˜์—ˆ๋Š”์ง€ ํ™•์ธ ๋ฐฉ๋ฒ• : One-hot encoding ํ›„ ๋ฐ์ดํ„ฐ ๋ฏธ๋ฆฌ๋ณด๊ธฐ ์žฌ์‹คํ–‰

# One-hot encoding
encoder = OneHotEncoder()
y_train = encoder.fit_transform(y_train).toarray()
# fit the encoder on the training labels, then convert the sparse output to an array
y_test = encoder.transform(y_test).toarray()
# reuse the encoder fitted on the training labels (transform, not fit_transform)

print(y_train.shape)
# (60000, 10): each label is now a 10-dimensional one-hot vector


# ์ผ๋ฐ˜ํ™”
x_train = x_train / 255. # 0 ~ 255 ์˜ ๋ฐ์ดํ„ฐ๋ฅผ 255 ๋กœ ๋‚˜๋ˆ„๋ฉด 0 ๊ณผ 1 ๋ฐ์ดํ„ฐ๋กœ ๊ตฌ๋ถ„
# ์ด๋ฏธ์ง€ ๋ฐ์ดํ„ฐ์˜ ์ตœ๋Œ€ ํ”ฝ์…€์€ 255px
x_test = x_test / 255.
# 1๋ฒˆ ์ด์ƒ ์‹คํ–‰์‹œ์ผฐ์„ ๊ฒฝ์šฐ 2๋ฒˆ์˜ ๋‚˜๋ˆ” ๊ฐ’์ด ์ ์šฉ๋˜๊ธฐ ๋•Œ๋ฌธ์— ๊ผญ 1๋ฒˆ๋งŒ ์‹คํ–‰ํ•  ๊ฒƒ.


# ๋„คํŠธ์›Œํฌ ๊ตฌ์„ฑ
input = Input(shape=(784,))
hidden = Dense(1024, activation='relu')(input)
hidden = Dense(512, activation='relu')(hidden)
hidden = Dense(256, activation='relu')(hidden)
output = Dense(10, activation='softmax')(hidden)
# output, Dense ์›ํ•˜๋Š” ๊ฐ’์˜ ์ข…๋ฅ˜ ์ˆ˜ 0~9๊ฐœ 10๊ฐœ
# shape ๋กœ Dense ๊ฐ’ ํ™•์ธ
# activation='softmax' : ๋‹คํ•ญ ๋…ผ๋ฆฌ ํšŒ๊ท€ ์‚ฌ์šฉ

model = Model(inputs=input, outputs=output)

model.compile(loss='categorical_crossentropy',= optimizer=Adam(lr=0.001), metrics['acc'])
# ๋‹คํ•ญ ๋…ผ๋ฆฌ ํšŒ๊ท€ : categorical_crossentropy ์‚ฌ์šฉ
# metrics=['acc'] : 0 ~ 1 ์‚ฌ์ด๋กœ ์ •ํ™•๋„๋ฅผ ํผ์„ผํŠธ๋กœ ๋‚˜ํƒ€๋ƒ„

model.summary()


# ํ•™์Šต
history = model.fit(
    x_train,
    y_train,
    validation_data=(x_test, y_test), # ๊ฒ€์ฆ ๋ฐ์ดํ„ฐ๋ฅผ ๋„ฃ์–ด์ฃผ๋ฉด ํ•œ epoch์ด ๋๋‚ ๋•Œ๋งˆ๋‹ค ์ž๋™์œผ๋กœ ๊ฒ€์ฆ
    epochs=20 # epochs ๋ณต์ˆ˜ํ˜•์œผ๋กœ ์“ฐ๊ธฐ!
)


# ํ•™์Šต ๊ฒฐ๊ณผ ๊ทธ๋ž˜ํ”„
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
# .plot ๋กœ ๊ทธ๋ž˜ํ”„ ๊ทธ๋ฆฌ๊ธฐ

plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])โ€‹

 

์—๋Ÿฌ ๋…ธํŠธ

์ด์ „ ์‹ค์Šต๊ณผ ๋™์ผํ•œ ํผ์œผ๋กœ ์•ˆ์— ๋“ค์–ด๊ฐˆ ๋‚ด์šฉ์„ ์ˆ˜์ •ํ•˜๊ณ  ์‹คํ–‰ํ–ˆ๋‹ค.

 

ValueError

 

์‰ฝ๊ฒŒ ๋„˜์–ด๊ฐˆ ์นœ๊ตฌ๊ฐ€ ์•„๋‹ˆ์ง€ ^^..

 

์ฒ˜์Œ์—” ์›์ธ์ด ๋ญ”์ง€๋ฅผ ์ƒ๋‹จ์—์„œ ์—๋Ÿฌ๊ฐ€ ๋˜๋Š” ๋ฌธ๊ตฌ๋ฅผ ๊ตฌ๊ธ€๋งํ•˜๋ฉฐ ์ฐพ์•„๋ดค์ง€๋งŒ,

 

๋‹ต์ด ์ข€์ฒ˜๋Ÿผ ๋ณด์ด์ง€ ์•Š์•˜๋‹ค.

 

๊ทธ๋Ÿฌ๋‹ค

 

๊ฐ™์€ ํŒ€์›์—๊ฒŒ ๋ฌผ์–ด๋ดค๋Š”๋ฐ,

 

์•„๋‹ˆ ์™ ๊ฑธ ํŒ€์›์€ ๊ฐ€์žฅ ๋ฐ‘์— ์—๋Ÿฌ์— ๋Œ€ํ•œ ๋ถ€๋ถ„์„ ๊ฐ€๋ฆฌํ‚ค๋ฉฐ 'Shapes' ์— ๋Œ€ํ•œ ์˜ค๋ฅ˜๋ผ๊ณ  ์•Œ๋ ค์ฃผ์—ˆ๋‹ค.

 

์ด๋ ‡๊ฒŒ๋งŒ ๋ด์„  Shapes ์˜ ๋ญ๊ฐ€ ์ž˜๋ชป๋œ๊ฑด์ง€ ๋ชฐ๋ž๋Š”๋ฐ,

 

output ์˜ Dense ๋ฅผ ๋ณด์•„๋ผ..

 

๊ผญ ์ด๋Ÿฐ ์นœ๊ตฌ๋“ค ์žˆ์ง€ ์•Š์€๊ฐ€..

 

์ถœ์ œ๋œ ๋ฌธ์ œ๋ฅผ ์ฝ๊ณ  ํ•ด๋‹นํ•˜๋Š” ๊ฐ’์„ ๊ตฌํ•˜์‹œ์˜ค.

 

๋ถ„๋ช… 0 ~ 9 ๊นŒ์ง€์˜ ๊ฐ’์„ ์ฐพ์œผ๋ผ ํ–ˆ๊ฑฐ๋Š˜

 

10(0~9)์ด ์•„๋‹Œ 24 ๋ผ๋Š” ๊ฐ’์„ ๋„ฃ๊ณ  ์ถœ๋ ฅ์ด ์ž˜ ๋‚˜์™€์ฃผ๊ธธ ๋ฐ”๋ผ๋Š” ๋‚˜ ์ž์‹ .. ์œ ์ฃ„..

 

๋ฐ”๋กœ 10 ์œผ๋กœ ๊ณ ์ณ์“ฐ๊ณ  ์žฌ์‹คํ–‰ ํ–ˆ๋”๋‹ˆ ๋„ˆ๋ฌด ์ž˜ ๋˜์—ˆ๋‹ค..

 

์€ผ๋“ฏ -ใ……-

 

 

 

์ถœ์ฒ˜ ์ŠคํŒŒ๋ฅดํƒ€์ฝ”๋”ฉํด๋Ÿฝ

 

 