
[Machine Learning] Tree-Based Methods (Decision Trees)

jjingle 2024. 1. 11. 14:33

Tree-based methods

These methods stratify and segment the predictor space into a number of simple, smaller regions.
=>
Because the splitting rules used to divide the predictor space resemble a tree sending out branches, they are called decision tree methods.

 

  • Advantage: simple and easy to interpret
  • Disadvantage: decision trees are usually not competitive with other supervised learning methods in terms of prediction accuracy
  • => As an alternative, bagging, random forests, and boosting build many trees to improve predictive performance
  • (at the cost of some loss in interpretability)

 

  • Internal node: compares a predictor against a cutpoint and sends observations left or right
  • Terminal node (leaf): the number shown is the mean response of the training observations that fall into that region

 

ํŠธ๋ฆฌ ์ƒ์„ฑ ๊ณผ์ •

  • Divide the predictor space into J distinct, non-overlapping regions
  • Each observation falls into exactly one of the regions R_1, ..., R_J, and all observations in the same region get the same prediction (the mean response of the training data in that region)
  • Goal of tree building: find the boxes that minimize the RSS
  • Considering every possible partition of the predictor space into J boxes is computationally infeasible -> so we use recursive binary splitting (repeatedly splitting a region into two), a greedy approach (at each step, pick the split that gives the largest immediate gain)
  • Like a tree sending out branches, the space is split top-down again and again, creating new regions at every step
  • The method is called greedy because at each split it chooses the best split at that particular step, rather than looking ahead
  • Splitting repeats until a stopping rule is satisfied

 

ํŠธ๋ฆฌ ๊ณผ์ ํ•ฉ(overfitting) ๋ฐฉ์ง€

  • J๊ฐ€ ํด(ํŠธ๋ฆฌ๊ฐ€ ๊นŠ์€) ๊ฒฝ์šฐ : ๊ณผ์ ํ•ฉ ๋  ์ˆ˜ ์žˆ์Œ
  • J๊ฐ€ ์ž‘์„ ๊ฒฝ์šฐ : ์ž‘์œผ๋ฉด variance๊ฐ€ ๋‚ฎ๊ณ  ํ•ด์„๋ ฅ์ด ์ข‹์•„์ง(๋‹จ, bias๋Š” ์ปค์ง)
  • ๋Œ€์•ˆ : ํŠธ๋ฆฌ ์ƒ์„ฑ ๊ณผ์ •์—์„œ ํŠธ๋ฆฌ๊ฐ€ ๋„ˆ๋ฌด ๊นŠ์–ด์ง€์ง€ ์•Š๋„๋ก ์กฐ์ ˆ
  • ๋‹จ์  : ์ด ๋Œ€์•ˆ์€ ๊ทผ์‹œ์•ˆ์ ์ธ ๋ฐฉ๋ฒ•์œผ๋กœ ํ–ฅํ›„ ์žˆ์„ ํฐ RSS์˜ ๊ฐ์†Œ๋ฅผ ๋งŒ๋“œ๋Š” ๊ณต๊ฐ„ ๋ถ„ํ• ์„ ์‚ฌ์ „์— ์ฐจ๋‹จํ•  ์ˆ˜ ์žˆ์Œ

 

Pruning a tree

  • A better strategy for avoiding overfitting is to grow a very large tree first, and then prune it back to obtain a subtree

 

Classification tree

  • Built the same way as a regression tree, except that each region predicts the most commonly occurring class among the training observations in that region
  • Uses the classification error rate instead of RSS
  • However, the classification error rate is not sensitive enough for growing the tree, so two other criteria (the Gini index and cross-entropy) are used more often
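Both criteria measure node purity: they are 0 when a region contains a single class and largest when the classes are evenly mixed, which makes them more sensitive to candidate splits than the raw error rate. A minimal sketch:

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini index: sum over classes of p * (1 - p); 0 for a pure node."""
    n = len(labels)
    props = [c / n for c in Counter(labels).values()]
    return sum(p * (1 - p) for p in props)

def entropy(labels):
    """Cross-entropy: -sum of p * log2(p); also 0 for a pure node."""
    n = len(labels)
    props = [c / n for c in Counter(labels).values()]
    return -sum(p * log2(p) for p in props)

print(gini(["a"] * 10))             # pure node -> 0.0
print(gini(["a"] * 5 + ["b"] * 5))  # evenly mixed, 2 classes -> 0.5
```

A split is then scored by the purity of the two child regions it produces, weighted by how many observations fall into each.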

 

ํŠธ๋ฆฌ์˜ ์žฅ๋‹จ์ 

  • ์žฅ์ 
    • ์„ค๋ช…ํ•˜๊ธฐ ์‰ฌ์›€(๋ถ„๊ธฐ๊ฐ€ ๋‹จ์ˆœํ•  ๊ฒฝ์šฐ ์„ ํ˜•ํšŒ๊ท€๋ณด๋‹ค ๋” ์ง๊ด€์ ์ž„)
    • Decision tree ๊ฐ€ ์‚ฌ๋žŒ์˜ ์˜์‚ฌ๊ฒฐ์ • ๋ฐฉ๋ฒ•๊ณผ ์œ ์‚ฌํ•˜๋‹ค๋Š” ๋ฏฟ์Œ์ด ์žˆ์Œ
    • ๊ทธ๋ฆผ์œผ๋กœ ์„ค๋ช…ํ•  ์ˆ˜ ์žˆ์–ด ๋น„์ „๋ฌธ๊ฐ€๋„ ์ดํ•ดํ•˜๊ธฐ ์‰ฌ์›€
    • Dummy variable ์ƒ์„ฑ ์—†์ด ์งˆ์ ๋ณ€์ˆ˜๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Œ
  • ๋‹จ์ 
    • ์ผ๋ฐ˜์ ์œผ๋กœ ๋‹ค๋ฅธ ๋จธ์‹ ๋Ÿฌ๋‹ ๋ฐฉ๋ฒ•์— ๋น„ํ•ด ์„ฑ๋Šฅ์ด ๋‚ฎ์Œ

=> ๊ทธ๋Ÿฌ๋‚˜, ์—ฌ๋Ÿฌ ๊ฐœ์˜ decision tree๋ฅผ ์ƒ์„ฑํ•˜์—ฌ ํ•ฉ์น˜๋Š”(aggregating) ๋ฐฉ๋ฒ•์„ ํ†ตํ•ด์„œ ์„ฑ๋Šฅ์„ ํฌ๊ฒŒ ํ–ฅ์ƒ์‹œํ‚ฌ ์ˆ˜ ์žˆ์Œ
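A random forest is one such aggregation: it averages many decorrelated trees. A quick comparison against a single fully grown tree, assuming scikit-learn is installed and using a synthetic dataset:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy two-class data where a single deep tree tends to overfit.
X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

single = DecisionTreeClassifier(random_state=0).fit(Xtr, ytr)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(Xtr, ytr)

print(single.score(Xte, yte), forest.score(Xte, yte))
```

Averaging over 200 bootstrapped trees reduces the variance that a single deep tree suffers from, which is where the accuracy gain comes from, at the cost of losing the single tree's easy-to-read structure.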