[Machine Learning] Subset selection and choosing the best model

jjingle 2024. 1. 10. 17:01
  1. Subset selection
    • Identify the subset of the p predictors believed to be related to the response
    • Fit a least squares model using only the identified predictors, which number fewer than p
  2. Shrinkage
    • Fit a model with all p predictors, but with the coefficient estimates shrunk toward 0
    • Also known as regularization
    • Reduces the variance of the model and, depending on the penalty (e.g. the lasso), can also perform variable selection
  3. Dimension Reduction
    • Project the p predictors onto an M-dimensional subspace (M < p)
    • Construct M linear combinations of the predictors and use them as the predictors in a linear regression
    • A form of unsupervised learning when the combinations are built without using the response (e.g. principal components)

01. Stepwise selection

  • Best subset selection is computationally expensive and becomes impractical when p is very large, since there are 2^p possible combinations of predictors
  • A large p can also cause statistical problems
    • The larger the search space (the more combinations of predictors), the higher the chance of finding a model that overfits the training data
  • Stepwise methods are used as an alternative that addresses both the computational cost and the overfitting problem
    • Stepwise methods explore far fewer models than best subset selection
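A quick way to see how much smaller the stepwise search is: best subset fits 2^p models, while forward or backward stepwise fits one starting model plus p, p-1, ..., 1 candidates across the steps, i.e. 1 + p(p+1)/2 in total. The function names below are made up for this sketch.

```python
def best_subset_model_count(p: int) -> int:
    """Number of candidate models best subset selection has to fit: 2^p."""
    return 2 ** p


def stepwise_model_count(p: int) -> int:
    """Models fitted by forward (or backward) stepwise selection.

    One starting model (null or full), then p, p-1, ..., 1 candidates
    over the p steps, giving 1 + p(p + 1) / 2 in total.
    """
    return 1 + p * (p + 1) // 2


for p in (10, 20, 30):
    print(p, best_subset_model_count(p), stepwise_model_count(p))
```

For p = 20 this is 1,048,576 models for best subset versus 211 for stepwise.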

 

02. Forward stepwise selection

  • Start from a model with no predictors and add predictors one at a time, each time adding the one that improves the fit the most, until all predictors are in the model
  • Computationally cheaper than best subset selection; however, there is no guarantee that it finds the best model that best subset selection would find
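The procedure above can be sketched with NumPy as follows. This is only an illustration under some assumptions: "improves the fit the most" is taken to mean the lowest residual sum of squares (RSS) on the training data, and the helper names `rss` and `forward_stepwise` and the simulated data are hypothetical.

```python
import numpy as np


def rss(X, y, cols):
    """RSS of a least squares fit on the given columns plus an intercept."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def forward_stepwise(X, y):
    """Return the nested predictor sets M_0, M_1, ..., M_p as lists of column indices."""
    p = X.shape[1]
    selected = []
    path = [list(selected)]  # M_0: the model with no predictors
    while len(selected) < p:
        remaining = [j for j in range(p) if j not in selected]
        # Add the predictor that gives the lowest RSS together with those already chosen.
        best = min(remaining, key=lambda j: rss(X, y, selected + [j]))
        selected.append(best)
        path.append(list(selected))
    return path


# Simulated data (hypothetical): only columns 2 and 0 affect the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 2] - 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

path = forward_stepwise(X, y)
print(path)
```

This only produces the sequence of candidate models. Training RSS always decreases as predictors are added, so choosing one model from the sequence still needs a criterion such as cross-validated error, which is not shown here.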

 

03. Backward stepwise selection

  • Start from the model containing all predictors and remove the least useful predictor one at a time
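A matching sketch for the backward direction, again with NumPy. Here "least useful" is assumed to mean the predictor whose removal leaves the lowest training RSS. The helper names and the simulated data are hypothetical.

```python
import numpy as np


def rss(X, y, cols):
    """RSS of a least squares fit on the given columns plus an intercept."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def backward_stepwise(X, y):
    """Return the nested predictor sets M_p, M_{p-1}, ..., M_0 as lists of column indices."""
    p = X.shape[1]
    selected = list(range(p))
    path = [list(selected)]  # M_p: the model with all predictors
    while selected:
        # Drop the predictor whose removal leaves the lowest RSS.
        worst = min(
            selected,
            key=lambda j: rss(X, y, [k for k in selected if k != j]),
        )
        selected.remove(worst)
        path.append(list(selected))
    return path


# Simulated data (hypothetical): only columns 2 and 0 affect the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 2] - 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

path = backward_stepwise(X, y)
print(path)
```

Because it starts from the full least squares model, this approach needs more observations than predictors (n > p), whereas forward stepwise can still be run when that is not the case.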