½ÃÀ庸°í¼­
»óǰÄÚµå
1817382

Beyond GenAI Model Training: Reducing Cost and Latency and Improving Scalability of AI Inferencing Workloads in Production

Published: | Research Firm: IDC | Page Info: English, 18 Pages | Delivery: Immediate delivery

※ This report is an English-language document. Where the Korean and English tables of contents differ, the English version takes precedence; please refer to the English table of contents for accurate review.


The IDC Perspective explores the challenges and innovations in scaling generative AI (GenAI) inference workloads in production, emphasizing cost reduction, latency improvement, and scalability. It highlights techniques like model compression, batching, caching, and parallelization to optimize inference performance. Vendors such as AWS, DeepSeek, Google, IBM, Microsoft, NVIDIA, Red Hat, Snowflake, and WRITER are driving advancements to enhance GenAI inference efficiency and sustainability. The document advises organizations to align inference strategies with use cases, regularly review costs, and partner with experts to ensure reliable, scalable AI deployment. "Optimizing AI inference isn't just about speed," says Kathy Lange, research director, AI Software, IDC. "It's about engineering the trade-offs between cost, scalability, and sustainability to unlock the potential of generative AI in production, where innovation meets business impact."
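As a minimal illustration of one of the techniques the abstract names (caching), the sketch below serves repeated prompts from an in-memory cache instead of re-invoking the model. It is not taken from the report itself; `run_model` is a hypothetical stand-in for an actual GenAI inference call, which in production would be the expensive, high-latency step.

```python
from functools import lru_cache

# Hypothetical stand-in for a real inference endpoint call.
def run_model(prompt: str) -> str:
    return f"response for: {prompt}"

@lru_cache(maxsize=1024)
def cached_infer(prompt: str) -> str:
    # Identical prompts hit the cache and skip the model call entirely,
    # reducing both per-request cost and latency for repeated queries.
    return run_model(prompt)

first = cached_infer("What is AI inference?")
second = cached_infer("What is AI inference?")  # served from the cache
assert first == second
print(cached_infer.cache_info().hits)  # prints 1: one cache hit
```

Real deployments typically use semantic caching (matching on embedding similarity rather than exact strings) and a shared store such as a key-value cache across replicas, but the cost-for-memory trade-off is the same.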

Executive Snapshot

Situation Overview

  • What Is AI Inference, and Why Is It Important?
  • Growing Demand for Efficient AI Inference
    • The GenAI Inference Infrastructure Stack
    • Factors That Influence GenAI Inference Performance
      • Model Compression Techniques
      • Data Batching Techniques
      • Caching and Memorization Techniques
      • Efficient Data Loading and Preprocessing
      • Reducing Input and Output Sizes
        • Parallelization
        • Model Routing
        • Which Software Platform Optimization Techniques Are Considered Most Effective?
        • Test-Time Compute (aka Inference-Time Compute)
        • An Emerging Field of Research
    • Technology Supplier Innovation

Advice for the Technology Buyer

Learn More

  • Related Research
  • Synopsis
»ùÇà ¿äû ¸ñ·Ï
0 °ÇÀÇ »óǰÀ» ¼±Åà Áß
¸ñ·Ï º¸±â
Àüü»èÁ¦