Market Report
Product Code: 1441458
Automotive AI Foundation Model Technology and Application Trends Report, 2023-2024
This report investigates automotive AI foundation models, providing an overview of algorithms and foundation models, foundation model application trends, company profiles, and more.
Since 2023, ever more vehicle models have been connected with foundation models, and an increasing number of Tier 1 suppliers have launched automotive foundation model solutions. In particular, Tesla's major progress with FSD V12 and the launch of Sora have accelerated the implementation of AI foundation models in cockpits and intelligent driving.
End-to-end autonomous driving foundation models are booming.
In February 2024, Tesla FSD v12.2.1, which adopts an end-to-end autonomous driving model, began rolling out in the United States beyond employees and testers. According to feedback from the first customers, FSD V12 is quite powerful, and ordinary drivers who previously did not trust or use autonomous driving now dare to use FSD. For example, Tesla FSD V12 can steer around puddles on roads. A Tesla engineer commented that this kind of driving behavior is difficult to implement with explicit code, but Tesla's end-to-end approach makes it almost effortless.
The development of AI foundation models for autonomous driving can be divided into four phases.
Phase 1.0 uses a foundation model (Transformer) at the perception level.
Phase 2.0 is modularization, with foundation models used in perception, planning & control, and decision making.
Phase 3.0 is end-to-end foundation models (one "end" is raw data from sensors, and the other "end" directly outputs driving actions).
Phase 4.0 moves from vertical AI toward artificial general intelligence (the world model of AGI).
Most companies are now in Phase 2.0, while Tesla FSD V12 is already in Phase 3.0. Other OEMs and Tier 1 suppliers are following FSD V12 with end-to-end foundation models of their own. On January 30, 2024, Xpeng Motor announced that its end-to-end model would be fully deployed in vehicles in the next step, and NIO and Li Auto are also reported to be launching "end-to-end based" autonomous driving models in 2024.
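To make the difference between Phase 2.0 and Phase 3.0 concrete, here is a minimal, purely illustrative Python sketch contrasting a modular pipeline with an end-to-end model. Every name, type, and value below is a hypothetical stand-in, not any vendor's actual code.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DrivingAction:
    steering: float       # radians, positive = left
    acceleration: float   # m/s^2, negative = braking

# Phase 2.0 (modular): foundation models sit inside separate modules that are
# connected by hand-defined interfaces (object lists, planned waypoints).
def detect_objects(frames: List[list]) -> list:
    return [{"kind": "vehicle", "distance_m": 25.0}]        # stand-in perception model

def plan_path(objects: list) -> list:
    return [(0.0, 0.0), (0.0, 10.0)]                        # stand-in planning model

def decide(path: list) -> DrivingAction:
    return DrivingAction(steering=0.0, acceleration=1.0)    # stand-in decision module

def modular_pipeline(frames: List[list]) -> DrivingAction:
    return decide(plan_path(detect_objects(frames)))

# Phase 3.0 (end-to-end): one trained network maps raw sensor frames directly
# to driving actions, with no hand-written intermediate logic in between.
def end_to_end_model(frames: List[list]) -> DrivingAction:
    return DrivingAction(steering=0.02, acceleration=0.5)   # network output placeholder

if __name__ == "__main__":
    fake_frames = [[0.0] * 16]    # placeholder "camera" input
    print(modular_pipeline(fake_frames))
    print(end_to_end_model(fake_frames))
```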
FSD V12's driving decisions are generated by an AI algorithm: an end-to-end neural network trained on massive amounts of video data replaces more than 300,000 lines of C++ code. FSD V12 opens a new path that still needs to be verified; if it proves feasible, it will have a disruptive impact on the industry.
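As a rough illustration of learning driving behavior from recorded video rather than writing explicit rules, the toy behavior-cloning loop below trains a small network to imitate logged driver actions. It is a sketch only, using random placeholder data, and does not reflect Tesla's actual architecture or training setup.

```python
import torch
import torch.nn as nn

class TinyDrivingNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(            # toy CNN over a single frame
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 2)              # -> [steering, acceleration]

    def forward(self, frames):
        return self.head(self.backbone(frames))

model = TinyDrivingNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    frames = torch.rand(8, 3, 96, 96)             # stand-in for video frames
    human_actions = torch.rand(8, 2)              # stand-in for logged driver actions
    loss = nn.functional.mse_loss(model(frames), human_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```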
On February 16, 2024, OpenAI introduced its text-to-video model Sora, signaling the wide adoption of AI video applications. Sora not only supports generation of videos up to 60 seconds long from text or images, but also far outperforms previous technologies in video generation, complex scenario and character generation, and physical world simulation.
Through vision, both Sora and FSD V12 enable AI to understand and even simulate the real physical world. Elon Musk believes that FSD V12 and Sora are just two fruits of AI's ability to perceive and understand the world through vision: FSD is ultimately used to drive, while Sora is used to generate video.
The surge of interest in Sora further supports the rationale behind FSD V12; Musk noted that Tesla had already been generating such video since the previous year.
AI foundation models evolve rapidly, bringing new opportunities.
In the past three years, foundation models for autonomous driving have undergone several evolutions, and the autonomous driving systems of leading automakers have to be rewritten almost every year, which also gives late entrants opportunities to break in.
At CVPR 2023, UniAD, an end-to-end autonomous driving algorithm jointly released by SenseTime, OpenDriveLab and Horizon Robotics, won the Best Paper Award.
In early 2024, Waytous' technical team and the Institute of Automation, Chinese Academy of Sciences jointly proposed GenAD, the industry's first generative end-to-end autonomous driving model, which combines generative AI with end-to-end autonomous driving technology. This approach breaks with UniAD's progressive, staged end-to-end solution and explores a new end-to-end autonomous driving paradigm. The key is to use generative AI to predict the temporal evolution of the vehicle and its surroundings from past scenes.
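The generative idea described above, predicting how a scene evolves over time from its recent past, can be sketched as a small autoregressive model. The code below only illustrates that general technique under assumed state dimensions; it is not GenAD's actual architecture.

```python
import torch
import torch.nn as nn

class ScenePredictor(nn.Module):
    def __init__(self, state_dim=4, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(state_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, state_dim)    # next (x, y, vx, vy) per agent

    def forward(self, past_states):                # (batch, time, state_dim)
        _, h = self.rnn(past_states)
        return self.out(h[-1])                     # one-step prediction

    @torch.no_grad()
    def rollout(self, past_states, horizon=6):
        preds, seq = [], past_states
        for _ in range(horizon):                   # feed each prediction back in
            nxt = self.forward(seq).unsqueeze(1)
            preds.append(nxt)
            seq = torch.cat([seq, nxt], dim=1)
        return torch.cat(preds, dim=1)             # (batch, horizon, state_dim)

model = ScenePredictor()
history = torch.rand(2, 8, 4)                      # 2 agents, 8 past time steps
future = model.rollout(history)                    # predicted temporal evolution
```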
In February 2024, Horizon Robotics and Huazhong University of Science and Technology proposed VADv2, an end-to-end driving model based on probabilistic planning. VADv2 takes multi-view image sequences as input in a streaming manner, transforms the sensor data into environmental token embeddings, outputs a probabilistic distribution over actions, and samples one action to control the vehicle. Using only camera sensors, VADv2 achieves state-of-the-art closed-loop performance on the CARLA Town05 benchmark, far better than all existing approaches. It runs stably in a fully end-to-end manner, even without a rule-based wrapper.
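The probabilistic-planning step described here, scoring a fixed vocabulary of candidate actions and sampling one of them, can be sketched roughly as follows. The token counts, vocabulary size, and model shape are assumptions for illustration, not VADv2's published configuration.

```python
import torch
import torch.nn as nn

NUM_TOKENS, TOKEN_DIM, VOCAB_SIZE = 256, 128, 4096   # assumed sizes, not VADv2's

class ProbabilisticPlanner(nn.Module):
    def __init__(self):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=TOKEN_DIM, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.action_head = nn.Linear(TOKEN_DIM, VOCAB_SIZE)   # score per candidate action

    def forward(self, env_tokens):                  # (batch, NUM_TOKENS, TOKEN_DIM)
        fused = self.encoder(env_tokens).mean(dim=1)
        return torch.softmax(self.action_head(fused), dim=-1)

planner = ProbabilisticPlanner()
env_tokens = torch.rand(1, NUM_TOKENS, TOKEN_DIM)   # stand-in for image-derived tokens
action_probs = planner(env_tokens)                  # distribution over the action vocabulary
chosen = torch.multinomial(action_probs, num_samples=1)   # sample one action to execute
```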
On the Town05 Long benchmark, VADv2 achieved a Drive Score of 85.1, a Route Completion of 98.4, and an Infraction Score of 0.87. Compared to the previous state-of-the-art method, VADv2 achieves a higher Route Completion while improving the Drive Score by 9.0. Notably, VADv2 uses only cameras as perception input, while DriveMLM uses both cameras and LiDAR. Compared to the previous best camera-only method, VADv2's advantage is even greater, with a Drive Score increase of up to 16.8.
Also in February 2024, the Institute for Interdisciplinary Information Sciences at Tsinghua University and Li Auto introduced DriveVLM. A sequence of images is processed by a large vision-language model (VLM) that performs chain-of-thought (CoT) reasoning to produce driving planning results. The VLM consists of a visual encoder and a large language model (LLM).
Because VLMs have limitations in spatial reasoning and high computing requirements, the DriveVLM team proposed DriveVLM-Dual, a hybrid system that combines the strengths of DriveVLM with a conventional autonomous driving pipeline. DriveVLM-Dual optionally pairs DriveVLM with conventional 3D perception and planning modules, such as a 3D object detector, an occupancy network, and a motion planner, allowing the system to achieve 3D grounding and high-frequency planning. This dual-system design, similar to the slow and fast thinking processes of the human brain, can adapt effectively to the varying complexity of driving scenarios.
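A very rough way to picture this slow/fast split is a control loop in which a slow VLM-based reasoner refreshes a coarse plan at low frequency while a conventional planner refines it every tick. The sketch below is hypothetical Python, not Li Auto's implementation; all function names and rates are assumptions.

```python
# Hypothetical sketch of the dual-system idea (not Li Auto's implementation):
# a slow VLM-based reasoner refreshes a coarse plan at low frequency, while a
# conventional planner refines it at high frequency using 3D perception.

def slow_vlm_planner(scene_images):
    # stand-in for the VLM (visual encoder + LLM) doing chain-of-thought
    # reasoning over images; returns a coarse, low-frequency plan
    return {"maneuver": "keep_lane", "target_speed_mps": 12.0}

def fast_conventional_planner(coarse_plan, objects_3d):
    # stand-in for 3D detection / occupancy / motion-planning modules that
    # turn the coarse plan into a high-frequency trajectory
    return [(t * coarse_plan["target_speed_mps"], 0.0) for t in range(5)]

coarse_plan = {"maneuver": "keep_lane", "target_speed_mps": 10.0}
for tick in range(20):                      # e.g. a 10 Hz control loop
    objects_3d = []                         # placeholder perception output
    trajectory = fast_conventional_planner(coarse_plan, objects_3d)
    if tick % 10 == 0:                      # e.g. the VLM re-plans at ~1 Hz
        coarse_plan = slow_vlm_planner(scene_images=[])
```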
AI and cloud companies attract attention as foundation models emerge.
As AI foundation models emerge, computing power, algorithms and data are all indispensable. AI companies (iFLYTEK, SenseTime, Megvii, etc.) that excel at algorithms and hold large reserves of computing power, and cloud computing companies (Inspur, Volcengine, Tencent Cloud, etc.) with powerful intelligent computing centers, have come into the spotlight of OEMs.
In the field of AI foundation models, SenseTime has deployed the cockpit multimodal foundation model SenseChat-Vision, an Artificial Intelligence Data Center (AIDC, with computing power of 6,000P), and the autonomous driving foundation model DriveMLM. In early 2024, SenseTime launched DriveMLM and achieved good results on CARLA, the most authoritative closed-loop test benchmark. DriveMLM is an intermediate solution between modular and end-to-end solutions and is interpretable.
For the collection of autonomous driving corner cases, Volcengine and Haomo.ai work together to use foundation models to generate scenarios and improve annotation efficiency. The cloud service capabilities provided by Volcengine help Haomo.ai improve the overall pre-annotation efficiency of DriveGPT by 10 times.
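As one hypothetical illustration of how foundation-model pre-annotation can cut manual labeling work, a model can propose labels for every frame and route only low-confidence frames to human annotators. The function names and threshold below are made up for the sketch and are not Haomo.ai's or Volcengine's actual pipeline or API.

```python
def foundation_model_predict(frame):
    # stand-in for a large perception model returning boxes with confidences
    return [{"label": "car", "box": (10, 20, 50, 60), "score": 0.92}]

def pre_annotate(frames, review_threshold=0.8):
    auto_labeled, needs_review = [], []
    for frame in frames:
        proposals = foundation_model_predict(frame)
        if all(p["score"] >= review_threshold for p in proposals):
            auto_labeled.append((frame, proposals))    # accepted automatically
        else:
            needs_review.append((frame, proposals))    # routed to a human annotator
    return auto_labeled, needs_review

auto, manual = pre_annotate(frames=[object()] * 100)
print(len(auto), "frames auto-labeled,", len(manual), "sent for human review")
```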
In 2023, Tencent released upgraded products and solutions for Intelligent Vehicle Cloud, Intelligent Driving Cloud Map, Intelligent Cockpit and other fields. In terms of computing power, Tencent Intelligent Vehicle Cloud delivers 3.2 Tbps of bandwidth, 3 times higher computing performance, 10 times higher communication performance, and an over 60% increase in computing cluster GPU utilization, providing high-bandwidth, low-latency computing support for training intelligent driving foundation models. For training acceleration, Tencent Intelligent Vehicle Cloud incorporates the Angel Training Acceleration Framework, with training twice as fast and inference 1.3 times faster than mainstream industry frameworks. Bosch, NIO, NVIDIA, Mercedes-Benz, and WeRide, among others, currently use Tencent Intelligent Vehicle Cloud. In 2024, Tencent will further strengthen its construction of AI foundation models.