Market Report

Product code: 2064024

Intelligent Driving End-to-End Large Model Research Report, 2026

Published: May 2026 | Publisher: ResearchInChina | Pages: 595 (English) | Delivery: 1-2 business days

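Price

Unprintable PDF (Single User License): US $4,500 (￦6,988,000)

This license allows one person to use the PDF report. Printing is not permitted, and text cannot be copied and pasted.

Printable & Editable PDF (Enterprise-wide License): US $6,800 (￦10,559,000)

This license allows everyone in the same company to use the PDF report. Printing is permitted, and printed copies are subject to the same scope of use as the PDF.

Note: VAT not included.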

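Note: This report is published in English. Where the Korean and English tables of contents differ, the English version takes precedence; please refer to the English table of contents for an accurate review.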

Research on Intelligent Driving Large Models: A Critical Period of Technology Competition and Paradigm Integration

As autonomous driving technology iterates rapidly from L2 to L3-L4, intelligent driving systems are shifting from traditional rule-driven architectures to a new generation of data-driven and cognition-driven architectures. As the underlying enabler, intelligent driving large models have become the central arena of industry competition. With the Physical AI era arriving at an accelerating pace, autonomous driving stands as its first large-scale application scenario, pushing automobiles to evolve into "super agents" that go beyond traditional means of transport and become all-scenario intelligent hubs connecting mobility, mobile office, home life, and third-party ecosystems.

From an industrial perspective, Physical AI remains at an early stage of rapid technological change, and the global autonomous driving market holds massive untapped potential. According to the report, the global fleet comprises about 1.5 billion passenger cars, 280 million commercial vehicles and trucks, and 18 million operating taxis. Total annual global driving mileage reaches 13 trillion kilometers, while autonomous driving mileage is only 700 million kilometers, or about 0.006% of the total. The room for future growth is significant.
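
As a quick sanity check on the order of magnitude, the share can be recomputed from the two mileage figures quoted above. This is only an illustrative calculation with made-up variable names; the direct ratio comes out slightly below the rounded percentage quoted in the text, which may reflect rounding or a different base in the underlying data.

```python
# Figures quoted in the paragraph above (kilometers per year).
total_km = 13e12        # 13 trillion km driven globally
autonomous_km = 700e6   # 700 million km driven autonomously

# Share of global mileage that is driven autonomously.
share = autonomous_km / total_km
share_percent = share * 100

print(f"Autonomous share of global mileage: {share_percent:.4f}%")
```

Either way, the conclusion is the same: autonomous mileage is several orders of magnitude below total mileage.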

Judging by the pace of technological implementation, intelligent driving large models are entering a critical window of technological iteration. Segmented end-to-end solutions entered mass production during 2024-2025, and one-model end-to-end and VLA technologies are being intensively deployed during 2025-2026. Coupled with the continuous improvement of the intelligent driving experience and the accelerated maturation of L3-L4 high-level autonomous driving technology, Physical AI is gaining momentum. ResearchInChina predicts three major trends in the evolution of intelligent driving large models.

Trend 1: The Core Focus of Autonomous Driving Large Model Evolution in 2026 Will Be Competition and Deep Integration of Multiple Technical Routes.

Integration Mode 1: One-model End-to-End + World Model + Reinforcement Learning, Representative Suppliers: WeRide, Bosch and Momenta

Features: The one-model end-to-end model serves as the core neural network of intelligent driving, connecting sensor input directly to driving output with zero information loss and an extremely high performance ceiling. The world model is responsible for deducing future road conditions and can generate massive numbers of long-tail scenarios at low cost for simulation training. Reinforcement learning iterates and optimizes within this deduction space based on a reward mechanism, outputs the optimal driving strategy, and handles a variety of sudden situations. The combination of the three forms a powerful closed loop of "data generation (world model) -> policy training (reinforcement learning) -> decision and execution (end-to-end model)". This enables intelligent driving systems to learn from massive driving data and keep evolving.

Integration Mode 2: E2E + Foundation Model (VLM/VLA) + Reinforcement Learning + World Model, Representative Suppliers: Horizon Robotics and Afari Technology

Features: The vision-language large model acts as the "cerebrum" responsible for cognitive reasoning, and the small end-to-end model acts as the "cerebellum" responsible for rapid execution.

Horizon Robotics adopts a one-model E2E + VLM + reinforcement learning + world model approach. Its "fast thinking + slow thinking" dual-track intelligent driving architecture takes reinforcement learning as the hub. On the one hand, it strengthens the end-to-end intuition model through the world model and simulation training, enabling millisecond-level responses while improving its handling of rare, short-time-sequence long-tail scenarios. On the other hand, it strengthens the VLM cognitive model through reasoning enhancement, improving its semantic understanding and logical reasoning in long-time-sequence complex scenarios. Finally, it migrates VLM capabilities to the in-vehicle model and completes lightweight deployment through quantization and distillation, building a balanced closed loop of "millisecond-level fast response + long-time-sequence slow reasoning".

Afari Technology adopts a VLA + E2E + world model architecture, in which the VLA model handles slow-system-style reasoning for high-level decisions, and the E2E algorithm handles fast-system-style action mapping. The training pipeline runs as follows: a 32B-parameter large model is used for large-scale multimodal pre-training (VLM) -> the model is distilled into a 7B lightweight model to balance performance and deployment (VLM) -> perception is aligned with driving actions and driving-domain knowledge is introduced (VLA) -> supervised fine-tuning teaches high-level driving strategies and behavioral norms -> reinforcement learning aligns the model with human driving styles and safety constraints, realizing closed-loop optimization of perception, decision and control.
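
The source does not say how the 32B-to-7B distillation is carried out. As one common way to perform such a step, the sketch below shows soft-label knowledge distillation, where a student is trained to match the temperature-softened output distribution of a teacher. It is a generic illustration rather than Afari Technology's method; the function names, the temperature value and the toy logits are all assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Toy logits over three hypothetical action classes.
teacher = [2.0, 0.5, -1.0]
matched_student = [2.0, 0.5, -1.0]
untrained_student = [0.0, 0.0, 0.0]

loss_matched = distillation_loss(teacher, matched_student)
loss_untrained = distillation_loss(teacher, untrained_student)
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which is what lets a smaller model inherit behavior from a larger one.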

Integration Mode 3: VLA + World Model, Representative Suppliers: Zhuoyu Technology and XPeng

Features: VLA is responsible for perceiving the current environment, learning historical driving patterns, and determining the next action. The world model is responsible for deducing how each target on the road will interact in the next 5 to 10 seconds. VLA is good at understanding the present but not predicting the future; the world model is good at prediction but does not reflect on and reason about the prediction results. The combination of the two constitutes a complete brain.

Trend 2: The VLA and world model fusion paradigm is expected to become one of the main ways for the implementation of Physical AI.

The core of the future evolution of intelligent driving large models is a fundamental reconstruction of the underlying paradigm, from "imitating human driving" to "understanding the physical world". VLA and the world model are not an either-or choice: the future intelligent driving large model will be a fusion of the two. At present, the two routes diverge in that VLA advocates see "understanding" as the premise of driving, while world model advocates see "prediction" as the key.

World model advocates argue that changes in the physical world are continuous and high-dimensional, whereas language is a discrete, low-dimensional symbolic system, so translating physics into language inevitably loses information. The world model operates directly on physical representations with higher bandwidth. VLA advocates argue that the biggest advantage of VLA is that it can be fine-tuned together with a world model or model-based reinforcement learning: VLA can absorb the advantages of the world model, while the world model cannot draw on the advantages of VLM/VLA. Language brings strong generalization capability because it is a compressed package of human common sense. Through language, VLA gains "common sense reasoning" and Chain-of-Thought (CoT) capabilities, and with them the ability to explain itself.

Based on the advantages and divergences of the two routes, the industry has begun to explore the fusion path of the two. At present, there are three mainstream fusion modes for VLA and world model: latent space unified fusion, in-depth fusion at the architectural level, and modular collaborative fusion (cloud simulator type).

Fusion Mode 1: Latent Space Unified Fusion, Representatives: Xiaomi OneVL and Huawei DriveVLA-W0

The core idea is to embed the prediction capability of the world model into the training objectives of VLA, rather than adding extra modules at the inference stage. Specifically, a future image prediction task is added to the training process of the VLA model, so that the model learns to predict not only actions but also the environmental state at future moments (i.e., future images). This design forces the model to learn the underlying dynamic laws of the driving environment, rather than just fitting sparse action supervision signals.
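
The idea of adding a future image prediction task to the training objective can be sketched as a weighted sum of two losses. This is a minimal sketch under stated assumptions: mean squared error stands in for whatever losses the actual models use, and `joint_loss`, `image_weight` and the toy values are hypothetical.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def joint_loss(pred_action, true_action, pred_future, true_future, image_weight=1.0):
    """Combine sparse action supervision with dense future-image supervision."""
    action_loss = mse(pred_action, true_action)   # few values per sample
    image_loss = mse(pred_future, true_future)    # one value per pixel
    total = action_loss + image_weight * image_loss
    return total, action_loss, image_loss


# Toy sample: a 2-value action and a 4-pixel "future image".
total, action_loss, image_loss = joint_loss(
    pred_action=[0.0, 1.0], true_action=[0.0, 0.0],
    pred_future=[0.5, 0.5, 0.5, 0.5], true_future=[0.0, 0.0, 0.0, 0.0],
    image_weight=2.0,
)
```

Because the image term covers every predicted pixel, it supplies far more supervision per sample than the handful of action values alone.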

Case 1 of Latent Space Unified Fusion: Xiaomi OneVL Autonomous Driving Model

On May 13, 2026, Xiaomi officially released Xiaomi OneVL, a fully open-source autonomous driving model that unifies three technical routes (VLA, world model, and latent space reasoning) in a single framework. The core breakthrough of this model is the deep unification of multiple technical paradigms through latent space reasoning. Unlike traditional solutions, which decompose the reasoning process into human-readable natural language and generate the deduction word by word, Xiaomi OneVL completes end-to-end logical operations directly in a high-dimensional vectorized latent space. This latent space integrates both the scenario perception and understanding capability of VLA and the environmental time-series prediction capability of the world model. Because all reasoning operations are carried out at the vector level rather than the text level, reasoning efficiency improves significantly over traditional VLA solutions.

In terms of implementation, two types of latent variables are first introduced inside the model: visual latent tokens and language latent tokens. The former encode physical relationships and time-series changes in the scene, carrying the prediction capability of the world model. The latter express driving intentions and semantic logic, carrying the understanding capability of VLA.

Secondly, OneVL introduces two auxiliary decoders, which are used only in the training stage. The language auxiliary decoder restores human-readable CoT text from the language latent tokens, explaining why the model makes a given driving decision. The visual auxiliary decoder predicts future-frame visual tokens (images 0.5 seconds and 1.0 seconds ahead) from the visual latent tokens, allowing the model to anticipate scene changes. During inference, both decoders are removed and the model outputs planning results directly, realizing one-step inference and eliminating the latency accumulation caused by autoregression.
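
The training-only role of the two auxiliary decoders can be illustrated with a toy structure. The sketch below is not Xiaomi's implementation: the class `LatentPlanner`, its methods and the placeholder arithmetic are all hypothetical, and serve only to show that the planning output is computed from the latents alone, while the decoders add extra outputs during training.

```python
class LatentPlanner:
    """Toy stand-in for a planner with training-only auxiliary decoders."""

    def __init__(self, training):
        self.training = training

    def encode(self, observation):
        # Hypothetical encoder: derive two latent vectors from the observation.
        visual_latent = [0.5 * x for x in observation]
        language_latent = [x + 1.0 for x in observation]
        return visual_latent, language_latent

    def plan(self, visual_latent, language_latent):
        # Planning head: consumes the latents directly, in a single step.
        return [v + l for v, l in zip(visual_latent, language_latent)]

    def decode_future_frames(self, visual_latent):
        # Auxiliary visual decoder (training only): one toy "frame" per horizon.
        return {"0.5s": [v * 1.1 for v in visual_latent],
                "1.0s": [v * 1.2 for v in visual_latent]}

    def decode_explanation(self, language_latent):
        # Auxiliary language decoder (training only): toy textual rationale.
        return f"decision rationale from {len(language_latent)} latent values"

    def forward(self, observation):
        visual_latent, language_latent = self.encode(observation)
        outputs = {"plan": self.plan(visual_latent, language_latent)}
        if self.training:
            outputs["future_frames"] = self.decode_future_frames(visual_latent)
            outputs["explanation"] = self.decode_explanation(language_latent)
        return outputs


observation = [0.0, 2.0, 4.0]
train_out = LatentPlanner(training=True).forward(observation)
infer_out = LatentPlanner(training=False).forward(observation)
```

The plan is identical in both modes; dropping the decoders at inference only removes the extra outputs and their cost.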

Case 2 of Latent Space Unified Fusion: Huawei DriveVLA-W0 Predicts Future Images Through World Modeling Tasks

Traditional VLA models face a fundamental problem: Supervision Deficit. The input of VLA models is high-dimensional multimodal data (front-view image sequences, language instructions, historical actions, etc.), but the supervision signal is only low-dimensional action tokens. Most of the model's representation capacity is wasted, resulting in its inability to fully learn the complex dynamics of the driving environment, and the huge potential of VLA models cannot be effectively released.

As the accompanying figure in the report shows, the collision rate trends downward as the amount of training data increases from 700,000 frames to 7 million frames and then to 70 million frames; that is, more training data yields better safety. However, for the traditional VLA paradigm without a world model, the decline in collision rate slows when the data grows from 7 million to 70 million frames, indicating that additional data alone has a limited effect on the safety performance of VLA.

To address VLA pain points such as sparse supervision, the breakdown of the data scaling law, and the lack of physical time-series prediction capability, Huawei proposed the DriveVLA-W0 training paradigm in its paper. It introduces a world model that predicts future images as dense self-supervision signals during the training stage, adding future time-series prediction while preserving the ability to understand environmental dynamics. Compared with traditional VLA, DriveVLA-W0 adds world modeling (predicting future road conditions): the more data there is, the larger its advantage becomes, and the data scaling law is strengthened.

Fusion Mode 2: In-depth Fusion at the Architectural Level, Representative: VLA-World

Unlike pre-training fusion (external reinforcement), where the world model acts as an external tool that generates first and then passes its output on, in-depth fusion at the architectural level internalizes the world model capability as a native capability of VLA, with planning and generation developing together in the same architecture.

VLA-World, jointly proposed by Shanghai Jiao Tong University and Huawei Central Research Institute in April 2026, is an integrated VLA architecture with deeply embedded world model capabilities. In traditional solutions, the world model and VLA are independent of each other, with the former responsible for generating simulation videos and the latter for perception reasoning and decision output. VLA-World adopts a single VLA backbone network so that visual generation and decision reasoning share features. It integrates trajectory prediction and visual generation as consecutive steps of the same decision chain, following the causal logic of predicting the motion trajectory first and then deducing future images based on that trajectory, which yields deep module coupling and a highly coherent reasoning chain.

Working Mechanism:

Trajectory Perception Conditioning: VLA-World predicts the trajectory first, and then generates future frames conditioned on the trajectory: the trajectory prediction result directly serves as the conditioning signal for visual generation to guide the generation process. In this way, the trajectory determines "where to go", and the image presents "what to see when arriving there", forming a causal dependency.

Unified Generation and Reasoning: Unlike earlier approaches, in which the world model and VLA were two independent modules, VLA-World has the two share the same VLA backbone, unifying visual generation and reasoning in one VLA structure.

GRPO End-to-End Alignment: GRPO (Group Relative Policy Optimization) is used to optimize the model during the reinforcement learning stage. The model generates multiple candidate trajectories and their corresponding future images, and rewards the results in which the "imagined future" is consistent with the "real safe decision". With this mechanism, visual generation is no longer an independent task but always serves the quality of downstream decisions.
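
The "group relative" part of GRPO can be illustrated in a few lines: the rewards of candidates sampled for the same scene are normalized against the group's own mean and standard deviation, so no separate value network is needed to judge them. The sketch below is a minimal illustration, not VLA-World's implementation; the reward values are hypothetical stand-ins for the consistency reward described above, and the use of the population standard deviation and the small `eps` term are assumptions.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each candidate's reward against its own sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four candidate trajectories of one scene:
# 1.0 if the imagined future matches a safe decision, else 0.0.
rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)

# If every candidate scores the same, no candidate is preferred.
flat_advantages = group_relative_advantages([0.5, 0.5, 0.5])
```

Candidates that score above the group average receive positive advantages and are reinforced, while those below average are discouraged.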

Trend 3: The Evolution of Intelligent Driving AI Towards Foundation Models Accelerates, and the Industry Will Enter A Competition Period of General Cognitive and Reasoning Capabilities of Foundation Models.

2026 is the first year of the launch of autonomous driving foundation models. DeepRoute.ai, Afari Technology, Zhuoyu Technology, Li Auto, and XPeng have launched related products. The core of foundation models is to build a universal and reusable cognitive base for the physical world, realizing full-level intelligent driving compatibility and cross-scenario capability migration.

Autonomous driving is essentially a typical scaling problem, and current implementation is mainly restricted by insufficient model capacity and an inefficient data closed loop. First, existing foundation models are limited in scale and generalize poorly to complex long-tail scenarios. Second, high-value data mining relies on manual screening and review, with fragmented data and low automation, which limits long-term iteration.

To address the two bottlenecks of insufficient model capacity and an inefficient data closed loop, DeepRoute.ai proposed a unified 40B-parameter VLA foundation model. The core innovation lies in its "trinity" role design, which allows the same model to play three roles simultaneously: driver (visual input -> real-time driving decision), analyst (diagnostic understanding of key scenarios), and critic/referee (evaluating the safety and rationality of driving behavior). This upgrades the driving system from a simple execution system to an intelligent system with cognitive capabilities.

In the pre-training stage, DeepRoute.ai abandons the traditional approach of the end-to-end model relying on trajectory supervision (data utilization rate is only 0.001%), and instead adopts the video prediction task, enabling the model to learn the dynamic structure of the real world by predicting video sequences, turning every pixel into a supervision signal and increasing the data utilization rate to nearly 100%.

In the core training stage (Mid-train), the model conducts joint training around three tasks: V+A (vision + action) to learn conventional end-to-end driving, V+A->L (explanation after action) to activate the analyst and critic roles, and V->L+A (multimodal logical reasoning) to train a driver with reasoning capability, using Chain-of-Thought to let the model first output language descriptions and decision logic of key events, and then output specific driving trajectories.
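
One way to read the three joint-training tasks is as three different arrangements of the same modalities into model inputs and prediction targets. The sketch below encodes that reading; it is an interpretation of the notation above rather than DeepRoute.ai's data format, and `build_sample` and the placeholder values are hypothetical.

```python
def build_sample(task, vision, action, language=None):
    """Arrange modalities into (inputs, targets) for one of the three tasks."""
    if task == "V+A":
        # Conventional end-to-end driving: vision in, action out.
        return {"inputs": [("vision", vision)],
                "targets": [("action", action)]}
    if task == "V+A->L":
        # Explanation after action: vision and action in, language out.
        return {"inputs": [("vision", vision), ("action", action)],
                "targets": [("language", language)]}
    if task == "V->L+A":
        # Reasoning driver: language (Chain-of-Thought) first, then action.
        return {"inputs": [("vision", vision)],
                "targets": [("language", language), ("action", action)]}
    raise ValueError(f"unknown task: {task}")


frames = "front_camera_clip"          # placeholder for image tokens
trajectory = [(0.0, 0.0), (1.0, 0.1)]  # placeholder waypoints
rationale = "pedestrian ahead, slow down"

driver = build_sample("V+A", frames, trajectory)
analyst = build_sample("V+A->L", frames, trajectory, rationale)
reasoner = build_sample("V->L+A", frames, trajectory, rationale)
```

In the third task the language target precedes the action target, mirroring the order described above: explanation and decision logic first, trajectory second.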

In terms of engineering implementation, DeepRoute.ai keeps the single-step processing latency for 1,000 visual tokens and dozens of reasoning tokens within 60-85 milliseconds by using optimization methods such as KV Cache, Multi-Token Prediction (MTP), model quantization, and a self-developed inference engine, achieving 10-15Hz real-time closed-loop control. Moreover, the foundation model can be flexibly distilled according to the computing power of the vehicle chip: a pure driving VA model can be deployed on a 100 TOPS platform, and a VLA model with logical reasoning capability on a 500 TOPS platform.
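
The relationship between the quoted latency and control rate can be checked with simple arithmetic: a step that takes a given number of milliseconds can run at most 1000 divided by that number of times per second. The helper names below are illustrative, and the calculation assumes one model step per control cycle.

```python
def max_rate_hz(latency_ms):
    """Upper bound on loop rate if each cycle takes latency_ms."""
    return 1000.0 / latency_ms


def period_ms(rate_hz):
    """Time budget per cycle at a given loop rate."""
    return 1000.0 / rate_hz


fast_bound = max_rate_hz(60)   # best-case latency
slow_bound = max_rate_hz(85)   # worst-case latency

print(f"60 ms -> at most {fast_bound:.2f} Hz")
print(f"85 ms -> at most {slow_bound:.2f} Hz")
print(f"10 Hz budget: {period_ms(10):.1f} ms, 15 Hz budget: {period_ms(15):.1f} ms")
```

Under that assumption, the 60-85 millisecond latency fits inside the cycle budgets of a 10-15Hz loop with some headroom left for the rest of the pipeline.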

In addition, the foundation model is pre-trained to learn the physical laws and spatial logic of the real world, giving it native zero-shot transfer capability. With a universal cognitive base, it adapts to all levels from L2 assisted driving to L4 autonomous driving through model distillation, computing power tailoring, and capability fine-tuning. It is first applied to autonomous driving and will later migrate to other tracks such as humanoid robots and industrial robots, realizing "one foundation making all things intelligent".

In 2026, Zhuoyu Technology fully transforms its strategy. Taking the native multimodal foundation model as the technical base, it aims to upgrade from an "intelligent driving Tier 1 supplier" to a "mobile physical AI company", focusing on mass production expansion across all scenarios and vertical domains covering passenger cars, commercial vehicles, L4 products and overseas layout, and extending to the field of embodied robots.

Zhuoyu launched VLA (VLA World Model, native multimodal FM), which uses a unified backbone to process visual, text, and sensor data, completes physical reasoning in the latent space, and directly outputs driving actions. From the pre-training stage onward, it is jointly trained on image, video, text, driving, and robot data, and it performs prediction and reasoning about the physical world in a unified latent space, understanding both semantics and physical laws.

2026 is a critical year for the technological iteration and paradigm fusion of intelligent driving large models. The competition and integration of multiple technical routes, the collaborative implementation of VLA and world models, and the large-scale launch of foundation models will jointly accelerate the intelligent driving industry from "technological exploration" to "large-scale implementation". Whether the focus is technological innovation through multi-route integration or the broad rollout of foundation models, the goal remains the same: "safer, more efficient, and more adaptable to real driving scenarios". The trend toward "physical AI" implementation will further drive intelligent driving systems to evolve from "imitating humans" to "understanding the world", realizing true intelligent driving.

In the future, with the continuous iteration of technologies and the coordinated improvement of the industry chain, intelligent driving large models will gradually break through existing bottlenecks, become the core support for the large-scale implementation of autonomous driving, reshape the development pattern of the mobility sector, and also facilitate the extension and application of mobile physical AI in more scenarios.

Table of Contents

1 Fundamentals of End-to-End Autonomous Driving Technology

1.1 Terms and Concepts of End-to-End Autonomous Driving
- Explanation of End-to-End Autonomous Driving Terminologies
- Correlation and Differences of End-to-End Related Concepts
1.2 Introduction to End-to-End Autonomous Driving and Development Status
1.2.1 Overview
- Emerging Background of End-to-End Autonomous Driving
- Deduced Impacts of Large AI Models on the Pattern of Autonomous Driving Industry
- Reasons for the Emergence of End-to-End Autonomous Driving: Commercial Value
- Transformer Enables Autonomous Driving
- Differences between End-to-End and Traditional Architectures (1)
- Differences between End-to-End and Traditional Architectures (2)
- Evolution of End-to-End Architecture
- Evolution Route of End-to-End Autonomous Driving
- Comparison between One-Model and Two-Model End-to-End
- Performance Parameter Benchmarking of Mainstream One-Model/Segmented End-to-End Systems
- Challenges and Solutions for Large-Scale Mass Production of End-to-End: Computing Power Supply/Data Acquisition
- Challenges and Solutions for Large-Scale Mass Production of End-to-End: Team Building/Interpretability
- Progress and Challenges in End-to-End Systems: World Model Generation + Neural Network Simulator + RL Accelerating Innovation
- Perception Layer under End-to-End Architecture
1.2.2 Implementation Methods of End-to-End Models
- Two Implementation Approaches for End-to-End
- End-to-End Implementation Method: Imitation Learning
- End-to-End Implementation Method: Reinforcement Learning
- Basic Architecture and Definition of Reinforcement Learning
- Mainstream Reinforcement Learning Algorithms
1.2.3 Verification Methods of End-to-End Models
- Dataset Evaluation Methods for End-to-End Autonomous Driving
- Three Major Simulation Tests for End-to-End Autonomous Driving Models (1) - Bench2Drive
- Three Major Simulation Tests for End-to-End Autonomous Driving Models (2) - HUGSIM
- Three Major Simulation Tests for End-to-End Autonomous Driving Models (3) - DriveArena
1.3 Classic End-to-End Autonomous Driving Cases
- SenseTime UniAD: Path Planning-Oriented Large AI Model Provides E2E Commercial Scenario Applications
- Technical Principles and Architecture of SenseTime UniAD
- Technical Principles and Architecture of Horizon VAD
- Technical Principles and Architecture of Horizon VADv2
- Training of VADv2
- Technical Principles and Architecture of DriveVLM
- Li Auto Adopts Mixture-of-Experts (MoE) Architecture
MOE and STR2
Shanghai Qi Zhi Institute's E2E-AD Model SGADS: A Safe and Generalized E2E-AD System Based on Reinforcement Learning and Imitation Learning
Shanghai Jiao Tong University's ActiveAD Active Learning Case: Solving Data Labeling Bottleneck from A Data-centric Perspective
Most End-to-End Autonomous Driving Systems Are Developed Based on Foundation Models
1.4 Foundation Models
- 1.4.1 Introduction to Foundation Models
- Significance of Introducing Multimodal Models into End-to-End Autonomous Driving
- Core of End-to-End Systems - Foundation Models
- Foundation Model 1: Large Language Model (LLM) - Application Cases in Autonomous Driving
- Foundation Model 2: Vision Foundation - Application in Intelligent Driving
- Foundation Model 2: Vision Foundation - Latent Diffusion Models Framework
- Foundation Model 2: Vision Foundation - Wayve GAIA-1
- Foundation Model 2: Vision Foundation - DriveDreamer Framework
- Foundation Model 3: Multimodal Foundation Model - MFM
- Foundation Model 3: Multimodal Foundation Model - Application of GPT-4V in Intelligent Driving
- 1.4.2 Foundation Models - Multimodal Foundation Model
- Development and Overview of Multimodal Foundation Model
- Multimodal Foundation Model vs. Single-Modal Foundation Model (1)
- Multimodal Foundation Model vs. Single-Modal Foundation Model (2)
- Technical Panorama of Multimodal Foundation Model
- Multimodal Information Representation
- 1.4.3 Foundation Models - MLLM
- Multimodal Large Language Model (MLLM)
- Architecture and Core Components of Multimodal Large Language Model
- Mainstream Multimodal Large Language Models
- Application of Multimodal Large Language Model in Intelligent Driving
- CLIP Model
- LLaVA Model
1.5 Vision-Language Model (VLM)
Application of Vision-Language Model (VLM) in Intelligent Driving
Application of Foundation Models in Autonomous Driving
Application of Vision-Language Model (VLM)
Development History of Vision-Language Model (VLM)
Architecture of Vision-Language Model (VLM)
Application Principles of VLM in End-to-End Autonomous Driving
Application of VLM in End-to-End Autonomous Driving
Challenges Faced by VLM Models in Intelligent Driving
1.6 Vision-Language-Action Model (VLA)
VLM -> VLA
VLM + E2E -> VLA
Analysis of VLA Architecture
Typical VLA Architectures
VLA Architecture Analysis Case: Disassembling Li Auto MindVLA Architecture (1)
VLA Architecture Analysis Case: Disassembling Li Auto MindVLA Architecture (2)
Concept of VLA Large Models
Principles of VLA Model
Classification of VLA Models
Interpretation of VLA Technology Evolution
Large Language Model as One of the Cores of End-to-End
Technical Architecture and Key Technologies of VLA
Advantages of VLA (1)
Advantages of VLA (2)
Advantages of VLA (3)
Deployment Challenges of VLA Model - Real-Time Response Capability
Real-Time Performance and Memory Occupancy Challenges of VLA Model Deployment
Deployment Challenges of VLA Model - Data (1)
Deployment Challenges of VLA Model - Data (2)
Deployment Challenges of VLA Model - Long-Term Task Planning Capability
Evolution Route of VLA Large Models
Representative Models of VLA Technical Paradigms
VLA Datasets and Benchmarks
1.7 World Model
World Model Prototype: Mental Model (1)
World Model Prototype: Mental Model (2)
Key Definitions and Application Development of World Model
Basic Architecture of World Model
Three Core Values of World Model Empowering Autonomous Driving
Two Major Technical Routes of World Model
Generative World Model DIAMOND: Diffusion Model + Real-Time RL Adaptation + Long-Term Stability
Generative Interactive World Model Genie: Unsupervised Learning of Real-World Physical Laws from Unlabeled Internet Videos
Technical Principles and Paths of WorldDreamer
Implicit World Model: Technical Principles and Paths of V-JEPA2
Implicit World Model: Technical Principles and Paths of Comma.ai
Framework Setting and Implementation Difficulties of World Model
Video Generation Methods Based on Transformer and Diffusion Models
World Model May be One of the Ideal Approaches to Realize End-to-End Autonomous Driving
World Model - Generation of Virtual Training Data
World Model - Tesla World Model
World Model - NVIDIA
InfinityDrive: Breaking Time Limits in Driving World Models
Parameter Performance of SenseAuto InfinityDrive
Pipeline of SenseAuto InfinityDrive
SenseTime DiT Architecture and Main Video Generation Evaluation Metrics FID/FV
Deployment Challenges of World Model in Autonomous Driving
1.8 Comparison between End-to-End Large Model Technical Paradigms
- 1.8.1 Technical Paradigm Comparison: Modular End-to-End vs. One-Model End-to-End vs. VLM/VLM+E2E/VLA
- Summary of Comparison between Three Mainstream Intelligent Driving Models (1): Modular / One-Model End-to-End / Foundation Model-Based Autonomous Driving Paradigm
- Summary of Comparison between Three Mainstream Intelligent Driving Models (2): Modular / One-Model End-to-End / Foundation Model-Based Autonomous Driving Paradigm
- Summary of Comparison between Three Mainstream Intelligent Driving Models (3): Modular / One-Model End-to-End / Foundation Model-Based Autonomous Driving Paradigm
- Definition and Classification of Generalized End-to-End (GE2E)
- Comparison of Different GE2E Autonomous Driving Paradigms: Planning-Only E2E vs. Multi-Task E2E
- Comparison of Different GE2E Autonomous Driving Paradigms: VLM-Driven Cognitive End-to-End Driving
- Comparison between Two Technical Paradigms: VLM + Traditional E2E
- Architecture Summary of Various GE2E Autonomous Driving Models
- Performance Comparison between Various GE2E Autonomous Driving Models
- 1.8.2 Technical Paradigm Comparison: VLA vs. World Model
- VLA vs. World Model: Who will Win?
- Performance Competition between VLA and World Model
- Summary of Comparison between VLM/VLA/World Models
1.9 Diffusion Models
Four Mainstream Generative Models
Principles of Diffusion Models
Diffusion Models Optimize Core Links of Intelligent Driving Trajectory Generation
Diffusion Models Optimize Intelligent Driving Trajectory Generation
Application of Diffusion Models in Intelligent Driving
Practical Application Cases of Diffusion Model

2 Technical Routes and Development Trends of End-to-End Autonomous Driving

2.1 Technical Trends of End-to-End Autonomous Driving
Summary of Evolution Route of Intelligent Driving End-to-End Large Models
Trend 1: The Core Focus of Autonomous Driving Large Model Evolution in 2026 Will Be Competition and Deep Integration of Multiple Technical Routes
Integration Case 1: Overall Architecture of Afari Technology's Autonomous Driving System Adopts VLA+E2E Collaborative Closed Loop
Integration Case 2: L3-Capable World Action Model (WAM) Builds Trinity Architecture of "VLA + World Model + Safety Adversarial Model"
Trend 2: VLA and World Model Fusion Paradigm Is Expected to Become One of the Mainstream Approaches for Physical AI Implementation
VLA+World Model Integration Case 1: Xiaomi OneVL Unifies VLA and World Model into One Framework
Disassembly of Xiaomi OneVL Architecture
VLA+World Model Integration Case 2: XPeng Launches X-World
VLA+World Model Integration Case 3: Huawei DriveVLA-W0 Predicts Future Images via World Modeling Tasks
Disassembly of DriveVLA-W0 Architecture
DriveVLA-W0 Leverages World Models to Amplify Autonomous Driving Data Scaling Law
VLA+World Model Integration Case 4: Bosch ExploreVLA Introduces World Model Based on VLA+RL to Achieve Three Major Breakthroughs
Disassembly of Bosch ExploreVLA Model Architecture
Trend 3: Autonomous Driving Is Entering the Physical AI Stage
Ultimate Form of Physical AI Connects Digital and Physical Worlds, and Autonomous Driving Serves as Its Optimal Implementation Carrier
Trend 4: Evolution of Intelligent Driving AI Towards Foundation Models Accelerates, and the Industry Will Enter A Competition Period of General Cognitive and Reasoning Capabilities of Foundation Models
Case 1: Hardcore Technological Innovations in DeepRoute 40B VLA Foundation Model
Case 2: Core of 2026 Strategy of Zhuoyu Technology: Building Mobile Intelligent Foundation Model (1)
Case 2: Core of 2026 Strategy of Zhuoyu Technology: Building Mobile Intelligent Foundation Model (2)
Case 3: XPeng World Foundation Model
Trend 5: End-to-End Autonomous Driving Has Entered the Stage of Data Closed-Loop Competition and Refined Operation
Case: NVIDIA MOSAIC
Trend 6: Robots and Intelligent Driving Become Two Mainstream E2E Application Scenarios on the Road to AGI (1)
Trend 6: Robots and Intelligent Driving Become Two Mainstream E2E Application Scenarios on the Road to AGI (2)
2.2 End-to-End Autonomous Driving Market Trends
Comparison of End-to-End Autonomous Driving Large Model Layout between ADAS Tier 1 Suppliers (1)
Comparison of End-to-End Autonomous Driving Large Model Layout between ADAS Tier 1 Suppliers (2)
Comparison of End-to-End Autonomous Driving Large Model Layout between ADAS Tier 1 Suppliers (3)
Comparison of End-to-End Autonomous Driving Large Model Layout between ADAS Tier 1 Suppliers (4)
Comparison of End-to-End Autonomous Driving Large Model Layout between ADAS Tier 1 Suppliers (5)
Solution Layout Comparison between Other End-to-End Autonomous Driving System Suppliers
Comparison of End-to-End Autonomous Driving Large Model Layout between OEMs (1): Xiaomi, XPeng, Li Auto, NIO
Comparison of End-to-End Autonomous Driving Large Model Layout between OEMs (2): Changan, BYD, Leapmotor
Comparison of End-to-End Autonomous Driving Large Model Layout between OEMs (3): Chery, Dongfeng, IM Motors
Comparison of End-to-End Autonomous Driving Large Model Layout between OEMs (4): GAC, FAW Hongqi, Geely

3 End-to-End Autonomous Driving Suppliers

3.1 Afari Technology - End-to-End Autonomous Driving Model
Profile
Fully Entering into AI-Driven Intelligent Vehicle Era
AI + Vehicle Strategy
Top-Level Strategy and Commercial Closed Loop
Ecosystem Alliance
Judgment on Next-Generation End-to-End Architecture Trend (1)
Judgment on Next-Generation End-to-End Architecture Trend (2)
Judgment on Next-Generation End-to-End Architecture Trend (3)
End-to-End Large Model Architecture: E2E2.0+VLA
E2E Architecture
World Model Closed-Loop Simulation Architecture
Native Intelligent Driving Foundation Model
Three Major Businesses (1)
Three Major Businesses (2): Robotaxi Deployment Plan, 2026-2030
Evolution Route of Intelligent Driving Solutions (ASD1.0 to ASD4.0) and End-to-End Large Model
Mass Production of Chongqing Qianli Intelligent Driving Technology Co., Ltd.
3.2 Horizon Robotics - End-to-End Autonomous Driving Large Model
Ultimate Strategic Roadmap: 2025-2030+
Three Strategic Evolutions
Latest Product Launches in 2026 (1)
Latest Product Launches in 2026 (2)
Adopts One-Model End-to-End + VLM Solution
Introduction of Reinforcement Learning and World Model
Thoughts on One-Model End-to-End Large Models
Urban Driving Assistance System: HSD
Journey 6 Series Chips
SparseDriveV2 (1)
SparseDriveV2 (2)
UMGen: Unified Framework for Multimodal Driving Scene Generation
GoalFlow: Goal-Driven Approach Unlocking New Future of Generative End-to-End Strategies
MomAD: Momentum-Aware Planning in End-to-End Autonomous Driving
DiffusionDrive: Towards Generative Multimodal End-to-End Autonomous Driving
RAD: Post-Training Paradigm of End-to-End Reinforcement Learning Based on 3DGS Digital Twin World
Mass Production
Super Drive High-Level Intelligent Driving and Its Advantages
Architecture and Technical Principles of Super Drive
Senna Intelligent Driving System (Large Model + End-to-End)
Core Technologies and Training Methods of Senna
Core Modules of Senna
3.3 Zhuoyu Technology - Intelligent Driving Large Model
Comparison of Three Intelligent Driving Model Paradigms: One-Model End-to-End, World Model and VLA (1)
Comparison of Three Intelligent Driving Model Paradigms: One-Model End-to-End, World Model and VLA (2)
Launched Mobile Physical AI Foundation Model in 2026: Native Multimodal Foundation Model
Comparison between Three VLA Technical Paradigms and Zhuoyu's 2026 Native Multimodal Foundation Model
Evolution Route of ClixPilot End-to-End Large Model (1)
Evolution Route of ClixPilot End-to-End Large Model (2)
End-to-End World Model Architecture
Two-Stage Training Model for End-to-End World Model
Core Functions of Generative Intelligent Driving GenDrive
Core Technologies of Generative Intelligent Driving
Two-Model End-to-End
Interpretable One-Model End-to-End
Mass Production and Clients of End-to-End
3.4 NVIDIA - Intelligent Driving Large Model
Ten-Year Layout of Autonomous Driving Business
L2++/L4 Intelligent Driving Plan (2026-2030)
L3 and L4 Implementation Roadmap of NVIDIA
DRIVE Full-Stack Driving Assistance Platform: 5-Layer Architecture
Drive Hyperion 10 (1): Hardware Configuration
Drive Hyperion 10 (2): Software Architecture
Building Autonomous Driving Safety and AI Ecosystem Based on Halos OS
DRIVE AV Intelligent Driving Large Model Solution: VLA + Classic Rule-Based Algorithms
E2E+VLM->Drive VLA (1)
E2E+VLM->Drive VLA (2)
VLA On-Vehicle Deployment Solution (1)
VLA On-Vehicle Deployment Solution (2)
Launched Alpamayo 1.5
Drive VLA Technical Route: 10B Large Model Alpamayo 1.5
New-Generation In-Vehicle Computing Platform - Drive Thor
World Foundation Model Development Platform - Cosmos
Cosmos Training Paradigm
NVIDIA DriveOS: Foundation Platform Built for Autonomous Driving
Core Design Concept of NVIDIA Multicast
End-to-End Intelligent Driving Framework - Hydra-MDP
Self-Developed Model Architecture - Model Room
3.5 Momenta - Intelligent Driving Large Model
Profile
R7 Reinforcement Learning World Model
Mass-Produced Vehicles Equipped with R7
R6 Flywheel Large Model
Disassembly of One-Model End-to-End
Algorithm Development Path
Evolution Roadmap of Intelligent Driving Large Models
Intelligent Driving Technology Evolution and Industrial Paradigm Changes
End-to-End Planning Architecture
End-to-End Large Model Mass Production Solutions
3.6 DeepRoute.ai - Intelligent Driving Large Model
Product Layout and Strategic Deployment
Launched Unified Foundation Model in 2026
Principle, Architecture and Technical Highlights of 40B VLA Foundation Model (1)
Principle, Architecture and Technical Highlights of 40B VLA Foundation Model (2)
Principle, Architecture and Technical Highlights of 40B VLA Foundation Model (3)
Value Brought by Foundation Models
End-to-End Intelligent Driving Large Model Evolution, 2023-2026
DeepRoute IO 2.0: VLA 2.0 (1)
DeepRoute IO 2.0: VLA 2.0 (2)
VLA2.0 Designated Mass Production Projects
Adopted End-to-End Intelligent Driving Solutions in 2023
In-Depth Cooperation with Volcano Engine in 2025
Implementation Platform of RoadAGI - AI Spark
End-to-End VLA Model: VLA1.0
End-to-End VLA Model: Architecture of VLA1.0
End-to-End 1.0 Designated Mass Production Projects
Introduction of Hierarchical Hint Tokens
End-to-End Training Solution - DINOv2
Application Value of DINOv2 in Computer Vision
VQA Evaluation Dataset for Intelligent Driving
BLEU Evaluation Metrics and CIDEr Automatic Evaluation Metric for Image Caption Generation Tasks
Score Comparison between DeepRoute HoP and Huawei Solution
3.7 Huawei - End-to-End Intelligent Driving Large Model
Evolution Roadmap of Qiankun Intelligent Driving Large Model (ADS2.0 to ADS5)
ADS 5 (1): WEWA 2.0 Architecture
Comparison between WEWA 2.0 and WEWA 1.0
ADS 5 (2): Computing Power
ADS 5 (3): Benchmarking of Four Versions and Production Vehicle Models
Hierarchical Architecture of Pangu Large Model
Pangu Model Product System (1)
Pangu Model Product System (2)
ADS 4: WEWA 1.0
In-Depth Integration of ADS 4 and XMC, and Cloud Simulation Verification
ADS 4: Commercial L3 Highway Solution
Mass Production of ADS 4 End-to-End
ADS 2.0 (1): End-to-End Concept and Perception Algorithm
ADS 2.0 (2): End-to-End Concept and Perception Algorithm
Summary of ADS 2.0
ADS 3.0 (1): End-to-End
ADS 3.0 (2): End-to-End
ADS 3.0 (3): ADS 3.0 vs. ADS 2.0
ADS 3.0 End-to-End Application Case (1): STELATO S9
ADS 3.0 End-to-End Application Case (2): LUXEED R7
ADS 3.0 End-to-End Application Case (3): AITO Series
Architecture and Principles of Perception-Enhanced World-Awareness-Action Model (Percept-WAM) (1)
Architecture and Principles of Perception-Enhanced World-Awareness-Action Model (Percept-WAM) (2)
Architecture and Principles of Perception-Enhanced World-Awareness-Action Model (Percept-WAM) (3)
Multimodal LLM End-to-End Autonomous Driving Solution
End-to-End Test - VQA Tasks
Architecture of DriveGPT4
End-to-End Training Solution Case
Two Training Stages of DriveGPT4
Comparison between DriveGPT4 and GPT4V
3.8 QCraft - Intelligent Driving Large Model
Product Matrix in Intelligent Driving: Three-Tier Product Matrix of Intelligent Driving System QPilot 2.0
Mass-Produced Urban NOA End-to-End Solution Based on Single Journey 6M Chip
Core Technologies Implementing Urban NOA with Single J6M Chip: Interpretable One-Model End-to-End
Core Technologies Enabling Ultimate Urban NOA Experience: VLA and World Model Architecture
Evolution of Intelligent Driving Large Models
Intelligent Driving Solution Evolution Roadmap
Data and Model Training Closed Loop
Ecosystem Partners Panorama
3.9 Bosch - Intelligent Driving Large Model
Zongheng Driving Assistance Solution
Urban Driving Assistance Solution Based on End-to-End Model
China Strategic Layout of Bosch Mobility
Bosch Mobility Launched New Organizational Restructuring and Strategic Cooperation Based on End-to-End Development Trends
Adopt One-Model End-to-End for Mass Production Solutions
End-to-End Technical Route of Premium Zongheng Driving Assistance Solution
Disassembly of One-Model End-to-End Technical Paradigm
Comparison between End-to-End Mass Production Solutions
Overall Design Idea of CriticVLA
Architecture of CriticVLA (1)
Architecture of CriticVLA (2)
Classification System of Foundation Models for Autonomous Driving Trajectory Planning
Customized Foundation Models for Trajectory Planning: Fine-Tuning
Foundation Model for Autonomous Driving Trajectory Planning: Customized Foundation Models for Trajectory Planning
Foundation Model for Autonomous Driving Trajectory Planning: Models Focused Solely on Trajectory Planning
Models and Core Features of Trajectory Planning Methods with Language Interaction Capability
Core Features of Models with Action Interaction Capability: Training Datasets, Training Methods and Evaluation Metrics
3.10 WeRide - End-to-End Large Model
Profile
Business Model
Financial Overview, 2023-2025
Five Major Product Matrices
Exploration of Business Model for L4 Autonomous Driving Multi-Scenario Application
Traditional Autonomous Driving Architecture: Two Major Problems of Perception-Prediction-Planning-Control Modular Pipeline
Unsolved Problems of One-Model End-to-End
E2E + Traditional Pipeline Dual Architecture
E2E Model Architecture
Evolution Route of End-to-End Autonomous Driving Large Models
Hardware Architecture of Gen8 L4 Autonomous Driving System
HPC 3.0
Self-Developed General Simulation Model: WeRide GENESIS
3.11 Pony.ai - End-to-End Intelligent Driving Large Model
Profile
Three Major Business Lines and Business Model
Robotaxi Business Layout
Business Model of Robotaxi
Revenue Overview, 2024-2025
Comparative Analysis between Pony.ai and WeRide: Market Value, Revenue, Business, Robotaxi Business and Intelligent Driving Models
PonyWorld World Model 2.0 (1)
PonyWorld World Model 2.0 (2)
PonyWorld World Model 2.0 (3)
PonyWorld World Model 2.0 (4)
E2E End-to-End Intelligent Driving Model
Evolution Route of 1st to 7th Generation Robotaxi Products
Released New-Generation Autonomous Driving Domain Controller
Ecosystem Partners
3.12 Baidu - End-to-End
DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment
Overview of Baidu Apollo
Robotaxi Business Layout
Commercial Implementation Progress of Robotaxi (1): Overseas Markets
Commercial Implementation Progress of Robotaxi (2): Domestic Market
Key Nodes of Robotaxi Deployment in 8 Cities in China, 2021-2026
Two-Model End-to-End: Adopt the Strategy of Segmenting First and Then Joint Training
Production Vehicle Equipped with Two-Model End-to-End Architecture: Jiyue 07
Baidu Automotive Cloud 3.0 Enables End-to-End Systems in Three Aspects (1)
Baidu Automotive Cloud 3.0 Enables End-to-End Systems in Three Aspects (2)
3.13 SenseAuto - End-to-End
Profile
Technical Route Analysis 1: End-to-End Autonomous Driving Evolution Roadmap
Technical Route Analysis 2: Analysis of Generative Intelligent Driving R-UniAD (1)
Technical Route Analysis 3: Analysis of Generative Intelligent Driving R-UniAD (2)
Architecture of R-UniAD
Practical Demonstration of R-UniAD: Complex Scene Mining, 4D Simulation Reproduction, Reinforcement Learning and Generalization Verification
Kaiwu World Model 2.0
Mass Production
Released UniAD End-to-End Solution
DriveAGI: New-Generation Intelligent Driving Large Model and Its Advantages
DiFSD: End-to-end Intelligent Driving System That Simulates Human Driving Behaviors
DiFSD: Technical Interpretation
3.14 Wayve - Intelligent Driving Large Model
Profile
Advantages of AV 2.0
Latest Progress: Architecture of GAIA-1 World Model
GAIA-1 World Model - Token
GAIA-1 World Model - Generation Effects
LINGO-2 Model
3.15 Waymo - Intelligent Driving Large Model
Foundation Model
Building the Driver Algorithm
Validating the Driver Algorithm
Released Multimodal End-to-End Model EMMA
EMMA: Multimodal Input
EMMA: Defining Driving Tasks as Visual Q&A
EMMA: Introducing Chain-of-Thought Reasoning to Enhance Interpretability
Limitations of EMMA Model
Implementation and Operation
3.16 GigaAI - End-to-End
Profile
Evolution Route of World Models
Hierarchical Construction Method for 4D Generative World Models
Application of World Models (1)
Application of World Models (2)
ReconDreamer
World Model: DriveDreamer
World Model: DriveDreamer 2
Overall Framework of DriveDreamer4D
3.17 Nullmax - Intelligent Driving Large Model
Profile
MaxDrive Driving Assistance Solution
New-Generation Intelligent Driving Technology - Nullmax Intelligence
End-to-End Technical Architecture
End-to-End Data Platform
HiP-AD: End-to-End Intelligent Driving Framework Based on Multi-Granularity Planning and Deformable Attention
Mass Production

4 End-to-End Autonomous Driving Layout of OEMs

4.1 Xiaomi
Profile
2026 Strategic Planning
Comprehensive Analysis of New Vehicle Planning in 2026
Product Positioning and Parameter Benchmarking of 2026 New Vehicles (1)
Product Positioning and Parameter Benchmarking of 2026 New Vehicles (2)
Organizational Structure Changes of Intelligent Driving Division
Intelligent Driving Technical Route: Full-Route Pre-Research without Betting on Single Technology
Comparison between VLA and End-to-End Routes
Intelligent Driving Algorithm Evolution Trend: from Modular End-to-End to End-to-End Architecture Introducing World Model + Reinforcement Learning
Launched XLA Cognitive Large Model in 2026
Evolution Roadmap of Intelligent Driving System and Large Models
Enhanced Version of HAD (1)
Enhanced Version of HAD (2)
End-to-End VLA Intelligent Driving Solution ORION
ORION Framework
Physical World Modeling Architecture
Multi-Model End-to-End with Three-Layer Separated Modeling
Long Video Generation Framework - MiLA
4.2 XPeng
Evolution Roadmap of End-to-End Intelligent Driving Large Models
Autonomous Driving Product Planning, 2025-2026
L4 Autonomous Driving Layout in 2026: Robotaxi
Second-Generation VLA: Native Multimodal Physical World Large Model
L4 Capability = Model × Computing Power × Data × Vehicle Hardware
Second-Generation VLA (1)
Second-Generation VLA (2)
World Foundation Model (1)
World Foundation Model (2)
Core Technical Path of World Foundation Model
Three Phased Achievements in R&D of World Foundation Model
Cloud Model Factory (1)
Cloud Model Factory (2)
End-to-End System: Architecture
4.3 Li Auto
Evolution Roadmap of End-to-End Intelligent Driving Large Models (1)
Evolution Roadmap of End-to-End Intelligent Driving Large Models (2)
Launched New-Generation Unified Architecture MindVLA-o1 in 2026 (1)
Launched New-Generation Unified Architecture MindVLA-o1 in 2026 (2)
Next-Generation Unified Architecture MindVLA-o1 (1)
Next-Generation Unified Architecture MindVLA-o1 (2)
Next-Generation Unified Architecture MindVLA-o1 (3)
Evolution from E2E+VLM Dual System to MindVLA
Architecture of MindVLA Model
Core Technology 1 of MindVLA: Great 3D Physical Spatial Perception Capability
Core Technology 2 of MindVLA: Integration with Large Language Model (LLM)
Core Technology 3 of MindVLA: Combination of Diffusion and RLHF
Core Technology 4 of MindVLA: World Model and NVAIE Accelerated Reinforcement Learning
End-to-End Solution (1): Iterative Evolution of System 1
End-to-End Solution (2): System 1 (End-to-End Model) + System 2 (VLM)
End-to-End Solution (3): Intelligent Driving Technical Architecture
End-to-End Solution (4): DriveVLM Large Model - Architecture
End-to-End Solution (5): DriveVLM Large Model - Rendering Effects
End-to-End Solution (6): DriveVLM Large Model - BEV and Text Feature Processing
4.4 Tesla
Interpretation of 2024 AI Conference
Development History of AD Algorithms
Summary of End-to-End Progress, 2023-2024
FSD v13 (1)
FSD v13 (2)
FSD v13 (3): Subsequent Updates
Development History of AD Algorithms: Entering the Perception-heavy Map-light Era
Development History of AD Algorithms: Shadow Mode
Development History of AD Algorithms: Background of Occupancy Network Adoption
Development History of AD Algorithms: Occupancy Network (1)
Development History of AD Algorithms: Occupancy Network (2)
Development History of AD Algorithms: Occupancy Network (3)
Development History of AD Algorithms: Multi-Camera Fusion Algorithm HydraNet
Development History of AD Algorithms: FSD V12
Core Elements of Perception-Decision Full-Stack Integrated Model
End-to-End Algorithms
World Model (1)
World Model (2)
Data Engine
Dojo Supercomputer Center: Overview
Dojo Supercomputer Center: Training Tile Based on D1 Chip Integration
Dojo Supercomputer Center: Computing Power Development Plan
4.5 NIO
Organizational Structure Adjustment of Intelligent Driving Division, 2024-2025
From Model-Based to End-to-End, World Model Becomes Dominant Technical Paradigm
Evolution Route of End-to-End Large Models
Detailed Explanation of Intelligent Driving System
NIO World Model (NWM) (1)
NIO World Model (NWM) (2)
Imagination Reconstruction Capability and Swarm Intelligence of World Model
NSim Simulator (NIO Simulation)
World Model 2.0
Comparison between End-to-End Model and World Model
Comparison between VLA and World Model
4.6 Changan
Dubhe Plan 2.0 - Tianshu Intelligent Driving
Software Architecture of TOPS AD
Brand Layout
ADAS Strategy: "Dubhe Plan" Strategy
End-to-End System: BEV+LLM+GoT (1)
End-to-End System: BEV+LLM+GoT (2)
Production Vehicle Equipped with End-to-End System: NEVO E07
4.7 Chery
Product Matrix and Vehicle Models
Evolution History of Intelligent Driving System
Launched Four Versions of Falcon Pilot in 2025
Progress of End-to-End Intelligent Driving Large Models (1)
Progress of End-to-End Intelligent Driving Large Models (2)
4.8 GAC Group
Intelligent Driving Large Model Strategy
Evolution Roadmap of ADiGO Intelligent Driving System (ADiGO1.0 to ADiGO6.0)
Launched Five Major Intelligent Driving Platforms in 2025
L2.9 Vehicles and Urban NOA Algorithm/Intelligent Driving System Suppliers
Achieves "High-End Orientation + Mass Popularization" of Urban NOA through "Dual-Gradient Intelligent Driving Suppliers + Scenario-Price Precision Matching" Strategy
Established Huawang Adopting the "GAC Smart Manufacturing + Huawei Intelligence" Model to Expand High-End Market and Improve Brand Matrix
First Model Huawang Aistaland F03 Expected to Be Launched in Q2 2026
Momenta 5.0 One-Model End-to-End Algorithm Is Deployed on RMB150,000-Level Vehicles, and Urban NOA Function Is Also Available
Trumpchi Xiangwang S7 to Be Equipped with Momenta R6 Reinforcement Large Model
Architecture of ADiGO End-to-End Embodied Reasoning Model
Core Technologies of ADiGO
4.9 Leapmotor
Released World Model in 2026
D19 Adopts VLA Large Model to Realize Full-Scenario Door-to-Door NOA
Adopts Intelligent Driving System Self-Development Model
Evolution Roadmap of Leapmotor Pilot (1)
Evolution Roadmap of Leapmotor Pilot (2)
End-to-End High-Level Intelligent Driving
Application Scenarios of End-to-End High-Level Intelligent Driving
4.10 IM Motors
Iteration History of Intelligent Driving System
Cooperation with Momenta on Intelligent Driving
IM AD End-to-End 2.0 Intelligent Driving Large Models
Core Technologies of IM AD End-to-End 2.0 Intelligent Driving Large Models
Application Scenario Comparison between IM AD End-to-End 2.0 Intelligent Driving Large Models
4.11 FAW Hongqi
Technical Architecture of Sinan Intelligent Driving
Core Technologies of End-to-End Large Models
Sinan Intelligent Driving Solution
Vehicle Deployment Schedule and Future Planning of Sinan Intelligent Driving Solution
Sinan Intelligent Driving System: Co-Developed with DJI Zhuoyu Technology (1)
Sinan Intelligent Driving System: Co-Developed with DJI Zhuoyu Technology (2)
Deployed Vehicles and Key Configurations of Sinan Intelligent Driving System
Zhuoyu End-to-End 4.0 System Debuted with Sinan Intelligent Driving in 2026
FAW Hongqi 9 Series Models to Adopt Huawei Hi Mode in 2026
4.12 Dongfeng
Intelligent Driving Strategic Plan 2026-2030
Launched Four-Tier Tianyuan Intelligent Driving Product Matrix in 2025: Full Coverage from L2 to L4/L5
Comparison of Intelligent Driving Configurations between Production Vehicles First Equipped with Tianyuan T100/T200/T500
Tianyuan Intelligent Driving Technical Architecture R-AiD
Intelligent Driving Strategy: Self-development + External Procurement in Parallel in Short Term, and Gradual Self-development for Replacement in Long Term
4.13 BYD
Overview of 2026 Intelligent Driving Planning
Layout in Intelligent Driving Field: Pre-Research on World Models
Organizational Structure Adjustment of Intelligent Driving Team (1): Integration of Dual Intelligent Driving Departments to Pool Resources to Accelerate Universal Intelligent Driving
Organizational Structure Adjustment of Intelligent Driving Team (2): Establishment of Advanced Technology R&D Center to Increase Investment in

Intelligent Driving End-to-End Large Model Research Report, 2026

Table of Contents

1 Fundamentals of End-to-End Autonomous Driving Technology

2 Technical Routes and Development Trends of End-to-End Autonomous Driving

3 End-to-End Autonomous Driving Suppliers

4 End-to-End Autonomous Driving Layout of OEMs
