Market Report
Product Code: 1613812
China Automotive Multimodal Interaction Development Research Report, 2024
Multimodal interaction research: AI foundation models deeply integrate into the cockpit, helping perceptual intelligence evolve into cognitive intelligence
China Automotive Multimodal Interaction Development Research Report, 2024, released by ResearchInChina, combs through the interaction modes of mainstream cockpits, the application of interaction modes in key vehicle models launched in 2024, and the cockpit interaction solutions of OEMs/suppliers, and summarizes the development trends of cockpit multimodal interaction fusion.
Among current cockpit interaction applications, voice interaction is the most widely and frequently used in intelligent cockpits. According to the latest statistics from ResearchInChina, from January to August 2024 automotive voice systems were installed in about 11 million vehicles, a year-on-year increase of 10.9%, for an installation rate of 83%. Li Tao, General Manager of Baidu Apollo's intelligent cockpit business, pointed out that "the frequency of people using cockpits has increased from 3-5 times a day at the beginning to double digits today, and has even reached nearly three digits on some models with leading voice interaction technology."
The frequent use of the voice recognition function not only greatly optimizes the user's interactive experience, but also drives a trend toward fusion with other interaction modes such as touch and face recognition. For example, the full-cabin memory function of NIO Banyan 2.4.0 is based on face recognition: NOMI actively greets occupants whose information has been recorded (e.g., "Good morning, Doudou"). Zeekr 7X integrates voice recognition with gaze interaction, so the driver can control a function simply by looking at it and speaking, or tilt his/her head and use voice to operate the car.
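A fused interaction of this kind can be viewed as resolving a voice command against the occupant's current gaze target before dispatching a cockpit action. The Python sketch below is purely illustrative, with hypothetical gaze-target labels, intents, and a `fuse` helper rather than any OEM's actual interface.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical multimodal fusion sketch: a voice command is resolved against
# the driver's current gaze target, so a bare "open" can mean "open the thing
# the driver is looking at". Labels and logic are illustrative only.

@dataclass
class GazeEvent:
    target: str        # e.g. "left_window", "sunroof", "navigation_screen"
    confidence: float  # 0.0 - 1.0

@dataclass
class VoiceCommand:
    intent: str                             # e.g. "open", "close", "play_music"
    explicit_target: Optional[str] = None   # target named in the utterance, if any

def fuse(voice: VoiceCommand, gaze: Optional[GazeEvent],
         min_gaze_confidence: float = 0.6) -> str:
    """Resolve the final cockpit action from voice input plus gaze context."""
    if voice.explicit_target:                      # a spoken target wins
        return f"{voice.intent}:{voice.explicit_target}"
    if gaze and gaze.confidence >= min_gaze_confidence:
        return f"{voice.intent}:{gaze.target}"     # fall back to the gaze target
    return f"{voice.intent}:ask_user_to_clarify"   # not enough context

if __name__ == "__main__":
    gaze = GazeEvent(target="left_window", confidence=0.85)
    print(fuse(VoiceCommand(intent="open"), gaze))                          # open:left_window
    print(fuse(VoiceCommand(intent="open", explicit_target="sunroof"), gaze))
```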
Compared with mature interaction modes such as voice and face recognition, biometric technologies such as fingerprint, vein, and heart rate recognition are still at an early stage of exploration and development, but they are gradually reaching mass production and application. For example, BYD launched a palm vein recognition function in 2024 that enables convenient vehicle unlocking; Genesis and Mercedes-Benz introduced fingerprint recognition systems in the 2025 Genesis GV70 and the 2025 Mercedes-Benz EQE BEV respectively, allowing users to complete operations such as identity verification, vehicle start, and payment with a fingerprint alone; and the new Exeed Sterra ET uses visual perception technology from ArcSoft to realize an in-cabin intelligent health monitoring function, outputting health reports that cover five major physical indicators: heart rate, blood pressure, blood oxygen saturation, respiratory rate, and heart rate variability.
The introduction of biometric technologies not only improves driving convenience, but also significantly enhances the safety protection of vehicles, effectively preventing potential hazards such as fatigued driving and vehicle theft. In the future, these biometric technologies are expected to be integrated more widely into the development of intelligent connected vehicles, providing drivers with a safer and more personalized mobility experience.
Case 1: The fingerprint recognition system of the 2025 Genesis GV70 allows users to quickly apply personalized settings (seat, driving position, etc.) through fingerprint authentication, and also supports vehicle start and drive authorization. In addition, it offers personalized linked functions such as simplified operation, fingerprint payment, and valet mode.
Case 2: BYD's palm vein recognition system uses a camera to read palm vein data, recognizing a palm at a distance of 8-20 cm, across 360 degrees horizontally and 15 degrees vertically. A professional image acquisition module captures images of the vein pattern, algorithms extract and store the features, and identification is then performed against the stored templates. In the future, the system may first be installed in models of the high-end Yangwang brand.
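The acquisition-extraction-matching pipeline described above follows the standard structure of a biometric system. The following sketch shows only that generic structure, with a placeholder feature extractor and a cosine-similarity threshold standing in for a real vein-recognition algorithm; it is not BYD's implementation.

```python
import numpy as np

# Generic biometric enrollment/matching sketch: acquire an image, extract a
# feature vector, store it, and later compare a new capture against stored
# templates. The feature extractor here is a stand-in, not a real vein model.

def extract_features(image: np.ndarray) -> np.ndarray:
    """Placeholder feature extractor: flatten and L2-normalize the image."""
    vec = image.astype(np.float64).ravel()
    return vec / (np.linalg.norm(vec) + 1e-12)

class PalmVeinDatabase:
    def __init__(self, threshold: float = 0.95):
        self.templates: dict[str, np.ndarray] = {}
        self.threshold = threshold  # cosine-similarity acceptance threshold

    def enroll(self, user_id: str, image: np.ndarray) -> None:
        self.templates[user_id] = extract_features(image)

    def identify(self, image: np.ndarray) -> str | None:
        probe = extract_features(image)
        best_id, best_score = None, -1.0
        for user_id, template in self.templates.items():
            score = float(np.dot(probe, template))  # cosine similarity
            if score > best_score:
                best_id, best_score = user_id, score
        return best_id if best_score >= self.threshold else None

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    owner_palm = rng.random((64, 64))
    db = PalmVeinDatabase()
    db.enroll("owner", owner_palm)
    # A capture close to the enrolled image should identify the owner.
    print(db.identify(owner_palm + rng.normal(0, 0.01, (64, 64))))
    # An unrelated palm image should be rejected.
    print(db.identify(rng.random((64, 64))))
```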
Case 3: The Exeed Sterra ET is equipped with a DHS intelligent health monitoring function. Based on an advanced visual multimodal algorithm, it analyzes health status in real time from the body surface, measuring five major physical indicators (heart rate, blood pressure, blood oxygen saturation, respiratory rate, and heart rate variability) and outputting a health report.
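Camera-based vital-sign estimation of this kind is commonly built on remote photoplethysmography (rPPG), which recovers the pulse from tiny periodic intensity changes in skin pixels. The sketch below illustrates only that general idea on a synthetic signal; the band limits, frame rate, and `estimate_heart_rate_bpm` helper are assumptions, not the ArcSoft/DHS algorithm.

```python
import numpy as np

# Simplified rPPG-style sketch: estimate heart rate from the mean intensity of
# a skin region over time by finding the dominant frequency in a plausible
# heart-rate band. Real systems add face tracking, color-space projection,
# and far more robust signal processing.

def estimate_heart_rate_bpm(skin_signal: np.ndarray, fps: float) -> float:
    """Return the dominant frequency (in beats per minute) within 0.7-3.0 Hz."""
    signal = skin_signal - skin_signal.mean()          # remove the DC component
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)  # frequency axis in Hz
    band = (freqs >= 0.7) & (freqs <= 3.0)             # roughly 42-180 bpm
    peak_freq = freqs[band][np.argmax(spectrum[band])]
    return peak_freq * 60.0

if __name__ == "__main__":
    fps, seconds, true_bpm = 30.0, 10, 72
    t = np.arange(int(fps * seconds)) / fps
    # Synthetic "skin brightness" trace: a weak pulse component plus noise.
    trace = (0.02 * np.sin(2 * np.pi * (true_bpm / 60.0) * t)
             + np.random.default_rng(1).normal(0, 0.005, t.size) + 0.5)
    print(round(estimate_heart_rate_bpm(trace, fps), 1))  # ~72.0
```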
The China Society of Automotive Engineers clearly defines and classifies intelligent cockpits in its jointly released white paper. The classification system is based on the capabilities achieved by intelligent cockpits, comprehensively considers the three dimensions of human-machine interaction capability, scenario expansion capability, and connected service capability, and subdivides intelligent cockpits into five levels from L0 to L4.
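As a structure-only illustration, the five levels and three dimensions named above could be encoded as follows; the dimension names and level labels come from the text, while the scoring scale and the rule for deriving a level are invented placeholders, not the white paper's criteria.

```python
from dataclasses import dataclass
from enum import Enum

# Structure-only sketch of the classification: five levels (L0-L4) assessed
# across three capability dimensions. The scores and the level-derivation rule
# below are placeholders, not the white paper's actual criteria.

class CockpitLevel(Enum):
    L0 = 0
    L1 = 1  # perceptual intelligence
    L2 = 2  # cognitive intelligence
    L3 = 3
    L4 = 4

@dataclass
class CapabilityProfile:
    human_machine_interaction: int  # 0-4, hypothetical score
    scenario_expansion: int         # 0-4, hypothetical score
    connected_services: int         # 0-4, hypothetical score

    def level(self) -> CockpitLevel:
        # Placeholder rule: the overall level is capped by the weakest dimension.
        return CockpitLevel(min(self.human_machine_interaction,
                                self.scenario_expansion,
                                self.connected_services))

if __name__ == "__main__":
    print(CapabilityProfile(2, 2, 1).level())  # CockpitLevel.L1
```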
With the wide adoption of AI foundation models in intelligent cockpits, HMI capabilities have crossed the boundary of L1 perceptual intelligence and entered a new stage of L2 cognitive intelligence.
Specifically, in the perceptual intelligence stage, the intelligent cockpit relies mainly on in-cabin sensors such as cameras, microphones, and touch screens to capture and identify the behavior, voice, and gesture information of the driver and passengers, and then converts that information into machine-recognizable data. However, limited by preset rules and algorithm frameworks, the cockpit interaction system at this stage still lacks the capability for independent decision-making and self-optimization, and mainly responds passively to input information.
After entering the cognitive intelligence stage, intelligent cockpits can comprehensively analyze multiple data types such as voice, vision, and touch by virtue of the powerful multimodal processing capabilities of foundation model technology. This makes intelligent cockpits highly intelligent and humanized: they can actively think and serve, keenly perceive the actual needs of the driver and passengers, and provide users with personalized HMI services.
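The contrast between the two stages can be caricatured in a few lines of Python: a fixed rule table that only reacts to explicit commands versus a stub standing in for a foundation model that reads cabin context and proposes actions. All rules, state fields, and the `foundation_model_plan` stub are invented for illustration.

```python
# Conceptual contrast between the two stages described above. All rules,
# signals, and the foundation-model stub are invented for illustration.

def perceptual_stage(command: str) -> str:
    """L1-style behavior: react only to explicit input, via fixed rules."""
    rules = {"turn on ac": "ac_on", "play music": "music_play"}
    return rules.get(command, "not_understood")

def foundation_model_plan(cabin_state: dict) -> list[str]:
    """Stand-in for a multimodal foundation model's reasoning step."""
    actions: list[str] = []
    if cabin_state.get("rear_passenger") == "sleeping":
        actions += ["raise_ac_temperature", "lower_music_volume",
                    "set_chassis_mode:comfort"]
    return actions

def cognitive_stage(cabin_state: dict) -> list[str]:
    """L2-style behavior: fuse multimodal context and act proactively."""
    return foundation_model_plan(cabin_state)

if __name__ == "__main__":
    print(perceptual_stage("turn on ac"))
    print(cognitive_stage({"rear_passenger": "sleeping", "speed_kph": 80}))
```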
Case 1: SenseAuto introduced an intelligent cockpit AI foundation model product, A New Member For U, at the 2024 SenseAuto AI DAY. It can be regarded as a "Jarvis" on the vehicle that weighs up occupants' words, observes their expressions, and actively thinks, serves, and plans. For example, on the road it can raise the air conditioner temperature and lower the music volume for a child sleeping in the rear seat, and switch the chassis and driving mode to comfort to create a better sleeping environment. It can also actively detect occupants' physical condition, find the nearest hospital for a sick occupant, and plan the route there.
Case 2: NOMI Agents, NIO's multi-agent framework, uses AI foundation models to reconstruct NOMI's cognition and complex task processing capabilities, allowing it to learn to use tools, for example calling search, navigation, and reservation services. Depending on the complexity and time span of a task, NOMI can also perform complex planning and scheduling. For example, among NOMI's six core multi-agent functions, "NOMI DJ" recommends a playlist that suits the context based on users' needs and actively creates an atmosphere, while "NOMI Exploration" combines spatial orientation with map data and world knowledge to answer children's questions, for example, "what is the tower on the side?"
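Tool use of this kind is typically implemented by registering tools by name and letting the model decide which one to call. The sketch below shows such a registry and dispatcher with a hard-coded keyword router standing in for the model; the tool names mirror those mentioned above, but everything else is an assumption, not NIO's framework.

```python
from typing import Callable

# Generic tool-calling sketch: tools are registered by name, and a router
# (standing in for the foundation model) decides which tool to invoke. The
# tool names mirror those mentioned in the text; everything else is invented.

TOOLS: dict[str, Callable[[str], str]] = {}

def register(name: str):
    """Decorator that adds a tool function to the registry under `name`."""
    def wrap(fn: Callable[[str], str]) -> Callable[[str], str]:
        TOOLS[name] = fn
        return fn
    return wrap

@register("search")
def search(query: str) -> str:
    return f"search results for '{query}'"

@register("navigation")
def navigation(destination: str) -> str:
    return f"route planned to '{destination}'"

@register("reservation")
def reservation(details: str) -> str:
    return f"reservation requested: '{details}'"

def route_request(user_utterance: str) -> str:
    """Toy keyword router standing in for the model's tool-selection step."""
    if "navigate" in user_utterance or "take me" in user_utterance:
        return TOOLS["navigation"](user_utterance)
    if "book" in user_utterance or "reserve" in user_utterance:
        return TOOLS["reservation"](user_utterance)
    return TOOLS["search"](user_utterance)

if __name__ == "__main__":
    print(route_request("take me to the nearest charging station"))
    print(route_request("what is the tower on the side?"))
```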