Yuanrong Qixing has developed the VLA model, embarking on a 'crazy adventure' in intelligent driving. From concept to realization, the team has expressed its technological conviction through perseverance, advancing intelligent driving from 'execution' to 'thought'.

On a hot afternoon in June 2024, Zhou Guang was in a test vehicle approaching a traffic light where a nondescript sign read 'vehicles turning left are not controlled by the light'; the test vehicle nevertheless stopped for the red light. That moment made him realize that a human driver understands such special scenarios instantly, while even the most advanced end-to-end models of the time struggled to comprehend written traffic signs. The problem planted a seed in his mind, and he raised it in several internal meetings with the R&D team.

Meanwhile, Yuanrong Qixing was exploring various paths toward general artificial intelligence, and a VLA prototype was born in the RoadAGI laboratory. The prototype could understand environmental semantics, enabling it to perform simple tasks such as picking up and placing items and avoiding obstacles. During a demo, Zhou noticed that this model, which makes decisions from environmental information and language commands, bore a striking resemblance to intelligent driving models that must interpret complex road situations, with one significant difference: it understood semantic information. With the rise of large language models such as ChatGPT, Zhou and his team grew increasingly certain that where current end-to-end models struggled with corner cases, a VLA that integrates language understanding could forge a new path. This was not merely a technical overlay but a way for machines to truly 'understand' the physical world. In September 2024, Yuanrong Qixing elevated the VLA model to a company-wide R&D project. In the whirlwind of technological development, timing is often everything; this was a 'tech gamble' made ahead of the industry.
Leaders do not wait for the 'wind'; they see its direction early. As an AI company, Yuanrong Qixing firmly believes that AI will reshape productivity and lead the Fourth Industrial Revolution. Among the many applications derived from AI, Yuanrong Qixing chose intelligent driving as its breakthrough point because it can break down the barrier between the digital and physical worlds. As assisted driving is deployed at scale, the foundational model Yuanrong Qixing has built will interact deeply with driving behavior and the physical world, gaining insight into its operating rules. Whether it is the 'no-map' solution, the end-to-end model, or the VLA model, Yuanrong Qixing has consistently focused on solving problems through AI. The hardest part, however, is not the technology itself but finding the right path through unmapped territory. The VLA model lets intelligent driving move from 'executor' to 'thinker': it begins to understand 'why to drive this way,' not just 'how to drive.' This is the VLA model's advantage, and where the R&D journey began.
After the new technological direction was set, excitement filled the team, which was eager to lead the industry and build a better intelligent driving solution. Once the work began, however, difficulties piled up. On one hand, VLA had seen little research or application in intelligent driving, so reference material was scarce and R&D staff had to read extensively and explore step by step; on the other hand, the company faced pressure to deliver customer mass production, so most resources went to mass-production projects, and with the new technology's effectiveness still unproven, VLA development proceeded cautiously and slowly.

'At first, we were all captivated by VLA's language talent,' recalled product manager Shi Jie. The VLA model has strong text comprehension and OCR capabilities, and the team invested considerable effort in scenarios involving tidal lanes, variable lanes, and waiting areas. When the test vehicle successfully handled the previously troublesome 'vehicles turning left are not controlled by the light' sign and produced a text explanation of its driving decision, everyone in the vehicle was thrilled: by exposing its reasoning as a chain of thought (CoT), the model addressed the 'black box' problem of current end-to-end systems and greatly strengthened user trust. The VLA model also learns vast knowledge from the internet, letting it handle many corner cases, such as recognizing overloaded small trucks or tires lying on the road; and it can take voice commands, allowing users to control the vehicle through real-time dialogue.
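The idea of pairing a driving action with an inspectable reasoning trace can be sketched as follows. This is a toy, hand-written illustration only; the real VLA model is learned, and the rule, sign strings, and `Decision` type here are invented for the example:

```python
from dataclasses import dataclass

@dataclass
class Decision:
    action: str
    reasoning: list  # chain-of-thought steps exposed to the user

def decide(signs, light):
    """Toy rule-based stand-in for emitting an action together with a
    human-readable reasoning trace (the actual model is learned,
    not hand-written rules like these)."""
    steps = [f"light is {light}", f"signs read: {signs}"]
    if light == "red" and "left turn not controlled by light" in signs:
        steps.append("sign exempts left turns from this signal")
        return Decision("proceed with left turn", steps)
    if light == "red":
        steps.append("no exemption applies; must wait")
        return Decision("stop", steps)
    return Decision("proceed", steps)

d = decide(["left turn not controlled by light"], "red")
print(d.action)                # → proceed with left turn
print("; ".join(d.reasoning))
```

The point of the sketch is the shape of the output: an action plus the steps behind it, which is what turns a 'black box' decision into one a user can audit.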
However, a perilous moment during testing shifted everyone's expectations of the VLA. One day, as the test vehicle was about to turn left under a bridge with no vehicles in sight, a delivery rider suddenly appeared, forcing an emergency stop that startled everyone inside. It prompted reflection: an experienced driver would have slowed down in advance to guard against the bridge's blind spot. The incident made clear that safety is the lifeline of assisted driving; users need a system that proactively predicts and avoids risk, which matters far more than 'voice interaction.' Advanced semantic reasoning over the whole scene is exactly what end-to-end systems lack and what the VLA does well; from that moment, 'defensive driving' became the core evolutionary direction of the VLA model. Technical breakthroughs can keep pushing limits, but safety will always be the baseline. In balancing safety, efficiency, and comfort, the team strives to make assisted driving a mode of transportation users love every day.

The R&D journey was also filled with technical challenges, which developers such as Xiao Yi explored across many fronts. Building the VLA model involves architecture design, data exploration and scaling, model validation, deployment, and continuous iteration. Initially, Xiao Yi planned cloud-based inference: deploy the large language model in the cloud and send results back to the vehicle for control. But cloud-to-vehicle latency proved too great for real-world driving. At 60 km/h, for example, a 2-second delay means the vehicle has already traveled about 33 meters by the time the cloud's result arrives, and in that time the actual road conditions have changed; in scenarios requiring a timely response, this poses a serious threat to driving safety.
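The latency arithmetic above can be verified with a one-line calculation (a minimal sketch; the 2-second round trip is the figure from the text, not a measured value):

```python
def distance_during_delay(speed_kmh: float, delay_s: float) -> float:
    """Metres the vehicle covers before a cloud inference result arrives."""
    speed_ms = speed_kmh / 3.6  # convert km/h to m/s
    return speed_ms * delay_s

# At 60 km/h with a 2 s cloud round trip, the vehicle travels about
# 33 m before the decision arrives -- far too late to react.
print(round(distance_during_delay(60, 2), 1))  # → 33.3
```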
A month later, the development team abandoned cloud-based inference in favor of deploying the model locally. Given the vehicle's limited computing power, local deployment posed new challenges for model design, acceleration, and deployment optimization. The R&D team compressed the vocabulary and applied pruning and acceleration to the model, while Yuanrong Qixing's inference engine team carried out extensive operator optimizations, memory optimizations, and hardware-specific adaptations, ultimately allowing the VLA to run smoothly on the vehicle.
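One of the compression steps mentioned, shrinking the vocabulary, can be sketched as keeping only the tokens that actually occur in the driving corpus, since a smaller vocabulary means a smaller embedding table on the vehicle. This is a hypothetical illustration; the tokens, counts, and function are invented, and Yuanrong Qixing's actual method is not public:

```python
from collections import Counter

def prune_vocabulary(corpus_tokens, full_vocab, keep_top_k):
    """Keep only the most frequent tokens seen in the corpus; all
    other tokens map to a shared <unk> id, shrinking the embedding
    table the on-vehicle model must store."""
    counts = Counter(t for t in corpus_tokens if t in full_vocab)
    kept = [tok for tok, _ in counts.most_common(keep_top_k)]
    vocab = {"<unk>": 0}
    for tok in kept:
        vocab[tok] = len(vocab)
    return vocab

corpus = ["stop", "red", "light", "left", "turn", "stop", "red", "stop"]
vocab = prune_vocabulary(corpus, set(corpus) | {"banana"}, keep_top_k=3)
print(vocab)  # "stop" is most frequent, so it gets the first non-<unk> id
```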
A larger challenge still lay ahead. Data is the foundation of every AI model, and high-quality, large-scale data is crucial for the VLA model. Manual labeling alone was far too slow, so Xiao Yi adopted an approach of iteratively improving a large model to automate labeling, solving the problem of scaling data annotation; Yuanrong Qixing's dataset has now reached tens of millions of clips. 'The industry is now chasing large models, but what is truly scarce is an understanding of the essence of driving,' Zhou Guang often tells the team. While the industry busies itself loading ever more corpora into its systems, Yuanrong Qixing's VLA is learning deeply 'how to make the safest decisions in imperfect human driving environments'. This is the soul of the AI driver.
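The iterative auto-labeling idea, using a large model to label clips and keeping only the labels it is confident about, can be sketched roughly like this. All names and thresholds here are assumptions for illustration; the real pipeline is not described in detail:

```python
def auto_label_loop(unlabeled_clips, label_fn, confidence_threshold=0.9, rounds=3):
    """Iteratively label clips with a model, keeping high-confidence
    labels; low-confidence clips return to the pool (or go to human
    review) for the next, hopefully improved, labeling round."""
    labeled = []
    pool = list(unlabeled_clips)
    for _ in range(rounds):
        still_unsure = []
        for clip in pool:
            label, confidence = label_fn(clip)
            if confidence >= confidence_threshold:
                labeled.append((clip, label))
            else:
                still_unsure.append(clip)
        pool = still_unsure  # re-labeled by the improved model next round
        if not pool:
            break
    return labeled, pool  # auto-labeled clips, plus residue for humans

# Toy labeler: even-numbered clips get confident labels, odd ones do not.
labeled, residue = auto_label_loop(
    range(6), lambda c: ("safe", 0.95 if c % 2 == 0 else 0.5))
print(len(labeled), len(residue))  # → 3 3
```

The design choice the sketch captures is that automation and human review are complementary: the model clears the easy bulk, and only the uncertain residue needs expensive manual attention, which is what makes tens of millions of clips tractable.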
This year, more than five vehicle models equipped with Yuanrong Qixing's VLA model will go into mass production, with the first hitting the road in August. 'For the VLA, I hope it can be applied to Robotaxi and become a genuine AI driver, one users can talk to directly in a quiet cabin. It will not only respond to commands but also proactively guard their safety,' Zhou Guang said. As the VLA model iterates and spreads, we expect it to become not only users' 'AI driver' but also a force pushing the whole industry toward a safer, more transparent era of intelligent driving, making every journey more secure and comfortable. The development of the VLA is a microcosm of Yuanrong Qixing's technological faith: refusing to be a follower, choosing instead to be a definer. The path is hard, but worthwhile. Navigating the turbulent waves of the industry, Yuanrong Qixing understands that the VLA is only a waypoint on the journey. Only by staying grounded in technological R&D can it ride out the storms and steer toward the depths of human wisdom. In the future, Yuanrong Qixing will not limit itself to automotive applications; it is committed to training more advanced AI models that empower all kinds of intelligent agents to achieve point-to-point mobility, reaching RoadAGI (Road General Artificial Intelligence) first and ultimately advancing toward general artificial intelligence, igniting a transformative singularity in human productivity.
Yuanrong Qixing Develops VLA Model for Intelligent Driving Adventures
