
For the past few years, artificial intelligence capability has been on a seemingly one-way trajectory toward scale. The prevailing narrative has focused on bigger models, more compute power, and exponentially higher costs.
The assumption baked into most enterprise AI adoption strategies has been simple: powerful equals expensive and cloud-dependent. But that assumption is breaking rapidly, and the implications for the project delivery sector are profound.

The recent releases of Alibaba's Qwen 3.5 small model series [1] and Google's Gemini 3.1 Flash-Lite [2] represent two simultaneous compressions happening at opposite ends of the technology stack. One pushes robust intelligence down to the edge device. The other makes cloud inference cheap enough to run at immense volume without rationing.
In our view, these aren't just incremental efficiency wins; they are the catalyst for a fundamental shift in how we capture, process, and manage project data.
The Race to the Bottom (in a Good Way)
Let's look closely at the numbers, because the economics of AI are changing faster than the capabilities. Google's Gemini 3.1 Flash-Lite is priced at one-eighth the cost of Gemini 3.1 Pro and one-quarter the cost of Anthropic's Haiku [2].
Yet despite this drastic price reduction, it posted a 12-point jump on the Artificial Analysis Intelligence Index over its predecessor [2]. It is explicitly designed for high-volume data processing, routing, and translation workloads.
Meanwhile, Alibaba's open-weight Qwen 3.5 series offers models as small as 0.8 billion parameters that can run locally on everyday hardware, while still beating older, massive 200-billion-parameter models in per-token efficiency [1].
"The new system improves both performance and cost efficiency. Qwen-3.5 was designed with 'native multimodal capabilities,' meaning it can process text, images, and video inside a single model rather than requiring separate tools." [4] — Alibaba Cloud
What we are witnessing is the rapid commoditisation of baseline intelligence. The real battleground in AI competition is no longer just about who can build the smartest, most capable model for complex reasoning, but who can build the fastest and cheapest one for everyday utility.
Why Construction Specifically Needs This
Project delivery sits at an awkward intersection of data generation and data utility. We generate enormous volumes of unstructured data daily: requests for information (RFIs), meeting minutes, variation orders, daily site logs, cost reports, and complex submittals. Historically, this data has been too expensive and too sensitive to process comprehensively with AI.
The sensitivity piece—which includes commercial data, subcontractor pricing, and preliminary estimates—has kept many firms highly cautious about uploading their data to public cloud models. The cost piece has meant that when cloud-processed AI is used, it gets applied selectively rather than systematically.
You might use a large language model to draft a single difficult contractual email or summarise a specific report, but you wouldn't use it to continuously monitor and cross-reference every single communication across a £500 million project. It would simply cost too much.
Both of these blockers are weakening simultaneously. The Qwen 3.5 small models mean that highly sensitive data can be processed locally, entirely on-device or on a secure local server, without ever touching an external network [1]. The Gemini 3.1 Flash-Lite pricing means that non-sensitive, high-volume data can be processed in the cloud at a scale that was previously cost-prohibitive [2].
The Real-Time Operating Layer
When edge inference is powerful enough and cloud inference is cheap enough, the industry hits a critical threshold. We reach a point where all project data becomes processable.
Not sampled. Not summarised after the fact for a monthly report. Captured, analysed, and acted upon continuously.
The forward implication isn't just about administrative efficiency. It is that project intelligence stops being a retrospective reporting function and becomes a real-time operating layer. Consider these immediate use cases:
Instant RFI cross-referencing: Every RFI is instantly checked against the entire project history, local building codes, and previous similar projects, with the analysis happening locally on a site manager's laptop.
Automated progress tracking: Thousands of daily site photos are processed through a cheap cloud model to automatically update progress schedules and verify materials delivered against the BIM model (see the sketch after this list).
Predictive safety monitoring: Visual data is analysed continuously to flag potential safety hazards before they become incidents.
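To make that second use case concrete, here is a minimal sketch of a photo-assessment loop, assuming Google's google-genai Python SDK; the model ID, prompt, and folder layout are placeholders rather than a production pipeline.

```python
# Minimal sketch: batch-assess daily site photos with a cheap cloud model.
# Assumes the google-genai SDK (pip install google-genai); the model ID below
# is a placeholder for whatever lightweight multimodal model you actually use.
from pathlib import Path

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

PROMPT = (
    "You are reviewing a construction site photo. "
    "List the visible work packages and estimate percent complete for each."
)

def assess_photo(photo_path: Path) -> str:
    """Send one site photo to the cloud model and return its assessment."""
    image_part = types.Part.from_bytes(
        data=photo_path.read_bytes(), mime_type="image/jpeg"
    )
    response = client.models.generate_content(
        model="gemini-flash-lite",  # placeholder; check current model names
        contents=[image_part, PROMPT],
    )
    return response.text

for photo in sorted(Path("site_photos/2025-01-15").glob("*.jpg")):
    print(photo.name, "->", assess_photo(photo))
```

At Flash-Lite-class pricing, a loop like this over thousands of photos a day is an operating expense measured in pennies, which is precisely why the use case only now becomes viable.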
This isn't science fiction or a five-year projection. The models capable of doing this efficiently and cheaply exist today. The barrier is no longer technological; it is organisational and structural.
The Strategic Pivot: The NEOM Example
We are already seeing physical infrastructure adapt to this new digital reality. Consider NEOM's 'The Line' in Saudi Arabia. Originally pitched as a 170-kilometre sci-fi megacity designed to house nine million people, it is now being drastically scaled back and repurposed as an AI data centre hub [3].
This strategic pivot reflects the growing global trend of AI infrastructure development taking precedence over traditional real estate. The world is reallocating massive amounts of capital from traditional mega-projects to the digital infrastructure required to support this intelligence layer.
For project delivery professionals, the lesson is clear: the digital tools are getting cheaper, but the physical infrastructure to support the broader AI ecosystem is becoming the new gold rush.
Actionable Insights for Project Leaders
So, how do we adapt to this compression of cost and capability? We must adjust our operational strategies accordingly.
The most powerful approach is to use a hybrid model combining both compression strategies. Use Qwen 3.5 locally for sensitive data processing and real-time site analysis. Use Gemini 3.1 Flash-Lite in the cloud for high-volume, non-sensitive data processing. This gives you the security benefits of edge computing combined with the cost efficiency of cloud processing.
For example, you might process all site photos locally using Qwen 3.5 to identify potential safety issues before they are transmitted off-site. Then, you use Flash-Lite in the cloud to process all RFIs, change orders, and communications, routing them to the appropriate team members. The combination gives you real-time safety monitoring with cost-effective administrative processing.
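A minimal sketch of that dispatch logic might look like the following, assuming a locally hosted Qwen-class model behind Ollama's REST API for the edge leg and the google-genai SDK for the cloud leg; the keyword-based sensitivity screen, endpoint URL, and model names are all placeholders for your own policy and infrastructure.

```python
# Sketch of a hybrid dispatcher: sensitive documents stay on local hardware,
# everything else goes to a cheap, high-volume cloud model. Endpoint URLs,
# model names, and the sensitivity screen are all placeholders.
import requests
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

SENSITIVE_MARKERS = ("subcontractor pricing", "preliminary estimate", "tender")

def is_sensitive(document: str) -> bool:
    """Crude keyword screen; a real policy would use metadata or a classifier."""
    text = document.lower()
    return any(marker in text for marker in SENSITIVE_MARKERS)

def run_local(document: str) -> str:
    """Send to a locally hosted small model (here, via Ollama's REST API)."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "qwen-small", "prompt": f"Summarise:\n{document}",
              "stream": False},
        timeout=120,
    )
    return resp.json()["response"]

def run_cloud(document: str) -> str:
    """Send to a cheap cloud model for high-volume, non-sensitive work."""
    result = client.models.generate_content(
        model="gemini-flash-lite",  # placeholder model ID
        contents=f"Summarise:\n{document}",
    )
    return result.text

def process(document: str) -> str:
    # Security-first routing: sensitive data never leaves the local network.
    return run_local(document) if is_sensitive(document) else run_cloud(document)
```

The design point is that the routing decision, not the model choice, is where your commercial risk lives: whatever classifier you use, it should fail towards the local path.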
The era of expensive, precious AI is ending. The era of cheap, ubiquitous, and local intelligence is here. The firms that win will be the ones that embed this cheap intelligence into every layer of their operations.
Takeaways
Stop rationing AI: If you are still treating AI queries as a scarce resource within your teams, you are using the wrong models. Look into integrating models like Gemini 3.1 Flash-Lite via API for high-volume, low-complexity tasks. Build workflows that assume intelligence is essentially free.
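As one illustration, tagging a high volume of incoming correspondence might look like the sketch below, again assuming the google-genai SDK and a placeholder Flash-Lite-class model ID.

```python
# Sketch: tag a large batch of project communications with a cheap cloud
# model so they can be routed automatically. Model ID is a placeholder.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

CATEGORIES = "RFI, variation order, safety, programme, commercial, other"

def tag_message(message: str) -> str:
    """Classify one message into a routing category."""
    response = client.models.generate_content(
        model="gemini-flash-lite",  # placeholder model ID
        contents=(
            f"Classify this project message into one of: {CATEGORIES}.\n"
            f"Reply with the category only.\n\n{message}"
        ),
    )
    return response.text.strip()

inbox = ["RFI-0412: clarify rebar spacing at grid B3",
         "Toolbox talk minutes, Level 3 slab pour"]
for message in inbox:
    print(tag_message(message), "<-", message[:50])
```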
Explore local inference: If data privacy, client confidentiality, or lack of site connectivity are your primary blockers, investigate open-weight models like Qwen 3.5. Running inference locally means sensitive data never has to leave your network, removing the risks associated with cloud transmission.
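As a low-friction starting point, a small open-weight Qwen checkpoint can be run through Hugging Face's transformers library. The sketch below uses an earlier published small Qwen instruct model as a stand-in, since exact 3.5-series checkpoint names will depend on what Alibaba releases.

```python
# Sketch: run a small open-weight Qwen model entirely on local hardware.
# Uses Hugging Face transformers; the checkpoint shown is an earlier small
# Qwen instruct model. Substitute whichever open-weight release you adopt.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-0.5B-Instruct",  # small enough for a laptop CPU
)

messages = [
    {"role": "user",
     "content": "Summarise this RFI in two sentences: ..."},
]
result = generator(messages, max_new_tokens=120)
print(result[0]["generated_text"][-1]["content"])
```

Nothing in that loop touches a network once the weights are downloaded, which is the whole point for commercially sensitive material.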
Rethink your data strategy: If processing data is practically free, your competitive advantage shifts from how you process data to what data you capture. Start capturing everything. Upgrade your site sensors, mandate comprehensive digital logging, and ensure your data lakes are structured appropriately.
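What "structured appropriately" might mean in practice: below is a sketch of a daily site log entry whose fields a model can consume directly; the field names are illustrative, not a standard schema.

```python
# Sketch: a structured daily site log entry, designed so downstream models
# can consume it directly. Field names are illustrative, not a standard.
from dataclasses import dataclass, field, asdict
from datetime import date
import json

@dataclass
class DailySiteLog:
    project_id: str
    log_date: date
    weather: str
    crews_on_site: int
    activities: list[str] = field(default_factory=list)
    photo_paths: list[str] = field(default_factory=list)  # feeds vision models
    issues: list[str] = field(default_factory=list)       # flagged for review

entry = DailySiteLog(
    project_id="PRJ-0042",
    log_date=date(2025, 1, 15),
    weather="overcast, 6C",
    crews_on_site=14,
    activities=["Level 3 slab pour", "MEP first fix, zone B"],
    issues=["Rebar delivery short by 2 tonnes"],
)
print(json.dumps(asdict(entry), default=str, indent=2))
```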
Links and Stuff
All content reflects our personal views and is not intended as professional advice or to represent any organisation.
