
by Miranda Alldritt
Discussions about estimating models often begin with data quality. The implicit assumption is that better data is the prerequisite for better estimates, and that most modelling limitations are temporary artefacts of incomplete information. In practice, this framing is misleading. Construction and infrastructure estimating has always operated under conditions of partial, inconsistent, and noisy information, and there is little reason to expect that reality to change in any fundamental way.
The more useful question is not how to achieve perfect data, but how to design estimating models that remain defensible, transparent, and practically useful when data is imperfect, which is to say almost always.
This article focuses on what that means in practical terms, and how experienced estimators can think about model design, validation, and use when uncertainty is not a flaw in the system but a defining feature of it.
Imperfection Is Structural, Not Transitional
It is tempting to treat imperfect data as a temporary condition, something that will improve once systems are standardised, classifications are aligned, or reporting discipline increases. While incremental improvements are certainly possible, many sources of imperfection are structural rather than accidental, and will persist regardless of how much effort is invested in data management.
Project data is created to support delivery and commercial control, not future estimation. Scope definitions evolve as designs mature and practical constraints emerge. Cost codes change to reflect procurement strategy or contractual structure. Work is re-sequenced, repackaged, or accelerated in response to site conditions, resource availability, or client priorities. These changes are rational in context, but they leave behind data that is difficult to interpret analytically without deep contextual knowledge that is rarely captured in structured form.
Other sources of imperfection are inherent to the nature of construction and infrastructure delivery. Many asset types are delivered infrequently, particularly for major capital works or specialised infrastructure. Novel designs exist precisely because historical analogues are insufficient, leaving estimators with limited comparable data by definition. Market conditions, regulation, and supply chains introduce variability that is not repeatable or predictable in ways that can be captured cleanly in historical cost databases.
Expecting clean, complete, and stable datasets in this environment is unrealistic. The discipline lies in acknowledging that reality explicitly and designing models that function appropriately within it, rather than waiting for conditions that will not arrive.
What "Imperfect Data" Actually Looks Like in Estimating
In estimating contexts, data imperfection tends to appear in a small number of recurring forms, each with different implications for modelling and each requiring different analytical responses.
Sparse data arises when there are few comparable projects for a given asset type, scope configuration, or delivery method. An organisation might have delivered 200 projects over a decade, but when filtered for comparability (similar asset type, delivery model, scale, and context), the relevant sample might shrink to 8 or 12 projects. This is common in major capital works, specialised infrastructure, or emerging technologies where the portfolio velocity is inherently low.
Incomplete data occurs when key cost drivers were never captured or were recorded inconsistently across projects. This is rarely random; attributes tend to be missing because they were not required for delivery or financial reporting at the time. Site constraint classifications, ground condition assessments, productivity assumptions, or risk allowance rationales may be absent or exist only in narrative form that resists systematic analysis.
Noisy data reflects variability introduced by one-off events that are orthogonal to the underlying cost drivers that matter for future estimates. Weather impacts, mid-project design changes, learning effects on first-of-kind work, accounting artefacts from cost code restructuring, or commercial decisions made for reasons unrelated to technical scope all introduce variation that obscures the patterns estimators need to identify.
Heterogeneous data combines projects that appear similar at a high level but differ materially in ways that are only partially recorded. A portfolio of "hospital projects" might include new builds, refurbishments, operational expansions, and seismic upgrades, each with fundamentally different cost drivers but all classified under the same asset category in the database.
Each of these conditions violates classical modelling assumptions in different ways. The mistake is not encountering them, which is inevitable, but pretending they can be eliminated through data cleaning alone or deferring analytical work indefinitely while waiting for conditions to improve.
The Limits of Data Cleaning as a Strategy
Data cleaning is necessary, but it is not sufficient. No amount of cleaning can recover information that was never captured in the first place, nor can it fully reconcile fundamentally different scope interpretations or delivery contexts after the fact without introducing estimator judgement that should be made explicit rather than embedded in preprocessing decisions.
More importantly, an excessive focus on cleaning can delay learning indefinitely. Estimators often have enough information to extract directional insight long before datasets are pristine. The question is whether the modelling approach can tolerate ambiguity without producing misleading confidence or spurious precision that obscures genuine uncertainty.
Effective cleaning focuses on alignment and coherence, not perfection. Are scope definitions broadly comparable such that differences in scale and complexity can be reasonably attributed to measured drivers? Are costs mapped consistently to estimating drivers rather than to financial reporting categories that shift over time? Are outliers understood in terms of what made them anomalous, rather than simply removed because they are statistically distant? Beyond these alignment tasks, the burden shifts from cleaning data to designing models that behave sensibly when information is incomplete or uncertain.
Shifting the Modelling Objective: Robustness Over Precision
When data is imperfect, the objective of modelling must change fundamentally. Instead of optimising point accuracy, which may not be achievable or even meaningful given data limitations, the goal becomes robustness: models that produce stable, interpretable behaviour across a range of plausible inputs and that degrade gracefully when pushed beyond their intended domain rather than producing confident but unsupportable predictions.
From an estimating perspective, robust models share several characteristics that align with how experienced estimators reason about early-stage costs. They favour simpler relationships unless additional complexity is clearly justified by substantial improvements in explanatory power. They expose sensitivity to key assumptions rather than hiding it within model internals. They support ranges and scenarios rather than single-point outputs. They make uncertainty explicit through confidence intervals or prediction bounds rather than treating it as an implicit qualifier that gets mentioned in narrative but not quantified.
This orientation may differ from how analytical models are sometimes evaluated in academic or research contexts, where point accuracy metrics dominate, but it aligns far more closely with what decision-makers actually need at early project stages: a structured understanding of cost behaviour and its sensitivity to key drivers, not a false sense of precision.
Classical Parametric Approaches and Where They Break Down
Classical parametric techniques remain the foundation of most estimating models for good reason. Regression-based cost estimating relationships, scaling laws derived from engineering principles, and rules-based adjustments grounded in experience are transparent, explainable, and anchored in engineering intuition. They make assumptions visible in ways that support challenge and refinement, and they produce results that estimators can explain to project teams and decision-makers without requiring statistical expertise to interpret.
Their limitations emerge specifically as data quality degrades in the ways described above. Small samples produce unstable regression coefficients that shift materially when individual projects are added or removed from the dataset. Correlated drivers (asset scale and complexity often move together, for instance) undermine the interpretability of individual coefficients and make it difficult to isolate the effect of specific factors. Outliers, which may represent genuine edge cases or may be data artefacts, exert disproportionate influence on fitted relationships and can dominate model behaviour in ways that are not obvious without careful diagnostic analysis.
Estimators often compensate for these limitations by manually adjusting model outputs based on professional judgement, which is entirely appropriate but reintroduces subjectivity without making it explicit, testable, or improvable over time. Recognising these failure modes is important because it clarifies where additional analytical support is genuinely helpful rather than merely adding sophistication without corresponding substance.
Why Modern Analytical Techniques Help with Imperfection
Modern statistical and machine learning techniques are not valuable because they "fix" bad data or eliminate the need for judgement. They are valuable because they are explicitly designed to operate under conditions of uncertainty, sparsity, and noise, and because they formalise responses to data imperfection that experienced estimators already employ informally.
Regularisation techniques (ridge regression, LASSO, elastic net) explicitly penalise model complexity, discouraging the fitting of relationships that are not strongly supported by the available data. In practical terms, regularisation prevents models from chasing apparent patterns that are actually noise or artefacts of small samples. The result is more conservative, generalisable behaviour that is often exactly what is needed in early-stage estimating where extrapolation beyond historical experience is inevitable. The choice between regularisation variants depends on objectives: LASSO performs variable selection and can zero out weak predictors entirely, ridge regression shrinks coefficients without eliminating variables, and elastic net provides a middle ground that is often most appropriate for estimating applications where drivers are correlated but all contribute some information.
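To make this concrete, here is a minimal sketch of regularised estimation on a small synthetic portfolio using scikit-learn's ElasticNetCV. The driver names and data are hypothetical; the point is only to show how cross-validated penalties shrink weakly supported coefficients rather than fitting them to noise.

```python
# A minimal sketch of regularised regression for a sparse estimating dataset.
# The "projects" here are synthetic and illustrative only.
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# 12 comparable projects, 4 candidate drivers (e.g. capacity, complexity
# score, site constraint index, programme duration) -- hypothetical names.
n_projects, n_drivers = 12, 4
X = rng.normal(size=(n_projects, n_drivers))
# True cost depends strongly on only two drivers; the rest is noise.
log_cost = 1.5 * X[:, 0] + 0.6 * X[:, 1] + rng.normal(scale=0.3, size=n_projects)

# Elastic net blends LASSO (variable selection) with ridge (shrinkage).
# Cross-validated penalty selection guards against chasing sample noise.
model = make_pipeline(
    StandardScaler(),
    ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], cv=4, random_state=0),
)
model.fit(X, log_cost)

enet = model.named_steps["elasticnetcv"]
print("Selected penalty (alpha):", round(enet.alpha_, 3))
print("Shrunken coefficients:", np.round(enet.coef_, 2))
# Weakly supported drivers are pulled toward zero rather than fitted to noise.
```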
Bayesian methods formalise the role of prior knowledge in estimation, allowing experienced judgement to be incorporated explicitly rather than applied as post-hoc adjustments. Instead of treating historical data as the sole source of truth, which is problematic when sample sizes are small, Bayesian approaches encode expectations about cost behaviour (scaling relationships should fall within engineering norms, unit rates should align with market intelligence, productivity cannot exceed physical limits) and allow those expectations to be updated as empirical evidence accumulates. When an organisation has strong domain expertise but limited recent project completions, perhaps five pump stations delivered over the past four years compared to decades of accumulated knowledge, Bayesian updating allows that accumulated knowledge to be weighted appropriately rather than being overridden by five recent observations. As additional projects complete, the model's beliefs shift gradually and transparently toward empirical evidence while maintaining continuity with prior understanding.
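The pump station scenario can be sketched with a simple conjugate normal model, where a prior belief about the cost-capacity scaling exponent is updated with a handful of noisy observations. The prior, the observations, and the noise level below are all illustrative assumptions, not real benchmarks.

```python
# A minimal sketch of Bayesian updating for a cost-capacity scaling exponent,
# assuming a normal prior and normal observation noise (conjugate update).
import numpy as np

# Prior from accumulated engineering knowledge: exponent ~ N(0.65, 0.05^2).
prior_mean, prior_sd = 0.65, 0.05

# Exponents implied by five recent pump station projects, each treated as
# a noisy observation (assumed observation noise sd of 0.15 per project).
observations = np.array([0.58, 0.81, 0.72, 0.49, 0.77])
obs_sd = 0.15

# Standard conjugate normal update: a precision-weighted average of
# prior belief and empirical evidence.
prior_prec = 1 / prior_sd**2
data_prec = len(observations) / obs_sd**2
post_prec = prior_prec + data_prec
post_mean = (prior_prec * prior_mean + data_prec * observations.mean()) / post_prec
post_sd = np.sqrt(1 / post_prec)

print(f"Posterior exponent: {post_mean:.3f} +/- {post_sd:.3f}")
# With only five noisy observations, the posterior stays close to the prior;
# as projects accumulate, data_prec grows and the evidence dominates.
```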
Hierarchical models support learning at multiple levels simultaneously, which mirrors how estimators actually think about knowledge transfer across contexts. A portfolio might include treatment plants, pump stations, and reservoirs. At the highest level, general cost-capacity relationships apply across all water infrastructure. At an intermediate level, mechanical complexity affects pump stations differently than treatment plants. At the project level, site-specific factors introduce variation around these general patterns. Hierarchical models learn these relationships jointly rather than treating each project as isolated or forcing all projects into a single undifferentiated model, allowing robust general patterns to inform estimates for contexts with limited specific data.
Machine learning models enable exploration of higher-dimensional driver spaces and identification of interaction effects (how ground conditions affect productivity differently depending on access constraints, for instance) that are difficult to specify manually in advance. When combined with regularisation to prevent overfitting, these techniques can extend parametric thinking by identifying non-linear relationships and threshold effects that classical approaches would miss. However, dimensionality should scale with sample size; adding more potential drivers when data is already sparse makes overfitting worse rather than better, so the value of higher-dimensional exploration depends critically on having sufficient data density.
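As an illustration, the sketch below fits a deliberately shallow tree-based model to synthetic data in which ground difficulty hurts cost more on access-constrained sites, then uses two-way partial dependence to surface that interaction. The variable names and data are hypothetical.

```python
# A minimal sketch of surfacing an interaction effect with a tree-based model.
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.inspection import partial_dependence

rng = np.random.default_rng(7)
n = 80
ground_difficulty = rng.uniform(0, 1, n)      # 0 = easy, 1 = hard
access_constraint = rng.integers(0, 2, n)     # 0 = open site, 1 = constrained
# True cost: ground conditions matter twice as much on constrained sites.
cost = (1.0 + ground_difficulty * (1 + access_constraint)
        + rng.normal(scale=0.15, size=n))

X = np.column_stack([ground_difficulty, access_constraint])
# Shallow trees and few iterations keep complexity proportional to the data.
model = HistGradientBoostingRegressor(max_depth=2, max_iter=50).fit(X, cost)

# Two-way partial dependence exposes how the ground-condition effect
# changes with access constraints.
pd_result = partial_dependence(model, X, features=[(0, 1)], grid_resolution=5)
print(pd_result["average"].shape)  # (outputs, grid over driver 0, grid over driver 1)
```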
The common thread across these techniques is not prediction accuracy per se, but stabilisation: producing model behaviour that is less sensitive to the quirks, gaps, and noise inherent in small or messy datasets, and that degrades more gracefully when applied to contexts that differ from historical experience.
Working Productively with Partial Information
One of the most practically useful shifts enabled by modern analytical techniques is the ability to learn from partial data without requiring every project to contribute information to every relationship. Projects can inform the model where they are comparable and well-documented, and can be excluded from specific analyses where they are not, without being discarded entirely from the learning process.
Consider a utility organisation with historical data across multiple pipeline projects. Records might show good consistency for pipe diameter, length, material type, and installation method across the full portfolio, but inconsistent or missing information for ground conditions (recorded narratively in some projects, categorically in others, or not at all in early projects), traffic management approach (not consistently captured until regulatory requirements changed), or utility conflicts (documented when encountered but not when absent). Rather than waiting for ground condition data to be complete or excluding all projects with missing attributes, models can learn robust relationships for diameter, length, material, and installation method with high confidence while treating ground conditions, traffic management, and conflicts probabilistically based on the subset of projects where this information is available.
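One practical way to implement this, sketched below, is scikit-learn's HistGradientBoostingRegressor, which accepts missing values natively and routes them down a learned default branch rather than requiring placeholder values. The pipeline attributes and data are hypothetical.

```python
# A minimal sketch of learning from partial records: rows with missing ground
# data still inform the diameter/length/method relationships.
import numpy as np
import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(3)
n = 60
df = pd.DataFrame({
    "diameter_mm": rng.choice([300, 450, 600, 900], n),
    "length_m": rng.uniform(200, 2000, n),
    "open_cut": rng.integers(0, 2, n),        # installation method
    "ground_score": rng.uniform(0, 1, n),     # inconsistently recorded driver
})
# Ground conditions were only captured on later projects: mark the earlier
# rows missing rather than inventing placeholder values.
df.loc[: n // 2, "ground_score"] = np.nan

# Synthetic "true" cost, with a latent ground-condition effect.
cost = (0.05 * df["diameter_mm"] * df["length_m"] / 1000
        + 20 * df["ground_score"].fillna(0.5)
        + rng.normal(scale=5, size=n))

model = HistGradientBoostingRegressor(max_depth=3).fit(df, cost)
print("In-sample R^2:", round(model.score(df, cost), 2))
```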
This approach is more honest than forcing incomplete attributes into precise assumptions, and it aligns closely with how estimators reason informally when making judgements about projects where some information is known with confidence and other aspects remain uncertain. The difference is that this reasoning becomes systematic, repeatable, and improvable over time rather than remaining tacit knowledge that varies between estimators.
A More Concrete Estimating Example
To make these concepts more tangible, consider a water utility organisation delivering a portfolio that includes both treatment plants and pump stations. Over the past fifteen years, the organisation has delivered 45 treatment plants spanning a range of capacities, treatment technologies, and site contexts, providing relatively rich data for understanding treatment plant cost behaviour. During the same period, the organisation has delivered only 6 pump stations, creating a sparse data problem for pump station estimation.
Treating pump stations as entirely independent from treatment plants and attempting to build a pump station cost model from 6 observations alone would produce highly unstable results. Regression coefficients would be dominated by whichever projects happened to be in the sample. Adding or removing a single project could shift predictions materially. Confidence in the model would be appropriately low, potentially leading estimators to rely entirely on judgement rather than structured analysis.
A hierarchical modelling approach instead recognises that treatment plants and pump stations share some common characteristics (they are both water infrastructure with capacity-dependent costs, they both involve civil, mechanical, electrical, and control systems, they both exhibit economies of scale) while differing in specific ways (pump stations are mechanically simpler but more dependent on site configuration and automation requirements, treatment plants have more complex process equipment but more standardised civil works). The model learns general cost-capacity scaling relationships from the full portfolio of 51 projects, then allows pump station-specific adjustments based on the 6 pump station observations where mechanical complexity and automation emerge as particularly important drivers.
The result is not precision, which would be inappropriate given limited pump station data, but improved stability and defensibility compared to either treating the 6 pump stations in isolation or ignoring the distinction between asset types entirely. Initial pump station estimates carry appropriately wide uncertainty bounds that reflect limited specific evidence, but those bounds are narrower and more structured than they would be with pure judgement alone. As additional pump stations are delivered, the model updates its understanding of pump station-specific cost behaviour while retaining the general infrastructure knowledge learned from the broader portfolio.
This example illustrates several key principles: learning can occur at multiple levels of generality, scarce data in one domain can be informed by richer data in related domains, and uncertainty should be explicit and proportional to the evidence supporting each relationship.
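The partial-pooling idea behind this example can be sketched with a simple precision-weighted shrinkage of per-type means toward the portfolio mean. A full hierarchical model (in PyMC or similar) would estimate these quantities jointly; the version below is a deliberately simplified stand-in, and all numbers are synthetic.

```python
# A minimal sketch of partial pooling across asset types: small groups are
# shrunk more strongly toward the portfolio-level estimate.
import numpy as np

rng = np.random.default_rng(11)
# Log unit costs for 45 treatment plants and 6 pump stations (synthetic).
treatment = rng.normal(loc=8.0, scale=0.4, size=45)
pump = rng.normal(loc=7.6, scale=0.4, size=6)

def shrink(group, portfolio_mean, between_var, within_var):
    """Shrink a group mean toward the portfolio mean; small groups shrink more."""
    n = len(group)
    weight = between_var / (between_var + within_var / n)
    return weight * group.mean() + (1 - weight) * portfolio_mean

all_obs = np.concatenate([treatment, pump])
portfolio_mean = all_obs.mean()
within_var = 0.4**2     # assumed project-to-project variance
between_var = 0.2**2    # assumed variance between asset types

for name, grp in [("treatment plants", treatment), ("pump stations", pump)]:
    pooled = shrink(grp, portfolio_mean, between_var, within_var)
    print(f"{name}: raw mean {grp.mean():.2f}, pooled estimate {pooled:.2f}")
# The sparse pump station mean is pulled noticeably toward the portfolio;
# the well-supported treatment plant mean moves far less.
```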
Embedding Domain Constraints: What This Looks Like in Practice
One of the risks of advanced analytical techniques, particularly machine learning approaches, is that they can produce results that are mathematically plausible but practically nonsensical, fitting apparent patterns in data that violate physical reality or engineering experience. This risk increases as data quality deteriorates because noise and artefacts provide more opportunities for models to learn spurious relationships.
Embedding domain constraints is therefore essential for working productively with imperfect data. Estimators define which relationships are permissible based on engineering reality, where monotonicity should hold, and what ranges are physically or commercially plausible. Models operate within those bounds, allowing analytical techniques to calibrate relationships while respecting practical limits.
Concrete examples of domain constraints that might be encoded include:
Cost must increase monotonically with asset capacity. While economies of scale mean unit cost typically decreases as capacity increases, absolute cost should never decrease. A model that predicts lower total cost for a larger asset has learned a spurious relationship from noise in the data.
Productivity rates cannot exceed theoretical maxima. If analysis suggests that crews could install 200 linear metres of pipe per day when physical handling, welding, and testing requirements limit maximum practical productivity to 80 metres per day, the model is extrapolating inappropriately or has confused correlation with causation.
Unit rates must fall within bounds informed by market intelligence. Concrete costs estimated at $10 or $1,000 per cubic metre are both implausible for typical Canadian construction markets, suggesting either data errors or model misspecification, regardless of what regression analysis might indicate.
Scaling relationships should conform to engineering expectations. Pump station costs might reasonably scale with capacity according to a power law with exponent between 0.5 and 0.8, reflecting economies of scale balanced against increased complexity. Exponents outside this range should trigger investigation rather than being accepted at face value.
These constraints are not arbitrary restrictions on model flexibility. They represent accumulated professional knowledge about how construction costs actually behave, knowledge that should not be discarded simply because a dataset happens to exhibit patterns that contradict it. The most effective implementations establish these constraints collaboratively between estimators who understand cost behaviour and analysts who implement models, creating hybrid approaches where human-defined structure combines with machine-assisted calibration within appropriate boundaries.
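As an illustration, the sketch below encodes two of the constraints above: monotonicity of cost in capacity, via scikit-learn's monotonic_cst option, and a post-hoc plausibility check on the implied scaling exponent. The data, units, and thresholds are synthetic.

```python
# A minimal sketch of embedding domain constraints in a fitted model.
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(5)
capacity = rng.uniform(1, 50, 40)         # hypothetical capacity units
site_score = rng.uniform(0, 1, 40)
cost = 10 * capacity**0.65 * (1 + 0.3 * site_score) + rng.normal(0, 3, 40)

X = np.column_stack([capacity, site_score])
# monotonic_cst: +1 forces predicted cost to be non-decreasing in capacity;
# 0 leaves the site driver unconstrained.
model = HistGradientBoostingRegressor(monotonic_cst=[1, 0], max_depth=3)
model.fit(X, cost)

# Plausibility check: the scaling exponent implied between two capacities
# should fall within the engineering-expected 0.5-0.8 band.
lo, hi = model.predict([[10, 0.5]]), model.predict([[40, 0.5]])
exponent = np.log(hi / lo) / np.log(40 / 10)
print(f"Implied exponent: {exponent[0]:.2f}  (expect 0.5-0.8)")
if not 0.5 <= exponent[0] <= 0.8:
    print("Outside engineering bounds -- investigate before accepting the model.")
```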
When Imperfect Data Is Not Good Enough: Decision Criteria
While modern techniques make working with imperfect data more tractable, there are limits beyond which modelling becomes unproductive regardless of analytical sophistication. Estimators need decision criteria for recognising when data limitations are too severe to support systematic learning, and when relying on judgement alone is more honest than constructing models that provide false confidence.
Modelling with imperfect data becomes questionable or counterproductive when several conditions apply simultaneously:
Sample size is critically small (fewer than 5 to 10 comparable observations for the specific context being estimated) and no transferable patterns are available from related asset types or delivery contexts that could inform hierarchical or Bayesian approaches. At this threshold, regression-based relationships become unstable to the point where they change materially with each new project, and it becomes difficult to distinguish genuine patterns from sampling variation.
Systematic drivers explain minimal variation in observed costs, with the majority of variation attributable to one-off circumstances, commercial decisions, or factors that were not recorded and cannot be reconstructed. If analysis suggests that measurable cost drivers account for less than 30 to 40 percent of observed cost variation, modelling may provide little value over informed judgement because the model cannot capture the factors that actually determine outcomes.
Critical scope-defining attributes are missing or inconsistent for more than half of available projects, making it impossible to establish what projects are actually comparable. If you cannot reliably determine whether projects had similar scope, similar complexity, or similar delivery constraints, learning cost relationships becomes problematic because you do not know whether observed cost differences reflect differences in scope or differences in efficiency, context, or other factors.
Delivery context has changed fundamentally such that historical patterns are unlikely to apply to future projects. Major shifts in procurement approach, regulatory requirements, delivery technology, or market conditions can render historical data unrepresentative of current reality, particularly if those shifts occurred recently and few projects have been delivered under the new conditions.
When these thresholds are exceeded, the appropriate response is typically to rely on structured judgement, analogous projects selected and adjusted manually, or parametric benchmarks from external sources, rather than attempting to build bespoke models from inadequate internal data. Acknowledging these limits explicitly is a sign of maturity, not failure.
Validation Under Imperfect Conditions
Validating models built on imperfect data requires more sophistication than simple holdout testing, which implicitly assumes that data is plentiful enough to divide into training and test sets without introducing additional problems. When datasets are already small or have significant gaps, traditional validation approaches often provide misleading signals about model performance.
For small samples (under 20 to 30 observations), leave-one-out cross-validation provides more robust assessment than single holdout samples. Each project serves as a test case, with the model trained on all remaining data and then used to predict the held-out project. This maximises learning from limited observations while still providing out-of-sample performance assessment. The variance in cross-validation results becomes an important signal in itself: high variance indicates that model performance is heavily dependent on which specific projects are included, suggesting fragility that should inform how the model is used.
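A minimal sketch of this, on a synthetic sample of the size typical for such portfolios, follows; the model and data are illustrative only. Note that both the mean error and its spread across folds are reported.

```python
# Leave-one-out cross-validation on a small portfolio, tracking both the
# average error and its variance across folds.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(2)
n = 14                                    # a realistically small sample
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.4, size=n)

scores = cross_val_score(
    Ridge(alpha=1.0), X, y,
    cv=LeaveOneOut(),
    scoring="neg_mean_absolute_error",
)
errors = -scores
print(f"LOO mean abs error: {errors.mean():.2f}")
print(f"Std dev across folds: {errors.std():.2f}")
# A high standard deviation relative to the mean signals that performance
# depends heavily on which projects are in the sample -- a fragility warning.
```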
For heterogeneous data, validation should assess performance within defined subgroups rather than only in aggregate. A model might perform adequately on average while performing poorly for specific contexts that are underrepresented in training data. Stratified validation that evaluates performance separately for different asset types, delivery models, or scale ranges provides a more nuanced understanding of where the model can be relied upon and where it cannot.
Uncertainty calibration becomes a critical validation metric when working with imperfect data, arguably more important than point accuracy. Are predicted confidence intervals or prediction bounds well-calibrated, meaning that actual costs fall within stated ranges at approximately the expected frequency? A model that is wrong 50 percent of the time but correctly represents its uncertainty through appropriately wide bounds is often more useful than one that appears accurate on average but systematically under-represents risk and variability. Evaluating calibration requires examining not just whether predictions were close to actuals, but whether the stated confidence in those predictions was justified by the data quality and model limitations.
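Checking calibration is mechanically simple, as the sketch below shows: count how often actual outcomes fall within the stated bounds. The estimates, bounds, and actuals here are illustrative placeholders, not real project data.

```python
# A minimal calibration check: do stated 80 percent prediction bounds
# contain actual outcomes roughly 80 percent of the time?
import numpy as np

# For each completed project: (lower bound, upper bound, actual cost), $M.
records = np.array([
    (8.0, 12.0, 11.1),
    (4.5,  7.0,  7.4),
    (20.0, 31.0, 24.9),
    (2.8,  4.1,  3.3),
    (10.5, 16.0, 15.2),
    (6.0,  9.5,  8.8),
    (13.0, 19.0, 21.5),
    (3.2,  5.0,  4.4),
    (9.0, 14.0, 12.6),
    (5.5,  8.5,  6.1),
])
lower, upper, actual = records.T
coverage = np.mean((actual >= lower) & (actual <= upper))
print(f"Stated confidence: 80%, empirical coverage: {coverage:.0%}")
# Coverage well below the stated level means bounds are overconfident;
# well above it means estimates are padded beyond the stated range.
```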
Sensitivity analysis showing how validation metrics change under different outlier treatments, different sample compositions, or different assumptions about missing data provides insight into model robustness. If validation results change dramatically when a single influential project is excluded or when missing data is handled differently, the model's apparent performance may be fragile and should be interpreted with appropriate caution.
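One simple form of this check, sketched below on synthetic data, re-runs the cross-validated error with each project dropped in turn and reports the resulting range. A wide range driven by a single project is itself a finding.

```python
# A minimal sensitivity check: how much does the cross-validated error
# move when each single project is dropped from the dataset?
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(9)
n = 12
X = rng.normal(size=(n, 2))
y = 1.5 * X[:, 0] + rng.normal(scale=0.3, size=n)
y[0] += 4.0                      # one project with an unexplained cost shock

def loo_mae(X, y):
    scores = cross_val_score(Ridge(alpha=1.0), X, y,
                             cv=LeaveOneOut(),
                             scoring="neg_mean_absolute_error")
    return -scores.mean()

baseline = loo_mae(X, y)
drops = [loo_mae(np.delete(X, i, axis=0), np.delete(y, i)) for i in range(n)]
print(f"Baseline LOO MAE: {baseline:.2f}")
print(f"Range when dropping one project: {min(drops):.2f} to {max(drops):.2f}")
# If the range is wide and traceable to one project, the headline validation
# figure is fragile and should be reported with that caveat.
```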
These validation approaches acknowledge explicitly that model quality cannot be separated from data quality, and that assessing fitness for purpose requires understanding not just whether predictions were accurate historically, but whether the basis for those predictions is likely to remain relevant and whether uncertainty is being represented honestly.
What Tools and Infrastructure Are Required
Working systematically with imperfect data requires more than conceptual understanding; it requires practical tools and infrastructure that support the analytical approaches discussed above while remaining accessible to estimating teams.
Statistical software environments such as R or Python provide the necessary analytical capabilities for regularisation, Bayesian analysis, hierarchical modelling, and cross-validation. These tools are increasingly accessible, with extensive documentation and libraries specifically designed for the types of analysis relevant to estimating. While not all estimators need to become proficient programmers, organisations should build or acquire capability in these environments, either within estimating teams or through close collaboration with analytical specialists who understand estimating needs.
Database structures that can accommodate missing data explicitly (rather than forcing placeholder values or excluding incomplete records) and that separate raw delivery data from analytical datasets become important as imperfect data is used systematically. The translation layer discussed in previous articles, which maps financial cost codes to estimating drivers, needs to be structured to handle partial mappings and to flag where attribution is ambiguous or assumptions have been made.
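As a small illustration of what accommodating missing data explicitly can look like in practice, the sketch below uses pandas nullable dtypes and an explicit flag for ambiguous translation-layer mappings; the column names and values are hypothetical.

```python
# Keeping missingness explicit in an analytical dataset: nullable dtypes
# instead of placeholder values, plus a flag for assumed cost-code mappings.
import pandas as pd

projects = pd.DataFrame({
    "project_id": ["P-101", "P-102", "P-103"],
    "ground_class": pd.array(["rock", None, "clay"], dtype="string"),
    "traffic_mgmt_level": pd.array([2, None, None], dtype="Int64"),  # not NaN-as-zero
    "mapping_ambiguous": [False, True, False],  # translation-layer assumption flag
})
print(projects)
print("Share of projects with ground data:",
      projects["ground_class"].notna().mean())
```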
Visualisation capabilities for exploring data quality issues, understanding patterns in missing data, and communicating model behaviour support both model development and stakeholder communication. Being able to show which relationships are well-supported by data and which are more speculative, or to visualise how uncertainty propagates through estimates, makes working with imperfect data more transparent and defensible.
Documentation systems that capture domain constraints, modelling assumptions, data quality assessments, and validation results ensure that models remain interpretable over time and can be maintained as data improves or as delivery context changes. Without persistent documentation, the rationale behind modelling choices gets lost, making it difficult to update models appropriately or to challenge them when they produce questionable results.
The level of sophistication required scales with organisational ambition and portfolio complexity. Smaller organisations or those beginning to work with imperfect data systematically can start with simpler implementations, perhaps basic regularised regression implemented in accessible tools, before progressing to more advanced techniques as capability and confidence develop.
Common Failure Modes When Working with Imperfect Data
Understanding how efforts to build models from imperfect data commonly fail helps organisations avoid predictable mistakes and establish more realistic expectations about what can be achieved.
Overfitting to noise remains a persistent risk despite regularisation and other protective measures. Models can appear to perform well in cross-validation while actually encoding artefacts of the specific sample rather than generalisable relationships. This often manifests as models that perform significantly worse on new projects than they did on historical data used for development, or as models whose predictions change dramatically when retrained after a single new project completes.
Spurious precision occurs when models produce point estimates and narrow confidence intervals that are not supported by underlying data quality. This is particularly problematic when models are complex or opaque, making it difficult for estimators to assess whether apparent confidence is justified. The result is decisions made with false certainty, which may be worse than acknowledging uncertainty explicitly and using structured judgement.
Extrapolation beyond sparse data becomes dangerous when models are applied to contexts or scales that are poorly represented in training data. Models may produce confident predictions for these cases despite having no empirical basis for them, and the imperfection of training data makes it harder to detect when extrapolation is occurring. Documenting the domain of applicability explicitly and flagging when estimates fall outside it becomes critical.
Failure to update as conditions change means that models calibrated on historical data become progressively less relevant as delivery context evolves. This is exacerbated when data is imperfect because estimators may attribute poor model performance to data limitations rather than to genuine shifts in cost behaviour, delaying necessary recalibration.
Treating missing data patterns as random when they are actually systematic leads to biased estimates. If certain types of projects consistently have missing attributes, and those projects differ materially from projects with complete data, models learned from complete cases may not generalise to the full population of interest. Understanding patterns in missingness and accounting for them analytically becomes important for avoiding this bias.
Measuring Success with Imperfect Data
Success when building models from imperfect data should be measured differently than when working with high-quality datasets. The appropriate metrics reflect the objectives of robustness, transparency, and decision support rather than narrow point accuracy.
Improved decision consistency across estimators and over time suggests that models are embedding systematic knowledge effectively. When different estimators produce more similar results for comparable work, or when the same estimator produces more consistent results when revisiting similar decisions at different times, this indicates that models are reducing reliance on tacit individual judgement in favour of shared understanding.
Well-calibrated uncertainty means that prediction intervals or confidence bounds match realised outcomes at expected rates. If 80 percent confidence intervals contain actual costs approximately 80 percent of the time across multiple projects, this suggests that uncertainty is being quantified honestly rather than being understated to provide false comfort.
Stability as data evolves indicates that models are capturing genuine patterns rather than noise. When new projects are added to training data, model predictions for existing contexts should shift gradually rather than changing dramatically, suggesting that learned relationships are robust and generalisable.
Appropriate sensitivity to key drivers means that estimates respond to changes in scope, context, or delivery approach in ways that align with engineering intuition and professional experience. Models should be more sensitive to factors that genuinely drive cost variation and less sensitive to factors that are incidental, with sensitivity patterns that can be explained and defended to project teams and decision-makers.
Productive feedback loops emerge when estimators trust models sufficiently to use them systematically, generating additional data that improves future models, which increases trust further. When this virtuous cycle establishes itself, the imperfection of data begins to diminish over time as the discipline of learning systematically creates pressure to capture information in more useful forms.
Reframing Progress and Maturity
Building estimating models with imperfect data is not about eliminating uncertainty or waiting for conditions that will never arrive. It is about managing uncertainty explicitly, consistently, and honestly through structured methods that acknowledge data limitations rather than pretending they do not exist or assuming they will resolve themselves without deliberate effort.
Progress is necessarily incremental. Early models will be crude, limited in scope, and appropriately uncertain in their predictions. Insights will be partial and provisional, subject to revision as more data accumulates or as analytical methods improve. This does not represent failure; it represents realistic engagement with the actual conditions under which construction and infrastructure estimating operates.
Modern analytical techniques do not remove uncertainty or substitute for estimating expertise, but they make working with uncertainty more tractable and more systematic. They allow estimators to scale judgement across portfolios, learn from experience in structured ways that compound over time, and support better decisions earlier in the project lifecycle even when information is incomplete.
The shift required is primarily cultural rather than technical: accepting that imperfect data is not an obstacle to estimating maturity but rather the environment in which maturity is demonstrated. Organisations that make this shift, building capability to work productively with imperfection rather than waiting indefinitely for perfection, position themselves to learn continuously from experience and to improve estimating performance incrementally over time. Those that continue to treat data quality as a precondition rather than a parallel workstream risk deferring their learning indefinitely; in those organisations, estimating remains an artisanal practice that struggles to scale or to improve in accuracy.
How are you maximising your learning?
