According to a recent study by Reggora, the average mortgage loan repurchase rate is 0.49% and results in an average cost of $32,288 to the lender. However, what if the repurchase risk were much higher? According to Restb.ai’s recent white paper analyzing the reliability of appraisal condition and quality adjustments, a staggering 33.6% of appraisals were identified as either including a condition or quality adjustment that was unwarranted or omitting one that was warranted, based on AI’s analysis of the property’s photos. Assuming a conservative estimate of 2.5M appraisals completed each year, lenders are collectively opening themselves up to a risk of more than $27 billion in repurchase costs.
This white paper examines the critical role of condition and quality in real estate appraisals, highlighting notable discrepancies between appraiser assessments and AI-driven evaluations. By analyzing 1,271 appraisals and 6,495 comparable properties, we uncover varying types of inconsistencies that can lead to valuation inaccuracies. Our findings emphasize the importance of more robust quality control in appraisal practices, particularly related to adjustments, or lack thereof, of property condition and quality.
The accuracy of appraisals is paramount in real estate transactions, influencing lending decisions, market valuations, and investment strategies. This report highlights the challenges appraisers face in consistently and accurately assessing property condition and quality, and the subsequent impact on appraisal outcomes.
Leveraging advanced AI and computer vision technology, Restb.ai provides an objective analysis of appraisal data, revealing patterns and insights that have previously been overlooked due to technological limitations (i.e. the inability to audit condition and quality at scale).
Limitations of Appraiser Assessments of Condition and Quality
Reliability of AI-Generated Condition and Quality Scores
Impact on Valuation
Accurate property appraisals are essential for informed decision-making and an efficient lending environment. Condition and quality adjustments play a pivotal role in determining a property’s fair market value. However, inconsistencies in these assessments can undermine the reliability of appraisals.
This report presents an in-depth analysis of condition and quality assessments and adjustments, utilizing AI-driven evaluations to provide an objective perspective. By examining a dataset of 1,271 appraisals, we identify trends and discrepancies that highlight the challenges in achieving consistent appraisal accuracy.
Overview of Analyzed Appraisal Features
This study focuses on two critical appraisal components: property condition and quality. The Uniform Appraisal Dataset (UAD) provides a framework detailing how each property can be scored along each dimension.
Appraisers are required to utilize the UAD framework to assess a condition rating (C1 to C6) and a quality of construction rating (Q1 to Q6) for the subject property and each comparable property. In the event there are differences between a subject property’s condition and/or quality and a referenced comparable property, the appraiser may provide a valuation adjustment to account for the impact on the valuation.
Consistently and reliably scoring properties based on their condition and quality is a challenge. While the UAD provides clear criteria for assessing these aspects, they are not quantifiable and objective in the same way as other features, such as square footage or the number of bathrooms. They must be interpreted by each appraiser, which can introduce subjectivity.
Appraisers are instructed to adopt a “holistic view” of each property. However, many homeowners implement renovations over time and the extent of those improvements can vary. A property may have a bathroom renovated to a C2 level, but if the rest of the property is in a C4 condition, what is the correct way to account for that? When renovating a kitchen, what’s the correct way to differentiate between repainting cabinets vs. replacing them?
Even experienced appraisers often struggle to answer these questions consistently. A paper by Michael D. Eriksen, Chun Kuang, and Wenyu Zhu analyzing appraisal attributes found that appraisers who had completed an appraisal for a property and then reused that property as a comparable on a later appraisal recorded a different condition score 12.6% of the time and a different quality score 9.5% of the time.
Complicating matters further, Figure 1 shows that 81.1% of properties in the Appraisal-Level Public Use File (PUF) are classified as either C3 or C4 and 97.5% are classified as Q3 or Q4. With properties clustered on so few scores, it is difficult to know when an adjustment may be necessary. Two properties may both “correctly” be considered C4s, yet there may still be a material difference in their condition that warrants a value adjustment.
Figure 1: Condition and Quality Distribution from the Appraisal-Level PUF
The reality is that each property exists on a spectrum. Every property that is now a C4 was once a C3, a C2, and even a C1. The lines between these categories are blurry, and the perceived condition and quality can vary among appraisers, or even for the same appraiser at different times, contributing to the difficulty of achieving objective and consistent assessments.
Despite these challenges, reliable analysis of condition and quality remains critical because of its impact on valuations and risk. As Figure 2 shows, Fannie Mae’s most recent findings list inadequate selection of comparables, inadequate adjustments on comparables, and inaccurate reporting of subject condition and quality as the top three findings.
Figure 2: Fannie Mae’s Top 10 Findings in 2024 Q2/Q3
Unfortunately, identifying these issues is difficult and time-consuming for Appraisal Management Companies (AMCs) and lenders. While the subject property can be validated against the imagery included in an appraisal, each comparable features only a single photo of its exterior. Quality control teams frequently don’t have the bandwidth to pull up each comparable property’s images to ensure all adjustments make sense. As a result, many condition and quality risks slip through the cracks.
The lack of transparency on these attributes can even be taken advantage of by appraisers to reach an unjustified value. Notably, in one recently settled appraiser bias case, it was stated in the complaint that,
“The majority of improper adjustments made by “the appraiser” are concealed through her use of “C” and “Q” rates”: 1) “She applied Q3 rating to the Plaintiffs’ home, a 10% downward adjustment, but such ratings are intended for stock homes located on above-average residential development tracks, rather than the Q2 rating appropriate for semi-custom homes with “detailed, high quality exterior ornamentation, high quality interior refinements, and detail” per the Fannie Mae and Freddie Mac Uniform Mortgage Data Program. 2) She applied a C3 rating to the Plaintiffs’ home, a 10% downward adjustment, but such ratings are reserved for homes still in their first cycle “of replacing short-lived building components (appliances, floor coverings, HVAC, etc.)” even though the Plaintiffs had replaced all of those components with high end components, and there was little or no deferred maintenance.”
As these examples show, inaccurate assessment of condition and quality leads not only to greater risk for lenders and the GSEs but also to detrimental outcomes for borrowers.
Recognizing the critical importance of reliable and consistent condition and quality assessments, AMCs, lenders, and Government-Sponsored Enterprises (GSEs) are turning to technology to aid in quality control processes. Computer vision and artificial intelligence (AI) are increasingly being utilized to objectively analyze property photos and evaluate properties on their condition and quality.
In a recent Appraiser Update, Fannie Mae describes computer vision’s impact on appraisers as follows: “Appraisers who are diligent in factually and objectively determining C and Q ratings (and adjustments) will have a competitive advantage, while those who are not rigorous may experience higher rates of defects and all the associated impacts such as lender requests for reconsideration of value, or Appraiser Quality Monitoring letters.” It also highlights the reliability of AI with the following comment about its analysis of over a million appraisals: “Appraisal experts in our Loan Quality Center reviewed those reports and found the model prediction was 98% accurate.”
An advantage of AI compared to human analysis is that AI can consistently analyze properties in a repeatable fashion. The subjective nature of condition and quality means that subconscious biases related to the location of a home, personal preferences, or something as innocent as an appraiser’s mood that day, can unduly influence an assessment. Meanwhile, AI is trained over property imagery independently of that property’s price, region, owners, or any other aspect that is more difficult for a human to abstract.
Another key benefit of using photo-based AI is that it can provide more granular assessments of a property. Rather than having 6 ratings to categorize properties, it can consistently provide nuanced analysis that makes it easier to identify when properties are truly comparable and when adjustments may be necessary. For example, Figure 3 provides a case where the subject property is a C3.4 (i.e. a C3) with comparable property A that is a C3.5 (i.e. a C4) and comparable property B that is a C2.6 (i.e. a C3). Which is more deserving of an adjustment?
Figure 3: Importance of Granular Scores for Understanding Adjustments
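The comparison in Figure 3 can be sketched in a few lines of Python. This is a minimal illustration rather than Restb.ai’s implementation: the function names are hypothetical, and it assumes a granular score maps to a UAD rating by rounding half up, which is what the example values in the text imply (a C3.5 becomes a C4).

```python
import math


def uad_rating(score: float) -> int:
    """Collapse a granular score to a whole UAD rating (assumed: round half up)."""
    return math.floor(score + 0.5)


def gap(subject: float, comparable: float) -> float:
    """Absolute difference between two granular scores, to one decimal place."""
    return round(abs(subject - comparable), 1)


subject = 3.4  # values taken from the Figure 3 example in the text
comparables = {"A": 3.5, "B": 2.6}

for name, score in comparables.items():
    same_rating = uad_rating(score) == uad_rating(subject)
    print(f"Comparable {name}: C{uad_rating(score)}, "
          f"same rating as subject: {same_rating}, gap: {gap(subject, score)}")
```

Comparable A lands in a different rating than the subject despite a gap of only 0.1, while Comparable B shares the subject’s rating despite a gap of 0.8, so B is the more likely candidate for an adjustment.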
As Figure 4 below details, a considerable percentage of properties fall in the grey zones between C3 and C4 and between C2 and C3. Detailed scores provided by AI can help determine when there are meaningful differences between two properties, regardless of where they fall on the spectrum.
Figure 4: AI’s Condition Distribution Analyzing Property Images
Beyond just providing an overall score, Figure 5 demonstrates computer vision’s ability to break out and score the different components of a property. While humans may struggle to consistently score homes with varying levels of updates, AI can consistently aggregate how different areas of a home affect its overall score.
Figure 5: Example of AI-Generated Scores for Property and Sub-Components
With the GSEs’ appraisal modernization efforts and the new UAD requiring condition and quality scores to be broken out into interior and exterior scores, it is essential to incorporate more robust ways to ensure appropriate assessments of condition and quality.
To understand the prevalence of condition and quality issues, we analyzed 1,271 appraisals. AI was leveraged to generate scores for the subject properties based on their appraisal imagery while the most recent listing photos were utilized to score each comparable property.
In the following section we will examine:
To begin our analysis, Table 1 highlights the various cases that can occur when examining an appraisal for potential condition and quality issues. For clarity, these observations are based purely on the appraiser-provided scores and do not consider any AI analysis.
Next we examine how condition and quality scores vary for the subject property. The subject property is intentionally differentiated from the assessments of comparable properties due to differences in how the scores were determined for both cases.
For the subject property, the appraiser has either visited it in person or received a complete set of data from a data collector, while for the comparable properties, the appraiser may have done as little as drive by the front of the property and at most, analyzed the photos from a recent listing. Similarly, the AI-generated scores are based off of the appraisal imagery for the subject property, while the comparable property scores are generated from listing imagery. While extensive work has been put into our AI models to normalize differences in image quality and property presentation, we decided it was more appropriate to evaluate both independently.
Analyzing the scores provided in each report, Figure 6 shows 86.1% of appraisals were scored as C3 or C4 and 97.0% of appraisals were scored as Q3 or Q4. This is consistent with the numbers highlighted above from the Appraisal-Level PUF.
Figure 6: Appraiser Condition and Quality Scores for Subject Property
Figure 7: AI-Generated Condition and Quality Scores for Subject Properties
Figure 8: AI-Generated Condition and Quality Scores for Subject Properties
Table 2: Appraiser vs. AI Subject Property Condition and Quality Differences
When examining the appraiser scores for the comparable properties they selected, Figure 9 shows a roughly similar distribution to the subject property analysis, with 84.5% of properties being scored as C3 or C4 (vs. 86.1% for subject) and 96.3% of properties being scored as Q3 or Q4 (vs. 97.0% for subject).
Figure 9: Appraiser Condition and Quality Scores for Comparable Properties
Meanwhile, there are more noticeable differences between the subject and comparables when analyzing the AI-generated scores. According to Figure 10, the comparable condition scores are generally lower compared to the subject. Where 28.6% of subject properties had C4 ratings, 21.1% of comparable properties were C4. Conversely, 57.2% of subject properties were C3, while 63.8% of comparable properties were C3, revealing appraisers used comparables in better condition more frequently. There are various possible explanations, but it could be related to a subconscious bias to compare a property with better condition comparables in order to achieve a higher valuation. This would be consistent with a recent CSS study highlighting that appraisals in H1 2024 came in higher than the sale price 51.1% of the time, at the sale price 40.5% of the time, and below the sale price in 8.4% of cases.
Figure 10: AI-Generated Condition Scores for Comparable Properties
Figure 11 reveals an even more dramatic shift with comparable properties being higher quality on average. 59.4% of subject properties were Q4 compared to 37.4% of comparable properties and 37.6% of subject properties were Q3 compared to 53.8% of comparable properties. Similar to the condition scores, this could be evidence of a tendency to utilize higher quality comparable properties that, if not adjusted for, could lead to overvaluations.
Figure 11: AI-Generated Quality Scores for Comparable Properties
When comparing each comparable’s appraiser score with the corresponding AI-generated score in Table 3, we see a notably greater percentage of inconsistencies than when analyzing the subject properties. This higher percentage could be linked to the shifts seen in the condition and quality scores, or it could indicate that appraisers are more inconsistent when evaluating comparable properties they have not analyzed to the same extent as the subject properties, as could reasonably be expected.
Table 3: Appraiser vs. AI Comparable Property Condition and Quality Differences
While identifying when an AI-generated score differs from an appraiser rating may help identify problematic appraisals (see Exhibit B at end of report), it is more relevant to determine when adjustments were improperly made or omitted based on the AI’s standardized analysis of the subject and comparable property. For example, an appraiser may score the subject and all comparable properties incorrectly, but if their scores are all over/under assessed like in the example appraisal in Exhibit C, then there isn’t necessarily a risk of over or undervaluation.
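The distinction between absolute and relative disagreement can be made concrete with a small sketch. The scores below are hypothetical values in the spirit of Exhibit C (they are not taken from the report), and the function name is illustrative only.

```python
def relative_gaps(subject: float, comparables: list[float]) -> list[float]:
    """Difference of each comparable from the subject (positive = worse rating)."""
    return [round(comp - subject, 2) for comp in comparables]


# Hypothetical scores: the appraiser rates every property Q4,
# while the AI rates every property close to Q2.
appraiser_subject, appraiser_comps = 4.0, [4.0, 4.0, 4.0]
ai_subject, ai_comps = 2.3, [2.2, 2.4, 2.3]

# Gaps between each comparable and the subject under each scoring
print("Appraiser gaps:", relative_gaps(appraiser_subject, appraiser_comps))
print("AI gaps:       ", relative_gaps(ai_subject, ai_comps))

# Absolute disagreement between appraiser and AI on the subject
print("Subject offset:", round(appraiser_subject - ai_subject, 2))
```

In this sketch the appraiser and the AI disagree on the subject by 1.7 points, yet under both scorings every comparable sits within 0.1 of the subject, so the relative picture that drives adjustments is the same.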
Let’s start by analyzing all of the cases where the AI-generated scores indicate an adjustment may be needed. Table 4 below details these cases at thresholds ranging from a medium risk to a high risk of an adjustment being warranted. Recall from the appraiser scores that condition adjustments were made on more than a third of comparables (34.4%) and quality adjustments on more than one in ten (11.6%). At the medium-risk threshold, the AI-generated scores indicate that more adjustments should be made than appraisers made; at the high-risk threshold, they indicate fewer. The AI-generated scores also show a more consistent need for condition adjustments than quality adjustments, though the exact magnitude depends on the specified tolerance.
However, in many of these cases, the appraiser made an adjustment to account for the differences in condition or quality. More relevant to our study is when the comparables’ AI-generated values were meaningfully different from the AI-generated subject score and no adjustments were made by the appraiser. An example of an appraisal where adjustments were expected based on the AI assessment, but none were included can be seen in Exhibit D.
As Table 5 below shows, just under a quarter of comparables are at medium risk of needing an adjustment for either condition (23.4%) or quality (24.5%), while 11.7% of comparables are at high risk of needing a condition adjustment and 4.1% are at high risk of needing a quality adjustment.
Notably, at the high-risk threshold, more comparables are being used without a proper condition adjustment than without a proper quality adjustment. This holds even though appraisers already make condition adjustments at roughly three times the rate of quality adjustments.
The last case we investigated was how often an adjustment was made when the difference in AI-generated scores indicates it may not have been needed. For this scenario, we define a medium-risk case as an adjustment made for a score difference of less than or equal to 0.5, and a high-risk case as an adjustment made for a score difference of less than or equal to 0.1 (i.e. the properties are essentially considered the same by the AI). An example of an appraisal with unwarranted adjustments can be found in Exhibit E.
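The thresholds described above can be expressed as a short classifier. This is a sketch under stated assumptions, not the study’s code: the function and constant names are hypothetical, the function reports only the most severe level that applies, and the score difference is rounded to two decimals to avoid floating-point noise at the boundaries.

```python
from typing import Optional

MEDIUM_MAX_GAP = 0.5  # from the text: adjustment made with a difference <= 0.5
HIGH_MAX_GAP = 0.1    # from the text: adjustment made with a difference <= 0.1


def unwarranted_adjustment_risk(subject_score: float,
                                comparable_score: float,
                                adjustment_made: bool) -> Optional[str]:
    """Return 'high', 'medium', or None for a single comparable."""
    if not adjustment_made:
        return None  # this check only covers adjustments that were made
    gap = round(abs(subject_score - comparable_score), 2)  # assumed rounding
    if gap <= HIGH_MAX_GAP:
        return "high"
    if gap <= MEDIUM_MAX_GAP:
        return "medium"
    return None  # difference is large enough that an adjustment is plausible


print(unwarranted_adjustment_risk(3.4, 3.5, adjustment_made=True))
print(unwarranted_adjustment_risk(3.0, 3.4, adjustment_made=True))
print(unwarranted_adjustment_risk(3.0, 4.0, adjustment_made=True))
```

Keeping the two limits as named constants mirrors the idea that each reviewer can set a tolerance that matches their own risk appetite.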
Surprisingly, Table 6 indicates appraisers are making adjustments incorrectly more frequently than they are failing to provide adjustments, which could be indicative of appraisers using condition and quality adjustments to justify inaccurate valuations.
Given the above calculated rates, the final step is to determine how many appraisals feature at least one problematic comparable property. As can be seen below in Table 7, a remarkable 73.9% of appraisals have a medium risk and 33.6% have a high risk of an improper condition or quality adjustment.
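Rolling comparable-level flags up to the appraisal level can be sketched as follows. The data and names here are hypothetical, and the sketch assumes that a high-risk comparable also counts toward the medium-risk rate; the report does not state how the two levels were combined.

```python
from collections import defaultdict

# Hypothetical comparable-level results: (appraisal_id, risk), where risk is
# "high", "medium", or None for a comparable with no flagged issue.
comparable_flags = [
    ("A-1", None), ("A-1", "medium"), ("A-1", None),
    ("A-2", "high"), ("A-2", "medium"),
    ("A-3", None), ("A-3", None),
    ("A-4", None), ("A-4", "high"),
]


def appraisal_level_rates(flags):
    """Share of appraisals with at least one comparable at each risk level."""
    by_appraisal = defaultdict(set)
    for appraisal_id, risk in flags:
        by_appraisal[appraisal_id].add(risk)
    total = len(by_appraisal)
    if total == 0:
        return {"medium": 0.0, "high": 0.0}
    high = sum("high" in risks for risks in by_appraisal.values())
    # Assumed: an appraisal with a high-risk comparable is also medium risk.
    medium = sum(bool(risks & {"medium", "high"})
                 for risks in by_appraisal.values())
    return {"medium": medium / total, "high": high / total}


rates = appraisal_level_rates(comparable_flags)
print(rates)
```

Because a single flagged comparable is enough to flag the whole appraisal, appraisal-level rates are naturally higher than the per-comparable rates.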
As this study has detailed, condition and quality assessments and price adjustments are frequently inaccurate, inevitably leading to imprecise valuations and increased risk. These errors are understandable given the difficulty of manually condensing complex property details into two high-level scores.
While multiple quality reviews exist through the lifetime of an appraisal, consistently identifying and correcting these issues has remained a challenge. It is simply too difficult to know when a problematic condition or quality adjustment, or lack thereof, may exist, and too time consuming to pull up photos of comparable properties to ensure all properties have been evaluated consistently.
Fannie Mae has identified these inaccuracies as frequent risks, and legal cases have highlighted how they can lead to flawed valuations.
These risks translate to significant financial costs for lenders. According to a recent study by Reggora, the average mortgage loan repurchase rate is 0.49% and results in an average cost of $32,288 to the lender. Assuming a conservative estimate of 2.5M appraisals completed each year, the 33.6% of high risk appraisals would equate to a collective lender risk of more than $27 billion in repurchase costs.
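The arithmetic behind the $27 billion figure can be reproduced directly from the numbers quoted above. The variable names are illustrative; the calculation treats every high-risk appraisal as a potential repurchase at the average cost, and the 0.49% repurchase rate does not enter it.

```python
APPRAISALS_PER_YEAR = 2_500_000  # the report's stated estimate
HIGH_RISK_PER_1000 = 336         # 33.6% of appraisals, as parts per thousand
AVG_REPURCHASE_COST = 32_288     # USD, the Reggora figure cited in the report

# Integer arithmetic avoids floating-point rounding in the dollar total
high_risk_appraisals = APPRAISALS_PER_YEAR * HIGH_RISK_PER_1000 // 1000
exposure = high_risk_appraisals * AVG_REPURCHASE_COST

print(f"{high_risk_appraisals:,} high-risk appraisals")
print(f"${exposure:,} potential exposure")
```

This yields 840,000 high-risk appraisals and $27,121,920,000, consistent with the "more than $27 billion" stated in the text.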
Fortunately, this study highlights the opportunity for computer vision to automatically identify these issues. While some appraisers remain skeptical of AI, its value lies in its ability to immediately flag potential issues for closer review rather than waiting for discrepancies to be found later in the appraisal process. The detailed nature of the computer vision scores enables each appraiser, lender, AMC, or GSE to determine their own risk tolerance by setting the thresholds at which they would like to be notified of a possible issue.
By leveraging AI-driven evaluations and adhering to best practices, appraisers can improve accuracy and lenders can minimize risk. Furthermore, this data driven approach can lead to greater transparency and trust in the appraisal process for all stakeholders.
For those interested in more studies like this, please let us know at insights@restb.ai what topic you would like analyzed next!
Exhibit A: Study Overview
Exhibit B: Appraiser and AI Condition and Quality Overview
Exhibit C: Appraisal where no adjustments are needed, but scores are consistently off
Figure 12 details a comparables grid where the appraiser has indicated the quality of the subject and all comparable properties as Q4. No adjustments have been made for quality.
Meanwhile, the AI-generated scores in Figure 13 show that each property is closer to a Q2 than a Q4. Given the similar quality of the properties, there likely isn’t a risk of over or undervaluation, even if the appraiser has misassessed the quality of each property. However, this can still cause issues, as many systems and portals may show appraisers how properties were scored on prior appraisals, leading to problematic data being referenced by other appraisers in the future.
Exhibit D: Appraisal where adjustments weren’t made, but were warranted
Figure 14 details a comparables grid where the appraiser has indicated the condition of the subject and all comparable properties as C4. No adjustments have been made for condition.
Exhibit E: Appraisal with adjustments made that were not warranted
Figure 16 details a comparables grid where the appraiser has indicated the condition of the subject and all comparable properties as C3. Condition adjustments have been made for Comparables 1 and 2.
Meanwhile, the AI-generated scores in Figure 17 show that all of the properties are in largely similar conditions (C3). In this case, there is a high risk the property may be undervalued due to its comparables being adjusted down unnecessarily.