Data Mining Investments: Mortgage Portfolios

Risk Monitors, a wholly owned subsidiary of GMAC, needed to streamline and improve the review, analysis and interpretation of data assets. The challenge was to build loan-level models to assess the monetary effect of prepayments on mortgage portfolios in order to estimate their market value. From the mortgage-holder's standpoint, mortgages are essentially securities, and are traded like other securities. According to the Mortgage Bankers Association, there are more than 50 million mortgages outstanding in the United States.

Loan-level modeling helps investors make decisions about hedging, risk assessment, and loan retention. However, various methodologies for forecasting prepayments can introduce as much as a 20- to 30-percent difference in the estimated cash flow of a portfolio. For a typical portfolio, such differences could easily be measured in the hundreds of millions of dollars per year in cash flow alone.
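To make the sensitivity concrete, the sketch below projects one year of cash flow for a level-pay, fixed-rate pool under two prepayment assumptions. It is an illustration only, not the Risk Monitors model: the pool size, coupon, term, and the two constant prepayment rates (CPR) are hypothetical, and the conversion from an annual CPR to a monthly rate (single monthly mortality, SMM) is the standard textbook convention rather than anything stated in this article.

```python
def first_year_cash_flow(balance, annual_rate, term_months, cpr):
    """Sum 12 months of cash flow (interest + scheduled principal + prepayments)
    for a level-pay fixed-rate pool under a constant annual prepayment rate."""
    monthly_rate = annual_rate / 12.0
    # Convert the annual CPR to a single monthly mortality (SMM) rate.
    smm = 1.0 - (1.0 - cpr) ** (1.0 / 12.0)
    total = 0.0
    for month in range(12):
        remaining = term_months - month
        # Level payment that amortizes the current balance over the remaining term.
        payment = balance * monthly_rate / (1.0 - (1.0 + monthly_rate) ** -remaining)
        interest = balance * monthly_rate
        scheduled_principal = payment - interest
        # Prepayments apply to the balance left after scheduled principal.
        prepayment = (balance - scheduled_principal) * smm
        total += interest + scheduled_principal + prepayment
        balance -= scheduled_principal + prepayment
    return total


# Hypothetical $1 billion pool, 7% coupon, 30-year term (assumed values).
slow = first_year_cash_flow(1_000_000_000, 0.07, 360, cpr=0.06)
fast = first_year_cash_flow(1_000_000_000, 0.07, 360, cpr=0.10)
print(f"slow: ${slow:,.0f}  fast: ${fast:,.0f}  difference: {(fast - slow) / slow:.1%}")
```

Changing only the `cpr` argument shows how much of the first-year cash flow depends on the prepayment forecast rather than on the loan terms themselves.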

Economic theories can attempt to explain possible prepayment drivers, but without data mining, confirmation involved arduous and lengthy analyses by very qualified researchers. The objective of Risk Monitors was to determine the feasibility of evaluating methodologies more quickly to see how well they tracked reality. To achieve this, Risk Monitors and Silicon Graphics Inc. (SGI) entered into a collaborative arrangement in February 1998, with the goal of employing Silicon Graphics' MineSet data mining software extensively on a dataset of about 11 million loans.


  • Data cleansing is an often unrecognized use of data mining. During the knowledge discovery process, the team could swiftly identify data discrepancies caused by input errors and other factors that otherwise might have led to an incorrect valuation model. The team went through several iterations, each time finding problems with the data. This use of data mining requires that the models be comprehensible. Had opaque models such as neural networks been used, it is unlikely that problems would have been discovered.
  • Often, data was graphed geographically. This made it easy to see that the refinance costs in certain states were much higher than in others. Observing this, some Risk Monitors customers, whose business it is to solicit refinancing by targeting individual states, shifted priorities to the lower-cost states, saving on telemarketing costs.
  • Other analysis showed that the best lag (time difference) between events, such as treasury rate changes, and prepayment was about four weeks. This is shorter than the conventional wisdom of about eight to 10 weeks.
  • Despite cleansing, there are many unexplained discrepancies in the dataset. For example, mortgages in which the zip code was unknown show very large prepayments, typically five times higher than in any known zip code.
  • In several cases, the team was able to identify spurious correlations caused by external processes or events. For instance, a large group of loans was paid off during a certain shift in the yield curve. It turned out that a servicer had sold off a portion of its portfolio at that time, so the payoffs may or may not have had anything to do with the yield curve.
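The lag finding in the list above can be illustrated with a simple lagged-correlation scan. This is a minimal sketch, not the MineSet analysis itself: the function names `pearson` and `best_lag` are hypothetical, and the weekly series is synthetic, constructed so that prepayments respond to rate changes exactly four weeks later.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def best_lag(driver, response, max_lag):
    """Return (lag, r) with the largest |r| between driver[t] and response[t + lag]."""
    best = None
    for lag in range(max_lag + 1):
        xs = driver[: len(driver) - lag]  # driver values that have a lagged partner
        ys = response[lag:]               # response shifted forward by `lag` weeks
        r = pearson(xs, ys)
        if best is None or abs(r) > abs(best[1]):
            best = (lag, r)
    return best


# Synthetic weekly data (assumed): prepayments move opposite to rate changes,
# four weeks later. Real loan data would be far noisier.
rng = random.Random(0)
driver = [rng.gauss(0.0, 1.0) for _ in range(104)]   # weekly rate changes
response = [0.0] * 4 + [-d for d in driver[:-4]]     # prepayment response

lag, r = best_lag(driver, response, max_lag=12)
print(lag)  # recovers the 4-week lag built into the synthetic data
```

A strong correlation at some lag is only a starting point. As the servicer example shows, an external event can produce the same pattern, so a candidate lag still needs to be checked against what actually happened to the loans.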

One factor that became clear to Risk Monitors is that a data mining environment must allow for modeling, drill-downs, drill-through, and the integration of different tools and visualization to provide the most useful insights. Users need more than a single algorithm to mine data effectively.

Aydin Senkut is MineSet Product Manager for Silicon Graphics, Inc. (Mountain View, Calif.) and can be reached at (650) 960-1980. Richard Harmon is Managing Director for Risk Monitors, Inc. (White Plains, N.Y.) and can be reached at (914) 397-9400.
