Abstract: | Watershed classification using multivariate techniques requires continuous datasets that represent controlling environmental variables. Often, the environmental data used originate from a variety of sources and scales and are chosen for convenience and availability rather than for their importance to the structure of the system being modeled. To demonstrate the importance of appropriate environmental data selection, classifications of six‐digit hydrologic units (1:24,000) were produced across selected geographic areas within the Interior Columbia River Basin. Canonical correspondence analysis was used to select and test environmental variables important in predicting Rosgen stream types and valley bottom classes. Hierarchical agglomerative clustering was then used to group (classify) watersheds based on these variables. Statistically significant results were obtained using organized classification data with presumed predictive relationships to watershed properties, but a random distribution of environmental variables from the same datasets provided similar results. The results demonstrate that these analysis techniques do not necessarily select meaningful variables from a broad spectrum of data and that significant results are easily generated from randomly associated data. It is suggested that classifications produced using these multivariate techniques, especially with multi‐scale data or data of unknown significance, are subject to invalid inferences and should be used with caution. |