“The customer is the final filter. What survives the whole process is what people wear.” – Marc Jacobs

Fashion is a fascinating domain for computer vision. Not only does it offer a challenging testbed for fundamental vision problems (human body parsing [42, 43], cross-domain image matching [28, 20, 18, 11], and recognition [5, 29, 9, 21]), but it also inspires new problems that can drive a research agenda, such as modeling visual compatibility [19, 38], interactive fine-grained retrieval [24, 44], or reading social cues from what people choose to wear [26, 35, 10, 33]. At the same time, the space has potential for high impact: the global market for apparel is estimated at $3 trillion USD. It is increasingly entwined with online shopping, social media, and mobile computing, all arenas where automated visual analysis should be synergistic.

In this work, we consider the problem of visual fashion forecasting. The goal is to predict the future popularity of fine-grained fashion styles. For example, having observed the purchase statistics for all women’s dresses sold on Amazon over the last N years, can we predict what salient visual properties the best-selling dresses will have 12 months from now? Given a list of trending garments, can we predict which will remain stylish into the future? Which old trends are primed to resurface, independent of seasonality? Computational models able to make such forecasts would be critically valuable to the fashion industry, because they would portray large-scale trends in what people will be buying months or years from now. They would also benefit individuals who strive to stay ahead of the curve in their public persona, e.g., stylists to the stars. However, fashion forecasting is interesting even to those of us unexcited by haute couture, money, and glamour.
This is because everyday fashion trends carry the effects of shifting cultural attitudes, economic factors, social sharing, and even the political climate. For example, the hard-edged flapper style of the prosperous 1920s in the U.S. gave way to the conservative, softer shapes of 1930s women’s wear, paralleling events of the time such as women’s right to vote (secured in 1920) and the stock market crash nine years later, which prompted more conservative attitudes. Thus, beyond the fashion world itself, quantitative models of style evolution would be valuable in the social sciences.

While structured data from vendors (i.e., recording purchase rates for clothing items accompanied by meta-data labels) is relevant to fashion forecasting, we hypothesize that it is not enough. Fashion is visual, and comprehensive fashion forecasting demands actually looking at the products. Thus, a key technical challenge in forecasting fashion is how to represent visual style. Unlike articles of clothing and their attributes (e.g., sweater, vest, striped), which are well-defined categories handled readily by today’s sophisticated visual recognition pipelines [5, 9, 29, 34], styles are more difficult to pin down and even subjective in their definition. In particular, two garments that look superficially different may nonetheless share a style.

Furthermore, as we define the problem, fashion forecasting goes beyond simply predicting the future purchase rate of an individual item seen in the past. So, it is not simply a regression problem from images to dates. Rather, the forecaster must be able to hypothesize styles that will become popular in the future, i.e., to generate yet-unseen compositions of styles.
The ability to predict the future of styles rather than merely items is appealing for applications that demand interpretable models expressing where trends as a whole are headed, as well as those that need to capture the life cycle of collective styles, not individual garments.

Despite some recent steps to qualitatively analyze past fashion trends in hindsight [41, 33, 10, 39, 15], to our knowledge no existing work attempts visual fashion forecasting.

We introduce an approach that forecasts the popularity of visual styles discovered in unlabeled images. Given a large collection of unlabeled fashion images, we first predict clothing attributes using a supervised deep convolutional model. Then, we discover a “vocabulary” of latent styles using non-negative matrix factorization. The discovered styles account for the attribute combinations observed in the individual garments or outfits. They have a mid-level granularity: they are more general than individual attributes (pastel, black boots), but more specific than typical style classes defined in the literature (preppy, Goth, etc.) [21, 38, 34]. We further show how to augment the visual elements with text data, when available, to discover fashion styles. We then train a forecasting model to represent trends in the latent styles over time and to predict their popularity in the future. Building on this, we show how to extract style dynamics (trendy vs. classic vs. outdated) and forecast the key visual attributes that will play a role in tomorrow’s fashion, all based on learned visual models.

We apply our method to three datasets covering six years of fashion sales data from Amazon for about 80,000 unique products. We validate the forecasted styles against a held-out future year of purchase data.
Our experiments analyze the tradeoffs of various forecasting models and representations. The comparison of representations reveals the advantage of unsupervised style discovery based on visual semantic attributes over off-the-shelf CNN representations, including those fine-tuned for garment classification. Overall, an important finding is that visual content is crucial for securing the most reliable fashion forecast. Purchase meta-data, tags, etc., are useful, but can be insufficient when taken alone.
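
To make the pipeline outlined above more concrete, the following is a minimal sketch of its last two stages: discovering latent styles from per-item attribute predictions with non-negative matrix factorization, then forecasting each style's popularity. Everything specific in it is an assumption made for illustration rather than the configuration used in this work: the attribute matrix `A` is random synthetic data standing in for the output of the attribute classifier, the number of styles `k = 4` is arbitrary, each row is treated as one purchased item, NMF is fitted with textbook multiplicative updates, and simple exponential smoothing stands in for whichever forecasting model is ultimately chosen. The function names are hypothetical.

```python
import numpy as np


def nmf(A, k, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix A (items x attributes) as W @ H.

    Uses multiplicative updates for the Frobenius-norm objective.
    W holds per-item style weights, H holds per-style attribute weights.
    """
    rng = np.random.default_rng(seed)
    n, m = A.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


def style_popularity(W, years, n_years):
    """Average normalised style membership per year -> (n_years, k)."""
    # Normalise each item's style weights so they sum to (almost) 1.
    P = W / (W.sum(axis=1, keepdims=True) + 1e-9)
    return np.stack([P[years == t].mean(axis=0) for t in range(n_years)])


def exp_smooth_forecast(y, alpha=0.5):
    """Simple exponential smoothing; returns the one-step-ahead forecast.

    y is a time series along axis 0 (a vector, or one column per style).
    """
    level = y[0]
    for v in y[1:]:
        level = alpha * v + (1 - alpha) * level
    return level


# Synthetic stand-in data (assumption): 600 items, 12 attribute scores in [0, 1),
# spread evenly over 6 years.
rng = np.random.default_rng(42)
A = rng.random((600, 12))
years = np.arange(600) % 6

W, H = nmf(A, k=4)                               # style discovery
traj = style_popularity(W, years, n_years=6)     # one popularity curve per style
forecast = exp_smooth_forecast(traj, alpha=0.5)  # predicted popularity for year 7
```

In this sketch each row of `H` can be read as one style, described by the attributes it weights most heavily, and `forecast` holds one predicted popularity value per style. With real data, `A` would be replaced by the classifier's attribute probabilities and `years` by the purchase dates.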