What Are the Right Data for AI?

AI is most likely to succeed when it has high-quality data sources from which to “learn” and classify data in relation to outcomes. However, most clinical data, whether from electronic health records (EHRs) or medical billing claims, remain ill-defined and largely insufficient for effective exploitation by AI techniques. For example, EHR data on demographics, clinical conditions, and treatment plans are generally of low dimensionality and are recorded in limited, broad categorizations (eg, diabetes) that omit specifics (eg, duration, severity, and pathophysiologic mechanism). One potential approach to improving the dimensionality of clinical data sets is to use natural language processing to analyze unstructured data, such as clinician notes. However, many natural language processing techniques are crude, and the necessary specificity is often absent from the clinical record.