It is common in statistical data to have categorical variables, indicating some subdivision of data, such as social class, primary diagnosis, tumor stage, Tanner stage of puberty, etc. Typically, these are input using a numeric code. Such variables should be specified as factors in R. This is a data structure that (among other things) makes it possible to assign meaningful names to the categories. There are analyses where it is essential for R to be able to distinguish between categorical codes and variables whose values have a direct numerical meaning.

The terminology is that a factor has a set of levels, say four levels for concreteness. Internally, a four-level factor consists of two items: (a) a vector of integers between 1 and 4, and (b) a character vector of length 4 containing strings describing what the four levels are.
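
The two-part internal structure can be seen in a short sketch. The data here are hypothetical: a made-up pain score coded 0 to 3 for five patients, with level names chosen purely for illustration. Only the base R functions factor(), as.integer(), and levels() are used.

```r
# Hypothetical pain scores for five patients, entered as numeric codes 0-3
pain <- c(0, 3, 2, 2, 1)

# Convert to a four-level factor: 'levels' lists the codes in order,
# 'labels' gives the (illustrative) names assigned to them
fpain <- factor(pain, levels = 0:3,
                labels = c("none", "mild", "medium", "severe"))

fpain              # prints the level names instead of the numeric codes
as.integer(fpain)  # (a) the internal integer codes: 1 4 3 3 2
levels(fpain)      # (b) the character vector of level names
```

Note that the internal codes run from 1 to 4 even though the input codes ran from 0 to 3; the original numeric values survive only through the mapping given in the levels argument.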