Analysis-ready datasets have been responsibly collected and reviewed so that analysis of the data yields clear, consistent, and error-free results to the greatest extent possible. When working on a research project, take steps to ensure that your data is safe, authentic, and usable.
Because data is often messy, one aim of data management is to clean it before analysis. The following concepts apply when preparing analysis-ready datasets.
Spreadsheets store, display, analyze, and modify data in a two-dimensional grid of rows and columns. Best practices for creating datasets:
Tidy data has become a standard format in the sciences because it lets people easily turn a data table into graphs, analysis, and insight. Dr. Hadley Wickham, Chief Scientist at RStudio and Adjunct Professor of Statistics at University of Auckland, Stanford, and Rice University, coined the term “tidy data” for a structure that minimizes the effort involved in preparing data for visualization and statistical modeling.
A “tidy dataset” has the default structure:
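In Wickham's formulation, each variable forms a column, each observation forms a row, and each type of observational unit forms a table. The sketch below shows one way to reshape a "wide" table into that layout using only the Python standard library. The column names (site, year, count), the values, and the helper to_tidy are hypothetical and chosen only for illustration.

```python
# Hypothetical wide table: one row per site, one column per year.
wide_rows = [
    {"site": "A", "2021": 12, "2022": 15},
    {"site": "B", "2021": 7, "2022": 9},
]


def to_tidy(rows, id_column, variable_name, value_name):
    """Reshape wide rows so that each output row holds a single observation."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every output row
            tidy.append({
                id_column: row[id_column],
                variable_name: column,
                value_name: value,
            })
    return tidy


tidy_rows = to_tidy(wide_rows, id_column="site",
                    variable_name="year", value_name="count")
for row in tidy_rows:
    print(row)
```

After reshaping, year is an ordinary column rather than a set of column headers, so the table can be filtered, grouped, or plotted by year without special handling.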
Validation helps ensure that data is collected correctly. Best practices for validating your datasets:
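As one illustration of validation, the sketch below checks each record against a few simple rules: a required field, a numeric range, and a list of allowed values. The field names (sample_id, ph, treatment), the allowed values, and the function validate_record are assumptions made for this example, not rules prescribed by this guide.

```python
# Hypothetical controlled vocabulary for a categorical field.
ALLOWED_TREATMENTS = {"control", "treated"}


def validate_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    # Required field must be present and non-empty.
    if not str(record.get("sample_id", "")).strip():
        problems.append("sample_id is missing")
    # Numeric field must parse and fall within a plausible range.
    try:
        ph = float(record.get("ph", ""))
    except (TypeError, ValueError):
        problems.append("ph is not a number")
    else:
        if not 0.0 <= ph <= 14.0:
            problems.append("ph is outside 0-14")
    # Categorical field must use an allowed value exactly as spelled.
    if record.get("treatment") not in ALLOWED_TREATMENTS:
        problems.append("treatment is not an allowed value")
    return problems


records = [
    {"sample_id": "S1", "ph": "7.2", "treatment": "control"},
    {"sample_id": "", "ph": "seven", "treatment": "Control"},
]
report = {index: validate_record(rec) for index, rec in enumerate(records)}
for index, problems in report.items():
    print(index, problems or "ok")
```

The same rules can often be enforced at entry time, for example with a spreadsheet's built-in data validation, so that errors are caught when the data is collected rather than afterward.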
Standardization ensures that the data is internally consistent: each data element you collect has the same kind and format throughout. It also helps minimize data collection and analysis errors and prevent inconsistencies. Best practices for standardizing your data:
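A minimal sketch of standardization, assuming dates arrive in a few known formats and that a yes/no field has been entered with inconsistent spellings. The list of accepted formats, the mapping, and the function names are hypothetical; a real project would define them in its own data collection plan.

```python
from datetime import datetime

# Assumed input formats, tried in order. Under this assumption an
# ambiguous date such as 03/04/2021 is read as month/day/year.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")

# Hypothetical mapping from the spellings seen in the data to one form.
YES_NO = {"y": "yes", "yes": "yes", "n": "no", "no": "no"}


def standardize_date(text):
    """Return the date as YYYY-MM-DD, or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def standardize_yes_no(text):
    """Return 'yes' or 'no', or None if the value is not recognized."""
    return YES_NO.get(text.strip().lower())


print(standardize_date("03/04/2021"))
print(standardize_date("20210304"))
print(standardize_yes_no(" Y "))
```

Returning None for unrecognized values, instead of guessing, keeps the problem visible so it can be resolved by a person.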
Before analyzing your data, review the datasets for inaccuracies, inconsistencies, and sensitive data. Cleaning your data allows you to identify outliers or errors before you compile your results. Best practices for cleaning your data:
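The review described here can be partly automated. The sketch below flags missing values and uses the interquartile-range rule (values beyond 1.5 times the IQR from the quartiles) to flag possible outliers. The column names, the sample values, and the 1.5 multiplier are assumptions for illustration. The IQR rule is one common convention, not the only way to define an outlier.

```python
import statistics


def find_missing(rows, required):
    """Return (row_index, column) pairs where a required value is absent."""
    return [
        (index, column)
        for index, row in enumerate(rows)
        for column in required
        if row.get(column) in (None, "")
    ]


def find_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [value for value in values if value < low or value > high]


# Hypothetical measurements; S3 has no mass and S6 looks like an entry error.
rows = [
    {"id": "S1", "mass_g": 10.1},
    {"id": "S2", "mass_g": 9.8},
    {"id": "S3", "mass_g": None},
    {"id": "S4", "mass_g": 10.3},
    {"id": "S5", "mass_g": 10.0},
    {"id": "S6", "mass_g": 55.0},
    {"id": "S7", "mass_g": 9.9},
]

missing = find_missing(rows, required=["id", "mass_g"])
masses = [row["mass_g"] for row in rows if row["mass_g"] is not None]
outliers = find_outliers(masses)
print("missing:", missing)
print("possible outliers:", outliers)
```

Flagged values are candidates for review, not automatic deletions: an extreme value may be a genuine measurement, so check it against the original record before changing or removing it.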
Before analyzing or sharing your data, ensure that you have appropriate documentation. Appropriate documentation facilitates the understanding, analysis, sharing, and reuse of your data. Best practices for documenting your data:
Email dataservices@ucdavis.edu to schedule a consultation related to the organization, storage, preservation, and sharing of data.