Data Science: Wrangling

Harvard University

Learn to process and convert raw data into formats needed for analysis.

1 hr/week, 8 weeks, English, 108,331 enrolled
About this Course

In this course, part of our Professional Certificate Program in Data Science, we cover several standard steps of the data wrangling process, such as importing data into R, tidying data, string processing, HTML parsing, working with dates and times, and text mining. Rarely are all of these wrangling steps necessary in a single analysis, but a data scientist will likely face each of them at some point.

Data is very rarely easy to access in a data science project. It is more likely to sit in a file or a database, or to need extracting from documents such as web pages, tweets, or PDFs. In these cases, the first step is to import the data into R and tidy it using the tidyverse package. The steps that convert data from its raw form to the tidy form are called data wrangling. This process is a critical step for any data scientist. Knowing how to wrangle and clean data will enable you to uncover insights that would otherwise stay hidden.
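
To make "raw form to tidy form" concrete, here is a minimal sketch of one wrangling step: reshaping a wide table (one column per year) into tidy rows (one observation per row). The course itself works in R with the tidyverse; this sketch uses only Python's standard library as an illustrative analogue, not the course's own code. The data values, the `RAW_CSV` name, and the `tidy` helper are all hypothetical.

```python
import csv
import io
import re

# Hypothetical raw data in "wide" form: one column per year.
# The numbers are made up purely for illustration.
RAW_CSV = "country,1960,1961\nGermany,2.41,2.44\nSouth Korea,6.16,5.99\n"


def tidy(raw_text):
    """Reshape wide rows into tidy rows: one (country, year, value) per row."""
    reader = csv.DictReader(io.StringIO(raw_text))
    rows = []
    for record in reader:
        for column, value in record.items():
            # A regular expression picks out the columns whose header is a 4-digit year.
            if re.fullmatch(r"\d{4}", column):
                rows.append(
                    {
                        "country": record["country"],
                        "year": int(column),      # header text becomes a numeric variable
                        "value": float(value),    # cell text becomes a number
                    }
                )
    return rows


if __name__ == "__main__":
    for row in tidy(RAW_CSV):
        print(row)
```

In the tidy result, each variable has its own column and each observation its own row, which is the shape that later filtering, grouping, and plotting steps generally expect.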

What You'll Learn

  • Importing data into R from different file formats
  • Web scraping
  • How to tidy data using the tidyverse to better facilitate analysis
  • String processing with regular expressions (regex)
  • Wrangling data using dplyr
  • How to work with dates and times as file formats
  • Text mining

Instructors

Rafael Irizarry

Professor of Biostatistics

Topics

Data Science
HyperText Markup Language (HTML)
Text Mining
Web Pages
Parsing
Data Wrangling

Course Info

Platform: edX
Level: Beginner
Pacing: Unknown
Certificate: Available
Price: Free to Audit

Skills

Data Science
HyperText Markup Language (HTML)
Text Mining
Web Pages
Parsing
Data Wrangling