Data Science in Human & Social Science for Women Empowerment

Data Science Training

SQL and GitHub

This course briefly introduces the most important programming languages and tools that data scientists use on a daily basis.

It outlines the context and purpose in which they are typically used and presents the most valuable commands for beginners.

SQL has become a cornerstone of modern data management. In this course, we will explore different ways SQL can be used to retrieve data from databases. We will discuss what GitHub is, what features it offers, and how software developers can benefit from it.
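As a small illustration of retrieving data with SQL, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table `participants`, its columns, and its rows are hypothetical and were invented for this example; they are not taken from the course material.

```python
import sqlite3

# In-memory database; the table and its rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE participants (name TEXT, country TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO participants VALUES (?, ?, ?)",
    [("Ana", "ES", 82), ("Bea", "IT", 91), ("Carla", "IT", 77), ("Dana", "DE", 88)],
)

# Filter and sort rows with WHERE and ORDER BY.
high_scores = conn.execute(
    "SELECT name, score FROM participants WHERE score >= 80 ORDER BY score DESC"
).fetchall()

# Summarise rows per group with GROUP BY and aggregate functions.
per_country = conn.execute(
    "SELECT country, COUNT(*), AVG(score) FROM participants "
    "GROUP BY country ORDER BY country"
).fetchall()

print(high_scores)
print(per_country)
conn.close()
```

The first query returns individual rows that match a condition; the second returns one summary row per country.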

At the end of the course, students will know the typical areas of application of these tools and their most common commands.

Introduction to RStudio software

This course introduces the RStudio software. We will cover its history, the computing environment, analysis techniques, and the community, learn how to install it, and explore how to create a project and a notebook in RStudio.

Linear discriminant analysis

In this training module you will be introduced to Linear Discriminant Analysis (LDA). LDA is a method for finding linear combinations of variables that best separate observations into groups or classes; it was originally developed by Fisher (1936).

This method maximizes the ratio of between-class variance to within-class variance in a given data set. Group means are thereby pushed as far apart as possible relative to the spread within each group, which results in maximal separability.
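The idea can be sketched for the simplest case of two classes and two variables. The code below is a minimal illustration using only the Python standard library, not the implementation used in the module: it computes the Fisher direction w = Sw^-1 (mean_a - mean_b), where Sw is the pooled within-class scatter matrix. The function names and the data points are hypothetical.

```python
# Two-class Fisher discriminant for 2-D data (standard library only).
# All data points below are invented for illustration.

def mean(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def scatter(points, m):
    # 2x2 scatter matrix of one class: sum of (x - m)(x - m)^T
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class_a, class_b):
    ma, mb = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    # Pooled within-class scatter Sw = Sa + Sb
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    # w = Sw^-1 * (ma - mb), using the closed-form inverse of a 2x2 matrix
    w = [
        (sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
        (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det,
    ]
    return w, ma, mb

def classify(x, w, ma, mb):
    # Project onto w and compare with the midpoint of the projected class means.
    def project(p):
        return w[0] * p[0] + w[1] * p[1]
    threshold = (project(ma) + project(mb)) / 2
    return "A" if project(x) > threshold else "B"

class_a = [(4.0, 2.0), (5.0, 3.0), (6.0, 2.5), (5.5, 3.5)]
class_b = [(1.0, 1.0), (2.0, 2.0), (1.5, 0.5), (2.5, 1.5)]

w, ma, mb = fisher_direction(class_a, class_b)
print(classify((5.0, 2.0), w, ma, mb))
print(classify((1.5, 1.5), w, ma, mb))
```

Projecting every observation onto the single direction w reduces the two-variable problem to one dimension, on which the two groups are separated by a simple threshold.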

LDA can be used purely for classification, but also for prediction.

Sampling theory

In this training module you will be introduced to the basics of sampling theory. Building on the theory of statistical inference, and more specifically on the tools used to calculate confidence intervals, we will study the procedures for finding optimal sample sizes, which depend on the characteristic to be estimated and the sampling technique used.

In this module we will study the differences between sample-based and population-based data and the most commonly applied sampling techniques: simple and stratified sampling. Additionally, we will explore the rules for finding optimal sample sizes, given the confidence level and margin of error that we want in our inferences.
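As a hedged illustration of how confidence level and margin of error drive the sample size, the sketch below applies the standard formula for estimating a proportion under simple random sampling, n0 = z^2 p(1 - p) / e^2, with an optional finite population correction. The function name and the default p = 0.5 (the most conservative choice) are assumptions made for this example; stratified designs need additional allocation rules that are not shown here.

```python
import math
from statistics import NormalDist

def sample_size_proportion(confidence, margin, p=0.5, population=None):
    """Sample size needed to estimate a proportion with simple random sampling.

    confidence: confidence level, e.g. 0.95
    margin:     desired margin of error, e.g. 0.05
    p:          anticipated proportion (0.5 gives the largest sample size)
    population: population size for the finite population correction, if known
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite population correction.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

print(sample_size_proportion(0.95, 0.05))                   # 385
print(sample_size_proportion(0.95, 0.05, population=1000))  # 278
```

Halving the margin of error roughly quadruples the required sample size, because the margin enters the formula squared.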

Test yourself with the skills you have acquired so far

Generalized linear models: ANOVA

In this module you will be introduced to the basic concepts of one- and two-factor Analysis of Variance (ANOVA), which can be understood as a basic linear model.

In this course you will learn what ANOVA can be used to test, identify the conditions necessary to apply it, perform one-way and multi-factor analyses of variance, and interpret the results obtained.
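To make the one-way case concrete, the sketch below computes the F statistic by hand as the ratio of the between-group mean square to the within-group mean square. It is a minimal illustration with invented data and a hypothetical function name; it does not check the ANOVA assumptions and does not compute a p-value, which requires the F distribution (available, for example, through `scipy.stats.f_oneway`).

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Three invented groups of three observations each.
groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]
f_stat, df_between, df_within = one_way_anova(groups)
print(f_stat, df_between, df_within)
```

A large F value indicates that the group means differ by more than the variation within groups would suggest.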

Introduction to machine learning

This script provides definitions of the fundamental concepts in machine learning, as well as descriptions of the main methods used, including some specific examples and applications. You can choose to read the script at a superficial level, to gain a basic grasp of the field, or read the more in-depth descriptions, in particular the methods section, to obtain an intermediate-level understanding of machine learning.

Statistics and Machine Learning provide the main tools for your work as a data scientist. Understanding the various machine learning methods - how they work, what their main advantages are, and how to evaluate their performance on a given task - can help you make better decisions about when to use them and will make you a more versatile data science expert.

Data Science & Social Impact: Achieving Positive Outcomes

In this course, we will take a look at the many data science applications that can make the world a slightly better place. We will then go into detail on the social media monitoring conducted on behalf of Amnesty International Italy to understand how such an application can work.

In the next section, we will explore some of the harmful effects that data science and AI can have. This will help us understand why there is a need for AI systems to be trustworthy. Finally, we will get familiar with some of the challenges of fairness metrics and see what these metrics can mean in practice.

Text Mining

Text mining is a confluence of natural language processing, data mining, machine learning, and statistics used to mine knowledge from unstructured text.

In this course you will learn what text mining is, the challenges it poses, and its process flow. You will also study text mining techniques, and at the end of the module you will work through a case study in Python.

Test yourself with the skills you have acquired so far

Data visualization - dashboards

This course presents the concept of a dashboard, its structure, the purpose and objectives of building one, and the main types of dashboards.

The last part of the module will be dedicated to a case study.

Cluster Analysis

In this training module you will be introduced to the multidimensional analysis technique called Cluster Analysis, also known as automatic group analysis.

Cluster analyses are used to group statistical units that have characteristics in common and assign them to categories not defined a priori. The groups that are formed must be as homogeneous as possible internally (intra-cluster) and as heterogeneous as possible with respect to one another (inter-cluster).
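One common way to form such groups is the k-means algorithm, sketched below with the Python standard library. This is only an illustration: the module itself works in R and may cover other methods (for example hierarchical clustering), and the data points, the function name, and the choice of starting centroids are all assumptions made for this example.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means for 2-D points, starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Invented data: two visibly separated groups of units.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, [points[0], points[3]])
print(centroids)
print(clusters)
```

Each iteration reduces the distances of units to their own centroid, which is what makes the resulting groups internally homogeneous. The number of groups is not discovered by the algorithm; here it is fixed by the number of starting centroids.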

This type of analysis has applications in many fields, including computer science, medicine, biology, and marketing.

The last part of the module will be dedicated to the application of cluster analysis with the R software.

Correspondence Analysis (CA)

In this training module you will be introduced to the multidimensional analysis technique called Correspondence Analysis (CA).

Correspondence Analysis is a form of multidimensional scaling, which essentially builds a kind of spatial model that shows the associations between a set of categorical variables. If the set includes only two variables, the method is usually called Simple Correspondence Analysis (SCA). If the analysis involves more than two variables, it is usually called Multiple Correspondence Analysis (MCA). In this module we will deal with simple correspondence analysis. Its objective is to reduce the dimensionality of the phenomenon under investigation while preserving the information it contains. The technique is applicable to phenomena measured with qualitative variables.

The last part of the module will be dedicated to the application of CA with the R software.

Principal component analysis (PCA)

This training module presents the multidimensional analysis technique called Principal Component Analysis (PCA). Its objective is to reduce the dimensionality of a phenomenon under investigation while preserving the information contained in it. The technique is applicable to phenomena measured with quantitative variables, which distinguishes it from other dimensionality reduction techniques, such as simple correspondence analysis (CA) or multiple correspondence analysis (MCA), developed for the analysis of qualitative variables.
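The dimensionality reduction can be sketched for the smallest possible case, two quantitative variables reduced to one component. The code below is a minimal standard-library illustration, not the R workflow used in the module: it builds the 2x2 sample covariance matrix, obtains its eigenvalues in closed form, and projects the centred data onto the first principal axis. The function name and the data are hypothetical.

```python
import math

def pca_2d(data):
    """PCA for two variables via the closed-form eigen-decomposition of a 2x2 matrix."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    # Sample covariance matrix [[a, b], [b, c]].
    a = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    c = sum((y - my) ** 2 for _, y in data) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in data) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix: the variances along each component.
    half_trace = (a + c) / 2
    root = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + root, half_trace - root

    # Unit eigenvector belonging to the largest eigenvalue (first principal axis).
    if b != 0:
        v = (b, lam1 - a)
    else:
        v = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    v = (v[0] / norm, v[1] / norm)

    # Scores: coordinates of the centred observations on the first component.
    scores = [(x - mx) * v[0] + (y - my) * v[1] for x, y in data]
    return lam1, lam2, v, scores

# Invented data: two strongly correlated quantitative variables.
data = [(2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1), (6.0, 6.0)]
lam1, lam2, v, scores = pca_2d(data)
explained = lam1 / (lam1 + lam2)
print(round(explained, 4))
```

When the share of variance explained by the first component is close to 1, as with these strongly correlated variables, a single score per observation preserves almost all of the information in the original two variables.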

The last part of the module will be dedicated to the application of PCA with R.

Data Journalism and Storytelling

This course presents the concepts of data journalism and data storytelling and explains them in relation to the world of data. It shows how to combine data science, a field of study characterised by hard skills, with soft skills, and what advantages this combination brings.

Test yourself with the skills you have acquired so far