
2020/2021  KAN-CDSCO1001U  Foundations of Data Science: Programming and Linear Algebra

English Title
Foundations of Data Science: Programming and Linear Algebra

Course information

Language English
Course ECTS 7.5 ECTS
Type Mandatory
Level Full Degree Master
Duration One Semester
Start time of the course Autumn
Timetable Course schedule will be posted at calendar.cbs.dk
Study board
Master of Science (MSc) in Business Administration and Data Science
Course coordinator
  • Raghava Rao Mukkamala - Department of Digitalisation
Main academic disciplines
  • Information technology
  • Statistics and quantitative methods
Teaching methods
  • Blended learning
Last updated on 10-06-2020

Learning objectives
  • Summarize different fundamental concepts, techniques and methods of big data processing
  • Describe and analyse various architectures and platforms for big data processing
  • Design and implement interactive programs in the Python programming language, using its appropriate language features.
  • Demonstrate understanding of the imperative, declarative and object-oriented features of the Python language and know when it is appropriate to use each.
  • Write programs in the Python programming language that make use of external libraries, APIs, etc.
  • Demonstrate basic understanding of mathematical and statistical foundations needed for data mining and machine learning.
  • Exhibit deeper knowledge and understanding of the topics as part of the project; the report should reflect critical awareness of the methodological choices, with writing skills that meet accepted academic standards.
Course prerequisites
There are no formal prerequisites for this course, but some experience with programming and an understanding of basic statistics are recommended. The course is demanding: students will learn new technologies, methods, and the Python programming language, so it requires an interest in and commitment to hands-on learning.
Prerequisites for registering for the exam (activities during the teaching period)
Number of compulsory activities which must be approved (see s. 13 of the Programme Regulations): 3
Compulsory home assignments
Each assignment is 1-3 pages and is completed in groups of 1-4 students.
Students must have 3 out of 5 assignments approved in order to take the exam.

No extra attempts will be offered to students before the ordinary exam.
If a student cannot hand in an assignment due to documented illness, or fails the activity despite making a real attempt to pass it, the student will be given one extra attempt before the re-exam. Before the re-exam, there will be one home assignment (max. 10 pages) covering 3 mandatory assignments.
Examination
Foundations of Data Science: Programming and Linear Algebra:
Exam ECTS 7.5
Examination form Oral exam based on written product

To participate in the oral exam, the written product must be handed in by the set deadline, before the oral exam. The grade is based on an overall assessment of the written product and the individual oral performance.
Individual or group exam Individual oral exam based on written group product
Number of people in the group 2-4
Size of written product Max. 15 pages
Assignment type Project
Duration
Written product to be submitted on specified date and time.
20 min. per student, including the examiners' discussion of the grade and informing the student of and explaining the grade
Grading scale 7-point grading scale
Examiner(s) Internal examiner and second internal examiner
Exam period Winter
Make-up exam/re-exam
Same examination form as the ordinary exam
To participate in the oral exam, the written product must be handed in by the set deadline, before the oral exam. The grade is based on an overall assessment of the written product and the individual oral performance.
Course content, structure and pedagogical approach

This course provides an introduction to three main areas:
Python programming; the mathematical/statistical foundations of data science, such as linear algebra; and big data architectures/platforms.

Furthermore, this course provides knowledge about:

  • Introduction to the Python programming language, including programming basics, Boolean algebra, choice, and repetition
  • Functions, classes, modules, data structures and collections in the Python language
  • Introduction to algorithmic complexity and some simple algorithms and data structures
  • Linear algebra: vectors, vector spaces, basis
  • Matrices, dimension, Gaussian elimination, inner products, and eigenvectors
  • Elementary probability theory and standard distributions
  • Architectures and platforms for big data processing, such as Hadoop, Spark, and distributed file systems
  • Git fundamentals and branching

The course also gives students practical hands-on experience with many of the topics listed above. After completing the course, students will be able to apply various programming constructs in the Python language and will have a good understanding of big data architectures and of the foundational mathematical/statistical theories required for data science courses.

Description of the teaching methods
The course consists of lectures, exercises, and mandatory assignments. The lectures will be delivered online and the hands-on exercise sessions will be conducted on campus. Teaching assistants/instructors will provide technical support during the hands-on exercise sessions.

The presented theories, concepts and methods should be applied in practice in the exercise sessions. The students will work on the mandatory assignments to consolidate their understanding of the concepts and the application of the concepts using the practical skills obtained from the hands-on exercises.
Feedback during the teaching period
In this course, feedback to the students will be provided in the following ways.

1) During the hands-on exercises following each lecture, the students will receive help and feedback in solving the practical hands-on exercises from the teacher and the instructors.

2) At the end of each exercise session, we will go through the solutions to the exercises, discuss various techniques and alternative methods for solving them, and answer any questions from the students.

3) Feedback on the mandatory assignments will be provided to students as part of the grading for the mandatory assignments. Since the mandatory assignments are at the group level, the students will receive collective feedback on their group submission.

Student workload
Lectures 32 hours
Exercises 32 hours
Preparation for class 44 hours
Project work & report 80 hours
Exam and preparation 10 hours
Total 206 hours
Expected literature

The literature may change before the semester starts. Students are advised to find the final literature on Canvas before they buy the books. Notes, scientific articles, chapters and webpages will be handed out/made available during the course.

Textbooks:

[ICPP] John V. Guttag. Introduction to Computation and Programming Using Python. The MIT Press. ISBN-13: 978-0262519632.

[PPICS] John M. Zelle. Python Programming: An Introduction to Computer Science. Franklin, Beedle & Associates; 3rd edition. ISBN-10: 1590282752. ISBN-13: 978-1590282755.

[CMLA] Philip N. Klein. Coding the Matrix: Linear Algebra through Applications to Computer Science. Newtonian Press. ISBN-13: 978-0615880990.

[PG] Scott Chacon, Ben Straub. Pro Git. https://git-scm.com/book/en/v2

[LAP] David C. Lay, Steven R. Lay, Judi J. McDonald. Linear Algebra and Its Applications. Pearson; 5th edition. ISBN-10: 1292092238. ISBN-13: 978-1292092232.

Journal papers:

[1] Singh, Dilpreet, and Chandan K. Reddy. "A survey on platforms for big data analytics." Journal of Big Data 2, no. 1 (2015): 8.

[2] Chen, Min, Shiwen Mao, and Yunhao Liu. "Big data: A survey." Mobile Networks and Applications 19, no. 2 (2014): 171-209.

[3] Fang, Hua, Zhaoyang Zhang, Chanpaul Jin Wang, Mahmoud Daneshmand, Chonggang Wang, and Honggang Wang. "A survey of big data research." IEEE Network 29, no. 5 (2015): 6-9.

[4] Oetiker, Tobias, Hubert Partl, Irene Hyna, and Elisabeth Schlegl. "The not so short introduction to LaTeX 2ε." Electronic document available at https://tobi.oetiker.ch/lshort/lshort.pdf (1995).
