Module Catalogue

Engineering for Data Analysis 2 (COMP0239)

Key information

Faculty
Faculty of Engineering Sciences
Teaching department
Computer Science
Credit value
15
Restrictions
Module delivery for PGT (FHEQ Level 7) available on MSc Artificial Intelligence and Data Engineering; MSc Data Science and Machine Learning; MSc Software Systems Engineering.
Alternative credit options

There are no alternative credit options available for this module.

Description

Aims:

The aims of the module are to provide a background in theoretical software engineering as it applies to large-scale data analysis on parallel systems. Students will be introduced to the principles of designing and developing data science application platforms. The module will teach students the applied, technical details of building and deploying data science applications. On completion of the module, students will be able to develop and write their own large-scale, state-of-the-art machine learning analyses.

Intended learning outcomes:

On successful completion of the module, a student will be able to:

  1. Describe the theoretical principles of large-scale data storage and addressing.
  2. Design and implement large-scale data analysis software.
  3. Explain and evaluate data processing strategies such as Extract, Transform, Load (ETL).
  4. Design effective applications for large-scale data analysis that use machine learning.
  5. Critically evaluate different platforms for deploying data analysis applications (e.g., Hadoop, Spark).
  6. Create novel large-scale data analysis pipelines using distributed programming that can derive new key insights from the data.

Indicative content:

The following are indicative of the topics the module will typically cover:

  • Data Modelling & Extract Transform Load (ETL).
  • Storage of large-scale data in a form suitable for modern data science (parallel file systems, graph stores, document stores, object stores, immutability, and file format design choices, including modern standards such as HDF5 (Hierarchical Data Format)). Semantic databases/triplestores will also be covered.
  • Formalisms for addressing large-scale data (relational-database-like constructs such as Amazon Web Services Athena, redistributable data dictionaries).
  • Programming on distributed systems (e.g., Celery, Hadoop, Slurm).
  • Machine Learning at scale – scaled resources, including accelerators (e.g., Graphics Processing Units, Tensor Processing Units).
  • Workflow management systems (e.g., Common Workflow Language, Storm, Spark, Luigi).

Requisites:

To be eligible to select this module as optional or elective, a student must: (1) be registered on a programme and year of study for which it is formally available; and (2) have completed Engineering for Data Analysis 1 (COMP0235).

Module deliveries for 2024/25 academic year

Intended teaching term: Term 2
Postgraduate (FHEQ Level 7)

Teaching and assessment

Mode of study
In person
Methods of assessment
50% Exam
50% Coursework
Mark scheme
Numeric Marks

Other information

Number of students on module in previous year
6
Module leader
Dr Daniel Buchan
Who to contact for more information
cs.pgt-students@ucl.ac.uk

Last updated

This module description was last updated on 8th April 2024.