I. Basic Information of the Course
Course Number: 202012420020
English Name of the Course: Introduction to Data Analysis
Chinese Name of the Course: 数据分析导论
In-class Hours and Allocation: 32 class hours in total (classroom teaching: 26 class hours; classroom discussion: 6 class hours)
Credit(s): 2
Semester: 2
Applied Discipline (Professional Degree Category): Science and Engineering
Target Students: Academic Master's and Doctoral Students
Evaluation Mode: Project or short paper
Teaching Method: Blended Teaching
Offering Department: College of Mathematical Sciences
II. Prerequisite Courses
Probability Theory and Mathematical Statistics, Linear Algebra
III. The Objectives and Requirements of the Course
Faced with ever-growing data resources, people urgently need powerful tools to analyze data and extract useful information. Data analysis is an emerging interdisciplinary field that brings together statistics, machine learning, databases, artificial intelligence, and other subjects. This course examines the principles of data analysis in depth, integrates the contributions of mathematics, information science, computing science, and statistics, and trains students in scientific research ability and creativity. Through this course, students gain a systematic understanding and mastery of the basic principles, basic methods, and common algorithms of data analysis.
IV. The Content of the Course
Data analysis is the theory and method of analyzing and processing data. It is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in a data set, and it is a foundation of artificial intelligence. The purpose of data analysis is to discover knowledge: knowledge discovery turns data into information and knowledge, finding the gold of knowledge in data, and thereby contributes to knowledge innovation and the knowledge economy. This course comprehensively and systematically introduces the basic methods of data analysis and reflects recent achievements in knowledge discovery research.
This course mainly introduces the basic principles and methods of data analysis, including data and images, feature selection, linear and nonlinear feature extraction, clustering and classification methods, abnormal knowledge discovery methods, and subspace learning for high-dimensional data. These topics are fundamental, cutting-edge research areas in artificial intelligence and data science.
Chapter 1 Introduction to data science and artificial intelligence (2 class hours)
Introduce the relationships among the disciplines related to artificial intelligence and the importance of data science within artificial intelligence. Understand how image and visual information is represented mathematically, and the types of visual forms that data can take. Learn about the current state and direction of science and technology development in the intelligent era.
Chapter 2 Feature selection (6 class hours)
Learn the feature selection task; understand and master the following feature selection methods: feature selection based on the discernibility matrix, on dependency, on knowledge granularity, on information entropy, and on mutual information. Apply the knowledge learned to solve real-world feature selection problems.
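As an illustration of the mutual-information criterion listed above, the following is a minimal Python sketch, not course material. It assumes discrete features and labels and uses the identity I(X; Y) = H(X) + H(Y) - H(X, Y). The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def mutual_information(feature, labels):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two discrete sequences."""
    joint = list(zip(feature, labels))
    return entropy(feature) + entropy(labels) - entropy(joint)

def rank_features(columns, labels):
    """Score each feature by its mutual information with the labels,
    and return the feature names in decreasing order of score."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores

# Hypothetical toy data: "outlook" determines the label, "noise" is independent of it.
columns = {
    "outlook": ["sun", "sun", "rain", "rain"],
    "noise": ["a", "b", "a", "b"],
}
labels = [1, 1, 0, 0]
ranking, scores = rank_features(columns, labels)
```

In this toy data the informative feature is ranked first. A practical selector would also need to handle continuous features and redundancy between features, which this sketch ignores.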
Chapter 3 Feature extraction (6 class hours)
Learn the basic principles and methods of principal component analysis (PCA) and linear discriminant analysis (LDA), and linear feature extraction methods such as locality preserving projection (LPP) and neighborhood preserving embedding (NPE). Understand the basic principles of manifold learning and master global feature extraction methods such as multidimensional scaling (MDS) and isometric mapping (Isomap). Learn typical nonlinear feature extraction methods such as locally linear embedding (LLE), Laplacian eigenmaps, and local tangent space alignment (LTSA). Explore low-dimensional visualization methods for high-dimensional data.
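To make the idea of principal component analysis concrete, here is a small Python sketch under stated assumptions. It estimates only the first principal component, and it swaps in power iteration on the sample covariance matrix in place of a full eigendecomposition. The function names and the toy data are hypothetical, and the sketch assumes the data are not constant and that the starting vector is not orthogonal to the leading direction.

```python
from math import sqrt

def first_principal_component(data, iterations=200):
    """Return (mean, v), where v approximates the unit leading eigenvector
    of the sample covariance matrix, found by power iteration."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # assumed starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def project(data, mean, v):
    """One-dimensional coordinates of each centered point along direction v."""
    return [sum((row[j] - mean[j]) * v[j] for j in range(len(v))) for row in data]

# Hypothetical toy data lying exactly on the line y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
mean, v = first_principal_component(data)
coords = project(data, mean, v)
```

For these points the recovered direction is proportional to (1, 2), and the projected coordinates give a one-dimensional representation of the two-dimensional data.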
Chapter 4 Unsupervised Learning (5 class hours)
Understand the basic principles and methods of unsupervised learning; master model-based, grid-based, partition-based, and density-based clustering methods. Understand and master graph cuts and spectral clustering, and learn to use typical clustering methods to solve practical problems.
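As one example of a partition-based clustering method, the sketch below implements k-means (Lloyd's algorithm) in plain Python. The syllabus does not name a specific algorithm, so the choice of k-means is an assumption made for illustration. The function names, the caller-supplied initial centers, and the toy data are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, initial_centers, max_iterations=100):
    """Lloyd's algorithm: alternate between assigning points to the
    nearest center and moving each center to the mean of its members."""
    centers = [tuple(c) for c in initial_centers]
    labels = []
    for _ in range(max_iterations):
        # Assignment step: index of the nearest center for each point.
        labels = [min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
                  for p in points]
        # Update step: mean of each cluster; an empty cluster keeps its center.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centers.append(
                    tuple(sum(coords) / len(members) for coords in zip(*members)))
            else:
                new_centers.append(old)
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return labels, centers

# Hypothetical toy data: two well-separated groups.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centers = kmeans(points, initial_centers=[(0, 0), (10, 10)])
```

The result depends on the initial centers; practical implementations usually add a randomized or k-means++ style initialization, which is omitted here.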
Chapter 5 Supervised Learning (5 class hours)
Understand the basic ideas and methods of supervised learning, including the Bayes classification method, the k-nearest neighbor classification method, and linear and nonlinear support vector machine classification methods. Discuss the latest methods and developments in semi-supervised learning.
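The k-nearest neighbor classification method mentioned above can be sketched in a few lines of Python. This is an illustrative sketch only, assuming Euclidean distance and simple majority voting. The function names and the toy data are hypothetical.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    training points. On a tied vote, the tied label that appears first
    among the nearest neighbors wins."""
    neighbors = sorted(zip(train_x, train_y),
                       key=lambda pair: sq_dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: class "a" near the origin, class "b" near (5, 5).
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
near_origin = knn_predict(train_x, train_y, (0.5, 0.5))
near_five = knn_predict(train_x, train_y, (5.5, 5.5))
```

The choice of k and of the distance measure both affect the result; in practice they are usually selected by validation.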
Chapter 6 Subspace learning (4 class hours)
Introduce the concepts of vector norms and matrix norms, and grasp the significance of the various vector and matrix norms in data representation. Master the basic principles and methods of soft subspace learning and clustering. Understand and master sparse subspace and low-rank subspace representations based on the principles of compressed sensing. Master solution methods for matrix norm optimization. Learn subspace clustering methods and subspace-based completion methods for missing data.
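For reference, the common vector norms and the Frobenius matrix norm can be written out directly, as in the Python sketch below. It also includes soft thresholding, the proximal operator of the l1 norm, as one building block that often appears in sparse optimization; including it here is an assumption about what the chapter covers. All function names are hypothetical.

```python
from math import sqrt

def l0_norm(v):
    """Number of nonzero entries (a sparsity measure, not a true norm)."""
    return sum(1 for x in v if x != 0)

def l1_norm(v):
    """Sum of absolute values."""
    return sum(abs(x) for x in v)

def l2_norm(v):
    """Euclidean length."""
    return sqrt(sum(x * x for x in v))

def linf_norm(v):
    """Largest absolute entry."""
    return max(abs(x) for x in v)

def frobenius_norm(m):
    """Square root of the sum of squared matrix entries."""
    return sqrt(sum(x * x for row in m for x in row))

def soft_threshold(v, tau):
    """Shrink each entry toward zero by tau; entries with |x| <= tau become 0."""
    result = []
    for x in v:
        shrunk = abs(x) - tau
        if shrunk > 0:
            result.append(shrunk if x > 0 else -shrunk)
        else:
            result.append(0.0)
    return result

v = [3, -4, 0]
m = [[1, 2], [2, 4]]
shrunk = soft_threshold([3, -0.5, 0, -2], 1)
```

Soft thresholding sets small entries exactly to zero, which is why l1-based formulations tend to produce sparse representations.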
Chapter 7 Abnormal knowledge discovery (4 class hours)
Concepts and significance of abnormal knowledge discovery. Brief introduction to the key technologies of granular computing and abnormal knowledge discovery. Basic methods of abnormal knowledge discovery: model-based, distance-based, clustering-based, and density-based knowledge discovery methods. Learn to use common abnormal knowledge discovery methods to solve practical problems.
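Assuming that "abnormal knowledge discovery" here covers what is often called outlier or anomaly detection, a distance-based method can be sketched as follows. Each point is scored by the distance to its k-th nearest neighbor, and points whose score exceeds a threshold are flagged. The function names, the values of k and the threshold, and the toy data are hypothetical, and the sketch requires Python 3.8 or later for `math.dist`.

```python
from math import dist

def knn_distance_scores(points, k=2):
    """Score each point by the distance to its k-th nearest other point."""
    scores = []
    for i, p in enumerate(points):
        distances = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(distances[k - 1])
    return scores

def flag_outliers(points, k=2, threshold=3.0):
    """Indices of points whose k-th nearest neighbor distance exceeds threshold."""
    scores = knn_distance_scores(points, k)
    return [i for i, s in enumerate(scores) if s > threshold]

# Hypothetical toy data: four points in a unit square and one far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_distance_scores(points)
outliers = flag_outliers(points)
```

The threshold is data dependent; density-based methods address this by comparing each score with those of neighboring points instead of using a single global cutoff.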
During theoretical teaching, ideological and political content and case analyses are interspersed with lectures and discussion.
Case 1: In the information age, data has become an important factor of production, a form of social wealth, and even a key resource in competition among countries. The 13th Five-Year Plan proposed implementing a national big data strategy. China took the lead in establishing the national big data center (Guiyang) and has been comprehensively promoting the development and application of big data, accelerating the building of a strong data nation, promoting the opening and sharing of data resources, releasing technology, institutional, and innovation dividends, and promoting economic transformation and upgrading. Big data research and analysis centers in various forms and with complete functions have been established in China. To cultivate more talent, hundreds of colleges and universities in China have established schools and research institutes of big data and artificial intelligence, which are at the forefront internationally.
Case 2: With the arrival of the fourth industrial revolution, countries around the world have recognized that artificial intelligence will be a key field of future competition among countries, and they are competing for the commanding heights of this round of scientific and technological revolution. The iFLYTEK intelligent medical assistant became the first robot in the world to pass the national physician qualification examination. In the 2017 Blizzard Challenge, an international English speech synthesis competition, iFLYTEK won first place for the 12th consecutive time, and it is the only participant in the world whose speech synthesis technology has reached the level of human speech. In an authoritative international AI competition, the intelligent medical technology for pulmonary nodule medical images reached the average level of doctors at Grade III, Class A (top-tier) hospitals.
V. Reference Books, Literature, and Materials
A. Textbooks, Monographs, and References
1. Zhihua Zhou. Machine Learning. Tsinghua University Press, 2017.
2. Changlin Mei, Jincheng Fan. Data Analysis Methods. Higher Education Press, 2006.
3. Xindong Wu, V. Kumar. The Top Ten Algorithms in Data Mining. Tsinghua University Press, 2013.
4. Jiawei Han, Micheline Kamber, Jian Pei. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2011.
5. K. P. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, Cambridge, MA, USA, 2012.
B. Learning Resources
1. https://github.com/dlaptev/RobustPCA
2. https://www.sciencedirect.com/journal/artificial-intelligence
Outline Writer (Signature):
Leader in charge of teaching at the College (Signature):
Date: