
Modern Data Architecture Course: Columbia University Course Study Notes

2023-03-29 01:51:09 Source: Bilibili

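These are study notes I took back when I was in school. They feel pretty immature now, and they are mostly framework-level material, so just read them for fun, hahaha.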

Modern Data Architecture Learning Journal 

1.1 Software Development Life Cycle.

According to The Business Analyst’s Handbook, the Software Development Life Cycle generally has five stages: Initiation, Discovery, Construction, Final V&V, and Closeout. This structure has a lot in common with the Project Management Life Cycle, which was widely applied during my working period. More specifically, methodologies such as Scrum and Waterfall are also built on the basic software and project management life cycles, with different essential parts stressed more heavily.

1.2 Architecture to connect business and technology sides.



The architecture of the modern database is the glue bridging the gap between the two sides. Architecture unites Tech & Biz based on four layers:

Presentation (change management); Logic (insights, process and projects); Data; Sources of data

When I was working in a huge drink manufacturing and marketing organization, the IT department was sharpening the big data platform for vending machine sales information. We had the sources of data and the data itself; the next step was to move up and convert the data into insights and projects, and finally to support changes in the business, organization or management flow, which is what I did as the project manager. This is the internal consulting process, which starts from the sources and moves up to presentation. External consulting projects, generally speaking, start with the problem at the top of the four layers and walk down to the sources of data for a quantitative solution.

1.3 Data Lake and Data Warehouse.

The Data Lake, as a platform that supports a hybrid of structured and unstructured data, is growing in the data industry, with advantages such as processing speed, storage convenience, and agility of application with security. Granted that the traditional data warehouse has been playing the role of infrastructure in the industry, the innovation and replacement brought by the data lake will not happen until the ROI becomes significant for at least medium- and long-term development.

1.4 ROI

If we are talking about data applications and their final outcome for different businesses, organizations and projects, a measurable metric or KPI is essential for the evaluation. ROI, return on investment, is a general number used to conduct that evaluation. Financially speaking, Return on Investment = (Profit/Savings from Investment – Cost of Investment) / Cost of Investment. On the technical side, the calculation is more flexible. We can either focus on the improved efficiency with (Delta – Mod – BI)/KPI, or use ROI = (Delta – (Mod + BI)) / BI.
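The financial formula above can be sketched in a few lines of Python. This is only an illustration: the function name `roi` and the sample figures are assumptions, not from the course, and the technical variants (Delta, Mod, BI) are left out because the notes do not define those terms.

```python
def roi(gain: float, cost: float) -> float:
    """Return on investment as a fraction: (gain - cost) / cost.

    `gain` is the profit or savings from the investment,
    `cost` is the cost of the investment.
    """
    if cost <= 0:
        raise ValueError("cost of investment must be positive")
    return (gain - cost) / cost


# Hypothetical figures, for illustration only.
example = roi(gain=150_000, cost=100_000)
print(f"ROI = {example:.0%}")  # prints "ROI = 50%"
```

A negative result means the investment returned less than it cost.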

For Future Application

In my future career, in which I plan to be a quantitative expert in the marketing and operations areas, the application of data mining, data architecture, and databases should always focus on the four layers of Presentation (change management), Logic (insights, process and projects), Data, and Sources of data. Lines of business, shared services, and enterprise-level applications: these three factors act as initiator, enabler and barrier on all four layers. The three roles of initiator, enabler and barrier should serve as the guiding principles for my professionalism in related industries to inspire and maintain my

Group Project

Currently the group project focuses on business requirements, and what we have built so far are mainly the drivers, motivation and criteria of the project on the business side. We also learned that today the control of technology is shared among a variety of participants, which should be reflected in our group project. The data architecture should be the glue that bridges the gap between the business and technical sides.

In the business requirements document, we need at a minimum to state the opportunity/problem/purpose clearly; ensure completeness by including a description of the anticipated user group, anticipated usage, data volume, up-time requirements, and high-level functionality; and establish success criteria and vocabulary that can be objectively measured. The criteria are stated in clear, unambiguous terms that can be understood by both the business and the tech sides.

To be more specific about completeness, we need the anticipated user group, anticipated usage, data volume, up-time requirements, and high-level functionality. The up-time requirement refers to the total amount of time that the system is available for end-use applications. The value is stated as a percentage of total scheduled working hours.

These are the uptime percentages and corresponding downtime values for customers whose systems must be available all the time (24×365), that is, 8,760 hours per year.

•Less than 90% (downtime of 876 hours or more, about 36 days, per year)

•90 to 95% (downtime of 438 to 876 hours/year)

•95 to 99% (downtime of 88 to 438 hours/year)

•99 to 99.9% (downtime of 8.8 to 88 hours/year)

•99.99% (downtime of about 53 minutes/year)

•99.999% (downtime of about 5 minutes/year)
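The downtime figures in this list follow directly from the 8,760 hours in a 24×365 year. A minimal sketch of the arithmetic, assuming an always-on schedule (the function name `downtime_hours_per_year` is made up for illustration):

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours for an always-on (24x365) system


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Hours of downtime per year implied by an uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (100 - uptime_percent) / 100 * HOURS_PER_YEAR


# Print the downtime for each uptime boundary used in the list above.
for pct in (90, 95, 99, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% uptime: {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

For a system that is only scheduled during working hours, `HOURS_PER_YEAR` would be replaced by the scheduled hours, since the percentage is defined against scheduled time.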

Typically, the cost per outage hour is the determining factor in setting up-time requirements. For unplanned outages, the up-time requirement must be based only on the scheduled working hours, and the cost of an outage should be calculated for the worst possible time at which it could occur.
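As a rough sketch of how cost per outage hour can drive the choice of up-time target, the snippet below estimates the yearly outage cost at a given uptime level. The function name, the parameters and the cost figure are all hypothetical; the hourly cost is taken at the worst possible time, as described above.

```python
def annual_outage_cost(
    uptime_percent: float,
    worst_case_cost_per_hour: float,
    scheduled_hours_per_year: float = 24 * 365,
) -> float:
    """Estimated yearly cost of unplanned outages.

    Downtime is measured against scheduled hours only, and each
    outage hour is priced at its worst-case cost.
    """
    downtime_hours = (100 - uptime_percent) / 100 * scheduled_hours_per_year
    return downtime_hours * worst_case_cost_per_hour


# Hypothetical cost of 1,000 per outage hour, compared across two targets.
for target in (99, 99.9):
    cost = annual_outage_cost(target, worst_case_cost_per_hour=1_000)
    print(f"{target}% uptime: estimated outage cost {cost:,.0f} per year")
```

Comparing that estimate with the cost of engineering for the next uptime tier is one way to decide whether a stricter requirement is worth it.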

Emerging Technologies

4.1 Big data and six sigma

The core of Six Sigma is a philosophy and focus for reducing variability in process operations. It involves process definition and the incorporation of an array of statistical analytic methods to measure the performance of various attributes.

Big data is not a problem of size; the real problem is the "big deal", meaning the gap between the ways you want to manipulate the data and the limited ways in which the data presents itself. In the DMAIC model of Six Sigma, big data gives us more perspectives and methods for evaluating operational issues.

4.2 Social Media like Twitter and Facebook

We know that fund companies like Two Sigma use Twitter to conduct sentiment analysis for the stock market. Facebook is able to tell when users might be in a relationship and when they might break up. Social media gives data mining a way to analyze customized and personal life.
