
Chinese Blogger Conference on SlideShare.net, Today's Slide Story: “Good Architecture, Good Future”

Posted in Grassroots Communication on November 20th, 2008 by Oliver Ding

Today's “Slide Story of the Day” is Fenng's “Good Architecture, Good Future”.

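Fenng is one of the best-known technical bloggers in mainland China. His blog, DBA notes, focuses on data and web architecture and has given people in the industry a great deal of valuable information and insight.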

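This year's Chinese Blogger Conference was fortunate to have him speak on site about how to build a scalable website. Xiaorong (this author) was not there, but judging from attendees' blog posts, the venue, the sound system, and the atmosphere kept the talk from going as well as hoped.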

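The 2008 conference moved toward a more diverse range of topics, and technical talks made up a smaller share of the agenda than in previous years. It is hard to say whether that is a good thing or a bad thing. Some attendees lamented that they could hardly find familiar old friends at the conference: there were more new faces and fewer old friends. Whether that is good or bad is equally hard to say.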

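Still, the conference is four years old now. It is no longer an infant clinging to its parents; it has grown into a slightly mischievous child, curious about the outside world, starting to play with other children, still experimenting and still making mistakes. It has a life of its own.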

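Fenng said he will keep speaking on this subject at other, more technically focused events. For those of us working in web development, the chance to keep learning from him is a real stroke of luck.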

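You can put a website up within 24 hours, but slowing down to plan the infrastructure properly is time well spent: as the saying goes, sharpening the axe does not delay the cutting of firewood. Time spent getting the infrastructure right is how you prepare for the future.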

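Fenng has followed this topic for years. He has not only passed along much of what peers abroad have published about it, but also shared his own work experience and insights. In Xiaorong's view, this is one of the best examples of the value of being a “day-job amateur” (正业余) on social networks. “Day-job amateur” is a new concept Xiaorong coined, inspired by the term “Pro-Am” (专业余). It describes people who, in their spare time, keep sharing on social networks the thinking and experience that come from their formal professional roles. (This is one of Xiaorong's ideas about the relationship between social networks and professional work; more will be shared over time.) Day-job-amateur activity fosters professional exchange within an industry, keeps knowledge flowing, and joins practitioners into a knowledge network without company boundaries. It is these dedicated people, and the close ties among them, that form the communities of practice driving the industry forward and that provide an inexhaustible source of energy for lifelong learning.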

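Xiaorong was lucky enough to meet Fenng in person at the 2006 Chinese Blogger Conference in Hangzhou, and has long subscribed to his blog to learn about databases and web architecture. Of course, Xiaorong is not trying to become a database engineer :) But as members of a web startup team, we need basic knowledge of every layer, so that we can speak the same language as our teammates and communicate without friction.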

Xiaorong also collected some posts of interest into a shared Google Docs file. Starting from Fenng's blog, it gathers the following material on databases and infrastructure:

- 7 English slide decks, with download or online-viewing links;

- 16 English articles on Web 2.0 site architecture and MySQL databases, including case studies from LiveJournal, Flickr, Twitter, mixi.jp, Wikipedia, FeedBurner, Second Life, Bloglines, craigslist, Amazon, Facebook, and others;

- 11 Chinese articles on Web 2.0 site architecture and MySQL databases.

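Compiling this document showed how lively professional exchange is abroad. From Xiaorong's limited observation, professional exchange among mainland bloggers is still quite sparse by comparison: in many industries and specialties, few practitioners are yet doing the kind of day-job-amateur sharing on social networks described above. Hopefully this will change before long.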

The TinyURL for this document is: http://tinyurl.com/57mtft

Xiaorong has also shared the document with Fenng. If you would like to edit it and add more resources, please leave a comment here with your Gmail address.

Since some readers may not be able to open Google Docs, the full text of the document is reproduced below:

References:

Part 1: English PPT:

1. LiveJournal’s Backend: A history of scaling (PDF)
http://www.danga.com/words/2005_oscon/oscon-2005.pdf

2. Capacity Planning for LAMP MySQL Conf 2007 (PPT)
John Allspaw, Engineering Ops Manager, Flickr.com
http://www.scribd.com/doc/40284/Capacity-Planning-for-LAMP-MySQL-Conf-2007

3. Twitter: A Small Talk on Getting Big (PPT)
http://www.slideshare.net/britt/a-small-talk-on-getting-big-113066

4. mixi.jp: Scaling Out with Open Source (PPT)

Batara Kesuma, Chief Technology Officer, mixi, Inc.

Track: Cluster, Replication, and Scale-out
Date: Thursday, April 27
Time: 1:30pm - 2:15pm
Location: Ballroom D

mixi is a Social Networking Site (SNS) that emphasizes communicating functions.

mixi is considered one of the hottest web properties in Japan, currently ranked #50 (worldwide) in the Alexa page view ranking. Since its launch in February 2004, membership now exceeds 3.1 million, with the speed of growth exponentially faster than Yahoo! Japan.

mixi has an estimated 150 million daily page views with more than 400 million MySQL DB queries.

The secret of mixi’s success is its community members — mixi cannot be joined without a registered member’s invitation. About 10,000 people register to join every day.

Login ratio of the mixi is extremely high. About 70% of the members visit mixi once every three days. The average page view per person for one visit estimates around 50-60, all dynamically generated pages.

The presentation will explain about mixi’s large scale-out web architecture and go in-depth discussing the use of commodity hardware and open source software, especially MySQL.

Download PPT:
http://conferences.oreillynet.com/presentations/mysql06/mixi_update.pdf
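The mixi abstract gives daily totals; dividing by the number of seconds in a day turns them into average rates. This is a rough sketch that assumes load is spread evenly over 24 hours (real peaks run well above the average) and treats the “more than 400 million” query figure as exactly 400 million:

```python
# Back-of-the-envelope rates from the mixi abstract.
# Assumptions: traffic is spread evenly over the day (real peaks are higher),
# and "more than 400 million" queries is taken as exactly 400 million.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

daily_page_views = 150_000_000
daily_db_queries = 400_000_000

avg_page_views_per_sec = daily_page_views / SECONDS_PER_DAY
avg_queries_per_sec = daily_db_queries / SECONDS_PER_DAY
queries_per_page_view = daily_db_queries / daily_page_views

print(f"~{avg_page_views_per_sec:,.0f} page views/s")
print(f"~{avg_queries_per_sec:,.0f} queries/s")
print(f"~{queries_per_page_view:.1f} queries per page view")
```

On those assumptions mixi averaged roughly 1,736 page views and 4,630 database queries per second, or about 2.7 queries per dynamically generated page.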

5. Wikimedia architecture (PDF)
http://www.nedworks.org/~mark/presentations/san/Wikimedia%20architecture.pdf

6. Federation at Flickr: Doing Billions of Queries Per Day (PPT)
http://www.scribd.com/doc/2592098/DVPmysqlucFederation-at-Flickr-Doing-Billions-of-Queries-Per-Day

7. FeedBurner Scalability (PPT)
Scalable Web Applications using MySQL and Java
Joe Kottke, Director of Network Operations

View PPT:
http://www.slideshare.net/didip/feed-burner-scalability/

Part 2: English Articles:

1. Scaling Twitter: Making Twitter 10000 Percent Faster

Thu, 01/17/2008 - 16:08 — Todd Hoff

The Stats:
- Over 350,000 users. The actual numbers are, as always, very super super top secret.
- 600 requests per second.
- Average 200-300 connections per second, spiking to 800 connections per second.
- MySQL handled 2,400 requests per second.
- 180 Rails instances. Uses Mongrel as the “web” server.
- 1 MySQL server (one big 8-core box) and 1 slave. The slave is read-only, for statistics and reporting.
- 30+ processes for handling odd jobs.
- 8 Sun X4100s.
- Processes a request in 200 milliseconds in Rails. Average time spent in the database is 50-100 milliseconds.
- Over 16 GB of memcached.

Twitter’s API Traffic is 10x Twitter’s Site
- Their API is the most important thing Twitter has done.
- Keeping the service simple allowed developers to build on top of their infrastructure and come up with ideas that are way better than Twitter could come up with. For example, Twitterrific, which is a beautiful way to use Twitter that a small team with different priorities could create.

Treat your scaling plan like a business plan. Assemble a board of advisers to help you.

Don’t make the database the central bottleneck of doom. Not everything needs to require a gigantic join. Cache data. Think of other creative ways to get the same result. A good example is discussed in Twitter, Rails, Hammers, and 11,000 Nails per Second.

Turn your website into an open service by creating an API. Their API is a huge reason for Twitter’s success. It allows users to create an ever-expanding ecosystem around Twitter that is difficult to compete with. You can never do all the work your users can do, and you probably won’t be as creative. So open your application up and make it easy for others to integrate your application with theirs.

Read the full article:
http://highscalability.com/scaling-twitter-making-twitter-10000-percent-faster
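Twitter's advice not to make the database the central bottleneck usually starts with caching. Below is a minimal cache-aside sketch, assuming a plain in-process dict as a stand-in for memcached and a hypothetical `load_timeline` function in place of an expensive database join; it is not Twitter's code:

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Cache-aside lookups with a TTL; a dict stands in for memcached (assumption)."""

    def __init__(self, loader: Callable[[str], object], ttl_seconds: float = 60.0):
        self._loader = loader  # slow path, e.g. a database query
        self._ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, object]] = {}  # key -> (expires_at, value)
        self.misses = 0

    def get(self, key: str) -> object:
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: the database is not touched
        self.misses += 1
        value = self._loader(key)  # miss or expired: go to the database
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        # Call after a database write so the next read reloads fresh data.
        self._store.pop(key, None)


def load_timeline(user: str) -> list:
    # Hypothetical stand-in for an expensive join.
    return [f"status from someone {user} follows"]


timelines = CacheAside(load_timeline, ttl_seconds=30)
timelines.get("alice")
timelines.get("alice")
print(timelines.misses)  # 1
```

Reads go to the cache first and only a miss touches the database; writers call `invalidate` after updating the database so the next read reloads fresh data.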

2. Web 2.0 and Databases Part 1: Second Life

Like everybody else, we started with One Database All Hail The Central Database, and have subsequently been forced into clustering. However, we’ve eschewed any of the general purpose cluster technologies (mysql cluster, various replication schemes) in favor of explicit data partitioning. So, we still have a central db that keeps track of where to find what data (per-user, for instance), and N additional dbs that do the heavy lifting. Our feeling is that this is ultimately far more scalable than black-box clustering. Right now we’re still in the transition process, so we remain vulnerable to overload. As Cory mentioned, we’re moving to an HTTP-based internal communication model in order to improve our flexibility.

I think the biggest lesson we learned is that databases need to be treated as a commodity. Standardized, interchangeable parts are far better in the long run than highly-optimized, special-purpose gear. Web 2.0 applications will require more horsepower with less money than One Database or his big brother One Cluster All Hail The Central Cluster will offer. (After all, a 64-way Mysql Cluster installation is just the budget-friendly version of a Sun E-10000.) Unfortunately, this seems to be the minority view, at least if the dearth of automated db provisioning tools is any indication.

Read the full article:
http://radar.oreilly.com/archives/2006/04/web-20-and-databases-part-1-se.html
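The Second Life excerpt describes explicit partitioning: a central database that records where each user's data lives, plus N databases that hold the data. The sketch below models that lookup-table idea in memory. The class name, the least-loaded placement rule, and the dict-based “shards” are all assumptions for illustration, not Linden Lab's design:

```python
from typing import Dict, List


class PartitionDirectory:
    """Hypothetical sketch: a central directory maps each user to a shard id;
    each shard (here just a dict) holds the users' actual records."""

    def __init__(self, shard_count: int):
        self.directory: Dict[str, int] = {}  # the "central db": user -> shard id
        self.shards: List[Dict[str, dict]] = [{} for _ in range(shard_count)]

    def _assign(self, user: str) -> int:
        # Place a new user on the shard currently holding the fewest users.
        shard_id = min(range(len(self.shards)), key=lambda i: len(self.shards[i]))
        self.directory[user] = shard_id
        return shard_id

    def shard_for(self, user: str) -> int:
        if user in self.directory:
            return self.directory[user]
        return self._assign(user)

    def put(self, user: str, record: dict) -> None:
        self.shards[self.shard_for(user)][user] = record

    def get(self, user: str) -> dict:
        return self.shards[self.directory[user]][user]

    def move(self, user: str, new_shard: int) -> None:
        # Rebalance one user: copy the data, then flip the directory entry.
        old_shard = self.directory[user]
        self.shards[new_shard][user] = self.shards[old_shard].pop(user)
        self.directory[user] = new_shard


users = PartitionDirectory(shard_count=3)
for name in ["ann", "bob", "cat", "dan"]:
    users.put(name, {"name": name})
print(users.directory)  # {'ann': 0, 'bob': 1, 'cat': 2, 'dan': 0}
```

Because the mapping is stored explicitly rather than computed from a hash of the user ID, `move` only has to copy one user's data and update a single directory entry.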

3. Database War Stories #2: bloglines and memeorandum

Gabe wrote: “I didn’t bother with databases because I didn’t need the added complexity… I maintain the full text and metadata for thousands of articles and blog posts in core. Tech.memeorandum occupies about 600M of core. Not huge.”

Mark wrote: “Bloglines has several data stores, only a couple of which are managed by “traditional” database tools (which in our case is Sleepycat). User information, including email address, password, and subscription data, is stored in one database. Feed information, including the name of the feed, description of the feed, and the various URLs associated with feed, are stored in another database. The vast majority of data within Bloglines however, the 1.4 billion blog posts we’ve archived since we went on-line, are stored in a data storage system that we wrote ourselves. This system is based on flat files that are replicated across multiple machines, somewhat like the system outlined in the Google File System paper, but much more specific to just our application. To round things out, we make extensive use of memcached to try to keep as much data in memory as possible to keep performance as snappy as possible.”

Read the full article:
http://radar.oreilly.com/archives/2006/04/database-war-stories-2-bloglin.html

4. Database War Stories #3: Flickr

I also asked Cal: “I’m particularly interested in how the folksonomy model intersects with the traditional database. How do you manage a tag cloud? A lot of ideas about how databases are supposed to look start to go by the wayside…” He replied:

“tags are an interesting one. lots of the ‘web 2.0’ feature set doesn’t fit well with traditional normalised db schema design. denormalization (or heavy caching) is the only way to generate a tag cloud in milliseconds for hundreds of millions of tags. you can cache stuff that’s slow to generate, but if it’s so expensive to generate that you can’t ever regenerate that view without pegging a whole database server then it’s not going to work (or you need dedicated servers to generate those views - some of our data views are calculated offline by dedicated processing clusters which save the results into mysql).

federating data also means denormalization is necessary - if we cut up data by user, where do we store data which relates to two users (such as a comment by one user on another user’s photo)? if we want to fetch it in the context of both users, then we need to store it in both shards, or scan every shard for one of the views (which doesn’t scale). we store a lot of data twice, but then there’s the issue of it going out of sync. we can avoid this to some extent with two-step transactions (open transaction 1, write commands, open transaction 2, write commands, commit 1st transaction if all is well, commit 2nd transaction if 1st committed) but there’s still a chance for failure when a box goes down during the 1st commit.”

Read the full article:
http://radar.oreilly.com/archives/2006/04/database-war-stories-3-flickr.html
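Cal Henderson's two-step transaction order (open 1, write, open 2, write, commit 1, then commit 2) can be sketched with two in-memory SQLite databases standing in for two users' MySQL shards. The table layout and function names are hypothetical. As the quote warns, this is not a true two-phase commit: a crash between the two COMMITs leaves the shards out of sync.

```python
import sqlite3

INSERT_SQL = "INSERT INTO comments (photo_owner, author, body) VALUES (?, ?, ?)"


def open_shard() -> sqlite3.Connection:
    # One in-memory SQLite database per user shard (stand-in for MySQL).
    # isolation_level=None disables implicit transactions, so BEGIN/COMMIT are explicit.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE comments (photo_owner TEXT, author TEXT, body TEXT)")
    return conn


def store_comment_twice(owner_db, author_db, owner, author, body):
    """Denormalize one comment into both users' shards using the two-step order."""
    row = (owner, author, body)
    owner_db.execute("BEGIN")  # open transaction 1
    try:
        owner_db.execute(INSERT_SQL, row)  # write commands
        author_db.execute("BEGIN")  # open transaction 2
        try:
            author_db.execute(INSERT_SQL, row)  # write commands
        except sqlite3.Error:
            author_db.execute("ROLLBACK")
            raise
    except sqlite3.Error:
        owner_db.execute("ROLLBACK")  # nothing committed yet, so undo shard 1 too
        raise
    owner_db.execute("COMMIT")  # commit 1st transaction if all is well
    # A crash right here leaves the comment in shard 1 only: the failure window.
    author_db.execute("COMMIT")  # commit 2nd transaction once the 1st committed


alice_shard, bob_shard = open_shard(), open_shard()
store_comment_twice(alice_shard, bob_shard, "alice", "bob", "nice photo")
```

If either write fails before the first COMMIT, both transactions roll back and neither shard keeps the comment; only the gap between the two COMMITs is unprotected.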

5. Database War Stories #4: NASA World Wind

Patrick Hogan of NASA World Wind, an open source program that does many of the same things as Google Earth, uses both flat files and SQL databases in his application. Flat files are used for quick response on the client side, while on the server side, SQL databases store imagery (and, soon, vector files). However, he admits that “using file stores, especially when a large number of files are present (millions) has proven to be fairly inconsistent across multiple OS and hardware platforms.”

I asked: “Tell me about your database architecture for NASA World Wind.” Patrick replied:

“What appears to the user as a single image of a very large physical range really consists of millions of images. In an application like World Wind, which displays many different kinds of large ranges, the database must hold billions of images (Gigaimages) or references to them. Each image, although typically ~20KB can also be megabytes in size.

Demand on the image-serving database is bursty and intense. Dozens of images could be needed immediately with each small change in the user’s view direction.

On the client side of World Wind, there is very limited use of traditional SQL-based databases. The client depends mostly on flat-file stores to maintain data. However, on the server side of things, World Wind enabled servers have relied on SQL-based databases to store imagery and will in the near future deliver vector-based data via the WFS protocol. World Wind already delivers data via WMS.”

Read the full article:
http://radar.oreilly.com/archives/2006/04/database-war-stories-4-nasa-wo.html

6. Database War Stories #5: craigslist

Craig showed a slide (which helped inspire my postings about asymmetric competition [1, 2, 3]) that listed the number of employees at the top ten web sites. Most of them have thousands of employees. Some have tens of thousands. Craigslist, at #7 on the list, has 19.

Eric’s email has that embattled “news from the front” feel that you might expect from a site handling that much traffic with only 19 employees!

“…mysql upgrades can be the best thing ever [but can also] make you hate yourself.

We upgraded our search clusters to 4.1x a while back and got a huge performance boost from 4.0. there were no notes in the change log that fulltext indexing had been touched but it surely rocked.

We once rolled out a minor revision (4.0.x to 4.0.x++) and query optimization flipped over on its head. It seemed fine in testing. It suddenly was choosing completely different indexes than the prior version, but only in some cases. So it hit the live site and bad things happened.”

Read the full article:
http://radar.oreilly.com/archives/2006/04/database-war-stories-5-craigsl.html

7. Database War Stories #6: O’Reilly Research

Finding usable information in large, unstructured data sets. This is a relatively new problem for business. Not too long ago, data warehouses tended to store structured operational data or clickstreams from web activity, with good keys and controlled data entry. Nowadays we’re building data warehouses w/ jobs, blogs and other unstructured data that requires different techniques. To look for trends in unstructured data we need to use techniques similar to those used for effective search. Cleverness is a plus.

MySQL databases configured for transactions will perform slowly if used for business intelligence (and vice versa). I moved some queries from a transaction system to a data mart system and saw 30-50% faster performance. For a mart you want to use big buffers, big pages and process as much data as possible for each step (handle batches). Transaction systems are typically optimized to handle many small, unrelated data bits (handle one thing well at a time). My experience is that it’s difficult to integrate transaction and analysis oriented tasks on the same box.

Read the full article:
http://radar.oreilly.com/archives/2006/05/database-war-stories-6-oreilly.html
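The O’Reilly Research excerpt contrasts row-at-a-time transactional work with batch-oriented data-mart loads. Here is a small sketch of the batch side, using SQLite in place of MySQL: rows go in through one `executemany` call and one commit per batch, rather than one statement and one commit per row. The table name and batch size are assumptions, and server-side settings such as the “big buffers, big pages” the author mentions are not modeled:

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, int]  # (term, count): hypothetical mart row layout


def batches(rows: Iterable, size: int) -> Iterator[List]:
    # Yield successive chunks of up to `size` rows.
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def load_mart(conn: sqlite3.Connection, rows: Iterable[Row], batch_size: int = 10_000) -> None:
    """Bulk-load rows with one executemany and one commit per batch,
    instead of one INSERT and one commit per row."""
    conn.execute("CREATE TABLE IF NOT EXISTS mentions (term TEXT, n INTEGER)")
    for chunk in batches(rows, batch_size):
        with conn:  # commits the batch on success, rolls it back on error
            conn.executemany("INSERT INTO mentions VALUES (?, ?)", chunk)


conn = sqlite3.connect(":memory:")
load_mart(conn, ((f"term{i % 3}", 1) for i in range(25_000)))
totals = conn.execute("SELECT term, SUM(n) FROM mentions GROUP BY term ORDER BY term").fetchall()
```

The generator input means the loader never holds more than one batch in memory, while each commit still covers thousands of rows.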

8. Database War Stories #7: Google File System and BigTable

“Interesting discussion. I don’t have much to add. I’ve been working with a number of other people here at Google on building a large-scale storage system for structured and semi-structured data called BigTable. It’s designed to scale to hundreds or thousands of machines, and to make it easy to add more machines to the system and automatically start taking advantage of those resources without any reconfiguration. We don’t have anything published about it yet, but there’s a public talk about BigTable that I gave at University of Washington last November available on the web (try some searches for bigtable or view the talk).”

Read the full article:
http://radar.oreilly.com/archives/2006/05/database-war-stories-7-google.html

9. TALK: BigTable: A System for Distributed Structured Storage

BigTable is a system for storing and managing very large amounts of structured data. The system is designed to manage several petabytes of data distributed across thousands of machines, with very high update and read request rates coming from thousands of simultaneous clients. In this talk, Jeff Dean, Google, discusses the basic design of BigTable and its implementation, provides some performance measurements, and outlines some current applications of the system. He also touches on Google’s future goals and directions for the system.

Read the full article:
http://www.uwtv.org/programs/displayevent.aspx?rID=4188

10. Bigtable: A Distributed Storage System for Structured Data

Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this paper we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.

Read the full article:
http://labs.google.com/papers/bigtable.html

Download: PDF Version (http://labs.google.com/papers/bigtable-osdi06.pdf)
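The Bigtable abstract mentions a “simple data model”; the paper describes it as a sorted map indexed by row key, column key, and timestamp, whose values are uninterpreted byte strings. The toy below models only that map shape in a single process, using the paper's Webtable example of reversed-URL row keys. It has no tablets, distribution, or persistence, and the class and method names are hypothetical:

```python
import bisect
from typing import Dict, List, Optional, Tuple


class ToyTable:
    """In-memory toy of the Bigtable data model:
    (row: str, column: str, timestamp: int) -> bytes, with rows kept sorted."""

    def __init__(self) -> None:
        self._rows: List[str] = []  # sorted row keys
        self._cells: Dict[str, Dict[str, List[Tuple[int, bytes]]]] = {}

    def put(self, row: str, column: str, ts: int, value: bytes) -> None:
        if row not in self._cells:
            bisect.insort(self._rows, row)
            self._cells[row] = {}
        versions = self._cells[row].setdefault(column, [])
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)  # newest version first

    def get(self, row: str, column: str, ts: Optional[int] = None) -> Optional[bytes]:
        # Newest version, or the newest version at or before `ts`.
        for version_ts, value in self._cells.get(row, {}).get(column, []):
            if ts is None or version_ts <= ts:
                return value
        return None

    def scan(self, start: str, end: str) -> List[str]:
        # Sorted row keys make a range scan a contiguous slice [start, end).
        lo = bisect.bisect_left(self._rows, start)
        hi = bisect.bisect_left(self._rows, end)
        return self._rows[lo:hi]


t = ToyTable()
t.put("com.cnn.www", "contents:", 3, b"<html>v3")
t.put("com.cnn.www", "contents:", 5, b"<html>v5")
t.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
t.put("org.example", "contents:", 1, b"<html>")
print(t.get("com.cnn.www", "contents:"))        # b'<html>v5'
print(t.get("com.cnn.www", "contents:", ts=4))  # b'<html>v3'
print(t.scan("com.", "com.~"))                  # ['com.cnn.www']
```

Because row keys stay sorted, reversing host names keeps pages from the same domain next to each other, so a prefix such as `com.` becomes a contiguous range scan.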

11. Database War Stories #8: Findory and Amazon

On Findory, our traffic and crawl is much smaller than sites like Bloglines, but, even at our size, the system needs to be carefully architected to be able to rapidly serve up fully personalized pages for each user that change immediately after each new article is read.

Our read-only databases are flat files — Berkeley DB to be specific — and are replicated out using our own replication management tools to our webservers. This strategy gives us extremely fast access from the local filesystem. We make thousands of random accesses to this read-only data on each page serve; Berkeley DB offers the performance necessary to be able to still serve our personalized pages rapidly under this load.

Our much smaller read-write data set, which includes information like each user’s reading history, is stored in MySQL. MySQL MyISAM works very well for this type of non-critical data since speed is the primary concern and more sophisticated transactional support is not important. While it has not been necessary yet, our intention is to scale our MySQL database with horizontal partitioning, though we may also experiment with MySQL Cluster as well.

For both read-only and read-write data, we have been careful to keep the data formats compact to ensure that the total active data set can easily fit in main memory. We attempt to avoid as much disk access as we can.

After all of this, Findory is able to serve fully personalized pages, different for each reader, that change immediately when each person reads a new article, all in well under 100ms. People don’t like to wait. We believe getting what people need quickly and reliably is an important part of the user experience.

Read the full article:
http://radar.oreilly.com/archives/2006/05/database-war-stories-8-findory-1.html
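Findory keeps read-only data in local Berkeley DB files copied to each webserver, so the many lookups per page are local reads. Berkeley DB bindings are not in Python's standard library, so this sketch uses `dbm.dumb` as a stand-in key-value file. The copy/replication step (Findory used its own tools) is omitted, and the key names are hypothetical:

```python
import dbm.dumb
import os
import tempfile


def build_readonly_store(path: str, items: dict) -> None:
    # Built once on a staging machine, then copied to every webserver (copy step omitted).
    with dbm.dumb.open(path, "n") as db:  # "n": always start a new, empty database
        for key, value in items.items():
            db[key.encode()] = value.encode()


def lookup_many(path: str, keys: list) -> list:
    # On a webserver: open the local file read-only and serve many random lookups.
    with dbm.dumb.open(path, "r") as db:
        return [db.get(key.encode(), b"").decode() for key in keys]


store_path = os.path.join(tempfile.mkdtemp(), "related_articles")
build_readonly_store(store_path, {"article:1": "2,7,9", "article:2": "1,3"})
print(lookup_many(store_path, ["article:2", "article:404"]))  # ['1,3', '']
```

Opening the file once per request batch and doing all lookups locally mirrors the trade-off in the excerpt: updates become a publish-and-copy step, in exchange for reads that never leave the machine.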

12. Early Amazon: Splitting the website

We designed a rough architecture for the system. There would be two staging servers, development and master, and then a fleet of online webservers. The staging servers were largely designed for backward compatibility. Developers would share data with development when creating new website features. Customer service, QA, and tools would share data with master. This had the added advantage of making master a last wall of defense where new code and data would be tested before it hit online.

Read-only data would be pushed out through this pipeline. Logs would be pulled off the online servers. For backward compatibility with log processing tools, logs would be merged so they looked like they came from one webserver and then put on a fileserver.

Stepping out for a second, this is a point where we really would have liked to have a robust, clustered, replicated, distributed file system. That would have been perfect for read-only data used by the webservers.

Read the full article:
http://glinden.blogspot.com/2006/02/early-amazon-splitting-website.html
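The early-Amazon excerpt describes merging logs from many webservers so they look like they came from one. If each server's log is already in time order, that is a k-way merge. A sketch with Python's `heapq.merge`, assuming each log line has already been parsed into a hypothetical `(timestamp, message)` tuple:

```python
import heapq
from typing import Iterable, Iterator, Tuple

LogLine = Tuple[float, str]  # (unix timestamp, message): assumed log format


def merge_logs(*per_server_logs: Iterable[LogLine]) -> Iterator[LogLine]:
    """Merge per-webserver logs, each already sorted by time, into one
    time-ordered stream. heapq.merge is lazy: it keeps only one pending
    line per input in memory, not whole files."""
    return heapq.merge(*per_server_logs, key=lambda line: line[0])


web1 = [(1.0, "GET /a"), (4.0, "GET /d")]
web2 = [(2.0, "GET /b"), (3.0, "GET /c")]
print([msg for _, msg in merge_logs(web1, web2)])  # ['GET /a', 'GET /b', 'GET /c', 'GET /d']
```

Downstream tools then read a single ordered stream, which is what the backward-compatibility requirement in the excerpt asks for.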

13. Database War Stories #9 (finis): Brian Aker of MySQL Responds

I agree about the common design patterns, but I didn’t hear that flat files don’t scale. What I heard is that some very big sites are saying that traditional databases don’t scale, and that the evolution isn’t from flat files to SQL databases, but from flat files to sophisticated custom file systems. Brian acknowledges that SQL vendors haven’t solved the problem, but doesn’t seem to think that anyone else has either.

“Predictably the solution was to partition the database with one master database for lookups to find out where the actual database holding the real data was. AKA I suggested that they partition their data, and as is often the case their data partitioned quite easily. This is the sort of use case I see over and over again. There is a talk I’ve been giving for years on how people lay out their database environment, it’s been interesting to watch what the converging use cases are, and every time I give the talk I find new insights on how people are creating clusters/creating scale out.

Reading through the comments you got on your blog entry, these users are hitting on the same design patterns. There are very common design patterns for how to scale a database, and few sites really turn out to be all that original. Everyone arrives at certain truths, flat files with multiple dimensions don’t scale, you will need to partition your data in some manner, and in the end caching is a requirement.”

Read the full article:
http://radar.oreilly.com/archives/2006/05/brian-aker-of-mysql-responds.html

14. Wikimedia architecture
Wed, 08/22/2007 - 23:56 — Todd Hoff

Wikimedia is the platform on which Wikipedia, Wiktionary, and the other seven wiki dwarfs are built. This document is just excellent for the student trying to scale the heights of giant websites. It is full of details and innovative ideas that have been proven on some of the most used websites on the internet.

Read the full article:
http://highscalability.com/wikimedia-architecture

15. Engineering@Facebook’s notes: Scaling Out

by Jason Sobel (notes) Wednesday, August 20, 2008 at 11:05am

With the network and hardware in place we set up our standard 3 tier architecture: web server, memcache server, and MySQL database. The MySQL databases in Virginia were going to run as slaves of the west coast databases, so we spent a couple weeks copying all the data across the country and setting up replication streams.

Now that the hardware, network, and basic infrastructure was set up it was time to face the two main application level challenges: cache consistency and traffic routing.

Fortunately, the solution is a lot easier to explain than the problem. We made a small change to MySQL that allows us to tack on extra information in the replication stream that is updating the slave database. We used this feature to append all the data objects that are changing for a given query and then the slave database “sees” these objects and is responsible for deleting the value from cache after it performs the update to the database.

How’d we do it? MySQL uses a lex parser and a yacc grammar to define the structure of a query and then parse it.

Read the full article:
http://www.new.facebook.com/note.php?note_id=23844338919&id=9445547199&index=016.
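Jason Sobel's excerpt explains that the replica, not the master, deletes cached values, and only after applying the replicated update. This toy models that ordering only: a `ReplicationEvent` carries the SQL plus the cache keys it affects, and the replica drops those keys after “applying” the write. All names here are hypothetical; the real change lived inside MySQL's query parser and replication stream:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReplicationEvent:
    sql: str  # the statement to replay on the replica
    invalidate: List[str] = field(default_factory=list)  # extra info: cache keys it touches


class Replica:
    """Toy replica: applies a replicated write, THEN deletes the affected keys
    from the local cache."""

    def __init__(self, cache: Dict[str, str]):
        self.cache = cache
        self.applied: List[str] = []

    def apply(self, event: ReplicationEvent) -> None:
        self.applied.append(event.sql)  # stand-in for executing the SQL locally
        for key in event.invalidate:  # invalidate only after the update has landed
            self.cache.pop(key, None)


east_cache = {"user:42:name": "Old Name", "user:7:name": "Someone"}
replica = Replica(east_cache)
replica.apply(ReplicationEvent("UPDATE users SET name='New' WHERE id=42", ["user:42:name"]))
print(east_cache)  # {'user:7:name': 'Someone'}
```

Deleting on the replica after the update closes the gap in which a reader near the replica could miss the cache, read the not-yet-updated replica, and put the stale value back into the cache.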

16. Yacc: Yet Another Compiler-Compiler

Computer program input generally has some structure; in fact, every computer program that does input can be thought of as defining an “input language” which it accepts. An input language may be as complex as a programming language, or as simple as a sequence of numbers. Unfortunately, usual input facilities are limited, difficult to use, and often are lax about checking their inputs for validity.

Yacc provides a general tool for describing the input to a computer program. The Yacc user specifies the structures of his input, together with code to be invoked as each such structure is recognized. Yacc turns such a specification into a subroutine that handles the input process; frequently, it is convenient and appropriate to have most of the flow of control in the user’s application handled by this subroutine.

Read the full article:
http://dinosaur.compilertools.net/yacc/index.html

Part 3: Chinese Articles:

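1. A Look at Technorati's Backend Database Architecture

Dorion Carroll of Technorati (currently blocked, so you may not be able to access it) described Technorati's backend database architecture at the 2006 MySQL Users Conference.

Full article: http://www.dbanotes.net/web/technorati_db_arch.html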

2. mixi.jp: A Scalable SNS Site Built with Open-Source Software
Full article: http://www.example.net.cn/archives/2006/06/mixijpeoaoeiiae.html

3. FeedBurner: Scalable Web Applications Based on MySQL and Java
Full article: http://www.example.net.cn/archives/2006/06/feedburneruoumy.html

4. Craigslist's Database Architecture

Tim O’Reilly interviewed Eric Scheide of Craigslist, and through the resulting post, Database War Stories #5: craigslist, we can learn about Craigslist's database architecture and data volumes.

Full article: http://www.dbanotes.net/database/craigslist_database_arch.html

5. LinkedIn Architecture Notes

LinkedIn CTO Jean-Luc Vaillant gave a talk at QCon titled “Linked-In: Lessons learned and growth and scalability”. Too good to miss, so here is a blog post recording it.

Full article: http://www.dbanotes.net/arch/linkedin.html

6. LinkedIn Architecture and Development Process
Full article: http://www.dbanotes.net/arch/linkedin_soa.html

7. Facebook's Scaling Out Experience

Today I read Scale Out, in which engineer Jason Sobel describes how Facebook dealt with network latency in cross-region MySQL replication.

Full article: http://www.dbanotes.net/arch/facebook_scaling_out.html

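8. Talking About Swap, Starting from Flickr's DB Server Configuration

I reread this PPT, Federation at Flickr: Doing Billions of Queries Per Day, and found it still worth chewing over, even though this “sugarcane” has already been chewed.

Full article: http://www.dbanotes.net/arch/flickr_db_swap.html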

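9. Learning from Flickr's LAMP-Based Capacity Planning Experience

Lately I have come across quite a lot of good material worth studying together. At MySQL Conf 2007, Flickr's John Allspaw gave a technical talk titled Capacity planning for LAMP (PDF download). Talk of capacity planning can feel a little abstract, but this PPT still covers a good deal of Flickr's website operations experience.

Full article: http://www.dbanotes.net/web/flickr_lamp_capacity_planning.html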

10. Feasibility Analysis of MySQL for Large Enterprise Applications (Part 1)
Full article: http://www.dbanotes.net/database/mysql_comment.html

11. Learning from and Sharing Wikipedia's Technical Architecture
Full article: http://www.dbanotes.net/opensource/wikipedia_arch.html

Updated Nov 19, 2008

Fenng has also put together a Douban list (豆列) on this topic that recommends related books; see: Essential Books for Web 2.0 Site Architecture

For more, see Fenng's blog: http://www.dbanotes.net