Mining the Web: Reading Group Schedule |
Each class period is split into slots of about 30 minutes, so each presenter should plan to speak for roughly 30 minutes. The schedule will be adjusted as we go, and dates may shift depending on progress. There will be no session on May 8 or June 26. Summarize your assigned material in about five A4 pages. An LCD projector will be available, and you are welcome to use it for your presentation. |
Those who want credit without presenting in the reading group should read one of the papers below (all related to Chapter 8 of the book) and send a report summarizing it (at least 10 A4 pages) by e-mail to moris@k.u-tokyo.ac.jp by August 31, 2003. |
M. Hersovici, M. Jacovi, Y. S. Maarek, D. Pelleg, M. Shtalheim and S. Ur. The Shark-Search algorithm --- an application: tailored Web site mapping. 7th World-Wide Web Conference, Brisbane, Australia, April 1998. |
F. Menczer and R. K. Belew. Adaptive retrieval agents: Internalizing local context and scaling up to the web. Machine Learning 39(2/3): 203-242, 2000. |
M. Najork and J. L. Wiener. Breadth-first search crawling yields high-quality pages. Proceedings of the 10th International World Wide Web Conference, May 2001. |
Jeffrey Dean and Monika Rauch Henzinger. Finding Related Pages in the World Wide Web. WWW8 / Computer Networks 31(11-16): 1467-1479, 1999. |
Soumen Chakrabarti, Martin van den Berg and Byron Dom. Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery. WWW8 / Computer Networks 31(11-16): 1623-1640, 1999. |
Michelangelo Diligenti, Frans Coetzee, Steve Lawrence, C. Lee Giles and Marco Gori. Focused Crawling Using Context Graphs. VLDB 2000: 527-534. |
Section | Contents | First page | Pages | Presenter | Date (approximate) |
2 | CRAWLING THE WEB | 17 | | | |
2.1 | HTML and HTTP Basics | 18 | | | |
2.2 | Crawling Basics | 19 | 5 | �A�R�@�W�� | |
2.3 | Engineering Large-Scale Crawlers | 22 | 13 | �����@�G�W | April 24 |
2.4 | Putting Together a Crawler | 35 | | | |
2.5 | Bibliographic Notes | 40 | 10 | Somboonviwat Kulwadee | |
3 | WEB SEARCH AND INFORMATION RETRIEVAL | 45 | | | |
3.1 | Boolean Queries and the Inverted Index | 45 | 8 | �X�� | |
3.2 | Relevance Ranking | 53 | 14 | �X�� | May 1 |
3.3 | Similarity Search | 67 | | | |
3.4 | Bibliographic Notes | 75 | 12 | �����@���� | |
4 | SIMILARITY AND CLUSTERING | 79 | | | |
4.1 | Formulations and Approaches | 81 | | | |
4.2 | Bottom-Up and Top-Down Partitioning Paradigms | 84 | 10 | ��J�@���t | May 15 |
4.3 | Clustering and Visualization via Embeddings | 89 | 10 | �����c�@�T�� | |
4.4 | Probabilistic Approaches to Clustering | 99 | 16 | ���@���� | May 22 |
4.5 | Collaborative Filtering | 115 | | | |
4.6 | Bibliographic Notes | 121 | 10 | �ēc�@���� | |
5 | SUPERVISED LEARNING | 125 | | | |
5.1 | The Supervised Learning Scenario | 126 | | | |
5.2 | Overview of Classification Strategies | 128 | | | |
5.3 | Evaluating Text Classifiers | 129 | | | |
5.4 | Nearest Neighbor Learners | 133 | 11 | ���i�@�K�� | |
5.5 | Feature Selection | 136 | 11 | �x���@���� | May 29 |
5.6 | Bayesian Learners | 147 | | | |
5.7 | Exploiting Hierarchy among Topics | 155 | 13 | �����@�_�� | |
5.8 | Maximum Entropy Learners | 160 | | | |
5.9 | Discriminative Classification | 163 | 9 | �V�H�@���� | |
5.10 | Hypertext Classification | 169 | | | |
5.11 | Bibliographic Notes | 173 | 8 | �����ف@���� | June 5 |
6 | SEMI SUPERVISED LEARNING | 177 | | | |
6.1 | Expectation Maximization | 178 | 7 | �����@�m�� | |
6.2 | Labeling Hypertext Graphs | 184 | 11 | �ΐ��@�m�� | |
6.3 | Co-training | 195 | | | |
6.4 | Bibliographic Notes | 198 | 8 | ���I�@���� | June 12 |
7 | SOCIAL NETWORK ANALYSIS | 203 | | | |
7.1 | Social Sciences and Bibliometry | 205 | | | |
7.2 | PageRank and HITS | 209 | 16 | �����@�N�� | |
7.3 | Shortcomings and the Coarse-Grained Graph Model | 219 | | �J���@�q�� | June 19 |
7.4 | Enhanced Models and Techniques | 225 | 16 | �X�� | |
7.5 | Evaluation of Topic Distillation | 235 | 8 | ���@���� | |
7.6 | Measuring and Modeling the Web | 243 | | | |
7.7 | Bibliographic Notes | 254 | 12 | ���u�c�@�ǘa | July 3 |
8 | RESOURCE DISCOVERY | 255 | | | |
8.1 | Collecting Important Pages Preferentially | 257 | | | |
8.2 | Similarity Search Using Link Topology | 264 | 13 | ���X���@�L | |
8.3 | Topical Locality and Focused Crawling | 268 | 16 | �ē��@���Y | |
8.4 | Discovering Communities | 284 | | | |
8.5 | Bibliographic Notes | 288 | | | |
9 | THE FUTURE OF WEB MINING | 289 | | | |
9.1 | Information Extraction | 290 | 11 | �����@���a | July 10 |
9.2 | Natural Language Processing | 295 | | | |
9.3 | Question Answering | 302 | | | |
9.4 | Profiles, Personalization, and Collaboration | 305 | | | |
9.end | | 306 | 12 | �ɓ� �G�a | July 17 |