WebMatrix: Counting the Articles I've Written

Written: February 6, 2014, 02:00

f:id:daruyanagi:20140205163749p:plain

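There's no particular need to do this sort of thing in WebMatrix, but it's what I'm used to. I also thought about rewriting it more cleanly, but that felt like too much hassle, so I didn't.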

@using System.Text
@using System.Text.RegularExpressions
@{
    // The site ran on an older system through 2008, so the archives
    // for that period live under http://www.forest.impress.co.jp/article/
    var archives = new List<string>()
    {
        "http://www.forest.impress.co.jp/article/200704.html",
        "http://www.forest.impress.co.jp/article/200705.html",
        "http://www.forest.impress.co.jp/article/200706.html",
        "http://www.forest.impress.co.jp/article/200707.html",
        "http://www.forest.impress.co.jp/article/200708.html",
        "http://www.forest.impress.co.jp/article/200709.html",
        "http://www.forest.impress.co.jp/article/200710.html",
        "http://www.forest.impress.co.jp/article/200711.html",
        "http://www.forest.impress.co.jp/article/200712.html",
        "http://www.forest.impress.co.jp/article/200801.html",
        "http://www.forest.impress.co.jp/article/200802.html",
        "http://www.forest.impress.co.jp/article/200803.html",
        "http://www.forest.impress.co.jp/article/200804.html",
        "http://www.forest.impress.co.jp/article/200805.html",
        "http://www.forest.impress.co.jp/article/200806.html",
        "http://www.forest.impress.co.jp/article/200807.html",
        "http://www.forest.impress.co.jp/article/200808.html",
        "http://www.forest.impress.co.jp/article/200809.html",
        "http://www.forest.impress.co.jp/article/200810.html",
        "http://www.forest.impress.co.jp/article/200811.html",
        "http://www.forest.impress.co.jp/article/200812.html",
    };
    // Append the archive URLs for the shiny new system
    var d = new DateTime(2009, 1, 1);
    while (d < DateTime.Today)
    {
        archives.Add(string.Format("http://www.forest.impress.co.jp/backno/top/index{0:0000}{1:00}.html", d.Year, d.Month));
        d = d.AddMonths(1);
    }
    // I should have pushed for UTF-8 back when the system was replaced
    var encoding = Encoding.GetEncoding("Shift_JIS");
    // List that holds my own articles
    var my_articles = new List<string>();
}
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title></title>
</head>
<body>
@foreach (var archive in archives)
{
    // Parse the monthly archive page
    var month = new WebClient(){ Encoding = encoding, }.DownloadString(archive);
    // Site-relative links to .html pages; the dot is escaped so it matches only a literal dot
    var regex = new Regex(@"""(/[^""]+\.html)""");
    var links = regex.Matches(month).Cast<Match>();
    foreach (var link in links)
    {
        var l = "http://www.forest.impress.co.jp" + link.Groups[1].Value;
        // Skip digest news, updates, and back-issue pages
        if (l.IndexOf("digest") >= 0)
        {
            continue;
        }
        if (l.IndexOf("update") >= 0)
        {
            continue;
        }
        if (l.IndexOf("backno") >= 0)
        {
            continue;
        }
        // For news and review articles, look for my byline
        try
        {
            var article = new WebClient(){ Encoding = encoding, }.DownloadString(l);
            if (article.IndexOf("柳 英俊") >= 0)
            {
                my_articles.Add(l); // That one's mine!
            }
        }
        catch
        {
            // There may be dead links and the like
        }
    }
}
<ul>
    @foreach(var my_article in my_articles.Distinct())
    {
        <li><a href="@my_article">@my_article</a></li>
    }
</ul>
<p>@my_articles.Distinct().Count() articles found.</p>
</body>
</html>

It calls Distinct() twice, and anyone who knows what they're doing might want to murder me over this code, but hey, it's throwaway! I'm fairly sure it's counting correctly.
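For what it's worth, a tidier version might look roughly like the sketch below. This is only an illustration: the `ArticleCounter` class and its `BuildArchiveUrls` and `Unique` methods are names made up here, not part of the page above. It generates the older archive URLs with the same kind of loop used for the newer ones, and builds the de-duplicated list once so that the listing and the count can share it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class; the names are illustrative only.
public static class ArticleCounter
{
    const string Site = "http://www.forest.impress.co.jp";

    // Monthly archive URLs from April 2007 up to (not including) the month of 'today'.
    // Months before 2009 use the old /article/ layout, later ones the /backno/top/ layout.
    public static List<string> BuildArchiveUrls(DateTime today)
    {
        var urls = new List<string>();
        for (var d = new DateTime(2007, 4, 1); d < today; d = d.AddMonths(1))
        {
            urls.Add(d.Year < 2009
                ? string.Format("{0}/article/{1:0000}{2:00}.html", Site, d.Year, d.Month)
                : string.Format("{0}/backno/top/index{1:0000}{2:00}.html", Site, d.Year, d.Month));
        }
        return urls;
    }

    // De-duplicate once and materialize the result,
    // so the same list serves both the listing and the count.
    public static List<string> Unique(IEnumerable<string> urls)
    {
        return urls.Distinct().ToList();
    }
}
```

In the page, something like `var unique = ArticleCounter.Unique(my_articles);` would then feed both the `foreach` and `unique.Count`. In a WebMatrix site, a helper class like this would normally go under App_Code.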

f:id:daruyanagi:20140205165911p:plain

It does take quite a while to run, though. That doesn't seem to be a problem if you're only running it locally.
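My guess (not measured) is that most of the time goes into fetching the pages one after another. If that is the case, checking several pages at once should help. The sketch below is one way that might look using PLINQ; `BylineScanner`, `FindMine`, and the `fetch` delegate are hypothetical names introduced for illustration, and the default of 4 parallel requests is an arbitrary choice meant to avoid hammering the server.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; not part of the original page.
public static class BylineScanner
{
    // Returns the URLs whose page text contains 'byline'.
    // 'fetch' downloads one page; any exception it throws (a dead link, say)
    // is treated as "not mine", like the empty catch in the original code.
    public static List<string> FindMine(
        IEnumerable<string> urls,
        Func<string, string> fetch,
        string byline,
        int maxParallel = 4)
    {
        return urls
            .Distinct()                             // do not download the same page twice
            .AsParallel()
            .AsOrdered()                            // keep the original link order in the result
            .WithDegreeOfParallelism(maxParallel)   // cap the number of simultaneous downloads
            .Where(url =>
            {
                try
                {
                    return fetch(url).IndexOf(byline, StringComparison.Ordinal) >= 0;
                }
                catch (Exception)
                {
                    return false;
                }
            })
            .ToList();
    }
}
```

With the page above, `fetch` could be `url => new WebClient { Encoding = encoding }.DownloadString(url)`, creating a new WebClient per call because a single instance does not support concurrent downloads.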

Postscript

Nakaji-san did something fun with this.

PowerShell: Counting the Articles Daruyanagi-sama Has Written - なか日記