Inventory the Web to help the
Posted: Thu Jul 10, 2025 10:09 am
The Internet Archive is grateful to Google for running their “Google Summer of Code” (GSoC) program, providing support for students and open source projects.
This year the GSoC will support 5 students to work with the Internet Archive on the following projects:
Anish Kumar Sarangi – Continue development of the Chrome extension “Wayback Machine”. Today this extension is used by tens of thousands of people to help them archive URLs, access archived content from broken links (404s, etc.) and perform other functions to help make the web more useful and reliable. We will build on that work, adding features, fixing bugs and supporting efforts to bring this tool to millions of users.
Zhengyue Cheng – Inventory the Web to help the Wayback Machine do a better job of archiving it. Today the Wayback Machine archives about 1.5 billion URLs/week. A goal of this project will be to help inform the selection of “seeds” for that effort, to help ensure our coverage is as complete and distributed as possible. We don’t know what we don’t know, and this project will help us fill in the blanks.
Fotios Tsalampounis – Add functionality to the Wayback Machine to help people learn about changes in web pages over time. Leveraging work done by the Environmental Data Governance Initiative (EDGI), we will continue to develop software to detect changes in the content of web pages and provide user-facing and API-based interfaces to those changes.
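To give a rough sense of what "detecting changes in the content of web pages" can involve, here is a minimal sketch in Python. It is not EDGI's software or the Wayback Machine's implementation. The function names (visible_text, text_changes) are hypothetical, and the approach (extract the visible text of two captures, then diff the text chunks with the standard library's difflib) is only one simple way to do it.

```python
import difflib
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text chunks, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Collapse whitespace so formatting-only edits do not show up as changes.
            text = re.sub(r"\s+", " ", data).strip()
            if text:
                self.chunks.append(text)


def visible_text(html):
    """Return the list of visible text chunks in an HTML string (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


def text_changes(old_html, new_html):
    """Return (removed, added) text chunks between two captures of a page."""
    old, new = visible_text(old_html), visible_text(new_html)
    removed, added = [], []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.extend(old[i1:i2])
        if tag in ("insert", "replace"):
            added.extend(new[j1:j2])
    return removed, added


if __name__ == "__main__":
    # Two made-up captures of the same page; only the paragraph text differs.
    before = "<h1>Climate</h1><p>Data on emissions.</p><script>var x=1;</script>"
    after = "<h1>Climate</h1><p>Data on energy.</p><script>var x=2;</script>"
    removed, added = text_changes(before, after)
    print("removed:", removed)
    print("added:", added)
```

Comparing visible text rather than raw HTML means that script or markup churn between captures is ignored and only changes a reader would see are reported. A production tool would likely need to handle much more, such as moved content and boilerplate navigation.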