In brief
Since its launch in 2004, the Google Books Project has drawn a lot of attention, both positive and negative. In a nutshell, it’s a giant project that aims to build the biggest body of knowledge of all time, through an unprecedented effort to scan and digitize all the books on the planet, in all languages and all genres.
This project has sparked enthusiasm because of the potential it represents. It has also raised many objections, especially regarding intellectual property and copyright.
Google fought legal battles for 10 years and won in 2015 thanks to the fair use doctrine.
The main positive facts about the Google Books Project can be summarized in the following points:
- Efficiency
- User Experience (UX)
- Feedback & reviews of books
- Community
- Ecology
The main negative facts regarding the Google Books project are:
- Copyright and legal issues
- Scanning errors
- Errors in metadata
- Language issues or linguistic imperialism
After almost two decades, one obvious financial fact is that Google underestimated the resources needed to complete such a project by more than an order of magnitude. The initial estimate was around $200M, whereas the most recent estimates are closer to $4B.
Google has estimated that 130 million titles exist in total. In 2019, Google announced that its teams had scanned 40 million titles to date, which is almost one third of all the books that exist on earth today.
This post was created using the Lean Market Research Methodology. It’s highly recommended for Product Managers because of all its inspiring tech content.
Let’s start with the big picture
Importance of books
Books are vital for transmitting knowledge and information, and for inspiring, teaching and influencing. There is no need to argue how much books have shaped, and will continue to shape, human civilizations and history.
Books have evolved considerably over time while fulfilling these different missions.
They started as handwriting on stone, wood and leather, then moved to paper and finally to digital screens.
Indeed, it all started with early humans in their caves. It then moved to a more civilized form in ancient China and Egypt with rolled-up manuscripts. These scrolls were made from the papyrus plant and could reach more than 15 m (50 feet) in length once unrolled. This form of book has existed since the 4th millennium BCE.
The Romans created a more robust and compact form of book in the first century CE: the Roman codex. It was still handwritten and had not yet reached the efficiency of printed books.
Historical facts
The first printed books appeared in Asia during the first five centuries CE.
Then came the modern form of printing with the printing press of Johannes Gutenberg, which appeared in the German lands of the Holy Roman Empire around 1439 CE.
This mechanical invention is one of the most important inventions of all time because it allowed an unprecedented spread of information and knowledge, through the mass printing of books, newspapers and other written material.
This invention led to other innovations in literature, like the compilation of the first dictionaries in the 16th century, which started at fewer than 10,000 words. Today’s dictionaries, like the Oxford dictionary, cover nearly 300,000 words and exist in both paper and digital forms.
Speaking of today, the most recent form of the book is the e-book. It’s a digitized version of the book that the reader can access on devices like mobile phones, laptops, tablets or e-readers.
Today, “Print on Demand” offers a hybrid model, where the paper version is printed only when a reader requests it, which avoids unnecessary inventory and wasted paper.
In this context, Google Books came to bridge the gap between paper books, whether they have existed for centuries or are recent but not yet digitized, and digital books. This amazing project aims to scan all the books that exist today and build the biggest body of knowledge of all time, in digital form!
Google Books: Principles and key numbers
Google’s official mantra and first mission is to “organize the world’s information and make it universally accessible and useful”.
Whether one is for or against it, the Google Books Project is very consistent with this mantra.
Google officially announced the Google Books Project in December 2004. Its stated goal is to build, and grant free access to, the largest body of knowledge of all time.
It’s made possible by recent technologies like the cloud and Artificial Intelligence (AI), and by the increasing efficiency of scanning machines.
Some numbers
In 2010, Google estimated that there were 130 million titles in the world and set the objective of scanning all of them. In 2019, 15 years after the start of this giant scanning effort, Google announced that it had scanned 40 million titles in 400 different languages.
This number is far from the initial goal. But it’s still an impressive figure: almost one third of all the books that exist on earth today!
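To put these figures in perspective, here is a minimal sketch in Python that computes the share of titles scanned and a naive projection of the time left. The numbers are the ones quoted above; the projection rests on the assumption, which is mine and not Google's, that the scanning pace stays at its 15-year average.

```python
# Figures quoted in the post (Google's own announcements).
TOTAL_TITLES = 130_000_000    # estimated titles in the world (2010)
SCANNED_TITLES = 40_000_000   # titles scanned as of 2019
YEARS_ELAPSED = 15            # 2004 to 2019

share_scanned = SCANNED_TITLES / TOTAL_TITLES
avg_per_year = SCANNED_TITLES / YEARS_ELAPSED

# Assumption (not from Google): the pace stays at its 15-year average.
years_remaining = (TOTAL_TITLES - SCANNED_TITLES) / avg_per_year

print(f"Share scanned: {share_scanned:.1%}")
print(f"Average pace: {avg_per_year:,.0f} titles per year")
print(f"Years left at that pace: {years_remaining:.1f}")
```

At the average pace, the remaining 90 million titles would take roughly 34 more years, which illustrates how far the project still is from its stated goal.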
Scanning started in 2002 at a speed of 40 minutes for a 300-page book, or 7.5 pages per minute. A few months later, Google created and patented much more efficient solutions that can scan 6,000 pages in 60 minutes, or 100 pages per minute. This corresponds to a speed-up by a factor of about 13!
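The speed-up can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the two scanning speeds quoted above; the variable names are mine.

```python
# Scanning speeds quoted in the post.
early_pages, early_minutes = 300, 40     # 2002: one 300-page book in 40 minutes
later_pages, later_minutes = 6000, 60    # patented setup: 6000 pages in 60 minutes

early_rate = early_pages / early_minutes   # pages per minute
later_rate = later_pages / later_minutes   # pages per minute
speedup = later_rate / early_rate

print(f"Early rate: {early_rate} pages/min")
print(f"Later rate: {later_rate} pages/min")
print(f"Speed-up: {speedup:.1f}x")
```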
In 2004, at the start of the project, Google estimated its overall cost at between $150M and $200M. The average cost of digitizing one book is estimated at $30. Simple math shows that scanning all 130 million titles would cost about $3.9B, close to $4B, which is roughly 20 times the initial $200M estimate!
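The "simple math" behind this comparison looks like this in Python. It's a sketch using only the figures quoted in this post, with the upper bound of the 2004 estimate as the baseline.

```python
TOTAL_TITLES = 130_000_000           # Google's 2010 estimate of existing titles
COST_PER_BOOK_USD = 30               # average digitization cost quoted in the post
INITIAL_ESTIMATE_USD = 200_000_000   # upper bound of the 2004 estimate

projected_cost = TOTAL_TITLES * COST_PER_BOOK_USD
ratio = projected_cost / INITIAL_ESTIMATE_USD

print(f"Projected cost: ${projected_cost / 1e9:.1f}B")
print(f"Ratio to initial estimate: {ratio:.1f}x")
```

The projected cost comes out at $3.9B, which is 19.5 times the initial $200M, hence the rounded figures of $4B and 20 times.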
Google Books offers 4 levels of access to books, depending on the copyright agreement:
- Full view: For public-domain books or books opened up by their owners
- Preview: For non-public-domain books whose owners agreed to a preview
- Snippet view: For non-public-domain books whose owners didn’t refuse the snippet view
- No preview: For non-public-domain books whose owners refused any disclosure of the copyrighted content
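For readers who think in code, the four levels above can be sketched as a small decision function. This is purely illustrative and is not how Google implements it: the `owner_decision` field and its values ("full", "preview", "refused", or None when the owner has not answered) are hypothetical names chosen for this example.

```python
from enum import Enum
from typing import Optional


class AccessLevel(Enum):
    FULL_VIEW = "Full view"
    PREVIEW = "Preview"
    SNIPPET_VIEW = "Snippet view"
    NO_PREVIEW = "No preview"


def access_level(public_domain: bool, owner_decision: Optional[str] = None) -> AccessLevel:
    """Map a book's copyright status to one of the four access levels.

    owner_decision is a hypothetical field: "full", "preview", "refused",
    or None when the owner has not expressed any choice.
    """
    if public_domain or owner_decision == "full":
        return AccessLevel.FULL_VIEW
    if owner_decision == "preview":
        return AccessLevel.PREVIEW
    if owner_decision == "refused":
        return AccessLevel.NO_PREVIEW
    # No explicit refusal from the owner: snippets only.
    return AccessLevel.SNIPPET_VIEW


print(access_level(public_domain=True).value)
print(access_level(public_domain=False, owner_decision=None).value)
```

The point of the sketch is the default: when the owner says nothing, the book falls into the snippet view, which is precisely the behavior that triggered the copyright objections discussed later in this post.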
Copyright options
In this framework, Google offers copyright owners 3 choices:
- Take part in the partner program and allow a partial or full display of the book
- Display only the snippets from the book that best match the user’s search
- Opt out of the program, in which case the book will not be scanned
Most of the books scanned so far are no longer in print or commercially available, and many of them are in the public domain. Libraries, whether public or private, are a major source of books for the project, but they are not the only one.
Authors and publishers are another major source of books for the project, since they gain considerable visibility when they join the partner program, which makes it a win-win arrangement.
For this project, Google partnered with very prestigious libraries worldwide, like those of Harvard, Stanford, Michigan and Oxford, the National Library of Catalonia, the Public Library of Lyon in France, the Bavarian State Library in Germany and the New York Public Library, to name just a few.
The Google Books Project is not the only project aiming to build the biggest body of knowledge of all time. Other projects exist as well, like Project Gutenberg, Internet Archive, HathiTrust, the National Digital Library of India (NDLI), Europeana and Gallica.
Marketing and strategic benefits for Google
A legitimate question at this point would be: why does Google invest all this effort and all these resources in such a crazy project, one that many consider simply unfeasible?
In this framework, 4 major strategic and marketing facts can objectively explain the motivations behind such an amazing endeavor:
- It’s perfectly consistent with the official mission of Google and the brand image that Google has been building for decades, that is to “organize the world’s information and make it universally accessible and useful”. Consequently, such a deep and highly impactful project can only reinforce that mantra publicly.
- It makes Google look like one of the smartest and most powerful brands on the planet. After all, what is more powerful than holding the biggest body of knowledge of all time!
- Through such a project, Google gets an unprecedented database that it can mine to create unique breakthrough products and services, leveraging all the knowledge and information it contains.
- The Google Scholar database, which is the equivalent of Google Books specialized in scientific content and discoveries, can be used together with the Google Books database to detect and hire the brightest minds.
The Google Books Project has drawn many reactions, both positive and negative. Let’s take a look at the main arguments from both sides.
Positive facts about the Google Books Project
Efficiency:
Having immediate access to tens of millions of books in a fraction of a second is definitely a dream come true for any researcher, scientist or simply any information consumer. Even if the Google Books Project hasn’t yet scanned all the books that exist on earth, using it in its current state gives a clear idea of the tremendous power of the tool.
User Experience (UX):
The digital interface didn’t start out perfect, and it’s not perfect yet, but the user experience has kept improving since the start of the Google Books Project. Users can now search easily, with efficient filters that make research very straightforward. The display of the books is also very intuitive and the readability is good overall. Showing the matching snippets within the content of a given book takes research to the next level and saves researchers considerable time and energy in finding the best resources for their work.
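For the more technical readers, the same kind of filtered search can also be expressed as a URL. The sketch below only builds the URL and makes no network call. It assumes the public Google Books API, its `volumes` endpoint and its `filter` parameter (with values such as "full" or "partial") as I understand them from Google's documentation, so check the current reference before relying on it. The helper name `build_search_url` is mine.

```python
from urllib.parse import urlencode

# Assumed endpoint of the public Google Books API (verify against Google's docs).
API_ENDPOINT = "https://www.googleapis.com/books/v1/volumes"


def build_search_url(query: str, viewability: str = "") -> str:
    """Build a search URL; `viewability` maps to the API's `filter` parameter."""
    params = {"q": query}
    if viewability:
        params["filter"] = viewability  # e.g. "full" for fully viewable books
    return f"{API_ENDPOINT}?{urlencode(params)}"


url = build_search_url("printing press history", viewability="full")
print(url)
```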
Feedback & reviews of the books:
Reader feedback and reviews are one of the major added values of the Google Books Project. They give the researcher a first overall, yet valuable, idea of a given resource: for example, the relevance of the book to a given subject, the quality of the writing or the pros and cons of the content.
Community:
Google Books is not a social network, but it leverages the power that any digital tool has to connect people who share the same background, needs and profile. Google Books offers some features that make it easier to connect with other readers, researchers and information seekers.
Ecology:
Going fully digital naturally reduces paper consumption, and hence the ecological impact of the book industry on the planet. This positive impact also comes from reducing, or even eliminating, the logistics chain of manufacturing, shipping and selling physical books.
Negative facts about the Google Books Project
Copyright and legal issues:
A major objection that the Google Books Project faced from its very early stages is the copyright issue. Many authors, publishers, associations and organizations sued Google for not respecting the basic rules of copyright when it scanned and published copyrighted content without requesting a green light from its owners.
This led to major legal battles that lasted 10 years and that Google won in 2015, when its scanning was recognized as fair use. The decision rested on the view that the giant body of knowledge Google is building serves the public interest.
Scanning errors:
The technologies behind book scanning are impressive but not yet perfect. This matters all the more when the project aims to scan tens of millions of books, which means billions of pages. At that scale, even a process that is 99.999% accurate per page still leaves many thousands of flawed pages.
A well-known example is when human hands were scanned and published instead of the book pages!
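A quick back-of-the-envelope calculation in Python shows why scale matters here. It's a sketch under two assumptions that are mine: an average of 300 pages per book (borrowed from the scanning-speed example earlier in this post) and errors counted either per book or per page.

```python
SCANNED_TITLES = 40_000_000   # titles scanned as of 2019, per the post
PAGES_PER_BOOK = 300          # assumption: average book length
UNITS_PER_ERROR = 100_000     # 99.999% accuracy means 1 failure per 100,000 units

# If accuracy is measured per book:
flawed_books = SCANNED_TITLES // UNITS_PER_ERROR

# If accuracy is measured per page:
total_pages = SCANNED_TITLES * PAGES_PER_BOOK
flawed_pages = total_pages // UNITS_PER_ERROR

print(f"Flawed books (per-book accuracy): {flawed_books:,}")
print(f"Total pages scanned: {total_pages:,}")
print(f"Flawed pages (per-page accuracy): {flawed_pages:,}")
```

Under these assumptions, the same 99.999% figure gives 400 flawed books when counted per book, but 120,000 flawed pages when counted per page, out of 12 billion pages in total.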
Errors in metadata:
The same applies to the metadata of books. Metadata means data like the date when the book was published, the names of the authors, the genre of the book and so on. When the metadata is captured wrongly, it leads to wrong search results, like books published before their author was born. This obviously causes major mistakes in one’s research and reduces the credibility of the Google Books technology.
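The "published before the author was born" case is the kind of inconsistency a simple sanity check can catch. Here is a minimal sketch in Python. The `BookRecord` structure, its field names and the sample records are all hypothetical and made up for illustration; they are not Google's actual metadata schema or real data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BookRecord:
    # Hypothetical metadata fields, for illustration only.
    title: str
    author: str
    author_birth_year: int
    publication_year: int


def find_suspect_records(records: List[BookRecord]) -> List[BookRecord]:
    """Return records whose publication year is not after the author's birth year."""
    return [r for r in records if r.publication_year <= r.author_birth_year]


# Made-up sample data.
records = [
    BookRecord("Example Title A", "Author A", 1900, 1935),
    BookRecord("Example Title B", "Author B", 1950, 1899),  # deliberately inconsistent
]

suspects = find_suspect_records(records)
print([r.title for r in suspects])
```

A check like this only flags records for human review. It can't tell which of the two dates is the wrong one.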
Language issues or linguistic imperialism:
This particular point can be considered socio-economic or even political. It refers to the criticism that Google started out by massively scanning books in English. This led to objections that Google was favoring English-language culture over other cultures worldwide, major and minor. At one point, the issue naturally took on a political and diplomatic flavor.
Verdict
Whether you are in favor of it or against it, you can confidently state that the Google Books Project is one of the biggest bodies of knowledge of all time.
Some facts make it an unprecedented wonder, like the fact that it aims to scan all the books that exist on earth, whatever their language or content, or the numbers showing how consistently Google is progressing on this amazing project.
On the other hand, these same numbers show that, over 15 years, Google has seemingly underestimated the effort needed to complete such a giant project by more than an order of magnitude, and that it is well behind the initially announced numbers.
Another important weakness of the project is all the objections that it has raised, and continues to raise, regarding copyright and intellectual property.
All in all, one can say, subjectively, that even if not everything is perfect about the Google Books Project, many facts show that, despite all the hurdles, it is still progressing consistently towards building the biggest body of knowledge of all time.
Don’t hesitate to comment and share your opinion on the Google Books Project!
Resources
- https://artsandculture.google.com/story/OAXR-SPrQmOCew
- https://en.wikipedia.org/wiki/Holy_Roman_Empire
- https://en.wikipedia.org/wiki/Johannes_Gutenberg
- https://actualitte.com/article/101092/reportages/histoire-de-l-ebook-12-de-google-print-a-google-books
- https://m.youtube.com/watch?v=zz_vG9b9dv0
- https://en.m.wikipedia.org/wiki/Google_Books
- https://www.quora.com/Which-is-the-business-model-behind-Google-Scholar-or-is-it-a-byproduct-from-indexing-scientific-articles
- https://m.youtube.com/watch?v=3PhMoq77NkA
- https://m.youtube.com/watch?v=zyNSap5XSv0
- https://actualitte.com/article/10831/acteurs-numeriques/google-books-15-ans-40-millions-de-livres-numerises-en-400-langues
- https://scinfolex.com/2015/10/21/comment-laffaire-google-books-se-termine-en-victoire-pour-le-text-mining/