Is the current legal situation with patents different?
My understanding is that Google did patent transformers, but the patent explicitly only covered encoder/decoder architectures. GPT-2, for example, uses a decoder-only architecture and so is not covered by that patent (and it would have been very hard for OpenAI to obtain and defend a patent for decoder-only transformers, given Google’s prior art).
If your question is, instead, “why didn’t the first person to come up with the idea of using computers to predict the next element in a sequence patent that idea, in full generality”, keep in mind that (POSIWID aside) patents are intended “to promote the progress of science and useful arts”. They are not meant as a way of allowing the first person to come up with an idea to prevent all further research in vaguely adjacent fields.
As a concrete example of the sorts of things patents don’t do, take O’Reilly v. Morse, 56 U.S. 62 (1853). In his patent application, Morse claimed:
“Eighth. I do not propose to limit myself to the specific machinery or parts of machinery described in the foregoing specification and claims; the essence of my invention being the use of the motive power of the electric or galvanic current, which I call electro-magnetism, however developed for marking or printing intelligible characters, signs, or letters, at any distances, being a new application of that power of which I claim to be the first inventor or discoverer.”
The court’s decision stated:
“If this claim can be maintained, it matters not by what process or machinery the result is accomplished. For aught that we now know some future inventor, in the onward march of science, may discover a mode of writing or printing at a distance by means of the electric or galvanic current, without using any part of the process or combination set forth in the plaintiff’s specification. His invention may be less complicated -- less liable to get out of order -- less expensive in construction, and its operation. But yet if it is covered by this patent the inventor could not use it, nor the public have the benefit of it without the permission of this patentee. [...] In fine, he claims an exclusive right to use a manner and process which he has not described and indeed had not invented, and therefore could not describe when he obtained his patent. The court is of opinion that the claim is too broad, and not warranted by law.”
patents are intended “to promote the progress of science and useful arts”.
I knew this is how patents were supposed to work in theory, but I also assumed that the actual practice is different. People complain about patent trolls, patents being granted for trivial applications of existing ideas, patent claims written in a maximally vague way that later allows lawyers to claim that they apply to all kinds of things that the patent owner didn’t even think about at the time, etc.
Amazon had “one click” patented; how did that promote the progress of science and useful arts?
“People complain about patent trolls, patents being granted for trivial applications of existing ideas, patent claims written in a maximally vague way that later allows lawyers to claim that they apply to all kinds of things that the patent owner didn’t even think about at the time, etc.”
All of these things indeed happen, but if they get resolved, this tends to happen in subsequent litigation for patent infringement, in which the party that gets accused of infringing raises the defense of invalidity, which then gets resolved by factfinders and courts.
In practice, it is relatively easy to get a patent approved because this is initially[1] not an explicitly adversarial process: the PTO (Patent and Trademark Office, in the US) simply reviews your patent claim and says ‘yes’ or ‘no’, usually without getting direct input from your competitors/adversaries/random other people that might publicly assert your patent is nonsense. But a patent alone does not physically cause most meaningful stuff to happen: in order to actually exclude others from making, using, or selling the invention, you need to file a specific cause of action in court. And that’s when the bogus patent claims are usually brought down (if the alleged infringer fights back): they are superficially reasonable enough to get past the PTO, but not past an explicitly adversarial process in which the opponent’s attorney explains to an unbiased and experienced judge why the patent is invalid.
So what’s the whole deal about patent trolls and other stuff like that? Well, it goes back to a clause I wrote in my first paragraph: “if they get resolved.” Note that, in the story I told above, it might be easy to defeat a bogus patent in court, but you must generally still go to court in the first place. That is a significant deterrent in many situations because of the time and money required. It becomes particularly prohibitive given the American rule, which governs most situations that arise in the US and basically says that each party pays its own attorney’s fees (barring exceptional circumstances), regardless of who wins the case.
So patent trolling persists not because terrible patents routinely survive close scrutiny, but because, at least sometimes, it successfully bullies legitimate opponents of the patent out of even making their case.
[1] There are statutes providing for reexaminations of patents after they get approved, but in practice this doesn’t seem to dispose of the majority of contentious issues in this area.
OK, now it seems to me that the nature of the patent battle is different when it is “inventor vs inventor” or “corporation vs corporation”.
In a “corporation vs corporation” battle, stupid patents are destroyed in court.
In an “inventor vs inventor” battle, if the first inventor becomes a successful entrepreneur (or joins forces with one), it becomes an asymmetric “corporation vs other inventors” battle, and the other inventors lose.
So I guess the answer to my original question is: because this time, multiple corporations immediately saw that this is going to be super profitable, so they keep each other in check.
(I suppose if there were some genius in a garage with some revolutionary ideas trying to compete with the established AI companies, he would still get asymmetrically squashed like a bug… maybe using patents, maybe something else.)
“the patent explicitly only covered encoder/decoder architectures”
...which might have something to do with autoregressive language models being more popular than encoder/decoder ones.
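“why didn’t the first person to come up with the idea of using computers to predict the next element in a sequence patent that idea, in full generality”
Patents are valid for about 20 years, but Bengio et al. used neural networks to predict the next word back in 2000:
https://papers.nips.cc/paper_files/paper/2000/file/728f206c2a01bf572b5940d7d9a8fa4c-Paper.pdf
So the idea is old; only some specific architectural aspects are new.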