Thomas comments on Crazy Ideas Thread—October 2015

Thomas 7 Oct 2015 7:46 UTC
0 points
To build a giant lookup table. Google is a small giant lookup table, but we need a much, much bigger one.

Google is too much about interfacing with their table, but that should be put aside for the moment. What I want is to input any blob of data and output should be all possible relations this blob of data has with any other blob of data.

For example, if I input an integral (calculus), its solution (function) would be one of the natural outputs. If I input a picture, all pictures of the same object(s) is the natural answer this GLT should return. Then you can filter them further. It goes on and on. The table itself is constantly updated, of course.

The craziness of this idea is only in that I think it would soon replace Google. Otherwise it’s quite basic.
- HungryHobo 7 Oct 2015 13:20 UTC
  6 points
  Parent
  The rub is what you consider the “natural” output.
  
  If you give it a picture of a blackboard is the natural output pictures of other blackboards, is the natural output similar pictures of rooms with similar color schemes ,is it the life story of the poet who wrote the quote written on the blackboard, internet posts which include the quote ,is it the details of the famous historical event where a politician quoted a line from the same poem or is it the details of the car reg in a photo on the wall?
  
  If you upload a photo of a screwdriver should it give you info on how/where to buy that kind, lists of different types of screwdrivers or pictures of the type of screw it’s designed to fit into?
  
  A major problem you run into with this kind of thing is that you get so very very very many potential links. take a normal photo and there’s thousands, possibly millions of things that link to it reasonable only 1 node away and you need to not just filter but to also prioritize.
  
  Except where it’s pretty safe to assume like with math problems you have to give some kind of hint about what kind of thing you’re looking for.
  - Thomas 7 Oct 2015 13:46 UTC
    −2 points
    Parent
    
    … details of the car reg in a photo on the wall?
    
    Yes. At first, you have only several relations known for such a blackboard. But the GLT updates automatically, via NN for example. Just as it was an indexing machine. Which it is.
    
    Many paths lead from such a blackboard picture. Perhaps as much as one million or more. Perhaps a window is near this blackboard and the Saint Peter basilica in Rome is clearly visible through. Thus, a whole new line of relations is opened here. You can filter them in and out.
    
    Did I mentioned that this table is giant? It would dwarf Google. In fact, every Google query can be only added to it. Another possible relation in the GLT. Along with the IP, date, time, OS—where and when the query has been made.
    
    Input bit blob (string), output bit blob (string). Those kind of tuples, along with some “meta-data”, are the GLT’s (retrievable of course) members.
    - HungryHobo 7 Oct 2015 14:06 UTC
      2 points
      Parent
      Spurious connections would likely be a massive headache like patterns on the wall matching patterns in the shadows of some random photos taken 2000 miles away while the handwriting style gets matched to a pair of Ukrainian schoolchildren who have never been within 3000 miles while the sentence writing style itself gets matched to an internet post by someone completely unrelated who’s never been within 5000 miles talking about potato dishes.
      
      I get that the table is giant but it sounds almost like an expert system which you don’t ask questions but rather throw info at and hope it comes back with what you want.
      
      Even bounded these things can be a headache. I’ve written code that tried to identify duplicate image regions between 2 images and you’d be surprised how many little sections of images that brute force searches can find matches for in others. little areas of sand, particularly generic trees, shapes in clouds, actual duplicated areas which do match but which are rotated through 27 degrees so that you can’t do a straight pixel by pixel match or which are slightly more compressed slightly less compressed.
      
      If, for example, your system stores info about links between every image that has the same pattern of stars in it then you’d likely need more storage space than you could get by turning the earth into computronium. Exponentials are a bugger.
      
      You hit the same problem one 1000000x worse if you’re trying to match on everything everywhere everywhen.
      - Thomas 7 Oct 2015 14:25 UTC
        0 points
        Parent
        Crazy ideas thread, isn’t it?
        
        Still, it isn’t more crazy that Google would look like in say 1990.
        
        Spurious connections would likely be a massive headache
        
        Likely so, but manageable, one way or another.
        
        You hit the same problem one 1000000x
        
        That would justify 6 out of 20 zeros, wouldn’t it?
        HungryHobo 7 Oct 2015 14:49 UTC
        2 points
        Parent
        Oh I like the idea, some kind of massive expert system would be awesome.
        
        I’m just running through some of the problems since I’ve played with some things in related domains.
- Ruzeil 7 Oct 2015 8:11 UTC
  5 points
  Parent
  Wolphram Alpha (http://www.wolframalpha.com/) does more or less what You have described, so I suppose that Stephen has devised/engineered some kind of “table”. But still, can You give some more technical insights in Your idea, since sounds interesting (for me at least).
  
  BR
  - Thomas 7 Oct 2015 8:44 UTC
    1 point
    Parent
    WA is quite impressive in some sub-fields. But not nearly enough. What I want are all possible known relations your nick “Ruzeil” has with anything else. A picture (all known pictures) of you and anybody else who may use it as a nick or a (sur)name etc. Then all your posts here and all those who discussed with you...
    
    If there is a known relation anywhere in this world, that relation should be in this GLT. Then you filter out (and aggregate) as you want. Well, the interface let you do it easily and an API exists as well.
    
    Perhaps 10^20 records are in the table, you can play with. The number grows and grows. And you can access to view all of them.
    
    Every relation in this table has its own probability. Some quite high, some not. And are constantly updated as well. Even the number of possible attributes of a relation in the table develops as well.
    
    Needless to say, you can use the table to see networks of relations between elements of any list you choose to provide to this GLT.
    - pico 7 Oct 2015 19:52 UTC
      4 points
      Parent
      Setting aside whether or not this is useful, I’m not convinced that the implementation you described is practical. Google based search on hyperlinks specifically because that was easy to implement. Is there a smaller search space than the entirety of human knowledge on which this would still be useful?
- [deleted] 7 Oct 2015 16:13 UTC
  3 points
  Parent
  The craziness of this idea is only in that I think it would soon replace Google. Otherwise it’s quite basic.
  
  This was basically the idea behind Wolfram Alpha. He also thought it would soon replace google. But :
  1. It’s very very hard to do. Play around with Wolfram Alpha and you’ll soon see that while sometimes it spits out exactly what you want, other times it just can’t understand what you’re looking up.
  2. Most people don’t think this way. They can’t formulate a query in the proper way to get the correct things out of it.
- Lumifer 7 Oct 2015 15:24 UTC
  2 points
  Parent
  Looks like the semantic web.
- tim 8 Oct 2015 3:15 UTC
  0 points
  Parent
  I dunno. I don’t think I would use what you’re describing over Google. Filtering the associations with little to no work from the end user is huge. If I type “register s” into google, it instantly understands that I want to know about registering scripts in asp.net due to my previous search history, the types of sites I visit, etc.
  
  I think you are underestimating what a tremendous pain in the ass it will be to manually filter through the massive number of associations with a particular string.
  
  In incognito mode “register script” gives links to various resources (WGA/Library of Congress/etc) directed at screenwriters along with sites directed towards programming in languages I don’t know and don’t care about. And this is after Google has removed/hidden links it believes to be spammy or generally unhelpful toward people who make this search.
  - Thomas 8 Oct 2015 7:34 UTC
    0 points
    Parent
    The thing is, that every search you make, is going to be appended to the GLT. I said so, that each Google query can be just added to the table. But not only your Google query, if you choose so—but every GLT query as well.
    
    But even without this option, your “register s” example would work better on GLT then on Google.
    
    With this option on, so much easier.
    
    I think you are underestimating what a tremendous pain in the ass it will be to manually filter through the massive number of associations with a particular string.
    
    Millions of filters would be inside GLT, already. Yours may be added. It is a main advantage over Google. Quite obvious to me.
    - tim 9 Oct 2015 6:03 UTC
      0 points
      Parent
      That conveys a much different impression than
      
      What I want is to input any blob of data and output should be all possible relations this blob of data has with any other blob of data. … If I input a picture, all pictures of the same object(s) is the natural answer this GLT should return.
      
      And how is this functionality
      
      Millions of filters would be inside GLT, already. Yours may be added. It is a main advantage over Google.
      
      any different from Google in the first place? Are you implying they aren’t already mining information regarding each user’s search-revision and link-clicking habits to improve their filters as whole?
      - Thomas 9 Oct 2015 8:52 UTC
        2 points
        Parent
        Google is enough and will be enough? They already doing this and that and everything?
        
        Had Brin and Page thought like that, we would be on AltaVista. But there would be no AltaVista as well. Not even an iron ax.
        
        Some people have no imagination, whatsoever. Most of them. Including very many on this site.
        
        This is a crazy idea thread, remember? Someone may pick one of those ideas and put it into life. That’s all that it is. I will not go into technical details, for sure.
        tim 10 Oct 2015 0:26 UTC
        0 points
        Parent
        shrug
        
        I am interested in your idea but based on your description, I am legitimately uncertain as to how it is measurably different from what Google already does.
        
        I am certainly not saying that Google is and always will be the best.
        Thomas 10 Oct 2015 8:12 UTC
        0 points
        Parent
        Currently Google does not give you all the available pictures of an object, from a photo you have.
        
        This “horizontal” knowledge isn’t present in Google’s databases.
        
        Additionally, page ranking, whichever it is currently, does not permit you to sort the answers by yourself. You may want that. Or implement a function like “the shortest”. And many more complex functions.
        
        Sites are just one type of object. You can’t Google for most other objects.
        
        There are some cameras in Africa, showing you water ponds. I want to know, if there is a waterhole, where a lion came into the picture less than 100 seconds ago. Or a warthog. Or both.
        
        And so on.
        
        Above mentioned GLT would give you such answers, Google doesn’t.