3D Video, Camera Accelerometer/Gyro Inertia, Depth, All-Sky Cameras, Global Correlation Networks

I am watching many live videos on the Internet. It is an exploding social phenomenon on the Internet. I am interested in the machine vision, artificial intelligence, image processing, and sensor correlation network sides of it as well.

Here I am looking at a bridge over a river. At the moment I happened to start watching, it is raining and the wind is shaking the camera severely, so the movement is many pixels, not fractions of a pixel. The far objects shift less, and that helps determine distance; the ratios of pixel shift to transverse camera movement are different at different depths. I will work out the details slowly as I have time.

RKK YouTube Official Channel Arupo tv
https://www.youtube.com/watch?v=iFEjuS6Alzs

There are round, globe-shaped lights across the bridge at regular intervals, some closer, some farther away. If the camera is shaking in a known way, then distances and pixel shifts would be easy to calculate.

If all cameras also had inertial measurement recordings (a 3-axis accelerometer for linear motion and 3-axis gyros for rotation), then a tiny bit of extra data would tell you how far away things are. You could jiggle the camera in a known path for optimal measurements, or just record the accelerations for cheap 3D. Share the vibration and depth information with seismic networks and 3D imaging networks. Share the rain and wind information with the meteorology networks. Share the microphone and low frequency vibrations with the infrasound networks. I think all the sensor networks will learn to use all parts of their signals, not just the obvious or usual ones.
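
Here is a minimal sketch of that depth calculation, assuming a simple pinhole camera and a purely sideways camera movement taken from the integrated accelerometer. The focal length, sway and pixel shifts below are made-up numbers for illustration, not measurements from this bridge camera.

```python
# Minimal sketch: depth from a known sideways camera translation (for example,
# integrated from an accelerometer) and the pixel shift it produces.
# Assumes a pinhole camera; all numbers are made-up illustrations.

def depth_from_shift(focal_px, baseline_m, shift_px):
    """Pinhole parallax: depth Z ~ f * b / d for a transverse camera translation b."""
    if shift_px == 0:
        return float("inf")      # no measurable parallax: too far away, or no motion
    return focal_px * baseline_m / shift_px

focal_px = 1400.0                # focal length in pixels (assumed from lens and sensor specs)
baseline_m = 0.005               # 5 mm of camera sway, integrated from the accelerometer
for shift_px in (8.0, 2.0, 0.5): # a nearby lamp, a mid-span lamp, the far bank
    z = depth_from_shift(focal_px, baseline_m, shift_px)
    print(f"shift {shift_px:4.1f} px  ->  depth about {z:5.1f} m")
```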

Faster inertial recordings of the movement between frames could be stored in a part of the image with lossless video, or we could start sharing packets that contain video, audio, inertial and other data. Make the algorithms and reference datasets universally available on the Internet to everyone and every sensor, so that global, regional, local and topic integrations are possible. You could have a 1 MegaSample per second accelerometer (they tend to be called vibration sensors, microphones or strain sensors at faster sampling rates) where the recordings are stored with each video packet. 60 frames per second with 1/1000th second exposures leaves lots of time for gathering and packing data to send.
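
A hypothetical packet layout (not any existing standard) shows how little machinery this needs: one frame plus the inertial samples gathered since the previous frame. The header fields and sizes here are my own inventions for illustration. At 60 frames per second, a 1 MegaSample per second 3-axis accelerometer adds roughly 16,667 samples per axis per frame, about 200 kilobytes per frame or 12 megabytes per second at 4 bytes per sample, which is small next to lossless video.

```python
# Hypothetical packet format: header (timestamp, sizes), accelerometer block,
# gyro block, then the lossless frame bytes. None of this is an existing standard.
import struct, time

FRAME_RATE_HZ = 60
ACCEL_RATE_HZ = 1_000_000
SAMPLES_PER_FRAME = ACCEL_RATE_HZ // FRAME_RATE_HZ   # about 16,666 samples per axis per frame

def pack_frame_packet(frame_bytes, accel_xyz, gyro_xyz, t_ns):
    """Pack one video frame and its inertial block into a single byte string."""
    header = struct.pack("<QIII", t_ns, len(frame_bytes), len(accel_xyz), len(gyro_xyz))
    accel = b"".join(struct.pack("<fff", *s) for s in accel_xyz)
    gyro = b"".join(struct.pack("<fff", *s) for s in gyro_xyz)
    return header + accel + gyro + frame_bytes

# tiny illustration with fake data (a real packet would carry ~16,666 accel samples)
accel = [(0.0, 0.0, 9.81)] * 100
gyro = [(0.0, 0.0, 0.0)] * 100
packet = pack_frame_packet(b"\x00" * 1024, accel, gyro, time.time_ns())
print(len(packet), "bytes in this toy packet")
```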

In any video camera, the pixels looking at far objects cover many more atoms than the pixels looking at close ones. It is hard to work back to the atomic composition of materials, because just looking at them without much spectroscopic or complementary sensor data (for now) means you don't know if a pixel showing a distant mountain is looking at granite, or soil, or a wide mixture of things. If you know it is plants, you can use correlations and patterns from closer plants (these show up in the spatial frequencies and periodic correlations). I am learning to see the world in a whole new way because I have records to work with. I can look at fine details of things for hours or days or months or years at a time.
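
As a tiny illustration of the spatial frequency point, here is a sketch with synthetic stand-in patches (I am not pulling real pixels here): the horizontal autocorrelation of a quasi-periodic "vegetation-like" texture shows a clear peak at its row spacing, while an aperiodic "soil-like" patch does not.

```python
# Compare the horizontal autocorrelation of a quasi-periodic texture (rows of
# leaves or crops) with an aperiodic one (bare soil). Both patches are synthetic.
import numpy as np

rng = np.random.default_rng(3)
N = 128
xx = np.arange(N)
veg = 0.5 * np.sin(2 * np.pi * xx[None, :] / 8) + rng.normal(scale=0.3, size=(N, N))
soil = rng.normal(scale=0.5, size=(N, N))

def autocorr_1d(patch):
    """Average horizontal autocorrelation via FFT (Wiener-Khinchin)."""
    p = patch - patch.mean()
    spec = np.abs(np.fft.fft(p, axis=1)) ** 2
    ac = np.fft.ifft(spec, axis=1).real.mean(axis=0)
    return ac / ac[0]

for name, patch in (("vegetation-like", veg), ("soil-like", soil)):
    ac = autocorr_1d(patch)
    print(name, "correlation at lag 8:", round(ac[8], 2))  # the periodic patch peaks near its row spacing
```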

I started out with the superconducting gravimeters. Those only sample once per second, but they collect data for decades. And they are precise and lock to the sun and moon almost exactly, enough to use them as tracking and “gravitational GPS” devices.
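
A back-of-envelope check of why they lock to the sun and moon: the simple tidal formula 2*G*M*R_Earth/d^3 with textbook constants gives peak accelerations of roughly a microns-per-second-squared scale, far above the resolution of a superconducting gravimeter. The sketch below ignores latitude, orbital eccentricity and the solid-earth response.

```python
# Peak tidal acceleration along the body's direction: roughly 2*G*M*R_earth / d^3.
# Textbook constants; this is only an order-of-magnitude check.
G = 6.674e-11         # m^3 kg^-1 s^-2
Re = 6.371e6          # Earth radius, m

bodies = {
    "Moon": (7.35e22, 3.84e8),     # mass (kg), mean distance (m)
    "Sun":  (1.989e30, 1.496e11),
}
for name, (M, d) in bodies.items():
    a_tidal = 2 * G * M * Re / d**3
    print(f"{name}: about {a_tidal:.2e} m/s^2 ({a_tidal / 9.81 * 1e9:.0f} nano-g)")
```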

I wrote a small program in Visual Studio, Visual Basic Forms. I read pixels on my left screen and look at them on my right screen. I can enlarge the images, but the video uses that sloppy and clumsy JPEG-type compression. Whatever the compression, I don't know the details right now, and so cannot get the most out of the pixels because of unknown processing steps by someone else. I can use C or C++ in Visual Studio, or I can play games with trying to get the Chrome browser DevTools to work as a professional development tool rather than a plaything for unsupervised groups using it as their own toy. I wish there were stable tools for the Internet, rather than the hodge-podge that is out there now.
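
My program is in Visual Basic, but the idea fits in a few lines of anything. Here is a sketch using Python and Pillow's ImageGrab; the bounding box numbers are placeholders for wherever the video window happens to sit on the left screen.

```python
# Sketch of "read pixels from the left screen, enlarge them on the right",
# using Pillow's ImageGrab (Windows and macOS). The bounding box is a placeholder.
from PIL import ImageGrab

BBOX = (100, 200, 420, 440)   # left, top, right, bottom of the region to sample

def grab_region(bbox=BBOX, zoom=4):
    img = ImageGrab.grab(bbox=bbox)   # raw screen pixels, after the browser has decoded the video
    # nearest-neighbor resize keeps the blocky pixel values visible instead of smoothing them
    big = img.resize((img.width * zoom, img.height * zoom), resample=0)
    return img, big

raw, enlarged = grab_region()
enlarged.save("enlarged_patch.png")   # inspect the (post-compression) pixel values
print(raw.getpixel((10, 10)))         # one RGB triple from the captured region
```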

But, as long as the pixels I read are somewhat representative of the real world, I can still apply many correlation and measurement methods. Yesterday I was watching stars on one of the few all sky cameras where there is good seeing.

From their known positions and motions, I can solve for the camera position and direction and their changes over time. That is the same as having an accelerometer and gyro on the camera(s). It is somewhat limited, but better than nothing. From the camera motion derived from the known star, planet, moon, sun, comet, satellite, plane (car, people, bird, fog, cloud: anything moving that can be tracked and a continuous position function estimated) motions, I can then look at the pixel shifts of other things in the image to solve for sub-pixel data.
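
Since stars are effectively at infinity, they pin down the camera's orientation (its pointing and how that drifts between frames) rather than its position. Here is a minimal sketch of recovering that rotation from a handful of identified stars, using the standard SVD solution of Wahba's problem with synthetic directions; a real run would first convert star pixel positions into direction vectors through the lens model.

```python
# Recover camera orientation from identified stars: catalog directions versus
# observed directions, both as unit vectors. Synthetic data only.
import numpy as np

def rotation_from_vectors(catalog, observed):
    """SVD (Kabsch) solution of Wahba's problem: find R so that observed_i ~ R @ catalog_i."""
    B = observed.T @ catalog
    U, _, Vt = np.linalg.svd(B)
    d = np.sign(np.linalg.det(U @ Vt))        # force a proper rotation, det(R) = +1
    return U @ np.diag([1.0, 1.0, d]) @ Vt

rng = np.random.default_rng(0)
catalog = rng.normal(size=(6, 3))
catalog /= np.linalg.norm(catalog, axis=1, keepdims=True)

angle = np.radians(2.0)                       # pretend the camera tilted 2 degrees
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
observed = catalog @ R_true.T + rng.normal(scale=1e-4, size=catalog.shape)  # centroiding noise
observed /= np.linalg.norm(observed, axis=1, keepdims=True)

R_est = rotation_from_vectors(catalog, observed)
cos_err = np.clip((np.trace(R_est @ R_true.T) - 1.0) / 2.0, -1.0, 1.0)
print("rotation recovery error (degrees):", np.degrees(np.arccos(cos_err)))
```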

If I shift a part of a scene by half a pixel, and the illumination and geometry have not changed much in 1/30th or 1/60th or 1/millionth of a second, then the new pixel values tell about subpixels. I would start by breaking all the pixels into 4 pieces and noting the data for each time. If the scene shifts by half a pixel to the left, the subpixel on that side is the same as the right subpixel of the pixel to its left. Words are not very good for this, except when you are programming tiny algorithms. I can visualize most of it and just write down the solution or algorithm.

But it is easy to write a simulation showing one layer with the grid of subpixels for the previous frame, then the grid of subpixels for the current frame, and allow that they are shifted, rotated and perhaps change in distance as well. All we have is total intensity estimates for each color for the coarse pixels, but given a specific relative change in the two grids, an estimate for every subgrid of the "truth" of the image can be made. It is not perfectly exact, but ALL reality is so much larger in information than our models, our cameras, our computers can handle. A pixel of water on the ocean can represent tens of cubic meters of water. And each 0.018 kilograms of water is 6.022*10^23 molecules of water, all moving independently, all with their individual rotations, vibrations, and interactions with nearby molecules and fields and electrons.
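
Here is that simulation in miniature: an 8x8 "truth" grid of subpixels is summed into 2x2 coarse pixels at four known shifts of one subpixel (half a coarse pixel), then re-estimated by least squares. The shifts, grid size and random scene are all assumptions; real frames add noise, rotation and lighting changes, so the recovery is only approximate, exactly as said above.

```python
# Simulation of the subpixel bookkeeping: a fine "truth" grid is summed into coarse
# 2x2 pixels at several known subpixel shifts, then re-estimated by least squares.
import numpy as np

rng = np.random.default_rng(1)
FINE = 8                                    # the fine "truth" grid is FINE x FINE subpixels
truth = rng.random((FINE, FINE))
shifts = [(0, 0), (0, 1), (1, 0), (1, 1)]   # camera shake of 0 or 1 subpixel in y and x

A, b = [], []
for dy, dx in shifts:
    for i in range(3):                      # coarse pixels that stay inside the fine grid
        for j in range(3):
            w = np.zeros((FINE, FINE))
            w[2*i + dy : 2*i + dy + 2, 2*j + dx : 2*j + dx + 2] = 1.0
            A.append(w.ravel())
            b.append((w * truth).sum())     # the coarse value the camera would record
A, b = np.array(A), np.array(b)

est = np.linalg.lstsq(A, b, rcond=None)[0].reshape(FINE, FINE)

# naive guess for comparison: every subpixel gets 1/4 of its coarse pixel's value
naive = np.zeros((FINE, FINE))
for i in range(4):
    for j in range(4):
        naive[2*i:2*i + 2, 2*j:2*j + 2] = truth[2*i:2*i + 2, 2*j:2*j + 2].sum() / 4.0

covered = slice(0, 7)                       # subpixels actually observed by the shifted frames
rms = lambda e: np.sqrt(np.mean(e ** 2))
print("naive subpixel RMS error:", rms(naive[covered, covered] - truth[covered, covered]))
print("shift-based RMS error:   ", rms(est[covered, covered] - truth[covered, covered]))
```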

It is a shame the webcams only show lossy 1080p. An all sky camera for stars (fish eye, 180 degree, 360 degree cameras) can capture the whole sky and serve views for humans by remapping the pixels. A small bit of memory makes that a table lookup, and it probably can be done in real time. A multicore or multiprocessor machine can serve many people looking at images from the same sensor. Probably the sky cams from sporting and entertainment events already do that. I only have one lifetime, and the world is so much larger and more complex than I will have time for.
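
A sketch of the lookup-table idea, assuming an equidistant fisheye lens (r = f*theta) centered in the frame and a view pointed straight up the optical axis. A real camera needs its own calibration, and an off-zenith view just adds a rotation of the ray directions before the table is built.

```python
# Build the pixel-index table once; serving a rectilinear view of each new frame
# is then just array indexing. Frame sizes and fields of view are assumptions.
import numpy as np

FISH_W = FISH_H = 1080                 # fisheye frame size (assumed)
FISH_F = FISH_W / np.pi                # equidistant focal length: 180 degrees spans the width
VIEW_W = VIEW_H = 480                  # size of the rectilinear view we serve
VIEW_FOV = np.radians(60)              # field of view of the served view

def build_lookup():
    """Map each output pixel to a source pixel in the fisheye image (view along the zenith)."""
    f_view = (VIEW_W / 2) / np.tan(VIEW_FOV / 2)
    u, v = np.meshgrid(np.arange(VIEW_W), np.arange(VIEW_H))
    x = (u - VIEW_W / 2) / f_view
    y = (v - VIEW_H / 2) / f_view
    theta = np.arctan(np.hypot(x, y))            # ray angle from the optical axis
    phi = np.arctan2(y, x)
    r = FISH_F * theta                           # equidistant fisheye projection
    sx = np.clip((FISH_W / 2 + r * np.cos(phi)).astype(int), 0, FISH_W - 1)
    sy = np.clip((FISH_H / 2 + r * np.sin(phi)).astype(int), 0, FISH_H - 1)
    return sy, sx

SY, SX = build_lookup()                          # computed once, reused for every frame

def serve_view(fisheye_frame):
    return fisheye_frame[SY, SX]                 # per frame: just a table lookup

frame = np.random.randint(0, 256, (FISH_H, FISH_W, 3), dtype=np.uint8)  # stand-in frame
print(serve_view(frame).shape)                   # (480, 480, 3)
```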

A decent high resolution all sky camera, using a lossless format and sending summary frames (also lossless) every second, would be a very useful resource and global archive for teaching and learning and seeing things in the sky. Personally, I think every webcam should use a lossless format, the images should be archived for historical and research purposes, and a lively and thoughtful global correlation and image processing community should be established.

There are cars and trucks in the video, as there are in the many thousands of traffic cameras and videos that just happen to have cars, trucks, planes, drones, and other moving things. Each of those is a solid thing that exists between frames, and usually without much change in velocity or acceleration or jerk between frames. So all the intermediate frames can be estimated if you happen to have a full 3D model of the specific model of car or truck or bicycle or plane or rocket ship. Even people have skeleton models and better and better animations. So if you see a few-pixel blob of a person, you can substitute a full model, do the lighting and the geometry, and figure out the most likely thing they are doing, or assign probabilities to each possible scenario.
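
A sketch of the "solid things persist between frames" point: fit a constant-acceleration (quadratic in time) model to a tracked centroid over a few frames, then read off its position and velocity at any in-between instant. The positions here are invented, and a real tracker would fit the full 3D model pose, not just a centroid.

```python
# Fit a constant-acceleration model to a tracked object's centroid over a few
# frames, then predict its position at an intermediate time. Made-up positions.
import numpy as np

fps = 30.0
t = np.arange(5) / fps                                   # five observed frame times
x = np.array([100.0, 104.1, 108.4, 113.0, 117.9])        # centroid x (pixels), slight acceleration
y = np.array([240.0, 239.2, 238.5, 237.9, 237.4])

# degree-2 polynomial in time = constant acceleration
px = np.polyfit(t, x, 2)
py = np.polyfit(t, y, 2)

t_mid = (t[1] + t[2]) / 2                                # an intermediate "missing" frame time
print("estimated centroid between frames 1 and 2:",
      np.polyval(px, t_mid), np.polyval(py, t_mid))
print("pixel velocity (px/s) at that instant:", np.polyval(np.polyder(px), t_mid))
```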

I forgot ships, containers, golf carts, shopping carts, boxes, bags, cans, barrels. I forgot leaves blowing from side to side, and leaves on trees. All those have, or can have, high resolution models for anyone with a lower resolution camera image to use for guessing what the fine details are like.

So I have a traffic camera with 20 cars showing. In principle, I can use a generic “sedan” model, watch for a few frames, narrow down the model and color and size and direction and velocity and acceleration, and then use that to look for specifics. There was an astronomer (AutoStakkert) who was showing how you can stack many frames of a license plate, register each frame to one model, and then use the subpixel information (it is uniform motion of the pixel centers and locations and geometry) to get more information from many frames than from one. And getting the human out of the loop, out of being the single point of failure in the complex and tedious algorithm, will help. Let them work on using all the data and seeing how it fits into the world.
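
In the spirit of that stacking idea, here is a miniature sketch: synthetic frames of a crude "plate" target, shifted by whole pixels and buried in noise, are registered by FFT cross-correlation and averaged on an upsampled grid. Real plate or planet stacking gains extra detail from the differing sub-pixel sampling phases and handles rotation and seeing distortion, which this toy only hints at.

```python
# Upsample each frame, register it to a reference by FFT cross-correlation,
# shift it back, and average. Frames are synthetic.
import numpy as np

def register_shift(ref, img):
    """Integer shift (dy, dx) to apply to img so it best aligns with ref."""
    corr = np.fft.ifft2(np.fft.fft2(ref) * np.conj(np.fft.fft2(img))).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    if dy > ref.shape[0] // 2: dy -= ref.shape[0]   # wrap into [-N/2, N/2)
    if dx > ref.shape[1] // 2: dx -= ref.shape[1]
    return int(dy), int(dx)

def stack(frames, zoom=4):
    """Upsample, register everything to the first frame, then average."""
    ups = [np.kron(f, np.ones((zoom, zoom))) for f in frames]   # nearest-neighbor upsample
    acc = np.zeros_like(ups[0])
    for u in ups:
        dy, dx = register_shift(ups[0], u)
        acc += np.roll(u, (dy, dx), axis=(0, 1))
    return acc / len(ups)

rng = np.random.default_rng(2)
truth = np.zeros((32, 32)); truth[14:18, 8:24] = 1.0             # a crude bright "plate"
shifts = [(0, 0)] + [tuple(rng.integers(-3, 4, size=2)) for _ in range(15)]
frames = [np.roll(truth, s, axis=(0, 1)) + rng.normal(scale=0.2, size=truth.shape)
          for s in shifts]

stacked = stack(frames)
up_truth = np.kron(truth, np.ones((4, 4)))
rms = lambda e: np.sqrt(np.mean(e ** 2))
print("single-frame RMS error:", rms(np.kron(frames[0], np.ones((4, 4))) - up_truth))
print("stacked RMS error:     ", rms(stacked - up_truth))
```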

I had a discussion with my brother, Clif, yesterday. We were again bemoaning the fact that all the programming languages still have not evolved. We both have been programming for decades. He started as a professional programmer 45 years ago, and probably was working on it before that. Me, I got paid for programming in college, then went to work as a scientific programmer just over 50 years ago, and I had been designing and using computers for five years before that. So together we have seen almost every programming language, database, algorithm, device and proposal.

What we see is that all the programming still forces each programmer to memorize sequences of things, to keep in mind the myriad connections between the parts and their interdependencies. We (the whole of programmers on the earth) are still mostly working with ASCII text editors and the same tools that were all that was available those many decades ago. Now, that is great if you get paid by the hour, because typing text into computers can take hundreds of times longer than if you have decent tools. What I don't like is that the systems are not getting smarter and more courteous and more helpful, but thoughtless and insensitive to human needs.

Too tired to make that a clear picture. Will try again sometime.

I worked in my high school television station as a cameraman. That camera was on a dolly and was taller than me. It could have weighed two hundred pounds. It seemed huge, and it was only analog 325×240 or something like that. You had this huge lens changer that you had to pull out and turn to change lenses. (I changed the lens the first time the director told me they were going to my camera. Either I thought I was supposed to change it, or I just happened to be changing it when he put my camera live. Either way I think it went out live, and I was embarrassed. Did I spend every moment of my life working, or trying to learn useful skills? It seems like it.)

But my point was that I have wanted to use the data from cameras for more than 50 years now. I bought many small cameras and have tried to use the data. What I am still missing is the patience to spend days or weeks or months just trying to get the data out of the camera and into memory where I can write decent algorithms. After many, many years I finally have enough to do that. And the thousands of live videos come, and live telescopes and live microscopes and security cameras and live video conferences, and there is such a wealth of data, all in lossy formats, to work with. Sort of. Always half-way. Always an afterthought. Always for one purpose, ignoring all the possible things that can be done with that same data stream.

That video is running on my other screen. Out of the corner of my eye I saw it go through a zoom out. For those frames the zoom was probably a mechanical zoom with continuous motion of the lenses. That analog zoom means a continuous motion model and subpixel methods will make ALL those frames more valuable.

Richard K Collins, Director, The Internet Foundation


Feedback on a YouTube video:  I am looking at https://www.youtube.com/watch?v=irLk8do_6Mc and similar live videos from Japan. I want to save them to watch later. But the first item in the list of “SAVE” has no text. There is a check box and then empty space, then the other checkboxes and their text. I cannot include a screen shot because when that save box is open, it is not possible to open my account and get to feedback. And I cannot change tabs with this message. And I cannot attach a screen clip that I made. You have my email. Please contact me if you don’t understand. I have some general recommendations for all the YouTube live videos. I can write policies for all videos on the Internet. That is why I am looking at the ones on YouTube. But it would be easier if someone from Google/YouTube who cares about global Internet standards would work with me. Thanks. Richard K Collins, Director, the Internet Foundation


Created a “Live Videos on the Internet” playlist on YouTube with this description: I have found thousands of live videos on YouTube and the Internet. For the Internet Foundation, I am trying to establish standards. They should be lossless and traceable, with precise location and direction information. The technical details of the camera, lens and camera properties should be available. The owner should be contactable. That can be done via anonymous messaging on YouTube (or whatever site the video is on). There are many more issues, and this playlist on YouTube is probably not the right place for this. But I have another list with over a thousand live videos collected, and I cannot see how to transfer them to this one, or how to categorize them so the list will be useful to others.
