
How Amazon’s Go Store’s AI Works

A look at how Amazon uses AWS, machine learning (ML), reinforcement learning (RL), and simulation to power the Go Store’s “Just Walk Out” experience.

Amazon had to solve six core problems to provide this experience. The Go store was a “Computer Vision Complete” problem, a play on the NP-complete class of algorithmic problems from computer science: it required solving many hard vision problems at once.

Sensor Fusion: Aggregate signals across different sensors (here, cameras, because this was solved using nothing but computer vision).

Calibration: Have each camera know its location in the store very accurately.

Person detection: Continuously identify and track each person in the store.

Object Recognition: Distinguish the different items being sold.

Pose Estimation: Detect exactly what each person near a shelf is doing with their arms.

Activity Analysis: Determine whether a person has picked up vs. returned an item.
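Calibration is what makes sensor fusion possible: once every camera knows its position in a shared store coordinate frame, sightings of the same person from several cameras can be combined into a single floor position. The sketch below is an illustration under stated assumptions, not Amazon's method. It assumes each camera reports only a 2D bearing toward the person, and the `Camera` type and `triangulate` function are hypothetical. It computes the least-squares intersection of all the bearing lines.

```python
import math
from dataclasses import dataclass


@dataclass
class Camera:
    # Hypothetical calibrated camera: its (x, y) floor position in metres,
    # expressed in a coordinate frame shared by every camera in the store.
    x: float
    y: float


def triangulate(observations):
    """Least-squares intersection of 2D bearing lines.

    observations: list of (Camera, bearing_radians) pairs; each bearing is the
    direction in which that camera sees the person, in the shared store frame.
    Returns the (x, y) point minimising the summed squared perpendicular
    distance to every bearing line.
    """
    a11 = a12 = a22 = 0.0
    b1 = b2 = 0.0
    for cam, theta in observations:
        dx, dy = math.cos(theta), math.sin(theta)
        # P = I - d d^T projects onto the line's normal direction
        p11, p12, p22 = 1 - dx * dx, -dx * dy, 1 - dy * dy
        a11 += p11
        a12 += p12
        a22 += p22
        b1 += p11 * cam.x + p12 * cam.y
        b2 += p12 * cam.x + p22 * cam.y
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("bearing lines are (nearly) parallel; add viewpoints")
    # Solve the 2x2 system A @ [x, y] = b with Cramer's rule
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)
```

For example, with cameras at (0, 0) and (4, 0) both looking toward a person standing at (2, 2), `triangulate([(Camera(0, 0), math.pi / 4), (Camera(4, 0), 3 * math.pi / 4)])` returns approximately (2.0, 2.0). Each extra camera just adds more terms to the sums, so fusing many viewpoints makes the estimate less sensitive to any single bad reading.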

To figure out who takes what (Person Identification), Amazon has to track each person continuously, from the moment they walk in until they leave. The Locator component had to solve some difficult problems:

Occlusion – where a person is blocked from view by something in the store.

Tangled State – where people are very close to each other.

To address these problems, Amazon uses custom camera hardware that captures both RGB video and depth (distance) information. From there, they segment each image at the pixel level, group pixels into blobs, and label each blob as person or not-person. Finally, they build a location map for each frame by triangulating each person's position across multiple cameras.
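The "group pixels into blobs, then label each blob" step can be illustrated with a classic connected-components pass. This is a stand-in, not Amazon's pipeline: the article does not say how pixels are segmented, and a production system would rely on learned models. Here a pixel counts as foreground if its height above the floor (assumed to be derived from the depth channel and given as a 2D list) exceeds a threshold; 4-connected foreground pixels are grouped by breadth-first flood fill, and a hypothetical size-and-height heuristic, `is_person`, labels each blob.

```python
from collections import deque


def find_blobs(height_map, min_height=0.5):
    """Group foreground pixels into 4-connected blobs.

    height_map: 2D list of per-pixel heights above the floor (metres).
    Pixels taller than min_height count as foreground (a crude stand-in for
    learned segmentation). Returns a list of blobs, each a list of (row, col).
    """
    rows, cols = len(height_map), len(height_map[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or height_map[r][c] <= min_height:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel
            blob, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and height_map[ny][nx] > min_height):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs


def is_person(blob, height_map, min_pixels=3, min_peak=1.2, max_peak=2.3):
    """Hypothetical heuristic: a person is a large-enough blob whose tallest
    point falls within a plausible human height range (all values assumed)."""
    peak = max(height_map[y][x] for y, x in blob)
    return len(blob) >= min_pixels and min_peak <= peak <= max_peak
```

On a tiny 3x5 height map with a 4-pixel region peaking at 1.7 m and a separate 2-pixel region at 0.9 m, `find_blobs` returns two blobs, and `is_person` labels the first as a person and the second (say, a cart) as not-person.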

Linker - The next task was to ensure that labels are preserved across frames of the video, moving from locating customers to tracking them through the store. The main problem in this phase was:

Disambiguating Tangled States - When two people get very close together, the system becomes less confident about who is who. The Go store technology handles this by marking these customers as low confidence, so that they get scheduled for re-identification over time.
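A minimal sketch of the Linker idea, assuming each frame yields a list of person positions on the floor. The `Track` type, the `link_frame` function, and the 0.5 m `tangle_radius` are all hypothetical. Detections are carried forward to existing tracks by greedy nearest-neighbour matching, and any pair of tracks that ends up closer than the radius is flagged low-confidence and queued for re-identification, mirroring the behaviour described above.

```python
import math
from dataclasses import dataclass


@dataclass
class Track:
    track_id: int
    x: float
    y: float
    low_confidence: bool = False


def link_frame(tracks, detections, tangle_radius=0.5):
    """Greedy nearest-neighbour linking of one frame's detections to tracks.

    tracks: Track objects from the previous frame (updated in place).
    detections: list of (x, y) floor positions found in the current frame.
    Returns the ids of tracks newly flagged as tangled (low confidence).
    """
    unused = list(detections)
    for t in tracks:
        if not unused:
            break
        # Move each track to the closest detection not yet claimed
        nearest = min(unused, key=lambda d: math.hypot(d[0] - t.x, d[1] - t.y))
        unused.remove(nearest)
        t.x, t.y = nearest
    reidentify = []
    for i, a in enumerate(tracks):
        for b in tracks[i + 1:]:
            if math.hypot(a.x - b.x, a.y - b.y) < tangle_radius:
                # Too close to tell apart: schedule both for re-identification
                for t in (a, b):
                    if not t.low_confidence:
                        t.low_confidence = True
                        reidentify.append(t.track_id)
    return reidentify
```

Greedy matching depends on track order; an assignment solver such as the Hungarian algorithm would give a globally optimal match. Clearing the low-confidence flag is left to a separate re-identification step, which this sketch does not model.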

A follow-up phase distinguishes Amazon associates (employees), whose behavior likely differs from customers' (for example, they put items on shelves rather than take them off).

Product ID Detection: The key question here is which specific items are off the shelf and in someone’s hand. Some of the problems faced in this phase, and their solutions, were:

Very similar items, such as two flavors of the same brand of drink, were distinguished using residual neural networks (ResNets) that perform fine-grained product recognition.

Lighting and deformation change how items look; this was addressed by generating a large amount of training data targeting these specific challenges.
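Two toy sketches of the ideas in this phase, neither of which is Amazon's code. `residual_block` shows the defining feature of a residual network: the block adds a learned correction F(x) to its own input, y = relu(x + F(x)), which is what lets very deep recognizers train well. `augment` is a hypothetical example of generating extra training images for lighting and deformation: it applies a random brightness gain and a small horizontal shear to a grayscale image stored as a 2D list. All names and parameter values are assumptions.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(w, v):
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def residual_block(x, w1, w2):
    """y = relu(x + W2 . relu(W1 . x)): the identity skip path means the
    weights only need to learn a residual correction on top of x."""
    fx = matvec(w2, relu(matvec(w1, x)))
    return relu([a + b for a, b in zip(x, fx)])


def augment(image, rng, max_gain=0.4, max_shift=2):
    """Hypothetical lighting + deformation augmentation.

    image: non-empty 2D list of 0-255 grayscale values.
    Applies one random brightness gain to every pixel, then shears the image
    horizontally; rows wrap around, a simplification (real code would pad).
    """
    gain = 1.0 + rng.uniform(-max_gain, max_gain)  # simulated lighting change
    direction = rng.choice((-1, 1))                # shear left or right
    rows = len(image)
    out = []
    for r, row in enumerate(image):
        lit = [min(255, max(0, round(p * gain))) for p in row]
        # Shift grows linearly from 0 at the top row to max_shift at the bottom
        k = round(direction * max_shift * r / max(1, rows - 1)) % len(lit)
        out.append(lit[-k:] + lit[:-k] if k else lit)
    return out
```

With zero weights, `residual_block([1.0, -2.0], [[0, 0], [0, 0]], [[0, 0], [0, 0]])` returns `[1.0, 0.0]`: the block falls back to passing its (rectified) input straight through. Calling `augment` repeatedly with one seeded `random.Random` produces many lighting and shape variants of a single product photo.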

For Customer Association, the most challenging problem was combining all of the information from the steps above to answer the question “Who took what?”

Pose Estimation: The Go store’s location-tracking cameras look from the top down, not from an isometric view, so the system needs to trace a path through the pixels representing the arm to connect an item to a customer.

Action Determination: The goal is to avoid charging customers for items they picked up but then put back. To do this, the system needs to count all the items on the shelf rather than rely on a simple assumption based on empty space. There is also a long tail of poses: people can take a massive number of different poses when picking an object off the shelf, especially when several customers are close together. To cover this, Amazon had to generate data using simulators that create virtual customers, cameras, lighting, and shadows.
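The two steps above can be combined in a short sketch under strong simplifying assumptions; it is not Amazon's implementation. `trace_arm` runs a breadth-first search from the pixel where an item left the shelf, through pixels marked as body parts, until it reaches a pixel already assigned to a tracked customer. That is one simple way to "trace a path through the pixels representing the arm." `update_cart` then applies count-based action determination: a drop in the shelf count is a pick, and a rise is a return. The mask format, the `owner` mapping, and both function names are hypothetical.

```python
from collections import deque


def trace_arm(mask, owner, start):
    """Follow body-part pixels from start until a customer's pixel is reached.

    mask: 2D list of booleans, True where a pixel belongs to some body part.
    owner: dict mapping (row, col) -> customer id for pixels already assigned
    to a tracked person's torso blob.
    Returns the id of the customer the arm connects to, or None.
    """
    rows, cols = len(mask), len(mask[0])
    seen = {start}
    queue = deque([start])
    while queue:
        y, x = queue.popleft()
        if (y, x) in owner:
            return owner[(y, x)]
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            if (0 <= ny < rows and 0 <= nx < cols and mask[ny][nx]
                    and (ny, nx) not in seen):
                seen.add((ny, nx))
                queue.append((ny, nx))
    return None


def update_cart(cart, customer, sku, count_before, count_after):
    """Count-based action determination for one product on one shelf.

    A lower shelf count means items were taken, a higher count means items
    were returned. Quantities never go below zero.
    """
    delta = count_before - count_after  # > 0: items left the shelf
    if customer is None or delta == 0:
        return cart
    items = cart.setdefault(customer, {})
    items[sku] = max(0, items.get(sku, 0) + delta)
    if items[sku] == 0:
        del items[sku]
    return cart
```

For instance, if the arm path from a shelf pixel reaches `"customer-7"` and the count of a hypothetical SKU `"cola-cherry"` drops from 5 to 4, that customer's cart gains one unit. If the count later goes back from 4 to 5, the unit is removed again, so the customer is not charged for it.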

Streaming Services

This system has the following components:

Video capture with on-board compute that does basic preprocessing to cut down the bandwidth requirement.

Video streamer appliance on site to handle video codecs and network issues and to guarantee delivery to the cloud.

Video servers in the cloud to capture and store video in S3 and Dynamo.
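The article does not say what the on-camera preprocessing actually is. One common way to cut bandwidth, shown here purely as an assumed example, is frame differencing: send a frame only when enough pixels have changed since the last frame that was sent. `changed_fraction`, `frames_to_upload`, and both thresholds are hypothetical, and frames are assumed to be equally sized 2D lists of grayscale values.

```python
def changed_fraction(prev, curr, pixel_threshold=10):
    """Fraction of pixels whose grayscale value moved by more than
    pixel_threshold between two equally sized frames (2D lists)."""
    total = changed = 0
    for prow, crow in zip(prev, curr):
        for p, c in zip(prow, crow):
            total += 1
            if abs(c - p) > pixel_threshold:
                changed += 1
    return changed / total if total else 0.0


def frames_to_upload(frames, min_change=0.01):
    """Hypothetical on-camera filter: always send the first frame, then only
    frames that differ enough from the most recently sent frame.
    Returns the indices of the frames to send."""
    if not frames:
        return []
    sent = [0]
    last = frames[0]
    for i, frame in enumerate(frames[1:], start=1):
        if changed_fraction(last, frame) >= min_change:
            sent.append(i)
            last = frame
    return sent
```

Comparing against the last *sent* frame, rather than the immediately preceding one, means slow changes still accumulate until they cross the threshold instead of being silently dropped frame by frame.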

Entry and Exit Detection

To detect when people enter and exit the store to create the shopping session, the system has the following components:

Mobile app with a QR code that you scan when you arrive at the store.

Association system that links your likeness in the video to your account based on your position at the entrance. The shopping session is created from this association.
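A minimal sketch of the entry association step, assuming the tracker can report every person's floor position at the moment a QR code is scanned at a gate. The `Session` type, `associate_scan`, and the 1 m `max_distance` are hypothetical. The scan is linked to the closest tracked person within that distance of the gate, and a shopping session is opened for them.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Session:
    account_id: str
    track_id: int
    cart: dict = field(default_factory=dict)


def associate_scan(account_id, gate_xy, track_positions, max_distance=1.0):
    """Link a QR-code scan at an entry gate to the person standing there.

    track_positions: dict mapping track id -> (x, y) at the time of the scan.
    Picks the closest track within max_distance metres of the gate and opens
    a session for it; returns None if nobody is close enough.
    """
    best_id, best_d = None, max_distance
    for track_id, (x, y) in track_positions.items():
        d = math.hypot(x - gate_xy[0], y - gate_xy[1])
        if d <= best_d:
            best_id, best_d = track_id, d
    if best_id is None:
        return None
    return Session(account_id=account_id, track_id=best_id)
```

A fuller version would also skip tracks that already belong to a session and handle several people crowding the gate at once, which is exactly the tangled-state problem described earlier.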

Cart, Payment, and Receipts

These work essentially the same way as on Amazon.com, so there was little new to discuss here.

Amazon Go stores: the futuristic convenience store where you simply walk in, grab what you want, and walk back out.
