Deep Learning in Fashion (Part 3): Clothing Matching Tutorial

August 9, 2016 | Business, Developers, Image Data Use Case, Tutorials

In Part 2 of this series, we discussed how e-commerce fashion sites typically make clothing recommendations based on image similarity (here’s a great tutorial on how to do that, by the way). But what if you could also recommend products based on how well they match a past purchase? We built a fashion matching demo to show how you can do just that, and in this tutorial I’ll show you how to build that model in Python.



Image credit to DeepLearning.TV

The Task

Build a simple fashion matching model using Custom Collections (our customizable machine learning API). Because this is just a proof of concept, we need to constrain the problem to a limited wardrobe so that it’s easier to check that things are working. Start small, and build from there. So, let’s match shirts to pants. Our wardrobe will only have five pairs of pants, but an “unlimited” number of shirts to choose from.

The Data

Apparently we had this dataset of images from Lord & Taylor’s product feed lying around, so I selected five pairs of pants that I felt were different enough from one another but could also be staples in someone’s closet (except maybe those crazy, colourful pants. I’m a little scared of them, but also find them kind of fascinating).


Who…how? Why?

Because Custom Collections works using transfer learning (see Part 1 of this series for more information), we can build this model with a very small dataset. It’s awesome! I randomly selected 100 different shirts from the Lord & Taylor feed. If you want to follow along, you can clone the dataset and skeleton code from the GitHub repo.

Training the Model

Before we dive in, one quick check — have you set up your indico account yet? In case you haven’t, follow our Quickstart Guide. It will walk you through the process of getting your API key and installing the indicoio Python library. If you run into any problems, check the Installation section of the docs. You can also reach out to us through that little chat bubble if things start exploding.

Assuming your account is all set up and you’ve installed everything, let’s proceed.

Step 1: Labeling the Data

Just kidding, you don’t actually have to label anything if you don’t want to — I’ve already done it in that text file, clothes_match_labeled_data_1.txt. If you want to add other images to the dataset though, you’ll need to know how I’ve labeled the data.
If you take a look at the text file, you’ll see a bunch of seemingly random numbers.

1050: [1, 2, 3, 4, 5]
1349: [1, 2, 3, 4, 5]
4160: [1, 2, 3]
12180: [1, 2, 3, 4]
12234: [2, 3, 4]

The numbers before the colon correspond to the file names of the shirts, without the .jpg extension. The numbers in each list correspond to the five pairs of pants, numbered 1 to 5.

So, I’ve labelled each shirt with the pants I think it will match. For example, I think shirt #12234 probably goes best with the dark blue jeans (2), the white pants (3), and the classic light blue jeans (4). Side note: I’m by no means a fashion guru, so if you disagree with any of the pairings, go ahead and tweak the labels as you see fit. I’m certain I’ve horrified some poor fashionista somewhere with at least one of these pairings.

Step 2: Preparing the Labeled Data for Your Collection

Now let’s open up the skeleton code from the repo (in case you missed it above, we’ll be working in Python). The Custom Collections API only takes lists of items, each paired with a single label, so we need to take the dataset apart and rearrange it so that this:

1050: [1, 2, 3, 4, 5]

reads more like this:

1050: 1
1050: 2
1050: 3
1050: 4
1050: 5

We do this in the generate_training_data() function. I’m not going to go into detail as it’s just pre-processing code, but if you have any questions please reach out through the little chat bubble.
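The repo’s generate_training_data() isn’t reproduced here, so what follows is only a sketch of what that pre-processing might look like, not the actual implementation. It assumes the shirt images live in a training_shirts folder as <id>.jpg, and that pants number n becomes the label "labeln" (a guess based on the label1 to label5 keys in the prediction output). The helper expand_label_lines is hypothetical.

```python
import ast
import os


def expand_label_lines(lines, image_dir="training_shirts"):
    """Turn 'shirt_id: [pants, ...]' lines into one [image_path, label] pair
    per matching pair of pants (hypothetical helper)."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        shirt_id, _, pants_list = line.partition(":")
        image_path = os.path.join(image_dir, shirt_id.strip() + ".jpg")
        # literal_eval safely parses the "[1, 2, 3]" list without running code
        for pants in ast.literal_eval(pants_list.strip()):
            pairs.append([image_path, "label%d" % pants])  # assumed label naming
    return pairs


def generate_training_data(path, image_dir="training_shirts"):
    """Sketch of the pre-processing step: read a labeled-data file and expand it."""
    with open(path) as f:
        return expand_label_lines(f, image_dir)
```

For example, the single line "1050: [1, 2, 3, 4, 5]" expands into five pairs, one per label, all pointing at the same 1050.jpg image.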

Step 3: Training Your Collection

This is the fun part. Also, the easiest part.

Go to the top of your file and import indicoio. Don’t forget to set your API key — there are a number of ways you can do it; I like to put mine in a configuration file.

import indicoio
from indicoio.custom import Collection
from tqdm import tqdm  # progress bar while adding data to the Collection

indicoio.config.api_key = 'YOUR_API_KEY'
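One way to keep the key out of your source is sketched below (Python 3): read it from an environment variable and fall back to an INI-style config file. The file name indico.cfg, the [auth] section, and the INDICO_API_KEY variable name are all assumptions for illustration; check the indicoio docs for the configuration options the library itself supports.

```python
import configparser
import os


def load_api_key(config_path="indico.cfg", env_var="INDICO_API_KEY"):
    """Return the API key from env_var if set, else from [auth] api_key in
    config_path, else None. File, section, and variable names are assumptions."""
    key = os.environ.get(env_var)
    if key:
        return key
    parser = configparser.ConfigParser()
    parser.read(config_path)  # a missing file is silently skipped
    return parser.get("auth", "api_key", fallback=None)
```

You would then write indicoio.config.api_key = load_api_key() instead of hard-coding the key.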

Go back down to the bottom of your file and under if __name__ == "__main__", pass in the file containing your labeled data for processing, then define your empty Collection. Now, just add your data to the Collection and train!

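if __name__ == "__main__":
    train = generate_training_data("clothes_match_labeled_data_1.txt")
    collection = Collection("clothes_collection_1")
    # add each [image, label] pair to the Collection, with a tqdm progress bar
    for sample in tqdm(train):
        collection.add_data(sample)
    collection.train()  # start training on everything added so far
    collection.wait()   # block until training is complete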

Simple as that. Adding .wait() will block until the training is complete. Otherwise you’ll need to check the status of your new Collection separately and make sure it’s ready before using it in analysis (check the docs for more information). Also, I’ve used tqdm so you’ve got a progress bar telling you how much of your data has been added to the collection as you wait. Since the dataset is pretty small, it shouldn’t take very long to train — maybe 15 minutes? It really depends on how fast your Internet connection is though.
All done? Now you’ve got a clothing matching model 🙂
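If you leave out .wait(), you’ll need to poll the Collection’s status yourself before calling predict. Here is a generic polling loop as a sketch only (Python 3): get_status stands in for whatever call you use to read the status (see the docs for the real one), and the "ready" status string is an assumption.

```python
import time


def wait_until_ready(get_status, ready="ready", interval=10.0, timeout=1800.0):
    """Call get_status() every `interval` seconds until it returns `ready`.

    Raises TimeoutError if `timeout` seconds pass first. get_status is any
    zero-argument callable; the "ready" value is an assumption."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == ready:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "not ready after %.0f seconds (last status: %r)" % (timeout, status))
        time.sleep(interval)
```

Passing the status check in as a function keeps the loop independent of any particular API, which also makes it easy to test with a fake status source.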

Testing the Model

Now that you’ve got a working model, we need to test it to make sure it’s returning the results you want. I set aside some shirts in the test_shirts folder that weren’t in the training dataset (you don’t want to test on shirts the model has already seen — of course it will get the answer right). Comment out the code for training the model under if __name__ == "__main__", and then un-comment the following code.

collection = Collection("clothes_collection_1")
print(collection.predict("9915.jpg"))
print(collection.predict("9969.jpg"))
print(collection.predict("12770.jpg"))
print(collection.predict("13668.jpg"))
print(collection.predict("14195.jpg"))

You’ve got five pairs of pants, so a uniform guess would give each pair a probability of 1/5 = 0.2. Anything scoring 0.2 or above is therefore a pair the model considers a likely match. Your results should look like the ones below (don’t worry about slight variations in the numbers; that’s normal as long as they’re roughly the same).
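The 0.2 cutoff is just 1 divided by the number of labels. The helper below (hypothetical, not part of the indicoio library) applies that cutoff to a predict() result and sorts the surviving labels from most to least probable.

```python
def likely_matches(scores):
    """Return labels whose probability is at least the uniform baseline
    1/len(scores), sorted from most to least probable."""
    baseline = 1.0 / len(scores)
    hits = [label for label, p in scores.items() if p >= baseline]
    return sorted(hits, key=lambda label: scores[label], reverse=True)


# the 13668.jpg result from clothes_collection_1
scores = {'label1': 0.24439144670000001, 'label2': 0.0158232929,
          'label3': 0.7217895728, 'label4': 0.016270852000000002,
          'label5': 0.0017248356}
print(likely_matches(scores))  # ['label3', 'label1']
```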

{u'label1': 0.0183739774, u'label2': 0.8100245667, u'label3': 0.1102390038, u'label4': 0.060105865800000005, u'label5': 0.0012565863}
{u'label1': 0.0661260124, u'label2': 0.26007383030000003, u'label3': 0.5855780751, u'label4': 0.0881810026, u'label5': 4.10797e-05}
{u'label1': 0.0496193967, u'label2': 0.1867289942, u'label3': 0.5951979577000001, u'label4': 0.1683675217, u'label5': 8.612970000000001e-05}
{u'label1': 0.24439144670000001, u'label2': 0.0158232929, u'label3': 0.7217895728, u'label4': 0.016270852000000002, u'label5': 0.0017248356}
{u'label1': 0.0068146869, u'label2': 0.5331305272, u'label3': 0.004712549700000001, u'label4': 0.43632563480000003, u'label5': 0.019016601600000002}

Hmm, looks like the model doesn’t think those brightly coloured, slightly terrifying pants match with anything. When I was labeling the dataset, I marked them as matching with plain white shirts, but there were only a couple of white shirts in that dataset. Seems like the model didn’t see enough positive examples to confidently match those pants with white shirts.
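A quick way to spot this kind of imbalance before training is to count how many positive examples each label gets. This is a minimal sketch, assuming the training data is a list of [image_path, label] pairs with labels named label1 to label5; the helper name and the minimum you pass in are illustrative.

```python
from collections import Counter


def underrepresented(pairs, labels, minimum):
    """Return {label: count} for every label with fewer than `minimum`
    positive examples, including labels that never appear at all."""
    counts = Counter(label for _, label in pairs)
    return {label: counts[label] for label in labels if counts[label] < minimum}


pairs = [["a.jpg", "label1"], ["a.jpg", "label5"],
         ["b.jpg", "label1"], ["b.jpg", "label2"]]
labels = ["label%d" % n for n in range(1, 6)]
print(underrepresented(pairs, labels, 2))
```

Listing the labels explicitly matters: a label with zero examples never shows up in the Counter, but it is exactly the one you most need to know about.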

The results for the other pants look about right to me though. So, let’s give the model a few more example matches of crazy pants to white shirts and see what happens!

Go back and add the ten white shirts in the white_shirts folder to your training_shirts folder. You can simply add the data to your existing Collection and retrain the model if you like, but for comparison purposes I’m going to create a new Collection and train a new model.
clothes_match_labeled_data_2.txt is the dataset updated with the new white shirts and their labels. Note that I still labelled the white shirts as matching with the jeans as well as the crazy pants — if I didn’t do that, the model would learn that white shirts do not match at all with jeans and only with crazy pants, which isn’t the case.
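If you label more shirts yourself, each new entry has to follow the same shirt_id: [pants, ...] format. A small sketch for producing those lines (both helper names are hypothetical):

```python
def format_label_line(shirt_id, pants):
    """Render one shirt's matches as 'shirt_id: [p1, p2, ...]', pants sorted."""
    return "%s: [%s]" % (shirt_id, ", ".join(str(p) for p in sorted(pants)))


def write_label_file(path, labels):
    """Write a {shirt_id: [pants, ...]} mapping to path, one shirt per line."""
    with open(path, "w") as f:
        for shirt_id, pants in labels.items():
            f.write(format_label_line(shirt_id, pants) + "\n")


print(format_label_line(12234, [4, 2, 3]))  # 12234: [2, 3, 4]
```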

Comment out the prediction calls again, and update if __name__ == "__main__": to reflect the new dataset (and un-comment the training section).

train = generate_training_data("clothes_match_labeled_data_2.txt")

To train a new model so you can easily compare results, give your Collection a different name. If you’d rather add the new shirts to your existing Collection and retrain it, keep the original name.

collection = Collection("clothes_collection_2")

Run the code again to add your data to the Collection and train your model.
All set? Let’s test again! Don’t forget to comment/un-comment.

collection = Collection("clothes_collection_2")
print(collection.predict("9915.jpg"))
print(collection.predict("9969.jpg"))
print(collection.predict("12770.jpg"))
print(collection.predict("13668.jpg"))
print(collection.predict("14195.jpg"))

Your results should look similar to this.

{u'label1': 0.046529974, u'label2': 0.7398524388000001, u'label3': 0.1189986829, u'label4': 0.0836434747, u'label5': 0.0109754297}
{u'label1': 0.07557007810000001, u'label2': 0.2758909466, u'label3': 0.48225349580000004, u'label4': 0.1661975791, u'label5': 8.79005e-05}
{u'label1': 0.1087516552, u'label2': 0.216920259, u'label3': 0.5524971075, u'label4': 0.119904359, u'label5': 0.0019266193000000002}
{u'label1': 0.2308552171, u'label2': 0.032740030600000004, u'label3': 0.6973776149, u'label4': 0.0369096186, u'label5': 0.0021175188}
{u'label1': 0.018701656100000002, u'label2': 0.2193391817, u'label3': 0.0032624906, u'label4': 0.29365973160000003, u'label5': 0.46503694}

Woo! Looks like the model learned that white shirts are the only chance you’ll have to take those pants out of your closet, so you’d better grab them while you can and strut about. And voilà — there’s your little clothing matching model.
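To put a number on that improvement, you can diff the two collections’ predictions for the same shirt. A minimal sketch with a hypothetical helper, fed with the 14195.jpg results from each collection:

```python
def score_changes(before, after):
    """Per-label change in probability between two prediction dicts for one image."""
    return {label: after[label] - before[label] for label in before}


# 14195.jpg as scored by clothes_collection_1 (before) and clothes_collection_2 (after)
before = {'label1': 0.0068146869, 'label2': 0.5331305272,
          'label3': 0.004712549700000001, 'label4': 0.43632563480000003,
          'label5': 0.019016601600000002}
after = {'label1': 0.018701656100000002, 'label2': 0.2193391817,
         'label3': 0.0032624906, 'label4': 0.29365973160000003,
         'label5': 0.46503694}
changes = score_changes(before, after)
print(max(changes, key=changes.get))  # label5
```

The extra white-shirt examples moved roughly 0.45 of probability onto label5, mostly taken from the two jeans labels.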

Next Steps

So now you’ve got a proof of concept for fashion matching recommendations, but you want to expand your wardrobe. You’ll need a bigger dataset; a few hundred labeled images will probably do (a rough benchmark is 25-50 examples per label). That’s not as scary as it sounds: just get comfortable in a coffee shop for a couple of hours. What better excuse to escape the office and devour a couple of pastries while looking at clothes?
You may also have noticed that we never showed the model the images of the pants in this demo. That works here because the model knows what the shirts look like, and the labels we provided encode how each shirt relates to the five pairs of pants, so it has an abstract understanding of those five pairs. To make the model generalize beyond this wardrobe, though, you’d need to show it the images of the pants (and any other fashion items). I haven’t explored beyond this fixed wardrobe yet, so I can’t go into further detail.

In any case, you now know the basics of how to approach this problem without the barriers that typically accompany deep learning models (like model complexity, or the large amounts of data required to build them). Transfer learning is what makes the benefits of deep learning, like accuracy and flexibility, accessible and customizable for businesses with smaller datasets. I hope you enjoyed this series, and if you need any help at all, reach out to us at or speak to Julie (who is most definitely a human, and an awesome one at that) through that little chat bubble!

Effective January 1, 2020, Indico will be deprecating all public APIs and sunsetting our Pay as You Go Plan.

Why are we deprecating these APIs?

Over the past two years our new product offering Indico IPA has gained a lot of traction. We’ve successfully allowed some of the world’s largest enterprises to automate their unstructured workflows with our award-winning technology. As we continue to build and support Indico IPA, we’ve concluded that, to provide the quality of service and product we strive for, the platform requires our utmost attention. As such, we will be focusing on the Indico IPA product offering.


