
Combining 2 Lists Like UNION Ie Keeping 1 Copy Of Mutual Items Between 2 Lists And Appending Others

I edited it. I changed the lists to dictionaries. If a and bb are 2 dictionaries: a = {'UK':'http://www.uk.com', 'COM':['http://www.uk.com','http://www.michaeljackson.com']} bb = {

Solution 1:

I think maybe a defaultdict could be helpful here:

from collections import defaultdict

a = [['UK', ['http://www.uk.com']], ['COM', ['http://www.uk.com'],['http://www.michaeljackson.com']]]

b = [['Australia', ['http://www.australia.com']], ['COM', ['http://www.Australia.com'], ['http://www.rafaelnadal.com'], ['http://www.rogerfederer.com']]]

d = defaultdict(list)
d.update((v[0],v[1:]) for v in a)
for v in b:
    country_or_com = v[0]
    urls = v[1:]
    d[country_or_com].extend(urls) 

This isn't exactly the data structure you asked for, but it's pretty close (and, I think, preferable).

If you really want it in the format you have (although in a different order):

c = []
for k,v in d.items():
   out = [k]
   out.extend(v)
   c.append(out)

results in:

[['Australia', ['http://www.australia.com']],
 ['COM', ['http://www.uk.com'], ['http://www.michaeljackson.com'],['http://www.Australia.com'], ['http://www.rafaelnadal.com'], ['http://www.rogerfederer.com']],
 ['UK', ['http://www.uk.com']]]

As desired.
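
One caveat: `extend` appends every URL from `b`, so a URL listed under the same key in both inputs would appear twice. The sample data has no such overlap, so the output above is unaffected. If only one copy of shared items should be kept, a minimal order-preserving sketch could look like the following. The function name `merge_unique` and the overlapping sample rows are made up for illustration, and Python 3.7+ is assumed (dictionaries keep insertion order).

```python
def merge_unique(*nested_lists):
    """Merge [key, [url], [url], ...] rows, keeping one copy of each URL per key."""
    merged = {}  # key -> list of [url] entries, in first-seen order
    for nested in nested_lists:
        for row in nested:
            key, entries = row[0], row[1:]
            bucket = merged.setdefault(key, [])
            for entry in entries:
                if entry not in bucket:  # linear scan; fine for short lists
                    bucket.append(entry)
    return [[key] + entries for key, entries in merged.items()]


# Hypothetical sample data in which 'http://www.uk.com' occurs on both sides.
a = [['UK', ['http://www.uk.com']],
     ['COM', ['http://www.uk.com'], ['http://www.michaeljackson.com']]]
b = [['UK', ['http://www.uk.com']],
     ['COM', ['http://www.rafaelnadal.com']]]

result = merge_unique(a, b)
print(result)
```

The membership test is a linear scan, which is acceptable for short URL lists; for large inputs a set per key would be the better fit.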


Solution 2:

Consider using a different data structure, such as a dictionary of sets.

a = [['UK', ['http://www.uk.com']], ['COM', ['http://www.uk.com'],['http://www.michaeljackson.com']]]
b = [['Australia', ['http://www.australia.com']], ['COM', ['http://www.Australia.com'], ['http://www.rafaelnadal.com'], ['http://www.rogerfederer.com']]]

# convert these to dictionaries with set values
a = {item[0]:set(s[0] for s in item[1:]) for item in a}
b = {item[0]:set(s[0] for s in item[1:]) for item in b}

# define a function to update our dictionary-of-sets data structure
def union_update_setdict(D, *setdicts):
    """Update dictionary D (with `'key':set(value)` items) with items from setdicts.

    If a new key is added to D from setdicts, a shallow copy of the value
    is added to D.
    """
    for setdict in setdicts:
        for k,v in setdict.items():
            try:
                D[k].update(v)
            except KeyError:
                D[k] = v.copy()

union_update_setdict(a, b)



# Now let's test that the code works

expected = [['UK', ['http://www.uk.com']], ['COM', ['http://www.uk.com'], ['http://www.michaeljackson.com'], ['http://www.Australia.com'], ['http://www.rafaelnadal.com'], ['http://www.rogerfederer.com']], ['Australia', ['http://www.australia.com']]]

# put the "expected" results in our new data structure for comparison
expected = {item[0]:set(s[0] for s in item[1:]) for item in expected}

print(a)
assert expected == a

If you really need to keep using your terrible data structure, you can convert it back when you are finished:

terribledatastruct = [[k]+[[item] for item in v] for k,v in a.items()]
print(terribledatastruct)

Solution 3:

If your lists are dictionaries, you just need to merge the dictionaries:

>>> a = {'UK':'http://www.uk.com', 'COM':['http://www.uk.com','http://www.michaeljackson.com']}
>>> bb = {'Australia': 'http://www.australia.com', 'COM':['http://www.Australia.com', 'http://www.rafaelnadal.com','http://www.rogerfederer.com']}
>>> dict(a.items()+bb.items())
{'Australia': 'http://www.australia.com', 'COM': ['http://www.Australia.com', 'http://www.rafaelnadal.com', 'http://www.rogerfederer.com'], 'UK': 'http://www.uk.com'}
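
On Python 3, `dict.items()` returns a view that does not support `+`, so the expression above raises a `TypeError` there. A rough equivalent, assuming Python 3.5 or later, is sketched below. As in the output above, the 'COM' entry from `bb` replaces the one from `a` rather than being combined with it.

```python
a = {'UK': 'http://www.uk.com',
     'COM': ['http://www.uk.com', 'http://www.michaeljackson.com']}
bb = {'Australia': 'http://www.australia.com',
      'COM': ['http://www.Australia.com', 'http://www.rafaelnadal.com',
              'http://www.rogerfederer.com']}

# Later mappings win on duplicate keys, so bb['COM'] overwrites a['COM'].
merged = {**a, **bb}
print(merged['COM'])
```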

Update:

My answer so far does the following (for a shared key such as 'COM', the value from bb replaces the one from a):

>>> sk = list(set(bb.keys()+a.keys()))
>>> sk
['Australia', 'COM', 'UK']
>>> nd = {}

>>> for i in sk: 
...     if i in a.keys():
...        nd[i]=a[i]
... 
>>> nd
{'COM': ['http://www.uk.com', 'http://www.michaeljackson.com'], 'UK': 'http://www.uk.com'}
>>> for i in sk: 
...     if i in bb.keys():
...        nd[i]=bb[i]
... 
>>> nd
{'Australia': 'http://www.australia.com', 'COM': ['http://www.Australia.com', 'http://www.rafaelnadal.com', 'http://www.rogerfederer.com'], 'UK': 'http://www.uk.com'}

I suspect one should still use dictionaries. Here is a not very CPU-efficient* way to do it:

a = {'UK':'http://www.uk.com', 'COM':['http://www.uk.com','http://www.michaeljackson.com']}
bb = {'Australia': 'http://www.australia.com', 'COM':['http://www.Australia.com', 'http://www.rafaelnadal.com','http://www.rogerfederer.com']}

# every key that occurs in either dictionary
sk = list(set(bb) | set(a))

nd = {}

for i in sk:
    plholder = []
    if i in a:
        # a value is either a single URL string or a list of URLs
        if isinstance(a[i], str):
            plholder.append(a[i])
        else:
            for v in a[i]:
                plholder.append(v)
    if i in bb:
        if isinstance(bb[i], str):
            plholder.append(bb[i])
        else:
            # no check against a here: keys found only in bb must be kept too
            for v in bb[i]:
                plholder.append(v)
    nd[i] = plholder
print(nd)
{'Australia': ['http://www.australia.com'], 'COM': ['http://www.uk.com', 'http://www.michaeljackson.com', 'http://www.Australia.com', 'http://www.rafaelnadal.com', 'http://www.rogerfederer.com'], 'UK': ['http://www.uk.com']}

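*Not CPU-efficient because every key is tested against both dictionaries and the URLs are appended one at a time. (list.append itself is amortized O(1); list.extend would be the more idiomatic way to add a whole list.)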
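
The type checks above can be factored into a small helper that turns every value into a list first. The following is only a sketch under a few assumptions: Python 3, values that are either a single string or a list of strings, and the hypothetical names `as_list` and `merge_url_dicts`. Unlike the loop above, it also skips URLs already collected for a key.

```python
def as_list(value):
    """Wrap a single URL string in a list; copy any other iterable into a list."""
    return [value] if isinstance(value, str) else list(value)


def merge_url_dicts(*dicts):
    """Combine the dictionaries key by key, keeping one copy of each URL."""
    merged = {}
    for d in dicts:
        for key, value in d.items():
            bucket = merged.setdefault(key, [])
            for url in as_list(value):
                if url not in bucket:
                    bucket.append(url)
    return merged


a = {'UK': 'http://www.uk.com',
     'COM': ['http://www.uk.com', 'http://www.michaeljackson.com']}
bb = {'Australia': 'http://www.australia.com',
      'COM': ['http://www.Australia.com', 'http://www.rafaelnadal.com',
              'http://www.rogerfederer.com']}

nd = merge_url_dicts(a, bb)
print(nd)
```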


Solution 4:

Strange data structures often exist to suit some strange, devilish library lurking below. For clarity's sake, I'd propose transforming yours into proper data structures, working with those, and then transforming the result back. That way you avoid the trouble of implementing a hand-knitted union algorithm on your strange structure.

First, make the data decent:

a_ = { x[0]: set(e[0] for e in x[1:]) for x in a }
b_ = { x[0]: set(e[0] for e in x[1:]) for x in b }

Then work with it:

from collections import defaultdict

c_ = defaultdict(set)
for k, v in a_.items():
    c_[k] |= v
for k, v in b_.items():
    c_[k] |= v

Then transform it back to your strange structure:

c = [[k] + [[e] for e in v] for k, v in c_.items()]

This way I think what is really done is clear.
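
Put together, the three steps might look like the sketch below. It assumes Python 3, and the function name `union_nested` is made up for illustration. The sorting is added only to make the output reproducible, since sets have no defined order.

```python
from collections import defaultdict


def union_nested(*nested_lists):
    """Union rows of the form [key, [url], [url], ...] across all inputs."""
    combined = defaultdict(set)
    for nested in nested_lists:
        for row in nested:
            # step 1 and 2: unwrap each [url] entry and union into the key's set
            combined[row[0]] |= {entry[0] for entry in row[1:]}
    # step 3: back to the nested-list format, sorted for a stable result
    return [[key] + [[url] for url in sorted(combined[key])]
            for key in sorted(combined)]


a = [['UK', ['http://www.uk.com']],
     ['COM', ['http://www.uk.com'], ['http://www.michaeljackson.com']]]
b = [['Australia', ['http://www.australia.com']],
     ['COM', ['http://www.Australia.com'], ['http://www.rafaelnadal.com'],
      ['http://www.rogerfederer.com']]]

c = union_nested(a, b)
print(c)
```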

Assumptions: I assume (1) that the sort order is not important and (2) that values appear only once. If either of these is not true, please say so. Your question was so brief that I had to do some interpretation.

For people who are not familiar with the set shortcut operators: c_[k] |= v updates the set in place, like c_[k].update(v). It gives the same result as:
c_[k] = c_[k].union(v)

See the Python documentation on sets.
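
A small sketch of the difference, using made-up values: `|=` (like `update`) changes the existing set in place, while `union` builds a new set and leaves the original alone.

```python
s = {'http://www.uk.com'}
alias = s                      # second name for the same set object
v = {'http://www.uk.com', 'http://www.michaeljackson.com'}

s |= v                         # in place, same effect as s.update(v)
in_place_visible = alias is s and alias == v

t = {'http://www.uk.com'}
u = t.union(v)                 # new set; t itself is unchanged
print(in_place_visible, t, u == s)
```

For the `defaultdict(set)` loop above, both forms end with the same contents; the in-place version simply avoids creating a new set for each key.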

